metrics
Kernel functions and helper utilities for kernel-based calculations.
Most of the commonly-used kernel functions are already implemented in
sklearn.metrics.pairwise. The RBF kernel and the pairwise Euclidean distance
function are re-implemented here as alternatives in case sklearn is
unavailable. Most, if not all, of the kernel functions defined in
sklearn.metrics.pairwise should be compatible with the functions defined here.
Attention
Matlab and Python typically order data differently. In Matlab, observations are usually
stored as columns, whereas the functions in this module expect observations stored as
rows. If you import data from a Matlab file, transpose it if needed to follow the Python
convention, i.e. X = X.T.
For example, the data in X and Y should be organized as:
X = [[--- x1 ---],
     [--- x2 ---],
     ...
     [--- xn ---]]
Y = [[--- y1 ---],
     [--- y2 ---],
     ...
     [--- ym ---]]
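The transpose described above can be illustrated with a small hand-made array standing in for data loaded from a Matlab file (the values and the name `X_matlab` are made up for illustration):

```python
import numpy as np

# Hypothetical Matlab-style data: 3 features (rows) x 4 observations (columns).
X_matlab = np.arange(12, dtype=float).reshape(3, 4)

# Transpose so that each ROW is one observation, as this module expects.
X = X_matlab.T

print(X.shape)  # (4, 3): 4 observations, each with 3 features
```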
- gym_socks.kernel.metrics.abel_kernel(X, Y=None, sigma=None, D=None)
Abel kernel function.
- Parameters
X (numpy.ndarray) – A 2D array with observations organized in ROWS.
Y (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.
sigma (Optional[float]) – Strictly positive real-valued kernel parameter.
D (Optional[numpy.ndarray]) – Pairwise distance matrix.
- Returns
Gram matrix of pairwise evaluations of the kernel function.
- Return type
numpy.ndarray
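A minimal NumPy sketch of an Abel kernel, assuming the common form \(k(x, y) = \exp(-\lVert x - y \rVert / \sigma)\) on unsquared Euclidean distances. The formula, the default `sigma=1.0`, and the name `abel_kernel_sketch` are assumptions for illustration, not the gym_socks implementation.

```python
import numpy as np


def abel_kernel_sketch(X, Y=None, sigma=1.0, D=None):
    """Hypothetical Abel kernel, assuming k(x, y) = exp(-||x - y|| / sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be strictly positive.")
    if D is None:
        X = np.atleast_2d(np.asarray(X, dtype=float))
        Y = X if Y is None else np.atleast_2d(np.asarray(Y, dtype=float))
        # Unsquared pairwise Euclidean distances via broadcasting: shape (n_x, n_y).
        D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-D / sigma)


X = np.array([[0.0, 0.0], [3.0, 4.0]])
K = abel_kernel_sketch(X, sigma=5.0)
# Diagonal entries are 1; the off-diagonal distance is 5, so those entries are exp(-1).
```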
- gym_socks.kernel.metrics.check_pairwise_arrays(X, Y=None, dtype=numpy.float64, ensure_finite=True, copy=False)
Check pairwise arrays.
Note
This function is intended as a simple replacement for the
sklearn.metrics.pairwise.check_pairwise_arrays() function. Unlike sklearn, this function does not check sparse input data, and does not do sophisticated type checking or upcasting.
- Parameters
X – A 2D array with observations organized in ROWS.
Y – A 2D array with observations organized in ROWS.
dtype (numpy.dtype) – The data type of the resulting array.
copy (bool) – Whether to create a forced copy of the array.
ensure_finite (bool) – Whether to raise an error if the array contains NaN or infinite values.
- Returns
The validated arrays X and Y.
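A minimal sketch of this kind of validation, mirroring the documented parameters. The specific checks (2D shape, matching column counts, finiteness) and the name `check_pairwise_arrays_sketch` are assumptions for illustration, not the gym_socks code.

```python
import numpy as np


def check_pairwise_arrays_sketch(X, Y=None, dtype=np.float64, ensure_finite=True, copy=False):
    """Hypothetical validator for row-organized inputs X and Y."""
    # np.array(..., copy=True) always copies; np.asarray avoids a copy when possible.
    X = np.array(X, dtype=dtype, copy=True) if copy else np.asarray(X, dtype=dtype)
    if Y is None:
        Y = X  # Pairwise comparisons of X against itself.
    else:
        Y = np.array(Y, dtype=dtype, copy=True) if copy else np.asarray(Y, dtype=dtype)

    if X.ndim != 2 or Y.ndim != 2:
        raise ValueError("X and Y must be 2D arrays with observations in rows.")
    if X.shape[1] != Y.shape[1]:
        raise ValueError(
            f"Incompatible dimensions: X has {X.shape[1]} columns, Y has {Y.shape[1]}."
        )
    if ensure_finite and not (np.all(np.isfinite(X)) and np.all(np.isfinite(Y))):
        raise ValueError("Input contains NaN or infinity.")
    return X, Y


X, Y = check_pairwise_arrays_sketch([[1, 2], [3, 4]])
```

When Y is omitted, the sketch returns X twice, so downstream code can always treat the inputs as a pair.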
- gym_socks.kernel.metrics.check_pairwise_distances(D, shape, copy=True)
Validate the pairwise distance matrix.
Performs checks to ensure the pairwise distance matrix is valid.
- Parameters
D (numpy.ndarray) – Pairwise distance matrix.
shape (tuple) – The desired shape of the array.
copy (bool) – Whether to create a forced copy of D.
- Returns
The validated matrix.
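A minimal sketch of what validating a distance matrix might involve. The checks chosen here (expected shape, finite values, non-negative entries) and the name `check_pairwise_distances_sketch` are assumptions; the documentation above does not list which checks gym_socks performs.

```python
import numpy as np


def check_pairwise_distances_sketch(D, shape, copy=True):
    """Hypothetical distance-matrix check; the exact checks are assumed."""
    D = np.array(D, dtype=float, copy=True) if copy else np.asarray(D, dtype=float)
    if D.shape != tuple(shape):
        raise ValueError(f"Expected shape {tuple(shape)}, got {D.shape}.")
    if not np.all(np.isfinite(D)):
        raise ValueError("Distance matrix contains NaN or infinity.")
    if np.any(D < 0):
        raise ValueError("Distances must be non-negative.")
    return D


D = check_pairwise_distances_sketch([[0.0, 1.0], [1.0, 0.0]], shape=(2, 2))
```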
- gym_socks.kernel.metrics.delta_kernel(X, Y=None)
Delta (discrete) kernel function.
The delta kernel is defined as \(k(x_{i}, x_{j}) = \delta_{ij}\), meaning the kernel returns 1 if the two vectors are identical and 0 otherwise. The vectors in X and Y should have discrete values, meaning each element in the vector should be a natural number or integer value.
- Parameters
X (numpy.ndarray) – A 2D array with observations organized in ROWS.
Y (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.
- Returns
Gram matrix of pairwise evaluations of the kernel function.
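The delta kernel can be sketched with NumPy broadcasting: every row of X is compared elementwise with every row of Y, and an entry is 1 only when all elements match. The name `delta_kernel_sketch` is hypothetical and this is not the gym_socks implementation.

```python
import numpy as np


def delta_kernel_sketch(X, Y=None):
    """Hypothetical delta kernel: K[i, j] = 1 if X[i] equals Y[j] elementwise, else 0."""
    X = np.atleast_2d(np.asarray(X))
    Y = X if Y is None else np.atleast_2d(np.asarray(Y))
    # Shape (n_x, n_y, n_features) comparison, then require every feature to match.
    return np.all(X[:, None, :] == Y[None, :, :], axis=-1).astype(float)


X = np.array([[0, 1], [1, 1], [0, 1]])
K = delta_kernel_sketch(X)
print(K)
# [[1. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 1.]]
```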
- gym_socks.kernel.metrics.euclidean_distances(X, Y=None, squared=False)
Compute the pairwise Euclidean distance matrix between points.
Note
This function is intended as a simple replacement for the
sklearn.metrics.pairwise.euclidean_distances() function. Unlike sklearn, this function does not check sparse input data, and does not do sophisticated type checking or upcasting.
- Parameters
X – A 2D array with observations organized in ROWS.
Y – A 2D array with observations organized in ROWS.
squared (bool) – Whether the result is squared before returning.
- Returns
The matrix of pairwise Euclidean distances between points.
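A common way to compute this matrix is the expansion \(\lVert x - y \rVert^2 = \lVert x \rVert^2 - 2 x^\top y + \lVert y \rVert^2\), which replaces explicit loops with one matrix product. The sketch below uses that technique; whether gym_socks uses the same expansion is an assumption, and `euclidean_distances_sketch` is a hypothetical name.

```python
import numpy as np


def euclidean_distances_sketch(X, Y=None, squared=False):
    """Pairwise Euclidean distances via ||x||^2 - 2 x.y + ||y||^2 (illustrative)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = X if Y is None else np.atleast_2d(np.asarray(Y, dtype=float))
    XX = np.sum(X**2, axis=1)[:, None]  # shape (n_x, 1)
    YY = np.sum(Y**2, axis=1)[None, :]  # shape (1, n_y)
    D2 = XX - 2.0 * (X @ Y.T) + YY
    # Floating-point cancellation can leave tiny negative values; clip them to zero.
    np.maximum(D2, 0.0, out=D2)
    return D2 if squared else np.sqrt(D2)


X = np.array([[0.0, 0.0], [3.0, 4.0]])
D = euclidean_distances_sketch(X)
# Zeros on the diagonal and 5 off the diagonal.
```

The clipping step matters in practice: without it, `np.sqrt` can return NaN for entries that should be exactly zero.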
- gym_socks.kernel.metrics.hybrid_kernel(X, Q, Y=None, R=None)
Hybrid systems kernel.
In a hybrid system, we split the sample according to the mode
Q. The vectors in Q and R should have discrete values, meaning each element in the vector should be a natural number or integer value.
\[\begin{split}d((x, q), (x', q')) = \begin{cases} \zeta(x - x'), & q = q' \\ 1, & q \neq q' \end{cases}\end{split}\]
\[\zeta(x) = (2/\pi) \max_{1 \leq i \leq n} \tan^{-1} \vert x_{i} \vert\]
\[k(x, q, x', q') = 1 - d((x, q), (x', q'))\]
- Parameters
X (numpy.ndarray) – A 2D array with observations organized in ROWS.
Q (numpy.ndarray) – A 2D array with observations organized in ROWS.
Y (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.
R (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.
- Returns
Gram matrix of pairwise evaluations of the kernel function.
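The three formulas for \(d\), \(\zeta\), and \(k\) translate directly into broadcasting code. This is a sketch under the assumption that two modes are equal when all their elements match; `hybrid_kernel_sketch` is a hypothetical name, not the gym_socks function.

```python
import numpy as np


def hybrid_kernel_sketch(X, Q, Y=None, R=None):
    """Hypothetical hybrid-systems kernel k = 1 - d, following the formulas above."""
    if (Y is None) != (R is None):
        raise ValueError("Provide both Y and R, or neither.")
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Q = np.atleast_2d(np.asarray(Q))
    if Y is None:
        Y, R = X, Q
    else:
        Y = np.atleast_2d(np.asarray(Y, dtype=float))
        R = np.atleast_2d(np.asarray(R))

    # zeta(x - x') = (2 / pi) * max_i arctan(|x_i - x'_i|) for every pair of rows.
    diff = np.abs(X[:, None, :] - Y[None, :, :])
    zeta = (2.0 / np.pi) * np.max(np.arctan(diff), axis=-1)

    # Two samples share a mode when all elements of q and q' are equal (assumption).
    same_mode = np.all(Q[:, None, :] == R[None, :, :], axis=-1)

    # d = zeta(x - x') if q == q', else 1; the kernel is k = 1 - d.
    d = np.where(same_mode, zeta, 1.0)
    return 1.0 - d


X = np.array([[0.0], [0.0], [1.0]])
Q = np.array([[0], [1], [0]])
K = hybrid_kernel_sketch(X, Q)
```

Samples in different modes always get kernel value 0, while samples in the same mode at distance 1 get \(1 - (2/\pi)\tan^{-1}(1) = 0.5\).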
- gym_socks.kernel.metrics.rbf_kernel(X, Y=None, sigma=None, D=None)
RBF kernel function.
Computes the pairwise evaluation of the RBF kernel on each vector in
X and Y. For example, if X has \(m\) vectors and Y has \(n\) vectors, then the result is an \(m \times n\) matrix \(K\) where \(K_{ij} = k(x_i, y_j)\).
\[\begin{split}K = \begin{bmatrix} k(x_1,y_1) & \cdots & k(x_1,y_n) \\ \vdots & \ddots & \vdots \\ k(x_m,y_1) & \cdots & k(x_m,y_n) \end{bmatrix}\end{split}\]
Note
This function is intended as a simple replacement for the
sklearn.metrics.pairwise.rbf_kernel() function. Unlike sklearn, this function does not check sparse input data, and does not do sophisticated type checking or upcasting.
The main difference between this implementation and sklearn.metrics.pairwise.rbf_kernel() is that this function optionally allows you to specify a different distance metric (through the pairwise distance matrix D) in the event the data is non-Euclidean.
Attention
If you are familiar with
sklearn.metrics.pairwise.rbf_kernel(), note that the parameter sigma is not the same as the gamma parameter used by sklearn.metrics.pairwise.rbf_kernel(). However, they are related:
\[\gamma = \frac{1}{2 \sigma^{2}}\]
- Parameters
X (numpy.ndarray) – A 2D array with observations organized in ROWS.
Y (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.
sigma (Optional[float]) – Strictly positive real-valued kernel parameter.
D (Optional[numpy.ndarray]) – Pairwise distance matrix.
- Returns
Gram matrix of pairwise evaluations of the kernel function.
- Return type
numpy.ndarray
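A minimal sketch of an RBF kernel in the sigma parameterization, \(k(x, y) = \exp(-d(x, y)^2 / (2\sigma^2))\), with a check against the gamma form used by sklearn. This sketch assumes D holds unsquared distances and squares them; the gym_socks function may expect a different convention for D, so check before passing your own matrix. The name `rbf_kernel_sketch` and the default `sigma=1.0` are hypothetical.

```python
import numpy as np


def rbf_kernel_sketch(X, Y=None, sigma=1.0, D=None):
    """Hypothetical RBF kernel k = exp(-D^2 / (2 sigma^2)); D assumed unsquared."""
    if sigma <= 0:
        raise ValueError("sigma must be strictly positive.")
    if D is None:
        X = np.atleast_2d(np.asarray(X, dtype=float))
        Y = X if Y is None else np.atleast_2d(np.asarray(Y, dtype=float))
        # Default to Euclidean distances; pass D to use another metric instead.
        D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-np.square(D) / (2.0 * sigma**2))


rng = np.random.default_rng(0)
X, Y = rng.standard_normal((4, 3)), rng.standard_normal((5, 3))
sigma = 0.7
K = rbf_kernel_sketch(X, Y, sigma=sigma)  # shape (4, 5): m x n

# The same values in sklearn's parameterization exp(-gamma * ||x - y||^2).
gamma = 1.0 / (2.0 * sigma**2)
sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
print(np.allclose(K, np.exp(-gamma * sq_dists)))  # True
```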
- gym_socks.kernel.metrics.regularized_inverse(G, regularization_param=None, copy=True)
Regularized inverse.
Computes the regularized matrix inverse.
\[W = (G + \lambda M I)^{-1}, \quad G \in \mathbb{R}^{n \times n}, \quad G_{ij} = k(x_{i}, y_{j})\]
where \(M = n\) is the number of samples (rows of \(G\)).
- Parameters
G (numpy.ndarray) – The Gram (kernel) matrix.
regularization_param (Optional[float]) – The regularization parameter \(\lambda > 0\).
copy (bool) – Whether to create a forced copy of G.
- Returns
Regularized matrix inverse.
- Return type
numpy.ndarray
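A minimal sketch of \(W = (G + \lambda M I)^{-1}\), assuming \(M\) is the number of rows of \(G\) and using a default \(\lambda = 1\) for illustration (the documented default is None, whose behavior is not described here). The name `regularized_inverse_sketch` is hypothetical.

```python
import numpy as np


def regularized_inverse_sketch(G, regularization_param=1.0, copy=True):
    """Hypothetical W = (G + lambda * M * I)^{-1}, with M = number of rows of G."""
    # With copy=False and a float input, G is modified in place.
    G = np.array(G, dtype=float, copy=True) if copy else np.asarray(G, dtype=float)
    M = G.shape[0]
    # Add lambda * M to the diagonal instead of forming lambda * M * I explicitly.
    G[np.diag_indices_from(G)] += regularization_param * M
    return np.linalg.inv(G)


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
G = X @ X.T  # Gram matrix of the linear kernel, rank 2
W = regularized_inverse_sketch(G, regularization_param=0.1)
```

Here G alone is singular (rank 2), and the regularization term makes the shifted matrix invertible.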
- gym_socks.kernel.metrics.woodbury_inverse(A, U, C, V, precomputed=False)
Computes the matrix inverse using the Woodbury matrix identity.
\[W = (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}\]
where \(A\) is \(n \times n\), \(C\) is \(k \times k\), \(U\) is \(n \times k\), and \(V\) is \(k \times n\).
This function is useful for computing the regularized inverse more efficiently. The matrices \(A\) and \(C\) are typically easy to invert, either because their inverses are known a priori or because they are scalar or identity matrices, so the only nontrivial inversion is of the smaller \(k \times k\) matrix \(C^{-1} + V A^{-1} U\).
Example
>>> import numpy as np
>>> from gym_socks.kernel.metrics import regularized_inverse
>>> from gym_socks.kernel.metrics import woodbury_inverse
>>> X = np.random.randn(100, 2)
>>> G = X @ X.T
>>> %timeit regularized_inverse(G, 1)
184 µs ± 7.14 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> A = (1 / 100) * np.identity(100)
>>> C = np.identity(2)
>>> %timeit woodbury_inverse(A, X, C, X.T, precomputed=True)
78.9 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> W1 = regularized_inverse(G, 1)
>>> W2 = woodbury_inverse(A, X, C, X.T, precomputed=True)
>>> np.allclose(W1, W2)
True
- Parameters
A (numpy.ndarray) – A conformable square matrix. Must be nonsingular.
U (numpy.ndarray) – A conformable matrix.
C (numpy.ndarray) – A conformable square matrix. Must be nonsingular.
V (numpy.ndarray) – A conformable matrix.
precomputed (bool) – Whether A and C are the precomputed inverses.
- Returns
The resulting inverse matrix \(W\).
- Return type
numpy.ndarray
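A self-contained NumPy sketch of the Woodbury identity above, checked against a direct inverse. The name `woodbury_inverse_sketch` is hypothetical, and solving the \(k \times k\) system with `np.linalg.solve` instead of forming its inverse is a design choice of this sketch, not necessarily what gym_socks does.

```python
import numpy as np


def woodbury_inverse_sketch(A, U, C, V, precomputed=False):
    """Hypothetical (A + U C V)^{-1} via the Woodbury matrix identity.

    If precomputed is True, A and C are taken to already be A^{-1} and C^{-1}.
    """
    A_inv = A if precomputed else np.linalg.inv(A)
    C_inv = C if precomputed else np.linalg.inv(C)
    AU = A_inv @ U          # n x k
    VA = V @ A_inv          # k x n
    inner = C_inv + V @ AU  # k x k: the only matrix that still needs inverting
    # A^{-1} - A^{-1} U inner^{-1} V A^{-1}, solving instead of inverting `inner`.
    return A_inv - AU @ np.linalg.solve(inner, VA)


rng = np.random.default_rng(0)
n, k, lam = 50, 2, 0.5
X = rng.standard_normal((n, k))

# (X X^T + lam * n * I)^{-1} with A = lam * n * I, U = X, C = I, V = X^T.
A_inv = np.identity(n) / (lam * n)
W = woodbury_inverse_sketch(A_inv, X, np.identity(k), X.T, precomputed=True)
W_direct = np.linalg.inv(X @ X.T + lam * n * np.identity(n))
```

With \(k \ll n\), the expensive step shrinks from inverting an \(n \times n\) matrix to solving a \(k \times k\) system.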