metrics

Kernel functions and helper utilities for kernel-based calculations.

Most of the commonly used kernel functions are already implemented in sklearn.metrics.pairwise. The RBF kernel and the pairwise Euclidean distance function are re-implemented here as alternatives in case sklearn is unavailable. Most, if not all, of the kernel functions defined in sklearn.metrics.pairwise should be compatible with the functions defined here.

Attention

MATLAB and Python typically order data differently: in MATLAB, observations are stored in columns, whereas the functions here expect observations in rows. If you import data from a MATLAB file, transpose it if needed to follow the Python convention, i.e. X = X.T.

For example, the data in X and Y should be organized as:

X = [[--- x1 ---],
     [--- x2 ---],
     ...
     [--- xn ---]]

Y = [[--- y1 ---],
     [--- y2 ---],
     ...
     [--- ym ---]]
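
The transpose step can be sketched as follows. This is a minimal illustration that assumes the imported data is already a NumPy array with one observation per column; the name `X_matlab` and the values are made up for the example.

```python
import numpy as np

# Hypothetical MATLAB-style data: 2 features (rows) by 3 observations (columns).
X_matlab = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

# Transpose so that each ROW is one observation, as the functions here expect.
X = X_matlab.T

print(X.shape)  # (3, 2): 3 observations with 2 features each
```
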
gym_socks.kernel.metrics.abel_kernel(X, Y=None, sigma=None, D=None)

Abel kernel function.

Parameters
  • X (numpy.ndarray) – A 2D array with observations organized in ROWS.

  • Y (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.

  • sigma (Optional[float]) – Strictly positive real-valued kernel parameter.

  • D (Optional[numpy.ndarray]) – Pairwise distance matrix.

Returns

Gram matrix of pairwise evaluations of the kernel function.

Return type

numpy.ndarray

gym_socks.kernel.metrics.check_pairwise_arrays(X, Y=None, dtype=numpy.float64, ensure_finite=True, copy=False)

Check pairwise arrays.

Note

This function is intended as a simple replacement for the sklearn.metrics.pairwise.check_pairwise_arrays() function. Unlike sklearn, this function does not check sparse input data, and does not do sophisticated type checking or upcasting.

Parameters
  • X – A 2D array with observations organized in ROWS.

  • Y – A 2D array with observations organized in ROWS.

  • dtype (numpy.dtype) – The data type of the resulting array.

  • copy (bool) – Whether to create a forced copy of array.

  • ensure_finite (bool) – Whether to raise an error if the array is not finite.

Returns

The validated arrays X and Y.

gym_socks.kernel.metrics.check_pairwise_distances(D, shape, copy=True)

Validate the pairwise distance matrix.

Performs checks to ensure the pairwise distance matrix is valid.

Parameters
  • D (numpy.ndarray) – Pairwise distance matrix.

  • shape (tuple) – The desired shape of the array.

  • copy (bool) – Whether to create a forced copy of D.

Returns

The validated matrix.

gym_socks.kernel.metrics.delta_kernel(X, Y=None)

Delta (discrete) kernel function.

The delta kernel is defined as \(k(x_{i}, x_{j}) = \delta_{ij}\), meaning the kernel returns a 1 if the vectors are the same, and a 0 otherwise. The vectors in X and Y should have discrete values, meaning each element in the vector should be a natural number or integer value.

Parameters
  • X (numpy.ndarray) – A 2D array with observations organized in ROWS.

  • Y (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.

Returns

Gram matrix of pairwise evaluations of the kernel function.
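
The definition above can be illustrated with a short NumPy-only sketch. This is not the library's implementation: the name `delta_kernel_sketch` is hypothetical, and treating Y=None as Y = X is an assumption made for the example.

```python
import numpy as np

def delta_kernel_sketch(X, Y=None):
    """Illustrative delta kernel: 1 where two rows match exactly, 0 otherwise."""
    X = np.asarray(X)
    # ASSUMPTION: when Y is omitted, compare X against itself.
    Y = X if Y is None else np.asarray(Y)
    # Broadcast (m, 1, d) against (1, n, d) to compare every pair of rows,
    # then require all d elements of a pair to be equal.
    return np.all(X[:, None, :] == Y[None, :, :], axis=-1).astype(float)

# Rows 0 and 2 are identical, so they are the only off-diagonal matches.
X = np.array([[0, 1], [1, 1], [0, 1]])
K = delta_kernel_sketch(X)
print(K)
```
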

gym_socks.kernel.metrics.euclidean_distances(X, Y=None, squared=False)

Compute the pairwise Euclidean distance matrix between points.

Note

This function is intended as a simple replacement for the sklearn.metrics.pairwise.euclidean_distances() function. Unlike sklearn, this function does not check sparse input data, and does not do sophisticated type checking or upcasting.

Parameters
  • X – A 2D array with observations organized in ROWS.

  • Y – A 2D array with observations organized in ROWS.

  • squared (bool) – Whether the result is squared before returning.

Returns

The matrix of pairwise Euclidean distances between points.
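
For intuition, a pairwise Euclidean distance matrix can be computed with plain NumPy broadcasting. The sketch below is an illustration only; `euclidean_distances_sketch` is a hypothetical name, and defaulting Y to X is an assumption.

```python
import numpy as np

def euclidean_distances_sketch(X, Y=None, squared=False):
    """Illustrative pairwise Euclidean distances between rows of X and Y."""
    X = np.asarray(X, dtype=float)
    # ASSUMPTION: when Y is omitted, distances are computed within X.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Differences between every pair of rows: shape (m, n, d).
    diff = X[:, None, :] - Y[None, :, :]
    D2 = np.sum(diff ** 2, axis=-1)
    return D2 if squared else np.sqrt(D2)

# The points (0, 0) and (3, 4) are 5 apart (25 when squared).
X = np.array([[0.0, 0.0], [3.0, 4.0]])
D = euclidean_distances_sketch(X)
D2 = euclidean_distances_sketch(X, squared=True)
```

The broadcasting approach builds an (m, n, d) intermediate array, which keeps the sketch readable but can be memory-hungry for large inputs.
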

gym_socks.kernel.metrics.hybrid_kernel(X, Q, Y=None, R=None)

Hybrid systems kernel.

In a hybrid system, we split the sample according to the mode Q. The vectors in Q and R should have discrete values, meaning each element in the vector should be a natural number or integer value.

\[\begin{split}d((x, q), (x', q')) = \begin{cases} \zeta(x - x'), & q = q' \\ 1, & q \neq q' \end{cases}\end{split}\]
\[\zeta(x) = (2/\pi) \max_{1 \leq i \leq n} \tan^{-1} \vert x_{i} \vert\]
\[k(x, q, x', q') = 1 - d((x, q), (x', q'))\]
Parameters
  • X (numpy.ndarray) – A 2D array with observations organized in ROWS.

  • Q (numpy.ndarray) – A 2D array with observations organized in ROWS.

  • Y (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.

  • R (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.

Returns

Gram matrix of pairwise evaluations of the kernel function.
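
The three formulas above translate fairly directly into NumPy. The following is a sketch under stated assumptions, not the library's code: `hybrid_kernel_sketch` is a hypothetical name, Y and R are assumed to default to X and Q, and two modes are treated as equal only when every element of the mode vectors agrees.

```python
import numpy as np

def hybrid_kernel_sketch(X, Q, Y=None, R=None):
    """Illustrative hybrid kernel k = 1 - d, following the formulas above."""
    X = np.asarray(X, dtype=float)
    Q = np.asarray(Q)
    # ASSUMPTION: omitted Y and R default to X and Q.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    R = Q if R is None else np.asarray(R)

    # zeta(x - x') = (2 / pi) * max_i arctan|x_i - x'_i| for every pair of rows.
    abs_diff = np.abs(X[:, None, :] - Y[None, :, :])
    zeta = (2.0 / np.pi) * np.max(np.arctan(abs_diff), axis=-1)

    # ASSUMPTION: modes match only if all elements of the mode vectors agree.
    same_mode = np.all(Q[:, None, :] == R[None, :, :], axis=-1)

    # d = zeta when the modes match and 1 otherwise; the kernel is 1 - d.
    d = np.where(same_mode, zeta, 1.0)
    return 1.0 - d

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
Q = np.array([[0], [0], [1]])
K = hybrid_kernel_sketch(X, Q)
```

In this example the first two points share a mode and differ by 1 in one coordinate, so \(\zeta = (2/\pi) \tan^{-1}(1) = 0.5\) and the kernel value is 0.5; the third point is in a different mode, so its kernel value with the others is 0.
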

gym_socks.kernel.metrics.rbf_kernel(X, Y=None, sigma=None, D=None)

RBF kernel function.

Computes the pairwise evaluation of the RBF kernel on each vector in X and Y. For example, if X has \(m\) vectors, and Y has \(n\) vectors, then the result is an \(m \times n\) matrix \(K\) where \(K_{ij} = k(x_i, y_j)\).

\[\begin{split}K = \begin{bmatrix} k(x_1,y_1) & \cdots & k(x_1,y_n) \\ \vdots & \ddots & \vdots \\ k(x_m,y_1) & \cdots & k(x_m,y_n) \end{bmatrix}\end{split}\]

Note

This function is intended as a simple replacement for the sklearn.metrics.pairwise.rbf_kernel() function. Unlike sklearn, this function does not check sparse input data, and does not do sophisticated type checking or upcasting.

The main difference between this implementation and sklearn.metrics.pairwise.rbf_kernel() is that this function optionally allows you to specify a different distance metric in the event the data is non-Euclidean.

Attention

If you are familiar with sklearn.metrics.pairwise.rbf_kernel(), note that the parameter sigma is not the same as the gamma parameter used by sklearn.metrics.pairwise.rbf_kernel(). However, they are related:

\[\gamma = \frac{1}{2 \sigma^{2}}\]
Parameters
  • X (numpy.ndarray) – A 2D array with observations organized in ROWS.

  • Y (Optional[numpy.ndarray]) – A 2D array with observations organized in ROWS.

  • sigma (Optional[float]) – Strictly positive real-valued kernel parameter.

  • D (Optional[numpy.ndarray]) – Pairwise distance matrix.

Returns

Gram matrix of pairwise evaluations of the kernel function.

Return type

numpy.ndarray
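
The relation between sigma and gamma can be checked numerically. The sketch below assumes the usual Gaussian form \(k(x, y) = \exp(-\lVert x - y \rVert^{2} / (2 \sigma^{2}))\), which is what the relation above implies given that sklearn's RBF kernel is \(\exp(-\gamma \lVert x - y \rVert^{2})\). The names `rbf_kernel_sketch` and `sigma_to_gamma` are hypothetical, the default sigma=1.0 is chosen only for the example, and the optional D argument is not modeled.

```python
import numpy as np

def rbf_kernel_sketch(X, Y=None, sigma=1.0):
    """Illustrative Gaussian RBF kernel parameterized by sigma."""
    X = np.asarray(X, dtype=float)
    # ASSUMPTION: when Y is omitted, the kernel is evaluated within X.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Squared Euclidean distance between every pair of rows.
    D2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-D2 / (2.0 * sigma ** 2))

def sigma_to_gamma(sigma):
    """Convert sigma to the gamma convention: gamma = 1 / (2 sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)

X = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = 0.5
gamma = sigma_to_gamma(sigma)  # 1 / (2 * 0.25) = 2.0

K_sigma = rbf_kernel_sketch(X, sigma=sigma)
# Same kernel written in the gamma convention; the squared distance between
# (0, 0) and (1, 1) is 2.
K_gamma = np.exp(-gamma * np.array([[0.0, 2.0], [2.0, 0.0]]))
print(np.allclose(K_sigma, K_gamma))
```
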

gym_socks.kernel.metrics.regularized_inverse(G, regularization_param=None, copy=True)

Regularized inverse.

Computes the regularized matrix inverse.

\[W = (G + \lambda M I)^{-1}, \quad G \in \mathbb{R}^{n \times n}, \quad G_{ij} = k(x_{i}, y_{j})\]
Parameters
  • G (numpy.ndarray) – The Gram (kernel) matrix.

  • regularization_param (Optional[float]) – The regularization parameter \(\lambda > 0\).

  • copy (bool) – Whether to create a forced copy of G.

Returns

Regularized matrix inverse.

Return type

numpy.ndarray
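
A NumPy-only sketch of the formula above follows. The symbol \(M\) is not defined in this section; the sketch assumes it is the number of rows of \(G\), which is consistent with the example given under woodbury_inverse. The name `regularized_inverse_sketch` is hypothetical, and the handling of regularization_param=None is not modeled.

```python
import numpy as np

def regularized_inverse_sketch(G, regularization_param):
    """Illustrative W = (G + lambda * M * I)^{-1}."""
    G = np.asarray(G, dtype=float)
    # ASSUMPTION: M is the number of rows (samples) of the Gram matrix.
    M = G.shape[0]
    return np.linalg.inv(G + regularization_param * M * np.identity(M))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
G = X @ X.T  # a simple positive semidefinite Gram matrix
lam = 0.1

W = regularized_inverse_sketch(G, lam)
# W times the regularized matrix should recover the identity.
residual = W @ (G + lam * 5 * np.identity(5)) - np.identity(5)
print(np.max(np.abs(residual)))
```

Explicit inversion keeps the sketch close to the formula; when only the product of \(W\) with a vector or matrix is needed, `np.linalg.solve` is usually the more stable choice.
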

gym_socks.kernel.metrics.woodbury_inverse(A, U, C, V, precomputed=False)

Computes the matrix inverse using the Woodbury matrix identity.

\[W = (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}\]

where \(A\) is \(n \times n\), \(C\) is \(k \times k\), \(U\) is \(n \times k\), and \(V\) is \(k \times n\).

This function is useful for computing the regularized inverse more efficiently. The matrices \(A\) and \(C\) are typically easy to invert by hand, either because their inverses are known a priori or because they are scalar or identity matrices, so the only numerical inversion left is that of the smaller \(k \times k\) matrix \(C^{-1} + V A^{-1} U\).

Example

>>> import numpy as np
>>> from gym_socks.kernel.metrics import regularized_inverse
>>> from gym_socks.kernel.metrics import woodbury_inverse
>>> X = np.random.randn(100, 2)
>>> G = X @ X.T
>>> # %timeit is an IPython magic; run the timing lines in IPython or Jupyter.
>>> %timeit regularized_inverse(G, 1)
184 µs ± 7.14 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> # With precomputed=True, A and C are passed as inverses: here A is (100 I)^{-1}.
>>> A = (1 / 100) * np.identity(100)
>>> C = np.identity(2)
>>> %timeit woodbury_inverse(A, X, C, X.T, precomputed=True)
78.9 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> W1 = regularized_inverse(G, 1)
>>> W2 = woodbury_inverse(A, X, C, X.T, precomputed=True)
>>> np.allclose(W1, W2)
True
Parameters
  • A (numpy.ndarray) – A conformable square matrix. Must be nonsingular.

  • U (numpy.ndarray) – A conformable matrix.

  • C (numpy.ndarray) – A conformable square matrix. Must be nonsingular.

  • V (numpy.ndarray) – A conformable matrix.

  • precomputed (bool) – Whether A and C are the precomputed inverses.

Returns

The resulting inverse matrix \(W\).

Return type

numpy.ndarray
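
The Woodbury identity itself can be verified with plain NumPy, independently of this module. The sketch below is illustrative only: `woodbury_inverse_sketch` is a hypothetical name, and it always takes the inverses of \(A\) and \(C\) as inputs, which corresponds to the precomputed case.

```python
import numpy as np

def woodbury_inverse_sketch(A_inv, U, C_inv, V):
    """Illustrative (A + U C V)^{-1} from precomputed inverses of A and C."""
    # The only numerical inversion is of a k x k matrix.
    inner = np.linalg.inv(C_inv + V @ A_inv @ U)
    return A_inv - A_inv @ U @ inner @ V @ A_inv

rng = np.random.default_rng(0)
n, k = 6, 2
X = rng.standard_normal((n, k))

# A is a scalar matrix and C is the identity, so both inverses are known.
A = 3.0 * np.identity(n)
C = np.identity(k)
A_inv = np.identity(n) / 3.0
C_inv = np.identity(k)

W_direct = np.linalg.inv(A + X @ C @ X.T)  # inverts an n x n matrix
W_woodbury = woodbury_inverse_sketch(A_inv, X, C_inv, X.T)
print(np.allclose(W_direct, W_woodbury))
```

The saving comes from replacing an \(n \times n\) inversion with a \(k \times k\) one, so the benefit grows as \(n\) becomes large relative to \(k\).
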