ITMO_FS.filters.univariate.su_measure

ITMO_FS.filters.univariate.su_measure(x, y)

SU (symmetric uncertainty) is a correlation measure between a feature and the class, calculated as SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), where I(X; Y) is the mutual information between X and Y and H is the Shannon entropy. Larger values indicate more important features. Because the measure is based on information theory, it works best with discrete features.
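To make the formula concrete, here is a minimal NumPy sketch of symmetric uncertainty for two discrete vectors. It uses the identity I(X; Y) = H(X) + H(Y) - H(X, Y). The helper names (`entropy`, `joint_entropy`, `symmetric_uncertainty`) are hypothetical and are not part of ITMO_FS; the library's own implementation may differ in details, so its scores need not match this sketch exactly.

```python
import numpy as np


def entropy(values):
    """Shannon entropy of a 1-D array of discrete values (natural log)."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def joint_entropy(a, b):
    """Shannon entropy of the joint distribution of two discrete arrays."""
    pairs = np.stack([a, b], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def symmetric_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)); hypothetical helper."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0.0:
        # Both variables are constant; SU is undefined, return 0 by convention.
        return 0.0
    mutual_info = h_a + h_b - joint_entropy(a, b)
    return 2.0 * mutual_info / (h_a + h_b)


a = np.array([0, 0, 1, 1])
b = np.array([0, 1, 0, 1])
su_same = symmetric_uncertainty(a, a)    # identical variables
su_indep = symmetric_uncertainty(a, b)   # empirically independent variables
```

Because SU is a ratio of entropies, the base of the logarithm cancels out. Identical variables give a value of 1, and empirically independent variables give a value of 0 (up to floating-point error).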

Parameters:
  • x (array-like, shape (n_samples, n_features)) – The training input samples.
  • y (array-like, shape (n_samples,)) – The target values.
Returns:

feature scores

Return type:

array-like, shape (n_features,)

See also

https://pdfs.semanticscholar.org/9964/c7b42e6ab311f88e493b3fc552515e0c764a.pdf

Examples

>>> from ITMO_FS.filters.univariate import su_measure
>>> from sklearn.preprocessing import KBinsDiscretizer
>>> import numpy as np
>>> x = np.array([[3, 3, 3, 2, 2], [3, 3, 1, 2, 3], [1, 3, 5, 1, 1],
... [3, 1, 4, 3, 1], [3, 1, 2, 3, 1]])
>>> y = np.array([1, 3, 2, 1, 2])
>>> est = KBinsDiscretizer(n_bins=10, encode='ordinal')
>>> x = est.fit_transform(x)
>>> su_measure(x, y)
array([0.28694182, 0.13715115, 0.79187567, 0.47435099, 0.67126949])
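Since larger scores mean more important features, a natural follow-up is to rank the features by score. The sketch below uses only NumPy and the score values copied from the example output; the variable `k` (number of features to keep) is a hypothetical choice for illustration.

```python
import numpy as np

# Scores copied from the su_measure example output.
scores = np.array([0.28694182, 0.13715115, 0.79187567, 0.47435099, 0.67126949])

k = 2  # hypothetical number of features to keep

# Feature indices ordered from highest to lowest score.
ranking = np.argsort(scores)[::-1]
top_k = ranking[:k]

print(ranking)  # [2 4 3 0 1]
print(top_k)    # [2 4]
```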