ITMO_FS.filters.univariate.information_gain

ITMO_FS.filters.univariate.information_gain(x, y)

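Calculate the mutual information between each feature and the target using the formula I(X,Y) = H(Y) - H(Y|X), where H(Y) is the entropy of the target and H(Y|X) is its conditional entropy given the feature. Higher values indicate more important features. This measure works best with discrete features, because the entropies are estimated from the frequencies of distinct values; continuous features are typically discretized first, as in the example below.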
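The formula above can be sketched in plain NumPy to show what is being computed. This is an illustrative sketch only, not the library's implementation: the helper names `entropy` and `information_gain_sketch` are hypothetical, and the natural logarithm is an assumption (it is consistent with the last score in the example below, which equals ln 5).

```python
import numpy as np


def entropy(values):
    """Shannon entropy (natural log) of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def information_gain_sketch(x, y):
    """Hypothetical helper: I(X, Y) = H(Y) - H(Y|X) for each column of x."""
    x = np.asarray(x)
    y = np.asarray(y)
    h_y = entropy(y)
    scores = []
    for column in x.T:
        # H(Y|X) = sum over feature values v of P(X = v) * H(Y | X = v)
        h_y_given_x = 0.0
        for v in np.unique(column):
            mask = column == v
            h_y_given_x += mask.mean() * entropy(y[mask])
        scores.append(h_y - h_y_given_x)
    return np.array(scores)
```

A constant feature leaves H(Y|X) equal to H(Y) and so scores 0, while a feature whose values each map to a single target value scores the full H(Y).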

Parameters:
  • x (array-like, shape (n_samples, n_features)) – The training input samples.
  • y (array-like, shape (n_samples,)) – The target values.
Returns:

feature scores

Return type:

array-like, shape (n_features,)

Examples

>>> from ITMO_FS.filters.univariate import information_gain
>>> import numpy as np
>>> from sklearn.preprocessing import KBinsDiscretizer
>>> x = np.array([[1, 2, 3, 3, 1], [2, 2, 3, 3, 2], [1, 3, 3, 1, 3],
...               [3, 1, 3, 1, 4], [4, 4, 3, 1, 5]])
>>> y = np.array([1, 2, 3, 4, 5])
>>> est = KBinsDiscretizer(n_bins=10, encode='ordinal')
>>> x = est.fit_transform(x)
>>> information_gain(x, y)
array([1.33217904, 1.33217904, 0.        , 0.67301167, 1.60943791])
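Since higher scores mean more important features, the returned array can be turned into a ranking. The snippet below is a minimal sketch of one way to do that with NumPy; the scores are copied from the output above, and the cutoff `k = 2` is an arbitrary value chosen for illustration.

```python
import numpy as np

# Scores copied from the example output above.
scores = np.array([1.33217904, 1.33217904, 0.0, 0.67301167, 1.60943791])

# Sort feature indices by descending score; a stable sort keeps the
# original column order for tied scores (features 0 and 1 here).
ranking = np.argsort(-scores, kind="stable")
print(ranking)  # [4 0 1 3 2]

# Keep the k best features (k is an illustrative choice, not a default).
k = 2
top_k = ranking[:k]
print(top_k)  # [4 0]
```

The constant third feature (index 2) scores 0 and lands last in the ranking.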