ITMO_FS.filters.multivariate.generalizedCriteria
ITMO_FS.filters.multivariate.generalizedCriteria(selected_features, free_features, x, y, beta, gamma, **kwargs)
This feature scoring criterion is a linear combination of relevance, redundancy, and conditional dependency terms. Given a set of already selected features and a set of remaining features on dataset x with labels y, it scores each remaining feature so that the next one can be selected.
Parameters: - selected_features (list of ints) – already selected features
- free_features (list of ints) – remaining (not yet selected) candidate features
- x (array-like, shape (n_samples, n_features)) – The training input samples.
- y (array-like, shape (n_samples,)) – The target values.
- beta (float) – Coefficient for redundancy term.
- gamma (float) – Coefficient for conditional dependency term.
Returns: feature scores
Return type: array-like, shape (n_features,)
Notes
See the original paper [1] for more details.
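For orientation, the general form of the criterion in [1] can be written as below. This assumes the function follows the paper's formulation: S denotes the set of already selected features, X_k a candidate feature, Y the labels, and I(·;·) mutual information. The symbol names are those of the paper, not identifiers from ITMO_FS.

```latex
J(X_k) = I(X_k; Y)
       - \beta \sum_{X_j \in S} I(X_j; X_k)
       + \gamma \sum_{X_j \in S} I(X_j; X_k \mid Y)
```

The first term is relevance, the second redundancy, and the third conditional dependency. In [1], particular choices of the coefficients recover known filters: for example, beta = gamma = 0 gives MIM, and gamma = 0 with a fixed beta gives MIFS.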
References
[1] Brown, Gavin et al. “Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection.” JMLR 2012.
Examples
>>> from ITMO_FS.filters.multivariate import generalizedCriteria
>>> from sklearn.preprocessing import KBinsDiscretizer
>>> import numpy as np
>>> est = KBinsDiscretizer(n_bins=10, encode='ordinal')
>>> x = np.array([[1, 2, 3, 3, 1], [2, 2, 3, 3, 2], [1, 3, 3, 1, 3],
...               [3, 1, 3, 1, 4], [4, 4, 3, 1, 5]])
>>> y = np.array([1, 2, 3, 4, 5])
>>> x = est.fit_transform(x)
>>> # No features selected yet: every column is a candidate.
>>> selected_features = []
>>> other_features = [i for i in range(0, x.shape[1]) if i
...                   not in selected_features]
>>> generalizedCriteria(np.array(selected_features),
...                     np.array(other_features), x, y, 0.4, 0.3)
array([1.33217904, 1.33217904, 0.        , 0.67301167, 1.60943791])
>>> # Features 1 and 2 already selected: score the remaining three.
>>> selected_features = [1, 2]
>>> other_features = [i for i in range(0, x.shape[1]) if i
...                   not in selected_features]
>>> generalizedCriteria(np.array(selected_features),
...                     np.array(other_features), x, y, 0.4, 0.3)
array([0.91021097, 0.403807  , 1.0765663 ])
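To make the scoring rule concrete, here is a minimal from-scratch sketch in plain Python. It is not the ITMO_FS implementation: the helper names (`entropy`, `mutual_info`, `cond_mutual_info`, `generalized_score`, `greedy_select`) are hypothetical, and it assumes plug-in (count-based) estimates with natural logarithms on already discrete data. It uses the raw integer matrix from the example above rather than the binned one, since the estimates depend only on which values coincide. Under those assumptions the scores should agree with the outputs shown above up to floating-point rounding.

```python
from collections import Counter
from math import log


def entropy(*columns):
    """Empirical joint entropy (in nats) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * log(c / n) for c in Counter(rows).values())


def mutual_info(a, b):
    """I(A; B) = H(A) + H(B) - H(A, B)."""
    return entropy(a) + entropy(b) - entropy(a, b)


def cond_mutual_info(a, b, c):
    """I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    return entropy(a, c) + entropy(b, c) - entropy(a, b, c) - entropy(c)


def column(x, k):
    """Extract feature k from a list-of-rows matrix."""
    return [row[k] for row in x]


def generalized_score(selected, free, x, y, beta, gamma):
    """Relevance - beta * redundancy + gamma * conditional term, per free feature."""
    scores = []
    for k in free:
        xk = column(x, k)
        relevance = mutual_info(xk, y)
        redundancy = sum(mutual_info(column(x, j), xk) for j in selected)
        conditional = sum(cond_mutual_info(column(x, j), xk, y) for j in selected)
        scores.append(relevance - beta * redundancy + gamma * conditional)
    return scores


def greedy_select(x, y, n_select, beta, gamma):
    """Hypothetical forward selection: repeatedly take the best-scoring free feature."""
    selected, free = [], list(range(len(x[0])))
    while free and len(selected) < n_select:
        scores = generalized_score(selected, free, x, y, beta, gamma)
        best = free[scores.index(max(scores))]  # first maximum wins ties
        selected.append(best)
        free.remove(best)
    return selected


# Toy data from the example above (raw integers, no binning).
x = [[1, 2, 3, 3, 1], [2, 2, 3, 3, 2], [1, 3, 3, 1, 3],
     [3, 1, 3, 1, 4], [4, 4, 3, 1, 5]]
y = [1, 2, 3, 4, 5]

first = generalized_score([], [0, 1, 2, 3, 4], x, y, beta=0.4, gamma=0.3)
second = generalized_score([1, 2], [0, 3, 4], x, y, beta=0.4, gamma=0.3)
chosen = greedy_select(x, y, n_select=2, beta=0.4, gamma=0.3)
print(first, second, chosen)
```

Because every label in y is distinct here, the conditional term vanishes and the relevance of each feature equals its own entropy, which is why the constant column (feature 2) scores 0. On less degenerate data the gamma term would contribute as well.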