ITMO_FS.filters.univariate.UnivariateFilter

class ITMO_FS.filters.univariate.UnivariateFilter(measure, cutting_rule=('Best by percentage', 0.2))

Basic interface for using univariate measures for feature selection. The list of available measures is in ITMO_FS.filters.univariate.measures. You can also provide your own measure, as long as it follows the argument scheme for measures: it takes two arguments, x and y, and returns a score for every feature in the dataset x. The same applies to cutting rules.
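As a sketch of a custom measure, assuming only the scheme described above (two arguments x and y, one score per feature), the hypothetical function below scores each feature by the absolute Pearson correlation between that feature and the labels. It uses NumPy only and does not call ITMO_FS. Passing it as the measure argument of UnivariateFilter should work if the filter calls measure(x, y) as documented, but that is an assumption, not a tested integration.

```python
import numpy as np


def abs_correlation_measure(x, y):
    """Hypothetical custom measure: |Pearson correlation| between each feature and y.

    Follows the measure scheme: takes a dataset x of shape (n_samples, n_features)
    and labels y of shape (n_samples,), and returns one score per feature.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=0)  # center every feature column
    yc = y - y.mean()        # center the labels
    numerator = xc.T @ yc    # one covariance-like term per feature
    denominator = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features have zero variance; score them 0 instead of dividing by 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = np.where(denominator > 0, np.abs(numerator) / denominator, 0.0)
    return scores


x = np.array([[1.0, 5.0, 3.0],
              [2.0, 5.0, 1.0],
              [3.0, 5.0, 2.0],
              [4.0, 5.0, 0.0]])
y = np.array([0, 0, 1, 1])
# Feature 0 scores highest; the constant feature 1 scores 0.
print(abs_correlation_measure(x, y))
```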

Parameters:
  • measure (string or callable) – A measure name defined in GLOB_MEASURE, or a callable with signature measure(X, y), where X is the sample dataset and y holds the labels of its samples, that returns one score for each feature in the dataset.
  • cutting_rule (string, tuple or callable) – A cutting rule name defined in GLOB_CR (the default, ('Best by percentage', 0.2), pairs the rule name with its parameter), or a callable with signature cutting_rule(features) that returns a list of features ranked by some rule.
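A custom cutting rule can be sketched the same way. The example in this section calls select_k_best(10) and passes the result as cutting_rule, so the hypothetical rules below are written as factories that return a one-argument callable. The input format is an assumption: the rules receive a dict {feature_name: score} (the format get_scores documents), and higher scores are treated as better. Neither function is part of ITMO_FS.

```python
def top_k_rule(k):
    """Hypothetical cutting-rule factory: keep the k highest-scored features.

    ASSUMPTION: the rule receives a dict {feature_name: score} and higher
    scores mean more relevant features.
    """
    def rule(scores):
        ranked = sorted(scores, key=scores.get, reverse=True)  # names, best first
        return ranked[:k]
    return rule


def threshold_rule(min_score):
    """Hypothetical cutting-rule factory: keep every feature scoring >= min_score."""
    def rule(scores):
        # Preserves the dict's insertion order among the kept features.
        return [name for name, score in scores.items() if score >= min_score]
    return rule


scores = {"f0": 0.9, "f1": 0.1, "f2": 0.5, "f3": 0.7}
print(top_k_rule(2)(scores))        # ['f0', 'f3']
print(threshold_rule(0.5)(scores))  # ['f0', 'f2', 'f3']
```

A factory keeps the rule's parameter (k or the threshold) out of the call signature, so the returned callable still takes a single argument.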

Examples

>>> from sklearn.datasets import make_classification
>>> from ITMO_FS.filters.univariate import select_k_best
>>> from ITMO_FS.filters.univariate import UnivariateFilter
>>> from ITMO_FS.filters.univariate import f_ratio_measure
>>> x, y = make_classification(1000, 100, n_informative=10, n_redundant=30, n_repeated=10, shuffle=False)
>>> ufilter = UnivariateFilter(f_ratio_measure, select_k_best(10))
>>> ufilter.fit(x, y)
>>> print(ufilter.selected_features)

__init__(measure, cutting_rule=('Best by percentage', 0.2))

Initialize self. See help(type(self)) for accurate signature.

fit(X, y, feature_names=None, store_scores=True)

Fits the filter.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The training input samples.
  • y (array-like, shape (n_samples, )) – The target values.
  • feature_names (list of strings, optional) – Names to assign to the features, if you want to define them.
  • store_scores (boolean, optional (by default True)) – Whether to store the feature scores for future calls to the filter.
Returns:

None
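The flow that fit describes (score every feature, optionally store the scores, then apply the cutting rule) can be sketched with a hypothetical, simplified class. This is not the ITMO_FS implementation: the class name, the feature_scores attribute, and the choice of column indices as default feature names are assumptions made for illustration. Only selected_features comes from the example above.

```python
import numpy as np


class SketchUnivariateFilter:
    """Hypothetical, simplified stand-in that mirrors the documented fit() flow.

    Not the ITMO_FS implementation: it only illustrates score -> store -> cut.
    """

    def __init__(self, measure, cutting_rule):
        self.measure = measure
        self.cutting_rule = cutting_rule

    def fit(self, X, y, feature_names=None, store_scores=True):
        X = np.asarray(X)
        if feature_names is None:
            feature_names = list(range(X.shape[1]))  # ASSUMPTION: default names are column indices
        # Score every feature, then key the scores by feature name.
        scores = dict(zip(feature_names, self.measure(X, y)))
        if store_scores:
            self.feature_scores = scores  # hypothetical attribute name
        # The cutting rule turns the scores into the selected features.
        self.selected_features = self.cutting_rule(scores)
        return None  # fit() is documented to return None


def variance_measure(x, y):
    # Toy measure for the demo: ignores y and scores features by variance.
    return np.var(x, axis=0)


def best_two(scores):
    # Toy cutting rule: the two highest-scored features, best first.
    return sorted(scores, key=scores.get, reverse=True)[:2]


X = np.array([[0.0, 1.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 3.0, 30.0]])
y = np.array([0, 1, 0])
f = SketchUnivariateFilter(variance_measure, best_two)
f.fit(X, y, feature_names=["a", "b", "c"])
print(f.selected_features)  # ['c', 'b']
```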

fit_transform(X, y=None, feature_names=None, store_scores=False, **fit_params)

Fits the filter and transforms given dataset X.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The training input samples.
  • y (array-like, shape (n_samples, ), optional) – The target values.
  • feature_names (list of strings, optional) – Names to assign to the features, if you want to define them.
  • store_scores (boolean, optional (by default False)) – Whether to store the feature scores for future calls to the filter.
  • **fit_params – Dictionary of measure parameters, if the measure needs any.
Returns:

X sliced to the features selected by the filter.

Return type:

array-like, shape (n_samples, n_selected_features)
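Conceptually, fit_transform is fit followed by slicing X to the selected columns. The hypothetical function below sketches that in one pass. Several details are assumptions rather than documented behavior: X has shape (n_samples, n_features), feature names default to column indices, the extra keyword arguments (**fit_params) are forwarded to the measure, and the cutting rule takes a {feature_index: score} dict.

```python
import numpy as np


def fit_transform_sketch(measure, cutting_rule, X, y, **fit_params):
    """Hypothetical sketch of fit_transform: score, cut, then slice columns.

    ASSUMPTIONS: X has shape (n_samples, n_features), feature names default to
    column indices, extra keyword arguments are forwarded to the measure, and
    the cutting rule takes a {feature_index: score} dict.
    """
    X = np.asarray(X)
    scores = dict(enumerate(measure(X, y, **fit_params)))  # {column index: score}
    selected = cutting_rule(scores)                        # e.g. [2, 0]
    return X[:, selected]                                  # keep only selected columns


def mean_abs_measure(x, y, power=1):
    # Toy measure for the demo: mean of |x| ** power per column (ignores y).
    return np.mean(np.abs(x) ** power, axis=0)


def best_two(scores):
    # Toy cutting rule: the two highest-scored column indices, best first.
    return sorted(scores, key=scores.get, reverse=True)[:2]


X = np.array([[3.0, 0.1, -5.0],
              [1.0, 0.2, 4.0]])
# power=2 plays the role of a measure parameter passed through **fit_params.
X_new = fit_transform_sketch(mean_abs_measure, best_two, X, None, power=2)
print(X_new)  # columns 2 and 0 of X, in that order
```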

get_scores(X, y, feature_names)

Computes feature scores on the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The training input samples.
  • y (array-like, shape (n_samples, )) – The target values.
  • feature_names (list of strings) – Names of the features, used as keys of the returned dictionary.
Returns:

Dictionary with feature names as keys and feature scores as values.

Return type:

dict
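The documented return format (feature names as keys, feature scores as values) amounts to pairing each name with the score of the matching column. The hypothetical helper below sketches that pairing for any measure that follows the x, y scheme; the length check and the float conversion are illustrative choices, not documented ITMO_FS behavior.

```python
import numpy as np


def get_scores_sketch(measure, X, y, feature_names):
    """Hypothetical sketch of the get_scores return format:
    a dict with feature names as keys and feature scores as values."""
    scores = measure(np.asarray(X), y)
    if len(scores) != len(feature_names):
        raise ValueError("need exactly one name per feature column")
    # Pair the i-th name with the i-th column's score.
    return {name: float(score) for name, score in zip(feature_names, scores)}


def range_measure(x, y):
    # Toy measure for the demo: max - min of each column (ignores y).
    return x.max(axis=0) - x.min(axis=0)


X = [[1.0, 10.0],
     [4.0, 12.0]]
print(get_scores_sketch(range_measure, X, None, ["age", "income"]))
# {'age': 3.0, 'income': 2.0}
```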

transform(X)

Slices the given dataset to the previously selected features.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The input samples.
Returns:

X sliced to the features selected by the filter.
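Because transform reuses the features chosen during fit, it can be applied to new data with the same columns, such as a test set. The hypothetical helper below sketches the slicing step. It assumes selected features are column indices when no feature names were given, and that names are mapped back to column positions otherwise; how ITMO_FS resolves names internally is not documented here.

```python
import numpy as np


def transform_sketch(X, selected_features, feature_names=None):
    """Hypothetical sketch of transform: keep only the previously selected columns.

    ASSUMPTION: without feature_names, selected_features holds column indices;
    with feature_names, each selected name is mapped to its column position.
    """
    X = np.asarray(X)
    if feature_names is None:
        columns = list(selected_features)
    else:
        position = {name: i for i, name in enumerate(feature_names)}
        columns = [position[name] for name in selected_features]
    return X[:, columns]


X_test = np.arange(12).reshape(3, 4)  # 3 new samples with the same 4 features
print(transform_sketch(X_test, [3, 0]))
print(transform_sketch(X_test, ["d", "a"], ["a", "b", "c", "d"]))
# Both print columns 3 and 0: [[3 0] [7 4] [11 8]]
```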