ITMO_FS.filters.multivariate.STIR
-
class ITMO_FS.filters.multivariate.STIR(n_features_to_keep=10)
Feature selection using the STIR algorithm.
The algorithm is taken from the paper:
STatistical Inference Relief (STIR) feature selection (https://academic.oup.com/bioinformatics/article/35/8/1358/5100883).
-
__init__(n_features_to_keep=10)
Sets up STIR to perform feature selection.
-
distance_matrix(X)
Computes the distance matrix.
The matrix is centered and normalized before the distances are calculated.
Parameters: X (array-like, shape (n_samples, n_features)) – matrix whose pairwise sample distances are computed.
Returns: X_distances – distance matrix.
Return type: array-like, shape (n_samples, n_samples)
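The centering, normalization, and distance steps can be sketched as follows. This is a minimal illustration and not the library's code: the function name distance_matrix_sketch is hypothetical, and the choice of min-max scaling and Manhattan distance is an assumption, since the text above does not say which normalization or metric is used.

```python
import numpy as np

def distance_matrix_sketch(X):
    """Pairwise distances between rows of X after per-column scaling.

    Assumptions (not confirmed by the documentation): columns are shifted
    by their minimum and divided by their range, and the Manhattan metric
    is used.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns: avoid division by zero
    X_scaled = (X - col_min) / col_range
    # Broadcast (n, 1, p) against (1, n, p) to get all row pairs, then sum over features
    return np.abs(X_scaled[:, None, :] - X_scaled[None, :, :]).sum(axis=2)

X = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])
D = distance_matrix_sketch(X)  # shape (n_samples, n_samples)
```

The result is symmetric with a zero diagonal, which matches the documented return shape of (n_samples, n_samples).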
-
find_neighbors(X, y, k=1)
Finds the nearest hit/miss matrices.
Parameters:
- X (array-like, shape (n_samples, n_features)) – matrix to compute neighbors of.
- y (array-like, shape (n_samples, )) – vector of binary class status (usually -1/1).
- k (int, optional) – constant number of nearest hits/misses to find for each instance.
Returns: hitmiss – a pair of lists, hitmiss[1] (hits) and hitmiss[2] (misses). Each list has two columns. The first column, index, holds the instances in both lists. The second column is hit_index (the nearest hits of the first-column instance) in list [1] and miss_index (its nearest misses) in list [2].
Return type: array-like, shape (2, )
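To make the hit/miss idea concrete, here is a small sketch for the k=1 case, working from a precomputed distance matrix. The function name nearest_hit_miss_sketch and its (D, y) signature are hypothetical and differ from the documented method; it also assumes every class has at least two instances.

```python
import numpy as np

def nearest_hit_miss_sketch(D, y):
    """Return (hits, misses) for k=1 as two-column index arrays.

    Column 0 is the instance index; column 1 is the index of its nearest
    neighbor from the same class (hit) or from the other class (miss).
    """
    D = np.asarray(D, dtype=float)
    y = np.asarray(y)
    hits, misses = [], []
    for i in range(len(y)):
        same = (y == y[i])
        same[i] = False  # an instance is not its own neighbor
        hit_candidates = np.flatnonzero(same)
        miss_candidates = np.flatnonzero(y != y[i])
        hits.append((i, hit_candidates[np.argmin(D[i, hit_candidates])]))
        misses.append((i, miss_candidates[np.argmin(D[i, miss_candidates])]))
    return np.array(hits), np.array(misses)

# Four points on a line, two per class
x = np.array([0.0, 1.0, 5.0, 6.0])
D = np.abs(x[:, None] - x[None, :])
y = np.array([-1, -1, 1, 1])
hits, misses = nearest_hit_miss_sketch(D, y)
```

In this toy data each point's nearest hit is the other point of its class, and the nearest miss is the closest point of the opposite class.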
-
fit(X, y, feature_names=None, k=1)
Computes the feature importance scores from the training data.
Parameters:
- X (array-like, shape (n_samples, n_features)) – training instances to compute the feature importance scores from.
- y (array-like, shape (n_samples, )) – training labels.
- feature_names (list of strings, optional) – names of the features, if you want to define them.
- k (int, optional) – constant number of nearest hits/misses to use for each instance.
Returns: None
-
fit_transform(X, y, feature_names=None, k=1)
Fits and transforms data.
Computes the feature importance scores from the training data, then reduces the feature set down to the top n_features_to_keep features.
Parameters:
- X (array-like, shape (n_samples, n_features)) – training instances to compute the feature importance scores from.
- y (array-like, shape (n_samples, )) – training labels.
- feature_names (list of strings, optional) – names of the features, if you want to define them.
- k (int, optional) – constant number of nearest hits/misses to use for each instance.
Returns: the transformed feature matrix, as a 2D numpy array.
-
max_diff(X)
Computes the maximum difference in each column.
Parameters: X (array-like, shape (n_samples, n_features)) – matrix to compute column difference of.
Returns: diff_vector – column difference vector.
Return type: array-like, shape (n_features, )
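As an illustration, the per-column difference can be sketched as below. The helper name max_diff_sketch is hypothetical, and reading "max difference" as the column maximum minus the column minimum is an assumption.

```python
import numpy as np

def max_diff_sketch(X):
    """Largest difference between any two values in each column (max minus min)."""
    X = np.asarray(X, dtype=float)
    return X.max(axis=0) - X.min(axis=0)

X = np.array([[1.0, 10.0], [4.0, 10.0], [2.0, 30.0]])
diff_vector = max_diff_sketch(X)  # one entry per column
```

A vector like this is commonly used as the per-feature range when scaling columns before computing distances.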
-
transform(X)
Reduces the feature set down to the top n_features_to_keep features.
Parameters: X (array-like, shape (n_samples, n_features)) – feature matrix to perform feature selection on.
Returns: the transformed feature matrix, as a 2D numpy array.
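The reduction step can be sketched like this, given a vector of feature scores. The function select_top_features_sketch and the scores used here are hypothetical; the sketch assumes that a higher score means a more important feature and returns the kept columns in order of decreasing score, which may differ from the library's behavior.

```python
import numpy as np

def select_top_features_sketch(X, scores, n_features_to_keep):
    """Keep the columns of X with the highest scores (assumed: higher is better)."""
    X = np.asarray(X)
    scores = np.asarray(scores, dtype=float)
    # argsort is ascending, so reverse it and take the first n indices
    top = np.argsort(scores)[::-1][:n_features_to_keep]
    return X[:, top]

X = np.arange(12).reshape(3, 4)
scores = [0.1, 0.9, 0.3, 0.7]  # hypothetical importance scores, one per column
X_reduced = select_top_features_sketch(X, scores, 2)
```

Here columns 1 and 3 carry the two highest scores, so the reduced matrix has shape (3, 2).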
-