ITMO_FS.embedded.MOS

class ITMO_FS.embedded.MOS(model, weight_func, loss='log', seed=42, l1_ratio=0.5, threshold=0.001, epochs=1000, alphas=array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19]), sampling=False, k_neighbors=2)

Perform the Minimizing Overlapping Selection algorithm, either under SMOTE (MOSS) or under No-Sampling (MOSNS).

Parameters:

model (object) – The model to fit. It must have a fit(X, y) method and an attribute holding the feature weights. Currently only SGDClassifier is supported; other models will not work.
weight_func (callable) – The function to extract weights from the model.
loss (str, 'log' or 'hinge') – Loss function to use in the algorithm. ‘log’ gives a logistic regression, while ‘hinge’ gives a support vector machine.
seed (int, optional) – Seed for Python's random number generator.
l1_ratio (float) – The value used to balance the L1 and L2 penalties in elastic-net.
threshold (float) – The threshold for feature dropout. Rather than comparing feature weights to zero, the weights are normalized, and features whose normalized weight has an absolute value below the threshold are dropped.
epochs (int) – The number of epochs to perform in the algorithm.
alphas (array-like, shape (n_alphas,), optional) – The regularization strengths (lambdas) that form the regularization path.
sampling (bool) – Controls whether MOSS (True) or MOSNS (False) is executed.
k_neighbors (int) – The number of nearest neighbors to use in SMOTE when MOSS is used.
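
The threshold parameter describes dropping features by normalized weight rather than by exact zero. The sketch below illustrates that idea with plain NumPy. It is not the library's code: the helper name select_by_threshold and the choice to normalize by the largest absolute weight are assumptions made for illustration, and the actual normalization used by MOS may differ.

```python
import numpy as np


def select_by_threshold(weights, threshold=0.001):
    """Return indices of features whose normalized |weight| is >= threshold.

    Hypothetical helper: normalizing by the maximum absolute weight is an
    assumption, not necessarily what ITMO_FS does internally.
    """
    weights = np.asarray(weights, dtype=float)
    scale = np.abs(weights).max() if weights.size else 0.0
    if scale == 0.0:
        # All weights are zero (or there are none): nothing survives.
        return np.array([], dtype=int)
    normalized = weights / scale
    # Keep features at or above the threshold; drop the rest.
    return np.flatnonzero(np.abs(normalized) >= threshold)


# The second weight is tiny but non-zero; a comparison against zero would
# keep it, while the normalized threshold drops it.
kept = select_by_threshold([0.5, 1e-6, -2.0, 0.0], threshold=0.001)
print(kept)  # [0 2]
```

A threshold on normalized weights is useful with SGD-trained models, whose coefficients often shrink toward zero without reaching it exactly.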

Notes

For more details see this paper.
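
As background for the sampling and k_neighbors parameters, the following is a minimal sketch of the general SMOTE idea: a synthetic minority sample is placed on the segment between a minority point and one of its k nearest minority neighbors. The function smote_sample and its arguments are hypothetical names for illustration only; this is a generic sketch, not the oversampling code that MOS uses.

```python
import numpy as np


def smote_sample(X_minority, k_neighbors=2, n_new=5, seed=42):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    Hypothetical helper; requires more than k_neighbors minority samples.
    """
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n_samples, n_features = X_minority.shape

    # Pairwise Euclidean distances between minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of every minority sample.
    neighbors = np.argsort(dists, axis=1)[:, :k_neighbors]

    synthetic = np.empty((n_new, n_features))
    for i in range(n_new):
        base = rng.integers(n_samples)
        neighbor = neighbors[base, rng.integers(k_neighbors)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic[i] = X_minority[base] + gap * (
            X_minority[neighbor] - X_minority[base]
        )
    return synthetic


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_sample(X_min, k_neighbors=2, n_new=6, seed=0)
print(new_points.shape)  # (6, 2)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already covered by the minority class.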

Examples

>>> from ITMO_FS.embedded import MOS
>>> from sklearn.linear_model import SGDClassifier
>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> dataset = make_classification(n_samples=100, n_features=10,
... n_informative=5, n_redundant=0, weights=[0.85, 0.15], random_state=42,
... shuffle=False)
>>> X, y = np.array(dataset[0]), np.array(dataset[1])
>>> m = MOS(model=SGDClassifier(),
... weight_func=lambda model: np.square(model.coef_).sum(axis=0)).fit(X, y)
>>> m.selected_features_
array([1, 3, 4], dtype=int64)
>>> m = MOS(model=SGDClassifier(), sampling=True,
... weight_func=lambda model: np.square(model.coef_).sum(axis=0)).fit(X, y)
>>> m.selected_features_
array([1, 3, 4, 6], dtype=int64)

__init__(model, weight_func, loss='log', seed=42, l1_ratio=0.5, threshold=0.001, epochs=1000, alphas=array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19]), sampling=False, k_neighbors=2)

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None, **fit_params)

Fit the algorithm.

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,), optional) – The class labels.
fit_params (dict, optional) – Additional parameters to pass to the underlying _fit function.

Returns:	self – The transformer object itself.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples.
y (ndarray of shape (n_samples,), default=None) – Target values.
**fit_params (dict) – Additional fit parameters.

Returns:	X_new – Transformed array.
Return type:	ndarray of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters:	deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:	params – Parameter names mapped to their values.
Return type:	mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:	**params (dict) – Estimator parameters.
Returns:	self – Estimator instance.
Return type:	object
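
The <component>__<parameter> form can be illustrated with a scikit-learn pipeline. This is a sketch under stated assumptions: it uses a plain Pipeline with an SGDClassifier rather than MOS itself, and the step names "scale" and "clf" are arbitrary labels chosen for the example.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step names ("scale", "clf") are arbitrary labels chosen for this example.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SGDClassifier())])

# Nested parameters are addressed as <component>__<parameter>.
pipe.set_params(clf__alpha=0.05, clf__l1_ratio=0.5)

# With deep=True, get_params also returns the parameters of the
# contained estimators under the same naming scheme.
params = pipe.get_params(deep=True)
print(params["clf__alpha"])  # 0.05
```

set_params returns the estimator itself, so calls can be chained with fit.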

transform(X)

Transform given data by slicing it with selected features.

Parameters:	X (array-like, shape (n_samples, n_features)) – The input samples to transform.
Returns:	The transformed 2D numpy array, containing only the selected features.
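
Slicing by selected features amounts to column indexing. The sketch below shows the operation on a small array; the index array selected_features is a made-up stand-in for the fitted selected_features_ attribute, and the snippet is an illustration rather than the library's implementation.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)  # 3 samples, 4 features

# Hypothetical indices standing in for a fitted selected_features_ array.
selected_features = np.array([1, 3])

# Keep every row, but only the selected columns.
X_new = X[:, selected_features]
print(X_new)
# [[ 1  3]
#  [ 5  7]
#  [ 9 11]]
```

The result has shape (n_samples, n_features_new), where n_features_new is the number of selected features.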
