ITMO_FS.embedded.MOS

class ITMO_FS.embedded.MOS(model=<class 'sklearn.linear_model._stochastic_gradient.SGDClassifier'>, loss='log', seed=42)

Performs feature selection with the Minimizing Overlapping Selection algorithm, either under SMOTE oversampling (MOSS) or under no sampling (MOSNS).

Parameters:
  • model (constructor) – The constructor of the model to use. Currently only SGDClassifier is supported; other models will not work.
  • loss (str, 'log' or 'hinge') – Loss function to use in the algorithm. 'log' gives logistic regression, while 'hinge' gives a linear support vector machine.
  • seed (int) – Seed for Python's random module.
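To make the difference between the two loss options concrete, the sketch below writes out the standard per-sample logistic and hinge losses in plain Python for a label y in {-1, +1} and a raw decision score. These are the textbook definitions, not code taken from ITMO_FS or scikit-learn; the function names are hypothetical.

```python
import math


def log_loss(y, score):
    """Logistic loss for a label y in {-1, +1} and a raw decision score."""
    # log(1 + exp(-y * score)): the loss minimized by logistic regression
    return math.log1p(math.exp(-y * score))


def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1}: the loss of a linear SVM."""
    return max(0.0, 1.0 - y * score)


# A correctly classified point with margin >= 1 has zero hinge loss,
# but still a small positive logistic loss.
for score in (-1.0, 0.0, 0.5, 2.0):
    print(score, round(log_loss(1, score), 4), hinge_loss(1, score))
```

The hinge loss ignores points beyond the margin entirely, while the logistic loss keeps pushing every score further from the boundary; this is the practical difference between the 'hinge' and 'log' settings.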

Notes

For more details, see this paper.
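MOSS balances the classes with SMOTE before selecting features. As a rough illustration of that oversampling step, here is a minimal pure-Python SMOTE-style sketch: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. It is not the code MOS runs internally, and `smote_sketch` and its parameters are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=42):
    """Generate n_new synthetic samples from the minority-class points.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours
    (squared Euclidean distance). Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by distance to `base`
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=3, k=2)
print(new_points)
```

Because every new point is an interpolation between two existing minority samples, the synthetic data stays inside the region already occupied by the minority class.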

Examples

>>> from ITMO_FS.embedded import MOS
>>> from sklearn.datasets import make_classification
>>> data, target = make_classification(n_samples=100, n_features=20)
>>> target[:50] = 0  # create imbalance between classes
>>> print(MOS().fit_transform(data, target))

__init__(model=<class 'sklearn.linear_model._stochastic_gradient.SGDClassifier'>, loss='log', seed=42)

Initialize self. See help(type(self)) for accurate signature.

fit(X, y, l1_ratio=0.5, threshold=0.001, epochs=1000, alphas=array([0.0002, 0.0004, ..., 0.0196, 0.0198]), sampling=True, feature_names=None)

Runs the MOS algorithm on the specified dataset.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The input samples.
  • y (array-like, shape (n_samples,)) – The classes for the samples.
  • l1_ratio (float, optional) – The elastic-net mixing parameter that balances the L1 and L2 penalties.
  • threshold (float, optional) – The threshold for dropping features. Instead of comparing the feature weights to zero, the weights are normalized, and features whose normalized weights have an absolute value below the threshold are dropped.
  • epochs (int, optional) – The number of epochs of gradient descent to perform.
  • alphas (array-like, shape (n_alphas,), optional) – The regularization strengths (lambdas) that form the regularization path. Defaults to the 99 values from 0.0002 to 0.0198 in steps of 0.0002.
  • sampling (bool, optional) – Whether to run MOSS (True) or MOSNS (False).
  • feature_names (list of strings, optional) – Names to assign to the features.
Returns:

None
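The `l1_ratio` and `alphas` parameters control an elastic-net penalty. Assuming MOS passes them to SGDClassifier with an elastic-net penalty (an assumption; the text above does not say so explicitly), the penalty added to the loss at each alpha has the form sketched below, following scikit-learn's documented definition. The grid reproduces the default alphas from the signature; `elastic_net_penalty` is a hypothetical helper.

```python
def elastic_net_penalty(weights, alpha, l1_ratio=0.5):
    """Elastic-net penalty in the form documented for scikit-learn's SGD models.

    alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2)
    """
    l1 = sum(abs(w) for w in weights)
    l2_sq = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) / 2.0 * l2_sq)


# The default `alphas`: 99 values from 0.0002 to 0.0198 in steps of 0.0002,
# rounded to remove floating-point noise.
default_alphas = [round(0.0002 * k, 4) for k in range(1, 100)]

# Along the regularization path the same weights are penalized more
# heavily as alpha grows.
w = [0.5, -1.0, 0.0, 2.0]
path = [elastic_net_penalty(w, a) for a in default_alphas]
print(default_alphas[:3], default_alphas[-1])
```

With `l1_ratio=1` this reduces to a pure L1 (lasso-style) penalty, which is what drives small weights to zero; `l1_ratio=0` gives a pure L2 penalty.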

fit_transform(X, y, l1_ratio=0.5, threshold=0.001, epochs=1000, alphas=array([0.0002, 0.0004, ..., 0.0196, 0.0198]), sampling=True, feature_names=None)

Fits the algorithm and transforms the given dataset X.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The training input samples.
  • y (array-like, shape (n_samples,)) – The target values.
  • l1_ratio (float, optional) – The elastic-net mixing parameter that balances the L1 and L2 penalties.
  • threshold (float, optional) – The threshold for dropping features. Instead of comparing the feature weights to zero, the weights are normalized, and features whose normalized weights have an absolute value below the threshold are dropped.
  • epochs (int, optional) – The number of epochs of gradient descent to perform.
  • alphas (array-like, shape (n_alphas,), optional) – The regularization strengths (lambdas) that form the regularization path. Defaults to the 99 values from 0.0002 to 0.0198 in steps of 0.0002.
  • sampling (bool, optional) – Whether to run MOSS (True) or MOSNS (False).
  • feature_names (list of strings, optional) – Names to assign to the features.
Returns:

X sliced to the features selected by the algorithm.
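The `threshold` rule described above can be illustrated with a small sketch. The text does not say how the weights are normalized, so dividing by their L2 norm is an assumption here, and `select_by_threshold` is a hypothetical helper, not part of ITMO_FS.

```python
def select_by_threshold(coef, threshold=0.001):
    """Return indices of features whose normalized |weight| >= threshold.

    Assumption: weights are normalized by dividing by their L2 norm;
    MOS may use a different normalization.
    """
    norm = sum(c * c for c in coef) ** 0.5
    if norm == 0.0:
        return []  # every weight is zero, so no feature survives
    return [i for i, c in enumerate(coef) if abs(c / norm) >= threshold]


coef = [0.8, -0.0004, 0.0, -0.6, 0.0009]
print(select_by_threshold(coef))  # [0, 3]
```

Normalizing first makes the threshold independent of the overall scale of the weights, so tiny but nonzero weights left over from stochastic gradient descent are still dropped.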

transform(X)

Transforms the given data by slicing it to the selected features.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The input samples.
Returns:

X sliced to the selected features, as a 2D numpy array.
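transform() keeps only the columns chosen during fitting. Assuming the selected features are stored as a list of column indices (the attribute MOS uses for this is not named here), the slicing amounts to the plain-Python sketch below; with a NumPy array the same operation is `X[:, selected]`. The helper name is hypothetical.

```python
def slice_features(X, selected):
    """Keep only the columns listed in `selected`, in that order."""
    return [[row[j] for j in selected] for row in X]


X = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
selected = [0, 3]  # hypothetical indices chosen during fit()
print(slice_features(X, selected))  # [[1, 4], [5, 8]]
```

Because only the column indices are stored, the same selection can be applied to any later dataset with the same feature layout.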