Feature Selectors (tm.fs)

The tablemage.fs module contains the feature selectors used by the regress() and classify() methods of the tablemage.Analyzer class.

tm.fs.KBestFSR

class tablemage.fs.KBestFSR(scorer: Literal['f_regression', 'r_regression', 'mutual_info_regression'], k: int, name: str | None = None)[source]

Selects the k best features based on the f_regression, r_regression, or mutual_info_regression score.

__init__(scorer: Literal['f_regression', 'r_regression', 'mutual_info_regression'], k: int, name: str | None = None)[source]

Constructs a KBestFSR.

Parameters:
  • scorer (Literal['f_regression', 'r_regression', 'mutual_info_regression']) – Scoring function used to rank the features.

  • k (int) – Number of features to select. Must be less than n_predictors.

  • name (str | None) – Default: None. If None, the class name is used.

tm.fs.LassoFSR

class tablemage.fs.LassoFSR(max_n_features: int, alpha: float | None = None, name: str | None = None)[source]

Selects at most max_n_features features using the model-inherent feature selection of Lasso regression.

__init__(max_n_features: int, alpha: float | None = None, name: str | None = None)[source]

Constructs a LassoFSR.

Parameters:
  • max_n_features (int) – Maximum number of features to select. Must be less than n_predictors.

  • alpha (float | None) – Default: None. Regularization term weight. If None, then alpha is selected via five-fold cross validation from a default grid of candidate alphas.

  • name (str | None) – Default: None. If None, then name is set to default.
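As a rough illustration of how an L1 penalty yields feature selection with an upper bound on the number of features, the sketch below fits a Lasso by coordinate descent and keeps the nonzero coefficients with the largest magnitude. This is a sketch under stated assumptions, not tablemage's code: the objective follows scikit-learn's Lasso convention, features are assumed to be centered (no intercept), and capping by coefficient magnitude is an assumed rule. All function names and data are hypothetical.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coefficients(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that leaves feature j out.
            rho = sum(
                X[i][j] * (y[i] - sum(w[m] * X[i][m] for m in range(p) if m != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


def lasso_select(X, y, names, alpha, max_n_features):
    """Keep features with nonzero coefficients, at most max_n_features of them."""
    w = lasso_coefficients(X, y, alpha)
    nonzero = sorted(
        ((abs(coef), name) for coef, name in zip(w, names) if coef != 0.0),
        reverse=True,
    )
    return [name for _, name in nonzero[:max_n_features]]


# Hypothetical centered data: y depends strongly on "a", weakly on "c", not on "b".
names = ["a", "b", "c"]
a = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
b = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
c = [-1.0, 2.0, -1.0, -1.0, 2.0, -1.0]
X = [list(row) for row in zip(a, b, c)]
y = [3.0 * ai + 0.5 * ci for ai, ci in zip(a, c)]

top_one = lasso_select(X, y, names, alpha=0.1, max_n_features=1)
top_three = lasso_select(X, y, names, alpha=0.1, max_n_features=3)
print(top_one, top_three)
```

Asking for up to three features still returns only two here, because the penalty drives the coefficient of "b" to exactly zero. That is why the limit is an upper bound rather than an exact count.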

tm.fs.BorutaFSR

class tablemage.fs.BorutaFSR(estimator: Literal['random_forest', 'xgboost'] = 'random_forest', n_estimators: int = 100, max_depth: int = 5, model_random_state: int = 42, n_jobs: int = -1, name: str | None = None)[source]

Selects features via the Boruta algorithm, using a random forest or XGBoost estimator.

__init__(estimator: Literal['random_forest', 'xgboost'] = 'random_forest', n_estimators: int = 100, max_depth: int = 5, model_random_state: int = 42, n_jobs: int = -1, name: str | None = None)[source]

Constructs a BorutaFSR.

Parameters:
  • estimator (Literal["random_forest", "xgboost"]) – Default: “random_forest”. The estimator to use for Boruta. Default hyperparameters are used for the estimator.

  • n_estimators (int) – Default: 100. The number of estimators to use for Boruta’s estimator.

  • max_depth (int) – Default: 5. The maximum depth of the trees in the ensemble.

  • model_random_state (int) – Default: 42. The random state to use for the estimator.

  • n_jobs (int) – Default: -1. The number of jobs to run in parallel.

  • name (str | None) – Default: None. If None, a default name is used.

tm.fs.KBestFSC

class tablemage.fs.KBestFSC(scorer: Literal['f_classif', 'mutual_info_classif', 'chi2'], k: int, name: str | None = None)[source]

Selects the k best features based on the f_classif, mutual_info_classif, or chi2 score.

__init__(scorer: Literal['f_classif', 'mutual_info_classif', 'chi2'], k: int, name: str | None = None)[source]

Constructs a KBestFSC.

Parameters:
  • scorer (Literal['f_classif', 'mutual_info_classif', 'chi2']) – Scoring function used to rank the features.

  • k (int) – Number of features to select. Must be less than n_predictors.

  • name (str | None) – Default: None. If None, a default name is used.
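To make the 'chi2' option more concrete, the following sketch computes a chi-squared score for each feature against the class labels and keeps the top k. It assumes the scorer behaves like scikit-learn's chi2 function, which compares per-class sums of a feature with the sums expected if the feature were independent of the class, and which therefore requires non-negative feature values such as counts. The helper name and the toy word counts are hypothetical, and this is not tablemage's own code.

```python
from collections import defaultdict


def chi2_score(values, labels):
    """Chi-squared statistic between one non-negative feature and class labels."""
    total = sum(values)
    n = len(labels)
    observed = defaultdict(float)  # per-class sum of the feature
    class_count = defaultdict(int)  # per-class number of samples
    for value, label in zip(values, labels):
        observed[label] += value
        class_count[label] += 1
    score = 0.0
    for label, count in class_count.items():
        # Expected per-class sum if the feature were independent of the class.
        expected = total * count / n
        score += (observed[label] - expected) ** 2 / expected
    return score


# Hypothetical word counts per message.
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
columns = {
    "free": [3, 4, 5, 0, 1, 0],
    "the": [2, 3, 2, 3, 2, 3],
    "hello": [1, 0, 2, 1, 1, 1],
}
scores = {name: chi2_score(values, labels) for name, values in columns.items()}
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_two)
```

Here "free" appears almost only in one class, so its score is far higher than that of "the" or "hello", whose counts are spread nearly evenly across both classes.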

tm.fs.LassoFSC

class tablemage.fs.LassoFSC(max_n_features: int, c: float | None = None, name: str | None = None)[source]

Selects at most max_n_features features using L1-regularized (Lasso) model-inherent feature selection.

__init__(max_n_features: int, c: float | None = None, name: str | None = None)[source]

Constructs a LassoFSC.

Parameters:
  • max_n_features (int) – Maximum number of features to select. Must be less than n_predictors.

  • c (float | None) – Default: None. Inverse of regularization strength. If None, then c is selected via five-fold cross validation from a grid of 10 candidate values, on a log scale from 1e-4 to 1e4.

  • name (str | None) – Default: None. If None, then name is set to default.
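The default search grid described for c (10 candidates on a log scale from 1e-4 to 1e4) can be reproduced as follows. This is a minimal sketch that assumes the grid includes both endpoints and is evenly spaced in the exponent, as numpy.logspace(-4, 4, 10) would produce. The variable names are illustrative.

```python
n_candidates = 10
low_exp, high_exp = -4, 4

# Evenly spaced exponents between -4 and 4, both endpoints included.
step = (high_exp - low_exp) / (n_candidates - 1)
c_grid = [10 ** (low_exp + i * step) for i in range(n_candidates)]

for c in c_grid:
    print(f"{c:.4g}")
```

Each candidate is larger than the previous one by the same factor, 10 ** (8 / 9), so the grid covers weak and strong regularization evenly on a log scale. Small values of c correspond to strong regularization and therefore to fewer selected features.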

tm.fs.BorutaFSC

class tablemage.fs.BorutaFSC(estimator: Literal['random_forest', 'xgboost'] = 'random_forest', n_estimators: int = 100, max_depth: int = 5, model_random_state: int = 42, name: str | None = None)[source]

Selects features via the Boruta algorithm, using a random forest or XGBoost estimator.

__init__(estimator: Literal['random_forest', 'xgboost'] = 'random_forest', n_estimators: int = 100, max_depth: int = 5, model_random_state: int = 42, name: str | None = None)[source]

Constructs a BorutaFSC.

Parameters:
  • estimator (Literal["random_forest", "xgboost"]) – Default: “random_forest”. The estimator to use for Boruta. Default hyperparameters are used for the estimator.

  • n_estimators (int) – Default: 100. The number of estimators to use for Boruta’s estimator.

  • max_depth (int) – Default: 5. The maximum depth of the trees in the ensemble.

  • model_random_state (int) – Default: 42. The random state to use for the estimator.

  • name (str | None) – Default: None. If None, a default name is used.