Reports (tm._reports)
Report objects are returned by the tablemage.Analyzer.eda(),
tablemage.Analyzer.ols(), tablemage.Analyzer.logit(),
tablemage.Analyzer.regress(),
and tablemage.Analyzer.classify() methods of the tablemage.Analyzer class.
They may contain information about model performance, feature importance, or other
relevant statistics. They also have methods for plotting relevant diagnostic figures.
tm._reports.MLClassificationReport
- class tablemage._reports.MLClassificationReport(models: list[BaseC], datahandler: DataHandler, target: str, predictors: list[str], feature_selectors: list[BaseFSC] | None = None, max_n_features: int | None = None, outer_cv: int | None = None, outer_cv_seed: int = 42, verbose: bool = True)[source]
Class for evaluating multiple classification models. Fits the models based on the provided DataHandler.
- cv_metrics(average_across_folds: bool = True) DataFrame | None[source]
Returns a DataFrame containing the evaluation metrics for all models on the training data. Cross validation must have been conducted, otherwise None is returned.
- Parameters:
average_across_folds (bool) – Default: True. If True, returns a DataFrame containing evaluation metrics averaged across all folds. Otherwise, returns a DataFrame containing evaluation metrics for each fold.
- Returns:
None is returned if cross validation was not conducted.
- Return type:
pd.DataFrame | None
- cv_metrics_by_class(averaged_across_folds: bool = True) DataFrame | None[source]
Returns a DataFrame containing the cross-validated evaluation metrics for all models on the training data, broken down by class.
- Parameters:
averaged_across_folds (bool) – Default: True. If True, returns a DataFrame containing evaluation metrics averaged across all folds. Otherwise, returns a DataFrame containing evaluation metrics for each fold.
- Returns:
None is returned if cross validation was not conducted.
- Return type:
pd.DataFrame | None
- feature_importance(model_id: str) DataFrame | None[source]
Returns the feature importances of the model with the specified id. If the model does not have feature importances, the coefficients are returned instead. Otherwise, None is returned.
- Parameters:
model_id (str) – The id of the model.
- Returns:
None is returned if the model does not have feature importances or coefficients.
- Return type:
pd.DataFrame | None
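The fallback order described for feature_importance (importances first, then coefficients, then None) can be illustrated with a minimal sketch. It assumes the underlying estimators expose scikit-learn style `feature_importances_` and `coef_` attributes; the helper `importance_or_coefs` and the stand-in objects are hypothetical and are not part of tablemage.

```python
from types import SimpleNamespace

def importance_or_coefs(estimator, feature_names):
    """Return {feature: value} from importances, else coefficients, else None."""
    # Assumption: scikit-learn style attribute names, checked in priority order.
    for attr in ("feature_importances_", "coef_"):
        values = getattr(estimator, attr, None)
        if values is not None:
            return dict(zip(feature_names, values))
    return None  # the estimator exposes neither attribute

# Hypothetical stand-ins for fitted estimators.
tree_like = SimpleNamespace(feature_importances_=[0.7, 0.3])
linear_like = SimpleNamespace(coef_=[1.5, -0.2])
opaque = SimpleNamespace()

features = ["age", "income"]
from_tree = importance_or_coefs(tree_like, features)
from_linear = importance_or_coefs(linear_like, features)
from_opaque = importance_or_coefs(opaque, features)
```

Callers of feature_importance should likewise check for None before using the result.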
- fs_report() VotingSelectionReport | None[source]
Returns the feature selection report. If feature selectors were specified at the model level or not at all, then this method will return None.
To access the feature selection report for a specific model, use model_report(<model_id>).feature_selection_report().
- Returns:
None is returned if no feature selectors were specified.
- Return type:
VotingSelectionReport | None
- is_binary() bool[source]
Returns True if the target variable is binary.
- Returns:
True if the target variable is binary.
- Return type:
bool
- metrics(dataset: Literal['train', 'test', 'both']) DataFrame[source]
Returns a DataFrame containing the evaluation metrics for all models on the specified data.
- Parameters:
dataset (Literal['train', 'test', 'both']) – The dataset to return the metrics for.
- Return type:
pd.DataFrame
- metrics_by_class(dataset: Literal['train', 'test']) DataFrame | None[source]
Returns a DataFrame containing the evaluation metrics for all models on the specified data, broken down by class.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to return the fit statistics for.
- Returns:
None is returned if the model is binary.
- Return type:
pd.DataFrame | None
- model(model_id: str) BaseC[source]
Returns the model with the specified id.
- Parameters:
model_id (str) – The id of the model.
- Return type:
BaseC
- plot_confusion_matrix(model_id: str, dataset: Literal['train', 'test'], figsize: tuple[float, float] = (5, 5), ax: Axes | None = None) Figure[source]
Returns a figure that is the confusion matrix for the model.
- Parameters:
model_id (str) – The id of the model.
dataset (Literal['train', 'test']) – The dataset to plot the confusion matrix for.
figsize (tuple[float, float]) – Default: (5, 5). The size of the figure.
ax (plt.Axes | None) – Default: None. The axes on which to plot the figure. If None, a new figure is created.
- Returns:
Figure of the confusion matrix.
- Return type:
plt.Figure
- plot_roc_curve(model_id: str, dataset: Literal['train', 'test'], figsize: tuple[float, float] = (5, 5), ax: Axes | None = None) Figure | None[source]
Plots the ROC curve for a single model.
- Parameters:
model_id (str) – The id of the model.
dataset (Literal['train', 'test']) – The dataset to plot the ROC curve for.
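figsize (tuple[float, float]) – Default: (5, 5). The size of the figure.
ax (plt.Axes | None) – Default: None. The axes on which to plot the figure. If None, a new figure is created.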
- Returns:
Figure of the ROC curve. None is returned if the model is not binary.
- Return type:
plt.Figure | None
- plot_roc_curves(dataset: Literal['train', 'test'], figsize: tuple[float, float] = (5, 5), ax: Axes | None = None) Figure[source]
Plots the ROC curves for all models.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to plot the ROC curves for.
figsize (tuple[float, float]) – Default: (5, 5). The size of the figure.
ax (plt.Axes | None) – Default: None. The axes to plot on. If None, a new figure is created.
tm._reports.MLRegressionReport
- class tablemage._reports.MLRegressionReport(models: list[BaseR], datahandler: DataHandler, target: str, predictors: list[str], feature_selectors: list[BaseFSR] | None = None, max_n_features: int | None = None, outer_cv: int | None = None, outer_cv_seed: int = 42, verbose: bool = True)[source]
Class for reporting model goodness of fit. Fits the model based on provided DataHandler.
- cv_metrics(average_across_folds: bool = True) DataFrame | None[source]
Returns a DataFrame containing the cross-validated goodness-of-fit statistics for all models on the training data. Cross validation must have been conducted, otherwise None is returned.
- Parameters:
average_across_folds (bool) – Default: True. If True, returns a DataFrame containing goodness-of-fit statistics averaged across all folds. Otherwise, returns a DataFrame containing goodness-of-fit statistics for each fold.
- Returns:
None if cross validation was not conducted.
- Return type:
pd.DataFrame | None
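To make the two modes of average_across_folds concrete, here is a minimal sketch in plain Python. The fold records, the metric names, and the helper `summarize_folds` are hypothetical and not part of tablemage; the sketch only shows the difference between per-fold and fold-averaged statistics.

```python
from statistics import mean

def summarize_folds(fold_metrics, average_across_folds=True):
    """fold_metrics: list of dicts, one per fold, mapping metric name -> value."""
    if not fold_metrics:
        # Mirrors the documented behaviour: None when no cross validation was run.
        return None
    if not average_across_folds:
        # One record per fold, tagged with its fold index.
        return [{"fold": i, **m} for i, m in enumerate(fold_metrics)]
    names = fold_metrics[0].keys()
    return {name: mean(m[name] for m in fold_metrics) for name in names}

folds = [{"rmse": 1.0, "r2": 0.5}, {"rmse": 3.0, "r2": 0.7}]
averaged = summarize_folds(folds)
per_fold = summarize_folds(folds, average_across_folds=False)
```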
- feature_importance(model_id: str) DataFrame | None[source]
Returns the feature importances of the model with the specified id. If the model does not have feature importances, the coefficients are returned instead. Otherwise, None is returned.
- Parameters:
model_id (str) – The id of the model.
- Returns:
None is returned if the model does not have feature importances or coefficients.
- Return type:
pd.DataFrame | None
- fs_report() VotingSelectionReport | None[source]
Returns the feature selection report. If feature selectors were specified at the model level or not at all, then this method will return None.
To access the feature selection report for a specific model, use model_report(<model_id>).feature_selection_report().
- Returns:
None if feature selectors were not specified.
- Return type:
VotingSelectionReport | None
- metrics(dataset: Literal['train', 'test', 'both']) DataFrame[source]
Returns a DataFrame containing the metrics for all models on the specified data.
- Parameters:
dataset (Literal['train', 'test', 'both']) – The dataset for which to return the metrics.
- Return type:
pd.DataFrame
- model(model_id: str) BaseR[source]
Returns the model with the specified id.
- Parameters:
model_id (str) – The id of the model.
- Return type:
BaseR
- plot_obs_vs_pred(model_id: str, dataset: Literal['train', 'test'], figsize: tuple[float, float] = (5, 5), ax: Axes | None = None) Figure[source]
Returns a figure that is a scatter plot of the observed (y-axis) and predicted (x-axis) values for the specified model and dataset.
- Parameters:
model_id (str) – The id of the model.
dataset (Literal['train', 'test']) – The dataset for which to plot the observed vs predicted values.
figsize (tuple[float, float]) – Default: (5, 5). The size of the figure.
ax (plt.Axes | None) – Default: None. The axes on which to plot the figure. If None, a new figure is created.
- Return type:
plt.Figure
tm._reports.OLSReport
- class tablemage._reports.OLSReport(model: OLSLinearModel, datahandler: DataHandler, target: str, predictors: list[str], dataemitter: DataEmitter | None = None)[source]
OLSReport. Fits the model based on provided DataHandler. Contains methods for generating regression-relevant diagnostic plots and tables for a single linear regression model.
- coefs(format: Literal['coef(se)|pval', 'coef|se|pval', 'coef(ci)|pval', 'coef|ci_low|ci_high|pval'] = 'coef(se)|pval') DataFrame[source]
Returns the coefficients of the model.
- Parameters:
format (Literal['coef(se)|pval', 'coef|se|pval', 'coef(ci)|pval', 'coef|ci_low|ci_high|pval']) – Default: ‘coef(se)|pval’.
- Return type:
pd.DataFrame
- get_outlier_indices(dataset: Literal['train', 'test'] = 'test') list[source]
Returns the indices corresponding to DataFrame examples associated with standardized residual outliers.
- Parameters:
dataset (Literal['train', 'test']) – Default: ‘test’.
- Returns:
outliers_df_idx
- Return type:
list ~ (n_outliers)
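As a rough illustration of flagging outliers by standardized residuals, the sketch below divides each residual by the standard deviation of all residuals and keeps the index labels whose absolute value exceeds a threshold. This is a simplified definition chosen for the example; the library may standardize differently (for instance, with a leverage adjustment). The helper `outlier_indices` is hypothetical, and the threshold of 2 follows the default documented for set_outlier_threshold.

```python
from statistics import pstdev

def outlier_indices(index, y_true, y_pred, threshold=2.0):
    """Return index labels whose |standardized residual| exceeds the threshold."""
    if threshold < 0:
        raise ValueError("threshold must be nonnegative")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    scale = pstdev(residuals)  # simplified scale estimate (assumption)
    if scale == 0:
        return []  # all residuals are identical, so nothing stands out
    return [label for label, r in zip(index, residuals) if abs(r / scale) > threshold]

index = ["a", "b", "c", "d", "e", "f"]
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 16.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
flagged = outlier_indices(index, y_true, y_pred)
flagged_strict = outlier_indices(index, y_true, y_pred, threshold=3.0)
```

Raising the threshold makes the rule stricter, so fewer observations are flagged.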
- metrics(dataset: Literal['train', 'test', 'both']) DataFrame[source]
Returns a DataFrame containing the goodness-of-fit statistics for the model.
- Parameters:
dataset (Literal['train', 'test', 'both']) – The dataset to compute the metrics for.
- Return type:
pd.DataFrame
- model() OLSLinearModel[source]
Returns the fitted OLSLinearModel object.
- Return type:
OLSLinearModel
- plot_diagnostics(dataset: Literal['train', 'test'], show_outliers: bool = False, figsize: tuple[float, float] = (7.0, 7.0)) Figure[source]
Plots several useful linear regression diagnostic plots.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to generate the plot for.
show_outliers (bool) – Default: False. If True, plots the residual outliers in red.
figsize (tuple[float, float]) – Default: (7.0, 7.0).
- Return type:
plt.Figure
- plot_obs_vs_pred(dataset: Literal['train', 'test'], show_outliers: bool = True, figsize: tuple[float, float] = (5.0, 5.0), ax: Axes | None = None) Figure[source]
Plots a scatter plot of the true and predicted y values.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to generate the plot for.
show_outliers (bool) – Default: True. If True, then the outliers calculated using standard errors will be shown in red.
figsize (tuple[float, float]) – Default: (5.0,5.0). Sets the size of the resulting graph.
ax (plt.Axes) – Default: None.
- Return type:
plt.Figure
- plot_qq(dataset: Literal['train', 'test'], standardized: bool = True, show_outliers: bool = False, figsize: tuple[float, float] = (5.0, 5.0), ax: Axes | None = None) Figure[source]
Plots a quantile-quantile plot of the residuals.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to generate the plot for.
standardized (bool) – Default: True. If True, standardizes the residuals.
show_outliers (bool) – Default: False. If True, plots the outliers in red.
figsize (tuple[float, float]) – Default: (5.0, 5.0).
ax (plt.Axes) – Default: None.
- Return type:
plt.Figure
- plot_residuals_hist(dataset: Literal['train', 'test'], standardized: bool = False, density: bool = False, figsize: tuple[float, float] = (5.0, 5.0), ax: Axes | None = None) Figure[source]
Returns a figure that is a histogram of the residuals.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to generate the plot for.
standardized (bool) – Default: False. If True, standardizes the residuals.
density (bool) – Default: False. If True, plots density rather than frequency.
figsize (tuple[float, float]) – Default: (5.0, 5.0). Determines the size of the returned figure.
ax (plt.Axes) – Default: None.
- Return type:
plt.Figure
- plot_residuals_vs_fitted(dataset: Literal['train', 'test'], standardized: bool = False, show_outliers: bool = True, figsize: tuple[float, float] = (5.0, 5.0), ax: Axes | None = None) Figure[source]
Plots the residuals versus the fitted values.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to generate the plot for.
standardized (bool) – Default: False. If True, plots the standardized residuals as opposed to the raw residuals.
show_outliers (bool) – Default: True. If True, colors the outliers determined by the standardized residuals in red.
figsize (tuple[float, float]) – Default: (5.0, 5.0). Determines the size of the returned figure.
ax (plt.Axes) – Default: None.
- Return type:
plt.Figure
- plot_residuals_vs_leverage(dataset: Literal['train', 'test'], standardized: bool = True, show_outliers: bool = True, figsize: tuple[float, float] = (5.0, 5.0), ax: Axes | None = None) Figure[source]
Plots the residuals versus leverage.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to generate the plot for.
standardized (bool) – Default: True. If True, standardizes the residuals.
show_outliers (bool) – Default: True. If True, plots the outliers in red.
figsize (tuple[float, float]) – Default: (5.0, 5.0).
ax (plt.Axes) – Default: None.
- Return type:
plt.Figure
- plot_residuals_vs_var(predictor: str, dataset: Literal['train', 'test'], standardized: bool = False, show_outliers: bool = False, figsize: tuple[float, float] = (5.0, 5.0), ax: Axes | None = None) Figure[source]
Returns a figure that is a plot of the residuals versus the values of the specified predictor variable.
- Parameters:
predictor (str) – The predictor variable whose values should be plotted on the x-axis.
dataset (Literal['train', 'test']) – The dataset to generate the plot for.
standardized (bool) – Default: False. If True, standardizes the residuals.
show_outliers (bool) – Default: False. If True, plots the outliers in red.
figsize (tuple[float, float]) – Default: (5.0, 5.0). Determines the size of the returned figure.
ax (plt.Axes) – Default: None.
- Return type:
plt.Figure
- plot_scale_location(dataset: Literal['train', 'test'], show_outliers: bool = True, figsize: tuple[float, float] = (5.0, 5.0), ax: Axes | None = None) Figure[source]
Returns a figure that is a scale-location plot: the square root of the absolute residuals versus the fitted values.
- Parameters:
dataset (Literal['train', 'test']) – The dataset to generate the plot for.
show_outliers (bool) – Default: True. If True, plots the outliers in red.
figsize (tuple[float, float]) – Default: (5.0, 5.0).
ax (plt.Axes) – Default: None.
- Return type:
plt.Figure
- set_outlier_threshold(threshold: float) OLSReport[source]
Standardized residuals threshold for outlier identification. Recomputes the outliers.
- Parameters:
threshold (float) – Default: 2. Must be a nonnegative value.
- Returns:
Returns self for method chaining.
- Return type:
OLSReport
- statsmodels_summary()[source]
Returns the summary of the statsmodels RegressionResultsWrapper for OLS.
- step(direction: Literal['both', 'backward', 'forward'] = 'backward', criteria: Literal['aic', 'bic'] = 'aic', kept_vars: list[str] | None = None, all_vars: list[str] | None = None, start_vars: list[str] | None = None, max_steps: int = 100) OLSReport[source]
Performs stepwise selection. Returns a new OLSReport object with the reduced model.
- Parameters:
direction (Literal["both", "backward", "forward"]) – Default: ‘backward’. The direction of the stepwise selection.
criteria (Literal["aic", "bic"]) – Default: ‘aic’. The criteria to use for selecting the best model.
kept_vars (list[str]) – Default: None. The variables that should be kept in the model. If None, defaults to an empty list.
all_vars (list[str]) – Default: None. The variables that are candidates for inclusion in the model. If None, defaults to all variables in the training data.
start_vars (list[str]) – Default: None. The variables to start the bidirectional stepwise selection with. Ignored if direction is not ‘both’. If direction is ‘both’ and start_vars is None, then the starting variables are the kept_vars.
max_steps (int) – Default: 100. The maximum number of steps to take.
- Return type:
OLSReport
- test_lr(alternative_report: OLSReport) StatisticalTestReport[source]
Performs a likelihood ratio test to compare an alternative OLSLinearModel. Returns an object of class StatisticalTestReport describing the results.
- Parameters:
alternative_report (OLSReport) – The report of an alternative OLSLinearModel. The alternative model must be a nested version of the current model or vice-versa.
- Return type:
StatisticalTestReport
- test_partialf(alternative_report: OLSReport) StatisticalTestReport[source]
Performs a partial F-test to compare an alternative OLSLinearModel. Returns an object of class StatisticalTestReport describing the results.
- Parameters:
alternative_report (OLSReport) – The report of an alternative OLSLinearModel. The alternative model must be a nested version of the current model or vice-versa.
- Return type:
StatisticalTestReport
tm._reports.EDAReport
- class tablemage._reports.EDAReport(df: DataFrame)[source]
Class for generating EDA-relevant plots and tables for all variables.
- anova(numeric_var: str, stratify_by: str, strategy: Literal['auto', 'anova_oneway', 'kruskal'] = 'auto') StatisticalTestReport[source]
Tests for equal means between three or more groups. Null hypothesis: All group means are equal. Alternative hypothesis: At least one group’s mean is different from the others. NaNs in numeric_var and stratify_by are dropped before the test is conducted.
- Parameters:
numeric_var (str) – Numeric variable name to be stratified and compared.
stratify_by (str) – Categorical variable name.
strategy (Literal['auto', 'anova_oneway', 'kruskal']) – Default: ‘auto’. If ‘auto’, a test is selected as follows: If the data in any group is not normally distributed or not homoskedastic, then the Kruskal-Wallis test is used. Otherwise, the one-way ANOVA test is used. ANOVA is somewhat robust to heteroscedasticity and violations of the normality assumption.
- Return type:
StatisticalTestReport
- categorical_stats() DataFrame | None[source]
Returns a DataFrame containing summary statistics for all categorical variables.
Returns None if there are no categorical variables.
- Return type:
pd.DataFrame | None
- categorical_vars() list[str][source]
Returns a list of the names of all categorical variables.
- Return type:
list[str]
- chi2(categorical_var_1: str, categorical_var_2: str) StatisticalTestReport[source]
Tests for independence between two categorical variables using the chi-squared test.
- Parameters:
categorical_var_1 (str) – Name of the first categorical variable.
categorical_var_2 (str) – Name of the second categorical variable.
- Returns:
A structured report of the statistical test results.
- Return type:
StatisticalTestReport
- numeric_stats() DataFrame | None[source]
Returns a DataFrame containing summary statistics for all numeric variables.
Returns None if there are no numeric variables.
- Return type:
pd.DataFrame | None
- numeric_vars() list[str][source]
Returns a list of the names of all numeric variables.
- Return type:
list[str]
- plot(x: str, y: str | None = None, figsize: tuple[float, float] = (5, 5), ax: Axes | None = None) Figure[source]
General purpose plot method for single variable distributions and relationships between two variables. Variables may be numeric or categorical.
If both numeric, scatter plot is produced. If one numeric and one categorical, boxplot is produced. If both categorical, cross tab heatmap is produced.
- Parameters:
x (str) – The name of the variable to plot on the x-axis.
y (str | None) – Default: None. The name of the variable to plot on the y-axis.
figsize (tuple[float, float]) – Default: (5, 5). The size of the figure. Only used if ax is None.
ax (plt.Axes | None) – Default: None. The axes to plot on. If None, a new figure is created.
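The rule that plot uses to pick a chart can be summarized as a small decision function. The sketch below only restates the rule from the description above; the function `choose_plot_kind` and its return labels are hypothetical and do not correspond to any tablemage API.

```python
def choose_plot_kind(x_is_numeric, y_is_numeric=None):
    """Pick a chart type; y_is_numeric=None means no y variable was given."""
    if y_is_numeric is None:
        return "single-variable distribution"
    if x_is_numeric and y_is_numeric:
        return "scatter plot"
    if x_is_numeric or y_is_numeric:
        # Exactly one of the two variables is numeric.
        return "boxplot"
    return "crosstab heatmap"

kinds = {
    "numeric vs numeric": choose_plot_kind(True, True),
    "numeric vs categorical": choose_plot_kind(True, False),
    "categorical vs categorical": choose_plot_kind(False, False),
    "x only": choose_plot_kind(True),
}
```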
- plot_correlation_heatmap(numeric_vars: list[str] | None = None, htest: bool = False, cmap: str | Colormap | None = None, figsize: tuple[float, float] = (7, 7), ax: Axes | None = None) Figure[source]
Plots a heatmap of the correlation matrix of the numeric variables.
- Parameters:
numeric_vars (list[str] | None) – List of numeric variables to include in the heatmap. If None, all numeric variables are considered.
htest (bool) – If True, displays correlation coefficients with their corresponding p-values in parentheses.
cmap (str | plt.Colormap | None) – The colormap to use for the heatmap visualization. If None, uses a default colormap.
figsize (tuple[float, float]) – The size of the figure (width, height) in inches. Only used if ax is None.
ax (plt.Axes | None) – If provided, the plot is drawn on this Axes instance.
- Returns:
The figure containing the correlation heatmap.
- Return type:
plt.Figure
- plot_pairs(vars: list[str] | None = None, htest: bool = True, figsize: tuple[float, float] = (7, 7)) Figure[source]
Plots pairwise relationships among the specified variables (numeric or categorical).
Diagonal plots show distributions of single variables, lower panels show one type of plot, upper panels another.
- Parameters:
vars (list[str] | None) – Default: None. A list of variable names (numeric or categorical). If None, all columns are considered.
htest (bool) – Default: True. If True, includes correlation coefficients and p-values for numeric-numeric pairs, chi-squared test results for categorical-categorical pairs, and either t-test or ANOVA results for numeric-categorical pairs in the upper triangle.
figsize (tuple[float, float]) – Default: (7, 7). The size of the figure.
- Return type:
plt.Figure
- plot_pca(numeric_vars: list[str], stratify_by: str | None = None, strata: Series | None = None, scale_strategy: Literal['standardize', 'center', 'none'] = 'center', whiten: bool = False, three_components: bool = False, figsize: tuple[float, float] = (5, 5), ax: Axes | None = None) Figure[source]
Plots the first two (or three) principal components, optionally stratified by an additional variable. Drops examples with missing values across the given variables of interest.
- Parameters:
numeric_vars (list[str]) – List of numeric variables across which the PCA will be performed.
stratify_by (str | None) – Default: None. Categorical variable from which strata are identified.
strata (pd.Series | None) – Default: None. The labels/strata. Must be the same length as the dataset. Index must be compatible with self.df. Overridden by stratify_by if both are provided.
scale_strategy (Literal['standardize', 'center', 'none']) – Default: ‘center’.
whiten (bool) – Default: False. If True, performs whitening on the data during PCA.
three_components (bool) – Default: False. If True, returns a 3D plot. Otherwise plots the first two components only.
figsize (tuple[float, float]) – Default: (5, 5). The size of the figure. Only used if ax is None.
ax (plt.Axes | None) – Default: None. If not None, does not return a figure; plots the plot directly onto the input Axes.
- Return type:
plt.Figure
- tabulate_correlation_comparison(numeric_vars: list[str], target: str, bonferroni_correction: bool = False) DataFrame[source]
Generates a table of the Pearson correlation coefficients between the numeric variables and a target variable.
- Parameters:
numeric_vars (list[str]) – List of numeric variables.
target (str) – The numeric variable to correlate the numeric_vars with.
bonferroni_correction (bool, default=False) – If True, applies the Bonferroni correction to the p-values (multiplies them by the number of tests).
dropna (bool, default=True) – If True, drops rows with NaN values when computing correlations. If False, raises an error if NaN values are present.
- Returns:
DataFrame with index as the numeric variables. Columns include the Pearson correlation coefficient, p-value, and number of units considered (if dropna was True).
- Return type:
pd.DataFrame
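The Bonferroni correction mentioned above multiplies each p-value by the number of tests. A minimal sketch follows; the helper `bonferroni` is hypothetical, and capping the adjusted values at 1 is a common convention assumed here rather than something this reference states.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1 (assumption)."""
    n_tests = len(pvalues)
    return [min(1.0, p * n_tests) for p in pvalues]

# Three hypothetical p-values, one per numeric variable tested against the target.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With three tests, a raw p-value of 0.04 no longer falls below a 0.05 significance level after adjustment.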
- tabulate_correlation_matrix(numeric_vars: list[str], htest: bool = False) DataFrame[source]
Generates a table of the Pearson correlation coefficients between numeric variables.
The function computes correlations efficiently by leveraging numpy operations and avoiding redundant calculations. For symmetric pairs (i,j) and (j,i), it only computes one and mirrors the result. Handles missing values by using pairwise complete observations.
- Parameters:
numeric_vars (list[str]) – List of numeric variables to compute correlations for.
htest (bool, default=False) – If True, includes p-values in the output in format: “corr (p-val)”
- Returns:
DataFrame with index and columns as the numeric variables. Values are either correlation coefficients or “correlation (p-value)” if htest=True. Missing values are represented as “NA”.
- Return type:
pd.DataFrame
- Raises:
ValueError – If any variable in numeric_vars is not a known numeric variable.
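The two ideas in the description of tabulate_correlation_matrix, pairwise complete observations and mirroring of symmetric entries, can be sketched without any dependencies. This is an illustrative stand-in, not the library's implementation (which is described as using numpy); the helpers `pearson` and `correlation_matrix` are hypothetical, and undefined correlations are returned as None here rather than “NA”.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns: dict of variable name -> list of floats, with None for missing."""
    names = list(columns)
    matrix = {a: {b: None for b in names} for a in names}
    for i, a in enumerate(names):
        matrix[a][a] = 1.0
        for b in names[i + 1:]:
            # Pairwise complete observations: keep rows where both values exist.
            pairs = [(x, y) for x, y in zip(columns[a], columns[b])
                     if x is not None and y is not None]
            r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
            # Compute (a, b) once and mirror the result into (b, a).
            matrix[a][b] = matrix[b][a] = r
    return matrix

data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, None, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(data)
```

The row with a missing `y` is dropped only for pairs that involve `y`; the `x` and `z` correlation still uses all four rows.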
- tabulate_tableone(vars: list[str], stratify_by: str | None, show_missingness: bool = True, show_htest_name: bool = True, bonferroni_correction: bool = False) TableOne[source]
Generates a tableone for the given variables stratified by the given variable.
- Parameters:
vars (list[str]) – List of variables to include in the tableone.
stratify_by (str | None) – Categorical variable to stratify by.
show_missingness (bool) – Default: True. If True, includes missingness information in the table.
show_htest_name (bool) – Default: True. If True, includes the name of the hypothesis test in the table.
bonferroni_correction (bool) – Default: False. If True, applies Bonferroni correction to the p-values.
- Return type:
TableOne
- test_categorical_independence(categorical_var_1: str, categorical_var_2: str) StatisticalTestReport[source]
Tests for independence between two categorical variables using the chi-squared test.
- Parameters:
categorical_var_1 (str) – Name of the first categorical variable.
categorical_var_2 (str) – Name of the second categorical variable.
- Returns:
A structured report of the statistical test results.
- Return type:
StatisticalTestReport
- test_equal_means(numeric_var: str, stratify_by: str) StatisticalTestReport[source]
Conducts the appropriate statistical test for equal means between two or more groups (null hypothesis: all group means are equal).
- Parameters:
numeric_var (str) – Numeric variable name to be stratified and compared.
stratify_by (str) – Categorical variable name.
- Return type:
StatisticalTestReport
- test_normality(numeric_var: str, method: Literal['shapiro', 'kstest', 'anderson'] = 'shapiro') StatisticalTestReport[source]
Tests the normality of a numeric variable.
- Parameters:
numeric_var (str) – Numeric variable name.
method (str) – Default: ‘shapiro’. The normality test to use. Options: ‘shapiro’, ‘kstest’, ‘anderson’.
- Return type:
StatisticalTestReport
- ttest(numeric_var: str, stratify_by: str, strategy: Literal['auto', 'student', 'welch', 'yuen', 'mann-whitney'] = 'welch') StatisticalTestReport[source]
Conducts the appropriate statistical test to test for equal means between two groups. The parameter stratify_by must be the name of a binary variable, i.e. a categorical or numeric variable with exactly two unique values.
Null hypothesis: mu_1 = mu_2. Alternative hypothesis: mu_1 != mu_2. This is a two-sided test.
NaNs in numeric_var and stratify_by are dropped before the test is conducted.
- Parameters:
numeric_var (str) – Numeric variable name to be stratified and compared.
stratify_by (str) – Categorical or numeric variable name. Must be binary.
strategy (Literal['auto', 'student', 'welch', 'yuen', 'mann-whitney']) – Default: ‘welch’. If ‘auto’, a test is selected as follows: If the data in either group is not normally distributed, and the variances are not equal, then Yuen’s (20% trimmed mean) t-test is used. If the data in either group is not normally distributed, but the variances are equal, then the Mann-Whitney U test is used. If the data in both groups are normally distributed but the variances are not equal, Welch’s t-test is used. Otherwise, Student’s t-test is used.
- Return type:
StatisticalTestReport
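The selection rule described for strategy='auto' in ttest reduces to a small decision table. The sketch below restates that rule only; the function `choose_two_sample_test` is hypothetical, and how normality and equality of variances are assessed is left to the caller, since this reference does not specify it.

```python
def choose_two_sample_test(normal_1, normal_2, equal_variances):
    """Map the assumption checks for two groups to a strategy name."""
    both_normal = normal_1 and normal_2
    if not both_normal and not equal_variances:
        return "yuen"          # Yuen's (20% trimmed mean) t-test
    if not both_normal:
        return "mann-whitney"  # non-normal data, equal variances
    if not equal_variances:
        return "welch"         # normal data, unequal variances
    return "student"           # normal data, equal variances

choice = choose_two_sample_test(normal_1=True, normal_2=False, equal_variances=True)
```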
- value_counts(var: str, normalize: bool = False) DataFrame[source]
Returns the value counts for a given categorical variable as a DataFrame, with first column as the unique values and the second column as the counts.
- Parameters:
var (str) – Categorical variable name.
normalize (bool) – Default: False. If True, returns the value counts as proportions.
- Return type:
pd.DataFrame
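The effect of normalize can be shown with a small standard-library sketch. The helper below is a hypothetical stand-in that returns a list of (value, count) pairs instead of a DataFrame, and the most-frequent-first ordering is an assumption rather than documented behaviour.

```python
from collections import Counter

def value_counts(values, normalize=False):
    """Return (value, count) pairs, or (value, proportion) pairs if normalize."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = counts.most_common()  # (value, count) pairs, most frequent first
    if normalize:
        return [(value, count / total) for value, count in rows]
    return rows

observations = ["a", "b", "a", "c", "a", "b"]
counts = value_counts(observations)
proportions = value_counts(observations, normalize=True)
```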
tm._reports.VotingSelectionReport
- class tablemage._reports.VotingSelectionReport(selectors: list[BaseFS], dataemitter: DataEmitter, max_n_features: int | None = None, verbose: bool = True)[source]
Class for generating feature selection-relevant tables.
- all_features() list[source]
Returns a list of all features considered by the voting selectors.
- Returns:
All features.
- Return type:
list
tm._reports.StatisticalTestReport
- class tablemage._reports.StatisticalTestReport(description: str, statistic: float, pval: float, descriptive_statistic: float | None = None, degfree: float | None = None, statistic_description: str | None = None, descriptive_statistic_description: str | None = None, null_hypothesis_description: str | None = None, alternative_hypothesis_description: str | None = None, assumptions_description: str | list | None = None, long_description: str | None = None)[source]
Class for storing and displaying statistical test results.
tm._reports.CausalReport
- class tablemage._reports.CausalReport(estimate: float, se: float, n_units: int, n_units_treated: int, outcome_var: str, treatment_var: str, confounders: list[str], estimand: str, method: str, method_description: str, p_value: float | None = None)[source]
Class for storing and displaying causal inference results.