Agents (tm.agents)
The tablemage.agents module contains ChatDA, TableMage’s conversational chatbot for no-code data analysis.
tm.use_agents
tm.agents.ChatDA
- class tablemage.agents.ChatDA(df: DataFrame, df_test: DataFrame | None = None, test_size: float = 0.2, split_seed: int = 42, system_prompt: str = "You are ChatDA, an expert data scientist assistant specialized in tabular data analysis. Accuracy, precision, and statistical rigor are your top priorities. You are equipped with tools that are already connected to the user's dataset.\n\n## Your Capabilities\n\nYour tools span the following categories:\n\n1. **Exploratory Data Analysis**: Plotting, summary statistics (numeric and categorical), correlation analysis, value counts, and variable descriptions.\n\n2. **Statistical Testing**: t-tests (Welch/Student/Mann-Whitney), ANOVA (one-way/Kruskal-Wallis), chi-squared tests, and normality tests (Shapiro-Wilk/Kolmogorov-Smirnov/Anderson-Darling).\n\n3. **Machine Learning**: Multi-model regression and classification (OLS, Ridge, Lasso, ElasticNet, Random Forest, XGBoost, SVM, MLP), feature selection (Boruta, KBest), and clustering (KMeans, Gaussian Mixture).\n\n4. **Linear Regression**: Ordinary Least Squares (OLS) and Logistic Regression (Logit) with full coefficient tables, diagnostics, and plots.\n\n5. **Causal Inference**: Average Treatment Effect (ATE) and Average Treatment Effect on the Treated (ATT) estimation via inverse probability weighting (IPW).\n\n6. **Data Transformation**: Missing value imputation, scaling/normalization, one-hot encoding, feature engineering, and dropping sparse variables. Transformations can be reverted to restore the original dataset.\n\n7. **Python Code Execution**: Run custom Python code with access to the dataset (as pandas DataFrames) and matplotlib for custom analyses or plots.\n\n## How to Approach Analysis\n\n- **Before modeling**: Check for missing data, understand variable distributions, and verify assumptions. Suggest these steps if the user jumps straight to modeling.\n- **For statistical tests**: Check test assumptions first (e.g., normality before parametric tests). 
Report test statistics, p-values, and effect sizes when available. State conclusions in plain language.\n- **For machine learning**: Clarify the target variable and whether the task is regression or classification. Results are automatically evaluated on the held-out test set.\n- **For causal inference**: Ensure the treatment variable is binary. Discuss the choice of confounders with the user — causal conclusions depend on this.\n- **For data transformations**: Warn the user that transformations modify the dataset in place. Recommend saving state before major transformations.\n\n## Response Guidelines\n\n- Use as few tools as possible to answer each question.\n- The user can see your tools' output directly. Never refer to tool names or internal mechanics in your response.\n- Provide expert interpretation of results: what do the numbers mean, what is statistically significant, and what are the practical implications.\n- Be concise and conversational. When appropriate, suggest logical next steps.\n- If a request is too vague, ask clarifying questions to guide the user toward a specific, actionable analysis.\n- Do not fabricate results or reference figures and tables that were not generated.\n- Do not transform the target (y) variable for modeling tasks.\n", memory_size: int = 3000, tool_rag: bool = True, tool_rag_top_k: int = 5, tool_rag_prompt_augment: bool = True, python_only: bool = False, tools_only: bool = False, multimodal: bool = False, verbose: bool = False)[source]
Chat Data Analyst. A class for interacting with LLMs to analyze tabular data.
- __init__(df: DataFrame, df_test: DataFrame | None = None, test_size: float = 0.2, split_seed: int = 42, system_prompt: str = "You are ChatDA, an expert data scientist assistant specialized in tabular data analysis. Accuracy, precision, and statistical rigor are your top priorities. You are equipped with tools that are already connected to the user's dataset.\n\n## Your Capabilities\n\nYour tools span the following categories:\n\n1. **Exploratory Data Analysis**: Plotting, summary statistics (numeric and categorical), correlation analysis, value counts, and variable descriptions.\n\n2. **Statistical Testing**: t-tests (Welch/Student/Mann-Whitney), ANOVA (one-way/Kruskal-Wallis), chi-squared tests, and normality tests (Shapiro-Wilk/Kolmogorov-Smirnov/Anderson-Darling).\n\n3. **Machine Learning**: Multi-model regression and classification (OLS, Ridge, Lasso, ElasticNet, Random Forest, XGBoost, SVM, MLP), feature selection (Boruta, KBest), and clustering (KMeans, Gaussian Mixture).\n\n4. **Linear Regression**: Ordinary Least Squares (OLS) and Logistic Regression (Logit) with full coefficient tables, diagnostics, and plots.\n\n5. **Causal Inference**: Average Treatment Effect (ATE) and Average Treatment Effect on the Treated (ATT) estimation via inverse probability weighting (IPW).\n\n6. **Data Transformation**: Missing value imputation, scaling/normalization, one-hot encoding, feature engineering, and dropping sparse variables. Transformations can be reverted to restore the original dataset.\n\n7. **Python Code Execution**: Run custom Python code with access to the dataset (as pandas DataFrames) and matplotlib for custom analyses or plots.\n\n## How to Approach Analysis\n\n- **Before modeling**: Check for missing data, understand variable distributions, and verify assumptions. Suggest these steps if the user jumps straight to modeling.\n- **For statistical tests**: Check test assumptions first (e.g., normality before parametric tests). 
Report test statistics, p-values, and effect sizes when available. State conclusions in plain language.\n- **For machine learning**: Clarify the target variable and whether the task is regression or classification. Results are automatically evaluated on the held-out test set.\n- **For causal inference**: Ensure the treatment variable is binary. Discuss the choice of confounders with the user — causal conclusions depend on this.\n- **For data transformations**: Warn the user that transformations modify the dataset in place. Recommend saving state before major transformations.\n\n## Response Guidelines\n\n- Use as few tools as possible to answer each question.\n- The user can see your tools' output directly. Never refer to tool names or internal mechanics in your response.\n- Provide expert interpretation of results: what do the numbers mean, what is statistically significant, and what are the practical implications.\n- Be concise and conversational. When appropriate, suggest logical next steps.\n- If a request is too vague, ask clarifying questions to guide the user toward a specific, actionable analysis.\n- Do not fabricate results or reference figures and tables that were not generated.\n- Do not transform the target (y) variable for modeling tasks.\n", memory_size: int = 3000, tool_rag: bool = True, tool_rag_top_k: int = 5, tool_rag_prompt_augment: bool = True, python_only: bool = False, tools_only: bool = False, multimodal: bool = False, verbose: bool = False)[source]
Initializes the ChatDA object.
- Parameters:
df (pd.DataFrame) – The DataFrame to build the Analyzer for.
df_test (pd.DataFrame | None) – The test DataFrame to use for the Analyzer. Defaults to None.
test_size (float) – The size of the test set. Defaults to 0.2.
split_seed (int) – The seed to use for the train-test split. Default is 42.
system_prompt (str) – The system prompt to use for the LLM. Default is provided.
memory_size (int) – The size of the agent's memory buffer, expressed as a token limit. Default is 3000.
tool_rag (bool) – If True, the RAG-based tooling is used. Default is True.
tool_rag_top_k (int) – The top-k value to use for the RAG-based tooling. Default is 5.
tool_rag_prompt_augment (bool) – If True, the RAG tooling prompts are augmented with history. Default is True.
python_only (bool) – If True, only the Python environment is provided. Default is False.
tools_only (bool) – If True, only the non-coding tools are provided. Otherwise, the Python environment is also provided. python_only and tools_only cannot be True at the same time. Default is False.
multimodal (bool) – If True, multimodal LLM is used only for interpreting figures. Default is False.
verbose (bool) – If True, prints LlamaIndex agent thoughts and tool outputs. Default is False.
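A minimal sketch of how the parameters above might be assembled before constructing an agent. The helpers chatda_kwargs and make_agent are hypothetical (they are not part of TableMage); they only mirror the documented defaults and the rule that python_only and tools_only cannot both be True. The call to tm.use_agents() with no arguments is an assumption, since its signature is not shown in this section.

```python
def chatda_kwargs(python_only=False, tools_only=False, **overrides):
    """Hypothetical helper: collect keyword arguments for ChatDA."""
    # The documentation states these two flags cannot both be True.
    if python_only and tools_only:
        raise ValueError("python_only and tools_only cannot both be True")
    kwargs = {
        "test_size": 0.2,        # documented default
        "split_seed": 42,        # documented default
        "memory_size": 3000,     # documented default (token limit)
        "tool_rag": True,        # documented default
        "tool_rag_top_k": 5,     # documented default
        "python_only": python_only,
        "tools_only": tools_only,
    }
    kwargs.update(overrides)
    return kwargs


def make_agent(df, **options):
    """Build a ChatDA agent; requires tablemage and its agent dependencies."""
    import tablemage as tm  # imported lazily so chatda_kwargs works without it

    tm.use_agents()  # assumption: called without arguments before using tm.agents
    return tm.agents.ChatDA(df, **chatda_kwargs(**options))


# Example: a tools-only configuration that retrieves 8 tools per query.
settings = chatda_kwargs(tools_only=True, tool_rag_top_k=8)
print(settings["tools_only"], settings["tool_rag_top_k"])
```

Keeping the validation in a separate function means a conflicting configuration can be rejected before any LLM or dataset is involved.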
tm.agents.ChatDA_UserInterface
- class tablemage.agents.ChatDA_UserInterface(split_seed: int | None = None, system_prompt: str | None = None, memory_size: int | None = None, tool_rag: bool | None = None, tool_rag_top_k: int | None = None, python_only: bool | None = None, tools_only: bool | None = None, multimodal: bool | None = None)[source]
- __init__(split_seed: int | None = None, system_prompt: str | None = None, memory_size: int | None = None, tool_rag: bool | None = None, tool_rag_top_k: int | None = None, python_only: bool | None = None, tools_only: bool | None = None, multimodal: bool | None = None)[source]
Makes a user interface for the ChatDA agent.
- Parameters:
split_seed (int | None) – If None, default seed is used.
system_prompt (str | None) – If None, default system prompt is used.
memory_size (int | None) – The size of the memory buffer. If None, the default memory size is used.
tool_rag (bool | None) – If None, default tool RAG flag is used. If True, tool RAG is used. If False, tool RAG is not used, and all tools are provided to the agent for each query.
tool_rag_top_k (int | None) – The number of tools to provide to the agent for each query. If None, the default top-k value is used.
python_only (bool | None) – If None, the default Python-only flag is used. If True, only the Python environment is provided. If False, all tools are used.
tools_only (bool | None) – If None, the default tools-only flag is used. If True, only the non-coding tools are used. If False, the Python environment is also provided.
multimodal (bool | None) – If None, default multimodal flag is used. If True, multimodal model is used for image interpretation.
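The "None means use the default" convention above can be illustrated with a short sketch. The helper resolve_ui_options and the CHATDA_DEFAULTS table are hypothetical; the sketch assumes the interface falls back to the same defaults shown in the ChatDA signature, which this section does not state explicitly. The system_prompt option is left out because its default is long.

```python
# Assumption: these mirror the defaults in the ChatDA signature.
CHATDA_DEFAULTS = {
    "split_seed": 42,
    "memory_size": 3000,
    "tool_rag": True,
    "tool_rag_top_k": 5,
    "python_only": False,
    "tools_only": False,
    "multimodal": False,
}


def resolve_ui_options(**overrides):
    """Hypothetical helper: replace every None (or missing) option with its default."""
    unknown = set(overrides) - set(CHATDA_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    return {
        # Compare against None explicitly so that False and 0 are kept as given.
        name: default if overrides.get(name) is None else overrides[name]
        for name, default in CHATDA_DEFAULTS.items()
    }


resolved = resolve_ui_options(memory_size=5000, tool_rag=None, multimodal=False)
print(resolved["memory_size"], resolved["tool_rag"], resolved["multimodal"])
```

Testing for None explicitly, not for truthiness, matters here: an explicit tool_rag=False must disable tool RAG and must not be mistaken for "use the default".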
tm.agents.options
- tablemage.agents.options.set_llm(llm_type: Literal['openai', 'groq', 'ollama'], model_name: str | None = None, temperature: float = 0.1) None
Sets the LLM type.
- Parameters:
llm_type (Literal["openai", "groq", "ollama"]) – The type of LLM to use.
model_name (str, optional) – The name of the model to use, by default None. If None, the default model for llm_type will be used.
temperature (float, optional) – The temperature to use for the LLM, by default 0.1.
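As an illustration of the set_llm signature, the sketch below checks the arguments before forwarding them. The helpers llm_settings and configure_llm are hypothetical; only the three llm_type values and the default temperature of 0.1 come from the signature above, and the tm.use_agents() call without arguments is an assumption.

```python
from typing import Literal, Optional, get_args

# The three backends listed in the set_llm signature.
LLMType = Literal["openai", "groq", "ollama"]


def llm_settings(
    llm_type: str,
    model_name: Optional[str] = None,
    temperature: float = 0.1,
) -> dict:
    """Hypothetical helper: validate and collect arguments for set_llm."""
    if llm_type not in get_args(LLMType):
        raise ValueError(f"llm_type must be one of {get_args(LLMType)}")
    # model_name=None lets set_llm pick the default model for llm_type.
    return {
        "llm_type": llm_type,
        "model_name": model_name,
        "temperature": temperature,
    }


def configure_llm(**kwargs) -> None:
    """Apply the settings; requires tablemage and its agent dependencies."""
    import tablemage as tm  # imported lazily so llm_settings works without it

    tm.use_agents()  # assumption: called without arguments before using tm.agents
    tm.agents.options.set_llm(**llm_settings(**kwargs))


print(llm_settings("ollama"))
```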
- tablemage.agents.options.set_multimodal_llm(llm_type: Literal['openai'], model_name: str | None = None, temperature: float = 0.1) None
Sets the multimodal LLM type.
- Parameters:
llm_type (Literal["openai"]) – The type of multimodal LLM to use.