Dictionary of dataframes from generated code not handled correctly #1497

francler · 2024-12-31T09:35:03Z

System Info

pandasai==2.4.1
python==3.12.2

ProductName: macOS
ProductVersion: 15.1.1
BuildVersion: 24B91

🐛 Describe the bug

I am using PandasAI in a complex codebase. Note that I am creating an Agent with two data frames with different columns:

self.agent = Agent(
    dfs=[self.df_1, self.df_2],
    description="Whatever the query, DO NOT MAKE PLOTS",
    config={
        "llm": self.llm,
        "save_charts": True,
        "enable_cache": False,
        "conversational": True,
        "save_charts_path": self.save_charts_path,
        "enable_code": True,
        "allowed_modules": ["pandas", "numpy"],
        "verbose": True,
    },
    memory_size=10,
)

where df_1 and df_2 are valid pandas dataframes.
For some queries, PandasAI generates code like this:

successful_actions = dfs[0]
unsuccessful_actions = dfs[1]
best_actions  = allowed_queries[(successful_actions['success'] > 0.8)][['type', 'result', 'success', 'relevancy']]
struggled_actions= unsuccessful_actions[['action', 'type']]
result = {'type': 'dataframe', 'value': {'best_actions': best_actions, 'struggled_actions': struggled_actions}}

When this code is executed, the value of result is incompatible with line 11 of the file pandasai/responses/response_serializer.py:

json_data = json.loads(df.to_json(orient="split", date_format="iso"))

because the value is a dictionary of dataframes, not a single dataframe. This leads to the following error:

2024-12-31 10:26:04 [ERROR] Pipeline failed on step 6: 'dict' object has no attribute 'to_json'
Traceback (most recent call last):
  File "...pandasai/pipelines/chat/generate_chat_pipeline.py", line 335, in run
    ).run(input)
      ^^^^^^^^^^
  File "...pandasai/pipelines/pipeline.py", line 137, in run
    raise e
  File "...pandasai/pipelines/pipeline.py", line 101, in run
    step_output = logic.execute(
                  ^^^^^^^^^^^^^^
  File "...pandasai/pipelines/chat/code_execution.py", line 133, in execute
    {"content_type": "response", "value": ResponseSerializer.serialize(result)},
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...pandasai/responses/response_serializer.py", line 27, in serialize
    df_dict = ResponseSerializer.serialize_dataframe(result["value"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...pandasai/responses/response_serializer.py", line 11, in serialize_dataframe
    json_data = json.loads(df.to_json(orient="split", date_format="iso"))
                           ^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'to_json'
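
The failure can be reproduced with plain pandas, outside PandasAI. The following is a minimal sketch: serialize_dataframe here is a stand-in that mirrors only the single line quoted from the traceback, not the library's actual method, and the sample data is made up for illustration.

```python
import json

import pandas as pd


def serialize_dataframe(df):
    # Stand-in that mirrors the line quoted in the traceback;
    # this is NOT PandasAI's actual implementation.
    return json.loads(df.to_json(orient="split", date_format="iso"))


# Made-up sample frames, shaped loosely like the ones in the generated code.
best_actions = pd.DataFrame({"type": ["a"], "success": [0.9]})
struggled_actions = pd.DataFrame({"action": ["x"], "type": ["b"]})

# A single DataFrame serializes without problems.
single = serialize_dataframe(best_actions)

# A dict of DataFrames, as produced by the generated code, does not:
# a plain dict has no to_json method.
value = {"best_actions": best_actions, "struggled_actions": struggled_actions}
try:
    serialize_dataframe(value)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```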

Note that strong prompting reduces how often this happens, but it does not eliminate the problem.

dosubot · 2024-12-31T09:36:17Z

Hey there, @francler! I'm here to help you with any bugs, questions, or contributions. Let's tackle this together!

The issue you're encountering arises because the ResponseSerializer class in PandasAI is expecting a DataFrame, but the code generated by PandasAI is returning a dictionary of DataFrames. The serialize_dataframe method is trying to call to_json on a dictionary, which leads to the AttributeError.

To address this, you need to modify the handling of the result in the ResponseSerializer class to accommodate dictionaries of DataFrames. Here's a potential approach:

Modify the serialize Method: Update the serialize method to check if the result["value"] is a dictionary. If it is, iterate over the dictionary and serialize each DataFrame individually.

Example Code:

@staticmethod
def serialize(result):
    if result["type"] == "dataframe":
        if isinstance(result["value"], dict):
            # Handle dictionary of DataFrames
            serialized_dict = {}
            for key, df in result["value"].items():
                if isinstance(df, pd.Series):
                    df = df.to_frame()
                serialized_dict[key] = ResponseSerializer.serialize_dataframe(df)
            return {"type": result["type"], "value": serialized_dict}
        else:
            # Handle single DataFrame
            if isinstance(result["value"], pd.Series):
                result["value"] = result["value"].to_frame()
            df_dict = ResponseSerializer.serialize_dataframe(result["value"])
            return {"type": result["type"], "value": df_dict}
    elif result["type"] == "plot" and isinstance(result["value"], str):
        # Existing plot handling code
        ...
    else:
        return result

This modification checks if the value is a dictionary and then serializes each DataFrame within it. This should resolve the error you're facing by ensuring that each DataFrame in the dictionary is correctly serialized before being returned.
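
The same idea can be tried in isolation before patching the library. The sketch below is standalone and has not been verified against PandasAI 2.4.1: serialize_frame, serialize_value, and deserialize_frame are hypothetical helper names, and the sample data is made up. Later pipeline steps that consume the serialized value may also need to handle the nested shape.

```python
import json

import pandas as pd


def serialize_frame(obj):
    # Hypothetical helper: convert one DataFrame (or Series) to a JSON-safe dict.
    if isinstance(obj, pd.Series):
        obj = obj.to_frame()
    return json.loads(obj.to_json(orient="split", date_format="iso"))


def serialize_value(value):
    # Hypothetical helper: accept either a single frame or a dict of frames.
    if isinstance(value, dict):
        return {key: serialize_frame(item) for key, item in value.items()}
    return serialize_frame(value)


def deserialize_frame(payload):
    # Rebuild a DataFrame from the "split" layout (columns / index / data).
    return pd.DataFrame(
        payload["data"], index=payload["index"], columns=payload["columns"]
    )


# Made-up sample frames for illustration.
frames = {
    "best_actions": pd.DataFrame({"type": ["a"], "success": [0.9]}),
    "struggled_actions": pd.DataFrame({"action": ["x"], "type": ["b"]}),
}

serialized = serialize_value(frames)
restored = {key: deserialize_frame(item) for key, item in serialized.items()}
```

Each entry in serialized is a plain dict with columns, index, and data keys, so the whole structure can be passed to json.dumps without further conversion.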

dosubot bot added the bug Something isn't working label Dec 31, 2024