Dictionary of dataframes from generated code not handled correctly #1497

Open
francler opened this issue Dec 31, 2024 · 1 comment
Labels
bug Something isn't working

francler commented Dec 31, 2024

System Info

pandasai==2.4.1
python==3.12.2

ProductName: macOS
ProductVersion: 15.1.1
BuildVersion: 24B91

🐛 Describe the bug

I am using PandasAI inside a larger codebase. I create an Agent with two DataFrames that have different columns:

self.agent = Agent(
    dfs=[self.df_1, self.df_2],
    description="Whatever the query, DO NOT MAKE PLOTS",
    config={
        "llm": self.llm,
        "save_charts": True,
        "enable_cache": False,
        "conversational": True,
        "save_charts_path": self.save_charts_path,
        "enable_code": True,
        "allowed_modules": ["pandas", "numpy"],
        "verbose": True,
    },
    memory_size=10,
)

Here df_1 and df_2 are valid pandas DataFrames.
For some queries, PandasAI generates code like this:

successful_actions = dfs[0]
unsuccessful_actions = dfs[1]
best_actions  = allowed_queries[(successful_actions['success'] > 0.8)][['type', 'result', 'success', 'relevancy']]
struggled_actions= unsuccessful_actions[['action', 'type']]
result = {'type': 'dataframe', 'value': {'best_actions': best_actions, 'struggled_actions': struggled_actions}}

When this code runs, the value it returns is incompatible with the call on line 11 of pandasai/responses/response_serializer.py:

json_data = json.loads(df.to_json(orient="split", date_format="iso"))

because result["value"] is a dictionary of DataFrames rather than a single DataFrame, which leads to this error:

2024-12-31 10:26:04 [ERROR] Pipeline failed on step 6: 'dict' object has no attribute 'to_json'
Traceback (most recent call last):
  File "...pandasai/pipelines/chat/generate_chat_pipeline.py", line 335, in run
    ).run(input)
      ^^^^^^^^^^
  File "...pandasai/pipelines/pipeline.py", line 137, in run
    raise e
  File "...pandasai/pipelines/pipeline.py", line 101, in run
    step_output = logic.execute(
                  ^^^^^^^^^^^^^^
  File "...pandasai/pipelines/chat/code_execution.py", line 133, in execute
    {"content_type": "response", "value": ResponseSerializer.serialize(result)},
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...pandasai/responses/response_serializer.py", line 27, in serialize
    df_dict = ResponseSerializer.serialize_dataframe(result["value"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...pandasai/responses/response_serializer.py", line 11, in serialize_dataframe
    json_data = json.loads(df.to_json(orient="split", date_format="iso"))
                           ^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'to_json'
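
The failure can be reproduced outside PandasAI. The sketch below is a minimal stand-in, not the library's code: the local function `serialize_dataframe` only copies the one `to_json` call quoted above, and the two small DataFrames are made-up sample data.

```python
import json

import pandas as pd


def serialize_dataframe(df):
    # Stand-in for the call quoted from response_serializer.py;
    # it assumes `df` is a DataFrame and calls to_json on it directly.
    return json.loads(df.to_json(orient="split", date_format="iso"))


# Made-up sample frames with different columns, as in the report.
best_actions = pd.DataFrame({"type": ["a"], "success": [0.9]})
struggled_actions = pd.DataFrame({"action": ["x"], "type": ["b"]})

# Same shape as the generated code's result: "value" is a dict, not a DataFrame.
result = {
    "type": "dataframe",
    "value": {
        "best_actions": best_actions,
        "struggled_actions": struggled_actions,
    },
}

try:
    serialize_dataframe(result["value"])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The error is raised on the dictionary itself, before either DataFrame is touched, so the contents of the frames do not matter.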

Strong prompting reduces how often this happens, but it does not eliminate it.

dosubot bot commented Dec 31, 2024

Hey there, @francler! I'm here to help you with any bugs, questions, or contributions. Let's tackle this together!

The error occurs because the ResponseSerializer class in PandasAI expects a single DataFrame, but the generated code returns a dictionary of DataFrames. The serialize_dataframe method then calls to_json on that dictionary, which raises the AttributeError.

To address this, the ResponseSerializer class would need to handle dictionaries of DataFrames as well. Here's a potential approach:

  1. Modify the serialize Method: Update the serialize method to check whether result["value"] is a dictionary. If it is, iterate over the dictionary and serialize each DataFrame individually.

  2. Example Code:

    @staticmethod
    def serialize(result):
        if result["type"] == "dataframe":
            if isinstance(result["value"], dict):
                # Handle dictionary of DataFrames
                serialized_dict = {}
                for key, df in result["value"].items():
                    if isinstance(df, pd.Series):
                        df = df.to_frame()
                    serialized_dict[key] = ResponseSerializer.serialize_dataframe(df)
                return {"type": result["type"], "value": serialized_dict}
            else:
                # Handle single DataFrame
                if isinstance(result["value"], pd.Series):
                    result["value"] = result["value"].to_frame()
                df_dict = ResponseSerializer.serialize_dataframe(result["value"])
                return {"type": result["type"], "value": df_dict}
        elif result["type"] == "plot" and isinstance(result["value"], str):
            # Existing plot handling code
            ...
        else:
            return result

This modification checks whether the value is a dictionary and, if so, serializes each DataFrame in it separately. That should avoid the AttributeError, because to_json is then only ever called on DataFrames.
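
The same idea can be tried in isolation before patching the library. The following is a minimal sketch under stated assumptions: `frame_to_split_dict` and `serialize_value` are hypothetical helper names, not part of PandasAI, and the output keeps the raw `orient="split"` layout rather than whatever structure the real serialize_dataframe returns.

```python
import json

import pandas as pd


def frame_to_split_dict(frame):
    # Hypothetical helper: convert one DataFrame (or Series) to a
    # JSON-compatible dict using the same to_json call as the traceback.
    if isinstance(frame, pd.Series):
        frame = frame.to_frame()
    return json.loads(frame.to_json(orient="split", date_format="iso"))


def serialize_value(value):
    # Hypothetical helper: accept either a single frame or a dict of frames.
    if isinstance(value, dict):
        return {key: frame_to_split_dict(frame) for key, frame in value.items()}
    return frame_to_split_dict(value)


# Made-up sample data with different columns per frame.
frames = {
    "best_actions": pd.DataFrame({"type": ["a", "b"], "success": [0.9, 0.95]}),
    "struggled_actions": pd.DataFrame({"action": ["x"], "type": ["c"]}),
}

serialized = serialize_value(frames)
print(json.dumps(serialized, indent=2))
```

One caveat with this design: the serialized value becomes a dictionary keyed by name instead of a single table, so any code that consumes a "dataframe" response would also have to accept that shape.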

@dosubot dosubot bot added the bug Something isn't working label Dec 31, 2024