[BUG] cudf.pandas doesn't work with pandasai (LLM-based chat interface to pandas dataframes) #14384

Closed
beckernick opened this issue Nov 9, 2023 · 1 comment · Fixed by #14388
Labels
bug Something isn't working

beckernick commented Nov 9, 2023

I've been testing pandasai with cudf.pandas and hit a surprising error.

This code works:

import pandas as pd
from pandasai import SmartDataframe

df = pd.DataFrame({"a":[0,1,2]})
df = SmartDataframe(df)

But this doesn't:

%load_ext cudf.pandas
import pandas as pd
from pandasai import SmartDataframe

df = pd.DataFrame({"a":[0,1,2]})
df = SmartDataframe(df)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 5
      2 from pandasai import SmartDataframe
      4 df = pd.DataFrame({"a":[0,1,2]})
----> 5 df = SmartDataframe(df)

File ~/rapids-2310/lib/python3.10/site-packages/pandasai/smart_dataframe/__init__.py:277, in SmartDataframe.__init__(self, df, name, description, sample_head, config, logger)
    275     if description is None:
    276         description = df_config["description"]
--> 277 self._core = SmartDataframeCore(df, logger)
    279 self._table_description = description
    280 self._table_name = name

File ~/rapids-2310/lib/python3.10/site-packages/pandasai/smart_dataframe/__init__.py:64, in SmartDataframeCore.__init__(self, df, logger)
     62 def __init__(self, df: DataFrameType, logger: Logger = None):
     63     self._logger = logger
---> 64     self._load_dataframe(df)

File ~/rapids-2310/lib/python3.10/site-packages/pandasai/smart_dataframe/__init__.py:93, in SmartDataframeCore._load_dataframe(self, df)
     89         raise ValueError(
     90             "Invalid input data. We cannot convert it to a dataframe."
     91         ) from e
     92 else:
---> 93     self.dataframe = df

File ~/rapids-2310/lib/python3.10/site-packages/pandasai/smart_dataframe/__init__.py:208, in SmartDataframeCore.dataframe(self, df)
    205 self._df = df
    207 if df is not None:
--> 208     self._load_engine()

File ~/rapids-2310/lib/python3.10/site-packages/pandasai/smart_dataframe/__init__.py:124, in SmartDataframeCore._load_engine(self)
    121 engine = df_type(self._df)
    123 if engine is None:
--> 124     raise ValueError(
    125         "Invalid input data. Must be a Pandas or Polars dataframe."
    126     )
    128 self._engine = engine

ValueError: Invalid input data. Must be a Pandas or Polars dataframe.

The error is surprising. The root cause doesn't show up in the stack trace because the function responsible doesn't raise; it returns None instead of "pandas", and the ValueError is only raised later in _load_engine. We're failing in this function:

def df_type(df: DataFrameType) -> Union[str, None]:
    """
    Returns the type of the dataframe.

    Args:
        df (DataFrameType): Pandas or Polars dataframe

    Returns:
        str: Type of the dataframe
    """
    if polars_imported and isinstance(df, pl.DataFrame):
        return "polars"
    elif isinstance(df, pd.DataFrame):
        return "pandas"
    else:
        return None

Seems like the isinstance check should pass, right? When I copy the function out of the module and run it in my notebook, it returns "pandas" as expected. But inside the pandasai module, it returns None.
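One way to see how the same function source can behave differently in two places: a function looks up names like DataFrame in the globals of the module it was defined in, not in the caller's namespace. The sketch below is a pure-Python simulation under that assumption. RawFrame, ProxyFrame, make_module, and the module names are hypothetical stand-ins, not pandas, cudf, or pandasai code.

```python
import types

# The same function source is loaded into two separate namespaces.
FUNC_SRC = '''
def df_type(df):
    # "DataFrame" is resolved in the globals of the defining module.
    return "pandas" if isinstance(df, DataFrame) else None
'''


class RawFrame:
    """Hypothetical stand-in for the unwrapped DataFrame class."""


class ProxyFrame:
    """Hypothetical stand-in for the proxy DataFrame class."""


def make_module(name, frame_cls):
    """Create a module whose global `DataFrame` is bound to frame_cls."""
    mod = types.ModuleType(name)
    mod.DataFrame = frame_cls
    exec(FUNC_SRC, mod.__dict__)
    return mod


notebook_ns = make_module("notebook_ns", ProxyFrame)  # sees the proxy class
library_ns = make_module("library_ns", RawFrame)      # sees the raw class

df = ProxyFrame()
result_notebook = notebook_ns.df_type(df)
result_library = library_ns.df_type(df)
print(result_notebook, result_library)
```

Identical code, different answers: the outcome depends only on which class object each module's DataFrame name is bound to.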

The following print statements I added to the pandasai module code suggest something is going wrong:

def df_type(df: DataFrameType) -> Union[str, None]:
    """
    Returns the type of the dataframe.

    Args:
        df (DataFrameType): Pandas or Polars dataframe

    Returns:
        str: Type of the dataframe
    """
    print(pd)
    print(f"type(type(df)): {type(type(df))}")
    print(f"type(pd.DataFrame): {type(pd.DataFrame)}")
    print(f"isinstance(df, pd.DataFrame): {isinstance(df, pd.DataFrame)}")
    if polars_imported and isinstance(df, pl.DataFrame):
        return "polars"
    elif isinstance(df, pd.DataFrame):
        return "pandas"
    else:
        return None

%load_ext cudf.pandas
import pandas as pd
from pandasai import SmartDataframe

df = pd.DataFrame({"a":[0,1,2]})
df = SmartDataframe(df)

<module 'pandas' (ModuleAccelerator(fast=cudf, slow=pandas))>
type(type(df)): <class 'cudf.pandas.fast_slow_proxy._FastSlowProxyMeta'>
type(pd.DataFrame): <class 'type'>
isinstance(df, pd.DataFrame): False

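Inside the pandasai module, pd.DataFrame refers to the raw pandas DataFrame class (its type is plain type), not the wrapped proxy class, while df is a proxy object whose class has the metaclass _FastSlowProxyMeta. So the isinstance check returns False.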
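The print-based debugging above could be packaged as a small helper. This is a minimal sketch, not something from pandasai or cudf: explain_isinstance, ProxyMeta, RawDataFrame, and ProxyDataFrame are all hypothetical names, and the proxy is only simulated with an empty metaclass.

```python
class ProxyMeta(type):
    """Hypothetical stand-in for a proxy metaclass (not cudf's implementation)."""


class RawDataFrame:
    """Hypothetical stand-in for the unwrapped class."""


class ProxyDataFrame(metaclass=ProxyMeta):
    """Hypothetical stand-in for the wrapped proxy class."""


def explain_isinstance(obj, cls):
    """Collect the facts that determine the result of isinstance(obj, cls)."""
    obj_cls = type(obj)
    return {
        "object_class": f"{obj_cls.__module__}.{obj_cls.__qualname__}",
        "object_metaclass": type(obj_cls).__name__,
        "target_class": f"{cls.__module__}.{cls.__qualname__}",
        "target_metaclass": type(cls).__name__,
        "same_class_object": obj_cls is cls,
        "isinstance": isinstance(obj, cls),
    }


df = ProxyDataFrame()
report_raw = explain_isinstance(df, RawDataFrame)      # mismatch case
report_proxy = explain_isinstance(df, ProxyDataFrame)  # matching case
for key, value in report_raw.items():
    print(f"{key}: {value}")
```

Comparing the two metaclass fields is what gives the mismatch away here: one side is the proxy metaclass and the other is plain type.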

In a chat this morning, @shwina wondered if this might be happening because the module name starts with "pandas"!

beckernick added the bug (Something isn't working) and pandas labels Nov 9, 2023
@beckernick
Member Author

Per additional exploration by @shwina, it seems the issue may be related to this behavior, which we looked at a few days ago:

%load_ext cudf.pandas

import pandas as pd
print(type(pd.DataFrame))
print(getattr(pd, "DataFrame"))

<class 'cudf.pandas.fast_slow_proxy._FastSlowProxyMeta'>
<class 'pandas.core.frame.DataFrame'>

rapids-bot bot pushed a commit that referenced this issue Nov 14, 2023
Closes #14384. `x.startswith(y)` is not a good enough check for if `x` is a subdirectory of `y`. It causes `pandasai` to be reported as a sub-package of `pandas`.

Authors:
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - https://github.com/brandon-b-miller

URL: #14388
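
To illustrate the prefix problem described in the commit message, here is a minimal sketch comparing a plain string-prefix check with checks that respect path and module boundaries. It is not the actual change in #14388. The function names and example paths are hypothetical, and PurePosixPath is used only to keep the example platform-independent.

```python
from pathlib import PurePosixPath


def naive_is_subpath(path: str, parent: str) -> bool:
    """String-prefix check: also matches siblings that share a prefix."""
    return path.startswith(parent)


def is_subpath(path: str, parent: str) -> bool:
    """Component-wise check: parent must be path itself or one of its ancestors."""
    p, q = PurePosixPath(path), PurePosixPath(parent)
    return p == q or q in p.parents


def is_submodule(name: str, package: str) -> bool:
    """Dotted-name check: require an exact match or a '.' boundary."""
    return name == package or name.startswith(package + ".")


pandas_dir = "/site-packages/pandas"                 # hypothetical install path
pandasai_file = "/site-packages/pandasai/__init__.py"
pandas_file = "/site-packages/pandas/core/frame.py"

print(naive_is_subpath(pandasai_file, pandas_dir))  # prefix match on a sibling
print(is_subpath(pandasai_file, pandas_dir))
print(is_subpath(pandas_file, pandas_dir))
print(is_submodule("pandasai", "pandas"))
print(is_submodule("pandas.core.frame", "pandas"))
```

The string-prefix check treats pandasai as living under pandas because "pandasai" begins with "pandas". Comparing whole path components, or requiring a "." boundary for module names, avoids that.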
copy-pr-bot bot pushed a commit that referenced this issue Nov 22, 2023
vyasr removed the pandas label May 16, 2024