feat: make pandas and NumPy optional dependencies, don't require PyArrow for plotting with Polars/Modin/cuDF #3452
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -468,12 +468,13 @@ def sanitize_narwhals_dataframe( | |
return data.select(columns) | ||
|
||
|
||
def narwhalify(data: DataType) -> nw.DataFrame: | ||
"""Wrap `data` in `narwhals.DataFrame`. | ||
def narwhalify(data: DataType) -> nw.DataFrame[Any]: | ||
"""Wrap `data` in `narwhals.DataFrame` (if possible). | ||
|
||
If `data` is not supported by Narwhals, but it is convertible | ||
to a PyArrow table, then first convert to a PyArrow Table, | ||
and then wrap in `narwhals.DataFrame`. | ||
If it can't even be converted to a PyArrow Table, return as-is. | ||
""" | ||
# Using `strict=False` will return `data` as-is if the object cannot be converted. | ||
data = nw.from_native(data, eager_only=True, strict=False) | ||
|
@@ -634,7 +635,7 @@ def parse_shorthand( | |
if "type" not in attrs and is_data_type(data): | ||
data_nw = narwhalify(data) | ||
Something important that I think we're losing here is that previously we were checking the types of columns in DataFrame Interchange Protocol DataFrames without loading them all into memory as pyarrow (which is what `narwhalify` does for types that narwhals doesn't support). Ibis has an optimization where calling When used with plain Altair, the full dataset is loaded into memory anyway during the So unless/until narwhals can support wrapping DataFrame Interchange Protocol DataFrames directly, I think we need to avoid the arrow conversion here and fall back to the legacy |
||
unescaped_field = attrs["field"].replace("\\", "") | ||
if isinstance(data_nw, nw.DataFrame) and unescaped_field in data_nw.columns: | ||
if unescaped_field in data_nw.columns: | ||
column = data_nw[unescaped_field] | ||
if column.dtype in {nw.Object, nw.Unknown} and _is_pandas_dataframe(data): | ||
attrs["type"] = infer_vegalite_type_for_pandas(nw.to_native(column)) | ||
|
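The concern raised in the review comment on `parse_shorthand` can be illustrated with a small, library-free sketch. It is not Altair's or Ibis's code: `LazyTable`, `_LazyColumn`, and `infer_type_without_loading` are hypothetical names, and the mapping from dtype kinds to Vega-Lite types is an assumption made for illustration. Only the `__dataframe__` entry point, `column_names`, `get_column_by_name`, the shape of the `dtype` tuple, and the `DtypeKind` integer codes follow the DataFrame Interchange Protocol. The point is that a column's type is available as metadata, so it can be read without materializing any rows.

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    # Integer codes as defined by the DataFrame Interchange Protocol.
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21
    DATETIME = 22
    CATEGORICAL = 23


class _LazyColumn:
    """Hypothetical column whose dtype is known from schema metadata alone."""

    def __init__(self, table, kind):
        self._table = table
        self._kind = kind

    @property
    def dtype(self):
        # Tuple layout per the protocol: (kind, bit width, format string, endianness).
        # The last three values are placeholders in this toy model.
        return (self._kind, 64, "", "=")

    def get_buffers(self):
        # Only reading the buffers forces the (possibly expensive) load.
        self._table.rows_loaded = True
        return {}


class LazyTable:
    """Hypothetical stand-in for a lazily evaluated table, such as a database query."""

    def __init__(self, schema):
        self._schema = schema
        self.rows_loaded = False

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # For brevity the table acts as its own interchange object.
        return self

    def column_names(self):
        return list(self._schema)

    def get_column_by_name(self, name):
        return _LazyColumn(self, self._schema[name])


# Assumed mapping from dtype kind to Vega-Lite type, for illustration only.
_VEGALITE_TYPES = {
    DtypeKind.INT: "quantitative",
    DtypeKind.UINT: "quantitative",
    DtypeKind.FLOAT: "quantitative",
    DtypeKind.BOOL: "nominal",
    DtypeKind.STRING: "nominal",
    DtypeKind.CATEGORICAL: "nominal",
    DtypeKind.DATETIME: "temporal",
}


def infer_type_without_loading(data, field):
    """Look up a column's type from protocol metadata; never touch the buffers."""
    dfi = data.__dataframe__()
    if field not in dfi.column_names():
        return None
    kind = dfi.get_column_by_name(field).dtype[0]
    return _VEGALITE_TYPES.get(kind)


table = LazyTable({"a": DtypeKind.INT, "b": DtypeKind.DATETIME})
print(infer_type_without_loading(table, "b"))
print(table.rows_loaded)
```

In this toy model `rows_loaded` stays `False` after the lookup, which is the property the comment asks to preserve: converting to a PyArrow table first would be the equivalent of calling `get_buffers` on every column.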
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -67,6 +67,7 @@ all = [ | |
dev = [ | ||
"hatch", | ||
"ruff>=0.5.1", | ||
"ibis-framework", | ||
"ipython", | ||
"pandas>=0.25.3", | ||
"pytest", | ||
|
Let's rename this test file to something like
Ah I can't see the original line you'd commented on any more... any chance you remember which test it was? Sorry about this
I was referring to the file name itself. It's currently |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,8 @@ | ||
"""Unit tests for altair API""" | ||
|
||
from datetime import date | ||
import io | ||
import ibis | ||
import sys | ||
import json | ||
import operator | ||
|
@@ -1082,3 +1084,16 @@ def test_polars_with_pandas_nor_pyarrow(monkeypatch: pytest.MonkeyPatch): | |
assert "pandas" not in sys.modules | ||
assert "pyarrow" not in sys.modules | ||
assert "numpy" not in sys.modules | ||
I hadn't seen |
||
|
||
|
||
MarcoGorelli marked this conversation as resolved.
|
||
def test_ibis_with_date_32(): | ||
df = pl.DataFrame( | ||
{"a": [1, 2, 3], "b": [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)]} | ||
) | ||
tbl = ibis.memtable(df) | ||
result = alt.Chart(tbl).mark_line().encode(x="a", y="b").to_dict() | ||
assert next(iter(result["datasets"].values())) == [ | ||
{"a": 1, "b": "2020-01-01T00:00:00.000000"}, | ||
{"a": 2, "b": "2020-01-02T00:00:00.000000"}, | ||
{"a": 3, "b": "2020-01-03T00:00:00.000000"}, | ||
] |
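The expected values in `test_ibis_with_date_32` show date-only values appearing as midnight timestamps with six fractional digits. A minimal sketch of producing strings of that shape with the standard library follows; `date_to_timestamp_string` is a hypothetical helper written for this illustration, not the function Altair uses for serialization.

```python
from datetime import date, datetime, time


def date_to_timestamp_string(d: date) -> str:
    """Format a date as a midnight timestamp with microsecond precision."""
    # Combine the date with 00:00:00, then force six fractional digits.
    return datetime.combine(d, time.min).isoformat(timespec="microseconds")


rows = [
    {"a": i + 1, "b": date_to_timestamp_string(date(2020, 1, i + 1))}
    for i in range(3)
]
print(rows[0])
```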
What does narwhals do with geopandas DataFrames? Are these just treated the same as pandas?
For now they're going down different paths:
https://github.com/MarcoGorelli/altair/blob/5d54bc4263dd5bdba600819b316dda8c592bdd8f/altair/utils/data.py#L316-L334
Maybe this can all be unified further, although the pandas/geointerface paths are doing some very library-specific things
Ok, so they get wrapped by narwhals fine (just as normal pandas DataFrames), but then we use `from_native` whenever we need to be able to tell the difference between a GeoDataFrame and a regular DataFrame?
that's right!
it's a (practically) free operation, `narwhals.DataFrame` just holds a reference to the original dataframe, there's no copying nor conversion involved - getting the original dataframe out again is just a matter of accessing 1-2 properties
https://github.com/narwhals-dev/narwhals/blob/a5276cd6e80781c61143c71041db81bd700a0e12/narwhals/translate.py#L41-L75
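The wrap/unwrap pattern described in this thread can be sketched without any dataframe library. Everything below is a toy model: `Frame`, `GeoFrame`, `Wrapper`, `wrap`, and `unwrap` are hypothetical stand-ins for a pandas DataFrame, a GeoDataFrame, `narwhals.DataFrame`, and the wrapping and unwrapping functions respectively. It is not Narwhals's implementation; it only illustrates why unwrapping is cheap and why the subclass information survives the round trip.

```python
class Frame:
    """Toy stand-in for a regular dataframe."""

    def __init__(self, columns):
        self.columns = list(columns)


class GeoFrame(Frame):
    """Toy stand-in for a geo-aware subclass of Frame."""


class Wrapper:
    """Toy stand-in for a wrapper that exposes a uniform API."""

    def __init__(self, native):
        # Store a reference only: nothing is copied or converted.
        self._native = native

    @property
    def columns(self):
        # Delegate to the wrapped object.
        return self._native.columns


def wrap(obj):
    """Wrap supported objects; hand anything else back unchanged."""
    return Wrapper(obj) if isinstance(obj, Frame) else obj


def unwrap(obj):
    """Return the original object held by a Wrapper, or obj itself."""
    return obj._native if isinstance(obj, Wrapper) else obj


geo = GeoFrame(["a", "geometry"])
wrapped = wrap(geo)

print(wrapped.columns)
print(unwrap(wrapped) is geo)
print(isinstance(unwrap(wrapped), GeoFrame))
```

Because the wrapper keeps the very same object, code that needs library-specific behaviour can unwrap and use an ordinary `isinstance` check, while everything else works through the wrapper's common interface.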