Series dtype inference based on the type of the first anyvalue #7212

Closed
aldanor opened this issue Feb 27, 2023 · 4 comments
Labels
A-input-parsing (Area: parsing input arguments), bug (Something isn't working), python (Related to Python Polars)

Comments

@aldanor
Contributor

aldanor commented Feb 27, 2023

Currently, it seems that when a Series is constructed from any-values, it simply grabs the first non-null value and uses its type to convert all the other values? This can lead to surprising results like:

>>> import polars as pl
>>> from datetime import datetime
>>> pl.Series([datetime.now(), 3.1415])  # swap these to get an error
shape: (2,)
Series: '' [datetime[μs]]
[
	2023-02-27 02:30:28.538807
	1970-01-01 00:00:00.000003   # uh, well...
]

(pandas would simply cast both to 'object')

Should it just always raise an error in cases like this? Is there a better way to handle them?

@ritchie46
Member

We should error. For pl.Series([datetime.now(), "foo"]) it errors. Somewhere a conversion should be stricter in your example.

@aldanor
Contributor Author

aldanor commented Feb 27, 2023

I think it works because the primitive type for datetimes is int64 and extracting ints out of floats does work somewhere along the way.
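
The odd second row in the output above is consistent with this explanation. Here is a minimal stdlib-only sketch of the arithmetic, assuming the float is truncated to an integer and then read as microseconds since the Unix epoch (matching the datetime[μs] dtype shown earlier). The helper name `float_as_datetime_us` is hypothetical, and this mirrors the arithmetic only, not the actual polars code path:

```python
from datetime import datetime, timedelta

# Assumption: the float is truncated to an int and interpreted as
# microseconds since the Unix epoch (the Series dtype was datetime[μs]).
EPOCH = datetime(1970, 1, 1)


def float_as_datetime_us(value: float) -> datetime:
    """Hypothetical helper: reinterpret a float as an epoch offset in microseconds."""
    micros = int(value)  # truncation: 3.1415 becomes 3
    return EPOCH + timedelta(microseconds=micros)


print(float_as_datetime_us(3.1415))  # 1970-01-01 00:00:00.000003
```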

Plus, the conversion logic (at least one branch of it) is based on the type of the first non-null element, which is far from obvious to the user: swap the order and you get a different result.
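
For contrast, an order-independent check could look at the types of all non-null values rather than only the first. The following is a rough sketch in plain Python; `common_type` is a hypothetical helper and not part of polars, and it treats `bool` and `int` as different types, which is a design choice rather than anything polars prescribes:

```python
from typing import Any, Iterable, Optional


def common_type(values: Iterable[Any]) -> Optional[type]:
    """Return the single Python type shared by all non-null values.

    Hypothetical helper, not part of polars. Raises TypeError on mixed
    input, and returns None when every value is None.
    """
    seen = {type(v) for v in values if v is not None}
    if len(seen) > 1:
        # Sort the names so the message does not depend on input order.
        names = sorted(t.__name__ for t in seen)
        raise TypeError(f"mixed types: {', '.join(names)}")
    return seen.pop() if seen else None
```

With a check like this, `[datetime.now(), 3.1415]` and `[3.1415, datetime.now()]` would both be rejected with the same message, so the outcome no longer depends on which element comes first.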

@lars-reimann

lars-reimann commented Apr 22, 2023

I also find the conversion logic quite confusing: In polars v0.17.6, the following code

import polars as pl
from datetime import datetime

series = pl.Series([1, True, "a", datetime.now(), None])

print(list(series))

prints [None, None, 'a', None, None]. Here the result no longer seems to be based on the type of the first element; instead, polars keeps only the strings.

On the other hand, this code raises TypeError: 'datetime.datetime' object cannot be interpreted as an integer:

import polars as pl
from datetime import datetime

series = pl.Series([1, True, datetime.now(), "a", None])

print(list(series))

Explicitly specifying the dtype as Object makes this work as expected:

import polars as pl
from datetime import datetime

series = pl.Series([1, True, datetime.now(), "a", None], dtype=pl.Object)

print(list(series)) # [1, True, datetime.datetime(2023, 4, 22, 12, 16, 42, 870943), 'a', None]

In any case, I would expect the description of the dtype parameter of Series ("If not specified, the dtype is inferred.") to mean that if a user omits the dtype, polars figures out something that works (like pl.Object).
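
The fallback behavior described here can be sketched in plain Python. This is only an illustration of the expectation, not how polars infers dtypes; `infer_or_object` is a hypothetical helper that returns Python's `object` as a stand-in for "fall back to an object dtype":

```python
from typing import Any, Optional, Sequence


def infer_or_object(values: Sequence[Any]) -> Optional[type]:
    """Hypothetical helper, not part of polars.

    Returns the type shared by all non-null values, `object` if they
    disagree, or None if there are no non-null values at all.
    """
    kinds = {type(v) for v in values if v is not None}
    if not kinds:
        return None
    return kinds.pop() if len(kinds) == 1 else object
```

A caller could then pass dtype=pl.Object explicitly, as in the last example above, whenever this returns `object`.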

@stinodego added the bug and python labels and removed the question label on Aug 16, 2023
@stinodego added the needs triage and A-input-parsing labels and removed the needs triage label on Jan 13, 2024
@stinodego
Member

Closing in favor of #11156

@stinodego closed this as not planned on Jan 18, 2024