
feat(python): extend existing fast range->Series init to lists of ranges in a Series #6099

Merged 1 commit into pola-rs:master on Jan 7, 2023

@alexander-beedie (Collaborator) commented Jan 7, 2023

We currently have a range => Series fast-path; this PR extends the concept slightly so that lists of ranges can be created easily within a Series. Previously the ranges were simply dumped in as unexpanded Python objects, for which there is no realistic use.

Example:

import polars as pl

pl.Series("a", [range(-2, 1), range(3), range(2, 8, 2)])

# shape: (3,)
# Series: 'a' [list]
# [
#     [-2, -1, 0]
#     [0, 1, 2]
#     [2, 4, 6]
# ]

pl.Series("b", ((range(n), range(n + 1), range(n + 2)) for n in range(1, 4)))

# shape: (3,)
# Series: 'b' [list]
# [
#     [[0], [0, 1], [0, 1, 2]]
#     [[0, 1], [0, 1, 2], [0, 1, ... 3]]
#     [[0, 1, 2], [0, 1, ... 3], [0, 1, ... 4]]
# ]
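
To make the expected values in the two examples easy to check without Polars installed, here is a minimal pure-Python sketch of the expansion. The helper name `expand_ranges` is hypothetical and is not part of Polars, nor is it how the fast-path is implemented; it only reproduces, as plain nested lists, the values the Series are shown to contain.

```python
from typing import Any, Iterable


def expand_ranges(values: Iterable[Any]) -> list:
    """Recursively replace every range with a list of its integers.

    Hypothetical helper for illustration only; not part of Polars.
    """
    expanded = []
    for value in values:
        if isinstance(value, range):
            # A range becomes the concrete list of integers it describes.
            expanded.append(list(value))
        elif isinstance(value, (list, tuple)):
            # Nested containers (the 2D case) are expanded recursively.
            expanded.append(expand_ranges(value))
        else:
            # Anything else is passed through unchanged.
            expanded.append(value)
    return expanded


# Same inputs as the "a" and "b" examples above.
series_a = expand_ranges([range(-2, 1), range(3), range(2, 8, 2)])
series_b = expand_ranges(
    (range(n), range(n + 1), range(n + 2)) for n in range(1, 4)
)

print(series_a)
print(series_b[0])
```

The first row of `series_b` corresponds to `n = 1`, the second to `n = 2`, and so on, which matches the row layout of the Series `'b'` output.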

Note:

The 1D case (range => Series) is fast. The 2D case (above) is currently only "ok" as the number of distinct ranges gets large, because each range is mediated by the creation of an associated DataFrame inside pl.arange. It would be interesting if we could push some kind of sequence generation based on row number (like the above) into an expression... 🤔

(FYI: I originally thought pandas had some insanely fast method of doing this, 100s of times faster than us, until I realised it was just loading the range objects "as-is" while its repr unpacked a few rows of them when you eyeball the frame, hah! Almost spit out my tea when it first appeared to benchmark 586x faster, then reality reasserted itself :)
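
The reason such a comparison is misleading can be sketched in plain Python, with no pandas or Polars involved: keeping a `range` object "as-is" stores only its start, stop and step, whereas expanding it materialises every integer. The variable names below are illustrative only, and the size comparison assumes CPython.

```python
import sys

# Three lazy range objects; creating them does not generate any integers.
ranges = [range(1_000_000) for _ in range(3)]

# Storing "as-is": the container just holds references to the range objects.
stored_as_is = list(ranges)

# Expanding: every integer in the range is actually materialised.
expanded_first = list(ranges[0])

print(type(stored_as_is[0]).__name__)
print(len(expanded_first))
# On CPython the range object is far smaller than the expanded list.
print(sys.getsizeof(stored_as_is[0]) < sys.getsizeof(expanded_first))
```

A benchmark that times only the "as-is" path is measuring a different operation from one that performs the expansion.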

@github-actions bot added the "enhancement" (New feature or an improvement of an existing feature) and "python" (Related to Python Polars) labels on Jan 7, 2023
@ritchie46 (Member)

Anything over a 100x difference is comparing apples to peaches, is my rule of thumb. ;)

@ritchie46 merged commit 4e13a81 into pola-rs:master on Jan 7, 2023
@alexander-beedie deleted the series-lists-of-ranges branch on January 7, 2023 15:55
zundertj pushed a commit to zundertj/polars that referenced this pull request Jan 7, 2023