[DEPRECATION] Remove Series.set_index #9541
This PR removes the deprecated method `Series.set_index`. Resolves #9541, follows up on #9529.

Authors:
- Bradley Dice (https://github.com/bdice)
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Christopher Harris (https://github.com/cwharris)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #9945
@vyasr, is this a correct rewrite of the impacted code?

`df['x2'] = df['x'].set_index(df['idx']).sort_index()`

=>

`df['x2'] = df[['x']].set_index(df['idx']).sort_index()['x']`
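A minimal pandas sketch of what the rewritten line computes, assuming cuDF's `DataFrame.set_index` behaves like pandas here; the data values are hypothetical. The DataFrame route works because `DataFrame.set_index` accepts an array-like (such as a Series) as well as column labels. Assigning the `index` property of a copied Series is another route that avoids `set_index` entirely. In pandas, the final assignment back into `df` aligns on index labels, not positions.

```python
import pandas as pd

# Hypothetical data: "x" holds values, "idx" holds the labels to index them by.
df = pd.DataFrame({"x": [10, 20, 30], "idx": [2, 0, 1]})

# The DataFrame-based rewrite: DataFrame.set_index accepts an array-like
# (here the Series df["idx"]) and uses its values positionally as the new index.
x_by_idx = df[["x"]].set_index(df["idx"]).sort_index()["x"]

# Another route without set_index: copy the Series, assign its index
# property directly, then sort.
alt = df["x"].copy()
alt.index = df["idx"]
alt = alt.sort_index()

# Assigning back into df aligns on index labels, not positions: row label 0
# receives x_by_idx[0], row label 1 receives x_by_idx[1], and so on.
df["x2"] = x_by_idx
```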
That is a valid rewrite. However, I'm not certain that ever did exactly what you intended unless the
Note that because

Leaving aside that question, the best replacement depends on personal preference and on how you want to trade off performance against simplicity. You may get better mileage from one of the following, depending on the number of rows and columns in your table:
Excellent, thanks! 🙌 Our case is a couple of permutations we're tacking on for generating some sort of funny ~reverse ~indices, primarily for export, so (I think) we're good. An interesting bit: these kinds of examples show that it would be interesting (R&D-level) to record call trees in order to identify and recommend fusion and sorting optimizations, basically what a SQL optimizer would do, and advise on those!
You may be interested in expression templates, which are one way that, for example, linear algebra libraries do exactly what you're talking about: enabling operator fusion and minimizing intermediates. You can do something similar in Python with lazy evaluation, as Dask and Vaex do. But there's no real way to offer both lazy and eager evaluation at the same time (although you could put it behind a feature flag), so the best we could do there is probably what you suggested: record ops and make recommendations. That would be cool though!
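The "record ops and make recommendations" idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Dask or Vaex implement lazy evaluation: a hypothetical `Recorder` captures chained method calls, `replay` runs them eagerly against a real pandas object, and `back_to_back_repeats` is a deliberately simple, hypothetical stand-in for an advisor that inspects the recorded call chain. It flags patterns for review only and makes no claim about which calls are safe to drop.

```python
import pandas as pd


class Recorder:
    """Hypothetical helper: records method calls instead of executing them."""

    def __init__(self):
        self.ops = []  # list of (method_name, args, kwargs)

    def __getattr__(self, name):
        # Only called for attributes not found normally; skip private/dunder
        # names so tools probing e.g. _repr_html_ don't get recorded.
        if name.startswith("_"):
            raise AttributeError(name)

        def record(*args, **kwargs):
            self.ops.append((name, args, kwargs))
            return self  # allow chaining, like a fluent dataframe API

        return record


def replay(ops, target):
    """Execute the recorded calls eagerly against a real object."""
    for name, args, kwargs in ops:
        target = getattr(target, name)(*args, **kwargs)
    return target


def back_to_back_repeats(ops):
    """Toy advisor: return positions where a method repeats the previous call's name."""
    names = [name for name, _, _ in ops]
    return [i for i in range(1, len(names)) if names[i] == names[i - 1]]


df = pd.DataFrame({"x": [10, 20, 30], "idx": [2, 0, 1]})

plan = Recorder()
plan.set_index(df["idx"]).sort_index().sort_index()

hints = back_to_back_repeats(plan.ops)  # positions of repeated calls
result = replay(plan.ops, df[["x"]])["x"]
```

Because the recorded chain is replayed eagerly, this approach sidesteps the lazy-versus-eager conflict at the cost of only being able to advise, not rewrite.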
We were playing with the idea of LINQ-based dataframe expression trees (a lot of our original core was rxjs -> dataframejs), where the same relational structure can work for different evaluation strategies. We've moved to Python as RAPIDS grew, though. Doing this for dask_cudf DAGs is interesting. We have a lot of big examples:
pandas supports `DataFrame.set_index` to convert specified columns of a DataFrame into its index. To actually set a separate index object as the index of a DataFrame or Series, however, a user must set the `index` property (e.g. `df.index = RangeIndex(10)`). `Series.set_index` as implemented in cuDF (as an alias for setting the property) confuses the issue. The method was deprecated in #9529 for 21.12 and we will remove it in 22.02.