Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12563] swaplevel general function for dataframe and series #15944

Merged
merged 5 commits into from
Feb 25, 2022

Conversation

AlikRodriguez
Copy link
Contributor

Implementation of swaplevel


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status Build Status Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status ---
XLang Build Status Build Status Build Status Build Status Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented Nov 11, 2021

Codecov Report

Merging #15944 (12fcda5) into master (6a573e4) will increase coverage by 1.27%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15944      +/-   ##
==========================================
+ Coverage   83.51%   84.78%   +1.27%     
==========================================
  Files         445      453       +8     
  Lines       61390    71025    +9635     
==========================================
+ Hits        51267    60217    +8950     
- Misses      10123    10808     +685     
Flag Coverage Δ
python 83.62% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/dataframe/frames.py 95.89% <100.00%> (+0.97%) ⬆️
sdks/python/apache_beam/internal/pickler.py 77.27% <0.00%> (-9.59%) ⬇️
...on/apache_beam/examples/dataframe/flight_delays.py 25.00% <0.00%> (-2.91%) ⬇️
...runners/interactive/options/interactive_options.py 83.33% <0.00%> (-2.39%) ⬇️
...hon/apache_beam/runners/direct/test_stream_impl.py 94.02% <0.00%> (-2.24%) ⬇️
sdks/python/apache_beam/io/gcp/bigtableio.py 77.11% <0.00%> (-2.11%) ⬇️
sdks/python/apache_beam/io/jdbc.py 82.92% <0.00%> (-2.08%) ⬇️
...eam/io/gcp/datastore/v1new/rampup_throttling_fn.py 93.75% <0.00%> (-1.38%) ⬇️
...s/python/apache_beam/examples/wordcount_minimal.py 92.00% <0.00%> (-0.86%) ⬇️
...ks/python/apache_beam/runners/interactive/utils.py 94.69% <0.00%> (-0.77%) ⬇️
... and 89 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a573e4...12fcda5. Read the comment docs.

@AlikRodriguez AlikRodriguez changed the title WIP [BEAM-12563]swaplevel general function for dataframe and series [BEAM-12563]swaplevel general function for dataframe and series Dec 2, 2021
@AlikRodriguez AlikRodriguez changed the title [BEAM-12563]swaplevel general function for dataframe and series [BEAM-12563] swaplevel general function for dataframe and series Dec 2, 2021
@AlikRodriguez
Copy link
Contributor Author

@TheNeuralBit Hi, I'm having troubles with the docs in this issue, the error refers to the pandas documentation, with the message NameError: name 'df' is not defined, is something I'm not considering here?

@TheNeuralBit
Copy link
Member

The PythonDocs issue is really strange. I can take a look at what's going on there. Is this ready to go otherwise?

@TheNeuralBit
Copy link
Member

TheNeuralBit commented Dec 3, 2021

The PythonDocs issue is because the docstring for swaplevel is a little mangled and it breaks some of our automation:

Swap levels i and j in a :class:`MultiIndex`.

Default is to swap the two innermost levels of the index.

Parameters
----------
i, j : int or str
    Levels of the indices to be swapped. Can pass level name as string.
axis : {0 or 'index', 1 or 'columns'}, default 0
            The axis to swap levels on. 0 or 'index' for row-wise, 1 or
            'columns' for column-wise.

Returns
-------
DataFrame
    DataFrame with levels swapped in MultiIndex.

Examples
        --------
        >>> df = pd.DataFrame(
        ...     {"Grade": ["A", "B", "A", "C"]},
        ...     index=[
        ...         ["Final exam", "Final exam", "Coursework", "Coursework"],
        ...         ["History", "Geography", "History", "Geography"],
        ...         ["January", "February", "March", "April"],
        ...     ],
        ... )
        >>> df
...

I proposed a change upstream (pandas-dev/pandas#44740), if that gets merged and released in 1.3.5 this issue should go away.

If that takes a while or doesn't happen, we'll have to come up with some kind of workaround. I'll have to think on that. You should feel free to ignore this issue for now.

@AlikRodriguez
Copy link
Contributor Author

@TheNeuralBit great! Aside of that, do you think is anything else I should change?

@TheNeuralBit TheNeuralBit self-requested a review December 8, 2021 16:48
Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is ready to merge aside from the docs issue. I think the fix there may just be to wait until pandas 1.4.0 is out, pandas maintainers are uninterested in backporting the fix to 1.3.5

'swaplevel',
lambda df: df.swaplevel(**kwargs), [self._expr],
requires_partition_by=partitionings.Arbitrary(),
preserves_partition_by=partitionings.Arbitrary()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't think this should preserve Index partitioning, since it modifies the index. But our test framework does verify this.

It turns out this expression does preserve index partitioning, since we generate hashes by summing hashes on the individual index levels, which is independent of the order of the index levels.

@TheNeuralBit
Copy link
Member

Just to be clear, no further action is needed from you @AlikRodriguez - I will take care of making sure this gets merged when the docs issue is resolved.

@github-actions
Copy link
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 13, 2022
@TheNeuralBit
Copy link
Member

retest this please

@github-actions github-actions bot removed the stale label Feb 15, 2022
@TheNeuralBit
Copy link
Member

retest this please

@TheNeuralBit
Copy link
Member

Run PythonDocs PreCommit

@aaltay
Copy link
Member

aaltay commented Feb 25, 2022

@TheNeuralBit - What is the next step on this PR?

@TheNeuralBit
Copy link
Member

Run PythonLint PreCommit

@TheNeuralBit
Copy link
Member

Run Python PreCommit

@TheNeuralBit
Copy link
Member

@TheNeuralBit - What is the next step on this PR?

Thanks for the ping - this just needs a green CI run.

@TheNeuralBit
Copy link
Member

Run PythonLint PreCommit

4 similar comments
@TheNeuralBit
Copy link
Member

Run PythonLint PreCommit

@TheNeuralBit
Copy link
Member

Run PythonLint PreCommit

@TheNeuralBit
Copy link
Member

Run PythonLint PreCommit

@TheNeuralBit
Copy link
Member

Run PythonLint PreCommit

@TheNeuralBit TheNeuralBit merged commit ba2aef5 into apache:master Feb 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants