Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate include grouping order in .clang-format #15063

Merged

Conversation

harrism
Copy link
Member

@harrism harrism commented Feb 15, 2024

Replaces #14993
Closes #15103

Description

This uses the IncludeCategories settings in .clang-format to attempt to enforce our documented #include order in libcudf. See https://docs.rapids.ai/api/libcudf/stable/developer_guide

I realize that there was a previous attempt at this by @bdice that met with some resistance. Reading it, I wouldn't say it was vetoed; rather, reviewers requested something much simpler. I have a few reasons to attempt this again.

  1. To make a separate task much easier. We are undertaking a refactoring of RMM that will replace rmm::mr::device_memory_resource* with rmm::device_async_resource-ref everywhere in RAPIDS (not just cuDF). This requires adding an include to MANY files. Getting the location of the include right everywhere is very difficult without automatic grouping of headers. I started out writing a bash script to do this before realizing clang-format has the necessary feature. And I realized that my script would never properly handle files like this.
  2. To increase velocity. Everywhere in RAPIDS that we have automated code standard/style/formatting/other, the benefits to velocity have outweighed the costs. To paraphrase @bdice, $auto \nearrow \rightarrow \mu \searrow \rightarrow v \nearrow$
  3. The previous PR Automatic include sorting with clang-format #12760 had nearly 50 categories of headers. There was no way this could be applied universally across RAPIDS repos. My proposal has 10 categories. I tried to reduce it further but realized that it wouldn't be much less configuration to maintain, so I stopped at 10.

Note that one of the ways that having few categories can work while still maintaining clear groups is that this PR updates many files to use quotes ("") instead of angle brackets (<>) for local cuDF headers that do not live in cudf/cpp/include. With our "near to far" include ordering policy, these are arguably the nearest files, and using quotes allows us to have our first category simply check for quotes. These files will be grouped and sorted without blank lines, but in practice this does not lose clarity because typically headers from more than two directories are not included from the same file. The downside of this change is I don't yet know how to automatically enforce it. I hope that when developers accidentally use <> for internal includes that don't start with (e.g.) "cudf", they will be grouped one of the lowest priority categories, and perhaps this will induce them to switch to "" to get the headers listed at the top. The rule is simple: if it's in libcudf but not in cpp/include/cudf, then use quotes. For everything else, use angle brackets.

Other than headers from RAPIDS repos, we have a group for all CCCL/CUDA headers, a group for all other headers that have a file extension, and a final group for all files that have no file extension (e.g. STL).

Below I'm listing the (fairly simple, in my opinion) .clang-format settings for this PR. Note that categories 2-5 will require tweaking for different RAPIDS repos.

Some may ask why I ordered cudf_test headers before cudf headers. I tried both orders, and putting cudf_test first generated significantly fewer changes in the PR, meaning that it's already the more common ordering (I suppose cudf_test is closer to the files that include it, since they are libcudf tests).

I've opened a similar PR for RMM with only 5 groups. rapidsai/rmm#1463

CC @davidwendt @vyasr @wence- @GregoryKimball for feedback

@isVoid contributed to this PR via pair programming.

IncludeBlocks: Regroup
IncludeCategories:
  - Regex:           '^"' # quoted includes
    Priority:        1
  - Regex:           '^<(benchmarks|tests)/' # benchmark includes
    Priority:        2
  - Regex:           '^<cudf_test/' # cuDF includes
    Priority:        3
  - Regex:           '^<cudf/' # cuDF includes
    Priority:        4
  - Regex:           '^<(nvtext|cudf_kafka)' # other libcudf includes
    Priority:        5
  - Regex:           '^<(cugraph|cuml|cuspatial|raft|kvikio)' # Other RAPIDS includes
    Priority:        6
  - Regex:           '^<rmm/' # RMM includes
    Priority:        7
  - Regex:           '^<(thrust|cub|cuda)/' # CCCL includes
    Priority:        8
  - Regex:           '^<(cooperative_groups|cuco|cuda.h|cuda_runtime|device_types|math_constants|nvtx3)' # CUDA includes
    Priority:        8
  - Regex:           '^<.*\..*' # other system includes (e.g. with a '.')
    Priority:        9
  - Regex:           '^<[^.]+' # STL includes (no '.')
    Priority:        10

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@harrism harrism added feature request New feature or request non-breaking Non-breaking change labels Feb 15, 2024
@harrism harrism self-assigned this Feb 15, 2024
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels Feb 15, 2024
@wence-
Copy link
Contributor

wence- commented Feb 15, 2024

As mentioned on the other PR, I think this is great, thanks for doing it! I just had one bikeshed colour policy question:

  - Regex:           '^<cudf/' # cuDF includes
    Priority:        4

Is it desirable to have a separation of detail API headers from public ones? (so '^<cudf/detail' with a separate priority from '^<cudf/') This would work because matching is done top down, so we would just put the detail regex before the public API regex in the list.

@davidwendt
Copy link
Contributor

As mentioned on the other PR, I think this is great, thanks for doing it! I just had one bikeshed colour policy question:

  - Regex:           '^<cudf/' # cuDF includes
    Priority:        4

Is it desirable to have a separation of detail API headers from public ones? (so '^<cudf/detail' with a separate priority from '^<cudf/') This would work because matching is done top down, so we would just put the detail regex before the public API regex in the list.

I don't think we need this extra group.

@harrism harrism marked this pull request as ready for review February 16, 2024 06:50
@harrism harrism requested review from a team as code owners February 16, 2024 06:50
@harrism
Copy link
Member Author

harrism commented Feb 16, 2024

Moving out of draft

@@ -15,6 +15,7 @@
*/

#include "reader_impl.hpp"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what happened here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a comment @harrism had with @bdice, this might be a clang-format bug,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 25 to 27
#include <join/join_common_utils.cuh>
#include <join/join_common_utils.hpp>
#include <join/mixed_join_common_utils.cuh>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This move seems wrong. Perhaps they are meant to be quoted?

Copy link
Member Author

@harrism harrism Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Changed all #include <SRCDIR/...> to #include "SRCDIR/..." where SRCDIR is any directory under cudf/src. From now on all includes of cudf headers that are not in cudf/include should be quoted. And the nice thing is when they are not, they end up down at the bottom of the includes, making it easier to spot.

We could consider quoting all includes of cuDF headers.

@@ -14,10 +14,11 @@
* limitations under the License.
*/

#include "spdlog/sinks/stdout_sinks.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thx.

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to the start of the test files.

Comment on lines -18 to -20
// To avoid https://github.com/NVIDIA/libcudacxx/issues/460
// in libcudacxx with CTK 12.0/12.1
#include <cuda_runtime.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer necessary because we have a CCCL with the fix?

Copy link
Contributor

@bdice bdice Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct but potentially better for a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted this on the previous PR that was closed, I think. Can we double check that my previous comments were addressed (or deferred for a later issue/PR)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. It was addressed here. I misread the diff.

Copy link
Member Author

@harrism harrism Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdice I went over all of your feedback and made sure it was addressed (except the things where I said "please open an issue".)

@@ -14,14 +14,9 @@
* limitations under the License.
*/

#include "hash/concurrent_unordered_map.cuh"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deliberate change from <> to ""?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

@@ -15,6 +15,7 @@
*/

#include "reader_impl.hpp"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a comment @harrism had with @bdice, this might be a clang-format bug,

@@ -16,7 +16,7 @@

#include "reader_impl_helpers.hpp"

#include <io/utilities/row_selection.hpp>
#include "io/utilities/row_selection.hpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deliberate change to ""?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

#include <fcntl.h>
#include <io/utilities/config_utils.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: should this be "", et seq. throughout?

Copy link
Member Author

@harrism harrism Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Now done for all io, hash, join, and many other directories under cudf/src/. Basically I searched for regexp #include <(NAME/.*)> and replaced with the same in quotes, where NAME is any folder name in cudf/src/.

@PointKernel
Copy link
Member

Other than headers from RAPIDS repos, I group all other headers that have a file extension in a single group, and all files that have no file extension in another group. Since the latter also matches includes some files from libcu++, I have an explicit category for <cuda/ includes to keep them separate from STL includes. A frequent effect of the single "." group is that cub and thrust headers get grouped without a blank line between them. I don't think this is a problem.

PR description to be updated. It seems all CCCL headers now belong to one header group.

@harrism
Copy link
Member Author

harrism commented Feb 20, 2024

PR description to be updated. It seems all CCCL headers now belong to one header group.

Updated, thanks.

@harrism
Copy link
Member Author

harrism commented Feb 21, 2024

/merge

@rapids-bot rapids-bot bot merged commit d053323 into rapidsai:branch-24.04 Feb 21, 2024
69 checks passed
rapids-bot bot pushed a commit to rapidsai/rmm that referenced this pull request Feb 22, 2024
This uses the IncludeCategories settings in .clang-format to attempt to enforce our documented #include order in librmm. For discussion, see rapidsai/cudf#15063. This PR uses a subset of the header grouping categories used in that PR.

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Lawrence Mitchell (https://github.com/wence-)
  - Rong Ou (https://github.com/rongou)

URL: #1463
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get a chance to review this earlier due to being out, but I agree with the approach and the results look good from a spot check of some of these files. Thanks for the ping, and thanks for doing this @harrism!

rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Feb 29, 2024
…5787)

This uses the `IncludeCategories` settings in` .clang-format` to automate include ordering and grouping and to make include ordering more consistent with the rest of RAPIDS. For discussion, see rapidsai/cudf#15063. This PR uses a similar set of header grouping categories used in that PR, adapted for cuML.

One purpose of this is to make it easier to automate injection of a header change with an upcoming RMM refactoring (and in the future).

The header reordering in this PR uncovered multiple places where headers were not included where they are used. Most commonly this was a missing `#include <raft/core/handle.hpp>`

Closes #5779

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5787
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Feb 29, 2024
…4205)

This uses the `IncludeCategories` settings in` .clang-format` to automate include ordering and grouping and to make include ordering more consistent with the rest of RAPIDS. For discussion, see rapidsai/cudf#15063. This PR uses a similar set of header grouping categories used in that PR, adapted for cuGraph.

One purpose of this is to make it easier to automate injection of a header change with an upcoming RMM refactoring (and in the future).

Note that this PR also updates all of cugraph to use quotes ("") when includeing local *source* headers. That is, whenever something is included from `cugraph/src/*`, it should be included with quotes rather than angle brackets. This makes it much easier to order includes from "nearest to farthest", and matches the approach now taken in other RAPIDS repos.  Without switching to quotes, keeping these includes at the top requires special casing each subdirectory of `src` in the clang-format group regular expressions. 

Includes from the `include` directory stay as angle brackets, because they always use `#include <cugraph/...>`.

cuGraph tests include a LOT of internal source headers. I think we should generally avoid this, but it depends on the testing philosophy. If one believes that tests should only cover the external interfaces of the library, then there is no need to include internal headers. But if we want to test internal functionality, then the tests need to be more "internal" to the library.

The header reordering in this PR also uncovered some places where headers were not included where they are used, which I have fixed.

Closes #4185

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Naim (https://github.com/naimnv)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4205
rapids-bot bot pushed a commit to rapidsai/raft that referenced this pull request Mar 6, 2024
…2202)

This uses the IncludeCategories settings in .clang-format to automate include ordering and grouping and to make include ordering more consistent with the rest of RAPIDS. For discussion, see rapidsai/cudf#15063. This PR uses a similar set of header grouping categories used in that PR, adapted for RAFT.

The header reordering in this PR uncovered one place where `#include <cutlass/layout/pitch_linear.h>` was missing from `raft/cpp/include/raft/distance/detail/./pairwise_distance_epilogue_elementwise.h`, so I also added that include.

One purpose of this is to make it easier to automate injection of a header change with an upcoming RMM refactoring (and in the future).

Closes #2193

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2202
rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this pull request Mar 6, 2024
…1349)

This uses the IncludeCategories settings in .clang-format to attempt to enforce our documented #include order in libcuspatial. For discussion, see rapidsai/cudf#15063. This PR uses a subset of the header grouping categories used in that PR.

Note that this also updates the pre-commit configuration to check and correct file copyright years. This comes from rapidsai/cudf#14917.  @KyleFromNVIDIA can you confirm whether or not it's OK to do this? It seems to be working fine.

Closes #1345

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Michael Wang (https://github.com/isVoid)

URL: #1349
rapids-bot bot pushed a commit that referenced this pull request Mar 6, 2024
…udes (#15238)

Follow up to #15063 to add new guidance for quoting includes of internal headers from `src` paths. Also covers clang-format include grouping. 

Also fixes a single include that was added with `<>` recently that should be `""`. #15063 updated all includes to match the guidance in this PR (changing a lot of `<>` to `""` for includes from `src/...`.

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #15238
@vyasr vyasr mentioned this pull request Jun 26, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automate C++ include file grouping and ordering using clang-format
7 participants