Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checkstyle rule for uppercase constant fields #10673

Merged
merged 6 commits into from
Jul 11, 2024

Conversation

attilakreiner
Copy link
Contributor

No description provided.

@nastra
Copy link
Contributor

nastra commented Jul 10, 2024

@attilakreiner thanks for doing this and generally this looks good.

Could you please extract the Spark + Flink changes into a separate PR (where checkstyle doesn't enforce this naming yet)? I'll review the Spark + Flink changes first (as those typically get quite large) and once those changes are in, we can continue with this PR (containing the changes for all other modules + enforcing the naming).

@attilakreiner
Copy link
Contributor Author

attilakreiner commented Jul 10, 2024

@nastra, got it. Done.

@@ -284,6 +284,10 @@
<property name="format" value="^[a-z][a-zA-Z0-9]+$"/>
<message key="name.invalidPattern" value="Member name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="ConstantName">
<property name="format" value="^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use possessive quantifiers. That way the matching might be simpler.
I didn't benchmark it, but checksyle can take considerable time overall, and small things like this add up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the regexp from
^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$
to
^[A-Z][A-Z0-9]+(_[A-Z0-9]+)*$
Pls mind the * vs + char at pos 15. Pls clarify whether or not this is what you were pointing out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I reverted this change as it breaks for Z_COLUMN, Z_SCHEMA and Z_SORT_ORDER.

Pls point out if you suggest a different regexp here. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to leverage possessive quantifiers replace

^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$

with

^[A-Z][A-Z0-9]*+(_[A-Z0-9]++)*+$

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I see. I experimented a bit with the possessive quantifiers in the regexes as well as removing some custom messages that seem redundant.

I created a draft PR with these changes: #10681. Even though it seems like just a few lines of changes, it has an impact for the whole codebase and I don't want to hold up this one. (I'd prefer to keep the scope of this PR tight, really just renaming the constants to uppercase and enforcing that rule.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this PR we should just stick with the default regex pattern of checkstyle, which I believe is what you had initially. Improving the regex can be done in a separate PR.

Copy link
Contributor Author

@attilakreiner attilakreiner Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra, yep, same thought here, hence the spin-off draft PR.

In this PR the definition I used is this (this is the current state right now):
<property name="format" value="^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"/>

the default you linked is this:
<property name="format" value="^log(ger)?$|^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"/>

So the only difference is I removed the first part that allows for the lowercase log(ger) as this codebase doesn't use it, so it was redundant.

Also we have the custom message in place to be consistent with the rest of the definitions for now.

Pls LMK if this is all good as it is right now or we need some changes here.

@@ -284,6 +284,10 @@
<property name="format" value="^[a-z][a-zA-Z0-9]+$"/>
<message key="name.invalidPattern" value="Member name ''{0}'' must match pattern ''{1}''."/>
</module>
<module name="ConstantName">
<property name="format" value="^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"/>
<message key="name.invalidPattern" value="Constant name ''{0}'' must match pattern ''{1}''."/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this property needed? What error message does checkstyle produce when we omit it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message without this property is:
Name 'factory' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'. [ConstantName]
with this property it's:
Constant name 'factory' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'. [ConstantName]

My preference is to keep it as both the definition and the output is more consistent with the existing rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both messages are clear and actionable. My preference would be not to add configuration that is not needed.
I noticed we do set this for other checks, and I believe we can remove those other configurations too, in places where they do not add much value.

Copy link
Contributor Author

@attilakreiner attilakreiner Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Pls see my previous comment about #10681.

@attilakreiner
Copy link
Contributor Author

Merged in the main branch.

@@ -34,11 +34,11 @@ public enum Support {
this.lowerCaseName = name().toLowerCase(Locale.ROOT);
}

public static final Map<String, Support> nameToSupportMap = Maps.newHashMap();
public static final Map<String, Support> NAME_TO_SUPPORT_MAP = Maps.newHashMap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes a publicly accessible field - a breaking ABI change.

Copy link
Contributor Author

@attilakreiner attilakreiner Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. I reverted this change. I think it's fair to think about this one as an exceptional case, so I added @SuppressWarnings("checkstyle:ConstantName") to overcome the issue. Pls LMK if this fix is OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked the PR for similar cases and I did not find any more public static fields that were affected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just adding some additional context that Iceberg only provides API/ABI stability for certain modules (iceberg-mr module isn't included) which is enforced by revapi, but great if we can avoid breaking the API/ABI for modules where we don't enforce it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WFM

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @attilakreiner

@nastra nastra merged commit 41e1951 into apache:main Jul 11, 2024
50 checks passed
jasonf20 pushed a commit to jasonf20/iceberg that referenced this pull request Aug 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants