Line length incorrect for non-ascii text #3825
Comments
It's not quite the number of bytes (or, at least, it shouldn't be). Instead, it's the Unicode width (or, at least, it should be). The relevant PR is here: #3714.
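To make the distinction concrete, here is a minimal Python sketch comparing the three candidate measures for one string: UTF-8 byte count, code point count, and an approximate display width. The `display_width` helper is hypothetical and only approximates the idea with the standard library's `unicodedata` module; it is not Ruff's implementation, which may differ in details.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate display width in columns (illustrative helper, not Ruff's code)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks attach to the previous character: zero columns.
            continue
        # Wide ("W") and fullwidth ("F") East Asian characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


sample = "テスト abc"
print(len(sample.encode("utf-8")))  # UTF-8 bytes: 13
print(len(sample))                  # code points: 7
print(display_width(sample))        # approximate columns: 10
```

The three numbers disagree as soon as the line contains wide characters, which is why a line can pass a character-based limit and still fail a width-based one.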
Hi! #3714 is finished, so is this issue already solved?
# Ok length=84
class JapaneseOk:
"""
このドキュメントはテスト用のダミーです。十分に長い文章を用意しても大丈夫なことを確認してください。このくらいの長さではまだエラーになりません、
"""
# Error length=91
class JapaneseError:
"""
このドキュメントはテスト用のダミーです。十分に長い文章を用意しても大丈夫なことを確認してください。このくらいの長さではまだエラーになりません、もーっともっと伸ばすとエラーになります。
"""
It should be, yeah. I'll cc @MichaReiser, who may know better!
We don't report either of the two examples because both strings contain no whitespace, meaning there's no reasonable position where the user can break the text over multiple lines. Ruff reports both lines when adding whitespace somewhere inside the text:
# Ok length=84
class JapaneseOk:
"""
このドキュメントは テスト用のダミーです。十分に長い文章を用意しても大丈夫なことを確認してください。このくらいの長さではまだエラーになりません、
"""
# Error length=91
class JapaneseError:
"""
このドキュメントは テスト用のダミーです。十分に長い文章を用意しても大丈夫なことを確認してください。このくらいの長さではまだエラーになりません、もーっともっと伸ばすとエラーになります。
"""
Ruff reports both lines because both exceed the Unicode width of 88:
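The whitespace behavior described in this comment can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Ruff's actual rule: `display_width` and `should_report` are hypothetical names, the width helper ignores combining marks, and the real check may include further exemptions.

```python
import unicodedata


def display_width(text: str) -> int:
    """Rough width: wide/fullwidth East Asian characters count as two columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def should_report(line: str, limit: int = 88) -> bool:
    """Flag a line only if it is too wide AND contains a place to break it."""
    if display_width(line) <= limit:
        return False
    # A line that is a single whitespace-free chunk (ignoring indentation)
    # offers no reasonable position to wrap, so it is not reported.
    if len(line.split()) <= 1:
        return False
    return True


unbroken = "    " + "あ" * 50                      # width 104, no inner whitespace
with_space = "    " + "あ" * 10 + " " + "あ" * 40  # width 105, one inner space
print(should_report(unbroken))    # False
print(should_report(with_space))  # True
```

Under this sketch, the first pair of docstrings goes unreported regardless of width, while inserting a single space makes both lines eligible for the length check.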
Thanks for the detailed explanation!
I misunderstood this.
I also misunderstood this. (I assumed Ruff uses &str.chars().count().) Do you have any plans to move to a grapheme cluster count (e.g., using unicode_segmentation)?
Ruff did use This commit in the Black repository has a good explanation of why they (and Ruff) chose Unicode width over character or grapheme count.
We have a case where Ruff is reporting an over-length line (112 vs. our standard 88), but if we attempt to split it into two lines, Black auto-formats it back to one line because it'll fit on one line. Is there still some misalignment between Ruff and Black, or is there a correct "fix" to this (other than adding
query = '남포역립카페추천 ˇjjtat닷컴ˇ ≡제이제이♠♣ 남포역스파 남포역op남포역유흥≡남포역안마남포역오피 ♠♣'
Oh, I only now realized that Black changed the behavior only for the preview style, not for the "stable" style. You can see this in the changes.md. Black also changed the behavior only for string literals, not for all characters. @eviljeff could you try using Black's preview style to see if that fixes the issue? I wonder if we should add a configuration setting to allow users to pick their preferred "width" measurement unit.
It does fix that problem - (Although if we wanted to use that as a workaround it wouldn't be a solution today as
Personally, I prefer measuring the line length as seen in a (modern) code editor that supports Unicode, because a line limit is mostly about readability. Practically, though, aligning with Black is the most efficient solution (unless Black adds a similar setting, which I would guess is unlikely given the project ethos of minimal configuration).
FWIW, it looks like that bug(?) has already been reported as psf/black#3643 (the example I included was also a multi-line string).
@MichaReiser I think that the new behavior (using unicode-width instead of byte count) should be an opt-in feature, because Black's preview style is also opt-in until Jan 2024, which is too far off. Note that, for Black's preview style, "there are no guarantees around the stability of the output" according to their stability policy. I don't think we should use it for production code.
Black's preview style is about to be released, and it includes the change to measure strings using the Unicode width rather than the character length. Ruff's formatter already uses Unicode width. We also updated our documentation to better explain that the limit isn't a character limit when using emoji or East Asian characters.
Even though Ruff uses Unicode width to measure line length, the editor font might not respect that, which shows up as inconsistent line lengths in Ruff's auto-formatting or E501 reporting. To resolve this, your editor should use a font designed with consistent character widths, for example Sarasa-Gothic.
version: ruff 0.0.260
The line length calculation for non-ascii characters is incorrect:
Both lines are under the character limit, but ruff flags the second line as too long, reporting the number of bytes rather than the number of characters. This changed recently. 0.254 accepted this file, but 0.260 rejects it:
pyproject.toml: