-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow any characters in filenames / labels #374
Comments
|
In POSIX, filenames are "bags of bytes"--there is no encoding; however, |
Well, I think we can probably require valid UTF-8 file names and strongly recommend that people use UTF-8 for their file system. For labels / BUILD files, we probably need an escaping scheme, at least for the control characters. If there's a file that isn't valid UTF-8, we give an error message? |
Our company codes mainly in C++, but our frontend uses a lot of JS and nodejs modules which have all sorts of characters in the filenames--for example, -, #, @, (, and ). Right now this is a major blocker for getting all our codebase under one build system since we can't reference files with semi-special characters. I don't think Bazel should decide what characters are acceptable in file names, as that reduces file names to those that fit both (1) supported languages and (2) supported platforms. This seems unnecessarily restrictive, and is becoming a major pain point for us. |
Agreed. Unfortunately, it's a bit tricky to fix, as a lot of code assumes that the mapping from labels to file names (and vice versa) is trivial, and doesn't require escaping. Any suggestions on an escaping scheme? |
URL based? |
You mean an own URI scheme? Sounds good. |
I mean replacing special characters by %XX where XX is the UTF-8 code in hexa. |
Sorry, I won't be able to work on this. @philwo had an interest, maybe he can make some progress here. :-/ |
This blocks our Bazel deployment as well. |
This is blocking us. We have a templating system where we need to build our template files. The filenames themselves contains template variables (e.g. |
I totally agree that this is important, should be done, I want this myself, however I don't have the time to work on it in the coming months, thus I have to unassign it. |
Here is my proposal:
|
Plain ASCII (and even that partial) makes this feels like we are in the early 90s. There are reasonable ways to handle that. If my project is C/C++, and it is cross-platform, and I have problems handling Unicode, then I will not use Unicode in file names. And the fact that bazel "explodes" is not such a problem. Even better would be to to allow for a character-set option in the project file. I did not move one project to bazel because test units check that Unicode file names work. |
Bazel pep_deps rule can not handle files names with `=` in the name [1]. Unfortunately, there is no way to fix this issue in Bazel. [1] bazelbuild/bazel#374 Closes #12643 from iampat/iampat/fix_bazel_pep_import Authored-by: Ali Amiri <ali@amiri.dev> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
This is similar to 53e9d3e. Over at https://github.com/cockroachdb/cockroach, we use bazel and it chokes on this file: ``` java.io.IOException: Error extracting [...]/com_github_maruel_panicparse_v2-v2.2.1.zip to [...]: [...]/com_github_maruel_panicparse_v2/cmd/panic/internal/utf8/?tf8.go (Illegal byte sequence) ``` which is why we're on a fork so far. It would be nice to avoid this. There is an (eternal, it seems) [upstream discussion], so there is little home of bazel starting to support non-unicode filenames. I wonder if the loss in test coverage (if any) is acceptable in return for avoiding build problems like the ones we've been seeing. [upstream discussion]: bazelbuild/bazel#374
This is similar to 53e9d3e. Over at https://github.com/cockroachdb/cockroach, we use bazel and it chokes on this file: ``` java.io.IOException: Error extracting [...]/com_github_maruel_panicparse_v2-v2.2.1.zip to [...]: [...]/com_github_maruel_panicparse_v2/cmd/panic/internal/utf8/?tf8.go (Illegal byte sequence) ``` which is why we're on a fork so far. It would be nice to avoid this. There is an (eternal, it seems) [upstream discussion], so there is little home of bazel starting to support non-unicode filenames. I wonder if the loss in test coverage (if any) is acceptable in return for avoiding build problems like the ones we've been seeing. [upstream discussion]: bazelbuild/bazel#374
This is similar to 53e9d3e. Over at https://github.com/cockroachdb/cockroach, we use bazel and it chokes on this file: ``` java.io.IOException: Error extracting [...]/com_github_maruel_panicparse_v2-v2.2.1.zip to [...]: [...]/com_github_maruel_panicparse_v2/cmd/panic/internal/utf8/?tf8.go (Illegal byte sequence) ``` which is why we're on a fork so far. It would be nice to avoid this. There is an (eternal, it seems) [upstream discussion], so there is little home of bazel starting to support non-unicode filenames. I wonder if the loss in test coverage (if any) is acceptable in return for avoiding build problems like the ones we've been seeing. [upstream discussion]: bazelbuild/bazel#374
This is similar to 53e9d3e. Over at https://github.com/cockroachdb/cockroach, we use bazel and it chokes on this file: ``` java.io.IOException: Error extracting [...]/com_github_maruel_panicparse_v2-v2.2.1.zip to [...]: [...]/com_github_maruel_panicparse_v2/cmd/panic/internal/utf8/?tf8.go (Illegal byte sequence) ``` which is why we're on a fork so far. It would be nice to avoid this. There is an (eternal, it seems) [upstream discussion], so there is little home of bazel starting to support non-unicode filenames. I wonder if the loss in test coverage (if any) is acceptable in return for avoiding build problems like the ones we've been seeing. [upstream discussion]: bazelbuild/bazel#374
This is similar to 53e9d3e. Over at https://github.com/cockroachdb/cockroach, we use bazel and it chokes on this file: ``` java.io.IOException: Error extracting [...]/com_github_maruel_panicparse_v2-v2.2.1.zip to [...]: [...]/com_github_maruel_panicparse_v2/cmd/panic/internal/utf8/?tf8.go (Illegal byte sequence) ``` which is why we're on a fork so far. It would be nice to avoid this. There is an (eternal, it seems) [upstream discussion], so there is little home of bazel starting to support non-unicode filenames. [upstream discussion]: bazelbuild/bazel#374
I'm reconsidering what I said earlier about allowing Footnotes
|
TLDR: I think it is not the job of the build tool to force naming conventions on a project A project might be Linux and Mac only, no Windows. The file naming rules would be selected by the developers looking at the big picture: Should a tool disallow the existence of the same file with different case? Should a tool force a certain Unicode normalization? What about a Mac OS accessing a disk shared by a Linux machine using Samba (a Windows protocol)? Long story short: not a bazel responsibility to force any limitations on what I can do or not. |
I agree, mostly. My point was that there are reasons why we might want to/need to preserve some limitations on legal target names. Ideally any name would be supported, but if certain patterns are difficult to support, if they would anyway be illegal on some platforms that might be a sufficient excuse to not support them. I definitely feel like some of the proposed solutions start to get a bit convoluted. |
Illegal on some platforms is not a sufficient excuse to not support a filename pattern. Some filenames are system provided and cannot change to match the build system's preferences. Choosing some subset of valid filename patterns is equivalent to not having full support for any platform that is not encapsulated by the entire subset. We should be supporting the superset of all filename patterns instead. |
Encountered this issue while using aws-sdk-cpp as a dependency. Luckily in that case it was only test file that we could safely remove with |
Additionally bump aws-sdk-cpp version to 1.11.279 Encountered issues with bazel. Non ASCII filenames don't work inside bazel. Workaround is to remove test file from aws-sdk-cpp. bazelbuild/bazel#374 Building with rules_foreign cmake doesn't work with parallel builds. This is something we could optimize later, for now hardcode some low value: bazel-contrib/rules_foreign_cc#329 On Redhat to properly build targets we now have to add: --//:distro=redhat By default distro is set to ubuntu. This is workaround for not bazel not being able to differentiate between the two. There is difference in aws-sdk-cpp static libraries location after build. Ideally we should find how to use select on aws-sdk-cpp repo BUILD file which would rely on main repo flag, but I didn't find way to support it there. JIRA:CVS-130367
Additionally bump aws-sdk-cpp version to 1.11.279 Encountered issues with bazel. Non ASCII filenames don't work inside bazel. Workaround is to remove test file from aws-sdk-cpp. bazelbuild/bazel#374 Building with rules_foreign cmake doesn't work with parallel builds. This is something we could optimize later, for now hardcode some low value: bazel-contrib/rules_foreign_cc#329 On Redhat to properly build targets we now have to add: --//:distro=redhat By default distro is set to ubuntu. This is workaround for not bazel not being able to differentiate between the two. There is difference in aws-sdk-cpp static libraries location after build. Ideally we should find how to use select on aws-sdk-cpp repo BUILD file which would rely on main repo flag, but I didn't find way to support it there. JIRA:CVS-130367
We're running into this in erlang 26 which bundles emoji filenames: https://github.com/erlang/otp/blob/master/lib/stdlib/test/edlin_expand_SUITE_data/.hidden%F0%9F%98%80_file Reported upstream as not a bug: erlang/otp#8005 |
@atobiszei @snakethatlovesstaticlibs Both your cases sound like you are in a slightly different situation: Files with complex file names are part of some external dependency you consume, but aren't relevant to its build (for example, because they are part of the test suite). The errors arise while extracting an archive. This use case should be fully supported in Bazel 6.4.0 and higher, could you give that a try? |
@fmeum I worked around this by adding the test directories to an For reference, the error I saw was:
sorry if this is unrelated to the issue in the original report, it sounded relevant while I was debugging |
Ultimately any character can be part of a filename. We should probably allow that.
Some mangling to generate the corresponding label should probably be done.
Original report on the mailing-list:
https://groups.google.com/d/msgid/bazel-discuss/CAN0GiO3__5jXo5rZqroSj0mFxpqCzUZZVkY%3DSNsJK1%2BZ1BdJLg%40mail.gmail.com
The text was updated successfully, but these errors were encountered: