Support for C_EU.UTF-8 locale in ksh93 #177

Closed
siteshwar opened this issue Dec 2, 2017 · 11 comments

Comments

@siteshwar
Contributor

I could not find it documented anywhere; however, there is one comment about this locale in the test cases:

# this locale is supported by ast on all platforms
# EU for { decimal_point="," thousands_sep="." }
locale=C_EU.UTF-8
@siteshwar siteshwar added the RFC label Dec 3, 2017
@krader1961
Contributor

From (among others) the https://www.gnu.org/software/libc/manual/html_node/Locale-Names.html page:

Portability Note: With the notable exception of the standard locale names ‘C’ and ‘POSIX’, locale names are system-specific.

I am not aware of any non-AST implementation that provides support for "territory" extensions to the C or POSIX codeset. Given that Unicode encodings provide support for code points like the Euro currency symbol there should not be a need for the C_EU.UTF-8 locale name. The C (or POSIX) locale implicitly does not support things like the EU territory for number formatting. The need for such support is why the misguided ISO-8859-x standards exist and were ultimately replaced by the Unicode standard.

Anyone wanting EU formatting of numbers can get it by setting the LC_NUMERIC env var to a suitable locale while leaving LANG set to C.
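A minimal sketch of that approach, using the POSIX numeric category `LC_NUMERIC`. The helper name `format_grouped` is hypothetical, and `de_DE.UTF-8` is only an example of a locale that uses "." as the thousands separator; it has to be installed on the platform. The external `printf` is run through `env` so the result reflects the platform's locale support rather than a shell builtin.

```shell
# format_grouped is a hypothetical helper, not an existing utility.
# Usage: format_grouped LOCALE INTEGER
format_grouped() {
    (
        # LC_ALL overrides every LC_* category, so clear it first;
        # otherwise LC_NUMERIC would have no effect.
        unset LC_ALL
        # LANG stays "C"; only the numeric category is switched.
        LANG=C LC_NUMERIC="$1" env printf "%'d\n" "$2"
    )
}

# Example locale only; check `locale -a` to see what is installed.
format_grouped de_DE.UTF-8 1000
```

If the requested locale is not installed, the platform typically falls back to the C locale without an error, so the number comes out ungrouped.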

The bottom line is that ksh should use the same distro provided library code that almost every other program uses for localization. If only ksh93 recognizes LANG=C_EU.UTF-8 what use is it?

@kdudka
Contributor

kdudka commented Dec 5, 2017

How are decimal_point and thousands_sep used in ksh?

@siteshwar
Contributor Author

It's used by builtins, e.g.:

$ echo $LANG
en_US.UTF-8
$ printf "%'d\n" 1000
1,000
$ export LANG=C_EU.UTF-8
$ printf "%'d\n" 1000
1.000

@krader1961
Contributor

Yes, but since it isn't a standard locale no other program will honor it. For example, if you do this on Ubuntu you get no thousands separator:

$ env LANG=C_EU.UTF-8 /usr/bin/printf "%'d\n" 1000
1000
$ echo $LANG
en_US.UTF-8
$ /usr/bin/printf "%'d\n" 1000
1,000

The C_EU.UTF-8 locale is only meaningful inside ksh which is borderline useless and going to be the source of many bug reports due to the confusion it causes.
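One way to see this from the shell is to ask the platform which locale names it provides. This is a rough sketch that assumes a `locale` utility is available; `normalize` and `locale_known` are hypothetical helper names. Names are compared case-insensitively with hyphens removed because glibc lists `en_US.UTF-8` as `en_US.utf8`.

```shell
# normalize and locale_known are hypothetical helpers, not existing utilities.

# Lowercase the input and drop hyphens so "UTF-8" and "utf8" compare equal.
normalize() {
    LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C sed 's/-//g'
}

# Succeeds when `locale -a` lists the given locale name.
locale_known() {
    want=$(printf '%s\n' "$1" | normalize)
    locale -a 2>/dev/null | normalize | grep -qxF -- "$want"
}

for name in C en_US.UTF-8 C_EU.UTF-8; do
    if locale_known "$name"; then
        echo "$name: provided by the platform"
    else
        echo "$name: not provided by the platform"
    fi
done
```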

@krader1961
Contributor

krader1961 commented Dec 6, 2017

Basically, this "feature" is nonstandard, only usable within ksh (or other programs built against the AST libraries), and guaranteed to be a source of confusion and bug reports. It should be removed. Even better would be to remove the ksh93 dependency on the AST locale support code and use what is provided by the OS. Which is what we're slowly doing with other things like the string manipulation functions.

@kdudka
Contributor

kdudka commented Dec 6, 2017

Agreed. If the locale is understood only by ksh built-ins, it is not very useful.

@krader1961
Contributor

I propose we remove the ksh dependency on the libast locale code ASAP unless someone provides a convincing argument for retaining the dependency. Like the SFIO subsystem and other parts of the AST code base, it would have been great if it had been adopted by the broader UNIX community. But that didn't happen. It would be better if ksh used the standard locale support provided by any (semi-)compatible POSIX environment. We should only be using fallback implementations to work around shortcomings of the target platform. Which for something like locale support should be done by mapping the POSIX functions to native functions rather than using an independent implementation.

@kdudka
Contributor

kdudka commented Dec 8, 2017

I agree that using a local (re)implementation of the system libraries is a bad approach. The only thing I am afraid of is that switching the implementation of such core functionality must have observable side effects, and there is no reliable way to check in advance what is going to break in all the ksh scripts that people have been using for the last few decades.

@krader1961
Contributor

...switching the implementation of such a core functionality must have observable side effects...

Yes, but at this point we're looking at the next published, stable version being what amounts to a major release. Even if we don't implement any new features. Precisely because changes like this one have a very small, but nonzero, chance of breaking existing uses.

As a practical matter I will be surprised if there is anyone using the C_EU.UTF-8 locale since, as said above, only AST code recognizes it. Still, it's something we'll want to clearly document. In fact, I'm going to start a CHANGELOG.md file since we should be keeping track of changes that could be noticed by a user some place other than the git commit history.

@jelmd

jelmd commented Dec 9, 2017

Hmmm, I suggest studying the code and tests a little more deeply and, if they are still not understood, just using online docs like ML archives, etc. to find out why this feature exists and the intention behind it - google makes this really easy ;-) .

A serious developer would have found out in < 5 min that C_EU.UTF-8 and C.UTF-8 are "just" test locales, introduced with the intention of writing regression tests with deterministic results across all platforms. But since the current target of development seems to be RH linux only, who cares ...

siteshwar added a commit that referenced this issue Jan 14, 2018
Also disable a test case for C_EU.UTF-8 locale. This test case started
failing after removing strtold function.

See #177 for discussion about C_EU.UTF-8 locale.
@krader1961 krader1961 self-assigned this Sep 5, 2018
@krader1961
Contributor

I've removed the AST locale subsystem. While doing that work it became obvious that it mostly depended on the platform locale subsystem. So unless a non-standard extension, like locale C_EU.UTF-8, was used, ksh was relying on the platform implementation for defining things like the decimal and thousands separator. And simply relying on standard locales like en_US.UTF-8, which can now be found on every platform, makes it easy enough to write unit tests that require features not available in the C/POSIX locale.
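As a rough illustration of such a test (not taken from the ksh test suite), the sketch below asks the platform for a locale's thousands separator and checks that `printf` groups with it. `check_grouping` is a hypothetical helper name, and the external `printf` is used here for portability; a ksh unit test would exercise the builtin instead.

```shell
# check_grouping is a hypothetical helper, not part of the ksh test suite.
# Usage: check_grouping LOCALE
check_grouping() {
    loc=$1
    # Ask the platform which separator the locale defines. An empty answer
    # means the locale defines none or could not be loaded.
    sep=$(LC_ALL="$loc" locale thousands_sep 2>/dev/null)
    if [ -z "$sep" ]; then
        echo "SKIP $loc"
        return 0
    fi
    # Format 1000 with the external printf under the same locale.
    got=$(LC_ALL="$loc" env printf "%'d" 1000)
    if [ "$got" = "1${sep}000" ]; then
        echo "PASS $loc"
    else
        echo "FAIL $loc: got '$got'"
        return 1
    fi
}
```

Calling `check_grouping en_US.UTF-8` would report PASS where that locale is installed and SKIP where it is not, so the expected value comes from the platform instead of being hard-coded.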

Closing since this has been resolved by virtue of depending solely on whatever locales the platform provides.
