Support for C_EU.UTF-8 locale in ksh93 #177

siteshwar · 2017-12-02T19:33:36Z

I could not find it documented anywhere; however, there is one comment about this locale in the test cases:

# this locale is supported by ast on all platforms
# EU for { decimal_point="," thousands_sep="." }
locale=C_EU.UTF-8

krader1961 · 2017-12-04T03:37:05Z

From (among others) the https://www.gnu.org/software/libc/manual/html_node/Locale-Names.html page:

Portability Note: With the notable exception of the standard locale names ‘C’ and ‘POSIX’, locale names are system-specific.

I am not aware of any non-AST implementation that provides support for "territory" extensions to the C or POSIX locale. Given that Unicode encodings provide support for code points like the Euro currency symbol there should not be a need for the C_EU.UTF-8 locale name. The C (or POSIX) locale implicitly does not support things like the EU territory for number formatting. The need for such support is why the misguided ISO-8859-x standards exist and were ultimately replaced by the Unicode standard.

Anyone wanting EU formatting of numbers can get it by setting the LC_NUMERIC env var to a locale that provides it while leaving LANG set to C.
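A minimal sketch of that approach, assuming a POSIX shell. The standard variable for number formatting is LC_NUMERIC, and LC_ALL overrides it when set, so the sketch clears LC_ALL first. The helper name with_numeric_locale is hypothetical, and de_DE.UTF-8 in the usage comment is only an assumed example of a locale that may be installed on the platform (check with `locale -a`).

```shell
# with_numeric_locale is a hypothetical helper name, not part of ksh.
# It runs a command with only LC_NUMERIC changed; LANG is left untouched.
# The function body is a subshell, so the caller's environment is unaffected.
with_numeric_locale() (
    # LC_ALL, when set, takes precedence over every LC_* variable.
    unset LC_ALL
    LC_NUMERIC=$1
    export LC_NUMERIC
    shift
    "$@"
)

# Usage (assumes de_DE.UTF-8 is installed on the platform):
#   LANG=C
#   with_numeric_locale de_DE.UTF-8 /usr/bin/printf "%'d\n" 1000
```

If the named locale is not installed, most programs quietly fall back to C/POSIX formatting, so it is worth checking the installed locales first.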

The bottom line is that ksh should use the same distro provided library code that almost every other program uses for localization. If only ksh93 recognizes LANG=C_EU.UTF-8 what use is it?

kdudka · 2017-12-05T20:31:41Z

How are decimal_point and thousands_sep used in ksh?

siteshwar · 2017-12-06T03:08:23Z

They are used by builtins, e.g.:

$ echo $LANG
en_US.UTF-8
$ printf "%'d\n" 1000
1,000
$ export LANG=C_EU.UTF-8
$ printf "%'d\n" 1000
1.000

krader1961 · 2017-12-06T03:18:33Z

Yes, but since it isn't a standard locale no other program will honor it. For example, if you do this on Ubuntu you get no thousands separator:

$ env LANG=C_EU.UTF-8 /usr/bin/printf "%'d\n" 1000
1000
$ echo $LANG
en_US.UTF-8
$ /usr/bin/printf "%'d\n" 1000
1,000

The C_EU.UTF-8 locale is only meaningful inside ksh which is borderline useless and going to be the source of many bug reports due to the confusion it causes.
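One way to see what the platform itself defines for a locale name is to ask the POSIX `locale` utility for the relevant keywords. The following is a sketch only; show_separators is a hypothetical helper name, and it assumes the `locale` utility is installed.

```shell
# show_separators is a hypothetical helper name.
# It prints the decimal_point and thousands_sep values (one per line) that
# the platform's locale database reports for the given locale name.
show_separators() {
    # `env` sets LC_ALL only for the locale utility itself.
    # Diagnostics about unknown locale names go to stderr and are discarded.
    env LC_ALL="$1" locale decimal_point thousands_sep 2>/dev/null
}

# Usage:
#   show_separators en_US.UTF-8
#   show_separators C_EU.UTF-8
```

On a platform that does not know a name such as C_EU.UTF-8, the utility typically reports the C/POSIX values, which is consistent with /usr/bin/printf printing no thousands separator above.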

krader1961 · 2017-12-06T03:27:08Z

Basically, this "feature" is nonstandard, only usable within ksh (or other programs built against the AST libraries), and guaranteed to be a source of confusion and bug reports. It should be removed. Even better would be to remove the ksh93 dependency on the AST locale support code and use what is provided by the OS. Which is what we're slowly doing with other things like the string manipulation functions.

kdudka · 2017-12-06T07:53:02Z

Agreed. If the locale is understood only by ksh built-ins, it is not very useful.

krader1961 · 2017-12-08T08:12:25Z

I propose we remove the ksh dependency on the libast locale code ASAP unless someone provides a convincing argument for retaining the dependency. Like the SFIO subsystem and other parts of the AST code base, it would have been great if it had been adopted by the broader UNIX community. But that didn't happen. It would be better if ksh used the standard locale support provided by any (semi-)compatible POSIX environment. We should only be using fallback implementations to work around shortcomings of the target platform. For something like locale support, that should be done by mapping the POSIX functions to native functions rather than using an independent implementation.

kdudka · 2017-12-08T08:58:18Z

I agree that using a local (re)implementation of the system libraries is a bad approach. The only thing I am afraid of is that switching the implementation of such core functionality must have observable side effects, and there is no reliable way to check in advance what is going to break in all the ksh scripts that people have been using for the last few decades.

krader1961 · 2017-12-09T00:44:42Z

...switching the implementation of such a core functionality must have observable side effects...

Yes, but at this point we're looking at the next published, stable, version being what amounts to a major release. Even if we don't implement any new features. Precisely because changes like this one have a very small, but non zero, chance of breaking existing uses.

As a practical matter I will be surprised if there is anyone using the C_EU.UTF-8 locale since, as said above, only AST code recognizes it. Still, it's something we'll want to clearly document. In fact, I'm going to start a CHANGELOG.md file since we should be keeping track of changes that could be noticed by a user somewhere other than the git commit history.

jelmd · 2017-12-09T19:16:00Z

Hmmm, I suggest studying the code and tests a little deeper and, if they are still not understood, using online docs like ML archives, etc. to find out why this feature exists and the intention behind it - google makes this really easy ;-) .

A serious developer would have found out in < 5 min that C_EU.UTF-8 and C.UTF-8 are "just" test locales, introduced with the intention of writing regression tests with deterministic results across all platforms. But since the current target of development seems to be RH Linux only, who cares ...

krader1961 · 2018-09-05T02:46:14Z

I've removed the AST locale subsystem. While doing that work it became obvious that it mostly depended on the platform locale subsystem. So unless a non-standard extension, like locale C_EU.UTF-8, was used, ksh was already relying on the platform implementation for defining things like the decimal and thousands separators. And simply relying on standard locales like en_US.UTF-8, which can now be found on every platform, makes it easy enough to write unit tests that require features not available in the C/POSIX locale.
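A unit test that depends on a platform locale can guard itself by checking that the locale is actually installed. This is a sketch under assumptions: have_locale is a hypothetical helper, not something from the ksh test suite, and it reads the list of names from stdin so it can be fed from `locale -a`.

```shell
# have_locale is a hypothetical helper name.
# It succeeds if the locale name given as $1 appears in the list of locale
# names read from stdin (normally the output of `locale -a`).
have_locale() {
    # glibc lists en_US.UTF-8 as en_US.utf8, so compare case-insensitively
    # and ignore hyphens on both sides.
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
    tr '[:upper:]' '[:lower:]' | tr -d '-' | grep -Fqx -- "$want"
}

# Usage in a test script:
#   if locale -a | have_locale en_US.UTF-8; then
#       [ "$(LC_ALL=en_US.UTF-8 /usr/bin/printf "%'d" 1000)" = "1,000" ]
#   else
#       echo "skipping: en_US.UTF-8 is not installed"
#   fi
```

Skipping rather than failing keeps the test suite usable on minimal installations that ship only the C/POSIX locale.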

Closing since this has been resolved by virtue of depending solely on whatever locales the platform provides.

siteshwar added the RFC label Dec 3, 2017

siteshwar mentioned this issue Jan 14, 2018

Remove compatibility function for strtold #355

Merged

krader1961 added compatibility cleanup labels Jan 14, 2018

siteshwar added a commit that referenced this issue Jan 14, 2018

Remove compatibility function for strtold

206f377

Also disable a test case for C_EU.UTF-8 locale. This test case started failing after removing strtold function. See #177 for discussion about C_EU.UTF-8 locale.

krader1961 self-assigned this Sep 5, 2018

krader1961 closed this as completed Sep 5, 2018
