This repository has been archived by the owner on Mar 26, 2024. It is now read-only.

RFC: Flexible String Representation #303

Closed
omasanori opened this issue Jul 23, 2015 · 4 comments

Comments

@omasanori
Contributor

Related to #211.

The Python community published PEP 393, "Flexible String Representation", and implemented it in Python 3.3. picrin doesn't use wchar_t now, but we might benefit from a similar representation. What do you think?

Reference: https://www.python.org/dev/peps/pep-0393/
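
For readers who have not seen the PEP, its core idea is that each string is stored at a fixed width of 1, 2, or 4 bytes per character, chosen from the largest code point the string contains. The following is only a minimal sketch of that selection step; `flex_char_width` is a hypothetical name and is neither picrin nor CPython code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of picrin or CPython: returns the number of
 * bytes per character (1, 2, or 4) needed to store every code point in
 * cps[0..n) at a fixed width, using the PEP 393 thresholds. */
static int flex_char_width(const uint32_t *cps, size_t n)
{
  uint32_t max = 0;
  size_t i;

  for (i = 0; i < n; ++i) {
    if (cps[i] > max)
      max = cps[i];
  }
  if (max < 0x100)
    return 1;  /* everything fits in the Latin-1 range */
  if (max < 0x10000)
    return 2;  /* everything fits in the Basic Multilingual Plane */
  return 4;    /* at least one code point above U+FFFF */
}
```

With this scheme an ASCII-only string costs one byte per character, and only strings that actually contain code points above U+FFFF pay for four.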

@KeenS
Member

KeenS commented Jul 23, 2015

wasabiz and I once discussed what picrin's internal representation of Unicode should be. UTF-8 has some advantages, and UTF-16 (or UTF-32) has several drawbacks:

  • picrin is a lightweight implementation, so memory-hungry UTF-16 (or UTF-32) is not suitable.
  • wchar_t is not guaranteed to be 32 bits wide (it is 16 bits on Windows, for example), which means you cannot rely on wchar_t to hold UTF-32 code units.
  • Because picrin uses ropes to hold strings, UTF-8's O(n) character-indexing cost does not matter so much.

So I think staying with char and UTF-8 will do. What do you think, @wasabiz?
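
To illustrate the rope point, here is a rough sketch of why chunked storage softens the O(n) indexing cost of UTF-8. It assumes each chunk caches its own code point count; `struct chunk` and the function names are hypothetical, picrin's real rope type differs, and a flat array stands in for the tree. The input is assumed to be valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical chunk type for illustration only. */
struct chunk {
  const char *bytes;  /* UTF-8 data, assumed valid */
  size_t len;         /* length in bytes */
  size_t nchars;      /* cached number of code points in this chunk */
};

/* Count code points: every byte that is not a continuation byte
 * (10xxxxxx) starts a new code point. */
static size_t utf8_count(const char *s, size_t len)
{
  size_t i, n = 0;

  for (i = 0; i < len; ++i) {
    if (((unsigned char)s[i] & 0xC0) != 0x80)
      ++n;
  }
  return n;
}

/* Byte offset of the k-th code point (0-based) within one buffer.
 * This is the O(len) scan; returns len if k is out of range. */
static size_t utf8_offset(const char *s, size_t len, size_t k)
{
  size_t i;

  for (i = 0; i < len; ++i) {
    if (((unsigned char)s[i] & 0xC0) != 0x80) {
      if (k == 0)
        return i;
      --k;
    }
  }
  return len;
}

/* Locate the k-th code point across chunks: whole chunks are skipped using
 * the cached counts, and only one chunk is scanned byte by byte.
 * Returns 0 on success, -1 if k is out of range. */
static int chunks_locate(const struct chunk *cs, size_t ncs, size_t k,
                         size_t *chunk_out, size_t *offset_out)
{
  size_t c;

  for (c = 0; c < ncs; ++c) {
    if (k < cs[c].nchars) {
      *chunk_out = c;
      *offset_out = utf8_offset(cs[c].bytes, cs[c].len, k);
      return 0;
    }
    k -= cs[c].nchars;
  }
  return -1;
}
```

In a real rope the chunk search would walk a tree rather than an array, so the byte-level scan is bounded by the size of a single leaf instead of the whole string.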

@omasanori
Contributor Author

@KeenS Thank you. I agree with your opinion.

Some additions from my perspective:

  • Caching UTF-32 sequences may boost performance of character-oriented operations, at the cost of memory efficiency as @KeenS pointed out.
  • Strictly speaking, wchar_t is not for storing Unicode characters, but wide characters. The portable way to use wchar_t is to treat it as an opaque type and not rely on its bit pattern.
    • Yes, the big three platforms all treat it as a UTF-{16,32} code unit, but e.g. NetBSD doesn't.
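
As a rough sketch of what building such a UTF-32 cache could involve, the decoder below expands UTF-8 into a caller-provided `uint32_t` buffer, after which character access is a plain array index. `utf8_to_utf32` is a hypothetical name, not picrin code, and the sketch checks only structure: it does not reject overlong encodings or surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: expands UTF-8 in s[0..len) into out[0..cap).
 * Returns the number of code points written, or (size_t)-1 if out is too
 * small or the input is truncated or structurally malformed. */
static size_t utf8_to_utf32(const char *s, size_t len,
                            uint32_t *out, size_t cap)
{
  size_t i = 0, n = 0;

  while (i < len) {
    unsigned char b = (unsigned char)s[i];
    uint32_t cp;
    size_t extra, j;

    if (b < 0x80) {
      cp = b;        extra = 0;  /* 0xxxxxxx: ASCII */
    } else if (b < 0xC0) {
      return (size_t)-1;         /* stray continuation byte */
    } else if (b < 0xE0) {
      cp = b & 0x1F; extra = 1;  /* 110xxxxx */
    } else if (b < 0xF0) {
      cp = b & 0x0F; extra = 2;  /* 1110xxxx */
    } else if (b < 0xF8) {
      cp = b & 0x07; extra = 3;  /* 11110xxx */
    } else {
      return (size_t)-1;         /* not a valid lead byte */
    }

    /* Fail on a truncated sequence or a full output buffer. */
    if (extra > len - i - 1 || n == cap)
      return (size_t)-1;

    for (j = 1; j <= extra; ++j) {
      unsigned char c = (unsigned char)s[i + j];
      if ((c & 0xC0) != 0x80)
        return (size_t)-1;
      cp = (cp << 6) | (c & 0x3F);
    }
    out[n++] = cp;
    i += extra + 1;
  }
  return n;
}
```

The trade-off mentioned above is visible here: the cache needs four bytes per character on top of the UTF-8 original.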

@nyuichi
Member

nyuichi commented Jul 24, 2015

@omasanori @KeenS

Sounds nice. I don't mind changing the internal string representation as long as it still compiles in a freestanding environment. A UTF-32 sequence is good for programs that do heavy string modification, but I think such cases are no more than 1% of the total. Providing a separate structure for them is reasonable.

@nyuichi
Member

nyuichi commented Jul 24, 2015

@omasanori @KeenS

Even if it breaks the no-libc rule, it'll probably be OK as long as we can switch it off with macros.
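
A macro switch of this kind might look roughly like the sketch below. `EXAMPLE_USE_UTF32_CACHE`, `struct example_string`, and `example_has_cache` are hypothetical names for illustration and are not existing picrin identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch, not an existing picrin macro. Define it to 1 on
 * hosted builds; leave it at 0 to keep the build free of libc. */
#ifndef EXAMPLE_USE_UTF32_CACHE
# define EXAMPLE_USE_UTF32_CACHE 0
#endif

#if EXAMPLE_USE_UTF32_CACHE
# include <stdlib.h>  /* malloc/free, pulled in only when the cache is on */
#endif

struct example_string {
  const char *utf8;  /* canonical UTF-8 data */
  size_t len;        /* length in bytes */
#if EXAMPLE_USE_UTF32_CACHE
  uint32_t *cache;   /* lazily built UTF-32 copy, or a null pointer */
#endif
};

/* Returns 1 if this build carries the optional cache field, else 0. */
static int example_has_cache(void)
{
  return EXAMPLE_USE_UTF32_CACHE;
}
```

With the default of 0, the struct and the include list are the same as in a freestanding build, so the libc dependency exists only for users who opt in.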
