[Discussion] string encoding #5

jedwards1211 · 2020-09-23T02:12:36Z

Hi there, Ć looks awesome and I really want to use it, but for some of my use cases (file parsing), dealing with string encoding would be hard...

I'm not experienced in working with Unicode string encodings in C/C++, and I don't know if you are either, but have you had any thoughts about what it would take to make Ć strings Unicode (or maybe a pragma to turn on Unicode strings)?

It's something I might look into contributing if you're open to it and would like to give me tips on working with the codebase.
In a quest to make cross-language APIs, I've determined that Haxe definitely won't work, and SWIG/Emscripten seem workable but would be a huge hassle compared to using Ć.

pfusik · 2020-09-23T07:46:52Z

Ć is already Unicode-capable. The actual string encoding varies between the target languages. Are you concerned with the C or C++ interface? The C and C++ strings are expected to be UTF-8-encoded, which is the default encoding on modern GNU/Linux and macOS.
UTF-8 is also widely used for text file encoding on Windows. However, the Windows API has historically used UTF-16. You can, of course, convert between UTF-8 and UTF-16.
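
To illustrate the UTF-8/UTF-16 conversion, here is a minimal sketch in Python (one of Ć's target languages). It is plain Python, not Ć code; the helper names `utf8_to_utf16le` and `utf16le_to_utf8` are made up for this example, and it assumes little-endian UTF-16 without a byte order mark.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as UTF-16LE (no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16LE bytes and re-encode them as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# "Ć 😃" written with escapes so the source file encoding does not matter.
text = "\u0106 \U0001F603"
utf8 = text.encode("utf-8")
utf16 = utf8_to_utf16le(utf8)

print(len(text))   # 3 code points
print(len(utf8))   # 7 bytes: 2 for Ć, 1 for the space, 4 for the emoji
print(len(utf16))  # 8 bytes: 2 + 2 + a 4-byte surrogate pair

# The round trip is lossless for valid input.
assert utf16le_to_utf8(utf16) == utf8
```

The same text has a different length in each encoding, which is why string lengths and offsets can differ between target languages.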

jedwards1211 · 2020-09-23T21:31:56Z

Oh, I didn't realize that the C/C++ strings would be UTF-8, that's great news! I just assumed otherwise because somewhere in the docs you said that if you're planning to do a bunch of string manipulation, use Perl, and I know string encoding in C/C++ is kind of crazy.

In that case, I'll give Ć a try soon 😃

jedwards1211 · 2020-09-23T21:32:40Z

There isn't currently a Ć-native regex that abstracts the differences between target languages, is there? Feel free to let me know your thoughts on that.

pfusik · 2020-09-24T15:11:33Z

I started adding regular expressions today. So far it's just one method, see its test.
Next up: retrieving the position and contents of the match, then the captures.

jedwards1211 · 2020-09-24T18:16:16Z

Wow cool! I'm gonna start playing around with Ć this weekend.

pfusik · 2020-09-25T12:28:47Z

Match location and captures are implemented for C#, Java, JavaScript and Python.
Next up: documentation and a Regex object with a pre-processed expression for improved performance of repeated searches.
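
As a rough picture of what match location, captures, and a pre-processed expression look like in one of the listed targets, here is a sketch using Python's standard `re` module. It is not Ć syntax and not necessarily the code Ć generates; the pattern, the sample string, and the name `DATE` are assumptions chosen for illustration.

```python
import re

# Compile the pattern once and reuse it for repeated searches.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

line = "closed on 2020-10-23 by pfusik"
m = DATE.search(line)

if m is not None:
    print(m.start(), m.end())  # 10 20: position of the match in the string
    print(m.group(0))          # 2020-10-23: contents of the whole match
    print(m.groups())          # ('2020', '10', '23'): the captures
```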

pfusik · 2020-10-23T18:23:10Z

Unicode capabilities explained. Regexes implemented and documented. Can we close this?

jedwards1211 · 2020-10-23T18:27:15Z

Yup!

pfusik self-assigned this Oct 1, 2020

pfusik closed this as completed Oct 23, 2020
