-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Z3_get_lstring now adds escaping of some bytes. #5615
Comments
The get_lstring API didn't age well: Z3 used to only support ASCII (8 bits per character). It now supports up to 32 bits and defaults to a Unicode range as defined by SMTLIB string definition. |
…string, #5615 Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
I updated the C++ API by removing the "escaped" getter because they are both escaped (and there was another bug in one of the escape conversions to fix). There is a new function get_wstring() that returns a vector of unsigned. I am not sure if there is any more useful API (maybe one of the C++ versions for building Unicode strings works, but wchar definitely doesn't have enough bits with only 16). |
Thanks for the quick turnaround, this is great! I am only half familiar with this, but C++ has
In my (again limit) experience most situations have std::string with an encoded utf8 rather then a std::u32string. The conversion between a utf8-encoded std::string and a std::u32string of codepoints in the stl is shown here: https://stackoverflow.com/a/43302460/9517687 If you think it makes sense, I can make the changes to implement something like
The change between std::u32string and std::vector versions is not a big difference, just saves an extra conversion step if you want to use the conversion code I linked above. The new version for creating exprs avoids some questions about how to properly call the |
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
…haracters in get_lstring Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
I have iterated on this with a few updates. While I haven't adapted the more descriptive naming convention with "unescaped_utf8_string_val" I believe the functionality now addresses accessing characters properly. |
I am closing this for now. If there is something that needs fixing add a comment, but otherwise I have to assume it is handled. |
For #2286 the function Z3_get_lstring was introduced, and its documentation says it returns an unescaped string. We use this in the C++ API via "get_string" which was introduced in a second commit. Our team uses this function to get the string in a C++ program, and we need to get the string to be exactly the string produced by z3 (e.g. if we are asserting the length is 3 we expect the length of the produced string to be 3). Furthermore, the result is now ambiguous: "\u{a}" could mean that Z3 found the one-character string "\n" or the 5 character string "\u{a}".
As of this commit and this commit the function now does escaping. In particular
c.string_val("\n", 1).get_string() == "\n"
is no longer true. Is there a version to preserve the round-trip?I don't have the context of the first commit that escaped non-ascii bytes, but I would expect that that round trip works for even unicode if that is now supported.
The text was updated successfully, but these errors were encountered: