Description
Search before asking
- I searched the issues and found no similar issues.
What happened + What you expected to happen
R7RS suggests that strings may contain Unicode characters, and that built-in string procedures such as (string-ci=?) should handle internationalized case mapping. The interpreter should therefore change the internal encoding of its String class.
There are several suggestions and requirements for Unicode string handling in Scheme. R7RS does not require (string-set!) and (string-ref) to run in constant time, but it does require that a string index be a code point index. Neither R7RS nor The Scheme Programming Language suggests exposing surrogate pairs the way Java strings do.
Unicode support differs between the C++ standard library and spdlog. The standard library indexes UTF-32 strings by code point, but it indexes UTF-8 and UTF-16 strings by code unit (a byte for UTF-8, a 16-bit unit for UTF-16) rather than by code point. The standard library also supports case mapping for Unicode. Judging by its API, spdlog appears to accept only UTF-8 encoded messages.
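If strings are stored as UTF-32 internally, they would need to be re-encoded as UTF-8 before being handed to a logger that expects UTF-8. The following is a minimal sketch of that conversion, not code from this project; `to_utf8` and `to_utf8_example` are hypothetical names, and replacing invalid code points with U+FFFD is an assumption about the desired error handling.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a UTF-32 string as UTF-8 bytes.
// Surrogates and values above U+10FFFF are replaced with U+FFFD (assumed policy).
std::string to_utf8(const std::u32string& s) {
    std::string out;
    for (char32_t c : s) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
            c = 0xFFFD;
        }
        if (c < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                              // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Usage sketch: U+03BB (Greek small lambda) takes two bytes in UTF-8.
void to_utf8_example() {
    assert(to_utf8(U"\u03BBx") == "\xCE\xBBx");
}
```

The resulting `std::string` could then be passed to a logging call such as `spdlog::info("{}", to_utf8(text))`, assuming the message sink expects UTF-8.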
So I suggest using UTF-32 as the encoding for strings in this project. Although UTF-32 consumes more memory, because each character occupies 4 bytes, it makes locating a code point by index easier than UTF-8 or UTF-16 does.
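To illustrate the trade-off, here is a small sketch of code point indexing on top of `std::u32string`. The names `string_ref` and `string_set` are hypothetical stand-ins for the Scheme procedures, not existing functions in this project, and the bounds check via `assert` is an assumed policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for (string-ref s k): one code point per element,
// so the index k is directly a code point index and access is O(1).
char32_t string_ref(const std::u32string& s, std::size_t k) {
    assert(k < s.size());
    return s[k];
}

// Hypothetical stand-in for (string-set! s k c): overwriting a code point
// never changes the length of the underlying storage.
void string_set(std::u32string& s, std::size_t k, char32_t c) {
    assert(k < s.size());
    s[k] = c;
}
```

For comparison, the same text stored as UTF-8 in a `std::string` is indexed by byte: `std::string("\xCE\xBBx")` holds the two code points U+03BB and `x`, but its `size()` is 3 and `[0]` yields only the first byte of U+03BB.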
Another option is UTF-8. For mostly ASCII text such as source code, UTF-8 consumes less space than UTF-16 and UTF-32, and it can be scanned sequentially from start to end, so it is suitable for storing Scheme input code. However, the C++ standard library lacks facilities for iterating over the code points of a UTF-8 string.
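Since the standard library offers no code point iteration over UTF-8, a hand-written decoder would be needed for this option. The sketch below shows one possible forward scan; `decode_utf8` and `decode_utf8_example` are hypothetical names, the validation is deliberately simplified (overlong encodings and encoded surrogates are not rejected), and emitting U+FFFD for malformed input is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: scan UTF-8 bytes from start to end and collect code points.
// Simplified: malformed lead or continuation bytes yield U+FFFD, but overlong
// forms and encoded surrogates are accepted as-is.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else { out += U'\uFFFD'; ++i; continue; }     // invalid lead byte

        if (i + len > in.size()) {                    // truncated sequence at end
            out += U'\uFFFD';
            break;
        }
        bool ok = true;
        for (std::size_t j = 1; j < len; ++j) {
            const unsigned char c = static_cast<unsigned char>(in[i + j]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);              // append 6 payload bits
        }
        if (!ok) { out += U'\uFFFD'; ++i; continue; } // resync after the lead byte
        out += cp;
        i += len;
    }
    return out;
}

// Usage sketch: three bytes of UTF-8 decode to two code points.
void decode_utf8_example() {
    assert(decode_utf8("\xCE\xBBx") == U"\u03BBx");
}
```

Finding the k-th code point with such a scan is linear in the string length, which R7RS permits but which is the cost UTF-32 storage avoids.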
Reproduction way
THIS IS A FEATURE REQUEST AND PROPOSAL.
Anything else
Are you willing to submit a PR?
- Yes I am willing to submit a PR!