This repository was archived by the owner on Aug 17, 2022. It is now read-only.

Support UTF-16 as an additional encoding #136

@dcodeIO

From WebAssembly/design#1419 (comment):

@lukewagner: I think it would make sense to talk about supporting UTF-16 as an additional encoding in the canonical ABI of string. But that's a whole separate topic with a few options, so I don't want to mix that up with the abstract string semantics which need to be understood first.

I am very interested in making this happen, as it would already be a considerable improvement for languages that use a 16-bit Unicode representation. Currently I can imagine three ways to expose it: separate instructions, an immediate (though at that point it may as well be separate instructions, I guess), or a parameter. For example:

list.lift_utf8 [...]
list.lift_utf16 [...]

list.is_utf8 [...]
list.is_utf16 [...]

list.lower_utf8 [...]
list.lower_utf16 [...]

Is that what you had in mind? If not, I am of course very interested in the other options :)

It may also be worthwhile to consider list.lift_latin1, which corresponds to narrow UTF-16 (each code unit stored in a single byte, with the all-zero high bytes left out), as this is a common optimization in UTF-16 languages (it saves memory and makes better use of the CPU cache when possible). I do not feel strongly about whether we need the latter in an MVP already, though.
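
To make the Latin-1 narrowing mentioned above concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions: the helper names (`utf16_code_units`, `can_narrow`, `narrow_to_latin1`, `widen_from_latin1`) are hypothetical and are not part of any proposed instruction set or ABI.

```python
import struct


def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of `text` (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def can_narrow(units: list[int]) -> bool:
    """True if every code unit fits in one byte, i.e. all high bytes are zero."""
    return all(unit <= 0xFF for unit in units)


def narrow_to_latin1(units: list[int]) -> bytes:
    """Drop the zero high byte of each code unit, giving one byte per unit."""
    if not can_narrow(units):
        raise ValueError("string contains code units above 0xFF")
    return bytes(units)


def widen_from_latin1(data: bytes) -> list[int]:
    """Restore 16-bit code units by zero-extending each byte."""
    return list(data)


if __name__ == "__main__":
    units = utf16_code_units("café")
    narrow = narrow_to_latin1(units)
    # Half the storage, and widening restores the original code units.
    print(len(units) * 2, len(narrow), widen_from_latin1(narrow) == units)
```

A string such as "café" narrows to one byte per code unit, while anything containing a code unit above 0xFF (for example "€", or any surrogate pair) has to stay in the 16-bit representation.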
