converters/simdutf: Unicode validation and transcoding with SIMD
This library provide fast Unicode functions such as
- ASCII, UTF-8, UTF-16LE/BE and UTF-32 validation, with and without error identification,
- transcoding between each of Latin1, UTF-8, UTF-16LE/BE, and UTF-32, with and without validation, with and without error identification
- From an UTF-8 string, compute the size of the Latin1/UTF-16/UTF-32 equivalent string,
- From an UTF-16LE/BE string, compute the size of the Latin1/UTF-8/UTF-32 equivalent string,
- From an UTF-32 string, compute the size of the UTF-8 or UTF-16LE equivalent string,
- UTF-8 and UTF-16LE/BE character counting.
- UTF-16 endianness change (UTF16-LE/BE to UTF-16-BE/LE)
The functions are accelerated using SIMD instructions (e.g., ARM NEON,
SSE, AVX, AVX-512, etc.). When your strings contain hundreds of
characters, we can often transcode them at speeds exceeding a billion
characters per second. You should expect high speeds not only with
English strings (ASCII) but also Chinese, Japanese, Arabic, and so
forth. We handle the full character range (including, for example,
emojis).
The library compiles down to a small library of a few hundred kilobytes.
Our functions are exception-free and non allocating. We have extensive
tests and extensive benchmarks.