HomeFreeBSD

converters/simdutf: Unicode validation and transcoding with SIMD

Description

converters/simdutf: Unicode validation and transcoding with SIMD

This library provide fast Unicode functions such as

  • ASCII, UTF-8, UTF-16LE/BE and UTF-32 validation, with and without error identification,
  • transcoding between each of Latin1, UTF-8, UTF-16LE/BE, and UTF-32, with and without validation, with and without error identification
  • From an UTF-8 string, compute the size of the Latin1/UTF-16/UTF-32 equivalent string,
  • From an UTF-16LE/BE string, compute the size of the Latin1/UTF-8/UTF-32 equivalent string,
  • From an UTF-32 string, compute the size of the UTF-8 or UTF-16LE equivalent string,
  • UTF-8 and UTF-16LE/BE character counting.
  • UTF-16 endianness change (UTF16-LE/BE to UTF-16-BE/LE)

The functions are accelerated using SIMD instructions (e.g., ARM NEON,
SSE, AVX, AVX-512, etc.). When your strings contain hundreds of
characters, we can often transcode them at speeds exceeding a billion
characters per second. You should expect high speeds not only with
English strings (ASCII) but also Chinese, Japanese, Arabic, and so
forth. We handle the full character range (including, for example,
emojis).

The library compiles down to a small library of a few hundred kilobytes.
Our functions are exception-free and non allocating. We have extensive
tests and extensive benchmarks.

WWW: https://simdutf.github.io/simdutf/

Details

Provenance
fuzAuthored on Oct 20 2023, 5:07 PM
Parents
R11:a8b90e8c06d5: databases/tiledb: fix build on armv6/armv7
Branches
Unknown
Tags
Unknown