
localedata: add some exceptions to utf8proc widths
Closed, Public

Authored by kevans on Nov 7 2024, 5:01 AM.

Details

Summary

commit 88082b41b38f62f74e1a7116546bda6912e2efad (HEAD -> kbsd/localedata)
Author: Kyle Evans <kevans@FreeBSD.org>
Date: Wed Nov 6 22:55:05 2024 -0600

localedata: update widths.txt after recent Hangul exceptions

Sponsored by:   Klara, Inc.

commit 18264bd2ce00cfcd0a86cddab5da5502395661d3
Author: Kyle Evans <kevans@FreeBSD.org>
Date: Wed Nov 6 22:51:25 2024 -0600

localedata: add some exceptions to utf8proc widths

Ignorable characters will technically have a width of 1, but we should
treat them as zero-width.  This corrects some hangul filler characters'
width, which is otherwise equivalent to the block of characters around
them.

Hangul Jamo medial vowels and final consonants are reportedly combining
characters that won't take up any columns on their own and should be
reported as zero-width, so add an exception for these as well to reflect
how they work in practice.

Sponsored by:   Klara, Inc.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kevans requested review of this revision. Nov 7 2024, 5:01 AM

Although Hangul Jamo characters occupy space, many implementations, including GNU libc, now seem to report them as zero-width.

https://sourceware.org/bugzilla/show_bug.cgi?id=19852
https://github.com/ridiculousfish/widecharwidth/issues/16

I agree that we should follow other implementations.

However, each filler must occupy its proper number of columns, i.e., <HANGUL_CHOSEONG_FILLER> => 2, <HANGUL_JUNGSEONG_FILLER> => 0, and <HALFWIDTH_HANGUL_FILLER> => 1.
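
For reference, the three fillers named above can be resolved to code points with a short script. This is only an illustrative sketch: it relies on Python's `unicodedata` name lookup, the `REQUESTED_WIDTHS` table and `filler_table` helper are hypothetical names, and the widths are simply the values requested in this comment, not output from the localedata tools.

```python
import unicodedata

# Widths requested in the review comment, keyed by Unicode character name.
REQUESTED_WIDTHS = {
    "HANGUL CHOSEONG FILLER": 2,
    "HANGUL JUNGSEONG FILLER": 0,
    "HALFWIDTH HANGUL FILLER": 1,
}


def filler_table():
    """Return (code point, name, requested width) rows sorted by code point."""
    rows = []
    for name, width in REQUESTED_WIDTHS.items():
        cp = ord(unicodedata.lookup(name))
        rows.append((cp, name, width))
    return sorted(rows)


if __name__ == "__main__":
    for cp, name, width in filler_table():
        print(f"U+{cp:04X}  {name:<24} width {width}")
```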

This revision is now accepted and ready to land. Nov 7 2024, 8:07 AM

Ah, ok, thanks for confirming that. So if I just drop the ignorable check, the fillers return to their previous values, with the exception of <HANGUL_JUNGSEONG_FILLER>, since it falls in the Hangul Jamo range that we're forcing to zero-width. That looks right based on what you've written, since omitted values default to 1.

Yes, that is exactly what I meant.
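
The outcome agreed on above can be sketched roughly as follows. This is a minimal illustration, not the code from the revision: `base_width` is a hypothetical stand-in for the utf8proc-derived width (it only distinguishes East Asian Wide/Fullwidth from everything else and ignores control and combining characters), and the U+1160 to U+11FF bounds are an assumption about which part of the Hangul Jamo block holds the medial vowels and final consonants.

```python
import unicodedata

# Assumed bounds: Hangul Jamo medial vowels and final consonants,
# starting at the jungseong filler (U+1160).
JAMO_ZERO_WIDTH_FIRST = 0x1160
JAMO_ZERO_WIDTH_LAST = 0x11FF


def base_width(cp):
    """Stand-in for the upstream width: 2 for Wide/Fullwidth, otherwise 1."""
    return 2 if unicodedata.east_asian_width(chr(cp)) in ("W", "F") else 1


def width_with_exceptions(cp):
    """Force the assumed Jamo range to zero-width; defer to base_width otherwise."""
    if JAMO_ZERO_WIDTH_FIRST <= cp <= JAMO_ZERO_WIDTH_LAST:
        return 0
    # No ignorable-character check here, so fillers outside the range
    # keep whatever width the base lookup reports.
    return base_width(cp)


if __name__ == "__main__":
    for cp in (0x115F, 0x1160, 0xFFA0):
        print(f"U+{cp:04X} -> {width_with_exceptions(cp)}")
```

With only the range exception in place, the choseong and halfwidth fillers fall through to the base lookup, while the jungseong filler is zero-width purely because of where it sits in the block.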

jkim requested changes to this revision. Nov 7 2024, 5:06 PM
jkim added a reviewer: jkim.
This revision now requires changes to proceed. Nov 7 2024, 5:07 PM

Drop the ignorable character bit, which renders the soft-hyphen exception redundant.

This revision is now accepted and ready to land. Nov 7 2024, 6:30 PM