utf8 Module
The utf8 module provides low-level UTF-8 helpers for working with Unicode code points and byte-oriented operations.
Functions
RuneCountInString(s string)
Returns the number of Unicode code points (runes) in a UTF-8 string.
Signature:
Example:
DecodeRuneInString(s string)
Decodes the first UTF-8 encoded rune in s and returns its value and width in bytes.
Signature:
- On success: returns (runeValue, widthBytes, None).
- On invalid encoding at the start of s: returns (RuneError, widthBytes, error).
Example (Thai "สวัสดี")
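A minimal sketch in Go that walks the Thai string "สวัสดี" one rune at a time. Go's utf8.DecodeRuneInString returns only (rune, size), so the helper decodeFirst and the error value errInvalidUTF8 below are hypothetical, added to mimic the (runeValue, widthBytes, error) result documented above. Treating an empty string as an error is also an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is a hypothetical error value standing in for the
// module's error result; Go's utf8 package itself returns no error.
var errInvalidUTF8 = errors.New("invalid UTF-8 encoding")

// decodeFirst is a hypothetical wrapper that mimics the documented
// (runeValue, widthBytes, error) result on top of Go's
// utf8.DecodeRuneInString, which returns only (rune, size).
func decodeFirst(s string) (rune, int, error) {
	r, size := utf8.DecodeRuneInString(s)
	// Go signals invalid input with RuneError and size 1 (size 0 for an
	// empty string). A literal U+FFFD in the input decodes with size 3,
	// so it is not reported as an error here.
	if r == utf8.RuneError && size <= 1 {
		return r, size, errInvalidUTF8
	}
	return r, size, nil
}

func main() {
	s := "สวัสดี" // Thai "hello": six code points, three bytes each
	for len(s) > 0 {
		r, width, err := decodeFirst(s)
		if err != nil {
			fmt.Println("error:", err)
			break
		}
		fmt.Printf("%c U+%04X %d bytes\n", r, r, width)
		s = s[width:] // advance past the decoded rune
	}

	_, width, err := decodeFirst("\xffabc")
	fmt.Println(width, err) // 1 invalid UTF-8 encoding
}
```

Go reports a width of 1 for an invalid leading byte, so a caller can skip that byte and resume decoding; whether this module reports the same width on error is an assumption.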
Valid(s string)
Reports whether a string is valid UTF-8.
Signature:
Example:
EastAsianWidth(s string)
Computes a simple display-width metric for a UTF-8 string, treating East Asian wide/fullwidth characters as width 2, combining/control as 0, and others as 1.
Signature:
Example:
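Go's standard library has no EastAsianWidth function, so the following is a hand-written sketch of the rule described above, not this module's implementation. The names eastAsianWidth, runeWidth and wideRanges are hypothetical, and wideRanges is a small, approximate subset of the Wide and Fullwidth ranges from Unicode UAX #11. Combining marks (categories Mn, Me) and control characters (Cc) count as 0 columns.

```go
package main

import (
	"fmt"
	"unicode"
)

// wideRanges is an approximate, hand-picked subset of the East Asian
// Wide (W) and Fullwidth (F) ranges; an assumption for illustration.
var wideRanges = [][2]rune{
	{0x1100, 0x115F},   // Hangul Jamo initial consonants
	{0x2E80, 0x303E},   // CJK and Kangxi radicals, CJK symbols and punctuation
	{0x3041, 0x33FF},   // Hiragana, Katakana, Bopomofo, CJK compatibility
	{0x3400, 0x4DBF},   // CJK Unified Ideographs Extension A
	{0x4E00, 0x9FFF},   // CJK Unified Ideographs
	{0xA000, 0xA4CF},   // Yi syllables and radicals
	{0xAC00, 0xD7A3},   // Hangul syllables
	{0xF900, 0xFAFF},   // CJK Compatibility Ideographs
	{0xFF00, 0xFF60},   // Fullwidth ASCII variants
	{0xFFE0, 0xFFE6},   // Fullwidth symbol variants
	{0x20000, 0x3FFFD}, // CJK Extension B and later planes
}

// runeWidth returns 0 for combining marks and controls, 2 for runes in
// wideRanges, and 1 otherwise.
func runeWidth(r rune) int {
	if unicode.In(r, unicode.Mn, unicode.Me, unicode.Cc) {
		return 0
	}
	for _, rg := range wideRanges {
		if r >= rg[0] && r <= rg[1] {
			return 2
		}
	}
	return 1
}

// eastAsianWidth sums runeWidth over the runes of s. Ranging over a Go
// string decodes UTF-8; invalid bytes yield U+FFFD, which counts as 1.
func eastAsianWidth(s string) int {
	total := 0
	for _, r := range s {
		total += runeWidth(r)
	}
	return total
}

func main() {
	fmt.Println(eastAsianWidth("Go 世界")) // 7: three narrow + two wide
	fmt.Println(eastAsianWidth("สวัสดี"))  // 4: two of the six runes are combining marks
}
```

For the full UAX #11 property in Go, the golang.org/x/text/width package is a more complete starting point than a hand-maintained range table.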
Relationship to unicode and encoding
- utf8 focuses on byte-level UTF-8 operations (decoding, counting, validation, and display width).
- Higher-level Unicode features, such as normalization and character classification, remain in the unicode module.
- Some legacy UTF-8 helpers may still exist in encoding or unicode for backward compatibility, but new code should prefer the utf8 module.