
utf8 Module

The utf8 module provides low-level UTF-8 helpers for working with Unicode code points and byte-oriented operations.

Functions

RuneCountInString(s string)

Returns the number of Unicode code points (runes) in a UTF-8 string.

Signature:

utf8.RuneCountInString(s string) (int, error)

Example:

package main
import fmt, utf8

const s string = "こんにちは"

var count, err = utf8.RuneCountInString(s)
if err == None {
    fmt.Println("Rune count:", count)
}

DecodeRuneInString(s string)

Decodes the first UTF-8 encoded rune in s and returns its value and width in bytes.

Signature:

utf8.DecodeRuneInString(s string) (int, int, error)
  • On success: returns (runeValue, widthBytes, None).
  • On invalid encoding at the start of s: returns (RuneError, widthBytes, error).

Example (decoding the Thai string "สวัสดี" one rune at a time):

package main
import fmt, utf8

const s string = "สวัสดี"

var i int = 0
for i < len(s) {
    var slice = s[i:]
    var r, width, err = utf8.DecodeRuneInString(slice)
    if err != None {
        fmt.Println("Decode error:", err)
        break
    }
    fmt.Printf("U+%04X starts at byte %d\n", r, i)
    i = i + width
}

Valid(s string)

Reports whether a string is valid UTF-8.

Signature:

utf8.Valid(s string) (bool, error)

Example:

package main
import fmt, utf8

var s string = "こんにちは"
var ok, err = utf8.Valid(s)
if err == None {
    fmt.Println("Valid UTF-8:", ok)
}

EastAsianWidth(s string)

Computes a simple display-width metric for a UTF-8 string: East Asian wide and fullwidth characters count as width 2, combining marks and control characters as width 0, and all other characters as width 1.

Signature:

utf8.EastAsianWidth(s string) (int, error)

Example:

package main
import fmt, utf8

const s string = "こんにちは"

var width, err = utf8.EastAsianWidth(s)
if err == None {
    fmt.Println("East Asian width:", width)
}
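The 2 / 0 / 1 rule can be sketched in Go with the standard `unicode` package. This is an illustration, not this module's implementation. The `wideRanges` table is a hypothetical, deliberately incomplete sample of wide and fullwidth blocks (a complete version would be generated from the Unicode `EastAsianWidth.txt` data). "Combining" is assumed here to mean nonspacing (`Mn`) and enclosing (`Me`) marks. Under these assumptions "こんにちは" measures 10, and "สวัสดี" measures 4 because its two vowel signs are nonspacing marks.

```go
package main

import (
	"fmt"
	"unicode"
)

// wideRanges is a hypothetical, incomplete sample of East Asian Wide (W) and
// Fullwidth (F) code point ranges, inclusive on both ends.
var wideRanges = [][2]rune{
	{0x1100, 0x115F}, // Hangul Jamo leading consonants
	{0x3040, 0x30FF}, // Hiragana and Katakana
	{0x4E00, 0x9FFF}, // CJK Unified Ideographs
	{0xAC00, 0xD7A3}, // Hangul Syllables
	{0xFF01, 0xFF60}, // Fullwidth ASCII variants
}

// runeWidth applies the rule: control and combining marks 0, wide 2, else 1.
func runeWidth(r rune) int {
	if unicode.IsControl(r) || unicode.In(r, unicode.Mn, unicode.Me) {
		return 0
	}
	for _, rg := range wideRanges {
		if r >= rg[0] && r <= rg[1] {
			return 2
		}
	}
	return 1
}

// displayWidth sums per-rune widths; ranging over a Go string decodes UTF-8.
func displayWidth(s string) int {
	w := 0
	for _, r := range s {
		w += runeWidth(r)
	}
	return w
}

func main() {
	for _, s := range []string{"hello", "こんにちは", "e\u0301", "ｈｉ", "สวัสดี"} {
		fmt.Println(s, displayWidth(s))
	}
}
```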

Relationship to unicode and encoding

  • utf8 focuses on byte-level UTF-8 operations (decode, count, validate, width).
  • Higher-level Unicode features such as normalization and character classification remain in the unicode module.
  • Some legacy UTF-8 helpers may still exist in encoding or unicode for backward compatibility, but new code should prefer the utf8 module.