Regex

The regex module provides regular expressions backed by Go's RE2-based engine (same syntax as Go's regexp package). All functions return a (result, error) tuple, following Harneet's stdlib conventions.

Import

import regex

Functions

Basic Pattern Matching

  • regex.Match(pattern string, s string) (boolean, error)
  • regex.FindString(pattern string, s string) (string, error)
  • regex.FindAllString(pattern string, s string, n int) (array, error)
  • regex.FindStringIndex(pattern string, s string) (array, error)

Capture Groups

  • regex.FindStringSubmatch(pattern string, s string) (array, error)
  • regex.FindAllStringSubmatch(pattern string, s string, n int) (array, error)
  • regex.FindStringSubmatchIndex(pattern string, s string) (array, error)

String Manipulation

  • regex.ReplaceAllString(pattern string, s string, repl string) (string, error)
  • regex.ReplaceAllFunc(pattern string, s string, fn function(RegexMatchInfo) string) (string, error)
  • regex.Split(pattern string, s string, n int) (array, error)

Compiled Regex Objects

  • regex.Compile(pattern string) (Regex, error)
  • regex.MustCompile(pattern string) (Regex, error)

Compiled Regex objects expose instance methods that mirror the function-based API but reuse the compiled pattern:

  • r.MatchString(s string) (boolean, error)
  • r.FindString(s string) (string, error)
  • r.FindAllString(s string, n int) (array, error)
  • r.FindStringIndex(s string) (array, error)
  • r.FindStringSubmatch(s string) (array, error)
  • r.FindAllStringSubmatch(s string, n int) (array, error)
  • r.FindStringSubmatchIndex(s string) (array, error)
  • r.ReplaceAllString(s string, repl string) (string, error)

Notes:

  • Patterns use RE2 syntax (same as Go).
  • Escape sequences need double escaping inside strings, e.g. "\\d+".
  • Capture groups use parentheses: "(\\w+)@(\\w+\\.\\w+)".
  • In submatch results, index 0 is the full match and indices 1+ are the capture groups.
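The index convention in these notes can be checked against Go's regexp package, whose syntax the module is described as sharing. The sketch below is plain Go, not Harneet, and emailParts is a hypothetical helper name used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailParts is a hypothetical helper: it returns the full match followed by
// the capture groups, or nil when the pattern does not match.
func emailParts(s string) []string {
	// In a Go interpreted string the backslashes are doubled, as in Harneet.
	re := regexp.MustCompile("(\\w+)@(\\w+\\.\\w+)")
	return re.FindStringSubmatch(s)
}

func main() {
	m := emailParts("Contact: user@example.com")
	fmt.Println(m[0]) // user@example.com (index 0: full match)
	fmt.Println(m[1]) // user (index 1: first capture group)
	fmt.Println(m[2]) // example.com (index 2: second capture group)
}
```

The returned slice always has one entry per group plus the full match, which is why the examples in this document guard on the array length before indexing.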

Basic Examples

Match

package main
import regex
import fmt

var matched, err = regex.Match("h.llo", "hello")
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println("matched:", matched)
}

FindString

package main
import regex
import fmt

var fs, err = regex.FindString("[a-z]+", "Go123lang")
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println("first word:", fs)
}

Compiled Regex Object Examples

Compile Once, Use Many Times

Compiled Regex
package main
import regex
import fmt

// Compile a pattern once
var r, err = regex.Compile("p([a-z]+)ch")
if err != None {
    fmt.Println("compile error:", err)
} else {
    var ok, mErr = r.MatchString("peach")
    if mErr == None {
        fmt.Println("MatchString(peach):", ok)
    }

    var first, fErr = r.FindString("peach punch")
    if fErr == None {
        fmt.Println("FindString(peach punch):", first)
    }

    var all, aErr = r.FindAllString("peach punch pinch", -1)
    if aErr == None {
        fmt.Println("FindAllString(peach punch pinch, -1):", all)
    }

    var out, rErr = r.ReplaceAllString("a peach and a punch", "<fruit>")
    if rErr == None {
        fmt.Println("ReplaceAllString(a peach and a punch, <fruit>):", out)
    }
}

MustCompile behaves like Compile but is intended for patterns that are known to be valid (for example, declared at the top of a file). In Harneet it still returns (Regex, error) and does not panic.

🔥 Capture Groups Examples

Email Parsing with Capture Groups

Email Parsing
package main
import regex
import fmt

// Extract username and domain from email
var matches, err = regex.FindStringSubmatch("(\\w+)@(\\w+\\.\\w+)", "Contact: user@example.com")
if err != None {
    fmt.Println("error:", err)
} else if len(matches) >= 3 {
    fmt.Printf("Full match: %s\n", matches[0])    // "user@example.com"
    fmt.Printf("Username: %s\n", matches[1])      // "user"
    fmt.Printf("Domain: %s\n", matches[2])        // "example.com"
}

Phone Number Extraction

package main
import regex
import fmt

// Extract all phone numbers with area codes
var text = "Call 123-456-7890 or 987-654-3210"
var phoneMatches, err = regex.FindAllStringSubmatch("(\\d{3})-(\\d{3})-(\\d{4})", text, -1)
if err != None {
    fmt.Println("error:", err)
} else {
    for i, match in phoneMatches {
        fmt.Printf("Phone %d: %s\n", i+1, match[0])     // Full number
        fmt.Printf("  Area: %s\n", match[1])            // Area code
        fmt.Printf("  Exchange: %s\n", match[2])        // Exchange
        fmt.Printf("  Number: %s\n", match[3])          // Last 4 digits
    }
}

URL Parsing

package main
import regex
import fmt

// Parse URL components
var url = "https://api.example.com:8080/v1/users"
var matches, err = regex.FindStringSubmatch("(\\w+)://([^:/]+)(:(\\d+))?(/.*)?", url)
if err != None {
    fmt.Println("error:", err)
} else if len(matches) >= 6 {  // full match + 5 groups; indices 4 and 5 are read below
    fmt.Printf("Protocol: %s\n", matches[1])     // "https"
    fmt.Printf("Host: %s\n", matches[2])         // "api.example.com"
    if matches[4] != "" {
        fmt.Printf("Port: %s\n", matches[4])     // "8080"
    }
    if matches[5] != "" {
        fmt.Printf("Path: %s\n", matches[5])     // "/v1/users"
    }
}

Date Format Parsing

package main
import regex
import fmt

// Parse different date formats
var dates = ["2024-09-25", "09/25/2024", "September 25, 2024"]
var patterns = [
    {"pattern": "(\\d{4})-(\\d{2})-(\\d{2})", "name": "ISO"},
    {"pattern": "(\\d{2})/(\\d{2})/(\\d{4})", "name": "US"},
    {"pattern": "(\\w+)\\s+(\\d{1,2}),\\s+(\\d{4})", "name": "Long"}
]

for date in dates {
    fmt.Printf("Parsing: %s\n", date)
    for pattern in patterns {
        var matches, err = regex.FindStringSubmatch(pattern["pattern"], date)
        if err == None and len(matches) >= 4 {
            fmt.Printf("  %s format: %s/%s/%s\n", 
                      pattern["name"], matches[1], matches[2], matches[3])
        }
    }
}

Log Entry Parsing

package main
import regex
import fmt

// Parse structured log entries
var logEntry = "2024-09-25 14:30:15 [INFO] User login successful for user123"
var matches, err = regex.FindStringSubmatch("(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+\\[(\\w+)\\]\\s+(.*)", logEntry)
if err != None {
    fmt.Println("error:", err)
} else if len(matches) >= 5 {
    fmt.Printf("Date: %s\n", matches[1])         // "2024-09-25"
    fmt.Printf("Time: %s\n", matches[2])         // "14:30:15"
    fmt.Printf("Level: %s\n", matches[3])        // "INFO"
    fmt.Printf("Message: %s\n", matches[4])      // "User login successful for user123"
}

Position Tracking with FindStringSubmatchIndex

Harneet does not support slicing strings with text[a:b] syntax. Use capture groups (strings) directly, or print indices for reference.

Working approach using capture groups (strings):

Capture Groups Example
package main
import regex
import fmt

var text = "The price is $123.45"
var matches, err = regex.FindStringSubmatch("\\$(\\d+)\\.(\\d+)", text)
if err != None {
    fmt.Println("error:", err)
} else if len(matches) >= 3 {
    fmt.Printf("Full match: '%s'\n", matches[0])   // "$123.45"
    fmt.Printf("Dollars: '%s'\n", matches[1])     // "123"
    fmt.Printf("Cents: '%s'\n", matches[2])       // "45"
}

If you still need byte indices for integration or tooling, you can print them like this:

Byte Indices Example
package main
import regex
import fmt

var text = "The price is $123.45"
var idx, err = regex.FindStringSubmatchIndex("\\$(\\d+)\\.(\\d+)", text)
if err != None {
    fmt.Println("error:", err)
} else if len(idx) >= 6 {
    fmt.Printf("Full match bytes:   [%d:%d]\n", idx[0], idx[1])
    fmt.Printf("Dollars group bytes: [%d:%d]\n", idx[2], idx[3])
    fmt.Printf("Cents group bytes:   [%d:%d]\n", idx[4], idx[5])
}
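For reference, the flattened layout of the index array can be seen directly in Go's regexp package. This is a sketch in Go, not Harneet, and priceOffsets is a hypothetical helper. Go allows byte slicing of strings, which makes it easy to see what each pair of offsets covers.

```go
package main

import (
	"fmt"
	"regexp"
)

// priceOffsets returns the flattened [start0, end0, start1, end1, ...] byte
// offsets for the first price in text, or nil when there is none.
func priceOffsets(text string) []int {
	re := regexp.MustCompile(`\$(\d+)\.(\d+)`)
	return re.FindStringSubmatchIndex(text)
}

func main() {
	text := "The price is $123.45"
	idx := priceOffsets(text)
	fmt.Println(idx)                 // [13 20 14 17 18 20]
	fmt.Println(text[idx[0]:idx[1]]) // $123.45
	fmt.Println(text[idx[2]:idx[3]]) // 123
	fmt.Println(text[idx[4]:idx[5]]) // 45
}
```

Each end offset is exclusive, so the length of a group in bytes is end minus start.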

Unicode-safe slicing from regex indices

If your text may contain non-ASCII characters, convert byte indices to rune indices and then slice safely:

Unicode-safe Slicing
package main
import regex
import strings
import fmt

var text = "Price: $１２３.４５"  // full-width digits to illustrate multibyte
var idx, err = regex.FindStringSubmatchIndex("\\$(.+)\\.(.+)", text)
if err != None {
    fmt.Println("error:", err)
} else if len(idx) >= 2 {
    // Convert full match byte range to rune range
    var rr, rerr = strings.ByteRangeToRuneRange(text, idx[0], idx[1])
    if rerr == None {
        var full, serr = strings.Substring(text, rr[0], rr[1])
        if serr == None {
            fmt.Printf("Full (rune-safe): '%s'\n", full)
        }
    }
}
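The byte-to-rune conversion itself is simple to reason about: count the runes before the start offset, then count the runes inside the range. The Go sketch below illustrates that idea. byteToRuneRange is a hypothetical helper and is not claimed to be how strings.ByteRangeToRuneRange is implemented. It assumes both offsets fall on rune boundaries, which holds for match offsets on valid UTF-8 input.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// byteToRuneRange converts a byte range [start, end) in s to a rune range,
// assuming both offsets fall on rune boundaries.
func byteToRuneRange(s string, start, end int) (int, int) {
	runeStart := utf8.RuneCountInString(s[:start])
	runeEnd := runeStart + utf8.RuneCountInString(s[start:end])
	return runeStart, runeEnd
}

func main() {
	text := "caf\u00e9 $5" // "café $5": the é occupies two bytes
	loc := regexp.MustCompile(`\$\d+`).FindStringIndex(text)
	start, end := byteToRuneRange(text, loc[0], loc[1])
	fmt.Println(loc, start, end)                   // [6 8] 5 7
	fmt.Println(string([]rune(text)[start:end])) // $5
}
```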

Advanced Use Cases

Data Validation

package main
import regex
import fmt

function validateEmail(email string) bool {
    var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
    return err == None and len(matches) >= 3
}

function extractEmailParts(email string) map {
    var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
    if err != None or len(matches) < 3 {
        return None
    }

    return {
        "username": matches[1],
        "domain": matches[2],
        "valid": true
    }
}

// Usage
var email = "user.name+tag@example.co.uk"
if validateEmail(email) {
    var parts = extractEmailParts(email)
    fmt.Printf("Valid email - Username: %s, Domain: %s\n", 
              parts["username"], parts["domain"])
}

Configuration Parsing

package main
import regex
import fmt

function parseConfigLine(line string) map {
    // Parse key=value or key="quoted value"
    var matches, err = regex.FindStringSubmatch("^\\s*(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^\\s#]+))", line)
    if err != None or len(matches) < 4 {
        return None
    }

    var value = matches[2]  // Quoted value
    if value == "" {
        value = matches[3]  // Unquoted value
    }

    return {
        "key": matches[1],
        "value": value
    }
}

// Usage
var configLines = [
    "port=8080",
    "host=\"localhost\"",
    "debug=true"
]

for line in configLines {
    var config = parseConfigLine(line)
    if config != None {
        fmt.Printf("%s = %s\n", config["key"], config["value"])
    }
}

HTTP Header Parsing

package main
import regex
import fmt

function parseContentType(header string) map {
    // Parse Content-Type: text/html; charset=utf-8
    var matches, err = regex.FindStringSubmatch("^([^/]+)/([^;\\s]+)(?:;\\s*charset=([^;\\s]+))?", header)
    if err != None or len(matches) < 4 {  // full match + 3 groups; matches[3] is read below
        return None
    }

    return {
        "type": matches[1],
        "subtype": matches[2],
        "charset": matches[3]
    }
}

// Usage for HTTP module integration
var contentType = "application/json; charset=utf-8"
var parsed = parseContentType(contentType)
if parsed != None {
    fmt.Printf("Type: %s/%s, Charset: %s\n", 
              parsed["type"], parsed["subtype"], parsed["charset"])
}

Basic Pattern Matching Examples

FindAllString

package main
import regex
import fmt

var all, err = regex.FindAllString("[a-z]+", "go is fun", -1)
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(all)  // ["go", "is", "fun"]
}

FindStringIndex

package main
import regex
import fmt

var idx, err = regex.FindStringIndex("foo", "xxfooYY")
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(idx)  // [2, 5]
}

ReplaceAllString

package main
import regex
import fmt

var out, err = regex.ReplaceAllString("\\d+", "abc123xyz456", "#")
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(out)  // "abc#xyz#"
}

ReplaceAllFunc (callback-based replacement)

ReplaceAllFunc lets you compute the replacement string using a Harneet callback for each match, similar to Go's ReplaceAllStringFunc:

ReplaceAllFunc Uppercase Matches
package main
import regex
import fmt

function toUpperMatch(info RegexMatchInfo) string {
    // info.text is the full match; info.groups[0] is also the full match, groups[1:] are captures
    // info.indices is a flattened [start0, end0, start1, end1, ...] array in byte offsets
    var m = info.text
    return m.Upper()
}

var input = "a peach and a punch"
var out, err = regex.ReplaceAllFunc("p([a-z]+)ch", input, toUpperMatch)
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(out)  // "a PEACH and a PUNCH"
}

Callback Rules & Safety Notes:

  • The callback must accept exactly one argument, a RegexMatchInfo object.
  • It must return a string; if it returns any other type, ReplaceAllFunc returns (None, error).
  • If the callback returns an Error or ErrorValue, iteration stops and that error is returned.
  • Pattern compilation errors are surfaced in the error position, like other regex functions.

RegexMatchInfo

RegexMatchInfo is a struct-like object constructed by the regex module and passed into ReplaceAllFunc callbacks. You can treat it like any other struct in Harneet:

  • Access fields with dot syntax: info.fieldName.
  • It is not constructed directly by user code; it is produced internally for each match.

Fields currently provided:

  • pattern (string): the regex pattern used for the match.
  • text (string): full match text.
  • groups (array<string>): capture groups (groups[0] is full match, groups[1:] are captures).
  • indices (array<int>): flattened [start0, end0, start1, end1, ...] byte indices into the original string.
  • start (int): start index of the full match.
  • end (int): end index of the full match.
  • groupCount (int): number of capture groups (excluding the full match).
  • groupNames (array<string>): names for each subexpression as returned by Go's SubexpNames() (same length as groups; entries may be empty strings when groups are unnamed).

This design keeps the callback signature stable while allowing the regex module to evolve (for example, by adding more metadata fields later) without breaking existing callbacks.
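To make the relationship between the fields concrete, here is a sketch in Go of how a callback-based replacement with a match-info struct could be layered on Go's regexp package. It is an illustration under assumptions, not Harneet's actual implementation: matchInfo and replaceAllFunc are hypothetical names, and only a subset of the fields listed above is modelled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchInfo is a hypothetical stand-in for RegexMatchInfo (subset of fields).
type matchInfo struct {
	Text    string   // full match text
	Groups  []string // Groups[0] is the full match, Groups[1:] are captures
	Indices []int    // flattened [start0, end0, start1, end1, ...] byte offsets
	Start   int      // start of the full match
	End     int      // end of the full match (exclusive)
}

// replaceAllFunc replaces every match of pattern in s with the result of fn.
func replaceAllFunc(pattern, s string, fn func(matchInfo) string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	last := 0
	for _, idx := range re.FindAllStringSubmatchIndex(s, -1) {
		groups := make([]string, len(idx)/2)
		for g := range groups {
			if idx[2*g] >= 0 { // -1 marks a group that did not participate
				groups[g] = s[idx[2*g]:idx[2*g+1]]
			}
		}
		b.WriteString(s[last:idx[0]]) // text between the previous match and this one
		b.WriteString(fn(matchInfo{Text: groups[0], Groups: groups, Indices: idx, Start: idx[0], End: idx[1]}))
		last = idx[1]
	}
	b.WriteString(s[last:]) // trailing text after the final match
	return b.String(), nil
}

func main() {
	out, err := replaceAllFunc("p([a-z]+)ch", "a peach and a punch", func(m matchInfo) string {
		return strings.ToUpper(m.Text)
	})
	fmt.Println(out, err) // a PEACH and a PUNCH <nil>
}
```

Passing a single struct rather than separate arguments is what lets new fields be added later without changing the callback signature.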

Split

package main
import regex
import fmt

var parts, err = regex.Split("\\s+", "a   b c", -1)
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(parts)  // ["a", "b", "c"]
}
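The examples in this document always pass -1 for the n argument of Split and FindAllString. In Go's regexp package, a negative n means "no limit" and a positive n caps the number of results. Whether Harneet forwards n unchanged is an assumption, so treat the Go sketch below as a description of the underlying engine only. splitN and findN are hypothetical wrapper names.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitN passes n straight through to Go's Regexp.Split.
func splitN(pattern, s string, n int) []string {
	return regexp.MustCompile(pattern).Split(s, n)
}

// findN passes n straight through to Go's Regexp.FindAllString.
func findN(pattern, s string, n int) []string {
	return regexp.MustCompile(pattern).FindAllString(s, n)
}

func main() {
	fmt.Printf("%q\n", splitN(`\s+`, "a   b c", -1))     // ["a" "b" "c"]
	fmt.Printf("%q\n", splitN(`\s+`, "a   b c", 2))      // ["a" "b c"]
	fmt.Printf("%q\n", findN(`[a-z]+`, "go is fun", -1)) // ["go" "is" "fun"]
	fmt.Printf("%q\n", findN(`[a-z]+`, "go is fun", 2))  // ["go" "is"]
}
```

With a positive n, Split leaves the unsplit remainder in the last element, while FindAllString simply stops after n matches.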

Function Reference

Basic Functions

| Function | Description | Returns |
|---|---|---|
| Match(pattern, s) | Test if pattern matches string | (boolean, error) |
| FindString(pattern, s) | Find first match | (string, error) |
| FindAllString(pattern, s, n) | Find all matches | (array, error) |
| FindStringIndex(pattern, s) | Find position of first match | (array, error) |

Capture Group Functions

| Function | Description | Returns |
|---|---|---|
| FindStringSubmatch(pattern, s) | Extract capture groups from first match | (array, error) |
| FindAllStringSubmatch(pattern, s, n) | Extract capture groups from all matches | (array, error) |
| FindStringSubmatchIndex(pattern, s) | Get positions of capture groups | (array, error) |

String Manipulation

| Function | Description | Returns |
|---|---|---|
| ReplaceAllString(pattern, s, repl) | Replace all matches | (string, error) |
| ReplaceAllFunc(pattern, s, fn) | Replace matches using a callback | (string, error) |
| Split(pattern, s, n) | Split string by pattern | (array, error) |

Common Patterns

Email Validation

var isValidEmail = function(email string) bool {
    var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
    return err == None and len(matches) >= 3
}

Phone Number Formats

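// US phone numbers: (555) 123-4567 or 555-123-4567
// Group 1 = area code in parentheses, group 2 = area code before a dash,
// group 3 = exchange, group 4 = last four digits
var phonePattern = "(?:\\((\\d{3})\\)\\s+|(\\d{3})-)?(\\d{3})-(\\d{4})"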

URL Components

// Extract protocol, host, port, path, and query string from URLs
// The path group stops at "?" so the query string reaches the last group
var urlPattern = "(https?)://([^:/]+)(?::(\\d+))?(/[^?]*)?(?:\\?(.*))?"
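One detail worth checking in any URL pattern is the path group: a greedy (/.*)? also swallows the query string, so a later query group would always stay empty. The Go sketch below uses (/[^?]*)? for the path so that the query lands in its own group. splitURL is a hypothetical helper, and the pattern is an illustration, not a full URL parser.

```go
package main

import (
	"fmt"
	"regexp"
)

var urlRe = regexp.MustCompile(`(https?)://([^:/]+)(?::(\d+))?(/[^?]*)?(?:\?(.*))?`)

// splitURL returns [full, protocol, host, port, path, query], or nil when the
// input does not look like an http(s) URL. Missing parts are empty strings.
func splitURL(u string) []string {
	return urlRe.FindStringSubmatch(u)
}

func main() {
	m := splitURL("https://api.example.com:8080/v1/users?page=2")
	fmt.Println(m[1]) // https
	fmt.Println(m[2]) // api.example.com
	fmt.Println(m[3]) // 8080
	fmt.Println(m[4]) // /v1/users
	fmt.Println(m[5]) // page=2
}
```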

Date Formats

var isoDate = "(\\d{4})-(\\d{2})-(\\d{2})"           // 2024-09-25
var usDate = "(\\d{2})/(\\d{2})/(\\d{4})"           // 09/25/2024
var longDate = "(\\w+)\\s+(\\d{1,2}),\\s+(\\d{4})"  // September 25, 2024

Log Parsing

Log Parsing Pattern
// Parse log entries: 2024-09-25 14:30:15 [ERROR] Message
var logPattern = "(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+\\[(\\w+)\\]\\s+(.*)"

Best Practices

1. Always Handle Errors

package main
import regex
import fmt

function printGroups(pattern string, text string) {
    var matches, err = regex.FindStringSubmatch(pattern, text)
    if err != None {
        fmt.Printf("Regex error: %s\n", err)
        return
    }
    fmt.Println(matches)
}

2. Use Raw Strings for Complex Patterns

Raw Strings for Patterns
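// Easier to read and maintain when the pattern lives in a named variable.
// Backslashes are still doubled here because this is a regular string literal.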
var emailPattern = "([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})"

3. Validate Before Processing

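package main
import regex

function process(pattern string, text string, expectedGroups int) {
    var matches, err = regex.FindStringSubmatch(pattern, text)
    if err != None or len(matches) < expectedGroups {
        // Handle invalid input
        return
    }
    // matches[0] through matches[expectedGroups-1] are safe to read here
}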

4. Use Descriptive Variable Names

package main
import regex

var emailMatches, err = regex.FindStringSubmatch(emailPattern, userInput)
if err == None and len(emailMatches) >= 3 {
    var username = emailMatches[1]
    var domain = emailMatches[2]
}

Error Handling

package main
import regex
import fmt

var bad, err = regex.Match("(", "test")
if err != None {
    fmt.Println("got expected error:", err)
}

Common regex errors:

  • Invalid syntax: unclosed groups, invalid escape sequences
  • Compilation errors: malformed patterns, unsupported features
  • Runtime errors: pattern too complex, stack overflow
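As a point of comparison, this is roughly how a (result, error) function can surface a pattern error when built on Go's regexp package. It is a Go sketch under assumptions, not Harneet's implementation, and findString is a hypothetical name. In Go, the error arises when the pattern is compiled, before any matching takes place.

```go
package main

import (
	"fmt"
	"regexp"
)

// findString compiles pattern and returns the first match in s.
// A malformed pattern is reported through the error return value.
func findString(pattern, s string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re.FindString(s), nil
}

func main() {
	if _, err := findString("(", "test"); err != nil {
		fmt.Println("got expected error:", err)
	}
	out, err := findString("[a-z]+", "Go123lang")
	fmt.Println(out, err) // o <nil>
}
```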

Performance Tips

1. Compile Once, Use Many Times

Compile Once Use Many
package main
import regex

// Good: compile the pattern once and reuse the compiled object
var emailRegex, _ = regex.MustCompile("([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})")

function validateEmails(emails array) {
    for email in emails {
        var matches, _ = emailRegex.FindStringSubmatch(email)
        // Process matches...
    }
}

2. Use Specific Patterns

package main
// Better: More specific
var phonePattern = "\\d{3}-\\d{3}-\\d{4}"

// Avoid: Too general
var badPattern = ".*-.*-.*"

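3. Keep Quantifiers Simple

package main
// Good: non-greedy quantifier when the shortest match is wanted
var htmlTag = "<(\\w+).*?>"

// Redundant nesting: RE2 matches in linear time, so this does not trigger
// catastrophic backtracking, but "a+b" matches the same strings and is clearer
var nestedPattern = "(a+)+b"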
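Go's regexp package documents a guarantee that matching time is linear in the input size, which is the main practical difference from backtracking engines. The Go sketch below runs the nested-quantifier pattern against a long input that cannot match. nestedMatch is a hypothetical helper, and the sketch says nothing about any overhead Harneet itself adds on top of the engine.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nestedMatch reports whether the nested-quantifier pattern matches s.
func nestedMatch(s string) bool {
	return regexp.MustCompile(`(a+)+b`).MatchString(s)
}

func main() {
	long := strings.Repeat("a", 10000) // no trailing "b", so no match is possible
	fmt.Println(nestedMatch(long))       // false
	fmt.Println(nestedMatch(long + "b")) // true
}
```

In a backtracking engine the first call is the classic worst case. With an RE2-style engine it completes without an exponential blow-up.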

Integration Examples

With HTTP Module

package main
import http
import regex
import fmt

function parseUserAgent(userAgent string) map {
    var pattern = "([^/]+)/([\\d\\.]+)\\s*\\(([^)]+)\\)"
    var matches, err = regex.FindStringSubmatch(pattern, userAgent)

    if err != None or len(matches) < 4 {
        return None
    }

    return {
        "browser": matches[1],
        "version": matches[2],
        "platform": matches[3]
    }
}

// Usage in HTTP handler
function handleRequest(request map, response map) {
    var userAgent = request["headers"]["User-Agent"]
    var parsed = parseUserAgent(userAgent)

    if parsed != None {
        fmt.Printf("Browser: %s %s on %s\n", 
                  parsed["browser"], parsed["version"], parsed["platform"])
    }
}

With JSON Module

package main
import json
import regex
import fmt

function extractDataFromText(text string) map {
    var emailPattern = "([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})"
    var phonePattern = "(\\d{3})-(\\d{3})-(\\d{4})"

    var emails, _ = regex.FindAllStringSubmatch(emailPattern, text, -1)
    var phones, _ = regex.FindAllStringSubmatch(phonePattern, text, -1)

    var result = {
        "emails": [],
        "phones": []
    }

    // Process emails
    for email in emails {
        var emailData = {"full": email[0], "username": email[1], "domain": email[2]}
        result["emails"] = append(result["emails"], emailData)
    }

    // Process phones  
    for phone in phones {
        var phoneData = {"full": phone[0], "area": phone[1], "exchange": phone[2], "number": phone[3]}
        result["phones"] = append(result["phones"], phoneData)
    }

    return result
}

With Cast Module

package main
import cast
import regex
import fmt

function parseConfigValue(line string) map {
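    // RE2 has no backreferences (such as \2), so each quoting style gets its own alternative
    var pattern = "^\\s*(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\"'\\n]*?))\\s*$"
    var matches, err = regex.FindStringSubmatch(pattern, line)

    if err != None or len(matches) < 5 {
        return None
    }

    var key = matches[1]
    var value = matches[2]      // double-quoted value
    if value == "" {
        value = matches[3]      // single-quoted value
    }
    if value == "" {
        value = matches[4]      // unquoted value
    }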

    // Try to cast to appropriate type
    var intValue, intErr = cast.ToInt(value)
    if intErr == None {
        return {"key": key, "value": intValue, "type": "int"}
    }

    var boolValue, boolErr = cast.ToBool(value)
    if boolErr == None {
        return {"key": key, "value": boolValue, "type": "bool"}
    }

    return {"key": key, "value": value, "type": "string"}
}

Regular Expression Syntax

Harneet uses Go's RE2 syntax. Key features:

Character Classes

  • \d - Digits (0-9)
  • \w - Word characters (a-z, A-Z, 0-9, _)
  • \s - Whitespace
  • [abc] - Character set
  • [^abc] - Negated character set
  • [a-z] - Character range

Quantifiers

  • * - Zero or more
  • + - One or more
  • ? - Zero or one
  • {n} - Exactly n
  • {n,} - n or more
  • {n,m} - Between n and m
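Quantifiers are greedy by default, and appending ? makes them lazy, as in the ".*?" used elsewhere in this document. The short Go sketch below shows the difference using Go's regexp package. firstMatch is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the first match of pattern in s ("" when there is none).
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	html := "<b>bold</b>"
	fmt.Println(firstMatch(`<.*>`, html))         // <b>bold</b> (greedy)
	fmt.Println(firstMatch(`<.*?>`, html))        // <b> (lazy)
	fmt.Println(firstMatch(`\d{2,3}`, "12345"))   // 123
	fmt.Println(firstMatch(`\d{4}`, "123") == "") // true: fewer than four digits
}
```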

Anchors

  • ^ - Start of string
  • $ - End of string
  • \b - Word boundary
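Without anchors a pattern may match anywhere inside the string, which matters for validation. The Go sketch below contrasts anchored and unanchored patterns and shows a word boundary. It uses Go's regexp package, and isMatch is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// isMatch reports whether pattern matches anywhere in s.
func isMatch(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(isMatch(`\d+`, "abc123"))     // true: digits found inside the string
	fmt.Println(isMatch(`^\d+$`, "abc123"))   // false: the whole string must be digits
	fmt.Println(isMatch(`^\d+$`, "123"))      // true
	fmt.Println(isMatch(`\bcat\b`, "a cat"))  // true: "cat" stands alone
	fmt.Println(isMatch(`\bcat\b`, "concat")) // false: "cat" is part of a longer word
}
```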

Groups

  • (...) - Capture group
  • (?:...) - Non-capture group
  • (?P<name>...) - Named group (not yet supported)

Escape Sequences

Remember to double-escape in Harneet strings:

  • \\d for \d
  • \\\\ for \\ (a literal backslash)
  • \" for "
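The same two-level escaping exists in Go's interpreted string literals, which makes it easy to see why the backslashes double. The sketch below is Go rather than Harneet. Go's backtick raw strings are used only to show the pattern the regex engine actually receives, and hasBackslash is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// hasBackslash reports whether s contains a literal backslash.
// The regex is \\ (two characters), which takes four backslashes to write
// in an interpreted string literal.
func hasBackslash(s string) bool {
	return regexp.MustCompile("\\\\").MatchString(s)
}

func main() {
	escaped := "\\d+" // interpreted string: the compiler turns \\ into \
	raw := `\d+`      // raw string: no escape processing
	fmt.Println(escaped == raw)                                   // true
	fmt.Println(regexp.MustCompile(escaped).FindString("abc123")) // 123
	fmt.Println(hasBackslash(`C:\temp`))                          // true
	fmt.Println(hasBackslash("C:/temp"))                          // false
}
```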

Limitations

Current limitations in Harneet's regex implementation:

  • No named capture groups yet
  • No lookahead/lookbehind assertions
  • No backreferences (RE2 does not support them)
  • No conditional expressions
  • No recursive patterns
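Most of these limits come from the RE2 syntax itself, so in Go's regexp package the unsupported constructs fail when the pattern is compiled. The Go sketch below checks a few of them. supported is a hypothetical helper, and whether Harneet reports these cases with the same error text is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// supported reports whether Go's regexp package accepts the pattern.
func supported(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`foo(?=bar)`,    // lookahead
		`(?<=foo)bar`,   // lookbehind
		`(a)\1`,         // backreference
		`(a)?(?(1)b|c)`, // conditional expression
		`a(?R)?b`,       // recursive pattern
		`(?:foo)bar`,    // non-capture group, accepted, shown for contrast
	}
	for _, p := range patterns {
		fmt.Printf("%-16s supported=%v\n", p, supported(p))
	}
}
```

Only the last pattern compiles. The others need a rewrite, for example replacing a backreference with explicit alternatives.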

Note: All regex functions follow Harneet's standard (result, error) tuple return pattern for consistent error handling.