Regex

Regular expressions using Go's RE2 engine (same syntax as Go's regexp). All functions return a tuple (result, error) following Harneet's stdlib conventions.

Import

import regex

Functions

Basic Pattern Matching

regex.Match(pattern string, s string) (boolean, error)
regex.FindString(pattern string, s string) (string, error)
regex.FindAllString(pattern string, s string, n int) (array, error)
regex.FindStringIndex(pattern string, s string) (array, error)

Capture Groups

regex.FindStringSubmatch(pattern string, s string) (array, error)
regex.FindAllStringSubmatch(pattern string, s string, n int) (array, error)
regex.FindStringSubmatchIndex(pattern string, s string) (array, error)

String Manipulation

regex.ReplaceAllString(pattern string, s string, repl string) (string, error)
regex.ReplaceAllFunc(pattern string, s string, fn function(RegexMatchInfo) string) (string, error)
regex.Split(pattern string, s string, n int) (array, error)

Compiled Regex Objects

regex.Compile(pattern string) (Regex, error)
regex.MustCompile(pattern string) (Regex, error)

Compiled Regex objects expose instance methods that mirror the function-based API but reuse the compiled pattern:

r.MatchString(s string) (boolean, error)
r.FindString(s string) (string, error)
r.FindAllString(s string, n int) (array, error)
r.FindStringIndex(s string) (array, error)
r.FindStringSubmatch(s string) (array, error)
r.FindAllStringSubmatch(s string, n int) (array, error)
r.FindStringSubmatchIndex(s string) (array, error)
r.ReplaceAllString(s string, repl string) (string, error)

Notes:

- Patterns use RE2 syntax (same as Go).
- Escape sequences need double escaping inside strings, e.g. "\\d+".
- Capture groups use parentheses: "(\\w+)@(\\w+\\.\\w+)".
- In submatch results, index 0 is the full match and indices 1 and up are the capture groups.
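The n argument of FindAllString, FindAllStringSubmatch, and Split is not described above. Because the signatures mirror Go's regexp package, a reasonable assumption is that n follows Go's convention: a negative n returns every result, and n >= 0 returns at most n results. The sketch below shows that convention in Go itself, not in Harneet; firstN and splitN are illustrative helper names.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstN returns at most n matches of pattern in s; a negative n means "all".
func firstN(pattern, s string, n int) []string {
	re := regexp.MustCompile(pattern)
	return re.FindAllString(s, n)
}

// splitN splits s around matches of pattern into at most n pieces;
// a negative n means "no limit".
func splitN(pattern, s string, n int) []string {
	re := regexp.MustCompile(pattern)
	return re.Split(s, n)
}

func main() {
	fmt.Printf("%q\n", firstN("[a-z]+", "go is fun", -1)) // ["go" "is" "fun"]
	fmt.Printf("%q\n", firstN("[a-z]+", "go is fun", 2))  // ["go" "is"]
	fmt.Printf("%q\n", splitN("\\s+", "a b c", 2))        // ["a" "b c"]
}
```

With Split, the last piece keeps the unsplit remainder, which is why the limit of 2 yields "a" and "b c".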

Basic Examples

Match

package main
import regex
import fmt

var matched, err = regex.Match("h.llo", "hello")
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println("matched:", matched)
}

FindString

package main
import regex
import fmt

var fs, err = regex.FindString("[a-z]+", "Go123lang")
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println("first word:", fs)  // "o" (the uppercase "G" does not match [a-z])
}

Compiled Regex Object Examples

Compile Once, Use Many Times

Compiled Regex

package main
import regex
import fmt

// Compile a pattern once
var r, err = regex.Compile("p([a-z]+)ch")
if err != None {
    fmt.Println("compile error:", err)
} else {
    var ok, mErr = r.MatchString("peach")
    if mErr == None {
        fmt.Println("MatchString(peach):", ok)
    }

    var first, fErr = r.FindString("peach punch")
    if fErr == None {
        fmt.Println("FindString(peach punch):", first)
    }

    var all, aErr = r.FindAllString("peach punch pinch", -1)
    if aErr == None {
        fmt.Println("FindAllString(peach punch pinch, -1):", all)
    }

    var out, rErr = r.ReplaceAllString("a peach and a punch", "<fruit>")
    if rErr == None {
        fmt.Println("ReplaceAllString(a peach and a punch, <fruit>):", out)
    }
}

MustCompile behaves like Compile but is intended for patterns that are known to be valid (for example, declared at the top of a file). In Harneet it still returns (Regex, error) and does not panic.
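For contrast, Go's regexp.MustCompile panics on an invalid pattern. The Go sketch below illustrates one way a non-panicking variant could be layered on top of it by recovering the panic and returning it as an error. This is only an illustration of the idea; compileNoPanic is a hypothetical helper and not Harneet's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileNoPanic wraps regexp.MustCompile and converts its panic
// into an ordinary error value (hypothetical helper).
func compileNoPanic(pattern string) (re *regexp.Regexp, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("invalid pattern: %v", r)
		}
	}()
	return regexp.MustCompile(pattern), nil
}

func main() {
	if _, err := compileNoPanic("("); err != nil {
		fmt.Println("got error instead of panic:", err)
	}
	re, err := compileNoPanic("p([a-z]+)ch")
	if err == nil {
		fmt.Println(re.MatchString("peach")) // true
	}
}
```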

🔥 Capture Groups Examples

Email Parsing with Capture Groups

Email Parsing

package main
import regex
import fmt

// Extract username and domain from email
var matches, err = regex.FindStringSubmatch("(\\w+)@(\\w+\\.\\w+)", "Contact: user@example.com")
if err != None {
    fmt.Println("error:", err)
} else if len(matches) >= 3 {
    fmt.Printf("Full match: %s\n", matches[0])    // "user@example.com"
    fmt.Printf("Username: %s\n", matches[1])      // "user"
    fmt.Printf("Domain: %s\n", matches[2])        // "example.com"
}

Phone Number Extraction

package main
import regex
import fmt

// Extract all phone numbers with area codes
var text = "Call 123-456-7890 or 987-654-3210"
var phoneMatches, err = regex.FindAllStringSubmatch("(\\d{3})-(\\d{3})-(\\d{4})", text, -1)
if err != None {
    fmt.Println("error:", err)
} else {
    for i, match in phoneMatches {
        fmt.Printf("Phone %d: %s\n", i+1, match[0])     // Full number
        fmt.Printf("  Area: %s\n", match[1])            // Area code
        fmt.Printf("  Exchange: %s\n", match[2])        // Exchange
        fmt.Printf("  Number: %s\n", match[3])          // Last 4 digits
    }
}

URL Parsing

package main
import regex
import fmt

// Parse URL components
var url = "https://api.example.com:8080/v1/users"
var matches, err = regex.FindStringSubmatch("(\\w+)://([^:/]+)(:(\\d+))?(/.*)?", url)
if err != None {
    fmt.Println("error:", err)
} else if len(matches) >= 6 {
    fmt.Printf("Protocol: %s\n", matches[1])     // "https"
    fmt.Printf("Host: %s\n", matches[2])         // "api.example.com"
    if matches[4] != "" {
        fmt.Printf("Port: %s\n", matches[4])     // "8080"
    }
    if matches[5] != "" {
        fmt.Printf("Path: %s\n", matches[5])     // "/v1/users"
    }
}
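
Optional groups that take no part in a match still occupy a slot in the result. In Go's regexp, which the document names as the underlying engine, FindStringSubmatch returns one entry per group plus the full match, and groups that did not participate come back as empty strings. The Go sketch below (not Harneet code; urlParts is an illustrative name) shows this for a URL with no port or path, assuming Harneet passes the result through unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the URL example, written as a Go raw string
// so the backslashes do not need doubling.
var urlRe = regexp.MustCompile(`(\w+)://([^:/]+)(:(\d+))?(/.*)?`)

// urlParts returns the full match followed by the five capture groups,
// or nil when the input does not look like a URL.
func urlParts(url string) []string {
	return urlRe.FindStringSubmatch(url)
}

func main() {
	for _, u := range []string{"https://api.example.com:8080/v1/users", "http://example.com"} {
		m := urlParts(u)
		fmt.Printf("%d entries: protocol=%q host=%q port=%q path=%q\n",
			len(m), m[1], m[2], m[4], m[5])
	}
}
```

Both URLs produce six entries, so indexing matches[4] and matches[5] is safe once the length has been checked against 6.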

Date Format Parsing

package main
import regex
import fmt

// Parse different date formats
var dates = ["2024-09-25", "09/25/2024", "September 25, 2024"]
var patterns = [
    {"pattern": "(\\d{4})-(\\d{2})-(\\d{2})", "name": "ISO"},
    {"pattern": "(\\d{2})/(\\d{2})/(\\d{4})", "name": "US"},
    {"pattern": "(\\w+)\\s+(\\d{1,2}),\\s+(\\d{4})", "name": "Long"}
]

for date in dates {
    fmt.Printf("Parsing: %s\n", date)
    for pattern in patterns {
        var matches, err = regex.FindStringSubmatch(pattern["pattern"], date)
        if err == None and len(matches) >= 4 {
            fmt.Printf("  %s format: %s/%s/%s\n", 
                      pattern["name"], matches[1], matches[2], matches[3])
        }
    }
}

Log Entry Parsing

package main
import regex
import fmt

// Parse structured log entries
var logEntry = "2024-09-25 14:30:15 [INFO] User login successful for user123"
var matches, err = regex.FindStringSubmatch("(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+\\[(\\w+)\\]\\s+(.*)", logEntry)
if err != None {
    fmt.Println("error:", err)
} else if len(matches) >= 5 {
    fmt.Printf("Date: %s\n", matches[1])         // "2024-09-25"
    fmt.Printf("Time: %s\n", matches[2])         // "14:30:15"
    fmt.Printf("Level: %s\n", matches[3])        // "INFO"
    fmt.Printf("Message: %s\n", matches[4])      // "User login successful for user123"
}

Position Tracking with FindStringSubmatchIndex

Harneet does not support slicing strings with text[a:b] syntax. Use capture groups (strings) directly, or print indices for reference.

Working approach using capture groups (strings):

Capture Groups Example

package main
import regex
import fmt

var text = "The price is $123.45"
var matches, err = regex.FindStringSubmatch("\\$(\\d+)\\.(\\d+)", text)
if err != None {
    fmt.Println("error:", err)
} else if len(matches) >= 3 {
    fmt.Printf("Full match: '%s'\n", matches[0])   // "$123.45"
    fmt.Printf("Dollars: '%s'\n", matches[1])     // "123"
    fmt.Printf("Cents: '%s'\n", matches[2])       // "45"
}

If you still need byte indices for integration or tooling, you can print them like this:

Byte Indices Example

package main
import regex
import fmt

var text = "The price is $123.45"
var idx, err = regex.FindStringSubmatchIndex("\\$(\\d+)\\.(\\d+)", text)
if err != None {
    fmt.Println("error:", err)
} else if len(idx) >= 6 {
    fmt.Printf("Full match bytes:   [%d:%d]\n", idx[0], idx[1])
    fmt.Printf("Dollars group bytes: [%d:%d]\n", idx[2], idx[3])
    fmt.Printf("Cents group bytes:   [%d:%d]\n", idx[4], idx[5])
}

Unicode-safe slicing from regex indices

If your text may contain non-ASCII characters, convert byte indices to rune indices and then slice safely:

Unicode-safe Slicing

package main
import regex
import strings
import fmt

var text = "Price: $１２３.４５"  // full-width digits to illustrate multibyte
var idx, err = regex.FindStringSubmatchIndex("\\$(.+)\\.(.+)", text)
if err != None {
    fmt.Println("error:", err)
} else if len(idx) >= 2 {
    // Convert full match byte range to rune range
    var rr, rerr = strings.ByteRangeToRuneRange(text, idx[0], idx[1])
    if rerr == None {
        var full, serr = strings.Substring(text, rr[0], rr[1])
        if serr == None {
            fmt.Printf("Full (rune-safe): '%s'\n", full)
        }
    }
}
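
The conversion itself amounts to counting the runes that precede each byte offset. The Go sketch below shows the idea behind a helper like strings.ByteRangeToRuneRange; it is an illustration in Go rather than Harneet's implementation, and byteRangeToRuneRange is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// byteRangeToRuneRange converts a half-open byte range [start, end) of s
// into the equivalent rune range by counting the runes before each offset.
// It assumes both offsets fall on rune boundaries, as regexp indices do.
func byteRangeToRuneRange(s string, start, end int) (int, int) {
	return utf8.RuneCountInString(s[:start]), utf8.RuneCountInString(s[:end])
}

func main() {
	text := "caf\u00e9 $12.50" // "é" occupies two bytes in UTF-8
	re := regexp.MustCompile(`\$(\d+)\.(\d+)`)
	idx := re.FindStringSubmatchIndex(text)
	if idx == nil {
		fmt.Println("no match")
		return
	}
	runeStart, runeEnd := byteRangeToRuneRange(text, idx[0], idx[1])
	fmt.Printf("bytes [%d:%d] -> runes [%d:%d]\n", idx[0], idx[1], runeStart, runeEnd)
	// prints: bytes [6:12] -> runes [5:11]
	fmt.Println(string([]rune(text)[runeStart:runeEnd])) // $12.50
}
```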

Advanced Use Cases

Data Validation

package main
import regex
import fmt

function validateEmail(email string) bool {
    var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
    return err == None and len(matches) >= 3
}

function extractEmailParts(email string) map {
    var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
    if err != None or len(matches) < 3 {
        return None
    }

    return {
        "username": matches[1],
        "domain": matches[2],
        "valid": true
    }
}

// Usage
var email = "user.name+tag@example.co.uk"
if validateEmail(email) {
    var parts = extractEmailParts(email)
    fmt.Printf("Valid email - Username: %s, Domain: %s\n", 
              parts["username"], parts["domain"])
}

Configuration Parsing

package main
import regex
import fmt

function parseConfigLine(line string) map {
    // Parse key=value or key="quoted value"
    var matches, err = regex.FindStringSubmatch("^\\s*(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^\\s#]+))", line)
    if err != None or len(matches) < 4 {
        return None
    }

    var value = matches[2]  // Quoted value
    if value == "" {
        value = matches[3]  // Unquoted value
    }

    return {
        "key": matches[1],
        "value": value
    }
}

// Usage
var configLines = [
    "port=8080",
    "host=\"localhost\"",
    "debug=true"
]

for line in configLines {
    var config = parseConfigLine(line)
    if config != None {
        fmt.Printf("%s = %s\n", config["key"], config["value"])
    }
}

HTTP Header Parsing

package main
import regex
import fmt

function parseContentType(header string) map {
    // Parse Content-Type: text/html; charset=utf-8
    var matches, err = regex.FindStringSubmatch("^([^/]+)/([^;\\s]+)(?:;\\s*charset=([^;\\s]+))?", header)
    if err != None or len(matches) < 4 {
        return None
    }

    return {
        "type": matches[1],
        "subtype": matches[2],
        "charset": matches[3]
    }
}

// Usage for HTTP module integration
var contentType = "application/json; charset=utf-8"
var parsed = parseContentType(contentType)
if parsed != None {
    fmt.Printf("Type: %s/%s, Charset: %s\n", 
              parsed["type"], parsed["subtype"], parsed["charset"])
}

Basic Pattern Matching Examples

FindAllString

package main
import regex
import fmt

var all, err = regex.FindAllString("[a-z]+", "go is fun", -1)
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(all)  // ["go", "is", "fun"]
}

FindStringIndex

package main
import regex
import fmt

var idx, err = regex.FindStringIndex("foo", "xxfooYY")
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(idx)  // [2, 5]
}

ReplaceAllString

package main
import regex
import fmt

var out, err = regex.ReplaceAllString("\\d+", "abc123xyz456", "#")
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(out)  // "abc#xyz#"
}

ReplaceAllFunc (callback-based replacement)

ReplaceAllFunc lets you compute the replacement string using a Harneet callback for each match, similar to Go's ReplaceAllStringFunc:

ReplaceAllFunc Uppercase Matches

package main
import regex
import fmt

function toUpperMatch(info RegexMatchInfo) string {
    // info.text is the full match; info.groups[0] is also the full match, groups[1:] are captures
    // info.indices is a flattened [start0, end0, start1, end1, ...] array in byte offsets
    var m = info.text
    return m.Upper()
}

var input = "a peach and a punch"
var out, err = regex.ReplaceAllFunc("p([a-z]+)ch", input, toUpperMatch)
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(out)  // "a PEACH and a PUNCH"
}

Callback Rules & Safety Notes:

The callback must accept exactly one argument, a RegexMatchInfo object.
It must return a string; if it returns any other type, ReplaceAllFunc returns (None, error).
If the callback returns an Error or ErrorValue, iteration stops and that error is returned.
Pattern compilation errors are surfaced in the error position, like other regex functions.
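
For comparison, this is what the Go function that ReplaceAllFunc is modeled on looks like in use. Go's callback receives only the matched text, whereas the Harneet callback receives a RegexMatchInfo. The sketch is Go, not Harneet, and upperMatches is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// upperMatches replaces every match of pattern in s with its uppercase form.
// A compile failure is returned in the error position.
func upperMatches(pattern, s string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	return re.ReplaceAllStringFunc(s, strings.ToUpper), nil
}

func main() {
	out, err := upperMatches("p([a-z]+)ch", "a peach and a punch")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // a PEACH and a PUNCH
}
```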

RegexMatchInfo

RegexMatchInfo is a struct-like object constructed by the regex module and passed into ReplaceAllFunc callbacks. You can treat it like any other struct in Harneet:

Access fields with dot syntax: info.fieldName.
It is not constructed directly by user code; it is produced internally for each match.

Fields currently provided:

pattern (string): the regex pattern used for the match.
text (string): full match text.
groups (array<string>): capture groups (groups[0] is full match, groups[1:] are captures).
indices (array<int>): flattened [start0, end0, start1, end1, ...] byte indices into the original string.
start (int): start index of the full match.
end (int): end index of the full match.
groupCount (int): number of capture groups (excluding the full match).
groupNames (array<string>): names for each subexpression as returned by Go's SubexpNames() (same length as groups; entries may be empty strings when groups are unnamed).

This design keeps the callback signature stable while allowing the regex module to evolve (for example, by adding more metadata fields later) without breaking existing callbacks.
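
The fields listed above map closely onto values that Go's regexp package already exposes. As a rough illustration of how such an object could be assembled for one match, here is a Go sketch; it is written under that assumption and is not the module's actual code. matchInfo and firstMatchInfo are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchInfo is a hypothetical Go counterpart of RegexMatchInfo.
type matchInfo struct {
	Pattern    string
	Text       string
	Groups     []string // Groups[0] is the full match
	Indices    []int    // flattened [start0, end0, start1, end1, ...] byte offsets
	Start, End int
	GroupCount int
	GroupNames []string
}

// firstMatchInfo describes the first match of pattern in s.
// ok is false when the pattern is valid but does not match.
func firstMatchInfo(pattern, s string) (info matchInfo, ok bool, err error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return matchInfo{}, false, err
	}
	idx := re.FindStringSubmatchIndex(s)
	if idx == nil {
		return matchInfo{}, false, nil
	}
	groups := make([]string, 0, len(idx)/2)
	for i := 0; i < len(idx); i += 2 {
		if idx[i] < 0 { // group did not participate in the match
			groups = append(groups, "")
			continue
		}
		groups = append(groups, s[idx[i]:idx[i+1]])
	}
	return matchInfo{
		Pattern: pattern, Text: groups[0], Groups: groups, Indices: idx,
		Start: idx[0], End: idx[1],
		GroupCount: re.NumSubexp(), GroupNames: re.SubexpNames(),
	}, true, nil
}

func main() {
	info, ok, err := firstMatchInfo("p([a-z]+)ch", "a peach")
	if err != nil || !ok {
		fmt.Println("no match")
		return
	}
	fmt.Printf("text=%q groups=%q indices=%v\n", info.Text, info.Groups, info.Indices)
	// prints: text="peach" groups=["peach" "ea"] indices=[2 7 3 5]
}
```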

Split

package main
import regex
import fmt

var parts, err = regex.Split("\\s+", "a b c", -1)
if err != None {
    fmt.Println("error:", err)
} else {
    fmt.Println(parts)  // ["a", "b", "c"]
}

Function Reference

Basic Functions

Function	Description	Returns
`Match(pattern, s)`	Test if pattern matches string	`(boolean, error)`
`FindString(pattern, s)`	Find first match	`(string, error)`
`FindAllString(pattern, s, n)`	Find all matches	`(array, error)`
`FindStringIndex(pattern, s)`	Find position of first match	`(array, error)`

Capture Group Functions

Function	Description	Returns
`FindStringSubmatch(pattern, s)`	Extract capture groups from first match	`(array, error)`
`FindAllStringSubmatch(pattern, s, n)`	Extract capture groups from all matches	`(array, error)`
`FindStringSubmatchIndex(pattern, s)`	Get positions of capture groups	`(array, error)`

String Manipulation

Function	Description	Returns
`ReplaceAllString(pattern, s, repl)`	Replace all matches	`(string, error)`
`ReplaceAllFunc(pattern, s, fn)`	Replace matches using a callback	`(string, error)`
`Split(pattern, s, n)`	Split string by pattern	`(array, error)`

Common Patterns

Email Validation

var isValidEmail = function(email string) bool {
    var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
    return err == None and len(matches) >= 3
}

Phone Number Formats

// US phone numbers: (555) 123-4567 or 555-123-4567
// The area code lands in group 1 (parenthesized form) or group 2 (dashed form)
var phonePattern = "(?:\\((\\d{3})\\)\\s*|(\\d{3})-)(\\d{3})-(\\d{4})"
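
The comment names two formats, a parenthesized area code and a dashed one. One pattern that accepts both uses an alternation with a separate capture group for each form. The Go sketch below exercises such a pattern; the exact pattern is an assumption about the intent, and areaCode is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// Group 1 holds the area code of the "(555) 123-4567" form,
// group 2 holds it for the "555-123-4567" form.
var phoneRe = regexp.MustCompile(`(?:\((\d{3})\)\s*|(\d{3})-)(\d{3})-(\d{4})`)

// areaCode returns the area code of the first phone number found in s.
func areaCode(s string) (string, bool) {
	m := phoneRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	if m[1] != "" {
		return m[1], true // parenthesized form
	}
	return m[2], true // dashed form
}

func main() {
	for _, s := range []string{"(555) 123-4567", "555-123-4567", "12345"} {
		code, ok := areaCode(s)
		fmt.Printf("%q -> %q %t\n", s, code, ok)
	}
}
```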

URL Components

// Extract protocol, host, port, path, and query string from URLs
var urlPattern = "(https?)://([^:/]+)(?::(\\d+))?(/[^?]*)?(?:\\?(.*))?"

Date Formats

var isoDate = "(\\d{4})-(\\d{2})-(\\d{2})"          // 2024-09-25
var usDate = "(\\d{2})/(\\d{2})/(\\d{4})"           // 09/25/2024
var longDate = "(\\w+)\\s+(\\d{1,2}),\\s+(\\d{4})"  // September 25, 2024

Log Parsing

Log Parsing Pattern
// Parse log entries: 2024-09-25 14:30:15 [ERROR] Message
var logPattern = "(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+\\[(\\w+)\\]\\s+(.*)"

Best Practices

1. Always Handle Errors

package main
var matches, err = regex.FindStringSubmatch(pattern, text)
if err != None {
    fmt.Printf("Regex error: %s\n", err)
    return
}

2. Use Raw Strings for Complex Patterns

Raw Strings for Patterns
// Easier to read and maintain
var emailPattern = "([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})"

3. Validate Before Processing

package main
var matches, err = regex.FindStringSubmatch(pattern, text)
if err != None or len(matches) < expectedGroups {
    // Handle invalid input
    return
}

4. Use Descriptive Variable Names

package main
var emailMatches, _ = regex.FindStringSubmatch(emailPattern, userInput)
var username = emailMatches[1]
var domain = emailMatches[2]

Error Handling

package main
import regex
import fmt

var bad, err = regex.Match("(", "test")
if err != None {
    fmt.Println("got expected error:", err)
}

Common regex errors:

- Invalid syntax: unclosed groups, invalid escape sequences
- Compilation errors: malformed patterns, unsupported features
- Runtime errors: pattern too complex, stack overflow
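
Go's package-level regexp.MatchString already has the (result, error) shape used throughout this module: an invalid pattern yields a non-nil error instead of a match result. The Go sketch below shows how the two outcomes can be told apart; it illustrates the underlying engine, not Harneet itself, and describe is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// describe reports either the match result or the compile error for pattern.
func describe(pattern, s string) string {
	matched, err := regexp.MatchString(pattern, s)
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("matched: %t", matched)
}

func main() {
	fmt.Println(describe("h.llo", "hello")) // matched: true
	fmt.Println(describe("(", "test"))      // unclosed group: reported as an error
	fmt.Println(describe("[a-z", "test"))   // unclosed character class: reported as an error
}
```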

Performance Tips

1. Compile Once, Use Many Times

Compile Once Use Many

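package main
import regex
import fmt

// Good: compile the pattern once and reuse the compiled Regex
var emailRegex, compileErr = regex.Compile("([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})")

function validateEmails(emails array) {
    if compileErr != None {
        fmt.Println("compile error:", compileErr)
        return
    }
    for email in emails {
        var matches, _ = emailRegex.FindStringSubmatch(email)
        // Process matches...
    }
}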
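
In Go, compiling once usually means holding the compiled value in a package-level variable so that no call pays the compilation cost again. The Go sketch below shows that shape; it illustrates the general technique rather than Harneet's API, and domains is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe is compiled once at package initialization and reused by every call.
var emailRe = regexp.MustCompile(`([\w\._%+-]+)@([\w\.-]+\.[A-Za-z]{2,})`)

// domains returns the domain part of each input that contains an email address.
func domains(inputs []string) []string {
	var out []string
	for _, in := range inputs {
		if m := emailRe.FindStringSubmatch(in); m != nil {
			out = append(out, m[2])
		}
	}
	return out
}

func main() {
	got := domains([]string{"user.name+tag@example.co.uk", "not an email", "a@b.io"})
	fmt.Printf("%q\n", got) // ["example.co.uk" "b.io"]
}
```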

2. Use Specific Patterns

package main
// Better: More specific
var phonePattern = "\\d{3}-\\d{3}-\\d{4}"

// Avoid: Too general
var badPattern = ".*-.*-.*"

3. Limit Backtracking

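package main
// Good: Non-greedy quantifiers when appropriate
var htmlTag = "<(\\w+).*?>"

// Avoid: nested quantifiers. They cause catastrophic backtracking in backtracking
// engines; RE2 matches in time linear in the input, but "a+b" matches the same
// text and is easier to read
var slowPattern = "(a+)+b"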

Integration Examples

With HTTP Module

package main
import http
import regex
import fmt

function parseUserAgent(userAgent string) map {
    var pattern = "([^/]+)/([\\d\\.]+)\\s*\\(([^)]+)\\)"
    var matches, err = regex.FindStringSubmatch(pattern, userAgent)

    if err != None or len(matches) < 4 {
        return None
    }

    return {
        "browser": matches[1],
        "version": matches[2],
        "platform": matches[3]
    }
}

// Usage in HTTP handler
function handleRequest(request map, response map) {
    var userAgent = request["headers"]["User-Agent"]
    var parsed = parseUserAgent(userAgent)

    if parsed != None {
        fmt.Printf("Browser: %s %s on %s\n", 
                  parsed["browser"], parsed["version"], parsed["platform"])
    }
}

With JSON Module

package main
import json
import regex
import fmt

function extractDataFromText(text string) map {
    var emailPattern = "([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})"
    var phonePattern = "(\\d{3})-(\\d{3})-(\\d{4})"

    var emails, _ = regex.FindAllStringSubmatch(emailPattern, text, -1)
    var phones, _ = regex.FindAllStringSubmatch(phonePattern, text, -1)

    var result = {
        "emails": [],
        "phones": []
    }

    // Process emails
    for email in emails {
        var emailData = {"full": email[0], "username": email[1], "domain": email[2]}
        result["emails"] = append(result["emails"], emailData)
    }

    // Process phones  
    for phone in phones {
        var phoneData = {"full": phone[0], "area": phone[1], "exchange": phone[2], "number": phone[3]}
        result["phones"] = append(result["phones"], phoneData)
    }

    return result
}

With Cast Module

package main
import cast
import regex
import fmt

function parseConfigValue(line string) map {
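    // RE2 has no backreferences (\2), so each quoting style gets its own alternative
    var pattern = "^\\s*(\\w+)\\s*=\\s*(?:\"([^\"\\n]*)\"|'([^'\\n]*)'|([^\"'\\n]*?))\\s*$"
    var matches, err = regex.FindStringSubmatch(pattern, line)

    if err != None or len(matches) < 5 {
        return None
    }

    var key = matches[1]
    var value = matches[2]      // Double-quoted value
    if value == "" {
        value = matches[3]      // Single-quoted value
    }
    if value == "" {
        value = matches[4]      // Unquoted value
    }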

    // Try to cast to appropriate type
    var intValue, intErr = cast.ToInt(value)
    if intErr == None {
        return {"key": key, "value": intValue, "type": "int"}
    }

    var boolValue, boolErr = cast.ToBool(value)
    if boolErr == None {
        return {"key": key, "value": boolValue, "type": "bool"}
    }

    return {"key": key, "value": value, "type": "string"}
}
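
RE2 does not support backreferences such as \2, so a pattern that matches an opening quote and then requires the same closing quote has to spell out each quoting style as its own alternative. The Go sketch below shows that approach together with the int, bool, string fallback order used above. It is an illustration in Go, and the pattern and the name parseConfigValue are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// One alternative per quoting style, because RE2 has no backreferences.
var configRe = regexp.MustCompile(`^\s*(\w+)\s*=\s*(?:"([^"\n]*)"|'([^'\n]*)'|([^"'\n]*?))\s*$`)

// parseConfigValue returns the key, the typed value, and a type label.
func parseConfigValue(line string) (key string, value interface{}, kind string, ok bool) {
	m := configRe.FindStringSubmatch(line)
	if m == nil {
		return "", nil, "", false
	}
	raw := m[2] + m[3] + m[4] // at most one of the three groups is non-empty
	if n, err := strconv.Atoi(raw); err == nil {
		return m[1], n, "int", true
	}
	if b, err := strconv.ParseBool(raw); err == nil {
		return m[1], b, "bool", true
	}
	return m[1], raw, "string", true
}

func main() {
	for _, line := range []string{"port=8080", "debug=true", `host="localhost"`, "name='x y'"} {
		key, value, kind, ok := parseConfigValue(line)
		fmt.Println(key, value, kind, ok)
	}
}
```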

Regular Expression Syntax

Harneet uses Go's RE2 syntax. Key features:

Character Classes

\d - Digits (0-9)
\w - Word characters (a-z, A-Z, 0-9, _)
\s - Whitespace
[abc] - Character set
[^abc] - Negated character set
[a-z] - Character range
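
In Go's RE2 syntax, \d, \w, and \s are ASCII-only. Matching digits or letters from other scripts needs a Unicode class such as \p{Nd} or \p{L}. Assuming Harneet passes patterns to the engine unchanged, the same applies there. The Go sketch below shows the difference with full-width digits; matches is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fullWidth := "\uff11\uff12\uff13" // full-width digits one, two, three

	fmt.Println(matches(`^\d+$`, "123"))         // true
	fmt.Println(matches(`^\d+$`, fullWidth))     // false: \d is [0-9] only
	fmt.Println(matches(`^\p{Nd}+$`, fullWidth)) // true: any Unicode decimal digit
}
```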

Quantifiers

* - Zero or more
+ - One or more
? - Zero or one
{n} - Exactly n
{n,} - n or more
{n,m} - Between n and m
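
Each quantifier is greedy by default and takes as much text as it can. Appending ? (as in *?, +?, or {n,m}?) makes it prefer the shortest match instead. A small Go sketch of the difference, using Go's regexp as a stand-in for the engine; firstMatch is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the first match of pattern in s, or "" if there is none.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Println(firstMatch(`<.+>`, "<a><b>"))  // <a><b>  (greedy)
	fmt.Println(firstMatch(`<.+?>`, "<a><b>")) // <a>     (lazy)
	fmt.Println(firstMatch(`a{2,3}`, "aaaa"))  // aaa     (at most three)
}
```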

Anchors

^ - Start of string
$ - End of string
\b - Word boundary
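
By default in RE2, ^ and $ match only at the start and end of the whole input. Prefixing the pattern with the (?m) flag switches them to line mode, so they also match at each line break. The Go sketch below counts matches with and without the flag; countMatches is an illustrative helper, and the assumption is that Harneet forwards such flags to the engine unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches returns the number of non-overlapping matches of pattern in s.
func countMatches(pattern, s string) int {
	return len(regexp.MustCompile(pattern).FindAllString(s, -1))
}

func main() {
	log := "ERROR disk full\nINFO started\nERROR timeout"

	fmt.Println(countMatches(`^ERROR`, log))               // 1: start of input only
	fmt.Println(countMatches(`(?m)^ERROR`, log))           // 2: start of each line
	fmt.Println(countMatches(`\bcat\b`, "cat concat cat")) // 2: whole words only
}
```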

Groups

(...) - Capture group
(?:...) - Non-capture group
(?P<name>...) - Named group (not yet supported)

Escape Sequences

Remember to double-escape in Harneet strings:

- \\d for \d
- \\\\ for \\ (a literal backslash)
- \" for "

Limitations

Current limitations in Harneet's regex implementation:

- No named capture groups yet
- No lookahead/lookbehind assertions
- No conditional expressions
- No recursive patterns
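
Most of these limits come from RE2 itself, which rejects such constructs when the pattern is compiled. Backreferences such as \1 are rejected for the same reason. Go's regexp does accept named groups, so that particular limitation is specific to Harneet as documented here. The Go sketch below probes a few constructs against Go's engine; supported is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// supported reports whether Go's RE2-based engine can compile pattern.
func supported(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	probes := []string{
		`(?:a|b)+`,   // non-capturing group
		`foo(?=bar)`, // lookahead
		`foo(?!bar)`, // negative lookahead
		`(a)\1`,      // backreference
	}
	for _, p := range probes {
		fmt.Printf("%-12s supported=%t\n", p, supported(p))
	}
}
```

Only the first probe compiles; the other three are reported as unsupported.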
