Regex
Regular expressions using Go's RE2 engine (same syntax as Go's regexp). All functions return a tuple (result, error) following Harneet's stdlib conventions.
Import
Functions
Basic Pattern Matching
regex.Match(pattern string, s string) (boolean, error)
regex.FindString(pattern string, s string) (string, error)
regex.FindAllString(pattern string, s string, n int) (array, error)
regex.FindStringIndex(pattern string, s string) (array, error)
Capture Groups
regex.FindStringSubmatch(pattern string, s string) (array, error)
regex.FindAllStringSubmatch(pattern string, s string, n int) (array, error)
regex.FindStringSubmatchIndex(pattern string, s string) (array, error)
String Manipulation
regex.ReplaceAllString(pattern string, s string, repl string) (string, error)
regex.ReplaceAllFunc(pattern string, s string, fn function(RegexMatchInfo) string) (string, error)
regex.Split(pattern string, s string, n int) (array, error)
Compiled Regex Objects
regex.Compile(pattern string) (Regex, error)
regex.MustCompile(pattern string) (Regex, error)
Compiled Regex objects expose instance methods that mirror the function-based API but reuse the compiled pattern:
r.MatchString(s string) (boolean, error)
r.FindString(s string) (string, error)
r.FindAllString(s string, n int) (array, error)
r.FindStringIndex(s string) (array, error)
r.FindStringSubmatch(s string) (array, error)
r.FindAllStringSubmatch(s string, n int) (array, error)
r.FindStringSubmatchIndex(s string) (array, error)
r.ReplaceAllString(s string, repl string) (string, error)
Notes:
- Patterns use RE2 syntax (same as Go)
- Escape sequences need double escaping inside strings, e.g. "\\d+"
- Capture groups use parentheses: "(\\w+)@(\\w+\\.\\w+)"
- Index 0 is the full match, index 1+ are capture groups
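The signatures above take an n argument without saying what it means. The sketch below shows how Go's regexp package treats it, on the assumption (not confirmed by this page) that Harneet passes n through unchanged: a negative n means "no limit", and a positive n caps the number of matches or pieces. The helper names findAll and splitN are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns at most n matches of pattern in s; n < 0 means no limit.
func findAll(pattern, s string, n int) []string {
	return regexp.MustCompile(pattern).FindAllString(s, n)
}

// splitN splits s around matches of pattern into at most n pieces; n < 0 means no limit.
func splitN(pattern, s string, n int) []string {
	return regexp.MustCompile(pattern).Split(s, n)
}

func main() {
	fmt.Printf("%q\n", findAll(`[a-z]+`, "go is fun", -1)) // ["go" "is" "fun"]
	fmt.Printf("%q\n", findAll(`[a-z]+`, "go is fun", 2))  // ["go" "is"]
	fmt.Printf("%q\n", splitN(`\s+`, "a b c", -1))         // ["a" "b" "c"]
	fmt.Printf("%q\n", splitN(`\s+`, "a b c", 2))          // ["a" "b c"]
}
```

With a positive limit, Split leaves the unsplit remainder in the last piece, which is why the final call returns "b c" as a single element.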
Basic Examples
Match
| Match |
|---|
| package main
import regex
import fmt
var matched, err = regex.Match("h.llo", "hello")
if err != None {
fmt.Println("error:", err)
} else {
fmt.Println("matched:", matched)
}
|
FindString
| FindString |
|---|
| package main
import regex
import fmt
var fs, err = regex.FindString("[a-z]+", "Go123lang")
if err != None {
fmt.Println("error:", err)
} else {
fmt.Println("first word:", fs)
}
|
Compiled Regex Object Examples
Compile Once, Use Many Times
| Compiled Regex |
|---|
| package main
import regex
import fmt
// Compile a pattern once
var r, err = regex.Compile("p([a-z]+)ch")
if err != None {
fmt.Println("compile error:", err)
} else {
var ok, mErr = r.MatchString("peach")
if mErr == None {
fmt.Println("MatchString(peach):", ok)
}
var first, fErr = r.FindString("peach punch")
if fErr == None {
fmt.Println("FindString(peach punch):", first)
}
var all, aErr = r.FindAllString("peach punch pinch", -1)
if aErr == None {
fmt.Println("FindAllString(peach punch pinch, -1):", all)
}
var out, rErr = r.ReplaceAllString("a peach and a punch", "<fruit>")
if rErr == None {
fmt.Println("ReplaceAllString(a peach and a punch, <fruit>):", out)
}
}
|
MustCompile behaves like Compile but is intended for patterns that are known to be valid (for example, declared at the top of a file). In Harneet it still returns (Regex, error) and does not panic.
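For readers coming from Go, the contrast may be easier to see in code. In Go itself, regexp.Compile returns an error for a bad pattern while regexp.MustCompile panics. The sketch below recovers that panic and turns it into an error, which is roughly the behavior described above for Harneet's MustCompile. The helper tryMustCompile is hypothetical and is not how Harneet is known to implement it.

```go
package main

import (
	"fmt"
	"regexp"
)

// tryMustCompile calls regexp.MustCompile and converts its panic into an error.
func tryMustCompile(pattern string) (re *regexp.Regexp, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("MustCompile panicked: %v", r)
		}
	}()
	return regexp.MustCompile(pattern), nil
}

func main() {
	// Compile reports a bad pattern (unclosed group) through its error result.
	_, err := regexp.Compile("p([a-z]+ch")
	fmt.Println("Compile returned an error:", err != nil) // true

	// MustCompile panics on the same pattern; here the panic is recovered.
	_, err = tryMustCompile("p([a-z]+ch")
	fmt.Println("MustCompile reported an error:", err != nil) // true
}
```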
🔥 Capture Groups Examples
Email Parsing with Capture Groups
| Email Parsing |
|---|
| package main
import regex
import fmt
// Extract username and domain from email
var matches, err = regex.FindStringSubmatch("(\\w+)@(\\w+\\.\\w+)", "Contact: user@example.com")
if err != None {
fmt.Println("error:", err)
} else if len(matches) >= 3 {
fmt.Printf("Full match: %s\n", matches[0]) // "user@example.com"
fmt.Printf("Username: %s\n", matches[1]) // "user"
fmt.Printf("Domain: %s\n", matches[2]) // "example.com"
}
|
| Phone Number Extraction |
|---|
| package main
import regex
import fmt
// Extract all phone numbers with area codes
var text = "Call 123-456-7890 or 987-654-3210"
var phoneMatches, err = regex.FindAllStringSubmatch("(\\d{3})-(\\d{3})-(\\d{4})", text, -1)
if err != None {
fmt.Println("error:", err)
} else {
for i, match in phoneMatches {
fmt.Printf("Phone %d: %s\n", i+1, match[0]) // Full number
fmt.Printf(" Area: %s\n", match[1]) // Area code
fmt.Printf(" Exchange: %s\n", match[2]) // Exchange
fmt.Printf(" Number: %s\n", match[3]) // Last 4 digits
}
}
|
URL Parsing
| URL Parsing |
|---|
| package main
import regex
import fmt
// Parse URL components
var url = "https://api.example.com:8080/v1/users"
var matches, err = regex.FindStringSubmatch("(\\w+)://([^:/]+)(:(\\d+))?(/.*)?", url)
if err != None {
fmt.Println("error:", err)
} else if len(matches) >= 6 {
fmt.Printf("Protocol: %s\n", matches[1]) // "https"
fmt.Printf("Host: %s\n", matches[2]) // "api.example.com"
if matches[4] != "" {
fmt.Printf("Port: %s\n", matches[4]) // "8080"
}
if matches[5] != "" {
fmt.Printf("Path: %s\n", matches[5]) // "/v1/users"
}
}
|
| Date Format Parsing |
|---|
| package main
import regex
import fmt
// Parse different date formats
var dates = ["2024-09-25", "09/25/2024", "September 25, 2024"]
var patterns = [
{"pattern": "(\\d{4})-(\\d{2})-(\\d{2})", "name": "ISO"},
{"pattern": "(\\d{2})/(\\d{2})/(\\d{4})", "name": "US"},
{"pattern": "(\\w+)\\s+(\\d{1,2}),\\s+(\\d{4})", "name": "Long"}
]
for date in dates {
fmt.Printf("Parsing: %s\n", date)
for pattern in patterns {
var matches, err = regex.FindStringSubmatch(pattern["pattern"], date)
if err == None and len(matches) >= 4 {
fmt.Printf(" %s format: %s/%s/%s\n",
pattern["name"], matches[1], matches[2], matches[3])
}
}
}
|
Log Entry Parsing
| Log Entry Parsing |
|---|
| package main
import regex
import fmt
// Parse structured log entries
var logEntry = "2024-09-25 14:30:15 [INFO] User login successful for user123"
var matches, err = regex.FindStringSubmatch("(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+\\[(\\w+)\\]\\s+(.*)", logEntry)
if err != None {
fmt.Println("error:", err)
} else if len(matches) >= 5 {
fmt.Printf("Date: %s\n", matches[1]) // "2024-09-25"
fmt.Printf("Time: %s\n", matches[2]) // "14:30:15"
fmt.Printf("Level: %s\n", matches[3]) // "INFO"
fmt.Printf("Message: %s\n", matches[4]) // "User login successful for user123"
}
|
Position Tracking with FindStringSubmatchIndex
Harneet does not support slicing strings with text[a:b] syntax. Use capture groups (strings) directly, or print indices for reference.
Working approach using capture groups (strings):
| Capture Groups Example |
|---|
| package main
import regex
import fmt
var text = "The price is $123.45"
var matches, err = regex.FindStringSubmatch("\\$(\\d+)\\.(\\d+)", text)
if err != None {
fmt.Println("error:", err)
} else if len(matches) >= 3 {
fmt.Printf("Full match: '%s'\n", matches[0]) // "$123.45"
fmt.Printf("Dollars: '%s'\n", matches[1]) // "123"
fmt.Printf("Cents: '%s'\n", matches[2]) // "45"
}
|
If you still need byte indices for integration or tooling, you can print them like this:
| Byte Indices Example |
|---|
| package main
import regex
import fmt
var text = "The price is $123.45"
var idx, err = regex.FindStringSubmatchIndex("\\$(\\d+)\\.(\\d+)", text)
if err != None {
fmt.Println("error:", err)
} else if len(idx) >= 6 {
fmt.Printf("Full match bytes: [%d:%d]\n", idx[0], idx[1])
fmt.Printf("Dollars group bytes: [%d:%d]\n", idx[2], idx[3])
fmt.Printf("Cents group bytes: [%d:%d]\n", idx[4], idx[5])
}
|
Unicode-safe slicing from regex indices
If your text may contain non-ASCII characters, convert byte indices to rune indices and then slice safely:
| Unicode-safe Slicing |
|---|
| package main
import regex
import strings
import fmt
var text = "Price: $１２３.４５" // full-width digits to illustrate multibyte
var idx, err = regex.FindStringSubmatchIndex("\\$(.+)\\.(.+)", text)
if err != None {
fmt.Println("error:", err)
} else if len(idx) >= 2 {
// Convert full match byte range to rune range
var rr, rerr = strings.ByteRangeToRuneRange(text, idx[0], idx[1])
if rerr == None {
var full, serr = strings.Substring(text, rr[0], rr[1])
if serr == None {
fmt.Printf("Full (rune-safe): '%s'\n", full)
}
}
}
|
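To see why the conversion matters, the following Go sketch shows byte and rune offsets diverging once a multibyte character precedes the match. It illustrates the idea behind a byte-to-rune conversion, not Harneet's strings.ByteRangeToRuneRange itself; the helper byteToRuneRange and the sample text are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// byteToRuneRange converts a byte range [start, end) in s to a rune range.
// Hypothetical helper; it assumes both offsets fall on rune boundaries,
// which holds for offsets returned by the regexp package on valid UTF-8.
func byteToRuneRange(s string, start, end int) (int, int) {
	runeStart := utf8.RuneCountInString(s[:start])
	runeEnd := runeStart + utf8.RuneCountInString(s[start:end])
	return runeStart, runeEnd
}

func main() {
	text := "Prix: €123.45" // "€" occupies 3 bytes in UTF-8
	pattern := regexp.MustCompile(`(\d+)\.(\d+)`)
	idx := pattern.FindStringSubmatchIndex(text)

	runeStart, runeEnd := byteToRuneRange(text, idx[0], idx[1])
	fmt.Println(idx[0], idx[1])     // 9 15 (byte offsets)
	fmt.Println(runeStart, runeEnd) // 7 13 (rune offsets)
}
```

The two-unit gap between the byte and rune offsets comes from the extra two bytes of the euro sign.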
Advanced Use Cases
Data Validation
| Data Validation |
|---|
| package main
import regex
import fmt
function validateEmail(email string) bool {
var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
return err == None and len(matches) >= 3
}
function extractEmailParts(email string) map {
var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
if err != None or len(matches) < 3 {
return None
}
return {
"username": matches[1],
"domain": matches[2],
"valid": true
}
}
// Usage
var email = "user.name+tag@example.co.uk"
if validateEmail(email) {
var parts = extractEmailParts(email)
fmt.Printf("Valid email - Username: %s, Domain: %s\n",
parts["username"], parts["domain"])
}
|
Configuration Parsing
| Configuration Parsing |
|---|
| package main
import regex
import fmt
function parseConfigLine(line string) map {
// Parse key=value or key="quoted value"
var matches, err = regex.FindStringSubmatch("^\\s*(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^\\s#]+))", line)
if err != None or len(matches) < 4 {
return None
}
var value = matches[2] // Quoted value
if value == "" {
value = matches[3] // Unquoted value
}
return {
"key": matches[1],
"value": value
}
}
// Usage
var configLines = [
"port=8080",
"host=\"localhost\"",
"debug=true"
]
for line in configLines {
var config = parseConfigLine(line)
if config != None {
fmt.Printf("%s = %s\n", config["key"], config["value"])
}
}
|
| HTTP Header Parsing |
|---|
| package main
import regex
import fmt
function parseContentType(header string) map {
// Parse Content-Type: text/html; charset=utf-8
var matches, err = regex.FindStringSubmatch("^([^/]+)/([^;\\s]+)(?:;\\s*charset=([^;\\s]+))?", header)
if err != None or len(matches) < 4 {
return None
}
return {
"type": matches[1],
"subtype": matches[2],
"charset": matches[3]
}
}
// Usage for HTTP module integration
var contentType = "application/json; charset=utf-8"
var parsed = parseContentType(contentType)
if parsed != None {
fmt.Printf("Type: %s/%s, Charset: %s\n",
parsed["type"], parsed["subtype"], parsed["charset"])
}
|
Basic Pattern Matching Examples
FindAllString
| FindAllString |
|---|
| package main
import regex
import fmt
var all, err = regex.FindAllString("[a-z]+", "go is fun", -1)
if err != None {
fmt.Println("error:", err)
} else {
fmt.Println(all) // ["go", "is", "fun"]
}
|
FindStringIndex
| FindStringIndex |
|---|
| package main
import regex
import fmt
var idx, err = regex.FindStringIndex("foo", "xxfooYY")
if err != None {
fmt.Println("error:", err)
} else {
fmt.Println(idx) // [2, 5]
}
|
ReplaceAllString
| ReplaceAllString |
|---|
| package main
import regex
import fmt
var out, err = regex.ReplaceAllString("\\d+", "abc123xyz456", "#")
if err != None {
fmt.Println("error:", err)
} else {
fmt.Println(out) // "abc#xyz#"
}
|
ReplaceAllFunc (callback-based replacement)
ReplaceAllFunc lets you compute the replacement string using a Harneet callback for each match, similar to Go's ReplaceAllStringFunc:
| ReplaceAllFunc Uppercase Matches |
|---|
| package main
import regex
import fmt
function toUpperMatch(info RegexMatchInfo) string {
// info.text is the full match; info.groups[0] is also the full match, groups[1:] are captures
// info.indices is a flattened [start0, end0, start1, end1, ...] array in byte offsets
var m = info.text
return m.Upper()
}
var input = "a peach and a punch"
var out, err = regex.ReplaceAllFunc("p([a-z]+)ch", input, toUpperMatch)
if err != None {
fmt.Println("error:", err)
} else {
fmt.Println(out) // "a PEACH and a PUNCH"
}
|
Callback Rules & Safety Notes:
- The callback must accept exactly one argument, a RegexMatchInfo object.
- It must return a string; if it returns any other type, ReplaceAllFunc returns (None, error).
- If the callback returns an Error or ErrorValue, iteration stops and that error is returned.
- Pattern compilation errors are surfaced in the error position, like other regex functions.
RegexMatchInfo
RegexMatchInfo is a struct-like object constructed by the regex module and passed into ReplaceAllFunc callbacks. You can treat it like any other struct in Harneet:
- Access fields with dot syntax: info.fieldName.
- It is not constructed directly by user code; it is produced internally for each match.
Fields currently provided:
- pattern (string): the regex pattern used for the match.
- text (string): full match text.
- groups (array<string>): capture groups (groups[0] is full match, groups[1:] are captures).
- indices (array<int>): flattened [start0, end0, start1, end1, ...] byte indices into the original string.
- start (int): start index of the full match.
- end (int): end index of the full match.
- groupCount (int): number of capture groups (excluding the full match).
- groupNames (array<string>): names for each subexpression as returned by Go's SubexpNames() (same length as groups; entries may be empty strings when groups are unnamed).
This design keeps the callback signature stable while allowing the regex module to evolve (for example, by adding more metadata fields later) without breaking existing callbacks.
Split
| Split |
|---|
| package main
var parts, err = regex.Split("\\s+", "a b c", -1)
if err != None {
fmt.Println("error:", err)
} else {
fmt.Println(parts) // ["a", "b", "c"]
}
|
Function Reference
Basic Functions
| Function | Description | Returns |
|---|---|---|
| Match(pattern, s) | Test if pattern matches string | (boolean, error) |
| FindString(pattern, s) | Find first match | (string, error) |
| FindAllString(pattern, s, n) | Find all matches | (array, error) |
| FindStringIndex(pattern, s) | Find position of first match | (array, error) |
Capture Group Functions
| Function | Description | Returns |
|---|---|---|
| FindStringSubmatch(pattern, s) | Extract capture groups from first match | (array, error) |
| FindAllStringSubmatch(pattern, s, n) | Extract capture groups from all matches | (array, error) |
| FindStringSubmatchIndex(pattern, s) | Get positions of capture groups | (array, error) |
String Manipulation
| Function | Description | Returns |
|---|---|---|
| ReplaceAllString(pattern, s, repl) | Replace all matches | (string, error) |
| ReplaceAllFunc(pattern, s, fn) | Replace matches using a callback | (string, error) |
| Split(pattern, s, n) | Split string by pattern | (array, error) |
Common Patterns
Email Validation
| Email Validation |
|---|
| var isValidEmail = function(email string) bool {
var matches, err = regex.FindStringSubmatch("^([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})$", email)
return err == None and len(matches) >= 3
}
|
| Phone Number Formats |
|---|
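| // US phone numbers: (555) 123-4567 or 555-123-4567
// The area code lands in group 1 for the parenthesized form and in group 2 for the dashed form
var phonePattern = "(?:\\((\\d{3})\\)\\s*|(\\d{3})-)(\\d{3})-(\\d{4})"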
|
URL Components
| URL Components |
|---|
| // Extract protocol, host, port, path, and query from URLs
// The path group stops at "?" so that the query string reaches the last group
var urlPattern = "(https?)://([^:/]+)(?::(\\d+))?(/[^?]*)?(?:\\?(.*))?"
|
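A URL pattern with optional parts is easy to get subtly wrong, so it helps to run it against inputs with and without a port, path, and query. The Go sketch below does that for an illustrative variant of the pattern, in which the path group stops at "?" so the query string can be captured on its own. The name parseURL is hypothetical, and Go raw strings are used here, so the backslashes are not doubled.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlPattern is an illustrative variant: the path group excludes "?" so that
// the optional query group after it can match.
var urlPattern = regexp.MustCompile(`(https?)://([^:/]+)(?::(\d+))?(/[^?]*)?(?:\?(.*))?`)

// parseURL returns [full, protocol, host, port, path, query], or nil when
// there is no match. Optional parts that are absent come back as "".
func parseURL(u string) []string {
	return urlPattern.FindStringSubmatch(u)
}

func main() {
	fmt.Printf("%q\n", parseURL("https://api.example.com:8080/v1/users?page=2"))
	// ["https://api.example.com:8080/v1/users?page=2" "https" "api.example.com" "8080" "/v1/users" "page=2"]

	fmt.Printf("%q\n", parseURL("http://example.com"))
	// ["http://example.com" "http" "example.com" "" "" ""]
}
```

If the path group were written as (/.*)?, its greedy .* would swallow the query string and the final group would always be empty.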
| Date Formats |
|---|
| var isoDate = "(\\d{4})-(\\d{2})-(\\d{2})" // 2024-09-25
var usDate = "(\\d{2})/(\\d{2})/(\\d{4})" // 09/25/2024
var longDate = "(\\w+)\\s+(\\d{1,2}),\\s+(\\d{4})" // September 25, 2024
|
Log Parsing
| Log Parsing Pattern |
|---|
| // Parse log entries: 2024-09-25 14:30:15 [ERROR] Message
var logPattern = "(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+\\[(\\w+)\\]\\s+(.*)"
|
Best Practices
1. Always Handle Errors
| Always Handle Errors |
|---|
| package main
var matches, err = regex.FindStringSubmatch(pattern, text)
if err != None {
fmt.Printf("Regex error: %s\n", err)
return
}
|
2. Keep Complex Patterns in Named Variables
| Named Pattern Variables |
|---|
| // Easier to read and maintain; backslashes still need double escaping
var emailPattern = "([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})"
|
3. Validate Before Processing
| Validate Before Processing |
|---|
| package main
var matches, err = regex.FindStringSubmatch(pattern, text)
if err != None or len(matches) < expectedGroups {
// Handle invalid input
return
}
|
4. Use Descriptive Variable Names
| Descriptive Variable Names |
|---|
| package main
var emailMatches, _ = regex.FindStringSubmatch(emailPattern, userInput)
var username = emailMatches[1]
var domain = emailMatches[2]
|
Error Handling
| Error Handling |
|---|
| package main
import regex
import fmt
var bad, err = regex.Match("(", "test")
if err != None {
fmt.Println("got expected error:", err)
}
|
Common regex errors:
- Invalid syntax: Unclosed groups, invalid escape sequences
- Compilation errors: Malformed patterns, unsupported features
- Runtime errors: Pattern too complex, stack overflow
Performance Tips
1. Compile Once, Use Many Times
| Compile Once Use Many |
|---|
| package main
import regex
// Good: compile the pattern once and reuse the compiled Regex object
var emailRegex, compileErr = regex.Compile("([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})")
function validateEmails(emails array) {
if compileErr != None {
return
}
for email in emails {
var matches, _ = emailRegex.FindStringSubmatch(email)
// Process matches...
}
}
|
2. Use Specific Patterns
| Use Specific Patterns |
|---|
| package main
// Better: More specific
var phonePattern = "\\d{3}-\\d{3}-\\d{4}"
// Avoid: Too general
var badPattern = ".*-.*-.*"
|
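3. Keep Quantifiers Simple
| Keep Quantifiers Simple |
|---|
| package main
// Good: Non-greedy quantifiers when appropriate
var htmlTag = "<(\\w+).*?>"
// Avoid: Nested quantifiers. RE2 does not backtrack, so this still runs in linear time,
// but it is harder to read than "a+b", which matches the same text
var nestedPattern = "(a+)+b"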
|
Integration Examples
With HTTP Module
| With HTTP Module |
|---|
| package main
import http
import regex
import fmt
function parseUserAgent(userAgent string) map {
var pattern = "([^/]+)/([\\d\\.]+)\\s*\\(([^)]+)\\)"
var matches, err = regex.FindStringSubmatch(pattern, userAgent)
if err != None or len(matches) < 4 {
return None
}
return {
"browser": matches[1],
"version": matches[2],
"platform": matches[3]
}
}
// Usage in HTTP handler
function handleRequest(request map, response map) {
var userAgent = request["headers"]["User-Agent"]
var parsed = parseUserAgent(userAgent)
if parsed != None {
fmt.Printf("Browser: %s %s on %s\n",
parsed["browser"], parsed["version"], parsed["platform"])
}
}
|
With JSON Module
| With JSON Module |
|---|
| package main
import json
import regex
import fmt
function extractDataFromText(text string) map {
var emailPattern = "([\\w\\._%+-]+)@([\\w\\.-]+\\.[A-Za-z]{2,})"
var phonePattern = "(\\d{3})-(\\d{3})-(\\d{4})"
var emails, _ = regex.FindAllStringSubmatch(emailPattern, text, -1)
var phones, _ = regex.FindAllStringSubmatch(phonePattern, text, -1)
var result = {
"emails": [],
"phones": []
}
// Process emails
for email in emails {
var emailData = {"full": email[0], "username": email[1], "domain": email[2]}
result["emails"] = append(result["emails"], emailData)
}
// Process phones
for phone in phones {
var phoneData = {"full": phone[0], "area": phone[1], "exchange": phone[2], "number": phone[3]}
result["phones"] = append(result["phones"], phoneData)
}
return result
}
|
With Cast Module
| With Cast Module |
|---|
| package main
import cast
import regex
import fmt
function parseConfigValue(line string) map {
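// RE2 has no backreferences (\2), so each quoting style gets its own alternative
var pattern = "^\\s*(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\"'\\n]*?))\\s*$"
var matches, err = regex.FindStringSubmatch(pattern, line)
if err != None or len(matches) < 5 {
return None
}
var key = matches[1]
// Only one of groups 2-4 takes part in a match; the others are empty
var value = matches[2]
if value == "" {
value = matches[3]
}
if value == "" {
value = matches[4]
}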
// Try to cast to appropriate type
var intValue, intErr = cast.ToInt(value)
if intErr == None {
return {"key": key, "value": intValue, "type": "int"}
}
var boolValue, boolErr = cast.ToBool(value)
if boolErr == None {
return {"key": key, "value": boolValue, "type": "bool"}
}
return {"key": key, "value": value, "type": "string"}
}
|
Regular Expression Syntax
Harneet uses Go's RE2 syntax. Key features:
Character Classes
\d - Digits (0-9)
\w - Word characters (a-z, A-Z, 0-9, _)
\s - Whitespace
[abc] - Character set
[^abc] - Negated character set
[a-z] - Character range
Quantifiers
* - Zero or more
+ - One or more
? - Zero or one
{n} - Exactly n
{n,} - n or more
{n,m} - Between n and m
Anchors
^ - Start of string
$ - End of string
\b - Word boundary
Groups
(...) - Capture group
(?:...) - Non-capture group
(?P<name>...) - Named group (not yet supported)
Escape Sequences
Remember to double-escape in Harneet strings:
- \\d for \d
- \\\\ for \\
- \" for "
Limitations
Current limitations in Harneet's regex implementation:
- No named capture groups yet
- No lookahead/lookbehind assertions
- No backreferences (such as \1)
- No conditional expressions
- No recursive patterns
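The underlying Go engine rejects these constructs at compile time, so they show up as an error in the error position rather than as a silent mismatch. The sketch below confirms this for Go's regexp package and shows one common workaround: replacing a lookbehind with a capture group. The helper names compiles and amountAfterDollar are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// amountAfterDollar captures the digits after "$" instead of using the
// lookbehind (?<=\$)\d+, which RE2 rejects. It returns "" when nothing matches.
func amountAfterDollar(s string) string {
	m := regexp.MustCompile(`\$(\d+)`).FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(compiles(`foo(?=bar)`))  // false: lookahead
	fmt.Println(compiles(`(?<=\$)\d+`)) // false: lookbehind
	fmt.Println(compiles(`(a)\1`))       // false: backreference

	fmt.Println(amountAfterDollar("The price is $123.45")) // 123
}
```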
Note: All regex functions follow Harneet's standard (result, error) tuple return pattern for consistent error handling.