Skip to content

Sets

Sets are collections that store unique elements with fast membership testing and set theory operations. Harneet's Set implementation uses hash-based storage for O(1) average-case performance on core operations.

Overview

A Set automatically ensures all elements are unique and provides efficient operations for: - Adding and removing elements - Testing membership - Set theory operations (union, intersection, difference) - Iteration over unique elements

Creating a Set

Creating a Set
1
2
3
import collections

var set = collections.new_set()

Basic Operations

Adding Elements

Adding Elements
1
2
3
4
5
6
7
8
9
var fruits = collections.new_set()

fruits.add("apple")
fruits.add("banana")
fruits.add("orange")
fruits.add("apple")  // Duplicate - ignored

fmt.Println("Set size:", fruits.size())  // 3
fmt.Println("Set:", fruits.inspect())    // {"apple", "banana", "orange"}

Removing Elements

Removing Elements
1
2
3
4
5
6
7
8
9
// Remove specific element
var removed = fruits.remove("banana")
fmt.Println("Removed banana:", removed)  // true

// Try to remove non-existent element
var not_found = fruits.remove("grape")
fmt.Println("Removed grape:", not_found)  // false

fmt.Println("Set after removal:", fruits.inspect())  // {"apple", "orange"}

Membership Testing

Membership Testing
// Check if element exists
if fruits.contains("apple") {
    fmt.Println("Set contains apple")
}

if !fruits.contains("banana") {
    fmt.Println("Set does not contain banana")
}

// Batch membership testing
var items_to_check = ["apple", "grape", "orange"]
for item in items_to_check {
    var exists = fruits.contains(item)
    fmt.Println(item, "exists:", exists)
}

Size and Emptiness

Size and Emptiness
1
2
3
4
5
6
7
fmt.Println("Set size:", fruits.size())
fmt.Println("Is empty:", fruits.is_empty())

// Clear all elements
fruits.clear()
fmt.Println("After clear - size:", fruits.size())     // 0
fmt.Println("After clear - empty:", fruits.is_empty()) // true

Set Theory Operations

Union (Combine Sets)

Union
var set1 = collections.new_set()
set1.add("a")
set1.add("b")
set1.add("c")

var set2 = collections.new_set()
set2.add("c")
set2.add("d")
set2.add("e")

// Union contains all elements from both sets
var union_set = set1.union(set2)
fmt.Println("Union:", union_set.inspect())  // {"a", "b", "c", "d", "e"}

Intersection (Common Elements)

Intersection
1
2
3
// Intersection contains only elements present in both sets
var intersection_set = set1.intersection(set2)
fmt.Println("Intersection:", intersection_set.inspect())  // {"c"}

Difference (Elements in First Set Only)

Difference
1
2
3
4
5
6
7
// Difference contains elements in set1 but not in set2
var difference_set = set1.difference(set2)
fmt.Println("Difference:", difference_set.inspect())  // {"a", "b"}

// Symmetric difference (elements in either set, but not both)
var sym_diff = set1.symmetric_difference(set2)
fmt.Println("Symmetric difference:", sym_diff.inspect())  // {"a", "b", "d", "e"}

Subset and Superset Testing

Subset and Superset
var small_set = collections.new_set()
small_set.add("a")
small_set.add("b")

var large_set = collections.new_set()
large_set.add("a")
large_set.add("b")
large_set.add("c")
large_set.add("d")

// Check subset relationship
fmt.Println("small_set is subset of large_set:", small_set.is_subset(large_set))     // true
fmt.Println("large_set is superset of small_set:", large_set.is_superset(small_set)) // true

// Check proper subset (subset but not equal)
fmt.Println("small_set is proper subset:", small_set.is_proper_subset(large_set))   // true

Advanced Features

Conversion and Iteration

Conversion and Iteration
var numbers = collections.new_set()
for i in range(1, 6) {
    numbers.add(i)
}

// Convert to array
var array = numbers.to_array()
fmt.Println("As array:", array)

// Iterate using iterator
var iter = numbers.iterator()
fmt.Println("Iterating:")
for iter.has_next() {
    var item = iter.next()
    fmt.Println("  Item:", item)
}

// Iterate using for-in loop (if supported)
for item in numbers {
    fmt.Println("  Item:", item)
}

Functional Operations

Functional Operations
var numbers = collections.new_set()
for i in range(1, 11) {
    numbers.add(i)
}

// Filter elements
var evens = numbers.filter(((n) =>) n % 2 == 0)
fmt.Println("Even numbers:", evens.inspect())

// Map elements (creates new set)
var squares = numbers.map(((n) =>) n * n)
fmt.Println("Squares:", squares.inspect())

// Reduce to single value
var sum = numbers.reduce(|acc, n| acc + n, 0)
fmt.Println("Sum:", sum)

Batch Operations

Batch Operations
var set = collections.new_set()

// Add multiple elements at once
var items = ["red", "green", "blue", "yellow"]
set.add_all(items)

// Remove multiple elements
var to_remove = ["red", "blue"]
set.remove_all(to_remove)

fmt.Println("After batch operations:", set.inspect())  // {"green", "yellow"}

Use Cases and Examples

1. Duplicate Removal

Duplicate Removal
function remove_duplicates(arr array) array {
    var set = collections.new_set()
    for item in arr {
        set.add(item)
    }
    return set.to_array()
}

var numbers = [1, 2, 2, 3, 3, 3, 4, 5, 5]
var unique = remove_duplicates(numbers)
fmt.Println("Unique numbers:", unique)  // [1, 2, 3, 4, 5]

2. Membership Testing for Large Collections

Fast Membership Testing
// Fast lookup for large datasets
var valid_ids = collections.new_set()
for i in range(1, 1000000) {
    valid_ids.add("user_" + i.to_string())
}

function is_valid_user(user_id string) bool {
    return valid_ids.contains(user_id)  // O(1) average case
}

// Usage
fmt.Println("Valid user:", is_valid_user("user_12345"))  // true
fmt.Println("Valid user:", is_valid_user("user_999999")) // false

3. Tag System

Tag System
struct Article {
    title: string
    content: string
    tags: Set
}

function new_article(title string, content string) Article {
    return Article{
        title: title,
        content: content,
        tags: collections.new_set()
    }
}

function (article *Article) add_tag(tag) {
    article.tags.add(tag)
}

function (article *Article) has_tag(tag) {
    return article.tags.contains(tag)
}

function (article *Article) has_any_tags(tag_list) {
    for tag in tag_list {
        if article.tags.contains(tag) {
            return true
        }
    }
    return false
}

// Usage
var article = new_article("Harneet Guide", "A comprehensive guide...")
article.add_tag("programming")
article.add_tag("tutorial")
article.add_tag("harneet")

fmt.Println("Has programming tag:", article.has_tag("programming"))  // true
fmt.Println("Has any tech tags:", article.has_any_tags(["tech", "programming"]))  // true

4. Set-based Permissions

Set-based Permissions
struct User {
    name: string
    permissions: Set
}

struct Role {
    name: string
    permissions: Set
}

function new_user(name string) User {
    return User{
        name: name,
        permissions: collections.new_set()
    }
}

function new_role(name string, permissions array) Role {
    var role = Role{name: name, permissions: collections.new_set()}
    for perm in permissions {
        role.permissions.add(perm)
    }
    return role
}

function (user *User) assign_role(role) {
    user.permissions = user.permissions.union(role.permissions)
}

function (user *User) has_permission(permission) {
    return user.permissions.contains(permission)
}

function (user *User) has_all_permissions(required_permissions) {
    for perm in required_permissions {
        if !user.permissions.contains(perm) {
            return false
        }
    }
    return true
}

// Usage
var admin_role = new_role("admin", ["read", "write", "delete", "manage_users"])
var editor_role = new_role("editor", ["read", "write"])

var user = new_user("alice")
user.assign_role(editor_role)

fmt.Println("Can read:", user.has_permission("read"))    // true
fmt.Println("Can delete:", user.has_permission("delete")) // false
fmt.Println("Can edit:", user.has_all_permissions(["read", "write"])) // true

5. Graph Algorithms with Sets

Graph Algorithms
struct Graph {
    vertices: Set
    edges: Map  // vertex -> Set of neighbors
}

function new_graph() Graph {
    return Graph{
        vertices: collections.new_set(),
        edges: collections.new_map()
    }
}

function (g *Graph) add_vertex(vertex) {
    g.vertices.add(vertex)
    if !g.edges.contains_key(vertex) {
        g.edges.set(vertex, collections.new_set())
    }
}

function (g *Graph) add_edge(from, to) {
    g.add_vertex(from)
    g.add_vertex(to)
    g.edges.get(from).add(to)
    g.edges.get(to).add(from)  // Undirected graph
}

function (g *Graph) get_neighbors(vertex) {
    return g.edges.get(vertex)
}

function (g *Graph) has_path(start, end) {
    if start == end {
        return true
    }

    var visited = collections.new_set()
    var queue = collections.new_queue()

    queue.enqueue(start)
    visited.add(start)

    for !queue.is_empty() {
        var current = queue.dequeue()
        var neighbors = g.get_neighbors(current)

        for neighbor in neighbors {
            if neighbor == end {
                return true
            }
            if !visited.contains(neighbor) {
                visited.add(neighbor)
                queue.enqueue(neighbor)
            }
        }
    }

    return false
}

// Usage
var graph = new_graph()
graph.add_edge("A", "B")
graph.add_edge("B", "C")
graph.add_edge("C", "D")

fmt.Println("Path A to D:", graph.has_path("A", "D"))  // true
fmt.Println("Path A to E:", graph.has_path("A", "E"))  // false

Performance Characteristics

Time Complexity

Operation Average Case Worst Case Notes
Add O(1) O(n) Hash collision dependent
Remove O(1) O(n) Hash collision dependent
Contains O(1) O(n) Hash collision dependent
Union O(n + m) O(n + m) n, m are set sizes
Intersection O(min(n, m)) O(n * m) Optimized for smaller set
Difference O(n) O(n * m) n is first set size
Size O(1) O(1) Cached counter
Is Empty O(1) O(1) Simple size check

Space Complexity

  • Storage: O(n) where n is the number of unique elements
  • Load Factor: Typically maintains 0.75 load factor for optimal performance
  • Hash Table Overhead: Additional 25-50% memory for hash buckets

Hash Function Performance

The Set implementation uses efficient hash functions optimized for different data types:

Hash Function Performance
// Different types have optimized hashing
var mixed_set = collections.new_set()
mixed_set.add(42)           // Integer hash
mixed_set.add("hello")      // String hash
mixed_set.add([1, 2, 3])    // Array hash
mixed_set.add({"key": "value"}) // Map hash

// Custom hashable types can implement hash() method
struct Point {
    x: int
    y: int
}

function (p Point) hash() {
    return p.x * 31 + p.y  // Simple hash combination
}

var points = collections.new_set()
points.add(Point{x: 1, y: 2})
points.add(Point{x: 3, y: 4})

Error Handling

Sets handle errors gracefully:

Error Handling
var set = collections.new_set()

// Adding non-hashable types
var result = set.add(some_unhashable_object)
if result.type() == "ERROR" {
    fmt.Println("Cannot add non-hashable type to set")
}

// Operations on empty sets
var empty_set = collections.new_set()
var union_result = empty_set.union(set)  // Works fine, returns copy of set

// Null handling
set.add(null)  // null is hashable and can be stored
fmt.Println("Contains null:", set.contains(null))

Best Practices

1. Choose Sets for Uniqueness

Choose Sets for Uniqueness
1
2
3
4
5
// Use sets when uniqueness is important
var unique_visitors = collections.new_set()
function track_visitor(user_id string) {
    unique_visitors.add(user_id)  // Automatically handles duplicates
}

2. Pre-size for Known Capacity

Pre-size Sets
1
2
3
// If you know approximate size, reserve space
var large_set = collections.new_set()
large_set.reserve(10000)  // Reduces hash table resizing

3. Use Set Operations Instead of Loops

Use Set Operations
// Instead of manual loops
var common_elements = collections.new_set()
for item in set1 {
    if set2.contains(item) {
        common_elements.add(item)
    }
}

// Use built-in intersection
var common_elements = set1.intersection(set2)  // More efficient

4. Consider Memory Usage for Large Sets

Monitor Memory Usage
// Monitor memory usage for large sets
var large_set = collections.new_set()
for i in range(1000000) {
    large_set.add(i)
}

// Periodically check size and clear if needed
if large_set.size() > threshold {
    large_set.clear()
}

5. Use Concurrent Sets for Multi-threading

Concurrent Sets
1
2
3
// For concurrent access
var shared_set = collections.new_concurrent_set()
// Thread-safe operations across goroutines

Migration from Arrays

If you're currently using arrays to maintain unique elements:

Migration from Arrays
// Old approach with arrays
var unique_items = []
function add_unique(item any) {
    for existing in unique_items {
        if existing == item {
            return  // Already exists
        }
    }
    unique_items.append(item)  // O(n) check + O(1) append
}

// New approach with sets
var unique_items = collections.new_set()
function add_unique(item any) {
    unique_items.add(item)  // O(1) average case
}

Sets provide a more efficient and cleaner solution for managing unique collections with powerful set theory operations built-in.