Regular Expressions 101

Save & Share

Flavor

  • PCRE2 (PHP >=7.3)
  • PCRE (PHP <7.3)
  • ECMAScript (JavaScript)
  • Python
  • Golang
  • Java 8
  • .NET 7.0 (C#)
  • Rust
  • Regex Flavor Guide

Function

  • Match
  • Substitution
  • List
  • Unit Tests

Tools

Sponsors
There are currently no sponsors. Become a sponsor today!
An explanation of your regex will be automatically generated as you type.
Detailed match information will be displayed here automatically.
  • All Tokens
  • Common Tokens
  • General Tokens
  • Anchors
  • Meta Sequences
  • Quantifiers
  • Group Constructs
  • Character Classes
  • Flags/Modifiers
  • Substitution
  • A single character of: a, b or c
    [abc]
  • A character except: a, b or c
    [^abc]
  • A character in the range: a-z
    [a-z]
  • A character not in the range: a-z
    [^a-z]
  • A character in the range: a-z or A-Z
    [a-zA-Z]
  • Any single character
    .
  • Alternate - match either a or b
    a|b
  • Any whitespace character
    \s
  • Any non-whitespace character
    \S
  • Any digit
    \d
  • Any non-digit
    \D
  • Any word character
    \w
  • Any non-word character
    \W
  • Non-capturing group
    (?:...)
  • Capturing group
    (...)
  • Zero or one of a
    a?
  • Zero or more of a
    a*
  • One or more of a
    a+
  • Exactly 3 of a
    a{3}
  • 3 or more of a
    a{3,}
  • Between 3 and 6 of a
    a{3,6}
  • Start of string
    ^
  • End of string
    $
  • A word boundary
    \b
  • Non-word boundary
    \B

Regular Expression
No Match

r"
"
gms

Test String

Substitution

Processing...

Code Generator

Generated Code

import Foundation let pattern = #"\\\[(.*?)\\\]"# let regex = try! NSRegularExpression(pattern: pattern, options: [.anchorsMatchLines, .dotMatchesLineSeparators]) let testString = #####""" # Equivalency Lookups: Concepts, Techniques, and Applications Equivalency lookups are a common computational task where you determine whether two or more entities are equivalent based on certain criteria. This concept underpins a wide range of problems, from database joins to synonym matching, hash-based comparisons, and distributed system consistency checks. --- ## 1. **What Are Equivalency Lookups?** ### Definition An **equivalency lookup** is the process of identifying if two elements belong to the same equivalence class based on a defined equivalence relation $( R )$. ### Properties of Equivalence Relations An equivalence relation $( R )$ satisfies three properties: 1. **Reflexivity**: $( a R a )$ (an element is equivalent to itself). 2. **Symmetry**: $( a R b \implies b R a )$ (if $( a )$ is equivalent to $( b )$, then $( b )$ is equivalent to $( a )$). 3. **Transitivity**: $( a R b )$ and $( b R c \implies a R c )$ (if $( a )$ is equivalent to $( b )$, and $( b )$ is equivalent to $( c )$, then $( a )$ is equivalent to $( c )$). ### Real-World Examples - **Dictionary Synonyms**: Checking if two words (e.g., "fast" and "quick") are synonyms. - **User Identity Matching**: Verifying if two user accounts refer to the same individual. - **Canonical Representation**: Mapping equivalent objects to a single representative for efficiency (e.g., hash-based deduplication). --- ## 2. **Techniques for Equivalency Lookups** ### 2.1 Hashing - Use **hash functions** to map equivalent objects to the same hash value. - Efficient for exact matches (e.g., strings, integers). - Example: Checking file equivalency via MD5 or SHA-256 hash comparison. #### Example: Hash-Based Lookup ```python # Equivalency check for strings using hashes import hashlib def get_hash(value): return hashlib.md5(value.encode()).hexdigest() a = "hello" b = "hello" print(get_hash(a) == get_hash(b)) # Output: True ``` ### 2.2 Union-Find (Disjoint Set) - Efficient data structure for handling equivalency relations in dynamic systems. - Operations: - **Find**: Determine the equivalence class of an element. - **Union**: Merge two equivalence classes. - Applications: Network connectivity, Kruskal’s algorithm for MST, and clustering. #### Example: Union-Find Implementation ```python class UnionFind: def __init__(self, size): self.parent = list(range(size)) def find(self, x): if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) # Path compression return self.parent[x] def union(self, x, y): rootX = self.find(x) rootY = self.find(y) if rootX != rootY: self.parent[rootX] = rootY # Usage uf = UnionFind(10) uf.union(1, 2) uf.union(2, 3) print(uf.find(1) == uf.find(3)) # Output: True ``` ### 2.3 Canonicalization - Transform each object into a canonical form such that equivalent objects are identical. - Examples: - Sorting strings for anagrams (e.g., "cat" and "tac" → "act"). - Reducing fractions to lowest terms. #### Example: Canonicalizing Anagrams ```python def canonical_form(word): return ''.join(sorted(word)) print(canonical_form("listen") == canonical_form("silent")) # Output: True ``` ### 2.4 Database Indexes - Use indexes for equivalency lookups in structured data. - Example: SQL query to find all users with the same email address. #### Example: SQL Query ```sql SELECT user_id FROM users WHERE email = 'example@example.com'; ``` --- ## 3. **Applications of Equivalency Lookups** ### 3.1 Data Deduplication - Identifying and removing duplicate records or files. - Technique: Hash-based deduplication or clustering similar records. ### 3.2 Graph Connectivity - Check if two nodes are in the same connected component. - Technique: Union-Find or BFS/DFS. ### 3.3 Synonym Matching - Resolve different words or phrases that refer to the same concept. - Technique: Canonicalization or synonym dictionaries. ### 3.4 Distributed Systems - Ensure consistency by checking if replicas are equivalent. - Technique: Compare hash values of data on different servers. --- ## 4. **Optimizations for Large-Scale Lookups** ### 4.1 Bloom Filters - Space-efficient data structure for approximate membership testing. - Useful for checking if an element might be equivalent to others in a large dataset. #### Example: Using Bloom Filter ```python from pybloom_live import BloomFilter bloom = BloomFilter(capacity=1000, error_rate=0.01) bloom.add("hello") print("hello" in bloom) # Output: True ``` ### 4.2 Caching - Store results of equivalency checks to avoid recomputation. - Use LRU (Least Recently Used) or LFU (Least Frequently Used) caches. #### Example: Caching Results with `functools.lru_cache` ```python from functools import lru_cache @lru_cache(maxsize=1000) def is_equivalent(a, b): return sorted(a) == sorted(b) print(is_equivalent("listen", "silent")) # Output: True ``` --- ## 5. **Challenges in Equivalency Lookups** 1. **Scalability**: - Large datasets require efficient data structures and algorithms. - Use distributed systems or approximate methods for very large inputs. 2. **Precision vs. Performance**: - Approximate methods (e.g., Bloom filters) trade off precision for speed. 3. **Ambiguity**: - Defining equivalence relations can be complex for real-world data (e.g., synonym matching may depend on context). 4. **Data Quality**: - Inconsistent or noisy data can lead to false equivalences. --- ## 6. **Summary** Equivalency lookups are essential across fields like data processing, graph theory, and distributed systems. Techniques like hashing, union-find, canonicalization, and Bloom filters provide robust solutions depending on the use case. By balancing accuracy and performance, equivalency lookups can be optimized for scalability and reliability in real-world applications. """##### let stringRange = NSRange(location: 0, length: testString.utf16.count) let substitutionString = #"$$\1$$"# let result = regex.stringByReplacingMatches(in: testString, range: stringRange, withTemplate: substitutionString) print(result)

Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for Swift 5.2, please visit: https://developer.apple.com/documentation/foundation/nsregularexpression