Regular Expressions 101

Common Tokens

  • A single character of: a, b or c
    [abc]
  • A character except: a, b or c
    [^abc]
  • A character in the range: a-z
    [a-z]
  • A character not in the range: a-z
    [^a-z]
  • A character in the range: a-z or A-Z
    [a-zA-Z]
  • Any single character (except a line break, unless the s flag is set)
    .
  • Alternation: match either a or b
    a|b
  • Any whitespace character
    \s
  • Any non-whitespace character
    \S
  • Any digit
    \d
  • Any non-digit
    \D
  • Any word character (letter, digit or underscore)
    \w
  • Any non-word character
    \W
  • Non-capturing group
    (?:...)
  • Capturing group
    (...)
  • Zero or one of a
    a?
  • Zero or more of a
    a*
  • One or more of a
    a+
  • Exactly 3 of a
    a{3}
  • 3 or more of a
    a{3,}
  • Between 3 and 6 of a
    a{3,6}
  • Start of string (or of each line when the m flag is set)
    ^
  • End of string (or of each line when the m flag is set)
    $
  • A word boundary
    \b
  • Non-word boundary
    \B
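
The tokens above can be combined into working patterns. The following is a minimal sketch using Python's standard `re` module; the sample sentence and the variable names are made up for illustration and are not part of the page.

```python
import re

# Hypothetical sample text, chosen only to exercise the tokens listed above.
text = "Order 66 shipped; order 1024 is pending."

# \d+ : one or more digits
numbers = re.findall(r"\d+", text)

# \b[a-zA-Z]{5,7}\b : whole words made of 5 to 7 ASCII letters
mid_words = re.findall(r"\b[a-zA-Z]{5,7}\b", text)

# (?:O|o)rder (\d+) : non-capturing alternation followed by one capturing group;
# findall returns only the captured group
order_ids = re.findall(r"(?:O|o)rder (\d+)", text)

# ^ and $ : without re.MULTILINE they anchor to the ends of the whole string
starts_with_order = re.search(r"^Order", text) is not None
ends_with_period = re.search(r"\.$", text) is not None

# [^a-z\s] : any character that is neither a lowercase letter nor whitespace
others = re.findall(r"[^a-z\s]", text)

print(numbers)    # ['66', '1024']
print(mid_words)  # ['Order', 'shipped', 'order', 'pending']
print(order_ids)  # ['66', '1024']
print(others)     # ['O', '6', '6', ';', '1', '0', '2', '4', '.']
```

Other flavors may treat `\w`, `\d` and `\b` differently with respect to Unicode, so results can vary outside Python.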

Regular Expression

r"\\\[(.*?)\\\]" with flags gms (no match in the test string)

Substitution

$$\1$$

Generated Code (AutoIt)

#include <MsgBoxConstants.au3> ; to declare the Constants of MsgBox

Local $sRegex = "(?ms)\\\[(.*?)\\\]"

; The test string is built in several chunks so that no single continued
; statement risks exceeding AutoIt's maximum line length.
; Double quotes inside an AutoIt string literal are written as "".
Local $sString = "# Equivalency Lookups: Concepts, Techniques, and Applications " & @CRLF & _
        "" & @CRLF & _
        "Equivalency lookups are a common computational task where you determine whether two or more entities are equivalent based on certain criteria. This concept underpins a wide range of problems, from database joins to synonym matching, hash-based comparisons, and distributed system consistency checks. " & @CRLF & _
        "" & @CRLF & _
        "---" & @CRLF & _
        "" & @CRLF & _
        "## 1. **What Are Equivalency Lookups?** " & @CRLF & _
        "" & @CRLF & _
        "### Definition " & @CRLF & _
        "An **equivalency lookup** is the process of identifying if two elements belong to the same equivalence class based on a defined equivalence relation $( R )$. " & @CRLF & _
        "" & @CRLF & _
        "### Properties of Equivalence Relations " & @CRLF & _
        "An equivalence relation $( R )$ satisfies three properties:" & @CRLF & _
        "1. **Reflexivity**: $( a R a )$ (an element is equivalent to itself). " & @CRLF & _
        "2. **Symmetry**: $( a R b \implies b R a )$ (if $( a )$ is equivalent to $( b )$, then $( b )$ is equivalent to $( a )$). " & @CRLF & _
        "3. **Transitivity**: $( a R b )$ and $( b R c \implies a R c )$ (if $( a )$ is equivalent to $( b )$, and $( b )$ is equivalent to $( c )$, then $( a )$ is equivalent to $( c )$). " & @CRLF & _
        "" & @CRLF & _
        "### Real-World Examples " & @CRLF & _
        "- **Dictionary Synonyms**: Checking if two words (e.g., ""fast"" and ""quick"") are synonyms. " & @CRLF & _
        "- **User Identity Matching**: Verifying if two user accounts refer to the same individual. " & @CRLF & _
        "- **Canonical Representation**: Mapping equivalent objects to a single representative for efficiency (e.g., hash-based deduplication). " & @CRLF & _
        "" & @CRLF & _
        "---" & @CRLF & _
        "" & @CRLF

$sString &= "## 2. **Techniques for Equivalency Lookups** " & @CRLF & _
        "" & @CRLF & _
        "### 2.1 Hashing " & @CRLF & _
        "- Use **hash functions** to map equivalent objects to the same hash value. " & @CRLF & _
        "- Efficient for exact matches (e.g., strings, integers). " & @CRLF & _
        "- Example: Checking file equivalency via MD5 or SHA-256 hash comparison. " & @CRLF & _
        "" & @CRLF & _
        "#### Example: Hash-Based Lookup " & @CRLF & _
        "```python" & @CRLF & _
        "# Equivalency check for strings using hashes" & @CRLF & _
        "import hashlib" & @CRLF & _
        "" & @CRLF & _
        "def get_hash(value):" & @CRLF & _
        "    return hashlib.md5(value.encode()).hexdigest()" & @CRLF & _
        "" & @CRLF & _
        "a = ""hello""" & @CRLF & _
        "b = ""hello""" & @CRLF & _
        "" & @CRLF & _
        "print(get_hash(a) == get_hash(b)) # Output: True" & @CRLF & _
        "```" & @CRLF & _
        "" & @CRLF & _
        "### 2.2 Union-Find (Disjoint Set) " & @CRLF & _
        "- Efficient data structure for handling equivalency relations in dynamic systems. " & @CRLF & _
        "- Operations:" & @CRLF & _
        "  - **Find**: Determine the equivalence class of an element. " & @CRLF & _
        "  - **Union**: Merge two equivalence classes. " & @CRLF & _
        "- Applications: Network connectivity, Kruskal’s algorithm for MST, and clustering. " & @CRLF & _
        "" & @CRLF

$sString &= "#### Example: Union-Find Implementation " & @CRLF & _
        "```python" & @CRLF & _
        "class UnionFind:" & @CRLF & _
        "    def __init__(self, size):" & @CRLF & _
        "        self.parent = list(range(size))" & @CRLF & _
        "    " & @CRLF & _
        "    def find(self, x):" & @CRLF & _
        "        if self.parent[x] != x:" & @CRLF & _
        "            self.parent[x] = self.find(self.parent[x]) # Path compression" & @CRLF & _
        "        return self.parent[x]" & @CRLF & _
        "    " & @CRLF & _
        "    def union(self, x, y):" & @CRLF & _
        "        rootX = self.find(x)" & @CRLF & _
        "        rootY = self.find(y)" & @CRLF & _
        "        if rootX != rootY:" & @CRLF & _
        "            self.parent[rootX] = rootY" & @CRLF & _
        "" & @CRLF & _
        "# Usage" & @CRLF & _
        "uf = UnionFind(10)" & @CRLF & _
        "uf.union(1, 2)" & @CRLF & _
        "uf.union(2, 3)" & @CRLF & _
        "print(uf.find(1) == uf.find(3)) # Output: True" & @CRLF & _
        "```" & @CRLF & _
        "" & @CRLF & _
        "### 2.3 Canonicalization " & @CRLF & _
        "- Transform each object into a canonical form such that equivalent objects are identical. " & @CRLF & _
        "- Examples:" & @CRLF & _
        "  - Sorting strings for anagrams (e.g., ""cat"" and ""tac"" → ""act""). " & @CRLF & _
        "  - Reducing fractions to lowest terms. " & @CRLF & _
        "" & @CRLF & _
        "#### Example: Canonicalizing Anagrams " & @CRLF & _
        "```python" & @CRLF & _
        "def canonical_form(word):" & @CRLF & _
        "    return ''.join(sorted(word))" & @CRLF & _
        "" & @CRLF & _
        "print(canonical_form(""listen"") == canonical_form(""silent"")) # Output: True" & @CRLF & _
        "```" & @CRLF & _
        "" & @CRLF & _
        "### 2.4 Database Indexes " & @CRLF & _
        "- Use indexes for equivalency lookups in structured data. " & @CRLF & _
        "- Example: SQL query to find all users with the same email address. " & @CRLF & _
        "" & @CRLF & _
        "#### Example: SQL Query " & @CRLF & _
        "```sql" & @CRLF & _
        "SELECT user_id FROM users WHERE email = 'example@example.com';" & @CRLF & _
        "```" & @CRLF & _
        "" & @CRLF & _
        "---" & @CRLF & _
        "" & @CRLF

$sString &= "## 3. **Applications of Equivalency Lookups** " & @CRLF & _
        "" & @CRLF & _
        "### 3.1 Data Deduplication " & @CRLF & _
        "- Identifying and removing duplicate records or files. " & @CRLF & _
        "- Technique: Hash-based deduplication or clustering similar records. " & @CRLF & _
        "" & @CRLF & _
        "### 3.2 Graph Connectivity " & @CRLF & _
        "- Check if two nodes are in the same connected component. " & @CRLF & _
        "- Technique: Union-Find or BFS/DFS. " & @CRLF & _
        "" & @CRLF & _
        "### 3.3 Synonym Matching " & @CRLF & _
        "- Resolve different words or phrases that refer to the same concept. " & @CRLF & _
        "- Technique: Canonicalization or synonym dictionaries. " & @CRLF & _
        "" & @CRLF & _
        "### 3.4 Distributed Systems " & @CRLF & _
        "- Ensure consistency by checking if replicas are equivalent. " & @CRLF & _
        "- Technique: Compare hash values of data on different servers. " & @CRLF & _
        "" & @CRLF & _
        "---" & @CRLF & _
        "" & @CRLF & _
        "## 4. **Optimizations for Large-Scale Lookups** " & @CRLF & _
        "" & @CRLF & _
        "### 4.1 Bloom Filters " & @CRLF & _
        "- Space-efficient data structure for approximate membership testing. " & @CRLF & _
        "- Useful for checking if an element might be equivalent to others in a large dataset. " & @CRLF & _
        "" & @CRLF & _
        "#### Example: Using Bloom Filter " & @CRLF & _
        "```python" & @CRLF & _
        "from pybloom_live import BloomFilter" & @CRLF & _
        "" & @CRLF & _
        "bloom = BloomFilter(capacity=1000, error_rate=0.01)" & @CRLF & _
        "bloom.add(""hello"")" & @CRLF & _
        "print(""hello"" in bloom) # Output: True" & @CRLF & _
        "```" & @CRLF & _
        "" & @CRLF & _
        "### 4.2 Caching " & @CRLF & _
        "- Store results of equivalency checks to avoid recomputation. " & @CRLF & _
        "- Use LRU (Least Recently Used) or LFU (Least Frequently Used) caches. " & @CRLF & _
        "" & @CRLF

$sString &= "#### Example: Caching Results with `functools.lru_cache` " & @CRLF & _
        "```python" & @CRLF & _
        "from functools import lru_cache" & @CRLF & _
        "" & @CRLF & _
        "@lru_cache(maxsize=1000)" & @CRLF & _
        "def is_equivalent(a, b):" & @CRLF & _
        "    return sorted(a) == sorted(b)" & @CRLF & _
        "" & @CRLF & _
        "print(is_equivalent(""listen"", ""silent"")) # Output: True" & @CRLF & _
        "```" & @CRLF & _
        "" & @CRLF & _
        "---" & @CRLF & _
        "" & @CRLF & _
        "## 5. **Challenges in Equivalency Lookups** " & @CRLF & _
        "" & @CRLF & _
        "1. **Scalability**: " & @CRLF & _
        "   - Large datasets require efficient data structures and algorithms. " & @CRLF & _
        "   - Use distributed systems or approximate methods for very large inputs. " & @CRLF & _
        "" & @CRLF & _
        "2. **Precision vs. Performance**: " & @CRLF & _
        "   - Approximate methods (e.g., Bloom filters) trade off precision for speed. " & @CRLF & _
        "" & @CRLF & _
        "3. **Ambiguity**: " & @CRLF & _
        "   - Defining equivalence relations can be complex for real-world data (e.g., synonym matching may depend on context). " & @CRLF & _
        "" & @CRLF & _
        "4. **Data Quality**: " & @CRLF & _
        "   - Inconsistent or noisy data can lead to false equivalences. " & @CRLF & _
        "" & @CRLF & _
        "---" & @CRLF & _
        "" & @CRLF & _
        "## 6. **Summary** " & @CRLF & _
        "Equivalency lookups are essential across fields like data processing, graph theory, and distributed systems. Techniques like hashing, union-find, canonicalization, and Bloom filters provide robust solutions depending on the use case. By balancing accuracy and performance, equivalency lookups can be optimized for scalability and reliability in real-world applications. "

Local $sSubst = "$$\1$$"

; Replace every \[ ... \] span with $$ ... $$ and show the result.
Local $sResult = StringRegExpReplace($sString, $sRegex, $sSubst)
MsgBox($MB_SYSTEMMODAL, "Result", $sResult)

Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for AutoIt, please visit: https://www.autoitscript.com/autoit3/docs/functions/StringRegExp.htm
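
Since the selected flavor uses Python's `r"…"` delimiters, the same substitution can be sketched in Python. This is an illustrative equivalent, not output of the code generator; the function name `bracket_math_to_dollars` and the sample text are hypothetical. The pattern matches a literal `\[`, lazily captures everything up to the next literal `\]`, and rewraps the capture in `$$`.

```python
import re

# \\ matches one literal backslash, \[ and \] match literal brackets.
# re.DOTALL (the "s" flag) lets . cross line breaks, so multi-line math blocks match.
# re.MULTILINE (the "m" flag) is kept for parity with the page, although this
# pattern contains no ^ or $ for it to affect.
PATTERN = re.compile(r"\\\[(.*?)\\\]", re.MULTILINE | re.DOTALL)


def bracket_math_to_dollars(markdown: str) -> str:
    """Rewrite \\[ ... \\] display-math delimiters as $$ ... $$ (hypothetical helper)."""
    # In a Python replacement template, $ is literal and \1 is the first group.
    return PATTERN.sub(r"$$\1$$", markdown)


# Hypothetical sample containing one multi-line block and one single-line block.
sample = "Energy:\n\\[\nE = mc^2\n\\]\nand also \\[a+b\\]."
converted, count = PATTERN.subn(r"$$\1$$", sample)

print(converted)
print(count)  # 2
```

The lazy quantifier `.*?` matters here: a greedy `.*` with the `s` flag would swallow everything from the first `\[` to the last `\]` and merge the two blocks into one. A string with no `\[ … \]` span, such as the test string on this page, comes back unchanged, which is consistent with the "no match" result.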