Regular Expressions 101

Common Tokens
  • A single character of: a, b or c
    [abc]
  • A character except: a, b or c
    [^abc]
  • A character in the range: a-z
    [a-z]
  • A character not in the range: a-z
    [^a-z]
  • A character in the range: a-z or A-Z
    [a-zA-Z]
  • Any single character except a newline (newlines too with the s flag)
    .
  • Alternation: match either a or b
    a|b
  • Any whitespace character
    \s
  • Any non-whitespace character
    \S
  • Any digit
    \d
  • Any non-digit
    \D
  • Any word character (letter, digit, or underscore)
    \w
  • Any non-word character
    \W
  • Non-capturing group
    (?:...)
  • Capturing group
    (...)
  • Zero or one of a
    a?
  • Zero or more of a
    a*
  • One or more of a
    a+
  • Exactly 3 of a
    a{3}
  • 3 or more of a
    a{3,}
  • Between 3 and 6 of a
    a{3,6}
  • Start of string (start of each line with the m flag)
    ^
  • End of string (end of each line with the m flag)
    $
  • A word boundary
    \b
  • Non-word boundary
    \B
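Several of the common tokens above can be tried out in Java, the language of this page's generated code. This is a minimal sketch: the class TokenDemo and its findAll helper are hypothetical names introduced here, and the flag arguments are java.util.regex.Pattern's MULTILINE and DOTALL constants, which play the role of the m and s flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenDemo {
    // Collects every match of regex in input, like running with the g flag.
    public static List<String> findAll(String regex, int flags, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // [a-z]+ : one or more characters in the range a-z
        System.out.println(findAll("[a-z]+", 0, "abc DEF ghi"));       // [abc, ghi]
        // \d{3} : exactly 3 digits
        System.out.println(findAll("\\d{3}", 0, "12 345 6789"));      // [345, 678]
        // \bcat\b : "cat" only as a whole word, not inside "concat"
        System.out.println(findAll("\\bcat\\b", 0, "cat concat cat")); // [cat, cat]
        // ^\w+ with MULTILINE (m flag): the first word of every line
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, "one two\nthree four")); // [one, three]
        // . skips the newline unless DOTALL (s flag) is set
        System.out.println(findAll("a.b", 0, "a\nb").size());              // 0
        System.out.println(findAll("a.b", Pattern.DOTALL, "a\nb").size()); // 1

        // (?:...) groups without capturing, so only (\w+) gets a group number
        Matcher m = Pattern.compile("(\\w+)@(?:\\w+)\\.com").matcher("mail bob@example.com");
        if (m.find()) {
            System.out.println(m.group(1) + " " + m.groupCount()); // bob 1
        }
    }
}
```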

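Regular Expression
r"\\\[(.*?)\\\]" gms
No Match: the test string contains no \[...\] block.

Test String
The Markdown article "Equivalency Lookups: Concepts, Techniques, and Applications", assigned to string in the generated code below.

Substitution
$$\1$$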
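The pattern \\\[(.*?)\\\] relies on the lazy quantifier *?: with the s flag the body may span lines, but each \[...\] block is still matched on its own instead of one match running from the first \[ to the last \]. A small sketch comparing the two, assuming Java as in the generated code; LazyVsGreedy and countMatches are hypothetical names, and the greedy pattern is only a comparison variant, not part of the original regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // The pattern from the generated code: \[ then a lazy (.*?) body, then \]
    public static final String LAZY = "\\\\\\[(.*?)\\\\\\]";
    // Hypothetical greedy variant, used only for comparison
    public static final String GREEDY = "\\\\\\[(.*)\\\\\\]";

    // Counts matches with DOTALL (the s flag) so a block may span lines.
    public static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two display-math blocks; the second spans three lines.
        String input = "First: \\[ a + b \\]\nSecond: \\[\n  c = d\n\\]";
        System.out.println(countMatches(LAZY, input));   // 2
        System.out.println(countMatches(GREEDY, input)); // 1
    }
}
```

The greedy version swallows the text between the two blocks, which is why the lazy form is the usual choice for delimiter pairs like these.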

Code Generator

Generated Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        // Matches LaTeX display math written as \[ ... \] and captures its body lazily
        final String regex = "\\\\\\[(.*?)\\\\\\]";
        final String string = "# Equivalency Lookups: Concepts, Techniques, and Applications \n\n" + "Equivalency lookups are a common computational task where you determine whether two or more entities are equivalent based on certain criteria. This concept underpins a wide range of problems, from database joins to synonym matching, hash-based comparisons, and distributed system consistency checks. \n\n" + "---\n\n" + "## 1. **What Are Equivalency Lookups?** \n\n" + "### Definition \n" + "An **equivalency lookup** is the process of identifying if two elements belong to the same equivalence class based on a defined equivalence relation $( R )$. \n\n" + "### Properties of Equivalence Relations \n" + "An equivalence relation $( R )$ satisfies three properties:\n" + "1. **Reflexivity**: $( a R a )$ (an element is equivalent to itself). \n" + "2. **Symmetry**: $( a R b \\implies b R a )$ (if $( a )$ is equivalent to $( b )$, then $( b )$ is equivalent to $( a )$). \n" + "3. **Transitivity**: $( a R b )$ and $( b R c \\implies a R c )$ (if $( a )$ is equivalent to $( b )$, and $( b )$ is equivalent to $( c )$, then $( a )$ is equivalent to $( c )$). \n\n" + "### Real-World Examples \n" + "- **Dictionary Synonyms**: Checking if two words (e.g., \"fast\" and \"quick\") are synonyms. \n" + "- **User Identity Matching**: Verifying if two user accounts refer to the same individual. \n" + "- **Canonical Representation**: Mapping equivalent objects to a single representative for efficiency (e.g., hash-based deduplication). \n\n" + "---\n\n" + "## 2. **Techniques for Equivalency Lookups** \n\n" + "### 2.1 Hashing \n" + "- Use **hash functions** to map equivalent objects to the same hash value. \n" + "- Efficient for exact matches (e.g., strings, integers). \n" +
            "- Example: Checking file equivalency via MD5 or SHA-256 hash comparison. \n\n" + "#### Example: Hash-Based Lookup \n" + "```python\n" + "# Equivalency check for strings using hashes\n" + "import hashlib\n\n" + "def get_hash(value):\n" + " return hashlib.md5(value.encode()).hexdigest()\n\n" + "a = \"hello\"\n" + "b = \"hello\"\n\n" + "print(get_hash(a) == get_hash(b)) # Output: True\n" + "```\n\n" + "### 2.2 Union-Find (Disjoint Set) \n" + "- Efficient data structure for handling equivalency relations in dynamic systems. \n" + "- Operations:\n" + " - **Find**: Determine the equivalence class of an element. \n" + " - **Union**: Merge two equivalence classes. \n" + "- Applications: Network connectivity, Kruskal’s algorithm for MST, and clustering. \n\n" + "#### Example: Union-Find Implementation \n" + "```python\n" + "class UnionFind:\n" + " def __init__(self, size):\n" + " self.parent = list(range(size))\n" + " \n" + " def find(self, x):\n" + " if self.parent[x] != x:\n" + " self.parent[x] = self.find(self.parent[x]) # Path compression\n" + " return self.parent[x]\n" + " \n" + " def union(self, x, y):\n" + " rootX = self.find(x)\n" + " rootY = self.find(y)\n" + " if rootX != rootY:\n" + " self.parent[rootX] = rootY\n\n" + "# Usage\n" + "uf = UnionFind(10)\n" + "uf.union(1, 2)\n" + "uf.union(2, 3)\n" + "print(uf.find(1) == uf.find(3)) # Output: True\n" + "```\n\n" + "### 2.3 Canonicalization \n" + "- Transform each object into a canonical form such that equivalent objects are identical. \n" + "- Examples:\n" + " - Sorting strings for anagrams (e.g., \"cat\" and \"tac\" → \"act\"). \n" + " - Reducing fractions to lowest terms. \n\n" + "#### Example: Canonicalizing Anagrams \n" + "```python\n" + "def canonical_form(word):\n" + " return ''.join(sorted(word))\n\n" + "print(canonical_form(\"listen\") == canonical_form(\"silent\")) # Output: True\n" + "```\n\n" + "### 2.4 Database Indexes \n" + "- Use indexes for equivalency lookups in structured data. \n" +
            "- Example: SQL query to find all users with the same email address. \n\n" + "#### Example: SQL Query \n" + "```sql\n" + "SELECT user_id FROM users WHERE email = 'example@example.com';\n" + "```\n\n" + "---\n\n" + "## 3. **Applications of Equivalency Lookups** \n\n" + "### 3.1 Data Deduplication \n" + "- Identifying and removing duplicate records or files. \n" + "- Technique: Hash-based deduplication or clustering similar records. \n\n" + "### 3.2 Graph Connectivity \n" + "- Check if two nodes are in the same connected component. \n" + "- Technique: Union-Find or BFS/DFS. \n\n" + "### 3.3 Synonym Matching \n" + "- Resolve different words or phrases that refer to the same concept. \n" + "- Technique: Canonicalization or synonym dictionaries. \n\n" + "### 3.4 Distributed Systems \n" + "- Ensure consistency by checking if replicas are equivalent. \n" + "- Technique: Compare hash values of data on different servers. \n\n" + "---\n\n" + "## 4. **Optimizations for Large-Scale Lookups** \n\n" + "### 4.1 Bloom Filters \n" + "- Space-efficient data structure for approximate membership testing. \n" + "- Useful for checking if an element might be equivalent to others in a large dataset. \n\n" + "#### Example: Using Bloom Filter \n" + "```python\n" + "from pybloom_live import BloomFilter\n\n" + "bloom = BloomFilter(capacity=1000, error_rate=0.01)\n" + "bloom.add(\"hello\")\n" + "print(\"hello\" in bloom) # Output: True\n" + "```\n\n" + "### 4.2 Caching \n" + "- Store results of equivalency checks to avoid recomputation. \n" + "- Use LRU (Least Recently Used) or LFU (Least Frequently Used) caches. \n\n" + "#### Example: Caching Results with `functools.lru_cache` \n" + "```python\n" + "from functools import lru_cache\n\n" + "@lru_cache(maxsize=1000)\n" + "def is_equivalent(a, b):\n" + " return sorted(a) == sorted(b)\n\n" + "print(is_equivalent(\"listen\", \"silent\")) # Output: True\n" + "```\n\n" + "---\n\n" + "## 5. **Challenges in Equivalency Lookups** \n\n" +
            "1. **Scalability**: \n" + " - Large datasets require efficient data structures and algorithms. \n" + " - Use distributed systems or approximate methods for very large inputs. \n\n" + "2. **Precision vs. Performance**: \n" + " - Approximate methods (e.g., Bloom filters) trade off precision for speed. \n\n" + "3. **Ambiguity**: \n" + " - Defining equivalence relations can be complex for real-world data (e.g., synonym matching may depend on context). \n\n" + "4. **Data Quality**: \n" + " - Inconsistent or noisy data can lead to false equivalences. \n\n" + "---\n\n" + "## 6. **Summary** \n" + "Equivalency lookups are essential across fields like data processing, graph theory, and distributed systems. Techniques like hashing, union-find, canonicalization, and Bloom filters provide robust solutions depending on the use case. By balancing accuracy and performance, equivalency lookups can be optimized for scalability and reliability in real-world applications. ";
        // Java replacement strings treat '$' as a group reference, so each literal
        // dollar sign is escaped as \$ and the captured body is inserted with $1
        final String subst = "\\$\\$$1\\$\\$";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);
        final Matcher matcher = pattern.matcher(string);

        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);

        System.out.println("Substitution result: " + result);
    }
}
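In java.util.regex a replacement string treats $ as a group reference and a backslash as an escape, so the PCRE-style substitution $$\1$$ does not carry over: Matcher.replaceAll throws IllegalArgumentException ("Illegal group reference") as soon as a match is found, and \1 on its own would only produce a literal 1. With this page's test string there is no match, so the problem stays hidden. The sketch below assumes the same pattern and flags; DisplayMathSubstitution and its methods are hypothetical names introduced for illustration.

```java
import java.util.regex.Pattern;

public class DisplayMathSubstitution {
    // Same pattern and flags as the generated code: \[ ... \] with a lazy body.
    public static final Pattern DISPLAY_MATH =
            Pattern.compile("\\\\\\[(.*?)\\\\\\]", Pattern.MULTILINE | Pattern.DOTALL);

    // "\\$" in Java source is \$ in the replacement: a literal dollar sign.
    // "$1" inserts capture group 1, the text between the delimiters.
    public static String toDollarMath(String input) {
        return DISPLAY_MATH.matcher(input).replaceAll("\\$\\$$1\\$\\$");
    }

    // Reports whether the PCRE-style replacement $$\1$$ is rejected by Java.
    public static boolean pcreStyleReplacementFails(String input) {
        try {
            DISPLAY_MATH.matcher(input).replaceAll("$$\\1$$");
            return false; // no match, so the replacement was never parsed
        } catch (IllegalArgumentException e) {
            return true; // "$$" is parsed as an illegal group reference
        }
    }

    public static void main(String[] args) {
        String input = "Area: \\[ A = \\pi r^2 \\] done.";
        System.out.println(toDollarMath(input));              // Area: $$ A = \pi r^2 $$ done.
        System.out.println(pcreStyleReplacementFails(input)); // true
    }
}
```

As an alternative to hand-written escapes, Matcher.quoteReplacement("$$") returns the escaped literal \$\$, which can be concatenated around "$1".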

Please keep in mind that these code samples are automatically generated and are not guaranteed to work. For a full regex reference for Java, please visit: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html