Regular Expressions 101

Community Library Entry

Regular Expression
PCRE2 (PHP >=7.3)

/
(?i)(?=(\d)\d\d([\s\-\x{2013}\x{2014}\/]))(?<![\d\-\x{2013}\x{2014}%])(?!000|666|9\d\d|\1{3}.\1{2}.\1{4}?|012.34.5678|123.45.6789|234.56.7890|098.76.5432|876.54.3210|078.05.1120|219.09.9999)\d{3}\2(?!00)\d{2}\2(?!0000)\d{4}(?![\d\-\x{2013}\x{2014}%])(?!\.(?:pdf|docx?|xlsx?|pptx?|zip|jpe?g|png|txt|log)\b)
/
gm

Description

Developed for use with Microsoft Purview DLP, which uses the PCRE-compatible "Boost.Regex" engine for pattern matching.

Matches formatted US Social Security number patterns (i.e. nine digits with separators):

  • Allow whitespace \s (which covers space and tab), dash -, en dash, em dash, or slash / as separators.
    (Both separators within an SSN must be the same character.)
  • Exclude all-zero area, group, or serial segment sequences:
    000-XX-XXXX, XXX-00-XXXX, or XXX-XX-0000
  • Exclude area numbers 666 and 9##:
    666-XX-XXXX or 9XX-XX-XXXX
  • Exclude ascending and descending number sequences, and numbers made of a single repeated digit:
    123-45-6789, 876-54-3210, 111-11-1111, etc.
  • Exclude known retired SSNs:
    078-05-1120 and 219-09-9999
  • Apply boundary checks (no digit, dash, or percent sign directly before or after the number) to prevent matching on telephone, credit card, and other non-SSN types.
  • Exclude sequences ending with common file extensions:
    .pdf, .doc(x), .xls(x), .ppt(x), .zip, .jp(e)g, .png, .txt, and .log
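
The rules above can be tried outside of Purview with a rough Python port of the pattern. This is only a sketch under stated assumptions: Python's `re` module is not Boost.Regex or PCRE2, so the `\x{2013}` and `\x{2014}` escapes are rewritten here as `\u2013` and `\u2014`, and the helper name `find_ssns` and the sample numbers are made up for illustration. Behavior in Purview itself may differ.

```python
import re

# Assumed Python port of the PCRE2 pattern: only the \x{...} escapes were
# changed to \u.... Each fragment is commented with the rule it implements.
SSN_PATTERN = re.compile(
    r"(?i)"
    # Look ahead: capture the first digit (group 1) and the separator (group 2).
    r"(?=(\d)\d\d([\s\-\u2013\u2014\/]))"
    # Left boundary: not preceded by a digit, dash, or percent sign.
    r"(?<![\d\-\u2013\u2014%])"
    # Reject bad area numbers, repeated digits, sequences, and retired SSNs.
    r"(?!000|666|9\d\d|\1{3}.\1{2}.\1{4}?|012.34.5678|123.45.6789"
    r"|234.56.7890|098.76.5432|876.54.3210|078.05.1120|219.09.9999)"
    # Area, group, and serial; \2 forces the same separator both times.
    r"\d{3}\2(?!00)\d{2}\2(?!0000)\d{4}"
    # Right boundary: not followed by a digit, dash, or percent sign.
    r"(?![\d\-\u2013\u2014%])"
    # Not followed by a common file extension.
    r"(?!\.(?:pdf|docx?|xlsx?|pptx?|zip|jpe?g|png|txt|log)\b)",
    re.MULTILINE,
)


def find_ssns(text: str) -> list[str]:
    """Return every substring of `text` that the pattern accepts (hypothetical helper)."""
    return [m.group(0) for m in SSN_PATTERN.finditer(text)]


if __name__ == "__main__":
    samples = [
        "SSN: 501-23-4567",        # consistent separators: accepted
        "501-23 4567",             # mixed separators: rejected
        "666-12-3456",             # excluded area number: rejected
        "report_501-23-4567.pdf",  # file name: rejected
    ]
    for sample in samples:
        print(repr(sample), "->", find_ssns(sample))
```

Capturing the separator once in the leading lookahead and reusing it through `\2` is what enforces the same-separator rule; the surrounding lookarounds only inspect neighboring characters and never become part of the match.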

Derived from: Comprehensive US SSN (Social Security Number). Also uses patterns from the "U.S. Social Security Number (SSN) (Nucleuz Inc)" Sensitive Information Type included in the Microsoft Purview DLP tool.

Submitted by J. Greg Mackinnon - 2 months ago