Regular Expressions 101

  • A single character of: a, b or c
    [abc]
  • A character except: a, b or c
    [^abc]
  • A character in the range: a-z
    [a-z]
  • A character not in the range: a-z
    [^a-z]
  • A character in the range: a-z or A-Z
    [a-zA-Z]
  • Any single character
    .
  • Alternate - match either a or b
    a|b
  • Any whitespace character
    \s
  • Any non-whitespace character
    \S
  • Any digit
    \d
  • Any non-digit
    \D
  • Any word character
    \w
  • Any non-word character
    \W
  • Non-capturing group
    (?:...)
  • Capturing group
    (...)
  • Zero or one of a
    a?
  • Zero or more of a
    a*
  • One or more of a
    a+
  • Exactly 3 of a
    a{3}
  • 3 or more of a
    a{3,}
  • Between 3 and 6 of a
    a{3,6}
  • Start of string
    ^
  • End of string
    $
  • A word boundary
    \b
  • Non-word boundary
    \B
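
A few of the tokens above can be tried out directly. The following is a minimal sketch in Python's `re` module; the sample strings and the `matches` helper are made up for illustration and do not come from the page.

```python
import re

# Hypothetical sample inputs, chosen only to illustrate the tokens listed above.
SAMPLES = [
    (r"[abc]", "cabbage"),               # a single character of: a, b or c
    (r"[^abc]", "cabbage"),              # a character except: a, b or c
    (r"\d", "a1b22"),                    # any digit
    (r"a{3,6}", "aa aaa aaaaaaa"),       # between 3 and 6 of a
    (r"\bcat\b", "cat concat cats cat"), # word boundaries on both sides
    (r"(?:ab)+", "ababab ab"),           # non-capturing group, one or more times
]


def matches(pattern, text, flags=0):
    """Return the text of every non-overlapping match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


for pattern, text in SAMPLES:
    print(f"{pattern!r:12} on {text!r}: {matches(pattern, text)}")

# With the multiline flag, ^ also matches at the start of each line.
print(matches(r"^\w+", "one two\nthree four", re.MULTILINE))
```

Other flavors in the list may differ in detail (for example in what `\w` or `\d` cover for non-ASCII input), so results should be checked against the flavor actually in use.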

Generated Code

#include <StringConstants.au3> ; to declare the Constants of StringRegExp
#include <Array.au3> ; UDF needed for _ArrayDisplay and _ArrayConcatenate

Local $sRegex = "(?m)%(?P<flag>\#|\+|\-| |0)?((?P<width>[1-9])\.(?P<precision>[1-9])|(?P<widthDefaultPrecision>[1-9])|(?P<widthZeroPrecison>[1-9])\.|\.(?P<precisionDefaultWidth>[1-9]))?(?P<verb>\w{1,9})"

; Test string. Double quotes inside an AutoIt string literal are written as "".
Local $sString = "// Copyright 2009 The Go Authors. All rights reserved." & @CRLF & _
"// Use of this source code is governed by a BSD-style" & @CRLF & _
"// license that can be found in the LICENSE file." & @CRLF & _
"" & @CRLF & _
"/*" & @CRLF & _
" Package fmt implements formatted I/O with functions analogous" & @CRLF & _
" to C's printf and scanf. The format 'verbs' are derived from C's but" & @CRLF & _
" are simpler." & @CRLF & _
"" & @CRLF & _
"" & @CRLF & _
" Printing" & @CRLF & _
"" & @CRLF & _
" The verbs:" & @CRLF & _
"" & @CRLF & _
" General:" & @CRLF & _
" %v the value in a default format" & @CRLF & _
" when printing structs, the plus flag (%+v) adds field names" & @CRLF & _
" %#v a Go-syntax representation of the value" & @CRLF & _
" %T a Go-syntax representation of the type of the value" & @CRLF & _
" %% a literal percent sign; consumes no value" & @CRLF & _
"" & @CRLF & _
" Boolean:" & @CRLF & _
" %t the word true or false" & @CRLF & _
" Integer:" & @CRLF & _
" %b base 2" & @CRLF & _
" %c the character represented by the corresponding Unicode code point" & @CRLF & _
" %d base 10" & @CRLF & _
" %o base 8" & @CRLF & _
" %O base 8 with 0o prefix" & @CRLF & _
" %q a single-quoted character literal safely escaped with Go syntax." & @CRLF & _
" %x base 16, with lower-case letters for a-f" & @CRLF & _
" %X base 16, with upper-case letters for A-F" & @CRLF & _
" %U Unicode format: U+1234; same as ""U+%04X""" & @CRLF & _
" Floating-point and complex constituents:" & @CRLF & _
" %b decimalless scientific notation with exponent a power of two," & @CRLF & _
" in the manner of strconv.FormatFloat with the 'b' format," & @CRLF & _
" e.g. -123456p-78" & @CRLF & _
" %e scientific notation, e.g. -1.234456e+78" & @CRLF & _
" %E scientific notation, e.g. -1.234456E+78" & @CRLF & _
" %f decimal point but no exponent, e.g. 123.456" & @CRLF & _
" %F synonym for %f" & @CRLF & _
" %g %e for large exponents, %f otherwise. Precision is discussed below." & @CRLF & _
" %G %E for large exponents, %F otherwise" & @CRLF & _
" %x hexadecimal notation (with decimal power of two exponent), e.g. -0x1.23abcp+20" & @CRLF & _
" %X upper-case hexadecimal notation, e.g. -0X1.23ABCP+20" & @CRLF & _
" String and slice of bytes (treated equivalently with these verbs):" & @CRLF & _
" %s the uninterpreted bytes of the string or slice" & @CRLF & _
" %q a double-quoted string safely escaped with Go syntax" & @CRLF & _
" %x base 16, lower-case, two characters per byte" & @CRLF & _
" %X base 16, upper-case, two characters per byte" & @CRLF & _
" Slice:" & @CRLF & _
" %p address of 0th element in base 16 notation, with leading 0x" & @CRLF & _
" Pointer:" & @CRLF & _
" %p base 16 notation, with leading 0x" & @CRLF & _
" The %b, %d, %o, %x and %X verbs also work with pointers," & @CRLF & _
" formatting the value exactly as if it were an integer." & @CRLF & _
"" & @CRLF & _
" The default format for %v is:" & @CRLF & _
" bool: %t" & @CRLF & _
" int, int8 etc.: %d" & @CRLF & _
" uint, uint8 etc.: %d, %#x if printed with %#v" & @CRLF & _
" float32, complex64, etc: %g" & @CRLF & _
" string: %s" & @CRLF & _
" chan: %p" & @CRLF & _
" pointer: %p" & @CRLF & _
" For compound objects, the elements are printed using these rules, recursively," & @CRLF & _
" laid out like this:" & @CRLF & _
" struct: {field0 field1 ...}" & @CRLF & _
" array, slice: [elem0 elem1 ...]" & @CRLF & _
" maps: map[key1:value1 key2:value2 ...]" & @CRLF & _
" pointer to above: &{}, &[], &map[]" & @CRLF & _
"" & @CRLF & _
" Width is specified by an optional decimal number immediately preceding the verb." & @CRLF & _
" If absent, the width is whatever is necessary to represent the value." & @CRLF & _
" Precision is specified after the (optional) width by a period followed by a" & @CRLF & _
" decimal number. If no period is present, a default precision is used." & @CRLF & _
" A period with no following number specifies a precision of zero." & @CRLF & _
" Examples:" & @CRLF & _
" %f default width, default precision" & @CRLF & _
" %9f width 9, default precision" & @CRLF & _
" %.2f default width, precision 2" & @CRLF & _
" %9.2f width 9, precision 2" & @CRLF & _
" %9.f width 9, precision 0" & @CRLF & _
"" & @CRLF & _
" Width and precision are measured in units of Unicode code points," & @CRLF & _
" that is, runes. (This differs from C's printf where the" & @CRLF & _
" units are always measured in bytes.) Either or both of the flags" & @CRLF & _
" may be replaced with the character '*', causing their values to be" & @CRLF & _
" obtained from the next operand (preceding the one to format)," & @CRLF & _
" which must be of type int." & @CRLF & _
"" & @CRLF & _
" For most values, width is the minimum number of runes to output," & @CRLF & _
" padding the formatted form with spaces if necessary." & @CRLF & _
"" & @CRLF & _
" For strings, byte slices and byte arrays, however, precision" & @CRLF & _
" limits the length of the input to be formatted (not the size of" & @CRLF & _
" the output), truncating if necessary. Normally it is measured in" & @CRLF & _
" runes, but for these types when formatted with the %x or %X format" & @CRLF & _
" it is measured in bytes." & @CRLF & _
"" & @CRLF & _
" For floating-point values, width sets the minimum width of the field and" & @CRLF & _
" precision sets the number of places after the decimal, if appropriate," & @CRLF & _
" except that for %g/%G precision sets the maximum number of significant" & @CRLF & _
" digits (trailing zeros are removed). For example, given 12.345 the format" & @CRLF & _
" %6.3f prints 12.345 while %.3g prints 12.3. The default precision for %e, %f" & @CRLF & _
" and %#g is 6; for %g it is the smallest number of digits necessary to identify" & @CRLF & _
" the value uniquely." & @CRLF & _
"" & @CRLF & _
" For complex numbers, the width and precision apply to the two" & @CRLF & _
" components independently and the result is parenthesized, so %f applied" & @CRLF & _
" to 1.2+3.4i produces (1.200000+3.400000i)." & @CRLF & _
"" & @CRLF & _
" Other flags:" & @CRLF & _
" + always print a sign for numeric values;" & @CRLF & _
" guarantee ASCII-only output for %q (%+q)" & @CRLF & _
" - pad with spaces on the right rather than the left (left-justify the field)" & @CRLF & _
" # alternate format: add leading 0b for binary (%#b), 0 for octal (%#o)," & @CRLF & _
" 0x or 0X for hex (%#x or %#X); suppress 0x for %p (%#p);" & @CRLF & _
" for %q, print a raw (backquoted) string if strconv.CanBackquote" & @CRLF & _
" returns true;" & @CRLF & _
" always print a decimal point for %e, %E, %f, %F, %g and %G;" & @CRLF & _
" do not remove trailing zeros for %g and %G;" & @CRLF & _
" write e.g. U+0078 'x' if the character is printable for %U (%#U)." & @CRLF & _
" ' ' (space) leave a space for elided sign in numbers (% d);" & @CRLF & _
" put spaces between bytes printing strings or slices in hex (% x, % X)" & @CRLF & _
" 0 pad with leading zeros rather than spaces;" & @CRLF & _
" for numbers, this moves the padding after the sign" & @CRLF & _
"" & @CRLF & _
" Flags are ignored by verbs that do not expect them." & @CRLF & _
" For example there is no alternate decimal format, so %#d and %d" & @CRLF & _
" behave identically." & @CRLF & _
"" & @CRLF & _
" For each Printf-like function, there is also a Print function" & @CRLF & _
" that takes no format and is equivalent to saying %v for every" & @CRLF & _
" operand. Another variant Println inserts blanks between" & @CRLF & _
" operands and appends a newline." & @CRLF & _
"" & @CRLF & _
" Regardless of the verb, if an operand is an interface value," & @CRLF & _
" the internal concrete value is used, not the interface itself." & @CRLF & _
" Thus:" & @CRLF & _
" var i interface{} = 23" & @CRLF & _
" fmt.Printf(""%v\n"", i)" & @CRLF & _
" will print 23." & @CRLF & _
"" & @CRLF & _
" Except when printed using the verbs %T and %p, special" & @CRLF & _
" formatting considerations apply for operands that implement" & @CRLF & _
" certain interfaces. In order of application:" & @CRLF & _
"" & @CRLF & _
" 1. If the operand is a reflect.Value, the operand is replaced by the" & @CRLF & _
" concrete value that it holds, and printing continues with the next rule." & @CRLF & _
"" & @CRLF & _
" 2. If an operand implements the Formatter interface, it will" & @CRLF & _
" be invoked. Formatter provides fine control of formatting." & @CRLF & _
"" & @CRLF & _
" 3. If the %v verb is used with the # flag (%#v) and the operand" & @CRLF & _
" implements the GoStringer interface, that will be invoked." & @CRLF & _
"" & @CRLF & _
" If the format (which is implicitly %v for Println etc.) is valid" & @CRLF & _
" for a string (%s %q %v %x %X), the following two rules apply:" & @CRLF & _
"" & @CRLF & _
" 4. If an operand implements the error interface, the Error method" & @CRLF & _
" will be invoked to convert the object to a string, which will then" & @CRLF & _
" be formatted as required by the verb (if any)." & @CRLF & _
"" & @CRLF & _
" 5. If an operand implements method String() string, that method" & @CRLF & _
" will be invoked to convert the object to a string, which will then" & @CRLF & _
" be formatted as required by the verb (if any)." & @CRLF & _
"" & @CRLF & _
" For compound operands such as slices and structs, the format" & @CRLF & _
" applies to the elements of each operand, recursively, not to the" & @CRLF & _
" operand as a whole. Thus %q will quote each element of a slice" & @CRLF & _
" of strings, and %6.2f will control formatting for each element" & @CRLF & _
" of a floating-point array." & @CRLF & _
"" & @CRLF & _
" However, when printing a byte slice with a string-like verb" & @CRLF & _
" (%s %q %x %X), it is treated identically to a string, as a single item." & @CRLF & _
"" & @CRLF & _
" To avoid recursion in cases such as" & @CRLF & _
" type X string" & @CRLF & _
" func (x X) String() string { return Sprintf(""<%s>"", x) }" & @CRLF & _
" convert the value before recurring:" & @CRLF & _
" func (x X) String() string { return Sprintf(""<%s>"", string(x)) }" & @CRLF & _
" Infinite recursion can also be triggered by self-referential data" & @CRLF & _
" structures, such as a slice that contains itself as an element, if" & @CRLF & _
" that type has a String method. Such pathologies are rare, however," & @CRLF & _
" and the package does not protect against them." & @CRLF & _
"" & @CRLF & _
" When printing a struct, fmt cannot and therefore does not invoke" & @CRLF & _
" formatting methods such as Error or String on unexported fields." & @CRLF & _
"" & @CRLF & _
" Explicit argument indexes:" & @CRLF & _
"" & @CRLF & _
" In Printf, Sprintf, and Fprintf, the default behavior is for each" & @CRLF & _
" formatting verb to format successive arguments passed in the call." & @CRLF & _
" However, the notation [n] immediately before the verb indicates that the" & @CRLF & _
" nth one-indexed argument is to be formatted instead. The same notation" & @CRLF & _
" before a '*' for a width or precision selects the argument index holding" & @CRLF & _
" the value. After processing a bracketed expression [n], subsequent verbs" & @CRLF & _
" will use arguments n+1, n+2, etc. unless otherwise directed." & @CRLF & _
"" & @CRLF & _
" For example," & @CRLF & _
" fmt.Sprintf(""%[2]d %[1]d\n"", 11, 22)" & @CRLF & _
" will yield ""22 11"", while" & @CRLF & _
" fmt.Sprintf(""%[3]*.[2]*[1]f"", 12.0, 2, 6)" & @CRLF & _
" equivalent to" & @CRLF & _
" fmt.Sprintf(""%6.2f"", 12.0)" & @CRLF & _
" will yield "" 12.00"". Because an explicit index affects subsequent verbs," & @CRLF & _
" this notation can be used to print the same values multiple times" & @CRLF & _
" by resetting the index for the first argument to be repeated:" & @CRLF & _
" fmt.Sprintf(""%d %d %#[1]x %#x"", 16, 17)" & @CRLF & _
" will yield ""16 17 0x10 0x11""." & @CRLF & _
"" & @CRLF & _
" Format errors:" & @CRLF & _
"" & @CRLF & _
" If an invalid argument is given for a verb, such as providing" & @CRLF & _
" a string to %d, the generated string will contain a" & @CRLF & _
" description of the problem, as in these examples:" & @CRLF & _
"" & @CRLF & _
" Wrong type or unknown verb: %!verb(type=value)" & @CRLF & _
" Printf(""%d"", ""hi""): %!d(string=hi)" & @CRLF & _
" Too many arguments: %!(EXTRA type=value)" & @CRLF & _
" Printf(""hi"", ""guys""): hi%!(EXTRA string=guys)" & @CRLF & _
" Too few arguments: %!verb(MISSING)" & @CRLF & _
" Printf(""hi%d""): hi%!d(MISSING)" & @CRLF & _
" Non-int for width or precision: %!(BADWIDTH) or %!(BADPREC)" & @CRLF & _
" Printf(""%*s"", 4.5, ""hi""): %!(BADWIDTH)hi" & @CRLF & _
" Printf(""%.*s"", 4.5, ""hi""): %!(BADPREC)hi" & @CRLF & _
" Invalid or invalid use of argument index: %!(BADINDEX)" & @CRLF & _
" Printf(""%*[2]d"", 7): %!d(BADINDEX)" & @CRLF & _
" Printf(""%.[2]d"", 7): %!d(BADINDEX)" & @CRLF & _
"" & @CRLF & _
" All errors begin with the string ""%!"" followed sometimes" & @CRLF & _
" by a single character (the verb) and end with a parenthesized" & @CRLF & _
" description." & @CRLF & _
"" & @CRLF & _
" If an Error or String method triggers a panic when called by a" & @CRLF & _
" print routine, the fmt package reformats the error message" & @CRLF & _
" from the panic, decorating it with an indication that it came" & @CRLF & _
" through the fmt package. For example, if a String method" & @CRLF & _
" calls panic(""bad""), the resulting formatted message will look" & @CRLF & _
" like" & @CRLF & _
" %!s(PANIC=bad)" & @CRLF & _
"" & @CRLF & _
" The %!s just shows the print verb in use when the failure" & @CRLF & _
" occurred. If the panic is caused by a nil receiver to an Error" & @CRLF & _
" or String method, however, the output is the undecorated" & @CRLF & _
" string, ""<nil>""." & @CRLF & _
"" & @CRLF & _
" Scanning" & @CRLF & _
"" & @CRLF & _
" An analogous set of functions scans formatted text to yield" & @CRLF & _
" values. Scan, Scanf and Scanln read from os.Stdin; Fscan," & @CRLF & _
" Fscanf and Fscanln read from a specified io.Reader; Sscan," & @CRLF & _
" Sscanf and Sscanln read from an argument string." & @CRLF & _
"" & @CRLF & _
" Scan, Fscan, Sscan treat newlines in the input as spaces." & @CRLF & _
"" & @CRLF & _
" Scanln, Fscanln and Sscanln stop scanning at a newline and" & @CRLF & _
" require that the items be followed by a newline or EOF." & @CRLF & _
"" & @CRLF & _
" Scanf, Fscanf, and Sscanf parse the arguments according to a" & @CRLF & _
" format string, analogous to that of Printf. In the text that" & @CRLF & _
" follows, 'space' means any Unicode whitespace character" & @CRLF & _
" except newline." & @CRLF & _
"" & @CRLF & _
" In the format string, a verb introduced by the % character" & @CRLF & _
" consumes and parses input; these verbs are described in more" & @CRLF & _
" detail below. A character other than %, space, or newline in" & @CRLF & _
" the format consumes exactly that input character, which must" & @CRLF & _
" be present. A newline with zero or more spaces before it in" & @CRLF & _
" the format string consumes zero or more spaces in the input" & @CRLF & _
" followed by a single newline or the end of the input. A space" & @CRLF & _
" following a newline in the format string consumes zero or more" & @CRLF & _
" spaces in the input. Otherwise, any run of one or more spaces" & @CRLF & _
" in the format string consumes as many spaces as possible in" & @CRLF & _
" the input. Unless the run of spaces in the format string" & @CRLF & _
" appears adjacent to a newline, the run must consume at least" & @CRLF & _
" one space from the input or find the end of the input." & @CRLF & _
"" & @CRLF & _
" The handling of spaces and newlines differs from that of C's" & @CRLF & _
" scanf family: in C, newlines are treated as any other space," & @CRLF & _
" and it is never an error when a run of spaces in the format" & @CRLF & _
" string finds no spaces to consume in the input." & @CRLF & _
"" & @CRLF & _
" The verbs behave analogously to those of Printf." & @CRLF & _
" For example, %x will scan an integer as a hexadecimal number," & @CRLF & _
" and %v will scan the default representation format for the value." & @CRLF & _
" The Printf verbs %p and %T and the flags # and + are not implemented." & @CRLF & _
" For floating-point and complex values, all valid formatting verbs" & @CRLF & _
" (%b %e %E %f %F %g %G %x %X and %v) are equivalent and accept" & @CRLF & _
" both decimal and hexadecimal notation (for example: ""2.3e+7"", ""0x4.5p-8"")" & @CRLF & _
" and digit-separating underscores (for example: ""3.14159_26535_89793"")." & @CRLF & _
"" & @CRLF & _
" Input processed by verbs is implicitly space-delimited: the" & @CRLF & _
" implementation of every verb except %c starts by discarding" & @CRLF & _
" leading spaces from the remaining input, and the %s verb" & @CRLF & _
" (and %v reading into a string) stops consuming input at the first" & @CRLF & _
" space or newline character." & @CRLF & _
"" & @CRLF & _
" The familiar base-setting prefixes 0b (binary), 0o and 0 (octal)," & @CRLF & _
" and 0x (hexadecimal) are accepted when scanning integers" & @CRLF & _
" without a format or with the %v verb, as are digit-separating" & @CRLF & _
" underscores." & @CRLF & _
"" & @CRLF & _
" Width is interpreted in the input text but there is no" & @CRLF & _
" syntax for scanning with a precision (no %5.2f, just %5f)." & @CRLF & _
" If width is provided, it applies after leading spaces are" & @CRLF & _
" trimmed and specifies the maximum number of runes to read" & @CRLF & _
" to satisfy the verb. For example," & @CRLF & _
" Sscanf("" 1234567 "", ""%5s%d"", &s, &i)" & @CRLF & _
" will set s to ""12345"" and i to 67 while" & @CRLF & _
" Sscanf("" 12 34 567 "", ""%5s%d"", &s, &i)" & @CRLF & _
" will set s to ""12"" and i to 34." & @CRLF & _
"" & @CRLF & _
" In all the scanning functions, a carriage return followed" & @CRLF & _
" immediately by a newline is treated as a plain newline" & @CRLF & _
" (\r\n means the same as \n)." & @CRLF & _
"" & @CRLF & _
" In all the scanning functions, if an operand implements method" & @CRLF & _
" Scan (that is, it implements the Scanner interface) that" & @CRLF & _
" method will be used to scan the text for that operand. Also," & @CRLF & _
" if the number of arguments scanned is less than the number of" & @CRLF & _
" arguments provided, an error is returned." & @CRLF & _
"" & @CRLF & _
" All arguments to be scanned must be either pointers to basic" & @CRLF & _
" types or implementations of the Scanner interface." & @CRLF & _
"" & @CRLF & _
" Like Scanf and Fscanf, Sscanf need not consume its entire input." & @CRLF & _
" There is no way to recover how much of the input string Sscanf used." & @CRLF & _
"" & @CRLF & _
" Note: Fscan etc. can read one character (rune) past the input" & @CRLF & _
" they return, which means that a loop calling a scan routine" & @CRLF & _
" may skip some of the input. This is usually a problem only" & @CRLF & _
" when there is no space between input values. If the reader" & @CRLF & _
" provided to Fscan implements ReadRune, that method will be used" & @CRLF & _
" to read characters. If the reader also implements UnreadRune," & @CRLF & _
" that method will be used to save the character and successive" & @CRLF & _
" calls will not lose data. To attach ReadRune and UnreadRune" & @CRLF & _
" methods to a reader without that capability, use" & @CRLF & _
" bufio.NewReader." & @CRLF & _
"*/" & @CRLF & _
"package fmt"

; Collect every match (full match plus groups) across the whole test string
Local $aArray = StringRegExp($sString, $sRegex, $STR_REGEXPARRAYGLOBALFULLMATCH)

; Flatten the array of per-match arrays into a single array
Local $aFullArray[0]
For $i = 0 To UBound($aArray) - 1
    _ArrayConcatenate($aFullArray, $aArray[$i])
Next
$aArray = $aFullArray

; Present the entire match result
_ArrayDisplay($aArray, "Result")

Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for AutoIt, please visit: https://www.autoitscript.com/autoit3/docs/functions/StringRegExp.htm
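
To see what the pattern in the generated code captures, here is a rough Python sketch. It assumes Python's `re` module accepts the pattern unchanged, which seems reasonable since the `(?P<name>...)` group syntax is Python's own. The sample lines are adapted from the width and precision examples in the test string, and `parse_verbs` is a hypothetical helper, not something regex101 generates. The group name `widthZeroPrecison` is spelled as in the original pattern.

```python
import re

# Pattern copied verbatim from the generated code above.
FORMAT_VERB = re.compile(
    r"(?m)%(?P<flag>\#|\+|\-| |0)?"
    r"((?P<width>[1-9])\.(?P<precision>[1-9])"
    r"|(?P<widthDefaultPrecision>[1-9])"
    r"|(?P<widthZeroPrecison>[1-9])\."
    r"|\.(?P<precisionDefaultWidth>[1-9]))?"
    r"(?P<verb>\w{1,9})"
)

# A short excerpt adapted from the test string (illustrative only).
SAMPLE = "\n".join([
    "%f     default width, default precision",
    "%9f    width 9, default precision",
    "%.2f   default width, precision 2",
    "%9.2f  width 9, precision 2",
    "%9.f   width 9, precision 0",
    "%+v    adds field names",
    "%%     a literal percent sign",
])


def parse_verbs(text):
    """Return (full match, named groups that took part in the match) pairs."""
    results = []
    for m in FORMAT_VERB.finditer(text):
        fields = {k: v for k, v in m.groupdict().items() if v is not None}
        results.append((m.group(0), fields))
    return results


for full, fields in parse_verbs(SAMPLE):
    print(full, fields)
```

Each width and precision combination lands in a different named group, which is why the pattern spells out four alternatives instead of two optional parts. The literal `%%` produces no match, because `\w` cannot match the second percent sign.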