Regular Expressions 101

Generated Code

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
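        // Named groups capture the HGVS parts: reference sequence, gene symbol,
        // coding DNA change, and protein change. The literal dots in "c." and "p."
        // are escaped so they match only a period rather than any character.
        // Rows that write the protein change in parentheses, e.g. "(p.R273H)",
        // are not matched by this pattern.
        string pattern = @"(?<Reference>NM.*?)\((?<Gene>.*?)\):c\.(?<Genic>.*) p\.(?<Protein>.*?)\W";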
        string input = @"
--------------------------------MATERIALS TESTED--------------------------------
Source material:S-XX-XXXXX
Block: D2
Control material: NONE
Outside accession: GI11-XXXXX
-------------------------------MOLECULAR RESULTS--------------------------------

Rectal adenocarcinoma

50-Gene Somatic Mutation Analysis Panel Report**:

Clinical test requisition for mutation studies on the following genes was received: BRAF, KRAS, NRAS


A next generation sequencing (NGS)-based analysis for the detection of somatic mutations in the coding sequence of a total of 50 genes was performed on the DNA extracted from the sample in our CLIA-certified molecular diagnostics laboratory. Interpretative findings are reported in the mutation screening summary table(s) below followed by specific details of detected variants.

Interpretation Key:

	Circled/Bold:  	Mutation (or variant) detected 
	Underlined:  	Mutation testing requested (ordered gene) 
	Asterisk:  	    Additional confirmation studies in progress, addendum report will be issued

Mutation screening summary: 

 I. Mutations in ordered genes 	   
Gene	Standardized Nomenclature (HGVS)	Location DNA change	Protein change	dbSNP ID	COSMIC ID	   
KRAS	NM_004985.3(KRAS):c.35G>A p.G12D	Exon 2	 SNV	    Missense	    rs121913529	COSM521	   
KRAS	NM_004985.3(KRAS):c.34G>A p.G12S	Exon 2	 SNV	    Missense	    rs121913530	COSM517	 

Note:  These mutations are in trans - present in mutually exclusive reads consistent with presence in distinct alleles.

 II. Mutations in non-ordered genes 	   
Gene	Standardized Nomenclature (HGVS)	                Location DNA change	Protein change	dbSNP ID	COSMIC ID	   
APC	    NM_000038.5(APC):c.3912_3912delinsTC (p.A1305fs*10)	Exon 16	 Complex	Frameshift			   
SMAD4	NM_005359.5(SMAD4):c.1082G>A p.R361H	            Exon 9	 SNV		Missense				COSM14122	   
SMAD4	NM_005359.5(SMAD4):c.430_431del p.S144fs	    Exon 4	Deletion	Frameshift			   
TP53	NM_000546.5(TP53):c.818G>A (p.R273H)	            Exon 8	SNV			Missense	rs28934576	COSM10660	 

  
 IV. Variants of probable germline origin 	   
Gene	Standardized Nomenclature (HGVS)	Location	DNA change	Protein change	dbSNP ID	COSMIC ID	   
ATM	NM_000051.3(ATM):c.5071A>C p.S1691R	Exon 34	    SNV		    Missense		rs1800059		   
KIT	NM_000222.2(KIT):c.1588G>A p.V530I	Exon 10	    SNV		    Missense		rs72550822	COSM1155	   
SMO	NM_005631.4(SMO):c.580G>A p.E194K	    Exon 3	    SNV		    Missense			 



GUIDE TO STANDARDIZED NOMENCLATURE AND EXPLANATION OF CHANGES:

Variants identified are described using an implementation of a standardized nomenclature developed by the Human Genome Variation Society (HGVS, http://www.hgvs.org/mutnomen/). 

The normative Genbank gene reference sequence identifier and gene symbol in parentheses are provided, following by the coding DNA sequence change  (e.g., ""c. 200A>G"", which would mean that the position 200 adenine is changed to guanine), and then the inferred protein change (e.g., ""p. V35C"", which would mean that the amino acid at codon 35 is changed from valine to cysteine).

Additional explanations for the DNA and protein changes seen in the current specimen are shown in the following tables:

 

Explanation of DNA variant/mutation types seen in this specimen
	   
DNA Change		   
SNV	A single nucleotide difference (point mutation) has been identified in the patient sample relative to the reference wild-type gene sequence	   
Deletion	A sequence of DNA has been deleted in the patient sample relative to the reference wild-type gene sequence	   
Complex	A complex variant involving both deletion and insertion of new sequence in the patient sample relative to the reference wild-type gene sequence	 

  

Explanation of protein variant/mutation types seen in this specimen
	   
Protein Change		   
Missense	A single amino acid residue change in the patient sample relative to the reference wild-type protein sequence	   
Frameshift	A mutation involving deletion or insertion of a non-triplet number of nucleotides in the patient sample relative to the reference wild-type sequence.  Frameshift mutations generally result in a nonsense translation with early termination of translation.	 


Additional information on genes with mutation/variants identified on this assay

	APC	http://www.genenames.org/data/hgnc_data.php?hgnc_id=583
	KRAS	http://www.genenames.org/data/hgnc_data.php?hgnc_id=6407
	SMAD4	http://www.genenames.org/data/hgnc_data.php?hgnc_id=6770
	TP53	http://www.genenames.org/data/hgnc_data.php?hgnc_id=11998

Methodology:

Test Platform: PCR-based sequencing is performed using a next generation sequencing (NGS) platform on genomic DNA to screen for mutations in the coding sequences of genes listed below. NGS sequencing analysis of these genes was further confirmed by other platforms during validation in our CLIA-certified molecular diagnostics laboratory.   The genomic reference sequence used is genome GRCh37/hg19.  Detailed information about the signal-processing, basecalling, alignment, and variant calling algorithms are available upon request.

Analytical Sensitivity: For this assay, sensitivity of detection is related in part to depth of coverage, tumor percentage, and allelic frequency for the mutation.   Although the NGS platform is capable of achieving a much higher analytical sensitivity; for clinical purposes, we determined the effective lower limit of detection of this assay (analytical sensitivity) for single nucleotide variations to be in the range of 5% (one mutant allele in the background of nineteen wild type alleles) to 10% (one mutant allele in the background of nine wild type alleles) by taking into consideration the depth of coverage at a given base and the ability to confirm low level mutations using independent conventional platforms. 

Classification of Variants: 
-	The variants detected by this platform are classified into four groups based on both analytic findings such as allelic frequency and the currently available information in publically available and periodically curated reference databases COSMIC version 64 (Catalog of Somatic Mutations in Cancer, Wellcome Trust Sanger Institute, UK) and dbSNP version 137 (National Institute of Health, US). 
o	Group I. Probable somatic mutations in clinically ordered/requested genes
o	Group II. Probable somatic mutations in non-ordered genes
o	Group III. Variants for which a germline versus somatic origin cannot be determined unequivocally
o	Group IV. Variants that are reported as germline polymorphisms in population studies/literature/matched tumor-normal analysis on different patients in our laboratory
-	Silent mutations (mutations that do not result in an amino acid change) are not reported. 
-	Very common germline polymorphisms (defined as variants otherwise fitting into group IV above, but with a population frequency of over 20% in our clinical laboratory sample cohort) are not reported.

Limitations of the test: 
-	The primary purpose of this panel is to detect somatic mutations in genes involved in oncogenesis of this patient’s tumor. The test or the results thereof should not be used to detect germline variants for hereditary cancer syndromes.
-	This panel is not designed to detect germline variants for familial tumors. Given the coverage of broad genomic regions using next generation sequencing technology, variants known to be germline polymorphisms may be detected. These variants are of unknown clinical significance and are classified separately from the variants of potential somatic origin. 
-	Matched non-tumor tissue from this patient has not been tested, therefore, the possibility of any detected mutation being a germline mutation cannot be completely ruled out.    
-	The assay is designed to detect point mutations and smaller insertion/deletions.
-	Variants detected at very low allelic frequencies not deemed to be confirmable by independent, orthogonal methods and/or in significant discordance with the percentage of tumor in the tested sample may be excluded as the clinical significance and reliability of such low level variant calls is not clear.

Report annotation and generation software: A post-variant calling analysis and annotation tool, OncoSeek version 1.1.1.220, was used in the construction of this report.

Sequencing coverage of the genes: The following table describe adequacy of coverage in this assay across the full set of covered genes, exons, and codons.  Adequately covered amplicons are defined as those having total coverage depth of greater than or equal to 250 reads, or for which an orthogonal mutation analysis testing has been performed. Presence of mutations outside the tested regions listed below cannot be ruled out.  Due to space limitations, only certain genes may be listed.  A full list of covered genes & codons for the specific test results on this sample is available upon request.

Coverage by gene and codon(s) tested for adequate amplicons

Gene 	Exons (codons) tested

ABL1 (NM_005157)	4 (232-260), 5 (275-279), 6 (314-360), 7 (380-412)
AKT1 (NM_005163)	3 (16-52), 6 (154-183)
ALK (NM_004304)	23 (1172-1204), 25 (1270-1279)
APC (NM_000038)	16 (860-891), 16 (1089-1125), 16 (1284-1326), 16 (1342-1384), 16 (1426-1471), 16 (1483-1524), 16 (1543-1582)
ATM (NM_000051)	8 (326-355), 9 (407-412), 12 (601-626), 17 (834-865), 26 (1292-1325), 34 (1674-1707), 35 (1726-1757), 36 (1790-1815), 39 (1926-1946), 50 (2436-2454), 54 (2650-2667), 55 (2682-2711), 56 (2718-2736), 59 (2865-2891), 61 (2933-2950), 63 (2996-3026), 63 (3041-3057)
BRAF (NM_004333)	11 (439-473), 15 (581-611)
CDH1 (NM_004360)	3 (65-96), 8 (337-374), 9 (380-408)
CDKN2A (NM_000077)	2 (51-90), 2 (98-140)
CSF1R (NM_005211)	7 (297-319), 22 (953-973)
CTNNB1 (NM_001904)	3 (9-48)
EGFR (NM_005228)	3 (96-123), 7 (279-297), 15 (575-601), 18 (695-726), 19-20 (729-796), 20 (807-823), 21 (855-875)
ERBB2 (NM_004448)	19-20 (752-797), 21 (839-882)
ERBB4 (NM_005235)	3 (136-141), 4 (167-186), 6 (225-247), 7 (254-290), 8 (295-323), 9 (333-367), 15 (580-623), 23 (919-948)
EZH2 (NM_004456)	16 (625-649)
FBXW7 (NM_033632)	5 (264-287), 8 (378-403), 9 (434-473), 10 (478-509), 11 (567-594)
FGFR1 (NM_015850)	4 (120-148), 7 (247-273)
FGFR2 (NM_000141)	7 (250-275), 7 (296-313), 9 (362-399), 12 (546-558)
FGFR3 (NM_000142)	7 (247-277), 9 (367-402), 14 (631-653), 16 (690-719), 18 (771-807)
FLT3 (NM_004119)	11 (437-466), 14 (570-610), 16 (663-685), 20 (828-847)
GNA11 (NM_002067)	5 (202-219)
GNAQ (NM_002072)	5 (206-245)
GNAS (NM_000516)	8-9 (196-240)
HNF1A (NM_000545)	3 (192-221), 4 (253-282)
HRAS (NM_005343)	2 (5-35), 3 (42-82)
IDH1 (NM_005896)	4 (101-135)
IDH2 (NM_002168)	4 (133-177)
JAK2 (NM_004972)	14 (603-622)
JAK3 (NM_000215)	4 (128-140), 13 (568-580), 16 (709-733)
KDR (NM_002253)	6-7 (244-291), 11 (471-480), 19 (872-894), 21 (961-988), 26 (1135-1156), 27 (1192-1221), 30 (1283-1310), 30 (1324-1357)
KIT (NM_000222)	2 (23-58), 9 (494-514), 10-11 (525-587), 13 (627-661), 14 (664-684), 15 (714-724), 17 (802-828), 18 (832-858)
KRAS (NM_004985)	2-3 (5-66), 4 (114-150)
MET (NM_001127500)	2 (159-188), 2 (339-378), 11 (816-856), 14 (981-1012), 16 (1105-1132), 19 (1246-1274)
MLH1 (NM_000249)	12 (373-415)
MPL (NM_005373)	10 (501-522)
NOTCH1 (NM_017617)	26 (1566-1602), 27 (1673-1680), 34 (2436-2476)
NPM1 (NM_002520)	11 (283-295)
NRAS (NM_002524)	2 (3-31), 3 (43-69), 4 (124-150)
PDGFRA (NM_006206)	12 (552-583), 14 (644-668), 15 (671-709), 18 (819-854)
PIK3CA (NM_006218)	2 (54-90), 2 (106-118), 5 (316-351), 7-8 (390-422), 8 (449-468), 10 (522-549), 14 (677-720), 19 (898-924), 21 (1017-1051), 21 (1065-1069)
PTEN (NM_000314)	1 (1-25), 3 (55-70), 5 (99-135), 6 (165-184), 7 (212-215), 7 (231-267), 8 (282-300), 8 (312-342)
PTPN11 (NM_002834)	3 (46-82), 13 (485-527)
RB1 (NM_000321)	4 (130-159), 6 (196-203), 10 (314-345), 11 (350-366), 14 (452-463), 17-18 (547-582), 20 (655-691), 21 (703-724), 22 (743-770)
RET (NM_020975)	10-11 (608-654), 13 (762-786), 15-16 (875-924)
SMAD4 (NM_005359)	3 (98-136), 4 (142-146), 5 (165-202), 6 (242-263), 8 (307-319), 9 (326-365), 10 (384-424), 11 (443-474), 12 (494-532)
SMARCB1 (NM_003073)	2 (35-72), 4-5 (144-206), 9 (373-386)
SMO (NM_005631)	3 (186-228), 5 (307-331), 6 (391-419), 9 (511-542), 11 (608-646)
SRC (NM_005417)	14 (499-533)
STK11 (NM_000455)	1 (22-64), 4 (155-181), 4-5 (191-207), 6 (253-285), 8 (317-361)
TP53 (NM_000546)	2 (1-20), 4 (68-113), 5 (126-138), 5-6 (149-223), 7 (225-258), 8 (263-307), 10 (332-367)
VHL (NM_000551)	1 (78-108), 2 (114-150), 3 (155-174)


DISCLAIMER:

This test was developed and its performance characteristics determined by the Molecular Diagnostic Laboratory (MDL) at the M.D. Anderson Cancer Center.  It has not been cleared by the U.S. Food and Drug Administration. However, such approval is not required for clinical implementation, and the test results on the ordered genes have been shown to be clinically useful. This laboratory is CAP accredited and CLIA certified to perform high complexity molecular testing for clinical purposes.

";
        
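        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
            // Print each named capture defined in the pattern.
            Console.WriteLine("  Reference: {0}, Gene: {1}, Genic: {2}, Protein: {3}",
                m.Groups["Reference"].Value, m.Groups["Gene"].Value,
                m.Groups["Genic"].Value, m.Groups["Protein"].Value);
        }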
    }
}
Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for C#, please visit: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex(v=vs.110).aspx