using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
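        // Matches HGVS-style variant entries such as
        //   NM_004985.3(KRAS):c.35G>A p.G12D
        // Named groups: Reference (transcript ID), Gene (symbol in parentheses),
        // Genic (coding DNA change after "c."), Protein (protein change after "p.").
        // The literal dots in "c." and "p." are escaped so they match only a period.
        // Entries whose protein change is parenthesized, e.g. "(p.R273H)", are not
        // matched, because the pattern expects a space directly before "p.".
        string pattern = @"(?<Reference>NM.*?)\((?<Gene>.*?)\):c\.(?<Genic>.*) p\.(?<Protein>.*?)\W";
        string input = @"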
--------------------------------MATERIALS TESTED--------------------------------
Source material:S-XX-XXXXX
Block: D2
Control material: NONE
Outside accession: GI11-XXXXX
-------------------------------MOLECULAR RESULTS--------------------------------
Rectal adenocarcinoma
50-Gene Somatic Mutation Analysis Panel Report**:
Clinical test requisition for mutation studies on the following genes was received: BRAF, KRAS, NRAS
A next generation sequencing (NGS)-based analysis for the detection of somatic mutations in the coding sequence of a total of 50 genes was performed on the DNA extracted from the sample in our CLIA-certified molecular diagnostics laboratory. Interpretative findings are reported in the mutation screening summary table(s) below followed by specific details of detected variants.
Interpretation Key:
Circled/Bold: Mutation (or variant) detected
Underlined: Mutation testing requested (ordered gene)
Asterisk: Additional confirmation studies in progress, addendum report will be issued
Mutation screening summary:
I. Mutations in ordered genes
Gene Standardized Nomenclature (HGVS) Location DNA change Protein change dbSNP ID COSMIC ID
KRAS NM_004985.3(KRAS):c.35G>A p.G12D Exon 2 SNV Missense rs121913529 COSM521
KRAS NM_004985.3(KRAS):c.34G>A p.G12S Exon 2 SNV Missense rs121913530 COSM517
Note: These mutations are in trans - present in mutually exclusive reads consistent with presence in distinct alleles.
II. Mutations in non-ordered genes
Gene Standardized Nomenclature (HGVS) Location DNA change Protein change dbSNP ID COSMIC ID
APC NM_000038.5(APC):c.3912_3912delinsTC (p.A1305fs*10) Exon 16 Complex Frameshift
SMAD4 NM_005359.5(SMAD4):c.1082G>A p.R361H Exon 9 SNV Missense COSM14122
SMAD4 NM_005359.5(SMAD4):c.430_431del p.S144fs Exon 4 Deletion Frameshift
TP53 NM_000546.5(TP53):c.818G>A (p.R273H) Exon 8 SNV Missense rs28934576 COSM10660
IV. Variants of probable germline origin
Gene Standardized Nomenclature (HGVS) Location DNA change Protein change dbSNP ID COSMIC ID
ATM NM_000051.3(ATM):c.5071A>C p.S1691R Exon 34 SNV Missense rs1800059
KIT NM_000222.2(KIT):c.1588G>A p.V530I Exon 10 SNV Missense rs72550822 COSM1155
SMO NM_005631.4(SMO):c.580G>A p.E194K Exon 3 SNV Missense
GUIDE TO STANDARDIZED NOMENCLATURE AND EXPLANATION OF CHANGES:
Variants identified are described using an implementation of a standardized nomenclature developed by the Human Genome Variation Society (HGVS, http://www.hgvs.org/mutnomen/).
The normative Genbank gene reference sequence identifier and gene symbol in parentheses are provided, following by the coding DNA sequence change (e.g., ""c. 200A>G"", which would mean that the position 200 adenine is changed to guanine), and then the inferred protein change (e.g., ""p. V35C"", which would mean that the amino acid at codon 35 is changed from valine to cysteine).
Additional explanations for the DNA and protein changes seen in the current specimen are shown in the following tables:
Explanation of DNA variant/mutation types seen in this specimen
DNA Change
SNV A single nucleotide difference (point mutation) has been identified in the patient sample relative to the reference wild-type gene sequence
Deletion A sequence of DNA has been deleted in the patient sample relative to the reference wild-type gene sequence
Complex A complex variant involving both deletion and insertion of new sequence in the patient sample relative to the reference wild-type gene sequence
Explanation of protein variant/mutation types seen in this specimen
Protein Change
Missense A single amino acid residue change in the patient sample relative to the reference wild-type protein sequence
Frameshift A mutation involving deletion or insertion of a non-triplet number of nucleotides in the patient sample relative to the reference wild-type sequence. Frameshift mutations generally result in a nonsense translation with early termination of translation.
Additional information on genes with mutation/variants identified on this assay
APC http://www.genenames.org/data/hgnc_data.php?hgnc_id=583
KRAS http://www.genenames.org/data/hgnc_data.php?hgnc_id=6407
SMAD4 http://www.genenames.org/data/hgnc_data.php?hgnc_id=6770
TP53 http://www.genenames.org/data/hgnc_data.php?hgnc_id=11998
Methodology:
Test Platform: PCR-based sequencing is performed using a next generation sequencing (NGS) platform on genomic DNA to screen for mutations in the coding sequences of genes listed below. NGS sequencing analysis of these genes was further confirmed by other platforms during validation in our CLIA-certified molecular diagnostics laboratory. The genomic reference sequence used is genome GRCh37/hg19. Detailed information about the signal-processing, basecalling, alignment, and variant calling algorithms are available upon request.
Analytical Sensitivity: For this assay, sensitivity of detection is related in part to depth of coverage, tumor percentage, and allelic frequency for the mutation. Although the NGS platform is capable of achieving a much higher analytical sensitivity; for clinical purposes, we determined the effective lower limit of detection of this assay (analytical sensitivity) for single nucleotide variations to be in the range of 5% (one mutant allele in the background of nineteen wild type alleles) to 10% (one mutant allele in the background of nine wild type alleles) by taking into consideration the depth of coverage at a given base and the ability to confirm low level mutations using independent conventional platforms.
Classification of Variants:
- The variants detected by this platform are classified into four groups based on both analytic findings such as allelic frequency and the currently available information in publically available and periodically curated reference databases COSMIC version 64 (Catalog of Somatic Mutations in Cancer, Wellcome Trust Sanger Institute, UK) and dbSNP version 137 (National Institute of Health, US).
o Group I. Probable somatic mutations in clinically ordered/requested genes
o Group II. Probable somatic mutations in non-ordered genes
o Group III. Variants for which a germline versus somatic origin cannot be determined unequivocally
o Group IV. Variants that are reported as germline polymorphisms in population studies/literature/matched tumor-normal analysis on different patients in our laboratory
- Silent mutations (mutations that do not result in an amino acid change) are not reported.
- Very common germline polymorphisms (defined as variants otherwise fitting into group IV above, but with a population frequency of over 20% in our clinical laboratory sample cohort) are not reported.
Limitations of the test:
- The primary purpose of this panel is to detect somatic mutations in genes involved in oncogenesis of this patient’s tumor. The test or the results thereof should not be used to detect germline variants for hereditary cancer syndromes.
- This panel is not designed to detect germline variants for familial tumors. Given the coverage of broad genomic regions using next generation sequencing technology, variants known to be germline polymorphisms may be detected. These variants are of unknown clinical significance and are classified separately from the variants of potential somatic origin.
- Matched non-tumor tissue from this patient has not been tested, therefore, the possibility of any detected mutation being a germline mutation cannot be completely ruled out.
- The assay is designed to detect point mutations and smaller insertion/deletions.
- Variants detected at very low allelic frequencies not deemed to be confirmable by independent, orthogonal methods and/or in significant discordance with the percentage of tumor in the tested sample may be excluded as the clinical significance and reliability of such low level variant calls is not clear.
Report annotation and generation software: A post-variant calling analysis and annotation tool, OncoSeek version 1.1.1.220, was used in the construction of this report.
Sequencing coverage of the genes: The following table describe adequacy of coverage in this assay across the full set of covered genes, exons, and codons. Adequately covered amplicons are defined as those having total coverage depth of greater than or equal to 250 reads, or for which an orthogonal mutation analysis testing has been performed. Presence of mutations outside the tested regions listed below cannot be ruled out. Due to space limitations, only certain genes may be listed. A full list of covered genes & codons for the specific test results on this sample is available upon request.
Coverage by gene and codon(s) tested for adequate amplicons
Gene Exons (codons) tested
ABL1 (NM_005157) 4 (232-260), 5 (275-279), 6 (314-360), 7 (380-412)
AKT1 (NM_005163) 3 (16-52), 6 (154-183)
ALK (NM_004304) 23 (1172-1204), 25 (1270-1279)
APC (NM_000038) 16 (860-891), 16 (1089-1125), 16 (1284-1326), 16 (1342-1384), 16 (1426-1471), 16 (1483-1524), 16 (1543-1582)
ATM (NM_000051) 8 (326-355), 9 (407-412), 12 (601-626), 17 (834-865), 26 (1292-1325), 34 (1674-1707), 35 (1726-1757), 36 (1790-1815), 39 (1926-1946), 50 (2436-2454), 54 (2650-2667), 55 (2682-2711), 56 (2718-2736), 59 (2865-2891), 61 (2933-2950), 63 (2996-3026), 63 (3041-3057)
BRAF (NM_004333) 11 (439-473), 15 (581-611)
CDH1 (NM_004360) 3 (65-96), 8 (337-374), 9 (380-408)
CDKN2A (NM_000077) 2 (51-90), 2 (98-140)
CSF1R (NM_005211) 7 (297-319), 22 (953-973)
CTNNB1 (NM_001904) 3 (9-48)
EGFR (NM_005228) 3 (96-123), 7 (279-297), 15 (575-601), 18 (695-726), 19-20 (729-796), 20 (807-823), 21 (855-875)
ERBB2 (NM_004448) 19-20 (752-797), 21 (839-882)
ERBB4 (NM_005235) 3 (136-141), 4 (167-186), 6 (225-247), 7 (254-290), 8 (295-323), 9 (333-367), 15 (580-623), 23 (919-948)
EZH2 (NM_004456) 16 (625-649)
FBXW7 (NM_033632) 5 (264-287), 8 (378-403), 9 (434-473), 10 (478-509), 11 (567-594)
FGFR1 (NM_015850) 4 (120-148), 7 (247-273)
FGFR2 (NM_000141) 7 (250-275), 7 (296-313), 9 (362-399), 12 (546-558)
FGFR3 (NM_000142) 7 (247-277), 9 (367-402), 14 (631-653), 16 (690-719), 18 (771-807)
FLT3 (NM_004119) 11 (437-466), 14 (570-610), 16 (663-685), 20 (828-847)
GNA11 (NM_002067) 5 (202-219)
GNAQ (NM_002072) 5 (206-245)
GNAS (NM_000516) 8-9 (196-240)
HNF1A (NM_000545) 3 (192-221), 4 (253-282)
HRAS (NM_005343) 2 (5-35), 3 (42-82)
IDH1 (NM_005896) 4 (101-135)
IDH2 (NM_002168) 4 (133-177)
JAK2 (NM_004972) 14 (603-622)
JAK3 (NM_000215) 4 (128-140), 13 (568-580), 16 (709-733)
KDR (NM_002253) 6-7 (244-291), 11 (471-480), 19 (872-894), 21 (961-988), 26 (1135-1156), 27 (1192-1221), 30 (1283-1310), 30 (1324-1357)
KIT (NM_000222) 2 (23-58), 9 (494-514), 10-11 (525-587), 13 (627-661), 14 (664-684), 15 (714-724), 17 (802-828), 18 (832-858)
KRAS (NM_004985) 2-3 (5-66), 4 (114-150)
MET (NM_001127500) 2 (159-188), 2 (339-378), 11 (816-856), 14 (981-1012), 16 (1105-1132), 19 (1246-1274)
MLH1 (NM_000249) 12 (373-415)
MPL (NM_005373) 10 (501-522)
NOTCH1 (NM_017617) 26 (1566-1602), 27 (1673-1680), 34 (2436-2476)
NPM1 (NM_002520) 11 (283-295)
NRAS (NM_002524) 2 (3-31), 3 (43-69), 4 (124-150)
PDGFRA (NM_006206) 12 (552-583), 14 (644-668), 15 (671-709), 18 (819-854)
PIK3CA (NM_006218) 2 (54-90), 2 (106-118), 5 (316-351), 7-8 (390-422), 8 (449-468), 10 (522-549), 14 (677-720), 19 (898-924), 21 (1017-1051), 21 (1065-1069)
PTEN (NM_000314) 1 (1-25), 3 (55-70), 5 (99-135), 6 (165-184), 7 (212-215), 7 (231-267), 8 (282-300), 8 (312-342)
PTPN11 (NM_002834) 3 (46-82), 13 (485-527)
RB1 (NM_000321) 4 (130-159), 6 (196-203), 10 (314-345), 11 (350-366), 14 (452-463), 17-18 (547-582), 20 (655-691), 21 (703-724), 22 (743-770)
RET (NM_020975) 10-11 (608-654), 13 (762-786), 15-16 (875-924)
SMAD4 (NM_005359) 3 (98-136), 4 (142-146), 5 (165-202), 6 (242-263), 8 (307-319), 9 (326-365), 10 (384-424), 11 (443-474), 12 (494-532)
SMARCB1 (NM_003073) 2 (35-72), 4-5 (144-206), 9 (373-386)
SMO (NM_005631) 3 (186-228), 5 (307-331), 6 (391-419), 9 (511-542), 11 (608-646)
SRC (NM_005417) 14 (499-533)
STK11 (NM_000455) 1 (22-64), 4 (155-181), 4-5 (191-207), 6 (253-285), 8 (317-361)
TP53 (NM_000546) 2 (1-20), 4 (68-113), 5 (126-138), 5-6 (149-223), 7 (225-258), 8 (263-307), 10 (332-367)
VHL (NM_000551) 1 (78-108), 2 (114-150), 3 (155-174)
DISCLAIMER:
This test was developed and its performance characteristics determined by the Molecular Diagnostic Laboratory (MDL) at the M.D. Anderson Cancer Center. It has not been cleared by the U.S. Food and Drug Administration. However, such approval is not required for clinical implementation, and the test results on the ordered genes have been shown to be clinically useful. This laboratory is CAP accredited and CLIA certified to perform high complexity molecular testing for clinical purposes.
";
        // Print every match together with its position in the input.
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
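            // Illustrative addition (not part of the generated sample): the pattern
            // declares named groups, so each captured field can also be read on its
            // own through Match.Groups. The output labels below are arbitrary.
            Console.WriteLine("  Reference: {0}", m.Groups["Reference"].Value);
            Console.WriteLine("  Gene:      {0}", m.Groups["Gene"].Value);
            Console.WriteLine("  Genic:     {0}", m.Groups["Genic"].Value);
            Console.WriteLine("  Protein:   {0}", m.Groups["Protein"].Value);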
        }
    }
}