  • [abc]    A single character of: a, b or c
  • [^abc]   A character except: a, b or c
  • [a-z]    A character in the range: a-z
  • [^a-z]   A character not in the range: a-z
  • [a-zA-Z] A character in the range: a-z or A-Z
  • .        Any single character
  • a|b      Alternate - match either a or b
  • \s       Any whitespace character
  • \S       Any non-whitespace character
  • \d       Any digit
  • \D       Any non-digit
  • \w       Any word character
  • \W       Any non-word character
  • (?:...)  Non-capturing group
  • (...)    Capturing group
  • a?       Zero or one of a
  • a*       Zero or more of a
  • a+       One or more of a
  • a{3}     Exactly 3 of a
  • a{3,}    3 or more of a
  • a{3,6}   Between 3 and 6 of a
  • ^        Start of string
  • $        End of string
  • \b       A word boundary
  • \B       Non-word boundary
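
As a quick check of the reference entries above, the following Python sketch exercises a handful of them with the standard `re` module. The helper name `find` and the sample strings are illustrative assumptions, not part of the original page.

```python
import re

def find(pattern, text):
    """Return every non-overlapping match of pattern in text (illustrative helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

text = "cab 42 Zed"  # sample string chosen for this sketch

print(find(r"[abc]", text))         # a single character of: a, b or c
print(find(r"[^a-z]", text))        # a character not in the range a-z
print(find(r"\d+", text))           # one or more digits
print(find(r"\bZ\w*", text))        # a word boundary, then word characters
print(find(r"a{3,6}", "aa aaaa"))   # between 3 and 6 of a

# A non-capturing group does not appear in groups(); a capturing group does.
print(re.match(r"(?:ab)+(c)", "ababc").groups())
```

Only the capturing group shows up in `groups()`, which is why the generated script further down reports group positions separately from the overall match.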

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"<!DOCTYPE html>|</?\s*[a-z-][^>]*\s*>|(\&(?:[\w\d]+|#\d+|#x[a-f\d]+);|<!--[\s\S\n]*?-->)"

# Adjacent string literals inside the parentheses are concatenated into one test string.
test_str = ("\n"
    "# Above is a blank line, no match.\n\n"
    "foo bar baz\n"
    "this is a string\n"
    "Testing\n"
    "<>\n"
    "Hello, World\n"
    "This is less than <, this is greater than >.\n"
    " a < 3 && b > 3\n"
    "<<Important Text>>\n"
    "# Not HTML-like.\n\n"
    "<p>fizz buzz</p>\n"
    "<a>this is a string</a>\n"
    "this is a <b>string</b>\n"
    "<p>Testing</p>\n"
    "<img src=\"hello.jpg\">\n"
    "<a>Foo</a>\n"
    "<input type='submit' value='Ok' />\n"
    "<input type='submit' value='Ok'>\n"
    "<br/>\n"
    "<br>\n"
    "<!-- comment -- doesn't work! -->\n"
    "<hr>\n"
    "Foo &amp; bar\n"
    "# These one-line samples are totally HTML-like.\n\n"
    "<file-upload>\n"
    "<absurd example>\n"
    "<closed example></closed>\n"
    "# Custom tags.\n\n"
    "<a>\n"
    "# Not matched by others, but actually valid.\n\n"
    "My < weird > string\n"
    "# Not actually a false positive; this is valid HTML!\n\n"
    "# Sample \"smallest complete HTML document\":\n"
    "<!DOCTYPE html>\n"
    "<title>testing</title>\n"
    "<p>This is a test.</p>\n"
    "<strange>This is strange.</strange>\n"
    "# And yes, <strange> IS VALID HTML.\n\n"
    "résume\n"
    "r&eacute;sume\n"
    "r&#201;sume\n"
    "r&x00C9;sume\n"
    "# Entities\n\n"
    "# List Tricks\n"
    "<ul><li>Foo</li\n"
    "><li>Bar</li\n"
    "></ul>\n\n"
    "# From\n"
    "Hello, World\n"
    "This is less than <, this is greater than >.\n"
    " a < 3 && b > 3\n"
    "<<Important Text>>\n"
    "<a> # This actually is HTML, not a false positive.\n"
    "<a>Foo</a>\n"
    "<input type='submit' value='Ok' /> # XHTML, not HTML...\n"
    "<br/> # XHTML again...\n"
    "<br> # These didn't work with that answer.\n"
    "Foo &amp; bar\n"
    "<input type='submit' value='Ok'>\n\n"
    "# From\n"
    "<a href=bla>sdfsdf</a>\n"
    "<div>something</div>\n"
    "<br>\n"
    "<span>mayhem</div>\n"
    "<hr />\n"
    "<input name=bla / >\n"
    "<div>some<span>thing</span>here</div>\n\n\n"
    "# Prepare your eye bleach.\n"
    "<p "
    "style=\"line-height:normal; margin-top:0px\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Notre sp&eacute;cialit&eacute; : offrir de l&rsquo;assistance &agrave; plus d&rsquo;un million de Qu&eacute;b&eacute;cois. Nous sommes fiers d&rsquo;aider! Participez vous aussi &agrave; cette mission en r&eacute;alisant les r&ecirc;ves d&rsquo;aventure, de d&eacute;tente et de d&eacute;couverte de nos membres et clients au sein de notre agence de voyages.</span></span></p>\\r\\n\\r\\n<p style=\"line-height:normal; margin-top:0px\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Joignez-vous &agrave; nous! Vous b&eacute;n&eacute;ficierez de nombreux avantages :&nbsp;</span></span></p>\\r\\n\\r\\n<ul style=\"line-height:normal\">\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Salaire fixe, et primes lorsque vous d&eacute;passez vos objectifs.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">20 jours de cong&eacute; apr&egrave;s une ann&eacute;e.</span></span></li>\\r\\n\\t<li><span style=\"font-size:12px\"><span style=\"font-family:Arial\">R&eacute;gime de retraite - CAA-Qu&eacute;bec &eacute;gale votre mise!</span></span></li>\\r\\n\\t<li><span style=\"font-size:12px\"><span style=\"font-family:Arial\">Assurance collective compl&egrave;te (soins m&eacute;dicaux et param&eacute;dicaux, invalidit&eacute;, etc.).</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Rabais trippants chez nos partenaires, dans nos centres Voyages et pour vos assurances.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span "
    "style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Plus de 1,2 million de membres comme clients potentiels.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Soutien administratif pour vous concentrer sur la vente de voyages.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Allocation g&eacute;n&eacute;reuse pour les &eacute;ducotours.</span></span></li>\\r\\n</ul>\\r\\n\\r\\n<p style=\"line-height:normal; margin-bottom:0px; margin-top:0px\">&nbsp;</p>\\r\\n\\r\\n<p style=\"line-height:normal; margin-bottom:0px; margin-top:0px\"><img class=\"largeimage\" src=\"\" style=\"line-height:normal; width:100%\" /></p>\\r\\n\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t <br/><br/>\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t<p style=\"line-height:normal; margin-top:0px\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">En tant que conseiller en voyages, vos principales t&acirc;ches et responsabilit&eacute;s seront celles-ci :</span></span></p>\\r\\n\\r\\n<ul style=\"line-height:normal\">\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">R&eacute;aliser une analyse des besoins des clients et leur fournir des renseignements pr&eacute;cis et utiles.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; "
    "line-height:normal\">Effectuer les r&eacute;servations et achats (forfaits, croisi&egrave;res, circuits, h&ocirc;tels, automobiles et assurances voyage).</span></span></li>\\r\\n</ul>\\r\\n\\r\\n<p style=\"line-height:normal; margin-top:0px\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Vous occuperez un poste r&eacute;gulier &agrave; temps plein (35&nbsp;heures par semaine). L&rsquo;horaire sera variable et vous devrez parfois travailler le soir et la fin de semaine afin de bien servir les voyageurs.</span></span></p>\\r\\n\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t <br/><br/>\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t<p style=\"line-height:normal; margin-top:0px\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Nous sommes toujours &agrave; la recherche de personnes de talent. "
    "Mais vous devrez avoir un profil pr&eacute;cis pour ce poste!</span></span></p>\\r\\n\\r\\n<ul style=\"line-height:normal\">\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Dipl&ocirc;me d&rsquo;&eacute;tudes coll&eacute;giales en tourisme ou formation d&rsquo;agent de voyages.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">2 &agrave; 3 ann&eacute;es d&rsquo;exp&eacute;rience comme conseiller en voyages.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Certificat de conseiller en voyages de l&rsquo;Office de la protection du consommateur, ou &ecirc;tre en mesure de l&rsquo;obtenir.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Connaissance des syst&egrave;mes de d&eacute;livrance de billets : GDS, Galileo/Apollo, PcVoyages et SIREV (un atout).</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Ma&icirc;trise du fran&ccedil;ais et de l&rsquo;anglais.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Professionnalisme, attention aux besoins des clients et volont&eacute; d&rsquo;offrir un service de qualit&eacute;.</span></span></li>\\r\\n\\t<li style=\"line-height: normal;\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\">Souci d&rsquo;atteindre les objectifs de vente.</span></span></li>\\r\\n</ul>\\r\\n\\r\\n<p "
    "style=\"line-height:normal; margin-top:0px\"><span style=\"font-size:12px; line-height:normal\"><span style=\"font-family:Arial; line-height:normal\"><span style=\"line-height:normal\">Si explorer de nouveaux horizons vous passionne, et que vous aimez aider d&rsquo;autres personnes &agrave; d&eacute;couvrir le monde, vous serez heureux &agrave; Voyages CAA-Qu&eacute;bec. Postulez d&egrave;s aujourd&rsquo;hui. Nous vous attendons avec impatience!</span></span></span></p>")

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches, start=1):

    # Report the span and text of the overall match.
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    # Report each capturing group; groups are numbered from 1.
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
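
To see how the pattern behaves on individual samples without reading the full match listing, a smaller sketch can run the same regex over a few lines taken from the test string. The helper `html_like_fragments` is a hypothetical name introduced here; the pattern is copied unchanged, and no flags are set, so matching is case-sensitive.

```python
import re

# Same pattern as in the script, copied unchanged.
REGEX = re.compile(
    r"<!DOCTYPE html>|</?\s*[a-z-][^>]*\s*>|(\&(?:[\w\d]+|#\d+|#x[a-f\d]+);|<!--[\s\S\n]*?-->)"
)

def html_like_fragments(text):
    """Return the full text of every match (hypothetical helper)."""
    return [m.group(0) for m in REGEX.finditer(text)]

# A few lines taken from the test string.
samples = [
    "foo bar baz",
    " a < 3 && b > 3",
    "<p>fizz buzz</p>",
    "Foo &amp; bar",
    "My < weird > string",
    "<!-- comment -- doesn't work! -->",
]
for line in samples:
    print(repr(line), "->", html_like_fragments(line))

# Group 1 only participates for entities and comments; for a tag match it is None.
tag = REGEX.search("<br>")
entity = REGEX.search("Foo &amp; bar")
print(tag.group(1), entity.group(1))
```

Because group 1 wraps only the entity and comment alternatives, a tag match reports that group as `None` with a start and end of -1, which explains the corresponding "Group 1" lines in the script's output.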

