Regular Expressions 101

Community Library Entry

Regular Expression
Python

r"<!DOCTYPE html>|</?\s*[a-z-][^>]*\s*>|(\&(?:[\w\d]+|#\d+|#x[a-f\d]+);|<!--[\s\S\n]*?-->)"
Flags: g (global)

Description

This would appear to violate the premise of the famous Stack Overflow answer about parsing HTML with regular expressions. However, this pattern does not parse anything; it only matches, as a heuristic for identifying text that contains HTML.

Technically, any text is HTML if it is served so that the browser chooses to interpret it as such, e.g. with a text/html Content-Type.
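
As a rough illustration of the "heuristic identification" use described above, here is a minimal Python sketch that applies the pattern unchanged. The names HTML_HINT, looks_like_html and html_fragments are hypothetical helpers invented for this example, not part of the entry. Python has no "g" flag, so re.finditer stands in for global matching.

```python
import re

# The pattern from this entry, copied verbatim. regex101's "g" flag has no
# Python equivalent; finditer() below plays that role.
HTML_HINT = re.compile(
    r"<!DOCTYPE html>|</?\s*[a-z-][^>]*\s*>|(\&(?:[\w\d]+|#\d+|#x[a-f\d]+);|<!--[\s\S\n]*?-->)"
)


def looks_like_html(text: str) -> bool:
    """Heuristic only: True if anything tag-, entity- or comment-like appears."""
    return HTML_HINT.search(text) is not None


def html_fragments(text: str) -> list[str]:
    """Return every substring the pattern matches, in order of appearance."""
    return [m.group(0) for m in HTML_HINT.finditer(text)]


if __name__ == "__main__":
    print(looks_like_html("plain text, nothing to see"))
    print(html_fragments("<p>Hello &amp; goodbye</p>"))
    print(looks_like_html("a < b and c > d"))
```

Because this is matching rather than parsing, false positives are to be expected: "a < b and c > d" is reported as HTML, since "< b and c >" has the shape of a tag. As written, the pattern is also case-sensitive, so "<P>" is not matched unless re.IGNORECASE is passed to re.compile.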

Submitted by Alice Bevan-mcgregor - 2 years ago