Regular Expressions 101

Community Library Entry

Regular Expression
PCRE2 (PHP >=7.3)

/
<img\s+
(?=[^>]*\balt="(?<alt>[^>]*?)")?      # optional alt attribute
(?=[^>]*\bclass="(?<class>[^>]*?)")?  # optional class attribute
(?=[^>]*\bsrc="(?<src>[^>]*?)")       # src attribute
[^>]*\/>
/
gx
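
As a rough illustration of how the named groups can be read from code, here is a minimal sketch in Python rather than PHP. It is an adaptation, not the pattern above verbatim: Python's re module writes named groups as (?P<name>...), and the sketch puts an optional group inside each lookahead instead of quantifying the lookahead itself, on the assumption that this form behaves consistently across more engines. The names IMG_TAG and extract_images are made up for this example.

```python
import re

# Adapted pattern (an assumption, not the author's exact regex): the optional
# attributes are optional groups inside lookaheads that always succeed.
IMG_TAG = re.compile(
    r'''
    <img\s+
    (?=(?:[^>]*\balt="(?P<alt>[^">]*)")?)      # optional alt attribute
    (?=(?:[^>]*\bclass="(?P<class>[^">]*)")?)  # optional class attribute
    (?=[^>]*\bsrc="(?P<src>[^">]*)")           # required src attribute
    [^>]*/>
    ''',
    re.VERBOSE,
)


def extract_images(html):
    """Return one dict per matched <img ... /> tag; missing attributes are None."""
    return [m.groupdict() for m in IMG_TAG.finditer(html)]


if __name__ == "__main__":
    sample = '<img class="logo" src="a.png" alt="Logo" /><img src="b.png"/>'
    for attrs in extract_images(sample):
        print(attrs)
```

Every lookahead starts scanning right after "<img" and its whitespace, so the attributes are found whatever their order. A tag without a src attribute is not matched at all.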

Description

Regular expressions are often used to manipulate HTML in place of a DOM parser. Sometimes that is justified: no parser is available, or performance is a concern. In most cases I think the best approach is to combine the two: first a simple, robust regex to extract the relevant fragments from the HTML, then a DOM parser to analyse them with better coverage and more flexibility.
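
The two-step approach described above could look roughly like this in Python. This is a sketch under assumptions: it uses html.parser from the Python standard library as a stand-in for a full DOM parser, and IMG_CANDIDATE, AttrCollector and image_attributes are hypothetical names. The first-pass regex is deliberately loose and will cut a tag short if an attribute value contains ">".

```python
import re
from html.parser import HTMLParser

# Step 1: loose first pass that grabs anything shaped like an <img ...> tag.
IMG_CANDIDATE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


class AttrCollector(HTMLParser):
    """Step 2: collect the attributes of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags: the default
        # handle_startendtag() delegates to handle_starttag().
        if tag == "img":
            self.images.append(dict(attrs))


def image_attributes(html):
    """Return one attribute dict per <img> tag found in html."""
    collector = AttrCollector()
    for candidate in IMG_CANDIDATE.findall(html):
        collector.feed(candidate)
    collector.close()
    return collector.images


if __name__ == "__main__":
    page = "<p><img alt='A description' src=pic.png id=img-512></p>"
    print(image_attributes(page))
```

The regex only has to locate the tags. Quoting styles and attribute order are left to the parser.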

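Here, though, the idea is just to give a relatively simple example of how to match attributes in any order, some of them optional. Each attribute is captured by its own lookahead, and because a lookahead consumes no input, all of them scan the tag from the same starting position, so the order in which the attributes appear does not matter. Note that the pattern only recognises double-quoted values with no whitespace around the "=" sign. It does not handle every possible attribute syntax, since an attribute can be written in many ways: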

  • src="image.png" (with double quotes).
  • style='background-image: url("../images/bg.png")' (with single quotes).
  • alt = "A description" (with spaces, tabs or even new lines around the "=" sign).
  • id=img-512 or id = img-512 (without quotes).
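
To make the variations listed above concrete, here is a hedged Python sketch of a more tolerant attribute pattern. It is not part of the original regex: ATTRIBUTE and parse_attributes are hypothetical names, and the set of characters allowed in names and unquoted values is an assumption, not a full implementation of the HTML syntax rules.

```python
import re

# Hypothetical pattern for a single attribute, accepting the quoting styles
# and spacing variations from the list above.
ATTRIBUTE = re.compile(
    r'''
    (?P<name>[^\s"'<>/=]+)        # attribute name
    \s*=\s*                       # equals sign, optional spaces, tabs, newlines
    (?:
        "(?P<dq>[^"]*)"           # double-quoted value
      | '(?P<sq>[^']*)'           # single-quoted value
      | (?P<bare>[^\s"'=<>`]+)    # unquoted value
    )
    ''',
    re.VERBOSE,
)


def parse_attributes(tag_body):
    """Map attribute names to values, whichever quoting style was used."""
    attrs = {}
    for m in ATTRIBUTE.finditer(tag_body):
        # Exactly one of the three value groups took part in the match.
        value = next(v for v in m.group("dq", "sq", "bare") if v is not None)
        attrs[m.group("name")] = value
    return attrs


if __name__ == "__main__":
    body = '''src="image.png" alt = "A description" id=img-512'''
    print(parse_attributes(body))
```

Attributes without a value (such as a bare "hidden") are skipped by this sketch, which is one more reason to hand real documents to a parser.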
Submitted by Patrick Janser - 2 years ago