Matches some number of a
's followed by the same number of b
's, usually written as a^nb^n
for all natural numbers n
.
For regular expressions as formally defined in formal language theory, this is impossible, but modern day regex engines can do far more than match just regular languages.
Note that even though this works with the Python flavor on regex101, this doesn't actually compile with Python's re
library, since their regex engine doesn't allow you to:
(?(1)\1)
requires \1
to exist, even though if it doesn't exist the pattern would just match nothing and keep parsing.Here is a short explanation on how the regex works:
The key part of the regex is this lookahead (?=a+((?(1)\1)b))
wrapped by ((?=...)a)*
. The lookahead skips through the a
's, and then group one ((?(1)\1)b))
matches either b
if group one doesn't exist (the first "iteration", where we iterate via the outer *), or group one and a b if it does exist.
The effect of this is that we iterate the * group until either we run out of b
's or we run out of a
's. After that, we just need to check that we have group one followed by the end of the string at the end, since those are the b
's. If we have too many a
's, we would be matching against a...b...b
, and if we have too many b
's we wouldn't hit the end of the string.
One small caveat is that if we have an empty string, group one doesn't exist, which we can remedy by either putting (?(1)\1)
or just wrapping everything in an optional group.