You want to match again from where the last pattern left off.
This is a useful approach to take when repeatedly extracting data in chunks from a string.
Use a combination of the /g match modifier, the \G pattern anchor, and the pos function.
If you use the /g modifier on a match in scalar context, the regular expression engine remembers the position in the string where that match ended. The next time you match the same string with /g, the engine starts looking for a match from this remembered position. This lets you use a while loop to extract the information you want from the string, one match per iteration.
while (/(\d+)/g) {
    print "Found $1\n";
}
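Here is a self-contained version of that loop. It matches against an explicit variable instead of $_. The sample string and the variable names are made up for illustration. The loop also records pos() after each match so you can watch the remembered position advance, and it shows that the position is gone once the final match fails.

```perl
use strict;
use warnings;

# Illustrative input; any string containing runs of digits would do.
my $text = "Order 17 shipped 3 boxes to bay 42";
my (@found, @positions);

# In scalar (boolean) context, each /g match resumes at pos($text).
while ($text =~ /(\d+)/g) {
    push @found,     $1;
    push @positions, pos($text);
    print "Found $1, pos is now ", pos($text), "\n";
}

# The final, failed match reset the position, so pos() is undef again.
print "pos after loop: ", (defined pos($text) ? pos($text) : "undef"), "\n";
```

The positions printed are offsets just past each number (8, 18 and 34 for this string), which is where the next search begins.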
You can also use \G in your pattern to anchor it to the end of the previous match. For example, if you had a number stored in a string with leading blanks, you could change each leading blank into the digit zero this way:
$n = "   49 here";
$n =~ s/\G /0/g;
print $n;
00049 here
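To see what \G contributes, compare the anchored substitution with an unanchored one on the same string. This is a small sketch with illustrative variable names. The third variant is an equivalent approach using /e, included only for comparison.

```perl
use strict;
use warnings;

my $n = "   49 here";

# \G pins each match to where the previous one ended (the start, at first),
# so only the unbroken run of leading blanks is replaced.
(my $anchored = $n) =~ s/\G /0/g;

# Without \G, every blank in the string is replaced.
(my $unanchored = $n) =~ s/ /0/g;

# An equivalent to the anchored version, using /e to build the zeros.
(my $via_e = $n) =~ s/^( +)/'0' x length($1)/e;

print "$anchored\n$unanchored\n$via_e\n";
```

This prints 00049 here, then 00049 here with the inner blank turned into a zero (00049 followed directly by 0here), then 00049 here again.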
You can also make good use of \G in a while loop. Here we use \G to parse a comma-separated list of numbers (e.g., "3,4,5,9,120"):
while (/\G,?(\d+)/g) {
    print "Found number $1\n";
}
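Anchoring with \G also makes the loop stop at the first thing that doesn't fit the pattern, instead of silently skipping over it. The numbers_from helper below is hypothetical. It runs the same loop with and without \G so you can compare the two on well-formed and malformed input.

```perl
use strict;
use warnings;

# Hypothetical helper: pull numbers out of a comma-separated list,
# with or without the \G anchor. $list is a fresh copy, so pos starts at 0.
sub numbers_from {
    my ($list, $anchored) = @_;
    my @nums;
    if ($anchored) {
        while ($list =~ /\G,?(\d+)/g) { push @nums, $1 }
    }
    else {
        while ($list =~ /,?(\d+)/g) { push @nums, $1 }
    }
    return @nums;
}

print join(" ", numbers_from("3,4,5,9,120", 1)), "\n";   # well-formed list
print join(" ", numbers_from("3,4;5,9", 1)), "\n";       # stops at the ';'
print join(" ", numbers_from("3,4;5,9", 0)), "\n";       # skips past the ';'
```

On the malformed input, the anchored version returns only 3 and 4, while the unanchored one finds 3, 4, 5 and 9. Which behavior you want depends on whether stray characters should be treated as an error.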
By default, when your match fails (when we run out of numbers in the examples, for instance), the remembered position is reset to the start. If you don't want this to happen, perhaps because you want to continue matching from that position but with a different pattern, use the /c modifier together with /g:
$_ = "The year 1752 lost 10 days on the 3rd of September";
while (/(\d+)/gc) {
    print "Found number $1\n";
}
if (/\G(\S+)/g) {
    print "Found $1 after the last number.\n";
}
Found number 1752
Found number 10
Found number 3
Found rd after the last number.
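The /gc combination is a common basis for a simple hand-written tokenizer. At each position you try several \G-anchored patterns in turn. Because of /c, a failed attempt leaves pos() where it was, so the next alternative starts from the same place. The token categories and the input string are assumptions chosen for illustration. This is a sketch, not a complete lexer.

```perl
use strict;
use warnings;

# Illustrative tokenizer; the token classes are assumptions for this sketch.
my $src = "x = 42 + y1";
my @tokens;

pos($src) = 0;                                   # start at the beginning
while (pos($src) < length $src) {
    if    ($src =~ /\G\s+/gc)            { next }                    # skip blanks
    elsif ($src =~ /\G(\d+)/gc)          { push @tokens, "NUM:$1" }
    elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "ID:$1" }
    elsif ($src =~ /\G([-+*\/=])/gc)     { push @tokens, "OP:$1" }
    else  { die "Unexpected character at offset ", pos($src), "\n" }
}

print "@tokens\n";
```

Without /c, the first failing alternative would reset pos($src). The next alternative would then start over from the beginning of the string instead of from the current token.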
As you can see, successive patterns can use /g on a string and in doing so change the location of the last successful match. The position of the last successful match is associated with the scalar being matched against, not with the pattern. Further, the position is not copied when you copy the string, nor saved if you use the ill-named local operator.
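Here is a short demonstration that the position belongs to the scalar. After one /g match, a copy of the string starts with no remembered position, so matching the copy begins again from the start while the original carries on. The variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "aaa bbb ccc";
$s =~ /(\w+)/g or die "no match";   # matches "aaa"; pos($s) is now 3
my $pos_after_first = pos($s);

my $copy = $s;                       # copies the characters, not the position
my $copy_has_pos = defined pos($copy);

# Each scalar resumes from its own remembered position.
$s =~ /(\w+)/g or die "no match";
my $next_in_s = $1;                  # "bbb"
$copy =~ /(\w+)/g or die "no match";
my $first_in_copy = $1;              # "aaa"

print "pos(\$s) after first match: $pos_after_first\n";
print "copy inherited a position: ", ($copy_has_pos ? "yes" : "no"), "\n";
print "next word in \$s: $next_in_s; first word in \$copy: $first_in_copy\n";
```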
The location of the last successful match can be read and set with the pos function, which takes as its argument the string whose position you want to get or set. If no argument is given, pos operates on $_:
print "The position in \$a is ", pos($a);
pos($a) = 30;
print "The position in \$_ is ", pos;
pos = 30;
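Because pos can be assigned, you can also jump to a position computed some other way and then resume matching there with \G. The record format below is invented for the example, and index merely locates the field to start from. The second half shows the no-argument form of pos working on $_.

```perl
use strict;
use warnings;

# Invented record format, used only for illustration.
my $record = "ID=0042;NAME=widget;QTY=7";

# Jump straight to the NAME field, then resume matching there with \G.
pos($record) = index($record, "NAME=");
$record =~ /\GNAME=([^;]*)/g or die "NAME field not found";
my $name     = $1;                       # "widget"
my $name_end = pos($record);             # offset just past "widget"

# With no argument, pos reads the position of $_.
$_ = "one two three";
pos($_) = 4;                             # same effect as: pos = 4;
/\G(\w+)/g or die "no word at offset 4";
my $word       = $1;                     # "two"
my $after_word = pos;                    # no argument, so this is pos($_)

print "name=$name (pos $name_end), word=$word (pos $after_word)\n";
```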
The /g modifier is discussed in perlre(1) and in the section "The rules of regular expression matching" in Chapter 2 of Programming Perl.