split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split
This function scans a string given by EXPR for delimiters and splits the string into a list of substrings, returning the resulting list value in list context or the count of substrings in scalar context. The delimiters are determined by repeated pattern matching, using the regular expression given in PATTERN, so the delimiters may be of any size and need not be the same string on every match. (The delimiters are not ordinarily returned, but see below.) If the PATTERN doesn't match at all, split returns the original string as a single substring. If it matches once, you get two substrings, and so on.
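A minimal sketch of the two return conventions and the no-match case, assuming Perl 5.12 or later (where scalar-context split simply returns the count). The input strings are made up for illustration.

```perl
use strict;
use warnings;

# List context: the substrings themselves.
my @parts = split /,/, "a,b,c";     # ("a", "b", "c")

# PATTERN never matches: the whole string comes back as one substring.
my @whole = split /;/, "a,b,c";     # ("a,b,c")

# Scalar context: just the number of substrings.
my $count = split /,/, "a,b,c";     # 3

print join('|', @parts), "\n";
print scalar(@whole), "\n";
print "$count\n";
```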
If LIMIT is specified and positive, the function splits into no more than that many fields (though it may split into fewer if it runs out of delimiters). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified, so trailing null fields are kept. If LIMIT is omitted or zero, trailing null fields are stripped from the result (which potential users of pop would do well to remember). If EXPR is omitted, the function splits the $_ string. If PATTERN is also omitted, the function splits on whitespace, /\s+/, after skipping any leading whitespace.
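The LIMIT rules and the no-argument form can be checked with a short sketch. The record and the value assigned to $_ are invented sample data; the arrays are assigned directly (not to a list of scalars), so no implicit LIMIT is involved.

```perl
use strict;
use warnings;

my $record = "a:b:c::";                  # ends with two empty fields

my @omitted  = split /:/, $record;       # ("a", "b", "c"): trailing empties stripped
my @zero     = split /:/, $record, 0;    # same as omitting LIMIT
my @negative = split /:/, $record, -1;   # ("a", "b", "c", "", ""): everything kept
my @limited  = split /:/, $record, 2;    # ("a", "b:c::"): at most two fields

# No PATTERN and no EXPR: split $_ on whitespace, skipping leading whitespace.
$_ = "  alpha  beta\tgamma\n";
my @words = split;                       # ("alpha", "beta", "gamma")
```

Note that the trailing newline in $_ produces a trailing empty field, which is then stripped because LIMIT was omitted.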
Strings of any length can be split:
@chars  = split //, $word;
@fields = split /:/, $line;
@words  = split ' ', $paragraph;
@lines  = split /^/m, $buffer;
A pattern capable of matching either the null string or something longer than the null string (for instance, a pattern consisting of any single character modified by a * or ?) will split the value of EXPR into separate characters wherever it is the null string that produces the match; non-null matches will skip over occurrences of the delimiter in the usual fashion. (In other words, a pattern won't match in one spot more than once, even if it matched with a zero width.) For example:
print join ':', split / */, 'hi there';
produces the output "h:i:t:h:e:r:e". The space disappears because it matched as part of the delimiter. As a trivial case, the null pattern // simply splits into separate characters (and spaces do not disappear).
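To see the contrast side by side, this sketch runs both patterns over the same string: / */ consumes the real space as a delimiter, while // only ever matches zero-width, so the space survives as a field of its own.

```perl
use strict;
use warnings;

my $text = 'hi there';

# Zero-width matches fall between letters; the space is eaten as a delimiter.
my $star = join ':', split / */, $text;

# The null pattern never consumes anything, so every character is a field.
my $null = join ':', split //, $text;

print "$star\n";
print "$null\n";
```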
The LIMIT parameter is used to split only part of a string:
($login, $passwd, $remainder) = split /:/, $_, 3;
We encourage you to split into lists of named variables like this in order to make your code self-documenting. (For purposes of error checking, note that $remainder would be undefined if there were fewer than three fields.) When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT one larger than the number of variables in the list, to avoid unnecessary work. For the split above, LIMIT would have been 4 by default, and $remainder would have received only the third field, not all the rest of the fields. In time-critical applications it behooves you not to split into more fields than you really need.
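A small sketch of the explicit versus implicit LIMIT when assigning to a list of scalars. The passwd-style line is made-up sample data.

```perl
use strict;
use warnings;

my $line = "root:x:0:0:root:/root:/bin/sh";

# Explicit LIMIT of 3: the last variable receives everything that is left.
my ($login, $passwd, $remainder) = split /:/, $line, 3;

# LIMIT omitted: Perl supplies 4 (three variables plus one), so the third
# variable gets only the third field and the rest is thrown away.
my ($user, $pw, $third) = split /:/, $line;

# Fewer fields than variables: the extra variables stay undefined.
my ($first, $second, $missing) = split /:/, "only:two", 3;

print "$remainder\n";
print "$third\n";
print defined $missing ? "defined\n" : "undef\n";
```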
We said earlier that the delimiters are not returned, but if the PATTERN contains parentheses, then the substring matched by each pair of parentheses is included in the resulting list, interspersed with the fields that are ordinarily returned. Here's a simple case:
split /([-,])/, "1-10,20";
produces the list value:
(1, '-', 10, ',', 20)
With more parentheses, a field is returned for each pair, even if some of the pairs don't match, in which case undefined values are returned in those positions. So if you say:
split /(-)|(,)/, "1-10,20";
you get the value:
(1, '-', undef, 10, undef, ',', 20)
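Capturing parentheses make split a handy tokenizer. This sketch keeps the operators of a (made-up) arithmetic expression, then reproduces the two-group case above and prints the undefined slots explicitly so they are visible.

```perl
use strict;
use warnings;

# One capturing group: each operator is returned between its operands.
my @tokens = split /([-+*\/])/, "12+3*4-5";

# Two alternated groups: the group that did not take part yields undef.
my @parts = split /(-)|(,)/, "1-10,20";
my $shown = join ' ', map { defined $_ ? "'$_'" : 'undef' } @parts;

print join('|', @tokens), "\n";
print "$shown\n";
```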
The /PATTERN/ argument may be replaced with an expression to specify patterns that vary at run time. (To do run-time compilation only once, use /$variable/o.) As a special case, specifying a space " " will split on whitespace just as split with no arguments does. Thus, split(" ") can be used to emulate awk's default behavior, whereas split(/ /) will give you as many null initial fields as there are leading spaces. (Other than this special case, if you supply a string instead of a regular expression, it'll be interpreted as a regular expression anyway.)
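The difference between the special ' ' form and a literal one-space regex shows up as soon as the input has leading or doubled spaces. The last part is an assumption beyond the text above: when the run-time pattern comes from a variable that might hold regex metacharacters, wrapping it in \Q...\E makes it match literally.

```perl
use strict;
use warnings;

my $input = "   leading and  doubled";

# awk-style: skip leading whitespace, any run of whitespace separates fields.
my @awk_style = split ' ', $input;   # ("leading", "and", "doubled")

# Literal single space: three empty leading fields, plus an empty field
# between the two adjacent spaces.
my @literal = split / /, $input;     # ("", "", "", "leading", "and", "", "doubled")

# Run-time pattern from a variable; \Q...\E quotes the "|" metacharacter.
my $sep = '|';
my @fields = split /\Q$sep\E/, "a|b|c";   # ("a", "b", "c")

print scalar(@awk_style), " ", scalar(@literal), "\n";
```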
The following example splits an RFC-822 message header into a hash containing $head{Date}, $head{Subject}, and so on. It uses the trick of assigning a list of pairs to a hash, based on the fact that delimiters alternate with delimited fields. It makes use of parentheses to return part of each delimiter as part of the returned list value. Since the split pattern is guaranteed to return things in pairs by virtue of containing one set of parentheses, the hash assignment is guaranteed to receive a list consisting of key/value pairs, where each key is the name of a header field. The extra key FRONTSTUFF pairs up with the first field split returns, which is whatever text precedes the first header field. (Unfortunately this technique loses information for multiple lines with the same key field, such as Received lines. Ah, well....)
$header =~ s/\n\s+/ /g;    # Merge continuation lines.
%head = ('FRONTSTUFF', split /^([-\w]+):/m, $header);
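Here is a runnable version of the header example, using a small made-up header with one folded Subject line. Each value keeps the whitespace that followed the colon and its trailing newline, so this sketch adds a trimming step of its own (not part of the original technique) before the values are inspected.

```perl
use strict;
use warnings;

# Invented sample header; "  demo" is a continuation of the Subject line.
my $header = <<'END_HEADER';
From sender
Date: Tue, 1 Jan 2002
Subject: split
  demo
To: you@example.com
END_HEADER

$header =~ s/\n\s+/ /g;    # Merge continuation lines.
my %head = ('FRONTSTUFF', split /^([-\w]+):/m, $header);

# Added step: values are aliases, so this trims them in place.
s/^\s+|\s+$//g for values %head;

print "$_ => $head{$_}\n" for sort keys %head;
```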
The following example processes the entries in a UNIX passwd file. You could leave out the chomp, in which case $shell would have a newline on the end of it.
open PASSWD, '/etc/passwd' or die "Can't open /etc/passwd: $!";
while (<PASSWD>) {
    chomp;    # remove trailing newline
    ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split /:/;
    ...
}
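The passwd loop can be tried without touching a real /etc/passwd. This sketch reads two invented records from an in-memory filehandle (available since Perl 5.8) and collects each user's shell; %shell_of is a hypothetical name chosen for illustration.

```perl
use strict;
use warnings;

# Made-up passwd-style records standing in for /etc/passwd.
my $data = "root:x:0:0:root:/root:/bin/sh\n"
         . "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";

my %shell_of;
while (my $line = <$fh>) {
    chomp $line;    # without this, $shell would end in "\n"
    my ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split /:/, $line;
    $shell_of{$login} = $shell;
}
close $fh;

print "$_: $shell_of{$_}\n" for sort keys %shell_of;
```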
The inverse of split is performed by join (except that join can only join with the same delimiter between all fields). To break apart a string with fixed-position fields, use unpack.
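A short sketch of both companions, with invented data. It also shows one catch when treating join and split as inverses: a trailing empty field does not survive the round trip unless split is given a negative LIMIT. The unpack template assumes a fixed layout of a 5-character name, a 3-character code, and the rest of the record.

```perl
use strict;
use warnings;

my $line = join ':', 'alice', 'x', '1000';   # "alice:x:1000"
my @back = split /:/, $line;                 # ("alice", "x", "1000")

my $joined = join ':', 'a', 'b', '';         # "a:b:"
my @lossy  = split /:/, $joined;             # ("a", "b"): empty field lost
my @exact  = split /:/, $joined, -1;         # ("a", "b", "")

# Fixed-position fields: unpack instead of split.
my ($name, $code, $rest) = unpack 'A5 A3 A*', "alice042rest";

print "$name $code $rest\n";
```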