You want to process a string one character at a time.
Use split with a null pattern to break up the string into individual characters, or use unpack if you just want their ASCII values:
@array = split(//, $string);
@array = unpack("C*", $string);
Or extract each character in turn with a loop:
while (/(.)/g) {    # . is never a newline here
    # do something with $1
}
As we said before, Perl's fundamental unit is the string, not the character. Needing to process anything a character at a time is rare. Usually some kind of higher-level Perl operation, like pattern matching, solves the problem more easily. See, for example, Recipe 7.7, where a set of substitutions is used to find command-line arguments.
Splitting on a pattern that matches the empty string returns a list of the individual characters in the string. This is a convenient feature when done intentionally, but it's easy to do unintentionally. For instance,
/X*/
matches the empty string. Odds are you will find others when you don't mean to.
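To make the pitfall concrete, here is a minimal sketch contrasting the two quantifiers. The sample string and the variable names are our own, chosen only for illustration:

```perl
use strict;
use warnings;

# Intended: split on runs of "X". But X* also matches the empty string,
# so every boundary between characters becomes a split point.
my @accidental = split /X*/, "abXXcd";

# Requiring at least one X gives the result that was probably meant.
my @intended = split /X+/, "abXXcd";

print join("|", @accidental), "\n";   # a|b|c|d
print join("|", @intended), "\n";     # ab|cd
```

If a split returns far more fields than you expected, checking whether the pattern can match zero characters is a reasonable first step.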
Here's an example that prints the characters used in the string "an apple a day", sorted in ascending ASCII order:
%seen = ();
$string = "an apple a day";
foreach $byte (split //, $string) {
    $seen{$byte}++;
}
print "unique chars are: ", sort(keys %seen), "\n";
unique chars are: adelnpy
These split and unpack solutions give you an array of characters to work with. If you don't want an array, you can use a pattern match with the /g flag in a while loop, extracting one character at a time:
%seen = ();
$string = "an apple a day";
while ($string =~ /(.)/g) {
    $seen{$1}++;
}
print "unique chars are: ", sort(keys %seen), "\n";
unique chars are: adelnpy
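The two approaches are not quite interchangeable when the string contains newlines, because . does not match a newline unless the /s modifier is given. The following sketch, using a made-up three-character string, shows the difference:

```perl
use strict;
use warnings;

my $string = "a\nb";

my @via_split = split //, $string;    # every character, newline included
my @via_dot   = $string =~ /(.)/g;    # . skips the newline
my @via_dot_s = $string =~ /(.)/gs;   # /s lets . match the newline too

print scalar(@via_split), "\n";   # 3
print scalar(@via_dot), "\n";     # 2
print scalar(@via_dot_s), "\n";   # 3
```

Add /s to the pattern if you need the loop to see every character that split would return.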
In general, if you find yourself doing character-by-character processing, there's probably a better way to go about it. Instead of using index and substr or split and unpack, it might be easier to use a pattern. Likewise, rather than computing a 32-bit checksum by hand with a loop, you can let the unpack function compute it far more efficiently.
The following example calculates the checksum of $string with a foreach loop. There are better checksums; this just happens to be the basis of a traditional and computationally easy checksum. See the MD5 module from CPAN if you want a more sound checksum.
$sum = 0;
foreach $ascval (unpack("C*", $string)) {
    $sum += $ascval;
}
print "sum is $sum\n";
# prints "sum is 1248" if $string was "an apple a day"
This does the same thing, but much faster:
$sum = unpack("%32C*", $string);
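As a quick check that the two forms agree, this sketch runs both on the sample string used earlier. The variable names $by_hand and $by_unpack are ours:

```perl
use strict;
use warnings;

my $string = "an apple a day";

# Sum the character values one at a time.
my $by_hand = 0;
$by_hand += $_ for unpack("C*", $string);

# Let unpack compute the same sum, kept to 32 bits by the %32 prefix.
my $by_unpack = unpack("%32C*", $string);

print "$by_hand $by_unpack\n";   # 1248 1248
```

The number after % is the width in bits of the checksum, so for inputs whose sum exceeds that width the unpack result wraps around while the hand-rolled loop keeps growing.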
This lets us emulate the SysV checksum program:
#!/usr/bin/perl
# sum - compute 16-bit checksum of all input files
$checksum = 0;
while (<>) { $checksum += unpack("%16C*", $_) }
$checksum %= (2 ** 16) - 1;
print "$checksum\n";
Here's an example of its use:
% perl sum /etc/termcap
1510
If you have the GNU version of sum, you'll need to call it with the --sysv option to get the same answer on the same file.
% sum --sysv /etc/termcap
1510 851 /etc/termcap
Another tiny program that processes its input one character at a time is slowcat, shown in Example 1.1. The idea here is to pause after each character is printed so you can scroll text before an audience slowly enough that they can read it.
#!/usr/bin/perl
# slowcat - emulate a   s l o w   line printer
# usage: slowcat [-DELAY] [files ...]
$DELAY = ($ARGV[0] =~ /^-([.\d]+)/) ? (shift, $1) : 1;
$| = 1;
while (<>) {
    for (split(//)) {
        print;
        select(undef,undef,undef, 0.005 * $DELAY);
    }
}
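The $DELAY assignment in slowcat packs argument parsing into one expression: in scalar context, (shift, $1) removes the option from @ARGV and then yields the captured number. A more explicit sketch of the same logic follows. The parse_delay helper and the sample argument list are hypothetical, not part of the original program:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to an argument list, removes a
# leading -DELAY option if there is one, and returns the delay factor
# (defaulting to 1 when no option is given).
sub parse_delay {
    my ($args) = @_;
    if (@$args && $args->[0] =~ /^-([.\d]+)/) {
        my $delay = $1;
        shift @$args;
        return $delay;
    }
    return 1;
}

my @args  = ("-2.5", "notes.txt");
my $delay = parse_delay(\@args);
print "delay=$delay files=@args\n";   # delay=2.5 files=notes.txt
```

The one-line form is more compact; the longer form also avoids an uninitialized-value warning when the argument list is empty and warnings are enabled.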
The split and unpack functions in perlfunc(1) and Chapter 3 of Programming Perl; the use of select for timing is explained in Recipe 3.10.
Copyright © 2001 O'Reilly & Associates. All rights reserved.