Perl Regular Expressions

Match Operator Modifiers

Modifier  Description
i         Makes the match case insensitive.
m         Treats the string as multiple lines: ^ and $ match at newline boundaries inside the string, not only at the string boundaries.
o         Compiles the pattern (including any interpolated variables) only once.
s         Allows use of . to match a newline character.
x         Allows you to use white space and comments in the expression for clarity.
g         Globally finds all matches.
cg        Keeps the current search position after a global match fails, so the search can continue from there.
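
The modifiers above can be combined on a single match. The sketch below is only an illustration; the sample text and variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "Perl is fun.\nperl is terse.";

# /i ignores case; /g in list context returns every match.
my @hits  = $text =~ /perl/gi;
my $count = scalar @hits;

# /m lets ^ match at the start of each line, not only the start of the string.
my @starts = $text =~ /^(\w+)/gm;

# Without /s the dot will not cross the newline; with /s it will.
my $no_s   = $text =~ /fun\..perl/  ? 1 : 0;
my $with_s = $text =~ /fun\..perl/s ? 1 : 0;

# /x allows white space and comments inside the pattern.
my ($word) = $text =~ / (\w+)   # capture a word
                        \.      # that is followed by a literal dot
                      /x;

print "count=$count starts=@starts no_s=$no_s with_s=$with_s word=$word\n";
# prints: count=2 starts=Perl perl no_s=0 with_s=1 word=fun
```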

Substitution Operator Modifiers

Modifier  Description
i         Makes the match case insensitive.
m         Treats the string as multiple lines: ^ and $ match at newline boundaries inside the string, not only at the string boundaries.
o         Compiles the pattern (including any interpolated variables) only once.
s         Allows use of . to match a newline character.
x         Allows you to use white space and comments in the expression for clarity.
g         Replaces all occurrences of the found expression with the replacement text.
e         Evaluates the replacement as a Perl expression and uses its return value as the replacement text.
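
A short sketch of the g and e modifiers working together; the price list is invented for the example.

```perl
use strict;
use warnings;

my $prices = "apple 3, pear 5, plum 10";

# /g replaces every match; /e evaluates the replacement as Perl code.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# Without /g only the first match is replaced.
(my $first_only = $prices) =~ s/(\d+)/$1 * 2/e;

print "$doubled\n";      # apple 6, pear 10, plum 20
print "$first_only\n";   # apple 6, pear 5, plum 10
```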

Translation Operator Modifiers

Modifier  Description
c         Complement SEARCHLIST.
d         Delete found but unreplaced characters.
s         Squash duplicate replaced characters.

More complex regular expressions

Pattern       Description
^             Matches beginning of string (or of each line with the m option).
$             Matches end of string, or just before a final newline (end of each line with the m option).
.             Matches any single character except newline. Using the s option allows it to match newline as well.
[...]         Matches any single character in brackets.
[^...]        Matches any single character not in brackets.
*             Matches 0 or more occurrences of preceding expression.
+             Matches 1 or more occurrences of preceding expression.
?             Matches 0 or 1 occurrence of preceding expression.
{n}           Matches exactly n occurrences of preceding expression.
{n,}          Matches n or more occurrences of preceding expression.
{n,m}         Matches at least n and at most m occurrences of preceding expression.
a|b           Matches either a or b.
\w            Matches word characters.
\W            Matches nonword characters.
\s            Matches whitespace. Equivalent to [ \t\n\r\f].
\S            Matches nonwhitespace.
\d            Matches digits. Equivalent to [0-9].
\D            Matches nondigits.
\A            Matches beginning of string.
\Z            Matches end of string. If a newline exists, it matches just before newline.
\z            Matches end of string.
\G            Matches point where last match finished.
\b            Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
\B            Matches nonword boundaries.
\n, \t, etc.  Matches newlines, carriage returns, tabs, etc.
\1...\9       Matches nth grouped subexpression.
\10           Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code.
[aeiou]       Matches a single character in the given set.
[^aeiou]      Matches a single character outside the given set.

Literal characters:

Example  Description
Perl     Match "Perl".

Character classes:

Example       Description
[Pp]ython     Match "Python" or "python"
rub[ye]       Match "ruby" or "rube"
[aeiou]       Match any one lowercase vowel
[0-9]         Match any digit; same as [0123456789]
[a-z]         Match any lowercase ASCII letter
[A-Z]         Match any uppercase ASCII letter
[a-zA-Z0-9]   Match any of the above
[^aeiou]      Match anything other than a lowercase vowel
[^0-9]        Match anything other than a digit

Special Character Classes:

Example  Description
.        Match any character except newline
\d       Match a digit: [0-9]
\D       Match a nondigit: [^0-9]
\s       Match a whitespace character: [ \t\r\n\f]
\S       Match nonwhitespace: [^ \t\r\n\f]
\w       Match a single word character: [A-Za-z0-9_]
\W       Match a nonword character: [^A-Za-z0-9_]

Repetition Cases:

Example   Description
ruby?     Match "rub" or "ruby": the y is optional
ruby*     Match "rub" plus 0 or more ys
ruby+     Match "rub" plus 1 or more ys
\d{3}     Match exactly 3 digits
\d{3,}    Match 3 or more digits
\d{3,5}   Match 3, 4, or 5 digits

Nongreedy repetition:

Example  Description
<.*>     Greedy repetition: matches "<python>perl>"
<.*?>    Nongreedy: matches "<python>" in "<python>perl>"
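
The difference is easy to check by capturing what each pattern matches. A minimal sketch using the same sample string:

```perl
use strict;
use warnings;

my $html = "<python>perl>";

# Greedy: .* grabs as much as it can, so the match runs to the last ">".
my ($greedy) = $html =~ /(<.*>)/;

# Nongreedy: .*? stops at the first ">" that lets the pattern succeed.
my ($lazy) = $html =~ /(<.*?>)/;

print "Greedy: $greedy\n";   # Greedy: <python>perl>
print "Lazy:   $lazy\n";     # Lazy:   <python>
```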

Grouping with parentheses:

Example             Description
\D\d+               No group: + repeats \d
(\D\d)+             Grouped: + repeats \D\d pair
([Pp]ython(, )?)+   Match "Python", "Python, python, python", etc.

Backreferences:

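Example              Description
([Pp])ython&\1ails   Match python&pails or Python&Pails
(['"]).*?\1          Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc. A backreference has no special meaning inside brackets, so [^\1] would not exclude the opening quote; a nongreedy .*? is used instead.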
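
As an illustration of \1 matching "whatever the 1st group matched", the sketch below pulls quoted strings out of a line. The pattern (['"])(.*?)\1 and the sample line are assumptions chosen for the example; the second group holds the text between the quotes.

```perl
use strict;
use warnings;

my $line = q{say "it's here" and 'done'};

my @strings;
# \1 must repeat the same quote character that opened the string,
# so the apostrophe in "it's" does not end the double-quoted string.
while ($line =~ /(['"])(.*?)\1/g) {
    push @strings, $2;
}

print join(" | ", @strings), "\n";   # it's here | done
```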

Alternatives:

Example         Description
python|perl     Match "python" or "perl"
rub(y|le)       Match "ruby" or "ruble"
Python(!+|\?)   "Python" followed by one or more ! or one ?

Anchors:

Example       Description
^Python       Match "Python" at the start of a string, or of an internal line with the m option
Python$       Match "Python" at the end of a string, or of a line with the m option
\APython      Match "Python" at the start of a string
Python\Z      Match "Python" at the end of a string, or just before a final newline
\bPython\b    Match "Python" at a word boundary
\brub\B       \B is nonword boundary: match "rub" in "rube" and "ruby" but not alone
Python(?=!)   Match "Python", if followed by an exclamation point
Python(?!!)   Match "Python", if not followed by an exclamation point
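
A few of these anchors can be tried out directly. This is a sketch on made-up strings; note that the lookahead assertions (?=!) and (?!!) test the next character without consuming it.

```perl
use strict;
use warnings;

my $s = "Python! Python? Python";

# (?=!) requires a following "!"; (?!!) requires that no "!" follows.
my @excited = $s =~ /Python(?=!)/g;
my @calm    = $s =~ /Python(?!!)/g;

# \A anchors at the very start of the string.
my $at_start = "Python rocks" =~ /\APython\b/ ? 1 : 0;

# \brub\B needs a word character right after "rub".
my $in_ruby = "ruby" =~ /\brub\B/ ? 1 : 0;
my $alone   = "rub"  =~ /\brub\B/ ? 1 : 0;

print scalar(@excited), " ", scalar(@calm), " $at_start $in_ruby $alone\n";
# prints: 1 2 1 1 0
```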

Special syntax with parentheses:

Example        Description
R(?#comment)   Matches "R". All the rest is a comment
R(?i)uby       Case-insensitive while matching "uby"
R(?i:uby)      Same as above
rub(?:y|le)    Group only without creating \1 backreference
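
To see that (?:...) groups without capturing, and that (?i:...) limits case-insensitivity to part of the pattern, consider this small sketch. The test strings and the trailing (s?) group are made up for illustration.

```perl
use strict;
use warnings;

# (?:y|le) groups the alternatives but does not create a capture,
# so the first capture group is the trailing (s?).
my ($plural) = "5 rubles" =~ /rub(?:y|le)(s?)/;

# Only "uby" is matched case-insensitively; the leading R must be uppercase.
my $upper_r = "RUBY" =~ /R(?i:uby)/ ? 1 : 0;
my $lower_r = "rUBY" =~ /R(?i:uby)/ ? 1 : 0;

print "plural='$plural' upper_r=$upper_r lower_r=$lower_r\n";
# prints: plural='s' upper_r=1 lower_r=0
```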

The Match Operator

#!/usr/bin/perl

$bar = "This is foo and again foo";
if ($bar =~ /foo/){
   print "First time is matching\n";
}else{
   print "First time is not matching\n";
}

$bar = "foo";
if ($bar =~ /foo/){
   print "Second time is matching\n";
}else{
   print "Second time is not matching\n";
}

#!/usr/bin/perl

$bar = "This is foo and again foo";
if ($bar =~ m[foo]){
   print "First time is matching\n";
}else{
   print "First time is not matching\n";
}

$bar = "foo";
if ($bar =~ m{foo}){
   print "Second time is matching\n";
}else{
   print "Second time is not matching\n";
}

$true = ($foo =~ m/foo/);

my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);

Matching Only Once

#!/usr/bin/perl

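@list = qw/food foosball subeo footnote terfoot canic footbridge/;

foreach (@list)
{
   # m?...? matches only once until reset() is called; the bare
   # ?...? form without the leading m was removed in Perl 5.22
   $first = $1 if m?(foo.*)?;
   $last = $1 if /(foo.*)/;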
}
print "First: $first, Last: $last\n";

Regular Expression Variables

#!/usr/bin/perl

$string = "The food is in the salad bar";
$string =~ m/foo/;
print "Before: $`\n";
print "Matched: $&\n";
print "After: $'\n";
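
On older Perl versions, using $`, $& and $' anywhere in a program could slow down every regular expression in it. As an alternative sketch, assuming Perl 5.10 or later, the p modifier makes the same three pieces available in ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}:

```perl
use strict;
use warnings;

my $string = "The food is in the salad bar";
my ($before, $matched, $after) = ('', '', '');

# /p preserves the pre-match, match and post-match text for this match only.
if ($string =~ /foo/p) {
    ($before, $matched, $after) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "Before: $before\n";    # Before: The 
print "Matched: $matched\n";  # Matched: foo
print "After: $after\n";      # After: d is in the salad bar
```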

The Substitution Operator

s/PATTERN/REPLACEMENT/;

#!/usr/bin/perl

$string = "The cat sat on the mat";
$string =~ s/cat/dog/;

print "$string\n";
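
The substitution operator also returns the number of replacements it made. The sketch below shows that, plus the r modifier, which is not covered in the tables above: assuming Perl 5.14 or later, it returns the modified copy and leaves the original untouched. The sample strings are invented.

```perl
use strict;
use warnings;

my $string = "The cat sat on the mat with another cat";

# In scalar context s///g returns how many substitutions were made.
my $count = ($string =~ s/cat/dog/g);

# With /r (Perl 5.14+) the original is left alone and the new string is returned.
my $original = "The cat sat";
my $copy     = $original =~ s/cat/dog/r;

print "$count: $string\n";      # 2: The dog sat on the mat with another dog
print "$original -> $copy\n";   # The cat sat -> The dog sat
```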

The Translation Operator

tr/SEARCHLIST/REPLACEMENTLIST/cds
y/SEARCHLIST/REPLACEMENTLIST/cds

#!/usr/bin/perl

$string = 'The cat sat on the mat';
$string =~ tr/a/o/;

print "$string\n";

$string =~ tr/a-z/A-Z/;

Translation Operator Modifiers

#!/usr/bin/perl 

$string = 'the cat sat on the mat.';
$string =~ tr/a-z/b/d;

print "$string\n";

#!/usr/bin/perl

$string = 'food';
$string =~ tr/a-z/a-z/s;

print "$string\n";
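
The c modifier is not shown in the scripts above. A minimal sketch of it, together with two common idioms that rely on an empty REPLACEMENTLIST (counting characters and squashing runs); the sample strings are assumptions:

```perl
use strict;
use warnings;

my $string = 'the cat sat on the mat.';

# With an empty replacement list and no modifiers, tr only counts matches.
my $vowels = ($string =~ tr/aeiou//);

# /c complements the search list and /d deletes: remove everything
# that is NOT a lowercase letter.
(my $letters_only = $string) =~ tr/a-z//cd;

# /s squashes runs of the same character into one.
(my $squashed = 'bookkeeper') =~ tr/a-z//s;

print "$vowels $letters_only $squashed\n";
# prints: 6 thecatsatonthemat bokeper
```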

More complex regular expressions

# nothing in the string (start and end are adjacent)
/^$/   

# three digits, each followed by a whitespace
# character (eg "3 4 5 ")
/(\d\s){3}/  

# matches a string in which every
# odd-numbered letter is a (eg "abacadaf");
# the ^ and $ anchors make the pattern cover the whole string
/^(a.)+$/  

# string starts with one or more digits
/^\d+/

# string that ends with one or more digits
/\d+$/

#!/usr/bin/perl

$string = "Cats go Catatonic\nWhen given Catnip";
($start) = ($string =~ /\A(.*?) /);
@lines = $string =~ /^(.*?) /gm;
print "First word: $start\n","Line starts: @lines\n";

Matching Boundaries

/\bcat\b/ # Matches 'the cat sat' but not 'catatonic'
/\Bcat\B/ # Matches 'verification' but not 'the cat on the mat'
/\bcat\B/ # Matches 'catatonic' but not 'polecat'
/\Bcat\b/ # Matches 'polecat' but not 'catatonic'
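
The four boundary patterns can be checked against a small word list. This is only a sketch; the list is made up so that each pattern selects exactly one entry.

```perl
use strict;
use warnings;

my @tests = ('the cat sat', 'catatonic', 'polecat', 'verification');

my @whole  = grep { /\bcat\b/ } @tests;   # "cat" as a separate word
my @inside = grep { /\Bcat\B/ } @tests;   # "cat" strictly inside a word
my @prefix = grep { /\bcat\B/ } @tests;   # "cat" at the start of a longer word
my @suffix = grep { /\Bcat\b/ } @tests;   # "cat" at the end of a longer word

print "whole: @whole\n";     # whole: the cat sat
print "inside: @inside\n";   # inside: verification
print "prefix: @prefix\n";   # prefix: catatonic
print "suffix: @suffix\n";   # suffix: polecat
```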

Selecting Alternatives

if ($string =~ /cat|dog/)

if (($string =~ /Martin Brown/) ||  ($string =~ /Sharon Brown/))

This could be written as follows:

if ($string =~ /(Martin|Sharon) Brown/)

Grouping Matching

# captures the two words in $1 and $2
$string =~ /(\S+)\s+(\S+)/;

and 

# matches the same text but captures nothing
$string =~ /\S+\s+\S+/;

my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);

#!/usr/bin/perl

$time = "12:05:30";

$time =~ m/(\d+):(\d+):(\d+)/;
my ($hours, $minutes, $seconds) = ($1, $2, $3);

print "Hours : $hours, Minutes: $minutes, Seconds: $seconds\n";

#!/usr/bin/perl

$date = '03/26/1999';
$date =~ s#(\d+)/(\d+)/(\d+)#$3/$1/$2#;

print "$date\n";

The \G Assertion

#!/usr/bin/perl

$string = "The time is: 12:31:02 on 4/12/00";

$string =~ /:\s+/g;
($time) = ($string =~ /\G(\d+:\d+:\d+)/);
$string =~ /.+\s+/g;
($date) = ($string =~ m{\G(\d+/\d+/\d+)});

print "Time: $time, Date: $date\n";
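
\G is often paired with the cg modifiers to walk through a string one token at a time: each pattern is tried at the current position, and a failed attempt leaves that position where it was. The tokenizer below is a hypothetical sketch; the token names NUM, ID and OP and the input string are invented for the example.

```perl
use strict;
use warnings;

my $input = "x = 42 + y";
my @tokens;

for ($input) {                       # alias $_ to $input so pos() persists
    while (1) {
        if    (/\G\s+/gc)     { next }                       # skip white space
        elsif (/\G(\d+)/gc)   { push @tokens, "NUM:$1" }
        elsif (/\G(\w+)/gc)   { push @tokens, "ID:$1" }
        elsif (/\G([=+])/gc)  { push @tokens, "OP:$1" }
        else                  { last }                       # nothing matches: stop
    }
}

print "@tokens\n";   # ID:x OP:= NUM:42 OP:+ ID:y
```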