Perl Regular Expressions
Match Operator Modifiers
Modifier | Description |
---|---|
i | Makes the match case insensitive |
m | Treats the string as multiple lines: ^ and $ match at the start and end of each line (next to an embedded newline), not only at the start and end of the string |
o | Interpolates variables in the pattern only once, the first time the expression is evaluated |
s | Allows use of . to match a newline character |
x | Allows you to use white space in the expression for clarity |
g | Globally finds all matches |
cg | Allows the search to continue even after a global match fails |
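As a rough illustration of the table above, the following sketch shows how a few of the modifiers change a match. The sample text and the helper name `count_matches` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often a word occurs, ignoring case.
# /g in list context returns every match; /i ignores case.
sub count_matches {
    my ($text, $word) = @_;
    my @found = $text =~ /\Q$word\E/gi;
    return scalar @found;
}

my $text = "Foo one\nfoo two\nFOO three";

print count_matches($text, "foo"), "\n";           # 3

# Without /m, ^ only matches at the start of the string.
my @plain = $text =~ /^(\w+)/g;
# With /m, ^ also matches right after each embedded newline.
my @multi = $text =~ /^(\w+)/gm;
print scalar(@plain), " ", scalar(@multi), "\n";   # 1 3

# /s lets . match the newline; /x ignores the spaces inside the pattern.
print "spans lines\n" if $text =~ / one . foo /sx;
```

The `\Q...\E` escape is used so that the word is matched literally even if it contains regex metacharacters.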
Substitution Operator Modifiers
Modifier | Description |
---|---|
i | Makes the match case insensitive |
m | Treats the string as multiple lines: ^ and $ match at the start and end of each line (next to an embedded newline), not only at the start and end of the string |
o | Interpolates variables in the pattern only once, the first time the expression is evaluated |
s | Allows use of . to match a newline character |
x | Allows you to use white space in the expression for clarity |
g | Replaces all occurrences of the found expression with the replacement text |
e | Evaluates the replacement as if it were a Perl statement, and uses its return value as the replacement text |
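The g, i and e modifiers are often combined. Here is a minimal sketch; the helper names `double_numbers` and `replace_word` are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: double every integer found in a string.
# /g replaces every occurrence; /e evaluates the replacement as Perl code.
sub double_numbers {
    my ($s) = @_;
    $s =~ s/(\d+)/$1 * 2/ge;
    return $s;
}

# Hypothetical helper: replace a whole word everywhere, ignoring case.
sub replace_word {
    my ($s, $from, $to) = @_;
    $s =~ s/\b\Q$from\E\b/$to/gi;
    return $s;
}

print double_numbers("3 apples and 10 pears"), "\n";          # 6 apples and 20 pears
print replace_word("Cat, cat, concat", "cat", "dog"), "\n";   # dog, dog, concat
```

Both helpers work on a copy of the argument, so the caller's string is left unchanged.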
Translation Operator Modifiers
Modifier | Description |
---|---|
c | Complement SEARCHLIST. |
d | Delete found but unreplaced characters. |
s | Squash duplicate replaced characters. |
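A short sketch of the three translation modifiers in use. The helper names are hypothetical; the behaviour relied on is that tr returns the number of characters it matched and that an empty REPLACEMENTLIST (without d) leaves the characters unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lowercase vowels: with an empty replacement list and no modifiers,
# tr changes nothing and just returns the number of matching characters.
sub count_vowels {
    my ($s) = @_;
    return ($s =~ tr/aeiou//);
}

# c complements the search list (everything that is NOT a digit),
# d deletes those characters because they have no replacement.
sub strip_non_digits {
    my ($s) = @_;
    $s =~ tr/0-9//cd;
    return $s;
}

# s squashes each run of repeated spaces into a single space.
sub squeeze_spaces {
    my ($s) = @_;
    $s =~ tr/ //s;
    return $s;
}

print count_vowels("The cat sat on the mat"), "\n";   # 6
print strip_non_digits("(555) 123-4567"), "\n";       # 5551234567
print squeeze_spaces("a   b  c"), "\n";               # a b c
```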
More complex regular expressions
Pattern | Description |
---|---|
^ | Matches beginning of line. |
$ | Matches end of line. |
. | Matches any single character except newline. Using the s modifier allows it to match newline as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets |
* | Matches 0 or more occurrences of preceding expression. |
+ | Matches 1 or more occurrences of preceding expression. |
? | Matches 0 or 1 occurrence of preceding expression. |
{n} | Matches exactly n occurrences of preceding expression. |
{n,} | Matches n or more occurrences of preceding expression. |
{n,m} | Matches at least n and at most m occurrences of preceding expression. |
a|b | Matches either a or b. |
\w | Matches word characters. |
\W | Matches nonword characters. |
\s | Matches whitespace. Equivalent to [ \t\n\r\f]. |
\S | Matches nonwhitespace. |
\d | Matches digits. Equivalent to [0-9]. |
\D | Matches nondigits. |
\A | Matches beginning of string. |
\Z | Matches end of string. If a newline exists, it matches just before newline. |
\z | Matches end of string. |
\G | Matches point where last match finished. |
\b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
\B | Matches nonword boundaries. |
\n, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\1...\9 | Matches nth grouped subexpression. |
\10 | Matches the tenth grouped subexpression if the pattern has at least ten groups. Otherwise refers to the octal representation of a character code. |
[aeiou] | Matches a single character in the given set |
[^aeiou] | Matches a single character outside the given set |
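The end-of-string anchors in the table ($, \Z and \z) differ only when the string ends in a newline. The sketch below uses a made-up helper, `end_anchors`, to show which of them allow a match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report which end anchors let "end" match.
sub end_anchors {
    my ($s) = @_;
    my @hits;
    push @hits, '$'  if $s =~ /end$/;    # very end, or just before a final newline
    push @hits, '\Z' if $s =~ /end\Z/;   # same positions as $ when /m is not used
    push @hits, '\z' if $s =~ /end\z/;   # only the very end of the string
    return join ' ', @hits;
}

print end_anchors("the end"),   "\n";    # $ \Z \z
print end_anchors("the end\n"), "\n";    # $ \Z
```

When a trailing newline must not be accepted, for example while validating input, \z is the anchor to reach for.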
Literal characters:
Example | Description |
---|---|
Perl | Match "Perl". |
Character classes:
Example | Description |
---|---|
[Pp]ython | Match "Python" or "python" |
rub[ye] | Match "ruby" or "rube" |
[aeiou] | Match any one lowercase vowel |
[0-9] | Match any digit; same as [0123456789] |
[a-z] | Match any lowercase ASCII letter |
[A-Z] | Match any uppercase ASCII letter |
[a-zA-Z0-9] | Match any of the above |
[^aeiou] | Match anything other than a lowercase vowel |
[^0-9] | Match anything other than a digit |
Special Character Classes:
Example | Description |
---|---|
. | Match any character except newline |
\d | Match a digit: [0-9] |
\D | Match a nondigit: [^0-9] |
\s | Match a whitespace character: [ \t\r\n\f] |
\S | Match nonwhitespace: [^ \t\r\n\f] |
\w | Match a single word character: [A-Za-z0-9_] |
\W | Match a nonword character: [^A-Za-z0-9_] |
Repetition Cases:
Example | Description |
---|---|
ruby? | Match "rub" or "ruby": the y is optional |
ruby* | Match "rub" plus 0 or more ys |
ruby+ | Match "rub" plus 1 or more ys |
\d{3} | Match exactly 3 digits |
\d{3,} | Match 3 or more digits |
\d{3,5} | Match 3, 4, or 5 digits |
Nongreedy repetition:
Example | Description |
---|---|
<.*> | Greedy repetition: matches "<python>perl>" |
<.*?> | Nongreedy: matches "<python>" in "<python>perl>" |
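The two rows above can be checked directly. In this sketch the helper names `greedy_tag` and `lazy_tag` are hypothetical; each returns the text its pattern captured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Greedy: .* grabs as much as possible, then backs off to the last ">".
sub greedy_tag {
    my ($s) = @_;
    my ($match) = $s =~ /(<.*>)/;
    return $match;
}

# Nongreedy: .*? grabs as little as possible, stopping at the first ">".
sub lazy_tag {
    my ($s) = @_;
    my ($match) = $s =~ /(<.*?>)/;
    return $match;
}

print greedy_tag("<python>perl>"), "\n";   # <python>perl>
print lazy_tag("<python>perl>"),   "\n";   # <python>
```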
Grouping with parentheses:
Example | Description |
---|---|
\D\d+ | No group: + repeats \d |
(\D\d)+ | Grouped: + repeats \D\d pair |
([Pp]ython(, )?)+ | Match "Python", "Python, python, python", etc. |
Backreferences:
Example | Description |
---|---|
([Pp])ython&\1ails | Match python&pails or Python&Pails |
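(['"]).*?\1 | Single- or double-quoted string. \1 matches whatever the 1st group matched; \2 matches whatever the 2nd group matched, etc. A backreference has no special meaning inside brackets, so [^\1] cannot be used to mean "anything but the opening quote". |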
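Two common uses of backreferences, sketched with hypothetical helper names: finding a doubled word, and extracting the contents of a quoted string whose closing quote must be the same character as the opening one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first word that is immediately repeated, or undef.
# \1 must match the same text that group 1 captured.
sub doubled_word {
    my ($s) = @_;
    return $s =~ /\b(\w+)\s+\1\b/ ? $1 : undef;
}

# Return the text inside the first quoted string, or undef.
# Group 1 remembers which quote opened the string; \1 requires the same one to close it.
sub quoted_text {
    my ($s) = @_;
    return $s =~ /(['"])(.*?)\1/ ? $2 : undef;
}

print doubled_word("This is is a test"), "\n";       # is
print quoted_text(q{say "it's fine" now}), "\n";     # it's fine
```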
Alternatives:
Example | Description |
---|---|
python|perl | Match "python" or "perl" |
rub(y|le) | Match "ruby" or "ruble" |
Python(!+|\?) | "Python" followed by one or more ! or one ? |
Anchors:
Example | Description |
---|---|
^Python | Match "Python" at the start of a string, or at the start of any internal line when the m modifier is used |
Python$ | Match "Python" at the end of a string, or at the end of any line when the m modifier is used |
\APython | Match "Python" at the start of a string |
Python\Z | Match "Python" at the end of a string |
\bPython\b | Match "Python" at a word boundary |
\brub\B | \B is nonword boundary: match "rub" in "rube" and "ruby" but not alone |
Python(?=!) | Match "Python", if followed by an exclamation point |
Python(?!!) | Match "Python", if not followed by an exclamation point |
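The last two rows are lookahead assertions: they test what follows without making it part of the match. A minimal sketch, with invented helper names and sample text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count "Python" only where an exclamation point follows.
sub count_followed_by_bang {
    my ($s) = @_;
    my @hits = $s =~ /Python(?=!)/g;
    return scalar @hits;
}

# Count "Python" only where no exclamation point follows.
sub count_not_followed_by_bang {
    my ($s) = @_;
    my @hits = $s =~ /Python(?!!)/g;
    return scalar @hits;
}

# The lookahead is not consumed, so the "!" survives the substitution.
sub swap_excited {
    my ($s) = @_;
    $s =~ s/Python(?=!)/Perl/g;
    return $s;
}

my $text = "Python! Python? Python!";
print count_followed_by_bang($text),     "\n";   # 2
print count_not_followed_by_bang($text), "\n";   # 1
print swap_excited($text),               "\n";   # Perl! Python? Perl!
```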
Special syntax with parentheses:
Example | Description |
---|---|
R(?#comment) | Matches "R". All the rest is a comment |
R(?i)uby | Case-insensitive while matching "uby" |
R(?i:uby) | Same as above |
rub(?:y|le) | Group only without creating \1 backreference |
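A brief sketch of the inline modifier and the non-capturing group from the table above. The helper names `is_ruby` and `plural_suffix` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# (?i:uby) is case-insensitive, but the leading R is still matched exactly.
sub is_ruby {
    my ($s) = @_;
    return $s =~ /^R(?i:uby)$/ ? 1 : 0;
}

# (?:y|le) groups the alternatives without capturing,
# so the optional "s" is captured as group 1.
sub plural_suffix {
    my ($s) = @_;
    return $s =~ /^rub(?:y|le)(s?)$/ ? $1 : undef;
}

print is_ruby("RUBY"), is_ruby("ruby"), "\n";   # 10
print plural_suffix("rubles"), "\n";            # s
```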
The Match Operator
#!/usr/bin/perl

$bar = "This is foo and again foo";
if ($bar =~ /foo/) {
    print "First time is matching\n";
} else {
    print "First time is not matching\n";
}

$bar = "foo";
if ($bar =~ /foo/) {
    print "Second time is matching\n";
} else {
    print "Second time is not matching\n";
}

#!/usr/bin/perl

$bar = "This is foo and again foo";
if ($bar =~ m[foo]) {
    print "First time is matching\n";
} else {
    print "First time is not matching\n";
}

$bar = "foo";
if ($bar =~ m{foo}) {
    print "Second time is matching\n";
} else {
    print "Second time is not matching\n";
}

$true = ($foo =~ m/foo/);
my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);

Matching Only Once
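#!/usr/bin/perl

@list = qw/food foosball subeo footnote terfoot canic footbridge/;

foreach (@list) {
    # m?...? matches only once until reset() is called.
    # The bare ?...? form without the leading m is no longer accepted by current Perl versions.
    $first = $1 if m?(foo.*)?;
    $last  = $1 if /(foo.*)/;
}
print "First: $first, Last: $last\n";

Regular Expression Variables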
#!/usr/bin/perl

$string = "The food is in the salad bar";
$string =~ m/foo/;
print "Before: $`\n";
print "Matched: $&\n";
print "After: $'\n";

The Substitution Operator
s/PATTERN/REPLACEMENT/;
#!/usr/bin/perl

$string = "The cat sat on the mat";
$string =~ s/cat/dog/;

print "$string\n";

The Translation Operator
tr/SEARCHLIST/REPLACEMENTLIST/cds
y/SEARCHLIST/REPLACEMENTLIST/cds

#!/usr/bin/perl

$string = 'The cat sat on the mat';
$string =~ tr/a/o/;

print "$string\n";

$string =~ tr/a-z/A-Z/;

Translation Operator Modifiers
#!/usr/bin/perl

$string = 'the cat sat on the mat.';
$string =~ tr/a-z/b/d;

print "$string\n";

#!/usr/bin/perl

$string = 'food';
$string =~ tr/a-z/a-z/s;

print "$string\n";

More complex regular expressions
# nothing in the string (start and end are adjacent)
/^$/

# three digits, each followed by a whitespace
# character (eg "3 4 5 ")
/(\d\s){3}/

# matches a run of characters in which every
# odd-numbered letter is a (eg "abacadaf")
/(a.)+/

# string starts with one or more digits
/^\d+/

# string that ends with one or more digits
/\d+$/
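To see which of these patterns match a given string, they can be collected in a small table and tried in turn. This is only a sketch; the labels and the helper name `matching_labels` are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the labels of all patterns that match $s.
sub matching_labels {
    my ($s) = @_;
    my @checks = (
        [ 'empty',           qr/^$/        ],
        [ 'digit-space x3',  qr/(\d\s){3}/ ],
        [ 'a-pairs',         qr/(a.)+/     ],
        [ 'leading digits',  qr/^\d+/      ],
        [ 'trailing digits', qr/\d+$/      ],
    );
    return map { $_->[0] } grep { $s =~ $_->[1] } @checks;
}

print join(", ", matching_labels("3 4 5 ")),   "\n";   # digit-space x3, leading digits
print join(", ", matching_labels("abacadaf")), "\n";   # a-pairs
print join(", ", matching_labels("42 is 42")), "\n";   # leading digits, trailing digits
```

Note that "3 4 5 " does not count as ending in digits, because its last character is a space.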
#!/usr/bin/perl

$string = "Cats go Catatonic\nWhen given Catnip";
($start) = ($string =~ /\A(.*?) /);
@lines = $string =~ /^(.*?) /gm;
print "First word: $start\n", "Line starts: @lines\n";

Matching Boundaries
/\bcat\b/ # Matches 'the cat sat' but not 'catatonic' or 'polecat'
/\Bcat\B/ # Matches 'verification' but not 'the cat on the mat'
/\bcat\B/ # Matches 'catatonic' but not 'polecat'
/\Bcat\b/ # Matches 'polecat' but not 'catatonic'

Selecting Alternatives
if ($string =~ /cat|dog/)
if (($string =~ /Martin Brown/) || ($string =~ /Sharon Brown/))

This could be written as follows:

if ($string =~ /(Martin|Sharon) Brown/)

Grouping Matching

$string =~ /(\S+)\s+(\S+)/;

and

$string =~ /\S+\s+\S+/;

my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);
#!/usr/bin/perl

$time = "12:05:30";

$time =~ m/(\d+):(\d+):(\d+)/;
my ($hours, $minutes, $seconds) = ($1, $2, $3);

print "Hours : $hours, Minutes: $minutes, Second: $seconds\n";

#!/usr/bin/perl

$date = '03/26/1999';
$date =~ s#(\d+)/(\d+)/(\d+)#$3/$1/$2#;

print "$date\n";

The \G Assertion
#!/usr/bin/perl

$string = "The time is: 12:31:02 on 4/12/00";

$string =~ /:\s+/g;
($time) = ($string =~ /\G(\d+:\d+:\d+)/);
$string =~ /.+\s+/g;
($date) = ($string =~ m{\G(\d+/\d+/\d+)});

print "Time: $time, Date: $date\n";
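
Another common use of \G, combined with the g and c modifiers, is stepping through a string one token at a time. The following is a minimal sketch under assumed rules (integers and the four arithmetic operators only); the helper name `tokenize` and the token labels are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tokenizer: split a simple arithmetic expression into tokens.
# \G anchors each attempt at the current position; with /c a failed match
# leaves that position unchanged, so the next alternative can be tried there.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length($src)) {
        if    ($src =~ /\G\s+/gc)        { next }                      # skip whitespace
        elsif ($src =~ /\G(\d+)/gc)      { push @tokens, "NUM:$1" }
        elsif ($src =~ /\G([-+*\/])/gc)  { push @tokens, "OP:$1" }
        else  { die "unexpected character at offset " . pos($src) . "\n" }
    }
    return @tokens;
}

print join(" ", tokenize("12 + 34*5")), "\n";   # NUM:12 OP:+ NUM:34 OP:* NUM:5
```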