This question already has answers here:
Short example of regular expression converted to a state machine?
(6 answers)
Closed 8 years ago.
Can someone explain how a regex engine works when it tries to match
^4$ against 749\n486\n4
I mean, how does the regex engine parse the string while performing the match?
The regexp ^4$ means: match a line that contains only the digit 4.
By default, ^ matches only at the very start of the string and $ only at the very end (or just before a trailing newline at the end), so newlines inside the string do not create extra line boundaries. Here the string starts with 7, so the match fails. Example in Perl:
DB<1> $str="749\n486\n4";
DB<2> x $str =~ /^4$/
empty array
Example in Python:
>>> import re
>>> s="749\n486\n4"
>>> re.search('^4$',s)
However, regexp implementations have a way of dealing with this: a multiline setting. In Perl:
DB<3> x $str =~ /^4$/m
0 1
In Python:
>>> re.search('^4$',s,re.MULTILINE)
<_sre.SRE_Match object at 0x7f446874b030>
The Python docs explain multiline mode like this:
re.MULTILINE
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
If you actually want to know whether your multiline string ends with a line containing only the digit 4, there is a syntax feature for this: \z matches only at the very end of the string.
DB<4> x $str =~ /^4\z/m
0 1
See http://perldoc.perl.org/perlre.html, especially on the m flag and \A, \z, \Z
or http://docs.python.org/2/library/re.html#regular-expression-objects
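For comparison, here is a small Python sketch of the same three cases. It assumes Python's re module, where \Z matches only at the very end of the string (the role Perl's \z plays above); the variable names are illustrative only.

```python
import re

s = "749\n486\n4"

# Default mode: ^ matches only at the start of the string, so there is no match.
default_match = re.search(r"^4$", s)

# MULTILINE: ^ and $ also match at line boundaries, so the last line matches.
multiline_match = re.search(r"^4$", s, re.MULTILINE)

# ^ may match at any line start, but \Z matches only at the very end of the string.
last_line_match = re.search(r"^4\Z", s, re.MULTILINE)

print(default_match is None)
print(multiline_match.start())
print(last_line_match.start())
```

Both successful matches start at the final character of the string, which is the only line consisting of a single 4.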
What is +$ in this command?
[[ $1 =~ ^[0-9]+$ ]]
The + applies to the [0-9] and not the $.
The intended command was:
[[ $1 =~ ^[0-9]+$ ]]
It checks if $1 only contains digits, e.g. 123 or 9 (but not 123f or foo or empty string).
It breaks down as:
[[, start of a Bash extended test command
$1, the first parameter
=~, the Bash extended test command regex match operator
^[0-9]+$, the regex to match against:
^, anchor matching the start of the line
[0-9]+, one or more digits
[0-9], a digit
+, one or more of the preceding atom
$, anchor matching the end of the line
]] to terminate the test command
In a regexp, + matches the preceding pattern one or more times, and $ is the end-of-string anchor.
^ is the beginning-of-string anchor (the natural complement to $), and [0-9] matches any single digit (in the range 0 to 9).
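The same pattern can be tried outside Bash. Below is a minimal Python sketch, assuming the re module; is_all_digits is a hypothetical helper name introduced only for this illustration.

```python
import re

# Same regex as in the Bash test: start anchor, one or more digits, end anchor.
DIGITS_ONLY = re.compile(r"^[0-9]+$")


def is_all_digits(value: str) -> bool:
    """Return True if value consists of one or more ASCII digits."""
    return DIGITS_ONLY.search(value) is not None


for candidate in ["123", "9", "123f", "foo", ""]:
    print(repr(candidate), is_all_digits(candidate))
```

One difference worth keeping in mind: in Python, $ also matches just before a trailing newline, so "123\n" would pass this check; re.fullmatch is the stricter choice there.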
In the following string,
apache:x:48:48:Apache:/var/www:/sbin/nologin
how could I replace the first colon (and this one only) with a comma so I would get the following string?
apache,x:48:48:Apache:/var/www:/sbin/nologin
Also, the code has to support a file with multiple lines and replace only the first colon in each line.
Use a regular expression:
PS C:\> $s = 'apache:x:48:48:Apache:/var/www:/sbin/nologin'
PS C:\> $s -replace '^(.*?):(.*)','$1,$2'
apache,x:48:48:Apache:/var/www:/sbin/nologin
Regexp breakdown:
^(.*?): is the shortest match between the beginning of the string and a colon (i.e. the text before the first colon).
(.*) is the remainder of the string (i.e. everything after the first colon).
The parentheses group the subexpressions, so they can be referenced in the replacement string as $1 and $2.
Further explanation:
^ matches the beginning of a string.
.* matches any number of characters (. ⇒ any character, * ⇒ zero or more times).
.*? does the same, but gives the shortest match (?) instead of the longest match. This is called a "non-greedy match".
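For the multi-line requirement, here is a hedged Python sketch of the same idea. It swaps the non-greedy .*? for a negated character class, [^:\n]*, which stops at the first colon without needing backtracking, and relies on re.MULTILINE so that ^ matches at every line start. replace_first_colon is a hypothetical name used only for this example.

```python
import re


def replace_first_colon(text: str) -> str:
    """Replace the first colon on every line of text with a comma."""
    # ^ anchors at each line start (MULTILINE); [^:\n]* consumes the text
    # before the first colon on that line; lines without a colon are left alone.
    return re.sub(r"^([^:\n]*):", r"\1,", text, flags=re.MULTILINE)


line = "apache:x:48:48:Apache:/var/www:/sbin/nologin"
print(replace_first_colon(line))
print(replace_first_colon("a:b:c\nno colon here\nd:e:f"))
```

Reading a whole file into one string and passing it to the function handles all lines in a single call.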
I'm writing a simple program - please see below for my code with comments. Does anyone know why the space character is not recognised in line 10? When I run the code, it finds the :: but does not replace it with a space.
1 #!/usr/bin/perl
2
3 # This program replaces :: with a space
4 # but ignores a single :
5
6 $string = 'this::is::a:string';
7
8 print "Current: $string\n";
9
10 $string =~ s/::/\s/g;
11 print "New: $string\n";
Try s/::/ /g instead of s/::/\s/g.
The \s is actually a character class representing all whitespace characters, so it only makes sense to have it in the regular expression (the first part) rather than in the replacement string.
Use s/::/ /g. \s only denotes whitespace on the matching side; on the replacement side it becomes a literal s.
Replace the \s with a real space.
The \s is shorthand for a whitespace matching pattern. It isn't used when specifying the replacement string.
The replacement string should be a literal space, i.e.:
$string =~ s/::/ /g;
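The same split between pattern side and replacement side can be seen in other regex libraries. A minimal Python sketch, assuming the re module and illustrative variable names:

```python
import re

text = "this::is::a:string"

# The replacement is plain text, so a literal space is what we want here.
replaced = re.sub(r"::", " ", text)
print(replaced)

# \s belongs on the pattern side, where it matches any whitespace character.
restored = re.sub(r"\s", "::", replaced)
print(restored)
```

The single colon is untouched in both directions because the pattern requires two consecutive colons.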
I am using the following regex to search for a string $word in the bigger string $referenceLine:
$wordRefMatchCount =()= $referenceLine =~ /(?=\b$word\b)/g
The problem happens when my $word substring contains characters such as (. Perl treats them as part of the regex rather than as part of the string to match, and gives the following error:
Unmatched ( in regex; marked by <-- HERE in
m/( <-- HERE ?=\b( darsheel safary\b)/
at ./bleu.pl line 119, <REFERENCE> line 1.
Can someone please tell me a solution to this? I think if I could somehow get Perl to understand that we want to look for the whole $word as it is, without evaluating it, it might work out.
Use
$wordRefMatchCount =()= $referenceLine =~ /(?=\b\Q$word\E\b)/g
to tell the regex engine to treat every character in $word as a literal character.
\Q marks the start, \E marks the end of a literal string in Perl regex.
Alternatively, you could do
$quote_word = quotemeta($word);
and then use
$wordRefMatchCount =()= $referenceLine =~ /(?=\b$quote_word\b)/g
One more thing (taken up here from the comments, where it's harder to find):
Your regex fails in your example case because of the word boundary anchor \b. This anchor matches between a word character and a non-word character. It only makes sense if placed around actual words, i.e. \bbar\b to ensure that only bar is matched, not foobar or barbaric. If you put it around non-words (as in \b( darsheel safary\b), it will cause the match to fail (unless there is a letter, digit or underscore right before the ().
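Both points can be checked with a short Python sketch. It assumes Python's re.escape as the counterpart of Perl's quotemeta; count_word is a hypothetical helper written for this illustration, and the sample strings are made up.

```python
import re


def count_word(word: str, line: str) -> int:
    """Count positions in line where word occurs between word boundaries."""
    # re.escape makes every character of word literal, like \Q...\E in Perl.
    pattern = r"(?=\b" + re.escape(word) + r"\b)"
    return len(re.findall(pattern, line))


# Without escaping, the stray "(" makes the pattern invalid.
try:
    re.compile(r"(?=\b" + "( darsheel safary" + r"\b)")
except re.error as exc:
    print("unescaped pattern fails:", exc)

print(count_word("darsheel safary", "darsheel safary met darsheel safary"))
# \b before "(" needs a word character right before it, so this finds nothing.
print(count_word("( darsheel safary", "see ( darsheel safary here"))
print(count_word("( darsheel safary", "foo( darsheel safary here"))
```

If the search terms can start or end with non-word characters, dropping \b or replacing it with explicit lookarounds is one option to consider.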
I'm having a regex problem. I would like a regex that will match the '\nGO at the end of my file (see below). I have the following so far:
^\'*GO
but it matches the quote symbol?
EOF:
WHERE (dbo.Property.Archived <> 1)
'
GO
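In Perl, \Z matches at the end of the string (or just before a final trailing newline), regardless of line breaks elsewhere in the string. Use this, together with the /m modifier so that ^ can match at the start of the last line, to match GO on the last line of a file if the file is loaded into a string: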
^GO\Z
The GNU extensions to POSIX regex use \' instead of \Z.
To match exactly the newline and then the word GO in your example, you want this:
\nGO
You can also do this:
\n.*GO
This last regular expression will match what you want in your example, but the .* part will make it so there can be anything (or nothing) in between the newline and GO.
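A quick way to try these patterns is a small Python sketch. It assumes the file content has been read into one string with no trailing newline, and it uses Python's \Z, which matches only at the very end of the string; the variable names are illustrative.

```python
import re

# Tail of the file from the question, loaded into a single string.
text = "WHERE (dbo.Property.Archived <> 1)\n'\nGO"

# ^ needs MULTILINE to match at the start of the last line; \Z is the string end.
last_line = re.search(r"^GO\Z", text, re.MULTILINE)

# A newline immediately followed by GO.
after_newline = re.search(r"\nGO", text)

# A newline, then anything on that line, then GO.
loose = re.search(r"\n.*GO", text)

print(last_line.group())
print(repr(after_newline.group()))
print(repr(loose.group()))
```

If the file ends with a newline after GO, the first pattern would not match in Python as written; stripping the trailing newline first is one way around that.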