What does a '/' character mean in a regular expression?
I have seen the following example, which matches one- or two-digit numbers:
/^\d{1,2}$/
When I googled several regex cheat sheets, the forward slash did not show up as a character with any special meaning in regex.
What does '/' do in regex?
It doesn't actually do anything. In JavaScript, Perl and some other languages, it is used as the delimiter that marks the start and end of a regular expression literal; it is not part of the pattern itself.
Some languages, like PHP, use it as a delimiter inside a string, with additional options (modifiers) placed after the closing delimiter, just like JavaScript and Perl (in this case, "m" for multi-line):
preg_match("/^\d{1,2}$/m", $input);
With this syntax, you can also use other delimiter characters, which can make matching literal slashes easier:
preg_match("![a-z]+/[a-z]+!i", "Example/Match");
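A quick way to see that the slashes are only delimiters is to compare a JavaScript regex literal with the same pattern built from a string via the `RegExp` constructor. This is a sketch in JavaScript only; the variable names are made up for illustration. Both objects report the same `source`, which contains no slashes.

```javascript
// The slashes only delimit the literal; they are not part of the pattern.
const literal = /^\d{1,2}$/;

// The same pattern built from a string has no delimiters at all. The backslash
// is doubled because "\\d" in a JS string literal is the two characters \ and d.
const constructed = new RegExp("^\\d{1,2}$");

console.log(literal.source);     // ^\d{1,2}$
console.log(constructed.source); // ^\d{1,2}$

for (const s of ["7", "42", "123", "a1"]) {
  console.log(s, literal.test(s), constructed.test(s));
}

// Inside a literal, a slash to be matched is escaped so it does not end the
// literal; in the string form it is just an ordinary character.
const viaLiteral = /^[a-z]+\/[a-z]+$/i;
const viaString = new RegExp("^[a-z]+/[a-z]+$", "i");
console.log(viaLiteral.test("Example/Match"), viaString.test("Example/Match")); // true true
```

The loop prints `true true` for "7" and "42" and `false false` for "123" and "a1".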
Related
I am struggling with writing a regular expression in Snowflake.
SELECT
'DEM7BZB01-123' AS SKU,
RLIKE('DEM7BZB01-123','^DEM.*\d\d$') AS regex
I would like to find all strings that start with "DEM" and end with two digits. Unfortunately, the expression I am using returns FALSE.
I checked this expression in two online regex tools and it worked.
In Snowflake, the backslash character (\) is an escape character in single-quoted string constants.
Reference: Escape Characters and Caveats
So you need to write two backslashes in the string literal to pass one backslash to the regex.
SELECT
'DEM7BZB01-123' AS SKU,
RLIKE('DEM7BZB01-123', '^DEM.*\\d\\d$') AS regex
Or you could write the regex pattern in such a way that the backslash isn't used.
For example, the pattern ^DEM.*[0-9]{2}$ matches the same strings as the pattern ^DEM.*\d\d$.
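The same trap exists in JavaScript, so it can be reproduced outside Snowflake. This is only an analogy under JavaScript's string-escape rules, not a model of Snowflake's parser: in a JS string, `"\d"` is an unnecessary escape that collapses to a plain `d`, so the regex engine never sees `\d`.

```javascript
const sku = "DEM7BZB01-123";

// "\d" in a JS string literal becomes just "d", so this pattern is really ^DEM.*dd$.
const brokenPattern = new RegExp("^DEM.*\d\d$");
// Doubling the backslash passes a real \d through to the regex engine.
const doubledBackslash = new RegExp("^DEM.*\\d\\d$");
// Or avoid backslashes entirely with a character class.
const digitClass = new RegExp("^DEM.*[0-9]{2}$");

console.log(brokenPattern.source);       // ^DEM.*dd$
console.log(brokenPattern.test(sku));    // false
console.log(doubledBackslash.test(sku)); // true
console.log(digitClass.test(sku));       // true
```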
You need to escape the backslashes in your SQL string literal, because the string is unescaped before it is parsed as a regex. (Sometimes the number of backslashes needed gets a bit silly.)
Your example should look like this:
RLIKE('DEM7BZB01-123','^DEM.*\\d\\d$') AS regex
RLIKE (which is an alias in Snowflake for the SQL Standard REGEXP_LIKE function) implicitly adds ^ and $ to your search pattern. From the documentation:
The function implicitly anchors a pattern at both ends (i.e. '' automatically becomes '^$', and 'ABC' automatically becomes '^ABC$').
So you can remove them, which in turn lets you use $$ quoting (without the anchors, a trailing $ no longer runs into the closing $$ delimiter):
In single-quoted string constants, you must escape the backslash character in the backslash-sequence. For example, to specify \d, use \\d. For details, see Specifying Regular Expressions in Single-Quoted String Constants (in this topic).
You do not need to escape backslashes if you are delimiting the string with pairs of dollar signs ($$) (rather than single quotes).
So you can simply use the regex DEM.*\d\d, without any extra escaping, to find all strings that start with DEM and end with two digits:
SELECT
'DEM7BZB01-123' AS SKU
, RLIKE('DEM7BZB01-123', $$DEM.*\d\d$$) AS regex
which gives
SKU          |REGEX|
-------------+-----+
DEM7BZB01-123|true |
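For readers coming from JavaScript, `String.raw` plays a role similar to `$$...$$` quoting (backslashes are left alone), and the implicit anchoring can be imitated by wrapping the pattern. The `rlike` helper below is hypothetical and only illustrates the documented behaviour; it is not Snowflake's implementation, and wrapping the pattern in a non-capturing group so the anchors apply to the whole pattern is a design choice of this sketch.

```javascript
// Hypothetical helper imitating RLIKE's implicit ^...$ anchoring.
function rlike(subject, pattern) {
  // The non-capturing group makes the anchors apply to the whole pattern.
  return new RegExp(`^(?:${pattern})$`).test(subject);
}

// String.raw keeps the backslashes, so no doubling is needed.
const rawPattern = String.raw`DEM.*\d\d`;

console.log(rawPattern);                           // DEM.*\d\d
console.log(rlike("DEM7BZB01-123", rawPattern));   // true
console.log(rlike("XDEM7BZB01-123", rawPattern));  // false: rejected by the implicit ^
console.log(rlike("DEM7BZB01-123x", rawPattern));  // false: rejected by the implicit $
```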
I’m using a variable to search and replace a string in Perl.
I want to replace the string 23.0 with 23.0.1, so I tried this:
my $old="23.0";
my $new="23.0.1";
$_ =~ s/$old/$new/g;
The problem is that it also replaced the string 2310, so I tried:
my $old="23\.0";
and also /ee.
But I can’t get the syntax right. Can someone show me the correct syntax?
There are two things that will help you here:
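The quotemeta function, which escapes metacharacters, and the \Q and \E escape sequences, which quote any metacharacters in the text between them (including interpolated variables) so they are matched literally.
print quotemeta "21.0";   # prints 21\.0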
Or:
my $old="23.0";
my $new="23.0.1";
my $str = "2310";
$str =~ s/\Q$old\E/$new/g;
print $str;
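JavaScript has no `\Q...\E`, so the usual workaround is a small escaping helper. `quoteMeta` below is hypothetical; note that Perl's `quotemeta` escapes every non-word ASCII character, while this sketch escapes only the characters that are special in JavaScript regexes, which is enough for this case.

```javascript
// Hypothetical helper: backslash-escape JavaScript regex metacharacters.
function quoteMeta(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const oldVer = "23.0";
const newVer = "23.0.1";

// Unescaped, "." matches any character, so "2310" is rewritten by mistake.
const naive = "2310".replace(new RegExp(oldVer, "g"), newVer);
// Escaped, the dot is literal and only a real "23.0" is replaced.
const safeOnNoise = "2310".replace(new RegExp(quoteMeta(oldVer), "g"), newVer);
const safeOnMatch = "version 23.0".replace(new RegExp(quoteMeta(oldVer), "g"), newVer);

console.log(quoteMeta(oldVer)); // 23\.0
console.log(naive);             // 23.0.1
console.log(safeOnNoise);       // 2310
console.log(safeOnMatch);       // version 23.0.1
```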
Just use single quotes and escape the dot.
my $old='23\.0';
To complement Sobrique's excellent answer, let me note that the reason your attempt with "23\.0" didn't work is that "23\.0" and "23.0" evaluate to the same string: in a double-quoted string literal, the backslash escape sequence \. simply evaluates to a plain dot (.).
There are several things you could do to avoid this:
If you indeed want to match a fixed string, and don't need or want to include any special regexp metacharacters in it, you can do as Sobrique suggests and use quotemeta or \Q to escape them.
In particular, this is almost always the correct solution if the string to be matched comes from user input. If you do want to allow some limited set of non-literal metacharacters, you can unescape those after running the pattern through quotemeta. For a simple example, here's a quick-and-dirty way to turn a basic glob-like pattern (using the metacharacters ? and * for "any character" and "any string of characters" respectively) into an equivalent regexp:
my $glob = 'data_??.c*';        # hypothetical example glob
my $regexp = "^\Q$glob\E\$";    # quote and anchor the pattern
$regexp =~ s/\\\?/./g;          # replace "?" (escaped to "\?" by \Q) with "."
$regexp =~ s/\\\*/.*/g;         # replace "*" (escaped to "\*" by \Q) with ".*"
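The quote-then-unescape idea carries over to JavaScript. This sketch assumes a hypothetical escaping helper (JavaScript has no `\Q...\E`) and uses `data_??.c*` as an illustrative glob; it escapes everything, anchors the result, and then turns the escaped `?` and `*` back into `.` and `.*`.

```javascript
// Hypothetical escaping helper standing in for Perl's \Q...\E.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Quote the whole glob, then restore "?" and "*" as regex wildcards.
function globToRegExp(glob) {
  const body = escapeForRegExp(glob)
    .replace(/\\\?/g, ".")   // "\?" (an escaped "?") -> any single character
    .replace(/\\\*/g, ".*"); // "\*" (an escaped "*") -> any run of characters
  return new RegExp(`^${body}$`);
}

const globRe = globToRegExp("data_??.c*");
console.log(globRe.source);              // ^data_..\.c.*$
console.log(globRe.test("data_01.csv")); // true
console.log(globRe.test("data_1.csv"));  // false: "??" needs exactly two characters
console.log(globRe.test("data_01xcsv")); // false: the "." in the glob is literal
```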
Conversely, if you want to have a literal regexp pattern in your code, without immediately matching it against something, you can use the qr// regexp-like quote operator, like this:
my $old = qr/\b23\.0(\.0)?\b/; # match 23.0 or 23.0.0 (but not 123.012!)
my $new = "23.0.1"; # just a literal string
s/$old/$new/g; # replace any string matching $old in $_ with $new
Note that qr// has other effects beyond just allowing you to use regexp syntax in a string literal: it actually pre-compiles the pattern into a special Regexp object, so that it doesn't need to be recompiled every time it's used later. In particular, as a side effect, the string representation of a qr// regexp literal will usually not exactly match the original content, although it will be equivalent as a regexp. For example, say qr/\b23\.0(\.0)?\b/ will, on my Perl version, output (?^u:\b23\.0(\.0)?\b).
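For comparison, the closest everyday JavaScript habit is storing a `RegExp` object in a variable and reusing it, which is loosely analogous to keeping a `qr//` object around; how an engine caches compiled patterns is an implementation detail not claimed here. The sketch reuses the pattern from the `qr//` example above.

```javascript
// A stored RegExp object, reused for several replacements.
const versionRe = /\b23\.0(\.0)?\b/g; // 23.0 or 23.0.0, but not 123.012
const replacement = "23.0.1";

const results = ["23.0", "23.0.0", "123.012"].map((s) => s.replace(versionRe, replacement));
console.log(results); // [ '23.0.1', '23.0.1', '123.012' ]

// Unlike Perl's (?^u:...) form, a JavaScript RegExp stringifies as its
// source between slashes, followed by its flags.
console.log(String(versionRe)); // /\b23\.0(\.0)?\b/g
```

Because the regex has the `g` flag, `replace` resets its `lastIndex` before each call, so reusing the same object is safe here.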
You could also just use a normal double-quoted string literal, and double any backslashes in it, but that's (usually) less efficient than using qr//, and also less readable due to leaning toothpick syndrome.
Using a single-quoted string literal would be slightly better, since backslashes in a single-quoted string are only special when followed by another backslash or a single quote. Even so, readability can still suffer if you happen to need to match any literal backslashes in your regexp, not to mention that it's easy to create subtle bugs if you forget to double a backslash in those rare places where it's still needed.
Is there a way to pass a string to a regexp without worrying about escaping special characters?
For example, I want to find a line that starts with the words "\north+west\". As you can see, "\n" and "h+" would need to be escaped. So is there some special combination for writing text as-is?
/^\s+(<some special combination> \north+west\)\s+/i
Or maybe you know a function that can properly escape my text?
In PHP and Perl you can use the \Q...\E delimiters to auto-escape metacharacters inside a regexp. Quoting the documentation:
\Q and \E can be used to ignore regexp metacharacters in the pattern.
For example: \w+\Q.$.\E$ will match one or more word characters,
followed by literals .$. and anchored at the end of the string.
In addition to raina77ow's answer: when you use PCRE via a language like PHP that needs pattern delimiters, you can't use the \Q...\E feature if your string contains the opening or the closing delimiter. For example, you can't write patterns like:
/\Qabc/def\E/
~\Qabc~def\E~
[\Qabc[def\E]
[\Qabc]def\E]
(\Qabc)def\E)
(\Qabc(def\E)
The only way is to use the preg_quote function and to put the delimiter (only if this one isn't already a special regex character) in its second parameter.
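A rough JavaScript sketch of the same idea: escape the metacharacters, plus a caller-chosen delimiter, then build the pattern. `quoteForPattern` is hypothetical and only loosely modeled on `preg_quote($str, $delimiter)`; its character set is an assumption, not PHP's exact list. It is applied to the `\north+west\` text from the question.

```javascript
// Hypothetical helper loosely modeled on preg_quote: escape regex
// metacharacters, and also the delimiter if one is given.
function quoteForPattern(str, delimiter = "") {
  const special = /[.\\+*?[^\]$(){}=!<>|:\-#]/;
  let out = "";
  for (const ch of str) {
    out += special.test(ch) || ch === delimiter ? "\\" + ch : ch;
  }
  return out;
}

const needle = "\\north+west\\"; // the text \north+west\ from the question
const lineRe = new RegExp("^\\s+" + quoteForPattern(needle) + "\\s+", "i");

console.log(lineRe.source);                        // ^\s+\\north\+west\\\s+
console.log(lineRe.test("  \\North+West\\ more"));  // true
console.log(lineRe.test("  \\northhhwest\\ more")); // false: "+" is now literal
console.log(quoteForPattern("abc/def", "/"));       // abc\/def
```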
Sorry, but once again I need help understanding a rather complicated passage from the "Programming Perl" book. Here it is (the parts that are unclear to me were in bold):
patterns are parsed like double-quoted strings, all the normal double-quote conventions will work, including variable interpolation (unless you use single quotes as the delimiter) and special characters indicated with backslash escapes. These are applied before the string is interpreted as a regular expression (This is one of the few places in the Perl language where a string undergoes more than one pass of processing). ...
Another consequence of this two-pass parsing is that the ordinary Perl tokener finds the end of the regular expression first, just as if it were looking for the terminating delimiter of an ordinary string. Only after it has found the end of the string (and done any variable interpolation) is the pattern treated as a regular expression. Among other things, this means you can’t “hide” the terminating delimiter of a pattern inside a regex construct (such as a bracketed character class or a regex comment, which we haven’t covered yet). Perl will see the delimiter wherever it is and terminate the pattern at that point.
First, why does it say "Only after it has found the end of the string" rather than "the end of the regular expression", which is what it was said to be looking for just before?
Second, what does it mean that you can’t “hide” the terminating delimiter of a pattern inside a regex construct? Why can't I hide the terminating delimiter /, when I can place it wherever I want, either directly in the regexp (/A\/C/) or in an interpolated variable (even without \):
use feature 'say';   # needed for say
my $s = 'A/';
my $p = 'A/C';
say $p =~ /$s/;
outputs 1.
While writing and re-reading my question, I started to think that this passage is about using a single quote as the regexp delimiter; in that case it all seems quite coherent. Is my assumption correct?
Thank you.
It says "end of the string" instead of "end of the regular expression" because at that point it's treating the regex as if it were just a string.
It's trying to say that this does not work:
/foo[-/_]/
Even though normal regex metacharacters are not special inside [], Perl will see the regex as /foo[-/ and complain about an unterminated class.
It's trying to say that Perl does not parse the regex as it reads it. First it finds the end of the regex in your source code as if it were a quoted string, so the only special character is \. Then it interpolates any variables. Then it parses the result as a regular expression.
You can hide the terminating delimiter with \ because that works in ordinary strings. You can hide the delimiter inside an interpolated variable, because interpolation happens after the delimiter is found. If you use a bracketing delimiter (e.g. { } or [ ]), you can nest matching pairs of delimiters inside the regex, because q{} works like that too.
But you can't hide it inside any other regex construct.
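The first pass described above can be modeled in a few lines. `findPatternBody` is a hypothetical, simplified sketch (not Perl's real tokenizer, and it ignores bracketing delimiters): it scans for the closing delimiter the way a quoted string is scanned, where only a backslash is special and `[...]` classes mean nothing.

```javascript
// Simplified model of the first pass: find the end of the pattern like a
// quoted string, knowing nothing about regex syntax such as [...] classes.
function findPatternBody(src, delim) {
  for (let i = 0; i < src.length; i++) {
    if (src[i] === "\\") {
      i++;                    // skip whatever the backslash escapes
    } else if (src[i] === delim) {
      return src.slice(0, i); // the first unescaped delimiter ends the pattern
    }
  }
  return null;                // no terminator found
}

console.log(findPatternBody("foo[-/_]/", "/")); // foo[-  (the class is cut in half)
console.log(findPatternBody("A\\/C/", "/"));    // A\/C   (a backslash hides the delimiter)
console.log(findPatternBody("foo[-_]/", "/"));  // foo[-_]
```

As a contrast, JavaScript's own lexer does track character classes, so `/foo[-/_]/` is a valid JavaScript regex literal; the restriction discussed here is a consequence of Perl's two-pass approach.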
Say you want to match a *. You would use
m/\*/
But what if you used * as your delimiter? The following doesn't work:
m*\**
because it's interpreted as
m/*/
as seen in the following:
$ perl -e'm*\**'
Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE / at -e line 1.
Take the string literal
"a\"b"
It produces the string
a"b
Similarly, the match operator
m*a\*b*
produces the regex pattern
a*b
If you want to match a literal *, you have to use other means. In other words:
m*a\*b* === m/a*b/ matches pattern a*b
m*a\x{2A}b* === m/a\*b/ matches pattern a\*b
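JavaScript shows the same two-level effect when a pattern is built from a string: escapes are resolved once by the string literal and again by the regex engine. This is an analogy, not Perl's delimiter mechanism. `"\x2A"` in a string is already a bare `*` (a quantifier to the regex engine), while `"\\x2A"` survives as the regex escape `\x2A`, which matches a literal asterisk.

```javascript
// String-level escape: the regex engine receives a bare "*", i.e. ^a*b$.
const quantifier = new RegExp("^a\x2Ab$");
// Regex-level escape: the engine receives \x2A, a literal asterisk.
const literalStar = new RegExp("^a\\x2Ab$");

console.log(quantifier.source, quantifier.test("aaab"), quantifier.test("a*b"));
// ^a*b$ true false
console.log(literalStar.source, literalStar.test("a*b"), literalStar.test("aaab"));
// ^a\x2Ab$ true false
```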
Thanks for the previous assistance, everyone! I have a question regarding regular expressions in Perl.
My issue is this:
I know that when matching you can write m// or // or ## (you must include m, or s for a substitution, if you use a delimiter other than /). What confuses me is a book example I have on escaping characters. I believe most people escape lots of characters as a sure-fire way to make the program work without missing a metacharacter, e.g. \# when looking to match # in, say, an email address.
Here's my issue (and I know what this script does):
$date= "15/12/99";
$date=~ s#(\d+)/(\d+)/(\d+)#$1/$2/$3#;   # why are no forward slashes escaped??
print($date);
Yet a later example in the book shows it rewritten as follows (which I also understand; here the slashes are escaped):
$date =~ s/(\d+)\/(\d+)\/(\d+)/$2\/$1\/$3/;   # here the forward slashes are escaped
I know that choosing slashes or hashes is programmer preference. What I don't understand is why the second example escapes the slashes, yet the first doesn't; I have tried it and both work. So there is no need to escape slashes when using hashes? What's even MORE confusing is that yet another book example, earlier than this one, also uses hashes, and there the # symbol is escaped:
if ($address =~ m#\##) { print("That's an email address"); }   # or something similar
So what do you escape, and what don't you, when using hashes or slashes? I know you have to escape metacharacters to match them, but I'm confused.
When you build a regexp, you choose a character as the delimiter for your regexp, i.e. by writing // or ##.
If you need to use that character inside your regexp, you will need to escape it so that the regexp engine does not see it as the end of the regexp.
If you build your regexp between forward slashes /, you will need to escape the forward slashes contained in your regexp, hence the escaping in your second example.
Of course, the same rule applies to any character you use as a regexp delimiter, not just forward slashes.
The forward slashes are not metacharacters in themselves; only their use as expression separators in the second example makes them "special".
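JavaScript only offers `/` as a literal delimiter, but building the regex from a string removes the delimiter altogether, which has roughly the same effect as switching to `s###` in Perl. A small sketch using the date example from the question:

```javascript
const date = "15/12/99";

// Regex literal: "/" is the delimiter, so slashes inside the pattern are escaped.
const swappedLiteral = date.replace(/(\d+)\/(\d+)\/(\d+)/, "$2/$1/$3");

// RegExp from a string: there is no delimiter, so "/" needs no escaping
// (the backslashes are doubled only for the JS string literal).
const swappedString = date.replace(new RegExp("(\\d+)/(\\d+)/(\\d+)"), "$2/$1/$3");

console.log(swappedLiteral); // 12/15/99
console.log(swappedString);  // 12/15/99
```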
The format of a substitute expression is:
s<expression separator char><expression to look for><expression separator char><expression to replace with><expression separator char>
In the first example, using a hash as the first character after =~ s makes that character the expression separator, so the forward slash is not special and does not need any escaping.
In the second example, the expression separator is the forward slash, so it must be escaped within the expressions themselves.
The regex match operator allows you to define a custom non-whitespace character as the separator.
In your first example, '#' is used as the separator, so in that regex you don't need to escape the '/' because it has no special meaning. In the second regex, the separator isn't changed, so the default '/' is used. Now you have to escape every '/' in your pattern; otherwise the parser gets confused. :)
If you are not using slashes, the recommended practice is to use curly braces and the /x modifier.
$date=~ s{ (\d+) \/ (\d+) \/ (\d+) }{$1/$2/$3}x;
Escaping non-alphanumeric characters is also standard practice, even when they are not metacharacters. See perldoc -f quotemeta.
There is another depth to this question about escaping forward slashes with the s operator.
In my example, the capturing becomes the problem.
$image_name =~ s/((http:\/\/.+\/)\/)/$2/g;
For this to work, the typo (the extra second forward slash) had to be captured.
Also, matching just the two slashes did not work, because that also matches the // in http://. The first slash has to be preceded by more than one character.
Changing "http://world.com/Photos//space_shots/out_of_this_world.jpg"
To: "http://world.com/Photos/space_shots/out_of_this_world.jpg"
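Here is a JavaScript port of that substitution next to an alternative. The lookbehind version is a different technique, not the one in the answer: it collapses runs of slashes that are not preceded by `:`, so the `//` after the scheme is left alone. The test URLs are illustrative.

```javascript
const url = "http://world.com/Photos//space_shots/out_of_this_world.jpg";

// Direct port: capture up to the first slash of the doubled pair and
// replace the whole match with that capture.
const ported = url.replace(/((http:\/\/.+\/)\/)/g, "$2");

// Alternative: collapse repeated slashes unless they follow a ":".
const collapsed = url.replace(/(?<!:)\/{2,}/g, "/");

console.log(ported);    // http://world.com/Photos/space_shots/out_of_this_world.jpg
console.log(collapsed); // http://world.com/Photos/space_shots/out_of_this_world.jpg
```

Because `.+` is greedy, the ported version only repairs the last doubled slash in a URL: "http://a.com/x//y//z" becomes "http://a.com/x//y/z", while the lookbehind version gives "http://a.com/x/y/z".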