RegExp: is there a way to pass a string to a regexp without escaping? - regex

Is there a way to pass a string to a regexp and not worry about escaping its special characters?
For example, I want to find a line which starts with the words "\north+west\"; as you can see, "\n" and "h+" would have to be escaped. So the question is: is there some special combination to write the text as it is?
/^\s+(<some special combination> \north+west\)\s+/i
Or maybe you know a function which can properly escape my text?

In PHP and Perl you can use \Q...\E delimiters to autoescape metacharacters inside regexp. Quoting the doc:
\Q and \E can be used to ignore regexp metacharacters in the pattern.
For example: \w+\Q.$.\E$ will match one or more word characters,
followed by literals .$. and anchored at the end of the string.

In addition to @raina77ow's answer: when you use PCRE via a language like PHP that needs pattern delimiters, you can't use the \Q...\E feature if your string contains the opening or the closing delimiter. For example, you can't write patterns like:
/\Qabc/def\E/
~\Qabc~def\E~
[\Qabc[def\E]
[\Qabc]def\E]
(\Qabc)def\E)
(\Qabc(def\E)
The only way is to use the preg_quote function and to pass the delimiter as its second parameter (this is only needed if the delimiter isn't already a special regex character).
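For comparison, Python's re module has no \Q...\E, but re.escape plays roughly the role that preg_quote plays in PHP. A minimal sketch, assuming the text to match is the asker's \north+west\ (the variable names and the sample line are illustrative only):

```python
import re

# The literal text to find. A raw string cannot end in a single backslash,
# so the trailing one is appended separately.
literal = r"\north+west" + "\\"

# re.escape backslash-escapes the regex metacharacters in the text,
# so it can be embedded in a larger pattern as-is.
pattern = re.compile(r"^\s+(" + re.escape(literal) + r")\s+", re.IGNORECASE)

line = "   \\north+west\\   trailing text"   # hypothetical input line
match = pattern.search(line)
print(match.group(1) if match else None)
```

Since Python patterns have no delimiters, there is no delimiter to quote, which is the part preg_quote's second parameter takes care of in PHP.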

How to exclude part of a string using regex and add this part at the end of the string?

I've got a little problem with regex.
I have a few strings in one file that look like this:
TEST.SYSCOP01.D%%ODATE
TEST.SYSCOP02.D%%ODATE
TEST.SYSCOP03.D%%ODATE
...
What I need is to define the correct regex and change those string names to:
TEST.D%%ODATE.SYSCOP.#01
TEST.D%%ODATE.SYSCOP.#02
TEST.D%%ODATE.SYSCOP.#03
Currently, my regex is:
r".SYSCOP[0-9]{2}.D%%ODATE" - for finding this in the file
But what should the replacement look like? I need to have the numbers from the string at the end of the new string name.
.D%%ODATE.SYSCOP.# - this is just a string, not a regex, and it didn't work
Any ideas?
Find: (SYSCOP)(\d+)\.(D%%ODATE)
Replace: $3.$1.#$2 or \3.\1.#\2 for Python
You may use capturing groups with backreferences in the replacement part:
s = re.sub(r'(\.SYSCOP)([0-9]{2})(\.D%%ODATE)', r'\3\1.#\2', s)
Each \N in the replacement pattern refers to the Nth capturing group (pair of parentheses) in the pattern, so you may rearrange the matched parts as you need.
Note that . must be escaped to match a literal dot.
Please mind the raw string literal: the r prefix before the string literals helps you avoid excessive backslashes. '\3\1.#\2' is not the same as r'\3\1.#\2'; you may print the string literals and see for yourself. In short, inside raw string literals, string escape sequences like \a, \f, \n or \r are not recognized, and the backslash is treated as a literal backslash, which is exactly the character used to build regex escape sequences. (Note that, as patterns, r'\n' and '\n' both match a newline: the first is a regex escape sequence matching a newline and the second is a literal LF character.)
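To make the raw-string point concrete, here is a small sketch that applies the substitution to the three sample names from the question and then prints the replacement string with and without the r prefix. The list name and variable names are illustrative only:

```python
import re

names = [
    "TEST.SYSCOP01.D%%ODATE",
    "TEST.SYSCOP02.D%%ODATE",
    "TEST.SYSCOP03.D%%ODATE",
]

pattern = re.compile(r"(\.SYSCOP)([0-9]{2})(\.D%%ODATE)")

# Raw replacement: \3, \1 and \2 reach re.sub as group references.
renamed = [pattern.sub(r"\3\1.#\2", name) for name in names]
print(renamed)

# Without the r prefix, Python converts \3, \1 and \2 into control
# characters (octal escapes) before re.sub ever sees them.
print(repr("\3\1.#\2"))
print(repr(r"\3\1.#\2"))
```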

Regex forward slash

What does a '/' character mean in a regular expression?
I have observed the following example to match single or double digit numbers.
/^\d{1,2}$/
When I googled multiple regex cheat sheets, the forward slash did not show up as a character with meaning in regex....
What does '/' do in regex?
It doesn't actually do anything. In JavaScript, Perl and some other languages, it is used as a delimiter character that marks the start and end of a regular expression.
Some languages like PHP use it as a delimiter inside a string, with additional options passed after the closing delimiter, just like JavaScript and Perl (in this case, "m" for multi-line):
preg_match("/^\d{1,2}$/m", $input);
With this syntax, you can also use other characters, which can make matching literal /'s easier:
preg_match("![a-z]+/[a-z]+!i", "Example/Match");
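As a contrast, a language without regex delimiters shows that the slashes are not part of the pattern itself. A rough Python sketch of the two patterns above (the variable names are made up for illustration):

```python
import re

# /^\d{1,2}$/ in JavaScript, Perl or PHP: the slashes only delimit the pattern.
# In Python the pattern is an ordinary string, so the slashes are dropped.
one_or_two_digits = re.compile(r"^\d{1,2}$")

print(bool(one_or_two_digits.match("7")))
print(bool(one_or_two_digits.match("42")))
print(bool(one_or_two_digits.match("123")))

# With no delimiter in play, a literal / needs no escaping either;
# the "i" option becomes a flag argument.
slash_match = re.search(r"[a-z]+/[a-z]+", "Example/Match", re.IGNORECASE)
print(slash_match.group(0))
```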

How to match a whole word that includes special characters in regex

In regex is there a way to escape special characters in an entire region of text in PCRE syntax?
eg. hey+Im+A+Single+Word+Including+The+Pluses+And.Dots
Normally, to match the exact string in regex, I would have to escape every single + and . in the above string with a backslash. This means that if the string is a variable, one has to look for special characters and escape them manually. Is there an easier way to do this, by telling the regex engine to escape all special characters in a block of text?
The motivation behind this is to append this to a larger regex so even though there are easier ways to get exact matches they don't apply here.
Everything between the \Q and \E metacharacters is treated as a literal in Perl Compatible Regular Expressions (PCRE). So in your case:
\Qhey+Im+A+Single+Word+Including+The+Pluses+And.Dots\E
Not every engine supports this syntax; JavaScript and Python's re module, for example, do not.
If it's Python, you can use re.escape(string) to get an escaped, literal version of the string:
import re
search = 'hey+Im+A+Single+Word+Including+The+Pluses+And.Dots'
text = '''hey+Im+A+Single+Word+Including+The+Pluses+And.Dots
heyImmASingleWordIncludingThePlusessAndaDots
heyImASingleWordIncludingThePlusesAndxDots
'''
rc = re.escape(search)
# matches exactly the first line of text
print(re.findall(rc,text))
# matches lines two and three, because + acts as a quantifier and . as any character
print(re.findall(search,text))
-------- result -------------------
['hey+Im+A+Single+Word+Including+The+Pluses+And.Dots']
['heyImmASingleWordIncludingThePlusessAndaDots', 'heyImASingleWordIncludingThePlusesAndxDots']
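Since the question is about appending the literal to a larger regex, here is a short sketch of that step. The surrounding pattern (the literal word, then = and a number) is a hypothetical example and not something from the question:

```python
import re

search = 'hey+Im+A+Single+Word+Including+The+Pluses+And.Dots'

# Hypothetical larger pattern: the escaped literal is one building block,
# the rest of the pattern keeps its normal regex meaning.
larger = re.compile(r'^(' + re.escape(search) + r')=(\d+)$', re.MULTILINE)

text = '''hey+Im+A+Single+Word+Including+The+Pluses+And.Dots=42
heyImASingleWordIncludingThePlusesAndxDots=7
'''

# Only the first line contains the literal pluses and dot.
print(larger.findall(text))
```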

Notepad++ regex replace - how to remove this string

I want to remove strings in the form of the following where some-text is a random text string.
$('#some-text').val();
I've tried various things but I think the $ sign is messing things up since it's used in regex.
You need to escape some characters.
Try this -
\$\('#[^']*'\)\.val\(\);
Try this regex by escaping special chars:
\$\(.*\)\.val\(\);
To avoid escaping the special characters one by one, you can use a \Q...\E pair to surround the part that you want the regex engine to interpret literally:
\Q$('\E<your-regex>\Q').val();\E
Replace <your-regex> with your regex to match the selector, or whatever it is.
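The explicitly escaped pattern can be tried outside Notepad++ as well. Below is a hedged Python sketch: Python's re module has no \Q...\E, so re.escape stands in for it, and the sample source line is invented for the test:

```python
import re

# Hypothetical input containing two calls of the form to be removed.
source = "var a = $('#some-text').val(); var b = $('#other-id').val(); done();"

# Same pattern as the first suggestion above, with every metacharacter escaped.
explicit = re.compile(r"\$\('#[^']*'\)\.val\(\);")

# Rough equivalent of \Q$('#\E[^']*\Q').val();\E built with re.escape.
quoted = re.compile(re.escape("$('#") + r"[^']*" + re.escape("').val();"))

print(explicit.sub("", source))
print(quoted.sub("", source))
```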

What does it mean "you can’t hide the terminating delimiter of a pattern inside a regex construct" in the "Programming Perl"?

Sorry, but once again I need help understanding a rather complicated snippet from the "Programming Perl" book. Here it is (the part that is obscure to me is marked in bold):
patterns are parsed like double-quoted strings, all the normal double-quote conventions will work, including variable interpolation (unless you use single quotes
as the delimiter) and special characters indicated with backslash escapes. These are applied before the string is interpreted as a regular expression (This is one of the
few places in the Perl language where a string undergoes more than one pass of
processing). ...
Another consequence of this two-pass parsing is that the ordinary Perl tokener
finds the end of the regular expression first, just as if it were looking for the
terminating delimiter of an ordinary string. Only after it has found the end of the
string (and done any variable interpolation) is the pattern treated as a regular
expression. Among other things, this means you can’t “hide” the terminating
delimiter of a pattern inside a regex construct (such as a bracketed character class
or a regex comment, which we haven’t covered yet). Perl will see the delimiter
wherever it is and terminate the pattern at that point.
First, why does it say "Only after it has found the end of the string" and not "the end of the regular expression", which is what it was looking for, as stated before?
Second, what does "you can’t “hide” the terminating delimiter of a pattern inside a regex construct" mean? Why can't I hide the terminating delimiter /, when I can place it wherever I want, either in the regexp directly, as in /A\/C/, or in an interpolated variable (even without \):
my $s = 'A/';
my $p = 'A/C';
say $p =~ /$s/;
outputs 1.
While I was writing and re-reading my question, I thought that this snippet is about using a single quote as the regexp delimiter; then it all seems quite coherent. Is my assumption correct?
My appreciation.
It says "end of the string" instead of "end of the regular expression" because at that point it's treating the regex as if it were just a string.
It's trying to say that this does not work:
/foo[-/_]/
Even though normal regex metacharacters are not special inside [], Perl will see the regex as /foo[-/ and complain about an unterminated class.
It's trying to say that Perl does not parse the regex as it reads it. First it finds the end of the regex in your source code as if it were a quoted string, so the only special character is \. Then it interpolates any variables. Then it parses the result as a regular expression.
You can hide the terminating delimiter with \ because that works in ordinary strings. You can hide the delimiter inside an interpolated variable, because interpolation happens after the delimiter is found. If you use a bracketing delimiter (e.g. { } or [ ]), you can nest matching pairs of delimiters inside the regex, because q{} works like that too.
But you can't hide it inside any other regex construct.
Say you want to match a *. You would use
m/\*/
But what if you used * as your delimiter? The following doesn't work:
m*\**
because it's interpreted as
m/*/
as seen in the following:
$ perl -e'm*\**'
Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE / at -e line 1.
Take the string literal
"a\"b"
It produces the string
a"b
Similarly, the match operator
m*a\*b*
produces the regex pattern
a*b
If you want to match a literal *, you have to use other means. In other words:
m*a\*b* === m/a*b/ matches pattern a*b
m*a\x{2A}b* === m/a\*b/ matches pattern a\*b
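Python has no regex delimiters, but a similar two-pass effect can be observed there, because the string literal is processed before the regex engine sees the pattern. This is only an analogy to the Perl behaviour described above, sketched with illustrative names:

```python
import re

# Pass 1 (string literal): "\x2a" becomes "*" before re sees the pattern,
# so the regex engine receives a*b, meaning "zero or more a, then b".
cooked = re.compile("a\x2ab")

# Raw string: the backslash survives pass 1, and the regex engine itself
# decodes \x2a as a literal asterisk.
raw = re.compile(r"a\x2ab")

print(cooked.pattern, bool(cooked.fullmatch("aab")), bool(cooked.fullmatch("a*b")))
print(raw.pattern, bool(raw.fullmatch("aab")), bool(raw.fullmatch("a*b")))
```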