Regex to match the argument of specific functions

How can I write a regular expression to match strings like the following?
name(abc1) or number(param9) or listget(12jtxgh)
I want to match the string enclosed in parentheses only if it is preceded by name, number, or listget.
I tried this:
r'((.*?))'
If my expression looks like this:
(name(foo) & number(bar)) - listget(baz)
then it also matches (name(foo). I want my regex to extract only foo, bar, and baz from the expression above, because they are preceded by name, number, or listget.
I have to write the regex in this form only: r'regex'

The following regular expression should do the trick:
(?:listget|name|number)\(([^)]+)\)
As others pointed out, parentheses must be escaped to match them literally; otherwise the regex engine treats them as syntax for other purposes (like capturing groups).
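A minimal Python check of this pattern, assuming the expressions arrive as plain strings (the helper name extract_args is hypothetical). Because the pattern has exactly one capturing group, re.findall returns only the captured arguments.

```python
import re

# Pattern from the answer: a non-capturing group for the function names,
# escaped literal parentheses, and one capturing group for the argument.
FUNC_ARG = re.compile(r'(?:listget|name|number)\(([^)]+)\)')


def extract_args(expression):
    """Return the arguments of name(), number() and listget() calls."""
    # With exactly one capturing group, findall returns that group's text.
    return FUNC_ARG.findall(expression)


print(extract_args('(name(foo) & number(bar)) - listget(baz)'))
# ['foo', 'bar', 'baz']
print(extract_args('name(abc1) or number(param9) or listget(12jtxgh)'))
# ['abc1', 'param9', '12jtxgh']
```

One caveat: without a word boundary the pattern would also match the tail of something like rename(x); prefixing \b to the group is one possible tweak if that matters for your input.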

In regex, parentheses are special characters used for grouping. To match them literally, escape them with a backslash, like this: \(.
To match any character except a given one, for example a closing parenthesis, use a negated character class:
[^)]
Inside a character class you don't need the escape character.
To match one of several alternatives, use the pipe character | inside parentheses. Example:
(mary|john)
To keep a group from being captured in the match result, start it with ?: inside the parentheses, for example (?:mary|john).
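The three points above can be seen side by side with Python's re.findall, which returns group contents when the pattern has a capturing group and the whole match otherwise. The sample strings are made up for illustration.

```python
import re

text = 'mary said hi, john said bye'

# Capturing group: findall returns only the text of the group.
captured = re.findall(r'(mary|john) said', text)
print(captured)        # ['mary', 'john']

# Non-capturing group (?:...): findall returns the whole match instead.
non_captured = re.findall(r'(?:mary|john) said', text)
print(non_captured)    # ['mary said', 'john said']

# Escaped parentheses match literally; [^)]+ stops at the first ')'.
inside = re.findall(r'\(([^)]+)\)', 'f(a) g(bc)')
print(inside)          # ['a', 'bc']
```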

Related

Regular expression in Snowflake - starts with string and ends with digits

I am struggling with writing a regular expression in Snowflake.
SELECT
'DEM7BZB01-123' AS SKU,
RLIKE('DEM7BZB01-123','^DEM.*\d\d$') AS regex
I would like to find all strings that start with "DEM" and end with two digits. Unfortunately, the expression I am using returns FALSE.
I checked this expression in two online regex tools, and it worked there.
In Snowflake string literals, the backslash character \ is an escape character.
Reference: Escape Characters and Caveats
So in a single-quoted string you need to write two backslashes in a regex to express one.
SELECT
'DEM7BZB01-123' AS SKU,
RLIKE('DEM7BZB01-123', '^DEM.*\\d\\d$') AS regex
Alternatively, you could write the regex pattern so that no backslash is needed.
For example, the pattern ^DEM.*[0-9]{2}$ matches the same as the pattern ^DEM.*\d\d$.
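Python string literals have the same two escaping layers, so they make a convenient sandbox for the idea: a non-raw string needs '\\d' to hand the regex engine \d, while a raw string (playing roughly the role of Snowflake's $$...$$) needs only r'\d'. This is a sketch in Python, not Snowflake SQL.

```python
import re

# In a regular (non-raw) Python string literal, '\\d' denotes two characters:
# a backslash and 'd'. That is exactly the text of the raw string r'\d',
# which is what the regex engine needs to see.
escaped = '^DEM.*\\d\\d$'
raw = r'^DEM.*\d\d$'
print(escaped == raw)                      # True

# Backslash-free alternative from the answer.
no_backslash = '^DEM.*[0-9]{2}$'

sku = 'DEM7BZB01-123'
print(bool(re.search(raw, sku)))           # True
print(bool(re.search(no_backslash, sku)))  # True
```

In Python specifically, \d in a str pattern also matches non-ASCII decimal digits, so \d\d and [0-9]{2} agree only for ASCII input there.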
You need to escape the backslashes in your SQL string before it can be parsed as a regex (sometimes it gets a bit silly with the number of backslashes needed).
Your example should look like this:
RLIKE('DEM7BZB01-123','^DEM.*\\d\\d$') AS regex
RLIKE (which is an alias in Snowflake for the SQL Standard REGEXP_LIKE function) implicitly adds ^ and $ to your search pattern...
The function implicitly anchors a pattern at both ends (i.e. '' automatically becomes '^$', and 'ABC' automatically becomes '^ABC$').
so you can remove them, which then allows you to use $$ quoting (a trailing $ anchor would otherwise run into the closing $$ delimiter):
In single-quoted string constants, you must escape the backslash character in the backslash-sequence. For example, to specify \d, use \\d. For details, see Specifying Regular Expressions in Single-Quoted String Constants (in this topic).
You do not need to escape backslashes if you are delimiting the string with pairs of dollar signs ($$) (rather than single quotes).
so you can simply use the regex DEM.*\d\d, without extra escaping, to find all strings that start with DEM and end with two digits:
SELECT
'DEM7BZB01-123' AS SKU
, RLIKE('DEM7BZB01-123', $$DEM.*\d\d$$) AS regex
which gives
SKU |REGEX|
-------------+-----+
DEM7BZB01-123|true |
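The implicit anchoring described above behaves like Python's re.fullmatch, which succeeds only if the pattern covers the entire string. A small Python sketch of that analogy (it does not run against Snowflake):

```python
import re

sku = 'DEM7BZB01-123'

# RLIKE anchors the pattern at both ends, so 'DEM.*\d\d' has to cover the
# whole string. re.fullmatch has the same all-or-nothing behavior.
full = bool(re.fullmatch(r'DEM.*\d\d', sku))
print(full)            # True

# re.search is unanchored, so a pattern covering only a prefix succeeds
# there, but the same pattern fails under fullmatch.
prefix_search = bool(re.search(r'DEM\d', sku))
prefix_full = bool(re.fullmatch(r'DEM\d', sku))
print(prefix_search)   # True
print(prefix_full)     # False
```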

How to exclude character that has preceding character different than specified in regular expression?

Using a regular expression, I would like to get all characters between round brackets, but escaped \( and \) sequences should also be included in the result.
Examples:
input: fo(ob)a)r
output: ob
input: foo(bar\(qwerty\))baz
output: bar\(qwerty\)
This is what I used for finding text between brackets:
(?<=\()([^\s\(\)]+)(?=\)), but I can't make exceptions for brackets preceded by \.
You could do something like this:
.*(?<!\\)\((.*?)(?<!\\)\)
Basically, the greedy .* consumes as many characters as possible, backtracking to the last opening parenthesis that is not preceded by a backslash (checked with a negative lookbehind); the group then captures the following characters up to the first closing parenthesis that is likewise not preceded by a backslash.
Note that this regex may not work properly if the backslashes themselves are escaped (for example \\( ).
Example: https://regex101.com/r/BqVKZp/1
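A quick Python check of this pattern against both examples from the question; Python's re supports the fixed-width negative lookbehind (?<!\\) used here. The helper name inner is hypothetical.

```python
import re

# Pattern from the answer: greedy .* backtracks to the last '(' that is not
# preceded by a backslash, then (.*?) captures lazily up to the first ')'
# that is not preceded by a backslash.
PATTERN = re.compile(r'.*(?<!\\)\((.*?)(?<!\\)\)')


def inner(text):
    """Return the bracketed content, or None if there is no match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


print(inner('fo(ob)a)r'))               # ob
print(inner(r'foo(bar\(qwerty\))baz'))  # bar\(qwerty\)
```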
This regex works for both your examples, without any lookaheads and lookbehinds:
\((.+[^\\])\)
A U (ungreedy) flag is needed, so that .+ matches as few characters as possible.
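Python's re has no U flag, so a sketch of the same idea writes the lazy quantifier +? explicitly. With the default greedy .+ the first example would capture ob)a instead of ob, which is why the flag matters.

```python
import re

# Lazy .+? stands in for the U (ungreedy) flag, which Python's re lacks.
LAZY = re.compile(r'\((.+?[^\\])\)')
# Same pattern without ungreedy behavior, for comparison.
GREEDY = re.compile(r'\((.+[^\\])\)')


def inner(text, pattern=LAZY):
    """Return the bracketed content, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(inner('fo(ob)a)r'))               # ob
print(inner(r'foo(bar\(qwerty\))baz'))  # bar\(qwerty\)
print(inner('fo(ob)a)r', GREEDY))       # ob)a
```

Note that .+? followed by [^\\] needs at least two characters between the parentheses, so a one-character case such as f(a) is not matched.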

Find and replace parts of matched string in Notepad++

I have something in a text file that looks like '%r'%XXXX, where XXXX represents some name at the end. Examples include '%r'%addR or '%r'%removeA. I can match these patterns using the regex '%r'%\w+. What I would like to replace this with is '{!r}'.format(XXXX). Note that the name has to stay the same in the replacement, so I'd get '{!r}'.format(addR) or '{!r}'.format(removeA). Is there a way to replace parts of the matched string in this way while retaining the unknown variable name pulled out with \w+ in the regex search?
I'm specifically looking for a solution using the find and replace features in Notepad++.
You can use
'%r'%(\w+)
and replace with '{!r}'.format\(\1\)
The '%r'%(\w+) pattern contains a pair of unescaped parentheses that create a capturing group. Inside the replacement pattern, we use a \1 backreference to restore that value.
NOTE: The ( and ) in the replacement must be escaped because otherwise they are treated as special characters of Boost's conditional replacement syntax.
See more on capturing groups and backreferences.
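For comparison, the same capture-and-backreference replacement in Python's re.sub. Python's replacement template has no conditional syntax, so the parentheses stay unescaped there. The sample source text is made up.

```python
import re

source = "print('%r'%addR)\nprint('%r'%removeA)"

# Group 1 captures the name after '%r'%; \1 puts it back in the replacement.
# Unlike Notepad++ (Boost), Python's re.sub does not need the literal
# parentheses in the replacement to be escaped.
result = re.sub(r"'%r'%(\w+)", r"'{!r}'.format(\1)", source)
print(result)
# print('{!r}'.format(addR))
# print('{!r}'.format(removeA))
```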
Search for:
'%r'%(XXXX)
Replace with:
Whatever you like \1
\1 is replaced by the text captured by the first set of grouping parentheses.

Strange behavior when using regex to match parentheses in vim

I'm having some trouble understanding why a regular expression is not working. I'm searching for the phrase #Test(groups = {"broken"}), and I'm not able to find it with this expression:
#Test\(groups = {"broken"}\)
However, this expression yields results:
#Test\(.*groups = {"broken"}\)
Why is this happening? I can't see why the first expression would not work, but I understand why the second one does.
\( is used for grouping/capture in Vim because Vim does not use "very magic" (Perl-like) regexes by default. If you want to search for a literal parenthesis, use a plain (.
The second expression works because \(...\) there is just a group, and the .* inside it matches the literal (.
If you want to search for literal text, just prepend \V to the search pattern; then, only the backslash has special meaning and must be escaped:
/\V#Test(groups = {"broken"})
In contrast to most other regular expression dialects, many Vim atoms need to be prefixed with \ to be non-literal. To make Vim's patterns look more like Perl's, you can prepend \v; then (...) does capture grouping (as you expected), and you need to escape the parentheses as \( and \) to match them literally.
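Python's re uses the Perl-style conventions described above, so it can illustrate the difference: the asker's first pattern works unchanged there, while bare parentheses only form a group (the situation \( \) creates in Vim's default mode). re.escape is used as a rough analogue of Vim's \V; this is an analogy, not Vim behavior.

```python
import re

line = '#Test(groups = {"broken"})'

# The asker's first pattern in a Perl-style engine: \( and \) are literal.
# In Python, '{' not followed by a valid repeat count is also literal.
perl_style = r'#Test\(groups = {"broken"}\)'

# Bare parentheses only group here, so this looks for
# '#Testgroups = {"broken"}' and finds nothing.
grouped = r'#Test(groups = {"broken"})'

# Rough analogue of Vim's \V: escape every special character.
literal = re.escape('#Test(groups = {"broken"})')

print(bool(re.search(perl_style, line)))  # True
print(bool(re.search(grouped, line)))     # False
print(bool(re.search(literal, line)))     # True
```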

NOTEPAD++ REGEX - I can't get what's in between two strings, I don't get it

I'm so close to understanding regex, but I'm a bit stumped; I thought I understood lazy and greedy matching.
Here is my current regex: <g_n><!\[CDATA\[([^]]+)(?=]]><\/g_n>)
My current regex makes:
<g_n><![CDATA[xxxxxxxxxx]]></g_n>
match to:
<g_n><![CDATA[xxxxxxxxxx
But I want to make it match like this:
xxxxxxxxxx
You want
<g_n><!\[CDATA\[(.*?)]]></g_n>
Then, to replace the whole match with just the inner text, use
\1
in the replacement box.
You're matching the whole string; the parentheses around .*? capture the inner text and store it in group \1.
So the match is the whole string, with \1 referring to the part you want.
To change the xxxxxxxxxx text instead:
Regex:
(<g_n><!\[CDATA\[)(?:.*?)(]]></g_n>)
Replacement:
\1WHAT YOU WANT TO CHANGE TO\2
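The two replacements above can be sketched with Python's re.sub, which uses the same \1 and \2 backreferences. NEW is a hypothetical replacement text.

```python
import re

text = '<g_n><![CDATA[xxxxxxxxxx]]></g_n>'

# Lazy (.*?) captures only the CDATA content; square brackets are escaped.
pattern = r'<g_n><!\[CDATA\[(.*?)\]\]></g_n>'
content = re.search(pattern, text).group(1)
print(content)   # xxxxxxxxxx

# Replace the whole match with group 1 to keep only the content.
only_inner = re.sub(pattern, r'\1', text)
print(only_inner)   # xxxxxxxxxx

# Keep the wrapper (groups 1 and 2) and swap out the content in between.
wrapper = r'(<g_n><!\[CDATA\[).*?(\]\]></g_n>)'
changed = re.sub(wrapper, r'\1NEW\2', text)
print(changed)   # <g_n><![CDATA[NEW]]></g_n>
```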
It looks like you need to add escaping backslashes to the two closing square brackets, as they are literal characters in the string you're parsing.
<g_n><!\[CDATA\[.*+?\]\]><\/g_n>
Any square bracket not escaped with a backslash may be treated as regex syntax (part of a character class), which in this case won't match the input string.
EDIT: I think the +? is redundant.
\[.*\]\]> ...
should suffice, since .* means any character, any number of times.
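Whether greedy .* suffices depends on the input. A minimal Python sketch, assuming a hypothetical line that holds two CDATA blocks: greedy .* runs to the last ]]></g_n>, while the lazy .*? from the first answer stops at the nearest one.

```python
import re

# Hypothetical input with two CDATA blocks on the same line.
line = '<g_n><![CDATA[first]]></g_n><g_n><![CDATA[second]]></g_n>'

greedy = re.findall(r'<g_n><!\[CDATA\[(.*)\]\]></g_n>', line)
lazy = re.findall(r'<g_n><!\[CDATA\[(.*?)\]\]></g_n>', line)

print(greedy)  # ['first]]></g_n><g_n><![CDATA[second']
print(lazy)    # ['first', 'second']
```

With one block per line, as in the question, both versions capture the same text.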
Tested with Notepad++ 6.3.2:
find: (<g_n><!\[CDATA\[)([^]]+)(?=]]></g_n>)
replace: $1WhatYouWant
You can replace + with * in the pattern to also match an empty CDATA section:
<g_n><![CDATA[]]></g_n>
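A Python sketch of this lookahead variant with * instead of +. Python writes the backreference as \1 where the answer uses $1 in Notepad++; because the lookahead is not consumed, ]]></g_n> survives the replacement.

```python
import re

# Lookahead version from the answer, with * so an empty CDATA also matches.
# The closing brackets inside the lookahead are escaped for clarity.
pattern = r'(<g_n><!\[CDATA\[)([^]]*)(?=\]\]></g_n>)'

samples = ['<g_n><![CDATA[xxxxxxxxxx]]></g_n>', '<g_n><![CDATA[]]></g_n>']
results = [re.sub(pattern, r'\1WhatYouWant', s) for s in samples]

for r in results:
    print(r)
# <g_n><![CDATA[WhatYouWant]]></g_n>
# <g_n><![CDATA[WhatYouWant]]></g_n>
```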