I am struggling to write a regular expression in Snowflake.
SELECT
'DEM7BZB01-123' AS SKU,
RLIKE('DEM7BZB01-123','^DEM.*\d\d$') AS regex
I would like to find all strings that start with "DEM" and end with two digits. Unfortunately, the expression that I am using returns FALSE.
I checked this expression in two online regex testers and it worked there.
In Snowflake, the backslash character \ is an escape character.
Reference: Escape Characters and Caveats
So you need to write two backslashes in the string for one backslash to reach the regex.
SELECT
'DEM7BZB01-123' AS SKU,
RLIKE('DEM7BZB01-123', '^DEM.*\\d\\d$') AS regex
Or you could write the regex pattern in such a way that the backslash isn't used.
For example, the pattern ^DEM.*[0-9]{2}$ matches the same as the pattern ^DEM.*\d\d$.
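As a quick check of that equivalence, here is a minimal sketch that uses Python's re module as a stand-in for Snowflake's regex engine. That substitution is an assumption, as are all sample values other than the SKU from the question. It compares the two patterns on a few ASCII strings.

```python
import re

# Only the first sample comes from the question; the rest are hypothetical.
samples = ["DEM7BZB01-123", "DEM12", "DEM7BZB01-12X", "XDEM7BZB01-123"]

with_backslash = re.compile(r"^DEM.*\d\d$")
without_backslash = re.compile(r"^DEM.*[0-9]{2}$")

# Map each sample to (result with \d\d, result with [0-9]{2}).
results = {
    s: (bool(with_backslash.search(s)), bool(without_backslash.search(s)))
    for s in samples
}

for sample, (a, b) in results.items():
    print(sample, a, b)
```

On ASCII input the two patterns agree. In Python, \d can also match non-ASCII digits in str patterns, so the equivalence is only claimed for inputs like these.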
You need to escape the backslashes in your SQL string so that a single backslash reaches the regex engine. (The number of backslashes needed can sometimes get a bit silly.)
Your example should look like this
RLIKE('DEM7BZB01-123','^DEM.*\\d\\d$') AS regex
RLIKE (which is an alias in Snowflake for the SQL Standard REGEXP_LIKE function) implicitly adds ^ and $ to your search pattern...
The function implicitly anchors a pattern at both ends (i.e. '' automatically becomes '^$', and 'ABC' automatically becomes '^ABC$').
so you can remove them, and that then allows you to use $$ quoting
In single-quoted string constants, you must escape the backslash character in the backslash-sequence. For example, to specify \d, use \\d. For details, see Specifying Regular Expressions in Single-Quoted String Constants (in this topic).
You do not need to escape backslashes if you are delimiting the string with pairs of dollar signs ($$) (rather than single quotes).
so you can simply use the regex DEM.*\d\d to find all strings that start with DEM and end with two digits, without extra escaping, as follows
SELECT
'DEM7BZB01-123' AS SKU
, RLIKE('DEM7BZB01-123', $$DEM.*\d\d$$) AS regex
which gives
SKU |REGEX|
-------------+-----+
DEM7BZB01-123|true |
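The implicit anchoring described above behaves much like full-string matching in other regex libraries. As a rough analogy only (Python's re.fullmatch is not Snowflake's RLIKE, and equating them is an assumption), the following sketch shows that an unanchored pattern applied to the whole string gives the same answer as the explicitly anchored one.

```python
import re

sku = "DEM7BZB01-123"

# Full-string match: comparable to a pattern that is anchored implicitly.
implicit = re.fullmatch(r"DEM.*\d\d", sku) is not None

# Explicit anchors with an ordinary search give the same answer.
explicit = re.search(r"^DEM.*\d\d$", sku) is not None

# A pattern that covers only part of the string fails a full-string match.
partial = re.fullmatch(r"DEM", sku) is not None

print(implicit, explicit, partial)
```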
I've got a little problem with a regex.
I have a few strings in one file that look like this:
TEST.SYSCOP01.D%%ODATE
TEST.SYSCOP02.D%%ODATE
TEST.SYSCOP03.D%%ODATE
...
What I need is to define the correct regex and rename those strings to:
TEST.D%%ODATE.SYSCOP.#01
TEST.D%%ODATE.SYSCOP.#02
TEST.D%%ODATE.SYSCOP.#03
So far I have this regex:
r".SYSCOP[0-9]{2}.D%%ODATE" - for finding the strings in the file
But what should the replacement look like? I need the numbers from the original string at the end of the new name.
.D%%ODATE.SYSCOP.# - this is just a plain string, not a regex, and it didn't work
Any ideas?
Find: (SYSCOP)(\d+)\.(D%%ODATE)
Replace: $3.$1.#$2 or \3.\1.#\2 for Python
You may use capturing groups with backreferences in the replacement part:
s = re.sub(r'(\.SYSCOP)([0-9]{2})(\.D%%ODATE)', r'\3\1.#\2', s)
Each \N in the replacement pattern refers to the Nth pair of parentheses (capturing group) in the pattern, so you can rearrange the matched value as needed.
Note that . must be escaped to match a literal dot.
Mind the raw string literal: the r prefix before a string literal helps you avoid excessive backslashes. '\3\1.#\2' is not the same as r'\3\1.#\2'; print both string literals to see for yourself. In short, inside raw string literals, string escape sequences like \a, \f, \n or \r are not recognized, and the backslash is treated as a literal backslash, which is exactly what is needed to build regex escape sequences. (Note that r'\n' and '\n' both match a newline: the first is a regex escape sequence matching a newline, and the second is a literal LF character.)
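To make this concrete, here is a small self-contained sketch that applies the substitution to the three sample names from the question and compares the raw and non-raw replacement literals. The variable names are illustrative only.

```python
import re

lines = [
    "TEST.SYSCOP01.D%%ODATE",
    "TEST.SYSCOP02.D%%ODATE",
    "TEST.SYSCOP03.D%%ODATE",
]

# Three capturing groups: ".SYSCOP", the two digits, and ".D%%ODATE".
pattern = re.compile(r"(\.SYSCOP)([0-9]{2})(\.D%%ODATE)")

# Reorder the groups: group 3, then group 1, then ".#" and the digits.
renamed = [pattern.sub(r"\3\1.#\2", line) for line in lines]
for name in renamed:
    print(name)

# The raw literal keeps its backslashes; the plain literal turns
# \3, \1 and \2 into single control characters (octal escapes).
raw_len = len(r"\3\1.#\2")
plain_len = len("\3\1.#\2")
print(raw_len, plain_len)
```

The length difference (8 characters versus 5) is why the non-raw replacement does not behave as a backreference template.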
I'm trying some string manipulation using regexes, but I'm not getting the expected output.
var myString = "/api/<user_id:int>/"
myString.replace(Regex("<user_id:int>"), "(\\d+)")
This should give me something like /api/(\d+)/, but instead I get /api/(d+)/.
However, if I create an escaped string directly, like var a = "\d+",
I get the correct output \d+ (which I can then use to create a regex Pattern).
Is this due to the way String::replace works?
If so, isn't this a bug? Why is it removing my escape sequences?
To make the replace a literal string, use:
myString.replace(Regex("<user_id:int>"), Regex.escapeReplacement("(\\d+)"))
For details, this is what kotlin Regex.replace is doing:
Pattern nativePattern = Pattern.compile("<user_id:int>");
String m = nativePattern.matcher("/api/<user_id:int>/").replaceAll("(\\d+)");
// m is now "/api/(d+)/"
From Matcher.replaceAll() javadoc:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string. Dollar signs may be treated
as references to captured subsequences as described above, and
backslashes are used to escape literal characters in the replacement
string.
The call to Regex.escapeReplacement above does exactly that, turning (\\d+) to (\\\\d+)
You are using a .replace overload that takes a regex as the first argument, so the second argument is parsed as a regex replacement pattern. Inside a regex replacement pattern, \ is special: it can, for example, escape a dollar sign so that it is treated as a literal dollar sign. A literal backslash inside a regex replacement pattern must therefore be doubled.
You might use
myString.replace(Regex("<user_id:int>"), """(\\d+)""")
Whenever you have to search and replace with a regex and your replacement pattern is a dynamic value, you should use Regex.escapeReplacement (see GUIDO's answer).
However, since you are replacing one literal value with another, you do not need a regex here:
myString.replace("<user_id:int>", """(\d+)""")
This yields /api/(\d+)/.
Note the use of raw string literals where a backslash is parsed as a literal backslash.
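The same distinction between a replacement template and a literal replacement exists in other regex libraries. The sketch below is a Python analogue, not Kotlin, and treating it as comparable is an assumption. It shows three ways to get a literal backslash into the result: a callable replacement, a doubled backslash in the template, and a plain string replace with no regex at all.

```python
import re

template = "/api/<user_id:int>/"
placeholder = re.escape("<user_id:int>")

# A callable replacement is inserted verbatim, with no escape processing.
via_callable = re.sub(placeholder, lambda m: r"(\d+)", template)

# A string replacement is a template, so a literal backslash is doubled.
via_doubled = re.sub(placeholder, r"(\\d+)", template)

# Replacing one literal with another needs no regex at all.
via_plain = template.replace("<user_id:int>", r"(\d+)")

print(via_callable, via_doubled, via_plain)
```

All three variables end up holding the same string, with a single backslash before the d.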
The replacement, as the regex engine sees it, is interpolated like a double-quoted string.
This is true of many regex engines.
It is there to distinguish control codes such as tab, newline or carriage return.
Nothing special here.
So the replacement, as the engine wants to see it, is (\\d+).
The language applies the same interpolation to the string literal.
Final result: repl_str = "(\\\\d+)"
In regex is there a way to escape special characters in an entire region of text in PCRE syntax?
eg. hey+Im+A+Single+Word+Including+The+Pluses+And.Dots
Normally, to match the exact string in a regex, I would have to escape every single + and . in the above string with a backslash (\). This means that if the string is a variable, one has to look for special characters and escape them manually. Is there an easier way to do this, by telling the regex engine to escape all special characters in a block of text?
The motivation behind this is to append this to a larger regex so even though there are easier ways to get exact matches they don't apply here.
Everything between the \Q and \E metacharacters is treated as a literal in Perl Compatible Regular Expressions (PCRE). So in your case:
\Qhey+Im+A+Single+Word+Including+The+Pluses+And.Dots\E
Not every engine supports this syntax (Python's re module and JavaScript do not, for example).
If it's Python, you can use re.escape(string) to get a string in which all special characters are escaped:
import re
search = 'hey+Im+A+Single+Word+Including+The+Pluses+And.Dots'
text = '''hey+Im+A+Single+Word+Including+The+Pluses+And.Dots
heyImmASingleWordIncludingThePlusessAndaDots
heyImASingleWordIncludingThePlusesAndxDots
'''
rc = re.escape(search)
# matches exactly the first line of text
print(re.findall(rc,text))
# matches lines two and three, because + is treated as a repeat and . as any character
print(re.findall(search,text))
-------- result -------------------
['hey+Im+A+Single+Word+Including+The+Pluses+And.Dots']
['heyImmASingleWordIncludingThePlusessAndaDots', 'heyImASingleWordIncludingThePlusesAndxDots']
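Since the question's motivation was to append the literal text to a larger regex, here is a minimal sketch of that use. The surrounding name=...; id=... format is purely hypothetical and only serves to show the escaped fragment embedded between ordinary regex syntax.

```python
import re

search = "hey+Im+A+Single+Word+Including+The+Pluses+And.Dots"

# Hypothetical larger pattern: the escaped literal sits between regex parts.
larger = re.compile(r"name=(" + re.escape(search) + r");\s*id=(\d+)")

exact = "name=hey+Im+A+Single+Word+Including+The+Pluses+And.Dots; id=42"
lookalike = "name=heyImASingleWordIncludingThePlusesAndxDots; id=42"

hit = larger.search(exact)
miss = larger.search(lookalike)

print(hit.group(1), hit.group(2))
print(miss)
```

Only the exact text matches. The lookalike string, which the unescaped pattern would accept, is rejected.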
How to use VS Find/Replace to replace:
this: $('a[name="lnkFind"]').on('click', function
with this: $(document).on("click", "a[name='lnkFind']", function
I'm not sure which characters need to be escaped: single quotes, double quotes, or both? None of the patterns I've tried seem to find a match.
You'll need to escape many of these characters.
Find/Replace will complain about the unescaped ( and ), even the bare ( at the end, because it's missing a matching ). The square brackets, which denote character sets, and the $ need escaping as well.
So this should work as the pattern:
\$\('a\[name="lnkFind"\]'\)\.on\('click', function
You should look at a list of special characters in Regular Expressions.
$, ., [, ] should all be escaped.
http://www.fon.hum.uva.nl/praat/manual/Regular_expressions_1__Special_characters.html
Except in special cases (such as vim regex), in general you can escape any and all special characters in regex to get their literal form, i.e. escaping a special character that doesn't need to be escaped, won't do any harm.
That said, here's the minimum that needs to be escaped:
\$\('a\[name="lnkFind"]'\)\.on\('click', function
I don't think you'll need to escape anything in the replacement, because only a $ or \ followed by a number will be interpreted.
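One way to sanity-check such a pattern outside the editor is to run it through a regex library. The sketch below uses Python's re module as a stand-in for the Visual Studio engine, which is an assumption since the dialects differ in details. Its pattern escapes the $, both parentheses pairs, the opening square bracket and the dot.

```python
import re

source = """$('a[name="lnkFind"]').on('click', function"""

# Escaped: $, every ( and ), the opening [ and the dot before "on".
pattern = re.compile(r"""\$\('a\[name="lnkFind"]'\)\.on\('click', function""")

replacement = '''$(document).on("click", "a[name='lnkFind']", function'''

# A callable replacement is inserted as-is, so nothing in it is interpreted.
result = pattern.sub(lambda m: replacement, source)
print(result)
```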
I am new to regular expressions and am trying to understand what kind of string the following regular expression is meant to match:
set result [regexp "$PersonName\\|\[^\\n]*\\|\[^\\n]*\\|\\s*0x$PersonId\\|\\s*$gender" [split $outPut \n]]
What is the regular expression above trying to match? What is the value of result?
The complication here is that the regex specification is protected from Tcl's string interpolation rules.
To detangle, you should think along these lines:
"$PersonName\\|\[^\\n]*\\|\[^\\n]*\\|\\s*0x$PersonId\\|\\s*$gender" is a double-quoted string, so the usual interpolation rules apply:
Each backslash escapes the following character;
Each $variable reference is substituted for its value;
[command ...] is substituted for the string returned by the executed command.
So each occurrence of \\ is there to produce a single '\' character in the interpolated string, and each \[ is meant to prevent Tcl from interpreting those [^\n] as commands (named "^\n") to be executed.
So if we suppose that the PersonName variable contains "Joe", PersonId contains DEAD and gender contains "male", Tcl will get Joe\|[^\n]*\|[^\n]*\|\s*0xDEAD\|\s*male after performing all substitutions on the source string.
Now the resulting string is passed to the RE engine, which applies its own syntactic rules when it parses the string denoting a regex, as described in the re_syntax manual page.
According to these rules, each backslash, again, escapes the following character unless it's a special "character-entry escape" so here we have:
\s denotes "any whitespace character";
\| escapes the '|', making it lose its usual meaning (introducing an alternation) so that it literally matches the character '|'.
The [^\n]* construct means "a longest series of zero or more characters not including the newline character". Read up on "character classes" in regexes for more info.
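The two stages described above (substitution first, regex parsing second) can be imitated outside Tcl. The following sketch uses Python with the example values Joe, DEAD and male. The sample input lines are hypothetical, and Python's regex dialect is only assumed to agree with Tcl's for the constructs used here.

```python
import re

person_name, person_id, gender = "Joe", "DEAD", "male"

# Stage 1: substitute the variables; the regex backslashes stay literal.
pattern = rf"{person_name}\|[^\n]*\|[^\n]*\|\s*0x{person_id}\|\s*{gender}"
print(pattern)

# Stage 2: hand the resulting string to the regex engine.
matching_line = "Joe|Smith|London| 0xDEAD| male"   # hypothetical input
other_line = "Joe Smith male"                      # hypothetical input

# Mirrors a found / not-found result of 1 or 0.
result_match = 1 if re.search(pattern, matching_line) else 0
result_other = 1 if re.search(pattern, other_line) else 0
print(result_match, result_other)
```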
The value of result will be the number of times the regular expression matched. In the absence of the -all option, that will always be 0 or 1 (i.e., not-found/found).
Overall, that regular expression (which @kostix's answer explains well) is really ugly though. REs are a powerful tool, but you can get very confused with them very easily. Moreover, if you're splitting the output on newlines then you don't need to try to exclude them in the RE match; there will definitely be no newlines in the result of split in that case.
If we better understood what you were trying to do, we could direct you to far more effective methods of matching (e.g., using lsearch with suitable options, loading the data into an in-memory SQLite database).