I'm looking for some help diagnosing what I'm doing wrong when applying a working regular expression in Snowflake (specifically REGEXP_REPLACE()).
I'm trying to replace commas (not within a quoted section) with another string. I've tested and confirmed that the expression returns the desired result in regex101.com, but when I try to apply it in a Snowflake query I'm not getting any results.
I've seen references in the REGEXP_REPLACE() documentation (indicating the need for additional escapes on brackets), which I have applied, but still no dice.
Can anyone tell me what I'm missing?
Sample text (C1):
99999999999,"SOME CORPROATION, Dissolved January 17, 1983",123 SOME STREET #760,,Denver,CO,90210,,,,,,,,Voluntarily Dissolved,CO,Corporation,JOHN,F.,DOE,,,1512 SOME STREET #760,,DENVER,CO,90210,US,,,,,,,03/29/1886
Working Regex:
(?:[^"\']|(?:\".*?\")|(?:\'.*?\'))*?(,)
My interpretation of Snowflake's requirements for the regex:
REGEXP_REPLACE((C1), '\\(?:[^\\"\']|\\(?:\".*?\"\\)|\\(?:\'.*?\'\\)\\)*?\\(,\\)', '","') AS "blah"
I just discovered that Snowflake only supports POSIX basic and extended regular expressions, so non-capturing groups cannot be used at all.
If you just want a solution that works rather than specifically a REGEX solution, then the following UDF should do the job:
CREATE OR REPLACE FUNCTION replace_char("in_text" string, "replace_text" string, "skip_text" string)
RETURNS string
LANGUAGE JAVASCRIPT
AS
$$
  // Removes every replace_text character that lies outside a pair of skip_text characters.
  var out_string = '';
  var skipping = false;
  for (var i = 0; i < in_text.length; i++) {
    var ch = in_text.charAt(i);
    // Each skip_text character toggles whether we are inside a protected section.
    if (ch == skip_text) {
      skipping = !skipping;
    }
    // Keep the character unless it is an unprotected replace_text character.
    if (skipping || ch != replace_text) {
      out_string = out_string + ch;
    }
  }
  return out_string;
$$
;
In order to be productionised it would need error handling, checks on the inputs, etc. but this should be enough to get you started.
You can use it as follows:
set intext = '99999999999,"SOME CORPROATION, Dissolved January 17, 1983",123 SOME STREET #760,,Denver,CO,90210,,,,,,,,Voluntarily Dissolved,CO,Corporation,JOHN,F.,DOE,,,1512 SOME STREET #760,,DENVER,CO,90210,US,,,,,,,03/29/1886';
set replace_text = ','; -- char to remove from $intext
set skip_text = '"'; --Between matching occurrences of this char no text will be replaced
select replace_char($intext,$replace_text,$skip_text);
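The UDF above removes the unquoted character, whereas the question asks for it to be replaced with another string. A minimal sketch of that variant in plain JavaScript follows; the name replaceOutsideQuotes and its parameters are made up for illustration, and the body would still need adapting to Snowflake's UDF wrapper before use there.

```javascript
// Replace every `target` character that is not inside a quoted section.
// A quoted section starts and ends with the `quote` character.
function replaceOutsideQuotes(text, target, replacement, quote) {
  let out = "";
  let inQuotes = false;
  for (const ch of text) {
    if (ch === quote) {
      inQuotes = !inQuotes; // entering or leaving a quoted section
    }
    out += (!inQuotes && ch === target) ? replacement : ch;
  }
  return out;
}

console.log(replaceOutsideQuotes('1,"A, B",2', ",", "|", '"'));
```

Like the UDF, this assumes quotes are balanced and does not handle escaped quotes inside a quoted section.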
I defined:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
I need to split s1 into words and check whether any of them occur in s2.
If so, return the specific matching posts.
The best way to help me would be a regex, because I need this check for MongoDB.
Please let me know the proper regex I need.
Thanks.
This could possibly be answered with just the regular expression (and it is), but consider the data:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
{ "phrase" : "something about androi" }
{ "phrase" : "johnathan was here" }
You match with MongoDB like this:
db.collection.find({ "phrase": /\broi\b|\bjohn\b/ })
And that gets the two documents that match:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
The regex works by placing word boundaries \b around each word, so that a word does not partially match something else, and by combining the words with an "or" | condition.
You can experiment with this pattern in a regex tester.
Open-ended $regex queries like this can often be bad for performance in MongoDB. I'm not sure of your actual use case, but it is possible that a "full text search" solution would be better suited to your needs. MongoDB has full text indexing and search, or you can use an external solution.
Anyhow, this is how you match your words using a $regex condition.
To actually process your string as input you will need some code before doing the search:
var string = "roi john";
var splits = string.split(" ");
for ( var i = 0; i < splits.length; i++ ) {
splits[i] = "\\b" + splits[i] + "\\b";
}
exp = splits.join("|");
db.collection.find({ "phrase": { "$regex": exp } })
You could also combine that with the case-insensitive "$options" setting if that is what you want. That second form, with the literal $regex operator, is actually the safer form of usage in languages other than JavaScript.
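If the input words can contain regex metacharacters (a dot or a plus sign, say), they would need escaping before being joined. Here is a rough sketch in plain JavaScript, checked against the sample phrases rather than a live collection; escapeRegExp and buildWordPattern are illustrative names, not MongoDB functions.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn "roi john" into the pattern string \broi\b|\bjohn\b
function buildWordPattern(input) {
  return input
    .split(/\s+/)
    .filter(Boolean) // drop empty pieces caused by extra spaces
    .map(function (w) { return "\\b" + escapeRegExp(w) + "\\b"; })
    .join("|");
}

var phrases = [
  "hello guys my name is roi levi or maybe roy",
  "and another sentence from john",
  "something about androi",
  "johnathan was here"
];

var pattern = buildWordPattern("roi john");
var re = new RegExp(pattern, "i"); // "i" makes the match case-insensitive
var matched = phrases.filter(function (p) { return re.test(p); });
console.log(pattern);
console.log(matched);
```

The resulting pattern string is what would be passed as the "$regex" value in the query shown above.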
Using a loop to iterate over the words of s1 and checking each one against s2 will give the expected result. Note that indexOf also matches inside longer words.
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
var arr1 = s1.split(" ");
// Use < rather than <= so the loop does not read past the end of the array
for(var i=0;i<arr1.length;i++){
  if (s2.indexOf(arr1[i]) != -1){
    console.log("The string contains "+arr1[i]);
  }
}
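Because indexOf finds substrings, "roi" would also be reported for a phrase such as "something about androi". If only whole words should count, one possible approach is to compare against the set of words in the phrase. This is a sketch only, and wordsFoundIn is a hypothetical helper name.

```javascript
// Return the words from `needles` that appear as whole words in `haystack`.
function wordsFoundIn(needles, haystack) {
  // Split the phrase on non-word characters and remember each word once.
  var haystackWords = new Set(
    haystack.toLowerCase().split(/\W+/).filter(Boolean)
  );
  return needles
    .split(/\s+/)
    .filter(Boolean)
    .filter(function (w) { return haystackWords.has(w.toLowerCase()); });
}

console.log(wordsFoundIn("roi john", "hello guys my name is roi levi or maybe roy"));
console.log(wordsFoundIn("roi john", "something about androi"));
```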
I need to determine whether a string begins with a number - I've tried the following to no avail:
if (matches("^[0-9].*)", upper(text))) str = "Title"""
I'm new to DXL and Regex - what am I doing wrong?
You need the caret character to indicate a match only at the start of a string. I added the plus character to match all the leading digits, although you might not need it in your situation. If you're only checking for a number at the start and don't care what follows, you don't need anything more.
string str1 = "123abc"
string str2 = "abc123"
string strgx = "^[0-9]+"
Regexp rgx = regexp2(strgx)
if(rgx(str1)) { print str1[match 0] "\n" } else { print "no match\n" }
if(rgx(str2)) { print str2[match 0] "\n" } else { print "no match\n" }
The code block above will print:
123
no match
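For comparison, the same two patterns can be tried outside DOORS. The sketch below is JavaScript, not DXL, and the function names are invented for illustration; it only shows the difference between testing for a leading digit and extracting the whole leading run of digits.

```javascript
// True when the text begins with a digit; nothing after it matters.
function startsWithDigit(text) {
  return /^[0-9]/.test(text);
}

// Returns the run of digits at the start of the text, or null if there is none.
function leadingDigits(text) {
  var m = /^[0-9]+/.exec(text);
  return m ? m[0] : null;
}

console.log(startsWithDigit("123abc"), leadingDigits("123abc"));
console.log(startsWithDigit("abc123"), leadingDigits("abc123"));
```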
@mrhobo is correct, you want something like this:
Regexp numReg = "^[0-9]"
if(numReg text) str = "Title"
You don't need upper, since you are only looking for digits. Also, matches is meant more for finding the part of the string that matches the expression. If you just want to check whether the string matches the expression, the code above would be more efficient.
Good luck!
Judging from an example I found, this should work:
Regexp plural = regexp "^([0-9].*)$"
if plural "15systems" then print "yes"
Resource:
http://www.scenarioplus.org.uk/papers/dxl_regexp/dxl_regexp.htm
I have a list of several phrases in the following format
thisIsAnExampleSentance
hereIsAnotherExampleWithMoreWordsInIt
and I'm trying to end up with
This Is An Example Sentance
Here Is Another Example With More Words In It
Each phrase has the white space condensed and the first letter is forced to lowercase.
Can I use regex to add a space before each A-Z and have the first letter of the phrase be capitalized?
I thought of doing something like
([a-z]+)([A-Z])([a-z]+)([A-Z])([a-z]+) // etc
$1 $2$3 $4$5 // etc
but on 50 records of varying length, my idea is a poor solution. Is there a more dynamic way to do this with regex? Thanks
A Java fragment I use looks like this (now revised):
result = source.replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
result = result.substring(0, 1).toUpperCase() + result.substring(1);
This, by the way, converts the string givenProductUPCSymbol into Given Product UPC Symbol, so make sure that is fine for the way you use this type of thing.
Finally, a single line version could be:
result = source.substring(0, 1).toUpperCase() + source.substring(1).replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
Also, in an Example similar to one given in the question comments, the string hiMyNameIsBobAndIWantAPuppy will be changed to Hi My Name Is Bob And I Want A Puppy
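The same idea can be sketched in JavaScript without lookbehind by using two plain replacements. This is an illustrative variant rather than a port of the Java line above, and toTitleWords is a made-up name.

```javascript
// Insert spaces at camelCase boundaries, then capitalize the first letter.
function toTitleWords(s) {
  var spaced = s
    // lower-case letter followed by a capital: "sI" -> "s I"
    .replace(/([a-z])([A-Z])/g, "$1 $2")
    // capital followed by a capitalized word: "CSy" -> "C Sy", "IWa" -> "I Wa"
    .replace(/([A-Z])([A-Z][a-z])/g, "$1 $2");
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

console.log(toTitleWords("thisIsAnExampleSentance"));
console.log(toTitleWords("givenProductUPCSymbol"));
console.log(toTitleWords("hiMyNameIsBobAndIWantAPuppy"));
```

The second replacement is what keeps an acronym such as UPC together while still separating it from the word that follows.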
For the space problem it's easy if your language supports zero-width-look-behind
var result = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "(?<=[a-z])([A-Z])", " $1");
or even if it doesn't support them
var result2 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "([a-z])([A-Z])", "$1 $2");
I'm using C#, but the regexes should be usable in any language that supports replacement using $1...$n.
But for the lower-to-upper case conversion, you can't do it directly in the regex. You can get the first character through a regex like ^[a-z], but you can't convert it.
For example in C# you could do
var result3 = Regex.Replace(result, "^([a-z])", m =>
{
return m.ToString().ToUpperInvariant();
});
using a match evaluator to change the input string.
You could then even fuse the two together
var result4 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "^([a-z])|([a-z])([A-Z])", m =>
{
if (m.Groups[1].Success)
{
return m.ToString().ToUpperInvariant();
}
else
{
return m.Groups[2].ToString() + " " + m.Groups[3].ToString();
}
});
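JavaScript's String.prototype.replace accepts a function in much the same role as the C# match evaluator. The sketch below is one possible single-pass version; splitCamelOnePass is a hypothetical name, and it uses a lookbehind (so it needs an engine that supports one) to keep the first letter from being consumed as part of a lower/upper pair.

```javascript
// One pass: capitalize the first letter and put a space before each
// capital that directly follows a lower-case letter.
function splitCamelOnePass(s) {
  return s.replace(/^[a-z]|(?<=[a-z])[A-Z]/g, function (match, offset) {
    // Only the ^[a-z] alternative can match at offset 0.
    return offset === 0 ? match.toUpperCase() : " " + match;
  });
}

console.log(splitCamelOnePass("thisIsAnExampleSentance"));
console.log(splitCamelOnePass("aBook"));
```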
A Perl example with unicode character support:
s/\p{Lu}/ $&/g;
s/^./\U$&/;
I have a string that is similar to a path, but I have tried some regex patterns that are supposed to parse paths and they don't quite work.
Here's the string
f|MyApparel/Templates/Events/
I need the "name parts" between the slashes.
I tried (\w+) but the array came back [0] = "f" and [1] = "f".
I tested the pattern on http://www.gskinner.com/RegExr/ and it seems to work correctly.
Here's the AS code:
var pattern : RegExp = /(\w+)/g;
var hierarchy : Array = pattern.exec(params.category_id);
params.name = hierarchy.pop() as String;
pattern.exec() works as it does in JavaScript. For a global regex, it updates the lastIndex property every time it finds a match, and the next call starts searching from there.
So it does not return an array of all matches, but only the very next match in the string. Hence you must run it in a loop until it returns null:
var myPattern:RegExp = /(\w+)/g;
var str:String = "f|MyApparel/Templates/Events/";
var result:Object = myPattern.exec(str);
while (result != null) {
trace( result.index, "\t", result);
result = myPattern.exec(str);
}
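Since the behaviour is the same in JavaScript, the loop can be tried there as well. This sketch collects every captured group into an array; allMatches is an illustrative helper, not a built-in.

```javascript
// Collect the first capture group of every match of a global regex.
function allMatches(pattern, text) {
  if (!pattern.global) {
    // Without the g flag, exec() never advances and the loop would not end.
    throw new Error("pattern must use the g flag");
  }
  var results = [];
  var m;
  pattern.lastIndex = 0; // start from the beginning of the string
  while ((m = pattern.exec(text)) !== null) {
    results.push(m[1]);
  }
  return results;
}

var parts = allMatches(/(\w+)/g, "f|MyApparel/Templates/Events/");
console.log(parts);
console.log(parts[parts.length - 1]);
```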
I don't know between which two slashes you want but try
var hierarchy : Array = params.category_id.split(/[\/|]/);
[\/|] means a slash or a vertical bar.
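One thing to watch with split: in JavaScript, and likely in ActionScript too, the trailing slash produces an empty string as the last element, so calling pop() on the raw result would not give the last name. Below is a small JavaScript sketch of filtering the empties out first; nameParts is a made-up helper name.

```javascript
// Split on slashes and vertical bars, dropping empty pieces.
function nameParts(path) {
  return path.split(/[\/|]/).filter(function (part) {
    return part.length > 0;
  });
}

var raw = "f|MyApparel/Templates/Events/".split(/[\/|]/);
var cleaned = nameParts("f|MyApparel/Templates/Events/");
console.log(raw);     // includes a trailing empty string
console.log(cleaned); // only the non-empty name parts
```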