Regarding regex operations in KNIME

The formula below is from an Alteryx workflow.
if(REGEX_Match([CurrentField],' (',0)) then 'string to display something'
elseif(REGEX_Match([CurrentField],'a',0)) then 'another string to display'
Can you explain what the above function is trying to do and how to achieve the same thing in KNIME?

To achieve this if/else syntax in KNIME, you can use the Column Expressions node. This node also provides a regexMatcher function that returns a Boolean.
It can be something like this:
if (regexMatcher(column("column1"),"your_regex" )) {
"string1"
} else if (regexMatcher(column("column1"),"your_regex2" )) {
"string2"
} else {
"string3"
}
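The same branching can be tried outside KNIME as a rough sketch in plain JavaScript. The function name classify and the fallback label are hypothetical, and the patterns are only modelled on the Alteryx formula (with the opening parenthesis escaped). Note that RegExp.test is unanchored, so check the Alteryx and KNIME documentation for whether their functions require the whole field to match.

```javascript
// Hypothetical helper: mirrors the if / else if / else chain above.
function classify(value) {
  // " \(" matches a space followed by a literal opening parenthesis.
  if (/ \(/.test(value)) {
    return "string to display something";
  } else if (/a/.test(value)) {
    return "another string to display";
  }
  // Fallback label is an assumption; the Alteryx formula shows no else branch.
  return "default string";
}

console.log(classify("foo (bar)"));
console.log(classify("banana"));
```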


$expr with $regexMatch doesn't work when the pattern is inside an array

Based on this example, I'm using $expr and $regexMatch to implement "reverse regex" queries in MongoDB. For instance, this example works.
However, this only seems to work when the regex is in a first-level field of the MongoDB document. When the regex is within an element of an array (as in this other example), I get errors like this:
query failed: (Location51105) Executor error during find command :: caused by :: $regexMatch needs 'regex' to be of type string or regex
Is there any way of supporting this case?
The 'regex' argument of $regexMatch accepts only a string or a regex, not an array. You can use the $map operator to loop over the array elements and check the condition:
$map iterates over the patterns.pattern array and evaluates $regexMatch for each element, returning a boolean value per element
$anyElementTrue returns true if any element of the resulting array is true
db.collection.find({
"$expr": {
"$anyElementTrue": {
"$map": {
"input": "$patterns.pattern",
"in": {
"$regexMatch": {
"input": "Room1",
"regex": "$$this",
"options": "i"
}
}
}
}
}
})
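To see what the $map / $anyElementTrue combination computes, here is a minimal sketch of the same logic in plain JavaScript, run against in-memory objects rather than a MongoDB server. The sample documents and the helper name matchesAnyPattern are assumptions made for illustration only.

```javascript
// Hypothetical documents shaped like the ones the query expects.
const docs = [
  { _id: 1, patterns: [{ pattern: "^room\\d$" }, { pattern: "^hall" }] },
  { _id: 2, patterns: [{ pattern: "^kitchen$" }] },
];

// Plain-JS analogue of $map + $regexMatch + $anyElementTrue:
// true if at least one stored pattern matches the input string.
function matchesAnyPattern(doc, input) {
  return doc.patterns.some((p) => new RegExp(p.pattern, "i").test(input));
}

const matchingIds = docs
  .filter((d) => matchesAnyPattern(d, "Room1"))
  .map((d) => d._id);

console.log(matchingIds);
```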

Snowflake RegEx syntax woes

I'm looking for some help diagnosing what I'm doing wrong when applying a working RegEx expression to Snowflake (specifically REGEXP_REPLACE()).
I'm trying to replace commas (not within a quoted section) with another string. I've tested and confirmed that the expression returns the desired result in regex101.com, but when I try and apply it to a Snowflake query I'm not getting any results.
I've seen references in the REGEXP_REPLACE() documentation (indicating the need for additional escapes on brackets) which I have applied - still no dice.
Can anyone tell me what I'm missing??
Sample text (C1):
99999999999,"SOME CORPROATION, Dissolved January 17, 1983",123 SOME STREET #760,,Denver,CO,90210,,,,,,,,Voluntarily Dissolved,CO,Corporation,JOHN,F.,DOE,,,1512 SOME STREET #760,,DENVER,CO,90210,US,,,,,,,03/29/1886
Working Regex:
(?:[^"\']|(?:\".*?\")|(?:\'.*?\'))*?(,)
My interpretation of SF reqs for RegEx:
REGEXP_REPLACE((C1), '\\(?:[^\\"\']|\\(?:\".*?\"\\)|\\(?:\'.*?\'\\)\\)*?\\(,\\)', '","') AS "blah"
I just discovered that Snowflake only supports POSIX standard and extended RegEx, so non-capturing groups cannot be used at all.
If you just want a solution that works rather than specifically a REGEX solution, then the following UDF should do the job:
CREATE OR REPLACE FUNCTION replace_char("in_text" string, "replace_text" string, "skip_text" string)
RETURNS string
LANGUAGE JAVASCRIPT
AS
$$
var out_string = '';
var skipping = false;
for (var i = 0; i < in_text.length; i++) {
if (in_text.charAt(i) == skip_text) {
skipping = !skipping;
}
if (skipping === false && in_text.charAt(i) != replace_text) {
out_string = out_string + in_text.charAt(i);
}
else {
if (skipping === true) {
out_string = out_string + in_text.charAt(i);
}
}
}
return out_string;
$$
;
In order to be productionised it would need error handling, checks on the inputs, etc. but this should be enough to get you started.
You can use it as follows:
set intext = '99999999999,"SOME CORPROATION, Dissolved January 17, 1983",123 SOME STREET #760,,Denver,CO,90210,,,,,,,,Voluntarily Dissolved,CO,Corporation,JOHN,F.,DOE,,,1512 SOME STREET #760,,DENVER,CO,90210,US,,,,,,,03/29/1886';
set replace_text = ','; -- char to remove from $intext
set skip_text = '"'; --Between matching occurrences of this char no text will be replaced
select replace_char($intext,$replace_text,$skip_text);
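The UDF above removes the character outside quoted sections, whereas the original question asked to replace it with another string. A minimal sketch of that variant in plain JavaScript follows (not tested as a Snowflake UDF; the function name replaceOutsideQuotes and the short sample string are assumptions for illustration).

```javascript
// Replace every `target` character that is not between a pair of `quote`
// characters with `replacement`. Quoted sections are copied unchanged.
function replaceOutsideQuotes(text, target, replacement, quote) {
  let out = "";
  let insideQuotes = false;
  for (const ch of text) {
    if (ch === quote) {
      insideQuotes = !insideQuotes; // toggle on every quote character
    }
    out += !insideQuotes && ch === target ? replacement : ch;
  }
  return out;
}

const sample = '1,"A, B",C';
console.log(replaceOutsideQuotes(sample, ",", "|", '"'));
```

Like the UDF, this assumes quotes are balanced and never escaped inside a quoted section.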

Extracting multiple values with RegEx in a Google Sheet formula

I have a Google spreadsheet with 2 columns.
Each cell of the first one contains JSON data, like this:
{
"name":"Love",
"age":56
},
{
"name":"You",
"age":42
}
Then I want a second column that would, using a formula, extract every value of name and string it like this:
Love,You
Right now I am using this formula:
=REGEXEXTRACT(A1, CONCATENER(CHAR(34),"name",CHAR(34),":",CHAR(34),"([^",CHAR(34),"]+)",CHAR(34),","))
The regular expression is "name":"([^"]+)",
The problem is that it currently returns only the first occurrence, like this:
Love
(Also, I don't know how many occurrences of "name" there are. It could be anywhere from 0 to around 20.)
Is it even possible to achieve what I want?
Thank you so much for reading!
EDIT:
My JSON data starts with:
{
"time":4,
"annotations":[
{
Then in the middle, something like this:
{
"name":"Love",
"age":56
},
{
"name":"You",
"age":42
}
and ends with:
],
"topEntities":[
{
"id":247120,
"score":0.12561166,
"uri":"http://en.wikipedia.org/wiki/Revenue"
},
{
"id":31512491,
"score":0.12504959,
"uri":"http://en.wikipedia.org/wiki/Wii_U"
}
],
"lang":"en",
"langConfidence":1.0,
"timestamp":"2020-05-22T12:17:47.380"
}
Since your text is basically a JSON string, you may parse all name fields from it using the following custom function:
function ExtractNamesFromJSON(input) {
var obj = JSON.parse("[" + input + "]");
var results = obj.map((x) => x["name"])
return results.join(",")
}
Then use it as =ExtractNamesFromJSON(C1).
If you need a regex, use a similar approach:
function ExtractAllRegex(input, pattern,groupId,separator) {
return Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId]).join(separator);
}
Then use it as =ExtractAllRegex(C1, """name"":""([^""]+)""",1,",").
Note:
input - current cell value
pattern - regex pattern
groupId - Capturing group ID you want to extract
separator - text used to join the matched results.
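If the cell holds the complete JSON object shown in the question's EDIT, rather than a bare comma-separated list of objects, the names could instead be read from the annotations array. This is a sketch under that assumption; the function name extractNamesFromAnnotations is hypothetical, and the sample object is a shortened version of the one in the question.

```javascript
// Assumes the input is one JSON object with an "annotations" array
// whose elements may carry a string "name" field.
function extractNamesFromAnnotations(input) {
  const data = JSON.parse(input);
  const annotations = Array.isArray(data.annotations) ? data.annotations : [];
  return annotations
    .map((a) => a.name)
    .filter((name) => typeof name === "string")
    .join(",");
}

// Shortened sample modelled on the question's data.
const sampleJson = JSON.stringify({
  time: 4,
  annotations: [
    { name: "Love", age: 56 },
    { name: "You", age: 42 },
  ],
  topEntities: [],
  lang: "en",
});

console.log(extractNamesFromAnnotations(sampleJson));
```

With zero annotations the function returns an empty string, which covers the "0 to around 20" case from the question.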

How to find all values in word by using regexp in MongoDB?

Let's say I have the following string in MongoDB document:
{"name": "space delimited string"}
I need to build mongodb query with regexp to find this document by entering the following search request:
space string
It looks like the LIKE operator in an RDBMS. I know that the latest MongoDB 3 has full-text search, but I need a regexp because my current version is outdated.
Please help me construct a MongoDB query with a regexp that finds the document for the search above.
Thanks
As I see it there are a couple of options.
If you mean "AND" for all words then use positive lookahead:
{ "name": /(?=.*\bspace\b)(?=.*\bstring\b).+/ }
or if an $all operator suits you better:
{ "name": { "$all": [/\bspace\b/,/\bstring\b/] } }
And if you mean "OR" for either of the words then you can do:
{ "name": /\bspace\b|\bstring\b/ }
or use an $in operator:
{ "name": { "$in": [/\bspace\b/,/\bstring\b/] } }
Note that in all cases you likely want those \b boundary matches in there to delimit the "word"; otherwise you will also match "partial" words.
So it depends on which you mean and which suits you best. You can construct the regular expression using its own syntax to mean either "AND" or "OR", or alternatively you can use the equivalent MongoDB logical expressions ( $all or $in ) that take a "list" of regular expressions instead.
So build a string for regex or build a list. Your choice.
Naturally, you need to "break up" the string into its "words" in order to process it. The question lacks a language tag, but as a JavaScript example:
As a single regular expression for "AND":
var searchString = "space string";
var expression = new RegExp(
"" + searchString.split(" ").map(function(word) {
return "(?=.*\\b" + word + "\\b)"
}).join("") + ".+"
)
var query = { "name": expression };
Or for an "OR" condition on a single expression:
var expression = new RegExp(
searchString.split(" ").map(function(word) {
return "\\b" + word + "\\b"
}).join("|")
);
var query = { "name": expression };
Or as a list of expressions:
var type = "AND",
query = { "name": {} };
// List of expressions
var list = searchString.split(" ").map(function(word) {
return new RegExp("\\b" + word + "\\b")
});
// Determine operator based on type
query.name[( type === "AND") ? "$all" : "$in"] = list;
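The snippets above concatenate the search words into the pattern unchanged, so a word containing regex metacharacters such as "." or "(" would alter the expression. A possible refinement, sketched here with the hypothetical helpers escapeRegExp and buildAndExpression, escapes each word first and also splits on any run of whitespace:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the "AND" lookahead expression from a whitespace-separated search string.
function buildAndExpression(searchString) {
  const lookaheads = searchString
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => "(?=.*\\b" + escapeRegExp(word) + "\\b)");
  return new RegExp(lookaheads.join("") + ".+");
}

const andExpression = buildAndExpression("space string");
const andQuery = { name: andExpression };

console.log(andExpression.test("space delimited string"));
```

Keep in mind that \b only behaves as a word delimiter next to letters, digits and underscores, so words that begin or end with punctuation may need a different delimiter.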

find str in another str with regex

I defined:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
I need to split s1 into words and check whether they are contained in s2.
If so, return the specific posts that match.
The best way to help me would be a regex, because I need this check for MongoDB.
Please let me know the proper regex I need.
Thx.
This could possibly be answered with just the regular expression (and it actually is), but consider the data:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
{ "phrase" : "something about androi" }
{ "phrase" : "johnathan was here" }
You match with MongoDB like this:
db.collection.find({ "phrase": /\broi\b|\bjohn\b/ })
And that gets the two documents that match:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
So the regex works by keeping the word boundaries \b around the words to match so they do not partially match something else and are combined with an "or" | condition.
Play with the regexer for this.
Open-ended $regex queries like this can often be bad for performance in MongoDB. I am not sure of your actual use case, but a "full text search" solution may be better suited to your needs. MongoDB has full text indexing and search, or you can use an external solution.
Anyhow, this is how you match your words using a $regex condition.
To actually process your string as input you will need some code before doing the search:
var string = "roi john";
var splits = string.split(" ");
for ( var i = 0; i < splits.length; i++ ) {
splits[i] = "\\b" + splits[i] + "\\b";
}
var exp = splits.join("|");
db.collection.find({ "phrase": { "$regex": exp } })
You can also combine that with the case-insensitive "$options" setting if that is what you want. The second form, with the literal $regex operator, is actually the safer form to use in languages other than JavaScript.
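As a rough sketch, the pattern can be combined with case-insensitive matching and checked locally against the sample phrases before it is sent to the server. The helper name buildOrQuery is hypothetical, and the local check uses JavaScript's RegExp, which is assumed to agree with MongoDB's regex engine for a simple pattern like this one.

```javascript
// Sample phrases from the answer above.
const phrases = [
  "hello guys my name is roi levi or maybe roy",
  "and another sentence from john",
  "something about androi",
  "johnathan was here",
];

// Build a query document using the literal $regex operator with $options.
function buildOrQuery(searchString) {
  const pattern = searchString
    .split(" ")
    .map((word) => "\\b" + word + "\\b")
    .join("|");
  return { phrase: { $regex: pattern, $options: "i" } };
}

const orQuery = buildOrQuery("roi john");

// Local check: apply the same pattern and flags to the sample phrases.
const localRegex = new RegExp(orQuery.phrase.$regex, orQuery.phrase.$options);
const matchedPhrases = phrases.filter((p) => localRegex.test(p));

console.log(matchedPhrases);
```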
Using a loop to iterate over the words of s1 and checking each one against s2 will give the expected result:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
var arr1 = s1.split(" ");
for(var i=0;i<arr1.length;i++){
if (s2.indexOf(arr1[i]) != -1){
console.log("The string contains "+arr1[i]);
}
}
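One caveat with indexOf is that it also finds partial words, for example "roi" inside "androi". A small variation, sketched below with the hypothetical helper wholeWordsFound, tests each word with \b boundaries and returns the words that matched. It assumes the words contain no regex metacharacters.

```javascript
// Return the words of `words` that occur as whole words in `text`.
// Assumes the words contain no regex special characters.
function wholeWordsFound(words, text) {
  return words
    .split(" ")
    .filter((w) => w.length > 0 && new RegExp("\\b" + w + "\\b").test(text));
}

const found = wholeWordsFound(
  "roi john",
  "hello guys my name is roi levi or maybe roy"
);
console.log(found);
```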