Regex for multiple strings including spaces - regex

I need a regex for filtering out a query. For example, I get a query input as below.
state:CA AND country:US OR postalcode:8888
Here, I need to extract terms based on " AND ", " OR " (any case). Can someone please provide the regex with which I can extract terms like "state:CA", "country:US" etc?
I want to consider the spaces before and after the AND, OR as the other terms might contain "and", "or" as part of string.
Eg: state:OR AND country:US
UPDATE:
I have tried something like this
\sAND\s|\sOR\s
With this, I could find the patterns " AND ", " OR ". But, how to make it case-insensitive?

What flavor of regex are you using?
If the values in your key/value pairs always consist of a single word, this would do:
\w+:\w+
Update:
Since your values can contain more than one word, I think you should split the string into key/value pairs instead of trying to match each pair with a regex.
Here's how you could do it in javascript:
var s = 'state:New York AND country:US OR postalcode:8888';
// split on AND/OR only when surrounded by whitespace; the i flag makes it case-insensitive
var dataBlocks = s.split(/\s+(?:AND|OR)\s+/i);
for (var i = 0; i < dataBlocks.length; i++) dataBlocks[i] = dataBlocks[i].trim();
// your resulting array would look like
// Array [ "state:New York", "country:US", "postalcode:8888" ]
The same solution, in C#:
// split on AND/OR only when surrounded by whitespace, ignoring case
Regex r = new Regex(@"\s+(?:AND|OR)\s+", RegexOptions.IgnoreCase);
var s = "state:New York AND country:US OR postalcode:8888";
var keyValuePairs = r.Split(s).Select(z =>
{
    // split each term at the first ':' only
    var keyValue = z.Trim().Split(new char[] { ':' }, 2);
    return new KeyValuePair<string, string>(keyValue.FirstOrDefault(), keyValue.LastOrDefault());
});
foreach (var keyValuePair in keyValuePairs)
    Console.WriteLine("Key: {0}\tValue:{1}", keyValuePair.Key, keyValuePair.Value);
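The question also asks that the spaces around AND/OR be respected and that matching be case-insensitive. A minimal JavaScript sketch of that idea follows; parseQuery is a hypothetical helper name, and it assumes every term has the form key:value. The capturing group in the split pattern keeps the operators in the result, in case they are needed later.

```javascript
// Hypothetical helper (not from the answers above): splits a query into
// its key/value terms and the operators found between them.
function parseQuery(query) {
  // \s+ on both sides means "OR" inside "state:OR" is left alone.
  // The capturing group makes split() keep each matched operator.
  const parts = query.split(/\s+(AND|OR)\s+/i);
  const terms = [];
  const operators = [];
  parts.forEach((part, i) => {
    if (i % 2 === 1) {
      operators.push(part.toUpperCase());
    } else {
      // Split at the first ':' only (assumes one is present).
      const idx = part.indexOf(":");
      terms.push({
        key: part.slice(0, idx).trim(),
        value: part.slice(idx + 1).trim(),
      });
    }
  });
  return { terms, operators };
}

console.log(parseQuery("state:OR and country:US OR postalcode:8888"));
```

Because the operators are only recognised between whitespace, a value such as "OR" or "New York" passes through untouched.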

Related

Google sheet : REGEXREPLACE match everything except a particular pattern

I would try to replace everything inside this string :
[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs
1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567,
1410315029
Except the following numbers :
1303222074
1403281851
1307239335
1403277567
1410315029
I have built a REGEX to match them :
1[0-9]{9}
But I have not figured it out to do the opposite that is everything except all matches ...
Google Sheets uses the RE2 regex engine, which doesn't support many useful features that would help here. A basic workaround is to match what you want to preserve first and capture it:
pattern: [0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)
replacement: $1 (with a space after)
So probably something like this:
=TRIM(REGEXREPLACE("[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs 1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567, 1410315029"; "[0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)"; "$1 "))
You can also do this with dynamic native functions:
=REGEXEXTRACT(A1,rept("(\d{10}).*",counta(split(regexreplace(A1,"\d{10}","#"),"#"))-1))
Basically, it first splits the text on the desired pattern to figure out how many occurrences there are, then repeats the regex to dynamically create that number of capture groups, leaving you in the end with only those values.
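To make the "repeat the capture group" idea concrete, here is a rough JavaScript sketch of the same mechanism outside of Sheets. extractAll is a hypothetical name, and the s (dotAll) flag is an addition so that the text may span several lines; it illustrates the technique and is not a translation of the formula's exact behaviour.

```javascript
// Hypothetical illustration of the formula's approach: count the matches,
// then build a pattern with exactly that many capture groups.
function extractAll(text) {
  const count = (text.match(/\d{10}/g) || []).length;
  if (count === 0) return [];
  // e.g. count = 3 gives (\d{10}).*(\d{10}).*(\d{10}).*
  const pattern = new RegExp("(\\d{10}).*".repeat(count), "s");
  // Drop element 0 (the whole match) and keep only the captured groups.
  return pattern.exec(text).slice(1);
}

const sample =
  "TIs 1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567,\n1410315029";
console.log(extractAll(sample));
```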
First of all, thank you Casimir for your help. It gave me an idea, since this did not seem possible with built-in functions and a strong regex alone.
I found out that I can write a custom function for my own purposes (yes, I'm not very "up to date").
It's not very well coded and it returns duplicates. Rather than fixing it properly, I use the built-in UNIQUE() function on top of it to get rid of them. It's ugly and I'm lazy, but it does the job, that is, it returns a list of all matches of one specific regex (which is: 1[0-9]{9}). Here it is:
function ti_extract(input) {
var tab_tis = new Array();
var tab_strings = new Array();
tab_tis.push(input.match(/1[0-9]{9}/)); // get the TI and insert in tab_tis
var string_modif = input.replace(tab_tis[0], " "); // modify source string (remove everything except the TI)
tab_strings.push(string_modif); // insert this new string in the table
var v = 0;
var patt = new RegExp(/1[0-9]{9}/);
var fin = patt.test(tab_strings[v]);
var first_string = tab_strings[v];
do {
first_string = tab_strings[v]; // string 0, or the string with the first removed TI
tab_tis.push(first_string.match(/1[0-9]{9}/)); // analyze the string and get the new TI to put it in the table
var string_modif2 = first_string.replace(tab_tis[v], " "); // modify the string again to remove the new TI from the old string
tab_strings.push(string_modif2);
v += 1;
}
while(v < 15)
return tab_tis;
}
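For comparison, a shorter sketch of the same extraction, assuming the goal is simply every match of 1[0-9]{9} in the input. tiExtract is a hypothetical name, and how Sheets lays out the returned array in cells is not covered here. With the g flag, match() returns all occurrences in one call, so a value is repeated only if it really appears more than once in the text.

```javascript
// Hypothetical alternative to ti_extract: one global match, no manual loop.
function tiExtract(input) {
  // match() with the g flag returns every occurrence, or null when none exist.
  return String(input).match(/1[0-9]{9}/g) || [];
}

console.log(
  tiExtract("TARARTA TIs 1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567, 1410315029")
);
```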

find str in another str with regex

I defined:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
I need to split s1 into words and check whether any of them are contained in s2.
If so, I want to get back the specific matching documents.
Ideally this would be done with a regex, because I need to run the check in MongoDB.
Please let me know the proper regex I need.
Thanks.
This could probably be answered with just the regular expression (and in fact it is), but consider this data:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
{ "phrase" : "something about androi" }
{ "phrase" : "johnathan was here" }
You match with MongoDB like this:
db.collection.find({ "phrase": /\broi\b|\bjohn\b/ })
And that gets the two documents that match:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
The regex works by putting word boundaries \b around each word, so that it does not partially match something else (such as "androi" or "johnathan"), and by combining the words with the "or" operator |.
Play with the regexer for this.
Doing open-ended $regex queries like this in MongoDB can often be bad for performance. I'm not sure of your actual use case, but it is possible that a "full text search" solution would be better suited to your needs. MongoDB has full text indexing and search, or you can use an external solution.
Anyhow, this is how you match your words using a $regex condition.
To actually process your string as input you will need some code before doing the search:
var string = "roi john";
var splits = string.split(" ");
for ( var i = 0; i < splits.length; i++ ) {
splits[i] = "\\b" + splits[i] + "\\b";
}
var exp = splits.join("|");
db.collection.find({ "phrase": { "$regex": exp } })
You could possibly even combine that with the case-insensitive "$options" setting if that is what you want. That second form, with the literal $regex operator, is actually a safer form of usage in languages other than JavaScript.
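One thing the loop above does not do is escape regex metacharacters in the input words. A minimal sketch that adds escaping and the case-insensitive option might look like this; escapeRegExp and buildWordQuery are hypothetical helper names, the input is assumed to be non-empty, and the resulting object is only the filter you would pass to find(). The last lines check the pattern locally with a plain RegExp, so no database is needed to try it.

```javascript
// Escape regex metacharacters so each input word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a { field: { $regex, $options } } filter.
function buildWordQuery(field, input) {
  const pattern = input
    .trim()
    .split(/\s+/)
    .map((w) => "\\b" + escapeRegExp(w) + "\\b")
    .join("|");
  return { [field]: { $regex: pattern, $options: "i" } };
}

const query = buildWordQuery("phrase", "roi john");

// Local check of the generated pattern against the sample phrases.
const re = new RegExp(query.phrase.$regex, "i");
const phrases = [
  "hello guys my name is roi levi or maybe roy",
  "and another sentence from john",
  "something about androi",
  "johnathan was here",
];
console.log(phrases.filter((p) => re.test(p)));
```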
Using a loop to iterate over the words of s1 and checking each one against s2 will give the expected result:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
var arr1 = s1.split(" ");
for(var i=0;i<arr1.length;i++){
if (s2.indexOf(arr1[i]) != -1){
console.log("The string contains "+arr1[i]);
}
}

Actionscript RegExp positions and lengths of all matches in one call

How can I locate all positions of some word in a text in one call, using regular expressions in ActionScript?
For example, I have this regular expression:
var wordsRegExp:RegExp = /[^a-zA-Z0-9]?(include|exclude)[^a-zA-Z0-9]?/g;
and it finds words "include" and "exclude" in text.
I am using
var match:Array;
match = wordsRegExp.exec(text)
to locate the words, but it finds only the first one. I need to find all occurrences of "include" and "exclude" and their positions, so I do this:
var res:Array = new Array();
var match:Array;
while (match = wordsRegExp.exec(text)) {
res[res.length]=match;
}
This does the trick, but it is very slow for a large amount of text. I searched for another method and didn't find one.
Please help, and thanks in advance.
EDIT: I tried var arr:Array = text.match(wordsRegExp);
It finds all the words, but not their positions in the string.
I think that's the nature of the beast. I don't know what you mean with "large amount of text", but if you want better performance, you should write your own parsing function. This shouldn't be that complicated, as your search expression is fairly simple.
I've never compared the performance of the String search functions and RegExp, because I thought they were based on the same implementation. If String.match() is faster, then you should try String.search(). With the index you could compute the substring for the next search iteration.
Found this on the help.adobe.com site,...
"Methods for using regular expressions with strings: The exec() method"
… The array also includes an index property, indicating the index position of the start of the substring match …
var pattern:RegExp = /\w*sh\w*/gi;
var str:String = "She sells seashells by the seashore";
var result:Array = pattern.exec(str);
while (result != null)
{
trace(result.index, "\t", pattern.lastIndex, "\t", result);
result = pattern.exec(str);
}
//output:
// 0 3 She
// 10 19 seashells
// 27 35 seashore
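The same exec() loop can be adapted to collect both the position and the length of each hit. The sketch below is plain JavaScript rather than ActionScript, findWords is a hypothetical name, and it assumes that \b word boundaries are an acceptable substitute for the [^a-zA-Z0-9]? guards in the question, so that the reported index is the start of the word itself.

```javascript
// Hypothetical sketch: collect word, start index and length for every hit.
function findWords(text) {
  const re = /\b(include|exclude)\b/g;
  const hits = [];
  let m;
  // With the g flag, each exec() call resumes from re.lastIndex.
  while ((m = re.exec(text)) !== null) {
    hits.push({ word: m[1], index: m.index, length: m[1].length });
  }
  return hits;
}

console.log(findWords("include a, exclude b; included"));
```

Note that \b treats the underscore as a word character, which differs slightly from the character class used in the question.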

regex how can I split this word?

I have a list of several phrases in the following format
thisIsAnExampleSentance
hereIsAnotherExampleWithMoreWordsInIt
and I'm trying to end up with
This Is An Example Sentance
Here Is Another Example With More Words In It
Each phrase has the white space condensed and the first letter is forced to lowercase.
Can I use regex to add a space before each A-Z and have the first letter of the phrase be capitalized?
I thought of doing something like
([a-z]+)([A-Z])([a-z]+)([A-Z])([a-z]+) // etc
$1 $2$3 $4$5 // etc
but on 50 records of varying length, my idea is a poor solution. Is there a way to regex in a way that will be more dynamic? Thanks
A Java fragment I use looks like this (now revised):
result = source.replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
result = result.substring(0, 1).toUpperCase() + result.substring(1);
This, by the way, converts the string givenProductUPCSymbol into Given Product UPC Symbol - make sure this is fine with the way you use this type of thing
Finally, a single line version could be:
result = source.substring(0, 1).toUpperCase() + source.substring(1).replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
Also, in an Example similar to one given in the question comments, the string hiMyNameIsBobAndIWantAPuppy will be changed to Hi My Name Is Bob And I Want A Puppy
For the space problem, it's easy if your language supports zero-width lookbehind:
var result = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "(?<=[a-z])([A-Z])", " $1");
or even if it doesn't support it:
var result2 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "([a-z])([A-Z])", "$1 $2");
I'm using C#, but the regexes should be usable in any language that supports replacement using $1...$n.
For the lower-to-upper case conversion, though, you can't do it directly in the regex. You can match the first character with a regex like ^[a-z], but you can't convert it.
For example in C# you could do
var result3 = Regex.Replace(result, "^([a-z])", m =>
{
return m.ToString().ToUpperInvariant();
});
using a match evaluator to change the input string.
You could then even fuse the two together
var result4 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "^([a-z])|([a-z])([A-Z])", m =>
{
if (m.Groups[1].Success)
{
return m.ToString().ToUpperInvariant();
}
else
{
return m.Groups[2].ToString() + " " + m.Groups[3].ToString();
}
});
A Perl example with unicode character support:
s/\p{Lu}/ $&/g;
s/^./\U$&/;
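For readers working in JavaScript, the same two steps (insert spaces, then capitalise the first letter) can be sketched as below. splitCamelCase is a hypothetical name, the pattern assumes ASCII letters only, and it relies on lookbehind support in the JavaScript engine. The second alternative keeps runs of capitals such as "UPC" together while still separating single-letter words like "I" and "A".

```javascript
// Hypothetical helper: "thisIsAnExample" -> "This Is An Example".
function splitCamelCase(s) {
  // Insert a space at every zero-width position that is either
  //   lower -> Upper            (e.g. "s|I" in "thisIs"), or
  //   Upper -> Upper + lower    (e.g. "C|Sy" in "UPCSymbol").
  const spaced = s.replace(/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/g, " ");
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

console.log(splitCamelCase("thisIsAnExampleSentance"));
console.log(splitCamelCase("hiMyNameIsBobAndIWantAPuppy"));
console.log(splitCamelCase("givenProductUPCSymbol"));
```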

Regular Expression in Actionscript 3

I need a AS3 regular expression that allows me to find/replace in strings like these:
var str1:String = '<value1 att="1"> some text</value1>';
var str2:String = '<value1 att="1" var="a">  some text and more</value1>';
var str3:String = '<value1 att="ok" var="b" def="12">     some text</value1>';
to this:
str1 = '<value1 att="1">*some text</value1>';
str2 = '<value1 att="1" var="a">**some text and more</value1>';
str3 = '<value1 att="ok" var="b" def="12">*****some text</value1>';
I want to be able to replace the spaces at the beginning (inside the > <) for other character. It shouldn't affect the number of character at the right of the spaces or the attributes in the value1 definition.
Assuming that there are no "* " sequences in the text blocks, this should work:
var s:String = "<value1 att='ok' var='b' def='12'> some text</value1>";
//find all spaces after a tag closing bracket and replace with a *
s = s.replace(/>\s/g, ">*");
//find all spaces after a * and replace it with a *
//keep doing this until no more can be found
while (s.match(/>\*+\s/g).length) {
s = s.replace(/\*\s/g, "**");
}
I can't think of a way to do it in one replace though.
I think the easiest way to accomplish what you need is to use a function in replace() expression.
var replaceMethod:Function = function (match:String, tagName:String, tagContent:String, spaces:String, targetText:String, index:int, whole:String) : String
{
trace("\t", "found", spaces.length,"spaces in tag '"+tagName+"'");
trace("\t", "matched string:", match);
// check tag name or whatever you may want
// do something with found spaces
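// a string pattern would replace only the first space, so use a global regex
var replacement:String = spaces.replace(/\s/g, "*");
// tagContent already starts with the space that follows the tag name
return "<"+tagName+tagContent+">"+replacement+targetText;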
}
var str1:String = '<value1 att="1"> some text</value1>';
var exp:RegExp = /<(\w+)([ >].*?)>(\s+)(some text)/gm;
trace("before:", str1);
str1 = str1.replace(exp, replaceMethod);
trace("after:", str1);
It's not performance-safe though; if you are using huge blocks of text and/or launching this routine very frequently, you may want to do something more complicated, but optimized. One optimization technique is reducing the number of arguments of replaceMethod().
P.S. I think this can be done with one replace() expression and without using replaceMethod(). Look at positive lookaheads and noncapturing groups; maybe you can figure it out. http://livedocs.adobe.com/flex/3/html/help.html?content=12_Using_Regular_Expressions_09.html
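As a rough illustration of the callback approach in a single replace() call, here is a plain JavaScript sketch (not AS3). starLeadingSpaces is a hypothetical name, and the pattern assumes that attribute values never contain a ">" character. It captures the opening tag and the whitespace right after it, then emits one "*" per whitespace character.

```javascript
// Hypothetical sketch: replace the whitespace that directly follows an
// opening tag with the same number of '*' characters.
function starLeadingSpaces(markup) {
  // (<\w+[^>]*>) matches an opening tag only: "</value1>" is skipped
  // because "/" is not a word character.
  return markup.replace(/(<\w+[^>]*>)(\s+)/g, (match, tag, spaces) => {
    return tag + "*".repeat(spaces.length);
  });
}

console.log(starLeadingSpaces('<value1 att="ok" var="b" def="12">     some text</value1>'));
```

Spaces inside the text itself (as in "some text") are untouched, because only whitespace immediately after the closing bracket of an opening tag is captured.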