Regular expression to find the same consecutive words - regex

I'm a newbie to regular expressions and I have a problem identifying the same consecutive words using a regular expression. Below is the scenario.
Here is the data:
;af;aj;am;an;ao;ap12;aq123;ar;as;ad;af1223;
My current regular expression is (;[a-z][a-z];), and it only matches the sets ;af; , ;am; , ;ao; , ;ar; , ;ad;. My expectation is to match all of these sets: ;af;aj;am;an;ao; and ;ar;as;ad;.
Could you guys please guide me on how to match these patterns?

It seems like you're trying to extract the substrings that are in the ;[a-z][a-z]; format. If so, you could simply put your regex inside a lookahead to do an overlapping match.
(?=(;[a-z][a-z];))

(;[a-z][a-z](?=;))
Try this. It returns the groups you are looking for, though it's not clear in what sense they are the same.
The reason yours was not working is that (;[a-z][a-z];) consumes the trailing ;, so no ; is left for the next element to start with, and that element cannot match because there is no ; in front of it. A lookahead assertion doesn't consume the ;, thereby enabling all matches.
See demo.
http://regex101.com/r/tF4jD3/4
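
The difference between the consuming pattern and the two lookahead variants can be checked with a short sketch. Python's re module is an assumption here, since the question does not name a language. The last pattern, which matches whole runs, does not come from the answers above; it is only one possible reading of the asker's expected output.

```python
import re

data = ";af;aj;am;an;ao;ap12;aq123;ar;as;ad;af1223;"

# Original pattern: each match consumes its trailing ';',
# so the next two-letter item has no leading ';' left to match.
consuming = re.findall(r";[a-z][a-z];", data)

# Whole pattern inside a lookahead: nothing is consumed, so matches may overlap.
overlapping = re.findall(r"(?=(;[a-z][a-z];))", data)

# Only the trailing ';' inside a lookahead: it stays available for the next match.
trailing = re.findall(r";[a-z][a-z](?=;)", data)

# Assumption (not from the answers): match each whole run of two-letter items.
runs = re.findall(r"(?:;[a-z][a-z])+;", data)

print(consuming)    # [';af;', ';am;', ';ao;', ';ar;', ';ad;']
print(overlapping)  # [';af;', ';aj;', ';am;', ';an;', ';ao;', ';ar;', ';as;', ';ad;']
print(trailing)     # [';af', ';aj', ';am', ';an', ';ao', ';ar', ';as', ';ad']
print(runs)         # [';af;aj;am;an;ao;', ';ar;as;ad;']
```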

Related

Regular Expression in Tableau returns only Nulls in Calculated Field

I'm trying to extract in Tableau the first occurrence of the part-of-speech name (e.g. subst, adj, fin) located between { and : in every line of the column below:
{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}
{subst:sg:gen:m3=5, subst:sg:inst:m3=1, subst:sg:gen:f=1, subst:sg:nom:m3=1}
{subst:sg:nom:f=3, subst:sg:loc:f=2, subst:sg:inst:f=1, subst:sg:nom:m3=1}
{adj:sg:nom:m3:pos=2, adj:sg:acc:m3:pos=1, adj:sg:acc:n1.n2:pos=3, adj:pl:acc:m1.p1:pos=3, adj:sg:nom:f:pos=1}
{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}
{fin:sg:ter:imperf=5}
To do this I use the following regular expression: {(\w+):(?:.*?)}$. Unfortunately, my calculated field returns only Nulls:
Screenshot from Tableau
I checked my regular expression in a regex tester and it is valid:
Screenshot from regex101.com
I don't know what I'm doing wrong, so if anybody has any suggestions I would be grateful.
Tableau's regex engine is ICU, and there are some differences between it and PCRE.
One of them is that braces that should be matched as literal symbols must be escaped.
Your regex also contains a redundant non-capturing group ((?:.*?) is equivalent to .*?) and a lazy quantifier that slows down matching: since you want to check for a } at the end of the string, it should be changed to a greedy .*.
You can use
REGEXP_EXTRACT([col], '^\{(\w+):.*\}$')
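
The pattern logic can be checked outside Tableau with a minimal sketch. Note the assumptions: Python's re module is neither ICU nor Tableau, so this only verifies what the escaped-brace pattern captures, not Tableau's behavior; the helper name first_pos_tag is made up for illustration, and the rows are three of the sample lines from the question.

```python
import re

rows = [
    "{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}",
    "{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}",
    "{fin:sg:ter:imperf=5}",
]

# Escaped braces are matched literally; group 1 is the first tag before ':'.
pattern = re.compile(r"^\{(\w+):.*\}$")

def first_pos_tag(row):
    """Return the first part-of-speech tag, or None when the row does not match."""
    m = pattern.search(row)
    return m.group(1) if m else None

tags = [first_pos_tag(r) for r in rows]
print(tags)  # ['subst', 'adj', 'fin']
```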

regex expression for selecting a value

I want to write a regexp for the SIP message below that extracts a number:
< sip:callpark#as1sip1.com:5060;user=callpark;service=callpark;preason=park;paction=park;ptoken=150009;pautortrv=180;nt_server_host=47.168.105.100:5060 >
(Actually there are "<" and ">" signs in the message, but the site does not let me write them.)
For this case, I want to select the ptoken value. I wrote an expression such as ptoken=(.*);p, but it returns ptoken=150009;p, and I just need the number: 150009.
How do I write a regexp for this case?
PS: I am writing this for an XML script.
Thanks,
I solved the problem by using two regexes:
<ereg assign_to="token" check_it="true" header="Refer-To:" regexp="(ptoken=([\d]*))" search_in="hdr"/>
<ereg assign_to="callParkToken" search_in="var" variable="token" check_it="true" regexp="([\d].*)" />
You could use the following regex:
ptoken=(\d+)
# searches for ptoken= literally
# captures every digit found in the first group
The number you want is then in the first group. Take a look at this demo on regex101.com. Depending on your actual needs, there could be better approaches (XPath, since this is tagged as XML?) though.
You should use lookahead and lookbehind:
(?<=ptoken=)(.+?)(?=;)
It captures any characters (.+?) that are preceded by ptoken= and followed by ;
The <ereg ... > action has the assign_to parameter. In your case assign_to="token". In fact, the parameter can receive several variable names. The first is assigned the whole string matching the regular expression, and the following are assigned the "capture groups" of the regular expression.
If your regexp is ptoken=([\d]*), the whole match includes ptoken which is bad. The first capture group is ([\d]*) which is the required value. Thus, use <ereg regexp="ptoken=([\d]*)" assign_to="dummyvar,token" ..other parameters here.. >.
Is it working?
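
To see why the capture group matters, here is a minimal sketch comparing the whole match with group 1, plus the lookaround variant. It uses Python's re module purely as an illustration; it is not the ereg action from the XML script, and the variable names are made up.

```python
import re

message = ("< sip:callpark#as1sip1.com:5060;user=callpark;service=callpark;"
           "preason=park;paction=park;ptoken=150009;pautortrv=180;"
           "nt_server_host=47.168.105.100:5060 >")

# Capture-group approach: the whole match still includes "ptoken=",
# but group 1 holds only the digits.
with_group = re.search(r"ptoken=(\d+)", message)
whole_match = with_group.group(0)
token = with_group.group(1)

# Lookaround approach: "ptoken=" and ";" are asserted but not consumed,
# so the whole match is already the bare value.
with_lookaround = re.search(r"(?<=ptoken=)(.+?)(?=;)", message)
token_via_lookaround = with_lookaround.group(0)

print(whole_match)           # ptoken=150009
print(token)                 # 150009
print(token_via_lookaround)  # 150009
```

This mirrors the assign_to="dummyvar,token" idea: the first variable receives the whole match and the second receives the capture group.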

Regular expression to match particular starting word or nothing

I'm struggling to come up with the correct regex for the following scenario.
Let's say you have to match a word that either starts with http- or has no prefix at all,
e.g. http-test-data and test-data should match, but xyz-test-data shouldn't.
The regex I have come up with so far is
(?:http-)?(test-data)
but it matches xyz-test-data as well.
You could simply use the following:
(?:http-|^)(test-data)
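This matches either a literal http- prefix or the beginning of the string (^) immediately before test-data. Note that (?:...) is a non-capturing group, not a look-behind, so http- is part of the overall match.
For example, for the sample data as follows (one entry per line, matched with the multiline flag so that ^ applies at the start of each line):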
http-test-data
xyz-test-data
http-test-data
xyz-test-data
test-data
yes-yes-test-data
-test-data
It yields:
http-test-data
http-test-data
test-data
Try this representation
^(http-|)(test-data)
Yes, because there is a ? on the (?:http-), the prefix is optional, so the regex will also match any string that contains test-data.
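
The three patterns from this thread can be compared on the sample data with a small sketch. Python's re module and the re.MULTILINE flag are assumptions; the question does not say which engine or flags are in use.

```python
import re

sample = "\n".join([
    "http-test-data", "xyz-test-data", "http-test-data", "xyz-test-data",
    "test-data", "yes-yes-test-data", "-test-data",
])

def all_matches(pattern, text):
    """Return the full text of every match; ^ is applied per line."""
    return [m.group(0) for m in re.finditer(pattern, text, re.MULTILINE)]

# Optional prefix: also finds "test-data" inside every other line.
optional_prefix = all_matches(r"(?:http-)?(test-data)", sample)

# Either a literal "http-" or the start of a line must precede "test-data".
alternation = all_matches(r"(?:http-|^)(test-data)", sample)

# Anchored at the start of the line, with an optional "http-" prefix.
anchored = all_matches(r"^(http-|)(test-data)", sample)

print(len(optional_prefix))  # 7
print(alternation)           # ['http-test-data', 'http-test-data', 'test-data']
print(anchored)              # ['http-test-data', 'http-test-data', 'test-data']
```

One difference to keep in mind: (?:http-|^) is not anchored on its http- branch, so a line such as xhttp-test-data would still match from the h onward, whereas ^(http-|) rejects it.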

php regex to match three words if not then two and then one

Q1: I'm writing a regex in PHP without success. I want to match the following:
so i would
if not then match:
so i
and then:
i would
and
so
i
would
Here is my code:
\b(so i|i would|so i would|(so|i|would))\b
It only matches so, i, would, so i, and i would, but it does not match so i would. Why?
Order your regex correctly.
\b(so i would|so i|i would|(so|i|would))\b
Put the longest string to match on the left.
Alternatives separated by | are tried from left to right, and the first one that matches wins; hence your version of the regex matches the shorter string.
Just put it at the beginning
\b(so i would|so i|i would|(so|i|would))\b
Put the longest pattern leftmost in the group: \b(long|...|short)\b
Another solution: \b(so i would|i would|would|so i|so|i)\b
P.S. This is a feature of NFA regex engines; please refer to "Mastering Regular Expressions".
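
The effect of alternative order can be reproduced with a short sketch. It uses Python's re module rather than PHP's preg functions, on the assumption that both are backtracking engines that try alternatives from left to right.

```python
import re

text = "so i would"

# Question's order: "so i" comes before "so i would", so it wins at position 0
# and the remaining " would" is matched separately.
original_order = r"\b(so i|i would|so i would|(so|i|would))\b"

# Longest alternative first: the full phrase is tried before its prefixes.
longest_first = r"\b(so i would|so i|i would|(so|i|would))\b"

original_matches = [m.group(0) for m in re.finditer(original_order, text)]
reordered_matches = [m.group(0) for m in re.finditer(longest_first, text)]

print(original_matches)   # ['so i', 'would']
print(reordered_matches)  # ['so i would']
```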

Regular Expression - Want two matches get only one

I'm working with a regular expression and have some lines of JavaScript. My expression should deliver two matches but finds only one, and I don't know what the problem is.
The lines of JavaScript look like this:
if(mode==1) var adresse = "?APPNAME=CampusNet&PRGNAME=ACTION&ARGUMENTS=-A7uh6sBXerQwOCd8VxEMp6x0STE.YaNZDsBnBOto8YWsmwbh7FmWgYGPUHysiL9u0.jUsPVdYQAlvwCsiktBzUaCohVBnkyistIjCR77awL5xoM3WTHYox0AQs65SoHAhMXDJVr7="; else var adresse = "?APPNAME=CampusNet&PRGNAME=ACTION&ARGUMENTS=-AHMqmg-jXIDdylCjFLuixe..udPC2hjn6Kiioq7O41HsnnaP6ylFkQLhaUkaWKINEj4l2JqL2eBSzOpmG.b5Av2AvvUxEinUhMBTt5awdgAL4SkBEgYXGejTGUxcgPE-MfiQjefc=";
My expression looks like this:
(?<Popup>(popUp\(')|(adresse...")).*\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
I want to have two matches with APPNAME...... as Parameters.
[UPDATE] As Tim Pietzcker wrote, I used the greedy version and should have used the lazy version. While he was writing that, I solved it myself by using .*? instead of .* in the middle, so the expression looks like this:
(?<Popup>(popUp\(')|(adresse...")).*?\\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
That worked. Thanks to Tim Pietzcker
Your regex matches too much - from the very first adresse until the very last " because it uses a greedy quantifier .*.
If you make that quantifier lazy, i.e.
(?<Popup>(popUp\(')|(adresse...")).*?\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
you get two matches.
Alternatively, if your data allows this, replace .* with \S*, which only matches non-space characters. This will match faster (but will of course fail if the text you're trying to match could possibly contain spaces):
(?<Popup>(popUp\(')|(adresse..."))\S*\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
Usually you must apply the regex with the "global" flag to find all matches. I can't really say more until I see the complete code sample you are working with.
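
The greedy versus lazy behavior can be reproduced with a minimal sketch. Several things here are assumptions: it uses Python's re module, which requires the (?P<name>...) spelling for named groups instead of (?<name>...); the long ARGUMENTS values are replaced by the made-up placeholders -A1= and -A2=; and finditer stands in for a "global" search.

```python
import re

# Shortened stand-in for the JavaScript line from the question.
js = ('if(mode==1) var adresse = "?APPNAME=CampusNet&PRGNAME=ACTION&ARGUMENTS=-A1="; '
      'else var adresse = "?APPNAME=CampusNet&PRGNAME=ACTION&ARGUMENTS=-A2=";')

# Greedy .* runs to the end of the line and backtracks to the LAST '?'.
greedy = re.compile(
    r"""(?P<Popup>(popUp\(')|(adresse...")).*\?((?P<Parameters>APPNAME=CampusNet[^>"']*["']))"""
)

# Lazy .*? stops at the FIRST '?' after each "adresse".
lazy = re.compile(
    r"""(?P<Popup>(popUp\(')|(adresse...")).*?\?((?P<Parameters>APPNAME=CampusNet[^>"']*["']))"""
)

greedy_params = [m.group("Parameters") for m in greedy.finditer(js)]
lazy_params = [m.group("Parameters") for m in lazy.finditer(js)]

print(len(greedy_params))  # 1
print(len(lazy_params))    # 2
```

With the greedy pattern, the single match starts at the first adresse but its Parameters group holds the second URL; the lazy pattern yields one match per assignment.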