regex expression for selecting a value - regex

I want to write a regular expression that extracts a number from the SIP message below:
< sip:callpark#as1sip1.com:5060;user=callpark;service=callpark;preason=park;paction=park;ptoken=150009;pautortrv=180;nt_server_host=47.168.105.100:5060 >
(Actually there are "<" and ">" signs in the message, but the site does not let me write)
In this case I want to select the ptoken value. I wrote the expression ptoken=(.*);p, but it returns ptoken=150009;p, and I only need the number: 150009
How do I write a regexp for this case?
PS: I write this for XML script..
Thanks,
I solved the problem by using two regexes:
ereg assign_to="token" check_it="true" header="Refer-To:" regexp="(ptoken=([\d]*))" search_in="hdr"/
ereg assign_to="callParkToken" search_in="var" variable="token" check_it="true" regexp="([\d].*)" /

You could use the following regex:
ptoken=(\d+)
# searches for ptoken= literally
# captures every digit found in the first group
Your wanted numbers are in the first group then. Take a look at this demo on regex101.com. Depending on your actual needs, there could be better approaches (Xpath? as tagged as XML) though.
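As a minimal sketch of how the first group is read back, here is the same pattern in Python. The thread targets an XML script, so Python and the names SIP_URI and extract_ptoken are assumptions made purely for illustration.

```python
import re

# Sample Refer-To URI copied from the question.
SIP_URI = (
    "<sip:callpark#as1sip1.com:5060;user=callpark;service=callpark;"
    "preason=park;paction=park;ptoken=150009;pautortrv=180;"
    "nt_server_host=47.168.105.100:5060>"
)


def extract_ptoken(text):
    """Return the digits after 'ptoken=' (hypothetical helper), or None."""
    match = re.search(r"ptoken=(\d+)", text)
    # group(0) would be the whole match, 'ptoken=150009';
    # group(1) is only the captured digits.
    return match.group(1) if match else None


print(extract_ptoken(SIP_URI))
```

The whole match still includes the literal ptoken= prefix; only the first group holds the bare number.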

You should use lookahead and lookbehind:
(?<=ptoken=)(.+?)(?=;)
It matches the shortest run of characters (.+?) that is preceded by ptoken= and followed by ;
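A small sketch of the lookaround version, again in Python as an assumed host language (the variable names are illustrative only). With lookarounds the whole match is already the bare value, so no group index is needed.

```python
import re

# Fragment of the header from the question.
header = "preason=park;paction=park;ptoken=150009;pautortrv=180"

# The lookbehind requires 'ptoken=' just before the match and the
# lookahead requires ';' just after it; neither is part of the match.
match = re.search(r"(?<=ptoken=)(.+?)(?=;)", header)
token = match.group(0) if match else None

print(token)
```

Note that the lookahead insists on a trailing ; so this form finds nothing if ptoken happens to be the last field.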

The <ereg ... > action has the assign_to parameter. In your case assign_to="token". In fact, the parameter can receive several variable names. The first is assigned the whole string matching the regular expression, and the following are assigned the "capture groups" of the regular expression.
If your regexp is ptoken=([\d]*), the whole match includes ptoken which is bad. The first capture group is ([\d]*) which is the required value. Thus, use <ereg regexp="ptoken=([\d]*)" assign_to="dummyvar,token" ..other parameters here.. >.
Is it working?

Related

Regex - Matching a part of a URL

I'm trying to use regular expression to match a part of the following url:
http://www.example.com/store/store.html?ptype=lst&id=370&3434323&root=nav_3&dir=desc&order=popularity
I want the Regex to find:
&3434323
Basically, it should find any part of the query string that doesn't follow the variable=value form. So I need it to search for sections of the URL that don't have an equals sign in them, and match just that part.
I tried using:
&\w*+[^=_-]
But it returns: &3434323&. I need it to not return the next ampersand.
And it must be done in regex. Thanks in advance!
You can use this regex:
[?&][^=]+(&|$)
It looks for any string that doesn't contain the equals sign [^=]+, starts with the question mark or the ampersand [?&], and ends with an ampersand or the end of the URL (&|$).
Please note that this will return &3434323&, so you'll have to strip the ampersands on both sides in your code. I assume that you're fine with that. If you really don't want the second ampersand, you can use a lookahead:
[?&][^=]+(?=&|$)
If you don't want even the first ampersand, you can use this regex, but not all regex engines support lookbehind:
(?<=\?|&)[^=]+(?=&|$)
Parsing query parameters can be tricky, but this may do the job:
((?:[?&])[^=&]+)(?=&|$)
It will not catch the ampersand at the end of the parameter, but it will include either the question mark or the ampersand at the beginning. It will match any parameter not in the form of a key-value pair.
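To make the difference between the variants concrete, here is a short Python sketch run against the URL from the question. Python is an assumption (the question does not name a language), and (?<=[?&]) is a character-class spelling of the lookbehind, used here together with [^=&]+ for illustration.

```python
import re

URL = (
    "http://www.example.com/store/store.html"
    "?ptype=lst&id=370&3434323&root=nav_3&dir=desc&order=popularity"
)

# Keeps the leading separator, drops the trailing one.
with_separator = re.findall(r"((?:[?&])[^=&]+)(?=&|$)", URL)

# Lookbehind and lookahead: neither separator is part of the match.
bare = re.findall(r"(?<=[?&])[^=&]+(?=&|$)", URL)

print(with_separator, bare)
```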

How to get the queryparam vid from the url using regex

Help me with the regex, I am trying to get the vid value from the following url.
I tried with like the following but I am not sure with that:
[\&]{1}vid[\=][\d]*
Is that correct?
Use vid=(\d+) to capture a numeric ID.
You can try your regex here:
https://regex101.com/r/dX3hD4/1
The trick here is to match between two patterns of interest -
"vid="
"&"
Anything you capture between that is what you're after.
Hence use this:
"http://gorid.com/api.jsp?acs=123&vid=432&skey=asdasd-asdas-adsasd".match("vid=([^&]*)&")[1]
We're accessing index 1 of the match result (its second element) because that holds the captured value; index 0 is the whole match.
In a JS/PHP type environment, you can match on something like this, which captures whatever sits between vid= and the following &:
vv = str.match(/vid=(.+?)&/)[1];
If the value is always numeric, replace (.+?) with (\d+?)
The regex you wrote will not work because you are including the characters &vid= in the return value. To make sure the regex engine checks for the string &vid= but does not include it in the result you will need to use a lookbehind:
(?<=&vid=)([^&\r\n]+)
We use a positive lookbehind to find &vid= and then grab everything from that point until the next & sign or the end of the line.
For your second request, if you wish to verify that the content of vid is a valid number you need to specify that all the characters following &vid= should be digits and also include a positive lookahead that makes sure the next character after the digits is a & sign. The corresponding regular expression then becomes:
(?<=&vid=)([^\D]+)(?=&)
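The answers above can be compared side by side in a short Python sketch. Python is an assumption, as are the helper names; the second helper swaps the regex for the standard library's URL parser, which is a different technique from the one discussed in the thread and is shown only as a cross-check.

```python
import re
from urllib.parse import urlparse, parse_qs

URL = "http://gorid.com/api.jsp?acs=123&vid=432&skey=asdasd-asdas-adsasd"


def vid_by_regex(url):
    """Capture a purely numeric vid; the lookahead also accepts end of string."""
    match = re.search(r"[?&]vid=(\d+)(?=&|$)", url)
    return match.group(1) if match else None


def vid_by_parser(url):
    """Same lookup without a regex, via urllib.parse."""
    values = parse_qs(urlparse(url).query).get("vid", [])
    return values[0] if values else None


print(vid_by_regex(URL), vid_by_parser(URL))
```

Using (?=&|$) rather than (?=&) keeps the regex working when vid is the last parameter in the query string.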

regex with 3 backreferences but one optional

I have a regular expression that captures three backreferences though one (the 2nd) may be null.
Given the following string:
http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es&source=web&cd=1&ved=0CC8QFjAA&url=http%3A%2F%2Fjonathonoat.es%2Fbritish-mozcast%2F&ei=MQj9UKejDYeS0QWruIHgDA&usg=AFQjCNHy1cDoWlIAwyj76wjiM6f2Rpd74w&bvm=bv.41248874,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1
I wish to capture the TLD (in this case .co.uk), q param and cd param.
I'm using the following RegEx:
/.*\.google([a-z\.]*).*q=(.*[^&])?.*cd=(\d*).*/i
This works, except that the 2nd backreference includes the other parameters up to the cd param. I currently get this:
["http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajo…,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1 ", ".co.uk", "site%3Ajonathonoat.es&source=web", "1", index: 0, input: "http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajo…,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1"]
The 1st backreference is correct, it's .co.uk and so is the 3rd; it's 1. I want the 2nd backreference to be either null (or undefined or whatever) or just the q param, in this example site%3Ajonathonoat.es. It currently includes the source param too (site%3Ajonathonoat.es&source=web).
Any help would be much appreciated, thanks!
I've added a JSFiddle of the code, look in your browser console for the output, thanks!
When negating character classes, I always put the quantifier on the class itself:
/.*\.google([a-z\.]*).*q=([^&]*).*cd=(\d*).*/i
I also recommend being careful with * and +, as they are "greedy"; prefer *? or +? when the delimiter you are looking for can also occur inside the string. Here [^&]* can stay greedy, because it cannot run past the next ampersand. For more on greediness, see Jeffrey Friedl's Mastering Regular Expressions.
You want the middle group to be:
q=([^&]*)
This will capture characters other than ampersand. It also allows zero characters, so you can remove the optional quantifier (?) after the group.
Working example: http://rubular.com/r/AJkXxgeX5K
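A quick Python sketch of the corrected middle group. The question uses JavaScript, so Python is an assumption, and the URL is shortened from the one in the question to keep the example readable.

```python
import re

# Shortened form of the URL from the question.
URL = (
    "http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es"
    "&source=web&cd=1&ved=0CC8QFjAA"
)

# [^&]* stops at the first ampersand and may also match nothing,
# so an empty q parameter yields an empty string instead of a failure.
PATTERN = re.compile(r".*\.google([a-z\.]*).*q=([^&]*).*cd=(\d*).*", re.I)

tld, q, cd = PATTERN.match(URL).groups()
empty_q = PATTERN.match("http://www.google.com/url?q=&cd=3").groups()

print(tld, q, cd, empty_q)
```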

Bounding Multiple Matches With Single Text

I'm trying to parse out the properties of a type (eg. the words 'Cusip', 'Issuer', and 'Coupon') shown here:
Public Type GetPricesResponse
Cusip As String
Issuer As String
Coupon As String
End Type
The regex ([a-zA-Z0-9]+).+As works great for this code snippet (see http://regexr.com?300fl), but may not work when mixed with a larger body of code. So, I've tried to "bound" this regex with the words Public Type on the front, and End Type at the end to specifically identify what I need as follows:
Public\sType\s([a-zA-Z0-9]+).+As.+End\sType
...but of course it then doesn't match anything.
I have the MultiLine option set as well.
You've presented two different problems.
The first is, roughly, "can I write a regex to match this thing", the answer is yes. For simplicity I've used \w instead of [a-zA-Z0-9]:
Public\s+Type\s+(\w+)\s+((\w+)\s+As\s+(\w+)\s*('.*\s*)?)+End\s+Type
The next is "how can I parse out the properties" and the answer to that is, as written in the comments: don't use a single regex. First, use a regex which captures only the definitions:
Public\s+Type\s+\w+\s+(.*?)End\s+Type
This uses the reluctant quantifier *? so that the regex won't gobble up End Type, and the DOTALL flag so that . can match across several lines. From this match, you take group 1 and repeatedly find the following:
^\s*(\w+)\s+.*$
Group 1 from this match will be your property name.
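The two-step approach can be sketched in Python as follows. Python, the sample SOURCE text, and the helper name property_names are assumptions for illustration; the inner pattern is a slight variation that anchors on the As keyword and tolerates a first property line whose indentation was already consumed by the outer \s+.

```python
import re

# Hypothetical larger body of code containing the type from the question.
SOURCE = """\
Private Sub Unrelated()
    Dim Noise As String
End Sub

Public Type GetPricesResponse
    Cusip As String
    Issuer As String
    Coupon As String
End Type
"""


def property_names(code):
    """Step 1: isolate the type body. Step 2: scan it line by line."""
    block = re.search(r"Public\s+Type\s+\w+\s+(.*?)End\s+Type", code, re.DOTALL)
    if block is None:
        return []
    # MULTILINE lets ^ match at the start of every line in the body.
    return re.findall(r"^\s*(\w+)\s+As\b", block.group(1), re.MULTILINE)


print(property_names(SOURCE))
```

Because the second pattern only ever sees the body of the type, the Dim line in the unrelated Sub cannot leak into the result.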
Use the following regexp to match the whole thing:
Public\s+Type\s+(?<tname>[\w]+)\s+((?<pname>[\w]+)\s+As\s+(?<ptype>[\w]+)\s+)+End\s+Type
Note that it uses named groups for easier access to the matched content. After the whole content is matched, the group named tname holds the type name, the group named pname holds the property name, and the group named ptype holds the corresponding property's type.
Here's its live demo:
http://regexr.com?300l0

Extract querystring value from url using regex

I need to pull a variable out of a URL or get an empty string if that variable is not present.
Pseudo code:
String foo = "http://abcdefg.hij.klmnop.com/a/b/c.file?foo=123&zoo=panda";
String bar = "http://abcdefg.hij.klmnop.com/a/b/c.file";
when I run my regex I want to get 123 in the first case and empty string in the second.
I'm trying this as my replace .*?foo=(.*?)&?.*
replacing this with $1 but that's not working when foo= isn't present.
I can't just do a match, it has to be a replace.
You can try this:
[^?]+(?:\?foo=([^&]+).*)?
If there are parameters and the first parameter is named "foo", its value will be captured in group #1. If there are no parameters the regex will still succeed, but I can't predict what will happen when you access the capturing group. Some possibilities:
it will contain an empty string
it will contain a null reference, which will be automatically converted to either an empty string or the word "null"
your app will throw an exception because group #1 didn't participate in the match.
This regex matches the sample strings you provided, but it won't work if there's a parameter list that doesn't include "foo", or if "foo" is not the first parameter. Those options can be accommodated too, assuming the capturing group thing works.
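One way to sidestep the uncertainty about an unmatched group is to make the replacement explicit. The sketch below is in Python (an assumption; the question's pseudo code looks like Java) and uses a hypothetical helper foo_value. It is still a single replace, but the replacement function turns a group that did not participate into an empty string, and the pattern is widened so that foo need not be the first parameter.

```python
import re

FOO_URL = "http://abcdefg.hij.klmnop.com/a/b/c.file?foo=123&zoo=panda"
BAR_URL = "http://abcdefg.hij.klmnop.com/a/b/c.file"

# The optional group swallows everything up to and including 'foo=' when
# it is present; the trailing .* consumes whatever is left of the URL.
PATTERN = re.compile(r"^(?:.*?[?&]foo=([^&]*))?.*$")


def foo_value(url):
    """Replace the whole URL with the value of foo, or with '' if absent."""
    # group(1) is None when the optional part did not participate.
    return PATTERN.sub(lambda m: m.group(1) or "", url, count=1)


print(repr(foo_value(FOO_URL)), repr(foo_value(BAR_URL)))
```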
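I think you need to do a match, then a replace. That way you can extract the value if it is present, and use "" if it is not. Something like this:
// Java; needs: import java.util.regex.Pattern;
String bar;
if (Pattern.compile("[?&]foo=([^&]+)").matcher(foo).find()) {
    // Replace the whole URL with the captured value of foo.
    bar = foo.replaceAll("^.*?[?&]foo=([^&]+).*$", "$1");
} else {
    bar = "";
}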
I haven't tested the regex, so I don't know if it will work.
In perl you could use this:
s/[^?*]*\??(foo=)?([\d]*).*/$2/
This will get everything up to the ? to start, and then isolate the foo, grab the numbers in a group and let the rest fall where they may.
There's an important rule when using regular expressions: don't try to put unnecessary processing into them. Some things can't be done with a single regular expression, and sometimes it is more advisable to use the host programming language.
Marius' answer makes use of this rule: rather than finding a convoluted way of replacing something only if it exists, it is better to use your programming language to check for the pattern's presence, and replace only if necessary.