Regular expression for specific part of url - regex

I'm learning to write regular expressions and am trying to get a specific part of a url. For example,
https://domain/#/customerName/profile/profileName/jobs/jobName
where "customerName" is a variable but "domain and "profile" are constants.
How would I obtain profileName given this string? I've come up with
^\/(?!https:\/\/domain\/#\/#\/*\/profile\/)
but don't really know how to cut off the rest of the string after "profileName"

customerName is a variable, but assuming it doesn't contain any slashes, you can match it with [^\/]+ (which means "repeat any character but a forward slash") after the /#/. The same pattern works for profileName. So, you can use:
https:\/\/domain\/#\/[^\/]+\/profile\/([^\/]+)
https://regex101.com/r/aToq8N/3
The profileName will be in the first captured group.
Note that because the URL doesn't contain /#/#/, but just /#/, use \/#\/ in the regular expression instead of \/#\/#\/.

Even if your domain is fixed, you may use this expression too fore the domain part
[^\/]+
This way it can contain dot in addition to alphabetic characters,among other things.

Related

regex expression for selecting a value

I want to write a regexp formula for the below sip message that takes number:
< sip:callpark#as1sip1.com:5060;user=callpark;service=callpark;preason=park;paction=park;ptoken=150009;pautortrv=180;nt_server_host=47.168.105.100:5060 >
(Actually there are "<" and ">" signs in the message, but the site does not let me write)
For this case, I want to select ptoken value.. I wrote an expression such as: ptoken=(.*);p but it returns me ptoken=150009;p, I just need the number:150009
How do I write a regexp for this case?
PS: I write this for XML script..
Thanks,
I SOLVE THE PROBLEM BY USING TWO REGEX:
ereg assign_to="token" check_it="true" header="Refer-To:" regexp="(ptoken=([\d]*))" search_in="hdr"/
ereg assign_to="callParkToken" search_in="var" variable="token" check_it="true" regexp="([\d].*)" /
You could use the following regex:
ptoken=(\d+)
# searches for ptoken= literally
# captures every digit found in the first group
Your wanted numbers are in the first group then. Take a look at this demo on regex101.com. Depending on your actual needs, there could be better approaches (Xpath? as tagged as XML) though.
You should use lookahead and lookbehind:
(?<=ptoken=)(.+?)(?=;)
It captures any character (.+?) before which is ptoken= and behind which is ;
The <ereg ... > action has the assign_to parameter. In your case assign_to="token". In fact, the parameter can receive several variable names. The first is assigned the whole string matching the regular expression, and the following are assigned the "capture groups" of the regular expression.
If your regexp is ptoken=([\d]*), the whole match includes ptoken which is bad. The first capture group is ([\d]*) which is the required value. Thus, use <ereg regexp="ptoken=([\d]*)" assign_to="dummyvar,token" ..other parameters here.. >.
Is it working?

How to capture text between two markers?

For clarity, I have created this:
http://rubular.com/r/ejYgKSufD4
My strings:
http://blablalba.com/foo/bar_soap/foo/dir2
http://blablalba.com/foo/bar_soap/dir
http://blablalba.com/foo/bar_soap
My Regular expression:
\/foo\/(.*)
This returns:
/foo/bar_soap/dir/dir2
/foo/bar_soap/dir
/foo/bar_soap
But I only want
/foo/bar_soap
Any ideas how I can achieve this? As illustrated above, I want everything after foo up until the first forward slash.
Thanks in advance.
Edit. I only want the text after foo until until the next forward slash after. Some directories may also be named as foo and this would render incorrect results. Thanks
. will match anything, so you should change it to [^/] (not slash) instead:
\/foo\/([^\/]*)
Some of the other answers use + instead of *. That might be correct depending on what you want to do. Using + forces the regex to match at least one non-slash character, so this URL would not match since there isn't a trailing character after the slash:
http://blablalba.com/foo/
Using * instead would allow that to match since it matches "zero or more" non-slash characters. So, whether you should use + or * depends on what matches you want to allow.
Update
If you want to filter out query strings too, you could also filter against ?, which must come at the front of all query strings. (I think the examples you posted below are actually missing the leading ?):
\/foo\/([^?\/]*)
However, rather than rolling out your own solution, it might be better to just use split from the URI module. You could use URI::split to get the path part of the URL, and then use String#split split it up by /, and grab the first one. This would handle all the weird cases for URLs. One that you probably haven't though of yet is a URL with a specified fragment, e.g.:
http://blablalba.com/foo#bar
You would need to add # to your filtered-character class to handle those as well.
You can try this regular expression
/\/foo\/([^\/]+)/
\/foo\/([^\/]+)
[^\/]+ gives you a series of characters that are not a forward slash.
the parentheses cause the regex engine to store the matched contents in a group ([^\/]+), so you can get bar_soap out of the entire match of /foo/bar_soap
For example, in javascript you would get the matched group as follows:
regexp = /\/foo\/([^\/]+)/ ;
match = regexp.exec("/foo/bar_soap/dir");
console.log(match[1]); // prints bar_soap

parsing url for specific param value

im looking to use a regular expression to parse a URL to get a specific section of the url and nothing if I cannot find the pattern.
A url example is
/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa&bf_action=jildape
I wish to get the bolded text in it.
Currently im using the regex "d=([^#]*)" but the problem is im also running across urls of this pattern:
and im getting the bold section of it
/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape
I would prefer it have no matches of this url because it doesnt contain the #
Regexes are not a magic tool that you should always use just because the problem involves a string. In this case, your language probably has a tool to break apart URLs for you. In PHP, this is parse_url(). In Perl, it's the URI::URL module.
You should almost always prefer an existing, well-tested solution to a common problem like this rather than writing your own.
So you want to match the value of the id parameter, but only if it has a trailing section containing a '#' symbol (without matching the '#' or what's after it)?
Not knowing the specifics of what style of regexes you're using, how about something like:
id=([^#&]*)#
regex = "id=([\\w-])+?#"
This will grab everything that is character class[a-zA-Z_0-9-] between 'id=' and '#' assuming everything between 'id=' and '#' is in that character class(i.e. if an '&' is in there, the regex will fail).
id=
-Self explanatory, this looks for the exact match of 'id='
([\\w-])
-This defines and character class and groups it. The \w is an escaped \w. '\w' is a predefined character class from java that is equal to [a-zA-Z_0-9]. I added '-' to this class because of the assumed pattern from your examples.
+?
-This is a reluctant quantifier that looks for the shortest possible match of the regex.
#
-The end of the regex, the last character we are looking for to match the pattern.
If you are looking to grab every character between 'id=' and the first '#' following it, the following will work and it uses the same logic as above, but replaces the character class [\\w-] with ., which matches anything.
regex = "id=(.+?)#"

Bounding Multiple Matches With Single Text

I'm trying to parse out the properties of a type (eg. the words 'Cusip', 'Issuer', and 'Coupon') shown here:
Public Type GetPricesResponse
Cusip As String
Issuer As String
Coupon As String
End Type
The regex ([a-zA-Z0-9]+).+As works great for this code snippet (see http://regexr.com?300fl), but may not work when mixed with a larger body of code. So, I've tried to "bound" this regex with the words Public Type on the front, and End Type at the end to specifically identify what I need as follows:
Public\sType\s([a-zA-Z0-9]+).+As.+End\sType
...but of course it then doesn't match anything.
I have the MultiLine option set as well.
You've presented two different problems.
The first is, roughly, "can I write a regex to match this thing", the answer is yes. For simplicity I've used \w instead of [a-zA-Z0-9]:
Public\s+Type\s+(\w+)\s+((\w+)\s+As\s+(\w+)\s*('.*\s*)?)+End\s+Type
The next is "how can I parse out the properties" and the answer to that is, as written in the comments: don't use a single regex. First, use a regex which captures only the definitions:
Public\s+Type\s+\w+\s+(.*?)End\s+Type
This uses a the reluctant quantifier *? so that the regex won't gobble up End Type and the DOTALL flag so that you can match several lines. From this match, you take group 1 and repeatedly find the following:
^\s+(\w+)\s+.*$
Group 1 from this match will be your property name.
Use the following regexp to match the whole thing:
Public\s+Type\s+(?<tname>[\w]+)\s+((?<pname>[\w]+)\s+As\s+(?<ptype>[\w]+)\s+)+End\s+Type
Note that it uses named groups for easier access to matched content. Therefore after the whole content is matched, the group named tname matches the class type, the group named pname matches the property name, and the group named ptype matches the corresponding properties type.
Here's its live demo:
http://regexr.com?300l0

How to match a string that does not end in a certain substring?

how can I write regular expression that dose not contain some string at the end.
in my project,all classes that their names dont end with some string such as "controller" and "map" should inherit from a base class. how can I do this using regular expression ?
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isnt any match case!!!
Do a search for all filenames matching this:
(?<!controller|map|anythingelse)$
(Remove the |anythingelse if no other keywords, or append other keywords similarly.)
If you can't use negative lookbehinds (the (?<!..) bit), do a search for filenames that do not match this:
(?:controller|map)$
And if that still doesn't work (might not in some IDEs), remove the ?: part and it probably will - that just makes it a non-capturing group, but the difference here is fairly insignificant.
If you're using something where the full string must match, then you can just prefix either of the above with ^.* to do that.
Update:
In response to this:
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isnt any match case!!!
Not quite sure what you're attempting with the public/class stuff there, so try this:
public.*class.*(?<!controller|map)$`
The . is a regex char that means "anything except newline", and the * means zero or more times.
If this isn't what you're after, edit the question with more details.
Depending on your regex implementation, you might be able to use a lookbehind for this task. This would look like
(?<!SomeText)$
This matches any lines NOT having "SomeText" at their end. If you cannot use that, the expression
^(?!.*SomeText$).*$
matches any non-empty lines not ending with "SomeText" as well.
You could write a regex that contains two groups, one consists of one or more characters before controller or map, the other contains controller or map and is optional.
^(.+)(controller|map)?$
With that you may match your string and if there is a group() method in the regex API you use, if group(2) is empty, the string does not contain controller or map.
Check if the name does not match [a-zA-Z]*controller or [a-zA-Z]*map.
finally I did it in this way
public.*class.*[^(controller|map|spec)]$
it worked