RegExp matching a URL

I have the following URL:
ristoranti/location/latvia/riga/other-tag
I need a regexp that does not match the URL if it has the location segment.
Here is what I tried:
ristoranti/(?!location$).*)?(.+?)/(.+?)
Example URL I need to match:
ristoranti/latvia/riga/other-tag
I'm not very good with regexps, but if I'm right the first segment should match anything except location. Am I wrong?

The problem is the $ in your negative lookahead. It fails to stop ristoranti/location/latvia/riga/other-tag from matching, because your URL does not actually end with location. You should replace it with
(?!location/)
which fails the match when location/ comes next.
Also anchor the regex with ^ at the start. Your final regex should be:
^ristoranti/(?!location/)([^/]*)/([^/]*)/(.*)
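A quick way to check the fixed regex is to run it against both URLs from the question. The following is a minimal sketch using Java's java.util.regex. Java is an assumption, because the question does not name a language, and the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocationFilter {
    // The answer's regex: the negative lookahead (?!location/) rejects any URL
    // whose first segment after "ristoranti/" is "location".
    static final Pattern ROUTE =
            Pattern.compile("^ristoranti/(?!location/)([^/]*)/([^/]*)/(.*)");

    // Hypothetical helper: returns the three captured segments, or null when rejected.
    static String[] segments(String url) {
        Matcher m = ROUTE.matcher(url);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", segments("ristoranti/latvia/riga/other-tag"))); // latvia | riga | other-tag
        System.out.println(segments("ristoranti/location/latvia/riga/other-tag") == null);     // true
    }
}
```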

Related

How can I write the opposite of this regex?

This is a regex for a proxy. If I add this to my proxy:
(.*\.|)(abc|google)\.(org|net)
my proxy will not forward traffic for abc.org, abc.net, google.org, or google.net.
How can I write a regex that does the opposite? I mean, only forward traffic for abc.org, abc.net, google.org, and google.net.
EDIT-01
What I actually want is to forward only abc.org or www.abc.org. How can I do that?
Try this:
^(?!(www\.)?(?:abc|google)\.(?:net|org)).*
Demo: https://regex101.com/r/WOnFx8/3/
I used a negative lookahead, (?!...), to invert your regex. The resulting regex matches any domain except these four, whether or not they have a www. prefix.
Another option is to exclude these domains behind any dot-terminated prefix, not just www.:
^(?!(.*\.|)(?:abc|google)\.(?:net|org)).*
demo: https://regex101.com/r/WOnFx8/4/
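To see how the two lookahead patterns differ, the following minimal sketch runs both against a few sample hosts with Java's java.util.regex. This is an assumption, since your proxy's regex engine may differ in details. The host list and the class name are hypothetical.

```java
import java.util.regex.Pattern;

public class ProxyBypass {
    // First pattern: any host except abc/google .org/.net, with or without "www.".
    static final Pattern EXCEPT_WWW =
            Pattern.compile("^(?!(www\\.)?(?:abc|google)\\.(?:net|org)).*");
    // Second pattern: also excludes those domains behind any dot-terminated prefix.
    static final Pattern EXCEPT_ANY_PREFIX =
            Pattern.compile("^(?!(.*\\.|)(?:abc|google)\\.(?:net|org)).*");

    // True when the whole host string matches the given pattern.
    static boolean matchesHost(Pattern p, String host) {
        return p.matcher(host).matches();
    }

    public static void main(String[] args) {
        String[] hosts = {"abc.org", "www.google.net", "mail.google.org", "yahoo.com"};
        for (String h : hosts) {
            System.out.println(h + "  www-only: " + matchesHost(EXCEPT_WWW, h)
                    + "  any-prefix: " + matchesHost(EXCEPT_ANY_PREFIX, h));
        }
    }
}
```

Only the second pattern rejects mail.google.org, because (.*\.|) accepts any prefix that ends in a dot rather than only www.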
The regex you wrote,
(.*\.|)(abc|google)\.(org|net)
matches abc.org, google.org, abc.net, or google.net, with an optional prefix that ends with a dot (.).
For example: test.google.org, sub.abc.net, ...
I think you want to match strings like test.yahoo.com, but not test.google.org. If you can use a negative lookahead, this is the answer:
^(.*\.|)(?!(abc|google)\.(org|net))\w+\.\w+$
Explanation:
^ and $ make sure the match covers the entire URL string.
The negative lookahead checks that the rest of the URL is not abc.org, abc.net, google.org, or google.net.
\w+\.\w+ checks that the remaining string looks like a domain name (something like yahoo.com).
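The following sketch runs the answer's regex on the examples from the explanation. Java's java.util.regex stands in for whatever engine you use, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

public class NegativeLookaheadHost {
    // Optional dot-terminated prefix, then "word.word" to the end of the string,
    // provided that part is not abc|google followed by .org|.net.
    static final Pattern HOST =
            Pattern.compile("^(.*\\.|)(?!(abc|google)\\.(org|net))\\w+\\.\\w+$");

    static boolean accepted(String host) {
        return HOST.matcher(host).find();
    }

    public static void main(String[] args) {
        System.out.println(accepted("test.yahoo.com"));  // true
        System.out.println(accepted("yahoo.com"));       // true
        System.out.println(accepted("test.google.org")); // false
        System.out.println(accepted("sub.abc.net"));     // false
    }
}
```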
I'm going to assume your regex engine supports lookaheads. If so, you can simply use:
(^.*?\.(?!(abc|google))\w+\.(?:org|net)$)
Demo - https://regex101.com/r/5eC41R/3
What this does:
Matches the start of the URL (up to the first .)
Checks that the next part does not start with abc or google
Matches the next section (up to the next .)
Matches a closing org or net
Note that the lookahead adds some work compared with a plain match, so this may be slightly slower than a regex without one.
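Running this pattern on a few hosts shows what it accepts. The sketch below uses Java with a hypothetical class name. Note that the pattern needs at least one dot before the checked label, so a bare host such as abc.org never matches. It also accepts only .org and .net endings.

```java
import java.util.regex.Pattern;

public class LazyPrefixHost {
    // The answer's pattern: lazy prefix up to a dot, then a label that does not
    // start with abc or google, then .org or .net at the end.
    static final Pattern HOST =
            Pattern.compile("(^.*?\\.(?!(abc|google))\\w+\\.(?:org|net)$)");

    static boolean accepted(String host) {
        return HOST.matcher(host).find();
    }

    public static void main(String[] args) {
        System.out.println(accepted("test.yahoo.org"));  // true
        System.out.println(accepted("test.google.org")); // false
        System.out.println(accepted("abc.org"));         // false: no prefix before the label
        System.out.println(accepted("test.yahoo.com"));  // false: only .org/.net allowed
    }
}
```

Because (?!(abc|google)) only checks how the label starts, it also rejects hosts such as test.abcd.org.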

Regex in Google Analytics for segment creation

I'm trying to trap URLs of the following structure:
/resources/state-name/city-name
given that there are also URLs of the following types:
/resources/other-words
/resources/state-name
/resources/state-name/city-name/other-words
I have tried to trap them using
include/matches regex:
\/resources\/.*\/.*
exclude/matches regex:
\/resources\/.*\/.*\/.*
but this still lets the other-words and state-name-only URLs slip through.
Try this regex: \/resources\/[^\/\r\n]*(?:$|(?:\/.*\/.*$)). I assumed the end of the URL is also the end of the line. It matches all of them except /resources/state-name/city-name.
To match only /resources/state-name/city-name, use this one instead: \/resources\/[^\/\r\n]*\/[^\/\r\n]*$
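Here is a minimal sketch that checks both patterns against the four URL shapes from the question. Java's java.util.regex stands in for Google Analytics' regex engine. This is an assumption, although the patterns use only basic features. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class ResourcesSegments {
    // Matches every /resources URL except /resources/state/city.
    static final Pattern ALL_BUT_STATE_CITY =
            Pattern.compile("\\/resources\\/[^\\/\\r\\n]*(?:$|(?:\\/.*\\/.*$))");
    // Matches only /resources/state/city.
    static final Pattern STATE_CITY_ONLY =
            Pattern.compile("\\/resources\\/[^\\/\\r\\n]*\\/[^\\/\\r\\n]*$");

    public static void main(String[] args) {
        String[] urls = {
            "/resources/other-words",
            "/resources/state-name",
            "/resources/state-name/city-name",
            "/resources/state-name/city-name/other-words",
        };
        for (String u : urls) {
            System.out.println(u + "  all-but: " + ALL_BUT_STATE_CITY.matcher(u).find()
                    + "  state/city only: " + STATE_CITY_ONLY.matcher(u).find());
        }
    }
}
```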
Something like this: /(\/resources\/)([\w+%-]+)\/([\w+%-]+)/g
With [\w+%-] you match any letter, digit, underscore, +, - and % in the URL. I included + and % because spaces in URLs are often replaced with -, + or %20.
Also, the parentheses create capture groups, so you can access each part with $1 to $3.
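The capture groups can be read out as in the following minimal Java sketch, where Matcher.group(n) plays the role of $n. The sample URL and class name are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourcesGroups {
    // Group 1: "/resources/", group 2: state segment, group 3: city segment.
    static final Pattern P = Pattern.compile("(\\/resources\\/)([\\w+%-]+)\\/([\\w+%-]+)");

    public static void main(String[] args) {
        Matcher m = P.matcher("/resources/new-york/new%20york%20city");
        if (m.find()) {
            System.out.println(m.group(2)); // new-york
            System.out.println(m.group(3)); // new%20york%20city
        }
    }
}
```

Because the pattern has no $ anchor, it also finds the first two segments inside /resources/state-name/city-name/other-words. Add $ at the end if the longer URLs must be excluded.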

How to regex last part of URL only?

I want a regex that targets the end of a URL:
www.company.com/orders/thanks
If you return from the email or account page, the order ID is appended at the end:
www.company.com/orders/thanks/1sfasd523425
So I only want to target the URL that ends with /thanks.
This thread is similar: How do I get the last segment of URL using regular expressions
I had something similar, .*\/thanks\/.+, but it targets the wrong URLs.
EDIT: Only target URLs ending with /thanks or /thanks/
Try a lookahead like this:
Regex: .+(?=\/thanks$).+
Explanation: the positive lookahead makes this match the URL only if it ends with /thanks.
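Note that .+(?=\/thanks$).+ does not accept the trailing-slash form mentioned in the EDIT. This sketch compares it with a variant that makes the slash optional via \/?. The variant is a hypothetical adjustment, not part of the answer, and Java stands in for your regex engine.

```java
import java.util.regex.Pattern;

public class ThanksRegex {
    // The answer's pattern: "/thanks" must be the very end of the string.
    static final Pattern ENDS_WITH_THANKS = Pattern.compile(".+(?=/thanks$).+");
    // Hypothetical variant: also accepts a single trailing slash.
    static final Pattern ENDS_WITH_THANKS_OPT_SLASH = Pattern.compile(".+(?=/thanks/?$).+");

    public static void main(String[] args) {
        String[] urls = {
            "www.company.com/orders/thanks",
            "www.company.com/orders/thanks/",
            "www.company.com/orders/thanks/1sfasd523425",
        };
        for (String u : urls) {
            System.out.println(u + "  original: " + ENDS_WITH_THANKS.matcher(u).matches()
                    + "  optional slash: " + ENDS_WITH_THANKS_OPT_SLASH.matcher(u).matches());
        }
    }
}
```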
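Use a URL object; don't parse the URL yourself:
import java.net.URL;
import java.net.URLDecoder;
// Each call below can throw an IOException subclass; declare or catch it.
URL url = new URL("http://stackoverflow.com/questions/36616915/how-to-regex-last-part-of-url-only");
String path = url.getPath(); // "/questions/36616915/how-to-regex-last-part-of-url-only"
String decodedPath = URLDecoder.decode(path, "UTF-8"); // path with %XX escapes decoded
int port = url.getPort(); // -1 when the URL has no explicit port
Object content = url.getContent(); // opens a connection and fetches the resource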
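The same idea applies to the /thanks check: test the parsed path instead of the whole string. This minimal sketch uses java.net.URI, which parses without opening a connection. It assumes the input is an absolute URL that includes its scheme, and the class and method names are hypothetical.

```java
import java.net.URI;

public class ThanksPageCheck {
    // Hypothetical helper: true when the URL's path ends with /thanks or /thanks/.
    // URI.create throws IllegalArgumentException on syntactically invalid input.
    static boolean isThanksPage(String url) {
        String path = URI.create(url).getPath(); // null for opaque URIs such as mailto:
        return path != null && (path.endsWith("/thanks") || path.endsWith("/thanks/"));
    }

    public static void main(String[] args) {
        System.out.println(isThanksPage("https://www.company.com/orders/thanks"));              // true
        System.out.println(isThanksPage("https://www.company.com/orders/thanks/"));             // true
        System.out.println(isThanksPage("https://www.company.com/orders/thanks/1sfasd523425")); // false
    }
}
```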

Crawler4j Regex Pattern for url

I'm using crawler4j and I want to make some patterns for URLs, but I couldn't work out a regex for this URL:
http://www.site.com/liste/product_name_changable/productDetails.aspx?productId={id}&categoryId={category_id}
I tried this:
liste\/*\/productDetails:aspx?productId=*&category_id=*
and
private final static Pattern FILTERS = Pattern.compile("^/liste/*/productDetails.aspx?productId=*$");
but it's not working.
How can I turn this into a regex pattern?
You have several errors in your regex. All of the asterisks should be .+, to indicate that you want to match one or more characters. The question mark needs to be escaped. category_id should be categoryId. productDetails:aspx should be productDetails\.aspx, with the dot escaped. With all of these fixes, the regex looks like this:
liste\/.+\/productDetails\.aspx\?productId=.+&categoryId=.+
Also, you shouldn't put ^ or $ at the start and end of the regex. Those match the start and end of the input, so they won't work if you're trying to match only a portion of the URL, which you are.
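In Java it also matters how the pattern is applied. Matcher.find() looks for the pattern anywhere in the input, while Matcher.matches() requires the whole input to match. The following sketch shows the difference with the corrected regex. The id values in the URL are made up for illustration, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

public class CrawlerUrlPattern {
    // The corrected regex from the answer (no ^ or $ anchors).
    static final Pattern PRODUCT_DETAILS =
            Pattern.compile("liste\\/.+\\/productDetails\\.aspx\\?productId=.+&categoryId=.+");

    public static void main(String[] args) {
        String url = "http://www.site.com/liste/some-product/productDetails.aspx?productId=42&categoryId=7";
        System.out.println(PRODUCT_DETAILS.matcher(url).find());    // true: found inside the URL
        System.out.println(PRODUCT_DETAILS.matcher(url).matches()); // false: matches() needs the whole string
    }
}
```

If your crawler code calls matches(), prefix the pattern with .* so that it can cover the whole URL.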

Backbone.js route using regex - Matching a URL that does not end with a given string

I have to create a route using a regex that matches a URL which does not end with a particular word, say 'submit'. For example:
/login/submit ==> does not match
/login/abcsubmit ==> does not match
/abc/xyx ==> matches
Use this regex:
((?!(.*?)/\w*submit).*)
as explained in http://backbonejs.org/#Router-route:
this.route(/^((?!(.*?)\/\w*submit).*)$/, "functionName");
I tried the regex @Nestenius provided, and it still matched the first two example URLs you gave. That is because the regex was not anchored to the start of the string.
You can still use his regex if you add a ^ anchor at the beginning, like so:
^((?!(.*?)/\w*submit).*)
Or you can use this shorter version:
^(?!.*submit).*
Both match any string that does not contain "submit". Note that they are stricter than "does not end with submit": both also reject a URL such as /submit/abc.
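To compare "contains submit" with "ends with submit", this Java sketch runs the shorter regex next to a hypothetical variant, ^(?!.*submit$).*, which moves the $ inside the lookahead. For these patterns, Java's lookahead syntax is the same as JavaScript's, so the same strings should work in a Backbone route literal. That is an assumption worth testing in your setup.

```java
import java.util.regex.Pattern;

public class SubmitRoute {
    // Shorter answer: reject any URL containing "submit" anywhere.
    static final Pattern NO_SUBMIT_ANYWHERE = Pattern.compile("^(?!.*submit).*");
    // Hypothetical variant: reject only URLs that END with "submit".
    static final Pattern NOT_ENDING_IN_SUBMIT = Pattern.compile("^(?!.*submit$).*");

    public static void main(String[] args) {
        String[] urls = {"/login/submit", "/login/abcsubmit", "/abc/xyx", "/submit/abc"};
        for (String u : urls) {
            System.out.println(u + "  anywhere: " + NO_SUBMIT_ANYWHERE.matcher(u).find()
                    + "  at end: " + NOT_ENDING_IN_SUBMIT.matcher(u).find());
        }
    }
}
```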