Extract last part of url without query string or jsessionid

I want a regex that always returns the last part of a URL, before the query string parameters and without the jsessionid if present.
Here are some example URLs:
http://www.somesite.com/some/path/test.action;jsessionid=000063vCmvJAn7VWyymA_dPsHZs:16u9pglit?sort=2&param1=1&param2=2
http://www.somesite.com/some/path/test;jsessionid=000063vCmvJAn7VWyymA_dPsHZs:16u9pglit?sort=2&param1=1&param2=2
http://www.somesite.com/some/path/test.action?sort=2&param1=1&param2=2
http://www.somesite.com/some/path/test?sort=2&param1=1&param2=2
Here's my regex so far:
.*http://.*/some/path.*/(.*);?.*\?.*
It works for the URLs that do not contain a jsessionid, but returns test;jsessionid=... if one is present.
To test: http://regex101.com/r/fM0mE2

I would use this regex:
.*http:\/\/.*\/some\/path.*\/([^;\?]+);?.*\?.*
The change is the capture group ([^;\?]+), which matches one or more characters that are neither ; nor ?. I think it can be shortened to:
.*http:\/\/.*\/some\/path.*\/([^;\?]+)
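As a quick check, the shortened pattern above can be run against the four sample URLs from the question. This is a minimal Python sketch. The name last_segment is only illustrative, and the backslashes before / are dropped because Python does not use / as a regex delimiter.

```python
import re

# Shortened pattern from the answer; group 1 is the last path segment.
PATTERN = re.compile(r".*http://.*/some/path.*/([^;?]+)")

def last_segment(url):
    """Return the last path segment, or None if the URL does not match."""
    m = PATTERN.match(url)
    return m.group(1) if m else None

urls = [
    "http://www.somesite.com/some/path/test.action;jsessionid=000063vCmvJAn7VWyymA_dPsHZs:16u9pglit?sort=2&param1=1&param2=2",
    "http://www.somesite.com/some/path/test;jsessionid=000063vCmvJAn7VWyymA_dPsHZs:16u9pglit?sort=2&param1=1&param2=2",
    "http://www.somesite.com/some/path/test.action?sort=2&param1=1&param2=2",
    "http://www.somesite.com/some/path/test?sort=2&param1=1&param2=2",
]

for u in urls:
    print(last_segment(u))
```

This prints test.action, test, test.action and test, one per line.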

Related

How to replace part of a URL with regex

I need to remove part of a URL with a regex: everything from http or https up to and including .com.
This can occur several times in one string.
Can anyone help me with this?
For example a string:
"The request is:https://stackoverflow.com/questions"
After the removal - "The request is:/questions"

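The regex that performed the deletion perfectly is @"\w+://[^/$]*", replaced with "".
Something like this:
using System.Text.RegularExpressions;
var regex = new Regex(@"\w+://[^/$]*");
// Replace returns a new string; url itself is not modified.
string result = regex.Replace(url, "");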

In Python, you can use the re.sub() function from the re module. Alternatively, you can use urllib.parse (the urlparse module in Python 2) to extract the different parts of the URL and concatenate the ones you need onto the prefix you want.
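Both suggestions can be sketched in Python as follows. The pattern is the one from the C# answer above. The second sample string with two URLs is made up for illustration.

```python
import re
from urllib.parse import urlsplit

# A scheme, "://", then everything up to the next "/" or "$".
SCHEME_AND_HOST = re.compile(r"\w+://[^/$]*")

text = "The request is:https://stackoverflow.com/questions"
stripped = SCHEME_AND_HOST.sub("", text)
print(stripped)  # The request is:/questions

# re.sub replaces every occurrence, so several URLs in one string are handled.
both = SCHEME_AND_HOST.sub("", "a https://a.com/x and http://b.com/y")
print(both)  # a /x and /y

# When the string is only a URL, urlsplit gives the parts directly.
path = urlsplit("https://stackoverflow.com/questions").path
print(path)  # /questions
```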

Pass query value to new URL - regex

I am trying to set up some server redirects.
I have an old URL: /product-category/pillows/?pa_position=back-sleeper
The new URL is: /product-category/pillows/?_position=back-sleeper
The ?_position parameter is new, but the values remain the same. Is there an appropriate regex to pass the original parameter value to the new URL?

As Barmar suggested, you could simply replace pa_position with _position. The following sed command would do the job:
sed 's/pa_position/_position/'
If you want to capture the last part of the URL you can use the following regex:
\/product-category\/pillows\/\?pa_position=\(.*\)
The string 'back-sleeper' in this case will then be accessible as the first matched group (\1) of this regex.
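The capture-group variant could look like this in Python. It is a sketch only. The name new_url is hypothetical, and Python writes groups with unescaped parentheses, unlike the sed form above.

```python
import re

# Group 1 captures the parameter value, e.g. "back-sleeper".
OLD_URL = re.compile(r"^/product-category/pillows/\?pa_position=(.*)$")

def new_url(old):
    """Rewrite the old URL to the new parameter name, keeping the value."""
    return OLD_URL.sub(r"/product-category/pillows/?_position=\1", old)

print(new_url("/product-category/pillows/?pa_position=back-sleeper"))
# /product-category/pillows/?_position=back-sleeper
```

URLs that do not match the old form are returned unchanged.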

Regex PCRE capture multiple occurrences query string in URL

I am trying to capture multiple occurrences of utm tags in a URL and append them when rewriting the URL. However, I only want the utm key/value pairs and want to skip the others.
This is a sample URL
https://example.com/dl/?screen=page&title=SABC&page_id=4063&myvalue=Noidea&utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19&utm_term=termTest19&test=value&utm_content=contentTest19
I tried this:
(\?.*)(page_id=([^&]*))(\?|&)(.*[&?]utm_[a-z]+=([^&]+).*)
and unfortunately, it doesn't produce the result I expect.
I need to capture both the page ID and the utm tags, but not test=value or myvalue=Noidea; I only want the query parameters that are utm tags.
Expected Result is the URL below:
https://example.com/dl/page_id/4063?utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19&utm_term=termTest19&utm_content=contentTest19
one group with pageid=<somenumber/text>
one group with all utm tags with key and value
Help will be appreciated.

You can use a regex like this to capture each wanted key and its value as groups:
(page_id|utm_[a-z]+)=([A-Za-z0-9]+)
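One way to act on the captured pairs is to collect them all and assemble the expected URL from the question. The Python sketch below is an adaptation, not the answer's exact pattern: it anchors each key on a preceding ? or & and accepts any value up to the next &. The name rewrite_url is hypothetical, and it assumes the path before the ? ends with a slash, as in the sample.

```python
import re

# Key in group 1, value in group 2; only page_id and utm_* keys are matched.
PAIR = re.compile(r"[?&](page_id|utm_[a-z]+)=([^&]*)")

def rewrite_url(url):
    """Move page_id into the path and keep only the utm_* parameters."""
    base = url.split("?", 1)[0]
    pairs = PAIR.findall(url)
    page_id = next((v for k, v in pairs if k == "page_id"), None)
    if page_id is None:
        return url  # nothing to rewrite
    utm = "&".join(f"{k}={v}" for k, v in pairs if k.startswith("utm_"))
    return f"{base}page_id/{page_id}?{utm}"

sample = (
    "https://example.com/dl/?screen=page&title=SABC&page_id=4063&myvalue=Noidea"
    "&utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19"
    "&utm_term=termTest19&test=value&utm_content=contentTest19"
)
print(rewrite_url(sample))
```

For the sample URL this prints the expected result given in the question.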

You can instead replace any parameter that does not match the desired ones with the empty string. The pattern for this is
(?:[?&](?!(?:page_id|utm_[^=&]++)=)[^&]*+)++$|(?<=[?&])(?!(?:page_id|utm_[^=&]++)=)[^&]*+(?:&|$)
Here's a working proof: https://regex101.com/r/L5xcl4/2 It has an extra \s only so it works on the multiline input in the tester, but you shouldn't need it as you'll be working on a string that contains only a URL without whitespace.
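The removal approach can be tried in Python as well. The sketch below uses ordinary greedy quantifiers instead of the possessive ++ and *+, since Python's re module only accepts possessive quantifiers from version 3.11 on. For this input the result is the same. The names KEEP and DROP are illustrative.

```python
import re

# Keys that must survive: page_id or anything starting with utm_.
KEEP = r"(?:page_id|utm_[^=&]+)="

DROP = re.compile(
    # A run of unwanted parameters at the very end, separators included ...
    rf"(?:[?&](?!{KEEP})[^&]*)+$"
    # ... or one unwanted parameter elsewhere, with its trailing "&".
    rf"|(?<=[?&])(?!{KEEP})[^&]*(?:&|$)"
)

sample = (
    "https://example.com/dl/?screen=page&title=SABC&page_id=4063&myvalue=Noidea"
    "&utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19"
    "&utm_term=termTest19&test=value&utm_content=contentTest19"
)
filtered = DROP.sub("", sample)
print(filtered)
```

This only filters the query string: page_id stays a query parameter, so moving it into the path is a separate step.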

How to regex last part of URL only?

I want a regex that targets the end of a URL:
www.company.com/orders/thanks
If you arrive from an email or the account page, the order ID is appended to the end:
www.company.com/orders/thanks/1sfasd523425
So I only want to target the URL that ends with /thanks.
This thread is similar: How do I get the last segment of URL using regular expressions
I had something similar, .*\/thanks\/.+, but it targets the wrong URLs.
EDIT: Only target URLs ending with /thanks or /thanks/

Try a lookahead like this.
Regex: .+(?=\/thanks$).+
Explanation: This matches the URL only if /thanks is at the end of the string, using a positive lookahead. To also accept a trailing slash, as in the EDIT, use .+(?=\/thanks\/?$).+
Regex101 Demo
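For comparison, here is a Python sketch that handles both cases from the EDIT (/thanks and /thanks/) with a plain end anchor instead of a lookahead. The helper name is_thanks_page is hypothetical.

```python
import re

# "/thanks", optionally followed by one "/", at the end of the string.
ENDS_WITH_THANKS = re.compile(r"/thanks/?$")

def is_thanks_page(url):
    """True if the URL ends with /thanks or /thanks/."""
    return ENDS_WITH_THANKS.search(url) is not None

print(is_thanks_page("www.company.com/orders/thanks"))               # True
print(is_thanks_page("www.company.com/orders/thanks/"))              # True
print(is_thanks_page("www.company.com/orders/thanks/1sfasd523425"))  # False

# The lookahead pattern from the answer accepts /thanks but not /thanks/.
LOOKAHEAD = re.compile(r".+(?=/thanks$).+")
print(LOOKAHEAD.fullmatch("www.company.com/orders/thanks") is not None)   # True
print(LOOKAHEAD.fullmatch("www.company.com/orders/thanks/") is not None)  # False
```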

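Use a URL object; don't parse it yourself:
import java.net.URL;
import java.net.URLDecoder;
URL url = new URL("http://stackoverflow.com/questions/36616915/how-to-regex-last-part-of-url-only");
String path = URLDecoder.decode(url.getPath(), "UTF-8"); // percent-decoded path
url.getPath();    // raw path: /questions/36616915/how-to-regex-last-part-of-url-only
url.getPort();    // -1 here, because the URL names no explicit port
url.getContent(); // opens a connection and fetches the resource; may throw IOException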

RegEx to cut out URL

I am trying to extract a URL from a string of the following format:
RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH
I already tried a few things, especially lookbehind and lookahead, which I had used successfully on another URL format (starting with https and ending with .html).
But I can't figure out the regex for the kind of string shown above. I just want the URL part, from https to the end of the random last name. Is this even possible?
Any ideas?

If you can guarantee that randomfirstname_randomlastname is all lowercase and RANDOMRUBBISH is all uppercase, you can use the character classes [a-z] and [A-Z]. The underscore between the two names has to be allowed as well, hence [a-z_] below. The language the regex is for determines how to use these.
This example works in JavaScript:
var str = "RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH";
// match[0] holds the URL; [a-z_] stops at the first uppercase letter.
var match = /https:\/\/www\.my-url\.com\/[a-z_]*/.exec(str);
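The same idea in Python, as a minimal sketch. It assumes, as the answer does, that the name part is lowercase and the trailing rubbish is uppercase. The class [a-z_] includes the underscore, because the name part contains one and a plain [a-z] would stop at it.

```python
import re

text = "RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH"

# Lowercase letters and underscores after the host; stops at the first uppercase letter.
match = re.search(r"https://www\.my-url\.com/[a-z_]+", text)
url = match.group(0) if match else None
print(url)  # https://www.my-url.com/randomfirstname_randomlastname
```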

We Keep Coding
