Match part of a URL with regex

I have a challenge with a regex match against a URL, and I hope some of you clever heads can help :-)
Please take a look at this test case: https://www.regex101.com/r/bH4hE1/2
I use the regex: (\w+)(.\w+)+(?!.*(\w+)(.\w+)+)
The problem is that it only finds reports.html, but I also need to find reports in the first URL:
https://my.website.com/reports?ref_=kdp_BS
https://my.website.com/reports.html

To capture "reports" or "reports.html" in any path, begin your match after the last /, and capture word characters and .:
/.*\/([.\w+]+)/
See: https://www.regex101.com/r/iZ7dF3/8
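For illustration, a minimal Python sketch of this pattern, assuming each URL is tested as a separate string. The helper name last_segment is made up for this example. The greedy .* backtracks to the last / that is followed by word characters or dots, so the capture stops at the ?.

```python
import re

# Pattern from the answer above, without the /.../ delimiters.
# Note: inside [...] the + is a literal plus sign, not a quantifier.
LAST_SEGMENT = re.compile(r".*/([.\w+]+)")

def last_segment(url):
    # Hypothetical helper: returns the captured group, or None if nothing matches.
    m = LAST_SEGMENT.match(url)
    return m.group(1) if m else None

print(last_segment("https://my.website.com/reports?ref_=kdp_BS"))  # reports
print(last_segment("https://my.website.com/reports.html"))         # reports.html
```

One caveat: if the query string itself contains a /, the capture would start after that slash instead.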

Try:
/([^\/?]+)(?:\?.+)?$/gim
It works and selects:
reports
reports.html
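A quick way to check this one in Python, as a sketch only: the g/i/m flags are dropped here on the assumption that every URL is passed in as its own string, and path_tail is a made-up helper name.

```python
import re

# Last run of characters that are neither "/" nor "?",
# optionally followed by a query string, anchored at the end.
TAIL = re.compile(r"([^/?]+)(?:\?.+)?$")

def path_tail(url):
    # Hypothetical helper: returns group 1, or None when there is no match
    # (for example when the URL ends with a slash).
    m = TAIL.search(url)
    return m.group(1) if m else None

print(path_tail("https://my.website.com/reports?ref_=kdp_BS"))  # reports
print(path_tail("https://my.website.com/reports.html"))         # reports.html
```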

Related

Multiple slash in URL replacement through regex

I am trying to create a PCRE regex that sanitizes URLs containing multiple consecutive slashes, like the following:
https://www.domin.com/test1/////test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142/////
https://www.domin.com/test1/test2///somemoretests_67142
I want to replace each of them with https://\2\4 so that the resulting link looks like: https://www.domin.com/test1/test2/somemoretests_67142
I have been struggling with it for the past couple of days, so any regex guru help is more than welcome :)
I have tried the following and more:
(http|https):\/\/(.*)(\/\/+)(.*)
(http|https):\/\/(.*)(\/\/){2,}(.*)
(http|https):\/\/(.*)(\/\/{2})(.*)
I am going to use these in Akamai to sanitize our URLs through a Cloudlet.
You can try:
(?<!https:\/)(?<!http:\/)(\/+$|(?<=\/)\/+)
Then substitute each match (the first group) with an empty string.
Regex demo.
This will produce this output:
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
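The substitution can be tried out with a short Python sketch. This assumes Python's re module rather than the PCRE engine the question targets; both lookbehinds are fixed-width, so the pattern is accepted as is. The helper name collapse_slashes is made up for this example.

```python
import re

# Skip the second slash of "https://" or "http://", then match either
# trailing slashes or any slashes that directly follow another slash.
DUP_SLASHES = re.compile(r"(?<!https:/)(?<!http:/)(/+$|(?<=/)/+)")

def collapse_slashes(url):
    # Hypothetical helper: removes every match of the pattern.
    return DUP_SLASHES.sub("", url)

urls = [
    "https://www.domin.com/test1/////test2/somemoretests_67142",
    "https://www.domin.com/test1/test2/somemoretests_67142/////",
    "https://www.domin.com/test1/test2///somemoretests_67142",
]
for u in urls:
    print(collapse_slashes(u))
```

Note that a single trailing slash is also removed, because the first alternative matches one or more slashes at the end of the string.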

Regex for Affiliate URL

For Matomo outgoing link tracking I need a regex pattern that matches the following URLs:
https://www.example.com/product/?sku=12345&utm_source=123456789
and
https://www.example.com/product/?utm_source=123456789
"https://www.example.com/" and "utm_source=123456789" are always fixed in the URL; only the "product/" or "category/product/" part changes and must be covered by the regex pattern.
Thanks
Maybe this example can help you reach your goal:
(?<=https:\/\/www\.example\.com\/).+(?=utm_source=123456789)
It uses a lookbehind and a lookahead to match any characters between these two fixed strings:
https://www.example.com/
utm_source=123456789
Given the examples:
https://www.example.com/product/?sku=12345&utm_source=123456789
https://www.example.com/product/?utm_source=123456789
Your matches would be:
product/?sku=12345&
product/?
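Here is a small Python sketch of the same idea. The function names are made up for this example, and the second pattern is only an assumption about what a "matches the whole URL" variant could look like; it is not taken from Matomo's documentation.

```python
import re

# Lookbehind/lookahead version from the answer: matches only the variable part.
BETWEEN = re.compile(r"(?<=https://www\.example\.com/).+(?=utm_source=123456789)")

# Assumed variant that matches the complete URL instead.
FULL_URL = re.compile(r"https://www\.example\.com/.+utm_source=123456789")

def variable_part(url):
    m = BETWEEN.search(url)
    return m.group(0) if m else None

def is_affiliate_url(url):
    return FULL_URL.fullmatch(url) is not None

print(variable_part("https://www.example.com/product/?sku=12345&utm_source=123456789"))
print(variable_part("https://www.example.com/product/?utm_source=123456789"))
print(is_affiliate_url("https://www.example.com/category/product/?utm_source=123456789"))
```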

RegEx expression to find only exact match with Capitalization variants

I want to filter landing pages in Analytics down to traffic from “/ph” and “/pH” only, without including other pages such as /ph-electrode-maintenance-calibration-guide. (ph|pH) didn't work. Any help would be greatly appreciated.
Specify Start of String ^ and End of String $ Anchors:
^\/p[hH]$
http://www.regular-expressions.info/anchors.html
Try (\/p(h|H)(\/?$|\/)). It will match all of your paths, e.g.:
path/pH
path/ph
path/pH/anything
path/ph/anything
but not:
path/ph-electrode-maintenance-calibration-guide
Add a ^ to the beginning if your path always begins with /ph:
^(\/p(h|H)(\/?$|\/))
Try RegExr to test it out
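To compare the two suggestions side by side, here is a rough Python sketch. It assumes the value being filtered is the page path starting with /, and the function names are invented for this example.

```python
import re

# Both patterns come from the answers above, minus the escaped slashes.
EXACT = re.compile(r"^/p[hH]$")                    # only /ph or /pH
WITH_SUBPATHS = re.compile(r"^(/p(h|H)(/?$|/))")   # also /ph/ and /ph/...

def is_exact(path):
    return EXACT.search(path) is not None

def is_ph_section(path):
    return WITH_SUBPATHS.search(path) is not None

for p in ["/ph", "/pH", "/ph/", "/pH/anything",
          "/ph-electrode-maintenance-calibration-guide"]:
    print(p, is_exact(p), is_ph_section(p))
```

The anchored character-class version accepts only the two exact paths, while the second pattern also accepts a trailing slash and anything below /ph/. Neither matches the calibration guide page.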

Regex: Get subtext from a string

I have a list of text lines. Each line contains a title and a URL as follows:
product-title-7134 http://domain.com/page-1
another-product-title-822 http://domain.com/page-218
etc.
Using only .NET regex, please help me extract the url from each line.
I understand it can be done by looking at the string from the end until the http is met and output that part but I don't know the exact regex formula for that. Any help is much appreciated.
I would do that with this regex:
http://(\S+)
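Then take the first group of every match. Note that this group holds the URL without the http:// prefix; use the whole match if you need the full URL.
This regex will match all https:// and http:// links: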
(http|https)(://\S+)
You can test this in the .NET regex tester: http://regexstorm.net/tester
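The question asks for .NET, but the pattern itself is not engine-specific, so here is a quick Python sketch for checking it against the sample lines. It uses the shorter form https?://\S+ and takes the whole match, so the scheme is kept. The helper name extract_url is made up for this example.

```python
import re

# "http" with an optional "s", then "://", then everything up to the next whitespace.
URL = re.compile(r"https?://\S+")

def extract_url(line):
    # Hypothetical helper: returns the first URL on the line, or None.
    m = URL.search(line)
    return m.group(0) if m else None

lines = [
    "product-title-7134 http://domain.com/page-1",
    "another-product-title-822 http://domain.com/page-218",
]
for line in lines:
    print(extract_url(line))
```

In .NET the same pattern string should work with Regex.Match(line, pattern).Value.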

Write a regex for url match

I'm trying to write a regex for WordPress pretty permalinks.
I have the following URLs and need two matches:
1st: the last word between / and / before get/
2nd: the string that starts with get/
The URLs may look like these:
http://localhost/akasia/yacht-technical-services/yacht-crew/get/gulets/for/sale/
Here I need "yacht-crew" and "get/gulets/for/sale/"
http://localhost/akasia/testimonials/get/motoryachts/for/sale/
here I need "testimonials" and get/motoryachts/for/sale/
http://localhost/akasia/may/be/lots/of/seperator/but/ineed/last/get/ships/for/rent/
here I need "last" and get/ships/for/rent/
I catch the 2nd part with
(.(get/(.)?))
but I have had no luck with the first part.
I would appreciate it if someone could help.
Regards
Deniz
I suggest the following:
([^\/]+?)\/(get\/.+)
https://regex101.com/r/uN6yH3/1
The idea is to lazily match non-slash characters up to the first slash that is followed by "get/" and capture them as the first group, then capture the rest, starting at "get/", as the second group.
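To see the two groups on the sample URLs, here is a minimal Python sketch. The question is about WordPress/PHP, so this is only a way to test the pattern, and split_permalink is a made-up helper name.

```python
import re

# Group 1: the path segment directly before "/get/".
# Group 2: everything from "get/" to the end.
PERMALINK = re.compile(r"([^/]+?)/(get/.+)")

def split_permalink(url):
    # Hypothetical helper: returns (segment, get_part) or None.
    m = PERMALINK.search(url)
    return m.groups() if m else None

urls = [
    "http://localhost/akasia/yacht-technical-services/yacht-crew/get/gulets/for/sale/",
    "http://localhost/akasia/testimonials/get/motoryachts/for/sale/",
    "http://localhost/akasia/may/be/lots/of/seperator/but/ineed/last/get/ships/for/rent/",
]
for u in urls:
    print(split_permalink(u))
```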
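I am assuming PHP. Without a regex, you can split the path at the first "/get/":
$path = parse_url($url, PHP_URL_PATH);
$pos = strpos($path, '/get/');                        // false if "/get/" is absent, so check before use
$head = substr($path, 0, $pos);                       // e.g. "/akasia/testimonials"
$matches[] = substr($head, strrpos($head, '/') + 1);  // e.g. "testimonials"
$matches[] = substr($path, $pos + 1);                 // e.g. "get/motoryachts/for/sale/"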