URL matching regex - regex

I need some help with URL matching regex. I read the regex syntax documentation but it's so complex.
I'm trying to create a URL list for a checkout funnel, how would I set up regex for the following?
https://shop.mysite.ca/[unique ID]/checkouts/[unique ID 2]
OR
https://shop.mysite.ca/[unique ID]/checkouts/[unique ID 2]?step=contact_information
This is what I have so far, though I'm not sure how to add the optional parameter "step=contact_information":
/^(https:\/\/shop.mysite.ca\/)([\da-z]+)(\/checkouts\/)([\da-z]+)$/

You can use a "?" to make a group either appear 0 or 1 times, making it optional.
/^(https:\/\/shop\.mysite\.ca\/)([\da-z]+)(\/checkouts\/)([\da-z]+)(\?step=contact_information)?$/
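To check the optional group quickly, here is a minimal Python sketch of the same idea. The IDs in the sample URLs are made up, and the name CHECKOUT_RE is only illustrative; the dots in the host name are escaped so they match a literal "." only.

```python
import re

# Host dots are escaped; the query string sits in an optional
# non-capturing group, so it may appear zero or one time.
CHECKOUT_RE = re.compile(
    r"^https://shop\.mysite\.ca/([\da-z]+)/checkouts/([\da-z]+)"
    r"(?:\?step=contact_information)?$"
)

# Made-up IDs, for illustration only.
urls = [
    "https://shop.mysite.ca/12ab/checkouts/9f8e7d",
    "https://shop.mysite.ca/12ab/checkouts/9f8e7d?step=contact_information",
    "https://shop.mysite.ca/12ab/checkouts/9f8e7d?step=payment",
]

results = [bool(CHECKOUT_RE.match(u)) for u in urls]
print(results)  # [True, True, False]
```

The third URL is rejected because only the exact "?step=contact_information" suffix is allowed before the end anchor.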

this should work
^(https:\/\/shop.mysite.ca\/)([\da-z]+)(\/checkouts\/)([\da-z]+)((\?step=contact_information)*)$
Edit: I forgot the ? and used * instead. The other solution by #thomas is a bit better, I think.

Try it:
^(https:\/\/shop.mysite.ca\/)(\d+.)(\/checkouts)(\/)(\d+.)($|\?\w.+$)
If the unique ID contains non-numeric characters:
^(https:\/\/shop.mysite.ca\/)([\da-zA-Z]+.)(\/checkouts)(\/)([\da-zA-Z]+.)($|\?\w.+$)

Related

combine two URLs REGEX

I have data from two URLS that I need to combine using REGEX
/online-teaching
/online-teaching?fbclid
I have /(online-teaching)|(online teaching)
I can't figure out how to include the url with the ? and the one without.
Thanks!
How about something as simple as:
online-teaching(?:.+)?
This matches online-teaching and anything that follows, if it exists (you might need to constrain it to specific characters instead of matching everything with . to get a valid URL, but I'll leave that up to you).
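One possible way to constrain it, sketched in Python. The anchors and the character class [^\s#]* for the query string are my assumptions, not something the question specified, and the third path is a made-up counterexample.

```python
import re

# The path, optionally followed by "?" and a query string.
# [^\s#]* is an assumed constraint: no whitespace and no fragment marker.
PAGE_RE = re.compile(r"^/online-teaching(?:\?[^\s#]*)?$")

paths = [
    "/online-teaching",
    "/online-teaching?fbclid",
    "/online-teaching-jobs",  # hypothetical path that should not be combined
]

matches = [bool(PAGE_RE.match(p)) for p in paths]
print(matches)  # [True, True, False]
```

If the tool you use matches anywhere in the string rather than the whole string, the ^ and $ anchors are what keep longer paths out.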

RegEx remove part of string and replace another part

I have a challenge getting the desired result with RegEx (using C#) and I hope that the community can help.
I have a URL in the following format:
https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1
I want to make two modifications, specifically:
1) Remove everything after 'value' e.g. '&ida=0&idb=1'
2) Replace 'category' with e.g. 'newcategory'
So the result is:
https://somedomain.com/subfolder/newcategory/?abc=text:value
I can remove the string from 1) with e.g. ^[^&]+, but I have been unable to figure out how to replace the 'category' substring.
Any help or guidance would be much appreciated.
Thank you in advance.
Use the following:
Find: /(category/.+?value)&.+
Replace: /new$1 or /new\1 depending on your regex flavor
Update according to comment.
If the new name is completely_different_name, use the following:
Find: /category(/.+?value)&.+
Replace: /completely_different_name$1
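For reference, the same find/replace can be tried in Python with re.sub. This is only a sketch of the pattern above, not C# code; Python uses \1 for the backreference in the replacement string.

```python
import re

url = "https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"

# Group 1 keeps everything from the slash after "category" up to "value";
# "&.+" consumes the trailing parameters, so they are dropped.
new_url = re.sub(r"/category(/.+?value)&.+", r"/newcategory\1", url)
print(new_url)  # https://somedomain.com/subfolder/newcategory/?abc=text:value
```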
You haven't specified a language here; I mainly work in Python, so this solution is in Python (value holds the original URL string).
import re
url = re.sub('category', 'newcategory', re.search('^https.*value', value).group(0))
Explanation
re.sub(a, b, c) replaces every match of pattern a with b in string c.
re.search looks for the first match of a pattern in a string and returns a match object. In the code above, group(0) is the whole match, i.e. everything from "https" up to and including the last "value".
Using Python and only built-in string methods (there is no need for regular expressions here):
url = r"https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"
new_url = (url.split('value')[0] + "value").replace("category", 'newcategory')
print(new_url)
Outputs:
https://somedomain.com/subfolder/newcategory/?abc=text:value

Optional question mark - regexp

I have a problem creating the correct regular expression.
Here is what I have so far:
https://regex101.com/r/d0epRo/2
I need to add one more parameter to these links, so I have to determine whether a question mark is already present. The ? should therefore be optional, but I can't get it to work.
These attempts do not work: (\?|) (\?)? (\??).
These should be matched: http://www.polskieszlaki.pl and http://www.polskieszlaki.pl/wawel.htm, but they aren't.
I have no further ideas. Please help.
I think what you want is this regex:
a[\s]+href="[^mailto][\S]+polskieszlaki\.pl(?:(.*))?(?:(\?)(.*))?\"
This (?: ... ) means "do not capture"
If you are just trying to retrieve the query parameters try:
a[\s]+href="[^mailto][\S]+polskieszlaki\.pl(.*)(?:\?(?<param>.*))\"
You can then extract the param group
Or, in a simpler form without the named and non-capturing groups:
a[\s]+href="[^mailto][\S]+polskieszlaki\.pl(.*)(\?(.*))\"
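A variation on the same idea, sketched in Python. Note that [^mailto] is a character class that excludes one character, so this sketch swaps in a negative lookahead, (?!mailto:), which is my substitution rather than part of the answers above. The sample HTML, the szukaj.htm link, and the mailto address are made up for illustration.

```python
import re

# (?!mailto:) rejects hrefs that start with "mailto:".
# (?:\?([^"]*))? makes the "?" plus the query string optional as one unit.
LINK_RE = re.compile(
    r'a\s+href="((?!mailto:)[^"?]*polskieszlaki\.pl[^"?]*)(?:\?([^"]*))?"'
)

# Hypothetical sample markup.
html = (
    '<a href="http://www.polskieszlaki.pl">home</a> '
    '<a href="http://www.polskieszlaki.pl/wawel.htm">wawel</a> '
    '<a href="http://www.polskieszlaki.pl/szukaj.htm?q=wawel">search</a> '
    '<a href="mailto:kontakt@polskieszlaki.pl">mail</a>'
)

# Each result is (url_without_query, query); query is "" when absent.
links = LINK_RE.findall(html)
for base, query in links:
    print(base, "| has query:", bool(query))
```

With the base URL and the query captured separately, you can decide whether the extra parameter needs a "?" or a "&" in front of it.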

Regex for URL to sites

I have two URLs with the patterns:
1. http://localhost:9001/f/
2. http://localhost:9001/flight/
I have a site filter which redirects to the respective sites if the regex matches. I tried the following regex patterns for the 2 URLs above:
http?://localhost[^/]/f[^flight]/.*
http?://localhost[^/]/flight/.*
Both URLS are getting redirected to the first site, as both URLs are matched by the first regex.
I have also tried http?://localhost[^/]/[f]/.* for the 1st URL. I can't see what I am missing. I feel that this regex should not accept anything other than "f", but it allows "flight" as well.
Please help me by pointing out the mistake I have made.
Keep things simple:
.*/f(/[^/]*)?$
vs
.*/flight(/[^/]*)?$
The ? before $ makes the preceding group, a slash followed by an optional path segment, optional.
The first URL is matched by the following regex:
/^http:[\/]{2}localhost:9001\/f[^light]$/
The other URL does not match it and can be matched with the following regex:
/^http:[\/]{2}localhost:9001\/flight\/$/
Your regex has several issues: 1) p? means an optional p (so htt:// will match), 2) [^/] matches exactly one character, so it only matches the : in your URLs (and you have a port number after it), 3) [^light] is a negated character class that matches any single character that is not l, i, g, h, or t.
So, if you want to only capture localhost URLs, you'd better use this regex for the 1st site:
http://localhost[^/]*/f/.*
And this one for the second
http://localhost[^/]*/flight/.*
Please also bear in mind that, depending on where you use the regexps, your actual input may or may not include the protocol.
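A quick way to convince yourself, sketched in Python. pick_site is a hypothetical helper standing in for the site filter, and I am assuming the filter matches the whole URL.

```python
import re

F_SITE = re.compile(r"http://localhost[^/]*/f/.*")
FLIGHT_SITE = re.compile(r"http://localhost[^/]*/flight/.*")

def pick_site(url):
    """Hypothetical helper: return which site pattern matches the full URL."""
    if F_SITE.fullmatch(url):
        return "f"
    if FLIGHT_SITE.fullmatch(url):
        return "flight"
    return None

print(pick_site("http://localhost:9001/f/"))       # f
print(pick_site("http://localhost:9001/flight/"))  # flight

# A negated character class matches ONE character that is not in the set.
not_light = re.compile(r"[^light]")
print(bool(not_light.fullmatch("/")))       # True: "/" is not l, i, g, h or t
print(bool(not_light.fullmatch("l")))       # False: "l" is in the set
print(bool(not_light.fullmatch("flight")))  # False: six characters, not one
```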
These should work for you:
http[s]{0,1}:\/\/localhost:[0-9]{4}\/f\/
http[s]{0,1}:\/\/localhost:[0-9]{4}\/flight\/

RegExp replace after

I have some link templates and I need to replace substrings inside those links.
Link templates:
"/all_news"
"/all_news/"
"/all_news/page1"
"/all_news/page1/"
All of these templates mean the same thing - first page of news page without filtering.
So I need to:
1st template - insert "/pageX"
2nd template - insert "pageX"
3rd and 4th templates - replace page number
Is it possible with only one regexp?
If yes, then please help me.
If no, then I have 2nd question:
maybe it's possible to replace everything after "/all_news" with "/pageX"?
I mean the following logic:
string started
ok, I see substring "/all_news"
I replace everything after "/all_news", even if nothing follows it (i.e. the string ends with "/all_news")
I return "/all_news/pageX".
This'll do it.
'/all_news/page1'.replace(/(.*\/all_news).*/,'$1' + '/pageX');
Just one for all.
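The same single-regex approach, sketched in Python and run against all four templates. set_page is a hypothetical helper name, and page number 3 is arbitrary.

```python
import re

def set_page(link, page):
    """Replace everything after "/all_news" (possibly nothing) with "/page<N>"."""
    # \g<1> re-inserts the captured "/all_news" prefix.
    return re.sub(r"(/all_news).*", rf"\g<1>/page{page}", link, count=1)

templates = ["/all_news", "/all_news/", "/all_news/page1", "/all_news/page1/"]
updated = [set_page(t, 3) for t in templates]
print(updated)  # ['/all_news/page3', '/all_news/page3', '/all_news/page3', '/all_news/page3']
```

Because .* also matches an empty remainder, the bare "/all_news" template needs no special case.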
Java has lookbehind. It negates the need for the $1. The solution looks like:
String result = "/all_news/page1";
String pattern = "(?<=\\/all_news).*";
System.out.println(result.replaceAll(pattern,"/pageX"));
Cheers.