How to cut a URL down correctly with a regex? - regex

May I ask a question about regex? It would be great if you could help me solve an issue. I have tons of URLs and I need to find all the unique ones that contain the word promo.
For instance, I have a bunch of URLs like this:
/promo/vygoda-do-20-na-samsung?from=hb
/promo/antikrizisnaya-rasprodazha-skidki-do-50-mark164615151?from=hb
/promo/antikrizisnaya-rasprodazha-skidki-do-50-mark164615151
but I need to get them like this:
/promo/vygoda-do-20-na-samsung
/promo/antikrizisnaya-rasprodazha-skidki-do-50
/promo/antikrizisnaya-rasprodazha-skidki-do-50
All I have managed so far is
https://regex101.com/r/Ot8xzV/1
I have only just started learning regex and don't know much yet, so please help me with this. I'll be very grateful.

Use
(.*/promo/[^?]+?)(?:-mark\d+|\?).*
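Replace with $1 if you can do a replacement. If not, the capturing group alone may already give you what you need. The lazy [^?]+? stops at the first -mark followed by digits or at the ?, whichever comes first, and the trailing .* consumes the rest.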
See proof.
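For anyone who wants to try this outside regex101, here is a minimal Python sketch of the replace-and-deduplicate step. It assumes the URLs are already in a list; the names `trim_promo_url` and `unique_promo_urls` are made up for illustration and are not part of the answer.

```python
import re

# Pattern from the answer above: capture everything up to the optional
# "-mark<digits>" suffix or the "?" that starts the query string.
PROMO_RE = re.compile(r"(.*/promo/[^?]+?)(?:-mark\d+|\?).*")


def trim_promo_url(url):
    """Cut the URL down to group 1; URLs that do not match come back unchanged."""
    return PROMO_RE.sub(r"\1", url)


def unique_promo_urls(urls):
    """Keep only promo URLs, trim them, and drop duplicates in first-seen order."""
    trimmed = (trim_promo_url(u) for u in urls if "/promo/" in u)
    return list(dict.fromkeys(trimmed))


if __name__ == "__main__":
    sample = [
        "/promo/vygoda-do-20-na-samsung?from=hb",
        "/promo/antikrizisnaya-rasprodazha-skidki-do-50-mark164615151?from=hb",
        "/promo/antikrizisnaya-rasprodazha-skidki-do-50-mark164615151",
    ]
    for url in unique_promo_urls(sample):
        print(url)
```

With the three sample URLs from the question this leaves two unique entries, because the last two collapse to the same string once the -mark suffix and the query string are removed.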

Related

how to make regex

I was trying to solve a problem with a regex, but I find it very hard to write one. Let's look at an example; maybe you can help me out and point me to some good sources for learning regex. My problem is that I want to write a regex for strings such as www.facebook.com, www.goole.com, www.online.facebook.com and www.live.com. As you can see in these examples, the www and the com are always the same, but the text between them changes. I tried to build it through this link but couldn't.

Regex to find a web address

I'm trying to isolate links from HTML using a regex, and the one I found that is supposed to do it doesn't seem to work.
/^(http?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
Am I missing something? I'm using Brackets as my text editor
^(?:http|https):\/\/(?:[a-z0-9\-\.]+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+)|\?(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+))?$
Messy, but works.
Also, you might want to look at a similar question: Regex expression for valid website link
Hope this helps :)
It is hard to make it 100% accurate.
A URL could also be an IP address, for example.
http://ip/
It can contain query strings.
http://www.google.com/?a=1&b=2
It can contain spaces.
http://www.google.com/this is my url/
It depends on how accurate you need it to be.
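To see where the line falls in practice, here is a small Python sketch that runs the stricter pattern from the answer above against the three kinds of URL just listed. The pattern is written with `@` in its character classes, which is an assumption about what was intended; the helper name `looks_like_url` and the concrete IP address are also made up for illustration.

```python
import re

# Stricter pattern from the answer above (requires an http/https scheme).
# Assumption: the character classes are meant to contain "@".
URL_RE = re.compile(
    r"^(?:http|https):\/\/(?:[a-z0-9\-\.]+)(?::[0-9]+)?"
    r"(?:\/|\/(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+)|\?(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+))?$"
)


def looks_like_url(text):
    """True if the whole string matches the pattern above."""
    return URL_RE.match(text) is not None


if __name__ == "__main__":
    candidates = [
        "http://192.168.0.1/",                    # IP address as host
        "http://www.google.com/?a=1&b=2",         # query string
        "http://www.google.com/this is my url/",  # spaces in the path
    ]
    for candidate in candidates:
        print(candidate, looks_like_url(candidate))
```

The IP address and the query string are accepted, while the URL with spaces is rejected because a space is not in the path character class. Whether that is the right behaviour depends on the accuracy you are after.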

Regular expression to exclude local addresses

I'm trying to configure my Foxy Proxy program and one of the features is to provide a regular expression for an exclusion list.
I'm trying to blacklist the local sites (ending in .local), but it doesn't seem to work.
This is what I attempted:
^(?:https?://)?\d+\.(?!local)+/.*$
^(?:https?://)?\d+\.(?!local)(\d)+/.*$
I also researched on Google and Stack Exchange with no success.
Since you indicate in the comments that you actually need a whitelist solution, I went with that:
Try: ^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$
http://regex101.com/r/xV4gS0
Your regular expressions match host names that start with a series of digits followed by a period that is not followed by the string "local". If this is meant as a blacklist, that hardly seems like what you want.
If you're trying to match all hostnames which end in .local, you'd want something like the following for the hostname portion:
[^/]*\.local(?:/|$)
with appropriate escapes inserted depending on regex context.
If your original question was incorrect and you really need a whitelist, then you'd want something like:
^(?:(?!\.local)[^\/])*(?:\/|$)
as illustrated in http://regex101.com/r/yB0uY4
Thank you everyone for the help. Indeed, it turns out that for this program, listing "not .local" as a blacklist is not the same as listing "all .local" as a whitelist.
I also made a rookie mistake in my pattern: I meant "\w" instead of "\d". Thank you Peter Alfvin for catching that.
So my final working solution is what Bart suggested:
^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$ as a whitelist.
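As a quick sanity check of the final whitelist pattern, here is a minimal Python sketch. Python's `re` module is only a stand-in here; FoxyProxy's own matching may differ in details, and the helper name `is_whitelisted` and the sample host names are made up for illustration.

```python
import re

# Final whitelist pattern from the thread: any host whose last label
# does not start with "local", followed by a slash and the rest of the URL.
WHITELIST_RE = re.compile(r"^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$")


def is_whitelisted(url):
    """True if the URL matches the whitelist pattern, i.e. is not a .local host."""
    return WHITELIST_RE.match(url) is not None


if __name__ == "__main__":
    for url in (
        "http://example.com/",
        "example.com/some/path",
        "https://intranet.local/wiki",
        "http://my.site.local/x",
    ):
        print(url, is_whitelisted(url))
```

The first two URLs match and the two .local hosts do not. Note that the pattern as written requires a slash after the host name, so a bare http://example.com with no trailing slash would not match.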

RegEx match all website links except those containing admin

I'm setting up URL Rewrite on IIS and I need to match the following URLs using a regex.
http://sub.mysite.com
sub.mysite.com
sub.mysite.com/
sub.mysite.com/Site1
sub.mysite.com/Site1/admin
but not:
sub.mysite.com/admin
sub.mysite.com/admin/somethingelse
sub.mysite.com/admin/admin
The site itself (sub.mysite.com) should not be "hardcoded" in the expression. Instead, it should be matched by something like .*.
I'm drawing a blank on this one. I did find solutions that match the different URLs, but once I try to combine them, either none of them match or all of them do.
I hope someone can help me.
For your specific case, assuming you are matching the part after the domain (REQUEST_URI):
(?!/admin).*
(?!...) is a negative lookahead. I am not sure whether the IIS URL Rewrite engine supports it. If not, a better option would be to check for the complementary case:
Or, as @kirilloid said, just match /admin/? and discard it (pay attention to slashes).
BTW, if you want to quickly test regexes with visual feedback, I highly recommend http://gskinner.com/RegExr/
([A-Za-z0-9]+.)+.com(?!/admin)/?([A-Za-z0-9]+/?)*
this should do the trick
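To illustrate how the negative lookahead from the first answer behaves on the paths in the question, here is a minimal Python sketch. It assumes the host has already been stripped off and only the path is being tested, as that answer does; Python's `re` is only a stand-in for the IIS URL Rewrite engine, and the helper name `should_match` is made up.

```python
import re

# Lookahead from the first answer: accept any path that does not
# begin with "/admin".
NOT_ADMIN_RE = re.compile(r"(?!/admin).*")


def should_match(path):
    """True if the path (the part after the domain) does not start with /admin."""
    return NOT_ADMIN_RE.fullmatch(path) is not None


if __name__ == "__main__":
    for path in ("", "/", "/Site1", "/Site1/admin",
                 "/admin", "/admin/somethingelse", "/admin/admin"):
        print(repr(path), should_match(path))
```

The first four paths match and the three /admin paths do not. One thing to be aware of is that this lookahead also rejects paths such as /administrator; if that matters, a stricter variant like (?!/admin(?:/|$)) might be worth trying.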

What is the regex for a URL like this?

I don't really know regex, but I would like a quick solution to search and replace links. I want to use the Search Regex WordPress plugin to remove links in my post. How do I write a regex for a link like this:
http://website.com/index.php?id=934&title=item name
Edit: the number in the id and the item name vary.
Thank you in advance!
Try this one out: http://regexr.com?2vjq6
Depending on whether or not you need whitespace in your "title" parameter, the regex I provided may need to be altered. Best practice would be to not have whitespace in your URLs (use URL encoding instead, where a space = %20).
http://website\.com/index\.php\?id=[0-9]*&title=[a-zA-Z0-9\-]*
Try this pattern
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
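For a rough idea of how the id/title pattern suggested above behaves when used for removal, here is a minimal Python sketch. It is only an illustration: the plugin itself runs PHP regexes, the helper name `strip_links` is made up, and two details are assumptions on my part (the dots are escaped, and `%` is allowed in the title so that URL-encoded spaces such as %20 are covered).

```python
import re

# Variant of the id/title pattern above. Assumptions: escaped dots,
# at least one digit in the id, and "%" allowed in the title.
LINK_RE = re.compile(
    r"http://website\.com/index\.php\?id=[0-9]+&title=[a-zA-Z0-9%\-]*"
)


def strip_links(text):
    """Remove every matching link from the text."""
    return LINK_RE.sub("", text)


if __name__ == "__main__":
    print(strip_links(
        "See http://website.com/index.php?id=934&title=item%20name for details."
    ))
    # With a raw space in the title the match stops at "item".
    print(strip_links("http://website.com/index.php?id=934&title=item name"))
```

With a raw space in the title the match ends at the space, so the rest of the title is left behind in the text. That is the whitespace caveat mentioned in the first answer.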