Define Regular Expression to match ending of URL - regex

What would the regular expression look like to include/exclude a specific URL? I posted two URLs below. I need a regex that will distinguish between the two. The only difference between the two URLs is the ending: type vs hcat.
https://post.craigslist.org/k/WDEDan6W4xGILKcEW036_A/w7TH4?s=type
https://post.craigslist.org/k/WDEDan6W4xGILKcEW036_A/w7TH4?s=hcat

I hope I understood your question right.
But if you want to match exactly the given URLs, this should do:
"https://post\.craigslist\.org/k/WDEDan6W4xGILKcEW036_A/w7TH4\?s=(type|hcat)"
If the URL matches, capture group 1 contains either type or hcat; any other value of s makes the match fail.
If you only want to check the domain, and the URL should end with the parameter s set to type or hcat, use this:
"https://post\.craigslist\.org/.*?s=(type|hcat)"
Note: here the ? makes the * lazy (non-greedy); it is not the escaped \? from above. The literal ? in the URL is now simply consumed by .*?.
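A quick way to check both patterns is a small Python sketch. It assumes Python's `re` module and uses `fullmatch`, which anchors both ends of the string; that anchoring is an addition, not part of the answer's patterns. The helper name `url_ending` is hypothetical.

```python
import re
from typing import Optional

# Exact-URL pattern from the answer above.
EXACT = re.compile(
    r"https://post\.craigslist\.org/k/WDEDan6W4xGILKcEW036_A/w7TH4\?s=(type|hcat)"
)
# Domain-only pattern from the answer; ".*?" is the lazy quantifier.
LOOSE = re.compile(r"https://post\.craigslist\.org/.*?s=(type|hcat)")

TYPE_URL = "https://post.craigslist.org/k/WDEDan6W4xGILKcEW036_A/w7TH4?s=type"
HCAT_URL = "https://post.craigslist.org/k/WDEDan6W4xGILKcEW036_A/w7TH4?s=hcat"


def url_ending(pattern: "re.Pattern[str]", url: str) -> Optional[str]:
    """Return 'type' or 'hcat' (capture group 1) if the whole URL matches, else None."""
    # fullmatch anchors both ends, so nothing may follow type/hcat.
    m = pattern.fullmatch(url)
    return m.group(1) if m else None


print(url_ending(EXACT, TYPE_URL))  # type
print(url_ending(EXACT, HCAT_URL))  # hcat
print(url_ending(LOOSE, HCAT_URL))  # hcat
print(url_ending(EXACT, TYPE_URL.replace("type", "other")))  # None
```

To include only the type URL (and exclude hcat), you could test `url_ending(EXACT, url) == "type"`.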

Related

combine two URLs REGEX

I have data from two URLs that I need to combine using a regex:
/online-teaching
/online-teaching?fbclid
I have /(online-teaching)|(online teaching)
I can't figure out how to include both the URL with the ? and the one without.
Thanks!
How about something as simple as:
online-teaching(?:.+)?
This matches online-teaching and anything that follows it, if anything does (you might need to constrain it to specific characters instead of matching everything with . to keep it a valid URL, but I'll leave that up to you).
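The caveat about matching everything with . can be seen in a short Python sketch. `ANSWER` is the pattern from the answer; `STRICT` is a hypothetical tighter variant (not from the thread) that only allows an optional query string starting with ?. The path `/online-teaching-archive` is an invented example.

```python
import re

# Pattern from the answer: "online-teaching" followed by anything at all.
ANSWER = re.compile(r"online-teaching(?:.+)?")
# Hypothetical stricter variant: exactly /online-teaching,
# optionally followed by a query string that starts with "?".
STRICT = re.compile(r"/online-teaching(?:\?.*)?")

paths = ["/online-teaching", "/online-teaching?fbclid", "/online-teaching-archive"]
for p in paths:
    # search: match anywhere; fullmatch: the whole path must match
    print(p, bool(ANSWER.search(p)), bool(STRICT.fullmatch(p)))
```

The first two paths match both patterns; the invented third path is accepted by the answer's pattern but rejected by the stricter one.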

Regex for URL to sites

I have two URLs with the patterns:
1. http://localhost:9001/f/
2. http://localhost:9001/flight/
I have a site filter which redirects to the respective sites if the regex matches. I tried the following regex patterns for the 2 URLs above:
http?://localhost[^/]/f[^flight]/.*
http?://localhost[^/]/flight/.*
Both URLs are getting redirected to the first site, as both URLs are matched by the first regex.
I have also tried http?://localhost[^/]/[f]/.* for the 1st URL. I can't figure out what I am missing. I feel that this regex should not accept anything other than "f", but it is allowing "flight" as well.
Please help me by pointing out the mistake I have made.
Keep things simple:
.*/f(/[^/]*)?$
vs
.*/flight(/[^/]*)?$
The ? just before $ makes the whole group optional, i.e. the trailing slash and any path segment after it.
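Here is a minimal Python check of those two patterns (assuming Python's `re.match`, which anchors at the start; the leading `.*` absorbs the scheme and host). The `/f` and `/f/page` URLs are invented to show the optional group.

```python
import re

SHORT = re.compile(r".*/f(/[^/]*)?$")
FLIGHT = re.compile(r".*/flight(/[^/]*)?$")

urls = [
    "http://localhost:9001/f/",
    "http://localhost:9001/flight/",
    "http://localhost:9001/f",       # no trailing slash: group skipped
    "http://localhost:9001/f/page",  # one extra path segment
]
for u in urls:
    print(u, bool(SHORT.match(u)), bool(FLIGHT.match(u)))
```

Each URL is matched by exactly one of the two patterns: only `/flight/` goes to `FLIGHT`.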
The first one can be matched with the following regex:
/^http:[\/]{2}localhost:9001\/f[^light]$/
The second one is not matched by that regex and can be matched with this one:
/^http:[\/]{2}localhost:9001\/flight\/$/
Your regex has several issues: 1) p? means an optional p (so htt:// will match), 2) [^/] matches exactly one character that is not /, so in your URLs it only matches the : and leaves the port number unmatched, 3) [^flight] is a negated character class that matches a single character that is not f, l, i, g, h, or t; it does not exclude the word flight.
So, if you want to only capture localhost URLs, you'd better use this regex for the 1st site:
http://localhost[^/]*/f/.*
And this one for the second
http://localhost[^/]*/flight/.*
Please also bear in mind that, depending on where you use the regexes, your actual input may or may not include the protocol.
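The three issues and the corrected patterns can be reproduced with Python's `re` module. This is a sketch: the `route` function and the site names it returns are hypothetical stand-ins for the site filter in the question.

```python
import re

# The question's first pattern, kept as-is to show its problems.
QUESTION = re.compile(r"http?://localhost[^/]/f[^flight]/.*")
# The answer's corrected patterns.
SITE_F = re.compile(r"http://localhost[^/]*/f/.*")
SITE_FLIGHT = re.compile(r"http://localhost[^/]*/flight/.*")

# 1) "p?" makes the p optional, so "htt://" is accepted.
print(bool(re.match(r"http?://", "htt://")))  # True
# 2) "[^/]" consumes a single character (":"), so "9001" is left unmatched.
print(bool(QUESTION.match("http://localhost:9001/f/")))  # False
# 3) "[^flight]" means exactly one character that is not f, l, i, g, h or t.
print(bool(re.search(r"/f[^flight]/", "/fx/")))  # True
print(bool(re.search(r"/f[^flight]/", "/f/")))  # False


def route(url: str) -> str:
    """Pick a (hypothetical) target site using the corrected patterns."""
    if SITE_F.match(url):
        return "site-f"
    if SITE_FLIGHT.match(url):
        return "site-flight"
    return "no match"


print(route("http://localhost:9001/f/"))  # site-f
print(route("http://localhost:9001/flight/"))  # site-flight
```

Because `[^/]*` can absorb the whole `:9001`, the corrected patterns then require the literal `/f/` or `/flight/`, so neither URL can fall through to the wrong site.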
These should work for you:
http[s]{0,1}:\/\/localhost:[0-9]{4}\/f\/
http[s]{0,1}:\/\/localhost:[0-9]{4}\/flight\/

Regex match multiple data in URL for IIS Rewrite rule

I'm looking for some help with a regex pattern for rewriting a URL. My URL structure is:
http://domain.com/[username]/[token]/[userid]/
The data types are:
username = alphanumeric
token = alphanumeric
userid = numeric
An example with data:
http://domain.com/john1975/aBc123/123456789/
Using a regular expressions I'm trying to get a reference for each piece of data, so I can rewrite to:
index.asp?username={R:1}&token={R:2}&userid={R:3}
Also keep in mind the regex shouldn't be too greedy, so I can still access files such as:
http://domain.com/about.asp
http://domain.com/images/logo.png
The regex I've tried is:
^[0-9a-z]+/[0-9a-z]+/[0-9]+$
This doesn't match my example URL.
You're missing the trailing forward slash. The regex should be:
^([0-9a-z]+)/([0-9a-z]+)/([0-9]+)/$
I'm assuming you've set the case-insensitive flag. If not, you need:
^([0-9a-zA-Z]+)/([0-9a-zA-Z]+)/([0-9]+)/$
You also need the parentheses (capture groups) so that you can use back references. Your back references are also wrong: you want to match on 1, 2 and 3, not 0, which is the match of the whole expression. They should read:
index.asp?username={R:1}&token={R:2}&userid={R:3}
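The rule can be simulated in Python as a rough sketch. It assumes, as with IIS URL Rewrite rules, that the pattern is tested against the URL path without the leading slash, and it uses `re.IGNORECASE` in place of the rule's case-insensitive setting. The `rewrite` function is hypothetical and only mimics what IIS does with `{R:1}` to `{R:3}`.

```python
import re
from typing import Optional

# Pattern from the answer, matched case-insensitively.
RULE = re.compile(r"^([0-9a-z]+)/([0-9a-z]+)/([0-9]+)/$", re.IGNORECASE)


def rewrite(path: str) -> Optional[str]:
    """Return the rewritten URL, or None when the rule should not fire."""
    m = RULE.match(path)
    if m is None:
        return None
    # Groups 1-3 play the role of {R:1}-{R:3}; group 0 ({R:0}) is the whole match.
    return "index.asp?username={}&token={}&userid={}".format(*m.groups())


print(rewrite("john1975/aBc123/123456789/"))
# index.asp?username=john1975&token=aBc123&userid=123456789
print(rewrite("about.asp"))        # None
print(rewrite("images/logo.png"))  # None
```

Since every segment must be alphanumeric and the last one numeric with a trailing slash, ordinary files such as about.asp and images/logo.png are left alone.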

Regex to match anything after /

I basically have no clue about regex, but I need a regex statement that will recognise anything after the / in a URL.
Basically, I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site) which have the URL (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask, you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
For example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it is group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
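The same example in Python's `re` module (a sketch; any engine with capture groups behaves the same way here):

```python
import re

m = re.match(r"a*(b*)", "aaaaaaaabbbbcccc")
print(m.group(0))  # aaaaaaaabbbb  (the whole match)
print(m.group(1))  # bbbb          (the first parenthesised group)
```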
In order to achieve what you want with the sweets pattern above, you just need to put a group around the end.
possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
If what you really want is to match anything after the last /, you can take another approach:
possible other solution: /([^/]*)$
The pattern above finds a / followed by a run of characters that are NOT another /, anchored to the end of the string by $, and keeps that run in group 1. The issue here is that it can also match URLs that do not have sweets in them.
Note: if you do not mind the / at the beginning, just remove the ( and ) and you do not have to worry about groups.
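Both approaches can be tried in Python with `re.search`, which returns the first match it finds. The sketch uses the sweet-name URL from the question. It also shows that the "last segment" idea needs the `$` anchor with `re.search`: unanchored, the search stops at the first / (inside `http://`).

```python
import re

url = "http://localhost/sweettemptations/sweets/sweet-name"

# Group 1 holds everything after "/sweets/".
after_sweets = re.search(r"/sweets/(.*)", url).group(1)
print(after_sweets)  # sweet-name

# Unanchored, the first "/" found is in "http://", so group 1 is empty.
first = re.search(r"/([^/]*)", url).group(1)
print(repr(first))  # ''

# Anchored with $, the pattern picks out the last path segment...
last = re.search(r"/([^/]*)$", url).group(1)
print(last)  # sweet-name
# ...but it also matches URLs that have nothing to do with sweets.
other = re.search(r"/([^/]*)$", "http://localhost/sweettemptations/available-sweets")
print(other.group(1))  # available-sweets
```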
I like to use http://regexpal.com/ to test my regexes. It highlights the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without adding the part matched by your * at the end), I would use a regular expression to match the pattern in the URL, but then simply replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure you do not hard-code the "localhost" part. Also, I am assuming that (http://) means an optional http:// in front, since (http://) is not a valid protocol prefix.
So if that is what you want, this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression will match the http:// part optionally with a host (be it localhost, an IP or the host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
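A minimal Python sketch of that "match, then replace the whole URL" idea. `redirect_target` and the hard-coded `TARGET` are hypothetical names for illustration; a real WordPress redirect would be configured elsewhere.

```python
import re

# Pattern from the answer: optional http://, any host, then the sweets path.
SWEETS = re.compile(r"(http://)?[^/]+/sweettemptations/sweets/.*")
TARGET = "http://localhost/sweettemptations/available-sweets"


def redirect_target(url: str) -> str:
    """Return TARGET for any URL under /sweets/, otherwise the URL unchanged."""
    # No groups are needed: on a match, the whole URL is simply replaced.
    return TARGET if SWEETS.fullmatch(url) else url


print(redirect_target("http://localhost/sweettemptations/sweets/somethingmore.html"))
print(redirect_target("localhost/sweettemptations/sweets/sweet-name"))
print(redirect_target("http://localhost/sweettemptations/available-sweets"))
```

The first two calls return TARGET; the third URL does not contain /sweets/ and comes back unchanged.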
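Use this regular expression: (?<=://).+ The lookbehind (?<=://) requires :// immediately before the match without including it, so .+ matches everything after the protocol.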
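For example, in Python (whose `re` module supports fixed-width lookbehind such as `(?<=://)`), applied to the sweet-name URL from the question:

```python
import re

after_protocol = re.search(r"(?<=://).+", "http://localhost/sweettemptations/sweets/sweet-name")
print(after_protocol.group(0))  # localhost/sweettemptations/sweets/sweet-name
```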

Regex to match all URLs except certain URLs

I need to match all valid URLs except:
http://www.w3.org
http://w3.org/foo
http://www.tempuri.org/foo
Generally, all URLs except certain domains.
Here is what I have so far:
https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
will match URLs that are close enough to my needs (but in no way all valid URLs!) (thanks, http://snipplr.com/view/2371/regex-regular-expression-to-match-a-url/!)
https?://www\.(?!tempuri|w3)\S*
will match all URLs with www., but not in the tempuri or w3 domain.
And I really want
https?://([-\w\.]+)(?!tempuri|w3)\S*
to work, but as far as I can tell, it matches every http:// string.
Gah, I should just do this in something higher up the Chomsky hierarchy!
The following regular expression:
https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*
only matches the first four lines from the following excerpt:
https://ok1.url.com
http://ok2.url.com
https://not.ok.tempuri.com
http://not-ok.either.w3.com
http://no1.w3.org
http://no2.w3.org
http://tempuri.bla.com
http://no4.tempuri.bla
http://no3.tempuri.org
http://w3.org/foo
http://www.tempuri.org/foo
I know what you're thinking, and the answer is that in order to match the above list and only return the first two lines you'd have to use the following regular expression:
https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*
which, in truth, is nothing more than a slight modification of the first regular expression, where the
(?!w3|tempuri)([-\w]*\.)
part appears twice in a row.
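Both claims can be checked against the sample list with Python's `re` module. In this sketch, `ONE_LABEL` is the first expression, `TWO_LABELS` is the version with the repeated part, and `re.match` is used because every line starts with the scheme.

```python
import re

ONE_LABEL = re.compile(r"https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*")
TWO_LABELS = re.compile(
    r"https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*"
)

SAMPLE = """https://ok1.url.com
http://ok2.url.com
https://not.ok.tempuri.com
http://not-ok.either.w3.com
http://no1.w3.org
http://no2.w3.org
http://tempuri.bla.com
http://no4.tempuri.bla
http://no3.tempuri.org
http://w3.org/foo
http://www.tempuri.org/foo""".splitlines()

one = [line for line in SAMPLE if ONE_LABEL.match(line)]
two = [line for line in SAMPLE if TWO_LABELS.match(line)]
print(one)  # the first four lines
print(two)  # ['https://ok1.url.com', 'http://ok2.url.com']
```

Each repetition of the negative lookahead only inspects the label right after a dot, which is why a third label such as tempuri in not.ok.tempuri.com slips past the one-label version.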
Your regular expression wasn't working because you included \. inside the character class in ([-\w\.]+): the group can then match not only this. and this.this. but also this.this.th. In other words, it doesn't have to end at a dot, so it stops wherever it needs to for the rest of the expression to match, and the lookahead only ever sees what comes after the whole host. Try it out in a regular expression tester and you'll see what I mean.
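A short Python sketch makes this visible: with the question's pattern, the group swallows the entire host (including w3 or tempuri), so the negative lookahead is tested only against what follows it, and the excluded URLs still match.

```python
import re

# The question's pattern: "\." inside the character class lets the group run past dots.
QUESTION = re.compile(r"https?://([-\w\.]+)(?!tempuri|w3)\S*")

for url in ["http://www.w3.org", "http://www.tempuri.org/foo"]:
    m = QUESTION.match(url)
    # The group captures the whole host; the lookahead only sees "" or "/foo".
    print(url, "->", m.group(1))
```

This prints www.w3.org and www.tempuri.org as the captured hosts, i.e. both "excluded" URLs are accepted.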