Correct regex / mod-rewrite syntax for this url - regex

Hi, I am having a little difficulty getting this mod_rewrite rule / regex right.
I have a url format like this:
www.site.com/some-page-title-here-cb384
www.site.com/another-page-title-here-cb385
I'd like to capture the numbers after 'cb', but only if the URL contains 'cb' right after the last hyphen.
I have:
.*?([0-9]+)$
This matches the last set of numbers, but I need it to be more specific: it should match only if the last section of the URL contains the pattern '-cb'.

Try this:
.*-cb(\d+)$
This should work, and the numbers will be captured in $1.
Let me be more specific. Your regexp (and mine above) does not match only the last part of the string; because of the leading .*, it matches the whole string. If you want to match only the last part, write it without .*:
-cb(\d+)$
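As a quick check outside of mod_rewrite, the shorter pattern can be tried in Python. This is only a sketch: the helper name extract_cb_id and the third sample URL are made up for illustration, and Python's re module stands in for Apache's regex engine.

```python
import re

# Pattern from the answer above: "-cb" followed by digits at the end of the string.
CB_ID = re.compile(r"-cb(\d+)$")

def extract_cb_id(path):
    """Return the digits after the trailing '-cb', or None if the suffix is absent."""
    match = CB_ID.search(path)
    return match.group(1) if match else None

print(extract_cb_id("www.site.com/some-page-title-here-cb384"))     # 384
print(extract_cb_id("www.site.com/another-page-title-here-cb385"))  # 385
print(extract_cb_id("www.site.com/page-without-marker-123"))        # None
```

Because search() scans the string for the pattern, no leading .* is needed; the $ anchor alone ties the match to the end.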

Related

How to get only the digits from a URL using a regex?

I need some help with a regex, as it's very new to me.
I have a URL which contains an item number or product ID.
What I am trying to do is trim the part of the URL before the ID and the extra part after the % symbol.
Here is how the url looks like.
https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid=12345678%2C24753325#seemoreoptions-b0uksl51j4m
OR
https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid=12345678
So from the above URL I am looking to trim https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid= and this part %2C24753325#seemoreoptions-b0uksl51j4m
So, this should give me only 12345678.
I have used the following regex:
(.*)(\=) Replace with $2
The above regex trims the first part of the URL but not the part after the % symbol.
I tried to find a solution on
https://regexr.com/
So for both of the above URL examples, I should get the result:
12345678
Thank you in advance
Instead of trimming the parts before and after the digits you want, try another approach: extract the digits directly.
You can use groups (parentheses) in a regexp to extract the matched data.
piid=([0-9]+)
It means:
piid= - text to find
[0-9]+ - one or more digits
() - group
You can extract the first group with $1 (or \1 etc., depending on the language you use).
Example: https://regexr.com/758d9
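For illustration, here is the same extraction as a small Python sketch. The function name extract_piid is hypothetical; the pattern and the two URLs are the ones from the question and answer.

```python
import re

# "piid=" followed by one or more digits; the digits are captured in group 1.
PIID = re.compile(r"piid=([0-9]+)")

def extract_piid(url):
    """Return the digits following 'piid=', or None if the parameter is missing."""
    match = PIID.search(url)
    return match.group(1) if match else None

urls = [
    "https://www.test.com/test-test/test/test-demo-demo-demo-demo.html"
    "?piid=12345678%2C24753325#seemoreoptions-b0uksl51j4m",
    "https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid=12345678",
]

for url in urls:
    print(extract_piid(url))  # 12345678 for both URLs
```

The match stops at the % sign because % is not a digit, so nothing has to be trimmed afterwards.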

URL regex that skips ending periods

I'm trying to create a regex that matches url strings within normal text. I have this:
http[s]?://[^\s]+
This seems to work well with the exception that if the url is at the end of a sentence it will grab the period as well. For example for this string:
I am typing some text with the url http://something.com/some-thing?args=someargs. This is another sentence.
it matches:
http://something.com/some-thing?args=someargs.
I would like it to match:
http://something.com/some-thing?args=someargs
Obviously I can't exclude periods because they are in the url previously but I can't figure out how to tell it to exclude the last period if there is one. I could potentially use a negative lookahead for end of line or whitespace, but if it's in the middle of the line (without a period after it) that would leave off the last character of the url.
Most of the ones I have seen online have the same issue that they match the ending dot so maybe it's not possible? I know basic regex but certainly not a genius with it so if someone has a solution I would be very grateful :).
Also, I can do some post-process in this case to remove the dot if I need to, just seems like there should be a Regex solution...
Try this one
http[s]?://[^\s]+[^. ]
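A short Python sketch of how this behaves on the sentence from the question. The second pattern is an assumption of mine, not part of the answer: since [^. ] only excludes a dot and a space, it can still swallow a newline or tab that follows the URL, so the variant excludes a dot and any whitespace instead. The example.com text is invented for the demonstration.

```python
import re

text = ("I am typing some text with the url "
        "http://something.com/some-thing?args=someargs. This is another sentence.")

# Pattern from the answer: the final character may not be a dot or a space.
ANSWER = re.compile(r"http[s]?://[^\s]+[^. ]")

# Assumed variant: the final character may not be a dot or any whitespace.
VARIANT = re.compile(r"https?://[^\s]+[^.\s]")

print(ANSWER.findall(text))   # ['http://something.com/some-thing?args=someargs']
print(VARIANT.findall(text))  # ['http://something.com/some-thing?args=someargs']

# A URL followed by a newline shows the difference between the two.
multiline = "See http://example.com/page\nnext line"
print(ANSWER.findall(multiline))   # ['http://example.com/page\n']
print(VARIANT.findall(multiline))  # ['http://example.com/page']
```

Both patterns work through backtracking: the greedy [^\s]+ first grabs the trailing dot, then gives characters back until the last class can match. Note that either version strips only one trailing dot.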

What's the right regular expression to match the exact word at the end of a string and excluding all other urls with more chars at the end?

I have to match an exact string at the end of a URL, but not match other URLs that have more characters after that string.
I can better explain with example.
I need to match the URL that has the string 'white' at its end: http://mysite.com/white
But I also need to not match URLs that have one or more characters appended to it, like http://mysite.com/white__blue or http://mysite.com/white/yellow or http://mysite.com/white/
How to do that?
Thanks
Regex to match any url*
^(https?:\/\/)?([\da-z\.-]+\.[a-z\.]{2,6}|[\d\.]+)([\/:?=&#]{1}[\da-z\.-]+)*[\/\?]?$
Regex to match a url containing white in the end
^(https?:\/\/)?([\da-z\.-]+\.[a-z\.]{2,6}|[\d\.]+)([\/:?=&#]{1}[\da-z\.-]+)*[\/\?]?white$
You can check the regex on regexr.com.
It does not match URLs (which are not valid anyway) like:
httpabrakadabra.co//
http:google.com
http://no-tld-here-folks.a
http://potato.54.211.192.240/
Based on your limited sample inputs, I'd say you could get away with this very minimal pattern:
^http[^\s]+white$
However, depending on what you are truly trying to achieve, what language/function you are implementing this pattern with, and what the full input string looks like, this pattern may need to be refined.
It would be best if you would improve your question to include all of the above relevant information.
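To make the trade-off concrete, here is a hedged Python sketch that runs the minimal pattern against the URLs from the question (written here with the colon after http). The SEGMENT pattern and the offwhite URL are my own assumptions, added to show one possible refinement: requiring a slash directly before white.

```python
import re

# Minimal pattern from the answer: starts with "http", ends with "white".
MINIMAL = re.compile(r"^http[^\s]+white$")

# Assumed refinement: "white" must be a whole final path segment.
SEGMENT = re.compile(r"^http[^\s]*/white$")

samples = [
    "http://mysite.com/white",         # should match
    "http://mysite.com/white__blue",   # should not match
    "http://mysite.com/white/yellow",  # should not match
    "http://mysite.com/white/",        # should not match
    "http://mysite.com/offwhite",      # hypothetical edge case
]

for url in samples:
    print(url, bool(MINIMAL.search(url)), bool(SEGMENT.search(url)))
```

The minimal pattern accepts the hypothetical offwhite URL because it only checks the last five characters; the refined one rejects it. Which behavior is wanted depends on the real inputs, as the answer points out.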

Regex: Negative lookahead after list match

Consider the following input string (part of css file):
url('data:image/png;base64,iVBORw0KGgoAAAAN...');
url(example.png);
The objective is to take the url part using regex and do something with it. So the first part is easy:
url\(['"]?(.+?)['"]?\)
Basically, it takes contents from inside url(...) with optional quotes symbols. Using this regexp I get the following matches:
data:image/png;base64,iVBORw0KGgoAAAAN...
example.png
So far so good. Now I want to exclude the urls which include 'data:image' in their text. I think negative lookahead is the proper tool for that but using it like this:
url\(['"]?(?!data:image)(.+?)['"]?\)
gives me the following result for the first url:
'data:image/png;base64,iVBORw0KGgoAAAAN...
Not only does it fail to exclude this match, but the matched string now also includes the quote character at the beginning. If I use + instead of the first ?, like this:
url\(['"]+(?!data:image)(.+?)['"]?\)
it works as expected and the URL is not matched. But this no longer allows the quote to be optional (since + means one or more). How should I change the regex to exclude the given URL?
You can use negative lookahead like this:
url\((['"]?)((?:(?!data:image).)+?)\1?\)
RegEx Demo
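The attempt in the question fails because ['"]? is optional: when the lookahead rejects the position right after the quote, the engine backtracks, matches no quote at all, and the lookahead then passes at the quote character itself. The pattern above avoids this by applying the lookahead before every consumed character. Below is a minimal Python sketch of it; the third CSS line (quoted.png) is made up to show that quoted URLs are still accepted.

```python
import re

css = """
url('data:image/png;base64,iVBORw0KGgoAAAAN...');
url(example.png);
url("quoted.png");
"""

# Group 1: optional opening quote. Group 2: the URL, consumed one character at a
# time, where no character may be the start of the text "data:image".
URL_RE = re.compile(r"""url\((['"]?)((?:(?!data:image).)+?)\1?\)""")

urls = [m.group(2) for m in URL_RE.finditer(css)]
print(urls)  # ['example.png', 'quoted.png']
```

finditer is used instead of findall because the pattern has two groups and only the second one holds the URL.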

URL Rewrite Pattern to exclude application name from path

I'm trying to use the IIS 7 URL Rewrite feature for the first time, and I'm having trouble getting my regular expression working. It seems like it should be simple enough. All I need to do is rewrite a URL like this:
http://localhost/myApplication/MySpecialFolder
To:
http://localhost/MySpecialFolder
Is this possible? I want the regular expression to ignore everything before "myApplication" in the original URL, so that I could use "http://localhost" OR "http://mysite", etc.
Here's what I've got so far:
^myApplication/MySpecialFolder$
But using the "Test Pattern..." feature in IIS, it says my patterns don't match unless I supply "myApplication/MySpecialFolder" exactly. Does anyone know how I can update my regular expression so that everything prior to "myApplication" is ignored and the following URLs will be seen as a match?
http://localhost/myApplication/MySpecialFolder
http://mysite/myApplication/MySpecialFolder
Many thanks in advance!
SOLUTION:
I needed to change my regex to:
myApplication/MySpecialFolder
Without the ^ at the beginning and without the $ at the end.
Your regular expression is correct; the pattern is matched against the path that starts after the first slash following the domain.
So only the path part is used for matching: in http://localhost/myApplication/MySpecialFolder, that is myApplication/MySpecialFolder.
To limit the rewriting to a specific domain, you have to use the Conditions section with Condition input = {HTTP_HOST}.
Unless there is something radically different with regexes in IIS, you would want to take out the anchor (^) at the beginning to match.
myApplication/MySpecialFolder$
The caret ^ anchors the match to the beginning of the string, and the dollar sign $ anchors it to the end. A regex like abc finds "abc" anywhere in the string, ^abc matches strings that start with "abc", abc$ matches strings that end with "abc", and ^abc$ only matches when the whole string is "abc".
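The effect of the anchors can be seen in a small Python sketch. It uses Python's re module rather than the IIS engine, on the assumption that ^ and $ behave the same way for simple cases like this; the helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

path_only = "myApplication/MySpecialFolder"
full_url = "http://localhost/myApplication/MySpecialFolder"

patterns = [
    r"myApplication/MySpecialFolder",    # no anchors
    r"^myApplication/MySpecialFolder",   # must start the string
    r"myApplication/MySpecialFolder$",   # must end the string
    r"^myApplication/MySpecialFolder$",  # must be the whole string
]

for pattern in patterns:
    print(pattern, matches(pattern, path_only), matches(pattern, full_url))
```

All four patterns match the bare path. Against the full URL, only the patterns without a leading ^ match, which is why the anchored version only succeeds when the tested input is exactly myApplication/MySpecialFolder.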