Regex to match a URL and insert a directory - regex

I would like to use regex to match the following:
http://www.test.com/example/sometext/
and then redirect to:
http://www.test.com/uk/example/sometext/
where 'example' is not in a list of reserved words, like _images, _lib, _css, etc.

Use a negative lookahead:
(http://www.test.com/)((?!(_images|_lib|_css))[^/]+/sometext/)
And replace with
$1uk/$2
Broken down, the juicy bits are:
(?!someregex) = a negative lookahead, i.e. assert that the following input does not match someregex
(_images|_lib|_css) = the syntax for regex OR logic, just using literals
[^/]+ = some characters that aren't a slash
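As a rough JavaScript sketch of the same replacement (the question doesn't name a language, so this is only an illustration): the dots are escaped, the inner alternation is made non-capturing, and a "/" is added after the reserved names so that only whole path segments are excluded. Those three tweaks and the helper name addUk are my assumptions, not part of the answer above.

```javascript
// Pattern from the answer, written as a JavaScript regex literal.
// Assumptions: escaped dots, a non-capturing alternation, and a trailing "/"
// after the reserved names so that only whole segments are excluded.
const pattern = /^(http:\/\/www\.test\.com\/)((?!(?:_images|_lib|_css)\/)[^\/]+\/sometext\/)$/;

// addUk is a hypothetical helper name used only for this illustration.
function addUk(url) {
  // $1 is the host part, $2 is the rest of the path.
  return url.replace(pattern, '$1uk/$2');
}

console.log(addUk('http://www.test.com/example/sometext/'));
console.log(addUk('http://www.test.com/_images/sometext/'));
```

The first call should print the URL with /uk/ inserted, and the second should print the URL unchanged because _images is in the reserved list.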

Related

regex to select only first instance of string (no duplicates)

I am using this regex
(rs)\w+/
to select strings that begin with the string 'rs', i.e.
..the biomarker rs4343 but not rs4342. However rs4343 ..
this returns: rs4343, rs4342, rs4343
Is it possible to use regex to select only the first instance of a matched string to avoid duplication, i.e. to return: rs4343, rs4342
I can use JS or PHP regex.
Try this:
(rs\w+)(?!.*\1)
Details:
(rs\w+) - Group the required match
(?!.*\1) - Use a negative lookahead to assert that the same match does not occur again later in the string, so each string is matched only at its last occurrence
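A small JavaScript sketch, using the sample sentence from the question, to show what the lookahead pattern returns. The second variant, which de-duplicates with a Set, is my own assumption rather than part of the answer. It may be useful if you need the results in first-seen order.

```javascript
const text = "..the biomarker rs4343 but not rs4342. However rs4343 ..";

// Lookahead approach: a match is rejected if the same string appears later,
// so every distinct id survives exactly once (at its last position).
const viaLookahead = text.match(/(rs\w+)(?!.*\1)/g) || [];

// Alternative (assumption): collect all matches, then de-duplicate with a Set,
// which preserves the order in which ids are first seen.
const viaSet = [...new Set(text.match(/\brs\w+/g) || [])];

console.log(viaLookahead); // [ 'rs4342', 'rs4343' ]
console.log(viaSet);       // [ 'rs4343', 'rs4342' ]
```

Note that . does not match line breaks by default, so the lookahead version only de-duplicates within a single line unless the s flag is added.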

Regex to extract second word from URL

I want to extract a second word from my url.
Examples:
/search/acid/all - extract acid
/filter/ion/all/sss - extract ion
I tried a few approaches, such as
/.*/(.*?)/
but no luck.
A couple of things:
In a regex literal delimited by slashes, the forward slashes / have to be escaped like this \/
The (.*?) is lazy: it matches as few characters as possible, including zero characters.
The .* is greedy: it takes as many characters as it can, including forward slashes, so the captured segment is not necessarily the second one.
A simple solution will be:
/.+?\/(.*?)\//
Update:
Since you are using JavaScript, try the following code:
var url = "/search/acid/all";
var regex = /.+?\/(.*?)\//g;
var match = regex.exec(url);
console.log(match[1]);
The variable match is an array. Its first element is the full match (everything that was matched); you can ignore that, since you are interested in the specific group we wanted to capture (the part we put in parentheses in the regex).
This regex will do the trick:
(?:[^\/]*.)\/([^\/]*)\/
I had difficulties with the above answers for URLs without a trailing forward slash:
/search/acid/all/ /* works */
/search/acid /* doesn't work */
To extract the second word from both URLs, what worked for me is
var url = "/search/acid";
var regex = /(?:[^\/]*.)\/([^\/]*)/g;
var match = regex.exec(url);
console.log(match[1]);
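For comparison, here is a sketch of an anchored variant that avoids both .* and the trailing-slash requirement. It assumes the path always starts with a slash. The helper name secondSegment is hypothetical.

```javascript
// secondSegment is a hypothetical helper name used for illustration.
// ^\/[^\/]+  consumes the first segment ("/search"),
// \/([^\/]+) captures the second segment up to the next slash or the end.
function secondSegment(path) {
  const match = /^\/[^\/]+\/([^\/]+)/.exec(path);
  return match ? match[1] : null;
}

console.log(secondSegment('/search/acid/all'));    // acid
console.log(secondSegment('/filter/ion/all/sss')); // ion
console.log(secondSegment('/search/acid'));        // acid
console.log(secondSegment('/search'));             // null
```

Returning null when there is no second segment avoids the TypeError that match[1] would throw on a failed exec.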

regex: match 'customer' in string 'styles/customer.1031.css'

I'm trying to extract the customer string out of a filepath in nodejs. So far I have come up with this:
var fileName = 'styles/customer.1031.css';
fileName = fileName.substring(7);
fileName = fileName.substring(0, fileName.length - 4);
fileName = fileName.match('[a-z]*')[0];
console.log(fileName); // <-- yields 'customer'
I'm cutting the styles/ from the beginning and the .css from the end. Then I'm only matching the lowercase characters. What would be a proper regex to match only the customer string, so I don't need to cut the string first? For example, what would the regex look like to catch everything after styles/ until the .?
The regex to use could look like ^styles/([^.]+)\..*$ where
^styles/ translates to "starts with 'styles/'"
Then your match (at least one character, matching until first '.')
Then a literal '.'
Then anything until the end of the string (this is optional, depending on your needs)
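In Node.js this could look roughly like the sketch below. The optional trailing .*$ is dropped because only the capture group is needed, and the helper name customerName is hypothetical.

```javascript
// customerName is a hypothetical helper name used for illustration.
function customerName(filePath) {
  // ^styles\/  : the path must start with "styles/"
  // ([^.]+)    : capture one or more characters up to the first "."
  // \.         : a literal "."
  const match = /^styles\/([^.]+)\./.exec(filePath);
  return match ? match[1] : null;
}

console.log(customerName('styles/customer.1031.css')); // customer
```

The null check covers paths that don't start with styles/ or contain no dot.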
What would the regex look like to catch everything after styles/ until the .?
This is what it would look like:
styles\/(.*?)\.
The captured string is then available as capture group 1 (for example, match[1] in JavaScript).
You can also use a non-capturing group for the prefix: (?:styles\/)(.*?)\.
var fileName = 'styles/customer.1031.css';
console.log(/(?:styles\/)(.*?)\./.exec(fileName)[1])

Regex: Negative lookahead after list match

Consider the following input string (part of css file):
url('...');
url(example.png);
The objective is to take the url part using regex and do something with it. So the first part is easy:
url\(['"]?(.+?)['"]?\)
Basically, it takes contents from inside url(...) with optional quotes symbols. Using this regexp I get the following matches:
...
example.png
So far so good. Now I want to exclude the urls which include 'data:image' in their text. I think negative lookahead is the proper tool for that but using it like this:
url\(['"]?(?!data:image)(.+?)['"]?\)
gives me the following result for the first url:
'...
Not only does it fail to exclude this match, but the matched string itself now includes the quote character at the beginning. If I use + instead of the first ? like this:
url\(['"]+(?!data:image)(.+?)['"]?\)
it works as expected, url is not matched. But this doesn't allow the optional quote in url (since + is 1 or more). How should I change the regex to exclude given url?
You can use negative lookahead like this:
url\((['"]?)((?:(?!data:image).)+?)\1?\)
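To see how this behaves, here is a JavaScript sketch. The CSS snippet is made up for illustration (the question elides its real URLs). The lookahead is checked before every character, so the engine cannot dodge it by treating the opening quote as part of the URL.

```javascript
// Sample CSS invented for this illustration.
const css = `
a { background: url('data:image/png;base64,AAAA'); }
b { background: url(example.png); }
c { background: url("icons/logo.svg"); }
`;

// Group 1: optional opening quote. Group 2: the URL, where every character
// must not be the start of "data:image". \1? allows the matching closing quote.
const pattern = /url\((['"]?)((?:(?!data:image).)+?)\1?\)/g;

const urls = Array.from(css.matchAll(pattern), (m) => m[2]);
console.log(urls); // [ 'example.png', 'icons/logo.svg' ]
```

The URL itself is in group 2 here, since group 1 now holds the quote character.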

regex: how to eliminate urls ending with .dtd

This is JavaScript regex.
var regex = /(http:\/\/[^\s]*)/g;
var text = "I have http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd and I like http://google.com a lot";
var matches = text.match(regex);
console.log(matches);
I get both URLs in the result. However, I want to eliminate all URLs ending with .dtd. How do I do that?
Note that only URLs ending with .dtd should be removed. A URL like http://a.dtd.google.com should pass.
The nicest way to do it is to use a negative lookbehind (in languages that support them):
/(?>http:\/\/[^\s]*)(?<!\.dtd)/g
The ?> in the first bracket makes it an atomic grouping which stops the regex engine backtracking - so it'll match the full URL as it does now, and if/when the next part fails it won't try going back and matching less.
The (?<!\.dtd) is a negative lookbehind, which only matches if \.dtd doesn't match ending at that position (i.e., the URL doesn't end in .dtd).
For languages that don't support these (JavaScript has no atomic groups, and only gained lookbehind in ES2018), you can use a negative lookahead instead, which is a bit uglier and generally less efficient:
/(http:\/\/(?![^\s]*\.dtd(?:\s|$))[^\s]*)/g
This matches http://, then scans ahead to make sure the URL doesn't end in .dtd (that is, .dtd followed by whitespace or the end of the string), then backtracks and scans forward again to get the actual match.
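A quick JavaScript check of the lookahead approach against the sample text from the question. In this sketch the lookahead requires whitespace or the end of the string after .dtd, which is my reading of "ending with .dtd". A plain \b after .dtd would also reject http://a.dtd.google.com, because the following dot counts as a word boundary.

```javascript
// Reject a URL only when ".dtd" is followed by whitespace or end of input.
const urlPattern = /http:\/\/(?!\S*\.dtd(?:\s|$))\S*/g;

const text = "I have http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd and I like http://google.com a lot";
const kept = text.match(urlPattern) || [];
console.log(kept); // [ 'http://google.com' ]

// ".dtd" in the middle of the host name should not disqualify the URL.
const inner = "see http://a.dtd.google.com now".match(urlPattern) || [];
console.log(inner); // [ 'http://a.dtd.google.com' ]
```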
As always, http://www.regular-expressions.info/ is a good reference for more information