Regex: Negative lookahead after list match - regex

Consider the following input string (part of a CSS file):
url('data:image/png;base64,iVBORw0KGgoAAAAN...');
url(example.png);
The objective is to take the url part using regex and do something with it. So the first part is easy:
url\(['"]?(.+?)['"]?\)
Basically, it takes the contents from inside url(...), with optional quote characters. Using this regexp I get the following matches:
data:image/png;base64,iVBORw0KGgoAAAAN...
example.png
So far so good. Now I want to exclude the urls which include 'data:image' in their text. I think negative lookahead is the proper tool for that but using it like this:
url\(['"]?(?!data:image)(.+?)['"]?\)
gives me the following result for the first url:
'data:image/png;base64,iVBORw0KGgoAAAAN...
Not only does it fail to exclude this match, but the matched string now also includes the quote character at the beginning. If I use + instead of the first ? like this:
url\(['"]+(?!data:image)(.+?)['"]?\)
it works as expected: the url is not matched. But this doesn't allow the quote to be optional (since + means 1 or more). How should I change the regex to exclude the given url?

You can use negative lookahead like this:
url\((['"]?)((?:(?!data:image).)+?)\1?\)
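To see why the optional quote defeats the plain lookahead, and how the pattern above behaves instead, here is a minimal Python sketch. The double-quoted third url is an assumption added for illustration; the two patterns are the ones from the question and the answer.

```python
import re

# Sample CSS from the question, plus a double-quoted url added for illustration.
css = (
    "url('data:image/png;base64,iVBORw0KGgoAAAAN...');\n"
    "url(example.png);\n"
    'url("quoted.png");\n'
)

# Attempt from the question: ['"]? may backtrack and match nothing, so the
# lookahead is then tested at the quote character, where it succeeds.
naive = re.compile(r"""url\(['"]?(?!data:image)(.+?)['"]?\)""")

# Pattern from the answer: the lookahead is re-checked before every captured
# character, and \1? closes whichever quote (if any) was opened.
tempered = re.compile(r"""url\((['"]?)((?:(?!data:image).)+?)\1?\)""")

naive_urls = [m.group(1) for m in naive.finditer(css)]
tempered_urls = [m.group(2) for m in tempered.finditer(css)]

print(naive_urls)     # the data: url is still captured, with a leading quote
print(tempered_urls)  # only example.png and quoted.png remain
```

Because the lookahead is applied at every position, this also rejects urls that contain data:image anywhere in their text, not only at the start.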

Related

regex match string unless followed by #

I'm trying to add a #param to URLs. I want to add it to all URLs that don't already have the #, to avoid a double param. My URLs don't look like URLs; they are made up of handlebar parameters.
They can look like the following:
{{app.url}}
{{root.app.url}}
{{app.url}}#param
{{root.app.url}}#param
So I came up with a regex that matches the handlebar tag: ({{(root.)?app.url}})
The only problem is that when I later use regexp_replace(url, '({{(root\.)?app\.url}})', '\1#param'),
my result looks like this:
{{app.url}}#param
{{root.app.url}}#param
{{app.url}}#param#param
{{root.app.url}}#param#param
One solution I can think of is doing it in two steps, where the 2nd step looks for the duplicate #param#param and replaces it with a single #param.
But it had me wondering whether there is a way, using regex, to exclude the handlebar tags that are followed by # and completely cancel that match.
Here are some examples:
https://regex101.com/r/d3Zyvo/6
Note: this is for use in PostgreSQL update queries. The regex is POSIX/PCRE. I must use regexp_replace with a back reference since there might be content before and after the handlebar tags, so I cannot simply concatenate the param (see the link).
You may use a negative lookahead (?!#):
({{(root\.)?app\.url}})(?!#)
                       ^^^^^
Details
({{(root\.)?app\.url}}) - Group 1 (later referred to with \1 from the replacement pattern):
    {{ - {{ substring
    (root\.)? - an optional Group 2 matching 1 or 0 occurrences of root.
    app\.url}} - a literal app.url}} substring
(?!#) - a negative lookahead that fails the match if, immediately to the right of the current location, there is a # char.
See Table 9-17. Regular Expression Constraints:
(?!re) negative lookahead matches at any point where no substring matching re begins (AREs only)
PostgreSQL demo:
select regexp_replace('{{app.url}}
{{root.app.url}}
{{app.url}}#param
{{root.app.url}}#param',
'({{(root\.)?app\.url}})(?!#)',
'\1#param',
'g');
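For trying the same replacement outside the database, here is a rough Python equivalent. It is only a sketch: Python's re module is not PostgreSQL's ARE engine, and the braces are escaped here as a precaution, which the PostgreSQL pattern above does not do.

```python
import re

# The four sample tags from the question, one per line.
text = "\n".join([
    "{{app.url}}",
    "{{root.app.url}}",
    "{{app.url}}#param",
    "{{root.app.url}}#param",
])

# Group 1 is the whole tag; (?!#) rejects tags that are already followed by "#".
pattern = re.compile(r"(\{\{(root\.)?app\.url\}\})(?!#)")

# re.sub replaces every occurrence, like the 'g' flag in regexp_replace.
result = pattern.sub(r"\1#param", text)
print(result)
```

Every line ends up with exactly one #param, and text before or after a tag is left untouched because only the tag itself is matched.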

Regular expression to match line containing some strings and not others

I have lines like this:
example.com/p/stuff/...
example.com/page/thing/...
example.com/page/stuff/...
example.com/page/other-stuff/...
etc
where the dots represent continuing URL paths. I want to select URLs that contain /page/ where /page/ is NOT followed by thing/. So from the above list we would select:
example.com/page/stuff/...
example.com/page/other-stuff/...
.*?\/page\/[^(thing)].*
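This regex is meant to match a string that has /page/ not followed by thing. Note, however, that [^(thing)] is a negated character class: it rejects any single character among (, t, h, i, n, g and ), so it also rejects paths such as /page/table/.
The lazy quantifier .*? is suggested because it advances one character at a time, which may perform better.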
You need to use negative lookahead:
example\.com\/page\/(?!thing\/).*
Use the following regex pattern:
.*?\/page\/(?!thing\/).*
https://regex101.com/r/19wh1w/2
(?!thing\/) - the negative lookahead assertion ensures that the /page/ section is not followed by thing/
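As a quick check, the lookahead pattern can be run over the sample lines in Python. This is a minimal sketch; the /page/table/ path is a hypothetical extra case, added to contrast the lookahead with the character-class attempt shown earlier.

```python
import re

lines = [
    "example.com/p/stuff/...",
    "example.com/page/thing/...",
    "example.com/page/stuff/...",
    "example.com/page/other-stuff/...",
]

# "/" needs no escaping in Python. (?!thing/) fails the match only when
# the literal text "thing/" comes right after "/page/".
lookahead = re.compile(r".*?/page/(?!thing/).*")

# [^(thing)] is a negated character class: it rejects one character from
# the set "(", "t", "h", "i", "n", "g", ")" rather than the word "thing".
char_class = re.compile(r".*?/page/[^(thing)].*")

selected = [line for line in lines if lookahead.match(line)]
print(selected)

# Hypothetical path that starts with "t" but is not "thing/".
extra = "example.com/page/table/..."
print(bool(lookahead.match(extra)), bool(char_class.match(extra)))
```

The lookahead keeps the hypothetical /page/table/ line, while the character-class version drops it because the path begins with t.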

URL rewrite using PCRE expression - append prefix to all incoming URIs except one pattern

I am using the match expression https://([^/]*)/(.*) and the replace expression constantprefix/$2, trying to rewrite incoming URLs by adding '/constantprefix' to all of them.
For the URLs below it works as expected:
https://hostname/incomingURI is converted to
/constantprefix/incomingURI
https://hostname/ is converted to /constantprefix/
https://hostname/login/index.aspx is converted to
/constantprefix/login/index.aspx
I have a problem with URLs that already start with /constantprefix: I see /constantprefix/constantprefix in the output URL, which is not what I want. Is there any way to avoid that?
If the incoming URL is https://hostname/constantprefix/login/index.aspx, then the output URL becomes https://hostname/constantprefix/constantprefix/login/index.aspx
How can I avoid /constantprefix/constantprefix using the match expression?
You can do it with:
https://[^/]*/(?!constantprefix(?:/|$))(.*)
using the replacement string:
constantprefix/$1
(?!...) is a negative lookahead and means "not followed by". It is only a test and doesn't consume characters (elements of this kind are also called "zero-width assertions", as are lookbehinds and the anchors ^ and $).
The first capture group in your pattern was unused, so I removed it.
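The behaviour can be checked with a small Python sketch. The rewrite helper is hypothetical, and Python writes the back-reference as \1 where the rewrite engine uses $1; how the leading slash of the result is handled depends on that engine and is not modelled here.

```python
import re

# Pattern from the answer; "hostname" and "constantprefix" are the
# placeholders used in the question.
pattern = re.compile(r"https://[^/]*/(?!constantprefix(?:/|$))(.*)")


def rewrite(url: str) -> str:
    """Hypothetical helper: apply the replacement, or return url unchanged."""
    # When the lookahead rejects the URL there is no match, so sub() returns
    # the input as it was.
    return pattern.sub(r"constantprefix/\1", url)


for url in (
    "https://hostname/incomingURI",
    "https://hostname/",
    "https://hostname/constantprefix/login/index.aspx",
    "https://hostname/constantprefixed/page",
):
    print(url, "->", rewrite(url))
```

The (?:/|$) part is what lets a path such as /constantprefixed/page still be rewritten: the prefix only counts when it is a complete path segment.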

Regex to match all urls, excluding .css, .js recources

I'm looking for a regular expression to exclude URLs with extensions I don't want.
For example, resources ending with .css, .js, .font, .png, .jpg etc. should be excluded.
However, I can put all resources into the same folder and try to exclude URLs that point to this folder, like:
.*\/(?!content\/media)\/.*
But that doesn't work! How can I improve this regex to match my criteria?
e.g.
Match:
http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other
No match:
http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843
The correct solution is:
^((?!\/content\/media\/).)*$
see: https://regex101.com/r/bD0iD9/4
Inspired by: Regular expression to match a line that doesn't contain a word?
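A minimal Python sketch of this pattern, run against the two URLs from the question. The slashes are left unescaped because Python does not require escaping them.

```python
import re

# Each "." is consumed only if "/content/media/" does not start at that
# position, so the whole line matches only when the folder never appears.
pattern = re.compile(r"^((?!/content/media/).)*$")

urls = [
    "http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other",
    "http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843",
]

allowed = [url for url in urls if pattern.match(url)]
print(allowed)
```

Only the first URL is kept, since the second one contains /content/media/.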
Two things:
First, the ?! negative lookahead doesn't remove any characters from the input. Add [^\/]+ before the trailing slash. Right now it is trying to match two consecutive slashes. For example:
.*\/(?!content\/media)[^\/]+\/.*
(edit) Second, the .*s at the beginning and end match too much. Try tightening those up, or adding more detail to content\/media. As it stands, content/media can be swallowed by one of the .*s and never be checked against the lookahead.
Suggestions:
Use your original idea - test against the extensions: ^.*\.(?!css|js|font|png|jpeg)[a-z0-9]+$ (with case insensitive).
Instead of using a single regular expression to do this, use a regex that will pull any URL (e.g., https?:\/\/\S+, perhaps?) and then test each one you find with String.indexOf: if(candidateURL.indexOf('content/media')==-1) { /*do something with the OK URL */ }
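The two-step suggestion could look roughly like this in Python, with a substring test standing in for String.indexOf. The surrounding sample text is an assumption for illustration, and https?://\S+ is only a loose URL pattern.

```python
import re

# Hypothetical input containing the two URLs from the question.
text = (
    "See http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other "
    "and http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843"
)

# Step 1: pull out anything that looks like a URL.
candidates = re.findall(r"https?://\S+", text)

# Step 2: keep only the URLs that do not point into the resource folder.
ok_urls = [url for url in candidates if "content/media" not in url]
print(ok_urls)
```

Splitting the work this way keeps the regex simple and moves the exclusion rule into ordinary code, where it is easier to extend.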

Correct regex / mod-rewrite syntax for this url

Hi, I am having a little difficulty working out this mod_rewrite rule / regex correctly.
I have a url format like this:
www.site.com/some-page-title-here-cb384
www.site.com/another-page-title-here-cb385
And I'd like to find the numbers after 'cb', but only if the url contains 'cb' after the last hyphen in the string.
I have:
.*?([0-9]+)$
Which matches the last set of numbers, but I need to be more specific and match only if the last section of the url contains the pattern '-cb'.
Try this:
.*-cb(\d+)$
This one should work and you should find the numbers in $1.
Let me be more specific. Your regexp (and mine above) doesn't match only the last part of the string, but matches the whole string. If you want to match only the last part, you should write it without .*:
-cb(\d+)$
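To try the shorter pattern outside mod_rewrite, here is a minimal Python sketch. The page_id helper and the URL without -cb are hypothetical, added only to show the non-matching case.

```python
import re
from typing import Optional

# "-cb" followed by digits, anchored at the end of the string.
pattern = re.compile(r"-cb(\d+)$")


def page_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the digits after a trailing '-cb', if any."""
    # search() scans the string, so no leading .* is needed.
    match = pattern.search(url)
    return match.group(1) if match else None


print(page_id("www.site.com/some-page-title-here-cb384"))
print(page_id("www.site.com/another-page-title-here-cb385"))
print(page_id("www.site.com/some-page-title-here-384"))
```

The first two calls return the captured digits, and the last returns None because the trailing number is not preceded by -cb.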