Regex lookahead: find a word without .min. in a string

I'm trying to replace a link in an HTML file with regex and Node.js. I want to replace links that do not have a .min.js extension.
For example, it should match "common.js" but not "common.min.js".
Here's what I've tried:
let htmlOutput = html.replace(/common\.(?!min)*js/g, common.name);
I think this negative lookahead should work but it doesn't match anything. Any help would be appreciated.

The (?!min)*js part is malformed: you should not quantify zero-width assertions such as lookaheads (they do not consume text, so quantifiers after them are either treated as user errors or ignored). Since js does not start with min, this lookahead is redundant even without a quantifier.
If you want to match a string that contains the whole word common, followed by any chars, and that ends with .js but not .min.js, you need:
/\bcommon\b(?!.*\.min\.js$).*\.js$/
See the regex demo.
Details:
\b - word boundary
common - a substring
\b - word boundary
(?!.*\.min\.js$) - immediately to the right, there must not be 0 or more chars followed by .min.js at the end of the string
.* - any 0 or more chars other than line break chars
\.js - a .js substring
$ - end of string.
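Below is a minimal sketch of the pattern using Python's re module instead of Node.js (the lookahead syntax is the same here). The sample file names are hypothetical. The pattern is anchored with $, so it is meant to test a single file name or src value, not a whole HTML document.

```python
import re

# \bcommon\b        : the whole word "common"
# (?!.*\.min\.js$)  : fail if the rest of the string ends with ".min.js"
# .*\.js$           : otherwise require the string to end with ".js"
NON_MIN_COMMON = re.compile(r"\bcommon\b(?!.*\.min\.js$).*\.js$")


def is_non_min_common(name: str) -> bool:
    """Return True for a 'common' script name that is not minified."""
    return NON_MIN_COMMON.search(name) is not None


for name in ["common.js", "common.min.js",
             "~/dist/common.9cf5748e0e7fc2928a07.js", "uncommon.js"]:
    print(name, is_non_min_common(name))
```

"uncommon.js" is rejected because \b requires a word boundary before "common".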

Here, we can likely use a simpler expression that allows any chars except a dot after the word common, followed by .js:
common([^\.]+)?\.js
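Here is a quick check of this simpler pattern, again using Python's re as a stand-in for JavaScript. The file names are hypothetical examples.

```python
import re

# "common", then optionally one or more chars that are not a dot, then ".js"
SIMPLE = re.compile(r"common([^\.]+)?\.js")


def matches_simple(name: str) -> bool:
    return SIMPLE.search(name) is not None


print(matches_simple("common.js"))        # True
print(matches_simple("common.min.js"))    # False
print(matches_simple("commonBundle.js"))  # True
print(matches_simple("common.9cf5748e0e7fc2928a07.js"))  # False
```

Because [^\.] excludes every dot, a hashed name such as common.9cf5748e0e7fc2928a07.js is not matched either. The hash is preceded by a dot.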

The final regex I'm using is /\bcommon[^min]+js\b/g
This finds the word common, followed by one or more characters that are not m, i or n, and ending in js. Note that [^min] is a character class, so it excludes those individual letters rather than the word min. This allows me to replace scripts on my HTML page like:
<script src="~/dist/common.js">
or
<script src="~/dist/common.9cf5748e0e7fc2928a07.js">
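Here is a Python sketch of the same replacement. re.sub replaces every match, like the JavaScript /g flag. The function replace_common and the name common.1a2b3c.js are hypothetical stand-ins for html.replace and common.name. The sketch also shows a side effect of the character class: a name like common.main.js is skipped because it contains an m.

```python
import re

# [^min]+ is a character class: one or more chars that are not 'm', 'i' or 'n'
BUNDLE = re.compile(r"\bcommon[^min]+js\b")


def replace_common(html: str, new_name: str) -> str:
    # A function replacement inserts new_name literally (no backslash escapes);
    # re.sub replaces every non-overlapping match.
    return BUNDLE.sub(lambda m: new_name, html)


html = ('<script src="~/dist/common.js"></script>\n'
        '<script src="~/dist/common.9cf5748e0e7fc2928a07.js"></script>\n'
        '<script src="~/dist/common.min.js"></script>')
print(replace_common(html, "common.1a2b3c.js"))
```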
Thanks to Wiktor Stribiżew for helping me.

Related

Regex - Pattern until double \n

Somehow I have not been able to find anything online about how to make a pattern end at a double \n. My particular case is the following. I have this string:
"1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
I would like to extract only the text between digit\n and \n\n. So, in my case, I'd like to get:
This is said \nby Matt.
While this is said by Lucas
Although I am not very skilled with regex, I tried many combinations, such as (?<=\d\n).*?(?=\n\n), (?<=\d\n).\n\n and (?<=\d\n).*?(?=\r\n\r\n), but without any luck.
I have tried these and others with R's stringr library and also with Python's re module.
The issue first came up in this answer: https://stackoverflow.com/a/72547966/19284124
You can make . match across lines with the (?s) inline modifier and extend the double-newline pattern so that it also matches the end of the string:
(?s)(?<=\d\n).*?(?=\n\n|\Z)
See the regex demo.
Details:
(?s) - a flag allowing . to match line break chars
(?<=\d\n) - a positive lookbehind that matches a location immediately preceded by a digit and a newline
.*? - any zero or more chars, as few as possible
(?=\n\n|\Z) - a positive lookahead that matches a location immediately followed by two newline chars or the end of the string.
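This pattern can be run directly with Python's re.findall on the string from the question. Because the pattern has no capturing groups, findall returns the whole matches.

```python
import re

text = ("1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n"
        "2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas")

# (?s) lets . match newlines; the lazy .*? stops at the first "\n\n" or at \Z
PATTERN = r"(?s)(?<=\d\n).*?(?=\n\n|\Z)"
blocks = re.findall(PATTERN, text)
print(blocks)  # ['This is said \nby Matt.', 'While this is said by Lucas']
```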
This regex is more efficient and is a variant that works in many regex flavors, such as JavaScript, PHP, Python, Java and .NET, because it avoids (?s), \Z and \z:
(?<=\d\n)(?:.*\n)*?.*(?=\n\n|$)
Make sure to use it without MULTILINE mode.
RegEx Demo
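This sketch runs the flag-free variant in Python on the question's string. It also shows why MULTILINE has to stay off: with re.MULTILINE, $ matches before every newline, so the first block is cut at the end of its first line.

```python
import re

text = ("1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n"
        "2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas")

# (?:.*\n)*? adds whole lines lazily; the final .* takes the last line of a block
PATTERN = r"(?<=\d\n)(?:.*\n)*?.*(?=\n\n|$)"

default_blocks = re.findall(PATTERN, text)
multiline_blocks = re.findall(PATTERN, text, flags=re.MULTILINE)
print(default_blocks)    # ['This is said \nby Matt.', 'While this is said by Lucas']
print(multiline_blocks)  # ['This is said ', 'While this is said by Lucas']
```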

Multiline PCRE, multiple conditions

I'm just starting out with regex and have hit a stumbling block. I'm hoping someone can explain the workaround.
I'm trying to carry out a multi-line search. I want to use "*" as the 'flag', so to speak: if a line contains an asterisk, it should match. The digits at the start of the line should be output, and so should the word "Match" in the linked example, but not the asterisk itself.
I assume my use of "|" divides the regex into two alternatives, when a line actually needs to satisfy both conditions to match.
https://regex101.com/r/Pu56bi/2
(?m)(^\d+)|(?<=\*).*$
Any help kindly appreciated.
You could use a positive lookahead, as in
^(?=.*?\*)(\d+).+?(Match)$
See your modified example on regex101.com.
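The regex101 example is not reproduced here, so the sample text below is hypothetical. With re.MULTILINE, ^ and $ apply to each line. The lookahead checks the current line for an asterisk without consuming it, and . does not cross newlines.

```python
import re

# Hypothetical sample: only lines containing "*" should match
sample = "123 foo * Match\n456 foo Match\n789 * Match bar"

# ^(?=.*?\*) asserts there is a "*" somewhere on the current line
PATTERN = re.compile(r"^(?=.*?\*)(\d+).+?(Match)$", re.MULTILINE)
print(PATTERN.findall(sample))  # [('123', 'Match')]
```

The third line has an asterisk but is rejected because Match is not at the end of the line.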
If Match is always at the end of the string, you could match the digits at the start of the string, then match an * and Match at the end of the string.
Use a word boundary \b to prevent the digits from being part of a longer word.
^(\d+)\b.*\*.*\b(Match)$
Regex demo
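Here is a small check of the word-boundary version on hypothetical lines. The run "1234abc" is rejected because no \b exists between any of its digits and the letters that follow.

```python
import re

sample = "123 foo * Match\n456 foo Match\n1234abc * Match"

# \b after the digits rejects a digit run that continues as part of a word
PATTERN = re.compile(r"^(\d+)\b.*\*.*\b(Match)$", re.MULTILINE)
print(PATTERN.findall(sample))  # [('123', 'Match')]
```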
If there can be text after the word Match, you can assert * using a positive lookahead.
^(?=.*\*)(\d+)\b.*\b(Match)\b.*$
Regex demo
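This version accepts text after Match on the same line. Here it is on the same kind of hypothetical sample.

```python
import re

sample = "123 foo * Match\n456 foo Match\n789 * Match bar"

# Lookahead requires "*" on the line; \b(Match)\b may be followed by more text
PATTERN = re.compile(r"^(?=.*\*)(\d+)\b.*\b(Match)\b.*$", re.MULTILINE)
print(PATTERN.findall(sample))  # [('123', 'Match'), ('789', 'Match')]
```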

How to end a string with $ directly after .* with a RegEx?

I'm trying to report on a set of URLs that catches all potential URL parameters and I'm having an issue defining the RegEx properly.
We have this regex to capture a few variations of our URLs to feed into our reporting, but I need to be able to end the string with a $, and when I do, it doesn't show any results.
The RegEx:
/join/$|/join/\?product.*|/join/\.*
For another account, we only use one variation which is outlined below (which works):
^/join/$
I believe the issue is in that after \?product.*, I'm not ending the string (or even starting it).
So far I have tried: ^/join/$|(^[/join/\?product.*]$)|(^[/join/\.*]$) with no luck.
If you want to match the dollar sign literally, you have to escape it as \$. Otherwise it is an anchor that asserts the end of the string or line.
This pattern ^/join/$ would therefore only match /join/
In your pattern you use an alternation where the last part, /join/\.*, would match /join/ but also /join/....., because the escaped dot matches a literal dot and the * quantifier repeats it 0+ times.
Perhaps you are looking for:
^/join/(?:\?product.*\$)?$
This will match /join/ followed by an optional part (?:\?product.*\$)? that matches ?product, followed by any char 0+ times, and ending in a literal $.
Regex demo
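Here is a short Python check of this pattern against a few hypothetical URL paths. Only /join/ alone, or ?product... ending in a literal $, should pass.

```python
import re

# /join/ optionally followed by ?product..., which must then end in a literal $
JOIN = re.compile(r"^/join/(?:\?product.*\$)?$")

urls = ["/join/", "/join/?product=abc$", "/join/?product=abc", "/join/other"]
for url in urls:
    print(url, bool(JOIN.search(url)))
```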
Make the pattern lazy. Also, $ is a special character in regex, so you need to escape it. (Regarding escaping, Google Analytics may follow different rules.) [] defines a character class that matches a single character from a set, so be careful with it as well: I think you are trying to capture a group.
\?product.*?\$
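The difference between lazy and greedy matching only shows up when the text contains more than one $. The input below is hypothetical.

```python
import re

s = "/join/?product=a$b$"  # hypothetical input with two dollar signs

lazy = re.search(r"\?product.*?\$", s).group()   # stops at the first $
greedy = re.search(r"\?product.*\$", s).group()  # runs to the last $
print(lazy)    # ?product=a$
print(greedy)  # ?product=a$b$
```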

match part of string but exclude certain file extension

I have the following two files:
test_ExampleSchemeConfig
test_ExampleSchemeConfig.cpp
and I want to use a regular expression to separate these two. I want to filter out test_ExampleSchemeConfig, but the following expression doesn't work:
test_.*(?!(\.(cpp|hpp)))$
How can I fix it?
I believe the answer to my question must be out there somewhere, but I have had no luck finding it.
Thanks much!
You may consider using one of the following regular expressions:
^test_[^.]*$
It will match a string starting with test_ and then having any 0+ chars other than a dot. See the regex demo.
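This sketch applies the no-dot pattern to the two file names from the question, plus one hypothetical extra. Any name that contains a dot is rejected, whatever the extension.

```python
import re

# test_ followed only by non-dot characters up to the end of the string
NO_DOT = re.compile(r"^test_[^.]*$")

names = ["test_ExampleSchemeConfig", "test_ExampleSchemeConfig.cpp", "test_Foo.h"]
kept = [n for n in names if NO_DOT.search(n)]
print(kept)  # ['test_ExampleSchemeConfig']
```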
Or, you may use
^test_.*$(?<!\.[ch]pp)
It will match any string starting with test_ and then having any 0+ chars, but not ending with .cpp or .hpp. See the regex demo.
If your regex engine does not support lookbehind, use the equivalent pattern with a lookahead:
^test_(?!.*\.[ch]pp$).*$
This regex matches test_, then makes sure there are no 0+ chars (other than line break chars) followed by ., c or h and then pp at the end of the string, and then grabs the whole line. See the regex demo.
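Here the lookbehind and lookahead patterns are run side by side on the question's names plus two hypothetical ones. Unlike the no-dot pattern, both accept other extensions such as .h and reject only .cpp and .hpp.

```python
import re

BEHIND = re.compile(r"^test_.*$(?<!\.[ch]pp)")   # negative lookbehind at the end
AHEAD = re.compile(r"^test_(?!.*\.[ch]pp$).*$")  # negative lookahead after test_

names = ["test_ExampleSchemeConfig", "test_ExampleSchemeConfig.cpp",
         "test_Foo.hpp", "test_Foo.h"]
for n in names:
    print(n, bool(BEHIND.search(n)), bool(AHEAD.search(n)))
```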

Regular expression for variable routes

I have the following directory:
Videos/common/Project/Project01/video.project_01.StatusOK/video.project_01.StatusOK.csproj
The regular expression I use to extract only the last part of the path (video.project_01.StatusOK.csproj) is the following:
([\w|.])/Project/([\w|.|\s])/([\w|.|\s])/([\w|.|\s])([.]*)
The problem is that if the path varies, i.e. if there is another directory before video.project_01.StatusOK.csproj, as in Videos/common/Project/Project01/video.project_01.StatusOK/test/video.project_01.StatusOK.csproj, I extract 'test' instead.
Could someone help me with a regular expression for Java that always extracts the last part, the one containing '.csproj', whatever the path?
Regards, and thank you very much
Try this Regex:
(?<=\/)[^\/]+csproj
Explanation:
(?<=\/) - positive lookbehind to find the position immediately preceded by a /
[^\/]+ - matches 1+ occurrences of any character that is not a /
csproj - matches csproj literally
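The asker uses Java, but this sketch checks the same pattern with Python's re on both paths from the question. re.search returns the leftmost match, and only the last segment can end in csproj.

```python
import re

# (?<=\/) position after a slash, [^\/]+ one path segment, then "csproj"
LAST_CSPROJ = re.compile(r"(?<=\/)[^\/]+csproj")

paths = [
    "Videos/common/Project/Project01/video.project_01.StatusOK/"
    "video.project_01.StatusOK.csproj",
    "Videos/common/Project/Project01/video.project_01.StatusOK/test/"
    "video.project_01.StatusOK.csproj",
]
found = [LAST_CSPROJ.search(p).group() for p in paths]
print(found)
```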
In case you are unaware, Java 7 introduced NIO.2, which brought a new interface, java.nio.file.Path. You can break up the path to your directory and then apply a regular expression to each part of the path.
Oracle's Java Tutorial has a section on Path Operations
(There is also a section on Regular Expressions)
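The same split-then-match idea can be sketched in Python with pathlib.PurePosixPath. This is a swapped-in analogue, not Java's java.nio.file.Path API. .parts yields each path component, so the regex only has to look at one component at a time.

```python
import re
from pathlib import PurePosixPath

path = PurePosixPath(
    "Videos/common/Project/Project01/video.project_01.StatusOK/test/"
    "video.project_01.StatusOK.csproj")

# Test each component separately instead of the whole path string
csproj_parts = [part for part in path.parts if re.fullmatch(r".+\.csproj", part)]
print(csproj_parts)  # ['video.project_01.StatusOK.csproj']
print(path.name)     # video.project_01.StatusOK.csproj
```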
If you want to require /Project/ in the path, you could try this:
.*?/Project/.*?(?<=\/)([\w+. ]+\.csproj)$
That would match
Match any character zero or more times, non-greedy (.*?)
Match /Project/
Match any character zero or more times, non-greedy (.*?)
A positive lookbehind that asserts that what precedes is a forward slash (?<=\/)
Open a capturing group ( that will contain your match
A character class [\w+. ]+ that matches one or more word characters, + signs, dots or spaces
Match .csproj literally \.csproj
Close the capturing group )
Assert the end of the string $
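Here is a Python check of this /Project/-anchored pattern. It is used as a stand-in for Java, since the lookbehind and character class behave the same way here. The third path is a hypothetical one without /Project/.

```python
import re

WITH_PROJECT = re.compile(r".*?/Project/.*?(?<=\/)([\w+. ]+\.csproj)$")

paths = [
    "Videos/common/Project/Project01/video.project_01.StatusOK/"
    "video.project_01.StatusOK.csproj",
    "Videos/common/Project/Project01/video.project_01.StatusOK/test/"
    "video.project_01.StatusOK.csproj",
]
names = [WITH_PROJECT.search(p).group(1) for p in paths]
print(names)

# Without /Project/ in the path there is no match at all
print(WITH_PROJECT.search("Videos/other/x.csproj"))  # None
```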