match part of string but exclude certain file extension - regex

I have the following two files:
test_ExampleSchemeConfig
test_ExampleSchemeConfig.cpp
and I want to use the following regular expression to separate these two. I want
to filter out test_ExampleSchemeConfig and the following expression doesn't work:
test_.*(?!(\.(cpp|hpp)))$
I'm wondering how I can fix it.
I believe the answer to my question is out there somewhere, but I have had no luck finding it.
Thanks much!

You may consider using one of the following regular expressions:
^test_[^.]*$
It will match a string starting with test_ and then having any 0+ chars other than .. See the regex demo.
Or, you may use
^test_.*$(?<!\.[ch]pp)
It will match any string starting with test_ and then having any 0+ chars, but not ending with .cpp or .hpp. See the regex demo.
If your regex engine does not support lookbehind, use the equivalent pattern with a lookahead:
^test_(?!.*\.[ch]pp$).*$
This regex matches test_, then the negative lookahead makes sure the rest of the string does not end with ., c or h, and then pp (that is, with .cpp or .hpp), and then .* grabs the rest of the line. See the regex demo.
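As a rough check of the three patterns above, here is a minimal Python sketch. The names test_Other.hpp and test_notes.txt are made up for illustration, and keep() is a hypothetical helper, not part of any library. It shows that the first pattern rejects every name containing a dot, while the other two reject only .cpp and .hpp.

```python
import re

# The two names from the question plus two hypothetical extras.
names = [
    "test_ExampleSchemeConfig",
    "test_ExampleSchemeConfig.cpp",
    "test_Other.hpp",
    "test_notes.txt",
]

patterns = {
    "no_dot": r"^test_[^.]*$",                 # no dot allowed at all
    "lookbehind": r"^test_.*$(?<!\.[ch]pp)",   # must not end in .cpp/.hpp
    "lookahead": r"^test_(?!.*\.[ch]pp$).*$",  # same rule, lookahead form
}

def keep(pattern, candidates):
    """Return the candidates that the pattern matches."""
    return [c for c in candidates if re.search(pattern, c)]

results = {label: keep(p, names) for label, p in patterns.items()}
for label, kept in results.items():
    print(label, kept)
```

Which variant fits depends on whether names with other extensions, such as the hypothetical test_notes.txt, should pass.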

Related

Regex - Pattern until double \n

Somehow I am not able to find anything online about how to set a pattern ending to a double \n. My particular case is the following. I have this string:
"1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
And I would like to extract only the texts between digit\n and \n\n. So, in my case, I'd like to have
This is said \nby Matt.
While this is said by Lucas
Although I am not very skilled with RegEx, I tried many combinations such as (?<=\d\n).*?(?=\n\n), (?<=\d\n).\n\n and (?<=\d\n).*?(?=\r\n\r\n) but without any luck.
I have tried those as well as others with R's stringr library, but also with python's re.
The issue first came up in this answer: https://stackoverflow.com/a/72547966/19284124
You can make the . match across lines with the (?s) inline modifier and extend the double newline pattern to alternatively match the end of string:
(?s)(?<=\d\n).*?(?=\n\n|\Z)
See the regex demo.
Details:
(?s) - a flag allowing . match line break chars
(?<=\d\n) - a positive lookbehind that matches a location that is immediately preceded with a digit and a newline
.*? - any zero or more chars, as few as possible
(?=\n\n|\Z) - a positive lookahead that matches a location that is immediately followed with two newline chars or end of string.
This regex is more efficient, and it is a variant that works in many regex flavors, such as JavaScript, PHP, Python, Java and .NET, because it avoids (?s) and \Z or \z:
(?<=\d\n)(?:.*\n)*?.*(?=\n\n|$)
Make sure to use it without MULTILINE mode.
RegEx Demo
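To compare the two patterns from this thread, here is a small Python sketch using the sample string from the question. The variable names are only illustrative. Both patterns are run with re.findall and without the MULTILINE flag.

```python
import re

# Sample subtitle text from the question.
text = (
    "1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n"
    "2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
)

# Variant 1: (?s) lets . cross line breaks; \Z is the end of the string.
dotall_pattern = r"(?s)(?<=\d\n).*?(?=\n\n|\Z)"

# Variant 2: no (?s); whole lines are consumed lazily with (?:.*\n)*?
# Must be used without re.MULTILINE so that $ only matches at the end.
portable_pattern = r"(?<=\d\n)(?:.*\n)*?.*(?=\n\n|$)"

dotall_matches = re.findall(dotall_pattern, text)
portable_matches = re.findall(portable_pattern, text)

for block in dotall_matches:
    print(repr(block))
```

On this input both variants should return the same two text blocks.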

Regex lookahead. Find word without .min. in string

I'm trying to replace a link in a html file with regex and nodejs. I want to replace links without a .min.js extension.
For example, it should match "common.js" but not "common.min.js"
Here's what I've tried:
let htmlOutput = html.replace(/common\.(?!min)*js/g, common.name);
I think this negative lookahead should work but it doesn't match anything. Any help would be appreciated.
The (?!min)*js part is invalid: you should not quantify zero-width assertions such as lookaheads (they do not consume text, so a quantifier after them is either treated as a user error or ignored). Since js does not start with min, this lookahead is redundant even without the quantifier.
If you want to match a string with a whole word common, then having any chars and ending with .js but not .min.js you need
/\bcommon\b(?!.*\.min\.js$).*\.js$/
See the regex demo.
Details:
\b - word boundary
common - a substring
\b - word boundary
(?!.*\.min\.js$) - immediately to the right, there should not be any 0 or more chars followed with .min.js at the end of the string
.* - any 0 or more chars
\.js - a .js substring
$ - end of string.
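The breakdown above can be tried out with a short sketch. It is written in Python rather than Node.js, on the assumption that the pattern behaves the same in both engines for these inputs, and it tests bare file names, not a whole HTML document (the $ anchor needs the string to end in .js).

```python
import re

# Same pattern as above, written as a Python raw string.
pattern = re.compile(r"\bcommon\b(?!.*\.min\.js$).*\.js$")

samples = [
    "common.js",
    "common.min.js",
    "common.9cf5748e0e7fc2928a07.js",
    "~/dist/common.min.js",
]

# Keep only the names the pattern accepts.
matched = [s for s in samples if pattern.search(s)]
print(matched)
```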
Here, a simpler expression might be enough: allow any chars other than . after the word common, followed by .js:
common([^\.]+)?\.js
Demo
The end regex I'm using is /\bcommon[^min]+js\b/g
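This will find the word common followed by one or more characters other than m, i or n, and ending in js. Note that [^min] is a character class that excludes those three letters individually rather than the word min, which is enough to skip .min. and lets me replace scripts on my HTML page like: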
script src="~/dist/common.js"
OR
script src="~/dist/common.9cf5748e0e7fc2928a07.js"
Thanks to Wiktor Stribiżew for helping me.
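One caveat about [^min] is worth checking before reusing this pattern: it is a negated character class, so any name containing m, i or n between common and js is rejected. A minimal Python sketch, where common.bundle.js is a made-up name used only to show the effect:

```python
import re

# The pattern from the post above, as a Python raw string.
final_pattern = re.compile(r"\bcommon[^min]+js\b")

names = [
    "common.js",
    "common.9cf5748e0e7fc2928a07.js",
    "common.min.js",
    "common.bundle.js",  # hypothetical name containing the letter n
]

# True where the pattern finds a match, False otherwise.
hits = {name: bool(final_pattern.search(name)) for name in names}
for name, hit in hits.items():
    print(name, hit)
```

The hypothetical common.bundle.js is skipped even though it has nothing to do with .min., so the lookahead version is the safer choice if such names can occur.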

PCRE Regex Match /x... but not /y/x

When configuring redirections, it's common to run into multiple pages that include some of the same path strings. We've run into this situation multiple times where we need to redirect:
https://example.com/x...
But not:
https://example.com/y/x...
To match the /x... we use PCRE regex of:
/x.*
We've been struggling to get the exclude to match correctly; we apologize in advance as our regex is a bit weak, here's our pseudo code:
Match all /x... except /y/x...
Here is what we thought that looked like:
^\/(?!y\/).x.*
In our mind that reads:
Any query starting with /x..., except starting with /y/x...
Thank you in advance, and please feel free to suggest better formatting, we are not stack overflow pros.
Your regex matches a forward slash at the start of the string and then uses a negative lookahead to check that what follows is not y/. If that is true, it matches any single character, then x, and then 0+ characters. That will match, for example, //x///
Leaving aside matching the rest of the URL, one way could be to use a negative lookahead (?! to check that what is on the right side does not contain /y/x and then match any characters:
^(?!.*/y/x).+
Regex demo
You may use a negative lookbehind assertion:
~(?<!/y)/x~
RegEx Demo
(?<!/y) is a negative lookbehind assertion that will fail the match if /y appears immediately before /x.
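Here is a quick way to compare the patterns from this thread. It is a sketch in Python's re rather than PCRE, on the assumption that these particular constructs behave the same in both, and the x-page paths are invented for illustration.

```python
import re

# Hypothetical URLs; the "x-page" part is made up for this example.
wanted = "https://example.com/x-page"
unwanted = "https://example.com/y/x-page"

# Original attempt: after the leading slash, "." consumes one character
# before x, so it matches "//x///" but not a plain "/x-page" path.
attempt = re.compile(r"^\/(?!y\/).x.*")
attempt_hits = [bool(attempt.match(p)) for p in ("//x///", "/x-page")]

# Suggested fixes from the two answers.
no_yx_anywhere = re.compile(r"^(?!.*/y/x).+")
not_after_y = re.compile(r"(?<!/y)/x")

lookahead_hits = [bool(no_yx_anywhere.match(u)) for u in (wanted, unwanted)]
lookbehind_hits = [bool(not_after_y.search(u)) for u in (wanted, unwanted)]

print(attempt_hits, lookahead_hits, lookbehind_hits)
```

Note that ^(?!.*/y/x).+ accepts any string that lacks /y/x, including URLs with no /x at all, so it only makes sense when the redirect rule already restricts the candidates.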

Regular Expression Should not start with a character and contain a sequence

For example, should not start with h and should contain ap.
Should match apology, rap god, trap but not match happy.
I tried
^[^h](ap)*
but it doesn't match sequences which start with ap like apology.
You may use
^(?!h).*ap
See the following demo. To match the whole string to the end, append .* at the end:
^(?!h).*ap.*
If you plan to only match words following the rules you outlined, you may use
\b(?!h)\w*ap\w*
Or, without a lookahead:
\b([^\Wh]\w*)?ap\w*
See this regex demo and the demo without a lookahead.
@WiktorStribiżew's comment with negative lookahead is correct (you might want to add .* to it if you want to match the whole string).
For completeness, you can also use alternation:
^(?:[^h].*ap|ap).*
Demo: https://regex101.com/r/ecVTGm/1
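The patterns suggested in this thread can be checked against the four examples from the question with a short Python sketch. The variable names are illustrative, and the word-level patterns are run over the examples joined into one line.

```python
import re

samples = ["apology", "rap god", "trap", "happy"]

# Whole-string patterns: must not start with h and must contain ap.
lookahead = re.compile(r"^(?!h).*ap.*")
alternation = re.compile(r"^(?:[^h].*ap|ap).*")

by_lookahead = [s for s in samples if lookahead.search(s)]
by_alternation = [s for s in samples if alternation.search(s)]

# Word-level patterns, applied to all samples joined by spaces.
line = " ".join(samples)
words = [m.group(0) for m in re.finditer(r"\b(?!h)\w*ap\w*", line)]
words_no_lookahead = [
    m.group(0) for m in re.finditer(r"\b([^\Wh]\w*)?ap\w*", line)
]

print(by_lookahead, by_alternation, words, words_no_lookahead)
```

The word-level variants return rap rather than rap god, since they match single words only.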

Regex Negative Lookbehind Matches Lookbehind text .NET

Say I have the following strings:
PB-GD2185-11652-MTCH
GD2185-11652-MTCH
KD-GD2185-11652-MTCH
KD-GD2185-11652
I want Regex.IsMatch to return true if the string has MTCH in it and does not start with PB.
I expected the regex to be the following:
^(?<!PB)\S+(?=MTCH)
but that gives me the following matches:
PB-GD2185-11652-
GD2185-11652-
KD-GD2185-11652-
I do not understand why the negative lookbehind not only doesn't exclude the match but includes the PB characters in the match. The positive lookahead works as expected.
EDIT 1
Let me start with a simpler example. The following regex matches all of the strings as I would expect it to:
\S+
The following regex still matches all of the strings even though I would expect it not to:
\S+(?!MTCH)
The following regex matches all but the final H character on the first three strings:
\S+(?<!MTCH)
From the documentation at regex101, a lookahead looks for text to the right of the pattern and a lookbehind looks for text to the left of the pattern, so having a lookahead at the beginning of a string does not jibe with the documentation.
Edit 2
take another example with the following three strings:
grey
greyhound
hound
the regex:
^(?<!grey)hound
only matches the final hound. whereas the regex:
^(?<!grey)\S+
matches all three.
You need a lookahead: ^(?!PB)\S+(?=MTCH). Placed right after ^, the lookbehind checks whether PB comes before the first character of the string, which can never happen, so it always succeeds.
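The difference is easy to see side by side. This is a sketch in Python's re instead of .NET, on the assumption that both engines treat these two patterns the same way, and first_match() is a hypothetical helper:

```python
import re

samples = [
    "PB-GD2185-11652-MTCH",
    "GD2185-11652-MTCH",
    "KD-GD2185-11652-MTCH",
    "KD-GD2185-11652",
]

def first_match(pattern, s):
    """Return the matched text, or None when there is no match."""
    m = re.search(pattern, s)
    return m.group(0) if m else None

# Lookbehind at ^: nothing precedes the start, so it never rejects.
with_lookbehind = [first_match(r"^(?<!PB)\S+(?=MTCH)", s) for s in samples]

# Lookahead at ^: inspects the first two characters of the string.
with_lookahead = [first_match(r"^(?!PB)\S+(?=MTCH)", s) for s in samples]

print(with_lookbehind)
print(with_lookahead)
```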
The problem was caused by the greediness of \S+. When dealing with lookarounds and greedy quantifiers, you can easily match more characters than you expect. One way to deal with this is to put a negative lookaround in a group with the greedy quantifier so that the unwanted text is excluded from the match, as described in this question:
How to non-greedy multiple lookbehind matches
and on this helpful website about greediness in regular expressions:
http://www.rexegg.com/regex-quantifiers.html
Note that this second link has a few other ways to deal with the greediness in various situations.
A good regular expression for this situation is as follows:
^(?<!PB)((?!PB)\S+)(MTCH)
In situations like this it is going to be much clearer to do it logically within the code: first check that the string matches MTCH and then that it doesn't match ^PB.
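A minimal sketch of that two-step check, written in Python rather than .NET. The function name is_wanted is made up for this example.

```python
import re

def is_wanted(s):
    """True when s contains MTCH and does not start with PB."""
    has_mtch = re.search(r"MTCH", s) is not None
    starts_with_pb = re.search(r"^PB", s) is not None
    return has_mtch and not starts_with_pb

samples = [
    "PB-GD2185-11652-MTCH",
    "GD2185-11652-MTCH",
    "KD-GD2185-11652-MTCH",
    "KD-GD2185-11652",
]

flags = [is_wanted(s) for s in samples]
print(flags)
```

Two simple checks are easier to read and change later than one pattern that combines a lookahead with a greedy quantifier.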