Regex to find X words only when a word doesn't exist

I'm doing some code refactoring and, as there is quite a lot of it, I'm using WebStorm's regex find in files to see which files still need refactoring.
I know that (?:^|(?=[^']).\b)(this.user|this.isVatRegistered|showStatsInNet)\b will show all files containing one of those bits of code.
And according to this post, "Match string that doesn't contain a specific word", ^(?!.*UserMixin).*$ should use a negative lookahead to match only when that word doesn't exist.
My problem is I don't know how to combine them. Would someone be able to provide some guidance please?
I've tried combining like so: (?:^|(?=[^']).\b)(?!.*UserMixin)(this.user|this.isVatRegistered|showStatsInNet)\b but to no avail.
TL;DR How do I match on X number of words only when another word isn't present?

You have to start with a negative lookahead anchored at the beginning of the input, then try to match those substrings:
\A(?![\d\D]*?UserMixin)[\d\D]*?\b(?:this\.(?:user|isVatRegistered)|showStatsInNet)\b
This can be time-consuming, though, since both [\d\D]*? occurrences move the cursor through the file content character by character, potentially all the way to the end.
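As a rough illustration outside WebStorm, the same pattern can be tried with Python's re module. This is only a sketch: the function name needs_refactoring and the two sample file contents are made up for the example, and it assumes the whole file is passed to the regex as a single string.

```python
import re

# The pattern from the answer: fail immediately if UserMixin occurs anywhere
# in the input, otherwise look for one of the legacy identifiers.
PATTERN = re.compile(
    r"\A(?![\d\D]*?UserMixin)"
    r"[\d\D]*?\b(?:this\.(?:user|isVatRegistered)|showStatsInNet)\b"
)


def needs_refactoring(source: str) -> bool:
    """Return True when a legacy identifier is used and UserMixin is absent."""
    return PATTERN.search(source) is not None


# Hypothetical file contents, invented for this example.
legacy = "export default {\n  name() { return this.user.name }\n}\n"
migrated = (
    "import UserMixin from './UserMixin'\n"
    "export default { mixins: [UserMixin], name() { return this.user.name } }\n"
)
unrelated = "export default { name() { return 'n/a' } }\n"

print(needs_refactoring(legacy))     # legacy identifier, no UserMixin
print(needs_refactoring(migrated))   # UserMixin present, so no match
print(needs_refactoring(unrelated))  # nothing to refactor
```

Because the lookahead sits directly after \A, it is evaluated once for the whole input, which is what makes the exclusion apply to the entire file rather than to a single line.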

Related

Regular Expression: Two words in any order but with a string between?

I want to use positive lookaheads so that the regex will pick up two words from two different sets in any order, with a string of length 1 to 20 between them that is always in the middle.
The pattern is already case-insensitive and allows any number of characters, including zero, before the first word found and the same after the second word found. I am unsure whether it is more correct to terminate it with $.
Without the any-order matching, I have got as far as:
(?i:.*(new|launch|releas)+.{1,20}(product1|product2)+.*)
I have attempted to add any order matching with the following but it only picks up the first word:
(?i:.*(?=new|launch|releas)+.{1,20}(?=product1|product2)+.*)
I thought perhaps this was because of the +.{1,20} in the middle, but I am unsure how it could work if I added this to both sets instead. For instance, it could cause a problem if the first word is the very first part of the source text being parsed, so that there is no character before it.
I have seen examples where \b is used with lookaheads, but that also seems like it may cause a problem, as I want it to match both when the first word is at the start of the source text and when it is not.
How should I edit my regex here, please?

Notepad++ Regex Search XML argument for anything but certain word

I have a well structured XML file with several grouped units, which contain a consistent number of child elements.
I am trying to find a way, through regex in Notepad++, to search all of these groups for a certain argument that contains a single word. I have found a way of doing this, but the problem is that I want to find the negation of this word: for instance, if the word is "downward", I want to find anything that is NOT "downward".
Here is an example:
<xml:jus id="84" trek="spanned" place="downward">
I've come up with <xml:jus id="\d+" trek="[\w]*" place="\<downward"> to find these tags, but I need to find all other matches that do not have "downward" in the place= argument. I tried <xml:jus id="\d+" trek="[\w]*" place="^\<downward"> but without success.
Any help is appreciated.
If the properties and the string are always in the same format, you could also make use of SKIP FAIL to first match what you want to exclude and then discard it.
<xml:jus id="\d+" trek="\w+" place="downward">(*SKIP)(*F)|<xml:jus id="\d+" trek="\w+" place="[^"]+">
You might be able to use a negative lookahead to exclude downward from being the place:
<[^>]+ place="(?!downward").*?"[^>]*>
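The lookahead variant can be checked quickly in Python. This is a minimal sketch with made-up sample tags. Python's built-in re module is used only for convenience; it does not support the (*SKIP)(*F) verbs, so only the lookahead pattern is shown.

```python
import re

# Negative lookahead right after place=" rejects the exact value downward
# (the closing quote inside the lookahead keeps values like "downwards").
NOT_DOWNWARD = re.compile(r'<[^>]+ place="(?!downward").*?"[^>]*>')

# Hypothetical sample tags, modelled on the one in the question.
tags = [
    '<xml:jus id="84" trek="spanned" place="downward">',
    '<xml:jus id="85" trek="spanned" place="upward">',
    '<xml:jus id="86" trek="linked" place="downwards">',
]

matches = [tag for tag in tags if NOT_DOWNWARD.search(tag)]
for tag in matches:
    print(tag)
```

Only the tags whose place value is something other than exactly downward are printed.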

Regex taking too many characters

I need some help with building up my regex.
What I am trying to do is match a specific part of text with unpredictable parts in between the fixed words. An example is the sentence one gets when replying to an email:
On [date] at [time] person [name] has written:
The bracketed parts are variable; they might contain spaces, or a new line might start at that point.
To get this, I built up my regex as such: On[\s\S]+?at[\s\S]+?person[\s\S]+?has written:
Basically, the [\s\S]+? is supposed to stand in for any letter, number, space or line break, as I am unable to predict what could be between the fixed words that I am sure will always be there.
Now comes the hard part: when I add the word "On" somewhere in the text above the sentence that I want to match, the regex matches a much bigger text than I want. This is due to the use of [\s\S]+.
How can I make my regex match as few characters as possible? Using "?" after the "+" to make it lazy does not help.
An example with the words "From - This - Point - Everything:" is here. Case is ignored.
Correct: https://regexr.com/3jdek.
Wrong because of added "From": https://regexr.com/3jdfc
The regex is to be used in VB.NET
A more real-life example, with HTML tags, can be found here. Here, I avoided using [\s\S]+? or (.+)?(\r)?(\n)?(.+?)
Correct: https://regexr.com/3jdd1
Wrong: https://regexr.com/3jdfu after adding certain parts of the regex to the text above. In HTML this is barely possible, as the user would never write the matching tag himself, but I do want to make sure my regex is correct, just in case.
These things are certain: I know what the part of the text starts with, no matter where it sits in the entire text; I know what it ends with; and there are specific fixed words that might make the regex more reliable, but they can be omitted. Any text below the searched part may also be matched, but no text above it may be matched at all.
Another example where it goes wrong: https://regexr.com/3jdli. Basically, there is less to go on in this text, so the regex has fewer tokens to work with. Adding just the first < already makes the regex take too much.
In my own experience, most problems are avoided when I make sure not to use any [\s\S]+? before a (\r)?(\n)? has come first.
[\s\S] matches every character because it is the union of two complementary sets; it behaves like . with the /s option (dot matches newlines). Quantifiers are greedy by default, so the largest possible match is returned. Making them lazy does not change where the match starts: the engine still begins at the leftmost position that can lead to a match.
Following the "Correct" link, the token right after the shortest match must be geschreven. So another way to write this without lazy quantifiers, and a more flexible one, is to put a negative lookahead in front of the repeated character inside the loop,
so
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft(.+?(?=geschreven))geschreven:
becomes
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft((?:(?!geschreven).)+)geschreven:
(?: ) is a non-capturing group that just wraps the negative lookahead and the . (which can be replaced by [\s\S]).
(?! ) inside it is the negative lookahead, which ensures that the end token does not begin at the current position before the next character is consumed.
Following the comments, you can state explicitly what must not appear in each repeating sequence:
From(?:(?!this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!this|point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
To understand what the technique (?:(?!tokens)[\s\S])+ does:
in the first, this can't appear between From and this
in the second, From or this can't appear between From and this
in the third, additionally, this or point can't appear between this and point
etc.
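The difference between the lazy pattern and the lookahead-in-a-loop version can be seen in a small Python sketch. The sample text is invented for the example, and Python is used only for illustration (the question targets VB.NET, where the same constructs exist).

```python
import re

# Made-up text: an unrelated "From" appears above the sentence to be matched.
text = (
    "From the top of the mail\n"
    "some quoted text\n"
    "From - This - Point - Everything:"
)

# Lazy version from the question: still starts at the first "From".
lazy = re.compile(
    r"From[\s\S]+?this[\s\S]+?point[\s\S]+?everything:", re.IGNORECASE
)

# Second variant from the answer: neither From nor this may occur
# between From and this, so the earlier "From" cannot start the match.
tempered = re.compile(
    r"From(?:(?!From|this)[\s\S])+this"
    r"(?:(?!point)[\s\S])+point"
    r"(?:(?!everything)[\s\S])+everything:",
    re.IGNORECASE,
)

lazy_match = lazy.search(text)
tempered_match = tempered.search(text)

print(repr(lazy_match.group()))      # spans all three lines
print(repr(tempered_match.group()))  # only the last line
```

The lazy pattern matches from the very first character of the text, while the second pattern matches just "From - This - Point - Everything:".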

Create Regex to find whole word only if includes one string and excludes another

I'm trying to create a regex that returns a word ONLY if it includes one string and excludes another.
For instance, if I'm looking for the word "want" and not "dontwant" it would find the words word_want_other but would NOT find the word word_want_other_dontwant.
Currently, I'm trying to get it to work with a negative lookahead (see below), but this keeps on finding the words with "dontwant" in them.
.*want.*(?!dontwant).*
This works:
(?=.*want)(?!.*dontwant).*
Starting from the beginning of the line, it checks that the line contains want and does not contain dontwant; if both conditions hold, it matches the whole line.
I'm not completely sure why the original idea didn't work; once I had something that DOES work, I tried to go back and find something in between. Sorry there isn't a better explanation of what was going wrong in the original.
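Both patterns can be compared in a short Python sketch (the third test word is made up). One likely explanation for the original pattern's behaviour is that (?!dontwant) is only checked at a single position, and the surrounding .* parts are free to backtrack until the lookahead sits somewhere dontwant does not start, for example at the very end of the word. The sketch uses re.match so that both patterns are tried only from the start of each word; that anchoring is an assumption of this example, not something stated in the answer.

```python
import re

original = re.compile(r".*want.*(?!dontwant).*")
fixed = re.compile(r"(?=.*want)(?!.*dontwant).*")

words = ["word_want_other", "word_want_other_dontwant", "word_other"]

# re.match anchors at the start of the string, so the lookaheads in the
# fixed pattern inspect the whole word.
original_hits = [w for w in words if original.match(w)]
fixed_hits = [w for w in words if fixed.match(w)]

print(original_hits)  # still includes the word containing dontwant
print(fixed_hits)     # only the word with want and without dontwant
```

With an unanchored search instead of re.match, the fixed pattern could still match a tail of the unwanted word that begins after the d of dontwant, so anchoring the pattern (for instance with ^) is worth considering.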

How to eliminate dots from filenames, except for the file extension

I have a bunch of files that look like this:
A.File.With.Dots.Instead.Of.Spaces.Extension
Which I want to transform via a regex into:
A File With Dots Instead Of Spaces.Extension
It has to be in one regex (because I want to use it with Total Commander's batch rename tool).
Help me, regex gurus, you're my only hope.
Edit
Several people suggested two-step solutions. Two steps really make this problem trivial, and I was really hoping to find a one-step solution that would work in TC. I did, BTW, manage to find a one-step solution that works as long as there's an even number of dots in the file name. So I'm still hoping for a silver bullet expression (or a proof/explanation of why one is strictly impossible).
It appears Total Commander's regex library does not support lookaround expressions, so you're probably going to have to replace a number of dots at a time, until there are no dots left. Replace:
([^.]*)\.([^.]*)\.([^.]*)\.([^.]*)$
with
$1 $2 $3.$4
(Repeat the sequence and the number of backreferences for more efficiency. You can go up to $9, which may or may not be enough.)
It doesn't appear there is any way to do it with a single, definitive expression in Total Commander, sorry.
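The repeated replacement can be simulated in Python to see how many passes a name needs. This is only a sketch of the idea: the helper name rename_in_passes and the pass limit are invented for the example, and Python writes the backreferences as \1 rather than $1.

```python
import re

# Same expression as above: three dots counted from the end of the name.
THREE_DOTS = re.compile(r"([^.]*)\.([^.]*)\.([^.]*)\.([^.]*)$")


def rename_in_passes(name: str, max_passes: int = 20) -> str:
    """Apply the replacement repeatedly until the name stops changing."""
    for _ in range(max_passes):
        renamed = THREE_DOTS.sub(r"\1 \2 \3.\4", name)
        if renamed == name:
            break
        name = renamed
    return name


print(rename_in_passes("A.File.With.Dots.Instead.Of.Spaces.Extension"))
# Each pass removes two dots, so a name with an even number of dots
# ends up with one extra dot left over.
print(rename_in_passes("a.b.c.d.e"))
```

The first name needs three passes; the second one stops at "a.b c d.e", which shows why the number of dots handled per pass matters.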
Basically:
/\.(?=.*?\.)//
will do it in pure regex terms. This means: replace with nothing any period that is followed by a (non-greedy) run of characters and then another period. This is a positive lookahead.
In PHP this is done as:
$output = preg_replace('/\.(?=.*?\.)/', '', $input);
Other languages vary but the principle is the same.
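For comparison, here is a minimal Python sketch of the same lookahead. It assumes the dots should become spaces, as in the question, rather than being removed; the function name dots_to_spaces is made up for the example.

```python
import re


def dots_to_spaces(filename: str) -> str:
    """Replace every dot that still has another dot somewhere after it."""
    return re.sub(r"\.(?=.*?\.)", " ", filename)


print(dots_to_spaces("A.File.With.Dots.Instead.Of.Spaces.Extension"))
print(dots_to_spaces("archive.tar"))  # a single dot is left alone
```

The last dot is never followed by another dot, so the lookahead fails there and the extension separator survives.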
Here's one based on your almost-solution:
/\.([^.]*(\.[^.]+$)?)/\1/
This is, roughly, "any dot stuff, minus the dot, and maybe plus another dot stuff at the end of the line." I couldn't quite tell if you wanted the dots removed or turned to spaces - if the latter, change the substitution to " \1" (minus the quotes, of course).
[Edited to change the + to a *, as in Helen's answer below.]
Or substitute all dots with space, then substitute [space][Extension] with .[Extension]
A.File.With.Dots.Instead.Of.Spaces.Extension
to
A File With Dots Instead Of Spaces Extension
to
A File With Dots Instead Of Spaces.Extension
Another pattern to find all dots but the last in a (windows) filename that I've found works for me in Mass File Renamer is:
(?!\.\w*$)\.
I don't know how useful that is to other users, but this page was an early search result and if that had been on here it would have saved me some time.
It excludes the dot if it's followed by an uninterrupted sequence of word characters (\w: letters, digits and underscore) leading to the end of the input (filename), but otherwise finds all instances of the dot character.
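A quick way to try this negative-lookahead pattern is a Python sketch like the one below. The file names are made up, the helper name is hypothetical, and replacing with a space is an assumption based on the question.

```python
import re

ALL_BUT_LAST_DOT = re.compile(r"(?!\.\w*$)\.")


def dots_to_spaces_negative(filename: str) -> str:
    """Replace dots unless only word characters follow up to the end."""
    return ALL_BUT_LAST_DOT.sub(" ", filename)


print(dots_to_spaces_negative("A.File.With.Dots.Instead.Of.Spaces.Extension"))
# Caveat: \w does not cover characters such as '-', so an extension like
# "tar-gz" is not recognised and its dot is replaced as well.
print(dots_to_spaces_negative("report.final.tar-gz"))
```

The second call shows the limit of relying on \w for the extension: the final dot is only protected when the extension consists purely of word characters.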
You can do that with Lookahead. However I don't know which kind of regex support you have.
/\.(?=.*\.)//
This roughly translates to: any dot /\./ that has something and another dot after it. Obviously, the last dot is the only one that does not comply. I leave out the "optionality" of something between dots, because the data looks like there will always be something in between, and the "optionality" has a performance cost.
Check:
http://www.regular-expressions.info/lookaround.html