Regex to include some files but with one exception

I would like a regex that includes all filenames with a certain ending, e.g. ".err", but not if the filename starts with e.g. "test". In other words, include "*.err" files but not "test-whatever.err" files.
I have found that
(?!test.*\.err$).*\.err
excludes the test*.err files and that
.*\.err
includes all the *.err files, but I need them both in the same expression.
Also, the fact that ".err" can be written as ".ERR" or ".Err" must be taken into consideration for this regex to work properly for me.
All thoughts and ideas are appreciated!
Regards
Rickard

Use this
^(?i)(?!test).*\.err$
See it here online on Regexr
The important parts, that are different to yours:
Use anchors. ^ and $ are anchoring your pattern to the start and to the end of the string
(?i) makes it "ignorecase", so that err will also match "ERR" or "ErR" and test will also match "Test" and TEST ...
You didn't give the language, but these features should work in most flavours.
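As a quick check of the pattern above, here is a minimal Python sketch. The filenames are made up for illustration, and the ignore-case option is passed as re.IGNORECASE instead of inline, since Python expects an inline (?i) at the very start of the pattern.

```python
import re

# Same pattern as in the answer; the ignore-case flag is passed separately.
ERR_NOT_TEST = re.compile(r"^(?!test).*\.err$", re.IGNORECASE)

# Hypothetical filenames, for illustration only.
names = [
    "app.err",
    "server.ERR",
    "latest.Err",
    "test-whatever.err",
    "TEST-run.err",
    "notes.txt",
]

# Keep names that end in .err (any case) and do not start with "test".
kept = [name for name in names if ERR_NOT_TEST.match(name)]
print(kept)
```

Because the flag applies to the whole pattern, "TEST-run.err" is rejected just like "test-whatever.err".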

How about this one:
^(?!test).*\.err$

Related

Regex alternative to negative lookahead

I want to match all paths that include the keyword build unless they also contain .html
Here is a working regex that uses negative lookahead: https://regexr.com/4msck
I am using regex for path matching in unison which does not support negative lookahead. How can I replicate the functionality of the above regex without negative lookahead?
It is possible, but the resulting regex is pretty poor in terms of readability and maintainability.
http://regexr.com/4mst1
^(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))*build(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))*$
Explanation:
^ - start of string/line
(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))* - matches anything that does not contain .html
build - literally that string
(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))* - same as before
$ - end of string/line
According to the manual, this should work. It is based on the comment: "I want to ignore all files in a build directory except for html files"
ignore = Regex .*build.*
ignorenot = Name {*.html}
I am not familiar with unison, so I must assume that you can specify the paths with more than 1 rule.
I have this expectation because of this statement in the manual:
There is also an ignorenot preference, which specifies a set of patterns for paths that should not be ignored, even if they match an ignore pattern.
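To get a feel for the lookahead-free pattern, the sketch below runs it in Python next to a lookahead version on a few sample paths. The lookahead pattern is an assumption (the regexr pattern is not reproduced in the question), the paths are hypothetical, and Python's regex flavour may differ from unison's, so treat this as a spot check and not as proof that the two are equivalent.

```python
import re

# "Anything that does not contain .html", copied from the answer above.
NO_HTML = r"(?:[^\.\n]|\.(?:$|[^h\n]|h(?:$|[^t\n]|t(?:$|[^m\n]|m(?:$|[^l\n])))))*"
WITHOUT_LOOKAHEAD = re.compile("^" + NO_HTML + "build" + NO_HTML + "$")

# Assumed lookahead version of the same rule (not taken from the post).
WITH_LOOKAHEAD = re.compile(r"^(?!.*\.html).*build.*$")

# Hypothetical sample paths.
SAMPLE_PATHS = [
    "src/build/app.js",
    "src/build/index.html",
    "build/page.htm",
    "docs/readme.html",
    "notes/todo.txt",
]

# Map each path to (lookahead-free result, lookahead result).
results = {
    path: (
        WITHOUT_LOOKAHEAD.search(path) is not None,
        WITH_LOOKAHEAD.search(path) is not None,
    )
    for path in SAMPLE_PATHS
}

for path, (plain, lookahead) in results.items():
    print(path, plain, lookahead)
```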

Match anything.c but not anything.in.c

I'm trying to write a regex that matches a.c, hello.c, etc.c, but not a.in.c, hello.in.c, etc.in.c.
Here's my regex: https://regex101.com/r/jC8nB0/21
It says that the negative lookahead won't match what I specified, that is, .in.c. I didn't know where to teach it to match .c. I tried both inside the parenthesis and outside.
I've read Regex: match everything but specific pattern but it talks about matching everything except something, but I need to match a rule except other rule.
This worked for me.
.*(?<!(\.in))\.c
https://www.regular-expressions.info/lookaround.html
*Edited due to good information from zzxyz
This is actually a bit complicated given unknown input. The following isn't perfect, but it avoids .cpp files, and deals with strings that don't contain filenames, or longer strings that do.
\b\S+(?<!\.in)\.c\b
https://regex101.com/r/jC8nB0/286
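The negative lookbehind from the first answer can be tried in Python as a small sketch. It assumes each input is a single bare filename, which is why fullmatch is used; the extra name "main.cpp" is a made-up example.

```python
import re

# ".c" at the end, but only if it is not directly preceded by ".in".
C_NOT_IN_C = re.compile(r".*(?<!\.in)\.c")

# Filenames from the question plus one hypothetical .cpp file.
names = ["a.c", "hello.c", "etc.c", "a.in.c", "hello.in.c", "main.cpp"]

# fullmatch requires the whole name to match, so "main.cpp" is rejected.
matched = [name for name in names if C_NOT_IN_C.fullmatch(name)]
print(matched)
```

The lookbehind has a fixed width of three characters, which Python's re module requires.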

Regex to select path till a folder name

Given a string "C:\Tom\Dick\Harry\Chocolate\Treat\Hunt\Fruitless" I have to select anything which appears before Treat.
I have tried with
(.*)\\Treat
but it includes the Treat word also.
Result is "C:\Tom\Dick\Harry\Chocolate\Treat".
Any help will be much appreciated.
You could use a lookahead in the regex if you don't want to include the word \Treat.
.*(?=\\Treat)
DEMO
OR
If you want to include the word Treat then try the below regex,
^.*?\\Treat
DEMO
(.*?)(?:Treat).*
This simple re should do it.
See demo
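For comparison, here is a short Python sketch of two ways to get the part before \Treat: the lookahead from the first answer, and the capture group from the question's own attempt. The path is the one given in the question.

```python
import re

path = r"C:\Tom\Dick\Harry\Chocolate\Treat\Hunt\Fruitless"

# Lookahead: \Treat must follow, but it is not part of the match.
with_lookahead = re.search(r".*(?=\\Treat)", path).group(0)

# Capture group: \Treat is matched, but only group 1 is read.
with_group = re.search(r"(.*)\\Treat", path).group(1)

print(with_lookahead)
print(with_group)
```

Both variants give the same text here; the original attempt only looked wrong because the whole match was read instead of the first group.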

Regular Expression Union

I am trying to have a union of regular expression such that it excludes *.log file and includes *.pl file by
^(.(?!(.log)))|^(.*\.pl)$
What would be the correct syntax for that?
You are close. Your lookahead would look like this.
^(?!.*\.log).*\.pl$
However, your regex is anchored so you could simply match the ones that end in .pl.
^.*\.pl$
To match all lines except the ones that end in .log, assuming that is a possibility for what you need, anchor the lookahead with $:
^(?!.*\.log$).*$
You don't need to exclude the others, if it doesn't match it will not be in the output (demo):
^.*?\.pl$
I was not sure about the exact scenario in the question, but if you actually need to list all files with extensions, excluding the *.log and *.pl ones, you could use the following (the $ inside the lookahead keeps extensions that merely start with pl or log):
^.*\.(?!(?:pl|log)$)[^.]+$
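The two scenarios can be tried side by side in Python. This is only a sketch on hypothetical filenames. Note that the second pattern here ends its lookahead in $, so that an extension such as ".plx", which merely starts with "pl", is still listed.

```python
import re

# Files ending in .pl whose name does not contain ".log" anywhere.
PL_ONLY = re.compile(r"^(?!.*\.log).*\.pl$")

# Files with any extension other than exactly "pl" or "log".
OTHER_EXTENSIONS = re.compile(r"^.*\.(?!(?:pl|log)$)[^.]+$")

# Hypothetical filenames, for illustration only.
FILES = ["run.pl", "server.log", "notes.txt", "backup.log.pl", "tool.plx"]

pl_files = [name for name in FILES if PL_ONLY.search(name)]
other_files = [name for name in FILES if OTHER_EXTENSIONS.search(name)]

print(pl_files)
print(other_files)
```

"backup.log.pl" is dropped by the first pattern because its lookahead rejects ".log" anywhere in the name, not only at the end.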

Regex to match when a string is present twice

I am horrible at RegEx expressions and I just don't use them often enough for me to remember the syntax between uses.
I am using grepWin to search my files. I need to do a search that will return the files that have a given string twice.
So, for example, if I was searching on the word "how", then file one would not match:
Hello
how are you today?
but file two would:
Hello
how are you today?
I am fine, how are you?
Does anyone know how to make a RegEx that will match that?
Something like this (it depends on the language and your specific task):
/(how.*){2}/
Edit:
According to @CodeJockey:
/^(([^h]|h[^o]|ho[^w])*how([^h]|h[^o]|ho[^w])*){2,2}$/
(it becomes more complicated)
@CodeJockey: Thanks for the comments
I don't know what grepWin supports, but here's what I came up with to make something match exactly twice.
/^((?!how).)*how((?!how).)*how((?!how).)*$/
Explanation:
/^ # start of subject
((?!how).)* # any text that does not contain "how"
how # the word "how"
((?!how).)* # any text that does not contain "how"
how # the word "how"
((?!how).)* # any text that does not contain "how"
$/ # end of subject
This ensures that you find two "how"s, but the texts between the "how"s and to either side of them do not contain "how".
Of course, you can substitute any string for "how" in the expression.
If you want to "simplify" by only writing the search expression twice, you can use backreferences thus:
/^(?:(?!how).)*(how)(?:(?!\1).)*\1(?:(?!\1).)*$/
Refiddle with this expression
Explanation:
I added ?: to make the negative lookaheads' text non-capturing. Then I added parentheses around the regular how to make that a capturing subpattern (the first and only one).
I had to include "how" again in the first lookahead because it's a negative lookahead (meaning any capture would not contain "how") and the captured "how" is not captured yet at that point.
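The "exactly twice" idea can be sketched in Python as follows. This is not grepWin syntax: the helper name is hypothetical, re.DOTALL is added so that the dot also crosses line breaks in a multi-line file, and \A and \Z stand in for ^ and $ to anchor the whole text.

```python
import re

def exactly_twice_pattern(word):
    """Build a pattern that matches text containing `word` exactly twice."""
    w = re.escape(word)
    # Any run of characters in which `word` does not start.
    gap = rf"(?:(?!{w}).)*"
    return re.compile(rf"\A{gap}{w}{gap}{w}{gap}\Z", re.DOTALL)

HOW_TWICE = exactly_twice_pattern("how")

# The two files from the question, plus a hypothetical third one.
file_one = "Hello\nhow are you today?\n"
file_two = "Hello\nhow are you today?\nI am fine, how are you?\n"
file_three = "how, how and how\n"

for text in (file_one, file_two, file_three):
    print(HOW_TWICE.search(text) is not None)
```

The match is on substrings, so a word such as "show" would also count as an occurrence of "how".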
This is significantly harder than I originally thought it would be, and requires variable-length lookbehind, which grepWin does not support...
this expression:
(?<!blah.{0,99999})blah(?=.*?blah)(?!.*blah.*blah)
was successfully used in Eclipse, using the "Search > File" dialog to exclude files with one and three instances of blah and to include files with exactly two instances of blah.
Eclipse does not permit a .* in lookbehind, so I used .{0,99999} instead.
It is possible with the right tool, but it isn't pretty to get it to work with grepWin (see answer above). Can you use other tools (such as Eclipse), and what did you want to do with the files afterwards?
This works with grep or Python. It returns a match only if "how" occurs at least twice on the same line of your_file:
grep "how.*how" your_file
in python (re imported):
re.search(r"how.*how","your_text")
It matches everything from the first "how" to the last one (the dot means any character and the star means any number of repetitions), and you can adapt it in your own script.
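Since the two occurrences in the question sit on different lines, a Python version needs the dot to cross line breaks. The sketch below adds re.DOTALL for that, which is an addition to the snippet above; the function names are hypothetical. It checks for "at least twice", with a simple count shown as an alternative.

```python
import re

def has_at_least_two(text, word="how"):
    """True if `word` occurs at least twice anywhere in `text`."""
    w = re.escape(word)
    # re.DOTALL lets "." match newlines, so the hits may be on different lines.
    return re.search(rf"{w}.*{w}", text, re.DOTALL) is not None

def count_occurrences(text, word="how"):
    """Count non-overlapping occurrences of `word` in `text`."""
    return len(re.findall(re.escape(word), text))

# The two files from the question.
file_one = "Hello\nhow are you today?\n"
file_two = "Hello\nhow are you today?\nI am fine, how are you?\n"

print(has_at_least_two(file_one), count_occurrences(file_one))
print(has_at_least_two(file_two), count_occurrences(file_two))
```

Comparing the count with 2 gives an "exactly twice" test without any lookaround.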