Regex for all files except .hg_keep - regex

I use empty .hg_keep files to keep some (otherwise empty) folders in Mercurial.
The problem is that I can't find a working regex which excludes everything but the .hg_keep files.
Let's say we have this file structure:
a/b/c2/.hg_keep
a/b/c/d/.hg_keep
a/b/c/d/file1
a/b/c/d2/.hg_keep
a/b/.hg_keep
a/b/file2
a/b/file1
a/.hg_keep
a/file2
a/file1
I want to keep only the .hg_keep files under a/b/.
With the help of http://gskinner.com/RegExr/ I created the following .hgignore:
syntax: regexp
.*b.*/(?!.*\.hg_keep)
But Mercurial ignores all .hg_keep files in subfolders of b:
# hg status
? .hgignore
? a/.hg_keep
? a/b/.hg_keep
? a/file1
? a/file
# hg status -i
I a/b/c/d/.hg_keep
I a/b/c/d/file1
I a/b/c/d2/.hg_keep
I a/b/c2/.hg_keep
I a/b/file1
I a/b/file2
I know that I can hg add all the .hg_keep files, but is there a solution with a regular expression (or glob)?
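The ignore list above can be reproduced with a small Python sketch. It is a model, not Mercurial's code, and it assumes two things: .hgignore regexps are searched anywhere in the path (they are not rooted), and a path counts as ignored when it or any of its parent directories matches. Under those assumptions .*b.*/(?!.*\.hg_keep) matches the directory a/b/c (the lookahead only sees "c"), so everything beneath it is ignored, while a/b/.hg_keep itself survives. The helper names are hypothetical.

```python
import re

# The pattern from the .hgignore above.
PATTERN = re.compile(r".*b.*/(?!.*\.hg_keep)")

FILES = [
    "a/b/c2/.hg_keep",
    "a/b/c/d/.hg_keep",
    "a/b/c/d/file1",
    "a/b/c/d2/.hg_keep",
    "a/b/.hg_keep",
    "a/b/file2",
    "a/b/file1",
    "a/.hg_keep",
    "a/file2",
    "a/file1",
]


def is_ignored(path: str) -> bool:
    """Hypothetical model: ignored if the path or any parent directory matches."""
    parts = path.split("/")
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    # re.search mirrors the fact that regexp patterns are not rooted.
    return any(PATTERN.search(prefix) for prefix in prefixes)


def matches_file_only(path: str) -> bool:
    """Checks the full file path alone, ignoring its parent directories."""
    return PATTERN.search(path) is not None


if __name__ == "__main__":
    for path in FILES:
        # Mimic the hg status markers: I = ignored, ? = unknown.
        print("I" if is_ignored(path) else "?", path)
```

The file-only check does not match a/b/c/d/.hg_keep at all; it is the parent directory a/b/c that pulls it into the ignore list.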

Regexp negation might work for this. If you want to ignore everything except the a/b/.hg_keep file, you can probably use:
^(?!a/b/\.hg_keep$)
The parts of this regexp that matter are:
^ anchor the match to the beginning of the file path
(?! ... ) negative lookahead: succeeds only if the enclosed expression does not match at this position
a/b/\.hg_keep the full path of the file you want to match
$ (inside the lookahead) anchor to the end of the file path, so only that exact path is exempted
The regular expression
^a/b/\.hg_keep$
would match only the file called a/b/.hg_keep.
Its negation
^(?!a/b/\.hg_keep$)
will match everything else.
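A quick check of where the $ belongs, using Python's re as a stand-in for Mercurial's matcher. This tests only the regex itself, not how Mercurial applies it to directories. With $ outside the lookahead, the lookahead consumes nothing, so $ has to match at position 0 and the pattern can only ever match an empty path; with $ inside, every path except exactly a/b/.hg_keep matches.

```python
import re

# $ outside the lookahead: after the zero-width lookahead, $ must match at position 0.
dollar_outside = re.compile(r"^(?!a/b/\.hg_keep)$")
# $ inside the lookahead: only the exact path a/b/.hg_keep is rejected.
dollar_inside = re.compile(r"^(?!a/b/\.hg_keep$)")

for path in ["a/b/.hg_keep", "a/b/file1", "a/b/c/d/.hg_keep", "a/b/.hg_keep.orig", ""]:
    print(repr(path), bool(dollar_outside.search(path)), bool(dollar_inside.search(path)))
```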

I'm not quite sure in what context you are using the regex, but this should do it; it matches all lines ending in .hg_keep:
^.*\.hg_keep$
EDIT: And here is a regex that matches the remaining lines, i.e. those that do not contain .hg_keep:
^(?:(?!.*\.hg_keep).)*$
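Applied to the file list from the question with Python's re.match (both patterns already carry ^ and $), the two expressions split the paths cleanly. Strictly, the second one rejects any line containing .hg_keep anywhere, which for these paths is the same as the inverse of the first.

```python
import re

ends_with_keep = re.compile(r"^.*\.hg_keep$")
# Tempered token: a character is consumed only if ".*\.hg_keep" cannot match from there.
without_keep = re.compile(r"^(?:(?!.*\.hg_keep).)*$")

FILES = [
    "a/b/c2/.hg_keep",
    "a/b/c/d/.hg_keep",
    "a/b/c/d/file1",
    "a/b/c/d2/.hg_keep",
    "a/b/.hg_keep",
    "a/b/file2",
    "a/b/file1",
    "a/.hg_keep",
    "a/file2",
    "a/file1",
]

kept = [p for p in FILES if ends_with_keep.match(p)]
others = [p for p in FILES if without_keep.match(p)]
print(kept)
print(others)
```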

Try (?!.*/\.hg_keep$).
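One caveat, sketched in Python under the assumption that re.search approximates how Mercurial applies an unrooted regexp: without a leading ^, the bare lookahead can still succeed at a later position in the path (for example right after the last /, where no further / follows), so it matches the .hg_keep files too. Rooting it with ^ avoids that. This checks the regex only and does not model Mercurial's directory handling.

```python
import re

unrooted = re.compile(r"(?!.*/\.hg_keep$)")
rooted = re.compile(r"^(?!.*/\.hg_keep$)")

for path in ["a/b/.hg_keep", "a/b/c/d/.hg_keep", "a/b/file1"]:
    # A match means the path would be ignored.
    print(path, bool(unrooted.search(path)), bool(rooted.search(path)))
```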

I was looking for something similar to this.
I found an answer, but it's not what we want to hear:
Limitations
There is no straightforward way to ignore all but a set of files. Attempting to use an inverted regex match will fail when combined with other patterns. This is an intentional limitation, as alternate formats were all considered far too likely to confuse users to be worth the additional flexibility.
Ref: https://www.mercurial-scm.org/wiki/.hgignore

Related

Path Validation - My RegEx is matching leading spaces in directory names and I can't fix it

I'm back again with more RegEx shenanigans.
I had what I thought was a perfect windows path validation expression.
Here it is at Regex101: https://regex101.com/r/BertHu/6
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?[^\\\/:*?"<>|\r\n]+\\?)(?:[^\\\/:*?"<>|\r\n]+\\)*[^\\\/:*?"<>|\r\n]*$
Breakdown:
# (?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\| # Drive
# \\?[^\\\/:*?"<>|\r\n]+\\?) # Relative path
# (?:[^\\\/:*?"<>|\r\n]+\\)* # Folder
# [^\\\/:*?"<>|\r\n]* # File
The issue I'm having now is that the expression matches paths whose directories have leading spaces.
Example: C:\ Leading Space\ Shouldnt Match is matching.
I tried adding [^\s] to the folder portion of the expression:
(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*
But that only invalidates a leading space in the first path segment:
C:\ LeadingSpace\ShouldntMatch Doesn't match (Good)
C:\LeadingSpace\ ShouldntMatch Matches incorrectly (Bad)
I think the problem lies here:
If anyone could help or point me in the right direction that would be great.
Sorry for all the RegEx questions!
Well, it depends on what the exact rules are. Taking your regex101 script as a basis, I would say the File, Folder and Relative path parts are more or less the same (if you ignore the non-capturing groups and the backslashes):
\\?[^\\\/:*?"<>|\r\n]+\\?
(?:[^\\\/:*?"<>|\r\n]+\\)*
[^\\\/:*?"<>|\r\n]*
So there are three potential places where a folder could start with a leading space.
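To see which part lets the space through, here is the question's regex split into its parts in Python. It assumes case-insensitive matching (the drive class is [a-z]), as regex101's i flag would give, and the constant names are made up for the illustration. Guarding only the Folder part rejects a leading space in an intermediate folder, but the last segment of C:\LeadingSpace\ ShouldntMatch has no trailing backslash, so it is consumed by the unguarded File part.

```python
import re

# Drive (C:\ or \\server\share\) or relative start, copied from the question.
START = r'(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?[^\\\/:*?"<>|\r\n]+\\?)'
FOLDER = r'(?:[^\\\/:*?"<>|\r\n]+\\)*'
GUARDED_FOLDER = r'(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*'  # the attempted fix
FILE = r'[^\\\/:*?"<>|\r\n]*'  # last segment: no leading-space guard

original = re.compile("^" + START + FOLDER + FILE + "$", re.IGNORECASE)
guarded = re.compile("^" + START + GUARDED_FOLDER + FILE + "$", re.IGNORECASE)

for path in [
    r"C:\ Leading Space\ Shouldnt Match",
    r"C:\ LeadingSpace\ShouldntMatch",
    r"C:\LeadingSpace\ ShouldntMatch",
]:
    print(path, bool(original.match(path)), bool(guarded.match(path)))
```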
You could add a [^\s] in front of all of them, like this:
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?([^\s][^\\\/:*?"<>|\r\n])+\\?)(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*([^\s][^\\\/:*?"<>|\r\n])*$
I saved the modified regex101 https://regex101.com/r/Pd3lcR/1
Now it should work, at least for my limited test cases and the information I have about the restrictions.
By the way: I don't know what your use case is, but this regex is pretty long for simple matching and file name capture; maybe there is a more readable way (for non-regex people).
Update:
To fix the bug this introduced, I had to prevent the share option from matching the relative-path branch, by disallowing a double backslash with (?!\\):
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?((?!\\)[^\\\/:*?"<>|\r\n])+\\?)(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*([^\s][^\\\/:*?"<>|\r\n])*$
Here is the updated regex: https://regex101.com/r/RMVkTC/3
Update (Version 2):
I rewrote the regex the way I would create it. It is not perfectly optimized (short), but this way it is easier to test and debug.
The regex consists of exactly three parts, joined with a pipe (|):
Drive + path + folder/file: (^[a-z]:\\([^\s][^\\\/:*?"<>|\r\n]+\\)*[^\s][^\\\/:*?"<>|\r\n]+$)
relativepath + folder/file: (^(\.?\.?\\[^\\])*([^\\\/:*?"<>|\r\n]+\\?)*$)
Share + folder/file: (^\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\([^\s][^\\\/:*?"<>|\r\n]*\\?)*$)
This way, if you have to change something for one edge case, the change is contained and easy to adapt.
Here is the updated regex: https://regex101.com/r/Qxj3Ni/1
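The three alternatives can be exercised side by side with a small Python sketch. It assumes case-insensitive matching (the drive and share classes use lowercase a-z), and the sample paths are illustrative only. As written, the relative-path alternative has no leading-space guard, so a relative path such as " folder\file.txt" is still accepted.

```python
import re

# The three alternatives from "Update (Version 2)", copied verbatim.
DRIVE = r'(^[a-z]:\\([^\s][^\\\/:*?"<>|\r\n]+\\)*[^\s][^\\\/:*?"<>|\r\n]+$)'
RELATIVE = r'(^(\.?\.?\\[^\\])*([^\\\/:*?"<>|\r\n]+\\?)*$)'
SHARE = r'(^\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\([^\s][^\\\/:*?"<>|\r\n]*\\?)*$)'

# Joined with | as in the answer; IGNORECASE is an assumption.
PATH_RE = re.compile("|".join([DRIVE, RELATIVE, SHARE]), re.IGNORECASE)

SAMPLES = [
    r"C:\Valid\Path\file.txt",
    r"C:\ Leading Space\ Shouldnt Match",
    r"C:\LeadingSpace\ ShouldntMatch",
    r"\\server\share\folder\file.txt",
    r"\\server\share\ bad",
    r"..\folder\file.txt",
]

for sample in SAMPLES:
    print(f"{sample!r}: {'valid' if PATH_RE.match(sample) else 'invalid'}")
```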

Writing valid RegEx for use in file/folder exclusion

I'm trying to write two expressions to use in the files/folder Exclusion List for Code42 CrashPlan backup. Their support won't help with RegEx expressions, they just point me to their KB article.
In their "File Exclusions" section, I'd like to:
exclude this folder specifically: S:\Google Drive\Temp
any file or folder containing the string Backup_Excluded anywhere in its name.
This is what I've got so far - but I have no way of knowing if they're correct:
(?i).*Google Drive\\Temp ... but since I really want to exclude a specific folder, not a pattern, do I need to escape the backslashes and the colon in the path S:\Google Drive\Temp?
(?i).*Backup_Excluded
Research disclaimer: I know there are RegEx resources out there, but am unsure which flavor/syntax to use, as I'd imagine there are many. I was hoping those with more RegEx familiarity could advise.
The link you posted says:
The Code42 app treats all file separators as forward slashes /.
So it seems you'd want to use / instead of \\ in your regular expressions.
Colon doesn't need escaping.
\ needs escaping because it's the escaping character itself.
/ normally needs escaping because it is the usual delimiter around a regular expression (as in /pattern/flags). However, the examples in your link don't escape it, which implies only the pattern itself is given, so no escaping is needed.
Then you could probably use:
S:/Google Drive/Temp
or [A-Z]:/Google Drive/Temp (to allow any drive)
.*Backup_Excluded.*
I probably wouldn't use (?i), as the capitals in those strings are usually there, but that's your call.
Check out e.g. https://regex101.com/ to test your regular expressions (also in different flavours).
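A rough Python check of the suggested patterns. It assumes, which is not stated in the quoted Code42 text, that each pattern must match the whole path (Python's re.fullmatch stands in for that) and that paths use forward slashes as quoted above. Under that assumption the surrounding .* is what lets Backup_Excluded appear anywhere in the path.

```python
import re

# Patterns from the answer; forward slashes because Code42 treats separators as /.
FOLDER = re.compile(r"[A-Z]:/Google Drive/Temp")
NAME = re.compile(r".*Backup_Excluded.*")


def excluded(path: str) -> bool:
    """Assumed semantics: a pattern excludes a path only if it matches the whole path."""
    return bool(FOLDER.fullmatch(path) or NAME.fullmatch(path))


for path in [
    "S:/Google Drive/Temp",
    "S:/Google Drive/Keep",
    "S:/Docs/old_Backup_Excluded/report.txt",
    "S:/Docs/report.txt",
]:
    print(path, excluded(path))

# Without the surrounding .*, whole-path matching finds nothing (prints None).
print(re.fullmatch(r"Backup_Excluded", "S:/Docs/old_Backup_Excluded/report.txt"))
```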

REGEX --ignore-files does not ignore

I'm trying to ignore files matching this regular expression:
--ignore-files="^load*\.py$"
I want to ignore all files whose names start with load (load + xxx).
When I do this, the files starting with load are still listed. Could you please help?
Thanks
Your regex will only match files like
loa.py
load.py
loaddddd.py
because d* means "zero or more d characters". You forgot the wildcard dot (which means "any character") before the *:
--ignore-files="^load.*\.py$"
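The difference is easy to see by testing both patterns against a few made-up file names in Python, assuming the tool applies the regex to the bare file name rather than the full path:

```python
import re

original = re.compile(r"^load*\.py$")  # "loa" followed by zero or more "d"
fixed = re.compile(r"^load.*\.py$")    # "load" followed by anything

NAMES = ["loa.py", "load.py", "loaddddd.py", "loader.py", "load_data.py", "reload.py"]

print([n for n in NAMES if original.match(n)])  # ['loa.py', 'load.py', 'loaddddd.py']
print([n for n in NAMES if fixed.match(n)])     # ['load.py', 'loaddddd.py', 'loader.py', 'load_data.py']
```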

Regex expression to match a string but exclude something at the same time

I want to ask this as concisely as possible, so please forgive me if I'm leaving something out. I want the expression to match all cases except where an exact file name is present.
The backup software I'm using supports regular expressions, and I want to set up an exclusion that skips all files of a particular extension, except for certain files I need to back up, which therefore must not match.
For this example, say the files I want to exclude are *.FLV:
(?i).*\.flv
I want to include in my backups three files: abc123.flv, ghk432.flv, and fdw917.flv
This is where I'm having trouble, even when trying to let just one of the three files through:
(?i).*\.flv^(?!(abc123\.flv))&
The expression is being added to an Exclusion List for code42 CrashPlan backup, their support unfortunately cannot assist with complex RegEx expressions.
The closest thing I can supply as an example is their Example 3: Using An Exclude To Include:
.*/Documents/((?!(.*\.(doc|rtf)|.*/)$).)*$
http://support.code42.com/Administrator/3.6_And_4.0/Configuring/Using_Include_And_Exclude_Filters
However, that example excludes all files within directories named "Documents" while including any files in those folders with doc or rtf extensions. I'm trying to create an expression that works on file extensions regardless of folder location.
In my brain logically it seems like I need to write this as some kind of if then else statement but regex is not my forte.
Use an anchored negative look ahead with an alternation for the files you want to keep:
^(?i)(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv
The negative lookahead asserts that the following input does not match its regex, and the pipe character means "or".
Try to put the negative lookahead at the position of the filename in the path:
^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$
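A Python sketch comparing the two answers on a few made-up paths. It assumes forward-slash separators and whole-path matching (re.fullmatch), and passes re.IGNORECASE instead of the inline (?i), because recent Python versions reject a global inline flag that is not at the very start of the pattern. It shows why anchoring the lookahead at the file name matters: the first pattern also spares xabc123.flv.

```python
import re

# First answer, with (?i) replaced by re.IGNORECASE.
anywhere = re.compile(r"^(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv", re.IGNORECASE)
# Second answer: the lookahead sits where the file name starts.
at_name = re.compile(r"^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$", re.IGNORECASE)


def excluded(pattern: re.Pattern, path: str) -> bool:
    """True if the exclusion pattern matches the whole path (assumed semantics)."""
    return pattern.fullmatch(path) is not None


PATHS = [
    "C:/Videos/movie.flv",
    "C:/Videos/abc123.flv",
    "D:/Clips/GHK432.FLV",
    "C:/Videos/xabc123.flv",
    "C:/Videos/notes.txt",
]

for path in PATHS:
    print(path, excluded(anywhere, path), excluded(at_name, path))
```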

hgignore: help ignoring all files but certain ones

I need an .hgdontignore file :-) to include certain files and exclude everything else in a directory. Basically I want to include only the .jar files in a particular directory and nothing else. How can I do this? I'm not that skilled in regular expression syntax. Or can I do it with glob syntax? (I prefer that for readability)
Just as an example location, let's say I want to exclude all files under foo/bar/ except for foo/bar/*.jar.
The answer from Michael is a fine one, but another option is to just exclude:
foo/bar/**
and then manually hg add the .jar files. You can always add files that match an ignore rule, and adding them explicitly overrides the ignore. You just have to remember to add any jars you create in the future.
To do this, you'll need to use this regular expression:
foo/bar/.+?\.(?!jar).+
Explanation
You are telling it what to ignore, so this expression is searching for things you don't want.
You look for any file whose name (including relative directory) includes (foo/bar/)
You then look for any characters that precede a period (.+?\. == match one or more characters of any type until you reach the period character)
You then make sure it doesn't have the "jar" ending with (?!jar) (this is called a negative lookahead)
Finally you grab the ending it does have (.+)
Regular expressions are easy to mess up, so I strongly suggest that you get a tool like Regex Buddy to help you build them. It will break down a regex into plain English which really helps.
EDIT
Hey Jason S, you caught me, it does miss those files.
This corrected regex will work for every example you listed:
foo/bar/(?!.*\.jar$).+
It finds:
foo/bar/baz.txt
foo/bar/baz
foo/bar/jar
foo/bar/baz.jar.txt
foo/bar/baz.jar.
foo/bar/baz.
foo/bar/baz.txt.
But does not find
foo/bar/baz.jar
New Explanation
This says: look for files in "foo/bar/", then do not match if there are zero or more characters followed by ".jar" and then no more characters ($ means end of the line); if that isn't the case, match any following characters.
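Both versions can be run against the list above in Python, assuming re.search approximates how Mercurial applies an unrooted regexp (a match means the path is ignored):

```python
import re

first_try = re.compile(r"foo/bar/.+?\.(?!jar).+")
corrected = re.compile(r"foo/bar/(?!.*\.jar$).+")

PATHS = [
    "foo/bar/baz.txt",
    "foo/bar/baz",
    "foo/bar/jar",
    "foo/bar/baz.jar.txt",
    "foo/bar/baz.jar.",
    "foo/bar/baz.",
    "foo/bar/baz.txt.",
    "foo/bar/baz.jar",
]

# The first try misses names without a dot or ending in a dot.
print([p for p in PATHS if first_try.search(p)])
# The corrected version ignores everything except foo/bar/baz.jar.
print([p for p in PATHS if corrected.search(p)])
```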
Anyone who wants to use negative lookaheads ((?! in regex syntax) or any kind of back-referencing mechanism should be aware that Mercurial will fall back from Google's RE2 to Python's re module for matching.
RE2 is a non-backtracking engine that guarantees a run time linear in the size of the input. If performance is important to you, that is, if you have a big repository, you should consider sticking to simpler patterns that RE2 supports, which is why I prefer the solution offered by Ryan.