hgignore: help ignoring all files but certain ones - regex

I need an .hgdontignore file :-) to include certain files and exclude everything else in a directory. Basically I want to include only the .jar files in a particular directory and nothing else. How can I do this? I'm not that skilled in regular expression syntax. Or can I do it with glob syntax? (I prefer that for readability)
Just as an example location, let's say I want to exclude all files under foo/bar/ except for foo/bar/*.jar.

The answer from Michael is a fine one, but another option is to just exclude the whole directory with a glob pattern (under a "syntax: glob" line in .hgignore, since the default syntax there is regexp):
foo/bar/**
and then manually add the .jar files with hg add. Explicitly adding a file that matches an ignore rule overrides the ignore. You just have to remember to add any jars you create in the future.

To do this, you'll need to use this regular expression:
foo/bar/.+?\.(?!jar).+
Explanation
You are telling it what to ignore, so this expression is searching for things you don't want.
You look for any file whose name (including relative directory) includes (foo/bar/)
You then look for any characters that precede a period (.+?\. == match one or more characters of any type until you reach the period character)
You then make sure it doesn't have the "jar" ending (?!jar) (this is called a negative lookahead)
Finally you grab the ending it does have (.+)
Regular expressions are easy to mess up, so I strongly suggest that you get a tool like Regex Buddy to help you build them. It will break down a regex into plain English which really helps.
EDIT
Hey Jason S, you caught me, it does miss those files.
This corrected regex will work for every example you listed:
foo/bar/(?!.*\.jar$).+
It finds:
foo/bar/baz.txt
foo/bar/baz
foo/bar/jar
foo/bar/baz.jar.txt
foo/bar/baz.jar.
foo/bar/baz.
foo/bar/baz.txt.
But does not find
foo/bar/baz.jar
New Explanation
This says: look for files in "foo/bar/", then do not match if there are zero or more characters followed by ".jar" and then no more characters ($ means end of the line); if that isn't the case, match any following characters.
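To see why the edit was needed, here is a minimal Python sketch that runs both the original and the corrected pattern against the example paths listed above. Python's re is the engine Mercurial falls back to for lookaheads; the sketch assumes hgignore's unrooted matching (a pattern may match anywhere in the path), which re.search mirrors.

```python
import re

# First attempt and corrected version from the answer above.
ORIGINAL = re.compile(r"foo/bar/.+?\.(?!jar).+")
CORRECTED = re.compile(r"foo/bar/(?!.*\.jar$).+")

# Example paths listed in the answer.
PATHS = [
    "foo/bar/baz.txt",
    "foo/bar/baz",
    "foo/bar/jar",
    "foo/bar/baz.jar.txt",
    "foo/bar/baz.jar.",
    "foo/bar/baz.",
    "foo/bar/baz.txt.",
    "foo/bar/baz.jar",
]


def ignored(pattern: re.Pattern, path: str) -> bool:
    # hgignore regexps are not rooted, so search anywhere in the path.
    return pattern.search(path) is not None


for path in PATHS:
    print(f"{path:22} original={ignored(ORIGINAL, path)!s:5} corrected={ignored(CORRECTED, path)}")
```

The original pattern leaves foo/bar/baz, foo/bar/jar, foo/bar/baz.jar. and foo/bar/baz. un-ignored, because it needs a dot followed by at least one more non-"jar" character; the corrected one ignores everything except foo/bar/baz.jar.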

Anyone who wants to use negative lookaheads (or ?! in regex syntax) or any kind of back-referencing mechanism should be aware that Mercurial will fall back from Google's RE2 to Python's re module for matching.
RE2 is a non-backtracking engine that guarantees a run time linear in the size of the input. If performance is important to you, that is, if you have a big repository, you should consider sticking to the simpler patterns that RE2 supports, which is why I think the solution offered by Ryan is preferable.

Related

Writing valid RegEx for use in file/folder exclusion

I'm trying to write two expressions to use in the files/folder Exclusion List for Code42 CrashPlan backup. Their support won't help with RegEx expressions, they just point me to their KB article.
In their "File Exclusions" section, I'd like to:
exclude this folder specifically: S:\Google Drive\Temp
any file or folder containing the string Backup_Excluded anywhere in its name.
This is what I've got so far - but I have no way of knowing if they're correct:
(?i).*Google Drive\\Temp ...but since I really want to exclude a specific folder, not a pattern, do I need to escape the backslashes and the colon in the path S:\Google Drive\Temp?
(?i).*Backup_Excluded
Research disclaimer: I know there are RegEx resources out there, but am unsure which flavor/syntax to use, as I'd imagine there are many. I was hoping those with more RegEx familiarity could advise.
The link you posted says:
The Code42 app treats all file separators as forward slashes /.
So it seems you'd want to use / instead of \\ in your regular expressions.
Colon doesn't need escaping.
\ needs escaping because it's the escaping character itself.
/ only needs escaping where it is used as the delimiter around a regular expression (as in /pattern/ in Perl or JavaScript). The examples in your link don't escape it, so the pattern is apparently given without delimiters and no escaping is needed.
Then you could probably use:
S:/Google Drive/Temp
or [A-Z]:/Google Drive/Temp (to allow any drive)
.*Backup_Excluded.*
I probably wouldn't use (?i), as the capitals in those strings are usually there, but that's your call.
Check out e.g. https://regex101.com/ to test your regular expressions (also in different flavours).
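A rough way to try the suggested patterns locally is a small Python sketch that converts Windows backslashes to forward slashes, as the KB article says Code42 does, and then tests each pattern. Whether Code42 needs the pattern to match the whole path or only part of it is an assumption here (the sketch uses re.search, i.e. a partial match), and Python's regex flavour may differ in details from Code42's.

```python
import re

# Patterns suggested above; separators are written as "/".
EXCLUDE_PATTERNS = [
    re.compile(r"[A-Z]:/Google Drive/Temp"),
    re.compile(r".*Backup_Excluded.*"),
]


def is_excluded(windows_path: str) -> bool:
    # Code42 treats all separators as "/", so normalise before matching.
    path = windows_path.replace("\\", "/")
    # Assumption: a partial match (re.search) is enough to exclude the path.
    return any(p.search(path) for p in EXCLUDE_PATTERNS)


for p in [r"S:\Google Drive\Temp\notes.txt", r"C:\Work\old_Backup_Excluded.zip", r"S:\Google Drive\Photos\a.jpg"]:
    print(p, is_excluded(p))
```

Under partial matching, S:/Google Drive/Temporary would be excluded as well; ending the first pattern with (/|$) would rule that out.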

Regex expression to match a string but exclude something at the same time

I want to ask this as concisely as possible, so please forgive me if I'm leaving something out. I want the expression to match all cases except where an exact filename string is present.
The backup software I'm using takes regular expressions, and I want to set up an exclusion that skips all files of a particular extension, except for certain files I need to back up, so I don't want those to match.
For this example, the files I want to exclude are *.FLV:
(?i).*\.flv
I want to include in my backups three files: abc123.flv, ghk432.flv, and fdw917.flv
This is where I'm having trouble, even when trying to keep just one of the three files in the backup:
(?i).*\.flv^(?!(abc123\.flv))&
The expression is being added to an Exclusion List for code42 CrashPlan backup, their support unfortunately cannot assist with complex RegEx expressions.
The closest thing I can supply as an example is their Example 3: Using An Exclude To Include:
.*/Documents/((?!(.*\.(doc|rtf)|.*/)$).)*$
http://support.code42.com/Administrator/3.6_And_4.0/Configuring/Using_Include_And_Exclude_Filters
However, that example excludes all files within directories named "Documents" and includes any files in those folders with doc or rtf file extensions. I'm trying to create an expression that works with file extensions regardless of folder location.
Logically, it seems like I need to write this as some kind of if-then-else statement, but regex is not my forte.
Use an anchored negative look ahead with an alternation for the files you want to keep:
^(?i)(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv
The negative lookahead asserts that the following input does not match its regex, and the pipe character means "or".
Try to put the negative lookahead at the position of the filename in the path:
^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$
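The difference between the two answers shows up with a file whose name merely ends in one of the kept names. The sketch below runs both patterns with Python's re, assuming that a match means "exclude" and that paths use / separators. Python 3.11+ rejects a global (?i) that is not at the very start of the pattern, so the flag is passed as re.IGNORECASE instead; CrashPlan's own engine may behave differently.

```python
import re

# Answer 1: one lookahead anchored at the start of the whole path.
# Python 3.11+ rejects a global (?i) that is not at the very start, so the
# flag is passed as re.IGNORECASE instead of writing ^(?i)...
WHOLE_PATH = re.compile(r"^(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv", re.IGNORECASE)

# Answer 2: the lookahead sits where the file name starts.
AT_FILENAME = re.compile(
    r"^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$", re.IGNORECASE
)


def excluded(pattern: re.Pattern, path: str) -> bool:
    """A match means the backup would skip the file (assumed semantics)."""
    return pattern.search(path) is not None


for path in ["videos/clip.flv", "videos/abc123.flv", "videos/xabc123.flv"]:
    print(path, excluded(WHOLE_PATH, path), excluded(AT_FILENAME, path))
```

Under these assumptions, videos/xabc123.flv is kept by the first pattern (the .* lets a kept name appear anywhere in the path) but excluded by the second.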

Checkstyle - How to exclude any file, but the ones ending with 'Impl.java', with a regex?

I have been thinking about this problem for some time.
I have a huge Maven multi module project and using build-tools I share a custom set of rules for all modules (no problem around here).
The problem comes when I want to apply one rule to just one set of files in just one of the modules. By using a Checkstyle suppressions file I can easily exclude all the files I don't want the rule to apply to, but it has its limitations. Let's put this into an example:
Files:
a/b/c/d/FileImpl.java
a/b/c/d/File.java
a/b/c/d/e/FileImpl.java
a/b/c/FileImpl.java
...
What regex would you write that ensures that all files (including future files that may be introduced) get excluded, except the ones that end with Impl.java under package a.b.c.d? In regex terms, it has to be a regex that matches anything but the file ending I want.
It would be easier if I could just set an "includes" referring only to the set of files to apply the rule to, but as far as I know that's not possible. It has to be done with suppressions, so that it suppresses all files but the ones I want.
I have tried using capturing groups, lookahead and lookbehind but had no success at all.
Any ideas?
Try this regex:
a/b/c/d/.+Impl\.java$
Demo
http://fiddle.re/rpqgg
Found the right solution to my not-so-clear question; I hope it helps anyone interested.
http://fiddle.re/9wvfg
Regex Goal --> match any string except the ones that match the inner regex.
Regex --> ^((?!d/\w+Impl\.java).)*$
Using negative lookahead and the template: ^((?!regexp).)*$
Thanks for the support!!
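The ^((?!regexp).)*$ template (often called a tempered greedy token) can be checked against the example file list. This Python sketch assumes the suppression pattern is searched in /-separated paths and that a match means "suppressed"; in a real build Checkstyle likely sees absolute, platform-specific paths. SUPPRESS_STRICT is a hypothetical variant, not from the answer: d/\w+Impl\.java alone also fires for any directory whose name ends in d (such as build/), so it anchors the full package path instead.

```python
import re

# Self-answer: match any path that does NOT contain d/<word>Impl.java.
SUPPRESS = re.compile(r"^((?!d/\w+Impl\.java).)*$")

# Hypothetical stricter variant: only a/b/c/d/<word>Impl.java at the end of
# the path escapes suppression.
SUPPRESS_STRICT = re.compile(r"^((?!a/b/c/d/\w+Impl\.java$).)*$")

FILES = [
    "a/b/c/d/FileImpl.java",
    "a/b/c/d/File.java",
    "a/b/c/d/e/FileImpl.java",
    "a/b/c/FileImpl.java",
    "a/b/c/build/FooImpl.java",
]

for f in FILES:
    # A match means the rule would be suppressed for this file.
    print(f, bool(SUPPRESS.search(f)), bool(SUPPRESS_STRICT.search(f)))
```

Both patterns leave only a/b/c/d/FileImpl.java checked among the question's files; they differ on a/b/c/build/FooImpl.java.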

How do you find a "."?

I'm trying to create a regular expression to look for filenames from full file paths, but it should not return for just a directory. For example, C:\Users\IgneusJotunn\Desktop\file.doc should return file.doc while C:\Users\IgneusJotunn\Desktop\folder should find no matches. These are all Word or Excel files, but I prefer not to rely on that. This:
StringRegExp($string, "[^\\]*\z",1)
finds whatever is after the last slash, but can't differentiate files from folders. This:
StringRegExp($string, "[^\\]*[dx][ol][cs]\z",1)
almost works, but is an ugly hack, and there may be docx or xlsx files. Plus, files could be named like MyNamesDoc.doc. This would be easily solved if I could search for a period, but . is a special character (it means any single character except a newline), and it does not seem to work with escapes. This:
StringRegExp($ue_string, "[^\\]*\..*\z",1)
should work, finding anything after the last backslash, capturing only something with a period in it. How to incorporate a period? Or any way to just match files?
Edit: Answered my own question. I'm interested in why it wasn't working and if there's a more elegant solution.
Local $string = StringRegExp($string, "[^\\]*\.doc\z|[^\\]*\.docx\z|[^\\]*\.xls\z|[^\\]*\.xlsx\z",1)
Periods do in fact work with the same backslash escape that most special characters use. As for the document type, the | (or) operator with a different extension works great. If for some reason you need to add an extension, just add another | alternative.
Meh, I'm bored. You could do this:
$sFile = StringRegExp($sPath, "[^\\]+\.(?:doc|xls)x?$", 1)
There's no guarantee that a folder wouldn't be named that, so to be absolutely certain you'd have to check the file/folder attributes. However, it's doubtful anyone would name a folder with something like '.docx'.
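For readers outside AutoIt, here is a rough Python translation (PCRE and Python's re agree on these constructs; Python spells PCRE's \z as \Z). office_file_name ports the answer's pattern; last_segment_with_dot is an illustrative variant of the asker's attempt in which the trailing .* becomes [^\\]*, so the match cannot run across a backslash into a dotted parent folder. Both function names are hypothetical.

```python
import re
from typing import Optional


def office_file_name(path: str) -> Optional[str]:
    # Port of the answer's pattern [^\\]+\.(?:doc|xls)x?$
    m = re.search(r"[^\\]+\.(?:doc|xls)x?$", path)
    return m.group(0) if m else None


def last_segment_with_dot(path: str) -> Optional[str]:
    # Last path segment containing a literal "." (escaped as \.);
    # [^\\]* on both sides keeps the match from crossing a backslash.
    m = re.search(r"[^\\]*\.[^\\]*\Z", path)
    return m.group(0) if m else None


print(office_file_name(r"C:\Users\IgneusJotunn\Desktop\file.doc"))   # file.doc
print(last_segment_with_dot(r"C:\Users\IgneusJotunn\Desktop\folder"))  # None
```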
Reverse the string.
Look for the "."
Look for "\" with StringInStr (and/or "/")
Trim the right side from the return of StringInStr
Reverse it again.
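The reverse-and-trim steps above translate roughly as follows into Python, with the AutoIt StringInStr calls becoming str.find. The function name is hypothetical, and treating "no dot in the last segment" as "folder" carries the same caveat as the regex answers.

```python
from typing import Optional


def file_name_by_reversal(path: str) -> Optional[str]:
    rev = path[::-1]                          # 1. reverse the string
    dot = rev.find(".")                       # 2. first "." in rev = last "." in path
    # 3. first separator in rev ("\" or "/") = last separator in path
    seps = [i for i in (rev.find("\\"), rev.find("/")) if i != -1]
    sep = min(seps) if seps else len(rev)
    if dot == -1 or dot > sep:                # no "." in the last segment: treat as a folder
        return None
    return rev[:sep][::-1]                    # 4./5. keep the part after the separator, un-reverse


print(file_name_by_reversal(r"C:\Users\IgneusJotunn\Desktop\file.doc"))  # file.doc
```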

Regex: Get Filename Without Extension in One Shot?

I want to get just the filename using regex, so I've been trying simple things like
([^\.]*)
which of course works only if the filename has one extension. But if it is adfadsfads.blah.txt, I just want adfadsfads.blah. How can I do this with regex?
In regards to David's question, 'why would you use regex' for this, the answer is, 'for fun.' In fact, the code I'm using is simple
length_of_ext = File.extname(filename).length
filename = filename[0,(filename.length-length_of_ext)]
but I like to learn regex whenever possible because it always comes up at Geek cocktail parties.
Try this:
(.+?)(\.[^.]*$|$)
This will:
Captures filenames that start with a dot (e.g. .logs is a file named .logs, not a file extension), which is common in Unix.
Gets everything but the last dot: foo.bar.jpeg gets you foo.bar.
Handles files with no dot: secret-letter gets you secret-letter.
Note: as commenter j_random_hacker suggested, this performs as advertised, but you might want to precede things with an anchor for readability purposes.
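The three claims can be checked quickly in Python. re.match supplies the start anchor the commenter mentions, and the lazy (.+?) stops at the first point where the rest of the string is either one final extension or nothing. The helper name is illustrative and assumes a non-empty name.

```python
import re

PATTERN = re.compile(r"(.+?)(\.[^.]*$|$)")


def without_extension(name: str) -> str:
    # Group 1 is everything before the last dot (or the whole name).
    # Assumes a non-empty name; an empty string would not match.
    return PATTERN.match(name).group(1)


for name in [".logs", "foo.bar.jpeg", "secret-letter", "adfadsfads.blah.txt"]:
    print(name, "->", without_extension(name))
```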
Everything, followed by a dot, followed by one or more characters that are not dots, followed by the end of the string:
(.+?)\.[^\.]+$
The everything-before-the-last-dot is grouped for easy retrieval.
If you aren't 100% sure every file will have an extension, try:
(.+?)(\.[^\.]+$|$)
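The two versions differ only when there is no extension. A quick Python comparison, where a failed match shows up as None (the stem helper is illustrative):

```python
import re

STRICT = re.compile(r"(.+?)\.[^\.]+$")
LENIENT = re.compile(r"(.+?)(\.[^\.]+$|$)")


def stem(pattern: re.Pattern, name: str):
    # Return group 1 of a match anchored at the start, or None.
    m = pattern.match(name)
    return m.group(1) if m else None


for name in ["adfadsfads.blah.txt", "secret-letter"]:
    print(name, stem(STRICT, name), stem(LENIENT, name))
```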
How about a non-capturing group for the ending, so that the only capture is the filename?
E.g.:
(.+?)(?:\.[^\.]*$|$)
^(.*)\\(.*)(\..*)$
Gets the Path without the last \
The file without extension
The extension, with its leading .
Examples:
c:\1\2\3\Books.accdb
(c:\1\2\3)(Books)(.accdb)
Does not support multiple . in file name
Does support . in file path
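A short Python check of the three groups, using the answer's own example plus a dotted folder. It also shows one more limitation: a file with no extension at all does not match.

```python
import re

# Path, name and extension as three groups (backslash-separated paths).
SPLIT = re.compile(r"^(.*)\\(.*)(\..*)$")

print(SPLIT.match(r"c:\1\2\3\Books.accdb").groups())
print(SPLIT.match(r"c:\1.x\Books.accdb").groups())
print(SPLIT.match(r"c:\1\Books"))  # None: no extension
```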
I realize this question is a bit outdated; however, I had some trouble finding a good source and wound up writing the regex myself. To save time for whoever may find this:
If you're looking for a ~standalone~ regex
This will match the extension without the dot
\w+(?![\.\w])
This will always match the file name if it has an extension
[\w\. ]+(?=[\.])
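As "standalone" suggests, these two patterns behave best on a bare file name. A small Python check (re.search, first match only; the helper is illustrative) also shows where they break down: on a full Windows path the extension pattern matches the drive letter, and a name without any dot is returned whole as an "extension".

```python
import re

EXTENSION = re.compile(r"\w+(?![\.\w])")   # extension without the dot
NAME = re.compile(r"[\w\. ]+(?=[\.])")     # name up to the last dot


def first(pattern: re.Pattern, text: str):
    # First match anywhere in the text, or None.
    m = pattern.search(text)
    return m.group(0) if m else None


print(first(EXTENSION, "report.final.pdf"), first(NAME, "report.final.pdf"))  # pdf report.final
print(first(EXTENSION, r"C:\docs\report.pdf"))                               # C
```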
OK, I am not sure why I would use a regular expression for this. If I know, for example, that the string is a full file path, then I would use another API to get the file name. Regular expressions are very powerful but at the same time quite complex (you have just proved that by asking how to create such a simple regex). As somebody said: you had a problem and decided to solve it using regular expressions; now you have two problems.
Think again. If you are on the .NET platform, for example, take a look at the System.IO.Path class.
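In the same spirit, Python's standard library already handles this without a regex. A minimal sketch with pathlib; PureWindowsPath parses backslash-separated paths on any OS, and .stem drops only the last suffix.

```python
from pathlib import PurePosixPath, PureWindowsPath

# .stem drops only the last suffix; a leading-dot name like ".logs" has no suffix.
print(PureWindowsPath(r"C:\Log\test\bin\report.final.pdf").stem)  # report.final
print(PurePosixPath("/var/log/.logs").stem)                       # .logs
print(PurePosixPath("adfadsfads.blah.txt").stem)                  # adfadsfads.blah
```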
I used this pattern for simple search:
^\s*[^\.\W]+$
for this text:
file.ext
fileext
file.ext.ext
file.ext
fileext
It finds fileext in the second and last lines.
I applied it in a text tree view of a folder (with spaces as indents).
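To reproduce this in Python, the pattern needs re.MULTILINE so that ^ and $ match at every line. Since . is never a word character, [^\.\W] is effectively the same as \w here.

```python
import re

TREE = """file.ext
fileext
file.ext.ext
file.ext
fileext"""

# Lines made only of (optional indent plus) word characters, i.e. no dot.
print(re.findall(r"^\s*[^\.\W]+$", TREE, re.MULTILINE))  # ['fileext', 'fileext']
```

One caveat: \s also matches newlines, so with blank lines in the input a match can swallow the preceding empty line; [ \t]* instead of \s* avoids that.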
Just the name of the file, without path and suffix.
^.*[\\/](.+?)\.[^\.]+$
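A quick Python check, written with [\\/] as the separator class (inside [...], a | would be a literal pipe, not "or"). It also shows a limitation: [^.]+ can run across separators, so a dotted folder above an extension-less file yields the folder name; using [^.\\/]+ there would be one way to prevent that.

```python
import re

# Separator class [\\/]: backslash or forward slash.
NAME_ONLY = re.compile(r"^.*[\\/](.+?)\.[^.]+$")

for path in [r"C:\Log\test\bin\report.final.pdf", "/home/user/archive.tar.gz"]:
    print(NAME_ONLY.match(path).group(1))
```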
Try
(?<=[\\\w\d-:]*\\)([\w\d-:]*)(?=\.[\.\w\d-:]*)
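Captures just the file name, of any kind, within an entire file path, deliberately excluding the path and the file extension. It relies on a variable-length lookbehind, which not every engine supports (Python's re, for example, rejects it).
E.g.: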
C:\Log\test\bin\fee105d1-5008-410c-be39-883e5e40a33d.pdf
Doesn't capture (C:\Log\test\bin)
Captures (fee105d1-5008-410c-be39-883e5e40a33d)
Doesn't capture (.pdf)
This RegExp works for me:
(.+(?=\..+$))|(.+[^\.])
Results (input → matched text):
test.txt → test
test 234!.something123 → test 234!
.test → .test
.test.txt → .test
test.test2.txt → test.test2
. → (no match)