Ignore everything in a directory except one subfolder - regex

I have a directory ~/x7/music/sfx.
There are some files and folders in the root of ~/x7/music.
I need to sync only the sfx folder and ignore anything else in music.
I've tried many variants, but all of them were wrong.
ignore = Name music/*
ignorenot = Regex music/sfx/.* (OR just *)
does not work.
I was expecting to use something like
ignore = Name music/*^/

I'm not familiar with Unison, but to ignore everything except sfx you could use
ignore = Regex /root/path/to/music/.*
ignorenot = Regex /root/path/to/music/sfx/.*
From the Unison documentation:
There is also an ignorenot preference, which specifies a set of
patterns for paths that should not be ignored, even if they match an
ignore pattern. However, the interaction of these two sets of patterns
can be a little tricky. Here is exactly how it works:
Unison starts detecting updates from the root of the replicas—i.e., from the empty path. If the empty path matches an ignore pattern and
does not match an ignorenot pattern, then the whole replica will be
ignored. (For this reason, it is not a good idea to include Name *
as an ignore pattern. If you want to ignore everything except a
certain set of files, use Name ?*.)
If the root is a directory, Unison continues looking for updates in all the immediate children of the root. Again, if the name of some
child matches an ignore pattern and does not match an ignorenot
pattern, then this whole path including everything below it will be
ignored.
If any of the non-ignored children are directories, then the process continues recursively.

According to Unison's documentation, if a path is ignored, then so is everything below it. So if you want to ignore everything within a folder except one subfolder, you should not ignore the folder itself but everything inside it (which is different), and then use ignorenot.
ignore = Path x7/music/?*
ignore = Path x7/music/.?*
ignorenot = Path x7/music/sfx
That should do it.
Regarding the particular patterns used there (they are globs, not regexes), I'm once again following the advice in Unison's documentation: "For this reason, it is not a good idea to include Name * as an ignore pattern. If you want to ignore everything except a certain set of files, use Name ?*."
The second ignore line also ignores hidden files and folders (names starting with a dot) within music, if you need that.
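To check a pattern set before running Unison, the traversal rule quoted from the documentation can be modelled in a few lines of Python. This is a simplified sketch, not Unison's implementation. It assumes only Path patterns, uses a hypothetical glob_to_regex helper that approximates Unison's globs (* and ? never match /, and a leading * or ? does not match a leading dot), and walks an in-memory list of paths instead of a real replica.

```python
import re

def glob_to_regex(glob: str) -> re.Pattern:
    """Hypothetical helper approximating a Unison Path glob (simplified)."""
    parts = []
    for component in glob.split("/"):
        regex = ""
        if component and component[0] in "*?":
            regex += r"(?!\.)"  # a leading * or ? does not match a leading dot
        for ch in component:
            if ch == "*":
                regex += "[^/]*"   # never crosses a '/'
            elif ch == "?":
                regex += "[^/]"
            else:
                regex += re.escape(ch)
        parts.append(regex)
    return re.compile("/".join(parts))

def is_ignored(path, ignore, ignorenot):
    def hits(patterns):
        return any(glob_to_regex(p).fullmatch(path) for p in patterns)
    return hits(ignore) and not hits(ignorenot)

def synced_paths(all_paths, ignore, ignorenot):
    """Walk top-down; once a path is ignored, nothing below it is examined."""
    kept, ignored_prefixes = [], []
    for path in sorted(all_paths):  # a parent always sorts before its children
        if any(path.startswith(p + "/") for p in ignored_prefixes):
            continue  # an ancestor was ignored
        if is_ignored(path, ignore, ignorenot):
            ignored_prefixes.append(path)
        else:
            kept.append(path)
    return kept

# Invented example tree; paths are relative to the replica root.
tree = [
    "x7", "x7/music", "x7/music/sfx", "x7/music/sfx/boom.wav",
    "x7/music/song.mp3", "x7/music/.hidden", "x7/music/albums",
    "x7/music/albums/track.mp3",
]
ignore = ["x7/music/?*", "x7/music/.?*"]
ignorenot = ["x7/music/sfx"]
print(synced_paths(tree, ignore, ignorenot))
```

In this model, ignoring x7/music itself with the same ignorenot keeps only x7, because sfx is never reached. That is the "ignore the contents, not the folder" point made above.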

On Unison 2.48.3, this should work:
ignore = Path /root/path/to/music/*
ignorenot = Path /root/path/to/music/sfx

Related

Path Validation - My RegEx is matching leading spaces in directory names and I can't fix it

I'm back again with more RegEx shenanigans.
I had what I thought was a perfect Windows path validation expression.
Here it is at Regex101: https://regex101.com/r/BertHu/6
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?[^\\\/:*?"<>|\r\n]+\\?)(?:[^\\\/:*?"<>|\r\n]+\\)*[^\\\/:*?"<>|\r\n]*$
Breakdown:
# (?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\| # Drive
# \\?[^\\\/:*?"<>|\r\n]+\\?) # Relative path
# (?:[^\\\/:*?"<>|\r\n]+\\)* # Folder
# [^\\\/:*?"<>|\r\n]* # File
The issue I'm having now is that the expression matches paths with leading spaces in directory names.
Example: C:\ Leading Space\ Shouldnt Match matches.
I tried adding [^\s] to the folder portion of the expression:
(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*
But that only invalidates a leading space in the first path segment:
C:\ LeadingSpace\ShouldntMatch Doesn't match (Good)
C:\LeadingSpace\ ShouldntMatch Matches incorrectly (Bad)
I think the problem lies here:
If anyone could help or point me in the right direction that would be great.
Sorry for all the RegEx questions!
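A quick Python check shows why the [^\s] tweak only catches the first segment. This assumes case-insensitive matching, which is presumably how the regex101 link is configured, because [a-z] would otherwise reject C:. In C:\LeadingSpace\ ShouldntMatch, the space-prefixed name is the last segment. It is therefore consumed by the unguarded File part rather than the Folder part.

```python
import re

# The question's regex with [^\s] added to the Folder group only.
FOLDER_GUARDED = re.compile(
    r'^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?[^\\\/:*?"<>|\r\n]+\\?)'
    r'(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*'  # Folder: now rejects a leading space
    r'[^\\\/:*?"<>|\r\n]*$',            # File: still accepts a leading space
    re.IGNORECASE,
)

for path in [r"C:\ LeadingSpace\ShouldntMatch", r"C:\LeadingSpace\ ShouldntMatch"]:
    print(path, "->", bool(FOLDER_GUARDED.match(path)))
```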
Well, it depends on what the exact rules are. Taking your regex101 script as a basis, I would say:
File, Folder and Relative path are more or less the same (if you ignore the non-capturing group and the backslashes):
\\?[^\\\/:*?"<>|\r\n]+\\?
(?:[^\\\/:*?"<>|\r\n]+\\)*
[^\\\/:*?"<>|\r\n]*
So there are three potential places where a folder could start with a leading space.
You could add a [^\s] in front of all of them, like this:
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?([^\s][^\\\/:*?"<>|\r\n])+\\?)(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*([^\s][^\\\/:*?"<>|\r\n])*$
I saved the modified regex on regex101: https://regex101.com/r/Pd3lcR/1
Now it should work, at least for my limited test cases and my understanding of the restrictions.
By the way, I don't know what your use case is, but this regex is pretty long for simple matching and file name capture; maybe there is a more readable way (for non-regex people).
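As one sketch of a more readable approach, the path can be split into segments and each segment checked on its own. The is_valid_drive_path helper is hypothetical and not from the question. It assumes the same forbidden characters as the original regex, handles only drive-letter paths such as C:\..., and rejects any segment that is empty or starts with whitespace.

```python
import re

# One path segment: no forbidden characters and no leading whitespace.
SEGMENT = re.compile(r'[^\s\\/:*?"<>|][^\\/:*?"<>|\r\n]*')
DRIVE = re.compile(r"[a-z]:", re.IGNORECASE)

def is_valid_drive_path(path: str) -> bool:
    """Hypothetical validator for absolute drive-letter paths."""
    drive, sep, rest = path.partition("\\")
    if not DRIVE.fullmatch(drive) or not sep:
        return False
    segments = rest.split("\\")
    if segments[-1] == "":
        segments.pop()  # allow one trailing backslash, as the original regex does
    return all(SEGMENT.fullmatch(s) for s in segments)

print(is_valid_drive_path(r"C:\LeadingSpace\ShouldMatch"))     # True
print(is_valid_drive_path(r"C:\LeadingSpace\ ShouldntMatch"))  # False
```

Each rule (drive, separator, segment) can then be changed on its own. UNC shares and relative paths would need their own branches.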
Update:
To fix the bug this introduced, I had to prevent the Share option from matching the relative path, by disallowing a double backslash with (?!\\)
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?((?!\\)[^\\\/:*?"<>|\r\n])+\\?)(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*([^\s][^\\\/:*?"<>|\r\n])*$
Here is the updated regex: https://regex101.com/r/RMVkTC/3
Update (Version 2):
I rewrote the regex the way I would write it. It is not perfectly optimized (short), but this way it is easier to test and debug.
The regex consists of exactly three parts, joined with a pipe (|):
Drive + path + folder/file: (^[a-z]:\\([^\s][^\\\/:*?"<>|\r\n]+\\)*[^\s][^\\\/:*?"<>|\r\n]+$)
Relative path + folder/file: (^(\.?\.?\\[^\\])*([^\\\/:*?"<>|\r\n]+\\?)*$)
Share + folder/file: (^\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\([^\s][^\\\/:*?"<>|\r\n]*\\?)*$)
This way, if you have to change something for one edge case, the change stays contained and is easy to make.
Here is the updated regex: https://regex101.com/r/Qxj3Ni/1
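The three alternatives of Version 2 can be tested in Python as shown below. This assumes the regex101 link uses case-insensitive matching, which [a-z]: needs in order to accept C:. The UNC path \\server\share\folder is invented for illustration.

```python
import re

DRIVE_PATH = r'(^[a-z]:\\([^\s][^\\\/:*?"<>|\r\n]+\\)*[^\s][^\\\/:*?"<>|\r\n]+$)'
RELATIVE_PATH = r'(^(\.?\.?\\[^\\])*([^\\\/:*?"<>|\r\n]+\\?)*$)'
SHARE_PATH = r'(^\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\([^\s][^\\\/:*?"<>|\r\n]*\\?)*$)'

# The three parts joined with "|", as described above.
WINDOWS_PATH = re.compile("|".join([DRIVE_PATH, RELATIVE_PATH, SHARE_PATH]), re.IGNORECASE)

for path in [
    r"C:\LeadingSpace\ShouldMatch",
    r"C:\ Leading Space\ Shouldnt Match",
    r"C:\LeadingSpace\ ShouldntMatch",
    r"\\server\share\folder",
]:
    print(path, bool(WINDOWS_PATH.match(path)))
```

As far as I can tell, the relative-path alternative has no [^\s] guard, so a relative path such as " Leading\file" (with a leading space) is still accepted. If that matters, that part needs the same guard.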

regex match file name in a path

I am trying to match the file name in a path. For example:
/path/test/index.html
I would want to match index.html
However, I can also have a path with no /, so the path could be
index.html
and I would want to match index.html
I have the following to match the first case and can grab it with a group.
.*/([^/]+)
But how can I also match a file name when the only thing in the path is the file name?
There is probably no need for anything but [^/]+$, unless you want to match the entire line and your regex engine requires it.
There is usually more than one way to write a regex, and there are often edge cases that end up complicating a simple task.
If you want to match every valid file name in a string, then perhaps:
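In Python, for example, re.search does not require the match to cover the whole string, so the short pattern handles both inputs from the question. The file_name helper name is made up for illustration.

```python
import re

def file_name(path: str):
    """Return the last path component, or None if the path ends with '/'."""
    match = re.search(r"[^/]+$", path)
    return match.group(0) if match else None

print(file_name("/path/test/index.html"))  # index.html
print(file_name("index.html"))             # index.html
```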
[A-Za-z0-9_-]+\.?[A-Za-z0-9]*
or, since you can have a file named ConfigurationFile.txt.bac for example:
[A-Za-z0-9_.-]+\.?[A-Za-z0-9]*
But that is not quite what you want, because every directory name is also a valid file name.
This will match only valid file names with an extension:
[A-Za-z0-9_-]+\.[A-Za-z0-9]+
or
[A-Za-z0-9_.-]+\.[A-Za-z0-9]+
Clearly there are many options. The AA (accepted answer) only matches the string at the end of the path. It does not match a file name without a path. The AA may well do for the OP, but it is helpful to me to be able to match any file name within a string.
There are always edge cases: in my case, for example, this regex still matches version numbers. I have a workaround for my case, but that is getting too specific.
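The version-number edge case is easy to reproduce in Python with the extension-only pattern [A-Za-z0-9_-]+\.[A-Za-z0-9]+. The sample path below is invented for illustration.

```python
import re

# Strings of the form name.ext, as in the extension-only pattern above.
WITH_EXTENSION = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")

print(WITH_EXTENSION.findall("/usr/lib/python3.11/foo.py"))
# ['python3.11', 'foo.py']: the directory name "python3.11" looks like a file name too
```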
Make the .*/ into an optional group:
(?:.*/)?([^/]+)
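Here is a quick check of the optional-group version in Python. It uses fullmatch on the assumption that the whole path should be consumed, mirroring .*/([^/]+) from the question.

```python
import re

FILE_NAME = re.compile(r"(?:.*/)?([^/]+)")

for path in ["/path/test/index.html", "index.html"]:
    match = FILE_NAME.fullmatch(path)
    print(match.group(1) if match else None)  # index.html both times
```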

Regex expression to match a string but exclude something at the same time

I'll try to ask this as concisely as possible, so please forgive me if I'm leaving something out. I want the expression to match all cases except those where an exact file name string is present.
The backup software I'm using uses regular expressions. I want to set up an exclusion that skips all files with a particular extension, except for certain files that I need to back up, so I don't want those to match.
For this example, say the files I want to exclude are *.FLV:
(?i).*\.flv
I want to include three files in my backups: abc123.flv, ghk432.flv, and fdw917.flv
This is where I'm having trouble, even when trying to include just one of the three files in the backup:
(?i).*\.flv^(?!(abc123\.flv))&
The expression is being added to an Exclusion List for code42 CrashPlan backup, their support unfortunately cannot assist with complex RegEx expressions.
The closest thing I can supply as an example is their Example 3: Using An Exclude To Include:
.*/Documents/((?!(.*\.(doc|rtf)|.*/)$).)*$
http://support.code42.com/Administrator/3.6_And_4.0/Configuring/Using_Include_And_Exclude_Filters
However, it excludes all files within directories named "Documents" and includes any files in those folders with doc or rtf file extensions. I'm trying to create an expression that works with file extensions regardless of folder location.
In my brain logically it seems like I need to write this as some kind of if then else statement but regex is not my forte.
Use an anchored negative look ahead with an alternation for the files you want to keep:
^(?i)(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv
The negative lookahead asserts that the following input does not match its regex, and the pipe character means "or".
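The exclusion can be checked in Python before you paste it into CrashPlan. This relies on two assumptions. First, the tool matches the pattern against the whole path, which is modelled here with re.fullmatch. Second, (?i) is moved to the very start, because recent Python versions reject global flags placed after ^. The sample paths are invented.

```python
import re

EXCLUDE = re.compile(r"(?i)^(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv")

def is_excluded(path: str) -> bool:
    return EXCLUDE.fullmatch(path) is not None

for path in ["/videos/movie.FLV", "/videos/abc123.flv", "/videos/xabc123.flv"]:
    print(path, is_excluded(path))
```

Note the last case. Because the lookahead starts with .*, any file whose name merely ends in one of the kept names (here xabc123.flv) is also kept.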
Try to put the negative lookahead at the position of the filename in the path:
^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$
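Running this version against the same invented paths in Python shows the difference. The lookahead now has to start exactly at the file name, so xabc123.flv is excluded while abc123.flv is kept. Here (?i) is prepended as an assumption, since the pattern above is case-sensitive and would not match .FLV.

```python
import re

EXCLUDE = re.compile(r"(?i)^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$")

def is_excluded(path: str) -> bool:
    return EXCLUDE.search(path) is not None

for path in ["/videos/movie.FLV", "/videos/abc123.flv", "/videos/xabc123.flv"]:
    print(path, is_excluded(path))
```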

Regex to match directory path and ignore filepaths

I have the following input list, from which I want to extract the directory paths and ignore the file paths. Below is an example input list, separated by semicolons:
MI4/Search/Service/src/main/resources/META-INF/persistence.xml;MI4/Search/Service/src/main/resources/META-INF;MI4/FRSearch/Service/src/main/resources/resource/spring.xml;MI4/Search/Service/src/main/resources/conf;
The regex should match
MI4/Search/Service/src/main/resources/META-INF;
MI4/Search/Service/src/main/resources/conf;
First of all, directories can be named with extensions, so checking for the presence or absence of an extension is not a reliable way to determine whether a path is really a file or a directory. In fact, the only way to determine whether a path is a directory or a file is to use the appropriate OS API, e.g. GetFileAttributes on Windows or stat on Linux.
If this is your requirement, then you should split on the semicolon and iterate over each path that results, feeding each one in turn into the appropriate API to determine if it is a file or directory. If a textual match is all you need, I would still suggest you split on the semicolon and then match each individual path against an appropriate regular expression.
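Here is a minimal Python sketch of the split-and-ask-the-OS approach. os.path.isdir queries the file system (via stat on POSIX), so it only gives a meaningful answer for paths that exist on the machine running the code. Paths that do not exist are reported as not being directories. The directories_only name is made up for illustration, and the demo uses a temporary directory rather than the MI4 paths.

```python
import os
import tempfile

def directories_only(path_list: str) -> list[str]:
    """Split a ';'-separated list and keep the entries that are existing directories."""
    paths = [p for p in path_list.split(";") if p]  # drop the empty trailing entry
    return [p for p in paths if os.path.isdir(p)]

with tempfile.TemporaryDirectory() as root:
    conf_dir = os.path.join(root, "conf")
    os.mkdir(conf_dir)
    spring_file = os.path.join(root, "spring.xml")
    open(spring_file, "w").close()
    result = directories_only(f"{spring_file};{conf_dir};")
print(result)  # only conf_dir is listed
```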
A Ruby function that would extract the extension might look like the following:
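def get_extension(path)
  # Capture the last "." of the final path component (unless it is that
  # component's first character) plus what follows; $1 is nil without a match.
  path =~ /[^\/](\.[^.\/]*)$/
  $1
end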
Note that there are a few issues you'll need to deal with. This regular expression, for example, would treat the path foo/bar/.hidden as a path without an extension. This might not be exactly the behaviour you need. You'd need to tweak the expression appropriately.
You would then obtain all the paths for which get_extension returns nil. Please let us know which language you're trying to do this in, since there are significant syntactic differences.
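For comparison, here is a rough Python equivalent of the textual approach. It splits on the semicolon and keeps the paths for which the get_extension pattern finds nothing. It inherits the same caveats, so a directory named like foo.d would be treated as a file.

```python
import re

# Same pattern as the Ruby get_extension: the last '.' in the final path
# component (not its first character), followed by no further '.' or '/'.
EXTENSION = re.compile(r"[^/](\.[^./]*)$")

def get_extension(path: str):
    match = EXTENSION.search(path)
    return match.group(1) if match else None

def extensionless_paths(path_list: str) -> list[str]:
    return [p for p in path_list.split(";") if p and get_extension(p) is None]

paths = ("MI4/Search/Service/src/main/resources/META-INF/persistence.xml;"
         "MI4/Search/Service/src/main/resources/META-INF;"
         "MI4/FRSearch/Service/src/main/resources/resource/spring.xml;"
         "MI4/Search/Service/src/main/resources/conf;")
print(extensionless_paths(paths))
```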

How do you find a "."?

I'm trying to create a regular expression to look for filenames from full file paths, but it should not return for just a directory. For example, C:\Users\IgneusJotunn\Desktop\file.doc should return file.doc while C:\Users\IgneusJotunn\Desktop\folder should find no matches. These are all Word or Excel files, but I prefer not to rely on that. This:
StringRegExp($string, "[^\\]*\z",1)
finds whatever is after the last slash, but can't differentiate files from folders. This:
StringRegExp($string, "[^\\]*[dx][ol][cs]\z",1)
almost works, but it is an ugly hack, and there may be docx or xlsx files. Plus, files could be named like MyNamesDoc.doc. This would be easy to solve if I could search for a period, but . is a special character (it matches any single character except a newline), and escaping it does not seem to work. This:
StringRegExp($ue_string, "[^\\]*\..*\z",1)
should work: it finds anything after the last backslash, but only captures something with a period in it. How do I incorporate a period? Or is there any way to match only files?
Edit: Answered my own question. I'm interested in why it wasn't working and if there's a more elegant solution.
Local $string = StringRegExp($string, "[^\\]*\.doc\z|[^\\]*\.docx\z|[^\\]*\.xls\z|[^\\]*\.xlsx\z",1)
Periods do in fact work with the same backslash escape that most special characters use. As for the document type, an Or pipe with a different extension works great. If for some reason you need to add an extension, just add another Or.
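You can check the effect of escaping with Python's re, which uses the same \. syntax for a literal period. AutoIt's StringRegExp is PCRE-based, so these patterns carry over, and re.search stands in for StringRegExp here. With an unescaped ., the pattern also accepts the folder path, because . matches a backslash as well.

```python
import re

file_path = r"C:\Users\IgneusJotunn\Desktop\file.doc"
folder_path = r"C:\Users\IgneusJotunn\Desktop\folder"

escaped = re.compile(r"[^\\]*\..*\Z")   # \. is a literal period
unescaped = re.compile(r"[^\\]*..*\Z")  # . is any character, even a backslash

print(escaped.search(file_path).group(0))      # file.doc
print(escaped.search(folder_path))             # None
print(unescaped.search(folder_path).group(0))  # the whole folder path matches
```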
Meh, I'm bored. You could do this:
$sFile = StringRegExp($sPath, "[^\\]+\.(?:doc|xls)x?$", 1)
There's no guarantee that a folder wouldn't be named like that, so to be absolutely certain you'd have to check the file/folder attributes. However, it's doubtful anyone would name a folder something like '.docx'.
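The same one-liner can be tried in Python, with re.search standing in for StringRegExp with flag 1. The example paths are the ones from the question plus two invented ones.

```python
import re

OFFICE_FILE = re.compile(r"[^\\]+\.(?:doc|xls)x?$")

def office_file_name(path: str):
    match = OFFICE_FILE.search(path)
    return match.group(0) if match else None

print(office_file_name(r"C:\Users\IgneusJotunn\Desktop\file.doc"))  # file.doc
print(office_file_name(r"C:\Users\IgneusJotunn\Desktop\folder"))    # None
print(office_file_name(r"C:\Reports\budget.xlsx"))                  # budget.xlsx
```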
1. Reverse the string.
2. Look for the ".".
3. Look for "\" (and/or "/") with StringInStr.
4. Trim the right side, starting at the position StringInStr returns.
5. Reverse it again.
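The reverse-and-trim steps can be sketched in Python as follows. This is one interpretation of the steps, with str.find standing in for StringInStr and a hypothetical helper name. As an added assumption, it also requires the "." to come before the first separator in the reversed string, i.e. to sit inside the last path component.

```python
def file_name_if_any(path: str):
    """Reverse, find '.', cut at the first separator, then reverse back."""
    reversed_path = path[::-1]
    dot = reversed_path.find(".")
    separators = [i for i in (reversed_path.find("\\"), reversed_path.find("/")) if i != -1]
    cut = min(separators) if separators else len(reversed_path)
    if dot == -1 or dot > cut:
        return None  # no '.' in the last component, so treat it as a folder
    return reversed_path[:cut][::-1]

print(file_name_if_any(r"C:\Users\IgneusJotunn\Desktop\file.doc"))  # file.doc
print(file_name_if_any(r"C:\Users\IgneusJotunn\Desktop\folder"))    # None
```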