inotifywait - exclude regex pattern formatting - regex

I am trying to use inotifywait to watch all .js files under my ~/js directory; how do I format my regex inside the following command?
$ inotifywait -m -r --exclude [REGEX HERE] ~/js
The regex, which according to the man page should be a POSIX extended regular expression, needs to match "all files except those that end in .js", so that these files can in turn be excluded by the --exclude option.
I've tried the (?!) lookaround thing, but it doesn't seem to work in this case. Any ideas or workarounds? Would much appreciate your help on this issue.

I've tried the (?!) thing
This thing is called negative lookahead and it is not supported by POSIX ERE.
So you have to do it the hard way, i.e. match everything that you want to exclude.
e.g.
\.(txt|xml) etc.
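Before handing such a pattern to --exclude, it can help to try it against a few sample paths. The sketch below uses grep -E, which speaks the same POSIX extended syntax. The helper name would_exclude, the sample paths, and the trailing $ anchor are assumptions for illustration, and it assumes the pattern is tested against the path the way grep tests a line.

```shell
# Hypothetical helper: succeed if path $2 matches the ERE pattern $1.
would_exclude() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Example pattern listing the extensions we do NOT want to watch.
pattern='\.(txt|xml)$'

for p in src/app.js notes.txt data/feed.xml; do
  if would_exclude "$pattern" "$p"; then
    echo "excluded: $p"
  else
    echo "watched:  $p"
  fi
done
```

Once the sample paths come out as intended, the same quoted pattern can be passed to inotifywait --exclude.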

inotifywait has no include option and POSIX extended regular expressions don't support negation. (Answered by FailedDev)
You can patch the inotify tools to get an --include option. But you need to compile and maintain it yourself. (Answered by browndav)
A quicker workaround is using grep.
$ inotifywait -m -r ~/js | grep '\.js$'
But be aware of grep's buffering if you pipe the output to another command. Add --line-buffered to make it work with while read. Here is an example:
$ inotifywait -m -r ~/js | grep '\.js$' --line-buffered |
while read path events file; do
echo "$events happened to $file in $path"
done
If you just want to watch already existing files, you can also use find to generate the list of files. It will not watch newly created files.
$ find ~/js -name '*.js' -print0 | xargs -0 inotifywait -m
If all your files are in one directory, you can also use ostrokach's suggestion. In that case shell expansion is much easier than find and xargs. But again, it won't watch newly created files.
$ inotifywait -m ~/js/*.js

I posted a patch here that adds --include and --includei options that work like negations of --exclude and --excludei:
https://github.com/browndav/inotify-tools/commit/160bc09c7b8e78493e55fc9f071d0c5575496429
Obviously you'd have to rebuild inotifytools, and this is relatively untested, but hopefully it can make it in to mainline or is helpful to someone who comes across this post later.

Make sure you quote the regex if it contains shell-relevant characters (including parentheses).
While this is working:
inotifywait --exclude \.pyc .
this is not:
inotifywait --exclude (\.pyc|~) .
You have to quote the entire regular expression:
inotifywait --exclude '.*(\.pyc|~)' .

As of version 3.20.1, inotifywait does include the --include and --includei options.
To see them, run inotifywait --help. For some reason, they aren't documented in the manpages.

You could get most of the way with --exclude '\.[^j][^s]', which excludes any path containing a dot followed by a character that is not j and then a character that is not s, so names ending in .js are not excluded on account of their extension. If you combine it with -r then it will work with arbitrary levels of nesting.
The drawback is that this is only an approximation. Extensions such as .ts or .jpg, and names without a dot, are never excluded, while a .js file is excluded if any other part of its path matches, as in jquery.min.js.
You could probably extend the regex to cover these cases, but personally I don't think the drawbacks are a big enough deal to worry about.

Related

Regex with inotifywait to compile two types of file in golang

I use a script to auto-compile in golang with inotifywait. But this script only checks files with the extension .go. I want to also add the .tmpl extension but the script uses regular expressions. What kind of changes I have to make to this line to get the desired result?
inotifywait -q -m -r -e close_write -e moved_to --exclude '[^g][^o]$' $1
I've tried to concatenate with | or & and other things like ([^t][^m][^p][^l]|[^g][^o])$ but nothing seems to work.
Rather than trying to use a regex to exclude two types of file, why don't you just only watch those files?
inotifywait -q -m -r -e close_write -e moved_to /path/**/*.{go,tmpl}
To use the ** (which does a recursive match), you may have to enable bash's globstar:
shopt -s globstar
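To see what the recursive glob actually expands to before passing it to inotifywait, something like the following can be used. It is a minimal sketch assuming bash 4 or later; the function name list_sources is made up for this example, and nullglob is added so that a pattern with no matches expands to nothing instead of staying literal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every .go and .tmpl file below directory $1.
# The body runs in a subshell so the shopt settings and cd do not leak out.
list_sources() (
  shopt -s globstar nullglob   # ** recurses; unmatched patterns vanish
  cd "$1" || exit 1
  files=(**/*.go **/*.tmpl)
  if ((${#files[@]})); then
    printf '%s\n' "${files[@]}"
  fi
)

list_sources .
```

The list is computed once, so files created after inotifywait starts are not picked up by this approach.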

Search filenames with regex

Is there any way to do something like git log <path>, but instead of path using a regex? I want to search commits containing files, whose filenames match a given pattern...
... and while we're at it: Is there also a way to do a git status / git diff only for filenames matching a given pattern?
Thanks in advance!
EDIT: It would be terrific if the solution also worked for Git v1.7.1.
As far as I'm aware, the only pure git option for matching specific file patterns is a glob.
git log -- '*.json'
This will give you all commits that contain changes to a JSON file. The same can be done for git status.
On the other hand it's quite easy to search for regular expressions in the diff or the commit message. git log offers a --grep option to search for matches in the commit message and a -S option to search for strings.
Take a look at this question for further details.
For a simple pattern you could try, for example:
find . -name "*.c" | xargs git log
For a full-blown regex you can use:
find . | grep "REGEX" | xargs git log
If you need previously deleted files to be included in the output, you can use
git log --all --pretty=format: --name-only --diff-filter=A | sort -u | grep "REGEX" | xargs git log --
The first part of the above command, which finds all files that were ever in git, was lifted from an answer to this other question.
Thanks to your answers (especially Greg and Michael) I developed a way myself. (I hope this proves viable):
git log --name-only --pretty="format:"|sort -u|egrep '<REGEX>'|xargs git log --
Can you do something like:
git log | grep [string_to_look_for]

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
one way with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special chars (like spaces...) in your filenames. If there are, you need to quote the filenames in the mv command.
I added a $(pwd) to get the absolute path of found name.
you can remove the ending |sh to check generated mv ... .... cmd, if it is correct.
If everything looks good, add the |sh to execute the mv
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files, it won't execute those commands. It is safe. To execute it, you can save it to file and execute, or simply pipe to shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.
You can do that with bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed):
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while IFS= read -r f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)
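The end-anchored substitution described above can be tried on its own before any file is renamed. A minimal sketch follows; the function name collapse_gz is hypothetical, and sed -E is used as the portable spelling of -r.

```shell
# Hypothetical helper: collapse a trailing run of .gz extensions into one.
# The $ anchor means a .gz in the middle of the name is left untouched.
collapse_gz() {
  printf '%s\n' "$1" | sed -E 's/(\.gz)+$/.gz/'
}

collapse_gz 'foo.c.gz.gz.gz'
collapse_gz 'a.gz.b.gz.gz'
collapse_gz 'my file.gz.gz'
```

Names that already end in a single .gz pass through unchanged, so the helper is safe to call on every match.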
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). (Mine is not perfect, either) Note the use of double-quotes, which protects against filenames with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
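Put together, the null-delimited variant might look like the sketch below. It assumes bash (for read -d) and a find that supports -print0; the function name fix_gz_tree is made up for this example, and it shares the limitation of the loop above that a directory name containing .gz.gz would be truncated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename every file under $1 that ends in two or more
# .gz extensions. Names are passed NUL-delimited, so spaces and newlines
# inside them are safe.
fix_gz_tree() {
  find "$1" -type f -name '*.gz.gz' -print0 |
  while IFS= read -r -d '' f; do
    # Strip everything from the first ".gz.gz" onward, then add one ".gz".
    mv -- "$f" "${f%%.gz.gz*}.gz"
  done
}
```

Calling fix_gz_tree . processes the current directory tree. Putting echo in front of mv gives a dry run first.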
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.

Copy and Rename Multiple Files with Regular Expressions in bash

I've got a file structure that looks like:
A/
2098765.1ext
2098765.2ext
2098765.3ext
2098765.4ext
12345.1ext
12345.2ext
12345.3ext
12345.4ext
B/
2056789.1ext
2056789.2ext
2056789.3ext
2056789.4ext
54321.1ext
54321.2ext
54321.3ext
54321.4ext
I need to rename all the files that begin with 20 to start with 10; i.e., I need to rename B/2022222.1ext to B/1022222.1ext
I've seen many of the other questions regarding renaming multiple files, but couldn't seem to make it work for my case. Just to see if I can figure out what I'm doing before I actually try to do the copy/renaming I've done:
for file in "*/20?????.*"; do
echo "{$file/20/10}";
done
but all I get is
{*/20?????.*/20/10}
Can someone show me how to do this?
You just have a little bit of incorrect syntax is all:
for file in */20?????.*; do mv "$file" "${file/20/10}"; done
Remove quotes from the argument to in. Otherwise, the filename expansion does not occur.
The $ in the substitution goes before the opening brace: ${file/20/10}, not {$file/20/10}.
Here is a solution which uses the find command:
find . -name '20*' | while read oldname; do echo mv "$oldname" "${oldname/20/10}"; done
This command does not actually do your bidding, it only prints out what should be done. Review the output and if you are happy, remove the echo command and run it for real.
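Note that ${oldname/20/10} replaces the first 20 anywhere in the path, which could hit a directory name. A more defensive sketch, using only POSIX parameter expansion, rewrites just a leading 20 in the file name itself. The function name rename_target is hypothetical.

```shell
# Hypothetical helper: print the new path for $1, changing a leading "20"
# in the basename to "10" and leaving the directory part alone.
rename_target() {
  case $1 in
    */*) dir=${1%/*}/ ;;   # keep everything up to the last slash
    *)   dir= ;;           # no directory component
  esac
  base=${1##*/}
  case $base in
    20*) printf '%s\n' "${dir}10${base#20}" ;;
    *)   printf '%s\n' "$1" ;;   # not a 20-file: unchanged
  esac
}

rename_target 'B/2056789.1ext'
```

It can be dropped into either loop, for example echo mv "$oldname" "$(rename_target "$oldname")" for a dry run.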
Just wanna add to Explosion Pill's answer.
On OS X though, you must say
mv "${file}" "${file_expression}"
Or the mv command does not recognize it.
Brace expansions like :
{*/20?????.*/20/10}
can't be surrounded by quotes.
Instead, try doing (with Perl rename) :
rename 's|/20|/10|' */20*
You can do this using the Perl tool rename from the shell prompt. (There are other tools with the same name which may or may not be able to do this, so be careful.)
If you want to do a dry run to make sure you don't clobber any files, add the -n switch to the command.
note
If you run the following command (linux)
$ file $(readlink -f $(type -p rename))
and you have a result like
.../rename: Perl script, ASCII text executable
then this seems to be the right tool =)
This seems to be the default rename command on Ubuntu.
To make it the default on Debian and derivatives like Ubuntu:
sudo update-alternatives --set rename /path/to/rename
The glob behavior of * is suppressed in double quotes. Try:
for file in */20?????.*; do
echo "${file/20/10}";
done

sed script extended regex

I need to write multiple sed script files. I can't seem to find a way to enable extended regex from within the script. Is this possible? It isn't possible for me to use option flags because the scripts need to run in an external environment which isn't under my control.
You can try specifying the flag in the script shebang, say:
#!/bin/sed -rf
# script goes here
And then tell the admin to run the script as is (chmod a+x it first, then ./script.sed) so the shebang line is used for finding the right interpreter.
You may need to substitute /bin/sed with the right path for your environment. Unfortunately you probably won't be able to use /usr/bin/env sed -r for this (the extra -r is a problem).
I think the answer to your question is "no", but, if this is GNU sed, then you probably don't really need extended regular expressions, because GNU sed's implementation of basic regular expressions actually supports the features of EREs that true POSIX BREs don't. Admittedly, the result is incredibly, painfully backslash-heavy — ERE's s/(a|b+|cd?)/e/g becomes BRE's s/\(a\|b\+\|cd\?\)/e/g — but it works.
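The equivalence can be checked quickly on a sample line. This sketch assumes GNU sed, since \|, \+ and \? in basic mode are GNU extensions. The function names and the sample input are made up for illustration, and -E is used as the portable spelling of -r.

```shell
# ERE form: needs the -E (or GNU -r) flag.
ere_version() { sed -E 's/(a|b+|cd?)/e/g'; }

# BRE form: no flag needed, but relies on GNU sed's backslash extensions.
bre_version() { sed 's/\(a\|b\+\|cd\?\)/e/g'; }

printf '%s\n' 'a bb cd c x' | ere_version
printf '%s\n' 'a bb cd c x' | bre_version
```

With GNU sed both commands should print the same line, so the backslash-heavy form can go into a script file that is run without any flags.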