Match and replace string with sed in makefile - regex

I want to search through a bunch of MD and HTML files for a line that begins with "id" and replace any instances of a string "old" with the string "new". I had this working from the command line with the following.
find \( -name '*.md' -o -name '*.html' \) -exec sed -i '/id:/s/old/new/g' '{} \;
However, I need to run the command from a Makefile. I have never done anything with make before. When I drop this same command into the Makefile and try to execute it from there, it fails. That's when I realized how little I know about make because I naively thought if it worked from the command line it would work from make. I was wrong. So I was looking in this Makefile for some examples of sed commands that do something similar and I came up with the following. This does not error out but it also does not do anything to my files. So, I am at a loss. Any help is appreciated!
switch_old_for_new:
	find \( -name '*.md' -o '*.html' \) -exec sed -i 's#^\(id: \)$(OLD)#\1$(NEW)#' '{}' \;
NOTE: as you can probably see, I need to be able to pass in two actual values for "old" and "new" from the command line, so I also need to have variables in the sed. So I would execute it like this:
make switch_old_for_new OLD=old NEW=new

It seems it was late and you ran out of coffee when copying the command line to make ;)
The only thing that was fishy in your first example was a superfluous ' right before {}. Almost everything else runs unchanged in make. In a recipe, \ has no special meaning to make: a tab-indented line following a target: line is passed to the shell just as you would type it on the command line. One exception is $, which make expands itself, so a literal $ meant for the shell must be written as $$. The other notable exception is a \ right before the line break, i.e. something like:
target:
	echo a very long \
	line with a \+newline in it
In this case make takes the \(newline) as an indication that it should pass the current line together with the next line (and any further lines joined by \(newline)) to the shell in a single call, instead of the default of one separate shell call per recipe line. (Note: only the tab, not the \(newline), is deleted from the string given to the shell; you need to trick around with variables a bit if that \(newline) gets in the way.)
Also, the quoting characters ' and ", the backtick, and the glob characters * and ? don't invoke any special behaviour in make; they are passed to the shell as they are.
So your make file could look like:
switch_old_for_new:
	find . \( -name '*.md' -o -name '*.html' \) -exec sed -i '/id:/s/$(OLD)/$(NEW)/g' {} \;
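As a quick check of what the sed expression in that recipe does, here is a minimal shell sketch. It assumes the values of OLD and NEW contain no / or other characters that are special to sed; the function name and the sample text are made up for illustration.

```shell
#!/bin/sh
# Replace every occurrence of $1 with $2, but only on lines containing "id:".
# Reads stdin, writes stdout. Hypothetical helper, not part of the Makefile above.
replace_on_id_lines() {
    sed "/id:/s/$1/$2/g"
}

# Sample input: only the first line contains "id:", so only it is changed.
printf 'id: old-old\ntitle: old\n' | replace_on_id_lines old new
```

Inside the recipe, make expands $(OLD) and $(NEW) before the shell ever sees the line, so make switch_old_for_new OLD=old NEW=new hands sed the same expression as this sketch.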

Related

Expand command line exclude pattern with zsh

I'm trying to pass a complicated regex as an ignore pattern. I want to ignore all subfolders of locales/ except locales/US/en/*. I may need to fallback to using a .agignore file, but I'm trying to avoid that.
I'm using silver searcher (similar to Ack, Grep). I use zsh in my terminal.
This works really well and ignores all locale subfolders except locales/US:
ag -g "" --ignore locales/^US/ | fzf
I also want to ignore all locales/US/* except for locales/US/en
What I want is this, but it does not work.
ag -g "" --ignore locales/^US/^en | fzf
Thoughts?
Add multiple --ignore options. For instance:
ag -g "" --ignore locales/^US/ --ignore locales/US/^en
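The following can work as well, but be aware that it deletes every entry directly under locales/ other than US instead of just ignoring it: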
find locales/* -maxdepth 0 -name 'US' -prune -o -exec rm -rf '{}' ';'
Man Pages Documentation
-prune
True; if the file is a directory, do not descend into it. If -depth is given, false; no effect. Because -delete implies -depth, you cannot usefully use -prune and -delete together.
-prune stops find from descending into a matching directory, which lets you filter such directories out of your results.
-exec command ;
Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of ';' is encountered. The string '{}' is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find. Both of these constructions might need to be escaped (with a '\') or quoted to protect them from expansion by the shell. See the EXAMPLES section for examples of the use of the -exec option. The specified command is run once for each matched file. The command is executed in the starting directory. There are unavoidable security problems surrounding use of the -exec action; you should use the -execdir option instead.
-exec lets you execute a command on any results find returns.
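For listing files rather than removing them, the same -prune idea can be sketched with find alone. This is not ag; it is a plain find stand-in, and the function name and the demo tree are assumptions for illustration. It skips everything in locales/ except locales/US/en.

```shell
#!/bin/sh
# List regular files under directory $1, skipping everything in locales/
# except locales/US/en. Output is sorted because find's order is unspecified.
list_kept_files() {
    (
        cd "$1" || exit 1
        # Prune any path inside locales/ unless it is US, US/en, or below US/en.
        find . -path './locales/*' \
            ! -path './locales/US' \
            ! -path './locales/US/en' \
            ! -path './locales/US/en/*' \
            -prune -o -type f -print | LC_ALL=C sort
    )
}

# Demo on a throwaway tree (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/locales/FR/fr" "$demo/locales/US/en" "$demo/locales/US/es"
: > "$demo/src/main.js"
: > "$demo/locales/FR/fr/a.json"
: > "$demo/locales/US/en/a.json"
: > "$demo/locales/US/es/a.json"
list_kept_files "$demo"
rm -rf "$demo"
```

The output could then be piped to fzf in the same way as the ag command in the question.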

How to Execute Python File In Unix Find Command

Okay. Let's say that I am in the main directory of my computer. How can I search for a file.py and execute it with Unix in one line? Two lines is okay, but assume we do not know the file path.
It's a simple question, but I am unable to find an answer.
Updated
Per kojiro's comment, a better method is to use the -exec argument to find.
$ find ./ -name 'file.py' -exec python '{}' \;
The manpage for find explains its usage better than I can, see here under -exec command ;. But in short it will call command for each result with any arguments up to the \; and replacing '{}' with the file path of the result.
Also in the man page for find, it's worth looking at the notes relating to the -print and -print0 flags if you're using the below approach.
Original Answer
Does something like the following do what you want?
$ cd /path/to/dir/
$ find ./ -name 'file.py' | xargs -L 1 python
which is a pretty useful pattern where
find ./ -name 'file.py'
will list all the paths to files with names matching file.py in the current directory or any subdirectory.
Pipe the output of this into xargs which passes each line from its stdin as an argument to the program given to it. In this case, python. However we want to execute python once for every line given to xargs, from the wikipedia article for xargs
one can also invoke a command for each line of input at a time with -L 1
However, this will match all files under the current path that are named 'file.py'. You can probably limit this to the first result with a flag to find if you want.
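The find | xargs pattern from the original answer can be made safe for paths with spaces by using the -print0 and -0 flags mentioned above. A minimal sketch, assuming a find and xargs that support those flags (GNU and BSD do); the function name is made up, and cat stands in for python so that the example runs without an interpreter.

```shell
#!/bin/sh
# Run a command once for every file named $2 found under directory $1.
# Usage: run_on_matches DIR NAME COMMAND [ARGS...]
run_on_matches() {
    start_dir=$1
    file_name=$2
    shift 2
    # -print0 / -0 separate paths with NUL bytes, so spaces survive intact;
    # -n 1 makes xargs start the command once per path.
    find "$start_dir" -type f -name "$file_name" -print0 | xargs -0 -n 1 "$@"
}

# Demo: the path contains a space; cat stands in for the interpreter.
demo=$(mktemp -d)
mkdir -p "$demo/my scripts"
printf 'print("hi")\n' > "$demo/my scripts/file.py"
run_on_matches "$demo" file.py cat
rm -rf "$demo"
```

To actually execute the script, call it as run_on_matches . file.py python instead.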

How to call grep on pattern files?

I'm trying to grep over files whose names match a regexp. But the following:
#!/bin/bash
grep -o -e "[a-zA-Z]\{1,\}" $1 -h --include=$2 -R
works only in some cases. When I call this script like this:
./script.sh dir1/ [A-La-l]+
it doesn't work. But the following:
./script.sh dir1/ \*.txt
works fine. I have also tried passing arguments within double-quotes and quotes but neither worked for me.
Do you have any ideas how to solve this problem?
grep's --include option does not accept a regex but a glob (such as *.txt), which is considerably less powerful. You will have to decide whether you want to match regexes or globs -- *.txt is not a valid regex (the equivalent regex is .*\.txt) while [A-La-l]+ is not a valid glob.
If you want to do it with regexes, you will not be able to do it with grep alone. One thing you could do is to leave the file selection to a different tool such as find:
find "$1" -type f -regex "$2" -exec grep -o -e '[a-zA-Z]\{1,\}' -h '{}' +
This constructs and runs a command of the form grep -o -e '[a-zA-Z]\{1,\}' -h followed by the list of files under $1 whose path matches the regex $2. Note that find's -regex is tested against the whole path, not just the file name. If you replace the + with \;, it will run the command for each file individually, which should yield the same results (very) slightly more slowly.
If you want to do it with globs, you can carry on as before; your code already does that. You should put double quotes around $1 and $2, though.
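Here is a small sketch of the regex variant. It assumes a find with -regex (GNU and BSD have it, POSIX does not require it); the function name and sample files are made up. Because the regex has to match the whole path, it starts with .*/ and avoids + so that it reads the same in find's default regex dialects.

```shell
#!/bin/sh
# Print every run of ASCII letters found in files under $1 whose whole path
# matches the regex $2.
grep_in_matching() {
    LC_ALL=C find "$1" -type f -regex "$2" \
        -exec grep -o -h -e '[a-zA-Z]\{1,\}' {} +
}

demo=$(mktemp -d)
printf 'hello world 42\n' > "$demo/abc"
printf 'skipped\n' > "$demo/xyz"
# ".*/" absorbs the directory part; the file name must consist of A-L or a-l.
grep_in_matching "$demo" '.*/[A-La-l][A-La-l]*'
rm -rf "$demo"
```

Only abc is searched here, since xyz contains letters outside the bracket expression.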

How to Recursively Remove Files of a Certain Type

I misread the gzip documentation, and now I have to remove a ton of ".gz" files from many directories inside one another. I tried using 'find' to locate all .gz files. However, whenever there's a file with a space in the name, rm interprets that as another file. And whenever there's a dash, rm interprets that as a new flag. I decided to use 'sed' to replace the spaces with "\ " and the space-dashes with "\ -", and here's what I came up with.
find . -type f -name '*.gz' | sed -r 's/\ /\\ /g' | sed -r 's/\ -/ \\-/g'
When I run the find/sed query on a file that, for example, has a name of "Test - File - for - show.gz", I get the output
./Test\ \-\ File\ \-\ for\ \-\ show.gz
Which appears to be acceptable for rm, but when I run
rm $(find . -type f -name '*.gz'...)
I get
rm: cannot remove './Test\\': No such file or directory
rm: cannot remove '\\-\\': No such file or directory
rm: cannot remove 'File\\': No such file or directory
rm: cannot remove '\\-\\': No such file or directory
...
I haven't made extensive use of sed, so I have to assume I'm doing something wrong with the regular expressions. If you know what I'm doing wrong, or if you have a better solution, please tell me.
Adding backslashes before spaces protects the spaces against expansion in shell source code. But the output of a command in a command substitution does not undergo shell parsing, it only undergoes wildcard expansion and field splitting. Adding backslashes before spaces doesn't protect them against field splitting.
Adding backslashes before dashes is completely useless since it's rm that interprets dashes as special, and it doesn't interpret backslashes as special.
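The field-splitting behaviour is easy to see in a minimal sketch; the helper name and the sample file name are made up for illustration.

```shell
#!/bin/sh
# Report how many arguments were received.
count_args() {
    echo "$#"
}

# Unquoted command substitution: the output is split at the space even though
# a backslash precedes it, so two arguments arrive.
count_args $(printf './Test\\ File.gz\n')

# Quoted command substitution: no field splitting, so one argument arrives.
count_args "$(printf './Test File.gz\n')"
```

The first call prints 2 and the second prints 1, which matches the rm errors in the question: each backslash-terminated fragment is treated as a separate file name.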
The output of find is ambiguous in general — file names can contain newlines, so you can't use a newline as a file name separator. Parsing the output of find is usually broken unless you're dealing with file names in a known, restricted character set, and it's often not the simplest method anyway.
find has a built-in way to execute external programs: the -exec action. There's no parsing going on, so this isn't subject to any problem with special characters in file names. (A path beginning with - could still be interpreted as an option, but all paths begin with . since that's the directory being traversed.)
find . -type f -name '*.gz' -exec rm {} +
Many find implementations (Linux, Cygwin, BSD) can delete files without invoking an external utility:
find . -type f -name '*.gz' -delete
See Why does my shell script choke on whitespace or other special characters? for more information on writing robust shell scripts.
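To try the -exec form on awkward names without risking real data, something like the following sketch can be used. The function name and the file names are made up, and the -- before {} is an extra guard I added in case a starting path ever begins with a dash.

```shell
#!/bin/sh
# Delete every regular file ending in .gz below directory $1.
# find hands the paths to rm directly, so no shell parsing is involved.
remove_gz_files() {
    find "$1" -type f -name '*.gz' -exec rm -- {} +
}

# Demo tree with spaces and dashes in the names.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
: > "$demo/Test - File - for - show.gz"
: > "$demo/sub dir/-leading-dash.gz"
: > "$demo/keep.txt"
remove_gz_files "$demo"
# Only keep.txt should be left.
find "$demo" -type f
rm -rf "$demo"
```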
There is no need to pipe to sed, etc. Instead, you can make use of the -exec flag on find, that allows you to execute a command on each one of the results of the command.
For example, for your case this would work:
find . -type f -name '*.gz' -exec rm {} \;
which is approximately the same as:
find . -type f -name '*.gz' -exec rm {} +
The last one passes as many file names as possible to a single rm invocation instead of starting a new rm process for each result, which makes it faster.
From man find:
-exec command ;
Execute command; true if 0 status is returned. All following
arguments to find are taken to be arguments to the command until an
argument consisting of `;' is encountered. The string `{}' is
replaced by the current file name being processed everywhere it occurs
in the arguments to the command, not just in arguments where it is
alone, as in some versions of find. Both of these constructions
might need to be escaped (with a `\') or quoted to protect them from
expansion by the shell. See the EXAMPLES section for examples of the
use of the -exec option. The specified command is run once for
each matched file. The command is executed in the starting directory.
There are unavoidable security problems surrounding use of the -exec
action; you should use the -execdir option instead.
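The difference between the \; and + terminators can be made visible by having each invocation report how many file names it received. A minimal sketch, with made-up function names and demo files:

```shell
#!/bin/sh
# Print the number of file names each invocation of the command receives.
# With "\;" the command runs once per file; with "+" the names are batched.
args_per_run_semicolon() {
    find "$1" -type f -name '*.gz' -exec sh -c 'echo "$#"' sh {} \;
}
args_per_run_plus() {
    find "$1" -type f -name '*.gz' -exec sh -c 'echo "$#"' sh {} +
}

demo=$(mktemp -d)
: > "$demo/a.gz"
: > "$demo/b.gz"
: > "$demo/c.gz"
printf '%s\n' 'with \; :'
args_per_run_semicolon "$demo"
printf '%s\n' 'with + :'
args_per_run_plus "$demo"
rm -rf "$demo"
```

With three files, the \; form prints 1 three times and the + form prints 3 once.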

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
one way with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special characters (such as spaces) in your file names. If there are, you need to quote the file names in the mv command.
I added $(pwd) to get the absolute path of each file found.
You can remove the trailing |sh to check whether the generated mv commands are correct.
If everything looks good, add the |sh back to execute them.
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files, it won't execute those commands. It is safe. To execute it, you can save it to file and execute, or simply pipe to shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.
You can do that with bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed):
echo *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while read f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). (Mine is not perfect, either) Note the use of double-quotes, which protects against filenames with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
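For names that contain periods or spaces, one portable option is to strip one trailing .gz at a time until only a single one is left. This is a sketch, not a drop-in replacement for the loops above; both function names are made up, and the preview assumes no file name contains a newline.

```shell
#!/bin/sh
# Strip trailing ".gz" suffixes until exactly one remains.
collapse_gz() {
    name=$1
    while [ "${name%.gz.gz}" != "$name" ]; do
        name=${name%.gz}
    done
    printf '%s\n' "$name"
}

# Print (but do not run) an mv command for every *.gz.gz file below $1.
# The output is meant for reading, not for piping to sh.
preview_renames() {
    find "$1" -type f -name '*.gz.gz' | while IFS= read -r file; do
        printf 'mv -- "%s" "%s"\n' "$file" "$(collapse_gz "$file")"
    done
}

collapse_gz 'foo.c.gz.gz.gz'
collapse_gz 'my file.gz.gz'
collapse_gz 'single.gz'
```

The three calls print foo.c.gz, my file.gz and single.gz. To perform the renaming, the printf inside the loop would be replaced by mv -- "$file" "$(collapse_gz "$file")".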
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.