Recursive find and replace based on regex

I have changed up my directory structure and I want to do the following:
Do a recursive grep to find all instances of a match
Change to the updated location string
One example (out of hundreds) would be:
from common.utils import debug --> from etc.common.utils import debug
To get all the instances of what I'm looking for I'm doing:
$ grep -r 'common.' ./
However, I also need to make sure common is preceded by a space. How would I do this find and replace?

It's hard to tell exactly what you want because your refactoring example changes the import as well as the package, but the following will change common. -> etc.common. for all files in a directory:
sed -i 's/\bcommon\./etc.&/' $(grep -Elr '\bcommon\.' .)
This assumes you have GNU sed available, which most Linux systems do. Also, just to let you know, this will fail with an "Argument list too long" error if there are too many files to pass on one command line. In that case, you can do this:
grep -Elr '\bcommon\.' . | xargs sed -i 's/\bcommon\./etc.&/'
Note that it might be a good idea to run the sed command as sed -i'.OLD' 's/\bcommon\./etc.&/' so that you get a backup of each original file.
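A minimal sketch of the xargs variant that also copes with file names containing spaces and keeps .OLD backups. It assumes GNU grep (-Z, --exclude) and GNU sed (-i, \b); the demo tree and file names are made up:

```shell
#!/usr/bin/env bash
# Throwaway demo tree (invented names), removed when the script exits.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/pkg dir"
printf 'from common.utils import debug\n' > "$demo/pkg dir/a.py"
printf 'import uncommon.thing\n' > "$demo/b.py"

# grep -l lists matching files and -Z ends each name with a NUL byte, so
# xargs -0 splits on NULs instead of whitespace. --exclude keeps grep from
# picking up backups from an earlier run; xargs -r skips sed entirely when
# nothing matched; sed -i.OLD keeps a backup of every file it edits.
grep -rlZ -E --exclude='*.OLD' '\bcommon\.' "$demo" |
  xargs -0 -r sed -i.OLD -E 's/\bcommon\./etc.&/g'

cat "$demo/pkg dir/a.py"   # from etc.common.utils import debug
```

Only a.py is rewritten: in "uncommon." there is no word boundary before "common", so b.py is never handed to sed and gets no backup.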

If your grep implementation supports Perl syntax (the -P flag; on Linux it's usually available), you can benefit from its additional features, like word boundaries:
$ grep -Pr '\bcommon\.'
By the way:
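grep -r can be noticeably slower than piping the output of a find command into grep, as in Rob's example, because find can narrow the candidate files by name first. Furthermore, when you're sure that the file names found do not contain any whitespace, using xargs is much faster than -exec ... \;, which starts a separate process for every file: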
$ find . -type f -name '*.java' | xargs grep -P '\bcommon\.'
Or, applied to Tim's example:
$ find . -type f -name '*.java' | xargs sed -i.bak 's/\<common\./etc.common./'
Note that, in the latter example, sed keeps a *.bak backup of each file it processes (GNU sed writes one even for files where nothing was replaced). This way you can review the command's results and then delete the backups:
$ find . -type f -name '*.bak' | xargs rm
If you've made an oopsie, the following command will restore the previous versions:
$ find . -type f -name '*.bak' | while read -r LINE; do mv -f "$LINE" "${LINE%.bak}"; done
Of course, if you aren't sure that there's no whitespace in the file names and paths, you should apply the commands via find's -exec parameter.
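As a sketch, the backup, review and restore steps can be combined with NUL-separated names so that paths with spaces also work. It assumes GNU find, xargs and sed; the demo files are invented:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src dir"
printf 'import common.Foo;\n' > "$demo/src dir/A.java"

# 1. Edit in place; sed -i.bak leaves "A.java.bak" next to the file.
find "$demo" -type f -name '*.java' -print0 |
  xargs -0 -r sed -i.bak 's/\<common\./etc.common./'
after_edit=$(cat "$demo/src dir/A.java")

# 2a. Happy with the result? Delete the backups instead:
#     find "$demo" -type f -name '*.bak' -delete
# 2b. Made an oopsie? Strip the ".bak" suffix to restore the originals.
find "$demo" -type f -name '*.bak' -print0 |
  while IFS= read -r -d '' f; do
    mv -f -- "$f" "${f%.bak}"
  done
after_restore=$(cat "$demo/src dir/A.java")
printf '%s\n%s\n' "$after_edit" "$after_restore"
```

The restore uses ${f%.bak} (drop the suffix, keep the directory) rather than basename, which would strip the directory and keep the suffix.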
Cheers!

This is roughly how you would do it using find. This requires testing:
find . -name \*.java -exec sed -i "s/FIND_STR/REPLACE_STR/g" {} \;
This translates as: "Starting from the current directory, find all files whose names end in .java and execute sed on each one" ({} is a placeholder for the file currently found, and \; ends the -exec command). The sed expression "s/FIND_STR/REPLACE_STR/g" replaces every FIND_STR with REPLACE_STR on each line of the file, and -i writes the result back to the file instead of printing it.
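A small throwaway run of the find/-exec approach, assuming GNU sed (for -i without a suffix) and an invented demo tree. It uses {} + instead of {} \;, which hands sed many files per invocation rather than starting one process per file:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src/app"
printf 'String s = "FIND_STR";\n' > "$demo/src/app/Main.java"
printf 'FIND_STR\n' > "$demo/notes.txt"

# Only *.java files are selected, so notes.txt is left alone.
# '{} +' batches file names like xargs; -i edits each file in place.
find "$demo" -name '*.java' -exec sed -i 's/FIND_STR/REPLACE_STR/g' {} +
```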

Related

sed command - find and replace while excluding specific pattern in find

I have a specific case with sed.
As part of the build process, we are required to find and replace the version numbers of required artifacts with the newly created tag name across multiple modules (pom.xml) files.
The command we use is this:
find . -name "*.xml" -print0 | xargs -0 sed -i 's/6\.0\.0\.0/6.0.0.0.001/g'
The edge case we have is this:
There are some modules that have similar version numbers but are already frozen versions coming in from our repository. These need not be changed.
Entries such as <modulename-version>modulename-6.0.0.0.016</modulename-version> are present in the pom.xml files but do not need to be changed.
Is there a way to ignore a pattern of 6\.0\.0\.0\.\d{3} with sed?
The entire setup is intended to run unattended via python-fabric on our remote build server, and we really don't want to wake up in the night to solve a problem where a module modulename-6.0.0.0.001.016.jar was not found!
Any help in this space would be most appreciated!
Change your sed command to:
'/6\.0\.0\.0\.\d{3}/!s/6\.0\.0\.0/6.0.0.0.001/g'
Or
'/6\.0\.0\.0\.\d{3}/b; s/6\.0\.0\.0/6.0.0.0.001/g'
sed does not support Perl's \d class, so just use [0-9]:
'/6\.0\.0\.0\.[0-9]{3}/!s/6\.0\.0\.0/6.0.0.0.001/g'
{3} also needs -r (or -E), since basic regular expressions would require \{3\}:
`sed -r ...`
Complete commands:
find . -name "*.xml" -print0 | xargs -0 sed -ri '/6\.0\.0\.0\.[0-9]{3}/!s/6\.0\.0\.0/6.0.0.0.001/g'
find . -name "*.xml" -print0 | xargs -0 sed -ri '/6\.0\.0\.0\.[0-9]{3}/b; s/6\.0\.0\.0/6.0.0.0.001/g'
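A minimal sketch of the negated-address version on an invented pom.xml, assuming GNU sed (-E, -i). The helper name bump is hypothetical:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cat > "$demo/pom.xml" <<'EOF'
<version>6.0.0.0</version>
<modulename-version>modulename-6.0.0.0.016</modulename-version>
EOF

bump() {  # hypothetical helper wrapping the command above
  # /6\.0\.0\.0\.[0-9]{3}/! selects lines that do NOT contain a frozen
  # 6.0.0.0.NNN version; only those lines get the substitution.
  find "$demo" -name '*.xml' -print0 |
    xargs -0 -r sed -Ei '/6\.0\.0\.0\.[0-9]{3}/!s/6\.0\.0\.0/6.0.0.0.001/g'
}
bump
first_run=$(cat "$demo/pom.xml")
bump
second_run=$(cat "$demo/pom.xml")
```

A nice side effect: once a value has become 6.0.0.0.001 it matches the skip pattern, so re-running the build step is harmless. The caveat is that a line containing both a frozen version and a bumpable one is skipped as a whole.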

Safe search&replace on linux

Suppose I have files located in different subfolders, and I would like to search for, test, and replace something in these files.
I would like to do it in three steps:
Search of a specific pattern (with or without regexp)
Test to replace it with something (with or without regexp)
Apply the changes only to the concerned files
My current solution is to define some aliases in my .bashrc in order to easily use grep and sed:
alias findsrc='find . -name "*.[ch]" -or -name "*.asm" -or -name "*.inc"'
alias grepsrc='findsrc | xargs grep -n --color '
alias sedsrc='findsrc | xargs sed '
Then I use
grepsrc <pattern> to search for my pattern
(no solution found yet for testing a replacement)
sedsrc -i 's/<pattern>/replace/g' to apply it
Unfortunately this solution does not satisfy me. The first issue is that sed touches all the files, even those with no changes. Also, relying on aliases does not look very clean to me.
Ideally I would like to have a workflow similar to this one:
Register a new context:
$ fetch register 'mysrcs' --recurse *.h *.c *.asm *.inc
Context list:
$ fetch context
1. mysrcs --recurse *.h *.c *.asm *.inc
Extracted from ~/.fetchrc
Find something:
$ fetch files mysrcs /0x[a-f0-9]{3}/
./foo.c:235 Yeah 0x245
./bar.h:2 Oh yeah 0x2ac hex
Test a replacement:
$ fetch test mysrcs /0x[a-f0-9]{3}/0xabc/
./foo.c:235 Yeah 0xabc
./bar.h:2 Oh yeah 0xabc hex
Apply the replacement:
$ fetch subst --backup mysrcs /0x[a-f0-9]{3}/0xabc/
./foo.c:235 Yeah 0xabc
./bar.h:2 Oh yeah 0xabc hex
Backup number: 242
Restore in case of mistake:
$ fetch restore 242
This kind of tool looks pretty standard to me; everybody needs to search and replace. What standard Linux alternative can I use?
#!/bin/ksh
# Call the script with 2 arguments: the search pattern, then the replacement
# Both are assumed to be sed-compatible regexes that contain no "/"
SearchStr="$1"
ReplaceStr="$2"
# Starts the search from the current folder and takes any file;
# if you need more filtering, pipe a find command in first
grep -l -r -- "$SearchStr" . | while IFS= read -r ThisFile
do
  # Quote the name so paths with spaces reach sed intact
  sed -i -e "s/${SearchStr}/${ReplaceStr}/g" "$ThisFile"
done
This should serve as a base script to adapt to your needs.
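For the "test" step, one option with standard tools is to run sed without -i and diff its output against the file. A minimal sketch, assuming bash (for <(...)) and GNU grep, sed and diff (for --label); preview_subst and apply_subst are hypothetical names, and SEARCH must be a basic regex without "/":

```shell
#!/usr/bin/env bash
preview_subst() {  # usage: preview_subst SEARCH REPLACE [DIR]
  local search=$1 replace=$2 dir=${3:-.} f
  grep -rlZ -e "$search" -- "$dir" |
    while IFS= read -r -d '' f; do
      # Show what sed *would* write, as a unified diff; nothing is modified.
      diff -u --label "$f" --label "$f (new)" "$f" <(sed -e "s/$search/$replace/g" "$f")
    done
}
apply_subst() {  # usage: apply_subst SEARCH REPLACE [DIR]
  local search=$1 replace=$2 dir=${3:-.}
  # Only files grep found are handed to sed, so the others stay untouched.
  grep -rlZ -e "$search" -- "$dir" | xargs -0 -r sed -i -e "s/$search/$replace/g"
}

# Invented demo tree, removed on exit.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'Yeah 0x245\n' > "$demo/foo.c"
printf 'Oh yeah 0x2ac hex\n' > "$demo/bar.h"
printf 'nothing here\n' > "$demo/baz.c"

grep -rn -e '0x[a-f0-9]\{3\}' "$demo"                        # 1. search
preview=$(preview_subst '0x[a-f0-9]\{3\}' '0xabc' "$demo")   # 2. test
printf '%s\n' "$preview"
apply_subst '0x[a-f0-9]\{3\}' '0xabc' "$demo"                # 3. apply
```

grep and sed both read the pattern as a basic regex here, so the files listed in step 1 are exactly the ones rewritten in step 3.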
I often have to perform such maintenance tasks. I use a mix of find, grep, sed, and awk.
And instead of aliases, I use functions.
For example:
# i. and ii.
function grepsrc {
# \( \) makes -exec apply to every -name alternative, not just the last one
find . \( -name "*.[ch]" -or -name "*.asm" -or -name "*.inc" \) -exec grep -Hn -- "$1" {} +
}
# iii.
function sedsrc {
grepsrc "$1" | awk -F: '{print $1}' | uniq | while IFS= read -r f; do
sed -i "s/$1/$2/g" "$f"
done
}
Usage example:
sedsrc "foo[bB]ar*" "polop"
for F in $(grep -Rl <pattern>) ; do sed 's/search/replace/' "$F" | sponge "$F" ; done
grep with the -l argument just lists files that match
We then loop over only the matching files and run each of them through sed
We use the sponge program from the moreutils package to write the processed stream back to the same file
This is simple and requires no additional shell functions or complex scripts.
If you want to make it safe as well... check the folder into a Git repository. That's what version control is for.
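If moreutils isn't installed, a temporary file can stand in for sponge. A minimal sketch that also handles spaces in file names, assuming GNU grep for -Z; the demo file is invented:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'search me\nleave me\n' > "$demo/a file.txt"

grep -rlZ -e 'search' -- "$demo" |
  while IFS= read -r -d '' F; do
    tmp=$(mktemp) || break
    # Write sed's output to a side file first; redirecting straight into
    # "$F" would truncate it before sed gets to read it.
    if sed 's/search/replace/' "$F" > "$tmp"; then
      cat "$tmp" > "$F"    # rewriting the existing file keeps its permissions
    fi
    rm -f "$tmp"
  done
cat "$demo/a file.txt"
```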
Yes, there is a tool that does exactly what you are looking for: Git. Why manage backups of your files yourself, in case of mistakes, when specialized tools can do that job for you?
You split your request into 3 subquestions:
How to quickly search a subset of my files?
How to apply a substitution temporarily, then go back to the original state?
How to substitute into my subset of files?
First, some preparation in your workspace: initialize a Git repository, then add all your files to it. Quoting the patterns lets Git expand them itself, so they also match files in subdirectories:
$ cd my_project
$ git init
$ git add '*.h' '*.c' '*.inc'
$ git commit -m "My initial state"
Now, you can quickly get the list of your files with:
$ git ls-files
To do a replacement, you can use sed, perl or awk. Here is an example using sed:
$ git ls-files | xargs sed -i -e 's/search/replace/'
If you are not happy with this change, you can roll back at any time by restoring the committed versions of the files:
$ git checkout HEAD -- .
This allows you to test your change and step back whenever you want to.
We haven't simplified the commands yet, so I suggest adding an alias to your Git configuration file, usually ~/.gitconfig. Add this:
[alias]
sed = ! git grep -z --full-name -l '.' | xargs -0 sed -i -e
So now you can just type:
$ git sed s/a/b/
It's magic...

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
One way, with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special characters (such as spaces) in your filenames. If there are, you need to quote the filenames in the mv command.
I added $(pwd) to get the absolute path of each found name.
You can remove the trailing |sh to check that the generated mv ... commands are correct.
If everything looks good, add the |sh back to execute the mv commands.
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
ls *.gz.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files; it won't execute those commands, so it is safe. To execute them, you can save the output to a file and run it, or simply pipe it to the shell:
ls *.gz.gz | ... | sh
sed is great for replacing text inside files, but for renaming you can use bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed). printf puts each file name on its own line, and the e flag runs the resulting mv command:
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while read f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special characters, such as embedded spaces, correctly (with the exception of filenames with embedded newlines).
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). (Mine is not perfect, either) Note the use of double-quotes, which protects against filenames with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
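Putting that -print0 / read -d combination together with a loop that peels off one .gz at a time gives a sketch that also copes with dots in the base name (foo.c.gz.gz.gz becomes foo.c.gz). It assumes bash; mv -n (available in GNU and BSD mv) refuses to overwrite an existing file, and the demo files are invented:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/sub dir"
touch "$demo/sub dir/foo.c.gz.gz.gz" "$demo/plain.gz"

find "$demo" -type f -name '*.gz.gz' -print0 |
  while IFS= read -r -d $'\0' f; do
    new=$f
    # Peel off one trailing ".gz" until exactly one is left.
    while [[ $new == *.gz.gz ]]; do
      new=${new%.gz}
    done
    mv -n -- "$f" "$new"
  done
find "$demo" -type f | sort
```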
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.

Replace text across multiple files in a directory with sed

I need to hide the IP addresses in the log files for security reasons. The IP addresses are of version 4 and 6. How do I hide the addresses in a way that, IPv4 example 123.4.32.16 is replaced by x.x.x.x and IPv6 example 232e:23o5:te43:5423:5433:0000:ef09:23ff is replaced by x:x:x:x:x:x:x:x? Is it possible to do this using a single sed command?
You might want to use find and sed for this.
Let's assume your logs have the extension ".log":
find /path/to/logs -type f -name '*.log' -exec \
sed -i -e 's,[0-9]\+\(\.[0-9]\+\)\{3\},x.x.x.x,g' \
-e 's,[0-9a-f]\+\(:[0-9a-f]\+\)\{7\},x:x:x:x:x:x:x:x,gi' {} \;
How does this work?
First, we ask find to recursively locate files with the .log extension, starting from /path/to/logs. -type f tells find we want to find regular files.
For each file, it will execute sed. The -i argument tells sed you want to edit the file in place. (Check out http://www.grymoire.com/Unix/Sed.html)
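Running the two expressions on an invented log line shows what they match. A sketch assuming GNU sed (\+ and the i flag); note that the IPv6 expression only covers the full eight-group form, not compressed addresses such as fe80::1:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'login from 123.4.32.16 via 2001:db8:0:0:0:0:ef09:23ff\n' > "$demo/app.log"

# Same two expressions as above; '{} +' runs one sed for many files.
find "$demo" -type f -name '*.log' -exec \
  sed -i -e 's,[0-9]\+\(\.[0-9]\+\)\{3\},x.x.x.x,g' \
         -e 's,[0-9a-f]\+\(:[0-9a-f]\+\)\{7\},x:x:x:x:x:x:x:x,gi' {} +

cat "$demo/app.log"   # login from x.x.x.x via x:x:x:x:x:x:x:x
```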
One solution using find and perl:
find /the/directory -type f -exec perl -pi -e '
s/\b\d{1,3}(\.\d{1,3}){3}\b/x.x.x.x/g;
s/\b[a-f\d]{1,4}(:[a-f\d]{1,4}){7}\b/x:x:x:x:x:x:x:x/gi' {} \;
(type on one line)
Well, first you should probably just fix whatever is doing the logging so that it logs the way you want.
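Now if you need to go back and modify historical files, you might consider using sed with extended regular expressions (-E). Note that sed has no \d class, and [:xdigit:] only works inside a bracket expression:
sed -E 's/\b([0-9]{1,3}\.){3}[0-9]{1,3}\b/x.x.x.x/g' /path/to/file
sed -E 's/\b([[:xdigit:]]{1,4}:){7}[[:xdigit:]]{1,4}\b/x:x:x:x:x:x:x:x/g' /path/to/file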
I use this:
find . -name "*.log" -exec grep -izl PATTERN {} + | xargs perl -i.orig -pe 's/PATTERN/REPLACEMENT/g'
You'd want to insert your PATTERN(s) and replace *.log with something else depending on the name of your log files.
The -i.orig backs up the files being replaced with an extension of .orig.
I found this faster than other things I tried: the find/grep combination identifies the candidate files, then perl does the work.

Shell script to recursively browse a directory and replace a string

I need to recursively search directories and replace a string (say http://development:port/URI) with another (say http://production:port/URI) in all the files wherever it's found. Can anyone help?
It would be much better if that script can print out the files that it modified and takes the search/replace patterns as input parameters.
Regards.
find . -type f | xargs sed -i s/pattern/replacement/g
Try this:
find . -type f | xargs grep -l development | xargs perl -i.bak -p -e 's(http://development)(http://production)g'
Another approach with slightly more feedback:
find . -type f | while IFS= read -r file
do
grep -q development "$file" && echo "modifying $file" && perl -i.bak -p -e 's(http://development)(http://production)g' "$file"
done
Hope this helps.
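Since the question also asks for a script that takes the strings as parameters and reports what it changed, here is a minimal sketch along those lines. The function name replace_recursive is hypothetical; it assumes GNU grep and GNU sed, treats both arguments as literal strings (escaping them for sed), and skips .svn and .git directories:

```shell
#!/usr/bin/env bash
# Usage: replace_recursive SEARCH REPLACE [DIR]   (literal strings, not regexes)
replace_recursive() {
  local search=$1 replace=$2 dir=${3:-.} s r f
  # Backslash-escape BRE metacharacters and "/" in the search string,
  # and "\", "/" and "&" in the replacement, so sed takes both literally.
  s=$(printf '%s\n' "$search" | sed -e 's|[]\/$*.^[]|\\&|g')
  r=$(printf '%s\n' "$replace" | sed -e 's|[\/&]|\\&|g')
  # grep -F matches the search string literally; -Z and read -d '' keep
  # file names with spaces intact.
  grep -rlZF --exclude-dir=.svn --exclude-dir=.git -e "$search" -- "$dir" |
    while IFS= read -r -d '' f; do
      sed -i -e "s/$s/$r/g" "$f" && printf 'modified: %s\n' "$f"
    done
}

# Invented demo tree, removed on exit.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/conf" "$demo/.svn"
printf 'url=http://development:8080/URI/api\n' > "$demo/conf/app.properties"
cp "$demo/conf/app.properties" "$demo/.svn/entries"
printf 'nothing to see\n' > "$demo/README"

out=$(replace_recursive 'http://development:8080/URI' 'http://production:8080/URI' "$demo")
printf '%s\n' "$out"
```

Escaping both strings avoids having to pick an unusual sed delimiter for URLs, and prevents characters such as "." or "&" from being interpreted by sed.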
It sounds like you would benefit from a layer of indirection. (But then, who wouldn't?)
I'm thinking that you could have the special string in just one location. Either reference the configuration settings at runtime, or generate these files with the correct string at build time.
Don't try the above within a working SVN/CVS directory, since it will also patch the files inside the .svn/CVS metadata directories, which is definitely not what you want. To avoid .svn modifications, for example, use:
find . -type f | grep -Fv /.svn/ | xargs sed -i 's/pattern/replacement/g'
With zsh's advanced globbing, you can do it with a single command (**/ recurses into subdirectories, and the (.) qualifier keeps only regular files).
E.g.:
sed -i 's:pattern:target:g' ./**/*(.)
HTH