replacing one word by another in an entire directory - unix - regex

I'm refactoring some code, and I decided to replace one name by another, let's say foo by bar. They appear in multiple .cc and .h files, so I would like to change from:
Foo key();
to
Bar key();
that's it, replace all the occurrences of Foo by Bar in Unix. And the files are in the same directory. I thought about
sed -e {'s/Foo/Bar/g'}
but I'm unsure if that's going to work.

This should do the trick:
sed -i'.bak' 's/\bFoo\b/Bar/g' *.files

I would use sed:
sed -i.bak -e '/Foo/ s//Bar/g' /path/to/dir/*.cc
Repeat for the *.h files
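For instance, both extensions could be handled in one pass with the same command, just naming both globs (paths are illustrative):
sed -i.bak -e '/Foo/ s//Bar/g' /path/to/dir/*.cc /path/to/dir/*.h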

I don't use sed a lot, but if you have access to Perl on the command line (which many Unix systems do) you can do:
perl -pi -e 's/Foo key/Bar key/g' `find ./ -name '*.h' -o -name '*.cc'`
This will find (recursively) all files in the current directory ending with .h or .cc and then use Perl to replace 'Foo key' with 'Bar key' in each file.

I like Jaypal's sed command. It uses \b to ensure that you only replace full words (Foo, not Foobar), and it makes backup files in case something goes wrong.
However, if all of your files are not in one directory, then you will need to use a more sophisticated method to list them all. Use the find command to send them all to sed:
find . -regex '.*\.\(cc\|h\)' -print0 | xargs -0 sed -i'.bak' 's/\bFoo\b/Bar/g'

You probably have perl installed (if it's UNIX), so here's something that should work for you:
perl -e "s/Foo/Bar/g;" -pi.save $(find path/to/DIRECTORY -type f)
Note, this provides a backup of the original file, if you need that as a bit of insurance.
Otherwise, you can do what @Kevin mentioned and just use an IDE refactoring feature.
Note: I just saw you're using Vim, here's a quick tutorial on how to do it
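For instance, a rough sketch of how this could be done from inside Vim (the e flag suppresses errors for files with no match, and update writes only buffers that changed):
:args *.cc *.h
:argdo %s/\<Foo\>/Bar/ge | update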

Related

sed command creating unwanted duplicates of file with -e extension

I am trying to do a recursive find and replace on java files in a directory using a shell script. It works, but it is hiding all the files, and creating duplicates with a -e extension
#!/bin/bash
for file in $(find . -type f -name "*.java")
do sed -i -e 's/foo/bar/g' $file
done
From my understanding, the -e is optional - but if I do not provide it I get the following error on every file it finds
sed: 1: "./DirectoryAdapter.java": invalid command code .
Any clue as it what is happening here? For reference I am on Mac OS X running El Capitan
Here is a before and after screenshot of the directory after running the script. The replaced files still exist; are they hidden?
On OS X, sed (BSD sed) requires an extension after the -i option. Since it finds -e afterwards, it uses -e as the backup suffix and appends it to each input filename. By the way, you don't even need the -e option here.
You can pass an empty extension like this:
sed -i '' 's/foo/bar/g' $file
Or use .bak for an extension to save original file:
sed -i.bak 's/foo/bar/g' $file
The accepted answer works for OSX but causes issues if your code is run on both GNU and OSX systems since they expect -i[SUFFIX] and -i [SUFFIX] respectively.
There are probably two reasonable solutions in this case.
Don't use -i (in-place). Instead, pipe to a temporary file and overwrite the original afterwards (see the sketch after the perl example below).
Use perl.
The easiest fix I found was to simply use perl. The syntax is almost identical:
sed -i -e 's/foo/bar/g' $file
->
perl -pi -e 's/foo/bar/g' $file
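A minimal sketch of the first option (no -i at all, so it behaves the same with GNU and BSD sed; like the original loop, it assumes filenames without whitespace):
for file in $(find . -type f -name "*.java"); do
  # write to a temp file, then replace the original
  sed 's/foo/bar/g' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done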

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
one way with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special characters (like spaces) in your filenames. If there are, you need to quote the filenames in the mv command.
I added $(pwd) to get the absolute path of each found name.
You can remove the trailing |sh to check whether the generated mv commands are correct.
If everything looks good, add the |sh back to execute the mv.
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files, it won't execute those commands. It is safe. To execute it, you can save it to file and execute, or simply pipe to shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.
You can do that with bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed):
echo *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while read f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of their name, such as foo.c.gz.gz). (Mine is not perfect, either.) Note the use of double-quotes, which protects against filenames with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
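Putting that together, the newline-safe variant looks roughly like this:
find . -name '*.gz.gz' -print0 |
while IFS= read -r -d $'\0' f; do
  mv "$f" "${f%%.gz.gz*}.gz"
done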
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.

Replace text across multiple files in a directory with sed

I need to hide the IP addresses in the log files for security reasons. The IP addresses are of version 4 and 6. How do I hide the addresses in a way that, IPv4 example 123.4.32.16 is replaced by x.x.x.x and IPv6 example 232e:23o5:te43:5423:5433:0000:ef09:23ff is replaced by x:x:x:x:x:x:x:x? Is it possible to do this using a single sed command?
You might want to use find and sed for this.
Let's assume your logs have the extension ".log":
find /path/to/logs -type f -name '*.log' -exec \
sed -i -e 's,[0-9]\+\(\.[0-9]\+\)\{3\},x.x.x.x,g' \
-e 's,[0-9a-f]\+\(:[0-9a-f]\+\)\{7\},x:x:x:x:x:x:x:x,gi' {} \;
How does this work?
First, we ask find to recursively locate files with the .log extension starting from /path/to/logs. -type f tells find we want to find regular files.
For each file, it will execute sed. The -i argument tells sed you want to edit the file in place. (Check out http://www.grymoire.com/Unix/Sed.html)
One solution using find and perl:
find /the/directory -type f -exec perl -pi -e '
s/\b\d{1,3}(\.\d{1,3}){3}\b/x.x.x.x/g;
s/\b[a-f\d]{1,4}(:[a-f\d]{1,4}){7}\b/x:x:x:x:x:x:x:x/gi' {} \;
(type on one line)
Well, first you should probably just fix whatever is doing the logging to log the way you want to.
Now if you need to go back and modify historical files, you might consider using sed
sed -E 's/\b([0-9]{1,3}\.){3}[0-9]{1,3}\b/x.x.x.x/g' /path/to/file
sed -E 's/\b([[:xdigit:]]{1,4}:){7}[[:xdigit:]]{1,4}\b/x:x:x:x:x:x:x:x/g' /path/to/file
I use this:
find . -name "*.log" -exec grep -izl PATTERN {} \; | xargs perl -i.orig -e -n 's/PATTERN/REPLACEMENT/g'
You'd want to insert your PATTERN(s) and replace *.log with something else depending on the name of your log files.
The -i.orig backs up the files being replaced with an extension of .orig.
I found that this was relatively faster than other things I tried: a find/grep combo to identify candidates, then perl to do the work.

Changing #include filenames to match case

I have a body of C/C++ source code where the filename in the #include statement does not match the *.h file exactly. The match is correct, but is case insensitive. This is the type of source files that occur in a Windows system.
I want to change all the source files so that all #include statements are exact matches to the filenames they refer to.
All filenames to change are enclosed in quotes.
Example:
List of files
File1.h
FILE2.H
file1.cpp
file1.cpp
#include "file1.h"
#include "file2.h"
Change file1.cpp to
#include "File1.h"
#include "FILE2.H"
I would like to create an automated script to perform this update.
I have listed steps below that are pieces of this process, but I can't seem to bring the pieces together.
1) Create a list of all *.h files, ls *.h > include.lst. This creates a file of all the filenames with the correct case.
2) Using the filenames in include.lst, create a sed command 's/<filename>/<filename>/I' which does a case-insensitive search and replaces the match with the properly cased filename. I believe I only have to do the replacement once, but adding the global g will take care of multiple occurrences.
3) Apply this list of replacements to all files in a directory.
I would like suggestions on how to create the sed command 2) given include.lst. I think I can handle the rest.
Use sed in script, or use Perl script:
find . -name '*.c' -print0 | xargs -0 sed -i.bak -e 's/#include\s*"\([^"]*\)"/#include "\L\1"/'
-i.bak will back up the file to original_file_name.bak so you do not need to worry if you mess up
This line changes all header includes to lower case in your C files.
Then you want to change all files names:
find . -name '*.h' -print0 | xargs -0 rename 's/(.*)/\L$1/'
This renames all header file to lower case.
This is for linux only. If you are using Windows, you might want to use Perl or Python script for all above.
for hfile in $(find /header/dir -type f -iname '*.h'); do
  h=$(basename "$hfile")
  sed -i 's/#include "'"$h"'"/#include "'"$h"'"/gI' file1.cpp
done
I hope I got the quotes right :) Try without -i before applying.
You can wrap the sed call in another loop like this:
for hfile in $(find /header/dir -type f -iname '*.h'); do
  h=$(basename "$hfile")
  for sfile in $(find /source/dir -type f -iname '*.cpp'); do
    sed -i 's/#include "'"$h"'"/#include "'"$h"'"/gI' "$sfile"
  done
done
This might work for you (GNU sed):
sed 's|.*|s/^#include "&"$/#include "&"/i|' list_of_files | sed -i -f - *.{cpp,h}
Thanks for all the details on lowercasing filenames and #include strings.
However, my original question was to perform a literal replacement.
Below is the basic command and sed script that met my requirements.
ls *.h *.H | sed -e "s/\([^\r\n]*\)/s\/\\\(\\\#include\\\s\\\"\\\)\1\\\"\/\\\1\1\\\"\/gi/g" >> sedcmd.txt
ls *.h *.H creates a list of files, one line at a time
Pipe this list to sed.
Search for the whole line, which is a filename, and put it in group 1: s/\([^\r\n]*\)/
Replace the whole line, the filename, with the string s/\(\#include\s"\)<filename>"/\1<filename>"/gi
The string #include<space>" is placed in group 1. The i in the gi states to do a case insensitive search. The g is the normal global search and replace.
Given the filenames ACCESS.H and cancel.h, the output of the script is
s/\(\#include\s"\)ACCESS.H"/\1ACCESS.H"/gi
s/\(\#include\s"\)cancel.h"/\1cancel.h"/gi
Finally, the sed command file can be used with the command
sed -i.bak -f sedcmd.txt *.cpp *.h
My solution doesn't fail for pathnames containing slashes (hopefully you don't have % signs in your header paths).
It's also orders of magnitude faster (takes ~13 seconds on a few hundred files, as opposed to several minutes of waiting).
#!/bin/bash
shopt -s globstar failglob nocaseglob
# You should pushd to your include path-root.
pushd include/path/root
headers=( **/*.h )
popd
headers+=( *.h ) # My codebase has some extra header files in the project root.
echo ${#headers[*]} headers
# Separate each replacement with ;
regex=""
for header in "${headers[@]}"; do
regex+=';s%#include "'"$header"'"%#include "'"$header"'"%gI'
done
regex="${regex:1}"
find . -type f -iname '*.cpp' -print0 | \
xargs -0 sed -i "$regex"
It's much faster to make sed run just once per file (with many ;-separated regexes).

Recursive find and replace based on regex

I have changed up my directory structure and I want to do the following:
Do a recursive grep to find all instances of a match
Change to the updated location string
One example (out of hundreds) would be:
from common.utils import debug --> from etc.common.utils import debug
To get all the instances of what I'm looking for I'm doing:
$ grep -r 'common.' ./
However, I also need to make sure common is preceded by a space. How would I do this find and replace?
It's hard to tell exactly what you want because your refactoring example changes the import as well as the package, but the following will change common. -> etc.common. for all files in a directory:
sed -i 's/\bcommon\./etc.&/' $(egrep -lr '\bcommon\.' .)
This assumes you have gnu sed available, which most linux systems do. Also, just to let you know, this will fail if there are too many files for sed to handle at one time. In that case, you can do this:
egrep -lr '\bcommon\.' . | xargs sed -i 's/\bcommon\./etc.&/'
Note that it might be a good idea to run the sed command as sed -i'.OLD' 's/\bcommon\./etc.&/' so that you get a backup of the original file.
If your grep implementation supports Perl syntax (-P flag, on e.g. Linux it's usually available), you can benefit from the additional features like word boundaries:
$ grep -Pr '\bcommon\.'
By the way:
grep -r tends to be much slower than piping a find command to grep, as in Rob's example. Furthermore, when you're sure that the filenames found do not contain any whitespace, using xargs is much faster than -exec:
$ find . -type f -name '*.java' | xargs grep -P '\bcommon\.'
Or, applied to Tim's example:
$ find . -type f -name '*.java' | xargs sed -i.bak 's/\<common\./etc.common./'
Note that, in the latter example, the replacement is done after creating a *.bak backup for each file changed. This way you can review the command's results and then delete the backups:
$ find . -type f -name '*.bak' | xargs rm
If you've made an oopsie, the following command will restore the previous versions:
$ find . -type f -name '*.bak' | while read LINE; do mv -f "$LINE" "${LINE%.bak}"; done
Of course, if you aren't sure that there's no whitespace in the file names and paths, you should apply the commands via find's -exec parameter.
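For example, a whitespace-safe sketch of the replacement above using -exec (assuming GNU sed):
$ find . -type f -name '*.java' -exec sed -i.bak 's/\<common\./etc.common./' {} \;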
Cheers!
This is roughly how you would do it using find. This requires testing
find . -name \*.java -exec sed -i "s/FIND_STR/REPLACE_STR/g" {} \;
This translates as: starting from the current directory, find all files that end in .java and execute sed on each one (where {} is a placeholder for the currently found file); "s/FIND_STR/REPLACE_STR/g" replaces FIND_STR with REPLACE_STR on each line of the current file.