I have a body of C/C++ source code where the filenames in the #include statements do not match the *.h files exactly. The matches are correct, but only case-insensitively. This is typical of source files that originate on a Windows system.
I want to change all the source files so that all #include statements are exact matches to the filenames they refer to.
All filenames to change are enclosed in quotes.
Example:
List of files:
File1.h
FILE2.H
file1.cpp
Contents of file1.cpp:
#include "file1.h"
#include "file2.h"
Change file1.cpp to:
#include "File1.h"
#include "FILE2.H"
I would like to create an automated script to perform this update.
I have listed steps below that are pieces of this process, but I can't seem to bring the pieces together.
1) Create a list of all *.h files: ls *.h > include.lst. This creates a file of all the filenames with the correct case.
2) Using the filenames in include.lst, create a sed command 's/<filename>/<filename>/I', which does a case-insensitive search and replaces the match with the properly cased filename. I believe I only have to do the replacement once, but adding the global g flag will take care of multiple occurrences.
3) Apply this list of replacements to all files in a directory.
I would like suggestions on how to create the sed command 2) given include.lst. I think I can handle the rest.
Use sed in script, or use Perl script:
find . -name '*.c' -print0 | xargs -0 sed -i.bak -e 's/#include\s"\([^"]\+\)"/#include "\L\1"/'
-i.bak will back up the file to original_file_name.bak so you do not need to worry if you mess up
This line changes all header includes to lower case in your C files.
Then rename all the header files:
find . -iname '*.h' -print0 | xargs -0 rename 's/([^\/]*)$/\L$1/'
This renames all header files to lower case (only the filename part, not the directories). It assumes the Perl-based rename.
This is for Linux only. If you are using Windows, you might want to use a Perl or Python script for all of the above.
for hfile in $(find /header/dir -type f -iname '*.h' -printf '%f\n'); do
sed -i 's/#include "'"$hfile"'"/#include "'"$hfile"'"/gI' file1.cpp
done
I hope I got the quotes right :) Try without -i before applying.
You can wrap the sed call in another loop like this:
for hfile in $(find /header/dir -type f -iname '*.h' -printf '%f\n'); do
for sfile in $(find /source/dir -type f -iname '*.cpp'); do
sed -i 's/#include "'"$hfile"'"/#include "'"$hfile"'"/gI' "$sfile"
done
done
This might work for you (GNU sed):
sed 's|.*|s/^#include "&"$/#include "&"/i|' list_of_files | sed -i -f - *.{cpp,h}
Thanks for all the details on lowercasing filenames and #include strings.
However, my original question was to perform a literal replacement.
Below is the basic command and sed script that met my requirements.
ls *.h *.H | sed -e "s/\([^\r\n]*\)/s\/\\\(\\\#include\\\s\\\"\\\)\1\\\"\/\\\1\1\\\"\/gi/g" >> sedcmd.txt
ls *.h *.H creates a list of files, one line at a time
Pipe this list to sed.
Search for the whole line, which is a filename. Put the whole line in group 1. s/\([^\r\n]*\)/
Replace the whole line, the filename, with the string s/\(\#include\s"\)<filename>"/\1<filename>"/gi
The string #include<space>" is placed in group 1. The i in the gi states to do a case insensitive search. The g is the normal global search and replace.
Given the filenames ACCESS.H and cancel.h, the output of the script is
s/\(\#include\s"\)ACCESS.H"/\1ACCESS.H"/gi
s/\(\#include\s"\)cancel.h"/\1cancel.h"/gi
Finally, the sed command file can be used with the command
sed -i.bak -f sedcmd.txt *.cpp *.h
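The generate-then-apply steps above can also be sketched end to end in a scratch directory. This is only an illustration, not the exact command from the answer: it assumes GNU sed (for the I flag and -i), header names that contain no /, & or regex-special characters other than the dot, and the file names and contents are made up to mirror the example in the question.

```shell
# Hypothetical scratch setup mirroring the example in the question.
work=$(mktemp -d)
cd "$work" || exit 1
touch File1.h FILE2.H
printf '%s\n' '#include "file1.h"' '#include "file2.h"' '#include <stdio.h>' > file1.cpp

# Emit one case-insensitive substitution per header found on disk.
# The dot is escaped in the pattern so that it only matches a literal dot.
for h in *.[hH]; do
    pat=$(printf '%s' "$h" | sed 's/\./\\./g')
    printf 's/\\(#include[[:space:]]*"\\)%s"/\\1%s"/I\n' "$pat" "$h"
done > sedcmd.txt

# Apply every substitution in a single sed run, keeping a .bak backup.
sed -i.bak -f sedcmd.txt file1.cpp
cat file1.cpp
```

Writing the substitutions to a file first makes it easy to inspect sedcmd.txt before any source file is touched.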
My solution doesn't fail for pathnames containing slashes (hopefully your header paths don't contain % signs).
It's also orders of magnitude faster (takes ~13 seconds on a few hundred files, as opposed to several minutes of waiting).
#!/bin/bash
shopt -s globstar failglob nocaseglob
# You should pushd to your include path-root.
pushd include/path/root
headers=( **/*.h )
popd
headers+=( *.h ) # My codebase has some extra header files in the project root.
echo ${#headers[*]} headers
# Separate each replacement with ;
regex=""
for header in "${headers[@]}"; do
regex+=';s%#include "'"$header"'"%#include "'"$header"'"%gI'
done
regex="${regex:1}"
find . -type f -iname '*.cpp' -print0 | \
xargs -0 sed -i "$regex"
It's much faster to make sed run just once per file (with many ;-separated regexes).
Related
I changed file names, so I have to change included file names.
For example, I change alpha.h to fix_alpha.h. So I have to change
#include "alpha.h"
to
#include "fix_alpha.h"
but there are so many files to fix like beta.h to fix_beta.h
I tried to use reg exp to fix it
grep -rl '*' ./ | xargs sed -i 's/*\.h/fix_*\.h/g'
but it doesn't work.
How should I write the regular expression to make it work?
Your grep command looks for files containing a * (and most of your C files will probably have at least one comment, so all C files — source and header — will be selected). You can do better than that, though. For example, assuming that the names that need to be mapped are alpha.h, beta.h, gamma.h, and delta.h, and that you're a regular sane programmer who doesn't put spaces or newlines in either file names or directory names, then you can generate the list of interesting files with:
grep -rlE '^#include[[:space:]]*"(alpha|beta|gamma|delta)\.h"' .
Then, for each such file, you need to replace those header names with the fix_ prefix. Now you get to work out whether your sed has extended regular expressions. Assuming not, then you can create a sed script to do the mapping for you. For example, create a text file mapped-headers containing:
alpha.h
beta.h
gamma.h
delta.h
Now make that into a sed script with (of course) sed:
sed -e 's/\.h/\\.h/' \
-e 's%.*%s/^#include[[:space:]]*"&"/#include "fix_&"/' \
mapped-headers > script.sed
Now you can apply that script to the files containing any of the headers to be mapped using:
grep -rlE '^#include[[:space:]]*"(alpha|beta|gamma|delta)\.h"' . |
xargs sed -i.bak -f script.sed
(If you're overwriting files on Mac OS X with the -i option, you must provide a backup suffix, which can be attached to the -i option as shown, or in a separate option; if you want no suffix but overwriting, you have to write -i '' on Mac. By contrast, if you're using GNU sed, the backup suffix is optional, but if present, it must be attached to the -i option. What's shown will work on both platforms. Not all versions of sed support the -i option for overwriting.)
The extension to 100 headers with the simple prefix mapping should be obvious, at least for the sed script part. You could also have a sed script generate the grep script too; it would be similar in spirit, but simpler than the one shown.
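As a rough sketch of that last idea, the grep pattern can be derived from the same mapped-headers file instead of being typed by hand. The scratch directory, the file names uses.c and other.c, and the variable names are all made up for the illustration.

```shell
# Hypothetical scratch setup: only uses.c includes one of the mapped headers.
work=$(mktemp -d)
cd "$work" || exit 1
printf '%s\n' alpha.h beta.h gamma.h delta.h > mapped-headers
printf '%s\n' '#include "beta.h"' 'int main(void) { return 0; }' > uses.c
printf '%s\n' '#include <stdio.h>' > other.c

# Escape the dots, then join the names with | to form one alternation.
alternation=$(sed -e 's/\./\\./g' mapped-headers | paste -s -d '|' -)
pattern="^#include[[:space:]]*\"($alternation)\""

# List only the files that include one of the mapped headers.
matches=$(grep -rlE "$pattern" .)
echo "$matches"
```

The resulting list can then be piped to xargs sed -i.bak -f script.sed exactly as before.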
You can write fancier regular expressions to match deviant layouts for the #include directives. You might allow for spaces before the #, and after the #. If you have comments in the operational part of your #include lines, you've got bigger problems than just mapping the names. This code won't handle:
/* comments? */ # \
include /*
*/ "alpha.h"
But no-one who's sane writes code like that anyway.
Further, if you have some not wholly systematic renamings to do, you can revise the script to handle that by changing the mapped-headers file to have the 'old' and 'new' values in separate columns:
alpha.h alphabet.h
beta.h veg/carotene.h
gamma.h hard/radiation.h
delta.h v-wing.h
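A minimal sketch of how such a two-column file might be turned into a sed script. It assumes names without spaces or % signs and a sed with -i; the scratch directory and demo.c are hypothetical.

```shell
# Hypothetical scratch setup with a two-column mapping and one source file.
work=$(mktemp -d)
cd "$work" || exit 1
cat > mapped-headers <<'EOF'
alpha.h alphabet.h
beta.h veg/carotene.h
EOF
printf '%s\n' '#include "alpha.h"' '#include "beta.h"' '#include "gamma.h"' > demo.c

# Turn each "old new" pair into one sed substitution.
# % is the delimiter, so slashes in the new name need no escaping.
while read -r old new; do
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    printf 's%%^#include[[:space:]]*"%s"%%#include "%s"%%\n' "$old_re" "$new"
done < mapped-headers > script.sed

sed -i.bak -f script.sed demo.c
cat demo.c
```

Headers that are not listed in the mapping, such as gamma.h here, are left alone.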
find -name '*.h' -exec sed -i 's~alpha.h~fix_alpha.h~' {} \;
Your code may contain /, so use ~ as the delimiter here.
How is it not working?
Anyway, your regex doesn't make sense to me. I believe you could simply replace it with
sed 's|^\(\s*#\s*include\s.*[*/"]\)\([^/"]*\)\.h\(".*\)$|\1fix_\2.h\3|'
The three groups are:
\1: from the start of the #include line up to the " or / just before the header filename
\2: the header filename, without the .h
\3: everything after the .h
Example:
#include "foo.h"
# include "foo.h"
# include "foo.h"
#include "foo/bar.h"
#include "foo/bar.h" // something else
result
#include "fix_foo.h"
# include "fix_foo.h"
# include "fix_foo.h"
#include "foo/fix_bar.h"
#include "foo/fix_bar.h" // something else
Keep it simple:
find . -type f -print | xargs sed -i 's/#include "/&fix_/'
I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
one way with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special characters (such as spaces) in your filenames. If there are, you need to quote the filenames in the mv command.
I added $(pwd) to get the absolute path of each found name.
You can remove the trailing |sh to check whether the generated mv commands are correct.
If everything looks good, add the |sh back to execute the mv commands.
see example here:
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files, it won't execute those commands. It is safe. To execute it, you can save it to file and execute, or simply pipe to shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.
You can do that with bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed):
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while read f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). (Mine is not perfect, either) Note the use of double-quotes, which protects against filenames with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
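A sketch of that NUL-delimited variant, assuming bash (read -d is not POSIX). It uses -d '' rather than -d $'\0'; bash treats the two the same way. The scratch directory and the file names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical scratch setup with awkward names.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p 'sub dir'
touch 'sub dir/foo.c.gz.gz.gz' 'plain.gz' 'two words.gz.gz'

# -print0 and read -d '' delimit names with NUL bytes, so spaces
# (and even newlines) inside file names are passed through intact.
find . -name '*.gz.gz' -print0 |
while IFS= read -r -d '' f; do
    mv -- "$f" "${f%%.gz.gz*}.gz"
done

find . -type f | sort
```

Files that already end in a single .gz, such as plain.gz, never match the find pattern and are not touched.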
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.
I'm refactoring some code, and I decided to replace one name by another, let's say foo by bar. They appear in multiple .cc and .h files, so I would like to change from:
Foo key();
to
Bar key();
that's it, replace all the occurrences of Foo by Bar in Unix. And the files are in the same directory. I thought about
sed -e {'s/Foo/Bar/g'}
but I'm unsure if that's going to work.
This should do the trick:
sed -i'.bak' 's/\bFoo\b/Bar/g' *.files
I would use sed:
sed -i.bak -e '/Foo/ s//Bar/g' /path/to/dir/*.cc
Repeat for the *.h files
I don't use sed a lot, but if you have access to Perl on the command line (which many Unix systems do) you can do:
perl -pi -e 's/Foo key/Bar key/g' `find ./ -name '*.h' -o -name '*.cc'`
This will find (recursively) all files in the current directory ending with .h or .cc and then use Perl to replace 'Foo key' with 'Bar key' in each file.
I like Jaypal's sed command. It uses \b to ensure that you only replace full words (Foo, not Foobar), and it makes backup files in case something goes wrong.
However, if all of your files are not in one directory, then you will need to use a more sophisticated method to list them all. Use the find command to send them all to sed:
find . -regex '.*\.\(cc\|h\)' -print0 | xargs -0 sed -i'.bak' 's/\bFoo\b/Bar/g'
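To see the effect of \b and of the file filter on a small tree, something like the following can be run in a scratch directory. It assumes GNU sed (for \b and -i) and uses -name tests instead of -regex; all file names and contents are made up.

```shell
# Hypothetical scratch tree: two sources and one unrelated text file.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src/inner
printf '%s\n' 'Foo key();' 'Foobar other();' > src/a.h
printf '%s\n' 'Foo key() { return Foo(); }' > src/inner/b.cc
printf '%s\n' 'Foo stays here' > src/notes.txt

# The name tests come before -print0, so only .cc and .h files are passed on.
find . -type f \( -name '*.cc' -o -name '*.h' \) -print0 |
    xargs -0 sed -i.bak 's/\bFoo\b/Bar/g'

cat src/a.h src/inner/b.cc src/notes.txt
```

Foobar is left alone because there is no word boundary between Foo and bar, and notes.txt is never handed to sed.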
You probably have Perl installed (if it's UNIX), so here's something that should work for you:
perl -e "s/Foo/Bar/g;" -pi.save $(find path/to/DIRECTORY -type f)
Note, this provides a backup of the original file, if you need that as a bit of insurance.
Otherwise, you can do what @Kevin mentioned and just use an IDE refactoring feature.
Note: I just saw you're using Vim, here's a quick tutorial on how to do it
I have changed up my directory structure and I want to do the following:
Do a recursive grep to find all instances of a match
Change to the updated location string
One example (out of hundreds) would be:
from common.utils import debug --> from etc.common.utils import debug
To get all the instances of what I'm looking for I'm doing:
$ grep -r 'common.' ./
However, I also need to make sure common is preceded by a space. How would I do this find and replace?
It's hard to tell exactly what you want because your refactoring example changes the import as well as the package, but the following will change common. -> etc.common. for all files in a directory:
sed -i 's/\bcommon\./etc.&/' $(egrep -lr '\bcommon\.' .)
This assumes you have GNU sed available, which most Linux systems do. Also, just to let you know, this will fail if there are too many files to fit on one command line. In that case, you can do this:
egrep -lr '\bcommon\.' . | xargs sed -i 's/\bcommon\./etc.&/'
Note that it might be a good idea to run the sed command as sed -i'.OLD' 's/\bcommon\./etc.&/' so that you get a backup of the original file.
If your grep implementation supports Perl syntax (-P flag, on e.g. Linux it's usually available), you can benefit from the additional features like word boundaries:
$ grep -Pr '\bcommon\.'
By the way:
grep -r tends to be much slower than a previously piped find command as in Rob's example. Furthermore, when you're sure that the file-names found do not contain any whitespace, using xargs is much faster than -exec:
$ find . -type f -name '*.java' | xargs grep -P '\bcommon\.'
Or, applied to Tim's example:
$ find . -type f -name '*.java' | xargs sed -i.bak 's/\<common\./etc.common./'
Note that, in the latter example, the replacement is done after creating a *.bak backup for each file changed. This way you can review the command's results and then delete the backups:
$ find . -type f -name '*.bak' | xargs rm
If you've made an oopsie, the following command will restore the previous versions:
$ find . -type f -name '*.bak' | while read -r LINE; do mv -f "$LINE" "${LINE%.bak}"; done
Of course, if you aren't sure that there's no whitespace in the file names and paths, you should apply the commands via find's -exec parameter.
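A sketch of what the -exec form might look like, including the restore step. It assumes GNU sed and uses a made-up Python file with a space in its path, following the import example from the question.

```shell
# Hypothetical scratch tree with whitespace in the directory and file names.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p 'my pkg'
printf '%s\n' 'from common.utils import debug' 'uncommon.thing = 1' > 'my pkg/mod one.py'

# -exec ... {} + hands the names to sed directly, so whitespace in paths is harmless.
find . -type f -name '*.py' -exec sed -i.bak 's/\bcommon\./etc.&/' {} +
after=$(cat 'my pkg/mod one.py')

# Restore: strip the .bak suffix from each backup and move it back into place.
find . -type f -name '*.bak' -exec sh -c 'for b; do mv -f -- "$b" "${b%.bak}"; done' sh {} +
restored=$(cat 'my pkg/mod one.py')

printf '%s\n---\n%s\n' "$after" "$restored"
```

The \b keeps uncommon.thing from being rewritten, since there is no word boundary in the middle of uncommon.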
Cheers!
This is roughly how you would do it using find. This requires testing
find . -name \*.java -exec sed -i "s/FIND_STR/REPLACE_STR/g" {} \;
This translates as: starting from the current directory, find all files that end in .java and execute sed on each one (where {} is a placeholder for the currently found file, and -i makes sed edit that file in place). "s/FIND_STR/REPLACE_STR/g" replaces FIND_STR with REPLACE_STR in each line of the current file.
How would I replace one pattern with another for every file with extension .cc and .h recursively? Not sure if I need to prevent it from going into .svn directories.
first attempt
#!/bin/bash
for file in `find . -name '*.cc' -or -name '*.h'`; do \
sed -e s/$1/$2/g -i temp $file
done
If your project lives on a Linux platform, you can do something like this in bash:
for file in `find . -name '*.cpp' -or -name '*.h'`; do \
cat $file | sed s/$1/$2/g > tmp
mv tmp $file
done
Each svn file has a '*-base' extension, so all of them will be left unchanged. This script only affects *.h and *.cpp files (use '*.cc' instead if that is your extension).
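If you would rather not depend on how the .svn directory names its copies, find can skip that directory outright with -prune. This is only a sketch: it assumes a sed that supports -i, and the tree, the file names and the old_name/new_name strings are made up. The copy under .svn is deliberately given a .h name to show that it is still skipped.

```shell
# Hypothetical scratch tree with a copy of a header inside .svn.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src .svn/text-base
printf '%s\n' 'old_name();' > src/a.cc
printf '%s\n' 'old_name();' > src/a.h
printf '%s\n' 'old_name();' > .svn/text-base/a.h

# -prune stops find from descending into .svn; the remaining files are filtered by name.
find . -name .svn -prune -o -type f \( -name '*.cc' -o -name '*.h' \) \
    -exec sed -i 's/old_name/new_name/g' {} +

cat src/a.cc src/a.h .svn/text-base/a.h
```

Using -exec ... {} + instead of a for loop over backticks also keeps file names with spaces intact.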
You can search and replace using regular expressions through certain files in the Eclipse project explorer, for example. I'm sure there are lots of tools that can do this, but Eclipse comes to mind first.
To do this you need to invoke the "Search" dialog, place your pattern in the search field, specify extensions with a wildcard and press "Replace".