change include header file with regular expression - regex

I changed file names, so I have to change included file names.
For example, I change alpha.h to fix_alpha.h. So I have to change
#include "alpha.h"
to
#include "fix_alpha.h"
but there are many more files to fix, such as beta.h to fix_beta.h.
I tried to use a regular expression to fix it:
grep -rl '*' ./ | xargs sed -i 's/*\.h/fix_*\.h/g'
but it doesn't work.
How should I write the regular expression to make it work?

Your grep command looks for files containing a * (and most of your C files will probably have at least one comment, so all C files — source and header — will be selected). You can do better than that, though. For example, assuming that the names that need to be mapped are alpha.h, beta.h, gamma.h, and delta.h, and that you're a regular sane programmer who doesn't put spaces or newlines in either file names or directory names, then you can generate the list of interesting files with:
grep -rlE '^#include[[:space:]]*"(alpha|beta|gamma|delta)\.h"' .
Then, for each such file, you need to replace those header names with the fix_ prefix. Now you get to work out whether your sed has extended regular expressions. Assuming not, then you can create a sed script to do the mapping for you. For example, create a text file mapped-headers containing:
alpha.h
beta.h
gamma.h
delta.h
Now make that into a sed script with (of course) sed:
sed -e 's/\.h/\\.h/' \
-e 's%.*%s/^#include[[:space:]]*"&"/#include "fix_&"/' \
mapped-headers > script.sed
Now you can apply that script to the files containing any of the headers to be mapped using:
grep -rlE '^#include[[:space:]]*"(alpha|beta|gamma|delta)\.h"' . |
xargs sed -i.bak -f script.sed
(If you're overwriting files on Mac OS X with the -i option, you must provide a backup suffix, which can be attached to the -i option as shown, or in a separate option; if you want no suffix but overwriting, you have to write -i '' on Mac. By contrast, if you're using GNU sed, the backup suffix is optional, but if present, it must be attached to the -i option. What's shown will work on both platforms. Not all versions of sed support the -i option for overwriting.)
The extension to 100 headers with the simple prefix mapping should be obvious, at least for the sed script part. You could also have a sed script generate the grep script too; it would be similar in spirit, but simpler than the one shown.
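As a rough sketch of that last idea, the grep pattern could be derived from the same mapped-headers file. The scratch directory and the two sample source files are assumptions made for illustration, and paste is just one of several ways to join the lines.

```shell
# Work in a scratch directory so nothing real is touched (assumption for the demo).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# The list of headers to be mapped, one per line, as in the answer above.
printf '%s\n' alpha.h beta.h gamma.h delta.h > mapped-headers

# Escape the dots, then join all lines with | to form one ERE alternation.
alternation=$(sed 's/\./\\./g' mapped-headers | paste -sd '|' -)
pattern="^#include[[:space:]]*\"($alternation)\""

# Sample files (hypothetical): only uses.c includes a mapped header.
printf '#include "beta.h"\n' > uses.c
printf '#include <stdio.h>\n' > plain.c

# List the files that contain at least one mapped #include.
matches=$(grep -rlE "$pattern" .)
printf '%s\n' "$matches"
```

The same $pattern can then be fed to the grep | xargs sed pipeline in place of the hand-written alternation.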
You can write fancier regular expressions to match deviant layouts for the #include directives. You might allow for spaces before the #, and after the #. If you have comments in the operational part of your #include lines, you've got bigger problems than just mapping the names. This code won't handle:
/* comments? */ # \
include /*
*/ "alpha.h"
But no-one who's sane writes code like that anyway.
Further, if you have some not wholly systematic renamings to do, you can revise the script to handle that by changing the mapped-headers file to have the 'old' and 'new' values in separate columns:
alpha.h alphabet.h
beta.h veg/carotene.h
gamma.h hard/radiation.h
delta.h v-wing.h
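The answer does not show the revised generator, so the following is only a sketch of how the two-column file might be turned into a sed script. It assumes the names contain no spaces and no % characters; the scratch directory and demo.c are made up for the demonstration.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Two-column mapping: old name, then new name (sample rows from the answer above).
cat > mapped-headers <<'EOF'
alpha.h alphabet.h
beta.h veg/carotene.h
EOF

# Emit one s-command per row. % is the delimiter because new names may contain /.
while read -r old new; do
    old_re=$(printf '%s\n' "$old" | sed 's/\./\\./g')   # escape dots for the regex
    printf 's%%^#include[[:space:]]*"%s"%%#include "%s"%%\n' "$old_re" "$new"
done < mapped-headers > script.sed

# Hypothetical source file to try the script on (no in-place edit here).
printf '#include "alpha.h"\n#include "beta.h"\n#include "other.h"\n' > demo.c
result=$(sed -f script.sed demo.c)
printf '%s\n' "$result"
```

Once the output looks right, the generated script.sed can be applied with the same grep | xargs sed -i.bak -f script.sed pipeline as before.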

find . -name '*.h' -exec sed -i 's~alpha\.h~fix_alpha.h~' {} \;
Your code may contain /, so this uses ~ as the delimiter.

How is it not working?
Anyway, your regex doesn't make sense to me. I believe you could simply do the replacement with:
sed 's|^\(\s*#\s*include\s.*[*/"]\)\([^/"]*\)\.h\(".*\)$|\1fix_\2.h\3|'
The three groups are:
\1  from the start of the line through #include, up to the " or / just before the header file name
\2  the header file name (without .h)
\3  everything after .h
Example:
#include "foo.h"
# include "foo.h"
# include "foo.h"
#include "foo/bar.h"
#include "foo/bar.h" // something else
result
#include "fix_foo.h"
# include "fix_foo.h"
# include "fix_foo.h"
#include "foo/fix_bar.h"
#include "foo/fix_bar.h" // something else

Keep it simple:
find . -type f -print | xargs sed -i 's/#include "/&fix_/'

Related

Use [msys] bash to remove all files whose name matches a pattern, regardless of file-name letter-case

I need a way to clean up a directory, which is populated with C/C++ built-files (.o, .a, .EXE, .OBJ, .LIB, etc.) produced by (1) some tools which always create files having UPPER-CASE names, and (2) other tools which always create lower-case file names. (I have no control over the tools.)
I need to do this from a MinGW 'msys' bash.exe shell script (or bash command prompt). I understand piping (|), but haven't come up with the right combination of exec's yet. I have successfully filtered the file names, using commands like this example:
ls | grep '.\.[eE][xX][eE]'
to list all files having any case-combination of letters in the file-extension--this example gets all the executable (e.g. ".EXE") files.
(I'll be doing similar for .o, .a, .OBJ, .LIB, .lib, .MAP, etc., which all share the same directory as the C/C++ source files. I don't want to delete the source files, only the built-files. And yes, I probably should rework the directory structure, to use a separate directory for the built-files [only], but that will take time, and I need a quick solution now.)
How can I merge the above command with "something" else (e.g., like the 'rm -f' command???), to carry this the one step further, to actually delete [only] those filtered-out files from the current directory? (I'm hopeful for a solution which does not require a temporary file to hold the filtered file names.)
Adding this answer because the accepted answer suggests practices that are not recommended in actual scripts. (Please don't feel bad, I was on that track once too.)
Parsing ls output is a NO-NO! See http://mywiki.wooledge.org/ParsingLs for a more detailed explanation of why.
In short, ls separates file names with newlines, but a newline can itself be part of a file name: on Unix/Linux that is perfectly valid. (Also, ls prints its output in human-readable form and does not handle other special characters reliably.)
A Unix file name cannot contain a NUL character, though. Hence the command below should work.
find /path/to/some/directory -iname '*.exe' -print0 | xargs -0 rm -f
find: a tool used to, well, find files matching the required pattern/criterion.
-iname: match on the file name, case-insensitively. Note that the argument to -iname is a wildcard (glob) pattern, not a regex.
-print0: print the file names separated by the NUL character.
xargs: takes its input from stdin and runs the supplied command (rm -f in this case) on it. By default the input is split on whitespace.
-0: tells xargs that the input is separated by NUL characters.
Or, an even better approach:
find /path/to/some/directory -iname '*.exe' -delete
-delete is a built-in feature of find, which deletes the files found with the pattern.
Note that if you want to do some other operation, like move them to particular directory, you'd need to use first option with xargs.
Finally, this command find /path/to/some/directory -iname '*.exe' -delete would recursively find the *.exe files/directories. You can restrict the search to current directory with -maxdepth 1 & filetype to simple file (not directory, pipe etc.) using -type f. Check the manual link I provided for more details.
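Putting those restrictions together, a sketch covering several of the extensions mentioned in the question might look like this. The scratch directory and file names are made up for the demonstration; -iname, -maxdepth and -delete are extensions found in GNU and BSD find rather than in POSIX.

```shell
# Scratch directory with a mix of built files and sources (all names hypothetical).
builddir=$(mktemp -d)
mkdir "$builddir/sub"
touch "$builddir/APP.EXE" "$builddir/tool.exe" "$builddir/main.o" \
      "$builddir/MAIN.OBJ" "$builddir/main.c" "$builddir/sub/keep.exe"

# Only regular files directly inside $builddir, matched case-insensitively.
# The parentheses group the -o alternatives so -delete applies to all of them.
find "$builddir" -maxdepth 1 -type f \
     \( -iname '*.exe' -o -iname '*.obj' -o -iname '*.o' \) -delete

# Show what is left: the source file and the file in the subdirectory.
remaining=$(cd "$builddir" && find . -type f | sort)
printf '%s\n' "$remaining"
```

Swapping -delete for -print first gives a dry run that lists what would be removed.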
Is this what you mean?
rm -f `ls | grep '.\.[eE][xX][eE]'`
If you use a long listing (ls -l), the output has other fields, such as the permissions and date, that you have to strip out so that only the file name is left.
Try something like:
rm -f `ls -l | grep '.\.[eE][xX][eE]' | awk '{print $9}'`
where the file name is the 9th field, as in:
-rwxr-xr-x 1 Administrators None 283 Jul 2 2014 search.exe
You can use the following command:
ls | grep '.\.[eE][xX][eE]' | xargs rm -f
xargs turns its standard input (here, the output of the previous command) into arguments for the rm -f command.

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
one way with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special characters (such as spaces) in your file names. If there are, you need to quote the file names in the mv command.
I added $(pwd) so that find prints absolute paths.
You can remove the trailing |sh to check the generated mv commands first.
If everything looks good, add the |sh back to execute them.
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files, it won't execute those commands. It is safe. To execute it, you can save it to file and execute, or simply pipe to shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.
You can do that with bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed):
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while read f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). Mine is not perfect either. Note the use of double quotes, which protect against file names with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
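Another newline-safe option, sketched here on the assumption that a POSIX sh is available, is to let find hand the names straight to an inline shell with -exec ... {} +, so the names never pass through a pipe at all. Note that this swaps the read loop for a different technique; the scratch directory and file names are invented for the demo.

```shell
# Scratch directory with sample files (hypothetical names, one with a space).
workdir=$(mktemp -d)
touch "$workdir/plain.gz.gz.gz" "$workdir/with space.c.gz.gz" "$workdir/single.gz"

# find passes each batch of matching paths as arguments to the inline script,
# so spaces and newlines in names need no special handling.
find "$workdir" -type f -name '*.gz.gz' -exec sh -c '
    for f in "$@"; do
        mv -- "$f" "${f%%.gz.gz*}.gz"
    done
' sh {} +

# List the result; every file should now end in exactly one .gz.
renamed=$(cd "$workdir" && find . -type f | sort)
printf '%s\n' "$renamed"
```

Putting echo in front of mv turns this into a preview, in the same spirit as the other answers.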
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.

replacing one word by another in an entire directory - unix

I'm refactoring some code, and I decided to replace one name by another, let's say foo by bar. They appear in multiple .cc and .h files, so I would like to change from:
Foo key();
to
Bar key();
That is, replace all occurrences of Foo with Bar, on Unix. The files are all in the same directory. I thought about
sed -e {'s/Foo/Bar/g'}
but I'm unsure if that's going to work.
This should do the trick:
sed -i'.bak' 's/\bFoo\b/Bar/g' *.files
I would use sed:
sed -i.bak -e '/Foo/ s//Bar/g' /path/to/dir/*.cc
Repeat for the *.h files
I don't use sed a lot, but if you have access to Perl on the command line (which many Unix systems do) you can do:
perl -pi -e 's/Foo key/Bar key/g' `find ./ -name '*.h' -o -name '*.cc'`
This will find (recursively) all files in the current directory ending with .h or .cc and then use Perl to replace 'Foo key' with 'Bar key' in each file.
I like Jaypal's sed command. It uses \b to ensure that you only replace whole words (Foo, not Foobar), and it makes backup files in case something goes wrong.
However, if your files are not all in one directory, you will need a more sophisticated method to list them. Use the find command to send them all to sed:
find . -regex '.*\.\(cc\|h\)' -print0 | xargs -0 sed -i'.bak' 's/\bFoo\b/Bar/g'
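To see what the \b anchors buy you, here is a small sketch on a throwaway file. The file name and its contents are made up, and \b is a GNU sed extension, so this assumes GNU sed.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical source file: one whole-word Foo, one identifier that merely starts with Foo.
printf 'Foo key();\nFoobar other();\n' > demo.cc

# Whole-word replacement, keeping the original as demo.cc.bak.
sed -i.bak 's/\bFoo\b/Bar/g' demo.cc

# Only the first line changes; Foobar is left alone.
after=$(cat demo.cc)
printf '%s\n' "$after"
```

Without the \b anchors, the second line would become Barbar other(); as well.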
You probably have perl installed (if it's UNIX), so here's something that should work for you:
perl -e "s/Foo/Bar/g;" -pi.save $(find path/to/DIRECTORY -type f)
Note, this provides a backup of the original file, if you need that as a bit of insurance.
Otherwise, you can do what @Kevin mentioned and just use an IDE refactoring feature.
Note: I just saw you're using Vim, here's a quick tutorial on how to do it

Changing #include filenames to match case

I have a body of C/C++ source code where the filename in the #include statement does not match the *.h file exactly. The match is correct, but is case insensitive. This is the type of source files that occur in a Windows system.
I want to change all the source files so that all #include statements are exact matches to the filenames they refer to.
All filenames to change are enclosed in quotes.
Example:
List of files
File1.h
FILE2.H
file1.cpp
file1.cpp
#include "file1.h"
#include "file2.h"
Change file1.cpp to
#include "File1.h"
#include "FILE2.H"
I would like to create an automated script to perform this update.
I have listed steps below that are pieces of this process, but I can't seem to bring the pieces together.
1) Create a list of all *.h files: ls *.h > include.lst. This creates a file of all the file names with the correct case.
2) Using the file names in include.lst, create a sed command 's/<filename>/<filename>/I' that does a case-insensitive search and replaces the match with the properly cased file name. I believe I only have to do the replacement once per line, but adding the global g flag will take care of multiple occurrences.
3) Apply this list of replacements to all files in a directory.
I would like suggestions on how to create the sed command in step 2) given include.lst. I think I can handle the rest.
Use sed in a script, or use a Perl script:
find . -name '*.c' -print0 | xargs -0 sed -i.bak -e 's/#include\s"\([^"]*\)"/#include "\L\1"/'
-i.bak backs up each file to original_file_name.bak, so you do not need to worry if you mess up.
This line changes all quoted header includes to lower case in your C files.
Then you want to rename the header files themselves:
find . -iname '*.h' -print0 | xargs -0 rename 's/([^\/]+)$/\L$1/'
This renames all header files to lower case.
This is for Linux only. If you are using Windows, you might want to use a Perl or Python script for all of the above.
for hfile in $(find /header/dir -type f -iname '*.h'); do
    hname=$(basename "$hfile") # the #include lines use the bare file name, not the full path
    sed -i 's/#include "'"$hname"'"/#include "'"$hname"'"/gI' file1.cpp
done
I hope I got the quotes right :) Try without -i before applying.
You can wrap the sed call in another loop like this:
for hfile in $(find /header/dir -type f -iname '*.h'); do
    hname=$(basename "$hfile") # the #include lines use the bare file name, not the full path
    for sfile in $(find /source/dir -type f -iname '*.cpp'); do
        sed -i 's/#include "'"$hname"'"/#include "'"$hname"'"/gI' "$sfile"
    done
done
This might work for you (GNU sed):
sed 's|.*|s/^#include "&"$/#include "&"/i|' list_of_files | sed -i -f - *.{cpp,h}
Thanks for all the details on lowercasing filenames and #include strings.
However, my original question was to perform a literal replacement.
Below is the basic command and sed script that met my requirements.
ls *.h *.H | sed -e "s/\([^\r\n]*\)/s\/\\\(\\\#include\\\s\\\"\\\)\1\\\"\/\\\1\1\\\"\/gi/g" >> sedcmd.txt
ls *.h *.H creates a list of files, one line at a time
Pipe this list to sed.
Search for the whole line, which is a filename. Put the whole line in group 1. s/\([^\r\n]*\)/
Replace the whole line, the filename, with the string s/\(\#include\s"\)<filename>"/\1<filename>"/gi
The string #include<space>" is placed in group 1. The i in the gi states to do a case insensitive search. The g is the normal global search and replace.
Given a filename ACCESS.H and cancel.h, the output of the script is
s/\(\#include\s"\)ACCESS.H"/\1ACCESS.H"/gi
s/\(\#include\s"\)cancel.h"/\1cancel.h"/gi
Finally, the sed command file can be used with the command
sed -i.bak -f sedcmd.txt *.cpp *.h
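The escaping in the generator above is hard to read. A sketch of the same idea, with the quoting spread over a small loop, is shown below. include.lst and sedcmd.txt are the names used in the question and answer, while the scratch directory and sample contents are assumptions. The I flag for case-insensitive matching is a GNU sed extension.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Correctly cased header names, one per line (sample data).
printf '%s\n' File1.h FILE2.H > include.lst

# For each name, emit: s/\(#include[[:space:]]*"\)<escaped name>"/\1<name>"/I
while IFS= read -r name; do
    name_re=$(printf '%s\n' "$name" | sed 's/\./\\./g')   # dots must match literally
    printf 's/\\(#include[[:space:]]*"\\)%s"/\\1%s"/I\n' "$name_re" "$name"
done < include.lst > sedcmd.txt

# Sample source with wrongly cased includes; print the corrected text.
printf '#include "file1.h"\n#include "file2.h"\n' > file1.cpp
fixed=$(sed -f sedcmd.txt file1.cpp)
printf '%s\n' "$fixed"
```

Unlike the original generator, this variant also escapes the dot in each file name, so ACCESS.H would not match a name such as ACCESSxH.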
My solution doesn't fail for pathnames containing slashes (hopefully you don't contain % signs in your header paths).
It's also orders of magnitude faster (takes ~13 seconds on a few hundred files, as opposed to several minutes of waiting).
#!/bin/bash
shopt -s globstar failglob nocaseglob
# You should pushd to your include path-root.
pushd include/path/root
headers=( **/*.h )
popd
headers+=( *.h ) # My codebase has some extra header files in the project root.
echo ${#headers[*]} headers
# Separate each replacement with ;
regex=""
for header in "${headers[@]}"; do
regex+=';s%#include "'"$header"'"%#include "'"$header"'"%gI'
done
regex="${regex:1}"
find . -type f -iname '*.cpp' -print0 | \
xargs -0 sed -i "$regex"
It's much faster to make sed run just once per file (with many ;-separated regexes).

Automatically fix filename cases in C++ codebase?

I am porting a C++ codebase which was developed on a Windows platform to Linux/GCC. It seems that the author didn't care for the case of filenames, so he used
#include "somefile.h"
instead of
#include "SomeFile.h"
to include the file which is actually called "SomeFile.h". I was wondering if there is any tool out there to automatically fix these includes? The files are all in one directory, so it would be easy for the tool to find the correct names.
EDIT: Before doing anything, note that I'm assuming you either have copies of the files off to the side or, preferably, a baseline version in source control should you need to roll back for any reason.
You should be able to do this with sed: Something like sed -i 's/somefile\.h/SomeFile.H/I' *.[Ch]
This means take a case-insensitive somefile (trailing /I) and do an in-place (same file) replacement (-i) with the other text, SomeFile.H.
You can even do it in a loop (totally untested):
for file in *.[Ch]
do
sed -i "s/$file/$file/I" *.[Ch]
done
I should note that although I don't believe this applies to you, Solaris sed doesn't support -i and you'd have to install GNU sed or redirect to a file and rename.
Forgive me, I'm away from my Linux environment right now so I can't test this myself, but I can tell you which utilities you would need to do it.
Open a terminal and use cd to navigate to the correct directory.
cd ~/project
Get a list of all of the .h files you need. You should be able to accomplish this with the shell's wildcard expansion without any effort.
ls include/*.h libs/include/*.h
Get a list of all of the files in the entire project (.c, .cpp, .h, .whatever), anything that can #include "header.h". Again, wildcard expansion.
ls include/*.h libs/include/*.h *.cpp libs/*.cpp
Iterate over each file in the project with a for loop
for f in ... # wildcard file list
do
echo "Looking in $f"
done
Iterate over each header file with a for loop
for h in ... # wildcard header list
do
echo "Looking for $h"
done
For each header in each project file, use sed to search for #include "headerfilename.h", and replace with #include "HeaderFileName.h" or whatever the correct case is.
Warning: Untested and probably dangerous: This stuff is a place to start and should be thoroughly tested before use.
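h_escaped=$(printf '%s\n' "$h" | sed -e 's|[][/.*]|\\&|g') # escape characters in the file name that are special to sed
argument="^\([[:space:]]*#include[[:space:]]*\"\)$h_escaped\(\"\)" # group 1: up to the opening quote; group 2: the closing quote
sed -i.bak -e "s/$argument/\1$h_escaped\2/I" "$f" # I = case-insensitive match (GNU sed); keeps a .bak copy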
Yes, I know it looks awful.
Things to consider:
Rather than going straight to running this on your production codebase, test it thoroughly first.
sed can eat files like a VCR can eat tapes.
Make a backup.
Make another backup.
This is an O(N^2) operation involving hard disk access, and if your project is large it will run slowly. If your project is not gigantic, don't bother, but if it is, consider doing something to pipe sed's output to other seds.
Your search should be case insensitive: it should match #include, #INCLUDE, #iNcLuDe, and any combination of case present in the existing header filename, as well as any amount of whitespace between the include and the header. Bonus points if you preserve whitespace.
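The last point can be sketched with a single substitution that captures everything up to the opening quote and puts it back unchanged. The header name SomeFile.h comes from the question; the sample input lines are made up, and the I flag assumes GNU sed.

```shell
# Sample lines with odd case and spacing (hypothetical input).
input=$(printf '  #INCLUDE   "somefile.h"\n#include "SOMEFILE.H" // comment\n#include "other.h"\n')

# Group 1 holds the leading whitespace, the directive and the opening quote,
# so whatever spacing and case the line used is written back as-is.
output=$(printf '%s\n' "$input" |
    sed 's/^\([[:space:]]*#[[:space:]]*include[[:space:]]*"\)somefile\.h"/\1SomeFile.h"/I')
printf '%s\n' "$output"
```

Only the file name between the quotes is rewritten; the directive's own case and spacing, and any trailing comment, are preserved.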
Use Notepad++ to do a 'Find in Files' and replace.
From toolbar:
Search - Find in Files.
Then complete the 'Find what' and 'Replace with'.