Recursively rename directories and files based on a regular expression - regex

I am trying to strip every "?" from the file names in a given directory, which has subdirectories that contain further subdirectories of their own. I've tried a simple Perl regex script with system calls, but it fails to recurse into each subdirectory, and doing it by hand would waste too much time. How can I solve this?

You can use the find command to search for file names containing "?" and then use its -exec argument to run a script that removes the "?" characters from each name. Consider this script, which you could save as /usr/local/bin/rename.sh, for example (remember to make it executable with chmod +x):
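#!/bin/sh
# Strip "?" from the last path component only, not from the parent directories
mv -- "$1" "$(dirname -- "$1")/$(basename -- "$1" | tr -d '?')"
Then this will do the job (-depth makes find rename a directory's contents before the directory itself, so the paths it reports stay valid):
find . -depth -name '*\?*' -exec rename.sh {} \;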
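If you would rather not install a separate script, the same logic can be inlined with sh -c. The following is a minimal sketch that runs inside a throw-away directory created by mktemp, so nothing real is touched; the directory and file names are made up for the demonstration. It batches the paths with {} + and, because of -depth, renames the innermost entries first.

```shell
#!/bin/sh
# Hypothetical sandbox tree with "?" in directory and file names
tmp=$(mktemp -d)
mkdir -p "$tmp/a?b/c?d"
touch "$tmp/a?b/c?d/x?y.txt" "$tmp/plain.txt"

# -depth: contents are reported before their directory, so every path
# is still valid when mv runs. Only the last path component loses its "?".
find "$tmp" -depth -name '*\?*' -exec sh -c '
  for p do
    mv -- "$p" "$(dirname -- "$p")/$(basename -- "$p" | tr -d "?")"
  done
' sh {} +

# Show the result (a?b/c?d/x?y.txt should now be ab/cd/xy.txt)
find "$tmp" | sort
```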

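Try this (the lookahead limits the substitution to the last path component, and -depth renames contents before their directories):
find . -depth -name '*\?*' -exec prename 's/\?(?=[^\/]*$)//g' {} +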
See https://metacpan.org/module/RMBARKER/File-Rename-0.06/rename.PL (this is the default rename command on Ubuntu distros)

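Find all the file names containing '?' and delete every '?' from them. The -exec option could probably be used as well, but it would require an additional script. Note that the '?' must be escaped both in the find pattern and in the bash substitution, because an unescaped ? is a wildcard that matches any single character; reading NUL-separated names instead of looping over $(find ...) also keeps names with spaces intact:
find "$dir" -type f -name '*\?*' -print0 |
while IFS= read -r -d '' f; do
  base=${f##*/}
  mv -- "$f" "${f%/*}/${base//\?/}"
done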

Related

Can git rm take a regex or can I pipe the contents of a file to git rm?

I'm trying to remove all of the folder meta files from a Unity project in the git repo my team is using. Other members don't delete the meta file associated with a folder they deleted or emptied, and it's propagating to everyone else. It's a minor annoyance that nobody should need to see, so I've added this to the .gitignore:
*.meta
!*.*.meta
and now need to remove only the folder metas. I'd rather remove the metas now than wait for them to appear and have git remove them later. I'm using git bash on Windows and have tried the following commands to find just the folder metas:
find . -name '*.meta' > test.txt #returns folders and files
find . -regex '.*\.meta' > test.txt #again folders and files
find . -regex '\.[^\.]{0,}\.meta' > test.txt #nothing
find . -regex '\.[^.]{0,}\.meta' > test.txt #nothing
find . -regex '\.{2}' > test.txt #nothing
find . -regex '(\..*){2}' > test.txt #nothing
I know regex is interpreted differently per program/language but the following will produce the results I want in Notepad++ and I'm not sure how to translate it for git or git bash:
^.*/[^.]{0,}\.meta$
by capturing the lines (file paths from the root of the repo) that end with /<foldername>.meta, since I realized that some folders contain a '.' in their name.
Once this is figured out I need to go line by line and git rm the files.
NOTE
I can also run:
^.*/.*?\..*?\.meta$\n
and replace with nothing to delete all of the file metas from the folders and files result, and use that result to get all of the folder metas, but I'd also like to know how to avoid needing Notepad++ as an extra step.
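To confine the results to indexed files only, use git ls-files, the Swiss Army knife of index-aware file listing. git update-index is the core command for manipulating the index. Recent Git versions require -c (--cached) alongside -i, and -z on both sides keeps unusual path names intact:
git ls-files -z -c -i -x '*.meta' -x '!*.*.meta' | git update-index -z --force-remove --stdin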
which will remove the files from your index but leave them in the work tree.
It's easier to express with two conditions just like in .gitignore. Match *.meta but exclude *.*.meta:
find . -name '*.meta' ! -name '*.*.meta'
Use -exec to run the command of your choice on the matched files. {} is a placeholder for the file names and ';' signifies the end of the -exec command (weird syntax but it's useful if you append other things after the -exec ... ';').
find . -name '*.meta' ! -name '*.*.meta' -exec git rm {} ';'
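Before running git rm, it can help to dry-run the filter. Here is a small sketch in a mktemp sandbox with made-up Unity-style names: one folder meta and one file meta. Like the .gitignore rule, it treats any name with two dots as a file meta, so folders whose names contain a '.' are not selected.

```shell
#!/bin/sh
# Hypothetical sandbox: a folder meta (Scripts.meta) and a file meta (Player.cs.meta)
tmp=$(mktemp -d)
mkdir -p "$tmp/Assets/Scripts"
touch "$tmp/Assets/Scripts.meta" "$tmp/Assets/Scripts/Player.cs.meta"

cd "$tmp" || exit 1
# Dry run: print what would be handed to git rm
out=$(find . -name '*.meta' ! -name '*.*.meta' -print)
printf '%s\n' "$out"
```

Once the list looks right, swapping -print for -exec git rm -- {} + batches the removals into fewer git invocations.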

Moving file to another folder after performing search and replace operation on it

Please help me out here:
I'm using the below command to search and replace strings in files in a directory (including sub-directories):
find . -type f -exec perl -api -e 's/\b(?!00)[A-Z0-9]{6,}/dummy/g' {} \;
What I want is this: after it performs the above operation on a file, move that file to another folder, and then work on the next file.
Any help is appreciated.
Thanks
You could try this:
find . -type f -exec perl -api -e 's/\b(?!00)[A-Z0-9]{6,}/dummy/g' {} \; -exec mv {} /to/this/directory \;
After the first -exec predicate completes successfully, find will run the next -exec. This answer to a related question will give you a bit more information.
You can do:
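while IFS= read -rd '' file; do
  mkdir -p "/dest/$(dirname "$file")"  # recreate the sub-directory layout under /dest
  perl -p -e 's/\b(?!00)[A-Z0-9]{6,}/dummy/g' "$file" > "/dest/$file"
done < <(find . -type f -print0)
This writes the edited copy to /dest and leaves the original in place (delete the original afterwards if you really want a move). It also handles file names containing whitespace and other special characters.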

select folders (not files) with leading "_" (underscore) symbols in terminal

I'm trying to run through folders and subfolders (only, no files can be altered) in a given directory which have leading underscores and remove those leading underscores. I'm planning on accomplishing this with a simple shell script:
for folder in ./_* do
mv "$folder" "${folder:1}"
done
The script above doesn't meet the specification yet, for two reasons, which I'm trying to correct here:
- one, "./_*" does not work as it should: it either throws an error (./_*: No such file or directory) or also selects folders that do not have leading underscores.
- two, it does not restrict itself to folders. Is there an option for the mv command that can do that?
Thanks
To find all folders starting with underscore use this find:
find . -type d -name '_*'
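And to remove the leading _ (with -depth, so that a folder's contents are renamed before the folder itself, and with the _ stripped from the last path component only):
find . -depth -type d -name '_*' -exec bash -c 'b=${1##*/}; mv -- "$1" "${1%/*}/${b#_}"' _ {} \;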
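Using bash 4's globstar to recurse:
shopt -s globstar nullglob
dirs=(**/_*/)
# Glob results are sorted, so walking the list backwards renames nested folders before their parents
for (( i=${#dirs[@]}-1; i>=0; i-- )); do
  d=${dirs[i]%/}
  b=${d##*/}
  mv -- "$d" "${d%"$b"}${b#_}"
done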
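To check the find-based approach on nested folders without touching real data, here is a sketch in a mktemp sandbox (the folder and file names are made up). It uses sh -c with {} + to batch the paths. Because of -depth, _top/_nested is renamed before _top, and the file _file.txt is left alone because of -type d.

```shell
#!/bin/sh
# Hypothetical sandbox: nested "_" folders plus a "_" file that must not change
tmp=$(mktemp -d)
mkdir -p "$tmp/_top/_nested" "$tmp/keep"
touch "$tmp/_file.txt"

cd "$tmp" || exit 1
find . -depth -type d -name '_*' -exec sh -c '
  for d do
    base=${d##*/}                      # last path component, e.g. _nested
    mv -- "$d" "${d%/*}/${base#_}"     # drop one leading underscore
  done
' sh {} +

find . | sort
```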

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command, like this: find . -name "*.gz*", to locate these files, and I'd like to either use -exec or pipe to xargs and have some magic command clean up this mess, so that I end up with filename.gz.
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
One way, with find and awk (matching *.gz.gz skips files that already have a single .gz):
find "$(pwd)" -name '*.gz.gz' | awk '{n=$0; sub(/(\.gz)+$/, ".gz", n); print "mv", $0, n}' | sh
Notes:
- I assume there are no special characters (such as spaces) in your file names. If there are, you need to quote the file names in the generated mv command.
- I added $(pwd) to get the absolute path of each found name.
- You can remove the trailing |sh to check whether the generated mv commands are correct. If everything looks good, add the |sh back to execute them.
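To compute the cleaned-up name (this only prints it; it does not rename anything), you may use
ls a.gz.gz.gz | sed -r 's/(\.gz)+$/.gz/'
or, without the -r (extended regex) flag, in GNU sed (\+ is a GNU extension)
ls a.gz.gz.gz | sed 's/\(\.gz\)\+$/.gz/'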
ls *.gz.gz | perl -ne '/^((.*?\.gz)(?:\.gz)+)$/ and print "mv $1 $2\n"'
It prints shell commands to rename your files; it won't execute them, so it is safe to run (the generated commands assume names without spaces). To execute them, you can save the output to a file and run that, or simply pipe it to the shell:
ls *.gz.gz | ... | sh
sed is great for replacing text inside files, but for renaming you can use bash string substitution:
for file in *.gz.gz; do
  # ${file%%.*} cuts at the first dot, so this assumes the base name contains no other dots
  mv "${file}" "${file%%.*}.gz"
done
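This might work for you (GNU sed; printf puts one name per line, whereas echo would put all names on a single line and the anchored regex would not match):
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'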
find . -name "*.gz.gz" |
while IFS= read -r f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) commands; remove the echo to perform the actual renaming. This solution:
- processes matching files in the whole current directory tree, as in the OP (not just files located directly in the current directory);
- limits matching to files that end in at least two .gz extensions (so as not to needlessly process files that end in just one);
- when determining the new name with sed, makes sure that the substring .gz doesn't match just anywhere in the file name, but only as part of a contiguous sequence of .gz extensions at the end of the name;
- handles file names with special characters, such as embedded spaces, correctly (except for names with embedded newlines).
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer, which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz. (Mine is not perfect, either.) Note the use of double quotes, which protects against file names with "bad" characters, such as spaces or stars.
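The difference between the two parameter expansions can be seen on a single made-up name, with no files involved; this is only a sketch of the POSIX %% operator, not either answer's full loop:

```shell
#!/bin/sh
# Hypothetical file name with an extra dot in its base name
f='foo.c.gz.gz'

# %%.*  removes the longest suffix starting at the FIRST dot, so ".c" is lost
short="${f%%.*}.gz"
# %%.gz.gz*  removes the longest suffix starting at the first ".gz.gz"
long="${f%%.gz.gz*}.gz"

echo "$short"   # foo.gz
echo "$long"    # foo.c.gz
```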
If you wish to use find to process an entire directory tree, the variant is:
find . -name '*.gz.gz' |
while IFS= read -r f; do
  mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
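The NUL-safe variant described above looks like this as a runnable bash sketch. It runs in a mktemp sandbox, and the file names (one containing a space) are made up. Since bash strings cannot hold a NUL byte, $'\0' is effectively the empty string, which read -d treats as "split on NUL".

```shell
#!/usr/bin/env bash
# Hypothetical sandbox with an awkward name and an already-clean one
tmp=$(mktemp -d)
touch "$tmp/my data.gz.gz.gz" "$tmp/clean.gz"

# -print0 and read -d $'\0': names are NUL-separated, so spaces (and newlines) survive
find "$tmp" -name '*.gz.gz' -print0 |
while IFS= read -r -d $'\0' f; do
  mv -- "$f" "${f%%.gz.gz*}.gz"
done

ls "$tmp"
```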
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against doing this accidentally. If you circumvent them via something like gzip -c $1 > $1.gz, buried in some script, then each extra .gz may be a real extra layer of compression, and renaming these files will give you grief.
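Another way, with the Perl rename (the util-linux rename found on some distributions takes different arguments):
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When you are happy with the results, remove the -n (dry-run) option.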
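If neither Perl rename is available, bash's extglob can collapse the run of extensions too. This swaps in a different technique from the answers above, shown as a sketch in a mktemp sandbox with made-up names. Unlike ${f%%.*}, it keeps dots inside the base name, so foo.c.gz.gz.gz becomes foo.c.gz.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enables the +(pattern) operator used below

# Hypothetical sandbox files
tmp=$(mktemp -d)
touch "$tmp/foo.c.gz.gz.gz" "$tmp/bar.gz.gz"

for f in "$tmp"/*.gz.gz; do
  # ${f%%+(.gz)} strips the whole run of trailing ".gz" extensions,
  # then exactly one ".gz" is appended again
  mv -- "$f" "${f%%+(.gz)}.gz"
done

ls "$tmp"
```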

Problem using "find" in shell scripting

A while back I wrote a shell script that automatically runs a Python script over any C++ files it can find in a specified directory. I tested it, it worked fine, and I saved it and forgot about it; the trouble is that I've come back to use it and run into a problem (turns out I didn't test it enough, eh?).
Anyway, the source directory paths I was testing before had no spaces in their names, e.g.
/somedirectory/subfolder/src/
But when I try and run the script using a path with spaces in it, e.g.
/Documents\ and\ Settings/subfolder/src/
It doesn't work.
I've located where the problem is, but I'm not sure how to fix it. Here's the code causing the problem:
names=( $(find "${SOURCE_ROOT_DIRECTORY}" -regex "[A-Za-z0-9]*.*\(cpp\|h\|cc\)$"))
The regular expression works with paths with no spaces, so I'm not sure if there's a problem with the regular expression, or if the "find" command stops when it encounters a space.
Can anyone help?
find doesn't "stop" when it hits files with spaces in their names. The problem occurs when you try to store them as elements in an array: the unquoted $(...) is split on every character in IFS.
Change IFS to just the newline character (by default it contains space, tab and newline):
#change IFS
OLDIFS=$IFS
IFS=$'\n'
#run find
names=($(find . -regex "[A-Za-z0-9]*.*\(cpp\|h\|cc\)$"))
#restore IFS
IFS=$OLDIFS
#test out the array
echo "size: ${#names[@]}"
for i in "${names[@]}"
do
echo "$i"
done
The canonical usage pattern is:
find subfolder/ -type f -name '*.cpp' -print0 |
xargs -0rn1 myscript.py
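This has all the bells and whistles; you can probably do without -type f and perhaps some of the xargs flags (-0 pairs with -print0, -r skips running the command when there is no input, and -n1 passes one file per invocation).
You can use read:
while IFS= read -r file; do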
names+=("$file")
done < <(find "${SOURCE_ROOT_DIRECTORY}" -regex "[A-Za-z0-9]*.*\(cpp\|h\|cc\)$")
A small test:
$ mkdir -p /tmp/test && cd $_
$ touch foo bar "ab cd"
$ ls
ab cd bar foo
$ while read -r file; do names+=("$file"); done < <(find /tmp/test -type f);
$ echo ${#names[@]}
3
$ for file in "${names[@]}"; do echo "$file"; done;
/tmp/test/ab cd
/tmp/test/bar
/tmp/test/foo
$ unset names file
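On bash 4.4 or newer, mapfile can read NUL-separated find output straight into an array, with no IFS juggling and no risk of globbing. This is a sketch under that version assumption, run in a mktemp sandbox whose directory names are made up to include spaces. It also swaps the -regex test for portable -name tests, since the \| alternation in the question's regex is a GNU extension.

```shell
#!/usr/bin/env bash
# Requires bash 4.4+ for mapfile -d ''
tmp=$(mktemp -d)
mkdir -p "$tmp/Documents and Settings/src"
touch "$tmp/Documents and Settings/src/main.cpp" "$tmp/Documents and Settings/src/util.h"

# -d '' splits on NUL; -t drops the delimiter from each element
mapfile -t -d '' names < <(find "$tmp" -type f \( -name '*.cpp' -o -name '*.h' -o -name '*.cc' \) -print0)

echo "size: ${#names[@]}"
printf '%s\n' "${names[@]}"
```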