Find folders that contain multiple matches to a regex/grep

I have a folder structure encompassing many thousands of folders. I would like to be able to find all the folders that, for example, contain multiple .txt files, or multiple .jpeg, or whatever without seeing any folders that contain only a single file of that kind.
Each folder should contain only one file of a given type, but this is not always the case, and the offenders are tedious to find by hand.
Note that the folders may contain many other files.
If possible, I'd like to match "FILE.JPG" and "file.jpg" as both matching a query on "file" or "jpg".
What I have been doing is simply running find . -iname "*file*" and going through the output manually.
Folders contain folders, sometimes 3 or 4 levels deep:
first/
    second/
        README.txt
        readme.TXT
        readme.txt
        foo.txt
    third/
        info.txt
        fourth/
            raksljdfa.txt
Should return
first/second/README.txt
first/second/readme.TXT
first/second/readme.txt
first/second/foo.txt
when searching for "txt"
and
first/second/README.txt
first/second/readme.TXT
first/second/readme.txt
when searching for "readme"

This pure Bash code should do it (with caveats, see below):
#! /bin/bash
fileglob=$1          # E.g. '*.txt' or '*readme*'
shopt -s nullglob    # Expand to nothing if nothing matches
shopt -s dotglob     # Match files whose names start with '.'
shopt -s globstar    # '**' matches multiple directory levels
shopt -s nocaseglob  # Ignore case when matching
IFS=                 # Disable word splitting
for dir in **/ ; do
    matching_files=( "$dir"$fileglob )
    (( ${#matching_files[*]} > 1 )) && printf '%s\n' "${matching_files[@]}"
done
Supply the pattern to be matched as an argument to the program when you run it. E.g.
myprog '*.txt'
myprog '*readme*'
(The quotes on the patterns are necessary to stop them matching files in the current directory.)
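For example, assuming the script is saved as myprog and run from the top of the question's sample tree, a session should look something like this (glob order may vary with locale):
$ ./myprog '*.txt'
first/second/README.txt
first/second/foo.txt
first/second/readme.TXT
first/second/readme.txt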
The caveats regarding the code are:
globstar was introduced with Bash 4.0. The code won't work with older Bash.
Prior to Bash 4.3, globstar matches followed symlinks. This could lead to duplicate outputs, or even failures due to circular links.
The **/ pattern expands to a list of all the directories in the hierarchy. This could take an excessively long time or use an excessive amount of memory if the number of directories is large (say, greater than ten thousand).
If your Bash is older than 4.3, or you have large numbers of directories, this code is a better option:
#! /bin/bash
fileglob=$1          # E.g. '*.txt' or '*readme*'
shopt -s nullglob    # Expand to nothing if nothing matches
shopt -s dotglob     # Match files whose names start with '.'
shopt -s nocaseglob  # Ignore case when matching
IFS=                 # Disable word splitting
find . -type d -print0 \
    | while read -r -d '' dir ; do
        matching_files=( "$dir"/$fileglob )
        (( ${#matching_files[*]} > 1 )) \
            && printf '%s\n' "${matching_files[@]}"
    done
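Usage is the same as for the globstar version; the only visible difference is that find prefixes each directory with ./, so a hypothetical run against the sample tree looks like:
$ ./myprog '*readme*'
./first/second/README.txt
./first/second/readme.TXT
./first/second/readme.txt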

Something like this sounds like what you want:
find . -type f -print0 |
awk -v re='[.]txt$' '
    BEGIN {
        RS = "\0"
        IGNORECASE = 1
    }
    {
        dir  = gensub("/[^/]+$","",1,$0)
        file = gensub("^.*/","",1,$0)
    }
    file ~ re {
        dir2files[dir][file]
    }
    END {
        for (dir in dir2files) {
            if ( length(dir2files[dir]) > 1 ) {
                for (file in dir2files[dir]) {
                    print dir "/" file
                }
            }
        }
    }'
It's untested but should be close. It uses GNU awk for gensub(), IGNORECASE, true multi-dimensional arrays and length(array).
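Since the regex comes in via -v re=..., the same program covers the question's "readme" case without edits to the script body; e.g., if you save the awk program above in a file (hypothetically dirmatch.awk), a run could look like:
find . -type f -print0 |
awk -v re='readme' -f dirmatch.awk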


How can I use perl to delete files matching a regex

Due to a Makefile mistake, I have some fake files in my git repo...
$ ls
=0.1.1 =4.8.0 LICENSE
=0.5.3 =5.2.0 Makefile
=0.6.1 =7.1.0 pyproject.toml
=0.6.1, all_commands.txt README_git_workflow.md
=0.8.1 CHANGES.md README.md
=1.2.0 ciscoconfparse/ requirements.txt
=1.7.0 configs/ sphinx-doc/
=2.0 CONTRIBUTING.md tests/
=2.2.0 deploy_docs.py tutorial/
=22.2.0 dev_tools/ utils/
=22.8.0 do.py
=2.7.0 examples/
$
I tried this, but it seems that there may be some more efficient means to accomplish this task...
# glob "*" will list all files globbed against "*"
foreach my $filename (grep { /\W\d+\.\d+/ } glob "*") {
    my $cmd1 = "rm $filename";
    `$cmd1`;
}
Question:
I want a remove command that matches against a pcre.
What is a more efficient perl solution to delete the files matching this perl regex: /\W\d+\.\d+/ (example filename: '=0.1.1')?
Fetch a wider set of files and then filter through whatever you want
my @files_to_del = grep { (split m{/})[-1] =~ /^\W[0-9]+\.[0-9]+/ and not -d } glob "$dir/*";  # test the basename, since glob returns dir-qualified names
I added an anchor (^) so that the regex can only match a string that begins with that pattern, otherwise this can blow away files other than intended. Reconsider what exactly you need.
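To see why the anchor matters, here is a small hypothetical session; without ^, an ordinary file whose name merely contains a non-word character followed by digits, a dot, and digits is caught too:
$ touch 'report-1.2.txt' '=0.1.1'
$ perl -wE'say for grep { /\W\d+\.\d+/ } glob "*"'
=0.1.1
report-1.2.txt
$ perl -wE'say for grep { /^\W\d+\.\d+/ } glob "*"'
=0.1.1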
Altogether perhaps (or see a one-liner below †)
use warnings;
use strict;
use feature 'say';
use File::Glob ':bsd_glob'; # for better glob()
use Cwd qw(cwd); # current-working-directory
my $dir = shift // cwd; # cwd by default, or from input
my $re = qr/^\W[0-9]+\.[0-9]+/;
my @files_to_del = grep { (split m{/})[-1] =~ $re and not -d } glob "$dir/*";  # test the basename
say for @files_to_del; # please inspect first
#unlink or warn "Can't unlink $_: $!" for @files_to_del;
where that * in glob might as well have some pre-selection, if suitable. In particular, if the = is a literal character (and not an indicator printed by the shell, see footnote)‡ then glob "=*" will fetch files starting with it, and then you can pass those through a grep filter.
I exclude directories, identified by the -d filetest, since we are looking for files (and to steer clear of unlink's scary warnings about directories; thanks to brian d foy's comment).
If you needed to scan subdirectories and do the same with them, perhaps recursively -- which doesn't seem to be the case here -- then we could employ this logic in File::Find::find (or File::Find::Rule, or yet others).
Or read the directory any other way (opendir+readdir, libraries like Path::Tiny), and filter.
† Or, a quick one-liner ... print (to inspect) what's about to get blown away
perl -wE'say for grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "*"'
and then delete 'em
perl -wE'unlink or warn "$_: $!" for grep /^\W[0-9]+\.[0-9]+/ && !-d, glob "*"'
(I switched to a more compact syntax just because; it's not necessary.)
If you'd like to be able to pass a directory to it (optionally, or work in the current one) then do
perl -wE'$d = shift//q(.); ...' dirpath (a relative path is fine; the argument is optional)
and then use glob "$d/*" in the code. This works the same way as in the script above: shift pulls the first element from @ARGV if anything was passed to the script on the command line; if @ARGV is empty it returns undef, and the // (defined-or) operator then picks up the string q(.).
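A quick way to see that defaulting behaviour in isolation (hypothetical session):
$ perl -wE'say shift // q(.)' /some/dir
/some/dir
$ perl -wE'say shift // q(.)'
.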
‡ That leading = may be an "indicator" of the file type if ls has been aliased to ls -F, which can be checked by running ls with aliases suppressed, one way being \ls (or check alias ls).
If that is so, the = stands for the file being a socket, which in Perl can be tested for with the -S filetest.
Then that \W in the proposed regex may need to be changed to \W? to allow for no non-word characters preceding a digit, along with a test for a socket. Like
my $re = qr/^\W? [0-9]+ \. [0-9]+/x;
my @files_to_del = grep { (split m{/})[-1] =~ $re and -S } glob "$dir/*";
Why not just:
$ rm =*
Sometimes, shell commands are the best option.
In these cases, I use perl to merely filter the list of files:
ls | perl -ne 'print if /\A\W\d+\.\d+/a' | xargs rm
And, when I do that, I feel guilty for not doing something simpler with an extended pattern in grep:
ls | grep -E '^\W\d+\.\d+' | xargs rm
Eventually I'll run into a problem where there's a directory so I need to be more careful about the file list:
find . -maxdepth 1 -type f | grep -E '^\./\W\d+\.\d+' | xargs rm
Or I need to allow rm to remove directories too should I want that:
ls | grep -E '^\W\d+\.\d+' | xargs rm -r
Here you go.
unlink( grep { /\W\d+\.\d+/ && !-d } glob( "*" ) );
This matches the filename, and excludes directories.
To delete filenames matching the /\W\d+\.\d+/ pcre, use the following one-liners...
1> $fn is a filename... I'm also removing the my keywords since the one-liner doesn't have to worry about perl lexical scopes:
perl -e 'foreach $fn (grep { /\W\d+\.\d+/ } glob "*") {$cmd1="rm $fn";`$cmd1`;}'
2> Or as Andy Lester responded, perhaps his answer is as efficient as we can make it...
perl -e 'unlink(grep { /\W\d+\.\d+/ } glob "*");'

Use find to identify filename same as the parent directory name

I would like to use find in order to search for files in different subdirectories that have to match the same pattern as their parent directory.
example:
ls
Random1_fa Random2_fa Random3_fa
inside these dirs there are different files that I want to search for only one of each:
cd Random1_fa
Random1.fa
Random1.fastq
Random1_match_genome.fa
Random1_unmatch_genome.fa
...
I want to "find" only the files with "filename".fa e.g:
/foo/bar/1_Random1/Random1_fa/Random1.fa
/foo/bar/2_Random2/Random2_fa/Random2.fa
/foo/bar/3_Random5/Random5_fa/Random5.fa
/foo/bar/10_Random99/Random99_fa/Random99.fa
I did:
ls | sed 's/_fa//' |find -name "*.fa"
but that's not what I was looking for.
I want to redirect the result of sed as a regex pattern into find.
Something "awk-like", like this:
ls| sed 's/_fa//' |find -name "$1.fa"
or
ls| sed 's/_fa/.fa/' |find -name "$1"
Why read from standard input and filter with sed when you can express the condition directly with find? First run a shell glob expansion for all directories ending with _fa, derive from each the name string to look for, and use that in the find expression. All you need to do is:
for dir in ./*_fa; do
    # Ignore un-expanded globs from the for-loop. An un-expanded pattern would
    # fail the directory test (-d), so we skip the iteration when nothing matched
    [ -d "$dir" ] || continue
    # The glob expansion returns each name as './name_fa'. Using the built-in
    # parameter expansion we remove the './' prefix and the '_fa' suffix
    str="${dir##./}"
    regex="${str%%_fa}"
    # We then use 'find' to identify the file 'name.fa' in that directory
    find "$dir" -type f -name "${regex}.fa"
done
Run the loop at the top level containing your _fa directories so the glob picks up all of them and each name.fa file is matched.
To copy the matched files elsewhere, extend the find call:
find "$dir" -type f -name "${regex}.fa" -exec cp -t /home/destinationPath {} +

sed / awk - remove space in file name

I'm trying to replace the whitespace inside file names with underscores, without touching the separating spaces between the names.
Input:
echo "File Name1.xml File Name3 report.xml" | sed 's/[[:space:]]/__/g'
However, the output is:
File__Name1.xml__File__Name3__report.xml
Desired output
File__Name1.xml File__Name3__report.xml
You named awk in the title of the question, didn't you?
$ echo "File Name1.xml File Name3 report.xml" | \
> awk -F'.xml *' '{for(i=1;i<=NF;i++){gsub(" ","_",$i); printf i<NF?$i ".xml ":"\n" }}'
File_Name1.xml File_Name3_report.xml
$
-F'.xml *' instructs awk to split on a regex, the requested extension plus 0 or more spaces
the loop for(i=1;i<=NF;i++) is executed for all the fields the input line(s) are split into — note that the last field is empty (it is whatever follows the last extension), but we are going to take that into account...
the body of the loop
gsub(" ","_", $i) substitutes all the occurrences of space to underscores in the current field, as indexed by the loop variable i
printf i<NF?$i ".xml ":"\n" outputs one of two things: if i<NF it's a regular field, so we append the extension and a space; otherwise i equals NF and we just terminate the output line with a newline.
It's not perfect: it appends a space after the last filename. I hope that's good enough...
Addendum
I'd like to address:
the little buglet of the last space...
some of the issues reported by Ed Morton
generalize the extension provided to awk
To reach these goals, I've wrapped the scriptlet in a shell function which, since it changes spaces to underscores, is named s2u:
$ s2u () { awk -F'\.'$1' *' -v ext=".$1" '{
> NF--;for(i=1;i<=NF;i++){gsub(" ","_",$i);printf "%s",$i ext (i<NF?" ":"\n")}}'
> }
$ echo "File Name1.xml File Name3 report.xml" | s2u xml
File_Name1.xml File_Name3_report.xml
$
It's a bit different (better?) because it does not special-case printing the last field; instead it special-cases the delimiter appended after each field. The idea of splitting on the extension remains.
This seems a good start if the filenames aren't delineated:
((?:\S.*?)?\.\w{1,})\b
( // start of captured group
(?: // non-captured group
\S.*? // a non-white-space character, then 0 or more any character
)? // 0 or 1 times
\. // a dot
\w{1,} // 1 or more word characters
) // end of captured group
\b // a word boundary
You'll have to look up how a PCRE pattern converts to a shell pattern. Alternatively it can be run from a Python/Perl/PHP script.
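For instance, where GNU grep is available with PCRE support (-P), a possible pipeline, untested beyond the sample input, first isolates each name and then fixes its internal spaces:
echo "File Name1.xml File Name3 report.xml" |
grep -oP '((?:\S.*?)?\.\w{1,})\b' |
sed 's/[[:space:]]/__/g'
This prints each cleaned name on its own line.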
Assuming you are asking how to rename files, and not how to remove spaces from a list of file names being used for some other purpose, here are the long way and the short way. The long way uses sed; the short way uses rename. If you are not trying to rename files, your question is quite unclear and should be revised.
If the goal is to simply get a list of xml file names and change them with sed, the bottom example is how to do that.
directory contents:
ls -w 2
bob is over there.xml
fred is here.xml
greg is there.xml
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml)
for ((i=0; i<${#a_glob[@]}; i++)); do
    echo "${a_glob[i]}"
done
shopt -u nullglob
# output
bob is over there.xml
fred is here.xml
greg is there.xml
# then rename them
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml)
for ((i=0; i<${#a_glob[@]}; i++)); do
    # I prefer 'rename' for such things:
    # rename 's/[[:space:]]/_/g' "${a_glob[i]}"
    # but sed works too, though there's no real reason to use it here
    mv "${a_glob[i]}" "$(sed 's/[[:space:]]/_/g' <<< "${a_glob[i]}")"
done
shopt -u nullglob
result:
ls -w 2
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml
Globbing is what you want here because of the spaces in the names.
However, this is really a complicated solution when actually all you need to do is:
cd [your space containing directory]
rename 's/[[:space:]]/_/g' *.xml
and that's it, you're done.
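Note that the s/// argument syntax belongs to the Perl-based rename (often packaged as prename or file-rename); the util-linux rename takes plain from/to strings instead. With the Perl variant, -n shows what would be renamed without touching anything:
rename -n 's/[[:space:]]/_/g' *.xml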
If, on the other hand, you are trying to create a list of file names, you'd certainly want the globbing method; just modify the statement to pass the output through sed, and it will do what you want there too.
If your goal is to change the filenames for output purposes, and not rename the actual files:
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml)
for ((i=0; i<${#a_glob[@]}; i++)); do
    echo "${a_glob[i]}" | sed 's/[[:space:]]/_/g'
done
shopt -u nullglob
# output:
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml
You could use rename:
rename --nows *.xml
This will replace all the spaces of the xml files in the current folder with _.
Sometimes it comes without the --nows option, so you can then use a search and replace:
rename 's/[[:space:]]/__/g' *.xml
You can also use --dry-run if you want to just print the resulting names without renaming anything.

Using a variable in sed search pattern when the value of the variable contains square brackets

What I'm trying to do is check that a file has been created. The best way I can think to do this is by listing the files before hand, listing them afterwards, deleting the before list from the after list, then seeing if the after list is not zero. I ran into trouble deleting the before list from the after list. Filenames with square brackets were not being deleted from the list.
while read -r LINE
do
    sed -i -- "/$LINE/d" listfilesafter.swp # without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
    rm listfilesafter.swp--
done < listfilesbefore.swp
If I use single quotes the variable isn't expanded, and the -r option on read doesn't seem to make it work as I expected. If anyone has any suggestions on alternative ways of doing this, do contribute, but I would still like to know how to use a variable in the search pattern when the value of the variable contains metacharacters. If anyone can help remove the code smell of "rm listfilesafter.swp--" then that would also be appreciated. Full code below:
cd ~/Desktop
ls >listfilesbefore.swp
#echo "balh blah" >SomeNonZeroFile.txt #comment or uncomment to test the if then statement
ls >listfilesafter.swp
sed -i -- '/listfilesafter.swp/d' listfilesafter.swp #deletes listfilesafter.swp from the list of files create after the event on line 3
while read -r LINE
do
    sed -i -- "/$LINE/d" listfilesafter.swp # without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
    rm listfilesafter.swp--
done < listfilesbefore.swp
cat listfilesafter.swp
echo "check listfiles. Enter to continue."
read dummy_variable
if [ -s listfilesafter.swp ]
then
    rm listfilesbefore.swp
    rm listfilesafter.swp
    echo "success, the file was created"
else
    rm listfilesbefore.swp
    rm listfilesafter.swp
    echo "failure, the file was not created"
fi
Given that you have two lists of files in sorted order (since ls lists the files in sorted order), you should probably be using a command like diff or, in this case, comm to find the differences between the two lists of files.
If you want to know which file(s) were created, then that's the list of files (lines) in the second file that are not in the first. With no options, comm lists the lines it reads in 3 columns:
lines in the first file not in the second
lines in the second file not in the first
lines in both files
You only need the lines (file names) in the second column, and therefore you want to suppress the list of files in the first and third columns, so you'll use comm -13 to do that:
before=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
after=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
trap "rm -f $before $after; exit 1" 0 1 2 3 13 15
ls > $before
…execute command that creates file(s)…
ls > $after
comm -13 $before $after
rm -f $before $after
trap 0
Obviously, you could capture the list of files from comm in a variable for further analysis, etc.
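As a tiny illustration of the column suppression (file names made up for the demo):
$ printf '%s\n' a.txt b.txt > before
$ printf '%s\n' a.txt b.txt c.txt > after
$ comm -13 before after
c.txt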
Making sed work when the search strings contain metacharacters
I'm still confused about sed. How do I use a variable in the search pattern of sed if the value contains metacharacters? Or in this case would I be better off using something other than sed?
In the scenario you have, you're far better off not using sed, and in any case your technique is horrendously slow if there are hundreds or thousands of files in the directory (running sed once per file name is not going to be fast).
However, supposing that it was necessary to use sed and that you wanted to deal with metacharacters in the file names in the list, then you would have to escape the metacharacters (with a backslash in front). I'd probably do something like this:
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
sed -f script.sed listfilesafter.swp
The first script takes any metacharacters in the line (file name) and replaces it with backslash-metacharacter. In the first substitute, the [][\/*.] character class matches square brackets, two types of slashes, stars and dots. Depending on the predilections of the variant of sed you're using, you might need to protect (){} with backslashes too, but in POSIX standard sed, the {} gain metacharacter meaning when prefixed with a backslash, so they're not modified by default. The second substitute takes the possibly modified line and converts it into a 'match and delete' command. The output, therefore, is a sed script that will delete the file names found in listfilesbefore.swp. The second command applies that script to listfilesafter.swp, doing in one sed command what your outline code does with one run of sed per file name.
Using sed to generate a sed script is a powerful technique. It isn't always appropriate, but when it is, it is very useful.
Shell script demo.sh
echo "Pre-populate the directory with some random file names"
for file in $(random -n 20 -T '%W%V%C-%w%v%c%v%c-%04[0000:9999]d.txt')
do
    cp /dev/null $file
done
for template in '%w%v%w(%03[000:999]d)%w%v%w.txt' \
                '%w%v%w[123]%w%v%we.txt' \
                '%w%v%wfile*%03[0:999]d*.txt' \
                '%w%v%w%v%c\\\%d.txt' \
                '%w%v%w-{%04X}-{%04X}.txt'
do
    for file in $(random -n 2 -T "$template")
    do
        cp /dev/null "$file"
    done
done
ls > listfilesbefore.swp
ls
echo
echo "Create some new files with metacharacters in the names"
for file in 'new(123)file.txt' 'new[123]file.txt' 'newfile*321*.txt' \
            'newfile\\\.txt' 'newfile-{A39F}-{B77D}.txt'
do
    cp /dev/null "$file"
done
ls
ls > listfilesafter.swp
echo
echo "Create sed script"
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
echo
cat script.sed
echo
echo "Apply it"
sed -f script.sed listfilesafter.swp
The random command I'm using is of my own devising, but it is convenient for demonstrations such as this.
Example run
Pre-populate the directory with some random file names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create some new files with metacharacters in the names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create sed script
/^AIG-taral-3486\.txt$/d
/^COV-oipuc-9088\.txt$/d
/^CUG-vowan-5758\.txt$/d
/^FEH-ieqek-0603\.txt$/d
/^IUS-aaduw-7080\.txt$/d
/^KER-jazuc-4824\.txt$/d
/^MIZ-iezec-8255\.txt$/d
/^NIT-kupib-6873\.txt$/d
/^PUX-oocov-2216\.txt$/d
/^QAW-xonod-3937\.txt$/d
/^QES-wawok-4790\.txt$/d
/^RON-difag-1986\.txt$/d
/^SAD-gesug-5706\.txt$/d
/^SAJ-luqoj-4311\.txt$/d
/^TUZ-wapaw-8547\.txt$/d
/^VAL-zutap-8054\.txt$/d
/^YIP-xudeb-7397\.txt$/d
/^YUP-uudiv-8848\.txt$/d
/^ZIB-jurax-2903\.txt$/d
/^ZUR-xonik-8800\.txt$/d
/^aavfile\*147\*\.txt$/d
/^demo\.sh$/d
/^diman\\\\\\7115\.txt$/d
/^ganur\\\\\\8732\.txt$/d
/^gud-{7049}-{3103}\.txt$/d
/^listfilesbefore\.swp$/d
/^lur\[123\]maee\.txt$/d
/^rivfile\*065\*\.txt$/d
/^ueo(417)yea\.txt$/d
/^uoi(751)qio\.txt$/d
/^woi-{37E8}-{009C}\.txt$/d
/^xof\[123\]hoxe\.txt$/d
Apply it
listfilesafter.swp
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt

BASH: How to rename lots of files, inserting the folder name in the middle of the filename

(I'm in a Bash environment, Cygwin on a Windows machine, with awk, sed, grep, perl, etc...)
I want to add the last folder name to the filename, just before the last underscore (_) followed by numbers, or at the end if there are no numbers in the filename.
Here is an example of what I have (hundreds of files need to be reorganized):
./aaa/A/C_17x17.p
./aaa/A/C_32x32.p
./aaa/A/C.p
./aaa/B/C_12x12.p
./aaa/B/C_4x4.p
./aaa/B/C_A_3x3.p
./aaa/B/C_X_91x91.p
./aaa/G/C_6x6.p
./aaa/G/C_7x7.p
./aaa/G/C_A_113x113.p
./aaa/G/C_A_8x8.p
./aaa/G/C_B.p
./aab/...
I would like to rename all these files like this:
./aaa/C_A_17x17.p
./aaa/C_A_32x32.p
./aaa/C_A.p
./aaa/C_B_12x12.p
./aaa/C_B_4x4.p
./aaa/C_A_B_3x3.p
./aaa/C_X_B_91x91.p
./aaa/C_G_6x6.p
./aaa/C_G_7x7.p
./aaa/C_A_G_113x113.p
./aaa/C_A_G_8x8.p
./aaa/C_B_G.p
./aab/...
I tried many bash for loops with sed, and the last one was the following:
IFS=$'\n'
for ofic in `find * -type d -name 'A'`; do
    fic=`echo $ofic|sed -e 's/\/A$//'`
    for ftr in `ls -b $ofic | grep -E '.png$'`; do
        nfi=`echo $ftr|sed -e 's/(_\d+[x]\d+)?/_A\1/'`
        echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
    done
done
But so far with no success... the \1 never gets inserted into $nfi (sed's POSIX regexes have no \d, and in basic regular expressions plain parentheses are not grouping, so the capture never matches).
This is the last one I tried, working on only one folder (which is a subfolder of a huge folder collection); after over 60 minutes of unsuccessful trials, I'm here with you guys.
I modified your script so that it works for all your examples.
IFS=$'\n'
for ofic in ???/?; do
    IFS=/ read fic fia <<<$ofic
    for ftr in `ls -b $ofic | grep -E '\.p.*$'`; do
        nfi=`echo $ftr|sed -e "s/_[0-9]*x[0-9]*/_$fia&/;t;s/\./_$fia./"`
        echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
    done
done
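The two-step sed is the trick here: the first s inserts the folder name before a _NxN suffix, and the t command branches past the fallback when that substitution succeeded; otherwise the second s inserts the name before the extension. With $fia expanded to A by hand:
$ echo 'C_17x17.p' | sed -e 's/_[0-9]*x[0-9]*/_A&/;t;s/\./_A./'
C_A_17x17.p
$ echo 'C.p' | sed -e 's/_[0-9]*x[0-9]*/_A&/;t;s/\./_A./'
C_A.p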
# it's easier to change to here first
cd aaa
# process every file
for f in $(find . -type f); do
    # strip the leading './' and everything from the first '/' on,
    # leaving just the top-level folder name
    foldername=${f#./}
    foldername=${foldername%%/*}
    # create the new filename from substrings of the
    # original filename concatenated with the folder name
    newfilename=".${f:1:3}${foldername}_${f:4}"
    # if you are satisfied with the output, just leave out the `echo` below
    echo mv "${f}" "${newfilename}"
done
Might work for you.