Bash: delete most files in directory - regex

I have a directory full of mostly PostScript files, and I'm trying to delete most of them: namely those that don't have 000100, 000110, 000120 or 000200 as the second underscore-separated group in their name. Files that do have one of those values I want to keep.
Here is an excerpt from the directory:
0091_000100_0000_0000_0001_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000110_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000120_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000200_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000300_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000310_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000320_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000330_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000400_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000410_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000420_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_001120_0102_0000_0003_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0096_000100_0000_0000_0001_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000110_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000120_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000200_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000300_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000310_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000320_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000330_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000400_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000410_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000420_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000430_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000440_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000450_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0097_000100_0000_0000_0001_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000110_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000120_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000200_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000300_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000310_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000320_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000330_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000400_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000410_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000420_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000430_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
This is what I'm trying to get:
0091_000100_0000_0000_0001_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000110_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000120_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0091_000200_0000_0000_0002_000000__66_5_32_6_9_82856598585_60_3560351294_L_40_1_52_9_42_97_58_53.ps
0096_000100_0000_0000_0001_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000110_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000120_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0096_000200_0000_0000_0002_000000__85_5_2__2_37732144298_48_1790154593_L_52_26_17_77_41_43.ps
0097_000100_0000_0000_0001_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000110_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000120_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
0097_000200_0000_0000_0002_000000__81_5_46_2_48_2146991211_65_1953946853_L_44_6_72_1_58_71_77_49.ps
My attempt so far works but is somewhat impractical:
#!/bin/sh
for f in *.ps; do
    case $f in
        (0091_000100*.ps|0091_000110*.ps|0091_000120*.ps|0091_000200*.ps)
            ;;
        (*)
            rm -- "$f";;
    esac
done
I have to write out every filename prefix I want to keep. One problem: the script doesn't cover the 0096_* and 0097_* files, nor all the others omitted for readability. The filename format is always the same up to the double underscore; the values in the number groups may change.
Is there a way to match for the second group? My experimentation wasn't successful so far.
Thank you for your help!

Seems like ls *.ps | awk -F_ '$2 < 100 || $2 > 200' might be the list of files you want to delete. After verifying that,
rm $(ls *.ps | awk -F_ '$2 < 100 || $2 > 200')
As long as no file has whitespace or glob characters in its name. (If they do, use xargs)
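If you'd rather generalize your original case statement instead, a glob wildcard for the first group avoids spelling out every prefix (a minimal sketch, assuming the first group is always four characters):
#!/bin/sh
# keep files whose second underscore-separated group is
# 000100, 000110, 000120 or 000200; delete everything else
for f in *.ps; do
    case $f in
        (????_000100_*|????_000110_*|????_000120_*|????_000200_*)
            ;;
        (*)
            rm -- "$f";;
    esac
done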

I like using find for best performance when dealing with a large number of files.
This regex should yield the same results:
find . -type f -name '*.ps' | egrep -v '_000(100|110|120|200)_' | xargs rm -f

Assuming the directory has only regular files...
ls *.ps | egrep -v '^[0-9]{4}_000100_|^[0-9]{4}_000110_|^[0-9]{4}_000120_|^[0-9]{4}_000200_' | xargs rm -f

Related

How can I use perl to delete files matching a regex

Due to a Makefile mistake, I have some fake files in my git repo...
$ ls
=0.1.1 =4.8.0 LICENSE
=0.5.3 =5.2.0 Makefile
=0.6.1 =7.1.0 pyproject.toml
=0.6.1, all_commands.txt README_git_workflow.md
=0.8.1 CHANGES.md README.md
=1.2.0 ciscoconfparse/ requirements.txt
=1.7.0 configs/ sphinx-doc/
=2.0 CONTRIBUTING.md tests/
=2.2.0 deploy_docs.py tutorial/
=22.2.0 dev_tools/ utils/
=22.8.0 do.py
=2.7.0 examples/
$
I tried this, but it seems that there may be some more efficient means to accomplish this task...
# glob "*" will list all files globbed against "*"
foreach my $filename (grep { /\W\d+\.\d+/ } glob "*") {
    my $cmd1 = "rm $filename";
    `$cmd1`;
}
Question:
I want a remove command that matches against a PCRE.
What is a more efficient Perl solution to delete the files matching this Perl regex: /\W\d+\.\d+/ (example filename: '=0.1.1')?
Fetch a wider set of files and then filter through whatever you want
my @files_to_del = grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "$dir/*";
I added an anchor (^) so that the regex can only match a string that begins with that pattern, otherwise this can blow away files other than intended. Reconsider what exactly you need.
Altogether perhaps (or see a one-liner below †)
use warnings;
use strict;
use feature 'say';
use File::Glob ':bsd_glob'; # for better glob()
use Cwd qw(cwd); # current-working-directory
my $dir = shift // cwd; # cwd by default, or from input
my $re = qr/^\W[0-9]+\.[0-9]+/;
my @files_to_del = grep { /$re/ and not -d } glob "$dir/*";
say for @files_to_del; # please inspect first
#unlink or warn "Can't unlink $_: $!" for @files_to_del;
where that * in glob might as well have some pre-selection, if suitable. In particular, if the = is a literal character (and not an indicator printed by the shell, see footnote)‡ then glob "=*" will fetch files starting with it, and then you can pass those through a grep filter.
I exclude directories, identified by -d filetest, since we are looking for files (and to not mix with some scary language about directories from unlink, thanks to brian d foy comment).
If you'd need to scan subdirectories and do the same with them, perhaps recursively -- which doesn't seem to be the case here -- then we could employ this logic in File::Find::find (or File::Find::Rule, or yet others).
Or read the directory any other way (opendir+readdir, libraries like Path::Tiny), and filter.
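For example, a minimal opendir/readdir sketch of the same filter (note that readdir returns bare names, so the directory must be prepended for the -d test):
opendir my $dh, $dir or die "Can't opendir $dir: $!";
my @files_to_del = grep { /^\W[0-9]+\.[0-9]+/ and not -d "$dir/$_" } readdir $dh;
closedir $dh;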
† Or, a quick one-liner ... print (to inspect) what's about to get blown away
perl -wE'say for grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "*"'
and then delete 'em
perl -wE'unlink or warn "$_: $!" for grep /^\W[0-9]+\.[0-9]+/ && !-d, glob "*"'
(I switched to a more compact syntax just so. Not necessary)
If you'd like to be able to pass a directory to it (optionally, or work in the current one) then do
perl -wE'$d = shift//q(.); ...' dirpath (relative path fine. optional)
and then use glob "$d/*" in the code. This works the same way as in the script above -- shift pulls the first element from @ARGV, if anything was passed to the script on the command line, or if @ARGV is empty it returns undef and then the // (defined-or) operator picks up the string q(.).
‡ That leading = may be an "indicator" of a file type if ls has been aliased with ls -F, which can be checked by running ls with aliases suppressed, one way being \ls (or check alias ls).
If that is so, the = stands for the file being a socket, which in Perl can be tested for with the -S filetest.
Then that \W in the proposed regex may need to be changed to \W? to allow for no non-word characters preceding a digit, along with a test for a socket. Like
my $re = qr/^\W? [0-9]+ \. [0-9]+/x;
my @files_to_del = grep { /$re/ and -S } glob "$dir/*";
Why not just:
$ rm =*
Sometimes, shell commands are the best option.
In these cases, I use perl to merely filter the list of files:
ls | perl -ne 'print if /\A\W\d+\.\d+/a' | xargs rm
And, when I do that, I feel guilty for not doing something simpler with an extended pattern in grep:
ls | grep -E '^\W[0-9]+\.[0-9]+' | xargs rm
Eventually I'll run into a problem where there's a directory so I need to be more careful about the file list:
find . -maxdepth 1 -type f | grep -E '^\./\W[0-9]+\.[0-9]+' | xargs rm
Or I need to allow rm to remove directories too should I want that:
ls | grep -E '^\W[0-9]+\.[0-9]+' | xargs rm -r
Here you go.
unlink( grep { /\W\d+\.\d+/ && !-d } glob( "*" ) );
This matches the filename, and excludes directories.
To delete filenames matching this PCRE: /\W\d+\.\d+/, use the following one-liners...
1> $fn is a filename... I'm also removing the my keywords, since the one-liner doesn't have to worry about Perl lexical scopes:
perl -e 'foreach $fn (grep { /\W\d+\.\d+/ } glob "*") {$cmd1="rm $fn";`$cmd1`;}'
2> Or as Andy Lester responded, perhaps his answer is as efficient as we can make it...
perl -e 'unlink(grep { /\W\d+\.\d+/ } glob "*");'

iterate over apache 2 log file names and compare numbers using linux bash

Here is an example of the logs in my /var/www/apache2/log folder:
./no_domain_access.log.7.gz
./no_domain_access.log.8.gz
./no_domain_access.log.9.gz
./no_domain_error.log.10.gz
./no_domain_error.log.11.gz
./no_domain_error.log.12.gz
./no_domain_error.log.13.gz
./no_domain_error.log.14.gz
./no_domain_error.log.15.gz
./no_domain_error.log.16.gz
./no_domain_error.log.17.gz
./no_domain_error.log.18.gz
./no_domain_error.log.19.gz
./no_domain_error.log.20.gz
and it goes on up to 50...
I would like to iterate over those files and remove all log files with a number greater than 5.
Using regex syntax gives me the option to match numbers with a pattern like [1-9] or {1,2}, but that would also match the log files I don't want to delete (the single-digit 1-5 log files that I wish to keep).
How can I match only file names with numbers higher than 5?
Thanks!
You can use an awk one-liner for this:
printf '%s\n' *[0-9].gz | awk -F '.' '$(NF-1) > 5'
This awk command uses the dot as field separator and compares $(NF-1) (the numeric field before the extension) against 5.
To delete these files use:
printf '%s\n' *[0-9].gz | awk -F '.' '$(NF-1) > 5' | xargs rm
xargs takes the file names from awk and rm deletes them.
Use the bash regex operator =~ to extract the number, and list the file if the number is greater than 5:
for file in /var/www/apache2/log/*.gz; do
test -f "$file" || continue
[[ $file =~ ^.*log\.([[:digit:]]+).*$ ]] && { (( "${BASH_REMATCH[1]}" > 5 )) && printf "%s\n" "$file"; }
done
If you just want to delete the files, replace printf "%s\n" by just rm.
Find with regular expressions
find . -regex './no_domain_access.log.*gz' ! -regex './no_domain_access.log.[1-5].gz'
Find all files matching no_domain_access.log.*gz, then run a second regular expression to keep all those results minus the files numbered 1 to 5.
Without regular expressions, using shell globs and entirely native & portable POSIX shell code:
rm -f no_domain_access.log.[6-9].gz no_domain_access.log.[0-9][0-9].gz
It's easier in bash:
rm -f no_domain_access.log.{6..50}.gz
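Since the listing shows no_domain_error logs as well, a brace expansion can cover both names in one go (a sketch):
rm -f no_domain_{access,error}.log.{6..50}.gz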
These are probably created with logrotate or a similar log rotation utility. You might want to just change its configuration to only store five logs.
If it's controlled by logrotate, you can find the documentation with man logrotate and you'll probably find something like this:
/var/log/no_domain_access.log {
rotate 50
daily
}
Change the 50 to 5 and you're done. You probably(?) still have to clean up the current old logs using one of the above commands.

Using a variable in sed search pattern when the value of the variable contains square brackets

What I'm trying to do is check that a file has been created. The best way I can think of to do this is by listing the files beforehand, listing them afterwards, deleting the before list from the after list, and then seeing whether the after list is non-empty. I ran into trouble deleting the before list from the after list: filenames with square brackets were not being deleted from the list.
while read -r LINE
do
sed -i -- "/$LINE/d" listfilesafter.swp #without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
rm listfilesafter.swp--
done < listfilesbefore.swp
If I use single quotes then the variable doesn't get expanded, and the -r option on read doesn't seem to make it work like I expected. If anyone has suggestions on alternative ways of doing this, do contribute, but I would still like to know how to use a variable in the search pattern when the value of the variable contains metacharacters. If anyone can help remove the code smell of "rm listfilesafter.swp--", that would also be appreciated. Full code below:
cd ~/Desktop
ls >listfilesbefore.swp
#echo "balh blah" >SomeNonZeroFile.txt #comment or uncomment to test the if then statement
ls >listfilesafter.swp
sed -i -- '/listfilesafter.swp/d' listfilesafter.swp #deletes listfilesafter.swp from the list of files created after the event on line 3
while read -r LINE
do
sed -i -- "/$LINE/d" listfilesafter.swp #without the -- I get 'sed: 1: "listfilesafter.swp": extra characters at the end of l command'
rm listfilesafter.swp--
done < listfilesbefore.swp
cat listfilesafter.swp
echo "check listfiles. Enter to continue."
read dummy_variable
if [ -s listfilesafter.swp ]
then
rm listfilesbefore.swp
rm listfilesafter.swp
echo "success, the file was created"
else
rm listfilesbefore.swp
rm listfilesafter.swp
echo "failure, the file was not created"
fi
Given that you have two lists of files in sorted order (since ls lists the files in sorted order), you should probably be using a command like diff or, in this case,
comm to find the differences between the two lists of files.
If you want to know which file(s) were created, then that's the list of files (lines) in the second file that are not in the first. With no options, comm lists the lines it reads in 3 columns:
lines in the first file not in the second
lines in the second file not in the first
lines in both files
You only need the lines (file names) in the second column, and therefore you want to suppress the list of files in the first and third columns, so you'll use comm -13 to do that:
before=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
after=$(mktemp ${TMPDIR:-/tmp}/files.XXXXXX)
trap "rm -f $before $after; exit 1" 0 1 2 3 13 15
ls > $before
…execute command that creates file(s)…
ls > $after
comm -13 $before $after
rm -f $before $after
trap 0
Obviously, you could capture the list of files from comm in a variable for further analysis, etc.
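For example (a sketch, reusing $before and $after from above):
new_files=$(comm -13 "$before" "$after")
if [ -n "$new_files" ]
then echo "success, the file was created: $new_files"
else echo "failure, the file was not created"
fi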
Making sed work when the search strings contain metacharacters
I'm still confused about sed. How do I use a variable in the search pattern of sed if the value contains metacharacters? Or in this case would I be better off using something other than sed?
In the scenario you have, you're far better off not using sed, and in any case your technique is horrendously slow if there are hundreds or thousands of files in the directory (running sed once per file name is not going to be fast).
However, supposing that it was necessary to use sed and that you wanted to deal with metacharacters in the file names in the list, then you would have to escape the metacharacters (with a backslash in front). I'd probably do something like this:
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
sed -f script.sed listfilesafter.swp
The first script takes any metacharacters in the line (file name) and replaces each with backslash-metacharacter. In the first substitute, the [][\/*.] character class matches square brackets, two types of slashes, stars and dots. Depending on the predilections of the variant of sed you're using, you might need to protect (){} with backslashes too, but in POSIX standard sed, the {} gain metacharacter meaning when prefixed with a backslash, so they're not modified by default. The second substitute takes the possibly modified line and converts it into a 'match and delete' command. The output, therefore, is a sed script that will delete the file names found in listfilesbefore.swp. The second command applies that script to listfilesafter.swp, doing in one sed command what your outline code does with one run of sed per file name.
Using sed to generate a sed script is a powerful technique. It isn't always appropriate, but when it is, it is very useful.
Shell script demo.sh
echo "Pre-populate the directory with some random file names"
for file in $(random -n 20 -T '%W%V%C-%w%v%c%v%c-%04[0000:9999]d.txt')
do
cp /dev/null $file
done
for template in '%w%v%w(%03[000:999]d)%w%v%w.txt' \
'%w%v%w[123]%w%v%we.txt' \
'%w%v%wfile*%03[0:999]d*.txt' \
'%w%v%w%v%c\\\%d.txt' \
'%w%v%w-{%04X}-{%04X}.txt'
do
for file in $(random -n 2 -T "$template")
do
cp /dev/null "$file"
done
done
ls > listfilesbefore.swp
ls
echo
echo "Create some new files with metacharacters in the names"
for file in 'new(123)file.txt' 'new[123]file.txt' 'newfile*321*.txt' \
'newfile\\\.txt' 'newfile-{A39F}-{B77D}.txt'
do
cp /dev/null "$file"
done
ls
ls > listfilesafter.swp
echo
echo "Create sed script"
sed 's/[][\/*.]/\\&/g; s%.*%/^&$/d%' listfilesbefore.swp > script.sed
echo
cat script.sed
echo
echo "Apply it"
sed -f script.sed listfilesafter.swp
The random command I'm using is of my own devising, but it is convenient for demonstrations such as this.
Example run
Pre-populate the directory with some random file names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create some new files with metacharacters in the names
AIG-taral-3486.txt
COV-oipuc-9088.txt
CUG-vowan-5758.txt
FEH-ieqek-0603.txt
IUS-aaduw-7080.txt
KER-jazuc-4824.txt
MIZ-iezec-8255.txt
NIT-kupib-6873.txt
PUX-oocov-2216.txt
QAW-xonod-3937.txt
QES-wawok-4790.txt
RON-difag-1986.txt
SAD-gesug-5706.txt
SAJ-luqoj-4311.txt
TUZ-wapaw-8547.txt
VAL-zutap-8054.txt
YIP-xudeb-7397.txt
YUP-uudiv-8848.txt
ZIB-jurax-2903.txt
ZUR-xonik-8800.txt
aavfile*147*.txt
demo.sh
diman\\\7115.txt
ganur\\\8732.txt
gud-{7049}-{3103}.txt
listfilesbefore.swp
lur[123]maee.txt
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt
rivfile*065*.txt
ueo(417)yea.txt
uoi(751)qio.txt
woi-{37E8}-{009C}.txt
xof[123]hoxe.txt
Create sed script
/^AIG-taral-3486\.txt$/d
/^COV-oipuc-9088\.txt$/d
/^CUG-vowan-5758\.txt$/d
/^FEH-ieqek-0603\.txt$/d
/^IUS-aaduw-7080\.txt$/d
/^KER-jazuc-4824\.txt$/d
/^MIZ-iezec-8255\.txt$/d
/^NIT-kupib-6873\.txt$/d
/^PUX-oocov-2216\.txt$/d
/^QAW-xonod-3937\.txt$/d
/^QES-wawok-4790\.txt$/d
/^RON-difag-1986\.txt$/d
/^SAD-gesug-5706\.txt$/d
/^SAJ-luqoj-4311\.txt$/d
/^TUZ-wapaw-8547\.txt$/d
/^VAL-zutap-8054\.txt$/d
/^YIP-xudeb-7397\.txt$/d
/^YUP-uudiv-8848\.txt$/d
/^ZIB-jurax-2903\.txt$/d
/^ZUR-xonik-8800\.txt$/d
/^aavfile\*147\*\.txt$/d
/^demo\.sh$/d
/^diman\\\\\\7115\.txt$/d
/^ganur\\\\\\8732\.txt$/d
/^gud-{7049}-{3103}\.txt$/d
/^listfilesbefore\.swp$/d
/^lur\[123\]maee\.txt$/d
/^rivfile\*065\*\.txt$/d
/^ueo(417)yea\.txt$/d
/^uoi(751)qio\.txt$/d
/^woi-{37E8}-{009C}\.txt$/d
/^xof\[123\]hoxe\.txt$/d
Apply it
listfilesafter.swp
new(123)file.txt
new[123]file.txt
newfile*321*.txt
newfile-{A39F}-{B77D}.txt
newfile\\\.txt

BASH: How to rename lots of files, inserting the folder name in the middle of the filename

(I'm in a Bash environment, Cygwin on a Windows machine, with awk, sed, grep, perl, etc...)
I want to add the last folder name to the filename, just before the last underscore (_) followed by numbers, or at the end if there are no numbers in the filename.
Here is an example of what I have (hundreds of files need to be reorganized):
./aaa/A/C_17x17.p
./aaa/A/C_32x32.p
./aaa/A/C.p
./aaa/B/C_12x12.p
./aaa/B/C_4x4.p
./aaa/B/C_A_3x3.p
./aaa/B/C_X_91x91.p
./aaa/G/C_6x6.p
./aaa/G/C_7x7.p
./aaa/G/C_A_113x113.p
./aaa/G/C_A_8x8.p
./aaa/G/C_B.p
./aab/...
I would like to rename all these files like this:
./aaa/C_A_17x17.p
./aaa/C_A_32x32.p
./aaa/C_A.p
./aaa/C_B_12x12.p
./aaa/C_B_4x4.p
./aaa/C_A_B_3x3.p
./aaa/C_X_B_91x91.p
./aaa/C_G_6x6.p
./aaa/C_G_7x7.p
./aaa/C_A_G_113x113.p
./aaa/C_A_G_8x8.p
./aaa/C_B_G.p
./aab/...
I tried many bash for loops with sed, and the last one was the following:
IFS=$'\n'
for ofic in `find * -type d -name 'A'`; do
fic=`echo $ofic|sed -e 's/\/A$//'`
for ftr in `ls -b $ofic | grep -E '.png$'`; do
nfi=`echo $ftr|sed -e 's/(_\d+[x]\d+)?/_A\1/'`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
But so far with no success... The \1 does not get inserted into $nfi...
This is the last one I tried, working on only one folder (a subfolder of a huge folder collection), and after over 60 minutes of unsuccessful trials, I'm here with you guys.
I modified your script so that it works for all your examples. The sed first tries to insert _$fia before the _NxM size suffix; the t command ends the script if that substitution succeeded, otherwise the folder name is inserted before the extension.
IFS=$'\n'
for ofic in ???/?; do
IFS=/ read fic fia <<<$ofic
for ftr in `ls -b $ofic | grep -E '\.p.*$'`; do
nfi=`echo $ftr|sed -e "s/_[0-9]*x[0-9]*/_$fia&/;t;s/\./_$fia./"`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
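If the printed mv commands look right, drop the echo (or pipe the script's output to sh) to actually perform the renames.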
# it's easier to change to here first
cd aaa
# process every file
for f in $(find . -type f); do
# drop find's leading "./" so that the first path component is the folder name
f=${f#./}
# strips everything after the first / so this is our foldername
foldername=${f/\/*/}
# creates the new filename from substrings of the
# original filename concatenated to the foldername
newfilename=".${f:1:3}${foldername}_${f:4}"
# if you are satisfied with the output, just leave out the `echo`
# from below
echo mv ${f} ${newfilename}
done
Might work for you.
See here in action. (slightly modified, as ideone.com handles STDIN/find differently...)

Shell Script - list files, read files and write data to new file

I have a specific question about shell scripting.
Simple scripting is no problem for me, but I am new to this and want to build a simple database file.
So, what I want to do is:
- Search for file types (i.e. .nfo) <-- should be no problem :)
- read each found file and use some strings inside it
- these strings from each file should be written to a new file, with the information
from each found file on one row of the new file
I hope I explained my "project" well.
My problem now is understanding how to tell the script to search for the files, then read each of them and use some of the information inside to write it to a new file.
I will explain a bit better.
I am searching for files and that gives me back:
file1.nfo
file2.nfo
file3.nfo
OK, now from each of those files I need the information between two tags, i.e.
file1.nfo:
<user>test1</user>
file2.nfo:
<user>test2</user>
so in the new file there should now be:
file1.nfo:test1
file2.nfo:test2
OK so:
find -name '*.nfo' > /test/database.txt
is printing out the list of files.
and
sed -n '/<user*/,/<\/user>/p' file1.nfo
gives me back the complete file and not only the information between <user> and </user>
I'm trying to go step by step and am reading a lot, but it seems to be very difficult.
What am I doing wrong, and what is the best way to list all the files and write each filename and the content between the two tags to a new file?
EDIT-NEW:
OK, here is an update with more information.
I have learned a lot now and searched the web for my problems. I can find a lot of information, but I don't know how to put it together so that I can use it.
Working with awk now, I get back the filename and the string.
Here now is the complete information (I thought I could go on by myself with a bit of help, but I can't :( )
Here is an example of /test/file1.nfo:
<string1>STRING 1</string1>
<string2>STRING 2</string2>
<string3>STRING 3</string3>
<string4>STRING 4</string4>
<personal informations>
<hobby>Baseball</hobby>
<hobby>Basketball</hobby>
</personal informations>
Here is an example of /test/file2.nfo:
<string1>STRING 1</string1>
<string2>STRING 2</string2>
<string3>STRING 3</string3>
<string4>STRING 4</string4>
<personal informations>
<hobby>Soccer</hobby>
<hobby>Traveling</hobby>
</personal informations>
The file I want to create has to look like this:
STRING 1:::/test/file1.nfo:::Date of file:::STRING 4:::STRING 3:::Baseball, Basketball:::STRING 2
STRING 1:::/test/file2.nfo:::Date of file:::STRING 4:::STRING 3:::Soccer, Traveling:::STRING 2
"Date of file" should be the creation date of the file. So that i can see how old is the file.
So, that´s what i need and it seems not easy.
Thanks a lot.
UPDATE: ERROR with -printf
find: unrecognized: -printf
Usage: find [PATH]... [OPTIONS] [ACTIONS]
Search for files and perform actions on them.
First failed action stops processing of current file.
Defaults: PATH is current directory, action is '-print'
-follow Follow symlinks
-xdev Don't descend directories on other filesystems
-maxdepth N Descend at most N levels. -maxdepth 0 applies
actions to command line arguments only
-mindepth N Don't act on first N levels
-depth Act on directory *after* traversing it
Actions:
( ACTIONS ) Group actions for -o / -a
! ACT Invert ACT's success/failure
ACT1 [-a] ACT2 If ACT1 fails, stop, else do ACT2
ACT1 -o ACT2 If ACT1 succeeds, stop, else do ACT2
Note: -a has higher priority than -o
-name PATTERN Match file name (w/o directory name) to PATTERN
-iname PATTERN Case insensitive -name
-path PATTERN Match path to PATTERN
-ipath PATTERN Case insensitive -path
-regex PATTERN Match path to regex PATTERN
-type X File type is X (one of: f,d,l,b,c,...)
-perm MASK At least one mask bit (+MASK), all bits (-MASK),
or exactly MASK bits are set in file's mode
-mtime DAYS mtime is greater than (+N), less than (-N),
or exactly N days in the past
-mmin MINS mtime is greater than (+N), less than (-N),
or exactly N minutes in the past
-newer FILE mtime is more recent than FILE's
-inum N File has inode number N
-user NAME/ID File is owned by given user
-group NAME/ID File is owned by given group
-size N[bck] File size is N (c:bytes,k:kbytes,b:512 bytes(def.))
+/-N: file size is bigger/smaller than N
-links N Number of links is greater than (+N), less than (-N),
or exactly N
-prune If current file is directory, don't descend into it
If none of the following actions is specified, -print is assumed
-print Print file name
-print0 Print file name, NUL terminated
-exec CMD ARG ; Run CMD with all instances of {} replaced by
file name. Fails if CMD exits with nonzero
-delete Delete current file/directory. Turns on -depth option
The pat1,pat2 notation of sed is line based. Think of it like this: pat1 sets an enable flag for its commands and pat2 disables the flag. If pat1 and pat2 both match on the same line, the flag is set but not cleared (the end pattern is only looked for on subsequent lines), and thus in your case everything following and including the <user> line is printed. See grymoire's sed howto for more.
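If you want to stay with sed for this, a substitution that keeps only the tag content is the usual route (a sketch):
sed -n 's:.*<user>\(.*\)</user>.*:\1:p' file1.nfo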
An alternative to sed, in this case, would be to use a grep that supports look-around assertions, e.g. GNU grep:
find . -type f -name '*.nfo' | xargs grep -oP '(?<=<user>).*(?=</user>)'
If grep doesn't support -P, you can use a combination of grep and sed:
find . -type f -name '*.nfo' | xargs grep -o '<user>.*</user>' | sed 's:</\?user>::g'
Output:
./file1.nfo:test1
./file2.nfo:test2
Note, you should be aware of the issues involved with passing files on to xargs and perhaps use -exec ... instead.
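For example, the same search with -exec instead of xargs (a sketch; the + terminator batches file names much like xargs does):
find . -type f -name '*.nfo' -exec grep -oP '(?<=<user>).*(?=</user>)' {} +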
It so happens that grep outputs in the format you need, and is enough for a one-liner.
By default a grep '' *.nfo will output something like:
file1.nfo:random data
file1.nfo:<user>test1</user>
file1.nfo:some more random data
file2.nfo:not needed
file2.nfo:<user>test2</user>
file2.nfo:etc etc
By adding the -P option (Perl regex) you can restrict the output to matching lines only:
grep -P "<user>\w+<\/user>" *.nfo
output:
file1.nfo:<user>test1</user>
file2.nfo:<user>test2</user>
Now the -o option (only show what matched) saves the day, but we'll need a slightly more advanced regex since the tags are not needed:
grep -oP "(?<=<user>)\w+(?=<\/user>)" *.nfo > /test/database.txt
output of cat /test/database.txt:
file1.nfo:test1
file2.nfo:test2
Explained RegEx here: http://regex101.com/r/oU2wQ1
And your whole script just became a single command.
Update:
If you don't have the --perl-regexp option try:
grep -oE "<user>\w+<\/user>" *.nfo|sed 's#</?user>##g' > /test/database.txt
All you need is:
find -name '*.nfo' | xargs awk -F'[><]' '{print FILENAME,$3}'
If you have more in your file than just what you show in your sample input then this is probably all you need:
... awk -F'[><]' '/<user>/{print FILENAME,$3}' file
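With the sample files that should print something like:
./file1.nfo test1
./file2.nfo test2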
Try this (untested):
> outfile
find -name '*.nfo' -printf "%p %Tc\n" |
while read -r fname tstamp
do
awk -v tstamp="$tstamp" -F'[><]' -v OFS=":::" '
{ a[$2] = a[$2] sep[$2] $3; sep[$2] = ", " }
END {
print a["string1"], FILENAME, tstamp, a["string4"], a["string3"], a["hobby"], a["string2"]
}
' "$fname" >> outfile
done
The above will only work if your file names do not contain spaces. If they can, we'd need to tweak the loop.
Alternative if your find doesn't support -printf (suggestion - seriously consider getting a modern "find"!):
> outfile
find -name '*.nfo' -print |
while IFS= read -r fname
do
tstamp=$(stat -c '%y' "$fname")
awk -v tstamp="$tstamp" -F'[><]' -v OFS=":::" '
{ a[$2] = a[$2] sep[$2] $3; sep[$2] = ", " }
END {
print a["string1"], FILENAME, tstamp, a["string4"], a["string3"], a["hobby"], a["string2"]
}
' "$fname" >> outfile
done
If you don't have "stat" then google for alternatives to get a timestamp from a file or consider parsing the output of ls -l - it's unreliable but if it's all you've got...