I am trying to automate the periodic detection and elimination of duplicate files, using fdupes. I found this beautiful script:
# from here:
# https://www.techrepublic.com/blog/linux-and-open-source/how-to-remove-duplicate-files-without-wasting-time/
OUTF=rem-duplicates_2019-01.sh;
echo "#! /bin/sh" > $OUTF;
find "$@" -type f -printf "%s\n" | sort -n | uniq -d |
xargs -I## -n1 find "$@" -type f -size ##c -exec md5sum {} \; |
sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/;' >> $OUTF;
chmod a+x $OUTF; ls -l $OUTF
This produces a file with this structure:
#! /bin/sh
#rm ./directory_a/file_a
#rm ./directory_b/file_identical_to_a

#rm ./directory_a/file_b
#rm ./directory_b/file_identical_to_b
#rm ./directory_c/another_file_identical_to_b

#rm ./directory_a/file_c
#rm ./directory_b/file_identical_to_c
#rm ./directory_c/another_file_identical_to_c
#rm ./directory_d/yet_another_file_identical_to_c
I want to remove the # tag from the first line of each paragraph to get
rm ./directory_a/file_c
#rm ./directory_b/file_identical_to_c
#rm ./directory_c/another_file_identical_to_c
#rm ./directory_d/yet_another_file_identical_to_c
I have been trying to modify the next-to-last line, with variations of things like this:
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/;s/\n\n#rm/\n\nrm/;' >> $OUTF;
But I cannot get sed to recognize the (\n\n), or any other marker I can think of, for the beginning of a paragraph. What am I doing wrong?
Edit: I am unable to edit the comment, so here is the final script:
TEMPF=temp.txt;
OUTF=rem-duplic_2019-01.sh
echo "#! /bin/sh" > $TEMPF;
find "$@" -type f -printf "%s\n" | sort -n | uniq -d |
xargs -I## -n1 find "$@" -type f -size ##c -exec md5sum {} \; |
sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $TEMPF;
awk -v a=2 '/^$/{a=2}!--a{sub(/#/,"")}1' $TEMPF > $OUTF
chmod a+x $OUTF; ls -l $OUTF
rm $TEMPF
Use awk instead:
awk '/^$/{a=1} !a--{sub(/#/,"")} 1' a=1 file
/^$/ { a = 1 } sets a to 1 if the current line is blank,
!a-- is shorthand for a-- == 0; the following action ({ sub(/#/, "") }) removes the first # from the current line,
1 means print all lines,
a=1 is required to remove the # from the line after the shebang (i.e. the 2nd line).
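To watch the counter at work, here is a minimal sketch that wraps the same awk program in a function. The name uncomment_first is made up for illustration, and -v a=1 is used instead of the trailing a=1 operand so that the function also works when it reads standard input.

```shell
#!/bin/sh
# uncomment_first (hypothetical name): strip the first '#' from the line after
# the shebang and from the first line after every blank line.
uncomment_first() {
    awk -v a=1 '
        /^$/ { a = 1 }          # blank line: the next line starts a new group
        !a-- { sub(/#/, "") }   # true only when a was 0 before the decrement
        1                       # print every line
    ' "$@"
}
```

In the final script from the question this could be called as uncomment_first "$TEMPF" > "$OUTF".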
With sed:
sed "1n;/^#/,/^$/{ s///;}" file
You can use this too:
sed '/^$\|^#!/{N;s/#r/r/}' input.txt
Add the in-place option (-i) if you want to modify the file directly.
This might work for you (GNU sed):
sed '/^#!\|^\s*$/{n;s/.//}' file
If the current line is a shebang or an empty line, print it and remove the first character of the next line.
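As for why the \n\n attempt in the question never matched: sed reads one line at a time and strips the trailing newline, so the pattern space contains no newline unless a command such as N appends the next line. The sketch below contrasts the two behaviours. Both function names are hypothetical, and GNU sed is assumed because \| and \s are GNU extensions.

```shell
#!/bin/sh
# The substitution from the question: the pattern space never holds "\n\n",
# so the input passes through unchanged.
no_paragraph_match() {
    sed 's/\n\n#rm/\n\nrm/' "$@"
}

# n prints the current line and loads the next one, so the s command
# runs on the first line of each group.
uncomment_group_heads() {
    sed '/^#!\|^\s*$/{n;s/^#//}' "$@"
}
```

Using s/^#// rather than s/.// leaves the line alone if it does not start with #.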
Just use Perl with paragraph mode
perl -00 -pe ' s/^#// '
With inputs
$ cat yozzarian.txt
#! /bin/sh
#rm ./directory_a/file_a
#rm ./directory_b/file_identical_to_a

#rm ./directory_a/file_b
#rm ./directory_b/file_identical_to_b
#rm ./directory_c/another_file_identical_to_b

#rm ./directory_a/file_c
#rm ./directory_b/file_identical_to_c
#rm ./directory_c/another_file_identical_to_c
#rm ./directory_d/yet_another_file_identical_to_c
$ perl -00 -pe ' s/^#// ' yozzarian.txt
! /bin/sh
#rm ./directory_a/file_a
#rm ./directory_b/file_identical_to_a

rm ./directory_a/file_b
#rm ./directory_b/file_identical_to_b
#rm ./directory_c/another_file_identical_to_b

rm ./directory_a/file_c
#rm ./directory_b/file_identical_to_c
#rm ./directory_c/another_file_identical_to_c
#rm ./directory_d/yet_another_file_identical_to_c
$
Related
[context] My script needs to replace the semantic versions in multiple .car file names with the commit SHA. In short, I would like every dev_CA_1.0.0.car to become dev_CA_6a8zt5d832.car
Adding the commit SHA right before .car was pretty trivial. With this, I end up with dev_CA_1.0.0_6a8zt5d832.car
find . -depth -name "*.car" -exec sh -c 'f="{}"; \
mv -- "$f" $(echo $f | sed -e 's/.car/_${CI_COMMIT_SHORT_SHA}.car/g')' \;
But I find it incredibly difficult to REPLACE. What aspect of sed am I misunderstanding when I try this:
find . -depth -name "*.car" -exec sh -c 'f="{}"; \
mv -- "$f" $(echo $f | sed -r -E 's/[0-9\.]+.car/${CI_COMMIT_SHORT_SHA}.car/g')
or this
find . -depth -name "*.car" -exec sh -c 'f="{}"; \
mv -- "$f" $(echo $f | sed -r -E 's/^(.*_)[0-9\.]+\.car/\1${CI_COMMIT_SHORT_SHA}\.car/g')' \;
no matches found: f="{}"; mv -- "$f" $(echo $f | sed -r -E ^(.*_)[0-9.]+.car/1684981321531.car/g)
or multiple variants:
\ escaping (e.g. \\.)
( and ) escaping (e.g. \() (I read somewhere that regex grouping with sed requires some care with ())
Is there a more direct way to do it?
Edit
The $f values passed to sed are paths that look like
./somewhere/some_project_CA_1.2.3.car
./somewhere_else/other_project_CE_9.2.3.car
You may use
sed "s/_[0-9.]\{1,\}\.car$/_${CI_COMMIT_SHORT_SHA}.car/g"
Here, sed is used with a POSIX BRE expression (the \{1,\} interval syntax is BRE; with -E it would be written {1,}) that matches
_ - an underscore
[0-9.]\{1,\} - 1 or more digits or dots
\.car - .car (note that a literal . must be escaped! a . pattern matches any char)
$ - end of string.
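The pattern described above can be tried on the sample paths from the question with a small sketch like the following. The function name sha_name is hypothetical, and the default value assigned to CI_COMMIT_SHORT_SHA is only the example value from the question; in CI the variable would normally already be set.

```shell
#!/bin/sh
# Assumed example value, used only when the variable is not already set.
CI_COMMIT_SHORT_SHA=${CI_COMMIT_SHORT_SHA:-6a8zt5d832}

# sha_name (hypothetical helper): print path $1 with the trailing
# _<version>.car replaced by _<sha>.car.
sha_name() {
    # Inside double quotes \$ keeps the end-of-line anchor literal,
    # while ${CI_COMMIT_SHORT_SHA} is expanded by the shell.
    printf '%s\n' "$1" |
        sed "s/_[0-9.]\{1,\}\.car\$/_${CI_COMMIT_SHORT_SHA}.car/"
}
```

Because the variable is pasted into the sed script, this assumes the SHA contains no / or & characters.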
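Try this:
export CI_COMMIT_SHORT_SHA=6a8zt5d832
# The file name is passed as $1 instead of being pasted into the script text,
# so names containing quotes cannot break the inner shell command.
find . -depth -name "*.car" -exec sh -c \
'echo mv -- "$1" "${1%_*}_${CI_COMMIT_SHORT_SHA}.car"' sh {} \;
Remove echo once you are satisfied with the result.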
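What the %_* expansion does on its own can be seen in a minimal sketch. The function name sha_path and its two-argument interface are assumptions made for illustration.

```shell
#!/bin/sh
# sha_path (hypothetical helper): $1 = path ending in _<version>.car, $2 = short SHA.
sha_path() {
    # ${1%_*} removes the shortest trailing match of "_*", that is,
    # everything from the last underscore onward.
    printf '%s\n' "${1%_*}_$2.car"
}
```

This relies on the file name itself containing an underscore before the version; if it did not, the last underscore of a directory name would be matched instead.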
I am currently using the following code:
#!/bin/bash
rm /media/external/archive/auth-settings.tar.raw
rm /media/external/archive/bak-settings.tar.raw
rm /media/external/archive/cont-settings.tar.raw
rm /media/external/archive/data-data.tar.raw
rm /media/external/archive/data-settings.tar.raw
rm /media/external/archive/mon-data.tar.raw
rm /media/external/archive/mon-settings.tar.raw
rm /media/external/archive/mail-data.tar.raw
rm /media/external/archive/mail-settings.tar.raw
rm /media/external/archive/portal-settings.tar.raw
rm /media/external/archive/webserver-data.tar.raw
rm /media/external/archive/webserver-settings.tar.raw
for f in /media/external/archive/*.tar.raw;
do mv "$f" "${f%.*.tar.raw}.tar.raw";
done
to remove old backups once the new ones have been archived. However, if the archiving fails for some reason, this script runs regardless and removes all the archives, leaving nothing behind.
How can I modify the script so that each rm command runs only if the corresponding archive with a count number in its file name exists, and is skipped otherwise? For example:
auth-settings.tar.raw
should be removed only if there is a
auth-settings.number.tar.raw
You can use an if statement in the script. For example:
# The glob may match several files, so test only the first match.
set -- /media/external/archive/auth-settings.*.tar.raw
if [ -e "$1" ]
then
rm /media/external/archive/auth-settings.tar.raw
fi
The "*" character is a wildcard that matches whatever sits between the two dots.
This way /media/external/archive/auth-settings.tar.raw is deleted only if a file named "auth-settings.number.tar.raw" exists in the same directory.
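The same check can be applied to every archive in one loop instead of one if block per file. The following is a sketch under assumptions: the function name prune_archives is hypothetical, and the middle part of the name is required to be all digits, which is stricter than the "*" wildcard.

```shell
#!/bin/sh
# prune_archives (hypothetical helper): $1 is the archive directory.
# Removes NAME.tar.raw only when a NAME.<digits>.tar.raw exists next to it.
prune_archives() {
    for numbered in "$1"/*.*.tar.raw; do
        [ -e "$numbered" ] || continue      # an unmatched glob stays literal
        stem=${numbered%.tar.raw}           # .../auth-settings.17
        case ${stem##*.} in
            ''|*[!0-9]*) continue ;;        # middle part is not a number
        esac
        old=${stem%.*}.tar.raw              # .../auth-settings.tar.raw
        [ ! -e "$old" ] || rm -- "$old"
    done
}
```

It would be called as prune_archives /media/external/archive before the mv loop from the question.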
This lists the files that have a number in the format you described, with the number stripped, so the output is exactly the set of old files that are safe to delete:
find /media/external/archive/ -type f -regextype sed -regex ".*\.[0-9]\+\.tar\.raw" -print0 | xargs --null -L 1 basename | sed -E "s/(.*)(\.[0-9]+)(.*)/\1\3/g"
Files in the directory :
more-files.03242.tar.raw
somethingwithdot.3.tar.raw
'with spaces.09434.tar.raw'
Output :
somethingwithdot.tar.raw
more-files.tar.raw
with spaces.tar.raw
Now you can safely iterate and delete these files as you can be sure they have a backup.
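One possible way to do that iteration is to pipe the list into a small reader loop. The function name delete_listed is hypothetical. The sketch assumes one file name per line, so names containing newlines are not handled.

```shell
#!/bin/sh
# delete_listed (hypothetical helper): read one file name per line on stdin
# and remove the file of that name inside directory $1.
delete_listed() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue      # skip empty lines
        rm -f -- "$1/$name"
    done
}
```

The find | xargs | sed pipeline above would then end in | delete_listed /media/external/archive.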
I have files in a directory on an Ubuntu 14.04 Linux machine whose names contain characters that are invalid on Windows, such as <>?:"|* etc. I wish to replace these characters so that the files can be viewed from a Windows machine as well.
Eg: the following are a couple of files in the directory:
file "1".html
file "asdf".txt
The expected output after renaming should be: (essentially, each invalid character is replaced with a single underscore)
file _1_.html
file _asdf_.txt
I've been running the command from Find files with illegal windows characters in the name on Linux (modifying it slightly):
find . -name "*[<>\\|?:\"*]*" -exec bash -c 'x="{}"; y=$(sed "s/[<>\\|?:\"*]\+/_/g" <<< "$x") && echo "renaming" "$x" "-->" "$y" && mv "$x" "$y" ' \;
But the bash command above fails on files with double quotes in their names. It works fine for the other invalid characters.
Can you help fix this script? Thanks in advance.
Using bash parameter expansion
$ touch 'file "1".html' 'file "asdf".txt' 'a<b' 'f?r' 'e*w' 'z|e' 'w:r' 'b>a'
$ ls
a<b b>a e*w file "1".html file "asdf".txt f?r w:r z|e
$ find -name "*[<>\\|?:\"*]*" -exec bash -c 'echo mv "$0" "${0//[<>\\|?:\"*]/_}"' {} \;
mv ./z|e ./z_e
mv ./file "asdf".txt ./file _asdf_.txt
mv ./a<b ./a_b
mv ./file "1".html ./file _1_.html
mv ./e*w ./e_w
mv ./w:r ./w_r
mv ./f?r ./f_r
mv ./b>a ./b_a
$ find -name "*[<>\\|?:\"*]*" -exec bash -c 'mv "$0" "${0//[<>\\|?:\"*]/_}"' {} \;
$ ls
a_b b_a e_w file _1_.html file _asdf_.txt f_r w_r z_e
To collapse a run of invalid characters into a single underscore, use extglob
$ touch 'tmp::<>|asdf.txt'
$ find -name "*[<>\\|?:\"*]*" -exec bash -c 'shopt -s extglob; echo mv "$0" "${0//+([<>\\|?:\"*])/_}"' {} \;
mv ./tmp::<>|asdf.txt ./tmp_asdf.txt
With perl based rename
$ find -name "*[<>\\|?:\"*]*" -exec rename 's/[<>\\|?:\"*]/_/g' {} +
$ ls
a_b b_a e_w file _1_.html file _asdf_.txt f_r w_r z_e
Use rename -n for a dry run without actually renaming the files
$ touch 'tmp::<>|asdf.txt'
$ find -name "*[<>\\|?:\"*]*" -exec rename -n 's/[<>\\|?:\"*]+/_/g' {} +
rename(./tmp::<>|asdf.txt, ./tmp_asdf.txt)
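The commands above rewrite the whole path, so a directory whose own name contains an invalid character would need extra care. A minimal sketch that renames only the last path component, deepest entries first, could look like this. Both function names are hypothetical, sed is swapped in for the bash expansion so that the block runs in plain sh, and file names containing newlines are not handled.

```shell
#!/bin/sh
# sanitize_name (hypothetical helper): collapse each run of characters that
# are invalid on Windows into a single underscore.
sanitize_name() {
    printf '%s\n' "$1" | sed 's/[<>\\|?:"*]\{1,\}/_/g'
}

# sanitize_tree (hypothetical helper): $1 is the directory to clean up.
sanitize_tree() {
    # -depth lists a directory's contents before the directory itself,
    # so children are renamed while their parent path is still valid.
    find "$1" -depth -name '*[<>\\|?:"*]*' |
    while IFS= read -r path; do
        dir=${path%/*}
        base=${path##*/}
        new=$(sanitize_name "$base")
        [ "$base" = "$new" ] || mv -- "$path" "$dir/$new"
    done
}
```

Calling sanitize_tree . would correspond to the find commands shown above.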
I am trying to use sed to replace all instances of a command with a variable, except when they come after a comment marker or are part of another word. I have gotten close: I can replace one instance before a comment, but not when there is more than one.
I have a test file with the line:
rm rm # rm
I want to make this read:
$RM $RM # rm
This is what I have so far:
sed -i 's/\(^\|[^[#.*]]\)\brm\b/\1$RM/' file1
Which returns:
$RM rm # rm
Any help is much appreciated. Other solutions not involving sed are welcome, but I might need some help understanding them.
Thanks!
EDIT:
This is just an example of what I am looking for. Not every line will be formatted like this, and not every line will contain a command before the comment, or vice versa. I am just looking for a solution that will also cover a situation similar to this example. Sorry for the lack of explanation. Here is a slightly better example:
"$#" #rm
# rm
rm # rm
rm
"rm "
'rm '
`rm `
{rm }
$# rm # rm
rm rm # rm
rm # rm rm
rmremovermlink
Output should be:
"$#" #rm
# rm
$RM # rm
$RM
"$RM "
'$RM '
`$RM `
{$RM }
$# $RM # rm
$RM $RM # rm
$RM # rm rm
rmremovermlink
You can use this sed command:
sed ':a;s/^\([^#]*\)\<rm\>/\1$RM/;ta;' file
#    ^  ^  ^        ^   ^  ^      ^---- go to label "a" if something is replaced
#    |  |  |        |   |  '---- back-reference to capture group 1
#    |  |  |        '---'---- word boundaries
#    |  |  '---- capture group 1
#    |  '---- replacement (only one occurrence)
#    '---- defines the label "a"
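The loop can be tried on single lines with a small wrapper. The function name replace_rm is hypothetical, and GNU sed is assumed because \< and \> are GNU extensions.

```shell
#!/bin/sh
# replace_rm (hypothetical helper): replace every whole-word rm that appears
# before the first '#' on a line with the literal text $RM.
replace_rm() {
    # Each pass rewrites one rm; ta jumps back to :a until nothing changes.
    sed -e ':a' \
        -e 's/^\([^#]*\)\<rm\>/\1$RM/' \
        -e 'ta' "$@"
}
```

Because [^#]* stops at the first #, a line such as $# rm # rm from the question is left unchanged by this approach, since the # in $# already counts as the start of a comment.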
You can use this perl command for this substitution:
perl -pe 's/(?<!\$)#.*$(*SKIP)(*F)|\brm\b/\$RM/g' file
"$#" #rm
# rm
$RM # rm
$RM
"$RM "
'$RM '
`$RM `
{$RM }
$# $RM # rm
$RM $RM # rm
$RM # rm rm
rmremovermlink
\$[A-Z]{2}\s\$[A-Z]{2}\s\#\s[a-z]{2}
Let me know if this is what you're looking for. It matches :
$RM $RM # rm
I'm looking for a script for the structure below:
Before :
/Description/TestCVin/OpenCVin/NameCv/.....
/Description/blacVin/baka/NameCv_hubala/......
/Description/CVintere/oldCvimg/NameCv_add/.....
after:
/Description/TestaplCVin/OpenaplCVin/NameaplCv/.....
/Description/blaaplcVin/baka/NameaplCv_hubala/......
/Description/aplCVintere/oldaplCvimg/NameaplCv_add/.....
I want to rename "Cv", "CV" or "cV" to "aplCv", "aplCV" or "aplcV" in all folder names, using a regular expression.
My script looks like this:
#!/bin/sh
printf "Input your Directory path: -> "
read DIR
cd "$DIR"
FILECASE=$(find . -iname "*cv*")
LAST_DIR_NAME=""
for fdir in $FILECASE
do
if [[ -d $fdir ]];
then
LAST_DIR_NAME=$fdir
fi
FILE=$(echo $fdir | sed -e "s/\([Cc][Vv]\)/arpl\1/g")
echo "la file $FILE"
if ([[ -f $fdir ]] && [[ "$fdir" =~ "$LAST_DIR_NAME" ]]);
then
FILECASE=$(find . -iname "*cv*")
tmp=$(echo $LAST_DIR_NAME | sed -e "s/\([Cc][Vv]\)/arpl\1/g")
fdir=$(echo $fdir | sed -e 's|'$LAST_DIR_NAME'|'$tmp'|g')
fi
mv -- "$fdir" "$FILE"
done
But it throws an error ..:(
How could I write it to rename the files according to their folder names?
You can do it like this:
#!/bin/sh
printf "Input your Directory path: -> "
read DIR
cd "$DIR"
MYARRAY=$(find . -iname "*cv*" )
touch "tmpfile"
for fdir in $MYARRAY
do
echo "$fdir" >> "tmpfile"
done
MYARRAY=$(tac "tmpfile")
for fdir in $MYARRAY
do
cd "$fdir"
prev=$(cd -)
base=$(basename $fdir)
cd ..
nDIR=$(echo "$base" | sed -e "s/\([Cc][Vv]\)/arpl\1/g")
mv "$base" "$nDIR"
cd $prev
done
rm -f "tmpfile"
One issue: I think the tac command is not included in Mac OS X. Use tail -r instead of tac, like MYARRAY=$(tail -r "tmpfile")
Always make a backup before playing with this kind of script.
You can try the following:
find . -iname '*cv*' -exec echo 'mv {} $(echo $(dirname {})/$(basename {}|sed s/cv/apl/gi))' \;|tac|xargs -i bash -c 'eval {}'
This uses -exec to print commands for renaming.
The second arguments are generated by using shell substitutions to replace cv with apl in the last part of the path.
tac is used to reverse the order of the commands, so that we do not rename a directory before working with its contents.
Finally, we eval the commands with bash.
Also, do not use -exec in a permanent script. Please read the security warnings about exec in the find man-page.
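The deepest-first ordering that tac provides can also be requested from find directly with -depth. Below is a minimal sketch of that idea, which prefixes apl to each cv match as the question asks. The function name prefix_cv is hypothetical, and file names containing newlines are not handled.

```shell
#!/bin/sh
# prefix_cv (hypothetical helper): $1 is the top directory.
# Inserts "apl" before every cv/Cv/cV/CV in the names of matching entries.
prefix_cv() {
    # -depth prints a directory's contents before the directory itself,
    # so no path is invalidated by an earlier rename.
    find "$1" -depth -name '*[Cc][Vv]*' |
    while IFS= read -r path; do
        dir=${path%/*}
        base=${path##*/}
        # & re-inserts the matched text, keeping its original case.
        new=$(printf '%s\n' "$base" | sed 's/[Cc][Vv]/apl&/g')
        mv -- "$path" "$dir/$new"
    done
}
```

Running it twice would add the prefix twice, so it is meant to be run once, on a backed-up tree.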