Correcting file numbers using bash - regex

I have a bunch of file names in a folder like this:
test_07_ds.csv
test_08_ds.csv
test_09_ds.csv
test_10_ds.csv
...
I want to decrease the number of every file, so that these become:
test_01_ds.csv
test_02_ds.csv
test_03_ds.csv
test_04_ds.csv
...
Here's what I came up with:
for i in $1/*; do
n=${i//[^0-9]/};
n2=`expr $n - 6`;
if [ $n2 -lt 10 ]; then
n2="0"$n2;
fi
n3=`echo $i | sed -r "s/[0-9]+/$n2/"`
echo $n3;
cp $i "fix/$n3";
done;
Is there a cleaner way of doing this?

This might help:
shopt -s extglob
for i in test_{07..10}_ds.csv; do
IFS=_ read s m e <<<"$i"; # echo "Start=$s Middle=$m End=$e"
n=${m#+(0)} # Remove leading zeros to
# avoid interpretation as octal number.
n=$((n-6)) # Subtract 6.
n=$(printf '%02d' "$n") # Format `n` with a leading 0.
# comment out the next echo to actually execute the copy.
echo \
cp "$i" "fix/${s}_${n}_${e}";
done;
Or collapsing it all together
#!/bin/bash
shopt -s extglob
for i in ${1:-.}/*; do # $1 will default to pwd `.`
IFS=_ read s m e <<<"$i"; # echo "Start=$s Middle=$m End=$e"
n=$(printf '%02d' "$((${m#+(0)}-6))")
cp "$i" "fix/${s}_${n}_${e}";
done;
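As an alternative to stripping the leading zeros with extglob, bash arithmetic accepts an explicit base prefix, so 10#09 is read as decimal 9 instead of being rejected as invalid octal. A minimal sketch, assuming names of the form prefix_NN_suffix; the function name renumber and its offset argument are made up for illustration:

```bash
#!/usr/bin/env bash
# renumber NAME OFFSET: print NAME with its middle field reduced by OFFSET.
# Hypothetical helper; assumes NAME looks like prefix_NN_suffix.
renumber() {
    local s m e
    IFS=_ read -r s m e <<<"$1"
    # 10# forces base 10, so "08" and "09" are not parsed as octal.
    printf '%s_%02d_%s\n' "$s" "$((10#$m - $2))" "$e"
}

renumber test_09_ds.csv 6   # prints test_03_ds.csv
```

Because the function only prints the new name, it can be checked on a few samples before being wired into a cp or mv loop.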

You can use awk for simplification:
for f in *.csv; do
mv "$f" "$(awk 'BEGIN{FS=OFS="_"} {$2 = sprintf("%02d", $2-6)} 1' <<< "$f")"
done

Please try the following code and let me know if it helps.
awk 'FNR==1{OLD=FILENAME;split(FILENAME, A,"_");A[2]=sprintf("%02d", A[2]-6);NEW=A[1]"_"A[2]"_"A[3];system("mv " OLD " " NEW);close(OLD)}' *.csv
I assumed that your files always start at number 07, so I subtract 6 from each number and pad the result back to two digits. You can also put a complete path in the mv command that is passed to awk's built-in system() function, which lets you move the files to another place too. Let me know how it goes.

Related

How to find specific text in a text file, and append it to the filename?

I have a collection of plain text files which are named as yymmdd_nnnnnnnnnn.txt, which I want to append another number sequence to the filenames, so that they each become named as yymmdd_nnnnnnnnnn_iiiiiiiii.txt instead, where the iiiiiiiii is taken from the one line in each file which contains the text "GST: 123456789⏎" (or similar) at the end of the line. While I am sure that there will only be one such matching line within each file, I don't know exactly which line it will be on.
I need an elegant one-liner solution that I can run over the collection of files in a folder, from a bash script file, to rename each file in the collection by appending the specific GST number for each filename, as found within the files themselves.
Before even getting to the renaming stage, I have encountered a problem with this. Here is what I tried, which didn't work...
# awk '/\d+$/' | grep -E 'GST: ' 150101_2224567890.txt
The grep command alone works perfectly to find the relevant line within the file, but the awk doesn't return just the final digits group. It fails with the error "warning: regexp escape sequence \d is not a known regexp operator". I had assumed that this regex should return any number of digits which are at the end of the line. The text file in question contains a line which ends with "GST: 112060340⏎". Can someone please show me how to make this work, and maybe also to help with the appropriate coding to move the collection of files to the new filenames? Thanks.
Thanks to a comment from @Renaud, I now have the following code working to obtain just the GST registration number from within a text file, which puts me a step closer towards a workable solution.
awk '/GST: / {printf $NF}' 150101_2224567890.txt
I still need to loop this over the collection instead of just specifying one filename. I also need to be able to use the output from @Renaud's contribution to rename the files. I'm getting closer to a working solution, thanks!
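On the \d warning: awk's regular expressions have no \d shorthand, and a bracket expression such as [0-9] does the same job. A minimal sketch, assuming the GST line ends with the digits; the function name gst_number is made up for illustration:

```bash
#!/usr/bin/env bash
# gst_number: read text on stdin and print the trailing digits of the first
# line containing "GST: ". Hypothetical helper for illustration.
gst_number() {
    awk '/GST: / && match($0, /[0-9]+$/) {
        print substr($0, RSTART, RLENGTH)
        exit
    }'
}

printf 'Invoice 42\nGST: 112060340\nTotal 10\n' | gst_number   # prints 112060340
```

match() sets RSTART and RLENGTH, so substr() returns only the digit group rather than the whole line.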
This awk should work for you:
awk '$1=="GST:" {fn=FILENAME; sub(/\.txt$/, "", fn); print "mv", FILENAME, fn "_" $2 ".txt"; nextfile}' *_*.txt | sh
To make it more readable:
awk '$1 == "GST:" {
fn = FILENAME
sub(/\.txt$/, "", fn)
print "mv", FILENAME, fn "_" $2 ".txt"
nextfile
}' *_*.txt | sh
Remove | sh from above to see all mv commands together.
You may try
for f in *_*.txt; do echo mv "$f" "${f%.txt}_$(sed '/.*GST: /!d; s///; q' "$f").txt"; done
Drop the echo if you're satisfied with the output.
As you are sure there is only one matching line, you can try:
$ n=$(awk '/GST:/ {print $NF}' 150101_2224567890.txt)
$ mv 150101_2224567890.txt "150101_2224567890_$n.txt"
Or, for all .txt files:
for f in *.txt; do
n=$(awk '/GST:/ {print $NF}' "$f")
if [[ -z "$n" ]]; then
printf '%s: GST not found\n' "$f"
continue
fi
mv "$f" "${f%.txt}_$n.txt"
done
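The same idea can be split so that the new name is computed separately and previewed before anything is renamed. A sketch under the same assumption of one GST line per file; new_gst_name is a hypothetical helper, and the loop only prints the mv commands:

```bash
#!/usr/bin/env bash
# new_gst_name FILE: print the target name for FILE, or fail if no GST line.
# Hypothetical helper; assumes the number is the last field of the GST line.
new_gst_name() {
    local f=$1 n
    n=$(awk '/GST:/ {print $NF; exit}' "$f")
    [[ -n $n ]] || return 1
    printf '%s_%s.txt\n' "${f%.txt}" "$n"
}

# Dry run: print the mv commands instead of executing them.
for f in *_*.txt; do
    [[ -f $f ]] || continue   # skip the literal pattern when nothing matches
    if new=$(new_gst_name "$f"); then
        printf 'mv %q %q\n' "$f" "$new"
    else
        printf '%s: GST not found\n' "$f" >&2
    fi
done
```

Once the printed commands look right, the printf line can be replaced with an actual mv.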
Another one-line solution to consider, although perhaps not so elegant.
for original_filename in *_*.txt; do \
new_filename=${original_filename%'.txt'}_$(
grep -E 'GST: ' "$original_filename" | \
sed -E 's/.*GST//g; s/[^0-9]//g'
)'.txt' && \
mv "$original_filename" "$new_filename"; \
done
Output:
150101_2224567890_123456789.txt
If you are open to a multi line script:-
#!/bin/sh
for f in *.txt; do
prefix=$(echo "${f}" | sed s'#\.txt##')
cp "${f}" f1
sed -i s'#GST#%GST#' "./f1"
cat "./f1" | tr '%' '\n' > f2
number=$(cat "./f2" | sed -n '/GST/'p | cut -d':' -f2 | tr -d ' ')
newname="${prefix}_${number}.txt"
mv -v "${f}" "${newname}"
rm -v "./f1"
rm -v "./f2"
done
In general, if you want to make your files easy to work with, leave as many places as possible where they can be split with newlines. It is much easier to alter files when you can put what you want to delete or print on its own line than it is to search for things horizontally with regular expressions.

Bash script to check script file extension and add an extension

I have written the following Bash script. Its role is to check its own name and, if the extension is missing, to append ".sh" with sed. However, I still get the error "missing target file..."
#!/bin/bash
FILE_NAME="$0"
EXTENSION=".sh"
FILE_NAME_MOD="$FILE_NAME$EXTENSION"
if [[ "$0" != "FILE_NAME_MOD" ]]; then
echo mv -v "$FILENAME" "$FILENAME$EXTENSION"
cp "$0" | sed 's/\([^.sh]\)$/\1.sh/g' $0
fi
#!/bin/bash
file="$0"
extension=".sh"
if [ "$(echo -n "$file" | tail -c 3)" != "$extension" ]; then
mv -v "$file" "$file$extension"
fi
Important stuff:
The -n flag suppresses the newline at the end, so we can test for 3 chars instead of 4
When in doubt, always use set -x to debug your scripts.
Try this Shellcheck-clean code:
#! /bin/bash -p
file=${BASH_SOURCE[0]}
extension=.sh
[[ $file == *"$extension" ]] || mv -i -- "$file" "$file$extension"
See choosing between $0 and BASH_SOURCE for details of why ${BASH_SOURCE[0]} is better than $0.
See Correct Bash and shell script variable capitalization for details of why file is better than FILE and extension is better than EXTENSION. (In short, ALL_UPPERCASE names are dangerous because there is a danger that they will clash with names that are already used for something else.)
The -i option to mv means that you will be prompted to continue if the new filename is already in use.
See Should I save my scripts with the .sh extension? before adding .sh extensions to your shell programs.
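The [[ $file == *"$extension" ]] test above can be isolated into a function that only computes the name, which makes it easy to check without moving anything. A sketch; ensure_ext is a hypothetical name:

```bash
#!/usr/bin/env bash
# ensure_ext NAME EXT: print NAME, appending EXT unless NAME already ends in it.
# Hypothetical helper for illustration.
ensure_ext() {
    local name=$1 ext=$2
    if [[ $name == *"$ext" ]]; then
        printf '%s\n' "$name"
    else
        printf '%s%s\n' "$name" "$ext"
    fi
}

ensure_ext deploy .sh       # prints deploy.sh
ensure_ext deploy.sh .sh    # prints deploy.sh
```

Quoting "$ext" inside the pattern keeps the dot and any glob characters literal.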
Just for fun, here is a way to do it just with GNU sed:
#!/usr/bin/env bash
sed --silent '
# match FILENAME only if it does not end with ".sh"
/\.sh$/! {
# change "FILENAME" to "mv -v FILENAME FILENAME.sh"
s/.*/mv -v & &.sh/
# execute the command
e
}
' <<<"$0"
You can also make the above script output useful messages:
#!/usr/bin/env bash
sed --silent '
/\.sh$/! {
s/.*/mv -v & &.sh/
e
# exit with code 0 immediately after the change has been made
q0
}
# otherwise exit with code 1
q1
' <<<"$0" && echo 'done' || echo 'no changes were made'

Modify bash variables with sed

I am trying to modify a number of environmental variables containing predefined compiler flags. To do so, I tried using a bash loop that goes over all environmental variables listed with "env".
for i in $(env | grep ipo | awk 'BEGIN {FS="="} ; { print $1 } ' )
do echo $(sed -e "s/-ipo/ / ; s/-axAVX/ /" <<< $i)
done
This is not working since the loop variable $i contains just the name of the environmental variable stored as a character string. I tried searching a method to convert a string into a variable but things started becoming unnecessary complicated. The basic problem is how to properly supply the environmental variable itself to sed.
Any ideas how to properly modify my script are welcome.
Thanks,
Alex
Part I
The way you're parsing env is wrong. It breaks whenever you have spaces or wildcards. Instead use this:
while IFS= read -r line; do
# do stuff with variable line
done < <(env)
To see why your solution is broken, do:
for i in $(env); do
echo "$i"
done
and you'll very likely see a difference with the output of env.
Now the while method I gave will also break when you have newlines in your variables.
Very likely your env version has the flag -0 or --null. Use it to be 100% safe:
while IFS= read -r -d '' line; do
# do stuff with variable line
done < <(env --null)
Part II
When you have read your line, you want to split it into a key and a value. Don't use awk for that. Use Bash:
key=${line%%=*}
value=${line#*=}
Look:
while IFS= read -r -d '' line; do
key=${line%%=*}
value=${line#*=}
echo "key: $key"
echo "value: $value"
done < <(env --null)
Part III
Now I understand that you want to act only on the variables that contain the string ipo, and for these you want to substitute a space for the first occurrence of each of the strings -ipo and -axAVX. So:
while IFS= read -r -d '' line; do
key=${line%%=*}
value=${line#*=}
[[ $value = *ipo* ]] || continue
value=${value/-ipo/ }
value=${value/-axAVX/ }
echo "key: $key"
echo "new value: $value"
done < <(env --null)
Part IV
You want to replace the variable with this new value. You can use declare for this. (You don't need the export builtin, since your variable is already marked as exported):
while IFS= read -r -d '' line; do
key=${line%%=*}
value=${line#*=}
[[ $value = *ipo* ]] || continue
value=${value/-ipo/ }
value=${value/-axAVX/ }
declare "$key=$value"
done < <(env --null)
Part V
Finally, you'll try to put this in a script and you'll realize that it doesn't work: that's because a script is executed in a child process, and any changes made in a child process are not seen by the parent process. So you'll want to source it! To source a file named file, use:
. file
(yes, a dot, a space and the name of the file).
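The difference between running and sourcing can be seen in a small experiment. This is only a sketch: DEMO_FLAGS and the temporary script are made up for illustration.

```bash
#!/usr/bin/env bash
# DEMO_FLAGS and the temporary script are hypothetical, for illustration only.
export DEMO_FLAGS="-O2 -ipo -axAVX"
tmp_script=$(mktemp)
cat > "$tmp_script" <<'EOF'
DEMO_FLAGS=${DEMO_FLAGS/-ipo/ }
DEMO_FLAGS=${DEMO_FLAGS/-axAVX/ }
EOF

bash "$tmp_script"          # executed in a child process
after_child=$DEMO_FLAGS     # still the original value in this shell
. "$tmp_script"             # sourced: executed in the current shell
after_source=$DEMO_FLAGS    # now the modified value
rm -f "$tmp_script"

printf 'after child:  [%s]\n' "$after_child"
printf 'after source: [%s]\n' "$after_source"
```

Only the sourced run changes the variable in the shell that is still running afterwards.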
Try with indirect expansion:
for i in $(env | grep ipo | awk 'BEGIN {FS="="} ; { print $1 } ' )
do
echo $(sed -e "s/-ipo/ / ; s/-axAVX/ /" <<< ${!i})
done
I think the bit you are missing is the ${!i} to expand the variable called whatever $i is set to..
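A minimal sketch of what indirect expansion does, assuming bash; DEMO_CFLAGS is a made-up variable for illustration:

```bash
#!/usr/bin/env bash
# DEMO_CFLAGS is a hypothetical variable for illustration.
DEMO_CFLAGS="-O2 -ipo -axAVX"
name=DEMO_CFLAGS

echo "$name"       # prints DEMO_CFLAGS (the name itself)
echo "${!name}"    # prints -O2 -ipo -axAVX (the value of the named variable)

# Strip the flags from the value and assign it back under the same name.
value=${!name}
value=${value/-ipo/}
value=${value/-axAVX/}
printf -v "$name" '%s' "$value"
echo "[$DEMO_CFLAGS]"
```

printf -v assigns to the variable whose name is stored in name, which mirrors the read side done by ${!name}.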
#!/bin/bash
for i in $(env | grep ipo | awk 'BEGIN {FS="="} ; { print $1 }' )
do
val=$(sed -e "s/-ipo/ / ; s/-axAVX/ /" <<< ${!i})
export ${i}=${val}
echo ${i} is now set to $val
done
... do stuff with new env variables
If you run the script it will change the environment variable for itself and anything it spawns.
When it returns however you will still have the same environment you started with.
$ echo $IPOVAR
blah -ipo -axAVX end # variable starts as this
$ bash env.sh
IPOVAR is now set to blah end # It is changed!
$ echo $IPOVAR
blah -ipo -axAVX end # It's still the same.
I believe you can do it all in awk:
env | grep ipo | awk 'BEGIN{FS=OFS="="} { gsub("-ipo","",$2); gsub("-axAVX","",$2); print $0}'

Bash Script sed command not working correctly with file passed through command line

Problem
I am trying to write a script that renames a large number of files according to a regex. The command works fine when I type it into iTerm2, but the same command fails to do the work inside the script.
Also, some of my file names include Chinese and Korean characters (I don't know whether that is the problem or not).
code
My code takes three inputs: the old regex, the new regex and the files that need to be renamed.
Here is the code:
#!/bin/bash
# we have less than 3 arguments. Print the help text:
if [ $# -lt 3 ] ; then
cat << HELP
ren -- renames a number of files using sed regular expressions USAGE: ren 'regexp'
'replacement' files...
EXAMPLE: rename all *.HTM files into *.html:
ren 'HTM' 'html' *.HTM
HELP
exit 0
fi
OLD="$1"
NEW="$2"
# The shift command removes one argument from the list of
# command line arguments.
shift
shift
# $@ now contains all the files:
for file in "$@"; do
if [ -f "$file" ] ; then
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
if [ -f "$newfile" ]; then
echo "ERROR: $newfile exists already"
else
echo "renaming $file to $newfile ..."
mv "$file" "$newfile"
fi
fi
done
I register the bash command in the .profile as:
alias ren="bash /pathtothefile/ren.sh"
Test
The original file name is "제01과.mp3" and I want it to become "第01课.mp3".
So with my script I use:
$ ren "제\([0-9]*\)과" "第\1课" *.mp3
And it seems that the sed in the script has not worked successfully.
But the following, which is exactly the same, works and replaces the name:
$ echo "제01과.mp3" | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
Any thoughts? Thx
Print the result
I have made the following change in the script so that it prints what it is doing:
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
echo "The ${file} is changed to ${newfile}"
And the result for my test is:
The 제01과.mp3 is changed into 제01과.mp3
ERROR: 제01과.mp3 exists already
So there is no format problem.
Updating(all done under bash 4.2.45(2), Mac OS 10.9)
Testing
When I run the commands directly in bash, with the for loop, something interesting happens. I first stored all the names in a files.txt file using:
$ ls | grep mp3 > files.txt
and then ran sed over them. A single command in interactive bash, like:
$ file="제01과.mp3"
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
gives
第01课.mp3
whereas the following, also in interactive mode:
files=`cat files.txt`
for file in $files
do
echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
done
gives no changes!
And by now:
echo $file
gives:
$ 제30과.mp3
(There are only 30 files)
Problem Part
And I tried the first command which worked before:
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
It gives no changes as:
$ 제30과.mp3
So I create a new newfile and tried again as:
$ newfile="제30과.mp3"
$ echo $newfile | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
And it gives correctly:
$第30课.mp3
WOW ORZ... Why! Why ! Why! And I try to see whether file and newfile are the same, and of course, they are not:
if [[ $file == $new ]]; then
echo True
else
echo False
fi
gives:
False
My guess
I guess there is some encoding problem, but I have found no reference. Could anyone help? Thanks again.
Update 2
I now see that there is a big difference between a string literal and a file name. Specifically, if I set a variable directly, like:
file="제30과.mp3"
in the script, sed works fine. However, if the variable comes from $@, or is set from a glob like:
file=./*mp3
then sed fails to work. I don't know why. By the way, the Mac sed has no -r option, and on Ubuntu -r does not solve the problem I describe above.
Several issues are combined here:
To write groups as plain ( ) instead of \( \), you need extended regex: -r in GNU sed (-E in BSD/macOS sed), -E in grep
escaping correctly is a beast :)
Example
files="제2과.mp3 제30과.mp3"
for file in $files
do
echo $file | sed -r 's/제([0-9]*)과\.mp3/第\1课.mp3/g'
done
outputs
第2课.mp3
第30课.mp3
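For comparison, the same kind of substitution can be written in both regex syntaxes. A small sketch with an ASCII file name (lesson01.mp3 is made up) so that it does not depend on the terminal's encoding; -E is accepted by both GNU sed and BSD/macOS sed:

```bash
#!/usr/bin/env bash
# Basic regex (BRE): groups are written \( \).
bre() { sed 's/lesson\([0-9]*\)\.mp3/unit\1.mp3/'; }
# Extended regex (ERE, -E): groups are written ( ).
ere() { sed -E 's/lesson([0-9]*)\.mp3/unit\1.mp3/'; }

echo 'lesson01.mp3' | bre   # prints unit01.mp3
echo 'lesson01.mp3' | ere   # prints unit01.mp3
```

Single quotes around the whole sed expression keep the shell from touching the backslashes, which removes one layer of escaping.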
If you are not doing this as a programming project, but want to skip ahead to the part where it just works, I found these resources listed at http://www.tldp.org/LDP/GNU-Linux-Tools-Summary/html/x4055.htm:
MMV (and MCP, MLN, ...) utilities use a specialized syntax to perform bulk file operations on paths. (http://linux.maruhn.com/sec/mmv.html)
mmv before\*after.mp3 Before\#1After.mp3
Esomaniac, a Java alternative that also works on Windows, is apparently dead (home page is parked).
rename is a perl script you can download from CPAN: https://metacpan.org/release/File-Rename
rename 's/\.JPG$/.jpg/' *.JPG

BASH script to *create* filenames with spaces from filenames with "%20"

First, I know this sounds ass backwards. It is. But I'm looking to convert (on the BASH command line) a bunch of script-generated thumbnail filenames that do have a "%20" in them to the equivalent filenames with spaces. In case you're curious, the reason is that the script I'm using created the thumbnail filenames from their current URLs, and it added the %20 in the process. But now WordPress is looking for files like "This%20Filename.jpg" and the browser is, of course, removing the escape sequence and replacing it with a space. Which is why one shouldn't have spaces in filenames.
But since I'm stuck here, I'd love to convert my existing thumbnails over. Next, I will post a question for help fixing the problem in the script mentioned above. What I'm looking for now is a quick script to do the bad thing and create filenames with spaces out of filenames with "%20"s.
Thanks!
If you only want to replace each literal %20 with one space:
for i in *; do
mv "$i" "${i//\%20/ }"
done
(for instance this will rename file%with%20two%20spaces to file%with two spaces).
You'll probably need to apply %25->% too though, and other similar transforms.
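For the general case, every %hh sequence can be decoded in one pass by turning each % into \x and letting printf interpret the result. A sketch, assuming bash and file names without literal backslashes; urldecode is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# urldecode STRING: print STRING with every %hh sequence decoded.
# Hypothetical helper; assumes STRING contains no literal backslashes.
urldecode() {
    # Turn each % into \x, then let printf %b interpret the \xhh escapes.
    printf '%b\n' "${1//%/\\x}"
}

urldecode 'This%20Filename.jpg'    # prints This Filename.jpg
urldecode '100%25%20done.txt'      # prints 100% done.txt
```

In a rename loop this would be used as mv -- "$f" "$(urldecode "$f")", ideally after printing the pairs first to check them.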
convmv can do this, no script needed.
$ ls
a%20b.txt
$ convmv --unescape *.txt --notest
mv "./a%20b.txt" "./a b.txt"
Ready!
$ ls
a b.txt
Personally, I don't like file names with spaces. Beware that you will have to treat them specially in future scripts. Anyway, here is a script that will do what you want to achieve.
#!/bin/sh
for fname in *%20*
do
newfname=$(printf '%s\n' "$fname" | sed 's/%20/ /g')
mv -- "$fname" "$newfname"
done;
Put this in a file, add execute permission, and run it from the directory that contains the files with %20 in their names.
Code :
#!/bin/bash
# This is where your files currently are
DPATH="/home/you/foo/*.txt"
# This is where your new files will be created
BPATH="/home/you/new_foo"
TFILE="/tmp/out.tmp.$$"
[ ! -d $BPATH ] && mkdir -p $BPATH || :
for f in $DPATH
do
if [ -f $f -a -r $f ]; then
/bin/cp -f $f $BPATH
sed "s/%20/ /g" "$f" > $TFILE && mv $TFILE "$f"
else
echo "Error: Cannot read $f"
fi
done
/bin/rm $TFILE
Not bash, but for the more general case of %hh (encoded hex) in names.
#!/usr/bin/perl
foreach $c(@ARGV){
$d=$c;
$d=~s/%([a-fA-F0-9][a-fA-F0-9])/my $a=pack('C',hex($1));$a="\\$a"/eg;
print `mv $c $d` if ($c ne $d);
}