Copy files based on a timestamp value in a file name - regex

All the files reside in one folder. File names look like the following:
1695_6892_20160321000000_20160321235959.file.name.csv.gz
The third substring (after the second _) is a timestamp.
How do I copy all files with a timestamp earlier than 20150531000000 to another folder, my_folder?

Try this:
for i in *.gz; do test "$(echo "$i" | cut -d _ -f 3)" -lt 20150531000000 && cp "$i" my_folder; done

You can also use awk:
for i in $(ls -1 org_folder | awk -F"_" '{ if ($3 < 20150531000000) print $0 }'); do cp "org_folder/$i" my_folder/; done

ls | awk -F'_' '$3<20150531000000{print}'
should list the files you want to copy, so
for f in $(ls | awk -F'_' '$3<20150531000000{print}'); do cp "$f" elsewhere/; done
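The loops above parse the output of ls, which breaks as soon as a file name contains whitespace. A minimal sketch of a glob-based alternative, assuming bash; the function name copy_older_than and its argument order are hypothetical, not part of the original answers. It extracts the third "_"-separated field with parameter expansion and compares it numerically.

```shell
#!/usr/bin/env bash
# copy_older_than CUTOFF DEST: copy *.gz files from the current directory
# whose third "_"-separated field (the start timestamp) is below CUTOFF.
# The function name and argument order are illustrative assumptions.
copy_older_than() {
    local cutoff=$1 dest=$2 f rest ts
    mkdir -p -- "$dest" || return
    for f in *_*_*_*.gz; do
        [ -e "$f" ] || continue          # the glob matched nothing
        rest=${f#*_}                     # drop the first field
        rest=${rest#*_}                  # drop the second field
        ts=${rest%%_*}                   # keep only the third field
        case $ts in
            ''|*[!0-9]*) continue ;;     # skip names with a non-numeric field
        esac
        if [ "$ts" -lt "$cutoff" ]; then
            cp -- "$f" "$dest"/
        fi
    done
}

# Example: copy_older_than 20150531000000 my_folder
```

Because the loop iterates over the glob directly, each file name stays a single word regardless of the characters it contains.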

Related

How to find specific text in a text file, and append it to the filename?

I have a collection of plain text files which are named as yymmdd_nnnnnnnnnn.txt, which I want to append another number sequence to the filenames, so that they each become named as yymmdd_nnnnnnnnnn_iiiiiiiii.txt instead, where the iiiiiiiii is taken from the one line in each file which contains the text "GST: 123456789⏎" (or similar) at the end of the line. While I am sure that there will only be one such matching line within each file, I don't know exactly which line it will be on.
I need an elegant one-liner solution that I can run over the collection of files in a folder, from a bash script file, to rename each file in the collection by appending the specific GST number for each filename, as found within the files themselves.
Before even getting to the renaming stage, I have encountered a problem with this. Here is what I tried, which didn't work...
# awk '/\d+$/' | grep -E 'GST: ' 150101_2224567890.txt
The grep command alone works perfectly to find the relevant line within the file, but the awk doesn't return just the final digits group. It fails with the error "warning: regexp escape sequence \d is not a known regexp operator". I had assumed that this regex should return any number of digits which are at the end of the line. The text file in question contains a line which ends with "GST: 112060340⏎". Can someone please show me how to make this work, and maybe also to help with the appropriate coding to move the collection of files to the new filenames? Thanks.
Thanks to a comment from @Renaud, I now have the following code working to obtain just the GST registration number from within a text file, which puts me a step closer towards a workable solution.
awk '/GST: / {printf $NF}' 150101_2224567890.txt
I still need to loop this over the collection instead of just specifying one filename. I also need to be able to use the output from @Renaud's contribution, to rename the files. I'm getting closer to a working solution, thanks!
This awk should work for you:
awk '$1=="GST:" {fn=FILENAME; sub(/\.txt$/, "", fn); print "mv", FILENAME, fn "_" $2 ".txt"; nextfile}' *_*.txt | sh
To make it more readable:
awk '$1 == "GST:" {
fn = FILENAME
sub(/\.txt$/, "", fn)
print "mv", FILENAME, fn "_" $2 ".txt"
nextfile
}' *_*.txt | sh
Remove | sh from above to see all mv commands together.
You may try
for f in *_*.txt; do echo mv "$f" "${f%.txt}_$(sed '/.*GST: /!d; s///; q' "$f").txt"; done
Drop the echo if you're satisfied with the output.
As you are sure there is only one matching line, you can try:
$ n=$(awk '/GST:/ {print $NF}' 150101_2224567890.txt)
$ mv 150101_2224567890.txt "150101_2224567890_$n.txt"
Or, for all .txt files:
for f in *.txt; do
n=$(awk '/GST:/ {print $NF}' "$f")
if [[ -z "$n" ]]; then
printf '%s: GST not found\n' "$f"
continue
fi
mv "$f" "${f%.txt}_$n.txt"
done
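The \d warning in the question arises because awk regular expressions are POSIX extended regular expressions, which have no \d shorthand; [0-9] is the portable spelling. A minimal sketch that combines that fix with the rename loop, assuming bash and that each file has one line ending in "GST: <digits>"; the function names gst_number and rename_with_gst are hypothetical.

```shell
#!/usr/bin/env bash
# gst_number FILE: print the digits that end the first "GST: <digits>" line.
# [0-9]+ is used because POSIX awk does not understand \d.
gst_number() {
    awk '/GST: [0-9]+$/ { print $NF; exit }' "$1"
}

# rename_with_gst FILE...: append "_<GST number>" before the .txt extension.
# Files without a GST line are reported on stderr and left untouched.
rename_with_gst() {
    local f n
    for f in "$@"; do
        n=$(gst_number "$f")
        if [ -z "$n" ]; then
            printf '%s: GST not found\n' "$f" >&2
            continue
        fi
        mv -- "$f" "${f%.txt}_$n.txt"
    done
}

# Example: rename_with_gst ./*_*.txt
```

Passing the files as ./*_*.txt keeps a name that starts with "-" or contains "=" from being misread as an option or an awk assignment.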
Another one-line solution to consider, although perhaps not so elegant.
for original_filename in *_*.txt; do \
new_filename=${original_filename%'.txt'}_$(
grep -E 'GST: ' "$original_filename" | \
sed -E 's/.*GST//g; s/[^0-9]//g'
)'.txt' && \
mv "$original_filename" "$new_filename"; \
done
Output:
150101_2224567890_123456789.txt
If you are open to a multi line script:-
#!/bin/sh
for f in *.txt; do
prefix=$(echo "${f}" | sed s'#\.txt##')
cp "${f}" f1
sed -i s'#GST#%GST#' "./f1"
cat "./f1" | tr '%' '\n' > f2
number=$(cat "./f2" | sed -n '/GST/'p | cut -d':' -f2 | tr -d ' ')
newname="${prefix}_${number}.txt"
mv -v "${f}" "${newname}"
rm -v "./f1"
rm -v "./f2"
done
In general, if you want to make your files easy to work with, then leave as many potential places for them to be split with newlines as possible. It is much easier to alter files by simply being able to put what you want to delete or print on its own line, than it is to search for things horizontally with regular expressions.

Bash multiple copy latest files with timestamp using regex

Please, I need help with renaming multiple files. One of our applications generates 3 reports every day with the file mask OPEN_REPORTn_yyyymmddHH24Miss.csv, e.g. a listing like this one:
/mnt/server/OPEN_REPORT1_20180604130922.csv
/mnt/server/OPEN_REPORT2_20180604130922.csv
/mnt/server/OPEN_REPORT3_20180604130922.csv
I want to copy these files as
/mnt/server/OPEN_REPORT1.csv
/mnt/server/OPEN_REPORT2.csv
/mnt/server/OPEN_REPORT3.csv
and keep the original files under their original names (which means that I must pick only the 3 latest files).
I have this solution:
cp $(ls -t /mnt/server/OPEN_REPORT1_* | head -n1) /mnt/server/OPEN_REPORT1.csv
cp $(ls -t /mnt/server/OPEN_REPORT2_* | head -n1) /mnt/server/OPEN_REPORT2.csv
cp $(ls -t /mnt/server/OPEN_REPORT3_* | head -n1) /mnt/server/OPEN_REPORT3.csv
But this solution is not very efficient because I'm using more cp commands than I need. I want to copy those files with a single cp command and with regular expressions.
I'm trying a solution like this one:
for file in $(ls -t /mnt/server/OPEN_REPORT?_??????????????.csv | head -n3); do echo ${file} | sed 's/OPEN_REPORT([0-9]{1})/$1/'; done
but the result of echo doesn't look right.
Can anyone help with a solution? Thanks for any advice.
SOLUTION (thanks to David Peltier):
for file in $(ls -t /mnt/server/OPEN_REPORT?_??????????????.csv | head -n3); do cp $file ${file%_*}.csv; done
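A sketch of a variant that avoids parsing ls, assuming bash and assuming the timestamp in the name (rather than the file's modification time) identifies the latest report. Since the timestamp has a fixed width, the sorted glob expansion is already chronological, so the last match per report number is the newest. The function name copy_latest_reports is hypothetical.

```shell
#!/usr/bin/env bash
# copy_latest_reports DIR: for report numbers 1..3, copy the file with the
# greatest timestamp in its name to DIR/OPEN_REPORTn.csv.
# The function name is an illustrative assumption.
copy_latest_reports() {
    local dir=$1 n f latest
    for n in 1 2 3; do
        latest=
        for f in "$dir"/OPEN_REPORT"$n"_??????????????.csv; do
            # Globs expand in sorted order, so the last existing match wins.
            [ -e "$f" ] && latest=$f
        done
        # ${latest%_*} strips "_<timestamp>.csv" from the end of the path.
        [ -n "$latest" ] && cp -- "$latest" "${latest%_*}.csv"
    done
    return 0
}

# Example: copy_latest_reports /mnt/server
```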
Try this:
for file in /mnt/server/OPEN_REPORT?_*.csv; do cp "$file" "${file%_*}.csv"; done
Bash can do replacement and you no longer need to use sed.
${var%Pattern}, ${var%%Pattern}
${var%Pattern} Remove from $var the shortest part of $Pattern that matches the back end of $var.
${var%%Pattern} Remove from $var the longest part of $Pattern that matches the back end of $var.
https://www.tldp.org/LDP/abs/html/parameter-substitution.html
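To make the difference between the shortest and longest match concrete, here is a small sketch applied to one of the report paths from the question. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
file=/mnt/server/OPEN_REPORT1_20180604130922.csv

short=${file%_*}     # shortest suffix matching "_*" removed
long=${file%%_*}     # longest suffix matching "_*" removed
base=${file##*/}     # longest prefix matching "*/" removed (directory part)

printf '%s\n' "$short" "$long" "$base"
# /mnt/server/OPEN_REPORT1
# /mnt/server/OPEN
# OPEN_REPORT1_20180604130922.csv
```

The single % is the right choice for the copy loop, because the name itself contains an underscore that %% would also consume.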

Correcting file numbers using bash

I have a bunch of file names in a folder like this:
test_07_ds.csv
test_08_ds.csv
test_09_ds.csv
test_10_ds.csv
...
I want to decrease the number of every file, so that these become:
test_01_ds.csv
test_02_ds.csv
test_03_ds.csv
test_04_ds.csv
...
Here's what I came up with:
for i in $1/*; do
n=${i//[^0-9]/};
n2=`expr $n - 6`;
if [ $n2 -lt 10 ]; then
n2="0"$n2;
fi
n3=`echo $i | sed -r "s/[0-9]+/$n2/"`
echo $n3;
cp $i "fix/$n3";
done;
Is there a cleaner way of doing this?
This might help:
shopt -s extglob
for i in test_{07..10}_ds.csv; do
IFS=_ read s m e <<<"$i"; # echo "Start=$s Middle=$m End=$e"
n=${m#+(0)} # Remove leading zeros to
# avoid interpretation as octal number.
n=$((n-6)) # Subtract 6.
n=$(printf '%02d' "$n") # Format `n` with a leading 0.
# comment out the next echo to actually execute the copy.
echo \
cp "$i" "fix/${s}_${n}_${e}";
done;
Or collapsing it all together
#!/bin/bash
shopt -s extglob
for i in ${1:-.}/*; do # $1 will default to pwd `.`
IFS=_ read s m e <<<"$i"; # echo "Start=$s Middle=$m End=$e"
n=$(printf '%02d' "$((${m#+(0)}-6))")
cp "$i" "fix/${s}_${n}_${e}";
done;
You can use awk for simplification:
for f in *.csv; do
mv "$f" "$(awk 'BEGIN{FS=OFS="_"} {$2 = sprintf("%02d", $2-6)} 1' <<< "$f")"
done
Could you please try the following code and let me know if it helps.
awk 'FNR==1{OLD=FILENAME;split(FILENAME, A,"_");A[2]=sprintf("%02d", A[2]-6);NEW=A[1]"_"A[2]"_"A[3];system("mv " OLD " " NEW);close(OLD)}' *.csv
I assumed that your file numbers always start at 07, so I subtract 6 from each of them and pad the result back to two digits. You could also put a complete path in the mv command that is passed to awk's built-in system() function, to move the files to another place. Let me know how it goes.
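Some of the answers above strip leading zeros with an extglob pattern so that 08 and 09 are not rejected as invalid octal numbers. A minimal sketch of another option, assuming bash: the 10# prefix inside arithmetic expansion forces base 10. The function name shifted_name is hypothetical.

```shell
#!/usr/bin/env bash
# shifted_name NAME OFFSET: print NAME with its second "_"-separated field
# reduced by OFFSET and zero-padded to two digits.
# The function name is an illustrative assumption.
shifted_name() {
    local name=$1 offset=$2 start middle end
    IFS=_ read -r start middle end <<< "$name"
    # 10# forces base 10, so "08" and "09" are not parsed as octal.
    printf '%s_%02d_%s\n' "$start" "$((10#$middle - offset))" "$end"
}

# Dry run: prints the cp commands; drop the echo to actually copy.
for f in test_*_ds.csv; do
    [ -e "$f" ] || continue
    echo cp -- "$f" "fix/$(shifted_name "$f" 6)"
done
```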

Bash copy all directory with content that matches a pattern

Is there some way to copy directories, including their contents, using a bash script? For example:
// Suppose there are many directories inside test, such as:
/media/test/
-- en_US
-- file1
-- file 2
-- de_DE
-- file 1
-- SUB-dir1
-- sub file 1
-- file 2
.....
.....
-- Test 1
-- testfile1
-- folder
--- more 1
............
Now I want to copy all the directories (including subdirectories and files)
whose names match the pattern to another location.
--> For example, in the case above I want the directories en_US and de_DE to be copied to another
location, including subdirectories and files.
What I have done/found out so far:
1) The pattern I need is \b\w{2}_\w{2}\b
2) I can list all the directories with:
MYDIR="/media/test/"
DIRS=`ls -l $MYDIR | egrep '^d' | awk '{print $10}'`
for DIR in $DIRS
do
echo ${DIR}
done
Now I need help combining these so that the script copies every directory (including its contents) that matches the pattern to another location.
Thanks in advance.
To selectively copy an entire directory structure to a similar directory structure, while filtering the contents, in a general way your best bet is to archive the original directory and unarchive. For instance, using GNU Tar:
$ mkdir destdir
$ tar -c /media/test/{en_US,de_DE} | tar -C destdir -x --strip-components=1
In this example, the /media/test directory structure is partially recreated under destdir, excluding the /media prefix (thanks to --strip-components=1).
The left side tar archives just the directories/paths which match the pattern that we specified. The archive is produced on that command's standard output, which is piped to the decoding tar on the right hand side. The -C tells it to change to the destination directory. It extracts the files there, removing a leading path component.
$ ls destdir
test
$ ls destdir/test
en_US de_DE
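A variant of the same tar pipeline, as a sketch: archiving relative to the source directory with -C keeps absolute paths out of the archive, so the selected directories land directly under the destination and --strip-components is not needed. The function name copy_subtrees is hypothetical.

```shell
#!/usr/bin/env bash
# copy_subtrees SRC DEST NAME...: recreate the named subdirectories of SRC
# under DEST, preserving their contents.
# The function name and argument order are illustrative assumptions.
copy_subtrees() {
    local src=$1 dest=$2
    shift 2
    mkdir -p -- "$dest" || return
    # Left tar: archive the named entries relative to SRC, write to stdout.
    # Right tar: change into DEST and extract the archive from stdin.
    tar -C "$src" -cf - "$@" | tar -C "$dest" -xf -
}

# Example: copy_subtrees /media/test destdir en_US de_DE
```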
Of course, your specific example test case is quite easily handled with cp -a:
$ mkdir destdir
$ cp -a /media/test/{en_US,de_DE} destdir
If the pattern is complicated, involving multiple selections of subtree material at deeper and/or different levels of the source directory hierarchy, then you need the more general approach, if you wish to do the copy in a single batch command which just specifies source patterns.
I'm not sure about your environment, but I guess you try to do this:
cp -r src_dir/??_?? dest_dir
Here is your starter for 10:
You will have to add the extra checks and balances that you require but it should give you a flying start.
#!/bin/bash
# assumes $1 is the source to search and $2 is the destination to copy to
subdirs=$(find "$1" -type d -name '??_??' -print)
echo $subdirs
for x in $subdirs
do
echo $x
cp -a $x $2
done
Please check whether this is what you wanted. It searches for directories whose names have the format xx_yy, ab_cd, &&_$$ (two characters, an underscore, two characters) and copies their contents to a new directory.
usage : ./script.sh
cat script.sh
#!/bin/bash
MYDIR="/media/test/"
NEWDIRPATH="/media/test_new"
DIRS=`ls -l $MYDIR | grep "^d" | awk '{print $9}'`
for DIR in $DIRS
do
total_characters=`echo $DIR | wc -m`
if [ $total_characters -eq 6 ]; then
has_underscore=`echo "$DIR" | grep "_"`
if [ "$has_underscore" != "" ]; then
echo "${DIR}"
start_string_count=`echo $DIR | awk -F '_' '{print $1}' | wc -m`
end_string_count=`echo $DIR | awk -F '_' '{print $2}' | wc -m`
echo "start_string_count => $start_string_count ; end_string_count => $end_string_count"
if [ $start_string_count -eq 3 ] && [ $end_string_count -eq 3 ]; then
mkdir -p $NEWDIRPATH/"$DIR"_new
cp -r "$MYDIR$DIR" $NEWDIRPATH/"$DIR"_new
fi
fi
fi
done
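The character counting above can be replaced by a single regular-expression test. A minimal sketch, assuming bash and assuming the wanted names are exactly two letters, an underscore, and two letters (stricter than the ??_?? glob, which accepts any characters); the function name copy_locale_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# copy_locale_dirs SRC DEST: copy every direct subdirectory of SRC whose name
# looks like en_US or de_DE into DEST, including all of its contents.
# The function name and the letters-only pattern are assumptions.
copy_locale_dirs() {
    local src=$1 dest=$2 d name
    mkdir -p -- "$dest" || return
    for d in "$src"/*/; do
        [ -d "$d" ] || continue          # the glob matched nothing
        d=${d%/}                         # drop the trailing slash
        name=${d##*/}                    # last path component
        if [[ $name =~ ^[[:alpha:]]{2}_[[:alpha:]]{2}$ ]]; then
            cp -a -- "$d" "$dest"/
        fi
    done
}

# Example: copy_locale_dirs /media/test /media/test_new
```

Iterating over "$src"/*/ keeps directory names with spaces, such as "Test 1", intact while they are being tested.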

grep files with whitespaces in filename

I have a list that contains filenames. Some of the filenames contain whitespace:
./folder/folder/some file name.ext
I need to grep each of these files:
cat filelist | while read i; do grep "pattern" $i; done
Obviously grep fails because of the whitespace:
grep ./folder/folder/some: No such file or directory
grep file: No such file or directory
grep name: No such file or directory
I've tried to escape the whitespace like this:
:%s/some file name/some\ file\ name/g
but no luck.
How can I perform my operation?
Thanks!
You can use this loop:
while read -r i; do grep "pattern" "$i"; done < filelist
The pipe from cat is unnecessary, and without quotes around $i bash splits a name that contains spaces into separate arguments.
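A slightly more defensive form of the same loop, as a sketch assuming bash and one path per line in the list. The function name grep_listed is hypothetical; the handling of DOS line endings and of a missing final newline are additions that the loop above does not have.

```shell
#!/usr/bin/env bash
# grep_listed PATTERN LIST: grep PATTERN in every file named in LIST
# (one path per line); paths may contain spaces.
# The function name is an illustrative assumption.
grep_listed() {
    local pattern=$1 list=$2 path
    # IFS= keeps leading and trailing blanks, -r keeps backslashes literal,
    # and the || clause handles a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        path=${path%$'\r'}               # tolerate DOS line endings
        [ -n "$path" ] || continue       # skip blank lines
        grep -H -- "$pattern" "$path"    # quoted: one argument per path
    done < "$list"
}

# Example: grep_listed pattern filelist
```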
Simpler if you use xargs:
xargs -d '\n' -- grep pattern -- < filelist
If you really need to have only 1 file processed per instance of grep, add -n 1:
xargs -n 1 -d '\n' -- grep pattern -- < filelist
You can also use readarray:
readarray -t files < filelist
for f in "${files[@]}"; do grep pattern -- "$f"; done
Or simply
readarray -t files < filelist
grep pattern -- "${files[@]}"
And make sure files are in UNIX format:
sed -i 's|\r||' filelist ## Or
dos2unix filelist
Which you can do directly with process substitution:
readarray -t files < <(exec sed -e 's|\r||' filelist)
xargs -d '\n' -- grep pattern -- < <(exec sed -e 's|\r||' filelist)
Quote $i
cat filelist | while read i; do grep "pattern" "$i"; done
Easy and efficient:
var=$(awk '{ print "\""$0"\""}' filelist)
command="grep \"pattern\" $var"
eval $command
Or if you wanted it as a one liner:
command="grep -ir \"pattern\" $(awk '{ print "\""$0"\""}' <<< "$(ls)")"; eval $command