get file names, time stamps and MD5 checksums from a log file - regex

I want to write a bash script that will take the content of a log file and extract the relevant parts to another log file, which I will use for statistical analysis, for example of the time it takes to send a file.
The content is as follows:
FileSize TimeStamp MD5 Full Path to File
4824597 2013-06-21 11:26 5a264...c11 ...45/.../.../ITAM.xml
4824597 2013-06-20 23:18 5a264...c11 ...48/.../.../1447_rO8iKD.TMP.ITAM.xml
I am trying to extract the TimeStamp and the Full Path to the File.
I am a beginner in scripting but so far I have tried:
cat "/var/log/Customer.log" | grep '2013* *11' >> test.txt
Are there other methods I'm missing? Thank you very much.

If you want to extract the TimeStamp and the Full Path for all entries, then this should work:
awk 'NR>1{print $2,$3,$NF}' inputFile > outputFile
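For reference, here is what that looks like when run on the sample lines from the question (a quick sketch; the data is written to a temporary file first):

```shell
# Recreate the sample data from the question.
cat > inputFile <<'EOF'
FileSize TimeStamp MD5 Full Path to File
4824597 2013-06-21 11:26 5a264...c11 ...45/.../.../ITAM.xml
4824597 2013-06-20 23:18 5a264...c11 ...48/.../.../1447_rO8iKD.TMP.ITAM.xml
EOF
# NR>1 skips the header; $2 is the date, $3 the time, and $NF the
# last field on the line, i.e. the path.
awk 'NR>1 {print $2, $3, $NF}' inputFile
```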

Code for GNU sed:
sed -nr '2,$ {s/\S+\s+(\S+)\s+(\S+)\s+\S+\s+(.*)/\1 \2\t\3/;p}' file
$cat file
FileSize TimeStamp MD5 Full Path to File
4824597 2013-06-21 11:26 5a264...c11 ...45/.../.../ITAM.xml
4824597 2013-06-20 23:18 5a264...c11 ...48/.../.../1447_rO8iKD.TMP.ITAM.xml
$sed -nr '2,$ {s/\S+\s+(\S+)\s+(\S+)\s+\S+\s+(.*)/\1 \2\t\3/;p}' file
2013-06-21 11:26 ...45/.../.../ITAM.xml
2013-06-20 23:18 ...48/.../.../1447_rO8iKD.TMP.ITAM.xml

Looks like this is what you want:
awk '$2 ~ /^2013/ && $4 ~ /11$/ { print $2, $3, $NF; }' /var/log/Customer.log > test.txt
$2 ~ /^2013/ matches dates beginning with 2013
$4 ~ /11$/ matches MD5 ending with 11
print $2, $3, $NF prints fields 2 (date), 3 (time), and the last field (pathname)
If these regular expressions are confusing to you, go to Regular-Expressions.info and read the tutorial.

Assuming the columns are tab-separated, you can just use cut:
cut -f2,4 /var/log/Customer.log | grep -v MD5 >> test.txt
will append columns 2 and 4 (counting starts at 1) to test.txt. The header line containing MD5 is filtered out by the grep invocation.

You can do it like this:
awk 'NR!=1 {print $2 " " $3 "\t" $5}' Customer.log > stat.txt

Related

egrep to find largest suffix for file

There are files like this:
Report.cfg
Report.cfg.1
Report.cfg.2
Report.cfg.3
I want to fetch the max suffix, if it exists (i.e. 3), using egrep.
If I try simple egrep:
ls | egrep Report.cfg.*
I get the full file name and the whole list, not the suffix only.
What could be an optimized egrep?
You can use this awk to find the greatest number in a list of files ending with a dot and a number:
printf '%s\n' *.cfg.[0-9] | awk -F '.' '$NF > max{max = $NF} END{print max}'
3
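Note that the [0-9] glob only matches single-digit suffixes. A variant that also handles multi-digit suffixes (e.g. Report.cfg.10) might look like this (a sketch; $NF+0 forces a numeric comparison so that 10 beats 9):

```shell
# Glob over all suffixes, keep only purely numeric ones in awk,
# and compare numerically rather than as strings.
printf '%s\n' Report.cfg.* |
awk -F '.' '$NF ~ /^[0-9]+$/ && $NF+0 > max { max = $NF + 0 } END { print max }'
```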

Renaming files by using a prefix from text file

I have a set of files named
sample_exp_A1_A01
sample_exp_A2_A02
sample_exp_A3_A03
sample_exp_A4_A04
sample_exp_A5_A05
And I have a text file with the following values
A01 170
A02 186
A03 165
A04 130
A05 120
I would like to rename the files based on the text file values like
FS_170_sample_exp_A1_A01
FS_186_sample_exp_A2_A02
FS_165_sample_exp_A3_A03
FS_130_sample_exp_A4_A04
FS_120_sample_exp_A5_A05
So match the IDs in the text file (A01, A02, A03, A04, A05) and add the corresponding number as a prefix to the filenames. In addition, prefix all file names with FS, as shown above.
I tried doing it manually, but could only handle one file at a time this way:
rename 's/^/FS_170_/' *A01
to get
FS_170_sample_exp_A1_A01
Assuming your suffixes are in a file named suffix, you can do this:
for fname in sample*; do
echo mv "$fname" FS_"$(awk -v pf="${fname##*_}" \
'$1 == pf {print $2}' suffix)"_"$fname"
done
It loops over all your files; in the loop, it puts together the new file name by prepending FS_ and the output of
awk -v pf="${fname##*_}" '$1 == pf {print $2}' suffix
This assigns the last part of the input file name to the awk variable pf, and then, for lines in suffix where the first field matches that variable, prints the second field.
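To see what ${fname##*_} does in isolation:

```shell
# '##*_' strips the longest leading match of '*_', i.e. everything
# up to and including the last underscore.
fname=sample_exp_A1_A01
echo "${fname##*_}"    # prints: A01
```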
Alternatively, if you have a grep that supports Perl compatible regular expressions, you can use grep -Po "${fname##*_} \K.*" suffix instead (using a variable-sized look-behind, \K):
for fname in sample*; do
echo mv "$fname" FS_"$(grep -Po "${fname##*_} \K.*" suffix)"_"$fname"
done
The grep output is inserted into the new filename; the rest of the new name is the complete old name.
For your input files, this results in
mv sample_exp_A1_A01 FS_170_sample_exp_A1_A01
mv sample_exp_A2_A02 FS_186_sample_exp_A2_A02
mv sample_exp_A3_A03 FS_165_sample_exp_A3_A03
mv sample_exp_A4_A04 FS_130_sample_exp_A4_A04
mv sample_exp_A5_A05 FS_120_sample_exp_A5_A05
To actually rename the files, the echo has to be removed.
If suffix is gigantic, you can accelerate this by having awk exit after the first match:
awk -v pf="${fname##*_}" '$1 == pf {print $2; exit}' suffix
or grep stop after the first match:
grep -m 1 -Po "${fname##*_} \K.*" suffix
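If you would rather avoid one awk or grep call per file entirely, a bash-only sketch (assuming bash 4+ for associative arrays, and the file names from the question) could read suffix just once:

```shell
#!/bin/bash
# Read the lookup file once into an associative array: id -> number.
declare -A num
while read -r id n; do
    num[$id]=$n
done < suffix

# Build each new name from the array; drop 'echo' to actually rename.
for fname in sample*; do
    echo mv "$fname" "FS_${num[${fname##*_}]}_$fname"
done
```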

sed/awk specify ranges in regex

I have a file list output like this:
/path/F201405151800
/path/F201405151900
/path/F201405152000
/path/F201405152100
I piped this output to sed and used the following syntax:
sed -n '/F.\{8\}'$var1'/,/F.\{8\}'$var2'/p'
$var1 and $var2 are user inputs and, as can be seen, they refer to hours of the day in my file list. The above syntax works perfectly if the values of $var1 and $var2 are found. But if the value of $var1 is 16 and $var2 is 19, sed will not output anything, because 16 is not found in the above file list.
A solution to this was:
sed -n '/F.\{8\}1[6-9]/p'
...which works but the issue I am facing now is how to specify double digit ranges in order to include something like: 16-20. I tried globbing between single quotes (like I'm doing with variables) like this:
sed -n '/F.\{8\}'{16..21}'/p'
...but the output I get is:
sed: can't read /F.\{8\}16/p: No such file or directory
sed: can't read /F.\{8\}17/p: No such file or directory
sed: can't read /F.\{8\}18/p: No such file or directory
sed: can't read /F.\{8\}19/p: No such file or directory
sed: can't read /F.\{8\}20/p: No such file or directory
sed: can't read /F.\{8\}21/p: No such file or directory
I don't really need to use sed. I explored some options with awk but could not obtain what I want, the main issue being that I can't figure out how to specify a regex RS so that I have the hours block as an awk field and can apply a condition like
'$2 > 16 && $2 < 21 {print}'
You can use this awk:
awk -F'/' '{h = $3; gsub(/^F[0-9]{8}|[0-9]{2}$/, "", h)} h+0 >= 16 && h+0 <= 21' file
/path/F201405151800
/path/F201405151900
/path/F201405152000
/path/F201405152100
In order to have double-digit ranges you can use alternation, which requires extended regular expressions (-r in GNU sed, -E in BSD sed):
sed -rn '/F.{8}(1[6-9]|2[01])/p'
Try to add -e:
sed -n -e '/F.\{8\}'"$var1"'/,/F.\{8\}'"$var2"'/p'
And always quote the variables to prevent word splitting.
Update:
awk -v A=17 -v B=21 -F/ '!/\/F[0-9]{12}$/{next} {h = substr($NF, 10, 2)} h >= A && h <= B'
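Putting the substr approach together with the sample list (a sketch; the hour is characters 10-11 of the F-name, since it is F + yyyymmdd + hh + mm):

```shell
# Recreate the sample list from the question.
printf '%s\n' /path/F201405151800 /path/F201405151900 \
              /path/F201405152000 /path/F201405152100 > files.txt
# Extract the hour and keep lines whose hour lies within [A, B];
# '+ 0' forces a numeric rather than string comparison.
awk -v A=16 -v B=21 -F/ '{h = substr($NF, 10, 2) + 0} h >= A && h <= B' files.txt
```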

sed or awk regex, stop matching after semi-colon

I have a multiple strings in a file that looks like this
TXT 20131101 094502,20131101 094502,Fri Nov 1 09:45:02 UTC 2013;
I want a regex that will get everything after TXT and only display that up until the ; using sed or awk
I have tried many ways but I can't seem to get it to stop at the ;.
Thanks for any help.
I want a regex that will get everything after TXT and only display
that up until the ;
grep -oP 'TXT[^;]*' filename
Using awk:
awk -F';' '{print $1}' filename
Using sed:
sed 's/\([^;]*\).*/\1/' filename
sed "s/TXT\([^;]*\);.*/\1/"
This captures everything between TXT (thus including the first space, if any) and the first ; (not included).
The reply from devnull includes the "TXT" in the output.
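If the leading TXT should not appear in the output, GNU grep's -P mode offers \K, which discards everything matched before it (a sketch, assuming a grep built with PCRE support):

```shell
# \K resets the start of the reported match, so 'TXT ' itself is
# dropped and only the text up to (not including) ';' is printed.
echo 'TXT 20131101 094502,20131101 094502,Fri Nov 1 09:45:02 UTC 2013;' |
grep -oP 'TXT ?\K[^;]*'
```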

AWK replace $0 of second file when match few columns

How do I merge two files when the first two columns match in both files, and replace the first file's values with the second file's columns? What I mean:
Same number of columns:
FILE 1:
121212,0100,1.1,1.2,
121212,0200,2.1,2.2,
FILE 2:
121212,0100,3.1,3.2,3.3,
121212,0130,4.1,4.2,4.3,
121212,0200,5.1,5.2,5.3,
121212,0230,6.1,6.2,6.3,
OUTPUT:
121212,0100,3.1,3.2,3.3,
121212,0200,5.1,5.2,5.3,
In other words, I need to print $0 of the second file when $1 and $2 match in both files. I understand the logic but I can't implement it using arrays, which apparently should be used.
Please take a moment to explain any code.
Use awk to print the first 2 fields of file1 as patterns and pipe them to grep to do the match:
$ awk 'BEGIN{OFS=FS=","}{print $1,$2}' file1 | grep -f - file2
121212,0100,3.1,3.2,3.3,
121212,0200,5.1,5.2,5.3,
The -f option tells grep to take the pattern from a file but using - instead of a filename makes grep take the patterns from stdin.
So the first awk script produces the patterns from file1 which we pipe to match against in file2 using grep:
$ awk 'BEGIN{OFS=FS=","}{print $1,$2}' file1
121212,0100
121212,0200
You probably want to anchor the match to the beginning of the line using ^:
$ awk 'BEGIN{OFS=FS=","}{print "^"$1,$2}' file1
^121212,0100
^121212,0200
$ awk 'BEGIN{OFS=FS=","}{print "^"$1,$2}' file1 | grep -f - file2
121212,0100,3.1,3.2,3.3,
121212,0200,5.1,5.2,5.3,
Here's one way using awk:
awk -F, 'FNR==NR { a[$1,$2]; next } ($1,$2) in a' file1 file2
Results:
121212,0100,3.1,3.2,3.3,
121212,0200,5.1,5.2,5.3,
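Since the question asks for explanations, here is the same one-liner written out with comments (a sketch):

```shell
# FNR==NR is true only while awk is reading the first file (file1):
# store each (field1, field2) pair as a key of array a, then 'next'
# skips the rest of the program for that line.
# For lines of file2, '($1,$2) in a' is a bare condition; when it is
# true, awk's default action prints the whole line ($0).
awk -F, 'FNR==NR { a[$1,$2]; next } ($1,$2) in a' file1 file2
```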