I have an HDFS directory with lots of files and subdirectories in it, listed in the following format:
-rw-rw-rw- 3 root xyz <filesize> 2015-04-12 00:34 file1
-rw-rw-rw- 3 root xyz <filesize> 2015-04-11 11:34 file2
-rw-rw-rw- 3 root xyz <filesize> 2015-04-09 09:54 file3
drwxrwxrwx 3 root xyz 0 2015-04-02 00:34 dir
I have an awk script which filters the files from the list using
awk '{ if($1 !~ /d.*/ ) {print $0}}'.
I am using this in a C++ function which has two timestamps in date (yyyy-mm-dd) and time (hh:mm:ss) format.
I would like to put a condition in the awk 'if' that filters the files whose timestamps lie between the two timestamps.
I tried doing
($6 >= startDate) && ($6 <= endDate) && ($7 >= startTime) && ($7 <= endTime),
but this is not working as expected. I am a newbie to awk.
You can use find to simplify this.
find . -newermt "$dt1" ! -newermt "$dt2"
Here is a shell script which I used to test. If you can pass the arguments from within your C++ code, you can shorten this to a one-liner with just the find command.
# Date 1
startDate=2015-04-11
startTime=21:10:00
dt1="$startDate $startTime"
# Date 2
endDate=2015-04-11
endTime=22:10:00
dt2="$endDate $endTime"
find . -newermt "$dt1" ! -newermt "$dt2"
Note: I assume startDate and startTime go together and endDate and endTime go together, but you have the freedom to choose otherwise.
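If you do want to stay in awk (for instance because the listing is the output of hdfs dfs -ls rather than a local directory that find can walk), the usual trick is to compare the date and time as a single concatenated string; plain string comparison works because the format is zero-padded from year down to minute. A minimal sketch, assuming the date and time are in fields 6 and 7 and that /your/dir is a placeholder (the listing only shows minutes, so the comparison is at minute precision):
hdfs dfs -ls /your/dir | awk -v start="2015-04-11 21:10" -v end="2015-04-11 22:10" '
    $1 !~ /^d/ {                      # keep regular files, skip directories
        ts = $6 " " $7                # e.g. "2015-04-12 00:34"
        if (ts >= start && ts <= end) print
    }'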
I have three CSV files containing different data for a common set of objects. These represent data about distinct collections of items at work, and the objects have unique codes. The number of files is not important, so I will set this problem up with two. I have a handy recipe for joining these files using join -- but the cleaning part is killing me.
File A snippet - contains unique data, plus the cataloging error E B.
B 547
J 65
EB 289
E B 1
CO 8900
ZX 7
File B snippet - unique data about a different dimension of the objects.
B 5
ZX 67
SD 4
CO 76
J 54
EB 10
Note that file B contains a code not in common with file A.
Now I submit to you the "official" canon of codes designated for this set of objects:
B
CO
ZX
J
EB
Note that File B contains a non-canonical code with data. It needs to be captured and documented. Same with the bad code in file A.
End goal: run trends and stats on the collections using the various fields from the multiple reports. They mostly match the canon, but there are oddballs due to cataloging errors and codes that are no longer in use.
End goal result after merge/join:
B 547 5
J 65 54
EB 289 10
CO 8900 76
ZX 7 67
So my first idea was to use grep -F -f for this, using the canonical codes as a search list, then merge with join. The problem is that with one-letter codes it's too inclusive. It seems like a job for awk, which can work with tab delimiters and regex the oddball codes, but I'm not sure how to get awk to use a list to sift other files. Will join alone handle all this? Maybe I merge with join or paste, then sift out the weirdos? Which method is the least brittle and most likely to handle edge cases like the drunk cataloger?
If you're thinking, "Dude, this is better done with Perl or Python, etc.", I'm all ears. No rules, I just need to deliver!
Your question says the data is CSV, but based on your samples I'm assuming it's TSV. I'm also assuming E B should end up in the outlier output and that missing values should be filled with 0.
Given those assumptions, the following may be sufficient (-a1 -a2 keep the unpairable lines from both files, and -e0 with -o auto fills the missing field with 0):
sort -t $'\t' -k 1b,1 fileA > fileA.sorted && sort -t $'\t' -k 1b,1 fileB > fileB.sorted
join -t $'\t' -a1 -a2 -e0 -o auto fileA.sorted fileB.sorted > out
grep -f codes out > out-canon
grep -vf codes out > out-oddball
The content of the file codes (anchored so that a one-letter code such as B cannot match EB):
^B\s
^CO\s
^ZX\s
^J\s
^EB\s
Result:
$ cat out-canon
B 547 5
CO 8900 76
EB 289 10
J 65 54
ZX 7 67
$ cat out-oddball
E B 1 0
SD 0 4
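As for getting awk to use a list to sift other files, the usual idiom is to load the list on the first pass (NR==FNR) and test membership on the second. A minimal sketch, assuming a file canon.txt (a made-up name) holding just the bare canonical codes, one per line:
awk -F'\t' 'NR==FNR { canon[$1]; next } ($1 in canon)'  canon.txt out > out-canon
awk -F'\t' 'NR==FNR { canon[$1]; next } !($1 in canon)' canon.txt out > out-oddball
Unlike grep -F -f, the test is an exact match on the whole first field, so a one-letter code like B cannot accidentally match EB.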
Try this (GNU awk):
awk 'BEGIN { FS = OFS = "\t" }
     ARGIND == 1 { c[$1]++ }                     # pass 1: canonical code list
     ARGIND == 2 { b[$1] = $2 }                  # pass 2: values from fileB, keyed by code
     ARGIND == 3 {                               # pass 3: fileA
         if (c[$1]) { print $1, $2, b[$1] + 0; delete b[$1] }
         else if (tolower($1) ~ "[a-z]+ +[a-z]+") print > "error.fileA"
         else print > "oddball.fileA"
     }
     END {
         for (i in b) { print i, 0, b[i] " (? maybe?)"; print i, b[i] > "oddball.fileB" }
     }' codes fileB fileA
It will create error.fileA and oddball.fileA if such lines exist, plus oddball.fileB.
The normal output is not written to a file; redirect it with > yourself once the results look OK:
B 547 5
J 65 54
EB 289 10
CO 8900 76
ZX 7 67
SD 0 4 (? maybe?)
I had a hard time reading your description, so I'm not sure this is what you want; in any case it's easy to improve this awk code.
If ARGIND is not working (it is a GNU awk extension), you can switch to FILENAME=="file1" or FILENAME==ARGV[1] instead.
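For example, here is a trimmed-down portable sketch using FILENAME that only prints the joined canonical rows (the literal file names are the ones used above, so adjust them to your real names):
awk 'BEGIN { FS = OFS = "\t" }
     FILENAME == "codes" { c[$1]++; next }      # first file: canonical code list
     FILENAME == "fileB" { b[$1] = $2; next }   # second file: values keyed by code
     $1 in c { print $1, $2, b[$1] + 0 }        # third file (fileA): keep canonical rows
    ' codes fileB fileA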
I am just learning bash scripting and commands and I need some help with this assignment.
I have a txt file that contains the following text, and I need to:
Extract the guest name (1.1.1, ...)
Sum each guest's results and output the guest name with the result.
I used sed with a simple regex to extract the name and the digits, but I have no idea how to sum the numbers, because each guest has multiple lines in the record, as you can see in the txt file. Note: I can't use awk for processing.
Here is my code:
cat file.txt | sed -E 's/.*([0-9]{1}.[0-9]{1}.[0-9]{1}).*([0-9]{1})/\1 \2/'
And the result is:
1.1.1 4
2.2.2 2
1.1.1 1
3.3.3 1
2.2.2 1
Here is the .txt file:
Guest 1.1.1 have "4
Guest 2.2.2 have "2
Guest 1.1.1 have "1
Guest 3.3.3 have "1
Guest 2.2.2 have "1
and the output should be:
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
Thank you in advance
I know your teacher won't let you use awk, but since beyond this one exercise you're trying to learn how to write shell scripts, FYI here's how you'd really do this job in a shell script:
$ awk -F'[ "]' -v OFS=' = ' '{sum[$2]+=$NF} END{for (id in sum) print id, sum[id]}' file
3.3.3 = 1
2.2.2 = 3
1.1.1 = 5
and here's a bash builtins equivalent which may or may not be what you've covered in class and so may or may not be what your teacher is expecting:
$ cat tst.sh
#!/usr/bin/env bash
declare -A sum
while read -r _ id _ cnt; do
(( sum[$id] += "${cnt#\"}" ))
done < "$1"
for id in "${!sum[@]}"; do
printf '%s = %d\n' "$id" "${sum[$id]}"
done
$ ./tst.sh file
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
See https://www.artificialworlds.net/blog/2012/10/17/bash-associative-array-examples/ for how I'm using the associative array. It'll be orders of magnitude slower than the awk script, and I'm not 100% sure it's bullet-proof (since shell isn't designed to process text, there are a LOT of caveats and pitfalls), but it'll work for the input you provided.
OK -- since this is a class assignment, I will tell you how I did it, and let you write the code.
First, I sorted the file. Then, I read the file one line at a time. If the name changed, I printed out the previous name and count, and set the count to be the value on that line. If the name did not change, I added the value to the count.
The second solution used an associative array to hold the counts, using the guest name as the index. Then you just add each new value to the count in the array element indexed by the guest name.
At the end, loop through the array, print out the indexes and values.
It's a lot shorter.
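For reference only (since the point is for you to write it yourself), a rough bash sketch of that first sort-then-stream approach, taking the file name as its first argument, might look like this:
#!/usr/bin/env bash
prev= total=0
while read -r _ id _ cnt; do
    cnt=${cnt#\"}                                # strip the leading quote from the count
    if [[ $id != "$prev" ]]; then
        [[ -n $prev ]] && echo "$prev = $total"  # name changed: print the previous guest's total
        prev=$id
        total=0
    fi
    (( total += cnt ))
done < <(sort "$1")
[[ -n $prev ]] && echo "$prev = $total"          # print the last guest's total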
Please help me understand. I found a simple example script on another site for updating an RRDTool database.
But I need to create one RRD database for all servers. Please help me understand which way is best, and give me some pointers on how to do this.
Should every server send its data to the RRDTool host, which then updates the database, or should the RRDTool host fetch the data from all the servers and do the updates locally?
#!/bin/sh
a=0
while [ "$a" == 0 ]; do
    snmpwalk -c public 192.168.1.250 hrSWRunPerfMem > snmp_reply
    total_mem=`awk 'BEGIN {tot_mem=0}
                    { if ($NF == "KBytes")
                          {tot_mem=tot_mem+$(NF-1)}
                    }
                    END {print tot_mem}' snmp_reply`
    # I can use N as a replacement for the current time
    rrdtool update target.rrd N:$total_mem
    # sleep until the next 300 seconds are full
    perl -e 'sleep 300 - time % 300'
done # end of while loop
I would like to gzip log files but I cannot work out how to use a regular expression in my command.
My log files look like this; they roll every hour.
-rw-r--r-- 1 aus nds 191353 Sep 28 01:59 fubar.log.20150928-01
-rw-r--r-- 1 aus nds 191058 Sep 28 02:59 fubar.log.20150928-02
-rw-r--r-- 1 aus nds 190991 Sep 28 03:59 fubar.log.20150928-03
-rw-r--r-- 1 aus nds 191388 Sep 28 04:59 fubar.log.20150928-04
The script:
FUBAR_DATE=$(date -d "days ago" +"%Y%m%d ")
fubar_file="/apps/fubar/logs/fubar.log."$AUS_DATE"-^[0-9]"
/bin/gzip $fubar_file
I have tried a few variants of the regex but without success. Can you see the simple error in my code?
Thanks in advance
The shell expands that pattern as a filename glob, not a regex, so the ^ is looked for literally and never matches anything; with a plain glob it works. I did:
$ fubar_file="./fubar.log."${FUBAR_DATE%% }"-[0-9][0-9]"
and it worked for me.
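One caveat worth adding: the variable has to stay unquoted when it is used, otherwise the shell will not expand the [0-9][0-9] glob and gzip will be handed the literal pattern. A quick sketch of the usage:
fubar_file="./fubar.log."${FUBAR_DATE%% }"-[0-9][0-9]"
/bin/gzip $fubar_file    # unquoted on purpose: the glob expands to the matching hourly logs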
Why not make fubar_file an array to hold the matching log file names, and then use a loop to gzip them individually? Then, presuming AUS_DATE contains 20150928:
# FUBAR_DATE=$(date -d "days ago" +"%Y%m%d ") # not needed for gzip
fubar_file=( /apps/fubar/logs/fubar.log.$AUS_DATE-[0-9][0-9] )
for i in "${fubar_file[@]}"; do
gzip "$i"
done
or if you do not need to preserve the filenames in the array for later use, just gzip the files with a for loop:
for i in /apps/fubar/logs/fubar.log.$AUS_DATE-[0-9][0-9]; do
gzip "$i"
done
or, simply use find to match the files and gzip them:
find /apps/fubar/logs -type f -name "fubar.log.$AUS_DATE-[0-9][0-9]" -execdir gzip '{}' +
Note: all answers presume AUS_DATE contains 20150928.
Below is the list in a file (unsorted-file) that needs to be sorted in Linux, preferably with a single-line Linux command.
03123456789abcd
02987654321pqrs
02123456789mnop
03987654321stuv
04123456789ghjk
01000000000
99000000000
97000000000
98000000000
Required sorted file output:
01000000000
02123456789mnop
03123456789abcd
04123456789ghjk
02987654321pqrs
03987654321stuv
97000000000
98000000000
99000000000
Requirements:
If the first two characters are 01, it is the header.
If the first two characters are greater than 90, it is a trailer.
Sort order: positions 3-11, then positions 1-2.
I tried a simple sort command like
$ sort unsorted-file > sorted-file
Requirement 3 failed. Then I tried
$ sort -k 1.3,1.11 -k 1.2 unsorted-file > sorted-file
The trailer records made it to the top of the file because they are all zeros from position 3 onward.
The other option that I know of is to strip out the header and trailers, sort the body, and merge the header and trailer back in. Is there a way to do it in one (complex) Linux command?
Thanks for your time.
-R-
( grep '^01' unsorted-file                                           # header record first
  grep -E -v '^(01|9)' unsorted-file | sort -k 1.3,1.11 -k 1.1,1.2   # body, sorted on positions 3-11 then 1-2
  grep '^9' unsorted-file ) > sorted-file                            # trailer records last
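Another way to keep it to one pipeline (a sketch along the same lines, with the same assumed file names) is to prefix each record with a class key, sort on the key and the required positions, and then strip the key again:
# 0 = header, 1 = body, 2 = trailer; sort on the class, then positions 3-11, then 1-2; drop the class
awk '{ key = /^01/ ? 0 : (/^9/ ? 2 : 1); print key " " $0 }' unsorted-file |
  sort -t ' ' -k1,1n -k2.3,2.11 -k2.1,2.2 |
  cut -d ' ' -f2- > sorted-file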