Using rrdtool to monitor a few servers - rrdtool

Please help me understand. I found a simple script on the official site for updating an RRDtool database, but I need to create one RRD database for all my servers. What is the best way to do this, and could you give me some pointers on how to set it up?
Should every server send its data to the machine holding the RRD database and update it there, or should the machine running rrdtool fetch the data from each server and do the updates locally?
#!/bin/sh
a=0
while [ "$a" = 0 ]; do
    # read the per-process memory usage via SNMP and sum it up
    snmpwalk -c public 192.168.1.250 hrSWRunPerfMem > snmp_reply
    total_mem=$(awk 'BEGIN {tot_mem=0}
        { if ($NF == "KBytes")
            {tot_mem=tot_mem+$(NF-1)}
        }
        END {print tot_mem}' snmp_reply)
    # I can use N as a replacement for the current time
    rrdtool update target.rrd N:$total_mem
    # sleep until the next 300 seconds are full
    perl -e 'sleep 300 - time % 300'
done # end of while loop
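If you go the central-polling route, one possible approach (a rough sketch only, not a tested solution) is to create the RRD with one data source per server and feed all the values in a single update. The host addresses, the all_servers.rrd file name and the mem_* data-source names below are assumptions:
#!/bin/sh
# Sketch: poll several hosts from the machine running rrdtool and store
# the results in one RRD created with one DS per host, e.g.
#   rrdtool create all_servers.rrd --step 300 \
#     DS:mem_host1:GAUGE:600:U:U DS:mem_host2:GAUGE:600:U:U DS:mem_host3:GAUGE:600:U:U \
#     RRA:AVERAGE:0.5:1:8640
HOSTS="192.168.1.250 192.168.1.251 192.168.1.252"   # assumed addresses
while true; do
    values=""
    for host in $HOSTS; do
        mem=$(snmpwalk -c public "$host" hrSWRunPerfMem |
              awk '$NF == "KBytes" {tot += $(NF-1)} END {print tot+0}')
        values="$values:$mem"
    done
    # N = now; the values must appear in the same order as the DS definitions
    rrdtool update all_servers.rrd "N$values"
    # sleep until the next 300 seconds are full
    perl -e 'sleep 300 - time % 300'
done
The push variant (each server running rrdtool update against the central machine, e.g. over SSH) works too; polling from one place just keeps all the timing in a single script.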

Related

Extracting parts of lines with a specific pattern and summing the digits using bash

I am just learning bash scripting and commands and I need some help with this assignment.
I have a txt file that contains the following text and I need to:
Extract the guest name (1.1.1 ...)
Sum each guest's results and output the guest name with the total.
I used sed with a simple regex to extract the name and the digits, but I have no idea how to sum the numbers, because a guest has multiple line records, as you can see in the txt file. Note: I can't use awk for processing.
Here is my code:
cat file.txt | sed -E 's/.*([0-9]{1}.[0-9]{1}.[0-9]{1}).*([0-9]{1})/\1 \2/'
And result is:
1.1.1 4
2.2.2 2
1.1.1 1
3.3.3 1
2.2.2 1
Here is the .txt file:
Guest 1.1.1 have "4
Guest 2.2.2 have "2
Guest 1.1.1 have "1
Guest 3.3.3 have "1
Guest 2.2.2 have "1
and the output should be:
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
Thank you in advance
I know your teacher won't let you use awk but, since beyond this one exercise you're trying to learn how to write shell scripts, FYI here's how you'd really do this job in a shell script:
$ awk -F'[ "]' -v OFS=' = ' '{sum[$2]+=$NF} END{for (id in sum) print id, sum[id]}' file
3.3.3 = 1
2.2.2 = 3
1.1.1 = 5
and here's a bash builtins equivalent which may or may not be what you've covered in class and so may or may not be what your teacher is expecting:
$ cat tst.sh
#!/usr/bin/env bash
declare -A sum                          # associative array: guest name -> running total
while read -r _ id _ cnt; do
    (( sum[$id] += "${cnt#\"}" ))       # strip the leading quote, then add
done < "$1"
for id in "${!sum[@]}"; do
    printf '%s = %d\n' "$id" "${sum[$id]}"
done
$ ./tst.sh file
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
See https://www.artificialworlds.net/blog/2012/10/17/bash-associative-array-examples/ for how I'm using the associative array. It'll be orders of magnitude slower than the awk script, and I'm not 100% sure it's bullet-proof (since the shell isn't designed to process text, there are a LOT of caveats and pitfalls), but it'll work for the input you provided.
OK -- since this is a class assignment, I will tell you how I did it, and let you write the code.
First, I sorted the file. Then I read the file one line at a time. If the name changed, I printed out the previous name and count, and set the count to the value on that line. If the name did not change, I added the value to the count.
My second solution used an associative array to hold the counts, using the guest name as the index. Then you just add the new value to the count in the array element indexed by the guest name.
At the end, loop through the array and print out the indexes and values.
It's a lot shorter.
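For reference, here is a rough sketch of that first (sort-based) approach without awk; it reuses the asker's sed extraction and assumes the input is in file.txt:
#!/usr/bin/env bash
# Extract "name count" pairs, sort so each guest's lines are adjacent,
# then sum consecutive lines that share the same name.
prev='' total=0
while read -r name cnt; do
    if [[ $name == "$prev" ]]; then
        (( total += cnt ))                          # same guest: keep accumulating
    else
        [[ -n $prev ]] && printf '%s = %d\n' "$prev" "$total"
        prev=$name
        total=$cnt                                  # new guest: start a new total
    fi
done < <(sed -E 's/.*([0-9]\.[0-9]\.[0-9]).*"([0-9]+)/\1 \2/' file.txt | sort)
[[ -n $prev ]] && printf '%s = %d\n' "$prev" "$total"   # don't forget the last guest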

How to use grep to parse out columns in csv

I have a log with millions of lines that look like this:
1482364800 bunch of stuff 172.169.49.138 252377 + many other things
1482364808 bunch of stuff 128.169.49.111 131177 + many other things
1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things
1482364900 bunch of stuff 128.169.49.112 849231 + many other things
1482364940 bunch of stuff 128.169.49.218 623423 + many other things
It's so big that I can't really read it into memory for Python to parse, so I want to zgrep out only the items I need into another, smaller file, but I'm not very good with grep. In Python I would normally use gzip.open(log.gz), then pull out data[0], data[4], data[5] into a new file, so my new file only has the epoch, the IP and the count (the IP can be IPv6 or IPv4).
expected result of the new file:
1482364800 172.169.49.138 252377
1482364808 128.169.49.111 131177
1482364810 2001:db8:0:0:0:0:2:1 124322
1482364900 128.169.49.112 849231
1482364940 128.169.49.218 623423
How do I do this zgrep?
Thanks
To select columns you have to use the cut command; zgrep/grep select lines.
So you can use cut like this:
cut -d' ' -f1,2,4
In this example I get columns 1, 2 and 4, with a space ' ' as the column delimiter.
You should know that the -f option is used to specify the column numbers and -d the delimiter.
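Applied to the log in the question, that would look something like this (assuming the fields are space-separated and "bunch of stuff" always has the same number of words, so the timestamp, IP and count sit in fixed columns, here 1, 5 and 6; use gzip -dc instead of zcat if your zcat insists on a .Z suffix, as the next answer notes):
zcat log.gz | cut -d' ' -f1,5,6 > smallfile.txt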
I hope that I have answered your question
I'm on OSX and maybe that is the issue, but I couldn't get zgrep to work for filtering out columns, and zcat kept adding a .Z at the end of the .gz. Here's what I ended up doing:
awk '{print $1,$3,$4}' <(gzip -dc /path/to/source/Largefile.log.gz) | gzip > /path/to/output/Smallfile.log.gz
This let me filter out the 3 columns I needed from the Largefile to a Smallfile while keeping both the source and destination in compressed format.

AWS CLI "s3 ls" command to list a date range of files in a virtual folder

I'm trying to list files from a virtual folder in S3 within a specific date range. For example: all the files that were uploaded during the month of February.
I currently run an aws s3 ls command, but that gives all the files:
aws s3 ls s3://Bucket/VirtualFolder/VirtualFolder --recursive --human-readable --summarize > c:File.txt
How can I get it to list only the files within a given date range?
You could filter the results with a tool like awk:
aws s3 ls s3://Bucket/VirtualFolder/VirtualFolder --recursive --human-readable --summarize \
| awk -F'[-: ]' '$1 >= 2016 && $2 >= 3 { print }'
Where awk splits each record using -, :, and space delimiters, so you can address the fields as:
$1 - year
$2 - month
$3 - day
$4 - hour
$5 - minute
$6 - second
The aws cli ls command does not support filters, so you will have to bring back all of the results and filter locally.
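For an exact month window such as February 2016, one could instead compare the whole date field: the yyyy-mm-dd dates sort lexically, so a plain string comparison works. A sketch (drop --human-readable and --summarize so the first column is the plain date, and adjust the dates as needed):
aws s3 ls s3://Bucket/VirtualFolder/VirtualFolder --recursive \
| awk '$1 >= "2016-02-01" && $1 < "2016-03-01"'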
Realizing this question was tagged command-line-interface, I have found the best way to address non-trivial aws-cli desires is to write a Python script.
Tersest example:
$ python3 -c "import boto3; print(boto3.client('s3').list_buckets()['Buckets'][0])"
Returns: (for me)
{'Name': 'aws-glue-scripts-282302944235-us-west-1', 'CreationDate': datetime.datetime(2019, 8, 22, 0, 40, 5, tzinfo=tzutc())}
That one-liner isn't a profound script, but it can be expanded into one. (Probably with less effort than munging a bash script, much as I love bash.) After looking up a few boto3 calls, you can deduce the rest from the equivalent cli commands.

Compare modification time of file with given time stamps?

I have an HDFS directory with lots of files and directories in it, in the following format.
-rw-rw-rw- 3 root xyz <filesize> 2015-04-12 00:34 file1
-rw-rw-rw- 3 root xyz <filesize> 2015-04-11 11:34 file2
-rw-rw-rw- 3 root xyz <filesize> 2015-04-09 09:54 file3
drwxrwxrwx 3 root xyz 0 2015-04-02 00:34 dir
I've got one awk script which filters the files from the list using
awk '{ if($1 !~ /d.*/ ) {print $0}}'.
I am using this in a C++ function which has two timestamps in date (yyyy-mm-dd) and time (hh:mm:ss) format.
I would like to put a condition in the awk 'if' which filters the files that lie between the two timestamps.
I tried
($6 >= startDate) && ($6 <= endDate) && ($7 >= startTime) && ($7 <= endTime),
but this is not working as expected. I am a newbie to awk.
You can use find to simplify this.
find . -newermt "$dt1" ! -newermt "$dt2"
Here is a shell script which I used to test. If you can pass the arguments from within your C++ code, you can shorten this to a one-liner with just the find command.
# Date 1
startDate=2015-04-11
startTime=21:10:00
dt1="$startDate $startTime"
# Date 2
endDate=2015-04-11
endTime=22:10:00
dt2="$endDate $endTime"
find . -newermt "$dt1" ! -newermt "$dt2"
Note: I assume startdate and starttime go together, and enddate and endtime go together. But you have the freedom to choose otherwise.
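If you need to stay with awk over the hdfs listing itself (find only examines a local filesystem), note that the original condition fails because it tests the date and time fields independently, so a file can be rejected merely because its time of day falls outside the start/end times even though its date is well inside the range. Since yyyy-mm-dd hh:mm timestamps compare correctly as strings, a rough sketch is to compare the concatenated date and time fields instead (the listing.txt file name and the cut-off values here are only illustrative):
awk -v start="2015-04-11 21:10" -v end="2015-04-11 22:10" \
    '$1 !~ /^d/ && ($6 " " $7) >= start && ($6 " " $7) <= end' listing.txt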

RRD DB fake value generator

I want to generate fake values in an RRD database for a period of 1 month, with 5 seconds as the data-collection frequency. Is there any tool which would fill an RRD database with fake data for a given time duration?
I Googled a lot but did not find any such tool.
Please help.
I would recommend the following one-liner:
perl -e 'my $start = time - 30 * 24 * 3600; print join " ","update","my.rrd",(map { ($start+$_*5).":".rand} 0..(30*24*3600/5))' | rrdtool -
this assumes you have an rrd file called my.rrd and that it contains just one data source expecting GAUGE type data.
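If you don't already have a matching RRD, here is a minimal sketch of creating one to feed with that one-liner; the DS name value, the heartbeat and the RRA sizing are illustrative assumptions (one GAUGE data source, 5-second step, roughly a month of 5-second samples):
rrdtool create my.rrd --step 5 --start $(( $(date +%s) - 31*24*3600 )) \
    DS:value:GAUGE:10:U:U \
    RRA:AVERAGE:0.5:1:540000
The --start has to be earlier than the first timestamp the one-liner generates (now minus 30 days), hence 31 days here.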