making graphs with a shell script - regex

I need to make a graph of numeric values over a time period; the values represent online users on a web page.
The script will be executed by cron every 30 minutes and the needed HTML file will be downloaded with wget, but there are some as yet unanswered questions & problems:
- I need to get just the numeric value from the HTML code (but grep returns the whole line). How can I get only the numeric value? I can get the line with grep; it looks like this:
Users online: 24 917 </div>
How can I get just the 24917?
- What would be easier: generating an .svg file with the graph, or saving the values to a .csv file (and generating the graph with OOo or something similar)? Maybe some other good ideas?
Thanks in advance,
-skazhy

You can do the following to get your number:
Set the regular expression:
digits='[[:digit:]]+ *[[:digit:]]*'
followed by these two lines:
num=$(echo "$line" | grep -Eo "$digits")
num=${num// /}   # remove the space between the digit groups
or these:
# Bash >= 3.2 (syntax may be different for 3.0/3.1)
[[ $line =~ $digits ]]
num=${BASH_REMATCH[0]// /}
to extract the number from the variable $line containing the line in your question.
Gnuplot should be readily available and can render the graph (SVG included) directly from a data file.

Just one process (grep):
array=( $(grep whatever filename) ) && echo "${array[2]}${array[3]}"
For the line in your question ("Users online: 24 917 </div>"), fields 2 and 3 are the two digit groups, so this prints 24917.
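Putting it together, a minimal sketch of the whole cron job, logging to a CSV and re-rendering the SVG with gnuplot on each run. The URL and file names are placeholders, and the grep pattern assumes the exact line shown in the question.
#!/usr/bin/env bash
cd "$HOME" || exit 1
line=$(wget -qO- 'http://example.com/page.html' | grep 'Users online:')
num=$(echo "$line" | grep -Eo '[[:digit:]]+ *[[:digit:]]*')
num=${num// /}                         # "24 917" -> "24917"
echo "$(date +%s),$num" >> users.csv   # epoch timestamp, value
gnuplot <<'EOF'
set terminal svg size 800,400
set output 'users.svg'
set datafile separator ','
set xdata time
set timefmt '%s'
plot 'users.csv' using 1:2 with lines title 'users online'
EOF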


Extract Google Drive folder id from URLs

I am trying to extract the Google Drive folder id from a bunch of different Google Drive URLs:
cat links.txt
https://drive.google.com/drive/mobile/folders/1mzr8lgf50p9z6p-7RyHn4XjnyKSvyyuE?usp=sharing
https://drive.google.com/open?id=1_7vwy0-y0BqvPOtG2Or4pvoChnZHrHAx
https://drive.google.com/folderview?id=1rOLhig0g3DdgB9YfvW8HiqRA6o6LxAFF
https://drive.google.com/file/d/1o2J_NwHS3l1-fM71HaDN-xxres1jHkb_/view?usp=drivesdk
https://drive.google.com/drive/folders/0AKzaqn_X7nxiUk9PVA
https://drive.google.com/drive/mobile/folders/0AKzaqn_X7nxiUk9PVA
https://drive.google.com/drive/mobile/folders/0AKzaqn_X7nxiUk9PVA/1re_-YAGfTuyE1Gt848vzTu4ZDC6j23sG/1Ye90fM5qYMYkXp4QMAcQftsJCFVHswWj/149W7xNROO33zaPvIYTNwvtVGAXFxCg_b?sort=13&direction=a
https://drive.google.com/drive/mobile/folders/1nY48t6MATb0XM-iEdeWzEs70qXW2N4Y9?sort=13&direction=a
https://drive.google.com/drive/folders/1M3Xp3xz44NS8QJO5XJT5DK55MohwN6tF?sort=13&direction=a
Expected Output
1mzr8lgf50p9z6p-7RyHn4XjnyKSvyyuE
1_7vwy0-y0BqvPOtG2Or4pvoChnZHrHAx
1rOLhig0g3DdgB9YfvW8HiqRA6o6LxAFF
1o2J_NwHS3l1-fM71HaDN-xxres1jHkb_
0AKzaqn_X7nxiUk9PVA
0AKzaqn_X7nxiUk9PVA
149W7xNROO33zaPvIYTNwvtVGAXFxCg_b
1nY48t6MATb0XM-iEdeWzEs70qXW2N4Y9
1M3Xp3xz44NS8QJO5XJT5DK55MohwN6tF
After an hour of trial and error, I came up with this regex: ([01A-Z])(?=[\w-]*[A-Za-z])[\w-]+
It almost works, except that it can't process the third-to-last link properly. If there are multiple nested folder ids in a URL, I need the innermost one in the output. Can someone please help me fix this, and possibly improve the regex if it can be done more efficiently than mine?
You may try this sed:
sed -E 's~.*[/=]([01A-Z][-_[:alnum:]]+)([?/].*|$)~\1~' links.txt
The greedy leading .* pushes the match to the last / or = that is followed by an id-like token, which is why nested folder URLs yield the innermost id.
1mzr8lgf50p9z6p-7RyHn4XjnyKSvyyuE
1_7vwy0-y0BqvPOtG2Or4pvoChnZHrHAx
1rOLhig0g3DdgB9YfvW8HiqRA6o6LxAFF
1o2J_NwHS3l1-fM71HaDN-xxres1jHkb_
0AKzaqn_X7nxiUk9PVA
0AKzaqn_X7nxiUk9PVA
149W7xNROO33zaPvIYTNwvtVGAXFxCg_b
1nY48t6MATb0XM-iEdeWzEs70qXW2N4Y9
1M3Xp3xz44NS8QJO5XJT5DK55MohwN6tF
With GNU awk:
awk '{print $NF}' FPAT='[a-zA-Z0-9_-]{19,34}' file
$NF: the last field on the line
FPAT: A regular expression describing the contents of the fields in a record. When set, gawk parses the input into fields, where the fields match the regular expression, instead of using the value of FS as the field separator.
Output:
1mzr8lgf50p9z6p-7RyHn4XjnyKSvyyuE
1_7vwy0-y0BqvPOtG2Or4pvoChnZHrHAx
1rOLhig0g3DdgB9YfvW8HiqRA6o6LxAFF
1o2J_NwHS3l1-fM71HaDN-xxres1jHkb_
0AKzaqn_X7nxiUk9PVA
0AKzaqn_X7nxiUk9PVA
149W7xNROO33zaPvIYTNwvtVGAXFxCg_b
1nY48t6MATb0XM-iEdeWzEs70qXW2N4Y9
1M3Xp3xz44NS8QJO5XJT5DK55MohwN6tF
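If gawk is not available, here is a portable awk sketch under the same assumptions as your regex (ids start with 0, 1, or an uppercase letter and are at least 19 characters long): split each URL on /, = and ?, then scan the fields from right to left and print the first id-like one, which is the innermost for nested URLs.
awk -F'[/=?]' '{
    for (i = NF; i > 0; i--)
        if ($i ~ /^[01A-Z][A-Za-z0-9_-]{18,}$/) { print $i; break }
}' links.txt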

How to take document per line when combining multiple documents?

Hello everyone,
I have 3000 documents. I want to combine the content of those 3000 documents into one single document. I used
cat *.html > Combined_Text.txt
to do it. But I would like to have the data of one document per line in Combined_Text.txt, which means I should end up with just 3000 lines of content (one document per line). How do I do that? Please help!
The following command removes the newlines from each HTML file and then appends the files to Combined_Text.txt, one per line:
for f in *.html; do tr -d '\n' < "$f" >> Combined_Text.txt; echo >> Combined_Text.txt; done
The trailing echo (which supplies each file's newline) seems inelegant, and I'm sure there is a better way to put the files on their own lines, but it does the job.
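A single-process sketch with awk instead of one tr per file: FNR==1 marks the first line of each new input file, so a newline is printed between files rather than after every line.
awk 'FNR==1 && NR!=1 { printf "\n" } { printf "%s", $0 } END { printf "\n" }' *.html > Combined_Text.txt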

Linux Bash Regular Expressions, retrieving data from SNMPGet Output

I've been working on getting a few simple monitoring tools running at home, and decided to be funny and retrieve the printer data along with everything else. Now that I've got the SNMP portion working quite well, I can't seem to parse the data that my snmpget command retrieves. The current script I am using is as follows:
#!/usr/bin/env bash
# RegEx for Strings: "(.+?)"| -?\d+
RegExStr='"(.+?)"| -?\d+'
# ***
# Brother HL-2150N Printer
# ***
# Order of data: Toner Name, Toner Level, Drum Name, Drum Status, Total Pages Printed, Display Status
Input=$(snmpget -v 1 -c public 192.168.16.112 SNMPv2-SMI::mib-2.43.11.1.1.6.1.1 SNMPv2-SMI::mib-2.43.11.1.1.8.1.1 SNMPv2-SMI::mib-2.43.11.1.1.6.1.2 SNMPv2-SMI::mib-2.43.11.1.1.9.1.1 SNMPv2-SMI::mib-2.43.10.2.1.4.1.1 SNMPv2-SMI::mib-2.43.16.5.1.2.1.1 -m BROTHER-MIB)
Output1=( $(echo $Input | egrep -o $RegExStr) )
# Output
echo $Input
echo ${Output1[@]}
Which, oddly enough, does not work. I'm fairly certain my regular expression ( "(.+?)" ) is correct, as I've tested it numerous times in various syntax checkers and testers. It's supposed to select all the data between quotation marks ("").
Anyhow, the SNMPGET return is:
SNMPv2-SMI::mib-2.43.11.1.1.6.1.1 = STRING: "Black Toner Cartridge" SNMPv2-SMI::mib-2.43.11.1.1.8.1.1 = INTEGER: -2 SNMPv2-SMI::mib-2.43.11.1.1.6.1.2 = STRING: "Drum Unit" SNMPv2-SMI::mib-2.43.11.1.1.9.1.1 = INTEGER: -3 SNMPv2-SMI::mib-2.43.10.2.1.4.1.1 = Counter32: 13630 SNMPv2-SMI::mib-2.43.16.5.1.2.1.1 = STRING: "SLAAP "
I've tried various things myself; plain grep returns a blank string. To my understanding, grep by itself does not support every regular-expression feature, so I started using egrep. While that returns SOMETHING, it is everything inside the original string divided by spaces, starting at the first quotation mark.
Is there anything I'm missing? I've looked around, and adjusted my methods a few times but never seemed to get a usable array in return.
Anyhow, I appreciate any help/pointers you'd be able to give me. I'd like to get this running, even if just for fun and a good learning experience. Thank you in advance! I'll keep fiddling with it myself, but will check here every now and then.
From your output:
To get all strings (\K resets the start of the reported match, so the STRING: " prefix is matched but not printed):
grep -oP 'STRING: *"\K[^"]*'
Black Toner Cartridge
Drum Unit
SLAAP
To get all integers:
grep -oP '(INTEGER|Counter32): *\K[^ ]*'
-2
-3
13630
With awk you can do this (with the double quote as the record separator, every even-numbered record is the text inside a pair of quotes):
awk 'NR%2==0' RS=\" <<< "$Input"
Black Toner Cartridge
Drum Unit
SLAAP
Or, into a variable:
Output1=$(awk 'NR%2==0' RS=\" <<< "$Input")
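A sketch tying the pieces together (bash >= 4 for mapfile), assuming $Input holds the one-line snmpget reply above and that the OIDs were requested in the order listed in the script's comment:
mapfile -t strings < <(grep -oP 'STRING: *"\K[^"]*' <<< "$Input")
mapfile -t numbers < <(grep -oP '(INTEGER|Counter32): *\K[^ ]*' <<< "$Input")
toner_name=${strings[0]} toner_level=${numbers[0]}
drum_name=${strings[1]} drum_status=${numbers[1]}
total_pages=${numbers[2]} display_status=${strings[2]}
printf '%s: %s\n' "$toner_name" "$toner_level" "$drum_name" "$drum_status" "pages" "$total_pages"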

Bash script regex for file size

I'm trying to extract the size (in KB) of a file from the output of du. I'm trying to do so as follows:
textA=$(du a)
sizeA=$(expr match "$textA" '\(^[^\s]*\)')
textB=$(du b)
sizeB=$(expr match "$textB" '\(^[^\s]*\)')
echo $textA
echo $sizeA
echo $textB
echo $sizeB
[[ $sizeA == $sizeB ]] && echo "eq"
But this just prints textA and textB to the console. Both look like:
30745 a
Can someone please explain why the regex is not matching? I've tested the regex against the text on several sites, just to make sure, and it appears to capture the correct text.
I've also tried changing it to:
'^\([^\s]*\)'
But this way it will capture all the text. Any thoughts?
My expr match does not understand \s or other extended regexes. Try '\([0-9]*\)' instead.
But as others have mentioned, using a regex to grab "the first word" is a little overkill. I'd use du a | { read size name; echo $size; }, but you could also use the awk version or a solution using cut.
Not a direct answer, but I would do it like this:
sizeA=$(du a | awk '{print $1}')
size=$(wc -c < file)
If you want to use du, I would use the bash builtin read:
read size filename < <(du file)
Note that you can't say du file | read size filename because in bash, components of a pipeline are executed in subshells, so the variables will disappear when the subshell exits.
Do not parse the output of du. If available, you can e.g. use stat to get the size of a file in bytes:
sizeA=$(stat -c%s "${fileA}")
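A sketch of the original comparison done with stat (GNU coreutils; on BSD/macOS the equivalent is stat -f%z). Note that stat reports bytes while du reports disk usage in KB blocks, so the two are not interchangeable:
sizeA=$(stat -c%s a)
sizeB=$(stat -c%s b)
(( sizeA == sizeB )) && echo "eq"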

How can I extract a pattern from all files in a directory, using Perl?

I am running a command which returns 96 .txt files for each hour of a particular date,
so it finally gives me 24*96 files for one day in a directory.
My aim is to extract data for four months, which will result in 30*24*96*4 files in a directory.
After I get the data I need to extract a certain "pattern" from each of the files and display it as output.
1) The script below is only for one day, where the date is hardcoded in the script.
2) I need to make it work for all days in a month, and I need to run it from June to October.
3) As the data is huge, my disk will run out of space, so I don't want to create that many files; instead I just want to grep on the fly and get only one output file.
How can I do this efficiently?
My shell script looks like this:
for R1 in {0..9}; do
    for S1 in {0..95}; do
        echo $R1 $S1
        curl -H "Accept-Encoding: gzip" "http://someservice.com/getValue?Count=96&data=$S1&fields=hitType,QueryString,pathInfo" | zcat > 20101008-mydata-$R1-$S1.txt
    done
done
This returns the files I need.
After that, I extract a URL pattern from each of the files: grep "test/link/link2" * | grep category > 1.output
You can use this awk command to get the URLs:
awk -v RS="</a>" '/href/&&/test.*link2/&&/category/{gsub(/.*<a.*href=\"|\".*/,"");print}' file
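To address point 3 (no intermediate files), here is a sketch that reuses the loop and URL from the question but pipes each response straight through the filter and appends to a single output file, so nothing is written per request:
for R1 in {0..9}; do
    for S1 in {0..95}; do
        curl -s -H "Accept-Encoding: gzip" "http://someservice.com/getValue?Count=96&data=$S1&fields=hitType,QueryString,pathInfo" \
            | zcat \
            | grep "test/link/link2" | grep category >> 1.output
    done
done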
Here's how to loop over four months' worth of dates:
#!/usr/bin/perl
use strict;
use warnings;
use Date::Simple ':all';
for (my $date = ymd(2010,6,1), my $end = ymd(2010,10,1); $date < $end; $date++) {
my $YYYYMMDD = $date->format("%Y%m%d");
process_one_day($YYYYMMDD); # Add more formats if needed as parameters
}
sub process_one_day {
my $YYYYMMDD = shift;
# ...
# ... Insert your code to process that date
# ... Either call system() command on the sample code in your question
# ... Or better yet write a native Perl equivalent
# ...
# ... For native processing, use WWW::Mechanize to extract the data from the URL
# ... and Perl's native grep() to grep for it
}