Python: Match a special caracter with regular expression

Python: Match a special caracter with regular expression - regex

Hi everyone I'm using the re.match function to extract pieces of string within a row from the file.
My code is as follows:
## fp_tmp => pointer of file
for x in fp_tmp:
try:
cpuOverall=re.match(r"(Overall CPU load average)\s+(\S+)(%)",x)
cpuUsed=re.match(r"(Total)\s+(\d+)(%)",x)
ramUsed=re.match(r"(RAM Utilization)\s+(\d+\%)",x)
####Not Work####
if cpuUsed is not None: cpuused_new=cpuUsed.group(2)
if ramUsed is not None: ramused_new=ramUsed.group(2)
if cpuOverall is not None: cpuoverall_new=cpuOverall.group(2)
except:
searchbox_result = None
Each field is extracted from the following corresponding line:
ramUsed => RAM Utilization 2%
cpuUsed => Total 4%
cpuOverall => Overall CPU load average 12%
ramUsed, cpuUsed, cpuOverall are the variable where I want write the result!!
Corretly line are:
(space undefined) RAM Utilization 2%
(space undefined) Total 4%
(space undefined) Overall CPU load average 12%
When I execute the script all variable return a value: None.
With other variable the script work corretly.
Why the code not work in this case? I use the python3
I think that the problem is a caracter % that not read.
Do you have any suggestions?
PROBLEM 2:
## fp_tmp => pointer of file
for x in fp_tmp:
try:
emailReceived=re.match(r".*(Messages Received)\s+\S+\s+\S+\s+(\S+)",x)
####Not Work####
if emailReceived is not None: emailreceived_new=emailReceived.group(2)
except:
searchbox_result = None
Each field is extracted from the following corresponding on 2 lines in a file:
[....]
Counters: Reset Uptime Lifetime
Receiving
Messages Received 3,406 1,558 3,406
[....]
Rates (Events Per Hour): 1-Minute 5-Minutes 15-Minutes
Receiving
Messages Received 0 0 0
Recipients Received 0 0 0
[....]
I want extract only second occured, that:
Rates (Events Per Hour): 1-Minute 5-Minutes 15-Minutes
Receiving
Messages Received 0 0 0 <-this
Do you have any suggestions?

cpuOverall line: you forgot that there is more information at the start of the line. Change to
'.*(Overall CPU load average)\s+(\S+%)'
cpuUsed line: you forgot that there is more information at the start of the line. Change to
'.*(Total)\s+(\d+%)'
ramUsed line: you forgot that there is more information at the start of the line... Change to
'.*(RAM Utilization)\s+(\d+%)'
Remember that re.match looks for an exact match from the start:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. [..]
With these changes, your three variables are set to the percentages:
>>> print (cpuused_new,ramused_new,cpuoverall_new)
4% 2% 12%

Related

How to get a total number of occurrences in an IF statement

After looking at a bunch of the other questions similar to mine, I found one that partially works but not exactly as I need it to. I am trying to get a total for the number of times an occurrence is found in my IF statement. The code I have tried is below:
for i in data['resources']:
url = server+ "/api/3/assets/" +str(i)
response = requests.request("GET", url, headers=headers, verify=False)
data = response.json()
Total = 0
if data['red'] and data['blue'] > 0:
Total += 1
print('Total:', Total)
Rather than giving me an output like:
Total: 3
I get this:
Total: 1
Total: 1
Total: 1

Total=0 inside the for loop and getting reset for each iteration. So you have to put Total=0 outside the for loop. And also if you want it to print only once then put the Print statement outside and at the end of the for loop
I am sure this should solve your problem

Parse output from k6 data to get specific information

I am trying to extract data from a k6 output (https://docs.k6.io/docs/results-output):
data_received.........: 246 kB 21 kB/s
data_sent.............: 174 kB 15 kB/s
http_req_blocked......: avg=26.24ms min=0s med=13.5ms max=145.27ms p(90)=61.04ms p(95)=70.04ms
http_req_connecting...: avg=23.96ms min=0s med=12ms max=145.27ms p(90)=57.03ms p(95)=66.04ms
http_req_duration.....: avg=197.41ms min=70.32ms med=91.56ms max=619.44ms p(90)=288.2ms p(95)=326.23ms
http_req_receiving....: avg=141.82µs min=0s med=0s max=1ms p(90)=1ms p(95)=1ms
http_req_sending......: avg=8.15ms min=0s med=0s max=334.23ms p(90)=1ms p(95)=1ms
http_req_waiting......: avg=189.12ms min=70.04ms med=91.06ms max=343.42ms p(90)=282.2ms p(95)=309.22ms
http_reqs.............: 190 16.054553/s
iterations............: 5 0.422488/s
vus...................: 200 min=200 max=200
vus_max...............: 200 min=200 max=200
The data comes in the above format and I am trying to find a way to get each line in the above along with the values only. As an example:
http_req_duration: 197.41ms, 70.32ms,91.56ms, 619.44ms, 288.2ms, 326.23ms
I have to do this for ~50-100 files and want to find a RegEx or similar quicker way to do it, without writing too much code. Is it possible?

Here's a simple Python solution:
import re
FIELD = re.compile(r"(\w+)\.*:(.*)", re.DOTALL) # split the line to name:value
VALUES = re.compile(r"(?<==).*?(?=\s|$)") # match individual values from http_req_* fields
# open the input file `k6_input.log` for reading, and k6_parsed.log` for parsing
with open("k6_input.log", "r") as f_in, open("k6_parsed.log", "w") as f_out:
for line in f_in: # read the input file line by line
field = FIELD.match(line) # first match all <field_name>...:<values> fields
if field:
name = field.group(1) # get the field name from the first capture group
f_out.write(name + ": ") # write the field name to the output file
value = field.group(2) # get the field value from the second capture group
if name[:9] == "http_req_": # parse out only http_req_* fields
f_out.write(", ".join(VALUES.findall(value)) + "\n") # extract the values
else: # verbatim copy of other fields
f_out.write(value)
else: # encountered unrecognizable field, just copy the line
f_out.write(line)
For a file with contents as above you'll get a resulting:
data_received: 246 kB 21 kB/s
data_sent: 174 kB 15 kB/s
http_req_blocked: 26.24ms, 0s, 13.5ms, 145.27ms, 61.04ms, 70.04ms
http_req_connecting: 23.96ms, 0s, 12ms, 145.27ms, 57.03ms, 66.04ms
http_req_duration: 197.41ms, 70.32ms, 91.56ms, 619.44ms, 288.2ms, 326.23ms
http_req_receiving: 141.82µs, 0s, 0s, 1ms, 1ms, 1ms
http_req_sending: 8.15ms, 0s, 0s, 334.23ms, 1ms, 1ms
http_req_waiting: 189.12ms, 70.04ms, 91.06ms, 343.42ms, 282.2ms, 309.22ms
http_reqs: 190 16.054553/s
iterations: 5 0.422488/s
vus: 200 min=200 max=200
vus_max: 200 min=200 max=200
If you have to run it over many files, I'd suggest you to investigate os.glob(), os.walk() or os.listdir() to list all the files you need and then loop over them and execute the above, thus further automating the process.

How to match this regular expression using TCL

Kindly give me some input on this. I have the below input for a TCL regular expression.
set a { Descriptor Blocks:
10.132.224.74 (Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65
10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200 }
From the above i want to separate like two list element, the regular expression should separate by following word.
Here there are two interface output is there, one is for
10.132.224.74 (Tunnel42)
interface and another one is for
10.135.0.86 (GigabitEthernet0/1)
If there is no line starting with "Internal tag is " after the "Originating router
is " line it should divide upto "Originating router is " line as a one
list element.
If there is a line "Internal tag is " is available after the
"Originating router is " line it should divide upto "Internal tag is "
as a one list
I am expecting the output like
{Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65
10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200

A more generalized approach can be splitting them input into line and parsing them as needed
set a { Descriptor Blocks:
10.132.224.74 (Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65
10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200 }
set tunnelStart 0
set interfaceStart 0
set tunnelInfo {}
set interfaceInfo {}
set result {}
foreach line [split $a \n] {
if {[regexp {\(Tunnel\d+\)} $line]} {
# If suppose, we already identified 'tunnelInfo' and extracted it, then that variable won't be empty
if {$tunnelInfo ne {}} {
regsub {\n$} $tunnelInfo {} tunnelInfo
# So, appending it to 'result'
lappend result $tunnelInfo
# Then, resetting the 'tunnelInfo'
set tunnelInfo {}
}
set tunnelStart 1
set interfaceStart 0
} elseif {[regexp {\(GigabitEthernet\d+/\d+\)} $line]} {
# Same reason as explained above
if {$interfaceInfo ne {}} {
regsub {\n$} $interfaceInfo {} interfaceInfo
lappend result $interfaceInfo
set interfaceInfo {}
}
set interfaceStart 1
set tunnelStart 0
}
if {$tunnelStart} {
#Appending each line along with '\n'
append tunnelInfo $line\n
} elseif {$interfaceStart} {
append interfaceInfo $line\n
}
}
#Removing the last '\n' alone
regsub {\n$} $tunnelInfo {} tunnelInfo
regsub {\n$} $interfaceInfo {} interfaceInfo
# At last checking if the variable is not empty, append it to 'result'
if {$tunnelInfo ne {}} {
lappend result $tunnelInfo
}
if {$interfaceInfo ne {}} {
lappend result $interfaceInfo
}
puts $result
You can put them in a procedure & call wherever you want to separate the input. If suppose your input has more than one tunnel and interface lines information, you could re-write the code to parse it accordingly.

You can use the textutil module to do this easily:
package require textutil
textutil::split::splitx $a {\n(?=\s*\d)}
This splits the original text into a list of three items: the " Descriptor Blocks:" substring and one item each for the two blocks. It works by finding junctures where a line break and optional whitespace is followed by a digit. The line break is removed, but the leading whitespace and the digit is preserved.
Core-Tcl solution:
The substitution
regsub -all -line {^(?=\s*\d)} $a \n
will split the text into three parts (the first part being the " Descriptor Blocks:" substring) by inserting an extra line break before each block. This solution obviously depends on only the first line in each block starting with a digit optionally preceded by whitespace. The -line option makes ^ anchor after a line break.
Note that this results in a text with three parts, not a list of three elements: if you want that you will need to break the text up at every double line break. Another way to deal with this is to have regsub instead insert a character that won't occur in the text, and then split on that character, e.g.
split [regsub -all -line {^(?=\s*\d)} $a #] #
Documentation: package, regsub, split, textutil package

Xively read data in Python

I have written a python 2.7 script to retrieve all my historical data from Xively.
Originally I wrote it in C#, and it works perfectly.
I am limiting the request to 6 hour blocks, to retrieve all stored data.
My version in Python is as follows:
requestString = 'http://api.xively.com/v2/feeds/41189/datastreams/0001.csv?key=YcfzZVxtXxxxxxxxxxxORnVu_dMQ&start=' + requestDate + '&duration=6hours&interval=0&per_page=1000' response = urllib2.urlopen(requestString).read()
The request date is in the correct format, I compared the full c# requestString version and the python one.
Using the above request, I only get 101 lines of data, which equates to a few minutes of results.
My suspicion is that it is the .read() function, it returns about 34k of characters which is far less than the c# version. I tried adding 100000 as an argument to the ad function, but no change in result.

Left another solution wrote in Python 2.7 too.
In my case, got data each 30 minutes because many sensors sent values every minute and Xively API has limited half hour of data to this sent frequency.
It's general module:
for day in datespan(start_datetime, end_datetime, deltatime): # loop increasing deltatime to star_datetime until finish
while(True): # assurance correct retrieval data
try:
response = urllib2.urlopen('https://api.xively.com/v2/feeds/'+str(feed)+'.csv?key='+apikey_xively+'&start='+ day.strftime("%Y-%m-%dT%H:%M:%SZ")+'&interval='+str(interval)+'&duration='+duration) # get data
break
except:
time.sleep(0.3)
raise # try again
cr = csv.reader(response) # return data in columns
print '.'
for row in cr:
if row[0] in id: # choose desired data
f.write(row[0]+","+row[1]+","+row[2]+"\n") # write "id,timestamp,value"
The full script you can find it here: https://github.com/CarlosRufo/scripts/blob/master/python/retrievalDataXively.py
Hope you might help, delighted to answer any questions :)

Parsing (partially) non-uniform text blocks in Perl

I have a file with a few blocks that look like this in a file (and in a variable, at this point in the program).
Vlan2 is up, line protocol is up
....
reliability 255/255, txload 1/255, rxload 1/255^M
....
Last clearing of "show interface" counters 49w5d
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
....
L3 out Switched: ucast: 17925 pkt, 23810209 bytes mcast: 0 pkt, 0 bytes
33374 packets input, 13154058 bytes, 0 no buffer
Received 926 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
3094286 packets output, 311981311 bytes, 0 underruns
0 output errors, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
Here's a second block, to show you how the blocks can slightly vary:
port-channel86 is down (No operational members)
...
reliability 255/255, txload 1/255, rxload 1/255
...
Last clearing of "show interface" counters 31w2d
...
RX
147636 unicast packets 0 multicast packets 0 broadcast packets
84356 input packets 119954232 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
147636 unicast packets 0 multicast packets 0 broadcast packets
84356 output packets 119954232 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
0 interface resets
I want to pick out certain data elements from each block, which may or may not exist in each block. For example, in the first block I posted I may want to know that there are 0 runts, 0 input errors and 0 overrun. In the second block, I might want to know that there are 0 jumbo packets, collisions, etc. If a given query isn't in the block, it's acceptable to just return na, as this is designed to be processed uniformly.
Each block is structured in a similar way to the two I posted; newlines and spaces delimiting some entries, commas delimiting others.
I have a few ideas as to how this might work. I'm unaware if there is any kind of "look back" function in Perl, but I could attempt to look for the field names (runts, "input errors", etc) and then grab the previous integer; that seems like it would be the most elegant solution for this, but I'm unsure if it's possible.
Currently, I'm doing this in Perl. Each "block" that I'm processing is actually several of these blocks (separated by double newlines). It doesn't have to be done in a single regular expressions; I believe it can be done by applying several regular expressions per block. Performance is not really a factor, as this script will run maybe once per hour.
My goal is to get all of this into a .csv file (or some other data format that's easily graphable) in an automated fashion.
Any ideas?
Edit: example output in CSV as I mentioned, which would be written line by line (for multiple entries like this) to a file as the end result. If a particular entry isn't found in the block, it is marked na in the corresponding line:
interface_name,txload,rxload,last_clearing,input_queue,output_drops,runts,....
vlan2,1,1,49w5d,0-75-0-0,0,0,....
port-channel86,1,1,31w2d,na,na,0,...

Simple hash of properties and numbers.
sub extract {
my ($block) = #_;
my %r;
while ($block =~ /(?<num>\d+) \s (?<name>[A-Za-z\s]+)/gmsx) {
my $name = $+{name};
my $num = $+{num};
$name =~ s/\A \s+//msx;
$name =~ s/\s+ \z//msx;
$r{$name} = $num;
}
return %r;
}
my $block = <<'';
Vlan2 is up, line protocol is up
⋮
my $block2 = <<'';
port-channel86 is down (No operational members)
⋮
use Data::Dumper qw(Dumper);
print Dumper {extract $block};
print Dumper {extract $block2};

I don't think a single regex could do it, nor would I want to support it if it could.
Using multiple regexes, you could easily use something like:
(\d+) runts
(\d+) input errors
...etc...
A simple array of property names and a loop could solve this pretty quickly and with very little fuss.
If you can strip down the input to smaller chunks with some preprocessing, you would be less likely to get false positives.

Here is one way to do it in awk, but this needs lots of tweak to be perfect.
But again, use SNMP.
awk '{
printf $1
for (i=1;i<=NF;i++) {
if ($i" "$(i+1)~/Input queue:/) printf ",%s",$(i+2)
if ($i~/runts/) printf ",%s",$(i-1)
if ($i~/multicast,/) printf ",%s",$(i-1)
}
print ""
}' RS="swapped out" file

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js