How to match this regular expression using TCL - regex

Kindly give me some input on this. I have the below input for a TCL regular expression.
set a { Descriptor Blocks:
10.132.224.74 (Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65
10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200 }
From the above, I want to separate the text into two list elements; the regular expression should split on the following words.
There are two interface outputs here: one is for
10.132.224.74 (Tunnel42)
and the other is for
10.135.0.86 (GigabitEthernet0/1)
If there is no line starting with "Internal tag is " after the "Originating router is " line, the block should end at the "Originating router is " line and form one list element.
If a line starting with "Internal tag is " does follow the "Originating router is " line, the block should extend through the "Internal tag is " line and form one list element.
I am expecting output like this:
{Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65
10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200

A more general approach is to split the input into lines and parse them as needed:
set a { Descriptor Blocks:
10.132.224.74 (Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65
10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200 }
set tunnelStart 0
set interfaceStart 0
set tunnelInfo {}
set interfaceInfo {}
set result {}
foreach line [split $a \n] {
    if {[regexp {\(Tunnel\d+\)} $line]} {
        # If suppose, we already identified 'tunnelInfo' and extracted it, then that variable won't be empty
        if {$tunnelInfo ne {}} {
            regsub {\n$} $tunnelInfo {} tunnelInfo
            # So, appending it to 'result'
            lappend result $tunnelInfo
            # Then, resetting the 'tunnelInfo'
            set tunnelInfo {}
        }
        set tunnelStart 1
        set interfaceStart 0
    } elseif {[regexp {\(GigabitEthernet\d+/\d+\)} $line]} {
        # Same reason as explained above
        if {$interfaceInfo ne {}} {
            regsub {\n$} $interfaceInfo {} interfaceInfo
            lappend result $interfaceInfo
            set interfaceInfo {}
        }
        set interfaceStart 1
        set tunnelStart 0
    }
    if {$tunnelStart} {
        # Appending each line along with '\n'
        append tunnelInfo $line\n
    } elseif {$interfaceStart} {
        append interfaceInfo $line\n
    }
}
# Removing the last '\n' alone
regsub {\n$} $tunnelInfo {} tunnelInfo
regsub {\n$} $interfaceInfo {} interfaceInfo
# At last, checking if the variable is not empty, append it to 'result'
if {$tunnelInfo ne {}} {
    lappend result $tunnelInfo
}
if {$interfaceInfo ne {}} {
    lappend result $interfaceInfo
}
puts $result
You can put this in a procedure and call it wherever you need to split such input. If your input has more than one tunnel or interface block, you could rewrite the code to parse it accordingly; one way that generalization might look is sketched below.
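A rough sketch of one such generalization (the proc name is made up, and it assumes every block begins with an "a.b.c.d (Interface)" line, as in the sample data):
proc splitDescriptorBlocks {text} {
    set result {}
    set current {}
    foreach line [split $text \n] {
        # Every "a.b.c.d (SomeInterface)" line starts a new block.
        if {[regexp {^\s*\d+(\.\d+){3} \(} $line]} {
            if {$current ne {}} {
                lappend result [string trimright $current \n]
            }
            set current $line\n
        } elseif {$current ne {}} {
            append current $line\n
        }
    }
    if {$current ne {}} {
        lappend result [string trimright $current \n]
    }
    return $result
}
puts [llength [splitDescriptorBlocks $a]]    ;# 2 blocks for the sample input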

You can use the textutil module to do this easily:
package require textutil
textutil::split::splitx $a {\n(?=\s*\d)}
This splits the original text into a list of three items: the " Descriptor Blocks:" substring and one item each for the two blocks. It works by finding junctures where a line break and optional whitespace are followed by a digit. The line break is removed, but the leading whitespace and the digit are preserved.
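For instance (a quick sketch reusing the $a variable from the question), the two blocks can then be picked out of the returned list, element 0 being just the header:
package require textutil
set blocks [textutil::split::splitx $a {\n(?=\s*\d)}]
puts [llength $blocks]    ;# 3: the header plus the two interface blocks
puts [lindex $blocks 1]   ;# the Tunnel42 block
puts [lindex $blocks 2]   ;# the GigabitEthernet0/1 block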
Core-Tcl solution:
The substitution
regsub -all -line {^(?=\s*\d)} $a \n
will split the text into three parts (the first part being the " Descriptor Blocks:" substring) by inserting an extra line break before each block. This solution obviously depends on only the first line in each block starting with a digit optionally preceded by whitespace. The -line option makes ^ anchor after a line break.
Note that this results in a text with three parts, not a list of three elements: if you want that you will need to break the text up at every double line break. Another way to deal with this is to have regsub instead insert a character that won't occur in the text, and then split on that character, e.g.
split [regsub -all -line {^(?=\s*\d)} $a #] #
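For example, dropping the header part then leaves exactly the two wanted blocks (a small sketch continuing from $a; it assumes "#" never occurs in the real router output):
set parts [split [regsub -all -line {^(?=\s*\d)} $a #] #]
set blocks [lrange $parts 1 end]
puts [llength $blocks]    ;# 2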
Documentation: package, regsub, split, textutil package

Related

Telegraf: How to extract from field using regex processor?

I would like to extract the values for connections, upstream and downstream using telegraf regex processor plugin from this input:
2022/11/16 22:38:48 In the last 1h0m0s, there were 10 connections. Traffic Relayed ↑ 60 MB, ↓ 4 MB.
Using this configuration, the result key "upstream" ends up as a copy of the initial message with only part of the matched text removed.
[[processors.regex]]
tagpass = ["snowflake-proxy"]
[[processors.regex.fields]]
## Field to change
key = "message"
## All the power of the Go regular expressions available here
## For example, named subgroups
pattern = 'Relayed.{3}(?P<UPSTREAM>\d{1,4}\W.B),'
replacement = "${UPSTREAM}"
## If result_key is present, a new field will be created
## instead of changing existing field
result_key = "upstream"
Current output:
2022/11/17 10:38:48 In the last 1h0m0s, there were 1 connections. Traffic 3 MB ↓ 5 MB.
How do I get the decimals?
I'm quite confused about how to use the regex here, because according to several examples on the web it should work like this. See for example: http://wiki.webperfect.ch/index.php?title=Telegraf:_Processor_Plugins
The replacement config option specifies what you want to substitute in for any matches.
I think you want something closer to this:
[[processors.regex.fields]]
key = "message"
pattern = '.*Relayed.{3}(?P<UPSTREAM>\d{1,4}\W.B),.*$'
replacement = "${1}"
result_key = "upstream"
to get:
upstream="60 MB"

Python: Match a special character with a regular expression

Hi everyone, I'm using the re.match function to extract pieces of a string from each row of a file.
My code is as follows:
## fp_tmp => pointer of file
for x in fp_tmp:
    try:
        cpuOverall=re.match(r"(Overall CPU load average)\s+(\S+)(%)",x)
        cpuUsed=re.match(r"(Total)\s+(\d+)(%)",x)
        ramUsed=re.match(r"(RAM Utilization)\s+(\d+\%)",x)
        ####Not Work####
        if cpuUsed is not None: cpuused_new=cpuUsed.group(2)
        if ramUsed is not None: ramused_new=ramUsed.group(2)
        if cpuOverall is not None: cpuoverall_new=cpuOverall.group(2)
    except:
        searchbox_result = None
Each field is extracted from the following corresponding line:
ramUsed => RAM Utilization 2%
cpuUsed => Total 4%
cpuOverall => Overall CPU load average 12%
ramUsed, cpuUsed, cpuOverall are the variables where I want to write the results.
The actual lines in the file are:
(space undefined) RAM Utilization 2%
(space undefined) Total 4%
(space undefined) Overall CPU load average 12%
When I execute the script, all the variables return None.
With other variables the script works correctly.
Why does the code not work in this case? I am using Python 3.
I think the problem is the % character, which is not being read.
Do you have any suggestions?
PROBLEM 2:
## fp_tmp => pointer of file
for x in fp_tmp:
    try:
        emailReceived=re.match(r".*(Messages Received)\s+\S+\s+\S+\s+(\S+)",x)
        ####Not Work####
        if emailReceived is not None: emailreceived_new=emailReceived.group(2)
    except:
        searchbox_result = None
The field is extracted from the following lines, which occur in two places in the file:
[....]
Counters: Reset Uptime Lifetime
Receiving
Messages Received 3,406 1,558 3,406
[....]
Rates (Events Per Hour): 1-Minute 5-Minutes 15-Minutes
Receiving
Messages Received 0 0 0
Recipients Received 0 0 0
[....]
I want to extract only the second occurrence, that is:
Rates (Events Per Hour): 1-Minute 5-Minutes 15-Minutes
Receiving
Messages Received 0 0 0 <-this
Do you have any suggestions?
cpuOverall line: you forgot that there is more information at the start of the line. Change to
'.*(Overall CPU load average)\s+(\S+%)'
cpuUsed line: you forgot that there is more information at the start of the line. Change to
'.*(Total)\s+(\d+%)'
ramUsed line: you forgot that there is more information at the start of the line... Change to
'.*(RAM Utilization)\s+(\d+%)'
Remember that re.match looks for an exact match from the start:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. [..]
With these changes, your three variables are set to the percentages:
>>> print (cpuused_new,ramused_new,cpuoverall_new)
4% 2% 12%

How to do multiple regexp tests in TCL?

Can someone tell me what I'm doing wrong in my regexp statement? It doesn't match the "Operability: Degraded" line. I am trying to match anything that is not in operable state. I am new to TCL. Thanks!
Contents of $expect_out(buffer) that the regexp runs on:
ID 20:
Location: G1
Presence: Equipped
Overall Status: Operable
Operability: Degraded
Visibility: Yes
Product Name: 16GB DDR3-1600-MHz RDIMM/PC3-12800/dual rank/1.35V
PID:
VID: V01
Vendor: 0x2C00
Vendor Description: Micron Technology, Inc.
Vendor Part Number:
Vendor Serial (SN):
HW Revision: 0
Form Factor: DIMM
Type: DDR3
Capacity (MB): 16384
Clock: 1600
Latency: 0.600000
Width: 64
Code:
proc check_errors { buffer cmd } {
    set count [ regexp -all -- { Activate-Status.*?!Ready|Overall.*Status.*?!Operable|Operability.*?!Operable|Controller.*Status.*?!Optimal|Errors.*?!0|Dropped.*?!0|Discarded.*?!0|Bad.*?!0|Suspect.*?!No|Thresholded.*?!0|Visibility.*?!Yes|Thermal.*Status.*?!OK|HA.*?!READY } $buffer ]
    if { [ set count ] != 0 } {
        puts "\tFAIL $cmd (Error Count: $count)"
    } else {
        puts "\tPASS $cmd"
    }
}
Output: (blade 6/5 has a known issue, it should fail the memory check)
Blade 6/5 checks...
PASS show stats
PASS show version
PASS show adapter detail
PASS show cpu detail
PASS show memory detail
PASS show inventory detail
!term doesn't mean "anything but term" in regex. For that type of logic, you'll need a negative lookahead approach:
Activate-Status(?!.*Ready)|Overall.*Status(?!.*Operable)|Operability(?!.*Operable)|Controller.*Status(?!.*Optimal)|Errors(?!.*0)|Dropped(?!.*0)|Discarded(?!.*0)|Bad(?!.*0)|Suspect(?!.*No)|Thresholded(?!.*0)|Visibility.(?!.*yes)|Thermal.*Status(?!.*OK)|HA.*(?!.*READY)
note: I'd use case insensitivity to filter out both "No" and "no", and also, you must make sure your input is not treated as a single line, but multiple lines, so the .* wildcards don't race past the \n newlines and mess everything up.
@sweaver2112 has the right answer. I'd like to add maintainability into the mix:
use the -expanded flag for additional non-meaningful whitespace
use the -line so . does not match a newline (so "Ready" is on the same line as "Activate-Status")
-nocase for case-insensitive matching (if that's important)
set count [ regexp -all -expanded -line -- {
Activate-Status (?!.*?Ready) |
Overall.*Status (?!.*?Operable) |
Operability (?!.*?Operable) |
Controller.*Status (?!.*?Optimal) |
Errors (?!.*?0) |
Dropped (?!.*?0) |
Discarded (?!.*?0) |
Bad (?!.*?0) |
Suspect (?!.*?No) |
Thresholded (?!.*?0) |
Visibility (?!.*?Yes) |
Thermal.*Status (?!.*?OK) |
HA (?!.*?READY)
} $buffer ]
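A quick usage sketch, assuming the regexp inside the original check_errors proc is replaced with the pattern above (buffer and command name are the ones from the question):
check_errors $expect_out(buffer) "show memory detail"
# With the buffer shown in the question this reports FAIL, since "Operability"
# is not followed by "Operable" on that line.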

Sort multi-line blocks in large (~10GB) file by single token in block

I have a large file (~10GB) full of memory traces in this format:
INPUT:
Address: 7f2da282c000
Data:
0x7f2da282c000
0
0x7f2db4c810d0
0
0x304b2e198
0x304b2e1b8
0x304b3af38
0x304b54528
Address: 603000
Data:
0x603000
0
0x7f2db4c810d0
0
0x304b2e198
0x304b2e1b8
0x304b3af38
0x304b54528
.
.
.
Address: 7f2da2a38dc0
Data:
0x7f2da2a38dc0
0
0x7f2db4c810d0
0
0x304b2e198
0x304b2e1b8
0x304b3af38
0x304b54528
These are addresses and 64 bytes of data at those addresses at different points in time as the accesses occurred. Each hex value in the data field represents 8 bytes. Suppose each address and its data make up one multi-line block.
Certain addresses are accessed/updated multiple times, and I'd like to sort the multi-line blocks so that each address that has multiple updates has those accesses right below it, like this:
OUTPUT:
Address: 7f2da2a38dc0
Data:
0x7f2da2a38dc0
0
0x7f2db4c810d0
0
0x304b2e198
0x304b2e1b8
0x304b3af38
0x304b54528
0x304b2e198
0x304b2e1b8
0x304b3af38
0x304b54528
0x304b2e198
0x304b2e1b8
0x304b3af38
0x304b54528
.
.
.
0x7f2da2a38dc0
0
0x7f2db4c810d0
0
0x7f2da2a38dc0
0
0x7f2db4c810d0
0
Address: 0xadsf212
Data:
[Updates]
[Updates]
.
.
.
[Updates]
Each address that is accessed more than once has its respective updates below it, and addresses that are accessed only once are thrown out.
What I tried:
- Comparing each address to every other address in a simple C++ program, but it's way too slow (it has been running for a couple of days now).
- Used *nix sort to get all the addresses and their counts (sort -k 2,2 bigTextFile.txt | uniq -cd > output file), but it only sorts by the first line of each multi-line block, the deadbeeff part in 'Address: deadbeeff', and the data blocks are left behind. Is there any way for sort to take a set of lines and sort them by a single value in the top line of the block, i.e. the address value, and move the entire block around? I found some awk scripts, but they didn't look applicable.
- Looked into making a database out of the file, with the address, the access index, and the data as three columns, and then running a query for all the data updates that have the same address, but I've never used databases and I'm not sure this is the best approach.
Any recommendations on what I tried, or new approaches, are appreciated.
This is pretty basic file processing. It sounds like you just need to hash the blocks on address and then print the map values that have more than one block. In languages like perl this is simple:
use strict;
sub read_block {
    my @data;
    while (<>) {
        s/^Address: //;                     # Remove "Address: ".
        return \@data unless /\S/;
        push @data, $_ unless /^Data/;      # Ignore "Data:".
    }
    \@data
}
sub main {
    my %map;
    while (1) {
        my $block = read_block;
        last unless scalar(@$block) > 0;
        my $addr = shift @$block;           # Add the block to the hash.
        push @{$map{$addr}}, $block;
    }
    # Just for fun, sort keys numerically by address.
    my @sorted_addr = sort { hex $a <=> hex $b } keys %map;
    # Print blocks that have more than one access.
    foreach my $addr (@sorted_addr) {
        next unless scalar(@{$map{$addr}}) > 1;   # Ignore blocks of 1.
        print "Address: $addr";
        foreach my $block (@{$map{$addr}}) {
            print @$block;
            print "\n";                     # Leave a blank line between blocks.
        }
    }
}
main;
Of course you'll need a machine with enough RAM to hold the data. 32 GB ought to do nicely. If you don't have that, a trickier 2-pass algorithm will do with much less; a rough sketch of that idea follows.
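One way those two passes might look (sketched here in Tcl, the main language of this page, rather than Perl; "trace.txt" stands in for the real file name, and pass 2 still buffers the repeated blocks, just not the single-access ones):
# Pass 1: count how many times each address appears.
set counts [dict create]
set in [open trace.txt r]
while {[gets $in line] >= 0} {
    if {[regexp {^Address: (\S+)} $line -> addr]} {
        dict incr counts $addr
    }
}
close $in
# Pass 2: keep only lines belonging to addresses seen more than once,
# grouped by address, then print each group.
set blocks [dict create]
set addr ""
set in [open trace.txt r]
while {[gets $in line] >= 0} {
    if {[regexp {^Address: (\S+)} $line -> a]} { set addr $a }
    if {$addr ne "" && [dict get $counts $addr] > 1} {
        dict lappend blocks $addr $line
    }
}
close $in
dict for {addr lines} $blocks {
    puts [join $lines \n]
}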

Parsing (partially) non-uniform text blocks in Perl

I have a file with a few blocks that look like this (the content is also in a variable at this point in the program).
Vlan2 is up, line protocol is up
....
reliability 255/255, txload 1/255, rxload 1/255^M
....
Last clearing of "show interface" counters 49w5d
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
....
L3 out Switched: ucast: 17925 pkt, 23810209 bytes mcast: 0 pkt, 0 bytes
33374 packets input, 13154058 bytes, 0 no buffer
Received 926 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
3094286 packets output, 311981311 bytes, 0 underruns
0 output errors, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
Here's a second block, to show you how the blocks can slightly vary:
port-channel86 is down (No operational members)
...
reliability 255/255, txload 1/255, rxload 1/255
...
Last clearing of "show interface" counters 31w2d
...
RX
147636 unicast packets 0 multicast packets 0 broadcast packets
84356 input packets 119954232 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
147636 unicast packets 0 multicast packets 0 broadcast packets
84356 output packets 119954232 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
0 interface resets
I want to pick out certain data elements from each block, which may or may not exist in each block. For example, in the first block I posted I may want to know that there are 0 runts, 0 input errors and 0 overrun. In the second block, I might want to know that there are 0 jumbo packets, collisions, etc. If a given query isn't in the block, it's acceptable to just return na, as this is designed to be processed uniformly.
Each block is structured in a similar way to the two I posted; newlines and spaces delimiting some entries, commas delimiting others.
I have a few ideas as to how this might work. I don't know whether there is any kind of "look back" feature in Perl, but I could attempt to look for the field names (runts, "input errors", etc.) and then grab the preceding integer; that seems like it would be the most elegant solution for this, but I'm unsure whether it's possible.
Currently, I'm doing this in Perl. Each "block" that I'm processing is actually several of these blocks (separated by double newlines). It doesn't have to be done in a single regular expression; I believe it can be done by applying several regular expressions per block. Performance is not really a factor, as this script will run maybe once per hour.
My goal is to get all of this into a .csv file (or some other data format that's easily graphable) in an automated fashion.
Any ideas?
Edit: example output in CSV as I mentioned, which would be written line by line (for multiple entries like this) to a file as the end result. If a particular entry isn't found in the block, it is marked na in the corresponding line:
interface_name,txload,rxload,last_clearing,input_queue,output_drops,runts,....
vlan2,1,1,49w5d,0-75-0-0,0,0,....
port-channel86,1,1,31w2d,na,na,0,...
Simple hash of properties and numbers.
sub extract {
    my ($block) = @_;
    my %r;
    while ($block =~ /(?<num>\d+) \s (?<name>[A-Za-z\s]+)/gmsx) {
        my $name = $+{name};
        my $num = $+{num};
        $name =~ s/\A \s+//msx;
        $name =~ s/\s+ \z//msx;
        $r{$name} = $num;
    }
    return %r;
}
my $block = <<'';
Vlan2 is up, line protocol is up
⋮
my $block2 = <<'';
port-channel86 is down (No operational members)
⋮
use Data::Dumper qw(Dumper);
print Dumper {extract $block};
print Dumper {extract $block2};
I don't think a single regex could do it, nor would I want to support it if it could.
Using multiple regexes, you could easily use something like:
(\d+) runts
(\d+) input errors
...etc...
A simple array of property names and a loop could solve this pretty quickly and with very little fuss.
If you can strip down the input to smaller chunks with some preprocessing, you would be less likely to get false positives.
Here is one way to do it in awk, but this needs lots of tweaking to be perfect.
But again, use SNMP.
awk '{
    printf $1
    for (i=1; i<=NF; i++) {
        if ($i" "$(i+1) ~ /Input queue:/) printf ",%s", $(i+2)
        if ($i ~ /runts/) printf ",%s", $(i-1)
        if ($i ~ /multicast,/) printf ",%s", $(i-1)
    }
    print ""
}' RS="swapped out" file