TCL data structure, Need to extract, match, and compare - regex

I am completely new to programming full-stop.
I have to come up with a program that checks cables that interconnect FPGAs. From the command prompt (TCL interface), I get the following after entering a command:
{A13-D1 CON_CABLE_HT3_25 [serial# of the cable] J1 {1V0 1V2 1V35 1V5 1V8 2V5 3V3} [another random serial number]}
Now this is repeated for every single port on the system. A13 is the name of the port that the cable is connected to. I need to match up the serial numbers of the cables and then put them in an array or something. So if A13 has the same cable serial number as D1, it would mean A13 is connected to D1. That could be written as A13 D1, which would make more sense.
After getting all the connections, I need to compare it against a configuration file, where all the connections are given.
I don't need someone to do this for me, I just don't know how to get started! Any ideas?
I'd like to do this in Tcl, as all the commands that I would be using to get this data are in Tcl.

That line looks like a brace-quoted Tcl list, which given that there's Tcl on the other side is entirely possible. That makes extracting information easy!
Assuming you've got the line in a variable called line, you can get the connection information with lindex:
set connectInfo [lindex $line 0 0]
Then you'd use split and a further lindex to get the two bits of that:
set port [lindex [split $connectInfo "-"] 0]
You could also use regexp to extract the information from $connectInfo:
# Anchored because we're matching the whole string
regexp {^(\w+)-\w+$} $connectInfo -> port
# You should check the result of regexp to see if the match actually succeeded
# if {![regexp {^(\w+)-\w+$} $connectInfo -> port]} {
# error "it didn't match! waaah!"
# }
However, writing a regular expression tends to require knowing the format of the data quite well. (If you're using regular expressions, put them in braces as that avoids backslashitis.)

Related

Select multiple variables with Regex inside a single string

Regex101 link
https://regex101.com/r/wOwFEV/2
Background
I have a dump of nmap reports that I want to extract data from and digest.
I have various inputs similar to:
23/tcp  open  telnet  SMC SMC2870W Wireless Ethernet Bridge
The latter three variables change, but the common denominator is:
The first value is ALWAYS 23/tcp
They are ALWAYS separated by more than one space
There will ALWAYS be four values
I would like to use Regex to pluck each "variable" and assign it to a group.
Right now, I have
(?sm)(?=^23\/tcp)(?<port>.*?)\s*open
Which grabs 23/tcp and assigns it to <port>
But I also want to grab:
open and assign it to <state>
telnet and assign it to <service>
SMC SMC2870W Wireless Ethernet Bridge and assign it to <description>
If not an answer, I think knowing how to grab values between '2 or more' white spaces will solve this, but I can't find any similar examples!
A more specific regexp is:
(?sm)(?=^23\/tcp)(?<port>\d+\/\w+)\s+(?<state>\w*?)\s+(?<service>\w*?)\s+(?<description>.*?)\s*$
This restricts the port to be digits/alphanumeric, and state and service to be alphanumeric. It only uses .* for the description, since it's arbitrary text.
And with this change, it's not necessary to require that there be at least 2 spaces between each field, it will work with any number of spaces.
Nevermind, got it.
(?sm)(?=^23\/tcp)(?<port>.*?)\s{2,}(?<state>.*?)\s{2,}(?<service>.*?)\s{2,}(?<description>.*?)$
Will do exactly what I described.
https://regex101.com/r/wOwFEV/3
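As a rough illustration of the "two or more spaces" idea, here is a minimal Python sketch. Python spells named groups as (?P<name>...) rather than (?<name>...). The function name parse_nmap_line and the sample line (with the columns assumed to be separated by two spaces) are made up for this example.

```python
import re

# Columns are separated by runs of two or more whitespace characters;
# the description may itself contain single spaces.
LINE_RE = re.compile(
    r"^(?P<port>\d+/\w+)\s{2,}(?P<state>\S+)\s{2,}"
    r"(?P<service>\S+)\s{2,}(?P<description>.*?)\s*$"
)

def parse_nmap_line(line):
    """Return a dict of the four fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line, assuming two spaces between columns.
sample = "23/tcp  open  telnet  SMC SMC2870W Wireless Ethernet Bridge"
fields = parse_nmap_line(sample)
print(fields)
```

Using \S+ for the state and service keeps those groups from swallowing the separator, so only the description needs the lazy .*? form.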

How to safely bypass tabs in RegEx

I'm using C to do my regular expressions. Things work except for when the input string contains tabs.
This is my RegEx I plug into the regcomp function:
(DROP).*(tcp).*([\\.0-9]+).*0\\.0\\.0\\.0.*dpt:([0-9]+)(.*)
Regcomp returned OK with no issues.
I then used the following string to do the matching with:
DROP\ttcp\t--\t202.153.39.52\t0.0.0.0/0\ttcp dpt:21
I'm using such string to simulate output of iptables because I want to make a program to see which IPs are already listed.
When I execute my program, I receive the following pieces of output after executing the RegEx where the first line is data from the first offset:
DROP tcp -- 202.153.39.52 0.0.0.0/0 tcp dpt:21
DROP
tcp
2
21
Everything is correct except the second-last value. It shows 2, but I expect it to be 202.153.39.52. I used ([\\.0-9]+) in my RegEx to try to specifically state that I only want numbers and dots to match.
How do I fix my RegEx?
UPDATE
I then proceeded to use this RegEx instead in hopes I get each individual octet of the IP address
(DROP).*(tcp).*([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+).*(0\\.0\\.0\\.0).*dpt:([0-9]+)
This is my result:
DROP tcp -- 202.153.39.52 0.0.0.0/0 tcp dpt:21
DROP
tcp
2
153
39
52
0.0.0.0
21
Now this means the first ([0-9]+) isn't processing properly. I should receive a 202, not a 2. Is there something I'm doing wrong? Do I need a special flag for any RegEx function?
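I think you're confused about the difference between regex syntax and that syntax encoded as a string literal (in languages like C or Java that don't have first-class regexes). The truncated capture itself comes from the greedy .* in front of the group: it consumes as much of the address as it can while still letting the rest of the pattern match, which leaves the capture group with a single digit.
Try something more robust that matches the separators explicitly: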
DROP\s+tcp\s+\S+\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+0\.0\.0\.0/0\s+tcp\s+dpt:(\d+)
This will capture the ip address and the port number only. Why would you want to capture a fixed string like DROP?
As a string, this is:
"DROP\\s+tcp\\s+\\S+\\s+(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s+0\\.0\\.0\\.0/0\\s+tcp\\s+dpt:(\\d+)"
An online regex tester is handy for testing, and some can convert a regex to an escaped string literal automatically.
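To make the greedy-match effect concrete, here is a small Python sketch (not the asker's C program) that runs both patterns against the sample line. Python's re module understands \d and \s. POSIX regcomp does not define them, so a C version would presumably spell them [0-9] and [[:space:]] and compile with REG_EXTENDED. The variable names are illustrative only.

```python
import re

line = "DROP\ttcp\t--\t202.153.39.52\t0.0.0.0/0\ttcp dpt:21"

# Original pattern: the greedy ".*" before the third group swallows most of
# the address, so the group is left with a single digit.
greedy = re.search(
    r"(DROP).*(tcp).*([.0-9]+).*0\.0\.0\.0.*dpt:([0-9]+)", line
)

# Suggested pattern: every separator is matched explicitly, so there is
# nothing for a wildcard to steal from the address.
strict = re.search(
    r"DROP\s+tcp\s+\S+\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    r"\s+0\.0\.0\.0/0\s+tcp\s+dpt:(\d+)",
    line,
)

print(greedy.group(3))
print(strict.groups())
```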

Matching the IP using regular expression

set ip 10.10.
if {[regexp {^(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.?){4}$} $ip match]} { puts $match }
The above pattern matches 10.10. Can anyone tell me how this is happening?
First, using a regular expression to check IP addresses is extremely fragile and unnecessarily complex, and you still have to do the heavy lifting yourself. Instead, use the ip package from Tcllib.
package require ip
If you want to know if a given string is an IPv4 address, just check with
::ip::is 4 $str ;# 1 if valid ipv4, 0 otherwise
or
::ip::version $str ;# returns 4 or 6 for ipv4 or ipv6, -1 otherwise
The commands in the package also handle address strings that aren't dotted decimal.
The package isn't included in all distributions, but can be installed using teacup install or by downloading the files and sourcing them into the script.
To answer the question: the original asker has one error and one problem. The error is that the regular expression used to match the IP address also matches strings that aren't IP addresses. This is one of the most common problems when using regular expressions. The reason and the fix are addressed in other answers to the question. To recap: Captain noted that since the original regular expression makes the dot optional, the string 10.10. can be matched as the four groups 1, 0., 1, 0. (the last one including the trailing dot). There are several possible solutions: {^(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.|$)){4}$} as suggested by the same Captain seems valid but may turn out to have more problems if tested.
The main problem is that a non-trivial regular expression is used to match the address. For all but the most trivial regular expressions, rigorous testing must be performed to ensure that they don't produce false positives. This testing is usually impractical to make exhaustive, which means that you can't know for sure if it works until an angry customer tells you it doesn't. When a case of false positive match is found, the solution is either to drop the regular expression and try another method, or alternatively to make the regular expression more complex in order to make the match more strict. At this point, the test suite may also have to grow.
A better way is to step back and look for other solutions. If there is a standard library function for it, that should be used. If we imagine there is none in this case, simply reflecting on the most basic formulation of an ipv4 decimal-dot address ("four groups of integers from 0 to 255, joined by dots") suggests some simple and safe functions:
proc isOctet n {
    expr {[string is integer -strict $n] && 0 <= $n && $n <= 255}
}
proc splitIpv4dd1 str {
    split $str .
}
proc splitIpv4dd2 str {
    scan $str %d.%d.%d.%d
}
proc splitIpv4dd3 str {
    lrange [regexp -inline {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} $str] 1 end
}
# plug any of the preceding splitIpv4ddN functions into this command
proc putsIpv4dd str {
    set parts [splitIpv4dd1 $str]
    set count 0
    foreach n $parts {
        if {[isOctet $n]} {
            incr count
        }
    }
    # require exactly four parts, so that e.g. 1.2.3.4.x is rejected
    if {[llength $parts] == 4 && $count == 4} {puts $str}
}
It is much easier to verify that each of these functions does its job correctly without false negatives or positives, and if they do, the command to print ip addresses can be assumed to work correctly. The third splitting function uses a regular expression, but in this case it's a trivial one without alternatives and optional atoms.
One important goal when writing robust and maintainable code is to keep functions cohesive and clear-cut without loopholes or irregularities. Matching with non-trivial regular expressions runs counter to this.
I certainly understand and actually applaud the wish to understand what went wrong, but the correct conclusion to draw from this is that regular expression matching isn't a good method to use in this case.
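For comparison, the same "split, then check each part" idea can be sketched in Python. This is only an illustration of the approach described above, not a replacement for the Tcl procedures. The function names are made up, and leading zeros such as 01 are accepted here, which may or may not be what you want.

```python
def is_octet(text):
    """True if text is a plain decimal integer from 0 to 255."""
    # isascii() plus isdigit() rejects signs, spaces, and empty strings.
    return text.isascii() and text.isdigit() and 0 <= int(text) <= 255

def is_ipv4_dotted_decimal(text):
    """True if text is exactly four octets joined by dots."""
    parts = text.split(".")
    return len(parts) == 4 and all(is_octet(part) for part in parts)

for candidate in ["10.10.10.10", "10.10.", "256.1.1.1", "1.2.3.4.x"]:
    print(candidate, is_ipv4_dotted_decimal(candidate))
```

Each function is small enough to check by inspection, which is the point the answer makes about avoiding a non-trivial regular expression.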
You can try to use this regex:
^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$
To answer "how this is happening": the \. is optional, so the pattern matches 10.10. as four groups: 1, 0., 1, 0.
And the answer to the unasked question
The below expression will make the dot optional only if it is the end of the string (modified to ensure no trailing dot):
^(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.(?=[0-9])|$)){4}$
Please remember that the original question was asking "how is this happening" - i.e. understanding the regular expression behaviour... NOTHING about how to change the regex or how this should be done...
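The behaviour is easy to reproduce outside Tcl. The Python sketch below assumes the same patterns carry over unchanged, which holds for the constructs used here. It shows the original expression accepting 10.10. and the lookahead variant rejecting it. The names OCTET, original, and fixed are illustrative.

```python
import re

OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

# Original: the dot after each octet is optional.
original = re.compile(r"^(" + OCTET + r"\.?){4}$")

# Variant: each octet must be followed by a dot that precedes a digit,
# or by the end of the string.
fixed = re.compile(r"^(" + OCTET + r"(\.(?=[0-9])|$)){4}$")

for candidate in ["10.10.", "10.10.10.10", "10.10.10.10."]:
    print(candidate, bool(original.match(candidate)), bool(fixed.match(candidate)))
```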

how to expect percentages and spaces

I am making an expect script to check memory usage and can only proceed to the next steps if the mem usage is less than 65%.
#!/usr/bin/expect -f
spawn telnet $serverip
send "show performance\r"
expect {
    timeout { send_user "\nCPU Usage is too high.\n"; exit 1 }
    "0-65%" # i need to expect 0-65%
}
Then I proceed to other commands.
The output is:
CPU used   MEM used   RX(Kbps)   TX(Kbps)   RX(Kbps)   TX(Kbps)
1.0%       51.2%      0.000      0.000      1.620      2.426
I need to make sure that mem used is less than 65%. How can I do this in an Expect script?
Thanks for the help. It's been killing me.
You have to use a regular expression in expect itself, with the help of the -re flag.
There are two ways to get this done:
1. Match all of the show performance output up to the prompt, then apply Tcl's regexp command to that output.
2. Match only the required value (i.e. the mem used % value) directly.
I assume your device's prompt will be #. But prompts vary between devices, so to handle this we can use a generalized prompt pattern such as
set prompt "#|>|\\\$";
If your device's prompt is not covered by this pattern, add it.
#!/usr/bin/expect -f
# This is a common approach for a few known prompts.
# If your device's prompt is missing here, add it to the pattern.
set prompt "#|>|\\\$"; # The `$` is escaped with a backslash to match a literal '$'
spawn telnet $serverip
# Add code for login here
expect -re $prompt; # The '-re' flag makes expect match any one of the prompts
# Any other code you need goes here
# This clears the previous expect_out(buffer) content
# so that we get exactly the output we need.
expect *;
send "show performance\r"; # '\r' sends a carriage return, i.e. presses Enter
expect -re $prompt; # Match the prompt with a regexp
# Now expect_out(buffer) holds what we need
set output $expect_out(buffer);
# Apply Tcl's regexp to it
if {[regexp {%\s+([^%]+)} $output ignore mem]} {
    puts "Memory used : $mem"
}
I have used the pattern as {%\s+([^%]+)}. In your output, we have 2 percentage symbols. The first one corresponds to the CPU used and second one is for the memory used. So, basically I am trying to match the text % 51.2%
Let me decode the pattern.
% - to match the first percentage sign
\s+ - to match one or more whitespace characters.
[^%]+ - to match anything other than % (this is where we get the required value, i.e. 51.2)
Then why the parentheses? They are for grouping. Expect saves the text matched by the whole pattern in expect_out(0,string), and the nth sub-match in expect_out(n,string): expect_out(1,string) for the first sub-match, expect_out(2,string) for the second, and so on. Expect also stores all the matched input, together with the unmatched input before it, in expect_out(buffer). One more thing might puzzle you: what is expect * doing here? It matches whatever has been received so far, which empties the buffer before the next command is sent.
That's all about the first way. Now, what about the second approach described above? That one is a bit easier.
send "show performance\r";
expect {
    -re {%\s+([^%]+)} { set mem $expect_out(1,string); puts "Memory used : $mem" }
    timeout { puts timeout_happened }
}
This is more convenient, since there is no need to apply a separate regexp afterwards. Use whichever approach suits your requirements.
Once you get the value, you can simply compare it in an if statement to check that it is less than 65%.
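The extraction and the final comparison can be tried offline before wiring them into Expect. The Python sketch below uses a slightly stricter variant of the pattern ([0-9.]+ instead of [^%]+) on a copy of the sample output. The function name and the 65.0 threshold variable are assumptions for illustration. In the Expect script itself the comparison would be a plain Tcl test such as if {$mem < 65} { ... }.

```python
import re

# Sample output copied from the question (column spacing is assumed).
output = (
    "CPU used   MEM used   RX(Kbps)   TX(Kbps)   RX(Kbps)   TX(Kbps)\n"
    "1.0%       51.2%      0.000      0.000      1.620      2.426\n"
)

MEM_LIMIT = 65.0  # proceed only below this percentage

def memory_used_percent(text):
    """Return the second percentage in the output as a float, or None."""
    # The first '%' belongs to CPU used; the number after it is MEM used.
    match = re.search(r"%\s+([0-9.]+)%", text)
    return float(match.group(1)) if match else None

mem = memory_used_percent(output)
ok_to_proceed = mem is not None and mem < MEM_LIMIT
print(mem, ok_to_proceed)
```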

Change WiFi WPA2 passkey from a script

I'm using Raspbian Wheezy, but this is not a Raspberry Pi specific question.
I am developing a C application, which allows the user to change their WiFi Password.
I did not find a ready script/command for this, so I'm trying to use sed.
I pass the SSID name and new key to a bash script, and the key is replaced for that SSID's block within /etc/wpa_supplicant/wpa_supplicant.conf.
My application runs as root.
A sample block is shown below.
network={
    ssid="MY_SSID"
    scan_ssid=1
    psk="my_ssid_psk"
}
So far I've tried the following (I've copied wpa_supplicant.conf to wpa.txt for testing):
(1) This tries to do the replacement within a range that starts where my SSID is detected and ends at the closing brace followed by a newline.
SSID="TRIMURTI"
PSK="12345678"
sed -n "1 !H;1 h;$ {x;/ssid=\"${SSID}\"/,/}\n/ s/[[:space:]]*psk=.*\n/\n psk=\"${PSK}\"\n/p;}" wpa.txt
and
(2) This tries to 'remember' the matched pattern, and reproduce it in the output, but with the new key.
SSID="TRIMURTI"
PSK="12345678"
sed -n "1 !H; 1 h;$ {x;s/\(ssid=\"${SSID}\".*psk=\).*\n/\1\"${PSK}\"/p;}" wpa.txt
I have used hold & pattern buffers as the pattern can span multiple lines.
Above, the first example seems to ignore the range & replaces the 1st instance, and then truncates the rest of the file.
The second example replaces the last found psk value & truncates the file thereafter.
So I need help in correcting the above code, or trying a different solution.
If we can assume the fields will always be in a strict order where the ssid= goes before psk=, all you really need is
sed "/^[[:space:]]*ssid=\"$SSID\"[[:space:]]*$/,/}/s/^\([[:space:]]*psk=\"\)[^\"]*/\1$PSK/" wpa.txt
This is fairly brittle, though. If the input is malformed, or if the ssid goes after the psk in your block, it will break. The proper solution (which however is severe overkill in this case) is to have a proper parser for the input format; while that is in theory possible in sed, it would be much simpler if you were to switch to a higher-level language like Python or Perl, or even Awk.
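As a rough idea of what the higher-level route could look like, here is a minimal Python sketch that rewrites the psk line only inside the network block whose ssid matches, regardless of field order. It is not a full parser: it assumes blocks are written as network={ ... } with no braces inside the values, and the function name set_psk and the sample text are made up for this example.

```python
import re

def set_psk(config_text, ssid, new_psk):
    """Return config_text with psk replaced in the block for the given ssid."""
    ssid_line = re.compile(
        r'^[ \t]*ssid="' + re.escape(ssid) + r'"[ \t]*$', re.MULTILINE
    )
    psk_line = re.compile(r"^([ \t]*psk=).*$", re.MULTILINE)

    def rewrite_block(match):
        block = match.group(0)
        if not ssid_line.search(block):
            return block  # a different network: leave it untouched
        # A function replacement avoids backslash handling in the new key.
        return psk_line.sub(lambda m: m.group(1) + '"' + new_psk + '"', block)

    # Assumption: a block ends at the first closing brace after "network={".
    return re.sub(r"network=\{.*?\}", rewrite_block, config_text, flags=re.DOTALL)

sample = '''network={
    ssid="OTHER"
    psk="keep_me"
}
network={
    psk="my_ssid_psk"
    scan_ssid=1
    ssid="MY_SSID"
}
'''
updated = set_psk(sample, "MY_SSID", "12345678")
print(updated)
```

Writing the result back to the real file, and doing so atomically, is left out of the sketch.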
The most practical way to update a password or other value in the configuration is to use wpa_cli. E.g.:
wpa_cli -i "wlan0" set_network "0" psk "\"Some5Strong1Pass\""
wpa_cli -i "wlan0" save_config
The save_config command is required to write the change to the configuration file, /etc/wpa_supplicant/wpa_supplicant.conf.