How to safely bypass tabs in RegEx - regex

I'm using regular expressions in C. Everything works except when the input string contains tabs.
This is my RegEx I plug into the regcomp function:
(DROP).*(tcp).*([\\.0-9]+).*0\\.0\\.0\\.0.*dpt:([0-9]+)(.*)
Regcomp returned OK with no issues.
I then used the following string to do the matching with:
DROP\ttcp\t--\t202.153.39.52\t0.0.0.0/0\ttcp dpt:21
I'm using this string to simulate the output of iptables because I want to write a program that shows which IPs are already listed.
When I execute my program, I receive the following pieces of output after executing the RegEx where the first line is data from the first offset:
DROP tcp -- 202.153.39.52 0.0.0.0/0 tcp dpt:21
DROP
tcp
2
21
Everything is correct except the second-to-last value. It shows 2, but I expect 202.153.39.52. I used ([\\.0-9]+) in my RegEx to state that only digits and dots should match.
How do I fix my RegEx?
UPDATE
I then tried this RegEx instead, hoping to get each individual octet of the IP address:
(DROP).*(tcp).*([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+).*(0\\.0\\.0\\.0).*dpt:([0-9]+)
This is my result:
DROP tcp -- 202.153.39.52 0.0.0.0/0 tcp dpt:21
DROP
tcp
2
153
39
52
0.0.0.0
21
Now this means the first ([0-9]+) isn't processing properly. I should receive a 202, not a 2. Is there something I'm doing wrong? Do I need a special flag for any RegEx function?

I think you're confused about the difference between regex syntax and that syntax encoded as a string (in languages like Java that don't have first class regexes).
Try something more robust and commonsense:
DROP\s+tcp\s+\S+\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+0\.0\.0\.0/0\s+tcp\s+dpt:(\d+)
This will capture only the IP address and the port number. Why would you want to capture a fixed string like DROP?
As a string, this is:
"DROP\\s+tcp\\s+\\S+\\s+(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s+0\\.0\\.0\\.0/0\\s+tcp\\s+dpt:(\\d+)"
Use an online regex tester for testing and to convert from regex to string automatically.
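For what it's worth, the stray 2 in the question looks like a greediness effect rather than a tab problem: the `.*` in front of the IP group takes as much as it can and leaves the group only the final digit of 52. Here is a minimal sketch of that, written in Python purely for illustration (the question uses C). One caveat, as far as I know: POSIX regcomp does not define \d, and \s is a GNU extension, so with REG_EXTENDED the portable spellings would be [0-9] and [[:space:]].

```python
import re

line = "DROP\ttcp\t--\t202.153.39.52\t0.0.0.0/0\ttcp dpt:21"

# Pattern from the question: the greedy ".*" before the IP group
# consumes "\t--\t202.153.39.5", leaving only the last digit.
greedy = re.compile(r"(DROP).*(tcp).*([.0-9]+).*0\.0\.0\.0.*dpt:([0-9]+)(.*)")
greedy_ip = greedy.search(line).group(3)

# Spelling out the whitespace between fields leaves no ".*" to overreach.
strict = re.compile(
    r"DROP\s+tcp\s+\S+\s+([0-9.]+)\s+0\.0\.0\.0/0\s+tcp\s+dpt:([0-9]+)"
)
ip, port = strict.search(line).groups()

print(greedy_ip)  # 2
print(ip, port)   # 202.153.39.52 21
```

The same idea should carry over to regcomp: describe each separator explicitly instead of bridging fields with `.*`.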

Related

Bash replace substring after first colon

I am trying to build a connection string that requires pulling 3 IP addresses from another config file. When I get those values, I need to replace the port on each. I plan to replace each port using simple Bash find and replace ${string/pattern/replacement} but my problem is I'm stuck on the best way to parse the pattern out of the IP.
Here is what I have so far:
myFile.config:
ip.1=ip-ip-1-address:1234:5678
ip.2=ip-ip-2-address:1234:5678
ip.3=ip-ip-3-address:1234:5678
Copying some other simple process, I found I can pull the value of each IP like this:
IP1=`grep "ip.1=" /path/to/conf/myFile.config | awk -F "=" '{print $2}'`
which gives me ip-ip-1-address:1234:5678. However, I need to replace 1234:5678 with 6543, for example. I've been looking around and found an awesome answer that details Bash prefix removal, but that relies on knowing the prefix. For example, I would have to do it this way:
test=${ip1##ip-ip-1-address:}
which results in $test being 1234:5678. That's fine but maybe I don't know the IP address as the parameter, so I'm back to considering regex unless there's a way for me to use * as the parameter or something, but I have been unsuccessful so far. For regex, I have tried a bunch such as test=${ip1/(?<=:).*/}.
Note that the ${ip1/(?<=:).*/} you tried uses Bash parameter expansion, which supports glob-style patterns only, not regex, so the lookbehind is not understood.
You seem to want
x='ip.1=ip-ip-1-address:1234:5678'
echo "${x%%:*}:6543" # => ip.1=ip-ip-1-address:6543
${x%%:*} takes the value of x and removes the longest suffix matching :*, that is, everything from the first : to the end of the string. "${x%%:*}:6543" then appends :6543 to the result.
To extract that value, you may also use
awk '/^ip\.1=/{sub("^[^:]+:", "");print}' myFile.config
The awk command finds lines starting with ip.1= and then removes all text from the start till the first colon including the colon and only prints these values.
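If you end up doing this outside the shell, the same "keep everything before the first colon" idea can be sketched in Python. The function name here is made up for illustration, and returning the value unchanged when there is no colon is my assumption about what you'd want.

```python
def replace_after_first_colon(value: str, new_suffix: str) -> str:
    """Keep the text before the first ':' and append new_suffix after it."""
    head, sep, _ = value.partition(":")
    # If there is no colon at all, return the value unchanged (assumption).
    return f"{head}:{new_suffix}" if sep else value


print(replace_after_first_colon("ip-ip-1-address:1234:5678", "6543"))
# ip-ip-1-address:6543
```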

Select multiple variables with Regex inside a single string

Regex101 link
https://regex101.com/r/wOwFEV/2
Background
I have a dump of nmap reports from which I want to extract data to digest.
I have various inputs similar to:
23/tcp open telnet SMC SMC2870W Wireless Ethernet Bridge
The latter three variables change, but the common denominator is:
The first value is ALWAYS 23/tcp
They are ALWAYS separated by more than one space
There will ALWAYS be four values
I would like to use Regex to pluck each "variable" and assign it to a group.
Right now, I have
(?sm)(?=^23\/tcp)(?<port>.*?)\s*open
Which grabs 23/tcp and assigns it to <port>
But I also want to grab:
open and assign it to <state>
telnet and assign it to <service>
SMC SMC2870W Wireless Ethernet Bridge and assign it to <description>
If not an answer, I think knowing how to grab values between '2 or more' white spaces will solve this, but I can't find any similar examples!
A more specific regexp is:
(?sm)(?=^23\/tcp)(?<port>\d+\/\w+)\s+(?<state>\w*?)\s+(?<service>\w*?)\s+(?<description>.*?)\s*$
This restricts the port to be digits/alphanumeric, and state and service to be alphanumeric. It only uses .* for the description, since it's arbitrary text.
And with this change, it's not necessary to require that there be at least 2 spaces between each field, it will work with any number of spaces.
Never mind, got it.
(?sm)(?=^23\/tcp)(?<port>.*?)\s{2,}(?<state>.*?)\s{2,}(?<service>.*?)\s{2,}(?<description>.*?)$
Will do exactly what I described.
https://regex101.com/r/wOwFEV/3
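If a regex with named groups isn't a hard requirement, splitting on runs of two or more whitespace characters gets the same four fields. A rough sketch in Python; the sample line is my reconstruction with wide gaps (the spacing in the post above was collapsed), and the field names just mirror the group names from the question.

```python
import re

FIELDS = ("port", "state", "service", "description")

def parse_nmap_row(line: str) -> dict:
    # Split on 2+ whitespace characters, at most 3 times, so single
    # spaces inside the description are left alone.
    parts = re.split(r"\s{2,}", line.strip(), maxsplit=3)
    return dict(zip(FIELDS, parts))

row = parse_nmap_row("23/tcp   open   telnet   SMC SMC2870W Wireless Ethernet Bridge")
print(row["service"])      # telnet
print(row["description"])  # SMC SMC2870W Wireless Ethernet Bridge
```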

TCL data structure, Need to extract, match, and compare

I am completely new to programming full-stop.
I have to come up with a program that checks cables that interconnect FPGAs. From the command prompt (TCL interface), I get the following after entering a command:
{A13-D1 CON_CABLE_HT3_25 [serial# of the cable] J1 {1V0 1V2 1V35 1V5 1V8 2V5 3V3} [another random serial number]}
Now this is repeated for every single port on the system. A13 is the name of the port that the cable is connected to. I need to match up the serial numbers of the cables, and then put them in an array or something. So if A13 has the same cable serial number as D1, it would mean A13 is connected to D1. So it could be written as A13 D1, which would make more sense.
After getting all the connections, I need to compare it against a configuration file, where all the connections are given.
I don't need someone to do this for me, I just don't know how to get started! Any ideas?
I'd like to do this in Tcl, as all the commands that I would be using to get this data are in Tcl.
That line looks like a brace-quoted Tcl list, which given that there's Tcl on the other side is entirely possible. That makes extracting information easy!
Assuming you've got the line in a variable called line, you can get the connection information with lindex:
set connectInfo [lindex $line 0 0]
Then you'd use split and a further lindex to get the two bits of that:
set port [lindex [split $connectInfo "-"] 0]
You could also use regexp to extract the information from $connectInfo:
# Anchored because we're matching the whole string
regexp {^(\w+)-\w+$} $connectInfo -> port
# You should check the result of regexp to see if the match actually succeeded
# if {![regexp {^(\w+)-\w+$} $connectInfo -> port]} {
# error "it didn't match! waaah!"
# }
However, writing a regular expression tends to require knowing the format of the data quite well. (If you're using regular expressions, put them in braces as that avoids backslashitis.)
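Once the port name and cable serial are extracted for each record, the matching step is just grouping by serial. Here is a language-neutral sketch of that logic in Python (a Tcl array or dict keyed by serial would do the same job). The port/serial pairs below are invented placeholders, not real output from your tool.

```python
from collections import defaultdict

# Hypothetical (port, cable serial) pairs, as if already extracted per port.
readings = [("A13", "SN001"), ("B2", "SN002"), ("D1", "SN001"), ("C7", "SN002")]

# Group port names by the serial number of the cable plugged into them.
by_serial = defaultdict(list)
for port, serial in readings:
    by_serial[serial].append(port)

# A cable seen on exactly two ports links those two ports.
connections = sorted(
    tuple(sorted(ports)) for ports in by_serial.values() if len(ports) == 2
)
print(connections)  # [('A13', 'D1'), ('B2', 'C7')]
```

The resulting pairs can then be compared against the pairs listed in the configuration file, for example as two sets.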

Regular Expression in Perl to check fit/match

I'm in great trouble.
I must check if a string fits (matches) another string with RegEx.
For example, given the following string:
Apr 2 13:42:32 sandbox izxp[12000]: Received disconnect from 10.11.106.14: 10: disconnected by user
In the editable input field I give the program the following shortened string:
Received disconnect from 10.11.106.14: 10
If it fits the existing string (as you can see above), it is OK.
If any part of the new edited string doesn't fit the original string, I must warn the user with a message.
Could you help me solving this question with RegEx? Or another method?
I would appreciate it!
You must get the original string in a variable, let's call it $original (this is perl). Then you must get the input from the "editable input field", let's call it $input.
Then it is a simple
if ($original=~/\Q$input\E/)
{
#Your code for a message to the user here
}
Your solution would be less regex and more escaping. Assuming you're going to use no regex patterns and just search for the input string literal, you should write your function so that it turns this
Received disconnect from 10.11.106.14: 10
into this
Received disconnect from 10\.11\.106\.14: 10
In Perl, quotemeta (or \Q...\E inside the pattern) does this escaping; most other languages have an equivalent library function.
That will then allow you to check for a match.
Regular Expressions are more designed for common patterns in strings, rather than finding exact literals.
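To make the escaping point concrete, here is a small sketch in Python rather than Perl, using re.escape to neutralise the dots before searching. The helper name fits is made up for the example; for a purely literal check, a plain substring test would give the same answer.

```python
import re

original = (
    "Apr 2 13:42:32 sandbox izxp[12000]: Received disconnect "
    "from 10.11.106.14: 10: disconnected by user"
)

def fits(original: str, fragment: str) -> bool:
    # re.escape turns "." into "\." so the fragment is matched literally.
    return re.search(re.escape(fragment), original) is not None

print(fits(original, "Received disconnect from 10.11.106.14: 10"))  # True
print(fits(original, "Received disconnect from 10.11.106.15: 10"))  # False
```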

Regexp pattern matching IP and UserAgent in a Huge File

I have a huge log file that has a structure like this:
ip=X.X.X.X
userAgent=Firefox
-----
Referer=hxxp://www.bla.org
I want to create a custom output like this:
ip:userAgent
For example:
X.X.X.X:Firefox
The pattern should ignore lines which don't start with ip= or userAgent=. (These two must form a pair, as I mentioned above.)
I am a newbie administrator and our client needs a sorted file immediately.
Any help will be wonderful.
Thanks.
^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$
Apply in global + multiline mode.
Group 1 will contain the IP, group 2 will contain the user agent string.
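Edit: The above expression can be simplified a bit by dropping the IP address format check, assuming that there will be nothing but real IP addresses in the log file. The outer group keeps the whole address in group 1 (a repeated capturing group on its own would keep only its last repetition):
^ip=((?:\d+\.?)+)[\r\n]+userAgent=(.+)$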
You can use:
^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$
And
^userAgent=(.*)$
Get the group 1 for both and you will have the desired data.
Give this a try (it is in no way robust if your log file has lines that differ from the example snippet above):
sed -n -e '/^ip=/ {s///
N
s/\nuserAgent=/:/
p
}' HugeFile > customoutput
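The same pairing can be sketched in Python with the first regex above applied in multiline mode. The sample text and IP values are invented for the example, and this version assumes the text fits in memory; for a truly huge file you would probably go line by line, as the sed script does.

```python
import re

# An "ip=" line immediately followed by a "userAgent=" line.
PAIR = re.compile(r"^ip=(\d+(?:\.\d+){3})\r?\nuserAgent=(.+)$", re.MULTILINE)

def ip_agent_pairs(text: str) -> list:
    # rstrip() drops a trailing "\r" if the file has Windows line endings.
    return [f"{ip}:{agent.rstrip()}" for ip, agent in PAIR.findall(text)]

sample = (
    "ip=10.0.0.1\nuserAgent=Firefox\n-----\n"
    "Referer=hxxp://www.bla.org\n"
    "ip=10.0.0.2\nuserAgent=Chrome\n"
)
print(ip_agent_pairs(sample))  # ['10.0.0.1:Firefox', '10.0.0.2:Chrome']
```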