Select multiple variables with Regex inside a single string - regex

Regex101 link
https://regex101.com/r/wOwFEV/2
Background
I have a dump of nmap reports and I want to extract data from to digest.
I have various inputs similar to:
23/tcp open telnet SMC SMC2870W Wireless Ethernet Bridge
The latter three variables change, but the common denominator is:
The first value is ALWAYS 23/tcp
They are ALWAYS separated by more than one space
There will ALWAYS be four values
I would like to use Regex to pluck each "variable" and assign it to a group.
Right now, I have
(?sm)(?=^23\/tcp)(?<port>.*?)\s*open
Which grabs 23/tcp and assigns it to <port>
But I also want to grab:
open and assign it to <state>
telnet and assign it to <service>
SMC SMC2870W Wireless Ethernet Bridge and assign it to <description>
If not an answer, I think knowing how to grab values between '2 or more' white spaces will solve this, but I can't find any similar examples!

A more specific regexp is:
(?sm)(?=^23\/tcp)(?<port>\d+\/\w+)\s+(?<state>\w*?)\s+(?<service>\w*?)\s+(?<description>.*?)\s$
This restricts the port to be digits/alphanumeric, and state and service to be alphanumeric. It only uses .* for the description, since it's arbitrary text.
And with this change, it's not necessary to require that there be at least 2 spaces between each field, it will work with any number of spaces.
DEMO

Nevermind, got it.
(?sm)(?=^23\/tcp)(?<port>.*?)\s{2,}(?<state>.*?)\s{2,}(?<service>.*?)\s{2,}(?<description>.*?)$
Will do exactly what I described.
https://regex101.com/r/wOwFEV/3

Related

Regex masking all phone numbers except a specific range

Not 100% if this is possible but I would like to convert any outbound call that does not match my DID range to a set phone number. 
With our carrier in Australia if the ANI is not from their supplied range the call is blocked as part of new regulations. 
What I am looking for is something like this. 
if not +61 2 XXXX XXXX - +61 2 XXXX  XXXX  then send as +612XXXX XXXX
I apologise I have no true understanding of regex and do not know even where to begin.
I am starting to work on my knowledge of it though. please be kind. If anyone can point me to an "idiots guide" link I would be appreciative as I am just getting into this.
Of course it's possible. It's just a matter of how much work you want to do. I'm not quite sure what you want to mask and what you want to pass on unmutilated. A couple of particular examples would help. How many different formats, countries, and so on do you need to support?
With these problems, I tend to follow this approach:
Normalize the data. Make them all look the same. So, remove all non-digits, for example. +61 2 XXXX XXXX turns into 612XXXXXXXX. In this step, you'd also fill in implicit information, like a local number that does not include the country code. Number::Phone may be interesting, but, also note is was the largest distro on CPAN for awhile.
Now it should be easier to recognize the number and it's components (because if it isn't, you didn't do Step 1 right). Instead of a regex, you might use a parser. That is, get the country code, and then from that, decide what has to happen next. That's the sort of thing I have to do with ISBNs in Business::ISBN, which have a group code then a publisher code (both of which are variable length.
Once you can recognize the number, it's easy to select a range. If it's in the range, you know what to replace.

Complex Regex is not validating all repetitions of one specific rule

I need help to validate a field using regex. It will run in Postgres 9.5.
The rules are
The string must contain all seven services: Oil, Wiper blades, Air filter, Tires, Battery, Brake, Antifreeze
All services must have the operation hours, and the accepted values are HH[:MM]{am|pm}-HH[:MM]{am|pm}, or the literals ”working hours”, ”after hours”, ”not available” (this is the rule that I couldn't find the solution)
It is case insensitive, and the spaces should be irrelevant.
The services as separated by a pipe, and the service and working hours are separated by a colon
I did the regex:
^(?=.*(Oil))(?=.*(Wiper blades))(?=.*(Air filter))(?=.*(Tires))(?=.*(Battery))(?=.*(Brake))(?=.*(Antifreeze))(?=.*(\s{0,}(1{0,1}[0-2]|[1-9])(:[0-5][0-9]){0,1}\s{0,}([ap]m)\s{0,}-\s{0,}(1{0,1}[0-2]|[1-9])(:[0-5][0-9]){0,1}\s{0,}([ap]m)|working hours|after hours|not availabl)).+
This part of the regex is validating only one sequence, not all seven sequences.
(?=.*(\s{0,}(1{0,1}[0-2]|[1-9])(:[0-5][0-9]){0,1}\s{0,}([ap]m)\s{0,}-\s{0,}(1{0,1}[0-2]|[1-9])(:[0-5][0-9]){0,1}\s{0,}([ap]m)|working hours|after hours|not availabl))
Example of good string
Oil:8AM-10PM|Wiper blades:8 AM -10 PM|Air filter:8AM-10pm|Tires:8AM-10PM|Battery:8AM-10PM|Brake:8AM-9PM|Antifreeze:not available
Example of bad strings
Oil:8AM-10PM|Wiper blades:8AM-10PM|Air filter:8AM-10PM|Tires:8AM-10PM|Battery:8AM-10PM|Brake:8AM-9PM|Antifreeze:fsdfdsfs
Oil:8AM-10PM|Wiper blades:8AM-10PM|Air filter:8AM|Tires:8AM-10PM|Battery:8AM-10PM|Brake:8AM-9PM|Antifreeze:
Oil:8AM-10PM|Wiper blades:8AM-10PM|Air filter:8AM-10PM|Tires:8AM-10PM|Battery:|Brake:|Antifreeze:8AM-9PM
Oil:8AM-10PM|Wiper blades:8AM-10PM
Do someone have any idea what is missing to validate the seven occurrences?
I've made another regex that works :
^(((oil|Air\ filter|Wiper\ blades|Tires|Battery|Brake|Antifreeze):((((\d{1,2})((A|P)M)(-?)){2})|(not available))(\|?)){7})$
How ever, this regex does not take counts of repetition. Which mean, you could have Oil two time it will still works.
I've create a regex101 if you wish to tests more cases.

Validate Street Address Format

I'm trying to validate the format of a street address in Google Forms using regex. I won't be able to confirm it's a real address, but I would like to at least validate that the string is:
[numbers(max 6 digits)] [word(minimum one to max 8 words with
spaces in between and numbers and # allowed)], [words(minimum one to max four words, only letters)], [2
capital letters] [5 digit number]
I want the spaces and commas I left in between the brackets to be required, exactly where I put them in the above example. This would validate
123 test st, test city, TT 12345
That's obviously not a real address, but at least it requires the entry of the correct format. The data is coming from people answering a question on a form, so it will always be just an address, no names. Plus they're all address is one area South Florida, where pretty much all addresses will match this format. The problem I'm having is people not entering a city, or commas, so I want to give them an error if they don't. So far, I've found this
^([0-9a-zA-Z]+)(,\s*[0-9a-zA-Z]+)*$
But that doesn't allow for multiple words between the commas, or the capital letters and numbers for zip. Any help would save me a lot of headaches, and I would greatly appreciate it.
There really is a lot to consider when dealing with a street address--more than you can meaningfully deal with using a regular expression. Besides, if a human being is at a keyboard, there's always a high likelihood of typing mistakes, and there just isn't a regex that can account for all possible human errors.
Also, depending on what you intend to do with the address once you receive it, there's all sorts of helpful information you might need that you wouldn't get just from splitting the rough address components with a regex.
As a software developer at SmartyStreets (disclosure), I've learned that regular expressions really are the wrong tool for this job because addresses aren't as 'regular' (standardized) as you might think. There are more rigorous validation tools available, even plugins you can install on your web form to validate the address as it is typed, and which return a wealth of of useful metadata and information.
Try Regex:
\d{1,6}\s(?:[A-Za-z0-9#]+\s){0,7}(?:[A-Za-z0-9#]+,)\s*(?:[A-Za-z]+\s){0,3}(?:[A-Za-z]+,)\s*[A-Z]{2}\s*\d{5}
See Demo
Accepts Apt# also:
(^[0-9]{1,5}\s)([A-Za-z]{1,}(\#\s|\s\#|\s\#\s|\s)){1,5}([A-Za-z]{1,}\,|[0-9]{1,}\,)(\s[a-zA-Z]{1,}\,|[a-zA-Z]{1,}\,)(\s[a-zA-Z]{2}\s|[a-zA-Z]{2}\s)([0-9]{5})

How to safely bypass tabs in RegEx

I'm using C to do my regular expressions. Things work except for when the input string contains tabs.
This is my RegEx I plug into the regcomp function:
(DROP).*(tcp).*([\\.0-9]+).*0\\.0\\.0\\.0.*dpt:([0-9]+)(.*)
Regcomp returned OK with no issues.
I then used the following string to do the matching with:
DROP\ttcp\t--\t202.153.39.52\t0.0.0.0/0\ttcp dpt:21
I'm using such string to simulate output of iptables because I want to make a program to see which IPs are already listed.
When I execute my program, I receive the following pieces of output after executing the RegEx where the first line is data from the first offset:
DROP tcp -- 202.153.39.52 0.0.0.0/0 tcp dpt:21
DROP
tcp
2
21
Everything is correct except the second-last value. It shows 2, but I expect it to be 202.153.39.52. and I used ([\\.0-9]+) in my RegEx to try to specifically state I only want numbers and dots to match.
How do I fix my RegEx?
UPDATE
I then proceeded to use this RegEx instead in hopes I get each individual octet of the IP address
(DROP).*(tcp).*([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+).*(0\\.0\\.0\\.0).*dpt:([0-9]+)
This is my result:
DROP tcp -- 202.153.39.52 0.0.0.0/0 tcp dpt:21
DROP
tcp
2
153
39
52
0.0.0.0
21
Now this means the first ([0-9]+) isn't processing properly. I should receive a 202, not a 2. Is there something I'm doing wrong? Do I need a special flag for any RegEx function?
I think you're confused about the difference between regex syntax and that syntax encoded as a string (in languages like Java that don't have first class regexes).
Try something more robust and commonsense:
DROP\s+tcp\s+\S+\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+0\.0\.0\.0/0\s+tcp\s+dpt:(\d+)
This will capture the ip address and the port number only. Why would you want to capture a fixed string like DROP?
As a string, this is:
"DROP\\s+tcp\\s+\\S+\\s+(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s+0\\.0\\.0\\.0/0\\s+tcp\\s+dpt:(\\d+)"
Use an online regex tester like this one for testing and to convert from regex to string automatically.

Change WiFi WPA2 passkey from a script

I'm using Raspbian Wheezy, but this is not a Raspberry Pi specific question.
I am developing a C application, which allows the user to change their WiFi Password.
I did not find a ready script/command for this, so I'm trying to use sed.
I pass the SSID name and new key to a bash script, and the key is replaced for the that ssid block within *etc/wpa_supplicant/wpa_supplicant.conf.*.
My application runs as root.
A sample block is shown below.
network={
ssid="MY_SSID"
scan_ssid=1
psk="my_ssid_psk"
}
so far I've tried the following (I've copied the wpa_supplicant.conf to wpa.txt for trying) :
(1) This tries to do the replacement between a range, started when my SSID is detected, and ending when the closing brace, followed by a newline.
SSID="TRIMURTI"
PSK="12345678"
sed -n "1 !H;1 h;$ {x;/ssid=\"${SSID}\"/,/}\n/ s/[[:space:]]*psk=.*\n/\n psk=\"${PSK}\"\n/p;}" wpa.txt
and
(2) This tries to 'remember' the matched pattern, and reproduce it in the output, but with the new key.
SSID="TRIMURTI"
PSK="12345678"
sed -n "1 !H; 1 h;$ {x;s/\(ssid=\"${SSID}\".*psk=\).*\n/\1\"${PSK}\"/p;}" wpa.txt
I have used hold & pattern buffers as the pattern can span multiple lines.
Above, the first example seems to ignore the range & replaces the 1st instance, and then truncates the rest of the file.
The second example replaces the last found psk value & truncates the file thereafter.
So I need help in correcting the above code, or trying a different solution.
If we can assume the fields will always be in a strict order where the ssid= goes before psk=, all you really need is
sed "/^[[:space:]]*ssid=\"$SSID\"[[:space:]]*$/,/}/s/^\([[:space:]]*psk=\"\)[^\"]*/\1$PSK/" wpa.txt
This is fairly brittle, though. If the input is malformed, or if the ssid goes after the psk in your block, it will break. The proper solution (which however is severe overkill in this case) is to have a proper parser for the input format; while that is in theory possible in sed, it would be much simpler if you were to swtich a higher-level language like Python or Perl, or even Awk.
The most useful case is update a password or other value in configuration is to utilize wpa_cli. E.g.:
wpa_cli -i "wlan0" set_network "0" psk "\"Some5Strong1Pass"\"
wpa_cli -i "wlan0" save_config
The save_config method is required to update cfg file: /etc/wpa_supplicant/wpa_supplicant.conf