Regex smallest capture - regex

I'm trying to find interfaces that contain specific words in the config 'blocks'. E.g.:
!
interface FastEthernet303
description Customer Access
switchport access vlan 40
no ip address
!
interface Vlan1
no ip address
shutdown
!
interface Vlan343
description Customer_LAN
vrf forwarding 1
ip address 1.1.1.1 255.255.254.0
!
interface Dialer1
description 1-1-1
ip flow monitor GO-FLOW input
ip flow monitor GO-FLOW output
keepalive 5 3
!
I want only the interface Dialer1 to be captured.
If I use '!\ninterface (.*?)\n(.*).*?\n!' as the regex, it starts from
the first interface (it matches the first !\ninterface, etc.) and captures across interface blocks. I want it to capture only the blocks that contain, for example, the keywords 'flow monitor':
interface Dialer1
description 1-1-1
ip flow monitor GO-FLOW input
ip flow monitor GO-FLOW output
keepalive 5 3
I've tried some negative lookaheads but can't seem to get it right.
Anyone able to help with this please?
The next step would be to extract the interface name but that should be easy once I have the first part.
Many Thanks
Frank

I actually think you can do this without using lookarounds. In the pattern below I use [^!] to cautiously proceed in the pattern without passing an interface marker !.
interface[^!]*flow monitor[^!]*
This answer relies heavily on ! serving as a divider between interfaces. If that is not the case, my answer will have to change.
Demo
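For the follow-up step of extracting the interface name, the same idea can be sketched in Python. This is only an illustration under assumptions: the capture group around the name, the constant CONFIG (copied from the question) and the function name flow_monitor_interfaces are not part of the original answer.

```python
import re

# Sample config from the question; "!" separates the interface blocks.
CONFIG = """!
interface FastEthernet303
description Customer Access
switchport access vlan 40
no ip address
!
interface Vlan1
no ip address
shutdown
!
interface Vlan343
description Customer_LAN
vrf forwarding 1
ip address 1.1.1.1 255.255.254.0
!
interface Dialer1
description 1-1-1
ip flow monitor GO-FLOW input
ip flow monitor GO-FLOW output
keepalive 5 3
!
"""

# Same pattern as in the answer, plus a group (an addition) for the name.
# [^!]* can never cross a "!" divider, so a match stays inside one block.
BLOCK_RE = re.compile(r"interface (\S+)[^!]*flow monitor[^!]*")


def flow_monitor_interfaces(config):
    """Return the names of interfaces whose block mentions 'flow monitor'."""
    return [m.group(1) for m in BLOCK_RE.finditer(config)]


def flow_monitor_blocks(config):
    """Return the full text of each matching block, without trailing whitespace."""
    return [m.group(0).rstrip() for m in BLOCK_RE.finditer(config)]
```

Like the pattern itself, this breaks if a ! appears inside a block, for example in a description line.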

Related

One line command that sets error code 1 if string in multi-line stdin, 0 if not

When I run the command
nmcli --get-values TYPE connection show --active
I sometimes receive a list of values as follows
vpn
802-3-ethernet
tun
tun
But other times the vpn line is not present. (the order of the lines cannot be assumed)
I'm looking for a one-liner that will accept the output of that nmcli command (presumably via pipe/stdin?) and return an exit code of 1 when vpn is in that list and an exit code of 0 when vpn is not in that list.
What I've Tried
Every combination of grep that I can think of. grep -v will absolutely not work because it will always find a line that is not vpn. Other options to grep return data, but do not change the error code (that I can find).
Every negation regex I can find or think of. Regexes in the form ^(?!vpn).*$ do not work because there will always be a line that does not say vpn.
Use Case
I am writing a systemd service to update my dynamic DNS. But I don't want to set my dynamic DNS while I'm on a VPN. I want to use systemd built-in abilities (as much as possible) to control it. So I want to use the systemd built-in ExecStartPre= (which fails the unit on exit code 1+) to control whether the service starts.
If you've got a way to run a service (or not) using systemd depending on whether a VPN is connected, I'll accept that in lieu of the above. But naive assumptions like "tun0 active=VPN" are false for me. I have various tun connections active at any one time, for various reasons. So triggering on sys-subsystem-net-devices-tun0.device does not work.
What Doesn't Work
Most of the Google and SO results I find are for line-specific negation and do not address my use case where there will be multiple lines. Or they return the values and do not set the error code. I need error codes set.
Line-based: Regular expression to match a line that doesn't contain a word
Line-based: How to negate specific word in regex?
Instead of just checking for the existence of vpn in the output, count the number of occurrences, then evaluate a conditional expression on that count:
nmcli --get-values TYPE connection show --active | [ "$(grep -c ^vpn)" -eq 0 ]
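If a small helper script is acceptable instead of a shell one-liner, the same decision can be sketched in Python. The function name vpn_exit_code is hypothetical, and the sketch assumes the nmcli output has already been read into a string (for example from standard input).

```python
def vpn_exit_code(nmcli_output):
    """Return 1 if any line is exactly 'vpn', otherwise 0.

    The whole output is inspected at once, so line order does not matter
    and lines such as 'tun' cannot cause a false result.
    """
    types = [line.strip() for line in nmcli_output.splitlines()]
    return 1 if "vpn" in types else 0


# Possible wiring in a script called from ExecStartPre= (an assumption):
#   import sys
#   sys.exit(vpn_exit_code(sys.stdin.read()))
```

Unlike grep -c ^vpn, this compares whole lines, so a hypothetical type such as vpn-foo would not count as a VPN.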

Select multiple variables with Regex inside a single string

Regex101 link
https://regex101.com/r/wOwFEV/2
Background
I have a dump of nmap reports that I want to extract data from and digest.
I have various inputs similar to:
23/tcp  open  telnet  SMC SMC2870W Wireless Ethernet Bridge
The latter three variables change, but the common denominator is:
The first value is ALWAYS 23/tcp
They are ALWAYS separated by more than one space
There will ALWAYS be four values
I would like to use Regex to pluck each "variable" and assign it to a group.
Right now, I have
(?sm)(?=^23\/tcp)(?<port>.*?)\s*open
Which grabs 23/tcp and assigns it to <port>
But I also want to grab:
open and assign it to <state>
telnet and assign it to <service>
SMC SMC2870W Wireless Ethernet Bridge and assign it to <description>
If not an answer, I think knowing how to grab values between '2 or more' white spaces will solve this, but I can't find any similar examples!
A more specific regexp is:
(?sm)(?=^23\/tcp)(?<port>\d+\/\w+)\s+(?<state>\w*?)\s+(?<service>\w*?)\s+(?<description>.*?)\s$
This restricts the port to be digits/alphanumeric, and state and service to be alphanumeric. It only uses .* for the description, since it's arbitrary text.
And with this change, it's not necessary to require that there be at least 2 spaces between each field, it will work with any number of spaces.
DEMO
Nevermind, got it.
(?sm)(?=^23\/tcp)(?<port>.*?)\s{2,}(?<state>.*?)\s{2,}(?<service>.*?)\s{2,}(?<description>.*?)$
Will do exactly what I described.
https://regex101.com/r/wOwFEV/3
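For anyone doing this in Python, note that named groups are written (?P<name>...) in the re module. The sketch below follows the "two or more whitespace characters" idea from the question. The pattern is an adaptation rather than either answer verbatim, and the names LINE_RE and parse_port_line are made up for the example.

```python
import re

# Fields are separated by two or more whitespace characters, so single
# spaces inside the description stay part of the description.
LINE_RE = re.compile(
    r"^(?P<port>\d+/\w+)\s{2,}"
    r"(?P<state>\S+)\s{2,}"
    r"(?P<service>\S+)\s{2,}"
    r"(?P<description>.*?)\s*$",
    re.MULTILINE,
)


def parse_port_line(line):
    """Return a dict of the four fields, or None if the line does not match."""
    m = LINE_RE.search(line)
    return m.groupdict() if m else None
```

For a line such as "23/tcp  open  telnet  SMC SMC2870W Wireless Ethernet Bridge" this yields port, state, service and description as separate dictionary entries.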

[[:digit:]] Equivalent for characters bash

What is the regex identifier for alpha characters equivalent to [[:digit:]] for numbers used within bash? My man page scrubbing skills aren't very good today.
man bash and search for "regular expression" leads to a paragraph that points to regex(3)
man 3 regex section "SEE ALSO" points to regex(7)
man 7 regex lists the available character classes, including digit and, what you're after, alpha
If I understood you correctly, you need the character class for alphabetic characters, which is
[[:alpha:]]
and if you're looking for anything other than an alphabetic character, then
[^[:alpha:]] # well the ^ in the beginning of a range negates it
This article is an excellent read on regex.
To grab the valid IP addresses from a file or input, I would use the technique below:
$ cat testfile.ip
Well this is a small para on IP addresses. Well to start with a string
like 172.217.26.206 represents an IP address. Well, in this case it
is Google's IP. To put it short an IP like 192.168.0.16 is the token by
which a computer connected to internet is known to the outside wolrd. It
is all numbers game! Really ! But as humans can't remember such crazy
numbers, some fellow devised a mechanism whereby we can call a computer
by names like 'puppy' or 'vodoo'. This mechanism is called the DNS system
whereby a computer is in charge of redirecting you to '192.168.0.34' in
case you asked for 'vodoo' and '192.168.234.255' in case you asked for
'puppy'. Well you've gateways as if you're going into some big cities. So
you'll often here "You have the wrong gateway, mine is the right one
which is 192.168.0.1." Well you have IPV6 addresses which are evern
crazier numbers. Also, you do have wrong IP addresses like
'288.134.43.22' and '999.1.0.255'. Aha ! You're in no man's land if you
are assigned these IPS. Oh ! are you an alien? Sounds scary. Aww. Bye
$ grep -oP '[\d]{0,3}\.[\d]{0,3}\.[\d]{0,3}\.[\d]{0,3}' testfile.ip |
awk -v FS="." '{for(i=1;i<=NF;i++){
if($i>=0 && $i<=255){continue;}else{next;}
}}1'
172.217.26.206
192.168.0.16
192.168.0.34
192.168.234.255
192.168.0.1
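The same two-step idea (find dotted-quad candidates with a regex, then check each octet numerically) can be sketched in Python. The pattern here requires one to three digits per octet, which is slightly stricter than the {0,3} used with grep above. The names CANDIDATE_RE and valid_ipv4_addresses are illustrative only.

```python
import re

# Four groups of 1-3 digits separated by dots, not glued to other digits.
CANDIDATE_RE = re.compile(r"(?<!\d)\d{1,3}(?:\.\d{1,3}){3}(?!\d)")


def valid_ipv4_addresses(text):
    """Return dotted quads from text whose octets are all in the range 0-255."""
    return [
        candidate
        for candidate in CANDIDATE_RE.findall(text)
        if all(int(octet) <= 255 for octet in candidate.split("."))
    ]
```

The range check is done with integer comparison because expressing 0-255 purely as a regex is possible but much harder to read.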

How to safely bypass tabs in RegEx

I'm using C to do my regular expressions. Things work except for when the input string contains tabs.
This is my RegEx I plug into the regcomp function:
(DROP).*(tcp).*([\\.0-9]+).*0\\.0\\.0\\.0.*dpt:([0-9]+)(.*)
Regcomp returned OK with no issues.
I then used the following string to do the matching with:
DROP\ttcp\t--\t202.153.39.52\t0.0.0.0/0\ttcp dpt:21
I'm using such a string to simulate the output of iptables because I want to make a program to see which IPs are already listed.
When I execute my program, I receive the following pieces of output after executing the RegEx where the first line is data from the first offset:
DROP tcp -- 202.153.39.52 0.0.0.0/0 tcp dpt:21
DROP
tcp
2
21
Everything is correct except the second-to-last value. It shows 2, but I expect 202.153.39.52. I used ([\\.0-9]+) in my RegEx to state specifically that only digits and dots should match.
How do I fix my RegEx?
UPDATE
I then proceeded to use this RegEx instead in hopes I get each individual octet of the IP address
(DROP).*(tcp).*([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+).*(0\\.0\\.0\\.0).*dpt:([0-9]+)
This is my result:
DROP tcp -- 202.153.39.52 0.0.0.0/0 tcp dpt:21
DROP
tcp
2
153
39
52
0.0.0.0
21
Now this means the first ([0-9]+) isn't processing properly. I should receive a 202, not a 2. Is there something I'm doing wrong? Do I need a special flag for any RegEx function?
I think you're confused about the difference between regex syntax and that syntax encoded as a string (in languages like Java that don't have first class regexes).
Try something more robust and commonsense:
DROP\s+tcp\s+\S+\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+0\.0\.0\.0/0\s+tcp\s+dpt:(\d+)
This will capture the ip address and the port number only. Why would you want to capture a fixed string like DROP?
As a string, this is:
"DROP\\s+tcp\\s+\\S+\\s+(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s+0\\.0\\.0\\.0/0\\s+tcp\\s+dpt:(\\d+)"
Use an online regex tester like this one for testing and to convert from regex to string automatically.
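It may help to see why the original pattern captures only 2: the .* in front of ([\\.0-9]+) is greedy, so it swallows as much of the address as it can and leaves the group a single character. The tabs are not the cause. The Python sketch below reproduces this with the string from the question and compares it with a pattern that spells out the separators. It uses [0-9] and [ \t] instead of \d and \s because \d is not part of POSIX extended regular expressions, which is what regcomp implements. The constant names are made up for the example.

```python
import re

LINE = "DROP\ttcp\t--\t202.153.39.52\t0.0.0.0/0\ttcp dpt:21"

# Pattern from the question: each ".*" grabs as much as it can, so the
# address group is left with only the final digit of the address.
GREEDY = re.compile(r"(DROP).*(tcp).*([.0-9]+).*0\.0\.0\.0.*dpt:([0-9]+)(.*)")

# Explicit separators pin every field in place, so nothing can be
# swallowed by a preceding wildcard.
ANCHORED = re.compile(
    r"DROP[ \t]+tcp[ \t]+\S+[ \t]+"
    r"([0-9]{1,3}(?:\.[0-9]{1,3}){3})[ \t]+"
    r"0\.0\.0\.0/0[ \t]+tcp[ \t]+dpt:([0-9]+)"
)

greedy_ip = GREEDY.search(LINE).group(3)         # "2"
fixed_ip, fixed_port = ANCHORED.search(LINE).groups()
```

In C, the anchored version can be written the same way apart from the string escaping, with [[:space:]] as a portable alternative to [ \t].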

RegEx to match Bitcoin addresses?

I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:
A Bitcoin address, or simply address, is an identifier of 27-34
alphanumeric characters, beginning with the number 1 or 3 [...]
I figured it would look something like this
/^[13][a-zA-Z0-9]{27,34}/
Thing is, I'm not good with regular expressions and I haven't found a single source to confirm this would not create false negatives.
I've found one online that's ^1[1-9A-Za-z][^OIl]{20,40}, but I don't even know what the [^OIl] part means and it doesn't seem to match the 3 a Bitcoin address could start with.
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
will match a string that starts with either 1 or 3 and, after that, 25 to 34 characters of either a-z, A-Z, or 0-9, excluding l, I, O and 0 (not valid characters in a Bitcoin address).
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
A Bitcoin address is an identifier of 26-35 alphanumeric characters, beginning with the number 1 or 3, made of random digits and uppercase and lowercase letters, with the exception that the uppercase letter O, uppercase letter I, lowercase letter l, and the number 0 are never used to prevent visual ambiguity.
[^OIl] matches any character that's not O, I or l. The problems in your regex are:
You don't have a $ at the end, so it'd match any string beginning with a BC address.
You didn't count the first character in your {27,34} - that should be {26,33}
However, as mentioned in a comment, a regex is not a good way to validate a bitcoin address.
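To illustrate what validation beyond a regex can look like, here is a minimal Python sketch of a Base58Check checksum test. It assumes legacy addresses only (25 bytes once decoded: 1 version byte, a 20-byte hash, and a 4-byte checksum taken from a double SHA-256). The function name base58check_ok is hypothetical, and the sketch does not check the version byte or handle Bech32 addresses.

```python
import hashlib

# Base58 alphabet: no 0, O, I or l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_ok(address):
    """Return True if address decodes to 25 bytes with a valid checksum."""
    number = 0
    for char in address:
        index = B58_ALPHABET.find(char)
        if index < 0:
            return False  # character outside the Base58 alphabet
        number = number * 58 + index
    try:
        raw = number.to_bytes(25, "big")  # left-padded with zero bytes
    except OverflowError:
        return False  # decodes to more than 25 bytes
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum
```

A regex can still serve as a cheap pre-filter before this check. The checksum is what catches typos that happen to stay within the allowed alphabet and length.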
^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
Based on the new address type Bech32
Based on the answers from runeks and Erhard Dinhobl, I arrived at this, which accepts both Bech32 and legacy addresses:
\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b
Including testnet address:
\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b
Only testnet:
\b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b
Based on the description here: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki I would say the regex for a Bech32 bitcoin address for Version 1 and Version 0 (only for mainnet) is:
\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b
Here are some other links where I found information:
https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
http://r6.ca/blog/20180106T164028Z.html
As the OP didn't provide a specific use case (only matching criteria) and I came across this in researching methods to detect BitCoin addresses, wanted to post back and share with the community.
These RegEx provided will find BitCoin addresses either at the start of a line and/or end of the line. My use case was to find BitCoin addresses in the body of an email given the rise of blackmail/sextortion (Reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/) - so these weren't effective solutions (as outlined later). The proposed RegEx will catch many FPs in email, due to filenames and other identifiers within URLs. I am not knocking the solutions, as they work for certain use cases, but they simply don't work for mine. One variation caught many spam emails within a short timeframe of passive alerting (examples follow).
Here are my test cases:
--------------------------------------------------------
BitCoin blackmail formats observed (my org and online):
--------------------------------------------------------
BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4
BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67
Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
--------------------------------------------------------
Other possible BitCoin test cases I added:
--------------------------------------------------------
- What if text comes before and/or after on same line? Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
- Standalone address:
1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72
--------------------------------------------------------
Redacted Body content generating FPs from spam emails:
--------------------------------------------------------
src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"
"cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah
src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg"
src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"
href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah
Some RegEx samples I tested (won't list those I'd knock for greedy globbing with backtraces):
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
(Too narrow and misses BitCoin addresses within a paragraph)
(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
(Still misses text after BTC on same line and triples execution time)
\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
(Too broad and catches URL formats)
The current RegEx I am evaluating which catches all my known/crafted sample cases and eliminates known FPs (specifically avoiding end of sentence period for URL filename FPs):
[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s
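As a rough way to try this against sample bodies, the search can be sketched in Python. The pattern is a variation on the one above, not the same regex: it uses a lookahead for whitespace or end of input (so a standalone address at the very end of the body still matches) and a lookbehind so a match cannot start in the middle of a longer token. Both changes and the name find_candidate_addresses are assumptions made for the example.

```python
import re

# Legacy-style candidates that start a token and end at whitespace or
# at the end of the text.
ADDRESS_RE = re.compile(
    r"(?<![A-Za-z0-9])[13][a-km-zA-HJ-NP-Z1-9]{25,34}(?=\s|$)"
)


def find_candidate_addresses(body):
    """Return substrings of body that look like legacy Bitcoin addresses."""
    return ADDRESS_RE.findall(body)
```

On the samples above, this finds the address in "Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe." and ignores the .jpg URL lines, because the hex file names there are followed by a period rather than whitespace.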
One reference point for execution times (shows cost in steps and time): https://regex101.com/
Please feel free to weigh in or provide suggestions on improvements (I am by no means a RegEx master). As I further vet it against email detection of Body content, I will update if other FP cases are observed or more efficient RegEx is derived.
Seth
for mainnet bitcoin
/^([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
If you don't want to understand the regex above, you can skip the details below.
Breaking it down:
For regular addresses
/[13]{1}/
The address starts with 1 or 3; {1} means that exactly one character from the square brackets is matched.
/[13]{1}[a-km-zA-HJ-NP-Z1-9]/
It cannot contain l (lowercase el), I (capital eye), O (capital oh) or 0 (zero).
/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/
The address can be 27 to 34 characters long. The first character has already been checked to be 1 or 3, so the rest of the address is 26 to 33 characters long.
For segwit
/bc1/
The address starts with bc1.
/bc1[a-z0-9]/
It can only contain lowercase letters and digits.
/bc1[a-z0-9]{39,59}/
The address can be 42 to 62 characters long. The first three characters have already been checked to be bc1, so the rest of the address is 39 to 59 characters long.
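The length arithmetic in the breakdown can be checked with a short Python sketch. The test strings are synthetic (a prefix followed by one repeated character) and are not real addresses. They only exercise the length limits, and the helper name looks_like_mainnet_address is made up.

```python
import re

MAINNET_RE = re.compile(
    r"^([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$"
)


def looks_like_mainnet_address(candidate):
    """Return True if candidate has the shape the regex above describes."""
    return MAINNET_RE.fullmatch(candidate) is not None


# Synthetic strings: total length is the prefix plus the repeated part.
legacy_27 = "1" + "A" * 26      # shortest accepted legacy form
legacy_26 = "1" + "A" * 25      # one character too short
segwit_42 = "bc1" + "q" * 39    # shortest accepted bc1 form
segwit_63 = "bc1" + "q" * 60    # one character too long
```

fullmatch is used so that a trailing newline is not accepted, which $ alone would allow in Python.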
I am not into complicated solutions, and this regex served the purpose for the simplest validation, when you just don't want to receive complete nonsense.
\w{25,}
For matching legacy, nested SegWit, and native SegWit addresses:
/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
Source: Regex for Bitcoin Addresses.