[[:digit:]] Equivalent for characters bash - regex

What is the regex identifier for alpha characters equivalent to [[:digit:]] for numbers used within bash? My man page scrubbing skills aren't very good today.

man bash and search for "regular expression" leads to a paragraph that points to regex(3)
man 3 regex section "SEE ALSO" points to regex(7)
man 7 regex lists the available character classes, including digit and, what you're after, alpha

If I understand you correctly, you need the character class for alphabetic characters, which is
[[:alpha:]]
and if you're looking for anything other than an alphabetic character, then
[^[:alpha:]] # a ^ at the beginning of a bracket expression negates it
This article is an excellent read on regex.
To grab the valid IP addresses from a file or input, I would use the technique below:
$ cat testfile.ip
Well this is a small para on IP addresses. Well to start with a string
like 172.217.26.206 represents an IP address. Well, in this case it
is Google's IP. To put it short an IP like 192.168.0.16 is the token by
which a computer connected to internet is known to the outside wolrd. It
is all numbers game! Really ! But as humans can't remember such crazy
numbers, some fellow devised a mechanism whereby we can call a computer
by names like 'puppy' or 'vodoo'. This mechanism is called the DNS system
whereby a computer is in charge of redirecting you to '192.168.0.34' in
case you asked for 'vodoo' and '192.168.234.255' in case you asked for
'puppy'. Well you've gateways as if you're going into some big cities. So
you'll often here "You have the wrong gateway, mine is the right one
which is 192.168.0.1." Well you have IPV6 addresses which are evern
crazier numbers. Also, you do have wrong IP addresses like
'288.134.43.22' and '999.1.0.255'. Aha ! You're in no man's land if you
are assigned these IPS. Oh ! are you an alien? Sounds scary. Aww. Bye
$ grep -oP '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' testfile.ip |
awk -v FS="." '{for(i=1;i<=NF;i++){
if($i>=0 && $i<=255){continue;}else{next;}
}}1'
172.217.26.206
192.168.0.16
192.168.0.34
192.168.234.255
192.168.0.1
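The same two-step idea (first grab anything shaped like a dotted quad, then keep only the ones whose octets are in range) can be sketched in Python. This is only an illustration, not a port of the pipeline above; the names CANDIDATE and valid_ipv4_addresses and the sample sentence are made up for the example.

```python
import re

# Step 1: anything shaped like four dot-separated groups of 1-3 digits,
# not glued to further digits on either side.
CANDIDATE = re.compile(r"(?<!\d)\d{1,3}(?:\.\d{1,3}){3}(?!\d)")


def valid_ipv4_addresses(text):
    """Return dotted quads found in text whose four octets are all <= 255."""
    return [
        candidate
        for candidate in CANDIDATE.findall(text)
        # Step 2: the range check that a plain \d{1,3} cannot express.
        if all(int(octet) <= 255 for octet in candidate.split("."))
    ]


# Hypothetical sample text, loosely modelled on testfile.ip.
sample = (
    "Google is 172.217.26.206, my gateway is 192.168.0.1. "
    "Bad ones: '288.134.43.22' and '999.1.0.255'."
)
print(valid_ipv4_addresses(sample))
```

Doing the range check in code rather than in the regex keeps the pattern short and readable, which is the same trade-off the grep plus awk pipeline makes.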

Related

Select multiple variables with Regex inside a single string

Regex101 link
https://regex101.com/r/wOwFEV/2
Background
I have a dump of nmap reports that I want to extract data from and digest.
I have various inputs similar to:
23/tcp open telnet SMC SMC2870W Wireless Ethernet Bridge
The latter three variables change, but the common denominator is:
The first value is ALWAYS 23/tcp
They are ALWAYS separated by more than one space
There will ALWAYS be four values
I would like to use Regex to pluck each "variable" and assign it to a group.
Right now, I have
(?sm)(?=^23\/tcp)(?<port>.*?)\s*open
Which grabs 23/tcp and assigns it to <port>
But I also want to grab:
open and assign it to <state>
telnet and assign it to <service>
SMC SMC2870W Wireless Ethernet Bridge and assign it to <description>
If not an answer, I think knowing how to grab values between '2 or more' white spaces will solve this, but I can't find any similar examples!
A more specific regexp is:
(?sm)(?=^23\/tcp)(?<port>\d+\/\w+)\s+(?<state>\w*?)\s+(?<service>\w*?)\s+(?<description>.*?)\s*$
This restricts the port to be digits/alphanumeric, and state and service to be alphanumeric. It only uses .* for the description, since it's arbitrary text.
And with this change, it's not necessary to require that there be at least 2 spaces between each field, it will work with any number of spaces.
DEMO
Nevermind, got it.
(?sm)(?=^23\/tcp)(?<port>.*?)\s{2,}(?<state>.*?)\s{2,}(?<service>.*?)\s{2,}(?<description>.*?)$
Will do exactly what I described.
https://regex101.com/r/wOwFEV/3
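For anyone doing the same thing in a script rather than on regex101, here is a rough Python sketch of the "fields separated by two or more spaces" idea. Note that Python's re module spells named groups (?P<name>...) instead of (?<name>...). The sample line is my own reconstruction with wide gaps, and NMAP_LINE and parse_nmap_line are names invented for the example.

```python
import re

# Four fields separated by runs of two or more whitespace characters.
# Only the last field (description) may contain single spaces.
NMAP_LINE = re.compile(
    r"^(?P<port>\d+/\w+)\s{2,}"
    r"(?P<state>\S+)\s{2,}"
    r"(?P<service>\S+)\s{2,}"
    r"(?P<description>.*?)\s*$"
)


def parse_nmap_line(line):
    """Return a dict of the four named fields, or None if the line does not fit."""
    match = NMAP_LINE.match(line)
    return match.groupdict() if match else None


# Assumed input: the original report uses several spaces between columns.
sample = "23/tcp   open   telnet   SMC SMC2870W Wireless Ethernet Bridge"
print(parse_nmap_line(sample))
```

If the columns are always separated this way, re.split(r"\s{2,}", line.strip(), maxsplit=3) would give the same four values without named groups.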

regex in notepad++ or sed to return two different strings

I have a report that has information about a list of servers. I want to search this list for servers with uptime over a certain amount, and also get the IP of each such server. I have been using notepad++ to do the searching, but sed syntax would be ok too. The report has data like this:
some.dns.com
up 720 days,
some version
several lines of disk space information, between 14 and 16 lines
Connection to 10.1.1.1 closed.
some.other.dns
up 132 days,
some version
several lines of disk space information, between 14 and 16 lines
Connection to 10.1.1.2 closed.
I've come up with the following so far, which gives me the uptime threshold I need:
up ([9-9]\d|\d{3,} days,)
But I also need the IP addresses to make sense of it, and haven't been able to figure out a way to get JUST the IPs related to the servers with high uptime.
I've found something like this to find IP addresses:
((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))
So, I was hoping to return something like the following:
up 720 days,
10.1.1.1
You may actually use awk:
awk -F"\n" -v RS="" '$0 ~ /up (9[0-9]|[0-9]{3,}) days/{gsub(/Connection to | closed\./, "", $NF); print $1 "\n" $NF}' file > newfile
See the online demo
The file is read paragraph by paragraph (RS=""), and fields are separated by newlines (-F"\n"). If a record matches the pattern up (9[0-9]|[0-9]{3,}) days (up and a space, then either 9 followed by any digit or three or more digits, then a space and days), the static text is stripped from the last field ($NF), and the first and last fields are printed.
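If awk is not available, the same paragraph-by-paragraph logic can be approximated in Python. This sketch assumes, as the awk command does, that server records are separated by blank lines; it also compares the day count numerically instead of with a digit pattern. The function name, the min_days parameter and the sample report are invented for illustration.

```python
import re

UPTIME = re.compile(r"\bup (\d+) days")
CLOSED = re.compile(r"Connection to (\S+) closed\.")


def long_running_servers(report, min_days=90):
    """Return (days, ip) pairs for blank-line-separated records at or above min_days."""
    hits = []
    # Splitting on blank lines plays the role of awk's RS="" paragraph mode.
    for record in re.split(r"\n\s*\n", report.strip()):
        uptime = UPTIME.search(record)
        closed = CLOSED.search(record)
        if uptime and closed and int(uptime.group(1)) >= min_days:
            hits.append((int(uptime.group(1)), closed.group(1)))
    return hits


# Hypothetical report with two records separated by a blank line.
report = """some.dns.com
up 720 days,
some version
Connection to 10.1.1.1 closed.

some.other.dns
up 45 days,
some version
Connection to 10.1.1.2 closed.
"""
print(long_running_servers(report))
```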

Validate Street Address Format

I'm trying to validate the format of a street address in Google Forms using regex. I won't be able to confirm it's a real address, but I would like to at least validate that the string is:
[numbers(max 6 digits)] [word(minimum one to max 8 words with
spaces in between and numbers and # allowed)], [words(minimum one to max four words, only letters)], [2
capital letters] [5 digit number]
I want the spaces and commas I left in between the brackets to be required, exactly where I put them in the above example. This would validate
123 test st, test city, TT 12345
That's obviously not a real address, but at least it requires the entry of the correct format. The data is coming from people answering a question on a form, so it will always be just an address, no names. Plus, all the addresses are in one area, South Florida, where pretty much every address will match this format. The problem I'm having is people not entering a city, or commas, so I want to give them an error if they don't. So far, I've found this
^([0-9a-zA-Z]+)(,\s*[0-9a-zA-Z]+)*$
But that doesn't allow for multiple words between the commas, or the capital letters and numbers for zip. Any help would save me a lot of headaches, and I would greatly appreciate it.
There really is a lot to consider when dealing with a street address--more than you can meaningfully deal with using a regular expression. Besides, if a human being is at a keyboard, there's always a high likelihood of typing mistakes, and there just isn't a regex that can account for all possible human errors.
Also, depending on what you intend to do with the address once you receive it, there's all sorts of helpful information you might need that you wouldn't get just from splitting the rough address components with a regex.
As a software developer at SmartyStreets (disclosure), I've learned that regular expressions really are the wrong tool for this job because addresses aren't as 'regular' (standardized) as you might think. There are more rigorous validation tools available, even plugins you can install on your web form to validate the address as it is typed, and which return a wealth of useful metadata and information.
Try Regex:
\d{1,6}\s(?:[A-Za-z0-9#]+\s){0,7}(?:[A-Za-z0-9#]+,)\s*(?:[A-Za-z]+\s){0,3}(?:[A-Za-z]+,)\s*[A-Z]{2}\s*\d{5}
See Demo
Accepts Apt# also:
(^[0-9]{1,5}\s)([A-Za-z]{1,}(\#\s|\s\#|\s\#\s|\s)){1,5}([A-Za-z]{1,}\,|[0-9]{1,}\,)(\s[a-zA-Z]{1,}\,|[a-zA-Z]{1,}\,)(\s[a-zA-Z]{2}\s|[a-zA-Z]{2}\s)([0-9]{5})
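To try the format rule from the question outside of Google Forms, here is a small Python sketch. It only approximates the stated rule (number, 1 to 8 street words, comma, 1 to 4 city words, comma, two capitals, five digits) and uses re.fullmatch so the whole answer must fit; Google Forms may treat patterns differently, so take this as a test harness rather than a drop-in. ADDRESS_FORMAT and looks_like_address are made-up names.

```python
import re

ADDRESS_FORMAT = re.compile(
    r"\d{1,6} "                              # house number, up to 6 digits
    r"(?:[A-Za-z0-9#]+ ){0,7}[A-Za-z0-9#]+, "  # 1-8 street words, then a comma
    r"(?:[A-Za-z]+ ){0,3}[A-Za-z]+, "        # 1-4 city words, letters only, then a comma
    r"[A-Z]{2} "                             # two capital letters
    r"\d{5}"                                 # five-digit ZIP
)


def looks_like_address(text):
    """True when the whole string fits the required layout (not a real-address check)."""
    return ADDRESS_FORMAT.fullmatch(text) is not None


print(looks_like_address("123 test st, test city, TT 12345"))   # fits the layout
print(looks_like_address("123 test st test city TT 12345"))     # commas missing
```

This only catches layout mistakes such as a missing city or comma; it says nothing about whether the address exists.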

Matching variable words in a string

This will sound extremely nerdy, but I play this online game that writes its in-game events to a log file. There's a program I'm using that is capable of reading this log file, and it's also capable of interpreting regex. My goal is to write a regex command that analyzes a certain string from this log file and then spits out certain parts of the string onto my screen.
The string that gets written to the log file has the following syntax (variables in capitals):
NAME hits/bashes/crushes/claws/whatever NEWNAME for NUMBER points of damage.
If it matters, NUMBER will never contain commas or spaces, and the action verb (hits, bashes, whatever) will only ever be a single word without any special characters, spaces, numbers, etc.
What I'd like this program to do is interpret the regex code that I enter and spit out a result that says: NAME attacks NEWNAME
The catch is, NAME and NEWNAME can have the following range of possibilities (names and examples picked at random):
Kevin
Kevin's pet
Kevin from Oregon
Kevin from Oregon's pet
Kevin from Oregon`s pet (note the grave accent there instead of the apostrophe)
It's pretty simple if it's just something like Kevin hits Josh for 10728 points of damage. In this case, my regex is the following code block (please note that the program interprets the {N} wildcard on its own as any number without the need for regex):
(?<char1>\w+) \w+ (?<char2>\w+) for {N} points of damage.
...and my output reads...
${char1} attacks ${char2}
Whenever the game outputs that string of Kevin hits Josh for 10728 points of damage. to the log file, the program I'm using picks up on it and correctly outputs Kevin attacks Josh to my screen.
However, using that regex line results in a failure when spaces, apostrophes, grave accents, and/or any combination of the three are present in either NAME or NEWNAME.
I tried to alter the regex line to read...
(?<char1>[a-zA-Z0-9_ ]+) \w+ (?<char2>[a-zA-Z0-9_ ]+) for {N} points of damage.
...but when I encounter the string Kevin bashes Josh of Texas for 2132344 points of damage., for example, the output to my screen winds up being:
Kevin bashes Josh attacks Texas.
I'm trying different things but ultimately not coming up with something that's spitting out the proper format of NAME attacks NEWNAME when those two variables contain spaces, apostrophes, grave accents, and/or any combination of the three.
Any help or tips on what I'm doing wrong or how I can further alter that regex line would be extremely appreciated!
This is going to sound even nerdier, but I think the question isn't the regex, it's what tool you use the regex in.
Your biggest problem thus far has been the names. I suggest ignoring the names, and focusing only on the elements you know are there. The names are what's left.
I tried this myself using GNU sed:
sed -e 's/for [[:digit:]]\+ points of damage//' -e 's/hits\|bashes\|crushes/attacks/'
You see, first we can eliminate the end of the sentence, which is wholly superfluous. Then, we simply switch the verb to "attacks".
If the program uses a synonym for "attacks" that you don't have yet, you'll still have reasonable output; you can then fix your regex to include the new synonym.
You are guaranteed trouble if somebody's name includes "bashes" (or whatever) in it.
The second sed expression should be improved to be relevant only at a word boundary, but I'll leave that as an exercise for the reader. :)
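The "anchor on what you know, the names are what's left" idea can also be written as a single pattern. Here is a rough Python sketch; it assumes the verbs come from a known list (KNOWN_VERBS is hypothetical, extend it as new verbs show up) and it uses Python's (?P<name>...) group syntax, so it would need adapting to the log-reading program's own regex flavour and its {N} wildcard.

```python
import re

# Hypothetical verb list; a name that itself contains one of these words
# surrounded by spaces will still confuse the match.
KNOWN_VERBS = ("hits", "bashes", "crushes", "claws")

DAMAGE_LINE = re.compile(
    r"^(?P<attacker>.+?) (?:" + "|".join(KNOWN_VERBS) + r") "
    r"(?P<target>.+) for \d+ points of damage\.$"
)


def summarize(line):
    """Turn 'NAME verb NEWNAME for N points of damage.' into 'NAME attacks NEWNAME'."""
    match = DAMAGE_LINE.match(line.strip())
    if match is None:
        return None
    return f"{match['attacker']} attacks {match['target']}"


print(summarize("Kevin hits Josh for 10728 points of damage."))
print(summarize("Kevin from Oregon`s pet bashes Josh of Texas for 2132344 points of damage."))
```

Because the names are matched with `.+`, spaces, apostrophes and grave accents need no special handling; only the verb and the fixed tail of the sentence are spelled out.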

RegEx to match Bitcoin addresses?

I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:
A Bitcoin address, or simply address, is an identifier of 27-34
alphanumeric characters, beginning with the number 1 or 3 [...]
I figured it would look something like this
/^[13][a-zA-Z0-9]{27,34}/
Thing is, I'm not good with regular expressions and I haven't found a single source to confirm this would not create false negatives.
I've found one online that's ^1[1-9A-Za-z][^OIl]{20,40}, but I don't even know what the [^OIl] part means and it doesn't seem to match the 3 a Bitcoin address could start with.
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
will match a string that starts with either 1 or 3 and, after that, 25 to 34 characters of either a-z, A-Z, or 0-9, excluding l, I, O and 0 (not valid characters in a Bitcoin address).
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
A Bitcoin address is an identifier of 26-35 alphanumeric characters, beginning with the number 1 or 3, made up of random digits and uppercase and lowercase letters, with the exception that the uppercase letter O, uppercase letter I, lowercase letter l, and the number 0 are never used to prevent visual ambiguity.
[^OIl] matches any character that's not O, I or l. The problems in your regex are:
You don't have a $ at the end, so it'd match any string beginning with a BC address.
You didn't count the first character in your {27,34} - that should be {26,33}
However, as mentioned in a comment, a regex is not a good way to validate a bitcoin address.
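The two problems listed above (no $ anchor, and the first character not being counted in the quantifier) are easy to see with made-up strings. This Python sketch uses synthetic inputs that only have the right shape; they are not real addresses.

```python
import re

ASKER = re.compile(r"^[13][a-zA-Z0-9]{27,34}")    # pattern from the question
FIXED = re.compile(r"^[13][a-zA-Z0-9]{26,33}$")   # first char counted, end anchored

# Synthetic test strings (shape only, not real addresses).
shortest = "1" + "a" * 26            # 27 characters in total
trailing = "1" + "a" * 33 + "!!!"    # 34 plausible characters followed by junk

# The question's pattern wants 28-35 characters, so the 27-character string fails.
print(bool(ASKER.search(shortest)), bool(FIXED.search(shortest)))

# Without $, the question's pattern accepts a string with trailing junk.
print(bool(ASKER.search(trailing)), bool(FIXED.search(trailing)))
```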
^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
Based on the new address type Bech32
Based on the answers by runeks and Erhard Dinhobl, I came up with this, which accepts both Bech32 and legacy addresses:
\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b
Including testnet address:
\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b
Only testnet:
\b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b
Based on the description here: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki I would say the regex for a Bech32 bitcoin address for Version 1 and Version 0 (only for mainnet) is:
\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b
Here are some other links where I found information:
https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
http://r6.ca/blog/20180106T164028Z.html
As the OP didn't provide a specific use case (only matching criteria) and I came across this while researching methods to detect BitCoin addresses, I wanted to post back and share with the community.
The RegEx provided so far will find BitCoin addresses only at the start and/or end of a line. My use case was to find BitCoin addresses in the body of an email given the rise of blackmail/sextortion (Reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/) - so these weren't effective solutions (as outlined later). The proposed RegEx will catch many false positives (FPs) in email, due to filenames and other identifiers within URLs. I am not knocking the solutions, as they work for certain use cases, but they simply don't work for mine. One variation caught many spam emails within a short timeframe of passive alerting (examples follow).
Here are my test cases:
--------------------------------------------------------
BitCoin blackmail formats observed (my org and online):
--------------------------------------------------------
BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4
BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67
Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
--------------------------------------------------------
Other possible BitCoin test cases I added:
--------------------------------------------------------
- What if text comes before and/or after on same line? Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
- Standalone address:
1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72
--------------------------------------------------------
Redacted Body content generating FPs from spam emails:
--------------------------------------------------------
src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"
"cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah
src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg"
src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"
href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah
Some RegEx samples I tested (won't list those I'd knock for greedy globbing with backtracking):
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
(Too narrow and misses BitCoin addresses within a paragraph)
(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
(Still misses text after BTC on same line and triples execution time)
\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
(Too broad and catches URL formats)
The current RegEx I am evaluating which catches all my known/crafted sample cases and eliminates known FPs (specifically avoiding end of sentence period for URL filename FPs):
[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s
One reference point for execution times (shows cost in steps and time): https://regex101.com/
Please feel free to weigh in or provide suggestions on improvements (I am by no means a RegEx master). As I further vet it against email detection of Body content, I will update if other FP cases are observed or more efficient RegEx is derived.
Seth
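A possible variation on the pattern being evaluated above, sketched in Python: instead of requiring a trailing \s, require that the address is not touching any non-whitespace character on either side. That still rejects URL and filename fragments, and additionally matches an address that is the very last thing in the body. The pattern name and helper are invented for this example, and the inputs are lines copied from the test cases above.

```python
import re

# Legacy-shaped address that stands alone between whitespace (or string edges).
ADDRESS_IN_TEXT = re.compile(r"(?<!\S)[13][a-km-zA-HJ-NP-Z1-9]{25,34}(?!\S)")


def find_addresses(body):
    """Return address-shaped tokens that are delimited by whitespace on both sides."""
    return ADDRESS_IN_TEXT.findall(body)


print(find_addresses("Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe."))
print(find_addresses("1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72"))
print(find_addresses('src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg"'))
```

Like the \s version, this deliberately misses an address followed directly by a period or other punctuation, since that is the price of excluding things like `.jpg` filenames.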
for mainnet bitcoin
/^([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
if you don't want to understand the above regex you can skip the detail below
breaking it down
For regular addresses
/[13]{1}/
the address starts with 1 or 3; {1} means exactly one character from the square brackets is matched
/[13]{1}[a-km-zA-HJ-NP-Z1-9]/
cannot have l (small el), I (capital eye), O (capital O) and 0 (zero)
/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/
can be 27 to 34 characters long, remember we already checked the first character to be 1 or 3, so remaining address will be 26 to 33 characters long
For segwit
/bc1/
starts with bc1
/bc1[a-z0-9]/
can only contain lower case letters and numbers
/bc1[a-z0-9]{39,59}/
can be 42 to 62 characters long, we already checked first three characters to be bc1, so remaining address will be 39 to 59 characters long
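The breakdown above can be assembled piece by piece in Python to see which part accepts what. This is just a sketch of the same pattern: the variable names are mine, the SegWit test strings are synthetic (right shape only, not real addresses), and like any regex it checks the shape of an address, not its checksum.

```python
import re

LEGACY = r"[13][a-km-zA-HJ-NP-Z1-9]{26,33}"  # 1 or 3, then 26-33 chars without l, I, O, 0
SEGWIT = r"bc1[a-z0-9]{39,59}"               # bc1, then 39-59 lowercase letters/digits

MAINNET = re.compile(rf"(?:{LEGACY}|{SEGWIT})")


def looks_like_mainnet_address(text):
    """Shape check only: the whole string must fit one of the two layouts."""
    return MAINNET.fullmatch(text) is not None


legacy_sample = "15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8"  # taken from the test cases in this thread
segwit_shape = "bc1" + "q" * 39                        # synthetic, 42 characters

print(looks_like_mainnet_address(legacy_sample))
print(looks_like_mainnet_address(segwit_shape))
print(looks_like_mainnet_address("bc1" + "q" * 38))    # one character too short
```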
I am not into complicated solutions, and this regex served the purpose for the simplest possible validation, when you just don't want to receive complete nonsense.
\w{25,}
For matching legacy, nested SegWit, and native SegWit addresses:
/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
Source: Regex for Bitcoin Addresses.