Regex - finding all 3-letter words - unexpected outcome

I'm using regexpal.com to practice regular expressions. I decided to start simple and ran into a problem.
Say you want to find all 3-letter words.
\s\w{3}\s
\s - a whitespace character
\w - a word character
{3} - exactly 3 of the previous token
\s - a whitespace character
If I have two 3-letter words next to each other, for example " and the ", only the first is selected. I thought that after finding a match, the regex would go back one character and start searching for the next match from there (in which case it would "find" both " and " and " the ").

(?<=\s)\w{3}(?=\s)
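The problem is overlapping spaces: the space between the two words is needed by both matches.
Use zero-width assertions instead. When you apply \s\w{3}\s to " abc acd ", the regex engine consumes " abc ", including the trailing space, so the only text left is "acd ", which has no leading space and therefore does not match. Lookarounds only assert that the space is there without consuming it.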
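A small Python sketch of what gets consumed (this uses Python's re rather than regexpal, but these simple patterns behave the same way in both):

```python
import re

text = " and the "

# The consuming pattern eats the space after "and", so "the" has no
# leading space left to match and only the first word is found.
consuming = re.findall(r"\s\w{3}\s", text)

# Lookarounds only assert that whitespace is there; they do not consume it,
# so the shared space can serve both words.
lookaround = re.findall(r"(?<=\s)\w{3}(?=\s)", text)

# finditer shows exactly which characters the consuming pattern used up.
spans = [m.span() for m in re.finditer(r"\s\w{3}\s", text)]

print(consuming)   # [' and ']
print(lookaround)  # ['and', 'the']
print(spans)       # [(0, 5)]
```

The span (0, 5) covers " and " including the shared space, which is why the second word is missed.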
EDIT:
\b\w{3}\b
Can also be used.
\b - asserts a position at a word boundary (^\w|\w$|\W\w|\w\W)
or
(?:^|(?<=\s))\w{3}(?=\s|$)
This finds a 3-letter word whether it is at the start, in the middle, or at the end of the string.
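The three alternatives can be compared side by side with a short Python sketch; the sample strings and the helper name three_letter_words are made up for illustration:

```python
import re

PATTERNS = {
    "lookaround": r"(?<=\s)\w{3}(?=\s)",
    "word boundary": r"\b\w{3}\b",
    "start/middle/end": r"(?:^|(?<=\s))\w{3}(?=\s|$)",
}

def three_letter_words(pattern, text):
    """Return every non-overlapping match of pattern in text (hypothetical helper)."""
    return re.findall(pattern, text)

for name, pattern in PATTERNS.items():
    # "the" and "sat" sit at the string edges, so plain lookarounds miss them.
    print(name, three_letter_words(pattern, "the cat sat"))
    # \b treats the apostrophe as a boundary; the whitespace-based patterns do not.
    print(name, three_letter_words(pattern, "don't"))
```

One difference worth knowing: \b\w{3}\b finds "don" inside "don't", because punctuation also counts as a word boundary, while (?:^|(?<=\s))\w{3}(?=\s|$) only accepts words delimited by whitespace or the string edges.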

Related

How to find the first occurrence of substrings not followed by specified characters

I want to select the first occurrence of a letters-only string that is not followed by any of the characters ".", ":" and ";".
For example:
"float a bbc 10" --> "float"
"float.h" --> null
"float:: namespace" --> "namespace"
"float;" --> null
I came up with the regex \G([A-z]+)(?![:;\.]), but it only drops the last letter before a banned character (for "float.h" it still returns "floa"), whereas I need it to skip the whole string before the banned characters.
You may use
/(?<!\S)[A-Za-z]++(?![:;.])/
See the regex demo. Do not use the g modifier, so that only the first match is returned.
The main trick here is to use the possessive ++ quantifier to match all consecutive letters, so that :, ; or . is checked only once, right after the last matched letter.
Pattern details
(?<!\S) - either whitespace or start of string should immediately precede the current location
[A-Za-z]++ - 1+ letters, matched possessively so that the engine cannot backtrack into them
(?![:;.]) - a negative lookahead that fails the match if there is a ;, : or . immediately to the right of the current location.
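Python's re module only supports possessive quantifiers from Python 3.11 on, so this sketch swaps in an equivalent trick that also works on older versions: letters are added to the negative lookahead, so a shorter prefix such as "floa" can never pass. The function name first_word is hypothetical:

```python
import re

# Equivalent of /(?<!\S)[A-Za-z]++(?![:;.])/ without the possessive ++:
# forbidding a following letter stops the engine from backtracking to a
# shorter prefix of the word.
FIRST_WORD = re.compile(r"(?<!\S)[A-Za-z]+(?![A-Za-z:;.])")

def first_word(text):
    """Return the first letters-only word not followed by '.', ':' or ';', else None."""
    m = FIRST_WORD.search(text)  # search() returns only the first match, like omitting /g
    return m.group() if m else None

for s in ["float a bbc 10", "float.h", "float:: namespace", "float;"]:
    print(repr(s), "->", first_word(s))
```

On Python 3.11 or newer, the original pattern (?<!\S)[A-Za-z]++(?![:;.]) can be passed to re.search directly.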

How to use a regex to match if any pattern appears once out of many times in a given sequence

Hard to word this correctly, but TL;DR.
I want to detect whether any space in a given sentence (let's say "THE TREE IS GREEN") is doubled (or more).
Example:
"In this text,
THE TREE IS GREEN should not match,
THE TREE IS GREEN should
and so should THE TREE IS GREEN
but double-spaced TEXT SHOULD NOT BE FLAGGED outside the pattern."
My initial approach would be
/THE( {2,})TREE( {2,})IS( {2,})GREEN/
but this only matches if every space in the sequence is doubled, whereas I'd like any one of the groups to trigger a full match. Am I going the wrong way, or is there a way to make this work?
You can use a negative lookahead, if your regex flavor supports it.
First rule out the sentence that you want to fail, in your case the single-spaced "THE TREE IS GREEN", and then match the most general form that catches your desired result.
(?!THE TREE IS GREEN)(THE[ ]+TREE[ ]+IS[ ]+GREEN)
https://regex101.com/r/EYDU6g/2
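The same idea in Python, as a sketch. The sample text is made up for this example, with the doubled spaces written out explicitly:

```python
import re

# The negative lookahead rejects the exact single-spaced phrase, then the
# general form accepts one or more spaces between the words.
PATTERN = re.compile(r"(?!THE TREE IS GREEN)(THE[ ]+TREE[ ]+IS[ ]+GREEN)")

sample = (
    "THE TREE IS GREEN should not match,\n"
    "THE  TREE IS GREEN should\n"
    "and so should THE TREE IS  GREEN\n"
)

flagged = [m.group(1) for m in PATTERN.finditer(sample)]
print(flagged)  # ['THE  TREE IS GREEN', 'THE TREE IS  GREEN']
```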
You can just search for the spaces that you're looking for:
/ {2,}/ will work to match two or more of the space character. (https://regexr.com/4h4d4)
You can capture the result by surrounding the pattern with parentheses - /( {2,})/
You may want to broaden it a bit.
/\s{2,}/ will match any doubling of whitespace.
(\s - means any whitespace - space, tab, newline, etc.)
No need to match the whole string, just the piece that's of interest.
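A quick Python illustration of the two patterns on a made-up string that contains a doubled space and a tab followed by a space:

```python
import re

sample = "THE  TREE IS\t GREEN"

# Two or more literal spaces only; the tab + space run is not matched.
spaces_only = re.findall(r" {2,}", sample)

# Two or more whitespace characters of any kind (space, tab, newline, ...).
any_ws = re.findall(r"\s{2,}", sample)

# Start/end offsets of each run of spaces, to locate it in the text.
positions = [m.span() for m in re.finditer(r" {2,}", sample)]

print(spaces_only)  # ['  ']
print(any_ws)       # ['  ', '\t ']
print(positions)    # [(3, 5)]
```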
If I understand correctly, you want to match the whole line if it contains 2 or more spaces between 2 uppercase parts.
If that is the case, you might use:
^.*[A-Z]+ {2,}[A-Z]+.*$
^ Start of string
.*[A-Z]+ Match any char except a newline 0+ times, then match 1+ times [A-Z]
[ ]{2,} Match 2 or more times a space (used square brackets for clarity)
[A-Z]+ Match 1+ times an uppercase char
.*$ Match any char except a newline 0+ times until the end of the string
Regex demo
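The whole-line approach in Python, as a sketch with made-up sample text. re.MULTILINE makes ^ and $ match at every line, which this pattern needs when the input has several lines:

```python
import re

# Whole-line pattern from the answer; with re.MULTILINE, ^ and $ match at each
# line break, and . never crosses a newline.
LINE_WITH_DOUBLE_SPACE = re.compile(r"^.*[A-Z]+ {2,}[A-Z]+.*$", re.MULTILINE)

sample = (
    "In this text,\n"
    "THE TREE IS GREEN should not match,\n"
    "THE  TREE IS GREEN should\n"
    "double-spaced TEXT  SHOULD NOT BE FLAGGED\n"
)

lines = LINE_WITH_DOUBLE_SPACE.findall(sample)
print(lines)
```

Note that this pattern flags any doubled space between two uppercase runs, so the last sample line is reported as well.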
You could do this:
import re

pattern = r"THE +TREE +IS +GREEN"
test_str = ("In this text,\n"
            "THE TREE IS GREEN should not match,\n"
            "THE TREE IS GREEN should\n"
            "and so should THE TREE IS GREEN\n"
            "but double-spaced TEXT SHOULD NOT BE FLAGGED outside the pattern.")

# Find every occurrence of the phrase, whatever the number of spaces between words
matches = re.finditer(pattern, test_str, re.MULTILINE)
for match in matches:
    # Report only occurrences whose spacing differs from the single-spaced phrase
    if match.group() != 'THE TREE IS GREEN':
        print(match.group())

What is the regex for "any number doesn't start with 0, but allows spaces in front and behind"

I have tried the following regex:
(([ ]+[1-9](\d+)?+[ ]+)|([ ]+[1-9](\d+)?)|[1-9](\d+)?[ ]+)|([1-9](\d+)?)
The strings I am testing against the regex are the following:
"Good" strings that should match:
" 3 "
"3"
"3 "
"Bad" strings that should fail:
" 03 "
"03"
"03 "
I have this so far, but the results I get are weird.
You could simply try this regex:
^\s*[1-9]\d*\s*$
You can use this regex:
^\s*[1-9][0-9]*\s*$
See demo
Explanation:
^ - Beginning of a line/string
\s* - Optional whitespace, any number of repetitions
[1-9] - A digit not equal to 0
[0-9]* - any number of digits
\s*$ - Trailing whitespace before string/line end
This regex will work with almost any regex engine, since not all of them know \d (although most recognize it). Instead of \s, you can even use a literal space, but again, that depends on the regex engine you choose.
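A minimal Python check of this regex against the "good" and "bad" strings from the question:

```python
import re

# Optional leading/trailing whitespace around a number whose first digit is 1-9.
NO_LEADING_ZERO = re.compile(r"^\s*[1-9][0-9]*\s*$")

good = [" 3 ", "3", "3 "]
bad = [" 03 ", "03", "03 "]

for s in good + bad:
    print(repr(s), bool(NO_LEADING_ZERO.match(s)))
```

Zeros after the first digit are still allowed, so a string such as " 10 " matches.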
This will do
^\s*[1-9]\d*\s*$
But others were faster to post it.

Need a regular expression for alphanumeric with 1 hyphen present and spaces in between words

Can you please provide me with a regular expression that would
Allow only alphanumeric characters
Have exactly one hyphen in the entire string
Not allow a hyphen or space at the front or back of the string
Not allow consecutive spaces or hyphens
Allow a hyphen and one space to be next to each other
Valid - "123-Abc test1","test- m e","abc slkh-hsds"
Invalid - " abc ", " -hsdj sdsd hjds- "
Thanks for helping me out on the same. Your help is much appreciated
/^([a-zA-Z0-9] ?)+-( ?[a-zA-Z0-9])+$/
See demo here.
EDIT:
If there can't be a space on both sides of the hyphen, then there needs to be a little more:
/^([a-zA-Z0-9] ?)+-(((?<! -) )?[a-zA-Z0-9])+$/
Alternatively, if negative lookbehind assertions aren't supported (e.g. in older JavaScript engines, since lookbehind only arrived in ES2018), use this equivalent regex:
/^([a-zA-Z0-9]( (?!- ))?)+-( ?[a-zA-Z0-9])+$/
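A Python sketch that runs the three regexes from this answer on the valid and invalid examples from the question, plus a made-up string "abc - def" with a space on both sides of the hyphen. Python's re supports the fixed-width lookbehind (?<! -), so all three compile:

```python
import re

BASIC = re.compile(r"^([a-zA-Z0-9] ?)+-( ?[a-zA-Z0-9])+$")
# Rejects a space on both sides of the hyphen via a negative lookbehind.
NO_SPACE_BOTH_SIDES = re.compile(r"^([a-zA-Z0-9] ?)+-(((?<! -) )?[a-zA-Z0-9])+$")
# Same rule expressed with a negative lookahead instead of a lookbehind.
LOOKAHEAD_ONLY = re.compile(r"^([a-zA-Z0-9]( (?!- ))?)+-( ?[a-zA-Z0-9])+$")

valid = ["123-Abc test1", "test- m e", "abc slkh-hsds"]
invalid = [" abc ", " -hsdj sdsd hjds- "]

for name, rx in [("basic", BASIC), ("lookbehind", NO_SPACE_BOTH_SIDES),
                 ("lookahead", LOOKAHEAD_ONLY)]:
    ok = all(rx.match(s) for s in valid) and not any(rx.match(s) for s in invalid)
    # Last column: does "abc - def" (spaces on both sides) still match?
    print(name, ok, bool(rx.match("abc - def")))
# basic True True
# lookbehind True False
# lookahead True False
```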
1. Only alphanumeric characters (hyphen and space included, otherwise it'd make no sense):
^[\da-zA-Z -]+$
This is the main part that matches the string and makes sure that every character is in the given set, i.e. digits and ASCII letters as well as space and hyphen (whose use is restricted by the following parts).
2. Only one hyphen, and none at the start or end of the string:
(?=^[^-]+-[^-]+$)
This is a lookahead assertion making sure that the string starts and ends with at least one non-hyphen character, with exactly one hyphen in between.
3. No space at the start or end of the string:
(?=^[^ ].*[^ ]$)
Again a lookahead, similar to the one above. The two could be combined into one, but that looks much messier and is harder to explain.
4. No consecutive spaces (consecutive hyphens are already ruled out by 2. above):
(?!.*  )
Putting it all together:
(?!.*  )(?=^[^ ].*[^ ]$)(?=^[^-]+-[^-]+$)^[\da-zA-Z -]+$
Quick PowerShell test:
PS> $re='(?!.*  )(?=^[^ ].*[^ ]$)(?=^[^-]+-[^-]+$)^[\da-zA-Z -]+$'
PS> "123-Abc test1","test- m e","abc slkh-hsds"," abc ", " -hsdj sdsd hjds- " -match $re
123-Abc test1
test- m e
abc slkh-hsds
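The same check as a Python sketch. The "no consecutive spaces" lookahead needs two spaces, (?!.*  ); with a single space it would reject every string containing a space, which would contradict the output above. The two extra strings in the test ("a  b-c" and "abc") are made up:

```python
import re

# The four conditions combined; the first lookahead contains two spaces.
COMBINED = re.compile(r"(?!.*  )(?=^[^ ].*[^ ]$)(?=^[^-]+-[^-]+$)^[\da-zA-Z -]+$")

samples = ["123-Abc test1", "test- m e", "abc slkh-hsds", " abc ", " -hsdj sdsd hjds- "]

# Keep only the matching strings, like PowerShell's -match applied to an array.
kept = [s for s in samples if COMBINED.search(s)]
print(kept)  # ['123-Abc test1', 'test- m e', 'abc slkh-hsds']
```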
Use this regex:
^(.+-.+)[\da-zA-Z]+[\da-zA-Z ]*[\da-zA-Z]+$