Special regex rule for wordlist - regex

How do I write a regex that enforces the following rules?
Must contain at least one number.
Must contain at least one alphabetic character a-z.
No more than 4 of the same number or letter in a row.
Any ideas? Thank you!

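Use lookaheads and a backreference. The (?!.*(.)\1{3}) part rejects any character repeated four or more times in a row; if a run of exactly four should be allowed, use \1{4} instead: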
^(?=.*[a-z])(?=.*\d)(?!.*(.)\1{3}).*$
Edit: If you do not wish to match strings that have white space characters, you can do:
^(?=.*[a-z])(?=.*\d)(?!.*(.)\1{3})\S*$ // replaced the . at the end with \S
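A quick way to check both patterns is a short Python sketch. The helper name is_valid and the sample strings are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern from the answer: at least one a-z, at least one digit,
# and no character repeated four or more times in a row.
PATTERN = re.compile(r"^(?=.*[a-z])(?=.*\d)(?!.*(.)\1{3}).*$")
# Variant from the edit: additionally rejects any whitespace.
PATTERN_NO_SPACE = re.compile(r"^(?=.*[a-z])(?=.*\d)(?!.*(.)\1{3})\S*$")

def is_valid(word, allow_whitespace=True):
    """Return True if word satisfies the chosen pattern (hypothetical helper)."""
    pattern = PATTERN if allow_whitespace else PATTERN_NO_SPACE
    return pattern.match(word) is not None

for sample in ["abc123", "abcdef", "123456", "aaa1", "aaaa1", "ab 1"]:
    print(repr(sample), is_valid(sample), is_valid(sample, allow_whitespace=False))
```

Only "ab 1" should differ between the two variants, since it is the only sample containing whitespace.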

Capturing uppercase words in text with regex

I'm trying to find words that are in uppercase in a given piece of text. The words must be one after the other to be considered and they must be at least 4 of them.
I have "almost" working code, but it captures much more: [A-Z]*(?: +[A-Z]*){4,}. The match also includes spaces at the start or the end of those words (like a boundary).
I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2
Is there a way to make the regex in example capture only the words in the first sentence? The language I'm using is Go and it has no look-behind/ahead.
In your regex, you just need to change the second * for a +:
[A-Z]*(?: +[A-Z]+){4,}
Explanation
With (?: +[A-Z]*), you are matching "one or more spaces followed by zero or more letters", so a run of spaces alone satisfies it. After replacing the * with a +, spaces only match when uppercase letters follow them.
Replace the *s by +s, and your regex only matches the words in the first sentence.
[A-Z]* also matches the empty string. Looking at your regex and ignoring both [A-Z]*, all that remains is a sequence of spaces. Using + makes sure that there is at least one uppercase character after each run of spaces.
You have to require at least one uppercase letter in each repetition: [A-Z]*(?: +[A-Z]+){4,}.
A looser variant also allows the spaces to be absent: [A-Z]*(?: *[A-Z]+){4,}.
The * after the space makes the spaces optional, so uppercase letters match even without spaces between them.
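The idea can be tried out in Python. The question targets Go, but the pattern below uses no lookaround, so it should port to Go's regexp package. This is only a sketch: it assumes "at least 4 words" means one word plus three more (hence [A-Z]+ first and {3,}), it adds \b so partial words are not matched, and the sample text and the name find_upper_runs are made up for illustration.

```python
import re

# One uppercase word followed by at least three more, separated by spaces.
UPPER_RUN = re.compile(r"\b[A-Z]+(?: +[A-Z]+){3,}\b")
# The pattern from the question, kept to show why it over-matches.
ORIGINAL = re.compile(r"[A-Z]*(?: +[A-Z]*){4,}")

def find_upper_runs(text):
    """Return every run of four or more consecutive uppercase words."""
    return UPPER_RUN.findall(text)

text = "THIS IS A TEST sentence with Some UPPER words AND MORE HERE NOW ok"
print(find_upper_runs(text))

# The original pattern is satisfied by four bare spaces, with no letters at all.
print(repr(ORIGINAL.search("one    two").group()))
```

Because every repetition now has to contain at least one uppercase letter, neither leading nor trailing spaces end up in the match.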

Limiting RegEx to match only a string of 1-254 characters length

This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings less than 255 characters.
I've tried adding a lookahead at the start of the RegEx, but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
You need the $ in the lookahead to make sure the string is at most 254 characters long. Otherwise, the lookahead succeeds even when there are more than 254 characters.
(?=.{1,254}$)
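The difference is easy to see in isolation. In this Python sketch the body \w+ is only a stand-in for the real pattern (an assumption made to keep the example short); what matters is the lookahead with and without $.

```python
import re

# Lookahead without "$": succeeds as soon as at least one character follows,
# so it places no upper bound on the total length.
UNANCHORED = re.compile(r"^(?=.{1,254})\w+$")
# Lookahead with "$": the 1 to 254 characters must reach the end of the string.
ANCHORED = re.compile(r"^(?=.{1,254}$)\w+$")

long_input = "a" * 300
print(bool(UNANCHORED.match(long_input)))
print(bool(ANCHORED.match(long_input)))
print(bool(ANCHORED.match("a" * 254)))
```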
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that an unescaped dash must be first (or last) in the character class to be a literal dash, and the caret must not be first.
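The placement rules for the dash and the caret can be checked with a few one-character tests. This sketch uses Python's re module, where a dash is literal when it comes first, last, or escaped; other engines may differ in details, and the helper name matches is hypothetical.

```python
import re

def matches(char_class, ch):
    """Return True if the single character ch matches the character class."""
    return re.fullmatch(char_class, ch) is not None

print(matches(r"[-ac]", "-"))   # dash first: literal dash
print(matches(r"[ac-]", "-"))   # dash last: literal dash
print(matches(r"[a\-c]", "b"))  # escaped dash: no range, so "b" is excluded
print(matches(r"[a-c]", "b"))   # dash between two characters: range a to c
print(matches(r"[a^]", "^"))    # caret not first: literal caret
print(matches(r"[^a]", "a"))    # caret first: negated class
```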
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
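For anyone without Regex Hero at hand, the complete string can also be exercised in Python. This is a sketch only: the answer does not say which engine it was written for, the pattern is split across lines purely for readability, and the function name and sample inputs are assumptions.

```python
import re

ADDRESS = re.compile(
    r"^(?=.{1,254}$)"                                        # total length 1 to 254
    r"[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*"   # dot-separated local part
    r"#"                                                     # separator used in the question
    r"((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"     # IP address or domain name
)

def is_valid_address(value):
    """Return True if value matches the simplified pattern (hypothetical helper)."""
    return ADDRESS.match(value) is not None

samples = [
    "john.doe#example.com",
    "john..doe#example.com",
    ".john#example.com",
    "user#192.168.0.1",
    "a#" + "b" * 248 + ".com",   # exactly 254 characters
    "a#" + "b" * 249 + ".com",   # 255 characters
]
for sample in samples:
    print(len(sample), is_valid_address(sample))
```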
This depends on what language you are working in. In Python, for example, you can use a regex to split the text into separate strings, and then use len() to discard strings of 255 characters or more.
I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.

Regex: Match after first letter

I have a list of words as follows:
cat
concatenate
matter
pattern
hat
rather
fathom
at
saturate
vat
I need a regular expression to match any words which are a single letter followed by the letters 'at'.
I currently have [A-Za-z]at but that includes the 'cat' and 'nat' in 'concatenate' and the 'rat' in 'saturate'.
How can I make it look for exactly one character before the 'at', and make sure that there is not more than 1 character before it? I tried using {1} but that still didn't work. Thanks for your help.
Use word boundary:
\b[A-Za-z]at\b
or, if the string contains just those 3 characters, you can use anchors:
^[A-Za-z]at$
You can use ^[A-Za-z]at$
[A-Za-z] matches a single letter. The at that follows is matched literally.
The ^ and $ anchors force the match to start at the beginning of the string and end at its end.
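Both suggestions can be run against the word list from the question. This Python sketch joins the words into one string for the \b version and tests them one at a time for the anchored version; re.fullmatch stands in for ^...$ here, and the variable names are illustrative.

```python
import re

WORDS = ["cat", "concatenate", "matter", "pattern", "hat",
         "rather", "fathom", "at", "saturate", "vat"]

SINGLE_LETTER_AT = re.compile(r"\b[A-Za-z]at\b")

# Word boundaries: search inside one long string.
text = " ".join(WORDS)
boundary_hits = SINGLE_LETTER_AT.findall(text)
print(boundary_hits)

# Anchors: each word is its own string, so the whole string must match.
anchored_hits = [w for w in WORDS if re.fullmatch(r"[A-Za-z]at", w)]
print(anchored_hits)
```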

Perl matching characters bigger than a given length

I have been struggling to write a regex that matches words longer than a given length within parentheses. At first I thought I could do this with \(\w{a,}\), but I realized that it doesn't match words containing whitespace, such as (ab cd ef). All I want is to find any run of characters within parentheses longer than, for instance, 3 characters. How can I resolve this problem?
What is a word with white space?
If you want to match any character, then use .
\(.{3,}\)
. matches any character except newlines
But be careful: this is greedy. It will also match, for example,
(a)123(b)
To avoid this you could do something like
\([^)]{3,}\)
[^)] means any character except a )
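The greedy behaviour is easy to reproduce. The question is about Perl, but these two patterns behave the same way in Python's re module, which this sketch uses; the second sample string is made up for illustration. Note that {3,} means "at least 3", so "longer than 3" in the strict sense would need {4,}.

```python
import re

GREEDY = re.compile(r"\(.{3,}\)")
NEGATED = re.compile(r"\([^)]{3,}\)")

# The greedy dot runs across the first ")" and swallows both groups.
print(GREEDY.findall("(a)123(b)"))
# The negated class cannot cross a ")", so neither short group matches.
print(NEGATED.findall("(a)123(b)"))
# Only the parenthesized text with at least 3 characters is returned.
print(NEGATED.findall("x (ab cd ef) y (a) z"))
```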
You could use a character class that includes both \w and \s:
\([\w\s]{a,}\)
Maybe you mean this?
\([\w\s]{a,}\)
If it has a space in it, it's not a word anymore.
Is matching any character fine, as in \(.{a,}\)? Or do you just need to add whitespace, as in \((\w|\s){a,}\)?

Regex to validate number and letter sequence

I want a regex to validate inputs of the form AABBAAA, where A is a letter (a-z, A-Z) and B is a digit (0-9). All the As must be the same, and so must the Bs.
If all the A's and B's are supposed to be the same, I think the only way to do it would be:
([a-zA-Z])\1([0-9])\2\1\1\1
Where \1 and \2 refer to the first and second parenthetical groupings. However, I don't think all regex engines support this.
It's really not as hard as you think; you've got most of the syntax already.
[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}
The numbers in braces ({}) tell how many times to match the previous character or set of characters, so that matches [a-zA-Z] twice, [0-9] twice, and [a-zA-Z] three times.
Edit: If you want to make sure the matched string is not part of a longer string, you can use word boundaries; just add \b to each end of the regex:
\b[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}\b
Now "Ab12Cde" will match but "YZAb12Cdefg" will not.
Edit 2: Now that the question has changed, backreferences are the only way to do it. edsmilde's answer should work; however, you may need to add the word boundaries to get your final solution.
\b([a-zA-Z])\1([0-9])\2\1\1\1\b
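To see how the backreference version differs from the plain quantifier version, here is a small Python comparison. Python's re module supports \1 and \2; the sample inputs and constant names are assumptions chosen for illustration.

```python
import re

# Backreferences: the same letter and the same digit must repeat.
SAME_LETTER_DIGIT = re.compile(r"\b([a-zA-Z])\1([0-9])\2\1\1\1\b")
# Quantifiers only: checks the letter/digit shape, not equality.
SHAPE_ONLY = re.compile(r"\b[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{3}\b")

for sample in ["aa11aaa", "Ab12Cde", "aa12aaa", "YZAb12Cdefg"]:
    same = bool(SAME_LETTER_DIGIT.search(sample))
    shape = bool(SHAPE_ONLY.search(sample))
    print(sample, same, shape)
```

The backreferences are case-sensitive here, so "Aa11aaa" would not count as all the same letter unless the pattern is compiled with re.IGNORECASE.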
[a-zA-Z]{2}\d{2}[a-zA-Z]{3}