regular expression for matching - regex

It is for a normal register name, which can be 1 to n characters long and may contain a-zA-Z and -, like
larry-cai, larrycai, larry-c-cai, l,
but - can't be the first or the last character, like
-larry, larry-
My thinking is something like
^[a-zA-Z]+[a-zA-Z-]*[a-zA-Z]+$
but with my regex the length has to be at least 2.
It should be simple, but I don't know how to do it.
It would be nice if you could write it so that it passes http://tools.netshiftmedia.com/regexlibrary/

You didn't specify which regex engine you're using. One way would be (if your engine supports lookaround):
^(?!-)[A-Za-z-]+(?<!-)$
Explanation:
^ # Start of string
(?!-) # Assert that the first character isn't a dash
[A-Za-z-]+ # Match one or more "allowed" characters
(?<!-) # Assert that the previous character isn't a dash...
$ # ...at the end of the string.
If lookbehind is not available (for example in JavaScript):
^(?!-)[A-Za-z-]*[A-Za-z]$
Explanation:
^ # Start of string
(?!-) # Assert that the first character isn't a dash
[A-Za-z-]* # Match zero or more "allowed" characters
[A-Za-z] # Match exactly one "allowed" character except dash
$ # End of string
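As a rough check, both patterns above can be run against the sample names from the question. This is a minimal sketch in Python (an assumption, since the question does not name an engine); Python's re module supports lookahead and fixed-width lookbehind. The helper name check is made up for illustration.

```python
import re

# Patterns copied from the answer above.
WITH_LOOKBEHIND = re.compile(r"^(?!-)[A-Za-z-]+(?<!-)$")
WITHOUT_LOOKBEHIND = re.compile(r"^(?!-)[A-Za-z-]*[A-Za-z]$")

# Sample names taken from the question.
VALID = ["larry-cai", "larrycai", "larry-c-cai", "l"]
INVALID = ["-larry", "larry-"]


def check(pattern):
    """Return True if pattern accepts every VALID name and rejects every INVALID one."""
    accepts_valid = all(pattern.search(name) for name in VALID)
    rejects_invalid = not any(pattern.search(name) for name in INVALID)
    return accepts_valid and rejects_invalid


if __name__ == "__main__":
    print("with lookbehind:", check(WITH_LOOKBEHIND))
    print("without lookbehind:", check(WITHOUT_LOOKBEHIND))
```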

This should do it:
^[a-zA-Z]+(-[a-zA-Z]+)*$
With this, there must be one or more alphabetic characters at the beginning (^[a-zA-Z]+). If a - follows, it must itself be followed by at least one alphabetic character (-[a-zA-Z]+). That group can be repeated any number of times until the end of the string is reached.
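One side effect of this structure is that two dashes in a row are rejected, whereas a pattern built on the character class [A-Za-z-] accepts them. The question does not say which behaviour is wanted, so the sketch below (Python, with the made-up helper accepted) only illustrates the difference.

```python
import re

# Pattern from this answer: every dash must be followed by a letter.
DASH_SEPARATED = re.compile(r"^[a-zA-Z]+(-[a-zA-Z]+)*$")
# Character-class pattern from the lookaround answer, for comparison.
CLASS_BASED = re.compile(r"^(?!-)[A-Za-z-]+(?<!-)$")


def accepted(pattern, name):
    """Return True if the whole name matches the given pattern."""
    return pattern.search(name) is not None


if __name__ == "__main__":
    for name in ["larry-c-cai", "larry--cai", "larry-"]:
        print(name, accepted(DASH_SEPARATED, name), accepted(CLASS_BASED, name))
```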

A simple answer would be:
^(([a-zA-Z])|([a-zA-Z][a-zA-Z-]*[a-zA-Z]))$
This matches either a string with length 1 and characters a-zA-Z or it matches an improved version of your original expression which is fine for strings with length greater than 1.
Credit for the improvement goes to Tim and ridgerunner (see comments).

Try this:
^[a-zA-Z]+([-]*[a-zA-Z])*$

Not sure which lazy group takes precedence..
^[a-zA-Z][a-zA-Z-]*?[a-zA-Z]?$

maybe this?
^[^-]\S*[^-]$|^[^-]{1}$
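The shorter answers above are easy to compare with a small harness. The sketch below is in Python (an assumption about the engine), and the dictionary keys and the helper misclassified are made-up names. Working through the lazy pattern by hand, it appears to accept larry-: the lazy group still grows as far as needed, so it can swallow the trailing dash while the optional final letter matches nothing. The negated-class pattern passes the question's samples but also lets through characters other than letters and dashes, for example larry1.

```python
import re

# Candidate patterns copied from the short answers above; the labels are illustrative.
CANDIDATES = {
    "alternation": r"^(([a-zA-Z])|([a-zA-Z][a-zA-Z-]*[a-zA-Z]))$",
    "optional-dash groups": r"^[a-zA-Z]+([-]*[a-zA-Z])*$",
    "lazy": r"^[a-zA-Z][a-zA-Z-]*?[a-zA-Z]?$",
    "negated class": r"^[^-]\S*[^-]$|^[^-]{1}$",
}

# Sample names taken from the question.
VALID = ["larry-cai", "larrycai", "larry-c-cai", "l"]
INVALID = ["-larry", "larry-"]


def misclassified(regex):
    """Return the sample names that the regex gets wrong."""
    wrong = [name for name in VALID if not re.search(regex, name)]
    wrong += [name for name in INVALID if re.search(regex, name)]
    return wrong


if __name__ == "__main__":
    for label, regex in CANDIDATES.items():
        print(label, misclassified(regex))
```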

Related

Regular expression - Allow period('.') in the middle of the string but not at the end

I am using a regular expression to allow and reject strings based on the criteria--
The expression used-
^([\w\.,'()#&-]|\s)*$
Allows-
exmple_
example(ggg)
exam.pl56e
exam.pl56e.hbhbh.
exampleghh. vgvj
example (bb)ste kklk ae
_example_
Currently, it allows a period in the middle of the string as well as at the end.
I just want to reject the string if the period is at the end, but still allow it in the middle, using the above regular expression.
For example, reject-
Test.test1.
Example.
Test Test.
test#example.
exam.pl56e.hbhbh.
You may merge \s into the preceding character class to simplify the pattern, and then use either
^([\w.,'()#&\s-]*[\w,'()#&\s-])?$
Details
^ - start of string
([\w.,'()#&\s-]*[\w,'()#&\s-])? - an optional sequence (if you want to match at least 1 char, remove ( and )?) of:
[\w.,'()#&\s-]* - 0+ word, ., ,, ', (, ), #, &, whitespace or hyphen chars
[\w,'()#&\s-] - a single word, ,, ', (, ), #, &, whitespace or hyphen char (but not a .)
$ - end of string
Or, a lookbehind version:
^[\w.,'()#&\s-]*$(?<!\.)
It matches a string that consists only of the chars inside the character class, and after the end of the string is matched, the lookbehind checks whether the last char is a dot. If it is, the match fails.
Or, a lookahead
^(?!.*\.$)[\w.,'()#&\s-]*$
Here, (?!.*\.$) checks if the string ends with . after any 0+ chars, and if it does, no match is returned. Else, the string is matched against the [\w.,'()#&\s-]* pattern.
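The lookahead version can be tried against the strings from the question. This is a minimal sketch in Python (an assumption; the question does not name an engine), and the helper is_allowed is a made-up name.

```python
import re

# Lookahead pattern from the answer above.
NO_TRAILING_DOT = re.compile(r"^(?!.*\.$)[\w.,'()#&\s-]*$")

# Strings taken from the question.
ALLOWED = [
    "exmple_",
    "example(ggg)",
    "exam.pl56e",
    "exampleghh. vgvj",
    "example (bb)ste kklk ae",
    "_example_",
]
REJECTED = [
    "Test.test1.",
    "Example.",
    "Test Test.",
    "test#example.",
    "exam.pl56e.hbhbh.",
]


def is_allowed(text):
    """Return True if text uses only permitted chars and does not end with a dot."""
    return NO_TRAILING_DOT.search(text) is not None


if __name__ == "__main__":
    for text in ALLOWED + REJECTED:
        print(repr(text), is_allowed(text))
```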
Just specify that the last character cannot be a period.
^([\w\.,'()#&-]|\s)*[^.]$
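One thing worth checking with this variant: [^.] matches any character except a dot, including characters outside the allowed set, and it also requires at least one character. The Python sketch below (accepts is a made-up helper) illustrates both effects; replacing [^.] with the allowed class minus the dot, as in the first answer, avoids the first one.

```python
import re

# Pattern from the answer above: the final char may be anything except a dot.
LAST_NOT_DOT = re.compile(r"^([\w\.,'()#&-]|\s)*[^.]$")


def accepts(text):
    """Return True if the pattern matches the whole text."""
    return LAST_NOT_DOT.search(text) is not None


if __name__ == "__main__":
    for text in ["Example", "Example.", "Example@", ""]:
        print(repr(text), accepts(text))
```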
A nice trick I've learned is to blacklist certain otherwise-allowed expressions by placing them on their own in an unmatched alternation in front of the matched one.
# sentences containing `foo` or `bar` but not the word `foobar`
^.*foobar.*$|(^.*foo.*$)|(^.*bar.*$)
This is admittedly a bit...verbose here:
^(?:[\w\.,'()#&-]|\s)*\.$|^([\w\.,'()#&-]|\s)*$
So it might be better to use a negative lookbehind
^([\w\.,'()#&-]|\s)*$(?<!\.)
You could use a negative lookahead (?!) to assert that what follows are not the characters in the character class repeated zero or more times ending with a dot at the end of the string.
^(?![\w\.,'()#&\s-]*\.$)[\w\.,'()#&\s-]*$
Note that the asterisk * matches zero or more times, so the empty string is accepted as well.

Regular Expression Valid and Invalid together

In the items below, I want to detect only the valid ones with a regular expression.
A space in the word means invalid, a # sign means invalid, and a word starting with a number is invalid.
Invalid : M_123 ASD
Invalid : M_123#ASD
Invalid : 1_M# ADD
Valid : M_125ASD
Valid : M_125$ASD
I am trying the following:
[A-Za-z0-9_$]
It is not working properly. I need to specify both the valid and the invalid sets for a word.
Can I do this match with a regular expression?
Your regex [A-Za-z0-9_$] presents a character class that matches a single character that is either an ASCII letter or digit, or _ or $ symbols. If you use it with std::regex_match, it would only match a whole string that consists of just one char like that since the pattern is anchored by default when used with that method. If you use it with an std::regex_search, a string like ([_]) would pass, since the regex is not anchored and can find partial matches.
To match 0 or more chars, you need to add * quantifier after your class. To match one or more chars, you need to add + quantifier after your character class. However, you have an additional restriction: a digit cannot appear at the start.
It seems you may use
^[A-Za-z][A-Za-z0-9_$]*$
See the regex demo at regex101.com.
Details:
^ - start of string
[A-Za-z] - an ASCII letter (exactly one occurrence)
[A-Za-z0-9_$]* - 0+ ASCII letters, digits, _ or $
$ - end of string anchor.
Note that with regex_match, you may omit ^ and $ anchors.
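The same distinction can be seen outside C++. The sketch below uses Python, where re.fullmatch and re.search behave roughly like std::regex_match and std::regex_search for this purpose (that parallel is an assumption made for illustration, and is_valid is a made-up helper).

```python
import re

# The character class from the question, and the full pattern suggested above.
SINGLE_CHAR_CLASS = r"[A-Za-z0-9_$]"
FULL_PATTERN = r"[A-Za-z][A-Za-z0-9_$]*"

# Samples taken from the question.
SAMPLES = {
    "M_123 ASD": False,
    "M_123#ASD": False,
    "1_M# ADD": False,
    "M_125ASD": True,
    "M_125$ASD": True,
}


def is_valid(word):
    """Whole-string match; fullmatch anchors both ends, so ^ and $ are not needed."""
    return re.fullmatch(FULL_PATTERN, word) is not None


if __name__ == "__main__":
    # An unanchored search finds the single "_" inside "([_])".
    print(re.search(SINGLE_CHAR_CLASS, "([_])") is not None)
    for word, expected in SAMPLES.items():
        print(repr(word), is_valid(word), expected)
```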
So the requirements are:
it cannot start with a number (I am assuming it must start with a letter)
it cannot contain a space or #
all other characters are valid
You can try this regex: ^[a-zA-Z]((?![\# ]).)+?$
^[a-zA-Z] checks for a letter at the start of the line
((?![\# ]).)+?$ checks that there is no # or space in the remaining part of the line.
EDIT
As per Wiktor's comment the regex can be simplified to ^[a-zA-Z][^# ]+$.

How do I check a whole existing regular expression for a digit?

I have written a regular expression as follows:
"^[\+]{0,1}([\#]|[\*]|[\d]){1,15}$"
In summary this matches an optional '+' sign followed by up to 15 characters which might be '#', '*' or a digit.
However, this means that '+#' will match and this is not a valid result as I always need at least one number.
Typical valid matches might be:
+1234
445678999
+#7897897
+345764756#775
So, given that I've crafted a valid RegEx for these to match, I guess the elegant solution is to use this regex and add some special criterion to globally check for a digit in the result OR somehow disallow anything which doesn't have at least one digit in.
How do I check for that digit?
This solution requires at least one digit in the string, using a lookahead (the (?=...) section):
^(?=.*\d)\+?[#*\d]{1,15}$
Legend
^ # Start of the string (or line with m/multiline flag)
(?=.*\d) # Lookahead that checks for at least one digit in the match
\+? # An optional literal plus '+'
[#*\d]{1,15} # one to fifteen of literal '#' or '*' or digit (\d is [0-9])
$ # End of the string (line with m/multiline flag)
NOTE: this also rejects combinations such as +*, + and #*, which contain no digit.
Try this regex (my first idea initially):
^(?=.*[0-9])[+]?([#*\d]{1,15})$
You can replace [0-9] with \d.
DEMO:
https://regex101.com/r/bM9oE6/3
I'd use
^(?=.*\d)\+?[#*\d]{1,15}$
Explanation:
^ : beginning of line
(?= : lookahead
.*\d : at least one digit
)
\+? : optional +
[#*\d]{1,15} : 1 to 15 character in class [#*\d]
$ : end of line
matched:
+1234
445678999
+#7897897
+345764756#775
###456
not matched:
+#*
+*
#*
+#
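The matched and not-matched lists above can be checked mechanically. This is a minimal sketch in Python (an assumption; the question does not name an engine), and is_match is a made-up helper.

```python
import re

# Lookahead pattern from the answer above: at least one digit somewhere.
AT_LEAST_ONE_DIGIT = re.compile(r"^(?=.*\d)\+?[#*\d]{1,15}$")

# Lists copied from the answer above.
MATCHED = ["+1234", "445678999", "+#7897897", "+345764756#775", "###456"]
NOT_MATCHED = ["+#*", "+*", "#*", "+#"]


def is_match(text):
    """Return True if text is an optional +, then 1-15 of #, * or digit, with a digit present."""
    return AT_LEAST_ONE_DIGIT.search(text) is not None


if __name__ == "__main__":
    for text in MATCHED + NOT_MATCHED:
        print(repr(text), is_match(text))
```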
This should work in your case:
^(\+{0,1}[\d#]{1,15})$
Demo:
https://regex101.com/r/fU1eC2/1
Edit:
If you need # after + in the string, use ^[+#]?([\d#]{1,15})(?<!#)$
matches "+#7897897"
If you don't, use ^[+#]*([\d#]{1,15})(?<!#)$
matches "+#7897897"

How to write a RegEx pattern that accepts a string with at most one of each letter, but unordered?

I have tried this:
[a]?[b]?[c]?[d]?[e]?[f]?[g]?[h]?[i]?[j]?[k]?[l]?[m]?[n]?[o]?[p]?[q]?[r]?[s]?[t]?[u]?[v]?[w]?[x]?[y]?[z]?
But this RegEx rejects strings where the order is not alphabetical, like these:
"zabc"
"azb"
I want patterns like these two to be accepted too. How could I do that?
EDIT 1
I don't want letter repetitions, i.e., I want the following strings to be rejected:
aazb
ozob
Thanks.
You can use a negative lookahead assertion to make sure no two characters are the same:
^(?!.*(.).*\1)[a-z]*$
Explanation:
^ # Start of string
(?! # Assert that it's impossible to match the following:
.* # any number of characters
(.) # followed by one character (capture that in group 1)
.* # followed by any number of characters
\1 # followed by the same character as the one captured before
) # End of lookahead
[a-z]* # Match any number of ASCII lowercase letters
$ # End of string
Test it live on regex101.com.
Note: This regex needs to brute-force check all possible character pairs, so performance may be a problem with larger strings. If you can use anything besides regex, you're going to be happier. For example, in Python:
import re
mystring = "zabc"  # example input
if re.search("^[a-z]*$", mystring) and len(mystring) == len(set(mystring)):
    print("valid string")
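To see that the regex and the set-based check agree, both can be wrapped in functions and run on the strings from the question. This is only a sketch; valid_by_regex and valid_by_set are made-up names, and the set-based version uses re.fullmatch instead of ^...$ as a small variation.

```python
import re

# Negative-lookahead pattern from the answer above.
NO_REPEATS = re.compile(r"^(?!.*(.).*\1)[a-z]*$")


def valid_by_regex(text):
    """Lowercase ASCII letters only, none repeated, checked by the regex alone."""
    return NO_REPEATS.search(text) is not None


def valid_by_set(text):
    """Same rule: the regex checks the alphabet, the set checks uniqueness."""
    return re.fullmatch(r"[a-z]*", text) is not None and len(text) == len(set(text))


if __name__ == "__main__":
    for text in ["zabc", "azb", "aazb", "ozob"]:
        print(text, valid_by_regex(text), valid_by_set(text))
```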

Regex to match string not ending with pattern

I am trying to find a regex that matches a string only if it does not end with three or more '0' characters. Intuitively, I tried:
.*[^0]{3,}$
But this does not match when there are one or two zeroes at the end of the string.
If you have to do it without lookbehind assertions (e.g. in JavaScript):
^(?:.{0,2}|.*(?!000).{3})$
Otherwise, use hsz's answer.
Explanation:
^ # Start of string
(?: # Either match...
.{0,2} # a string of up to two characters
| # or
.* # any string
(?!000) # (unless followed by three zeroes)
.{3} # followed by three characters
) # End of alternation
$ # End of string
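A quick way to convince yourself of the two branches is to run the pattern on a few strings, including ones shorter than three characters. The sketch below is in Python purely for illustration (the pattern itself does not need lookbehind); the sample strings and the helper does_not_end_in_000 are made up.

```python
import re

# Lookbehind-free pattern from the answer above.
NOT_ENDING_000 = re.compile(r"^(?:.{0,2}|.*(?!000).{3})$")


def does_not_end_in_000(text):
    """Return True unless the last three characters of text are all zeroes."""
    return NOT_ENDING_000.search(text) is not None


if __name__ == "__main__":
    for text in ["", "00", "000", "654153640", "5646549800", "848461158000"]:
        print(repr(text), does_not_end_in_000(text))
```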
You can try using a negative look-behind, i.e.:
(?<!000)$
Tests:
Test  Target String   Matches
1     654153640       Yes
2     5646549800      Yes
3     848461158000    No
4     84681840000     No
5     35450008748     Yes
Please keep in mind that negative look-behinds aren't supported in every language, however.
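In an engine that does support lookbehind, the test table above can be reproduced directly. The sketch below uses Python as one such engine (an assumption); TABLE and matches are made-up names.

```python
import re

# Lookbehind pattern from the answer above.
NOT_AFTER_000 = re.compile(r"(?<!000)$")

# Target strings and expected results copied from the table above.
TABLE = {
    "654153640": True,
    "5646549800": True,
    "848461158000": False,
    "84681840000": False,
    "35450008748": True,
}


def matches(text):
    """Return True if the end of text is not preceded by three zeroes."""
    return NOT_AFTER_000.search(text) is not None


if __name__ == "__main__":
    for target, expected in TABLE.items():
        print(target, matches(target), expected)
```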
What's wrong with the no-look-behind, more general-purpose ^(.(?!.*0{3,}$))*$?
The general pattern is ^(.(?!.* + not-ending-with-pattern + $))*$. You don't have to reverse engineer the state machine like Tim's answer does; you just insert the pattern you don't want to match at the end.
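The template can be built from any ending pattern with a small helper. One edge case is worth checking, though: the lookahead only runs after each consumed character, so a string that consists of nothing but the unwanted ending (here exactly 000) is never tested from position 0 and appears to slip through. Putting the lookahead first avoids that. The sketch below is in Python; both function names are made up, and the second variant is an assumption of mine rather than part of the answer.

```python
import re


def not_ending_with(ending):
    """Build the answer's template ^(.(?!.*ENDING$))*$ for a given ending pattern."""
    return re.compile(r"^(.(?!.*" + ending + r"$))*$")


def not_ending_with_lookahead_first(ending):
    """Variant (an assumption, not from the answer): test the lookahead once, at position 0."""
    return re.compile(r"^(?!.*" + ending + r"$).*$")


if __name__ == "__main__":
    template = not_ending_with("0{3,}")
    variant = not_ending_with_lookahead_first("0{3,}")
    for text in ["654153640", "848461158000", "000"]:
        print(text, template.search(text) is not None, variant.search(text) is not None)
```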
This is one of those things that RegExes aren't that great at, because the string isn't very regular (whatever that means). The only way I could come up with was to give it every possibility.
.*[^0]..$|.*.[^0].$|.*..[^0]$
which simplifies to
.*([^0]|[^0].|[^0]..)$
That's fine if you only want strings not ending in three 0s, but strings not ending in ten 0s would be long. But thankfully, this string is a bit more regular than some of these sorts of combinations, and you can simplify it further.
.*[^0].{0,2}$
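A short check of the final pattern, sketched in Python with made-up sample strings and a made-up helper name. Note that the pattern needs at least one non-zero character among the last three, so the empty string and strings such as 00 are rejected even though they do not end in three zeroes.

```python
import re

# Final simplified pattern from the answer above.
NON_ZERO_NEAR_END = re.compile(r".*[^0].{0,2}$")


def has_non_zero_in_last_three(text):
    """Return True if one of the last three characters is not a zero."""
    return NON_ZERO_NEAR_END.search(text) is not None


if __name__ == "__main__":
    for text in ["654153640", "5646549800", "848461158000", "00", ""]:
        print(repr(text), has_non_zero_in_last_three(text))
```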