Regex matches parts of a string, but not whole string - regex

This is the regex I'm using to validate a string that can contain lowercase and uppercase letters, numbers and dash:
/([a-zA-Z0-9-])+$/
It has the following results:
abd - matches
abcd- - matches
abcd0 - matches
abcd0- - matches
abc# - doesn't match (correct)
abc#efg - matches (incorrect, it shouldn't)
What am I doing wrong?

I would say you need /^([a-zA-Z0-9-])+$/. You want to match the whole string, not just part of it, but your pattern is missing ^, the anchor for the beginning of the string.
^ and $ anchor the match to the beginning and the end of the string, and ([a-zA-Z0-9-])+ says that everything in between must be one or more of the characters a-zA-Z0-9-.
Your regexp matches any string that ends with one or more of the characters a-zA-Z0-9-, no matter what comes before them.
You can test your regular expression on regex101.com (very good online tool for regular expression testing with explanation, reference etc.).
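As a rough illustration of the difference, here is a minimal Python sketch that runs the sample strings from the question through both patterns. The question does not say which language is in use, so Python and the names UNANCHORED, ANCHORED and is_valid are assumptions made only for this example.

```python
import re

# Pattern from the question: anchored only at the end of the string.
UNANCHORED = re.compile(r"([a-zA-Z0-9-])+$")
# Pattern from the answer: anchored at both the start and the end.
ANCHORED = re.compile(r"^([a-zA-Z0-9-])+$")


def is_valid(text: str) -> bool:
    """Return True if the anchored pattern matches the string."""
    return ANCHORED.search(text) is not None


for sample in ["abd", "abcd-", "abcd0", "abcd0-", "abc#", "abc#efg"]:
    loose = UNANCHORED.search(sample)
    # Show which part of the string the unanchored pattern picked up.
    print(sample, "| unanchored:", loose.group(0) if loose else None,
          "| anchored:", is_valid(sample))
```

For abc#efg the unanchored pattern finds the trailing efg, which is why that string appeared to be valid. In Python specifically, re.fullmatch would give the same whole-string check without writing the anchors.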

Related

Regex to match characters to the right of a colon

I'm stuck on a regex. I'm trying to match words in any language to the right of a colon without matching the colon itself.
The basic rule:
For a line to be valid, it must not begin with or contain any characters outside of [a-z0-9_] until after :.
Any characters to the right of : should match as long as the line begins with the set of characters defined above.
For instance, given a string such as these:
this string should not match
bob_1:Hi. I'm Bob. I speak русский and this string should match
alice:Hi Bob. I speak 한국어 and this string should also match
http://example.com - would prefer to not match URLs
This string:should not match because no spaces or capital letters are allowed left of the colon
Only 2 of the 5 strings above need to match. And only to the right of the colon.
Hi. I'm Bob. I speak русский and this string should match
Hi Bob. I speak 한국어 and this string should also match
I'm currently using (^[a-z0-9_]+(?=:)) to match characters to the left of :. I just can't seem to reverse the logic.
The closest I have at the moment is (?!(?!:)).+. This seems to match everything to right of the colon as well as the colon itself. I just can't figure out how to not include : in the match.
Can one of you regex wizards help me out? If anything is unclear please let me know.
Short regex pattern (case insensitive):
^\w+:(\w.*)
\w - matches any word character (equal to [a-zA-Z0-9_])
https://regex101.com/r/MZhqSL/6
As you marked pcre, here's the pattern you need (only to the right of the colon):
^\w+:\K\w.*
\K - resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
https://regex101.com/r/E1yHVY/1
You can use this regex:
^[a-z0-9_]+:\K(?!//).*
RegEx Breakup:
^: Start
[a-z0-9_]+: Match 1+ of [a-z0-9_] characters
:: Match a colon
\K: Reset matched info so far
(?!//): Negative lookahead to disallow // right after colon to avoid matching potential URLs
.*: Match anything until end
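The breakup above can be tried out in Python with one change: Python's built-in re module does not support \K, so this sketch uses a capture group for the text after the colon instead. The function name text_after_colon is made up for the example.

```python
import re
from typing import Optional

# Same structure as the pattern above, but with a capture group in place
# of \K, because Python's re module has no \K.
PATTERN = re.compile(r"^[a-z0-9_]+:(?!//)(.*)")


def text_after_colon(line: str) -> Optional[str]:
    """Return the text to the right of the colon, or None if the line is invalid."""
    match = PATTERN.match(line)
    return match.group(1) if match else None


lines = [
    "this string should not match",
    "bob_1:Hi. I'm Bob. I speak русский and this string should match",
    "alice:Hi Bob. I speak 한국어 and this string should also match",
    "http://example.com - would prefer to not match URLs",
    "This string:should not match because no spaces or capital letters are allowed left of the colon",
]
for line in lines:
    print(repr(text_after_colon(line)))
```

Only the bob_1 and alice lines return text; the other three return None.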
You can use the regex: ^.*?:(.*)$
^.*?: - from the beginning of the line, any character until the colon (non-greedy) included
(.*)$ - use a matching group to anything that follows it till the end of the line

Match a String with optional number of hyphens - Java Regex

I am trying to match Strings with optional number of hyphens.
For example,
string1-string2,
string1-string2-string3,
string1-string2-string3 and so on.
Right now, I have something which matches one hyphen. How can I make the regex to match optional number of hyphens?
My current regex is: arn:aws:iam::\d{12}:[a-zA-Z]/?[a-zA-Z]-?[a-zA-Z]*
What do I need to add?
Use this regex:
^\\w+(-\\w+)*$
Explanation:
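\\w+ - match one or more word characters ([a-zA-Z_0-9]); the backslash is doubled because the pattern is written as a Java string literal
(-\\w+)* - match a hyphen followed by one or more word characters, zero or more times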
Note that this won't match an empty string, or a string containing characters other than word characters and hyphens. You could handle these cases manually or you could update the regex.
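For a quick check of which inputs the pattern accepts, here is a small sketch in Python rather than Java (in a Python raw string the backslashes are not doubled). The re.ASCII flag is an assumption added so that \w means exactly [a-zA-Z0-9_], and is_hyphenated is a hypothetical helper name.

```python
import re

# Same pattern as above, written as a Python raw string.
# re.ASCII restricts \w to [a-zA-Z0-9_].
HYPHENATED = re.compile(r"^\w+(-\w+)*$", re.ASCII)


def is_hyphenated(text: str) -> bool:
    """Return True for word runs joined by single hyphens (zero or more hyphens)."""
    return HYPHENATED.search(text) is not None


for sample in ["string1", "string1-string2", "string1-string2-string3",
               "", "string1-", "string1--string2"]:
    print(repr(sample), is_hyphenated(sample))
```

The last three samples are rejected: the empty string, a trailing hyphen, and a doubled hyphen.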

Regex with start and end match

I'm having trouble matching the start and end of a regex on Python.
Essentially I'm confused about when to use word boundaries \b and start/end anchors ^ $
My regex of
^[A-Z]{2}\d{2}
matches 4 characters (two uppercase letters, two digits) which is what I'm after
Matches AJ99, RD22, CP44 etc
However, I also noted that AJAJAJAJAJAJAJAJAJSJHS99 could be matched as well. I've tried using ^ and $ together to match the whole string. This doesn't work
^[A-Z]{2}\d{2}$ # this doesn't work
but
^[A-Z]{2}\d{2} # this is fine
[A-Z]{2}\d{2}$ # this is fine
The string I'm matching against is 4 characters long, but in the first two examples the regex could pick the start and end of a longer string respectively.
s = "NZ43" # 4 characters, match perfect! However....
s = "AM27272727" # matches the first example
s = "HAHSHSHSHDS57" # matches the second example
The position anchors ^ and $ place a restriction on the position of your matched chars:
Analyzing your complete regex:
^[A-Z]{2}\d{2}$
^ matches only at the beginning of the text
[A-Z]{2} exactly 2 uppercase ASCII alphabetic characters
\d{2} exactly 2 digits (equivalent to [0-9]{2} for ASCII input)
$ matches only at the end of the text
If you remove one or both of the 2 position anchors (^ or $) you can match a substring starting from the beginning or the end as you stated above.
If you want to match a whole word without anchoring to the start/end of the string, use the \b anchor, like this:
\b[A-Z]{2}\d{2}\b
\b matches at the start/end of the text (when the adjacent character is a word character) and at any position between a word character (in regex, a word character \w is one of [a-zA-Z0-9_]) and a non-word character (\W).
The regex above finds a match (WS24 or NZ43) in each of the following strings:
WS24 alone
before WS24
WS24 after
before WS24 after
NZ43
It doesn't match:
AM27272727 (it would match if it were AM27 272727 or AM27"272727)
HAHSHSHSHDS57 (it would match if it were HAHSHSHSH DS57 or... you get it)
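The cases above can be reproduced with a short Python sketch. The four pattern names and the helper found are made up for the example; the patterns themselves are the ones discussed in the question and answer.

```python
import re

BOTH_ANCHORS = re.compile(r"^[A-Z]{2}\d{2}$")    # whole string only
START_ONLY = re.compile(r"^[A-Z]{2}\d{2}")       # prefix of a longer string
END_ONLY = re.compile(r"[A-Z]{2}\d{2}$")         # suffix of a longer string
WORD_BOUNDED = re.compile(r"\b[A-Z]{2}\d{2}\b")  # standalone word anywhere


def found(pattern: "re.Pattern[str]", text: str):
    """Return the matched substring, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(0) if match else None


for text in ["NZ43", "AM27272727", "HAHSHSHSHDS57",
             "before WS24 after", "AM27 272727"]:
    print(text, "|", found(BOTH_ANCHORS, text), found(START_ONLY, text),
          found(END_ONLY, text), found(WORD_BOUNDED, text))
```

With both anchors only NZ43 matches, while the single-anchor patterns pick AM27 and DS57 out of the longer strings, which is the behaviour described in the question.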
A demo online (the site will be useful to you also to experiment with regex).
Since the behaviour you show is exactly what it is supposed to be, your question suggests that you may not have fully understood how regular expressions work.
As an addition to the very good and informative answer by GsusRecovery, here's a site that guides you through the concepts of regular expressions and teaches the basics with a lesson-based system. To be clear, I do not want to tout this website, as there are plenty like it, but I got real use out of this one, so it's the one I'm suggesting.

RegEx matching standalone string with dashes

I need to write a RegEx to match the "1-234-5678" string if there are no dash characters around it.
I have the following RegEx:
\b\d\-\d{3}\-\d{4}\b
Now this works fine and matches "1-234-5678" correctly in the strings below:
text 1-234-5678 text
111 1-234-5678 1212
The RegEx also correctly does not match "1-234-5678" in the strings below:
text1-234-5678text
1111-234-56781212
But the problem is that it also matches in the following strings:
text-1-234-5678-text
111-1-234-5678-1212
It's because \b matches before and after the dashes.
How can I eliminate matches if there's a dash in front or after the data?
Use a negative lookbehind and a negative lookahead to check that the above-mentioned format is neither preceded nor followed by a - symbol:
(?<!-)\b\d\-\d{3}\-\d{4}\b(?!-)
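A minimal sketch of this pattern in Python, run against the sample strings from the question. The question does not name a language, so Python is an assumption here, as is the helper name find_standalone.

```python
import re

# (?<!-) rejects a dash immediately before the match,
# (?!-) rejects a dash immediately after it.
STANDALONE = re.compile(r"(?<!-)\b\d-\d{3}-\d{4}\b(?!-)")


def find_standalone(text: str):
    """Return the matched number, or None if it is absent or touches a dash."""
    match = STANDALONE.search(text)
    return match.group(0) if match else None


samples = [
    "text 1-234-5678 text",
    "111 1-234-5678 1212",
    "text1-234-5678text",
    "1111-234-56781212",
    "text-1-234-5678-text",
    "111-1-234-5678-1212",
]
for sample in samples:
    print(sample, "->", find_standalone(sample))
```

Only the first two samples produce a match; the two dash-surrounded strings that \b alone let through are now rejected.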

regular expression no characters

I have this regular expression
([A-Z], )*
which should match something like
test, (with a space after the comma)
How do I change the regex so that if there are any characters after the space then it doesn't match?
For example if I had:
test, test
I'm looking to do something similar to
([A-Z], ~[A-Z])*
Cheers
Use the following regular expression:
^[A-Za-z]*, $
Explanation:
^ matches the start of the string.
[A-Za-z]* matches 0 or more letters (case-insensitive) -- replace * with + to require 1 or more letters.
, matches a comma followed by a space.
$ matches the end of the string, so if there's anything after the comma and space then the match will fail.
As has been mentioned, you should specify which language you're using when you ask a Regex question, since there are many different varieties that have their own idiosyncrasies.
^([A-Z]+, )?$
The difference between my pattern and Donut's is that his matches ", " (a comma and space with no letters) and fails for the empty string, while mine matches the empty string and fails for ", ". His also accepts both upper and lower case; with mine you'll have to pass a case-insensitive option to your regex function, but it follows your example more closely.
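The difference between the two patterns can be checked with a small sketch. The question does not name a language, so Python is an assumption here, and the names FIRST, SECOND and accepts exist only for this example. The re.IGNORECASE flag stands in for the case-insensitive option mentioned above.

```python
import re

# First answer: zero or more letters, then ", " at the end of the string.
FIRST = re.compile(r"^[A-Za-z]*, $")
# Second answer: an optional "LETTERS, " group, made case-insensitive by a flag.
SECOND = re.compile(r"^([A-Z]+, )?$", re.IGNORECASE)


def accepts(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True if the pattern matches the text."""
    return pattern.search(text) is not None


for text in ["test, ", "test, test", ", ", ""]:
    print(repr(text), "| first:", accepts(FIRST, text),
          "| second:", accepts(SECOND, text))
```

Both patterns accept "test, " and reject "test, test"; they disagree only on ", " and on the empty string.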
I am not sure which regex engine/language you are using, but most support negated character classes such as [^a-z], meaning "any character other than a lowercase letter".