Regular Expression to List accepted words

I need a regular expression that accepts only a listed set of version numbers. For example, say I wanted to accept "V1.00" and "V1.01". I've tried "(V1.00)|(V1.01)", which almost works, but if I input "V1.002" (which is likely, given the weird version numbers I am working with) I still get a match. I need to match the exact strings.
Can anyone help?

The reason you're getting a match on "V1.002" is that the regex finds the substring "V1.00" inside it. You need to specify that there is nothing more to match, so you could do this:
^(V1\.00|V1\.01)$
A more compact way of getting the same result would be:
^(V1\.0[01])$
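A minimal Python sketch (standard re module) contrasting the question's unanchored pattern with the anchored one from this answer. The helper name is_accepted is hypothetical, introduced only for illustration.

```python
import re

# The question's attempt: no anchors, and an unescaped "." matches any character.
UNANCHORED = re.compile(r"(V1.00)|(V1.01)")
# This answer's version: escaped dot, and ^...$ must span the whole input.
ANCHORED = re.compile(r"^(V1\.0[01])$")


def is_accepted(version: str) -> bool:
    """Hypothetical helper: True only if the whole string is V1.00 or V1.01."""
    return ANCHORED.search(version) is not None


for candidate in ["V1.00", "V1.01", "V1.002", "V1x00"]:
    # Columns: candidate, unanchored pattern found a match, anchored pattern accepted it
    print(candidate, UNANCHORED.search(candidate) is not None, is_accepted(candidate))
```

One Python-specific caveat: $ also matches just before a trailing newline, so is_accepted("V1.00\n") returns True. If that must be rejected too, re.fullmatch(r"V1\.0[01]", version) is the stricter check.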

Do this:
^(V1\.00|V1\.01)$
(The . needs to be escaped, since an unescaped . matches any character; ^ means the match must start at the beginning of the text, and $ means it must end at the end of the text.)

I would use the '^' and '$' to mark the beginning and end of the string, like this:
^(V1\.00|V1\.01)$
That way the entire string must match the regex.

Related

Regular expression to check strings containing a set of words separated by a delimiter

As the title says, I'm trying to build up a regular expression that can recognize strings with this format:
word!!cat!!DOG!! ... Phone!!home!!
where !! is used as a delimiter. Each word must be between 1 and 5 characters long. Empty words are not allowed, i.e., no strings like !!, !!!!, etc.
A word can only contain alphabetical characters between a and z (case insensitive). After each word I expect to find the special delimiter !!.
I came up with the solution below, but since I need to add other controls (e.g., words can contain spaces), I would like to know if I'm on the right track.
(([a-zA-Z]{1,5})([!]{2}))+
Also note that empty strings are not allowed, hence the use of +.
Help and advice are very welcome, since I just started learning how to build regular expressions. I ran some tests using http://regexr.com/ and it seems to be okay, but I want to be sure. Thank you!
Examples that shouldn't match:
a!!b!!aaaaaa!!
a123!!b!!c!!
aAaa!!bbb
aAaa!!bbb!

Splitting the string and using the values between the !!
It depends on what you want to do with the regular expression. If you want to match the values between the !!, here are two ways:
Matching with groups
([^!]+)!!
[^!]+ requires at least 1 character other than !
!! is used instead of [!]{2} because it matches the same thing but is much more readable
Matching with lookahead
If you only want to match the actual word (and not the two !), you can do this by using a positive lookahead:
[^!]+(?=!!)
(?=...) is a positive lookahead. It requires everything inside it (here, !!) to come directly after the preceding match, but that text is not included in the resulting match.
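A short Python sketch of both matching approaches, using re.findall on the sample string from the question (with the "..." part simply dropped, so the string itself is an assumption). With a capturing group, findall returns only the group; with the lookahead, the match itself is already just the word.

```python
import re

text = "word!!cat!!DOG!!Phone!!home!!"  # sample from the question, "..." dropped

# Matching with groups: each match consumes "word!!"; findall returns group 1 only.
with_groups = re.findall(r"([^!]+)!!", text)

# Matching with lookahead: "!!" must follow the word but is not part of the match.
with_lookahead = re.findall(r"[^!]+(?=!!)", text)

print(with_groups)     # ['word', 'cat', 'DOG', 'Phone', 'home']
print(with_lookahead)  # ['word', 'cat', 'DOG', 'Phone', 'home']
```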
Validating the string
If you however want to check the validity of the whole string, then you need something like this:
^([^!]+!!)+$
^ start of the string
$ end of the string
It requires the whole string to consist only of ([^!]+!!), repeated one or more times.
If [^!] does not fit your requirements, you can of course replace it with [a-zA-Z] or similar.
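A Python sketch comparing the asker's unanchored pattern, this answer's generic validator, and a stricter variant where [^!] is swapped for [a-zA-Z]{1,5} to match the question's rules (that swap is the substitution this answer suggests, spelled out here as an assumption). The "shouldn't match" strings are the ones listed in the question.

```python
import re

QUESTION = re.compile(r"(([a-zA-Z]{1,5})([!]{2}))+")  # the asker's pattern, unanchored
GENERIC = re.compile(r"^([^!]+!!)+$")                  # this answer's whole-string validator
STRICT = re.compile(r"^([a-zA-Z]{1,5}!!)+$")           # 1-5 letters per word, then "!!"

valid = "word!!cat!!DOG!!Phone!!home!!"
invalid = ["a!!b!!aaaaaa!!", "a123!!b!!c!!", "aAaa!!bbb", "aAaa!!bbb!"]

for s in [valid] + invalid:
    # Without anchors, QUESTION finds some valid substring in every one of these inputs.
    print(f"{s:32} question={bool(QUESTION.search(s))} "
          f"generic={bool(GENERIC.search(s))} strict={bool(STRICT.search(s))}")
```

The output shows why both anchors matter: the unanchored pattern accepts every invalid example, and the generic [^!] version still accepts a123 and aaaaaa, which the 1-5 letter class rejects.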

Capture part or whole using regex with same capturing name

Given the following two strings:
\06086-afde-4e46-8886-#xxx.com\0xxx7ccd-6293-4343-8e50-xxx
\0name.surname#xxx.com\0xxx6293-4343-8e50-e1d5-xxx
I'm trying to extract 6086-afde-4e46-8886- (if it is a GUID) or name.surname#xxx.com (if it is not a GUID). The difficulty here is that the captured groups must have the same name.
So far, I have
(?<name>(?:\w{4}-){4}|[a-zA-Z.]{1,}#xxx\.com), but this also captures 7ccd-6293-4343-8e50- or 6293-4343-8e50-e1d5- which I don't want.
I was also thinking about something like \\\0(?<name>(?:\w{4}-){4}|[a-zA-Z.]{1,}#xxx\.com)(?:(?:#xxx\.com)?\\\0),
but is there a way to avoid repeating the xxx.com part (the real pattern is more complicated than that)? Also, this relies on finding \\0, which I'd like to avoid, as I don't really know whether it might also appear somewhere else in the string.
Thanks.

The following regular expression matches the number 6086-afde-4e46-8886- and the email name.surname#xxx.com into the same named group, without using the start sequence \0:
(?<_name_>[A-Za-z]+\.[A-Za-z]+#xxx\.com|(?:[\w]{4}-){4}(?=#xxx\.com))
This regular expression uses a positive lookahead, (?=#xxx\.com), to match the number without including #xxx.com in the match.
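A Python sketch of this answer's pattern run against the two sample strings. Python's re spells a named group (?P<name>...), whereas the (?<_name_>...) form above is the .NET/PCRE spelling; the group name _name_ is kept from the answer. Raw strings keep the literal backslash-zero from the question.

```python
import re

PATTERN = re.compile(
    r"(?P<_name_>[A-Za-z]+\.[A-Za-z]+#xxx\.com"  # e-mail-like form, captured whole
    r"|(?:\w{4}-){4}(?=#xxx\.com))"               # id form: lookahead checks the suffix without consuming it
)

samples = [
    r"\06086-afde-4e46-8886-#xxx.com\0xxx7ccd-6293-4343-8e50-xxx",
    r"\0name.surname#xxx.com\0xxx6293-4343-8e50-e1d5-xxx",
]

for s in samples:
    m = PATTERN.search(s)
    print(m.group("_name_") if m else None)
```

Because the id alternative requires #xxx.com right after it, the trailing 7ccd-6293-4343-8e50- block is never matched, even when every match is listed with finditer.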

Try this:
\\0(?<_name_>(?:[\w\-\.]+))#xxx\.com
and add all allowed characters inside the square brackets.
demo: http://regexhero.net/tester/?id=be0fed5e-1d24-43cc-9db9-812311c17d61

It seems like you're trying to get the first match. If so, try the regex below:
^.*?(?<name>(?:\w{4}-){4}|[a-zA-Z.]{1,}#xxx\.com)
http://regex101.com/r/jC3uR4/5
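A Python sketch of the lazy-prefix approach, with the group respelled (?P<name>...) for Python's re and the dot in xxx.com escaped. In Python, re.search already returns the leftmost match, so the ^.*? prefix mainly matters in tools that list every match; the sketch shows that listing all matches of the unprefixed group also picks up 7ccd-6293-4343-8e50-, which is what the question complains about.

```python
import re

s1 = r"\06086-afde-4e46-8886-#xxx.com\0xxx7ccd-6293-4343-8e50-xxx"
s2 = r"\0name.surname#xxx.com\0xxx6293-4343-8e50-e1d5-xxx"

# ^.*? skips as little as possible, so "name" is the leftmost candidate in the string.
FIRST = re.compile(r"^.*?(?P<name>(?:\w{4}-){4}|[a-zA-Z.]{1,}#xxx\.com)")
# The same group without the prefix, for comparison.
PLAIN = re.compile(r"(?P<name>(?:\w{4}-){4}|[a-zA-Z.]{1,}#xxx\.com)")

for s in (s1, s2):
    print(FIRST.match(s).group("name"), PLAIN.search(s).group("name"))

# Listing every match (as many online testers do) shows the unwanted second hit.
print([m.group("name") for m in PLAIN.finditer(s1)])
```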

Assistance with a regular expression

I am not good with regular expressions, and I could use some help with a couple of expressions I am working on. I have a line of text, such as "Text here then 999-99", and I'd like to isolate the number sequence at the end. It could be either 999-99 or 999-99-9. The following seems to work:
\d{3}-\d{2}(-\d{1})?
But I notice that it really just seems to be searching anywhere within the text, as I can add text after the number sequence and it still matches. This needs to be stricter, so that the line must end with this exact sequence and nothing after it. I tried ending with $ instead of ?, but that never seems to create a match (it always returns false).
I could also use some help with character replacement. I am working on a program which deals with OCR scanning, and occasionally the string value that comes back contains undisplayable characters, represented by the ܀ symbol. Is there a regular expression which will replace the ܀ characters with a space?

Try this regular expression.
([\d-]+)$

This should work. Just end your regex with $, which represents the end of the line:
\d{3}-\d{2}(-\d{1})?$

Use the word-boundary metacharacter, \b:
\b\d{3}-\d{2}(-\d)?\b
You can also remove the {1} from the last \d since it's redundant.
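A Python comparison of the patterns from this thread, plus the asker's variant where the ? was replaced by $, run with re.search against three sample lines modeled on the question (the lines themselves are made up for illustration). It suggests one likely reason the asker saw no matches: replacing the ? makes the -\d group mandatory, so a line ending in 999-99 no longer matches. It also shows that the \b version still accepts text after the number.

```python
import re

LINES = [
    "Text here then 999-99",
    "Text here then 999-99-9",
    "Text here then 999-99 and more",
]

PATTERNS = {
    "digits and dashes at end": r"([\d-]+)$",
    "$ appended": r"\d{3}-\d{2}(-\d)?$",
    "? replaced by $": r"\d{3}-\d{2}(-\d{1})$",
    "word boundaries": r"\b\d{3}-\d{2}(-\d)?\b",
}


def first_match(pattern: str, line: str):
    """Return the matched text, or None if the pattern does not match."""
    m = re.search(pattern, line)
    return m.group(0) if m else None


results = {label: [first_match(p, line) for line in LINES] for label, p in PATTERNS.items()}

for label, found in results.items():
    print(f"{label:26} {found}")
```

Only the two $-anchored patterns with an optional group reject trailing text while accepting both number formats; ([\d-]+)$ is looser, since it would also accept any run of digits and dashes such as a lone "-".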

Negative integer Regex doesn't match

I have Googled it, and found the following results:
http://icfun.blogspot.com/2008/03/regular-expression-to-handle-negative.html
http://regexlib.com/DisplayPatterns.aspx?cattabindex=2&categoryId=3
With some (very basic) Regex knowledge, I figured this would work:
r\.(^-?\d+)\.(^-?\d+)\.mcr
for parsing strings such as these:
r.0.0.mcr
r.-1.5.mcr
r.20.-1.mcr
r.-1.-1.mcr
But I don't get a match on these.
Since I'm learning (or trying to learn) Regex, could you please explain why my pattern doesn't match (instead of just writing a new working one for me)? From what I understood, it goes like so:
Match r
Match a period
Match an optional negative sign followed by digits, and store the group
Match a period
Match an optional negative sign followed by digits, and store the group
Match a period
Match mcr
But I'm wrong, apparently :).

You are very close. ^ matches the start of a string, so it should only be located at the start of a pattern (if you want to use it at all - that depends on whether you will also accept e.g. abcr.0.0.mcr or not). Similarly, one can use $ (but only at the end of the pattern) to indicate that you will only accept strings that do not contain anything after what the pattern matches (so that e.g. r.0.0.mcrabc won't be accepted). Otherwise, I think it looks good.

The ^ characters are telling it to match only at the beginning of a line; since it's obviously not at the beginning of a line in either case, it fails to match. In this case, you just need to remove both ^s. (I think what you're trying to say is "don't let anything else be in between these", but that's the default except at the start of the regex; you would need something like .* to make it allow additional characters between them.)

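Note that ^ only means 'not' when it is the first character inside a character class, as in [^-] (any character except a dash). Outside square brackets, as in your pattern, ^ is an anchor for the start of the string, so it does not mean that there should not be a dash there.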
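A Python sketch of the fix the answers describe: drop the misplaced ^ inside the pattern, and optionally anchor the whole pattern with ^ and $ so that e.g. abcr.0.0.mcr and r.0.0.mcrabc are rejected. The file names are the ones from the question.

```python
import re

ORIGINAL = re.compile(r"r\.(^-?\d+)\.(^-?\d+)\.mcr")
# Misplaced ^ removed from the groups; ^ and $ now frame the whole string instead.
FIXED = re.compile(r"^r\.(-?\d+)\.(-?\d+)\.mcr$")

names = ["r.0.0.mcr", "r.-1.5.mcr", "r.20.-1.mcr", "r.-1.-1.mcr"]

for name in names:
    m = FIXED.match(name)
    # ^ inside the original can never match after "r.", so ORIGINAL finds nothing.
    print(name, ORIGINAL.search(name) is not None, m.groups() if m else None)
```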

Regex AND'ing

I have two strings, and I want to match everything that doesn't equal them; the first string can be followed by a number of characters. I tried something like this, negating two ORs and negating that result:
?!(?!^.*[^Factory]$|?![^AppName])
Any ideas?

Try this regular expression:
(?!.*Factory$|.*AppName)^.*
This matches every string that does not end with Factory and does not contain AppName.
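A Python sketch of this answer's negative lookahead; the test strings are made up for illustration. The lookahead is checked once at the start of the string, and only if neither alternative can match does ^.* consume the whole input.

```python
import re

# Reject strings that end with "Factory" or contain "AppName" anywhere.
NOT_EXCLUDED = re.compile(r"(?!.*Factory$|.*AppName)^.*")

for s in ["MyFactory", "FactoryX", "AppName", "xxAppNamexx", "Something"]:
    print(s, NOT_EXCLUDED.match(s) is not None)
```

Note that FactoryX is accepted, because Factory only counts when it ends the string.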

dfa's answer is by far the best option. But if you can't use it for some reason, try:
^(?!.*Factory|AppName)
It's very difficult to determine from your question and your regex what you're trying to do; they seem to imply opposite behaviors. The regex I wrote will not match if Factory appears anywhere in the string, or AppName appears at the beginning of it.
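A Python sketch of this answer's pattern with made-up inputs. The regex matches an empty string when it succeeds, so the check is whether a match object exists, not what it contains.

```python
import re

# Fails if "Factory" appears anywhere, or if the string starts with "AppName".
ALT = re.compile(r"^(?!.*Factory|AppName)")

for s in ["xFactoryy", "AppNameTool", "MyAppName", "Other"]:
    print(s, ALT.search(s) is not None)
```

MyAppName is accepted here, because the AppName alternative is only tested at the beginning of the string.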

What about:
if (!match("(Factory|AppName)")) {
// your code
}
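The pseudocode above can be sketched in Python as a plain search whose result is negated in code; the helper name is_allowed is hypothetical. Unlike the lookahead answers, this rejects Factory anywhere in the string, not only at the end.

```python
import re


def is_allowed(s: str) -> bool:
    """Hypothetical helper: True when neither Factory nor AppName occurs in s."""
    return re.search(r"(Factory|AppName)", s) is None


for s in ["MyFactory", "FactoryX", "xxAppNamexx", "Something"]:
    print(s, is_allowed(s))
```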

Would it work if you looked for the existence of those two strings and then negated the regex?

We Keep Coding
