Regex: combining not equals, not contained in, and match up until

I need a regex that:
won't match if the string is equal to any of a series of strings
won't match if any of another series of sub-strings is contained anywhere within
and will match only up until any of a set of certain delimiter characters.
I've got the first two requirements, but can't figure out how to add in the last part:
^(?:((?!^fill$)(?!^style$)(?!->)[^;:]))*$
Will not match 'fill' or 'style' (but will match 'fills' or 'astyle').
Will not match if '->' is anywhere inside. (e.g. don't match 'a->b')
However a : or a ; will cause no match, rather than matching up until the first occurrence of either of those characters.
e.g.
Will:
should return 'Will' but currently returns nothing.

You can use
^(?!(?:fill|style)$|.*(?:-[>-]|<-))[^;:]+
See the regex demo.
Details:
^ - start of string
(?!(?:fill|style)$|.*(?:-[>-]|<-)) - immediately to the right, there can't be:
(?:fill|style)$ - fill or style (followed by the end of string)
| - or
.*(?:-[>-]|<-) - after any zero or more chars as many as possible, --, ->, <- (note the <-> alternative is missing since <- covers it)
[^;:]+ - one or more chars other than ; and :, as many as possible
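
The behaviour described above can be checked with a short sketch. Python's re module is an assumption here (the question does not name a regex flavour), and the helper name match_prefix is hypothetical:

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r'^(?!(?:fill|style)$|.*(?:-[>-]|<-))[^;:]+')

def match_prefix(text):
    """Return the matched prefix, or None when the string is rejected."""
    m = PATTERN.match(text)
    return m.group(0) if m else None

for sample in ['Will:', 'fill', 'fills', 'astyle', 'a->b', 'key;value']:
    print(repr(sample), '->', match_prefix(sample))
```

The lookahead rejects the whole string up front, while [^;:]+ only consumes text up to the first delimiter, so 'Will:' yields 'Will'.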

Regex With Conditional - Not Desired Output

Was actually glossing over a question and found myself struggling to perform something really simple.
If a string contains % I want to use a particular regex, else I want to use a different one.
I tried the following: https://regex101.com/r/UvFZpo/1/
Regex: (%)(?(1)[^$]+|[^%]+).
Test string: abc%
But I'm not getting the expected results.
I was expecting to see abc% matched as it contains %.
If the string was, abc$, I'd expect it to use the second expression.
Where am I going wrong?
Regex parses strings from left to right, position by position.
Once your pattern matches %, the regex index is at the end of the string, so the match fails: there are no more chars left for the subsequent [^$]+ pattern to match.
You can use a mere alternation here:
^(?:([^$]*%[^$]*)|([^%]+))$
See the regex demo
If the string contains %, the Group 1 will be populated, else, Group 2 will.
Details
^ - start of string
(?:([^$]*%[^$]*)|([^%]+)) - either of the two alternatives:
([^$]*%[^$]*) - Group 1: any 0+ chars other than $, as many as possible, then %, then any 0+ chars other than $, as many as possible
| - or
([^%]+) - Group 2: any 1+ chars other than %, as many as possible
$ - end of string.
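
As a rough illustration, here is a sketch in Python (an assumption; the question uses regex101, and Python's re module happens to support the same (?(1)...) conditional syntax). The helper which_group is hypothetical:

```python
import re

# Pattern from the question: by the time the conditional is reached,
# group 1 has already consumed the %, so [^$]+ needs at least one more char.
CONDITIONAL = re.compile(r'(%)(?(1)[^$]+|[^%]+)')

# Pattern from the answer: plain alternation with two capturing groups.
ALTERNATION = re.compile(r'^(?:([^$]*%[^$]*)|([^%]+))$')

def which_group(text):
    """Return 1 or 2 depending on which group matched, or None for no match."""
    m = ALTERNATION.match(text)
    if m is None:
        return None
    return 1 if m.group(1) is not None else 2

print(CONDITIONAL.search('abc%'))  # no match: nothing follows the %
print(which_group('abc%'))         # Group 1 (string contains %)
print(which_group('abc$'))         # Group 2 (no % in the string)
```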

Regex for validating multiple Inputs as per the below regex

I have a regex that validates my inputs like
Regex :
^(?=.{1,15}$)([a-zA-Z0-9]+)(?:[-]{1})[a-zA-Z0-9]+$
Example 1: BBB-123BBB
Now, I want to create a regex using the above, where my regex can validate multiple inputs with a semicolon (;) as a delimiter & the maximum input that can be there is 20.
Like for ex 2:
BBB-123BB,AAA-1234;EEE-9876....20 items
Ex 2.
BB-123BB,AAA3-1234;EEE334-9876....20 items
How can I extend my regex above (the first one) to allow multiple inputs to be added while letting them be split by a semicolon and can have a maximum of 20 items (as shown in ex 2)?
Building on your pattern, I removed unnecessary capturing groups and used simple -, which is equivalent to (?:[-]{1}). Here's what I came up with:
^(?:(?:^|;)(?=[^;]{1,15})[a-zA-Z0-9]+-[a-zA-Z0-9]+){1,20}$
Explanation:
^ - match beginning of a string
(?:...) - non-capturing group
^|; - alternation: match ; literally or match beginning of string
(?=[^;]{1,15}) - positive lookahead: require between 1 and 15 characters other than ; right after the current position
{1,20} - match preceding pattern between 1 and 20 times
$ - match end of a string
Demo
EDIT: Pattern:
^(?=[^;]{1,15})[a-zA-Z0-9]+-[a-zA-Z0-9]+(?:;(?=[^;]{1,15})[a-zA-Z0-9]+-[a-zA-Z0-9]+){0,19}$
won't accept ; at the beginning.
SECOND EDIT:
^(?=[^;]{1,15}(?:;|$))[a-zA-Z0-9]+-[a-zA-Z0-9]+(?:;(?=[^;]{1,15}(?:;|$))[a-zA-Z0-9]+-[a-zA-Z0-9]+){0,19}$
Added: (?:;|$) inside each lookahead - match either ; literally or $ - end of string
What it does: correctly limits the length of each token to 15
If the maximum is not important to enforce, simply allow arbitrary repetitions.
^(?=[^;]{3,15}(?:;[^;]{3,15})*$)[a-zA-Z0-9]+-[a-zA-Z0-9]+(;[a-zA-Z0-9]+-[a-zA-Z0-9]+)*$
If you want to specifically allow between 0 and 19 repeats, change the last * to {0,19}.
The minimal string which can match the main expression has three characters; so I updated the length constraint to {3,15}.
A minus simply matches itself so there is no need to put it in a character class; and there is never a good reason to specify a single repetition of anything, so I simplified the main regex accordingly.
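
A minimal sketch of this last pattern with the {0,19} variant applied, in Python (an assumption, since the question does not name a language). The names TOKEN and is_valid are hypothetical:

```python
import re

# One token: alphanumerics, a hyphen, alphanumerics.
TOKEN = r'[a-zA-Z0-9]+-[a-zA-Z0-9]+'

# The lookahead checks that every ;-separated piece is 3 to 15 chars long;
# the main expression checks the token shape and allows at most 20 tokens.
PATTERN = re.compile(
    r'^(?=[^;]{3,15}(?:;[^;]{3,15})*$)' + TOKEN + r'(?:;' + TOKEN + r'){0,19}$'
)

def is_valid(text):
    return PATTERN.match(text) is not None

print(is_valid('BBB-123BBB'))                      # single item
print(is_valid('BB-123BB;AAA3-1234;EEE334-9876'))  # three items
print(is_valid(';'.join(['A-1'] * 21)))            # 21 items: too many
print(is_valid('A' * 16 + '-1'))                   # one item longer than 15
```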

Regular Expression Valid and Invalid together

In the items below, I want to detect only the valid ones with a regular expression.
A space in the word means invalid, a # sign means invalid, and a word starting with a number is invalid.
Invalid : M_123 ASD
Invalid : M_123#ASD
Invalid : 1_M# ADD
Valid : M_125ASD
Valid : M_125$ASD
I am trying as below :
[A-Za-z0-9_$]
This is not working properly. I need to define both valid and invalid character sets for a word.
Can I do such a match with a regular expression?
Your regex [A-Za-z0-9_$] presents a character class that matches a single character that is either an ASCII letter or digit, or _ or $ symbols. If you use it with std::regex_match, it would only match a whole string that consists of just one char like that since the pattern is anchored by default when used with that method. If you use it with an std::regex_search, a string like ([_]) would pass, since the regex is not anchored and can find partial matches.
To match 0 or more chars, you need to add * quantifier after your class. To match one or more chars, you need to add + quantifier after your character class. However, you have an additional restriction: a digit cannot appear at the start.
It seems you may use
^[A-Za-z][A-Za-z0-9_$]*$
See the regex demo at regex101.com.
Details:
^ - start of string
[A-Za-z] - an ASCII letter (exactly one occurrence)
[A-Za-z0-9_$]* - 0+ ASCII letters, digits, _ or $
$ - end of string anchor.
Note that with regex_match, you may omit ^ and $ anchors.
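
The difference between whole-string and partial matching can be sketched in Python (an assumption; the answer discusses C++). Here re.fullmatch plays a role similar to std::regex_match and re.search a role similar to std::regex_search:

```python
import re

CHAR_CLASS = r'[A-Za-z0-9_$]'     # the asker's original attempt
NAME = r'[A-Za-z][A-Za-z0-9_$]*'  # suggested pattern, anchors omitted

samples = ['M_123 ASD', 'M_123#ASD', '1_M# ADD', 'M_125ASD', 'M_125$ASD']

# fullmatch requires the whole string to match, so no ^ and $ are needed.
results = {s: re.fullmatch(NAME, s) is not None for s in samples}
for s, ok in results.items():
    print(s, '->', 'Valid' if ok else 'Invalid')

# The bare character class matches exactly one char as a whole-string match,
# but finds a partial match inside almost anything when searching.
print(re.fullmatch(CHAR_CLASS, 'M_125ASD'))  # None
print(re.search(CHAR_CLASS, '([_])'))        # finds the underscore
```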
So the requirements are
cannot start with a number (I am assuming it must start with a letter)
cannot contain space or #
all other characters are valid
you can try this regex ^[a-zA-Z]((?![\# ]).)+?$
^[a-zA-Z] checks for a letter at the start of the line
((?![\# ]).)+?$ checks if there are no # or space in the remaining part of the line.
Online demo here
EDIT
As per Wiktor's comment the regex can be simplified to ^[a-zA-Z][^# ]+$.
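
The two answers are not equivalent, which a small Python sketch can show (Python and the names STRICT, LOOSE and verdicts are assumptions for illustration). The whitelist pattern accepts a single letter and rejects characters outside its class; the blacklist pattern needs at least two characters and accepts anything except # and space:

```python
import re

STRICT = re.compile(r'^[A-Za-z][A-Za-z0-9_$]*$')  # whitelist of allowed chars
LOOSE = re.compile(r'^[a-zA-Z][^# ]+$')           # blacklist of # and space

def verdicts(text):
    """Return (strict_ok, loose_ok) for one input."""
    return (STRICT.match(text) is not None, LOOSE.match(text) is not None)

for sample in ['M_125$ASD', 'M', 'M-125', 'M_123 ASD']:
    print(repr(sample), verdicts(sample))
```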

Match 3 and 4 delimiters and between them; not less not more

I have a command-line program whose first argument ( = argv[ 1 ] ) is a regex pattern.
./program 's/one-or-more/anything/gi/digit-digit'
So I need a regex to check whether the user's input is correct. This regex could be written easily, but I use the C++ library and std::regex_match, and this function by default puts begin and end assertions (^ and $) around the given pattern, so the non-greedy quantifier is effectively ignored.
Let me clarify. If I want to match /anything/ then I can use /.*?/, but std::regex_match treats this pattern as ^/.*?/$, so if the user enters /anything/anything/anything/, std::regex_match still returns true even though the input is not correct. std::regex_match only returns true or false, and the expected input from the user can only be text that follows the pattern. Since the input varies, I cannot list all possibilities here, but here are some examples.
Should match
/.//
s/.//
/.//g
/.//i
/././gi
/one-or-more/anything/
/one-or-more/anything/g/3
/one-or-more/anything/i
/one-or-more/anything/gi/99
s/one-or-more/anything/g/4
s/one-or-more/anything/i
s/one-or-more/anything/gi/54
and anything that looks like this pattern
Rules:
delimiters are /|##
s letter at the beginning and g, i and 2 digits at the end are optional
std::regex_match returns true if the entire target character sequence can be matched, otherwise it returns false
between first and second delimiter can be one-or-more +
between second and third delimiter can be zero-or-more *
between third and fourth can be g or i
At least 3 delimiters should be matched (/.//), not fewer, so /./ should not match
ECMAScript 262 is allowed for the pattern
NOTE
You may need to see my question about std::regex_match:
std::regex_match and lazy quantifier with strange behavior
I do not need any C++ code, I just need a pattern.
Do not try d?([/|##]).+?\1.*?\1[gi]?[gi]?\1?d?\d?\d?. It fails.
My attempt so far: ^(?!s?([/|##]).+?\1.*?\1.*?\1)s?([/|##]).+?\2.*?\2[gi]?[gi]?\d?\d?$
If you are willing to try, you should put ^ and $ around your pattern
If you need more details, please leave a comment and I will update the question.
Thanks.
You could use this regular expression:
^s?([/|##])((?!\1).)+\1((?!\1).)*\1((gi?|ig)(\1\d\d?)?|i)?$
See regex101.com
Note how this also rejects these cases:
///anything/
/./anything/gg
/./anything/ii
/./anything/i/12
How it works:
Some explanation of the parts that are different:
((?!\1).): this will match any character that is not the delimiter. This way you can keep track of the exact number of delimiters used. It also prevents the first character after the first delimiter from being that delimiter again, which should not be allowed.
(gi?|ig): matches any of the valid modifier combinations, except a sole i, which is treated separately. So this also excludes gg and ii as valid character sequences.
(\1\d\d?)?: optionally allows for an extra delimiter (after a g modifier -- see previous) to be added with one or two digits following it.
(...|i)?: the outer optional group; its i alternative covers the case where there is no g modifier present, just the i or nothing at all: then no digits are allowed to follow.
This is a tricky one, but I took the challenge - here is what I have ended up with:
^s?([\/|##])(?:(?!\1).)+\1(?:(?!\1).)*\1(?:i|(?:gi?|ig)(\1\d{1,2})?)?$
Pattern breakdown:
^ matches start of string
s? matches an optional 's' character
([\/|##]) matches the delimiter characters and captures as group 1
(?:(?!\1).)+ matches anything other than the delimiter character one or more times (uses negative lookahead to make sure that the character isn't the delimiter matched in group 1)
\1 matches the delimiter character captured in group 1
(?:(?!\1).)* matches anything other than the delimiter character zero or more times
\1 matches the delimiter character captured in group 1
(?: starts a new group
i matches the i character
| or
(?:gi?|ig) matches either g, gi, or ig
(\1\d{1,2})? followed by an optional extra delimiter and 0-9 once or twice
)? closes group and makes it optional
$ matches end of string
I have used non-capturing groups wherever nothing needs to be captured - these are groups that start with ?:
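
The first answer's pattern can be tried against the examples from the question with a short Python sketch. Python is an assumption (the question targets C++ std::regex_match with ECMAScript syntax); backreferences and negative lookahead behave the same way here. The list names are hypothetical:

```python
import re

# Pattern from the first answer, unchanged.
PATTERN = re.compile(r'^s?([/|##])((?!\1).)+\1((?!\1).)*\1((gi?|ig)(\1\d\d?)?|i)?$')

SHOULD_MATCH = [
    '/.//', 's/.//', '/.//g', '/.//i', '/././gi',
    '/one-or-more/anything/', '/one-or-more/anything/g/3',
    '/one-or-more/anything/i', '/one-or-more/anything/gi/99',
    's/one-or-more/anything/g/4', 's/one-or-more/anything/i',
    's/one-or-more/anything/gi/54',
]
SHOULD_FAIL = [
    '/./', '/anything/anything/anything/', '///anything/',
    '/./anything/gg', '/./anything/ii', '/./anything/i/12',
]

for text in SHOULD_MATCH + SHOULD_FAIL:
    print(text, '->', PATTERN.match(text) is not None)
```

Because ((?!\1).) can never consume the delimiter, the number of delimiters is fixed by the pattern itself, so the anchoring added by a whole-string match no longer lets extra /.../ sections slip through.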

Regular Expression to match set of arbitrary codes

I am looking for some help on creating a regular expression that would work with a unique input in our system. We already have some logic in our keypress event that will only allow digits, and will allow the letter A and the letter M. Now I need to come up with a RegEx that can match the input during the onblur event to ensure the format is correct.
I have some examples below of what would be valid. The letter A represents an age, so it is always followed by up to 3 digits. The letter M can only occur at the end of the string.
Valid Input
1-M
10-M
100-M
5-7
5-20
5-100
10-20
10-100
A5-7
A10-7
A100-7
A10-20
A5-A7
A10-A20
A10-A100
A100-A102
Invalid Input
a-a
a45
4
This matches all of the samples.
/A?\d{1,3}-A?\d{0,3}M?/
Not sure if 10-A10M should or shouldn't be legal, or even if M can appear with numbers. If M is only allowed without numbers:
/A?\d{1,3}-(A?\d{1,3}|M)/
Use the brute force method if you have a small amount of well defined patterns so you don't get bad corner-case matches:
^(\d+-M|\d+-\d+|A\d+-\d+|A\d+-A\d+)$
Here are the individual regexes broken out:
\d+-M <- matches anything like '1-M'
\d+-\d+ <- 5-7
A\d+-\d+ <- A5-7
A\d+-A\d+ <- A10-A20
/^[A]?[0-9]{1,3}-[A]?[0-9]{1,3}[M]?$/
Matches anything of the form:
A(optional)[1-3 numbers]-A(optional)[1-3 numbers]M(optional)
^A?\d+-(?:A?\d+|M)$
An optional A followed by one or more digits, a dash, and either another optional A and some digits or an M. The '(?: ... )' notation is a Perl 'non-capturing' set of parentheses around the alternatives; it means there will be no '$1' after the regex matches. Clearly, if you wanted to capture the various bits and pieces, you could - and would - do so, and the non-capturing clause might not be relevant any more.
(You could replace the '+' with '{1,3}' as JasonV did to limit the numbers to 3 digits.)
^A?\d{1,3}-(M|A?\d{1,3})$
^ -- the match must be done from the beginning
A? -- "A" is optional
\d{1,3} -- between one and 3 digits; [0-9]{1,3} also works
- -- A "-" character
(...|...) -- Either one of the two expressions
(M|...) -- Either "M" or...
(...|A?\d{1,3}) -- an optional "A" followed by at least one and at most three digits
$ -- the match should be done to the end
Some consequences of changing the format. If you do not put "^" at the beginning, the match may ignore an invalid beginning. For example, "MAAMA0-M" would be matched at "A0-M".
If, likewise, you leave $ out, the match may ignore an invalid trail. For example, "A0-MMMMAAMAM" would match "A0-M".
Using \d is usually preferred, as is \w for alphanumerics, \s for spaces, \D for non-digit, \W for non-alphanumeric or \S for non-space. But you must be careful that \d is not being treated as an escape sequence. You might need to write it \\d instead.
{x,y} means the last match must occur between x and y times.
? means the last match must occur once or not at all.
When using (), it is treated as one match. (ABC)? will match ABC or nothing at all.
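
The effect of dropping an anchor can be sketched in Python (an assumption; the question is about a JavaScript onblur handler). Raw strings (r'...') keep \d from being treated as a string escape, and the helper found is hypothetical:

```python
import re

FULL = r'^A?\d{1,3}-(M|A?\d{1,3})$'      # anchored at both ends
NO_START = r'A?\d{1,3}-(M|A?\d{1,3})$'   # ^ left out
NO_END = r'^A?\d{1,3}-(M|A?\d{1,3})'     # $ left out

def found(pattern, text):
    """Return the matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(found(FULL, 'MAAMA0-M'))        # None
print(found(NO_START, 'MAAMA0-M'))    # A0-M: invalid beginning ignored
print(found(NO_END, 'A0-MMMMAAMAM'))  # A0-M: invalid trail ignored
```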
I’d use this regular expression:
^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$
This matches either:
<number>-M or <number>-<number>
A<number>-<number> or A<number>-A<number>
Additionally <number> must not begin with a 0.
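
To see how this stricter pattern differs from the simpler ^A?\d{1,3}-(M|A?\d{1,3})$ given earlier, here is a Python sketch (Python and the names SIMPLE, STRICT, VALID and INVALID are assumptions for illustration). Both accept all the sample inputs, but only the simpler one allows a leading zero or an A-prefixed number followed by M:

```python
import re

SIMPLE = re.compile(r'^A?\d{1,3}-(M|A?\d{1,3})$')
STRICT = re.compile(
    r'^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$'
)

VALID = ['1-M', '10-M', '100-M', '5-7', '5-20', '5-100', '10-20', '10-100',
         'A5-7', 'A10-7', 'A100-7', 'A10-20', 'A5-A7', 'A10-A20',
         'A10-A100', 'A100-A102']
INVALID = ['a-a', 'a45', '4']

for text in VALID + INVALID + ['05-7', 'A5-M']:
    print(text, SIMPLE.match(text) is not None, STRICT.match(text) is not None)
```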