Regex for validating multiple inputs as per the below regex

I have a regex that validates my inputs like
Regex :
^(?=.{1,15}$)([a-zA-Z0-9]+)(?:[-]{1})[a-zA-Z0-9]+$
Example 1: BBB-123BBB
Now I want to build a regex from the one above that validates multiple inputs separated by a semicolon (;), with a maximum of 20 inputs.
Example 2:
BBB-123BB,AAA-1234;EEE-9876....20 items
Example 3:
BB-123BB,AAA3-1234;EEE334-9876....20 items
How can I extend the first regex above to allow multiple inputs, separated by semicolons, with a maximum of 20 items (as shown in example 2)?

Building on your pattern, I removed the unnecessary capturing groups and used a plain -, which is equivalent to (?:[-]{1}). Here's what I came up with:
^(?:(?:^|;)(?=[^;]{1,15})[a-zA-Z0-9]+-[a-zA-Z0-9]+){1,20}$
Explanation:
^ - match beginning of a string
(?:...) - non-capturing group
^|; - alternation: match the beginning of the string or a literal ;
(?=[^;]{1,15}) - positive lookahead: assert that between 1 and 15 characters other than ; follow
{1,20} - match preceding pattern between 1 and 20 times
$ - match end of a string
EDIT: Pattern:
^(?=[^;]{1,15})[a-zA-Z0-9]+-[a-zA-Z0-9]+(?:;(?=[^;]{1,15})[a-zA-Z0-9]+-[a-zA-Z0-9]+){0,19}$
won't accept ; at the beginning.
SECOND EDIT:
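^(?=[^;]{1,15}(?:;|$))[a-zA-Z0-9]+-[a-zA-Z0-9]+(?:;(?=[^;]{1,15}(?:;|$))[a-zA-Z0-9]+-[a-zA-Z0-9]+){0,19}$
Added: (?:;|$) inside each lookahead - match either a literal ; or $ (end of string)
What it does: correctly limits the length of each token to 15 characters, because the lookahead must now reach a ; or the end of the string within 15 characters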
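To check the behavior described above, here is a minimal Python sketch. It assumes a variant of the pattern in which every token, not just the first, carries the length lookahead ending in (?:;|$). The names TOKEN, LIMIT, PATTERN and is_valid are made up for this illustration.

```python
import re

# One item such as BBB-123BBB: alphanumerics, a single hyphen, alphanumerics.
TOKEN = r"[a-zA-Z0-9]+-[a-zA-Z0-9]+"
# Lookahead: the item is 1 to 15 characters long and is followed by ; or the end.
LIMIT = r"(?=[^;]{1,15}(?:;|$))"
# First item, then up to 19 more items, each introduced by a semicolon.
PATTERN = re.compile(rf"^{LIMIT}{TOKEN}(?:;{LIMIT}{TOKEN}){{0,19}}$")


def is_valid(text: str) -> bool:
    """Return True if text holds 1 to 20 semicolon-separated items."""
    return PATTERN.fullmatch(text) is not None


print(is_valid("BBB-123BBB"))                # single item
print(is_valid("BBB-123BB;AAA-1234"))        # two items
print(is_valid(";BBB-123BB"))                # leading semicolon
print(is_valid(";".join(["A-1"] * 21)))      # one item too many
```

The last two calls are expected to be rejected: the first because the string starts with a delimiter, the second because it holds 21 items.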

If the maximum is not important to enforce, simply allow arbitrary repetitions.
^(?=[^;]{3,15}(?:;[^;]{3,15})*$)[a-zA-Z0-9]+-[a-zA-Z0-9]+(;[a-zA-Z0-9]+-[a-zA-Z0-9]+)*$
If you want to specifically allow between 0 and 19 repeats, change the last * to {0,19}.
The minimal string which can match the main expression has three characters; so I updated the length constraint to {3,15}.
A minus simply matches itself so there is no need to put it in a character class; and there is never a good reason to specify a single repetition of anything, so I simplified the main regex accordingly.
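As a rough illustration of the difference between the * and {0,19} quantifiers mentioned above, the following Python sketch builds both variants from the same pieces. The helper build and the constants ITEM, LENGTHS, UNBOUNDED and MAX_20 are hypothetical names, not part of the answer.

```python
import re

ITEM = r"[a-zA-Z0-9]+-[a-zA-Z0-9]+"
# A single lookahead at the start checks the length of every item at once.
LENGTHS = r"(?=[^;]{3,15}(?:;[^;]{3,15})*$)"


def build(repeat: str) -> re.Pattern:
    """Assemble the full pattern with the given quantifier on the tail group."""
    return re.compile(rf"^{LENGTHS}{ITEM}(?:;{ITEM}){repeat}$")


UNBOUNDED = build("*")      # any number of items
MAX_20 = build("{0,19}")    # first item plus at most 19 more

many = ";".join(["A-1"] * 25)
print(UNBOUNDED.fullmatch(many) is not None)   # accepted: no upper bound
print(MAX_20.fullmatch(many) is not None)      # rejected: more than 20 items
```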

Related

How to match strings that are entirely composed of a predefined set of substrings with regex

How to match strings that are entirely composed of a predefined set of substrings. For example, I want to see if a string is composed of only the following allowed substrings:
,
034
140
201
In the case when my string is as follows:
034,201
The string is fully composed of the 'allowed' substrings, so I want to positively match it.
However, in the following string:
034,055,201
There is an additional 055, which is not in my 'allowed' substrings set. So I want to not match that string.
What regex would be capable of doing this?
Try this one:
^(034|201|140|,)+$
Step by step:
^ beginning of a line
(034|201|140|,) capturing group with the alternative possible matches
+ the group appears one or more times
$ end of a line
This regex will match only your values and ensure that the line doesn't start or end with a comma. A full match (group 0) is only produced if the line is valid; the groups are non-capturing.
^(?:034|140|201)(?:,(?:034|140|201))*$
^: start
(?:034|140|201): non-capturing group for your set of items (no comma)
(?:,(?:034|140|201))*: non-capturing group of a comma followed by a non-capturing group of values, 0 or more times
$: end
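The two patterns above differ in how strictly they treat the comma. The Python sketch below builds the stricter pattern from a list of allowed values and compares it with the looser one. The function build_pattern and the names ALLOWED, STRICT and LOOSE are assumptions made for this example.

```python
import re


def build_pattern(allowed):
    """Build ^(?:a|b|c)(?:,(?:a|b|c))*$ from a list of allowed substrings."""
    alternatives = "|".join(re.escape(item) for item in allowed)
    return re.compile(rf"^(?:{alternatives})(?:,(?:{alternatives}))*$")


ALLOWED = ["034", "140", "201"]
STRICT = build_pattern(ALLOWED)
LOOSE = re.compile(r"^(034|201|140|,)+$")   # comma is just another alternative

for text in ["034,201", "034,055,201", ",,034", "034201"]:
    print(text, bool(STRICT.fullmatch(text)), bool(LOOSE.fullmatch(text)))
```

Both patterns reject 034,055,201, but only the looser one accepts inputs such as ,,034 or 034201, where commas are repeated or missing.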

Regular Expression: Find a specific group within other groups in VB.Net

I need to write a regular expression that has to replace everything except for a single group.
E.g.
IN: OK THT PHP This is it 06222021
OUT: This is it
IN: NO MTM PYT Get this content 111111
OUT: Get this content
I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))
This RegEx creates 4 groups, using the first entry as an example the groups are:
OK THT PHP
This is it
06222021
Space character
I need a way to:
Replace Group 1,2,4 with String.Empty
OR
Get Group 3, ONLY
You don't need 4 groups, you can use a single group 1 to be in the replacement and match 6-8 digits for the last part instead of only 6.
Note that this \w{0,2} will also match an empty string, you can use \w{1,2} if there has to be at least a single word char.
^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
^ Start of string
\w{0,2}\s\w{0,3}\s\w{0,3}\s Match 3 times word characters with a quantifier and a whitespace in between
(.*?) Capture group 1, match any char as few times as possible
\s\d{6,8} Match a whitespace char and 6-8 digits
\s? Match an optional whitespace char
$ End of string
Example code
Imports System.Text.RegularExpressions

Dim s As String = "OK THT PHP This is it 06222021"
Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
Console.WriteLine(result)
Output
This is it
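For readers who want to try the same capture-group replacement outside VB.Net, here is a small Python sketch of the idea. The function name extract is hypothetical. Note that Python writes the group reference as \1 where .NET uses $1.

```python
import re

# Same pattern as above: three short words, the wanted text, then 6-8 digits.
PATTERN = re.compile(r"^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$")


def extract(line: str) -> str:
    """Replace the whole line with capture group 1; unmatched lines are unchanged."""
    return PATTERN.sub(r"\1", line)


print(extract("OK THT PHP This is it 06222021"))
print(extract("NO MTM PYT Get this content 111111"))
```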
My approach does not work with groups and does not use a Replace operation. The match itself yields the desired result.
It uses look-around expressions. To find a pattern between two other patterns, you can use the general form
(?<=prefix)find(?=suffix)
This will only return find as match, excluding prefix and suffix.
If we insert your expressions, we get
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6}\s?)
where I simplified (\s|) as \s?. We can also drop it completely, since we don't care about trailing spaces.
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6})
Note that this works also if we have more than 6 digits because regex stops searching after it has found 6 digits and doesn't care about what follows.
This also gives a match if other things precede our pattern like in 123 OK THT PHP This is it 06222021. We can exclude such results by specifying that the search must start at the beginning of the string with ^.
If the exact length of the words and numbers does not matter, we simply write
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+)
If the find part can contain numbers, we must specify that we want to match until the end of the line with $ (and include a possible space again).
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+\s?$)
Finally, we use a quantifier for the 3 occurrences of word-space:
(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)
This is compact and will only return This is it or Get this content.
string result = Regex.Match(input, @"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)").Value; // input: the string to search

Replace / Delete everything after first + character in datastudio

I have a string looking like this (stored as an Event Action value from Google Analytics)
0+171235652++zu
or
122+115166747++en
I would like to create (using calculated fields) a new field that shows only the number before the first '+' character. So for the examples above:
0 or 122
What I tried is below, but it did not help. Any ideas?
REGEXP_REPLACE(Event Action, '(^\\+).*', '')
You may use
REGEXP_EXTRACT(Event Action, '^([^+]+)')
The regex matches:
^ - start of string
([^+]+) - Capturing group 1: any one or more chars other than a + (you may use ([^+]*) if you want to also get empty match when a + is the first char).
If you want a replacement function, you may use
REGEXP_REPLACE(Event Action,"[+].*","")
The pattern you tried, (^\\+).*, did not work because the part ^\\+ matches a literal plus sign at the very start of the string. Your values start with a digit, so nothing is matched and nothing is replaced.
If what comes before the first plus sign should be digits and the plus sign itself should be present, you could capture the leading digits followed by matching the plus sign followed by the rest of the string.
Use group 1 using \\1 in the replacement.
^(\\d+)\\+.*
In parts
^ Start of string
(\\d+) Capture group 1, match 1 or more digits
\\+.* Match a + char and 0 or more times any char except a newline
Example code
REGEXP_REPLACE(Event Action, '^(\\d+)\\+.*', '\\1')
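The patterns in this thread can be tried outside the reporting tool. The Python sketch below is only an approximation, since Python's re engine is not the one used by the REGEXP_ functions, but these particular patterns use no engine-specific features. The function names are invented for the example, and the doubled backslashes from the calculated-field syntax are written as single backslashes in Python raw strings.

```python
import re


def before_first_plus(value):
    """Counterpart of REGEXP_EXTRACT with ^([^+]+): text before the first +."""
    match = re.match(r"^([^+]+)", value)
    return match.group(1) if match else None


def strip_after_plus(value):
    """Counterpart of REGEXP_REPLACE with [+].* : drop the first + and the rest."""
    return re.sub(r"[+].*", "", value)


def original_attempt(value):
    """The pattern from the question: only fires when the string starts with +."""
    return re.sub(r"(^\+).*", "", value)


for sample in ["0+171235652++zu", "122+115166747++en"]:
    print(before_first_plus(sample), strip_after_plus(sample), original_attempt(sample))
```

The third column shows the input unchanged, which matches the behavior reported in the question.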

Match 3 and 4 delimiters and between them; not less not more

I have a command-line program whose first argument ( = argv[ 1 ] ) is a regex pattern.
./program 's/one-or-more/anything/gi/digit-digit'
So I need a regex to check whether the input entered by the user is correct. This would be easy to solve, but I use the C++ library function std::regex_match, which by default behaves as if the begin and end assertions (^ and $) were placed around the pattern, so the non-greedy quantifier is effectively ignored.
Let me clarify. If I want to match /anything/ then I can use /.*?/, but std::regex_match treats this pattern as ^/.*?/$, so if the user enters /anything/anything/anything/, std::regex_match still returns true even though the input is not correct. std::regex_match only returns true or false, and the expected input from the user can only be text that follows the pattern. Since the inputs vary, I cannot list all possibilities here, but here are some examples.
Should match
/.//
s/.//
/.//g
/.//i
/././gi
/one-or-more/anything/
/one-or-more/anything/g/3
/one-or-more/anything/i
/one-or-more/anything/gi/99
s/one-or-more/anything/g/4
s/one-or-more/anything/i
s/one-or-more/anything/gi/54
and anything that looks like this pattern
Rules:
delimiters are /|##
the letter s at the beginning, and g, i and up to 2 digits at the end, are optional
std::regex_match returns true if the entire target character sequence can be matched, otherwise it returns false
between the first and second delimiter there must be one or more characters (+)
between the second and third delimiter there can be zero or more characters (*)
between the third and fourth delimiter there can be g or i
At least 3 delimiters must be matched, as in /.//, so /./ should not match
ECMAScript 262 is allowed for the pattern
NOTE
You may need to see my question about std::regex_match:
std::regex_match and lazy quantifier with strange behavior
I do not need any C++ code, I just need a pattern.
Do not try d?([/|##]).+?\1.*?\1[gi]?[gi]?\1?d?\d?\d?. It fails.
My attempt so far: ^(?!s?([/|##]).+?\1.*?\1.*?\1)s?([/|##]).+?\2.*?\2[gi]?[gi]?\d?\d?$
If you are willing to try, you should put ^ and $ around your pattern
If you need more details, please leave a comment and I will update the question.
Thanks.
You could use this regular expression:
^s?([/|##])((?!\1).)+\1((?!\1).)*\1((gi?|ig)(\1\d\d?)?|i)?$
Note how this also rejects these cases:
///anything/
/./anything/gg
/./anything/ii
/./anything/i/12
How it works:
Some explanation of the parts that are different:
((?!\1).): this will match any character that is not the delimiter. This way you are sure you can keep track of the exact number of delimiters used. You can this way also prevent that the first character after the first delimiter, is again that delimiter, which should not be allowed.
(gi?|ig): matches any of the valid modifier combinations, except a sole i, which is treated separately. So this also excludes gg and ii as valid character sequences.
(\1\d\d?)?: optionally allows for an extra delimiter (after a g modifier -- see previous) to be added with one or two digits following it.
(...|i)?: covers the case where no g modifier is present, just i or nothing; then no digits are allowed to follow.
This is a tricky one, but I took the challenge - here is what I have ended up with:
^s?([\/|##])(?:(?!\1).)+\1(?:(?!\1).)*\1(?:i|(?:gi?|ig)(\1\d{1,2})?)?$
Pattern breakdown:
^ matches start of string
s? matches an optional 's' character
([\/|##]) matches the delimiter characters and captures as group 1
(?:(?!\1).)+ matches anything other than the delimiter character one or more times (uses negative lookahead to make sure that the character isn't the delimiter matched in group 1)
\1 matches the delimiter character captured in group 1
(?:(?!\1).)* matches anything other than the delimiter character zero or more times
\1 matches the delimiter character captured in group 1
(?: starts a new group
i matches the i character
| or
(?:gi?|ig) matches either g, gi, or ig
(\1\d{1,2})? followed by an optional extra delimiter and 0-9 once or twice
)? closes group and makes it optional
$ matches end of string
I have used non-capturing groups throughout - these are groups that start with ?:
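The breakdown above can be checked against the examples from the question. The following Python sketch is for experimentation only (the question targets std::regex_match with ECMAScript syntax). It copies the pattern verbatim and uses fullmatch to mimic whole-string matching. The names PATTERN, SHOULD_MATCH and SHOULD_FAIL are made up for this example.

```python
import re

PATTERN = re.compile(
    r"^s?([\/|##])(?:(?!\1).)+\1(?:(?!\1).)*\1(?:i|(?:gi?|ig)(\1\d{1,2})?)?$"
)

SHOULD_MATCH = [
    "/.//", "s/.//", "/.//g", "/.//i", "/././gi",
    "/one-or-more/anything/", "/one-or-more/anything/g/3",
    "/one-or-more/anything/gi/99", "s/one-or-more/anything/i",
]
SHOULD_FAIL = [
    "/./",                          # only two delimiters
    "/anything/anything/anything/", # text after the third delimiter
    "///anything/",                 # empty first part
    "/./anything/gg",               # repeated modifier
    "/./anything/i/12",             # digits without the g modifier
]

for text in SHOULD_MATCH + SHOULD_FAIL:
    print(f"{text!r:35} {PATTERN.fullmatch(text) is not None}")
```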

Why doesn't the regex ^([0|1]1)+$ match the string "111"?

I'm trying to write a regex to match binary strings where every odd character is a 1.
I came up with this:
^([0|1]1)+$
My logic:
^ matches the start of the line
( starts a capture group
[0|1] match a 0 or 1 (since the 0th position is even)
1 the previous character (0 or 1) must be followed by a 1
+ repeat the previous pattern one or more times
$ matches the end of the line
So by my logic, the above regex should match binary strings where every other character (with the first "other" character being the second one in the string) is a 1.
However, it doesn't work correctly. As an example, the string 111 is not matched.
Why isn't it working and what should I change to make it work?
If you need every odd character to be a 1, then you need something more like this:
^([01]1)*[01]?$
The first character can be anything, the next has to be 1, then repeated several times while the last character can be 0 or 1.
The pipe in your character class is not needed, and is actually making your regex also match a pipe character. So remove it entirely. You use the pipe in groups (i.e. (?: ... ) or ( ... )) to denote alternation.
The above will also match an empty string, so you could add (?=.) at the beginning to force matching at least 1 character (i.e. ^(?=.)([01]1)*[01]?$).
The above will match where you have (where x is either 0 or 1):
x
x1
x1x
x1x1
x1x1x
x1x1x1
etc.
Your current regex, on the other hand, only matches an even number of characters. You repeat the group ([0|1]1), which matches exactly 2 characters (no more, no less), so the length of your whole match will be a multiple of 2.
Adding the optional [01] at the end allows for strings with odd number of characters to match.
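A quick way to convince yourself of the points above is to compare the regex with a direct check over all short binary strings. This Python sketch does that. The helper every_second_char_is_one and the constants ORIGINAL and FIXED are hypothetical names, and the FIXED pattern includes the (?=.) guard against the empty string.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"^([0|1]1)+$")          # pattern from the question
FIXED = re.compile(r"^(?=.)([01]1)*[01]?$")    # pattern from this answer


def every_second_char_is_one(s: str) -> bool:
    """Direct check: non-empty, and the 2nd, 4th, 6th, ... characters are all 1."""
    return len(s) > 0 and all(c == "1" for c in s[1::2])


# Compare the regex with the direct check for every binary string up to length 6.
mismatches = [
    "".join(bits)
    for n in range(1, 7)
    for bits in product("01", repeat=n)
    if (FIXED.fullmatch("".join(bits)) is not None)
    != every_second_char_is_one("".join(bits))
]
print(mismatches)                               # empty list when they agree
print(ORIGINAL.fullmatch("111") is not None)    # odd length: not matched
print(ORIGINAL.fullmatch("|1") is not None)     # the stray pipe is accepted
```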
Your regex is for even-length strings only. [01] and 1 each match a character, therefore your capturing group matches 2 characters.
This modifies your regex to accept odd-length strings:
^([01](1|$))+$
Firstly, the [0|1] should read [01]. Otherwise you have a character class that matches 0, | or 1.
Now, [01]1 matches exactly two characters. Thus ([01]1)+ cannot match a string whose length is not a multiple of two.
To make it work with inputs of odd length, change the regex to
^(([01]1)+[01]?|1)$
You can use this pattern:
^1?([01]1)+$|^1$
or
^(1?([01]1)+|1)$
To deal with an odd or even number of digits, you need to put an optional 1? at the beginning. To ensure that there is at least one digit, you can't use a * quantifier for the group, otherwise the pattern can match the empty string. This is why you need to use + for the group and add the case of a single 1.