Regex Match string having exact number of a char - regex

I have some strings:
1:2:3:4:5
2:3:4:5
5:3:2
6:7:8:9:0
How can I find the strings that contain an exact number of colons?
For example, I need the strings that contain exactly 4 colons.
Result:
1:2:3:4:5 and 6:7:8:9:0
Edit:
The text between the colons does not matter; it may look like this:
qwe:::qwe:
:998:qwe:3ee3:00
I need to be able to specify the number of colons, and I have to use regexp_matches.
It is meant as a filter for finding broken strings.
Thanks.

With N being the number you search for:
"^([^:]*:){N}[^:]*$"
Here is a test:
for s in ":::foo:bar" "foo:bar:::" "fo:o:ba::r" ; do echo "$s" | egrep "^([^:]*:){4}[^:]*$" ; done
Change 4 to 3 and 5, to see it not matching.
PostgreSQL may need specific flags or escaping for some elements.
"^([^:]*:){N}[^:]*$"
"^ $"
# matching the whole String/Line, not just a part
"([^:]*:){N}[^:]*"
"( ){N}[^:]*"
# N repetitions of something, followed by an arbitrary number of non-colons (maybe zero)
"([^:]*:)"
# non-colons in arbitrary number (including zero), followed by a colon
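The pattern can also be checked outside the shell. The following is a minimal sketch in Python's re module, not PostgreSQL's regexp_matches; the helper name has_exactly_n_colons is made up for illustration, and the sample strings are the ones from the question.

```python
import re

def has_exactly_n_colons(s: str, n: int) -> bool:
    """Hypothetical helper: True if s contains exactly n colons."""
    # n repetitions of "non-colons followed by one colon", then trailing non-colons.
    pattern = r"(?:[^:]*:){%d}[^:]*" % n
    # fullmatch plays the role of the ^ and $ anchors.
    return re.fullmatch(pattern, s) is not None

samples = ["1:2:3:4:5", "2:3:4:5", "5:3:2", "6:7:8:9:0",
           "qwe:::qwe:", ":998:qwe:3ee3:00"]
matches = [s for s in samples if has_exactly_n_colons(s, 4)]
print(matches)
```

Changing the second argument to 3 or 5 selects a different subset, as in the egrep test.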

You want to use the quantifier syntax {4}. The quantifier indicates that the preceding element (here, a group) must occur exactly n times for the match to succeed. To find the pattern of five digits separated by colons, something like the following would work.
((\d\:){4}\d)
I am assuming you may want any digit or word character but not whitespace or punctuation. In that case use the word character (\w).
(((\w)[\:]){4}(\w))
But depending on what you would like to do with that pattern you may need a different regular expression. If you wanted to capture and replace all the colons while leaving the digits intact your pattern would need to use string replacement or more advanced grouping.

Any number of any characters, including at least 4 colons (because . also matches a colon, strings with more than 4 colons match too):
((.*:){4}.*)
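A short Python sketch of how this pattern behaves compared with a negated character class. The variable names are illustrative only; the point is that . is free to consume colons, so the dot version does not enforce an exact count.

```python
import re

loose = re.compile(r"(.*:){4}.*")          # "." may also consume colons
exact = re.compile(r"([^:]*:){4}[^:]*")    # each group holds exactly one colon

five_colons = "1:2:3:4:5:6"

loose_result = loose.fullmatch(five_colons) is not None
exact_result = exact.fullmatch(five_colons) is not None
print(loose_result, exact_result)
```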

Related

Regex to match number(s) or UUID

I need regex which loosely matches UUIDs and numbers. I expect my filename to be formatted like:
results_SOMETHING.csv
This something ideally should be a number (a count of how many times a script has run) or a UUID.
This regex encompasses a huge set of filenames:
^results_?.*.csv$
and this one:
^results_?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.csv$
matches only UUIDs. I want a regex whose range is somewhere in between. Mostly I don't want matches like result__123.csv.
Note: This doesn't directly answer the OP question, but given the title, it will appear in searches.
Here's a proper regex to match a uuid based on this format without the hex character constraint:
(\w{8}(-\w{4}){3}-\w{12}?)
If you want it to match only hex characters, use:
/([a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12}?)/i
(Note the / delimiters used in Javascript and the /i flag to denote case-insensitivity; depending on your language, you may need to write this differently, but you definitely want to handle both lower and upper case letters).
If you're prepending results_ and appending .csv to it, that would look like:
^results_([a-z\d]{8}(-[a-z\d]{4}){3}-[a-z\d]{12}?)\.csv$
-----EDITED / UPDATED-----
Based on the comments you left, there are some other patterns you want to match (this was not clear to me from the question). This makes it a little more challenging - to summarize my current understanding:
results.csv - match (NEW)
results_1A.csv - match (NEW)
results_ABC.csv - ? no match (I assume)
result__123.csv - no match
results_123.csv - match
Results_123.cvs - ? no match
results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv - match
You will find the following modification works according to the above "specification":
results(?:_(?:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}|(?=.*[0-9])[A-Z0-9]+))?\.csv
Breaking it down:
results matches characters "results" literally
(?:_ ….)? non-capturing group, repeated zero or one time:
"this is either there, or there is nothing"
[0-9a-f]{8}- exactly 8 characters from the group [0-9a-f]
followed by hyphen "-"
(?:[0-9a-f]{4}-){3} ditto but group of 4, and repeated three times
[0-9a-f]{12} ditto, but group of 12
| OR...
(?=.*[0-9]) lookahead: at least one digit somewhere after this point
[A-Z0-9]+ at least one capital letter or number
\.csv the literal string ".csv" (the '.' has to be escaped)
demonstration on regex101.com
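The "specification" list can be turned into a quick check. This is a sketch in Python, assuming the whole filename must match (re.fullmatch is used because the pattern has no ^ and $ anchors), and assuming the two entries marked "?" are meant to be non-matches.

```python
import re

PATTERN = re.compile(
    r"results(?:_(?:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}"
    r"|(?=.*[0-9])[A-Z0-9]+))?\.csv"
)

# Filename -> expected outcome, taken from the list above.
expected = {
    "results.csv": True,
    "results_1A.csv": True,
    "results_ABC.csv": False,   # assumed "no match"
    "result__123.csv": False,
    "results_123.csv": True,
    "Results_123.cvs": False,   # assumed "no match"
    "results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv": True,
}

results = {name: PATTERN.fullmatch(name) is not None for name in expected}
for name, matched in results.items():
    print(f"{name}: {'match' if matched else 'no match'}")
```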

Limit length of string containing at least 1 digit, 0 or more characters and an optional dash

I am trying to make a regular expression for consumer products models.
I have this regular expression: ([a-z]*-?[0-9]+-?[a-z]*-?){4,}
which I expected to require a total length of 4 or more, but the limit is applied only to the digits.
So this example matches: E1912H while this does not: EM24A1BF although both should match.
Can you tell me what I am doing wrong or how can I make the limit to the whole special string not only the digits?
Limitations:
1- String contains at least 1 digit
2- String can contain letters
3- String can contain "-"
4- Minimum length = 4
Summary of your conditions so far:
require at least 1 digit [0-9]
require at least 4 symbols {4,}
can have characters [a-zA-Z]
can have short dash [-]
The following regexp meets them all:
^(?=.*\d)[A-Za-z0-9-]{4,}$
Note: the ^ and $ symbols mean the entire input string is validated. Alter this if that's not what you need.
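A minimal Python sketch of the lookahead approach. It writes the character class with a single {4,} quantifier, which accepts the same strings as repeating a + group four or more times while avoiding nested quantifiers. The function name is_model is invented for the example.

```python
import re

# Lookahead requires a digit somewhere; the class plus {4,} enforces
# the allowed characters and the minimum total length.
MODEL = re.compile(r"^(?=.*\d)[A-Za-z0-9-]{4,}$")

def is_model(s: str) -> bool:
    """Hypothetical helper: True if s satisfies all four conditions."""
    return MODEL.search(s) is not None

for candidate in ["E1912H", "EM24A1BF", "eM-24A-", "ABCD", "A1", "E19 12H"]:
    print(candidate, is_model(candidate))
```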
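It can't match: the {4,} quantifier repeats the whole group, and every repetition must contain at least one digit. EM24A1BF has only three digits, so the group can repeat at most three times.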
Something like this
[a-z]*-?\d+-?[a-z]*-?\d*[a-z]+
matches both of your examples and all of these:
E1912H
EM24A1BF
eM24A1BF
eM-24A-1BF
eM-24A-
eM24A-1BF
eM-24A1BF
To make sure the string meets both requirements (the position and composition of the characters AND the minimum length), you need a non-consuming construct such as a lookahead.
Check this out
([\w-]*\d+[\w-]*){4,}
it matches the following
32ES5200G
LE32K900
N55XT770XWAU3D

Regex matching Numbers with ,-N and Numbers

I am trying to match strings in the pattern,
Numbers
, or - or N
Numbers
([0-9]+[,-N])+[0-9]+
Should match,
87-7-6
86-6-2,3
4-N-0
87-7-6
86-14-2,3
4-N-0
Is not matching,
4-N-0
Any help?
You need to escape the dash in the set, otherwise it will match all characters from comma to N.
([0-9]+[,\-N])+[0-9]+
It doesn't match 4-N-0 because it doesn't fall into what you describe that it should match. If you want it to match multiple separators, add a + after that set:
([0-9]+[,\-N]+)+[0-9]+
Or perhaps you want to use the exact sequence -N- as one of the separators, so that it won't match for example 4NNNNNNNN0 or 4-,-,-,-,-,0:
([0-9]+([,\-]|-N-))+[0-9]+
The hyphen is a special character inside a character class, so it should be escaped:
([0-9]+[,\-N])+[0-9]+
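The effect of the unescaped hyphen can be seen in a few lines of Python. This is only a sketch; the variable names are illustrative, and fullmatch is used so that the whole string has to fit the pattern.

```python
import re

unescaped = re.compile(r"[,-N]")   # a range from ',' to 'N', which includes digits
escaped = re.compile(r"[,\-N]")    # exactly ',', '-' or 'N'

range_hits_digit = unescaped.fullmatch("5") is not None
escaped_hits_digit = escaped.fullmatch("5") is not None
print(range_hits_digit, escaped_hits_digit)

# The variant that treats -N- as one separator.
separators = re.compile(r"([0-9]+([,\-]|-N-))+[0-9]+")
outcome = {
    s: separators.fullmatch(s) is not None
    for s in ["87-7-6", "86-14-2,3", "4-N-0", "4NNNNNNNN0", "4-,-,-,-,-,0"]
}
for s, matched in outcome.items():
    print(s, matched)
```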

Limit number of alpha characters in regular expression

I've been struggling to figure out how to best do this regular expression.
Here are my requirements:
Up to 8 characters
Can only be alphanumeric
Can only contain up to three alpha characters [a-z] (zero alpha characters are valid too)
Any ideas would be appreciated.
This is what I've got so far, but it only looks for contiguous letter characters:
^(\d|([A-Za-z])(?!([A-Za-z]{3,}))){0,8}$
I'd write it like this:
^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$
It has two parts:
(?=[a-z0-9]{0,8}$)
Looks ahead and matches up to 8 alphanumerics to the end of the string
(?:\d*[a-z]){0,3}\d*$
Essentially allowing injection of up to 3 [a-z] among \d*
Rubular
On rubular.com
12345678 // matches
123456789
#(#*#$
12345 // matches
abc12345 // matches
abcd1234
12a34b5c // matches
12ab34cd
123a456 // matches
Alternatives
I do think regex is the best solution for this, but since the string is short, it would be a lot more readable to do this in two steps as follows:
It must match [a-z0-9]{0,8}
Then, delete all \d
The length must now be <= 3
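The two-step alternative might look like this in Python. It is a sketch under the assumption that only lowercase letters count, as in [a-z0-9]; the function name is_valid is made up for the example.

```python
import re

def is_valid(text: str) -> bool:
    """Hypothetical helper implementing the two-step check."""
    # Step 1: the whole string is at most 8 lowercase alphanumerics.
    if re.fullmatch(r"[a-z0-9]{0,8}", text) is None:
        return False
    # Step 2: delete all digits; only letters remain.
    letters_only = re.sub(r"[0-9]", "", text)
    # Step 3: at most 3 letters may be left.
    return len(letters_only) <= 3

for sample in ["12345678", "123456789", "#(#*#$", "abcd1234", "12a34b5c", "12ab34cd"]:
    print(sample, is_valid(sample))
```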
Do you have to do this in exactly one regular expression? It is possible to do that with standard regular expressions, but the regular expression will be rather long and complicated. You can do better with some of the Perl extensions, but depending on what language you're using, they may or may not be supported. The cleanest solution is probably to check whether the string matches:
^[A-Za-z0-9]{0,8}$
but doesn't match:
([A-Za-z].*){4}
i.e. it's an alphanumeric string of up to 8 characters (first regular expression), but doesn't contain 4 or more alpha characters, possibly separated by other characters (second regular expression).
/^(?!(?:\d*[a-z]){4})[a-z0-9]{0,8}$/i
Explanation:
[a-z0-9]{0,8} matches up to 8 alphanumerics.
Lookahead should be placed before the matching happens.
The (?:\d*[a-z]) matches one alphabetic character preceded by any number of digits. The {4} repeats that four times, so the negative lookahead rejects the string whenever 4 alphabetic characters can be found (i.e. it limits the count to ≤3).
It's better not to exploit regex like this. Suppose you use this solution, are you sure you will know what the code is doing when you revisit it 1 year later? A clearer way is just check rule-by-rule, e.g.
if len(theText) <= 8 and theText.isalnum():
    if sum(1 for c in theText if c.isalpha()) <= 3:
        pass  # valid
The easiest way to do this would be in multiple steps:
Test the string against /^[a-z0-9]{0,8}$/i -- the string is up to 8 characters and only alphanumeric
Make a copy of the string, delete all non-alphabetic characters
See if the resulting string has a length of 3 or less.
If you want to do it in one regular expression, you can use something like:
/^(?=\d*(?:[a-z]?\d*){0,3}$)[a-z0-9]{0,8}$/i
This looks for an alphanumeric string of length 0 to 8 (^[a-z0-9]{0,8}$), but first uses a lookahead ((?=\d*(?:[a-z]?\d*){0,3}$)) to make sure that the string has at most 3 alphabetic characters.
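Since several single-regex variants appear in this thread, a brute-force comparison against the plain rules is a cheap sanity check. The sketch below is in Python and makes two assumptions: all three patterns are compiled case-insensitively and ASCII-only so they are comparable, and the three-character test alphabet "a7-" is an arbitrary choice.

```python
import re
from itertools import product

FLAGS = re.IGNORECASE | re.ASCII
candidates = {
    "inject letters": r"^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$",
    "negative lookahead": r"^(?!(?:\d*[a-z]){4})[a-z0-9]{0,8}$",
    "positive lookahead": r"^(?=\d*(?:[a-z]?\d*){0,3}$)[a-z0-9]{0,8}$",
}
compiled = {name: re.compile(p, FLAGS) for name, p in candidates.items()}

def rule_check(text: str) -> bool:
    """The three requirements, checked one by one without regex."""
    all_alnum = all(c.isascii() and c.isalnum() for c in text)
    letter_count = sum(c.isalpha() for c in text)
    return len(text) <= 8 and all_alnum and letter_count <= 3

# Every string of length 0..9 over a letter, a digit and an invalid character.
disagreements = []
for length in range(10):
    for chars in product("a7-", repeat=length):
        text = "".join(chars)
        wanted = rule_check(text)
        for name, rx in compiled.items():
            if (rx.search(text) is not None) != wanted:
                disagreements.append((name, text))

print(len(disagreements), "disagreements")
```

An empty disagreement list only shows agreement on this small alphabet; it is not a proof for arbitrary input.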

Regular Expression to match set of arbitrary codes

I am looking for some help on creating a regular expression that would work with a unique input in our system. We already have some logic in our keypress event that will only allow digits, and will allow the letter A and the letter M. Now I need to come up with a RegEx that can match the input during the onblur event to ensure the format is correct.
I have some examples below of what would be valid. The letter A represents an age, so it is always followed by up to 3 digits. The letter M can only occur at the end of the string.
Valid Input
1-M
10-M
100-M
5-7
5-20
5-100
10-20
10-100
A5-7
A10-7
A100-7
A10-20
A5-A7
A10-A20
A10-A100
A100-A102
Invalid Input
a-a
a45
4
This matches all of the samples.
/A?\d{1,3}-A?\d{0,3}M?/
Not sure if 10-A10M should or shouldn't be legal, or even if M can appear with numbers. If M can only appear without numbers:
/A?\d{1,3}-(A?\d{1,3}|M)/
Use the brute force method if you have a small amount of well defined patterns so you don't get bad corner-case matches:
^(\d+-M|\d+-\d+|A\d+-\d+|A\d+-A\d+)$
Here are the individual regexes broken out:
\d+-M <- matches anything like '1-M'
\d+-\d+ <- 5-7
A\d+-\d+ <- A5-7
A\d+-A\d+ <- A10-A20
/^[A]?[0-9]{1,3}-[A]?[0-9]{1,3}[M]?$/
Matches anything of the form:
A(optional)[1-3 numbers]-A(optional)[1-3 numbers]M(optional)
^A?\d+-(?:A?\d+|M)$
An optional A followed by one or more digits, a dash, and either another optional A and some digits or an M. The '(?: ... )' notation is a Perl 'non-capturing' set of parentheses around the alternatives; it means there will be no '$1' after the regex matches. Clearly, if you wanted to capture the various bits and pieces, you could - and would - do so, and the non-capturing clause might not be relevant any more.
(You could replace the '+' with '{1,3}' as JasonV did to limit the numbers to 3 digits.)
^A?\d{1,3}-(M|A?\d{1,3})$
^ -- the match must be done from the beginning
A? -- "A" is optional
\d{1,3} -- between one and 3 digits; [0-9]{1,3} also works
- -- A "-" character
(...|...) -- Either one of the two expressions
(M|...) -- Either "M" or...
(...|A?\d{1,3}) -- an optional "A" followed by at least one and at most three digits
$ -- the match should be done to the end
Some consequences of changing the format. If you do not put "^" at the beginning, the match may ignore an invalid beginning. For example, "MAAMA0-M" would be matched at "A0-M".
If, likewise, you leave $ out, the match may ignore an invalid trail. For example, "A0-MMMMAAMAM" would match "A0-M".
Using \d is usually preferred, as is \w for alphanumerics, \s for spaces, \D for non-digit, \W for non-alphanumeric or \S for non-space. But you must be careful that \d is not being treated as an escape sequence. You might need to write it \\d instead.
{x,y} means the last match must occur between x and y times.
? means the last match must occur once or not at all.
When using (), it is treated as one match. (ABC)? will match ABC or nothing at all.
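The anchored pattern and the two anchor pitfalls described above can be tried out as follows. This is a Python sketch (the original context is a browser onblur handler), and the variable names are illustrative.

```python
import re

anchored = re.compile(r"^A?\d{1,3}-(M|A?\d{1,3})$")

valid = ["1-M", "10-M", "100-M", "5-7", "5-20", "5-100", "10-20", "10-100",
         "A5-7", "A10-7", "A100-7", "A10-20", "A5-A7", "A10-A20",
         "A10-A100", "A100-A102"]
invalid = ["a-a", "a45", "4"]

all_valid_ok = all(anchored.search(s) for s in valid)
any_invalid_ok = any(anchored.search(s) for s in invalid)
print(all_valid_ok, any_invalid_ok)

# Dropping an anchor lets the match skip invalid text at that end.
no_start = re.compile(r"A?\d{1,3}-(M|A?\d{1,3})$")
no_end = re.compile(r"^A?\d{1,3}-(M|A?\d{1,3})")
skipped_prefix = no_start.search("MAAMA0-M").group(0)
skipped_suffix = no_end.search("A0-MMMMAAMAM").group(0)
print(skipped_prefix, skipped_suffix)
```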
I’d use this regular expression:
^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$
This matches either:
<number>-M or <number>-<number>
A<number>-<number> or A<number>-A<number>
Additionally <number> must not begin with a 0.
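A brief Python sketch of what this stricter pattern accepts. Besides leading zeros, the pattern as written also rejects A5-M and 5-A7; neither form appears in the question's list of valid input, so whether that is desired is an open assumption.

```python
import re

strict = re.compile(
    r"^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$"
)

checks = {
    s: strict.search(s) is not None
    for s in ["10-M", "A10-A100", "05-7", "5-07", "A0-7", "A5-M", "5-A7"]
}
for s, matched in checks.items():
    print(s, matched)
```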