regex strings separated by commas validating length of each string - regex

I have a set of strings separated by commas,
like: cat,dog,Elephant
I want to validate that each comma-separated string
has a length from 3 to 6. (The strings can contain anything, like .&^*#$)
e.g. a9&,bbbb,cc,ddddddd
In the above, cc and ddddddd are invalid since they do not fall into
the length range of 3 to 6.
A valid input would be: a9&,bbbb,ccc,a12$%,adsdff
I went through many questions that were posted on Stack Overflow
and got some ideas from them.
^[1-9]\d([,][1-9]\d){0,3}$ is a regex I got from one of those questions.
It accepts digits only, but I need alphanumeric characters.
I tried to change it, but this didn't work:
^1-9a-zA-z{0,3}$
Could you please help me out,
and explain what each symbol means so that I can learn more from
you?
Thank you for answering my previous questions too.

[^,] will accept everything BUT the comma that you are using as a separator. It isn't clear what your regex should give you: the substrings whose length is not 3-6, the substrings whose length IS 3-6, both mixed together, both separated, or something else.
Try this:
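// Requires: using System.Text.RegularExpressions;
// Group 1 captures only the items that are 3 to 6 characters long.
Regex rx = new Regex("^(?:(?:([^,]{3,6})|(?:[^,]*))(?:,|$))*");
Match match = rx.Match("AA,BB&B,!CC,DDDDDD,EE");
foreach (Capture capture in match.Groups[1].Captures) {
    string oneCapture = capture.Value; // one item of length 3 to 6
}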
The captures will contain only the strings whose length is 3 to 6.
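For readers who are not on .NET: Python's `re` module keeps only the last value of a repeated capturing group, so the `Captures` collection has no direct equivalent there. A minimal sketch of the same filtering idea, assuming Python and a hypothetical helper name `items_in_range` (neither comes from the answer), anchors each item between separators instead:

```python
import re

# Hypothetical helper, not part of the answer above: collects the
# comma-separated items whose length is between 3 and 6.
# (?:^|,)  -> the item starts at the beginning or right after a comma
# (?=,|$)  -> the item ends right before a comma or at the end
ITEM_IN_RANGE = re.compile(r"(?:^|,)([^,]{3,6})(?=,|$)")

def items_in_range(text):
    """Return the items of a comma-separated string that are 3-6 chars long."""
    return ITEM_IN_RANGE.findall(text)

print(items_in_range("AA,BB&B,!CC,DDDDDD,EE"))
```

The lookahead leaves the separating comma unconsumed, so it can serve as the start marker for the next item.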

I believe what you want is the following:
^([^,]{3,6},)*[^,]{3,6}$
To break this down: the first ^ matches the beginning of a line. [^,]{3,6}, means 3 to 6 characters of anything but a comma, followed by a single comma. The ( )* enclosing that means repeat it 0 or more times. The last part, [^,]{3,6}$, says the string must end with 3 to 6 characters that aren't a comma.
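As a quick check of that breakdown, here is a small sketch assuming Python's `re` module (the question does not name a language, and `is_valid_list` is an illustrative name). `fullmatch` plays the role of the ^ and $ anchors:

```python
import re

# Same pattern as above, without ^ and $ because fullmatch()
# already requires the whole string to match.
LIST_PATTERN = re.compile(r"([^,]{3,6},)*[^,]{3,6}")

def is_valid_list(text):
    """True if every comma-separated item is 3 to 6 characters long."""
    return LIST_PATTERN.fullmatch(text) is not None

print(is_valid_list("a9&,bbbb,ccc,a12$%,adsdff"))  # the valid sample from the question
print(is_valid_list("a9&,bbbb,cc,ddddddd"))        # contains a 2-char and a 7-char item
```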

This should do the trick if the regex you mentioned already works fine for digits.
^.\d([,].\d){0,3}$
For reference I often use the MSDN documentation, but it is a bit terse for beginners; maybe someone else can suggest a good tutorial.
There are also tools such as Expresso that help you test and develop regexes.

I think the following expression does what you want:
^(?:([^,]{3,6}),?)*$
The [^,]{3,6} part means "any character that is not a comma, 3 to 6 repetitions". That is the core of the expression. The parentheses make a group, which will allow you to retrieve the values that were captured by that group.
The ,? part means "a comma, zero or one times".
These parts are surrounded by a non-capturing group (?: ... ). That means that the contained expression is grouped, but you won't be able to retrieve the values that were captured by it. That group is necessary to apply a repetition character *, which means "repeat the previous group zero or more times".
The anchors ^ and $ mean "beginning of string" and "end of string". They prevent the expression from matching only part of a string. If you were searching for a pattern inside a larger string, you wouldn't want them.
You might want to try Expresso to learn more about regular expressions. The program has an analyzer that describes the various parts of the expression.
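One thing worth checking before relying on this pattern for validation: because the comma is optional, an over-long item can be split into two shorter ones, and a trailing comma or an empty string is accepted too. A small sketch, assuming Python's `re` module (names are illustrative), compares it with a variant that requires a comma between items:

```python
import re

# Pattern from this answer (anchors dropped, fullmatch() supplies them).
OPTIONAL_COMMA = re.compile(r"(?:([^,]{3,6}),?)*")
# Variant with a mandatory comma between items, for comparison.
STRICT = re.compile(r"(?:[^,]{3,6},)*[^,]{3,6}")

def accepts(pattern, text):
    """True if the whole text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None

for sample in ["cat,dog", "ddddddd", "abc,", ""]:
    print(repr(sample), accepts(OPTIONAL_COMMA, sample), accepts(STRICT, sample))
```

With the optional comma, "ddddddd" passes because the engine can read it as a 4-character item followed by a 3-character item.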

Related

Detect multiple periods in Regex and kill entire match

I'm trying to detect a price in regex with this:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
This covers:
12
12.5
12.50
12,500
12,500.00
But if I pass it
12..50 or 12.5.0 or 12.0.
it still returns a match on the 12. I want it to reject the entire string and return no match at all if there is more than one period anywhere in the string.
I've been trying to get my head around negative lookaheads for an hour and have searched on Stack Overflow but can't seem to find the right answer. How do I do this?
What you are looking for is this:
^\d+(,\d{3})*(\.\d{1,2})?$
What it does:
^ Start of Line
\d+ one or more Digits followed by
(,\d{3})* zero, one or more times a , followed by three Digits followed by
(\.\d{1,2})? one or zero . followed by one or two Digits followed by
$ End of Line
This will only match valid prices. The comma (,) is not mandatory in this regex, but it will be matched when present.
Look here: http://www.regextester.com/?fam=98001
If you work with prices and want to store them in a database, I recommend saving them as INT. So 1,234.56 becomes 123456 and 1,234 becomes 123400. After you have matched a valid price, all you have to do is remove the commas, split the value at the dot, and right-pad element [1] with zeros using str_pad() (STR_PAD_RIGHT). This makes calculations easier, especially when you work with JavaScript or other languages.
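The validate-then-convert steps can be sketched as follows. This assumes Python rather than the PHP implied by str_pad(), and `to_cents` is a hypothetical helper name:

```python
import re

# Pattern from this answer; fullmatch() supplies the ^ and $ anchors.
PRICE = re.compile(r"\d+(,\d{3})*(\.\d{1,2})?")

def to_cents(text):
    """Return the price as an integer number of cents, or None if invalid."""
    if PRICE.fullmatch(text) is None:
        return None
    # Remove thousands separators, then split at the decimal point.
    whole, _, fraction = text.replace(",", "").partition(".")
    # Right-pad the fraction with zeros to exactly two digits.
    return int(whole + fraction.ljust(2, "0"))

print(to_cents("1,234.56"))
print(to_cents("1,234"))
print(to_cents("12..50"))
```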
Your regex:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
Note: The regex you provided does not seem to work for 12 (without "."). Since you didn't add a quantifier after \., it tries to match that pattern literally (.).
While there are multiple ways to solve this and the most "correct" answer will depend on your specific requirements, here's a regex that will not match 12..1, but will match 12.1:
(^\-?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+))+
I surrounded the entire regex you provided in a capturing group (...), and added a one or more quantifier + at the end, so that the entire regex will fail if it does not satisfy that pattern.
Also (this may or may not be what you want), I modified the inner groups into non-capturing groups (?: ... ) so that it does not return unnecessary groups.
This site offers a deconstruction of regexes and explains them:
For the regex provided: https://regex101.com/r/EDimzu/2
Unit tests: https://regex101.com/r/EDimzu/2/tests (Note the 12 one's failure for multiple languages).
You can limit it by requiring that there are only 0 or 1 periods, like this:
^[0-9,]+[\.]{0,1}?[0-9,]+$

if else failing regex

I want to write a regex that checks whether a word is 3 characters long, where a preceding m_ is optional. If m_ is present, it must be followed by a minimum of 3 characters.
Basically I want to match
Abc or m_abc and not match ab or m_ac
(^(m_))?([a-zA-Z0-9_]{3,})|(^[a-zA-Z0-9_]{3,}$)
I tried an if-style alternation, but it also matches the text m_a.
Can you please tell me what I am missing here?
Maybe I wrote my regex wrong.
I want something like
if(m_ found)
"followed by 3 characters required"
else
"Look if total number of characters is 3"
Thanks.
You could use this regular expression, which either requires the m_ at the start or forbids it (by negative look-ahead):
^(m_|(?!m_))\w{3,}$
See regex tester
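To see how the look-ahead version behaves on the examples from the question, here is a minimal sketch assuming Python's `re` module (`is_valid_word` is an illustrative name):

```python
import re

# Either consume a leading "m_", or assert (without consuming anything)
# that the word does not start with "m_"; then require 3+ word characters.
WORD = re.compile(r"^(m_|(?!m_))\w{3,}$")

def is_valid_word(word):
    """True if the word passes the m_-aware length check."""
    return WORD.match(word) is not None

for word in ["Abc", "m_abc", "ab", "m_ac", "m_a"]:
    print(word, is_valid_word(word))
```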
If negative look-ahead is not a feature you can use, then you could go for this more elaborate regex, which goes through the different options for the first two characters:
^(m_\w{3,}|m[A-Za-z0-9]\w+|[A-Za-ln-z0-9]\w{2,})$
See regex tester
Do you want your 3-character word to be able to have underscores in it? Because if not, then you can change [a-zA-Z0-9_] to [a-zA-Z0-9], and it should not match m_a in that case. And unless you want to match numerals, you can simplify it further to [a-zA-Z].
The main error is that the ^ line anchor is inside the optional parenthesized expression. You want beginning of line unconditionally, followed by an optional m_, i.e. (m_)?.
You can simplify the rest significantly. Three or more is captured by an expression which requires three characters; the regex will succeed at that point, whether or not you are at the end of the input.
^(m_)?[a-zA-Z0-9_]{3}
The underscore in the character class seems somewhat dubious. Do you really intend for a "word" to include underscores? Then m_ac will also match, because it is at least three characters long and consists of characters in the set, even though you say it is specifically disallowed.

Regex a decimal number with comma

I'm having trouble finding the right regex for decimal numbers which include the comma separator.
I did find a few other questions regarding this issue in general, but none of the answers really worked when I tested them.
The best I got so far is:
[0-9]{1,3}(,([0-9]{3}))*(.[0-9]+)?
2 main problems so far:
1) It records numbers with spaces between them "3001 1" instead of splitting them to 2 matches "3001" "1" - I don't really see where I allowed space in the regex.
2) I have a general problem with the beginning and end of the regex.
The regex should match:
3,001
1
32,012,111.2131
But not:
32,012,11.2131
1132,012,111.2131
32,0112,111.2131
32131
In addition I'd like it to match:
1.(without any number after it)
1,(without any number after it)
as 1
(a comma or point at the end of the number should be overlooked).
Many Thanks!
This is a very long and convoluted regular expression that fits all your requirements. It will work if your regex engine is based on PCRE (hopefully you're using PHP, Delphi or R..).
(?<=[^\d,.]|^)\d{1,3}(,(\d{3}))*((?=[,.](\s|$))|(\.\d+)?(?=[^\d,.]|$))
DEMO on RegExr
The things that make it so long:
Matching multiple numbers on the same line separated by only 1 character (a space) whilst not allowing partial matches requires a lookahead and a lookbehind.
Matching numbers ending with . and , without including the . or , in the match requires another lookahead.
(?=[,.](\s|$)) Explanation
When writing this explanation I realised the \s needs to be a (\s|$) to match 1, at the very end of a string.
This part of the regex is for matching the 1 in 1, or the 1,000 in 1,000. so let's say our number is 1,000. (with the . on the end).
Up to this point the regex has matched 1,000, then it can't find another , to repeat the thousands group so it moves on to our (?=[,.](\s|$))
(?=....) means it's a lookahead: from the position matched so far, look at what's coming next but don't add it to the match.
So it checks if there is a , or a . and if there is, it checks that it's immediately followed by whitespace or the end of input. In this case it is, so it leaves the match as 1,000
Had the lookahead not matched, it would have moved on to trying to match decimal places.
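For anyone trying this outside PCRE, here is a sketch adapted to Python's `re` module. This is an assumption on my part, since the answer targets PCRE: Python rejects the lookbehind `(?<=[^\d,.]|^)` because its two alternatives have different widths, so the sketch swaps in the negative lookbehind `(?<![\d,.])`, which expresses the same condition. The rest of the pattern is unchanged, and `find_numbers` is an illustrative name.

```python
import re

NUMBER = re.compile(
    r"(?<![\d,.])"                             # swapped in for (?<=[^\d,.]|^)
    r"\d{1,3}(,(\d{3}))*"                      # 1-3 digits, then ,ddd groups
    r"((?=[,.](\s|$))|(\.\d+)?(?=[^\d,.]|$))"  # trailing , or . ignored, or decimals
)

def find_numbers(text):
    """Return every number in text that follows the comma-grouping rules."""
    return [m.group(0) for m in NUMBER.finditer(text)]

for sample in ["3,001 1", "32,012,111.2131", "32,012,11.2131", "32131", "1.", "1,"]:
    print(repr(sample), find_numbers(sample))
```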
This works for all the ones that you have listed
^[0-9]{1,3}(,[0-9]{3})*(([\\.,]{1}[0-9]*)|())$
. means "any character". To use a literal ., escape it like this: \..
As far as I know, that's the only thing missing.

RegEx padding numbers surrounded by other characters

I am looking for a RegEx that captures a series of digits surrounded by a string pattern and pads that series of digits with leading zeros up to 4 digits. At the same time all spaces should be removed from the entire string.
Some examples:
"F12b" should capture "12" and return "F0012b"
"AB 214/3" should capture "214" and return "AB0214/3"
"G0124" should capture "0124" and return the original string unchanged
The source string should adhere to the following rules:
- should start with [a-zA-Z]
- after the above pattern can be any number of optional spaces
- the numeric sequence can be followed by another string
- the numeric sequence can be any number of digits. Only if there are less than 4 digits is the sequence to be padded with leading zeros, otherwise it remains unchanged.
- I am only interested in the first occurrence within a string
I am posting this question here because I don't use RegEx often enough to figure this one out, but I know it's a perfect case for RegEx.
Any help is greatly appreciated, and an explanation of the expression would certainly help me understand it.
To match that and extract the info you want, regex is fine, you can use this:
^([a-zA-Z]+)\s*(\d+)(.*)
See it here on regexr. There you only see that the space has been removed in your second example, but all the needed information is captured in $1, $2 and $3.
Regular expressions are a tool to match patterns. How you use that pattern within a replacement method, and how the replacement string can be built, is completely language dependent and has nothing to do with regex. Without knowing the language, this part cannot be answered.
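Since the replacement step depends on the language, here is one possible sketch assuming Python (the question does not say which language is in use). `pad_number` is a hypothetical helper, and its handling of non-matching input is my assumption:

```python
import re

# Pattern from this answer: letters, optional whitespace, digits, the rest.
PATTERN = re.compile(r"^([a-zA-Z]+)\s*(\d+)(.*)")

def pad_number(text):
    """Zero-pad the first digit run to 4 digits and strip all spaces."""
    match = PATTERN.match(text)
    if match is None:
        # Assumption: input that does not fit the rules only loses its spaces.
        return text.replace(" ", "")
    prefix, digits, rest = match.groups()
    # zfill(4) leaves runs of 4 or more digits unchanged.
    return (prefix + digits.zfill(4) + rest).replace(" ", "")

for sample in ["F12b", "AB 214/3", "G0124"]:
    print(sample, "->", pad_number(sample))
```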

How to match everything up to the second occurrence of a character?

So my string looks like this:
Basic information, advanced information, super information, no information
I would like to capture everything up to second comma so I get:
Basic information, advanced information
What would be the regex for that?
I tried: (.*,.*), but I get
Basic information, advanced information, super information,
This will capture up to but not including the second comma:
[^,]*,[^,]*
English translation:
[^,]* = as many non-comma characters as possible
, = a comma
[^,]* = as many non-comma characters as possible
[...] is a character class. [abc] means "a or b or c", and [^abc] means anything but a or b or c.
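Applied to the string from the question, a minimal sketch assuming Python's `re` module (`up_to_second_comma` is an illustrative name):

```python
import re

def up_to_second_comma(text):
    """Return everything before the second comma, or None if there is no comma."""
    # re.match anchors at the start of the string, so no ^ is needed.
    match = re.match(r"[^,]*,[^,]*", text)
    return match.group(0) if match else None

text = "Basic information, advanced information, super information, no information"
print(up_to_second_comma(text))  # Basic information, advanced information
```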
You could try ^(.*?,.*?),
The problem is that .* is greedy and matches maximum amount of characters. The ? behind * changes the behaviour to non-greedy.
You could also put parentheses around each .*? segment to capture the strings separately if you want.
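The difference between the greedy and the non-greedy version can be seen side by side. This is a small sketch assuming Python's `re` module, with an illustrative helper `first_group`:

```python
import re

TEXT = "Basic information, advanced information, super information, no information"

def first_group(pattern, text):
    """Return capture group 1 of the first match at the start, or None."""
    match = re.match(pattern, text)
    return match.group(1) if match else None

# Greedy: each .* grabs as much as it can, so the group runs up to the last comma.
greedy = first_group(r"(.*,.*),", TEXT)
# Non-greedy: each .*? stops at the first comma that lets the match succeed.
lazy = first_group(r"^(.*?,.*?),", TEXT)

print(greedy)
print(lazy)
```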
I would take a DRY approach, like this:
^([^,]*,){1}[^,]*
This way you can match everything up to the nth occurrence of a character without repeating yourself, except for the last pattern.
Although in the original poster's case the group and its repetition are unnecessary, I think this will help others who need to match the pattern more than 2 times.
Explanation:
^ From the start of the line
([^,]*,) A group matching any run of non-comma characters up to and including the next comma.
{1} Repeat the above group (the number of occurrences you need)-1 times. So if you need 2, put 1; if you need 20, put 19.
[^,]* Repeat the pattern one last time without the trailing comma.
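The "n minus 1" rule lends itself to a small helper. The sketch below assumes Python's `re` module; `up_to_nth` and its parameters are hypothetical names, and it uses a non-capturing group since the captured value is not needed:

```python
import re

def up_to_nth(text, separator, n):
    """Return everything before the n-th separator, or None if there are fewer than n-1."""
    if len(separator) != 1 or n < 1:
        raise ValueError("separator must be one character and n must be >= 1")
    sep = re.escape(separator)  # keeps characters like ^ or ] safe in the class
    # (n - 1) repetitions of "non-separators + separator", then one last run.
    pattern = rf"^(?:[^{sep}]*{sep}){{{n - 1}}}[^{sep}]*"
    match = re.match(pattern, text)
    return match.group(0) if match else None

text = "Basic information, advanced information, super information, no information"
print(up_to_nth(text, ",", 2))
print(up_to_nth(text, ",", 3))
```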
Try this approach:
(.*?,.*?),.*
Link to the solution