Regex to require space after comma in list

I want to require a space after every comma in a list. I've got this, which works pretty well for my lists of 5- to 7-digit numbers separated by commas:
^([^,]{5,7},)*[^,][^ ]{5,7}$
The problem is that it allows 12345,12345. I don't want that to pass; 12345, 12345 should pass. I also need a single 12345 to pass, so the comma and space are not required if there is just one 5-7 digit number.

Your regex does not match 12345,12345 because the part ([^,]{5,7},)* first matches from the start up to and including the comma.
Then [^,] (not a comma) matches the second 1, and [^ ]{5,7} (not a space) has to match at least 5 more characters, but only 4 are left (2345), so it cannot match.
When the repeated group is backtracked to zero iterations, only [^,][^ ]{5,7} is left, which matches 6-8 characters in total, while 12345,12345 is 11 characters long, so that fails as well.
You might use:
^[^,\s]{5,7}(?:, [^,\s]{5,7})*$
^ Start of the string
[^,\s]{5,7} Match any character that is not a comma or a whitespace character, 5-7 times
(?: Non capturing group
, [^,\s]{5,7} Match a comma and a space, then 5-7 characters that are not a comma or a whitespace character
)* Close non capturing group and repeat 0+ times
$ End of the string
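A minimal Python sketch of the pattern above, assuming Python's standard re module (the helper name is_valid_list is only for illustration). fullmatch is used so that a trailing newline is not quietly accepted by $.

```python
import re

# The answer's pattern: one item, then zero or more ", item" pairs.
LIST_PATTERN = re.compile(r"^[^,\s]{5,7}(?:, [^,\s]{5,7})*$")


def is_valid_list(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches LIST_PATTERN."""
    return LIST_PATTERN.fullmatch(text) is not None


for sample in ["12345", "12345, 12345", "12345,12345", "12345,  12345"]:
    print(repr(sample), is_valid_list(sample))
```

The first two samples pass; the missing space and the double space are both rejected.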

I didn't understand your regex, but something as simple as this should work:
^(?:\d{5,7}, )*\d{5,7}$
Or, if you don't intend to restrict the items to digits only:
^(?:[^, ]{5,7}, )*[^, ]{5,7}$
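To see how the two alternatives differ, here is a small Python check (the sample strings are made up for illustration). The first pattern only accepts digits; the second accepts any run of 5-7 characters that are neither a comma nor a space.

```python
import re

DIGITS_ONLY = re.compile(r"^(?:\d{5,7}, )*\d{5,7}$")
ANY_NON_SEPARATOR = re.compile(r"^(?:[^, ]{5,7}, )*[^, ]{5,7}$")

samples = ["12345", "12345, 1234567", "12345,12345", "AB123, 12345"]
for s in samples:
    # Print whether each pattern matches the whole string.
    print(repr(s), bool(DIGITS_ONLY.fullmatch(s)), bool(ANY_NON_SEPARATOR.fullmatch(s)))
```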

Related

Regex pattern reads correctly but doesn't produce desired result

I am testing the following regex:
(?<=\d{3}).+(?!',')
I tested this at regex101.
Test string:
187 SURNAME First Names 7 Every Street, Welltown Racing Driver
The sequence I require is:
Begin after 3 digit numeral
Read all characters
Don't read the comma
In other words:
SURNAME First Names 7 Every Street
But as the demo shows, the negative lookahead for the comma has no effect on the result. I can't see anything wrong with my lookarounds.
You could match the 3 digits and use a capture group to capture any characters except a comma.
\b\d{3}\b\s*([^,]+)
Explanation
\b\d{3}\b Match 3 digits between word boundaries to prevent partial word matches
\s* Match optional whitespace chars
([^,]+) Capture group 1, match 1+ chars other than a comma
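For illustration, here is how the capture group could be read in Python with re.search, using the question's test string; group 1 holds the text between the 3-digit number and the first comma.

```python
import re

line = "187 SURNAME First Names 7 Every Street, Welltown Racing Driver"

# \b\d{3}\b finds "187", \s* skips the space, group 1 stops before the first comma.
match = re.search(r"\b\d{3}\b\s*([^,]+)", line)
name_and_street = match.group(1) if match else None
print(name_and_street)
```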
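.+ consumes everything up to the end of the line,
so at that point the negative lookahead has nothing left to look at and is guaranteed to succeed.
Also, in most regex flavours quotes inside a pattern are matched literally, so (?!',') checks for the three characters ',' rather than for a bare comma. A bare comma seems more correct.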
Try:
(?<=\d{3})[^,]+
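A short Python comparison on the question's test string shows why the original pattern overshoots and what the lookbehind version returns. Python's re accepts (?<=\d{3}) because that lookbehind has a fixed width of 3 characters. Note that the result keeps the space that follows 187.

```python
import re

line = "187 SURNAME First Names 7 Every Street, Welltown Racing Driver"

# Original pattern: .+ runs to the end of the line, so (?!',') is only checked there.
original = re.search(r"(?<=\d{3}).+(?!',')", line).group()

# Suggested pattern: [^,]+ stops right before the first comma.
suggested = re.search(r"(?<=\d{3})[^,]+", line).group()

print(repr(original))
print(repr(suggested))
```

If the leading space is unwanted, calling suggested.strip() removes it.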

Regex to pull first two fields from a comma separated file

I want to pull the second string in a comma-delimited list where the first value is numeric and the second is alphabetic.
I'm using \d[^,]+(?=,) to pull the numeric value in the first field and just need help with pulling the second value from the "Name" column.
Here's part of a sample file that I'm trying to extract data from:
Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N),Supplier Master Exists(Y/N),Supplier Master Created,ACH Account Exists(Y/N),ACH Account Created,ACH Same as Auto-deposit(Y/N)
//line break here is for clarity and does not exist in file//
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
You could use a capturing group to get the second string: first match 1+ digits and a comma,
then capture 1+ chars a-zA-Z (optionally followed by more space-separated words) in a group, and match the trailing comma.
^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),
^ Start of string
\d+, Match 1+ digits and a comma (Or use (\d+), if the digits should also be a group)
( Capture group 1
[a-zA-Z]+ Match 1+ chars a-zA-Z
(?: [a-zA-Z]+)* Repeat matching the same as previous preceded by a space
), Close capturing group and match trailing comma
For a somewhat broader match, you could use this pattern, which requires at least a single char a-zA-Z but also allows spaces anywhere in the field:
\d+,([a-zA-Z ]*[a-zA-Z][a-zA-Z ]*),
Note that the \d[^,]+ part of your pattern does not match only digits: it matches 1 digit followed by 1+ of any char except a comma, so it would also match, for example, 4a$.
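A minimal Python sketch applying the first pattern line by line. The two data rows come from the sample file above; the header row is shortened for brevity and is skipped because it does not start with digits.

```python
import re

rows = [
    "Address Number,Name,Employee Master Exist(Y/N)",  # shortened header row
    "4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N",
    "10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N",
]

# Digits, a comma, then a name made of letter-only words separated by single spaces.
NAME_PATTERN = re.compile(r"^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),")

names = []
for row in rows:
    match = NAME_PATTERN.match(row)
    if match:  # the header row has no leading digits, so it never matches
        names.append(match.group(1))

print(names)
```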
You could try this regex:
^\d+,([^,]+),
This will look for lines:
starting with one or more digits
followed by a comma
capture anything that is not a comma
followed by a comma
If not all lines contain a name, then change the + to a *:
^\d+,([^,]*),
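The difference between + and * can be sketched in Python as follows. The second row, with an empty Name field, is a hypothetical example and does not come from the question's file.

```python
import re

rows = [
    "4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N",
    "10154,,Y,Y,Y,N,Y,N,N",  # hypothetical row with an empty Name field
]

STRICT = re.compile(r"^\d+,([^,]+),")   # Name must contain at least one character
LENIENT = re.compile(r"^\d+,([^,]*),")  # Name may also be empty

for row in rows:
    for label, pattern in (("strict", STRICT), ("lenient", LENIENT)):
        match = pattern.match(row)
        print(label, repr(match.group(1)) if match else "no match")
```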

How to match a list of fixed-length words separated by a space or comma?

Each word is either 2 or 6-10 characters long, and words can be separated by a space or a comma. The words only contain letters, and matching should not be case-sensitive.
Here is the group of words that should be matched:
RE,re,rereRE
Groups that should not be matched:
RE,rere,rel
RE,RERE
Here is the pattern that I have tried
((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|\s+)?)
But unfortunately this pattern can also match a string like this: RE,RERE
It looks like the word boundary has not been set.
You could match chars a-z either 6-10 times or 2 times using an alternation.
Then repeat that pattern 0+ times preceded by a comma or a space [ ,].
^(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*$
Explanation
^ Start of string
(?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match chars a-z 6-10 times or 2 times
(?: Non capturing group
[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match comma or space and repeat previous pattern
)* Close non capturing group and repeat 0+ times
$ End of string
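A minimal Python sketch that builds the same pattern from one word sub-pattern (the names WORD and WORD_LIST are just for illustration) and checks it against the question's three examples.

```python
import re

# One word: 6-10 letters or exactly 2 letters.
WORD = r"(?:[A-Za-z]{6,10}|[A-Za-z]{2})"
# A word, then zero or more words, each preceded by a single comma or space.
WORD_LIST = re.compile(rf"^{WORD}(?:[, ]{WORD})*$")

for s in ["RE,re,rereRE", "RE,rere,rel", "RE,RERE"]:
    print(s, bool(WORD_LIST.fullmatch(s)))
```

Only the first example matches; in the other two, a 4-letter or 3-letter word cannot be consumed as a 2-letter word followed by a separator.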
If lookarounds are supported, you could also assert that what is directly to the left and to the right is not a non-whitespace character \S:
(?<!\S)(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[ ,](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*(?!\S)
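Because the lookaround version is not anchored, it can also find lists inside a larger text. A Python sketch with re.findall, assuming the three examples sit on separate lines of one string: a newline counts as whitespace for (?<!\S) and (?!\S), but it is not accepted by [ ,], so the lines stay separate.

```python
import re

PATTERN = re.compile(
    r"(?<!\S)(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[ ,](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*(?!\S)"
)

text = "RE,re,rereRE\nRE,rere,rel\nRE,RERE"
# The pattern has no capturing groups, so findall returns the whole matches.
print(PATTERN.findall(text))
```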
([a-zA-Z]{2}(,|\s)|[a-zA-Z]{6,10}|(,|\s))
This one will get only the words that have 2 letters, or between 6 and 10:
\b,?([a-zA-Z]{6,10}|[a-zA-Z]{2}),?\b
You can use this (with the case-insensitive flag enabled, since the pattern only lists a-z):
^(?!.*\b[a-z]{4}\b)(?:(?:[a-z]{2}|[a-z]{6,10})(?:,|[ ]+)?)+$
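A Python check of the lookahead pattern ^(?!.*\b[a-z]{4}\b)... . Because it only lists a-z, it is compiled with re.IGNORECASE here, on the assumption that it is meant to be used case-insensitively (the question's examples are mixed-case). The last sample is a made-up 12-letter word: the repeated group can split it into 2-letter chunks, so even-length words longer than 10 letters are accepted as well.

```python
import re

# [a-z] only covers lower case, so case-insensitive matching is switched on.
PATTERN = re.compile(
    r"^(?!.*\b[a-z]{4}\b)(?:(?:[a-z]{2}|[a-z]{6,10})(?:,|[ ]+)?)+$",
    re.IGNORECASE,
)

for s in ["RE,re,rereRE", "RE,rere,rel", "RE,RERE", "abcdefghijkl"]:
    print(s, bool(PATTERN.match(s)))
```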
This regex will match your first case, but neither of your two other cases:
^((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|[ ]+|$))+$
I'm making the assumption here that each line should be a single match.

RegEx for N number of spaces in a string

I am looking to create groups that are separated by 4 spaces.
The problem is that if a group contains any space other than the 4-space separator, the regex I have tried so far does not match.
This is what I have tried.
Let's say I have these 2 lines, with 4 spaces between the words
word 1    word 2
word1    word2
and the regex is
^([^ {4}]*) {4}([^ {4}]*)$
This matches only the 2nd line. Any space anywhere other than the 4-space separator prevents the line from matching.
My expectation is to match and have the correct groups identified, in both these lines.
This regex might help you divide your input strings into five groups, where the second and fourth groups are the four-space separators:
([a-zA-Z0-9_ ]*)(\s{4})([a-zA-Z0-9_ ]*)(\s{4})([a-zA-Z0-9_ ]*)
If your columns never contain spaces, you could simplify it using this regex:
(\w+)(\s{4})(\w+)(\s{4})(\w+)
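A Python sketch of both patterns, assuming lines with three columns separated by exactly four spaces (the sample lines are made up). re.match anchors each match at the start of the line.

```python
import re

# Columns may contain single spaces; groups 2 and 4 capture the 4-space separators.
FIVE_GROUPS = re.compile(r"([a-zA-Z0-9_ ]*)(\s{4})([a-zA-Z0-9_ ]*)(\s{4})([a-zA-Z0-9_ ]*)")
# Simpler variant for columns that never contain spaces.
WORDS_ONLY = re.compile(r"(\w+)(\s{4})(\w+)(\s{4})(\w+)")

with_spaces = FIVE_GROUPS.match("word 1    word 2    word 3")
no_spaces = WORDS_ONLY.match("word1    word2    word3")

print(with_spaces.groups())
print(no_spaces.groups())
```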
After some experimentation, and based on the good suggestions here, I came up with this regex:
^(.*?)    (.*?)    (.*?)$
On the surface it does what I need. The last line has more 4-space blocks at the end, but that should not happen. Are there any pitfalls I am not seeing?
Instead of using a non-greedy dot-star .*? approach, you could specify the characters that you want to match.
If your data contains, for example, only words, you could match 1+ word chars \w+ followed by an optionally repeated group (?: \w+)* that matches a space and 1+ word chars, capture that as (\w+(?: \w+)*), and then match the 4-space separator.
Note that if you want to match more than word characters, you could use a character class and add the characters that you would allow to match.
^(\w+(?: \w+)*) {4}(\w+(?: \w+)*) {4}(\w+(?: \w+)*)$
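A Python sketch of the three-column pattern, with sample lines made up for illustration. The third line has five spaces in its first gap, so it is rejected instead of leaking a space into a group.

```python
import re

# Each column: words separated by single spaces; columns separated by exactly 4 spaces.
COLUMNS = re.compile(r"^(\w+(?: \w+)*) {4}(\w+(?: \w+)*) {4}(\w+(?: \w+)*)$")

lines = [
    "word 1    word 2    word 3",
    "word1    word2    word3",
    "word 1     word 2    word 3",  # 5 spaces in the first gap
]
for line in lines:
    match = COLUMNS.match(line)
    print(repr(line), match.groups() if match else None)
```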

Regular expression for numbers with spaces and dashes, limited to 8-13 digits

I am trying to write a regular expression to validate a number with between 9 and 13 digits, but the sequence can contain dashes and spaces, and ideally there should never be more than one space or dash in a row.
This rule lets me enforce the 9-13 limit:
/^[\d]{9,13}$/
Now, to add dashes and spaces:
/^[\d -]{9,13}$/
I think I need something like this, but I need to count only the digits:
/^[ -](?:\d){9,13}$/
Any tips?
Notice how my regex starts and ends with a digit. Also, this prevents consecutive spaces and dashes.
/^\d([ \-]?\d){7,12}$/
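The same pattern in Python, as a quick sanity check with illustrative sample strings. One leading digit plus 7-12 repetitions of (optional single separator, digit) gives 8-13 digits in total.

```python
import re

# A digit, then 7-12 groups of "optional space or dash, then a digit": 8-13 digits.
NUMBER = re.compile(r"^\d([ \-]?\d){7,12}$")

samples = ["12345678", "123-456 789", "123--456789", "-123456789", "1234567"]
for s in samples:
    print(repr(s), bool(NUMBER.fullmatch(s)))
```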
It appears that you don't want leading or trailing spaces and dashes. This should do it.
/^\d([- ]*\d){8,12}$/
Regular expression:
\d digits (0-9)
( group and capture to \1 (between 8 and 12 times)
[- ]* any character of: '-', ' ' (0 or more times)
\d digits (0-9)
){8,12} end of \1
Another option: a digit followed by any number of spaces or dashes, repeated 8-12 times, followed by a final digit.
/^(\d[- ]*){8,12}\d$/
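Both /^\d([- ]*\d){8,12}$/ and this pattern require 9-13 digits with no leading or trailing separator. A Python comparison (samples are illustrative) also shows that [- ]* accepts runs of separators such as --, which the question wanted to rule out.

```python
import re

DIGIT_FIRST = re.compile(r"^\d([- ]*\d){8,12}$")  # the /^\d([- ]*\d){8,12}$/ answer
DIGIT_LAST = re.compile(r"^(\d[- ]*){8,12}\d$")   # this answer

samples = ["123 456 789", "123--456789", "12345678", "12345678901234", "-123456789"]
for s in samples:
    print(repr(s), bool(DIGIT_FIRST.fullmatch(s)), bool(DIGIT_LAST.fullmatch(s)))
```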
Use look aheads to assert the various constraints:
/^(?!.*(  |--))(?=(\D*\d){9,13}\D*$)[\d -]+$/
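A Python sketch of the lookahead approach. It assumes the first lookahead is meant to reject two spaces or two dashes in a row (the question's rule), and spells the double space as " {2}" so it survives copy and paste; the sample strings are illustrative.

```python
import re

# (?!.*( {2}|--))        no two spaces and no two dashes in a row (assumed reading)
# (?=(\D*\d){9,13}\D*$)  between 9 and 13 digits in the whole string
# [\d -]+                only digits, spaces and dashes
NUMBER = re.compile(r"^(?!.*( {2}|--))(?=(\D*\d){9,13}\D*$)[\d -]+$")

samples = ["123 456 789", "123  456789", "123--456789", "12345678", "-123456789-"]
for s in samples:
    print(repr(s), bool(NUMBER.fullmatch(s)))
```

Unlike the digit-anchored patterns, this one also accepts a leading or trailing dash or space.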
Assuming a dash following a space or vice versa is ok:
^( -?|- ?)?(\d( -?|- ?)?){9,13}$
Explanation:
( -?|- ?) - this is equivalent to ( | -|-|- ). Note that there can't be 2 consecutive dashes or spaces here, and this can only appear at the start or directly after a digit, so this prevents 2 consecutive dashes or spaces in the string.
And there clearly must be exactly one digit in (\d( -?|- ?)?), thus the {9,13} enforces 9-13 digits.
Assuming a dash following a space or vice versa is NOT ok:
^[ -]?(\d[ -]?){9,13}$
Explanation similar to the above.
Both of the above allow the string to start or end with a digit, a dash or a space.
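A Python comparison of the two patterns above, with illustrative sample strings. The only difference shows up on "123 -456-789", where a space is directly followed by a dash.

```python
import re

# A space followed by a dash (or the reverse) is allowed between digits.
MIXED_OK = re.compile(r"^( -?|- ?)?(\d( -?|- ?)?){9,13}$")
# At most one space or dash between digits.
SINGLE_SEP = re.compile(r"^[ -]?(\d[ -]?){9,13}$")

samples = ["123456789", "123 -456-789", "123--456789", "-123456789 ", "12345678"]
for s in samples:
    print(repr(s), bool(MIXED_OK.fullmatch(s)), bool(SINGLE_SEP.fullmatch(s)))
```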