Regex for Comma delimited list - regex

What is the regular expression to validate a comma delimited list like this one:
12365, 45236, 458, 1, 99996332, ......

I suggest you to do in the following way:
(\d+)(,\s*\d+)*
which would work for a list containing 1 or more elements.

This regex extracts an element from a comma separated list, regardless of contents:
(.+?)(?:,|$)
If you just replace the comma with something else, it should work for any delimiter.

It depends a bit on your exact requirements. I'm assuming: all numbers, any length, numbers cannot have leading zeros nor contain commas or decimal points. individual numbers always separated by a comma then a space, and the last number does NOT have a comma and space after it. Any of these being wrong would simplify the solution.
([1-9][0-9]*,[ ])*[1-9][0-9]*
Here's how I built that mentally:
[0-9] any digit.
[1-9][0-9]* leading non-zero digit followed by any number of digits
[1-9][0-9]*, as above, followed by a comma
[1-9][0-9]*[ ] as above, followed by a space
([1-9][0-9]*[ ])* as above, repeated 0 or more times
([1-9][0-9]*[ ])*[1-9][0-9]* as above, with a final number that doesn't have a comma.

Match duplicate comma-delimited items:
(?<=,|^)([^,]*)(,\1)+(?=,|$)
Reference.
This regex can be used to split the values of a comma delimitted list. List elements may be quoted, unquoted or empty. Commas inside a pair of quotation marks are not matched.
,(?!(?<=(?:^|,)\s*"(?:[^"]|""|\\")*,)(?:[^"]|""|\\")*"\s*(?:,|$))
Reference.

/^\d+(?:, ?\d+)*$/

i used this for a list of items that had to be alphanumeric without underscores at the front of each item.
^(([0-9a-zA-Z][0-9a-zA-Z_]*)([,][0-9a-zA-Z][0-9a-zA-Z_]*)*)$

You might want to specify language just to be safe, but
(\d+, ?)+(\d+)?
ought to work

I had a slightly different requirement, to parse an encoded dictionary/hashtable with escaped commas, like this:
"1=This is something, 2=This is something,,with an escaped comma, 3=This is something else"
I think this is an elegant solution, with a trick that avoids a lot of regex complexity:
if (string.IsNullOrEmpty(encodedValues))
{
return null;
}
else
{
var retVal = new Dictionary<int, string>();
var reFields = new Regex(#"([0-9]+)\=(([A-Za-z0-9\s]|(,,))+),");
foreach (Match match in reFields.Matches(encodedValues + ","))
{
var id = match.Groups[1].Value;
var value = match.Groups[2].Value;
retVal[int.Parse(id)] = value.Replace(",,", ",");
}
return retVal;
}
I think it can be adapted to the original question with an expression like #"([0-9]+),\s?" and parse on Groups[0].
I hope it's helpful to somebody and thanks for the tips on getting it close to there, especially Asaph!

In JavaScript, use split to help out, and catch any negative digits as well:
'-1,2,-3'.match(/(-?\d+)(,\s*-?\d+)*/)[0].split(',');
// ["-1", "2", "-3"]
// may need trimming if digits are space-separated

The following will match any comma delimited word/digit/space combination
(((.)*,)*)(.)*

Why don't you work with groups:
^(\d+(, )?)+$

If you had a more complicated regex, i.e: for valid urls rather than just numbers. You could do the following where you loop through each element and test each of them individually against your regex:
const validRelativeUrlRegex = /^(^$|(?!.*(\W\W))\/[a-zA-Z0-9\/-]+[^\W_]$)/;
const relativeUrls = "/url1,/url-2,url3";
const startsWithComma = relativeUrls.startsWith(",");
const endsWithComma = relativeUrls.endsWith(",");
const areAllURLsValid = relativeUrls
.split(",")
.every(url => validRelativeUrlRegex.test(url));
const isValid = areAllURLsValid && !endsWithComma && !startsWithComma

Related

How to get items into array from string with comma separated values in type script and any item has comma it will be in double quotes

I've been struggling to get all items of below string into an array.
abc,"de,f",hi,"hello","te,st&" items into an array in Typescript.
If any string has comma (,) or ampersand (&) in it,It will be placed in double quotes.
Tried split function but it fails as my strings can have comma as well.
Any help in this regard is highly appreciated.
Thank you.
If you are looking to use a regular expression matching, can you try a different regEx that would match strings inside quotes first, then strings outside quotes, something like (\".+?\")|(^[^\"]+,)|(,[^\"]+,)
I don't know how relevant it would be in case of TypeScript, but I am guessing you'd be able to work something out that takes this Pattern and gives you the matches one by one
First of all, I think that you are making the things more complicated than what they are by implementing the following logic:
has comma (,) or ampersand (&) in it,It will be placed in double quotes.
Instead of doing this that way, you should systematically put your elements inside double quote:
abc,"de,f",hi,"hello","te,st&"
→
"abc","de,f","hi","hello","te,st&"
you will have then the following string to parse.
A regex like this one will do the job:
(?<=,")([^"]*)(?=",)|(?<=")([^"]*)(?=",)|(?<=")([^"]*)(?="$)
using back references $1$2$3, you can extract your elements.
RegEx /(?:^|,)(\"(?:[^\"])\"|[^,])/ has helped me get the required values.
var test = '"abc,123",test,123,456,"def:get"';
test.split(/(\"(?:[^\"])\"|[^,])/);
Its returning the below array.
["", ""abc,123"", ",", "test", ",", "123", ",", "456", ",", ""def:get"", ""]
And when a particular values in side double quotes,I just trimmed them to get the actual values and have ignore empty items of array..
use the split a string .....
let fullName = "First,Last"
let fullNameArr = fullName.characters.split{$0 == ","}.map(String.init)
fullNameArr[0] // First
fullNameArr[1] // Last

Match return substring between two substrings using regexp

I have a list of records that are character vectors. Here's an example:
'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'
From these names I would like to extract whatever's between the two substrings 1mil_ and _ks_drivers_sorted.csv.
So in this case the output would be:
0,1_1_1_lb200
0_1_lb100
1_1_lb2_100_100
1_1_lb100
I'm using MATLAB so I thought to use regexp to do this, but I can't understand what kind of regular expression would be correct.
Or are there some other ways to do this without using regexp?
Let the data be:
x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'};
You can use lookbehind and lookahead to find the two limiting substrings, and match everything in between:
result = cellfun(#(c) regexp(c, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match'), x);
Or, since the regular expression only produces one match, the following simpler alternative can be used (thanks #excaza for noticing):
result = regexp(x, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match', 'once');
In your example, either of the above gives
result =
4×1 cell array
'0,1_1_1_lb200'
'0_1_lb100'
'1_1_lb2_100_100'
'1_1_lb100'
For me the easy way to do this is just use espace or nothing to replace what you don't need in your string, and the rest is what you need.
If is a list, you can use a loop to do this.
Exemple to replace "1mil_" with "" and "_ks_drivers_sorted.csv" with ""
newChr = strrep(chr,'1mil_','')
newChr = strrep(chr,'_ks_drivers_sorted.csv','')

Matlab Extracting sub string from cell array

I have a '3 x 1' cell array the contents of which appear like the following:
'ASDF_LE_NEWYORK Fixedafdfgd_ML'
'Majo_LE_WASHINGTON FixedMonuts_ML'
'Array_LE_dfgrt_fdhyuj_BERLIN Potato Price'
I want to be able to elegantly extract and create another '3x1' cell array with contents as:
'NEWYORK'
'WASHINGTON'
'BERLIN'
If you notice in above the NAME's are after the last underscore and before the first SPACE or '_ML'. How do I write such code in a concise manner.
Thanks
Edit:
Sorry guys I should have used a better example. I have it corrected now.
You can use lookbehind for _ and lookahead for space:
names = regexp(A, '(?<=_)[^\s_]*(?=\s)', 'match', 'once');
Where A is the cell array containing the strings:
A = {...
'ASDF_LE_NEWYORK Fixedafdfgd_ML'
'Majo_LE_WASHINGTON FixedMonuts_ML'
'Array_LE_dfgrt_fdhyuj_BERLIN Potato Price'};
>> names = regexp(A, '(?<=_)[^\s_]*(?=\s)', 'match', 'once')
names =
'NEWYORK'
'WASHINGTON'
'BERLIN'
NOTE: The question was changed, so the answer is no longer complete, but hopefully the regexp example is still useful.
Try regexp like this:
names = regexp(fullNamesCell,'_(NAME\d?)\s','tokens');
names = cellfun(#(x)(x{1}),names)
In the pattern _(NAME\d?)\s, the parenthesis define a subexpression, which will be returned as a token (a portion of matched text). The \d? specifies zero or one digits, but you could use \d{1} for exactly one digit or \d{1,3} if you expect between 1 and 3 digits. The \s specified whitespace.
The reorganization of names is a little convoluted, but when you use regexp with a cell input and tokens you get a cell of cells that needs some reformatting for your purposes.

Struggling with regex logic: how do I remove a param from a url query string?

I'm comparing 2 URL query strings to see if they're equal; however, I want to ignore a specific query parameter (always with a numeric value) if it exists. So, these 2 query strings should be equal:
firstName=bobby&lastName=tables&paramToIgnore=2
firstName=bobby&lastName=tables&paramToIgnore=5
So, I tried to use a regex replace using the REReplaceNoCase function:
REReplaceNoCase(myQueryString, "&paramToIgnore=[0-9]*", "")
This works fine for the above example. I apply the replace to both strings and then compare. The problem is that I can't be sure that the param will be the last one in the string... the following 2 query strings should also be equal:
firstName=bobby&lastName=tables&paramToIgnore=2
paramToIgnore=5&firstName=bobby&lastName=tables
So, I changed the regex to make the preceding ampersand optional... "&?paramToIgnore=[0-9]*". But - these strings will still not be equal as I'll be left with an extra ampersand in one of the strings but not the other:
firstName=bobby&lastName=tables
&firstName=bobby&lastName=tables
Similarly, I can't just remove preceding and following ampersands ("&?paramToIgnore=[0-9]*&?") as if the query param is in the middle of the string I'll strip one ampersand too many in one string and not the other - e.g.
firstName=bobby&lastName=tables&paramToIgnore=2
firstName=bobby&paramToIgnore=5&lastName=tables
will become
firstName=bobby&lastName=tables
firstName=bobbylastName=tables
I can't seem to get my head around the logic of this... Can anyone help me out with a solution?
If you can't be sure of the order the parameters appear i would recommend, that you don't compare them by the string itsself.
I recommend splitting the string up like this:
String stringA = "firstName=bobby&lastName=tables&paramToIgnore=2";
String stringB = "firstName=bobby&lastName=tables&paramToIgnore=5";
String[] partsA = stringA.split("&");
String[] partsB = stringB.split("&");
Then go through arrays and make the paramToIgnore somehow euqal:
for(int i = 0; i < partsA.length; i++)
{
if(partsA[i].startsWith("paramToIgnore"){
partsA[i] = "IgnoreMePlease";
}
}
for(int j = 0; j < partsB.length; j++)
{
if(partsB[i].startsWith("paramToIgnore"){
partsB[i] = "IgnoreMePlease";
}
}
Then you can sort and compare the arrays to see if they are equal:
Arrays.sort(partsA);
Arrays.sort(partsB);
boolean b = Arrays.equals(partsA, partsB);
I'm pretty sure it's possible to make this more compact and give it a better performance. But with comparing strings like you do, you somehow alsways have to care about the order of your parameters.
You can use the QueryStringDeleteVar UDF on cflib to remove the query string variables you want to ignore from both strings, then compare them.
Make it in two steps:
first remove your param, as you described in example
then remove ampersand which is left at the begining or the end of query with separate regex, or any double/triple/... ampersands in the middle of the query
How about having an 'or' in the RegEx to match an ampersand at the start or the end?
&paramToIgnore=[0-9]*|paramToIgnore=[0-9]*&
Seems to do the job when testing in regexpal.com
try changing it to:
REReplaceNoCase(myQueryString, "&?paramToIgnore=[0-9]+", "")
plus instead of star should capture 1 or more of the preceding matched characters. It won't match anything but 0-9 so if there is another parameter after that it'll stop when it can't match any more digits.
Alternatively, you could use:
REReplaceNoCase(myQueryString, "&?paramToIgnore=[^&]", "")
This will match anything but an ampersand. It will cover the case if the parameter exists but there is no value; which is probably something you'd want to account for.

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that does not only contain numbers? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the four first, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
This says lookahead for lines that do not contain all digits and match the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
#Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non letters (eg, punctuation) even though Svish's examples didn't include one. Svish's primary requirement was: not all be digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (invoking reference to alphanum and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
May be a digit at the beginning, then at least one letter, then letters or digits
Try this:
/^.*\D+.*$/
It returns true if there is any simbol, that is not a number. Works fine with all languages.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
if you are trying to match worlds that have at least one letter but they are formed by numbers and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we want to restrict valid characters so that string can be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
/\w+/:
Matches any letter, number or underscore. any word character
.*[^0-9]{1,}.*
Works fine for us.
We want to use the used answer, but it's not working within YANG model.
And the one I provided here is easy to understand and it's clear:
start and end could be any chars, but, but there must be at least one NON NUMERICAL characters, which is greatest.
I am using /^[0-9]*$/gm in my JavaScript code to see if string is only numbers. If yes then it should fail otherwise it will return the string.
Below is working code snippet with test cases:
function isValidURL(string) {
var res = string.match(/^[0-9]*$/gm);
if (res == null)
return string;
else
return "fail";
};
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123"; // fail here
console.log(isValidURL(testCase5));
I had to do something similar in MySQL and the following whilst over simplified seems to have worked for me:
where fieldname regexp ^[a-zA-Z0-9]+$
and fieldname NOT REGEXP ^[0-9]+$
This shows all fields that are alphabetical and alphanumeric but any fields that are just numeric are hidden. This seems to work.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed