Can anyone help me find a suitable regular expression to validate a string of comma-separated numbers, e.g. '1,2,3' or '111,234234,-09'? Anything else should be considered invalid, e.g. '121as23' or '123-123'.
I suppose this must be possible in Flex using a regular expression, but I cannot find the correct one.
@Justin, I tried your suggestion /(?=^)(?:[,^]([-+]?(?:\d*\.)?\d+))*$/ but I am facing two issues:
It rejects '123,12', which should be valid.
It accepts '123,123,aasd', which should be invalid.
I tried another regex - [0-9]+(,[0-9]+)* - which works quite well except for one issue: it validates '12,12asd'. I need something that will only allow numbers separated by commas.
Your example data consists of three decimal integers, each with an optional leading plus or minus sign, separated by commas with no whitespace. Assuming this describes your requirements, the JavaScript/ActionScript/Flex regex is simple:
var re_valid = /^[-+]?\d+(?:,[-+]?\d+){2}$/;
if (re_valid.test(data_string)) {
// data_string is valid
} else {
// data_string is NOT valid
}
However, if your data can contain any number of integers and may include whitespace, the regex becomes a bit longer:
var re_valid = /^[ \t]*[-+]?\d+[ \t]*(,[ \t]*[-+]?\d+[ \t]*)*$/;
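For readers who want to try the longer pattern outside Flex, here is a minimal sketch in Python. The pattern is copied from the line above; the function name `is_valid_list` is made up for illustration, and `re.ASCII` is an added assumption so that `\d` only means 0-9.

```python
import re

# Pattern taken verbatim from the answer above.
# re.ASCII restricts \d to the digits 0-9 (an assumption about what is wanted).
RE_VALID = re.compile(
    r"^[ \t]*[-+]?\d+[ \t]*(,[ \t]*[-+]?\d+[ \t]*)*$", re.ASCII
)


def is_valid_list(data_string: str) -> bool:
    """Return True if data_string is a comma-separated list of integers."""
    # fullmatch (rather than search) also rejects a trailing newline,
    # which a bare `$` would tolerate in Python.
    return RE_VALID.fullmatch(data_string) is not None


if __name__ == "__main__":
    for sample in ["1,2,3", "111,234234,-09", "121as23", "123-123", "12,12asd"]:
        print(repr(sample), is_valid_list(sample))
```

The anchors are what reject '12,12asd': without `^` and `$` the engine is free to match only the leading '12,12' and ignore the rest.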
If your data can be even more complex (e.g. the numbers may be floating point, or the values may be enclosed in quotes), then you may be better off parsing the string as a CSV record and checking each value individually.
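The parse-then-check idea could look roughly like this in Python, using the standard `csv` module. The helper name `parse_numbers` and the exact number syntax accepted (optional sign, optional fractional part) are assumptions chosen for illustration, not requirements stated in the question.

```python
import csv
import re

# Hypothetical number syntax: optional sign, then '123', '123.456' or '.456'.
NUMBER = re.compile(r"[-+]?(?:\d*\.)?\d+", re.ASCII)


def parse_numbers(record: str):
    """Parse one CSV record; return a list of floats, or None if any field is bad."""
    # csv.reader handles quoted fields; skipinitialspace drops blanks after commas.
    fields = next(csv.reader([record], skipinitialspace=True), [])
    if not fields:
        return None  # an empty record is treated as invalid here
    values = []
    for field in fields:
        text = field.strip()
        if NUMBER.fullmatch(text) is None:
            return None
        values.append(float(text))
    return values


if __name__ == "__main__":
    print(parse_numbers('"1.5", .25,-09'))
    print(parse_numbers("12,12asd"))
```

Checking each field separately keeps the per-value rule small and makes it easy to report which field failed, at the cost of a few more lines than a single regex.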
Looks like what you want is this:
/(?!,)(?:(?:,|^)([-+]?(?:\d*\.)?\d+))*$/
I don't know Flex, so replace the / at the beginning and end with whatever's appropriate in Flex regex syntax. Your numbers will be in match set 1. Get rid of the (?:\d*\.)? if you only want to allow integers.
Explanation:
(?!,) #Don't allow a comma at the beginning of the string.
(?:,|^) #Your groups are going to be preceded by ',' unless they're the very first group in the string. The '(?:blah)' means we don't want to include the ',' in our match groups.
[-+]? #Allow an optional plus or minus sign.
(?:\d*\.)?\d+ #The meat of the pattern, this matches '123', '123.456', or '.456'.
* #Means we're matching zero or more groups. Change this to '+' if you don't want to match empty strings.
$ #Don't stop matching until you reach the end of the string.
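One caveat worth checking before relying on this pattern: it has no anchor at the start, so an unanchored search can succeed with an empty match at the very end of any string, and a repeated capture group only keeps its last repetition. The sketch below, in Python, shows that behaviour and one possible anchored variant; `ANCHORED` and `numbers_or_none` are names invented here, not part of the answer.

```python
import re

# The pattern from the answer, unchanged.
ORIGINAL = re.compile(r"(?!,)(?:(?:,|^)([-+]?(?:\d*\.)?\d+))*$")

# A possible anchored variant (an assumption, not the answerer's pattern):
# one number, then zero or more ",number" repetitions, matched against the
# whole string via fullmatch.
ANCHORED = re.compile(r"[-+]?(?:\d*\.)?\d+(?:,[-+]?(?:\d*\.)?\d+)*", re.ASCII)


def numbers_or_none(text: str):
    """Return the individual number strings, or None if text is not a valid list."""
    if ANCHORED.fullmatch(text) is None:
        return None
    # After validation a plain split recovers every number, not just the last.
    return text.split(",")


if __name__ == "__main__":
    # The original pattern "matches" an invalid string with an empty match.
    print(repr(ORIGINAL.search("123,123,aasd").group(0)))
    print(numbers_or_none("123,12"))
    print(numbers_or_none("123,123,aasd"))
```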
I have a string like this:
022/03/17 05:53:40.376949 1245680 029 DSA- DREP COLS log debug S 1
I need to extract the number 1245680 using a regular expression.
I tried \d+, but it returns many matches.
First: are you sure that you need a regex? Wouldn't a plain string cut be simpler?
Cut off the fixed-length prefix (29 characters), then search for the next space in the rest of the string and drop everything after it.
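A sketch of the cut approach in Python follows. The prefix length is passed in as a parameter because the sample line as shown in the question may have lost some whitespace; with the line exactly as pasted, the number starts at offset 26 rather than 29. The function names are hypothetical, and `third_field` is a variation that assumes the number is always the third whitespace-separated field.

```python
LINE = "022/03/17 05:53:40.376949 1245680 029 DSA- DREP COLS log debug S 1"


def cut_after_prefix(line: str, prefix_len: int) -> str:
    """Drop a fixed-length prefix, then keep everything up to the next space."""
    rest = line[prefix_len:].lstrip()
    return rest.split(" ", 1)[0]


def third_field(line: str):
    """Variation: take the third whitespace-separated field, if there is one."""
    fields = line.split()
    return fields[2] if len(fields) > 2 else None


if __name__ == "__main__":
    print(cut_after_prefix(LINE, 26))
    print(third_field(LINE))
```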
If you have to use regex for some other reason (e.g. you don't have the ability to implement a routine where you need it), you can use a regex with a group to extract just the number you want: ^.{29}(\d+).*$
Here you have to use group(1), or whatever your language offers for referencing a capture group, to get the value you want.
Because the rest of the line can also contain numbers (and, I suppose, a variable number of characters, if this is a log entry), my simple attempts at combining lookbehind and lookahead failed: they also found the other numbers in the line.
If 022/03/17 05:53:40.376949 is always in that format, you can use:
\d{2}:\d{2}:\d{2}\.\d{1,6}\s*(\d*)\s*
or more generally:
\d*\/\d*\/\d*\s+.*?\s+(\d*)
These will match the date/time segment, whitespace, the sequence of (captured) digits you desire, and then more whitespace.
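To see the capture group in action, here is a small Python sketch built on the first pattern. It is slightly tightened (escaped dot, `\s+` and `\d+` so an empty capture is impossible); those tweaks and the name `number_after_timestamp` are assumptions made for this example.

```python
import re

LINE = "022/03/17 05:53:40.376949 1245680 029 DSA- DREP COLS log debug S 1"

# Time of day with fractional seconds, whitespace, then the digits to capture.
AFTER_TIME = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{1,6}\s+(\d+)")


def number_after_timestamp(line: str):
    """Return the first run of digits following the timestamp, or None."""
    match = AFTER_TIME.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(number_after_timestamp(LINE))
```

Only group 1 is read, so the other numbers later in the line never enter the result.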
I'm trying to use prxmatch to verify if postcode format (UK) is correct. The ('/^[A-Z]{1,2}\d{2,3}[A-Z]{2}|[A-Z]{1,2}\d[A-Z]\d[A-Z]{2}$/') bit covers (I think) all the possible post code formats used in UK, however I only want exact and not partial matches and no additional chars before or after match.
data pc_flag ; set abc ;
format pc_correct_flag $1. compressed_postcode $100.;
compressed_postcode = compress(postcode);
pc_regex = prxparse('/^[A-Z]{1,2}\d{2,3}[A-Z]{2}|[A-Z]{1,2}\d[A-Z]\d[A-Z]{2}$/');
if prxmatch(pc_regex,compressed_postcode)>0
then pc_correct_flag='Y';
else pc_correct_flag='N';run;
I was expecting 'Y' only on exact matches on full string, i.e. with no additional characters before and after regex. However, I'm also getting false positives, where a part of 'compressed_postcode' matches regex, but there are additional characters after the match, which I thought using $ would prevent.
I.e. I'd expect only something like AA11AA to match, but not AA11AAAA. I suspect this has to do with $ positioning but can't figure out exactly what's wrong. Any idea what I've missed?
SAS character variables contain trailing spaces out to the length of the variable. Either trim the value to be examined, or add \s*$ as the pattern termination.
if prxmatch(pc_regex,TRIM(compressed_postcode))>0 then …
Your regex is quite permissive - it allows every letter of the alphabet in every valid character position, so it matches quite a lot of strings that look like valid postcodes but do not exist as such, e.g. ZZ1 1ZZ.
I provided a more specific SAS-compatible postcode regex as an answer to another question. Here's the link in case it proves useful to you:
https://stackoverflow.com/a/43793562/667489
That one still matches some non-postcode strings, but it filters out any with characters on Royal Mail's blacklists for each position within the postcode.
As per Richard's answer, you need to trim the string being matched before applying the regex, or amend the regex to match extra trailing blanks.
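There is a second thing worth checking in the original pattern besides trailing blanks: alternation has the lowest precedence, so in `^A|B$` the `^` belongs only to the first alternative and the `$` only to the second. The sketch below uses Python's `re` rather than SAS, on the assumption that these basic constructs behave the same way in the Perl-style syntax that the PRX functions use; the names `UNGROUPED`, `GROUPED` and `is_postcode_shape` are invented for the example.

```python
import re

# The pattern from the question, unchanged: ^ applies to the first
# alternative only and $ to the second only.
UNGROUPED = re.compile(r"^[A-Z]{1,2}\d{2,3}[A-Z]{2}|[A-Z]{1,2}\d[A-Z]\d[A-Z]{2}$")

# Grouping the alternatives makes both anchors apply to each of them;
# \s*$ tolerates the trailing blanks of a padded character variable.
GROUPED = re.compile(
    r"^(?:[A-Z]{1,2}\d{2,3}[A-Z]{2}|[A-Z]{1,2}\d[A-Z]\d[A-Z]{2})\s*$"
)


def is_postcode_shape(value: str) -> bool:
    """True if the whole value (ignoring trailing blanks) has a postcode shape."""
    return GROUPED.search(value) is not None


if __name__ == "__main__":
    print(UNGROUPED.search("AA11AAAA") is not None)  # false positive
    print(is_postcode_shape("AA11AAAA"))
    print(is_postcode_shape("AA11AA".ljust(100)))    # padded like a $100. variable
```

In SAS terms this would correspond to wrapping the two alternatives in (?: ... ) inside the prxparse string, in addition to trimming or allowing \s* before the $.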
I want to remove all numbers from a paragraph except from some words.
My attempt is using a negative look-ahead:
gsub('(?!ami.12.0|allo.12)[[:digit:]]+','',
c('0.12','1245','ami.12.0 00','allo.12 1'),perl=TRUE)
But this doesn't work. I get this:
"." "" "ami.. " "allo."
But my expected output is:
"." "" 'ami.12.0','allo.12'
You can't really use a negative lookahead here, since it will still replace when the cursor is at some point after ami.
What you can do is put back some matches:
(ami.12.0|allo.12)|[[:digit:]]+
gsub('(ami.12.0|allo.12)|[[:digit:]]+',"\\1",
c('0.12','1245','ami.12.0 00','allo.12 1'),perl=TRUE)
I kept the . since I'm not 100% sure what you have, but keep in mind that . is a wildcard and will match any character (except newlines) unless you escape it.
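The same put-back idea, sketched in Python for comparison. The dots are escaped here, which assumes the protected words really contain literal dots; `strip_digits` is a made-up name. A replacement function is used instead of a backreference so that an unmatched group simply becomes an empty string.

```python
import re

# Either one of the protected words (captured), or any run of digits.
KEEP_OR_DIGITS = re.compile(r"(ami\.12\.0|allo\.12)|[0-9]+")


def strip_digits(text: str) -> str:
    """Remove digit runs, but put protected words back unchanged."""
    # group(1) is None when the digit branch matched, so fall back to "".
    return KEEP_OR_DIGITS.sub(lambda m: m.group(1) or "", text)


if __name__ == "__main__":
    for sample in ["0.12", "1245", "ami.12.0 00", "allo.12 1"]:
        print(repr(strip_digits(sample)))
```

Because the protected words are tried first at each position, their digits are consumed as part of the kept match and never reach the digit branch.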
Your regex is actually finding every digit sequence that is not the start of "ami.12.0" or "allo.12". For example, in your third string, it gets to the 12 in ami.12.0 and looks ahead to see whether that 12 is the start of either of the two ignored strings. It is not, so it goes ahead and replaces it. It would be best to generalize this, but in your specific case you can probably achieve it with a negative lookbehind for any prefixes of the words (that can be followed by digit sequences) that you want to skip. You also need a second lookbehind that stops a match from starting in the middle of a digit run; otherwise the engine rejects the 1 of 12 and then removes the 2 on its own. So, you would use something like this:
gsub('(?<!ami\\.|ami\\.12\\.|allo\\.)(?<![[:digit:]])[[:digit:]]+','',
c('0.12','1245','ami.12.0 00','allo.12 1'),perl=TRUE)
I want to split a comma-separated list of email addresses, and I also want to get the user-friendly name from each address if there is one.
At the moment I use this regular expression:
(?<value>(?<normalized>.*?)\[.*?\])\s*,*\s*
This regular expression works for the input string
"Eline[Elinek#yahoo.com],raymond[raymondc#yahoo.com]"
It returns two pairs:
value 'Eline[Elinek#yahoo.com]' normalized 'Eline'
value 'raymond[raymondc#yahoo.com]' normalized 'raymond'
but it doesn't work for input string
"Eline[Elinek#yahoo.com],piet#yahoo.com,raymond[raymondc#yahoo.com]"
It should return 3 email addresses with normalized empty in the second case.
Why should your second example return 3 matches? The second email has no [...], which your pattern requires, so that address is swallowed by the (?<normalized>.*?) of the match for the third email address.
Try this here instead:
(?<value>(?<normalized>[^,]*?)\[.*?\]|[^,\[\]]*)\s*,?\s*
But this is getting unreadable. Why not split on commas first and then work on the resulting array?
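A rough Python sketch of the split-first approach, using the input from the question as-is. The function name `split_addresses` and the choice to skip empty items are assumptions; the per-item pattern only has to recognise the `name[address]` shape.

```python
import re

# One item of the form name[address]; neither part may contain brackets.
BRACKETED = re.compile(r"([^\[\]]*)\[([^\[\]]*)\]")


def split_addresses(text: str):
    """Return (value, normalized) pairs; normalized is '' when no name is given."""
    results = []
    for part in text.split(","):
        value = part.strip()
        if not value:
            continue  # ignore empty items such as a trailing comma
        match = BRACKETED.fullmatch(value)
        normalized = match.group(1).strip() if match else ""
        results.append((value, normalized))
    return results


if __name__ == "__main__":
    sample = "Eline[Elinek#yahoo.com],piet#yahoo.com,raymond[raymondc#yahoo.com]"
    for pair in split_addresses(sample):
        print(pair)
```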
You can try this pattern:
(?<value>(?<normalized>[^\[,]*?)\[?[^,]*\]?)
It seems that your pattern is not intended to match the whole input string and that you intend to iterate through the individual matches, so there's no need to add the patterns for commas at the end.
The normalized group matches characters as long as they are neither [ nor a comma. The value group makes [ and ] optional, and matches any characters in between as long as they are not a comma.
I have the need to check whether strings adhere to a particular ID format.
The format of the ID is as follows:
aBcDe-fghIj-KLmno-pQRsT-uVWxy
A sequence of five blocks of five letters upper case or lower case, separated by one dash.
I have the following regular expression that works:
string idFormat = "[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}";
Note that there is no trailing dash, but all of the blocks within the ID follow the same format. Therefore, I would like to represent the sequence of four blocks with a trailing dash as one repeated unit inside the regular expression and avoid the duplication.
I tried the following, but it doesn't work:
string idFormat = "[[a-zA-Z]{5}[-]{1}]{4}[a-zA-Z]{5}";
How do I shorten this regular expression and get rid of the duplicated parts?
What is the best way to ensure that each block does also not contain any numbers?
Edit:
Thanks for the replies, I now understand the grouping in regular expressions.
I'm running a few tests against the regular expression, the following are relevant:
Test 1: aBcDe-fghIj-KLmno-pQRsT-uVWxy
Test 2: abcde-fghij-klmno-pqrst-uvwxy
With the following regular expression, both tests pass:
^([a-zA-Z]{5}-){4}[a-zA-Z]{5}$
With the next regular expression, test 1 fails:
^([a-z]{5}-){4}[a-z]{5}$
Several answers have said that it is OK to omit the A-Z when using a-z, but in this case it doesn't seem to be working.
You can try:
([a-z]{5}-){4}[a-z]{5}
and make it case insensitive.
If you can set regex options to be case insensitive, you could replace all [a-zA-Z] with just plain [a-z]. Furthermore, [-]{1} can be written as -.
Your grouping should be done with ( and ), not with [ and ] (although you're correctly using the latter to specify character sets).
Depending on context, you probably want to throw in ^...$ which matches start and end of string, respectively, to verify that the entire string is a match (i.e. that there are no extra characters).
In JavaScript, something like this:
/^([a-z]{5}-){4}[a-z]{5}$/i
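The role of the case-insensitive flag can be checked with a short Python sketch. The names are hypothetical; `re.ASCII` is added as an assumption so that `[a-z]` with `re.IGNORECASE` cannot pick up non-ASCII letters.

```python
import re

# [a-z] plus the case-insensitive flag accepts upper and lower case.
ID_FORMAT = re.compile(r"([a-z]{5}-){4}[a-z]{5}", re.IGNORECASE | re.ASCII)

# The same pattern without the flag only accepts lower case.
ID_LOWER_ONLY = re.compile(r"([a-z]{5}-){4}[a-z]{5}")


def is_valid_id(candidate: str) -> bool:
    """True if candidate is exactly five dash-separated blocks of five letters."""
    return ID_FORMAT.fullmatch(candidate) is not None


if __name__ == "__main__":
    print(is_valid_id("aBcDe-fghIj-KLmno-pQRsT-uVWxy"))
    print(is_valid_id("abcde-fghij-klmno-pqrst-uvwxy"))
    print(ID_LOWER_ONLY.fullmatch("aBcDe-fghIj-KLmno-pQRsT-uVWxy") is not None)
```

This matches what the edit in the question observed: dropping A-Z is only safe when the case-insensitive option is actually switched on. Since the class contains letters only, digits are rejected without any extra work.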
This works for me, though you might want to check it:
[a-zA-Z]{5}(-[a-zA-Z]{5}){4}
(One group of five letters, followed by [dash+group of five letters] four times)
([a-zA-Z]{5}[-]{1}){4}[a-zA-Z]{5}
Try
string idFormat = "([a-zA-Z]{5}[-]{1}){4}[a-zA-Z]{5}";
I.e. you basically replace your brackets by parentheses. Brackets are not meant for grouping but for defining a class of accepted characters.
However, be aware that with shortened versions, you can use the expression for validating the string, but not for analyzing it. If you want to process the 5 groups of characters, you will want to put them in 5 groups:
string idFormat =
"([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})";
so you can address each group and process it.
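As an illustration of addressing each block, here is a Python sketch of the five-group pattern. The pattern is assembled with `join` purely to avoid retyping it; `split_id` is a made-up helper name.

```python
import re

# Five capturing groups of five letters each, joined by single dashes.
BLOCK = r"([a-zA-Z]{5})"
ID_GROUPS = re.compile("-".join([BLOCK] * 5))


def split_id(candidate: str):
    """Return the five blocks as a tuple, or None if the ID is malformed."""
    match = ID_GROUPS.fullmatch(candidate)
    return match.groups() if match else None


if __name__ == "__main__":
    print(split_id("aBcDe-fghIj-KLmno-pQRsT-uVWxy"))
    print(split_id("aBcDe-fghIj"))
```

An alternative with the same result is to validate with the short repeated pattern and then split the string on dashes.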