Regex matching Numbers with ,-N and Numbers - regex

I am trying to match strings in the pattern,
Numbers
, or - or N
Numbers
([0-9]+[,-N])+[0-9]+
Should match,
87-7-6
86-6-2,3
4-N-0
87-7-6
86-14-2,3
4-N-0
Is not matching,
4-N-0
Any help?

You need to escape the dash in the set, otherwise it will match all characters from comma to N.
([0-9]+[,\-N])+[0-9]+
It doesn't match 4-N-0 because it doesn't fall into what you describe that it should match. If you want it to match multiple separators, add a + after that set:
([0-9]+[,\-N]+)+[0-9]+
Or perhaps you want to use the exact sequence -N- as one of the separators, so that it won't match for example 4NNNNNNNN0 or 4-,-,-,-,-,0:
([0-9]+([,\-]|-N-))+[0-9]+
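To compare the three patterns above, here is a minimal sketch in Python (the question does not name a language, so using Python's `re` module is an assumption). The variable names are illustrative only.

```python
import re

# The three candidate patterns from the question and the answer above.
unescaped = re.compile(r'([0-9]+[,-N])+[0-9]+')        # ",-N" is read as a range
repeated = re.compile(r'([0-9]+[,\-N]+)+[0-9]+')       # one or more separators in a row
exact = re.compile(r'([0-9]+([,\-]|-N-))+[0-9]+')      # "-N-" treated as one separator

# The unescaped set spans every character from "," to "N", so it accepts "A".
range_accepts_a = re.fullmatch(r'[,-N]', 'A') is not None
escaped_accepts_a = re.fullmatch(r'[,\-N]', 'A') is not None

samples = ['87-7-6', '86-14-2,3', '4-N-0', '4NNNNNNNN0', '4-,-,-,-,-,0', '1A2']
results = {
    s: (bool(unescaped.fullmatch(s)),
        bool(repeated.fullmatch(s)),
        bool(exact.fullmatch(s)))
    for s in samples
}

for sample, flags in results.items():
    print(sample, flags)
```

Each tuple lists whether the unescaped, repeated-separator, and exact `-N-` patterns accept the whole string, which makes the trade-off between the last two visible on inputs such as 4NNNNNNNN0.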

The hyphen is a special character inside a character class, so it should be escaped:
([0-9]+[,\-N])+[0-9]+

Related

Regex to Delete Consecutive Duplicates (integers and/or floats) from Comma Separated List

I have a (sorted) list like this and seek a regular expression to match consecutive duplicates.
1,1,1.28,1.35,1.4,1.4,2,2,4,7.5,7.56
I tried different options, and the best so far was (?:^|,)([^,]+)(,[ ]*\1)+, but it does not handle cases like 1,1,1.28.
In plain words, the regex I need is:
Whatever is between two commas, match it if there is a duplicate
You can use
(?<![^\D,])(\d+(?:\.\d+)?)(?:,\1)+(?![^,]|\.?\d)
Replace with $1.
Details:
(?<![^\D,]) - immediately to the left of the current location, there can be no char other than a non-digit or comma
(\d+(?:\.\d+)?) - Group 1: one or more digits followed with an optional sequence of . and one or more digits
(?:,\1)+ - one or more sequences of a comma and Group 1 values
(?![^,]|\.?\d) - immediately to the right, there can't be a char other than a , or an optional . followed with a digit.
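As a quick check of the pattern above, here is a small sketch using Python's `re` module (the thread does not say which engine is used, so Python is an assumption, and `collapse_duplicates` is a hypothetical helper name). In Python the replacement `$1` is written `\1`.

```python
import re

# Pattern from the answer above, unchanged.
DUPLICATES = re.compile(r'(?<![^\D,])(\d+(?:\.\d+)?)(?:,\1)+(?![^,]|\.?\d)')


def collapse_duplicates(csv_line: str) -> str:
    """Replace each run of consecutive equal numbers with a single copy."""
    return DUPLICATES.sub(r'\1', csv_line)


result = collapse_duplicates('1,1,1.28,1.35,1.4,1.4,2,2,4,7.5,7.56')
print(result)  # 1,1.28,1.35,1.4,2,4,7.5,7.56
```

Note that 1,1,1.28 collapses to 1,1.28 and that 7.5,7.56 is left alone, which are the two cases the question was worried about.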
My take on this is:
(\b\d+(?:\.\d+)?\b)(?:,\1)+[^\.\d]
What's good about this one in particular is that it matches all the commas between the repeating numbers. That is handy in case you have to only retain one copy of a number in the list and delete all the others - you can simply delete the entire match and substitute it back with group 1 content, and the comma order will still be as expected - a,b,c! Or in case you need to remove duplicates entirely, just remove all matches (again, the order will be the same).
Explanation:
(\b\d+(?:\.\d+)?\b) matches a number, possibly a decimal fraction. Word boundaries are used so that it does not match inside "...,11,1,...". That exact ordering of numbers cannot occur in a sorted list (11>1), but I included them to make sure there will be no problems of a similar kind.
(?:,\1)+ matches a comma and then the previously found number. Here we use the fact that the numbers are sorted.
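[^\.\d] is the tricky part: if the first non-matching number has a dot and the matched one doesn't (as in 1,1,1.28), we have to stop without matching the dot. We also must not match "7.5,7.56", which calls for "not a digit". So in place of "not a digit AND not a dot" I used the equivalent "not (digit or dot)". Note that a character class always consumes one character, so this form requires some character after the last duplicate and cannot match at the very end of the input; a negative lookahead such as (?![.\d]) would cover that case as well.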

Regex Match string having exact number of a char

I have some strings:
1:2:3:4:5
2:3:4:5
5:3:2
6:7:8:9:0
How can I find strings that contain an exact number of colons?
For example, I need to find the strings that contain exactly 4 colons.
Result:
1:2:3:4:5 and 6:7:8:9:0
Edit:
The text between the colons does not matter; it may look like this:
qwe:::qwe:
:998:qwe:3ee3:00
I have to specify the number of colons, and I need to use regexp_matches.
It is something like a filter to find broken strings.
Thanks.
With N being the number you search for:
"^([^:]*:){N}[^:]*$"
Here is a test:
for s in ":::foo:bar" "foo:bar:::" "fo:o:ba::r" ; do echo "$s" | egrep "^([^:]*:){4}[^:]*$" ; done
Change 4 to 3 and 5, to see it not matching.
PostgreSQL may need specific flags or escaping for some elements.
"^([^:]*:){N}[^:]*$"
"^                $"
# matching the whole String/Line, not just a part
" ([^:]*:){N}[^:]* "
" (      ){N}[^:]* "
# N repetitions of something, followed by an arbitrary number of non-colons (maybe zero)
" ([^:]*:)"
# non-colons in arbitrary number (including zero), followed by a colon
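The breakdown above can be tried out with a short sketch. The question targets PostgreSQL's regexp_matches, so Python's `re` module is only a stand-in here, and `has_exactly_n_colons` is a hypothetical helper name.

```python
import re


def has_exactly_n_colons(text: str, n: int) -> bool:
    """True if text contains exactly n colons, using the anchored pattern above."""
    # n groups of "non-colons then a colon", then only non-colons to the end.
    pattern = r'^([^:]*:){%d}[^:]*$' % n
    return re.match(pattern, text) is not None


samples = ['1:2:3:4:5', '2:3:4:5', '5:3:2', '6:7:8:9:0',
           'qwe:::qwe:', ':998:qwe:3ee3:00']
with_four = [s for s in samples if has_exactly_n_colons(s, 4)]
print(with_four)
```

Because every piece between the colons is `[^:]*`, the text between colons may be empty, which is why inputs such as qwe:::qwe: are handled.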
You want to use the quantifier syntax {4}. The quantifier indicates that the preceding group must occur exactly n times for the match to succeed. To find a pattern of five digits separated by colons, something like the following would work.
((\d\:){4}\d)
I am assuming you may want any digit or word character but not whitespace or punctuation. In that case use the word character (\w).
(((\w)[\:]){4}(\w))
But depending on what you would like to do with that pattern you may need a different regular expression. If you wanted to capture and replace all the colons while leaving the digits intact your pattern would need to use string replacement or more advanced grouping.
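Any number of any characters, including 4 colons (because . also matches a colon, this accepts strings with four or more colons, not exactly four):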
((.*:){4}.*)

Regular Expression beginning of string with special characters

Using this as an example string
+$43073$7
I need the 5-digit sequence from it. I'm using the regex
@"\$+(?<lot>\d{5})"
which matches anywhere in the string, not only at the start. I tried
@"^\$+(?<lot>\d{5})"
since the +$ is always at the beginning of the string. What will work?
If you use the anchor ^, you need to put the + symbol first, and don't forget to escape it, because + is a metacharacter in regex that repeats the previous token one or more times.
@"^\+\$(?<lot>\d{5})"
Without the anchor, it would be
@"\$(?<lot>\d{5})"
Then get the 5-digit number you want from group index 1.
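To see why the anchored attempt from the question fails while the escaped version works, here is a rough sketch in Python. This is an assumption made for illustration only: the original snippets look like another language's string literals, and Python writes the named group as `(?P<lot>...)` instead of `(?<lot>...)`.

```python
import re

sample = '+$43073$7'

# Attempt from the question: "\$+" means "one or more $", so the leading "+"
# of the input is never consumed and the anchored match fails.
attempt = re.search(r'^\$+(?P<lot>\d{5})', sample)

# Corrected pattern: a literal "+" (escaped), then a literal "$", then 5 digits.
fixed = re.search(r'^\+\$(?P<lot>\d{5})', sample)

lot = fixed.group('lot') if fixed else None
print(attempt, lot)
```

The captured value is available both by name (`lot`) and as group 1.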
I would match what you want:
\d+
or if you only want digits after "special" characters at the start of input:
^\W+(\d+)
grabbing group 1

regex: find one-digit number

I need to find all the one-digit numbers in a text.
My code:
$string = 'text 4 78 text 558 my.name#gmail.com 5 text 78998 text';
$pattern = '/ [\d]{1} /';
(result: 4 and 5)
Everything works perfectly; I just wanted to ask whether it is correct to use spaces.
Maybe there is some other way to pick out one-digit numbers.
Thanks
First of all, [\d]{1} is equivalent to \d.
As for your question, it would be better to use a zero width assertion like a lookbehind/lookahead or word boundary (\b). Otherwise you will not match consecutive single digits because the leading space of the second digit will be matched as the trailing space of the first digit (and overlapping matches won't be found).
Here is how I would write this:
(?<!\S)\d(?!\S)
This means "match a digit only if there is not a non-whitespace character before it, and there is not a non-whitespace character after it".
I used the double negative like (?!\S) instead of (?=\s) so that you will also match single digits that are at the beginning or end of the string.
I prefer this over \b\d\b for your example because it looks like you really only want to match when the digit is surrounded by spaces, and \b\d\b would match the 4 and the 5 in a string like 192.168.4.5
To allow punctuation at the end, you could use the following:
(?<!\S)\d(?![^\s.,?!])
Add any additional punctuation characters that you want to allow after the digit to the character class (inside of the square brackets, but make sure it is after the ^).
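The differences between these variants can be checked with a small sketch. The question uses PHP, so Python's `re` module is an assumption here; the patterns themselves are unchanged and the variable names are illustrative.

```python
import re

text = 'text 4 78 text 558 my.name#gmail.com 5 text 78998 text'

spaces = re.compile(r' \d ')                      # pattern from the question
lookarounds = re.compile(r'(?<!\S)\d(?!\S)')      # whitespace or string edge on both sides
boundaries = re.compile(r'\b\d\b')                # word boundaries
punctuation = re.compile(r'(?<!\S)\d(?![^\s.,?!])')  # also allow . , ? ! after the digit

in_text = lookarounds.findall(text)

# Consecutive single digits: the space-based pattern consumes the shared space.
consecutive_spaces = spaces.findall('1 2 3')
consecutive_lookarounds = lookarounds.findall('1 2 3')

# An IP address: word boundaries match inside it, the lookarounds do not.
ip_boundaries = boundaries.findall('192.168.4.5')
ip_lookarounds = lookarounds.findall('192.168.4.5')

# A digit followed by a comma needs the punctuation-aware variant.
sentence = punctuation.findall('Call agent 7, pronto!')

print(in_text, consecutive_spaces, consecutive_lookarounds)
print(ip_boundaries, ip_lookarounds, sentence)
```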
Use word boundaries. Note that the quantifier {1} is redundant (a single \d already matches exactly one digit), and so is the character class [], because it contains only one item.
\b\d\b
Search around word boundaries:
\b\d\b
As explained by the others, this will extract single digits meaning that some special characters might not be respected like "." in an ip address. To address that, see F.J and Mike Brant's answer(s).
It really depends on where the numbers can appear and whether you care if they are adjacent to other characters (like . at the end of a sentence). At the very least, I would use word boundaries so that you can get numbers at the beginning and end of the input string:
$pattern = '/\b\d\b/';
But you might consider punctuation at the end like:
$pattern = '/\b\d(\b|\.|\?|\!)/';
If one-digit numbers can be preceded or followed by characters other than digits (e.g., "a1 cat" or "Call agent 7, pronto!") use
(?<!\d)\d(?!\d)
The regular expression reads: match a digit (\d) that is neither preceded nor followed by a digit, (?<!\d) being a negative lookbehind and (?!\d) being a negative lookahead.
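A brief sketch of this digit-only lookaround, again in Python as an assumed stand-in for the PHP in the question:

```python
import re

# A digit that is neither preceded nor followed by another digit.
lone_digit = re.compile(r'(?<!\d)\d(?!\d)')

attached = lone_digit.findall('a1 cat')
with_comma = lone_digit.findall('Call agent 7, pronto!')
from_question = lone_digit.findall(
    'text 4 78 text 558 my.name#gmail.com 5 text 78998 text')
in_ip = lone_digit.findall('192.168.4.5')

print(attached, with_comma, from_question, in_ip)
```

Like \b\d\b, this variant also picks up the 4 and the 5 inside an IP address, so it suits inputs where a digit attached to letters or punctuation should still count.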

Regular Expression to match set of arbitrary codes

I am looking for some help on creating a regular expression that would work with a unique input in our system. We already have some logic in our keypress event that will only allow digits, and will allow the letter A and the letter M. Now I need to come up with a RegEx that can match the input during the onblur event to ensure the format is correct.
I have some examples below of what would be valid. The letter A represents an age, so it is always followed by up to 3 digits. The letter M can only occur at the end of the string.
Valid Input
1-M
10-M
100-M
5-7
5-20
5-100
10-20
10-100
A5-7
A10-7
A100-7
A10-20
A5-A7
A10-A20
A10-A100
A100-A102
Invalid Input
a-a
a45
4
This matches all of the samples.
/A?\d{1,3}-A?\d{0,3}M?/
I am not sure whether 10-A10M should be legal, or whether M can appear together with numbers at all. If M can only appear on its own, without numbers:
/A?\d{1,3}-(A?\d{1,3}|M)/
Use the brute-force method if you have a small number of well-defined patterns, so that you don't get bad corner-case matches:
^(\d+-M|\d+-\d+|A\d+-\d+|A\d+-A\d+)$
Here are the individual regexes broken out:
\d+-M <- matches anything like '1-M'
\d+-\d+ <- 5-7
A\d+-\d+ <- A5-7
A\d+-A\d+ <- A10-A20
/^[A]?[0-9]{1,3}-[A]?[0-9]{1,3}[M]?$/
Matches anything of the form:
A(optional)[1-3 numbers]-A(optional)[1-3 numbers]M(optional)
^A?\d+-(?:A?\d+|M)$
An optional A followed by one or more digits, a dash, and either another optional A and some digits or an M. The '(?: ... )' notation is a Perl 'non-capturing' set of parentheses around the alternatives; it means there will be no '$1' after the regex matches. Clearly, if you wanted to capture the various bits and pieces, you could - and would - do so, and the non-capturing clause might not be relevant any more.
(You could replace the '+' with '{1,3}' as JasonV did to limit the numbers to 3 digits.)
^A?\d{1,3}-(M|A?\d{1,3})$
^ -- the match must be done from the beginning
A? -- "A" is optional
\d{1,3} -- between one and three digits; [0-9]{1,3} also works
- -- A "-" character
(...|...) -- Either one of the two expressions
(M|...) -- Either "M" or...
(...|A?\d{1,3}) -- an optional "A" followed by at least one and at most three digits
$ -- the match should be done to the end
Some consequences of changing the format. If you do not put "^" at the beginning, the match may ignore an invalid beginning. For example, "MAAMA0-M" would be matched at "A0-M".
If, likewise, you leave $ out, the match may ignore an invalid trail. For example, "A0-MMMMAAMAM" would match "A0-M".
Using \d is usually preferred, as is \w for alphanumerics, \s for spaces, \D for non-digit, \W for non-alphanumeric or \S for non-space. But you must be careful that \d is not being treated as an escape sequence. You might need to write it \\d instead.
{x,y} means the last match must occur between x and y times.
? means the last match must occur once or not at all.
When using (), it is treated as one match. (ABC)? will match ABC or nothing at all.
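The effect of the anchors described above can be checked with a short sketch. The question is about a browser onblur handler, so Python's `re` module is an assumption used only for illustration, and `is_valid` is a hypothetical helper name.

```python
import re

ANCHORED = re.compile(r'^A?\d{1,3}-(M|A?\d{1,3})$')
UNANCHORED = re.compile(r'A?\d{1,3}-(M|A?\d{1,3})')
NO_END_ANCHOR = re.compile(r'^A?\d{1,3}-(M|A?\d{1,3})')

valid = ['1-M', '10-M', '100-M', '5-7', '5-20', '5-100', '10-20', '10-100',
         'A5-7', 'A10-7', 'A100-7', 'A10-20', 'A5-A7', 'A10-A20',
         'A10-A100', 'A100-A102']
invalid = ['a-a', 'a45', '4']


def is_valid(value: str) -> bool:
    """Whole-string validation with both anchors in place."""
    return ANCHORED.search(value) is not None


all_valid_pass = all(is_valid(v) for v in valid)
all_invalid_fail = not any(is_valid(v) for v in invalid)

# Without "^" an invalid prefix is skipped; without "$" an invalid tail is ignored.
skipped_prefix = UNANCHORED.search('MAAMA0-M').group()
ignored_tail = NO_END_ANCHOR.search('A0-MMMMAAMAM').group()

print(all_valid_pass, all_invalid_fail, skipped_prefix, ignored_tail)
```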
I’d use this regular expression:
^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$
This matches either:
<number>-M or <number>-<number>
A<number>-<number> or A<number>-A<number>
Additionally <number> must not begin with a 0.
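A minimal sketch of how this stricter pattern behaves, assuming Python's `re` module for illustration (the extra rejected inputs below are made up for the test and are not from the question):

```python
import re

STRICT = re.compile(
    r'^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$')

valid = ['1-M', '10-M', '100-M', '5-7', '5-20', '5-100', '10-20', '10-100',
         'A5-7', 'A10-7', 'A100-7', 'A10-20', 'A5-A7', 'A10-A20',
         'A10-A100', 'A100-A102']

# Hypothetical extra inputs: leading zeros, A combined with M, A only on the right.
rejected = ['a-a', 'a45', '4', '05-7', 'A0-5', 'A10-M', '5-A7']

accepts_all_valid = all(STRICT.search(v) for v in valid)
rejects_all_others = not any(STRICT.search(v) for v in rejected)

print(accepts_all_valid, rejects_all_others)
```

Unlike the shorter ^A?\d{1,3}-(M|A?\d{1,3})$, this version also refuses A10-M and 5-A7, since neither appears among the valid samples.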