Regex to find repeating numbers - regex

Can anyone help me build, or point me toward, a regex to validate repeating numbers,
e.g. 11111111, 2222, 99999999999, etc.?
It should validate numbers of any length.

\b(\d)\1+\b
Explanation:
\b # match word boundary
(\d) # match digit remember it
\1+ # match one or more instances of the previously matched digit
\b # match word boundary
If 1 should also be a valid match (zero repetitions), use a * instead of the +.
If you also want to allow longer repeats (123123123) use
\b(\d+)\1+\b
If the regex should be applied to the entire string (as opposed to finding "repeat-numbers" in a longer string), use start- and end-of-line anchors instead of \b:
^(\d)\1+$
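As a rough illustration of the variants above, here is a minimal Python sketch. The sample text and the helper names is_repeated_digit and is_repeated_block are made up for this example, and re.fullmatch is used in place of the ^...$ anchors, which is an assumption about how you want to validate a whole string.

```python
import re

# Find runs of one repeated digit inside a longer text (word-boundary version).
text = "ids: 11111111, 2222, 1234, 99999999999, 7"
runs = [m.group(0) for m in re.finditer(r"\b(\d)\1+\b", text)]
print(runs)  # ['11111111', '2222', '99999999999']


def is_repeated_digit(s: str) -> bool:
    """True if the whole string is one digit repeated at least twice."""
    return re.fullmatch(r"(\d)\1+", s) is not None


def is_repeated_block(s: str) -> bool:
    """True if the whole string is a block of digits repeated at least twice."""
    return re.fullmatch(r"(\d+)\1+", s) is not None


print(is_repeated_digit("2222"))       # True
print(is_repeated_digit("1"))          # False (needs * instead of + to pass)
print(is_repeated_block("123123123"))  # True
```

Note that 1234 and the lone 7 are skipped by the first pattern because \1+ requires at least one repetition of the captured digit.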
Edit: How to match the exact opposite, i.e. a number where not all digits are the same (except if the entire number is just a single digit):
^(\d)(?!\1+$)\d*$
^ # Start of string
(\d) # Match a digit
(?! # Assert that the following doesn't match:
\1+ # one or more repetitions of the previously matched digit
$ # until the end of the string
) # End of lookahead assertion
\d* # Match zero or more digits
$ # until the end of the string
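A small Python check of the negative-lookahead pattern above may help; the function name has_mixed_digits and the sample inputs are invented for this sketch.

```python
import re

# Same pattern as above: first digit, then refuse "only that digit until the end".
NOT_ALL_SAME = re.compile(r"^(\d)(?!\1+$)\d*$")


def has_mixed_digits(s: str) -> bool:
    """True for digit strings that are not one digit repeated (single digits pass)."""
    return NOT_ALL_SAME.match(s) is not None


for sample in ["1234", "1121", "1111", "1", "12a"]:
    print(sample, has_mixed_digits(sample))
# 1234 True
# 1121 True
# 1111 False
# 1 True
# 12a False
```

A single digit passes because \1+ inside the lookahead needs at least one more copy of the digit, so the lookahead cannot match and the negative assertion succeeds.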

To match a number of repetitions of a single digit, you can write ([0-9])\1*.
This matches [0-9] into a group, then matches 0 or more repetitions (\1) of that group.
You can write \1+ to match one or more repetitions.

Use a backreference:
(\d)\1+
Probably you want to use some sort of anchors ^(\d)\1+$ or \b(\d)\1+\b

I used this expression to give me all phone numbers that are all the same digit.
Basically, it captures one digit and then requires 9 more repetitions of that captured digit, which results in 10 of the same digit in a row.
([0-9])\1{9}
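To show how this might be used, here is a short Python sketch. The phone numbers are invented sample data, and using fullmatch (so that the whole value must be exactly 10 identical digits) is an assumption; the bare pattern would also match inside a longer string.

```python
import re

# One captured digit followed by 9 copies of it: 10 identical digits in total.
SAME_DIGIT_PHONE = re.compile(r"([0-9])\1{9}")

# Hypothetical sample data.
phones = ["5555555555", "5551234567", "0000000000", "99999999"]

suspicious = [p for p in phones if SAME_DIGIT_PHONE.fullmatch(p)]
print(suspicious)  # ['5555555555', '0000000000']
```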

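(\d)\1+? matches any repeating digit. Note that the lazy +? stops after the first repetition, so it only confirms that a digit occurs at least twice in a row; use \1+ to capture the whole run.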
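The difference between the lazy and greedy quantifier is easy to see in a quick Python comparison (the input string is just an example):

```python
import re

s = "777777"

# Lazy quantifier: stops as soon as one repetition has been found.
lazy = re.search(r"(\d)\1+?", s).group(0)
# Greedy quantifier: consumes the whole run of the same digit.
greedy = re.search(r"(\d)\1+", s).group(0)

print(lazy)    # 77
print(greedy)  # 777777
```

Either form is enough to detect that a repeat exists; only the greedy one returns the full run.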

You can find repeated text or numbers easily with a backreference.
In a pattern of the form ([inside pattern])\1, whatever the group in parentheses captured is what \1 then looks for again, immediately after it.

Related

Regex get string after specific char, but only when the text starts with a specific string

I have a list of values that contains various values, but I'm only interested in the number after # of those starting with XXX_
ABC
XXX_YYY
XXX_YYY#12235
XXX_YYY#12281
XXX_YYY#12318
I have tried several things but not quite hit the head of the nail :-(
(?<!XXX_)#
and
(?<=XXX_)*[^#]+$ - closest, but it also gets those without a # in them :-(
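To get the number after #, see the Python code below and modify it as needed:
import re
a = "XXX_YYY#12235"  # sample input; replace with your own value
# Lazily capture everything between the # and the end of the string.
result = re.findall(r"(?<=#)(.*?)(?=$)", a)
print(result[0])  # 12235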
Neither pattern takes the digits into account. Here is what each one matches:
(?<!XXX_)# only matches a single # when not directly preceded by XXX_
(?<=XXX_)*[^#]+$ Optionally repeats a lookbehind assertion, and then matches 1+ chars other than # till the end of the string.
If there is a single # char in the string before the numbers, you can match XXX_ followed by any char except # using a negated character class and then match # followed by capturing the digits at the end of the string in group 1.
XXX_[^\n#]*#(\d+)$
The pattern matches:
XXX_ Match literally
[^\n#]*# Match optional chars other than # or a newline, then match #
(\d+) Capture 1+ digits in group 1
$ End of string
See a regex demo.
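Applied to the values from the question, the pattern could be used like this in Python. This is only a sketch: it assumes each value is tested on its own, uses re.match so the value has to start with XXX_, and needs Python 3.8+ for the := operator.

```python
import re

values = ["ABC", "XXX_YYY", "XXX_YYY#12235", "XXX_YYY#12281", "XXX_YYY#12318"]

# XXX_ prefix, any chars except # or newline, a #, then the digits in group 1.
pattern = re.compile(r"XXX_[^\n#]*#(\d+)$")

numbers = [m.group(1) for v in values if (m := pattern.match(v))]
print(numbers)  # ['12235', '12281', '12318']
```

ABC fails on the prefix and XXX_YYY fails because there is no # to match, so only the three numbered entries contribute a capture.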

In Scala, is it possible to insert commas via a regex to separate thousands in numbers?

In Scala, is it possible to actually insert commas via a regex to separate thousands in numbers where the comma definitely is not there to start with?
For example, I'd like to convert 30000.00 into 30,000.00.
I am not sure this is exactly what you need, but you can use this:
val formatter = java.text.NumberFormat.getNumberInstance
println(formatter.format(30000.00)) // prints 30,000
This is not a Scala-specific answer.
You can use regex \d{1,3}(?=(?:\d{3})+\.) to find the matches and substitute each match with the same match plus an extra comma $0,.
See the online demo.
Explanation:
\d{1,3} This matches a digit between 1 and 3 times
(?= Positive lookahead starts
(?: This indicates a Non-capturing group
\d{3} matches a digit exactly 3 times
) end of Non-capturing group.
+ matches the previous group one or more times
\. matches the character . literally
) Positive lookahead ends.
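The same regex can be tried outside Scala. Below is a minimal Python sketch; the function name add_thousands_separators is made up, and note that Python's re.sub writes the whole-match reference as \g<0> where the answer above uses $0.

```python
import re


def add_thousands_separators(number_text: str) -> str:
    """Insert commas into the integer part of a number that has a decimal point."""
    # Match 1-3 digits that are followed by one or more groups of exactly
    # three digits and then a literal dot; append a comma to each match.
    return re.sub(r"\d{1,3}(?=(?:\d{3})+\.)", r"\g<0>,", number_text)


print(add_thousands_separators("30000.00"))    # 30,000.00
print(add_thousands_separators("1234567.89"))  # 1,234,567.89
print(add_thousands_separators("999.50"))      # 999.50
```

Because the lookahead requires a literal dot, a number without a decimal part such as 30000 is left unchanged by this pattern.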

Why is this regex selecting this text

I am using the regex
(.*)\d.txt
on the expression
MyFile23.txt
The online tester says that this regex matches (selects) the string above. My understanding is that it should not match, because the file name contains two digits, 2 and 3, while the regex contains only one digit token, \d; I would have expected to need \d+. As I read it, my current expression means: zero or more of any character, followed by one digit, followed by .txt. Why does the string pass this regex?
This regex (.*)\d.txt will still match MyFile23.txt because of .* which will match 0 or more of any character (including a digit).
So for the given input, MyFile23.txt, here is the breakdown:
.* # matches MyFile2
\d # matches 3
. # matches a dot (though it can match anything here due to unescaped dot)
txt # will match literal txt
To make sure it only matches MyFile2.txt you can use:
^\D*\d\.txt$
Where ^ and $ are anchors to match start and end. \D* will match 0 or more non-digit.
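The breakdown above can be reproduced with a few lines of Python; the variable names loose and strict and the extra input MyFile2xtxt are only for illustration.

```python
import re

loose = re.compile(r"(.*)\d.txt")       # original pattern, unescaped dot
strict = re.compile(r"^\D*\d\.txt$")    # suggested fix

m = loose.match("MyFile23.txt")
print(m.group(1))  # MyFile2  (the greedy .* swallowed the first digit)

# The unescaped dot accepts any character before "txt".
print(loose.match("MyFile2xtxt") is not None)  # True

print(strict.match("MyFile23.txt") is not None)  # False
print(strict.match("MyFile2.txt") is not None)   # True
```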
The pattern you have contains one group, (.*), which for your example matches MyFile2,
because the . allows any character, including a digit.
Furthermore, the . after \d is not escaped, so it allows one more character of any kind instead of only a literal dot.
To avoid this use:
(\D*)\d+\.txt
the group (\D*) would now match all non digit characters.
Here is the explanation of why your "MyFile23.txt" matches the regex pattern:
A literal period . should always be escaped as \. or else it will match "any character".
And finally, (.*) matches everything from the beginning of the string up to the last digit (MyFile2). Have a look at the "MATCH INFORMATION" area on the right at this page.
So, I'd suggest the following fix:
^\D*\d\.txt$ = the beginning of a line/string, any number of non-digit characters, a single digit, a literal period, a literal txt, and the end of the string/line (which of the two depends on the m switch; whether you need it depends on the input, i.e. whether you have a list of names on separate lines or just a single file name).
Here is a working example.

How to make sure that certain digits in a number are not the same

I have a couple of number strings like the following:
0000000
0000011
0000012
I want to validate that the pattern is like this:
AAAAABC
where A, B and C are all different digits. So in the example, only 0000012 should be matched.
My regex so far is (\d)\1\1\1\1\d\d, but it doesn't make sure that the digits are different. What do I need to do?
I think you want
(\d)\1{4}(?!\1)(\d)(?!\1|\2)\d
Explanation:
(\d) # Match a digit, capture in group 1
\1{4} # Match the same digit as before four times
(?!\1) # Assert that the next character is not the same digit as before
(\d) # Match another digit, capture in group 2
(?!\1|\2) # Assert the next character is different from both previous digits
\d # Match another digit.
See it on regex101.
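Running the three strings from the question through this pattern in Python gives the expected result. The sketch uses fullmatch, which is an assumption that each string should be exactly seven digits long.

```python
import re

# Five copies of digit A, then B (not A), then C (neither A nor B).
pattern = re.compile(r"(\d)\1{4}(?!\1)(\d)(?!\1|\2)\d")

candidates = ["0000000", "0000011", "0000012"]
matched = [c for c in candidates if pattern.fullmatch(c)]
print(matched)  # ['0000012']
```

0000000 is rejected by the first lookahead (B equals A) and 0000011 by the second (C equals B).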

what does this regular expression mean?

^(?!-)[a-z\d\-]{1,100}$
Here's an explanation using regex comment mode, so this expanded form can itself be used as a regex:
(?x) # flag to enable comment mode
^ # start of line/string.
(?!-) # negative lookahead for literal hyphen (-) character, so fails if the next position contains one.
[a-z\d\-] # character class matches a single alpha (a-z), digit (\d) or hyphen (\-).
{1,100} # match the above [class] up to 100 times, at least once.
$ # end of line/string.
In short, it's matching up to 100 lowercase alphanumerics or hyphens, but the first character must not be a hyphen.
Could be attempting to validate a serial number, or similar, but it's too general to say for sure.
Not all regex engines support negative lookaheads. If you're trying to figure out what it is doing in order to adapt for an engine without negative lookaheads, you can use:
^[a-z\d][a-z\d-]{0,99}$
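One way to convince yourself that the lookahead-free version behaves the same is to compare both patterns on a few inputs. The Python sketch below does that; the sample strings are arbitrary, and agreement on them is a spot check, not a proof.

```python
import re

with_lookahead = re.compile(r"^(?!-)[a-z\d\-]{1,100}$")
without_lookahead = re.compile(r"^[a-z\d][a-z\d-]{0,99}$")

samples = ["abc-123", "-abc", "a", "", "ABC", "abc-", "a" * 100, "a" * 101]

# Both patterns should accept and reject exactly the same samples.
for s in samples:
    assert bool(with_lookahead.match(s)) == bool(without_lookahead.match(s))

accepted = [s for s in samples if without_lookahead.match(s)]
print([s if len(s) < 20 else f"{len(s)} chars" for s in accepted])
# ['abc-123', 'a', 'abc-', '100 chars']
```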
(?!-) == negative lookahead
The start of the line, not followed by a -, then 1 to 100 characters that can each be a-z, 0-9 or a -, followed by the end of the line. Depending on the regex flavor, the \d in the character class may be wrong and should be written as 0-9: in a flavor that does not support \d inside a class it would only match a literal 'd', which a-z already covers.
A string of letters, digits and dashes. Between 1 and 100 characters. The first character is not a dash.