Regex to match a pattern with same number in R - regex

I have a set of strings that look like the ones below. Each string has three numbers separated by underscores (_). Each number is a value between 1 and 100.
ma_1_1_1
ma_2_100_59
ma_29_29_29
ma_100_100_100
ma_7_72_78
ma_10_10_100
ma_4_4_49
I want to write a regular expression that returns only the strings whose three numbers are all the same. For example, my output would be
ma_1_1_1, ma_29_29_29 and ma_100_100_100

Like this?
^ma_(\d+)_\1_\1$
See a demo on regex101.com.
This uses backreferences with the first captured group as well as anchors.
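As a minimal sketch of the backreference approach, here is the pattern above applied to the sample strings from the question. Python's re module is used purely for illustration (the question itself targets R), and the names strings, same_number and matches are hypothetical.

```python
import re

# Sample strings from the question.
strings = [
    "ma_1_1_1", "ma_2_100_59", "ma_29_29_29", "ma_100_100_100",
    "ma_7_72_78", "ma_10_10_100", "ma_4_4_49",
]

# (\d+) captures the first number; each \1 must repeat exactly that text.
# The ^ and $ anchors stop "ma_10_10_100" from matching on a prefix.
same_number = re.compile(r"^ma_(\d+)_\1_\1$")

matches = [s for s in strings if same_number.match(s)]
print(matches)
```

Without the trailing anchor, ma_10_10_100 and ma_4_4_49 would match on their first characters, which is why the anchors matter here.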

Use back-references to make a regex match a previous group again:
ma_(100|[1-9][0-9]?)_\1_\1\b
Regex101 Demo
This will also validate that the numbers are within range. If this validation is unnecessary, use (\d+) for the capture group.
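To illustrate the range validation, the following sketch runs the pattern on a few hand-picked inputs. It uses Python's re module for illustration only; the helper name in_range_and_equal and the out-of-range test strings are assumptions, not part of the question.

```python
import re

# 100, or 1-99 without a leading zero, repeated three times.
# \b rejects a longer final number such as the 100 in "ma_10_10_100".
ranged = re.compile(r"ma_(100|[1-9][0-9]?)_\1_\1\b")

def in_range_and_equal(s):
    """Return True if s contains three equal numbers in the range 1-100."""
    return ranged.search(s) is not None

for s in ["ma_1_1_1", "ma_100_100_100", "ma_0_0_0", "ma_101_101_101"]:
    print(s, in_range_and_equal(s))
```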

This answer is a modification of @4castle's answer that extracts only the strings whose three numbers are identical.
grep("ma_(100|[0-9][0-9]|[0-9])(_\\1)(_\\1)\\b", stringList, value = TRUE)

Related

Regex expression for [number2] in [number],[number2][word]

I'm trying to find a regular expression to find [number2] in [number],[number2][word].
So far I've tried with [,](\d*), but it also gets me the comma.
Demo: https://regexr.com/59eqa
You may use:
(?<=,)(\d*)
Regex Demo
Detail:
(?<=,): a positive lookbehind, which does not consume any characters but asserts that the number is preceded by a comma
The previous answers do not check that the matched number is actually the second of two numbers.
If the second number must be captured, this can be done with
\b\d+,(\d+)[A-Za-z]
where the "number2" is contained in captured group 1.
If you want only the match itself, you could use two lookarounds, asserting a comma to the left and a letter (a-zA-Z) to the right.
Use \d+ to match 1 or more digits.
(?<=,)\d+(?=[a-zA-Z])
Regex demo
If there should be a digit before the comma as well:
(?<=\d,)\d+(?=[a-zA-Z])
Regex demo
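The capture-group and lookaround approaches above can be compared side by side. This is a small sketch in Python; the input "12,34abc" is a made-up example of the [number],[number2][word] shape, and the variable names are hypothetical.

```python
import re

# Hypothetical input of the form [number],[number2][word].
text = "12,34abc"

# Capture group: the whole match is "12,34a", number2 is in group 1.
group_match = re.search(r"\b\d+,(\d+)[A-Za-z]", text)

# Lookarounds: the match itself is only number2.
lookaround_match = re.search(r"(?<=\d,)\d+(?=[a-zA-Z])", text)

print(group_match.group(1))
print(lookaround_match.group(0))
```

Both approaches yield the same digits here; the difference is whether the surrounding characters are part of the overall match.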

Regular Expression allow only numbers, commas and dashes

I'm trying to come up with a Data Annotation regular expression to match the following formats.
34
38-30
100,25-30
4-5,5,1-5
Basically the expression should only allow numbers, -(dash) and ,(comma) in any order
I tried following but couldn't get it working.
[RegularExpression(@"(0-9 .&'-,]+)", ErrorMessage ="Lot numbers are invalid.")]
It's ^[0-9,-]*$. Check out this demo.
I think your use case is having a CSV list of numbers, or ranges of numbers (identified as a number followed by a dash followed by another number). We can use the following regex:
[0-9]+(?:-[0-9]+)?(,[0-9]+(?:-[0-9]+)?)*
This regex matches a number, followed by an optional dash and another number, that quantity then followed by comma and another similar term, any number of times.
In the demo below I added anchors on both sides of the regex. Whether you need to do this depends on how you plan to use the pattern.
Demo
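To make the difference between the two answers concrete, here is a sketch that checks both patterns against the formats from the question. It uses Python rather than a C# data annotation; fullmatch stands in for the anchors, and the function name is_valid_lot_list and the invalid samples are assumptions.

```python
import re

# Character-class answer: any mix of digits, commas and dashes.
loose = re.compile(r"^[0-9,-]*$")

# Structured answer: numbers or number-number ranges, comma separated.
strict = re.compile(r"[0-9]+(?:-[0-9]+)?(,[0-9]+(?:-[0-9]+)?)*")

def is_valid_lot_list(value):
    """Return True if the whole value is a list of numbers and ranges."""
    return strict.fullmatch(value) is not None

for value in ["34", "38-30", "100,25-30", "4-5,5,1-5", "--,,", "5-"]:
    print(repr(value), loose.match(value) is not None, is_valid_lot_list(value))
```

The character-class pattern accepts strings such as "--,," and even the empty string, so the structured pattern is the safer choice when the input must be well formed.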

Regular expression Positive lookbehind, ignore first 2 words

I have the following sentence: total 10 item(s) 26,50
I want to extract the number 26,50 based on the word "total". I got this far with a positive lookbehind, but I'm stuck now: (?<=total )(.*)(?=\d)
You don't need lookbehind. Use groups:
https://regex101.com/r/oC0dM3/2
total\s+(?P<COUNT>\d+)\s+item(?:\(s\))?\s+(?P<PRICE>\d+(?:,\d+)?)
Many regex engines do not support variable-length lookbehind, and even where it is supported, a lookbehind-based pattern can be quite inefficient.
Use pattern grouping instead:
^total[^)]+\)\s+(.*)$
The only captured group here is your desired portion.
^total[^)]+\)\s+ matches up to the last whitespace before the desired pattern
(.*)$ gets our desired portion
Demo
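Both grouping patterns above can be tried on the sentence from the question. The sketch below uses Python, whose re module supports the (?P<name>...) syntax; the variable names are hypothetical.

```python
import re

sentence = "total 10 item(s) 26,50"

# Named groups: COUNT and PRICE are read by name.
named = re.search(
    r"total\s+(?P<COUNT>\d+)\s+item(?:\(s\))?\s+(?P<PRICE>\d+(?:,\d+)?)",
    sentence,
)

# Single group: everything after the closing parenthesis and whitespace.
simple = re.search(r"^total[^)]+\)\s+(.*)$", sentence)

print(named.group("COUNT"), named.group("PRICE"))
print(simple.group(1))
```

The named-group version is stricter about the shape of the sentence, while the second pattern relies on the literal ")" being present.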

Regex number format with no consecutive dashes between numbers

I currently have this regex:
/^\+?\d+(\d|\-)+\d+$/
this accepts
12345
123-456
+12345
+12345-12345
My problem is that it also accepts
123--123
123-------3242-324324
How can I fix the regex to not accept consecutive dash in between numbers?
This is the correct one:
^\+?\d+(-\d+)*$
Regex Demo
Alternatively, adding a negative lookahead to your regex also works:
^(?!.*--)\+?\d+(\d|\-)+\d+$
Regex Demo
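As a quick check of the two patterns, the sketch below runs them over the inputs listed in the question. It is written in Python (the question uses JavaScript-style regex literals), and the names no_double_dash, lookahead_variant, accepted and rejected are hypothetical.

```python
import re

# Digits, then zero or more groups of "one dash plus digits".
no_double_dash = re.compile(r"^\+?\d+(-\d+)*$")

# Original regex with a lookahead that forbids "--" anywhere.
lookahead_variant = re.compile(r"^(?!.*--)\+?\d+(\d|\-)+\d+$")

accepted = ["12345", "123-456", "+12345", "+12345-12345"]
rejected = ["123--123", "123-------3242-324324"]

for s in accepted + rejected:
    print(s, bool(no_double_dash.match(s)), bool(lookahead_variant.match(s)))
```

One difference to be aware of: the lookahead variant inherits the original pattern's minimum length of three characters, so a short input such as "12" passes the first pattern but not the second.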

Fetch one out of two Numbers out of String

I have a list of strings, such as: Ø20X400
I need to extract the first of the numbers - between Ø and X
I've come so far to match the numbers in general with \d+ - as simple as it is...
But I need an expression to get the first value separated, not both of them...
You can use lookarounds (?<=..) and (?=..):
(?<=Ø)\d+(?=X)
or in Java style:
(?<=Ø)\\d+(?=X)
A second way is to use a capture group:
Ø(\d+)X
or
Ø(\\d+)X
Then you can extract the content of the group.
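For illustration, here are both variants applied to the sample string. This sketch uses Python instead of Java, so the backslashes are not doubled inside the raw string; the variable names are hypothetical.

```python
import re

label = "Ø20X400"

# Lookarounds: the match itself is the number between Ø and X.
with_lookarounds = re.search(r"(?<=Ø)\d+(?=X)", label)

# Capture group: the match is "Ø20X", the number is in group 1.
with_group = re.search(r"Ø(\d+)X", label)

print(with_lookarounds.group(0))
print(with_group.group(1))
```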
The regex engines I know parse \n as a newline. \d is used for numbers.
The following regex gives you the first number between a Ø and a X in a capture group:
^.*?Ø(\d+)X.*
Edit live on Debuggex
This Regex will do it for you, (\d+?)X, and here is a Rubular to prove it. See, you want to group digits together, but make it non-greedy, ending the evaluation on X.
Try this one:
\d+(?=\D)
This should find the first number that is followed by a non-digit character.
With normal regular expressions, I would say:
Ø(\d+)X
This finds the Ø character, followed by one or more numbers, followed by an X. Also, the numbers will be stored in the first capture group. Capture groups differ from one regex implementation to another, but this would typically be denoted by \1. Capture group zero, \0, is usually the matched string itself. In this version, \d denotes digits 0-9, but if your regex engine uses \n for that purpose, use:
Ø(\n+)X