Regex match depending on lookbehind match

I need to match values like the ones listed below. My first approach, a regex that roughly does what I want, is:
\d+([.,]\d{3})*[.,]\d{2}
It should match:
24,56
24.56
1.234,56
1,234.56
1234,56
1234.56
but I need to not match
1.234.56
1,234,56
So somehow I need to check that the last "." or "," is not the same as the previous "." or ",".
Background: amounts should be matched in both English and German formats, with optional thousands separators.
Even with the help of regex101, I have completely failed to come up with a correctly working lookbehind. Any suggestions are highly appreciated.
UPDATE
Based on the answers I got so far, I came up with this (demo):
\d{1,3}(?:([\.,'])?\d{3})*(?!\1)[\.,\s]\d{2}
But it matches for example 1234.567,23 which is not desirable.

You may capture the digit grouping symbol and use a negative lookahead with a backreference to restrict the decimal separator:
^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$
See the regex demo
Group 1 will contain the last value of the digit grouping symbol and (?!\1)[.,] will match the other symbol.
Details:
^ - start of string
(?:\d+|\d{1,3}(?:([.,])\d{3})*) - either of the two alternatives:
\d+ - 1+ digits
| - or
\d{1,3} - 1 to 3 digits,
(?:([.,])\d{3})* - zero or more sequences of:
([.,]) - Group 1 capturing . or ,
\d{3} - 3 digits
(?!\1)[.,] - a . or , but not equal to what was last captured with ([.,]) pattern above
\d{2} - 2 digits
$ - end of string.
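
To see how the pattern behaves on the values from the question, here is a minimal Python sketch. The names AMOUNT and is_amount are made up for illustration; only the regex itself comes from the answer above.

```python
import re

# Pattern from the answer above: the decimal separator must differ from
# the last grouping separator captured in Group 1.
AMOUNT = re.compile(r'^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$')

def is_amount(text):
    """Return True if text looks like an English- or German-style amount."""
    return AMOUNT.match(text) is not None

should_match = ["24,56", "24.56", "1.234,56", "1,234.56", "1234,56", "1234.56"]
should_fail = ["1.234.56", "1,234,56", "1234.567,23"]

for sample in should_match + should_fail:
    print(sample, is_amount(sample))
```

Note that when no grouping symbol was captured, the pattern relies on a backreference to an unset group failing to match. That holds in Python's re and in PCRE. In JavaScript such a backreference matches the empty string, so (?!\1) would always fail there and values like 24,56 would be rejected.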

You can use
^\d+(([.,])\d{3})*(?!\2)[.,]\d{2}$
live demo
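
A quick Python check of this shorter pattern (SHORT and matches_short are hypothetical names) suggests it handles the listed samples, but because \d+ may come before the grouped digits it also accepts 1234.567,23, which the question's update calls undesirable:

```python
import re

# Shorter pattern from the answer above; Group 2 holds the grouping symbol.
SHORT = re.compile(r'^\d+(([.,])\d{3})*(?!\2)[.,]\d{2}$')

def matches_short(text):
    """Return True if the shorter pattern accepts text."""
    return SHORT.match(text) is not None

wanted = ["24,56", "24.56", "1.234,56", "1,234.56", "1234,56", "1234.56"]
unwanted = ["1.234.56", "1,234,56"]

for sample in wanted + unwanted + ["1234.567,23"]:
    print(sample, matches_short(sample))
```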

Related

Regex to match multiple cases

I have the following examples that must match with my regex
1,[]
1,[0,0,0,[]]
1,[0,0,0,0,0,[]]
1,1
1
I came up with a simple way of matching the middle ones with .?,\[.*\[\]\] but it doesn't match the first and the last one.
Maybe this is too much to handle with regex but I want to check the following things:
If there is a ',' it should have a following character or characters(numbers or letters)
If a bracket is opened: it should close '[]'
The bracket insides can be whatever but it must respect rule 1 and 2.
I am trying to find a solution so I'm grateful if you can help me. Thank you.

You can use
^\d+(?:,(?:(\[(?:[^][]++|\g<1>)*])|\d+))?$
See the regex demo. Details:
^ - start of string
\d+ - one or more digits
(?:,(?:(\[(?:[^][]++|\g<1>)*])|\d+))? - an optional sequence of
, - a comma
(?:(\[(?:[^][]++|\g<1>)*])|\d+) - one of the alternatives:
(\[(?:[^][]++|\g<1>)*]) - Group 1: [, then zero or more occurrences of either one or more chars other than [ and ] or the Group 1 pattern recursed, and then a closing ]
| - or
\d+ - one or more digits
$ - end of string.
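
The pattern above needs a regex engine with recursion support (such as PCRE). Python's built-in re module has no recursion, so the sketch below swaps in a different technique: a plain depth counter for the bracket part. It is meant to accept the same strings, but it is an illustration with made-up function names, not a translation of the regex.

```python
import re

def is_balanced_group(text):
    """True if text is one bracket group: it opens with '[' and that first
    bracket is closed by the very last character, with balanced nesting."""
    if not text.startswith('['):
        return False
    depth = 0
    for i, ch in enumerate(text):
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
            if depth == 0:
                # The opening bracket just closed; nothing may follow it.
                return i == len(text) - 1
    return False  # ran out of input with brackets still open

def is_valid(text):
    """Digits, optionally followed by a comma and digits or a bracket group."""
    head = re.match(r'[0-9]+', text)
    if head is None:
        return False
    rest = text[head.end():]
    if rest == '':
        return True
    if not rest.startswith(','):
        return False
    tail = rest[1:]
    return re.fullmatch(r'[0-9]+', tail) is not None or is_balanced_group(tail)

for sample in ["1,[]", "1,[0,0,0,[]]", "1,[0,0,0,0,0,[]]", "1,1", "1"]:
    print(sample, is_valid(sample))
```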

RegEx for matching operation sequences

I have a numbers operation like this:
-2-28*95+874-1545*-5+36
I need to extract the operands that are not part of a multiplication, using a regex:
-2
+874
+36
I tried things like that without success:
[\+,-]\d+(?=\+|-|$)
This regex matches -5, too, and
(?(?=\d+)[\+,-]|^)\d+(?=\+|-|$)
matches nothing.
How do I solve this problem?

You may use
(?<!\*)[-+]\d*\.?\d+(?![*\d])
See the regex demo
Details
(?<!\*) - a negative lookbehind making sure the current position is not immediately preceded by a * char
[-+] - - or +
\d* - 0 or more digits
\.? - an optional . char
\d+ - 1+ digits
(?![*\d]) - not immediately followed with a * or digit char.
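
As a sanity check, the pattern can be run against the expression from the question. This is a small Python sketch; OPERAND and standalone_operands are illustrative names, not part of the answer.

```python
import re

# Lookbehind rejects a sign that directly follows '*'; the lookahead rejects
# numbers that are followed by '*' or cut off in the middle of a digit run.
OPERAND = re.compile(r'(?<!\*)[-+]\d*\.?\d+(?![*\d])')

def standalone_operands(expression):
    """Return the signed numbers that are not part of a multiplication."""
    return OPERAND.findall(expression)

print(standalone_operands("-2-28*95+874-1545*-5+36"))
# → ['-2', '+874', '+36']
```

The \d in the lookahead matters: without it, the engine could backtrack and report +87 out of +874*2, for example.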

This regex captures the pattern you do not want (the multiplications) in one group, which leaves your desired output as the remainder:
(((-|\+|)\d+\*(-|\+|)\d+))
In PCRE-compatible engines you can also use the backtracking control verbs (*SKIP)(*FAIL), or the shorthand (*SKIP)(*F), to discard those matches and get the desired output:
((((-|\+|)\d+\*(-|\+|)\d+))(*SKIP)(*FAIL)|([\s\S]))
You can also simplify the expression by removing any groups you do not need.

Another option could be to match what you don't want and capture in a group what you want to keep. Your values are then in the first capturing group:
[+-]?\d+(?:\*[+-]?\d+)+|([+-]?\d+)
Explanation
[+-]?\d+ Optional + or - followed by 1+ digits
(?:\*[+-]?\d+)+ Repeat the previous pattern 1+ times with an * prepended
| Or
([+-]?\d+) Capture in group 1 matching an optional + or - and 1+ digits
Regex demo
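
In code, this approach means iterating over all matches and keeping only those where Group 1 took part. A minimal Python sketch (PATTERN and kept_operands are hypothetical names):

```python
import re

# Left alternative consumes whole multiplications; right alternative
# captures a standalone signed number in Group 1.
PATTERN = re.compile(r'[+-]?\d+(?:\*[+-]?\d+)+|([+-]?\d+)')

def kept_operands(expression):
    """Return Group 1 of every match where Group 1 actually matched."""
    return [m.group(1) for m in PATTERN.finditer(expression)
            if m.group(1) is not None]

print(kept_operands("-2-28*95+874-1545*-5+36"))
# → ['-2', '+874', '+36']
```

The order of the alternatives is what makes this work: the multiplication branch is tried first at each position, so a number like -28 is swallowed as part of -28*95 before the capturing branch gets a chance.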

Strange behavior of regex

Background:
I need to identify a pair of numbers separated by a hyphen (-), the numbers can optionally include +/- and can be decimal.
So below are examples of that:
3-4, +3-+4, .3-.4, 0.3-0.4, -0.3--0.4, 0.3--0.4 etc...
I was using below expression:
(-?\+?\d*.?\d*)-(-?\+?\d*.?\d*)
It works well in most cases but fails in below:
-0.3--0.4
The groups it forms are: -0.3- and 0.4
But if I replace it with:
(-?\+?\d*.?\d+)-(-?\+?\d*.?\d+), it works fine.
I am wondering what difference replacing the * with + makes?
We use this in JavaScript.

The wrong capturing is accounted for by the fact that your patterns inside capturing groups (-?\+?\d*.?\d*) can match an empty string and - more importantly here - . matches any char, not only a dot. You must escape it to match a literal dot. Note how (-?\+?\d*.?\d*)-(-?\+?\d*.?\d*) matches 3-4, (the , is captured with Group 2 pattern .) and note Matches 5 and 6 where . matches a space and a hyphen.
Also, your -?\+? actually allows matching a -+ sequence of signs, which does not seem to be what you need. Just use the optional character class [-+]? instead.
So, you might want to use ([-+]?\d*\.?\d*)-([-+]?\d*\.?\d*) pattern, but I'd advise to make sure at least 1 digit is matched, and you may use ([-+]?\d*\.?\d+)-([-+]?\d*\.?\d+) pattern for it.
Details:
([-+]?\d*\.?\d+) - Group 1: a sequence of
[-+]? - an optional - or +
\d* - 0+ digits
\.? - an optional .
\d+/\d* - 1 or more digits (or 0 or more with *)
- - a hyphen
([-+]?\d*\.?\d+) - see above.
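
The effect of the unescaped dot is easiest to see on an input without a decimal point. The Python sketch below uses 3--4 as an illustrative input (it is not one of the question's samples), and the names LOOSE, FIXED and split_pair are made up. Python's re treats these constructs the same way JavaScript does.

```python
import re

# Pattern from the question: '.' is unescaped, so it matches any character.
LOOSE = re.compile(r'(-?\+?\d*.?\d*)-(-?\+?\d*.?\d*)')
# Suggested pattern: escaped dot, one sign at most, at least one digit.
FIXED = re.compile(r'([-+]?\d*\.?\d+)-([-+]?\d*\.?\d+)')

def split_pair(pattern, text):
    """Return the two captured groups, or None if text does not match."""
    m = pattern.fullmatch(text)
    return m.groups() if m else None

print(split_pair(LOOSE, "3--4"))       # → ('3-', '4')
print(split_pair(FIXED, "3--4"))       # → ('3', '-4')
print(split_pair(FIXED, "-0.3--0.4"))  # → ('-0.3', '-0.4')
```

In the loose pattern the optional . swallows the first hyphen, so the sign ends up in the wrong group. Requiring a digit at the end of each group (\d+ instead of \d*) prevents a group from ending on that stray character, which is why switching * to + appeared to fix the problem even with the dot still unescaped.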

Regex for Chilean RUT/RUN with PCRE

I'm having issues validating the Chilean RUT/RUN with a regular expression in PCRE. I have the following regular expression, but sadly I can't make it work:
\b[0-9|.]{1,10}\-[K|k|0-9]
I need help to see what is wrong with it. The application I need to use only supports PCRE.
Thank you.

You may use
^(\d{1,3}(?:\.\d{1,3}){2}-[\dkK])$
to match and capture a whole string that matches the pattern (capturing is not usually necessary, but your app requires a capturing group to extract its contents). See the regex demo.
To match shorter strings that match this pattern inside a larger string, you may remove ^ and $ (see demo) or use \b word boundaries instead (see this demo).
Details:
^ - start of string
\d{1,3} - 1 to 3 digits
(?:\.\d{1,3}){2} - 2 sequences of a literal . and 1 to 3 digits
- - a hyphen
[\dkK] - a digit, k or K.
$ - end of string.

As they sometimes omit the dots, I used this one:
^(\d{1,2}(?:[\.]?\d{3}){2}-[\dkK])$
Details:
^ - start of string
\d{1,2} - 1 or 2 digits
(?:[.]?\d{3}){2} - 2 sequences of an optional '.' and 3 digits
- - a hyphen
[\dkK] - a digit, k or K
$ - end of string
1234567-k OK
12345678-k OK
1.234.567-k OK
12.345.678-k OK
known issue:
12.345678-k and 12345.678-k still OK and I do not like this :(
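
The known issue can be reproduced in Python, and one possible way around it is to allow either both dots or none via an alternation. The STRICT pattern below is my own assumption, not part of the answer above, and the names are illustrative.

```python
import re

# Pattern from the answer above: each dot is optional on its own.
LENIENT = re.compile(r'^(\d{1,2}(?:[.]?\d{3}){2}-[\dkK])$')
# Hypothetical stricter variant: dotted form OR 7-8 plain digits.
STRICT = re.compile(r'^((?:\d{1,2}(?:\.\d{3}){2}|\d{7,8})-[\dkK])$')

def accepts(pattern, text):
    """Return True if pattern matches the whole text."""
    return pattern.match(text) is not None

good = ["1234567-k", "12345678-k", "1.234.567-k", "12.345.678-k"]
half_dotted = ["12.345678-k", "12345.678-k"]

for sample in good + half_dotted:
    print(sample, accepts(LENIENT, sample), accepts(STRICT, sample))
```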

You need to change it to ^(\d{1,3}(?:\.\d{3}){2}-[\dkK])$ so that it matches exactly 2 sequences of 3 digits after the first sequence of 1 to 3 digits.
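
A short Python comparison shows what the change buys. The sample 17.87.335-2 is a malformed number mentioned elsewhere in this thread; ORIGINAL, TIGHT and accepts are illustrative names.

```python
import re

# Accepted answer: dotted groups of 1 to 3 digits.
ORIGINAL = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){2}-[\dkK])$')
# Tightened version: dotted groups of exactly 3 digits.
TIGHT = re.compile(r'^(\d{1,3}(?:\.\d{3}){2}-[\dkK])$')

def accepts(pattern, text):
    """Return True if pattern matches the whole text."""
    return pattern.match(text) is not None

for sample in ["12.345.678-k", "17.87.335-2"]:
    print(sample, accepts(ORIGINAL, sample), accepts(TIGHT, sample))
```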

Please consider being more specific when building the regex, since the accepted one matches wrong numbers, such as 17.87.335-2. It also doesn't match formats without the dots or the hyphen.
Please consider using the following pattern, with the dot escaped so that it only matches a literal dot: \b(\d{1,3}(?:(\.?)\d{3}){2}(-?)[\dkK])\b
Modified prior version to try the other formats: https://regex101.com/r/2Us0j6/9

R- regex extracting a string between a dash and a period

First of all I apologize if this question is too naive or has been repeated earlier. I tried to find it in the forum but I'm posting it as a question because I failed to find an answer.
I have a data frame with column names as follows;
head(rownames(u))
[1] "A17-R-Null-C-3.AT2G41240" "A18-R-Null-C-3.AT2G41240" "B19-R-Null-C-3.AT2G41240"
[4] "B20-R-Null-C-3.AT2G41240" "A21-R-Transgenic-C-3.AT2G41240" "A22-R-Transgenic-C-3.AT2G41240"
What I want is to use regex in R to extract the string in between the first dash and the last period.
Anticipated results are,
[1] "R-Null-C-3" "R-Null-C-3" "R-Null-C-3"
[4] "R-Null-C-3" "R-Transgenic-C-3" "R-Transgenic-C-3"
I tried following with no luck...
gsub("^[^-]*-|.+\\.","\\2", rownames(u))
gsub("^.+-","", rownames(u))
sub("^[^-]*.|\\..","", rownames(u))
Would someone be able to help me with this problem?
Thanks a lot in advance.
Shani.

Here is a solution to be used with gsub:
v <- c("A17-R-Null-C-3.AT2G41240", "A18-R-Null-C-3.AT2G41240", "B19-R-Null-C-3.AT2G41240", "B20-R-Null-C-3.AT2G41240", "A21-R-Transgenic-C-3.AT2G41240", "A22-R-Transgenic-C-3.AT2G41240")
gsub("^[^-]*-([^.]+).*", "\\1", v)
See IDEONE demo
The regex matches:
^[^-]* - zero or more characters other than -
- - a hyphen
([^.]+) - Group 1 matching and capturing one or more characters other than a dot
.* - any characters (even including a newline since perl=T is not used), any number of occurrences up to the end of the string.

This can easily be achieved with the following regex:
-([^.]+)
# look for a dash
# then match everything that is not a dot
# and save it to the first group
See a demo on regex101.com. Outputs are:
R-Null-C-3
R-Null-C-3
R-Null-C-3
R-Null-C-3
R-Transgenic-C-3
R-Transgenic-C-3

Regex
-([^.]+)\\.
Description
- matches the character - literally
1st Capturing group ([^\\.]+)
[^\.]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
. matches the character . literally
\\. matches the character . literally
Debuggex Demo
Output
MATCH 1
1. [4-14] `R-Null-C-3`
MATCH 2
1. [29-39] `R-Null-C-3`
MATCH 3
1. [54-64] `R-Null-C-3`
MATCH 4
1. [85-95] `R-Null-C-3`
MATCH 5
1. [110-126] `R-Transgenic-C-3`
MATCH 6
1. [141-157] `R-Transgenic-C-3`

This seems an appropriate case for lookarounds:
library(stringr)
str_extract(v, '(?<=-).*(?=\\.)')
where
(?<= ... ) is a positive lookbehind, i.e. it requires a - immediately before the text that is matched, without including it in the match;
.* is any character . repeated 0 or more times *;
(?= ... ) is a positive lookahead, i.e. it requires a period (escaped as \\.) immediately after the matched text, again without including it.
I used stringr::str_extract above because it's more direct in terms of what you're trying to do. It is possible to do the same thing with sub (or gsub), but the regex has to be uglier:
sub('.*?(?<=-)(.*)(?=\\.).*', '\\1', v, perl = TRUE)
.*? matches any character . zero or more times, but as few as possible *? (lazy matching);
the lookbehind (?<=-) is the same as above;
now the part we want .* is put in a captured group (...), which we'll need later;
the lookahead (?=\\.) is the same;
.* matches any character, zero or more times, as many as possible (here, up to the end of the string).
The replacement is \\1, which refers to the first captured group from the pattern regex.
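
The answers in this thread differ on one point that the sample data does not exercise: [^.]+ stops at the first period, while the greedy .* inside the lookarounds runs to the last one. Here is a Python sketch of both; the name with two periods is a made-up input and the function names are illustrative.

```python
import re

def up_to_first_dot(name):
    """Python equivalent of gsub("^[^-]*-([^.]+).*", "\\1", v)."""
    return re.sub(r'^[^-]*-([^.]+).*', r'\1', name)

def up_to_last_dot(name):
    """Python equivalent of str_extract(v, '(?<=-).*(?=\\.)')."""
    m = re.search(r'(?<=-).*(?=\.)', name)
    return m.group(0) if m else None

names = ["A17-R-Null-C-3.AT2G41240", "A21-R-Transgenic-C-3.AT2G41240"]
for name in names:
    print(up_to_first_dot(name), up_to_last_dot(name))

# Hypothetical name with two periods, to show where the approaches diverge.
tricky = "A17-R-Null-C-3.5.AT2G41240"
print(up_to_first_dot(tricky))  # → R-Null-C-3
print(up_to_last_dot(tricky))   # → R-Null-C-3.5
```

Since the question asks for the text between the first dash and the last period, the lookaround version is the closer fit if names can contain more than one period.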
