Strange behavior of regex - regex

Background:
I need to identify a pair of numbers separated by a hyphen (-), the numbers can optionally include +/- and can be decimal.
So below are examples of that:
3-4, +3-+4, .3-.4, 0.3-0.4, -0.3--0.4, 0.3--0.4 etc...
I was using below expression:
(-?\+?\d*.?\d*)-(-?\+?\d*.?\d*)
It works well in most cases but fails in below:
-0.3--0.4
The groups it forms are: -0.3- and 0.4
But if i replace it like:
(-?\+?\d*.?\d+)-(-?\+?\d*.?\d+), it works fine.
I am wondering what difference replacing the * with + is making?
We have used this in javascript.

The wrong capturing is accounted for by the fact that your patterns inside capturing groups (-?\+?\d*.?\d*) can match an empty string and - more importantly here - . matches any char, not only a dot. You must escape it to match a literal dot. Note how (-?\+?\d*.?\d*)-(-?\+?\d*.?\d*) matches 3-4, (the , is captured with Group 2 pattern .) and note Matches 5 and 6 where . matches a space and a hyphen.
Also, your -?\+? actually allows matching -+ sequence of signs, which does not seem what you need. Just use [-+]? optional character class.
So, you might want to use ([-+]?\d*\.?\d*)-([-+]?\d*\.?\d*) pattern, but I'd advise to make sure at least 1 digit is matched, and you may use ([-+]?\d*\.?\d+)-([-+]?\d*\.?\d+) pattern for it.
Details:
([-+]?\d*\.?\d+) - Group 1: a sequence of
[-+]? - an optional - or +
\d* - 0+ digits
\.? - an optional .
\d+/\d* - 1 or more digits (or 0 or more with *)
- - a hyphen
([-+]?\d*\.?\d+) - see above.

Related

RegEx for matching operation sequences

I have a numbers operation like this:
-2-28*95+874-1545*-5+36
I need to extract operands, not implied in a multiplication operation with a regex:
-2
+874
+36
I tried things like that without success:
[\+,-]\d+(?=\+|-|$)
This regex matches -5, too, and
(?(?=\d+)[\+,-]|^)\d+(?=\+|-|$)
matches nothing.
How do I solve this problem?
You may use
(?<!\*)[-+]\d*\.?\d+(?![*\d])
See the regex demo
Details
(?<!\*) - (a negative lookbehind making sure the current position is) not immediately preced with a * char
[-+] - - or +
\d* - 0 or more digits
\.? - an optional . char
\d+ - 1+ digits
(?![*\d]) - not immediately followed with a * or digit char.
See the regex graph:
This RegEx might help you to capture your undesired pattern in one group (), then it would leave your desired output:
(((-|\+|)\d+\*(-|\+|)\d+))
You can also use other language specific functions such as (*SKIP)(*FAIL) or (*SKIP)(*F) and get the desired output:
((((-|\+|)\d+\*(-|\+|)\d+))(*SKIP)(*FAIL)|([s\S]))
You can also DRY your expression, if you wish, and remove unnecessary groups that you may not need.
Another option could be to match what you don't want and capture in a group what you want to keep. Your values are then in the first capturing group:
[+-]?\d+(?:\*[+-]?\d+)+|([+-]?\d+)
Explanation
[+-]?\d+ Optional + or - followed by 1+ digits
(?:\*[+-]?\d+)+ Repeat the previous pattern 1+ times with an * prepended
| Or
([+-]?\d+) Capture in group 1 matching an optional + or - and 1+ digits
Regex demo

REGEXP_REPLACE for exact regex pattern, not working

I'm trying to match an exact pattern to do some data cleanup for ISSN's using the code below:
select case when REGEXP_REPLACE('1234-5678 ÿþT(zlsd?k+j''fh{l}x[a]j).,~!##$%^&*()_+{}|:<>?`"\;''/-', '([0-9]{4}[\-]?[Xx0-9]{4})(.*)', '$1') not similar to '[0-9]{4}[\-]?[Xx0-9]{4}' then 'NOT' else 'YES' end
The pattern I want match any 8 digit group with a possible dash in the middle and possible X at the end.
The code above works for most cases, but if capture group 1 is the following example: 123456789 then it also returns positive because it matches the first 8 digits, and I don't want it to.
I tried surrounding capture group 1 with ^...$ but that doesn't work either.
So I would like to match exactly these examples and similar ones:
1234-5678
1234-567X
12345678
1234567X
BUT NOT THESE (and similar):
1234567899
1234567899x
What am I missing?
You may use
^([0-9]{4}-?[Xx0-9]{4})([^0-9].*)?$
See the regex demo
Details
^ - start of string
([0-9]{4}-?[Xx0-9]{4}) - Capturing group 1 ($1): four digits, an optional -, and then four x / X or digits
([^0-9].*)? - an optional Capturing group 2: any char other than a digit and then any 0+ chars as many as possible
$ - end of string.

Regex for Chilean RUT/RUN with PCRE

I'm having issues with the validation of the chilean RUT/RUN with a regex expression in PCRE. I have the next regular expression but sadly can't make it work:
\b[0-9|.]{1,10}\-[K|k|0-9]
I need help to see what is wrong with the code. The application I need to use only uses PCRE.
Thank you.
You may use
^(\d{1,3}(?:\.\d{1,3}){2}-[\dkK])$
to match and capture (that is not usually necessary, but your app requires a capturing group to extract its contents) a whole string that matches the pattern. See the regex demo.
To match shorter strings that match this pattern inside a larger string, you may remove ^ and $ (see demo) or use \b word boundaries instead (see this demo).
Details:
^ - start of string
\d{1,3} - 1 to 3 digits
(?:\.\d{1,3}){2} - 2 sequences of a literal . and 1 to 3 digits
- - a hyphen
[\dkK] - a digit, k or K.
$ - end of string.
As they sometimes omit the dots, I used this one:
^(\d{1,2}(?:[\.]?\d{3}){2}-[\dkK])$
Details:
^ - start of string
\d{1,2} - 1 or 2 digits
(?:[.]?\d{3}){2} - 2 sequences of an optional '.' and 3 digits
- a hyphen
[\dkK] - a digit, k or K
$ - end of string
1234567-k OK
12345678-k OK
1.234.567-k OK
12.345.678-k OK
known issue:
12.345678-k and 12345.678-k still OK and I do not like this :(
You need to change to ^(\d{1,3}(?:\.\d{3}){2}-[\dkK])$ to capture only 2 sequence of 3 digits after the first sequence of 1-3 digits.
please consider being more specific in the REGEX build, since it matched wrong numbers, such as 17.87.335-2. Also the included one did't match formats without the dots or the hyphens.
Please consider using the following format: \b(\d{1,3}(?:(.?)\d{3}){2}(-?)[\dkK])\b
Modified prior version to try the other formats: https://regex101.com/r/2Us0j6/9

Regex match depending on lookbehind match

I need to match these values:
(First approach to a regex that roughly does what I want)
\d+([.,]\d{3})*[.,]\d{2}
like
24,56
24.56
1.234,56
1,234.56
1234,56
1234.56
but I need to not match
1.234.56
1,234,56
So somehow I need to check the last occurrence of "." or "," to not be the same as the previous "." or ",".
Background: Amounts shall be matched in English and German format with (optional) 1000-Separators.
But even with help of regex101 I completely fail at coming up with a correctly working look-behind. Any suggestions are highly appreciated.
UPDATE
Based on the answers I got so far, I came up with this (demo):
\d{1,3}(?:([\.,'])?\d{3})*(?!\1)[\.,\s]\d{2}
But it matches for example 1234.567,23 which is not desirable.
You may capture the digit grouping symbol and use a negative lookahead with a backreference to restrict the decimal separator:
^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$
^ ^ ^^^^^
See the regex demo
Group 1 will contain the last value of the digit grouping symbol and (?!\1)[.,] will match the other symbol.
Details:
^ - start of string
(?:\d+|\d{1,3}(?:([.,])\d{3})*) - either of the two alternatives:
\d+ - 1+ digits
| - or
\d{1,3} - 1 to 3 digits,
(?:([.,])\d{3})* - zero or more sequences of:
([.,]) - Group 1 capturing . or ,
\d{3} - 3 digits
(?!\1)[.,] - a . or , but not equal to what was last captured with ([.,]) pattern above
\d{2} - 2 digits
$ - end of string.
You can use
^\d+(([.,])\d{3})*(?!\2)[.,]\d{2}$
live demo

Only allow 2 digits in a string using regex

I need regex that only allows a maximum of 2 digits (or whatever the desired limit is actually) to be entered into an input field.
The requirements for the field are as follows:
Allow a-z A-Z
Allow 0-9
Allow - and . characters
Allow spaces (\s)
Do not allow more than 2 digits
Do not allow any other special characters
I have managed to put together the following regex based on several answers on SO:
^(?:([a-zA-z\d\s\.\-])(?!([a-zA-Z]*\d.*){3}))*$
The above regex is really close. It works successfully for the following:
test 12 test
test12
test-test.12
But it allows an input of:
123 (but not 1234, so it's close).
It only needs to allow an input of 12 when only digits are entered into the field.
I would like some help in finding a more efficient and cleaner (if possible) solution than my current regex - but it must still be regex, no JS.
You could use a positive lookahead like
(?=^(?:\D*\d\D*){2}$) # only two digits
^[- .\w]+$ # allowed characters
See a demo on regex101.com.
You may use a negative lookahead anchored at the start that will make the match fail once there are 3 digits found anywhere in the string:
^(?!(?:[^0-9]*[0-9]){3})[a-zA-Z0-9\s.-]*$
^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo
Details:
^ - start of string
(?!(?:[^0-9]*[0-9]){3}) - the negative lookahead failing the match if exactly 3 following sequences are found:
[^0-9]* - zero or more chars other than digits
[0-9] - a digit (thus, the digits do not have to be adjoining)
[a-zA-Z0-9\s.-]* - 0+ ASCII letters, digits, whitespace, . or - symbols
$ - end of string.