Find the first set of 5 digits in a text - regex

I need to find the first set of 5 digits in a text like this:
;SUPER U CHARLY SUR MARNE;;;rte de Pavant CHARLY SUR MARNE Picardie 02310;Charly-sur-Marne;;;02310;;;;;;;;;;;;;;
I need to find only the first 02310.
My regex, shown below, finds every set of 5 digits instead:
([^\d]|^)\d{5}([^\d]|$)

To match the first 5-digit number you may use
^.*?\K(?<!\d)\d{5}(?!\d)
See the regex demo. As you want to remove the match, simply leave the Replace With field blank. The ^ matches the start of a line, .*? matches any 0+ chars other than line break chars, as few as possible, and the \K operator drops the text matched so far from the reported match. Then, (?<!\d)\d{5}(?!\d) matches 5 digits that are not adjacent to other digits.
Another variation includes a capturing group/backreference:
Find What: ^(.*?)(?<!\d)\d{5}(?!\d)
Replace With: $1
See this regex demo.
Here, instead of dropping the found text before the number, (.*?) is captured into Group 1 and $1 in the replacement pattern puts it back.
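For anyone doing this outside a \K-capable editor, here is a minimal Python sketch of the same idea. It assumes Python's built-in re module, which does not support \K; since re.search already returns the leftmost match, the ^.*?\K prefix is not needed there. The names first_five_digits and remove_first_five_digits are made up for this illustration.

```python
import re

# Five digits not adjacent to other digits (same core as the answer's pattern).
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")


def first_five_digits(text):
    """Return the first standalone 5-digit run in text, or None."""
    match = FIVE_DIGITS.search(text)  # search() stops at the leftmost match
    return match.group(0) if match else None


def remove_first_five_digits(text):
    """Delete only the first standalone 5-digit run."""
    return FIVE_DIGITS.sub("", text, count=1)  # count=1: first match only


line = (";SUPER U CHARLY SUR MARNE;;;rte de Pavant CHARLY SUR MARNE "
        "Picardie 02310;Charly-sur-Marne;;;02310;;;;;;;;;;;;;;")
print(first_five_digits(line))
print(remove_first_five_digits(line))
```

Passing count=1 to sub() plays the role of the anchored ^.*? in the original pattern: only the first occurrence is touched and the second 02310 stays in place.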

I would have used
(^(?:(?!\d{5}).)+)(\d{5})(?!\d)
It matches the fragment from the beginning of the string to the end of the first 5-digit number, and in a replacement you can use $1 or $2 to refer to the corresponding part. For example, the replacement $1<$2> surrounds the number with < and >.
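A short Python sketch of that replacement, assuming the built-in re module (where backreferences in the replacement are written \1 and \2 rather than $1 and $2). The function name bracket_first_number and the shortened sample string are only for illustration.

```python
import re

# Group 1: everything before the first run of 5 digits (tempered dot).
# Group 2: the 5 digits themselves, which must not be followed by a digit.
PATTERN = re.compile(r"(^(?:(?!\d{5}).)+)(\d{5})(?!\d)")


def bracket_first_number(text):
    """Surround the first 5-digit number with < and >."""
    return PATTERN.sub(r"\1<\2>", text)


sample = "Picardie 02310;Charly-sur-Marne;;;02310;;"
print(bracket_first_number(sample))
```

One thing to be aware of: as far as I can tell, the pattern as written finds nothing at all when the first digit run is longer than five digits (for example "ab 123456 02310"), because the tempered dot cannot step past that run.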

To find the first 5 digits in the text, you could also repeatedly match either non-digits (\D*) or 1-4 digits, and then match 5 digits:
^(?=.*\b\d{5}\b)(?:\D*|\d{1,4})*\K\d{5}(?!\d)
^ Start of string
(?=.*\b\d{5}\b) Assert that there are 5 consecutive digits between word boundaries
(?:\D*|\d{1,4})* Repeat 0+ times matching either non-digits or 1-4 digits
\K\d{5} Forget what was matched, then match 5 digits
(?!\d) Assert that what follows is not a digit
Regex demo

Related

Is there a way to use Regex to capture numbers out of a string based on a specific leading letters?

I need to extract any number of 4-10 digits that directly follows 'PO#' or 'PO# ' (with a whitespace). I do not want to include the PO# in the extracted value, but I do need it as the criterion for targeting the value within a string. If the number has fewer than 4 or more than 10 digits, I do not wish to capture it and would like to ignore it.
A sample string would look like this:
PO#12445 for Vendor Enterprise
or
Invoice# 21412556 for Vendor Enterprise for PO# 12445
My current regex captures PO# including the '#', and I use additional logic afterwards to remove the '#'. However, my expression also captures Invoice# and Inv#, which I don't want. I'd like it to target only PO#.
Current Expression: [P][O][#]\s*[0-9]{3,9}\d+\w
Any help would be greatly appreciated!
If you need only the digits, you can use (?<=\bPO#)\s?(\d{4,10})\b, with:
(?<=\bPO#): positive lookbehind, which makes sure that PO# (PO followed by #, starting at a word boundary) appears right before the needed pattern
\s?: 0 or 1 whitespace
(\d{4,10}): between 4 and 10 digits
\b: word boundaries, to avoid matching e.g. the first 10 digits of an 11-digit number, or 'SPO#'
Edit: Alexander Mashin is right about the lookbehind having to be fixed width, so (?<=\bPO#)\s?(\d{4,10})\b is better https://regex101.com/r/1KBQd1/5
Edit: added word boundaries
You can use a capturing group and repeat matching the digits 4-10 times using [0-9]{4,10}.
Note that [P][O][#] is the same as PO#
\bPO#\s*([0-9]{4,10})\b
\bPO#\s* Match PO# preceded by a word boundary and match 0+ whitespace chars
( Capture group 1
[0-9]{4,10} Match 4 - 10 digits
)\b Close group followed by a word boundary to prevent the match being part of a larger word
Regex demo
If PCRE is available, how about:
PO#\s*\K\d{4,10}(?=\D|$)
PO#\s* matches the leading substring "PO#" followed by 0 or more whitespace characters.
\K resets the starting position of the reported match, so it works much like a positive (zero-length) lookbehind.
\d{4,10} matches a sequence of 4 to 10 digits.
(?=\D|$) is a positive lookahead that requires a non-digit character or the end of the string.
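To see the capturing-group variant (\bPO#\s*([0-9]{4,10})\b) in action, here is a small Python sketch. It assumes Python's built-in re module, which has no \K, so reading group 1 is the natural way to get the digits without the PO# prefix. The helper name extract_po is hypothetical.

```python
import re

# \bPO#   literal PO# starting at a word boundary (so Invoice# and SPO# fail)
# \s*     optional whitespace
# (...)   group 1: 4 to 10 digits, closed by a word boundary
PO_NUMBER = re.compile(r"\bPO#\s*([0-9]{4,10})\b")


def extract_po(text):
    """Return the PO digits as a string, or None when there is no valid PO."""
    match = PO_NUMBER.search(text)
    return match.group(1) if match else None


for sample in (
    "PO#12445 for Vendor Enterprise",
    "Invoice# 21412556 for Vendor Enterprise for PO# 12445",
    "PO# 123",          # too short
    "PO# 12345678901",  # 11 digits, too long
):
    print(repr(sample), "->", extract_po(sample))
```

The trailing \b is what rejects the 11-digit case: after 10 digits the next character is still a digit, so there is no word boundary and the match fails instead of returning a truncated number.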

Removing trailing zeros using REPLACE regex

Remove trailing zeros from a number with 4 decimals
Sample expected output:
1.7500 -> 1.75
1.1010 -> 1.101
1.0000 -> 1
I am new to regex, so I tried this one first, but it is not working:
REPLACE ALL OCCURRENCES OF REGEX '^\.[0]\d{0,3}' IN lv_rate WITH space.
Need help for the right regex to use. Thanks!
EDIT: SHIFT lv_rate RIGHT DELETING TRAILING '0' is not an option.
Try replacing on the following regex pattern:
\.?0+$
Use empty string as the replacement. This will match an optional decimal point, followed by trailing zeroes until the end of the string. See the demo below to see this pattern working.
Demo
This answer assumes that all inputs would always have a decimal component. If not, then we would need to add additional logic.
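The question is about ABAP, but the pattern itself is easy to try out elsewhere. Below is a minimal Python sketch (built-in re module assumed, function name strip_trailing_zeros invented for the example) that also shows why the decimal-component assumption matters.

```python
import re


def strip_trailing_zeros(number_text):
    """Remove trailing zeros, plus the decimal point if nothing follows it."""
    return re.sub(r"\.?0+$", "", number_text)


for value in ("1.7500", "1.1010", "1.0000"):
    print(value, "->", strip_trailing_zeros(value))

# Caveat from the answer: without a decimal part the pattern eats
# significant zeros, so "100" is shortened as well.
print("100", "->", strip_trailing_zeros("100"))
```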
If you want to remove trailing zeros from a number with 4 decimals, one option is to use a capturing group and use group 1 in the replacement.
^(\d+(?=\.\d{4}$)(?:\.\d*[1-9])?)\.?0+$
In parts
^ Start of string
( Capture group 1
\d+ Match 1+ digits
(?=\.\d{4}$) Assert what is on the right is a . and 4 digits
(?:\.\d*[1-9])? Optionally match a . and the digits up to the last digit 1-9
) Close group 1
\.?0+ Match an optional . and 1 or more times a zero
$ End of string
Regex demo
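The stricter pattern can be checked the same way. This is only a Python sketch (built-in re module assumed, trim_four_decimals is a made-up name); it uses \1 where other tools would use $1.

```python
import re

# Group 1 keeps the integer part and any decimals up to the last non-zero
# digit. The lookahead restricts the pattern to numbers with exactly
# 4 decimals, so other inputs are left alone.
FOUR_DECIMALS = re.compile(r"^(\d+(?=\.\d{4}$)(?:\.\d*[1-9])?)\.?0+$")


def trim_four_decimals(number_text):
    """Trim trailing zeros only from numbers written with 4 decimals."""
    return FOUR_DECIMALS.sub(r"\1", number_text)


for value in ("1.7500", "1.1010", "1.0000", "100", "1.2345"):
    print(value, "->", trim_four_decimals(value))
```

Unlike the shorter \.?0+$ pattern, this one does not touch an integer such as 100, because the lookahead fails when there is no 4-digit decimal part.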

How to cut last digits from number - REGEX

I have to find the first 11 digits and cut everything that follows from the eleventh digit.
I've been trying to do it with this pattern: /^(\d{11}.*?). However, it doesn't work.
Do you know what I'm doing wrong?
Depending on your regex flavour, you could use:
Find: ^\d{11}\K.+$
Replace: NOTHING
Explanation:
^ : beginning of line
\d{11} : 11 digits
\K : forget all we have seen until this position
.+ : 1 or more of any character
$ : end of line
If you want to match the first characters, you need the anchor ^, which anchors the match at the beginning of the string.
If you want to match something and then reuse it, you need to capture it inside a capturing group and refer to it in the substitution with \1.
If you want to capture eleven digits, \d{11} will work for you.
So to sum up, you need the pattern ^(\d{11}).* and the replacement \1. The .* will match 0 or more characters (any).
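As a quick check of the capture-and-replace approach, here is a Python sketch. The re module and the function name keep_first_11_digits are my choices for the illustration; the pattern and the \1 replacement are the ones from the answer.

```python
import re


def keep_first_11_digits(text):
    """Keep the 11 leading digits and drop the rest of the line."""
    # ^(\d{11}) captures the digits, .* consumes what follows,
    # and \1 puts the captured digits back.
    return re.sub(r"^(\d{11}).*", r"\1", text)


for sample in ("12345678901234", "12345678901 abc", "12345"):
    print(repr(sample), "->", repr(keep_first_11_digits(sample)))
```

Input that does not start with 11 digits is returned unchanged, since the pattern simply does not match.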
After a lot of trying, it actually works with this one:
^(?=(\d{11})).+?

vba regular expression last occurrence

I would like to match the "775" (the last 3-digit number, out of an unknown total number of occurrences) within the string "one 234 two 449 three 775 f4our", with "f4our" representing an unknown number of characters (letters, digits, spaces, but not 3 or more digits in a row).
I came up with the regular expression "(\d{3}).*?$" thinking the "?" would suffice to get the 775 instead of the 234, but this doesn't seem to work.
Is there any way to accomplish this using VBA regular expressions?
Note that (\d{3}).*?$ just matches and captures into Group 1 the first 3 consecutive digits and then matches any 0+ characters other than a newline up to the end of the string.
You need to get the 3 digit chunk at the end of the string that is not followed with a 3-digit chunk anywhere after it.
You may use a negative lookahead (?!.*\d{3}) to impose a restriction on the match:
\d{3}(?!.*\d{3})
See the regex demo. Or - if the 3 digits are to be matched as whole word:
\b\d{3}\b(?!.*\b\d{3}\b)
See another demo
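The question is about VBA, but the two patterns can be tried unchanged in other engines that support lookaheads. Here is a minimal sketch in Python (built-in re module assumed; last_three_digits and its whole_word flag are hypothetical names for the example).

```python
import re

# Three digits that are not followed, anywhere later on the line,
# by another run of three digits.
LAST_THREE = re.compile(r"\d{3}(?!.*\d{3})")
# Same idea, but the three digits must form a whole word.
LAST_THREE_WORD = re.compile(r"\b\d{3}\b(?!.*\b\d{3}\b)")


def last_three_digits(text, whole_word=False):
    """Return the last 3-digit chunk in text, or None if there is none."""
    pattern = LAST_THREE_WORD if whole_word else LAST_THREE
    match = pattern.search(text)
    return match.group(0) if match else None


sample = "one 234 two 449 three 775 f4our"
print(last_three_digits(sample))
print(last_three_digits("a 234 b 12345", whole_word=True))
```

In the second call the whole-word variant skips 12345 entirely, because no three of its digits are delimited by word boundaries.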

Regular expression of two digit number where two digits are not same

I am trying to write a regular expression that will match a two digit number where the two digits are not same.
I have used the following expression:
^([0-9])(?!\1)$
However, neither "11" nor "12" matches. I thought "12" would match. Can anyone please tell me where I am going wrong?
You need to allow matching 2 digits. Your regex ^([0-9])(?!\1)$ only allows a 1-digit string. Note that a lookahead does not consume characters; it only checks for the presence or absence of something after the current position.
Use
^(\d)(?!\1)\d$
           ^^
See demo
Explanation of the pattern:
^ - start of string
(\d) - match and capture into Group #1 a digit
(?!\1) - make sure the next character is not the same digit as in Group 1
\d - one digit
$ - end of string.
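The corrected pattern is easy to verify with a few inputs. Below is a small Python sketch, assuming the built-in re module; the function name has_two_different_digits is invented for the example.

```python
import re

# (\d)     capture the first digit
# (?!\1)   the next character must not repeat it (consumes nothing)
# \d       the second digit, which the original attempt was missing
TWO_DIFFERENT_DIGITS = re.compile(r"^(\d)(?!\1)\d$")


def has_two_different_digits(text):
    """True for exactly two digits that differ from each other."""
    return TWO_DIFFERENT_DIGITS.search(text) is not None


for sample in ("12", "11", "1", "123"):
    print(sample, "->", has_two_different_digits(sample))
```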