Regex - Extract string between characters if they exist

I need a regex that extracts a string from between delimiter characters (here, the colon character), if those delimiters are present.
Examples:
SX: 22AA 001 267
2294 0BB 267: 09
2294 0CC 267
In all cases, I want a result of the form:
2294 001 267
Thank you all.

You can use this regex to match them all:
(?:^|:)\s?([A-Z\d]+(?: [A-Z\d]+)+)(?:$|:)
NOTE: Since you did not mention which language you are using, I avoided lookarounds. As a result, you have to read the result from the first capture group of the match.
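As a quick check of the pattern above, here is a minimal sketch in Python. Python is an assumption, since the question names no language, and `extract_code` is a hypothetical helper name. It reads the result from the first capture group, as the note describes.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"(?:^|:)\s?([A-Z\d]+(?: [A-Z\d]+)+)(?:$|:)")

def extract_code(line):
    """Return the first capture group, or None when nothing matches."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

# The three sample lines from the question.
for sample in ("SX: 22AA 001 267", "2294 0BB 267: 09", "2294 0CC 267"):
    print(extract_code(sample))
```

A line without at least two space-separated groups of uppercase letters or digits yields None.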


PCRE2 - Match every word whose suffix matches a backreference

Given the string below,
ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555
the goal is to match every word such that the suffix is a number that matches the number that appears after foo (which happens to be 38 in this case).
There is only one substring that begins with foo and ends with a number. The expected matches all exist after said substring.
Expected matches:
gee38
jay38
em38
ou38
q38
I've tried foo(\d+).*?(\w+\1)\b and foo(\d+).*(\w+\1)\b, but they fail to match all, because they either match the first one (gee38) or the last one (q38).
Is it possible to match all with just a single regex and, importantly, in just a single run?
The PCRE2 engine that I use behaves in the same way as https://regex101.com/r/uFEDOE/1. So, if the regex can match multiple substrings on regex101, then the engine that I use can too.
(?:foo|\G(?!^))(\d+).*?(?=(\w+))\w+(?=\1\b)
The pattern can probably be optimized further for size or performance.
@Niko Gambt, let me know if either kind of optimization matters to you.
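For cross-checking the expected matches outside PCRE2, here is a sketch in Python. Note the swapped-in technique: Python's standard `re` module has no `\G` anchor, so this is a two-pass approach (first find the number after foo, then search the rest of the string), not the single-run regex the question asks for. The function name is hypothetical.

```python
import re

def words_ending_in_foo_number(text):
    """Two-pass approach: locate foo<number>, then collect matching words."""
    anchor = re.search(r"foo(\d+)", text)
    if anchor is None:
        return []
    number = re.escape(anchor.group(1))
    # Search only after the foo<number> substring. \w+ requires at least one
    # word character before the number, so a bare "38" is not a match.
    return re.findall(rf"\b\w+{number}\b", text[anchor.end():])

text = ("ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 "
        "el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555")
print(words_ending_in_foo_number(text))
```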

Regex to capture and reposition the same pattern

I have a list of numbers that I would like to reformat, but I'm having difficulty with (I think) the substitution -- I'm capturing the groups as I intend to, but they aren't being rendered the way I expect them to be.
Here's some of the text:
Rear seal:
102
111
112
113
137
156
And the expected output is this:
Rear seal:
102 111 112
113 137 156
I'm using this regex to distinguish the first, second, and third lines:
(\d{3}[\n\r])(\d{3}[\n\r])(\d{3}[\n\r]) coupled with \1\t\2\t\3\n for the substitution. But for some reason it comes out as
Rear seal:
102
111
112
113
137
156
I'm using the excellent site regex101.com for testing, but I could use some human input. Specific link is
https://regex101.com/r/R7niEU/1 for this issue.
Thanks in advance.
You are capturing the newline inside each capturing group, so it also becomes part of the replacement.
Capture only the digits and match the newline outside the groups:
(\d{3})[\n\r](\d{3})[\n\r](\d{3})[\n\r]
Then replace with \1\t\2\t\3\n
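The same substitution can be sketched in Python (an assumption, since the question only uses regex101; `group_in_threes` is a hypothetical name). The sample text is assumed to use single-character line endings and to end with a newline, because the pattern needs a line break after the third number.

```python
import re

def group_in_threes(text):
    """Join each run of three 3-digit lines into one tab-separated line."""
    # The digits are captured; the line breaks are matched outside the groups.
    pattern = r"(\d{3})[\n\r](\d{3})[\n\r](\d{3})[\n\r]"
    return re.sub(pattern, r"\1\t\2\t\3\n", text)

sample = "Rear seal:\n102\n111\n112\n113\n137\n156\n"
print(group_in_threes(sample))
```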

How do I format a list of phone numbers using regular expression in vim commands?

Given the following list of phone numbers
8144658695
812 673 5748
812 453 6783
812-348-7584
(617) 536 6584
834-674-8595
Write a single regular expression (use vim on loki) to reformat the numbers so they look like this
814 465 8695
812 673 5748
812 453 6783
812 348 7584
617 536 6584
834 674 8595
I am using the search and replace command. My regular expression using back referencing:
:%s/\(\d\d\d\)\(\d\d\d\)\(\d\d\d\d\)/\1 \2 \3\g
only formats the first line.
Any ideas?
Try this:
:%s,.*\(\d\d\d\).*\(\d\d\d\).*\(\d\d\d\d\).*,\1 \2 \3,
First, use a count to match a pattern multiple times; it is a bad habit to repeat the pattern:
\d\{3} "instead of \d\d\d
Then you also have to match the whitespace and other separators:
:%s/.*\(\d\{3}\).*\(\d\{3}\).*\(\d\{4}\).*/\1 \2 \3/g
Or, even better, switch the whole regex to "very magic" mode with \v, so the groups and counts need no backslashes:
:%s/\v.*(\d{3}).*(\d{3}).*(\d{4}).*/\1 \2 \3/g
This greatly improves readability.
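To check the pattern against all six sample numbers outside vim, here is a rough Python equivalent. It is a sketch only: Python's `re` syntax matches the `\v` form above, and `reformat` is a hypothetical helper applied to one line at a time.

```python
import re

# Same shape as the vim "very magic" pattern above.
PHONE = re.compile(r".*(\d{3}).*(\d{3}).*(\d{4}).*")

def reformat(line):
    """Rewrite one line as 'ddd ddd dddd'; lines that do not match are unchanged."""
    return PHONE.sub(r"\1 \2 \3", line)

numbers = [
    "8144658695",
    "812 673 5748",
    "812 453 6783",
    "812-348-7584",
    "(617) 536 6584",
    "834-674-8595",
]
for number in numbers:
    print(reformat(number))
```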

phone number RegEx not working for some strings

I want to recognize a phone number as 9 digits that may be separated by whitespace, non-breaking spaces, etc., using the regex "(\s*\d\s*){9}"
I run it from a VBA macro (JS-style RegEx), and here are example strings that work fine with the regex above:
ul. 27 Grudnia 16, tel. 21 287 31 61, fax 61 286 69 60 –
ul. Wrzosowa 110/120/222, kom. 692 601 428
And here is an example where the phone number is not detected in VBA, but is detected by online JS regex tools:
al. Mazowieckiego 63, kom. 622 769 694 –
The strings that are detected and those that are not have the same structure, so I have no idea why VBA doesn't detect the phone number in some of them.
It turned out that VBA had changed some of the strings being searched: it replaced an ordinary space, chr(32), with a non-breaking space, chr(160).
Removing chr(160) from the searched string solves the problem.
I will also try to find a regex that allows non-breaking spaces, because \s* does not match them, at least in VBA.
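One way to allow non-breaking spaces is to list the character explicitly next to \s. The sketch below uses Python rather than VBA, which is an assumption for illustration only. Python's `\s` normally matches chr(160), so the `re.ASCII` flag is used here to imitate an engine whose `\s` does not. `find_phone` is a hypothetical helper.

```python
import re

# \s restricted to ASCII whitespace, imitating the behaviour described above.
ascii_only = re.compile(r"(\s*\d\s*){9}", re.ASCII)

# Same idea, with the non-breaking space (\xa0, chr(160)) listed explicitly.
with_nbsp = re.compile(r"(?:[\s\xa0]*\d[\s\xa0]*){9}", re.ASCII)

def find_phone(text, pattern=with_nbsp):
    """Return the 9 digits with all separators removed, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    return re.sub(r"[\s\xa0]", "", match.group(0))

sample = "al. Mazowieckiego 63, kom. 622\xa0769\xa0694"
print(ascii_only.search(sample))  # no match with ASCII-only \s
print(find_phone(sample))
```

Replacing chr(160) with an ordinary space before matching, as described above, is the other option.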

Italian phone 10-digit number regex issue

I'm trying to use the regex from this site
/^([+]39)?((38[{8,9}|0])|(34[{7-9}|0])|(36[6|8|0])|(33[{3-9}|0])|(32[{8,9}]))([\d]{7})$/
for Italian mobile phone numbers, but a simple number such as 3491234567 is reported as invalid.
(Don't worry about spaces, as I'll trim them.)
Should pass:
349 1234567
+39 349 1234567
TODO: 0039 349 1234567
TODO: (+39) 349 1234567
TODO: (0039) 349 1234567
regex101 and regexr both pass the validation. What's wrong?
UPDATE:
To clarify:
The regex should match any number that starts with either
388/389/380 (38[{8,9}|0])|
or
347/348/349/340 (34[{7-9}|0])|
or
366/368/360 (36[6|8|0])|
or
333/334/335/336/337/338/339/330 (33[{3-9}|0])|
328/329 (32[{8,9}])
plus 7 digits ([\d]{7})
and the +39 at the start optionally ([+]39)?
The following regex appears to fulfill your requirements. I took out the syntax errors and guessed a bit, and added the missing parts to cover your TODO comments.
^(\((00|\+)39\)|(00|\+)39)?(38[890]|34[7-90]|36[680]|33[3-90]|32[89])\d{7}$
Demo: https://regex101.com/r/yF7bZ0/1
Your test cases fail to cover many of the variations captured by the regex; perhaps you'll want to beef up the test set to make sure it does what you want.
The beginning allows for an optional international prefix with or without the parentheses. The basic pattern is (00|\+)39 and it is repeated with or without parentheses around it. (Perhaps a better overall approach would be to trim parentheses and punctuation as well as whitespace before processing begins; you'll want to keep the plus as significant, of course.)
Updated with information from @Edoardo's answer; wrapped for legibility and added comments:
^                            # beginning of line
(\((00|\+)39\)|(00|\+)39)?   # country code or trunk code, with or without parentheses
(                            # followed by one of the following
 32[89]|                     # 328 or 329
 33[013-9]|                  # 33x where x != 2
 34[04-9]|                   # 34x where x not in 1,2,3
 35[01]|                     # 350 or 351
 36[068]|                    # 360 or 366 or 368
 37[019]|                    # 370 or 371 or 379
 38[089])                    # 380 or 388 or 389
\d{6,7}                      # ... followed by 6 or 7 digits
$                            # and end of line
There are obvious accidental gaps which will probably also get filled over time. Generalizing this further is likely to improve resilience toward future changes, but of course may at the same time increase the risk of false positives. Make up your mind about which is worse.
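The commented form can be run directly in an engine with an extended (free-spacing) mode. Here is a minimal sketch using Python's `re.VERBOSE`; Python is an assumption, and `is_italian_mobile` is a hypothetical helper. It strips spaces before matching, as the question says it will.

```python
import re

ITALIAN_MOBILE = re.compile(r"""
    ^                             # beginning of line
    (\((00|\+)39\)|(00|\+)39)?    # country code, with or without parentheses
    (                             # followed by one of the following
      32[89]|                     # 328 or 329
      33[013-9]|                  # 33x where x != 2
      34[04-9]|                   # 34x where x not in 1,2,3
      35[01]|                     # 350 or 351
      36[068]|                    # 360 or 366 or 368
      37[019]|                    # 370 or 371 or 379
      38[089])                    # 380 or 388 or 389
    \d{6,7}                       # followed by 6 or 7 digits
    $                             # and end of line
""", re.VERBOSE)

def is_italian_mobile(number):
    # Assumption: all spaces are removed first, as in the question.
    return ITALIAN_MOBILE.match(number.replace(" ", "")) is not None

for candidate in ("349 1234567", "(0039) 349 1234567", "321 1234567"):
    print(candidate, is_italian_mobile(candidate))
```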
I found this and updated it with new operators and MVNO prefixes (Iliad, ho.):
^(\((00|\+)39\)|(00|\+)39)?(38[890]|34[4-90]|36[680]|33[13-90]|32[89]|35[01]|37[019])\d{6,7}$
I improved the regex by adding a case that handles spaces between the digit groups (and an optional space after the country prefix):
^(\((00|\+)39\)|(00|\+)39)?\s?(38[890]|34[4-90]|36[680]|33[13-90]|32[89]|35[01]|37[019])(\s?\d{3}\s?\d{3,4}|\d{6,7})$
So, for example, it can match a phone number like (0039) 349 123 4567 or 349 123 4567.
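A short Python sketch (Python is an assumption) to check the two examples. The pattern below allows one optional whitespace character after the country prefix, so that the prefixed example with a space matches. `matches_spaced` is a hypothetical helper name.

```python
import re

SPACED = re.compile(
    r"^(\((00|\+)39\)|(00|\+)39)?\s?"
    r"(38[890]|34[4-90]|36[680]|33[13-90]|32[89]|35[01]|37[019])"
    r"(\s?\d{3}\s?\d{3,4}|\d{6,7})$"
)

def matches_spaced(number):
    """True when the number fits the pattern, with or without spaces."""
    return SPACED.match(number) is not None

for candidate in ("(0039) 349 123 4567", "349 123 4567", "3491234567"):
    print(candidate, matches_spaced(candidate))
```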
According to this document:
https://it.qaz.wiki/wiki/Telephone_numbers_in_Italy
a simple regex for Italian MOBILE numbers without special characters is:
/^3[0-9]{8,9}$/
It matches a string that starts with the digit '3' followed by 8 or 9 more digits, e.g.:
3345678103
You can then add the Italian prefix, such as '+39 ' or '0039 ':
/^\+39 3[0-9]{8,9}$/ --- match --> +39 3345678103
/^0039 3[0-9]{8,9}$/ --- match --> 0039 3345678103
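The three variants can be checked with a few lines of Python (an assumption; the answer uses JavaScript-style literals). The plus sign is escaped here, since an unescaped + directly after ^ is a quantifier with nothing to repeat. The dictionary keys are hypothetical labels.

```python
import re

PATTERNS = {
    "bare": re.compile(r"^3[0-9]{8,9}$"),
    "plus_prefix": re.compile(r"^\+39 3[0-9]{8,9}$"),
    "zero_prefix": re.compile(r"^0039 3[0-9]{8,9}$"),
}

def which_pattern(number):
    """Return the label of the first pattern that matches, or None."""
    for label, pattern in PATTERNS.items():
        if pattern.match(number):
            return label
    return None

for candidate in ("3345678103", "+39 3345678103", "0039 3345678103"):
    print(candidate, which_pattern(candidate))
```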