I have two examples
abc 34 def12 ghi
abc 34 33 ghi
and a regexp
^.*?([0-9]{2}) ?([a-z]{2,3})? ?([0-9]{2}).*$
(see https://regex101.com/r/U2JNaS/1)
I need to modify it so that it extracts $1, $2 and $3, but only if $2 is present, i.e. I need it to return:
34 def12
<WRONG>
How to achieve that?
Note that you put a ? after the second capturing group (([a-z]{2,3})).
This makes the whole regex match even when a particular row does not contain the "letter" part.
Just remove this ?, i.e. use ^.*?([0-9]{2}) ?([a-z]{2,3}) ?([0-9]{2}).*$, so that for such a row the whole regex does not match.
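A quick way to check the fix is a small script. This is a sketch assuming Python's re module (the question does not name a language). It applies the question's pattern with the ? after group 2 removed to both example rows. The helper name extract is hypothetical.

```python
import re

# The question's pattern with the "?" after group 2 removed,
# so the 2-3 letter part is now mandatory.
PATTERN = re.compile(r"^.*?([0-9]{2}) ?([a-z]{2,3}) ?([0-9]{2}).*$")


def extract(line):
    """Return the three groups as a tuple, or None if the row does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None


for row in ["abc 34 def12 ghi", "abc 34 33 ghi"]:
    print(row, "->", extract(row))
```

The first row yields ('34', 'def', '12'). The second row has no letter part between the numbers, so it yields None.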
Related
I am facing a (naive) problem with a regular expression.
I need to find any substrings composed of a fixed number (n) of different characters.
So, for "aaabcddd", if n=3 the substrings that I expect to find are: "abc" and "bcd".
My idea is to use n-1 capture groups and '[^' to exclude characters already matched. Thus, I wrote the following Perl regex (in Julia):
r"(([[:alpha:]])[^\2])[^\1]"
But it is not working.
Do you have any tips?
You cannot use a backreference to a capture group inside a negated character class such as [^\1]. Inside a character class, \1 is not a backreference: PCRE, which Julia uses, reads it as an octal escape for the character with code 1.
What you can do is use a negative lookahead to assert that what is directly to the right of the current position is not what you have already captured in a previous group.
If the assertion holds, capture a single letter in a new group.
The matches abc and bcd are in capture group 1.
(?=(([[:alpha:]])(?!\2)([[:alpha:]])(?!\3|\2)[[:alpha:]]))
(?= Positive lookahead
( Capture group 1
([[:alpha:]]) Capture the first char in group 2
(?!\2)([[:alpha:]]) If not looking at what is captured by group 2 to the right, capture the second char in group 3
(?!\3|\2) If not looking to the right at what is captured by group 3 or 2
[[:alpha:]] Match the 3rd char
) Close group 1
) Close the lookahead
Regex demo
Or, a bit shorter, using a case-insensitive match:
(?=(([a-z])(?!\2)([a-z])(?!\3|\2)[a-z]))
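The lookahead technique can be tried out in a minimal sketch, assuming Python's re module instead of Julia. Python's re has no [[:alpha:]], so this uses the case-insensitive [a-z] variant. The function name distinct_triples is hypothetical.

```python
import re

# Case-insensitive variant of the answer's pattern (Python's re has no [[:alpha:]]).
# The whole pattern is a zero-width lookahead, so finditer tries every position
# and overlapping windows such as "abc" and "bcd" are both reported.
# (?!\2) keeps char 2 different from char 1; (?!\3|\2) keeps char 3 different
# from chars 2 and 1.
PATTERN = re.compile(r"(?=(([a-z])(?!\2)([a-z])(?!\3|\2)[a-z]))", re.IGNORECASE)


def distinct_triples(s):
    """Return every 3-letter window of s whose letters are all different."""
    return [m.group(1) for m in PATTERN.finditer(s)]


print(distinct_triples("aaabcddd"))  # ['abc', 'bcd']
```

Because each match is zero-width, the engine advances by one character after every attempt, which is what allows overlapping results.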
Here is a solution for an arbitrary value of n:
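#!/usr/local/bin/perl
use strict; use warnings; use feature ':5.10';
my $s = "aaabcded";
my $n = 3;
# The zero-width lookahead lets //g advance one position at a time,
# so every overlapping window of $n letters lands in $1.
while ($s =~ /(?=([[:alpha:]]{$n}))/g) {
    my $hit   = $1;
    my @chars = split //, $hit;
    my %uniq;
    @uniq{@chars} = ();    # hash slice: one key per distinct character
    say "$hit" if (scalar keys %uniq) == $n;
}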
Running with $n=3 prints:
abc
bcd
cde
Running with $n=4 prints:
abcd
bcde
And $n=5:
abcde
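For comparison, the same approach (a lookahead to collect overlapping windows, then a distinct-character count) can be sketched in Python. This assumes Python's re module and uses [A-Za-z] in place of [[:alpha:]], which re does not support. The function name distinct_windows is hypothetical.

```python
import re


def distinct_windows(s, n):
    """Return every n-letter window of s in which all n letters differ."""
    # A zero-width lookahead lets finditer start at every position, so
    # overlapping windows are captured in group 1, as in the Perl loop.
    # [A-Za-z] stands in for [[:alpha:]], which Python's re does not support.
    pattern = r"(?=([A-Za-z]{%d}))" % n
    hits = (m.group(1) for m in re.finditer(pattern, s))
    # Same test as the Perl hash slice: n distinct keys means n distinct letters.
    return [w for w in hits if len(set(w)) == n]


for n in (3, 4, 5):
    print(n, distinct_windows("aaabcded", n))
```

The set plays the role of the Perl hash: its size is the number of distinct characters in the window.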
I am trying to update a regex pattern to include a Named Capture Group. Currently, this regex pattern:
\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\1\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b
correctly returns 4 matches from this sample text:
AAA
43 42 040 012 036 00
43 42 090 037 124 00
53 07 010 005 124 00
06-14 301-830-081-49
BBB
When I revised the pattern to add a Named Capture Group it only returns 3 matches and misses the last one.
(?<myPattern>\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\1\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b)
How can I keep the Named Capture Group but still return 4 matches?
See example here.
Thanks.
Named capturing groups are still assigned numeric IDs. That is, (?<myPattern>[a-z]+)(\d+) contains two groups: the first one, with ID 1, is the "myPattern" group matching one or more lowercase letters, and the second group is \d+, with ID 2.
In your case, the problem arises from the \1 backreference later in the pattern. Since the named group now has ID 1, \1 refers to the "myPattern" group instead of the separator group, so the matching is incorrect.
To fix the issue, replace \1 with a backreference to the separator group ([-\/\\.]), which now has ID 2, i.e. \2:
(?<myPattern>\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\2\d{1,4})\b(?!\S))(?:[^\n\d\$\.\%]*\d){14}\b)
See the regex demo.
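The group numbering can be checked with a sketch in Python's re module (an assumption; the question does not say which engine is used). Python writes named groups as (?P<name>...), and, as the answer describes, numbers them in order together with the unnamed groups, so the separator group is 2 and the corrected pattern uses \2.

```python
import re

# Python spells a named group (?P<name>...); it still gets numeric ID 1,
# so the separator group ([-\/\\.]) inside it is group 2.
FIXED = re.compile(
    r"(?P<myPattern>\b\d(?!(?:\d{0,3}([-\/\\.])\d{1,2}\2\d{1,4})\b(?!\S))"
    r"(?:[^\n\d\$\.\%]*\d){14}\b)"
)

TEXT = """AAA
43 42 040 012 036 00
43 42 090 037 124 00
53 07 010 005 124 00
06-14 301-830-081-49
BBB"""

print(dict(FIXED.groupindex))  # {'myPattern': 1}
matches = [m.group("myPattern") for m in FIXED.finditer(TEXT)]
for found in matches:
    print(found)
```

All four lines of digits, including 06-14 301-830-081-49, are returned.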
I want to add a digit to the end of a search group, but I can't figure out how to keep the digit from interfering with the group reference in the replacement pattern:
Text: Someword 8888
Pattern: ^(\w+\s\d+)
Replacement pattern: ???
Desired result: Someword 88881
$11 looks for the eleventh search group, and results in an empty string
$1\1 results in Someword 8888Someword 8888
$1\\1 results in Someword 8888\1
I know that this could be done in two separate find/replaces, but I want to know if there is a way this can be done in one.
There are several ways to get your desired result.
You may use a POSIX-like replacement backreference, \1, to insert the Group 1 value: since there can be only 9 such backreferences, \11 is parsed as a backreference to Group 1 followed by a literal 1.
Or use ${1}1, where ${1} is an unambiguous replacement backreference to Group 1, followed by a literal 1.
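If the replacement happens in Python rather than in the editor, neither form applies: Python's re.sub would read \11 in the replacement as a reference to group 11, and it does not accept ${1}. Its unambiguous form is \g<1>. A minimal sketch, assuming Python's re module:

```python
import re

text = "Someword 8888"
# \g<1> is Python's unambiguous group reference (its counterpart of ${1}),
# so the "1" after it is inserted as a literal character.
result = re.sub(r"^(\w+\s\d+)", r"\g<1>1", text)
print(result)  # Someword 88881
```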
I am trying to write a regular expression for a line like:
Funds Disb ABC Corp nmnxcb /abdsd= 12345678912345 abcdef
and retrieve the digits into a named group. I have created a regular expression for the above:
^Funds Disb ABC Corp.*\s+(?<SOMEID>\d+).*$
The problem with that is that it does not match my line if the number (12345678912345 in the above example) is not in the line. I have tried changing it to the version below (adding '?' after the group) so that it would expect 0 or 1 instances of the named group, but after the change it no longer captures the number in the named group at all.
^Funds Disb ABC Corp.*\s+(?<SOMEID>\d+)?.*$
The problem with ^Funds Disb ABC Corp.*\s+(?<SOMEID>\d+)?.*$ is that the first .* will initially eat the entire rest of the line, including all the digits. It has to backtrack a bit in order to satisfy the \s+, but it won't backtrack far enough to find the digits: after all, you TOLD it that the digits were entirely optional.
To fix this, make sure that the regex never skips forward over any digits before the group where you want them to match: use [^\d]* (equivalent to \D*) instead of .*. So try: ^Funds Disb ABC Corp[^\d]*\s+(?<SOMEID>\d+)?.*$
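The difference between the two patterns can be seen in a short sketch, assuming Python's re module, where named groups are written (?P<SOMEID>...). The second sample line (without a number) and the helper extract_id are hypothetical.

```python
import re

# Python spells named groups (?P<name>...); otherwise these are the
# question's pattern and the answer's fix.
BROKEN = re.compile(r"^Funds Disb ABC Corp.*\s+(?P<SOMEID>\d+)?.*$")
FIXED = re.compile(r"^Funds Disb ABC Corp[^\d]*\s+(?P<SOMEID>\d+)?.*$")


def extract_id(pattern, line):
    """Return (matched, SOMEID); SOMEID is None when the optional group is empty."""
    m = pattern.match(line)
    return (False, None) if m is None else (True, m.group("SOMEID"))


with_id = "Funds Disb ABC Corp nmnxcb /abdsd= 12345678912345 abcdef"
without_id = "Funds Disb ABC Corp nmnxcb /abdsd= abcdef"

print(extract_id(BROKEN, with_id))    # (True, None): .* skipped the digits
print(extract_id(FIXED, with_id))     # (True, '12345678912345')
print(extract_id(FIXED, without_id))  # (True, None): the line still matches
```

Both lines still match the fixed pattern; only the presence of SOMEID differs.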
Notepad++: Replacing Multiple Words
Okay, so here's what I need to know. Currently I am searching for multiple words at once; here's the pattern I am using:
(\bACCESS\b)|(\bAccs\b)|(\bALLEY\b)|(\bAlly\b)|(\bALLEYWAY\b)
What I want to do is add a ":" to the end of every word that is found, like this:
41 dwadadad Rd:
93 awdawdadawd Terrace:
4/100 awdadawdwad St:
32 awdawdawdawd Ave:
59 awdawdawd Street: Ferny Grove
Is there a regular expression for only getting the end of the matched word?
I suggest using an alternation list with just two word boundaries, one at the start and one at the end of the pattern, and just one group:
\b(?:Rd|Terrace|St|Ave|Street)\b
And replace with $0: (the $0 backreference refers to the whole match, so if the pattern matched Rd, Rd will be inserted into the resulting string, followed by the colon).
Note that we can use only two \b because they enclose the non-capturing alternation group (?:...) and are thus applied to each alternative. This shortens the regex and speeds it up.
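The same replacement can be sketched in Python's re module (an assumption; the question is about Notepad++). Python writes the whole-match reference as \g<0> instead of $0. The input lines are hypothetical: they are the asker's desired output with the colons removed.

```python
import re

# The answer's pattern: two \b around one non-capturing alternation group.
PATTERN = re.compile(r"\b(?:Rd|Terrace|St|Ave|Street)\b")

# Hypothetical input: the asker's desired output with the colons removed.
lines = [
    "41 dwadadad Rd",
    "93 awdawdadawd Terrace",
    "4/100 awdadawdwad St",
    "32 awdawdawdawd Ave",
    "59 awdawdawd Street Ferny Grove",
]

# \g<0> is Python's spelling of the whole-match reference ($0 in the answer).
result = [PATTERN.sub(r"\g<0>:", line) for line in lines]
for line in result:
    print(line)
```

The \b after the group stops St from matching at the start of Street, so Street gets a single colon.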
All you have to do is change your regex to:
((\bACCESS\b)|(\bAccs\b)|(\bALLEY\b)|(\bAlly\b)|(\bALLEYWAY\b))
And then replace with: \1:
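As a sketch of the outer-group approach, assuming Python's re module, where \1 in the replacement refers to group 1 as in the answer. The input sentence is hypothetical. Because every alternative keeps its own \b, ALLEY does not match inside ALLEYWAY.

```python
import re

# The asker's alternation wrapped in one outer group, so group 1 is whichever
# word matched; the replacement \1: re-inserts it followed by a colon.
PATTERN = r"((\bACCESS\b)|(\bAccs\b)|(\bALLEY\b)|(\bAlly\b)|(\bALLEYWAY\b))"

# Hypothetical input text.
text = "12 Smith ALLEYWAY and 3 Jones ALLEY"
result = re.sub(PATTERN, r"\1:", text)
print(result)  # 12 Smith ALLEYWAY: and 3 Jones ALLEY:
```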