I want to add a digit to the end of a search group, but I can't figure out how to keep the digit from interfering with the group reference in the replacement pattern:
Text: Someword 8888
Pattern: ^(\w+\s\d+)
Replacement pattern: ???
Desired result: Someword 88881
$11 looks for the eleventh search group, and results in an empty string
$1\1 results in Someword 8888Someword8888
$1\\1 results in Someword 8888\1
I know that this could be done in two separate find/replaces, but I want to know if there is a way this can be done in one.
There are several ways to get your desired result.
You may use a POSIX-like replacement backreference, \1, to insert the Group 1 value. Since there can be only 9 such backreferences, \11 is parsed as a backreference to Group 1 followed by a literal 1.
Or, use ${1}1, where ${1} is an unambiguous replacement backreference followed by a literal 1.
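Python's built-in re module does not treat ${1} as a group reference in replacement strings. Its unambiguous equivalent is \g<1>. The sketch below, which assumes Python rather than the tool used in the question, applies the question's pattern and appends a literal 1. It also shows that plain \11 is read as a reference to a nonexistent group 11.

```python
import re

text = "Someword 8888"
pattern = r"^(\w+\s\d+)"

# \g<1> closes the group number explicitly, so the trailing "1" is a literal digit.
result = re.sub(pattern, r"\g<1>1", text)
print(result)  # Someword 88881

# Python parses r"\11" as a reference to group 11, which this pattern does not have.
try:
    re.sub(pattern, r"\11", text)
    ambiguous_failed = False
except re.error as exc:
    ambiguous_failed = True
    print("re.error:", exc)
```

How \11 is treated depends on the engine, which is why the braced or \g<...> form is the safer habit.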
Related
I have the following string
020075307354H 021133360876 981497910079937800ABC CDE FGH THY 0M19780403015001O+2¹qujzh_¢o\piVN¤«²µerNA¥\^?©E|=V_®¢Zu<£;Æ^TV½IÌc¤±·Gl.ÁEÊO·9y¹Bs¾Ë©ºFT¥*ÉA¬=iÚÒ®{æ*»¨;ÄNÕ®Ûòæ¦'Ñ…9>ÙYKè¹t/R{(>ÔÕBã2½7q¹|u…nztf~¦spw_ZX£\¦~Qa²mn¡¨QX«W±¯¯¦¨d£¾}·`B¶M}Qc|AµOÇ~Äd¤·¯HÇaI_¶²ÂÆYC?xÄR²>½HpÃjÁNLifm#ÕEí¾)ZvÇÊzØ)D&¦áÑM¡ç…1F¥Åh9R[9Fä¤Ãå<÷¼T}Ã…©ÎCDNs«E`É?¤eñ/ï´¯Åíÿt
and I want to use 1 Regex substitution to do the following 2 tasks:
Get the substring from position 49 to 58 -> 0079937800
Strip leading zeros from this substring -> 79937800
The desired end result is 79937800.
I figured out that I can get the substring for task 1 by substituting the match of .{48}(.{10}).+ with the first capture group.
For the second task, removing the leading zeros, I figured I can use (\b0*([1-9][0-9]*|0)\b), but how can I combine both tasks into one working substitution?
You can capture the "marker" that follows the 10-character string in a capture group in a positive lookahead, then match the desired substring with an arbitrary number of leading zeroes, and follow it with another positive lookahead to ensure that it is followed by the marker captured in the first capture group. The desired substring will then be in the second capture group:
^.{48}(?=.{10}(.*))0*(.*?)(?=\1)
Demo: https://regex101.com/r/Q61KYJ/1
Since you commented that the requirement for a substitution is mandated by your software, you can simply add .* at the end of the above regex and substitute the match with the second capture group:
^.{48}(?=.{10}(.*))0*(.*?)(?=\1).*
Demo: https://regex101.com/r/FcRAGB/1
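A quick way to check the substitution outside regex101 is Python's re.sub, which supports the same lookaheads and the \1 backreference inside them. The input below is a hypothetical stand-in: 48 filler characters, the 10-character field from the question, then arbitrary trailing text. The real string contains binary data that is not reproduced here.

```python
import re

# Hypothetical sample: 48 filler characters, the 10-character field, then trailing data.
text = "X" * 48 + "0079937800" + "ABC CDE FGH"

# Group 1 (inside the first lookahead) captures everything after the 10-character field.
# 0* eats the leading zeros; group 2 lazily grows until what follows equals group 1.
# The final .* consumes the rest so the whole string is replaced.
pattern = r"^.{48}(?=.{10}(.*))0*(.*?)(?=\1).*"

result = re.sub(pattern, r"\2", text)
print(result)  # 79937800
```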
I have the regex
(\d|(IV|I{0,3})|\bone\b|\btwo\b|\bthree\b|\bfour\b)[\w\s]+
if I use the sentence
'1 has wound' - 1 is matched in group 1 as expected
'IV has wound' - IV is matched in group 1 as expected
but, the sentence
'one has wound' - the word one doesn't get matched in group 1
When I modify the regex as follows
(\bone\b|\btwo\b|\bthree\b|\bfour\b|\d|(IV|I{0,3}))[\w\s]+
the group matches as expected.
So my question is: why does changing the order of the alternatives work?
I tried looking up ordering and precedence for regex but couldn't find anything relevant.
I think you made a mistake in your regex; it should be
(\d|(IV|I{1,3})|\bone\b|\btwo\b|\bthree\b|\bfour\b)[\w\s]+
Notice it's I{1,3}, not I{0,3}.
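Because I{0,3} can also match zero I characters, the alternative (IV|I{0,3}) succeeds with an empty match at the start of 'one has wound'. Alternatives are tried from left to right, and the first one that lets the overall match succeed wins, so group 1 captures an empty string and [\w\s]+ consumes the whole sentence. Moving the word alternatives to the front only hides the problem, because they are then tried before the empty match.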
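The effect of I{0,3} versus I{1,3} can be reproduced with Python's re module, whose alternation is also ordered left to right. This sketch runs the three example sentences against the original and the corrected pattern and prints what group 1 captured.

```python
import re

# Original: (IV|I{0,3}) can match the empty string, before the word alternatives are tried.
original = re.compile(r"(\d|(IV|I{0,3})|\bone\b|\btwo\b|\bthree\b|\bfour\b)[\w\s]+")
# Corrected: I{1,3} requires at least one I, so it cannot match empty.
fixed = re.compile(r"(\d|(IV|I{1,3})|\bone\b|\btwo\b|\bthree\b|\bfour\b)[\w\s]+")

for sentence in ["1 has wound", "IV has wound", "one has wound"]:
    print(repr(sentence),
          repr(original.match(sentence).group(1)),
          repr(fixed.match(sentence).group(1)))
```

For 'one has wound', the original pattern's group 1 is the empty string while the corrected one captures 'one'.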
I have the following Regex
(?:(?:zero|one|two|three|four|five|six|seven|eight|nine|[0-9])\s*){4,}
As you can see, it matches numbers separated by whitespace.
Question
How do I stop it from matching the final whitespace character?
For example:
1 2 3 4 5<whitespace>
should rather be:
1 2 3 4 5
The way you wrote the regex, trailing whitespace will always be part of the match, and there is no way to get rid of it. You need to rewrite the pattern so that the number-matching part appears once and is then repeated, preceded by whitespace, inside a group whose limiting quantifier has its minimum value decremented by one ({3,} instead of {4,}).
Schematically, it looks like
<NUMPATTERN>(?:\s+<NUMPATTERN>){3,}
See the regex demo.
In PCRE and Ruby, you may reuse a capture group's pattern with a subroutine call such as \g<1> (PCRE also accepts the (?1) syntax) to shorten the pattern:
(zero|one|two|three|four|five|six|seven|eight|nine|[0-9])(?:\s+\g<1>){3,}
See the regex demo
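Python's built-in re module has no subroutine calls, so a Python version of the schematic pattern has to repeat the number alternatives. The sketch below interpolates them twice with an f-string. The name NUM is only a local helper for this example. It compares the question's pattern with the rewritten one on a string that ends in a space.

```python
import re

# Number alternatives from the question, with [0-9] as a real character class.
NUM = r"(?:zero|one|two|three|four|five|six|seven|eight|nine|[0-9])"

# Question's pattern: every number carries its own trailing \s*, so the last space is kept.
original = re.compile(rf"(?:{NUM}\s*){{4,}}")
# Rewritten: one number, then 3 or more "whitespace + number" repetitions.
rewritten = re.compile(rf"{NUM}(?:\s+{NUM}){{3,}}")

text = "1 2 3 4 5 "
print(repr(original.search(text).group()))   # '1 2 3 4 5 '
print(repr(rewritten.search(text).group()))  # '1 2 3 4 5'
```

The doubled braces {{3,}} are f-string escapes that produce the literal quantifier {3,}.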
Scenario
I have to grab a substring from a composed string.
Match condition:
string starts with 'section1:'
the captured string may be a blank-separated or dash-separated list of alphanumeric values
if the captured string ends with a specific suffix ('-xx'), exclude the suffix from the captured string.
Examples
section1:ypsilon : section 1 matches, grab 'ypsilon'
section1:ypsilon zeta : section 1 matches, grab 'ypsilon zeta'
section1:ypsilon-zeta : section 1 matches, grab 'ypsilon-zeta'
section1:ypsilon-xx : section 1 matches, grab 'ypsilon', exclude '-xx'
section1:ypsilon zeta-xx : section 1 matches, grab 'ypsilon zeta', exclude '-xx'
section1:ypsilon-zeta-xx : section 1 matches, grab 'ypsilon-zeta', exclude '-xx'
section2:ypsilon : section 2 does not match
Solution so far
^section1:([a-zA-Z0-9\- ]+)(\-xx)?$
The idea is to capture the value in group 1, while group 2 (the suffix) is optional.
Demo.
Question
Unfortunately, the suffix also matches the group 1 character class, since it consists of a dash and letters, so the captured strings do not exclude the suffix.
Any clue?
You were close; the main problem you're facing is the greediness of the quantifier.
n+ will match as many n as possible; to make it match as few as possible, suffix it with ?, giving n+?.
I ended up with this regex (Demo here):
^section1:([a-zA-Z0-9\- ]+?)(|-xx)$
The main difference is the ? after the +, which makes it non-greedy (or reluctant). I also prefer an alternation between the empty string and the desired suffix, (|-xx), instead of an optional group: it matches either nothing or -xx before the end of the line.
I have no strong argument for either form; it's a matter of taste, I think.
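To see the difference greediness makes, the sketch below runs the question's greedy pattern and the lazy pattern from this answer over the question's examples using Python's re module. The helper group1 is hypothetical, introduced only for this illustration.

```python
import re

greedy = re.compile(r"^section1:([a-zA-Z0-9\- ]+)(\-xx)?$")  # question's attempt
lazy = re.compile(r"^section1:([a-zA-Z0-9\- ]+?)(|-xx)$")    # lazy + empty-or-suffix

examples = [
    "section1:ypsilon",
    "section1:ypsilon zeta",
    "section1:ypsilon-zeta",
    "section1:ypsilon-xx",
    "section1:ypsilon zeta-xx",
    "section1:ypsilon-zeta-xx",
    "section2:ypsilon",
]

def group1(compiled, s):
    """Hypothetical helper: return group 1 of a full match, or None if there is no match."""
    m = compiled.match(s)
    return m.group(1) if m else None

for s in examples:
    print(f"{s!r:28} greedy={group1(greedy, s)!r:16} lazy={group1(lazy, s)!r}")
```

The greedy + swallows -xx and leaves the optional group empty, while the lazy +? stops as soon as -xx followed by the end of the string can match.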
Use an alternation with -xx in a non-capturing group, and add ? to make the + lazy so that -xx is not sucked up into the match:
(?<=^section1):([a-zA-Z0-9\- ]+?)(?:-xx|:)
Demo
If you don't have the second : to use as a bookmark, use $ instead:
(?<=^section1):([a-zA-Z0-9\- ]+?)(?:-xx|\s*$)
Demo 2
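The end-of-string variant also works in Python's re module, which accepts the lookbehind (?<=^section1) because ^ is zero-width and section1 has a fixed length. This is a minimal check under that assumption. The helper extract is hypothetical.

```python
import re

# Lookbehind anchors "section1" at the start; the lazy group stops before -xx or end of string.
pattern = re.compile(r"(?<=^section1):([a-zA-Z0-9\- ]+?)(?:-xx|\s*$)")

def extract(s):
    """Hypothetical helper: return the captured value, or None when section1 does not match."""
    m = pattern.search(s)
    return m.group(1) if m else None

for s in ["section1:ypsilon-zeta-xx", "section1:ypsilon zeta", "section2:ypsilon"]:
    print(s, "->", extract(s))
```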
I have a regex to match strings of the form x=y, i.e. a name assigned a value. The value can optionally be quoted, and both name and value conform to \w+.
My regex is
\w+=\w+|"\w+"|'\w+'
There can be multiple of these assignments on one line, but here I ran into problems: for some reason, when I enclose this regex in (?:), it won't match. See the test case below:
use strict;
use warnings;
use feature 'say';    # say() is not available without this (or use v5.10)
use Test::More;
my $re1 = qr/^\w+=\w+|"\w+"|'\w+'$/p;
my $re2 = qr/^(?:\w+=\w+|"\w+"|'\w+')$/p;
ok('xy="abc"' =~ $re1);
say "PREMATCH ${^PREMATCH}";
say "MATCH ${^MATCH}";
say "POSTMATCH ${^POSTMATCH}";
ok('xy="abc"' =~ $re2);
done_testing;
Output is
ok 1
PREMATCH xy=
MATCH "abc"
POSTMATCH
not ok 2
# Failed test at ./test.pl line 20.
1..2
# Looks like you failed 1 test of 2.
I don't understand why the first one matches and the second doesn't. I also don't understand why the first one matches only the part after the equals sign.
You are having an issue with your alternation. | has the lowest precedence of any regex operator, so the entire part of the regex before the first pipe, including the ^ anchor, is taken as one option. In other words,
/^\w+=\w+|"\w+"|'\w+'$/
is parsed into three possibilities to match
^\w+=\w+
"\w+"
or
'\w+'$
To fix this, you have two choices (that I see). The first is to expand each alternative to what you really want, anchoring each one at both ends:
/^\w+=\w+$|^\w+="\w+"$|^\w+='\w+'$/
The second is to cluster the alternation:
/^\w+=(?:\w+|"\w+"|'\w+')$/
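Python's re module gives | the same low precedence as Perl, so the three regexes can be compared there. The sketch uses re.search as a rough analogue of Perl's =~ and text[:m.start()] in place of ${^PREMATCH}. It reproduces the question's results and shows the clustered version matching the whole assignment.

```python
import re

text = 'xy="abc"'

re1 = re.compile(r"""^\w+=\w+|"\w+"|'\w+'$""")       # question's first regex
re2 = re.compile(r"""^(?:\w+=\w+|"\w+"|'\w+')$""")   # question's grouped regex
fixed = re.compile(r"""^\w+=(?:\w+|"\w+"|'\w+')$""")  # alternation clustered after "="

m1 = re1.search(text)
m2 = re2.search(text)
m3 = fixed.search(text)

# re1 only finds the "\w+" alternative, i.e. the quoted value after the equals sign.
print("PREMATCH", text[:m1.start()], "MATCH", m1.group())
print("re2:", m2)                  # no alternative spans the whole string
print("fixed:", m3.group())        # the whole assignment
```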
Your
^\w+=\w+|"\w+"|'\w+'$
is equivalent to
(?:^\w+=\w+)|(?:"\w+")|(?:'\w+'$)
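which matches a name=value pair at the start of the string, OR a double-quoted word anywhere in the string, OR a single-quoted word at the end of the string. That is why the first regex matches only "abc" (with its quotes), the part after the equals sign.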
Your
^(?:\w+=\w+|"\w+"|'\w+')$
Requires every alternative in the group to start at the beginning of the string (due to the ^ outside the group) and to end at the end of the string (due to the $ outside the group). 'xy="abc"' is neither an unquoted \w+=\w+ nor a quoted word spanning the whole string, so nothing matches.
The simplest way to make the grouped version behave like the first one is to move both the ^ and $ into the group:
(?:^\w+=\w+|"\w+"|'\w+'$)