Replace nested double brace pair with single [duplicate]


Any ideas how to replace:
..((....))..
With:
..(...)..
Note that this is not a straightforward replacement of "((" with "(". The expression must determine that the child pair being removed is contained directly within the parent pair, with no other content between them.
Bonus points if anyone can figure out how to make this work recursively, e.g. turning "(((...)))" into "(...)".

You can use this:
([(]*)(?:\([^)]*\))([)]*)
Then strip the captured outer parentheses: if group 1 and group 2 have the same length, replace both with an empty string; otherwise remove only as many parentheses from each side as the shorter group contains.
Test:
(ABC)
((ABC))
(((ABC)))
((ABC)a)
Match Information:
Match 1
Full match 0-5 `(ABC)`
Group 1. 0-0 ``
Group 2. 5-5 ``
--> Hence, no update required
Match 2
Full match 6-13 `((ABC))`
Group 1. 6-7 `(`
Group 2. 12-13 `)`
--> Group 1 and Group 2 are the same size, so replace both with '' to get '(ABC)'
Match 3
Full match 14-23 `(((ABC)))`
Group 1. 14-16 `((`
Group 2. 21-23 `))`
--> Same in this case as well
Match 4
Full match 24-30 `((ABC)`
Group 1. 24-25 `(`
Group 2. 30-30 ``
--> Group 1 and Group 2 are not the same size, so reduce to the smaller one, which is Group 2 (size 0); no update is required and the input stays '((ABC)a)'
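The "strip the shorter of the two groups" rule can be sketched in Python with a replacement callback. This is only an illustration under assumptions: the function name collapse_nested is made up for this example, and Python's re module stands in for whatever engine you are actually using.

```python
import re

# Pattern from the answer: optional outer "(" run, one innermost pair, optional ")" run.
NESTED = re.compile(r"([(]*)(?:\([^)]*\))([)]*)")

def collapse_nested(text: str) -> str:
    """Remove redundant parentheses that directly wrap an innermost pair."""
    def strip_outer(match: re.Match) -> str:
        # Only as many wrappers can be removed as exist on BOTH sides.
        n = min(len(match.group(1)), len(match.group(2)))
        whole = match.group(0)
        # Group 1 is a prefix and group 2 a suffix of the match,
        # so dropping n characters from each end drops n wrapper pairs.
        return whole[n:len(whole) - n]
    return NESTED.sub(strip_outer, text)

for sample in ["(ABC)", "((ABC))", "(((ABC)))", "((ABC)a)", "..((....)).."]:
    print(sample, "->", collapse_nested(sample))
```

Because the callback removes every matching wrapper pair at once, the recursive case "(((...)))" is handled in a single pass.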

Related

Substitute one group with another group

<P1 x="-0,36935" y="0,26315"/><P2 x="4,29731" y="0,26315"/><P3 x="5,29731" y="-0,40351"/><P4 x="-0,36935" y="-0,40351"/>
<P1 x="4,64065" y="0,26315"/><P2 x="5,97398" y="0,26315"/><P3 x="5,30731" y="-0,40351"/><P4 x="4,64065" y="-0,40351"/>
I want to copy the x value of P3 into the x attribute of P2.
So far I have a somewhat working solution, but it is not the prettiest:
((?:P2 x=\")(.*?[^\"]+)(?=.+((?:P3 x=\")(.*?[^\"]+))))
It forces me to use P2 x="\4 as the substitution instead of simply \4.
https://regex101.com/r/iua3p0/1
What I am trying to do is separate Group 2 from Group 1 and Group 4 from Group 3, while still being able to substitute the value of Group 4 for Group 2.

You may try this regex:
(?<=<P2 x=")[\d,-]+(?=.*<P3 x="([\d,-]+)")
Substitution:
\1
(?<=<P2 x=") // positive lookbehind: the match must be preceded by <P2 x="
[\d,-]+ // one or more digits, commas or minus signs
(?=.*<P3 x="([\d,-]+)") // positive lookahead: find the x value of P3 and capture it in group 1
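For anyone who wants to try this outside regex101, here is a minimal sketch in Python. The function name copy_p3_x_into_p2 is made up for this example, and it assumes each set of P1..P4 elements sits on a single line, since . does not match a newline by default.

```python
import re

# Pattern from the answer; group 1 is captured inside the lookahead.
P2_X = re.compile(r'(?<=<P2 x=")[\d,-]+(?=.*<P3 x="([\d,-]+)")')

def copy_p3_x_into_p2(line: str) -> str:
    """Replace the x value of P2 with the x value of the P3 on the same line."""
    return P2_X.sub(r"\1", line)

line = ('<P1 x="-0,36935" y="0,26315"/><P2 x="4,29731" y="0,26315"/>'
        '<P3 x="5,29731" y="-0,40351"/><P4 x="-0,36935" y="-0,40351"/>')
print(copy_p3_x_into_p2(line))
```

Only the digits inside the P2 attribute are consumed by the match, so the substitution really is just \1.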

RegEx - Match a number that has different beginnings

I'm new to RegEx and I'm trying to match an 8-digit number that can be preceded by one of 3 prefixes:
00
15620450000
VS
For Example:
1562045000012345678
VS12345678
0012345678
12345678
I don't want to match the 4th option.
So far I have managed to match the first and third options, but I'm having problems with the second one. I wrote this expression to capture the 8 digits in a group named 'Project':
156204500|VS|00(?<Project>\d{8})
What should I do?
Thanks

With the samples you have shown, please try the following regex:
^(?:00|15620450000|VS)(\d{8})$
Or, to capture the digits in a group named Project, try:
^(?:00|15620450000|VS)(?<Project>\d{8})$
Explanation:
^(?:00|15620450000|VS) ## From the start of the value, match 00, 15620450000 or VS in a non-capturing group, as per the question.
(?<Project>\d{8} ## Open a capturing group named Project that matches exactly 8 digits.
)$ ## Close the capturing group and require the end of the value.
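A quick way to check the four samples from the question is a small Python sketch. Two assumptions to note: Python's re module spells named groups (?P<Project>...) rather than (?<Project>...), and the helper name project_number is invented for this example.

```python
import re

# Same pattern as above, using Python's (?P<name>...) named-group syntax.
PROJECT_RE = re.compile(r"^(?:00|15620450000|VS)(?P<Project>\d{8})$")

def project_number(value: str):
    """Return the 8-digit project number, or None if the prefix is missing."""
    match = PROJECT_RE.match(value)
    return match.group("Project") if match else None

for sample in ["1562045000012345678", "VS12345678", "0012345678", "12345678"]:
    print(sample, "->", project_number(sample))
```

The first three samples yield 12345678; the bare 12345678 has no prefix and yields None.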

Let's understand why your solution failed; that will help you get around this kind of problem in the future. Your regex, 156204500|VS|00(\d{8}), is processed as follows:
156204500 OR VS OR 00(\d{8})
In arithmetic,
1 + 2 + 3 (4 + 5) <--- (4 + 5) is multiplied with only 3
is different from
(1 + 2 + 3) (4 + 5) <--- (4 + 5) is multiplied with (1 + 2 + 3)
The same rule applies to RegEx. Obviously, you intended to use the second form.
By now, you have probably figured out the following solution:
(15620450000|VS|00)(\d{8})
Note that a capturing group only makes sense if you actually want to capture its content. Otherwise, regex offers the non-capturing group, which you get by putting ?: right after the opening parenthesis. With a non-capturing group, the final solution becomes:
(?:15620450000|VS|00)\d{8}
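The precedence point is easy to see by running both forms against one input. This is a small sketch in Python; the variable names are arbitrary.

```python
import re

# Ungrouped: the alternatives are "156204500", "VS", and "00(\d{8})".
ungrouped = re.compile(r"156204500|VS|00(\d{8})")
# Grouped: the three prefixes are alternatives, and \d{8} follows any of them.
grouped = re.compile(r"(?:15620450000|VS|00)(\d{8})")

sample = "VS12345678"

m1 = ungrouped.search(sample)
print(m1.group(0), m1.group(1))   # the "VS" branch matches alone; the digit group never participates

m2 = grouped.search(sample)
print(m2.group(0), m2.group(1))   # the whole value matches and the digits are captured
```

In the ungrouped form the digits belong only to the 00 branch, which is why the VS input never reaches them.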

Why is this regex performing partial matches?

I have the following raw data:
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19 ...
I'm using this regex to remove duplicates:
([^.]+)(.[ ]*\1)+
which results in the following:
1.2.4.5.9.115.16.19 ...
The problem is how the regex handles 1.1 in the substring .11.15. What should be 9.11.15.16 becomes 9.115.16. How do I fix this?
The raw values are sorted in numeric order to accommodate the regex used for processing the duplicate values.
The regex is being used within Oracle's REGEXP_REPLACE
The period is only a delimiter. I've tried commas and pipes instead, but that doesn't fix the problem.

Oracle's regex engine does not work the way you intended. You could split the string and select the distinct rows using the general method described in "Splitting string into multiple rows in Oracle". Another option is to use XMLTABLE, which works for numbers and, with proper quoting, for strings as well.
SELECT LISTAGG(n, '.') WITHIN GROUP (ORDER BY n) AS n
FROM (
  SELECT DISTINCT TO_NUMBER(column_value) AS n
  FROM XMLTABLE(replace('1.1.2.2.4.4.4.5.5.9.11.15.16.16.19', '.', ','))
);
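If the value can be processed outside the database, the same split / distinct / re-join idea can be sketched in Python. This is not Oracle code, just an illustration of the logic; the function name dedupe_numbers is made up.

```python
def dedupe_numbers(val: str, sep: str = ".") -> str:
    """Split on the delimiter, keep distinct numbers, and join them in numeric order."""
    distinct = {int(token) for token in val.split(sep)}
    return sep.join(str(n) for n in sorted(distinct))

print(dedupe_numbers("1.1.2.2.4.4.4.5.5.9.11.15.16.16.19"))
```

Like the SQL version, this compares whole tokens, so 11 and 15 can never be mistaken for a repeated 1.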

Unfortunately, Oracle doesn't provide a token to match a word boundary position: neither the familiar \b nor the older [[:<:]] and [[:>:]].
But for this specific data set you can use:
(\d+\.)(\1)+
Note: You forgot to escape the dot.

Your regex caught:
a 1 - the second digit in 11,
then a dot,
and finally 1 - the first digit in 15.
So your regex failed to catch the whole sequence of digits.
The most natural way to write a regex that catches the whole sequence of digits would be to use:
a lookbehind for either the start of the string or a dot,
then a sequence of digits,
and finally a lookahead for a dot.
But as I am not sure whether Oracle supports lookarounds, I wrote the regex another way:
(^|\.)(\d+)(\.(\2))+
Details:
(^|\.) - Either the start of the string or a dot (group 1), instead of the lookbehind.
(\d+) - A sequence of digits (group 2).
( - Start of group 3, containing:
\.(\2) - A dot and the same sequence of digits that group 2 caught.
)+ - End of group 3, which may occur multiple times.
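To see the difference between the two patterns, here is a small Python sketch. Python's re module is only a stand-in for Oracle's engine, and the replacement \1\2 (put back the leading delimiter and one copy of the number) is my assumption, since the answer does not spell it out.

```python
import re

raw = "1.1.2.2.4.4.4.5.5.9.11.15.16.16.19"

# The question's pattern: nothing stops a match from starting in the middle of "11".
broken = re.sub(r"([^.]+)(.[ ]*\1)+", r"\1", raw)

# The pattern above: a match may only start at the string start or right at a dot.
fixed = re.sub(r"(^|\.)(\d+)(\.(\2))+", r"\1\2", raw)

print(broken)  # 1.2.4.5.9.115.16.19
print(fixed)   # 1.2.4.5.9.11.15.16.19
```

One caveat: this pattern has no check on the right-hand side, so an input such as 1.15 would still be collapsed to 15. The sample data happens not to contain such a pair.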

Group the repeating pattern and remove it
As revo has indicated, a big source of your difficulties was the unescaped period. In addition, the 115 in the resulting string can be explained as follows (Valdi_Bo made a similar observation earlier).
Even with the period escaped, ([^.]+)(\.[ ]*\1)+ still matches the 1.1 inside 11.15:
SCOTT#DB> SELECT
  '11.15' val,
  regexp_replace('11.15','([^.]+)(\.[ ]*\1)+','\1') deduplicated
FROM
  dual;
VAL   DEDUPLICATED
11.15 115
Here is a similar approach that addresses those problems:
matching pattern composition
- Look for a matching list of non-period characters of length 0 to N (this subexpression is referenced by \1).
'19' matches ([^.]*)
- Look for the repeats, which form the second subexpression, referenced by \2.
'19.19.19' matches ([^.]*)([.]\1)+
- Look for either a period or the end of the string. This subexpression is referenced by \3. It prevents '11.15' from being turned into '115'.
([.]|$)
replacement string
I replace the matched pattern with the first instance of the non-period matching list, followed by the trailing delimiter (if any):
\1\3
Solution
regexp_replace(val,'([^.]*)([.]\1)+([.]|$)','\1\3')
Here is an example using some permutations of your examples:
SCOTT#db> WITH tst AS (
  SELECT
    '1.1.2.2.4.4.4.5.5.9.11.15.16.16.19' val
  FROM
    dual
  UNION ALL
  SELECT
    '1.1.1.1.2.2.4.4.4.4.4.5.5.9.11.11.11.15.16.16.19' val
  FROM
    dual
  UNION ALL
  SELECT
    '1.1.2.2.4.4.4.5.5.9.11.15.16.16.19.19.19' val
  FROM
    dual
) SELECT
  val,
  regexp_replace(val,'([^.]*)([.]\1)+([.]|$)','\1\3') deduplicate
FROM
  tst;
VAL                                              DEDUPLICATE
------------------------------------------------------------------------
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19               1.2.4.5.9.11.15.16.19
1.1.1.1.2.2.4.4.4.4.4.5.5.9.11.11.11.15.16.16.19 1.2.4.5.9.11.15.16.19
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19.19.19         1.2.4.5.9.11.15.16.19
My approach does not address possible spaces in the string. One could just remove them separately (e.g. through a separate replace statement).
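The same pattern and replacement can be tried outside Oracle. Here is a minimal Python sketch; Python's engine is assumed to behave like Oracle's for this pattern, and the name deduplicate is just for illustration.

```python
import re

# Group 3 consumes the trailing "." (or matches the end of the string),
# so a repeat is only accepted when the repeated token is complete.
DEDUP = re.compile(r"([^.]*)([.]\1)+([.]|$)")

def deduplicate(val: str) -> str:
    """Collapse runs of identical period-delimited tokens into one token."""
    return DEDUP.sub(r"\1\3", val)

for val in [
    "1.1.2.2.4.4.4.5.5.9.11.15.16.16.19",
    "1.1.1.1.2.2.4.4.4.4.4.5.5.9.11.11.11.15.16.16.19",
    "1.1.2.2.4.4.4.5.5.9.11.15.16.16.19.19.19",
]:
    print(val, "->", deduplicate(val))
```

All three inputs come out as 1.2.4.5.9.11.15.16.19, matching the query output above.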

Extract filename and id from its name

I have a file containing this text:
# co2a0000123.rd
# co2c0000124.rd
I need a regex that extracts co2a0000123 into group 1 and the letter a or c (as in co2a / co2c) into group 2.
I have tried:
(\B[a|c])([a-z0-9]+).(?:[a-z]+)
The part ([a-z0-9]+).(?:[a-z]+) gives co2a0000123 in group 1 as desired, but as soon as I add (\B[a|c]) at the beginning or end, group 1 changes to co2a and Group 2 gives 'a'.

Try, for example, \s(\w+?([ac])\w*)\.
Group 1 will be the part between a space and a dot.
Group 2 will be the first a or c within Group 1, not counting its first character.
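Here is a short Python sketch of that pattern against the two lines from the question; the variable names are arbitrary.

```python
import re

# Group 1: the whole name between the space and the dot.
# Group 2: the first "a" or "c" after at least one leading character (lazy \w+?).
NAME_RE = re.compile(r"\s(\w+?([ac])\w*)\.")

text = "# co2a0000123.rd\n# co2c0000124.rd"

pairs = NAME_RE.findall(text)   # one (group 1, group 2) tuple per match
for name, letter in pairs:
    print(name, letter)
```

Because \w+? must consume at least one character, the leading c of co2c0000124 is skipped and the later c is the one captured.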

RegEx multiple capture groups replaced in a string

I have a string of data...
"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
I want to split that string into 5 strings:
"123456712J456","D","TEST1"
"123456712J456","D","TEST2"
"123456712J456","D","TEST3"
"123456712J456","D","TEST4"
"123456712J456","D","TEST5"
I currently have the following regex...
//In a program like Textpad
<FIND> "\(.\{13\}\)","D","\([^~]*\)~\(.*\)
<REPLACE> "\1","D","\2"\n"\1","D","\3
//On the regex101 site
"(.{13})","D","([^~]*)~(.*)
If I run this 5 times, it works fine. The problem is that the number of lines to be produced is unknown. For example...
"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
"123456712J457","D","TEST1~TEST2~TEST3"
"123456712J458","D","TEST1~TEST2"
"123456712J459","D","TEST1~TEST2~TEST3~TEST4"
I was hoping to use a multi-capture group to make this work. I found this page, which describes the common mistake of confusing repeating a capturing group with capturing a repeated group. I need to capture a repeated group, but for some reason I could not make mine work. Does anyone else have an idea?
RESOURCES:
http://www.regular-expressions.info/captureall.html
http://regex101.com/

Try this (see the demo). Then combine match 1 with each of the remaining matches.
http://regex101.com/r/yR3mM3/17
RegEx:
(.*,)|([^"~]+)
Example:
"1234567123456","T","TEST1~TEST2~TEST3~TEST4~TEST5"
Results:
MATCH 1
1. [0-20] `"1234567123456","T",`
MATCH 2
2. [21-26] `TEST1`
MATCH 3
2. [27-32] `TEST2`
MATCH 4
2. [33-38] `TEST3`
MATCH 5
2. [39-44] `TEST4`
MATCH 6
2. [45-50] `TEST5`
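The "combine match 1 with each of the remaining matches" step has to happen in code rather than in the regex itself. A minimal Python sketch, assuming one record per line and no commas, quotes or tildes inside the TEST values (the function name expand is made up):

```python
import re

# Alternative 1 grabs everything up to the last comma; alternative 2 grabs each item.
PARTS = re.compile(r'(.*,)|([^"~]+)')

def expand(line: str) -> list:
    """Turn one "id","D","A~B~C" record into one record per ~-separated item."""
    prefix = ""
    records = []
    for head, item in PARTS.findall(line):
        if head:                 # match 1: the shared "id","D", prefix
            prefix = head
        elif item:               # later matches: one item each
            records.append(f'{prefix}"{item}"')
    return records

data = '''"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
"123456712J458","D","TEST1~TEST2"'''

# Process line by line so the newline is never picked up as an item.
for line in data.splitlines():
    for record in expand(line):
        print(record)
```

This handles any number of items per line, which is the part that a single find-and-replace pass cannot do.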
