Regex: find string between curly brackets, which itself contains curly brackets - regex

Suppose a string in the following format:
Use \hyperlink{aaa}{apple {pear} banana} and \hyperlink{bbb}{banana {pear} {apple}}.
I want to extract:
\hyperlink{aaa}{apple {pear} banana}
\hyperlink{bbb}{banana {pear} {apple}}
What regex could be used for such an extraction?
I got stuck with this:
\\hyperlink{\S+}{.+}

Here is how you can do it with a recursive regex:
\\hyperlink\{[^}]+?\}(\{(?>[^{}]+|(?1))+\})(?=\s|$)
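Recursion such as (?1) is a PCRE-style feature, and some engines (Python's built-in re module, for instance) do not support it. As a hedged alternative for such environments, the sketch below uses a plain regex only to find each \hyperlink{...}{ prefix and then counts brace depth by hand. The function name extract_hyperlinks is made up for illustration, and the sketch assumes braces in the text are never escaped.

```python
import re

def extract_hyperlinks(text):
    # Hypothetical helper: returns every \hyperlink{...}{...} span,
    # allowing arbitrarily nested braces in the second argument.
    results = []
    for m in re.finditer(r"\\hyperlink\{[^{}]*\}\{", text):
        depth = 1          # the prefix already consumed one opening brace
        i = m.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:     # skip spans whose braces never balance
            results.append(text[m.start():i])
    return results

text = r"Use \hyperlink{aaa}{apple {pear} banana} and \hyperlink{bbb}{banana {pear} {apple}}."
for span in extract_hyperlinks(text):
    print(span)
```

This trades the single-pattern elegance of the recursive regex for portability: the loop does the balancing that (?1) would otherwise do.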

If there is no arbitrary nesting (at most one level of inner braces), you can use a pattern built on the negated character class [^}{], like
\\hyperlink{[^}{]*}{[^}{]*(?:{[^}{]*}[^}{]*)*}
This is similar to this answer, but unrolled. See the demo at regex101. To extract the individual parts, use capturing groups (demo).
Depending on your environment or regex flavor, it can be necessary to escape each opening { that is not inside a character class with a backslash so that it matches literally.
Also note that \S+ can consume } and that a greedy .+ can match more than intended.
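As a quick check of the unrolled pattern, here is a minimal Python sketch using the standard re module. Every literal brace is escaped, which is an extra precaution rather than something the original pattern requires in every flavor, and the name HYPERLINK is only illustrative.

```python
import re

# Unrolled pattern from the answer above, with every literal brace escaped.
# It tolerates at most ONE level of braces inside the second argument.
HYPERLINK = re.compile(r"\\hyperlink\{[^}{]*\}\{[^}{]*(?:\{[^}{]*\}[^}{]*)*\}")

text = r"Use \hyperlink{aaa}{apple {pear} banana} and \hyperlink{bbb}{banana {pear} {apple}}."

# The pattern has no capturing groups, so findall returns the full matches.
matches = HYPERLINK.findall(text)
for m in matches:
    print(m)
```

On the sample string from the question this yields exactly the two \hyperlink spans that were asked for.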

Related

Regex: match a string at start or after some special characters

I'm using the Java Pattern class to find a string "keyword" which is at the beginning of the string or after a character that is in a list of characters. For example, if the list of characters is ' ' and '<', then:
match:
"keyword..."
"...<keyword..."
"... keyword..."
not match:
"...akeyword..."
I've tried all these:
"[^ <]keyword"
"[ <^]keyword"
"[\\^ <]keyword" (note: in a Java/C# string literal, the backslash needs to be escaped)
This question is similar to "Match only at string start or after whitespace", but with only basic regex skills I can't adapt it to this problem. I've tried:
"(?<!\\S<)keyword"
"(?<!([\\S<]))keyword"
This seems to be a very basic problem, so there may be a very easy and clear way.
This should work: (^|[< ])keyword
The alternation (...|...) contains ^ and [< ], meaning that keyword must either be at the start of the string or come right after a < or a space.
You could use an alternation | in a non capturing group (?:^|[ <]) to assert either the start of the string ^ or match a space or < in a character class and use a capturing group for keyword.
(?:^|[ <])(keyword)\b
Or you could use a positive lookbehind (?<=...) and match only keyword
(?<=^|[< ])keyword\b
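(^keyword|[< ]keyword)
Put the characters you need inside the square brackets. Note that ^ is an anchor only outside the brackets; inside them (anywhere but the first position) it is a literal caret.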
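The alternation and lookbehind approaches from the answers above can be checked against the four sample strings from the question. The sketch below uses Python rather than Java, purely for illustration. One difference worth flagging: Python's re module requires fixed-width lookbehinds, so (?<=^|[< ]) is rejected there, and the anchor is moved outside the lookbehind instead. The variable names are illustrative.

```python
import re

# Capture "keyword" only at the start of the string or right after a space or "<".
with_group = re.compile(r"(?:^|[ <])(keyword)\b")

# Lookbehind variant: Python's re needs fixed-width lookbehinds, so "^" is
# kept outside the lookbehind instead of writing (?<=^|[< ]).
with_lookbehind = re.compile(r"(?:^|(?<=[ <]))keyword\b")

samples = ["keyword...", "...<keyword...", "... keyword...", "...akeyword..."]
for s in samples:
    print(repr(s), bool(with_group.search(s)), bool(with_lookbehind.search(s)))
```

With the capturing-group version the delimiter is part of the overall match and the keyword is in group 1; with the lookbehind version the match is the keyword alone.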

Regex to replace the dot inside curly braces?

Is it possible to replace the dot with an underscore, but only inside curly braces, using only regex?
e.g. a.b.c={{c.d.f}}
after the replace it should look like
a.b.c={{c_d_f}}
The curly braces are always balanced and there will always be two open curly braces and two closed ones.
You can use this lookahead regex for search:
\.(?=[^{}]*\})
The lookahead (?=[^{}]*\}) asserts that a } lies ahead, after zero or more characters that are neither { nor }.
Then replace with _
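The lookahead search-and-replace above can be tried in a few lines of Python. This is only a sketch; the function name underscore_dots_in_braces is made up for illustration, and it relies on the question's guarantee that the braces are balanced.

```python
import re

def underscore_dots_in_braces(s):
    # A dot qualifies only if a "}" follows with no "{" or "}" in between,
    # i.e. the dot sits inside the innermost pair of braces.
    return re.sub(r"\.(?=[^{}]*\})", "_", s)

result = underscore_dots_in_braces("a.b.c={{c.d.f}}")
print(result)  # a.b.c={{c_d_f}}
```

The dots in a.b.c are left alone because the first brace ahead of them is {, which the negated class [^{}] refuses to cross.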
You can also use \G, like this (assuming the dots to replace are only inside {{...}} and there is no nesting):
(?:(\{\{)|\G(?!\A))([^.}]*)[.]
and replace with
\1\2_
If you assume there is exactly one character between the dots, use:
Search: (\{\{.)\.(.)\.(.\}\})
Replace with: \1_\2_\3
If there can be one or more characters:
Search: (\{\{.+)\.(.+)\.(.+\}\})
with the same replacement.

Replacing a string in Sublime

I'd like to replace this
y[100] with this Ith(y,100) in Sublime3.
I've got the regular expression \by[\d+] in find what and Ith(y,$1) in replace with, but it doesn't work. It finds what to replace correctly but just replaces it with Ith(y, )
You need to put the \d+ inside parentheses () so that it is captured in group 1 ($1):
\by[(\d+)]
You also need to escape the [ and ] characters here, since unescaped they form a character class:
\by\[(\d+)\]
To backreference data you first have to capture it, using unescaped parentheses (...), i.e. a capturing group. Also, [...] denotes a character class, which is special syntax in regex, so literal square brackets need to be escaped.
Try replacing
\by\[(\d+)\]
with
Ith(y, \1)
You need to escape the special characters and capture the number in a capturing group. Try this regex:
\by\[(\d+)\]
And replace with
Ith(y, $1)
Online explanation and demonstration: http://regex101.com/r/nX3yJ9
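The difference between the unescaped and escaped patterns is easy to see outside Sublime as well. The following Python sketch is illustrative only: the sample line in source is made up, and Python writes the backreference as \1 where Sublime accepts $1.

```python
import re

source = "dy = y[100] - y[2];"  # made-up sample input

# Unescaped: [\d+] is a character class, so this looks for "y" followed by
# ONE character that is a digit or "+".
broken = re.compile(r"\by[\d+]")

# Escaped brackets are literal; the parentheses capture the index as group 1.
fixed = re.compile(r"\by\[(\d+)\]")

print(broken.search(source))            # None: "y" is followed by "[", not a digit
converted = fixed.sub(r"Ith(y,\1)", source)
print(converted)
```

The leading \b keeps the y in dy from being treated as the array name.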

NOTEPAD++ REGEX - I can't get what's in between two strings, I don't get it

I'm so close to understanding regex. I'm a bit stumped; I thought I understood lazy and greedy.
Here is my current regex: <g_n><!\[CDATA\[([^]]+)(?=]]><\/g_n>)
My current regex makes:
<g_n><![CDATA[xxxxxxxxxx]]></g_n>
match to:
<g_n><![CDATA[xxxxxxxxxx
But I want to make it match like this:
xxxxxxxxxx
You want
<g_n><!\[CDATA\[(.*?)]]></g_n>
then if you want to replace it use
\1
in the replacement box
You're matching the whole string; the parentheses around .*? capture just the part between the delimiters and store it in \1.
So the match is the whole string, with \1 referring to the part you want.
To change the xxxxx:
Regex:
(<g_n><!\[CDATA\[)(?:.*?)(\]\]></g_n>)
Replacement
\1WHAT YOU WANT TO CHANGE TO\2
It looks like you need to add escape slashes to the two closing square brackets, as they are literals from the string you're parsing.
<g_n><!\[CDATA\[.*+?\]\]><\/g_n>
                    ^ ^
Any square bracket that is not escaped with a backslash is treated as regex syntax (a character-class delimiter) rather than as a literal, which in this case won't match the input string.
EDIT: I think the +? is redundant.
\[.*\]\]> ...
should suffice, since .* means any character, any number of times.
Tested with notepad++ 6.3.2:
find: (<g_n><!\[CDATA\[)([^]]+)(?=]]></g_n>)
replace: $1WhatYouWant
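To see which part of the match ends up in which group, here is a small Python sketch of the capture-group approach from the answers above. It is not Notepad++ itself: Python writes group references as \g<1> where Notepad++ accepts \1 or $1, and NEW VALUE is a placeholder.

```python
import re

line = "<g_n><![CDATA[xxxxxxxxxx]]></g_n>"

# Literal square brackets are escaped; the lazy middle group captures only the payload.
pattern = re.compile(r"(<g_n><!\[CDATA\[)(.*?)(\]\]></g_n>)")

payload = pattern.search(line).group(2)
replaced = pattern.sub(r"\g<1>NEW VALUE\g<3>", line)
print(payload)
print(replaced)
```

The overall match is still the whole element; only group 2 holds the text between the CDATA delimiters, which is why the replacement re-emits groups 1 and 3 around the new value.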
You can replace + with * in the pattern to also match an empty CDATA section:
<g_n><![CDATA[]]></g_n>

Multiline regex replacement in sed/vi

I need to replace this statement in a named.conf with regex
masters {
10.11.2.1;
10.11.2.2;
};
All my approaches with sed/vi do not work
%s/masters.*\}\;//g
does not match. I also tried /s, \s, etc. to match the newline.
In vim, you can force a pattern to match across newlines with \_, for example:
%s/masters {\_[^}]*};//g
It's important to replace .* with something more conservative like [^}]* if you prefix with \_, because * is greedy, so \_.* will try to match everything to the end of the document.
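The same conservative idea, a negated class instead of a dot, carries over to other tools. The Python sketch below is an illustration only, not a sed or vim command; the surrounding zone block in conf is invented sample data, and the extra whitespace handling is an addition to keep the output tidy.

```python
import re

# Invented sample config; only the masters block comes from the question.
conf = """zone "example.com" {
    type slave;
    masters {
        10.11.2.1;
        10.11.2.2;
    };
    file "example.com.db";
};
"""

# [^}]* also matches newlines, so no DOTALL flag is needed; the leading
# [ \t]* and trailing [ \t]*\n? remove the indentation and the line end.
cleaned = re.sub(r"[ \t]*masters\s*\{[^}]*\};[ \t]*\n?", "", conf)
print(cleaned)
```

Because [^}]* stops at the first closing brace, the match cannot run past the end of the masters block, which is the same reason the vim pattern avoids \_.*.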