Removing words inside brackets with regex

I want to remove the words inside brackets.
I'm currently using this:
【.*】
【remove this】preserve this 【remove this】
but it removes everything on this line, because there is a second pair of brackets.
How can I solve this? The same thing happens with commas:
◆.*、
◆ remove this、preserve this、
That regex removes everything because the line contains two commas.

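Use non-greedy matching with ?. If your brackets are the ASCII [ and ], escape them as well, since those are special characters (the full-width 【 and 】 are ordinary literals and need no escaping):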
\[.*?\]

You can try two solutions:
Lazy quantifiers (these might not be supported by your regex engine):
\[.*?\]
.*?,
Or replace . with a negated character class that matches any character except the closing delimiter:
\[[^]]*\]
[^,]*,

Use a negated character class (note that + requires at least one character between the brackets):
\[[^\]]+\]
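To make the difference concrete, here is a minimal Go sketch that runs the greedy, lazy, and negated-class patterns against the sample lines from the question. The variable names are made up for illustration, and the patterns use the full-width 【 】 and 、 from the question rather than ASCII brackets, so no escaping is needed.

```go
package main

import (
	"fmt"
	"regexp"
)

// Greedy pattern from the question: runs from the first 【 to the LAST 】.
var greedyBracket = regexp.MustCompile(`【.*】`)

// Lazy variant: stops at the first closing bracket it can reach.
var lazyBracket = regexp.MustCompile(`【.*?】`)

// Negated character class: anything except the closing delimiter.
var negatedBracket = regexp.MustCompile(`【[^】]*】`)

// Same idea for the "◆ ... 、" case.
var negatedComma = regexp.MustCompile(`◆[^、]*、`)

func main() {
	s := "【remove this】preserve this 【remove this】"
	fmt.Printf("%q\n", greedyBracket.ReplaceAllString(s, ""))  // ""
	fmt.Printf("%q\n", lazyBracket.ReplaceAllString(s, ""))    // "preserve this "
	fmt.Printf("%q\n", negatedBracket.ReplaceAllString(s, "")) // "preserve this "

	c := "◆ remove this、preserve this、"
	fmt.Printf("%q\n", negatedComma.ReplaceAllString(c, "")) // "preserve this、"
}
```

The negated class is usually the more portable choice, since it does not depend on the engine supporting lazy quantifiers.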

Related

Regular Expression (notepad++) insert, not replace

In a regular expression (Notepad++), I want to search for ( )|(:)|(_)|(\.) and insert a \ before whatever was matched, that is, before a blank space, colon, underscore, or ".".
Search example: abcd:1234 jiod.8ufd_adfd
Result: abcd\:1234\ jiod\.8ufd\_adfd
In short, how can I refer to the matched text in the replacement expression?
Note that \1, \2, \3 or \4 won't do here: I need to insert whatever was found, and there is no way to know which group matched, is there?
You can use a single character class (instead of the alternation with capturing groups) to match one of the listed characters.
In the replacement, use $& to refer to the matched text and prepend a backslash.
Match
[:\h._]
Replace with
\\$&
The character class matches a colon, a horizontal whitespace character, a dot, or an underscore.
There's no such thing as insert: if you think about it, inserting is just replacing the original with a new string that also contains the old text.
Try this instead: search for ([ :_.]) (your original regex is needlessly complicated) and replace with \\$1 (i.e., a backslash followed by the original text).
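For comparison outside Notepad++, here is a small Go sketch of the same "replace with backslash plus the match" idea. It rests on a few assumptions: Go's regexp package has no \h, so the class lists space and tab explicitly; the whole match is referenced as ${0} in the replacement template; and escapeSpecial is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// special matches a space, tab, colon, dot, or underscore.
// (Go's regexp has no \h, so the whitespace is listed explicitly.)
var special = regexp.MustCompile(`[ \t:._]`)

// escapeSpecial is a hypothetical helper: it puts a backslash in front of
// every matched character. In the template, ${0} is the whole match and
// the leading backslash is copied literally.
func escapeSpecial(s string) string {
	return special.ReplaceAllString(s, `\${0}`)
}

func main() {
	fmt.Println(escapeSpecial("abcd:1234 jiod.8ufd_adfd"))
	// abcd\:1234\ jiod\.8ufd\_adfd
}
```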

How to exclude character that has preceding character different than specified in regular expression?

With a regular expression, I would like to get all characters between round brackets, but escaped \( and \) characters should also be included in the result.
Examples:
input: fo(ob)a)r
output: ob
input: foo(bar\(qwerty\))baz
output: bar\(qwerty\)
This is what I used for finding text between brackets:
(?<=\()([^\s\(\)]+)(?=\)), but I can't make exceptions for brackets preceded by \.
You could do something like this:
.*(?<!\\)\((.*?)(?<!\\)\)
Basically, it skips as many characters as possible up to an opening parenthesis that is not preceded by a backslash (checked with a negative lookbehind), then captures the following characters up to the next closing parenthesis that is not preceded by a backslash either.
Note that this regex may not work properly if the input also contains escaped backslashes (for example \\ directly before a parenthesis).
Example: https://regex101.com/r/BqVKZp/1
This regex works for both of your examples without any lookaheads or lookbehinds:
\((.+[^\\])\)
The U (ungreedy) flag is needed, so that .+ matches as little as possible.
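The lookbehind version cannot be tried in Go, whose regexp package has no lookaround, but the second pattern can, because Go accepts the ungreedy flag inline as (?U). A minimal sketch, with firstGroup as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"regexp"
)

// (?U) swaps greedy and lazy, so .+ expands only as far as needed.
// [^\\] demands that the character right before the closing ")" is
// not a backslash, which skips over escaped "\)".
var inner = regexp.MustCompile(`(?U)\((.+[^\\])\)`)

// firstGroup is a hypothetical helper returning the captured text,
// or "" when nothing matches.
func firstGroup(s string) string {
	m := inner.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(firstGroup(`fo(ob)a)r`))             // ob
	fmt.Println(firstGroup(`foo(bar\(qwerty\))baz`)) // bar\(qwerty\)
}
```

One limitation to keep in mind: .+ followed by [^\\] needs at least two characters between the parentheses, so an input like (a) is not matched by this pattern.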

Go Regexp to Match Characters Between

I have content I am trying to remove from a string
s:=`Hello! <something>My friend</something>this is some <b>content</b>.`
I want to be able to replace <b>content</b> and <something>My friend</something> so that the string is then
`Hello! this is some .`
So basically, I want to remove everything matched by <.*>
The problem is that the regex matches the whole of <something>My friend</something>this is some <b>content</b>, because Go matches from the first < to the very last >
* is a greedy operator, meaning it will match as much as it can while still allowing the remainder of the regular expression to match. In this case, I would suggest using negated character classes, since backreferences are not supported.
s := "Hello! <something>My friend</something>this is some <b>content</b>."
re := regexp.MustCompile("<[^/]*/[^>]*>")
fmt.Println(re.ReplaceAllString(s, ""))
Go's regexp engine doesn't backtrack and therefore has no backreferences, so you can't use <(.*?)>.*?</\1> like you would in Perl.
However, if you don't care whether the closing tag matches the opening one, you can use:
<.*?/.*?>
Just saw your update: .* is greedy and will match everything in between, so you have to use non-greedy matching (i.e. .*?).
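Putting the answers side by side, here is a complete program (a sketch, with made-up variable names) that runs the greedy pattern from the question and the two suggested fixes on the same input:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Greedy: from the first "<" to the very last ">".
	greedyTag = regexp.MustCompile(`<.*>`)
	// Negated classes: open tag, content, and closing tag in one match.
	negatedTag = regexp.MustCompile(`<[^/]*/[^>]*>`)
	// Lazy quantifiers: same span, found by matching as little as possible.
	lazyTag = regexp.MustCompile(`<.*?/.*?>`)
)

func main() {
	s := "Hello! <something>My friend</something>this is some <b>content</b>."
	fmt.Println(greedyTag.ReplaceAllString(s, ""))  // Hello! .
	fmt.Println(negatedTag.ReplaceAllString(s, "")) // Hello! this is some .
	fmt.Println(lazyTag.ReplaceAllString(s, ""))    // Hello! this is some .
}
```

Neither fix checks that the closing tag name equals the opening one, so this is only suitable for simple, well-formed input like the sample.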

Remove all characters after a certain match

I am using Notepad++ to remove some unwanted text after a pattern, and for the life of me I can't work it out.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to keep just myApp.[variable] and get rid of whatever follows it, e.g. the trailing , or ); or + '...'.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But what I really need is to negate that regex, so that it matches everything at the end and I can replace it with nothing.
You don't need negation. Just put your regex inside a capturing group and append .*$ to it; $ matches the end of a line. The whole line is matched and then replaced by the contents of the first capturing group. Also note that . matches any character, so you need to escape the dot to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
OR
Match only the trailing characters and replace them with an empty string.
\b[,); +]+.*$
I think this works equally well (with the dot escaped so that it only matches a literal dot):
^(myApp\.\w+).*$
Replacement string:
\1
From difference between \w and \b regular expression meta characters:
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
(^.*?\.[a-zA-Z]+)(.*)$
Use the regex above and replace with:
$1
See demo.
http://regex101.com/r/lU7jH1/5
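The "capture the part to keep, match the rest of the line, replace with the group" approach carries over to other engines. Below is a minimal Go sketch using the \w+ variant; the (?m) flag (so that ^ and $ work per line) and the helper name trimTail are choices made for this example, not something from the answers above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// (?m) makes ^ and $ match at line boundaries. Group 1 keeps
// "myApp.<identifier>"; the trailing .* swallows the rest of the line.
var keepPrefix = regexp.MustCompile(`(?m)^(myApp\.\w+).*$`)

// trimTail is a hypothetical helper that replaces each matching line
// with just its captured prefix.
func trimTail(text string) string {
	return keepPrefix.ReplaceAllString(text, "${1}")
}

func main() {
	lines := []string{
		"myApp.ComboPlaceHolderLabel,",
		"myApp.GridTitleLabel);",
		"myApp.SummaryLabel + '</b></div>');",
		"myApp.NoneLabel + ')') + '</label></div>';",
	}
	fmt.Println(trimTail(strings.Join(lines, "\n")))
}
```

Each line of the output should be reduced to its myApp.<identifier> prefix.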

NOTEPAD++ REGEX - I can't get what's in between two strings, I don't get it

I'm so close to understanding regex. I'm a bit stumped; I thought I understood lazy and greedy matching.
Here is my current regex: <g_n><!\[CDATA\[([^]]+)(?=]]><\/g_n>)
My current regex makes:
<g_n><![CDATA[xxxxxxxxxx]]></g_n>
match to:
<g_n><![CDATA[xxxxxxxxxx
But I want to make it match like this:
xxxxxxxxxx
You want
<g_n><!\[CDATA\[(.*?)]]></g_n>
Then, if you want to replace it, use
\1
in the replacement box.
You're matching the whole string; the parentheses around .*? capture just the text inside and store it in \1.
So the match is the entire string, with \1 referring to the part you want.
To change the xxxxx part instead (note that the opening and closing square brackets must be escaped):
Regex:
(<g_n><!\[CDATA\[)(?:.*?)(\]\]></g_n>)
Replacement:
\1WHAT YOU WANT TO CHANGE TO\2
It looks like you need to escape the two closing square brackets with backslashes, as they are literals from the string you're parsing.
<g_n><!\[CDATA\[.*+?\]\]><\/g_n>
                    ^ ^
Any square brackets not escaped by backslashes may be treated as regex syntax (character-class delimiters), which in this case won't match the input string.
EDIT: I think the +? is redundant.
\[.*\]\]> ...
should suffice, since .* means any character, any number of times.
Tested with Notepad++ 6.3.2:
find: (<g_n><!\[CDATA\[)([^]]+)(?=]]></g_n>)
replace: $1WhatYouWant
You can replace + with * in the pattern to also match an empty CDATA section:
<g_n><![CDATA[]]></g_n>
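As a cross-check in a different engine, here is a minimal Go sketch of both tasks from this thread: pulling out the CDATA text with a capturing group, and swapping it while keeping the surrounding markup. The helper names cdataText and replaceCDATA are hypothetical. Go has no lookahead, so the closing ]]></g_n> is matched as part of the pattern rather than asserted.

```go
package main

import (
	"fmt"
	"regexp"
)

// Group 1 captures only the text inside the CDATA section.
var cdata = regexp.MustCompile(`<g_n><!\[CDATA\[(.*?)\]\]></g_n>`)

// Groups 1 and 2 capture the wrapper so it can be put back unchanged.
var cdataParts = regexp.MustCompile(`(<g_n><!\[CDATA\[).*?(\]\]></g_n>)`)

// cdataText is a hypothetical helper returning the inner text and
// whether the pattern matched at all.
func cdataText(s string) (string, bool) {
	m := cdata.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// replaceCDATA is a hypothetical helper that swaps the inner text.
// ${1} needs the braces: "$1new" would be read as a group named "1new".
// A "$" inside replacement would also be expanded by the template.
func replaceCDATA(s, replacement string) string {
	return cdataParts.ReplaceAllString(s, "${1}"+replacement+"${2}")
}

func main() {
	s := "<g_n><![CDATA[xxxxxxxxxx]]></g_n>"
	text, ok := cdataText(s)
	fmt.Println(text, ok)               // xxxxxxxxxx true
	fmt.Println(replaceCDATA(s, "new")) // <g_n><![CDATA[new]]></g_n>
}
```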