Regex Replace anything that does not match capturing group - regex

I have a String like following: [Monster:Test]Maps=1,5,2,3[Monster:Test2]Maps=2-5
I need to strip the unnecessary text from the string.
The only text I want to keep is the brackets including the text between the brackets. So only [Monster:Test] and [Monster:Test2] should be kept.
So my regex to find it is: \\[(.*)\\]
I don't understand how to replace anything that does not match my group.

How about using preg_match_all?
$s = "[Monster:Test]Maps=1,5,2,3[Monster:Test2]Maps=2-5~";
preg_match_all("/\[[^\]]+\]/", $s, $m);
echo implode($m[0]);
Results in:
[Monster:Test][Monster:Test2]
Does this work as required?

Just join all matches and you should end up with what you want.
/\[[^\]]+\]/g;
\[ matches the opening [
[^\]]+ matches one or more characters that are not ]
\] matches the closing ]
the g flag returns all matches instead of only the first
Implementation Example:
var string = "[Monster:Test]Maps=1,5,2,3[Monster:Test2]Maps=2-5";
var result = string.match(/\[[^\]]+\]/g).join("");
console.log(result);
* Although the example is JavaScript, the same approach should work in any other language.
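If you really want a single replace that drops everything outside the brackets, an alternation is one way to do it. This is only a sketch: the helper names joinBracketed and keepBracketed are made up for illustration, and the first one also guards against match() returning null when nothing matches.

```javascript
// Hypothetical helpers, not taken from the answers above.

// Variant 1: join the matches, falling back to an empty list because
// String.prototype.match returns null when there is no match at all.
function joinBracketed(input) {
  const matches = input.match(/\[[^\]]+\]/g) || [];
  return matches.join("");
}

// Variant 2: one replace call. The first alternative captures a bracketed
// group; the second consumes any run of characters that are not "[".
// "$1" is empty whenever the second alternative matched, so that text is dropped.
function keepBracketed(input) {
  return input.replace(/(\[[^\]]+\])|[^\[]+/g, "$1");
}

const sample = "[Monster:Test]Maps=1,5,2,3[Monster:Test2]Maps=2-5";
console.log(joinBracketed(sample)); // [Monster:Test][Monster:Test2]
console.log(keepBracketed(sample)); // [Monster:Test][Monster:Test2]
```

Both variants return an empty string for input without any bracketed group.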

Related

PHP, Inconsistent results with preg_replace (RegEx)

I'm looking for a nudge in the right direction because the results I get from preg_replace don't make sense to me.
I've the following RegEx:
/([a-zA-Z0-9]{1,})/([a-zA-Z0-9]{1,})/([a-zA-Z0-9_]{1,})/([a-zA-Z0-9_]{1,})/([a-zA-Z0-9_]{1,})\b
I've a file which consists of lines like this:
1:
*/tdn/quota/plot_3/boot_tdd_8/Homes_Homes1/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*
2:
*/vdm/quota/plot_1/boot_tdd_1/Homes_Homes2/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*
3:
*/vdm/quota/plot_5/boot_tdd_3/Homes_Homes1/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*
My goal is to only keep everything after the /Homes_Homes/ part.
I get the correct replacement for the first file path with my Regex:
*/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*
The second file path is also correct:
*/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*
However, for the last file path I get:
*/container.rig*
instead of
*/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*
Why does preg_replace fail with the third file path?
You get that last result because preg_replace replaces all matches, and in the last example string the pattern matches twice.
One option is to set the fourth parameter, $limit, to 1 so that only a single replacement is made.
Not all the character classes in your pattern match an underscore, but if it is acceptable for them to do so, you can shorten the pattern with the quantifier {5}, anchor it with ^ to assert the start of the string, and use \K to match and then forget the * at the start of the string.
^\*\K(?:/\w+){5}
For example
$re = '~^\*\K(?:/\w+){5}~';
$strings = [
"*/tdn/quota/plot_3/boot_tdd_8/Homes_Homes1/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*",
"*/vdm/quota/plot_1/boot_tdd_1/Homes_Homes2/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*",
"*/vdm/quota/plot_5/boot_tdd_3/Homes_Homes1/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*"
];
foreach ($strings as $s) {
    echo preg_replace($re, '', $s) . PHP_EOL;
}
Output
*/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*
*/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*
*/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*
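To see the double match for yourself, you can list every match of the original pattern on the third line. The sketch below does that in JavaScript rather than PHP. It writes {1,} as + and drops the capturing groups, which does not change what is matched. JavaScript has no \K, so the anchored variant captures the leading * and puts it back instead; treat that part as an adaptation, not a translation of the PHP above.

```javascript
// The asker's pattern, with {1,} written as + and the capturing groups dropped.
const original = /\/[a-zA-Z0-9]+\/[a-zA-Z0-9]+\/[a-zA-Z0-9_]+\/[a-zA-Z0-9_]+\/[a-zA-Z0-9_]+\b/g;

const third = "*/vdm/quota/plot_5/boot_tdd_3/Homes_Homes1/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*";

// List every match so the second, unwanted one becomes visible.
const hits = [...third.matchAll(original)].map(m => m[0]);
console.log(hits);
// [ '/vdm/quota/plot_5/boot_tdd_3/Homes_Homes1', '/boot/int/rlt111/pl1/Cookies' ]

// Anchored variant: no \K in JavaScript, so capture the "*" and re-insert it.
const anchored = /^(\*)(?:\/\w+){5}/;
const cleaned = third.replace(anchored, "$1");
console.log(cleaned);
// */boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*
```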

Regex - Exclude some string after matches [duplicate]

I need to extract from a string a set of characters which are included between two delimiters, without returning the delimiters themselves.
A simple example should be helpful:
Target: extract the substring between square brackets, without returning the brackets themselves.
Base string: This is a test string [more or less]
If I use the following reg. ex.
\[.*?\]
The match is [more or less]. I need to get only more or less (without the brackets).
Is it possible to do it?
Easy done:
(?<=\[)(.*?)(?=\])
Technically that's using lookaheads and lookbehinds. See Lookahead and Lookbehind Zero-Width Assertions. The pattern consists of:
is preceded by a [ that is not captured (lookbehind);
a non-greedy captured group. It's non-greedy to stop at the first ]; and
is followed by a ] that is not captured (lookahead).
Alternatively you can just capture what's between the square brackets:
\[(.*?)\]
and return the first captured group instead of the entire match.
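The difference between the two approaches is easiest to see side by side. A small sketch in JavaScript, assuming an engine with lookbehind support (ES2018 or later); the variable names are just for illustration:

```javascript
const text = "This is a test string [more or less]";

// Lookarounds: the brackets are asserted but not consumed,
// so the whole match (index 0) is already the inner text.
const viaLookaround = text.match(/(?<=\[).*?(?=\])/);
console.log(viaLookaround[0]); // more or less

// Capture group: the whole match includes the brackets,
// and the inner text is available as group 1.
const viaGroup = text.match(/\[(.*?)\]/);
console.log(viaGroup[0]); // [more or less]
console.log(viaGroup[1]); // more or less
```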
If you are using JavaScript, the solution provided by cletus, (?<=\[)(.*?)(?=\]) won't work because JavaScript doesn't support the lookbehind operator.
Edit: actually, since ES2018 it is possible to use the lookbehind operator. Just wrap the pattern in / delimiters to write it as a regex literal, like this:
var regex = /(?<=\[)(.*?)(?=\])/;
Old answer:
Solution:
var regex = /\[(.*?)\]/;
var strToMatch = "This is a test string [more or less]";
var matched = regex.exec(strToMatch);
It will return:
["[more or less]", "more or less"]
So, what you need is the second value. Use:
var matched = regex.exec(strToMatch)[1];
To return:
"more or less"
Here's a general example with obvious delimiters (X and Y):
(?<=X)(.*?)(?=Y)
Here it's used to find the string between X and Y.
You just need to 'capture' the bit between the brackets.
\[(.*?)\]
To capture you put it inside parentheses. You do not say which language this is using. In Perl for example, you would access this using the $1 variable.
my $string ='This is the match [more or less]';
$string =~ /\[(.*?)\]/;
print "match:$1\n";
Other languages will have different mechanisms. C#, for example, uses the Match collection class, I believe.
[^\[] Match any character that is not [.
+ Match one or more of the preceding, i.e. a run of characters that are not [.
(?=\]) Positive lookahead ]. Matches a group ending with ] without including it in the result.
Done.
[^\[]+(?=\])
Proof.
http://regexr.com/3gobr
Similar to the solution proposed by null. But the additional \] is not required. As an additional note, it appears \ is not required to escape the [ after the ^. For readability, I would leave it in.
Does not work in the situation in which the delimiters are identical. "more or less" for example.
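To illustrate the identical-delimiter problem, here is the same idea adapted to double quotes. This is a sketch in JavaScript; the sample sentence is made up.

```javascript
const quoted = 'say "more or less" now';

// The character-class-plus-lookahead pattern, adapted to quotes.
// It also returns the text before the opening quote, because that text
// is likewise a run of non-quote characters followed by a quote.
const loose = quoted.match(/[^"]+(?=")/g);
console.log(loose); // [ 'say ', 'more or less' ]

// Consuming both delimiters and capturing the inside avoids that.
const strict = quoted.match(/"([^"]*)"/)[1];
console.log(strict); // more or less
```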
Most updated solution
If you are using JavaScript, the best solution I came up with uses the match method instead of exec.
Then iterate over the matches and strip the delimiters by replacing each match with its first group, $1:
const text = "This is a test string [more or less], [more] and [less]";
const regex = /\[(.*?)\]/gi;
const resultMatchGroup = text.match(regex); // [ '[more or less]', '[more]', '[less]' ]
const desiredRes = resultMatchGroup.map(match => match.replace(regex, "$1"))
console.log("desiredRes", desiredRes); // [ 'more or less', 'more', 'less' ]
As you can see, this is useful for multiple delimiters in the text as well
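If your runtime has String.prototype.matchAll (ES2020), the second replace pass can be skipped by reading group 1 from each match directly. A minimal sketch under that assumption:

```javascript
const text = "This is a test string [more or less], [more] and [less]";

// matchAll needs the g flag; each result carries the capture groups,
// so group 1 is already the text without the brackets.
const inner = Array.from(text.matchAll(/\[(.*?)\]/g), m => m[1]);
console.log(inner); // [ 'more or less', 'more', 'less' ]
```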
PHP:
$string ='This is the match [more or less]';
preg_match('#\[(.*)\]#', $string, $match);
var_dump($match[1]);
This one specifically works for javascript's regular expression parser /[^[\]]+(?=])/g
just run this in the console
var regex = /[^[\]]+(?=])/g;
var str = "This is a test string [more or less]";
var match = regex.exec(str);
match;
To remove also the [] use:
\[.+\]
I had the same problem using regex with bash scripting.
I used a 2-step solution using pipes with grep -o applying
'\[(.*?)\]'
first, then
'\b.*\b'
Obviously not as efficient as the other answers, but an alternative.
I wanted to find a string between / and #, but # is sometimes optional. Here is the regex I use:
(?<=\/)([^#]+)(?=#*)
Here is how I got the text without '[' and ']' in C#:
var text = "This is a test string [more or less]";
// Getting only string between '[' and ']'
Regex regex = new Regex(@"\[(.+?)\]");
var matchGroups = regex.Matches(text);
for (int i = 0; i < matchGroups.Count; i++)
{
    Console.WriteLine(matchGroups[i].Groups[1]);
}
The output is:
more or less
If you need to extract the text without the brackets, you can use awk in bash:
echo " [hola mundo] " | awk -F'[][]' '{print $2}'
result:
hola mundo
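The awk call works by treating both brackets as field separators, so the text between them becomes the second field. The same idea can be sketched in JavaScript with split, keeping in mind that array indexes start at 0 where awk fields start at 1:

```javascript
const line = " [hola mundo] ";

// Split on either bracket, like awk -F'[][]'.
const fields = line.split(/[\[\]]/);
console.log(fields);    // [ ' ', 'hola mundo', ' ' ]
console.log(fields[1]); // hola mundo (awk's $2)
```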

regex to duplicate a word and add in extra text

I have a long list of words that I want to duplicate
Example
CallDateTime
WebDateTime
WavName
Dnis
Verified
Concern
ConcernCode
I'm trying to understand some regex that copies each word and places the copy to the right, along with adding in some needed text
's/(\t+)_(\w+)/\u\2, \u\1, \0/'
Well, that is not working. This is the expected output:
#CallDateTime = i.CallDateTime,
#WebDateTime = i.WebDateTime,
etc...
Obviously replacing ^ with # and $ with a comma is easy, but I also want to copy the word with a regex
I have seen this
((\w+)_(\w+))
Replace Pattern:
\3, \2, \1
But I don't understand that ..
Let's solve this with Notepad++:
Find what: (\w+)
Replace with: #\1 = i.\1,
Explanation:
\w+ matches one or more word characters
(...) is a capturing group. You can reference it with \1 in the replacement part
replacement: A literal #, then the captured word, then a space, etc...
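Since the question also asks about ((\w+)_(\w+)) and \3, \2, \1: groups are numbered by the position of their opening parenthesis, so the outer group is 1 and the two inner ones are 2 and 3. A sketch in JavaScript, where the replacement syntax is $1 instead of \1 and the sample word first_second is made up:

```javascript
// Same find/replace as above, applied to a few of the words.
const words = ["CallDateTime", "WebDateTime", "WavName"];
const lines = words.map(w => w.replace(/(\w+)/, "#$1 = i.$1,"));
console.log(lines.join("\n"));
// #CallDateTime = i.CallDateTime,
// #WebDateTime = i.WebDateTime,
// #WavName = i.WavName,

// Group numbering: $1 is the outer group (the whole "first_second"),
// $2 is the part before the underscore, $3 the part after it.
const swapped = "first_second".replace(/((\w+)_(\w+))/, "$3, $2, $1");
console.log(swapped); // second, first, first_second
```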
Searching for .+ and replacing it with #$0 = i.$0, should do the job.
https://regex101.com/r/WQXFy6/3
Replace:
\b(\w+)\b
by
#\1 = i.\1,
JavaScript code:
var str = "CallDateTime\nWebDateTime\nWavName\nDnis\nVerified\nConcern\nConcernCode";
// The trailing comma matches the expected output in the question.
str = str.replace(/\b(\w+)\b/g, '#$1 = i.$1,');
console.log(str);

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code to do what you want:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only look inside lines that are enclosed by string1 ... string2.
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
edit: As you added that you're trying a replacement in Notepad++ I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then perform a "replace all" operation repeatedly until N++ doesn't find any more matches. The problem here is that this regex matches only one bit per line per pass, and therefore needs to be applied repeatedly.
Btw. even if a regexp matches the whole line, you can still only replace parts of it using capturing groups.
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
regex101 demo. Tried it on notepad++ as well and it's working.
There's a description on the demo site, but if you want more explanation, let me know and I'll elaborate!
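For readers who can use code rather than Notepad++: JavaScript has neither \G nor \K, so the pattern above cannot be used there as-is. A rough equivalent is to match the enclosing strings once and rewrite only the middle part. This is a sketch; the helper name replaceInside and the replacement text xyz are made up.

```javascript
// Hypothetical helper: replace every `word` in `line`, but only when the
// line starts with "string1_" and ends with "_string2".
function replaceInside(line, word, replacement) {
  return line.replace(/^(string1_)(.*)(_string2)$/, (whole, head, middle, tail) =>
    head + middle.split(word).join(replacement) + tail
  );
}

const inside = replaceInside("string1_dog_bit_johny_bit_string2", "bit", "xyz");
const outside = replaceInside("string3_crocodile_bit_johny_bit_string4", "bit", "xyz");
console.log(inside);  // string1_dog_xyz_johny_xyz_string2
console.log(outside); // string3_crocodile_bit_johny_bit_string4
```

Lines that are not enclosed by string1 and string2 are returned unchanged, because the outer pattern does not match them.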
