regular expression for find and replace - regex

I've got strings like:
('Michael Herold','Michael Herold'),
but I need to remove the last parts so I end up with:
('Michael Herold'),
I'm still new to Regular Expressions so they confuse me. I'm using Notepad++.

Find: \('([^']*)','\1'\)
Replace: ('\1')

The function you use depends on the language; Notepad++ is a text editor, not a language.
The regular expression you want is ",'Michael Herold'", and you replace every match with "", the empty string.
So in PHP, for example, you'll have
$source = "('Michael Herold','Michael Herold')";
$pattern = "/(,'Michael Herold')+/";
$newString = preg_replace($pattern, "", $source);
Do the equivalent in whatever language you use.

I'm not sure what flavor of regular expressions Notepad++ uses, but try replacing this expression:
\('([^']*)','\1'\)
with this one:
('$1')
The \1 matches whatever was found in the first set of single quotes (Michael Herold in your example), and $1 is replaced with that same string. (Try \1 if $1 doesn't work in Notepad++.)
See it in action here.
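If you want to try the same find-and-replace outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module, where both the pattern and the replacement refer to the group as `\1`; the function name `collapse_duplicate` is made up for this illustration.

```python
import re

# Matches ('X','X') where both quoted values are identical; \1 is a
# backreference to whatever the first group captured.
DUPLICATE_PAIR = re.compile(r"\('([^']*)','\1'\)")

def collapse_duplicate(text: str) -> str:
    """Rewrite ('X','X') as ('X'); pairs with different values are left alone."""
    return DUPLICATE_PAIR.sub(r"('\1')", text)

print(collapse_duplicate("('Michael Herold','Michael Herold'),"))
print(collapse_duplicate("('Michael Herold','Someone Else'),"))
```

Because of the backreference, only pairs whose two values are the same get collapsed, which is probably the safer behavior for a bulk replace.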

Related

Regular Expression - Perl

I am trying to extract a substring from a string with a regular expression, but my expression is not working and I get an error. Can anyone help me write a correct one?
Here are the strings I am trying to match:
MSM8_BD_V4.3_1-1_idle-Kr_Run3.xlsx
MSM8_BD_V4.3_2-6_mp3-Kr_Run2.xlsx
MSM8_BD_V4.3_Camera_snap-7.xlsx
MSM8_BD_V4.3_Camera_snap-8.xlsx
MSM8_BD_V4.3_Radio_202.16-0.xlsx
I am trying to get the bold part of each string.
Below is the regular expression I tried:
my $line = "MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx";
my ($captured) = $line =~ /MSM8939_BD_V4\.\3\_[d]*(.+?)\w/gx;
print "$captured\n";
[d] matches nothing but the literal letter d. You want \d, without the brackets, to match a digit. However, it looks like you also want to include underscores. That would be [\d_].
Try this:
/^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/
If I run this on your input (with e.g. perl -nE 'say $1 if /^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/'), I get this output:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
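For comparison, the same pattern can be checked in Python. This is only a sketch: it assumes the file names from the question and Python's `re` module, and the helper name `extract` is hypothetical.

```python
import re

# Same pattern as above: fixed prefix, optional digits/underscores,
# an optional hyphen, then everything up to the next hyphen.
PATTERN = re.compile(r"^MSM8_BD_V4\.3_[\d_]*-?([^-]+)")

def extract(name: str):
    """Return the captured part of the file name, or None if it does not match."""
    m = PATTERN.search(name)
    return m.group(1) if m else None

names = [
    "MSM8_BD_V4.3_1-1_idle-Kr_Run3.xlsx",
    "MSM8_BD_V4.3_2-6_mp3-Kr_Run2.xlsx",
    "MSM8_BD_V4.3_Camera_snap-7.xlsx",
    "MSM8_BD_V4.3_Camera_snap-8.xlsx",
    "MSM8_BD_V4.3_Radio_202.16-0.xlsx",
]
for name in names:
    print(extract(name))
```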
for (qw(
    MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx
    MSM8939_BD_V4.3_2-6_mp3-Kratos_Run2.xlsx
    MSM8939_BD_V4.3_Camera_snap-7.xlsx
    MSM8939_BD_V4.3_Camera_snap-8.xlsx
    MSM8939_BD_V4.3_Radio_202.16-0.xlsx
)) {
    # Greedy .* backtracks to the last "token_token" that is followed by a hyphen.
    my ($captured) = ($_ =~ /.*[-_]([^\W_]+_[\w.]+)-/gx);
    print "$captured\n";
}
Use a greedy pattern to go as far as possible, then grab the last pair of underscore-joined tokens that is still followed by a hyphen.
Like the other answer (which was edited while I was typing), this produces:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
This one may be more general in that the beginning of the substring is not hard-coded, i.e., you could use it in other cases which did not necessarily start with MSM8_BD_V4.3.
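To illustrate the "more general" claim, here is a small Python sketch of the same greedy pattern. It assumes Python's `re` module; the helper name `last_pair` and the file name `OTHER_V1_Video_play-3.xlsx` are invented for the example and are not from the question.

```python
import re

# Greedy .* consumes as much as it can, then backtracks until what follows is
# a separator, an underscore-joined pair of tokens, and a hyphen.
PATTERN = re.compile(r".*[-_]([^\W_]+_[\w.]+)-")

def last_pair(name: str):
    """Return the last 'token_token' that is followed by a hyphen, or None."""
    m = PATTERN.match(name)
    return m.group(1) if m else None

print(last_pair("MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx"))
print(last_pair("MSM8939_BD_V4.3_Radio_202.16-0.xlsx"))
print(last_pair("OTHER_V1_Video_play-3.xlsx"))  # hypothetical name, different prefix
```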

how to use lookaround regex in this latex example

The LaTeX sample is as follows:
$F=K$,balalalala,balablal Bi$_x$Sb$_{1-x}$,balabla $abcd$ balabala
What I want to match is inline math expressions like $F=K$ and $abcd$, but not expressions with "_" right after the "$", like $_x$ and $_{1-x}$.
So I wrote a regular expression like this:
\$[^_][^\$]+\$(?!_)
I added (?!_) because $Sb$ in the middle of Bi$_x$Sb$_{1-x}$ should not be considered a math expression.
But it does not work properly. It returns two matches:
$F=K$ and $,balabla $.
What is the right regular expression for this problem?
Your desired match needs a lookbehind, something like:
\$[^$]+\$(?<!\$_[^$]+)
but in most engines a lookbehind must be fixed-length, so it cannot contain + or *, and the above regex is invalid.
I suggest processing the text in two passes. In the first pass, remove every $_xxx$ pattern:
perl -ne 's/(\$_[^\$]+\$)//g;print;'
and then match your desired pattern:
grep -oP '\$[^$]+\$'
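The same two-pass idea can be sketched in Python, in case you are not working from a shell. This assumes Python's `re` module and the sample line from the question; the function name `inline_math` is made up here.

```python
import re

SUBSCRIPT = re.compile(r"\$_[^$]+\$")    # pass 1: $_x$, $_{1-x}$, ...
INLINE_MATH = re.compile(r"\$[^$]+\$")   # pass 2: whatever $...$ remains

def inline_math(text: str) -> list:
    """Drop subscript-style $_...$ spans first, then collect the remaining $...$ spans."""
    stripped = SUBSCRIPT.sub("", text)
    return INLINE_MATH.findall(stripped)

sample = r"$F=K$,balalalala,balablal Bi$_x$Sb$_{1-x}$,balabla $abcd$ balabala"
print(inline_math(sample))
```

Removing the subscript spans first means the second pattern never sees the stray `$` pairs around Sb, so no lookaround is needed.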

Regular Expression for an mail address

"Max Mustermann" <max.mustermann#domain.com>
max.mustermann#domain.com
Max <max.mustermann#domain.com>
I need a regular expression that matches everything outside the angle brackets (including the brackets themselves).
The match should then be removed.
After the replacement it should look like this:
"Max Mustermann" <max.mustermann#domain.com> => max.mustermann#domain.com
The easiest solution would be to search for
[^<]*<([^>]*)>.*
and replace that with \1 or $1, depending on your regex engine.
This removes everything until the first < and everything from the next > until the end of the string.
Let's just hope that there will be no brackets inside the quoted names.
This should work, but beware that it is very simplified:
(?:[^<]*<)?([^>]+).*
The email address will be in $1.
For example, in Perl use:
$email =~ s/(?:[^<]*<)?([^>]+).*/$1/;
See RegexPlanet online demo.
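As a rough illustration of the first suggestion ([^<]*<([^>]*)>.* replaced by the first group), here is a Python sketch. It assumes Python's `re` module, where the replacement is written `\1`; the helper name `bare_address` is hypothetical, and the sample strings are copied as they appear in the question.

```python
import re

# Everything up to the first "<" and everything from the next ">" onward is
# dropped; group 1 keeps what sits between the brackets.
BRACKETED = re.compile(r"[^<]*<([^>]*)>.*")

def bare_address(text: str) -> str:
    """Return the part inside <...>; strings without brackets are returned unchanged."""
    return BRACKETED.sub(r"\1", text)

# Sample strings as given in the question.
samples = [
    '"Max Mustermann" <max.mustermann#domain.com>',
    "max.mustermann#domain.com",
    "Max <max.mustermann#domain.com>",
]
for s in samples:
    print(bare_address(s))
```

A string with no angle brackets simply does not match, so it passes through untouched, which covers the second sample.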

Regular Expression Question regarding search&replace

I'm trying to match cases with regular expression to search and replace some text of given pattern. I can match the pattern, but I'd like to keep some of the literals when replacing.
For example, from the string "abcd123," I'd like to keep abcd but remove 123. I can match the pattern using a simple regular expression like [a-zA-Z0-9]+, but when I want to replace it, I don't know what to use for the replacement. Is this even possible with just regular expressions?
Thanks a lot.
The answer depends on what language/regex engine you are using. You typically use parentheses to save sections matched and either $1, $2, ... or \1, \2, ... in the replacement string to refer to those sections.
For example, from JavaScript:
var x = "Hello World";
x.replace( /([A-Z])\w+/g, '$1xx' );
// "Hxx Wxx"
What language or text editor are you using?
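For the "abcd123," case from the question, a minimal Python sketch might look like this. It assumes Python's `re` module (which uses `\1` in the replacement) and a pattern narrower than the one in the question, so the letters and digits can be told apart; the name `drop_trailing_digits` is invented for the example.

```python
import re

# Group 1 captures the letters to keep; the digits that follow are matched
# but left out of the replacement, so they disappear.
LETTERS_THEN_DIGITS = re.compile(r"([a-zA-Z]+)[0-9]+")

def drop_trailing_digits(text: str) -> str:
    """Keep each run of letters and remove the digits directly after it."""
    return LETTERS_THEN_DIGITS.sub(r"\1", text)

print(drop_trailing_digits("abcd123,"))
```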

Regular Expression: Start from second one

I want to find the second <BR> tag and start the search from there. How can I do it using regular expressions?
<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>
Prepend <BR>[^<]*(?=<BR>) to your regex, or remove the lookahead part if you want to start after the second <BR>, such as: <BR>[^<]*<BR>.
Find text after the second <BR> but before the third: <BR>[^<]*<BR>([^<]*)<BR>
This finds "waldo" in <BR>404<BR>waldo<BR>.
Note: I specifically used the above instead of the non-greedy .*? because once the above starts not working for you, you should stop parsing HTML with regex, and .*? will hide when that happens. However, the non-greedy quantifier is also not as well-supported, and you can always change to that if you want.
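Here is how the capturing version of that pattern behaves in Python, as a sketch only. It assumes Python's `re` module and case-sensitive, attribute-free <BR> tags exactly as in the question; the helper name `text_after_second_br` is made up.

```python
import re

# <BR>, text without "<", <BR>, captured text without "<", then a third <BR>.
THIRD_SEGMENT = re.compile(r"<BR>[^<]*<BR>([^<]*)<BR>")

def text_after_second_br(html: str):
    """Return the text between the second and third <BR>, or None if absent."""
    m = THIRD_SEGMENT.search(html)
    return m.group(1) if m else None

print(text_after_second_br("<BR>404<BR>waldo<BR>"))
print(text_after_second_br("<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>"))
```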
Assuming you are using PHP, you can split your string on <BR> using explode:
$str='<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>';
$s = explode("<BR>",$str,3);
$string = end($s);
print $string;
Output:
$ php test.php
Abdurrahman<BR><SMALL>Fathers Name</SMALL>
You can then use the $string variable however you like.
The same steps work in other languages, using whatever string-splitting function your language provides.
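As one example of doing the split in another language, here is a Python sketch. It assumes `str.split` with a `maxsplit` of 2, which yields at most three pieces, like the limit of 3 passed to explode; the function name `after_second_br` is hypothetical.

```python
def after_second_br(text: str) -> str:
    """Split on the first two <BR> tags and return the last piece."""
    parts = text.split("<BR>", 2)   # at most 3 pieces, like explode(..., 3)
    return parts[-1]

s = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>"
print(after_second_br(s))
```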
This regular expression should match the first two <br /> tags, provided they are adjacent (separated only by whitespace):
/(\s*<br\s*\/?>\s*){2}/i
So you should either replace them with nothing or use preg_match or String.prototype.match to extract the matches.
In JavaScript:
var afterReplace = str.replace( /(\s*<br\s*\/?>\s*){2}/i, '' );
In PHP:
$afterReplace = preg_replace( '/(\s*<br\s*\/?>\s*){2}/i', '', $str );
I'm only sure it'll work in PHP / JavaScript, but it should work in everything...
The usual solution to this sort of problem is to use a "capturing group". Most regular expression systems allow you to extract not only the entire matching sequence, but also sub-matches within it. This is done by grouping a part of the expression within ( and ). For instance, if I use the following expression (this is in JavaScript; I'm not sure what language you want to be working in, but the basic idea works in most languages):
var string = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";
var match = string.match(/<BR>.*?<BR>([a-zA-Z]*)/);
Then I can get either everything that matched using match[0], which is "<BR>like <BR>Abdurrahman", or I can get only the part inside the parentheses using match[1], which gives me "Abdurrahman".