perl script not extracting only path (regex) - regex

I have this regex expression
($oldpath = $_) =~ m/^\/(.+\/)*/;
This is the input:
/cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/110-lafayette_railroad.mp3
But the output is:
/cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/110-lafayette_railroad.mp3
When it should be:
/cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/
Thanks in advance. :)

What do you mean by "output"? $1 contains
cd-lib/mp3/rock/LittleFeat/Dixie_Chicken/
which is almost what you wanted (it just misses the leading /).
You assigned $_ to $oldpath, than matched it against a regex. It doesn't change either $_ or $oldpath.
The canonical way is
my ($match) = m/^\/(.+\/)*/;
or rather (to prevent the leaning toothpick syndrome)
my ($match) = m{^/(.+/)*};
i.e. running the match in list context returns the matching capture groups, and the first one is assinged to $match.

Related

Make a regular expression in perl to grep value work on a string with different endings

I have this code in perl where I want to extract the value of 'EUR_AF', in this case '0.39'.
Sometimes 'EUR_AF' ends with ';', sometimes it doesn't.
Alternatively, 'EUR_AF' may end with '=0' instead of '=0.39;' or '=0.39'.
How do I make the code handle that? Can't seem to find it online...I could of course wrap everything in an almost endless if-elsif-else statement, but that seems overkill.
Example text:
AVGPOST=0.9092;AN=2184;RSQ=0.5988;ERATE=0.0081;AC=144;VT=SNP;THETA=0.0045;AA=A;SNPSOURCE=LOWCOV;LDAF=0.0959;AF=0.07;ASN_AF=0.05;AMR_AF=0.10;AFR_AF=0.11;EUR_AF=0.039
Code: $INFO =~ m/\;EUR\_AF\=(.*?)(;)/
I did find that: $INFO =~ m/\;EUR\_AF\=(.*?0)/ handles the cases of EUR_AF=0, but how to handle alternative scenarios efficiently?
Extract one value:
my ($eur_af) = $s =~ /(?:^|;)EUR_AF=([^;]*)/;
my ($eur_af) = ";$s" =~ /;EUR_AF=([^;]*)/;
Extract all values:
my %rec = split(/[=;]/, $s);
my $eur_af = $rec{EUR_AF};
This regex should work for you: (?<=EUR_AF=)\d+(\.\d+)?
It means
(?<=EUR_AF=) - look for a string preceeded by EUR_AF=
\d+(\.\d+)? - consist of a digit, optionally a decimal digit
EDIT: I originally wanted the whole regex to return the correct result, not only the capture group. If you want the correct capture group edit it to (?<=EUR_AF=)(\d+(?:\.\d+)?)
I have found the answer. The code:
$INFO =~ m/(?:^|;)EUR_AF=([^;]*)/
seems to handle the cases where EUR_AF=0 and EUR_AF=0.39, ending with or without ;. The resulting $INFO will be 0 or 0.39.

Regular Expression - Perl

I am trying to get the a sub string from a string using regular expression but it getting error as my regular expression is not working. Can any one help me out in writing correct one :
Here is the Pattern on which i am trying to write the regular expression :
MSM8_BD_V4.3_1-1_idle-Kr_Run3.xlsx
MSM8_BD_V4.3_2-6_mp3-Kr_Run2.xlsx
MSM8_BD_V4.3_Camera_snap-7.xlsx
MSM8_BD_V4.3_Camera_snap-8.xlsx
MSM8_BD_V4.3_Radio_202.16-0.xlsx
I am trying to get the bold part of the substring .
below is the Regular expression i tried:
my $line = "MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx";
my ($captured) = $line =~ /MSM8939_BD_V4\.\3\_[d]*(.+?)\w/gx;
print "$captured\n";
[d] matches nothing but the literal letter d. You want \d, without the brackets, to match a digit. However, it looks like you also want to include underscores. That would be [\d_].
Try this:
/^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/
If I run this on your input (with e.g. perl -nE 'say $1 if /^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/'), I get this output:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
my $line = "MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx";
for (qw(
MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx
MSM8939_BD_V4.3_2-6_mp3-Kratos_Run2.xlsx
MSM8939_BD_V4.3_Camera_snap-7.xlsx
MSM8939_BD_V4.3_Camera_snap-8.xlsx
MSM8939_BD_V4.3_Radio_202.16-0.xlsx
)) {
my ($captured) = ($_ =~ /.*[-_]([^\W_]+_[\w.]+)-/gx);
print "$captured\n";
}
Use a greedy pattern to go as far as possible, then grab the last two strings that look like what you want which are still followed by a hyphen.
As does the other answer which was just edited while I was typing, this produces:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
This one may be more general in that the beginning of the substring is not hard-coded, i.e., you could use it in other cases which did not necessarily start with MSM8_BD_V4.3.

Powershell regex replace only first hit

I'm trying to use a regular expression to replace the first character after a single hit, while using PowerShell.
No matter how I try, I can't seem to make it work. Here's what I'm talking about:
Code:
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$regex = [REGEX]'/*'
$regex.Replace($info,"/C",1)
$regex
Output:
/CAB/F/*ZXCVBN/MTF/ ---
I'm simply trying to replace the /F in the expression with /C, but it fails every time.
I'm using /* since I don't really know what character will I find after the first / but that's what I want to replace in the end of the day.
I pretty sure this will be pretty simple but, as you can see, I'm, just not familiar enough with regular expressions.
Ok, rather than just a comment I guess I'll add an answer. You can use a negative lookbehind to make sure that there are no /'s before what you are matching, so it will only match the first one. Also, as Noah stated the * is not a wildcard, . is. This will match any / plus 1 character that does not have another / anywhere before it in the string:
"(?<!/.*)/."
So in context to your code, it would look like this:
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$regex = [REGEX]"(?<!/)/."
$regex.Replace($info,"/C",1)
Those lines will output:
AB/C/*ZXCVBN/MTF/ ---
Edit: RegEx broken down at RegEx101: http://regex101.com/r/tI7oN1/1
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$regex = [REGEX]'^([^/]*)/[a-zA-Z]'
$regex.Replace($info,"$1/C",1)
$regex
^([^/]*) - this looks for anything but slashes at the beginning, captured in a group
/[a-zA-Z] - then a slash followed by a letter
The replacement puts back whatever was matched by the first group, and adds /C
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$regex = [REGEX]'/.'
$regex.Replace($info,"/C",1)
Or simply:
$info -replace '^(.*?)/.','$1/C'
If you don't want to use the regex, you can accomplish the same thing with -split and -join:
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$info -split '/.',2 -join '/C'
AB/C/*ZXCVBN/MTF/ ---
The ,2 will stop the split after the first match (2 elements). Then re-join the elements with /C.
You are misunderstanding * in regex. It is not a wildcard character.
The star in regex means capture 0 or more of the preceding items in the expression.
Wildcard in regex is actually a period.

Regular expression using powershell

Here's is the scenario, i have these lines mentioned below i wanted to extract only the middle character in between two dots.
"scvmm.new.resources" --> This after an regular expression match should return only "new"
"sc.new1.rerces" --> This after an regular expression match should return only "new1"
What my basic requirement was to exract anything between two dots anything can come in prefix and suffix
(.*).<required code>.(.*)
Could anyone please help me out??
You can do that without using regex. Split the string on '.' and grab the middle element:
PS> "scvmm.new.resources".Split('.')[1]
new
Or this
'scvmm.new.resources' -replace '.*\.(.*)\..*', '$1'
Like this:
([regex]::Match("scvmm.new1.resources", '(?<=\.)([^\.]*)(?=\.)' )).value
You don't actually need regular expressions for such a trivial substring extraction. Like Shay's Split('.') one can use IndexOf() for similar effect like so,
$s = "scvmm.new.resources"
$l = $s.IndexOf(".")+1
$r = $s.IndexOf(".", $l)
$s.Substring($l, $r-$l) # Prints new
$s = "sc.new1.rerces"
$l = $s.IndexOf(".")+1
$r = $s.IndexOf(".", $l)
$s.Substring($l, $r-$l) # Prints new1
This looks the first occurence of a dot. Then it looks for first occurense of a dot after the first hit. Then it extracts the characters between the two locations. This is useful in, say, scenarios in which the separation characters are not the same (though the Split() way would work in many cases too).

How to have a variable as regex in Perl

I think this question is repeated, but searching wasn't helpful for me.
my $pattern = "javascript:window.open\('([^']+)'\);";
$mech->content =~ m/($pattern)/;
print $1;
I want to have an external $pattern in the regular expression. How can I do this? The current one returns:
Use of uninitialized value $1 in print at main.pm line 20.
$1 was empty, so the match did not succeed. I'll make up a constant string in my example of which I know that it will match the pattern.
Declare your regular expression with qr, not as a simple string. Also, you're capturing twice, once in $pattern for the open call's parentheses, once in the m operator for the whole thing, therefore you get two results. Instead of $1, $2 etc. I prefer to assign the results to an array.
my $pattern = qr"javascript:window.open\('([^']+)'\);";
my $content = "javascript:window.open('something');";
my #results = $content =~ m/($pattern)/;
# expression return array
# (
# q{javascript:window.open('something');'},
# 'something'
# )
When I compile that string into a regex, like so:
my $pattern = "javascript:window.open\('([^']+)'\);";
my $regex = qr/$pattern/;
I get just what I think I should get, following regex:
(?-xism:javascript:window.open('([^']+)');)/
Notice that it it is looking for a capture group and not an open paren at the end of 'open'. And in that capture group, the first thing it expects is a single quote. So it will match
javascript:window.open'fum';
but not
javascript:window.open('fum');
One thing you have to learn, is that in Perl, "\(" is the same thing as "(" you're just telling Perl that you want a literal '(' in the string. In order to get lasting escapes, you need to double them.
my $pattern = "javascript:window.open\\('([^']+)'\\);";
my $regex = qr/$pattern/;
Actually preserves the literal ( and yields:
(?-xism:javascript:window.open\('([^']+)'\);)
Which is what I think you want.
As for your question, you should always test the results of a match before using it.
if ( $mech->content =~ m/($pattern)/ ) {
print $1;
}
makes much more sense. And if you want to see it regardless, then it's already implicit in that idea that it might not have a value. i.e., you might not have matched anything. In that case it's best to put alternatives
$mech->content =~ m/($pattern)/;
print $1 || 'UNDEF!';
However, I prefer to grab my captures in the same statement, like so:
my ( $open_arg ) = $mech->content =~ m/($pattern)/;
print $open_arg || 'UNDEF!';
The parens around $open_arg puts the match into a "list context" and returns the captures in a list. Here I'm only expecting one value, so that's all I'm providing for.
Finally, one of the root causes of your problems is that you do not need to specify your expression in a string in order for your regex to be "portable". You can get perl to pre-compile your expression. That way, you only care what instructions the characters are to a regex and not whether or not you'll save your escapes until it is compiled into an expression.
A compiled regex will interpolate itself into other regexes properly. Thus, you get a portable expression that interpolates just as well as a string--and specifically correctly handles instructions that could be lost in a string.
my $pattern = qr/javascript:window.open\('([^']+)'\);/;
Is all that you need. Then you can use it, just as you did. Although, putting parens around the whole thing, would return the whole matched expression (and not just what's between the quotes).
You do not need the parentheses in the match pattern. It will match the whole pattern and return that as $1, which I am guess is not matching, but I am only guessing.
$mech->content =~ m/$pattern/;
or
$mech->content =~ m/(?:$pattern)/;
These are the clustering, non-capturing parentheses.
The way you are doing it is correct.
The solutions have been already given, I'd like to point out that the window.open call might have multiple parameters included in "" and grouped by comma like:
javascript:window.open("http://www.javascript-coder.com","mywindow","status=1,toolbar=1");
There might be spaces between the function name and parentheses, so I'd use a slighty different regex for that:
my $pattern = qr{
javascript:window.open\s*
\(
([^)]+)
\)
}x;
print $1 if $text =~ /$pattern/;
Now you have all parameters in $1 and can process them afterwards with split /,/, $stuff and so on.
It reports an uninitialized value because $1 is undefined. $1 is undefined because you have created a nested matching group by wrapping a second set of parentheses around the pattern. It will also be undefined if nothing matches your pattern.