Extracting a substring in Perl? - regex

I have a string in a variable:
$mystr = "some text %PARTS/dir1/dir2/myfile.abc some more text";
Here %PARTS is literally present in the string; it is not a variable or a hash.
I want to extract the substring %PARTS/dir1/dir2/myfile.abc from it. I wrote the following expression. I am just a beginner in Perl, so please let me know if I have done anything wrong.
my $local_file = substr ($mystr, index($mystr, '%PARTS'), index($mystr, /.*%PARTS ?/));
I even tried this:
my $local_file = substr ($mystr, index($mystr, '%PARTS'), index($mystr, /.*%PARTS' '?/));
But both give nothing if I print $local_file.
What might be wrong here?
Thank You.
UPDATE: Referred the following sites for using this method:
http://perlmeme.org/howtos/perlfunc/substr.html see example 1c
How to take substring of a given string until the first appearance of specified character?

The index function returns the position of the first occurrence of a substring in a string, or -1 if the substring is not found. It has nothing to do with regular expressions.
Regular expressions are applied to a string with the binding operator =~.
To extract the matched area of a regular expression, enclose the pattern in parens (a capture group). The matched substring will then be available in $1:
my $str = "some text %PARTS/dir1/dir2/myfile.abc some more text";
if ($str =~ /(%PARTS\S+)/) {
    my $local_file = $1;
    ...; # do something
} else {
    die "the match failed"; # do something else
}
The \S character class matches any non-whitespace character.
To learn more about regular expressions, see perlretut.
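For comparison, the substr/index approach from the question can also work, as long as index is given plain strings rather than patterns. This is a minimal sketch, assuming the path ends at the next space; the names $start and $end are illustrative and do not come from the question.

```perl
use strict;
use warnings;

my $mystr = "some text %PARTS/dir1/dir2/myfile.abc some more text";

# index() takes plain strings: find where the literal "%PARTS" begins.
my $start = index($mystr, '%PARTS');

# Find the first space at or after $start; -1 means the path runs to the end.
my $end = index($mystr, ' ', $start);
$end = length($mystr) if $end == -1;

# substr() takes an offset and a LENGTH, not an end position.
my $local_file = $start == -1 ? undef : substr($mystr, $start, $end - $start);

print "$local_file\n" if defined $local_file;
```

The third argument of substr is a length, which is why the sketch passes $end - $start rather than $end itself.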

The index function is not related to regexps. Its arguments are just strings, not regexps. So your usage is wrong.
Regexps are a powerful feature of Perl and the most appropriate tool for this task:
my ($local_file) = $mystr =~ /(%PARTS[^ ]+)/;
See perlop for more information on the =~ operator.

Related

Regex matching by scalars in perl

I am using a regular expression stored in a scalar here, for the first time. Here is the code; it should be self-evident:
#!/usr/bin/perl
my $regex = "PM*C";
my $var = "PM_MY_CALC";
if ($var =~ m/$regex/) {
    print "match \n";
}
else {
    print "no match\n";
}
The output that I get is "no match".
Am I missing something obvious here? It did not match anything else either, so I made the regex and the variable to be checked equal, and there is still no match.
I have tried doing this too:
if($var =~ $regex ){
based on some searching on PerlMonks.
am i missing something obvious here?
You're missing how regular expressions work. They don't work how shell filename expansion works.
Your regex uses *, which means "zero or more of the preceding character". So M* matches the empty string, 'M', 'MM', 'MMM', etc.
You wanted to match "PM" followed by any number of any character followed by "C". The correct regex for that is PM.*C. A dot (.) means "match (almost) any character" and (as I said above) * matches zero or more of that.
I recommend reading the Perl Regular Expression tutorial.
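The difference between the two patterns can be checked directly. The sketch below reuses the string from the question and tries both patterns from a scalar; the %result hash is only there to hold the outcomes and is not part of the original code.

```perl
use strict;
use warnings;

my $var = "PM_MY_CALC";
my %result;

for my $regex ("PM*C", "PM.*C") {
    # "PM*C"  needs P, zero or more M, then C right away: never true here.
    # "PM.*C" needs P, M, any run of characters, then C.
    $result{$regex} = $var =~ /$regex/ ? "match" : "no match";
    printf "%-6s %s\n", $regex, $result{$regex};
}
```

Only the second pattern matches, because .* lets the regex skip over "_MY_" before reaching a C.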

regexp if start with \{ or \"

I'm trying to write a regular expression that tests whether a variable starts with a string character* in Tcl. I wrote this code, but it doesn't work:
if {[regexp {^\"\{.*} $data]} {puts "something" }
*A string in Tcl starts with { or ".
You need to pick the right regular expression and use it correctly. This can get a lot less confusing if you store the RE in a variable first, particularly with large regular expressions, but even in this case it helps you understand the difference between the literal RE and how it is used.
set RE {^[\"\{]}
if {[regexp $RE $theString]} {
    puts "something"
}
Note that Tcl does not anchor its RE matching by default, so you don't need a leading or trailing .* if you are just determining if a RE matches.

Regular Expression - Perl

I am trying to get a substring from a string using a regular expression, but I am getting an error because my regular expression is not working. Can anyone help me write a correct one?
Here are the strings for which I am trying to write the regular expression:
MSM8_BD_V4.3_1-1_idle-Kr_Run3.xlsx
MSM8_BD_V4.3_2-6_mp3-Kr_Run2.xlsx
MSM8_BD_V4.3_Camera_snap-7.xlsx
MSM8_BD_V4.3_Camera_snap-8.xlsx
MSM8_BD_V4.3_Radio_202.16-0.xlsx
I am trying to get the bold part of each string.
Below is the regular expression I tried:
my $line = "MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx";
my ($captured) = $line =~ /MSM8939_BD_V4\.\3\_[d]*(.+?)\w/gx;
print "$captured\n";
[d] matches nothing but the literal letter d. You want \d, without the brackets, to match a digit. However, it looks like you also want to include underscores. That would be [\d_].
Try this:
/^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/
If I run this on your input (with e.g. perl -nE 'say $1 if /^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/'), I get this output:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
my $line = "MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx";
for (qw(
    MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx
    MSM8939_BD_V4.3_2-6_mp3-Kratos_Run2.xlsx
    MSM8939_BD_V4.3_Camera_snap-7.xlsx
    MSM8939_BD_V4.3_Camera_snap-8.xlsx
    MSM8939_BD_V4.3_Radio_202.16-0.xlsx
)) {
    my ($captured) = ($_ =~ /.*[-_]([^\W_]+_[\w.]+)-/gx);
    print "$captured\n";
}
Use a greedy pattern to go as far as possible, then grab the last two strings that look like what you want which are still followed by a hyphen.
As does the other answer which was just edited while I was typing, this produces:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
This one may be more general in that the beginning of the substring is not hard-coded, i.e., you could use it in other cases which did not necessarily start with MSM8_BD_V4.3.

Regular expression using PowerShell

Here is the scenario: I have the lines shown below, and I want to extract only the middle part between the two dots.
"scvmm.new.resources" --> after a regular expression match, this should return only "new"
"sc.new1.rerces" --> after a regular expression match, this should return only "new1"
My basic requirement is to extract anything between two dots; anything can come before and after them:
(.*).<required code>.(.*)
Could anyone please help me out?
You can do that without using regex. Split the string on '.' and grab the middle element:
PS> "scvmm.new.resources".Split('.')[1]
new
Or this
'scvmm.new.resources' -replace '.*\.(.*)\..*', '$1'
Like this:
([regex]::Match("scvmm.new1.resources", '(?<=\.)([^\.]*)(?=\.)' )).value
You don't actually need regular expressions for such a trivial substring extraction. As with Shay's Split('.'), one can use IndexOf() to similar effect, like so:
$s = "scvmm.new.resources"
$l = $s.IndexOf(".")+1
$r = $s.IndexOf(".", $l)
$s.Substring($l, $r-$l) # Prints new
$s = "sc.new1.rerces"
$l = $s.IndexOf(".")+1
$r = $s.IndexOf(".", $l)
$s.Substring($l, $r-$l) # Prints new1
This looks for the first occurrence of a dot. Then it looks for the first occurrence of a dot after the first hit. Then it extracts the characters between the two locations. This is useful in, say, scenarios in which the separator characters are not the same (though the Split() way would work in many cases too).

How can I capture multiple matches from the same Perl regex?

I'm trying to parse a single string and get multiple chunks of data out from the same string with the same regex conditions. I'm parsing a single HTML doc that is static (For an undisclosed reason, I can't use an HTML parser to do the job.) I have an expression that looks like:
$string =~ /\<img\ssrc\="(.*)"/;
and I want to get the value of $1. However, in the one string there are many img tags like this, so I need something like an array returned (@1?). Is this possible?
As in Jim's answer, use the /g modifier (in list context or in a loop).
But beware of greediness: you don't want the .* to match more than necessary (and don't escape < or =, they are not special).
while($string =~ /<img\s+src="(.*?)"/g ) {
...
}
@list = ($string =~ m/\<img\ssrc\="(.*)"/g);
The g modifier matches all occurrences in the string. List context returns all of the matches. See the m// operator in perlop.
You just need the global modifier /g at the end of the match. Then loop through until there are no matches remaining:
my @matches;
while ($string =~ /\<img\ssrc\="(.*)"/g) {
    push(@matches, $1);
}
Use the /g modifier and list context on the left, as in
@result = $string =~ /\<img\ssrc\="(.*)"/g;
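Putting the two points from these answers together (the /g modifier in list context, plus a non-greedy capture), a small self-contained sketch might look like this. The sample HTML string is made up for illustration, since the real document is not shown in the question.

```perl
use strict;
use warnings;

# Made-up sample input; the real document is not shown in the question.
my $string = '<img src="a.png"><p>text</p><img src="b.jpg" alt="x">';

# List context plus /g returns every capture; .*? stops at the first closing quote.
my @srcs = $string =~ /<img\s+src="(.*?)"/g;

print "$_\n" for @srcs;
```

With the greedy (.*) instead, the single capture would run from the first src value through to the last double quote in the string, which is why the non-greedy form matters once several tags share one line.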