find the occurrences of a particular string in a given string? - regex

Hi friends, I'm now working in Perl. I need to check for occurrences of a given string in an array. I tried a lot but couldn't get it to work. Can anybody help?
open FILE, "<", "tab_name.txt" or die $!;
my @tabName=<FILE>;
my @all_files= <*>;
foreach my $tab(@tabName){
    $_=$tab;
    my $pgr="PGR Usage";
    if(m/$pgr/)
    {
        for(my $t=0;scalar @all_files;$t++){
            my $file_name='PGR.csv';
            $_=$all_files[$t];
            if(m\$file_name\)
            {
                print $file_name;
            }
        }
        print "\n$tab\n";
    }
}

Here is a problem:
for(my $t=0;scalar @all_files;$t++){
The second part of the for loop needs to be a condition, such as:
for(my $t=0;$t < @all_files;$t++){
Your code as written will never end.
However, this is much better:
foreach (@all_files){
In addition, you have a problem with your regex. The contents of a variable interpolated into a regex are treated as a regular expression, and . is a special character that matches any character. Thus, your code would match PGR.csv, but also PGRacsv, etc. It would also match file names that merely contain that string, such as FOO_PGR.csvblah. To solve this:
Use quote literal (\Q...\E) to make sure the filename is treated literally.
Use anchors to match the beginning and end of the string (^, $).
Also, backslashes are valid as regex delimiters, but they are a strange choice.
The corrected regex looks like this:
m/^\Q$file_name\E$/
Also, you should put this at the top of every script you write:
use warnings;
use strict;
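Putting the suggestions together (iterate with foreach, and match the name literally and in full), the inner check might look like the sketch below. It assumes a hard-coded list of made-up file names standing in for <*>, and it leaves out the tab_name.txt loop from the question.

```perl
use strict;
use warnings;

# Hypothetical file names standing in for the result of <*>
my @all_files = ('FOO_PGR.csvblah', 'PGRacsv', 'PGR.csv', 'notes.txt');
my $file_name = 'PGR.csv';

my @hits;
foreach my $name (@all_files) {
    # \Q...\E makes the dot literal; ^ and $ require the whole name to match
    push @hits, $name if $name =~ m/^\Q$file_name\E$/;
}
print "found: @hits\n";

# For a plain whole-string comparison, eq does the same job without a regex
my $exact = grep { $_ eq $file_name } @all_files;   # scalar context: a count
print "exact matches: $exact\n";
```

Without \Q...\E and the anchors, the same loop would also report PGRacsv and FOO_PGR.csvblah.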

This line:
for(my $t=0;scalar @all_files;$t++){
produces an infinite loop; you'd better use:
for(my $t=0;$t < @all_files;$t++){

Aside from the problems you have going through the array, are you looking for substr?
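If the goal is to locate every occurrence of one string inside another, a loop over index (swapped in here instead of substr) avoids regex metacharacters entirely. This is a sketch; occurrences() is a hypothetical helper name and the sample strings are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: every position at which $needle occurs in $haystack
sub occurrences {
    my ($haystack, $needle) = @_;
    return () unless length $needle;   # an empty needle would loop forever
    my @positions;
    my $pos = index($haystack, $needle);
    while ($pos != -1) {
        push @positions, $pos;
        # resume one character later, so overlapping hits are also found
        $pos = index($haystack, $needle, $pos + 1);
    }
    return @positions;
}

my @where = occurrences('PGR Usage, PGR.csv, PGR', 'PGR');
print scalar(@where), " occurrence(s) at: @where\n";
```

The number of occurrences is simply scalar(@where); the positions can be fed to substr if the surrounding text is needed.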

Related

Perl Regex Find and Return Every Possible Match

I'm trying to create a while loop that will find every possible substring within a string. But so far all I can match is the longest instance or the shortest. So, for example, I have the string
EDIT: changed string for demo purposes
"A.....B.....B......B......B......B"
And I want to find every possible sequence of "A.......B"
This code will give me the shortest possible return and exit the while loop
while($string =~ m/(A(.*?)B)/gi) {
print "found\n";
my $substr = $1;
print $substr."\n";
}
And this will give me the longest and exit the while loop.
$string =~ m/(A(.*)B)/gi
But I want it to loop through the string returning every possible match. Does anyone know if Perl allows for this?
EDIT: added desired output below
found
A.....B
found
A.....B.....B
found
A.....B.....B......B
found
A.....B.....B......B......B
found
A.....B.....B......B......B......B
There are various ways to parse the string so to scoop up what you want.
For example, use regex to step through all A...A substrings and process each capture
use warnings;
use strict;
use feature 'say';
my $s = "A.....B.....B......B......B......B";
while ($s =~ m/(A.*?)(?=A|$)/gi) {
    my @seqs = split /(B)/, $1;
    for my $i (0..$#seqs) {
        say @seqs[0..$i] if $i % 2 != 0;
    }
}
The (?=A|$) is a lookahead, so the non-greedy .*? matches everything up to the next A (or the end of the string), but that A is not consumed and so is still there for the next match. The split uses () in the separator pattern so that the separator, too, is returned (so we keep all those B's). It prints only when the slice has an even number of elements, that is, only for substrings ending with the separator (B here).
The above prints
A.....B
A.....B.....B
A.....B.....B......B
A.....B.....B......B......B
A.....B.....B......B......B......B
There may be bioinformatics modules that do this but I am not familiar with them.
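Perl can also make the regex engine itself enumerate every match, which answers the original question more directly. The sketch below uses the backtracking idiom shown in perlre: a code block (?{ ... }) records each candidate, then (*FAIL) forces the engine to backtrack, so the non-greedy .*? is stretched to the next B, and so on. It assumes Perl 5.10 or later and the same demo string.

```perl
use strict;
use warnings;

my $s = "A.....B.....B......B......B......B";
my @found;

# Record the current match, then fail on purpose so the engine keeps trying
# longer extensions of .*? (and later start positions) until none are left.
$s =~ /A.*?B(?{ push @found, $& })(*FAIL)/;

print "found\n$_\n" for @found;
```

Because .*? is non-greedy, the matches come out shortest first, in the order of the desired output. With a greedy .* they would come out longest first.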

Replace text between START & END strings excluding the END string in perl

I was going through examples and questions on the web related to finding and replacing text between two strings (say START and END) using Perl, and I was successful with the provided solutions. In that case, START and END themselves are also replaced. The syntax I used was s/START(.*?)END/replace_text/s, which replaces multiple lines between START and END but stops at the first occurrence of END.
But I would like to know how to replace the text between START and END while keeping END, like this, using Perl only.
Before:
START
I am learning pattern matching.
I am using perl.
END
After:
Successfully replaced.
END
To require END without including it in the match, you can use a positive lookahead:
s/START.*?(?=END)/replace_text/s
One solution is to capture the END and use it in the replacement text.
s/START(.*?)(END)/replace_text$2/s
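Both substitutions can be checked against the Before/After example from the question. In this sketch the input is held in a string rather than read from a file, and a trailing "\n" is added to the replacement text so the result matches the After block line for line.

```perl
use strict;
use warnings;

my $before = "START\nI am learning pattern matching.\nI am using perl.\nEND\n";

# Lookahead: END must follow, but it is not part of the match, so it stays
(my $via_lookahead = $before) =~ s/START.*?(?=END)/Successfully replaced.\n/s;

# Capture: END is matched as $2 and written back in the replacement
(my $via_capture = $before) =~ s/START(.*?)(END)/Successfully replaced.\n$2/s;

print $via_lookahead;
print $via_capture;
```

The lookahead form is slightly simpler because nothing has to be put back; the capture form is handy when the end marker varies and must be reproduced exactly.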
Another option is to use the range operator .. to skip every line of input until you find the end marker of a block, then output the replacement string and the end marker:
#!/usr/bin/perl
use strict;
use warnings;
my $rep_str = 'Successfully replaced.';
while (<>) {
my $switch = m/^START/ .. /^END/;
print unless $switch;
print "$rep_str\n$_" if $switch =~ m/E0$/;
}
It is quite easy to adapt it to work for an array of strings:
foreach (@strings) {
my $switch = ...
...
}
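Filled in, the array version might look like the sketch below. replace_block is a hypothetical helper name, and each element is assumed to be one line ending in "\n", as <> would deliver it.

```perl
use strict;
use warnings;

# Hypothetical helper: the flip-flop loop applied to a list of lines
sub replace_block {
    my ($rep_str, @lines) = @_;
    my @out;
    for (@lines) {
        my $switch = m/^START/ .. m/^END/;
        push @out, $_ unless $switch;                      # outside the block
        push @out, "$rep_str\n", $_ if $switch =~ m/E0$/;  # last line (END)
    }
    return @out;
}

my @out = replace_block('Successfully replaced.',
    "before\n", "START\n", "I am learning pattern matching.\n",
    "I am using perl.\n", "END\n", "after\n");
print @out;
```

One design caveat: the flip-flop keeps its state per operator, not per call, so if one input ends inside a START block, the next call begins "inside" as well.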
To use look-around assertions across lines, you need to redefine the input record separator ($/) (see perlvar), perhaps to slurp the whole file into memory. To avoid this, the range ("flip-flop") operator is quite useful:
while (<>) {
if (/^START/../^END/) {
next unless m{^END};
print "substituted_text\n";
print;
}
else {
print;
}
}
The above preserves any lines in the output that precede or follow the START/END block.

Perl Regex, get strings between two strings

I am new to Perl and trying to use Regex to get a piece of string between two tags that I know will be there in that string. I already tried various answers from stackoverflow but none of them seems to be working for me. Here's my example...
The data is in the $info variable, from which I want to extract the useful part:
my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";
The useful data in the above string is Boston, MA. I removed the newlines from the string with $info =~ s/\n//g;. Now $info holds the string "random text i do not want|BIRTH PLACE=Boston, MA|more unwanted random text". I thought doing this would help me capture the required data easily.
Please help me get the required data. I am sure that the data will always be preceded by |BIRTH PLACE= and followed by |. Everything before and after that is unwanted text. If a question like this has already been answered, please point me to it as well. Thanks.
Instead of replacing everything around it, you could search for /\|BIRTH PLACE=([^\|]+)\n\|/, [^\|]+ being one or more of anything that is not a pipe.
$info =~ m{\|BIRTH PLACE=(.*?)\|} or die "There is no data in \$info?!";
my $birth_place = $1;
That should do the trick.
You know, actually, those newlines might have helped you. I would have gone for an initial regular expression of:
/^\|BIRTH PLACE=(.*)$/m
The multiline modifier (m) makes ^ match at the beginning of a line and $ at the end of one, instead of only at the beginning and end of the string. Heck, you can even get really crazy and match:
/(?<=^\|BIRTH PLACE=).+$/m
To capture only the information you want, using lookbehind ((?<= ... )) to assert that it's the birth place information.
Why curse the string twice when you can do it once?
So, in perl:
if ($info =~ m/(?<=^\|BIRTH PLACE=).+$/m) {
print "Born in $&.\n";
} else {
print "From parts unknown";
}
You have presumably read this data from a file, which is a bad start. Your program should look like this:
use strict;
use warnings;
use autodie;
open my $fh, '<', 'myfile';
my $pob;
while (<$fh>) {
if (/BIRTH PLACE=(.+)/) {
$pob = $1;
last;
}
}
print $pob;
output
Boston, MA
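The same line-by-line approach can be tried without a file by opening an in-memory filehandle on the $info string from the question. This is a sketch; anchoring the pattern with ^\| is an added assumption that the field always starts its own line.

```perl
use strict;
use warnings;

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# Read the string as if it were a file (in-memory filehandle)
open my $fh, '<', \$info or die "Cannot open in-memory handle: $!";
my $pob;
while (my $line = <$fh>) {
    # . does not match "\n", so the capture stops at the end of the line
    if ($line =~ /^\|BIRTH PLACE=(.+)/) {
        $pob = $1;
        last;
    }
}
close $fh;
print "$pob\n";
```

Keeping the newlines means the end of the line marks the end of the value, so no closing | is needed.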

can't get the perl regex to work

My perl is getting rusty. It only prints "matched=" but $1 is blank!?!
EDIT 1: Who the h#$! downvoted this? There are no wrong questions. If you don't like it, move on to the next one!
$crazy="abcd\r\nallo\nXYZ\n\n\nQQQ";
if ($crazy =~ m/([.\n\r]+)/gsi) {
print "matched=", $1, "\n";
} else {
print "not matched!\n";
}
EDIT 2: This is the code fragment with updated regex, works great!
$crazy="abcd\r\nallo\nXYZ\n\n\nQQQ";
if ($crazy =~ m/([\s\S]+)/gsi) {
print "matched=", $1, "\n";
} else {
print "not matched!\n";
}
EDIT 3: Haha, I see the Perl police strike yet again!!!
I don't know if this is your exact problem, but inside square brackets, '.' is just looking for a period. I didn't see a period in the input, so I wondered which you meant.
Aside from the period, the rest of the character class matches only \n and \r, so the capture is a run of line-break characters (here the leading "\r\n"), which prints as blank. The way you print the result also suggests that you meant "any character" rather than a literal period, as mentioned above.
Axeman is correct; your problem is that . in a character class doesn't do what you expect.
By default, . outside a character class (and not backslashed) matches any character but a newline. If you want to include newlines, you specify the /s flag (which you seem to already have) on your regex or put the . in a (?s:...) group:
my $crazy="abcd\r\nallo\nXYZ\n\n\nQQQ";
if ($crazy =~ m/((?s:.+))/) {
print "matched=", $1, "\n";
} else {
print "not matched!\n";
}
. in a character class is a literal period, not "match anything". What you really want is /(.+)/s. The /g flag says to match multiple times, but you are using the regex in scalar context, so it will only match the first item. The /i flag makes the regex case-insensitive, but there are no characters with case in your regex. The /s flag makes . match newlines, and . always matches "\r", so instead of [.\n\r] you can just use a plain . (with /s).
However, /(.+)/s will match any string with one or more characters, so you would be better off with
my $crazy="abcd\r\nallo\nXYZ\n\n\nQQQ";
if (length $crazy) {
print "matched=$crazy\n";
} else {
print "not matched!\n";
}
It is possible you meant to do something like this:
#!/usr/bin/perl
use strict;
use warnings;
my $crazy = "abcd\r\nallo\nXYZ\n\n\nQQQ";
while ($crazy =~ /(.+)[\r\n]+/g) {
print "matched=$1\n";
}
But that would probably be better phrased:
#!/usr/bin/perl
use strict;
use warnings;
my $crazy = "abcd\r\nallo\nXYZ\n\n\nQQQ";
for my $part (split /[\r\n]+/, $crazy) {
print "matched=$part\n";
}
$1 contains only whitespace (a "\r\n" pair), which is why you don't see it when you print it like that. Add something after it or wrap it in quotes to make it visible.
Example:
perl -E "qq'abcd\r\nallo\nXYZ\n\n\nQQQ'=~/([.\n\r]+)/gsi;say 'got(',length($1),qq') >$1<';"
got(2) >
<
Updated for your comments:
To match everything you can simply use /(.+)/s
[.] (dot inside a character class) does not mean "match any character", it just means match the literal . character. So in an input string without any dots,
m/([.\n\r]+)/gsi
will just match strings of \n and \r characters.
With the /s modifier, you are already asking the regex engine to include newlines with . (match any character), so you could just write
m/(.+)/gsi
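The difference between the three patterns discussed above is easy to see if the captures are printed with \r and \n escaped. A small sketch using the string from the question:

```perl
use strict;
use warnings;

my $crazy = "abcd\r\nallo\nXYZ\n\n\nQQQ";

# [.\n\r] is a class of three literal characters: '.', "\n" and "\r"
my ($class) = $crazy =~ /([.\n\r]+)/;

# With /s, . matches any character, including "\n"
my ($all) = $crazy =~ /(.+)/s;

# Without /s, . stops before the first "\n" (but still matches "\r")
my ($first) = $crazy =~ /(.+)/;

# Make the invisible characters visible before printing
for my $pair (['class', $class], ['/s', $all], ['no /s', $first]) {
    (my $shown = $pair->[1]) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$pair->[0]: >$shown<\n";
}
```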

how do you match two strings in two different variables using regular expressions?

$a='program';
$b='programming';
if ($b=~ /[$a]/){print "true";}
this is not working
Thanks, everyone; I was a little confused.
The [] in a regex denotes a character class, which matches any one of the characters listed inside it.
Your regex is equivalent to:
$b=~ /[program]/
which returns true as character p is found in $b.
You are printing true to see whether the match happens, but printing an unquoted true will not show anything. Print a quoted string such as "true" instead.
But if you wanted to see if one string is present inside another you have to drop the [..] as:
if ($b=~ /$a/) { print 'true';}
If $a contains any regex metacharacters, the match above can fail or match the wrong thing. To fix that, wrap the variable in \Q and \E so that any metacharacters in it are escaped:
if ($b=~ /\Q$a\E/) { print 'true';}
Assuming either variable may come from external input, please quote the variables inside the regex:
if ($b=~ /\Q$a\E/){print "true";}
That way you won't get burned when the pattern you are looking for contains "reserved characters" such as any of -[]{}().
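A quick way to see what \Q...\E changes is a pattern containing a dot. In this sketch the variable is named $needle (a hypothetical name) instead of $a, since $a and $b are also the special variables used by sort.

```perl
use strict;
use warnings;

my $needle = 'a.c';   # hypothetical pattern containing the metacharacter '.'

my $raw    = 'abc'   =~ /$needle/     ? 1 : 0;   # '.' matches 'b'
my $quoted = 'abc'   =~ /\Q$needle\E/ ? 1 : 0;   # needs a literal "a.c"
my $exact  = 'xa.cx' =~ /\Q$needle\E/ ? 1 : 0;   # literal "a.c" is present

print "raw=$raw quoted=$quoted exact=$exact\n";
```

The unquoted version reports a false positive on "abc"; the quoted version only matches the literal text.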
(Apart from the missing semicolons:) Why do you put $a in square brackets? That turns it into a set of possible characters. Try:
$b =~ /\Q${a}\E/
Update
To answer your remarks regarding = and =~:
=~ is the matching operator, and specifies the variable to which you are applying the regex ($b) in your example above. If you omit =~, then Perl will automatically use an implied $_ =~.
In list context, the result of a regular expression match is a list of the captured groups. You usually assign this to an array or a list, such as in ($match1, $match2) = $b =~ /.../;. If, on the other hand, you assign the result to a scalar, the match is evaluated in scalar context and the scalar receives a true (1) or false (empty string) value, not a count.
So if you write $b = /\Q$a\E/, you'll end up with $b = $_ =~ /\Q$a\E/.
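The sketch below shows what a match returns in each context, using 'programming' from the question. The patterns are illustrative, and the "() =" counting idiom is shown as one common way to get a number of matches.

```perl
use strict;
use warnings;

my $string = 'programming';

# Scalar context: true (1) or false ('')
my $matched = $string =~ /gram/;

# List context: the captured groups
my ($head, $tail) = $string =~ /(prog)(ram)/;

# List context with /g: every match; assigning through () yields the count
my $count = () = $string =~ /m/g;

print "matched=$matched head=$head tail=$tail count=$count\n";
```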
$a='program';
$b='programming';
if ( $b =~ /\Q$a\E/) {
print "match found\n";
}
If you're just looking for whether one string is contained within another and don't need any character classes, quantifiers, etc., then there's really no need to fire up the regex engine to do an exact literal match. Consider using index instead:
#!/usr/bin/env perl
use strict;
use warnings;
my $target = 'program';
my $string = 'programming';
if (index($string, $target) > -1) {
print "target is in string\n";
}