Find n occurrences from group of characters

Find n occurrences from group of characters - regex

Given a string, I am supposed to print "two" if I find exactly two characters from the group xyz.
Given jxyl print two
Given jxyzl print nothing
Given jxxl print two
I am very new to perl so this is my approach.
my $word = "jxyl";
@char = split //, $word;
my $size = $#char;
for ( $i = 0; $i < $size - 1; $i++ ) {
if ( $char[i] eq "x" || $char[i] eq "y" || $char eq "z" ) {
print "two";
}
}
Can anyone tell me why this isn't working correctly?

From the FAQ:
perldoc -q count
How can I count the number of occurrences of a substring within a string?
use warnings;
use strict;
while (<DATA>) {
chomp;
my $count = () = $_ =~ /[xyz]/g;
print "$_ two\n" if $count == 2;
}
__DATA__
jxyl
jxyzl
jxxl
Outputs:
jxyl two
jxxl two

You basically want to count the number of specific characters in a string.
You can use tr:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my $count = $_ =~ tr/xyz//;
print "$_ - $count\n";
}
__DATA__
jxyl
jxyzl
jxxl
Outputs:
jxyl - 2
jxyzl - 3
jxxl - 2
Determining if there are exactly 2 can be done after the counting.
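As a minimal sketch of that last step, the count from tr can be compared against the target directly. The helper name has_exactly is made up for illustration and is not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $word holds exactly $n characters from x, y, z.
sub has_exactly {
    my ($word, $n) = @_;
    # tr/// with an empty replacement list only counts; it does not modify $word.
    my $count = ($word =~ tr/xyz//);
    return $count == $n;
}

for my $word (qw(jxyl jxyzl jxxl)) {
    print "$word two\n" if has_exactly($word, 2);
}
```

With the three sample words from the question, this should print a line for jxyl and jxxl only.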

Definitely not the best way to do it, but here is a regex for fun and to show there is more than one way to do things.
perl -e'$word = "jxyl"; print "two" if $word =~ /^[^xyz]*[xyz][^xyz]*[xyz][^xyz]*$/'
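The same idea can be written with a quantifier so the repeated part is not spelled out by hand. This is only a sketch; matches_exactly is a hypothetical name, and it assumes $n is a non-negative integer.

```perl
use strict;
use warnings;

# Hypothetical helper: anchored regex that allows exactly $n characters
# from [xyz], each followed by any run of other characters.
sub matches_exactly {
    my ($word, $n) = @_;
    return $word =~ /\A[^xyz]*(?:[xyz][^xyz]*){$n}\z/ ? 1 : 0;
}

for my $word (qw(jxyl jxyzl jxxl)) {
    print "$word two\n" if matches_exactly($word, 2);
}
```

Counting with tr or a global match is still simpler; the anchored pattern is mainly useful when the check has to live inside a larger regex.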

Related

Counting number of pattern matches in Perl

I am VERY new to perl, and to programming in general.
I have been searching for the past couple of days for how to count the number of pattern matches; I have had a hard time understanding others' solutions and applying them to the code I have already written.
Basically, I have a sequence and I need to find all the patterns that match [TC]C[CT]GGAAGC
I believe I have that part down, but I am stuck on counting the number of occurrences of each pattern match. Does anyone know how to edit the code I already have to do this? Any advice is welcome. Thanks!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
# open fasta file for reading
unless( open( FASTA, "<", '/scratch/Drosophila/dmel-all-chromosome-r6.02.fasta' )) {
die "Can't open dmel-all-chromosome-r6.02.fasta for reading:", $!;
}
#split the fasta record
local $/ = ">";
#scan through fasta file
while (<FASTA>) {
chomp;
if ( $_ =~ /^(.*?)$(.*)$/ms) {
my $header = $1;
my $seq = $2;
$seq =~ s/\R//g; # \R removes line breaks
while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
print $1, "\n";
}
}
}
Update: I have added
my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
print scalar @matches;
to the code below. However, it seems to be outputting 0 in front of each pattern match, instead of outputting the total number of pattern matches.
while (<FASTA>) {
chomp;
if ( $_ =~ /^(.*?)$(.*)$/ms) {
my $header = $1;
my $seq = $2;
$seq =~ s/\R//g; # \R removes line breaks
while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
print $1, "\n";
my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
print scalar @matches;
}
}
}
Edit: I need the output to list every pattern match found. I also need it to find the total number of matches found. For example:
CCTGGAAGC
TCTGGAAGC
TCCGGAAGC
3 matches found

counting the number of occurrences of each pattern match
my @matches = $string =~ /pattern/g;
The @matches array will contain all the matched parts. You can then do the following to get the count:
print scalar @matches;
Or you could write it directly:
my $matches = () = $string =~ /pattern/g;
I would suggest the former, as you might need to check what was matched in the future (perhaps for debugging).
Example 1:
use strict;
use warnings;
my $string = 'John Doe John Done';
my $matches = () = $string =~ /John/g;
print $matches; #prints 2
Example 2:
use strict;
use warnings;
my $string = 'John Doe John Done';
my @matches = $string =~ /John/g;
print "@matches"; #prints John John
print scalar @matches; #prints 2
Edit:
# A list-context match returns every match at once, so no while loop is needed.
my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
print "$_\n" for @matches;
print "Count of matches: " . scalar(@matches) . "\n";

As you have written the code, you have to count the matches yourself:
local $/ = ">";
my $count = 0;
#scan through fasta file
while (<FASTA>) {
chomp;
if ( $_ =~ /^(.*?)$(.*)$/ms) {
my $header = $1;
my $seq = $2;
$seq =~ s/\R//g; # \R removes line breaks
while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
print $1, "\n";
$count = $count +1;
}
}
}
print "Found $count matches\n";
should do the job.
HTH Georg

my @count = ($seq =~ /([TC]C[CT]GGAAGC)/g);
print scalar @count ;
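Putting the pieces together, here is a small sketch that prints every match and then the total, in the format the question asks for. The sequence is made up for illustration (it is not taken from the FASTA file), and find_motifs is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return every motif match in a sequence, in order.
# Must be called in list context, otherwise the match returns true/false.
sub find_motifs {
    my ($seq) = @_;
    return $seq =~ /([TC]C[CT]GGAAGC)/g;
}

# Made-up sequence for illustration only.
my $seq = 'AACCTGGAAGCTTTCTGGAAGCGGTCCGGAAGCAA';

my @matches = find_motifs($seq);
print "$_\n" for @matches;
print scalar(@matches), " matches found\n";
```

To get a total over the whole file rather than per record, the count would have to be accumulated in a variable declared outside the while loop, as in the answer above that uses $count.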

Counting occurrences of a word in a string in Perl

I am trying to find out the number of occurrences of "The/the". Below is the code I tried:
print ("Enter the String.\n");
$inputline = <STDIN>;
chop($inputline);
$regex="\[Tt\]he";
if($inputline ne "")
{
@splitarr= split(/$regex/,$inputline);
}
$scalar=@splitarr;
print $scalar;
The string is :
Hello the how are you the wanna work on the project but i the u the
The
The output that it gives is 7. However with the string :
Hello the how are you the wanna work on the project but i the u the
the output is 5. I suspect my regex. Can anyone help in pointing out what's wrong.

I get the correct number, 6, for the first string.
However, your method is wrong: if you count the number of pieces you get by splitting on the regex pattern, the result will differ depending on whether the word appears at the beginning of the string. You should also put word boundaries \b into your regular expression to prevent the regex from matching something like theory.
Also, it is unnecessary to escape the square brackets, and you can use the /i modifier to do a case-independent match.
Try something like this instead:
use strict;
use warnings;
print 'Enter the String: ';
my $inputline = <>;
chomp $inputline;
my $regex = 'the';
if ( $inputline ne '' ) {
my @matches = $inputline =~ /\b$regex\b/gi;
print scalar @matches, " occurrences\n";
}
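To make the effect of the word boundaries concrete, here is a short sketch comparing the two patterns on a string where "the" also appears inside longer words. The helper names are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers: count "the" as a substring and as a whole word.
sub count_substring {
    my ($s) = @_;
    my $n = () = $s =~ /the/gi;
    return $n;
}

sub count_word {
    my ($s) = @_;
    my $n = () = $s =~ /\bthe\b/gi;
    return $n;
}

my $text = 'the theological cathedral';
print count_substring($text), "\n";   # counts "the" inside longer words too
print count_word($text), "\n";        # counts only the standalone word
```

For this string the first count is 3 and the second is 1.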

With split, you're counting the substrings between the the's. Use match instead:
#!/usr/bin/perl
use warnings;
use strict;
my $regex = qr/[Tt]he/;
for my $string ('Hello the how are you the wanna work on the project but i the u the The',
'Hello the how are you the wanna work on the project but i the u the',
'the theological cathedral'
) {
my $count = () = $string =~ /$regex/g;
print $count, "\n";
my @between = split /$regex/, $string;
print 0 + @between, "\n";
print join '|', @between;
print "\n";
}
Note that both methods return the same number for the two inputs you mentioned (and the first one returns 6, not 7).

The following snippet uses a code side-effect to increment a counter, followed by an always-failing match to keep searching. It produces the correct answer for matches that overlap (e.g. "aaaa" contains "aa" 3 times, not 2). The split-based answers don't get that right.
my $i;
my $string;
$i = 0;
$string = "aaaa";
$string =~ /aa(?{$i++})(?!)/;
print "'$string' contains /aa/ x $i (should be 3)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the The";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 6)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 5)\n";
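If embedded code blocks feel too heavy, overlapping matches can also be counted with a lookahead, since a zero-width match does not consume the text it finds. This is a sketch of that alternative, not the technique used above, and count_overlapping is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: count matches of $pattern, including overlapping ones,
# by matching a zero-width lookahead at every position.
sub count_overlapping {
    my ($string, $pattern) = @_;
    my $count = () = $string =~ /(?=$pattern)/g;
    return $count;
}

print count_overlapping('aaaa', qr/aa/), "\n";   # 3
print count_overlapping(
    'Hello the how are you the wanna work on the project but i the u the The',
    qr/[tT]he/
), "\n";                                         # 6
```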

What you need is the 'countof' idiom, = () =, to count the number of matches:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/[Tt]he/g;
print $count;
If you want to select only the word the or The, add word boundary:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/\b[Tt]he\b/g;
print $count;

replace {x} with param in string

I want to replace {x} where x is a number from 1-10 with a string from an array.
The array is populated by splitting a string with whitespace.
I have put together some code but the regex is probably wrong.
my @params = split(' ', "Paramtest: {0} {1} {2}");
my $count = @params;
for (my $i = 0; $i <= $count; $i++) {
my $param = @params->[$i];
$cmd_data =~ s/{"$i"}/"$param"/;
if(!$cmd_data) {
$server->command(sprintf("msg $target %s incorrect syntax for %s.", $nick, "!params p1 p2 p3"));
return;
}
}
$server->command(sprintf("msg $target %s.", $cmd_data));
Update
I've tried using the below code as a modified version of Miller's (the first answer)
my @params = split(' ', "!fruit oranges apples");
my $cmd_data = "Fruits: {0} {1}";
$cmd_data =~ s{\{(\d+)\}}{
$params[$1] // die "Not found $1" #line 160
}eg;
$server->command(sprintf("msg $target %s.", $cmd_data));
Output
Not found 1 at myscript.pl line 160.

Perhaps a more generalized search and replace will serve you better:
use strict;
use warnings;
my @params = qw(zero one two three four five six seven eight);
my $string = 'My String: {0} {1} {2}';
$string =~ s{\{(\d+)\}}{
$params[$1] // die "Not found $1"
}eg;
print $string;
Outputs:
My String: zero one two
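If dying on a missing parameter is too strict, one possible variation leaves unknown placeholders untouched instead. This is a sketch under that assumption; fill_placeholders is a made-up name, and it avoids the // operator so it does not depend on Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: replace {N} with the N-th parameter, keeping the
# placeholder as-is when no such parameter exists.
sub fill_placeholders {
    my ($template, @params) = @_;
    $template =~ s/\{(\d+)\}/defined $params[$1] ? $params[$1] : '{' . $1 . '}'/eg;
    return $template;
}

print fill_placeholders('Fruits: {0} {1} {5}', qw(oranges apples)), "\n";
# Fruits: oranges apples {5}
```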

replace starting characters in string

I have a string in which I need to replace the starting set of characters with mod1.
For example, xyz_gf_111_yz becomes mod1_111_yz,
bcd_df_222_xx becomes mod2_222_xx, and so on.
Can anybody suggest a solution? The starting string is not fixed, and I'm a beginner in Perl.
Thanks!

my @strings = qw(xyz_gf_111_yz bcd_df_222_xx asd_cv_333_dd);
my $i = 1;
for my $str (@strings)
{
my $after = $str;
$after =~ s/^\w{3}[_]\w{2}/mod$i/;
$i++;
print "$str -> $after\n";
}

Something like the following could get you started:
my @strings = qw(xyz_gf_111_yz bcd_df_222_xx);
my $i = 0;
for my $str (@strings) {
my $after = $str;
$i++;
$after =~ s/^[^_]+_[^_]+/mod$i/; # replace the first two underscore-separated fields
print "$str -> $after\n";
}

@Miller,
I suggest a different solution. Assuming that you want to replace the starting substring (all characters to the left of the first digit), and that the number appended to "mod" is given by the first digit of the numeric substring, the following could be a way:
my @strings = qw(xyz_gf_111_yz bcd_df_222_xx asd_cv_333_dd);
for my $str (@strings) {
print "bfr:".$str."\n";
$str =~ s/^([^\d]+?)_(\d)/mod$2_$2/;
print "aft:".$str."\n";
}

Here's another option:
use strict;
use warnings;
my $i;
my @strings = ( 'xyz_gf_111_yz', 'bcd_df_222_xx' );
for (@strings) {
print $_, "\n" if s/.+?_[^_]+/'mod'.++$i/e;
}
Output:
mod1_111_yz
mod2_222_xx

Extract word before the 1st occurrence of a special string

I have an array that contains elements like
@array=("link_dm &&& drv_ena&&&1",
"txp_n_los|rx_n_lost",
"eof &&& 2 &&& length =!!!drv!!!0");
I want to get all the characters before the first "&&&", and if the element doesn't have a "&&&", then I need to extract the entire element.
This is what I want to extract:
link_dm
txp_n_los|rx_n_lost
eof
I used
foreach my $row (@array){
if($row =~ /^(.*)\&{3}/){
push @firstelements,$1;
}
}
But I'm getting
link_dm &&& drv_ena
txp_n_los|rx_n_lost
eof &&& 2
Can somebody please suggest how I can achieve this?

Perhaps just splitting would be helpful:
use strict;
use warnings;
my #array = (
"link_dm &&& drv_ena&&&1",
"txp_n_los|rx_n_lost",
"eof &&& 2 &&& length =!!!drv!!!0"
);
foreach my $row (@array){
my ($chars) = split /\&{3}/, $row, 2;
print $chars, "\n"
}
Output:
link_dm
txp_n_los|rx_n_lost
eof

You can write:
@firstelements = map { m/^(.*?) *&&&/ ? $1 : $_ } @array;
Or, if you prefer foreach over map and if over ?::
foreach my $row (@array){
if($row =~ /^(.*?) *&&&/) {
push @firstelements, $1;
} else {
push @firstelements, $row;
}
}

for (@array) {
print "$1\n" if /([^ ]*)(?: *[&]{3}.*)?$/;
}

If you're using regular expressions, use the non-greedy (minimal) quantifier .*? so the match stops at the first &&& rather than the last. See perldoc perlre http://perldoc.perl.org/perlre.html
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
my @array = (
"link_dm &&& drv_ena&&&1",
"txp_n_los|rx_n_lost",
"eof &&& 2 &&& length =!!!drv!!!0",
);
my @first_elements = ();
for my $line ( @array ){
# check for '&&&'
if( my ( $first_element ) = $line =~ m{ \A (.*?) \s* \&{3} }msx ){
push @first_elements, $first_element;
}else{
push @first_elements, $line;
}
}
print Dumper \@first_elements;
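To see why the non-greedy quantifier matters, here is a small side-by-side sketch on the third sample element. The first_field helper is a hypothetical name, and falling back to the whole row when there is no &&& is taken from the question's requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: text before the first '&&&' (trailing spaces dropped),
# or the whole row when there is no '&&&'.
sub first_field {
    my ($row) = @_;
    return $row =~ /\A(.*?)\s*&&&/s ? $1 : $row;
}

my $row = "eof &&& 2 &&& length =!!!drv!!!0";

my ($greedy) = $row =~ /^(.*)&{3}/;       # .* runs to the LAST &&&
my ($lazy)   = $row =~ /^(.*?)\s*&{3}/;   # .*? stops at the FIRST &&&

print "greedy: '$greedy'\n";
print "lazy:   '$lazy'\n";
print first_field("txp_n_los|rx_n_lost"), "\n";
```

The greedy capture keeps everything up to the last &&&, which is the behaviour the question ran into, while the lazy one yields just eof.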

We Keep Coding
