how to grep the word between symbols using perl - regex

I need to grep the word between the symbols as shown below in an array.
my $string = "hi how r u<what is your name>what is your age";
I need to grep it as:
$str = "what is your name";

You could use capturing groups like below.
use strict;
use warnings;
my $line = "hi how r u?what is your name?what is your age";
my @str = $line =~ m/\?([^?]*)\?/g;
print "@str\n";
Output:
what is your name

Or, in Perl:
$string =~ /\?(.+)\?/;
$result = $1;

You can split your string on this metacharacter.
my @results = split /\?/, $string;
print $results[1];
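The two regex answers above behave differently once the string holds more than two `?` marks. The following is only a sketch of that difference; the sub name `between_marks` and the sample string `a?b?c?d` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every chunk enclosed by a pair of '?' marks.
sub between_marks {
    my ($text) = @_;
    return $text =~ /\?([^?]*)\?/g;
}

my $two   = "hi how r u?what is your name?what is your age";
my $three = "a?b?c?d";

# Greedy .+ runs to the LAST '?', so it swallows the inner mark.
my ($greedy) = $three =~ /\?(.+)\?/;

print join('|', between_marks($two)), "\n";     # what is your name
print "$greedy\n";                              # b?c
print join('|', between_marks($three)), "\n";   # b
```

The negated class `[^?]*` can never cross a `?`, which is why it is usually the safer choice when the delimiter may repeat.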

How to grep the word between two symbols in Perl?

I need to grep the word between the symbols as shown below in an array.
my $string = "?hi how r u?what is your name?what is your age?";
It's to be converted to array where array should be like:
$array[0] = "hi how r u";
$array[1] = "what is your name";
$array[2] = "what is your age";
To ignore empty results you can match the input with regex and store matched results in an array:
use strict;
use warnings;
my $string = "?hi how r u?what is your name?what is your age?";
my @matches = ( $string =~ /(?<=\?)[^?]+/g );
foreach my $i (@matches) {
print $i . "\n";
}
Output:
hi how r u
what is your name
what is your age
You can use the split function, however you have to escape the ? character, so that it won't get special treatment as part of a regular expression control character.
my @array = split '\\?', $string;
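Because the string starts with `?`, split returns an empty first field (trailing empty fields are dropped automatically). A minimal sketch of filtering it out, where the variable names `@raw` and `@clean` are just illustrative:

```perl
use strict;
use warnings;

my $string = "?hi how r u?what is your name?what is your age?";

# split keeps the empty field before the first '?' but drops trailing ones.
my @raw = split /\?/, $string;

# Filter out zero-length fields to keep only the phrases.
my @clean = grep { length } @raw;

print scalar(@raw), " fields from split\n";   # 4 fields from split
print "$_\n" for @clean;
```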

Perl - string matching issue

I have a problem I cannot understand. I have this string:
gene_id "siRNA_Z27kG1_20543"transcript_id "siRNA_Z27kG1_20543_X_1";tss_id "TSS124620"
And I want to change the gene_id. So, I have the following code:
if ($line =~ /;transcript_id "([A-Za-z0-9:\-._]*)(_[oxOX][_.][0-9]*)";/) {
$num = $2;
$line =~ s/gene_id "([A-Za-z0-9:\-._]*)";/gene_id "$1$num";/g;
print $new $line."\n";
}
The aim of my code is to change siRNA_Z27kG1_20543 to siRNA_Z27kG1_20543_X_1. However, my code does not produce that output. Why? I can't understand that.
My regex needs to be as it is because I match other strings (this time with success).
#!/usr/bin/perl
use strict;
use warnings;
my $string = q{gene_id "siRNA_Z27kG1_20543"transcript_id "siRNA_Z27kG1_20543_X_1";tss_id "TSS124620"};
if($string =~ m|transcript_id "([A-Za-z0-9:\-._]*)(_[oxOX][_.][0-9]*)"|){
my $replace_with = qq{gene_id "$1$2"};
$string =~ s/gene_id (\"\w+\")/$replace_with/g;
}
print "$string";
Output: gene_id "siRNA_Z27kG1_20543_X_1"transcript_id "siRNA_Z27kG1_20543_X_1";tss_id "TSS124620"
Remove the semicolon at the start of the pattern, as it is not present in the string:
if ($line =~ /transcript_id "([A-Za-z0-9:\-._]*)(_[oxOX][_.][0-9]*)";/) {
$num = $2;
$line =~ s/gene_id "([A-Za-z0-9:\-._]*)";/gene_id "$1$num";/g;
print $new $line."\n";
}
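If the record really looks like the one quoted in the question, with no `;` after the gene_id value, the substitution also has to tolerate the missing semicolon. Here is a hedged sketch that leaves the trailing `;` out of both patterns; the sub name `sync_gene_id` is hypothetical and the layout is assumed to be exactly as shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: append the transcript's _X_n style suffix to gene_id.
# Assumes the record layout quoted in the question; neither pattern
# requires a ';' after the closing quote.
sub sync_gene_id {
    my ($line) = @_;
    my $id = qr/[A-Za-z0-9:\-._]*/;
    if ($line =~ /transcript_id "$id(_[oxOX][_.][0-9]*)"/) {
        my $suffix = $1;    # e.g. _X_1
        $line =~ s/gene_id "($id)"/gene_id "$1$suffix"/;
    }
    return $line;
}

my $rec = q{gene_id "siRNA_Z27kG1_20543"transcript_id "siRNA_Z27kG1_20543_X_1";tss_id "TSS124620"};
print sync_gene_id($rec), "\n";
```

Saving the suffix in a lexical before the substitution matters, because the second match overwrites `$1`.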

replace starting characters in string

I have a string in which I need to replace the starting set of characters with mod1.
It's like xyz_gf_111_yz to mod1_111_yz,
bcd_df_222_xx to mod2_222_xx, and so on.
Can anybody suggest a solution? The starting string is not fixed, and I'm a beginner in Perl.
Thanks!
my @strings = qw(xyz_gf_111_yz bcd_df_222_xx asd_cv_333_dd);
my $i = 1;
for my $str (@strings)
{
my $after = $str;
$after =~ s/^\w{3}[_]\w{2}/mod$i/;
$i++;
print "$str -> $after\n";
}
Something like the following could get you started:
my @strings = qw(xyz_gf_111_yz bcd_df_222_xx);
my $i = 0;
for my $str (@strings) {
my $after = $str;
$i++;
$after =~ s/[^_]+/mod$i/;
print "$str -> $after\n";
}
@Miller,
I suggest a different solution. Assuming that you want to replace the starting substring (all characters to the left of the first digit), and that the number appended to "mod" is the first digit of the numeric substring, the following could be one way.
my @strings = qw(xyz_gf_111_yz bcd_df_222_xx asd_cv_333_dd);
for my $str (@strings) {
print "bfr:".$str."\n";
$str =~ s/^([^\d]+?)_(\d)/mod$2_$2/;
print "aft:".$str."\n";
}
Here's another option:
use strict;
use warnings;
my $i;
my @strings = ( 'xyz_gf_111_yz', 'bcd_df_222_xx' );
for (@strings) {
print $_, "\n" if s/.+?_[^_]+/'mod'.++$i/e;
}
Output:
mod1_111_yz
mod2_222_xx
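As an alternative to the regex answers, the same result can be had by splitting on `_` with a field limit. This is only a sketch under the assumption that the prefix is always the first two underscore-separated fields; the sub name `retag` is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first two '_'-separated fields and
# put "mod<N>" in their place. Names with fewer than three fields
# are returned unchanged.
sub retag {
    my ($name, $n) = @_;
    my @parts = split /_/, $name, 3;   # limit 3 keeps the tail intact
    return $name if @parts < 3;
    return "mod${n}_$parts[2]";
}

my $i = 0;
for my $name (qw(xyz_gf_111_yz bcd_df_222_xx)) {
    print retag($name, ++$i), "\n";
}
```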

Perl: alter a string by regex match

I'm using this code to alter a string by a regex match.
$a->{'someone'} = "a _{person}";
$a->{'person'} = "gremlin";
$string = "_{someone} and a thing"
while($string =~ /(_\{(.*?)\}/g){
$search = metaquote($1);
$replace = $a->{$2};
$string =~ s/$search/$replace/;
}
The result is a _{person} and a thing but I'm expecting: a gremlin and a thing.
What to do to get this working?
The function is called quotemeta, not metaquote. Also, a right parenthesis is missing in your regex:
#!/usr/bin/perl
use warnings;
use strict;
my $a;
$a->{'someone'} = "a _{person}";
$a->{'person'} = "gremlin";
my $string = "_{someone} and a thing";
while($string =~ /(_\{(.*?)\})/g){
my $search = quotemeta($1);
my $replace = $a->{$2};
$string =~ s/$search/$replace/;
}
print "$string\n";
I also added strict and warnings to help myself avoid common pitfalls.
I think this should be a more efficient variant:
use strict;
my $a;
$a->{'someone'} = "a _{person}";
$a->{'person'} = "gremlin";
my $string = "_{someone} and a thing";
while( $string =~ s/(_\{(.*?)\})/ $a->{$2} /ges ) {}
print $string."\n";
This variant repeatedly substitutes all of the placeholders in the string with their corresponding hash values until there are none left.
In addition, $a is a bad identifier for any variable (Perl also uses it as a special variable in sort), so I have named the hash tokens.
use strict;
use warnings;
my %tokens = (
someone => 'a _{person}',
person => 'gremlin',
);
my $string = '_{someone} and a thing';
1 while $string =~ s/_\{([^}]+)\}/$tokens{$1}/g;
print $string, "\n";
output
a gremlin and a thing
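One thing the looping answers do not guard against is a placeholder with no entry in the hash, or tokens that refer to each other. A defensive sketch follows; the sub name `expand_tokens` and the `$max_passes` limit are assumptions added for illustration, not part of any answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: expand _{name} placeholders from the hash in $tokens.
# Unknown names are left as-is; $max_passes (an assumed safety limit)
# stops runaway expansion when tokens refer to each other.
sub expand_tokens {
    my ($text, $tokens, $max_passes) = @_;
    $max_passes = 10 unless defined $max_passes;
    for (1 .. $max_passes) {
        my $before = $text;
        $text =~ s/(_\{([^}]+)\})/exists $tokens->{$2} ? $tokens->{$2} : $1/ge;
        last if $text eq $before;    # nothing changed, so we are done
    }
    return $text;
}

my %tokens = (
    someone => 'a _{person}',
    person  => 'gremlin',
);
print expand_tokens('_{someone} and a thing', \%tokens), "\n";   # a gremlin and a thing
```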

How can I count the amount of spaces at the start of a string in Perl?

How can I count the amount of spaces at the start of a string in Perl?
I now have:
$temp = rtrim($line[0]);
$count = ($temp =~ tr/^ //);
But that gives me the count of all spaces.
$str =~ /^(\s*)/;
my $count = length( $1 );
If you just want actual spaces (instead of whitespace), then that would be:
$str =~ /^( *)/;
Edit: The reason why tr doesn't work is that it's not a regular expression operator. What you're doing with $count = ( $temp =~ tr/^ // ); is replacing every ^ and every space with itself (see comment below by cjm), then counting how many replacements were made. tr doesn't see ^ as "hey, this is the beginning-of-string anchor"; it sees it as "hey, this is a ^".
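To make the difference concrete, here is a small sketch comparing the two counts on a made-up string (the sample value of `$line` is an assumption chosen so that both a `^` and interior spaces are present):

```perl
use strict;
use warnings;

my $line = "   a ^ b";    # three leading spaces, two more further in, one '^'

# tr treats '^' and ' ' as plain characters and counts both.
my $tr_count = ($line =~ tr/^ //);

# The regex anchor limits the count to the leading run of spaces.
my ($lead) = $line =~ /^( *)/;
my $lead_count = length $lead;

print "tr: $tr_count, leading: $lead_count\n";   # tr: 6, leading: 3
```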
You can get the offset of a match using @-. If you search for a non-whitespace character, this will be the number of whitespace characters at the start of the string:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", " foo bar", " ") {
my $count = $s =~ /\S/ ? $-[0] : length $s;
print "'$s' has $count whitespace characters at its start\n";
}
Or, even better, use @+ to find the end of the whitespace:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", " foo bar", " ") {
$s =~ /^\s*/;
print "$+[0] '$s'\n";
}
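The @+ approach can be wrapped in a small sub. This is a sketch only; the name `leading_ws` is hypothetical, and it counts any whitespace (tabs included), not just spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: number of whitespace characters at the start of $s.
sub leading_ws {
    my ($s) = @_;
    $s =~ /^\s*/;      # always matches, possibly with zero width
    return $+[0];      # offset at which the whole match ended
}

for my $s ("foo", "  foo", " \t x", "   ") {
    print leading_ws($s), "\n";
}
```

Since `\s*` can match the empty string, the match never fails, so `$+[0]` is always defined here.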
Here's a script that does this for every line of stdin. The relevant snippet of code is the first in the body of the loop.
#!/usr/bin/perl
while ($x = <>) {
$s = length(($x =~ m/^( *)/)[0]); # ( *) also matches zero spaces, so the count is 0 rather than undef
print $s, ":", $x, "\n";
}
tr/// is not a regex operator. However, you can use s///:
use strict; use warnings;
my $t = (my $s = " \t\n sdklsdjfkl");
my $n = 0;
++$n while $s =~ s{^\s}{};
print "$n \\s characters were removed from \$s\n";
$n = ( $t =~ s{^(\s*)}{} ) && length $1;
print "$n \\s characters were removed from \$t\n";
Since the regexp matcher returns the parenthesized matches when called in list context, CanSpice's answer can be written in a single statement:
$count = length( ($line[0] =~ /^( *)/)[0] );
This prints the amount of leading whitespace:
echo "   hello" |perl -lane 's/^(\s+)(.*)+$/length($1)/e; print'
3