check for magic numbers - regex

I have to give output as line number with "magic number found" msg, for every magic number found
I am parsing a .C file.
magic number is basically
if(list == 5) // here 5 is magic number
for(int i = 0; i<6; i++) //here 6 is magic number
MY CODE
use strict;
use warnings;
my $input = $ARGV[0];
open(FILE, $input) or die $!;
my #lines = <FILE>;
my $count = 0;
foreach(#lines){
$count++;
if($_ =~ 'for\('){ # I want to check within for( )
if($_ =~ '#####'){ # how do the check for numbers
print "magic number found at line:".$count;
}
}
elif($_ =~ 'if\('){ # I want to check within if( )
if($_ =~ '#####'){
print "magic number found at line:".$count;
}
}
}
As magic number exists only in for loop and if loop, so i want to check within bracket if there exist any decimal or hexadecimal value.

-Edited again. This time as for if condition, it first matches paired parentheses, and then looks for a number after ==.
I've made it more robust, for that it will recognise multiline condition testing. However, as other said, this might still not cover 100% of the possible cases.
use 5.14.0;
use warnings;
my $data = join '', <DATA>;
my $if = qr/if \s* ( \( (?: [^()]++ | (?-1) )*+ \) ) /spx; # matching paired parenthesis
my $for = qr/for (\s*\(.*?;.*?) (\d+) \s*? ; /spx; #(\d+) is for the Magic Number
for($data){
while (/$if/g){
my $pat = ${^MATCH};
my $line =()= ${^PREMATCH} =~ /\n/g;
# assumes a magic number exists only when '==' is present
if ($pat =~ /(.* == \s* )([0-9a-fA-F.]+)\s*\)/x){
my $mn = $2;
my $condition = $1;
$line += () = $condition =~ /\n/g;
say "Line ", $line + 1," has magic number $mn";
}
}
while (/$for/g){
my $mn = $2; #Magic Number
my $condition = $1; #For counting \n in the MATCH.
my $line =()= ${^PREMATCH} =~ /\n/g; #Counting \n in the PREMATCH.
$line += () = $condition =~ /\n/g;
say "Line ", $line + 1," has magic number $mn";
}
}
__DATA__
if(list ==
5) // here 5 is magic number
for(int i = 0; i<6
;
i++
) //here 6 is magic number
if(list ==
8) // here 8 is magic number
if (IsMirrorSubChainEnable())
{
err = ChainCtrlSetMirrorSC(pChainCtrl, pNewRouteInfo);
}
ModCallCallbacks((ModT *)pChainCtrl, kDoneVinResetCallback, NULL, 0);
Output:
Line 2 has magic number 5
Line 10 has magic number 8
Line 4 has magic number 6

Related

Loop through global substitution

$num = 6;
$str = "1 2 3 4";
while ($str =~ s/\d/$num/g)
{
print $str, "\n";
$num++;
}
Is it possible to do something like the above in perl? I would like the loop to run only 4 times and to finish with $str being 6 7 8 9.
You don't need the loop: the /g modifier causes the substitution to be repeated as many times as it matches. What you want is the /e modifier to compute the substitution. Assuming the effect you were after is to add 5 to each number, the example code is as follows.
$str = "1 2 3 4";
$str =~ s/(\d)/$1+5/eg;
print "$str\n";
If you really wanted to substitute numbers starting with 6, then this works.
$num = 6;
$str = "1 2 3 4";
$str =~ s/\d/$num++/eg;
print "$str\n";
You could with the match operator.
my $new_str = '';
while ($str =~ /\G(.*?)\d/gc) {
$new_str .= $1 . $num++;
}
$new_str .= substr($str, pos($str));
However, #TFBW's solution is probably what you want unless you're writing a tokenizer for a big parser. In a tokenizer, it would look more like:
# So we don't have to say «$str =~ all» over the place.
# Also, allows us to use redo.
for ($str) {
/\G \s+ /xgc; # Skip whitespace.
if (/\G (\d+) /xgc) {
# Do something with $1.
redo;
}
...
last if /\G \z /xgc;
die "Unrecognized token";
}

Counting number of pattern matches in Perl

I am VERY new to perl, and to programming in general.
I have been searching for the past couple of days on how to count the number of pattern matches; I have had a hard time understanding others solutions and applying them to the code I have already written.
Basically, I have a sequence and I need to find all the patterns that match [TC]C[CT]GGAAGC
I believe I have that part down. but I am stuck on counting the number of occurrences of each pattern match. Does anyone know how to edit the code I already have to do this? Any advice is welcomed. Thanks!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
# open fasta file for reading
unless( open( FASTA, "<", '/scratch/Drosophila/dmel-all-chromosome- r6.02.fasta' )) {
die "Can't open dmel-all-chromosome-r6.02.fasta for reading:", $!;
}
#split the fasta record
local $/ = ">";
#scan through fasta file
while (<FASTA>) {
chomp;
if ( $_ =~ /^(.*?)$(.*)$/ms) {
my $header = $1;
my $seq = $2;
$seq =~ s/\R//g; # \R removes line breaks
while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
print $1, "\n";
}
}
}
Update, I have added in
my #matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
print scalar #matches;
In the code below. However, it seems to be outputting 0 in front of each pattern match, instead of outputting the total sum of all pattern matches.
while (<FASTA>) {
chomp;
if ( $_ =~ /^(.*?)$(.*)$/ms) {
my $header = $1;
my $seq = $2;
$seq =~ s/\R//g; # \R removes line breaks
while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
print $1, "\n";
my #matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
print scalar #matches;
}
}
}
Edit: I need the output to list ever pattern match found. I also need it to find the total number of matches found. For example:
CCTGGAAGC
TCTGGAAGC
TCCGGAAGC
3 matches found
counting the number of occurrences of each pattern match
my #matches = $string =~ /pattern/g
#matches array will contain all the matched parts. You can then do below to get the count.
print scalar #matches
Or you could directly write
my $matches = () = $string =~ /pattern/
I would suggest you to use the former as you might need to check "what was matched" in future (perhaps for debugging?).
Example 1:
use strict;
use warnings;
my $string = 'John Doe John Done';
my $matches = () = $string =~ /John/g;
print $matches; #prints 2
Example 2:
use strict;
use warnings;
my $string = 'John Doe John Done';
my #matches = $string =~ /John/g;
print "#matches"; #prints John John
print scalar #matches; #prints 2
Edit:
while ( my #matches = $seq =~ /([TC]C[CT]GGAAGC)/g) {
print $1, "\n";
print "Count of matches:". scalar #matches;
}
As you have written the code, you have to count the matches yourself:
local $/ = ">";
my $count = 0;
#scan through fasta file
while (<FASTA>) {
chomp;
if ( $_ =~ /^(.*?)$(.*)$/ms) {
my $header = $1;
my $seq = $2;
$seq =~ s/\R//g; # \R removes line breaks
while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
print $1, "\n";
$count = $count +1;
}
}
}
print "Fount $count matches\n";
should do the job.
HTH Georg
my #count = ($seq =~ /([TC]C[CT]GGAAGC)/g);
print scalar #count ;

perl count line in double looping, if match regular expression plus 1

I open a file by putting the line to an array. Inside this file based on the regular expression that contains a duplicate value. If the regular expression is a match I want to count it. The regular expression may look like this
$b =~ /\/([^\/]+)##/. I want to match $1 value.
my #array = do
{
open my $FH, '<', 'abc.txt' or die 'unable to open the file\n';
<$FH>;
};
Below is the way I do, it will get the same line in my file. Thank for help.
foreach my $b (#array)
{
$conflictTemp = 0;
$b =~ /\/([^\/]+)##/;
$b = $1;
#print "$b\n";
foreach my $c (#array)
{
$c =~ /\/([^\/]+)##/;
$c = $1;
if($b eq $c)
{
$conflictTemp ++;
#print "$b , $c \n"
#if($conflictTemp > 1)
#{
# $conflict ++;
#}
}
}
}
Below is the some sample data, two sentences are duplicates
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.**cdtbuild##**/main/CSS5/2
/a/b/c/d/code/.**cdtbuild##**/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
It looks like you're trying to iterate each element of the array, select some data via pattern match, and then count dupes. Is that correct?
Would it not be easier to:
my %count_of;
while ( <$FH> ) {
my ( $val ) = /\/([^\/]+)##/;
$count_of{$val}++;
}
And then, for the variables that have more than one (e.g. there's a duplicate):
print join "\n", grep { $count_of{$_} > 1 } keys %count_of;
Alternatively, if you're just wanting to play 'spot the dupe':
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $match = qr/\/([^\/]+)##/;
while ( <DATA> ) {
my ( $value ) = m/$match/ or next;
print if $seen{$value}++;
}
__DATA__
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
The problem has been solved by the previous answer - I just want to offer an alternate flavour that;
Spells out the regex
Uses the %seen hash to record the line the pattern first appears; to enable
slightly more detailed reporting
use v5.12;
use warnings;
my $regex = qr/
\/ # A literal slash followed by
( # Capture to $1 ...
[^\/]+ # ... anything that's not a slash
) # close capture to $1
## # Must be immdiately followed by literal ##
/x;
my %line_num ;
while (<>) {
next unless /$regex/ ;
my $pattern = $1 ;
if ( $line_num{ $pattern } ) {
say "'$pattern' appears on lines ", $line_num{ $pattern }, " and $." ;
next ;
}
$line_num{ $pattern } = $. ; # Record the line number
}
# Ran on data above will produce;
# '.cdtproject' appears on lines 7 and 8
# '.cdtbuild' appears on lines 10 and 11

matched count for characters in two variables

Iam trying to match two variables and get the count of characters that are matched.
For ex:
$name1 = 'cat';
$name2 = 'dcat';
result needs to be 3.
Tried this but always prints 1
$count = 0;
$count++ while ($name1 =~ /$name2/g);
print "$count\n";
$count = 0;
$matchstr = "";
foreach (split(//,$name1)){
$matchstr .= $_;
$count++ if $name2 =~ /$matchstr/;
}
print "No of matched characters are: $count \n ";

How to find the largest repeating string with overlap in a line

I have a series of lines such as
my $string = "home test results results-apr-25 results-apr-251.csv";
#str = $string =~ /(\w+)\1+/i;
print "#str";
How do I find the largest repeating string with overlap which are separated by whitespace?
In this case I'm looking for the output :
results-apr-25
It looks like you need the String::LCSS_XS which calculates Longest Common SubStrings. Don't try it's Perl-only twin brother String::LCSS because there are bugs in that one.
use strict;
use warnings;
use String::LCSS_XS;
*lcss = \&String::LCSS_XS::lcss; # Manual import of `lcss`
my $var = 'home test results results-apr-25 results-apr-251.csv';
my #words = split ' ', $var;
my $longest;
my ($first, $second);
for my $i (0 .. $#words) {
for my $j ($i + 1 .. $#words) {
my $lcss = lcss(#words[$i,$j]);
unless ($longest and length $lcss <= length $longest) {
$longest = $lcss;
($first, $second) = #words[$i,$j];
}
}
}
printf qq{Longest common substring is "%s" between "%s" and "%s"\n}, $longest, $first, $second;
output
Longest common substring is "results-apr-25" between "results-apr-25" and "results-apr-251.csv"
my $var = "home test results results-apr-25 results-apr-251.csv";
my #str = split " ", $var;
my %h;
my $last = pop #str;
while (my $curr = pop #str ) {
if(($curr =~/^$last/) || $last=~/^$curr/) {
$h{length($curr)}= $curr ;
}
$last = $curr;
}
my $max_key = max(keys %h);
print $h{$max_key},"\n";
If you want to make it without a loop, you will need the /g regex modifier.
This will get you all the repeating string:
my #str = $string =~ /(\S+)(?=\s\1)/ig;
I have replaced \w with \S (in your example, \w doesn't match -), and used a look-ahead: (?=\s\1) means match something that is before \s\1, without matching \s\1 itself—this is required to make sure that the next match attempt starts after the first string, not after the second.
Then, it is simply a matter of extracting the longest string from #str:
my $longest = (sort { length $b <=> length $a } #str)[0];
(Do note that this is a legible but far from being the most efficient way of finding the longest value, but this is the subject of a different question.)
How about:
my $var = "home test results results-apr-25 results-apr-251.csv";
my $l = length $var;
for (my $i=int($l/2); $i; $i--) {
if ($var =~ /(\S{$i}).*\1/) {
say "found: $1";
last;
}
}
output:
found: results-apr-25