Perl Regex - Print the matched Conditional Regex

Perl Regex - Print the matched Conditional Regex - regex

I am trying to extract some patterns out of a log file but I am unable to print them properly.
Examples of log strings :
1) sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
2) sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
I want to extract 3 things :
A = "FPJ.INV_DOM_16_PRD" B = "47269" C = 9616 or 2644666 (if the line
has endid then C = 2644666 else it's 9616)
log line can either be of type 1 or 2. I am able to extract A and B but I am stuck with C as I need a conditional statement for it and I am not able to extract it properly. I am pasting my code :
my $string='/sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666';
if ($string =~ /sequence_history\/buckets\/(.*)/){
my $line = $1;
print "$line\n";
if($line =~ /(FPJ.*PRD)\.(\d*)\./){
my $topic_type_string = $1;
my $topic_id = $2;
print "$1\n$2\n";
}
if($string =~ /(?(?=endid=)\d*$)/){
# how to print match pattern here?
print "match\n";
}
Thanks in advance!

This will do the job:
use Modern::Perl;
use Data::Dumper;
my $re = qr/(FPJ.+?PRD)\.(\d+)\..*?(\d+)$/;
while(<DATA>) {
chomp;
my (#l) = $_ =~ /$re/g;
say Dumper\#l;
}
__DATA__
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
Output:
$VAR1 = [
'FPJ.INV_DOM_16_PRD',
'47269',
'2644666'
];
$VAR1 = [
'FPJ.INV_DOM_16_PRD',
'41987',
'9616'
];
Explanation:
( : start group 1
FPJ : literally FPJ
.+? : 1 or more any character but newline, not greedy
PRD : literally PRD
) : end group 1
\. : a dot
( : start group 2
\d+ : 1 or more digit
) : end group 2
\. : a dot
.*? : 0 or more any character not greedy
( : start group 3
\d+ : 1 or more digit
) : end group 3
$ : end of string

If you are trying to fetch some entries in log file, then you can use file handles in perl. In below code i'm trying to fetch the entries from a log file named as test.log
Entries of the log are as below.
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.69886?startid=2644000&endid=26765849
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.24465?startid=2644000&endid=836783741
Below is the perl script to fetch required data.
#!/usr/bin/perl
use strict;
use warnings;
open (FH, "test.log") || die "Not able to open test.log $!";
my ($a,$b,$c);
while (my $line=<FH>)
{
if ($line =~ /sequence_history\/buckets\/.*endid=(\d*)/)
{
$c= $1;
if ($line =~ /(FPJ.*PRD)\.(\d*)\.(\d*)\?/)
{
$a=$1;
$b=$2;
}
}
else
{
if ($line =~ /sequence_history\/buckets\/(FPJ.*PRD)\.(\d*)\.(\d*)/)
{
$a=$1;
$b=$2;
$c=$3;
}
}
print "\n \$a=$a\n \$b=$b\n \$c=$c \n";
}
Output:
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=2644666
$a=FPJ.INV_DOM_16_PRD
$b=41987
$c=9616
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=26765849
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=836783741
You can use the above code by replacing "test.log" by log file name (along with its path) from which you want to fetch data as shown below.
open (FH, "/path/to/log/file/test.log") || die "Not able to open test.log $!";

Related

Perl Regex - Getting Text Before and After Match

I am parsing a tab delimited file line by line:
Root rootrank 1 Bacteria domain .72 Firmicutes phylum 1 Clostridia class 1 etc.
=
while (my $line = <$fh>) {
chomp($line);
}
On every line, I want to capture the 1st entry before and after a particular match. For example, for the match phylum, I want to capture the entries Firmicutes and 1. For the match domain, I want to capture the entries Bacteria and .72. How would I write the regex to do this?
Sidenote: I can't simply split the line by tab into an array and use the index because sometimes a category is missing or there are extra categories, and that causes the entries to be shifted by one or two indices. And I want to avoid writing blocks of if statements.

You can still split the input, then map the words to indices, and use than use the indices corresponding to the matches to extract the neighbouring cells:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my #matches = qw( phylum domain );
while (<>) {
chomp;
my #cells = split /\t/;
my %indices;
#indices{ #cells } = 0 .. $#cells;
for my $match (#matches) {
if (defined( my $index = $indices{$match} )) {
say join "\t", #cells[ $index - 1 .. $index + 1 ];
}
}
}
What's missing:
You should handle the case when $index == 0 or $index == $#cells.
You should handle the case where some words are repeated in one line.

my $file = "file2.txt";
open my $fh, '<', $file or die "Unable to Open the file $file for reading: $!\n";
while (my $line = <$fh>) {
chomp $line;
while ($line =~ /(\w+)\s+(\w+)\s+(\.?\d+)/g) {
my ($before, $match, $after) = ($1, $2, $3);
print "Before: $before Match: $match After: $after\n";
}
}

You can just simply use the following regex to capture the words before and after of a matched word:
(?<LSH>[\w.]+)[\s\t](?<MATCH>.*?)[\s\t](?<RHS>[\w.]+)
see demo / explanation

You could do:
#!/usr/bin/perl
use Modern::Perl;
my #words = qw(phylum domain);
while(<DATA>) {
chomp;
for my $word (#words) {
my ($before, $after) = $_ =~ /(\S+)(?:\t\Q$word\E\t)(\S+)/i;
say "word: $word\tbefore: $before\tafter: $after";
}
}
__DATA__
Root rootrank 1 Bacteria domain .72 Firmicutes phylum 1 Clostridia class 1 etc.
Output:
word: phylum before: Firmicutes after: 1
word: domain before: Bacteria after: .72

perl count line in double looping, if match regular expression plus 1

I open a file by putting the line to an array. Inside this file based on the regular expression that contains a duplicate value. If the regular expression is a match I want to count it. The regular expression may look like this
$b =~ /\/([^\/]+)##/. I want to match $1 value.
my #array = do
{
open my $FH, '<', 'abc.txt' or die 'unable to open the file\n';
<$FH>;
};
Below is the way I do, it will get the same line in my file. Thank for help.
foreach my $b (#array)
{
$conflictTemp = 0;
$b =~ /\/([^\/]+)##/;
$b = $1;
#print "$b\n";
foreach my $c (#array)
{
$c =~ /\/([^\/]+)##/;
$c = $1;
if($b eq $c)
{
$conflictTemp ++;
#print "$b , $c \n"
#if($conflictTemp > 1)
#{
# $conflict ++;
#}
}
}
}
Below is the some sample data, two sentences are duplicates
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.**cdtbuild##**/main/CSS5/2
/a/b/c/d/code/.**cdtbuild##**/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3

It looks like you're trying to iterate each element of the array, select some data via pattern match, and then count dupes. Is that correct?
Would it not be easier to:
my %count_of;
while ( <$FH> ) {
my ( $val ) = /\/([^\/]+)##/;
$count_of{$val}++;
}
And then, for the variables that have more than one (e.g. there's a duplicate):
print join "\n", grep { $count_of{$_} > 1 } keys %count_of;
Alternatively, if you're just wanting to play 'spot the dupe':
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $match = qr/\/([^\/]+)##/;
while ( <DATA> ) {
my ( $value ) = m/$match/ or next;
print if $seen{$value}++;
}
__DATA__
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3

The problem has been solved by the previous answer - I just want to offer an alternate flavour that;
Spells out the regex
Uses the %seen hash to record the line the pattern first appears; to enable
slightly more detailed reporting
use v5.12;
use warnings;
my $regex = qr/
\/ # A literal slash followed by
( # Capture to $1 ...
[^\/]+ # ... anything that's not a slash
) # close capture to $1
## # Must be immdiately followed by literal ##
/x;
my %line_num ;
while (<>) {
next unless /$regex/ ;
my $pattern = $1 ;
if ( $line_num{ $pattern } ) {
say "'$pattern' appears on lines ", $line_num{ $pattern }, " and $." ;
next ;
}
$line_num{ $pattern } = $. ; # Record the line number
}
# Ran on data above will produce;
# '.cdtproject' appears on lines 7 and 8
# '.cdtbuild' appears on lines 10 and 11

A non-greedy Perl regular expression

I need to write a script which does the following:
$ cat testdata.txt
this is my file containing data
for checking pattern matching with a patt on the back!
only one line contains the p word.
$ ./mygrep5 pat th testdata.txt
this is my file containing data
for checking PATTERN MATCHING WITH a PATT ON THe back!
only one line contains the p word.
I have been able to print the line which is amended with the "a" capitalized as well. I have no idea how to only take what is needed.
I have been messing around (below is my script so far) and all I manage to return is the "PATT ON TH" part.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dump 'pp';
my ($f, $s, $t) = #ARGV;
my #output_lines;
open(my $fh, '<', $t);
while (my $line = <$fh>) {
if ($line =~ /$f/ && $line =~ /$s/) {
$line =~ s/($f.+?$s)/$1/g;
my $sub_phrase = uc $1;
$line =~ s/$1/$sub_phrase/g;
print $line;
}
#else {
# print $line;
#}
}
close($fh);
which returns: "for checking pattern matching with a PATT ON THe back!"
How can I fix this problem?

It sounds like you want to capitalize from pat to th except for instances of a surrounded by spaces. The easiest way is to uppercase the whole thing, and then fix any instances of A surrounded by spaces.
sub capitalize {
my $s = shift;
my $uc = uc($s);
$uc =~ s/ \s \K A (?=\s) /a/xg;
return $uc;
}
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
The downside is that will replacing any existing A surrounded by spaces with a. The following is more complicated, but it doesn't suffer from that problem:
sub capitalize {
my $s = shift;
my #parts = $s =~ m{ \G ( \s+ | \S+ ) }xg;
for (#parts) {
$_ = uc($_) if $_ ne "a";
}
return join('', #parts);
}
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
The rest of the code can be simplified:
#!/usr/bin/perl
use strict;
use warnings;
sub capitalize { ... }
my $f = shift;
my $s = shift;
while (<>) {
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
print;
}

So, if you want to match each sequence that starts with pat and ends with th, non-greedily, and uppercase that sequence, you can simply use an expression on the right side of your substitution:
$line =~ s/($f.+?$s)/uc($1)/eg;
And that's it.

Perl parsing JavaScript file regex, to catch quotes only at the beginning and end of the returned string

I'm just starting to learn Perl. I need to parse JavaScript file. I came up with the following subroutine, to do it:
sub __settings {
my ($_s) = #_;
my $f = $config_directory . "/authentic-theme/settings.js";
if ( -r $f ) {
for (
split(
'\n',
$s = do {
local $/ = undef;
open my $fh, "<", $f;
<$fh>;
}
)
)
{
if ( index( $_, '//' ) == -1
&& ( my #m = $_ =~ /(?:$_s\s*=\s*(.*))/g ) )
{
my $m = join( '\n', #m );
$m =~ s/[\'\;]//g;
return $m;
}
}
}
}
I have the following regex, that removes ' and ; from the string:
s/[\'\;]//g;
It works alright but if there is a mentioned chars (' and ;) in string - then they are also removed. This is undesirable and that's where I stuck as it gets a bit more complicated for me and I'm not sure how to change the regex above correctly to only:
Remove only first ' in string
Remove only last ' in string
Remove ont last ; in string if exists
Any help, please?

You can use the following to match:
^'|';?$|;$
And replace with '' (empty string)
See DEMO

Remove only first ' in string
Remove only last ' in string
^[^']*\K'|'(?=[^']*$)
Try this .See demo.
https://regex101.com/r/oF9hR9/8
Remove ont last ; in string if exists
;(?=[^;]*$)
Try this.See demo.
https://regex101.com/r/oF9hR9/9
All three in one
^[^']*\K'|'(?=[^']*$)|;(?=[^;]*$)
See Here

You can use this code:
#!/usr/bin/perl
$str = "'string; 'inside' another;";
$str =~ s/^'|'?;?$//g;
print $str;
IDEONE demo
The main idea is to use anchors: ^ beginning of string, $ end of string and ;? matches the ";" symbol at the end only if it is present (? quantifier is making the pattern preceding it optional).EDIT: Also, ; will get removed even if there is no preceding '.

I suggest that your original code should look more like this. It is much more idiomatic Perl and I think more straightforward to follow
sub __settings {
my ($_s) = #_;
my $file = "$config_directory/authentic-theme/settings.js";
return unless -r $file;
open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
my #file = <$fh>;
chomp #file;
for ( #file ) {
next if m{//};
if ( my #matches = $_ =~ /(?:$_s\s*=\s*(.*))/g ) {
my $matches = join "\n", #matches;
$matches =~ tr/';//d;
return $matches;
}
}
}

Changing time format using regex in perl

I want to read 12h format time from file and replace it with 24 hour
example
this is due at 3:15am -> this is due 15:15
I tried saving variables in regex and manupilate it later but didnt work, I also tried using substitution "/s" but because it is variable I couldnt figure it out
Here is my code:
while (<>) {
my $line = $_;
print ("this is text before: $line \n");
if ($line =~ m/\d:\d{2}pm/g){
print "It is PM! \n";}
elsif ($line =~ m/(\d):(\d\d)am/g){
print "this is try: $line \n";
print "Its AM! \n";}
$line =~ s/($regexp)/<French>$lexicon{$1}<\/French>/g;
print "sample after : $line\n";
}

A simple script can do the work for you
$str="this is due at 3:15pm";
$str=~m/\D+(\d+):\d+(.*)$/;
$hour=($2 eq "am")? ( ($1 == 12 )? 0 : $1 ) : ($1 == 12 ) ? $1 :$1+12;
$min=$2;
$str=~s/at.*/$hour:$min/g;
print "$str\n";
Gives output as
this is due 15:15
What it does??
$str=~m/\D+(\d+):(\d+)(.*)$/; Tries to match the string with the regex
\D+ matches anything other than digits. Here it matches this is due at
(\d+) matches any number of digits. Here it matches 3. Captured in group 1 , $1 which is the hours
: matches :
(\d+) matches any number of digits. Here it matches 15, which is the minutes
(.*) matches anything follwed, here am . Captures in group 2, `$2
$ anchors the regex at end of
$hour=($2 eq "am")? ( ($1 == 12 )? 0 : $1 ) : ($1 == 12 ) ? $1 :$1+12; Converts to 24 hour clock. If $2 is pm adds 12 unless it is 12. Also if the time is am and 12 then the hour is 0
$str=~s/at.*/$hour:$min/g; substitutes anything from at to end of string with $hour:$min, which is the time obtained from the ternary operation performed before

#!/usr/bin/env perl
use strict;
use warnings;
my $values = time_12h_to_24h("11:00 PM");
sub time_12h_to_24h
{
my($t12) = #_;
my($hh,$mm,$ampm) = $t12 =~ m/^(\d\d?):(\d\d?)\s*([AP]M?)/i;
$hh = ($hh % 12) + (($ampm =~ m/AM?/i) ? 0 : 12);
return sprintf("%.2d:%.2d", $hh, $mm);
}
I found this code in the bleow link. Please check:
Is my pseudo code for changing time format correct?

Try this it give what you expect
my #data = <DATA>;
foreach my $sm(#data){
if($sm =~/(12)\.\d+(pm)/g){
print "$&\n";
}
elsif($sm =~m/(\d+(\.)?\d+)(pm)/g )
{
print $1+12,"\n";
}
}
__DATA__
Time 3.15am
Time 3.16pm
Time 5.17pm
Time 1.11am
Time 1.01pm
Time 12.11pm

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Perl Regex - Print the matched Conditional Regex - regex

Related

Perl Regex - Getting Text Before and After Match

perl count line in double looping, if match regular expression plus 1

A non-greedy Perl regular expression

Perl parsing JavaScript file regex, to catch quotes only at the beginning and end of the returned string

Changing time format using regex in perl

Categories

Resources