Perl Regular Expression Question - regex

I wrote a Perl program that reads text from a file and prints it out.
I want to print only the lines that have a specific format.
For example, the file contains lines like this:
information:
Ahmad.prn:592118:2001:7:5:/Essay
Ashford.rtf:903615:2001:6:28:/usr/Essay
Barger.doc:243200:2001:7:4:/home/dir
end of Information.
I want to read only these three lines:
Ahmad.prn:592118:2001:7:5:/Essay
Ashford.rtf:903615:2001:6:28:/usr/Essay
Barger.doc:243200:2001:7:4:/home/dir
I think that the meaning of the fields is:
Ahmad.prn <- file name
592118 <- size of file
2001:7:5 <- created date
/Essay <- path of file
My code is this:
#!/usr/bin/perl
use strict;
use warnings;

my $file = shift @ARGV;    # assumption: the input path is given on the command line
open(my $infh, "<", $file) || die "can't open $file: $!";
while (my $line = <$infh>) {
    chomp($line);
    if ($line =~ /(what regular expression do I have to put in here?)/) {
        print "$line\n";
    }
}
close($infh);

If the lines you need always end with /Essay, you can use the following regex:
/:\/Essay$/
Edit 1: it looks like the middle fields are all numeric, so you can match on them instead:
/:\d+:\d+:\d+:\d+:/
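As a minimal sketch of how the second pattern could serve as the filter in the question's loop (the helper name is_record and the hard-coded sample lines are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: true when a line contains four colon-delimited
# numeric fields (size, year, month, day) between the name and the path.
sub is_record {
    my ($line) = @_;
    return $line =~ /:\d+:\d+:\d+:\d+:/ ? 1 : 0;
}

my @lines = (
    'information:',
    'Ahmad.prn:592118:2001:7:5:/Essay',
    'Ashford.rtf:903615:2001:6:28:/usr/Essay',
    'Barger.doc:243200:2001:7:4:/home/dir',
    'end of Information.',
);

# Print only the lines that look like file records.
print "$_\n" for grep { is_record($_) } @lines;
```

Unlike the /Essay pattern, this one also keeps the Barger.doc line, whose path is /home/dir.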

Since your lines have this format (for example, Ahmad.prn:592118:2001:7:5:/Essay):
Ahmad.prn <- file name
592118 <- size of file
2001:7:5 <- created date
/Essay <- path of file
you can use this regular expression:
/^\s*(\S+):(\d+):(\d+:\d+:\d+):(\S+)\s*$/
With this, the file name is in $1, the file size in $2, the creation date in $3, and the path in $4.
I allowed optional whitespace at the start and end of the line. If you also want to allow optional whitespace before or after each colon, add \s* around the colons.
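A short sketch of that regex in use may help. The helper name parse_record and the hash keys are assumptions made for this example, not part of the answer:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference of the named fields,
# or undef when the line does not have the expected layout.
sub parse_record {
    my ($line) = @_;
    return unless $line =~ /^\s*(\S+):(\d+):(\d+:\d+:\d+):(\S+)\s*$/;
    return { name => $1, size => $2, date => $3, path => $4 };
}

for my $line ('information:', 'Ashford.rtf:903615:2001:6:28:/usr/Essay') {
    my $rec = parse_record($line) or next;    # skip non-record lines
    print "$rec->{name} ($rec->{size} bytes), created $rec->{date}, in $rec->{path}\n";
}
```

Copying the captures into named fields straight away avoids relying on $1 to $4, which are overwritten by the next successful match.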

#!/usr/bin/perl
use strict;
use warnings;

my $inputText = qq{
Ahmad.prn:592118:2001:7:5:/Essay
Ashford.rtf:903615:2001:6:28:/usr/Essay
Barger.doc:243200:2001:7:4:/home/dir
end of Information.
};
my @input = split /\n/, $inputText;
my $i = 0;
# Stop at the end marker, or at the end of the input if the marker is missing.
while ($i < @input && $input[$i] !~ /^end of Information\.$/) {
    # Skip lines without a colon (such as the leading blank line).
    if ($input[$i] !~ /:/) {
        $i++;
        next;
    }
    my ($fileName, $fileSize, $year, $month, $day, $filePath) = split /:/, $input[$i];
    print "$fileName\t $fileSize\t $month/$day/$year\t $filePath\n";
    $i++;
}

if ($line =~ /([a-zA-Z.]+):(\d+):(\d+):(\d+):(\d+):([\/A-Za-z]+)/) {
    my $name = $1;    # Ahmad.prn
    my $size = $2;    # 592118
    my $year = $3;    # 2001
    my $dir  = $6;    # /Essay
}
Note: run this inside a loop to handle every line.

Related

Perl replace a string which is in a line above matching pattern

If a line in any *.java file in a directory (and its subdirectories) contains one of the strings TODAY, TOMORROW or YESTERDAY, I want to replace a string in the line above it (e.g. change "Raining", "Cloudy" or "Windy" to "Sunny"). Before replacing, I want to write the match to a CSV file (e.g. file1.java TODAY Raining, File2.java TOMORROW Cloudy).
But my regex is not working as desired. Also, is there a better way to achieve this?
use strict;
use warnings;
use File::Find::Rule;
my @day = ("TODAY", "TOMORROW", "YESTERDAY");
my @weather = ("Raining", "Cloudy", "Windy");
my $dayregex = join "|", @day;
$dayregex = qr/\b($dayregex)\b/;
my $weatherregex = join "|", @weather;
$weatherregex = qr/\b($weatherregex)\b/;
my $output = 'output.csv';
# Getting list of files in dir and sub dirs
my @files = File::Find::Rule->file()
                            ->name( '*.java' )
                            ->in( 'C:/Users/path/to/folder/' );
for my $file (@files) {
    print "Opening file: $file\n";
    open(INPUT, $file) or die("Input file $file not found. \n");
    while (my $line = <INPUT>) {
        if ($line =~ m/$dayregex/) {
            print "There was a match on $1 from array day\n";
            # send the output to csv before replacing
            open(OUTPUT, '>'.$output) or die("Cannot create $output file. \n");
            print OUTPUT "$file $1 $weatherregex\n";
            close(OUTPUT);
            # Replace the matched weatherregex string with Sunny
            $line =~ s/$weatherregex(.*\n.*$1)/Sunny/g ;
        }
    }
}
close(INPUT);
Try this:
weatherregex(.*\n)+
I think this will find the weatherregex match even when the text spans multiple lines or is split across two lines.
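Because the question's loop reads one line at a time, a pattern that spans a newline has nothing to match against there. One alternative, sketched below, is to hold all lines of a file in an array and edit the element before the one that matches. It assumes the weather word always sits on the line directly above the day word; the sub name fix_weather and the sample lines are hypothetical.

```perl
use strict;
use warnings;

my $day_re     = qr/\b(TODAY|TOMORROW|YESTERDAY)\b/;
my $weather_re = qr/\b(Raining|Cloudy|Windy)\b/;

# Hypothetical helper: edits @$lines in place and returns one CSV row
# ("file,day,old weather") for every replacement it makes.
sub fix_weather {
    my ($file, $lines) = @_;
    my @rows;
    for my $i (1 .. $#$lines) {
        my ($day) = $lines->[$i] =~ $day_re or next;
        # Replace the first weather word on the line above, if there is one.
        if ($lines->[$i - 1] =~ s/$weather_re/Sunny/) {
            push @rows, "$file,$day,$1";    # $1 is the word that was replaced
        }
    }
    return @rows;
}

my @lines = ("String w = \"Cloudy\";\n", "String d = \"TOMORROW\";\n");
print "$_\n" for fix_weather('File2.java', \@lines);
print @lines;
```

The returned rows could be appended to the CSV file once per input file, and the edited lines written back afterwards.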

perl count line in double looping, if match regular expression plus 1

I read a file into an array, one line per element. Some lines in the file produce the same value when matched against a regular expression, and I want to count those duplicates. The regular expression looks like this:
$b =~ /\/([^\/]+)##/. I want to compare the $1 values.
my @array = do
{
    open my $FH, '<', 'abc.txt' or die "unable to open the file: $!\n";
    <$FH>;
};
Below is what I have tried; it also matches each line against itself. Thanks for your help.
foreach my $b (@array)
{
    $conflictTemp = 0;
    $b =~ /\/([^\/]+)##/;
    $b = $1;
    #print "$b\n";
    foreach my $c (@array)
    {
        $c =~ /\/([^\/]+)##/;
        $c = $1;
        if($b eq $c)
        {
            $conflictTemp ++;
            #print "$b , $c \n"
            #if($conflictTemp > 1)
            #{
            #    $conflict ++;
            #}
        }
    }
}
Below is some sample data; two of the lines are duplicated:
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
It looks like you're trying to iterate each element of the array, select some data via pattern match, and then count dupes. Is that correct?
Would it not be easier to:
my %count_of;
while ( <$FH> ) {
    my ( $val ) = /\/([^\/]+)##/ or next;    # skip lines that do not match
    $count_of{$val}++;
}
And then, to list the values that occur more than once (i.e. the duplicates):
print join "\n", grep { $count_of{$_} > 1 } keys %count_of;
Alternatively, if you're just wanting to play 'spot the dupe':
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $match = qr/\/([^\/]+)##/;
while ( <DATA> ) {
    my ( $value ) = m/$match/ or next;
    print if $seen{$value}++;
}
__DATA__
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
The problem has been solved by the previous answer. I just want to offer an alternative flavour that:
- spells out the regex
- uses a hash to record the line on which each pattern first appears, to enable slightly more detailed reporting
use v5.12;
use warnings;
my $regex = qr/
    \/          # A literal slash followed by
    (           # Capture group 1 ...
      [^\/]+    # ... anything that's not a slash
    )           # close capture group 1
    \#\#        # Must be immediately followed by a literal ## (escaped, since # starts a comment under the x flag)
/x;
my %line_num ;
while (<>) {
    next unless /$regex/ ;
    my $pattern = $1 ;
    if ( $line_num{ $pattern } ) {
        say "'$pattern' appears on lines ", $line_num{ $pattern }, " and $." ;
        next ;
    }
    $line_num{ $pattern } = $. ; # Record the line number
}
# Run on the data above, this produces:
# '.cdtproject' appears on lines 7 and 8
# '.cdtbuild' appears on lines 10 and 11

Search a pattern and replace the entire line of a perl module file

I have a Perl module file like this:
$release_name = 'Software Release';
$primary_version = '1';
$secondary_version = 'R00.0';
$Main_version = 'R00.0';
I want to search for Main_version and replace the line with
$Main_version = R00.1
when I run the script.
I have tried the following, but it's not working.
#!/usr/bin/perl -w
use strict;
my $base;
my $file = "/main-dir/work/Myfile.pm";
open(FILE, $file) || die "File not found";
my @base = <FILE>;
close(FILE);
my $item = '$Main_version';
my $newitem = "R00.1";
foreach $base (@base)
{
    if ($base =~ /$item/) {
        $base =~ s/$item/$item='$newitem'/gi;
        print ("Hello, world!\n");
    }
    #else { print $base;}
}
open (BASE, ">$file");
print BASE @base;
close (BASE);
How can I search for and change an entire line of a Perl module? Thanks.
Had you used the standard VERSION variable, you could have profited from perl-reversion.
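The dollar sign is special in regular expressions: it means "the end of the line". Because $item is interpolated into the pattern, its leading $ is treated as that anchor instead of a literal character. Backslash it, or use quotemeta, which can be shortened to \Q in a regex: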
$base =~ /\Q$item/
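To replace the whole line rather than just the variable name, the quoted name can be anchored at the start of the line and followed by a pattern for the rest of the assignment. This is only a sketch: the helper name set_assignment is hypothetical, and it assumes each assignment sits on a single line starting in the first column.

```perl
use strict;
use warnings;

# Hypothetical helper: if $line assigns to the variable named $name,
# return a replacement line carrying $value; otherwise return $line as is.
sub set_assignment {
    my ($line, $name, $value) = @_;
    # \Q...\E makes the leading '$' in $name match literally.
    # '.' does not match the newline, so the line ending is preserved.
    $line =~ s/^\Q$name\E\s*=.*$/$name = '$value';/;
    return $line;
}

my @module = (
    "\$secondary_version = 'R00.0';\n",
    "\$Main_version = 'R00.0';\n",
);
print set_assignment($_, '$Main_version', 'R00.1') for @module;
```

Only the second line changes here; lines that assign to other variables pass through untouched.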

Perl newbie: trying to find string in array of strings

I need to match a string against an array of strings. The string that I am searching for should be able to contain wildcards.
#!/usr/bin/perl
#
## disable buffered I/O which would lead
## to deadloops for the Apache server
$| = 1;
#
## read URLs one per line from stdin
while (<>) {
    my $line = $_;
    my @array1 = ("abc","def","ghi");
    $found = 0;
    if (/$line/i ~~ @array1)
    {
        print "found\n";
    }
    else
    {
        print "not found\n";
    }
}
I tested this script with the input abc and it returns "not found":
perl ./mapscript.pl
abc
not found
Your input has a newline at the end. Add:
chomp $line;
right after
my $line = $_;
Use chomp(my $input = $_) instead of my $input = $_ inside your while loop to remove the newline.
(Oops, I didn't see that I was posting a duplicate.)
A line read with <> normally still has its newline at the end; see chomp.
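The effect of the trailing newline can be seen in a small sketch. It uses grep instead of the smartmatch operator, which is a substitution on my part and not what the question used, and the helper name matches_any is hypothetical:

```perl
use strict;
use warnings;

my @array1 = ("abc", "def", "ghi");

# Hypothetical helper: number of candidates matched by $pattern when it
# is used as a case-insensitive regular expression (0 means no match).
sub matches_any {
    my ($pattern, @candidates) = @_;
    return scalar grep { /$pattern/i } @candidates;
}

# Sample inputs as they would arrive from <>, newline included.
for my $input ("abc\n", "a.c\n", "xyz\n") {
    chomp(my $line = $input);    # without this, "abc\n" matches nothing
    print "$line: ", (matches_any($line, @array1) ? "found" : "not found"), "\n";
}
```

Since the input is used as a pattern, regex metacharacters such as . act as wildcards, which appears to be what the question asks for. Input that is not a valid regex would make the match die.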

How can I extract abbreviations from a file using Perl?

I need to extract certain abbreviations from a file, such as ABS, TVS, and PERL: any abbreviation written in uppercase letters. I'd prefer to do this with a regular expression. Any help is appreciated.
It would have been nice to hear what part you were particularly having trouble with.
my %abbr;
open my $inputfh, '<', 'filename'
    or die "open error: $!\n";
while ( my $line = readline($inputfh) ) {
    while ( $line =~ /\b([A-Z]{2,})\b/g ) {
        $abbr{$1}++;
    }
}
for my $abbr ( sort keys %abbr ) {
    print "Found $abbr $abbr{$abbr} time(s)\n";
}
Reading text to be searched from standard input and writing
all abbreviations found to standard output, separated by spaces:
my $text;
# Slurp all text
{ local $/ = undef; $text = <>; }
# Extract all sequences of 2 or more uppercase characters
my @abbrevs = $text =~ /\b([[:upper:]]{2,})\b/g;
# Output separated by spaces
print join(" ", @abbrevs), "\n";
Note the use of the POSIX character class [:upper:], which can match uppercase characters beyond the English ones (A-Z), provided Perl treats the text as Unicode characters or a suitable locale is in effect.
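A small sketch of that difference, assuming the text is a character string rather than raw bytes (the helper name extract_abbrevs and the sample sentence are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: returns every run of two or more uppercase
# characters that stands alone as a word.
sub extract_abbrevs {
    my ($text) = @_;
    return $text =~ /\b([[:upper:]]{2,})\b/g;
}

# \x{0394}\x{03A9} are the Greek capitals Delta and Omega, written as
# escapes so the example does not depend on the source file's encoding.
my @found = extract_abbrevs("NASA met \x{0394}\x{03A9} and Perl");
print scalar(@found), " abbreviations found\n";
```

With [A-Z] in place of [[:upper:]], the Greek pair would not be reported. When reading from a file, the data would first need decoding (for example with an :encoding(UTF-8) layer on the filehandle) for this to apply.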
Untested:
my %abbr;
open (my $input, "<", "filename")
    || die "open: $!";
while ( <$input> ) {
    # Remove each run of two or more capitals from the line and count it.
    while (s/([A-Z][A-Z]+)//) {
        $abbr{$1}++;
    }
}
Modified it to look for at least two consecutive capital letters.
#!/usr/bin/perl
use strict;
use warnings;
my %abbrs = ();
while(<>){
    my @words = split ' ', $_;
    foreach my $word (@words){
        $word =~ /([A-Z]{2,})/ && $abbrs{$1}++;
    }
}
# %abbrs now contains all abbreviations