How can I iterate over the code of an HTML file, find certain recurring text, and append a word and an incrementing number to each occurrence?
So:
<!-- TemplateBeginEditable -->
<!-- TemplateBeginEditable -->
<!-- TemplateBeginEditable -->
etc...
Becomes :
<!-- TemplateBeginEditable Event=1 -->
<!-- TemplateBeginEditable Event=2 -->
<!-- TemplateBeginEditable Event=3 -->
etc...
I tried Perl first, thinking it would be the easiest and fastest, then went to jQuery, and then back to Perl.
It seems simple enough to find and replace in many ways with a regex and to return an array of the occurrences, but getting the incrementing number tacked on is proving to be more of a challenge.
Latest Example of what I have tried:
#!/usr/bin/perl -w
# Open input file
open INPUTFILE, "<", $ARGV[0] or die $!;
# Open output file in write mode
open OUTPUTFILE, ">", $ARGV[1] or die $!;
# Read the input file line by line
while (<INPUTFILE>) {
my @matches = ($_ =~ m/TemplateBeginEditable/g);
### what do I do with matches array? ###
$_ =~ s/TemplateBeginEditable/TemplateBeginEditable Event=/g;
print OUTPUTFILE $_;
}
close INPUTFILE;
close OUTPUTFILE;
To perform a replacement, you don't need to match the pattern first; you can perform the substitution directly. Example with your code:
while (<INPUTFILE>) {
s/TemplateBeginEditable/TemplateBeginEditable Event=/g;
print OUTPUTFILE $_;
}
Now, to add a counter that is incremented at each replacement, you can embed a piece of code in the pattern itself using the (?{ ... }) syntax:
my $i;
while (<INPUTFILE>) {
s/TemplateBeginEditable(?{ ++$i })/TemplateBeginEditable Event=$i/g;
print OUTPUTFILE $_;
}
To make it shorter, you can use the \K feature, which keeps everything matched to its left out of the text that gets replaced:
while (<INPUTFILE>) {
s/TemplateBeginEditable\K(?{ ++$i })/ Event=$i/g;
print OUTPUTFILE $_;
}
Or with a one-liner:
perl -pe 's/TemplateBeginEditable\K(?{++$i})/ Event=$i/g' file > output
If you have awk available, and the target text occurs at most once per line, then I think Perl is overkill:
awk 'BEGIN{n=1}{n+=sub("TemplateBeginEditable","& Event="n)}1'
Some explanation: The sub function returns the number of substitutions performed (0 or 1); the & means "whatever matched"; "..."n is string concatenation (no operator in awk); the 1 is a "true" condition that invokes the default "action" of {print}.
Expanding on my one-liner in the comments:
#!/usr/bin/perl
use strict;
use warnings;
my $file = shift or die "Usage: $0 <filename>\n";
open my $fh, '<', $file or die "Cannot open $file: $!\n";
open my $ofh, '>', "$file.modified" or die "Cannot open $file.modified: $!\n";
my $i = 1;
while (my $line = <$fh>) {
if ($line =~ s/TemplateBeginEditable/$& Event=$i/) {
$i++;
}
print $ofh $line;
}
__END__
Note that this assumes you will never have more than one instance of your desired text on a single line, as shown in your sample input.
I'd just do:
local $/=undef;
my $content = <FH>;
my $x = 0;
$content =~ s/(My expected pattern)/$1 . " time=" . (++$x)/ge;
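A self-contained sketch of this /e counter approach, applied to the markers from the question. The sub name number_markers and its parameters are made up for illustration. Because the counter lives outside the substitution and /g walks the whole string, it also numbers several markers that share one line, which the line-by-line versions above do not.

```perl
use strict;
use warnings;

# Append " $label=N" after every occurrence of $marker, numbering from 1.
# (number_markers is a hypothetical helper name, not from the original posts.)
sub number_markers {
    my ($text, $marker, $label) = @_;
    my $count = 0;
    # /e evaluates the replacement as Perl code; /g repeats it for every match.
    $text =~ s/(\Q$marker\E)/"$1 $label=" . ++$count/ge;
    return $text;
}

my $html = "<!-- TemplateBeginEditable -->\n"
         . "<!-- TemplateBeginEditable --><!-- TemplateBeginEditable -->\n";
print number_markers($html, 'TemplateBeginEditable', 'Event');
```

For a real file, slurp it into a string first (as in the snippet above) and pass that string in.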
I have a Perl program that:
Reads a SRC file written in C
Uses a regex match from SRC file to find specific formatted data to use as the Destination filename
Opens new Destination file
Performs another regex match to find all C style comments /* */ that contain a keyword abcd. Note: these comments can be 1 line or more than 1 line so the regex is looking for the first /* and then the keyword abcd and then any amount of text and space before it encounters a closing */
Writes the regex matches to the destination file
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
$fileName = $1;
}
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
seek SRC_FH, 0, 0;
while(my $row = <SRC_FH>){
if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
print DES_FH "$1\n";
}
}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
My problem is that, although my regex is correct, my destination file only gets the one-line comments written to it; I think this is because of the way the Perl code executes. Any C-style comments that span more than one line are not written to the destination file. What am I missing in my 2nd if statement?
I checked the regex from my 2nd if statement at https://regexr.com/ and it works as it's supposed to, capturing multi-line C-style comments as well as single-line comments that contain the keyword abcd.
So I tried the 1st suggestion below by zdim. Here is what I used:
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
my @comments;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
$fileName = $1;
}
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
#seek SRC_FH, 0, 0;
my $content = do {
#read whole file at once
local $/;
open (SRC_FH,'<', $src) or die $!;
<SRC_FH>;
};
#if($content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg){
# my @comments = $content;
# }
my @comments = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg;
foreach (@comments){
print DES_FH "$1\n";
}
#while(my $row = <SRC_FH>){
# if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
# print DES_FH "$1\n";
# }
#}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
The result is that all the content of sample.c is copied to the destination file, a full 1:1 copy, whereas I am looking to pull only the comments, single-line and multi-line, out of the C file.
Example 1:
/* abcd */
Example 2:
/* some text
* some more comments
abcd and some more comments */
Final Solution
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
$fileName = $1;
}
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
seek SRC_FH, 0, 0;
my $content = do{local $/; <SRC_FH>};
my @comments = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/g;
for(@comments){
print DES_FH "$_\n";
}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
What am I missing in my 2nd if statement?
Well, nothing -- it's just that in a multiline C comment no single line contains both /* and */. Thus that regex simply cannot match a multiline comment when the file is read line by line.
To catch such comments either:
Read the whole file into a string ("slurp" it), and then add the /s modifier to the regex so that . matches a newline as well. Also use the /g modifier to capture all such patterns in the string. One way:
my $content = do {
local $/; # undef record separator so the whole file is read at once
open my $src_fh, '<', $src_file or die $!; # have to re-open
<$src_fh>; # reads it all
}; # lexical filehandle gets closed as we leave scope
# NOTE -- there may be difficulties in capturing comments in a C source file
my @comments = $content =~ /.../sg; # your regex
Or use a library to slurp a file, like
use Path::Tiny;
my $content = path($src_file)->slurp;
Or,
Set a flag when you see /*, get/print all lines until you hit the closing */, then unset the flag. Here is a rudimentary version of that
# needs: use feature 'say';
my $inside_comment = 0;
while (<$src_fh>) {
    chomp;                                        # say adds the newline back
    if (!$inside_comment && m{(/\*.*)}) {         # opening line for the comment
        say $des_fh $1;
        $inside_comment = 1 unless $1 =~ m{\*/};  # a one-line comment closes at once
    }
    elsif ($inside_comment && m{(.*\*/)}) {       # closing line for the comment
        say $des_fh $1;
        $inside_comment = 0;
    }
    elsif ($inside_comment) { say $des_fh $_ }
}
I tested all this but please check and improve. For one, this plays funny with leading spaces.
Note: Getting all comments out of a C program in general may be rather tricky.
Here is a one-line version of slurping
my $file_content = do { local (@ARGV, $/) = $file_name; <> };
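One way to keep a slurped match from running past a closing */ is to pull out every comment first and filter by keyword afterwards. The following is a minimal sketch under that assumption, not a C parser: the sub name comments_with_keyword is made up for illustration, and a /* that appears inside a string literal would still be treated as a comment.

```perl
use strict;
use warnings;

# Return every /* ... */ comment in $source that contains $keyword.
# (comments_with_keyword is a hypothetical helper name.)
sub comments_with_keyword {
    my ($source, $keyword) = @_;
    # Non-greedy .*? stops at the first closing */; /s lets . cross newlines;
    # /g in list context returns the captured text of every match.
    my @comments = $source =~ m{(/\*.*?\*/)}sg;
    return grep { index($_, $keyword) >= 0 } @comments;
}

my $source = <<'END_C';
int a; /* abcd */
/* some text
 * some more comments
 abcd and some more comments */
int b; /* unrelated note */
END_C

print "$_\n---\n" for comments_with_keyword($source, 'abcd');
```

Matching each comment on its own and then filtering avoids a greedy .* that, under /s, can swallow everything from the first /* up to the last abcd in the file.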
I'm new to programming and I've run into an issue. We have to use Perl to write a script that opens a file, then loops through each line using a Regex - then print out the results. The opening of the file and the loop I have, but I can't figure out how to implement the Regex. It outputs 0 matched results, when the assignment outline suggests the number to be 338. If I don't use the Regex, it outputs 2987, which is the total number of lines - which is correct. So there's something incorrect with the Regex I just can't figure out. Any help would be greatly appreciated!
Here's what I have thus far:
use warnings;
use strict;
my $i = 0;
my $filename = 'C:\Users\sample.log.txt';
open (fh, '<', $filename) or die $!;
while(<fh>) {
if ($filename=~ /(sshd)/){
$i++;
}
}
close(fh);
print $i;
Consider this piece of code of yours:
while(<fh>) {
if ($filename=~ /(sshd)/){
$i++;
}
}
You are indeed looping through the file lines, but you keep checking if the file name matches your regex. This is clearly not what you intend.
You meant:
while (my $line = <fh>) {
if ($line =~ /sshd/){
$i++;
}
}
The parentheses around the regex seem superfluous (they are meant to capture, while you are only matching).
Since the expression while (<fh>) assigns each line to the special variable $_ (which is the default target for regex matching), this can be shortened to:
while (<fh>) {
$i++ if /sshd/;
}
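To try the corrected loop without the real log file, here is a small sketch that reads from an in-memory filehandle (Perl can open a handle on a reference to a string). The sub name count_matching_lines and the sample log text are made up for illustration.

```perl
use strict;
use warnings;

# Count the lines read from $fh that match $pattern.
# (count_matching_lines is a hypothetical helper name.)
sub count_matching_lines {
    my ($fh, $pattern) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $pattern;   # test the line, not the file name
    }
    return $count;
}

# Invented sample data standing in for sample.log.txt.
my $log = "sshd[1]: accepted\ncron[2]: job\nsshd[3]: closed\n";
open my $fh, '<', \$log or die "Cannot open in-memory file: $!";
print count_matching_lines($fh, qr/sshd/), "\n";   # prints 2
close $fh;
```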
The OP's code has some errors, which I've corrected:
use warnings;
use strict;
use feature 'say';
my $i = 0;
my $filename = 'C:\Users\sample.log.txt';
open my $fh, '<', $filename
or die "Couldn't open $filename";
map{ $i++ if /sshd/ } <$fh>;
close($fh);
say "Found: $i";
I have a file that looks something like this:
Random words go here
/attribute1
/attribute2
/attribute3="all*the*things*I'm*interested*in*are*inside*here**
and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*
bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sente
nce.*I*think*we*have*enough*words"
I want to grep the file for the line starting with /attribute3=, then save the string found inside the quotation marks to a separate variable.
Here's what I have so far:
#!/bin/perl
use warnings; use strict;
my $file = "data.txt";
open(my $fh, '<', $file) or die $!;
while (my $line = <$fh>) {
if ($line =~ /\/attribute3=/g){
print $line . "\n";
}
}
That's printing out /attribute3="all*the*things*I'm*interested*in*are*inside*here** but
I want all*the*things*I'm*interested*in*are*inside*here**and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sentence.*I*think*we*have*enough*words.
So what I did next is:
#!/bin/perl
use warnings; use strict;
my $file = "data.txt";
open(my $fh, '<', $file) or die $!;
my $part_I_want;
while (my $line = <$fh>) {
if ($line =~ /\/attribute3=/g){
$line =~ /^/\attribute3=\"(.*?)/; # capture everything after the quotation mark
$part_I_want .= $1; # the capture group; save the stuff on line 1
# keep adding to the string until we reach the closing quotation marks
next (unless $line =~ /\"/){
$part_I_want .= $_;
}
}
}
The code above doesn't work. How do I capture a multiline pattern between two characters (in this case, quotation marks)?
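my $str = do { local($/); <DATA> };   # slurp everything after __DATA__
$str =~ /attribute3="([^"]*)"/ or die "attribute3 not found";
$str = $1;
$str =~ s/\n//g;   # join the wrapped lines; the sample breaks words mid-line ("sente" / "nce")
print "$str\n";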
__DATA__
Random words go here
/attribute1
/attribute2
/attribute3="all*the*things*I'm*interested*in*are*inside*here**
and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*
bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sente
nce.*I*think*we*have*enough*words"
Read the entire file into a single variable and use /attribute3=\"([^\"]*)\"/ms
From the command line:
perl -n0e '/\/attribute3="(.*)"/s && print $1' foo.txt
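This is basically what you had, but the -0 flag stands in for undef $/ within the code: with no digits it sets the record separator to the null character, which a text file normally does not contain, so the whole file is read as a single record (-0777 is the explicit slurp mode). From the man page: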
-0[octal/hexadecimal]
specifies the input record separator ($/) as an octal or hexadecimal number. If there are no digits, the null character is the separator.
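The effect of the record separator can be seen in a short sketch that reads the same text three times from an in-memory filehandle. The sub name read_records and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Read $text through a filehandle using $separator as $/ and return the records.
# (read_records is a hypothetical helper name.)
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;               # undef means "read everything at once"
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $text = "/attribute2\n/attribute3=\"two\nlines\"\n";

my @by_line = read_records($text, "\n");   # default behaviour
my @slurped = read_records($text, undef);  # like local $/; or -0777
my @by_nul  = read_records($text, "\0");   # like -0 with no digits

printf "newline: %d, undef: %d, NUL: %d\n",
    scalar @by_line, scalar @slurped, scalar @by_nul;   # prints "newline: 3, undef: 1, NUL: 1"
```

The NUL separator only behaves like a slurp because this text contains no null characters; on binary data the two would differ.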
I'm in an intro to Perl course and we are tasked with taking an input.txt file (of the Gettysburg Address - that has all instances of the word 'old' changed to 'new') and creating an output.txt file that switches 'new' back to having 'old'. I've got a general regex that switches all instances of 'new' to 'old', but it needs to work regardless of case in the input file. I'm wondering how I could add that in? Also, I'm looking to verify that I have my output.txt built in correctly? When I run what I have, I get no output.txt file created in my directory. Here is what I have so far:
open(my $getty, "<", "input.txt")
or die "Cannot open < input.txt: $!";
open(my $getty, ">", "output.txt")
or die "Cannot open < output.txt: $!";
while(my $line = <$getty>) {
if ($line =~ 's/new/old/') {
$line =~ s/new/old/;
}
}
You don't need to put an if condition.
while(my $line = <$getty>) {
$line =~ s/new/old/gi;
}
A handy recipe is to make the change from the shell with Perl:
perl -pi.orig -e 's{old}{new}g' filename.txt
This makes the replacement in filename.txt in place and keeps the original file as filename.txt.orig.
I guess your course is over, however here is a complete answer for anyone who might need it:
The if is unnecessary. You need to use print on the output filehandle to create the file. Here is the complete working code (there is no comma after the filehandle in the print statement):
use strict;
open(my $in_file, "<", "input.txt")
or die "Cannot open < input.txt: $!";
open(my $out_file, ">", "output.txt")
or die "Cannot open > output.txt: $!";
while(my $line = <$in_file>) {
$line =~ s/new/old/gi;
print $out_file $line;
}
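To see what the modifiers do, here is a small sketch on a single string: /i ignores case, /g replaces every occurrence rather than only the first, and s/// returns the number of substitutions it made. The sub name restore_old and the sample sentence are made up for illustration.

```perl
use strict;
use warnings;

# Replace every "new" with "old", ignoring case.
# Returns the changed text and how many substitutions were made.
# (restore_old is a hypothetical helper name.)
sub restore_old {
    my ($line) = @_;
    my $count = ($line =~ s/new/old/gi);   # empty string when nothing matched
    return ($line, $count || 0);
}

my ($fixed, $n) = restore_old("A New nation, a new birth, NEW hope\n");
print "$n replaced: $fixed";   # prints "3 replaced: A old nation, a old birth, old hope"
```

Note that this plain pattern also rewrites "new" inside longer words, and that the replacement is always lowercase regardless of how the match was written.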
If the case of the input word should be the same in the output word, this is a solution:
use strict;
open(my $in_file, "<", "input.txt")
or die "Cannot open < input.txt: $!";
open(my $out_file, ">", "output.txt")
or die "Cannot open > output.txt: $!";
while(my $line = <$in_file>) {
    $line =~ s{(new)}
    {
        my @chars = split '', $1;
        my @old = qw/o l d/;
        my @out;
        foreach my $char (@chars) {
            if($char =~ /\p{Uppercase}/) {
                push @out, uc(shift @old);
            }
            else {
                push @out, shift @old;
            }
        }
        join('', @out);
    }egi;
    print $out_file $line;
}
What happens here is that I use s{pattern}{replacement}. The e modifier makes the replacement part Perl code, which is why it can span several lines; i makes the match case-insensitive and g replaces every occurrence on the line. In the replacement code I go through every character of the matched "new" (captured with parentheses so I can check it in the variable $1). If a character is uppercase, I use the uc function to make the corresponding output character uppercase as well.
I'm reading the contents of a log file, performing a regex on the lines and putting the results in an array, but for some reason there is no output.
use strict;
use warnings;
my $LOGFILE = "log.file";
my $OUTFILE = "out.file";
open(LOG, "$LOGFILE") or die ("Could not open $LOGFILE: $!\n");
open(TMP, ">", "$OUTFILE") or die ("Could not open $OUTFILE: $!\n");
my @data = (<LOG> =~ /<messageBody>(.*?)<\/messageBody>/sg);
print TMP "This is a test line. \n";
foreach (@data){
print "@data\n";
print "\n=======================\n";
}
close TMP;
close LOG;
My output is a file (out.file) and the only content is "This is a test line." I know the regex works because I tried it at the prompt with:
perl -lne 'BEGIN{undef $/} while (/<messageBody>(.*?)<\/messageBody>/sg) {print $1}' log.file > test.file
What am I doing wrong?
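Your data likely spans lines, and <LOG> on the left of =~ is evaluated in scalar context, so it reads only the first line of the file.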
You'll therefore need to slurp the entire thing before using your regex:
my $logdata = do {local $/; <LOG>};
my @data = $logdata =~ m{<messageBody>(.*?)</messageBody>}sg;
If you want to print the data's items, then you should do something like this:
foreach my $item (@data){
print TMP "$item\n";
print TMP "\n=======================\n";
}
$item takes each array element in turn, and each one is written to the TMP filehandle rather than to the screen.
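Both points can be checked with a short sketch that uses an in-memory log instead of log.file. The sub name message_bodies and the sample log text are made up for illustration.

```perl
use strict;
use warnings;

# Return the text of every <messageBody> element found in $text.
# (message_bodies is a hypothetical helper name.)
sub message_bodies {
    my ($text) = @_;
    my @bodies = $text =~ m{<messageBody>(.*?)</messageBody>}sg;
    return @bodies;
}

# Invented sample data: the first message body spans two lines.
my $log = "<messageBody>first\nbody</messageBody>\nnoise\n"
        . "<messageBody>second</messageBody>\n";

# Scalar context, as with <LOG> =~ /.../: only the first line is read.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my $first_line = <$fh>;
close $fh;
my @partial = message_bodies($first_line);

# Slurped: the whole log is matched as one string.
open $fh, '<', \$log or die "Cannot open in-memory log: $!";
my $whole = do { local $/; <$fh> };
close $fh;
my @bodies = message_bodies($whole);

printf "first line only: %d, whole file: %d\n",
    scalar @partial, scalar @bodies;   # prints "first line only: 0, whole file: 2"
```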