How to print lines in between two patterns? - regex

I would like to print everything between lines #cluster t.# has ### elements (including this line) and #cluster t.#+1 has ### elements (preferably omitting this line) from my input file into corresponding numbered output files (clust(#).txt). The script thus far creates the appropriate numbered files, without any content.
#!/usr/bin/perl
use strict;
use warnings;
open(IN,$ARGV[0]);
our $num = 0;
while(my $line = <IN>) {
if ($line =~ /^\#cluster t has (\d+) elements/) {
my $clust = "full";
open (OUT, ">clust$clust.txt");
} elsif ($line =~ m/^\#cluster t.(\d+.*) has (\d+) elements/) {
my $clust = $1;
$num++;
open (OUT, ">clust$clust.txt");
print OUT, $_ if (/$line/ ... /$line/);
}
}

Update: Rearranged so that the version based on my final understanding of the input comes first. Also edited for clarity.
Detect the line that starts the section to be written to its own file and open the suitable file; otherwise just write to the filehandle (that corresponds to the current output file).
An example input file, in my understanding, data_range.txt
#cluster t.1 has 100 elements
data 1
data 1 1
#cluster t.2 has 200 elements
data 2
#cluster t.3 has 300 elements
Print t.N and the lines following up to the next t.N, to a file clust(N).txt.
use warnings;
use strict;
my $file = shift || 'data_range.txt';
open my $fh, $file or die "Can't open $file: $!";
my $fh_out;
my $clustline = qr/\#cluster t\.([0-9]+) has [0-9]+ elements/;
while (<$fh>)
{
if (/$clustline/) {
my $outfile = "clust($1).txt";
open $fh_out, '>', $outfile or die "Can't open $outfile: $!";
}
print $fh_out $_;
}
For each line with #cluster a new file with the corresponding number is opened, closing the previous one since we use the same filehandle. All following lines, including that one, belong to that file and they are printed there.
The code above assumes that the first line in the file is a #cluster line, and that every line in the file belongs to one of the output files. If that may not hold, we need to be more careful: (1) use a flag to record when writing has started, and (2) add a branch for lines that should be skipped.
my $started_writing = 0;
my $clustline = qr/\#cluster t\.([0-9]+) has [0-9]+ elements/;
while (<$fh>)
{
if (/$clustline/) {
my $fout = "clust($1).txt";
open $fh_out, '>', $fout or die "Can't open $fout for writing: $!";
$started_writing = 1;
}
elsif (not $started_writing) { # didn't get to open output files yet
next;
}
elsif (/dont_write_this_line/) { # condition for lines to skip altogether
next;
}
print $fh_out $_;
}
All of this assumes that a #cluster line cannot repeat with the same number. You'd lose output data if that happened, so add a test if you aren't sure of your input (or open output files in append mode).
With either we get output clust(1).txt
#cluster t.1 has 100 elements
data 1
data 1 1
and clust(2).txt
#cluster t.2 has 200 elements
data 2
and clust(3).txt with the #cluster t.3 line.
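If #cluster numbers can repeat, one way to avoid overwriting earlier output is to truncate a file only the first time its number is seen and to append after that. A minimal sketch, assuming a helper named mode_for and a %seen hash (both hypothetical, not part of the code above):

```perl
use strict;
use warnings;

my %seen;   # cluster numbers already opened in this run (hypothetical name)

# Hypothetical helper: choose the open mode for cluster number $n.
# First sighting truncates ('>'); any repeat appends ('>>').
sub mode_for {
    my ($n) = @_;
    return $seen{$n}++ ? '>>' : '>';
}

for my $n (1, 2, 1) {
    print "cluster $n -> ", mode_for($n), "\n";
}
```

In the loops above, the open call would then take its mode from the helper, for example open $fh_out, mode_for($1), $outfile or die "Can't open $outfile: $!";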
Original version, with the initial understanding of input and requirements
The range operator is nearly tailor-made for this. It keeps track of its true/false state across repeated calls: it turns true once its left operand evaluates true and stays true until the right operand is true, after which it is false again on the next evaluation. There is more to it; please see the docs.
Made-up input file data_range.txt
#cluster t.1 has 100 elements
#cluster t.2 has 200 elements
#cluster t.3 has 300 elements
#cluster t.4 has 400 elements
#cluster t.5 has 500 elements
Print everything between marker-lines 2 and 4, including the starting line but not the ending one.
use warnings;
use strict;
my $file = 'data_range.txt';
open my $fh, $file or die "Can't open $file: $!";
# Build the start and end patterns
my $beg = qr/^\#cluster t\.2 has 200 elements$/;
my $end = qr/^\#cluster t\.4 has 400 elements$/;
while (<$fh>)
{
if (/$beg/ .. /$end/) {
print if not /$end/;
}
}
This prints lines 2 and 3. The .. operator turns true once the line ($_) matches $beg and stays true up to and including the line that matches $end; it is false again from the next line on. The range therefore covers both the start and the end line, so we also test for the end marker and do not print that line.
If you would rather use the literal marker lines you can test strings for equality
my $beg = q(#cluster t.2 has 200 elements);
my $end = q(#cluster t.4 has 400 elements);
while (my $line = <$fh>)
{
chomp($line);
if ($line eq $beg .. $line eq $end) {
print "$line\n" if $line ne $end;
}
}
This works the same way as the example above. Note that now we have to chomp since the newline would foil eq test (and then we add \n for printing).
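As a small illustration of the "there is more to it" remark: the two-dot form tests its right operand on the same line that switched the range on, while the three-dot form waits until the next line. The difference shows up when the start and end patterns are the same. This is a sketch on made-up in-memory lines, not on the asker's file:

```perl
use strict;
use warnings;

# Made-up data: three marker lines, each followed by one data line
my @lines = ('#cluster 1', 'a', '#cluster 2', 'b', '#cluster 3', 'c');

my (@two_dot, @three_dot);
for (@lines) {
    # ..  tests the right operand on the same line that turned the range on
    push @two_dot, $_ if /^\#cluster/ .. /^\#cluster/;
    # ... waits until the next line before testing the right operand
    push @three_dot, $_ if /^\#cluster/ ... /^\#cluster/;
}

print "two dots:   @two_dot\n";
print "three dots: @three_dot\n";
```

With two dots each marker line opens and closes the range at once, so only the marker lines are collected. With three dots the range runs from one marker to the next one, inclusive.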

Here is a more concise way:
perl -ne 'print if /^foo/ .. /^base/' file.txt
Sample input
Lorem ipsum dolor
sit amet,
consectetur adipiscing
foo
bar
base
elit,
sed do
Output
foo
bar
base

Related

perl regex multiline

I am trying to write a Perl script that replaces a few lines of text with a few other lines. I am a Perl newbie and would appreciate any help.
Need to replace
'ENTITLEMENT_EVS_V',
NULL,
NULL,
with:
'ENTITLEMENT_EVS_V',
ENTITLEMENT_CATEGORY_CODE,
6,
I am unable to do so, especially the regex part. I tried many things, but the script currently stands at:
#!/usr/bin/env perl
my ($lopen_fh, $lwrite_fh);
# my $l_reg_evs = qq{
#'ENTITLEMENT_EVS_V',
#NULL,
#NULL,
#};
my $l_reg_evs = qr/(\'ENTITLEMENT_EVS_V\',
NULL,
NULL,
)/;
my $l_evs=qq{
'ENTITLEMENT_EVS_V',
ENTITLEMENT_CATEGORY_CODE,
6,
};
open ($lopen_fh, '<', "/home/cbdev2/imp/dev/src/deli/entfreeunits/config/entfreeunits/stubs/DirectVariables_evEntlCategory.exp") or die $!;
open ($lwrite_fh, '>', "/home/cbdev2/imp/dev/src/deli/entfreeunits/config/entfreeunits/stubs/DirectVariables_evEntlCategory.new.exp") or die $!;
while(<$lopen_fh>) {
$_ =~ s/$l_reg_evs/$l_evs/m;
print $lwrite_fh $_;
}
close $lopen_fh;
close $lwrite_fh;
I am not quite clear what you are trying to do, but I tried to distill the essence of your problem into a self-contained script. In a real program, you'd be reading and parsing the mapping of variable names to categories and codes, but here, I am just going to read from strings. The purpose of that is to show that the task can be accomplished without slurping files.
#!/usr/bin/env perl
use strict;
use warnings;
my $entitlement_map_file = <<EOF_MAP_FILE;
'ENTITLEMENT_EVS_V'
ENTITLEMENT_CATEGORY_CODE
6
EOF_MAP_FILE
my $entitlement_input_file = <<EOF_INPUT_FILE;
'ENTITLEMENT_EVS_V',
NULL,
NULL,
EOF_INPUT_FILE
# read and parse the file containing mapping of
# variables to category names and codes
open my $map_fh, '<', \$entitlement_map_file
or die $!;
my %map;
while (my $var = <$map_fh>) {
    chomp $var;
    chomp( my $mnemonic = <$map_fh> );
    chomp( my $code = <$map_fh> );
    @{ $map{$var} }{qw(mnemonic code)} = ($mnemonic, $code);
}
close $map_fh;
# read the input file, look up variable name in the map
# if there, follow with category name and code
# skip two lines from input, continue where you left off
open my $in, '<', \$entitlement_input_file
    or die $!;
while (my $var = <$in>) {
    $var =~ s/,\s+\z//;
    next unless exists $map{ $var };
    for (1 .. 2) {
        die unless <$in> =~ /^NULL/;
    }
    print join(",\n", $var, @{ $map{$var} }{qw(mnemonic code)}), "\n";
}
close $in;
Output:
'ENTITLEMENT_EVS_V',
ENTITLEMENT_CATEGORY_CODE,
6
Generalizing this is left as an exercise to the reader.
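For comparison, here is what the slurping approach could look like. The script in the question cannot work as written because while (<$lopen_fh>) hands the substitution one line at a time, so a pattern spanning three lines never sees all of them at once. The sketch below is an assumption-laden illustration: the input is a made-up in-memory string standing in for the file contents, and the block is assumed to be exactly the marker line followed by two NULL, lines.

```perl
use strict;
use warnings;

# Stand-in for the file contents; a real script would slurp the file,
# e.g. with: my $text = do { local $/; <$fh> };
my $text = <<'EOF_INPUT';
'SOMETHING_ELSE',
NULL,
NULL,
'ENTITLEMENT_EVS_V',
NULL,
NULL,
EOF_INPUT

# Match the marker line plus the two NULL lines after it.
# /m lets ^ anchor at the start of every line inside the string.
$text =~ s/^('ENTITLEMENT_EVS_V',\n)NULL,\nNULL,\n/${1}ENTITLEMENT_CATEGORY_CODE,\n6,\n/mg;

print $text;
```

Only the block that starts with 'ENTITLEMENT_EVS_V', is rewritten; the NULL, lines under 'SOMETHING_ELSE', are left alone.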
I think I'd write something like this. It expects the path to the input file as a parameter on the command line and prints the result to STDOUT.
It requires all of the following to be true:
The first line of the block to search for and the first line of the replacement block are always identical
The number of lines in the block to search for and in the replacement block is always the same
The input file always contains the first line of the block to search for exactly once
There is no need to check that the lines after the first one in the block actually contain NULL, and it is sufficient to locate just the first line and remove the following lines, whatever they contain
It works by reading the input file and copying it to STDOUT. If it encounters a line that contains the first line of the replacement block, it reads and discards lines until the number of lines read equals the size of the replacement block. Then the text of the replacement block is printed to STDOUT and the copying continues.
use strict;
use warnings 'all';
no warnings 'qw'; # avoid warning about commas in qw//
my @replacement = qw/
'ENTITLEMENT_EVS_V',
ENTITLEMENT_CATEGORY_CODE,
6,
/;
open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
while ( <$fh> ) {
    if ( /$replacement[0]/ ) {
        # discard the remaining lines of the old block
        <$fh> for 1 .. $#replacement;
        print "$_\n" for @replacement;
    }
    else {
        print;
    }
}
This works fine with some sample data that I created, but I have no way of knowing whether the stipulations listed above apply to your actual data. I'm sure you will let me know if something needs adjusting
Here's what I did:
#!/usr/bin/env perl
my ($lopen_fh, $lwrite_fh);
my $l_stub_dir = "/home/cbdev2/imp/dev/src/deli/entfreeunits/config/entfreeunits/stubs";
my $l_stub = "DirectVariables_evEntlCategory.exp";
my $l_filename = "$l_stub_dir/$l_stub";
my $search_evs = 'ENTITLEMENT_EVS_V';
my $search_tre = 'ENTITLEMENT_TRE_V';
my @replacement_evs = qw/
'ENTITLEMENT_EVS_V',
ENTITLEMENT_CATEGORY_CODE,
6,
/;
my @replacement_tre = qw/
'ENTITLEMENT_TRE_V',
ENTITLEMENT_CATEGORY_CODE,
6,
/;
open ($lopen_fh, "<$l_filename") or die $!;
open ($lwrite_fh, ">$l_filename.new") or die $!;
while(<$lopen_fh>) {
    if ( /'ENTITLEMENT_EVS_V'/ ) {
        <$lopen_fh> for 1 .. $#replacement_evs;
        print $lwrite_fh " $_\n" for @replacement_evs;
    }
    elsif ( /'ENTITLEMENT_TRE_V'/ ) {
        <$lopen_fh> for 1 .. $#replacement_tre;
        print $lwrite_fh " $_\n" for @replacement_tre;
    }
    else {
        print $lwrite_fh $_;
    }
}
close $lopen_fh;
close $lwrite_fh;
unlink($l_filename) or die "Failed to delete $l_filename: $!";
link("$l_filename.new", $l_filename) or die "Failed to copy $l_filename";
unlink("$l_filename.new") or die "Failed to delete $l_filename.new: $!";

Take an array of phone numbers and search another array for each occurrence of said number and print that matching line and the following line

I have two text files. I'm importing each file into an array. Each value in the numbers array should search the users array for its match. If found, print the matching line and the following line.
So, if the first entry in the numbers array is 1234, search users array for 1234. If found print that line and the next.
numbers.txt looks like:
1234567021
1234566792
The users file looks like:
1234567021#host.com User-Password == "secret"
Framed-IP-Address = 000.000.000.000,
What I have so far:
use strict;
my $users_file = "users";
my $numbers_file = "numbers.txt";
my $phonenumber;
my $numbers;
#### Place phone number into an array ####
open (RESULTS, $numbers_file) or die "Unable to open file: $users_file\n$!";
my @numbers;
@numbers = <NUMBER_RESULTS>;
close(NUMBER_RESULTS);
#### Place users file contents into an array ####
open (RESULTS, $users_file) or die "Unable to open file: $users_file\n$!";
my @users_data;
@users_data = <RESULTS>;
close(RESULTS);
#### Search the array for the string ####
foreach $numbers(@users_data) {
    if (index($numbers,$phonenumber) ge 0) {
        my @list = grep /\b$numbers\b/, @users_data;
        chomp @list;
        print "$_\n" foreach @list;
    }
}
exit 1;
You are recreating a search for a key, when Perl has a built-in hash data type that will handle this better and faster than rolling your own. Using it takes a little more work when reading in the data, but it is worth it.
First, let's switch to the modern form of open, with a lexically scoped variable for the filehandle and an explicit mode.
open (my $results, "<", $users_file) or die "Unable to open file: $users_file\n$!";
From there, we read the file one line at a time and populate the hash.
my (%users_data, $number, $number_line);
while(<$results>)
{
    chomp;
    if(defined($number))
    {
        $users_data{$number} = "$number_line\t$_\n"; #load the number's line and the line after it into the hash value.
        undef $number;
    }else
    {
        if(/^(\d+)\#/) #match digits between the beginning of the line and the # symbol.
        {
            $number = $1; #save matched digits from $1.
            $number_line = $_;
        }
    }
}
Note that this assumes the data is well formatted. If there are concerns, you can test for proper formatting in the else clause.
Now, for the output we can use the following
for (@numbers)
{
    chomp; #since we didn't remove newlines when populating @numbers
    if( defined($users_data{$_}) )
    {
        print $users_data{$_};
    }
}
EDIT
Here is a working version. Note that use strict and use warnings helped to catch names that were declared one way and used another (the filehandle opened as RESULTS but read as NUMBER_RESULTS, and the hash declared as %users_data but written as %user_data), which is why those pragmas are so important. Data::Dumper was also used to print the contents of the array @numbers and the hash %users_data, to see what data actually made it into the data structures.
#!/usr/bin/env perl
use strict;
use warnings;
#use Data::Dumper;
my $users_file = "users";
my $numbers_file = "numbers.txt";
#### Place phone number into an array ####
open (my $numbers_fh, "<", $numbers_file) or die "Unable to open file: $numbers_file\n$!";
my @numbers;
@numbers = <$numbers_fh>;
close($numbers_fh);
#print Dumper \@numbers;
#### Load the users file into a hash keyed by phone number ####
# a separate handle name avoids redeclaring the same lexical in one scope
open (my $users_fh, "<", $users_file) or die "Unable to open file: $users_file\n$!";
my (%users_data, $number, $number_line);
while(<$users_fh>)
{
    chomp;
    if(defined($number))
    {
        $users_data{$number} = $number_line."\n$_\n"; #load the number's line and the line after it into the hash value.
        undef $number;
    }else
    {
        if(/^(\d+)\#/) #match digits between the beginning of the line and the # symbol.
        {
            $number = $1; #save matched digits from $1.
            $number_line = $_;
        }
    }
}
close($users_fh);
#print Dumper \%users_data;
for (@numbers)
{
    chomp; #since we didn't remove newlines when populating @numbers
    if( defined($users_data{$_}) )
    {
        print $users_data{$_};
    }
}

Find and get the location of a list of words in a text

I'm sure this is simple but I just can't figure out what to do...
I have a text file with a bunch of words in it (let's call it "wordlist") organized in a single column. Then I have a big text file (let's call it "essay"). What I want to do is to look in the "essay" file for the words in my "wordlist".
The trick is that I want to know the position of the matched word in the "essay" (meaning, match found after X characters).
I'm actually able to do it when I look for a single word (so wordlist containing just 1 word) but I can't get it to work when working with a list of words...
Any advice ?
thanks a lot
Ok so I just realized it would just tell me "no match found" anyway...Here is the code
use strict;
use warnings;
open (my $wordlist, "<", "/wordlist.txt")
or die "cannot open < wordlist.txt $!";
open (my $essay, "<", "/essay.txt")
or die "cannot open < essay.txt $!";
while (<$essay>) { print "match found\n" if ($essay =~ m/$wordlist/) ; }
{ print "no match found\n" if ($essay !~ m/$wordlist/) ; }
Help please...?
Perl's index function matches substrings, so it does not guarantee that a whole word was matched. A regular-expression match is more useful here, IMHO.
Explanation:
Read the whole text of the essay into a string => $essay
For each word from wordlist.txt => $_
-- Keep matching $_ within $essay with a suitable regex. The one used here is \b$_\b
-- For each match, collect the value of $-[0]
\b: the word-boundary assertion, which ensures that only complete words match, not substrings.
@-: a special array that holds the start offsets of the last successful match; $-[0] is the offset where the whole match begins.
Here is a sample code:
use strict;
use warnings;
use 5.010;
my $wordlist_file = 'wordlist.txt';
open my $wordlist_fh, '<', $wordlist_file or die "Failed to open '$wordlist_file': $!";
my %pos;
my $essay_file = 'essay.txt';
my $essay = do {
local $/ = undef;
open my $fh, "<", $essay_file
or die "could not open $essay_file: $!";
<$fh>;
};
while (<$wordlist_fh>) {
chomp;
$pos{$_} = [] unless $pos{$_};
while($essay =~ m/\b\Q$_\E\b/g){
push @{$pos{$_}}, $-[0];
}
}
use Data::Dumper;
print Dumper(\%pos);
The wordlist and essay files are similar to the ones used by ThisSuitIsBlackNot.
wordlist.txt
I
Perl
hacker
essay.txt
I want to be just another Perl hacker when I grow up
I want to be just another Perl hacker when I grow up
The %pos hash now contains all the positions of each word. Here they are, printed with Data::Dumper:
$VAR1 = {
'hacker' => [
'31',
'84'
],
'Perl' => [
'26',
'79'
],
'I' => [
'0',
'43',
'53',
'96'
]
};
Note that the offsets count the newline character at the end of each line.
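To see the offset arrays in isolation, here is a small sketch on a made-up string. @- holds start offsets and @+ holds end offsets; element 0 refers to the whole match and elements 1, 2, ... to the capture groups, which is why $-[0] is the element to collect when only the position of the word is wanted.

```perl
use strict;
use warnings;

my $text = 'just another Perl hacker';   # made-up sample text

my ($start, $end, $g1, $g2);
if ($text =~ /\b(Perl)\s+(hacker)\b/) {
    ($start, $end) = ($-[0], $+[0]);   # whole match: start and end offsets
    ($g1, $g2)     = ($-[1], $-[2]);   # start offsets of groups 1 and 2
    printf "match spans %d to %d; groups start at %d and %d\n",
        $start, $end, $g1, $g2;
}
```

Offsets are zero-based, and the end offset points just past the last matched character.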
Maybe you can use the index() function.
Here is the link: Using the Perl index() function
This is my sample. The performance may not be great. Hope it helps~:)
open (my $wordlist, "<", "files/wordlist.txt")
or die "cannot open < wordlist.txt $!";
open (my $essay, "<", "files/essay.txt")
or die "cannot open < essay.txt $!";
my $words = {};
while (<$wordlist>) {
chomp($_);
$words->{$_} = 1;
}
my $row_count = 0;
while (<$essay>) {
$row_count++;
chomp($_);
foreach my $word (keys %{$words}) {
my $offset = 0;
my $r = index($_, $word, $offset);
while ($r != -1) {
print "Found [$word] in line $row_count at $r\n";
$offset = $r + 1;
$r = index($_, $word, $offset);
}
}
}
In your code, $essay and $wordlist are both filehandles. When you say
print "match found\n" if ($essay =~ m/$wordlist/);
You're trying to match the stringification of one filehandle to the stringification of another filehandle. When a filehandle is stringified, it looks something like this:
GLOB(0x9a26c38)
So your code actually does something like:
print "match found\n" if ('GLOB(0x9a26c38)' =~ m/GLOB(0x94bbc38)/);
This is not what you want. You need to read the contents of your files and compare those, not the filehandles themselves.
Essay words each on their own line
The following code assumes that your "essay" consists of one word per line. We read the contents of the essay file into a hash of arrays, with the lines as keys and an array of positions as values. We use an array in case the same word appears multiple times in the file. The position of the first word is zero. We then loop through the word list file, printing the word and the first matching position, if there is one.
use strict;
use warnings;
use 5.010;
my $essay_file = 'files/essay.txt';
open my $essay_fh, '<', $essay_file or die "Failed to open '$essay_file': $!";
my $pos = 0;
my %essay;
while (<$essay_fh>) {
chomp;
push @{ $essay{$_} }, $pos;
$pos += length $_;
}
my $wordlist_file = 'files/wordlist.txt';
open my $wordlist_fh, '<', $wordlist_file or die "Failed to open '$wordlist_file': $!";
while (<$wordlist_fh>) {
chomp;
say "$_: $essay{$_}[0]" if exists $essay{$_};
}
essay.txt
I
want
to
be
just
another
Perl
hacker
when
I
grow
up
wordlist.txt
I
Perl
hacker
Output
I: 0
Perl: 20
hacker: 24
Note that I'm ignoring newline characters when computing the position values. You can adjust this as necessary.
Essay words more than one per line
If your essay file can have more than one word per line, we can use a regex to check for matches:
use strict;
use warnings;
use 5.010;
# Slurp entire essay file into a variable
my $essay = do {
local $/;
my $essay_file = 'files/essay.txt';
open my $essay_fh, '<', $essay_file or die "Failed to open '$essay_file': $!";
<$essay_fh>;
};
my $wordlist_file = 'files/wordlist.txt';
open my $wordlist_fh, '<', $wordlist_file or die "Failed to open '$wordlist_file': $!";
while (<$wordlist_fh>) {
chomp;
say "$_: ", $-[0] if $essay =~ /\b\Q$_\E\b/; # no /g, so every word is searched from the start of the essay
}
essay.txt
I want to be just another Perl hacker when I grow up
wordlist.txt
I
Perl
hacker
hack
Output
I: 0
Perl: 26
hacker: 31
Note that the results are a little bit different from our other program, because now there are spaces between words. Also note that there is no output for the word hack, since we're only checking for whole word matches.

Perl extract between start and end

I am aiming to extract a string from a start word to an end word (dIonly is the start and workset(( should be the end, including these parentheses); furthermore, I would like to print the output into a file named report.
I have had problems with lookbehind, as the variable length was not implemented.
Now I reversed the string, to do lookahead. However, something is still not working.
I need to start from dIonly which means I have to reverse the string to circumvent the problem described above, as there are many workset(( in the whole string, which means I can't start from there...
Thank you! I have edited the script now. What I need to do is reverse the string. I did that by splitting the string on spaces into a list, reversing the list, and joining it into a string again. I then split that string into a list at the delimiter 'solution', because my output contains several solutions and from each one I want to extract the part from dIonly to workset. This only works once the string is reversed: otherwise I would encounter worksets that I do not want and extract a different string. dIonly is a distinct part of the pattern of each solution, from which I can work forward to the second workset (which is itself the first workset with two parentheses). Then I want to print the result to a new output file. Any suggestions welcome!
This is a sample of the data:
... denotes that it continues after
..... maxRiskC(cA, 3)) c workset((RiskCA(cA, 3), RiskCB(cB, 2), maxRiskC(cA, 3))) c RiskCA(cA, 3) c RiskCB(cB, 2)) ***********
equation (built-in equation for symbol <=) 6 <= 40 ---> true
Solution 4 (state 31) states: 40 rewrites: 8421 in 5357394502ms cpu
(1464ms real) (0) rewrites/second) G:Game --> workset(empty) c playA
c dIonly c
.....
#!/usr/bin/perl
# perl -d ./perl_debugger.pl
use strict;
use Data::Dumper qw(Dumper);
use File::Slurp;
my @a_linesorig;
my @a_out;
my @a_str;
my $line;
my $reversedline;
my @a_linesrev;
my @reversedarray;
my $reversedline;
my $str;
open(my $fh, "<", "data.txt")
or die "cannot open < data.txt: $!";
my $line = read_file('data.txt');
@a_linesorig = split(' ', $line);
@a_linesrev = reverse(@a_linesorig);
$reversedline = join(' ', @a_linesrev); # joins the reversed list to a single string again
@reversedarray = split( /solution/, $reversedline ); # should split huge string into a list from one solution to next
foreach $str (@reversedarray) {
    if ($str =~ /\bdIonly:\b(.*?)\bworkset\b/g);
    print Dumper \$str;
    print (@a_out, "$str");
}
close $fh
or die "can't close file: $!";
open(my $fh, ">", "output.txt")
or die "cannot open > output.txt: $!";
foreach $str (@a_out)
{
    print ($fh "$str\n");
}
close $fh
or die "can't close file: $!";
Take out the reverse: applied to a scalar, it reverses the letters as well, not just the order of the words.
You can try it with a greedy match since you are only interested in the last workset:
while (my $line = <$input>) {
chomp $line;
if ($line =~ /.*workset(.*dIonly)/) {
# do something with results
say $fh "'$1'";
}
}
And if you need to reverse before writing to the file, you can do:
while (my $line = <$input>) {
chomp $line;
if ($line =~ /.*workset(.*dIonly)/) {
say $fh join " ",reverse (split / /,$1);
}
}
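To see why the leading greedy .* picks the last workset, here is a small self-contained sketch. The input line is made up for illustration (two worksets before a single dIonly); it is not taken from the asker's data.

```perl
use strict;
use warnings;

# Made-up line with two worksets before the dIonly marker
my $line = 'workset(A) c x workset(B) c playA c dIonly c y';

# Without a leading .* the match starts at the FIRST workset
my ($from_first) = $line =~ /workset(.*dIonly)/;

# A leading greedy .* pushes the match to the LAST workset before dIonly
my ($from_last) = $line =~ /.*workset(.*dIonly)/;

print "first: $from_first\n";
print "last:  $from_last\n";
```

The leading .* first swallows as much of the line as it can and only backs off far enough for workset ... dIonly to still match, so the capture begins after the final workset.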

Perl - Start reading from specific line, and only get first column of each line, until end

I have a text file that looks like the following:
Line 1
Line 2
Line 3
Line 4
Line 5
filename2.tif;Smpl/Pix & Bits/Smpl are missing.
There are 5 lines that are always the same, and on the 6th line is where I want to start reading data. Upon reading data, each line (starting from line 6) is delimited by semicolons. I need to just get the first entry of each line (starting on line 6).
For example:
Line 1
Line 2
Line 3
Line 4
Line 5
filename2.tif;Smpl/Pix & Bits/Smpl are missing.
filename4.tif;Smpl/Pix & Bits/Smpl are missing.
filename6.tif;Smpl/Pix & Bits/Smpl are missing.
filename8.tif;Smpl/Pix & Bits/Smpl are missing.
Output desired would be:
filename2.tif
filename4.tif
filename6.tif
filename8.tif
Is this possible, and if so, where do I begin?
This uses the Perl 'autosplit' (or 'awk') mode:
perl -n -F'/;/' -a -e 'next if $. <= 5; print "$F[0]\n";' < data.file
See 'perlrun' and 'perlvar'.
If you need to do this in a function which is given a file handle and a number of lines to skip, then you won't be using the Perl 'autosplit' mode.
sub skip_N_lines_read_column_1
{
    my($fh, $N) = @_;
    my $i = 0;
    my @files = ();
    while (my $line = <$fh>)
    {
        next if $i++ < $N;
        my($file) = split /;/, $line;
        push @files, $file;
    }
    return @files;
}
This initializes a loop, reads lines, skipping the first N of them, then splitting the line and capturing the first result only. That line with my($file) = split... is subtle; the parentheses mean that the split has a list context, so it generates a list of values (rather than a count of values) and assigns the first to the variable. If the parentheses were omitted, you would be providing a scalar context to a list operator, so you'd get the number of fields in the split output assigned to $file - not what you needed. The file name is appended to the end of the array, and the array is returned. Since the code did not open the file handle, it does not close it. An alternative interface would pass the file name (instead of an open file handle) into the function. You'd then open and close the file in the function, worrying about error handling.
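The context point can be checked with a short sketch. The sample line is copied from the question; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = "filename2.tif;Smpl/Pix & Bits/Smpl are missing.\n";

my ($first) = split /;/, $line;   # list context: $first gets the first field
my $count   = split /;/, $line;   # scalar context: $count gets the number of fields

print "first: $first\n";
print "count: $count\n";
```

With the parentheses the variable receives the file name; without them it receives the field count.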
And if you need the help with opening the file, etc, then:
use Carp;
sub open_skip_read
{
    my($name) = @_;
    open my $fh, '<', $name or croak "Failed to open file $name ($!)";
    my @list = skip_N_lines_read_column_1($fh, 5);
    close $fh or croak "Failed to close file $name ($!)";
    return @list;
}
#!/usr/bin/env perl
#
# name_of_program - what the program does as brief one-liner
#
# Your Name <your_email@your_host.TLA>
# Date program written/released
#################################################################
use 5.10.0;
use utf8;
use strict;
use autodie;
use warnings FATAL => "all";
# ⚠ change to agree with your input: ↓
use open ":std" => IN => ":encoding(ISO-8859-1)",
OUT => ":utf8";
# ⚠ change for your output: ↑ — *maybe*, but leaving as UTF-8 is sometimes better
END {close STDOUT}
our $VERSION = 1.0;
$| = 1;
if (@ARGV == 0 && -t STDIN) {
warn "reading stdin from keyboard for want of file args or pipe";
}
while (<>) {
next if 1 .. 5;
my $initial_field = /^([^;]+)/ ? $1 : next;
# ╔═══════════════════════════╗
# ☞ your processing goes here ☜
# ╚═══════════════════════════╝
} continue {
close ARGV if eof;
}
__END__
Kinda ugly but, read out the dummy lines and then split on ; for the rest of them.
my $logfile = '/path/to/logfile.txt';
open(FILE, $logfile) || die "Couldn't open $logfile: $!\n";
for (my $i = 0 ; $i < 5 ; $i++) {
my $dummy = <FILE>;
}
while (<FILE>) {
my (@fields) = split /;/;
print $fields[0], "\n";
}
close(FILE);