First a quick intro. I'm new here, so if I screw up a post, please let me know and I'll fix it.
I've been trying to accomplish my goal using perl, but I'm stuck. I don't need to use perl to accomplish it, but I figure it's that, or Excel and I like perl better. If you have a better method please share.
I start with a file (output from a log file). It is one line, with fields delimited by colons. Here is an example of the file:
RmDenySumm:SGID=46244:Req=15000:tsid=46244:AllocBw=38332:BwList=12456/12500/3750/5876/3750:tsid=63042:AllocBw=38750:BwList=15000/12500/3750/3750/3750:tsid=63043:AllocBw=36717:BwList=14706/12500/3750/5761:tsid=63044:AllocBw=37011:BwList=15000/12500/5761/3750:tsid=61741:AllocBw=38450:BwList=12339/3750/6501/12502/3357:tsid=61721:AllocBw=37460:BwList=12500/15000/4200/5760:tsid=2072:AllocBw=31975:BwList=12136/12339/3750/3750:tsid=2073:AllocBw=24260:BwList=14634/5876/3750:tsid=30842:AllocBw=38453:BwList=14634/12500/5761/5557:tsid=30843:AllocBw=37105:BwList=15000/15000/3750/3355:tsid=30844:AllocBw=38295:BwList=14706/12339/3750/3750/3750:tsid=30845:AllocBw=25601:BwList=5762/12339/3750/3750:tsid=30846:AllocBw=38455:BwList=15000/12136/5761/5557:tsid=30847:AllocBw=26974:BwList=14634/12339:tsid=30848:AllocBw=29634:BwList=14634/15000:tsid=30849:AllocBw=37338:BwList=14838/15000/3750/3750:tsid=60958:AllocBw=36898:BwList=12339/12500/6501/5557:tsid=60959:AllocBw=37178:BwList=12339/12500/12339:tsid=60960:AllocBw=27339:BwList=12339/15000:tsid=60962:AllocBw=34839:BwList=12339/3750/15000/3750:tsid=60963:AllocBw=37500:BwList=15000/15000/3750/3750:tsid=60964:AllocBw=38346:BwList=15000/3754/15000/4592:tsid=60965:AllocBw=24626:BwList=15000/5876/3750:tsid=60966:AllocBw=34513:BwList=12502/12500/5761/3750
I need to grab all of the "AllocBw=######" fields, separate the number part from the "AllocBw", add them all together, then subtract the total from a set value.
In perl, I have this:
#!/usr/bin/perl -w
use Data::Dumper;
#
#
my $file = "/home/nick/perl/svcgroup.txt";
my @asplit;
my $c = 0;
open (FILE, "<", $file) or die "Can't open file".$!."\n";
while (<FILE>) {
$_ =~ s/\n//g;
push(@asplit, split (":", $_));
#print Dumper @asplit;
}
foreach $splits (@asplit) {
if ($splits =~ m/AllocBw/) {
print $splits."\n";
}
}
#print Dumper @asplit;
print "\n\n";
close FILE;
exit;
Which leaves me with:
AllocBw=38332
AllocBw=38750
AllocBw=36717
AllocBw=37011
AllocBw=38450
AllocBw=37460
AllocBw=31975
AllocBw=24260
AllocBw=38453
AllocBw=37105
AllocBw=38295
AllocBw=25601
AllocBw=38455
AllocBw=26974
AllocBw=29634
AllocBw=37338
AllocBw=36898
AllocBw=37178
AllocBw=27339
AllocBw=34839
AllocBw=37500
AllocBw=38346
AllocBw=24626
AllocBw=34513
This is where I get stuck. I'm not sure how to strip these values down to the number and add them up.
If someone can assist, I'd be grateful. If this is more easily accomplished using something other than Perl, that's fine too. My programming scope is limited, as I only make small scripts to accomplish small repetitive tasks at work.
EDIT FOR BORODIN
ie (not formatted like this, this is just for illustration):
AllocBw 12575+
AllocBw 12568+
AllocBw 12358 = TotAllocBw 37501
MaxBw 38800*3=116400
116400(MaxBw) - 37501(TotAllocBw) = TotAvaiBw 78899
This would just be a big bonus. The script you wrote works perfectly well for my purposes and I can adapt it as I need. Thanks again! Much appreciated. I was able to follow everything you did differently in the script and learned some new stuff. Thanks for that as well.
It is simplest to use a global regular expression match to find all occurrences of AllocBw=... in each line of your input file.
This program's outer while loop iterates over the lines of the input file; since your file contains a single line, its body is executed only once.
The inner while iterates over all instances of the regex pattern AllocBw=(\d+) (AllocBw= followed by one or more decimal digits) and captures the numeric value into $1.
The captured number is added to $total each time, and can simply be printed at the end.
use strict;
use warnings;
my $file = '/home/nick/perl/svcgroup.txt';
open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
my $total = 0;
while ( <$fh> ) {
$total += $1 while /AllocBw=(\d+)/g;
}
printf "Total: %d\n", $total;
output
Total: 826049
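For the bonus in the edit, the same global match can feed the subtraction. This is only a minimal sketch: the sub names total_alloc_bw and available_bw are made up for illustration, and the sample line and the 38800*3 maximum are taken from the illustration in the question rather than from a real log.

```perl
use strict;
use warnings;

# Hypothetical helper: sum every AllocBw=<digits> value found in one line.
sub total_alloc_bw {
    my ($line) = @_;
    my $total = 0;
    $total += $1 while $line =~ /AllocBw=(\d+)/g;
    return $total;
}

# Hypothetical helper: subtract the allocated total from a fixed maximum.
sub available_bw {
    my ( $max_bw, $line ) = @_;
    return $max_bw - total_alloc_bw($line);
}

# Sample line built from the illustration values in the question.
my $sample = 'tsid=1:AllocBw=12575:tsid=2:AllocBw=12568:tsid=3:AllocBw=12358';
my $max_bw = 38800 * 3;    # 116400, as in the illustration

printf "TotAllocBw: %d\n", total_alloc_bw($sample);           # 37501
printf "TotAvailBw: %d\n", available_bw( $max_bw, $sample );  # 78899
```

Keeping the sum in its own sub means the maximum can change without touching the parsing.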
I have searched for several hours and I've found tons of resources, but I can't seem to put the info together for my purposes. I have a file with:
FirstName LastName
FirstName LastName
FirstName LastName
etc...
This is homework assigned to the class by students from their presentation. 99% of the code below is theirs from a template script, I'll comment where my contributions come in. I'm supposed to find all of the last names that begin with a vowel and output them into a text file. It seems that I'm having most of my trouble figuring out the regex for "vowel as first character of second field" - as well as some trouble with basic Perl I suppose. How do I do this? This isn't really worth much credit at all, but I'm just driving myself crazy trying to figure this out - Thanks a bunch for any help.
#!/usr/bin/perl
$input = 'names.txt';
open ($info, $input) or die "Could not open $input";
$output = 'hw.txt';
open($fh, '>>', $output) or die "Could not open $output";
@names = split(/\s/, $line); #MINE - probably wrong somehow
while( my $line = <$info>) {
if ($names[2] =~ /^AEIOU/){ #MINE - also probably wrong somehow
print $fh "$line";
}
}
close $info;
close $fh;
Like many programming languages, Perl's array indices start at 0, so the second field is $names[1], not $names[2].
You sound like you have your answer, but just for completeness: you're also using $line before it's actually initialized, and you don't need to quote "$line" in your print. Also, instead of splitting the line first, you could just use a regex match from the very start:
while( my $line = <$info> ) {
if( $line =~ /\s[AEIOU]/ ) {
print $fh $line;
}
}
That's actually slightly safer than your code too, because if there's a double space then splitting on /\s/ leaves the second array element as an empty string (although there are other ways to mitigate that).
As an aside, I'm very pleased to see Perl still being the subject of homework assignments :)
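If you would rather keep the split approach from the question, here is a minimal sketch. The sub name is hypothetical, and it assumes each line holds a first name and a last name separated by whitespace, with the last name capitalised.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the second whitespace-separated
# field starts with an uppercase vowel, otherwise 0.
sub last_name_starts_with_vowel {
    my ($line) = @_;
    # split ' ' skips leading whitespace and treats a run of whitespace
    # as one separator, so a double space produces no empty field.
    my ( undef, $last ) = split ' ', $line;
    return ( defined $last && $last =~ /^[AEIOU]/ ) ? 1 : 0;
}

for my $line ( "John Adams\n", "Mary  Owens\n", "Alan Smith\n" ) {
    print $line if last_name_starts_with_vowel($line);
}
```

With these three sample lines, only the Adams and Owens lines are printed.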
scenario: I am a Jr. C# developer, but recently (3 days) began learning Perl for batch files. I have a requirement to parse through a text file, extract some key data, then output the key data to a new text file. As seems to always be the case, there are butt loads of fragmented examples on the net regarding how to 'read' from a file, 'write' to a file, 'store' line by line into an array, 'filter' this and that, yadda yadda, but nothing discussing the entire process of read, filter, write. Trying to splice examples from the net together is no good, because none seem to work together as coherent code. Coming from C#, Perl's syntax structure is hella confusing. I just need some advice on this process.
My objective is to parse a text file, single out all lines similar to the one below, by date, and output only the first 8 digits of the 2nd number group and 5 digits from the 3rd number group to a new text file.
11122 20100223454345 ....random text..... [keyword that identifies all the
entries I need]... random text 0.0034543345
I know regex is likely the best option, and have most of the expression written, but it does not work in Perl!
Question: Could someone please show a simple (dummy) example of how to read from, filter (using dummy regex) the file, then output the (dummy) results to a new file? I'm not concerned with functional details, I can learn those, I just need the syntax structure Perl uses. For example:
open(FH, '<', 'dummy1.txt')
open(NFH, '>', 'dummy2.txt')
@array; or $dumb;
while(<FH>)
{
filter each line [REGEX] and shove it into [@array or $dumb scalar]
}
print(join(',', @array)) to dummy2.txt
close FH;
close NFH;
Note: For various reasons, I cannot paste my source code in here, sorry. Any help is appreciated.
UPDATE: ANSWER:
Much thanks to all those who provided insight into my issue. After reading through your replies, as well as conducting further research, I learned that there are dozens of ways to accomplish the same task in Perl (which I am not a fan of). In the end, this is how I solved the problem, and IMO it's the cleanest and most succinct solution for those having similar struggles. Thanks again for all the help.
#======================================================================
# 1. READ FILE: inputFile.txt
# 2. CREATE FILE: outputFile.txt
# 3. WRITE TO: outputFile.txt IF line matches REGEX constraints
# 4. CLOSE FILES: outputFile.txt & inputFile.txt
#==========================================================================
#1
$readFile = 'C:/.../.../inputFile.txt';
open(FH, '<', $readFile) or Error("Could not read file ($!)");
#2
$writeFile = 'C:/.../.../outputFile.txt';
open(NFH, '>', $writeFile) or Error("Cannot write to file ($!)");
#3
@lines = <FH>;
LINE: foreach $line (@lines)
{
if ($line =~ m/(201403\d\d).*KEYWORD.*time was (\d+\.\d+)/)
{
$date = $1;
$elapsedtime = $2;
print NFH "$date,$elapsedtime\n";
}
}
#4
close NFH;
close FH;
perlfaq5 - How do I change, delete, or insert a line in a file, or append to the beginning of a file? covers most of the different scenarios for how to use files.
However, I will add that you should always start your scripts with use strict; and use warnings;, and because you're doing file processing, use autodie; will serve you well too.
With that in mind, a quick stub would be the following:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'dummy1.txt';
open my $outfh, '>', 'dummy2.txt';
while (my $line = <$infh>) {
chomp $line; # Remove \n
if ( $line =~ /regex_goes_here/ ) { # placeholder for whatever processing you need
print $outfh "your new data\n"; # note: no comma after the filehandle
}
}
while(<FH>)
{
# variable $_ contains the current line
if(m/regex_goes_here/) #by default, the regex match operator m// attempts to match the default $_ variable
{
#do actions
}
}
Also note, m/regex/ is the same as /regex/
Refer to:
http://perldoc.perl.org/perlvar.html#General-Variables
http://perldoc.perl.org/perlre.html
For capturing variables from regex match, THIS might help
EDIT
If you want a different variable than the default $_, as @Miller suggested, use while($line = <FH>) followed by if($line =~ m/regex_goes_here/)
=~ is the Binding Operator
One tip. Don't explicitly open filehandles to your input and output files. Instead read from STDIN and write to STDOUT. Your program will be far more flexible and easier to use as you'll be able to treat it like a Unix filter.
$ your_filter_program < your_input.txt > your_output.txt
And doing this actually makes your program simpler to write too.
while (<>) { # <> reads from STDIN
# transform your data (which is in $_) in some way
...
print; # prints $_ to STDOUT
}
You might find the first few chapters of Data Munging with Perl are useful.
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
INPUT_FILE => "NAME_OF_INPUT_FILE",
OUTPUT_FILE => "NAME_OF_OUTPUT_FILE",
FILTER => qr/regex_for_line_to_filter/,
};
open my $in_fh, "<", INPUT_FILE;
open my $out_fh, ">", OUTPUT_FILE;
while ( my $line = <$in_fh> ) {
chomp $line;
next unless $line =~ FILTER;
$line =~ s/regular_expression/replacement/;
say {$out_fh} $line;
}
close $in_fh;
close $out_fh;
$in_fh is your input filehandle, and $out_fh is your output filehandle. I basically open both, and loop through the input. The chomp removes the \n from the end. I always recommend doing that.
The next goes to the next iteration of the loop unless I match FILTER which is a regular expression matching lines you want to keep. This is identical to:
if ( $line !~ FILTER ) {
next;
}
I then use the substitution command to get the parts of the line I want, and munge them into the output I want. I may be better off expanding this a bit, maybe using split to split up my line into various pieces, then only using the pieces I want. I could then use substr to pull out the substring from the selected pieces.
The say command is like print except it automatically adds in a NL on the end. This is how you write a line to a file.
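The split and substr idea can be sketched like this. The sub name extract_fields is hypothetical, and it assumes fields are separated by whitespace, that the wanted date digits start the second field, and that the last field is the final number group, as in the sample line from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: first 8 characters of the second field and
# first 5 characters of the last field; empty list if too few fields.
sub extract_fields {
    my ($line) = @_;
    my @pieces = split ' ', $line;
    return unless @pieces >= 2;
    return ( substr( $pieces[1], 0, 8 ), substr( $pieces[-1], 0, 5 ) );
}

my ( $date, $value ) =
    extract_fields('11122 20100223454345 some text keyword more text 0.0034543345');
print "$date,$value\n";    # prints 20100223,0.003
```

In the full program this would replace the s/// line, with the returned pieces joined and written using say {$out_fh}.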
Now, get Learning Perl and read it. If you know any programming, it shouldn't take you more than a week to go through the first half of the book. That should be more than enough to be able to write a program like this. The more complex stuff like references and object orientation might take a bit longer.
Online documentation can be found at http://perldoc.perl.org. You can look up the use statements, which are called pragmas, over there. Documentation on the individual functions is also available.
If I understood well, this one liner will do the job:
perl -ane 'print substr($F[1],0,8),"\t",substr($F[-1],0,5),"\n" if /keyword/' in.txt
Assuming in.txt is:
11122 20100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.0034543345
11122 30100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.124543345
11122 40100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.65487
11122 50100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.6215
output:
20100223 0.003
40100223 0.654
I have a giant text data file (~100MB) that is a concatenation of a bunch of data files with various header information then some columns of data. Here's the problem. I want to extract a particular number from the header info before each of these data sets and then append that to another column in the data (and write out that data to a different file).
The header info that I want is of the format ex: BGA 1
Where what I want for that extra data column is the # after word BGA. It will be a number between 1 and maybe 20000. I can write the regex to pull the word BGA, but I don't seem to be able to figure out how to just get the digit after it.
To add EXTRA fun, that text "BGA 1" is repeated in each data section TWICE.
Here's what I have so far, which actually doesn't work... I want it to at least print "BGA" every time it encounters the word BGA, but it prints nothing.... Any help would be appreciated.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'alldata.txt';
open my $info, $file or die "Could not open $file: $!";
$_="";
while(my $line = <$info>){
if ($line eq "/BGA/"){
print <>,"\n";
}
}
close $file;
if ($line =~ /BGA\s(\d+)/){
#your code
print "BGA number $1 \n";
#your code
}
The $1 variable will then hold the number you want.
If there is more than one BGA per line, you'll need to allow the regex to match more than once per line:
while (my $line = <$info>) {
while ( $line =~ /BGA\s(\d+)/g ) {
print "$1\n";
}
}
This should print out all the BGA numbers as a single column. Without any further information it's hard to answer this any better.
First, a 100 MB file is not giant. Don't be so defeatist. You could even slurp it into memory.
Let's look at the few critical places in your code:
while(my $line = <$info>) {
if ($line eq "/BGA/") {
Your condition $line eq "/BGA/" tests if the line literally consists of the string "/BGA/". But that can never be true, because each line will have at least the input record separator (i.e. the contents of $/) at the end, since you did not chomp it. In any case, what you want is to match lines that contain "BGA" anywhere, and the proper Perl syntax to do that is
if ($line =~ /BGA/) {
Now, once you fix that, you are going to run into a problem with the following statement:
print <>,"\n";
What you really want is print $line;. The diamond operator, <>, in list context is going to try to slurp from STDIN or any files specified as arguments on the command line. Not a good idea.
Others have pointed out how to match the string "BGA" followed by a digit. For better answers, you are going to need to show examples of input and expected output.
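To make the overall idea concrete, here is a rough sketch of carrying the most recent BGA number forward onto the data rows. It rests on guesses about the file, since no sample was shown: header lines contain "BGA <number>", data rows start with a digit, and the new column is appended after a tab. The sub name is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: append the latest BGA number to each data row.
# Assumptions: headers contain "BGA <number>"; data rows start with a digit.
sub tag_rows_with_bga {
    my (@lines) = @_;
    my $bga;
    my @out;
    for my $line (@lines) {
        chomp $line;
        if ( $line =~ /BGA\s+(\d+)/ ) {
            $bga = $1;    # a repeated header just sets the same value again
            next;
        }
        push @out, "$line\t$bga" if defined $bga && $line =~ /^\d/;
    }
    return @out;
}

my @rows = tag_rows_with_bga(
    "Sample BGA 7\n", "Again BGA 7\n", "1.0 2.0\n", "3.0 4.0\n",
    "Sample BGA 12\n", "5.0 6.0\n",
);
print "$_\n" for @rows;
```

For a 100 MB file you would read line by line from the filehandle and print each tagged row straight to the output file instead of collecting them in an array.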
I have a file with several XML tags like such:
<Good>Yay!</Good>
<Great>Yup!</Great>
<Bad>booo</Bad>
<Bad>
<Ok>not that great</ok>
</Bad>
<Good>Wheee!</Good>
where I want to get rid of the "Bad" tags and anything in between.
So it would turn into just:
<Good>Yay!</Good>
<Great>Yup!</Great>
<Good>Wheee!</Good>
I know this one-liner:
perl -pe "undef $/;s/<Bad>.*?<\/Bad>//msg" < originalFile > newlyStrippedFile
Seems to do everything I want (aside from putting extra newlines in, but hopefully I can deal with that easily enough)
But I need to put it in a script (two files are read into the command line, one with all the tags, the other with a list of tags to pull out), so the same thing is going to be called several times.
And I'm just having trouble. Either it's only ever reading one line or I get errors or both.
Here is the relevant portion of my latest attempt:
open ORIGINAL_FILE, $sdb_pathname
or die "Can't open '$sdb_pathname' : $!";
@sdb_input_array = <ORIGINAL_FILE>;
close ORIGINAL_FILE;
@sdb_input_scalar=join("",@sdb_input_array);
foreach $tag (@tags) {
&remove_tag($tag);
}
sub remove_tag
{
my($current_tag) = @_;
$sdb_input_scalar =~ s/<$current_tag>.*?<\/$current_tag>//msg;
open NEWLY_STRIPPED_FILE, $clean_sdb_pathname
or die "Can't open '$clean_sdb_pathname' : $!";
print(NEWLY_STRIPPED_FILE $sdb_input_scalar);
close(NEWLY_STRIPPED_FILE);
}
This is giving me "Use of uninitialized value $sdb_input_scalar in substitution (s///)" at my $sdb_input_scalar =~ line
and
Filehandle NEWLY_STRIPPED_FILE opened only for input
And of course my two files still look identical, as if I did nothing to them.
I'm sorry if I'm missing something obvious but I'm literally brand new to perl. Someone at work gave an 8-hour estimate to do this script and I've already used over 5 hours just installing perl, learning the syntax and getting the other aspects to go right. I know there is an XML::Parser module but I found the examples very overwhelming for the short time I have left to complete.
I have to assume my regex is correct because the one-liner works so nicely.
Can anyone please help me adapt it to what I need it for?
You really should use an XML parser. It's almost a guarantee that an XML file will not parse quite the way you expect it to with regexes. However, let's get you started first.
Where you have:
@sdb_input_scalar=join("",@sdb_input_array);
You actually want:
$sdb_input_scalar=join("",@sdb_input_array);
Now some other tips.
At the top of your script make sure you enable warnings with the -w flag like this:
#!/path/to/perl -w
use strict;
Once you add use strict, it will cause several errors, but that's a good thing. We're going to enforce some scope and other good practices. You now need to declare variables (beginning with $, @, or %) with my. For example:
my @sdb_input_array = <ORIGINAL_FILE>;
or:
foreach my $tag (@tags) { ... }
Instead of calling open like you are, use the three-argument version:
open (my $originalFile, "<", $sdb_pathname)
or die "Can't open '$sdb_pathname' : $!";
my @sdb_input_array = <$originalFile>;
That will set it to read only. See http://perldoc.perl.org/functions/open.html
Generally you should avoid dependency on globals. Change how you call remove_tag():
foreach my $tag (@tags) {
$sdb_input_scalar = remove_tag($sdb_input_scalar, $tag);
}
To support this you need to change the function as well:
sub remove_tag
{
my($input, $current_tag) = @_;
$input =~ s/<$current_tag>.*?<\/$current_tag>//msg;
return $input;
}
You can then write out once after you have iterated over all tags by moving this outside of the remove_tag function:
open (my $strippedFile, ">", $clean_sdb_pathname)
or die "Can't open '$clean_sdb_pathname' : $!";
print $strippedFile $sdb_input_scalar;
close($strippedFile);
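Putting those pieces together, a minimal sketch of the regex approach might look like the following. The sub name remove_tags is made up for illustration, the \Q...\E quoting and the trailing \n? are additions that were not in the original one-liner, and the usual caveat applies: a regex like this will not cope with attributes or nested elements of the same name.

```perl
use strict;
use warnings;

# Hypothetical helper: strip every <Tag>...</Tag> block for each tag name.
sub remove_tags {
    my ( $text, @tags ) = @_;
    for my $tag (@tags) {
        # \Q...\E quotes regex metacharacters in the tag name;
        # /s lets . match newlines so multi-line elements are removed.
        # The trailing \n? also eats the line break left behind.
        $text =~ s/<\Q$tag\E>.*?<\/\Q$tag\E>\n?//sg;
    }
    return $text;
}

my $xml = join '',
    "<Good>Yay!</Good>\n",
    "<Great>Yup!</Great>\n",
    "<Bad>booo</Bad>\n",
    "<Bad>\n<Ok>not that great</Ok>\n</Bad>\n",
    "<Good>Wheee!</Good>\n";

print remove_tags( $xml, 'Bad' );
```

Because the sub takes the text and returns the cleaned copy, the output file only needs to be opened and written once, after all tags have been processed.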
Here is a solution using XML::Twig:
use warnings;
use strict;
use XML::Twig;
my $xml = XML::Twig->new(
pretty_print => 'indented',
twig_handlers => {
#Define a sub that will be called for all 'Bad' tags
Bad => sub {
$_->set_tag('Good');
}
}
);
$xml->parse(\*DATA);
$xml->print;
__DATA__
<xml><Good>Yay!</Good><Great>Yup!</Great><Bad>booo</Bad><Bad>
<Ok>not that great</Ok></Bad><Good>Wheee!</Good></xml>
XML::Twig also has parsefile() and parsefile_inplace() methods that take a filename directly and process it--just what you need.
There is a little bit of a learning curve with this method, but the benefits are great.
First: don't use regular expressions to deal with XML!
Then, answering the question in the title rather than the specific use case: your one-liner is better written as:
perl -0777 -pe "s/<(Bad)>.*?<\/\1>//msg" < originalFile > newlyStrippedFile
Now, use Perl itself to "inflate" the one-liner:
perl -MO=Deparse -0777 -pe "s/<(Bad)>.*?<\/\1>//msg" > oneliner.pl
And this is what you get:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
s[<(Bad)>.*?</\1>][]gms;
}
continue {
die "-p destination: $!\n" unless print $_;
}
Just add use strict; use warnings;.
This is a solution using XML::Twig. I have assumed that your XML document is well-formed and have wrapped the data you have shown in a <root> element to make it so.
The $twig object defines a single twig handler for <Bad> elements, which simply deletes the element if it appears during parsing.
Once the input has been parsed, $twig->print shows the residual XML.
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new(
twig_handlers => { Bad => sub { $_->delete } },
pretty_print => 'record',
);
$twig->parse(<<'END_XML');
<root>
<Good>Yay!</Good>
<Great>Yup!</Great>
<Bad>booo</Bad>
<Bad>
<Ok>not that great</Ok>
</Bad>
<Good>Wheee!</Good>
</root>
END_XML
$twig->print;
output
<root>
<Good>Yay!</Good>
<Great>Yup!</Great>
<Good>Wheee!</Good>
</root>
This should do the trick:
$tags=join("",@sdb_input_array);
print "contents before : $tags \n";
$tags =~ s/<Bad>.*?<\/Bad>//msg;
print "content cleaned : $tags \n";
The $tags variable should now not carry the "Bad" tags. The only issue is that each removed tag leaves a blank line behind, so you will have blank lines in between the "Good" tag lines, but you can remove blank lines as your final step.
I've been struggling with this for a while and I was wondering if there was something obvious I've missed.
As programming learning/practice, I'm trying to put together a simple script for calculating the components of a restriction enzyme digest mix. However, first I need to get a list of enzyme stock concentrations.
I pulled all the individual pages from the New England Biolabs enzyme page, and my goal with this current script is to pull out the name of the enzyme and the concentrations available from the company.
This example works with a local copy of EcoRI (link included at bottom of submission).
use warnings;
use strict;
open(FILE,'productR0101.asp');
my $line;
my $counter;
my $array1;
my $array2;
my $array3;
my $concentration;
my @array4;
$counter = 1;
while ($line = <FILE>) {
chomp($line);
if ($counter == 6 ){
$array1 = $line;
$counter++;
}
else{
$counter++;
}
if ($line =~ m/.{8}units.ml/g) {
(@array4) = $line =~ m/.{8}units.ml/g;
print @array4;
}
}
print "\n".$array1;
exit;
Every file has the enzyme name on the sixth line of the file, so I just pulled that whole line. However, the concentrations are in different locations, so my approach was to read in the file one line at a time, and match to the units/ml tag.
My thinking was that it should print out the match for each line, if there was one, every time the while loop runs, effectively resulting in a string of separate print statements.
This is where I get messed up. There are six different locations in this file with a units/ml tag: three for 20,000 and three for 100,000.
I was expecting six different results printed, but when I run this, only one 100,000 units/ml result is returned.
I've tried all sorts of fixes. I tried concatenating strings, I tried storing it as a string, I tried concatenating it onto another array that never gets touched by the (@array4) = $line =~ m/.{8}units.ml/g line, and it either breaks it or gives the same result.
And finally, I apologize for any weird conventions. I'm still learning Perl, and my first experience programming was with MATLAB.
Also, the $array1, $array2, etc. exist because I was trying to keep track of exactly what was getting put where; my intention is to clean it up once I get it functional.
So does anyone have any ideas about what I'm doing wrong?
EDIT: the data source is the source code to each individual enzyme page. For this example, if you view the page source you get the complete input file I gave to the script.
Are the 20,000 units/ml at the start of the line? Because in that case, .{8} would fail to match - the dot doesn't match newlines, and 20,000_ is only 7 characters.
We really need to see the data you are processing, but it looks like you are storing only the last occurrence of /units.ml/ in @array4 because you are reading the file line by line.
I will add to this answer if you supplement your question, but for now I need to know
What your data looks like
What the mysterious /.{8}/ is for
Are you aware that $array1, $array2, and $array3, are scalars, as well as being very bad names for variables?
For now, here is a rewrite of your code using idiomatic Perl, and the $. variable that evaluates to the line number of the file most recently read
use strict;
use warnings;
open my $file, '<', 'productR0101.asp' or die $!;
my $array1;
my @array4;
while (my $line = <$file>) {
chomp $line;
$array1 = $line if $. == 6;
if ($line =~ m/.{8}units.ml/) {
@array4 = $line =~ m/.{8}units.ml/g;
print "@array4\n";
}
}
print "\n".$array1;
I can't exactly reproduce the behavior you've reported of only getting one of the 100,000 units/ml results, as I'm not exactly sure what your input data is. However, I think the problem is with the regular expression not having any captures. You should put parentheses around the part of the regex match that you want to be returned to @array4. So instead of this:
@array4 = $line =~ m/.{8}units.ml/g;
Try this:
@array4 = $line =~ m/(.{8})units.ml/g;
@array4 = $line =~ /(.{8})units.ml/;
EDIT:
You also don't want to use the m/ and /g modifiers.
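If the goal is to keep every concentration found anywhere in the file, one option is to push each line's matches onto an array instead of assigning to it, so earlier results are not overwritten. This is a sketch only: the sub name is hypothetical, and since the real page source was not shown it assumes a concentration looks like "20,000 units/ml".

```perl
use strict;
use warnings;

# Hypothetical helper: return every concentration found in the given lines.
# Assumption: a concentration looks like "20,000 units/ml".
sub find_concentrations {
    my (@lines) = @_;
    my @found;
    for my $line (@lines) {
        # In list context, a /g match returns all captures on the line;
        # push appends them rather than replacing earlier results.
        push @found, $line =~ m{([\d,]+)\s*units/ml}g;
    }
    return @found;
}

my @hits = find_concentrations(
    "<td>20,000 units/ml</td>\n",
    "<p>no concentration here</p>\n",
    "<td>100,000 units/ml and 20,000 units/ml</td>\n",
);
print "$_\n" for @hits;
```

Capturing digits and commas instead of a fixed .{8} avoids the problem of the shorter 20,000 value not filling eight characters.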