How to extract string from raw data file (stuck) - regex

I have a raw data text file and I am writing a Perl script to extract a string from it. I am trying to get just the computer name, so the result should be only UFLEX-06. Instead I am getting Computer Name: UFLEX-06 (with the tab spacing after Computer Name:).
This is what the text file contains:
Computer Info:
Computer Name: UFLEX-06
Computer is external to Teradyne.
Computer IP Address(es):
This is what I've done so far in my perl script:
use strict;
use warnings;
my $filename = 'IGXLEventLog.3.17.2015.20.25.12.625.log';
open(my $fn, '<', $filename) or die "Could not open file '$filename' $!";
our %details;
while(my $row = <$fn>)
{
chomp $row;
if($row =~ /Computer Name:/i)
{
my $cp_line;
do
{
$cp_line .= $row;
}while($row !~ /Computer Name:/);
#Remove
$cp_line =~ /Computer Name:/s;
my @String = split(/Computer Name:/, $cp_line);
#Assigning array elements to hash array
$details{cp_line} = $String[0];
print $details{cp_line};
}
}
I'm new to Perl so do pardon me if there are any obvious mistakes. My idea was to use split so that I can remove the Computer Name:. Any help on this?

I really don't follow what you are trying to do with your own code, but this is all that is necessary:
use strict;
use warnings;
my $filename = 'IGXLEventLog.3.17.2015.20.25.12.625.log';
open my $fh, '<', $filename or die "Could not open file '$filename': $!";
while ( <$fh> ) {
if ( /Computer Name:\s*(\S+)/i ) {
print $1, "\n";
}
}
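If the value should end up in a hash, as with %details in the question, the same capture can be wrapped in a small helper. This is only a minimal sketch, assuming the label and the value sit on one line separated by spaces or a tab. The name extract_computer_name and the hash key computer_name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first computer name found in the
# given lines, or undef when no line carries the label.
sub extract_computer_name {
    my @lines = @_;
    for my $line (@lines) {
        # \s* swallows the tab or spaces after the colon; (\S+) captures the name
        return $1 if $line =~ /Computer Name:\s*(\S+)/i;
    }
    return;
}

my %details;
$details{computer_name} = extract_computer_name(
    "Computer Info:\n",
    "Computer Name:\tUFLEX-06\n",
    "Computer is external to Teradyne.\n",
);
print "$details{computer_name}\n";    # prints UFLEX-06
```

In a real script the lines would come from the log file handle rather than a literal list.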

You can use a regex with a capturing group like this:
Computer Name:\s*(.*)
and then read the captured text from the group. The \s* absorbs the tab or spaces after the colon.
I'm not good at Perl, but I think you can adapt the code below to your needs:
use strict;
use warnings;
my $filename = 'IGXLEventLog.3.17.2015.20.25.12.625.log';
open(my $fn, '<', $filename) or die "Could not open file '$filename' $!";
our %details;
while(my $row = <$fn>)
{
chomp $row;
if($row =~ m/Computer Name:\s*(.*)/)
{
print "$1\n";
}
}

Related

Substituting millions of regular expressions (perl)

I have a text file containing over one million lines of text. On each line, there is an alphanumeric code which needs to be substituted with a name. I have tried doing this using different Perl scripts, but each time the scripts die because they use too much memory. I am new to Perl, so I imagine that I am doing something wrong and making the job too complex.
So far, I have tried:
use strict;
use warnings;
my $filename = 'names.txt';
my $data = read_file($filename);
$data =~ s/88tx0p/Author1/g;
##and then there are 1,000,000+ other substitution regexes.
write_file($filename, $data);
exit;
sub read_file {
my ($filename) = @_;
open my $in, '<:encoding(UTF-8)', $filename or die "Could not open '$filename' for reading $!";
local $/ = undef;
my $all = <$in>;
close $in;
return $all;
}
sub write_file {
my ($filename, $content) = @_;
open my $out, '>:encoding(UTF-8)', $filename or die "Could not open '$filename' for writing $!";
print $out $content;
close $out;
return;
}
But then I realised that this script is trying to write the output to the original file, which I imagine uses more memory? So I tried the following:
use strict;
use utf8;
use warnings;
open(FILE, 'names.txt') || die "File not found";
my @lines = <FILE>;
close(FILE);
my @newlines;
foreach(@lines) {
$_ =~ s/88tx0p/Author1/g;
##and then there are approximately 1,000,000 other substitution regexes.
push(@newlines,$_);
}
open(FILE, '>names_edited.txt') || die "File not found";
print FILE @newlines;
close(FILE);
But again, this used too much memory. Please could I get help with ways of doing this while using minimum amount of memory? Thank you all.
Your problem is you're using a foreach loop. That needs you to load all the lines into memory, which is the root of your problem.
Try it in a while loop:
open ( my $file, '<', 'names.txt' ) or die $!;
open ( my $output, '>', 'names_edited.txt' ) or die $!;
select $output; #destination for print;
while ( <$file> ) { #reads one line at a time, sets $_
s/88tx0p/Author1/g; #acts on $_ by default
print; #defaults to printing $_ to the selected filehandle $output
}
That'll work line by line (as your initial code was) but will read only one line at a time, so the memory footprint will be vastly lower.
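Reading line by line fixes the memory use, but a million separate s/// statements would still run against every line. One alternative, which is not part of the answer above, is to keep the code/name pairs in a hash and do a single substitution that looks each token up. This is a minimal sketch, assuming the codes are runs of word characters. %name_for, replace_codes and the second code/name pair are illustrative, not from the question.

```perl
use strict;
use warnings;

# Assumed mapping; in practice it would be loaded from a file of code/name pairs.
my %name_for = (
    '88tx0p' => 'Author1',
    '93kd2q' => 'Author2',
);

# Replace every word-like token that is a known code; leave everything else alone.
sub replace_codes {
    my ($line, $map) = @_;
    $line =~ s/(\w+)/exists $map->{$1} ? $map->{$1} : $1/ge;
    return $line;
}

print replace_codes("88tx0p wrote to 93kd2q and zz9\n", \%name_for);
# prints: Author1 wrote to Author2 and zz9
```

Inside the while loop this would become print replace_codes($_, \%name_for); so each line is scanned once, however many codes there are.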

how to extract titles from a text file and use those titles to open other files and search for patterns

I am trying to figure out a Perl program that reads a text file containing file names, opens each of those files and searches them for a regular expression (eg GGggG).
I am reasoning that first I need to read the file and save everything into an array.
Then foreach element of the array, open the corresponding file and search within it.
Can someone please help?
This code works (gratia Chankey Pathak) for lines containing only one title thus not needing any processing:
my $filename = 'names.txt';
open (my $fh, "<", $filename) or die $!;
while ( <$fh> ) {
chomp $_;
my $file_contents;
{
open (my $fh, '<', $_) or die $!;
my $file_contents = '';
while (<$fh>) {
$file_contents .= $_;
print "Matched!" if $file_contents =~ /gggggg/i;
}
}
}
But what if the name file is full of names (few/line) separated only by \t?
My approach would be like below to solve the problem.
my $filename = 'names.txt';
open (my $fh, "<", $filename) or die $!;
# assuming each line contains file name
while ( <$fh> ) {
chomp $_;
my $file_contents;
{
open (my $fh, '<', $_) or die $!;
local $/ = undef;
$file_contents = <$fh>;
close $fh;
}
print "Matched!" if $file_contents =~ /GGGGG/;
}
See:
Open and read from text files
Regex in Perl
Loops in Perl
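For the follow-up case where a line of the name file holds several file names separated by tabs, each line can be split before the files are opened. This is a rough sketch under that assumption. The helper names names_on_line and file_matches are invented for illustration, and the pattern GGggG is the example from the question.

```perl
use strict;
use warnings;

# Split one line of the name file into file names (tab-separated).
sub names_on_line {
    my ($line) = @_;
    chomp $line;
    return grep { length } split /\t/, $line;
}

# Return true if the contents of the file match the pattern.
sub file_matches {
    my ($path, $pattern) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/ = undef;          # slurp mode: read the whole file at once
    my $contents = <$fh>;
    close $fh;
    return defined $contents && $contents =~ $pattern;
}

# 'names.txt' comes from the question; nothing happens if it is absent.
if ( open my $list, '<', 'names.txt' ) {
    while ( my $line = <$list> ) {
        for my $name ( names_on_line($line) ) {
            print "Matched in $name\n" if file_matches($name, qr/GGggG/);
        }
    }
    close $list;
}
```

Printing the file name along with the match makes it clear which of the several files on a line contained the pattern.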

perl split 8gb csv with "," as pattern

I recognise this might be a duplicate, but the size of the file I have to split requires a method which doesn't load the CSV into memory before processing it, i.e. I'm looking for a line-by-line method to read, split and output my file. I only need my output to be the last 3 fields, without the quotes and without the thousands-separator comma.
I have a file of arcGIS coordinates which contain quotes and commas internal to the fields. Data example below.
"0","0","1","1","1,058.83","1,455,503.936","5,173,996.331"
I have been trying to do this using variations on split( '","' , $line);.
Here's my code.
use strict;
use warnings;
open (FH, '<', "DEM_Export.csv") or die "Can't open file DEM_Export.csv";
open (FH2, '>', "DEM_ExportProcessed.csv") or die "Can't open file DEM_ExportProcessed.csv";
print FH2 "EASTING, NORTHING, ELEVATION,\n";
my $count = 0;
foreach my $line (<FH>) {
chomp;
# if ($count == 0){next;}
print $line, "\n";
my @list = split( '","' , $line);
print "1st print $list[5],$list[6],$list[4]\n";
$list[4] =~ s/,//g;
$list[5] =~ s/,//g;
$list[6] =~ s/,//g;
$list[4] =~ s/"//g;
$list[5] =~ s/"//g;
$list[6] =~ s/"//g;
print "2nd print $list[5],$list[6],$list[4]\n";
if ($count == 10) {
exit;
}
my $string = sprintf("%.3f,%.3f,%.3f\n", $list[5],$list[6],$list[4]);
print FH2 $string;
$count++;
}
close FH;
close FH2;
I'm getting close to my wits' end with this and really need a solution.
Any help will be gratefully received.
Cheers
This is really very straightforward using Text::CSV to handle the nastiness of CSV data.
Here's an example, which works fine with the sample data you have shown. As long as your input file is plain ASCII and the rows are about the size you have shown, it should work fine.
It prints its output to STDOUT, so you'll want to use a command-line redirect to put it into the file you want.
use strict;
use warnings 'all';
use Text::CSV;
my $csv_file = 'DEM_Export.csv';
open my $in_fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};
my $csv = Text::CSV->new({ eol => "\n" });
print "EASTING,NORTHING,ELEVATION\n";
while ( my $row = $csv->getline($in_fh) ) {
$csv->print(\*STDOUT, [ map tr/,//dr, @$row[-2,-1,-3] ] );
}
output
1455503.936,5173996.331,1058.83
I guess I should have been braver and had a crack with Text::CSV to start with rather than asking a question.
Many thanks to Сухой27 and choroba for pointing me in the right direction.
Here is the code I ended up with. Probably not the tidiest.
use strict;
use warnings;
use Text::CSV;
my $file = "DEM_Export.csv";
my $file2 = "DEM_ExportProcessed.csv";
open (FH2, '>', $file2) or die "Can't open file $file2: $!";
print FH2 "EASTING, NORTHING, ELEVATION,\n";
print "Starting file processing...\n";
my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
open my $io, "<", $file or die "$file: $!";
while (my $row = $csv->getline ($io)) {
my @fields = @$row;
s/,//g for @fields[3..5];
my $string = sprintf("%.3f,%.3f,%.3f\n", $fields[4],$fields[5],$fields[3]);
print FH2 $string;
}
print "Finished!";
close FH2;
Worked a treat!
Thank you.

Perl editing the files

My file contains 100 lines
1234 ABC 100.0.0.0
4567 DEF 200.0.0.0 .....
I am matching a pattern in my file. for example search for 4567 and replace the 200.0.0.0 to 500.0.0.0, so that line looks now 4567 DEF 500.0.0.0.
my $file = "$file_path/file.lst";
my $newid = "500.0.0.0"
open MAST, $file or die "Unable to open file.lst: $!";
my $new = "$file.tmp.$$";
my $bak = "$file.bak";
open(NEW, "> $new") or die "can't open $new: $!";
while (<MAST>) {
my ($pattern,$id) = (split /\s+/, $_)[0,4];
print $_;
if ( $_ =~ m/^$pattern/ ) {
$_ =~ s/$id/$newid/g;
}
(print NEW $_) or die "can't write to $new: $!";
}
close(MAST) or die "can't close $file: $!";
close(NEW) or die "can't close $new: $!";
rename($file, $bak) or die "can't rename $file to $bak: $!";
rename($new, $file) or die "can't rename $new to $file: $!";
What I need to do:
Show the line before change and after change on screen and ask for user confirmation and proceed with other things later.
Please advise.
If you execute your code, it will show a compilation error because the statement on the second line is not terminated with a semicolon.
Always put use warnings; and use strict; at the top of your program, after the shebang line.
Here is a way to search a pattern and replace:
#!/usr/bin/perl
use warnings;
use strict;
my $file = "file.txt";
my $newid = "500.0.0.0";
my @update;
open my $fh, "<", $file or die $!;
while ( my $line = <$fh> )
{
my ($pattern, $id) = (split(/\s+/, $line))[0, 2]; #split line to get first and third column value
if ($pattern =~ m/4567/)
{
print "Before change: $line\n";
$line =~ s/$id/$newid/g; #replace id with newid when pattern is 4567
print "After change: $line\n";
#print "Confirm with Yes or No: "; #here you can ask for confirmation
#chomp(my $confirmation = <STDIN>);
}
push @update, $line;
}
close $fh;
#modify the file
open my $fhw, ">", $file or die "Couldn't modify file: $!";
print $fhw @update;
close $fhw;
----------file.txt----------
1234 ABC 100.0.0.0
4567 DEF 200.0.0.0
It will also print the matched line before change and after change on console. You can modify this code according to your requirement. Confirmation part is not very clear in your question.
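Since the question asks for user confirmation, here is one way the confirmation step might look. It is a sketch only: the helper name confirm and the [y/N] prompt format are assumptions, and the input is simulated with an in-memory file handle so the example runs without a terminal.

```perl
use strict;
use warnings;

# Hypothetical helper: ask a yes/no question on the given input handle.
# Returns true only for an answer beginning with "y" or "Y".
sub confirm {
    my ($prompt, $in) = @_;
    print "$prompt [y/N]: ";
    my $answer = <$in>;
    return defined $answer && $answer =~ /^\s*y/i;
}

my $line = "4567 DEF 200.0.0.0\n";
(my $new = $line) =~ s/200\.0\.0\.0/500.0.0.0/;

print "Before change: $line";
print "After change:  $new";

# Simulate the user typing "yes"; a real script would pass \*STDIN instead.
open my $fake_stdin, '<', \"yes\n" or die $!;
$line = $new if confirm("Apply this change?", $fake_stdin);
print "\nResult: $line";
```

In the loop above, the changed line would be pushed onto @update only when confirm returns true, and the original line otherwise.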
Without the user confirmation element this will do it:
perl -i.bak -pe 'm/^4567/ and s/200/500/' filename
Are you sure you need per-line user confirmation? You can check whether this did what you needed with diff.

Regex and creating output file (Perl beginner)

I'm in an intro to Perl course and we are tasked with taking an input.txt file (of the Gettysburg Address - that has all instances of the word 'old' changed to 'new') and creating an output.txt file that switches 'new' back to having 'old'. I've got a general regex that switches all instances of 'new' to 'old', but it needs to work regardless of case in the input file. I'm wondering how I could add that in? Also, I'm looking to verify that I have my output.txt built in correctly? When I run what I have, I get no output.txt file created in my directory. Here is what I have so far:
open(my $getty, "<", "input.txt")
or die "Cannot open < input.txt: $!";
open(my $getty, ">", "output.txt")
or die "Cannot open < output.txt: $!";
while(my $line = <$getty>) {
if ($line =~ 's/new/old/') {
$line =~ s/new/old/;
}
}
You don't need to put an if condition.
while(my $line = <$getty>) {
$line =~ s/new/old/gi;
}
A good recipe is to make the change from the shell with Perl:
perl -pi.orig -e 's{new}{old}gi' filename.txt
This does the replacement in place in filename.txt and keeps the original as filename.txt.orig.
I guess your course is over, however here is a complete answer for anyone who might need it:
The if is unnecessary. You need to use print on the output filehandle to create the file. Here is the complete working code (there is no comma after the filehandle in the print statement):
use strict;
open(my $in_file, "<", "input.txt")
or die "Cannot open < input.txt: $!";
open(my $out_file, ">", "output.txt")
or die "Cannot open > output.txt: $!";
while(my $line = <$in_file>) {
$line =~ s/new/old/gi;
print $out_file $line;
}
If the case of the input word should be the same in the output word, this is a solution:
use strict;
open(my $in_file, "<", "input.txt")
or die "Cannot open < input.txt: $!";
open(my $out_file, ">", "output.txt")
or die "Cannot open > output.txt: $!";
while(my $line = <$in_file>) {
$line =~ s{(new)}
{
my @chars = split '', $1;
my @old = qw/o l d/;
my @out;
foreach my $char (@chars) {
if($char =~ /\p{Uppercase}/) {
push @out, uc(shift @old);
}
else {
push @out, shift @old;
}
}
join('', @out);
}egi;
print $out_file $line;
}
What happens here is that I use s{pattern}{replacement}. The e modifier makes the replacement part Perl code, g replaces every occurrence on the line, and i ignores case. Because the delimiters are braces, the replacement block can start on its own line. In the replacement code I go through every character of the matched "new" (captured with parentheses so I can read it from $1). If the character is uppercase, I use the uc function to make the corresponding output character uppercase as well.