Use Perl to search for a match between two files - regex

I have two files: file1.txt and file2.txt. Both contain lines in this format:
file1
name1:value1
file2
name2:value2
I want to check whether value1 is found in list2 (in the name2 string).
I have this script:
#!/usr/bin/perl
use warnings;
use Parallel::ForkManager;
sub loadf($);
print "Starting main program\n";
my @list1 = loadf("list1.txt");
my @list2 = loadf("list2.txt");
my $workernum = 10;
open(OK, '>>', 'valid.txt');
open(ER, '>>', 'invalid.txt');
$pm = new Parallel::ForkManager($workernum);
my $cnt = 0;
foreach $line (@list1) {
$cnt++;
my $pid = $pm->start and next;
my @data1 = split(":", $line);
my $name1 = $data1[0];
my $value1 = $data1[1];
my @data2 = split(":", $list2);
my $name2 = $data2[0];
my $value2 = $data2[1];
if (/$value1/i ~~ @list2)
{
print OK $name1 . " - " . $value2 . "\n";
print " [+] Found: " . $name1 . " - " . $value2 . "\n";
}
else
{
print ER $name1 . "\n";
print " [x] Unknown: " . $name1 . " - " . $value1 . "\n";
}
$pm->finish;
}
close(OK);
close(ER);
print "\n*** Finished ***\n";
sub loadf($) {
my @file;
open(FILE, $_[0] . "\n") or die("[+] Couldn't open " . $_[0] . "\n");
@file = <FILE>;
close(FILE);
return @file;
}
__END__
which is not working. What am I doing wrong?

#!/usr/bin/perl
open(F,'list1.txt');
my @list1=<F>;
close(F);
open(F,'list2.txt');
my @list2=<F>;
close(F);
chomp(@list1,@list2);
foreach my $line (@list1)
{
if ($line=~/.+\:.+/)
{
my @data1 = split(":", $line);
if (my @d2=grep /$data1[1]\:/i,@list2){print " [+] Found: " . $data1[0] . " - " . [split(':',$d2[0])]->[1] . "\n"; }
else { print " [x] Unknown: " . $data1[0] . " - " . $data1[1] . "\n"; }
}
}

Erm... I really don't get your algorithm, sorry. First you read all the lines from both files into two arrays, then you fork inside the processing loop but, as far as I can see, make no attempt to split the work into chunks for the workers (that is, to actually parallelize it).
I'd suggest a slightly different approach: slurp only the second file, then process the first file line by line. You didn't mention whether the names and values in the second file are unique; I assume they're not, but the program could be made even simpler if they are.
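use strict;
use warnings;
use Parallel::ForkManager;
open my $caf, '<', 'list2.txt' or die $!, "\n";
my $checkedAgainst = do { local $/; <$caf>; };
open my $cf, '<', 'list1.txt' or die $!, "\n";
my $workernum = 10;
my $pm = Parallel::ForkManager->new($workernum);
while (<$cf>) {
my $pid = $pm->start and next;
chomp; # otherwise $valueToCheck keeps its trailing newline and never matches
my ($nameToCheck, $valueToCheck) = split /:/;
if ($checkedAgainst =~ /^\Q$valueToCheck\E:(.+)$/m) {
print " [+] Found: $nameToCheck - $1", "\n";
}
else {
print " [x] Unknown: $nameToCheck - $valueToCheck", "\n";
}
$pm->finish; # each child exits here, inside the loop
}
$pm->wait_all_children; # the parent waits for the remaining workers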
In other words, I first load the second file into one big string, then try to match it against the lines from the first file (line by line). I don't know what characters could appear in your values, which is why \Q...\E (the quotemeta escapes) is used there.
UPDATE: attempted to make this code fork, have no means to test it where I stand though.
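If the names in the second file do turn out to be unique, the "even simpler" variant mentioned above could be a plain hash lookup with no forking at all. This is only a sketch under that assumption: the sample data, the `%value_for` hash and the case-insensitive comparison (borrowed from the /i in the question) are illustrative choices, not something confirmed by the original poster.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of list2.txt and list1.txt.
my @list2 = ("alpha:red", "beta:green");
my @list1 = ("first:alpha", "second:gamma", "third:BETA");

# Index the second list by lower-cased name (assumes names are unique).
my %value_for;
for my $entry (@list2) {
    my ($name2, $value2) = split /:/, $entry, 2;
    next unless defined $value2;          # skip malformed lines
    $value_for{ lc $name2 } = $value2;
}

# Look up each value1 from the first list among the name2 keys.
my @report;
for my $entry (@list1) {
    my ($name1, $value1) = split /:/, $entry, 2;
    next unless defined $value1;
    my $key = lc $value1;
    if (exists $value_for{$key}) {
        push @report, " [+] Found: $name1 - $value_for{$key}";
    }
    else {
        push @report, " [x] Unknown: $name1 - $value1";
    }
}
print "$_\n" for @report;
```

With real files you would fill the two arrays from chomp'ed file handles instead. One pass over each file is usually fast enough that the workers are not needed.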

Related

Regular expression anchored to beginning of any line containing the word "hello" in order they occur in string

Should find the first hello and print its character position... find the next hello and print its character position... and the anchor can be any line that has the first hello...
Why doesn't it work?
Attempt #1:
$line = "\n hi\n hiya \n hello\n hi \n hello2";
$match = $line =~ m/^\s*(hello)/;
if (!$match) {
die("not found\n");
}
print "found at pos: " . pos($line) . "\n";
$line = $';
$match = $line =~ m/^\s*(hello)/;
if (!$match) {
die("not found\n");
}
print "found at pos: " . pos($line) . "\n";
Result: not found
Attempt #2:
$line = "\n hi\n hiya \n hello\n hi \n hello2";
$match = $line =~ m/\A\s*(hello)/;
if (!$match) {
die("not found\n");
}
$line = $';
$match = $line =~ m/\A\s*(hello)/;
if (!$match) {
die("not found\n");
}
print "found at pos: " . pos($line) . "\n";
Result: not found
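For a "multiline" string you need the /m modifier so that ^ matches at line beginnings inside the string, and /g so that pos is set and each iteration of the loop continues from where the previous match ended.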
use warnings;
use strict;
use feature 'say';
my $line = "\n hi\n hiya \n hello\n hi \n hello2";
while ( $line =~ /^\s*(hello)/mg ) {
say $1;
say pos $line
}
Prints
hello
18
hello
30
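Note that pos reports the offset just past the match, not where hello begins. If the start of the word is what you are after, the @- array holds the start offsets of the capture groups. A small sketch with a made-up string (the `$text` and `@found` names are only for illustration):

```perl
use strict;
use warnings;

# Made-up input: "hello" starts a line at offset 3 and, after two
# leading spaces, at offset 15.
my $text = "hi\nhello\nhey\n  hello2";

my @found;
while ($text =~ /^\s*(hello)/mg) {
    # $-[1] is the offset where capture group 1 starts;
    # pos($text) is the offset just past the whole match.
    push @found, [ $-[1], pos($text) ];
}
printf "hello starts at %d, match ends at %d\n", @$_ for @found;
```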

Regular expressions to search and replace decimal dashes with a normal dash in perl?

I currently need a regular expression to search for and replace all |–| with |-|. I am currently replacing |`| with |'|, and that works using:
while($_ =~ s/`/'/g)
{
print "Line: '$.'. ";
print "Found '$&'. ";
}
However, the same approach does not work for any of my attempts below:
while($_ =~ s/\–/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/\&#8211/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/\&ndash/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/\–/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/&#8211/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
while($_ =~ s/&ndash/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
The script currently looks as follows:
#!/usr/bin/perl
use strict;
use warnings;
my $FILE;
my $filename = 'NoDodge.c';
open($FILE,"<service.c") or die "File not opened";
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";
while (<$FILE>)
{
while($_ =~ s/`/'/g)
{
print "Line: '$.'. ";
print "Found '$&'. ";
}
while($_ =~ s/\&#8211/-/g)
{
print "Line: '$.'. ";
print "Found '$&'.\n";
}
print $fh $_;
}
close $fh;
print "\nCompleted\n";
Example of current result:
Line: '152'. Found '`'.
Line: '162'. Found '`'.
Completed
SOLUTION:
Provided by Borodin,
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use open qw/ :std :encoding(utf8) /;
my $FILE;
my $fh;
my $readfile = 'service.c';
my $writefile = 'NoDodge.c';
open($FILE,'<',$readfile) or die qq{Unable to open "$readfile" for input: $!};
open($fh, '>',$writefile) or die qq{Unable to open "$writefile" for output: $!};
while (<$FILE>)
{
while(s/–/-/g)
{
print "Found: $& on Line: $.\n";
}
while(s/`/'/g)
{
print "Found: $& on Line: $.\n";
}
print $fh $_;
}
close $fh;
close $FILE;
print "\nService Migrated to $writefile\n";
Example Output:
Found: – on Line: 713
Found: ` on Line: 713
Found: – on Line: 724
Found: ` on Line: 724
Found: ` on Line: 794
Service Migrated to NoDodge.c
You need to use utf8 at the top of your program; otherwise Perl will see the individual bytes that make up the UTF-8 encoding of the en-dash (E2 80 93). There's also no need to specify $_ as the object of the substitution, as it is the default, and you needn't escape an en-dash, as it's not a special character within regex patterns.
use utf8;
...
while( s/–/-/g ) { ... }
Or you may want to make it clearer using Unicode names, as it's far from obvious at a glance what it is you're replacing. In that case you don't need use utf8 as long as you name every non-ASCII character instead of using it literally, like this
while( s/\N{EN DASH}/-/g ) { ... }
You will also need to open the files -- both input and output -- as UTF-8-encoded. The simplest way is to set UTF-8 as the default mode. You would add this line near the top of your program
use open qw/ :std :encoding(utf8) /;
or you can open each file explicitly as UTF-8-encoded like this
my $filename = 'NoDodge.c';
open my $in_fh, '<:encoding(utf8)', 'service.c'
or die qq{Unable to open "service.c" for input: $!};
open my $out_fh, '>:encoding(utf8)', $filename
or die qq{Unable to open "$filename" for output: $!};
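As a side note, s///g already replaces every occurrence in one go and returns the number of replacements, so the while loop around it is not strictly needed. Here is a small sketch on an in-memory string rather than a file; the sample text and variable names are made up, and the en-dash is written as an escape so the source itself stays plain ASCII.

```perl
use strict;
use warnings;
use charnames ':full';   # only needed on older Perls for \N{...} names

# "\x{2013}" is EN DASH; the sample text is purely illustrative.
my $line = "pages 10\x{2013}20 and `quoted' text \x{2013} done";

# In scalar context s///g returns the number of substitutions made
# (an empty string if there were none, hence the "|| 0").
my $dashes = ($line =~ s/\N{EN DASH}/-/g) || 0;

# tr/// replaces single characters and also returns a count.
my $ticks = ($line =~ tr/`/'/);

binmode STDOUT, ':encoding(UTF-8)';
print "Replaced $dashes dash(es) and $ticks backtick(s): $line\n";
```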

return entire line when search string is found

I want to return the whole line when I find the search string in the line.
I'm getting most of what I need when I print but I'm short about 10 characters. I get the start of the line and the string and about 10 characters after that but nothing more.
Here's my code (thanks in advance):
use strict;
use warnings;
my $calls_dir = "Ask/";
opendir(my $search_dir, $calls_dir) or die "$!\n";
my @files = grep /\.txt$/i, readdir $search_dir;
closedir $search_dir;
print "Got ", scalar @files, " files\n";
#my %seen = ();
foreach my $file (@files) {
my %seen = ();
my $current_file = $calls_dir . $file;
open my $FILE, '<', $current_file or die "$file: $!\n";
while (<$FILE>) {
chomp;
if (/^*(.*)Contact\s*(.*)\r?$/i) {
$seen{$1} = 1;
foreach my $addr ( sort keys %seen ) {
print "\n";
print $file;
print "\n";
print "[$addr]\n";
print "\n";
print "\n";
}
}
}
close $FILE;
}
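$_ contains the entire line you're looking for (minus the trailing newline removed by chomp), so store or print $_ instead of $1, which only holds what the first capture group matched.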
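A minimal sketch of that idea, using a few made-up lines in place of the files under Ask/. Stripping a possible carriage return as well is an assumption on my part (it only matters if the files have Windows-style line endings):

```perl
use strict;
use warnings;

# Made-up sample lines; the second one ends in CRLF on purpose.
my @lines = (
    "Name: Bob\n",
    "Contact person: Alice, phone 555-0100\r\n",
    "Other\n",
);

my @hits;
for (@lines) {
    (my $line = $_) =~ s/\r?\n\z//;             # strip LF or CRLF from a copy
    push @hits, $line if $line =~ /Contact/i;   # keep the whole line
}
print "[$_]\n" for @hits;
```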

(La)Tex math parsing for C/C++

I would like to parse (La)TeX math expressions and convert them to an expression in (any kind of!) scripting language, so that I can evaluate them.
What libraries do you recommend?
Maybe this will help: take a look at TeXmacs, especially at the way it interacts with computer algebra systems.
Here is a set of possible options from a similar question. https://tex.stackexchange.com/questions/4223/what-parsers-for-latex-mathematics-exist-outside-of-the-tex-engines
I think that Perl would make a fine choice for something like this; acting on text is one of its fortes.
Here is some info on how to make an exclusive flip-flop test (to find the context between \begin{} and \end{} without keeping those lines), http://www.effectiveperlprogramming.com/2010/11/make-exclusive-flip-flop-operators/
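For reference, the exclusive flip-flop idea boils down to looking at the value the range operator returns in scalar context. A small sketch on made-up lines (the `@tex` and `@body` names are just for illustration):

```perl
use strict;
use warnings;

# Made-up input lines standing in for a .tex file.
my @tex = (
    'Hello World!',
    '\begin{equation}',
    '\frac{1}{3}',
    '\end{equation}',
    'Goodbye',
);

my @body;
for (@tex) {
    # In scalar context the range operator returns a sequence number:
    # 1 on the opening line, a value ending in "E0" on the closing line,
    # and an empty string outside the range.
    my $seq = /\\begin\{equation\}/ .. /\\end\{equation\}/;
    next unless $seq;
    next if $seq == 1 or $seq =~ /E0\z/;   # drop the delimiter lines
    push @body, $_;
}
print "$_\n" for @body;
```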
EDIT: So this problem got me going. Here is a first attempt: my "math.pl", which takes a .tex file as an argument (i.e. $ ./math.pl test.tex).
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;
my $re_num = qr/[+\-\dE\.]/;
my $file = shift;
open( my $fh, '<', $file);
#parsing this out for more than just the equation environment might be easier using Text::Balanced too.
my @equations;
my $current_equation = '';
while(<$fh>) {
my $test;
next unless ($test = /\\begin\{equation\}/ .. /\\end\{equation\}/);
if ($test !~ /(^1|E0)$/ ) {
chomp;
$current_equation .= $_;
} elsif ($test =~ /E0$/) {
#print $current_equation . "\n";
push @equations, {eq => $current_equation};
$current_equation = '';
}
}
foreach my $eq (@equations) {
print "Full Equation: " . $eq->{'eq'} . "\n";
solve($eq);
print "Result: " . $eq->{'value'} . "\n\n";
}
sub solve {
my $eq = shift;
print $eq->{'eq'} . "\n";
parse($eq);
compute($eq);
print "intermediate result: " . $eq->{'value'} . "\n";
}
sub parse {
my $eq = shift;
my ($command,@fields) = extract_multiple(
$eq->{'eq'}, [ sub { extract_bracketed(shift,'{}') } ]
);
$command =~ s/^\\//;
print "command: " . $command . "\n";
@fields = map { s/^\{\ *//; s/\ *\}$//; print "arg: $_\n"; {value => $_}; } @fields;
($eq->{'command'}, @{ $eq->{'args'} }) = ($command, @fields);
}
sub compute {
my ($eq) = @_;
#check arguments ...
foreach my $arg (@{$eq->{'args'}}) {
#if argument is a number, continue
if ($arg->{'value'} =~ /^$re_num$/) {
next;
#if the argument is a simple mathematical operation, do it and continue
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\ |\*|\\times)?\ *($re_num)$/) {
$arg->{'value'} = $1 * $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\+)?\ *($re_num)$/) {
$arg->{'value'} = $1 + $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\-)?\ *($re_num)$/) {
$arg->{'value'} = $1 - $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\/)?\ *($re_num)$/) {
$arg->{'value'} = $1 / $2;
} else {
#parse it and calc it as if it were its own equation.
$arg->{'eq'} = $arg->{'value'};
solve($arg);
}
}
my @args = @{$eq->{'args'}};
## add command processing here
# frac
if ($eq->{'command'} eq 'frac') {
$eq->{'value'} = $args[0]->{'value'} / $args[1]->{'value'};
return;
}
}
and here is a sample test.tex:
\documentclass{article}
\begin{document}
Hello World!
\begin{equation}
\frac{\frac{1}{3}}{2}
\end{equation}
\end{document}
Maybe use boost::spirit to tokenize the expression. You will need to define a huge grammar!
Use a parser generator to create an appropriate parser. Try ANTLR for this, as it includes an IDE for the Grammar, which is very helpful. Using tree rewrite rules, you can then convert the parse tree to an abstract syntax tree.
Perhaps start with the expression evaluator from the ANTLR tutorial. I think this is reasonably close.

How to do perl inline regex without setting to a variable?

Normally if you wish to change a variable with regex you do this:
$string =~ s/matchCase/changeCase/;
But is there a way to simply do the replace inline without setting it back to the variable?
I wish to use it in something like this:
my $name="jason";
print "Your name without spaces is: " $name => (/\s+/''/g);
Something like that, kind of like the preg_replace function in PHP.
Revised for Perl 5.14.
Since 5.14, with the /r flag to return the substitution, you can do this:
print "Your name without spaces is: [", do { $name =~ s/\s+//gr; }
, "]\n";
You can use map and a lexical variable.
my $name=" jason ";
print "Your name without spaces is: ["
, ( map { my $a = $_; $a =~ s/\s+//g; $a } ( $name ))
, "]\n";
Now, you have to use a lexical because $_ will alias and thus modify your variable.
The output is
Your name without spaces is: [jason]
# but: $name still ' jason '
Admittedly, do will work just as well (and perhaps better):
print "Your name without spaces is: ["
, do { ( my $a = $name ) =~ s/\s+//g; $a }
, "]\n";
But the lexical copy is still there. Putting the assignment inside the parentheses together with my is an abbreviation that some people prefer (not me).
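To see the copy idiom and the /r flag side by side, here is a small sketch (the `$trimmed` and `$trimmed_r` names are just for illustration; the /r line needs Perl 5.14 or later):

```perl
use strict;
use warnings;

my $name = " jason ";

# Copy first, then apply the substitution to the copy only.
(my $trimmed = $name) =~ s/\s+//g;

# Perl 5.14+: /r leaves $name alone and returns the modified copy.
my $trimmed_r = $name =~ s/\s+//gr;

print "[$trimmed] [$trimmed_r] [$name]\n";
```

In both cases $name keeps its surrounding spaces.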
For this idiom, I have developed an operator I call filter:
sub filter (&@) {
my $block = shift;
if ( wantarray ) {
return map { &$block; $_ } @_ ? @_ : $_;
}
else {
local $_ = shift || $_;
$block->( $_ );
return $_;
}
}
And you call it like so:
print "Your name without spaces is: [", ( filter { s/\s+//g } $name )
, "]\n";
print "Your name without spaces is: @{[map { s/\s+//g; $_ } $name]}\n";