I would like to parse (La)TeX math expressions and convert them to (any kind of!) scripting-language expression, so I can evaluate them.
What libraries do you recommend?
Maybe it will help: take a look at TeXmacs, especially at the way it interacts with computer algebra systems.
Here is a set of possible options from a similar question. https://tex.stackexchange.com/questions/4223/what-parsers-for-latex-mathematics-exist-outside-of-the-tex-engines
I think that Perl would make a fine choice for something like this; acting on text is one of its fortes.
Here is some information on how to make an exclusive flip-flop test (to find the content between \begin{} and \end{} without keeping those lines): http://www.effectiveperlprogramming.com/2010/11/make-exclusive-flip-flop-operators/
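The trick behind an exclusive flip-flop is the value the scalar range operator returns, which the math.pl script below also relies on. A minimal sketch, assuming the markers are whole-line regexes; inner_lines is a hypothetical helper name:

```perl
use strict;
use warnings;

# Collect the lines strictly between a start marker and an end marker,
# dropping the marker lines themselves (an "exclusive" flip-flop).
sub inner_lines {
    my ($start_re, $end_re, @lines) = @_;
    my @inner;
    for (@lines) {
        # In scalar context the range operator returns 1 on the start line,
        # 2, 3, ... on the following lines, and a number with "E0" appended
        # on the end line; it returns "" outside a block.
        my $seq = /$start_re/ .. /$end_re/;
        next unless $seq;                     # outside a block
        next if $seq == 1 || $seq =~ /E0$/;   # the marker lines
        push @inner, $_;
    }
    return @inner;
}

my @body = inner_lines(qr/\\begin\{equation\}/, qr/\\end\{equation\}/,
    'Hello World!', '\begin{equation}', '\frac{\frac{1}{3}}{2}',
    '\end{equation}', '\end{document}');
print "$_\n" for @body;
```

Note that the range operator keeps its state per occurrence in the code, not per call, so an unterminated block would carry over into the next call.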
EDIT: This problem got me going. Here is a first attempt, my "math.pl", which takes a .tex file as an argument (e.g. $ ./math.pl test.tex).
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;

# A signed decimal number with an optional exponent, e.g. 3, -0.5, 1.2E-3.
my $re_num = qr/[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?/;

my $file = shift or die "usage: $0 file.tex\n";
open( my $fh, '<', $file) or die "Cannot open $file: $!\n";

#parsing this out for more than just the equation environment might be easier using Text::Balanced too.
my @equations;
my $current_equation = '';
while(<$fh>) {
    my $test;
    next unless ($test = /\\begin\{equation\}/ .. /\\end\{equation\}/);
    if ($test !~ /(^1|E0)$/ ) {
        chomp;
        $current_equation .= $_;
    } elsif ($test =~ /E0$/) {
        #print $current_equation . "\n";
        push @equations, {eq => $current_equation};
        $current_equation = '';
    }
}
close $fh;

foreach my $eq (@equations) {
    print "Full Equation: " . $eq->{'eq'} . "\n";
    solve($eq);
    print "Result: " . $eq->{'value'} . "\n\n";
}

sub solve {
    my $eq = shift;
    print $eq->{'eq'} . "\n";
    parse($eq);
    compute($eq);
    print "intermediate result: " . $eq->{'value'} . "\n";
}

sub parse {
    my $eq = shift;
    my ($command, @fields) = extract_multiple(
        $eq->{'eq'}, [ sub { extract_bracketed(shift, '{}') } ]
    );
    $command =~ s/^\s*\\//;
    print "command: " . $command . "\n";
    # Strip the outer braces; +{ } makes map return a hash reference
    # instead of the braces being parsed as a bare block.
    @fields = map { s/^\{\ *//; s/\ *\}$//; print "arg: $_\n"; +{ value => $_ } } @fields;
    ($eq->{'command'}, @{ $eq->{'args'} }) = ($command, @fields);
}

sub compute {
    my ($eq) = @_;
    # check arguments ...
    foreach my $arg (@{ $eq->{'args'} }) {
        my $v = $arg->{'value'};
        # if the argument is a number, continue
        if ($v =~ /^\s*$re_num\s*$/) {
            next;
        # if the argument is a simple binary operation, do it and continue;
        # the operator is required, so "2+3" is not read as 2 times +3
        } elsif ($v =~ /^\s*($re_num)\s*\+\s*($re_num)\s*$/) {
            $arg->{'value'} = $1 + $2;
        } elsif ($v =~ /^\s*($re_num)\s*-\s*($re_num)\s*$/) {
            $arg->{'value'} = $1 - $2;
        } elsif ($v =~ /^\s*($re_num)\s*(?:\*|\\times|\\cdot|\s)\s*($re_num)\s*$/) {
            $arg->{'value'} = $1 * $2;
        } elsif ($v =~ /^\s*($re_num)\s*\/\s*($re_num)\s*$/) {
            $arg->{'value'} = $1 / $2;
        } else {
            # parse it and calc it as if it were its own equation.
            $arg->{'eq'} = $v;
            solve($arg);
        }
    }
    my @args = @{ $eq->{'args'} };
    ## add command processing here
    # frac
    if ($eq->{'command'} eq 'frac') {
        $eq->{'value'} = $args[0]->{'value'} / $args[1]->{'value'};
        return;
    }
    die "Unsupported command \\$eq->{'command'}\n";
}
And here is a sample test.tex:
\documentclass{article}
\begin{document}
Hello World!
\begin{equation}
\frac{\frac{1}{3}}{2}
\end{equation}
\end{document}
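If only a handful of constructs are needed, a much smaller alternative (a different technique from the Text::Balanced script above) is to rewrite the LaTeX into a Perl expression string and hand it to eval, after checking that only arithmetic characters are left. This is a sketch under that assumption: latex_to_perl and eval_latex are hypothetical names, and only \frac, \times, \cdot, \left/\right, brace groups and ^ are translated.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a small LaTeX subset into a Perl expression.
sub latex_to_perl {
    my ($expr) = @_;
    # Replace the innermost \frac first (its arguments contain no braces)
    # and repeat, so nested fractions unwind from the inside out.
    1 while $expr =~ s/\\frac\s*\{([^{}]*)\}\s*\{([^{}]*)\}/(($1)\/($2))/;
    $expr =~ s/\\(?:times|cdot)/*/g;   # multiplication
    $expr =~ s/\\(?:left|right)//g;    # \left( ... \right) becomes ( ... )
    $expr =~ tr/{}/()/;                # remaining groups, e.g. 2^{10}
    $expr =~ s/\^/**/g;                # exponentiation
    return $expr;
}

# Evaluate only if the translation is plain arithmetic; anything else
# (an unsupported command, a stray letter) is rejected before eval sees it.
sub eval_latex {
    my ($tex) = @_;
    my $expr = latex_to_perl($tex);
    die "unsupported input: $tex\n" unless $expr =~ m{\A[\d.\s+\-*/()]+\z};
    my $value = eval $expr;
    die "cannot evaluate '$expr': $@" if $@;
    return $value;
}

print eval_latex('\frac{\frac{1}{3}}{2}'), "\n";
```

Unknown commands such as \sqrt survive the translation as letters, so the character check makes eval_latex die instead of running arbitrary code.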
Maybe use boost::spirit to tokenize the expression. You will need to define a huge grammar, though!
Use a parser generator to create an appropriate parser. Try ANTLR for this, as it includes an IDE for the grammar, which is very helpful. Using tree rewrite rules, you can then convert the parse tree to an abstract syntax tree.
Perhaps start with the expression evaluator from the ANTLR tutorial; I think it is reasonably close to what you need.
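To show the shape of such an evaluator without a generator, here is a small hand-written recursive-descent version of the usual tutorial grammar. It is not ANTLR output, and the sub names are hypothetical; a generated parser would build a tree first, while this one computes values as it parses.

```perl
use strict;
use warnings;

# Grammar (one parse_* sub per rule):
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')' | '-' factor
my (@tok, $pos);    # token list and cursor shared by the parse_* subs

sub evaluate {
    my ($src) = @_;
    # Numbers, or any other single non-space character as its own token.
    @tok = $src =~ /(\d+(?:\.\d+)?|\S)/g;
    $pos = 0;
    my $value = parse_expr();
    die "trailing input in '$src'\n" if $pos < @tok;
    return $value;
}

sub peek { $pos < @tok ? $tok[$pos] : '' }

sub parse_expr {
    my $v = parse_term();
    while (peek() eq '+' || peek() eq '-') {
        my $op  = $tok[ $pos++ ];
        my $rhs = parse_term();
        $v = $op eq '+' ? $v + $rhs : $v - $rhs;
    }
    return $v;
}

sub parse_term {
    my $v = parse_factor();
    while (peek() eq '*' || peek() eq '/') {
        my $op  = $tok[ $pos++ ];
        my $rhs = parse_factor();
        $v = $op eq '*' ? $v * $rhs : $v / $rhs;
    }
    return $v;
}

sub parse_factor {
    die "unexpected end of input\n" if $pos >= @tok;
    my $t = $tok[ $pos++ ];
    return -parse_factor() if $t eq '-';
    if ($t eq '(') {
        my $v = parse_expr();
        die "missing ')'\n" unless peek() eq ')';
        $pos++;
        return $v;
    }
    die "unexpected token '$t'\n" unless $t =~ /^\d/;
    return $t;
}

print evaluate('(1 + 2) * 3 - 4 / 2'), "\n";
```

The while loops in parse_expr and parse_term make the operators left-associative, so 10 - 4 - 3 is 3, not 9.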
Related
I have several hundred PHP scripts that expect a language field to contain an ISO 639-1 two-character identifier, for example "en". I now want to modify them to support language codes qualified by a country code, for example "fr-CA". Each of these scripts contains the following code:
case 'lang':
{ // language code
if (strlen($value) == 2)
$lang = strtolower($value);
break;
} // language code
which I want to modify to:
case 'lang':
{ // language code
if (strlen($value) >= 2)
$lang = strtolower(substr($value,0,2));
break;
} // language code
So I wrote a Perl script to run over the entire directory tree and modify all of the matching scripts. For testing, I have set the script up to create all of the modified scripts in a new directory structure:
use strict;
use warnings;
use 5.010;
use File::Find;
use File::Slurp;

my @content;
find( \&wanted, '/home/jcobban/public_html/');
exit;

sub wanted {
    if (-f)
    {
        print "wanted: ", $File::Find::name, "\n";
        my $odir = '/home/jcobban/testlang' . substr($File::Find::dir, 25);
        if ((substr $odir, -1) ne "/"){
            $odir = "$odir/";
        }
        if (! -d $odir){
            mkdir $odir;
        }
        print "odir '$odir'\n";
        my @lines = read_file($File::Find::name);
        my $caselang = 0;
        my $updated = 0;
        foreach my $line (@lines){
            if ($line =~ /\bcase\b/)
            {
                $caselang = $line =~ /\blang\b/i;
            }
            if ($line =~ /\bbreak\b/)
            {
                $caselang = 0;
            }
            if ($caselang)
            {
                print "old $line\n";
                $line =~ s/ == 2/ >= 2/;
                $line =~ s/strtolower(.value)/strtolower(substr(\$value,0,2))/;
                $updated = 1;
                print "new $line\n";
            }
        }
        if ($updated)
        {
            # my $newfile = $File::Find::dir . "/" . $_;
            my $newfile = $odir . $_;
            print "alter \$lang to support ll-CC $newfile\n";
            write_file($newfile, @lines);
        }
        else
        {
            print "did not find lang support in $_\n";
        }
    }
    return;
}
The first substitution works, changing == to >=, but the second substitution does not modify any lines, and I do not understand why. I thought there might be a problem with matching "\$", so I replaced it with ".", but still no lines are changed. I applied the same command in other regex engines and it worked in all of them. The output for a typical file is:
wanted: /home/jcobban/public_html/videoTutorials.php
odir '/home/jcobban/testlang/'
old case 'lang':
new case 'lang':
old {
new {
old if (strlen($value) == 2)
new if (strlen($value) >= 2)
old $lang = strtolower($value);
new $lang = strtolower($value);
alter $lang to support ll-CC /home/jcobban/testlang/videoTutorials.php
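I have obviously been spending too much time using Vim. The problem with my code was that I needed to escape the round brackets so they were not interpreted as a capture group: in Vim's default regex syntax an unescaped ( is a literal character, but in Perl it starts a group.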
$line =~ s/strtolower\(.value\)/strtolower(substr(\$value,0,2))/;
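Another way to avoid escaping each metacharacter by hand is to quote the whole literal with \Q...\E, which applies quotemeta to the interpolated text. A minimal sketch; replace_literal is a hypothetical helper name:

```perl
use strict;
use warnings;

# Replace the first occurrence of a literal string. \Q...\E escapes every
# regex metacharacter in $from, so "(" and "$" are matched as ordinary
# characters instead of starting a group and acting as an anchor.
sub replace_literal {
    my ($line, $from, $to) = @_;
    $line =~ s/\Q$from\E/$to/;
    return $line;
}

print replace_literal('$lang = strtolower($value);',
                      'strtolower($value)',
                      'strtolower(substr($value,0,2))'), "\n";
```

The replacement text in $to is inserted as-is, so the $value inside it is not interpolated a second time.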
I just want to show a few tricks; maybe they will be interesting to you:
s'strtolower\(\K\$value'substr($value,0,2)'
We can delimit the substitution with (almost) any characters we want:
s/foo/bar/;
s'foo'bar';
s(foo)(bar);
If we choose single quotes, variables are not interpolated, but we still have to escape the dollar sign on the pattern side, because otherwise the regex engine treats it as the end-of-line anchor.
\K: keep the stuff left of the \K. It still has to match, but it is not part of the text that gets replaced.
More information is in perldoc perlre.
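Both points can be checked side by side on the line from the question. This sketch applies the single-quoted form from the answer and an equivalent brace-delimited form, where the replacement is interpolated and so needs its own backslash:

```perl
use strict;
use warnings;

my $line = '$lang = strtolower($value);';

# Single quotes as delimiters: neither side is interpolated, but the $ in
# the pattern still needs a backslash or the regex engine sees an anchor.
(my $quoted = $line) =~ s'strtolower\(\K\$value'substr($value,0,2)';

# Braces as delimiters: the replacement is interpolated, so its $ is escaped.
(my $braced = $line) =~ s{strtolower\(\K\$value}{substr(\$value,0,2)};

# \K keeps "strtolower(" out of the match, so only "$value" is replaced.
print "$quoted\n$braced\n";
```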
I am trying to write a Perl script that gets all strings that do not start and end with a single quote. A string also cannot be part of a # comment, and the strings in DATA do not necessarily appear at the beginning of a line.
use warnings;
use strict;
my $file;
{
local $/ = undef;
$file = <DATA>;
};
my @strings = $file =~ /(?:[^']).*(?:[^'])/g;
print join ("\n",@strings);
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
I am getting nowhere with this regex.
The expected output is:
"This is string2"
"This is comment syntax #"
"This is string4"
Obviously this is only an exercise, as there have been many students asking about this problem lately. Regexes will only ever get you part of the way there, as there will pretty much always be edge cases.
The following code is probably good enough for your purposes, but it doesn't even successfully parse itself because of quotes inside a qr{}. You'll have to figure out how to get strings that span lines to work on your own:
use strict;
use warnings;
my $doublequote_re = qr{"(?: (?> [^\\"]+ ) | \\. )*"}x;
my $singlequote_re = qr{'(?: (?> [^\\']+ ) | \\. )*'}x;
my $data = do { local $/; <DATA> };
while ($data =~ m{(#.*|$singlequote_re|$doublequote_re)}g) {
my $match = $1;
if ($match =~ /^#/) {
print "Comment - $match\n";
} elsif ($match =~ /^"/) {
print "Double quote - $match\n";
} elsif ($match =~ /^'/) {
print "Single quote - $match\n";
} else {
die "Carp! something went wrong! <$match>";
}
}
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
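If you only want the double-quoted strings from the expected output, the same scan can be wrapped so it returns them instead of printing every category. A sketch reusing the two patterns from the answer above; double_quoted_strings is a hypothetical name:

```perl
use strict;
use warnings;

# The two string patterns from the answer above.
my $doublequote_re = qr{"(?: (?> [^\\"]+ ) | \\. )*"}x;
my $singlequote_re = qr{'(?: (?> [^\\']+ ) | \\. )*'}x;

# One alternation scanned left to right: whatever starts first wins, so a
# "#" inside a string is not a comment and a quote inside a comment is
# swallowed by the comment match. Only double-quoted matches are kept.
sub double_quoted_strings {
    my ($code) = @_;
    my @found;
    while ($code =~ m{(#.*|$singlequote_re|$doublequote_re)}g) {
        push @found, $1 if substr($1, 0, 1) eq '"';
    }
    return @found;
}

my $code = <<'END';
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
END

print "$_\n" for double_quoted_strings($code);
```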
I do not know how to achieve that with a regular expression, so here is a simple hand-written lexer:
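#!/usr/bin/perl
use strict;
use warnings;

# Return the first double-quoted string in the line and the text after it,
# or ("", "") if there is none or the rest of the line is a # comment.
sub extract_string {
    my @buf = split //, shift;
    # defined() so that a literal "0" character does not end the loop
    while (defined(my $peer = shift @buf)) {
        if ($peer eq '"') {
            my $str = $peer;
            while (defined($peer = shift @buf)) {
                $str .= $peer;
                last if $peer eq '"';
            }
            if (defined $peer) {
                return ($str, join '', @buf);
            }
            else {
                return ("", "");    # unterminated string
            }
        }
        elsif ($peer eq '#') {
            return ("", "");        # rest of the line is a comment
        }
    }
    return ("", "");
}

my ($str, $buf);
while ($buf = <DATA>) {
    chomp $buf;
    while (1) {
        ($str, $buf) = extract_string $buf;
        print "$str\n" if length $str;
        last unless length $buf;
    }
}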
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
Another option is to use a Perl-parsing module such as PPI.
I have the following piece of code:
#!/usr/bin/perl
use strict;
use warnings;
#use diagnostics;
use URI qw( );

my @insert_words = qw(HELLO GOODBYE);

while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    my $query = $url->query;

    foreach (@insert_words) {
        # Use package vars to communicate with /(?{})/ blocks.
        local our $insert_word = $_;
        local our @queries;
        if (defined $query) {
            $query =~ m{
                ^(.*[/=])([^/=&]*)((?:[/=&].*)?)\z
                (?{
                    if (length $2) {
                        push @queries, "$1$insert_word$2$3";
                        push @queries, "$1$insert_word$3";
                        push @queries, "$1$2$insert_word$3";
                    }
                })
                (?!)
            }x;
        }
        if (@queries) {
            for (@queries) {
                $url->query($_);
                print $url, "\n";
            }
        }
        else {
            print $url, "\n";
        }
    }
}
__DATA__
http://www.example.com/index.php?route=9&other=7
The above piece of code works correctly and produces the following output:
http://www.example.com/index.php?route=9&other=HELLO7
http://www.example.com/index.php?route=9&other=HELLO
http://www.example.com/index.php?route=9&other=7HELLO
http://www.example.com/index.php?route=HELLO9&other=7
http://www.example.com/index.php?route=HELLO&other=7
http://www.example.com/index.php?route=9HELLO&other=7
http://www.example.com/index.php?route=9&other=GOODBYE7
http://www.example.com/index.php?route=9&other=GOODBYE
http://www.example.com/index.php?route=9&other=7GOODBYE
http://www.example.com/index.php?route=GOODBYE9&other=7
http://www.example.com/index.php?route=GOODBYE&other=7
http://www.example.com/index.php?route=9GOODBYE&other=7
As you can see, it inserts the words from the array at specific places in the URL.
What I am now having problems with:
I would now like to add the functionality to produce all the possible combinations of HELLO and GOODBYE (or whatever is in @insert_words) as well; for example, it should also add the following URLs to the output I already get:
http://www.example.com/index.php?route=HELLO&other=GOODBYE
http://www.example.com/index.php?route=HELLO&other=HELLO
http://www.example.com/index.php?route=GOODBYE&other=HELLO
http://www.example.com/index.php?route=GOODBYE&other=GOODBYE
But I do not know the best way to go about this.
Your help with this will be much appreciated, many thanks
Please don't use fancy regexes like that - they are an experimental feature of Perl and are far from simple to comprehend.
If I understand you correctly, you need to do this recursively.
I think you want all variations of the URL with each query parameter as it is, or preceded, followed, or replaced by every value in @insert_words.
This seems to do what you ask. It uses URI::QueryParam to split up the query portion of the URL properly instead of using your nasty regex. It does produce substantially more combinations than you show in your question, but I can see no other way of interpreting your requirement.
The number of possible variations is 49. Each parameter can have its original value, or be preceded, succeeded or replaced by either of two values. That is seven possible values for each parameter and so 7² or 49 different variations for two parameters.
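The same count generalises to any number of words and parameters: each parameter has 1 + 3w choices for w inserted words, and the choices multiply across p parameters. A small sketch; variation_count is a hypothetical helper:

```perl
use strict;
use warnings;

# Each parameter keeps its value or gets one of three edits (prefix,
# suffix, replace) per inserted word, independently of the others.
sub variation_count {
    my ($n_words, $n_params) = @_;
    return (1 + 3 * $n_words) ** $n_params;
}

print variation_count(2, 2), "\n";   # 49
```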
use strict;
use warnings;
use URI;
use URI::QueryParam;

my @insert_words = qw/ HELLO GOODBYE /;
my @urls;

sub mod_param {
    my ($url, $paridx, @insertions) = @_;

    my @params = $url->query_param;
    return if $paridx > $#params;

    my $key = $params[$paridx];
    my $oldval = $url->query_param($key);
    my @variations = ($oldval);
    push @variations, ($oldval.$_, $_.$oldval, $_) for @insertions;

    for my $val (@variations) {
        $url->query_param($key, $val);
        if ($paridx == $#params) {
            push @urls, "$url";
        }
        else {
            mod_param($url, $paridx + 1, @insertions);
        }
    }

    $url->query_param($key, $oldval);
}

while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    @urls = ();
    mod_param($url, 0, @insert_words);
    print $_, "\n" for @urls;
}
__DATA__
http://www.example.com/index.php?route=9&other=7
output
http://www.example.com/index.php?route=9&other=7
http://www.example.com/index.php?route=9&other=7HELLO
http://www.example.com/index.php?route=9&other=HELLO7
http://www.example.com/index.php?route=9&other=HELLO
http://www.example.com/index.php?route=9&other=7GOODBYE
http://www.example.com/index.php?route=9&other=GOODBYE7
http://www.example.com/index.php?route=9&other=GOODBYE
http://www.example.com/index.php?route=9HELLO&other=7
http://www.example.com/index.php?route=9HELLO&other=7HELLO
http://www.example.com/index.php?route=9HELLO&other=HELLO7
http://www.example.com/index.php?route=9HELLO&other=HELLO
http://www.example.com/index.php?route=9HELLO&other=7GOODBYE
http://www.example.com/index.php?route=9HELLO&other=GOODBYE7
http://www.example.com/index.php?route=9HELLO&other=GOODBYE
http://www.example.com/index.php?route=HELLO9&other=7
http://www.example.com/index.php?route=HELLO9&other=7HELLO
http://www.example.com/index.php?route=HELLO9&other=HELLO7
http://www.example.com/index.php?route=HELLO9&other=HELLO
http://www.example.com/index.php?route=HELLO9&other=7GOODBYE
http://www.example.com/index.php?route=HELLO9&other=GOODBYE7
http://www.example.com/index.php?route=HELLO9&other=GOODBYE
http://www.example.com/index.php?route=HELLO&other=7
http://www.example.com/index.php?route=HELLO&other=7HELLO
http://www.example.com/index.php?route=HELLO&other=HELLO7
http://www.example.com/index.php?route=HELLO&other=HELLO
http://www.example.com/index.php?route=HELLO&other=7GOODBYE
http://www.example.com/index.php?route=HELLO&other=GOODBYE7
http://www.example.com/index.php?route=HELLO&other=GOODBYE
http://www.example.com/index.php?route=9GOODBYE&other=7
http://www.example.com/index.php?route=9GOODBYE&other=7HELLO
http://www.example.com/index.php?route=9GOODBYE&other=HELLO7
http://www.example.com/index.php?route=9GOODBYE&other=HELLO
http://www.example.com/index.php?route=9GOODBYE&other=7GOODBYE
http://www.example.com/index.php?route=9GOODBYE&other=GOODBYE7
http://www.example.com/index.php?route=9GOODBYE&other=GOODBYE
http://www.example.com/index.php?route=GOODBYE9&other=7
http://www.example.com/index.php?route=GOODBYE9&other=7HELLO
http://www.example.com/index.php?route=GOODBYE9&other=HELLO7
http://www.example.com/index.php?route=GOODBYE9&other=HELLO
http://www.example.com/index.php?route=GOODBYE9&other=7GOODBYE
http://www.example.com/index.php?route=GOODBYE9&other=GOODBYE7
http://www.example.com/index.php?route=GOODBYE9&other=GOODBYE
http://www.example.com/index.php?route=GOODBYE&other=7
http://www.example.com/index.php?route=GOODBYE&other=7HELLO
http://www.example.com/index.php?route=GOODBYE&other=HELLO7
http://www.example.com/index.php?route=GOODBYE&other=HELLO
http://www.example.com/index.php?route=GOODBYE&other=7GOODBYE
http://www.example.com/index.php?route=GOODBYE&other=GOODBYE7
http://www.example.com/index.php?route=GOODBYE&other=GOODBYE
Perl's File::Xcopy has the method fn_pat to specify a regular expression for pattern matching, and I want to use it to recursively copy a directory while ignoring all files/folders that contain any of these strings:
.svn
build
test.blah
I am struggling with the syntax for that. I have looked over many Perl regular expression guides, but for the life of me I just cannot get the hang of it. I appreciate any help.
Thanks.
... update ...
I found a Perl regex that seems to work, just not with xcopy's fn_pat. I am not sure whether this is a bug in xcopy or my expression is incorrect; my tests, however, show it is OK:
$exp = '^(?!.*(\.svn|build|test\.blah)).*$';
if( '/dev/bite/me/.svn' =~ $exp ){ print "A\n"; }
if( '/dev/bite/me/.svn/crumbs' =~ $exp ){ print "B\n"; }
if( '/dev/build/blah.ext' =~ $exp ){ print "C\n"; }
if( '/dev/crap/test.blah/bites' =~ $exp ){ print "D\n"; }
if( '/dev/whats/up.h' =~ $exp ){ print "E\n"; }
Only E prints, as I was hoping. I'm curious to know whether this is correct, and I'd welcome any ideas about why it's not working with xcopy.
Here is where File::Xcopy calls File::Find::finddepth:
sub find_files {
    my $self = shift;
    my $cls = ref($self)||$self;
    my ($dir, $re) = @_;
    my $ar = bless [], $cls;
    my $sub = sub {
        (/$re/)
        && (push @{$ar}, {file=>$_, pdir=>$File::Find::dir,
            path=>$File::Find::name});
    };
    finddepth($sub, $dir);
    return $ar;
}
Here $re is your regexp.
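According to the File::Find docs, $_ is set to just the leaf name of the file being visited unless the no_chdir option is used. So your regex never sees the directory part of the path: a file such as /dev/bite/me/.svn/crumbs is tested only as "crumbs", which passes your pattern.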
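The difference can be checked with File::Find alone, using only core modules. This sketch builds a throwaway tree (the directory and file names are made up for the demo) and applies the question's pattern in both modes; matching is a hypothetical helper name:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find qw(finddepth);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

my $exclude = qr/^(?!.*(\.svn|build|test\.blah)).*$/;

# Files under $dir for which $_ matches $re. Without no_chdir, $_ holds
# only the leaf name; with it, $_ holds the whole path from $dir.
sub matching {
    my ($dir, $re, $no_chdir) = @_;
    my @hits;
    my $wanted = sub { push @hits, $File::Find::name if -f $_ && /$re/ };
    finddepth({ wanted => $wanted, no_chdir => $no_chdir }, $dir);
    return sort @hits;
}

# A throwaway tree: .svn/entries should be excluded, src/main.c kept.
my $root = tempdir(CLEANUP => 1);
make_path("$root/.svn", "$root/src");
for my $file ("$root/.svn/entries", "$root/src/main.c") {
    open my $fh, '>', $file or die "cannot create $file: $!";
    close $fh;
}

# Search from inside the tree so the temp directory's own path cannot
# accidentally contain one of the excluded words.
my $orig = getcwd();
chdir $root or die "cannot chdir to $root: $!";
my @leaf = matching('.', $exclude, 0);   # entries and main.c both match
my @full = matching('.', $exclude, 1);   # ./.svn/entries is rejected
chdir $orig or die "cannot chdir back to $orig: $!";

print "leaf names: @leaf\nfull paths: @full\n";
```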
The only way I can see to get the no_chdir option passed to finddepth is to monkey-patch File::Xcopy::finddepth:
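use File::Xcopy;
use File::Find ();

{
    # Replace the finddepth that File::Xcopy calls with one that passes
    # no_chdir, so $_ holds the full path. Assigning to the glob directly
    # avoids the symbolic reference that "use strict" would reject, and
    # the "Subroutine redefined" warning is silenced for this block only.
    no warnings 'redefine';
    *File::Xcopy::finddepth = sub {
        my ($sub, $dir) = @_;
        File::Find::finddepth({ no_chdir => 1, wanted => $sub }, $dir);
    };
}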
Normally if you wish to change a variable with regex you do this:
$string =~ s/matchCase/changeCase/;
But is there a way to simply do the replace inline without setting it back to the variable?
I wish to use it in something like this:
my $name="jason";
print "Your name without spaces is: " $name => (/\s+/''/g);
Something like that, kind of like the preg_replace function in PHP.
Revised for Perl 5.14.
Since 5.14, with the /r flag to return the substitution, you can do this:
print "Your name without spaces is: [", do { $name =~ s/\s+//gr; }
, "]\n";
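A short check of what /r does: the substitution returns the modified copy and leaves the original alone, which also lets several substitutions be chained on one value.

```perl
use strict;
use warnings;
use 5.014;   # s///r and tr///r need Perl 5.14 or later

my $name = " jason ";
# /r returns the modified copy and leaves $name untouched.
my $squeezed = $name =~ s/\s+//gr;
print "Your name without spaces is: [$squeezed]\n";   # [jason]
print "Original is still: [$name]\n";                  # [ jason ]

# =~ is left-associative, so /r results can be chained.
my $upper = $name =~ s/^\s+//r =~ s/\s+$//r =~ tr/a-z/A-Z/r;
print "$upper\n";                                      # JASON
```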
You can use map and a lexical variable.
my $name=" jason ";
print "Your name without spaces is: ["
, ( map { my $a = $_; $a =~ s/\s+//g; $a } ( $name ))
, "]\n";
Now, you have to use a lexical because $_ will alias and thus modify your variable.
The output is
Your name without spaces is: [jason]
# but: $name still ' jason '
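The aliasing warned about above is easy to see by comparing a map block that edits $_ directly with one that copies it first:

```perl
use strict;
use warnings;

# Without a copy, map's $_ is an alias for $name, so s/// edits $name.
my $name = " jason ";
my ($aliased) = map { s/\s+//g; $_ } $name;

# With a lexical copy inside the block, only the copy changes.
my $name2 = " jason ";
my ($copied) = map { my $copy = $_; $copy =~ s/\s+//g; $copy } $name2;

print "[$aliased] [$name]\n";    # [jason] [jason]
print "[$copied] [$name2]\n";    # [jason] [ jason ]
```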
Admittedly, do will work just as well (and perhaps better):
print "Your name without spaces is: ["
, do { ( my $a = $name ) =~ s/\s+//g; $a }
, "]\n";
But the lexical copying is still there. Putting the my inside the parenthesized assignment is an abbreviation that some people prefer (not me).
For this idiom, I have developed an operator I call filter:
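sub filter (&@) {
    my $block = shift;
    if ( wantarray ) {
        # Work on copies: map would alias $_ to the caller's variables,
        # and the block would then modify them in place.
        my @copies = @_ ? @_ : $_;
        $block->() for @copies;
        return @copies;
    }
    else {
        local $_ = @_ ? shift : $_;
        $block->( $_ );
        return $_;
    }
}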
And you call it like so:
print "Your name without spaces is: [", ( filter { s/\s+//g } $name )
, "]\n";
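You can also interpolate the result directly, but note that this version strips the spaces from $name itself, because map aliases $_ to $name:
print "Your name without spaces is: @{[map { s/\s+//g; $_ } $name]}\n";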