Normally if you wish to change a variable with regex you do this:
$string =~ s/matchCase/changeCase/;
But is there a way to simply do the replace inline without setting it back to the variable?
I wish to use it in something like this:
my $name="jason";
print "Your name without spaces is: " $name => (/\s+/''/g);
Something like that, kind of like the preg_replace function in PHP.
Revised for Perl 5.14.
Since 5.14, with the /r flag to return the substitution, you can do this:
print "Your name without spaces is: [", do { $name =~ s/\s+//gr; }
, "]\n";
You can use map and a lexical variable.
my $name=" jason ";
print "Your name without spaces is: ["
, ( map { my $a = $_; $a =~ s/\s+//g; $a } ( $name ))
, "]\n";
Now, you have to use a lexical because $_ will alias and thus modify your variable.
The output is
Your name without spaces is: [jason]
# but: $name still ' jason '
Admittedly do will work just as well (and perhaps better)
print "Your name without spaces is: ["
, do { my ( $a = $name ) =~ s/\s+//g; $a }
, "]\n";
But the lexical copying is still there. The assignment within in the my is an abbreviation that some people prefer (not me).
For this idiom, I have developed an operator I call filter:
sub filter (&#) {
my $block = shift;
if ( wantarray ) {
return map { &$block; $_ } #_ ? #_ : $_;
}
else {
local $_ = shift || $_;
$block->( $_ );
return $_;
}
}
And you call it like so:
print "Your name without spaces is: [", ( filter { s/\s+//g } $name )
, "]\n";
print "Your name without spaces is: #{[map { s/\s+//g; $_ } $name]}\n";
Related
I am trying to parse through a simple enough file of field and value pairs.
So some fields I am not interested in and I want to skip
So in my "play" code I had a static thing like this:
next if $field =~ m/fieldToIgnore1|fieldToIgnore2/;
... then I extended this an an array and still happy
print "== using ~~ ==\n";
foreach my $field (#fields) {
next if $field ~~ #foni;
print "$field\n";
}
(fnoi == fields not of interest)
But when I carry that over back into my non-play setup it doesn't work.
Now in the play I was just looping over
my #fields = ("field1", "field2");
my #foni = ("fieldToIgnore1", "fieldToIgnore1");
In my proper code I go through each line and take out the lines that are setup like field - value lines and then strip out the field into a scalar... hence why I thought it would the same idea as my play code - but it doesn't seem to be
while ( <$infile> ) {
if ( /^PUBLISH:/ ) {
( $symbol, $record_type ) = ( $1, $2 );
print "symbol is: [$symbol]\n";
} else {
my ( $field, $value ) = split(/\|/);
next unless $value;
print "field is: [$field]\n";
print "value is: [$value]\n";
$field =~ s/^\s+|\s+$//g;
$value =~ s/^\s+|\s+$//g;
print "... field is: [$field]\n";
print "... value is: [$value]\n";
## ADD FIELD SKIPPING LOGIC HERE
You can build a regex pattern from your array, like this
my $re = join '|', #foni;
$re = qr/$re/; # Compile the regex
for my $field (#fields) {
next if $field =~ $re;
...
}
I'm tying to come up with some regex that will remove all space chars from a string as long as it's not inside of double quotes (").
Example string:
some string with "text in quotes"
Result:
somestringwith"text in quotes"
So far I've come up with something like this:
$str =~ /"[^"]+"|/g;
But it doesn't seem to be giving the intended result.
I'm honestly very new at perl and haven't had too much regexp experience. So if anyone willing to answer would also be willing to provide some insight into the why and how that would be great!
Thanks!
EDIT
String will not contain escaped "'s
It should actually always be formatted like this:
Some.String = "Some Value"
Result would be
Some.String="Some Value"
Here is a technique using split to separate the quoted strings. It relies on your data being consistent and will not work with loose quotes.
use strict;
use warnings;
my #line = split /("[^"]*")/;
for (#line) {
unless (/^"/) {
s/[ \t]+//g;
}
}
print #line; # line is altered
Basically, you split up the string in order to isolate the quoted strings. Once that is done, perform the substitution on all other strings. Since the array elements are aliased in the loop, substitutions are performed on the actual array.
You can run this script like so:
perl -n script.pl inputfile
To see the output. Or
perl -n -i.bak script.pl inputfile
To do in-place edit on inputfile, while saving backup in inputfile.bak.
With that said, I'm not sure what your edit means. Do you want to change
Some.String = "Some Value"
to
Some.String="Some Value"
Text::ParseWords is tailor-made for this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords;
my #strings = (
q{This.string = "Hello World"},
q{That " string " and "another shoutout to my bytes"},
);
for my $s ( #strings ) {
my #words = quotewords '\s+', 1, $s;
print join('', #words), "\n";
}
Output:
This.string="Hello World"
That" string "and"another shoutout to my bytes"
Using Text::ParseWords means if you ever had to deal with quoted strings with escaped quotation marks in them, you'd be ready ;-)
Also, this sounds like you have a configuration file of some sort and you're trying to parse it. If that is the case, there are probably better solutions.
I suggest removing the quoted substrings using split and then recombining them with join after removing whitespace from the intermediate text.
Note that if the regex used for split contains captures then the captured values will also be included in the list returned.
Here's some sample code.
use strict;
use warnings;
my $source = <<END;
Some.String = "Some Value";
Other.String = "Other Value";
Last.String = "Last Value";
END
print join '', map {s/\s+// unless /"/; $_; } split /("[^"]*")/, $source;
output
Some.String= "Some Value";Other.String = "Other Value";Last.String = "Last Value";
I would simply loop through the string char by char. This way you can handle escaped strings too (just add an isEscaped variable).
my $text='lala "some thing with quotes " lala ... ';
my $quoteOpen = 0;
my $out;
foreach $char(split//,$text) {
if ($char eq "\"" && $quoteOpen==0) {
$quoteOpen = 1;
$out .= $char;
} elsif ($char eq "\"" && $quoteOpen==1) {
$quoteOpen = 0;
$out .= $char;
} elsif ($char =~ /\s/ && $quoteOpen==1) {
$out .= $char;
} elsif ($char !~ /\s/) {
$out .= $char;
}
}
print "$out\n";
Splitting on double quotes, removing spaces only from even fields (i.e. those in quotes):
sub remove_spaces {
my $string = shift;
my #fields = split /"/, $string . ' '; # trailing space needed to keep final " in output
my $flag = 1;
return join '"', map { s/ +//g if $flag; $flag = ! $flag; $_} #fields;
}
It can be done with regex:
s/([^ ]*|\"[^\"]*\") */$1/g
Note that this won't handle any kind of escapes inside the quotes.
I would like to convert parse (la)tex math expressions, and convert them to (any kind of!) scripting language expression, so I can evaluate expressions.
What libraries do you recommend ?
May be it will help - take a look at TeXmacs, especially at a way it interacts with computer algebra systems.
Here is a set of possible options from a similar question. https://tex.stackexchange.com/questions/4223/what-parsers-for-latex-mathematics-exist-outside-of-the-tex-engines
I think that Perl would make a fine choice for something like this, acting on text is one of its fortes.
Here is some info on how to make an exclusive flip-flop test (to find the context between \begin{} and \end{} without keeping those lines), http://www.effectiveperlprogramming.com/2010/11/make-exclusive-flip-flop-operators/
EDIT: So this problem has started me going. Here is a first attempt to create something here is my "math.pl" which takes a .tex file as an arguement (i.e. $./math.pl test.tex).
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;
my $re_num = qr/[+\-\dE\.]/;
my $file = shift;
open( my $fh, '<', $file);
#parsing this out for more than just the equation environment might be easier using Text::Balanced too.
my #equations;
my $current_equation = '';
while(<$fh>) {
my $test;
next unless ($test = /\\begin\{equation\}/ .. /\\end\{equation\}/);
if ($test !~ /(^1|E0)$/ ) {
chomp;
$current_equation .= $_;
} elsif ($test =~ /E0$/) {
#print $current_equation . "\n";
push #equations, {eq => $current_equation};
$current_equation = '';
}
}
foreach my $eq (#equations) {
print "Full Equation: " . $eq->{'eq'} . "\n";
solve($eq);
print "Result: " . $eq->{'value'} . "\n\n";
}
sub solve {
my $eq = shift;
print $eq->{'eq'} . "\n";
parse($eq);
compute($eq);
print "intermediate result: " . $eq->{'value'} . "\n";
}
sub parse {
my $eq = shift;
my ($command,#fields) = extract_multiple(
$eq->{'eq'}, [ sub { extract_bracketed(shift,'{}') } ]
);
$command =~ s/^\\//;
print "command: " . $command . "\n";
#fields = map { s/^\{\ *//; s/\ *\}$//; print "arg: $_\n"; {value => $_}; } #fields;
($eq->{'command'}, #{ $eq->{'args'} }) = ($command, #fields);
}
sub compute {
my ($eq) = #_;
#check arguements ...
foreach my $arg (#{$eq->{'args'}}) {
#if arguement is a number, continue
if ($arg->{'value'} =~ /^$re_num$/) {
next;
#if the arguement is a simple mathematical operation, do it and continue
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\ |\*|\\times)?\ *($re_num)$/) {
$arg->{'value'} = $1 * $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\+)?\ *($re_num)$/) {
$arg->{'value'} = $1 + $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\-)?\ *($re_num)$/) {
$arg->{'value'} = $1 - $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\/)?\ *($re_num)$/) {
$arg->{'value'} = $1 / $2;
} else {
#parse it and calc it as if it were its own equation.
$arg->{'eq'} = $arg->{'value'};
solve($arg);
}
}
my #args = #{$eq->{'args'}};
## add command processing here
# frac
if ($eq->{'command'} eq 'frac') {
$eq->{'value'} = $args[0]->{'value'} / $args[1]->{'value'};
return;
}
}
and here is a sample test.tex:
\documentclass{article}
\begin{document}
Hello World!
\begin{equation}
\frac{\frac{1}{3}}{2}
\end{equation}
\end{document}
Maybe using boost::spirit in order to tokenize the expression. You will need to define a huge grammar!
Use a parser generator to create an appropriate parser. Try ANTLR for this, as it includes an IDE for the Grammar, which is very helpful. Using tree rewrite rules, you can then convert the parse tree to an abstract syntax tree.
Start perhaps with the expression evaluator from ANTLR tutorial. I think this is reasonably close enough.
I'm trying to make an on-the-fly pattern tester in Perl.
Basically it asks you to enter the pattern, and then gives you a >>>> prompt where you enter possible matches. If it matches it says "%%%% before matched part after match" and if not it says "%%%! string that didn't match". It's trivial to do like this:
while(<>){
chomp;
if(/$pattern/){
...
} else {
...
}
}
but I want to be able to enter the pattern like /sometext/i rather than just sometext
I think I'd use an eval block for this? How would I do such a thing?
This sounds like a job for string eval, just remember not to eval untrusted strings.
#!/usr/bin/perl
use strict;
use warnings;
my $regex = <>;
$regex = eval "qr$regex" or die $#;
while (<>) {
print /$regex/ ? "matched" : "didn't match", "\n";
}
Here is an example run:
perl x.pl
/foo/i
foo
matched
Foo
matched
bar
didn't match
^C
You can write /(?i:<pattern>)/ instead of /<pattern>/i.
This works for me:
my $foo = "My bonnie lies over the ocean";
print "Enter a pattern:\n";
while (<STDIN>) {
my $pattern = $_;
if (not ($pattern =~ /^\/.*\/[a-z]?$/)) {
print "Invalid pattern\n";
} else {
my $x = eval "if (\$foo =~ $pattern) { return 1; } else { return 0; }";
if ($x == 1) {
print "Pattern match\n";
} else {
print "Not a pattern match\n";
}
}
print "Enter a pattern:\n"
}
I am reading each line of an input file (IN) and printing the line read to an output file (OUT) if the line begins with one of the patterns, say "ab", "cd","ef","gh","ij" etc. The line printed is of form "pattern: 100" or form "pattern: 100:200". I need to replace "pattern" with "myPattern", i.e. print the current line to FILE but replace all the text before the first occurence of ":" with "myPattern". What is the best way to do this?
Currently I have:
while ( <IN> )
{
print FILE if /^ab:|^bc:|^ef:|^gh:/;
}
I am not sure if substr replacement would help as "pattern" can be either "ab" or"cd" or "ef" or "gh" etc.
Thanks!
Bi
Generically, do this like:
my %subst = ( 'ab' => 'newab', 'bc' => 'newbc', 'xy' => 'newxy' );
my $regex = join( '|', map quotemeta, sort { length($b) <=> length($a) } keys %subst );
$regex = qr/^($regex):/;
while ( <IN> ) {
print FILE if s/$regex/$subst{$1}:/;
}
The sort puts the longest ones first, so that if the data has ab:: and both ab and ab: are being substituted, ab: is used instead of ab.
Perl's substitution operator by default (a) uses the first match, (b) only replaces one match and (c) returns true if a replacement was made and false if it wasn't.
So:
while ( <IN> )
{
if (s/<pattern1>:/<replace1>/ ||
s/<pattern2>:/<replace2>/) {
print FILE;
}
}
Should work for you. Note that because of short-circuiting, only one substitution will be made.
sub replacer {
$line = shift;
$find = shift;
$replace = shift;
$line =~ /([^:]+):/
if ($1 =~ /$find/) {
$line =~ s/([^:]+):/$replace/ ;
return $line;
}
return ;
}
while (<IN>)
{
print OUT replacer ($_,"mean","variance");
print OUT replacer ($_,"pattern","newPattern");
}
My perl is a little rusty, so syntax might not be exact.
edit: Put it in a function for ya.
while ( <IN> )
{
s/^pattern:/myPattern:/;
print OUT
}
This might be what you want:
$expr = "^(ab)|(cd)|(ef)|(gh)|(ij)";
while (<IN>)
{
if (/$expr:/)
{
s/$expr/$myPattern/;
print FILE;
}
}
The shortest way to do what you ask above is to re-use your code, but include a substitution.
while ( <IN> )
{
print FILE if s/^(ab|bc|ef|gh):/MyPattern:/;
}
Any of the left hand side patterns will be replaced. If the left hand side does not match, nothing will be printed.