Evaluating math in regsub - regex

I need to convert a string from 2 dimensional array to 1 dimension in Tcl.
Eg
lane_out[0][7] -> lane_out[7]
lane_out[1][0] -> lane_out[8]
The initial string is read from a file, modified and stored in another file. Also, the input file has a lot of different kinds of strings that need replacement so the user provides the find and replace regex in another file.
I have successfully done this for simple strings that required no additional calculations. But I need help in executing expressions. I was thinking that I could have the user provide the expression in the file and execute that but I have not been successful.
User Input File:
lappend userinput {\[(\d+)]\*$ *}
lappend userinput {\[(\d+)]\[(\d+)]$ [expr{\1*8+\2}]}
My broken code:
set line "lane_out[1][0]"
foreach rule $userinput {
foreach {find replace} [regexp -all -inline {\S+} $rule] break
regsub -all $find $line $replace line
set match [regexp -all -inline {expr{.*}} $line] #match = expr{1*8+0}
set val [$match] #supposed to execute what is in match
regsub expr{.*} $line $val line
}

What you are doing is fairly involved: you have an input file and a set of search/replace rules, and you produce some output. The rules, however, require calling expr.
Here is a made-up data file (data.txt):
lane_out[0][7] = "foo bar"
lane_out[1][0] = "foo bar"
lane_out[1][1] = "foo bar"
lane_out[2][0] = "foo bar"
lane_out[2][1] = "foo bar"
And the rules file (rules.txt):
{\[(\d+)\]\[(\d+)\]} {\[[expr {\1 * 8 + \2}]\]}
Here is my script (search_replace.tcl):
package require Tclx
# Read the rules file
set rules {}
for_file line rules.txt {
lassign $line find replace
lappend rules $find $replace
}
# Read the data, apply rules
for_file line data.txt {
foreach {find replace} $rules {
regsub -all $find $line $replace line
set line [subst $line]
puts $line
}
}
Output:
lane_out[7] = "foo bar"
lane_out[8] = "foo bar"
lane_out[9] = "foo bar"
lane_out[16] = "foo bar"
lane_out[17] = "foo bar"
Discussion
Format of rules.txt: each line is a Tcl list of two braced elements, the search expression and the replacement expression, separated by a space.
The code will translate a line such as
lane_out[2][1] = "foo bar"
to:
lane_out\[[expr {2 * 8 + 1}]\] = "foo bar"
Then, the subst command replaces that with:
lane_out[17] = "foo bar"
The tricky part is to escape the square brackets so that expr can do the right thing.
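For comparison, the same index arithmetic can be done without evaluating generated text at all, by giving the substitution a callback. Below is a minimal sketch in Python rather than Tcl; the names flatten_indices, INDEX_PAIR and WIDTH are made up for illustration, and the width of 8 is taken from the example in the question.

```python
import re

# Assumption: the second dimension has 8 entries, as in the question's example.
WIDTH = 8
# Matches an adjacent pair of numeric indices such as [2][1].
INDEX_PAIR = re.compile(r"\[(\d+)\]\[(\d+)\]")

def flatten_indices(line, width=WIDTH):
    """Rewrite name[i][j] as name[i*width + j] everywhere in line."""
    def repl(match):
        row = int(match.group(1))
        col = int(match.group(2))
        return "[%d]" % (row * width + col)
    return INDEX_PAIR.sub(repl, line)

print(flatten_indices('lane_out[2][1] = "foo bar"'))
```

Because the arithmetic happens inside the callback, nothing else in the data line is ever evaluated, so stray brackets or dollar signs in the input are left alone.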

Related

TCL--Regexp matching not happening properly

set cell "HEADBUFTIE42D_D3_N"
set postfix "_M7P5TR_C60L08"
set line " cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) { "
if {[regexp "^ +cell[(]$cell$postfix[)] *\\{" $line match]} {
puts "hello"
}
Here I am trying to match the line
cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) {
Note that there are 2 spaces at the beginning and 1 space after {.
But the match does not happen. Please help.
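The pattern is in double quotes, so Tcl performs its own substitutions on it before regexp ever sees it: [(] and [)] are parsed as command substitutions, which try to run commands named ( and ). It's easier to keep the regexp metacharacters inside braces and build the pattern with format, like so: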
set cell "HEADBUFTIE42D_D3_N"
set postfix "_M7P5TR_C60L08"
set line " cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) { "
set re [format {^ +cell\(%s%s\) *\{} $cell $postfix]
if {[regexp $re $line match]} {
puts "hello"
}
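The same idea (a fixed, literally written template with the variable parts spliced in) looks like this in Python, shown only as a sketch for comparison. The helper name build_cell_pattern is hypothetical; re.escape is used on the assumption that the cell name should always be matched literally, which the Tcl version above does not enforce.

```python
import re

def build_cell_pattern(cell, postfix):
    """Compile a pattern for: leading spaces, cell(<name>), optional spaces, {."""
    # Escape the name so characters such as . or + in it are matched literally.
    name = re.escape(cell + postfix)
    return re.compile(r"^ +cell\(" + name + r"\) *\{")

cell = "HEADBUFTIE42D_D3_N"
postfix = "_M7P5TR_C60L08"
line = "  cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) { "

pattern = build_cell_pattern(cell, postfix)
print(bool(pattern.search(line)))
```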

Regex in Tcl not working

I am trying to match two strings. One of the strings comes from a list and the other one is declared by me.
set name " HTTP REQUEST = 1\n HTTP REQUEST(SUCCESS) = 0\nSERVER CONN = 1"
set pattern "HTTP REQUEST(SUCCESS)*"
set List [split $name "\n"]
foreach var $List {
set var [lindex $List 1]
#set var2 [string trim $var1 " "]
}
puts $var
if {[regexp $var $pattern match]} {
puts " matched!"
puts $match
} else {
puts " not matched!"
}
There are two errors:
Parentheses in the pattern must be escaped with literal backslashes.
In a regexp call, the pattern comes first and the input text goes after it.
So use
set pattern {HTTP REQUEST\(SUCCESS\)}
and then
if {[regexp $pattern $var match]} {

Grepping second pattern after matching first pattern

Is there any grep/sed option which will allow me to match a pattern after matching another pattern? For example: Input file (foos are variable patterns starting with 0 mixed with random numbers preceded by # in front):
0foo1
0foo2
0foo3
\#89888
0foo4
0foo5
\#98980
0foo6
So once I try to search for a variable pattern (eg. foo2), I also want to match another pattern (eg, #number) from this pattern line number, in this case, #89888.
Therefore output for variable foo2 must be:
foo2 #89888
For variable foo5:
foo5 #98980
The foos may contain any character, including ones that may be treated as regex metacharacters.
I tried a basic regex match script using tcl which will first search for foo* and then search for next immediate #, but since I am working with a very large file, it will take days to finish. Any help is appreciated.
A Perl one-liner to slurp the whole file and match across any newlines for the pattern you seek would look like:
perl -0777 -ne 'm{(foo2).*(\#89888)}s and print join(" ",$1,$2), "\n"' file
The -0777 switch enables "slurp" mode, which tells Perl not to split the file into records but to treat it as one large string (-00, by contrast, is paragraph mode, which splits on blank lines). The s modifier lets . match any character, including a newline.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my ( %matches, $recent_foo );
while(<DATA>)
{
chomp;
( $matches{$recent_foo} ) = $1 if m/(\\#\d+)/;
( $recent_foo ) = $1 if m/(0foo\d+)/;
}
print Dumper( \%matches );
__DATA__
0foo1
0foo2
0foo3
\#89888
0foo4
0foo5
\#98980
0foo6
Output:
$VAR1 = {
'0foo5' => '\\#98980',
'0foo3' => '\\#89888'
};
If what you want is 0foo1, 0foo2 and 0foo3 to all have the same value the following will do:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my ( %matches, @recent_foo );
while(<DATA>)
{
chomp;
if (/^\\#/)
{
@matches{@recent_foo} = ($') x @recent_foo;
undef @recent_foo;
}
elsif (/^0/)
{
push @recent_foo, $';
}
}
print Dumper( \%matches );
__DATA__
0foo1
0foo2
0foo3
\#89888
0foo4
0foo5
\#98980
0foo6
gives:
$VAR1 = {
'foo2' => '89888',
'foo1' => '89888',
'foo5' => '98980',
'foo3' => '89888',
'foo4' => '98980'
};
Var='foo2'
sed "#n
/${Var}/,/#[0-9]\{1,\}/ {
H
/#[0-9]\{1,\}/ !d
s/.*//;x
s/.//;s/\n.*\\n/ /p
q
}" YourFile
The request is not entirely clear. This takes the range from the first occurrence of your pattern (foo2) up to the first #number, removes the lines in between, prints the two remaining lines as one line, then quits (nothing else is extracted).
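A Tcl solution. The procedure runs in a little over 3 microseconds, so you'll need very large data files to have it run for days. If more than one token matches, the first match is used (it's easy to rewrite the procedure to return all matches). Note that it returns the matching element together with the element right after it, so it yields the #number only when the match is the last foo before that number.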
set data {
0foo1
0foo2
0foo3
\#89888
0foo4
0foo5
\#98980
0foo6
}
proc find {data pattern} {
set idx [lsearch -regexp $data $pattern]
if {$idx >= 0} {
lrange $data $idx $idx+1
}
}
find $data 0foo3
# -> 0foo3 #89888
find $data 0f.*5
# -> 0foo5 #98980
Documentation: if, lrange, lsearch, proc, set
sed
sed -n '/foo2/,/#[0-9]\+/ {s/^[[:space:]]*[0\\]//; p}' file |
sed -n '1p; $p' |
paste -s
The first sed prints all the lines between the first pattern and the 2nd, removing optional leading whitespace and the leading 0 or \.
The second sed extracts only the first and last lines.
The paste command prints the 2 lines as a single line, separated with a tab.
awk
awk -v p1=foo5 '
$0 ~ p1 {found = 1}
found && /#[0-9]+/ { sub(/^\\/, ""); print p1, $0; exit }
' file
tcl
lassign $argv filename pattern1
set found false
set fid [open $filename r]
while {[gets $fid line] != -1} {
if {[string match "*$pattern1*" $line]} {
set found true
}
if {$found && [regexp {#\d+} $line number]} {
puts "$pattern1 $number"
break
}
}
close $fid
Then
$ tclsh 2patt.tcl file foo4
foo4 #98980
Is this what you want?
$ awk -v tgt="foo2" 'index($0,tgt){f=1} f&&/#[0-9]/{print tgt, $0; exit}' file
foo2 \#89888
$ awk -v tgt="foo5" 'index($0,tgt){f=1} f&&/#[0-9]/{print tgt, $0; exit}' file
foo5 \#98980
I'm using index() above as it searches for a string not a regexp and so could not care less what RE metacharacters are in foo - they are all just literal characters in a string.
It's not clear from your question if you want to find a specific number after a specific foo or the first number after foo2 or even if you want to search for a specific foo value or all "foo"s or...
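Assuming the intended reading is "the first #number at or after the first line containing the given foo", the streaming approach used by the awk and Tcl answers can be sketched in Python as follows. The function name first_number_after is made up for this example; the target is searched as a plain substring, so metacharacters in it need no escaping.

```python
import re

NUMBER = re.compile(r"#\d+")

def first_number_after(lines, target):
    """Return (target, number) for the first #number found at or after
    the first line containing target, or None if there is none."""
    found = False
    for line in lines:
        # Plain substring test: target is never interpreted as a regex.
        if not found and target in line:
            found = True
        if found:
            match = NUMBER.search(line)
            if match:
                return target, match.group(0)
    return None

sample = ["0foo1", "0foo2", "0foo3", r"\#89888",
          "0foo4", "0foo5", r"\#98980", "0foo6"]
print(first_number_after(sample, "foo2"))
```

The function only iterates once and stops at the first hit, so an open file object can be passed instead of a list to avoid loading a large file into memory.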

How to replace two or more strings in a file in tcl

I am reading a file and trying to replace 3 strings in different lines using regsub.
Input file :
This is a bus
This is a car
This is a bike
Output Expected:
This is a Plane
This is a Scooter
This is a Bicycle
If I use
puts $out [regsub -all "( bus)" $line "\ $x" ]
puts $out [regsub -all "( car)" $line "\ $y" ]
puts $out [regsub -all "( bike)" $line "\ $z" ]
I am calling this from a proc whose arguments x, y, z are Plane, Scooter, Bicycle.
But this prints every line 3 times. How can I replace all three strings?
You can also use string map to replace strings:
string map {{ bus} { Plane} { car} { Scooter} { bike} { Bicycle}} $input_string
The arguments are a list of pairs of "find" "replace" strings and then your input string...
BTW. With the regsub method, you can nest the regsubs, so that the result of one becomes the input of the other e.g. with two: regsub -all { bus} [regsub -all { car} $input_string { Scooter}] { Plane} it isn't very readable though!
Also note that you don't need to capture the group with parentheses in your expression: "( car)" would do an extra sub-group capture that you don't actually use... { car} is better...
The clearest way is to write the line to a variable in-between each substitution. Writing back to the variable it came from is quite often the easiest approach. You can then print the result out once at the end.
set line [regsub -all "( bus)" $line "\ $x"]
set line [regsub -all "( car)" $line "\ $y"]
set line [regsub -all "( bike)" $line "\ $z"]
puts $out $line
If you read the file line by line, you can use an if command to pick the substitution that applies to each line.
while { [ gets $fh line ] >= 0} {
if {[regexp -all -- { bus} $line]} {
puts $out [regsub -all "( bus)" $line "\ $x" ]
} elseif {[regexp -all -- { car} $line]} {
puts $out [regsub -all "( car)" $line "\ $y" ]
} else {
puts $out [regsub -all "( bike)" $line "\ $z" ]
}
}
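The common thread in these answers is that each substitution must feed into the next one, and the line is written out only once at the end. A minimal Python sketch of that pattern, for comparison (replace_all and the pair list are illustrative names, and plain literal replacement is assumed rather than regular expressions):

```python
# Each pair is (text to find, replacement), as in the string map answer.
replacements = [(" bus", " Plane"), (" car", " Scooter"), (" bike", " Bicycle")]

def replace_all(line, pairs):
    """Apply every replacement in turn to the same line and return the result."""
    for find, repl in pairs:
        # The output of one replacement is the input of the next.
        line = line.replace(find, repl)
    return line

lines = ["This is a bus", "This is a car", "This is a bike"]
for line in lines:
    # One output line per input line, however many pairs there are.
    print(replace_all(line, replacements))
```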

How can I substitute one substring for another in Perl?

I have a file and a list of string pairs which I get from another file. I need substitute the first string of the pair with the second one, and do this for each pair.
Is there a more efficient/simple way to do this (using Perl, grep, sed or other) than running a separate regexp substitution for each pair of values?
#! /usr/bin/perl
use warnings;
use strict;
my %replace = (
"foo" => "baz",
"bar" => "quux",
);
my $to_replace = qr/@{["(" .
join("|" => map quotemeta($_), keys %replace) .
")"]}/;
while (<DATA>) {
s/$to_replace/$replace{$1}/g;
print;
}
__DATA__
The food is under the bar in the barn.
The @{[...]} bit may look strange. It's a hack to interpolate generated content inside quote and quote-like operators. The result of the join goes inside the anonymous array-reference constructor [] and is immediately dereferenced thanks to @{}.
If all that seems too wonkish, it's the same as
my $search = join "|" => map quotemeta($_), keys %replace;
my $to_replace = qr/($search)/;
minus the temporary variable.
Note the use of quotemeta—thanks Ivan!—which escapes the first string of each pair so the regular-expression engine will treat them as literal strings.
Output:
The bazd is under the quux in the quuxn.
Metaprogramming—that is, writing a program that writes another program—is also nice. The beginning looks familiar:
#! /usr/bin/perl
use warnings;
use strict;
use File::Compare;
die "Usage: $0 path ..\n" unless @ARGV >= 1;
# stub
my @pairs = (
["foo" => "baz"],
["bar" => "quux"],
['foo$bar' => 'potrzebie\\'],
);
Now we generate the program that does all the s/// replacements—but is quotemeta on the replacement side a good idea?—
my $code =
"sub { while (<>) { " .
join(" " => map "s/" . quotemeta($_->[0]) .
"/" . quotemeta($_->[1]) .
"/g;",
@pairs) .
"print; } }";
#print $code, "\n";
and compile it with eval:
my $replace = eval $code
or die "$0: eval: $@\n";
To do the replacements, we use Perl's ready-made in-place editing:
# set up in-place editing
$^I = ".bak";
my @save_argv = @ARGV;
$replace->();
Below is an extra nicety that restores backups that the File::Compare module judges to have been unnecessary:
# in-place editing is conservative: it creates backups
# regardless of whether it modifies the file
foreach my $new (@save_argv) {
my $old = $new . $^I;
if (compare($new, $old) == 0) {
rename $old => $new
or warn "$0: rename $old => $new: $!\n";
}
}
There are two ways. Both of them require you to compile a regex alternation on the keys of the table, with longer keys sorted first so that they win over their own prefixes:
my %table = qw<The A the a quick slow lazy dynamic brown pink . !>;
my $alt
= join( '|'
, map { quotemeta }
sort { ( length $b <=> length $a ) || $a cmp $b }
keys %table
)
;
my $keyword_regex = qr/($alt)/;
Then you can use this regex in a substitution:
my $text
= <<'END_TEXT';
The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.
The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.
END_TEXT
$text =~ s/$keyword_regex/$table{ $1 }/ge; # <- 'e' means execute code
Or you can do it in a loop:
use English qw<@LAST_MATCH_START @LAST_MATCH_END>;
while ( $text =~ /$keyword_regex/g ) {
my $key = $1;
my $rep = $table{ $key };
# use the 4-arg form
substr( $text, $LAST_MATCH_START[1]
, $LAST_MATCH_END[1] - $LAST_MATCH_START[1], $rep
);
# reset the position to start + new actual
pos( $text ) = $LAST_MATCH_START[1] + length $rep;
}
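For readers more at home in Python, the single-pass alternation approach can be sketched like this. It is only an illustration of the technique, not a translation of the code above; make_replacer is a made-up name, and the table is assumed to be non-empty.

```python
import re

def make_replacer(table):
    """Build a function that replaces every key of table in one pass."""
    # Longest keys first, so a key is never shadowed by one of its prefixes.
    keys = sorted(table, key=lambda k: (-len(k), k))
    # Escape each key so it is matched as a literal string.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)

table = {"The": "A", "the": "a", "quick": "slow",
         "lazy": "dynamic", "brown": "pink", ".": "!"}
replace = make_replacer(table)
print(replace("The quick brown fox jumped over the lazy dog."))
```

Since the text is scanned once, a replacement value is never itself rescanned and replaced again, which can happen when substitutions are applied one pair at a time.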
Build a hash of the pairs. Then split the target string into word tokens, and check each token against the keys in the hash. If it's present, replace it with the value of that key.
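A minimal sketch of that token-based idea, in Python for illustration. It assumes a "word token" means a run of word characters (\w+); replace_words is a hypothetical helper name.

```python
import re

def replace_words(text, table):
    """Replace whole-word tokens that appear as keys in table."""
    # The capturing group keeps the separators, so the text can be rebuilt exactly.
    tokens = re.split(r"(\W+)", text)
    return "".join(table.get(token, token) for token in tokens)

pairs = {"foo": "baz", "bar": "quux"}
print(replace_words("The food is under the bar in the barn.", pairs))
```

Unlike the alternation regex, this only replaces whole words, so food and barn are left untouched here.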
If eval is not a security concern:
eval $(awk 'BEGIN { printf "sed \047"} {printf "%s", "s/\\<" $1 "\\>/" $2 "/g;"} END{print "\047 substtemplate"}' substwords )
This constructs a long sed command consisting of multiple substitution commands. It's subject to potentially exceeding your maximum command line length. It expects the word pair file to consist of two words separated by whitespace on each line. Substitutions will be made for whole words only (no clbuttic substitutions).
It may choke if the word pair file contains characters that are significant to sed.
You can do it this way if your sed insists on -e:
eval $(awk 'BEGIN { printf "sed"} {printf "%s", " -e \047s/\\<" $1 "\\>/" $2 "/g\047"} END{print " substtemplate"}' substwords)