Tcl: regexp matching not happening properly

set cell "HEADBUFTIE42D_D3_N"
set postfix "_M7P5TR_C60L08"
set line " cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) { "
if {[regexp "^ +cell[(]$cell$postfix[)] *\\{" $line match]} {
puts "hello"
}
Here I am trying to match the line
cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) {
Note that there are 2 spaces at the beginning and 1 space after the {.
But the match does not happen. Please help.

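You are not escaping the regexp metacharacters properly. Inside a double-quoted string Tcl still performs command substitution, so [(] tries to execute a command named (, the { that follows \\ is not escaped as far as Tcl's brace matching is concerned, and so on. It's easier to build the pattern with format and a brace-quoted template, like so: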
set cell "HEADBUFTIE42D_D3_N"
set postfix "_M7P5TR_C60L08"
set line " cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) { "
set re [format {^ +cell\(%s%s\) *\{} $cell $postfix]
if {[regexp $re $line match]} {
puts "hello"
}
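If the variable parts could ever contain regexp metacharacters themselves, they can be escaped before being spliced into the pattern. The following is only a sketch: `re_quote` is a hypothetical helper name (not a built-in), and the brace in the sample line is written as `\{` so the snippet stays brace-balanced wherever it is pasted.

```tcl
# Hypothetical helper: backslash-escape every non-word character so the
# string is matched literally when embedded in a regular expression.
proc re_quote {s} {
    regsub -all {\W} $s {\\&} s
    return $s
}

set cell    "HEADBUFTIE42D_D3_N"
set postfix "_M7P5TR_C60L08"
set line    "  cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) \{ "

# The template is brace-quoted, so Tcl performs no substitution on it;
# only format's %s placeholders are filled in.
set re [format {^ +cell\(%s%s\) *\{} [re_quote $cell] [re_quote $postfix]]

if {[regexp -- $re $line match]} {
    puts "hello"
}
```

The `--` before the pattern is a defensive habit: it stops `regexp` from treating a pattern that happens to begin with `-` as an option.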

Related

Dynamic regular expression for nesting brackets failed due to unknown bugs

Recently I ran into a strange bug when using a dynamic regular expression in Perl to match nested brackets. The original string is " {...test{...}...} ", and I want to grab the brace pair beginning with test, i.e. "test{...}". There may be many pairs of braces before and after this group, and I don't know how deeply they are nested.
Following is my match scripts: nesting_parser.pl
#! /usr/bin/env perl
use Getopt::Long;
use Data::Dumper;
my %args = @ARGV;
if(exists$args{'-help'}) {printhelp();}
unless ($args{'-file'}) {printhelp();}
unless ($args{'-regex'}) {printhelp();}
my $OpenParents;
my $counts;
my $NestedGuts = qr {
(?{$OpenParents = 0})
(?>
(?:
[^{}]+
| \{ (?{$OpenParents++;$counts++; print "\nLeft:".$OpenParents." ;"})
| \} (?(?{$OpenParents ne 0; $counts++}) (?{$OpenParents--;print "Right: ".$OpenParents." ;"})) (?(?{$OpenParents eq 0}) (?!))
)*
)
}x;
my $string = `cat $args{'-file'}`;
my $partten = $args{'-regex'} ;
print "####################################################\n";
print "Grep [$partten\{...\}] from $args{'-file'}\n";
print "####################################################\n";
while ($string =~ /($partten$NestedGuts)/xmgs){
print $1."}\n";
print $2."####\n";
}
print "Regex has seen $counts brackts\n";
sub printhelp{
print "Usage:\n";
print "\t./nesting_parser.pl -file [file] -regex '[regex expression]'\n";
print "\t[file] : file path\n";
print "\t[regex] : regex string\n";
exit;
}
Actually my regex is:
our $OpenParents;
our $NestedGuts = qr {
(?{$OpenParents = 0})
(?>
(?:
[^{}]+
| \{ (?{$OpenParents++;})
| \} (?(?{$OpenParents ne 0}) (?{$OpenParents--})) (?(?{$OpenParents eq 0}) (?!))
)*
)
}x;
I have added brace counting in nesting_parser.pl.
I also wrote a string generator for debugging: gen_nesting.pl
#! /usr/bin/env perl
use strict;
my $buffer = "{{{test{";
unless ($ARGV[0]) {print "Please specify the nest pair number!\n"; exit}
for (1..$ARGV[0]){
$buffer.= "\n\{\{\{\{$_\}\}\}\}";
#$buffer.= "\n\{\{\{\{\{\{\{\{\{$_\}\}\}\}\}\}\}\}\}";
}
$buffer .= "\n\}}}}";
open TEXT, ">log_$ARGV[0]";
print TEXT $buffer;
close TEXT;
You can generate a test file by
./gen_nesting.pl 1000
It will create a log file named log_1000, which contains 1000 lines of brace pairs.
Now we test our match scripts:
./nesting_parser.pl -file log_1000 -regex "test" > debug_1000
debug_1000 looks perfect: everything matched successfully. But when I generate a 4000-line test log file and match it again, it seems to crash:
./gen_nesting.pl 4000
./nesting_parser.pl -file log_4000 -regex "test" > debug_4000
The end of debug_4000 shows
{{{{3277}
####
Regex has seen 26213 brackts
I don't know what's wrong with the regular expression. Mostly it works well for paired brackets, until recently I found it crashed when I tried to match a text file of more than 600,000 lines.
I'm really confused by this problem and hope to solve it.
Thank you all!
First for matching nested brackets I normally use Regexp::Common.
Next, I'm guessing that your problem is that Perl's regular expression engine breaks after matching 32767 groups. You can verify this by turning on warnings and looking for a message like Complex regular subexpression recursion limit (32766) exceeded.
If so, you can rewrite your code using /g and \G and pos. The idea being that you match the brackets in a loop like this untested code:
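my $start = pos($string) // 0;   # pos() is undef before the first /g match
my $open_brackets = 0;
my $failed;
while (0 < $open_brackets or $start == (pos($string) // 0)) {
    # \G anchors at the current pos(); skip non-braces, then capture one brace
    if ($string =~ m/\G[^{}]*(\{|\})/g) {
        if ($1 eq '{') {
            $open_brackets++;
        }
        else {
            $open_brackets--;
        }
    }
    else {
        $failed = 1;
        last; # WE FAILED TO MATCH (Perl uses last, not break)
    }
}
if (not $failed and 0 == $open_brackets) {
    # substr takes a length, not an end offset
    my $matched = substr($string, $start, pos($string) - $start);
}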

Regex in Tcl not working

I am trying to match two strings: one of them I get from a list and the other one is declared by me.
set name " HTTP REQUEST = 1\n HTTP REQUEST(SUCCESS) = 0\nSERVER CONN = 1"
set pattern "HTTP REQUEST(SUCCESS)*"
set List [split $name "\n"]
foreach var $List {
set var [lindex $List 1]
#set var2 [string trim $var1 " "]
}
puts $var
if {[regexp $var $pattern match]} {
puts " matched!"
puts $match
} else {
puts " not matched!"
}
There are two errors:
1. Parentheses must be escaped with literal backslashes.
2. The text input should go after the pattern in a regexp call.
So use
set pattern {HTTP REQUEST\(SUCCESS\)}
                         ^        ^
and then
if {[regexp $pattern $var match]} {
            ^^^^^^^^^^^^^
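Putting both fixes together, a minimal sketch of the whole loop might look like this. The extra capture group `(\d+)` and the variable name `count` are additions for illustration only, on the assumption that the number after the equals sign is what is ultimately wanted.

```tcl
set name " HTTP REQUEST = 1\n HTTP REQUEST(SUCCESS) = 0\nSERVER CONN = 1"

# Brace-quoted pattern: the backslashes reach the regexp engine untouched.
set pattern {HTTP REQUEST\(SUCCESS\) *= *(\d+)}

set found {}
foreach var [split $name "\n"] {
    # Pattern first, then the text to search.
    if {[regexp -- $pattern $var match count]} {
        puts "matched: '$match', value $count"
        lappend found $count
    }
}
```

Testing every line inside the loop avoids relying on the fixed position `[lindex $List 1]` used in the question.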

How to delete all characters after a certain character in each line in perl?

I have a file that I am reading in and I am trying to delete everything after specific characters such as "[". I have listed the code I have below:
while($line = <INFILE>) {
print "$line \n";
}
Some lines will have "[blah blah blah blah]" and I need to delete everything after the first bracket including the first bracket per line, any help would be greatly appreciated!
To print up to the first occurrence of a specific string $delim:
while (<INFILE>) {
printf "%s\n", substr($_, 0, index($_, $delim));
}
This finds the index of the first occurrence of the string and prints from the first character (0) up to, but excluding, that index.
Another option is to use a regex:
while (<$fh>) {
s/\Q$delim\E.*$//m;
print;
}
Note the \Q and \E delimiters to prevent the regex engine from interpreting e.g. [ as a regex metacharacter.
Since you mentioned you wanted to store the lines in an array, do the following.
my @arr = ();
while(<INFILE>){
chomp;
push @arr, $_;
}
foreach my $item (@arr){
#process lines as you see fit here
}

Regex to read the id

I have the log file with the following content:
(2947:_dRW00T3WEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:22 PM
(2948:_3nFSwz3TEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:05 PM
(2949:_voeYED3AEeSkhZ9pqkt5dQ) ---$ ABC XY "Initial for Re,oved" 16-Sep-2014 12:44 PM
I want to read the unique id, say _dRW00T3WEeSkhZ9pqkt5dQ, from each line and store it in an array.
My current code is:
while(<$fh>) {
if ($_ =~ /\((.*?)\)/) {
push @cs_ids , $1;
}
}
Try this:
while(<$fh>) {
if ($_ =~ /\(\d+:(.+?)\)/) {
push @cs_ids , $1;
}
}
The regexp looks for a ( followed by one or more digits, a colon, and then one or more characters (which will be stored in $1), up to the closing ).
You were almost there:
perl -e '$string = "(2947:_dRW00T3WEeSkhZ9pqkt5dQ)"; if ($string =~ /^\((\d+:)(.*?)\)$/) { die $2; }'
_dRW00T3WEeSkhZ9pqkt5dQ at -e line 1.
Change your regular expression condition to:
/^\((\d+:)(.*?)\)$/
What that does is match and group the 4 digits and colon into special var $1 and the id you want into special var $2.
If every line of the log file is guaranteed to have an ID string, then you can write just
while (<$fh>) {
/:(\w+)/ and push @cs_ids , $1;
}
The \w ("word") character class matches alphanumeric characters or underscore, and this regex just snags the first sequence of word characters that follow a colon. It is best to avoid the non-greedy modifier if possible as it is a sloppy specification and can be much slower than a simple multiple character match.

Evaluating math in regsub

I need to convert a string from 2 dimensional array to 1 dimension in Tcl.
Eg
lane_out[0][7] -> lane_out[7]
lane_out[1][0] -> lane_out[8]
The initial string is read from a file, modified and stored in another file. Also, the input file has a lot of different kinds of strings that need replacement so the user provides the find and replace regex in another file.
I have successfully done this for simple strings that required no additional calculations. But I need help in executing expressions. I was thinking that I could have the user provide the expression in the file and execute that but I have not been successful.
User Input File:
lappend userinput {\[(\d+)]\*$ *}
lappend userinput {\[(\d+)]\[(\d+)]$ [expr{\1*8+\2}]}
My broken code:
set line "lane_out[1][0]"
foreach rule $userinput {
foreach {find replace} [regexp -all -inline {\S+} $rule] break
regsub -all $find $line $replace line
set match [regexp -all -inline {expr{.*}} $line] #match = expr{1*8+0}
set val [$match] #supposed to execute what is in match
regsub expr{.*} $line $val line
}
What you do is very complex: you have an input file, a set of search/replace rules, and produce some output. The rules, however, require calling expr.
Here is a made-up data file (data.txt):
lane_out[0][7] = "foo bar"
lane_out[1][0] = "foo bar"
lane_out[1][1] = "foo bar"
lane_out[2][0] = "foo bar"
lane_out[2][1] = "foo bar"
And the rules file (rules.txt):
{\[(\d+)\]\[(\d+)\]} {\[[expr {\1 * 8 + \2}]\]}
Here is my script (search_replace.tcl):
package require Tclx
# Read the rules file
set rules {}
for_file line rules.txt {
lassign $line find replace
lappend rules $find $replace
}
# Read the data, apply rules
for_file line data.txt {
foreach {find replace} $rules {
regsub -all $find $line $replace line
set line [subst $line]
puts $line
}
}
Output:
lane_out[7] = "foo bar"
lane_out[8] = "foo bar"
lane_out[9] = "foo bar"
lane_out[16] = "foo bar"
lane_out[17] = "foo bar"
Discussion
The rules.txt's format: each line contains search- and replace expressions, separated by a space
The code will translate a line such as
lane_out[2][1] = "foo bar"
to:
lane_out\[[expr {2 * 8 + 1}]\] = "foo bar"
Then, the subst command replaces that with:
lane_out[17] = "foo bar"
The tricky part is to escape the square brackets so that expr can do the right thing.
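Because subst evaluates every bracket and dollar sign in the whole line, data that happens to contain [ or $ elsewhere would be executed as well. Where that is a concern, the arithmetic can be done directly from the captured indices. The following is only a sketch under assumptions: `flatten_index` and its `width` parameter are hypothetical names, the row width of 8 is taken from the example rule, and Tcl 8.5 or later is assumed for `lassign` and `{*}`.

```tcl
# Hypothetical helper: rewrite every [row][col] pair in a line as a single
# [row*width+col] index without passing the line through subst.
proc flatten_index {line {width 8}} {
    set out ""
    set next 0
    # -indices yields {start end} pairs: whole match, row digits, col digits.
    foreach {whole row col} [regexp -all -inline -indices {\[(\d+)\]\[(\d+)\]} $line] {
        lassign $whole from to
        # scan %d reads the digits as decimal, so "08" is not taken as octal.
        scan [string range $line {*}$row] %d r
        scan [string range $line {*}$col] %d c
        # Copy the untouched text before the match, then the new index.
        append out [string range $line $next [expr {$from - 1}]]
        append out "\[[expr {$r * $width + $c}]\]"
        set next [expr {$to + 1}]
    }
    # Copy whatever follows the last match.
    append out [string range $line $next end]
    return $out
}

puts [flatten_index {lane_out[1][0] = "foo [bar] $x"}]
```

This trades the flexibility of user-supplied replacement expressions for safety, so it fits only when the set of transformations is known in advance.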