Regex in Tcl not working - regex

I am trying to match two strings: one I get from a list, and the other I declare myself.
set name " HTTP REQUEST = 1\n HTTP REQUEST(SUCCESS) = 0\nSERVER CONN = 1"
set pattern "HTTP REQUEST(SUCCESS)*"
set List [split $name "\n"]
foreach var $List {
set var [lindex $List 1]
#set var2 [string trim $var1 " "]
}
puts $var
if {[regexp $var $pattern match]} {
puts " matched!"
puts $match
} else {
puts " not matched!"
}

There are two errors:
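Parentheses must be escaped with literal backslashes; unescaped, ( and ) form a capturing group instead of matching literal parentheses.
The text input should go after the pattern in a regexp call: the pattern comes first, then the string, then the match variable.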
So use
set pattern {HTTP REQUEST\(SUCCESS\)}
and then
if {[regexp $pattern $var match]} {
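The same two points (escape the parentheses, pass the pattern before the text) can be sketched in Python. This is only an analogue: Python's re module is not Tcl's regexp engine, and the variable names here are made up for illustration.

```python
import re

# One line from the question's list, and the literal text we want to find.
line = " HTTP REQUEST(SUCCESS) = 0"
unescaped = "HTTP REQUEST(SUCCESS)"

# re.escape adds the backslashes so ( and ) are matched literally.
escaped = re.escape(unescaped)

# Pattern first, text second. Unescaped, (SUCCESS) is a capturing group,
# so the pattern looks for "HTTP REQUESTSUCCESS" and finds nothing.
print(re.search(unescaped, line))
print(re.search(escaped, line).group(0))
```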

Related

TCL--Regexp matching not happening properly

set cell "HEADBUFTIE42D_D3_N"
set postfix "_M7P5TR_C60L08"
set line " cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) { "
if {[regexp "^ +cell[(]$cell$postfix[)] *\\{" $line match]} {
puts "hello"
}
Here I am trying to match the line
cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) {
Note that there are 2 spaces at the beginning and 1 space after {.
But the match does not happen. Please help.
You have a problem of not escaping the regexp metacharacters properly: inside double quotes Tcl still performs command substitution, so [(] tries to execute a command named (, and likewise for [)]. It's easier if you put the pattern in braces and fill in the variables with format, like so:
set cell "HEADBUFTIE42D_D3_N"
set postfix "_M7P5TR_C60L08"
set line " cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) { "
set re [format {^ +cell\(%s%s\) *\{} $cell $postfix]
if {[regexp $re $line match]} {
puts "hello"
}
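For comparison, here is a minimal Python sketch of the same idea: keep the fixed parts of the pattern in raw strings and splice the variable parts in separately. It assumes the line really starts with two spaces, as the question states; Python's re is used only as an analogue of the Tcl call.

```python
import re

cell = "HEADBUFTIE42D_D3_N"
postfix = "_M7P5TR_C60L08"
# Assumed input: two leading spaces, one space after the brace.
line = "  cell(HEADBUFTIE42D_D3_N_M7P5TR_C60L08) { "

# Raw strings keep the backslashes intact; re.escape protects the
# variable part in case it ever contains regex metacharacters.
pattern = r"^ +cell\(" + re.escape(cell + postfix) + r"\) *\{"

if re.search(pattern, line):
    print("hello")
```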

RegEx in Powershell, combine replace calls

I've written my own CSS minifier for fun and profit (not so much profit), and it works great. I am now trying to streamline it, since I'm essentially filtering the file 10+ times. Not a huge deal with a small file, but the larger they get, the worse that performance hit will be.
Is there a more elegant way to filter my input file? I'm assuming regex will have a way, but I am no regex wizard...
$a = (gc($path + $file) -Raw)
$a = $a -replace "\s{2,100}(?<!\S)", ""
$a = $a -replace " {", "{"
$a = $a -replace "} ", "}"
$a = $a -replace " \(", "\("
$a = $a -replace "\) ", "\)"
$a = $a -replace " \[", "\["
$a = $a -replace "\] ", "\]"
$a = $a -replace ": ", ":"
$a = $a -replace "; ", ";"
$a = $a -replace ", ", ","
$a = $a -replace "\n", ""
$a = $a -replace "\t", ""
To save you a little headache, I'm basically using the first -replace to strip any successive whitespace from 2-100 characters in length.
The remaining replace statements cover cleaning up single spaces in specific circumstances.
How can I combine this, so I'm not filtering the file 12 times?
The negative lookbehind (?<!\S) is meant for this scenario: (?<!prefix)thing matches a thing that does not have the prefix on its left. When you put it at the end of the regex, with nothing after it, I think it does nothing at all, since the character to its left has just been matched by \s. You might have intended it to go on the left, or might have intended it to be a negative lookahead; I won't try to guess, I'll just remove it for this answer.
You're missing the use of character classes. abc looks for the text abc, but put them in square brackets and [abc] looks for any of the characters a, b, c.
Using that, you can combine the last two lines into one: [\n\t], which matches either a newline or a tab.
You can combine the two separate (replace with nothing) rules using the regex logical OR | to make one match: \s{2,100}|[\n\t] - match the run of whitespace or the newline or tab. (You could probably use OR twice instead of the character class, fwiw.)
Use regex capture groups which allow you to reference whatever the regex matched, without knowing in advance what that was.
e.g. "space bracket -> bracket" and "space colon -> colon" and "space comma -> comma" all follow the general pattern "space (thing) -> (thing)". And the same with the trailing spaces "(thing) space -> (thing)".
Combine capture groups with character classes to merge the rest of the lines all into one.
e.g.
$a -replace " (:)", '$1' # capture the colon, replacement is not ':'
# it is "whatever was in the capture group"
$a -replace " ([:,])", '$1' # capture the colon, or comma. Replacement
# is "whatever was in the capture group"
# space colon -> colon, space comma -> comma
# make the space optional with \s{0,1} and put it at the start and end
\s{0,1}([:,])\s{0,1} #now it will match "space (thing)" or "(thing) space"
# Add in the rest of the characters, with appropriate \ escapes
# gained from [regex]::Escape('those chars here')
# Your original:
$a = (gc D:\css\1.css -Raw)
$a = $a -replace "\s{2,100}(?<!\S)", ""
$a = $a -replace " {", "{"
$a = $a -replace "} ", "}"
$a = $a -replace " \(", "\("
$a = $a -replace "\) ", "\)"
$a = $a -replace " \[", "\["
$a = $a -replace "\] ", "\]"
$a = $a -replace ": ", ":"
$a = $a -replace "; ", ";"
$a = $a -replace ", ", ","
$a = $a -replace "\n", ""
$a = $a -replace "\t", ""
# My version:
$b = gc d:\css\1.css -Raw
$b = $b -replace "\s{2,100}|[\n\t]", ""
$b = $b -replace '\s{0,1}([])}{([:;,])\s{0,1}', '$1'
# Test that they both do the same thing on my random downloaded sample file:
$b -eq $a
# Yep.
Do that again with another | to combine the two into one:
$c = gc d:\css\1.css -Raw
$c = $c -replace "\s{2,100}|[\n\t]|\s{0,1}([])}{([:;,])\s{0,1}", '$1'
$c -eq $a # also same output as your original.
NB. the whitespace, tab and newline alternatives capture nothing, so '$1' is empty,
which removes them.
And you can spend lots of time building your own unreadable regex which probably won't be noticeably faster in any real scenario. :)
NB. In the replacement '$1', the dollar is .NET regex engine syntax, not a PowerShell variable. If you use double quotes, PowerShell will string-interpolate the variable $1 and likely replace it with nothing.
You may join the patterns that are similar into 1 bigger expression with capturing groups, and use a callback inside a Regex replace method where you may evaluate the match structure and use appropriate action.
Here is a solution for your scenario that you may extend:
$callback = { param($match)
if ($match.Groups[1].Success -eq $true) { "" }
else {
if ($match.Groups[2].Success -eq $true) { $match.Groups[2].Value }
else {
if ($match.Groups[3].Success -eq $true) { $match.Groups[3].Value }
else {
if ($match.Groups[4].Success -eq $true) { $match.Groups[4].Value }
}
}
}
}
$path = "d:\input\folder\"
$file = "input_file.txt"
$a = [IO.File]::ReadAllText($path + $file)
$rx = [regex]'(\s{2,100}(?<!\S)|[\n\t])|\s+([{([])|([])}])\s+|([:;,])\s+'
$rx.Replace($a, $callback) | Out-File "d:\result\file.txt"
Pattern details:
(\s{2,100}(?<!\S)|[\n\t]) - Group 1 capturing 2 to 100 whitespaces not preceded with a non-whitespace char (maybe this lookbehind is redundant) OR a newline or tab char
| - or
\s+([{([]) - just matching one or more whitespaces (\s+), and then capturing into Group 2 any single char from the [{([] character class: {, ( or [
|([])}])\s+ - or Group 3 capturing any single char from the [])}] character class: }, ) or ] and then just matching one or more whitespaces
|([:;,])\s+ - or Group 4 capturing any char from [:;,] char class (:, ; or ,) and one or more whitespaces.
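The callback idea described above can be sketched in Python as well. This is not the PowerShell/.NET code from the answer, only an analogue built on Python's re.sub with a function argument; the names MINIFY_RE, _replacement and minify are hypothetical, and the redundant lookbehind is left out.

```python
import re

# Four alternatives, mirroring the groups described above.
MINIFY_RE = re.compile(
    r"(\s{2,100}|[\n\t])"   # group 1: whitespace run, newline or tab (dropped)
    r"|\s+([{(\[])"         # group 2: opening bracket preceded by whitespace
    r"|([\])}])\s+"         # group 3: closing bracket followed by whitespace
    r"|([:;,])\s+"          # group 4: punctuation followed by whitespace
)

def _replacement(match):
    # Group 1 matched: the whitespace is removed outright.
    if match.group(1) is not None:
        return ""
    # Otherwise keep whichever single character was captured.
    return match.group(2) or match.group(3) or match.group(4)

def minify(css):
    """Apply all rules in one pass over the text."""
    return MINIFY_RE.sub(_replacement, css)

print(minify("a {\n\tcolor: red;\n}\n"))
```

The text is scanned once, and the callback decides per match what to emit, which is the point of merging the separate replace calls.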

How to replace two or more strings in a file in tcl

I am reading a file and trying to replace 3 strings in different lines using regsub.
Input file :
This is a bus
This is a car
This is a bike
Output Expected:
This is a Plane
This is a Scooter
This is a Bicycle
If I use
puts $out [regsub -all "( bus)" $line "\ $x" ]
puts $out [regsub -all "( car)" $line "\ $y" ]
puts $out [regsub -all "( bike)" $line "\ $z" ]
I am calling this from a proc with arguments x, y, z set to plane, scooter, bicycle.
But this prints every line 3 times. How can I replace all three strings?
You can also use string map to replace strings:
string map {{ bus} { Plane} { car} { Scooter} { bike} { Bicycle}} $input_string
The arguments are a list of pairs of "find" "replace" strings and then your input string...
BTW, with the regsub method you can nest the regsubs so that the result of one becomes the input of the other, e.g. with two: regsub -all { bus} [regsub -all { car} $input_string { Scooter}] { Plane}. It isn't very readable, though!
Also note that you don't need to capture the group with parentheses in your expression: "( car)" would do an extra sub-group capture that you don't actually use... { car} is better...
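As a rough Python analogue of the find/replace pair list, the sketch below does all replacements in a single pass. The names REPLACEMENTS, PATTERN and replace_all are invented for the example, and this is only an approximation of what string map does, not a description of its implementation.

```python
import re

# Find/replace pairs, taken from the example above.
REPLACEMENTS = {" bus": " Plane", " car": " Scooter", " bike": " Bicycle"}

# One alternation of all the (escaped) search strings.
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))

def replace_all(line):
    """Replace every search string in one pass over the line."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], line)

for text in ["This is a bus", "This is a car", "This is a bike"]:
    print(replace_all(text))
```

Each line is written once, because there is a single substitution call per line rather than one output per search string.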
The clearest way is to write the line to a variable in-between each substitution. Writing back to the variable it came from is quite often the easiest approach. You can then print the result out once at the end.
set line [regsub -all "( bus)" $line "\ $x"]
set line [regsub -all "( car)" $line "\ $y"]
set line [regsub -all "( bike)" $line "\ $z"]
puts $out $line
If you read the file line by line, you can use an if/elseif chain:
while { [ gets $fh line ] >= 0} {
if {[regexp -all -- { bus} $line]} {
puts $out [regsub -all "( bus)" $line "\ $x" ]
} elseif {[regexp -all -- { car} $line]} {
puts $out [regsub -all "( car)" $line "\ $y" ]
} else {
puts $out [regsub -all "( bike)" $line "\ $z" ]
}
}

Evaluating math in regsub

I need to convert a string from 2 dimensional array to 1 dimension in Tcl.
Eg
lane_out[0][7] -> lane_out[7]
lane_out[1][0] -> lane_out[8]
The initial string is read from a file, modified and stored in another file. Also, the input file has a lot of different kinds of strings that need replacement so the user provides the find and replace regex in another file.
I have successfully done this for simple strings that required no additional calculations. But I need help in executing expressions. I was thinking that I could have the user provide the expression in the file and execute that but I have not been successful.
User Input File:
lappend userinput {\[(\d+)]\*$ *}
lappend userinput {\[(\d+)]\[(\d+)]$ [expr{\1*8+\2}]}
My broken code:
set line "lane_out[1][0]"
foreach rule $userinput {
foreach {find replace} [regexp -all -inline {\S+} $rule] break
regsub -all $find $line $replace line
set match [regexp -all -inline {expr{.*}} $line] #match = expr{1*8+0}
set val [$match] #supposed to execute what is in match
regsub expr{.*} $line $val line
}
What you do is very complex: you have an input file, a set of search/replace rules, and produce some output. The rules, however, require calling expr.
Here is a made-up data file (data.txt):
lane_out[0][7] = "foo bar"
lane_out[1][0] = "foo bar"
lane_out[1][1] = "foo bar"
lane_out[2][0] = "foo bar"
lane_out[2][1] = "foo bar"
And the rules file (rules.txt):
{\[(\d+)\]\[(\d+)\]} {\[[expr {\1 * 8 + \2}]\]}
Here is my script (search_replace.tcl):
package require Tclx
# Read the rules file
set rules {}
for_file line rules.txt {
lassign $line find replace
lappend rules $find $replace
}
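# Read the data, apply every rule to each line, then print the line once
for_file line data.txt {
foreach {find replace} $rules {
regsub -all $find $line $replace line
# subst evaluates the [expr ...] that the rule inserted
set line [subst $line]
}
puts $line
}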
Output:
lane_out[7] = "foo bar"
lane_out[8] = "foo bar"
lane_out[9] = "foo bar"
lane_out[16] = "foo bar"
lane_out[17] = "foo bar"
Discussion
The rules.txt format: each line contains a search expression and a replace expression, separated by a space.
The code will translate a line such as
lane_out[2][1] = "foo bar"
to:
lane_out\[[expr {2 * 8 + 1}]\] = "foo bar"
Then, the subst command replaces that with:
lane_out[17] = "foo bar"
The tricky part is to escape the square brackets so that expr can do the right thing.
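For readers who want to see the arithmetic-in-replacement idea in isolation, here is a small Python sketch. It computes the new index in a callback instead of generating code and evaluating it afterwards, so it is a different technique from regsub plus subst. The function name flatten_index is hypothetical, and the row width of 8 is taken from the rule above.

```python
import re

# Matches two consecutive bracketed indices, e.g. [1][0].
TWO_D_INDEX = re.compile(r"\[(\d+)\]\[(\d+)\]")

def flatten_index(line, width=8):
    """Rewrite name[row][col] as name[row * width + col]."""
    def compute(match):
        row, col = int(match.group(1)), int(match.group(2))
        return "[%d]" % (row * width + col)
    return TWO_D_INDEX.sub(compute, line)

print(flatten_index('lane_out[2][1] = "foo bar"'))
```

Because nothing from the input is evaluated as code, stray brackets or dollar signs in the data cannot trigger unintended substitution.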

Process all matches in a file in multiple steps and replace some of the matches

I'm trying to find all matches of a regex in a file and replace them. I would like to find matches in multiple steps. For example, I want to first find the pattern that come between two $IDENTIFIER_ , then inside that pattern replace all $ONE with $TWO.
This what I have so far:
$entireFile = "Some random text here var_a 4456 var_b var_c 1122 var_d var_e 559 var_f Some random text here ";
my $ONE_="1";
my $TWO_="2";
my $IDENTIFIER_ = "\\b[a-zA-Z_][a-zA-Z0-9_]*\\b";
my $id1;
my $id2;
my $item;
while ($entireFile =~ m/($IDENTIFIER_)(.*?)($IDENTIFIER_)/g)
{
$id1 = $1;
$item = $2;
$id2 = $3;
#Check to see if $item has $ONE and replace with $TWO
if ($item =~ s/(.*?)$ONE_(.*?)/$1$TWO_$2/g )
{
print $id1.$item.$id2."\n" ;
}
}
This prints:
var_c 2222 var_d
What I need help with is how to print the rest of the file (the text before the first match, the text between subsequent matches, and the text after the last match).
$entireFile = "Some random text here var_a 4456 var_b".
" var_c 1122 var_d var_e 559 var_f Some random text here ";
my $ONE_="1";
my $TWO_="2";
my $re_id = qr/\b[a-zA-Z_][a-zA-Z0-9_]*/;
while ($entireFile =~ s/($re_id.*?)$ONE_(.*?$re_id)/$1$TWO_$2/) { }
print $entireFile;
If you really want to match in two phases:
$entireFile = "Some random text here var_a 4456 var_b".
" var_c 1122 var_d var_e 559 var_f Some random text here ";
my ($ONE_, $TWO_) = ("1", "2");
my $re_id = qr/\b[a-zA-Z_][a-zA-Z0-9_]*/;
my $printed=0;
while ($entireFile =~ /($re_id)(.*?)($re_id)/g) {
my ($id1, $item, $id2) = ($1, $2, $3);
my ($start, $end, $length) = ($-[0], $+[0], $+[0]-$-[0]);
if ($printed < $start) {
print substr($entireFile, $printed, $-[0]-$printed);
$printed = $start;
}
if ($item =~ s/(.*?)$ONE_(.*?)/$1$TWO_$2/g ) {
print $id1.$item.$id2."\n" ;
$printed = $end;
} else {
print substr($entireFile, $printed, $length)."\n";
$printed = $end;
}
}
# Print whatever follows the last match
print substr($entireFile, $printed);
One approach is to use a function that you execute within the substitution.
e.g.
$entireFile = "Some random text here var_a 4456 var_b var_c 1122 var_d var_e 559 var_f Some random text here ";
my $ONE_="1";
my $TWO_="2";
my $IDENTIFIER_ = "\\b[a-zA-Z_][a-zA-Z0-9_]*\\b";
$entireFile =~ s/($IDENTIFIER_)(.*?)($IDENTIFIER_)/$1 . inner_func($2) . $3/egs;
print( $entireFile );
sub inner_func {
my ( $text ) = @_;
$text =~ s/$ONE_/$TWO_/g;
return( $text );
}
The /e flag instructs the substitution operator (s///) to execute the replacement text as if it were code. This can be especially useful for recursive descent parsing...
If you use /s as a flag on your substitution you are also telling . to match newlines like any other character, which lets this global replace work across lines (if you've slurped the entire file into your variable in the first place).
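The same nested substitution can be sketched in Python, where passing a function to re.sub plays the role of the /e flag and re.DOTALL plays the role of /s. This is an analogue rather than a translation of the Perl above, and the names IDENT, OUTER and replace_between are made up for the example.

```python
import re

# Same identifier pattern as in the question.
IDENT = r"\b[a-zA-Z_][a-zA-Z0-9_]*\b"
# DOTALL lets . cross newlines, like Perl's /s.
OUTER = re.compile("(" + IDENT + ")(.*?)(" + IDENT + ")", re.DOTALL)

def replace_between(text, one="1", two="2"):
    """Replace `one` with `two` only in the text between two identifiers."""
    def inner(match):
        # Groups 1 and 3 are the identifiers; only group 2 is edited.
        return match.group(1) + match.group(2).replace(one, two) + match.group(3)
    return OUTER.sub(inner, text)

print(replace_between("var_a 4456 var_b var_c 1122 var_d"))
```

Text outside the matches is carried through by the substitution itself, so nothing has to be printed piece by piece.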