Regex a var that contains square brackets in tcl

I'm trying to edit a verilog file by finding a match in lines of a file and replacing the match by "1'b1". The problem is that the match is a bus with square brackets in the form "busname[0-9]".
for example in this line:
XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(abs_gx[8]), .Y(
I need to replace "abs_gx[8]" by "1'b1".
So I tried to find a match by using this code:
#gets abs_gx[8]
set net "\{[lindex $data 0]\}"
#gets 1'b1
set X [lindex $data 1]
#open and read lines of file
set netlist [open "./$circuit\.v" r]
fconfigure $netlist -buffering line
gets $netlist line
#let's assume the line is XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(abs_gx[8]), .Y(
if {[regexp "(.*.\[A-X\]\()$net\(\).*)" $line -inline]} {
puts $new "$1 1'b$X $2" }
elseif {[regexp "(.*.\[Y-Z\]\()$net(\).*)" $line]} {
puts $new "$1$2" }
else {puts $new $line}
gets $netlist line
I tried so many things, and either nothing really matches or I get an error that 8 is not a command, because [8] gets interpreted as a command substitution.
Any sneaky trick to place a variable in a regex without having it interpreted as a regular expression itself?

If you have an arbitrary string that you want to match exactly as part of a larger regular expression, you should precede all non-alphanumeric characters in the string by a backslash (\). Fortunately, _ is also not special in Tcl's REs, so you can use \W (equivalent to [^\w]) to match the characters you need to fix:
set reSafe [regsub -all {\W} $value {\\&}]
If you're going to be doing that a lot, make a helper procedure.
proc reSafe {value} {
regsub -all {\W} $value {\\&}
}
(Yes, I'd like a way of substituting variables more directly, but the RE engine's internals are code I don't want to touch…)
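As a rough sketch of how the helper could be applied to the line from the question: the pin-name pattern and the variable names `pattern` and `result` are assumptions made for illustration, and the replacement text is assumed to contain no `&` or backslash (both are special in a regsub replacement).

```tcl
# Escape every non-word character so the string matches literally in an RE.
proc reSafe {value} {
    regsub -all {\W} $value {\\&}
}

set line {XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(abs_gx[8]), .Y(}
set net  {abs_gx[8]}
set X    {1'b1}

# Hypothetical pattern: an input pin .A( to .X( that contains exactly $net.
# format keeps the fixed RE parts in braces and splices in the escaped net.
set pattern [format {(\.[A-X]\()%s(\))} [reSafe $net]]

# \1 and \2 restore the text around the net; only the net itself is replaced.
set result [regsub -- $pattern $line "\\1$X\\2"]
puts $result
# => XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(1'b1), .Y(
```

Building the pattern with format avoids having to double every backslash and bracket inside a double-quoted string.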

If I understand correctly, you want to substitute $X for $net except when $net is preceded by Y( or Z(, in which case you just delete $net. You could avoid the complications of regexp by using string map, which just does literal substitutions - see https://www.tcl-lang.org/man/tcl8.6/TclCmd/string.htm#M34 . You would then need to specify the Y( and Z( cases separately, but that's easy enough when there are only two. So instead of the regexp lines you would do:
set line [string map [list Y($net Y( Z($net Z( $net $X] $line]
puts $new $line
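A small sketch of how that mapping behaves, assuming the net and constant from the question. The second input line (`BUFX2 ...`) is made up for illustration, to show the Y( case.

```tcl
set net {abs_gx[8]}
set X   {1'b1}

# Order matters: string map tries the keys in list order at each position,
# so the Y( and Z( cases must come before the bare net.
set mapping [list "Y($net" "Y(" "Z($net" "Z(" $net $X]

set inputLine  {XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(abs_gx[8]), .Y(}
set outputLine {BUFX2 U7 ( .A(n3), .Y(abs_gx[8]) );}   ;# hypothetical netlist line

set r1 [string map $mapping $inputLine]
set r2 [string map $mapping $outputLine]
puts $r1
# => XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(1'b1), .Y(
puts $r2
# => BUFX2 U7 ( .A(n3), .Y() );
```

Because string map never treats its keys as patterns, the square brackets in the net name need no escaping at all.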

Related

Tcl regsub used with subst produces unexpected result

Edit:
I was trying to replace "xor_in0" with "xor_in[0]" and "xor_in1" with "xor_in[1]" in a given str parameter. Here "xor_in0" and "xor_in1" are parameters passed in, which I call "key", and "xor_in[0]" and "xor_in[1]" are the corresponding values stored in an array, which I call "value". The point is to replace every "key" in "str" with its "value". Here is my testing code:
set str "(xor_in0^xor_in1)"
set str1 "xor_in0^xor_in1" ;# another input
set key "xor_in0"
set value "xor_in\[0\]"
set newstr ""
set nonalpha "\[^0-9a-zA-Z\]"
regsub -all [subst {^\[(*\]($key)($nonalpha+)}] $str [subst -nobackslashes {$value\2}] newstr
puts $newstr
But somehow it doesn't work... I also tried to remove [subst ...] and it still failed to match anything. This is somehow against my knowledge of regular expression. Please help.

Everything seems a bit over-complicated to me.
Let's look at the regsub that you're actually going to execute. There's a trick to doing that easily; if your command is:
regsub -all [subst {^\[(*\]($key)($nonalpha+)}] $str [subst -nobackslashes {$value\2}] newstr
Then we can print out what it's going to try to do with:
puts [list regsub -all [subst {^\[(*\]($key)($nonalpha+)}] $str [subst -nobackslashes {$value\2}] newstr]
That reveals that you're really doing this:
regsub -all {^[(*](xor_in0)([^0-9a-zA-z]+)} (xor_in0^xor_in1) {xor_in[0]\2} newstr
The part that looks a bit strange in there is the ([^0-9a-zA-z]+) at the end of the RE. It's legal but odd, and we can write things a bit differently with \W, which matches any non-word character (anything other than a letter, digit, or underscore):
regsub -all {^[(*](xor_in0)(\W+)} $str {xor_in[0]\2} newstr
And that seems to work. What might the bug be, then? The definition of nonalpha: you're using "\[^0-9a-zA-z\]" instead of "\[^0-9a-zA-Z\]". Yes, a literal ^ lies in the ASCII (and Unicode) range from A to z, so the negated set excludes it and ($nonalpha+) cannot match the ^ that follows xor_in0.
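The range issue is easy to check in isolation. This is only a minimal sketch; the variable name `results` is an illustration.

```tcl
# In ASCII, the characters [ \ ] ^ _ ` sit between Z and a,
# so the range A-z silently includes the caret.
set results [list \
    [regexp {^[A-z]$} "^"] \
    [regexp {^[A-Z]$} "^"] \
    [regexp {[^0-9a-zA-z]} "^"] \
    [regexp {[^0-9a-zA-Z]} "^"]]
puts $results
# => 1 0 0 1
```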
OTOH, I'd actually expect a transformation to really be done like this:
set newstr [regsub -all {(\y[a-zA-Z]+_in)(\d+)} $str {\1[\2]}]
The only things you're not used to there are \y (a word boundary constraint) and \d (match any digit). Or, for a simple transformation (mapping all instances of a literal substring to another literal substring):
set newstr [string map [list $key $value] $str]
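For comparison, here is a sketch that runs both suggestions on the string from the question. The names `indexed` and `mapped` are illustrative, and the string map variant lists both key/value pairs explicitly rather than reading them from the asker's array.

```tcl
set str "(xor_in0^xor_in1)"

# Pattern-based: any <letters>_in<digits> becomes <letters>_in[<digits>].
set indexed [regsub -all {(\y[a-zA-Z]+_in)(\d+)} $str {\1[\2]}]
puts $indexed
# => (xor_in[0]^xor_in[1])

# Literal mapping: each key is replaced by its value, no RE involved.
set mapped [string map [list xor_in0 {xor_in[0]} xor_in1 {xor_in[1]}] $str]
puts $mapped
# => (xor_in[0]^xor_in[1])
```

If one key were a prefix of another (say xor_in1 and xor_in10), the longer key would have to come first in the string map list.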

Actually the real problem to my question is the A-z typo :)

Simple is generally better:
regsub -all {\d+} $s {[&]} s
Takes care of your examples.

Perl grep a multi line output for a pattern

I have the below code where I am trying to grep for a pattern in a variable. The variable has a multiline text in it.
Multiline text in $output looks like this
_skv_version=1
COMPONENTSEQUENCE=C1-
BEGIN_C1
COMPONENT=SecurityJNI
TOOLSEQUENCE=T1-
END_C1
CMD_ID=null
CMD_USES_ASSET_ENV=null_jdk1.7.0_80
CMD_USES_ASSET_ENV=null_ivy,null_jdk1.7.3_80
BEGIN_C1_T1
CMD_ID=msdotnet_VS2013_x64
CMD_ID=ant_1.7.1
CMD_FILE=path/to/abcI.vc12.sln
BEGIN_CMD_OPTIONS_RELEASE
-useideenv
The code I am using to grep for the pattern
use strict;
use warnings;
my $cmd_pattern = "CMD_ID=|CMD_USES_ASSET_ENV=";
my @matching_lines;
my $output = `cmd to get output` ;
print "output is : $output\n";
if ($output =~ /^$cmd_pattern(?:null_)?(\w+([\.]?\w+)*)/s ) {
print "1 is : $1\n";
push (@matching_lines, $1);
}
I am getting the multiline output as expected from $output but the regex pattern match which I am using on $output is not giving me any results.
Desired output
jdk1.7.0_80
ivy
jdk1.7.3_80
msdotnet_VS2013_x64
ant_1.7.1

Regarding your regular expression:
You need a while, not an if (otherwise you'll only be matching once); when you make this change you'll also need the /gc modifiers
You don't really need the /s modifier, as that one makes . match \n, which you're not making use of (see note at the end)
You want to use the /m modifier so that ^ matches the beginning of every new line, and not just the beginning of the string
You want to add \s* to your regular expression right after ^, because in at least one of your lines you have a leading space
You need parentheses around $cmd_pattern; otherwise, you're getting two options, the first one being ^CMD_ID= and the second one being CMD_USES_ASSET_ENV= followed by the rest of your expression
You can also simplify the (\w+([\.]?\w+)*) bit down to (.+).
The result would be:
while ($output =~ /^\s*(?:$cmd_pattern)(?:null_)?(.+)/gcm ) {
print "1 is : $1\n";
push (@matching_lines, $1);
}
That being said, your regular expression still won't split ivy and jdk1.7.3_80 on its own; I would suggest adding a split and removing the null_ prefix with something like:
while ($output =~ /^\s*(?:$cmd_pattern)(?:null_)?(.+)/gcm ) {
my $text = $1;
my @text;
if ($text =~ /,/) {
@text = split /,(?:null_)?/, $text;
}
else {
@text = $text;
}
for (@text) {
print "1 is : $_\n";
push (@matching_lines, $_);
}
}
The only problem you're left with is the lone line CMD_ID=null. I'm gonna leave that to you :-)
(I recently wrote a blog post on best practices for regular expressions - http://blog.codacy.com/2016/03/30/best-practices-for-regular-expressions/ - you'll find there a note to always require the /s in Perl; the reason I mention here that you don't need it is that you're not using the ones you actually need, and that might mean you weren't certain of the meaning of /s)

Using variables in regular expression

I have a list of strings structured like:
C:/Users/scott-filter1.pgm C:/Users/scott-filter2.pgm C:/Users/scott-filter3.pgm
Essentially, what I want to do is remove C:/Users/scott- and .pgm leaving me with just filter1 for example.
So, this is my regular expression:
regsub -nocase {.pgm} [regsub -nocase {C:/Users/scott-} $list ""] ""
Which works fine, albeit a little clunky. Now, when I replace the inner regular expression with a regular expression that contains a variable, such as:
set myname scott
{C:/Users/$myname-}
It no longer works. Any ideas on how to achieve what I want to achieve?
Thanks!

You will need to remove the braces, as they prevent substitution: with braces, the variable is not replaced by its value, and the regex contains the literal string $myname instead. (It's also worth noting that $ in a regex matches at the end of the string.)
regsub "C:/Users/$myname-" $in "" out
Or you can do it with a single regsub:
set list "C:/Users/scott-filter1.pgm"
set myname "scott"
regsub -nocase -- "C:/Users/$myname-(.*)\\.pgm" $list {\1} out
puts $out
# => filter1
Notes:
If you remove the braces and use quotes, you need to double escape things you would otherwise escape once.
I'm using a capture group when I use parens and .* matches any character(s). The captured part is then put back using \1 in the replacement part, into the variable called out.
Strictly speaking, you need to escape . because this is a wildcard in regex and matches any 1 character. Because I'm using quotes, I need to double escape it with two backslashes.
Matching might be easier and more straightforward than substitution:
regexp -nocase -- "C:/Users/$myname-(.*)\\.pgm" $list - out
puts $out
# => filter1
If the 'name' can be anything, then you can use a more generic regex to avoid having to place the name in the regex... For instance, if $myname can never have a dash, you can use the negated class [^-] which matches anything except dash and you won't have to worry about double escapes:
regexp -nocase -- {C:/Users/[^-]+-(.*)\.pgm} $list - out
puts $out
# => filter1
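Since the question starts from a list of paths, here is a sketch of the same variable-in-pattern idea applied to every element. The names `files`, `stems` and `stem` are illustrative, and the anchors ^ and $ are an addition not present in the answer above.

```tcl
set files {C:/Users/scott-filter1.pgm C:/Users/scott-filter2.pgm C:/Users/scott-filter3.pgm}
set myname scott

set stems {}
foreach f $files {
    # Double quotes let $myname substitute; \\. becomes \. and \$ becomes $
    # by the time the RE engine sees the pattern.
    if {[regexp -nocase -- "^C:/Users/$myname-(.*)\\.pgm\$" $f -> stem]} {
        lappend stems $stem
    }
}
puts $stems
# => filter1 filter2 filter3
```

If $myname could contain RE metacharacters, it would need to be escaped before being spliced into the pattern.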

There is another way to do this, assuming the part you want is always in a file name between a dash and the last dot before the extension.
set foo C:/Users/scott-filter1.pgm
# => C:/Users/scott-filter1.pgm
set bar [file rootname [file tail $foo]]
# => scott-filter1
set baz [split $bar -]
# => scott filter1
set qux [lindex $baz end]
# => filter1
or
lindex [split [file rootname [file tail $foo]] -] end
# => filter1
The file commands work on any string that is recognizable as a file path. file tail yields the file path minus the part with the directories, i.e. only the actual file name. file rootname yields the file name minus the extension. split converts the string into a list, splitting it at every dash. lindex gets one item from the list, in this case the last item.
An even more ad-hoc-ish (but actually quite generic) solution:
lindex [split [lindex [split $foo -] end] .] 0
# => filter1
This invocation splits the file path at every dash and selects the last item. This item is again split at every dot, and the first item of the resulting list is selected.
Documentation: file, lindex, set, split

Since this is a list of filenames, we can use lmap (to apply an operation to each of the elements of a list, requires 8.6) and file (specifically file tail and file rootname) to do most of the work. A simple string map will finish it off, though a regsub could also have been used.
set filenames {C:/Users/scott-filter1.pgm C:/Users/scott-filter2.pgm C:/Users/scott-filter3.pgm}
set filtered [lmap name $filenames {
string map {"scott-" ""} [file rootname [file tail $name]]
# Regsub version:
#regsub {^scott-} [file rootname [file tail $name]] ""
}]
Older versions of Tcl will need to use foreach:
set filtered {}
foreach name $filenames {
lappend filtered [string map {"scott-" ""} [file rootname [file tail $name]]]
# Regsub version:
#lappend filtered [regsub {^scott-} [file rootname [file tail $name]] ""]
}

TCL Strip expression (Varying chars)

If I have a string such as:
foo_image_v001.ext
that could just as easily say
bar_image_v001.ext
How can I use Tcl to strip the first underscore and everything to the right of it, leaving me with just 'foo' or 'bar'?
I'm normally a Python guy, not very versed in Tcl, but it will work best in this case if I can just get it to work =)

Replace the first _ and everything after it with nothing:
set new [regsub {_.*} $old {}]

Here is one way to do it:
set filename foo_image_v001.ext
set prefix [regsub {_.*} $filename ""]
The regsub looks for the pattern {_.*} in $filename and replaces it with the empty string "".

Maybe you could use this?
set string "foo_image_v001.ext"
regexp -- {^([^_]+)} $string - var
foo gets stored in $var.

Don't need to break out a regular expression for this:
using string commands:
set prefix [string range $filename 0 [expr {[string first _ $filename] - 1}]]
Also, if you split the string on underscores, what you want is the first element in the resulting list:
set prefix [lindex [split $filename _] 0]
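A quick side-by-side of the approaches suggested in this thread, as a sketch. The variable names `a` through `d` are illustrative only.

```tcl
set filename foo_image_v001.ext

set a [regsub {_.*} $filename ""]                 ;# delete from the first _ onward
regexp -- {^([^_]+)} $filename -> b               ;# capture the leading run of non-underscores
set c [string range $filename 0 [expr {[string first _ $filename] - 1}]]
set d [lindex [split $filename _] 0]              ;# first field when split on _

puts [list $a $b $c $d]
# => foo foo foo foo
```

The variants differ when the name has no underscore at all: the regsub and split versions then return the whole string, while the string range version returns an empty string.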

Case matching regexp

I have been wondering about a regexp matching pattern in Tcl for some time and I've remained stumped as to how it was working. I'm using Wish and Tcl/Tk 8.5 by the way.
I have a random string MmmasidhmMm stored in $line and the code I have is:
while {[regexp -all {[Mm]} $line match]} {
puts $data $match
regsub {[Mm]} $line "" line
}
$data is a text file.
This is what I got:
m
m
m
m
m
m
While I was expecting:
M
m
m
m
M
m
I was trying some things to see how changing a bit would affect the results when I got this:
while {[regexp -all {^[Mm]} $line match]} {
puts $data $match
regsub {[Mm]} $line "" line
}
I get:
M
m
m
Surprisingly, $match keeps the case.
I was wondering why in the first case, $match automatically becomes lowercase for some reason. Unless I am not understanding how the regexp actually is working, I'm not sure what I could be doing wrong. Maybe there's a flag that fixes it that I don't know about?
I'm not sure I'll really use this kind of code some day, but I guess learning how it works might help me in other ways. I hope I didn't miss anything in there. Let me know if you need more information!

The key here is in your -all flag. The documentation for it says:
-all -- Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.
That means the variable match contains the very last match, which is a lower case 'm'. Drop the -all flag and you will get what you want.
Update
If your goal is to remove all 'm' regardless of case, that whole block of code can be condensed into just one line:
regsub -all {[Mm]} $line "" line
Or, more intuitively:
set line [string map -nocase {m ""} $line]; # Map all M's into nothing
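If the goal is to collect the matches themselves rather than to loop, regexp's -inline switch combined with -all returns every match as a list, with its case preserved. A sketch using the string from the question (the names `matches` and `stripped` are illustrative):

```tcl
set line MmmasidhmMm

# -all -inline returns the list of all matches instead of a count.
set matches [regexp -all -inline {[Mm]} $line]
puts $matches
# => M m m m M m

# Removing every M or m in one call leaves the remaining characters.
set stripped [regsub -all {[Mm]} $line ""]
puts $stripped
# => asidh
```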
