Passing a variable to regexp when the variable may have brackets (Tcl)

In my job, I deal a lot with entities whose names may contain square brackets. We mostly use tcl, so square brackets can sometimes cause havoc. I'm trying to do the following:
set pat {pair_shap_val[9]}
set aff {pair_shap_val[9]_affin_input}
echo [regexp "${pat}_affin.*" $aff]
However, this returns a 0 when I would expect a 1. I'm certain that when ${pat} is passed to the regexp engine, the brackets are being read as the character class "[9]" instead of the literal text "\[9\]".
How do I phrase the regexp so a pattern contains a variable when the variable itself may have special regexp characters?
EDIT: An easy way would be to just escape the brackets when setting $pat. However, the value for $pat is passed to me by a function so I cannot easily do that.

Just ruthlessly escape all non-word chars:
set pat {pair_shap_val[9]}
set aff {pair_shap_val[9]_affin_input}
puts [regexp "${pat}_affin.*" $aff] ;# ==> 0
set escaped_pat [regsub -all {\W} $pat {\\&}]
puts $escaped_pat ;# ==> pair_shap_val\[9\]
puts [regexp "${escaped_pat}_affin.*" $aff] ;# ==> 1
A second thought: this doesn't really seem to require regular expression matching. It appears you just need to check that the pat string is contained in the aff string:
% expr {[string first $pat $aff] != -1}
1
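If the regular expression part is still needed (for example, to anchor the match or keep the `_affin.*` suffix), the escaping step above can be wrapped in a small helper. This is only a sketch: `reQuote` is a hypothetical name, not a built-in command, and it assumes that a backslash in front of any non-word character makes that character literal, as described in re_syntax.

```tcl
# Hypothetical helper: backslash-escape every non-word character so the
# result matches literally when embedded in a larger regular expression.
proc reQuote {str} {
    return [regsub -all {\W} $str {\\&}]
}

set pat {pair_shap_val[9]}
set aff {pair_shap_val[9]_affin_input}

# The escaped name is anchored at the start; the suffix stays a real regex.
puts [regexp "^[reQuote $pat]_affin.*" $aff]                         ;# 1
# Without the literal brackets in the input there is no match.
puts [regexp "^[reQuote $pat]_affin.*" {pair_shap_val9_affin_input}] ;# 0
```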

Related

Regexp not matching string with [] and / in Tcl

I am unable to match regex with a pin name having patterns with / and []. How to match string with this expression in tcl regexp?
ISSUE:
% set inst "channel/rptrw12\[5\]"
channel/rptrw12[5]
% set pin "channel/rptrw12\[5\]/rpinv\[11\]/vcc"
channel/rptrw12[5]/rpinv[11]/vcc
% regexp -nocase "^$inst" $pin
0
PASSING CASE:
% regexp -nocase vcc $pin
1
% set pat "ctrl/crdtfifo"
ctrl/crdtfifo
% set pin2 "ctrl/crdtfifo/iwdatabuf"
ctrl/crdtfifo/iwdatabuf
% regexp -nocase $pat $pin2
1
Your problem is that you are fighting with RE engine metacharacters, specifically […], which defines a character set. If you want to continue using your current approach, you'll need to add more backslashes.
But you don't have to do that!
If you are asking the question “does this string exist in that string?” you can also consider using one of these:
Use string first and check if the result (where the substring is) is not negative:
if {[string first $inst $pin] >= 0} {
puts "Found it"
}
Use regexp ***=, which means “interpret the rest of this as a literal string, no metacharacters”:
if {[regexp ***=$inst $pin]} {
puts "Found it"
}
If you only want to match for equality at the start of the string (you're asking “does this string start with that string?”) you probably should instead do one of these:
Use string first and check if the resulting index is zero:
if {[string first $inst $pin] == 0} {
puts "Found '$inst' at the start of '$pin'"
}
Use string equal with the right option (very much like strncmp() in C, if you know that):
if {[string equal -length [string length $inst] $inst $pin]} {
puts "'$pin' starts with '$inst'"
}
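As a quick check, the options above can be run against the values from the question. This is a minimal sketch; the `-nocase` flags are added only to mirror the question's original call.

```tcl
set inst {channel/rptrw12[5]}
set pin  {channel/rptrw12[5]/rpinv[11]/vcc}

# Substring test: the ***= prefix makes the rest of the pattern literal.
puts [regexp -nocase ***=$inst $pin]                                  ;# 1

# Index of the first occurrence; 0 means $pin starts with $inst.
puts [string first $inst $pin]                                        ;# 0

# Prefix test: compare only the first [string length $inst] characters.
puts [string equal -nocase -length [string length $inst] $inst $pin]  ;# 1
```

Because everything after `***=` is taken literally, a `^` anchor cannot be combined with it, which is one reason the string commands are the simpler choice for the "starts with" case.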
If you remember your regular expressions, the [] syntax has special meaning in regexp. It defines a character group. For example:
[abc]
means match a or b or c.
Therefore the pattern:
channel/rptrw12[5]
means match the string:
channel/rptrw125
If you want to match the literal character [ in regexp you need to escape it (same with all other characters that have meaning in regexp like . or ? or ( etc.). So your pattern should be:
channel/rptrw12\[5\]
But remember, the characters \ and [ have special meaning in Tcl strings. So your code must do:
set inst "channel/rptrw12\\\[5\\\]"
The first \ escapes the \ character so that tcl will insert a single \ into the string. The third \ escapes the [ character so that tcl will not try to execute a command or function named 5.
Alternatively you can use {} instead of "":
set inst {channel/rptrw12\[5\]}
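A short sketch to confirm that the two spellings above produce the same pattern, and that the pattern then matches the pin name from the question (the variable names `quoted` and `braced` are only illustrative):

```tcl
set quoted "channel/rptrw12\\\[5\\\]"
set braced {channel/rptrw12\[5\]}
set pin    {channel/rptrw12[5]/rpinv[11]/vcc}

puts $quoted                            ;# channel/rptrw12\[5\]
puts [string equal $quoted $braced]     ;# 1
puts [regexp -nocase "^$braced" $pin]   ;# 1
```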

In Tcl how can I remove all zeroes to the left but the zeroes to the right should remain?

Folks! I ran into a problem that I can't solve by myself.
The numbers "08" and "09" cannot be read like the others (01, 02, 03, 04, etc.) and must be treated separately in Tcl.
I can't find a way to remove all the leading zeros (I say ALL because there is more than one on the same line) while leaving the zeros on the right intact.
It may sound simple to those who are already thoroughly familiar with Tcl/Tk. But I am just starting out and still looking for information. I have read a lot of material on the internet, including this: https://stackoverflow.com/questions/2110864/handling-numbers-with-leading-zeros-in-tcl#2111822
But nothing showed me how to eliminate all the leading zeros in one sweep.
I need you to give me a return like this: 2:9:10
I need this to later manipulate the result with the expr [arithmetic expression] command.
In this example it just removes a single leading zero:
set time {02:09:10}
puts [regsub {^0*(.+)} $time {\1}]
# Return: 2:09:10
If anyone can help me with this, I would be grateful.
The group (^|:) matches either the beginning of the string or a colon.
0+ matches one or more zeros. Replace with the group match \1,
otherwise the colons get lost. And of course, use -all to do all of
the matches in the target string.
% set z 02:09:10
02:09:10
% regsub -all {(^|:)0+} $z {\1} x
2
% puts $x
2:9:10
%
Edit: As Barmar points out, this will turn a field that is all zeros into an empty string (:00 becomes just :).
A better regex might be:
regsub -all {(^|:)0} $z {\1} x
This will only remove a single leading 0.
You're only matching the 0 at the beginning of the string; you need to match after each : as well.
puts [regsub -all {(^|:)0*([^:])} $time {\1\2}]
In general it is best to use scan $str %d to convert a decimal number with possible leading zeroes to its actual value.
But in your case this will also work (and seems simpler to me than the answers given earlier and doesn't rely on the separator being a colon):
regsub -all {0*(\d+)} $time {\1}
This will remove any number of leading zeroes, but doesn't trim 00 down to an empty string. Also trailing zeroes will not be affected.
regsub -all {0*(\d+)} {0003:000:1000} {\1} ;# => 3:0:1000
The scan command is useful here to extract three decimal numbers out of that string:
% set time {02:09:10}
02:09:10
% scan $time {%d:%d:%d} h m s
3
% puts [list $h $m $s]
2 9 10
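Since the stated goal is to feed the values to expr, here is a minimal sketch that goes from the raw string to a computed result. The conversion to seconds is just an illustrative calculation, not something the question asked for.

```tcl
set time {02:09:10}

# scan returns the number of conversions performed; anything other than 3
# means the input did not look like hh:mm:ss.
if {[scan $time {%d:%d:%d} h m s] != 3} {
    error "unexpected time format: $time"
}

# h, m and s are now plain decimal integers (2, 9 and 10).
set seconds [expr {$h * 3600 + $m * 60 + $s}]
puts $seconds   ;# 7750
```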
There are a few tricky edge cases here. Specifically, the string 02:09:10:1001:00 covers the key ones (including middle zeroes, only zeroes). We can use a single substitution command to do the work:
regsub -all {\m0+(?=\d)} $str {}
(This uses a word start anchor and lookahead constraint.)
However, I would be more inclined to use other tools for this sort of thing. For times, for example, parsing them is better done with scan:
set time "02:09:10"
scan $time "%d:%d:%d" h m s
Or, depending on what is going on, clock scan (which handles dates as well, making it more useful in some cases and less in others).
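The single-substitution approach can be checked against the edge-case string mentioned above. The second variant is only a sketch of the scan-based alternative for an arbitrary number of fields; it assumes Tcl 8.6 or later for `lmap` and a colon separator.

```tcl
set str {02:09:10:1001:00}

# Remove zeros at the start of each number, but only when a digit follows,
# so a field of all zeros keeps its last 0.
puts [regsub -all {\m0+(?=\d)} $str {}]   ;# 2:9:10:1001:0

# Alternative sketch: convert each field with scan, then rejoin.
set fields [lmap part [split $str :] {scan $part %d}]
puts [join $fields :]                      ;# 2:9:10:1001:0
```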

Handle commas in quoted strings in Tcl

I'm using the following line in Tcl to parse a comma-separated line of fields. Some of the fields may be quoted so they can contain commas:
set line {12,"34","56"}
set fresult [regsub -all {(\")([^\"]+)(\",)|([^,\"]+),} $line {{\2\4} } fields]
puts $fields
{12} {34} "56"
(It's a bit strange that the last field is quoted instead of braced but that's not the problem here)
However, when there is a comma in the quote, it does not work:
set line {12,"34","56,78"}
set fresult [regsub -all {(\")([^\"]+)(\",)|([^,\"]+),} $line {{\2\4} } fields]
puts $fields
{12} {34} "{56} 78"
I would expect:
{12} {34} {56,78}
Is there something wrong with my regexp or is there something Tcl-ish going on?
One option that comes to mind is using the CSV functionality in Tcllib. (No reason to reinvent the wheel unless you have to...)
http://tcllib.sourceforge.net/doc/csv.html
Docs excerpt:
::csv::split ?-alternate? line {sepChar ,} {delChar "}
Converts a line in CSV format into a list of the values contained in the line. The character used to separate the values from each other can be defined by the caller, via sepChar, but this is optional. The default is ",". The quoting character can be defined by the caller, but this is optional. The default is '"'. If the option -alternate is specified, a slightly different syntax is used to parse the input. This syntax is explained below, in the section FORMAT.
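A minimal sketch of the Tcllib route, applied to the line from the question. It assumes Tcllib is installed so that `package require csv` succeeds, and it relies on the default separator and quote characters.

```tcl
package require csv   ;# provided by Tcllib, which must be installed

set line {12,"34","56,78"}
set fields [::csv::split $line]

puts $fields             ;# 12 34 56,78
puts [lindex $fields 2]  ;# 56,78
```

The result is an ordinary Tcl list, so the quoted comma stays inside the third element.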
The problem seems to be an extra comma: you only accept quoted strings if they have a comma after them, and you do the same for non-quoted tokens. This works (note that neither alternative ends with a comma):
set fresult [regsub -all {(\")([^\"]+)(\")|([^,\"]+)} $line {{\2\4} } fields]
Working Example: http://ideone.com/O2hss
You can safely keep the commas out of the pattern - the regex engine will keep searching for new matches: it will skip a comma it cannot match, and start at the next character.
Bonus: this version also handles escaped quotes written as \" (if your data doubles the quotes instead, you should be able to adapt it easily by using "" in place of \\.):
set fresult [regsub -all {"((?:[^"\\]|\\.)+)"|([^,"]+)} $line {{\1\2} } fields]
Example: http://ideone.com/ztkBh
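One thing to keep in mind with the comma-free pattern: regsub copies any text the pattern does not match into its result unchanged, so the skipped commas would still appear between the braced fields. A sketch that sidesteps this is to collect the matches with `regexp -all -inline` and build the list directly. The variable names are illustrative, and like the pattern above it does not handle empty fields.

```tcl
set line {12,"34","56,78"}

set fields {}
# Each match yields three values: the whole match, the quoted capture and
# the bare capture. Exactly one of the two captures is non-empty.
foreach {whole quoted bare} [regexp -all -inline {"([^"]+)"|([^,"]+)} $line] {
    lappend fields $quoted$bare
}

puts $fields             ;# 12 34 56,78
puts [lindex $fields 2]  ;# 56,78
```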
Use the following regsub
% set line {12,"34","56,78"}
% regsub -all {(,")|(",)|"} $line " " line
% set line
12 34 56,78 <<< Result
Here all the occurrences of ," or ", or " (in order) are replaced by space
As you said to @Kobi, if you allow for empty fields, you should allow for empty strings ""
{((\")([^\"]*)(\")|([^,\"]*))(,|$)} where the fields of interest shifted to 3 and 5
Expanded: { ( (\")([^\"]*)(\") | ([^,\"]*) ) (,|$) } I admit, I don't know if tcl allows (?:) non-capture grouping.

TCL regsub isn't working when the expression has [0]

I tried the following code:
set exp {elem[0]}
set temp {elem[0]}
regsub $temp $exp "1" exp
if {$exp} {
puts "######### 111111111111111 ################"
} else {
puts "########### 0000000000000000 ############"
}
Of course, this is the easiest regsub possible (the words match completely), and still it doesn't work, and no substitution is done. If I write elem instead of elem[0], everything works fine.
I tried using {elem[0]}, elem[0], "elem[0]" etc, and none of them worked.
Any clue anyone?
This is the easiest regsub possible (the words match completely)
Actually, no, the words don't match. You see, in a regular expression, square brackets have meaning. Your expression {elem[0]} actually means:
match the sequence of letters 'e'
followed by 'l'
followed by 'e'
followed by 'm'
followed by '0' (the character for the number zero)
So it would match the string "elem0" not "elem[0]" since the character after 'm' is not '0'.
What you want is {elem\[0\]} <-- backslash escapes special meaning.
Read the manual for tcl's regular expression syntax, re_syntax, for more info on how regular expressions work in tcl.
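The difference is easy to see by trying the patterns directly. A minimal sketch, reusing the variable name from the question:

```tcl
# Unescaped: [0] is a character class that matches the single character 0.
puts [regexp {elem[0]}   {elem0}]     ;# 1
puts [regexp {elem[0]}   {elem[0]}]   ;# 0

# Escaped: the brackets are matched literally.
puts [regexp {elem\[0\]} {elem[0]}]   ;# 1

# The regsub from the question, with an escaped pattern.
set exp {elem[0]}
regsub {elem\[0\]} $exp 1 exp
puts $exp                             ;# 1
```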
In addition to @slebetman's answer, if you want any special characters in your regular expression to be treated like plain text, there is special syntax for that:
set word {abd[0]}
set regex $word
regexp $regex $word ;# => returns 0, did not match
regexp "(?q)$regex" $word ;# => returns 1, matched
That (?q) marker must be the first part of the RE.
Also, if you're really just comparing literal strings, consider the simpler if {$str1 eq $str2} ... or the glob-style matching of [string match]
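Applied to the regsub from the question, the literal-pattern options might look like the sketch below. The `viaRegsub` variable is an illustrative name, and `string map` is shown as one possible regex-free replacement, not something the answers above prescribe.

```tcl
set exp  {elem[0]}
set temp {elem[0]}

# (?q) makes the whole pattern literal, so no escaping is needed.
regsub "(?q)$temp" $exp 1 viaRegsub
puts $viaRegsub                         ;# 1

# string map replaces literal substrings without any regex at all.
puts [string map [list $temp 1] $exp]   ;# 1

# If only equality matters, compare the strings directly.
puts [expr {$exp eq $temp}]             ;# 1
```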

How do I extract all matches with a Tcl regex?

Hi everybody. I am looking for a regular expression that extracts all the hex numbers of the form H'xxxx. I used the regexp below, but I get only one number instead of all the hex values. How can I get every hex number from this string?
set hex "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set res [regexp -all {H'([0-9A-Z]+)&} $hex match hexValues]
puts "$res H$hexValues"
The output I get is: 5 H4D52
On -all -inline
From the documentation:
-all : Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.
-inline : Causes the command to return, as a list, the data that would otherwise be placed in match variables. When using -inline, match variables may not be specified. If used with -all, the list will be concatenated at each iteration, such that a flat list is always returned. For each match iteration, the command will append the overall match data, plus one element for each subexpression in the regular expression.
Thus to return all matches --including captures by groups-- as a flat list in Tcl, you can write:
set matchTuples [regexp -all -inline $pattern $text]
If the pattern has groups 0…N-1, then each match is an N-tuple in the list. Thus the number of actual matches is the length of this list divided by N. You can then use foreach with N variables to iterate over each tuple of the list.
If N = 2 for example, you have:
set numMatches [expr {[llength $matchTuples] / 2}]
foreach {group0 group1} $matchTuples {
...
}
References
regular-expressions.info/Tcl
Sample code
Here's a solution for this specific problem, annotated with output as comments (see also on ideone.com):
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set pattern {H'([0-9A-F]{4})}
set matchTuples [regexp -all -inline $pattern $text]
puts $matchTuples
# H'22EF 22EF H'2354 2354 H'4BD4 4BD4 H'4C4B 4C4B H'4D52 4D52 H'4DC9 4DC9
# \_________/ \_________/ \_________/ \_________/ \_________/ \_________/
# 1st match 2nd match 3rd match 4th match 5th match 6th match
puts [llength $matchTuples]
# 12
set numMatches [expr {[llength $matchTuples] / 2}]
puts $numMatches
# 6
foreach {whole hex} $matchTuples {
puts $hex
}
# 22EF
# 2354
# 4BD4
# 4C4B
# 4D52
# 4DC9
On the pattern
Note that I've changed the pattern slightly:
Instead of [0-9A-Z]+, e.g. [0-9A-F]{4} is more specific for matching exactly 4 hexadecimal digits
If you insist on matching the &, then the last hex string (H'4DC9 in your input) can not be matched
This explains why you get 4D52 in the original script, because that's the last match with &
Maybe get rid of the &, or use (&|$) instead, i.e. a & or the end of the string $.
References
regular-expressions.info/Finite Repetition, Anchors
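For completeness, here is a sketch of the second suggestion, keeping the & in the pattern but also accepting the end of the string. It assumes a non-capturing group `(?:...)` so that each match still contributes exactly two list elements.

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

set hexValues {}
# Each match yields the whole match plus one capture (the four hex digits).
foreach {whole hex} [regexp -all -inline {H'([0-9A-F]{4})(?:&|$)} $text] {
    lappend hexValues $hex
}

puts $hexValues            ;# 22EF 2354 4BD4 4C4B 4D52 4DC9
puts [llength $hexValues]  ;# 6
```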
I'm not Tclish, but I think you need to use both the -inline and -all options:
regexp -all -inline {H'([0-9A-Z]+)&} $string
EDIT: Here it is again, this time with a corrected regex (see the comments):
regexp -all -inline {H'[0-9A-F]+&} $string