Check if a string ends with a substring, Tcl - regex

I have a search string .state_s[0] and two lists of strings:
{cache.state_s[0]} {cache.state_s[1]}
and
{cache.state_s[0]a} {cache.state_s[1]}
I need Tcl command(s) that check whether the search string matches any of the items in a list. Importantly, the solution should return a positive result only for the first list. I tried:
set pattern {.state_s[0]}
set escaped_pattern [string map {* \\* ? \\? [ \\[ ] \\] \\ \\\\} $pattern]
set m1 {{cache.state_s[0]} {cache.state_s[1]}}
set m2 {{cache.state_s[0]a} {cache.state_s[1]}}
regexp $escaped_pattern $m1
regexp $escaped_pattern $m2
However, both regexp calls above return 1.
Basically, I need a way to check whether a substring (containing special characters such as [) is at the end of a string.

You have the elements as lists in the variables m1 and m2.
set m1 {{cache.state_s[0]} {cache.state_s[1]}}
set m2 {{cache.state_s[0]a} {cache.state_s[1]}}
But when you pass m1 to regexp, Tcl treats it as one whole string, not as a list.
Since both lists, taken as strings, contain the text .state_s[0], regexp returns 1 for each of them.
If you want to apply the regular expression to each element, I would recommend using lsearch with the -regexp flag.
% set m1 {{cache.state_s[0]} {cache.state_s[1]}}
{cache.state_s[0]} {cache.state_s[1]}
% set m2 {{cache.state_s[0]a} {cache.state_s[1]}}
{cache.state_s[0]a} {cache.state_s[1]}
%
% lsearch -regexp -inline $m1 {\.state_s\[0]$};
cache.state_s[0]
% lsearch -regexp -inline $m2 {\.state_s\[0]$}
%
The pattern I have used here is {\.state_s\[0]$}. The trailing $ anchors the match at the end of the string, which ensures that the element has no further characters after the match. We don't have to escape the closing square bracket ] in a Tcl regular expression.
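If you would rather avoid regular expressions altogether, the "ends with" test can be sketched with plain string commands. The helper names endsWith and anyEndsWith below are made up for illustration and are not built-in Tcl commands; this is a minimal sketch that compares the tail of each element literally, so no escaping of [ or . is needed.

```tcl
# Hypothetical helper: true if $str ends with the literal text $suffix.
proc endsWith {str suffix} {
    set n [string length $suffix]
    if {$n == 0} { return 1 }
    # Take the last $n characters of $str and compare them literally.
    set tail [string range $str end-[expr {$n - 1}] end]
    return [string equal $tail $suffix]
}

# Hypothetical helper: true if any element of the list ends with $suffix.
proc anyEndsWith {items suffix} {
    foreach item $items {
        if {[endsWith $item $suffix]} { return 1 }
    }
    return 0
}

set m1 {{cache.state_s[0]} {cache.state_s[1]}}
set m2 {{cache.state_s[0]a} {cache.state_s[1]}}
set r1 [anyEndsWith $m1 {.state_s[0]}]
set r2 [anyEndsWith $m2 {.state_s[0]}]
puts "$r1 $r2"   ;# prints: 1 0
```

Because the comparison is literal, the search string can be used exactly as given, without building an escaped pattern first.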

Related

Regexp not matching string with [] and / in Tcl

I am unable to match a regex against a pin name that contains / and []. How can I match such a string with Tcl's regexp?
ISSUE:
% set inst "channel/rptrw12\[5\]"
channel/rptrw12[5]
% set pin "channel/rptrw12\[5\]/rpinv\[11\]/vcc"
channel/rptrw12[5]/rpinv[11]/vcc
% regexp -nocase "^$inst" $pin
0
PASSING CASE:
% regexp -nocase vcc $pin
1
% set pat "ctrl/crdtfifo"
ctrl/crdtfifo
% set pin2 "ctrl/crdtfifo/iwdatabuf"
ctrl/crdtfifo/iwdatabuf
% regexp -nocase $pat $pin2
1
Your problem is that you are fighting with RE engine metacharacters, specifically […], which defines a character set. If you want to continue using your current approach, you'll need to add more backslashes.
But you don't have to do that!
If you are asking the question “does this string exist in that string?” you can also consider using one of these:
Use string first and check if the result (where the substring is) is not negative:
if {[string first $inst $pin] >= 0} {
puts "Found it"
}
Use regexp ***=, which means “interpret the rest of this as a literal string, no metacharacters”:
if {[regexp ***=$inst $pin]} {
puts "Found it"
}
If you only want to match for equality at the start of the string (you're asking “does this string start with that string?”) you probably should instead do one of these:
Use string first and check if the resulting index is zero:
if {[string first $inst $pin] == 0} {
puts "Found '$inst' at the start of '$pin'"
}
Use string equal with the right option (very much like strncmp() in C, if you know that):
if {[string equal -length [string length $inst] $inst $pin]} {
puts "'$pin' starts with '$inst'"
}
If you remember your regular expressions, the [] syntax has special meaning in regexp. It defines a character group. For example:
[abc]
means match a or b or c.
Therefore the pattern:
channel/rptrw12[5]
means match the string:
channel/rptrw125
If you want to match the literal character [ in regexp you need to escape it (same with all other characters that have meaning in regexp like . or ? or ( etc.). So your pattern should be:
channel/rptrw12\[5\]
But remember, the characters \ and [ have special meaning in Tcl strings. So your code must do:
set inst "channel/rptrw12\\\[5\\\]"
The first \ escapes the \ character so that tcl will insert a single \ into the string. The third \ escapes the [ character so that tcl will not try to execute a command or function named 5.
Alternatively you can use {} instead of "":
set inst {channel/rptrw12\[5\]}
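If the instance name arrives in a variable and you cannot write the backslashes by hand, one option is to escape it programmatically. The sketch below uses a made-up helper name, reEscape, and assumes Tcl 8.5 or later (where regsub returns the result when no variable name is given). It puts a backslash in front of every non-word character, which Tcl's regular expression engine then treats as a literal.

```tcl
# Hypothetical helper: backslash-escape every non-word character so the
# result can be embedded in a regular expression as literal text.
proc reEscape {s} {
    return [regsub -all {\W} $s {\\&}]
}

set inst {channel/rptrw12[5]}
set pin  {channel/rptrw12[5]/rpinv[11]/vcc}

set unescaped [regexp -nocase "^$inst" $pin]             ;# [5] is a character set here
set escaped   [regexp -nocase "^[reEscape $inst]" $pin]  ;# [5] is literal here
puts "$unescaped $escaped"   ;# prints: 0 1
```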

Need explanation of tcl regexp inline example in the man page please

While trying to understand regexp and the use of -inline, I saw this example but couldn't understand how it works.
Link to the man page is: http://www.tcl.tk/man/tcl8.4/TclCmd/regexp.htm#M13
There, under the -inline option, this example is given:
regexp -inline -- {\w(\w)} " inlined "
=> {in n}
regexp -all -inline -- {\w(\w)} " inlined "
=> {in n li i ne e}
How does this "{\w(\w)}" yield "{in n}"? Can someone explain please.
Appreciate the help.
Thanks
If -inline is given but -all is not, regexp returns a list consisting of one value for the entire region matched and one value for each submatch (regions captured by parentheses). To see what the entire match is, ignore the parentheses: the pattern is now {\w\w}, matching the first two word characters in the string (in). The first submatch is what you get if you skip one word character (the \w outside the parentheses) and then capture the next word character (the \w inside the parentheses), getting n.
If both -inline and -all are given, regexp does this repeatedly, restarting at the first character beyond the last entire match.
I think that to understand -inline, you must first understand that -inline puts the matches (and submatches) in a list. Because if you had...
regexp -- {\w(\w)} " inlined " m1 m2
You will have...
% puts $m1
in
% puts $m2
n
As the whole match in is stored in m1 while the submatch of the capture group n is stored in m2.
Putting those in a list (i.e. when using -inline) will give {in n}.
When you now have -all and -inline at the same time (assuming that you already know that -all retrieves all non-overlapping matches in regexp), you can no longer use variable names after the input string, so you get a list containing all the matches and submatches. If I name them m and s (for match and submatch respectively), you have:
in n li i ne e
m  s m  s m  s
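One way to take that flat list apart again is foreach with two loop variables, one for the whole match and one for the submatch. This is just a sketch; the variable names flat and lines are only for illustration.

```tcl
set flat [regexp -all -inline -- {\w(\w)} " inlined "]

set lines {}
foreach {m s} $flat {
    # m is the whole match, s is the captured submatch
    lappend lines "match=$m submatch=$s"
}
puts [join $lines \n]
# match=in submatch=n
# match=li submatch=i
# match=ne submatch=e
```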

How to match nth occurrence in a string using regular expression

set test {stackoverflowa is a best solution finding site
stackoverflowb is a best solution finding site stackoverflowc is a
best solution finding sitestackoverflowd is a best solution finding
sitestackoverflowe is a best solution finding site}
regexp -all {stackoverflow} $test
The above gives 5 as output.
regexp {stackoverflow} $test
The above matches only the first occurrence of stackoverflow (i.e. the one in stackoverflowa).
My requirement is to match the 5th occurrence of stackoverflow (i.e. the one in stackoverflowe) in the string given above.
Please some one clarify my question..Thanks
Then another one question
Try
set results [regexp -inline -all {stackoverflow.} $test]
# => stackoverflowa stackoverflowb stackoverflowc stackoverflowd stackoverflowe
puts [lindex $results 4]
I'll be back to explain this further shortly, making pancakes right now.
So.
The command returns a list (-inline) of all (-all) substrings of the string contained in test that match the string "stackoverflow" (less quotes) plus one character, which can be any character. This list is stored in the variable results, and by indexing with 4 (because indexing is zero-based), the fifth element of this list can be retrieved (and, in this case, printed).
The dot at the end of the expression wasn't in your expression: I added it to check that I really did get the right match. You can of course omit the dot to match "stackoverflow" exactly.
ETA (from Donal's comment): in many cases it's convenient to extract not the string itself, but its position and extent within the searched string. The -indices option gives you that (I'm not using the dot in the expression now: the index list makes it obvious which one of the "stackoverflow"s I'm getting anyway):
set indices [regexp -inline -all -indices {stackoverflow} $test]
# => {0 12} {47 59} {94 106} {140 152} {186 198}
You can then use string range to get the string match:
puts [string range $test {*}[lindex $indices 4]]
The lindex $indices 4 gives me the list 186 198; the {*} prefix makes the two elements in that list appear as two separate arguments in the invocation of string range.
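The two steps can be wrapped in a small helper. The proc name nthMatch is made up for illustration, and the sketch assumes a pattern without capturing groups (with groups, the index list would also contain an entry per group) and Tcl 8.5 or later for {*}.

```tcl
# Hypothetical helper: return the n-th (1-based) match of $re in $str,
# or an empty string if there are fewer than n matches.
# Assumes $re contains no capturing parentheses.
proc nthMatch {re str n} {
    set all [regexp -all -inline -indices -- $re $str]
    set idx [lindex $all [expr {$n - 1}]]
    if {[llength $idx] != 2} { return "" }
    return [string range $str {*}$idx]
}

set sample "stackoverflowa one stackoverflowb two stackoverflowc"
set second [nthMatch {stackoverflow.} $sample 2]
puts $second   ;# prints: stackoverflowb
```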

need help in tcl command usage for regsub

I am a new learner of Tcl. I have an issue when using regsub. Consider the following scenario:
set test1 [list prefix_abc_3 abc_1 abc_2 AAA_0]
set test2 abc
regsub -all ${test2}_[1-9] $test1 [list] test1
I expected test1 to end up containing only prefix_abc_3 and AAA_0.
However, regsub has also modified the partially matching string prefix_abc_3. Does anyone have an idea how to make regsub act on exact words only in a list?
I tried to find a solution on the net but could not get any clues or hints. I'd appreciate it if someone here could help me.
\m and \M in regexps match the beginning and end of a word respectively. But you don't have a string of words in test1, but a list of elements, and sometimes there's a difference so don't mix the two. regsub only handles strings while lsearch works with lists:
set test1 [list prefix_abc_3 abc_1 abc_2 AAA_0]
set test2 abc
set test1 [lsearch -all -inline -not -regexp $test1 "^${test2}_\[1-9\]\$"]
If the pattern is that simple, you can use the -glob option (the default) instead of -regexp and maybe save some processor time.
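As a sketch of the -glob variant: glob patterns are matched against the whole element, so no ^ or $ anchors are needed, but the square brackets still have to be protected from Tcl's command substitution. The variable name kept is only for illustration.

```tcl
set test1 [list prefix_abc_3 abc_1 abc_2 AAA_0]
set test2 abc

# Keep every element that does NOT match the glob pattern abc_[1-9].
set kept [lsearch -all -inline -not -glob $test1 "${test2}_\[1-9\]"]
puts $kept   ;# prints: prefix_abc_3 AAA_0
```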
What exactly did you execute?
When I type the commands above into tclsh, it displays an error -
% set test1 [list prefix_abc_3 abc_1 abc_2 AAA_0]
prefix_abc_3 abc_1 abc_2 AAA_0
% set test2 abc
abc
% regsub -all ${test2}_[1-9] [list] test1
invalid command name "1-9"
I'm unsure what you are trying to do. You start by initialising test1 as a list. You then treat it as a string by passing it to regsub. This is a completely legal thing to do, but may indicate that you are confused by something. Are you trying to test your substitution by applying it four times, to each of prefix_abc_3, abc_1, abc_2 and AAA_0? You can certainly do that the way you are, but a more natural way would be to do
foreach test $test1 {
regsub $pattern $test [list] testResult
puts stdout $testResult
}
Then again, what are you trying to achieve with your substitution? It looks as though you are trying to replace the matched string with a null string, i.e. remove it altogether. Passing [list] as a null string is perfectly valid, but again may indicate confusion between lists and strings.
To achieve the result you want, all you need to do is add a leading space to your pattern, pass a space as the substitution string and escape the square brackets, i.e.
regsub -all " ${test2}_\[1-9\]" $test1 " " test1
but I suspect that this is a made-up example and you're really trying to do something slightly different.
Edit
To obtain a list that contains just those list entries that don't exactly match your pattern, I suggest
proc removeExactMatches {input} {
set result [list]; # Initialise the result list
foreach inputElement $input {
if {![regexp {^abc_[1-9]$} $inputElement]} {
lappend result $inputElement
}
}
return $result
}
set test1 [removeExactMatches [list prefix_abc_3 abc_1 abc_2 AAA_0]]
Notes:
i) I don't use regsub at all.
ii) Although it's safe and legal to switch around between lists and strings, it all takes time and it obscures what I'm trying to do, so I avoid it wherever possible. You seem to have a list of strings and you want to remove some of them, so that's what I use in my suggested solution. The regular expression commands in Tcl handle strings so I pass them strings.
iii) To ensure that the list elements match exactly, I anchor the pattern to the start and end of the string that I'm matching against using ^ and $.
iv) To prevent the interpreter from recognising the [1-9] in the regular expression pattern and trying to execute a (non-existent) command 1-9, I enclose the whole pattern string within curly brackets.
v) For greater generality, I might want to pass the pattern to the proc as well as the input list (of strings), in that case, I'd do
proc removeExactMatches {inputPattern input} {
.
.
.
set pattern "^"
append pattern $inputPattern
append pattern "\$"
.
.
.
if {![regexp $pattern $inputElement]} {
.
.
.
}
set test1 [removeExactMatches {abc_[1-9]} {prefix_abc_3 abc_1 abc_2 AAA_0}]
to minimise the number of characters that had to be escaped. (Actually I probably wouldn't use the quotation marks for the start and end anchors within the proc - they aren't really needed and I'm a lazy typist!)
Looking at your original question, it seems that you might want to vary only the abc part of the pattern, in which case you might want to just pass that to your proc and append the _[1-9] as well as the anchors within it - don't forget to escape the square brackets or use curly brackets if you go down this route.
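For reference, the generalised proc from note v) might look like this when written out in full. This is a sketch, not necessarily what the answerer had in mind: the non-capturing group (?:...) around the caller's pattern is my own addition, so that the anchors still apply to the whole pattern if it contains alternation.

```tcl
# Return the elements of $input that do NOT match $inputPattern exactly.
proc removeExactMatches {inputPattern input} {
    # Anchor at both ends; (?:...) is an assumption added here so that
    # a pattern such as a|b stays anchored as a whole.
    set pattern "^(?:${inputPattern})\$"
    set result [list]
    foreach inputElement $input {
        if {![regexp -- $pattern $inputElement]} {
            lappend result $inputElement
        }
    }
    return $result
}

set test1 [removeExactMatches {abc_[1-9]} {prefix_abc_3 abc_1 abc_2 AAA_0}]
puts $test1   ;# prints: prefix_abc_3 AAA_0
```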

How do I extract all matches with a Tcl regex?

Hi everybody, I'm looking for a regular expression solution. My problem is to extract all the hex numbers of the form H'xxxx. I used the regexp below, but instead of all the hex values I get only one number. How can I get all the hex numbers from this string?
set hex "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set res [regexp -all {H'([0-9A-Z]+)&} $hex match hexValues]
puts "$res H$hexValues"
The output I get is 5 H4D52
On -all -inline
From the documentation:
-all : Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.
-inline : Causes the command to return, as a list, the data that would otherwise be placed in match variables. When using -inline, match variables may not be specified. If used with -all, the list will be concatenated at each iteration, such that a flat list is always returned. For each match iteration, the command will append the overall match data, plus one element for each subexpression in the regular expression.
Thus to return all matches --including captures by groups-- as a flat list in Tcl, you can write:
set matchTuples [regexp -all -inline $pattern $text]
If the pattern has groups 0…N-1, then each match is an N-tuple in the list. Thus the number of actual matches is the length of this list divided by N. You can then use foreach with N variables to iterate over each tuple of the list.
If N = 2 for example, you have:
set numMatches [expr {[llength $matchTuples] / 2}]
foreach {group0 group1} $matchTuples {
...
}
References
regular-expressions.info/Tcl
Sample code
Here's a solution for this specific problem, annotated with output as comments (see also on ideone.com):
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set pattern {H'([0-9A-F]{4})}
set matchTuples [regexp -all -inline $pattern $text]
puts $matchTuples
# H'22EF 22EF H'2354 2354 H'4BD4 4BD4 H'4C4B 4C4B H'4D52 4D52 H'4DC9 4DC9
# \_________/ \_________/ \_________/ \_________/ \_________/ \_________/
#  1st match   2nd match   3rd match   4th match   5th match   6th match
puts [llength $matchTuples]
# 12
set numMatches [expr {[llength $matchTuples] / 2}]
puts $numMatches
# 6
foreach {whole hex} $matchTuples {
puts $hex
}
# 22EF
# 2354
# 4BD4
# 4C4B
# 4D52
# 4DC9
On the pattern
Note that I've changed the pattern slightly:
Instead of [0-9A-Z]+, e.g. [0-9A-F]{4} is more specific for matching exactly 4 hexadecimal digits
If you insist on matching the &, then the last hex string (H'4DC9 in your input) can not be matched
This explains why you get 4D52 in the original script, because that's the last match with &
Maybe get rid of the &, or use (&|$) instead, i.e. a & or the end of the string $.
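As a sketch of the second suggestion, the separator can be matched with a non-capturing group, (?:&|$), so that each match still contributes exactly two list elements. The variable name hexValues is reused from the question purely for illustration.

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

# Four hex digits followed by either & or the end of the string.
# (?:...) groups without capturing, so N stays 2 per match.
set pattern {H'([0-9A-F]{4})(?:&|$)}

set hexValues {}
foreach {whole hex} [regexp -all -inline $pattern $text] {
    lappend hexValues $hex
}
puts $hexValues   ;# prints: 22EF 2354 4BD4 4C4B 4D52 4DC9
```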
References
regular-expressions.info/Finite Repetition, Anchors
I'm not Tclish, but I think you need to use both the -inline and -all options:
regexp -all -inline {H'([0-9A-Z]+)&} $string
EDIT: Here it is again, this time with a corrected regex (see the comments):
regexp -all -inline {H'[0-9A-F]+&} $string