need help in tcl command usage for regsub - regex

I am new to Tcl and have run into a problem with regsub. Consider the following scenario:
set test1 [list prefix_abc_3 abc_1 abc_2 AAA_0]
set test2 abc
regsub -all ${test2}_[1-9] $test1 [list] test1
I expected test1 to end up as [prefix_abc_3 AAA_0].
However, regsub also removed the partial match inside prefix_abc_3. Does anyone know how to make regsub remove only exact words in a list?
I searched the net for a solution but could not find any hints. I'd appreciate any help.

\m and \M in regexps match the beginning and end of a word respectively. However, test1 does not hold a string of words but a list of elements; the two sometimes differ, so don't mix them. regsub only handles strings, while lsearch works with lists:
set test1 [list prefix_abc_3 abc_1 abc_2 AAA_0]
set test2 abc
set test1 [lsearch -all -inline -not -regexp $test1 "^${test2}_\[1-9\]\$"]
If the pattern is that simple, you can use the -glob option (the default) instead of -regexp and maybe save some processor time.
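A minimal sketch of the -glob variant mentioned above, assuming the prefix contains no glob metacharacters (*, ?, [, ], \). The variable name kept is only for illustration. A glob pattern has to match the whole element, so no anchors are needed:

```tcl
set test1 [list prefix_abc_3 abc_1 abc_2 AAA_0]
set test2 abc
# The backslashes stop Tcl from treating [1-9] as a command substitution;
# the glob matcher still sees the character class [1-9].
set kept [lsearch -all -inline -not -glob $test1 "${test2}_\[1-9\]"]
puts $kept
```

Since -glob is the default matching style for lsearch, the -glob flag could be dropped; it is spelled out here only for clarity.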

What exactly did you execute?
When I type the commands above into tclsh, it displays an error -
% set test1 [list prefix_abc_3 abc_1 abc_2 AAA_0]
prefix_abc_3 abc_1 abc_2 AAA_0
% set test2 abc
abc
% regsub -all ${test2}_[1-9] [list] test1
invalid command name "1-9"
I'm unsure what you are trying to do. You start by initialising test1 as a list. You then treat it as a string by passing it to regsub. This is a completely legal thing to do, but may indicate that you are confused by something. Are you trying to test your substitution by applying it four times, to each of prefix_abc_3, abc_1, abc_2 and AAA_0? You can certainly do that the way you are, but a more natural way would be to do
foreach test $test1 {
regsub $pattern $test [list] testResult
puts stdout $testResult
}
Then again, what are you trying to achieve with your substitution? It looks as though you are trying to replace the string abc_N with a null string, i.e. remove it altogether. Passing [list] as a null string is perfectly valid, but again may indicate confusion between lists and strings.
To achieve the result you want, all you need to do is add a leading space to your pattern, pass a space as the substitution string and escape the square brackets, i.e.
regsub -all " ${test2}_\[1-9\]" $test1 " " test1
but I suspect that this is a made-up example and you're really trying to do something slightly different.
Edit
To obtain a list that contains just those list entries that don't exactly match your pattern, I suggest
proc removeExactMatches {input} {
set result [list]; # Initialise the result list
foreach inputElement $input {
if {![regexp {^abc_[0-9]$} $inputElement]} {
lappend result $inputElement
}
}
return $result
}
set test1 [removeExactMatches [list prefix_abc_3 abc_1 abc_2 AAA_0]]
Notes:
i) I don't use regsub at all.
ii) Although it's safe and legal to switch around between lists and strings, it all takes time and it obscures what I'm trying to do, so I avoid it wherever possible. You seem to have a list of strings and you want to remove some of them, so that's what I use in my suggested solution. The regular expression commands in Tcl handle strings so I pass them strings.
iii) To ensure that the list elements match exactly, I anchor the pattern to the start and end of the string that I'm matching against using ^ and $.
iv) To prevent the interpreter from recognising the [1-9] in the regular expression pattern and trying to execute a (non-existent) command 1-9, I enclose the whole pattern string within curly brackets.
v) For greater generality, I might want to pass the pattern to the proc as well as the input list (of strings). In that case, I'd do
proc removeExactMatches {inputPattern input} {
.
.
.
set pattern "^"
append pattern $inputPattern
append pattern "\$"
.
.
.
if {![regexp $pattern $inputElement]} {
.
.
.
}
set test1 [removeExactMatches {abc_[1-9]} {prefix_abc_3 abc_1 abc_2 AAA_0}]
to minimise the number of characters that had to be escaped. (Actually I probably wouldn't use the quotation marks for the start and end anchors within the proc - they aren't really needed and I'm a lazy typist!)
Looking at your original question, it seems that you might want to vary only the abc part of the pattern, in which case you might want to just pass that to your proc and append the _[0-9] as well as the anchors within it - don't forget to escape the square brackets or use curly brackets if you go down this route.
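A rough sketch of that last variant, where only the prefix is passed in and the proc adds the _[0-9] suffix and the anchors itself. The proc name removeByPrefix is made up for illustration, and the sketch assumes the prefix contains no regular expression metacharacters:

```tcl
# Hypothetical helper: returns the elements of input that do NOT
# consist of exactly <prefix>_<single digit>.
proc removeByPrefix {prefix input} {
    # Build ^<prefix>_[0-9]$ ; the braces keep [0-9] and $ away from
    # command and variable substitution.
    set pattern "^"
    append pattern $prefix {_[0-9]$}
    set result [list]
    foreach element $input {
        if {![regexp -- $pattern $element]} {
            lappend result $element
        }
    }
    return $result
}

set test1 [removeByPrefix abc {prefix_abc_3 abc_1 abc_2 AAA_0}]
puts $test1
```

The -- before the pattern is a precaution so that a pattern beginning with a dash is not mistaken for an option.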

Related

TCL Find in list

I need to search a list for the exact word CUT_LEVEL_EP1.5, without relying on the text that comes before or after it in each element. Below is my list:
set l {L1555(CUT_LEVEL_EP1.5-1) L1560(CUT_LEVEL_EP1.5-2) L1565(CUT_LEVEL_EP1.5) L1570(CUT_LEVEL_EP1.5-3)}
Here's what I tried:
set index [lsearch -regexp $l {\yCUT_LEVEL_EP1.5\M}]
if {$index > -1} {
puts [lindex $l $index]
}
The result is not what I expect: I get L1555(CUT_LEVEL_EP1.5-1), but I would like L1565(CUT_LEVEL_EP1.5).
As your requirements don't match the built-in definition of a "word", you cannot use the \M constraint escape. An alternative may be to use a negative lookahead:
lsearch -regexp $l {\yCUT_LEVEL_EP1.5(?![A-Z0-5.-])}
It looks to me like you want to find the string only if it is the entirety of the contents of the parentheses. (You also want to avoid the trap of . matching things other than itself.) You can do it easily, but only with a little trickery.
set str "CUT_LEVEL_EP1.5"
set index [lsearch -regexp $l ***=($str)]
The ***= says that the rest of the RE is a literal, and then we put the parentheses directly (as well as substituting in the string we are really looking for, safe because of the force-literal trick).
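As a quick check of the force-literal trick, the sketch below runs it against the list from the question and against a made-up decoy list (the decoy element is an assumption, added only to show that . no longer matches an arbitrary character):

```tcl
set l {L1555(CUT_LEVEL_EP1.5-1) L1560(CUT_LEVEL_EP1.5-2) L1565(CUT_LEVEL_EP1.5) L1570(CUT_LEVEL_EP1.5-3)}
set str "CUT_LEVEL_EP1.5"

# ***= makes everything after it a literal string, so ( ) and . are
# matched as themselves.
set index [lsearch -regexp $l "***=($str)"]
puts [lindex $l $index]

# Hypothetical decoy: 'x' where the literal '.' should be.
set decoy {L1(CUT_LEVEL_EP1x5)}
puts [lsearch -regexp $decoy "***=($str)"]
```

The first puts should print L1565(CUT_LEVEL_EP1.5) and the second -1, because the closing parenthesis must follow the string directly and the dot is no longer a wildcard.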

In Tcl how can I remove all zeroes to the left but the zeroes to the right should remain?

Folks! I ran into a problem that I can't solve by myself.
The numbers "08" and "09" cannot be read like the others (01, 02, 03, 04, etc.) and must be treated separately in Tcl.
I can't find a way to remove all the leading zeros (I say ALL because there is more than one on the same line) while leaving the zeros on the right intact.
It may sound simple to those who are already thoroughly familiar with Tcl/Tk, but I am just starting out. I have read a lot of material on the internet, including this: https://stackoverflow.com/questions/2110864/handling-numbers-with-leading-zeros-in-tcl#2111822
Nothing I found shows how to eliminate all the leading zeros in one sweep.
I need a result like this: 2:9:10
I need this so that I can later manipulate the result with the expr [arithmetic expression] command.
My attempt below only removes the leading zero at the start of the string:
set time {02:09:10}
puts [regsub {^0*(.+)} $time {\1}]
# Return: 2:09:10
If anyone can help me out, I'd be grateful.
The group (^|:) matches either the beginning of the string or a colon.
0+ matches one or more zeros. Replace with the group match \1,
otherwise the colons get lost. And of course, use -all to do all of
the matches in the target string.
% set z 02:09:10
02:09:10
% regsub -all {(^|:)0+} $z {\1} x
2
% puts $x
2:9:10
%
Edit: As Barmar points out, this will reduce a 00 field to an empty string (:00 becomes :).
A better regex might be:
regsub -all {(^|:)0} $z {\1} x
This will only remove a single leading 0.
You're only matching the 0 at the beginning of the string; you need to match after each : as well.
puts [regsub -all {(^|:)0*([^:])} $time {\1\2}]
In general it is best to use scan $str %d to convert a decimal number with possible leading zeroes to its actual value.
But in your case this will also work (and seems simpler to me than the answers given earlier and doesn't rely on the separator being a colon):
regsub -all {0*(\d+)} $time {\1}
This will remove any number of leading zeroes, but doesn't trim 00 down to an empty string. Also trailing zeroes will not be affected.
regsub -all {0*(\d+)} {0003:000:1000} {\1} => 3:0:1000
The scan command is useful here to extract three decimal numbers out of that string:
% set time {02:09:10}
02:09:10
% scan $time {%d:%d:%d} h m s
3
% puts [list $h $m $s]
2 9 10
There are a few tricky edge cases here. Specifically, the string 02:09:10:1001:00 covers the key ones (including middle zeroes, only zeroes). We can use a single substitution command to do the work:
regsub -all {\m0+(?=\d)} $str {}
(This uses a word start anchor and lookahead constraint.)
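A short sketch applying that substitution to the edge-case string above; the variable name stripped is just for illustration:

```tcl
set str "02:09:10:1001:00"
# \m anchors at the start of each number; the lookahead (?=\d) insists
# that at least one digit remains, so an all-zero field keeps one 0.
set stripped [regsub -all {\m0+(?=\d)} $str {}]
puts $stripped
```

Zeroes in the middle of 1001 are left alone because they are not at a word start, and 00 collapses to a single 0 rather than to nothing.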
However, I would be more inclined to use other tools for this sort of thing. For times, for example, parsing them is better done with scan:
set time "02:09:10"
scan $time "%d:%d:%d" h m s
Or, depending on what is going on, clock scan (which handles dates as well, making it more useful in some cases and less in others).
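Since the question mentions feeding the result to expr, here is a minimal sketch of the scan route. It assumes the input always has the form HH:MM:SS; the variable seconds is introduced only for illustration. %d always reads decimal, so 08 and 09 are safe even on Tcl 8.x, where expr would otherwise try to read a leading-zero number as octal:

```tcl
set time "08:09:10"
# scan returns the number of conversions performed; 3 means all
# three fields were read.
if {[scan $time "%d:%d:%d" h m s] != 3} {
    error "unexpected time format: $time"
}
set seconds [expr {$h * 3600 + $m * 60 + $s}]
puts "$h $m $s -> $seconds seconds"
```

Checking the return value of scan guards against malformed input before the values reach expr.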

Passing a variable to regexp when the variable may have brackets (TCL)

In my job, I deal a lot with entities whose names may contain square brackets. We mostly use tcl, so square brackets can sometimes cause havoc. I'm trying to do the following:
set pat {pair_shap_val[9]}
set aff {pair_shap_val[9]_affin_input}
echo [regexp "${pat}_affin.*" $aff]
However, this returns a 0 when I would expect a 1. I'm certain that when ${pat} is passed to the regexp engine, the brackets are being read as "[9]" (a character class) instead of as the literal "\[9\]".
How do I phrase the regexp so a pattern contains a variable when the variable itself may have special regexp characters?
EDIT: An easy way would be to just escape the brackets when setting $pat. However, the value for $pat is passed to me by a function so I cannot easily do that.
Just ruthlessly escape all non-word chars:
set pat {pair_shap_val[9]}
set aff {pair_shap_val[9]_affin_input}
puts [regexp "${pat}_affin.*" $aff] ;# ==> 0
set escaped_pat [regsub -all {\W} $pat {\\&}]
puts $escaped_pat ;# ==> pair_shap_val\[9\]
puts [regexp "${escaped_pat}_affin.*" $aff] ;# ==> 1
A second thought: this doesn't really seem to require regular expression matching. It appears you just need to check that the pat string is contained in the aff string:
% expr {[string first $pat $aff] != -1}
1
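The escaping step can be wrapped in a small helper so it is applied consistently. This is only a sketch: the proc name regexQuote is made up, and it relies on the fact that a backslash before any non-word character makes that character literal in Tcl regular expressions:

```tcl
# Hypothetical helper: backslash-escape every non-word character so
# the text can be embedded in a larger regular expression.
proc regexQuote {text} {
    return [regsub -all {\W} $text {\\&}]
}

set pat {pair_shap_val[9]}
set aff {pair_shap_val[9]_affin_input}
set matched [regexp -- "[regexQuote $pat]_affin.*" $aff]
puts $matched
```

Only the part that comes from the variable is quoted; the _affin.* suffix is still interpreted as a normal regular expression.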

How to match nth occurrence in a string using regular expression

set test {stackoverflowa is a best solution finding site
stackoverflowb is a best solution finding site stackoverflowc is a
best solution finding sitestackoverflowd is a best solution finding
sitestackoverflowe is a best solution finding site}
regexp -all {stackoverflow} $test
The above gives 5 as output.
regexp {stackoverflow} $test
The above matches stackoverflow; here it is matching the first occurrence of stackoverflow (i.e. stackoverflowa).
My requirement is to match the 5th occurrence of stackoverflow (i.e. stackoverflowe) in the string given above.
Could someone please help? Thanks.
Then another one question
Try
set results [regexp -inline -all {stackoverflow.} $test]
# => stackoverflowa stackoverflowb stackoverflowc stackoverflowd stackoverflowe
puts [lindex $results 4]
I'll be back to explain this further shortly, making pancakes right now.
So.
The command returns a list (-inline) of all (-all) substrings of the string contained in test that match the string "stackoverflow" (less quotes) plus one character, which can be any character. This list is stored in the variable results, and by indexing with 4 (because indexing is zero-based), the fifth element of this list can be retrieved (and, in this case, printed).
The dot at the end of the expression wasn't in your expression: I added it to check that I really did get the right match. You can of course omit the dot to match "stackoverflow" exactly.
ETA (from Donal's comment): in many cases it's convenient to extract not the string itself, but its position and extent within the searched string. The -indices option gives you that (I'm not using the dot in the expression now: the index list makes it obvious which one of the "stackoverflow"s I'm getting anyway):
set indices [regexp -inline -all -indices {stackoverflow} $test]
# => {0 12} {47 59} {94 106} {140 152} {186 198}
You can then use string range to get the string match:
puts [string range $test {*}[lindex $indices 4]]
The lindex $indices 4 gives me the list 186 198; the {*} prefix makes the two elements in that list appear as two separate arguments in the invocation of string range.
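The same idea can be packaged as a helper that takes a 1-based occurrence number. This is a sketch only: the proc name nthMatch and the short sample string are made up, and it assumes the pattern has no capturing parentheses (with -inline, each capture group would add extra elements to the list and shift the positions):

```tcl
# Hypothetical helper: return the n-th (1-based) match of pattern in
# text, or an empty string if there are fewer than n matches.
proc nthMatch {pattern text n} {
    set matches [regexp -all -inline -- $pattern $text]
    return [lindex $matches [expr {$n - 1}]]
}

set sample "stackoverflowa one stackoverflowb two stackoverflowc"
puts [nthMatch {stackoverflow.} $sample 2]
puts "[string length [nthMatch {stackoverflow.} $sample 9]]"
```

lindex returns an empty string for an out-of-range index, which is why asking for the ninth match of three yields a length of 0 rather than an error.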

greedy matching in regexp

I have the following output:
Player name: RON_98
Player name: RON_97
player name: RON_96
I need to get the RON part and the numeric part after it (for example 98). I used the following regexp: regexp "(RON)_(\[0-9]*)". Will this also match the RON_96 on the last line? Since "*" is greedy, how do I match only the first line of the output? Is there something like (RON)_(digits only) that stops it from matching the rest of the line?
Always put regular expressions in braces in Tcl.
It's not technically necessary (you can use Tcl's language definition to exactly work out what backslashes would be needed to do it any other way) but it's simpler in all cases that you're likely to encounter normally.
The examples below will use this.
Regular expressions start matching as soon as they can. Then, under normal (greedy) circumstances they match as much text as they can. Thus, with your sample code and text, the matcher starts trying to match at the R on the first line and goes on to consume up to the 8, at which point it has a match and stops. You can verify this by asking regexp to report the indices into the string where the match happened instead of the substring that was matched (via the -indices option, documented on the manual page).
To get all the matches in a string, you have two options:
Pass the -all -inline options to regexp and process the list of results with foreach:
# Three variables in foreach; one for whole match, one for each substring
foreach {a b c} [regexp -all -inline {(RON)_([0-9]*)} $thedata] {
puts "matched '$a', with b=$b and c=$c"
}
Use the -indices option together with the -start option, all in a while loop, so you step through the string:
set idx 0
while {[regexp -start $idx -indices {(RON)_([0-9]*)} $thedata a b c]} {
puts "matched at '$a', with subranges '$b' and '$c'"
set extracted [string range $thedata {*}$c]
puts "the extracted value is '$extracted'"
# Advance the place where the next search will start from
set idx [expr {[lindex $a 1] + 1}]
}
I'd normally recommend using the first option; it's much easier to use! Sometimes the second is better as it provides more information and uses less intermediate storage, but it's also much trickier to get right.
Even if you set your regex up to match across multiple lines, it will not match more than the first occurrence of the pattern, which is "RON_98". It stops after the last digit of the first match. You could even force the match to end at a line boundary by putting $ at the end of your regex; note that in Tcl, $ matches at the end of each line only when regexp is given -line or -lineanchor, and otherwise matches only at the end of the string.
For reference, [0-9] can be written more compactly as \d (digit):
(RON)_(\d*)
is easier to read.
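To illustrate the point that a plain regexp (without -all) stops at the first match, here is a small sketch using the sample output from the question. The variable names thedata, whole, name and digits are chosen for illustration, and \d+ is used instead of \d* so that at least one digit is required:

```tcl
set thedata "Player name: RON_98\nPlayer name: RON_97\nplayer name: RON_96"

# Without -all, regexp reports only the first match in the string.
if {[regexp {(RON)_(\d+)} $thedata whole name digits]} {
    puts "whole=$whole name=$name digits=$digits"
}

# -indices reports where that first match sits in the string.
regexp -indices {(RON)_(\d+)} $thedata where
puts $where
```

The greedy \d+ cannot run past the end of 98, because the newline that follows is not a digit, so the later lines are never touched.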