Tcl: how to split a string using regexp

I have some strings in this format:
class(amber#good)
class(Back1#notgood)
class(back#good)
and I want to use regexp to extract a value from each of these strings.
Expected answer:
amber
Back1
back
And here's my cmd:
set string "class(amber#good)"
regexp -all {^\\([a-zA-z_0-9].\#$} $string $match
puts $match
But the result is not what I expected.

You can use
regexp {\(([^()#]+)} $string - match
The \(([^()#]+) regex matches
\( - a ( char
([^()#]+) - Capturing group 1 (match): any one or more chars other than parentheses and #.
The hyphen is used as a throwaway variable name because the whole-match value is not needed; we are only interested in the Group 1 value, which is stored in match.
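As a cross-check of the pattern itself, here is a minimal sketch in Python rather than Tcl (an assumption made only so the example is easy to run; the helper name extract_name is hypothetical). It applies the same regex to the three sample strings from the question.

```python
import re

# Same idea as the Tcl pattern: a literal "(" followed by a captured run
# of characters that are neither parentheses nor "#".
PATTERN = re.compile(r"\(([^()#]+)")

def extract_name(text):
    """Return the text between "(" and "#", or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

for sample in ["class(amber#good)", "class(Back1#notgood)", "class(back#good)"]:
    print(extract_name(sample))  # amber, then Back1, then back
```

Group 1 plays the role of the match variable in the Tcl call; the whole-match value is simply never read.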

Sometimes using regular expressions is error prone and/or overkill.
Here's an alternate answer using split:
lindex [split $string "()#"] 1
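The same split-based idea can be sketched in Python (again an assumption for illustration, with a hypothetical helper name): split on any of the three delimiter characters and take the second element.

```python
import re

def extract_by_split(text):
    """Split on "(", ")" or "#" and return the piece right after "("."""
    # "class(amber#good)" splits into ["class", "amber", "good", ""].
    parts = re.split(r"[()#]", text)
    return parts[1]

print(extract_by_split("class(amber#good)"))  # amber
```

Unlike Tcl's lindex, which returns an empty string for an out-of-range index, this sketch raises IndexError when the input contains none of the delimiters.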

Related

Perl: remove a part of string after pattern

I have strings like this:
trn_425374_1_94_-
trn_12_1_200_+
trn_2003_2_198_+
And I want to split all after the first number, like this:
trn_425374
trn_12
trn_2003
I tried the following code:
$string =~ s/(?<=trn_\d)\d+//gi;
But it returns the same string as the input. I have been following examples from similar questions, but I don't know what I'm doing wrong. Any suggestions?
If you are running Perl 5 version 10 or later then you have access to the \K ("keep") regular expression escape. Everything before the \K is excluded from the substitution, so this removes everything after the first sequence of digits (except newlines)
s/\d+\K.+//;
With earlier versions of Perl, you have to capture the part of the string you want to keep and put it back in the replacement:
s/(\D*\d+).+/$1/;
Note that neither of these will remove any trailing newline characters. If you want to strip those as well, then either chomp the string first, or add the /s modifier to the substitution, like this
s/\d+\K.+//s;
or
s/(\D*\d+).+/$1/s;
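The capture-and-replace form carries over to other engines that lack \K. A minimal sketch in Python, whose standard re module does not support \K (the function name keep_through_first_number is hypothetical):

```python
import re

def keep_through_first_number(text):
    """Drop everything after the first run of digits."""
    # \D*\d+ captures everything up to and including the first digit run;
    # .+ consumes the rest of the line, which the replacement leaves out.
    return re.sub(r"(\D*\d+).+", r"\1", text, count=1)

for sample in ["trn_425374_1_94_-", "trn_12_1_200_+", "trn_2003_2_198_+"]:
    print(keep_through_first_number(sample))  # trn_425374, trn_12, trn_2003
```

As in the Perl version without /s, the dot does not match a newline here, so a trailing newline would be left in place.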
Use a capture group to save the first run of digits, and use .* to delete everything from there to the end of the line:
#!/usr/bin/env perl
use warnings;
use strict;
while ( <DATA> ) {
s/(\d+).*$/$1/ && print;
}
__DATA__
trn_425374_1_94_-
trn_12_1_200_+
trn_2003_2_198_+
It yields:
trn_425374
trn_12
trn_2003
Your regex should be:
$string =~ s/(trn_\d+).*/$1/g;
It replaces the whole match with the text captured in $1, which is the part of the string you want to keep.
Use \K to preserve the part of the string you want to keep:
$string =~ s/trn_\d+\K.*//;
To quote the Perl documentation:
\K
This appeared in perl 5.10.0. Anything matched left of \K is not
included in $& , and will not be replaced if the pattern is used in a
substitution.

TCL regexp pattern search

I am trying to find a pattern match as below
abc(xxxx):efg(xxxx):xyz(xxxx) where xxxx - [0-9] digits
I used
set string "my string is abc(xxxx):efg(xxxx):xyz(xxxx)"
regexp abc(....):efg(....):xyz(....) $string result_str
It returns 0. Can anyone help?
The problem you've got is that ( and ) have special meaning to regular expressions in Tcl (and many other RE engines besides) in that they denote a capturing sub-RE. To make the characters “normal”, they have to be escaped with a backslash, and that means that it's best to put the regular expression in braces (because backslashes are general Tcl metacharacters).
Thus:
% set string "my string is abc(xxxx):efg(xxxx):xyz(xxxx)"
% regexp {abc\(....\):efg\(....\):xyz\(....\)} $string
1
If you want to also capture the contents of those parentheses, you need a slightly more complex RE:
regexp {abc\((....)\):efg\((....)\):xyz\((....)\)} $string \
all abc_bit efg_bit xyz_bit
Note that those .... sequences always match exactly four characters, but it's better to be more specific. To match any number of digits in each case:
regexp {abc\((\d+)\):efg\((\d+)\):xyz\((\d+)\)} $string -> abc efg xyz
When using regexp to extract bits of a string, it's pretty common to use -> as a (rather strange) variable name for the whole string match; it looks mnemonically like it's saying “send the pieces extracted to these variables”.
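The effect of escaping the parentheses can be checked in any engine that treats ( and ) as grouping operators. Below is a minimal Python sketch (an assumption for illustration; the digit values in the sample string and the name parse_fields are made up).

```python
import re

# Escaped \( and \) match literal parentheses; the inner (...) pairs capture.
PATTERN = re.compile(r"abc\((\d+)\):efg\((\d+)\):xyz\((\d+)\)")

def parse_fields(text):
    """Return the three digit groups as a tuple, or None if no match."""
    m = PATTERN.search(text)
    return m.groups() if m else None

print(parse_fields("my string is abc(1234):efg(5678):xyz(9012)"))
# ('1234', '5678', '9012')

# The unescaped pattern from the question treats ( and ) as grouping only,
# so it never matches the literal parentheses in the input.
print(re.search(r"abc(....):efg(....):xyz(....)",
                "my string is abc(xxxx):efg(xxxx):xyz(xxxx)"))  # None
```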
I haven't worked with Tcl, but it looks like you need to escape the ( and ). Also, if you are sure the x's will be digits, use \d{4} instead of the four dots. Based on this, the updated regex you could try is:
abc\(\d{4}\):efg\(\d{4}\):xyz\(\d{4}\)

How to remove duplicate characters strictly using regexp in Tcl

How to remove duplicate characters in a string strictly using regexp in TCL?
For example, I have a string like aabbcddeffghh and I need only the unique characters, "abcdefgh". I tried lsort -unique, and I am able to get the unique characters:
join [lsort -unique [split $mystring {}]]
but I need to do this using the regexp command only.
You can't remove all non-consecutive double characters from a string with just Tcl's regsub command. It doesn't support access to back-references in lookahead sequences, which means that any scheme for removal will necessarily run into problems with overlapping match regions.
The simplest fix is to wrap in a while loop (with an empty body), using the fact that regsub will return the number of substitutions performed when it's given a variable to store the result in (last argument to it below):
set str "mississippi mud pie"
while {[regsub -all {(.)(.*)\1+} $str {\1\2} str]} {}
puts $str; # Prints "misp ude"
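The repeat-until-nothing-changes idea is not specific to Tcl. A minimal sketch of the same loop in Python (an assumption for illustration; drop_repeats is a hypothetical name), where re.subn returns both the new string and the number of substitutions made:

```python
import re

def drop_repeats(text):
    """Keep only the first occurrence of each character on a line."""
    while True:
        # Each pass removes later copies of a character that also
        # appears earlier; stop once a pass changes nothing.
        text, count = re.subn(r"(.)(.*)\1+", r"\1\2", text)
        if count == 0:
            return text

print(drop_repeats("mississippi mud pie"))  # misp ude
print(drop_repeats("aabbcddeffghh"))        # abcdefgh
```

Every substitution shortens the string, so the loop always terminates, and only later duplicates are removed, so first occurrences keep their original order.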
Try this one:
regsub -linestop -lineanchor -all {([a-z])\1+} $subject {\1} result
or
regsub -linestop -nocase -lineanchor -all {([a-z])\1+} $subject {\1} result
Explanation
{
( # Match the regular expression below and capture its match into backreference number 1
[a-z] # Match a single character in the range between “a” and “z”
)
\1 # Match the same text as most recently matched by capturing group number 1
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
}
regsub -all {(.)(?=.*\1)} $subject {} result
It uses a look-ahead to check if there are any more instances of the character. If there are, it removes the character.
You will always retain the last character. It is not possible to do look-behinds in TCL without extra libraries.
More information about look-arounds: Regex tutorial - Lookahead and Lookbehind Zero-Width Assertions
Edit: Hmm, there seems to be a problem with backreferences in Tcl 8.5. {(.).*\1} matches, but {(.)(?=.*\1)} does not; it complains about "Invalid backreference number". I can't see any solution to this that avoids a backreference inside a look-ahead.
It might just be the version I tested on (ideone.com/pFS0Q). I couldn't find any other Tcl interpreter online to test with.
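To illustrate what the look-ahead version is meant to do, here is a minimal sketch in an engine that does accept a backreference inside a look-ahead. Python's re module is used as an assumption; this is not a Tcl solution, and keep_last_occurrences is a hypothetical name.

```python
import re

def keep_last_occurrences(text):
    """Remove every character that appears again later in the line."""
    # (.) captures one character; (?=.*\1) succeeds only if the same
    # character occurs somewhere after it, without consuming any input.
    return re.sub(r"(.)(?=.*\1)", "", text)

print(keep_last_occurrences("aabbcddeffghh"))        # abcdefgh
print(keep_last_occurrences("mississippi mud pie"))  # smud pie
```

Because the last occurrence of each character is the one that survives, the order of the result can differ from a first-occurrence approach, as the second example shows.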

Insertion with Regex to format a date (Perl)

Suppose I have a string 04032010.
I want it to be 04/03/2010. How would I insert the slashes with a regex?
To do this with a regex, try the following:
my $var = "04032010";
$var =~ s{ (\d{2}) (\d{2}) (\d{4}) }{$1/$2/$3}x;
print $var;
\d matches a single digit, and {n} repeats the preceding element exactly n times. Combined, \d{2} matches two digits and \d{4} matches four. Surrounding each part with parentheses stores what it matched in $1, $2, $3, and so on.
Some of the prior answers used . to match. That is not ideal, because . matches any character; the pattern built here is much stricter about what it accepts.
You'll notice the extra spacing in the regex. The x modifier tells the engine to ignore whitespace in the pattern, which can make a regex more readable.
Compare s{(\d{2})(\d{2})(\d{4})}{$1/$2/$3}x; vs s{ (\d{2}) (\d{2}) (\d{4}) }{$1/$2/$3}x;
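The same three-group substitution, with a whitespace-insensitive pattern, can be sketched in Python, where re.VERBOSE plays the role of Perl's x modifier (the choice of Python and the name add_slashes are assumptions for illustration):

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows comments.
DATE = re.compile(r"""
    (\d{2})   # first two digits
    (\d{2})   # next two digits
    (\d{4})   # four-digit year
""", re.VERBOSE)

def add_slashes(digits):
    """Turn an 8-digit string such as 04032010 into 04/03/2010."""
    return DATE.sub(r"\1/\2/\3", digits, count=1)

print(add_slashes("04032010"))  # 04/03/2010
```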
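Well, a regular expression just matches, but you can do the insertion with a substitution like this (the slashes in the replacement must be escaped because / is also the delimiter):
s/(..)(..)(....)/$1\/$2\/$3/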
#!/usr/bin/perl
$var = "04032010";
$var =~ s/(..)(..)(....)/$1\/$2\/$3/;
print $var, "\n";
Works for me:
$ perl perltest
04/03/2010
I always prefer to use a different delimiter if / is involved so I would go for
s| (\d\d) (\d\d) |$1/$2/|x ;

Non greedy LookAhead

I have strings like follows:
val:key
I can capture 'val' with /^\w*/.
How can I now get 'key' without the ':' sign?
Thanks
How about this?
/^(\w+):(\w+)$/
Or if you just want to capture everything after the colon:
/:(.+)/
Here's a less clear example using a lookbehind assertion to ensure a colon occurred before the match - the entire match will not include that colon.
/(?<=:).*/
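Both approaches from this answer can be compared side by side. A minimal sketch in Python (an assumption for illustration; split_pair and after_colon are hypothetical names):

```python
import re

def split_pair(text):
    """Capture the words on both sides of the colon."""
    m = re.match(r"^(\w+):(\w+)$", text)
    return m.groups() if m else None

def after_colon(text):
    """Use a lookbehind so the match itself excludes the colon."""
    m = re.search(r"(?<=:).*", text)
    return m.group(0) if m else None

print(split_pair("val:key"))   # ('val', 'key')
print(after_colon("val:key"))  # key
```

With capture groups the colon is part of the overall match but not of either group; with the lookbehind the colon is checked but never becomes part of the match.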
What language are you using? /:(.*)/ doesn't capture the ":" but it does match the ':'
In Perl, if you say:
$text =~ /\:(.*)/;
$capture = $1;
$match = $&;
Then $capture won't have the ":" and $match will. (But try to avoid using $& as it slows down Perl: this was just to illustrate the match).
This will capture the key in group 1 and the value in group 2. It should work correctly even when the value contains a colon (:) character.
^(\w+?):(.*)
/\:(\w*)/
That looks for a : and then captures the run of word characters that immediately follows it.