tcl: regexp match substring at the end of string - regex

I am trying to match a substring that occurs many times in a string:
str1 = st1.st2.{k}.st3.{k}.st4.{k}.
str2 = st1.st2.{k}.st3.{k}.st4.
I use regexp to match "{k}" at the end of str1:
regexp .*\.\{k\}\.$ $str1
but I got 0 !!
in fact I use regsub to test the regexp
regsub {.*\.\{k\}\.$} $str {}
result ==> empty
if the pattern is matched, the matched string will be removed !!
What is missing in the regexp expression?

In your code, regexp returns 1, not 0. To match the last occurrence of .{k}., put a capture group (sub-match) after a greedy .* so that the group lands on the final occurrence.
set str1 st1.st2.{k}.st3.{k}.st4.{k}.
# Braces keep the backslashes; inside double quotes Tcl would turn \. into a plain .
puts [regexp {.*(\.\{k\}\.)} $str1 whole last]
puts $last
Output :
1
.{k}.
The $ anchor is not needed if you only want the last occurrence, wherever it is. Keep it if the match must sit at the very end of the string.
With regsub, capture the part you want to keep in the first group and use a back-reference to it in the replacement.
puts [regsub {(.*)(\.\{k\}\.)} $str1 {\1}]
Output :
st1.st2.{k}.st3.{k}.st4
What is wrong with regsub {.*\.\{k\}\.$} $str {} ?
Well, the pattern .*\.\{k\}\.$ will match the whole string and you are replacing it with empty string, which is why you are getting the empty result.
Reference : Noncapturing Subpatterns
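For comparison, the greedy-match idea above can be sketched outside Tcl. The snippet below uses Python's re module, so the quoting rules differ from Tcl's, and the helper name last_k_at_end is hypothetical. It returns the final .{k}. only when it sits at the very end of the string, which separates str1 from str2.

```python
import re

# Hypothetical helper: return the trailing ".{k}." if the string ends with it.
def last_k_at_end(s):
    # Greedy .* runs to the end, then backtracks to the last ".{k}.";
    # the $ anchor rejects strings where more text follows it.
    m = re.search(r".*(\.\{k\}\.)$", s)
    return m.group(1) if m else None

str1 = "st1.st2.{k}.st3.{k}.st4.{k}."
str2 = "st1.st2.{k}.st3.{k}.st4."

print(last_k_at_end(str1))  # .{k}.
print(last_k_at_end(str2))  # None

# Remove only the final occurrence and keep everything before it.
trimmed = re.sub(r"(.*)\.\{k\}\.$", r"\1", str1)
print(trimmed)  # st1.st2.{k}.st3.{k}.st4
```

Dropping the $ from the pattern would make str2 match as well, because its last .{k}. is followed by more text.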

Related

tcl how to split a string by using regexp

I have some string with format
class(amber#good)
class(Back1#notgood)
class(back#good)
and I want to use regexp to extract a value from each of these strings.
Expected answer:
amber
Back1
back
And here's my cmd:
set string "class(amber#good)"
regexp -all {^\\([a-zA-z_0-9].\#$} $string $match
puts $match
But the answer is not what I expected
You can use
regexp {\(([^()#]+)} $string - match
The \(([^()#]+) regex matches
\( - a ( char
([^()#]+) - Capturing group 1 (match): any one or more chars other than parentheses and #.
The hyphen is a throwaway variable name for the whole match: that value is not needed, since only the Group 1 value is of interest.
Sometimes using regular expressions is error prone and/or overkill.
Here's an alternate answer using split:
lindex [split $string "()#"] 1
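Both approaches above (capture group versus split) carry over to other languages. The following is a minimal sketch in Python rather than Tcl; the function names by_regex and by_split are made up for illustration.

```python
import re

# Hypothetical helper: capture the text after "(" up to the next "(", ")" or "#".
def by_regex(s):
    return re.search(r"\(([^()#]+)", s).group(1)

# Hypothetical helper: split on any of "(", ")", "#" and take the second field.
def by_split(s):
    return re.split(r"[()#]", s)[1]

for s in ["class(amber#good)", "class(Back1#notgood)", "class(back#good)"]:
    print(by_regex(s), by_split(s))
# amber amber
# Back1 Back1
# back back
```

The split version assumes the value is always the second field, so it is only safe when the input format is as regular as in the question.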

How to extract last occurrence of character in Perl?

I am trying to extract a particular part of the string.
Input String : $str = /wave=1/sin2=1/sin1=2/sin0=3
Output String : $str = /wave=1/sin2=1/sin1=2
Method 1 :
@waveSplitArray = split /\//, $str;
$lastOccuranceOfWave = pop @waveSplitArray;
How to use regex to get the desired output?
The easiest way is to use a greedy quantifier together with the \K ("keep") escape, which drops everything matched so far from the part that gets replaced.
If you want to keep the value of $str and put the result in a new variable
my $s2 = $str =~ s|.*\K/.*||r;
or
( my $s2 = $str ) =~ s|.*\K/.*||;
If you want to modify the original string, then it's just
$str =~ s|.*\K/.*||;
Try
/(.*)\/[^\/]*/
and you'll have the desired string, without the trailing slash, in $1.
You can try this method also
my $str = '/wave=1/sin2=1/sin1=2/sin0=3';
my ($st2) = $str =~m{(.+)/};
print $st2;
{(.+)/}
Here {} acts as the regex delimiter, just like // (do not confuse it with a quantifier such as \d{n}).
The . matches any character except a newline, and the + quantifier makes it match one or more times. Because + is greedy, (.+) first runs to the end of the string and then backtracks until it finds a /, which is the last / in the string. The capturing group () holds everything before that /, and the list assignment stores it in $st2.
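The backtracking described above is not specific to Perl. As a rough illustration, here is the same greedy capture in Python's re module, next to a regex-free alternative; the variable names are chosen for this sketch only.

```python
import re

path = "/wave=1/sin2=1/sin1=2/sin0=3"

# Greedy (.+) grabs as much as it can, then backtracks to the last "/".
by_regex = re.match(r"(.+)/", path).group(1)
print(by_regex)  # /wave=1/sin2=1/sin1=2

# Without a regex: split once from the right on "/" and keep the left part.
by_rsplit = path.rsplit("/", 1)[0]
print(by_rsplit)  # /wave=1/sin2=1/sin1=2
```

For a fixed single-character separator, the split form is usually easier to read than the regex.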

What does -line flag do in tcl regular expression?

Below I have copied the code I had written. I don't know what the line flag does.
set value "hi this is venkat345
hi this is venkat435
hi this is venkat567"
regexp -all -line -- {(venkat.+)$} $value a b
puts "Full Match: $a"
puts "Sub Match1: $b"
The above code gives the following output
Full Match: venkat567
Sub Match1: venkat567
Can anyone explain when and where I should use the -line flag in Tcl regular expressions?
The man page has defined it well I believe:
-line
Enables newline-sensitive matching. By default, newline is a completely ordinary character with no special meaning. With this flag, [^ bracket expressions and . never match newline, ^ matches an empty string after any newline in addition to its normal function, and $ matches an empty string before any newline in addition to its normal function. This flag is equivalent to specifying both -linestop and -lineanchor, or the (?n) embedded option (see the re_syntax manual page).
If you want to understand it another way, . and [^ ... ] usually match newlines, for example:
regexp -- {^....$} "ab\nc"
returns 1 (meaning the regexp matches the string, counting \n as 1 character), but using the -line switch will prevent . from matching \n.
Similarly:
regexp -- {^[^abc]+$} "de\nf"
will also return 1 because the negated class [^abc] matches any character other than a, b, or c, which includes \n.
The second function of the -line switch makes ^ match at every beginning of line instead of matching only at the start of the whole string, and makes $ match at every end of line instead of matching only at the end of the whole string.
% set text {abc
abc}
abc
abc
% regexp -- {^abc$} $text
0
% regexp -line -- {^abc$} $text
1
As for the when and where, it will depend on what you are trying to do. Based on your sample code, it would seem to me that you need to get all the usernames beginning with venkat that can appear at the end of any line. Since you want to match many, you will need to use the -all and -inline switches to get the matched strings, and I would recommend to change the regexp a bit:
set value "hi this is venkat345
hi this is venkat435
hi this is venkat567"
# I removed the capture group and changed . to \S to match non-space characters
set results [regexp -all -inline -line -- {venkat\S+$} $value]
puts $results
# venkat345 venkat435 venkat567
-line just makes sure that . never matches a newline (it also lets ^ and $ match at line boundaries).
According to the Tcl regexp documentation:
-line
Enables newline-sensitive matching. By default, newline is a
completely ordinary character with no special meaning. With this flag,
‘[^’ bracket expressions and ‘.’ never match newline, ‘^’ matches an
empty string after any newline in addition to its normal function, and
‘$’ matches an empty string before any newline in addition to its
normal function. This flag is equivalent to specifying both -linestop
and -lineanchor, or the (?n) embedded option (see METASYNTAX, below).
Here is the output without -line option:
Full Match: venkat345
hi this is venkat435
hi this is venkat567
Sub Match1: venkat345
hi this is venkat435
hi this is venkat567
Without -line, .+ also matches newlines, so the match runs on to the end of the whole string.
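The two behaviors can be reproduced side by side in Python, with one caveat: the defaults are the other way round there. In Python's re module . stops at newlines unless re.DOTALL is given, whereas in Tcl . matches newlines unless -line (or -linestop) is given. The sketch below is therefore only an analogy, not a description of Tcl's engine.

```python
import re

value = "hi this is venkat345\nhi this is venkat435\nhi this is venkat567"

# re.MULTILINE lets $ match before each newline while "." still stops at
# newlines. This is roughly the effect of Tcl's -line switch.
per_line = re.findall(r"venkat\S+$", value, flags=re.MULTILINE)
print(per_line)  # ['venkat345', 'venkat435', 'venkat567']

# re.DOTALL lets "." cross newlines, like Tcl's default (no -line).
whole = re.search(r"venkat.+$", value, flags=re.DOTALL).group(0)
print(whole.count("\n"))  # 2, the match spans all three lines
```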

Powershell regex

Is there a PowerShell regex command I could use to replace the last of the leading consecutive zeros in a text string with an "M"? For example:
$Pattern = @("000123456", "012345678", "000000001", "000120000")
Final result:
00M123456
M12345678
0000000M1
00M120000
Thanks.
Search for the following regex:
"^(0*)0"
The regex searches for a consecutive string of 0 at the beginning ^ of the string. It captures all the 0 except the one for replacement. "^0(0*)" also works, since we only need to take note of the number of 0 which we don't touch.
With the replacement string:
'$1M'
Note that $1 denotes the text captured by the first capturing group, which is (0*) in the regex.
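To check the pattern against all four inputs from the question, here is a small sketch in Python rather than PowerShell. The function name mark_last_leading_zero is hypothetical, and \g<1> is Python's spelling of the $1 back-reference.

```python
import re

# Hypothetical helper: replace the last zero of the leading run with "M".
def mark_last_leading_zero(s):
    # (0*) greedily takes the leading zeros, then gives one back so the
    # final literal 0 can match; that one zero is what becomes "M".
    return re.sub(r"^(0*)0", r"\g<1>M", s)

for s in ["000123456", "012345678", "000000001", "000120000"]:
    print(mark_last_leading_zero(s))
# 00M123456
# M12345678
# 0000000M1
# 00M120000
```

A string with no leading zero is returned unchanged, because the pattern cannot match at the start.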
Example by @SegFault:
"000120000" -replace "^(0*)0", '$1M'

What do these regular expressions mean?

I'm venturing to read a code in Perl and found the following regular expressions:
$str =~ s/(<.+?>)|(&\w+;)/ /gis;
$str =~ /(\w+)/gis
I wonder what these codes represent.
Can anyone help me?
The first one $str =~ s/(<.+?>)|(&\w+;)/ /gis; does a substitution:
$str : the variable to work on
=~ : binding operator; applies the substitution to $str and modifies it in place
s : substitution operator
/ : beginning of the regex
( : beginning of capture group 1
< : <
.+? : one or more of any char NOT greedy
> : >
) : end of capture group 1
| : alternation
( : beginning of capture group 2
& : &
\w+ : one or more word char ie: [a-zA-Z0-9_]
; : ;
) : end of group 2
/ : end of search part
: a space
/ : end of replace part
gis; : g = global (replace every match), i = case insensitive, s = single-line mode (. also matches newline)
This will replace all tags and encoded entities like &amp; or &lt; with a space.
The second one checks that at least one word is left.
One way to help decipher regular expressions is to use the YAPE::Regex::Explain module from CPAN:
#!/usr/bin/env perl
use YAPE::Regex::Explain;
#...may need to single quote $ARGV[0] for the shell...
print YAPE::Regex::Explain->new( $ARGV[0] )->explain;
Assuming this snippet is named 'rexplain' you would do:
$ ./rexplain 's/(<.+?>)|(&\w+;)/ /gis'
The first strips out every XML/HTML tag and every character entity, replacing each one with a space. The second finds every substring consisting entirely of word characters.
In detail:
The first part of the first expression matches a <, then any character with the . (newlines included, thanks to the /s flag at the end). The + quantifier alone would match one or more characters up to the last > found in $str, but the ? after it makes it non-greedy, so it only matches up to the first > encountered. The second part matches & followed by word characters until ; is found. Since ; is not a word character, the ? modifier is not needed. The s/ up front means a substitution, and the bit after the second / is what each match is replaced with. The /gis at the end means *g*lobal, case *i*nsensitive, and *s*ingle line.
The second expression finds the first substring of word characters and puts it in $1. If you call it repeatedly, the /g at the end means that it will keep matching every instance in $str.
The first one takes a string and replaces html tags or html character codes with a space
The second one makes sure there is still a word left when done.
These "codes" are regular expressions. Type this to learn more:
perldoc perlre
The code above replaces some HTML/XML tags and character entities (anything of the form &name;) in $str with blanks. But there are better ways to do this using CPAN modules. The code then tries to match and capture into variable $1 the first word in $str.
Ex:
perl -le '$str = "foo<br> bar<another\ntag>baz"; print $str; $str =~ s/(<.+?>)|(&\w+;)/ /gis; $str =~ /(\w+)/gis; print $str; print $1;'
It prints:
foo<br> bar<another
tag>baz
foo  bar baz
foo
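The same two steps can be tried in Python as a rough cross-check of the explanations above. This is a sketch, not a translation of Perl semantics: re.sub and re.findall already act on every match, so there is no separate flag for /g, and the &amp; in the sample input is added here for illustration.

```python
import re

# Sample input from the one-liner above, plus an entity added for this sketch.
html = "foo<br> bar<another\ntag>baz&amp;qux"

# IGNORECASE and DOTALL stand in for Perl's /i and /s; DOTALL is what lets
# <.+?> match the tag that contains a newline.
cleaned = re.sub(r"(<.+?>)|(&\w+;)", " ", html, flags=re.IGNORECASE | re.DOTALL)

# Every run of word characters, like looping over /(\w+)/g in Perl.
words = re.findall(r"\w+", cleaned)
print(words)  # ['foo', 'bar', 'baz', 'qux']
```

As one of the answers notes, a real HTML parser is the safer choice for anything beyond quick text cleanup.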