Map strange behaviour - regex

I have a map function as follows, which reads from an array of lines generated by a unix command.
my %versions = map {
if (m/(?|(?:^Patch\s(?(?=description).*?(\w+)\sPATCH).*?(\d+(?:\.\d+)+).*)|(?:^(OPatch)\s(?=version).*?(\d+(\.\d+)+)))/)
{ 'hello' => 'bye'; }
} @dbnode_versions;
print Dumper(\%versions); gives
$VAR1 = {
'' => undef,
'hello' => 'bye',
'bye' => ''
};
which I find extremely odd, as the hello and bye values should only get added if the regex is true. Anyone able to help me out?

Well, you have to consider what happens when the regex doesn't match and the if is false. The if still evaluates to some value, although you shouldn't rely on the value of a statement.
In particular, if (cond) { expression } is roughly equivalent to cond and expression. This means that when the regex (our cond) does not match, the block yields the false value of the failed match rather than nothing.
use Data::Dump;
dd [map { /foo(bar)/ and (hello => 'bye') } qw/foo foobar bar/];
What is your expected output? You may have thought ["hello", "bye"]. But actually, we get
["", "hello", "bye", ""]
because "" represents the false value returned by the regex match on failure.
If you want to return nothing in failure cases, you should explicitly return an empty list:
map { /foo(bar)/ ? (hello => 'bye') : () } qw/foo foobar bar/
or use grep, which filters a list for those elements that match a condition:
my %hash =
map { hello => 'bye' } # replace each matching element
grep { /foo(bar)/ } # filter for matching elements
qw/foo foobar bar/;
The %hash will then be either () or (hello => 'bye'), as each key can only occur once.

Regular expression is too complex error in tcl

I have not seen this error with a small list. The issue appeared when the list grew beyond 10k elements. Is there a limit on the number of regex patterns in Tcl?
puts "#LEVELSHIFTER_TEMPLATES_LIMITSFILE:$perc_limit(levelshifter_templates)"
puts "#length of templates is :[llength $perc_limit(levelshifter_templates)]"
if { [regexp [join $perc_limit(levelshifter_templates) |] $temp] }
#LEVELSHIFTER_TEMPLATES_LIMITSFILE:HDPELT06_LVLDBUF_CAQDP_1 HDPELT06_LVLDBUF_CAQDPNRBY2_1 HDPELT06_LVLDBUF_CAQDP_1....
#length of templates is :13520
ERROR: couldn't compile regular expression pattern: regular expression is too complex
If $temp is a single word and you're really just doing a literal test, you should invert the check. One of the easiest ways might be:
if {$temp in $perc_limit(levelshifter_templates)} {
# ...
}
But if you're doing that a lot (well, more than a small number of times, 3 or 4 say) then building a dictionary for this might be best:
# A one-off cost
foreach key $perc_limit(levelshifter_templates) {
# Value is arbitrary
dict set perc_limit_keys $key 1
}
# This is now very cheap
if {[dict exists $perc_limit_keys $temp]} {
# ...
}
If you've got multiple words in $temp, split and check (using the second technique, which is now definitely worthwhile). This is where having a helper procedure can be a good plan.
proc anyWordIn {inputString keyDictionary} {
foreach word [split $inputString] {
if {[dict exists $keyDictionary $word]} {
return true
}
}
return false
}
if {[anyWordIn $temp $perc_limit_keys]} {
# ...
}
Assuming you want to see if the value in temp is an exact match for one of the elements of the list in perc_limit(levelshifter_templates), here are a few ways that are better than trying to use regular expressions:
Using lsearch:
# Sort the list after populating it so we can do an efficient binary search
set perc_limit(levelshifter_templates) [lsort $perc_limit(levelshifter_templates)]
# ...
# See if the value in temp exists in the list
if {[lsearch -sorted $perc_limit(levelshifter_templates) $temp] >= 0} {
# ...
}
Storing the elements of the list in a dict (or array if you prefer) ahead of time for an O(1) lookup:
foreach item $perc_limit(levelshifter_templates) {
dict set lookup $item 1
}
# ...
if {[dict exists $lookup $temp]} {
# ...
}
I found a simple workaround for this problem: use a foreach statement to loop over the regexes in the list one at a time instead of joining them into a single pattern, which failed for a very long list.
foreach pattern $perc_limit(levelshifter_templates) {
    if {[regexp $pattern $temp]} {
        #puts "$fullpath: [is_std_cell_dev $dev]"
        puts "##matches: $pattern return 0"
        return 0
    }
}
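For readers who think more easily in Python than in Tcl, the exact-lookup idea from the answers above can be sketched as follows. This is only an illustration of the technique, not a translation of the original script: the two template names are taken from the question's output, while `template_set` and `any_word_in` are hypothetical names.

```python
# Sketch: replace one huge alternation regex with exact set lookups.
# The template names come from the question; everything else is assumed.
templates = [
    "HDPELT06_LVLDBUF_CAQDP_1",
    "HDPELT06_LVLDBUF_CAQDPNRBY2_1",
]

# One-off cost: build a set so each membership test is a hash lookup.
template_set = set(templates)


def any_word_in(text, keys):
    """Return True if any whitespace-separated word of text is in keys."""
    return any(word in keys for word in text.split())


single_hit = "HDPELT06_LVLDBUF_CAQDP_1" in template_set
multi_hit = any_word_in("foo HDPELT06_LVLDBUF_CAQDPNRBY2_1 bar", template_set)
no_hit = any_word_in("foo bar", template_set)
print(single_hit, multi_hit, no_hit)
```

Because nothing is compiled into a pattern, the size of the list no longer matters for correctness, only for the one-off cost of building the set.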

Why condition returns True using regular expressions for finding special characters in the string?

I need to validate the variable names:
name = ["2w2", " variable", "variable0", "va[riable0", "var_1__Int", "a", "qq-q"]
Only the names "variable0", "var_1__Int" and "a" are valid.
I could identify most of the "wrong" variable names using a regex:
import re
if re.match("^\d|\W|.*-|[()[]{}]", name):
print(False)
else:
print(True)
However, I still get a True result for va[riable0. Why is that?
I check for all types of parentheses.
.match() checks for a match only at the beginning of the string, while .search() checks for a match anywhere in the string.
You can also simplify your regex to this and call search() method:
^\d|\W
That checks whether the first character is a digit or whether a non-word character appears anywhere in the input.
Code:
>>> import re
>>> name = ["2w2", " variable", "variable0", "va[riable0", "var_1__Int", "a", "qq-q"]
>>> pattern = re.compile(r'^\d|\W')
>>> for s in name:
...     if pattern.search(s):
...         print(s + ' => False')
...     else:
...         print(s + ' => True')
...
2w2 => False
 variable => False
variable0 => True
va[riable0 => False
var_1__Int => True
a => True
qq-q => False
Your expression is:
"^\d|\W|.*-|[()[]{}]"
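re.match() always starts matching at the beginning of the string, so your ^ is redundant, and the \W alternative is only ever tested against the first character; that is why the [ in the middle of va[riable0 goes unnoticed. In addition, [()[]{}] is not a single character class: the class ends at the first ], so the pattern is read as the class [()[] followed by the literal text {}]. If you instead write a pattern that describes valid names, you need a $ at the end to make sure the entire input string matches, and not just a prefix.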
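As an alternative to listing every forbidden character, you can describe what a valid name looks like and require the whole string to match. The sketch below assumes the rule "a letter or underscore first, then only word characters", which is my reading of the examples in the question rather than something the asker stated; `VALID_NAME` and `is_valid_name` are made-up names.

```python
import re

names = ["2w2", " variable", "variable0", "va[riable0", "var_1__Int", "a", "qq-q"]

# Assumed rule: a letter or underscore first, then only word characters.
VALID_NAME = re.compile(r"[A-Za-z_]\w*")


def is_valid_name(name):
    """Return True only if the entire string satisfies the assumed rule."""
    # fullmatch() anchors at both ends, so no ^ or $ is needed.
    return VALID_NAME.fullmatch(name) is not None


valid = [n for n in names if is_valid_name(n)]
print(valid)  # ['variable0', 'var_1__Int', 'a']
```

With a whitelist pattern there is no need to remember which special characters to exclude, since anything not explicitly allowed is rejected.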

Regex help: need to match an ampersand OR an end of string

I'm trying to create a regex to match part of a URL
The possible URLs might be
www.mysite.com?userid=123xy
www.mysite.com?userid=123x&username=joe
www.mysite.com?tag=xyz&userid=1ww45
www.mysite.com?tag=xyz&userid=1g3x5&username=joe
I'm trying to match the userid=123456
So far I have
Dim r As New Regex("[&?]userID.*[?&]")
Debug.WriteLine(r.Match(strUrl))
But this is only matching lines 2 and 4.
Can anyone help?
(?<=[?&]userid=)[^&#\s]*
Output:
123xy
123x
1ww45
1g3x5
A few points:
This works both if you are matching one URL at a time and if you have a whitespace-separated set.
This matches the userid value only. It uses a positive look-behind assertion, which is checked but not included in the match, since you only care about the value.
The fragment part, if present, will be ignored (e.g. if the URL looked like this: www.mysite.com?tag=xyz&userid=1ww45#top)
If the case of userid doesn't matter, use RegexOptions.IgnoreCase.
I got it:
[&?]userID=[^\s&#]+
PHP solution:
"/[\\?&]userid=([^&]*)/"
Tests:
$tests = [
[
"regex" => "/[\\?&]userid=([^&]*)/",
"expected" => "123xy",
"inputs" => [
"www.mysite.com?userid=123xy",
"www.mysite.com?userid=123xy&username=joe",
"www.mysite.com?tag=xyz&userid=123xy",
"www.mysite.com?tag=xyz&userid=123xy&username=joe"
]
]
];
foreach ($tests as $test) {
$regex = $test['regex'];
$expected = $test['expected'];
foreach ($test['inputs'] as $input) {
if (!preg_match($regex, $input, $match)) {
throw new Exception("Regex '{$regex}' doesn't match for input '{$input}' or error has occurred.");
}
$matched = $match[1];
if ($matched !== $expected) {
throw new Exception("Found '{$matched}' instead of '{$expected}'.");
}
echo "Matched '{$matched}' in '{$input}'." . PHP_EOL;
}
}
Results:
Matched '123xy' in 'www.mysite.com?userid=123xy'.
Matched '123xy' in 'www.mysite.com?userid=123xy&username=joe'.
Matched '123xy' in 'www.mysite.com?tag=xyz&userid=123xy'.
Matched '123xy' in 'www.mysite.com?tag=xyz&userid=123xy&username=joe'.
You can use the regex: .*?(userid=\d+).*
.*? is a non-greedy way to say "everything that comes before (userid=\d+)". Note that \d+ matches digits only; for ids that contain letters, such as 123xy, use [^&]+ instead.
Python example:
import re
a = 'www.mysite.com?userid=12345'
b = 'www.mysite.com?userid=12345&username=joe'
mat = re.match(r'.*?(userid=\d+).*', a)
print(mat.group(1)) # prints userid=12345
mat = re.match(r'.*?(userid=\d+).*', b)
print(mat.group(1)) # prints userid=12345
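For comparison, here is a sketch that applies the look-behind pattern from the first answer to the four URLs in the question, using Python's re module. The helper name `extract_userid` is made up for illustration, and `re.IGNORECASE` stands in for the RegexOptions.IgnoreCase suggestion.

```python
import re

urls = [
    "www.mysite.com?userid=123xy",
    "www.mysite.com?userid=123x&username=joe",
    "www.mysite.com?tag=xyz&userid=1ww45",
    "www.mysite.com?tag=xyz&userid=1g3x5&username=joe",
]

# Look-behind for "?userid=" or "&userid=", then everything up to the
# next "&", "#" or whitespace. Case is ignored for the parameter name.
USERID = re.compile(r"(?<=[?&]userid=)[^&#\s]*", re.IGNORECASE)


def extract_userid(url):
    """Return the userid value from url, or None if it is absent."""
    m = USERID.search(url)
    return m.group(0) if m else None


userids = [extract_userid(u) for u in urls]
print(userids)  # ['123xy', '123x', '1ww45', '1g3x5']
```

search() is used rather than match() because the parameter can appear anywhere in the query string.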

Creating a list in tcl with elements that in proper index positions

How do I convert the string/list below into a list whose first element is 1-81, second element is 81-162, third element is 162-243, and so on, using Tcl?
{} {} {1 -81} { } {81 -162} { } {162 -243} { } {243 -324} { } {324 -405} { } {405 -486} { } {486 -567} { } {567 -648} { } {648 -729} { } {729 -810} { } {810 -891} { } {891 -972} { } {972 -1053} { } {1053 -1134} { }
Thanks
If you just want to filter out empty list elements, the obvious thing to do is:
# Assuming the original list is in $list
set result {}
foreach x $list {
if {[string trim $x] ne ""} {
lappend result $x
}
}
# The result list should contain the cleaned up list.
Note that you don't need the [string trim] if you're sure all empty elements really are empty and don't contain whitespace (meaning {} rather than possibly { }). But your example contains both empty elements and whitespace-only elements, so you do need the string trim.
Alternatively you can use a regular expression to test:
foreach x $list {
# Test if $x contains non-whitespace characters:
if {[regexp {\S} $x]} {
lappend result $x
}
}
You can however do the above in a single line using lsearch:
# Find all elements that contain non whitespace characters:
set result [lsearch -inline -all -regexp $list {\S}]
It seems you want to accomplish two goals:
Remove all empty items from the original list
For each non-empty item, remove space
I would like to offer a different approach: using the struct::list, which has a filter command:
package require struct::list
set oldList {{} {} {1 -81} { } {81 -162} { } {162 -243} { } {243 -324} {}}
set newList [struct::list filterfor item $oldList {
[set item [string map {{ } {}} $item]] != ""
}]
In this solution, I use the struct::list filterfor command, which resembles the foreach command. The body of the filterfor is a boolean expression. In the body, I use string map to remove all spaces from each item, and only return true if the result is not empty. This solution might not be the most efficient, but it is a different approach to the problem.
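The two goals (drop empty or whitespace-only items, then remove the space inside each remaining item) can also be sketched in Python. This is an illustration of the same filter-and-clean idea and not Tcl code; the list `raw` is a shortened, hand-written stand-in for the list in the question.

```python
# Shortened stand-in for the list in the question (assumed data).
raw = ["", "", "1 -81", " ", "81 -162", " ", "162 -243", " "]

# Keep items that contain non-whitespace, and strip the inner space.
cleaned = [item.replace(" ", "") for item in raw if item.strip()]
print(cleaned)  # ['1-81', '81-162', '162-243']
```

The `if item.strip()` test plays the role of the string trim check, and `replace` plays the role of string map.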

How to match the variable in switch with contents of a list?

I have a question about the use of switch in Tcl. Mainly, I was wondering if it is possible to write something like:
switch myvar {
list1 {
puts "myvar matches contents of list1"; }
list2 {
puts "myvar matches contents of list2"; }
default {
puts "myvar doesn't match any content of any list"; }
}
In here, list1 and list2 would be either a list or array of strings containing the names of different files.
Is this even possible without making a very detailed regexp search?
Thanks!
You can easily rewrite it as an if/elseif/else construct, as Brian Fenton already said (and simplify it with the 'in' operator too).
if {$myvar in $list1} {
    puts "myvar matches content of list1"
} elseif {$myvar in $list2} {
    puts "myvar matches content of list2"
} else {
    puts "myvar doesn't match any content of any list"
}
You could of course wrap up the code and write your own switch version that does what you want, after all, this is Tcl...
proc listswitch {item conditions} {
if {[llength $conditions] % 2} {
return -code error "Conditions must be pairs"
}
set code ""
foreach {cond block} $conditions {
if {$cond eq "default"} {
set code $block
break
} elseif {$item in $cond} {
set code $block
break
}
}
if {$code ne ""} {
uplevel 1 $code
}
}
listswitch 10 {
{10 20 30 50} {
puts "Match in list 1" }
{50 20 90 11} {
puts "Match in list 2"
}
default {
puts "No match"
}
}
You need to think a little about whether you want to match filenames literally, and what kind of equality you're interested in. There are some subtle things there, like case-insensitive filesystems, different directory separators, absolute vs. relative paths, and even filesystem encodings, any of which might change the outcome.
Nice question Jason. At first, I thought you wanted a way to compare the contents of two lists. But I think you want to check if the string is a member of the lists. I don't see any easy way to do that with switch, so what I would do is very simply to use lsearch.
# -exact avoids lsearch's default glob matching, which matters for filenames
if {[lsearch -exact $list1 $myvar] != -1} {
    puts "myvar matches contents of list1"
} elseif {[lsearch -exact $list2 $myvar] != -1} {
    puts "myvar matches contents of list2"
} else {
    puts "myvar doesn't match any content of any list"
}
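The "switch on list membership" idea is not specific to Tcl. As a rough sketch in Python, with `list_switch` as a hypothetical helper and the numbers borrowed from the listswitch example above, it could look like this:

```python
def list_switch(item, cases, default=None):
    """Run the handler of the first case whose collection contains item.

    cases is a sequence of (collection, handler) pairs; default is an
    optional handler used when no collection contains item.
    """
    for members, handler in cases:
        if item in members:
            return handler()
    return default() if default is not None else None


cases = [
    ({10, 20, 30, 50}, lambda: "Match in list 1"),
    ({50, 20, 90, 11}, lambda: "Match in list 2"),
]

print(list_switch(10, cases, default=lambda: "No match"))
print(list_switch(90, cases, default=lambda: "No match"))
print(list_switch(7, cases, default=lambda: "No match"))
```

As in the Tcl version, the first matching case wins, so a value that appears in both collections (50 here) is handled by the first one.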