if (expression) VS "Traditional If" in AHK

The following simple script displays Yes, then No.
I don't get it.
From what I read in the AHK documentation, I suspect this has something to do with the if (expression) VS "Traditional If".
But I find the documentation not very clear on this subject.
Could someone explain this?
#SingleInstance force
#NoEnv
Toto := "c"
If (Toto In a,b)
    MsgBox Yes
Else
    MsgBox No
If Toto In a,b
    MsgBox Yes
Else
    MsgBox No

You cannot use IN with expressions.
Your first example uses an expression, which does not support the use of IN. Therefore the behavior of the statement is undefined.
Your second example is correct and produces a correct result.
From https://autohotkey.com/docs/commands/IfIn.htm:
The operators "between", "is", "in", and "contains" are not supported
in expressions.

Matching the IP using regular expression

set ip 10.10.
if {[regexp {^(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.?){4}$} $ip match]} { puts $match }
The above pattern matches "10.10."; can anyone tell me how this is happening?
First, using a regular expression to check IP addresses is extremely fragile and unnecessarily complex, and you still have to do the heavy lifting yourself. Instead, use the Tcllib ip package.
package require ip
If you want to know if a given string is an IPv4 address, just check with
::ip::is 4 $str ;# 1 if valid ipv4, 0 otherwise
or
::ip::version $str ;# returns 4 or 6 for ipv4 or ipv6, -1 otherwise
The commands in the package also handle address strings that aren't dotted decimal.
The package isn't included in all distributions, but can be installed using teacup install or by downloading the files and sourcing them into the script.
To answer the question: the original asker has one error and one problem. The error is that the regular expression used to match the ip address also matches strings that aren't ip addresses. This is one of the most common problems when using regular expressions. The reason and the fix are addressed in other answers to the question. To recap: Captain noted that since the original regular expression makes the dot optional, the string 10.10. can be matched as 1 0. 1 0.. There are several possible solutions: {^(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.|$)){4}$} as suggested by the same Captain seems valid but may turn out to have more problems if tested.
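As a quick check of the behaviour described above, here is a minimal sketch in Python rather than Tcl (an assumption of this example; the names ORIGINAL, WITH_DOT_OR_END and accepts are made up for illustration). It suggests that the original pattern accepts 10.10. and even 1234, while the (\.|$) variant rejects both but still accepts a trailing dot after four octets.

```python
import re

# One octet, 0 to 255, written exactly as in the question.
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

# Original pattern: the dot after each octet is optional.
ORIGINAL = re.compile(r"^(" + OCTET + r"\.?){4}$")

# Variant quoted above: each octet must be followed by a dot or by the end.
WITH_DOT_OR_END = re.compile(r"^(" + OCTET + r"(\.|$)){4}$")


def accepts(pattern, text):
    """Return True if the anchored pattern matches the whole text."""
    return pattern.match(text) is not None


if __name__ == "__main__":
    for text in ["10.10.", "1234", "1.2.3.4", "1.2.3.4."]:
        print(text, accepts(ORIGINAL, text), accepts(WITH_DOT_OR_END, text))
```

The last test string is the kind of case the "more problems if tested" remark anticipates.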
The main problem is that a non-trivial regular expression is used to match the address. For all but the most trivial regular expressions, rigorous testing must be performed to ensure that they don't produce false positives. This testing is usually impractical to make exhaustive, which means that you can't know for sure if it works until an angry customer tells you it doesn't. When a case of false positive match is found, the solution is either to drop the regular expression and try another method, or alternatively to make the regular expression more complex in order to make the match more strict. At this point, the test suite may also have to grow.
A better way is to step back and look for other solutions. If there is a standard library function for it, that should be used. If we imagine there is none in this case, simply reflecting on the most basic formulation of an ipv4 decimal-dot address ("four groups of integers from 0 to 255, joined by dots") suggests some simple and safe functions:
proc isOctet n {
    expr {[string is integer -strict $n] && 0 <= $n && $n <= 255}
}
proc splitIpv4dd1 str {
    split $str .
}
proc splitIpv4dd2 str {
    scan $str %d.%d.%d.%d
}
proc splitIpv4dd3 str {
    lrange [regexp -inline {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} $str] 1 end
}
# plug any of the preceding splitIpv4ddN functions into this command
proc putsIpv4dd str {
    set count 0
    foreach n [splitIpv4dd1 $str] {
        if {[isOctet $n]} {
            incr count
        }
    }
    if {$count == 4} {puts $str}
}
It is much easier to verify that each of these functions does its job correctly without false negatives or positives, and if they do, the command to print ip addresses can be assumed to work correctly. The third splitting function uses a regular expression, but in this case it's a trivial one without alternatives and optional atoms.
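For readers who do not use Tcl, the same "four integers from 0 to 255, joined by dots" idea can be sketched in Python. This is only an illustration, not a translation of the procs above: the function names are hypothetical, and the sketch additionally insists on exactly four parts. The second function leans on Python's standard ipaddress module, in the spirit of "use a library function if there is one"; the two checks may disagree on edge cases such as leading zeros, depending on the Python version.

```python
import ipaddress


def is_octet(part):
    """True for a run of ASCII digits whose value is between 0 and 255."""
    return part.isascii() and part.isdigit() and 0 <= int(part) <= 255


def is_ipv4_dotted(text):
    """Hand-rolled check: exactly four dot-separated octets."""
    parts = text.split(".")
    return len(parts) == 4 and all(is_octet(p) for p in parts)


def is_ipv4_stdlib(text):
    """Library check: let ipaddress decide whether text is an IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for text in ["192.168.0.1", "10.10.", "1.2.3.4.5", "256.1.1.1"]:
        print(text, is_ipv4_dotted(text), is_ipv4_stdlib(text))
```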
One important goal when writing robust and maintainable code is to keep functions cohesive and clear-cut without loopholes or irregularities. Matching with non-trivial regular expressions runs counter to this.
I certainly understand and actually applaud the wish to understand what went wrong, but the correct conclusion to draw from this is that regular expression matching isn't a good method to use in this case.
You can try to use this regex:
^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$
To answer "how is this happening": the \. is optional, so the pattern finds 1, 0., 1, 0.
And the answer to the unasked question
The below expression will make the dot optional only if it is the end of the string (modified to ensure no trailing dot):
^(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.(?=[0-9])|$)){4}$
Please remember that the original question was asking "how is this happening" - i.e. understanding the regular expression behaviour... NOTHING about how to change the regex or how this should be done...
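The two stricter patterns quoted in this thread can be tried side by side. The following is a rough sketch in Python (not Tcl, so engine details may differ; the constant and function names are invented for the example). On the inputs below, both reject 10.10. and a trailing dot.

```python
import re

# One octet, 0 to 255.
OCTET = r"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"

# Three "octet + dot" groups followed by a final octet without a dot.
THREE_PLUS_ONE = re.compile(r"^(" + OCTET + r"\.){3}" + OCTET + r"$")

# Each octet is followed by a dot that must precede a digit, or by the end.
LOOKAHEAD = re.compile(r"^(" + OCTET + r"(\.(?=[0-9])|$)){4}$")


def accepted_by_both(text):
    """Return True only if both patterns match the whole text."""
    return (THREE_PLUS_ONE.match(text) is not None
            and LOOKAHEAD.match(text) is not None)


def rejected_by_both(text):
    """Return True only if neither pattern matches the text."""
    return (THREE_PLUS_ONE.match(text) is None
            and LOOKAHEAD.match(text) is None)


if __name__ == "__main__":
    for text in ["1.2.3.4", "255.255.255.255", "10.10.", "1.2.3.4.", "1234"]:
        print(text, accepted_by_both(text), rejected_by_both(text))
```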

Regular Expressions (Normal OR Nested Brackets)

So I'm completely new to the overwhelming world of Regex. Basically, I'm using the Gedit API to create a new custom language specification (derived from C#) for syntax-highlighting (for DM from Byond). In escaped characters in DM, you have to use [variable] as an escaping syntax, which is simple enough. However, it could also be nested, such as [array/list[index]] for instance. (It could be nested infinitely.) I've looked through the other questions, and when they ask about nested brackets they only mean exclusively nested, whereas in this case it could be either/or.
Several attempts I've tried:
\[.*\] produces the result "Test [Test[Test] Test]Test[Test] Test"
\[.*?\] produces the result "Test [Test[Test] Test]Test [Test] Test"
\[(?:.*)\] produces the result "Test [Test[Test] Test]Test[Test] Test"
\[(?:(?!\[|\]).)*\] produces the result "Test [Test[Test] Test]Test[Test] Test". This is derived from https://stackoverflow.com/a/9580978/2303154 but like mentioned above, that only matches if there are no brackets inside.
Obviously I've no real idea what I'm doing here in more complex matching, but at least I understand more of the basic operations from other sources.
From @Chaos7Theory:
Upon reading GtkSourceView's Specification Reference, I've figured out that it uses PCRE specifically. I then used that as a lead.
Digging into it and through trial-and-error, I got it to work with:
\[(([^\[\]]*|(?R))*)\]
I hope this helps someone else in the future.
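The (?R) recursion above is a PCRE feature; Python's built-in re module does not support it. Where no recursive regex engine is available, the same balanced-bracket idea can be approximated with a depth counter. The sketch below is a different technique from the regex answer, and its function name is hypothetical. Unlike the regex, it reports nothing for a bracket that is never closed.

```python
def balanced_bracket_spans(text):
    """Return every outermost [...] span, allowing arbitrary nesting inside."""
    spans = []
    depth = 0
    start = None
    for index, char in enumerate(text):
        if char == "[":
            if depth == 0:
                start = index          # remember where the outermost span begins
            depth += 1
        elif char == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:index + 1])
    return spans


if __name__ == "__main__":
    print(balanced_bracket_spans("Test [Test[Test] Test]Test[Test] Test"))
```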

simple regular expression question

How to match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab where number of a's should be min of 10?
I mean, I know this way:
[a][a][a][a][a][a][a][a][a][a][a][a][a]a*b
But there must be a more elegant method. What if my minimum number of a's becomes, say, 100?
What is it? I am trying to match (a^n)b sort of thing where n can be anything
EDIT:
I forgot to mention that this is done using lex and yacc, where lex has to return a token to yacc.
%{
#include "y.tab.h"
%}
%%
aaaaaaaaaa[a]*b {return ok;}
\n {return '\n';}
. {return 0;}
%%
Try
a{10,}
which says a 10 or more times.
grep -E "a{10,}" filename
matches aaaaaaaaaaaaaaaaaaaaaaaaab but not aaaaaaaaab.
If your lex is flex, you can use a{10,}.
If not, according to "3. Lex Regular Expressions", you can use a{10}a* instead.
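To see that the two spellings agree, here is a small sketch using Python's re module (an assumption of this example; lex and flex have their own engines, and the names below are illustrative only). Both patterns should accept ten or more a's followed by b and reject nine.

```python
import re

# "a, ten or more times, then b" using an open-ended bounded quantifier.
AT_LEAST_TEN = re.compile(r"a{10,}b")

# Fallback spelling: exactly ten a's, then any number of further a's, then b.
FALLBACK = re.compile(r"a{10}a*b")


def both_agree(text):
    """Return the shared verdict, or raise if the two patterns disagree."""
    first = AT_LEAST_TEN.fullmatch(text) is not None
    second = FALLBACK.fullmatch(text) is not None
    if first != second:
        raise AssertionError("patterns disagree on " + repr(text))
    return first


if __name__ == "__main__":
    for count in (9, 10, 11, 100):
        print(count, both_agree("a" * count + "b"))
```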
Footy,
[WARNING: This answer is COMPLETE BUNKUM!!!]
(if you mean soccer, we're sworn enemies ;-)
Ummm, No... That is not as far as I know, using "the standard" regular expression syntax as supported by sed, grep, nawk, and the likes... and no not even egrep... As far as I know, the a{10,*} syntax (which is exactly what you're hankering for) didn't emerge until Perl rewrote all the books on the capabilities of regular expressions... and (don't quote me on this) I don't think that happened until like version 5.
So yeah, If you're stuck with using nawk, then it's the aaaaaaaaardvarking hardway dude. Sorry.
Cheers. Keith.
EDIT:
Hmmm... I seem to be the odd-man-out here... maybe everyone else's "standard operating environment(s)" have been updated with "standard tools" that recognise later regular expression syntax extensions... Sooo... Hmmm... I tested this on my (three year old) cygwin implementation of egrep... and it surprised me by actually working!!!
Administrator@snadbox3 ~
$ egrep 'a{3,}b' <<-eof
> ab
> aab
> aaab
> aaaab
> eof
aaab
aaaab
So I'm WRONG all ends up... looks like the "new" {min,[max]} syntax is reasonably well supported, and I'm getting old. Sigh.
Cheers. Keith.
Use this format: a{n}a*b, and replace n with the minimum number you want.

How do I check if a scalar has a compiled regex in it with Perl?

Let's say I have a subroutine/method that a user can call to test some data that (as an example) might look like this:
sub test_output {
    my ($self, $test) = @_;
    my $output = $self->long_process_to_get_data();
    if ($output =~ /\Q$test/) {
        $self->assert_something();
    }
    else {
        $self->do_something_else();
    }
}
Normally, $test is a string, which we're looking for anywhere in the output. This was an interface put together to make calling it very easy. However, we've found that sometimes, a straight string is problematic - for example, a large, possibly varying number of spaces...a pattern, if you will. Thus, I'd like to let them pass in a regex as an option. I could just do:
$output =~ $test
if I could assume that it's always a regex, but ah, but the backwards compatibility! If they pass in a string, it still needs to test it like a raw string.
So in that case, I'll need to test to see if $test is a regex. Is there any good facility for detecting whether or not a scalar has a compiled regex in it?
As hobbs points out, if you're sure that you'll be on 5.10 or later, you can use the built-in check:
use 5.010;
use re qw(is_regexp);
if (is_regexp($pattern)) {
    say "It's a regex";
} else {
    say "Not a regex";
}
However, I don't always have that option. In general, I do this by checking against a prototype value with ref:
if( ref $scalar eq ref qr// ) { ... }
One of the reasons I started doing it this way was that I could never remember the type name for a regex reference. I can't even remember it now. It's not uppercase like the rest of them, either, because it's really one of the packages implemented in the perl source code (in regcomp.c if you care to see it).
If you have to do that a lot, you can make that prototype value a constant using your favorite constant creator:
use constant REGEX_TYPE => ref qr//;
I talk about this at length in Effective Perl Programming as "Item 59: Compare values to prototypes".
If you want to try it both ways, you can use a version check on perl:
if( $] < 5.010 ) { warn "upgrade now!\n"; ... do it my way ... }
else { ... use is_regexp ... }
As of perl 5.10.0 there's a direct, non-tricky way to do this:
use 5.010;
use re qw(is_regexp);
if (is_regexp($pattern)) {
    say "It's a regex";
} else {
    say "Not a regex";
}
is_regexp uses the same internal test that perl uses, which means that unlike ref, it won't be fooled if, for some strange reason, you decide to bless a regex object into a class other than Regexp (yes, that's possible).
In the future (or right now, if you can ship code with a 5.10.0 requirement) this should be considered the standard answer to the problem. Not only because it avoids a tricky edge-case, but also because it has the advantage of saying exactly what it means. Expressive code is a good thing.
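The same "plain string or compiled pattern" interface comes up in other languages. As a loose analogy only (this is Python, not Perl, and output_matches is a hypothetical name), a type check against re.Pattern plays the role of is_regexp, and re.escape plays the role of \Q:

```python
import re


def output_matches(output, test):
    """Search output for test, which is either a literal string or a compiled regex."""
    if isinstance(test, re.Pattern):
        # Caller passed a compiled pattern: use it as-is.
        return test.search(output) is not None
    # Caller passed a plain string: match it literally, like /\Q$test/ in Perl.
    return re.search(re.escape(test), output) is not None


if __name__ == "__main__":
    print(output_matches("a   b", "a b"))
    print(output_matches("a   b", re.compile(r"a\s+b")))
```

Existing callers that pass strings keep the literal behaviour, while new callers can opt in to patterns.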
See the ref built-in.

Why is this regular expression faster?

I'm writing a Telnet client of sorts in C# and part of what I have to parse are ANSI/VT100 escape sequences, specifically, just those used for colour and formatting (detailed here).
One method I have is one to find all the codes and remove them, so I can render the text without any formatting if needed:
public static string StripStringFormating(string formattedString)
{
    if (rTest.IsMatch(formattedString))
        return rTest.Replace(formattedString, string.Empty);
    else
        return formattedString;
}
I'm new to regular expressions and I was suggested to use this:
static Regex rText = new Regex(@"\e\[[\d;]+m", RegexOptions.Compiled);
However, this failed if the escape code was incomplete due to an error on the server. So then this was suggested, but my friend warned it might be slower (this one also matches another condition (z) that I might come across later):
static Regex rTest =
    new Regex(@"(\e(\[([\d;]*[mz]?))?)?", RegexOptions.Compiled);
This not only worked, but was in fact faster too, and reduced the impact on my text rendering. Can someone explain to a regexp newbie, why? :)
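One property of the second pattern is easy to check: every part of it is optional, so it can match the empty string at any position, which means a "does it match?" test succeeds on any input. The sketch below uses Python's re module rather than .NET (an assumption; \x1b stands in for \e, which Python's re does not accept, and the names are hypothetical). It is not a timing comparison, only a look at what each pattern matches.

```python
import re

# First suggestion: a complete colour/format sequence only.
STRICT = re.compile(r"\x1b\[[\d;]+m")

# Second suggestion: every component is optional, including the escape itself.
TOLERANT = re.compile(r"(\x1b(\[([\d;]*[mz]?))?)?")

SAMPLE = "\x1b[1;32mgreen\x1b[0"   # the last sequence is cut off

if __name__ == "__main__":
    print(repr(TOLERANT.search("plain").group(0)))   # empty match on plain text
    print(repr(STRICT.sub("", SAMPLE)))               # leaves the incomplete code behind
    print(repr(TOLERANT.sub("", SAMPLE)))             # also removes the incomplete code
```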
Do you really want to run the regexp twice? Without having checked (bad me) I would have thought that this would work well:
public static string StripStringFormating(string formattedString)
{
    return rTest.Replace(formattedString, string.Empty);
}
If it does, you should see it run ~twice as fast...
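The "replace without a pre-check" idea looks like this in a Python sketch (Python instead of C#, with \x1b in place of \e; the names ANSI_SGR and strip_string_formatting are illustrative). A substitution that finds no match simply returns the text unchanged, so the separate match test adds a second scan without changing the result.

```python
import re

# Complete colour/format sequences such as ESC[1;32m or ESC[0m.
ANSI_SGR = re.compile(r"\x1b\[[\d;]+m")


def strip_string_formatting(formatted):
    """Remove colour/format escape sequences in a single pass."""
    # sub() leaves the text as it was when nothing matches,
    # so no separate "is there a match?" check is needed.
    return ANSI_SGR.sub("", formatted)


if __name__ == "__main__":
    sample = "\x1b[1;32mThis is bright green.\x1b[0m This is the default color."
    print(strip_string_formatting(sample))
    print(strip_string_formatting("plain text"))
```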
The reason why #1 is slower is that [\d;]+ is a greedy quantifier. Using +? or *? is going to do lazy quantifying. See MSDN - Quantifiers for more info.
You may want to try:
"(\e\[(\d{1,2};)*?[mz]?)?"
That may be faster for you.
I'm not sure if this will help with what you are working on, but long ago I wrote a regular expression to parse ANSI graphic files.
(?s)(?:\e\[(?:(\d+);?)*([A-Za-z])(.*?))(?=\e\[|\z)
It will return each code and the text associated with it.
Input string:
<ESC>[1;32mThis is bright green.<ESC>[0m This is the default color.
Results:
[ [1, 32], m, This is bright green.]
[0, m, This is the default color.]
Without doing detailed analysis, I'd guess that it's faster because of the question marks. These allow the regular expression to be "lazy," and stop as soon as they have enough to match, rather than checking if the rest of the input matches.
I'm not entirely happy with this answer though, because this mostly applies to question marks after * or +. If I were more familiar with the input, it might make more sense to me.
(Also, for the code formatting, you can select all of your code and press Ctrl+K to have it add the four spaces required.)