Here's some Crystal code:
matches = /\/([a-z]+)\/(\d+)\/state\/([a-z]+)/.match(address) # line 1
puts matches[0]? # line 2
puts matches.try &.[0] # line 3
According to typeof, the type of matches is (Regex::MatchData | Nil). Yet line 2, which calls matches[0]?, fails with a compilation error. And I don't understand line 3 at all!
Could someone clarify?
Line 2: As you say, the type of matches is (Regex::MatchData | Nil). If it happens to be nil, it has no #[]? method, which makes the type checker angry. You are supposed to check whether the match succeeded first:
matches = /\/([a-z]+)\/(\d+)\/state\/([a-z]+)/.match(address)
if matches
puts matches[0]?
end
Inside if, the type of matches is just Regex::MatchData (as we eliminated the Nil possibility), and the type checker can rest peacefully.
If you are sure your string will match, you can pacify the type checker with not_nil!, but that opens up the possibility of a runtime error if your confidence in your data's conformance turns out to be unfounded:
puts matches.not_nil![0]?
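For comparison, the same two ideas (narrowing with a guard, and forcing with not_nil!) map onto Rust's Option<T>. This is an analogy, not Crystal: it assumes str::find (which, like Regex#match, may fail) as a stand-in for the regex match, and the helper name state_offset is hypothetical.

```rust
/// Hypothetical helper: byte offset of "/state/" in `address`, if present.
/// Like Crystal's Regex#match, the search can fail, so the result is an Option.
fn state_offset(address: &str) -> Option<usize> {
    address.find("/state/")
}

fn main() {
    let address = "/users/42/state/active";
    let found = state_offset(address);

    // Narrowing, like `if matches`: inside the block `pos` is a plain usize,
    // because the None case has been ruled out.
    if let Some(pos) = found {
        println!("found at byte {}", pos);
    }

    // Like `not_nil!`: this compiles, but panics at run time if `found` is None.
    let pos = found.unwrap();
    println!("unwrapped: {}", pos);
}
```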
Line 3: #try yields its receiver to the block and returns the block's result, except when the receiver is nil, in which case it returns nil without running the block. No guard is needed because #try is explicitly defined on Nil (as well as on Object).
It uses the shortcut syntax for blocks, where &.[0] is kind of equivalent to { |x| x[0] }.
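Continuing the Rust analogy under the same assumptions (str::find instead of a regex; state_name is a hypothetical helper), Option::map behaves much like #try: the closure runs only when there is a value, and None passes straight through.

```rust
/// Hypothetical helper: the text after "/state/", if the marker is present.
/// `Option::map` plays the role of Crystal's `try`: the closure runs only on
/// Some, and None is returned unchanged.
fn state_name(address: &str) -> Option<&str> {
    address
        .find("/state/")
        .map(|pos| &address[pos + "/state/".len()..])
}

fn main() {
    println!("{:?}", state_name("/users/42/state/active")); // Some("active")
    println!("{:?}", state_name("/users/42")); // None
}
```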
My Raku code:
sub comments {
    if ($DEBUG) { say "<filtering comments>\n"; }
    my @filteredtitles = ();
    # This loops through each track
    for @tracks -> $title {
        ##########################
        # LAB 1 TASK 2 #
        ##########################
        ## Add regex substitutions to remove superfluous comments and all that follows them
        ## Assign to $_ with smartmatcher (~~)
        ##########################
        $_ = $title;
        if ($_) ~~ s:g:mrx/ .*<?[\(^.*]> / {
            # Repeat for the other symbols
            ########################## End Task 2
            # Add the edited $title to the new array of titles
            @filteredtitles.push: $_;
        }
    }
    # Updates @tracks
    return @filteredtitles;
}
Result when compiling:
Error Compiling! Placeholder variable '@_' may not be used here because the surrounding block doesn't take a signature.
Is there something obvious that I am missing? Any help is appreciated.
So, in contrast with @raiph's answer, here's what I have:
my @tracks = <Foo Ba(r B^az>.map: { S:g / <[\(^]> // };
Just that. Nothing else. Let's dissect it, from the inside out:
This part, / <[\(^]> /, is a regular expression that matches a single character, as long as it is an open parenthesis (written \() or a caret (^). Characters placed inside the combined angle-bracket/square-bracket syntax <[...]> form an enumerated character class.
Then there is S, which introduces a non-destructive substitution: a quoting construct that makes regex-based substitutions on the topic variable $_ without modifying it, and instead returns its value with the requested modifications. In the code above, S:g brings the :g (or :global) adverb into play (see the global adverb in the adverbs section of the documentation), which for a substitution means "please make this substitution as many times as possible". The first / opens the regex, the second / ends it and starts the replacement text, and the third / ends the replacement text; because the second and third slashes are adjacent, the replacement is empty. So
S:g / <[\(^]> //
means "please return the contents of $_, but modified in such a way that all its characters matching the regex <[\(^]> are deleted (replaced with the empty string)"
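As a rough analogy outside Raku, here is a Rust sketch of the same non-destructive deletion. It filters characters directly instead of using a regex, and strip_specials is a hypothetical name; the point is only that a new string comes back while the original is left untouched, as with S.

```rust
/// Hypothetical helper mirroring `S:g / <[\(^]> //`: returns a copy of `title`
/// with every '(' and '^' removed; the input is only borrowed, never modified.
fn strip_specials(title: &str) -> String {
    title.chars().filter(|&c| c != '(' && c != '^').collect()
}

fn main() {
    let original = "Ba(r B^az";
    let cleaned = strip_specials(original);
    println!("{} -> {}", original, cleaned); // Ba(r B^az -> Bar Baz
}
```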
At this point, I should emphasize that regular expressions in Raku are really powerful, and that reading the entire regexes page of the documentation (and probably the best practices and gotchas page too) is a good idea.
Next, the .map method can be called on any Iterable (List, Array and the like) and returns a sequence built from each element of the Iterable, transformed by the Code object passed to it. So something like:
@x.map({ S:g / foo /bar/ })
essentially means "please return a Seq of every item in @x, with every occurrence of the substring foo replaced by bar" (@x itself is not altered). The documentation on sequences and iterables is a good place to start learning about them.
Finally, my one-liner
my @tracks = <Foo Ba(r B^az>.map: { S:g / <[\(^]> // };
can be translated as:
I have a List with three string elements
Foo
Ba(r
B^az
(This stands in for your "list of titles".) Take that list and generate a second one that contains every element of it, but with all instances of the characters "open parenthesis" and "caret" removed.
Oh, and store the result in the variable @tracks (declared with my, so it is lexically scoped).
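The whole one-liner can be mirrored the same way. This Rust sketch (hypothetical helper strip_all, with character filtering standing in for the regex) builds a new collection from the old one, much as .map with S does, and leaves the input list unchanged.

```rust
/// Hypothetical helper mirroring `<Foo Ba(r B^az>.map: { S:g / <[\(^]> // }`:
/// builds a new Vec in which '(' and '^' are removed from every title.
fn strip_all(titles: &[&str]) -> Vec<String> {
    titles
        .iter()
        .map(|t| t.chars().filter(|&c| c != '(' && c != '^').collect::<String>())
        .collect()
}

fn main() {
    let titles = ["Foo", "Ba(r", "B^az"];
    let tracks = strip_all(&titles);
    println!("{:?}", tracks); // ["Foo", "Bar", "Baz"]
}
```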
Here's what I ended up with:
my @tracks = <Foo Ba(r B^az>;
sub comments {
    my @filteredtitles;
    for @tracks -> $_ is copy {
        s:g / <[\(^]> //;
        @filteredtitles.push: $_;
    }
    return @filteredtitles;
}
The is copy ensures the variable set up by the for loop is mutable.
The s:g/...//; is all that's needed to strip the unwanted characters.
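For contrast with the non-destructive S version, here is a Rust sketch of this loop's shape: each title is copied into a mutable String (the role of is copy) and then edited in place with String::retain (the role of s:g). The function mirrors the Raku sub's name, but character filtering stands in for the regex.

```rust
/// Mirrors the shape of the Raku `comments` sub: copy each title into a
/// mutable String, edit the copy in place, and collect the results.
fn comments(tracks: &[&str]) -> Vec<String> {
    let mut filtered_titles = Vec::new();
    for title in tracks {
        // Mutable copy, like `$_ is copy`: the caller's data stays untouched.
        let mut t = title.to_string();
        // In-place edit, like `s:g / <[\(^]> //;` applied to the copy.
        t.retain(|c| c != '(' && c != '^');
        filtered_titles.push(t);
    }
    filtered_titles
}

fn main() {
    let tracks = ["Foo", "Ba(r", "B^az"];
    println!("{:?}", comments(&tracks)); // ["Foo", "Bar", "Baz"]
}
```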
One thing no one can help you with is the error you reported. I currently think you just got confused.
Here's an example of code that generates that error:
do { @_ }
But there is no way the code you've shared could generate that error because it requires that there is an @_ variable in your code, and there isn't one.
One way I can help in relation to future problems you may report on StackOverflow is to encourage you to read and apply the guidance in Minimal Reproducible Example.
While your code did not generate the error you reported, it may help you to know about some of the other compile-time and run-time errors in the code you shared.
Compile-time errors:
You wrote s:g:mrx. That's invalid: Adverb mrx not allowed on substitution.
You missed out the third slash of the s///. That causes mayhem (see below).
There were several run-time errors, once I got past the compile-time errors. I'll discuss just one, the regex:
.*<?[...]> will match any sub-string with a final character that's one of the ones listed in the [...], and will then capture that sub-string except without the final character. In the context of an s:g/...// substitution this will strip ordinary characters (captured by the .*) but leave the special characters.
This makes no sense.
So I dropped the .*, and also the ? from the special character pattern, changing it from <?[...]> (a zero-width lookahead, which checks whether the next character is one of the listed ones but does not include it in the match) to just <[...]> (which also tries to match the character and, if it succeeds, includes it in the match).
A final comment is about an error you made that may well have seriously confused you.
In a nutshell, the s/// construct must have three slashes.
In your question you had code of the form s/.../ (or s:g/.../ etc), without the final slash. If you try to compile such code the parser gets utterly confused because it will think you're just writing a long replacement string.
For example, if you wrote this code:
if s/foo/ { say 'foo' }
if m/bar/ { say 'bar' }
it'd be as if you'd written:
if s/foo/ { say 'foo' }\nif m/...
which in turn would mean you'd get the compile-time error:
Missing block
------> if m/⏏bar/ { ... }
expecting any of:
block or pointy block
...
because Raku(do) would have interpreted the part between the second and third /s as the replacement double quoted string of what it interpreted as an s/.../.../ construct, leading it to barf when it encountered bar.
So, to recap, the s/// construct requires three slashes, not two.
(I'm ignoring syntactic variants of the construct such as, say, s [...] = '...'.)
From the docs (perldoc -f m)
If ? is the delimiter, then a match-only-once rule applies, described in m?*PATTERN*? below.
The "match-only-once rule" doesn't seem to be defined anywhere, but it seems to be a real optimization:
use Benchmark qw(:all) ;
use constant HAYSTACK => "this is a test string";
my $needle = "test";
cmpthese(-1, {
'questionmark' => sub { if ( HAYSTACK =~ m?$needle?n ) { 1 } },
'backslash' => sub { if ( HAYSTACK =~ m/$needle/n ) { 1 } },
});
With the results:
                   Rate    backslash questionmark
backslash     9267717/s           --         -57%
questionmark 21588328/s        133%           --
This makes me wonder why is the behavior in m// in scalar context such that it even needs this behavior? Let's take for example the output
perl -E'say "FOOOOOO" =~ m/O/' # returns 1
If it's not even counting the O what does it do after the first match such that it's twice as slow?
The "match-only-once rule" doesn't seem to be defined anywhere, […]
"A match-only-once rule" is a description of the rule — it's a rule saying that m?PATTERN? matches only once — not an official name that you can use to search. The text that you quote is pulled from the perlop manpage, so when it says "described in m?*PATTERN*? below", it's referring to this part of that manpage:
m?PATTERN?msixpodualngc
This is just like the m/PATTERN/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only m?? patterns local to the current package are reset.
while (<>) {
if (m?^$?) {
# blank line between header and body
}
} continue {
reset if eof; # clear m?? status for next file
}
Another example switched the first "latin1" encoding it finds to "utf8" in a pod file:
s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
This makes me wonder why is the behavior in m// in scalar context such that it even needs this behavior?
Even in scalar context, m// or m?? may be called many times between resets, and if so then the two behave differently. (You can see this in the first snippet above. It's also the reason that your benchmarks give different performance results: the version with m?$needle?n only does a regex match the first time the function is called — it just returns 'no match' on all subsequent calls — whereas the version with m/$needle/n does a regex match every time.)
The confusion here is that "once" in "match-only-once" refers to the number of times the m?? expression is evaluated, not to finding the needle once inside the haystack and ignoring later occurrences of it. So if m?? is evaluated many times without a reset, only the first evaluation that matches returns a match.
use feature 'say';
sub foo { return "foo" =~ m?o? };
say foo(); # 1
say foo(); # empty line: no match until reset
reset();
say foo(); # 1
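To make the "once per reset" semantics concrete, here is a hypothetical Rust emulation of m?PATTERN?. It is not how perl implements it, and a substring search stands in for the regex. It also counts how many times the haystack is actually searched, which illustrates why the m?? benchmark looked faster: after the first success, later calls skip the search entirely.

```rust
/// Hypothetical emulation of Perl's `m?PATTERN?`: succeeds at most once
/// until `reset` is called. A plain substring search stands in for the regex.
struct MatchOnce {
    needle: String,
    already_matched: bool,
    scans: usize, // how many times the haystack was actually searched
}

impl MatchOnce {
    fn new(needle: &str) -> Self {
        MatchOnce { needle: needle.to_string(), already_matched: false, scans: 0 }
    }

    fn is_match(&mut self, haystack: &str) -> bool {
        if self.already_matched {
            return false; // no search at all until reset
        }
        self.scans += 1;
        let found = haystack.contains(self.needle.as_str());
        if found {
            self.already_matched = true; // only a success disarms the pattern
        }
        found
    }

    /// Like Perl's `reset()`: re-arms the pattern.
    fn reset(&mut self) {
        self.already_matched = false;
    }
}

fn main() {
    let mut m = MatchOnce::new("o");
    println!("{}", m.is_match("foo")); // true
    println!("{}", m.is_match("foo")); // false
    m.reset();
    println!("{}", m.is_match("foo")); // true
    println!("searches performed: {}", m.scans); // 2
}
```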
Ruby has a universal idea of "truthiness" and "falsiness".
Ruby does have two specific classes for Boolean objects, TrueClass and FalseClass, with singleton instances denoted by the special variables true and false, respectively.
However, truthiness and falsiness are not limited to instances of those two classes; the concept is universal and applies to every single object in Ruby. Every object is either truthy or falsy. The rules are very simple. In particular, only two objects are falsy:
nil, the singleton instance of NilClass and
false, the singleton instance of FalseClass
Every single other object is truthy. This includes even objects that are considered falsy in other programming languages, such as
the Integer 0,
the Float 0.0,
the empty String '',
the empty Array [],
the empty Hash {}.
These rules are built into the language and are not user-definable. There is no to_bool implicit conversion or anything similar.
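The rule is simple enough to state as code. This Rust sketch models a handful of Ruby values with a hypothetical RubyValue enum (Hash omitted for brevity) and encodes the rule that only nil and false are falsy.

```rust
/// Hypothetical model of a few Ruby values, only to spell out the rule.
#[allow(dead_code)]
enum RubyValue {
    Nil,
    False,
    True,
    Integer(i64),
    Float(f64),
    Str(String),
    Array(Vec<RubyValue>),
}

/// Ruby's rule: only nil and false are falsy; every other value is truthy.
fn truthy(v: &RubyValue) -> bool {
    !matches!(v, RubyValue::Nil | RubyValue::False)
}

fn main() {
    let values = [
        ("nil", RubyValue::Nil),
        ("false", RubyValue::False),
        ("true", RubyValue::True),
        ("0", RubyValue::Integer(0)),
        ("0.0", RubyValue::Float(0.0)),
        ("''", RubyValue::Str(String::new())),
        ("[]", RubyValue::Array(Vec::new())),
    ];
    for (name, v) in &values {
        println!("{:>5} is {}", name, if truthy(v) { "truthy" } else { "falsy" });
    }
}
```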
Here is a quote from the ISO Ruby Language Specification:
6.6 Boolean values
An object is classified into either a trueish object or a falseish object.
Only false and nil are falseish objects. false is the only instance of the class FalseClass (see 15.2.6), to which a false-expression evaluates (see 11.5.4.8.3). nil is the only instance of the class NilClass (see 15.2.4), to which a nil-expression evaluates (see 11.5.4.8.2).
Objects other than false and nil are classified into trueish objects. true is the only instance of the class TrueClass (see 15.2.5), to which a true-expression evaluates (see 11.5.4.8.3).
The executable Ruby/Spec seems to agree:
it "considers a non-nil and non-boolean object in expression result as true" do
if mock('x')
123
else
456
end.should == 123
end
According to those two sources, I would assume that Regexps are also truthy, but according to my tests, they aren't:
if // then 'Regexps are truthy' else 'Regexps are falsy' end
#=> 'Regexps are falsy'
I tested this on YARV 2.7.0-preview1, TruffleRuby 19.2.0.1, and JRuby 9.2.8.0. All three implementations agree with each other and disagree with the ISO Ruby Language Specification and my interpretation of the Ruby/Spec.
More precisely, Regexp objects that are the result of evaluating Regexp literals are falsy, whereas Regexp objects that are the result of some other expression are truthy:
r = //
if r then 'Regexps are truthy' else 'Regexps are falsy' end
#=> 'Regexps are truthy'
Is this a bug, or desired behavior?
This isn’t a bug. What is happening is Ruby is rewriting the code so that
if /foo/
whatever
end
effectively becomes
if /foo/ =~ $_
whatever
end
If you are running this code in a normal script (and not using the -e option) then you should see a warning:
warning: regex literal in condition
This is probably confusing most of the time, which is why the warning is given, but it can be useful for one-liners using the -e option. For example, you can print all lines matching a given regexp from a file with
$ ruby -ne 'print if /foo/' filename
(The default argument for print is $_ as well.)
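Spelled out without the implicit $_, that one-liner reads each line and prints it when it matches. This Rust sketch makes the line an explicit variable; a substring test stands in for the regex /foo/ (an assumption to stay within the standard library), it reads standard input rather than a named file, and matching_lines is a hypothetical helper.

```rust
use std::io::{self, Read};

/// Hypothetical helper: the lines of `input` that contain `needle`.
/// The explicit `line` plays the role of Ruby's implicit `$_`.
fn matching_lines<'a>(input: &'a str, needle: &str) -> Vec<&'a str> {
    input.lines().filter(|line| line.contains(needle)).collect()
}

fn main() {
    // Roughly `ruby -ne 'print if /foo/'`, with the file piped to stdin.
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).expect("failed to read stdin");
    for line in matching_lines(&input, "foo") {
        println!("{}", line);
    }
}
```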
This is the result of (as far as I can tell) an undocumented feature of the ruby language, which is best explained by this spec:
it "matches against $_ (last input) in a conditional if no explicit matchee provided" do
-> {
eval <<-EOR
$_ = nil
(true if /foo/).should_not == true
$_ = "foo"
(true if /foo/).should == true
EOR
}.should complain(/regex literal in condition/)
end
You can generally think of $_ as the "last string read by gets".
To make matters even more confusing, $_ (along with $~) is not a global variable; it has local scope.
When a Ruby script starts, $_ == nil.
So, the code:
// ? 'Regexps are truthy' : 'Regexps are falsey'
Is being interpreted like:
(// =~ nil) ? 'Regexps are truthy' : 'Regexps are falsey'
...which evaluates to nil, so the condition is falsey.
On the other hand, for a non-literal regexp (e.g. r = // or Regexp.new('')), this special interpretation does not apply.
A Regexp object itself is truthy, just like every other object in Ruby except nil and false.
Unless running a ruby script directly on the command line (i.e. with the -e flag), the ruby parser will display a warning against such usage:
warning: regex literal in condition
You could make use of this behaviour in a script, with something like:
puts "Do you want to play again?"
gets
# (user enters e.g. 'Yes' or 'No')
/y/i ? play_again : back_to_menu
...But it would be more normal to assign a local variable to the result of gets and perform the regex check against this value explicitly.
I'm not aware of any use case for performing this check with an empty regex, especially when defined as a literal value. The result you've highlighted would indeed catch most ruby developers off-guard.
I want to use the regex crate and capture numbers from a string.
let input = "abcd123efg";
let re = Regex::new(r"([0-9]+)").unwrap();
let cap = re.captures(input).unwrap().get(1).unwrap().as_str();
println!("{}", cap);
It worked if numbers exist in input, but if numbers don't exist in input I get the following error:
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value'
I want my program continue if the regex doesn't match. How can I handle this error?
You probably want to (re-)read the chapter on "Error Handling" in the Rust book. Error handling in Rust is mostly done via the types Result<T, E> and Option<T>, both representing an optional value of type T with Result<T, E> carrying additional information about the absence of the main value.
You are calling unwrap() on each Option or Result you encounter. unwrap() is a method saying: "if there is no value of type T, let the program explode (panic)". You only want to call unwrap() if an absence of a value is not expected and thus would be a bug! (NB: actually, the unwrap() in your second line is a perfectly reasonable use!)
But you use unwrap() incorrectly twice: on the result of captures() and on the result of get(1). Let's tackle captures() first; it returns an Option<_> and the docs say:
If no match is found, then None is returned.
In most cases, the input string not matching the regex is to be expected, thus we should deal with it. We could either just match the Option (the standard way to deal with those possible errors, see the Rust book chapter) or we could use Regex::is_match() before, to check if the string matches.
Next up: get(1). Again, the docs tell us:
Returns the match associated with the capture group at index i. If i does not correspond to a capture group, or if the capture group did not participate in the match, then None is returned.
But this time, we don't have to deal with that. Why? Our regex, ([0-9]+), is constant, and we know that capture group 1 exists and encloses the whole regex, so it always participates in a match. Thus we can rule out both possible situations that would lead to a None. This means we can unwrap(), because we don't expect the absence of a value.
The resulting code could look like this:
use regex::Regex;

fn main() {
    let input = "abcd123efg";
    let re = Regex::new(r"([0-9]+)").unwrap();
    match re.captures(input) {
        Some(caps) => {
            // Group 1 always exists and participates whenever the regex matches.
            let cap = caps.get(1).unwrap().as_str();
            println!("{}", cap);
        }
        None => {
            // The regex did not match. Deal with it here!
        }
    }
}
You can either check with is_match first, or, instead of unwrapping the result of captures(input) (an Option<Captures<'t>>), handle both cases with a match.
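Two other common ways to handle the missing match are if let (handle None inline) and the ? operator (pass None up to the caller). The sketch below uses only the standard library so that it runs without the regex crate: a hypothetical helper first_number stands in for re.captures(input) followed by get(1), and returns an Option<&str> just as that chain does.

```rust
/// Hypothetical stand-in for `re.captures(input)` + `get(1)` with the
/// pattern ([0-9]+): the first run of ASCII digits, or None if there is none.
fn first_number(input: &str) -> Option<&str> {
    let start = input.find(|c: char| c.is_ascii_digit())?;
    let rest = &input[start..];
    let len = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    Some(&rest[..len])
}

/// `?` returns None early when there is no number, instead of panicking.
fn number_plus_one(input: &str) -> Option<u64> {
    let digits = first_number(input)?;
    let n: u64 = digits.parse().ok()?;
    n.checked_add(1)
}

fn main() {
    for input in ["abcd123efg", "abcdefg"] {
        // `if let` runs the block only when there is a match; no panic otherwise.
        if let Some(cap) = first_number(input) {
            println!("{}: {}", input, cap);
        } else {
            println!("{}: no number, carrying on", input);
        }
    }
    println!("{:?}", number_plus_one("abcd123efg")); // Some(124)
}
```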
Well, I am currently writing a script that is meant to check the logs of another script I wrote, to see whether it has had three or more unsuccessful pings in a row before a successful one. This is just barebones at the moment, but it should look something like this:
fileread,x,C:\Users\Michael\Desktop\ping.txt
result:=RegExMatch(%x% ,failure success)
msgbox,,, The file is = %x% `n the result is = %result%
Now, the file it is trying to read contains:
success failure success
For some reason, when it reads the file, it says that the variable %x% 'contains illegal characters'.
When I copy and paste the contents of ping.txt into the script and save it as a variable, it works.
I have made sure that the file has Windows line endings (CR+LF).
I have assigned the variable generated by FileRead to another variable, thus stripping any leading or trailing whitespace characters.
The file is encoded in ANSI, and the problem persists with UTF-8.
Function parameters take variable names without the % symbol; simply remove them.
I also want to point out that if the second parameter is meant to be a regular expression, rather than a variable containing a regular expression, you will need quotes around it. As it is, your script passes an empty string as the pattern, which will always return 1 (failure is interpreted as a variable with an empty string associated with it).
To quote Lexikos:
"An empty string, when compiled as a regex pattern, will match exactly zero characters at whatever position you attempt to match it. Think of it this way: For any position n in any string, the next 0 characters are always the same."
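A quick way to see Lexikos's point outside AutoHotkey: Rust's str::find with an empty pattern also matches at the very first position. The sketch converts the result to the 1-based, 0-means-no-match convention that RegExMatch uses; find_pos_1based is a hypothetical helper, and a plain substring search stands in for the regex engine.

```rust
/// Hypothetical helper imitating the shape of RegExMatch's result:
/// the 1-based position of the first match, or 0 when there is none.
/// A plain substring search stands in for the regex engine.
fn find_pos_1based(haystack: &str, needle: &str) -> usize {
    haystack.find(needle).map_or(0, |i| i + 1)
}

fn main() {
    let file_contents = "success failure success";
    // The unquoted `failure success` evaluated to an empty string, and an
    // empty pattern matches zero characters at the first position tried.
    println!("{}", find_pos_1based(file_contents, "")); // 1
    // With the intended literal text, the real position is reported.
    println!("{}", find_pos_1based(file_contents, "failure success")); // 9
}
```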
Because you are simply truth testing or finding the index, I want to point out that AutoHotkey has a useful shorthand operator for this.
string := "this is a test"
f1::
result := RegExMatch(string, "\sis")
traytip,, %result%
Return
f2::
result := string ~= "\sis"
traytip,, % result
Return
These hotkeys both do the same thing; the second uses the shorthand operator ~=.
Notice also that the traytip parameter in the second example has only one %. When you start a command parameter with a %, that starts an expression, and within an expression variables are not enclosed in %.
The ternary operator ?: is also very useful:
string := "this is a test"
f3::traytip,, % (result := string ~= "\sis") ? (result) : ("nothing")
It might look complicated but it's very simple.
Think of
% as if
? as then
: as else
If (true) then (a) else (b)
% (true) ? (a) : (b)
A variable will be evaluated as False if 0 (or nothing) is assigned to it.
But in this example "\sis" is matched and the index of the space is returned (5), so it is evaluated as True.
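The same logic in Rust, as a sketch: the hypothetical helper match_pos follows the 1-based, 0-if-absent convention of the position that ~= returns, with a substring search for " is" standing in for the regex "\sis", and describe mirrors the ternary by treating a nonzero position as true.

```rust
/// Hypothetical helper: 1-based position of `needle` in `haystack`, or 0 if
/// absent; a substring search stands in for the regex "\sis".
fn match_pos(haystack: &str, needle: &str) -> usize {
    haystack.find(needle).map_or(0, |i| i + 1)
}

/// Mirrors `% (result := string ~= "\sis") ? (result) : ("nothing")`:
/// a nonzero position counts as true, 0 counts as false.
fn describe(result: usize) -> String {
    if result != 0 {
        result.to_string()
    } else {
        "nothing".to_string()
    }
}

fn main() {
    let string = "this is a test";
    println!("{}", describe(match_pos(string, " is"))); // 5
    println!("{}", describe(match_pos(string, " was"))); // nothing
}
```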
You can read more about variables and operators here:
http://l.autohotkey.net/docs/Variables.htm