How to parse parameters to a hubot script - regex

New to Hubot/CoffeeScript and inheriting an existing script.
I googled and found some unhelpful stuff like this: Hubot matching on multiple tokens per line?
What I want to do is be able to parse parameters to my Hubot message. For example:
startPlaceOrderListener = () ->
  robot.respond /order me (.*)/i, (res) ->
and then follow it with what you want to order.
I can obviously re-invent the wheel and parse res.match[1] myself, but hubot already seems to have some regular expression parsing built in for its own use and I was wondering if there's a way to leverage that for my own nefarious purposes.

It turns out that CoffeeScript has regular expression literals built in. So
/order me (.*)/i
is straight CoffeeScript.
To match a regular expression you can do:
/order me (.*)/i.test("Bob")
Where the i can be left out if you don't want to ignore case.
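If you want to pull the captured groups out yourself, outside of a respond callback, plain exec on the regex literal works. A quick sketch (the "order me a pizza" input is just an illustration):
# exec returns null when there is no match; otherwise match[0] is the
# full match and match[1] the first capture group.
match = /order me (.*)/i.exec "order me a pizza"
if match?
  item = match[1]  # "a pizza"
  console.log "You ordered: #{item}"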

To parse the input value in CoffeeScript you can do something like:
robot.respond /open the (.*) doors/i, (res) ->
  doorType = res.match[1]
  if doorType is "pod bay"
    res.reply "I'm afraid I can't let you do that."
  else
    res.reply "Opening #{doorType} doors"

Related

Scanning a language with non-delimited strings with nested tokens

I want to create a lexer/parser for a language that has non-delimited strings.
Which part of the language is a string is defined by the command preceding it.
For example it has statements that look like this:
pause 5
alert Hello world[CRLF] this contains 'pause' once (1)
Alert in this instance can end with any string, including keywords and numbers.
Further complicating things, the text can contain tags like [CRLF] that I want to separate too.
Ideally I'd want this to be broken up into:
[PAUSE][INT 5]
[ALERT][STR "Hello world"][CRLF][STR " this contains 'pause' once (1)"]
I'm currently using flex but from what I've gathered this kind of thing isn't possible with flex.
How can I achieve what I want here?
(Since one of your tags is "regex", I'll suggest a non-flex approach.)
From the example, it seems like you could just:
match each line against ^(\w+) (.+) to obtain command and arguments-text, and then
get individual arguments by splitting the arguments-text on (\[\w+\]) (assuming your regex library's split function can return both the splitter-strings and the split-strings).
It's possible your actual situation is more complex and something like flex makes more sense, but I'm not really seeing it so far.
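For illustration, here is a minimal sketch of that two-step approach in Python (Python's re.split keeps the delimiters when the pattern contains a capturing group; the input line is taken from the question):
import re

line = "alert Hello world[CRLF] this contains 'pause' once (1)"

# Step 1: command vs. argument text.
m = re.match(r"(\w+) (.+)", line)
command, args = m.group(1), m.group(2)

# Step 2: split the argument text on tags, keeping the tags themselves.
parts = [p for p in re.split(r"(\[\w+\])", args) if p]
print(command, parts)
# alert ['Hello world', '[CRLF]', " this contains 'pause' once (1)"]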

HTML tokenizer algorithm

I'm trying to write a basic HTML parser which doesn't tolerate errors. I was reading the HTML5 parsing algorithm, but it's just too much information for a simple parser. I was wondering if someone had an idea of the logic for a basic tokenizer which would simply turn a small piece of HTML into a list of significant tokens. I'm more interested in the logic than the code.
std::string html = "<div id='test'> Hello <span>World</span></div>";
Tokenizer t;
t.tokenize(html);
So for the above html, I want to convert it to a list of something like this:
["<","div","id", "=", "test", ">", "Hello", "<", "span", ">", "world", "</", "span", ">", "<", "div", ">"]
I don't have anything for the tokenize method but was wondering if iterating over the html character by character is the best way to build the list..
void Tokenizer::tokenize(const std::string &html){
    std::list<std::string> tokens;
    for(std::size_t i = 0; i < html.length(); i++){
        char c = html[i];
        if(...){
            ...
        }
    }
}
I think what you are looking for is a lexical analyzer. Its goal is to extract all the tokens that are defined in your language, which in this case is HTML. As @IraBaxter said, you can use a lexical tool, like Lex, which is found on Linux or OS X; but you must define the rules and, for this, you need to use regular expressions.
But if you want to know about an algorithm for this issue, you can check the book by Keith D. Cooper & Linda Torczon, chapter 2, Scanners. This chapter talks about automata and how they can be used to create a scanner, in particular a table-driven scanner that gets tokens, like you want.
The idea is that you define a DFA where you have:
A finite set of states in the recognizer, including a start state, accepting states, and an error state.
An alphabet.
A transition function which determines whether a transition is valid, using a table of transitions or, if you don't want to use a table, by coding the automaton directly.
Take some time to study this chapter.
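To make the idea concrete, here is a minimal sketch (not the book's code; the states and character classes are illustrative) of a table-driven scanner in C++ that recognizes identifiers and numbers, and falls back to single-character tokens for everything else:
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

enum State     { START, IDENT, NUMBER, ERROR, NUM_STATES };
enum CharClass { LETTER, DIGIT, OTHER, NUM_CLASSES };

static CharClass classify(char c) {
    if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') return LETTER;
    if (std::isdigit(static_cast<unsigned char>(c))) return DIGIT;
    return OTHER;
}

// transition[state][class] -> next state; ERROR means "the token ended".
static const State transition[NUM_STATES][NUM_CLASSES] = {
    /* START  */ { IDENT, NUMBER, ERROR },
    /* IDENT  */ { IDENT, IDENT,  ERROR },
    /* NUMBER */ { ERROR, NUMBER, ERROR },
    /* ERROR  */ { ERROR, ERROR,  ERROR },
};

// Scans one token starting at pos and advances pos past it.
std::string nextToken(const std::string &input, std::size_t &pos) {
    State state = START;
    std::size_t start = pos;
    while (pos < input.size()) {
        State next = transition[state][classify(input[pos])];
        if (next == ERROR) break;  // current char can't extend the token
        state = next;
        ++pos;
    }
    if (pos == start) ++pos;  // whitespace/punctuation: single-char token
    return input.substr(start, pos - start);
}

int main() {
    std::string input = "<div id='test'>";
    std::size_t pos = 0;
    while (pos < input.size())
        std::cout << "[" << nextToken(input, pos) << "]";
    std::cout << "\n";  // prints [<][div][ ][id][=]['][test]['][>]
}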
The other answers here are great, and you should definitely use a lexical-analyzer-generator like flex for the job. The input to such a generator is a list of rules that identify the different token types. An input file might look like this:
WHITE_SPACE \s*
IDENTIFIER [a-zA-Z0-9_]+
LEFT_ANGLE <
The algorithm that flex uses is essentially:
Find the rule that matches the most text.
If two rules match the same length of text, choose the one that occurs earlier in the list of rules provided.
You could write this algorithm quite easily yourself using regular expressions. However, do remember that this will not be as fast as flex, since flex compiles the regular expressions away into a very fast DFA.
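For instance, here is a rough sketch of that longest-match loop using std::regex (the rule set is made up to fit the HTML example above, and a real generated scanner would be considerably faster):
#include <iostream>
#include <regex>
#include <string>
#include <vector>

struct Rule { std::string name; std::regex pattern; };

int main() {
    // Order matters: an earlier rule wins when match lengths are equal.
    std::vector<Rule> rules = {
        {"WHITE_SPACE", std::regex(R"(\s+)")},
        {"CLOSE_TAG",   std::regex(R"(</)")},
        {"LEFT_ANGLE",  std::regex(R"(<)")},
        {"RIGHT_ANGLE", std::regex(R"(>)")},
        {"EQUALS",      std::regex(R"(=)")},
        {"STRING",      std::regex(R"('[^']*')")},
        {"IDENTIFIER",  std::regex(R"([A-Za-z0-9_]+)")},
    };

    std::string input = "<div id='test'> Hello <span>World</span></div>";
    auto pos = input.cbegin();
    while (pos != input.cend()) {
        const Rule *bestRule = nullptr;
        std::ptrdiff_t bestLen = 0;
        for (const auto &rule : rules) {
            std::smatch m;
            // match_continuous anchors the attempt at the current position.
            if (std::regex_search(pos, input.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous) &&
                m.length(0) > bestLen) {
                bestRule = &rule;
                bestLen = m.length(0);
            }
        }
        if (!bestRule) break;  // nothing matched here: stop or report an error
        if (bestRule->name != "WHITE_SPACE")
            std::cout << bestRule->name << " '"
                      << std::string(pos, pos + bestLen) << "'\n";
        pos += bestLen;
    }
}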

Grails Filter regexs

I am new to Grails and so far I have only been able to use simple filters. I want to use filters in an efficient manner.
(I am using Grails 2.4.3 with JDK 1.6.)
I want to create a filter that allows access to AppName/ and AppName/user/login, and I could not get it right! I wanted to use a regex but I am not getting it right!
I tried this:
loggedInOnly(uri: '/**', uriExclude: "*.css|*.js|*image*|/|/user/login") {
    before = {
        println "### ###### #### #"
    }
}
and I also tried to reverse the regex parameter, but I am having no luck! I searched all of Google but I could not find a single thread that tells me how filter regexes work!
I know I could create an xxxx(controller: '*', action: '*') filter and then use the controllerName and actionName parameters to check! But there has got to be a better way!
My question in a nutshell: how does regex work in filters?
First, take a closer look at the documentation. Notice that uri and uriExclude are Ant paths and not regular expressions. Keeping that in mind, if you look at how Ant paths work you will see they aren't capable of logical ORs.
So, with all of that in mind, it's back to enabling regex and using the find attribute instead.
loggedInOnly(regex: true, find: '(.*\\.css|.*\\.js|.*image.*|\\/|\\/user\\/login)', invert: true) {
    before = {
        ...
    }
}
Notice I have used invert to have this filter apply to anything that doesn't match any of the patterns inside the find. Also, I wrote this off the top of my head, so you may have to spot-check the regular expression in your application (I did check it using the Groovy web console to make sure I didn't really mess up the syntax).
Hope this helps.

PCRE in Haskell - what, where, how?

I've been searching for some documentation or a tutorial on Haskell regular expressions for ages. There's no useful information on the HaskellWiki page. It simply gives the cryptic message:
Documentation
Coming soonish.
There is a brief blog post which I have found fairly helpful; however, it only deals with POSIX regular expressions, not PCRE.
I've been working with POSIX regexes for a few weeks and I'm coming to the conclusion that for my task I need PCRE.
My problem is that I don't know where to start with PCRE in Haskell. I've downloaded regex-pcre-builtin with cabal but I need an example of a simple matching program to help me get going.
Is it possible to implement multi-line matching?
Can I get the matches back in this format: [(MatchOffset,MatchLength)]?
What other formats can I get the matches back in?
Thank you very much for any help!
There's also regex-applicative which I've written.
The idea is that you can assign some meaning to each piece of a regular expression and then compose them, just as you write parsers using Parsec.
Here's an example -- simple URL parsing.
import Text.Regex.Applicative
data Protocol = HTTP | FTP deriving Show
protocol :: RE Char Protocol
protocol = HTTP <$ string "http" <|> FTP <$ string "ftp"
type Host = String
type Location = String
data URL = URL Protocol Host Location deriving Show
host :: RE Char Host
host = many $ psym $ (/= '/')
url :: RE Char URL
url = URL <$> protocol <* string "://" <*> host <* sym '/' <*> many anySym
main = print $ "http://stackoverflow.com/questions" =~ url
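If everything lines up, running this should print Just (URL HTTP "stackoverflow.com" "questions").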
There are two main options when wanting to use PCRE-style regexes in Haskell:
regex-pcre uses the same interface as described in that blog post (and also in Real World Haskell, which I think contains an expanded version of that blog post); it can optionally be extended with pcre-less. regex-pcre-builtin seems to be a pre-release snapshot of this and probably shouldn't be used.
pcre-light is bindings to the PCRE library. It doesn't provide the return types you're after, just all the matchings (if any). However, the pcre-light-extras package provides a MatchResult class, for which you might be able to provide such an instance. This can be enhanced using regexqq which allows you to use quasi-quoting to ensure that your regex pattern type-checks; however, it doesn't work with GHC-7 (and unless someone takes over maintaining it, it won't).
So, assuming that you go with regex-pcre:
According to this answer, yes.
I think so, via the MatchArray type (it returns an array, which you can then get the list out from).
See here for all possible results from a regex.
Well, I wrote much of the wiki page and may have written "Coming soonish". The regex-pcre package was my wrapping of PCRE using the regex-base interface, where regex-base is used as the interface for several very different regular expression engine backends. Don Stewart's pcre-light package does not have this abstraction layer and is thus much smaller.
The blog post on Text.Regex.Posix uses my regex-posix package which is also on top of regex-base. Thus the usage of regex-pcre will be very very similar to that blog post, except for the compile & execution options of PCRE being different.
For configuring regex-pcre the Text.Regex.PCRE.Wrap module has the constants you need. Use makeRegexOptsM from regex-base to specify the options.
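Putting that together, here is a minimal sketch (the pattern and text are illustrative, and it is untested against current package versions) of both the offset/length question and the multi-line question using regex-pcre's regex-base interface:
import Text.Regex.PCRE

main :: IO ()
main = do
  let text = "foo bar\nbaz bar"
  -- All matches as (MatchOffset, MatchLength) pairs:
  print (getAllMatches (text =~ "bar") :: [(MatchOffset, MatchLength)])
  -- Multi-line matching via compile options, using makeRegexOpts:
  let re = makeRegexOpts compMultiline execBlank "^ba." :: Regex
  print (getAllMatches (match re text) :: [(MatchOffset, MatchLength)])
This should print [(4,3),(12,3)] and then [(8,3)], since with compMultiline the ^ anchor matches at the start of each line.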
regexpr is another PCRE-ish lib that's cross-platform and quick to get started with.
I find rex to be quite nice too; its ViewPatterns integration is a nice idea, I think.
It can be verbose, though, but that's partially tied to the regex concept.
parseDate :: String -> LocalTime
parseDate [rex|(?{read -> year}\d+)-(?{read -> month}\d+)-
               (?{read -> day}\d+)\s(?{read -> hour}\d+):(?{read -> mins}\d+):
               (?{read -> sec}\d+)|] =
  LocalTime (fromGregorian year month day) (TimeOfDay hour mins sec)
parseDate v@_ = error $ "invalid date " ++ v
That said, I just discovered regex-applicative, mentioned in one of the other answers, and it may be a better choice: it could be less verbose and more idiomatic. rex, on the other hand, has basically zero learning curve if you already know regular expressions, which can be a plus.

What's the best way to validate a user-entered URL in a Cocoa application?

I am trying to build a homebrew web browser to get more proficient with Cocoa. I need a good way to validate whether the user has entered a valid URL. I have tried some regular expressions, but NSString has some interesting quirks and doesn't like some of the back-quoting that most regular expressions I've seen use.
You could start with the + (id)URLWithString:(NSString *)URLString method of NSURL, which returns nil if the string is malformed.
If you need further validation, you can use the baseURL, host, parameterString, path, etc methods to give you particular components of the URL, which you can then evaluate in whatever way you see fit.
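For example, a minimal sketch (urlString is just a stand-in for whatever the user typed, and the extra scheme and host checks are one possible way to evaluate the components):
NSURL *url = [NSURL URLWithString:urlString];
// URLWithString: returns nil if the string is malformed.
if (url == nil || [url scheme] == nil || [url host] == nil) {
    // Not a loadable URL; ask the user to re-enter it.
}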
I've found that it is possible to enter some URLs that seem to be OK but are rejected by the NSURL creation methods. So we have a method to escape the string first to make sure it's in a good format. Here is the meat of it:
NSString *escapedURLString =
    NSMakeCollectable(CFURLCreateStringByAddingPercentEscapes(NULL,
        (CFStringRef)URLString,
        (CFStringRef)@"%+#", // Characters to leave unescaped
        NULL,
        kCFStringEncodingUTF8));