Tree for 'unelectability' - linguistics

How would I go about making a derivational tree for this word? I am at a loss: I can't find the rules for making trees online, and it is very confusing. I worked out that the stem is the verb elect, then un-elect, then un-elect-able, and finally the full word un-elect-abl-ity. Is this the correct structure?

The correct assembly would be:
elect+ABLE => electable (walk+ABLE => walkable)
UN+electable => unelectable (UN+common => uncommon)
unelectable+ITY => unelectability (acid+ITY => acidity)
Every step is independently motivated by a simple derivation (shown in parentheses). UN+elect would mean something like "impeach", so your assembly would give a word meaning "impeachability".
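The assembly above can also be written as a labelled bracketing, which is the linear form of the tree. The following is only an illustrative C++ sketch: the `Node` type, the helper names, and the category labels (V, Adj, N, and Af for an affix) are assumptions chosen for the example, not a standard notation taken from the answer.

```cpp
#include <string>
#include <vector>

// One node of a derivational tree. Leaves carry a morph; branches carry children.
struct Node {
    std::string category;        // "V", "Adj", "N", or "Af" for an affix (labels are assumptions)
    std::string morph;           // filled in for leaves only
    std::vector<Node> children;  // empty for leaves
};

Node leaf(const std::string& category, const std::string& morph) {
    return Node{category, morph, {}};
}

Node branch(const std::string& category, const Node& left, const Node& right) {
    return Node{category, "", {left, right}};
}

// Render the tree as a labelled bracketing, e.g. [Adj [V elect] [Af -able]].
std::string bracket(const Node& n) {
    if (n.children.empty()) return "[" + n.category + " " + n.morph + "]";
    std::string out = "[" + n.category;
    for (const Node& child : n.children) out += " " + bracket(child);
    return out + "]";
}

// Build the word in the order given above: elect+ABLE, then UN+, then +ITY.
Node unelectability() {
    const Node electable = branch("Adj", leaf("V", "elect"), leaf("Af", "-able"));
    const Node unelectable = branch("Adj", leaf("Af", "un-"), electable);
    return branch("N", unelectable, leaf("Af", "-ity"));
}
```

Calling `bracket(unelectability())` yields `[N [Adj [Af un-] [Adj [V elect] [Af -able]]] [Af -ity]]`, which shows un- attaching to the adjective electable rather than to the verb elect.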


Any way to speed up instaparse?

I'm trying to use instaparse on a dimacs file less than 700k in size, with the following grammar
<file>=<comment*> <problem?> clause+
comment=#'c.*'
problem=#'p\s+cnf\s+\d+\s+\d+\s*'
clause=literal* <'0'>
<literal>=#'[1-9]\d*'|#'-\d+'
calling like so
(def parser
(insta/parser (clojure.java.io/resource "dimacs.bnf") :auto-whitespace :standard))
...
(time (parser (slurp filename)))
and it's taking about a hundred seconds. That's three orders of magnitude slower than I was hoping for. Is there some way to speed it up, some way to tweak the grammar or some option I'm missing?
The grammar is wrong. It can't be satisfied.
Every file ends with a clause.
Every clause ends with a '0'.
The literal in the clause, being a greedy reg-exp, will eat the final '0'.
Conclusion: No clause will ever be found.
For example ...
=> (parser "60")
Parse error at line 1, column 3:
60
^
Expected one of:
"0"
#"\s+"
#"-\d+"
#"[1-9]\d*"
We can parse a literal
=> (parser "60" :start :literal)
("60")
... but not a clause
=> (parser "60" :start :clause)
Parse error at line 1, column 3:
60
^
Expected one of:
"0" (followed by end-of-string)
#"\s+"
#"-\d+"
#"[1-9]\d*"
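The greedy behaviour can be reproduced outside Instaparse. The following is a minimal C++ sketch, not Instaparse itself: `consumed_by_literal` is a hypothetical helper that applies the grammar's positive-literal pattern at the start of the input and reports how much of it the pattern takes. It assumes, as the answer does, that the parser accepts the regex's single greedy match and never retries with a shorter one.

```cpp
#include <regex>
#include <string>

// Returns the prefix of `input` consumed by the pattern [1-9]\d* when it is
// applied at position 0, or an empty string if it does not match there.
std::string consumed_by_literal(const std::string& input) {
    static const std::regex literal(R"([1-9]\d*)");
    std::smatch m;
    // match_continuous forces the match to start at the first character.
    if (std::regex_search(input, m, literal,
                          std::regex_constants::match_continuous)) {
        return m.str(0);
    }
    return "";
}
```

For the input "60" the helper returns "60", so nothing is left for the '0' terminator. For "6 0" it returns "6", because the space stops the digit run.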
Why is it so slow?
If there is a comment:
it can swallow the whole file;
or be broken at any 'c' character into successive comments;
or terminate at any point after the initial 'c'.
This implies that every tail has to be presented to the rest of the grammar, which includes a reg-exp for literal that Instaparse can't see inside. Hence all have to be tried, and all will ultimately fail. No wonder it's slow.
I suspect that this file is actually divided into lines. And that your problems arise from trying to conflate newlines with other forms of white-space.
May I gently point out that playing with a few tiny examples - which is all I've done - might have saved you a deal of trouble.
I think that your extensive use of * is causing the problem. Your grammar is too ambiguous/ambitious (I guess). I would check two things:
;;run it as
(insta/parses grammar input)
;; with a small input
That will show you how much ambiguity there is in your grammar definition: check "ambiguous grammar".
Read Engelberg's performance notes; they should help you understand your problem and find out what fits best for you.

etterfilter pcre_regex : difficulties with binary strings

This works great:
replace("\x02\x03\x04", "\x05\x06\x07")
but this does not work:
pcre_regex(DATA.data, "\x02\x03\x04", "\x05\x06\x07")
because the \x is not interpreted in the replacement string.
I have tried this:
if ( search(DATA.data, "\x02\x03\x04") )
{
log(DATA.data, "./D")
exec("/bin/sed 's/\x02\x03\x04/\x05\x06\x07/g' ./D > ./E")
drop()
inject("./E")
}
but exec seems to launch the command in the background,
so inject() happens before ./E is written.
=> How are we intended to use pcre_regex with binary strings?
=> Is there another way to use etterfilter?
=> Is there another tool that does the job (binary replacement WITH regex)?
Can you please try this online tool?
https://www.debuggex.com/
This will help you find out whether the bug is in ettercap, in pcre, or in your regex :)
The answer is here:
https://github.com/Ettercap/ettercap/issues/488

How to click a link in only a single class

I have elements that can be in one of two states: class="icon" or class="icon active".
I thought that $browser.element(:class => /^icon$/).click would click the first button that isn't active but it just clicks the first one it finds regardless of whether or not it also contains "active."
Is the regex wrong? Or better yet, is there a non-regex way of doing it?
As mentioned in the comments, the regex you used should work in watir-webdriver. However if you need a solution that will work in both watir-classic and watir-webdriver, you will need to use find.
b.elements.find{ |e| e.class_name == 'icon'}.click
This only matches elements whose 'class' attribute is exactly 'icon'.
It is slower and less readable, but allows you to bypass watir-classic's method for matching classes. As seen below, watir-classic will check that the regex matches any of the element's classes.
def match_class? element, what
classes = element.class_name.split(/\s+/)
classes.any? {|clazz| what.matches(clazz)}
end
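The difference between matching the whole attribute and matching each class separately can be illustrated without a browser. This is a sketch in C++ rather than Ruby, and it is not watir code: both function names are hypothetical, and the second one only mirrors the split-then-any logic quoted above.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Whole-attribute check: the pattern is applied to the full class string.
bool matches_whole_attribute(const std::string& class_attr,
                             const std::regex& what) {
    return std::regex_search(class_attr, what);
}

// Per-class check: split on whitespace and succeed if any single class matches.
bool matches_any_class(const std::string& class_attr, const std::regex& what) {
    std::istringstream tokens(class_attr);
    std::string clazz;
    while (tokens >> clazz) {
        if (std::regex_search(clazz, what)) return true;
    }
    return false;
}
```

With the pattern `^icon$`, the whole-attribute check rejects "icon active", while the per-class check accepts it because the single class "icon" matches on its own. That is consistent with the behaviour described in the question.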
This is theoretical, and I apologize for not having the time to construct a fake page and test whether it works:
browser.element(:class => /icon(?!active)$/).click
In theory, the regex matches a line like icon but not icon active. However, there may be some under-the-hood magic in how class names are matched that causes it to return the wrong element.
If that does not work, let me know and I'll suggest an alternative approach which, while less elegant, ought to work.
For reference, I used the Rubular online regex tester along with the SO answer "Regular expression to match a line that doesn't contain a word?" to come up with that.
Failing the ability to use a regex, another option would be to get a collection of matching items, and then inspect them more closely, clicking when you find one that works and abandoning the collection at that point.
browser.elements(:class => "icon").each do |possible|
unless possible.attribute_value("class").include? "active"
possible.click
break
end
end
I'm not always a big fan of unless, but in this case it results in readable code, so I used it.
For troubleshooting, let's see what is being shown for the class info on the elements in that collection:
browser.elements(:class => "icon").each do |possible|
puts possible.attribute_value("class")
end

Regex vs. string:find() for simple word boundary

Say I only need to find out whether a line read from a file contains a word from a finite set of words.
One way of doing this is to use a regex like this:
.*\y(good|better|best)\y.*
Another way of accomplishing this is with pseudocode like this:
if ( (readLine.find("good") != string::npos) ||
(readLine.find("better") != string::npos) ||
(readLine.find("best") != string::npos) )
{
// line contains a word from a finite set of words.
}
Which way will have better performance? (i.e. speed and CPU utilization)
The regexp will perform better, but get rid of those '.*' parts. They complicate the code and don't serve any purpose. A regexp like this:
\y(good|better|best)\y
will search through the string in a single pass. The algorithm it builds from this regexp will look first for \y, then character 1 (g|b), then character 2 (g => go or b => be), character 3 (go => goo or be => bes|bet), character 4 (go => good or bes => best or bet => bett), etc. Without building your own state machine, this is as fast as it gets.
You won't know which is faster until you've measured, but the issues at stake are:
The regex implementation, esp. whether it needs to precompile (like Google RE2, POSIX regexes).
The implementation of string::find.
The length of the string you're searching in.
How many strings you're searching in.
My bets are on the regex, but again: you've got to measure to be sure.
Obviously not the second one (using 'find'), since you're running three comparisons (need to traverse the string at least 3 times) instead of one hopefully smart one. If the regex engine works at all like it should (and I suppose it does) then it will probably be at least three times faster.
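Before timing anything, it is worth checking that the two approaches answer the same question. The sketch below uses C++ `std::regex`, where the word-boundary assertion is written `\b` rather than `\y`; the function names are hypothetical. It makes no claim about which version is faster, which still has to be measured.

```cpp
#include <regex>
#include <string>

// Regex version: the words must appear as whole words.
bool contains_word_regex(const std::string& line) {
    // Compiled once and reused, so construction cost is not paid per line.
    static const std::regex words(R"(\b(good|better|best)\b)");
    return std::regex_search(line, words);
}

// find() version from the question: a plain substring search, no boundaries.
bool contains_substring_find(const std::string& line) {
    return line.find("good") != std::string::npos ||
           line.find("better") != std::string::npos ||
           line.find("best") != std::string::npos;
}
```

The two functions disagree on a line such as "goodness me": the substring search reports a hit, the regex does not. A fair benchmark would need boundary checks added to the find() version first.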

Fast string matching algorithm with simple wildcards support

I need to match input strings (URLs) against a large set (anywhere from 1k-250k) of string rules with simple wildcard support.
Requirements for wildcard support are as follows:
Wildcard (*) can only substitute a "part" of a URL. That is fragments of a domain, path, and parameters. For example, "*.part.part/*/part?part=part&part=*". The only exception to this rule is in the path area where "/*" should match anything after the slash.
Examples:
*.site.com/* -- should match sub.site.com/home.html, sub2.site.com/path/home.html
sub.site.*/path/* -- should match sub.site.com/path/home.html, sub.site.net/path/home.html, but not sub.site.com/home.html
Additional requirements:
Fast lookup (I realize "fast" is a relative term. Given the max of 250k rules, a lookup should still finish in under 1.5 s if possible.)
Work within the scope of a modern desktop (e.g. not a server implementation)
Ability to return 0:n matches given an input string
Matches will have rule data attached to them
What is the best system/algorithm for such a task? I will be developing the solution in C++ with the rules themselves stored in a SQLite database.
First of all, one of the worst-performing searches you can do is one with a wildcard at both ends of the string, such as "*.domain.com/path*", and I think you're going to hit this case a lot. So my first recommendation is to reverse the order of the domain labels as they're stored in your DB: com.domain.example/path1/path2/page.html. That will allow you to keep things much more tidy and only use wildcards in "one direction" on the string, which will provide MUCH faster lookups.
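Reversing the domain labels could be sketched as follows. The function name `reverse_host_labels` is hypothetical, and the sketch assumes the input has no scheme prefix (no "http://"), as in the examples in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Reverse the dot-separated labels of the host part; the path is left as-is.
std::string reverse_host_labels(const std::string& url) {
    const std::size_t slash = url.find('/');
    const std::string host = url.substr(0, slash);
    const std::string rest =
        (slash == std::string::npos) ? std::string() : url.substr(slash);

    // Split the host on '.' and reverse the order of the pieces.
    std::vector<std::string> labels;
    std::istringstream in(host);
    std::string label;
    while (std::getline(in, label, '.')) labels.push_back(label);
    std::reverse(labels.begin(), labels.end());

    // Join the labels back together and re-attach the path.
    std::string out;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (i > 0) out += '.';
        out += labels[i];
    }
    return out + rest;
}
```

Applied to the rule "*.site.com/*" this gives "com.site.*/*", so the domain wildcard ends up at the end of the host instead of at the start of the string.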
I think John mentions some good points about how to do this all within your DB. If that doesn't work I would use a regex library in C++ against the list. I bet you'll get the best performance and most general regex syntax that way.
If I'm not mistaken, you can take a string rule and break it up into domain, path, and query pieces, just as if it were a URL. Then you can apply a standard wildcard matching algorithm to each of those pieces against the corresponding pieces of the URLs you want to test. If all of the pieces match, the rule is a match.
Example
Rule: *.site.com/*
domain => *.site.com
path => /*
query => [empty]
URL: sub.site.com/path/home.html
domain => sub.site.com
path => /path/home.html
query => [empty]
Matching process:
domain => *.site.com matches sub.site.com? YES
path => /* matches /path/home.html? YES
query => [empty] matches [empty] YES
Result: MATCH
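The matching process above might look like this in C++. It is a sketch under stated assumptions: the names `UrlParts`, `split_url`, `wildcard_match`, and `rule_matches` are hypothetical, URLs carry no scheme prefix, and `*` is allowed to match any run of characters within a piece, which is looser than the "one part only" rule in the question.

```cpp
#include <cstddef>
#include <string>

struct UrlParts {
    std::string domain;
    std::string path;
    std::string query;
};

// Split "domain/path?query" into its three pieces; missing pieces stay empty.
UrlParts split_url(const std::string& s) {
    UrlParts parts;
    const std::size_t slash = s.find('/');
    parts.domain = s.substr(0, slash);
    if (slash == std::string::npos) return parts;
    const std::size_t qmark = s.find('?', slash);
    if (qmark == std::string::npos) {
        parts.path = s.substr(slash);
    } else {
        parts.path = s.substr(slash, qmark - slash);
        parts.query = s.substr(qmark + 1);
    }
    return parts;
}

// Iterative glob match: '*' matches any run of characters, including none.
bool wildcard_match(const std::string& pattern, const std::string& text) {
    const std::size_t none = std::string::npos;
    std::size_t p = 0, t = 0, star = none, mark = 0;
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;   // remember the star and first try matching nothing
            mark = t;
        } else if (p < pattern.size() && pattern[p] == text[t]) {
            ++p;
            ++t;
        } else if (star != none) {
            p = star + 1; // backtrack: let the last star absorb one more char
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// A rule matches when all three pieces match their counterparts.
bool rule_matches(const std::string& rule, const std::string& url) {
    const UrlParts r = split_url(rule);
    const UrlParts u = split_url(url);
    return wildcard_match(r.domain, u.domain) &&
           wildcard_match(r.path, u.path) &&
           wildcard_match(r.query, u.query);
}
```

With these definitions, the rule "sub.site.*/path/*" matches "sub.site.net/path/home.html" but not "sub.site.com/home.html", in line with the examples in the question. Running this against 250k rules is a linear scan, so some pre-filtering of the rule set would probably still be needed.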
As you are storing the rules in a database I would store them already broken into those three pieces. And if you want uber-speed you could convert the *'s to %'s and then use the database's native LIKE operation to do the matching for you. Then you'd just have a query like
SELECT *
FROM ruleTable
WHERE #urlDomain LIKE ruleDomain
AND #urlPath LIKE rulePath
AND #urlQuery LIKE ruleQuery
where #urlDomain, #urlPath, and #urlQuery are variables in a prepared statement. The query would return the rules that match a URL, or an empty result set if nothing matches.
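One detail to watch when converting rules for LIKE: `%` and `_` are both LIKE wildcards, and `_` is common in URLs. A possible conversion, sketched in C++ with the hypothetical name `to_like_pattern`, escapes those characters with a backslash. It assumes each LIKE comparison in the query is written with an `ESCAPE '\'` clause, which SQLite supports.

```cpp
#include <string>

// Convert a rule piece that uses '*' into a LIKE pattern.
// '%', '_' and '\' are escaped so they are treated as literal characters.
std::string to_like_pattern(const std::string& rule_piece) {
    std::string out;
    out.reserve(rule_piece.size());
    for (char c : rule_piece) {
        switch (c) {
            case '*':
                out += '%';   // rule wildcard becomes the LIKE wildcard
                break;
            case '%':
            case '_':
            case '\\':
                out += '\\';  // escape character, assumes ESCAPE '\' in the query
                out += c;
                break;
            default:
                out += c;
        }
    }
    return out;
}
```

For example, the rule piece "/my_path/*" becomes `/my\_path/%`. Note also that SQLite's LIKE is case-insensitive for ASCII characters by default, which may or may not be what the rules intend for paths.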