Prolog - Finding alternative words (synonyms)

I'm trying to figure out how I would go about creating a synonym finder in Prolog.
I have some words here...
word(likes).
word(house).
word(chair).
If the input was likes, I would want to output a synonym such as 'loves'. Or for house I would want to output 'home'.
I want to do this with the synonym predicate instead of adding the alternative words as a new word().
I've got as far as doing this:
synonym(house,[home]).
I'm not sure where to go from here.

If you're willing to enumerate your cases manually I would consider having a predicate that "normalizes" or "simplifies" vocabulary. For instance, something like this:
%% synonym(Synonym, CanonicalTerm) :- Synonym is a synonym for CanonicalTerm
synonym(loves, enjoys).
synonym(likes, enjoys).
synonym(enjoys, enjoys).
Prolog systems usually index on the first argument, so this lookup will be fast (certainly faster than enumerating the whole database and doing a member/2 lookup). You can then perform this step after parsing or on demand, and code your rules around the canonical term.
WordNet probably does not consider love and like to be synonyms, really, so it is probably overkill for your needs.
Let's apply this to the earlier question:
?- phrase(sentence(np(Noun,_), vp(Verb, np(Object, _))),
[a,teenage,boy,loves,a,big,problem]),
synonym(Verb, CanonicalVerb),
present(Suggestion, Noun, CanonicalVerb, Object).
Noun = boy,
Verb = loves,
CanonicalVerb = enjoys,
Object = problem,
Suggestion = 'construction kit'
This of course assumes you update the present/4 fact as well.
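For readers outside Prolog, the same normalization idea can be sketched as a plain lookup table. This is only an illustration in Python, not part of the Prolog program above; the names CANONICAL, canonical and normalize are made up for the example, and it assumes (unlike the Prolog facts, which would simply fail) that an unlisted word maps to itself.

```python
# Hypothetical synonym table mirroring the synonym/2 facts above:
# each key is a surface word, each value is its canonical term.
CANONICAL = {
    "loves": "enjoys",
    "likes": "enjoys",
    "enjoys": "enjoys",
}


def canonical(word):
    """Return the canonical term for word, or word itself if it is not listed."""
    return CANONICAL.get(word, word)


def normalize(tokens):
    """Normalize every token of an already tokenized sentence."""
    return [canonical(token) for token in tokens]
```

The rest of the program would then be written against "enjoys" only, in the same way the Prolog rules are coded around the canonical term.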

Haskell - Why is Alternative implemented for List

I have read some of this post Meaning of Alternative (it's long)
What led me to that post was learning about Alternative in general. The post gives a good answer to why it is implemented the way it is for List.
My question is:
Why is Alternative implemented for List at all?
Is there perhaps an algorithm that uses Alternative and a List might be passed to it so define it to hold generality?
I thought that, because Alternative defines some and many by default, that may be part of it, but "What are some and many useful for" contains the comment:
To clarify, the definitions of some and many for the most basic types such as [] and Maybe just loop. So although the definition of some and many for them is valid, it has no meaning.
In the "What are some and many useful for" link above, Will gives an answer to the OP that may contain the answer to my question, but at this point in my Haskelling, the forest is a bit thick to see the trees.
Thanks

There's something of a convention in the Haskell library ecology that if a thing can be an instance of a class, then it should be an instance of the class. I suspect the honest answer to "why is [] an Alternative?" is "because it can be".
...okay, but why does that convention exist? The short answer there is that instances are sort of the one part of Haskell that succumbs only to whole-program analysis. They are global, and if there are two parts of the program that both try to make a particular class/type pairing, that conflict prevents the program from working right. To deal with that, there's a rule of thumb that any instance you write should live in the same module either as the class it's associated with or as the type it's associated with.
Since instances are expected to live in specific modules, it's polite to define those instances whenever you can -- since it's not really reasonable for another library to try to fix up the fact that you haven't provided the instance.

Alternative is useful when viewing [] as the nondeterminism-monad. In that case, <|> represents a choice between two programs and empty represents "no valid choice". This is the same interpretation as for e.g. parsers.
some and many do indeed not make sense for lists, since they try iterating through all possible lists of elements from the given options greedily, starting from the infinite list of just the first option. The list monad isn't lazy enough to do even that, since it might always need to abort if it was given an empty list. There is however one case when both terminate: when given an empty list.
Prelude Control.Applicative> many []
[[]]
Prelude Control.Applicative> some []
[]
If some and many were defined as lazy (in the regex sense), meaning they prefer short lists, you would get results, but not very useful ones, since it starts by generating the infinitely many lists that contain just the first option:
Prelude Control.Applicative> some' v = liftA2 (:) v (many' v); many' v = pure [] <|> some' v
Prelude Control.Applicative> take 100 . show $ (some' [1,2])
"[[1],[1,1],[1,1,1],[1,1,1,1],[1,1,1,1,1],[1,1,1,1,1,1],[1,1,1,1,1,1,1],[1,1,1,1,1,1,1,1],[1,1,1,1,1,"
Edit: I believe the some and many functions correspond to a star-semiring, while <|> and empty correspond to plus and zero in a semiring. So mathematically (I think), it would make sense to split those operations out into a separate typeclass, but it would also be kind of silly, since they can be implemented in terms of the other operators in Alternative.

Consider a function like this:
fallback :: Alternative f => a -> (a -> f b) -> (a -> f e) -> f (Either e b)
fallback x f g = (Right <$> f x) <|> (Left <$> g x)
Not spectacularly meaningful, but you can imagine it being used in, say, a parser: try one thing, falling back to another if that doesn't work.
Does this function have a meaning when f ~ []? Sure, why not. If you think of a list's "effects" as being a search through some space, this function seems to represent some kind of biased choice, where you prefer the first option to the second, and while you're willing to try either, you also tag which way you went.
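To make the f ~ [] reading concrete, here is a rough Python rendering of what fallback does when the Alternative is the list type, where <|> is list concatenation and empty is the empty list. This is only a sketch of the list case, not a general Alternative; the tuple tags "Right" and "Left" stand in for Haskell's Either constructors and are an assumption of the example.

```python
def fallback(x, f, g):
    """List-specialised version of fallback.

    f and g each return a list of candidate results for x.
    All results of f come first (tagged "Right"), followed by
    all results of g (tagged "Left"), mirroring
    (Right <$> f x) <|> (Left <$> g x) for lists.
    """
    preferred = [("Right", b) for b in f(x)]
    alternative = [("Left", e) for e in g(x)]
    return preferred + alternative


def halves(n):
    """Hypothetical search step: succeeds only for even numbers."""
    return [n // 2] if n % 2 == 0 else []


def neighbours(n):
    """Hypothetical search step: always offers two candidates."""
    return [n - 1, n + 1]
```

With these helpers, fallback(4, halves, neighbours) lists the preferred result first and the tagged alternatives after it, while fallback(3, halves, neighbours) contains only the "Left" candidates because the first search produced nothing.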
Could a function like this be part of some algorithm which is polymorphic in the Alternative it computes in? Again I don't see why not. It doesn't seem unreasonable for [] to have an Alternative instance, since there is an implementation that satisfies the Alternative laws.
As to the answer linked to by Will Ness that you pointed out: it covers that some and many don't "just loop" for lists. They loop for non-empty lists. For empty lists, they immediately return a value. How useful is this? Probably not very, I must admit. But that functionality comes along with (<|>) and empty, which can be useful.

What is the name of the data structure for and-or-lists (or and-or-trees) and where can I read about it?

I recently needed to make a data structure which was a nested list of and/or questions. Since most every interesting thing has been discovered by someone else previously, I'm looking for the name of this data structure. It looks something like this.
'((a b c) (b d e) (c (a b) (f a)))
The interpretation is that I want to find abc or bde or caf or caa or cbf or cba, and the list encapsulates that. At the top level each item is or'ed together, sub-lists of the top level are and'ed together, sub-lists of sub-lists are or'ed again, sub-lists of those are and'ed, and so on ad infinitum. Note that in my example all the lists are the same length; in my real application the lists vary in length.
The code to walk such a “tree” is relatively simple, but I’m assuming that there is a name for that type of tree and there is stuff I can read about it.
These lists are equivalent to fixed length regular expressions (which I've seen referred to as "network expressions"), but I am particularly interested in this data structure and its representation.
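The alternating or/and interpretation described above can be sketched as two mutually recursive functions. This is a minimal Python sketch, assuming the nested Lisp list is written as nested Python lists of one-letter strings; the names expand_or and expand_and are made up for the example.

```python
from itertools import product


def expand_or(node):
    """Expand an OR level: collect the expansions of every child."""
    if isinstance(node, str):
        return [node]
    results = []
    for child in node:
        results.extend(expand_and(child))
    return results


def expand_and(node):
    """Expand an AND level: concatenate one choice from every child."""
    if isinstance(node, str):
        return [node]
    choices = [expand_or(child) for child in node]
    return ["".join(combination) for combination in product(*choices)]


# The example list from the question: OR at the top, AND below, OR below that.
TREE = [["a", "b", "c"], ["b", "d", "e"], ["c", ["a", "b"], ["f", "a"]]]
```

Calling expand_or(TREE) yields abc, bde, caf, caa, cbf, cba, in the same order as listed in the question. Sub-lists of different lengths need no special handling.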

In general (in the very high level of abstraction) it is:
Context free grammar -Wiki
If you allow it to be infinitely nested, then it is not a regular expression, because of the presence of parentheses (left and right must match).
If you consider the expressions inside the parentheses to be ordered, meaning that a and b and c is equivalent to (a and b) and c, then you get: Binary expression tree -Wiki
But for your particular case, it is probably: Disjunctive normal form -Wiki
I am not sure, but my intuition says that it is regular expression again because you have only 2 levels of nesting (1st - for 'or-ed' and 2nd - for 'and-ed' parts)

The trees are also a subset of DAWGS - directed acyclic word graphs and one could construct them the same way.
In my case, I have a very small set that I have built by hand and I don't worry about getting the minimal set, but instead just want something that I can easily write down but deals with the types of simple variations I see. Basically, I have different ways of finding where I keep my .el files based upon the different directory structures of various OSes I use. (E.g. when I was working at Google, the /usr/local/emacs/site-lisp directory was actually more like /usr/local/Google/emacs/site-lisp.)
I don't need a full regex, but there are about a dozen variations, some having quite long lists of nested sub-directories (c:\users\cfclark\appData\roaming\emacs.emacs.d or some other awful thing) that I wanted to write down (and then have emacs make an automated search to find the one that is appropriate to this particular installation). And every time I go to a new job, I can simply add to the list a description of where they are in that setup.
Anyway, as that code has evolved, I found that I was doing nested or's and and's, and realized that the structure generalized to the alternating or/and/or/and/... case. So, my assumption is that someone must have discovered this before. I had hints of it myself several years ago, but didn't sit down to implement it. The Disjunctive Normal Form link mpasko256 gave is also particularly relevant. I don't normalize to that level, I still keep nested and's and or's rather than flattening to 2, but I do have a distinct structure: or's at the top, then and's, then or's....

Proper flow control in Prolog without using the non-declarative if-then-else syntax

I would like to check for an arbitrary fact and do something if it is in the knowledge base and something else if it is not, but without the ( I -> T ; E ) syntax.
I have some facts in my knowledge base:
unexplored(1,1).
unexplored(2,1).
safe(1,1).
given an incomplete rule
foo:- safe(A,B),
% do something if unexplored(A,B) is in the knowledge base
% do something else if unexplored(A,B) is not in the knowledge base
What is the correct way to handle this, without doing it like this?
foo:-
safe(A,B),
( unexplored(A,B) -> something ; something_else ).

Not an answer but too long for a comment.
"Flow control" is by definition not declarative. Changing the predicate database (the defined rules and facts) at run time is also not declarative: it introduces state to your program.
You should really consider very carefully if your "data" belongs to the database, or if you can keep it a data structure. But your question doesn't provide enough detail to be able to suggest anything.
You can however see this example of finding paths through a maze. In this solution, the database contains information about the problem that does not change. The search itself uses the simplest data structure, a list. The "flow control" if you want to call it this is implicit: it is just a side effect of Prolog looking for a proof. More importantly, you can argue about the program and what it does without taking into consideration the exact control flow (but you do take into consideration Prolog's resolution strategy).

The fundamental problem with this requirement is that it is non-monotonic:
Things that hold without this fact may suddenly fail to hold after adding such a fact.
This inherently runs counter to the important and desirable declarative property of monotonicity.
Declaratively, from adding facts, we expect to obtain at most an increase, never a decrease of the things that hold.
For this reason, your requirement is inherently linked to non-monotonic constructs like if-then-else, !/0 and setof/3.
A declarative way to reason about this is to entirely avoid checking properties of the knowledge base. Instead, focus on a clear description of the things that hold, using Prolog clauses to encode the knowledge.
In your case, it looks like you need to reason about states of some search problem. A declarative way to solve such tasks is to represent the state as a Prolog term, and write pure monotonic rules involving the state.
For example, let us say that a state S0 is related to state S if we explore a certain position Pos that was previously not explored:
state0_state(S0, S) :-
select(Pos-unexplored, S0, S1),
S = [Pos-explored|S1].
or shorter:
state0_state(S0, [Pos-explored|S1]) :-
select(Pos-unexplored, S0, S1).
I leave figuring out the state representation I am using here as an easy exercise. Notice the convenient naming convention of using S0, S1, ..., S to chain the different states.
This way, you encode explicit relations about Prolog terms that represent the state. Pure, monotonic, and works in all directions.
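The idea of carrying the state as an explicit value, rather than asserting and retracting facts, is not specific to Prolog. As a rough illustration only, here is a Python sketch of the same relation, assuming the state is a tuple of (position, status) pairs; that representation and the name successors are my assumptions, not something given above. Unlike the Prolog relation, this function only runs in one direction.

```python
def successors(state):
    """Yield every state reachable by exploring one unexplored position.

    state is a tuple of (position, status) pairs. The input is never
    modified; each successor is a new tuple, with the newly explored
    position moved to the front, as in [Pos-explored|S1].
    """
    for index, (position, status) in enumerate(state):
        if status == "unexplored":
            rest = state[:index] + state[index + 1:]
            yield ((position, "explored"),) + rest


# Hypothetical starting state built from the unexplored/2 facts in the question.
START = (((1, 1), "unexplored"), ((2, 1), "unexplored"))
```

Because the old state is left untouched, a search can keep several candidate states around at once, which is what Prolog's backtracking over select/3 gives you for free.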

Is there a fairly simple way for a script to tell (from context) whether "her" is a possessive pronoun?

I am writing a script to reverse all genders in a piece of text, so all gendered words are swapped - "man" is swapped with "woman", "she" is swapped with "he", etc. But there is an ambiguity as to whether "her" should be replaced with "him" or "his".

Okay. Let's look at this like a linguist might. I am thinking aloud here.
"Her" is a pronoun. It can either be a:
1. possessive pronoun
This is her book.
2. personal pronoun
Give it to her. (after preposition)
He wrote her a letter. (indirect object)
He treated her for a cold. (direct object)
So let's look at case (1), possessive pronoun. That is, it is a pronoun which is in the "genitive" case (meaning it is a noun which is being "possessive". Okay, that detail isn't quite as important as the next one.)
In this case, "her" is acting as a "determiner". Determiners may occur in two places in a sentence (this is a simplification):
Det + Noun ("her book")
Det + Adj + Noun ("her nice book")
So to figure out if her is a determiner, you could have this logic:
a. If the word following "her" is a noun, then "her" is a determiner.
b. If the 2 words following "her" are an adjective and then a noun, then "her" is a determiner.
And if you establish that "her" is a determiner, then you know that you must replace it with "his", which is also a determiner (aka genitive noun, aka possessive pronoun).
If it doesn't match criteria (a) and (b) above, then you could possibly conclude that it is not a determiner, which means it must be a personal pronoun. In that case, you would replace "her" with "him".
You wouldn't even have to do the tests below, but I'll try to describe them anyway.
Looking at (2) from above: personal pronoun, rather than possessive. This gets trickier.
The examples above show "her" occurring in 3 ways:
(1) Give it to her. (after preposition. we call this the "object of a preposition".)
So you could maybe devise a rule: "If 'her' occurs immediately after a preposition, then it should be treated as a noun, so we would replace it with 'him'".
The next two are tricky. "her" can either be a direct object or an indirect object.
(2) He wrote her a letter. (indirect object)
(3) He treated her for a cold. (direct object)
Syntactically, how can we tell the difference?
A direct object occurs immediately after a verb.
If you have a verb, followed by a noun, then that noun is a direct object. eg:
He treated her.*
If you have a verb, followed by a noun, followed by a prepositional phrase, then the noun is a direct object.
He treated her for a cold. ("her" is a noun, and it comes immediately after the verb "treated". "for a cold" is a prepositional phrase.)
Which means that you could say "If you have Verb + Noun + Prep" then the noun is a direct object. Since the noun is a direct object, then it is a personal pronoun, so use "him". (note, you only have to check for a preposition, not the entire prep phrase, since the phrase will always begin with a preposition.)
If it is an indirect object, then you'll have the form "verb + noun + noun".
He wrote her a letter. ("her" is a noun, "letter" is a noun. well, "a letter" is a "noun phrase", so you'd have to account for determiners as well.)
So... if "her" is a direct object, indirect object, or obj of prep, you could change it to "him", otherwise, change it to "his".
This method seems a lot more complicated - so I'd just start by checking to see if "her" is a determiner (see above), and if it is a determiner, use "his" otherwise, just use "him".
So, the above has a lot of simplifications. It doesn't cover "interrupting phrases", or clause structures, or constituency tests, or embedded clauses, or punctuation, or anything like that.
Also, this solution requires a dictionary - a list of "nouns" and "verbs" and "prepositions" so that you can determine the lexical category of each word in the sentence.
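As a rough illustration of the simple determiner test (rules a and b above), here is a minimal Python sketch. The tiny NOUNS and ADJECTIVES sets are placeholders standing in for the real dictionary mentioned above, the function name replace_her is made up, and the input is assumed to be already split into lowercase-comparable word tokens without punctuation.

```python
# Placeholder lexicon; a real script would need a full dictionary
# or a part-of-speech tagger instead of these hand-picked words.
NOUNS = {"book", "letter", "cold"}
ADJECTIVES = {"nice"}


def replace_her(tokens):
    """Replace each "her" with "his" if it looks like a determiner, else "him"."""
    result = []
    for index, token in enumerate(tokens):
        if token.lower() != "her":
            result.append(token)
            continue
        following = [t.lower() for t in tokens[index + 1:index + 3]]
        first = following[0] if len(following) > 0 else ""
        second = following[1] if len(following) > 1 else ""
        # Rule a: Det + Noun. Rule b: Det + Adj + Noun.
        is_determiner = first in NOUNS or (first in ADJECTIVES and second in NOUNS)
        result.append("his" if is_determiner else "him")
    return result
```

For instance, replace_her("This is her book".split()) turns "her" into "his", while "He wrote her a letter" gets "him". Capitalisation and punctuation are ignored here, and any word missing from the lexicon silently falls through to "him".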
And even there, man, natural language processing is hard. You'd want to do some sort of "training" for your model to have a good solution. BUT for very simple things, try some of the stuff described above.
Sorry for being so verbose! (None of the existing answers gave any hard data, or precise linguistic definitions, so here goes.)

Given the scope of your project (reversing all gender-related words), it appears that:
The "investment" in a more fundamental approach would be justified
No heuristic based on simple lookup/substitution will adequately serve all or even most cases.
Furthermore, Regex too seems a poor choice of tool; natural language is just not a regular language ;-).
Instead, you should consider introducing Part-of-Speech (POS) tagging, possibly with a hint of Named Entity Recognition, and then apply substitution rules based on the extra info the tagging supplies.
This may seem like a lot of work, but if for example your scripting language happens to be Python, you can leverage NLTK to implement all this with relatively small effort.

G'day,
This is one of those cases where you could invest an inordinate amount of time tracking down the automatic solution and finish up with a result that you're going to have to check through anyway.
I'd suggest making your script insert a piece of text that will really stand out at every instance of "her" and would be easily searchable. Maybe even make the script insert both "him" and "his" strings so that you only need to delete one of them after you've seen the context?
You're going to save a lot of time and effort this way. Not to mention blood, sweat and tears even! (-:
Coming up with a fully automatic solution is no mean feat as it will involve scanning a massive corpus of words to determine if the following word is an object.
Sometimes gaining that extra 5 or 10 percent improvement is just not worth the extra effort involved. Except of course as an "it is left as an interesting exercise for the reader..." type problem that some text books seem to love.
Edit: I forgot to mention that finding this "tipping point" is a true art. Definitely one skill that only comes with experience. (-:
Edit: Part II - The Revenge. I also forgot to mention that you can eliminate one edge case, though. If the word "her" is followed by punctuation, e.g. "... to her.", "... for her," etc., then you can eliminate the uncertainty for those cases and just replace them with "him". Similarly, if the word is followed by a certain class of words, e.g. "... for her to", the "her" can easily be replaced with "him".
Edit 3: This is not a full list of exceptions; it is merely intended as a suggestion for a starting point of the list of items you'll need to look for.
HTH

Trying to determine whether her is a possessive or personal pronoun is harder than trying to determine the class of him or his. However, you would expect both to be used in the same contexts given a large enough corpus. So why not reverse the problem? Take a large corpus and find all occurrences of him and his. Then look at the words surrounding them (just how many words you need to look at is left up to you). With enough training examples, you can estimate the probability that a given set of words in the vicinity of the word indicates him or his. Then you can use those probability estimates on an occurrence of her to determine whether you should be using him or his. As other responses have indicated, you're not going to be perfect. Also, figuring out how big of a neighborhood to use and how to calculate the probabilities is a fair bit of work. You could probably do fairly well using a simple classifier like Naive Bayes.
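To give a feel for the "reverse the problem" idea, here is a deliberately tiny Python sketch of a Naive Bayes style classifier that looks only at the single word following him or his. Everything concrete here is an assumption for illustration: the six toy sentences are invented and far too few to be meaningful, the one-word neighbourhood is the smallest possible choice, and the function names train and classify are made up.

```python
import math
from collections import Counter

# Invented toy corpus; a real run would use a large body of text.
TOY_CORPUS = [
    "I gave him a book",
    "she saw him yesterday",
    "talk to him",
    "his book is old",
    "I like his car",
    "his old car broke",
]


def train(sentences):
    """Count which word follows each occurrence of "him" and "his"."""
    counts = {"him": Counter(), "his": Counter()}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for index, token in enumerate(tokens):
            if token in counts:
                has_next = index + 1 < len(tokens)
                counts[token][tokens[index + 1] if has_next else "<end>"] += 1
    return counts


def classify(counts, next_word):
    """Pick "him" or "his" for a "her" that is followed by next_word."""
    vocabulary = set(counts["him"]) | set(counts["his"]) | {next_word}
    total = sum(sum(counter.values()) for counter in counts.values())
    scores = {}
    for label, counter in counts.items():
        seen = sum(counter.values())
        prior = (seen + 1) / (total + 2)  # add-one smoothing
        likelihood = (counter[next_word] + 1) / (seen + len(vocabulary))
        scores[label] = math.log(prior) + math.log(likelihood)
    return max(scores, key=scores.get)
```

After counts = train(TOY_CORPUS), a "her" followed by "book" is classified as "his", and a sentence-final "her" (next word "<end>") as "him". Widening the neighbourhood means multiplying in one likelihood term per context word.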
I suspect, though, you can get a decent bit of accuracy just by looking at patterns in parts of speech and writing some rules. Naturally, you'll miss some, but probably a dozen rules or so will account for the majority of occurrences. I just glanced through about fifty occurrences of her in "The Phantom Rickshaw" by Rudyard Kipling and you can easily get 90% accuracy just by the rule:
her_followed_by_noun ? possessive : personal
You can use an off-the-shelf part-of-speech (POS) tagger like the Stanford POS Tagger to automatically determine whether a word is a noun or something else in context. Again, it's not perfect, but it does pretty well.
Edge cases with odd clause structures are hard to get right, but they also occur fairly rarely in most text. It just depends on your data.

I don't think so. You could check whether "her" is followed by a noun or an adjective and thereby conclude that it is indeed a possessive pronoun. But of course you would have to write a script that is able to do this, and even if you had a method it would still be wrong in some other cases. A simple pattern matching algorithm won't help you here.
Good luck with analysing this: http://en.wikipedia.org/wiki/X-bar_theory

Definitely no. You would have to do syntactic analysis on your input text (parsing the English language, really, that's where the word "to parse" comes from). That's the only way you can determine with certainty what the "her" in your text stands for; you can't rely on search-and-replace. There are many ways to do that, but none would qualify as "fairly simple", I think.

I will address regex, since that is one of the tags. Regular expressions are insufficiently powerful for parsing human language, because regex does not do recursion, and all human languages are recursive.
When this fact is combined with the other ambiguities in English, such as the way many words can serve multiple functions in a sentence, I think that a reliable automated solution will be a very difficult and costly project.

About the only one I can think of (and I'm sure someone in the comments will prove me wrong!) is that any instance of her followed by punctuation can most probably be replaced with him. But I still agree with the previous answers that you're probably best off doing a manual replace.

OK, based on some of the answers people gave I've got a better idea of how to approach this. Instead of trying to write a script that gets this right 100% of the time I'll just aim to get it right as often as possible. A quick search through some English-language texts shows that "his" appears (very roughly) twice as often as "him", so the default behaviour should be to convert "her" to "his". If I did this and nothing else it should be right about two thirds of the time.
Now I'm not interested in finding patterns that would show "her" should be converted to "his", since this is what I would do anyway. I'm only interested in finding patterns that would show "her" should be converted to "him", since these would allow me to lower the error rate. There are two rules I can implement fairly painlessly:
If "her" is followed immediately by a comma or period, it should be converted to "him", as Michael Itzoe said.
If "her" occurs immediately after a preposition, it should be treated as a noun and converted to "him", as Rasher said.
And I'll be able to do more than that if I use Part of Speech tagging software. I think I'll get on with doing the easy stuff first :-)
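The plan above (default to "his", switch to "him" on the two easy rules) could look roughly like this in Python. This is a sketch under assumptions: the text is already split into tokens with commas and periods as separate tokens, the PREPOSITIONS set is a short hand-picked sample rather than a complete list, and the name convert_her is made up.

```python
# Hand-picked sample of prepositions; a real script would need a fuller list.
PREPOSITIONS = {"to", "for", "with", "from", "at", "by", "about"}
PUNCTUATION = {",", "."}


def convert_her(tokens):
    """Convert "her" to "his" by default, or to "him" when a rule fires."""
    result = []
    for index, token in enumerate(tokens):
        if token.lower() != "her":
            result.append(token)
            continue
        previous = tokens[index - 1].lower() if index > 0 else None
        following = tokens[index + 1] if index + 1 < len(tokens) else None
        # Rule 1: followed immediately by a comma or period.
        # Rule 2: preceded immediately by a preposition.
        if following in PUNCTUATION or previous in PREPOSITIONS:
            result.append("him")
        else:
            result.append("his")
    return result
```

Note that the preposition rule misfires on phrases like "to her book", which this sketch would turn into "to him book"; catching that is where part-of-speech tagging would come in.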

Expression Evaluation in C++

I'm writing some excel-like C++ console app for homework.
My app should be able to accept formulas for its cells, for example it should evaluate something like this:
Sum(tablename\fieldname[recordnumber], fieldname[recordnumber], ...)
tablename\fieldname[recordnumber] points to a cell in another table,
fieldname[recordnumber] points to a cell in current table
or
Sin(fieldname[recordnumber])
or
anotherfieldname[recordnumber]
or
"10" // (simply a number)
something like that.
functions are Sum, Ave, Sin, Cos, Tan, Cot, Mul, Div, Pow, Log (10), Ln, Mod
It's pathetic, I know, but it's my homework :'(
So does anyone know a trick to evaluate something like this?

Ok, nice homework question by the way.
It really depends on how heavy you want this to be. You can create a full expression parser (which is fun but also time consuming).
In order to do that, you need to describe the full grammar and write a frontend (have a look at lex and yacc, or flex and bison).
But as I see your question you can limit yourself to three subcases:
a simple value
a lookup (possibly to another table)
a function whose inputs are lookups
I think a little OO design can help you out here.
I'm not sure if you have to deal with real time refresh and circular dependency checks. If so, those can be tricky too.

For the parsing, I'd look at Recursive descent parsing. Then have a table that maps all possible function names to function pointers:
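#include <string>

struct FunctionTableEntry {
    std::string name;      // function name as written in a formula, e.g. "Sin"
    double (*f)(double);   // pointer to the function that implements it
};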
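To show how recursive descent and a function table fit together, here is a compact sketch. It is written in Python purely to keep it short; the structure carries over to C++. It handles only numbers and nested function calls such as Sum(10, Mul(2, 3)). Cell references like fieldname[recordnumber] are left out, only four of the listed functions are included, and all names (FUNCTIONS, tokenize, parse_expr, evaluate) are made up for the example.

```python
import math
import re

# Function table: maps a function name to its implementation.
FUNCTIONS = {
    "Sum": lambda *args: sum(args),
    "Mul": lambda *args: math.prod(args),
    "Sin": math.sin,
    "Cos": math.cos,
}

TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(\S))")


def tokenize(text):
    """Split a formula into ("num", x), ("name", s) and ("sym", c) tokens."""
    tokens = []
    for number, name, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif name:
            tokens.append(("name", name))
        elif symbol:
            tokens.append(("sym", symbol))
    return tokens


def parse_expr(tokens, pos):
    """expr := number | name '(' [expr {',' expr}] ')'. Returns (value, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of formula")
    kind, value = tokens[pos]
    if kind == "num":
        return value, pos + 1
    if kind != "name" or tokens[pos + 1:pos + 2] != [("sym", "(")]:
        raise SyntaxError("expected a number or a function call")
    pos += 2
    args = []
    while tokens[pos:pos + 1] != [("sym", ")")]:
        arg, pos = parse_expr(tokens, pos)  # recursion handles nested calls
        args.append(arg)
        if tokens[pos:pos + 1] == [("sym", ",")]:
            pos += 1
        elif tokens[pos:pos + 1] != [("sym", ")")]:
            raise SyntaxError("expected ',' or ')'")
    return FUNCTIONS[value](*args), pos + 1


def evaluate(text):
    """Evaluate a whole formula and reject trailing input."""
    tokens = tokenize(text)
    value, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return value
```

This version evaluates while parsing. Building a tree first and evaluating it in a second pass works the same way, with parse_expr returning nodes instead of numbers.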

You should write a parser. The parser should take each expression (i.e., each line), identify the command, and construct the parse tree. This is the first phase. In the second phase you can evaluate the tree by substituting the data for each element of the command.

Previous responders have hit it on the head: you need to parse the cell contents, and interpret them.
StackOverflow already has a whole slew of questions on building compilers and interpreters where you can find pointers to resources. Some of them are:
Learning to write a compiler (#1669 people!)
Learning Resources on Parsers, Interpreters, and Compilers
What are good resources on compilation?
References Needed for Implementing an Interpreter in C/C++
...
and so on.
Aside: I never have the energy to link them all together, or even try to build a comprehensive list.

I guess you cannot use yacc/lex (or the like) so you have to parse "manually":
Iterate over the string and divide it into its parts. What a part is depends on your grammar (syntax). That way you can find the function names and the parameters. The difficulty of this depends on the complexity of your syntax.
Maybe you should read a bit about lexical analysis.
