Reading and storing all words of file in prolog - list

I am a newbie to Prolog. So far I am able to read all the words of a file and display them one by one. Now I want to store them in a list (one by one, as soon as I display them). Every explanation of append I have found appends the contents of two lists into a third list. For example,
append(new_word,word_list,word_List): initially my word_list is empty, so everything is fine, but afterwards it says no and stops at that point.
I need help storing the elements in a list one by one.

You can use difference lists:
file_to_list(W, L) :-
    read_word(Word),
    append_dl(W, [Word|U]-U, Ws),
    !, file_to_list(Ws, L).
file_to_list(Ws, Ws).
append_dl(X-Y, Y-Z, X-Z).
Here read_word/1 is assumed to be your own predicate that reads one word and fails at the end of the file. Call file_to_list(U-U, L-[]) to get the list of words in L. There is no slowdown, but it takes more inferences than CapelliC's code (one per word).

SpaCy questions about lists of lists in Python 2.7

I think part of my issue has to do with SpaCy and part has to do with not understanding the most elegant way to work within Python itself.
I am loading a txt file in Python, tokenizing it into sentences and then tokenizing those into words with nltk:
sent_text = nltk.sent_tokenize(text)
tokenized_text = [nltk.word_tokenize(x) for x in sent_text]
That gives me a list of lists, where each list within the main list is a sentence of tokenized words. So far so good.
I then run it through SpaCy:
text = nlp(unicode(tokenized_text))
Still a list of lists, same thing, but it has all the SpaCy info.
This is where I'm hitting a block. Basically what I want to do is, for each sentence, only retain the nouns, verbs, and adjectives, and within those, also get rid of auxiliaries and conjunctions. I was able to do this earlier by creating a new empty list and appending only what I want:
sent11 = []
for token in sent1:
    if (token.pos_ == 'NOUN' or token.pos_ == 'VERB' or token.pos_ == 'ADJ') and (token.dep_ != 'aux') and (token.dep_ != 'conj'):
        sent11.append(token)
This works fine for a single sentence, but I don't want to be doing it for every single sentence in a book-length text.
Then, once I have these new lists (or whatever the best way to do it is) containing only the pieces I want, I want to use the "similarity" function of SpaCy to determine which sentence is closest semantically to some other, much shorter text that I've done the same stripping of everything but nouns, adj, verbs, etc to.
I've got it working when comparing one single sentence to another by using:
sent1.similarity(sent2)
So I guess my questions are
1) What is the best way to turn a list of lists into a list of lists that only contain the pieces I want?
and
2) How do I cycle through this new list of lists and compare each one to a separate sentence and return the sentence that is most semantically similar (using the vectors that SpaCy comes with)?
You're asking a bunch of questions here so I'm going to try to break them down.
1. Is nearly duplicating a book-length amount of text by appending each word to a list bad?
2. How can one eliminate or remove elements of a list efficiently?
3. How can one compare a sentence to each sentence in the book, where each sentence is a list and the book is a list of sentences?
Answers:
1. Generally yes, but on a modern system it isn't a big deal. A book is just text: English text is mostly ASCII, and each ASCII character takes one byte in UTF-8 (other characters take up to four bytes). Even a long book such as War and Peace comes out to under 3.3 MB. If you are using Chrome, Firefox, or IE to view this page, your computer has more than enough memory to fit a few copies of it into RAM.
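2. In Python you can't really, at least not with a plain list.
You can remove an element using:
l = [1,2,3,4]
del l[-2]
print(l)  # prints [1, 2, 4]
but in the background Python shifts every element after the deleted one down by one position, so each deletion is O(n). It is not recommended for repeated deletions on large lists. A collections.deque, which is implemented as a doubly linked list of blocks, has a bit of extra overhead and makes removal cheap at either end, but removal from the middle is still O(n). For filtering, it is usually better to build a new list that contains only the elements you want to keep.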
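An alternative to deleting elements in place is to build new lists that hold only the tokens you want to keep, which also covers the "list of lists" part of the question. The following is a minimal sketch, not SpaCy code: Token is a hypothetical stand-in that carries only the pos_ and dep_ attributes used in the question, and keep_content_words is a made-up helper name.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed token; a real SpaCy token has many more attributes.
Token = namedtuple("Token", ["text", "pos_", "dep_"])

KEEP_POS = {"NOUN", "VERB", "ADJ"}   # parts of speech to retain
DROP_DEP = {"aux", "conj"}           # dependency labels to discard


def keep_content_words(sentences):
    """Return a new list of lists; the input lists are left untouched."""
    return [
        [tok for tok in sent if tok.pos_ in KEEP_POS and tok.dep_ not in DROP_DEP]
        for sent in sentences
    ]


sentences = [
    [Token("The", "DET", "det"), Token("cat", "NOUN", "nsubj"),
     Token("was", "VERB", "aux"), Token("sleeping", "VERB", "ROOT")],
    [Token("Dogs", "NOUN", "nsubj"), Token("bark", "VERB", "ROOT"),
     Token("loudly", "ADV", "advmod")],
]

filtered = keep_content_words(sentences)
kept_words = [[tok.text for tok in sent] for sent in filtered]
print(kept_words)  # [['cat', 'sleeping'], ['Dogs', 'bark']]
```

Each sentence is scanned exactly once, so the cost is linear in the number of tokens, with no shifting of list elements.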
If memory is an issue then you can also use generators wherever possible. For example you could probably replace:
tokenized_text = [nltk.word_tokenize(x) for x in sent_text]
which creates a list that contains the tokens of the entire book, with
tokenized_text = (nltk.word_tokenize(x) for x in sent_text)
which creates a generator that yields the tokens of one sentence at a time. Generators have almost no memory overhead because they compute the next element only when it is asked for; the trade-off is that a generator can be iterated over only once.
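To make the difference concrete, here is a small sketch that compares the two forms. It does not use nltk: tokenize is a hypothetical stand-in that just splits on whitespace, and the sample sentences are invented.

```python
import sys


def tokenize(sentence):
    # Hypothetical stand-in for nltk.word_tokenize: a plain whitespace split.
    return sentence.split()


sent_text = ["This is one sentence.", "Here is another one."] * 1000

as_list = [tokenize(s) for s in sent_text]  # all 2000 token lists are built right away
as_gen = (tokenize(s) for s in sent_text)   # nothing has been tokenized yet

# The generator object stays small no matter how many sentences there are.
list_is_bigger = sys.getsizeof(as_list) > sys.getsizeof(as_gen)
print(list_is_bigger)  # True

first = next(as_gen)                 # tokenizes only the first sentence
print(first)                         # ['This', 'is', 'one', 'sentence.']

remaining = sum(1 for _ in as_gen)   # consumes the rest, one sentence at a time
print(remaining)                     # 1999

leftover = list(as_gen)              # the generator is now exhausted
print(leftover)                      # []
```

The last line shows the catch: if you need to loop over the sentences more than once, either keep the list or recreate the generator.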
3. I'm not familiar with SpaCy, and while the question fits on SO you're unlikely to get good answers about specific libraries here.
From the looks of it you can just do something like:
best_match = None
best_similarity_value = float('-inf')  # a cosine similarity can be zero or negative
for token in parsed_tokenized_text:  # each item is one parsed sentence
    similarity = token.similarity(sent2)
    if similarity > best_similarity_value:
        best_similarity_value = similarity
        best_match = token
And if you wanted to check against multiple (non-consecutive) sentences then you could wrap that in an outer loop that goes through them:
for sent2 in other_token_list:
    # the search loop above goes here, indented one level
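Putting the two loops together might look like the sketch below. It is only an illustration of the search structure: similarity here is a hypothetical stand-in (Jaccard overlap of word sets) rather than SpaCy's vector-based similarity, and best_match, book and targets are made-up names and data.

```python
def similarity(a, b):
    """Hypothetical stand-in for a similarity score: Jaccard overlap of two word lists."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_match(sentences, target):
    """Return the sentence most similar to target, together with its score."""
    best, best_score = None, float("-inf")
    for sent in sentences:
        score = similarity(sent, target)
        if score > best_score:
            best, best_score = sent, score
    return best, best_score


book = [
    ["cat", "sleep", "sofa"],
    ["dog", "bark", "night"],
    ["rain", "fall", "night"],
]
targets = [["dog", "bark"], ["rain", "night"]]

# Outer loop over the short texts, inner search over the book.
results = [best_match(book, target) for target in targets]
for target, (match, score) in zip(targets, results):
    print(target, "->", match, round(score, 2))
```

With real SpaCy objects the body of similarity would be replaced by the library's own comparison; the surrounding loops stay the same.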

Erlang - Can one use lists:append for adding an element to a string?

Here is my function that parses an addition equation.
expr_print({num,X}) -> X;
expr_print({plus,X,Y})->
lists:append("(",expr_print(X),"+",expr_print(Y),")").
Once executed in terminal it should look like this (but it doesn't at the moment):
>math_erlang: expr_print({plus,{num,5},{num,7}}).
>(5+7)
Actually one could do that, but it would not work the way you wish, since X in {num, X} is a number and not the string representation of a number.
Strings in Erlang are just lists of numbers, and if those numbers are in the right range they can be printed as a string. You should be able to find a detailed explanation here. So the first thing you would like to do is make sure that the call expr_print({num, 3}) returns "3" and not 3. You should be able to find a solution here.
The second thing is lists:append. There is no five-argument version; lists:append/1 takes a single argument, a list of lists. So your code could look like this:
expr_print({num,X}) ->
    lists:flatten(io_lib:format("~p", [X]));
expr_print({plus,X,Y})->
    lists:append(["(", expr_print(X),"+",expr_print(Y), ")"]).
And this should produce a nice flat string/list.
Another thing is that you might not need a flat list. If you are planning to write this to a file or send it over TCP, you might want to use an iolist, which is much easier to create (you could drop the append and flatten calls) and faster.

Prolog Combining Two Lists

I am new to prolog and would appreciate any help on the following question:
I need to write a program that accepts two lists and appends the second to first and displays this new list and its length. I know that prolog might have some built in functions to make this all easier...but I do not want to use those.
eg: newlist([a,b,c],[d,e,f],L3,Le). would return L3=[a,b,c,d,e,f] and Le=6
Here is what I have so far:
newlist([],List,List,0).
newlist([Element|List1],List2,[Element|List3],L) :- newlist(List1,List2,List3, LT), L is LT + 1.
This does the appending correctly but I can only get the length of the first list instead of the combined list. Is there a way for me to add the second list's length to the first to get the combined list length?
Thanks, and sorry if this question is rather easy...I am new.
Is there a way for me to add the second list's length to the first to get the combined list length?
You should replace:
newlist([],List,List,0).
with:
newlist([],List,List,X):-length(List,X).

Prolog permutations with repetition

I'm having a hard time wrapping my head around the concept of logic programming. I'm trying to get all permutations with repetition into a given list.
I can put what I have, but I don't know what I'm doing!
perms_R(List,[]).
perms_R([X|Xt],[Y|Yt],Out) :- perms_R([Y|Xt],Yt),perms_R(Xt,[Y|Yt]).
The idea was to go through each element in the second list and put it in my first list. I'm trying to figure this out, but I'm stuck.
I need to call perms_R([a,b,c,d],[1,2,3,4]). and get:
1,1,1,1
1,1,1,2
1,1,1,3
1,1,1,4
1,1,2,1
etc....
I understand the first list seems useless and I could just do it with a list length, but I actually need it for the remainder of my code, so I'm trying to model this after what I need. Once I get past this part, I will be putting extra logic in that will limit the letters that can be replaced in the first list, but don't worry about that part!
What you are looking for is not a permutation. You want to create a list of a given size using items from a given set.
You may do it with this snippet:
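perms_R([], _).
perms_R([Item|NList], List):-
    member(Item, List),
    perms_R(NList, List).
You need to pass a partially instantiated list (a list of unbound variables of the desired length) and the source items; each combination is then produced as a separate solution on backtracking: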
perms_R([A,B,C,D],[1,2,3,4]).

GNU Prolog - Build up a list in a loop

I need to build a new list with a "loop". Basically I can't use recursion explicitly, so I am using append to go through a list of lists.
I can get the element. The problem is that I need to check this element and, if something is true, it returns another element that I need to put back into the list. It does check correctly and it changes correctly.
The problem I am having is how to create a completely new list.
So, if I had
[[1,1,1],[2,6,2],[3,3,3]]
I go through each element. Say I get to the 6 and it changes. So I need to create a new list like so,
[[1,1,1],[2,10,2],[3,3,3]].
Right now my main problem is just creating each row. If I can create each row, I will be able to create a list of lists.
So to break this down a little more, let's just worry about [1,1,1].
I go through each element while appending the new element to a new list. The new list is now [1,1,1].
I have this:
set(Row,Col,Bin,TheEntry,Bout) :-
append(ListLeft, [R|_], Bin),
append(ListLeft2, [C|_], R),
length(ListLeft, LenR),
length(ListLeft2,LenC),
CurrRow is LenR + 1,
CurrCol is LenC + 1,
getChar(C, Row, Col, CurrRow, CurrCol,TheEntry, NewC),
appendhere?.
I need to create a new list there with the character returned from NewC. Not sure how to do this.
Any clues?
Thanks.
To give you an idea about how to use append/3 to extract an item from a list of lists, consider the following predicate called replace/2:
replace(In, Out) :-
    append(LL, [L|RL], In),
    append(LE, [E|RE], L),
    replaceElement(E, NewE), !,
    append(LE, [NewE|RE], NewL),
    append(LL, [NewL|RL], Out).
replace(In, In).
This non-recursive predicate takes a list of lists as input (In) and backtracks to find an element E within an inner list L that can be replaced via replaceElement/2. If one is found, the new inner list (NewL) is constructed first and is then used to construct the new outer list (Out), which is the result.
Note that this simply serves to demonstrate how to use append/3 to break apart a list of lists and retrieve individual elements via backtracking, not recursion, as requested. Once an element E is found to be replaceable by NewE via replaceElement/2, it is used in the construction of the list again using append/3 as shown.
Also note that this suggestion (which is intended to help you, not be your final answer) also happens to replace only a single element within an inner list, if any at all. If you want to do multiple replacements of the input list in a single call to replace/2 or similar using this technique, then you will almost certainly need a recursive definition, or the ability to use the global database via assert. I'm happy to be corrected if someone else can provide a definition as a counterexample.
With this example predicate replace/2, together with, say, the following fact:
replaceElement(6, 10).
Executing the following gives us your required behaviour:
1 ?- replace([[1,1,1],[2,6,2],[3,3,3]], Out).
Out = [[1, 1, 1], [2, 10, 2], [3, 3, 3]] ;
false.
If you cannot use cut (!), it is fine to omit it, but note that the second clause replace(In, In) will cause all calls to replace/2 to backtrack at least once to give you the input list back. If this behaviour is undesirable, omitting this second clause will cause replace/2 to fail outright if there is no replacement to be made.
If you cannot use recursion and have to do it with backtracking you should do something like this:
Assume Bin is a list of lists (each item is a full row)
~ Split the input Bin into three parts (a list of 'left' rows, a Row, and a list of remaining rows). This can be done using append/3 with something like append(Left, [Item|Rest], Rows)
~ Now obtain the length of the 'left' rows
~ Test the length using the 'is' operator to check whether the left list has Row - 1 items
~ Do the same with the Item, i.e. split it into three parts (LeftColumns, ColumnItem and Rest)
~ Now test the length against the required Column
~ Now you have the item to change, so all you need to do is rebuild the lists using two appends (one to rebuild the chosen row and another to rebuild the output list).
So in your code, don't use anonymous variables (_). Use named variables instead, so that you can rebuild the new list with the item changed.