Making tuples out of two lists - list

I'm a beginner in Prolog and I have to make tuples out of two lists using recursion. For example, func([1, 2, 3], [4, 5, 6]) should output [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]. I have the following code:
func([],_,[]).
func([X|T1],Y,[Z|W]):-
    match(X,Y,Z),
    func(T1,Y,W).
match(X,[Y],[(X,Y)]).
match(X,[Y|T],[(X,Y)|Z]) :-
    match(X,T,Z).
But my output for func([1,2,3],[4,5,6],X) is
X = [[(1, 4), (1, 5), (1, 6)], [(2, 4), (2, 5), (2, 6)], [(3, 4), (3, 5), (3, 6)]].
How can I get rid of the extra square brackets in the middle of my output? I've tried playing with the parentheses and brackets in both of my predicates, but I can't figure it out.

Using the findall/3 standard predicate and the member/2 de facto standard predicate:
| ?- findall(X-Y, (member(X,[1,2,3]), member(Y,[4,5,6])), Pairs).
Pairs = [1-4,1-5,1-6,2-4,2-5,2-6,3-4,3-5,3-6]
yes
To understand this solution, observe that, for each value of X, we enumerate by backtracking all values of Y. That is, when backtracking (as implicitly performed by the findall/3 predicate to construct a list of all solutions of its second argument), we exhaust all solutions for the last choice-point (the member(Y,[4,5,6]) goal) before backtracking to a previous choice-point (the member(X,[1,2,3]) goal). This is known as chronological backtracking and is one of the defining characteristics of Prolog.
Note that I used X-Y, the usual Prolog representation for a pair, instead of (X,Y), which is not a recommended solution for constructing n-tuples as it only works nicely for pairs of elements.
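For readers who think more readily in loops, the enumeration order described above can be sketched in Python. This is only an analogy, not Prolog: the inner loop plays the role of the most recent choice-point and is exhausted before the outer loop advances. The name all_pairs is made up for illustration.

```python
def all_pairs(xs, ys):
    """Pair every element of xs with every element of ys, in order."""
    pairs = []
    for x in xs:      # earlier choice-point: advances only once the inner loop is done
        for y in ys:  # most recent choice-point: fully enumerated first
            pairs.append((x, y))
    return pairs


print(all_pairs([1, 2, 3], [4, 5, 6]))
```

The result is a single flat list, with no nested list per value of x.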

Related

Tuple to list in Prolog

Recently I'm doing a program and it requires me to convert a tuple to a list.
Tuples look like this: [(1,[1,2,3,4]), (1,[2,3,4,5]), ...]
And what I want is a list of: [(1,2,3,4), (2,3,4,5), ...]
Is there any way I can do that?
In Prolog, (1, 4, 2, 5) is syntactic sugar for (1, (4, (2, 5))), just like [1, 4, 2, 5] is syntactic sugar for [1|[4|[2|[5|[]]]]] (note, however, that the list ends with the empty list [], whereas the tuple ends with the pair (2, 5)). We can therefore relate a list to the corresponding tuple with:
list_tuple([A, B], (A, B)).
list_tuple([A|T], (A, B)) :-
    list_tuple(T, B).
So then we can write a predicate to unpack the list out of the 2-tuple, and convert the list to a tuple:
conv((_, L), R) :-
    list_tuple(L, R).
and we can use maplist/3 to perform the conversion over the entire list:
convlist(As, Bs) :-
    maplist(conv, As, Bs).
This then yields:
?- convlist([(1,[1,2,3,4]), (1,[2,3,4,5])], R).
R = [(1, 2, 3, 4), (2, 3, 4, 5)] ;
false.
Tuples are, however, not very common in Prolog, so I do not see why you would not stick with the list itself.
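The right-nested structure described above can be sketched in Python, purely as an analogy for how the (A, B) terms nest. The function name list_to_nested is hypothetical; its two branches mirror the two clauses of list_tuple/2.

```python
def list_to_nested(items):
    """Build a right-nested chain of pairs from a list of two or more items."""
    if len(items) < 2:
        raise ValueError("need at least two elements")
    if len(items) == 2:
        # Base case: the chain ends with a plain pair, not an empty terminator.
        return (items[0], items[1])
    # Recursive case: the head is paired with the nested rest.
    return (items[0], list_to_nested(items[1:]))


print(list_to_nested([1, 4, 2, 5]))
```

Unlike a list, the chain has no empty terminator, which is why the base case needs two elements.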

Word Labels for Document Matrix in Gensim

My ultimate goal is to produce a *.csv file containing labeled binary term vectors for each document. In essence, a term document matrix.
Using gensim, I can produce a file with an unlabeled term matrix.
I do this by essentially copying and pasting code from here: http://radimrehurek.com/gensim/tut1.html
Given a list of documents called "texts":
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)
[(0, 1), (1, 1), (2, 1)]
[(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]
[(2, 1), (5, 1), (7, 1), (8, 1)]
[(1, 1), (5, 2), (8, 1)]
[(3, 1), (6, 1), (7, 1)]
[(9, 1)]
[(9, 1), (10, 1)]
[(9, 1), (10, 1), (11, 1)]
[(4, 1), (10, 1), (11, 1)]
To convert the above vectors into a SciPy sparse matrix, I use:
scipy_csc_matrix = gensim.matutils.corpus2csc(corpus)
I then convert the sparse matrix to a full (dense) NumPy array:
full_matrix = csc_matrix(scipy_csc_matrix).toarray()
Finally, I output this to a file:
with open('file.csv','wb') as f:
    writer = csv.writer(f)
    writer.writerows(full_matrix)
This produces a matrix of binomial vectors, but I do not know which vector represents which word. Is there an accurate way of matching words to vectors?
I've tried parsing the dictionary to create a list of words which I would glue to the above full_matrix.
# Retrieve dictionary
tokenIDs = dictionary.token2id
# Retrieve keys from dictionary and concatenate those to full_matrix
for key, value in tokenIDs.iteritems():
    temp1 = unicodedata.normalize('NFKD', key).encode('ascii','ignore')
    temp = [temp1]
    dictlist.append(temp)
Keys = np.asarray(dictlist)
# Combine Keys and Matrix
labeled_full_matrix = np.concatenate((Keys, full_matrix), axis=1)
However, this does not work. The word ids (Keys) are not matched to the appropriate vectors.
I am under the assumption a much simpler and more elegant approach is possible. But after some time, I haven't been able to find it. Maybe someone here can help, or point me to something fundamental I've missed.
Is this what you want?
%time lda1 = models.LdaModel(corpus1, num_topics=20, id2word=dictionary1, update_every=5, chunksize=10000, passes=100)
import pandas
mixture = [dict(lda1[x]) for x in corpus1]
pandas.DataFrame(mixture).to_csv("output.csv")
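If the goal is the labeled binary term matrix from the question rather than topic mixtures, a likely cause of the mismatch is that iterating over token2id does not yield the words in id order. Below is a minimal sketch in plain Python 3 that avoids the problem by sorting on the id. It does not call gensim at all: the token2id and corpus values are hypothetical sample data standing in for dictionary.token2id and the bag-of-words corpus, and labeled_binary_rows is a made-up helper name.

```python
import csv
import io


def labeled_binary_rows(token2id, corpus):
    """Return a header row of words plus one 0/1 row per document."""
    # Invert word -> id into id -> word and sort by id, so the column order
    # follows ascending term id for the header and for every row.
    id2token = {idx: tok for tok, idx in token2id.items()}
    ids = sorted(id2token)
    rows = [["doc"] + [id2token[i] for i in ids]]
    for doc_no, bow in enumerate(corpus):
        # A term counts as present if it occurs at least once in the document.
        present = {term_id for term_id, count in bow if count > 0}
        rows.append([doc_no] + [1 if i in present else 0 for i in ids])
    return rows


# Hypothetical sample data, not taken from the question.
token2id = {"human": 0, "interface": 1, "computer": 2}
corpus = [[(0, 1), (1, 1), (2, 1)], [(1, 2)]]

buffer = io.StringIO()
csv.writer(buffer).writerows(labeled_binary_rows(token2id, corpus))
print(buffer.getvalue())
```

Here each row is a document and each column a word; writing to a real file instead of the in-memory buffer only changes the object passed to csv.writer.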

Ungrouping a (key, list(values)) pair in Spark/Scala

I have data formatted in the following way:
DataRDD = [(String, List[String])]
The first string indicates the key and the list houses the values. Note that the number of values is different for each key (but is never zero). I am looking to map the RDD in such a way that there will be a key, value pair for each element in the list. To clarify this, imagine the whole RDD as the following list:
DataRDD = [(1, [a, b, c]),
(2, [d, e]),
(3, [a, e, f])]
Then I would like the result to be:
DataKV = [(1, a),
(1, b),
(1, c),
(2, d),
(2, e),
(3, a),
(3, e),
(3, f)]
Subsequently, I would like to return all combinations of keys which have identical values. This may be returned as a list for each key, even when there are no identical values:
DataID = [(1, [3]),
(2, [3]),
(3, [1, 2])]
Since I'm fairly new to Spark and Scala, I have yet to fully grasp their concepts, so I hope any of you can help me, even if it's just with a part of this.
This is definitely a newbie question that comes up often. The solution is to use flatMapValues:
val DataRDD = sc.parallelize(Array((1, Array("a", "b", "c")), (2, Array("d", "e")),(3, Array("a", "e", "f"))))
DataRDD.flatMapValues(x => x).collect
which gives the desired result:
Array((1,a), (1,b), (1,c), (2,d), (2,e), (3,a), (3,e), (3,f))
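The second part of the question (which keys share a value) is not covered above. As a rough sketch of the logic only, here it is in plain Python rather than Spark: group keys by value, then collect for each key the other keys seen alongside it. The function names are made up for illustration, and no Spark API is used or implied.

```python
from collections import defaultdict


def flat_map_values(pairs):
    """Emit one (key, value) pair per element of each value list."""
    return [(key, value) for key, values in pairs for value in values]


def keys_sharing_values(pairs):
    """Map each key to the sorted list of other keys that share a value with it."""
    keys_by_value = defaultdict(set)
    for key, value in flat_map_values(pairs):
        keys_by_value[value].add(key)
    # Start every key with an empty set so keys without matches still appear.
    shared = {key: set() for key, _ in pairs}
    for keys in keys_by_value.values():
        for key in keys:
            shared[key] |= keys - {key}
    return [(key, sorted(others)) for key, others in sorted(shared.items())]


data = [(1, ["a", "b", "c"]), (2, ["d", "e"]), (3, ["a", "e", "f"])]
print(flat_map_values(data))
print(keys_sharing_values(data))
```

On the sample data from the question this reproduces the DataKV and DataID lists shown there.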

How to iterate through lists vertically?

I have multiple lists to work with. What I'm trying to do is take the elements at a given position from every list (in this case the first three positions), reading down a vertical column, and add those numbers to an empty list.
line1=[1,2,3,4,5,5,6]
line2=[3,5,7,8,9,6,4]
line3=[5,6,3,7,8,3,7]
vlist1=[]
vlist2=[]
vlist3=[]
Expected output:
vlist1=[1,3,5]
vlist2=[2,5,6]
vlist3=[3,7,3]
Having variables with numbers in them is often a design mistake. Instead, you should probably have a nested data structure. If you do that with your line1, line2 and line3 lists, you'd get a nested list:
lines = [[1,2,3,4,5,5,6],
[3,5,7,8,9,6,4],
[5,6,3,7,8,3,7]]
You can then "transpose" this list of lists with zip:
vlist = list(zip(*lines)) # note the list call is not needed in Python 2
Now you can access the inner lists (which are actually tuples now) by indexing or slicing into the transposed list.
first_three_vlists = vlist[:3]
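Putting those steps together, here is a small self-contained sketch in Python 3. It converts each column tuple back to a list, which is an extra step assumed here only because the question shows lists as the expected output.

```python
lines = [
    [1, 2, 3, 4, 5, 5, 6],
    [3, 5, 7, 8, 9, 6, 4],
    [5, 6, 3, 7, 8, 3, 7],
]

# zip(*lines) yields one tuple per column; convert each tuple to a list.
columns = [list(column) for column in zip(*lines)]

# Slicing keeps only the first three columns, as in the question.
first_three = columns[:3]
print(first_three)  # [[1, 3, 5], [2, 5, 6], [3, 7, 3]]
```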
In Python 3, zip returns an iterator rather than a list, so you need to treat it like one:
from itertools import islice
vlist1,vlist2,vlist3 = islice(zip(line1,line2,line3),3)
But really you should keep your data out of your variable names. Use a list-of-lists data structure, and if you need to transpose it just do:
list(zip(*nested_list))
Out[13]: [(1, 3, 5), (2, 5, 6), (3, 7, 3), (4, 8, 7), (5, 9, 8), (5, 6, 3), (6, 4, 7)]
Use Python's zip() function and index accordingly.
>>> line1=[1,2,3,4,5,5,6]
>>> line2=[3,5,7,8,9,6,4]
>>> line3=[5,6,3,7,8,3,7]
>>> list(zip(line1,line2,line3))
[(1, 3, 5), (2, 5, 6), (3, 7, 3), (4, 8, 7), (5, 9, 8), (5, 6, 3), (6, 4, 7)]
Put your input lists into a list. Then, to create the ith vlist, do something like this:
vlist_i = []
for l in list_of_lists:
    vlist_i.append(l[i])

Composing a list of all pairs

I'm brand new to Scala, having had very limited experience with functional programming through Haskell.
I'd like to try composing a list of all possible pairs constructed from a single input list. Example:
val nums = List[Int](1, 2, 3, 4, 5) // Create an input list
val pairs = composePairs(nums) // Function I'd like to create
// pairs == List[(Int, Int)]((1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1) ... etc)
I tried using zip on each element with the whole list, hoping that it would duplicate the one item across the whole. It didn't work (only matched the first possible pair). I'm not sure how to repeat an element (Haskell does it with cycle and take I believe), and I've had trouble following the documentation on Scala.
This leaves me thinking that there's probably a more concise, functional way to get the results I want. Does anybody have a good solution?
How about this:
val pairs = for(x <- nums; y <- nums) yield (x, y)
For those of you who don't want duplicates:
val nums = List(1,2,3,4,5)
val uniquePairs = for {
  (x, idxX) <- nums.zipWithIndex
  (y, idxY) <- nums.zipWithIndex
  if idxX < idxY
} yield (x, y)
uniquePairs: List[(Int, Int)] = List((1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
Here's another version using flatMap (that is, map followed by flatten):
val pairs = nums.flatMap(x => nums.map(y => (x,y)))
List[(Int, Int)] = List((1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (2,5), (3,1), (3,2), (3,3), (3,4), (3,5), (4,1), (4,2), (4,3), (4,4), (4,5), (5,1), (5,2), (5,3), (5,4), (5,5))
This can then be easily wrapped into a composePairs function if you like:
def composePairs(nums: Seq[Int]) =
  nums.flatMap(x => nums.map(y => (x,y)))
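For comparison only, the same two results can be sketched in Python with the standard itertools module. This is an analogy to the Scala answers above, not a translation of any particular one.

```python
from itertools import combinations, product

nums = [1, 2, 3, 4, 5]

# All ordered pairs, including (x, x): the counterpart of the nested for/yield.
pairs = list(product(nums, repeat=2))

# Each pair of distinct positions once: the counterpart of the idxX < idxY filter.
unique_pairs = list(combinations(nums, 2))

print(len(pairs), len(unique_pairs))
```

Like the index-based Scala version, combinations works on positions rather than values, so repeated values in the input would still be paired with each other.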