Composing a list of all pairs - list

I'm brand new to Scala, having had very limited experience with functional programming through Haskell.
I'd like to try composing a list of all possible pairs constructed from a single input list. Example:
val nums = List[Int](1, 2, 3, 4, 5) // Create an input list
val pairs = composePairs(nums) // Function I'd like to create
// pairs == List((1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), ...), of type List[(Int, Int)]
I tried using zip on each element with the whole list, hoping that it would duplicate the one item across the whole list. It didn't work (it only matched the first possible pair). I'm not sure how to repeat an element (Haskell does it with cycle and take, I believe), and I've had trouble following the Scala documentation.
This leaves me thinking that there's probably a more concise, functional way to get the results I want. Does anybody have a good solution?

How about this:
val pairs = for(x <- nums; y <- nums) yield (x, y)

For those of you who don't want duplicates:
val nums = List(1, 2, 3, 4, 5)
val uniquePairs = for {
  (x, idxX) <- nums.zipWithIndex
  (y, idxY) <- nums.zipWithIndex
  if idxX < idxY
} yield (x, y)
// uniquePairs: List[(Int, Int)] = List((1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))

Here's another version using flatMap and map:
val pairs = nums.flatMap(x => nums.map(y => (x, y)))
// pairs: List[(Int, Int)] = List((1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (2,5), (3,1), (3,2), (3,3), (3,4), (3,5), (4,1), (4,2), (4,3), (4,4), (4,5), (5,1), (5,2), (5,3), (5,4), (5,5))
This can then be easily wrapped into a composePairs function if you like:
def composePairs(nums: Seq[Int]) =
  nums.flatMap(x => nums.map(y => (x, y)))

Related

F# - generating a list of tuples from integer input

I'm supposed to return a list of tuples from an integer input.
For example:
output' 4 should return a list of tuples:
[(1, 1);
(2, 1); (2, 2);
(3, 1); (3, 2); (3, 3);
(4, 1); (4, 2); (4, 3); (4, 4)]
At the moment I'm getting
[(1, 1); (1, 2); (1, 3); (1, 4);
(2, 1); (2, 2); (2, 3); (2, 4);
(3, 1);(3, 2); (3, 3); (3, 4);
(4, 1); (4, 2); (4, 3); (4, 4)]
What I have so far:
let output' x =
    let ls = [1..x]
    ls |> List.collect (fun x -> [for i in ls -> x, i])
output' 4
I can't figure out how to get the needed output. Any help would be appreciated.
You can add a filter:
...
|> List.filter (fun (a, b) -> a >= b)
or
let output x =
    [ for i in 1..x do
        for j in 1..i do yield (i, j)
    ]
F# code often works with sequences, so here is a lazy, sequence-based solution:
let output' max =
    let getTuples x =
        seq { 1 .. x }
        |> Seq.map (fun y -> (x, y))
    seq { 1 .. max }
    |> Seq.collect getTuples
Seq.collect flattens the per-row sequences into a single sequence of tuples. If you need a list rather than a lazy sequence, pipe the result into Seq.toList.
This is still more functional in style than explicit loops.

How to sort a List of Lists of pairs of number in descending order by the second item in each pair in Scala?

I have a list that looks like this: List(List(0, 2), List(0, 3), List(2, 3), List(3, 2), List(3, 0), List(2, 0)). Note that this list will only contain pairs and will not contain duplicate pairs. I want to sort the list in descending order by the second item of each sublist. If two pairs have the same second item, I don't really care which comes first.
For this instance the answer could look like List(List(0,3), List(2,3), List(0,2), List(3,2), List(3,0), List(2,0))
My idea was to loop through the larger list, collect the second item of each pair, and sort those, but then I have trouble keeping track of which second item belongs to which pair. Perhaps there is a cleverer way?
A simple solution would be:
list.sortBy(-_.last)
You can simply do:
list.sortBy(-_(1))
If the lists are always length 2, I would use tuples instead of lists. Then it is just a matter of using sortBy:
scala> val l1 = List(List(0, 2), List(0, 3), List(2, 3), List(3, 2), List(3, 0), List(2, 0))
l1: List[List[Int]] = List(List(0, 2), List(0, 3), List(2, 3), List(3, 2), List(3, 0), List(2, 0))
scala> val l2 = l1.map(x => (x(0), x(1)))
l2: List[(Int, Int)] = List((0,2), (0,3), (2,3), (3,2), (3,0), (2,0))
scala> l2.sortBy(-_._2)
res1: List[(Int, Int)] = List((0,3), (2,3), (0,2), (3,2), (3,0), (2,0))
list.sortBy( sublist => -1 * sublist.tail.head )

Word Labels for Document Matrix in Gensim

My ultimate goal is to produce a *.csv file containing labeled binary term vectors for each document. In essence, a term document matrix.
Using gensim, I can produce a file with an unlabeled term matrix.
I do this by essentially copying and pasting code from here: http://radimrehurek.com/gensim/tut1.html
Given a list of documents called "texts".
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)
[(0, 1), (1, 1), (2, 1)]
[(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]
[(2, 1), (5, 1), (7, 1), (8, 1)]
[(1, 1), (5, 2), (8, 1)]
[(3, 1), (6, 1), (7, 1)]
[(9, 1)]
[(9, 1), (10, 1)]
[(9, 1), (10, 1), (11, 1)]
[(4, 1), (10, 1), (11, 1)]
To convert the above vectors into a SciPy sparse matrix, I use:
scipy_csc_matrix = gensim.matutils.corpus2csc(corpus)
I then convert the sparse matrix to a dense NumPy array:
full_matrix = csc_matrix(scipy_csc_matrix).toarray()
Finally, I output this to a file:
with open('file.csv','wb') as f:
    writer = csv.writer(f)
    writer.writerows(full_matrix)
This produces a matrix of term vectors, but I do not know which vector represents which word. Is there an accurate way of matching words to vectors?
I've tried parsing the dictionary to create a list of words which I would glue to the above full_matrix.
# Retrieve the dictionary
tokenIDs = dictionary.token2id
# Retrieve keys from the dictionary and concatenate them to full_matrix
dictlist = []
for key, value in tokenIDs.iteritems():
    temp1 = unicodedata.normalize('NFKD', key).encode('ascii','ignore')
    temp = [temp1]
    dictlist.append(temp)
Keys = np.asarray(dictlist)
# Combine Keys and Matrix
labeled_full_matrix = np.concatenate((Keys, full_matrix), axis=1)
However, this does not work. The word ids (Keys) are not matched to the appropriate vectors.
I am under the assumption a much simpler and more elegant approach is possible. But after some time, I haven't been able to find it. Maybe someone here can help, or point me to something fundamental I've missed.
Is this what you want?
%time lda1 = models.LdaModel(corpus1, num_topics=20, id2word=dictionary1, update_every=5, chunksize=10000, passes=100)
import pandas
mixture = [dict(lda1[x]) for x in corpus1]
pandas.DataFrame(mixture).to_csv("output.csv")
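For the labeling step the question asks about, here is a minimal sketch in plain Python 3 that does not need gensim installed. The token2id mapping and corpus below are small hypothetical stand-ins for dictionary.token2id and the bag-of-words corpus, and the helper names are made up for illustration. The idea is to invert token2id and emit rows in token-id order, so each word stays attached to its own counts. As far as I know, corpus2csc puts terms in rows and documents in columns, which is the same orientation used here.

```python
import csv
import io

# Hypothetical stand-ins for dictionary.token2id and the bag-of-words corpus.
token2id = {"human": 0, "interface": 1, "computer": 2, "survey": 3}
corpus = [
    [(0, 1), (1, 1), (2, 1)],
    [(2, 1), (3, 2)],
]

def labeled_term_document_rows(token2id, corpus):
    """Return rows of [word, count_in_doc0, count_in_doc1, ...] ordered by token id."""
    id2token = {token_id: token for token, token_id in token2id.items()}
    rows = []
    for token_id in sorted(id2token):
        # dict(doc) maps token id -> count for one document; missing ids count as 0.
        counts = [dict(doc).get(token_id, 0) for doc in corpus]
        rows.append([id2token[token_id]] + counts)
    return rows

def to_csv_text(rows, num_docs):
    """Render the labeled rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word"] + ["doc%d" % i for i in range(num_docs)])
    writer.writerows(rows)
    return buffer.getvalue()

rows = labeled_term_document_rows(token2id, corpus)
csv_text = to_csv_text(rows, len(corpus))
print(csv_text)
```

To get binary vectors instead of counts, the count expression could be replaced with int(token_id in dict(doc)).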

Ungrouping a (key, list(values)) pair in Spark/Scala

I have data formatted in the following way:
DataRDD = [(String, List[String])]
The first string indicates the key and the list houses the values. Note that the number of values is different for each key (but is never zero). I am looking to map the RDD in such a way that there will be a key, value pair for each element in the list. To clarify this, imagine the whole RDD as the following list:
DataRDD = [(1, [a, b, c]),
(2, [d, e]),
(3, [a, e, f])]
Then I would like the result to be:
DataKV = [(1, a),
(1, b),
(1, c),
(2, d),
(2, e),
(3, a),
(3, e),
(3, f)]
Subsequently, I would like to return, for each key, all other keys that share at least one value with it. This may be returned as a list for each key, even when there are no identical values:
DataID = [(1, [3]),
(2, [3]),
(3, [1, 2])]
Since I'm fairly new to Spark and Scala, I have yet to fully grasp their concepts, so I hope one of you can help me, even if it's just with part of this.
This is a newbie question that comes up often. The solution is to use flatMapValues:
val DataRDD = sc.parallelize(Array((1, Array("a", "b", "c")), (2, Array("d", "e")),(3, Array("a", "e", "f"))))
DataRDD.flatMapValues(x => x).collect
This gives the desired result:
Array((1,a), (1,b), (1,c), (2,d), (2,e), (3,a), (3,e), (3,f))
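The answer above covers the ungrouping step only. For the second part of the question (which keys share a value), the logic can be sketched in plain Python; this is not Spark code, just an illustration of the idea under the assumption that the data fits in memory. The function name keys_sharing_values is made up for this sketch. It inverts the data into a value-to-keys map, then collects for each key every other key found under any of its values.

```python
from collections import defaultdict

# Same toy data as in the question.
data = [(1, ["a", "b", "c"]), (2, ["d", "e"]), (3, ["a", "e", "f"])]

def keys_sharing_values(data):
    """For each key, list the other keys that have at least one value in common."""
    # Step 1: invert the data into value -> set of keys holding that value.
    keys_by_value = defaultdict(set)
    for key, values in data:
        for value in values:
            keys_by_value[value].add(key)

    # Step 2: for each key, union the key sets of all its values, minus itself.
    result = []
    for key, values in data:
        others = set()
        for value in values:
            others |= keys_by_value[value]
        others.discard(key)
        result.append((key, sorted(others)))
    return result

data_id = keys_sharing_values(data)
print(data_id)  # [(1, [3]), (2, [3]), (3, [1, 2])]
```

A key with no shared values still appears in the result, paired with an empty list.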

Changing nested list of numbers to nested list of tuples

Can someone please help me with code that turns this nested list of numbers into the nested list of tuples below, i.e. from pot to val?
pot = [[1,2,3,4],[5,6,7,8]]
val = [[(1,2),(2,3),(3,4)],[(5,6),(6,7),(7,8)]]
I used a grouper function but it didn't quite give me the desired result. Is there another way? Thanks.
val = []
for line in pot:
    temp = []
    for i in range(len(line)-1):
        temp.append( (line[i],line[i+1]) )
    val.append(temp)
May contain typos.
There's probably a better way to do it but I noticed there was no answer to your question and worked something out that does the job:
pot = [[1,2,3,4],[5,6,7,8]]
val = []
for sublist in pot:
    temp = []
    for n in range (1, len(sublist)):
        temp.append((sublist[n-1], sublist[n]))
    val.append(temp)
print val
prints
[[(1, 2), (2, 3), (3, 4)], [(5, 6), (6, 7), (7, 8)]]
I'm new to Python. I just started learning it because I'm working on a project that requires it, but I think this solves your question.
pot = [[1,2,3,4],[5,6,7,8]]
inner = []
val = []
a = 0
b = 0
for L in pot:
    for x in range(len(L)):
        if x>0:
            a = L[x-1]
            b = L[x]
            inner.append((a,b))
    val.append(inner)
    inner = []
print val
My output running python 2.7 is:
[[(1, 2), (2, 3), (3, 4)], [(5, 6), (6, 7), (7, 8)]]
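The loops in the answers above can also be written more compactly by zipping each sublist with itself shifted by one element. This is a minimal sketch written for Python 3, where zip returns an iterator and so needs list(); the helper name adjacent_pairs is made up for illustration.

```python
def adjacent_pairs(nested):
    """Turn each inner list into a list of (current, next) tuples."""
    # zip(row, row[1:]) pairs every element with its successor and stops
    # at the shorter input, so the last element starts no pair.
    return [list(zip(row, row[1:])) for row in nested]

pot = [[1, 2, 3, 4], [5, 6, 7, 8]]
val = adjacent_pairs(pot)
print(val)  # [[(1, 2), (2, 3), (3, 4)], [(5, 6), (6, 7), (7, 8)]]
```

An inner list with fewer than two elements simply yields an empty list of pairs.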