Ungrouping a (key, list(values)) pair in Spark/Scala

I have data formatted in the following way:
DataRDD = [(String, List[String])]
The first string is the key and the list holds the values. The number of values differs from key to key (but is never zero). I want to map the RDD so that there is one key, value pair for each element in the list. To clarify, imagine the whole RDD as the following list:
DataRDD = [(1, [a, b, c]),
(2, [d, e]),
(3, [a, e, f])]
Then I would like the result to be:
DataKV = [(1, a),
(1, b),
(1, c),
(2, d),
(2, e),
(3, a),
(3, e),
(3, f)]
Next, I would like to return all combinations of keys that share identical values. The result may be a list of matching keys for each key, which can be empty when a key shares no values:
DataID = [(1, [3]),
(2, [3]),
(3, [1, 2])]
Since I'm fairly new to Spark and Scala, I have yet to fully grasp their concepts, so I hope someone can help me, even if it's just with part of this.

This is a newbie question that comes up often. The solution is to use flatMapValues, which keeps each key and emits one (key, value) pair for every element of that key's collection:
val DataRDD = sc.parallelize(Array((1, Array("a", "b", "c")), (2, Array("d", "e")),(3, Array("a", "e", "f"))))
DataRDD.flatMapValues(x => x).collect
which gives the desired result:
Array((1,a), (1,b), (1,c), (2,d), (2,e), (3,a), (3,e), (3,f))
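The snippet above needs a running SparkContext. As a minimal sketch that runs without Spark, the same reshaping can be reproduced on plain Scala collections, together with one possible approach to the second part of the question (DataID), which the answer does not address. The names UngroupPairs, ungroup and sharedKeys are hypothetical, and the technique (invert to value -> keys, then pair up keys that meet on a value) is an assumption, not a Spark API. On an RDD the same steps could presumably be expressed with flatMapValues, a map to (value, key), groupByKey and distinct, but that is untested here.

```scala
// Plain Scala collections only; no Spark dependency.
object UngroupPairs {
  // Local analogue of rdd.flatMapValues(x => x): one (key, value) per element.
  def ungroup[K, V](data: Seq[(K, Seq[V])]): Seq[(K, V)] =
    data.flatMap { case (k, vs) => vs.map(v => (k, v)) }

  // For every key, the other keys that share at least one value with it.
  def sharedKeys[K, V](data: Seq[(K, Seq[V])]): Map[K, Seq[K]] = {
    // Invert to value -> all keys whose list contains that value.
    val keysByValue: Map[V, Seq[K]] =
      ungroup(data).groupBy(_._2).map { case (v, kvs) => v -> kvs.map(_._1) }
    // Every ordered pair of distinct keys that meet on some value.
    val pairs: Seq[(K, K)] = for {
      keys <- keysByValue.values.toSeq
      a    <- keys
      b    <- keys
      if a != b
    } yield (a, b)
    // Seed with every key so keys without matches map to an empty list.
    val seed: Map[K, Seq[K]] = data.map { case (k, _) => k -> Seq.empty[K] }.toMap
    seed ++ pairs.groupBy(_._1).map { case (k, ps) => k -> ps.map(_._2).distinct }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1 -> Seq("a", "b", "c"), 2 -> Seq("d", "e"), 3 -> Seq("a", "e", "f"))
    println(ungroup(data))
    // Sorted only for a stable printout; Map iteration order is not guaranteed.
    println(sharedKeys(data).toSeq.sortBy(_._1).map { case (k, ks) => (k, ks.sorted) })
  }
}
```

The seed map is what makes keys with no shared values appear with an empty list, as the question allows.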

Related

Making tuples out of two lists

I'm a beginner in Prolog and I have to make tuples out of two lists using recursion. For example, func([1, 2, 3], [4, 5, 6], X) should give X = [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]. I have the following code:
func([],_,[]).
func([X|T1],Y,[Z|W]):-
match(X,Y,Z),
func(T1,Y,W).
match(X,[Y],[(X,Y)]).
match(X,[Y|T],[(X,Y)|Z]) :-
match(X,T,Z).
But my output for func([1,2,3],[4,5,6],X) is
X = [[(1, 4), (1, 5), (1, 6)], [(2, 4), (2, 5), (2, 6)], [(3, 4), (3, 5), (3, 6)]].
How can I get rid of the extra square brackets in the middle of my output? I've tried playing with the parentheses and brackets in both of my predicates, but I can't figure it out.
Using the findall/3 standard predicate and the member/2 de facto standard predicate:
| ?- findall(X-Y, (member(X,[1,2,3]), member(Y,[4,5,6])), Pairs).
Pairs = [1-4,1-5,1-6,2-4,2-5,2-6,3-4,3-5,3-6]
yes
To understand this solution, observe that for each value of X, we enumerate all values of Y by backtracking. When backtracking (as findall/3 implicitly does to collect all solutions of its second argument), we exhaust all solutions of the last choice point (the member(Y,[4,5,6]) goal) before backtracking to an earlier choice point (the member(X,[1,2,3]) goal). This is known as chronological backtracking and is one of the defining characteristics of Prolog.
Note that I used X-Y, the usual Prolog representation of a pair, instead of (X,Y); the comma form is not recommended for constructing n-tuples, as it only works nicely for pairs of elements.
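For comparison, here is a Scala sketch (all names hypothetical) of both shapes. nested builds one row per x, which reproduces the asker's list-of-lists; concatenating the rows removes the extra brackets. In the Prolog code, one corresponding fix would be to append each row onto the remaining result instead of consing it on as a single element. viaFor enumerates pairs in the same order as findall/3 with two member/2 goals: the last generator varies fastest.

```scala
object PairsOfTwoLists {
  // Like the asker's match/3: all pairs for one fixed x, as a single row.
  def row(x: Int, ys: List[Int]): List[(Int, Int)] = ys.map(y => (x, y))

  // Like the asker's func/3: one row per x, so the result is a list of lists.
  def nested(xs: List[Int], ys: List[Int]): List[List[(Int, Int)]] =
    xs.map(x => row(x, ys))

  // Concatenating the rows removes the "extra brackets".
  def flat(xs: List[Int], ys: List[Int]): List[(Int, Int)] =
    nested(xs, ys).flatten

  // Same order as findall/3 over two member/2 goals: the last generator (y)
  // varies fastest, like the last choice point under backtracking.
  def viaFor(xs: List[Int], ys: List[Int]): List[(Int, Int)] =
    for (x <- xs; y <- ys) yield (x, y)
}
```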

Tuple to list in Prolog

I'm writing a program that requires me to convert the list inside each tuple into a tuple.
The tuples look like this: [(1,[1,2,3,4]), (1,[2,3,4,5]), ...]
What I want is a list like this: [(1,2,3,4), (2,3,4,5), ...]
Is there any way I can do that?
In Prolog, (1, 4, 2, 5) is syntactic sugar for (1, (4, (2, 5))), just as [1, 4, 2, 5] is syntactic sugar for [1|[4|[2|[5|[]]]]]. Note, however, that the list ends with the empty list [], whereas the innermost part of the tuple is the pair (2, 5). That gives a recursive conversion from a list of at least two elements to a tuple:
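% Base case: a two-element list becomes a pair.
list_tuple([A, B], (A, B)).
% Recursive case: the head becomes the left element, the tail the nested tuple.
list_tuple([A|T], (A, B)) :-
    list_tuple(T, B).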
So then we can write a predicate to unpack the list out of the 2-tuple, and convert the list to a tuple:
conv((_, L), R) :-
list_tuple(L, R).
and we can use maplist/3 to perform the conversion over the entire list:
convlist(As, Bs) :-
maplist(conv, As, Bs).
This then yields:
?- convlist([(1,[1,2,3,4]), (1,[2,3,4,5])], R).
R = [(1, 2, 3, 4), (2, 3, 4, 5)] ;
false.
Tuples are, however, not very common in Prolog, so I see no reason not to stick with the list itself.
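To make the nesting concrete, here is a Scala sketch that models the Prolog comma term with a small hypothetical ADT (Num, Comma). listTuple follows the two clauses of list_tuple/2, including having no result for lists shorter than two elements, and convList plays the role of maplist(conv, ...). One difference: maplist/3 fails as a whole if any element fails, whereas this sketch returns None for that element.

```scala
object CommaTerm {
  // Hypothetical model of a Prolog term: a number or a ','(Left, Right) term.
  sealed trait Term
  final case class Num(n: Int) extends Term
  final case class Comma(left: Term, right: Term) extends Term

  // Mirrors list_tuple/2: [1, 4, 2, 5] becomes Comma(1, Comma(4, Comma(2, 5))).
  // Lists with fewer than two elements have no tuple form, so None.
  def listTuple(xs: List[Int]): Option[Term] = xs match {
    case List(a, b)                 => Some(Comma(Num(a), Num(b)))
    case a :: rest if rest.nonEmpty => listTuple(rest).map(t => Comma(Num(a), t))
    case _                          => None
  }

  // Mirrors maplist(conv, As, Bs): drop the first component of each pair
  // and convert its list (a failed conversion yields None for that element).
  def convList(pairs: List[(Int, List[Int])]): List[Option[Term]] =
    pairs.map { case (_, l) => listTuple(l) }

  // Renders a right-nested Comma chain the way Prolog prints it, e.g. (1, 2, 3, 4).
  def show(t: Term): String = {
    def items(t: Term): List[String] = t match {
      case Num(n)      => List(n.toString)
      case Comma(l, r) => items(l) ++ items(r)
    }
    items(t).mkString("(", ", ", ")")
  }
}
```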

How do I flatten a bag full of tuples into a bag?

Say I have {{(1, a), (2, b)},{(3, c), (4, d)}}. How do I get {(1, a), (2, b), (3, c), (4, d)} from this?
Use FLATTEN. Details can be found in the Pig Latin reference documentation for the FLATTEN operator.
A = LOAD 'data';
B = FOREACH A GENERATE FLATTEN($0);
DUMP B;
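As a rough analogy only, the same flattening looks like this on Scala collections. Pig bags are unordered and may hold duplicates, so the List used here (behind the hypothetical alias Bag) is an approximation; flatten turns every tuple inside every inner bag into its own top-level element, which is the effect FLATTEN($0) has in the example above.

```scala
object FlattenBags {
  // A Pig bag is unordered and may contain duplicates;
  // List is only an approximation used for this illustration.
  type Bag[T] = List[T]

  // Rough analogue of FOREACH A GENERATE FLATTEN($0): every tuple inside
  // every bag becomes a top-level element.
  def flattenBags[T](outer: Bag[Bag[T]]): Bag[T] = outer.flatten
}
```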

Composing a list of all pairs

I'm brand new to Scala, having had very limited experience with functional programming through Haskell.
I'd like to try composing a list of all possible pairs constructed from a single input list. Example:
val nums = List[Int](1, 2, 3, 4, 5) // Create an input list
val pairs = composePairs(nums) // Function I'd like to create
// pairs == List((1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1) ... etc.), a List[(Int, Int)]
I tried using zip on each element with the whole list, hoping that it would duplicate the one element across the whole list. It didn't work (it only matched the first possible pair). I'm not sure how to repeat an element (Haskell does it with cycle and take, I believe), and I've had trouble following the Scala documentation.
This leaves me thinking that there's probably a more concise, functional way to get the results I want. Does anybody have a good solution?
How about this:
val pairs = for(x <- nums; y <- nums) yield (x, y)
For those of you who don't want duplicates (self-pairs like (1, 1) or mirrored pairs like (2, 1)):
val nums = List(1,2,3,4,5)
val uniquePairs = for {
  (x, idxX) <- nums.zipWithIndex
  (y, idxY) <- nums.zipWithIndex
  if idxX < idxY
} yield (x, y)
uniquePairs: List[(Int, Int)] = List((1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
Here's another version using flatMap and map:
val pairs = nums.flatMap(x => nums.map(y => (x,y)))
pairs: List[(Int, Int)] = List((1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (2,5), (3,1), (3,2), (3,3), (3,4), (3,5), (4,1), (4,2), (4,3), (4,4), (4,5), (5,1), (5,2), (5,3), (5,4), (5,5))
This can then be easily wrapped into a composePairs function if you like:
def composePairs(nums: Seq[Int]) =
nums.flatMap(x => nums.map(y => (x,y)))
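A possible generalization, sketched with hypothetical names: composePairs with a type parameter so it works for any element type, and a uniquePairs that gives the same result as the zipWithIndex filter above by pairing each element only with the elements after it (via tails), so no indices are needed.

```scala
object AllPairs {
  // Generic composePairs: every ordered pair, including (x, x).
  def composePairs[A](xs: Seq[A]): Seq[(A, A)] =
    xs.flatMap(x => xs.map(y => (x, y)))

  // Same pairs and order as the idxX < idxY filter: each element is paired
  // only with the elements that come after it in the list.
  def uniquePairs[A](xs: List[A]): List[(A, A)] =
    xs.tails.toList.flatMap {
      case x :: rest => rest.map(y => (x, y))
      case Nil       => Nil
    }
}
```

Because this works by position, repeated values in the input still produce pairs such as (2, 2) when 2 occurs twice, matching the index-based version.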

GNU Prolog - Loop and new list

This is just a general question, stemming from something else.
Say you want the product table of two lists as a matrix (an outer product, I think that's what it's called).
For example, if I enter
outer([1,2,3],[4,5,6],L).
then L = [[4,5,6],[8,10,12],[12,15,18]].
So I want to iterate through two lists and create a new list.
I have this so far:
outer(L1,L2,L3) :-
append(LL,[L|RL],L1),
append(LE,[E|RE],L2),
Prod is L * E, !,
append(LE,[Prod|RE], NewL),
append(LL,[NewL|RL], L3).
which is somewhat close. I know I can use append to iterate through both lists, but I'm not sure how to create a new list; I always have trouble when it comes to building a completely new list.
Thanks.
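% product(+Xs, +Ys, -Table): one row per element X of Xs, each row is X * Ys.
product([], _, []).
product([H1|T1], L2, [R1|R2]) :-
    mul(H1, L2, R1),          % row for H1
    product(T1, L2, R2).      % remaining rows
% mul(+X, +Ys, -Zs): multiply every element of Ys by X.
mul(_, [], []).
mul(X, [H|T], [Z|R]) :- Z is X*H, mul(X, T, R).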
Here's another version; it uses maplist/3 instead of append. For products involving a non-number, it builds a symbolic dot(X, Y) term instead of raising an arithmetic error. It's also deterministic.
The multiplier:
amul([], _Other_Row,[]).
amul([X|Xs],Other_Row,[Row_Out|Rest_Out]) :-
maplist(mul(X),Other_Row, Row_Out),
amul(Xs,Other_Row, Rest_Out).
The product predicate:
mul(X, Y, Prod) :-
    (   number(X), number(Y)
    ->  Prod is X * Y
    ;   Prod = dot(X, Y)      % symbolic product for non-numbers
    ).
[1,3,5] X [2,4,6]
?- amul([1,3,5], [2,4,6],Prod).
Prod = [[2, 4, 6], [6, 12, 18], [10, 20, 30]].
[a,b,c] X [1,2,3,4]
?- amul([a,b,c],[1,2,3,4],Prod).
Prod = [[dot(a, 1), dot(a, 2), dot(a, 3), dot(a, 4)],
[dot(b, 1), dot(b, 2), dot(b, 3), dot(b, 4)],
[dot(c, 1), dot(c, 2), dot(c, 3), dot(c, 4)]].
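For comparison, here is a Scala sketch of the same outer product for numeric inputs (the name outer is hypothetical). It uses the standard Numeric type class, so it works for Int, Double and so on; it does not reproduce the symbolic dot(X, Y) fallback of the second answer.

```scala
object OuterProduct {
  // Row i is xs(i) multiplied by every element of ys,
  // like product/3 and amul/3 for numeric inputs.
  def outer[A](xs: List[A], ys: List[A])(implicit num: Numeric[A]): List[List[A]] =
    xs.map(x => ys.map(y => num.times(x, y)))
}
```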