How do I flatten a bag full of tuples into a bag? - mapreduce

Like, say I have {{(1, a), (2, b)},{(3, c), (4, d)}}. How do I get {(1, a), (2, b), (3, c), (4, d)} from this?

Use FLATTEN; details are in the Pig Latin documentation.
A = LOAD 'data';
B = FOREACH A GENERATE FLATTEN($0);
DUMP B;
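For readers more at home in Python, the effect of flattening a bag of bags can be sketched with plain lists. This is only an analogy for what FLATTEN does to a single bag-valued column, not Pig code, and the variable names are made up for illustration.

```python
from itertools import chain

# A "bag of bags" modelled as a list of lists of tuples (illustrative data).
nested = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]

# Flatten one level: every tuple of every inner bag becomes an element
# of a single outer bag.
flat = list(chain.from_iterable(nested))
print(flat)
```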

Related

Django get a list of related id's linked to each parent record id in a query set?

I have a relationship client has many projects.
I want to create a dictionary of the form:
{
'client_id': ['project_id1', 'project_id2'],
'client_id2': ['project_id7', 'project_id8'],
}
what I tried was;
clients_projects = Client.objects.values_list('id', 'project__id')
which gave me:
<QuerySet [(3, 4), (3, 5), (3, 11), (3, 12), (2, 3), (2, 13), (4, 7), (4, 8), (4, 9), (1, 1), (1, 2), (1, 6), (1, 10)]>
which I can cast to a list with list(clients_projects):
[(3, 4),
(3, 5),
(3, 11),
(3, 12),
(2, 3),
(2, 13),
(4, 7),
(4, 8),
(4, 9),
(1, 1),
(1, 2),
(1, 6),
(1, 10)]
Assuming that projects is the reverse relation (the ReverseManyToOneDescriptor) attached to the Client model, you can write:
clients_projects = { c.id : c.projects.values_list('id', flat=True) for c in Client.objects.all() }
This problem is quite similar to this one: Django GROUP BY field value.
Since Django doesn't provide a group_by (yet) you need to manually replicate the behavior:
result = {}
for client in Client.objects.all().distinct():
    # One query per client: collect the ids of that client's projects.
    result[client.id] = list(
        Client.objects.filter(id=client.id).values_list('project__id', flat=True)
    )
Breakdown:
Get a set of distinct clients from your Client model and iterate through them. (You can also order that set if you wish, by adding .order_by('id') for example.)
Because you need only the project__id values, you can use values_list()'s flat=True argument, which yields single values instead of 1-tuples; wrapping the result in list() turns the QuerySet into a plain list.
Finally, result will look like this:
{
'client_1_id': [1, 10, ...],
'client_5_id': [2, 5, ...],
...
}
There is a module that claims to add GROUP BY functionality to Django: https://github.com/kako-nawao/django-group-by. I haven't used it, so I am only listing it here, not recommending it.
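Another option, sketched here under the assumption that the (client id, project id) pairs from the values_list('id', 'project__id') call in the question are already in hand, is to group them in Python in a single pass, which avoids running one query per client. group_pairs is a hypothetical helper name, not part of Django.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Group (parent_id, child_id) pairs into {parent_id: [child_id, ...]}."""
    grouped = defaultdict(list)
    for parent_id, child_id in pairs:
        # Touch the key first so parents without children still appear.
        children = grouped[parent_id]
        if child_id is not None:  # assumed: a parent with no child shows up as None
            children.append(child_id)
    return dict(grouped)


# Sample data copied from the question; in a real project this would be
# Client.objects.values_list('id', 'project__id').
pairs = [(3, 4), (3, 5), (3, 11), (3, 12), (2, 3), (2, 13), (4, 7),
         (4, 8), (4, 9), (1, 1), (1, 2), (1, 6), (1, 10)]
clients_projects = group_pairs(pairs)
print(clients_projects)
```

The keys come out as integer ids rather than the string keys shown in the question's example; converting them is a one-line change if strings are really needed.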

how to sort list in python which has two numbers per index value?

My code
b=[((1,1)),((1,2)),((2,1)),((2,2)),((1,3))]
for i in range(len(b)):
    print(b[i])
Obtained output:
(1, 1)
(1, 2)
(2, 1)
(2, 2)
(1, 3)
How do I sort this list by the first and/or second element of each tuple to get the output as:
(1, 1)
(1, 2)
(1, 3)
(2, 1)
(2, 2)
OR
(1, 1)
(2, 1)
(1, 2)
(2, 2)
(1, 3)
It would be nice if both columns were sorted as shown in the desired output; however, if either of the columns is sorted, that will suffice.
Try this: b = sorted(b, key = lambda i: (i[0], i[1]))
The sorted builtin does this.
>>> sorted(b)
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
This sorts by the first element and breaks ties on the second, because tuples compare element by element. To sort on the second element only:
>>> sorted(b, key=lambda i: i[1])
[(1, 1), (2, 1), (1, 2), (2, 2), (1, 3)]
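Also notice that the extra parentheses do not create nested tuples: ((1,1)) is just (1,1) wrapped in redundant grouping parentheses, so Python reduces it to a single tuple. (A one-element tuple needs a trailing comma, as in ((1,1),).)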
>>> b=[((1,1)),((1,2)),((2,1)),((2,2)),((1,3))]
>>> b
[(1, 1), (1, 2), (2, 1), (2, 2), (1, 3)]
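To sort on the second element and break ties on the first, the key can return the elements in that order. A small sketch using operator.itemgetter from the standard library, which builds the same kind of key function as the lambdas above:

```python
from operator import itemgetter

b = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 3)]

# Default tuple ordering: first element, then second.
by_first_then_second = sorted(b)

# Second element only; sorted() is stable, so ties keep their input order.
by_second_only = sorted(b, key=itemgetter(1))

# Second element, then first element as an explicit tie-breaker.
by_second_then_first = sorted(b, key=itemgetter(1, 0))

print(by_first_then_second)
print(by_second_only)
print(by_second_then_first)
```

For this particular input the last two results are identical, because the ties already appear in ascending order of the first element.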

Ungrouping a (key, list(values)) pair in Spark/Scala

I have data formatted in the following way:
DataRDD = [(String, List[String])]
The first string indicates the key and the list houses the values. Note that the number of values is different for each key (but is never zero). I am looking to map the RDD in such a way that there will be a key, value pair for each element in the list. To clarify this, imagine the whole RDD as the following list:
DataRDD = [(1, [a, b, c]),
(2, [d, e]),
(3, [a, e, f])]
Then I would like the result to be:
DataKV = [(1, a),
(1, b),
(1, c),
(2, d),
(2, e),
(3, a),
(3, e),
(3, f)]
Subsequently, I would like to return, for each key, all other keys that share at least one value with it. This may be returned as a list for each key, even when there are no identical values:
DataID = [(1, [3]),
(2, [3]),
(3, [1, 2])]
Since I'm fairly new to Spark and Scala, I have yet to fully grasp their concepts, so I hope one of you can help me, even if it's just with a part of this.
This is definitely a newbie question that comes up often. The solution is to use flatMapValues:
val DataRDD = sc.parallelize(Array((1, Array("a", "b", "c")), (2, Array("d", "e")),(3, Array("a", "e", "f"))))
DataRDD.flatMapValues(x => x).collect
which gives the desired result:
Array((1,a), (1,b), (1,c), (2,d), (2,e), (3,a), (3,e), (3,f))
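The second part of the question (which keys share at least one value) is not covered by the answer. The logic can be sketched in plain Python, without Spark, assuming the data fits in memory; in Spark the same idea would need a grouping step on the value, which is not shown here. All names below are illustrative.

```python
from collections import defaultdict

data = [(1, ["a", "b", "c"]), (2, ["d", "e"]), (3, ["a", "e", "f"])]

# Plain-Python counterpart of flatMapValues(x => x):
# one (key, value) pair per list element.
data_kv = [(key, value) for key, values in data for value in values]

# Invert the pairs: value -> set of keys whose list contains it.
keys_by_value = defaultdict(set)
for key, value in data_kv:
    keys_by_value[value].add(key)

# For each key, collect every other key that shares at least one value.
data_id = []
for key, values in data:
    others = set()
    for value in values:
        others |= keys_by_value[value]
    others.discard(key)  # a key trivially shares values with itself
    data_id.append((key, sorted(others)))

print(data_kv)
print(data_id)
```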

Changing nested list of numbers to nested list of tuples

Can someone please help me with code that turns this nested list of numbers into the nested list of tuples below, i.e. from pot to val?
pot = [[1,2,3,4],[5,6,7,8]]
val = [[(1,2),(2,3),(3,4)],[(5,6),(6,7),(7,8)]]
I used a grouper function but it didn't quite give me the desired result. Is there another way ? Thanks
val = []
for line in pot:
    temp = []
    # Pair each element with its successor.
    for i in range(len(line)-1):
        temp.append( (line[i],line[i+1]) )
    val.append(temp)
May contain typos.
There's probably a better way to do it but I noticed there was no answer to your question and worked something out that does the job:
pot = [[1,2,3,4],[5,6,7,8]]
val = []
for sublist in pot:
    temp = []
    for n in range(1, len(sublist)):
        temp.append((sublist[n-1], sublist[n]))
    val.append(temp)
print(val)
prints
[[(1, 2), (2, 3), (3, 4)], [(5, 6), (6, 7), (7, 8)]]
I'm new to Python. I just started learning it because I'm working on a project that requires it, but I think this solves your question.
pot = [[1,2,3,4],[5,6,7,8]]
inner = []
val = []
a = 0
b = 0
for L in pot:
    for x in range(len(L)):
        if x>0:
            a = L[x-1]
            b = L[x]
            inner.append((a,b))
    val.append(inner)
    inner = []
print(val)
My output running python 2.7 is:
[[(1, 2), (2, 3), (3, 4)], [(5, 6), (6, 7), (7, 8)]]
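A shorter variant of the loops above pairs each sublist with a copy of itself shifted by one position, using zip. This is a sketch that should behave the same in Python 2.7 and Python 3.

```python
pot = [[1, 2, 3, 4], [5, 6, 7, 8]]

# zip(row, row[1:]) stops at the shorter argument, so each element is
# paired with its successor and the last element starts no pair.
val = [list(zip(row, row[1:])) for row in pot]
print(val)
```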

Composing a list of all pairs

I'm brand new to Scala, having had very limited experience with functional programming through Haskell.
I'd like to try composing a list of all possible pairs constructed from a single input list. Example:
val nums = List[Int](1, 2, 3, 4, 5) // Create an input list
val pairs = composePairs(nums) // Function I'd like to create
// pairs == List[(Int, Int)]((1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1) ... etc)
I tried using zip on each element with the whole list, hoping that it would duplicate the one item across the whole. It didn't work (only matched the first possible pair). I'm not sure how to repeat an element (Haskell does it with cycle and take I believe), and I've had trouble following the documentation on Scala.
This leaves me thinking that there's probably a more concise, functional way to get the results I want. Does anybody have a good solution?
How about this:
val pairs = for(x <- nums; y <- nums) yield (x, y)
For those of you who don't want duplicates:
val uniquePairs = for {
  (x, idxX) <- nums.zipWithIndex
  (y, idxY) <- nums.zipWithIndex
  if idxX < idxY
} yield (x, y)
With val nums = List(1,2,3,4,5), this gives:
uniquePairs: List[(Int, Int)] = List((1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
Here's another version using flatMap and map:
val pairs = nums.flatMap(x => nums.map(y => (x,y)))
List[(Int, Int)] = List((1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (2,5), (3,1), (3,2), (3,3), (3,4), (3,5), (4,1), (4,2), (4,3), (4,4), (4,5), (5,1), (5,2), (5,3), (5,4), (5,5))
This can then be easily wrapped into a composePairs function if you like:
def composePairs(nums: Seq[Int]) =
  nums.flatMap(x => nums.map(y => (x,y)))
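For comparison with the Python questions earlier on this page, the same two results can be sketched with the standard itertools module. This is a Python analogue of the Scala answers, not Scala code.

```python
from itertools import combinations, product

nums = [1, 2, 3, 4, 5]

# Every ordered pair, including (x, x): the counterpart of the
# for/yield and flatMap versions.
pairs = list(product(nums, repeat=2))

# Pairs taken by position, first index smaller than the second:
# the counterpart of the zipWithIndex version.
unique_pairs = list(combinations(nums, 2))

print(len(pairs), len(unique_pairs))
```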