Check if all first elements in tuple list satisfies condition - list

I want to check if a list is a subset of another, based on the first element in its tuple.
subset(List(('a', 1), ('b', 2), ('c', 3)), List(('a', 4), ('b', 5))) // True
subset(List(('a', 1), ('b', 2), ('c', 3)), List(('a', 4), ('b', 5), ('f', 6))) // False
The size of the lists does not have to be the same. I've tried something like this, but with no luck
x.forall((char: Char, num: Int) => {y.contains((_,num))})

You can map over the input lists to keep only the first element of each tuple, then use set operations to check whether one is a subset of the other:
def subset(a: List[(Char, Int)], b: List[(Char, Int)]): Boolean = {
  val a_ = a.map(_._1).toSet
  val b_ = b.map(_._1).toSet
  b_.subsetOf(a_)
}
Update: Simplified based on suggestion from Luis

Related

Delete duplicates in list of tuples [duplicate]

If I have the list of tuples as the following:
[('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
I would like to remove duplicate tuples (duplicate in terms of both content and order of items inside) so that the output would be:
[('a', 'b'), ('c', 'd')]
Or
[('b', 'a'), ('c', 'd')]
I tried converting it to set then to list but the output would maintain both ('b', 'a') and ('a', 'b') in the resulting set!
Try this :
a = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
b = list(set([ tuple(sorted(t)) for t in a ]))
[('a', 'b'), ('c', 'd')]
Let's break this down :
If you sort a tuple, it becomes a sorted list.
>>> t = ('b', 'a')
>>> sorted(t)
['a', 'b']
For each tuple t in a, sort it and convert it back to a tuple.
>>> b = [ tuple(sorted(t)) for t in a ]
>>> b
[('a', 'b'), ('c', 'd'), ('a', 'b'), ('a', 'b')]
Convert the resulting list b to a set : values are now unique. Convert it back to a list.
>>> list(set(b))
[('a', 'b'), ('c', 'd')]
Et voilà!
Note that you can skip the creation of the intermediate list b by using a generator instead of a list comprehension.
>>> list(set(tuple(sorted(t)) for t in a))
[('a', 'b'), ('c', 'd')]
If you don't mind using frozensets inside a set:
l = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
print(set(map(frozenset,l)))
{frozenset({'a', 'b'}), frozenset({'c', 'd'})}
You can convert back to tuple if preferable:
l = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
print(list(map(tuple,set(map(frozenset ,l)))))
[('a', 'b'), ('d', 'c')]
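One caveat with the frozenset approach, shown here as a small sketch: a frozenset also collapses repeated elements inside a tuple, so ('a', 'a') would come back as ('a',), and the element order of the rebuilt tuples is not guaranteed. If the tuples can contain repeated values, a sorted-tuple key avoids both issues. The sample data below is made up for illustration.

```python
pairs = [('a', 'a'), ('a', 'b'), ('b', 'a')]

# frozenset drops the repeated 'a', so that pair shrinks to a 1-tuple
via_frozenset = {tuple(fs) for fs in map(frozenset, pairs)}

# a sorted tuple keeps every element and gives a deterministic order
via_sorted = {tuple(sorted(t)) for t in pairs}

print(('a',) in via_frozenset)                 # True
print(via_sorted == {('a', 'a'), ('a', 'b')})  # True
```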
Or using a set and reversing the order of the tuples:
l = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
seen, pairs = set(), []
for a, b in l:
    if (a, b) not in seen and (b, a) not in seen:
        pairs.append((a, b))
    seen.add((a, b))
This can solve your problem if order is not important.
a = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
a = map(tuple, [sorted(i) for i in a])
print(list(set(a)))
Output:
[('a', 'b'), ('c', 'd')]
Just wanted to add a potential second solution if anyone has a use case where "first come, first serve" might matter.
For example, say we take three lists and merge them into a list of tuples:
# Make some lists (must be same size)
a = [1,1,1,2,8,6,1]
b = [2,4,6,1,4,21,69]
c = [2,8,21,2,1,1,8]
# Lists to list of tuples
arr = []
for i in range(len(a)):
    new_row = (a[i], b[i], c[i])
    arr.append(new_row)
Such that our original array looks like:
(1, 2, 2)
(1, 4, 8)
(1, 6, 21)
(2, 1, 2)
(8, 4, 1)
(6, 21, 1)
(1, 69, 8)
In our case, we want to remove items like (2,1,2) and (8,4,1) as they're equivalent to (1,2,2) and (1,4,8) respectively.
To do this, we can use a new empty list called filtered or something, and itertools.permutations() on each tuple in the original array.
First, we check if any permutation of each item is present in the filtered list.
If not, we add. If it is, we skip the duplicate.
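import itertools

filtered = []
for i in range(len(arr)):
    # Collect every ordering of the current tuple
    it = itertools.permutations(arr[i])
    perms = []
    for p in it:
        perms.append(p)
    # Skip the tuple if any of its orderings was already kept
    check = any(item in perms for item in filtered)
    if not check:
        filtered.append(arr[i])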
Now if we iterate over filtered and print, we see our truncated list of tuples:
(1, 2, 2)
(1, 4, 8)
(1, 6, 21)
(1, 69, 8)
Note that we're left with the first instance of each tuple of numbers, and not working via a set guarantees the same order of elements when iterating over the filtered list.
Only thing I'm not 100% on is the time/space complexity of doing it this way -- if anyone has feedback I'd love to hear about it.
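On the complexity question: for each k-element tuple, the loop above builds all k! permutations and then scans them once per entry already in filtered, so the cost grows quickly with both tuple length and list size. A minimal sketch of an alternative, assuming the tuple elements are hashable and mutually comparable (so they can be sorted): use the sorted tuple as a canonical key and remember the keys seen so far in a set. The function name dedupe_unordered is hypothetical.

```python
def dedupe_unordered(rows):
    """Keep the first occurrence of each tuple, treating permutations as equal."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(sorted(row))  # canonical form shared by all permutations
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

# Same data as above, built with zip instead of an index loop
arr = list(zip([1, 1, 1, 2, 8, 6, 1],
               [2, 4, 6, 1, 4, 21, 69],
               [2, 8, 21, 2, 1, 1, 8]))
filtered = dedupe_unordered(arr)
print(filtered)
# [(1, 2, 2), (1, 4, 8), (1, 6, 21), (1, 69, 8)]
```

This does one sort of k elements and one set lookup per tuple, and it still keeps the first instance of each group in its original position.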
Built-in types to the rescue:
data = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
set(map(frozenset, data))
{frozenset({'a', 'b'}), frozenset({'c', 'd'})}

How to sortby more than one value in pyspark

I am playing around with Spark. I tried the sortBy function in spark with some sample data
tmp = [('e', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5),('a',1)]
sc.parallelize(tmp).sortBy(lambda (x,y): y).collect()
This works fine and sorts by the integer value in the key value pair. What is required to sort as per key after sorting it integer wise?
sc.parallelize(tmp).sortBy(lambda (x,y): y,x).collect()
says x is not defined.
Desired output
('a', 1), ('e', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)
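Have you tried returning a tuple as the key?
sc.parallelize(tmp).sortBy(lambda (x, y): (y, x)).collect()
sortBy(lambda (x, y): y, x) is parsed as a function call with two arguments: the lambda, which returns only y, and a separate name x, which is not defined in the calling scope.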
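The tuple-key idea can be tried without a Spark cluster using the built-in sorted(), which compares tuple keys element by element; a key function passed to sortBy should behave the same way. This is a small sketch in plain Python 3, where tuple parameters in a lambda are no longer allowed, so the pair is indexed instead. The names kv and by_value_then_key are only illustrative.

```python
tmp = [('e', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5), ('a', 1)]

# Sort by the integer first, then by the string to break ties
by_value_then_key = sorted(tmp, key=lambda kv: (kv[1], kv[0]))
print(by_value_then_key)
# [('a', 1), ('e', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
```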

Slicing a Python OrderedDict

In my code I frequently need to take a subset range of keys+values from a Python OrderedDict (from collections package). Slicing doesn't work (throws TypeError: unhashable type) and the alternative, iterating, is cumbersome:
from collections import OrderedDict
o = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
# want to do:
# x = o[1:3]
# need to do:
x = OrderedDict()
for idx, key in enumerate(o):
    if 1 <= idx < 3:
        x[key] = o[key]
Is there a better way to get this done?
You can use the itertools.islice function, which takes an iterable and yields its first stop elements (or, given both start and stop, the elements in that range, so islice(o.items(), 1, 3) is the equivalent of o[1:3]). This is useful because plain iterables don't support slice syntax, and you don't need to build the whole items list from the OrderedDict.
from collections import OrderedDict
from itertools import islice
o = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
sliced = islice(o.items(), 3) # first three items; use o.iteritems() in Python 2.7
sliced_o = OrderedDict(sliced)
The ordered dict in the standard library doesn't provide that functionality, even though libraries that do (and that provide essentially a superset of OrderedDict) existed for a few years before collections.OrderedDict: voidspace odict and ruamel.ordereddict (I am the author of the latter package, which is a reimplementation of odict in C):
from odict import OrderedDict as odict
p = odict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print p[1:3]
In ruamel.ordereddict you can relax the ordered input requirement (AFAIK you cannot ask a derivative of dict whether its keys are ordered; recognising collections.OrderedDict would be a good addition to ruamel.ordereddict):
from ruamel.ordereddict import ordereddict
q = ordereddict(o, relax=True)
print q[1:3]
r = odict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print r[1:3]
If you want (or have to) stay within the standard library, you can subclass collections.OrderedDict and override __getitem__:
class SlicableOrderedDict(OrderedDict):
    def __getitem__(self, k):
        if not isinstance(k, slice):
            return OrderedDict.__getitem__(self, k)
        x = SlicableOrderedDict()
        for idx, key in enumerate(self.keys()):
            if k.start <= idx < k.stop:
                x[key] = self[key]
        return x
s = SlicableOrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print s[1:3]
Of course, you could use Martijn's or Jimmy's shorter versions to get the actual slice that needs returning:
from itertools import islice
class SlicableOrderedDict(OrderedDict):
    def __getitem__(self, k):
        if not isinstance(k, slice):
            return OrderedDict.__getitem__(self, k)
        return SlicableOrderedDict(islice(self.viewitems(), k.start, k.stop))
t = SlicableOrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print t[1:3]
Or, if you just want to smarten up all existing OrderedDicts without subclassing:
def get_item(self, k):
    if not isinstance(k, slice):
        return OrderedDict._old__getitem__(self, k)
    return OrderedDict(islice(self.viewitems(), k.start, k.stop))
OrderedDict._old__getitem__ = OrderedDict.__getitem__
OrderedDict.__getitem__ = get_item
u = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print u[1:3]
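The snippets in this answer are written for Python 2 (print statements, viewitems()). For Python 3, a rough sketch of the same subclassing idea might look like the following. It assumes that copying the items into a list is acceptable, which lets open-ended, negative and stepped slices behave as they do for lists. The class name SliceableOrderedDict is hypothetical.

```python
from collections import OrderedDict

class SliceableOrderedDict(OrderedDict):
    """OrderedDict that also accepts slice objects (illustrative helper)."""

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Materialise the items so that None, negative and stepped
            # slices follow ordinary list slicing rules.
            return type(self)(list(self.items())[key])
        return super().__getitem__(key)

d = SliceableOrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print(d[1:3])   # the items for 'b' and 'c'
print(d[-1:])   # the item for 'd'
print(d['a'])   # 1
```

The trade-off is an O(n) copy on every slice; the islice() variants above avoid the copy but only handle non-negative start and stop values.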
In Python 2, you can slice the keys:
x.keys()[1:3]
and to support both Python 2 and Python 3, you'd convert to a list first:
list(x)[1:3]
The Python 2 OrderedDict.keys() implementation does exactly that.
In both cases you are given a list of keys in correct order. If creating a whole list first is an issue, you can use itertools.islice() and convert the iterable it produces to a list:
from itertools import islice
list(islice(x, 1, 3))
All of the above also can be applied to the items; use dict.viewitems() in Python 2 to get the same iteration behaviour as Python 3 dict.items() provides. You can pass the islice() object straight to another OrderedDict() in this case:
OrderedDict(islice(x.items(), 1, 3)) # x.viewitems() in Python 2
I was able to slice an OrderedDict using the following:
list(myordereddict.values())[start:stop]
I didn't test the performance.
I wanted to slice using a key, since I didn't know the index in advance:
o = OrderedDict(zip(list('abcdefghijklmnopqrstuvwxyz'),range(1,27)))
stop = list(o.keys()).index('e') # -> 4
OrderedDict(islice(o.items(),stop)) # -> OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
or to slice from start to stop:
start = list(o.keys()).index('c') # -> 2
stop = list(o.keys()).index('e') # -> 4
OrderedDict(islice(o.items(),start,stop)) # -> OrderedDict([('c', 3), ('d', 4)])
def slice_odict(odict, start=None, end=None):
    # Compute the selected keys once instead of once per item
    keys = set(list(odict.keys())[start:end])
    return OrderedDict([
        (k, v) for (k, v) in odict.items()
        if k in keys
    ])
This allows for:
>>> x = OrderedDict([('a',1), ('b',2), ('c',3), ('d',4)])
>>> slice_odict(x, start=-1)
OrderedDict([('d', 4)])
>>> slice_odict(x, end=-1)
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> slice_odict(x, start=1, end=3)
OrderedDict([('b', 2), ('c', 3)])
x = OrderedDict(list(o.items())[1:3]) # list() is needed in Python 3, where items() returns a view

Composing a list of all pairs

I'm brand new to Scala, having had very limited experience with functional programming through Haskell.
I'd like to try composing a list of all possible pairs constructed from a single input list. Example:
val nums = List[Int](1, 2, 3, 4, 5) // Create an input list
val pairs = composePairs(nums) // Function I'd like to create
// pairs == List[Int, Int]((1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1) ... etc)
I tried using zip on each element with the whole list, hoping that it would duplicate the one item across the whole. It didn't work (only matched the first possible pair). I'm not sure how to repeat an element (Haskell does it with cycle and take I believe), and I've had trouble following the documentation on Scala.
This leaves me thinking that there's probably a more concise, functional way to get the results I want. Does anybody have a good solution?
How about this:
val pairs = for(x <- nums; y <- nums) yield (x, y)
For those of you who don't want duplicates:
val nums = List(1,2,3,4,5)
val uniquePairs = for {
  (x, idxX) <- nums.zipWithIndex
  (y, idxY) <- nums.zipWithIndex
  if idxX < idxY
} yield (x, y)
uniquePairs: List[(Int, Int)] = List((1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
Here's another version using flatMap and map:
val pairs = nums.flatMap(x => nums.map(y => (x,y)))
List[(Int, Int)] = List((1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (2,5), (3,1), (3,2), (3,3), (3,4), (3,5), (4,1), (4,2), (4,3), (4,4), (4,5), (5,1), (5,2), (5,3), (5,4), (5,5))
This can then be easily wrapped into a composePairs function if you like:
def composePairs(nums: Seq[Int]) =
  nums.flatMap(x => nums.map(y => (x,y)))

How to merge and diff lists in Scala?

How do I perform an operation that compares the IDs of two lists, removes the elements that appear only in the first list, and adds the elements that appear only in the second list?
The letters are entity IDs. The numbers are object references in memory.
List 1: A:1, B:2, C:3, D:4, E:5
List 2: B:6, C:7, E:8, F:9
RemovedElements: A:1, D:4
InvalidElements: B:6, C:7, E:8
ResultList: B:2, C:3, E:5, F:9
Does anyone know if there is any function that performs this operation?
scala> val l1 = Seq(('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', 5))
l1: Seq[(Char, Int)] = List((A,1), (B,2), (C,3), (D,4), (E,5))
scala> val l2 = Seq(('B', 6), ('C', 7), ('E', 8), ('F', 9))
l2: Seq[(Char, Int)] = List((B,6), (C,7), (E,8), (F,9))
scala> l2 map { e =>
| l1.find(_._1 == e._1).getOrElse(e)
| }
res51: Seq[(Char, Int)] = List((B,2), (C,3), (E,5), (F,9))