Do set.add() and list serve the same purpose in Python?

As the title says: do the two serve the same purpose?
Most of the time I choose a list, and I don't know when set.add() would be the better choice.
I tried both and they gave me exactly the same result...
Personally, I feel a list is better. What do you think?
a = set()
a.add('a1')
a.add('a2')
a.add('a3')
for ele in a:
    print ele
b = []
b.append('a1')
b.append('a2')
b.append('a3')
for ele in b:
    print ele
Please advise...

In terms of general data structures, a set structure tends to allow only one element of each value whereas a list may have more than one of each.
In other words, the pseudo-code set.add(7) executed twice results in the set containing the single element 7 (or an error if it considers adding the same element twice to be invalid).
Using a list instead of a set would result in two elements, both being 7.
For Python specifically, adding duplicates to a set is not an error but it still plainly only allows one of each:
>>> s = set()
>>> s.add(1)
>>> s.add(1)
>>> s.add(2)
>>> s
set([1, 2])
The list on the other hand allows multiples:
>>> l = list()
>>> l.append(1)
>>> l.append(1)
>>> l.append(2)
>>> l
[1, 1, 2]
The reason why you didn't see a difference is simply because you added three unique items to the list and set. In that context, they act the same. Behaviour only diverges when you add duplicate items.
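To make the divergence concrete, here is a minimal sketch (written for Python 3, with made-up variable names) that feeds the same values, including one duplicate, into both containers:

```python
# The same values go into both containers, including a duplicate 'a1'.
values = ['a1', 'a2', 'a1', 'a3']

as_list = []
as_set = set()
for value in values:
    as_list.append(value)   # keeps every occurrence, in insertion order
    as_set.add(value)       # silently ignores the repeated 'a1'

print(len(as_list))          # 4
print(len(as_set))           # 3
print(as_list.count('a1'))   # 2
print('a2' in as_set)        # True
```

As a rule of thumb, a list fits when order or repeated values matter, and a set fits when you only care whether a value is present.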

How to maintain order of insertion in dictionary in python? [duplicate]

I have a dictionary that I declared in a particular order and want to keep it in that order all the time. The keys/values can't really be kept in order based on their value, I just want it in the order that I declared it.
So if I have the dictionary:
d = {'ac': 33, 'gw': 20, 'ap': 102, 'za': 321, 'bs': 10}
It isn't in that order if I view it or iterate through it. Is there any way to make sure Python will keep the explicit order that I declared the keys/values in?
From Python 3.6 onwards, the standard dict type maintains insertion order by default.
Defining
d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10}
will result in a dictionary with the keys in the order listed in the source code.
This was achieved by using a simple array with integers for the sparse hash table, where those integers index into another array that stores the key-value pairs (plus the calculated hash). That latter array just happens to store the items in insertion order, and the whole combination actually uses less memory than the implementation used in Python 3.5 and before. See the original idea post by Raymond Hettinger for details.
In 3.6 this was still considered an implementation detail; see the What's New in Python 3.6 documentation:
The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5).
Python 3.7 elevates this implementation detail to a language specification, so it is now mandatory that dict preserves order in all Python implementations compatible with that version or newer. See the pronouncement by the BDFL. As of Python 3.8, dictionaries also support iteration in reverse.
You may still want to use the collections.OrderedDict() class in certain cases, as it offers some additional functionality on top of the standard dict type, such as being reversible (this extends to the view objects) and supporting reordering (via the move_to_end() method).
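A short sketch of both points, assuming Python 3.7 or newer: a plain dict iterates in insertion order, while OrderedDict additionally lets you reorder existing keys.

```python
from collections import OrderedDict

d = {'ac': 33, 'gw': 20, 'ap': 102, 'za': 321, 'bs': 10}
print(list(d))                     # ['ac', 'gw', 'ap', 'za', 'bs']

od = OrderedDict(d)
od.move_to_end('ac')               # move an existing key to the end
od.move_to_end('bs', last=False)   # move an existing key to the front
print(list(od))                    # ['bs', 'gw', 'ap', 'za', 'ac']
print(list(reversed(od)))          # ['ac', 'za', 'ap', 'gw', 'bs']
```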
from collections import OrderedDict
OrderedDict((word, True) for word in words)
contains
OrderedDict([('He', True), ('will', True), ('be', True), ('the', True), ('winner', True)])
If the values are True (or any other immutable object), you can also use:
OrderedDict.fromkeys(words, True)
Rather than explaining the theoretical part, I'll give a simple example.
>>> from collections import OrderedDict
>>> my_dictionary=OrderedDict()
>>> my_dictionary['foo']=3
>>> my_dictionary['aol']=1
>>> my_dictionary
OrderedDict([('foo', 3), ('aol', 1)])
>>> dict(my_dictionary)
{'foo': 3, 'aol': 1}
Note that this answer applies to Python versions prior to Python 3.7. CPython 3.6 maintains insertion order under most circumstances as an implementation detail. From Python 3.7 onward, it has been declared that implementations MUST maintain insertion order to be compliant.
Python dictionaries are unordered. If you want an ordered dictionary, try collections.OrderedDict.
Note that OrderedDict was introduced into the standard library in Python 2.7. If you have an older version of Python, you can find recipes for ordered dictionaries on ActiveState.
Dictionaries will use an order that makes searching efficient, and you can't change that.
You could just use a list of objects (a 2 element tuple in a simple case, or even a class), and append items to the end. You can then use linear search to find items in it.
Alternatively you could create or use a different data structure created with the intention of maintaining order.
I came across this post while trying to figure out how to get OrderedDict to work. PyDev for Eclipse couldn't find OrderedDict at all, so I ended up deciding to make a tuple of my dictionary's key values as I would like them to be ordered. When I needed to output my list, I just iterated through the tuple's values and plugged the iterated 'key' from the tuple into the dictionary to retrieve my values in the order I needed them.
example:
test_dict = dict( val1 = "hi", val2 = "bye", val3 = "huh?", val4 = "what....")
test_tuple = ( 'val1', 'val2', 'val3', 'val4')
for key in test_tuple: print(test_dict[key])
It's a tad cumbersome, but I'm pressed for time and it's the workaround I came up with.
note: the list of lists approach that somebody else suggested does not really make sense to me, because lists are ordered and indexed (and are also a different structure than dictionaries).
You can't really do what you want with a dictionary. You already have the dictionary d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10} created. I found there was no way to keep it in order once it has been created. What I did instead was make a JSON file with the object:
{"ac":33,"gw":20,"ap":102,"za":321,"bs":10}
I used:
r = json.load(open('file.json'), object_pairs_hook=OrderedDict)
then used:
print json.dumps(r)
to verify.
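For reference, a self-contained Python 3 sketch of the same idea, parsing from a string (the variable name text is just for illustration) instead of a file so it can be run as-is:

```python
import json
from collections import OrderedDict

text = '{"ac":33,"gw":20,"ap":102,"za":321,"bs":10}'

# object_pairs_hook receives the key/value pairs in document order.
r = json.loads(text, object_pairs_hook=OrderedDict)

print(list(r))          # ['ac', 'gw', 'ap', 'za', 'bs']
print(json.dumps(r))    # {"ac": 33, "gw": 20, "ap": 102, "za": 321, "bs": 10}
```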
from collections import OrderedDict
list1 = ['k1', 'k2']
list2 = ['v1', 'v2']
new_ordered_dict = OrderedDict(zip(list1, list2))
print new_ordered_dict
# OrderedDict([('k1', 'v1'), ('k2', 'v2')])
Another alternative is to use Pandas dataframe as it guarantees the order and the index locations of the items in a dict-like structure.
I had a similar problem when developing a Django project. I couldn't use OrderedDict, because I was running an old version of python, so the solution was to use Django's SortedDict class:
https://code.djangoproject.com/wiki/SortedDict
e.g.,
from django.utils.datastructures import SortedDict
d2 = SortedDict()
d2['b'] = 1
d2['a'] = 2
d2['c'] = 3
Note: This answer is originally from 2011. If you have access to Python version 2.7 or higher, then you should have access to the now standard collections.OrderedDict, of which many examples have been provided by others in this thread.
Generally, you can design a class that behaves like a dictionary, mainly by implementing the methods __contains__, __getitem__, __delitem__, __setitem__ and some more. That class can have any behaviour you like, for example providing a sorted iterator over the keys ...
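One possible way to sketch such a class in Python 3 is to subclass collections.abc.MutableMapping, which derives __contains__, keys(), items() and friends from five core methods. The class name SortedKeyDict is hypothetical, not something from the standard library:

```python
from collections.abc import MutableMapping

class SortedKeyDict(MutableMapping):
    """Hypothetical dict-like class that iterates over its keys in sorted order."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        # Sorting on every iteration is simple but costs O(n log n) each time.
        return iter(sorted(self._data))

    def __len__(self):
        return len(self._data)

d = SortedKeyDict(b=1, a=2, c=3)
d['aa'] = 4
print(list(d))           # ['a', 'aa', 'b', 'c']
print('b' in d)          # True
print(list(d.items()))   # [('a', 2), ('aa', 4), ('b', 1), ('c', 3)]
```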
If you would like to have a dictionary in a specific order, you can also create a list of lists, where the first item of each inner list is the key and the second item is the value. For example:
>>> pairs = [[1,2],[2,3]]
>>> for i in pairs:
...     print i[0]
...     print i[1]
1
2
2
3
You can do the same thing I did for a dictionary.
Create a list and an empty dictionary:
dictionary_items = {}
fields = [['Name', 'Himanshu Kanojiya'], ['email id', 'hima#gmail.com']]
l = fields[0][0]
m = fields[0][1]
n = fields[1][0]
q = fields[1][1]
dictionary_items[l] = m
dictionary_items[n] = q
print dictionary_items

Get a generator to return first n combinations [duplicate]

With LINQ I would write
var top5 = array.Take(5);
How to do this with Python?
Slicing a list
top5 = array[:5]
To slice a list, there's a simple syntax: array[start:stop:step]
You can omit any parameter. These are all valid: array[start:], array[:stop], array[::step]
Slicing a generator
import itertools
top5 = itertools.islice(my_list, 5) # grab the first five elements
You can't slice a generator directly in Python. itertools.islice() will wrap an object in a new slicing generator using the syntax itertools.islice(generator, start, stop, step)
Remember, slicing a generator will exhaust it partially. If you want to keep the entire generator intact, perhaps turn it into a tuple or list first, like: result = tuple(generator)
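A small Python 3 sketch of that partial exhaustion, using a made-up infinite generator squares():

```python
import itertools

def squares():
    """Made-up infinite generator used only for this illustration."""
    n = 0
    while True:
        yield n * n
        n += 1

gen = squares()
top5 = list(itertools.islice(gen, 5))   # consumes exactly five items
print(top5)        # [0, 1, 4, 9, 16]
print(next(gen))   # 25

# islice does not complain when the source is shorter than requested.
short = list(itertools.islice(iter([1, 2]), 5))
print(short)       # [1, 2]
```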
import itertools
top5 = itertools.islice(array, 5)
@Shaikovsky's answer is excellent, but I wanted to clarify a couple of points.
[next(generator) for _ in range(n)]
This is the simplest approach, but it raises StopIteration if the generator is exhausted early.
On the other hand, the following approaches return up to n items which is preferable in many circumstances:
List:
[x for _, x in zip(range(n), records)]
Generator:
(x for _, x in zip(range(n), records))
To my taste, it's also very concise to combine zip() with xrange(n) (or range(n) in Python 3), which works nicely on generators as well and seems to be more flexible for changes in general.
# Option #1: taking the first n elements as a list
[x for _, x in zip(xrange(n), generator)]
# Option #2, using 'next()' and taking care for 'StopIteration'
[next(generator) for _ in xrange(n)]
# Option #3: taking the first n elements as a new generator
(x for _, x in zip(xrange(n), generator))
# Option #4: yielding them by simply preparing a function
# (but take care for 'StopIteration')
def top_n(n, generator):
    for _ in xrange(n):
        yield next(generator)
The answer for how to do this can be found here
>>> generator = (i for i in xrange(10))
>>> list(next(generator) for _ in range(4))
[0, 1, 2, 3]
>>> list(next(generator) for _ in range(4))
[4, 5, 6, 7]
>>> list(next(generator) for _ in range(4))
[8, 9]
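Notice that the last call asks for the next 4 when only 2 are remaining. Using list() around a generator expression instead of [] is what lets the expression terminate on the StopIteration exception raised by next(). Note that this relies on Python 2 behaviour: since Python 3.7 (PEP 479), a StopIteration raised inside a generator expression is converted to a RuntimeError, so this pattern no longer works there.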
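For Python 3, a sketch of the same chunked reading that does not depend on StopIteration leaking out of next() could use itertools.islice. The helper name take is an assumption for illustration (it mirrors a recipe in the itertools documentation):

```python
from itertools import islice

def take(n, iterable):
    """Return up to n items from iterable as a list (illustrative helper)."""
    return list(islice(iterable, n))

generator = (i for i in range(10))
print(take(4, generator))   # [0, 1, 2, 3]
print(take(4, generator))   # [4, 5, 6, 7]
print(take(4, generator))   # [8, 9]
print(take(4, generator))   # []
```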
Do you mean the first N items, or the N largest items?
If you want the first:
top5 = sequence[:5]
This also works for the largest N items, assuming that your sequence is sorted in descending order. (Your LINQ example seems to assume this as well.)
If you want the largest, and it isn't sorted, the most obvious solution is to sort it first:
l = list(sequence)
l.sort(reverse=True)
top5 = l[:5]
For a more performant solution, use a min-heap (thanks Thijs):
import heapq
top5 = heapq.nlargest(5, sequence)
With itertools you obtain another generator object, so in most cases you will need another step to take the first n elements. There are at least two simpler solutions (slightly less efficient, but very handy) to get the elements ready to use from a generator:
Using a list comprehension:
first_n_elements = [next(generator) for i in range(n)]
Otherwise:
first_n_elements = list(generator)[:n]
Where n is the number of elements you want to take (e.g. n=5 for the first five elements).
This should work
top5 = array[:5]

Top 5 or N lists from nested lists, ranked by one element of each nested list (a little bit complex)

I have a list like the following:
m=[['abc','x-name',222],['pqr','y-name',333],['mno','j-name',333],['qrt','z-name',111],['dcu','lz-name',999]]
Let's say I want to get the top 2 out of this list, considering the 3rd column (i.e. 222, etc.).
I know I can get the max one like this:
>>> m=[['abc','x-name',222],['pqr','y-name',333],['mno','j-name',333],['qrt','z-name',111],['dcu','lz-name',999]]
>>> print max(m, key=lambda x: x[2])
['dcu', 'lz-name', 999]
But I need the top 2 (counting duplicates), so my result should be
['dcu', 'lz-name', 999] ['pqr','y-name',333] ['mno','j-name',333]
Is it possible? My head is spinning trying to figure it out; could you please have a look and help me?
Or, an idea I just had:
you could tell me how to delete the max element, so that I can get the top 2 elements by iterating (duplicates will be a problem, though).
You can sort and slice instead:
>>> from operator import itemgetter
>>> sorted(m, key=itemgetter(2), reverse=True)[:3]
[['dcu', 'lz-name', 999], ['pqr', 'y-name', 333], ['mno', 'j-name', 333]]
Or, using the heapq.nlargest():
>>> import heapq
>>> heapq.nlargest(3, m, key=itemgetter(2))
[['dcu', 'lz-name', 999], ['pqr', 'y-name', 333], ['mno', 'j-name', 333]]
This, though, would not handle the duplicates nicely and it is not of linear time complexity, plus it would create a sorted copy of the initial list in memory. Please see the following threads for linear-time and more memory-efficient solutions:
Get the second largest number in a list in linear time
Best way to sort 1M records in Python
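If ties should be kept, one possible approach (a sketch, not taken from the linked threads) is to pick the n largest distinct values of the key first and then keep every row whose key is among them. The function name top_n_with_ties is made up for this example:

```python
import heapq
from operator import itemgetter

m = [['abc', 'x-name', 222], ['pqr', 'y-name', 333], ['mno', 'j-name', 333],
     ['qrt', 'z-name', 111], ['dcu', 'lz-name', 999]]

def top_n_with_ties(rows, n, key):
    """Keep all rows whose key is one of the n largest distinct key values."""
    top_values = set(heapq.nlargest(n, {key(row) for row in rows}))
    kept = [row for row in rows if key(row) in top_values]
    # sorted() is stable, so tied rows keep their original relative order.
    return sorted(kept, key=key, reverse=True)

print(top_n_with_ties(m, 2, itemgetter(2)))
# [['dcu', 'lz-name', 999], ['pqr', 'y-name', 333], ['mno', 'j-name', 333]]
```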

Python: Cleaner ways to initialize

Or maybe I should say, ways to skip having to initialize at all.
I really hate that every time I want a simple count variable, I have to say, "hey Python, this variable starts at 0." I want to be able to say count+=1 and have it instantly know to start from 0 at the first iteration of the loop. Maybe there's some sort of function I can design to accommodate this? Something like count(1), which adds 1 to a self-created internal count variable that sticks around between iterations of the loop.
I have the same dislike for editing strings/lists into a new string/list.
(Initializing new_string=""/new_list=[] before the loop).
I think list comprehensions may work for some lists.
Does anyone have some pointers for how to solve this problem? I am fairly new, I've only been programming off and on for half a year.
Disclaimer: I do not think that this will make initialization any cleaner. Also, in case you have a typo in some uses of your counter variable, you will not get a NameError but instead it will just silently create and increment a second counter. Remember the Zen of Python:
Explicit is better than implicit.
Having said that, you could create a special class that will automatically add missing attributes and use this class to create and auto-initialize all sorts of counters:
class Counter:
    def __init__(self, default_func=int):
        self.default = default_func
    def __getattr__(self, name):
        if name not in self.__dict__:
            self.__dict__[name] = self.default()
        return self.__dict__[name]
Now you can create a single instance of that class to create an arbitrary number of counters of the same type. Example usage:
>>> c = Counter()
>>> c.foo
0
>>> c.bar += 1
>>> c.bar += 2
>>> c.bar
3
>>> l = Counter(list)
>>> l.blub += [1,2,3]
>>> l.blub
[1, 2, 3]
In fact, this is similar to what collections.defaultdict does, except that you can use dot-notation for accessing the counters, i.e. c.foo instead of c['foo']. Come to think of it, you could even extend defaultdict, making the whole thing much simpler:
import collections
class Counter(collections.defaultdict):
    # Create instances with a factory, e.g. Counter(int) or Counter(list).
    def __getattr__(self, name):
        return self[name]
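For the plain counting case, the standard library may already be enough. The following Python 3 sketch uses collections.defaultdict and the standard collections.Counter (a different class from the Counter defined in this answer) on a made-up word list:

```python
from collections import Counter, defaultdict

words = ['spam', 'egg', 'spam', 'ham', 'spam']

# defaultdict(int) creates a missing counter as 0 on first access.
counts = defaultdict(int)
for word in words:
    counts[word] += 1
print(dict(counts))                     # {'spam': 3, 'egg': 1, 'ham': 1}

# collections.Counter does the counting in one call.
print(Counter(words).most_common(1))    # [('spam', 3)]
```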
If you are using a counter in a for loop, you can use enumerate:
for counter, item in enumerate(items):
The counter is the first variable in the statement, and 1 is added to it on each iteration of the loop; the second variable is the list's value for that iteration. I hope this answers your first question. As for your second, the following code might help:
list_a = ["this", "is"]
list_b = ["a", "test"]
list_a += list_b
print(list_a)
['this', 'is', 'a', 'test']
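The += operator works for strings as well, because they are sequences too. Strings are immutable, though, so += builds a new string rather than changing the existing one in place. Hope this helps!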
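As a rough Python 3 illustration of avoiding the explicit initializations mentioned in the question, the sketch below uses enumerate, a list comprehension, str.join and sum; the sample data is made up:

```python
words = ['this', 'is', 'a', 'test']

# enumerate supplies the counter, so no "count = 0" line is needed.
for count, word in enumerate(words, start=1):
    print(count, word)

# A list comprehension replaces "new_list = []" plus append in a loop.
upper_words = [word.upper() for word in words]

# str.join replaces "new_string = ''" plus += in a loop.
sentence = ' '.join(upper_words)
print(sentence)       # THIS IS A TEST

# sum over a generator expression replaces a manual running total.
total_chars = sum(len(word) for word in words)
print(total_chars)    # 11
```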

Making two lists identical in python 2.7

I have two lists in python.
a=[1,4,5]
b=[4,1,5]
What I need is to order b according to a. Is there any way to do this simply, without any loops?
The easiest way to do this would be to use zip to combine the elements of the two lists into tuples:
a, b = zip(*sorted(zip(a, b)))
sorted will compare the tuples by their first element (the element from a) first; zip(*...) will "unzip" the sorted list.
Or maybe just check that both lists contain the same items, then copy list a to b:
if all(x in b for x in a) and len(a)==len(b):
    b=a[:]
If you want to make list2 identical to list1, you don't need to mess with order or re-arrange anything, just replace list2 with a copy of list1:
list2 = list(list1)
list() takes any iterable and produces a new list from it, so we can use this to copy list1, thus creating two lists that are exactly the same.
It might also be possible to just do list2 = list1, but do note that this will cause any changes to either to affect the other (as they point to the same object), so this is probably not what you want.
If list2 is referenced elsewhere, and thus needs to remain the same object, it's possible to replace every value in the list using list2[:] = list1.
In general, you probably want the first solution.
Sort b based on items' index in a, with all items not in a at the end.
>>> a=[1,4,5,2]
>>> b=[4,3,1,5]
>>> sorted(b, key=lambda x:a.index(x) if x in a else len(a))
[1, 4, 5, 3]
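Calling a.index(x) for every item makes the sort quadratic in the worst case. A possible variant, sketched here for Python 3 with an illustrative lookup table named position, precomputes each value's first index in a:

```python
a = [1, 4, 5, 2]
b = [4, 3, 1, 5]

# Map each value in a to its first index, matching what a.index() returns.
position = {}
for index, value in enumerate(a):
    position.setdefault(value, index)

# Items missing from a get the key len(a), so they sort to the end.
ordered_b = sorted(b, key=lambda x: position.get(x, len(a)))
print(ordered_b)   # [1, 4, 5, 3]
```

This assumes the items are hashable; for short lists the a.index() version is just as good.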