I have two lists that I zip() together:
>> x1 = ['1', '2', '3']
>> y1 = ['a', 'b', 'c']
>> zipped = zip(x1, y1)
As expected so far:
>> print(list(zipped))
[('1', 'a'), ('2', 'b'), ('3', 'c')]
From the docs, it seems like I can just do this to get back the two lists from the zip object:
>> x2, y2 = zip(*zipped)
But instead I get the error:
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2869, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-58fe68d00d1c>", line 1, in <module>
    x2, y2 = zip(*zipped)
ValueError: not enough values to unpack (expected 2, got 0)
Obviously I'm not understanding something simple about the zip object.
Edit:
As @daragua points out below, the print(list(zipped)) call was actually consuming the zipped object, leaving it empty. That's true for my simple example, but I'm still having an issue with my real code.
What I'm trying to do is write a unit test for a Django view that has a zipped object in its context. The view works fine; I'm just struggling to write the test for it.
In my view context I have this:
for season in context['pools']:
    commissioner.append(season.pool.is_commish(self.request.user))
context['pools'] = zip(context['pools'], commissioner)
This works as expected. The pools context object is two lists, which the template handles just fine:
{% for season, commissioner in pools %}
The test I'm struggling to write checks whether the commissioner value is correct for the pool object of the logged-in user. In my test:
context = self.response.context['pools']
print(list(context))
In this case, context is an empty list [].
The zip function returns an iterator. The print(list(zipped)) call thus runs the iterator til its end and the next zip(*zipped) doesn't have anything to eat.
Just zip back to reverse:
>>> zipped=zip(x1, y1)
>>> x2, y2=zip(*zipped)
>>> x2
('1', '2', '3')
And use map if you want the result to be a list rather than a tuple:
>>> zipped=zip(x1, y1)
>>> x2, y2=map(list, zip(*zipped))
>>> x2
['1', '2', '3']
>>> y2
['a', 'b', 'c']
Side note: In Python 3, zip returns a one-time-use iterator. That is why zip needs to be called again after each use (in Python 3 only).
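To make the one-shot behaviour concrete, here is a minimal sketch using the lists from the question. The names first_pass, second_pass and pairs are introduced only for illustration, and materializing the iterator with list() is just one way to reuse the pairs.

```python
x1 = ['1', '2', '3']
y1 = ['a', 'b', 'c']

zipped = zip(x1, y1)
first_pass = list(zipped)    # walks the iterator to its end
second_pass = list(zipped)   # nothing left to yield, so this is []

# Materialize the pairs once, then reuse them as often as needed.
pairs = list(zip(x1, y1))
x2, y2 = zip(*pairs)              # tuples
x3, y3 = map(list, zip(*pairs))   # lists
```

Because pairs is a list, it can be printed or iterated any number of times before unzipping.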
Ultimately the issue wasn't with the zip object at all.
If you are sending a zip object to a Django context:
x = ['1', '2', '3']
y = ['a', 'b', 'c']
context['zipped'] = zip(x, y)
And if you are trying to write a unit test for the context data, you don't need to unzip, as it can be pulled from the response context already unzipped:
def test_zipped_data(self):
    x, y = self.response.context_data['zipped']
    self.assertIn('b', y)
However if your context data is empty (zipped == []), then you will get the ValueError exception. But that's just because your context is empty and your test worked!
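One likely explanation for the empty context in the test is that rendering the template loops over the zip object once, so it is already exhausted when the test reads it. The sketch below mimics this in plain Python. The render function is a hypothetical stand-in for template rendering (it is not a Django API), and the season values are placeholder data.

```python
def render(context):
    """Hypothetical stand-in for template rendering: loops over 'pools' once."""
    return ["%s:%s" % (season, flag) for season, flag in context['pools']]

seasons = ['2015', '2016']      # placeholder data, not from the question
commissioner = [True, False]

# Passing the zip iterator itself: rendering uses it up.
context = {'pools': zip(seasons, commissioner)}
render(context)
left_for_test = list(context['pools'])

# Passing a list instead: rendering leaves it intact for later inspection.
context = {'pools': list(zip(seasons, commissioner))}
render(context)
kept_for_test = list(context['pools'])
```

If this is what happens in the view, storing list(zip(...)) in the context should let both the template and the test iterate over the same pairs.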
Update 1: the last line of code sorted_xlist = sorted(xlist).extend(sorted(words_cp)) should be changed to:
sorted_xlist.extend(sorted(xlist))
sorted_xlist.extend(sorted(words_cp))
Update 1: Code is updated to solve the problem of changing length of words list.
This list-function exercise is from Google's Python Introduction course. I don't know why the code doesn't work in Python 2.7. The goal of the code is explained in the comments.
# B. front_x
# Given a list of strings, return a list with the strings
# in sorted order, except group all the strings that begin with 'x' first.
# e.g. ['mix', 'xyz', 'apple', 'xanadu', 'aardvark'] yields
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
# Hint: this can be done by making 2 lists and sorting each of them
# before combining them.
def front_x(words):
    words_cp = []
    words_cp.extend(words)
    xlist=[]
    sorted_xlist=[]
    for i in range(0, len(words)):
        if words[i][0] == 'x':
            xlist.append(words[i])
            words_cp.remove(words[i])
    print sorted(words_cp) # For debugging
    print sorted(xlist) # For debugging
    sorted_xlist = sorted(xlist).extend(sorted(words_cp))
    return sorted_xlist
Update 1: Now error message is gone.
front_x
['axx', 'bbb', 'ccc']
['xaa', 'xzz']
X got: None expected: ['xaa', 'xzz', 'axx', 'bbb', 'ccc']
['aaa', 'bbb', 'ccc']
['xaa', 'xcc']
X got: None expected: ['xaa', 'xcc', 'aaa', 'bbb', 'ccc']
['aardvark', 'apple', 'mix']
['xanadu', 'xyz']
X got: None expected: ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
The splitting of the original list works fine. But the merging doesn't work.
You're iterating over a sequence as you're changing its length.
Imagine if you start off with an array
arr = ['a','b','c','d','e']
When you remove the first two items from it, now you have:
arr = ['c','d','e']
But you're still iterating over the length of the original array. Eventually you get to i > 2, in my example above, which raises an IndexError.
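Both problems in this thread (removing items while looping, and the None result from chaining .extend()) can be sidestepped by building two new lists and concatenating them. This is a minimal sketch of one way to do it, not the course's reference solution. The names x_words and others are introduced here for illustration.

```python
def front_x(words):
    """Return words sorted, with those starting with 'x' grouped first."""
    x_words = [w for w in words if w.startswith('x')]
    others = [w for w in words if not w.startswith('x')]
    # sorted() returns a new list, while list.extend() changes a list in place
    # and returns None, so join the two sorted lists with + instead.
    return sorted(x_words) + sorted(others)

result = front_x(['mix', 'xyz', 'apple', 'xanadu', 'aardvark'])
```

Since neither comprehension modifies words, the length of the input never changes during iteration.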
I am trying to sort a list of 100 filenames so they will be used in the right order in later calculations. All the filenames have 'name_1' at the beginning and '_out.txt' at the end. The difference is a number in between, going from 1 to 100.
The list looks a bit like this:
['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
For this actual example I want:
['name_1_2_out.txt', 'name_1_5_out.txt', 'name_1_6_out.txt', 'name_1_10_out.txt', 'name_1_100_out.txt']
I have tried both list.sort and sorted(list), but with no luck. I have also tried key=int and key=str, but neither helped, since it seems that only a part of the string cannot be converted to int this way.
Can anyone help me with advice?
You need leading zeros to sort the way you want.
#!/usr/bin/python
# -*- coding: utf-8 -*-
L=['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
OUT=[]
n='100' # max number
for item in L:
    old=item[7:-8] # Faulty index
    if len(old) < len(n):
        new='0'*(len(n)-len(old))+old # Nice index
        item=item.replace(old, new)
    OUT.append(item)
OUT.sort()
print OUT
Result
['name_1_002_out.txt', 'name_1_005_out.txt', 'name_1_006_out.txt', 'name_1_010_out.txt', 'name_1_100_out.txt']
I would suggest renaming files to make life easier later on since not all file managers display faulty filenames in order.
You can use the key function for this task:
>>> l = ['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
>>> sorted(l,key=lambda s: int(s.split('_')[2]))
['name_1_2_out.txt', 'name_1_5_out.txt', 'name_1_6_out.txt', 'name_1_10_out.txt', 'name_1_100_out.txt']
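If the position of the number in the filename is not fixed, a more general variant of the key function splits each name into text and number chunks. This is a sketch of a common "natural sort" approach. The helper name natural_key is introduced here and is not part of the standard library.

```python
import re

def natural_key(name):
    """Split a string into text and integer chunks so numbers compare numerically."""
    # re.split with a capturing group keeps the digit runs; they land at odd indexes.
    parts = re.split(r'(\d+)', name)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

files = ['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt',
         'name_1_5_out.txt', 'name_1_2_out.txt']
ordered = sorted(files, key=natural_key)
```

Unlike the padding approach, this leaves the filenames themselves unchanged.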
lista = ['2','3','5','8','4','6','1']
listb = [('2','3'),('5','8'),('4','6'),('1','9')]
listc = {'a':'3','b':'5','c':'9','d':'4','e':'2','f':'0'}
d = sorted(lista, key=lambda item:int(item), reverse=True)
e = sorted(listb, key=lambda item:int(item[0]) + int(item[1]), reverse=True)
f = sorted(listc.items(), key=lambda item:int(item[1]), reverse=True)
print(d)
print(e)
print(f)
output:
['8', '6', '5', '4', '3', '2', '1']
[('5', '8'), ('4', '6'), ('1', '9'), ('2', '3')]
[('c', '9'), ('b', '5'), ('d', '4'), ('a', '3'), ('e', '2'), ('f', '0')]
I often vertically concatenate many *.csv files in Pandas, so every time I do this I have to check that all the files have the same number of columns. This has become quite cumbersome, since I had to figure out a way to ignore the files with more or fewer columns than I need. E.g. the first 10 files have 4 columns, but then file #11 has 8 columns and file #54 has 7 columns. This means I have to load all files, even the ones with the wrong number of columns. I want to avoid loading those files and then trying to concatenate them vertically; I want to skip them completely.
So, I am trying to write a Unit Test with Pandas that will:
a. check the size of all the *.csv files in some folder
b. ONLY read in the files that have a pre-determined number of columns
c. print a message indicating the names of the *.csv files that have the wrong number of columns
Here is what I have (I am working in the folder C:\Users\Downloads):
import unittest
import numpy as np
import pandas as pd
from os import listdir
# Create csv files:
df1 = pd.DataFrame(np.random.rand(10,4), columns = ['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.rand(10,3), columns = ['A', 'B', 'C'])
df1.to_csv('test1.csv')
df2.to_csv('test2.csv')
class Conct(unittest.TestCase):
    """Tests for `primes.py`."""
    TEST_INP_DIR = r'C:\Users\Downloads'
    fns = listdir(TEST_INP_DIR)
    t_fn = [fn for fn in fns if fn.endswith(".csv")]
    print t_fn
    dfb = pd.DataFrame()
    def setUp(self):
        for elem in Conct.t_fn:
            print elem
            fle = pd.read_csv(elem)
            try:
                pd.concat([Conct.dfb,fle],axis = 0, join='outer', join_axes=None, ignore_index=True, verify_integrity=False)
            except IOError:
                print 'Error: unable to concatenate a file with %s columns.' % fle.shape[1]
                self.err_file = fle
    def tearDown(self):
        del self.err_file
if __name__ == '__main__':
    unittest.main()
Problem:
I am getting this output:
['test1.csv', 'test2.csv']
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
The first print statement works - it is printing a list of *.csv files, as expected. But, for some reason, the second and third print statements do not work.
Also, the concatenation should not have gone through: the second file has 3 columns but the first one has 4. The IOError line does not seem to be printing.
How can I use a Python unittest to check each of the *.csv files to make sure that they have the same number of columns before concatenation? And how can I print the appropriate error message at the correct time?
On second thought, instead of chunksize, just read in the first row and count the number of columns, then read and append everything with the correct number of columns. In short:
for f in files:
    test = pd.read_csv( f, nrows=1 )
    if len( test.columns ) == 4:
        df = df.append( pd.read_csv( f ) )
Here's the full version:
df1 = pd.DataFrame(np.random.rand(2,4), columns = ['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.rand(2,3), columns = ['A', 'B', 'C'])
df3 = pd.DataFrame(np.random.rand(2,4), columns = ['A', 'B', 'C', 'D'])
df1.to_csv('test1.csv',index=False)
df2.to_csv('test2.csv',index=False)
df3.to_csv('test3.csv',index=False)
files = ['test1.csv', 'test2.csv', 'test3.csv']
df = pd.DataFrame()
for f in files:
    test = pd.read_csv( f, nrows=1 )
    if len( test.columns ) == 4:
        df = df.append( pd.read_csv( f ) )
In [54]: df
Out[54]:
          A         B         C         D
0  0.308734  0.242331  0.318724  0.121974
1  0.707766  0.791090  0.718285  0.209325
0  0.176465  0.299441  0.998842  0.077458
1  0.875115  0.204614  0.951591  0.154492
(Edit to add) Regarding the use of nrows for the test... line: The only point of the test line is to read in enough of the CSV so that on the next line we check if it has the right number of columns before reading in. In this test case, reading in the first row is sufficient to figure out if we have 3 or 4 columns, and it's inefficient to read in more than that, although there is no harm in leaving off the nrows=1 besides reduced efficiency.
In other cases (e.g. no header row and varying numbers of columns in the data), you might need to read in the whole CSV. In that case, you'd be better off doing it like this:
for f in files:
    test = pd.read_csv( f )
    if len( test.columns ) == 4:
        df = df.append( test )
The only downside of that way is that you completely read in the datasets with 3 columns that you don't want to keep, but you also don't read in the good datasets twice that way. So that's definitely a better way if you don't want to use nrows at all. Ultimately, depends on what your actual data looks like as to which way is best for you, of course.
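On the unittest side of the question: the "Ran 0 tests" output appears because unittest only collects methods whose names start with test, and setUp runs only before such a method. As far as I can tell, pd.concat with join='outer' also fills missing columns with NaN rather than raising, so the IOError branch would never fire. The sketch below shows a test that does run. To stay self-contained it counts header fields with the standard csv module instead of pandas. EXPECTED_COLUMNS, column_count and ColumnCountTest are hypothetical names, and the files are created in a temporary folder rather than C:\Users\Downloads.

```python
import csv
import os
import tempfile
import unittest

EXPECTED_COLUMNS = 4  # assumed target width

def column_count(path):
    """Count the fields in the first row of a CSV file."""
    with open(path, newline='') as handle:
        return len(next(csv.reader(handle), []))

class ColumnCountTest(unittest.TestCase):
    def setUp(self):
        # Build two small files in a temporary folder: one good, one bad.
        self.tmp = tempfile.TemporaryDirectory()
        headers = {'test1.csv': ['A', 'B', 'C', 'D'], 'test2.csv': ['A', 'B', 'C']}
        for name, header in headers.items():
            with open(os.path.join(self.tmp.name, name), 'w', newline='') as handle:
                csv.writer(handle).writerow(header)

    def tearDown(self):
        self.tmp.cleanup()

    def test_finds_files_with_wrong_width(self):
        # The method name starts with "test", so unittest collects and runs it.
        bad = sorted(name for name in os.listdir(self.tmp.name)
                     if column_count(os.path.join(self.tmp.name, name)) != EXPECTED_COLUMNS)
        self.assertEqual(bad, ['test2.csv'])

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ColumnCountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The names collected in bad are the files to report and skip; the remaining files can then be read and concatenated as in the answer above.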
I am trying to save multiple plots, one generated after every iteration of a for loop, and I want to put a name tag on each plot, like a header with the number of iterations done. The code looks like this. I tried suptitle but it does not work.
for i in range(steps):
    nor_m = matplotlib.colors.Normalize(vmin = 0, vmax = 1)
    plt.hexbin(xxx,yyy,C, gridsize=13, cmap=matplotlib.cm.rainbow, norm=nor_m, edgecolors= 'k', extent=[-1,12,-1,12])
    plt.draw()
    plt.suptitle('frame'%i, fontsize=12)
    savefig("flie%d.png"%i)
What about plt.title?
for i in range(steps):
    nor_m = matplotlib.colors.Normalize(vmin=0, vmax=1)
    plt.hexbin(xxx, yyy, C, gridsize=13, cmap=matplotlib.cm.rainbow, norm=nor_m, edgecolors= 'k', extent=[-1,12,-1,12])
    plt.title('frame %d'%i, fontsize=12)
    plt.savefig("flie%d.png"%i)
You also had an error in the string formatting of the title call. Actually, 'frame'%i should have failed with a "TypeError: not all arguments converted during string formatting" error.
Note also, that you don't need the plt.draw, since this will be called by plt.savefig.
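The formatting error is easy to reproduce without matplotlib. This small sketch compares the two format strings. The variable names and the value of i are chosen only for illustration.

```python
i = 3

# With a %d placeholder the loop counter is inserted into the string.
good_title = 'frame %d' % i

# Without a placeholder there is nowhere to put the argument, so % raises.
try:
    bad_title = 'frame' % i
except TypeError as exc:
    message = str(exc)
```

The same pattern applies to the filename passed to plt.savefig, which already contains a %d placeholder.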
I know that this is probably a silly question and I apologize for that, but I am very new to python and have tried to solve this for a long time now, with no success.
I have a list of tuples similar to the one below:
data = [('ralph picked', ['nose', '4', 'apple', '30', 'winner', '3']),
('aaron popped', ['soda', '1', 'popcorn', '6', 'pill', '4', 'question', '29'])]
I would like to sort the nested list in descending order:
data = [('ralph picked', ['apple', '30', 'nose', '4', 'winner', '3']),
('aaron popped', ['question', '29', 'popcorn', '6', 'pill', '4', 'soda', '1'])]
I tried using simple
sorted(data)
but what I get is only the first item of each tuple sorted. What am I missing here? I would really appreciate any help.
Let's consider only the inner list. The first issue is that it seems like you want to keep word, number pairs together. We can use zip to combine them, remembering that seq[::2] gives us every second element starting at the 0th, and seq[1::2] gives us every second starting at the first:
>>> s = ['nose', '4', 'apple', '30', 'winner', '3']
>>> zip(s[::2], s[1::2])
<zip object at 0xb5e996ac>
>>> list(zip(s[::2], s[1::2]))
[('nose', '4'), ('apple', '30'), ('winner', '3')]
Now, as you've discovered, if you call sorted on a sequence, it sorts first by the first element, then by the second to break ties, etc., going as deep as it needs to. So if we call sorted on this:
>>> sorted(zip(s[::2], s[1::2]))
[('apple', '30'), ('nose', '4'), ('winner', '3')]
Well, that looks like it works, but only by fluke because apple-nose-winner is in alphabetical order. Really we want to sort by the second term. sorted takes a key parameter:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: x[1])
[('winner', '3'), ('apple', '30'), ('nose', '4')]
That didn't work either, because it's sorting the number strings lexicographically (dictionary-style, so '30' comes before '4'). We can tell it we want to use the numerical value, though:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]))
[('winner', '3'), ('nose', '4'), ('apple', '30')]
Almost there -- we want this reversed:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]), reverse=True)
[('apple', '30'), ('nose', '4'), ('winner', '3')]
And this is almost right, but we need to flatten it. We can use either a nested list comprehension:
>>> s2 = sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]), reverse=True)
>>> [value for pair in s2 for value in pair]
['apple', '30', 'nose', '4', 'winner', '3']
or use itertools.chain:
>>> from itertools import chain
>>> list(chain.from_iterable(s2))
['apple', '30', 'nose', '4', 'winner', '3']
And I think that's where we wanted to go.
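Putting the steps together for the whole data list from the question might look like the sketch below. The helper name sort_pairs is introduced here for illustration, and it assumes every inner list alternates word, number.

```python
from itertools import chain

def sort_pairs(flat):
    """Sort a flat [word, number, word, number, ...] list by number, descending."""
    pairs = zip(flat[::2], flat[1::2])
    ordered = sorted(pairs, key=lambda pair: int(pair[1]), reverse=True)
    return list(chain.from_iterable(ordered))

data = [('ralph picked', ['nose', '4', 'apple', '30', 'winner', '3']),
        ('aaron popped', ['soda', '1', 'popcorn', '6', 'pill', '4', 'question', '29'])]

# Keep each label and replace its inner list with the sorted version.
result = [(label, sort_pairs(items)) for label, items in data]
```

Each tuple keeps its label in place; only the inner list is reordered.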