Comparing Two Pandas DataFrames and Creating a List of Values

I have 2 pandas dataframes:
df1 = pd.DataFrame({'Model': [1,2,3,4,5], 'Color': ['Orange', 'Red', 'Black', 'Purple', 'Pink']})
Color Model
Orange 1
Red 2
Black 3
Purple 4
Pink 5
df2 = pd.DataFrame({'Color': ['Orange', 'Green', 'Purple', 'Black', 'Indigo'], 'Drink': ['Soda', 'Juice', 'Water', 'Soda', 'Lemonade'], 'Model': [1,1,4,6,8]})
Color Drink Model
Orange Soda 1
Green Juice 1
Purple Water 4
Black Soda 6
Indigo Lemonade 8
I am trying to create a list of Drinks from df2 whose Color and Model match the Color and Model of a row in df1. For the above example the output should be:
['Soda', 'Water']
How would I accomplish this? I've tried:
drinks = []
for x in df1.Model:
    for y in df2.Model:
        for j in df1.Color:
            for k in df2.Color:
                if x == y and j == k:
                    drinks.append(df2.loc[df2['Model'] == x, 'Drink'].iloc[0])
which returns:
['Soda', 'Soda', 'Soda', 'Soda', 'Soda', 'Soda', 'Water', 'Water', 'Water']
I think I'm close but not sure how to get rid of the repetition.

If you're sure that your duplicated list contains all the values you need, you can convert it to a set() to remove the duplicates:
lst=['Soda', 'Soda', 'Soda', 'Soda', 'Soda', 'Soda', 'Water', 'Water', 'Water']
unique_list=list(set(lst))
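A set does not keep the original order of the items. If order matters, one option (assuming Python 3.7+, where dicts preserve insertion order) is dict.fromkeys:

```python
lst = ['Soda', 'Soda', 'Soda', 'Soda', 'Soda', 'Soda', 'Water', 'Water', 'Water']

# dict keys are unique and, in Python 3.7+, keep first-insertion order
unique_ordered = list(dict.fromkeys(lst))
print(unique_ordered)  # ['Soda', 'Water']
```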

I managed to work this out with the following:
df3 = df2.loc[(df2['Color'].isin(list(df1.Color))) & df2.Model.isin(list(df1.Model))]
list(df3.Drink)
Output:
['Soda', 'Water']
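Note that the isin approach above checks Color and Model independently, so a df2 row such as ('Red', 1) would also match, even though df1 only contains ('Red', 2). A sketch that matches on the (Color, Model) pair instead uses an inner merge; the extra 'Tea' row below is hypothetical, added only to show the difference:

```python
import pandas as pd

df1 = pd.DataFrame({'Model': [1, 2, 3, 4, 5],
                    'Color': ['Orange', 'Red', 'Black', 'Purple', 'Pink']})
df2 = pd.DataFrame({'Color': ['Orange', 'Green', 'Purple', 'Black', 'Indigo'],
                    'Drink': ['Soda', 'Juice', 'Water', 'Soda', 'Lemonade'],
                    'Model': [1, 1, 4, 6, 8]})

# Inner merge keeps only df2 rows whose (Color, Model) pair appears in df1
drinks = df2.merge(df1, on=['Color', 'Model'], how='inner')['Drink'].tolist()
print(drinks)  # ['Soda', 'Water']

# Hypothetical extra row: 'Red' and 1 each occur in df1, but not as a pair
extra = pd.DataFrame({'Color': ['Red'], 'Drink': ['Tea'], 'Model': [1]})
df2_extra = pd.concat([df2, extra], ignore_index=True)
isin_drinks = df2_extra.loc[df2_extra['Color'].isin(df1['Color'])
                            & df2_extra['Model'].isin(df1['Model']), 'Drink'].tolist()
merge_drinks = df2_extra.merge(df1, on=['Color', 'Model'])['Drink'].tolist()
print(isin_drinks)   # ['Soda', 'Water', 'Tea']
print(merge_drinks)  # ['Soda', 'Water']
```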

Related

\t Doesn't work in my code, neither does \n

Hey, I am wondering why this doesn't work in my code.
I was led to believe from other forums that putting \t and \n in speech marks should fix the result:
zoo = ("Kangaroo","Leopard","Moose")
print("Tuple:", zoo, "\tLength:", len(zoo))
print(type( zoo))
bag = {'Red','Green','Blue'}
bag.add('Yellow')
print('\nSet:',bag,'\tLength' , len(bag))
print(type(bag))
print('\nIs Green In bag Set?:','Green' in bag)
print('Is orange in bag set?:', 'Orange' in bag)
box = {'Red','Purple','Yellow'}
print('\nSet:',box,'\t\tLength' , len(box))
print('Common to both sets:' , bag.intersection(box))
It just says:
('Tuple:', ('Kangaroo', 'Leopard', 'Moose'), '\tLength:', 3)
<type 'tuple'>
('\nSet:', set(['Blue', 'Green', 'Yellow', 'Red']), '\tLength', 4)
<type 'set'>
('\nIs Green In bag Set?:', True)
('Is orange in bag set?:', False)
('\nSet:', set(['Purple', 'Yellow', 'Red']), '\t\tLength', 3)
('Common to both sets:', set(['Red', 'Yellow']))
In Python 2.7, print is a statement, not a function, so the parentheses are interpreted as enclosing a tuple, and that tuple is what gets printed. The control characters are displayed (instead of their effects) because you aren't printing the strings directly, but as part of a tuple, which shows the repr of each string.
One way you can do it without changing too much is this:
zoo = ("Kangaroo","Leopard","Moose")
zlength = len(zoo)
print "Tuple: {}, \tLength: {}".format(zoo,zlength)
print type(zoo)
bag = {'Red','Green','Blue'}
bag.add('Yellow')
blength = len(bag)
print '\nSet: {}, \tLength: {}'.format(list(bag), blength)
print type(bag)
print '\nIs Green In bag Set?:','Green' in bag
print 'Is orange in bag set?:', 'Orange' in bag
box = {'Red','Purple','Yellow'}
bolength = len(box)
print '\nSet: {}, \tLength: {}'.format(list(box),bolength)
print 'Common to both sets:' , list(bag.intersection(box))
OUTPUT:
Tuple: ('Kangaroo', 'Leopard', 'Moose'), Length: 3
Set: ['Blue', 'Green', 'Yellow', 'Red'], Length: 4
Is Green In bag Set?: True
Is orange in bag set?: False
Set: ['Purple', 'Red', 'Yellow'], Length: 3
Common to both sets: ['Yellow', 'Red']
In Python 2.7 the name print is recognized as the print statement, not as a built-in function. You can disable the statement and use the print() function by adding the following future statement at the top of your module:
from __future__ import print_function
Thus, for example:
>>> zoo = ("Kangaroo","Leopard","Moose")
>>> print("Tuple:", zoo, "\tLength:", len(zoo))
('Tuple:', ('Kangaroo', 'Leopard', 'Moose'), '\tLength:', 3)
>>> from __future__ import print_function
>>> print("Tuple:", zoo, "\tLength:", len(zoo))
Tuple: ('Kangaroo', 'Leopard', 'Moose') Length: 3
>>>
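The difference described above can be checked in Python 3, where print is always a function: a tab inside a string that is printed directly becomes whitespace, while converting a tuple to text shows the repr of each item, so the tab appears as the escape sequence \t instead.

```python
zoo = ("Kangaroo", "Leopard", "Moose")

# Formatting everything into one string keeps the real tab character
line = "Tuple: {}\tLength: {}".format(zoo, len(zoo))
print(line)

# str() of a tuple uses repr() for its items, so the tab is shown as an escape sequence
as_tuple = str(("Tuple:", zoo, "\tLength:", len(zoo)))
print(as_tuple)  # ('Tuple:', ('Kangaroo', 'Leopard', 'Moose'), '\tLength:', 3)
```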

facecolor = 'none' (empty circles) not working using seaborn and .map

I have the following code where I am trying to plot 2 sets of data on the same plot, with the markers being empty circles. I would expect the inclusion of facecolor = 'none' in the map function below to accomplish this, but it does not seem to work. The closest I can get with the below is to have red circles around the red and blue dark dots.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
x1 = np.random.randn(50)
y1 = np.random.randn(50)*100
x2 = np.random.randn(50)
y2 = np.random.randn(50)*100
df1 = pd.DataFrame({'x1':x1, 'y1':y1})
df2 = pd.DataFrame({'x2':x2, 'y2':y2})
df = pd.concat([df1.rename(columns={'x1':'x','y1':'y'})
                   .join(pd.Series(['df1']*len(df1), name='df')),
                df2.rename(columns={'x2':'x','y2':'y'})
                   .join(pd.Series(['df2']*len(df2), name='df'))],
               ignore_index=True)
pal = dict(df1="red", df2="blue")
# 'height' replaced the older 'size' argument in seaborn 0.9
g = sns.FacetGrid(df, hue='df', palette=pal, height=5)
g.map(plt.scatter, "x", "y", s=50, alpha=.7, linewidth=.5, facecolors='none', edgecolor="red")
g.map(sns.regplot, "x", "y", ci=None, robust=1)
g.add_legend()
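sns.regplot doesn't pass through all the keywords you need for this, but you can draw the scatter explicitly with plt.scatter, turn off regplot's own scatter with scatter=False, and then rebuild the legend by hand. FacetGrid already passes each hue level's color to plt.scatter, so with facecolors='none' only the marker edges are colored and no edgecolor argument is needed (a list such as edgecolor=['red', 'blue'] would be cycled over the individual points of each group):
g.map(plt.scatter, "x", "y", s=50, alpha=.7,
      linewidth=.5,
      facecolors='none')
g.map(sns.regplot, "x", "y", ci=None, robust=1,
      scatter=False)
markers = [plt.Line2D([0, 0], [0, 0], markeredgecolor=pal[key],
                      marker='o', markerfacecolor='none',
                      mew=0.3,
                      linestyle='')
           for key in pal]
plt.legend(markers, list(pal.keys()), numpoints=1)
plt.show()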
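For comparison, a minimal matplotlib-only sketch (no FacetGrid, no regression lines) of hollow markers per group: each ax.scatter call gets facecolors='none' and its own edge color, and the default legend then shows hollow markers as well. The random data, the fixed seed, and the non-interactive Agg backend are assumptions made so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
pal = {"df1": "red", "df2": "blue"}

fig, ax = plt.subplots()
collections = {}
for name, color in pal.items():
    x = rng.standard_normal(50)
    y = rng.standard_normal(50) * 100
    # facecolors='none' leaves the marker interior empty; edgecolors sets the ring color
    collections[name] = ax.scatter(x, y, s=50, alpha=0.7, linewidths=0.5,
                                   facecolors="none", edgecolors=color, label=name)

# Legend handles are built from the scatter collections, so they are hollow too
ax.legend()
```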

Two lists of same length. Get unique pairs based on one of them

I have two lists of the same length. Let's say:
x = ['123', '456', '789', '123']
y = ['aaa', 'aaa', 'bbb', 'ccc']
They are basically combinations (123-aaa, 456-aaa, 789-bbb, 123-ccc).
What I am trying to accomplish is to get the unique values and the count of unique values in list y, BUT only after I have removed all the pairs that have the same value in list x. So in this example I would need to delete x[3], and y[3]. And then have the counts in a dictionary:
{'aaa' : '2', 'bbb' : '1'}
I hope this is clear enough. Been banging my head against the wall for hours...
Another simple way of doing it:
from collections import Counter
x = ['123', '456', '789', '123']
y = ['aaa', 'aaa', 'bbb', 'ccc']
uniq = set()      # x values already seen
res = Counter()
for number, letter in zip(x, y):
    if number not in uniq:
        uniq.add(number)
        res[letter] += 1  # res.update(letter) would count the characters of the string
print(dict(res))
{'aaa': 2, 'bbb': 1}
Check dict and Counter. dict() keeps the last value for a repeated key, so you'd need to reverse your lists first so that the first pair for each x value wins.
In [24]: x
Out[24]: ['123', '456', '789', '123']
In [25]: y
Out[25]: ['aaa', 'aaa', 'bbb', 'ccc']
In [26]: x.reverse()
In [27]: y.reverse()
In [28]: x
Out[28]: ['123', '789', '456', '123']
In [29]: y
Out[29]: ['ccc', 'bbb', 'aaa', 'aaa']
In [30]: z=dict(zip(x,y))
In [31]: z
Out[31]: {'123': 'aaa', '456': 'aaa', '789': 'bbb'}
In [32]: from collections import Counter
In [33]: values=z.values()
In [34]: values
Out[34]: dict_values(['aaa', 'bbb', 'aaa'])
In [37]: z = Counter(values)
In [38]: z
Out[38]: Counter({'aaa': 2, 'bbb': 1})
In [39]: z['aaa']
Out[39]: 2
In [40]: z['bbb']
Out[40]: 1
In [41]: z.keys()
Out[41]: dict_keys(['bbb', 'aaa'])
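Reversing is needed only because dict() keeps the last value for a repeated key. A sketch that keeps the first pair for each x value without mutating the lists uses dict.setdefault:

```python
from collections import Counter

x = ['123', '456', '789', '123']
y = ['aaa', 'aaa', 'bbb', 'ccc']

first = {}
for number, letter in zip(x, y):
    # setdefault only stores a value the first time a key is seen
    first.setdefault(number, letter)

counts = Counter(first.values())
print(dict(counts))  # {'aaa': 2, 'bbb': 1}
```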
If converting to a numpy array is an option for you, check np.unique:
In [39]: import numpy as np
In [40]: x = np.array(['123', '456', '789', '123'])
In [41]: y = np.array(['aaa', 'aaa', 'bbb', 'ccc'])
In [42]: x_unique, x_index = np.unique(x, return_index=True)
In [43]: x_unique
Out[43]:
array(['123', '456', '789'],
dtype='|S3')
In [44]: x_index
Out[44]: array([0, 1, 2])
In [47]: z_unique, z_counts= np.unique(y[x_index],return_counts=True)
In [48]: z_unique
Out[48]:
array(['aaa', 'bbb'],
dtype='|S3')
In [49]: z_counts
Out[49]: array([2, 1])
In [50]: z = dict(zip(z_unique, z_counts))
In [51]: z
Out[51]: {'aaa': 2, 'bbb': 1}

How can I use django-nvd3 to draw multi line chart with different values in X-axis?

All the examples I could find for drawing several lines in one graph, such as:
https://github.com/areski/django-nvd3/blob/master/demoproject/demoproject/views.py#L66
use exactly the same x-axis values, something like:
import datetime
import random
import time
start_time = int(time.mktime(datetime.datetime(2012, 6, 1).timetuple()) * 1000)
nb_element = 150
xdata = [start_time + x * 1000000000 for x in range(nb_element)]
ydata = [i + random.randint(1, 10) for i in range(nb_element)]
ydata2 = [y * 2 for y in ydata]
tooltip_date = "%d %b %Y %H:%M:%S %p"
extra_serie1 = {
    "tooltip": {"y_start": "", "y_end": " cal"},
    "date_format": tooltip_date,
    'color': '#a4c639'
}
extra_serie2 = {
    "tooltip": {"y_start": "", "y_end": " cal"},
    "date_format": tooltip_date,
    'color': '#FF8aF8'
}
chartdata = {'x': xdata,
             'name1': 'series 1', 'y1': ydata, 'extra1': extra_serie1,
             'name2': 'series 2', 'y2': ydata2, 'extra2': extra_serie2}
How can I draw two lines from data like this, where the x values of the two series are not the same:
xdata1 = [1, 2, 4, 5]
ydata1 = [18, 3, 5, 2]
xdata2 = [1, 3, 5, 6]
ydata2 = [3, 13, 0, 6]
It's not possible to pass in datasets with different x-axes. NVD3 is a charting library and does very little data processing, so you need to normalize your x-axis yourself.
If you have complex datasets you can look at pandas, which can help you normalize datasets that have different x-axes: http://pandas.pydata.org/
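A sketch of that normalization with pandas, using the xdata1/ydata1 and xdata2/ydata2 lists from the question: the two series are outer-joined on x, so the union of x values becomes the shared axis, and the gaps are filled by linear interpolation along x. Interpolating, and dropping x positions that cannot be interpolated (here x = 6, beyond the last point of series 1), are assumptions; depending on what the chart should show, another fill strategy may suit better. The resulting lists could then be used as 'x', 'y1' and 'y2' in chartdata.

```python
import pandas as pd

xdata1 = [1, 2, 4, 5]
ydata1 = [18, 3, 5, 2]
xdata2 = [1, 3, 5, 6]
ydata2 = [3, 13, 0, 6]

s1 = pd.Series(ydata1, index=xdata1)
s2 = pd.Series(ydata2, index=xdata2)

# Outer join on the index: one row per x in either series, NaN where a series has no value
combined = pd.concat({'y1': s1, 'y2': s2}, axis=1).sort_index()

# Fill interior gaps by linear interpolation on the x values themselves
filled = combined.interpolate(method='index', limit_area='inside')

# Drop x positions that could not be filled (outside a series' own x range)
filled = filled.dropna()

xdata = filled.index.tolist()
y1 = filled['y1'].tolist()
y2 = filled['y2'].tolist()
print(xdata)  # [1, 2, 3, 4, 5]
print(y1)     # [18.0, 3.0, 4.0, 5.0, 2.0]
print(y2)     # [3.0, 8.0, 13.0, 6.5, 0.0]
```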

Assigning new column name and creating new column conditionally in pandas not working?

I have a simple pandas DataFrame, and then I rename the columns to 'a' and 'b'.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
df.columns = ['a', 'b']
print df
df['color'] = np.where(df['b']=='Z', 'green', 'red')
print df
a b
0 Z A
1 Z B
2 X B
3 Y C
a b color
0 Z A red
1 Z B red
2 X B red
3 Y C red
Without the renaming line df.columns, I get
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
#df.columns = ['a', 'b']
#print df
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
print df
Set Type color
0 Z A green
1 Z B green
2 X B red
3 Y C red
I want, and would expect, the first snippet to produce "green green red red", but it doesn't, and I don't know why.
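As pointed out in the comments, the problem comes from how you rename the columns. The DataFrame is built from a dict, and older pandas versions sort the dict keys, so the columns are ordered Set, Type (as your printed output shows). Assigning df.columns = ['a', 'b'] is positional, so Set becomes 'a' and Type becomes 'b', and df['b']=='Z' is never true. You are better off renaming by name, like so:
df = df.rename(columns={'Type': 'a', 'Set': 'b'})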
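If you prefer to keep the positional df.columns assignment, a sketch that first pins the column order with the columns= argument of pd.DataFrame, so the result no longer depends on how the dict keys are ordered:

```python
import numpy as np
import pandas as pd

# columns= fixes the order explicitly: 'Type' first, then 'Set'
df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')}, columns=['Type', 'Set'])
df.columns = ['a', 'b']  # positional rename: Type -> a, Set -> b
df['color'] = np.where(df['b'] == 'Z', 'green', 'red')
print(df)
```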