Overlay histograms in one plot - python-2.7

I have two dataframes that I'm trying to make histograms of. I would like to overlay one histogram over the other and show them in the same cell, so I can easily compare the distributions. Can anyone suggest how to do that? I have example code and data below. This will plot the histograms separately one above the other.
Data:
print(df[1:5])
bob
1 1
2 3
3 5
4 1
print(df2[1:5])
bob
1 3
2 3
3 2
4 1
Code:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df[df['bob']>=1]['bob'].hist(bins=25, range=[0, 25])
plt.show()
df2[df2['bob']>=1]['bob'].hist(bins=25, range=[0, 25])
plt.show()

Use ax:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
fig = plt.figure()
ax = fig.add_subplot(111)
df = pd.DataFrame([1, 3, 5, 1], columns=["bob"], index=[1, 2, 3, 4])
df2 = pd.DataFrame([3, 3, 2, 1], columns=["bob"], index=[1, 2, 3, 4])
ax.hist([df["bob"], df2["bob"]], label=("df", "df2"), bins=25, range=[0, 25])
ax.legend()
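Passing both columns in one hist call draws the bars next to each other within each bin. If a true overlay is wanted, one option is to call hist once per dataframe on the same axes with some transparency. This is a minimal sketch, assuming the sample data from the answer above; the alpha value of 0.5 and the Agg backend line are my own choices (the backend line only lets the snippet run as a plain script and can be dropped in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({"bob": [1, 3, 5, 1]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({"bob": [3, 3, 2, 1]}, index=[1, 2, 3, 4])

fig, ax = plt.subplots()
# Two hist calls on the same axes: the second is drawn over the first,
# and alpha < 1 keeps both distributions visible where they overlap.
counts1, edges, _ = ax.hist(df["bob"], bins=25, range=(0, 25), alpha=0.5, label="df")
counts2, _, _ = ax.hist(df2["bob"], bins=25, range=(0, 25), alpha=0.5, label="df2")
ax.legend()
```

Using the same bins and range for both calls keeps the bars aligned, which is what makes the comparison readable.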

Related

Creating a bar chart that sums data from a txt file

So, I'm trying to create a bar chart that takes data from a txt file that has 3 rows. The idea is for the bar chart to sum the data from each row and to draw 3 bars, with those sums on the Y axis and the name of what each number means on the X axis. I've been searching around and came up with this, but I feel like it's really far from what I actually need.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
arrays = np.loadtxt("D:/uni/txt arkhives/tiempos.txt", dtype=float)
row1 =np.array(arrays[:,0])
row2 =np.array(arrays[:,1])
row3 =np.array(arrays[:,2])
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second','third'])
df = pd.DataFrame({'A': row1, 'B': row2, 'C': row3}, index=index)
plt.figure()
df.groupby(['row1','row2','row3']).sum().unstack().plot()
plt.grid()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Graph')
plt.legend()
plt.show()
I keep getting this error:
This might be what you're looking for:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
## If your file has no header row, use:
df = pd.read_table("D:/uni/txt arkhives/tiempos.txt", header=None, names=['A', 'B', 'C'])
## If your file does have a header row, use this line instead:
# df = pd.read_table("D:/uni/txt arkhives/tiempos.txt")
# Add a constant key so that groupby collapses every row into a single group
df['bar'] = 1
dfgroup = df.groupby('bar').sum()
ax = dfgroup[['A', 'B', 'C']].plot(kind='bar', title="Bar Chart", figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("X label", fontsize=12)
ax.set_ylabel("Y label", fontsize=12)
plt.show()
Make sure the column names are A, B, C and it should be good.
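A shorter variant of the same idea is to sum each column directly and plot the resulting Series. This is only a sketch: the in-memory sample below is a hypothetical stand-in for tiempos.txt, and it assumes the file is whitespace-separated with no header row (which is what np.loadtxt in the question would expect).

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the contents of tiempos.txt (three numeric columns)
sample = io.StringIO(u"1.0 2.0 3.0\n4.0 5.0 6.0\n")
df = pd.read_csv(sample, sep=r"\s+", header=None, names=["A", "B", "C"])

# Summing a DataFrame gives one total per column, indexed by column name
totals = df.sum()

# One bar per column; the column names become the x tick labels
ax = totals.plot(kind="bar", title="Column totals")
ax.set_xlabel("Column")
ax.set_ylabel("Sum")
```

To read the real file, the StringIO object would be replaced by the file path.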

Python Side-by-side box plots on same figure

I am trying to generate a box plot in Python 2.7 for each categorical value in column E from the Pandas dataframe below
A B C D E
0 0.647366 0.317832 0.875353 0.993592 1
1 0.504790 0.041806 0.113889 0.445370 2
2 0.769335 0.120647 0.749565 0.935732 3
3 0.215003 0.497402 0.795033 0.246890 1
4 0.841577 0.211128 0.248779 0.250432 1
5 0.045797 0.710889 0.257784 0.207661 4
6 0.229536 0.094308 0.464018 0.402725 3
7 0.067887 0.591637 0.949509 0.858394 2
8 0.827660 0.348025 0.507488 0.343006 3
9 0.559795 0.820231 0.461300 0.921024 1
I would be willing to do this with Matplotlib or any other plotting library. So far, my code can plot all the categories combined on one plot. Here is the code to generate the above data and produce the plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Data
df = pd.DataFrame(np.random.rand(10,4),columns=list('ABCD'))
df['E'] = [1,2,3,1,1,4,3,2,3,1]
# Boxplot
bp = ax.boxplot(df.iloc[:,:-1].values, widths=0.2)
plt.show()
In this example, the categories are 1,2,3,4. I would like to plot separate boxplots side-by-side on the same figure, for only categories 1 and 2 and show the category names in the legend.
Is there a way to do this?
Additional Information:
The output should look similar to the 3rd figure from here - replace "Yes","No" by "1","2".
Starting with this:
import numpy
import pandas
from matplotlib import pyplot
import seaborn
seaborn.set(style="ticks")
# Data
df = pandas.DataFrame(numpy.random.rand(10,4), columns=list('ABCD'))
df['E'] = [1, 2, 3, 1, 1, 4, 3, 2, 3, 1]
You've got a couple of options. If separate axes are ok,
fig, axes = pyplot.subplots(ncols=4, figsize=(12, 5), sharey=True)
df.query("E in [1, 2]").boxplot(by='E', return_type='axes', ax=axes)
If you want 1 axes, I think seaborn will be easier. You just need to clean up your data.
ax = (
    df.set_index('E', append=True)  # set E as part of the index
      .stack()                      # pull A - D into rows
      .to_frame()                   # convert to a dataframe
      .reset_index()                # make the index into regular columns
      .rename(columns={'level_2': 'quantity', 0: 'value'})  # rename columns
      .drop('level_0', axis='columns')   # drop junk columns
      .pipe((seaborn.boxplot, 'data'), x='E', y='value', hue='quantity', order=[1, 2])
)
seaborn.despine(trim=True)
The cool thing about seaborn is that tweaking the parameters slightly can achieve a lot in terms of the plot's layout. If we switch our hue and x variables, we get:
ax = (
    df.set_index('E', append=True)  # set E as part of the index
      .stack()                      # pull A - D into rows
      .to_frame()                   # convert to a dataframe
      .reset_index()                # make the index into regular columns
      .rename(columns={'level_2': 'quantity', 0: 'value'})  # rename columns
      .drop('level_0', axis='columns')   # drop junk columns
      .pipe((seaborn.boxplot, 'data'), x='quantity', y='value', hue='E', hue_order=[1, 2])
)
seaborn.despine(trim=True)
If you're curious, the resulting dataframe looks something like this:
E quantity value
0 1 A 0.935433
1 1 B 0.862290
2 1 C 0.197243
3 1 D 0.977969
4 2 A 0.675037
5 2 B 0.494440
6 2 C 0.492762
7 2 D 0.531296
8 3 A 0.119273
9 3 B 0.303639
10 3 C 0.911700
11 3 D 0.807861
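The same long-format layout can probably be reached more directly with melt. This is a sketch, assuming a pandas version that has DataFrame.melt (0.20 or later); the names long_df and subset are my own, and the row order differs from the stack-based version (melt groups by column rather than by original row).

```python
import numpy as np
import pandas as pd

# Same data layout as in the answer above
df = pd.DataFrame(np.random.rand(10, 4), columns=list("ABCD"))
df["E"] = [1, 2, 3, 1, 1, 4, 3, 2, 3, 1]

# Keep E as an identifier and pull A - D into (quantity, value) rows
long_df = df.melt(id_vars="E", var_name="quantity", value_name="value")

# Keep only the two categories of interest before plotting
subset = long_df[long_df["E"].isin([1, 2])]
```

The subset frame can then be passed as data to seaborn.boxplot with the same x, y and hue arguments as above.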
An addition to @Paul_H's answer.
Side-by-side boxplots on the single matplotlib.axes.Axes, no seaborn:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(10,4), columns=list('ABCD'))
df['E'] = [1, 2, 1, 1, 1, 2, 1, 2, 2, 1]
mask_e = df['E'] == 1
# prepare data
data_to_plot = [df[mask_e]['A'], df[~mask_e]['A'],
                df[mask_e]['B'], df[~mask_e]['B'],
                df[mask_e]['C'], df[~mask_e]['C'],
                df[mask_e]['D'], df[~mask_e]['D']]
# positions defaults to range(1, N+1), where N is the number of boxplots to be drawn.
# We will move them a little, to visually group them
plt.figure(figsize=(10, 6))
box = plt.boxplot(data_to_plot,
                  positions=[1, 1.6, 2.5, 3.1, 4, 4.6, 5.5, 6.1],
                  labels=['A1','A0','B1','B0','C1','C0','D1','D0'])
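The hand-written lists above follow a regular pattern, so they can be generated in a loop. This is a sketch under my own assumptions: the spacing constants group_gap and pair_gap are hypothetical names chosen to reproduce the positions used above, and the tick labels are set on the axes afterwards instead of through the labels argument.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 4), columns=list("ABCD"))
df["E"] = [1, 2, 1, 1, 1, 2, 1, 2, 2, 1]
mask_e = df["E"] == 1

group_gap = 1.5  # distance between the first boxes of neighbouring columns
pair_gap = 0.6   # distance between the two boxes of one column

data_to_plot, positions, labels = [], [], []
for i, col in enumerate("ABCD"):
    # "1" marks rows where E == 1, "0" marks all other rows, as in the labels above
    for j, (suffix, mask) in enumerate([("1", mask_e), ("0", ~mask_e)]):
        data_to_plot.append(df.loc[mask, col].values)
        positions.append(1 + group_gap * i + pair_gap * j)
        labels.append(col + suffix)

fig, ax = plt.subplots(figsize=(10, 6))
box = ax.boxplot(data_to_plot, positions=positions)
ax.set_xticks(positions)
ax.set_xticklabels(labels)
```

With more columns or categories, only the iterables in the two loops need to change.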

Understanding of a logistic regression example

I am new to logistic regression. The following is from the pymc package's example
project, and its purpose is still unclear to me. More specifically,
the variable n is [5, 5, 5, 5], and it is used in the model via pymc.Binomial.
I thought a binomial fit should have both 0 and 1 outcomes. Does n represent
the '1' cases?
Could you explain the idea of this example? Thanks,
The example is from:
www.map.ox.ac.uk/media/PDF/Patil_et_al_2010.pdf
.........
import pymc
import numpy as np
n = 5*np.ones(4,dtype=int)
x = np.array([-.86,-.3,-.05,.73])
alpha = pymc.Normal('alpha',mu=0,tau=.01)
beta = pymc.Normal('beta',mu=0,tau=.01)
@pymc.deterministic
def theta(a=alpha, b=beta):
    """theta = logit^{-1}(a+b)"""
    return pymc.invlogit(a+b*x)
d = pymc.Binomial('d', n=n, p=theta, value=np.array([0.,1.,3.,5.]),\
                  observed=True)
.......
import pymc
import pymc.Matplot
import mymodel
S = pymc.MCMC(mymodel, db='pickle')
S.sample(iter=10000, burn=5000, thin=2)
pymc.Matplot.plot(S)
import matplotlib.pyplot as plt
plt.show()
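One way to read the model, offered as an interpretation rather than as the authors' explanation: in a binomial distribution, n is the number of trials in each group and the observed value is the number of successes among those trials. Under that reading, n = [5, 5, 5, 5] means four groups of five trials each, and value = [0, 1, 3, 5] counts the '1' outcomes in each group, so the individual 0/1 results are summarised as counts. The sketch below writes out the corresponding log-likelihood with plain numpy; the function names invlogit and log_likelihood are my own and are not part of pymc.

```python
import numpy as np
from math import comb, log

n = 5 * np.ones(4, dtype=int)              # trials per group
x = np.array([-0.86, -0.3, -0.05, 0.73])   # predictor value for each group
d = np.array([0, 1, 3, 5])                 # observed successes per group

def invlogit(z):
    """Inverse logit: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(alpha, beta):
    """Binomial log-likelihood of the counts d given intercept and slope."""
    p = invlogit(alpha + beta * x)  # success probability in each group
    log_binom = np.array([log(comb(int(ni), int(di))) for ni, di in zip(n, d)])
    return float(np.sum(log_binom + d * np.log(p) + (n - d) * np.log1p(-p)))
```

A positive slope fits these counts better than a flat model, since the number of successes rises with x; comparing log_likelihood(0.8, 7.7) with log_likelihood(0.0, 0.0) shows this.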

matplotlib subplot2grid doesn't display correctly

I'm using subplot2grid to display graphs. However, not all subplots are being displayed. Obviously it has to do with the if statement.
However, in my complete code I need those if statements because, depending on some conditions, the plots have different formats. I want all 3 subplots to be displayed (one for each i). However, the first one is missing. How can I display it correctly?
Here is the simplified code:
import matplotlib.pyplot as plt
fig=plt.figure()
for i in xrange(0,3):
    if i==1:
        ax=plt.subplot2grid((3,1),(i,0))
        ax.plot([1,2],[1,2])
        fig.autofmt_xdate()
    else:
        ax=plt.subplot2grid((3,1),(i,0), rowspan=2)
        ax.plot([1,2],[1,2])
        fig.autofmt_xdate()
plt.show()
I would just use the gridspec module from matplotlib. Then you can set the width/height ratios directly.
Then you can do something like this:
import numpy
from matplotlib import gridspec
import matplotlib.pyplot as plt
def do_plot_1(ax):
    ax.plot([0.25, 0.5, 0.75], [0.25, 0.5, 0.75], 'k-')
def do_plot_2(ax):
    ax.plot([0.25, 0.5, 0.75], [0.25, 0.5, 0.75], 'g--')
fig = plt.figure(figsize=(6, 4))
gs = gridspec.GridSpec(nrows=3, ncols=1, height_ratios=[2, 1, 2])
for n in range(3):
    ax = fig.add_subplot(gs[n])
    if n == 1:
        do_plot_1(ax)
    else:
        do_plot_2(ax)
fig.tight_layout()
To use plt.subplot2grid, you'd need to effectively do something like this:
fig = plt.figure(figsize=(6, 4))
ax1 = plt.subplot2grid((5,1), (0, 0), rowspan=2)
ax2 = plt.subplot2grid((5,1), (2, 0), rowspan=1)
ax3 = plt.subplot2grid((5,1), (3, 0), rowspan=2)
Since you have two axes with a rowspan=2, your grid needs to be 2+1+2 = 5 blocks tall.
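The arithmetic above can be kept inside the loop from the question by tracking the starting row of each subplot. This is a sketch, assuming the rowspans are known up front; the names rowspans, total_rows and start are my own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt

rowspans = [2, 1, 2]          # height of each subplot, in grid rows
total_rows = sum(rowspans)    # 2 + 1 + 2 = 5 rows in the grid

fig = plt.figure(figsize=(6, 4))
axes = []
start = 0  # grid row where the next subplot begins
for span in rowspans:
    ax = plt.subplot2grid((total_rows, 1), (start, 0), rowspan=span)
    ax.plot([1, 2], [1, 2])
    axes.append(ax)
    start += span  # the next subplot starts right below this one
fig.tight_layout()
```

Because each subplot starts where the previous one ends, no two axes claim the same grid rows.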

Is it possible to automatically generate multiple subplots in matplotlib?

Is it possible to automatically generate multiple subplots in matplotlib? An example of the process I want to automate is:
import matplotlib.pyplot as plt
figure = plt.figure()
ax1 = figure.add_subplot(2, 3, 1)
ax2 = figure.add_subplot(2, 3, 2)
ax3 = figure.add_subplot(2, 3, 3)
ax4 = figure.add_subplot(2, 3, 4)
ax5 = figure.add_subplot(2, 3, 5)
ax6 = figure.add_subplot(2, 3, 6)
The subplots need unique names, as this will allow me to do stuff like:
for ax in [ax1, ax2, ax3, ax4, ax5, ax6]:
    ax.set_title("example")
Many thanks.
Addition: Are there any functions that automate the generation of multiple subplots? What if I needed to repeat the above process 100 times? Would I have to type out every ax1 to ax100?
You can use:
fig, axs = plt.subplots(2,3)
axs will be an array containing the subplots.
Or unpack the array instantly:
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(2,3)
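For the case of many subplots, where unpacking by name is not practical, the array can be iterated directly. This is a minimal sketch; the numbered titles are only there to show that each axes object can still be addressed individually.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 3)

# axs is a 2x3 numpy array of Axes; axs.flat walks it row by row
for i, ax in enumerate(axs.flat, start=1):
    ax.set_title("example {}".format(i))

# A single subplot is reached by (row, column) index instead of by name
axs[1, 2].set_xlabel("last subplot")
```

The same loop works unchanged for a 10 by 10 grid, so nothing like ax1 through ax100 needs to be typed out.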