matplotlib: Plot multiple small figures in one big plot - python-2.7

I have a pandas DataFrame pandas_df with 6 input columns (column_1, column_2, ..., column_6) and one result column, result. I used the following code to draw a scatter plot for each pair of input columns, which gives 6*5/2 = 15 figures in total. I ran the code 15 times, and each run generated one big figure.
Is there a way to iterate over all possible column pairs and draw all 15 figures as small subplots in one big plot? Thanks!
%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
pandas_df.plot(x='column_1', y='column_2', kind = 'scatter', c = 'result')

Consider the dataframe df:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 6), columns=pd.Series(list('123456')).radd('C'))
df
Solution
Use itertools and matplotlib.pyplot.subplots
from itertools import combinations
import matplotlib.pyplot as plt
pairs = list(combinations(df.columns, 2))
fig, axes = plt.subplots(len(pairs) // 3, 3, figsize=(15, 12))
for i, pair in enumerate(pairs):
    d = df[list(pair)]
    ax = axes[i // 3, i % 3]
    d.plot.scatter(*pair, ax=ax)
fig.tight_layout()
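The row count `len(pairs) // 3` works here because 15 pairs divide evenly into 3 columns. For other column counts the division would drop the last, partly filled row. A minimal sketch of the layout arithmetic, assuming a hypothetical helper named `grid_positions` (it is not part of pandas or matplotlib), could look like this:

```python
from itertools import combinations
from math import ceil

def grid_positions(columns, ncols=3):
    """Hypothetical helper: map each column pair to a (row, col) grid cell."""
    pairs = list(combinations(columns, 2))
    nrows = ceil(len(pairs) / ncols)  # round up so no pair is dropped
    positions = {pair: divmod(i, ncols) for i, pair in enumerate(pairs)}
    return nrows, positions

# Five columns give 5*4/2 = 10 pairs, which need 4 rows of 3 cells.
nrows, positions = grid_positions(['C1', 'C2', 'C3', 'C4', 'C5'])
print(nrows, positions[('C4', 'C5')])
```

Passing `nrows` and `ncols` to `plt.subplots` leaves two unused cells in this case; they can be hidden with `ax.set_visible(False)`.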

Related

How to plot graph from file using Python, problem of the junction of lines

I'm new to python and have a question. I have a file.csv file that contains two columns.
FILE.csv
0.0000 9.0655
0.0048 9.0640
0.0096 9.0592
0.0144 9.0510
0.0192 9.0392
0.0240 9.0233
0.0288 9.0028
0.0336 8.9770
0.0384 8.9451
0.0432 8.9063
0.0480 8.8595
0.0528 8.8039
0.0576 8.7385
0.0624 8.6626
0.0000 11.0013
0.0048 11.0018
0.0096 11.0032
0.0144 11.0057
0.0192 11.0091
0.0240 11.0134
0.0288 11.0186
0.0336 11.0247
0.0384 11.0317
0.0432 11.0394
0.0480 11.0478
0.0528 11.0569
0.0576 11.0666
0.0624 11.0767
0.0672 11.0873
I plotted the graph from FILE.csv with xmgrace and Gnuplot, and the result is convincing: I get two separate lines in the graph (two screenshots were attached to the original post).
On the other hand, if I use my Python script, the two lines are joined. Here is my script:
import matplotlib.pyplot as plt
import pylab as plt
#
with open('bb.gnu') as f:
    f=[x.strip() for x in f if x.strip()]
    data=[tuple(map(float,x.split())) for x in f[2:]]
BX1=[x[0] for x in data]
BY1=[x[1] for x in data]
plt.figure(figsize=(8,6))
ax = plt.subplot(111)
plt.plot(BX1, BY1, 'k-', linewidth=2 ,label='Dos')
plt.plot()
plt.savefig("Fig.png", dpi=100)
plt.show()
And here is the result (screenshot not reproduced here).
My question: is there a way to plot this graph with Python without the junction between the two lines, so that the result looks like the Gnuplot and xmgrace output?
Thank you in advance for your help.
To my knowledge, matplotlib is only joining your two curves because you provide them as one set of data. This means that you need to call plot twice in order to generate two curves. I put your data in a file called data.csv and wrote the following piece of code:
import numpy
import matplotlib.pyplot as plt
data = numpy.genfromtxt('data.csv')
starts = numpy.asarray(data[:, 0] == 0).nonzero()[0]
fig, ax = plt.subplots(nrows=1, ncols=1, num=0, figsize=(16, 8))
for i in range(starts.shape[0]):
    if i == starts.shape[0] - 1:
        ax.plot(data[starts[i]:, 0], data[starts[i]:, 1])
    else:
        ax.plot(data[starts[i]:starts[i + 1], 0],
                data[starts[i]:starts[i + 1], 1])
plt.show()
This generates a figure in which the two curves are drawn separately.
With starts, I look for the rows whose first column contains the value 0, which I take to be the start of a new curve. The loop then draws one curve per iteration. The if statement distinguishes the last curve from the others. There is probably a more elegant way, but it works.
Also, do not import pylab; it is discouraged because it fills the namespace unnecessarily.
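A more compact variant of the same idea, sketched under the same assumption that a 0 in the first column marks the start of a new curve, lets `numpy.split` cut the array. The helper name `split_curves` and the tiny inline array are made up for illustration:

```python
import numpy as np

def split_curves(data):
    """Split a two-column array wherever the first column returns to 0."""
    starts = np.flatnonzero(data[:, 0] == 0)
    cuts = starts[starts > 0]  # a cut at row 0 would create an empty piece
    return np.split(data, cuts)

# Made-up sample: two curves of 2 and 3 points
data = np.array([[0.0, 9.0], [0.1, 8.9],
                 [0.0, 11.0], [0.1, 11.1], [0.2, 11.2]])
curves = split_curves(data)
print([len(c) for c in curves])
```

Each element of `curves` can then be drawn in a plain loop with `ax.plot(curve[:, 0], curve[:, 1])`, without the special case for the last curve.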

Python: plot different kinds of colors [duplicate]

I am using matplotlib to create the plots. I have to identify each plot with a different color which should be automatically generated by Python.
Can you please give me a method to put different colors for different plots in the same figure?
Matplotlib does this by default.
E.g.:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.show()
And, as you may already know, you can easily add a legend:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.legend(['y = x', 'y = 2x', 'y = 3x', 'y = 4x'], loc='upper left')
plt.show()
If you want to control the colors that will be cycled through:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
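# set_prop_cycle replaces the older set_color_cycle, which newer matplotlib no longer provides
plt.gca().set_prop_cycle(color=['red', 'green', 'blue', 'yellow'])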
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.legend(['y = x', 'y = 2x', 'y = 3x', 'y = 4x'], loc='upper left')
plt.show()
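To check which colors the default cycle will hand out, one option is to read them from `rcParams`. This is a small sketch that assumes a style with a `color` entry in its property cycle (the default style has one); the `Agg` backend line is only there so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Colors of the current default cycle, in the order they will be used
default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

fig, ax = plt.subplots()
x = np.arange(10)
for k in range(1, 4):
    ax.plot(x, k * x)

# Each line picked up its own color from the cycle
line_colors = [line.get_color() for line in ax.lines]
print(default_colors[:3], line_colors)
```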
If you're unfamiliar with matplotlib, the tutorial is a good place to start.
Edit:
First off, if you have a lot (>5) of things you want to plot on one figure, either:
Put them on different plots (consider using a few subplots on one figure), or
Use something other than color (e.g. marker styles or line thickness) to distinguish between them.
Otherwise, you're going to wind up with a very messy plot! Be nice to whoever is going to read whatever you're doing and don't try to cram 15 different things onto one figure!!
Beyond that, many people are colorblind to varying degrees, and distinguishing between numerous subtly different colors is difficult for more people than you may realize.
That having been said, if you really want to put 20 lines on one axis with 20 relatively distinct colors, here's one way to do it:
import matplotlib.pyplot as plt
import numpy as np
num_plots = 20
# Have a look at the colormaps here and decide which one you'd like:
# http://matplotlib.org/1.2.1/examples/pylab_examples/show_colormaps.html
colormap = plt.cm.gist_ncar
# Sample num_plots evenly spaced colors from the chosen colormap
# (stopping at 0.9 because gist_ncar ends in a near-white color)
plt.gca().set_prop_cycle(plt.cycler('color', colormap(np.linspace(0, 0.9, num_plots))))
# Plot several different functions...
x = np.arange(10)
labels = []
for i in range(1, num_plots + 1):
    plt.plot(x, i * x + 5 * i)
    labels.append(r'$y = %ix + %i$' % (i, 5*i))
# I'm basically just demonstrating several different legend options here...
plt.legend(labels, ncol=4, loc='upper center',
           bbox_to_anchor=[0.5, 1.1],
           columnspacing=1.0, labelspacing=0.0,
           handletextpad=0.0, handlelength=1.5,
           fancybox=True, shadow=True)
plt.show()
Setting them later
If you don't know in advance how many lines you are going to plot, you can change the colours after plotting them, retrieving the number of lines directly from the axes through .lines. I use this solution:
Some random data:
import matplotlib.pyplot as plt
import numpy as np
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
for i in range(1,15):
    ax1.plot(np.array([1,5])*i,label=i)
The piece of code that you need:
colormap = plt.cm.gist_ncar #nipy_spectral, Set1,Paired
colors = [colormap(i) for i in np.linspace(0, 1,len(ax1.lines))]
for i,j in enumerate(ax1.lines):
    j.set_color(colors[i])
ax1.legend(loc=2)
The result is the following:
TL;DR No, it can't be done automatically. Yes, it is possible.
import matplotlib.pyplot as plt
my_colors = plt.rcParams['axes.prop_cycle']() # <<< note that we CALL the prop_cycle
fig, axes = plt.subplots(2,3)
for ax in axes.flatten(): ax.plot((0,1), (0,1), **next(my_colors))
Each plot (axes) in a figure has its own cycle of colors. If you don't force a different color for each plot, all the plots go through the same colors in the same order; but if we stretch a bit what "automatically" means, it can be done.
The OP wrote
[...] I have to identify each plot with a different color which should be automatically generated by [Matplotlib].
But Matplotlib automatically generates a different color for each curve:
In [10]: import numpy as np
    ...: import matplotlib.pyplot as plt
In [11]: plt.plot((0,1), (0,1), (1,2), (1,0));
So why the OP's request? If we continue to read, we have
Can you please give me a method to put different colors for different plots in the same figure?
and it makes sense, because each plot (each axes in Matplotlib's parlance) has its own color_cycle (or rather, in 2018, its prop_cycle) and each plot (axes) reuses the same colors in the same order.
In [12]: fig, axes = plt.subplots(2,3)
In [13]: for ax in axes.flatten():
    ...:     ax.plot((0,1), (0,1))
If this is the meaning of the original question, one possibility is to explicitly name a different color for each plot.
If the plots (as often happens) are generated in a loop, we must have an additional loop variable to override the color automatically chosen by Matplotlib.
In [14]: fig, axes = plt.subplots(2,3)
In [15]: for ax, short_color_name in zip(axes.flatten(), 'brgkyc'):
    ...:     ax.plot((0,1), (0,1), short_color_name)
Another possibility is to instantiate a cycler object
from cycler import cycler
my_cycler = cycler('color', ['k', 'r']) * cycler('linewidth', [1., 1.5, 2.])
actual_cycler = my_cycler()
fig, axes = plt.subplots(2,3)
for ax in axes.flat:
    ax.plot((0,1), (0,1), **next(actual_cycler))
Note that type(my_cycler) is cycler.Cycler but type(actual_cycler) is itertools.cycle.
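To make the difference between the two objects concrete, here is a small sketch using only the `cycler` package (the variable names `styles`, `endless` and `wrapped` are introduced here for illustration). The product of the two cyclers has 2 x 3 = 6 style dictionaries, while the object returned by calling it never runs out:

```python
from cycler import cycler

my_cycler = cycler('color', ['k', 'r']) * cycler('linewidth', [1., 1.5, 2.])

# A Cycler is finite: iterating over it yields each style dictionary once
styles = list(my_cycler)
print(len(styles))
print(styles[0], styles[-1])

# Calling it returns an endless iterator that wraps around after the last style
endless = my_cycler()
wrapped = [next(endless) for _ in range(len(styles) + 1)]
print(wrapped[-1] == styles[0])
```

With more axes than styles, the endless iterator simply starts again from the first style.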
I would like to offer a minor improvement on the last loop in the answer above (that answer is correct and should still be accepted). The implicit assumption made when labeling the last example is that plt.legend(LIST) attaches label number X in LIST to the line created by the Xth call to plot. I have run into problems with this approach before. The way to build legends and customize their labels recommended in matplotlib's documentation ( http://matplotlib.org/users/legend_guide.html#adjusting-the-order-of-legend-item) is to pass the plot handles explicitly, so that you can be sure the labels go along with the exact plots you think they do:
...
# Plot several different functions...
labels = []
plotHandles = []
for i in range(1, num_plots + 1):
    # x_values, y_values and label are placeholders for your own data
    line, = plt.plot(x_values, y_values)  # need the ',' per ** below
    plotHandles.append(line)
    labels.append(label)
plt.legend(plotHandles, labels, loc='upper left', ncol=1)
**: Matplotlib Legends not working
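A runnable version of this handle-based pattern, with made-up straight lines standing in for the placeholder data, might look like the following sketch. The `Agg` backend line is only there so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
fig, ax = plt.subplots()
handles, labels = [], []
for i in range(1, 4):
    line, = ax.plot(x, i * x)   # plot returns a list; unpack its single Line2D
    handles.append(line)
    labels.append('y = %dx' % i)

# Handles and labels are paired explicitly, so the order cannot drift
legend = ax.legend(handles, labels, loc='upper left', ncol=1)
print([text.get_text() for text in legend.get_texts()])
```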
Matplotlib colors your plots with different colors by default, but in case you want to set specific colors:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x,color='blue')
plt.plot(x, 3 * x,color='red')
plt.plot(x, 4 * x,color='green')
plt.show()
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib
import matplotlib.pyplot as plt
import numpy as np
from skspatial.objects import Line
# Assumption: LineList is your own list of (StartPoint, EndPoint) pairs
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for count, (StartPoint, EndPoint) in enumerate(LineList):
    Line_Color = np.random.rand(3,)  # random RGB triple
    Line.from_points(StartPoint, EndPoint).plot_3d(ax, c=Line_Color, label="Line"+str(count))
plt.legend(loc='lower left')
plt.show(block=True)
The code above may help you add 3D lines with randomly chosen colours. Each coloured line can also be identified in a legend through the label="..." parameter.
Honestly, my favourite way to do this is pretty simple. It won't work for an arbitrarily large number of plots, but it will do for up to 1163. It takes the map of all of matplotlib's named colours and selects from it at random.
from random import choice
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
# Get full named colour map from matplotlib (note: _colors_full_map is a private attribute)
colours = mcolors._colors_full_map # This is a dictionary of all named colours
# Turn the dictionary into a list
color_lst = list(colours.values())
# Plot using these random colours
# (plots, x and y are placeholders for your own data)
for n, plot in enumerate(plots):
    plt.scatter(plot[x], plot[y], color=choice(color_lst), label=n)
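One caveat with `choice` is that it can return the same colour twice. A hedged variation, assuming the public `matplotlib.colors.CSS4_COLORS` dictionary is an acceptable palette (it is smaller than the private full map used above), draws names without replacement. The helper name `pick_distinct_colours` is made up for this sketch:

```python
import random
import matplotlib.colors as mcolors

def pick_distinct_colours(n, seed=None):
    """Hypothetical helper: return n different CSS4 colour names."""
    names = sorted(mcolors.CSS4_COLORS)   # public dict of named colours
    rng = random.Random(seed)             # a seed makes the choice repeatable
    return rng.sample(names, n)           # raises ValueError if n is too large

colours = pick_distinct_colours(5, seed=0)
print(colours)
```

The names are guaranteed to differ, although a few CSS4 names are aliases for the same colour (for example gray and grey), and random picks are not guaranteed to be easy to tell apart.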

Plotting error bars from 2 axis

I'm looking to plot the standard deviation of some array data I've been working with in Python; however, the data is averaged over both longitude and latitude (axes 2 and 3 of my arrays).
What I have so far is a monthly plot (image not reproduced here), but I can't get the standard deviations to work.
I was wondering if anyone knows how to get around this problem. Here's the code I've used so far.
Any help is much appreciated!
# import things
import matplotlib.pyplot as plt
import numpy as np
import netCDF4
# f is a netCDF4 dataset opened earlier (not shown in the original post)
# [ date, hour, 0, lon, lat ]
temp = (f.variables['TEMP2'][:, 14:24, 0, :, :]) # temp at 2m
temp2 = (f.variables['TEMP2'][:, 0:14, 0, :, :])
# concatenate back to 24 hour period
tercon = np.concatenate((temp, temp2), axis=1)
ter1 = tercon.mean(axis=(2, 3))
rtemp = np.reshape(ter1, 672)-273
# X axis dates instead of times
date = np.arange(rtemp.shape[0]) # assume that delta time between data is 1
date21 = (date/24.) # use days instead of hours
# change plot size for monthly
plt.rcParams['figure.figsize'] = 15, 5
plt.plot(date21, rtemp , linestyle='-', linewidth=3.0, c='orange')
You should use errorbar instead of plot and pass the precalculated standard deviations. The following adapted example uses random data to emulate your temperature data with an hourly resolution and computes the daily mean and standard deviation.
# import things
import matplotlib.pyplot as plt
import numpy as np
# x-axis: day-of-month
date21 = np.arange(1, 31)
# generate random "hourly" data
hourly_temp = np.random.random(30*24)*10 + 20
# one row per day, one column per hour
daily_temp = hourly_temp.reshape(30, 24)
# mean "temperature" per day
daily_mean_temp = daily_temp.mean(axis=1)
# standard deviation per day
daily_std_temp = daily_temp.std(axis=1)
# create a figure
figure = plt.figure(figsize = (15, 5))
# add an axes to the figure
ax = figure.add_subplot(111)
ax.grid()
ax.errorbar(date21, daily_mean_temp, yerr=daily_std_temp, fmt="-o", capsize=15, capthick=3, linewidth=3.0, c='orange')
plt.show()
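For the array layout in the question, the same two statistics can be taken over the longitude and latitude axes in one call each. This is a sketch with random numbers standing in for the TEMP2 variable; the shape (28 days, 24 hours, 4 x 5 grid points) is an assumption chosen so that the flattened series has the 672 values used in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the temperature array: [date, hour, lon, lat], in kelvin
temp = 293.0 + rng.normal(0.0, 2.0, size=(28, 24, 4, 5))

# Average and spread over the two spatial axes, then flatten to one hourly series
mean_temp = temp.mean(axis=(2, 3)).reshape(-1) - 273
std_temp = temp.std(axis=(2, 3)).reshape(-1)

days = np.arange(mean_temp.shape[0]) / 24.0   # x axis in days
print(mean_temp.shape, std_temp.shape)
```

These arrays can be passed directly to `ax.errorbar(days, mean_temp, yerr=std_temp)`.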

Matching dendrogram with cluster number in Python's scipy.cluster.hierarchy

The following code generates a simple hierarchical cluster dendrogram with 10 leaf nodes:
import numpy as np
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt
X = np.random.randn(10,2)  # np.random.randn replaces the deprecated scipy.randn alias
d = sch.distance.pdist(X)
Z= sch.linkage(d,method='complete')
P =sch.dendrogram(Z)
plt.show()
I generate three flat clusters like so:
T = sch.fcluster(Z, 3, 'maxclust')
# array([3, 1, 1, 2, 2, 2, 2, 2, 1, 2])
However, I'd like to see the cluster labels 1,2,3 on the dendrogram. It's easy for me to visualize with just 10 leaf nodes and three clusters, but when I have 1000 nodes and 10 clusters, I can't see what's going on.
How do I show the cluster numbers on the dendrogram? I'm open to other packages. Thanks.
Here is a solution that appropriately colors the clusters and labels the leaves of the dendrogram with the appropriate cluster name (leaves are labeled: 'point number, cluster number'). These techniques can be used independently or together. I modified your original example to include both:
import numpy as np
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt
n=10
k=3
X = np.random.randn(n,2)  # np.random.randn replaces the deprecated scipy.randn alias
d = sch.distance.pdist(X)
Z= sch.linkage(d,method='complete')
T = sch.fcluster(Z, k, 'maxclust')
# calculate labels
labels=list('' for i in range(n))
for i in range(n):
    labels[i]=str(i)+ ',' + str(T[i])
# calculate color threshold
ct=Z[-(k-1),2]
#plot
P =sch.dendrogram(Z,labels=labels,color_threshold=ct)
plt.show()
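With 1000 leaves the labels become unreadable, so it can help to get the leaf-to-cluster mapping as data instead. A minimal sketch, assuming it is enough to list the cluster number under each leaf in left-to-right order, uses the `no_plot=True` option of `dendrogram`:

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
Z = sch.linkage(pdist(X), method='complete')
T = sch.fcluster(Z, 3, 'maxclust')

# no_plot=True computes the layout without drawing anything
info = sch.dendrogram(Z, no_plot=True)
leaves = info['leaves']          # observation indices, left to right
clusters_in_order = T[leaves]    # flat cluster number under each leaf
print(leaves)
print(clusters_in_order)
```

The positions where `clusters_in_order` changes value mark the cluster boundaries along the leaf axis, which could be used to annotate the plot once per cluster instead of once per leaf.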

Use a loop to plot n charts Python

I have a set of data that I load into Python as a pandas DataFrame. What I would like to do is create a loop that draws a plot for each column in its own frame, not all on one. My data is in an Excel file structured in this fashion:
Index | DATE | AMB CO 1 | AMB CO 2 |...|AMB CO_n | TOTAL
1 | 1/1/12| 14 | 33 |...| 236 | 1600
. | ... | ... | ... |...| ... | ...
. | ... | ... | ... |...| ... | ...
. | ... | ... | ... |...| ... | ...
n
This is what I have for code so far:
import pandas as pd
import matplotlib.pyplot as plt
ambdf = pd.read_excel('Ambulance.xlsx',
                      sheet_name='Sheet2', index_col=0, na_values=['NA'])
print(type(ambdf))
print(ambdf)
print(ambdf['EAS'])
amb_plot = plt.plot(ambdf['EAS'], linewidth=2)
plt.title('EAS Ambulance Numbers')
plt.xlabel('Month')
plt.ylabel('Count of Deliveries')
print(amb_plot)
for i in ambdf:
    print(plt.plot(ambdf[i], linewidth = 2))
I am thinking of doing something like this:
for i in ambdf:
    ambdf_plot = plt.plot(ambdf, linewidth = 2)
The above was not remotely what I wanted, and it stems from my unfamiliarity with pandas, matplotlib, etc. Looking at some documentation, though, it seems to me that matplotlib may not even be needed (question B).
So A) how can I produce a plot of the data for every column in my df,
and B) do I need to use matplotlib, or should I just use pandas to do it all?
Thank you,
Ok, so the easiest method to create several plots is this:
import matplotlib.pyplot as plt
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
for i in range(len(x)):
    plt.figure()
    plt.plot(x[i],y[i])
    # Show/save figure as desired.
    plt.show()
# Can show all four figures at once by calling plt.show() here, outside the loop.
#plt.show()
Note that you need to create a figure every time or pyplot will plot in the first one created.
If you want to create several data series all you need to do is:
import matplotlib.pyplot as plt
plt.figure()
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]]
plt.plot(x[0],y[0],'r',x[1],y[1],'g',x[2],y[2],'b',x[3],y[3],'k')
You could automate it by keeping a list of colours such as ['r','g','b','k'] and, in a loop, passing each entry of this list together with the corresponding data to plot. If you just want to add data series to one plot programmatically, something like this will do it (no new figure is created each time, so everything is plotted in the same figure):
import matplotlib.pyplot as plt
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]]
colours=['r','g','b','k']
plt.figure() # In this example, all the plots will be in one figure.
for i in range(len(x)):
    plt.plot(x[i],y[i],colours[i])
plt.show()
In any case, matplotlib has a very good documentation page with plenty of examples.
17 Dec 2019: added plt.show() and plt.figure() calls to clarify this part of the story.
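Applied to the DataFrame in the question, the same one-figure-per-iteration idea might look like the sketch below. The column names and random numbers are stand-ins for the ambulance data, and the `Agg` backend line is only there so that it runs without a display. On question B: pandas plotting draws through matplotlib, so the two are used together rather than one replacing the other.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the ambulance data
df = pd.DataFrame(np.random.rand(12, 3) * 100,
                  columns=['AMB CO 1', 'AMB CO 2', 'TOTAL'])

figures = {}
for name in df.columns:
    fig, ax = plt.subplots()            # a new figure for every column
    df[name].plot(ax=ax, linewidth=2)   # pandas draws onto the given axes
    ax.set_title(name)
    ax.set_xlabel('Month')
    ax.set_ylabel('Count of Deliveries')
    figures[name] = fig
print(len(figures))
```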
Use a dictionary!!
You can also use dictionaries, which allow you to have more control over the plots:
import matplotlib.pyplot as plt
#    plot 0    plot 1    plot 2    plot 3
x=[[1,2,3,4],[1,4,3,4],[1,2,3,4],[9,8,7,4]]
y=[[3,2,3,4],[3,6,3,4],[6,7,8,9],[3,2,2,4]]
plots = zip(x,y)
def loop_plot(plots):
    figs={}
    axs={}
    for idx,plot in enumerate(plots):
        figs[idx]=plt.figure()
        axs[idx]=figs[idx].add_subplot(111)
        axs[idx].plot(plot[0],plot[1])
    return figs, axs
figs, axs = loop_plot(plots)
Now you can select the plot that you want to modify easily:
axs[0].set_title("Now I can control it!")
Of course, it is up to you to decide what to do with the plots. You can either save them to disk with figs[idx].savefig("plot_%s.png" %idx) or show them with plt.show(). Use the argument block=False only if you want to pop up all the plots together (this could be quite messy if you have a lot of plots). You can do this inside the loop_plot function or in a separate loop using the dictionaries that the function returned.
Just to add: returning figs and axs is not required in order to call plt.show().
Here are two examples of how to generate graphs in separate windows (frames), and, an example of how to generate graphs and save them into separate graphics files.
Okay, first the on-screen example. Notice that we use a separate instance of plt.figure(), for each graph, with plt.plot(). At the end, we have to call plt.show() to put it all on the screen.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
plt.show()
Another way to do this, is to use plt.show(block=False) inside the loop:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
    plt.show( block=False )
Now, let's generate the graphs and instead, write them each to a file. Here we replace plt.show(), with plt.savefig( filename ). The difference from the previous example is that we don't have to account for ''blocking'' at each graph. Note also, that we number the file names. Here we use %03d so that we can conveniently have them in number order afterwards.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
    plt.savefig('myfilename%03d.png'%(n))
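When many files are written this way, every figure stays open in memory until it is closed. A sketch of the same loop that closes each figure after saving is shown below; the temporary output directory is an assumption made so that the example does not write into the working directory, and the `Agg` backend line lets it run without a display:

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10)
out_dir = tempfile.mkdtemp()            # hypothetical output location
saved = []
for n in range(3):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x + n))
    path = os.path.join(out_dir, 'myfilename%03d.png' % n)
    fig.savefig(path)
    plt.close(fig)                      # release the figure once it is on disk
    saved.append(path)
print(saved)
```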
If your requirement is to plot every column against one column, feel free to use this (first import the data into a pandas DataFrame df). It draws a matrix of plots with 5 columns and as many rows as required:
import math
import matplotlib.pyplot as plt
i,j=0,0
PLOTS_PER_ROW = 5
# squeeze=False keeps axs two-dimensional even when there is only one row
fig, axs = plt.subplots(math.ceil(len(df.columns)/PLOTS_PER_ROW),PLOTS_PER_ROW, figsize=(20, 60), squeeze=False)
for col in df.columns:
    axs[i][j].scatter(df['target_col'], df[col], s=3)
    axs[i][j].set_ylabel(col)
    j+=1
    if j%PLOTS_PER_ROW==0:
        i+=1
        j=0
plt.show()
A simple way of plotting on different frames would be:
import matplotlib.pyplot as plt
# list_groups is a placeholder for your own list of data series
for grp in list_groups:
    plt.figure()
    plt.plot(grp)
plt.show()
Matplotlib will then create a separate frame for each iteration.
We can create a for loop and pass all the numeric columns into it. The loop will draw the graphs one by one in separate panes because we call plt.figure() inside it.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# take only the numeric columns from the dataframe
numeric_features=[x for x in data.columns if data[x].dtype!="object"]
for i in data[numeric_features].columns:
    plt.figure(figsize=(12,5))
    plt.title(i)
    sns.boxplot(data=data[i])
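The `dtype != "object"` test also lets datetime and boolean columns through. If only numbers are wanted, a possible alternative is pandas' `select_dtypes`; the small frame below is made up for illustration:

```python
import pandas as pd

# Made-up frame with mixed column types
data = pd.DataFrame({'age': [31, 45, 27],
                     'income': [3200.0, 4100.5, 2900.0],
                     'city': ['Oslo', 'Lima', 'Pune']})

# Keep only integer and floating-point columns
numeric_features = data.select_dtypes(include='number').columns.tolist()
print(numeric_features)
```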