I have found that the most time-consuming step in my code is saving plots to disk. I'm using matplotlib for plotting and saving. I run many simulations and save the plots from each one, and this is consuming a huge number of compute hours. I have verified that the plotting is indeed the bottleneck.
It seems that pyqtgraph renders and saves images faster than matplotlib. Could something like the following code be implemented in pyqtgraph?
import matplotlib
matplotlib.use('Agg')
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
from matplotlib.ticker import AutoMinorLocator
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(6, 2.5))
rows = 1
columns = 2
gs = gridspec.GridSpec(rows, columns, hspace=0.0,wspace=0.0)
aj=0
# lines, species, fitregs and red_chisqr come from the fitting code (not shown)
for specie in lines:
    for transition in species[specie].items():
        gss = gridspec.GridSpecFromSubplotSpec(2, 1, subplot_spec=gs[aj], hspace=0.0, height_ratios=[1, 3])
        ax0 = fig.add_subplot(gss[0])
        ax1 = fig.add_subplot(gss[1], sharex=ax0)
        ax0.plot(fitregs[specie+transition[0]+'_Vel'], fitregs['N_residue'], color='black', linewidth=0.85)
        ax1.plot(fitregs[specie+transition[0]+'_Vel'], fitregs['N_flux'], color='black', linewidth=0.85)
        ax1.plot(fitregs[specie+transition[0]+'_Vel'], fitregs['Best_profile'], color='red', linewidth=0.85)
        ax1.xaxis.set_minor_locator(AutoMinorLocator())
        ax1.tick_params(which='both', width=1)
        ax1.tick_params(which='major', length=5)
        ax1.tick_params(which='minor', length=2.5)
        ax1.text(0.70, 0.70, r'$\chi^{2}_{\nu}$'+'= {}'.format(round(red_chisqr[aj], 2)), transform=ax1.transAxes)
        ax1.text(0.10, 0.10, '{}'.format(specie+' '+transition[0]), transform=ax1.transAxes)
        aj = aj + 1  # move on to the next panel
canvas = FigureCanvas(fig)
canvas.print_figure('fits.png')
Example output from the above code: (image not shown)
I can't figure out how to combine these two plots.
Here is the relevant code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter Notebook magic
%pylab inline
df_state.head()
lm_original_plot.head()
(outputs of .head() not shown)
df_state.loc['Alabama'][['month', 'total purchases', 'permit', 'permit_recheck']].plot(x='month', figsize=(6, 3), linestyle='-', marker='o')
lm_original_plot.plot(x='month', figsize=(6, 3), linestyle=':');
(plot outputs not shown)
This is how I would do it (not saying it is the best method):
1) Merge the two DataFrames on month:
merged = df_state.merge(lm_original_plot, on='month', how='left')
2) Create the figure. total is now a column just like the other variables in the first chart, so you can just add 'total' to the column list in your first chart code.
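A minimal sketch of those two steps, using small made-up frames in place of df_state and lm_original_plot. The names state, trend and merged and all the numbers are hypothetical, and I am assuming the trend frame has a column called total. Passing ax= to the second plot call is another way to get both sets of lines onto one chart if you want a different line style for the added column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the two frames in the question.
state = pd.DataFrame({
    "month": [1, 2, 3],
    "total purchases": [120, 135, 150],
    "permit": [30, 28, 35],
})
trend = pd.DataFrame({
    "month": [1, 2, 3],
    "total": [118.0, 136.0, 149.0],  # assumed column name
})

# 1) Merge on the shared 'month' column; a left join keeps every row of `state`.
merged = state.merge(trend, on="month", how="left")

# 2) Plot the original columns, then draw the merged-in column on the same Axes.
ax = merged.plot(x="month", y=["total purchases", "permit"],
                 figsize=(6, 3), linestyle="-", marker="o")
merged.plot(x="month", y="total", ax=ax, linestyle=":")
```

In a notebook the combined chart is displayed automatically; in a script, call plt.show() or ax.get_figure().savefig(...) afterwards.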
Not my work, just what a peer showed me.
for info, shape in zip(map.counties_info, map.counties):
    if info['FIPS'] in geoids:
        x = np.random.rand(1)[0]
        c = cmap(x)[:3]
        newc = rgb2hex(c)
        patches.append(Polygon(np.array(shape), color=newc, closed=True))
ax.add_collection(PatchCollection(patches))
plt.title('Counties with HQ of NYSE-Listed Firms: 1970')
plt.show()
produces this image: (image not shown)
My question: the code specifically asks for random colors for the polygons. If I print the values of newc and check them on a website that converts hex codes to colors, I see a wide range of different colors. But the output shows only one. How can I fix this?
For a PatchCollection to have different colors for the individual patches, you have two options:
1) Using the colors of the original patches.
2) Using a colormap to determine the colors according to some array of values.
Using the colors of the original patches.
This approach is closest to the code from the question. It requires passing the argument match_original=True to the PatchCollection; without it, the collection applies its own default color to every patch and ignores the colors set on the individual polygons.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import matplotlib.patches
import matplotlib.collections
ar = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]])
cmap=plt.cm.jet
patches=[]
fig, ax=plt.subplots()
for i in range(5):
    x = np.random.rand(1)[0]
    c = cmap(x)[:3]
    poly = plt.Polygon(ar+i, color=c, closed=True)
    patches.append(poly)
collection = matplotlib.collections.PatchCollection(patches,match_original=True)
ax.add_collection(collection)
ax.autoscale()
plt.show()
Using a colormap to determine the colors according to some array of values.
This is probably easier to implement. Instead of giving each individual polygon a color, you would set an array of values to the PatchCollection and specify a colormap according to which the polygons are colorized.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import matplotlib.patches
import matplotlib.collections
ar = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]])
values = np.random.rand(5)
cmap=plt.cm.jet
patches=[]
fig, ax=plt.subplots()
for i in range(len(values)):
    poly = plt.Polygon(ar+i, closed=True)
    patches.append(poly)
collection = matplotlib.collections.PatchCollection(patches, cmap=cmap)
collection.set_array(values)
ax.add_collection(collection)
ax.autoscale()
plt.show()
Hello everyone,
I need some help. I wrote this script:
import matplotlib.pyplot as plt
import scipy
import pyfits
import numpy as np
import re
import os
import glob
import time
global numbers
numbers=re.compile(r'(\d+)')
def numericalSort(value):
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts
image_list=sorted(glob.glob('*.fit'), key=numericalSort)
for i in range(len(image_list)):
    hdulist=pyfits.open(image_list[i])
    data=hdulist[0].data
    dimension=hdulist[0].header['NAXIS1']
    time=hdulist[0].header['TIME']
    hours=float(time[:2])*3600
    minutes=float(time[3:5])*60
    sec=float(time[6:])
    cas=hours+minutes+sec
    y=[]
    for n in range(0,dimension):
        y.append(data.flat[n])
    maxy= max(y)
    print(image_list[i],cas,maxy)
    plt.plot([cas],[maxy],'bo')
plt.ion()
plt.draw()
This script reads FITS data files. From each file it finds the maximum value, which is the y value, and reads TIME from the header, which is the x value.
And now my problem: when I run this script I get a graph, but only with points. How do I get a graph with a line (point to point)?
Thanks for any answers and help.
Your problem may well be here:
plt.plot([cas],[maxy],'bo')
at the point that this statement is encountered, cas is a single value and maxy is also a single value -- you have only one point to plot and therefore nothing to join. Next time round the loop you plot another single point, unconnected to the previous one, and so on.
I can't be sure, but perhaps you mean to do something like:
x = []
y = []
for i in range(len(image_list)):
    hdulist=pyfits.open(image_list[i])
    data=hdulist[0].data
    dimension=hdulist[0].header['NAXIS1']
    time=hdulist[0].header['TIME']
    hours=float(time[:2])*3600
    minutes=float(time[3:5])*60
    sec=float(time[6:])
    cas=hours+minutes+sec
    x.append(cas)
    row=[]
    for n in range(0,dimension):
        row.append(data.flat[n])
    maxy= max(row)
    y.append(maxy)  # keep one maximum per file so x and y have the same length
    print(image_list[i],cas,maxy)
plt.plot(x, y ,'bo-')
plt.ion()
plt.draw()
i.e. plot a single line once you've collected all the x and y values. In the format string 'bo-', the trailing '-' is the line style that provides the connecting line.
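The same idea without the FITS files, as a rough sketch: collect one (time, maximum) pair per file, then make a single plot call. The sample values and the to_seconds helper are made up for illustration, and I am assuming the TIME header looks like 'HH:MM:SS'.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def to_seconds(stamp):
    """Convert an 'HH:MM:SS' string (assumed header format) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Hypothetical (TIME header, maximum pixel value) pairs, one per file.
samples = [("01:00:00", 410), ("01:01:00", 455), ("01:02:00", 430), ("01:03:00", 470)]

times, peaks = [], []
for stamp, maxy in samples:
    times.append(to_seconds(stamp))
    peaks.append(maxy)

# One call with all the points: 'b' = blue, 'o' = point markers, '-' = connecting line.
fig, ax = plt.subplots()
(line,) = ax.plot(times, peaks, "bo-")
```

Because all the points belong to one Line2D object, matplotlib can join them; separate plot calls with one point each produce separate lines with nothing to connect.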
OK, here is the solution:
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy
import pyfits
import numpy as np
import re
import os
import glob
import time
global numbers
numbers=re.compile(r'(\d+)')
def numericalSort(value):
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts
fig=plt.figure()
ax1=fig.add_subplot(1,1,1)
def animate(i):
    image_list=sorted(glob.glob('*.fit'), key=numericalSort)
    cas,maxy=[],[]
    files=open("data.dat","w")
    for n in range(len(image_list)):
        hdulist=pyfits.open(image_list[n])
        data=hdulist[0].data
        maxy=data.max()
        time=hdulist[0].header['TIME']
        hours=int(float(time[:2])*3600)
        minutes=int(float(time[3:5])*60)
        sec=int(float(time[6:]))
        cas=hours+minutes+sec
        files.write("\n{},{}".format(cas,maxy))
    files.close()
    pool=open('data.dat','r')
    data=pool.read()
    dataA=data.split('\n')
    xar=[]
    yar=[]
    pool.close()
    for line in dataA:
        if len(line)>1:
            x,y=line.split(',')
            xar.append(int(x))
            yar.append(int(y))
    print(xar,yar)
    ax1.clear()
    ax1.plot(xar,yar,'b-')
    ax1.plot(xar,yar,'ro')
    plt.title('Light curve')
    plt.xlabel('TIME')
    plt.ylabel('Max intensity')
    plt.grid()
# Assumed final step (not in the post as given): register animate so it reruns periodically.
ani=animation.FuncAnimation(fig, animate, interval=1000)
plt.show()
This script reads the values from the files and plots them.
I have a set of data that I load into Python using a pandas DataFrame. What I would like to do is create a loop that draws a plot for each column in its own frame, rather than all on one. My data is in an Excel file structured like this:
Index | DATE   | AMB CO 1 | AMB CO 2 | ... | AMB CO_n | TOTAL
1     | 1/1/12 | 14       | 33       | ... | 236      | 1600
.     | ...    | ...      | ...      | ... | ...      | ...
.     | ...    | ...      | ...      | ... | ...      | ...
.     | ...    | ...      | ...      | ... | ...      | ...
n
This is what I have for code so far:
import pandas as pd
import matplotlib.pyplot as plt
ambdf = pd.read_excel('Ambulance.xlsx',
                      sheet_name='Sheet2', index_col=0, na_values=['NA'])
print(type(ambdf))
print(ambdf)
print(ambdf['EAS'])
amb_plot = plt.plot(ambdf['EAS'], linewidth=2)
plt.title('EAS Ambulance Numbers')
plt.xlabel('Month')
plt.ylabel('Count of Deliveries')
print(amb_plot)
for i in ambdf:
    print(plt.plot(ambdf[i], linewidth = 2))
I am thinking of doing something like this:
for i in ambdf:
    ambdf_plot = plt.plot(ambdf, linewidth = 2)
The above was not remotely what I wanted, and it stems from my unfamiliarity with pandas, matplotlib, etc. Looking at some documentation, though, it seems to me that matplotlib may not even be needed (question B).
So: A) How can I produce a plot of the data for every column in my df,
and B) do I need to use matplotlib, or should I just use pandas to do it all?
Thank you,
Ok, so the easiest method to create several plots is this:
import matplotlib.pyplot as plt
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
for i in range(len(x)):
    plt.figure()
    plt.plot(x[i],y[i])
    # Show/save figure as desired.
    plt.show()
# Can show all four figures at once by calling plt.show() here, outside the loop.
#plt.show()
Note that you need to create a figure every time or pyplot will plot in the first one created.
If you want to create several data series all you need to do is:
import matplotlib.pyplot as plt
plt.figure()
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]]
plt.plot(x[0],y[0],'r',x[1],y[1],'g',x[2],y[2],'b',x[3],y[3],'k')
You could automate it by keeping a list of colours like ['r','g','b','k'] and looping over that list together with the corresponding data. If you just want to add data series to one plot programmatically, something like this will do it (no new figure is created each time, so everything is plotted in the same figure):
import matplotlib.pyplot as plt
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]]
colours=['r','g','b','k']
plt.figure() # In this example, all the plots will be in one figure.
for i in range(len(x)):
    plt.plot(x[i],y[i],colours[i])
plt.show()
In any case, matplotlib has a very good documentation page with plenty of examples.
17 Dec 2019: added plt.show() and plt.figure() calls to clarify this part of the story.
Use a dictionary!!
You can also use dictionaries, which allow you to have more control over the plots:
import matplotlib.pyplot as plt
# plot 0 plot 1 plot 2 plot 3
x=[[1,2,3,4],[1,4,3,4],[1,2,3,4],[9,8,7,4]]
y=[[3,2,3,4],[3,6,3,4],[6,7,8,9],[3,2,2,4]]
plots = zip(x,y)
def loop_plot(plots):
    figs={}
    axs={}
    for idx,plot in enumerate(plots):
        figs[idx]=plt.figure()
        axs[idx]=figs[idx].add_subplot(111)
        axs[idx].plot(plot[0],plot[1])
    return figs, axs
figs, axs = loop_plot(plots)
Now you can select the plot that you want to modify easily:
axs[0].set_title("Now I can control it!")
Of course, it is up to you to decide what to do with the plots. You can either save them to disk with figs[idx].savefig("plot_%s.png" % idx) or show them with plt.show(). Pass block=False to plt.show() only if you want all the plots to pop up together (this could get quite messy if you have a lot of plots). You can do this inside the loop_plot function or in a separate loop over the dictionaries that the function returns.
Just to add: returning figs and axs is not required in order to call plt.show().
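For the save-to-disk route, something along these lines should work. It is only a sketch: the temporary output directory and the saved list are my own additions, and closing each figure after saving is a choice I made to keep memory use down when there are many plots.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def loop_plot(plots):
    """Create one figure per (x, y) pair and return them keyed by index."""
    figs, axs = {}, {}
    for idx, (xs, ys) in enumerate(plots):
        figs[idx] = plt.figure()
        axs[idx] = figs[idx].add_subplot(111)
        axs[idx].plot(xs, ys)
    return figs, axs


x = [[1, 2, 3, 4], [1, 4, 3, 4]]
y = [[3, 2, 3, 4], [3, 6, 3, 4]]
figs, axs = loop_plot(zip(x, y))

# Any plot can still be adjusted through its key before saving.
axs[0].set_title("First plot")

out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for idx, fig in figs.items():
    path = os.path.join(out_dir, "plot_%s.png" % idx)
    fig.savefig(path)
    plt.close(fig)  # release the figure once it is on disk
    saved.append(path)
```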
Here are two examples of how to generate graphs in separate windows (frames), and, an example of how to generate graphs and save them into separate graphics files.
Okay, first the on-screen example. Notice that we use a separate instance of plt.figure(), for each graph, with plt.plot(). At the end, we have to call plt.show() to put it all on the screen.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
plt.show()
Another way to do this is to use plt.show(block=False) inside the loop:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
    plt.show( block=False )
Now, let's generate the graphs and write each one to a file instead. Here we replace plt.show() with plt.savefig( filename ). The difference from the previous example is that we don't have to account for "blocking" at each graph. Note also that we number the file names; we use %03d so that they sort conveniently in numeric order afterwards.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
    plt.savefig('myfilename%03d.png'%(n))
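Applied to the DataFrame from the question, the same pattern might look like the sketch below. The frame contents, the output directory and the file name pattern are invented for illustration; only the idea (one new figure per column, saved with a zero-padded number) comes from the example above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data shaped like the spreadsheet in the question.
ambdf = pd.DataFrame(
    {"AMB CO 1": [14, 20, 17], "AMB CO 2": [33, 31, 40], "TOTAL": [1600, 1650, 1700]},
    index=pd.Index([1, 2, 3], name="Index"),
)

out_dir = tempfile.mkdtemp()  # hypothetical output location
written = []
for n, column in enumerate(ambdf.columns):
    fig, ax = plt.subplots()  # a fresh figure for every column
    ax.plot(ambdf.index, ambdf[column], linewidth=2)
    ax.set_title(column)
    ax.set_xlabel("Month")
    ax.set_ylabel("Count of Deliveries")
    path = os.path.join(out_dir, "column%03d.png" % n)
    fig.savefig(path)
    plt.close(fig)  # avoid accumulating open figures in long loops
    written.append(path)
```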
If your requirement is to plot every column against one column, feel free to use this (first import the data into a pandas DataFrame). It draws a grid of plots with 5 columns and as many rows as required:
import math
import matplotlib.pyplot as plt
i,j=0,0
PLOTS_PER_ROW = 5
fig, axs = plt.subplots(math.ceil(len(df.columns)/PLOTS_PER_ROW),PLOTS_PER_ROW, figsize=(20, 60))
for col in df.columns:
    axs[i][j].scatter(df['target_col'], df[col], s=3)
    axs[i][j].set_ylabel(col)
    j+=1
    if j%PLOTS_PER_ROW==0:
        i+=1
        j=0
plt.show()
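One caveat with the grid approach: when everything fits in a single row, plt.subplots returns a 1-D array, so axs[i][j] indexing fails. A possible variation (a sketch with random made-up data, not the original answer's code) passes squeeze=False so the array is always 2-D, skips the target column itself, and hides any unused panels in the last row.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: one target column plus six feature columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 7)),
                  columns=["target_col"] + ["f%d" % k for k in range(6)])

PLOTS_PER_ROW = 5
features = [c for c in df.columns if c != "target_col"]
n_rows = math.ceil(len(features) / PLOTS_PER_ROW)

# squeeze=False keeps axs two-dimensional even when n_rows == 1.
fig, axs = plt.subplots(n_rows, PLOTS_PER_ROW,
                        figsize=(20, 4 * n_rows), squeeze=False)
flat = axs.ravel()

for ax, col in zip(flat, features):
    ax.scatter(df["target_col"], df[col], s=3)
    ax.set_ylabel(col)

# Hide the panels left over in the last row.
for ax in flat[len(features):]:
    ax.set_visible(False)
```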
A simple way of plotting on different frames would be like:
import matplotlib.pyplot as plt
for grp in list_groups:
    plt.figure()
    plt.plot(grp)
plt.show()
Python will then open a separate frame for each iteration.
We can create a for loop and pass all the numeric columns into it. The loop will plot the graphs one by one, each in a separate pane, because we call plt.figure() inside it.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Take only the numeric columns from the dataframe.
numeric_features=[x for x in data.columns if data[x].dtype!="object"]
for i in data[numeric_features].columns:
    plt.figure(figsize=(12,5))
    plt.title(i)
    sns.boxplot(data=data[i])