I am on Python 2.7 with the Spyder IDE, and this is my data:
Duration ptno
7432.0 X35133502100
7432.0 X35133502100
35255.0 T7956000304
35255.0 T7956000304
17502.0 T7956000304
17502.0 T7956000304
46.0 T7956000304
46.0 T7956000304
The code:
import time
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv('Nissin_11.09.2018.csv')
bx = df1.plot.bar(x='ptno', y='d', rot=0)
plt.setp(bx.get_xticklabels(),rotation=30,horizontalalignment='right')
plt.show()
I get a nice bar plot, as I wanted, for each value in the Duration and ptno columns.
But when I try to get a scatter plot with:
df1.plot.scatter(x='ptno', y='d')
It throws an error:
ValueError: scatter requires x column to be numeric
How can I make a scatter plot of my data?
As suggested by @Hristo Iliev, I used his code:
import seaborn as sns
_ = sns.stripplot(x='ptno', y='d', data=df1)
But it only plots the two unique values on the x axis, whereas I would like to have all values on the x axis, as in my bar plot.
One option is to use pure matplotlib: create an array of numbers to use as the x positions, e.g. [0, 1, 2, 3, ...], and then change the tick labels to the values of the ptno column.
For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame({"Duration":[7432,7432,35255,35255,17502,17502,46,46],
"ptno":["X35", "X35", "T79", "T79", "T79", "T79", "T79", "T79"]})
dummy_x = np.arange(len(df1.ptno))
plt.scatter(dummy_x, df1.Duration)
plt.xticks(dummy_x, df1.ptno)
plt.show()
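A variant of the same idea using the object-oriented Axes API, with the long part numbers rotated as in the question's bar-plot code. The DataFrame rebuilds the question's sample data; the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The sample data from the question
df1 = pd.DataFrame({"Duration": [7432, 7432, 35255, 35255, 17502, 17502, 46, 46],
                    "ptno": ["X35133502100"] * 2 + ["T7956000304"] * 6})

fig, ax = plt.subplots()
dummy_x = np.arange(len(df1))   # one numeric x position per row: 0, 1, ..., 7
ax.scatter(dummy_x, df1["Duration"])
ax.set_xticks(dummy_x)          # put a tick at every row position
# Label each tick with that row's part number, rotated so long labels don't overlap
ax.set_xticklabels(df1["ptno"], rotation=30, ha="right")
fig.tight_layout()
```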
You cannot make scatter plots with non-numeric values as indicated by the error. In a scatter plot, the position of each point is determined by the location on the real axis of the value of each variable. Categorical or string values such as T7956000304 have no direct mapping to a position on the real axis.
What you can plot though is a series of strip plots, one for each unique value of ptno. That's easiest to do with Seaborn:
import seaborn as sns
_ = sns.stripplot(x='ptno', y='d', data=df1)
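Since Matplotlib 2.1, Axes.scatter itself accepts strings and treats them as categories; the "scatter requires x column to be numeric" message comes from pandas' DataFrame.plot.scatter. Each distinct string gets one integer position in order of first appearance, so, like the strip plot, repeated part numbers share a single x position. This sketch assumes a Matplotlib version with that categorical-axis support.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({"Duration": [7432, 7432, 35255, 35255, 17502, 17502, 46, 46],
                    "ptno": ["X35133502100"] * 2 + ["T7956000304"] * 6})

fig, ax = plt.subplots()
# Strings on the x axis are mapped to categories 0, 1, ... by Matplotlib itself
points = ax.scatter(df1["ptno"], df1["Duration"])
x_positions = points.get_offsets()[:, 0]  # numeric x position assigned to each row
```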
I am using the matplotlib.pyplot module to generate thousands of figures that all deal with a value called "Total Vertical Depth" (TVD). The data these values come from are all negative numbers, but the industry standard is to display them as positive (i.e. distance from zero, or absolute value). My y axis displays the numbers and, of course, uses the actual (negative) values to label the axis ticks. I do not want to change the values, but I am wondering how to access the text elements and just remove the negative sign from each label (circled in red in the image).
After several iterations of diving into the matplotlib documentation, I have arrived at the following code, but I am still getting an error.
locs, labels = plt.yticks()
newLabels = []
for lbl in labels:
    newLabels.append((lbl[0], lbl[1], str(float(str(lbl[2])) * -1)))
plt.yticks(locs, newLabels)
It appears that some of the strings in the "labels" list are empty and therefore the cast isn't working correctly, but I don't understand how it has any empty values if the yticks() method is retrieving the current tick configuration.
@SiHA points out that if we change the data, the order of the labels on the y-axis will be reversed. So we can use a tick formatter to change just the labels, without changing the data, as shown in the example below:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
@ticker.FuncFormatter
def major_formatter(x, pos):
    label = str(-x) if x < 0 else str(x)
    return label
y = np.linspace(-3000,-1000,2001)
fig, ax = plt.subplots()
ax.plot(y)
ax.yaxis.set_major_formatter(major_formatter)
plt.show()
This gives the following plot. Notice the order of the y-axis labels: they increase downward, with the smaller depths at the top.
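The same formatter can strip the sign with abs() and drop the trailing ".0". This sketch uses the explicit ticker.FuncFormatter wrapper, which works whether or not your Matplotlib version accepts a plain function in set_major_formatter. The "{:g}" format is a choice made for this sketch, not something the question requires.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

def abs_label(value, pos):
    """Tick formatter: show the magnitude of the tick value without a minus sign."""
    return "{:g}".format(abs(value))

abs_formatter = ticker.FuncFormatter(abs_label)

y = np.linspace(-3000, -1000, 2001)   # all-negative depths, as in the question
fig, ax = plt.subplots()
ax.plot(y)
ax.yaxis.set_major_formatter(abs_formatter)  # only the labels change; the data stay negative
```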
Edit:
Based on Amit's great answer, here is the solution if you want to edit the data instead of using a tick formatter:
import matplotlib.pyplot as plt
import numpy as np
y = np.linspace(-3000,-1000,2001)
fig, ax = plt.subplots()
ax.plot(-y) # invert y-values of the data
ax.invert_yaxis() # invert the axis so that larger values are displayed at the bottom
plt.show()
for info, shape in zip(map.counties_info, map.counties):
    if info['FIPS'] in geoids:
        x = np.random.rand(1)[0]
        c = cmap(x)[:3]
        newc = rgb2hex(c)
        patches.append(Polygon(np.array(shape), color=newc, closed=True))
ax.add_collection(PatchCollection(patches))
plt.title('Counties with HQ of NYSE-Listed Firms: 1970')
plt.show()
produces this image:
The code specifically asks for a random color for each polygon. If I print the values of newc and look them up on a website that converts hex codes to colors, they cover a wide range of different colors, but the output shows only one. How can I fix this?
In order for a PatchCollection to have different colors for the individual patches, you have two options.
Using the colors of the original patches.
Using a colormap to determine the colors according to some array of values.
Using the colors of the original patches.
This approach is closest to the code from the question. It requires setting the argument match_original=True on the PatchCollection, so that each patch keeps its own color.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import matplotlib.patches
import matplotlib.collections
ar = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]])
cmap=plt.cm.jet
patches=[]
fig, ax=plt.subplots()
for i in range(5):
    x = np.random.rand(1)[0]
    c = cmap(x)[:3]
    poly = plt.Polygon(ar+i, color=c, closed=True)
    patches.append(poly)
collection = matplotlib.collections.PatchCollection(patches,match_original=True)
ax.add_collection(collection)
ax.autoscale()
plt.show()
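To see why the question's map came out in a single color, compare a PatchCollection built with and without match_original. Without it, the collection ignores the colors stored on the individual polygons and uses its own single face color. The square coordinates and the viridis colormap are arbitrary choices for this sketch.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])

def make_patches():
    # five shifted squares, each given its own face color from the colormap
    return [Polygon(square + i, closed=True, facecolor=plt.cm.viridis(i / 4))
            for i in range(5)]

plain = PatchCollection(make_patches())                          # per-patch colors ignored
matched = PatchCollection(make_patches(), match_original=True)   # per-patch colors kept

fig, ax = plt.subplots()
ax.add_collection(matched)
ax.autoscale()
```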
Using a colormap to determine the colors according to some array of values.
This is probably easier to implement. Instead of giving each individual polygon a color, you set an array of values on the PatchCollection and specify a colormap that determines how those values are turned into colors.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import matplotlib.patches
import matplotlib.collections
ar = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]])
values = np.random.rand(5)
cmap=plt.cm.jet
patches=[]
fig, ax=plt.subplots()
for i in range(len(values)):
    poly = plt.Polygon(ar+i, closed=True)
    patches.append(poly)
collection = matplotlib.collections.PatchCollection(patches, cmap=cmap)
collection.set_array(values)
ax.add_collection(collection)
ax.autoscale()
plt.show()
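Because the colors come from the values, a colorbar can be added to explain them. In this sketch the values are fixed rather than random so the result is reproducible, and set_clim pins the color scale to 0 to 1, which is an assumption about the range of your data.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
values = np.array([0.1, 0.4, 0.2, 0.9, 0.6])   # one value per polygon

patches = [Polygon(square + i, closed=True) for i in range(len(values))]
collection = PatchCollection(patches, cmap=plt.cm.viridis)
collection.set_array(values)   # values are mapped to colors through the colormap
collection.set_clim(0, 1)      # fix the color scale instead of using min/max of values

fig, ax = plt.subplots()
ax.add_collection(collection)
ax.autoscale()
fig.colorbar(collection, ax=ax, label="value")
collection.update_scalarmappable()  # compute the mapped face colors now, not at draw time
```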
I am using matplotlib to create the plots. I have to identify each plot with a different color which should be automatically generated by Python.
Can you please give me a method to put different colors for different plots in the same figure?
Matplotlib does this by default.
E.g.:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.show()
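The colors Matplotlib steps through come from the axes.prop_cycle setting. This sketch reads that list and checks that four successive lines take its first four entries, assuming the default style is in use.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# The default color sequence used for successive lines on an Axes
default_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

x = np.arange(10)
fig, ax = plt.subplots()
for k in range(1, 5):
    ax.plot(x, k * x)   # no color given: the next one from the cycle is used

line_colors = [line.get_color() for line in ax.lines]
```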
And, as you may already know, you can easily add a legend:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.legend(['y = x', 'y = 2x', 'y = 3x', 'y = 4x'], loc='upper left')
plt.show()
If you want to control the colors that will be cycled through:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.gca().set_prop_cycle(color=['red', 'green', 'blue', 'yellow'])
plt.plot(x, x)
plt.plot(x, 2 * x)
plt.plot(x, 3 * x)
plt.plot(x, 4 * x)
plt.legend(['y = x', 'y = 2x', 'y = 3x', 'y = 4x'], loc='upper left')
plt.show()
If you're unfamiliar with matplotlib, the tutorial is a good place to start.
Edit:
First off, if you have a lot (>5) of things you want to plot on one figure, either:
Put them on different plots (consider using a few subplots on one figure), or
Use something other than color (e.g. marker styles or line thickness) to distinguish between them.
Otherwise, you're going to wind up with a very messy plot! Be nice to whoever is going to read whatever you're doing, and don't try to cram 15 different things onto one figure!
Beyond that, many people are colorblind to varying degrees, and distinguishing between numerous subtly different colors is difficult for more people than you may realize.
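Following the advice to use something other than color, a cycler can combine colors with line styles. This sketch (an illustration, not part of the answer's own code) multiplies 3 line styles by 4 colors to get 12 distinguishable combinations; the cycler package it imports is a Matplotlib dependency.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# 3 line styles x 4 colors = 12 distinct combinations
style_cycle = (cycler(linestyle=["-", "--", ":"])
               * cycler(color=["tab:blue", "tab:orange", "tab:green", "tab:red"]))

fig, ax = plt.subplots()
ax.set_prop_cycle(style_cycle)
x = np.arange(10)
for i in range(1, 13):
    ax.plot(x, i * x, label="y = %ix" % i)
ax.legend(ncol=3, fontsize="small")

combos = {(line.get_color(), line.get_linestyle()) for line in ax.lines}
```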
That having been said, if you really want to put 20 lines on one axis with 20 relatively distinct colors, here's one way to do it:
import matplotlib.pyplot as plt
import numpy as np
num_plots = 20
# Have a look at the colormaps here and decide which one you'd like:
# http://matplotlib.org/1.2.1/examples/pylab_examples/show_colormaps.html
colormap = plt.cm.gist_ncar
plt.gca().set_prop_cycle(plt.cycler('color', colormap(np.linspace(0, 1, num_plots))))
# Plot several different functions...
x = np.arange(10)
labels = []
for i in range(1, num_plots + 1):
    plt.plot(x, i * x + 5 * i)
    labels.append(r'$y = %ix + %i$' % (i, 5*i))
# I'm basically just demonstrating several different legend options here...
plt.legend(labels, ncol=4, loc='upper center',
bbox_to_anchor=[0.5, 1.1],
columnspacing=1.0, labelspacing=0.0,
handletextpad=0.0, handlelength=1.5,
fancybox=True, shadow=True)
plt.show()
Setting them later
If you don't know in advance how many lines you are going to plot, you can change the colours after plotting them, retrieving the number of lines directly from the axes via .lines. I use this solution:
Some random data
import matplotlib.pyplot as plt
import numpy as np
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
for i in range(1,15):
    ax1.plot(np.array([1,5])*i,label=i)
The piece of code that you need:
colormap = plt.cm.gist_ncar #nipy_spectral, Set1,Paired
colors = [colormap(i) for i in np.linspace(0, 1,len(ax1.lines))]
for i,j in enumerate(ax1.lines):
    j.set_color(colors[i])
ax1.legend(loc=2)
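The same recolouring step can be wrapped in a small reusable function. recolor_lines is a hypothetical helper name and viridis is just an example colormap. The legend is built after recolouring, because legend entries are created from the line properties at the moment legend() is called.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

def recolor_lines(ax, colormap=plt.cm.viridis):
    """Give every line already on `ax` its own colour, evenly spaced along `colormap`."""
    lines = list(ax.lines)
    colors = colormap(np.linspace(0, 1, len(lines)))   # one RGBA row per line
    for line, color in zip(lines, colors):
        line.set_color(color)
    return colors

fig, ax = plt.subplots()
for i in range(1, 15):
    ax.plot(np.array([1, 5]) * i, label=i)
colors = recolor_lines(ax)
ax.legend(loc=2)   # legend built after recolouring, so it shows the new colours
```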
TL;DR No, it can't be done automatically. Yes, it is possible.
import matplotlib.pyplot as plt
my_colors = plt.rcParams['axes.prop_cycle']() # <<< note that we CALL the prop_cycle
fig, axes = plt.subplots(2,3)
for ax in axes.flatten(): ax.plot((0,1), (0,1), **next(my_colors))
Each plot (axes) in a figure (figure) has its own cycle of colors. If you don't force a different color for each plot, all the plots share the same order of colors; but if we stretch a bit what "automatically" means, it can be done.
The OP wrote
[...] I have to identify each plot with a different color which should be automatically generated by [Matplotlib].
But... Matplotlib automatically generates different colors for each different curve
In [10]: import numpy as np
...: import matplotlib.pyplot as plt
In [11]: plt.plot((0,1), (0,1), (1,2), (1,0));
Out[11]:
So why the OP request? If we continue to read, we have
Can you please give me a method to put different colors for different plots in the same figure?
and it makes sense, because each plot (each axes, in Matplotlib's parlance) has its own color_cycle (or rather, as of 2018, its prop_cycle), and each plot (axes) reuses the same colors in the same order.
In [12]: fig, axes = plt.subplots(2,3)
In [13]: for ax in axes.flatten():
...:     ax.plot((0,1), (0,1))
If this is the meaning of the original question, one possibility is to explicitly name a different color for each plot.
If the plots (as it often happens) are generated in a loop we must have an additional loop variable to override the color automatically chosen by Matplotlib.
In [14]: fig, axes = plt.subplots(2,3)
In [15]: for ax, short_color_name in zip(axes.flatten(), 'brgkyc'):
...:     ax.plot((0,1), (0,1), short_color_name)
Another possibility is to instantiate a cycler object
from cycler import cycler
my_cycler = cycler('color', ['k', 'r']) * cycler('linewidth', [1., 1.5, 2.])
actual_cycler = my_cycler()
fig, axes = plt.subplots(2,3)
for ax in axes.flat:
    ax.plot((0,1), (0,1), **next(actual_cycler))
Note that type(my_cycler) is cycler.Cycler but type(actual_cycler) is itertools.cycle.
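To see the difference between the two objects, this sketch (which needs only the cycler package) pulls a few items from the iterator: the product of 2 colors and 3 line widths gives 6 combinations, each yielded as a dict of keyword arguments, and the seventh call to next wraps around to the first.

```python
import itertools
from cycler import cycler

my_cycler = cycler(color=["k", "r"]) * cycler(linewidth=[1.0, 1.5, 2.0])
actual_cycler = my_cycler()          # calling a Cycler returns an endless iterator
first_six = [next(actual_cycler) for _ in range(6)]
seventh = next(actual_cycler)        # wraps around to the first combination
```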
I would like to offer a minor improvement on the last loop example in the previous answer (that answer is correct and should still be accepted). The implicit assumption made when labelling the last example is that plt.legend(LIST) pairs label number X in LIST with the line created by the Xth call to plot. I have run into problems with this approach before. The recommended way to build legends and customize their labels, per matplotlib's documentation (http://matplotlib.org/users/legend_guide.html#adjusting-the-order-of-legend-item), is to pass the line handles explicitly, so you can be confident that the labels go with exactly the lines you think they do:
...
# Plot several different functions...
labels = []
plotHandles = []
for i in range(1, num_plots + 1):
    x, = plt.plot(some x vector, some y vector) #need the ',' per ** below
    plotHandles.append(x)
    labels.append(some label)
plt.legend(plotHandles, labels, loc='upper left', ncol=1)
**: Matplotlib Legends not working
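A runnable version of the handles-and-labels pattern, with made-up lines y = i*x standing in for "some x vector, some y vector". plot returns a list of Line2D objects, which is why the single line is unpacked with the trailing comma.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x_values = np.arange(10)
fig, ax = plt.subplots()
plot_handles, labels = [], []
for i in range(1, 5):
    (line,) = ax.plot(x_values, i * x_values)  # plot returns a list; unpack its one Line2D
    plot_handles.append(line)
    labels.append("y = %ix" % i)
legend = ax.legend(plot_handles, labels, loc="upper left", ncol=1)
```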
Matplotlib colors your plots with different colors by default, but in case you want to set specific colors:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
plt.plot(x, x)
plt.plot(x, 2 * x,color='blue')
plt.plot(x, 3 * x,color='red')
plt.plot(x, 4 * x,color='green')
plt.show()
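The color argument accepts several formats besides names. This sketch shows a property-cycle reference ("C0"), a name, a hex string and an RGB tuple, then converts each line's color back to hex to show what was actually used.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

x = np.arange(10)
fig, ax = plt.subplots()
ax.plot(x, x, color="C0")                  # first color of the current property cycle
ax.plot(x, 2 * x, color="blue")            # named color
ax.plot(x, 3 * x, color="#ff0000")         # hex string
ax.plot(x, 4 * x, color=(0.0, 0.5, 0.0))   # RGB tuple with components in 0-1
hex_colors = [mcolors.to_hex(line.get_color()) for line in ax.lines]
```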
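from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
from skspatial.objects import Line
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# LineList is assumed to be your own list of (StartPoint, EndPoint) pairs
for count in range(0,len(LineList),1):
    StartPoint, EndPoint = LineList[count]
    Line_Color = np.random.rand(3,)
    Line.from_points(StartPoint, EndPoint).plot_3d(ax, c=Line_Color, label="Line"+str(count))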
plt.legend(loc='lower left')
plt.show(block=True)
The above code adds 3D lines with randomly chosen colours. Each line can also be identified in the legend through its label="..." parameter.
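If scikit-spatial isn't available, Matplotlib's 3D axes can draw the same segments directly by passing the x, y and z coordinates of the two endpoints to plot. The line_list of random (start, end) pairs here is a hypothetical stand-in for LineList.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on older Matplotlib)

rng = np.random.default_rng(0)
# hypothetical input: a list of (start_point, end_point) pairs in 3D
line_list = [(rng.random(3), rng.random(3)) for _ in range(5)]

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
for count, (start, end) in enumerate(line_list):
    points = np.vstack([start, end])   # 2 x 3 array: one row per endpoint
    ax.plot(points[:, 0], points[:, 1], points[:, 2],
            color=rng.random(3), label="Line" + str(count))  # random RGB colour
ax.legend(loc="lower left")
```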
Honestly, my favourite way to do this is pretty simple. It won't work for an arbitrarily large number of plots, but it will do you for up to 1163: take the map of all of matplotlib's named colours and select from it at random. Colours are drawn with replacement, so two plots can occasionally get the same colour.
from random import choice
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
# Get the full named-colour map from matplotlib
colours = mcolors.get_named_colors_mapping()  # A dictionary of all named colours
# Turn the dictionary into a list
color_lst = list(colours.values())
# Plot using these random colours; `plots` is assumed to be a list of (x, y) data pairs
for n, (x, y) in enumerate(plots):
    plt.scatter(x, y, color=choice(color_lst), label=n)
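If you'd rather never repeat a colour, random.sample draws without replacement. This sketch uses the CSS4 colour names in matplotlib.colors.CSS4_COLORS, a smaller set than the full map, so it only covers up to that many plots; the seed and the dummy points are just for reproducibility.

```python
import random
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

named = sorted(mcolors.CSS4_COLORS)   # CSS colour names, e.g. "aliceblue"
random.seed(0)
chosen = random.sample(named, 5)      # sampling without replacement: no repeats

fig, ax = plt.subplots()
for n, colour in enumerate(chosen):
    ax.scatter([n], [n], color=colour, label=colour)
```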
I have a set of data that I load into Python using a pandas DataFrame. I would like to create a loop that plots each column in its own frame, rather than all of them on one. My data is in an Excel file structured like this:
Index | DATE | AMB CO 1 | AMB CO 2 |...|AMB CO_n | TOTAL
1 | 1/1/12| 14 | 33 |...| 236 | 1600
. | ... | ... | ... |...| ... | ...
. | ... | ... | ... |...| ... | ...
. | ... | ... | ... |...| ... | ...
n
This is what I have for code so far:
import pandas as pd
import matplotlib.pyplot as plt
ambdf = pd.read_excel('Ambulance.xlsx',
sheetname='Sheet2', index_col=0, na_values=['NA'])
print type(ambdf)
print ambdf
print ambdf['EAS']
amb_plot = plt.plot(ambdf['EAS'], linewidth=2)
plt.title('EAS Ambulance Numbers')
plt.xlabel('Month')
plt.ylabel('Count of Deliveries')
print amb_plot
for i in ambdf:
    print plt.plot(ambdf[i], linewidth = 2)
I am thinking of doing something like this:
for i in ambdf:
    ambdf_plot = plt.plot(ambdf, linewidth = 2)
The above was not remotely what I wanted, which stems from my unfamiliarity with pandas, Matplotlib, etc. Looking at some documentation, though, it seems to me that matplotlib may not even be needed (question B).
So: A) how can I produce a plot of the data for every column in my df, and
B) do I need to use matplotlib, or should I just use pandas to do it all?
Thank you,
Ok, so the easiest method to create several plots is this:
import matplotlib.pyplot as plt
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
for i in range(len(x)):
    plt.figure()
    plt.plot(x[i],y[i])
    # Show/save figure as desired.
    plt.show()
# Can show all four figures at once by calling plt.show() here, outside the loop.
#plt.show()
Note that you need to create a figure every time or pyplot will plot in the first one created.
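Applied to a DataFrame like the one in the question: pandas' .plot() methods draw with Matplotlib under the hood, so for question B the plotting itself can go through pandas, while Matplotlib still creates the separate figures and shows or saves them. The small monthly DataFrame is a hypothetical stand-in for the Excel sheet.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Ambulance.xlsx sheet: one column per company, monthly index
months = pd.date_range("2012-01-01", periods=12, freq="MS")
ambdf = pd.DataFrame({"AMB CO 1": np.arange(12) + 14,
                      "AMB CO 2": np.arange(12) + 33,
                      "TOTAL": np.arange(12) * 10 + 1600}, index=months)

figures = []
for column in ambdf.columns:
    fig, ax = plt.subplots()                               # a new figure (frame) per column
    ambdf[column].plot(ax=ax, linewidth=2, title=column)   # pandas draws on the given Axes
    ax.set_xlabel("Month")
    ax.set_ylabel("Count of Deliveries")
    figures.append(fig)
# calling plt.show() here would open all the frames at once
```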
If you want to create several data series all you need to do is:
import matplotlib.pyplot as plt
plt.figure()
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]]
plt.plot(x[0],y[0],'r',x[1],y[1],'g',x[2],y[2],'b',x[3],y[3],'k')
You could automate this by keeping a list of colours like ['r','g','b','k'] and looping over it together with the corresponding data series. If you just want to programmatically add data series to one plot, something like this will do it (no new figure is created each time, so everything is plotted in the same figure):
import matplotlib.pyplot as plt
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]]
colours=['r','g','b','k']
plt.figure() # In this example, all the plots will be in one figure.
for i in range(len(x)):
    plt.plot(x[i],y[i],colours[i])
plt.show()
Matplotlib also has a very good documentation page with plenty of examples.
17 Dec 2019: added plt.show() and plt.figure() calls to clarify this part of the story.
Use a dictionary!!
You can also use dictionaries, which give you more control over the plots:
import matplotlib.pyplot as plt
# plot 0 plot 1 plot 2 plot 3
x=[[1,2,3,4],[1,4,3,4],[1,2,3,4],[9,8,7,4]]
y=[[3,2,3,4],[3,6,3,4],[6,7,8,9],[3,2,2,4]]
plots = zip(x,y)
def loop_plot(plots):
    figs={}
    axs={}
    for idx,plot in enumerate(plots):
        figs[idx]=plt.figure()
        axs[idx]=figs[idx].add_subplot(111)
        axs[idx].plot(plot[0],plot[1])
    return figs, axs
figs, axs = loop_plot(plots)
Now you can select the plot that you want to modify easily:
axs[0].set_title("Now I can control it!")
Of course, it is up to you to decide what to do with the plots. You can either save them to disk with figs[idx].savefig("plot_%s.png" %idx) or show them with plt.show(). Use the argument block=False only if you want all the plots to pop up together (this could be quite messy if you have a lot of plots). You can do this inside the loop_plot function or in a separate loop using the dictionaries that the function returns.
Note that returning figs and axs is not required in order to call plt.show().
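When generating many figures, it helps to close each one after saving it, because pyplot keeps every open figure in memory. A sketch of the save-to-disk route using the same loop_plot function; the temporary output directory is only there so the example doesn't write into your working folder.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [[1, 2, 3, 4], [1, 4, 3, 4], [1, 2, 3, 4], [9, 8, 7, 4]]
y = [[3, 2, 3, 4], [3, 6, 3, 4], [6, 7, 8, 9], [3, 2, 2, 4]]

def loop_plot(plots):
    figs, axs = {}, {}
    for idx, plot in enumerate(plots):
        figs[idx] = plt.figure()
        axs[idx] = figs[idx].add_subplot(111)
        axs[idx].plot(plot[0], plot[1])
    return figs, axs

figs, axs = loop_plot(zip(x, y))
axs[0].set_title("Now I can control it!")

out_dir = tempfile.mkdtemp()
saved = []
for idx, fig in figs.items():
    path = os.path.join(out_dir, "plot_%s.png" % idx)
    fig.savefig(path)
    plt.close(fig)   # free the figure once it has been written
    saved.append(path)
```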
Here are two examples of how to generate graphs in separate windows (frames), and, an example of how to generate graphs and save them into separate graphics files.
Okay, first the on-screen example. Notice that we use a separate instance of plt.figure(), for each graph, with plt.plot(). At the end, we have to call plt.show() to put it all on the screen.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
plt.show()
Another way to do this, is to use plt.show(block=False) inside the loop:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
    plt.show( block=False )
Now, let's generate the graphs and write each of them to a file instead. Here we replace plt.show() with plt.savefig( filename ). The difference from the previous example is that we don't have to account for "blocking" at each graph. Note also that we number the file names: %03d zero-pads the number to three digits, so the files sort in numerical order afterwards.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
    plt.savefig('myfilename%03d.png'%(n))
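A quick illustration of why the zero padding matters: plain string sorting, as done by sorted(), puts 10 before 2 unless the numbers are padded.

```python
numbers = [2, 10, 1]
padded = sorted("myfilename%03d.png" % n for n in numbers)   # zero-padded to 3 digits
unpadded = sorted("myfilename%d.png" % n for n in numbers)   # no padding
```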
If you need to plot every column against one particular column, feel free to use this (first load the data into a pandas DataFrame, df). It draws a grid of plots with 5 columns and as many rows as required:
import math
import matplotlib.pyplot as plt
i,j=0,0
PLOTS_PER_ROW = 5
# squeeze=False keeps axs 2-D even when only one row is needed
fig, axs = plt.subplots(math.ceil(len(df.columns)/PLOTS_PER_ROW),PLOTS_PER_ROW, figsize=(20, 60), squeeze=False)
for col in df.columns:
    axs[i][j].scatter(df['target_col'], df[col], s=3)
    axs[i][j].set_ylabel(col)
    j+=1
    if j%PLOTS_PER_ROW==0:
        i+=1
        j=0
plt.show()
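A variant of the grid answer that avoids the manual i, j bookkeeping by iterating over axs.flat, and hides the unused panels in the last row. The DataFrame is made-up random data with a hypothetical target_col, and the figure height is scaled with the number of rows instead of being fixed.

```python
import math
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# hypothetical data: a target column plus seven feature columns
df = pd.DataFrame(rng.normal(size=(50, 8)),
                  columns=["target_col"] + ["f%d" % k for k in range(7)])

PLOTS_PER_ROW = 5
n_rows = math.ceil(len(df.columns) / PLOTS_PER_ROW)
fig, axs = plt.subplots(n_rows, PLOTS_PER_ROW, figsize=(20, 4 * n_rows), squeeze=False)
for ax, col in zip(axs.flat, df.columns):   # fills the grid row by row
    ax.scatter(df["target_col"], df[col], s=3)
    ax.set_ylabel(col)
for ax in axs.flat[len(df.columns):]:
    ax.set_visible(False)                   # hide the empty panels in the last row
```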
A simple way of plotting on different frames would be like:
import matplotlib.pyplot as plt
for grp in list_groups:
    plt.figure()
    plt.plot(grp)
plt.show()
Python will then create a separate frame for each iteration.
We can create a for loop and pass all the numeric columns into it. Because plt.figure() is called inside the loop, each graph is drawn in its own separate pane.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# `data` is your DataFrame; take only its numeric columns.
numeric_features=[x for x in data.columns if data[x].dtype!="object"]
for i in data[numeric_features].columns:
    plt.figure(figsize=(12,5))
    plt.title(i)
    sns.boxplot(data=data[i])
plt.show()
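If seaborn isn't installed, the same loop works with Matplotlib's own Axes.boxplot, and pandas' select_dtypes(include="number") picks the numeric columns without comparing dtypes by hand. The small DataFrame is invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical example frame: two numeric columns and one text column
data = pd.DataFrame({"age": [23, 35, 41, 29, 52],
                     "income": [31.0, 45.5, 52.1, 38.0, 61.3],
                     "city": ["A", "B", "A", "C", "B"]})

numeric_features = data.select_dtypes(include="number").columns
figures = []
for col in numeric_features:
    fig, ax = plt.subplots(figsize=(12, 5))   # one pane per numeric column
    ax.boxplot(data[col].dropna())            # Matplotlib's own box plot, no seaborn needed
    ax.set_title(col)
    figures.append(fig)
```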