I am getting a bar plot, but not a scatter plot - python-2.7

I am on Python 2.7 with the Spyder IDE, and this is my data:
Duration ptno
7432.0 X35133502100
7432.0 X35133502100
35255.0 T7956000304
35255.0 T7956000304
17502.0 T7956000304
17502.0 T7956000304
46.0 T7956000304
46.0 T7956000304
The code:
import time
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv('Nissin_11.09.2018.csv')
bx = df1.plot.bar(x='ptno', y='d', rot=0)
plt.setp(bx.get_xticklabels(),rotation=30,horizontalalignment='right')
plt.show()
I get a nice bar plot, as I wanted, for each value in the columns Duration and ptno. For reference I am attaching an image file of the plot.
But when I try to get a scatter plot with:
df1.plot.scatter(x='ptno', y='d')
It throws an error:
ValueError: scatter requires x column to be numeric
How can I get a scatter plot for my data?
As suggested by @Hristo Iliev, I used his code:
import seaborn as sns
_ = sns.stripplot(x='ptno', y='d', data=df1)
But it only plots the two unique values on the x axis, whereas I would like to have all values on the x axis, as in my bar plot.

One option is to use pure matplotlib. You need to create an array of numbers to use as the x axis, i.e. [1,2,3,4,5,...] and then change the tick labels to the value of the column ptno.
For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame({"Duration":[7432,7432,35255,35255,17502,17502,46,46],
"ptno":["X35", "X35", "T79", "T79", "T79", "T79", "T79", "T79"]})
dummy_x = np.arange(len(df1.ptno))
plt.scatter(dummy_x, df1.Duration)
plt.xticks(dummy_x, df1.ptno)
plt.show()

You cannot make scatter plots with non-numeric values as indicated by the error. In a scatter plot, the position of each point is determined by the location on the real axis of the value of each variable. Categorical or string values such as T7956000304 have no direct mapping to a position on the real axis.
What you can plot though is a series of strip plots, one for each unique value of ptno. That's easiest to do with Seaborn:
import seaborn as sns
_ = sns.stripplot(x='ptno', y='d', data=df1)
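If Seaborn is not available, roughly the same picture can be built with matplotlib alone. The following is only a minimal sketch: it assumes the value column is called Duration and reuses the shortened ptno values from the other answer, pd.factorize assigns one integer position per distinct ptno, and the small random jitter is my own addition so that identical points do not sit exactly on top of each other.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; remove for interactive use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Example data (assumed column names, shortened ptno values).
df1 = pd.DataFrame({
    "Duration": [7432, 7432, 35255, 35255, 17502, 17502, 46, 46],
    "ptno": ["X35", "X35", "T79", "T79", "T79", "T79", "T79", "T79"],
})

# Map each distinct ptno to an integer position, in order of first appearance.
codes, uniques = pd.factorize(df1["ptno"])

# Small random horizontal offsets so identical values do not overlap exactly.
rng = np.random.RandomState(0)
jitter = rng.uniform(-0.1, 0.1, size=len(codes))

fig, ax = plt.subplots()
ax.scatter(codes + jitter, df1["Duration"])
ax.set_xticks(np.arange(len(uniques)))
ax.set_xticklabels(list(uniques))
ax.set_xlabel("ptno")
ax.set_ylabel("Duration")
# Call plt.show() here when using an interactive backend.
```

Like the strip plot, this gives one x position per unique ptno, not one per row.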

Related

How can I remove the negative sign from y tick labels in matplotlib.pyplot figure?

I am using the matplotlib.pyplot module to generate thousands of figures that all deal with a value called "Total Vertical Depth (TVD)". The data these values come from are all negative numbers, but the industry standard is to display them as positive (i.e. distance from zero / absolute value). My y axis displays the numbers and of course uses the actual (negative) values to label the axis ticks. I do not want to change the values, but am wondering how to access the text elements and just remove the negative sign from each label (shown in red circles on the image).
Several iterations of code after diving into the matplotlib documentation have gotten me to the following code, but I am still getting an error.
locs, labels = plt.yticks()
newLabels = []
for lbl in labels:
    newLabels.append((lbl[0], lbl[1], str(float(str(lbl[2])) * -1)))
plt.yticks(locs, newLabels)
It appears that some of the strings in the "labels" list are empty and therefore the cast isn't working correctly, but I don't understand how it has any empty values if the yticks() method is retrieving the current tick configuration.
@SiHA points out that if we change the data, the order of labels on the y-axis will be reversed. So we can use a ticker formatter to change only the labels without changing the data, as shown in the example below:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
@ticker.FuncFormatter
def major_formatter(x, pos):
    label = str(-x) if x < 0 else str(x)
    return label
y = np.linspace(-3000,-1000,2001)
fig, ax = plt.subplots()
ax.plot(y)
ax.yaxis.set_major_formatter(major_formatter)
plt.show()
This gives me the following plot, notice the order of y-axis labels.
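A small variation on the formatter idea, offered only as a sketch: since tick values usually arrive as floats, str() tends to produce labels such as 1500.0. Formatting abs(x) with the general format "{:g}" drops the sign and the trailing ".0". The name abs_label is made up for this example, and "{:g}" keeps only six significant digits, so pick a different format string if your depths need more.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; remove for interactive use
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker


def abs_label(x, pos):
    """Return the tick value without its sign, e.g. -1500.0 -> '1500'."""
    return "{:g}".format(abs(x))


# Wrap the plain function so it can be installed on an axis.
abs_formatter = ticker.FuncFormatter(abs_label)

y = np.linspace(-3000, -1000, 2001)
fig, ax = plt.subplots()
ax.plot(y)  # the data stay negative; only the labels change
ax.yaxis.set_major_formatter(abs_formatter)
# Call plt.show() here when using an interactive backend.
```

The data and the direction of the axis are untouched, so deeper values still appear lower on the plot.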
Edit:
Based on Amit's great answer, here's the solution if you want to edit the data instead of the tick formatter:
import matplotlib.pyplot as plt
import numpy as np
y = np.linspace(-3000,-1000,2001)
fig, ax = plt.subplots()
ax.plot(-y) # invert y-values of the data
ax.invert_yaxis() # invert the axis so that larger values are displayed at the bottom
plt.show()

Column order reversed in step histogram plot

Passing a 2D array to Matplotlib's histogram function with histtype='step' seems to plot the columns in reverse order (at least from my biased, Western perspective of left-to-right).
Here's an illustration:
import matplotlib.pyplot as plt
import numpy as np
X = np.array([
np.random.normal(size=5000),
np.random.uniform(size=5000)*2.0 - 1.0,
np.random.beta(2.0,1.0,size=5000)*3.0,
]).T
trash = plt.hist(X,bins=50,histtype='step')
plt.legend(['Normal','2*Uniform-1','3*Beta(2,1)'],loc='upper left')
Produces this:
Running matplotlib version 2.0.2, python 2.7
From the documentation for legend:
in order to keep the "label" and the legend element instance together, it is preferable to specify the label either at artist creation, or by calling the set_label method on the artist
I recommend using the label keyword argument to hist:
String, or sequence of strings to match multiple datasets
The result is:
import matplotlib.pyplot as plt
import numpy as np
X = np.array([
np.random.normal(size=5000),
np.random.uniform(size=5000)*2.0 - 1.0,
np.random.beta(2.0,1.0,size=5000)*3.0,
]).T
trash = plt.hist(X,bins=50,histtype='step',
label=['Normal','2*Uniform-1','3*Beta(2,1)'])
plt.legend(loc='upper left')
plt.show()
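If you additionally want the legend entries themselves to appear in column order, one option (a sketch, not the only way) is to call hist once per column with shared bin edges, so that each artist is created together with its own label. The names list and the use of np.histogram_bin_edges for the common edges are my additions here.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; remove for interactive use
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
X = np.array([
    rng.normal(size=5000),
    rng.uniform(size=5000) * 2.0 - 1.0,
    rng.beta(2.0, 1.0, size=5000) * 3.0,
]).T
names = ['Normal', '2*Uniform-1', '3*Beta(2,1)']

# Common bin edges (computed over all the data) so the histograms are comparable.
edges = np.histogram_bin_edges(X, bins=50)

fig, ax = plt.subplots()
for column, name in zip(X.T, names):
    # One call per column: the label is attached when the artist is created.
    ax.hist(column, bins=edges, histtype='step', label=name)
ax.legend(loc='upper left')

# The labels as the legend will list them.
handles, labels = ax.get_legend_handles_labels()
# Call plt.show() here when using an interactive backend.
```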

Stacking scatter_matrix and matshow

I was using the iris data from scikit-learn to obtain the following data frame:
df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
columns= iris['feature_names'] + ['target'])
Plotting the scatter_matrix and using matshow to plot the correlation matrix gives me the scatter_matrix plot and the matshow(df.corr()) plot, respectively.
My question is the following. Is there a way to stack these graphs? In other words, plot the scatter_matrix over the matshow(df.corr()) ?
Thanks in advance.
I suppose what you really want is to colorize the background of the respective axes in the color that would appear in a matshow plot of the correlation matrix.
To this end we can find out the color by supplying the normalized (to 0..1) correlation matrix to a matplotlib colormap and change the axes background color using ax.set_facecolor.
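import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
# taking the iris data from seaborn (should be the same as scikit-learn's)
df = sns.load_dataset("iris")
# pd.scatter_matrix in older pandas; only the numeric columns are plotted
axes = pd.plotting.scatter_matrix(df)
# restrict the correlation to the numeric columns so it matches the grid of axes
corr = df.select_dtypes(include="number").corr().values
corr_norm = (corr-corr.min())/(corr.max()-corr.min())
for i, ax in enumerate(axes.flatten()):
    c = plt.cm.viridis(corr_norm.flatten()[i])
    ax.set_facecolor(c)
plt.show()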
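For a version that runs without downloading the iris data, here is a minimal sketch on a made-up DataFrame (the columns a, b and c are purely illustrative). Note one deliberate difference from the code above: instead of rescaling by the minimum and maximum of the matrix, it maps the fixed range [-1, 1] onto [0, 1], so the same color always means the same correlation.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; remove for interactive use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: b follows a, c moves against a.
rng = np.random.RandomState(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": a + 0.3 * rng.normal(size=200),
    "c": -a + 0.3 * rng.normal(size=200),
})

axes = pd.plotting.scatter_matrix(df)
corr = df.corr().values

# Map correlations from the fixed range [-1, 1] onto [0, 1] for the colormap.
scaled = np.clip((corr + 1.0) / 2.0, 0.0, 1.0)

n = corr.shape[0]
for i in range(n):
    for j in range(n):
        # Background of panel (i, j) encodes the correlation of columns i and j.
        axes[i, j].set_facecolor(plt.cm.viridis(scaled[i, j]))
# Call plt.show() here when using an interactive backend.
```

Indexing axes[i, j] directly, rather than flattening both arrays, makes it explicit which correlation colors which panel.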

Secondary y-axis overwriting legend colors and resetting x-y tick font specification

I am trying to plot 2 y-variables (primary and secondary y-axis) against one x-variable (axis) in Matplotlib (scatterplot), with Python 2.7.
I first create two y-axes that share one x-axis [twinx()]. Then I use a for loop to plot the primary y-axis scatter plot - I have to use a loop in order to iterate through the columns of a Pandas dataframe which are the x and y-variables. Then I would like to use a second for loop to add a secondary y-axis scatter plot. Finally, I would like to add a legend consisting of primary and secondary y-axis variable names.
However, when I follow these steps, I am having some problems:
1. the colors of the legend symbols are not correct
2. the colors of the symbols in the 2 scatter plots are the same, so it looks like there is no secondary y-axis
3. the x and y-tick labels are not picking up the correct font
Here is the code that I have:
import numpy as np
import matplotlib.pyplot as plt
import pylab as pl
import matplotlib.cm as cm
from matplotlib.ticker import MultipleLocator, FormatStrFormatter, FuncFormatter, AutoMinorLocator
from matplotlib import cm
import pandas as pd
from matplotlib.font_manager import FontProperties
# Loading data from *.csv file:
x_var = pd.DataFrame(np.random.rand(10,5),columns=['chan','top','lk_207','robt_sh','me_sh'])
y_var = pd.DataFrame(np.random.rand(10,5),columns=['values','count','chko','54_lib','941_dat'])
y_var_sec = pd.DataFrame(np.random.rand(10,5),columns=['header_max','two','bottom_7739','max_gain','low_ext'])
# List of primary and secondary x and y-variable names:
x_var_names = x_var.columns.tolist() #primary x variable list of names
y_var_names = y_var.columns.tolist() #primary y variable list of names
y_var_sec_names = y_var_sec.columns.tolist() #secondary y variable list of names
# Matplotlib plotting begins:
fig = plt.figure(1)
fig.set_facecolor('white')
ax = fig.add_subplot(111)
ax2=ax.twinx()
for j in range(0,len(x_var_names)): #Generate plot on primary y-axis
    ax.scatter(x_var[x_var_names[j]], y_var[y_var_names[j]], color=cm.jet(1.*j/len(x_var)), label=x_var_names[j])
ax.legend(y_var_names+y_var_sec_names, loc = 1, scatterpoints = 1)
# for label in ax2.get_xticklabels(): #NOT WORKING
#     label.set_fontproperties(axis_tick_font)
title_font = {'fontname':'Times New Roman', 'size':'28', 'color':'black', 'weight':'bold','verticalalignment':'bottom'}
axis_font = {'fontname':'Constantia', 'size':'26'}
axis_tick_font = FontProperties(family='Times New Roman', style='normal', size=20, weight='normal', stretch='normal')
legend_fontsize = 20
ax.set_title(ax.get_title())
ax.grid(False)
ax.set_xlabel('Both X here_bottom',**axis_font)
ax.xaxis.set_label_position('bottom')
ax.set_ylabel('Primary Y here_left',**axis_font)
ax.xaxis.set_minor_locator(AutoMinorLocator(5))
ax.yaxis.set_minor_locator(AutoMinorLocator(5))
ax.tick_params(which='minor', length=5, width = 1)
ax.tick_params(direction='in')
ax.tick_params(which='major', width=1)
ax.tick_params(length=10)
ax2.set_ylabel('Secondary Y here_right',**axis_font)
ax2.yaxis.set_minor_locator(AutoMinorLocator(5))
ax2.tick_params(which='minor', length=5, width = 1)
ax2.tick_params(direction='in')
ax2.tick_params(which='major', width=1)
ax2.tick_params(length=10)
plt.show()
To add the secondary y-axis plot, I would need to use ax2 in the scatter() code, but I do not know how to get the colors to continue from the last primary-axis color. I want the primary y-axis colors to be different from the secondary y-axis colors.
How can I plot on the secondary y-axis on the right (from inside the loop) and get colors that differ from those of the primary y-axis?
How can I specify the font properties for the tick labels of the primary and secondary axes?
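One possible way to approach this, as a rough sketch and not a definitive fix: draw all colors from a single colormap sized for the primary and secondary series together, collect the artist returned by every scatter() call, and build one legend from those handles so that each symbol keeps its own color. The tick font is applied to the labels of both axes after the FontProperties object has been defined. The reduced two-column data, the square marker for the secondary axis and the generic "serif" family are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; remove for interactive use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# Reduced stand-in data (two series per axis instead of five).
rng = np.random.RandomState(0)
x_var = pd.DataFrame(rng.rand(10, 2), columns=["chan", "top"])
y_var = pd.DataFrame(rng.rand(10, 2), columns=["values", "count"])
y_var_sec = pd.DataFrame(rng.rand(10, 2), columns=["header_max", "two"])

fig, ax = plt.subplots()
ax2 = ax.twinx()  # second y-axis sharing the same x-axis

# One color per series, counted across BOTH axes, so none repeats.
n_primary = y_var.shape[1]
n_total = n_primary + y_var_sec.shape[1]
colors = plt.cm.jet(np.linspace(0, 1, n_total))

handles = []
for j, (xc, yc) in enumerate(zip(x_var.columns, y_var.columns)):
    handles.append(ax.scatter(x_var[xc], y_var[yc],
                              color=tuple(colors[j]), label=yc))
for j, (xc, yc) in enumerate(zip(x_var.columns, y_var_sec.columns)):
    # Continue in the color list where the primary series stopped.
    handles.append(ax2.scatter(x_var[xc], y_var_sec[yc], marker="s",
                               color=tuple(colors[n_primary + j]), label=yc))

# A single legend built from the collected artists, drawn on the top axes.
legend = ax2.legend(handles, [h.get_label() for h in handles],
                    loc=1, scatterpoints=1)

# Define the font first, then apply it to the tick labels of both axes.
tick_font = FontProperties(family="serif", size=14)
tick_labels = (list(ax.get_xticklabels()) + list(ax.get_yticklabels())
               + list(ax2.get_yticklabels()))
for label in tick_labels:
    label.set_fontproperties(tick_font)
# Call plt.show() here when using an interactive backend.
```

In the code in the question, axis_tick_font is only defined after the commented-out loop that uses it, which may be one reason that loop did not work when enabled.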

Use a loop to plot n charts Python

I have a set of data that I load into python using a pandas dataframe. What I would like to do is create a loop that will print a plot for all the elements in their own frame, not all on one. My data is in an excel file structured in this fashion:
Index | DATE | AMB CO 1 | AMB CO 2 |...|AMB CO_n | TOTAL
1 | 1/1/12| 14 | 33 |...| 236 | 1600
. | ... | ... | ... |...| ... | ...
. | ... | ... | ... |...| ... | ...
. | ... | ... | ... |...| ... | ...
n
This is what I have for code so far:
import pandas as pd
import matplotlib.pyplot as plt
ambdf = pd.read_excel('Ambulance.xlsx',
sheetname='Sheet2', index_col=0, na_values=['NA'])
print type(ambdf)
print ambdf
print ambdf['EAS']
amb_plot = plt.plot(ambdf['EAS'], linewidth=2)
plt.title('EAS Ambulance Numbers')
plt.xlabel('Month')
plt.ylabel('Count of Deliveries')
print amb_plot
for i in ambdf:
    print plt.plot(ambdf[i], linewidth = 2)
I am thinking of doing something like this:
for i in ambdf:
    ambdf_plot = plt.plot(ambdf, linewidth = 2)
The above was not remotely what I wanted, and it stems from my unfamiliarity with pandas, matplotlib, etc. Looking at some documentation, though, it looks to me like matplotlib is not even needed (question B).
So A) how can I produce a plot of the data for every column in my df,
and B) do I need to use matplotlib, or should I just use pandas to do it all?
Thank you,
Ok, so the easiest method to create several plots is this:
import matplotlib.pyplot as plt
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
for i in range(len(x)):
    plt.figure()
    plt.plot(x[i],y[i])
    # Show/save figure as desired.
    plt.show()
# Can show all four figures at once by calling plt.show() here, outside the loop.
#plt.show()
Note that you need to create a figure every time or pyplot will plot in the first one created.
If you want to create several data series all you need to do is:
import matplotlib.pyplot as plt
plt.figure()
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]]
plt.plot(x[0],y[0],'r',x[1],y[1],'g',x[2],y[2],'b',x[3],y[3],'k')
You could automate this by keeping a list of colours such as ['r','g','b','k'] and, in a loop, passing each colour together with its corresponding data to plt.plot. If you just want to programmatically add data series to one plot, something like this will do it (no new figure is created on each iteration, so everything is plotted in the same figure):
import matplotlib.pyplot as plt
x=[[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]
y=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]]
colours=['r','g','b','k']
plt.figure() # In this example, all the plots will be in one figure.
for i in range(len(x)):
    plt.plot(x[i],y[i],colours[i])
plt.show()
In any case, matplotlib has a very good documentation page with plenty of examples.
17 Dec 2019: added plt.show() and plt.figure() calls to clarify this part of the story.
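Applied to a DataFrame like the one in the question, the same "one figure per iteration" idea could look roughly like this. It is only a sketch: the random numbers and the three column names stand in for the Excel sheet, which I do not have.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; remove for interactive use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Excel sheet in the question.
rng = np.random.RandomState(0)
ambdf = pd.DataFrame(
    rng.randint(0, 300, size=(12, 3)),
    columns=["AMB CO 1", "AMB CO 2", "AMB CO 3"],
    index=pd.date_range("2012-01-01", periods=12, freq="MS"),
)

figs = {}
for name in ambdf.columns:
    fig, ax = plt.subplots()  # a new figure for every column
    ax.plot(ambdf.index, ambdf[name], linewidth=2)
    ax.set_title("%s Ambulance Numbers" % name)
    ax.set_xlabel("Month")
    ax.set_ylabel("Count of Deliveries")
    figs[name] = fig  # keep a handle, e.g. for saving later
# Call plt.show() here when using an interactive backend.
```

As for question B: pandas plotting is built on matplotlib, so ambdf[name].plot(ax=ax) could replace the ax.plot(...) line, but matplotlib is still what creates the figures.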
Use a dictionary!!
You can also use dictionaries that allows you to have more control over the plots:
import matplotlib.pyplot as plt
# plot 0 plot 1 plot 2 plot 3
x=[[1,2,3,4],[1,4,3,4],[1,2,3,4],[9,8,7,4]]
y=[[3,2,3,4],[3,6,3,4],[6,7,8,9],[3,2,2,4]]
plots = zip(x,y)
def loop_plot(plots):
    figs={}
    axs={}
    for idx,plot in enumerate(plots):
        figs[idx]=plt.figure()
        axs[idx]=figs[idx].add_subplot(111)
        axs[idx].plot(plot[0],plot[1])
    return figs, axs
figs, axs = loop_plot(plots)
Now you can select the plot that you want to modify easily:
axs[0].set_title("Now I can control it!")
Of course, it is up to you to decide what to do with the plots. You can either save them to disk with figs[idx].savefig("plot_%s.png" %idx) or show them with plt.show(). Use the argument block=False only if you want to pop up all the plots together (this could be quite messy if you have a lot of plots). You can do this inside the loop_plot function or in a separate loop using the dictionaries that the function returns.
Just to add: returning figs and axs is not required in order to call plt.show().
Here are two examples of how to generate graphs in separate windows (frames), and, an example of how to generate graphs and save them into separate graphics files.
Okay, first the on-screen example. Notice that we use a separate instance of plt.figure(), for each graph, with plt.plot(). At the end, we have to call plt.show() to put it all on the screen.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
plt.show()
Another way to do this, is to use plt.show(block=False) inside the loop:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
    plt.show( block=False )
Now, let's generate the graphs and instead, write them each to a file. Here we replace plt.show(), with plt.savefig( filename ). The difference from the previous example is that we don't have to account for ''blocking'' at each graph. Note also, that we number the file names. Here we use %03d so that we can conveniently have them in number order afterwards.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace( 0,10 )
for n in range(3):
    y = np.sin( x+n )
    plt.figure()
    plt.plot( x, y )
    plt.savefig('myfilename%03d.png'%(n))
If your requirement is to plot every column against one particular column, feel free to use this (first load the data into a pandas DataFrame). It draws a grid of plots with 5 columns and as many rows as required:
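import math
import matplotlib.pyplot as plt
# df is assumed to be an existing pandas DataFrame that contains 'target_col'
i,j=0,0
PLOTS_PER_ROW = 5
# float() keeps the division exact on Python 2; int() gives subplots a whole number of rows
fig, axs = plt.subplots(int(math.ceil(len(df.columns)/float(PLOTS_PER_ROW))),PLOTS_PER_ROW, figsize=(20, 60))
for col in df.columns:
    axs[i][j].scatter(df['target_col'], df[col], s=3)
    axs[i][j].set_ylabel(col)
    j+=1
    if j%PLOTS_PER_ROW==0:
        i+=1
        j=0
plt.show()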
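A slightly more defensive variant of the same grid, as a sketch on made-up data: squeeze=False keeps axs two-dimensional even when everything fits in one row, iterating over the flattened array removes the manual i/j bookkeeping, and the panels left over at the end are hidden. The DataFrame and its column names (target_col, f0 to f5) are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; remove for interactive use
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: one target column plus six feature columns.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(30, 7),
                  columns=["target_col"] + ["f%d" % k for k in range(6)])

PLOTS_PER_ROW = 5
n_rows = int(math.ceil(len(df.columns) / float(PLOTS_PER_ROW)))

# squeeze=False: axs is always a 2D array, even for a single row.
fig, axs = plt.subplots(n_rows, PLOTS_PER_ROW, squeeze=False,
                        figsize=(20, 4 * n_rows))

flat_axes = axs.ravel()
for ax, col in zip(flat_axes, df.columns):
    ax.scatter(df["target_col"], df[col], s=3)
    ax.set_ylabel(col)

# Hide the unused panels in the last row.
for ax in flat_axes[len(df.columns):]:
    ax.set_visible(False)
# Call plt.show() here when using an interactive backend.
```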
A simple way of plotting on different frames would be like:
import matplotlib.pyplot as plt
for grp in list_groups:
    plt.figure()
    plt.plot(grp)
plt.show()
Python will then create a separate frame for each iteration.
We can create a for loop and pass all the numeric columns into it. The loop will plot the graphs one by one in separate panes because we call plt.figure() inside it.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# data is assumed to be an existing pandas DataFrame
numeric_features=[x for x in data.columns if data[x].dtype!="object"]
#taking only the numeric columns from the dataframe.
for i in data[numeric_features].columns:
    plt.figure(figsize=(12,5))
    plt.title(i)
    sns.boxplot(data=data[i])