Stacking scatter_matrix and matshow - python-2.7

I was using the iris data from scikit-learn to obtain the following data frame:
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])
Plotting the data with scatter_matrix and the correlation matrix with matshow(df.corr()) gives me a scatter-matrix plot and a correlation heatmap, respectively.
My question is the following: is there a way to stack these graphs? In other words, can I plot the scatter_matrix over the matshow(df.corr()) plot?
Thanks in advance.

I suppose what you really want is to colorize the background of the respective axes in the color that would appear in a matshow plot of the correlation matrix.
To this end, we can obtain the color by passing the correlation matrix, normalized to the range 0..1, to a matplotlib colormap, and then change the axes background color using ax.set_facecolor.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
# taking the iris data from seaborn (should be the same as scikit-learn's)
df = sns.load_dataset("iris")
axes = pd.plotting.scatter_matrix(df)  # plots only the numeric columns
corr = df.corr(numeric_only=True).values  # skip the string 'species' column
corr_norm = (corr - corr.min()) / (corr.max() - corr.min())
for i, ax in enumerate(axes.flatten()):
    # axes and corr share the same row-major layout, so index i matches
    c = plt.cm.viridis(corr_norm.flatten()[i])
    ax.set_facecolor(c)
plt.show()
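A variant of the same idea, sketched with made-up data (the three-column frame below is synthetic, not the iris set) and two swapped-in choices: matplotlib.colors.Normalize with fixed limits of -1 and 1 instead of the min-max rescaling, and a colorbar built from a ScalarMappable so the background colors can be read back as correlation values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Hypothetical stand-in data: 'b' is correlated with 'a', 'c' is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({"a": x,
                   "b": 0.8 * x + rng.normal(scale=0.5, size=200),
                   "c": rng.normal(size=200)})

axes = pd.plotting.scatter_matrix(df, figsize=(6, 6))
corr = df.corr().values

# Fixed -1..1 limits keep colors comparable between data sets, whereas
# min-max rescaling always stretches the values over the whole colormap.
norm = Normalize(vmin=-1, vmax=1)
cmap = plt.get_cmap("viridis")

# axes has the same (n, n) row-major layout as corr, so they can be zipped.
for ax, r in zip(axes.flat, corr.flat):
    ax.set_facecolor(cmap(norm(r)))

# A colorbar translates the background colors back into correlation values.
fig = axes[0, 0].get_figure()
sm = plt.cm.ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])
fig.colorbar(sm, ax=axes.ravel().tolist(), label="Pearson correlation")
plt.show()
```

With fixed limits, a weak correlation stays a mid-range color instead of being pushed to one end of the colormap, which the min-max version does whenever all correlations are similar.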


Applying Multi Otsu Threshold for my image

I have the image shown below.
I am trying to define thresholds that distinguish the classes of a bimodal intensity distribution using the Otsu technique, and then visualise them on the histogram. So far I have written the following code:
import matplotlib.pyplot as plt
import numpy as np
from skimage import data, io, img_as_ubyte
from skimage.filters import threshold_multiotsu
# Read an image
image = io.imread("Fig_1.png")
# Apply multi-Otsu threshold
thresholds = threshold_multiotsu(image,classes=5)
# Digitize (segment) original image into multiple classes.
#np.digitize assign values 0, 1, 2, 3, ... to pixels in each class.
regions = np.digitize(image, bins=thresholds)
output = img_as_ubyte(regions) #Convert 64 bit integer values to uint8
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(10, 3.5))
# Plotting the original image.
ax[0].imshow(image, cmap='gray')
ax[0].set_title('Original')
ax[0].axis('off')
# Plotting the histogram and the two thresholds obtained from
# multi-Otsu.
ax[1].hist(image.ravel(), bins=255)
ax[1].set_title('Histogram')
for thresh in thresholds:
    ax[1].axvline(thresh, color='r')
# Plotting the Multi Otsu result.
ax[2].imshow(regions, cmap='gray')
ax[2].set_title('Multi-Otsu result')
ax[2].axis('off')
plt.subplots_adjust()
plt.show()
This gives me the result below. As you can see, the Multi-Otsu result is completely black and does not show the two classes of objects present in the figure.
I chose classes=5, but since the distribution is bimodal I also tried classes=3, which gives the same result.
Any advice on how to correct this? Thanks in advance.
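One likely cause, assuming Fig_1.png is stored as an RGB image: io.imread then returns an (H, W, 3) array, regions inherits that shape, and imshow treats an (H, W, 3) integer array as RGB values in 0..255, so class labels such as 0..4 come out almost black. The sketch below uses a synthetic bimodal image in place of Fig_1.png, converts it to grey with skimage.color.rgb2gray (an RGBA file would need skimage.color.rgba2rgb first), and uses classes=2, which gives a single threshold for a bimodal histogram.

```python
import numpy as np
import matplotlib.pyplot as plt
from skimage.color import rgb2gray
from skimage.filters import threshold_multiotsu

# Hypothetical stand-in for Fig_1.png: a grey-looking image stored as RGB,
# with a dark background (~40) and bright objects (~200), i.e. bimodal.
rng = np.random.default_rng(0)
levels = np.where(rng.random((64, 64)) < 0.3, 200, 40) + rng.normal(0, 5, (64, 64))
rgb = np.repeat(np.clip(levels, 0, 255).astype(np.uint8)[..., None], 3, axis=2)

# Reduce to one channel before thresholding; for a real file,
# io.imread(path, as_gray=True) is meant to do the same.
image = rgb2gray(rgb) if rgb.ndim == 3 else rgb

# A bimodal histogram needs one threshold, i.e. two classes.
thresholds = threshold_multiotsu(image, classes=2)
regions = np.digitize(image, bins=thresholds)  # 0 = dark class, 1 = bright class

fig, ax = plt.subplots(1, 2, figsize=(7, 3.5))
ax[0].hist(image.ravel(), bins=255)
for t in thresholds:
    ax[0].axvline(t, color='r')
ax[1].imshow(regions, cmap='gray')  # 2-D labels are autoscaled to black/white
ax[1].axis('off')
plt.show()
```

Because regions is now 2-D, imshow maps the smallest label to black and the largest to white, so the two classes become visible regardless of how many classes are requested.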

I am getting a bar plot, but not a scatter plot

I am on Python 2.7 with the Spyder IDE, and this is my data:
Duration ptno
7432.0 X35133502100
7432.0 X35133502100
35255.0 T7956000304
35255.0 T7956000304
17502.0 T7956000304
17502.0 T7956000304
46.0 T7956000304
46.0 T7956000304
The code:
import time
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv('Nissin_11.09.2018.csv')
bx = df1.plot.bar(x='ptno', y='d', rot=0)
plt.setp(bx.get_xticklabels(),rotation=30,horizontalalignment='right')
plt.show()
I get a nice bar plot, as I wanted, for each value in the columns Duration and ptno. For reference, I attached an image of the plot.
But when I try to get a scatter plot with:
df1.plot.scatter(x='ptno', y='d')
It throws an error:
ValueError: scatter requires x column to be numeric
How can I make a scatter plot of my data?
As suggested by @Hristo Iliev, I used his code:
import seaborn as sns
_ = sns.stripplot(x='ptno', y='d', data=df1)
But it only plots the two unique values on the x axis, whereas I would like every row to get its own position on the x axis, as in my bar plot.
One option is to use pure matplotlib. You create an array of numbers to use as the x axis, e.g. [0, 1, 2, 3, ...], and then change the tick labels to the values of the column ptno.
For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame({"Duration":[7432,7432,35255,35255,17502,17502,46,46],
"ptno":["X35", "X35", "T79", "T79", "T79", "T79", "T79", "T79"]})
dummy_x = np.arange(len(df1.ptno))
plt.scatter(dummy_x, df1.Duration)
plt.xticks(dummy_x, df1.ptno)
plt.show()
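The same dummy-x idea can also be sketched with pandas' own plotting, assuming the frame has the default RangeIndex: reset_index() turns the row positions into a numeric 'index' column, which DataFrame.plot.scatter accepts as x. The data below repeats the values from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({"Duration": [7432, 7432, 35255, 35255, 17502, 17502, 46, 46],
                    "ptno": ["X35133502100", "X35133502100"] + ["T7956000304"] * 6})

# 'index' holds the row positions 0..n-1, a numeric stand-in for ptno.
ax = df1.reset_index().plot.scatter(x='index', y='Duration')
ax.set_xticks(range(len(df1)))
# One label per row, rotated like the tick labels in the bar plot.
ax.set_xticklabels(df1['ptno'], rotation=30, ha='right')
ax.set_xlabel('ptno')
plt.show()
```

As with the plain matplotlib version, every row gets its own x position, so repeated ptno values appear as separate ticks instead of being merged.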
You cannot make scatter plots with non-numeric values as indicated by the error. In a scatter plot, the position of each point is determined by the location on the real axis of the value of each variable. Categorical or string values such as T7956000304 have no direct mapping to a position on the real axis.
What you can plot though is a series of strip plots, one for each unique value of ptno. That's easiest to do with Seaborn:
import seaborn as sns
_ = sns.stripplot(x='ptno', y='d', data=df1)

Show all colors on colorbar with scatter plot

In the following, I use scatter and my own ListedColormap to plot some coloured data points. The corresponding colorbar is also plotted.
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, ListedColormap, BoundaryNorm
from numpy import arange
fig, ax = plt.subplots()
my_cm = ListedColormap(['#a71b1b','#94258f','#ea99e6','#ec9510','#ece43b','#a3f8ff','#2586df','#035e0d'])
bounds=range(8)
norm = BoundaryNorm(bounds, my_cm.N)
data = [1,2,1,3,0,5,3,4]
ret = ax.scatter(range(my_cm.N), [1]*my_cm.N, c=data, edgecolors='face', cmap=my_cm, s=50)
cbar = fig.colorbar(ret, ax=ax, boundaries=arange(-0.5,8,1), ticks=bounds, norm=norm)
cbar.ax.tick_params(axis='both', which='both',length=0)
If my data does not cover every value of the boundary interval, the colorbar does not show all colours (as in the attached figure). If data is set to range(8), I get a dot of each colour and the colorbar shows all colours.
How can I force the colorbar to show all defined colours even if data does not contain all boundary values?
You need to set vmin and vmax manually in your call to ax.scatter:
ret = ax.scatter(range(my_cm.N), [1]*my_cm.N, c=data, edgecolors='face', cmap=my_cm, s=50, vmin=0, vmax=7)
With vmin=0 and vmax=7, the colorbar shows all eight colours.
Regarding "if my data is not covering each value of the boundary interval, the colorbar does not show all colours": if either vmin or vmax is None, the colour limits are set via the method autoscale_None, so the minimum and maximum of your data are used.
So with your code, the data does not need to cover every value of the boundary interval for the colorbar to show all colours; it only needs to include the minimum (0) and the maximum (7).
For example, data = [0,0,0,0,0,0,0,7] already makes the colorbar show all eight colours.
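The autoscaling described above can be checked directly by reading the colour limits of the collection that scatter returns (get_clim()). The sketch compares three cases: the question's data without limits, the same data with vmin/vmax, and data that merely contains 0 and 7.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

my_cm = ListedColormap(['#a71b1b', '#94258f', '#ea99e6', '#ec9510',
                        '#ece43b', '#a3f8ff', '#2586df', '#035e0d'])
fig, ax = plt.subplots()
x, y = range(my_cm.N), [1] * my_cm.N

# Without vmin/vmax, the limits are autoscaled to min(c) and max(c).
partial = ax.scatter(x, y, c=[1, 2, 1, 3, 0, 5, 3, 4], cmap=my_cm)
# Explicit limits pin the range to 0..7 regardless of the data.
pinned = ax.scatter(x, y, c=[1, 2, 1, 3, 0, 5, 3, 4], cmap=my_cm, vmin=0, vmax=7)
# Data that contains both 0 and 7 autoscales to the same 0..7 range.
extremes = ax.scatter(x, y, c=[0, 0, 0, 0, 0, 0, 0, 7], cmap=my_cm)

for name, coll in [("partial", partial), ("pinned", pinned), ("extremes", extremes)]:
    print(name, coll.get_clim())
```

Only the first collection ends up with a maximum of 5, which is why its colorbar cannot reach the last colours of the list.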
When looking for something else, I found another solution to that problem: colorbar-for-matplotlib-plot-surface-command.
With that approach, I do not need to set vmin and vmax, and it also works when the arrays/lists of points to plot are empty. Instead, a ScalarMappable is defined and passed to colorbar in place of the scatter instance.
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, ListedColormap, BoundaryNorm
import matplotlib.cm as cm
from numpy import arange
fig, ax = plt.subplots()
my_cm = ListedColormap(['#a71b1b','#94258f','#ea99e6','#ec9510','#ece43b','#a3f8ff','#2586df','#035e0d'])
bounds=range(8)
norm = BoundaryNorm(bounds, my_cm.N)
mappable = cm.ScalarMappable(cmap=my_cm)
mappable.set_array(bounds)
data = []  # colour values; x and y must have the same length as data, so they are empty too
ax.scatter(x=[], y=[], c=data, edgecolors='face', cmap=my_cm, s=50)
cbar = fig.colorbar(mappable, ax=ax, boundaries=arange(-0.5,8,1), ticks=bounds, norm=norm)
cbar.ax.tick_params(axis='both', which='both', length=0)
plt.show()

matplotlib correlation matrix heatmap with grouped colors as labels

I have a correlation matrix that I am trying to visualize with matplotlib. I can create a heatmap-style figure just fine, but I am running into problems with how I want the labels. I'm not even sure if this is possible, but this is what I'm trying to do and can't seem to make it work:
My correlation matrix is 150 X 150. On either the x or y (or both...this doesn't matter) axis, I would like to group the labels and then simply label them with a color, or a white label on a color background.
To clarify, let's say I'd like to have 1-15 as "Group 1" and either simply show a blue bar, or "Group 1" text on a blue bar. Then 16-20 as "Group 2" on a red bar, or simply a red bar, and so on for all of the items in the matrix.
I have been failing at both grouping axis labels as well as getting any color on them. Any help would be greatly appreciated. My code is below, though it's quite basic and I don't know if it will help.
import matplotlib.pyplot as plt
import numpy as np
# CORRELATION MATRIX TEST #
corr = np.genfromtxt(csv_path, delimiter=',')  # csv_path: path to the 150 x 150 CSV file
fig = plt.figure()
ax1 = fig.add_subplot(111)
cmap = plt.get_cmap('jet', 30)  # 'jet' resampled to 30 discrete levels
cax = ax1.imshow(corr, cmap=cmap)
ax1.grid(True)
plt.title('THIS IS MY TITLE')
fig.colorbar(cax, ticks=[-1,-0.8,-0.6,-0.4,-0.2,0.0,0.2,0.4,0.6,0.8,1.0])
plt.show()
You may create auxiliary axes next to the plot and draw colored bar plots in them. Turning the axes off makes those bars look like label boxes.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
# CORRELATION MATRIX TEST #
corr = 2*np.random.rand(150,150)-1
# labels [start,end]
labels = np.array([[0,15],[16,36],[37,82],[83,111],[112,149]])
colors = ["crimson", "limegreen","gold","orchid","turquoise"]
fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="Blues")
ax.set_title('THIS IS MY TITLE')
fig.colorbar(im, ticks=[-1,-0.8,-0.6,-0.4,-0.2,0.0,0.2,0.4,0.6,0.8,1.0])
# create axes next to plot
divider = make_axes_locatable(ax)
axb = divider.append_axes("bottom", "10%", pad=0.06, sharex=ax)
axl = divider.append_axes("left", "10%", pad=0.06, sharey=ax)
axb.invert_yaxis()
axl.invert_xaxis()
axb.axis("off")
axl.axis("off")
# plot colored bar plots to the axes
barkw = dict( color=colors, linewidth=0.72, ec="k", clip_on=False, align='edge',)
axb.bar(labels[:,0],np.ones(len(labels)),
width=np.diff(labels, axis=1).flatten(), **barkw)
axl.barh(labels[:,0],np.ones(len(labels)),
height=np.diff(labels, axis=1).flatten(), **barkw)
# set margins to zero again
ax.margins(0)
ax.tick_params(axis="both", bottom=0, left=0, labelbottom=0,labelleft=0)
# Label the boxes
textkw = dict(ha="center", va="center", fontsize="small")
for k, l in labels:
    axb.text((k+l)/2., 0.5, "{}-{}".format(k, l), **textkw)
    axl.text(0.5, (k+l)/2., "{}-{}".format(k, l), rotation=-90, **textkw)
plt.show()
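The question describes the groups by their extent (1-15, 16-20, ...), while the answer hard-codes the [start, end] pairs. A small helper, group_bounds (a hypothetical name, not part of matplotlib or NumPy), can build that labels array from a list of group sizes; the sizes below reproduce the answer's five groups.

```python
import numpy as np

def group_bounds(sizes):
    """Return an (n_groups, 2) array of inclusive [start, end] row indices.

    Hypothetical helper: sizes lists how many consecutive rows each group has.
    """
    ends = np.cumsum(sizes) - 1                     # last row of each group
    starts = np.concatenate(([0], ends[:-1] + 1))   # first row of each group
    return np.column_stack((starts, ends))

labels = group_bounds([16, 21, 46, 29, 38])
print(labels)
```

The result can be passed unchanged to the bar/barh and text calls above. If the small gaps between neighbouring boxes matter, using labels[:, 0] - 0.5 as the positions and np.diff(labels, axis=1).flatten() + 1 as the widths should make the boxes line up with the matrix rows, since imshow centres row i on coordinate i by default.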

Set location of xticks in a matplotlib scatter plot

I am trying to create a scatter plot of measurements where the x labels are WIFI channels. By default matplotlib is spacing the labels in proportion to their numerical value. However, I would like them to be spaced uniformly over the scatter plot. Is that possible?
This is basically what my plot code currently looks like:
Here, chanPoints is a list of frequencies and measurements is a list of measurements:
plt.scatter(chanPoints,measurements)
plt.xlabel('Frequency (MHz)')
plt.ylabel('EVM (dB)')
plt.xticks(Tchan,rotation = 90)
plt.title('EVM for 5G Channels by Site')
plt.show()
Numpy
You may use numpy to create an array which maps the unique items within chanPoints to numbers 0,1,2.... You can then give each of those numbers the corresponding label.
import matplotlib.pyplot as plt
import numpy as np
chanPoints = [4980, 4920,4920,5500,4980,5500,4980, 5500, 4920]
measurements = [5,6,4,3,5,8,4,6,3]
unique, index = np.unique(chanPoints, return_inverse=True)
plt.scatter(index, measurements)
plt.xlabel('Frequency (MHz)')
plt.ylabel('EVM (dB)')
plt.xticks(range(len(unique)), unique)
plt.title('EVM for 5G Channels by Site')
plt.show()
Seaborn
If you're happy to use seaborn, this can save a lot of manual work, since seaborn is specialized for plotting categorical data. With e.g. a swarmplot, the chanPoints would be interpreted as categories on the x axis and be evenly spaced. If several points overlap, they are plotted next to each other, which may be an advantage because it shows the number of measurements for each channel.
import matplotlib.pyplot as plt
import seaborn as sns
chanPoints = [4980, 4920,4920,5500,4980,5500,4980, 5500, 4920]
measurements = [5,6,4,3,5,8,4,6,3]
sns.swarmplot(x=chanPoints, y=measurements)
plt.xlabel('Frequency (MHz)')
plt.ylabel('EVM (dB)')
plt.title('EVM for 5G Channels by Site')
plt.show()
Replace chanPoints with an index (this requires Tchan to be sorted in ascending order and to contain every value in chanPoints):
index = numpy.searchsorted(Tchan, chanPoints)
plt.scatter(index, measurements)
Then build your xticks with the corresponding labels.
ticks = range(len(Tchan))
plt.xticks(ticks, labels=Tchan, rotation=90)
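A self-contained sketch of the searchsorted approach, with a hypothetical Tchan (the real channel list is not given in the question). Because Tchan may include channels without measurements, every channel still gets its own evenly spaced tick.

```python
import numpy as np
import matplotlib.pyplot as plt

chanPoints = [4980, 4920, 4920, 5500, 4980, 5500, 4980, 5500, 4920]
measurements = [5, 6, 4, 3, 5, 8, 4, 6, 3]
# Hypothetical channel list: must be sorted ascending and contain every
# frequency in chanPoints for searchsorted to return exact positions.
Tchan = [4920, 4940, 4960, 4980, 5500]

index = np.searchsorted(Tchan, chanPoints)
# Sanity check: every measured frequency really is one of the channels.
assert np.array_equal(np.asarray(Tchan)[index], chanPoints)

plt.scatter(index, measurements)
plt.xticks(range(len(Tchan)), Tchan, rotation=90)  # one evenly spaced tick per channel
plt.xlabel('Frequency (MHz)')
plt.ylabel('EVM (dB)')
plt.show()
```

Unlike np.unique, this keeps ticks for channels that have no measurements (4940 and 4960 here), at the cost of requiring a sorted channel list.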