Color image segmentation with Python - python-2.7

I have many pictures as below:
My objective is to identify those "beads", try to mark it with a circle, and count the detected numbers.
I tried to use image segmentation algorithms via Python and the source codes are as below:
from matplotlib import pyplot as plt
from skimage import data
from skimage.feature import blob_dog, blob_log, blob_doh
from math import sqrt
from skimage.color import rgb2gray
from scipy import misc # try
image = misc.imread('test.jpg')
image_gray = rgb2gray(image)
blobs_log = blob_log(image_gray, max_sigma=10, num_sigma=5, threshold=.1)
# Compute radii in the 3rd column.
blobs_log[:, 2] = blobs_log[:, 2] * sqrt(2)
blobs_dog = blob_dog(image_gray, max_sigma=2, threshold=.051)
blobs_dog[:, 2] = blobs_dog[:, 2] * sqrt(2)
blobs_doh = blob_doh(image_gray, max_sigma=2, threshold=.01)
blobs_list = [blobs_log, blobs_dog, blobs_doh]
colors = ['yellow', 'lime', 'red']
titles = ['Laplacian of Gaussian', 'Difference of Gaussian',
'Determinant of Hessian']
sequence = zip(blobs_list, colors, titles)
for blobs, color, title in sequence:
fig, ax = plt.subplots(1, 1)
ax.set_title(title)
ax.imshow(image, interpolation='nearest')
for blob in blobs:
y, x, r = blob
c = plt.Circle((x, y), r, color=color, linewidth=2, fill=False)
ax.add_patch(c)
plt.show()
The best results obtained so far are still unsatisfactory:
How can I improve it ?

You could use Gimp or Photoshop and test some filters and colors changes to differentiate the circles from the background. Brightness and Contrast adjustments may work. Then you can apply an Edge detector to detect the circles.

by converting this image to grayscale you have effectively thrown away the most powerful cue you have to segment the beads - their distinctive green color. try running the same code but replace
image_gray = rgb2gray(image)
with
image_gray = image[:,:,1]

Related

keras autoencoder vs PCA

I am playing with a toy example to understand PCA vs keras autoencoder
I have the following code for understanding PCA:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
pca = decomposition.PCA(n_components=3)
pca.fit(X)
pca.explained_variance_ratio_
array([ 0.92461621, 0.05301557, 0.01718514])
pca.components_
array([[ 0.36158968, -0.08226889, 0.85657211, 0.35884393],
[ 0.65653988, 0.72971237, -0.1757674 , -0.07470647],
[-0.58099728, 0.59641809, 0.07252408, 0.54906091]])
I have done a few readings and play codes with keras including this one.
However, the reference code feels too high a leap for my level of understanding.
Does someone have a short auto-encoder code which can show me
(1) how to pull the first 3 components from auto-encoder
(2) how to understand what amount of variance the auto-encoder captures
(3) how the auto-encoder components compare against PCA components
First of all, the aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. So, the target output of the autoencoder is the autoencoder input itself.
It is shown in [1] that If there is one linear hidden layer and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input in the span of the first k principal components of the data.
And in [2] you can see that If the hidden layer is nonlinear, the autoencoder behaves differently from PCA, with the ability to capture multi-modal aspects of the input distribution.
Autoencoders are data-specific, which means that they will only be able to compress data similar to what they have been trained on. So, the usefulness of features that have been learned by hidden layers could be used for evaluating the efficacy of the method.
For this reason, one way to evaluate an autoencoder efficacy in dimensionality reduction is cutting the output of the middle hidden layer and compare the accuracy/performance of your desired algorithm by this reduced data rather than using original data.
Generally, PCA is a linear method, while autoencoders are usually non-linear. Mathematically, it is hard to compare them together, but intuitively I provide an example of dimensionality reduction on MNIST dataset using Autoencoder for your better understanding. The code is here:
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense
from keras.utils import np_utils
import numpy as np
num_train = 60000
num_test = 10000
height, width, depth = 28, 28, 1 # MNIST images are 28x28
num_classes = 10 # there are 10 classes (1 per digit)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(num_train, height * width)
X_test = X_test.reshape(num_test, height * width)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range
Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels
input_img = Input(shape=(height * width,))
x = Dense(height * width, activation='relu')(input_img)
encoded = Dense(height * width//2, activation='relu')(x)
encoded = Dense(height * width//8, activation='relu')(encoded)
y = Dense(height * width//256, activation='relu')(x)
decoded = Dense(height * width//8, activation='relu')(y)
decoded = Dense(height * width//2, activation='relu')(decoded)
z = Dense(height * width, activation='sigmoid')(decoded)
model = Model(input_img, z)
model.compile(optimizer='adadelta', loss='mse') # reporting the accuracy
model.fit(X_train, X_train,
epochs=10,
batch_size=128,
shuffle=True,
validation_data=(X_test, X_test))
mid = Model(input_img, y)
reduced_representation =mid.predict(X_test)
out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)
reduced.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
reduced.fit(X_train, Y_train,
epochs=10,
batch_size=128,
shuffle=True,
validation_data=(X_test, Y_test))
scores = reduced.evaluate(X_test, Y_test, verbose=1)
print("Accuracy: ", scores[1])
It produces a $y\in \mathbb{R}^{3}$ ( almost like what you get by decomposition.PCA(n_components=3) ). For example, here you see the outputs of layer y for a digit 5 instance in dataset:
class y_1 y_2 y_3
5 87.38 0.00 20.79
As you see in the above code, when we connect layer y to a softmax dense layer:
mid = Model(input_img, y)
reduced_representation =mid.predict(X_test)
the new model mid give us a good classification accuracy about 95%. So, it would be reasonable to say that y, is an efficiently extracted feature vector for the dataset.
References:
[1]: Bourlard, Hervé, and Yves Kamp. "Auto-association by multilayer perceptrons and singular value decomposition." Biological cybernetics 59.4 (1988): 291-294.
[2]: Japkowicz, Nathalie, Stephen Jose Hanson, and Mark A. Gluck. "Nonlinear autoassociation is not equivalent to PCA." Neural computation 12.3 (2000): 531-545.
The earlier answer cover the whole thing, however I am doing the analysis on the Iris data - my code comes with a slightly modificiation from this post which dives further into the topic. As it was request, lets load the data
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
scaler = MinMaxScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)
Let's do a regular PCA
from sklearn import decomposition
pca = decomposition.PCA()
pca_transformed = pca.fit_transform(X_scaled)
plot3clusters(pca_transformed[:,:2], 'PCA', 'PC')
A very simple AE model with linear layers, as the earlier answer pointed out with ... the first reference, one linear hidden layer and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input in the span of the first k principal components of the data.
from keras.layers import Input, Dense
from keras.models import Model
import matplotlib.pyplot as plt
#create an AE and fit it with our data using 3 neurons in the dense layer using keras' functional API
input_dim = X_scaled.shape[1]
encoding_dim = 2
input_img = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='linear')(input_img)
decoded = Dense(input_dim, activation='linear')(encoded)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
print(autoencoder.summary())
history = autoencoder.fit(X_scaled, X_scaled,
epochs=1000,
batch_size=16,
shuffle=True,
validation_split=0.1,
verbose = 0)
# use our encoded layer to encode the training input
encoder = Model(input_img, encoded)
encoded_input = Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]
decoder = Model(encoded_input, decoder_layer(encoded_input))
encoded_data = encoder.predict(X_scaled)
plot3clusters(encoded_data[:,:2], 'Linear AE', 'AE')
You can look into the loss if you want
#plot our loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
The function to plot the data
def plot3clusters(X, title, vtitle):
import matplotlib.pyplot as plt
plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
plt.scatter(X[y == i, 0], X[y == i, 1], color=color, alpha=1., lw=lw, label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title(title)
plt.xlabel(vtitle + "1")
plt.ylabel(vtitle + "2")
return(plt.show())
Regarding explaining the variability, using non-linear hidden function, leads to other approximation similar to ICA / TSNE and others. Where the idea of variance explanation is not there, still one can look into the convergence.

Equal width plot sizes in pyplot, while keeping aspect ratio equal

I want to have two plots be the same width, however the resulting code shrinks the imshow plot.
xx = np.linspace(0.0,255.5,512)
yy = np.linspace(0.0,255.5,512)
Func = np.random.rand(len(xx),len(yy))
f, axarr = plt.subplots(2,1)
f.tight_layout()
im = axarr[0].imshow(Func, cmap = 'jet', interpolation = 'lanczos',origin = 'lower')
pos = axarr[0].get_position()
colorbarpos = [pos.x0+1.05*pos.width,pos.y0,0.02,pos.height]
cbar_ax = f.add_axes(colorbarpos)
cbar = f.colorbar(im,cax=cbar_ax)
axarr[1].plot(xx,Func[:,255],yy,Func[255,:])
plt.show()
plt.close('all')
EDIT: I would also like to keep imshow's plot from looking stretched (essentially, I need the width and length stretched appropriately so the aspect ratio's are still equal).
Some options:
A. `aspect="auto"
Use `aspect="auto" on the imshow plot
plt.imshow(..., aspect="auto")
B. adjust the figure margings
Adjust the figure margings or the figure size, such that the lower axes will have the same size as the imshow plot, e.g.
plt.subplots_adjust(left=0.35, right=0.65)
C. using a divider
You can use make_axes_locatable functionality from mpl_toolkits.axes_grid1 to divide the image axes to make space for the other axes.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
xx = np.linspace(0.0,255.5,512)
yy = np.linspace(0.0,255.5,512)
Func = np.random.rand(len(xx),len(yy))
fig, ax = plt.subplots(figsize=(4,5))
im = ax.imshow(Func, cmap = 'jet', interpolation = 'lanczos',origin = 'lower')
divider = make_axes_locatable(ax)
ax2 = divider.append_axes("bottom", size=0.8, pad=0.3)
cax = divider.append_axes("right", size=0.08, pad=0.1)
ax2.plot(xx,Func[:,255],yy,Func[255,:])
cbar = fig.colorbar(im,cax=cax)
plt.show()

Strange thick line in python plots?

I am using matplotlib.pyplot in python. Consider y-axis is real values called "ranking loss" and x-axis is the number of the iterations(1000). Then I plot the average ranking loss of 2 runs of an algorithm in each iteration.
Does anyone know why do I get this strange thick chart instead of a line?
Thank you very much in advance
And the command is :
fig = plt.figure()
fig.suptitle('Batch-GD', fontsize=20)
plt.xlabel('Iteration', fontsize=18)
plt.ylabel('Avg ranking loss', fontsize=16)
plt.grid(True)
plt.xlim(0, iter)
plt.plot(avg_loss)
fig.savefig('GD_with_ini.jpg')
plt.show()
What's happening here is probably, that your line density is so high that lines overlap in a way that a plain surface is shown instead of the line itself.
If we take e.g. 10000 points and make the plot oscillate at very high frequency, we get a similar behaviour. Zooming in shows that there actually is a line.
Code to reproduce the plot:
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1.inset_locator import inset_axes, mark_inset
x = np.linspace(0,1000,num=10000)
y = np.sin(x*100.)*x/5000.+np.exp(-x/60.)+np.sin(x/50.)*0.016
plt.plot(x,y)
###### show inset ####
ax = plt.gca()
axins = inset_axes(ax, 2,2, loc=1)
axins.plot(x,y)
axins.set_xlim(400, 410)
axins.set_ylim(-0.1, 0.17)
mark_inset(ax, axins, loc1=2, loc2=4, fc="none", ec="0.5")
plt.show()
A solution can then be to calculate some kind of rolling average. E.g.:
def running_mean(x, N):
cumsum = np.cumsum(np.insert(x, 0, 0))
return (cumsum[N:] - cumsum[:-N]) / N
N=300
cumsum = running_mean(y, N)
ax.plot(x[N//2:-N//2+1], cumsum, c="orange")
axins.plot(x[N//2:-N//2+1], cumsum, c="orange")

Colouring the surface of a sphere with a set of scalar values in matplotlib

I am rather new to matplotlib (and this is also my first question here). I'm trying to represent the scalp surface potential as recorded by an EEG. So far I have a two-dimensional figure of a sphere projection, which I generated using contourf, and pretty much boils down to an ordinary heat map.
Is there any way this can be done on half a sphere?, i.e. generating a 3D sphere with surface colours given by a list of values? Something like this, http://embal.gforge.inria.fr/img/inverse.jpg, but I have more than enough with just half a sphere.
I have seen a few related questions (for example, Matplotlib 3d colour plot - is it possible?), but they either don't really address my question or remain unanswered to date.
I have also spent the morning looking through countless examples. In most of what I've found, the colour at one particular point of a surface is indicative of its Z value, but I don't want that... I want to draw the surface, then specify the colours with the data I have.
You can use plot_trisurf and assign a custom field to the underlying ScalarMappable through set_array method.
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib.tri as mtri
(n, m) = (250, 250)
# Meshing a unit sphere according to n, m
theta = np.linspace(0, 2 * np.pi, num=n, endpoint=False)
phi = np.linspace(np.pi * (-0.5 + 1./(m+1)), np.pi*0.5, num=m, endpoint=False)
theta, phi = np.meshgrid(theta, phi)
theta, phi = theta.ravel(), phi.ravel()
theta = np.append(theta, [0.]) # Adding the north pole...
phi = np.append(phi, [np.pi*0.5])
mesh_x, mesh_y = ((np.pi*0.5 - phi)*np.cos(theta), (np.pi*0.5 - phi)*np.sin(theta))
triangles = mtri.Triangulation(mesh_x, mesh_y).triangles
x, y, z = np.cos(phi)*np.cos(theta), np.cos(phi)*np.sin(theta), np.sin(phi)
# Defining a custom color scalar field
vals = np.sin(6*phi) * np.sin(3*theta)
colors = np.mean(vals[triangles], axis=1)
# Plotting
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
cmap = plt.get_cmap('Blues')
triang = mtri.Triangulation(x, y, triangles)
collec = ax.plot_trisurf(triang, z, cmap=cmap, shade=False, linewidth=0.)
collec.set_array(colors)
collec.autoscale()
plt.show()

Any way to create histogram with matplotlib.pyplot without plotting the histogram?

I am using matplotlib.pyplot to create histograms. I'm not actually interested in the plots of these histograms, but interested in the frequencies and bins (I know I can write my own code to do this, but would prefer to use this package).
I know I can do the following,
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.normal(1.5,1.0)
x2 = np.random.normal(0,1.0)
freq, bins, patches = plt.hist([x1,x1],50,histtype='step')
to create a histogram. All I need is freq[0], freq[1], and bins[0]. The problem occurs when I try and use,
freq, bins, patches = plt.hist([x1,x1],50,histtype='step')
in a function. For example,
def func(x, y, Nbins):
freq, bins, patches = plt.hist([x,y],Nbins,histtype='step') # create histogram
bincenters = 0.5*(bins[1:] + bins[:-1]) # center bins
xf= [float(i) for i in freq[0]] # convert integers to float
xf = [float(i) for i in freq[1]]
p = [ (bincenters[j], (1.0 / (xf[j] + yf[j] )) for j in range(Nbins) if (xf[j] + yf[j]) != 0]
Xt = [j for i,j in p] # separate pairs formed in p
Yt = [i for i,j in p]
Y = np.array(Yt) # convert to arrays for later fitting
X = np.array(Xt)
return X, Y # return arrays X and Y
When I call func(x1,x2,Nbins) and plot or print X and Y, I do not get my expected curve/values. I suspect it something to do with plt.hist, since there is a partial histogram in my plot.
I don't know if I'm understanding your question very well, but here, you have an example of a very simple home-made histogram (in 1D or 2D), each one inside a function, and properly called:
import numpy as np
import matplotlib.pyplot as plt
def func2d(x, y, nbins):
histo, xedges, yedges = np.histogram2d(x,y,nbins)
plt.plot(x,y,'wo',alpha=0.3)
plt.imshow(histo.T,
extent=[xedges.min(),xedges.max(),yedges.min(),yedges.max()],
origin='lower',
interpolation='nearest',
cmap=plt.cm.hot)
plt.show()
def func1d(x, nbins):
histo, bin_edges = np.histogram(x,nbins)
bin_center = 0.5*(bin_edges[1:] + bin_edges[:-1])
plt.step(bin_center,histo,where='mid')
plt.show()
x = np.random.normal(1.5,1.0, (1000,1000))
func1d(x[0],40)
func2d(x[0],x[1],40)
Of course, you may check if the centering of the data is right, but I think that the example shows some useful things about this topic.
My recommendation: Try to avoid any loop in your code! They kill the performance. If you look, In my example there aren't loops. The best practice in numerical problems with python is avoiding loops! Numpy has a lot of C-implemented functions that do all the hard looping work.
You can use np.histogram2d (for 2D histogram) or np.histogram (for 1D histogram):
hst = np.histogram(A, bins)
hst2d = np.histogram2d(X,Y,bins)
Output form will be the same as plt.hist and plt.hist2d, the only difference is there is no plot.
No.
But you can bypass the pyplot:
import matplotlib.pyplot
fig = matplotlib.figure.Figure()
ax = matplotlib.axes.Axes(fig, (0,0,0,0))
numeric_results = ax.hist(data)
del ax, fig
It won't impact active axes and figures, so it would be ok to use it even in the middle of plotting something else.
This is because any usage of plt.draw_something() will put the plot in current axis - which is a global variable.
If you would like to simply compute the histogram (that is, count the number of points in a given bin) and not display it, the np.histogram() function is available