Data duplicated (doubled) in Dataset.next_batch() method - python-2.7

Assume I have provided only one test image.
I extract images and labels using mnist.test.next_batch(100).
When I supply 1 test image, I get back 2 images (the same image duplicated).
When I supply 2 test images, I get back 4 (both images duplicated).
The same problem exists for the training images.
I printed the length of the test and training images inside the read_data_sets method (...tensorflow/contrib/learn/python/learn/datasets/mnist.py), and it reports the correct length.
Here is my code.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, validation_size=0)
images, labels = mnist.test.next_batch(100)
print len(images) #double the actual length
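For reference, a minimal sketch that compares the requested batch size against the dataset size, using the DataSet.num_examples property of the same contrib module (the file name and batch size are taken from the snippet above):

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, validation_size=0)
print mnist.test.num_examples                      # how many test examples were actually loaded
batch_size = min(100, mnist.test.num_examples)     # don't request more examples than exist
images, labels = mnist.test.next_batch(batch_size)
print len(images), len(labels)                     # compare against the requested batch size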

Related

ml::KNearest->findNearest() inconsistent result when category label changes

I am using OpenCV 3.4.1.
I am working on a video classification project and am trying to use KNearest to classify between 2 categories. There are 8 areas of interest in each video frame. To make a decision for each frame, a separate KNearest is run on the pixel values of each area; the majority wins (with ties broken in favour of one category). So I have 8 sets of training data (one per area of interest).
Problem: the response generated by the KNN model changes when I label the categories differently.
The training data sets are organized as rows of:
[category label], data0, data1, data2, ... (the dimensions differ between training sets)
where dataX = pixel data of a frame (1 row = 1 frame)
Then, I build the model by:
Ptr<TrainData> tdata = TrainData::loadFromCSV(filename, 0, 0, -1, String("cat"));
Mat raw = tdata->getTrainSamples();
Mat res = tdata->getResponses();
PCA pca(raw, noArray(), PCA::DATA_AS_ROW, 0.99);
Mat knnIn = pca.project(raw);
Ptr<ml::KNearest> knn = ml::KNearest::create();
knn->train(knnIn, ml::ROW_SAMPLE, res);
After that, the testing data is passed through the PCA and the KNN model to get the response.
To test it, I feed 1 set of testing data to each of the 8 KNN models.
If I use 0 & 1 as the [category label] in the training data sets, the 8 responses from KNN are 1,1,1,1,1,0,0,1.
If I change the label to 2 & 1 instead (replacing all '0' by '2' in the first column of the training data), the 8 responses become 1,1,1,1,1,1,1,1, while I expect 1,1,1,1,1,2,2,1.
Some observations:
no editing error in the training data (while changing the category labels)
the KNN model has isClassifier()=true, DefaultK=10, AlgorithmType=1 (BRUTE_FORCE)
the result is consistent for the same training data set and testing data set
I don't see any pattern in the differing responses for the 2 label sets (after trying different training data sets and testing data sets)
Please shed some light. Thank you very much!

Mask and extract cell values from a VRT file?

I have raster data for built-up areas around the globe at 40 m resolution as a VRT file (download data from here), and I am trying to crop the data with a mask and then extract the color index value for each cell.
Note: two more files come with the data: vrt.clr and vrt.ovr
Here is a sample of the data: a view of the VRT data in ArcMap (screenshot).
My question: why am I getting empty cell values when I crop by mask?
I have tried the following:
extract by mask using the ArcMap toolbox
using GDAL in Python 2.7:
import gdal
ds = gdal.Open('input.vrt')
ds = gdal.Translate('output.vrt', ds, projWin=[80.439, 5.341, 81.048, 4.686])
ds = None
I have also tried saving the data as a TIFF.
Also, is there any way to read the color index value at given coordinates (x,y) after masking the data?
The data appears to be in the Pseudo-Mercator projection (EPSG:3857), so you should either specify the extent for projWin in that coordinate system, or add projWinSRS if you want to provide it in a different coordinate system.
Also, if you want gdal.Translate to output a VRT file, you should add format='VRT', because your code snippet outputs to the default file format, which is GeoTIFF.
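For instance, a minimal variant of the Translate call from the question (same coordinates, here assumed to be WGS84) that keeps the output as a VRT:

import gdal

ds = gdal.Open('input.vrt')
ds = gdal.Translate('output.vrt', ds, format='VRT',
                    projWin=[80.439, 5.341, 81.048, 4.686], projWinSRS='EPSG:4326')
ds = None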
If I assume your coordinates are WGS84 (EPSG:4326), they define a small region over the ocean south of Sri Lanka. That doesn't make much sense given the nature of the data.
If you want to read the array given by your coordinates you could use:
invrt = 'GHS_BUILT_LDSMT_GLOBE_R2015B_3857_38_v1_0.vrt'
outfile = '/vsimem/tmpfile'
ds = gdal.Translate(outfile, invrt, projWin=[80.439, 5.341, 81.048, 4.686], projWinSRS='EPSG:4326')
data = ds.ReadAsArray()
ds = None
gdal.Unlink(outfile)
The plotted array looks like this: [plot omitted]
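As for the follow-up about reading the value at given coordinates: a minimal sketch, not part of the original answer, assuming a single-band raster and a query point already expressed in the raster's CRS (EPSG:3857); the point below is purely hypothetical:

import gdal

ds = gdal.Open('output.vrt')             # the cropped raster produced by gdal.Translate above
gt = ds.GetGeoTransform()                # (origin_x, pixel_width, 0, origin_y, 0, pixel_height)

x, y = 8990000.0, 590000.0               # hypothetical query point in EPSG:3857
col = int((x - gt[0]) / gt[1])           # pixel column
row = int((y - gt[3]) / gt[5])           # pixel row (gt[5] is negative for north-up rasters)

value = ds.ReadAsArray(col, row, 1, 1)[0, 0]
print(value)                             # the colour-index value at that cell
ds = None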

Doing OCR to identify text written on trucks/cars or other vehicles

I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via the command line:
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of tesserocr (a Python wrapper for Tesseract) and tried the following code:
from tesserocr import PyTessBaseAPI, RIL

# 'image' is the input picture loaded as a PIL Image
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print 'Found {} textline image components.'.format(len(boxes))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print (u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
               "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box)
and again it was not able to read any characters in the image.
My question is: how should I go about solving this problem? (I am not looking for ready-made code, but for an approach.)
Would I need to train Tesseract with sample images, or can I just write code using existing libraries to somehow detect the coordinates of the truck and do OCR only within the boundaries of the truck?
Tesseract expects document-only images, but you have non-document objects in your image. You need a sophisticated segmentation step (and then probably some image processing) before feeding it to Tesseract-OCR.
I have a three-step solution:
Take the part of the image you want to recognize
Apply Gaussian-blur
Apply simple-thresholding
You can use index ranges to get the part of the image. For instance, you could select:
height range: from int(h/4) + 40 to int(h/2) - 20
width range: from int(w/2) to int((w*3)/4)
Result (intermediate images omitted): the cropped part, the Gaussian-blurred image, and the thresholded image.
Pytesseract output: CMA CGM
Code:
import cv2
import pytesseract

img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# crop to the region of interest described above
(h, w) = gry.shape[:2]
gry = gry[int(h/4) + 40:int(h/2) - 20, int(w/2):int((w*3)/4)]

# blur, then threshold the blurred image (Otsu picks the threshold value automatically)
blr = cv2.GaussianBlur(gry, (3, 3), 0)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

txt = pytesseract.image_to_string(thr)
print(txt)

cv2.imshow("thr", thr)
cv2.waitKey(0)
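If you still want to force the page-segmentation mode from the earlier command-line attempt, pytesseract can pass it through its config argument; for example, replacing the image_to_string call above with (the flag is --psm on Tesseract 4+, -psm on 3.x):

txt = pytesseract.image_to_string(thr, config='--psm 6')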

FuncAnimation issues

I'm trying to make a little movie out of 15 (in this case) images that I obtained by color-mapping matrices. The problem is that whatever the input series of images, the movie I obtain always stops after showing 13 images; this holds for every number I tried: if I have n images, the movie shows only the first n-2.
This is my piece of code:
import matplotlib.pyplot as plt
from matplotlib import animation, cm

fig = plt.figure()
im = plt.imshow(matrix_list[0], cmap=cm.hot_r, interpolation='none')
plt.colorbar()
# matrix_list is a list of 15 ndarrays of shape (15,15)

def updatefig(j):
    # set the data in the AxesImage object
    im.set_array(matrix_list[j])
    # return the artists set
    return im,

# kick off the animation
ani = animation.FuncAnimation(fig, updatefig, frames=range(len(matrix_list)),
                              interval=1000, blit=True)
mywriter = animation.FFMpegWriter(fps=2.)
ani.save('my_movie_test.avi', writer=mywriter)

Tensorflow return similar images

I want to use Google's Tensorflow to return similar images to an input image.
I have installed TensorFlow from http://www.tensorflow.org (using the pip installation, Python 2.7) on Ubuntu 14.04 in a virtual machine (CPU only).
I have downloaded the trained Inception-V3 model (inception-2015-12-05.tgz) from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz, which was trained for the ImageNet Large Scale Visual Recognition Challenge using the 2012 data, but I think it contains both the neural network and the classifier (as the task there was to predict the category). I have also downloaded the file classify_image.py, which classifies an image into one of the 1000 classes in the model.
So I have a random image, image.jpg, that I am using to test the model. When I run the command:
python /home/amit/classify_image.py --image_file=/home/amit/image.jpg
I get the output below (classification is done using softmax):
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 3
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 3
trench coat (score = 0.62218)
overskirt (score = 0.18911)
cloak (score = 0.07508)
velvet (score = 0.02383)
hoopskirt, crinoline (score = 0.01286)
Now, the task at hand is to find images that are similar to the input image (image.jpg) out of a database of 60,000 images (JPG format, kept in a folder at /home/amit/images). I believe this can be done by removing the final classification layer from the Inception-V3 model and using the feature vector of the input image to compute the cosine distance to the feature vectors of all 60,000 images; we can then return the images with the smallest distance (cos 0 = 1).
Please suggest a way forward for this problem and how to do this using the Python API.
I think I found an answer to my question:
In the file classify_image.py, which classifies the image using the pre-trained model (NN + classifier), I made the changes below (statements marked with #ADDED):
def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
  image_data = gfile.FastGFile(image, 'rb').read()

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
    # Runs the softmax tensor by feeding the image_data as input to the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    feature_tensor = sess.graph.get_tensor_by_name('pool_3:0')  #ADDED
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)
    feature_set = sess.run(feature_tensor,
                           {'DecodeJpeg/contents:0': image_data})  #ADDED
    feature_set = np.squeeze(feature_set)  #ADDED
    print(feature_set)  #ADDED

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))
I ran the pool_3:0 tensor by feeding the image_data to it. Please let me know if I am making a mistake. If this is correct, I believe we can use this tensor for further calculations.
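For those further calculations, here is a sketch of the cosine-similarity step described in the question; the array shapes are assumptions (one squeezed 2048-dimensional pool_3 vector for the query, and one row per database image):

import numpy as np

def cosine_similarities(query_vec, db_vecs):
    # query_vec: shape (2048,); db_vecs: shape (num_images, 2048)
    q = query_vec / np.linalg.norm(query_vec)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    return d.dot(q)                        # one similarity score per database image

# e.g. sims = cosine_similarities(feature_set, db_vecs)
#      top10 = np.argsort(-sims)[:10]      # indices of the most similar images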
Tensorflow now has a nice tutorial on how to get the activations before the final layer and retrain a new classification layer with different categories:
https://www.tensorflow.org/versions/master/how_tos/image_retraining/
The example code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
In your case, yes, you can get the activations from pool_3, the layer below the softmax layer (the so-called bottlenecks), and send them to other operations as input.
Finally, about finding similar images: I don't think ImageNet's bottleneck activations are a very pertinent representation for image search. You could consider using an autoencoder network with direct image inputs.
Your problem sounds similar to this visual search project.