Run tflite accuracy tool on official tensorflow resnet50 model - c++

I have downloaded the official ResNet-50 model provided here: https://github.com/tensorflow/models/tree/master/official/resnet. I needed a quantized tflite version of this model, and hence I converted the model to the tflite format as follows:
toco --output_file /tmp/resnet50_quant.tflite --saved_model_dir <path/to/saved_model_dir> --output_format TFLITE --quantize_weights QUANTIZE_WEIGHTS
After this, I thought I'd run the tflite accuracy tool to verify that the accuracy of this model is still reasonable. However, it looks like I run into the following issue:
bazel run -c opt --copt=-march=native --cxxopt='--std=c++11' -- //tensorflow/contrib/lite/tools/accuracy/ilsvrc:imagenet_accuracy_eval --model_file=/tmp/resnet50_quant.tflite --ground_truth_images_path=<path/to/images> --ground_truth_labels=/tmp/validation_labels.txt --model_output_labels=/tmp/tf_labels.txt --output_file_path=/tmp/accuracy_output.txt --num_images=0
INFO: Analysed target //tensorflow/contrib/lite/tools/accuracy/ilsvrc:imagenet_accuracy_eval (0 packages loaded).
INFO: Found 1 target...
Target //tensorflow/contrib/lite/tools/accuracy/ilsvrc:imagenet_accuracy_eval up-to-date:
bazel-bin/tensorflow/contrib/lite/tools/accuracy/ilsvrc/imagenet_accuracy_eval
INFO: Elapsed time: 14.589s, Critical Path: 14.28s
INFO: 3 processes: 3 local.
INFO: Build completed successfully, 4 total actions
INFO: Running command line: bazel-bin/tensorflow/contrib/lite/tools/accuracy/ilsvrc/imagenet_accuracy_eval '--model_file=/tmp/resnet50_quant.tflite' '--ground_truth_images_path=<path/to/images>' '--ground_truth_labels=/tmp/validation_labels.txt' '--model_output_labels=/tmp/tf_labels.txt' '--output_file_path=/tmp/accuracy_output.txt' '--num_images=0'
2018-10-12 15:30:06.237058: E tensorflow/contrib/lite/tools/accuracy/ilsvrc/imagenet_accuracy_eval.cc:155] Starting evaluation with: 4 threads.
2018-10-12 15:30:06.536802: E tensorflow/contrib/lite/tools/accuracy/ilsvrc/imagenet_accuracy_eval.cc:98] Starting model evaluation: 50000
2018-10-12 15:30:06.565334: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at run_tflite_model_op.cc:89 : Invalid argument: Data shapes mismatch for tensors: 0 expected: [64,224,224,3] got: [1,224,224,3]
2018-10-12 15:30:06.565453: F tensorflow/contrib/lite/tools/accuracy/ilsvrc/imagenet_model_evaluator.cc:222] Non-OK-status: eval_pipeline->Run(CreateStringTensor(image_label.image), CreateStringTensor(image_label.label)) status: Invalid argument: Data shapes mismatch for tensors: 0 expected: [64,224,224,3] got: [1,224,224,3]
[[{{node stage_run_tfl_model_output}} = RunTFLiteModel[input_type=[DT_FLOAT], model_file_path="/tmp/resnet50_quant.tflite", output_type=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](stage_inception_preprocess_output)]]
It looks like the issue is that the official ResNet model has an input tensor of [64, 224, 224, 3], whereas the accuracy tool provides an input of [1, 224, 224, 3]. So the official model seems to expect a batch of 64 images, and hence the accuracy tool fails.
I was wondering what I need to do to get the accuracy tool to run on the official ResNet-50 model? I'm guessing that although the input tensor for ResNet-50 is [64, 224, 224, 3], there should be a way to still run a single image through the model.

There are two ways to go about it:
1. Resize the input of your model to [1, 224, 224, 3] and run the tool. You could try looking at this and then modifying this file accordingly; a sketch of the resize, via the TFLite Python API, follows this list.
2. Alternatively, modify the same tool so that it feeds in 64 images at a time instead of 1. You can look at the same code file pointed to above and feed 64 at a time instead of 1.
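For option 1, here is a minimal sketch of the resize using the TFLite Python interpreter (assuming a build where tf.lite.Interpreter is available; in contrib-era builds the same class lived under tf.contrib.lite). Whether the resize succeeds depends on every op in the graph tolerating the new batch dimension:

import numpy as np
import tensorflow as tf

# Load the converted model and shrink the batch dimension from 64 to 1.
interpreter = tf.lite.Interpreter(model_path="/tmp/resnet50_quant.tflite")
input_index = interpreter.get_input_details()[0]["index"]
interpreter.resize_tensor_input(input_index, [1, 224, 224, 3])
interpreter.allocate_tensors()

# Run a single (dummy) image through the resized model.
image = np.zeros((1, 224, 224, 3), dtype=np.float32)
interpreter.set_tensor(input_index, image)
interpreter.invoke()
output_index = interpreter.get_output_details()[0]["index"]
print(interpreter.get_tensor(output_index).shape)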
If you're looking for long-term support, consider filing a feature request on GitHub so that batching can be supported.

Related

Caffe "Unknown solver type : SGD"

I built Caffe (latest version, CPU-only) under Windows 10 for use in a Visual Studio C++ project; it took some trouble to get everything working. But when creating an instance of the Solver class, an error occurs.
SolverParameter solverParam;
ReadSolverParamsFromTextFileOrDie("solver.prototxt", &solverParam);
boost::shared_ptr<Solver<float>> solver(SolverRegistry<float>::CreateSolver(solverParam));
Output:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0601 14:21:42.943118 10832 solver_factory.cpp:29] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown solver type: SGD (known types: )
*** Check failure stack trace: ***
solver.prototxt content:
net: "model.prototxt"
base_lr: 0.02
lr_policy: "step"
gamma: 0.5
stepsize: 500000
display: 10
max_iter: 5000
snapshot: 1000000
snapshot_prefix: "XORProblem"
solver_mode: CPU
test_iter: 1
test_interval: 2000
What is the reason?
It seems like you did not define the type of the solver at all.
Try adding
type: "SGD"
to your 'solver.prototxt'.
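With that line added, the top of your solver.prototxt would read:
net: "model.prototxt"
type: "SGD"
base_lr: 0.02
with the remaining settings unchanged.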
The solver type should be "SGD" by default, but there are two ways to define it: one is using solver_type: SGD, and the other is using type: "SGD". The first option is marked as "deprecated" in the comments, so I guess this is what gives you trouble.
Try avoiding the default settings by explicitly setting the solver type using a non-deprecated method.
Update:
Looking at the windows branch readme, it seems there is an open issue with compiling the shared library under Windows, specifically with the solvers.
I believe the issue you are experiencing is related to that issue.
I solved the problem by including "caffe/solvers/sgd_solver.cpp" into "caffe.cpp".

Training model on AWS Deep Learning AMI instance - gets 'killed' with warnings

I am trying to train the Inception-ResNet-v2 model on my own dataset on Amazon's Deep Learning AMI.
When I try to train on my local machine, the training starts as usual, but when I try to train on the AWS instance it gets killed.
First I tried to train with the MXNet backend. It gave the following error (captured in a screenshot):
Notice that it gets killed.
So in
nano ~/.keras/keras.json
I tried to set the image data format to channels_first:
{
"image_data_format": "channels_first",
"backend": "mxnet"
}
Then I got the error:
Traceback (most recent call last):
File "train.py", line 17, in <module>
model = applications.inception_resnet_v2.InceptionResNetV2(include_top=False, weights='imagenet', input_shape = (img_width, img_height, 3))
File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/keras_applications/inception_resnet_v2.py", line 243, in InceptionResNetV2
weights=weights)
File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/keras_applications/imagenet_utils.py", line 296, in _obtain_input_shape
'`input_shape=' + str(input_shape) + '`')
ValueError: The input must have 3 channels; got `input_shape=(182, 182, 3)`
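For reference, with "image_data_format": "channels_first", Keras expects input_shape in (channels, height, width) order; a hypothetical fix, keeping the asker's applications import and the 182x182 size from the error above, would be:

from keras import applications

model = applications.inception_resnet_v2.InceptionResNetV2(
    include_top=False, weights='imagenet', input_shape=(3, 182, 182))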
Then I tried to switch to the TensorFlow backend to see how it plays out, because there might be some misunderstanding on my part about how this process works. But when I switched to the TensorFlow backend and started training, I got the following error (again in a screenshot):
As you can see, it gets killed again.
I am not sure what to do next. Some help would be great.
P.S. I am sorry for the screenshots. You're going to have to zoom in a little to get a better view.
The Deep Learning AMI is mostly not supported on the t2 instance types. It should work on most of the larger CPU instance types (like C4 and C5), on the GPU instance types (G3, P2 and P3), and on many other instance types. Note that a plain "Killed" message during training typically means the process ran out of memory and was terminated by the kernel, which fits an undersized instance.

Tensorflow RNN slice error

I am attempting to create a multilayered RNN using LSTMs in TensorFlow. I am using TensorFlow version 0.9.0 and Python 2.7 on Ubuntu 14.04.
However, I keep getting the following error:
tensorflow.python.framework.errors.InvalidArgumentError: Expected begin[1] in [0, 2000], but got 4000
when I use
rnn_cell.MultiRNNCell([cell]*num_layers)
if num_layers is greater than 1.
My code:
size = 1000
config.forget_bias = 1
config.num_layers = 3
cell = rnn_cell.LSTMCell(size, forget_bias=config.forget_bias)
cell_layers = rnn_cell.MultiRNNCell([cell] * config.num_layers)
I would also like to be able to switch to using GRU cells but this gives me the same error:
Expected begin[1] in [0, 1000], but got 2000
I have tried explicitly setting
num_proj = 1000
which also did not help.
Is this something to do with my use of concatenated states? I have attempted to set
state_is_tuple=True
which gives:
`ValueError: Some cells return tuples of states, but the flag state_is_tuple is not set. State sizes are: [LSTMStateTuple(c=1000, h=1000), LSTMStateTuple(c=1000, h=1000), LSTMStateTuple(c=1000, h=1000)]`
Any help would be much appreciated!
I'm not sure why this worked, but I added in a dropout wrapper, i.e.:
if Training:
    cell = rnn_cell.DropoutWrapper(cell, output_keep_prob=config.keep_prob)
And now it works.
This works for both LSTM and GRU cells.
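For what it's worth, the ValueError above hints at another route: in the 0.9-era rnn_cell API, the state_is_tuple flag has to be set consistently on both the cell and the MultiRNNCell. A sketch, assuming the variables from the question:

cell = rnn_cell.LSTMCell(size, forget_bias=config.forget_bias, state_is_tuple=True)
cell_layers = rnn_cell.MultiRNNCell([cell] * config.num_layers, state_is_tuple=True)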
This problem occurs because you have increased the number of layers of your GRU cell, but your initial state vector was not enlarged to match. If your initial_vector size is [batch_size, 50], then:
initial_vector = tf.concat(1, [initial_vector] * num_layers)
Now input this to the decoder as the initial vector.

DLIB: train_shape_predictor_ex.exe for 194 landmarks with HELEN dataset gives runtime error: bad allocation

I am trying to train dlib's shape_predictor for 194 landmarks with the HELEN dataset,
but it throws a bad-allocation exception when I run it from the command prompt:
D:\Facial Feature Extraction>train_shape_predictor_ex.exe face_detector
Program is started
exception thrown!
bad allocation
I reduced the number of images to only 50; then it ran successfully, but the result was not satisfactory. So I tried to train on a 64 GB RAM system, and this time I increased the parameters:
trainer.set_nu(0.05);
trainer.set_tree_depth(2);
but it is still showing the bad-allocation error. If I train with less data and smaller parameters, the trained model is not accurate.
Build your application in Release mode and target the 64-bit Windows platform.
Also enable the /LARGEADDRESSAWARE flag in your project.
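For example, assuming the stock CMake build of the dlib examples with Visual Studio 2015, a 64-bit Release build would look something like:

cd dlib\examples
mkdir build
cd build
cmake -G "Visual Studio 14 2015 Win64" ..
cmake --build . --config Release

(/LARGEADDRESSAWARE mainly matters for 32-bit builds, where it raises the 2 GB address-space cap; 64-bit executables have it set by default.)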

elki-cli versus elki gui, I don't get equal results

Through the terminal on Ubuntu:
db#morris:~/lisbet/elki-master/elki/target$ elki-cli -algorithm outlier.lof.LOF -dbc.parser ArffParser -dbc.in /home/db/lisbet/AllData/literature/WBC/WBC_withoutdupl_norm_v10_no_ids.arff -lof.k 8 -evaluator outlier.OutlierROCCurve -rocauc.positive yes
giving
# ROCAUC: 0.6230046948356808
and in ELKI's GUI:
Running: -verbose -dbc.in /home/db/lisbet/AllData/literature/WBC/WBC_withoutdupl_norm_v10_no_ids.arff -dbc.parser ArffParser -algorithm outlier.lof.LOF -lof.k 8 -evaluator outlier.OutlierROCCurve -rocauc.positive yes
de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection.parse: 18 ms
de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection.filter: 0 ms
LOF #1/3: Materializing LOF neighborhoods.
de.lmu.ifi.dbs.elki.index.preprocessed.knn.MaterializeKNNPreprocessor.k: 9
Materializing k nearest neighbors (k=9): 223 [100%]
de.lmu.ifi.dbs.elki.index.preprocessed.knn.MaterializeKNNPreprocessor.precomputation-time: 10 ms
LOF #2/3: Computing LRDs.
LOF #3/3: Computing LOFs.
LOF: complete.
de.lmu.ifi.dbs.elki.algorithm.outlier.lof.LOF.runtime: 39 ms
ROCAUC: 0.6220657276995305
I don't understand why the two ROCAUC values aren't the same.
My goal in testing this is to become confident that what I am doing is right, but that is hard when I don't get matching results. Once I see that my settings are right, I will move on to making my own experiments that I can trust.
Pass cli as the first command-line parameter to launch the CLI, or minigui to launch the MiniGUI. The following are equivalent:
java -jar elki/target/elki-0.6.5-SNAPSHOT.jar cli
java -jar elki/target/elki-0.6.5-SNAPSHOT.jar KDDCLIApplication
java -jar elki/target/elki-0.6.5-SNAPSHOT.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication
This will work for any class extending the class AbstractApplication.
You can also do:
java -cp elki/target/elki-0.6.5-SNAPSHOT.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication
(This will load one class less, but it is usually not worth the effort.)
This will work for any class that has a standard public void main(String[]) method, as this is the standard Java invocation.
But notice that -h currently will still print 0.6.0 (January 2014); that value was not updated for the 0.6.5 interim versions and will only be bumped for 0.7.0, so that version number is not reliable.
As for the differences you observed: try varying k by 1. If I recall correctly, we changed the meaning of the k parameter to be more consistent across different algorithms. (They are not consistent in the literature anyway.)
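For example, reusing the command from the question with k bumped by one (whether 9 or 7 is the matching value is a guess; try both directions):

elki-cli -algorithm outlier.lof.LOF -dbc.parser ArffParser -dbc.in /home/db/lisbet/AllData/literature/WBC/WBC_withoutdupl_norm_v10_no_ids.arff -lof.k 9 -evaluator outlier.OutlierROCCurve -rocauc.positive yes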