Importing Caffe's PriorBox into TensorRT - C++

We have a Caffe model that contains:
layer {
  name: "foo"
  type: "PriorBox"
  prior_box_param { # ERROR HERE
    # whatever
  }
  # etc
}
Now, following the code in sampleMNIST I try to import my model into TensorRT but get an error:
Error parsing text-format ditcaffe.NetParameter: 1000:19 ("ERROR HERE" location):
Message type "ditcaffe.LayerParameter" has no field named "prior_box_param".
Searching around, I found that this is a known issue, and there is even a TensorRT class nvinfer1::plugin::PriorBoxParameters that suggests it should be able to handle this layer, but there is little or no documentation on how to proceed. I read one suggestion about splitting the model, but there are four instances of this node in my model, and more importantly, there is no information about what code should go into a custom node.
How should I handle this with minimal impact on the existing model? It was designed and trained by a third party, so I cannot drastically alter either the model or the weights.

Related

Dataflow breaks using TaggedOutputs, "can't pickle WeakDictionary"

We are trying to deploy a streaming pipeline to Dataflow in which we split the data into a few different "routes" that we process differently.
We did the complete development with the DirectRunner and it worked smoothly in our tests, but now that we have deployed it to Dataflow, it does not work.
The code fails when yielding in the following DoFn:
class SplitByRoute(beam.DoFn):

    OUTPUT_TAG_ROUTE_ONE = "route_one"
    OUTPUT_TAG_ROUTE_TWO = "route_two"
    OUTPUT_NOT_SUPPORTED = "not_supported"

    def __init__(self):
        beam.DoFn.__init__(self)

    def process(self, elem):
        try:
            route = self.define_route(elem["param"])  # Just tag it depending on param
        except Exception:
            route = None
        logging.info(f"Routed to {route}")
        if route == self.OUTPUT_TAG_ROUTE_ONE:
            yield TaggedOutput(self.OUTPUT_TAG_ROUTE_ONE, elem)
        elif route == self.OUTPUT_TAG_ROUTE_TWO:
            logging.info(f"Element: {elem}")
            yield TaggedOutput(self.OUTPUT_TAG_ROUTE_TWO, elem)
        else:
            yield TaggedOutput(self.OUTPUT_NOT_SUPPORTED, elem)
It does log the element and yield the output, but then fails with the following error:
AttributeError: Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove' [while running 'generatedPtransform-3196']
Another consideration is that we use TaggedOutputs earlier in the pipeline, and those work on Dataflow, but this DoFn in particular fails with the error mentioned. Could it be the memory cache, or something related to it, where weakrefs are used?
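For context, this is roughly how the DoFn and its tagged outputs are wired into the pipeline; the events input and the downstream handling shown here are simplified placeholders, not our real transforms:

routed = (
    events
    | "SplitByRoute" >> beam.ParDo(SplitByRoute()).with_outputs(
        SplitByRoute.OUTPUT_TAG_ROUTE_ONE,
        SplitByRoute.OUTPUT_TAG_ROUTE_TWO,
        SplitByRoute.OUTPUT_NOT_SUPPORTED))

route_one = routed[SplitByRoute.OUTPUT_TAG_ROUTE_ONE]      # handled one way
route_two = routed[SplitByRoute.OUTPUT_TAG_ROUTE_TWO]      # handled another way
not_supported = routed[SplitByRoute.OUTPUT_NOT_SUPPORTED]  # e.g. logged or dead-lettered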
As far as I know, this error happens when you have a class defined inside another one. Or maybe not?
Any suggestions on how we could manage this? It has been a very frustrating error.
Thank you! :)
We found the error
As you might know, apache-beam uses the dill package to serialize the data it passes between stages. This lets us pickle an instance of an object and send it through the pipeline.
The problem was that in self.define_route(elem["param"]) we used that instance of the class and modified one of its attributes. As the answer from Samuel Romero says, you can pickle a class, but I didn't really know (and probably more people should) that if you modify the class instance it cannot be pickled again. That's strange behaviour, I know, so I opened an issue on BEAM (https://issues.apache.org/jira/browse/BEAM-10384) if you want to check it out.
I will probably look into it (to understand the problem better) sooner or later, but if someone hits the same error, the workaround, as I mentioned, is to not modify the instance of a class that is being serialized.
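To illustrate the workaround, here is a minimal sketch. The routing logic is hypothetical; the point is only that define_route() and process() read from self but never write to it:

import apache_beam as beam
from apache_beam.pvalue import TaggedOutput

class SplitByRoute(beam.DoFn):
    OUTPUT_TAG_ROUTE_ONE = "route_one"
    OUTPUT_TAG_ROUTE_TWO = "route_two"
    OUTPUT_NOT_SUPPORTED = "not_supported"

    def define_route(self, param):
        # Hypothetical routing logic: it only reads class attributes and
        # returns a value, it never does self.something = ...
        if param == "one":
            return self.OUTPUT_TAG_ROUTE_ONE
        if param == "two":
            return self.OUTPUT_TAG_ROUTE_TWO
        return None

    def process(self, elem):
        route = self.define_route(elem["param"])  # kept in a local variable
        yield TaggedOutput(route or self.OUTPUT_NOT_SUPPORTED, elem)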
Thanks to anyone who tried to help!
As you can read here, Python uses the pickle library for data serialization, and it is subject to pickle's limitations. Data serialization is how processes transfer data between one another, since they do not share memory space.
Here I found a suggestion about using a fork of the multiprocessing module that uses the dill package instead of pickle. This fork is part of the pathos framework (as is the dill package) and is now called pathos.multiprocess, not pathos.multiprocessing as stated in the reference I mentioned previously.
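As a quick illustration of the difference (unrelated to Beam internals, just the two libraries side by side), dill can serialize objects such as lambdas that the standard pickle module rejects:

import pickle
import dill

f = lambda x: x + 1

data = dill.dumps(f)   # works: dill handles lambdas, closures, etc.
g = dill.loads(data)
print(g(1))            # prints 2

# pickle.dumps(f)      # would raise PicklingError for the lambda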

TraCIMobility::getExternalId error when adding a custom module into the Veins_Inet example (OMNeT++)

I am attempting to add a new custom RSU module (extending AdHocHost) into the Veins_Inet example. Here is my updated scenario (with 1 RSU).
network TestScenario {
    submodules:
        radioMedium: Ieee80211ScalarRadioMedium;
        manager: VeinsInetManager;
        node[0]: VeinsInetCar;
        // added rsu
        rsu: VeinsInetRSU;
    connections allowunconnected:
}
I also updated the ini file so that the RSU mobility is
*.rsu.mobility.typename = "inet.mobility.static.StationaryMobility"
and the RSU application is barebones with minor implementation:
*.rsu.app[0].typename = "practice.veins_inet.VeinsInetRSUSampleApplication".
However, I get the following error:
TraCIMobility::getExternalId called with no external id set yet.
In the example, the VeinsInetManager manages the cars via TraCI. Here is the NED file associated with the manager. The source file has two functions: pre-initialize module and update module position.
simple VeinsInetManager extends TraCIScenarioManagerLaunchd {
    parameters:
        @class(veins::VeinsInetManager);
}
How can I add a custom module into the scenario without raising any errors?
Your application might be inheriting from VeinsInetApplicationBase, which calls TraCI methods (that fail for nodes that are not a TraCI-managed vehicle). See also its source code.
To be doubly-sure, run your simulation in debug mode, turn on debug-on-errors, and check the stack trace to see where the call is coming from.

Use TensorBoard with Keras Tuner

I ran into an apparent circular dependency trying to log data for TensorBoard during a hyper-parameter search done with Keras Tuner, for a model built with TF2. The typical setup for the latter requires setting up the TensorBoard callback in the tuner's search() method, which wraps the model's fit() method.
from kerastuner.tuners import RandomSearch

tuner = RandomSearch(build_model,  # this method builds the model
                     hyperparameters=hp,
                     objective='val_accuracy')
tuner.search(x=train_x, y=train_y,
             validation_data=(val_x, val_y),
             callbacks=[tensorboard_cb])
In practice, the tensorboard_cb callback needs to set up the directory where data will be logged, and this directory has to be unique to each trial. A common way to do this is to name the directory based on the current timestamp, with code like below.
log_dir = time.strftime('trial_%Y_%m_%d-%H_%M_%S')
tensorboard_cb = TensorBoard(log_dir)
This works when training a model with known hyper-parameters. However, when doing a hyper-parameter search, I have to define and specify the TensorBoard callback before invoking tuner.search(). This is the problem: tuner.search() will invoke build_model() multiple times, and each of these trials should have its own TensorBoard directory. Ideally, log_dir would be defined inside build_model(), but the Keras Tuner search API forces the TensorBoard callback to be defined outside of that function.
TL;DR: TensorBoard gets data through a callback and requires one log directory per trial, but Keras Tuner requires defining the callback once for the entire search, before performing it, not per trial. How can unique directories per trial be defined in this case?
The Keras Tuner creates a subdirectory for each run (this statement is probably version dependent).
I guess finding the right version mix is important.
Here is how it works for me, in JupyterLab.
Prerequisites:
1. pip requirements
keras-tuner==1.0.1
tensorboard==2.1.1
tensorflow==2.1.0
Keras==2.2.4
jupyterlab==1.1.4
2. JupyterLab installed, built and running [standard compile arguments: production:minimize]
Here is the actual code. First I define the log folder and the callbacks:
# run parameter
log_dir = "logs/" + datetime.datetime.now().strftime("%m%d-%H%M")

# training meta
stop_callback = EarlyStopping(
    monitor='loss', patience=1, verbose=0, mode='auto')

hist_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,
    embeddings_freq=1,
    write_graph=True,
    update_freq='batch')

print("log_dir", log_dir)
Then I define my hypermodel, which I do not want to disclose. Afterwards I set up the hyper-parameter search:
from kerastuner.tuners import Hyperband

hypermodel = get_my_hypermodel()

tuner = Hyperband(
    hypermodel,
    max_epochs=40,
    objective='loss',
    executions_per_trial=5,
    directory=log_dir,
    project_name='test'
)
which I then execute:
tuner.search(
    train_data,
    labels,
    epochs=10,
    validation_data=(val_data, val_labels),
    callbacks=[hist_callback],
    use_multiprocessing=True)

tuner.search_space_summary()
While the notebook with this code searches for adequate hyper-parameters, I monitor the loss in another notebook. Since TF v2, TensorBoard can be called via a magic function:
Cell 1
import tensorboard
Cell 2
%load_ext tensorboard
Cell 3
%tensorboard --logdir 'logs/'
Side note: since I run JupyterLab in a Docker container, I have to specify the appropriate address and port for TensorBoard and also forward these in the Dockerfile.
The result is not really predictable for me... I do not yet understand when I can expect histograms and distributions in TensorBoard.
For some runs the loading time seems really excessive... so have patience.
Under Scalars I find a list of the runs, as follows:
"logdir"/"model_hash"/execution[iter]/[train/validation]
E.g.
0101-1010/bb7981e03d05b05106d8a35923353ec46570e4b6/execution0/train
0101-1010/bb7981e03d05b05106d8a35923353ec46570e4b6/execution0/validation

Pylint: Module/Instance of has no member for google.cloud.vision API

When I run this code (which will later be used to detect and extract text using Google Vision API in Python) I get the following errors:
Module 'google.cloud.vision_v1.types' has no 'Image' member pylint(no-member)
Instance of 'ImageAnnotatorClient' has no 'text_detection' member pylint(no-member)
from google.cloud import vision
from google.cloud.vision import types
import os, io

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'C:\Users\paul\VisionAPI\key.json'

client = vision.ImageAnnotatorClient()

FILE_NAME = 'im3.jpg'
FOLDER_PATH = r'C:\Users\paul\VisionAPI\images'

with io.open(os.path.join(FOLDER_PATH, FILE_NAME), 'rb') as image_file:
    content = image_file.read()

image = vision.types.Image(content=content)
response = client.text_detection(image=image)
What does "Module/Instance of ___ has no members" mean?
I was able to reproduce the pylint error, though the script executes successfully when run (with minor modifications for my environment to change the filename being processed).
Therefore, I am assuming that by "run this code" you mean "run this code through pylint". If not, please update the question with how you are executing the code in a way that generates pylint errors.
This page describes the specific error you are seeing, and the case that causes a false positive for it. This is likely exactly the false positive you are hitting.
The Google Cloud Vision module appears to dynamically create these members, and pylint doesn't have a way to detect that they actually exist at runtime, so it raises the error.
Two options:
1. Tag the affected lines with a # pylint: disable=no-member annotation, as suggested in the page linked above.
2. Run pylint with the --ignored-modules=google.cloud.vision_v1 flag (or put the equivalent ignored-modules setting in your .pylintrc). You'll notice that even the actual module name is different from the one you imported :)
This is a similar question with more detail about workarounds for the E1101 error.
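For example, option 1 applied to the two lines pylint complains about would look like this (only the trailing comments are new):

image = vision.types.Image(content=content)  # pylint: disable=no-member
response = client.text_detection(image=image)  # pylint: disable=no-member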

How to serialize binary files to use with a Celery task

I recently integrated Celery (django-celery, to be more specific) into one of my applications. I have a model in the application as follows.
class UserUploadedFile(models.Model):
    original_file = models.FileField(upload_to='/uploads/')
    txt = models.FileField(upload_to='/uploads/')
    pdf = models.FileField(upload_to='/uploads/')
    doc = models.FileField(upload_to='/uploads/')

    def convert_to_others(self):
        # Code to convert the original file to other formats
Now, once a user uploads a file, I want to convert the original file to txt, pdf and doc formats. Calling the convert_to_others method is a bit of an expensive process, so I plan to do it asynchronously using Celery. So I wrote a simple Celery task as follows.
@celery.task(default_retry_delay=bdev.settings.TASK_RETRY_DELAY)
def convert_ufile(file, request):
    """
    This task method would call a UserUploadedFile object's convert_to_others
    method to do the file conversions.

    The best way to call this task would be doing it asynchronously
    using the apply_async method.
    """
    try:
        file.convert_to_others()
    except Exception, err:
        # If the task fails log the exception and retry in 30 secs
        log.LoggingMiddleware.log_exception(request, err)
        convert_ufile.retry(exc=err)
    return True
and then called the task as follows:
ufile = get_object_or_404(models.UserUploadedFile, pk=id)
tasks.convert_ufile.apply_async(args=[ufile, request])
Now when the apply_async method is called it raises the following exception:
PicklingError: Can't pickle <type 'cStringIO.StringO'>: attribute lookup cStringIO.StringO failed
I think this is because Celery (by default) uses the pickle library to serialize data, and pickle is not able to serialize the binary file.
Question
Are there any other serializers that can serialize a binary file on their own? If not, how can I serialize a binary file using the default pickle serializer?
You are correct that Celery tries to pickle data for which pickling is unsupported. Even if you found a way to serialize the data you want to send to the Celery task, I wouldn't do it.
It is always a good idea to send as little data as possible to Celery tasks, so in your case I would pass only the id of the UserUploadedFile instance. Having this, you can fetch the object by id in the Celery task and perform convert_to_others().
Please also note that the object could change its state (or it could even be deleted) before the task is executed, so it is much safer to fetch the object in your Celery task instead of sending a full copy of it.
To sum up, sending only an instance id and refetching it in the task gives you a few things:
You send less data to your queue.
You do not have to deal with data-inconsistency issues.
It's actually possible in your case. :)
The only 'drawback' is that you need to perform an extra, inexpensive SELECT query to refetch your data, which overall looks like a good deal when compared to the issues above, doesn't it?
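To make this concrete, here is a minimal sketch of the task taking only an id. I also dropped the request argument, since an HTTP request object is another thing you do not want to serialize; pass only the fields you need for logging. The names mirror the model and modules shown in the question:

@celery.task(default_retry_delay=bdev.settings.TASK_RETRY_DELAY)
def convert_ufile(file_id):
    """
    Fetch the UserUploadedFile by id and run the conversions.
    Only the integer id travels through the broker, so the default
    serializer has no trouble with it.
    """
    try:
        ufile = models.UserUploadedFile.objects.get(pk=file_id)
        ufile.convert_to_others()
    except Exception as err:
        convert_ufile.retry(exc=err)
    return True

# at the call site, send just the primary key:
tasks.convert_ufile.apply_async(args=[ufile.pk])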