How to use ML.NET PredictionEnginePool with an ONNX model?

I am trying to use ML.NET to serve ONNX models for prediction behind an API. There is documentation on how to use ML.NET with ONNX models in a console app here; however, as described in that article, that approach wouldn't scale well. Since the article was written, PredictionEnginePool has been added, which solves the scaling problem, but I cannot make it work with ONNX models. When I try to load the model, it throws two exceptions:
InvalidOperationException: Repository doesn't contain entry DataLoaderModel\Model.key
Microsoft.ML.RepositoryReader.OpenEntry(string dir, string name)
InvalidOperationException: Could not load legacy format model
Microsoft.ML.ModelOperationsCatalog.Load(Stream stream, out DataViewSchema inputSchema)
The legacy format exception is interesting, because I tried two different models: one from Azure Machine Learning Service with AutoML, and one trained locally with scikit-learn, so I'm not sure which part is "legacy".
The missing Model.key might be the hint, though, because the zip model file used in the MS API documentation doesn't contain a single .onnx file; instead it has folders with binary files, and some of those files are actually named Model.key.
My question is:
Has anybody ever used PredictionEnginePool with ONNX models? Is it possible, or is it just not implemented yet? (Not sure if it matters, but both are classification models: one SVM and one LightGBM.)
UPDATE:
Found a way to do this. It looks like the engine pool only supports models in ML.NET format; however, you can open the model as described in the console app example, save it in ML.NET format, and then use it with the engine pool.
There is a similar example for this here.
The OnnxModelConfigurator class opens the ONNX model and saves it in ML.NET format; in the constructor of Startup.cs you call the configurator to save the model in the right format, and in ConfigureServices() you can then create the pool with the converted model.
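The conversion can be sketched roughly like this (ModelInput/ModelOutput and the file paths are hypothetical placeholders, not the exact code from the linked sample):

```csharp
using System.Collections.Generic;
using Microsoft.ML;

public class OnnxModelConfigurator
{
    // Converts the ONNX file into ML.NET's native zip format once, at startup.
    // ModelInput is a hypothetical class describing your input schema.
    public void SaveMLNetModel(string onnxPath, string mlnetPath)
    {
        var mlContext = new MLContext();

        // Pipeline that simply applies the ONNX model to incoming rows
        var pipeline = mlContext.Transforms.ApplyOnnxModel(onnxPath);

        // OnnxTransformer is not trainable, so fitting on an empty data view
        // only materializes the transformer
        var emptyData = mlContext.Data.LoadFromEnumerable(new List<ModelInput>());
        var model = pipeline.Fit(emptyData);

        // Save in ML.NET format so PredictionEnginePool can load it
        mlContext.Model.Save(model, emptyData.Schema, mlnetPath);
    }
}
```

ConfigureServices() can then register the pool from the converted file with something like services.AddPredictionEnginePool&lt;ModelInput, ModelOutput&gt;().FromFile(modelName: "OnnxModel", filePath: mlnetPath).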
This works; however, with this approach the conversion between the formats becomes part of the API's source code, so you would at least need to restart the app whenever you want to use a new model. That might not be a big deal if a bit of downtime is acceptable, and even if it isn't, you can avoid it with deployment slots, for example. You could also run the conversion as a separate service and just drop the converted model file where the API can see it, so the pool can detect the new model and use it.
Anyway, thanks for the answers guys!

I have run into your error before, but not using the pool. If you look at this specific comment and the ones that follow, we resolved the issue with a full clean of the project: the user had upgraded to a new version of ML.NET without cleaning the project, and the stale build output was causing the errors. I am not sure if this will resolve your issue, but I am one of the engineers who works on ML.NET, so if it doesn't, please feel free to create an issue and we can help you resolve it.

You can also take a look at this guide.
In this case, a model trained using Azure Custom Vision is consumed within an ASP.NET application using PredictionEnginePool.

Related

How to train an ML.NET model at runtime

Is there any way to train an ML.NET model at runtime from user input?
I've created a text classification model, trained it locally, deployed it, and now my users are using it.
Needed workflow:
Text is categorized and the category is displayed to the user; the user can accept it or select another of the predefined categories, and this feedback should then train the model again.
Thanks!
What you are describing seems like online learning.
ML.NET doesn't have any true 'online' models (by which I mean, models that can adapt to new data example by example and instantaneously refresh): all ML.NET algorithms are 'batch' trainers, that require a (typically large) corpus of training data to produce a model.
If your situation allows, you could aggregate the users' responses as 'additional training data', and re-train the model periodically using this data (in addition to the older data, possibly down-sampled or otherwise decayed).
As @Jon pointed out, a slight modification of the above mechanism is to 'incrementally train an existing model on a new batch of data'. This is still a batch method, but it can reduce the retraining time.
Of ML.NET's multiclass trainers, only LbfgsMaximumEntropyMulticlassTrainer supports this mode (see documentation).
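A hedged sketch of that warm-start mode, assuming the saved model's final stage is the maximum-entropy predictor (model.zip and newDataView are placeholders, not code from the linked docs):

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// Load the previously trained model and pull out its learned parameters
ITransformer oldModel = mlContext.Model.Load("model.zip", out var schema);
var oldParams = ((ISingleFeaturePredictionTransformer<object>)oldModel).Model
                as MaximumEntropyModelParameters;

// Warm-start training on the new batch of user feedback
// (newDataView: an IDataView built from the aggregated corrections)
var trainer = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy();
var retrained = trainer.Fit(newDataView, oldParams);
```

If the saved artifact is a whole pipeline rather than a bare trainer, you would first have to reach into the TransformerChain to get at the prediction transformer before extracting the parameters.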
It might be tempting to take this approach to the limit, and 'retrain' the model on each 'batch' of one example. Unless you really, really know what you are doing, I would advise against it: more likely than not, such a training regime will be overfitting rapidly and disastrously.

How to keep a tensorflow session running in memory with django

I have an object detection model built with tensorflow and integrated with a Django project. What currently happens is that whenever a request comes to the Django API, a tf session is created and is closed after the detection is done. Is it possible to start the Django server and a tensorflow session with the required inference graphs to reduce object detection time?
A solution would consist of abstracting the inference logic into a module. In this module, the session and the graph would be defined once as global variables, and would be accessed transparently by your views through an interface like a run_inference function.
If you need more fine control over the lifecycle of the graph and/or session you could consider adding functions like reload_graph etc... or implement that within your module, for example using a class dedicated to managing the lifecycle of the tensorflow objects, and running inference.
This looks to me like the best solution. This way you will also be able to have a more robust workflow, and have more control in case for example you want to use multithreading and want more safety with respect to how the inference code is run.
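A minimal, framework-agnostic sketch of that module-level pattern (ModelHolder and load_fn are illustrative names; for TensorFlow 1.x, load_fn would build the Graph, create the Session, and restore the checkpoint once):

```python
import threading

class ModelHolder:
    """Load an expensive model once and share it across requests."""

    def __init__(self, load_fn):
        self._load_fn = load_fn   # e.g. builds tf.Graph/tf.Session, restores weights
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: cheap fast path, thread-safe slow path
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._load_fn()
        return self._model

# inference.py would expose a single holder plus a run_inference() helper
# that pulls the session out of holder.get() and calls session.run(...)
```

The views then import the module and call run_inference(), so the graph is loaded on the first request (or at server startup) and reused afterwards.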
As I commented on the previous answer, the graph/session approach stopped being optimal a long time ago.
I recommend you start using MLOps practices and preload as much as you can, so that when the API is called, Django only has to produce the prediction.
I'll leave you my project here; I have several combining ML, Django and TFLite.
If you have any doubts, don't hesitate to ask me.
Link to my project: https://github.com/Nouvellie/django-mlops-docker/blob/main/src/main/apps/mlops/utils/model_loader.py

Writing to Django-mapped tables from outside the project

Is there anything specifically that I should do or refrain from doing? The only thing I have found so far is that I should create the tables initially using Django's ORM.
I haven't found any resources specific to this topic which makes me think the answer is either very obvious or that this isn't done much, which probably means it's a bad idea.
Details: I have a standalone C# program that reads a file and exports the data to a PostgreSQL database. I would like to make a website where users can upload these types of files and query the data generated by the program. I really like python and I have done a number of projects with it (but not Django), and my experience with C# and .NET so far has been fairly negative. I have also done the Django tutorial, some of Tango, and I'm reading Two Scoops of Django, all of which make me think I will like working with Django.

Djangonic way to deal with RDF?

I was looking for an RDF project for Django and I can't find any active ones.
This one seemed good: http://code.google.com/p/django-rdf, but the last commit was in 2008, four years ago. The Google Group seems abandoned too; the last non-spam post was in 2008.
Consequently, it has no support for newer Django versions.
Is there any library or some prebuilt open source app to easily expose rdf data?
Maybe it's easy to solve, like writing a view and returning something using https://github.com/RDFLib/rdflib in one or two lines of code, but I can't figure out how to do it...
The idea with RDFLib would be to take a Django object or collection of objects and transform it into RDF in some way, maybe using an RDF serializer.
I thought I could return HTML responses if the client sends "Accept: text/html", and RDF if the client requests the same page with an Accept header asking for rdf+xml or turtle (and maybe an app already exists that handles that for me).
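The content-negotiation half of that idea is easy to sketch; pick_rdf_format below is a hypothetical helper (in a real Django view you would build an rdflib.Graph from your model instances and return graph.serialize(format=fmt) with the matching Content-Type):

```python
# Maps RDF media types in the Accept header to rdflib serializer names
RDF_FORMATS = {
    "application/rdf+xml": "xml",
    "text/turtle": "turtle",
    "application/n-triples": "nt",
}

def pick_rdf_format(accept_header, default=None):
    """Return (content_type, rdflib_format) for the first RDF media type
    found in the Accept header, or default (e.g. fall back to HTML)."""
    for part in accept_header.split(","):
        # Strip quality parameters like ";q=0.9" and normalize
        media_type = part.split(";")[0].strip().lower()
        if media_type in RDF_FORMATS:
            return media_type, RDF_FORMATS[media_type]
    return default
```

A view would call this on request.META["HTTP_ACCEPT"], render the normal HTML template when it returns the default, and otherwise serialize the graph.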
From what little I've read of RDF, you are probably going to have to do manual work to get meaningful RDF statements from Django models, since RDF is not a simple data representation format like JSON; it is trying to encode semantic meaning.
That said, have a look at django-rdflib:
https://github.com/odeoncg/django-rdflib
There doesn't seem to be any documentation (and it seems to have been built for a specific app) but the author has posted here about a manage.py syncvb command that generates an RDF graph from existing Django models:
https://groups.google.com/d/msg/django-rdf/14WVK7t88PE/ktAKJo-aCfUJ
Not sure exactly what views django-rdflib provides, but if it can make an RDFlib graph for you then you can probably use the serialization plugins provided by RDFlib to output rdf+xml or whatever from your own view.
http://code.google.com/p/djubby/
SURF is useful as an RDF-to-object mapper (so is RDFAlchemy).
Injecting RDFa into your templates should also work (if you want to avoid triplestores).
You can also expose your database as a SPARQL endpoint using a tool such as http://d2rq.org/

What's a good open source django project to learn from?

What is a good django open source app that I can learn from? Something that follows best practices, and covers the majority of features and isn't overly complicated?
This would depend on your current level of knowledge of python and django.
If you are just starting to use Django, I suggest you take a look at the Django documentation; it is well specified and clear. If you have some project in mind, start working on it while looking up best practices for the specific parts. For Python coding style, try to follow the PEP 8 style guide.
If you have already done some work with Django, there are many sites like these:
http://djangopackages.com/categories/apps/
http://www.django-apps.com/
What I do nowadays is look into the Django contrib apps (admin, auth, comments, flatpages), which are built on top of the rest of Django. This shows me the best ways to write my own apps.
Following the Django comments framework (object-independent), I am working on an app, django-valuate (object-independent attachment of ratings, like buttons, etc. through template tags).
These are some of my opinions. I have also starred this question, as I would like to hear some different perspectives and find out whether mine are sound.
I've found djangobb (www.djangobb.org) to be a complete application: production quality and relatively simple. I use it as a base for my application, which has nothing to do with forums or bulletin boards.
cloc output: only 3,000 lines of Python code in 30 files, plus another 2,900 lines of HTML templates.
I do not think there would be any one specific app that would cover all/most features of Django since the concept of the Django app itself is to perform specific/related functionality.
Having said that, a popular Django app is django-registration. Its popularity stems from the obvious requirement of most webapps to have user authentication, and it's also extremely easy to integrate into a Django project.
The best approach perhaps would be to keep trying the tons of open source Django apps available on the net. You can browse through http://www.djangopackages.com/ and http://www.django-apps.com/ to start getting your hands dirty.
snipt.net, a code sharing site:
https://github.com/lionburger/snipt
Review Board, a code review web app
https://github.com/reviewboard/reviewboard/tree/master/reviewboard
rietveld, another code review app, this one on App Engine, by GvR himself. You need to know a bit of Django before digging into this source code, since Django's models don't work on App Engine; the GAE db model is used instead.
http://code.google.com/p/rietveld/source/browse/#svn%2Ftrunk