TensorFlow.js constant retraining - server-side

I have an application where we gather and classify images triggered by motion detection. We have a model trained on a lot of images that works OK. I have converted it to TF.js format and am able to make predictions in the browser, so far so good.
However, we have cameras in a lot of different locations, the lighting and surroundings vary at each location, and we also put up new cameras each year. So we would need to retrain the model often, and I am also afraid that the model will be too generic and not very accurate at each specific location.
All the data we gather from motion detection is uploaded to our server, and we use a web interface to classify each image ("false positive", "positive", etc.) and store everything in a MySQL database.
The best solution, I think, would be to have a generic model trained on a lot of data. This model would be deployed to each specific location, and as we manually interpret each image (as we normally do), we would retrain the generic model so that it becomes specific to each location.
To solve this we would have to serve the models on our server or on some other host and be able to write to them, since many different people interpret the data on different browsers and computers.
Would this be possible, and would it be a good solution? I would love some input before I invest more time into this. I haven't found a whole lot of information about serving writable models and reinforcement learning with TensorFlow.js.
So, I was wondering whether it is possible to serve a TensorFlow.js model on our server that was trained on our data, but where, for every manual interpretation, the model would "relearn" with the new image.
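Roughly, what I have in mind on the server side is something like the sketch below (Python/tf.keras, since the generic model is trained server side before it is converted to TF.js; the paths, shapes and class counts are just placeholders):

# Rough sketch only: fine-tune a copy of the generic Keras model on newly
# labelled images from one location, then re-export it for the browser.
import tensorflow as tf

def finetune_for_location(new_images, new_labels, location_id):
    # new_images: float32 array shaped like the model input, e.g. (n, 224, 224, 3)
    # new_labels: one-hot array of shape (n, num_classes)
    model = tf.keras.models.load_model("generic_model.h5")
    # A small learning rate so a handful of new images nudges the model
    # instead of overwriting what it learned from the big generic dataset.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(new_images, new_labels, epochs=3, batch_size=8)
    keras_path = "models/location_%s.h5" % location_id
    model.save(keras_path)
    return keras_path

# The per-location model could then be converted for the browser, e.g.:
#   tensorflowjs_converter --input_format keras models/location_12.h5 models/location_12_tfjs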

Related

Problem loading a large amount of GeoJSON data into a Leaflet map [duplicate]

I'm doing some tests with an HTML map in conjunction with Leaflet. Server side, I have a Ruby Sinatra app serving JSON markers fetched from a MySQL table. What are the best practices for working with 2k-5k, and potentially more, markers?
Load all the markers in the first place and then delegate everything to Leaflet.markercluster.
Load the markers every time the map viewport changes, sending the southWest & northEast points to the server, do the clipping server side, and then sync the client-side marker buffer with the server-fetched entries (what I'm doing right now; a sketch follows below).
A mix of the two above approaches.
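For reference, the server-side clipping in option 2 boils down to a bounding-box query, roughly like the sketch below (Python only for illustration, since the actual app is Ruby/Sinatra; the markers table and column names are made up):

# Rough sketch of the "clip to viewport" query from option 2. The real app is
# Ruby/Sinatra + MySQL; table and column names here are invented.
def markers_in_viewport(cursor, south, west, north, east):
    cursor.execute(
        "SELECT id, lat, lng, label FROM markers"
        " WHERE lat BETWEEN %s AND %s AND lng BETWEEN %s AND %s",
        (south, north, west, east))
    return [{"id": row[0], "lat": row[1], "lng": row[2], "label": row[3]}
            for row in cursor.fetchall()]

# The client sends map.getBounds() on every 'moveend' event and replaces its
# marker buffer with whatever this returns.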
Thanks,
Luca
A few months have passed since I originally posted the question and I made it through!
As #Brett DeWoody correctly noted, the right approach is strictly related to the number of DOM elements on the screen (I'm referring mainly to markers). The more the merrier if your device is faster (CPU especially). Since the app I was developing targets both desktop and tablet devices, CPU was a relevant factor, just like the marker density of different geo-areas.
I decided to separate database querying/fetching from map representation/displaying. Basically, the user adjusts controls/inputs to filter the whole dataset; afterward, records are fetched and Leaflet.markercluster does the job of representation. When a filter is modified, the cycle starts over. Users can choose the clusterization zoom level of the map depending on their CPU power.
In my particular scenario, the above represented the best approach (verified with console.time). I found that viewport optimization was only good for lower marker-density areas (a pity).
Hope it may be helpful.
Cheers,
Luca
Try options and optimize when you see issues rather than optimizing early. You can probably get away with just Leaflet.markercluster unless your markers have lots of data attached to them.

How to train an ML.NET model at runtime

Is there any way to train an ML.NET model at runtime through user input?
I've created a text classification model, trained it locally, deployed it, and now my users are using it.
Needed workflow:
Text is categorized, the category is displayed to the user, the user can accept it or select another of the predefined categories, and then this feedback should train the model again.
Thanks!
What you are describing seems like online learning.
ML.NET doesn't have any true 'online' models (by which I mean models that can adapt to new data example by example and instantaneously refresh): all ML.NET algorithms are 'batch' trainers that require a (typically large) corpus of training data to produce a model.
If your situation allows, you could aggregate the users' responses as 'additional training data', and re-train the model periodically using this data (in addition to the older data, possibly down-sampled or otherwise decayed).
As #Jon pointed out, a slight modification of the above mechanism is to 'incrementally train an existing model on a new batch of data'. This is still a batch method, but it can reduce the retraining time.
Of ML.NET's multiclass trainers, only LbfgsMaximumEntropyMulticlassTrainer supports this mode (see documentation).
It might be tempting to take this approach to the limit and 'retrain' the model on each 'batch' of one example. Unless you really, really know what you are doing, I would advise against it: more likely than not, such a training regime will overfit rapidly and disastrously.
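ML.NET code aside, the shape of the 'aggregate feedback, retrain periodically in batch' job described above is simple. A conceptual sketch, written in Python with scikit-learn rather than ML.NET purely to show the idea (the table and column names are placeholders):

# Conceptual sketch of periodic batch retraining on accumulated user feedback;
# scikit-learn stands in for ML.NET here, and names are placeholders.
import sqlite3
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def retrain(db_path="feedback.db"):
    # The original corpus plus every category the users accepted or corrected.
    rows = sqlite3.connect(db_path).execute(
        "SELECT text, category FROM labeled_texts").fetchall()
    texts = [text for text, _ in rows]
    labels = [category for _, category in rows]
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)   # full batch retrain on old + new data
    return model               # persist it and swap it in behind the API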

Add new vocabulary to an existing Doc2Vec model

I already have a Doc2Vec model that I have trained with my training data.
Now, after a while, I want to use Doc2Vec on my test data, and I want to add my test data's vocabulary to my existing model's vocabulary. How can I do this?
I mean, how can I update the model's vocabulary?
Here is my model:
from gensim.models.doc2vec import Doc2Vec
model = Doc2Vec.load('my_model.Doc2vec')
Words that weren't present for training mean nothing to Doc2Vec, so quite commonly, they're just ignored when encountered in later texts.
It would only make sense to add new words to a model if you could also do more training, including those new words, to somehow integrate them with the existing model.
But, while such continued incremental training is theoretically possible, it also requires a lot of murky choices of how much training should be done, at what alpha learning rates, and to what extent older examples should also be retrained to maintain model consistency. There's little published work suggesting working rules-of-thumb, and doing it blindly could just as likely worsen the model's performance as improve it.
(Also, while the parent class for Doc2Vec, Word2Vec, offers an experimental update=True option on its build_vocab() step for later vocabulary-expansion, it wasn't designed or tested with Doc2Vec in mind, and there's an open issue where trying to use it causes memory-fault crashes: https://github.com/RaRe-Technologies/gensim/issues/1019.)
Note that since Doc2Vec is an unsupervised method for creating features from text, if your ultimate task is using Doc2Vec features for classification, it can sometimes be sensible to include your 'test' texts (without class labeling) in the Doc2Vec training set, so that it learns their words and the (unsupervised) relations to other words. The separate supervised classifier would then only be trained on non-test items, and their known labels.
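A minimal sketch of that last setup with gensim (the corpora are placeholders, and model.dv is the gensim 4.x attribute name):

# Minimal sketch: Doc2Vec training sees every text (no class labels involved);
# the supervised classifier later sees only the training items.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

train_texts = [["first", "training", "document"], ["second", "training", "document"]]
test_texts = [["an", "unseen", "test", "document"]]   # unlabeled

corpus = [TaggedDocument(words, [i])
          for i, words in enumerate(train_texts + test_texts)]

model = Doc2Vec(vector_size=100, min_count=1, epochs=20)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# Classifier features: vectors for the training items only, e.g. model.dv[0]
# and model.dv[1], paired with their known labels.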

Correct implementation of the Filter (Criteria) Design Pattern

The design pattern is explained here:
http://www.tutorialspoint.com/design_pattern/filter_pattern.htm
I'm working on software very similar to Adobe Lightroom or ACDSee, but with different purposes. The user (a photographer) is able to import thousands of images from their hard drive (it wouldn't be unusual to have over 100k-200k images).
We have a side panel where users can create custom "filters", which are expressions like:
Does contain the keyword: "car"
AND
Does not contain the keyword "woods"
AND
(
Camera model is "Nikon D300s"
OR
Camera model is "Canon 7D Mark II"
)
AND
NOT
Directory is "C:\today_pictures"
You can get the idea from the above example.
We have an SQLite database where all image information is stored. The question is: should we load ALL Photo objects into memory from the database when the program starts and implement the Criteria/Filter design pattern as explained on the website cited above, so that our Criteria classes filter objects in memory, or is it better to have the Criteria classes generate an SQL query that is ultimately executed to retrieve only what's needed from the database?
We are developing the program in C++ (Qt).
TL;DR: The querying you need is already properly implemented in SQLite3, and look at how long that took to build. You'll face the same burden if you reimplement it.
It'd be a horrible case of data duplication to read the data from the database and store it again in another data structure. Use database queries to implement the query that the user gave you. Let the database execute the query. That's what databases are for.
By reimplementing a search/query system for ~500k records, you'll be rewriting large chunks of a bog-standard database yourself. It'd be a mostly pointless exercise. SQLite3 is very well tested and essentially foolproof. It would cost you thousands of hours of work to reimplement even a small fraction of its capabilities and reliability/resiliency. If that doesn't scream "reinventing the wheel", I don't know what does.
The database also allows you to very easily implement lookahead/dropdowns to aid the user in writing the query. For example, as you're typing out "camera model is", the user can have an option of autocompletion or a dropdown to select one or more models from.
You've paid the "price" of a database; it'd be a shame for it all to go to waste. So use it. It'll give you lots of leverage and allow you to implement features two orders of magnitude faster than otherwise.
The pattern you've linked to is just that: a pattern. It isn't an exact blueprint for how to design your application to perform on real data. You'll eventually be fighting things such as concurrency (a file-scanning thread running to update the metadata), indexing, resiliency in the face of crashes, etc. In the end you'd have big chunks of SQLite reimplemented for your particular application. 500k metadata records are nothing much; if you design your query translator well and support it with proper indexes, it'll work perfectly well.
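To make that concrete, here is a rough sketch of criteria objects that render SQL fragments instead of filtering objects in memory. It's in Python just for brevity (in your app this would be C++/Qt, e.g. with QSqlQuery and bound values, but the shape is the same), and the table and column names are invented:

# Rough sketch: each criterion renders a WHERE fragment plus bound parameters,
# and composite criteria just join their children.
class Criterion:
    def to_sql(self):
        # returns (sql_fragment, params)
        raise NotImplementedError

class HasKeyword(Criterion):
    def __init__(self, keyword):
        self.keyword = keyword
    def to_sql(self):
        return ("photos.id IN (SELECT photo_id FROM photo_keywords WHERE keyword = ?)",
                [self.keyword])

class CameraModelIs(Criterion):
    def __init__(self, model):
        self.model = model
    def to_sql(self):
        return ("photos.camera_model = ?", [self.model])

class Not(Criterion):
    def __init__(self, inner):
        self.inner = inner
    def to_sql(self):
        fragment, params = self.inner.to_sql()
        return ("NOT (%s)" % fragment, params)

class AllOf(Criterion):   # AND; an AnyOf/OR version differs only in the joiner
    def __init__(self, *criteria):
        self.criteria = criteria
    def to_sql(self):
        fragments, params = [], []
        for criterion in self.criteria:
            fragment, child_params = criterion.to_sql()
            fragments.append("(%s)" % fragment)
            params.extend(child_params)
        return (" AND ".join(fragments), params)

# The expression from the question becomes a single indexed SQLite query:
where, params = AllOf(HasKeyword("car"),
                      Not(HasKeyword("woods")),
                      CameraModelIs("Nikon D300s")).to_sql()
query = "SELECT photos.* FROM photos WHERE " + where
# cursor.execute(query, params)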

GeoDjango Too Much For This Project?

I'm currently building my first Django-based app, which is proceeding fairly well, thus far. One of the goals is to allow users to post information (text, pics, video) and for the app to be able to automatically detect the location where they posted this information (i.e., pulling location info from the browser). That data would then ideally be able to be filtered later, such as viewing the posts that were made within a specific radius.
I've been reading a bit about GeoDjango and it sounds intriguing, if perhaps more sophisticated than the requirements of this project. The querying aspects appear promising. A colleague, though, suggested that everything that can be done using GeoDjango can be done just as efficiently with the Google Maps API, using JavaScript or jQuery to obtain the proper coordinates.
Essentially, I'm looking to see what benefits GeoDjango would offer this fairly straightforward project over using just the Google Maps API. If I've already started a project in basic Django, is incorporating GeoDjango problematic? I'm still attempting to master the basics of Django, and venturing into GeoDjango may be too much for a novice developer. Or not.
Any insight appreciated.
To accurately find geolocated posts within a given radius of a location, you need to calculate distances between geographic locations. The calculations are not trivial. To quote the Django docs (with a minor grammatical correction):
Distance calculations with spatial data are tricky because,
unfortunately, the Earth is not flat.
Fortunately, using GeoDjango hides this complexity. Distance queries are as simple as the following:
qs = SouthTexasCity.objects.filter(point__distance_lte=(pnt, 7000))
Although one could program this logic using JavaScript/jQuery, I don't see a reason to, because you are already using Django. Unless you are:
unable to use a spatial database. GeoDjango distance queries are only available if you use PostGIS, Oracle, or SpatiaLite as your database (i.e., MySQL does not support distance queries).
unable to install the geospatial libraries that GeoDjango depends on.
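For completeness, the whole setup is only a few lines. A minimal sketch, assuming a PostGIS-backed database (the model and field names are placeholders):

# Minimal sketch: a geo-enabled model plus a radius query.
# Names are placeholders; requires a spatial backend such as PostGIS.
from django.contrib.gis.db import models
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D

class Post(models.Model):
    body = models.TextField()
    location = models.PointField(geography=True)

# All posts within 5 km of the user's reported position (longitude, latitude):
here = Point(-95.36, 29.76, srid=4326)
nearby = Post.objects.filter(location__distance_lte=(here, D(km=5)))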