WEKA Incremental Learning?

The idea of incremental learning, as I understand it, is that after training I save my model, and when I have new data, instead of retraining on the old data together with the new data, I just load the saved model and train it further using only the new data; the newly trained model then builds on top of the old one.
I have searched for this in WEKA and found that it can be done using "incremental algorithms". I know that HoeffdingTree is an incremental counterpart of the J48 algorithm, but I am not sure how to do the incremental learning.
Could anybody explain whether this is possible in WEKA and how it can be done?

In order to do incremental learning in WEKA, you have to choose classifiers that implement the UpdateableClassifier interface. There are 10 classifiers that can do this. Note that this can only be done from code or from the command line.
You first have to build your model from the training data and save it. After that, you load the same model and train it further.
Using the HoeffdingTree algorithm, it would look something like this:
java weka.classifiers.trees.HoeffdingTree -L 2 -S 0 -E 1.0E-7 -H 0.1 -M 0.01 -G 200.0 -N 0.0 -t Training.arff -no-cv -d ht.model
java weka.classifiers.trees.HoeffdingTree -t Training.arff -T Testing.arff -l ht.model -d ht.updated.model
Of course, there is no need to specify the training parameters again when updating the model, because those settings are already saved in the model.
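The same update can also be done from Java. Below is a minimal sketch (the new-data file name is a placeholder I introduced; the model paths mirror the commands above). HoeffdingTree implements the UpdateableClassifier interface, so it exposes updateClassifier() for training on one instance at a time:

import java.io.File;

import weka.classifiers.trees.HoeffdingTree;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ArffLoader;

public class IncrementalUpdate {
    public static void main(String[] args) throws Exception {
        // Load the previously trained and saved model.
        HoeffdingTree ht = (HoeffdingTree) SerializationHelper.read("ht.model");

        // Stream the new data one instance at a time.
        ArffLoader loader = new ArffLoader();
        loader.setFile(new File("NewData.arff")); // placeholder file name
        Instances structure = loader.getStructure();
        structure.setClassIndex(structure.numAttributes() - 1);

        Instance current;
        while ((current = loader.getNextInstance(structure)) != null) {
            ht.updateClassifier(current); // incremental update, no retraining
        }

        // Save the updated model; it builds on top of the old one.
        SerializationHelper.write("ht.updated.model", ht);
    }
}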
For more information:
http://weka.8497.n7.nabble.com/WEKA-Incremental-Learning-Training-td35691.html
https://weka.wikispaces.com/Classification-Train/test%20set#Classification-Building%20a%20Classifier-Incremental

Multi Object Tracker with YOLOv5

I'm using YOLOv5 to detect multiple objects in every frame of a video from a webcam. I would like to track objects instead of detecting them in every frame, and to do this I tried YOLOv5-DeepSORT. There is a big problem though: YOLOv5 can be compiled with TensorRT, making it quite fast on an embedded board (50 FPS), but it seems DeepSORT can't be compiled the same way.
So I'm now looking for an alternative that is not too computationally expensive and that can improve my detections by tracking objects. Any ideas? I already tried the KCF tracker from OpenCV and motpy, but both perform very poorly.
DISCLAIMER: I am the main contributor to https://github.com/mikel-brostrom/Yolov5_DeepSort_OSNet.
Sadly, there is no TensorRT export option at the moment. You could try using https://github.com/abewley/sort. This is essentially DeepSORT but without the deep appearance descriptor, so the tracking will be based only on motion, which, depending on your use case, could be good enough.
Another option could be to export the models to ONNX, which is relatively easy, and then load them with TensorRT following a tutorial such as: https://learnopencv.com/how-to-convert-a-model-from-pytorch-to-tensorrt-and-speed-up-inference/
Aug 6 2022 EDIT -------------------
I added a ReID-specific export script to my repo. It generates ONNX, OpenVINO, and TFLite models out of MobileNet and ResNet50 .pt models. I also added a multi-backend model loader and inference wrapper that supports the three aforementioned model types. I am planning to add TensorRT support in the near future.
A small tutorial can be found here
Sept 9 2022 EDIT -------------------
TensorRT export and inference now supported. Example usage:
python3 reid_export.py --weights /datadrive/mikel/Yolov5_StrongSORT_OSNet/weights/osnet_x0_25_msmt17.pt --include onnx engine --dynamic --device 0 --batch-size 30
python3 track.py --source 0 --strong-sort-weights weights/osnet_x0_25_msmt17.engine --imgsz 640 --yolo-weights weights/yolov5m.engine --device 0 --class 0

Load partial shapefile into Postgis / GeoDjango project

I have a shapefile with Canadian postal codes, but am only looking to load a small subset of the data. I can load the entire data file and use SQL or Django queries to prune the data, but the load process takes about 2 hours on the slower machines I'm using.
As the data I'm actually after is about 10% of the dataset, this isn't a very efficient process.
I'm following the instructions in the GeoDjango tutorial, specifically the following code:
from django.contrib.gis.utils import LayerMapping
from geoapp.models import TestGeo

mapping = {
    'name': 'str',      # The 'name' model field maps to the 'str' layer field.
    'poly': 'POLYGON',  # For geometry fields, use the OGC name.
}                       # The mapping is a dictionary.

lm = LayerMapping(TestGeo, 'test_poly.shp', mapping)
lm.save(verbose=True)   # Save the layermap; this imports the data.
Is there a way to import only the data with a particular name, as in the example above?
I'm limited to the Linux / OS X command line, so wouldn't be able to utilize any GUI tools.
Thanks to everyone here and on PostGIS for their help, particularly ThomasG77 for this answer.
The following line did the trick:
ogr2ogr PostalCodes.shp CANmep.shp -sql "select * from CANmep where substr(postalcode,1,3) in ('M1C', 'M1R')"
ogr2ogr comes with GDAL. brew install gdal will install GDAL on OS X. If you're on another *nix system, the following installs it from source:
$ wget http://download.osgeo.org/gdal/gdal-1.9.2.tar.gz
$ tar xzf gdal-1.9.2.tar.gz
$ cd gdal-1.9.2
$ ./configure
$ make
$ sudo make install
If the postal codes you need won't change for some time, try creating a shapefile of just the selected postal codes with QGIS. If you're not familiar with QGIS, it's worth looking into. I use it to prepare files for the web app before uploading: CRS conversion, editing, tidying the attribute table, and perhaps simplifying the geometry.
There are plenty of tutorials and great help at gis.stackexchange.
If you haven't done so already, take this question to gis.stackexchange.
Hope this helps get you started, and feel free to ask for more info. I was new to Django/GeoDjango not long ago and appreciated all of the help I received. Django is not for the faint of heart.
Michael

Weka - Measuring testing time

I'm using Weka 3.6.8 to carry out some machine learning, and I want to find the 'time taken to test model on training/testing data'. When I test a predictive model on evaluation data, this figure seems to be missing. Has this feature been removed from Weka, or is it just a setting I'm missing? All I seem to be able to find is the time taken to build the predictive model. (I've also checked the Weka Manual but can't find anything.)
Thanks in advance
That feature was added in 3.7.7, so you need to upgrade. You should then be able to get this figure by running the test on the command line with the -T parameter.
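If upgrading is not an option, you can also time the evaluation yourself from Java. A minimal sketch, assuming ARFF files with the class attribute in the last position (the file names and the choice of J48 are placeholders, not from the question):

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TimedEvaluation {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff"); // placeholder file names
        Instances test = DataSource.read("test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // Time the model build separately from the evaluation.
        Classifier cls = new J48();
        long t0 = System.currentTimeMillis();
        cls.buildClassifier(train);
        long buildMs = System.currentTimeMillis() - t0;

        Evaluation eval = new Evaluation(train);
        t0 = System.currentTimeMillis();
        eval.evaluateModel(cls, test);
        long testMs = System.currentTimeMillis() - t0;

        System.out.println("Build time: " + buildMs + " ms, test time: " + testMs + " ms");
        System.out.println(eval.toSummaryString());
    }
}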

WEKA: Classifying an ARFF data with a given SMO model

I'm new to Weka, and this is my problem:
I have unlabeled ARFF data and a given SMO model; I need to classify this data with that model.
I have searched for examples, but all of them use a test set to evaluate the classifier, and I have no test sets.
I need to get the classifications via Java or the Weka command line.
I tried (under Linux) a command like:
java weka.classifiers.functions.SMO -l /path/of/mymodel/SMOModel.model -T /path/pf/myunlabeledarff/unlabeled.arff
but I get several errors :S
Can someone help me?
Thanks a lot
Documentation showing that the -l flag works is here: http://weka.wikispaces.com/Primer. That documentation also indicates that your syntax is correct, and that what you are trying to do is possible.
You say that the data is unlabeled: this can cause errors if the ARFF file you are using for prediction does not match the format of the ARFF file that was used to create the model. Make sure that the ARFF header has the class attribute declared in it, and that every instance (row) in the file has a class value (even if the value is a ? to indicate unknown). Otherwise the formats won't match, and the classifier won't work.
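For reference, the same prediction can also be done from Java instead of the command line. A minimal sketch (file paths are placeholders; it assumes the class attribute is the last one, so adjust setClassIndex() if your model was trained with a different class index):

import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifyUnlabeled {
    public static void main(String[] args) throws Exception {
        // Load the saved SMO model.
        Classifier cls = (Classifier) SerializationHelper.read("SMOModel.model"); // placeholder path

        // Load the unlabeled data; the class values can all be '?'.
        Instances unlabeled = DataSource.read("unlabeled.arff"); // placeholder path
        unlabeled.setClassIndex(unlabeled.numAttributes() - 1);

        // Print the predicted label for each instance.
        for (int i = 0; i < unlabeled.numInstances(); i++) {
            double pred = cls.classifyInstance(unlabeled.instance(i));
            System.out.println(i + ": " + unlabeled.classAttribute().value((int) pred));
        }
    }
}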
Please post your error messages if this does not solve the problem.

Weka says 'training and test set are not compatible' when both are the same file

I'm getting a very odd error from the Weka machine learning toolkit:
java weka.classifiers.meta.AdaBoostM1 -t train.arff -d tmp.model -c 22 //generates the model
java weka.classifiers.meta.AdaBoostM1 -l tmp.model -T train.arff -p 22 //have the model predict values in the set it was trained on.
This produces the message:
java.lang.Exception: training and test set are not compatible
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1035)
at weka.classifiers.Classifier.runClassifier(Classifier.java:312)
at weka.classifiers.meta.AdaBoostM1.main(AdaBoostM1.java:779)
But of course, the input files are the same... Any suggestions?
Weka sometimes complains when the class attribute does not have the same set of values in both files, e.g. when your training data contains the classes {a,b,c} and the test data (loaded later) contains only {a,c}. In that case Weka just throws that nice exception :)
You may find a solution in the Weka source code, or by loading your data sets with the Weka Explorer; the latter shows you what a data set looks like once it is loaded...
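If you want to check this programmatically rather than by eye, Weka's Instances class provides equalHeaders(), which compares attribute names, types, and nominal value sets. A minimal sketch (file names are placeholders; the class index matches the -c 22 flag from the question):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HeaderCheck {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff"); // placeholder paths
        Instances test = DataSource.read("test.arff");
        train.setClassIndex(21); // -c 22 on the command line is 1-based
        test.setClassIndex(21);

        // The same compatibility check Weka performs before evaluating.
        if (!train.equalHeaders(test)) {
            System.out.println("Headers differ: check attribute names, types, and nominal values.");
        } else {
            System.out.println("Headers match.");
        }
    }
}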