Are there bigger weights already available for YOLOv4? - computer-vision

I want to detect more objects than the COCO dataset covers (it only has 80 classes), and I also want to detect as many actions as possible, like hugging, swimming, etc.
I don't care about the size, and I do not want to train the model myself. So is there a dataset (with weights) big enough already available that I can download and use, or do I have to label data and train YOLO myself?

You can find a very large dataset with bounding boxes here!

What you are trying to classify is known as Action Recognition. Here [1] is a good repo that lists a lot of out-of-the-box models for this task.
An explanation: models (like YOLO) contain two main blocks: feature extraction (the CNN part) and classification (the linear layers). When training from scratch, both the feature extraction and the classification are trained from scratch. It is easy to train the classification to do what you want, but it is hard to train the feature-extraction part (as it takes a lot of time). Hence, we typically use models pre-trained on generalized datasets (like YOLO trained on COCO), so our feature-extraction part starts from a reasonably good generalized place. Later, we replace the classification part with our own, trained from scratch for our task.
TL;DR: you can use a YOLO model pre-trained on COCO for your task by replacing the last linear layers to classify what you want. Here are some resources for different frameworks [2, 3].
Here [4] is a simple walkthrough of how to do this, and a minimal illustrative sketch follows the references below.
[1] https://github.com/jinwchoi/awesome-action-recognition/blob/master/README.md#action-recognition-and-video-understanding
[2] TensorFlow: https://www.tensorflow.org/tutorials/images/transfer_learning
[3] PyTorch: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
[4] https://blog.roboflow.com/training-yolov4-on-a-custom-dataset/
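To make the idea concrete, here is a minimal, hedged PyTorch sketch of that pattern, using a torchvision ResNet backbone as a stand-in for a detector's feature extractor; the class count and learning rate are made-up placeholders, not values from the tutorials above.

import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on a large generic dataset (ImageNet here),
# analogous to reusing YOLO's COCO-trained feature extractor.
model = models.resnet18(pretrained=True)

# Freeze the feature-extraction layers so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for our own classes,
# e.g. a handful of action categories (hypothetical number).
num_actions = 5
model.fc = nn.Linear(model.fc.in_features, num_actions)

# Only the new head's parameters are given to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

The same idea carries over to a full YOLO setup: keep the pre-trained backbone weights and retrain only the detection/classification head on your own labels.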

Related

How to do transfer learning in darknet for YoloV3

I want to do transfer learning with YOLOv3 in Darknet, so I want to use the pre-trained YOLOv3 model that was trained on the COCO dataset and then train it further on my own dataset to detect additional objects. What are the steps that I should follow? How should I label my data so that it can be used in Darknet? Please help, as this is the first time I am using Darknet and YOLO.
It's all explained here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
Note that the annotation must be consistent. Any object left unannotated will result in poor learning and therefore poor predictions.
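For reference, Darknet/YOLO expects one .txt label file per image, with one line per object in the form class_id x_center y_center width height, where the four coordinates are normalized to [0, 1] relative to the image size. A hypothetical label file for an image containing two objects could look like:

0 0.512 0.431 0.120 0.250
3 0.250 0.760 0.080 0.100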
This question was answered in "Fine-tuning and transfer learning by the example of YOLO".
The answer given by gameon67 suggests the following:
If you are using AlexeyAB's darknet repo (not darkflow), he suggests doing fine-tuning instead of transfer learning by setting this param in the cfg file: stopbackward=1.
Then run ./darknet partial yourConfigFile.cfg yourWeightsFile.weights outPutName.LastLayer# LastLayer#, for example:
./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81
This will create yolov3.conv.81 and freeze the lower layers; you can then train using the weights file yolov3.conv.81 instead of the original darknet53.conv.74.
References: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

Is it possible to use the multiclass classifier of AWS to recognize the place mentioned in a text?

I'm using AWS SageMaker, and I want to create something that, given a text, recognizes the place that the description refers to. Is that possible?
If there are no other classes besides the text that you would like your model to identify, you may not need a multiclass classifier.
You could train your own text detection model using Amazon SageMaker's built-in Object Detection algorithm on a dataset of labelled examples, but this becomes rather involved for a problem that has existing solutions available.
If the appearance of the text you're trying to detect is identical each time, your problem space gets reduced from trying to interpret variable text to simply gathering enough examples and performing object detection on the "pattern" your text forms visually. Note that if the text were to appear in different fonts or styles, the generic object detection method would not interpret it dynamically, and an OCR-based solution would likely be necessary.
More broadly, for text identification in images on AWS, you have quite a few options:
Amazon Rekognition has a DetectText method that will enable you to easily find text within an image. If it's a small or simple phrase, with alphanumeric characters, this should work very well for your use case (a minimal sketch follows at the end of this answer).
Amazon Textract will help you perform OCR (optical character recognition) while retaining the structure of the source. This is great for documents and tables, but doesn't sound like it may be applicable to your use case.
The AWS marketplace will also have hosted options available from third party vendors. One example of this for text region identification is this one from RocketML.
There are also some great open source tools I'd recommend looking into: OpenCV for ascertaining the text bounding boxes, and Tesseract for OCR and text extraction. This blog post does a good job walking through the process of using them together.
Any of these will help to solve your problem of performing OCR/text identification on AWS, but the best choice comes down to what your current and future needs are, and how quickly you're looking to implement the feature.
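As an illustration of the Rekognition option above, a minimal boto3 call to DetectText could look like the sketch below; the region, bucket, and object key are hypothetical placeholders.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# The image can also be passed as raw bytes instead of an S3 reference.
response = rekognition.detect_text(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "photos/sign.jpg"}}
)

for detection in response["TextDetections"]:
    # Each detection carries the text, its type (LINE or WORD),
    # a confidence score, and a bounding box under Geometry.
    print(detection["Type"], detection["DetectedText"], detection["Confidence"])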
Your question is not clear regarding the data that you have or the problem that you want to solve.
If you have a text that includes a place name in it (for example, "I visited Seattle and enjoyed the fish market"), you can use Amazon Comprehend's named entity recognition, which detects entities including places ("Seattle" in the above example). A sample response is shown below, followed by a minimal call sketch:
{
  "Entities": [
    {
      "Score": 0.9857407212257385,
      "Type": "LOCATION",
      "Text": "Seattle",
      "BeginOffset": 10,
      "EndOffset": 17
    }
  ]
}
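For reference, a minimal boto3 call that returns a response of this shape could look like the following; the region is a placeholder.

import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

response = comprehend.detect_entities(
    Text="I visited Seattle and enjoyed the fish market",
    LanguageCode="en",
)

# Keep only LOCATION entities, like the "Seattle" result shown above.
places = [e["Text"] for e in response["Entities"] if e["Type"] == "LOCATION"]
print(places)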
If the description is more general and you want to classify whether it describes a hotel, a restaurant, a theme park, a concert/show, or a similar type of place, you can use either the Custom Classification feature in Comprehend or the Neural Topic Model in SageMaker (https://docs.aws.amazon.com/sagemaker/latest/dg/ntm.html). You will need some examples of the classes, and documents/sentences to use for model training.

Is the 'warm start' option of dlib's dcd trainer only for 1-class classification?

I am using dlib for a program that classifies medical images using SVM. Because the images are large (many features, say 10000 to 100000) and I use a linear kernel, it sounds as though the svm_c_linear_dcd_trainer is a good class to use.
Another reason that I like the svm_c_linear_dcd_trainer class is that it claims to support 'warm starting', so if a single observation is often added to/subtracted from the sample (such as in LOOCV) that is efficient for long vectors.
But the only example of svm_c_linear_dcd_trainer uses one-class classification. The documentation suggests that the force_last_weight_to_1 option, which implements the warm start, is for 1-class classification only.
Is that true, i.e. is this warm-start option not available for binary classification? And in that case, would another implementation be faster?
That is not a limitation. Did you read the documentation for the class? http://dlib.net/dlib/svm/svm_c_linear_dcd_trainer_abstract.h.html#svm_c_linear_dcd_trainer Where in dlib's documentation does it say warm-starting is limited to one-class classification? The documentation for svm_c_linear_dcd_trainer doesn't even mention one-class classification, as far as I can see.

How to use video for training deep learning (Caffe & DIGITS)?

Based on a, b, c, d, Action Recognition with Deep Learning, Long-term Recurrent Convolutional Networks, e, Generic Features for Video Analysis, ... there are several methods for analyzing video with Caffe, but what exactly is the input for Caffe?
Can we put videos in different folders, like images, for training?
DIGITS doesn't support video data yet. When we do we'll add some sort of video example here:
https://github.com/NVIDIA/DIGITS/tree/master/examples
In my experience, you can't do it directly with DIGITS, because DIGITS has no default settings for analyzing sequences of frames. A well-known Caffe project for action recognition, C3D, can be used to train a new model or fine-tune an existing model for action or activity recognition.
C3D
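If you want to feed an image-style pipeline, one common workaround (a sketch, not something DIGITS does for you) is to extract frames from each video with OpenCV and save them into per-class folders; the paths and sampling rate are made-up placeholders.

import os
import cv2

def extract_frames(video_path, out_dir, every_n=5):
    # Save every n-th frame of a video as a JPEG so the frames can be
    # organized into per-class folders, like an ordinary image dataset.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Hypothetical layout: one folder per action class, filled with extracted frames.
extract_frames("videos/swimming/clip01.mp4", "frames/swimming")

Note that plain frame folders lose temporal information; models like C3D or LRCN consume clips of consecutive frames instead.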

NLTK wrapper for Weka to build a classifier

I'm building a Named Entity classifier with nltk, and my focus is on location retrieval (of any type, from countries to museums, restaurants or roads). I'm trying to vary the featuresets and methods I use.
For now, I've used NLTK's built-in Maxent, NaiveBayes, PositiveNaiveBayes, DecisionTrees and SVM. I'm using 40 different combinations of featuresets.
Maxent seems to be the best, but it's too slow. nltk's SVM is for binary classification, and I had some issues with pickling the final classifier. Then I tried nltk's wrapper for the scikit-learn SVM, but it didn't accept my inputs; I tried to adapt them but ran into a float coercion problem.
Now I'm considering using nltk's wrapper for Weka, but I don't know if it would give me a significantly different result worth trying, and I don't have much time. My question is: what advantages does Weka have over nltk's built-in classifiers?
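For context, here is a minimal sketch of the nltk scikit-learn wrapper mentioned in the question, with made-up featuresets and labels; keeping feature values as plain numbers, booleans, or strings is usually what avoids coercion errors.

from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.svm import LinearSVC

# Toy featuresets: each sample is a dict of feature name -> value.
train_set = [
    ({"contains_road_word": True, "num_capitals": 2}, "LOCATION"),
    ({"contains_road_word": False, "num_capitals": 0}, "OTHER"),
]

classifier = SklearnClassifier(LinearSVC()).train(train_set)
print(classifier.classify({"contains_road_word": True, "num_capitals": 1}))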