Vertex AI object detection training failing with internal error - computer-vision

I got an internal error from an AutoML pipeline. I am trying to train an object detection model with 5 labels and a very heavy bias towards one of the labels, but the training job is not completing at all. Can the heavy class imbalance be the reason the training job is failing?
Please indicate the reason for the error and any measures I can take to rectify it.
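For reference, this is roughly how I count annotations per label in my JSONL import file to quantify the imbalance. It is only a minimal sketch; the boundingBoxAnnotations/displayName field names follow the image object detection import schema I am using and may need adjusting for other formats:

```python
import json
from collections import Counter

# Count bounding-box annotations per label in a Vertex AI image object
# detection JSONL import file. Field names assume the standard import
# schema; adjust them if your dataset uses a different layout.
label_counts = Counter()
with open("import.jsonl") as f:
    for line in f:
        record = json.loads(line)
        for ann in record.get("boundingBoxAnnotations", []):
            label_counts[ann["displayName"]] += 1

for label, count in label_counts.most_common():
    print(f"{label}: {count}")
```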

Related

Google AutoML Video Tracking Architecture

I'm developing an object tracking system using Google's Vertex AI AutoML Video Tracking. We currently have an accurate model that identifies objects per frame (as a still image), and I'm exploring models that may be able to gain further insight and accuracy by using a collection of frames (video) for classification and tracking. I want to learn more about the architecture used in AutoML Object Tracking, but all I can find are articles hyping up the dynamic nature of the architecture. Mainly, I'm trying to answer the following three questions:
1) What methods does AutoML Object Tracking use to classify the objects and track them? Are the classifications done frame to frame, with a Euclidean distance tracker mapping objects together (see the sketch after this list)? Or are the objects identified and classified across multiple frames by a recurrent network in space (image) and time (frame to frame), something like an LSTM?
2) What performance can object tracking in AutoML achieve that is better than their per-image object identification models?
3) Where can I go to learn more about the model architectures on Vertex AI? It's hard to know which Google publications are associated with their current platform.
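To make question 1 concrete, this is the kind of frame-to-frame Euclidean-distance (centroid) tracker I have in mind as a simple baseline; it is only a sketch of that idea, not a claim about what AutoML actually does:

```python
import math

def match_detections(prev_objects, detections, max_dist=50.0):
    """Greedily match new detections to tracked objects by centroid distance.

    prev_objects: dict of track_id -> (x, y) centroid from the previous frame
    detections:   list of (x, y) centroids detected in the current frame
    Returns a dict of track_id -> (x, y) for the current frame; unmatched
    detections start new tracks.
    """
    next_id = max(prev_objects, default=-1) + 1
    updated = {}
    unmatched = list(detections)
    for track_id, (px, py) in prev_objects.items():
        if not unmatched:
            break
        # Pick the closest remaining detection for this track.
        best = min(unmatched, key=lambda d: math.hypot(d[0] - px, d[1] - py))
        if math.hypot(best[0] - px, best[1] - py) <= max_dist:
            updated[track_id] = best
            unmatched.remove(best)
    # Any detection left over starts a new track.
    for det in unmatched:
        updated[next_id] = det
        next_id += 1
    return updated
```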
Any feedback is greatly appreciated!!!

How does Amazon SageMaker Ground Truth work?

Is there a publication that explains how they evaluate how "sure" the automatic system is for the label it assigns? I understand part of the labelling process is done by humans but I'm interested in how they evaluate the confidence of the prediction.
I suggest you read the Ground Truth FAQ page, as it addresses your concerns.
Q: How does Amazon SageMaker Ground Truth help with increasing the accuracy of my training datasets?
A: Amazon SageMaker Ground Truth offers the following features to help you increase the accuracy of data labeling performed by humans:
(a) Annotation consolidation: This counteracts the error/bias of individual workers by sending each data object to multiple workers and then consolidating their responses (called “annotations”) into a single label. It takes their annotations and compares them using an annotation consolidation algorithm. This algorithm first detects and disregards outlier annotations. It then performs a weighted consolidation of the remaining annotations, assigning higher weights to more reliable annotations. The output is a single label for each object (see the sketch after this list).
(b) Annotation interface best practices: These are features of the annotation interfaces that enable workers to perform their tasks more accurately. Human workers are prone to error and bias, and well-designed interfaces improve worker accuracy. One best practice is to display brief instructions along with good and bad label examples in a fixed side panel. Another best practice is to darken the area outside of the bounding box boundary while workers are drawing the bounding box on an image.
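The FAQ does not publish the exact algorithm, but a minimal sketch of the weighted-consolidation idea for a classification label could look like the following; the outlier threshold and worker-reliability weights here are illustrative assumptions, not SageMaker's actual method:

```python
from collections import defaultdict

def consolidate_label(annotations, reliability, outlier_threshold=0.2):
    """Toy weighted consolidation of classification annotations.

    annotations: list of (worker_id, label) pairs for one data object
    reliability: dict of worker_id -> weight in [0, 1] (assumed to be
                 estimated elsewhere from each worker's history)
    Workers whose reliability falls below outlier_threshold are treated
    as outliers and ignored; remaining votes are weighted and summed.
    """
    votes = defaultdict(float)
    for worker_id, label in annotations:
        weight = reliability.get(worker_id, 0.0)
        if weight < outlier_threshold:
            continue  # disregard low-reliability (outlier) workers
        votes[label] += weight
    if not votes:
        return None
    # Winning label, plus a rough confidence: its share of the total weight.
    best = max(votes, key=votes.get)
    confidence = votes[best] / sum(votes.values())
    return best, confidence
```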

Classify image with Caffe

I am trying to build a binary classifier using Caffe. First I trained on my own image dataset by slightly modifying the Fine-tuning for style recognition model. But when I try to classify a single image, it gives 50% probability for both classes. To classify, I used the same deploy.prototxt and just changed output_number to 2. This is strange, since during training I get ~85% accuracy. I tried both Python (classify.py) and C++ (classification.cpp), and both give the same result. I think I am doing something wrong in the pipeline.
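For reference, this is roughly the Python pipeline I use for a single image; the blob names ('data', 'prob'), the mean values, and the file names are specific to my setup, so treat them as placeholders:

```python
import numpy as np
import caffe

# Load the fine-tuned network in test mode.
net = caffe.Net("deploy.prototxt", "finetuned.caffemodel", caffe.TEST)

# Preprocessing must match training: mean subtraction, channel order, scale.
transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
transformer.set_transpose("data", (2, 0, 1))                    # HWC -> CHW
transformer.set_mean("data", np.array([104.0, 117.0, 123.0]))   # per-channel BGR mean (placeholder)
transformer.set_raw_scale("data", 255)                          # load_image returns values in [0, 1]
transformer.set_channel_swap("data", (2, 1, 0))                 # RGB -> BGR

image = caffe.io.load_image("test.jpg")
net.blobs["data"].data[...] = transformer.preprocess("data", image)

out = net.forward()
probs = out["prob"][0]  # softmax output of the 2-class layer
print("class probabilities:", probs)
```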
Thanks for your help.

Real-time object tracking in OpenCV

I have written an object classification program using BoW clustering and SVM classification algorithms. The program runs successfully. Now that I can classify the objects, I want to track them in real time by drawing a bounding rectangle/circle around them. I have researched and come up with the following ideas.
1) Use homography with the train set images from the train data directory. But the problem with this approach is that the train image should be exactly the same as the test image. Since I'm not detecting specific objects, the test images are closely related to the train images but are not necessarily an exact match. In homography we find a known object in a test scene. Please correct me if I am wrong about homography.
2) Use feature tracking. I'm planning to extract the features computed by SIFT in the test images that are similar to the train images, and then track them by drawing a bounding rectangle/circle (see the sketch after this list). But the issue here is: how do I know which features are from the object and which features are from the environment? Is there any member function in the SVM class which can return the key points or region of interest used to classify the object?
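To illustrate ideas 1) and 2) together, this is the kind of SIFT-matching plus homography localization I have in mind; the file names and thresholds are placeholders, and it assumes the test frame actually contains something close to the reference image:

```python
import cv2
import numpy as np

# Reference image of the object and the current test frame (placeholders).
reference = cv2.imread("object_reference.jpg", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("test_frame.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(reference, None)
kp_frame, des_frame = sift.detectAndCompute(frame, None)

# Match descriptors and keep the good ones with Lowe's ratio test.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_ref, des_frame, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

if len(good) >= 4:
    src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_frame[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is not None:
        # Project the reference image corners into the frame and box them.
        h, w = reference.shape
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        projected = cv2.perspectiveTransform(corners, H)
        x, y, bw, bh = cv2.boundingRect(projected)
        print("object bounding box:", (x, y, bw, bh))
    else:
        print("homography estimation failed")
else:
    print("not enough good matches to localize the object")
```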
Thank you

Pose independent face detection

I'm working on a project where I need to detect faces in very messy videos (recorded from an egocentric point of view, so you can imagine..). Faces can have yaw angles that vary between -90 and +90 degrees, pitch with almost the same variation (well, a bit less due to human body constraints..), and possibly some roll variation too.
I've spent a lot of time searching for a pose-independent face detector. In my project I'm using OpenCV, but the OpenCV face detector is not even close to the detection rate I need. It has very good results on frontal faces but almost zero results on profile faces. Using haarcascade .xml files trained on profile images doesn't really help. Combining frontal and profile cascades yields slightly better results, but still not even close to what I need.
Training my own haarcascade will be my very last resort because of the huge computational (or time) requirements.
For now, what I'm asking for is any help or advice regarding this matter.
The requirements for a face detector I could use are:
very good detection rate. I don't mind a high false-positive rate, since by using some temporal consistency in the video I'll probably be able to get rid of the majority of them
written in C++, or something that can work in a C++ application
Real time is not an issue for now; detection rate is all I care about at the moment.
I've seen many papers achieving these results, but I couldn't find any code that I could use.
I sincerely thank you for any help that you'll be able to provide.
Perhaps not an answer, but too long to put into a comment.
You can use opencv_traincascade.exe to train a new detector that can detect a wider variety of poses. This post may be of help: http://note.sonots.com/SciSoftware/haartraining.html. I have managed to train a detector that is sensitive within -50:+50 yaw by using the FERET data set. In my case, we did not want to detect purely side faces, so the training data was prepared accordingly. Since FERET already provides convenient pose variations, it might be possible to train a detector somewhat close to your specification. Time is not an issue if you are using LBP features: training completes in 4-5 hours at most, and it goes even faster (15-30 min) by setting appropriate parameters and using less training data (useful for checking whether the detector is going to produce the output you expect).
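Something like the following invocation is what I mean; the file names, sample counts, and window size are placeholders that you would tune to your own data:

```sh
# Pack the annotated positive samples into a .vec file first.
opencv_createsamples -info positives.txt -vec faces.vec -num 2000 -w 24 -h 24

# Train an LBP cascade; negatives.txt lists background image paths.
opencv_traincascade -data cascade_dir -vec faces.vec -bg negatives.txt \
    -numPos 1800 -numNeg 3600 -numStages 15 \
    -featureType LBP -w 24 -h 24
```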