After training a custom object detection model using Roboflow, all of the predicted objects are switched - computer-vision

I have been using Roboflow to annotate door and door handle images (about 1300 images).
For generating the model, I used auto-orientation and resizing to 416×416 for pre-processing, and vertical flip for augmentation.
After testing the model on a webcam and looking at the test-set predictions, the model predicts doors as door handles and door handles as doors in almost every image.
In a nutshell, the issue is that the model predicts the bounding boxes accurately, but all of the labels are switched.
I have also tried the YOLOv5 GitHub code and it has the same issue.
I cannot pinpoint where the problem is. I would appreciate any help in this matter and any tips to prevent this from happening.
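One common cause of this symptom (boxes correct, class labels swapped) is a mismatch between the class indices in the YOLO-format label files and the order of the names list in data.yaml. This is an assumption, not something stated in the question, but it is cheap to check. A minimal sketch, assuming a YOLOv5-style dataset layout with hypothetical paths:

```python
# Sketch: verify that the class-index -> class-name mapping in data.yaml
# matches what was actually annotated. Paths below are hypothetical.
import glob
from collections import Counter

import yaml  # pip install pyyaml

with open("data.yaml") as f:
    names = yaml.safe_load(f)["names"]  # e.g. ["door", "door handle"]

counts = Counter()
for label_file in glob.glob("train/labels/*.txt"):
    with open(label_file) as lf:
        for line in lf:
            if line.strip():
                class_id = int(line.split()[0])  # first field is the class index
                counts[class_id] += 1

# If, say, index 0 is far more frequent but you annotated far more door handles
# than doors, the names list is probably in the wrong order.
for class_id, n in sorted(counts.items()):
    print(f"class {class_id} ({names[class_id]}): {n} boxes")
```

If the counts and names do not line up with what was annotated, swapping the entries in the names list (and retraining or re-exporting) would be the first thing to try.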

Related

Stationary/Moving Pool Ball Detection

I am working on a project where I have to detect and track pool balls while players are playing. I tried various background subtraction techniques and was able to track the moving balls, but I was unable to detect the stationary balls at the same time, because they get absorbed into the background model as it updates. I can't stop the background model update, as it is necessary for changing lighting conditions. Any suggestions on how to achieve both goals?
SamplePoolImage
The work done so far:
Used OpenCV's built-in background subtraction algorithms such as MOG2, KNN, and GSOC.
Implemented some algorithms myself, such as a GMM and a Kalman filter.
I was able to detect the moving balls/objects, but the stationary balls/objects get absorbed into the background after being detected for some time.
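Not from the question itself, but as an illustration of one way around this: keep the adaptive background subtractor for the moving balls, and detect stationary balls in the same frame with a shape-based detector such as a Hough circle transform, so they do not depend on the background model at all. A minimal OpenCV sketch, with all thresholds and radii as made-up placeholders:

```python
# Sketch: moving balls via MOG2, stationary balls via Hough circles.
# Parameter values are placeholders and would need tuning for real footage.
import cv2

cap = cv2.VideoCapture("pool_table.mp4")          # hypothetical input video
backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=25,
                                             detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Moving balls: foreground mask from the adaptive background model.
    fg_mask = backsub.apply(frame)                 # model keeps adapting to lighting
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN,
                               cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))

    # Stationary balls: circle detection on the raw frame, independent of motion.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
                               param1=100, param2=30, minRadius=8, maxRadius=20)

    if circles is not None:
        for x, y, r in circles[0]:
            cv2.circle(frame, (int(x), int(y)), int(r), (0, 255, 0), 2)

    cv2.imshow("balls", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

Since pool balls have a fixed, known radius, the circle detector can be constrained tightly, which keeps false positives low even though it runs on every frame.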

How to detect anomalies from real-time CCTV video?

I built a model which can detect custom objects from CCTV footage; it can detect three types of objects. But I also want the model to detect anomalies in the footage.
For training purposes, I only use normal (non-anomalous) videos, and the model should detect anything abnormal that happens.
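The question does not name an approach, but one standard technique for this setting (only normal data available at training time) is to train a reconstruction model on normal frames and flag frames with high reconstruction error as anomalies. A minimal PyTorch sketch of that idea, with the architecture, input size, and threshold purely illustrative:

```python
# Sketch: frame-level anomaly scoring with a convolutional autoencoder.
# Trained only on normal frames; high reconstruction error => possible anomaly.
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = FrameAutoencoder()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training step on normal frames only (frames assumed 3x128x128, scaled to [0, 1]).
normal_batch = torch.rand(8, 3, 128, 128)            # placeholder for real normal frames
loss = criterion(model(normal_batch), normal_batch)
loss.backward()
optimizer.step()

# Inference: score a new frame by its reconstruction error.
model.eval()
with torch.no_grad():
    frame = torch.rand(1, 3, 128, 128)                # placeholder CCTV frame
    error = criterion(model(frame), frame).item()
threshold = 0.02                                       # made-up value; calibrate on normal data
print("anomaly" if error > threshold else "normal", error)
```

The threshold would be calibrated on held-out normal footage; the existing object detector can still run in parallel to localize what triggered the anomaly score.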

How to use CNN-LSTMs to classify image sequences for multiple bounding boxes in a video stream?

I am working on a PyTorch project where I'm using a webcam video stream. An object detector is used to find objects within the frame, and each box is given an ID by a tracker. Then I want to analyse each bounding box with a CNN-LSTM and classify it (binary classification) based on the previous frame sequence of that box (the last 5 frames). I want the program to run as close to real-time as possible.
Currently I am stuck with the CNN-LSTM part of my problem - the detector and tracker are working quite well already.
I am a little bit clueless on how to approach this task. Here are the questions I have:
1) How does inference work in this case? Do I have to save NumPy arrays for each bounding box containing the last 5 frames, then add the current frame and delete the oldest one, and then run the model for each bounding box in the current frame? That sounds very slow and inefficient. Is there a faster or easier way?
2) Do you have any tips for creating the dataset? I have a couple of videos with bounding boxes and labels. Should I loop through the videos and save each frame sequence for each bounding box in a new folder, together with a CSV that contains the label? I have never worked with a CNN-LSTM, so I don't know how to load the data for training.
3) Would it be possible to use the extracted features of the CNN in parallel? As mentioned above, the extracted features should be used by the LSTM for a binary classification problem. The classification is only needed for the current frame. I would also like to use an additional classifier (8 classes) based on the extracted CNN features, again only for the current frame. For this classifier, the LSTM is not needed.
Since my explanation is probably confusing, the following image hopefully helps with understanding what I want to build:
Architecture
This is the architecture I want to use. Is this possible using PyTorch? So far, I have only worked with CNNs and LSTMs separately. Any help is appreciated :)
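Not part of the original question, but a minimal PyTorch sketch of the described architecture may help: a shared CNN backbone extracts per-crop features, an LSTM consumes the last 5 feature vectors of a track for the binary head, and a separate 8-class head runs on the current frame's features only. Sizes, the per-track buffering, and class counts are illustrative assumptions:

```python
# Sketch: shared CNN features -> (a) LSTM over the last 5 frames for a binary output,
#         (b) direct 8-class head on the current frame only.
from collections import defaultdict, deque

import torch
import torch.nn as nn
from torchvision import models

class BoxSequenceClassifier(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, n_aux_classes=8):
        super().__init__()
        backbone = models.resnet18(weights=None)                    # or pretrained weights
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # -> (B, 512, 1, 1)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.binary_head = nn.Linear(hidden, 1)                     # uses the 5-frame sequence
        self.aux_head = nn.Linear(feat_dim, n_aux_classes)          # uses current frame only

    def extract(self, crops):                          # crops: (B, 3, 224, 224)
        return self.cnn(crops).flatten(1)              # (B, 512)

    def forward(self, feat_seq):                       # feat_seq: (B, T=5, 512)
        out, _ = self.lstm(feat_seq)
        binary_logit = self.binary_head(out[:, -1])    # last time step
        aux_logits = self.aux_head(feat_seq[:, -1])    # current-frame features
        return binary_logit, aux_logits

model = BoxSequenceClassifier().eval()
buffers = defaultdict(lambda: deque(maxlen=5))         # track_id -> last 5 feature vectors

def process_frame(crops_by_track):                     # {track_id: (3, 224, 224) tensor}
    results = {}
    with torch.no_grad():
        for track_id, crop in crops_by_track.items():
            feat = model.extract(crop.unsqueeze(0))[0]  # (512,)
            buffers[track_id].append(feat)
            if len(buffers[track_id]) == 5:             # only classify full sequences
                seq = torch.stack(list(buffers[track_id])).unsqueeze(0)  # (1, 5, 512)
                binary_logit, aux_logits = model(seq)
                results[track_id] = (torch.sigmoid(binary_logit).item(),
                                     aux_logits.softmax(dim=-1)[0])
    return results
```

Caching the per-frame CNN features in a deque per track means the backbone runs only once per box per frame; the LSTM then reuses the stored features for the previous 4 frames, which addresses the efficiency concern in question 1.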

Can I / How to implement object recognition in app?

Our aim is to develop an automotive app which automates and standardizes photo shooting of newly arrived cars at a dealership. Basically, our Mavic 2 Pro takes off, orbits the vehicle, shoots a photo every 90 degrees, and then lands at its original position. The orbiting radius is approx. 4.5 m.
Since the shooting scene is small (and when shooting indoors, GPS might not be available), we would like to rely more on built-in object recognition as implemented in ActiveTrack missions. We currently have an app based on a waypoint mission, but it is not accurate. So my questions are:
1. Can anyone point us in the direction of how to implement object recognition in our app?
2. If recognition is not available, how can we ensure consistency of the output? While testing, the EXIF info of the photos sometimes shows a compass deviation of up to 4 degrees, which results in the object being out of view.
Thanks for advice,
Mirek
What you need is object recognition and tracking like YOLO.
The resources are here.
https://pjreddie.com/darknet/yolo/
Then you need to train the model to recognize cars, so you need a bunch of labeled photos where the car bounding box has already been given correctly.
Downstream, you then want to have the camera connect to a computer and stream the video, have the tracking algorithm recognize the bounding box of the car, and, depending on its center and size, maneuver the drone accordingly.
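As an illustration of that last step (not from the original answer), here is a rough sketch of running a darknet-format YOLO model through OpenCV's DNN module and turning the car box's center and size into a steering offset. The model files, class index, and the control mapping are assumptions:

```python
# Sketch: detect the car in each frame with a darknet YOLO model and compute
# how far its box center is from the frame center. File names are placeholders.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
CAR_CLASS_ID = 2                      # index of "car" in the COCO class list

def car_offset(frame):
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    best = None
    for output in outputs:
        for det in output:            # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = det[4] * scores[class_id]
            if class_id == CAR_CLASS_ID and conf > 0.5:
                if best is None or conf > best[0]:
                    best = (conf, det[:4])

    if best is None:
        return None
    cx, cy, bw, bh = best[1]          # normalized box center and size
    # Offsets from the frame center; a drone controller (not shown here)
    # would translate these into yaw / distance corrections.
    return {"dx": cx - 0.5, "dy": cy - 0.5, "box_area": bw * bh}
```

The box area gives a rough proxy for distance to the car, so the same loop can keep both framing (center offset) and orbit radius (box size) roughly constant.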

Face tracking or "dynamic recognition"

What is the best approach for face detection/tracking in the following scenario:
when a person enters the scene/frame, they should be detected and recognized in every subsequent frame until they leave the scene.
It should also be able to do this for multiple users at once.
I have experience with Viola-Jones detection and Fisherface recognition, but I have used Fisherface recognition only with a previously prepared training set, and now I need something that works for any user who enters the scene.
I am also interested in different solutions.
I used OpenCV face detection for multiple faces together with the Rekognition API (http://rekognition.com), pushed the detected faces, and retrained the dataset frequently. Lightweight on our side, but I am sure there are more robust solutions for this.
Have you tried VideoSurveillance, also known as the OpenCV blob tracker?
It is a motion-based tracker with across-frame data association [1]. If you want to replace motion with face detection, you must adjust the code by replacing the foreground mask with detection responses. This approach is called tracking-by-detection in the literature.
[1] Andrew Senior et al., "Appearance Models for Occlusion Handling".