I am currently writing a code in order to detect small objects that are moving using OpenCV and C++. I have motion detection, logging of detects, and bounding boxes working well. Now I am stuck on a way to take the video, and after I get a detection, grab 5 seconds before and 5 seconds after the detection period and save it out to a different .avi file. Does anyone have ideas on a way to do this? Even a point in the general direction would be helpful as I can't seem to find anything on taking out clips from a .avi file and saving them to different files.
Related
I am working on a pytorch project, where I’m using a webcam video stream. An object detector is used to find objects within the frame and each box is given an id by a tracker. Then, I want to analyse each bounding box with a CNN-LSTM and classify it (binary classification) based on the previous frame sequence of that box (for the last 5 frames). I want the program to run a close to real-time as possible.
Currently I am stuck with the CNN-LSTM part of my problem - the detector and tracker are working quite well already.
I am a little bit clueless on how to approach this task. Here are the questions I have:
1) How does inferencing work in this case? Do I have to save np arrays for each bounding box containing the last 5 frames, then add the current frame and delete the oldest one? Then use the model for each bounding box that is in the current frame. This way sounds very slow and inefficient. Is there a faster or easier way?
2) Do you have any tipps for creating the dataset? I have a couple of videos with bounding boxes and labels. Should I loop through the videos and save save each frame sequence for each bounding box in a new folder, together with a csv that contains the label? I have never worked with an CNN-LSTM, so I don’t know how to load the data for training.
3) Would it be possible to use the extracted features of the CNN in parallel? As mentioned above, The extracted features should be used by the LSTM for a binary classification problem. The classification is only needed for the current frame. I would like to use an additional classifier (8 classes) based on the extracted CNN features, also only for the current frame. For this classifier, the LSTM is not needed.
Since my explaining propably is very confusing, the following image hopefully helps with understanding what I want to build:
Architecture
This is the architecture I want to use. Is this possible using Pytorch? So far, I only worked with CNNs and LSTM seperately. Any help is apprechiated :)
I have a custom USB camera with a custom driver on a custom board Nvidia Jetson TX2 that is not detected through openpose examples. I access the data using GStreamer custom source. I currently pull frames into a CV mat, color convert them and feed into OpenPose on a per picture basis, it works fine but 30 - 40% slower than a comparable video stream from a plug and play camera. I would like to explore things like tracking that is available for streams since Im trying to maximize the fps. I believe the stream feed is superior due to better (continuous) use of the GPU.
In particular the speedup would come at confidence expense and would be addressed later. 1 frame goes through pose estimation and 3 - 4 subsequent frames are just tracking the object with decreasing confidence levels. I tried that on a plug and play camera and openpose example and the results were somewhat satisfactory.
The point where I stumbled is that I can put the video stream into CV VideoCapture but I do not know, however, how to provide the CV video capture to OpenPose for processing.
If there is a better way to do it, I am happy to try different things but the bottom line is that the custom camera stays (I know ;/). Solutions to the issue described or different ideas are welcome.
Things I already tried:
Lower resolution of the camera (the camera crops below certain res instead of binning so cant really go below 1920x1080, its a 40+ MegaPixel video camera by the way)
use CUDA to shrink the image before feeding it to OpenPose (the shrink + pose estimation time was virtually equivalent to the pose estimation on the original image)
since the camera view is static, check for changes between frames, crop the image down to the area that changed and run pose estimation on that section (10% speedup, high risk of missing something)
How to recognize rain on camera vision using with OpenCV in C++?
Or if somebody stick a sticker on a camera how recognize it with OpenCV in C++?
Or if somebody throw color to the camera how can i detect it with OpenCV in C++?
Detect these on camera vision:
Rain
Sticker
Color
Here is an example video of sticker!
Camera Vision-Sticker
In case of a sticker, you're just looking for a large dark area that doesn't change in time.
In case of color, analyze image color stats - if somebody sprays some paint on a camera (is that what you mean by "throwing color"?), some color is going to be dominant over all the others.
You can also try to handle both cases by subtracting frames and detecting image areas that don't change in time that way.
You may want to use machine learning for finding threshold values (e.g. area size, its shape properties, such as width/length ratio, continuousness etc.) used to decide when to consider something to be a sticker/color or something else.
As for the rain, I guess there's no simple answer that can be given in a few sentences. There are some articles available in the web though. That said, I would guess it would be simpler and cheaper to detect rain by just installing external rain sensors (like the ones activating wipers in a car) rather than trying to do it by developing your own computer vision algorithm for that purpose.
This sounds like an interesting project, where a camera can automatically detect obstruction (paint, sticker, rain). It will most likely be necessary for the camera to be mounted without obstructions so that the expected image can be learned. If the usage scenario allows that, it won't be very hard.Both sticker and rain result in strong permanent deviations from the expected image, while rain will result in noisy images.
OpenCV with C++ or Python can help solve this kind of problems, because complicated computer vision algorithms are already implemented there. It takes some time to get started with, but after that OpenCV is not hard.
To track object on video frame, first of all I extract image frames from video and save those images to a folder. Then I am supposed to process those images to find an object. Actually I do not know if this is a practical thing, because all the algorithm did this for one step. Is this correct?
Well, your approach will consume a lot of space on your disk depending on the size of the video and the size of the frames, plus you will spend a considerable amount of time reading frames from the disk.
Have you tried to perform real-time video processing instead? If your algorithm is not too slow, there are some posts that show the things that you need to do:
This post demonstrates how to use the C interface of OpenCV to execute a function to convert frames captured by the webcam (on-the-fly) to grayscale and displays them on the screen;
This post shows a simple way to detect a square in an image using the C++ interface;
This post is a slight variation of the one above, and shows how to detect a paper sheet;
This thread shows several different ways to perform advanced square detection.
I trust you are capable of converting code from the C interface to the C++ interface.
There is no point in storing frames of a video if you're using OpenCV, as it has really handy methods for capturing frames from a camera/stored video real-time.
In this post you have an example code for capturing frames from a video.
Then, if you want to detect objects on those frames, you need to process each frame using a detection algorithm. OpenCV brings some sample code related to the topic. You can try to use SIFT algorithm, to detect a picture, for example.
heylo!
I have a bunch of old video files converted from old vhs tapes. The problem is, since those tapes were really old, the videos are jumpy (sometimes the bottom of the frame is in the middle of the screen followed by the top of the next frame)
My goal is to write something in opencv to automatically remove the frames where the image is not lined up properly.
My idea is to detect the difference between the previous frame and the next frame. If the video were smooth, the difference would be minimal. If the frame is jumpy then the difference would be noticeable.
My question: how would opencv calculate this difference between two frames?
Thx!!!!
I hope you know how to grab frames from video. If not, check here. Fortunately, it also finds similarity between two videos.
What you will learn in this tutorial:
How to open and read video streams
Two ways for checking image similarity: PSNR and SSIM
I think you can just make small adaptations to it as per your requirements. This tutorial has all enough information about it.
You can also check this SOF : Simple and fast method to compare images for similarity