To track an object in video frames, I first extract the image frames from the video and save them to a folder. Then I process those images to find the object. I don't know if this is practical, because all the algorithms I have seen do this in one step. Is this approach correct?
Well, your approach will use a lot of disk space, depending on the size of the video and the size of the frames, and you will also spend a considerable amount of time reading frames back from the disk.
Have you tried processing the video in real time instead? If your algorithm is not too slow, the following posts show what you need to do:
This post demonstrates how to use the C interface of OpenCV to convert frames captured by the webcam to grayscale on the fly and display them on the screen;
This post shows a simple way to detect a square in an image using the C++ interface;
This post is a slight variation of the one above, and shows how to detect a paper sheet;
This thread shows several different ways to perform advanced square detection.
I trust you are capable of converting code from the C interface to the C++ interface.
There is no point in storing the frames of a video if you're using OpenCV, as it has really handy methods for capturing frames from a camera or a stored video in real time.
This post has example code for capturing frames from a video.
Then, if you want to detect objects in those frames, you need to process each frame with a detection algorithm. OpenCV ships with some sample code on the topic. You could try the SIFT algorithm to detect a picture, for example.
In order to do some specific editing on some .avi files, I'd like to create an application (in C++) that can load, edit, and save those .avi files. But what is the most efficient way? When I first thought about it, a simple 3D array (a 2D array of pixels for every frame) seemed like the simplest solution, but its size would be ENORMOUS. Let's assume that a pixel only needs a color. One color takes 3 bytes (1 char for r, 1 for g, 1 for b). For a 1920x1080 video, that means about 6 megabytes (1920 x 1080 x 3 = 6,220,800 bytes) for just one frame! The data may or may not get smaller if I used pointers for the colors, so that colors already used don't take up more space; I don't really know, since I'm pretty new to C++ and all the low-level stuff. (For comparison: one of my AVI files recorded with the Xvid codec is 40 seconds long at 30 fps and is only 2 MB.)
So how would you actually store the video data (Not even the audio, just the video) efficiently (while still being easily able to perform per-frame-changes on it)?
As you have realised, uncompressed video is enormous and it is not practical to store an entire video in this way.
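A quick back-of-the-envelope calculation shows the scale. The helper names below (rawFrameBytes, rawVideoBytes) are illustrative, not from any library, and the sketch assumes 3 bytes per pixel as in the question.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one uncompressed frame: width x height pixels,
// bytesPerPixel bytes each (3 for 8-bit R, G and B).
std::uint64_t rawFrameBytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Bytes needed to hold every frame of a clip uncompressed.
std::uint64_t rawVideoBytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t bytesPerPixel,
                            std::uint64_t fps, std::uint64_t seconds) {
    return rawFrameBytes(width, height, bytesPerPixel) * fps * seconds;
}
```

At 1920x1080 and 3 bytes per pixel, one frame is 6,220,800 bytes, so a 40-second clip at 30 fps at that resolution would need 7,464,960,000 bytes (about 7.5 GB) uncompressed.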
Video compression is an extremely complex topic, but more-or-less, it works as follows: certain "key-frames" are compressed using fairly standard compression techniques similar or identical to still-photo compression such as JPEG. Frames following key-frames are compressed by comparing the frame with the previous one and looking for changes (such as moving blocks). Every now and again, a new key-frame is used.
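To make the key-frame idea concrete, here is a toy, lossless sketch that assumes grayscale frames of equal size: every keyInterval-th frame is stored in full, and every other frame stores only the pixels that changed since the previous frame. Real codecs such as H.264 or Xvid instead use motion-compensated block matching, transforms and lossy quantization; the types and function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // one 8-bit grayscale frame, row-major

struct EncodedFrame {
    bool isKey = false;
    Frame full;  // whole frame, used only when isKey is true
    std::vector<std::pair<std::size_t, std::uint8_t>> changes;  // (pixel index, new value)
};

// Toy encoder: all frames must have the same size and keyInterval must be > 0.
std::vector<EncodedFrame> encode(const std::vector<Frame>& frames, std::size_t keyInterval) {
    assert(keyInterval > 0);
    std::vector<EncodedFrame> out;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        EncodedFrame ef;
        if (i % keyInterval == 0) {
            // Key frame: store every pixel.
            ef.isKey = true;
            ef.full = frames[i];
        } else {
            // Delta frame: store only pixels that differ from the previous frame.
            const Frame& prev = frames[i - 1];
            for (std::size_t p = 0; p < frames[i].size(); ++p)
                if (frames[i][p] != prev[p]) ef.changes.emplace_back(p, frames[i][p]);
        }
        out.push_back(std::move(ef));
    }
    return out;
}

// Rebuilds every frame by starting from the latest key frame and applying deltas.
std::vector<Frame> decode(const std::vector<EncodedFrame>& encoded) {
    std::vector<Frame> frames;
    Frame current;
    for (const EncodedFrame& ef : encoded) {
        if (ef.isKey) {
            current = ef.full;
        } else {
            for (const auto& change : ef.changes) current[change.first] = change.second;
        }
        frames.push_back(current);
    }
    return frames;
}
```

Frames that barely change from their predecessor cost almost nothing to store, which is the intuition behind the much smaller compressed file sizes.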
You don't really have to worry much about that as you are not going to write your own video coder/decoder (codec). There are standard ones.
What will happen is that your program will decode the compressed video frame-by-frame and keep a certain number of frames in memory while you are working on them and then re-encode them when it is finished. In the uncompressed form, you will have access to the individual pixels and can work on them how you want.
You are probably not going to do that by yourself either; it is very hard. You will probably need to use a framework such as OpenCV. There are a huge number of standard filters and tools built into these frameworks, and what you want to do may already be implemented somewhere.
The OpenCV framework can return individual frames in a Mat object, and you can then access the pixels. See this post: Get Pixels from Mat
OpenCV
Tutorial page: OpenCV Tutorial
I am using the Bumblebee2 camera and I am having trouble with acquiring stereo images from it. When I attempt to access the camera using MATLAB, the program crashes.
Does anyone know how I can acquire the stereo images using FlyCapture?
MATLAB cannot read the BumbleBee 2 output directly. To do that, you'll have to record the stream and process it offline. I wrote a proprietary recorder based on the code samples in the SDK. You can split the left/right images and record each one in a separate video container (e.g. using OpenCV to write a compressed AVI file). Later, you can load these images into memory and use Triclops to compute disparity maps (or, alternatively, use OpenCV to run other algorithms, like semi-global block matching).
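As a rough illustration of the splitting step, the sketch below assumes the recorder hands you one pixel-interleaved 8-bit buffer in which each pixel contributes one byte from each camera. That layout, the byte order, and the splitInterleaved name are assumptions; check the SDK documentation and samples for the format your camera mode actually produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical layout: byte 2*i belongs to the first camera and byte 2*i+1
// to the second. Verify the real order against the SDK before relying on it.
void splitInterleaved(const std::vector<std::uint8_t>& interleaved,
                      std::vector<std::uint8_t>& first,
                      std::vector<std::uint8_t>& second) {
    if (interleaved.size() % 2 != 0)
        throw std::invalid_argument("interleaved buffer must have an even number of bytes");
    const std::size_t pixels = interleaved.size() / 2;
    first.resize(pixels);
    second.resize(pixels);
    for (std::size_t i = 0; i < pixels; ++i) {
        first[i] = interleaved[2 * i];       // one camera's pixel
        second[i] = interleaved[2 * i + 1];  // the other camera's pixel
    }
}
```

Each output buffer can then be wrapped as a single-channel image and written to its own video container.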
FlyCapture can capture image series or video clips, but you have less control over what you get. I suggest you use the code samples to write a simple recorder, and then load its output into MATLAB in the standard ways. You can also consult Point Grey tech support.
I am working on a project that requires me to detect and track a human in live video from a webcam connected to a BeagleBoard-xM.
I have completed this task using OpenCV in the pixel domain. The results on the board are very accurate but extremely slow. Many people have suggested that I leave the pixel domain and do the same task on H.264/MPEG-4 compressed video, as it would greatly reduce the computational overhead.
I have read many research papers but have not found any software platform or library that I can use to analyze and process H.264 compressed video.
I would be thankful if someone could suggest a library for H.264 compressed video analysis and guide me further.
Thanks and Regards.
I'm not sure how practical this really is (I've never tried to do it), but my guess would be that what they're referring to would be looking for a block of macro-blocks that all have (nearly) identical motion vectors.
For example, let's assume you have a camera that's not panning, and the picture shows a car driving across the screen. Looking at the motion vectors, you should have a (roughly) car-shaped bunch of macro-blocks that all have similar motion vectors (denoting the motion of the car). Then, rather than look at the entire picture for your object of interest, you can look at that block in isolation and try to identify it. Likewise, if the camera was panning with the car, you'd have a car-shaped block with small motion vectors, and most of the background would have similar motion vectors in the opposite direction of the car's movement.
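The motion-vector idea above might be sketched as follows, assuming the per-macroblock motion vectors have already been extracted by a decoder (this code does not parse H.264 itself). It estimates the camera motion as the component-wise median of all vectors, on the assumption that the background covers most of the frame, and returns the bounding box of the macroblocks whose motion differs from it by more than a threshold. The types, function names and threshold are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct MotionVector { int dx; int dy; };                     // per-macroblock motion, in pixels
struct BlockBox { std::size_t top, left, bottom, right; };   // inclusive, in macroblock units

// Component-wise median of a non-empty list: a rough estimate of the
// background (camera) motion, assuming the background dominates the frame.
MotionVector medianMotion(const std::vector<MotionVector>& mvs) {
    std::vector<int> xs, ys;
    for (const auto& mv : mvs) { xs.push_back(mv.dx); ys.push_back(mv.dy); }
    const auto mid = static_cast<std::ptrdiff_t>(xs.size() / 2);
    std::nth_element(xs.begin(), xs.begin() + mid, xs.end());
    std::nth_element(ys.begin(), ys.begin() + mid, ys.end());
    return {xs[mid], ys[mid]};
}

// Bounding box of macroblocks (row-major grid with `cols` columns) whose motion
// differs from the background motion by more than `threshold` pixels.
std::optional<BlockBox> movingRegion(const std::vector<MotionVector>& mvs,
                                     std::size_t cols, int threshold) {
    if (mvs.empty() || cols == 0) return std::nullopt;
    const MotionVector bg = medianMotion(mvs);
    std::optional<BlockBox> box;
    for (std::size_t i = 0; i < mvs.size(); ++i) {
        const int ddx = mvs[i].dx - bg.dx;
        const int ddy = mvs[i].dy - bg.dy;
        if (ddx * ddx + ddy * ddy <= threshold * threshold) continue;  // moves with background
        const std::size_t r = i / cols, c = i % cols;
        if (!box) {
            box = BlockBox{r, c, r, c};
        } else {
            box->top = std::min(box->top, r);
            box->left = std::min(box->left, c);
            box->bottom = std::max(box->bottom, r);
            box->right = std::max(box->right, c);
        }
    }
    return box;
}
```

With a static camera the median is close to zero, so the box covers the moving object; with a panning camera the median follows the background instead, which matches the two cases described above.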
Note, however, that this is likely to be imprecise at best. Just for example, let's assume our mythical car is driving in front of a brick building, with its headlights illuminating some of the bricks. In this case, a brick in one picture might (easily) not point back at the same brick in the previous picture, but instead point at the brick in the previous picture that happened to be illuminated about the same. The bricks are enough alike that the closest match will depend more on illumination than on the brick itself.
You may eventually be able to parse H.264 and determine that it contains an object, but this will not be the kind of "object tracking" you're looking for. OpenCV is excellent software, and this is what it does best. Have you considered scaling the video down to a smaller resolution so that OpenCV can analyze it more easily?
I think you are greatly overestimating the computing power of this $45 computer. Object recognition and tracking is VERY hard computationally. I would start by measuring how many frames per second your board can track, and optimize from there. Look for your bottlenecks: you may be better off processing raw video than having to decode H.264 video first. Then again, RAW video takes a LOT of RAM, and processing it takes a LOT of CPU.
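A minimal way to measure that frame rate is to time the per-frame work over a batch of frames. In this sketch, processFrame is a hypothetical stand-in for your own detection and tracking code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Runs `processFrame` on `frameCount` frames and returns the achieved frames per second.
// Returns 0.0 if the elapsed time was too short for the clock to measure.
double measureFps(const std::function<void(std::size_t)>& processFrame,
                  std::size_t frameCount) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < frameCount; ++i) processFrame(i);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count() > 0.0 ? static_cast<double>(frameCount) / elapsed.count() : 0.0;
}
```

Timing individual stages (capture, decoding, detection) the same way is one way to locate the bottleneck.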
Minimize the overhead from decoding video, and minimize RAM usage by scaling the video down before analysis, but in the end, you're asking a LOT from a 1 GHz, 32-bit ARM processor.
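As a sketch of the scaling-down step, the function below halves a grayscale frame in each dimension by averaging 2x2 blocks, which cuts the number of pixels to process by a factor of four. It assumes a row-major 8-bit buffer and uses the standard library only; in OpenCV itself, cv::resize with cv::INTER_AREA performs this kind of area-averaging downscale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Halves a row-major 8-bit grayscale image in each dimension by averaging
// each 2x2 block. An odd last row or column is dropped.
std::vector<std::uint8_t> downscaleByTwo(const std::vector<std::uint8_t>& src,
                                         std::size_t width, std::size_t height) {
    if (src.size() != width * height) throw std::invalid_argument("size mismatch");
    const std::size_t outW = width / 2, outH = height / 2;
    std::vector<std::uint8_t> dst(outW * outH);
    for (std::size_t y = 0; y < outH; ++y) {
        for (std::size_t x = 0; x < outW; ++x) {
            const std::size_t sx = 2 * x, sy = 2 * y;
            const unsigned sum = src[sy * width + sx] + src[sy * width + sx + 1] +
                                 src[(sy + 1) * width + sx] + src[(sy + 1) * width + sx + 1];
            dst[y * outW + x] = static_cast<std::uint8_t>((sum + 2) / 4);  // rounded mean
        }
    }
    return dst;
}
```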
FFmpeg is an old library, and in my experience it has very limited capabilities for processing and object tracking in H.264 compressed video. Many of the commands I found were outdated.
The best thing would be to study H.264 thoroughly and then try to implement your own API in a language like Java or C#.
We're currently developing some functionality for our program that needs OpenCV. One of the ideas being tossed around is a "buffer" that keeps the last minute of video data in memory; for every event trigger, we then need to extract a video file of roughly 13 seconds from that buffer.
We don't have much experience with OpenCV yet, so we don't know whether this is possible. Looking at the documentation, the only functions that write to memory are imencode and imdecode, but those work on single images. If we could find a way to write sequences of images to a video file, that would be neat, but for now our idea is to use a video buffer.
We're also using OpenCV version 2.
TL;DR We want to know if it is possible to write a portion of a video to memory.
In OpenCV, every video is treated as a collection of frames (images). Depending on your camera's FPS, you can capture frames periodically and fill the buffer with them, discarding the oldest frame (the one taken one minute earlier) as each new one arrives. So a FIFO data structure can be implemented to achieve your goal. Getting a 13-second sample is then easy: jump to the desired starting frame and write 13*FPS frames sequentially to a video file.
As far as I know from my own use of OpenCV, though, there will be some synchronization and timing problems.
Here is the link to the OpenCV documentation on video I/O. The last chunk of code in particular is what you will use for writing.
TL;DR : There is no video, there are sequential images with little differences. So you need to treat them as such.
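The FIFO described in this answer might look like the following sketch. FrameBuffer is a hypothetical class, not part of OpenCV; it is templated on the frame type so that it could hold cv::Mat in practice. Its capacity would be 60 * FPS for a one-minute buffer, and on an event you would take the last 13 * FPS frames and write them out, for example with cv::VideoWriter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <utility>
#include <vector>

// Fixed-capacity FIFO of frames: the oldest frame is dropped when a new one arrives.
template <typename Frame>
class FrameBuffer {
public:
    explicit FrameBuffer(std::size_t capacity) : capacity_(capacity) {
        if (capacity_ == 0) throw std::invalid_argument("capacity must be positive");
    }

    // Adds the newest frame, discarding the oldest once the buffer is full.
    void push(Frame frame) {
        if (frames_.size() == capacity_) frames_.pop_front();
        frames_.push_back(std::move(frame));
    }

    // Copies the most recent `count` frames (oldest first), e.g. 13 * fps on an event.
    std::vector<Frame> lastFrames(std::size_t count) const {
        count = std::min(count, frames_.size());
        return std::vector<Frame>(frames_.end() - static_cast<std::ptrdiff_t>(count),
                                  frames_.end());
    }

    std::size_t size() const { return frames_.size(); }

private:
    std::size_t capacity_;
    std::deque<Frame> frames_;
};
```

Note that copies of cv::Mat share pixel data, so with cv::Mat you would push a clone() of each captured frame; otherwise the entries may all end up referring to a buffer the capture reuses.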
heylo!
I have a bunch of old video files converted from old VHS tapes. The problem is that, since those tapes were really old, the videos are jumpy: sometimes the bottom of one frame appears in the middle of the screen, followed by the top of the next frame.
My goal is to write something in OpenCV that automatically removes the frames where the image is not lined up properly.
My idea is to detect the difference between the previous frame and the next frame. If the video were smooth, the difference would be minimal. If the frame is jumpy then the difference would be noticeable.
My question: how would I calculate this difference between two frames in OpenCV?
Thx!!!!
I hope you know how to grab frames from a video. If not, check here. Conveniently, the same tutorial also shows how to measure the similarity between two videos.
What you will learn in this tutorial:
How to open and read video streams
Two ways for checking image similarity: PSNR and SSIM
I think you can adapt it to your requirements with small changes; the tutorial has all the information you need.
You can also check this Stack Overflow question: Simple and fast method to compare images for similarity
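As a minimal sketch of the question's idea using PSNR, the code below computes PSNR between consecutive 8-bit grayscale frames and flags frames where it drops below a threshold. It works on plain byte buffers rather than cv::Mat; the function names and the 30 dB threshold used in the example are assumptions to tune on your own footage.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // 8-bit grayscale, row-major

// PSNR in dB between two equally sized 8-bit frames; identical frames give +infinity.
double psnr(const Frame& a, const Frame& b) {
    if (a.size() != b.size() || a.empty())
        throw std::invalid_argument("frames must be the same non-zero size");
    double sse = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sse += d * d;
    }
    const double mse = sse / static_cast<double>(a.size());
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}

// Indices of frames whose PSNR against the previous frame is below thresholdDb,
// i.e. frames that differ sharply from their predecessor.
std::vector<std::size_t> jumpyFrames(const std::vector<Frame>& frames, double thresholdDb) {
    std::vector<std::size_t> result;
    for (std::size_t i = 1; i < frames.size(); ++i)
        if (psnr(frames[i - 1], frames[i]) < thresholdDb) result.push_back(i);
    return result;
}
```

A single misaligned frame typically flags both itself and the frame after it, since the following good frame also differs sharply from the bad one; dropping only frames that differ from both neighbours is one way to handle that.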