Custom AVI/MP4 file writer - C++

I am writing video files under Windows from a camera.
I need the data unaltered - not MP4's 'uncompressed', i.e. no YUV, no color interpolation - just the raw camera sensor byte stream.
At the moment I am writing this directly to disk and re-reading it later to recode into a usable video. But with no header I have to keep track of image size, frame rate, color balance, etc. separately.
I could add a custom header, but even if the actual video data is unreadable by anything other than my app, using an AVI file would at least give me a relatively standard header to store all the camera parameters, and it also means that resolution, length, etc. would show up in Explorer.
Is there an easy way of generating an AVI header/footer without sending all the data through DirectShow or VfW? The data is coming in at >250 MB/s and I can't lose any frames, so I don't have time to do much more than dump each frame to disk.
Edit: perhaps MP4 would be better; I have a lot of metadata about the camera config that isn't in the AVI standard.

Well, after figuring out what 'reasonable' AVI headers would be for your stream (e.g. if you use a custom codec FOURCC, probably no application would be able to do anything useful with it -- so why bother with AVI?),
you could just write a prebuilt RIFF-AVI header at the beginning of your file. It's not too hard to figure out the values.
Each frame then has to be enclosed in its own RIFF chunk (4-byte type: "00db" + 4-byte length + your data).
After the fact you have to fix num_frames and some length fields in the header. And for files > 2 GB, don't forget the OpenDML extension for the header.
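A minimal sketch of those chunk mechanics (the frame size and count below are illustrative placeholders; a real file still needs the full 'hdrl' LIST with avih/strh/strf filled in for your stream):

    // Sketch only: shows RIFF/"00db" chunk layout and the size patching,
    // not a complete AVI writer. All sizes/counts here are assumptions.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    static void put_u32(FILE* f, uint32_t v) { fwrite(&v, 4, 1, f); } // RIFF is little-endian

    int main() {
        FILE* f = fopen("raw.avi", "wb");
        fwrite("RIFF", 4, 1, f);
        long riff_size_pos = ftell(f);
        put_u32(f, 0);                              // patched after the last frame
        fwrite("AVI ", 4, 1, f);

        // ... write the 'hdrl' LIST (avih, strh, strf) here ...

        fwrite("LIST", 4, 1, f);
        long movi_size_pos = ftell(f);
        put_u32(f, 0);                              // patched after the last frame
        fwrite("movi", 4, 1, f);

        std::vector<uint8_t> frame(640 * 480 * 2);  // one raw sensor frame (placeholder size)
        for (int i = 0; i < 100; ++i) {             // per frame: "00db" + length + data
            fwrite("00db", 4, 1, f);
            put_u32(f, (uint32_t)frame.size());
            fwrite(frame.data(), 1, frame.size(), f);
            if (frame.size() & 1) fputc(0, f);      // RIFF chunks are word-aligned
        }

        long end = ftell(f);
        fseek(f, riff_size_pos, SEEK_SET);
        put_u32(f, (uint32_t)(end - riff_size_pos - 4)); // bytes following this field
        fseek(f, movi_size_pos, SEEK_SET);
        put_u32(f, (uint32_t)(end - movi_size_pos - 4));
        // ...also patch dwTotalFrames in 'avih' and dwLength in 'strh'...
        fclose(f);
    }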

Martin, since you are proficient in OpenCV, couldn't you just use cvCreateVideoWriter() for creating an uncompressed .avi?
CvVideoWriter* cvCreateVideoWriter(const char* filename, int fourcc, double fps, CvSize frame_size, int is_color=1)
Regarding the fourcc param, the documentation states:
Under Win32 if 0 is passed while using an avi filename it will create a video writer that creates an uncompressed avi file.
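For what it's worth, a minimal usage sketch of that call with the legacy C API (frame size, rate, and filename are placeholders):

    #include <opencv/highgui.h>

    int main() {
        // fourcc = 0 requests an uncompressed AVI on Win32, per the docs above.
        CvVideoWriter* writer = cvCreateVideoWriter("out.avi", 0, 30.0, cvSize(640, 480), 1);
        IplImage* frame = cvCreateImage(cvSize(640, 480), IPL_DEPTH_8U, 3);
        cvWriteFrame(writer, frame);        // call once per captured frame
        cvReleaseVideoWriter(&writer);
        cvReleaseImage(&frame);
        return 0;
    }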

It sounds like you could really benefit from using OpenCV; it could probably handle a lot of this nicely for you. Take a look and see if it suits your needs: http://opencv.willowgarage.com/documentation/cpp/reading_and_writing_images_and_video.html#videocapture

You can use OpenCV to read and write AVI files.
See http://opencv.willowgarage.com/documentation/cpp/reading_and_writing_images_and_video.html
Note that OpenCV can also be used to grab images from a camera.
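For instance, with the C++ interface (camera index, codec, and frame rate are illustrative assumptions):

    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(0);                        // grab from the first camera
        if (!cap.isOpened()) return 1;
        cv::Mat frame;
        cap >> frame;                                   // first frame fixes the size
        cv::VideoWriter out("out.avi",
                            CV_FOURCC('M','J','P','G'), // placeholder codec choice
                            30.0, frame.size());
        while (!frame.empty()) {
            out << frame;                               // append frame to the AVI
            cap >> frame;
        }
        return 0;
    }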

Related

Stream OpenGL framebuffer over HTTP (via FFmpeg)

I have an OpenGL application whose rendered images need to be streamed over the internet to mobile clients. Previously, it sufficed to simply record the rendering into a video file, which is already working; now this should be extended to subsequent streaming.
What is working right now:
Render a scene to an OpenGL framebuffer object
Capture the FBO content using NvIFR
Encode it to H.264 using NvENC (no CPU round trip required)
Download the encoded frame to host memory as a byte array
Append this frame to a video file
None of these steps involves FFmpeg or any other library so far. I now want to replace the last step with "Stream the current frame's byte array over the internet", and I assume that using FFmpeg and FFserver would be a reasonable choice for this. Am I correct? If not, what would be the proper way?
If so, how do I approach this within my C++ code? As pointed out, the frame is already encoded. Also, there is no sound or other stuff, simply an H.264-encoded frame as a byte array that is updated irregularly and should be converted into a steady video stream. I assume that this would be FFmpeg's job and that the subsequent streaming via FFserver would be simple from there. What I don't know is how to feed my data to FFmpeg in the first place, as all FFmpeg tutorials I found (in a non-exhaustive search) work on a file or a webcam/capture device as the data source, not volatile data in main memory.
The file mentioned above that I am already able to create is a C++ file stream to which I append each single frame, meaning that differing frame rates of video and rendering are not treated correctly. This also needs to be taken care of at some point.
Can somebody point me in the right direction? Can I forward data from my application to FFmpeg to build a proper video feed without writing to the hard disk? Tutorials are greatly appreciated. By the way, FFmpeg/FFserver is not mandatory. If you have a better idea for streaming OpenGL framebuffer contents, I'm eager to know.
You can feed the ffmpeg process readily encoded H.264 data and tell it to simply copy the stream into the output multiplexer (-c:v copy). Note that -f h264 must come before -i so that it describes the piped input as raw H.264. To get the data into ffmpeg, launch it as a child process with a pipe connected to its stdin and specify stdin as the read source:
FILE *ffmpeg_in = popen("ffmpeg -f h264 -i /dev/stdin -c:v copy ...", "w");
You can then write your encoded H.264 stream to ffmpeg_in.
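Putting that together, a sketch (POSIX popen; the output target and get_encoded_frame() are placeholder assumptions standing in for your own streaming target and NvENC accessor):

    #include <cstdio>

    // Hypothetical accessor: returns the next encoded H.264 packet, or
    // nullptr when rendering ends; *size receives the packet length.
    const unsigned char* get_encoded_frame(size_t* size);

    int main() {
        // -f h264 before -i declares the piped input as raw H.264;
        // -c:v copy hands it to the muxer without re-encoding.
        FILE* ffmpeg_in = popen(
            "ffmpeg -f h264 -i /dev/stdin -c:v copy -f flv rtmp://example/stream", "w");
        if (!ffmpeg_in) return 1;
        for (;;) {
            size_t n = 0;
            const unsigned char* data = get_encoded_frame(&n);
            if (!data) break;
            fwrite(data, 1, n, ffmpeg_in);  // feed one encoded frame at a time
        }
        pclose(ffmpeg_in);
        return 0;
    }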

Most efficient way to store video data

In order to accomplish some specific editing on some .avi files, I'd like to create an application (in C++) that is able to load, edit, and save those .avi files. But what is the most efficient way? When first thinking about it, a simple 3D array containing a 2D array of pixels for every frame seems the simplest solution; but then its size would be ENORMOUS. I mean, let's assume that a pixel only needs a color. One color would mean 3 bytes (1 char r, 1 char g, 1 char b). If I now have a 1920x1080 video format, this would mean about 6 MB (1920 x 1080 x 3 bytes) for only one frame! This data may or may not be smaller if using pointers for the colors, so that already used colors won't take more size - I don't really know, since I'm pretty new to C++ and the whole low-level stuff. (As a comparison: one of my AVI files recorded with the Xvid codec is 40 seconds long, 30 fps, and only takes 2 MB.)
So how would you actually store the video data (not even the audio, just the video) efficiently, while still being easily able to perform per-frame changes on it?
As you have realised, uncompressed video is enormous and it is not practical to store an entire video in this way.
Video compression is an extremely complex topic, but more-or-less, it works as follows: certain "key-frames" are compressed using fairly standard compression techniques similar or identical to still-photo compression such as JPEG. Frames following key-frames are compressed by comparing the frame with the previous one and looking for changes (such as moving blocks). Every now and again, a new key-frame is used.
You don't really have to worry much about that as you are not going to write your own video coder/decoder (codec). There are standard ones.
What will happen is that your program will decode the compressed video frame-by-frame and keep a certain number of frames in memory while you are working on them and then re-encode them when it is finished. In the uncompressed form, you will have access to the individual pixels and can work on them how you want.
You are probably not going to do that either by yourself - it is very hard. You probably need to use a framework, such as OpenCV. There are a huge number of standard filters and tools built in to these frameworks, and it may be that what you want to do is already implemented somewhere.
The OpenCV framework can return individual frames in a Mat object, and you can then access the pixels directly. See this post: Get Pixels from Mat
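A short sketch of that decode-edit-reencode loop (filenames and codec are placeholders; the edit itself is a trivial example):

    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture in("input.avi");
        if (!in.isOpened()) return 1;
        cv::Mat frame;
        in >> frame;
        cv::VideoWriter out("edited.avi", CV_FOURCC('X','V','I','D'),
                            in.get(CV_CAP_PROP_FPS), frame.size());
        while (!frame.empty()) {
            // Per-pixel access on the decoded Mat: invert the blue channel
            // as a stand-in for whatever per-frame edit you need.
            for (int y = 0; y < frame.rows; ++y)
                for (int x = 0; x < frame.cols; ++x)
                    frame.at<cv::Vec3b>(y, x)[0] = 255 - frame.at<cv::Vec3b>(y, x)[0];
            out << frame;
            in >> frame;
        }
        return 0;
    }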
OpenCV
Tutorial page: OpenCV Tutorial

Video players questions

Given that FFmpeg is the leading multimedia framework and most video/audio players use it, I'm wondering about some things regarding audio/video players that use FFmpeg as an intermediary.
I'm studying and I want to know how audio/video players works and I have some questions.
I was reading the ffplay source code and I saw that ffplay handles the subtitle stream. I tried to use an mkv file with a subtitle in it and it doesn't work. I tried using arguments such as -sst but nothing happened. - I was reading about subtitles and how video files (or should I say containers?) use them. I saw that there are two ways of putting in a subtitle: hardsubs and softsubs - roughly speaking, a hardsub is burned in and becomes part of the video, while a softsub is carried as a separate stream of subtitles (I might be wrong - please correct me).
The question is: how do they handle this? I mean, when the subtitle is part of the video there's nothing to do, the video stream itself shows the subtitle, but what about softsubs? How are they handled? (I heard something about text subs as well.) - How does the subtitle appear on the screen, and how can it be configured (changing fonts, size, colors) without encoding everything again?
I was studying some video players' source code, and some or most of them use OpenGL to render the frame while others use some kind of canvas (such as Qt's QWidget). - What is the most used, and which one is fastest and better? OpenGL with shaders and such? Handling YUV or RGB and so on? How does that work?
It might be a dumb question, but what is the format that AVFrame returns? For example, when we want to save frames as images, first we need the frame and then we convert it - from which format are we converting? Does it change according to the video codec or is it always the same?
Most of the videos I've been trying to handle use YUV420P; I tried to save the frames as PNG and I needed to convert to RGB first. I did a test with the players: I paused them all at the same frame, took screenshots, and compared. The video players show the frames more colorfully. I tried the same with ffplay, which uses SDL (OpenGL), and the colors (quality) of the frames seem to be really low. What might that be? What do they do? Is it shaders (or some kind of magic? haha).
Well, I think that is it for now. I hope you can help me with that.
If this isn't the correct place, please let me know where. I haven't found another place in Stack Exchange communities.
There are a lot of questions in one post:
How are 'soft subtitles' handled
The same way as any other stream:
Read packets from the subtitle stream in the container
Give each packet to a decoder
Use the decoded frame as you wish. With most containers that support subtitles, the presentation time will be present. All you need at this point is to take the text and burn it onto the image at the same presentation time. There are a lot of ways to print the text on the video, with ffmpeg or another library - see the sketch after this list.
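A minimal sketch of those steps with the FFmpeg C API (error handling trimmed; the printing at the end stands in for actually rendering the text onto the frames):

    extern "C" {
    #include <libavformat/avformat.h>
    #include <libavcodec/avcodec.h>
    }
    #include <cstdio>

    int main(int argc, char** argv) {
        AVFormatContext* fmt = nullptr;
        if (avformat_open_input(&fmt, argv[1], nullptr, nullptr) < 0) return 1;
        avformat_find_stream_info(fmt, nullptr);

        // Locate the first subtitle stream and open its decoder.
        int idx = av_find_best_stream(fmt, AVMEDIA_TYPE_SUBTITLE, -1, -1, nullptr, 0);
        if (idx < 0) return 1;
        const AVCodec* dec = avcodec_find_decoder(fmt->streams[idx]->codecpar->codec_id);
        AVCodecContext* ctx = avcodec_alloc_context3(dec);
        avcodec_parameters_to_context(ctx, fmt->streams[idx]->codecpar);
        avcodec_open2(ctx, dec, nullptr);

        AVPacket* pkt = av_packet_alloc();
        while (av_read_frame(fmt, pkt) >= 0) {
            if (pkt->stream_index == idx) {
                AVSubtitle sub;
                int got = 0;
                // The decoded AVSubtitle carries the display start/end times.
                if (avcodec_decode_subtitle2(ctx, &sub, &got, pkt) >= 0 && got) {
                    for (unsigned i = 0; i < sub.num_rects; ++i) {
                        if (sub.rects[i]->type == SUBTITLE_ASS && sub.rects[i]->ass)
                            printf("%s\n", sub.rects[i]->ass);   // text to burn onto frames
                        else if (sub.rects[i]->type == SUBTITLE_TEXT && sub.rects[i]->text)
                            printf("%s\n", sub.rects[i]->text);
                    }
                    avsubtitle_free(&sub);
                }
            }
            av_packet_unref(pkt);
        }
        av_packet_free(&pkt);
        avcodec_free_context(&ctx);
        avformat_close_input(&fmt);
        return 0;
    }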
What is the most used renderer and which one is fastest and better?
The most used depends on the underlying system. For instance, Qt only wraps native renderers, and even has an OpenGL version.
You can only be as fast as the underlying system allows. Does it support double-buffering? Can it render in your decoded pixel format, or do you have to perform color conversion first? This topic is too broad.
Better depends only on the use case; this is too broad.
what is the format that AVFrame returns?
It is a raw format (enum AVPixelFormat), and it depends on the codec. There is a list of YUV and RGB FOURCCs which covers most formats in ffmpeg. Programmatically you can access the table AVCodec::pix_fmts to obtain the pixel formats a specific codec supports.
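For example, converting whatever the decoder produced to RGB for saving as PNG is usually done with libswscale (a sketch; allocation error handling omitted):

    extern "C" {
    #include <libavutil/frame.h>
    #include <libswscale/swscale.h>
    }

    // 'src' is a decoded AVFrame; src->format holds its AVPixelFormat
    // (commonly AV_PIX_FMT_YUV420P for H.264 video).
    AVFrame* to_rgb24(const AVFrame* src) {
        AVFrame* rgb = av_frame_alloc();
        rgb->format = AV_PIX_FMT_RGB24;
        rgb->width  = src->width;
        rgb->height = src->height;
        av_frame_get_buffer(rgb, 0);

        SwsContext* sws = sws_getContext(src->width, src->height,
                                         (AVPixelFormat)src->format,
                                         src->width, src->height,
                                         AV_PIX_FMT_RGB24,
                                         SWS_BILINEAR, nullptr, nullptr, nullptr);
        sws_scale(sws, src->data, src->linesize, 0, src->height,
                  rgb->data, rgb->linesize);
        sws_freeContext(sws);
        return rgb;                      // caller frees with av_frame_free()
    }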

How to read .avi files in C++

I want to read in an .avi video file for a program that I am making. I have the file location saved as a string. Are there any good tutorials on using .avi files in C++, or does anyone know how to read one in? Is it the same as normal files?
I have a previously asked SO question that goes into better detail but here is what I want to do:
I am making a program that will detect faces (through OpenCV). As of now I have been given a video processor program that will detect each face in a frame and return the frame as an image along with the CvRect of each face. I want to take these faces and test them to validate that they are all actually faces.
After I have all the faces (tested) I want to then take the images and test them together. I test the faces on each frame for size and distance changes. If the faces pass this for a frame length of two seconds, then I want to crop the face and make it the subject of each frame.
After each frame is cropped I then want to save the new video file for the user.
Hopefully that helps. If anyone needs a better explanation please let me know.
First of all, a little background.
What is AVI?
AVI stands for Audio Video Interleave. It is a special case of the RIFF (Resource Interchange File Format). AVI is defined by Microsoft and it is the most common format for audio/video data.
I assume you would want to read an AVI file and decode the compressed video frames. An AVI file is just like any other normal file, and you can use fread() (in C) or iostream (in C++) to open an AVI file and read its contents. But the contents of an AVI file are video frames in a compressed format. The compression allows video content of bigger sizes to be efficiently packed in less memory space. To make any sense of this compressed data you would have to decode the encoded data format. You will have to study the standard, which describes how AVI encoding is done, and then extract and decode the frames. This raw video data, when fed to a video device, will be displayed in video format.
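To make the container layer concrete, here is a sketch that walks the top-level RIFF chunks of an AVI with iostream, printing each chunk's FOURCC and size while skipping the payloads (which are the compressed frames discussed above):

    #include <cstdint>
    #include <fstream>
    #include <iostream>

    int main() {
        std::ifstream f("input.avi", std::ios::binary);
        char tag[5] = {0}, form[5] = {0};
        uint32_t size = 0;
        f.read(tag, 4);                                // "RIFF"
        f.read(reinterpret_cast<char*>(&size), 4);     // little-endian: file size - 8
        f.read(form, 4);                               // "AVI "
        std::cout << tag << " form " << form << ", " << size << " bytes\n";

        // Walk the remaining top-level chunks (LIST hdrl, LIST movi, idx1, ...).
        while (f.read(tag, 4) && f.read(reinterpret_cast<char*>(&size), 4)) {
            std::cout << "chunk " << tag << ": " << size << " bytes\n";
            f.seekg(size + (size & 1), std::ios::cur); // payloads are word-aligned
        }
        return 0;
    }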
It seems you are staying within OpenCV, so things are easy. If OpenCV is compiled properly, it is capable of delegating I/O and encoding/decoding to other libraries - QuickTime and others, for example, but best is to use ffmpeg. You open, read, and decode everything using the OpenCV API, which gives you the video frame by frame.
Make sure your OpenCV is compiled with ffmpeg support and then read the OpenCV tutorial on how to read/write AVI files. It's really easy.
Getting OpenCV to be built with ffmpeg support might be hard though. You might want to switch to an older version of OpenCV if you can't get ffmpeg running with the current one.
Personally, I would not spend time trying to read the video yourself; delegate the task to OpenCV. That's how it is supposed to be used.

How do you place EXIF tags into a JPG, having the raw jpeg buffer in C++?

I am having a bit of a problem.
I get a RAW char* buffer from a camera and I need to add these tags before I can save it to disk. Writing the file to disk and reading it back again is not an option, as this will happen thousands of times.
The buffer data I receive from the camera does not contain any EXIF information, apart from the Width, Height and Pixels per Inch.
Any ideas? (C++)
Look at this PDF; on page 20 you have a diagram showing you where to place or modify your EXIF information. What is the difference with a file on disk?
Does the JPEG buffer of your camera already contain an EXIF section?
What's the difference? Why would doing it to a file on the disk be any different from doing it in memory?
Just do whatever it is you do after you read the file from the disk.
As far as I know, EXIF data in a JPEG is a contiguous subpart of the file (an APP1 segment near the start).
So (a sketch of this splice follows the list):
prepare the EXIF data in memory
write the part of the JPEG file up to the EXIF position
write the prepared EXIF
write the rest of the JPEG file
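A sketch of that splice on an in-memory buffer (the EXIF payload itself still has to be built to the spec; exif_payload below is a placeholder for a finished "Exif\0\0" + TIFF block):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Insert an APP1 (EXIF) segment right after the JPEG SOI marker (FF D8).
    std::vector<uint8_t> insert_exif(const uint8_t* jpeg, size_t jpeg_len,
                                     const std::vector<uint8_t>& exif_payload) {
        std::vector<uint8_t> out;
        out.reserve(jpeg_len + exif_payload.size() + 4);
        out.insert(out.end(), jpeg, jpeg + 2);                 // SOI: FF D8
        uint16_t len = (uint16_t)(exif_payload.size() + 2);    // length field counts itself
        out.push_back(0xFF); out.push_back(0xE1);              // APP1 marker
        out.push_back(len >> 8); out.push_back(len & 0xFF);    // big-endian length
        out.insert(out.end(), exif_payload.begin(), exif_payload.end());
        out.insert(out.end(), jpeg + 2, jpeg + jpeg_len);      // rest of the original data
        return out;
    }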
You might want to take a look at the Exiv2 library. I know it can work on files, but I suppose it also has functions to work on memory buffers.
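Something along these lines, assuming Exiv2's memory-buffer overload of ImageFactory::open (check this against the Exiv2 version you build with; the tag values are illustrative):

    #include <exiv2/exiv2.hpp>
    #include <string>

    // Assumed API: ImageFactory::open(const byte*, long) accepts an in-memory JPEG.
    std::string tag_buffer(const Exiv2::byte* data, long size) {
        Exiv2::Image::AutoPtr image = Exiv2::ImageFactory::open(data, size);
        image->readMetadata();
        Exiv2::ExifData& exif = image->exifData();
        exif["Exif.Image.Make"] = "MyCamera";               // placeholder tag values
        exif["Exif.Image.XResolution"] = Exiv2::Rational(300, 1);
        image->writeMetadata();                             // rewrites the in-memory image
        Exiv2::BasicIo& io = image->io();                   // the tagged JPEG bytes live here
        return std::string(reinterpret_cast<const char*>(io.mmap()), io.size());
    }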