Saving images from a video camera to hard disk - c++

I have a Firewire camera whose driver software deposits incoming images into a circular buffer of 16 images. I would like to avoid re-buffering these images, and just write them as fast as possible to disk. So I would prefer to just enqueue a pointer to each buffer as it is filled, and have a separate disk write thread which kept far enough ahead of the incoming images to be confident that it would write them out to disk before the incoming images overwrote them.
Clearly this would depend on the image size and frame rate... but in principle, for VGA images at 30 frames per second, we're talking about needing to write 27.6 MB/sec. This seems quite achievable, particularly if the writing thread can decide to drop occasional frames to keep far enough ahead of the incoming images, and if this strategy fails, to at least detect that an overwrite has invalidated the image, and signal that appropriately (e.g. delete the file after completion).
Comments on the validity of this strategy are welcome... but what I really want to know is what disk writing functions should be used for maximum efficency to get the disk writing rate up as high as possible. E.g. CreateFile() using FILE_FLAG_NO_BUFFERING + WriteFile()?

Related

Streaming images to ffmpeg out of order

I have an algorithm that generates a video frame by frame, which I am refactoring to stream the images to ffmpeg rather than save them all to disk first.
The algorithm currently generates and saves a bunch of QImages then executes ffmpeg (with QProcess). The new version will generate QImages then stream them out to ffmpeg's stdin via QProcess. I can do all that just fine.
The issue is, I'm not actually generating the images serially, I'm generating them on a thread pool, maybe 12-16 at a time (just using QtConcurrent). I'd like to keep this parallelism for performance reasons, since the operation (even excluding image encoding and file I/O) does take a long time.
So my question is, does ffmpeg have something like image2pipe but that can take the images slightly out of order (let's assume there's some sane maximum offset, +/- 20 or so)?
If not, my solution will be to divide the image set into small batches, then for each batch, run it on the pool, reorder it, and stream them in the correct order; sacrificing some fraction of performance depending on the batch size. I'd rather not do that just because the code right now is dirt simple, and simple is good. So I'm wondering if there's a way to get ffmpeg to accept the images out of order.
They're at a fixed frame rate, if that is useful (maybe there's some way to set PTS's or something, then I just find a codec + container that can store out-of-order frames).

Seeking within MP3 file

I am working on the development of driving software for the hardware implementation by these people. The decoder works properly in overall, but I am struggling making it starting playing the sound at the middle. I suspect that it is common feature of the MP3 decoders as they must have some history of data in order to properly construct current sound (I am not that skilled in MPEG, however have an idea of some basics).
The problem is that this decoder is a black box, and any deepening in its code is an enormous time and effort.
I empirically found out that the sound garbage, when starting somewhere in the middle, happens in no more that 1 (one) seconds after start with file # 320 kbps and 44100 sampling rate. I am actually ok to mute decoder for a second (while it gathers/decodes proper required data for further playback), and then unmute it to continue playback.
I did search on the internet for the matter, did not find anything useful. Tried to invalidate first frames by corrupting frame headers (the easiest that could be done without going into the MP3 headers/data), made things even worse.
Questions:
is there any body of knowledge of how players perform seek in MP3 files and keep non-corrupt sound?
Is my action plan seem valid - mute for 1 second while decoder plays garbage? Is there any way to (easily) calculate the time I must mute output for?
Update: just tried on another file # 128 kbps/48k and maximal garbage time to be about 2 seconds... I can not believe that decoder with so limited resources - input buffer used is 2 kB with some intermediate working buffers, in total must be not more than 36 kB - can keep the history for 2 seconds, or decoder is having problems finding the sync word in the stream... and thus my driver needs to figure out the frame start (by finding out sync word, reading frame header, calculating frame size, and looking after the frame to contain another sync word).
I've found workarounds. The difficulty was that there are actually two problems overlaying each other, but was easy to cope with having structured approach.
The decoder is having issues getting the first sync word of the stream, and works very well when the first bytes supplied to it are FF FB or FF FA. All other bytes - in the middle of the frame - with very high probability, cause major sound corruption, until decoder catches correct sync. Thus I designed the code seeking to the next frame start after the seek point, checking that this is actual start of the frame by calculating frame size and looking at the next frame to contain FFFB/FA.
Having fixed the problem 1 I have had minor corruption left from the decoder starting decoding the frame without historical data. I have solved it by muting the decoder for the first 4 buffering transactions.
Major corruption still happens, but is rare, and it seems that nature of corruption depends on what was in the decoder buffers (not only Huffman input buffer, but other intermediate buffers) before the decoder is instructed to start. My hardware performs clear of the input buffers to 0 when decoder is in reset state, but it seems to be not enough (or just incorrect)...
The decoder itself is a kind of PoC (proof of concept) work, a student term with the aim to prove that they were able to make it; the package is having test bench code, but lacks low level documentation/comments in the code, and is not ready for field implementation and production. In general the fact that it worked for me at all (almost) out of the box makes the honor to the developers and is a mark of high quality of their work. I have reviewed and tried several published projects for MP3 decoders for silicon implementation (FPGA) and concluded that this one is the best available. In addition, the license they provide their work on is generous one.
Update: my research have shown that the most problem lies not in the input buffer (however it is possible to improve the situation by uploading 528 bytes of historical data to the decoder's buffer so that it would be able to grab main data from previous frame), but in the internal state of the decoder. Its documentation says:
To reduce resource usage, part of the RAM for buffering the intermediate data is also shared with Huffman decoding as bit reservoir ...
thus it is a contents of the reservoir and intermediate computed data affecting the decoding. I have confirmed it by starting various set of frames in different sequence, and if set of frames are played in different sequence, nature of garbage changes, or garbage may simply not appear.
Thus, unfortunately, my conclusion: it is not possible to properly seek using this decoder as is. I even do not think it is possible to "fake" playback (to quickly "play" the file till the needed point in buffers) as all three clocks are tied to each other.
I will keep my "best tested" implementation, with the notes on the quality.
Update 2: I was wrong, it is possible to seek softly, but to mitigate the sound corruption (yes, I am still unsure if I fixed it completely) I had to find another deficiency in the decoder: it is related to timing, decoder assumes that further data is always available in the buffer, while it may not be there yet. (It is actually clear from the test bench code supplied within the IP - the way data was replenished during QA and testing). In the cases I caught the corruption, first frames in the first part of the input buffer RAM were not decoded properly, skipped, and decoder quickly skips to second part of the RAM, assuming new data is there, however driving hardware is not ready yet fetching required data and putting this data into the second part of decoder's buffer RAM, thus corruption persisted for quite a long time with decoder looping skipping "invalid" frames until it catches correct image of the frame and normalizes its pace through the buffer.
Now the solution:
play (almost) 5 frames of silence through decoder before unmuting it. This will ensure all decoder's internal buffers are purged. It will not take much time, however requires some coding;
introduce a possibility to set huffman's decoder starting pointer readptr (in huffctl.v) after reset into the value other than 0. It will give the flexibility to have some history data uploaded into the decoder's buffer and start huffman decoder from the middle of the buffer rather than from its very start;
calculate the position to seek to, it calculates relatively easily for MPEG-1 Layer-3: duration=(filesize-ID3size)/(bitrate/8*1000), newPosition=ID3size+seekTime*(bitrate/8*1000). Duration is needed to check that position to seek to fits into the play time, alternatively newPosition can be used to check against file size. These forumlas do not take into account older tag versions appearing at the end of the file, but they are usually not more than 128 bytes, thus a kind of negligible for timing calculation relative to average MP3 sound file size; it also assumes CBR (VBR will require completely different way, requiring more power and data I/O for accurate seeking). Funny enough I found web pages with incorrect duration calculation formula, thus beware posts by ignorant people with cool job titles;
Seek to the calculated position, find next frame from this position on, calculate frame size, and ensure that there's next valid frame at that distance. New pointer will point to this next frame found at the distance;
find out the main_data_begin lookback pointer of the frame now being pointed to at step 4. Decrease the new pointer by this value so that pointer points within previous frame to the start of the main data for the current frame - it will be a pointer for the decoder data start. Note that it will fail if main data begins in more than one frame back (removal of headers of previous frame(s) will be required for proper operation);
fill decoder's buffer starting pointer identified in step 5, and set decoder's decoding start pointer to the one identified in step 4. While the implementation assumes you fill buffer in halves, do it different from the start: fill the whole buffer instead of just a first half. For this, after reset, set idle bit, check for data request, reset idle bit, perform two 1024 byte transfers to the decoder's buffer (effectively filling it completely), and then set idle bit, then reset it, and then set it again;
after performing step 7 continue normally replenishing 1024 bytes per decoder's request.
Employing this plan I had zero sound corruption cases. As you see it requires some changes to Verilog, but it must be easy if you know basics or hardware, know Verilog amd can perform reverse engineering.

Is there a standardized method to send JPG images over network with TCP/IP?

I am looking for a standardized approach to stream JPG images over the network. Also desirable would be a C++ programming interface, which can be easily integrated into existing software.
I am working on a GPGPU program which processes digitized signals and compresses them to JPG images. The size of the images can be defined by a user, typically the images are 1024 x 2048 or 2048 x 4096 pixel. I have written my "own" protocol, which first sends a header (image size in bytes, width, height and corresponding channel) and then the JPG data itself via TCP. After that the receiver sends a confirmation that all data are received and displayed correctly, so that the next image can be sent. So far so good, unfortunately my approach reaches just 12 fps, which does not satisfy the project requirements.
I am sure that there are better approaches that have higher frame rates. Which approach do streaming services like Netflix and Amzon take for UHD videos? Of course I googled a lot, but I couldn't find any satisfactory results.
Is there a standardized method to send JPG images over network with TCP/IP?
There are several internet protocols that are commonly used to transfer files over TCP. Perhaps the most commonly used protocol is HTTP. Another, older one is FTP.
Which approach do streaming services like Netflix and Amzon take for UHD videos?
Firstly, they don't use JPEG at all. They use some video compression codec (such as MPEG), that does not only compress the data spatially, but also temporally (successive frames tend to hold similar data). An example of the protocol that they might use to stream the data is DASH, which is operates over HTTP.
I don't have a specific library in mind that already does these things well, but some items to keep in mind:
Most image / screenshare/ video streaming applications use exclusively UDP, RTP,RTSP for the video stream data, in a lossy fashion. They use TCP for control flow data, like sending key commands, or communication between client / server on what to present, but the streamed data is not TCP.
If you are streaming video, see this.
Sending individual images you just need efficient methods to compress, serialize, and de-serialize, and you probably want to do so in a batch fashion instead of just one at a time.Batch 10 jpegs together, compress them, serialize them, send.
You mentioned fps so it sounds like you are trying to stream video and not just copy over images in fast way. I'm not entirely sure what you are trying to do. Can you elaborate on the digitized signals and why they have to be in jpeg? Can they not be in some other format, later converted to jpeg at the receiving end?
This is not a direct answer to your question, but a suggestion that you will probably need to change how you are sending your movie.
Here's a calculation: Suppose you can get 1Gb/s throughput out of your network. If each 2048x4096 file compresses to about 10MB (80Mb), then:
1000000000 ÷ (80 × 1000000) = 12.5
So, you can send about 12 frames a second. This means if you have a continuous stream of JPGs you want to display, if you want faster frame rates, you need a faster network.
If your stream is a fixed length movie, then you could buffer the data and start the movie after enough data is buffered to allow playback at desired frame rate sooner than waiting for the entire movie to download. If you want playback at 24 frames a second, then you will need to buffer at least 2/3rds of the movie before you being playback, because the the playback is twice is fast as your download speed.
As stated in another answer, you should use a streaming codec so that you can also take advantage of compression between successive frames, rather than just compressing the current frame alone.
To sum up, your playback rate will be limited by the number of frames you can transfer per second if the stream never ends (e.g., a live stream).
If you have a fixed length movie, buffering can be used to hide the throughput and latency bottlenecks.

How to create vector of matrices to store large number of images?

I want to create the vector of matrices to stores as many images as possible.
I know that,it is possible as written below:
vector<Mat> images1;
and during the image acquisition from the camera and i would save the images at 100fps with resolution of 1600*800 as below:
images1.push_back(InputImage.clone());
Where InputImage is the Mat and given by the camera. Since creating video during the acquisition process either leads to frame missing in the video or reduction in aquisition speed.
Later after stopping the image acquisition and before stopping the program, I would write the images into video as written below:
VideoWriter writer;
writer = Videowriter("video.avi",-1,100,frameSize(1600,800),false);
for (vector<Mat>::iterator iter = images1.begin(); ier != images1.end(); iter++)
writer.write(*iter);
Is it correct, since I am not sure the images1 can store the images around 1500 images without overflow.
You don't really have to worry about "overflow", whatever that means in your context.
The bigger problem is memory. A single frame takes (at 8 bits per color, with 3 colors) 3 * 1600 * 800 == 3.84Mb. At 100fps, One second of footage requires 0.384Gb of memory. 8GB of memory can only hold about 20 seconds of footage. You'll need almost 24GB of memory before you can hold a whole minute. There's a reason that the vast, vast, vast majority of Video Encoding Software only keeps a few frames of video data in memory at any given time, and dumps the rest to the hard drive (or discards it, depending on what purpose the software is serving).
What you should probably be doing (which is what programs like FRAPS do) is dumping frames to the hard drive as soon as soon as you receive them. Then, when recording finishes, you can either call it a day (if raw video footage is what you need) or you can begin a process of reading the file and encoding it into a more compressed format.
Pre-allocate your image vector in memory so that you just need to copy the frames without real-time allocation.
If you have memory problems, try dumping the frames to a file, the OS will hopefully be able to handle the I/O. If not try memory mapped files.

Running music as SDL_Mixer chunks

Currently, SDL_Mixer has two types of sound resources: chunks and music.
Apart from the API and supported formats limitations, are there any reasons not to load and play music as a SDL_Chunk and channel? (memory, speed, etc.)
The API is the real issue. The "music" APIs are designed to deal with streaming compressed music, while the "sound" APIs aren't. Then again, if you manage to make it work in your app, then it works.
I haven't looked at the SDL code, but my guess would be the "chunks" are intended for smaller sound samples, and are cached in memory, decoded, in their entirety while the "music" is streamed (not cached in memory in its entirety, but decoded and buffered as needed, with the assumption that it would, for the most part, be played from the beginning, and continuously from that point, with maybe some reset back to the beginning occasionally)
So the reason is memory. You don't want to decode say, 4 minutes of a 16-bit stereo song into memory, as it will eat 44100Hz * 2bytes * 2channels *4minutes *60sec/min == 42336000 bytes if you attempt it, when you can decode and buffer smaller pieces of it.
OTOH, if you have the ~10Mb of RAM per minute of music to waste and you need the CPU that would be consumed by the on-the-fly decoding... you could probably use chunks.