I am writing a C++ program (Win64) using C++ Builder 11.1.5 that captures video from a webcam and stores the captured frames in a WMV file using the sink writer interface, as described in the following tutorial:
https://learn.microsoft.com/en-gb/windows/win32/medfound/tutorial--using-the-sink-writer-to-encode-video?redirectedfrom=MSDN
The video doesn't need to run at a real-time 30 frames per second, as the process being recorded is a slow one, so I have set the FPS to 5 (which is fine).
The recording needs to run for about 8-12 hours at a time. Using the algorithms in the sink writer tutorial, I have seen the program's memory consumption rise dramatically after 10 minutes of recording (in excess of 10 GB). I have also seen that the final WMV file only becomes populated when the Finalize routine is called. Because of the memory consumption, the program starts to slow down after a while.
First Question: Is it possible to flush the sink writer to free up RAM while it is recording?
Second Question: Maybe it would be more efficient to save the video in pieces, finalizing the recording every 10 minutes or so and then starting another recording with a different file name, so that when the 8 hours are done the program could combine all the saved WMV files. How would one go about combining numerous WMV files into one large file?
Related
I have a program which gets a stream of raw data from different cameras and writes it to disk. The program runs these sorts of recordings for ~2 minutes and then another program is used to process the frames.
Each raw frame is 2 MB and the frame rate is 30 fps (i.e. the data rate is around 60 MB/s), and I'm writing to an SSD which can easily handle a sustained >150 MB/s (tested by copying 4000 2 MB files from another disk, which took 38 seconds, with Process Explorer showing constant I/O activity).
My issue is that occasionally calls to fopen(), fwrite() and fclose() stall for up to 5 seconds, which means that 300 MB of frames build up in memory as a backlog, and after a few of these delays I hit the 4 GB limit of a 32-bit process. (When the delay happens, Process Explorer shows a gap in I/O activity.)
There is a thread which runs a loop calling this function for every new frame which gets added to a queue:
void writeFrame(const char* data, size_t dataSize, const char* filepath)
{
    // Time block 2
    FILE* pFile = NULL;
    fopen_s(&pFile, filepath, "wb");
    // End Time block 2

    // Time block 3
    fwrite(data, 1, dataSize, pFile);
    // End Time block 3

    // Time block 4
    fclose(pFile);
    // End Time block 4
}
(There's error checking too in the actual code but it makes no difference to this issue)
I'm logging the time it takes for each of the blocks and the total time it takes to run the function and I get results which most of the time look like this: (times in ms)
TotalT,5, FOpenT,1, FWriteT,2, FCloseT,2
TotalT,4, FOpenT,1, FWriteT,1, FCloseT,2
TotalT,5, FOpenT,1, FWriteT,2, FCloseT,2
i.e. ~5 ms to run the whole function, ~1 ms to open the file, ~2 ms to call write and ~2 ms to close the file.
Occasionally, however (on average about 1 in every 50 frames, though sometimes thousands of frames can pass between occurrences), I get frames which take over a second, sometimes over four:
TotalT,4032, FOpenT,4023, FWriteT,6, FCloseT,3
and
TotalT,1533, FOpenT,1, FWriteT,2, FCloseT,1530
All the frames are the same size, and it's never fwrite that takes the extra time; it's always fopen or fclose.
No other process is reading/writing to/from this SSD (confirmed with Process Monitor).
Does anyone know what could be causing this issue and/or any way of avoiding/mitigating this problem?
I'm going to side with X.J., you're probably writing too many files to a single directory.
A solution could be to create a new directory for each batch of frames. Also consider calling SetEndOfFile directly after creating the file, as that will help Windows allocate sufficient space in a single operation.
FAT isn't a real alternative, as it performs even worse with large directories.
Prepare empty files (2 MB files filled with zeros) so that the space is already "ready", then just overwrite these files. Or create a file that is a batch of several frames, so you can reduce the number of files.
There are libraries for doing compression, decompression and playback of videos:
libtheora may be useful because it already compresses frames (though you will need to output the video to a single file) and does that pretty fast (lossy compression, by the way).
Thanks for reading my post.
I have a problem with multithreading an opencv application I was hoping you guys could help me out with.
My aim is to save 400 frames (as JPEGs) from the middle of a video sequence for further examination.
I have the code running fine single threaded, but the multithreading is causing quite a lot of issues so I’m wondering if I have got the philosophy all wrong.
In terms of a schematic of what I should do, would I be best to:
Option 1: somehow simultaneously access the single video file (or make copies?), then have individual threads cycle through the video frame by frame, saving each frame when it is between predetermined limits. E.g. thread 1 saves frames 50 to 100, thread 2 saves frames 101 to 150, etc.
Option 2: open the file once, cycle through it frame by frame, then pass each individual frame to one of a series of unique threads to carry out the saving operation. E.g. frame 1 passed to thread 1 for saving, frame 2 to thread 2, frame 3 to thread 1, frame 4 to thread 2, etc.
Option 3: some other buffer/thread arrangement which is a better idea than the above!
I'm using visual C++ with the standard libs.
Many thanks for your help on this,
Cheers, Kay
Option 1 is what I have tried to do thus far, but because of the errors I was wondering if it was even possible to do this! Can threads usually access the same file? How do I find out how many threads I can have?
Certainly, different threads can access the same file, but it's really a question of whether the supporting libraries support that. For reading a video stream, you can use either OpenCV or ffmpeg (you can use both in the same app: ffmpeg for reading and OpenCV for processing, for example). I haven't looked at the docs, so I'm guessing here: either lib should allow multiple readers on the same file.
To find out the number of cores:
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
int numCPU = sysinfo.dwNumberOfProcessors;
from this post. You would create one thread per core as a starting point, then change the number based on your performance needs and on actual testing.
I'm writing a program to reconstruct TCP streams captured by Snort. Most of the examples I've read regarding session reconstruction either:
load the entire pcap file into memory to start with (not a solution because of hardware constraints and the fact that some of the capture files are 10 GB in size), or
cache each packet in memory as it reads through the capture, discarding the irrelevant ones as it goes; this presents essentially the same problems as reading the entire file into memory.
My current solution was to write my own pcap file parser, since the format is simple. I save the offsets of each packet in a vector and can reload each one after I've passed it. This, like libpcap, only streams one packet into memory at a time; I am only using sequence numbers and flags for ordering, NOT the packet data. Unlike libpcap, it is noticeably slower: processing a 570 MB capture with libpcap takes roughly 0.9 seconds, whereas my code takes 3.2 seconds. However, I have the advantage of being able to seek backwards without reloading the entire capture.
If I were to stick with libpcap for speed issues, I was thinking I could just make a currentOffset variable with an initial value of 24 (the size of the pcap file global header), push it to a vector every time I load a new packet, and increment it every time I call pcap_next_ex by the size of the packet + 16 (for the size of the pcap record header). Then, whenever I wanted to read an individual packet, I could load it using conventional means and seek to packetOffsets[packetNumber].
Is there a better way to do this using libpcap?
Solved the problem myself.
Before I call pcap_next_ex, I push ftell(pcap_file(myPcap)) into a vector&lt;unsigned long&gt;. I manually parse the packets after that as needed.
EZPZ. It just took 24+ hours of racking my brain...
Does anybody know any reasons why IMemAllocator::GetBuffer (DirectShow) hangs, apart from all samples being in use?
I have a DirectShow application which uses the GMFBridge by Geraint Davies to connect two graphs. The GMFBridge is used to be able to switch inputs, but I am not switching in this situation. The application captures audio and video and should do that non-stop, but after about 10 hours it stops. I found out that both audio and video got stuck in a call to IMemAllocator::GetBuffer:
/* _COM_SMARTPTR_TYPEDEF(IMemAllocator, IID_IMemAllocator); */
/* IMemAllocatorPtr m_pCopyAllocator; */
hr = m_pCopyAllocator->GetBuffer(&pOut, NULL, NULL, 0);
If all samples are in use, this function can block, but I am pretty sure this is not the case. There are two threads calling this function, one for the video and one for the audio samples. The Audio thread blocks first, and after the GetBuffer has returned a buffer for almost 60 video samples, the video thread blocks too. (this is about 2 seconds later)
After almost 8 hours both threads continue for a small period, first the audio thread, and after 45 buffers for audio samples have been returned, the video thread unblocks too.
So because both threads do not block at the same time, it looks to me there is not a problem with all samples being in use.
The stacktrace shows a function inside quartz.dll is being called at that moment.
UPDATE
It looks like there was a memory leak caused by decoder filters already installed on the PC. The graph included decoding of MPEG; for example, the audio decoding used a CyberLink decoder. After installing ffdshow, the ffdshow audio + video decoders were used instead, and the problem seems to have disappeared. Lesson learned: do not depend automatically on existing filters.
Not sure that I can debug this from the info given. Can you create a log file (create an empty file c:\gmfbridge.txt, run until it hangs, then zip the file and email it). Also, if you set up your symbols with _NT_SYMBOL_PATH, you could look at the stack trace to see where in quartz.dll the various threads are.
G
I am writing an application that needs to use large audio multi-samples, usually around 50 MB in size. One file contains approximately 80 individual short sound recordings, which can be played back by my application at any time. For this reason all the audio data gets loaded into memory for quick access.
However, loading one of these files can take many seconds to put into memory, meaning my program is temporarily frozen. What is a good way to avoid this happening? It must be compatible with Windows and OS X. It freezes at this: myMultiSampleClass->open(), which has to do a lot of dynamic memory allocation and reading from the file using ifstream.
I have thought of two possible options:
Open the file and load it into memory in another thread so my application process does not freeze. I have looked into the Boost library to do this but need to do quite a lot of reading before I am ready to implement. All I would need to do is call the open() function in the thread then destroy the thread afterwards.
Come up with a scheme to make sure I don't load the entire file into memory at any one time; I just load on the fly, so to speak. The problem is that any sample could be triggered at any time. I know some other software has this kind of system in place, but I'm not sure how it works, and it depends a lot on individual computer specifications: it could work great on my computer but someone with a slow HDD/memory could get very bad results. One idea I had was to load x samples of each audio recording into memory, then if I need to play, begin playback of the samples that already exist whilst loading the rest of the audio into memory.
Any ideas or criticisms? Thanks in advance :-)
Use a memory mapped file. Loading time is initially "instant", and the overhead of I/O will be spread over time.
I like solution 1 as a first attempt -- simple & to the point.
If you are on Windows, you can do asynchronous file operations (what they call OVERLAPPED I/O) to tell the OS to load a file and let you know when it's ready.
I think the best solution is to load a small chunk or single sample of wave data at a time during playback, using asynchronous I/O (as John Dibling mentioned) into a fixed-size playback buffer.
The strategy would be to fill the playback buffer first, then play (this adds a small amount of delay but guarantees continuous playback). While playing one buffer, you can refill another playback buffer on a different thread (overlapped). You need at least two playback buffers, one for playing and one for refilling in the background, and you switch them in real time.
Later you can set the playback buffer size based on the client PC's performance (it's a trade-off between memory size and processing power: a faster CPU allows a smaller buffer and thus lower delay).
You might want to consider a producer-consumer approach. This basically involves reading the sound data into a buffer using one thread, and streaming the data from the buffer to your sound card using another thread.
The data reader is the producer, and streaming the data to the sound card is the consumer. You need high-water and low-water marks so that if the buffer gets full the producer stops reading, and if the buffer gets low the producer starts reading again.
A C++ Producer-Consumer Concurrency Template Library
http://www.bayimage.com/code/pcpaper.html
EDIT: I should add that this sort of thing is tricky. If you are building a sample player, the load on the system varies continuously as a function of which keys are being played, how many sounds are playing at once, how long the duration of each sound is, whether the sustain pedal is being pressed, and other factors such as hard disk speed and buffering, and amount of processor horsepower available. Some programming optimizations that you eventually employ will not be obvious at first glance.