How to detect camera frame loss using Windows media API like Media Foundation or DirectShow? - c++

I am writing an application for Windows that runs a CUDA accelerated HDR algorithm. I've set up an external image signal processor device that presents as a UVC device, and delivers 60 frames per second to the Windows machine over USB 3.0.
Every "even" frame is a more underexposed frame, and every "odd" frame is a more overexposed frame, which allows my CUDA code perform a modified Mertens exposure fusion algorithm to generate a high quality, high dynamic range image.
A very abstract example of the Mertens exposure fusion algorithm is here.
My only problem is that I don't know how to know when I'm missing frames, since the only camera API I have interfaced with on Windows (Media Foundation) doesn't make it obvious that a frame I grab with IMFSourceReader::ReadSample isn't the frame that was received after the last one I grabbed.
Is there any way that I can guarantee that I am not missing frames, or at least easily and reliably detect when I have, using a Windows available API like Media Foundation or DirectShow?
It wouldn't be such a big deal to miss a frame and then have to purposefully "skip" the next frame in order to grab the next overexposed or underexposed frame to pair with the last frame we grabbed, but I would need to know how many frames were actually missed since a frame was last grabbed.
Thanks!

There is the IAMDroppedFrames::GetNumDropped method in DirectShow, and chances are that the same information can be retrieved through Media Foundation as well (never tried - it is possibly obtainable with a method similar to this).
The GetNumDropped method retrieves the total number of frames that the filter has dropped since it started streaming.
However, I would question its reliability. The reason is that with both of these APIs, the attribute that is more or less reliable is the frame's time stamp. Capture devices can flexibly reduce the frame rate for a few reasons, both external, like low light conditions, and internal, like slow blocking processing downstream in the pipeline. This makes it hard to distinguish between odd and even frames, but the time stamp remains accurate, and you can apply frame-rate math to convert it to frame indices.
In your scenario, however, I would rather detect large gaps in frame times to identify possible continuity loss, and from there run an algorithm that compares exposure across the next few consecutive frames to get back in sync with the under-/overexposure alternation. That sounds like a more reliable way out.
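For illustration, a minimal sketch of that timestamp-gap check around IMFSourceReader::ReadSample could look like the following; the 60 fps nominal period and the helper name are assumptions for this example, and error handling is trimmed:

// Sketch only: assumes a nominal 60 fps stream; the helper name is illustrative.
#include <mfreadwrite.h>
#include <cmath>

static const LONGLONG kFramePeriodHns = 10000000LL / 60; // MF timestamps are in 100-ns units

// Reads one sample and reports how many frames appear to have been skipped
// since the previous one (0 means the stream is still contiguous).
HRESULT ReadAndCountMissed(IMFSourceReader* reader, LONGLONG& lastTimestamp, int& missed)
{
    DWORD streamIndex = 0, flags = 0;
    LONGLONG timestamp = 0;
    IMFSample* sample = nullptr;
    HRESULT hr = reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                    0, &streamIndex, &flags, &timestamp, &sample);
    if (FAILED(hr) || !sample)
        return hr;

    // Round the gap to a whole number of frame periods; anything beyond one
    // period means frames were lost. An odd count flips the under/over parity.
    int steps = (int)std::llround(double(timestamp - lastTimestamp) / (double)kFramePeriodHns);
    missed = (lastTimestamp != 0 && steps > 1) ? steps - 1 : 0;

    lastTimestamp = timestamp;
    sample->Release();
    return S_OK;
}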
After all, this exposure problem is highly likely to be specific to the hardware you are using.

Normally MFSampleExtension_Discontinuity is there for this. When you use IMFSourceReader::ReadSample, check this attribute on the sample you get back.
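A minimal sketch of that check (the helper name is illustrative; the attribute is only set on the first sample after a break in the stream):

#include <mfapi.h>

// Returns true if this sample is the first one after a gap in the stream.
bool IsDiscontinuity(IMFSample* sample)
{
    UINT32 discontinuity = FALSE;
    // The attribute is usually absent, so treat "not found" as no discontinuity.
    if (SUCCEEDED(sample->GetUINT32(MFSampleExtension_Discontinuity, &discontinuity)))
        return discontinuity != 0;
    return false;
}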

Related

HoloLens 2 - VLC sensor frames have incorrect timestamps (frames out of order)

I'm using the following repo to access and save the device streams:
https://github.com/microsoft/HoloLens2ForCV
When recording using StreamRecorder, it seems that the timestamps returned by all of the visible light cameras are frequently incorrect, resulting in an out-of-order sequence of frames.
To confirm this, I made a recording while looking at a stopwatch with each visible light camera. There are many frames where the reading on the stopwatch is lower than in the previous frame (despite the frame's larger timestamp). Sometimes a disruption lasting more than 5 frames happens before the timestamps seem to become correct again.
This happens often enough for it to be a serious inconvenience. For a rough idea, I counted 12 times where the stopwatch time decreased compared to the previous frame in a 10 second recording. The out of order frames are very noticeable in the resulting video playback.
I tried using timestamp.SensorTicks instead of timestamp.HostTicks in RMCameraReader.cpp but the issue persisted.
This does not happen with the PV frames or with either mode of the depth sensor frames.
I'm using the latest insider preview build: Windows Version 21H1, OS build 20346.1402
I may be wrong, but I do not recall this issue occurring with the first few insider builds that supported Research Mode; however, I couldn't find the older insider builds online to try.
Is there any way to fix this issue?
Thanks a lot!

GOP size for realtime video stream

I'm working on a kind of rich remote desktop system, with a video stream of the desktop encoded using avcodec/x264. I have to set the GOP size for the stream manually, and so far I have been using a size of fps/2.
But I've just read the following on Wikipedia:
This structure [Group of Pictures] suggests a problem because the fourth frame (a P-frame) is needed in order to predict the second and the third (B-frames). So we need to transmit the P-frame before the B-frames and it will delay the transmission (it will be necessary to keep the P-frame).
It means I'm creating a lot of latency, since the client needs to receive at least half of the GOP to output the first frame following the I-frame. What is the best strategy for the GOP size if I want the smallest latency possible? A GOP of 1 picture?
If you want to minimize latency with h264, you should generally avoid b-frames. This way the decoder has at least a chance to emit decoded frames early. This prevents decoder-induced latency.
You may also want to tune the encoder for latency by reducing/disabling look-ahead. x264 has a "zerolatency" tune which should be a good starting point for finding your optimal settings.
The "GOP" size (which afaik is not really defined for h264; I'll just assume you mean the I(DR)-frame interval) does not necessarily affect the latency. This parameter only affects how long a client will have to wait until it can "sync" on the stream (time-to-first-picture).

OSX pushing pixels to screen with minimum latency

I'm trying to develop some very low-latency graphics applications and am getting really frustrated by how long it takes to draw to screen through OpenGL. Every discussion I find about it online addresses optimizing the OpenGL pipeline, but doesn't get anywhere near the results that I need.
Check this out:
https://www.dropbox.com/s/dbz4bq67cxluhs7/MouseLatency.MOV?dl=0
You probably noticed this before: With a c++ OpenGL app, dragging the mouse around the screen, and drawing the mouse location in OpenGL, the OpenGL lags behind by 3 or 4 frames. Clearly OSX CAN draw [the cursor] to screen with very low latency, but OpenGL is much slower. So let's say I don't need to do any fancy OpenGL rendering. I just want to push pixels to screen somehow. Is there a way for me to bypass OpenGL completely and draw to screen faster? Or is this kind of functionality going to be locked inside the kernel somewhere that I can't reach it?
datenwolf's answer is excellent. I just wanted to add one thing to this discussion regarding triple buffering at the compositor level, since I am very familiar with the Microsoft Windows desktop compositor.
I know you are asking about OS X here, but the implementation details I am going to discuss are the most sensible way of implementing this stuff and I would expect to see other systems work this way too.
Triple buffering as you might enable at the application level adds a third buffer to the swap-chain that is synchronized to refresh. That way of doing triple buffering does add latency, because that third buffer has to be displayed and nothing is allowed to touch it until this happens (this is D3D's mandated behavior -- the behavior and feature itself are undefined in OpenGL); but the way the Desktop Window Manager (Windows) works is slightly different.
The behavior I have seen most drivers implement for desktop composition is frame dropping. In any situation where multiple frames are finished between refreshes, all but one of those frames are discarded. You actually get lower latency using a window rather than fullscreen + triple buffering, because it does not block buffer swaps when the third buffer (owned by the compositor) has a finished frame waiting to be displayed.
It creates a whole different set of visual issues if framerate is not reasonably consistent. Technically, pixels belonging to dropped frames have infinite latency, so the benefits from latency reduction done this way might be worthless if you needed every single frame drawn to appear on screen.
I believe you can get this behavior on OS X (if you want it) by disabling VSYNC and drawing in a window. VSYNC basically only serves as a form of frame pacing (trading latency for consistency) in this scenario, and tearing is eliminated by the compositor itself regardless of what rate you draw at.
Regarding mouse cursor latency:
The cursor in any modern window system will always track with minimum latency. There is literally a feature on graphics hardware called a "hardware cursor," where the driver stores the cursor position and then once per-refresh, has the hardware overlay the cursor on top of whatever is sitting in the framebuffer waiting to be scanned-out. So even if your application is drawing at 30 FPS on a 60 Hz display, the cursor is updated every 16 ms when the hardware cursor's used.
This bypasses all graphics APIs altogether, but is quite limited (e.g. it uses the OS-defined cursor).
TL;DR: Latency comes in many forms.
If your problem is input latency, then you can mitigate that by reducing the number of pre-rendered frames and avoiding triple buffering. I could not begin to tell you how to reduce the number of driver pre-rendered frames on OS X.
Minimize length of time before something shows up on screen
If your problem is the amount of time that passes between executions of your render loop, you would go the other way. Increase pre-rendered frames, draw in a window and disable VSYNC. You may run into a lot of frames that are drawn but never displayed in this scenario.
Minimize time spent blocking (increase FPS); some frames will never be displayed
Pre-rendered frames are a powerful little feature that you do not get control over at the OpenGL API level. It sets up how deeply the driver is allowed to pipeline everything and depending on the desired task you will trade different types of latency by fiddling with it. Many gamers swear by setting this value to 1 to minimize input latency at the cost of overall framerate "smoothness."
UPDATE:
Pre-rendered frames are one reason for your multi-frame delay. Fixing this in a cross-platform way is difficult (it's a driver setting), but if you have access to Fence Sync Objects you can produce the same behavior as forcing this to 1.
I can explain this in more detail if need be, the general idea is that you insert a fence sync after the buffer swap and then wait for it to be signaled before the first command in the next frame is allowed to begin. Performance may take a nose dive, but latency will be minimized since the CPU won't be rendering ahead of the GPU anymore.
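As a rough sketch of that fence-after-swap idea (GL 3.2+ sync objects; SwapBuffersForPlatform is a placeholder for whatever swap call your windowing layer uses):

// Requires a GL 3.2+ header/loader (e.g. <OpenGL/gl3.h> on OS X).
GLsync frameFence = 0;

void BeginFrame()
{
    if (frameFence)
    {
        // Wait for the previous swap's commands to finish before issuing new ones,
        // so the CPU never runs more than one frame ahead of the GPU.
        glClientWaitSync(frameFence, GL_SYNC_FLUSH_COMMANDS_BIT, GLuint64(1000000000));
        glDeleteSync(frameFence);
        frameFence = 0;
    }
}

void EndFrame()
{
    SwapBuffersForPlatform();   // stands in for SwapBuffers / CGLFlushDrawable / etc.
    frameFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}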
There are a number of latencies at play here.
Input event → drawing state latency
In your typical interactive application you have an event loop that usually goes
collect user input
process user input
determine what's to be drawn
draw to the back buffer
swap back to front buffer
With the usual ways in which event–update–display loops are written, there's almost no delay between step 5 of the previous and step 1 of the following iteration, which means that steps 2, 3, and 4 operate with data that lags about one frame period behind.
So this is the first source of latency.
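In sketch form (the function names are placeholders):

// Sketch of the loop above. Input collected in step 1 is consumed by a frame
// whose swap happens roughly one frame period later, hence the baseline lag.
while (running)
{
    Input in = CollectUserInput();   // 1. collect user input
    ProcessInput(in);                // 2./3. process input, decide what to draw
    RenderToBackBuffer();            // 4. draw to the back buffer
    SwapBuffers();                   // 5. swap back to front (paced by V-Sync)
}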
Triple buffering / composition latency
Many graphics pipelines enable triple buffering for smoother display updates. Instead of keeping only a back and a front buffer around, there's also a third buffer in between. The average rate at which these buffers are drawn to is the display refresh rate, and the buffers themselves are stepped at exactly the display refresh period. So this adds another frame period of latency.
If you're running on a system with a window compositor (which is the default on Mac OS X) this effectively adds another buffer stage, so if you've got a double-buffered mode it gives you triple buffering, and if you had triple buffering it'd give you a "quad" buffer (quotes here, because quad buffering is a term usually used for stereoscopic rendering).
What can you do about this:
Turn off composition
Windows (through the DWM API) and Mac OS X allow you to turn off composition or bypass the compositor.
Reducing input lag
Try to collect and integrate the user input as late as possible (use high-resolution sleeps). If you've got only a very simple scene, you can push the drawing quite close to the V-Sync deadline; in fact, the NVidia OpenGL implementation has a vendor-specific extension that allows you to sleep until a specified amount of time before the next V-Sync.
If your scene is complex but separable into parts that require low-latency user input and parts where it doesn't matter so much, you can draw the higher-latency stuff earlier and only at the very last moment integrate user input into the rest. Of course, if the mouse is used to control the viewing direction, or even worse, you're rendering for a VR head-mounted display, things are going to become difficult.

Playing video at frame rates that are not multiples of the refresh rate.

I'm working on an application to stream video to OpenGL textures. My first thought was to lock the rendering loop to 60hz, so to play a video at 30fps or 60fps I would update the texture on every other frame or every frame respectively. How do computers play videos at other frame rates when monitors are at 60hz, or for that matter if a monitor is at 75 hz how do they play 30fps video?
For most consumer devices, you get something like 3:2 pulldown, which basically copies the source video frames unevenly. Specifically, in a 24 Hz video being shown on a 60 Hz display, the frames are alternately doubled and tripled. For your use case (video in OpenGL textures), this is likely the best way to do it, as it avoids tearing.
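As a small worked example, a repeat-nearest mapping from display refreshes to source frames reproduces that cadence (sketch only; the function name is illustrative):

#include <cstdio>

// Pick which source frame to show on each display refresh.
// For 24 fps on a 60 Hz display this yields the alternating 3,2,3,2,... repeats.
int SourceFrameForRefresh(long long refreshIndex, int srcFps, int displayHz)
{
    return (int)(refreshIndex * srcFps / displayHz);   // floor(refresh * src/display)
}

int main()
{
    for (long long r = 0; r < 10; ++r)
        std::printf("refresh %lld -> source frame %d\n", r, SourceFrameForRefresh(r, 24, 60));
    // Prints source frames 0,0,0,1,1,2,2,2,3,3: each frame shown 3 or 2 times.
}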
If you have enough compute ability to run actual resampling algorithms, you can convert any frame rate to any other frame rate. Your choice of algorithm defines how smooth the conversion looks, and different algorithms will work best in different scenarios.
Too much smoothness may cause things like the 120 Hz "soap opera" effect [1][2]:
We have been trained by growing up watching movies at 24 FPS to expect movies to have a certain look and feel to them that is an artifact of that particular frame rate.
When these movies are [processed], the extra sharpness and clearness can make the movies look wrong to viewers, even though the video quality is actually closer to real.
This is commonly called the Soap Opera Effect, because some feel it makes these expensive movies look like cheap shot-on-video soap operas (because the videotape format historically used on soap operas worked at 30 FPS).
Essentially you're dealing with a resampling problem. Your original data was sampled at 30 Hz or 60 Hz, and you have to resample it to another sample rate. The very same algorithms apply. Most of the time you'll find articles about audio signal resampling; just think of each pixel's color channel as an individual waveform you want to resample.

Direct3D 11 missing GetRasterStatus, how do I detect the vertical blank period?

I'm updating an application in which measurement of the time of presentation of a stimulus on a screen requires the greatest amount of accuracy. It is currently written with DirectDraw, which got put out to pasture a long while ago, and there's a need to update our graphics library.
The way we measure the presentation time relies on detecting the end of the vertical blank period. Specifically, I need to know, with the greatest possible accuracy, when whatever was flipped onto the primary surface (or presented in the swap chain) is actually being drawn by the screen. Detecting the scan line can increase the certainty of that measurement, but I would be able to work with only detecting when the vertical blank period ended immediately after the Flip or Present was called.
Direct3D 9 has the IDirect3DDevice9::GetRasterStatus method, which returns a D3DRASTER_STATUS struct that includes an InVBlank boolean describing whether the device is in a vertical blank, as well as the current scan line. DirectDraw has similar functions (IDirectDraw::GetVerticalBlankStatus; also IDirectDraw::GetScanLine, which returns DDERR_VERTICALBLANKINPROGRESS during the vertical blank, can be used to detect the VB).
However I have not been able to find any similar function in Direct3D11. Does anyone know if this functionality was moved or removed between Direct3D9 and Direct3D11, and if the latter, why?
Sorry for the late reply, but I notice there is still no accepted answer so perhaps you never found one that worked. Nowadays on Windows, the DesktopWindowManager service (dwm.exe) coordinates everything and can't really be bypassed. Ever since Windows 8, this service can't be disabled.
So DWM is always going to control the frame rate, render queue management, and final composition for all of the various IDXGISurface(n) objects and IDXGIOutput(n) monitors and there isn't much use in tracking VSync for an offscreen render target, unless I'm missing something (no sarcasm intended). As for your question, I wasn't sure if your goal was to:
obtain extremely precise timing info, but just for diagnostic, profiling, or informational use, or
whether the app was then going to (attempt to) use those results to (attempt to) schedule its own present cycles.
If it's the latter, I believe you can effectively only do this if the D3D app is running in full-screen exclusive mode. That's the only case where the DWM (in the guise of DXGI) will truly trust a client to handle its own Present timing.
The (barely) good news here is that if your interest in VSync is informational only—which is to say that you fall into bullet category (1.) from above—then you can indeed get all the timing data you'd ever want, and at QueryPerformanceFrequency resolution, which is typically around 320 ns.¹
Here's how to get that high-res video timing info. But again, just to be clear, despite the apparent success in obtaining the information as shown below, any attempt to use these interesting results, for example, to condition some deterministic--and thus potentially useful--outcome on the readings you obtain will be destined to fail, that is, entirely thwarted by DWM intermediation:
DWM_TIMING_INFO
Specifies Desktop Window Manager (DWM) composition timing information. Used by the DwmGetCompositionTimingInfo function.
typedef struct _DWM_TIMING_INFO
{
UINT32 cbSize; // size of this DWM_TIMING_INFO structure
URATIO rateRefresh; // monitor refresh rate
QPC_TIME qpcRefreshPeriod; // monitor refresh period
URATIO rateCompose; // composition rate
QPC_TIME qpcVBlank; // query performance counter value before the vertical blank
CFRAMES cRefresh; // DWM refresh counter
UINT cDXRefresh; // DirectX refresh counter
QPC_TIME qpcCompose; // query performance counter value for a frame composition
CFRAMES cFrame; // frame number that was composed at qpcCompose
UINT cDXPresent; // DirectX present number used to identify rendering frames
CFRAMES cRefreshFrame; // refresh count of the frame that was composed at qpcCompose
CFRAMES cFrameSubmitted; // DWM frame number that was last submitted
UINT cDXPresentSubmitted; // DirectX present number that was last submitted
CFRAMES cFrameConfirmed; // DWM frame number that was last confirmed as presented
UINT cDXPresentConfirmed; // DirectX present number that was last confirmed as presented
CFRAMES cRefreshConfirmed; // target refresh count of the last frame confirmed as completed by the GPU
UINT cDXRefreshConfirmed; // DirectX refresh count when the frame was confirmed as presented
CFRAMES cFramesLate; // number of frames the DWM presented late
UINT cFramesOutstanding; // number of composition frames that have been issued but have not been confirmed as completed
CFRAMES cFrameDisplayed; // last frame displayed
QPC_TIME qpcFrameDisplayed; // QPC time of the composition pass when the frame was displayed
CFRAMES cRefreshFrameDisplayed; // vertical refresh count when the frame should have become visible
CFRAMES cFrameComplete; // ID of the last frame marked as completed
QPC_TIME qpcFrameComplete; // QPC time when the last frame was marked as completed
CFRAMES cFramePending; // ID of the last frame marked as pending
QPC_TIME qpcFramePending; // QPC time when the last frame was marked as pending
CFRAMES cFramesDisplayed; // number of unique frames displayed
CFRAMES cFramesComplete; // number of new completed frames that have been received
CFRAMES cFramesPending; // number of new frames submitted to DirectX but not yet completed
CFRAMES cFramesAvailable; // number of frames available but not displayed, used, or dropped
CFRAMES cFramesDropped; // number of rendered frames that were never displayed because composition occurred too late
CFRAMES cFramesMissed; // number of times an old frame was composed when a new frame should have been used but was not available
CFRAMES cRefreshNextDisplayed; // frame count at which the next frame is scheduled to be displayed
CFRAMES cRefreshNextPresented; // frame count at which the next DirectX present is scheduled to be displayed
CFRAMES cRefreshesDisplayed; // total number of refreshes that have been displayed for the application since the DwmSetPresentParameters function was last called
CFRAMES cRefreshesPresented; // total number of refreshes that have been presented by the application since DwmSetPresentParameters was last called
CFRAMES cRefreshStarted; // refresh number when content for this window started to be displayed
ULONGLONG cPixelsReceived; // total number of pixels DirectX redirected to the DWM
ULONGLONG cPixelsDrawn; // number of pixels drawn
CFRAMES cBuffersEmpty; // number of empty buffers in the flip chain
}
DWM_TIMING_INFO;
(Note: To horizontally compress the above source code for display on this website, assume the following abbreviations are prepended:)
typedef UNSIGNED_RATIO URATIO;
typedef DWM_FRAME_COUNT CFRAMES;
Now for apps running in windowed mode, you can certainly grab this detailed information as often as you like. If you only need it for passive profiling, then getting the data from DwmGetCompositionTimingInfo is the modern way to do it.
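A minimal sketch of such a passive query might look like this (per the current docs the HWND argument should be NULL on Windows 8.1 and later; link against dwmapi.lib):

#include <windows.h>
#include <dwmapi.h>
#include <cstdio>

// Sketch: poll the DWM's composition timing information.
void PrintDwmTiming()
{
    DWM_TIMING_INFO info = {};
    info.cbSize = sizeof(info);                                 // must be set before the call
    if (SUCCEEDED(DwmGetCompositionTimingInfo(nullptr, &info))) // NULL = whole-display timing
    {
        std::printf("refresh period: %llu QPC ticks, last vblank: %llu, frames displayed: %llu\n",
                    (unsigned long long)info.qpcRefreshPeriod,
                    (unsigned long long)info.qpcVBlank,
                    (unsigned long long)info.cFramesDisplayed);
    }
}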
And speaking of modern, since the question hinted at modernizing, you'll want to consider using an IDXGISwapChain1 obtained from IDXGIFactory2::CreateSwapChainForComposition to enable the use of the new DirectComposition component.
DirectComposition enables rich and fluid transitions by achieving a high framerate, using graphics hardware, and operating independently of the UI thread. DirectComposition can accept bitmap content drawn by different rendering libraries, including Microsoft DirectX bitmaps, and bitmaps rendered to a window (HWND bitmaps). Also, DirectComposition supports a variety of transformations, such as 2D affine transforms and 3D perspective transforms, as well as basic effects such as clipping and opacity.
Anyway, it seems less likely that detailed timing information might usefully inform an app's runtime behavior; maybe it will help you predict your next VSync, but one does wonder what significance "keen awareness of the blanking period" might have for some particular DWM-subjugated offscreen swap chain.
Because your app's surface is just one of many that the DWM is juggling, the DWM is going to be doing all kinds of dynamic adaptation of its own, under an assumption of each client behaving consistently. Unpredictable adaptations are uncooperative in such a regime, and will likely just end up confounding both parties.
Notes: 1. The resolution of QPC is many orders of magnitude higher than that of the DateTime tick, despite the latter's suggestive use of a 100 ns unit denomination. Think of DateTime.Now.Ticks as a repackaging of the (millisecond-denominated) Environment.TickCount, but converted to 100-ns units. For the highest possible resolution, use the static method Stopwatch.GetTimestamp() instead of DateTime.Now.Ticks.
Another alternative:
There's D3DKMTGetScanLine() which works with D3D9, D3D10, D3D11, D3D12, and even OpenGL.
It's actually a GDI32 function, so you piggyback off Windows' existing graphics adapter handle to poll the VBlank/scan line -- no need to create a Direct3D frame buffer. That's why this API works fine with OpenGL, Mantle, and non-Direct3D renderers too, despite the D3D prefix in its name.
It also tells you the VBlank status and the raster scan line.
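A rough sketch of the polling pattern (assumes d3dkmthk.h from a recent Windows SDK/WDK; the monitor selection and error handling are simplified):

#include <windows.h>
#include <d3dkmthk.h>

// Sketch: poll VBlank/scanline through the kernel-mode thunks exported by gdi32.
void PollScanLine()
{
    // Open the adapter that drives display device 0 (assumed to be the monitor of interest).
    DISPLAY_DEVICEW dd = { sizeof(dd) };
    EnumDisplayDevicesW(nullptr, 0, &dd, 0);
    HDC hdc = CreateDCW(nullptr, dd.DeviceName, nullptr, nullptr);

    D3DKMT_OPENADAPTERFROMHDC open = {};
    open.hDc = hdc;
    D3DKMTOpenAdapterFromHdc(&open);

    D3DKMT_GETSCANLINE scan = {};
    scan.hAdapter = open.hAdapter;
    scan.VidPnSourceId = open.VidPnSourceId;
    D3DKMTGetScanLine(&scan);        // scan.InVerticalBlank and scan.ScanLine are now valid

    D3DKMT_CLOSEADAPTER close = { open.hAdapter };
    D3DKMTCloseAdapter(&close);
    DeleteDC(hdc);
}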
It's useful for beam racing in supremely latency-critical applications. Some virtual reality renderers use beam racing, where even a mere 20 ms of lag can mean the difference between pleasant VR and dizzying/pukeworthy VR.
Beam racing is rendering on the fly, following the scanout of a display. In specialized latency-critical applications, you can reduce the latency from Direct3D Present() to pixels hitting your eyeballs to the absolute minimum (as little as 3 ms).
To understand what beam racing is, see https://www.wired.com/2009/03/racing-the-beam/ -- it was common back in the day when graphics chips had no frame buffers, making beam racing necessary for improved graphics on the Atari 2600, Nintendo, Commodore 64, etc.
For a more modern implementation of beam racing, see Lagless VSYNC ON Algorithm for Emulators.
"Specifically I need to know with, the greatest possible accuracy, when whatever was flipped onto the primary surface (or presented in the swap chain) is actually being drawn by the screen."
Good luck.
There is actually no guarantee that anything you put into the present queue will ever be shown on screen (!!); you can manually drop frames w/ buffer sequencing present flags, or NVIDIA can do it for you (... thanks?)
Buffer Sequencing in DXGI
The DXGI Swapchain's flip queue is generally FIFO, but popular new driver overrides (i.e. FastSync) that users concerned with latency will most assuredly have enabled, favor CPU-side throughput over such trivial things as displaying any of the frames you draw :)
Normally you could count on IDXGISwapChain::Present (...) to begin blocking when the swapchain is full of undisplayed images and the driver is staging commands n-many frames ahead of the GPU, but with FastSync forced, Present never blocks and the render-ahead-queue flushes its work by overwriting any completed frames in the Swapchain that are waiting on VBLANK.
Back-to-back presents that complete quicker than screen refresh are under no obligation to (and will not) scan-out, thus their status in relation to VBLANK is meaningless.
Unless you implement rate limiting yourself to prevent the CPU from immediately staging the next frame after any call to Present, you need a different paradigm for measuring frame status altogether.
D3D9Ex / DXGI Supports Presentation Statistics in Flip / Fullscreen Exclusive:
Frames do not actually present to a user unless the following APIs say they do:
IDXGISwapChain::GetFrameStatistics (...) and IDXGISwapChain::GetLastPresentCount (...)
You can use frame stats to compute the length of the render queue / present latency in real-time, and your timing goals likely can be satisfied by tracking a present # against the accounting information for successfully sync'd frames.
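In sketch form, the bookkeeping might look like this (a flip-model or fullscreen-exclusive swap chain is assumed; DXGI_ERROR_FRAME_STATISTICS_DISJOINT handling is omitted):

#include <dxgi.h>

// Sketch: estimate how many presents are queued but not yet confirmed on glass.
void CheckPresentLatency(IDXGISwapChain* swapChain)
{
    DXGI_FRAME_STATISTICS stats = {};
    UINT lastPresent = 0;

    if (SUCCEEDED(swapChain->GetFrameStatistics(&stats)) &&
        SUCCEEDED(swapChain->GetLastPresentCount(&lastPresent)))
    {
        // Presents issued but not yet confirmed = current depth of the render/present queue.
        UINT queued = lastPresent - stats.PresentCount;

        // stats.SyncRefreshCount / stats.SyncQPCTime identify the refresh on which the
        // last confirmed present actually reached the screen.
        (void)queued;
    }
}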
The question here is why? It looks like you want to solve a symptom of your issue; maybe that's a distraction from your real issue. Waiting for vsync was a useful technique on Amiga or DOS. It is totally wrong on any compositing or multithreading OS.
First, what do you want to achieve? Tearing-free rendering is done by setting a swap interval on either D3D or OpenGL. It is harmful to try to do better than the OS there. Just think about cases like multiple monitors or what happens if more than one app tries to sync.
If you are a client to some other process and want to run your timing on VSync, Windows unfortunately offers no object to wait on as far as I know. Your best bet is to still rely on the Present call and estimate what is happening.
There are two cases: you are either rendering (presenting) faster or slower than vsync. If you are faster, Present should block for you already. If Present never waits and your time between calls is more than 1/60 s, you probably want to render less often.
The most common case why people care about VSync is video. You can render a lot faster than vsync but want to wait for just the right time to present. The only thing to do there is to run a few frames as fast as you can and from that estimate your frame timing. Use some jitter and feedback... or use built-in hardware video that is happy enough to be kernel friends with the video driver.
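A crude sketch of that estimation, timing a few vsync'd Presents with QueryPerformanceCounter (the sample count is arbitrary):

#include <windows.h>
#include <dxgi.h>
#include <algorithm>
#include <vector>

// Sketch: time successive Present(1, 0) calls and take the median interval as
// an estimate of the refresh period (~1/60 s on a 60 Hz display).
double EstimateRefreshSeconds(IDXGISwapChain* swapChain, int samples = 8)
{
    LARGE_INTEGER freq, prev, now;
    QueryPerformanceFrequency(&freq);

    std::vector<double> intervals;
    swapChain->Present(1, 0);                 // warm-up; sync interval 1 waits for vblank
    QueryPerformanceCounter(&prev);
    for (int i = 0; i < samples; ++i)
    {
        swapChain->Present(1, 0);
        QueryPerformanceCounter(&now);
        intervals.push_back(double(now.QuadPart - prev.QuadPart) / double(freq.QuadPart));
        prev = now;
    }
    std::nth_element(intervals.begin(), intervals.begin() + samples / 2, intervals.end());
    return intervals[samples / 2];
}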