DirectShow universal media decoder - c++

I am new to the DirectShow API.
I want to decode a media file and get uncompressed RGB video frames using DirectShow.
I noted that all such operations should be completed through a GraphBuilder. Also, every processing block is called a filter, and there are many different filters for different media files. For example, for decoding H264 we should use "Microsoft MPEG-2 Video Decoder", for AVI files "AVI Splitter Filter", etc.
I would like to know if there is a general way (decoder) that can handle all those file types?
I would really appreciate it if someone could point out an example that goes from importing a local file to decoding it into uncompressed RGB frames. All the examples I found deal with window handles; they just configure the window and call pGraph->Run(). I have also looked through the Windows SDK samples, but couldn't find anything useful.
Thanks very much in advance.

A universal DirectShow decoder in general goes against the concept of the DirectShow API. The whole idea is that individual filters are responsible for individual tasks (especially decoding a certain encoding or demultiplexing a certain container format). The filter registry and Intelligent Connect let you chain filters together to do the requested processing, in particular decoding from a compressed format to 24-bit RGB for video.
From this standpoint you don't need a universal decoder, and such a decoder is not expected to exist. However, such a decoder (or something close to it) does exist: ffdshow or one of its derivatives. Presently, you might want to look at LAV Filters, for example. They wrap FFmpeg, which itself can handle many formats, and connect it to the DirectShow API so that a single filter can handle many formats and encodings.
There is no general rule on whether to use such a codec pack; in most cases you weigh various factors and decide what to do. If your application handles various scenarios, a good starting point for graph building is Overview of Graph Building.
My goal is to accomplish the task using DirectShow in order to have no external dependencies. Do you know a particular example that decompresses frames for some file type?
Your request is too broad and at the same time typical and, to some extent, fairly simple. If you spend some time playing with the GraphEdit SDK tool, or rather GraphStudioNext, a more powerful version of the former, you will be able to build filter graphs interactively, render media files of different types, and see which filters participate in rendering. You can accomplish the very same programmatically too, since each interactive action basically has a matching API call.
You will see that specific formats are handled by different filters, and that Intelligent Connect, mentioned above, builds chains of filters in whatever combination satisfies the request and gets the pipeline together.
The default use case is playback. If you want video rendered to 24/32-bit RGB, your course of action is much the same: you build a graph, which just needs to terminate with something other than the standard video renderer. A more flexible and sophisticated approach, typical for advanced development, is to supply a custom video renderer filter and accept decompressed RGB frames on it.
A simple and very popular version of the solution is to use the Sample Grabber filter: initialize it to accept RGB, set up a callback on it so that your SampleCB callback method is called every time an RGB frame is decompressed, and use the Sample Grabber in the graph. (You will find a lot of attempts to accomplish that if you search open source code and/or the web for the keywords ISampleGrabber, ISampleGrabberCB, SampleCB or BufferCB, MEDIASUBTYPE_RGB24.)
Using the Sample Grabber
DirectShow: Examples for Using SampleGrabber for Grabbing a Frame and Building a VU Meter
Another more or less popular approach is to set up a playback pipeline, play a file, and read frames back from the video presenter. This is suggested in another answer to this question, is relatively easy to do, and does the job if you have no performance requirements and no requirement to extract every single frame. That is, it is a good way to get a random RGB frame from the feed, but not every frame. See this related question:
Different approaches on getting captured video frames in DirectShow

You are looking for the VMR-9 example among the DirectShow samples.
In your Windows SDK's install, look for this example:
Microsoft SDKs\Windows\v7.0\Samples\multimedia\directshow\vmr9\windowless\windowless.sln
Search for the function CaptureImage. Inside it, the call to IVMRWindowlessControl9::GetCurrentImage is exactly what you want.
This method captures a video frame in bitmap format (RGB).
Here is a copy of the CaptureImage code:
BOOL CaptureImage(LPCTSTR szFile)
{
    HRESULT hr;
    if(pWC && !g_bAudioOnly)
    {
        BYTE* lpCurrImage = NULL;
        // Read the current video frame into a byte buffer. The information
        // will be returned in a packed Windows DIB and will be allocated
        // by the VMR.
        if(SUCCEEDED(hr = pWC->GetCurrentImage(&lpCurrImage)))
        {
            BITMAPFILEHEADER hdr;
            DWORD dwSize, dwWritten;
            LPBITMAPINFOHEADER pdib = (LPBITMAPINFOHEADER) lpCurrImage;
            // Create a new file to store the bitmap data
            HANDLE hFile = CreateFile(szFile, GENERIC_WRITE, FILE_SHARE_READ, NULL,
                                      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
            if (hFile == INVALID_HANDLE_VALUE)
            {
                // Free the image before bailing out so that it does not leak
                CoTaskMemFree(lpCurrImage);
                return FALSE;
            }
            // Initialize the bitmap header
            dwSize = DibSize(pdib);
            hdr.bfType = BFT_BITMAP;
            hdr.bfSize = dwSize + sizeof(BITMAPFILEHEADER);
            hdr.bfReserved1 = 0;
            hdr.bfReserved2 = 0;
            hdr.bfOffBits = (DWORD)sizeof(BITMAPFILEHEADER) + pdib->biSize +
                            DibPaletteSize(pdib);
            // Write the bitmap header and bitmap bits to the file
            WriteFile(hFile, (LPCVOID) &hdr, sizeof(BITMAPFILEHEADER), &dwWritten, 0);
            WriteFile(hFile, (LPCVOID) pdib, dwSize, &dwWritten, 0);
            // Close the file
            CloseHandle(hFile);
            // The app must free the image data returned from GetCurrentImage()
            CoTaskMemFree(lpCurrImage);
            // Give user feedback that the write has completed
            TCHAR szDir[MAX_PATH];
            GetCurrentDirectory(MAX_PATH, szDir);
            // Strip off the trailing slash, if it exists
            int nLength = (int) _tcslen(szDir);
            if (szDir[nLength-1] == TEXT('\\'))
                szDir[nLength-1] = TEXT('\0');
            Msg(TEXT("Captured current image to %s\\%s."), szDir, szFile);
            return TRUE;
        }
        else
        {
            Msg(TEXT("Failed to capture image! hr=0x%x"), hr);
            return FALSE;
        }
    }
    return FALSE;
}

Related

Efficiently Capture Frames in DirectX 11

I'm trying to capture every frame of a game I am playing. There are plenty of good screen capture programs out there, one of which is built right into Windows 10.
However, I need a custom approach for various reasons. I'm currently using the DirectX Tool Kit's SaveWICTextureToFile() method to save every frame produced by Present().
https://github.com/microsoft/DirectXTK
For every frame that is captured, I'd like to append its frame number to the end of the file name: ScreenShot_0 through ScreenShot_n.
The method SaveWICTextureToFile() saves the screenshot for you like so:
DirectX::SaveWICTextureToFile(context, backbufferTex, GUID_ContainerFormatJpeg, L"C:/Users/User Name/Desktop/Images/ScreenShot.JPG");
This doesn't allow you to capture frames sequentially. It simply writes over the same file for each frame. The performance however is very smooth. No lagging whatsoever during gameplay.
To try and write a file for each frame I did the following:
#include <sstream>
int Frame_Number;
//For each Call to Present() do the following:
//Get Device
ID3D11Device* device;
HRESULT gd = pSwapChain->GetDevice(__uuidof(ID3D11Device), (void**)&device);
assert(gd == S_OK);
//Get context
ID3D11DeviceContext* context;
device->GetImmediateContext(&context);
//get back buffer
ID3D11Texture2D* backbufferTex;
HRESULT gb = pSwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (LPVOID*)&backbufferTex);
assert(gb == S_OK);
//Set-up Directory
std::wstringstream Image_Directory;
Image_Directory << L"C:/Users/User Name/Desktop/Images/ScreenShot_" << Frame_Number << L".JPG";
//Capture Frame
REFGUID GUID_ContainerFormatJpeg{ 0x19e4a5aa, 0x5662, 0x4fc5, 0xa0, 0xc0, 0x17, 0x58, 0x2, 0x8e, 0x10, 0x57 };
HRESULT hr = DirectX::SaveWICTextureToFile(context, backbufferTex, GUID_ContainerFormatJpeg, Image_Directory.str().c_str());
assert(hr == S_OK);
Frame_Number = Frame_Number + 1;
This worked, however the performance is choppy. As compared to the previous method, I don't get smooth gameplay anymore. Would somebody be able to recommend a more efficient way to do this?
If this is being done in a debug build, I suspect that the stringstream has logic that is causing the problems. I suggest using character arrays and swprintf instead. Even better, keep track of where the directory ends so that you only need to write out the file name part (you could go even further and make it so that you only need to format the number and extension).
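A minimal sketch of that suggestion, assuming the path fits in a fixed buffer of 260 wide characters: the directory and file-name prefix are copied once, and each frame only formats the number and extension with swprintf. The class name FrameFileName and the prefix shown in the usage note are made up for illustration; they are not part of DirectXTK.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: keeps "<directory>/<prefix>" in a fixed buffer and
// rewrites only the "<number>.JPG" tail for every frame.
class FrameFileName
{
public:
    explicit FrameFileName(const wchar_t* prefix)
    {
        // Copy the constant part once; swprintf returns a negative value
        // if the prefix does not fit into the buffer.
        const int written = std::swprintf(path_, kCapacity, L"%ls", prefix);
        if (written < 0)
        {
            path_[0] = L'\0';
            prefixLength_ = 0;
        }
        else
        {
            prefixLength_ = static_cast<std::size_t>(written);
        }
    }

    // Formats only the number and extension right after the stored prefix.
    // Returns nullptr if the result would not fit into the buffer.
    const wchar_t* Format(unsigned frameNumber)
    {
        const int written = std::swprintf(path_ + prefixLength_,
                                          kCapacity - prefixLength_,
                                          L"%u.JPG", frameNumber);
        return written < 0 ? nullptr : path_;
    }

private:
    static constexpr std::size_t kCapacity = 260;
    wchar_t path_[kCapacity];
    std::size_t prefixLength_ = 0;
};
```

With a FrameFileName constructed once from L"C:/Users/User Name/Desktop/Images/ScreenShot_", each call to Present() would pass namer.Format(Frame_Number) to SaveWICTextureToFile instead of Image_Directory.str().c_str(). Whether this removes the stutter depends on where the time actually goes, so it is worth measuring first.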
"It simply writes over the same file for each frame. The performance however is very smooth."
If everything else is equal, there are several possibilities for the slowdown:
The file is being written only in memory and never committed to disk, since it keeps being overwritten.
Opening many files per second is never a good idea, especially on Windows, which is particularly slow at this compared to e.g. Linux.
Writing many files into the same folder is another bad idea, since many filesystems do not handle that case well.
Find out which one of those three is the culprit, and iterate from there.
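One way to narrow it down is to time the save call on its own. A small sketch, assuming C++11; the helper name MeasureMilliseconds is made up, and the callable passed in would wrap the SaveWICTextureToFile call from the question:

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Callable>
double MeasureMilliseconds(Callable&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Comparing the numbers for the overwrite-one-file case against the one-file-per-frame case, and for a fresh folder against one that already holds thousands of files, should show which of the three explanations applies.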
And let us know! It is always interesting to hear how fast IO is nowadays for different use cases :-)

Can ISampleGrabber convert the video frames to a specific mediaType?

I found this nice example on the internet explaining how DirectShow works.
http://alax.info/trac/public/browser/trunk/Utilities/SetLifeCamStudioResolutionSample/SetLifeCamStudioResolutionSample.cpp
In that example there are two sample grabbers: one called the Non-RGB grabber, and one called the RGB grabber.
The first one: (NON-RGB)
#pragma region Non-RGB Sample Grabber
{
    CComPtr<IBaseFilter> pBaseFilter;
    ATLENSURE_SUCCEEDED(pBaseFilter.CoCreateInstance(__uuidof(SampleGrabber)));
    ATLENSURE_SUCCEEDED(pFilterGraph->AddFilter(pBaseFilter, L"Non-RGB Sample Grabber")); // This will connect in MJPG format
    const CComQIPtr<ISampleGrabber> pSampleGrabber = pBaseFilter;
    ATLASSERT(pSampleGrabber);
#if TRUE
    // NOTE: IFilterGraph::Connect would do just fine, but with a real capture device, if we prefer having Smart Tee added, we need to use
    // Capture Graph Builder (only here)
    CComPtr<ICaptureGraphBuilder2> pCaptureGraphBuilder;
    ATLENSURE_SUCCEEDED(pCaptureGraphBuilder.CoCreateInstance(CLSID_CaptureGraphBuilder2));
    ATLENSURE_SUCCEEDED(pCaptureGraphBuilder->SetFiltergraph(pFilterGraph));
    ATLENSURE_SUCCEEDED(pCaptureGraphBuilder->RenderStream(&PIN_CATEGORY_CAPTURE, NULL, pCurrentOutputPin, NULL, pBaseFilter));
#else
    ATLENSURE_SUCCEEDED(pFilterGraph->Connect(pCurrentOutputPin, GetPin(pBaseFilter, 0)));
#endif
    MessageBox(GetActiveWindow(), _T("After Non-RGB Sample Grabber Connected"), _T("Debug"), MB_OK);
    pCurrentOutputPin = GetPin(pBaseFilter, 1);
}
#pragma endregion
The second one: (RGB)
#pragma region RGB Sample Grabber
{
    CComPtr<IBaseFilter> pBaseFilter;
    ATLENSURE_SUCCEEDED(pBaseFilter.CoCreateInstance(__uuidof(SampleGrabber)));
    ATLENSURE_SUCCEEDED(pFilterGraph->AddFilter(pBaseFilter, L"RGB Sample Grabber"));
    const CComQIPtr<ISampleGrabber> pSampleGrabber = pBaseFilter;
    ATLASSERT(pSampleGrabber);
    AM_MEDIA_TYPE MediaType;
    ZeroMemory(&MediaType, sizeof MediaType);
    MediaType.majortype = MEDIATYPE_Video;
    MediaType.subtype = MEDIASUBTYPE_RGB24;
    ATLENSURE_SUCCEEDED(pSampleGrabber->SetMediaType(&MediaType));
    ATLENSURE_SUCCEEDED(pFilterGraph->Connect(pCurrentOutputPin, GetPin(pBaseFilter, 0)));
    MessageBox(GetActiveWindow(), _T("After RGB Sample Grabber Connected"), _T("Debug"), MB_OK);
    pCurrentOutputPin = GetPin(pBaseFilter, 1);
}
#pragma endregion
The method SetMediaType() is used only in the "RGB" version. But I wonder: the MSDN page says that SetMediaType() specifies the type of data that is available on the input pin of the Sample Grabber filter. And if it is possible to use the Sample Grabber without setting the media type, why should I set it to anything?
Questions:
Does the Sample Grabber do any kind of media conversion?
Why should I set the media type of the Sample Grabber?
If the media type from the camera is set to MJPG, and I set the media type to RGB24 in the Sample Grabber, what happens?
Could there be any performance difference in using one over the other? To increase the performance (fps) of the software, should I remove one of the sample grabbers?
Thanks!
The Sample Grabber filter does not do any conversion. This is why it is flexible enough to accept a variety of formats, video and audio included, without being aware of format specifics.
When you set a media type on the Sample Grabber, you force it to use this type only: it accepts this type and rejects all others. Together with Intelligent Connect, this works in such a way that DirectShow might insert additional filters to convert to the requested format, if possible. It is typically possible with 24-bit RGB because it is a sort of "universal uncompressed video format". This is why it is safe to set the media type to 24-bit RGB, whereas setting it to almost any compressed video format is going to fail (unless the source can already supply an exact match).
Note that if Intelligent Connect supplies additional conversion filters, they are attached upstream of the Sample Grabber, not inside it.

WIC WINCODEC_ERR_BADHEADER only for JPEG images

I have a simple encoding/decoding application using the Windows Imaging Component API. The issue I'm having is that when I use either the JPEG XR or BMP formats, everything works fine. However, when I use the JPEG codec, the encoder works fine and I can visually verify the generated JPEG image, but when I try to decode that stream, I get a WINCODEC_ERR_BADHEADER (0x88982f61).
Here's the line that fails:
hr = m_pFactory->CreateDecoderFromStream(
    pInputStream,
    NULL,
    WICDecodeMetadataCacheOnDemand,
    &pDecoder);
Here pInputStream is an IStream created from a byte array (output of the encoder - a black box which outputs a byte vector).
Please help! This is driving me nuts!
When passing a stream as an argument, make sure to seek it to the proper initial position first (in particular, seek it back to the beginning if you just wrote data into it and expect to read that data back). APIs typically do not seek on their own, because that way they let you provide data located in the middle of a bigger stream.

Raw Audio File to AAC using Windows Media Foundation on Windows 7

Thanks for taking some time to read my question.
I'm developing a C++ application using Qt and the Windows API.
I'm recording the microphone output in small 10-second audio files in raw format, and I want to convert them to AAC format.
I have tried to read as much as I could, and thought it would be a good idea to start from the Windows Media Foundation transcode API.
The problem is, I can't seem to use a .raw or .pcm file in the CreateObjectFromURL function, so I'm pretty much stuck here for the moment. It keeps failing. The hr return code equals 3222091460. I have tried passing an .mp3 file to the function and of course it works, so no URL-related human error is involved.
MF_OBJECT_TYPE ObjectType = MF_OBJECT_INVALID;
IMFSourceResolver* pSourceResolver = NULL;
IUnknown* pUnkSource = NULL;
// Create the source resolver.
hr = MFCreateSourceResolver(&pSourceResolver);
if (FAILED(hr))
{
    qDebug() << "Failed !";
}
// Use the source resolver to create the media source.
hr = pSourceResolver->CreateObjectFromURL(
    sURL,                       // URL of the source.
    MF_RESOLUTION_MEDIASOURCE,  // Create a source object.
    NULL,                       // Optional property store.
    &ObjectType,                // Receives the created object type.
    &pUnkSource                 // Receives a pointer to the media source.
    );
MFCreateSourceResolver works fine, but CreateObjectFromURL does not succeed :(
So I have two questions for you folks:
Is it possible to encode raw audio files to AAC files using Windows Media Foundation?
If yes, what should I read to accomplish what I want?
I want to point out that I can't just use ffmpeg or libav because I can't afford any license for my software, and don't want it to be under the GPL license. But if there are alternatives to Windows Media Foundation for encoding raw audio files to AAC, I would be glad to hear them.
And finally, sorry for my bad English; this is obviously not my native language and I'm sorry if I made your eyes bleed (and happy if I made you laugh).
Have a nice day
"The hr return code equals 3222091460"
Those are HRESULT codes. Use this "ShowHresult" tool to have them conveniently decoded for you. The code means 0xC00D36C4 MF_E_UNSUPPORTED_BYTESTREAM_TYPE "The byte stream type of the given URL is unsupported."
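To decode such a value by hand: an HRESULT is a 32-bit value, so the decimal number printed by qDebug() only needs to be shown in hex and split into its fields. A small portable sketch (the names HresultParts, SplitHresult and HresultToHex are made up for this example):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Fields of an HRESULT as laid out in the 32-bit value.
struct HresultParts
{
    bool failure;            // severity bit (bit 31): set means failure
    std::uint32_t facility;  // bits 16..26: which subsystem raised the code
    std::uint32_t code;      // bits 0..15: the facility-specific status code
};

// Splits a raw 32-bit HRESULT value into severity, facility and code.
HresultParts SplitHresult(std::uint32_t hr)
{
    return { (hr >> 31) != 0, (hr >> 16) & 0x7FFu, hr & 0xFFFFu };
}

// Formats the value the way it is usually written in documentation.
std::string HresultToHex(std::uint32_t hr)
{
    char text[11];
    std::snprintf(text, sizeof text, "0x%08X", static_cast<unsigned>(hr));
    return text;
}
```

For 3222091460 this gives 0xC00D36C4, with the failure bit set, facility 13 (FACILITY_MEDIASERVER in winerror.h, as far as I recall) and code 0x36C4. Searching for the hex form is what leads to MF_E_UNSUPPORTED_BYTESTREAM_TYPE.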
The problem is basically that there is no support for these raw files. .WAV is a good source for raw audio: the file holds both the format descriptor and the payload.
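As an illustration of what the .WAV container adds, here is a sketch that prepends the canonical 44-byte PCM header to a raw buffer. It assumes uncompressed integer PCM whose sample rate, channel count and bit depth you already know from your recording code, and a payload under 4 GB with an even byte count; WrapPcmInWav and the two small helpers are made-up names.

```cpp
#include <cstdint>
#include <vector>

// Appends the low `bytes` bytes of `value` in little-endian order.
static void AppendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes)
{
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFFu));
}

// Appends a four-character chunk tag such as "RIFF".
static void AppendTag(std::vector<std::uint8_t>& out, const char (&tag)[5])
{
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical helper: returns header + payload for raw integer PCM samples.
std::vector<std::uint8_t> WrapPcmInWav(const std::vector<std::uint8_t>& pcm,
                                       std::uint32_t sampleRate,
                                       std::uint16_t channels,
                                       std::uint16_t bitsPerSample)
{
    const std::uint32_t blockAlign = channels * bitsPerSample / 8u;
    const std::uint32_t dataSize = static_cast<std::uint32_t>(pcm.size());

    std::vector<std::uint8_t> wav;
    wav.reserve(44 + pcm.size());
    AppendTag(wav, "RIFF");
    AppendLE(wav, 36 + dataSize, 4);           // size of everything after this field
    AppendTag(wav, "WAVE");
    AppendTag(wav, "fmt ");
    AppendLE(wav, 16, 4);                      // size of the PCM format block
    AppendLE(wav, 1, 2);                       // format tag 1 = integer PCM
    AppendLE(wav, channels, 2);
    AppendLE(wav, sampleRate, 4);
    AppendLE(wav, sampleRate * blockAlign, 4); // average bytes per second
    AppendLE(wav, blockAlign, 2);              // bytes per sample frame
    AppendLE(wav, bitsPerSample, 2);
    AppendTag(wav, "data");
    AppendLE(wav, dataSize, 4);
    wav.insert(wav.end(), pcm.begin(), pcm.end());
    return wav;
}
```

Writing the returned bytes to a .wav file should give the source resolver a byte stream type it recognizes, though I have not tested this against your recorder's output.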
You can of course read data from the raw audio file yourself and compress it into AAC using Media Foundation's AAC Encoder via its IMFTransform interface. This is reasonably easy, and you get AAC data on the output that you can, for example, write into a raw .AAC file.
Alternatives to Media Foundation are DirectShow (there are suitable codecs, though I think it might not be so easy to start with), libfaac, and FFmpeg's libavcodec (available under LGPL, not GPL).

Virtual Webcam Driver

I want to develop a virtual webcam driver: from user mode I'll pass images to it, and it will present them as webcam output.
I don't want to use a DirectX filter, CSourceStream, etc., because they don't work with some programs that don't use DirectX for capturing the webcam image.
So I have to write a kernel-mode device driver.
Any ideas? I tried testcap from the DDK samples, but it doesn't process images from user mode and doesn't take any input; it just displays 7 colors in the webcam...
Any help would be greatly appreciated.
Thanks
Thank you all!
I tried code from here:
http://tmhare.mvps.org/downloads.htm (find Capture Source Filter)
It worked well in Yahoo and MSN once I compiled it, but it crashed AIM, the Internet Explorer Flash webcam, the Firefox Flash webcam and Skype... I got a crash in QueryInterface after the 8th call to it; I found that by tracing it with a lot of tricks.
Now I know it crashes on the 8th call to
HRESULT CVCamStream::QueryInterface(REFIID riid, void **ppv)
On the 8th call it reaches the last if, I mean:
return CSourceStream::QueryInterface(riid, ppv);
It's on line 17 of Filters.cpp
Why do you think I'm getting a crash?
Thank you all for guiding me to the correct solution, which is DirectShow, not a driver.
There are several APIs from Microsoft which provide access to image data.
TWAIN: Used for single image capture from scanners, etc.
WIA: This seems to have degenerated to a single image codec library.
VfW: A very old (Win16) API which really works only for video-file encoding/decoding, but has support for some video acquisition.
DirectShow: previously part of the DirectX SDK, currently in the Platform SDK. This is the place to go for current (general) streaming solutions.
Windows Media/Media Foundation: This seems more to be geared at video playback/reencoding.
Manufacturer Specific Libraries: Pylon/Halcon/Imaging Control/...
DirectShow specific:
To create image acquisition devices under Windows, you have to provide either a device (driver) which implements the Stream Class interfaces (or the newer AVStream), or you have to write a user-mode COM object which has to be added to the VideoInputCategory enumerator.
The AVStream sample provides everything for a real image acquisition device. Only the lower layer for the actual device is missing.
If you can design a device, you should make it either DCAM or UVC compatible. For both there are built-in drivers supplied by Windows.
How to write a software source device:
You have to create a DirectShow filter which provides at least one output pin and register it under the VideoInputCategory. There may be several interfaces that certain applications require from a capture source, but these depend on the application itself. Simple applications for trying out filters are GraphEdit and AMCap, which are supplied in the Platform SDK.
Some code:
#include <InitGuid.h>
#include <streams.h>
const AMOVIESETUP_MEDIATYPE s_VideoPinType =
{
    &MEDIATYPE_Video,       // Major type
    &MEDIATYPE_NULL         // Minor type
};
const AMOVIESETUP_PIN s_VideoOutputPin =
{
    L"Output",              // Pin string name
    FALSE,                  // Is it rendered
    TRUE,                   // Is it an output
    FALSE,                  // Can we have none
    FALSE,                  // Can we have many
    &CLSID_NULL,            // Connects to filter
    NULL,                   // Connects to pin
    1,                      // Number of types
    &s_VideoPinType         // Pin details
};
const AMOVIESETUP_FILTER s_Filter =
{
    &CLSID_MyFilter,        // Filter CLSID
    L"bla",                 // String name
    MERIT_DO_NOT_USE,       // Filter merit
    1,                      // Number pins
    &s_VideoOutputPin       // Pin details
};
REGFILTER2 rf2;
rf2.dwVersion = 1;
rf2.dwMerit = MERIT_DO_NOT_USE;
rf2.cPins = 1;
rf2.rgPins = s_Filter.lpPin;
HRESULT hr = pFilterMapper->RegisterFilter( CLSID_MyFilter, _FriendlyName.c_str(), 0,
    &CLSID_VideoInputDeviceCategory, _InstanceID.c_str(), &rf2 );
if( FAILED( hr ) )
{
    return false;
}
std::wstring inputCat = GUIDToWString( CLSID_VideoInputDeviceCategory );
std::wstring regPath = L"CLSID\\" + inputCat + L"\\Instance";
win32_utils::CRegKey hKeyInstancesDir;
LONG rval = openKey( HKEY_CLASSES_ROOT, regPath, KEY_WRITE, hKeyInstancesDir );
if( rval == ERROR_SUCCESS )
{
    win32_utils::CRegKey hKeyInstance;
    rval = createKey( hKeyInstancesDir, _InstanceID, KEY_WRITE, hKeyInstance );
    ....
_InstanceID is a GUID created for this 'virtual device' entry.
You cannot decide how other programs will call your driver. Most programs will use DirectShow. Some will use the Win3.x technology VFW. Many newer programs, including Windows XP's Scanner and Camera Wizard, may call you via the WIA interface. If you do not want to implement all of that, you need to at least provide the DirectShow interface via WDM and let vfwwdm32.dll give you a VFW interface, or write your own VFW driver.