I have a simple encoding/decoding application using the Windows Imaging Component API. When I use either the JPEG XR or BMP format, everything works fine. However, with the JPEG codec the encoder works fine and I can visually verify the generated JPEG image, but when I try to decode that stream I get WINCODEC_ERR_BADHEADER (0x88982f61).
Here's the line that fails:
hr = m_pFactory->CreateDecoderFromStream(
pInputStream,
NULL,
WICDecodeMetadataCacheOnDemand,
&pDecoder);
Here pInputStream is an IStream created from a byte array (output of the encoder - a black box which outputs a byte vector).
Please help! This is driving me nuts!
When passing a stream as an argument, make sure to seek it to the proper initial position first (in particular, seek it back to the beginning if you have just written data into it and expect to read that data back). APIs typically do not seek on your behalf, because that lets you supply data located in the middle of a larger stream.
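The pitfall can be reproduced without WIC. The following is a minimal sketch, assuming a hypothetical MemoryStream class that, like an IStream, keeps a single cursor shared by reads and writes, and a toy startsWithJpegHeader check standing in for the decoder (neither name comes from WIC). With a real IStream, the equivalent rewind is a call to IStream::Seek with a zero offset and STREAM_SEEK_SET before handing the stream to CreateDecoderFromStream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an IStream: one cursor shared by reads and writes.
class MemoryStream {
public:
    // Write at the cursor (growing the buffer if needed) and advance the cursor.
    void write(const std::uint8_t* data, std::size_t size) {
        if (size == 0) return;
        if (pos_ + size > bytes_.size()) bytes_.resize(pos_ + size);
        std::memcpy(bytes_.data() + pos_, data, size);
        pos_ += size;
    }

    // Read up to `size` bytes from the cursor; returns the number actually read.
    std::size_t read(std::uint8_t* out, std::size_t size) {
        const std::size_t available = bytes_.size() - pos_;
        const std::size_t n = size < available ? size : available;
        if (n > 0) std::memcpy(out, bytes_.data() + pos_, n);
        pos_ += n;
        return n;
    }

    // Move the cursor to an absolute position.
    void seek(std::size_t pos) {
        assert(pos <= bytes_.size());
        pos_ = pos;
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;
};

// Toy "decoder": accepts the stream only if the JPEG start-of-image marker
// (FF D8) is found at the stream's current position. It never seeks itself.
bool startsWithJpegHeader(MemoryStream& stream) {
    std::uint8_t marker[2] = {0, 0};
    return stream.read(marker, 2) == 2 && marker[0] == 0xFF && marker[1] == 0xD8;
}
```

Right after the encoder has written its output, the cursor sits at the end of the data, so the check reads nothing and reports a bad header; after seek(0) the same stream is accepted.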
Related
I am new to DirectShow API.
I want to decode a media file and get uncompressed RGB video frames using DirectShow.
I noted that all such operations should be completed through a GraphBuilder. Also, every processing block is called a filter, and there are many different filters for different media files. For example, for decoding H264 we should use "Microsoft MPEG-2 Video Decoder", for AVI files "AVI Splitter Filter" etc.
I would like to know if there is a general way (decoder) that can handle all those file types?
I would really appreciate if someone can point out an example that goes from importing a local file to decoding it into uncompressed RGB frames. All the examples I found are dealing with window handles and they just configure it and call pGraph->run(). I have also surfed through Windows SDK samples, but couldn't find useful samples.
Thanks very much in advance.
A universal DirectShow decoder is, in general, against the concept of the DirectShow API. The whole idea is that individual filters are responsible for individual tasks (esp. decoding a certain encoding or demultiplexing a certain container format). The filter registry and Intelligent Connect let you have the filters built into a chain that performs the requested processing, in particular decoding from a compressed format to 24-bit RGB for video.
From this standpoint you don't need a universal decoder, and such a decoder is not expected to exist. However, such a decoder (or something close to it) does exist: ffdshow or one of its derivatives. Presently, you might want to look at LAVFilters, for example. They wrap FFmpeg, which itself can handle many formats, and connect it to the DirectShow API so that a single filter can handle many formats/encodings.
There is no general rule to use or not use such codec pack, in most cases you take into consideration various factors and decide what to do. If your application handles various scenarios, a good starting point into graph building would be Overview of Graph Building.
My goal is to accomplish the task using DirectShow in order to have no external dependencies. Do you know a particular example that does uncompressing frames for some file type?
Your request is too broad and at the same time typical and, to some extent, fairly simple. If you spend some time playing with the GraphEdit SDK tool, or rather GraphStudioNext, which is a more powerful version of the former, you will be able to build a filter graph interactively, render media files of different types, and see which filters participate in rendering. You can accomplish the very same programmatically too, since the interactive actions all have matching API calls.
You will be able to see that specific formats are handled by different filters and Intelligent Connect mentioned above is building chains of filters in combinations in order to satisfy the requests and get the pipeline together.
The default use case is playback, and if you want to get video rendered to 24/32-bit RGB, your course of action is much the same: you build a graph, which just needs to terminate with something other than the standard video renderer. A more flexible and sophisticated approach, typical for advanced development, is to supply a custom video renderer filter and accept decompressed RGB frames on it.
A simple and very popular version of the solution is to use the Sample Grabber filter: initialize it to accept RGB, set up a callback on it so that your SampleCB callback method is called every time an RGB frame is decompressed, and use the Sample Grabber in the graph. (You will find a lot of attempts to accomplish that if you search open source code and/or the web for the keywords ISampleGrabber, ISampleGrabberCB, SampleCB or BufferCB, MEDIASUBTYPE_RGB24.)
Using the Sample Grabber
DirectShow: Examples for Using SampleGrabber for Grabbing a Frame and Building a VU Meter
Another more or less popular approach is to set up a playback pipeline, play a file, and read back frames from the video presenter. This is suggested in another answer to the question, is relatively easy to do, and does the job if you have no performance requirements and no need to extract every single frame. That is, it is a good way to get a random RGB frame from the feed, but not every frame. See related on this:
Different approaches on getting captured video frames in DirectShow
You are looking for the vmr9 example in the DirectShow library.
In your Windows SDK's install, look for this example:
Microsoft SDKs\Windows\v7.0\Samples\multimedia\directshow\vmr9\windowless\windowless.sln
Search for the function CaptureImage; in this method, see IVMRWindowlessControl9::GetCurrentImage, which is exactly what you want.
This method captures a video frame in bitmap format (RGB).
Here is a copy of the CaptureImage code:
BOOL CaptureImage(LPCTSTR szFile)
{
HRESULT hr;
if(pWC && !g_bAudioOnly)
{
BYTE* lpCurrImage = NULL;
// Read the current video frame into a byte buffer. The information
// will be returned in a packed Windows DIB and will be allocated
// by the VMR.
if(SUCCEEDED(hr = pWC->GetCurrentImage(&lpCurrImage)))
{
BITMAPFILEHEADER hdr;
DWORD dwSize, dwWritten;
LPBITMAPINFOHEADER pdib = (LPBITMAPINFOHEADER) lpCurrImage;
// Create a new file to store the bitmap data
HANDLE hFile = CreateFile(szFile, GENERIC_WRITE, FILE_SHARE_READ, NULL,
CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
if (hFile == INVALID_HANDLE_VALUE)
{
// Free the image allocated by the VMR before bailing out
CoTaskMemFree(lpCurrImage);
return FALSE;
}
// Initialize the bitmap header
dwSize = DibSize(pdib);
hdr.bfType = BFT_BITMAP;
hdr.bfSize = dwSize + sizeof(BITMAPFILEHEADER);
hdr.bfReserved1 = 0;
hdr.bfReserved2 = 0;
hdr.bfOffBits = (DWORD)sizeof(BITMAPFILEHEADER) + pdib->biSize +
DibPaletteSize(pdib);
// Write the bitmap header and bitmap bits to the file
WriteFile(hFile, (LPCVOID) &hdr, sizeof(BITMAPFILEHEADER), &dwWritten, 0);
WriteFile(hFile, (LPCVOID) pdib, dwSize, &dwWritten, 0);
// Close the file
CloseHandle(hFile);
// The app must free the image data returned from GetCurrentImage()
CoTaskMemFree(lpCurrImage);
// Give user feedback that the write has completed
TCHAR szDir[MAX_PATH];
GetCurrentDirectory(MAX_PATH, szDir);
// Strip off the trailing slash, if it exists
int nLength = (int) _tcslen(szDir);
if (szDir[nLength-1] == TEXT('\\'))
szDir[nLength-1] = TEXT('\0');
Msg(TEXT("Captured current image to %s\\%s."), szDir, szFile);
return TRUE;
}
else
{
Msg(TEXT("Failed to capture image! hr=0x%x"), hr);
return FALSE;
}
}
return FALSE;
}
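The header arithmetic in CaptureImage relies on the layout of a packed DIB. As a rough illustration, here is a portable sketch of the same size calculations; the helper names dibStride, dibPixelBytes and bmpPixelOffset are hypothetical (they are not the DibSize/DibPaletteSize helpers of the SDK sample), and the sketch assumes an uncompressed bitmap with the height given as a positive number.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per scan line of an uncompressed DIB: each row is padded
// up to a multiple of 4 bytes.
std::uint32_t dibStride(std::uint32_t width, std::uint32_t bitsPerPixel) {
    return ((width * bitsPerPixel + 31) / 32) * 4;
}

// Bytes occupied by the pixel data of an uncompressed DIB.
std::uint32_t dibPixelBytes(std::uint32_t width, std::uint32_t height,
                            std::uint32_t bitsPerPixel) {
    return dibStride(width, bitsPerPixel) * height;
}

// Offset of the pixel data inside a .bmp file: the 14-byte file header,
// then the info header, then the optional palette. This mirrors how
// bfOffBits is computed in the sample above.
std::uint32_t bmpPixelOffset(std::uint32_t infoHeaderSize,
                             std::uint32_t paletteBytes) {
    const std::uint32_t kFileHeaderSize = 14;  // sizeof(BITMAPFILEHEADER)
    return kFileHeaderSize + infoHeaderSize + paletteBytes;
}
```

For a 24-bit image that is 3 pixels wide, dibStride gives 12 bytes per row rather than 9, which is why code that walks the returned buffer should step by the stride and not by width * 3.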
I am trying to get at specific frame from a video file using OpenCV 2.4.11.
I have tried to follow the documentation and online tutorials of how to do it correctly and have now tested two approaches:
1) The first method is brute force reading each frame using the video.grab() until I reach the specific frame (timestamp) I want. This method is slow if the specific frame is late in the video sequence!
string videoFile(videoFilename);
VideoCapture video(videoFile);
double videoTimestamp = video.get(CV_CAP_PROP_POS_MSEC);
int videoFrameNumber = static_cast<int>(video.get(CV_CAP_PROP_POS_FRAMES));
while (videoTimestamp < targetTimestamp)
{
videoTimestamp = video.get(CV_CAP_PROP_POS_MSEC);
videoFrameNumber = static_cast<int>(video.get(CV_CAP_PROP_POS_FRAMES));
// Grab the frame (but don't decode it, as we are only "fast forwarding").
// Stop if grab() fails, e.g. at the end of the file, so the loop cannot run forever.
if (!video.grab())
break;
}
// Get and save frame
if (video.retrieve(frame))
{
char txtBuffer[100];
sprintf(txtBuffer, "Video1Frame_Target_%f_TS_%f_FN_%d.png", targetTimestamp, videoTimestamp, videoFrameNumber);
string imgName = txtBuffer;
imwrite(imgName, frame);
}
2) The second method uses video.set(...). This method is faster and doesn't seem to get any slower if the specific frame is late in the video sequence.
string videoFile(videoFilename);
VideoCapture video2(videoFile);
videoTimestamp = video2.get(CV_CAP_PROP_POS_MSEC);
videoFrameNumber = static_cast<int>(video2.get(CV_CAP_PROP_POS_FRAMES));
video2.set(CV_CAP_PROP_POS_MSEC, targetTimestamp);
while (videoTimestamp < targetTimestamp)
{
videoTimestamp = video2.get(CV_CAP_PROP_POS_MSEC);
videoFrameNumber = (int)video2.get(CV_CAP_PROP_POS_FRAMES);
// Grab the frame (but don't decode it, as we are only "fast forwarding").
// Stop if grab() fails, e.g. at the end of the file, so the loop cannot run forever.
if (!video2.grab())
break;
}
// Get and save frame
if (video2.retrieve(frame))
{
char txtBuffer[100];
sprintf(txtBuffer, "Video2Frame_Target_%f_TS_%f_FN_%d.png", targetTimestamp, videoTimestamp, videoFrameNumber);
string imgName = txtBuffer;
imwrite(imgName, frame);
}
Problem) Now the issue is that the two methods end up with the same frame number, but the content of the target image frame is not the same?!?
I am tempted to conclude that Method 1 is the correct one and there is something wrong with the OpenCV video.set(...) method. But if I use the VLC player finding the approximate target frame position it is actually Method 2 that is closest to a "correct" result?
As some extra info: I have tested the same video sequence but in two different video files being encoded with respectively 'avc1' MPG4 and 'wmv3' WMV codec.
Using the WMV file the two found frames are way off?
Using the MPG4 file the two found frames are only slightly off?
Is there anybody having some experience with this, can explain my findings and tell me the correct way to get a specific frame from a video file?
Obviously there's still a bug in OpenCV/FFmpeg.
FFmpeg doesn't deliver the frames that are wanted and/or OpenCV doesn't handle this. See here and here.
[Edit:
Until that bug is fixed (either in FFmpeg or, as a work-around, in OpenCV), the only way to get an exact frame by number is to "fast forward" as you did.
(Concerning VLC player: I suspect that it uses that buggy set() interface. For a player, frame-exact seeking is usually not too important, but for an editor it is.)]
I think that OpenCV uses FFmpeg for video decoding.
We once had a similar problem but used FFmpeg directly. By default, random (but exact) frame access isn't guaranteed. The WMV decoder was particularly fuzzy.
Newer versions of FFmpeg give you access to lower-level routines which can be used to build a retrieval function for frames. This solution was a little involved and nothing I can remember off the top of my head right now. I will try to find some more details later.
As a quick work-around, I would suggest decoding your videos off-line and then working on sequences of images. Though this increases the amount of storage needed, it guarantees exact random frame access. You can use FFmpeg to convert your video file into a sequence of images like this:
ffmpeg -i "input.mov" -an -f image2 "output_%05d.png"
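Once the frames are on disk, random access becomes a file name lookup. The sketch below is only an illustration under stated assumptions: the helper names frameIndexForTimestamp and frameFileName are hypothetical, the video is assumed to have a constant frame rate with one image written per source frame, and the numbering is assumed to start at 1, which is the default start number of FFmpeg's image2 muxer.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Zero-based frame index for a timestamp, assuming a constant frame rate.
long frameIndexForTimestamp(double timestampMs, double fps) {
    assert(timestampMs >= 0.0 && fps > 0.0);
    return static_cast<long>(std::floor(timestampMs * fps / 1000.0));
}

// File name written by the "output_%05d.png" pattern for a zero-based
// frame index. The sequence is assumed to start at 1, hence the +1.
std::string frameFileName(long zeroBasedIndex) {
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "output_%05ld.png", zeroBasedIndex + 1);
    return buffer;
}
```

For example, at 25 fps the frame at 1000 ms has index 25 and would be loaded from output_00026.png. For variable frame rate material this mapping does not hold, and the per-frame timestamps would have to be recorded separately.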
I am trying to save the decoded image file back as a BMP image using the code in CUDA Decoder project.
if (g_bReadback && g_ReadbackSID)
{
CUresult result = cuMemcpyDtoHAsync(g_bFrameData[active_field], pDecodedFrame[active_field], (nDecodedPitch * nHeight * 3 / 2), g_ReadbackSID);
long padded_size = (nWidth * nHeight * 3 );
CString output_file;
output_file.Format(_T("image/sample_45.BMP"));
SaveBMP(g_bFrameData[active_field],nWidth,nHeight,padded_size,output_file );
if (result != CUDA_SUCCESS)
{
printf("cuMemcpyDtoHAsync returned %d\n", (int)result);
}
}
But the saved image looks like this
Can anybody help me figure out what I am doing wrong? Thank you.
After investigating further, there were several modifications I made to your approach.
pDecodedFrame is actually in some non-RGB format, I think it is NV12 format which I believe is a particular YUV variant.
pDecodedFrame gets converted to an RGB format on the GPU using a particular CUDA kernel
the target buffer for this conversion will either be a surface provided by OpenGL if g_bUseInterop is specified, or else an ordinary region allocated by the driver API version of cudaMalloc if interop is not specified.
The target buffer mentioned above is pInteropFrame (even in the non-interop case). So to make an example for you, for simplicity I chose to only use the non-interop case, because it's much easier to grab the RGB buffer (pInteropFrame) in that case.
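To see why writing pDecodedFrame straight into a bitmap produces a scrambled picture, it may help to look at what an NV12 to RGB conversion involves. The following is a CPU-side sketch only, not the CUDA kernel used by the sample: the function name nv12ToRgb24 is hypothetical, and the BT.601 limited-range coefficients are an assumption (the sample's kernel may use a different color matrix).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static std::uint8_t clampToByte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Convert an NV12 frame to packed 24-bit RGB (R, G, B byte order, top-down).
// NV12 layout: a full-resolution Y plane of pitch * height bytes, followed by
// a half-height plane of interleaved U,V pairs, one pair per 2x2 pixel block.
// Assumption: BT.601 limited-range coefficients.
std::vector<std::uint8_t> nv12ToRgb24(const std::vector<std::uint8_t>& nv12,
                                      int width, int height, int pitch) {
    assert(width % 2 == 0 && height % 2 == 0 && pitch >= width);
    assert(nv12.size() >= static_cast<std::size_t>(pitch) * height * 3 / 2);

    const std::uint8_t* yPlane = nv12.data();
    const std::uint8_t* uvPlane = nv12.data() + static_cast<std::size_t>(pitch) * height;
    std::vector<std::uint8_t> rgb(static_cast<std::size_t>(width) * height * 3);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Luma is per pixel; chroma is shared by each 2x2 block.
            const std::uint8_t* uv = uvPlane + (y / 2) * pitch + (x / 2) * 2;
            const int c = yPlane[y * pitch + x] - 16;
            const int d = uv[0] - 128;  // U
            const int e = uv[1] - 128;  // V

            std::uint8_t* out = &rgb[(static_cast<std::size_t>(y) * width + x) * 3];
            out[0] = clampToByte((298 * c + 409 * e + 128) / 256);            // R
            out[1] = clampToByte((298 * c - 100 * d - 208 * e + 128) / 256);  // G
            out[2] = clampToByte((298 * c + 516 * d + 128) / 256);            // B
        }
    }
    return rgb;
}
```

Note that the source frame holds pitch * height * 3 / 2 bytes while the RGB result holds width * height * 3 bytes, so the two buffers cannot be interchanged. A BMP writer would additionally expect B, G, R byte order and, by default, bottom-up rows.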
The method here copies pInteropFrame back to the host, after it has been populated with the appropriate RGB image by cudaPostProcessFrame. There is also a routine to save the image as a bitmap file. All of my modifications are delineated with comments that include RMC so search for that if you want to find all the changes/additions I made.
To use, drop this file into the cudaDecodeGL project as a replacement for the videoDecodeGL.cpp source file, then rebuild the project. Run the executable normally to display the video. To capture a specific frame, run the executable with the nointerop command-line switch, e.g. cudaDecodeGL nointerop: the video will not display, but the decode operation and frame capture will take place, and the frame will be saved in a framecap.bmp file. If you want to change the specific frame number that is captured, modify the g_FrameCapSelect = 37; variable to some other number besides 37, and recompile.
Here is the replacement for videoDecodeGL.cpp. I used pastebin because SO has a limit on the number of characters that can be entered in a question body.
Note that my approach is independent of whether readback is specified. I would recommend not using readback for this sequence.
Thanks for taking some time to read my question.
I'm developing a C++ application using Qt and the Windows API.
I'm recording the microphone output in small 10s audio files in raw format, and I want to convert them to aac format.
I have tried to read as many things as I could, and thought it would be a great idea to start from the Windows Media Foundation transcode API.
The problem is, I can't seem to use a .raw or .pcm file in the "CreateObjectFromURL" function, so I'm pretty much stuck here for the moment. It keeps on failing. The hr return code equals 3222091460. I have tried to pass an .mp3 file to the function and of course it works, so no url-human-failure involved.
MF_OBJECT_TYPE ObjectType = MF_OBJECT_INVALID;
IMFSourceResolver* pSourceResolver = NULL;
IUnknown* pUnkSource = NULL;
// Create the source resolver.
hr = MFCreateSourceResolver(&pSourceResolver);
if (FAILED(hr))
{
qDebug() << "Failed !";
}
// Use the source resolver to create the media source.
hr = pSourceResolver->CreateObjectFromURL(
sURL, // URL of the source.
MF_RESOLUTION_MEDIASOURCE, // Create a source object.
NULL, // Optional property store.
&ObjectType, // Receives the created object type.
&pUnkSource // Receives a pointer to the media source.
);
The MFCreateSourceResolver works fine, but CreateObjectFromURL does not succeed :(
So I have two questions for you folks:
Is it possible to encode raw audio files to AAC files using Windows Media Foundation?
If yes, what should I read to accomplish what I want?
I want to point out that I can't just use ffmpeg or libav because I can't afford any license for my software, and don't want it to be under the GPL license. But if there are alternatives to Windows Media Foundation for encoding raw audio files to AAC, I would be glad to hear them.
And finally, sorry for my bad English, this is obviously not my native language and I'm sorry if I made your eyes bleed. (and happy if I made you laugh)
Have a nice day
The hr return code equals 3222091460
Those are HRESULT codes. Use this "ShowHresult" tool to have them conveniently decoded for you. The code means 0xC00D36C4 MF_E_UNSUPPORTED_BYTESTREAM_TYPE "The byte stream type of the given URL is unsupported."
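If no tool is at hand, the decimal value can be decoded by hand. This is a small portable sketch (the names HresultParts, splitHresult and hresultToHex are made up for illustration) that prints the value in hexadecimal and splits it into the standard HRESULT fields: failure bit, facility, and code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

struct HresultParts {
    bool failure;            // bit 31: set for failure codes
    std::uint32_t facility;  // same bits as the HRESULT_FACILITY macro
    std::uint32_t code;      // low 16 bits, same as HRESULT_CODE
};

// Split a raw 32-bit HRESULT value into its fields.
HresultParts splitHresult(std::uint32_t hr) {
    HresultParts parts;
    parts.failure = (hr >> 31) != 0;
    parts.facility = (hr >> 16) & 0x1FFFu;
    parts.code = hr & 0xFFFFu;
    return parts;
}

// Format the value the way it appears in headers and documentation.
std::string hresultToHex(std::uint32_t hr) {
    char buffer[16];
    std::snprintf(buffer, sizeof(buffer), "0x%08X", static_cast<unsigned>(hr));
    return buffer;
}
```

Passing 3222091460 to hresultToHex yields 0xC00D36C4, which is the form to search for in the Media Foundation error headers.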
The problem is basically that there is no support for these raw files. .WAV is a good container for raw audio: the file holds both the format descriptor and the payload.
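One way to act on this is to prepend a WAV header to the recorded samples, so that the source resolver has a format descriptor to work with. The sketch below builds a canonical 44-byte PCM header in memory; the function name wrapPcmAsWav is hypothetical, and the sample rate, channel count and bit depth are assumptions that must match how the microphone data was actually recorded.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

static void appendLe16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}

static void appendLe32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

static void appendTag(std::vector<std::uint8_t>& out, const char (&tag)[5]) {
    out.insert(out.end(), tag, tag + 4);  // four characters, no terminator
}

// Wrap raw integer PCM samples in a canonical RIFF/WAVE container.
// Assumes uncompressed PCM (format tag 1) and an even payload size.
std::vector<std::uint8_t> wrapPcmAsWav(const std::vector<std::uint8_t>& pcm,
                                       std::uint32_t sampleRate,
                                       std::uint16_t channels,
                                       std::uint16_t bitsPerSample) {
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    const std::uint32_t byteRate = sampleRate * blockAlign;
    const std::uint32_t dataSize = static_cast<std::uint32_t>(pcm.size());

    std::vector<std::uint8_t> wav;
    wav.reserve(44 + pcm.size());
    appendTag(wav, "RIFF");
    appendLe32(wav, 36 + dataSize);  // bytes remaining after this field
    appendTag(wav, "WAVE");
    appendTag(wav, "fmt ");
    appendLe32(wav, 16);             // size of the PCM format chunk
    appendLe16(wav, 1);              // format tag 1 = PCM
    appendLe16(wav, channels);
    appendLe32(wav, sampleRate);
    appendLe32(wav, byteRate);
    appendLe16(wav, blockAlign);
    appendLe16(wav, bitsPerSample);
    appendTag(wav, "data");
    appendLe32(wav, dataSize);
    wav.insert(wav.end(), pcm.begin(), pcm.end());
    return wav;
}
```

Writing the returned bytes to a file with a .wav extension should give the resolver something it can open, provided the parameters passed in match the recording.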
You can obviously read data from the raw audio file yourself and compress into AAC using Media Foundation's AAC Encoder via its IMFTransform interface. This is reasonably easy and you have AAC data on the output to e.g. write into raw .AAC.
Alternatives to Media Foundation are DirectShow (there are suitable codecs, though I think it might not be so easy to start with), libfaac, and FFmpeg's libavcodec (available under LGPL, not GPL).
I want to create a program, which gets a video-file from Qt, converts that video file to TIFF-files and sends them to an algorithm which handles these TIFF-Files.
My questions:
Is it possible with ffmpeg or avcodec to convert frame by frame and send each frame to the algorithm right away, instead of first converting the video file to TIFF files on the hard drive and sending them to the algorithm afterwards?
The more important question: is it possible to do that not with an external ffmpeg.exe process, but with an ffmpeg.dll? Or is it only possible with avcodec.dll? (It doesn't have to be "on-the-fly" as in my point above.) How can I create an ffmpeg.dll with header and lib?
For exporting TIFF:
http://www.repaire.net/forums/cinema-numerique/215306-projet-dencodage-dcp.html
Creating a tiff from second 29 in a mpeg, using ffmpeg dd201110 can be done like this:
ffmpeg -i 'test.mpg' -vframes 1 -compression_level 0 -ss 29 'test.tiff'
YMMV :-D
If you don't want to store the image as a file, take a look at ffmpeg-php
http://ffmpeg-php.sourceforge.net/
$movie->getFrame([Integer framenumber])
Returns a frame from the movie as an ffmpeg_frame object.
$frame->toGDImage()
Returns a truecolor GD image of the frame.
There may be C code underneath that you can reuse.