Thanks for taking some time to read my question.
I'm developping a C++ application using Qt and windows API.
I'm recording the microphone output in small 10s audio files in raw format, and I want to convert them to aac format.
I have tried to read as many things as I could, and thought it would be a great idea to start from windows media foundation transcode API.
Problem is, I can't seem to use a .raw or .pcm file in the "CreateObjectFromUrl" function, and so I'm pretty much stuck here for the moment. It keeps on failing. The hr return code equals 3222091460. I have tried to pass an .mp3 file to the function and of course it works, so no url-human-failure involved.
MF_OBJECT_TYPE ObjectType = MF_OBJECT_INVALID;
IMFSourceResolver* pSourceResolver = NULL;
IUnknown* pUnkSource = NULL;
// Create the source resolver.
hr = MFCreateSourceResolver(&pSourceResolver);
if (FAILED(hr))
{
qDebug() << "Failed !";
}
// Use the source resolver to create the media source.
hr = pSourceResolver->CreateObjectFromURL(
sURL, // URL of the source.
MF_RESOLUTION_MEDIASOURCE, // Create a source object.
NULL, // Optional property store.
&ObjectType, // Receives the created object type.
&pUnkSource // Receives a pointer to the media source.
);
The MFCreateSourceResolver works fine, but CreateObjectFromURL does not succeed :(
So I have two questions for you folks :
Is it possible to encode raw audio files to aac files using windows media foundation ?
If yes, what should I read to accomplish what I want ?
I want to point out that I can't just use ffmpeg or libav because I can't afford any license for my software, and don't want it to be under the GPL license. But if there are alternatives to windows media foundations to encode raw audio files to aac, I would be glad to hear them.
And finally, sorry for my bad english, this is obviously not my native language and I'm sorry if I made your eyes bleed. (and happy if I made you laugh)
Have a nice day
The hr return code equals 3222091460
Those are HRESULT codes. Use this "ShowHresult" tool to have them conveniently decoded for you. The code means 0xC00D36C4 MF_E_UNSUPPORTED_BYTESTREAM_TYPE "The byte stream type of the given URL is unsupported."
The problem is basically that there is no support for these raw files, .WAV is a good source for raw audio - the file holds both format descriptor and the payload.
You can obviously read data from the raw audio file yourself and compress into AAC using Media Foundation's AAC Encoder via its IMFTransform interface. This is reasonably easy and you have AAC data on the output to e.g. write into raw .AAC.
Alternate options to Media Foundation is DirectShow (there are suitable codecs, though I thought it might be not so easy to start), libfaac, FFmpeg's libavcodec (available under LGPL, not GPL).
Related
HiI'm trying to create a "Speech to text" app that can transcribe any audio/video file. I've created an app based on this post and it works great for WAV files. But if I use an MP3 file, the line hr = cpInputStream->BindToFile(wInputFileName.c_str(), SPFM_OPEN_READONLY, &sInputFormat.FormatId(), sInputFormat.WaveFormatExPtr(), SPFEI_ALL_EVENTS); returns
The Parameter is incorrect
The question is, can I use MP3 files as input for SAPI? and if yes, how do I determine the correct format for the call to hr = sInputFormat.AssignFormat(SPSF_16kHz16BitStereo) because SPSF_16kHz16BitStereo will certainly not be correct and I don't think we should hardcode it.
I am trying to modify the metadata of some audio files in C++, and I came across [what I thought was] a possible way using Windows Media Foundation. So I tried to put together a simple solution:
#include <atlbase.h>
#include <mfapi.h>
#include <mfidl.h>
#include <Windows.h>
#pragma comment(lib, "Mf.lib")
#pragma comment(lib, "Mfplat.lib")
int main() {
HRESULT hr;
CComPtr<IMFSourceResolver> source_resolver(nullptr);
if (FAILED(hr = MFCreateSourceResolver(&source_resolver))) {
// Handle errors...
}
MF_OBJECT_TYPE object_type = MF_OBJECT_INVALID;
CComPtr<IUnknown> source_object(nullptr);
if (FAILED(hr = source_resolver->CreateObjectFromURL(L"audio_file_here", MF_RESOLUTION_MEDIASOURCE | MF_RESOLUTION_READ | MF_RESOLUTION_CONTENT_DOES_NOT_HAVE_TO_MATCH_EXTENSION_OR_MIME_TYPE, NULL, &object_type, &source_object))) {
// Handle errors...
}
CComPtr<IMFMediaSource> source(nullptr);
if (FAILED(hr = source_object->QueryInterface(IID_PPV_ARGS(&source)))) {
// Handle errors...
}
CComPtr<IMFPresentationDescriptor> presentation_descriptor(nullptr);
if (FAILED(hr = source->CreatePresentationDescriptor(&presentation_descriptor))) {
// Handle errors...
}
CComPtr<IMFMetadataProvider> metadata_prov(nullptr);
if (FAILED(hr = MFGetService(source, MF_METADATA_PROVIDER_SERVICE, IID_PPV_ARGS(&metadata_prov)))) {
// Handle errors...
}
CComPtr<IMFMetadata> metadata(nullptr);
if (FAILED(hr = metadata_prov->GetMFMetadata(presentation_descriptor, 0, NULL, &metadata))) {
// Handle errors...
}
/* Use metadata, etc etc */
}
It works fine for a standard MP3 (.mp3) file, yet it always fails on AAC (.m4a from iTunes) audio files. Specifically, the MFGetService() function fails with a return value given by Visual Studio as "The object does not support the specified service.".
I don't understand why this is the case. Right here it says Media Foundation supports AAC, and Windows definitely supports it somehow, because I can play my AAC files perfectly fine via the in-built Groove Music player.
Furthermore, the file metadata is also readable by Windows somehow, because I can view the properties of the file in Explorer, which it lists the title, artist, album, etc just fine.
So how can I read and write metadata from MP3 and AAC audio files? Is it possible through Media Foundation, or do I need another tool from the Windows APIs? (I've seen reference here to a method involving a "Windows Shell interface", is that the way to go?)
First of all, your question is not actually related to AAC. You don't do AAC files here and your file is a MPEG-4 file (typically .MP4 however your .M4A is just a variant/alias of .MP4).
What is the difference between M4A and AAC Audio Files?
So the question is whether you can access the metadata of MPEG-4 file using MF_METADATA_PROVIDER_SERVICE or otherwise with Media Foundation.
Note that "support for AAC or MPEG-4" does not necessarily means metadata management as metadata management is, after all, an auxiliary capability.
It seems Microsoft deprecated metadata provider service and is no longer offering it for new media sources. Even though support for MF_METADATA_PROVIDER_SERVICE remains available for .MP3 files, it is no longer offered for .MP4. Instead, Microsoft suggests use of shell property handlers, which, for MP4 files, recourse to Media Foundation internally.
See MF_METADATA_PROVIDER_SERVICE for MP4 file:
To get metadata from the MP4 source you should actually get the IPropertyStore interface from the MF_PROPERTY_HANDLER_SERVICE service. MSDN is being updated to document this new method of retrieving metadata...
For info, the standard Shell property keys are documented here: Windows Properties.
Shell and explorer use exactly this method to retrieve the metadata.
You can also use FilePropertyStore tool from there to quickly list properties available via shell property handler API:
I am new to DirectShow API.
I want to decode a media file and get uncompressed RGB video frames using DirectShow.
I noted that all such operations should be completed through a GraphBuilder. Also, every the processing block is called a filter and there are many different filters for different media files. For example, for decoding H264 we should use "Microsoft MPEG-2 Video Decoder", for AVI files "AVI Splitter Filter" etc.
I would like to know if there is a general way (decoder) that can handle all those file types?
I would really appreciate if someone can point out an example that goes from importing a local file to decoding it into uncompressed RGB frames. All the examples I found are dealing with window handles and they just configure it and call pGraph->run(). I have also surfed through Windows SDK samples, but couldn't find useful samples.
Thanks very much in advance.
Universal DirectShow decoder in general is against the concept of DirectShow API. The whole idea is that individual filters are responsible for individual task (esp. decoding certain encoding or demultiplexing certain container format). The registry of the filters and Intelligent Connect let one to have the filters built in chain to do certain requested processing, in particular decoding from compressed format to 24-bit RGB for video.
From this standpoint you don't need a universal decoder and it is not expected that such decoder exists. However, such decoder (or close) does exist and it's a ffdshow or one of its derivatives. Presently, you might want to look at LAVFilters, for example. They wrap FFmpeg, which itself can handle many formats, and connect it to DirectShow API so that, as as filter, ffdshow could handle many formats/encodings.
There is no general rule to use or not use such codec pack, in most cases you take into consideration various factors and decide what to do. If your application handles various scenarios, a good starting point into graph building would be Overview of Graph Building.
My goal is to accomplish the task using DirectShow in order to have no external dependencies. Do you know a particular example that does uncompressing frames for some file type?
Your request is too broad and in the same time typical and, to some extent, fairy simple. If you spend some time playing with GraphEdit SDK tool, or rather GraphStudioNext, which is a more powerful version of the former, you will be able to build filter graph interactively, also render media files of different types and see what filters participate in rendering. You can accomplish the very same programmatically too, since the interactive actions basically all have matching API calls individually.
You will be able to see that specific formats are handled by different filters and Intelligent Connect mentioned above is building chains of filters in combinations in order to satisfy the requests and get the pipeline together.
Default use case is playback, and if you want to get video rendered to 24/32-bit RGB, your course of actions is pretty much similar: you are to build a graph, which just needs to terminate with something else. More flexible, sophisticated and typical for advanced development approach is to supply a custom video renderer filter and accept decompressed RGB frames on it.
A simple and so much popular version of the solution is to use Sample Grabber filter, initialize it to accept RGB, setup a callback on it so that your SampleCB callback method is called every time RGB frame is decompressed, and use Sample Grabber in the graph. (You will find really a lot of attempts to accomplish that if you search open source code and/or web for keywords ISampleGrabber, ISampleGrabberCB, SampleCB or BufferCB, MEDIASUBTYPE_RGB24).
Using the Sample Grabber
DirectShow: Examples for Using SampleGrabber for Grabbing a Frame and Building a VU Meter
Another more or less popular approach is to setup a playback pipeline, play a file, and read back frames from video presenter. This is suggested in another answer to the question, is relatively easy to do, and does the job if you don't have performance requirement and requirements to extract every single frame. That is, it is a good way to get a random RGB frame from the feed but not every/all frames. See related on this:
Different approaches on getting captured video frames in DirectShow
You are looking for vmr9 example in DirectShow library.
In your Windows SDK's install, look for this example:
Microsoft SDKs\Windows\v7.0\Samples\multimedia\directshow\vmr9\windowless\windowless.sln
And search this function: CaptureImage, in this method, see IVMRWindowlessControl9::GetCurrentImage, is exactly what you want.
This method captures a video frame in bitmap format (RGB).
Next, this is a copy of CaptureImage code:
BOOL CaptureImage(LPCTSTR szFile)
{
HRESULT hr;
if(pWC && !g_bAudioOnly)
{
BYTE* lpCurrImage = NULL;
// Read the current video frame into a byte buffer. The information
// will be returned in a packed Windows DIB and will be allocated
// by the VMR.
if(SUCCEEDED(hr = pWC->GetCurrentImage(&lpCurrImage)))
{
BITMAPFILEHEADER hdr;
DWORD dwSize, dwWritten;
LPBITMAPINFOHEADER pdib = (LPBITMAPINFOHEADER) lpCurrImage;
// Create a new file to store the bitmap data
HANDLE hFile = CreateFile(szFile, GENERIC_WRITE, FILE_SHARE_READ, NULL,
CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
if (hFile == INVALID_HANDLE_VALUE)
return FALSE;
// Initialize the bitmap header
dwSize = DibSize(pdib);
hdr.bfType = BFT_BITMAP;
hdr.bfSize = dwSize + sizeof(BITMAPFILEHEADER);
hdr.bfReserved1 = 0;
hdr.bfReserved2 = 0;
hdr.bfOffBits = (DWORD)sizeof(BITMAPFILEHEADER) + pdib->biSize +
DibPaletteSize(pdib);
// Write the bitmap header and bitmap bits to the file
WriteFile(hFile, (LPCVOID) &hdr, sizeof(BITMAPFILEHEADER), &dwWritten, 0);
WriteFile(hFile, (LPCVOID) pdib, dwSize, &dwWritten, 0);
// Close the file
CloseHandle(hFile);
// The app must free the image data returned from GetCurrentImage()
CoTaskMemFree(lpCurrImage);
// Give user feedback that the write has completed
TCHAR szDir[MAX_PATH];
GetCurrentDirectory(MAX_PATH, szDir);
// Strip off the trailing slash, if it exists
int nLength = (int) _tcslen(szDir);
if (szDir[nLength-1] == TEXT('\\'))
szDir[nLength-1] = TEXT('\0');
Msg(TEXT("Captured current image to %s\\%s."), szDir, szFile);
return TRUE;
}
else
{
Msg(TEXT("Failed to capture image! hr=0x%x"), hr);
return FALSE;
}
}
return FALSE;
}
I have a file with .amr extension, and I want to get it's sample rate and number of channels using Microsoft Media Foundation. Further, I want to decode and get the uncompressed data.
I can successfully get those from .aac .mp4 and other file types but not from from a .amr file (or 3.gp file which contains .amr track).
So, for other types I do:
IMFSourceReader *m_pReader;
IMFMediaType *m_pAudioType;
MFCreateSourceReaderFromURL(filePath, NULL, &m_pReader);
m_pReader->SetStreamSelection(MF_SOURCE_READER_ALL_STREAMS, false);
m_pReader->SetStreamSelection(MF_SOURCE_READER_FIRST_AUDIO_STREAM, true);
m_pReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, &m_pAudioType);
UINT32 numChannels,sampleRate;
m_pAudioType->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &numChannels);
m_pAudioType->GetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, &sampleRate);
Consider there are no any errors during this code.
For .amr files, some garbage is being written in the numChannels and sampleRate.
Does anyone have experience with this and knows how to recognize and/or get proper channels and sample rate for further decoding?
BTW, Windows Media Player plays this file with no problems.
Thanks in advance.
So I found out that it supports decoding for .amr files not encoding.
Just before we get this properties:
UINT32 numChannels,sampleRate;
m_pAudioType->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &numChannels);
m_pAudioType->GetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, &sampleRate);
We have to set a new media type to our Source Reader
m_pAudioType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio)
m_pAudioType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_Float)
m_pReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, NULL, m_pAudioType);
I would like to open a small video file and map every frames in memory (to apply some custom filter). I don't want to handle the video codec, I would rather let the library handle that for me.
I've tried to use Direct Show with the SampleGrabber filter (using this sample http://msdn.microsoft.com/en-us/library/ms787867(VS.85).aspx), but I only managed to grab some frames (not every frames!). I'm quite new in video software programming, maybe I'm not using the best library, or I'm doing it wrong.
I've pasted a part of my code (mainly a modified copy/paste from the msdn example), unfortunately it doesn't grabb the 25 first frames as expected...
[...]
hr = pGrabber->SetOneShot(TRUE);
hr = pGrabber->SetBufferSamples(TRUE);
pControl->Run(); // Run the graph.
pEvent->WaitForCompletion(INFINITE, &evCode); // Wait till it's done.
// Find the required buffer size.
long cbBuffer = 0;
hr = pGrabber->GetCurrentBuffer(&cbBuffer, NULL);
for( int i = 0 ; i < 25 ; ++i )
{
pControl->Run(); // Run the graph.
pEvent->WaitForCompletion(INFINITE, &evCode); // Wait till it's done.
char *pBuffer = new char[cbBuffer];
hr = pGrabber->GetCurrentBuffer(&cbBuffer, (long*)pBuffer);
AM_MEDIA_TYPE mt;
hr = pGrabber->GetConnectedMediaType(&mt);
VIDEOINFOHEADER *pVih;
pVih = (VIDEOINFOHEADER*)mt.pbFormat;
[...]
}
[...]
Is there somebody, with video software experience, who can advise me about code or other simpler library?
Thanks
Edit:
Msdn links seems not to work (see the bug)
Currently these are the most popular video frameworks available on Win32 platforms:
Video for Windows: old windows framework coming from the age of Win95 but still widely used because it is very simple to use. Unfortunately it supports only AVI files for which the proper VFW codec has been installed.
DirectShow: standard WinXP framework, it can basically load all formats you can play with Windows Media Player. Rather difficult to use.
Ffmpeg: more precisely libavcodec and libavformat that comes with Ffmpeg open- source multimedia utility. It is extremely powerful and can read a lot of formats (almost everything you can play with VLC) even if you don't have the codec installed on the system. It's quite complicated to use but you can always get inspired by the code of ffplay that comes shipped with it or by other implementations in open-source software. Anyway I think it's still much easier to use than DS (and much faster). It needs to be comipled by MinGW on Windows, but all the steps are explained very well here (in this moment the link is down, hope not dead).
QuickTime: the Apple framework is not the best solution for Windows platform, since it needs QuickTime app to be installed and also the proper QuickTime codec for every format; it does not support many formats, but its quite common in professional field (so some codec are actually only for QuickTime). Shouldn't be too difficult to implement.
Gstreamer: latest open source framework. I don't know much about it, I guess it wraps over some of the other systems (but I'm not sure).
All of this frameworks have been implemented as backend in OpenCv Highgui, except for DirectShow. The default framework for Win32 OpenCV is using VFW (and thus able only to open some AVI files), if you want to use the others you must download the CVS instead of the official release and still do some hacking on the code and it's anyway not too complete, for example FFMPEG backend doesn't allow to seek in the stream.
If you want to use QuickTime with OpenCV this can help you.
I have used OpenCV to load video files and process them. It's also handy for many types of video processing including those useful for computer vision.
Using the "Callback" model of SampleGrabber may give you better results. See the example in Samples\C++\DirectShow\Editing\GrabBitmaps.
There's also a lot of info in Samples\C++\DirectShow\Filters\Grabber2\grabber_text.txt and readme.txt.
I know it is very tempting in C++ to get a proper breakdown of the video files and just do it yourself. But although the information is out there, it is such a long winded process building classes to hand each file format, and make it easily alterable to take future structure changes into account, that frankly it just is not worth the effort.
Instead I recommend ffmpeg. It got a mention above, but says it is difficult, it isn't difficult. There are a lot more options than most people would need which makes it look more difficult than it is. For the majority of operations you can just let ffmpeg work it out for itself.
For example a file conversion
ffmpeg -i inputFile.mp4 outputFile.avi
Decide right from the start that you will have ffmpeg operations run in a thread, or more precisely a thread library. But have your own thread class wrap it so that you can have your own EventAgs and methods of checking the thread is finished. Something like :-
ThreadLibManager()
{
List<MyThreads> listOfActiveThreads;
public AddThread(MyThreads);
}
Your thread class is something like:-
class MyThread
{
public Thread threadForThisInstance { get; set; }
public MyFFMpegTools mpegTools { get; set; }
}
MyFFMpegTools performs many different video operations, so you want your own event
args to tell your parent code precisely what type of operation has just raised and
event.
enum MyFmpegArgs
{
public int thisThreadID { get; set; } //Set as a new MyThread is added to the List<>
public MyFfmpegType operationType {get; set;}
//output paths etc that the parent handler will need to find output files
}
enum MyFfmpegType
{
FF_CONVERTFILE = 0, FF_CREATETHUMBNAIL, FF_EXTRACTFRAMES ...
}
Here is a small snippet of my ffmpeg tool class, this part collecting information about a video.
I put FFmpeg in a particular location, and at the start of the software running it makes sure that it is there. For this version I have moved it to the Desktop, I am fairly sure I have written the path correctly for you (I really hate MS's special folders system, so I ignore it as much as I can).
Anyway, it is an example of using windowless ffmpeg.
public string GetVideoInfo(FileInfo fi)
{
outputBuilder.Clear();
string strCommand = string.Concat(" -i \"", fi.FullName, "\"");
string ffPath =
System.Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\\ffmpeg.exe";
string oStr = "";
try
{
Process build = new Process();
//build.StartInfo.WorkingDirectory = #"dir";
build.StartInfo.Arguments = strCommand;
build.StartInfo.FileName = ffPath;
build.StartInfo.UseShellExecute = false;
build.StartInfo.RedirectStandardOutput = true;
build.StartInfo.RedirectStandardError = true;
build.StartInfo.CreateNoWindow = true;
build.ErrorDataReceived += build_ErrorDataReceived;
build.OutputDataReceived += build_ErrorDataReceived;
build.EnableRaisingEvents = true;
build.Start();
build.BeginOutputReadLine();
build.BeginErrorReadLine();
build.WaitForExit();
string findThis = "start";
int offset = 0;
foreach (string str in outputBuilder)
{
if (str.Contains("Duration"))
{
offset = str.IndexOf(findThis);
oStr = str.Substring(0, offset);
}
}
}
catch
{
oStr = "Error collecting file information";
}
return oStr;
}
private void build_ErrorDataReceived(object sender, DataReceivedEventArgs e)
{
string strMessage = e.Data;
if (outputBuilder != null && strMessage != null)
{
outputBuilder.Add(string.Concat(strMessage, "\n"));
}
}
Try using the OpenCV library. It definitely has the capabilities you require.
This guide has a section about accessing frames from a video file.
If it's for AVI files I'd read the data from the AVI file myself and extract the frames. Now use the video compression manager to decompress it.
The AVI file format is very simple, see: http://msdn.microsoft.com/en-us/library/dd318187(VS.85).aspx (and use google).
Once you have the file open you just extract each frame and pass it to ICDecompress() to decompress it.
It seems like a lot of work but it's the most reliable way.
If that's too much work, or if you want more than AVI files then use ffmpeg.
OpenCV is the best solution if video in your case only needs to lead to a sequence of pictures. If you're willing to do real video processing, so ViDeo equals "Visual Audio", you need to keep up track with the ones offered by "martjno". New windows solutions also for Win7 include 3 new possibilities additionally:
Windows Media Foundation: Successor of DirectShow; cleaned-up interface
Windows Media Encoder 9: It does not only include the programm, it also ships libraries for coding
Windows Expression 4: Successor of 2.
Last 2 are commercial-only solutions, but the first one is free. To code WMF, you need to install the Windows SDK.
I would recommend FFMPEG or GStreamer. Try and stay away from openCV unless you plan to utilize some other functionality than just streaming video. The library is a beefy build and a pain to install from source to configure FFMPEG/+GStreamer options.