IMFSourceReader M4A Audio Accurate Frame Seek - c++

I'm using IMFSourceReader to continuously buffer 1-second portions of audio files from disk. I'm unable to accurately seek M4A audio data (AAC-encoded), which results in a discontinuous audio stream.
I'm aware that the data returned by IMFSourceReader::ReadSample() usually starts a few hundred frames earlier than the position set with IMFSourceReader::SetCurrentPosition(). However, even accounting for this offset I'm unable to create a continuous, glitch-free stream (see the readCall == 0 condition).
I am able to accurately seek portions of WAV files (uncompressed) so my offset calculation appears to be correct.
My question is whether the Media Foundation library is able to accurately seek/read portions of AAC encoded M4A files (or any compressed audio for that matter)?
Here's the code. inStartFrame is the sample frame I'm trying to read. The output format is configured as 32-bit floating-point data (see the final function). To trim it down a little I've removed some error checks and cleanup (e.g. end-of-file handling).
bool WindowsM4AReader::read(float** outBuffer, int inNumChannels, int64_t inStartFrame, int64_t inNumFramesToRead)
{
    int64_t hnsToRequest = SampleFrameToHNS(inStartFrame);
    int64_t frameRequested = HNSToSampleFrame(hnsToRequest);

    PROPVARIANT positionProp;
    positionProp.vt = VT_I8;
    positionProp.hVal.QuadPart = hnsToRequest;
    HRESULT hr = mReader->SetCurrentPosition(GUID_NULL, positionProp);
    mReader->Flush(0);

    IMFSample* pSample = nullptr;
    int bytesPerFrame = sizeof(float) * mNumChannels;
    int64_t totalFramesWritten = 0;
    int64_t remainingFrames = inNumFramesToRead;
    int readCall = 0;
    bool quit = false;

    while (!quit) {
        DWORD streamIndex = 0;
        DWORD flags = 0;
        LONGLONG llTimeStamp = 0;
        hr = mReader->ReadSample(
            MF_SOURCE_READER_FIRST_AUDIO_STREAM,    // Stream index.
            0,                                      // Flags.
            &streamIndex,                           // Receives the actual stream index.
            &flags,                                 // Receives status flags.
            &llTimeStamp,                           // Receives the time stamp.
            &pSample                                // Receives the sample or NULL.
        );

        // On the first read, work out how far before the requested position
        // the decoder actually started.
        int64_t frameOffset = 0;
        if (readCall == 0) {
            int64_t hnsOffset = hnsToRequest - llTimeStamp;
            frameOffset = HNSToSampleFrame(hnsOffset);
        }
        ++readCall;

        if (pSample) {
            IMFMediaBuffer* decodedBuffer = nullptr;
            pSample->ConvertToContiguousBuffer(&decodedBuffer);

            BYTE* rawBuffer = nullptr;
            DWORD maxLength = 0;
            DWORD bufferLengthInBytes = 0;
            decodedBuffer->Lock(&rawBuffer, &maxLength, &bufferLengthInBytes);

            int64_t availableFrames = bufferLengthInBytes / bytesPerFrame;
            availableFrames -= frameOffset;
            int64_t framesToCopy = min(availableFrames, remainingFrames);

            // copy to outputBuffer
            float* floatBuffer = (float*)rawBuffer;
            float* offsetBuffer = &floatBuffer[frameOffset * mNumChannels];
            for (int channel = 0; channel < mNumChannels; ++channel) {
                for (int64_t frame = 0; frame < framesToCopy; ++frame) {
                    float sampleValue = offsetBuffer[frame * mNumChannels + channel];
                    outBuffer[channel][totalFramesWritten + frame] = sampleValue;
                }
            }
            decodedBuffer->Unlock();

            totalFramesWritten += framesToCopy;
            remainingFrames -= framesToCopy;
            if (totalFramesWritten >= inNumFramesToRead)
                quit = true;
        }
    }

    return true; // error paths and cleanup trimmed, as noted above
}
LONGLONG WindowsM4AReader::SampleFrameToHNS(int64_t inFrame)
{
    return inFrame * (10000000.0 / mSampleRate);
}

int64_t WindowsM4AReader::HNSToSampleFrame(LONGLONG inHNS)
{
    return inHNS / 10000000.0 * mSampleRate;
}
bool WindowsM4AReader::ConfigureAsFloatDecoder()
{
    IMFMediaType* outputType = nullptr;
    HRESULT hr = MFCreateMediaType(&outputType);

    UINT32 bitsPerSample = sizeof(float) * 8;
    UINT32 blockAlign = mNumChannels * (bitsPerSample / 8);
    UINT32 bytesPerSecond = blockAlign * (UINT32)mSampleRate;

    hr = outputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    hr = outputType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_Float);
    hr = outputType->SetUINT32(MF_MT_AUDIO_PREFER_WAVEFORMATEX, TRUE);
    hr = outputType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, (UINT32)mNumChannels);
    hr = outputType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, (UINT32)mSampleRate);
    hr = outputType->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, blockAlign);
    hr = outputType->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, bytesPerSecond);
    hr = outputType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, bitsPerSample);
    hr = outputType->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE);

    DWORD streamIndex = 0;
    hr = mReader->SetCurrentMediaType(streamIndex, NULL, outputType);
    return true;
}
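One small note on the two HNS conversion helpers above: both implicitly truncate the floating-point result, so converting a frame position to HNS and back can land one frame early, which matters when frame offsets are computed from timestamps. A minimal sketch of rounding variants (same signatures, using llround from <cmath>):

// Rounding versions of the conversion helpers (sketch): llround avoids the
// implicit truncation in the originals, which can land one frame early.
LONGLONG WindowsM4AReader::SampleFrameToHNS(int64_t inFrame)
{
    return llround(inFrame * 10000000.0 / mSampleRate);
}

int64_t WindowsM4AReader::HNSToSampleFrame(LONGLONG inHNS)
{
    return llround(inHNS / 10000000.0 * mSampleRate);
}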

If you are using the AAC decoder provided by Microsoft (AAC Decoder) and the MPEG-4 File Source, then yes, I can confirm that you can't seek audio frames with the same precision as with a WAV file.
I'll have to run more tests, but I think it's possible to find a workaround in your case.
EDIT
I've made a program to check the seek position with the SourceReader:
github mofo7777
Under Stackoverflow > AudioSourceReaderSeek
The WAV format is perfect at seeking, MP3 is good, and M4A is not really good.
But that M4A file was encoded with VLC. I also encoded an M4A file using the Media Foundation encoder, and seeking in that file gives better results (comparable to MP3).
So I would say that the encoder matters for seeking.
It would be interesting to test different audio formats with different encoders.
Also, there is the IMFSeekInfo interface.
I can't test this interface because I'm on Windows 7 and it's for Win8. It would be interesting for someone to test it.
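If a workaround is needed right away, one approach that follows from the offset behaviour described in the question is to deliberately seek a little before the target, then decode and discard whole samples until the one whose timestamp range covers the requested frame. This is only a sketch of the idea (seekExact is a hypothetical helper; mReader and the HNS helpers are as in the question, error handling omitted):

// Sketch: seek ~250 ms before the target, then decode and drop samples
// until we reach the one containing the requested frame.
bool WindowsM4AReader::seekExact(int64_t inStartFrame)
{
    const LONGLONG preRollHNS = 2500000; // 250 ms of pre-roll before the target
    LONGLONG targetHNS = SampleFrameToHNS(inStartFrame);
    LONGLONG seekHNS = targetHNS > preRollHNS ? targetHNS - preRollHNS : 0;

    PROPVARIANT pos;
    PropVariantInit(&pos);
    pos.vt = VT_I8;
    pos.hVal.QuadPart = seekHNS;
    mReader->SetCurrentPosition(GUID_NULL, pos);

    // Decode and discard whole samples that end before the target.
    for (;;) {
        DWORD flags = 0;
        LONGLONG ts = 0;
        IMFSample* pSample = nullptr;
        mReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nullptr, &flags, &ts, &pSample);
        if (!pSample || (flags & MF_SOURCE_READERF_ENDOFSTREAM))
            return false;

        LONGLONG dur = 0;
        pSample->GetSampleDuration(&dur);
        if (ts + dur > targetHNS) {
            // This sample contains the target: in real code, keep it and start
            // copying from frame HNSToSampleFrame(targetHNS - ts) within it.
            pSample->Release();
            return true;
        }
        pSample->Release();
    }
}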

Related

Xaudio2 pop sound when changing buffer or looping

I have a simple program that plays a sine wave.
At the end of the buffer I get a pop sound.
If I try to loop I get the pop sound between each loop.
If I alternate between buffers I get the pop sound.
struct win32_audio_buffer
{
    XAUDIO2_BUFFER XAudioBuffer = {};
    int16 *Memory;
};

struct win32_audio_setteings
{
    int32 SampleRate = 44100;
    int32 ToneHz = 200;
    int32 Channels = 2;
    int32 LoopTime = 10;
    int32 TotalSamples = SampleRate * LoopTime;
};

win32_audio_setteings AudioSetteings;
win32_audio_buffer MainAudioBuffer;
win32_audio_buffer SecondaryAudioBuffer;
IXAudio2SourceVoice* pSourceVoice;

internal void Win32InitXaudio2()
{
    WAVEFORMATEX WaveFormat = {};
    WaveFormat.wFormatTag = WAVE_FORMAT_PCM;
    WaveFormat.nChannels = AudioSetteings.Channels;
    WaveFormat.nSamplesPerSec = AudioSetteings.SampleRate;
    WaveFormat.wBitsPerSample = 16;
    WaveFormat.nBlockAlign = (WaveFormat.nChannels * WaveFormat.wBitsPerSample) / 8;
    WaveFormat.nAvgBytesPerSec = WaveFormat.nSamplesPerSec * WaveFormat.nBlockAlign;
    WaveFormat.cbSize = 0;

    IXAudio2* pXAudio2;
    IXAudio2MasteringVoice* pMasterVoice;
    XAudio2Create(&pXAudio2);
    pXAudio2->CreateMasteringVoice(&pMasterVoice);
    pXAudio2->CreateSourceVoice(&pSourceVoice, &WaveFormat);
}

//DOC: AudioBytes - Size of the audio data
//DOC: pAudioData - The buffer start location (needs to be type cast into a BYTE pointer)
internal void Win32CreateAudioBuffer(win32_audio_buffer *AudioBuffer)
{
    int32 Size = (int16)sizeof(int16) * AudioSetteings.Channels * AudioSetteings.SampleRate * AudioSetteings.LoopTime;
    AudioBuffer->Memory = (int16 *)VirtualAlloc(0, Size, MEM_COMMIT|MEM_RESERVE, PAGE_READWRITE);
    AudioBuffer->XAudioBuffer.AudioBytes = Size;
    AudioBuffer->XAudioBuffer.pAudioData = (BYTE *) AudioBuffer->Memory;
    //AudioBuffer->XAudioBuffer.Flags = XAUDIO2_END_OF_STREAM;
    AudioBuffer->XAudioBuffer.PlayBegin = 0;
    AudioBuffer->XAudioBuffer.PlayLength = AudioSetteings.TotalSamples;
    //AudioBuffer->XAudioBuffer.LoopCount = 10;
}

internal void Win32Playback(win32_audio_buffer *AudioBuffer)
{
    for (int32 Index = 0, Sample = 0; Sample < AudioSetteings.TotalSamples; Sample++)
    {
        real32 Sine = sinf(Sample * 2.0f * Pi32 / AudioSetteings.ToneHz);
        int16 value = (int16)(4000 * Sine);
        AudioBuffer->Memory[Index++] = value;
        AudioBuffer->Memory[Index++] = value;
    }
    pSourceVoice->SubmitSourceBuffer(&AudioBuffer->XAudioBuffer);
}
Win32InitXaudio2();
Win32CreateAudioBuffer(&MainAudioBuffer);
//Win32CreateAudioBuffer(&SecondaryAudioBuffer);
Win32Playback(&MainAudioBuffer);
//Win32Playback(&SecondaryAudioBuffer);
pSourceVoice->Start(0);
I have posted the relevant code here and it just plays one sine buffer.
I tried alternating buffers and starting and ending on a zero-crossing.
I had a similar problem; maybe this will help someone.
In my case the problem was allocating more memory for the audio than was actually needed.
So I tried something like this and found the problem (this is not the solution, I'm just showing how I found the problem; if it doesn't help in your case, the problem is somewhere else):
// XAUDIO2_BUFFER m_xaudio2Buffer...
m_xaudio2Buffer.pAudioData = source->m_data;
m_xaudio2Buffer.AudioBytes = source->m_dataSize - 100; // -100 and `pop` sound is gone
m_xaudio2Buffer.Flags = XAUDIO2_END_OF_STREAM;
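If the goal is gapless looping of one buffer, another thing worth trying (a sketch of standard XAudio2 usage, not the fix described above) is to let XAudio2 loop the buffer itself through the XAUDIO2_BUFFER loop fields instead of resubmitting it:

// Sketch: loop the whole buffer inside XAudio2 rather than resubmitting it.
// Builds on the question's Win32CreateAudioBuffer; counts are in sample frames.
AudioBuffer->XAudioBuffer.PlayBegin  = 0;
AudioBuffer->XAudioBuffer.PlayLength = AudioSetteings.TotalSamples;
AudioBuffer->XAudioBuffer.LoopBegin  = 0;
AudioBuffer->XAudioBuffer.LoopLength = AudioSetteings.TotalSamples;
AudioBuffer->XAudioBuffer.LoopCount  = XAUDIO2_LOOP_INFINITE; // loop until ExitLoop()/Stop()

As an aside, the phase step in Win32Playback never references SampleRate, so the generated tone is SampleRate / ToneHz ≈ 220.5 Hz rather than 200 Hz; sinf(2.0f * Pi32 * AudioSetteings.ToneHz * Sample / AudioSetteings.SampleRate) would give the intended pitch.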

Legacy Print driver is fuzzy

We have an old print driver which takes a document and sends PCL to the spooler. The client then processes this PCL and displays everything as TIFF. Our users have been complaining that the TIFF is fuzzy and the image is not sharp. I've been given the task of solving the mystery.
Is the PCL itself bad? I don't have enough knowledge about PCL to know whether it carries resolution information. How do I capture the output the driver is sending to the spooler?
Or is it the client that is somehow not rendering the PCL at a good resolution? Do I need to go through the pain of learning how to debug this driver? I will, but is it going to help me fix a resolution issue? I have never done driver development, so it's going to be a learning curve for me, but if I need to, that's OK. Where should I start? Is it the PCL that's bad, or the client that converts PCL to a bitmap?
This is the C++ code
BOOL APIENTRY
CreatePCLRasterGraphicPage(
    SURFOBJ *pso,
    BOOL firstPage,
    char *pageText
    )
/*++
Routine Description:
    Creates the PCL raster graphics data for one page.
Arguments:
    SURFOBJ - Surface Object
    BOOL - First Page ?
    char * Page Text
Return Value:
    BOOL - True if successful
--*/
{
    PDEVOBJ pDevObj = (PDEVOBJ)pso->dhpdev;
    POEMPDEV pOemPDEV = (POEMPDEV)pDevObj->pdevOEM;
    DWORD dwOffset = 0;
    DWORD dwWritten = 0;
    DWORD dwPageBufferSize = 0;
    int i = 0;
    ULONG n = 0;
    BYTE bitmapRow[1050];
    BYTE compRow[2100];
    DWORD dwRowSize = 0;
    DWORD dwCompPCLBitmapSize = 0;
    //wchar_t traceBuff[256];
    pOemPDEV->dwCompBitmapBufSize = 0;

    // TRACE OUT ----------------------------------------------------
    //ZeroMemory(traceBuff, 256);
    //StringCchPrintf(traceBuff, 256, L"Top of CreatePCLRasterGraphicPage");
    //WriteTraceLine(traceBuff);
    // -----------------------------------------------------------------

    // Invert color
    for (n = 0; n < pso->cjBits; n++)
        *(((PBYTE &)pso->pvBits) + n) ^= 0xFF;

    // compress each row and store in a buffer with PCL line headings
    for (i = 0; i < pso->sizlBitmap.cy; i++) {
        // Zero Memory hack for bottom of form black line
        if (*(((PBYTE &)pso->pvScan0) + (i * pso->lDelta) + 319) == 0xFF)
            ZeroMemory(((PBYTE &)pso->pvScan0) + (i * pso->lDelta), 320);

        // Copy the bitmap scan line into bitmapRow and send them off to be compressed
        ZeroMemory(bitmapRow, 1050);
        ZeroMemory(compRow, 2100);
        MoveMemory(bitmapRow, ((PBYTE &)pso->pvScan0) + (i * pso->lDelta), pso->lDelta);
        dwRowSize = CompressBitmapRow(compRow, bitmapRow, pso->lDelta);

        // Create PCL Row Heading
        char bufPCLLineHead[9];
        StringCchPrintfA(bufPCLLineHead, 9, "%c%s%d%s", 27, "*b", dwRowSize, "W");

        if ((dwCompPCLBitmapSize + dwRowSize + strlen(bufPCLLineHead))
                > pOemPDEV->dwCompBitmapBufSize) {
            if (!GrowCompBitmapBuf(pOemPDEV)) {
                //ZeroMemory(traceBuff, 256);
                //StringCchPrintf(traceBuff, 256,
                //    L"Compressed bitmap buffer could not allocate more memory.");
                //WriteTraceLine(traceBuff);
            }
        }

        if (pOemPDEV->pCompBitmapBufStart) {
            // write the PCL line heading to the buffer
            MoveMemory(pOemPDEV->pCompBitmapBufStart + dwCompPCLBitmapSize,
                bufPCLLineHead, strlen(bufPCLLineHead));
            dwCompPCLBitmapSize += strlen(bufPCLLineHead);

            // write the compressed row to the buffer
            MoveMemory(pOemPDEV->pCompBitmapBufStart + dwCompPCLBitmapSize,
                compRow, dwRowSize);
            dwCompPCLBitmapSize += dwRowSize;
        }
    }

    // Calculate size and create buffer
    dwPageBufferSize = 21;
    if (!firstPage)
        dwPageBufferSize++;
    bGrowBuffer(pOemPDEV, dwPageBufferSize);

    // Add all Raster Header Lines
    if (!firstPage)
    {
        // Add a Form Feed
        char bufFormFeed[2];
        StringCchPrintfA(bufFormFeed, 2, "%c", 12); // 1 char
        MoveMemory(pOemPDEV->pBufStart + dwOffset, bufFormFeed, 2);
        dwOffset += 1;
    }

    // Position cursor at X0, Y0
    char bufXY[8];
    StringCchPrintfA(bufXY, 8, "%c%s", 27, "*p0x0Y"); // 7 chars
    MoveMemory(pOemPDEV->pBufStart + dwOffset, bufXY, 8);
    dwOffset += 7;

    // Start Raster Graphics
    char bufStartRas[6];
    StringCchPrintfA(bufStartRas, 6, "%c%s", 27, "*r1A"); // 5 chars
    MoveMemory(pOemPDEV->pBufStart + dwOffset, bufStartRas, 6);
    dwOffset += 5;

    // Raster Encoding - Run-Length Encoding
    char bufRasEncoding[6];
    StringCchPrintfA(bufRasEncoding, 6, "%c%s", 27, "*b1M"); // 5 chars
    MoveMemory(pOemPDEV->pBufStart + dwOffset, bufRasEncoding, 6);
    dwOffset += 5;

    // Write out bitmap header PCL
    dwWritten = pDevObj->pDrvProcs->DrvWriteSpoolBuf(pDevObj, pOemPDEV->pBufStart, dwPageBufferSize);

    // Write out PCL plus compressed bitmap bytes
    dwWritten = pDevObj->pDrvProcs->DrvWriteSpoolBuf(pDevObj, pOemPDEV->pCompBitmapBufStart, dwCompPCLBitmapSize);

    // End Raster Graphics
    char bufEndRas[5];
    StringCchPrintfA(bufEndRas, 5, "%c%s", 27, "*rB"); // 4 chars
    MoveMemory(pOemPDEV->pBufStart + dwOffset, bufEndRas, 5);

    // Write out PCL end bitmap
    dwWritten = pDevObj->pDrvProcs->DrvWriteSpoolBuf(pDevObj, bufEndRas, 4);

    // Free Compressed Bitmap Memory
    if (pOemPDEV->pCompBitmapBufStart) {
        MemFree(pOemPDEV->pCompBitmapBufStart);
        pOemPDEV->pCompBitmapBufStart = NULL;
        pOemPDEV->dwCompBitmapBufSize = 0;
        dwPageBufferSize = 0;
    }

    // Free Memory
    vFreeBuffer(pOemPDEV);

    // Write Page Text to the spooler
    size_t charCount = 0;
    StringCchLengthA(pageText, 32767, &charCount);
    char bufWriteText[15];
    ZeroMemory(bufWriteText, 15);
    StringCchPrintfA(bufWriteText, 15, "%c%s%d%s", 27, "(r", charCount, "W");
    dwWritten = pDevObj->pDrvProcs->DrvWriteSpoolBuf(pDevObj, bufWriteText, strlen(bufWriteText));
    dwWritten = pDevObj->pDrvProcs->DrvWriteSpoolBuf(pDevObj, pageText, charCount);

    return TRUE;
}
BOOL
GrowCompBitmapBuf(
    POEMPDEV pOemPDEV
    )
/*++
Routine Description:
    Grows the buffer by 4096 bytes (per call) to hold compressed
    bitmap and PCL data.
Arguments:
    POEMPDEV - Pointer to the private PDEV structure
Return Value:
    BOOL - True if successful
--*/
{
    DWORD dwOldBufferSize = 0;
    PBYTE pNewBuffer = NULL;

    dwOldBufferSize = pOemPDEV->pCompBitmapBufStart ? pOemPDEV->dwCompBitmapBufSize : 0;
    pOemPDEV->dwCompBitmapBufSize = dwOldBufferSize + 4096;

    pNewBuffer = (PBYTE)MemAlloc(pOemPDEV->dwCompBitmapBufSize);
    if (pNewBuffer == NULL) {
        MemFree(pOemPDEV->pCompBitmapBufStart);
        pOemPDEV->pCompBitmapBufStart = NULL;
        pOemPDEV->dwCompBitmapBufSize = 0;
        return FALSE;
    }

    if (pOemPDEV->pCompBitmapBufStart) {
        CopyMemory(pNewBuffer, pOemPDEV->pCompBitmapBufStart, dwOldBufferSize);
        MemFree(pOemPDEV->pCompBitmapBufStart);
        pOemPDEV->pCompBitmapBufStart = pNewBuffer;
    }
    else {
        pOemPDEV->pCompBitmapBufStart = pNewBuffer;
    }
    return TRUE;
}
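One detail that stands out in CreatePCLRasterGraphicPage, and that could plausibly explain the fuzziness (a guess, not a confirmed diagnosis): no raster resolution command is sent before ESC*r1A. In PCL the raster resolution is set with ESC*t#R, and when it is omitted the renderer assumes 75 dpi, which would make any rasterized output look soft. A sketch of emitting it alongside the other header commands (the dwOffset / dwPageBufferSize bookkeeping would need to grow by these 7 bytes):

// Sketch: declare 300 dpi raster resolution before Start Raster Graphics (ESC*r1A).
char bufRasRes[8];
StringCchPrintfA(bufRasRes, 8, "%c%s", 27, "*t300R"); // 7 chars
MoveMemory(pOemPDEV->pBufStart + dwOffset, bufRasRes, 8);
dwOffset += 7;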
RLE encoding (I have not had a chance to add this yet). I was looking at different forums for how the code should look, and this is what I came up with. I will add it, test the document, and update the post.
public virtual sbyte[] decompressRL(sbyte[] data, int startOffset, int width, int count)
{
    /* type 1 (run-length) compression */
    int dataCount = count;
    List<sbyte> decompressed = new List<sbyte>();
    int numberOfDecompressedBytes = 0;
    int dataStartOffset = startOffset;

    while (dataCount-- > 0)
    {
        int cntrlByte = (int)data[dataStartOffset++];
        // Repeated pattern: the control byte holds (repetitions - 1) of the next byte
        int val = data[dataStartOffset++];
        dataCount--;
        while (cntrlByte >= 0)
        {
            decompressed.Insert(numberOfDecompressedBytes++, (sbyte)val);
            cntrlByte--;
        }
    }
    mMaxUncompressedByteCount = numberOfDecompressedBytes;
    return getBytes(decompressed);
}
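For the encoding side (CompressBitmapRow is called in the driver code above but its implementation isn't shown), PCL compression mode 1 emits byte pairs: a repetition count (number of copies minus one, 0-255) followed by the byte to repeat, so the worst case is twice the input size, which matches the compRow[2100] / bitmapRow[1050] sizing. A C++ sketch assuming the same (dest, src, length) shape of call:

// Sketch of a PCL mode-1 (run-length) row compressor.
// Returns the number of bytes written to compRow; compRow must be able to
// hold 2 bytes per run (worst case 2x the input size).
DWORD CompressBitmapRow(BYTE* compRow, const BYTE* bitmapRow, LONG rowBytes)
{
    DWORD outSize = 0;
    LONG i = 0;
    while (i < rowBytes) {
        BYTE value = bitmapRow[i];
        LONG runLength = 1;
        // A single count byte can describe at most 256 repetitions (count 0-255).
        while (i + runLength < rowBytes &&
               bitmapRow[i + runLength] == value &&
               runLength < 256) {
            ++runLength;
        }
        compRow[outSize++] = (BYTE)(runLength - 1); // repetition count = copies - 1
        compRow[outSize++] = value;                 // byte to repeat
        i += runLength;
    }
    return outSize;
}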
This is how fuzzy the users claim the image looks when printed from a Word document through the driver. The original is very clear.

GL Screenshot Breaks on viewport resize…sometimes

I’m developing a plugin for SIMDIS (basically military Google Earth), written in C++ using VS 2012. It’s a pretty nifty little thing to auto-plot points, and one of its functions is to take a series of screenshots of the viewport and save the images off so they can be used/processed somewhere else. This works fine too… until you resize the viewport one too many times. Resizing is done by clicking the corner of the window and dragging it bigger or smaller, and the program may launch in full-screen or windowed mode; either way it works fine the first few sets… or as long as the window is not resized.
When it breaks, the program will still march happily along, create the files, and fill them with what seems to be an appropriate amount of data for whatever resolution image I’m trying to generate… but the format becomes no good. It will still be a *.bmp, but Windows stops being able to understand it. No errors are thrown, though (I think; I’m not catching any GL errors, if that’s even possible).
I can’t get it to consistently happen with a specific number of actions, but it seems to start failing after 3-7 view-port re-sizes. I don’t know if this is a problem with my screenshot code, an issue with the SIMDIS program or plugin, a GL issue, or what. I’ve tested it on multiple machines.
Has anyone run into this problem before? Is there something specific I should be doing that I’m not? Is this a problem native to the parent program (SIMDIS), or something I can work with/around with GL commands I don’t know about?
Screenshot code follows:
#include "TakeScreenshot.h" //has "#include <gl/GL.h>" etc...

TakeScreenshot::TakeScreenshot()
{
}

std::vector<int> * TakeScreenshot::TakeAScreenshotBMP(const char* filename)
{
    //std::cout << "Screenshot! ";
    std::vector<int> * returnVec = new std::vector<int>();
    int VPort[4] = {0,0,0,0};
    int FSize = 0;
    int PackStore = 0;

    //get GL viewport dimensions, x,y,w,h into vport
    glGetIntegerv(GL_VIEWPORT, VPort);

    //make a framebuffer, RGB
    FSize = VPort[2]*VPort[3]*3;
    unsigned char PStore[8294400]; // fixed-size pixel buffer (8,294,400 bytes = 1920*1080*4)

    //store settings
    glGetIntegerv(GL_PACK_ALIGNMENT, &PackStore);
    //unpack to byte order
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    //read the gl buffer into our buffer
    glReadPixels(VPort[0], VPort[1], VPort[2], VPort[3], GL_RGB, GL_UNSIGNED_BYTE, &PStore);
    //Pass back settings
    glPixelStorei(GL_PACK_ALIGNMENT, PackStore);

    ///
    //set up file info
    ///
    BITMAPINFOHEADER BMIH; //info header
    BMIH.biSize = sizeof(BITMAPINFOHEADER);
    BMIH.biSizeImage = VPort[2] * VPort[3] * 3;
    BMIH.biWidth = VPort[2];
    BMIH.biHeight = VPort[3];
    BMIH.biPlanes = 1;
    BMIH.biBitCount = 24;
    BMIH.biCompression = BI_RGB;

    BITMAPFILEHEADER bmfh; //file header
    int nBitsOffset = sizeof(BITMAPFILEHEADER) + BMIH.biSize;
    LONG lImageSize = BMIH.biSizeImage;
    LONG lFileSize = nBitsOffset + lImageSize;
    bmfh.bfType = 'B' + ('M'<<8);
    bmfh.bfOffBits = nBitsOffset;
    bmfh.bfSize = lFileSize;
    bmfh.bfReserved1 = bmfh.bfReserved2 = 0;

    // swap r and b values because GL has them backwards for BMP format.
    unsigned char SwapByte;
    for(int loop = 0; loop < FSize; loop += 3)
    {
        SwapByte = PStore[loop];
        PStore[loop] = PStore[loop+2];
        PStore[loop+2] = SwapByte;
    }

    ///
    // File writing section
    ///
    FILE *pFile;
    pFile = fopen(filename, "wb");
    //if something borked
    if(pFile == NULL)
    {
        std::cout << "TakeScreenshot::TakeAScreenshotBMP>> Error; was not able to create file (Permisions?)" << std::endl;
        returnVec->push_back(-1);
        returnVec->push_back(-1);
        return returnVec; //exit
    }

    UINT nWrittenFileHeaderSize = fwrite(&bmfh, 1, sizeof(BITMAPFILEHEADER), pFile);
    UINT nWrittenInfoHeaderSize = fwrite(&BMIH, 1, sizeof(BITMAPINFOHEADER), pFile);
    UINT nWrittenDIBDataSize = fwrite(&PStore, 1, lImageSize, pFile);
    fclose(pFile);

    //some return data for processing later
    returnVec->push_back(VPort[2]);
    returnVec->push_back(VPort[3]);
    return returnVec;
}

TakeScreenshot::~TakeScreenshot(void)
{
}
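One thing worth checking in code like this (a guess based on the symptom, not a confirmed diagnosis): 24-bit BMP scanlines must be padded to a multiple of 4 bytes, but the code writes tightly packed rows with GL_PACK_ALIGNMENT set to 1. Whenever a resize leaves the viewport at a width where width * 3 is not divisible by 4, the pixel data no longer lines up with what the headers promise and Windows refuses the file, which would explain why it only breaks after some resizes. A sketch of writing the rows with padding (names taken from TakeAScreenshotBMP above):

// Sketch: pad each BMP row to a 4-byte boundary; set the padded size in the
// headers *before* they are written.
const int bytesPerRow   = VPort[2] * 3;
const int paddedRowSize = (bytesPerRow + 3) & ~3; // round up to a multiple of 4
const unsigned char pad[3] = { 0, 0, 0 };
BMIH.biSizeImage = paddedRowSize * VPort[3];
bmfh.bfSize      = sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER) + BMIH.biSizeImage;
for (int row = 0; row < VPort[3]; ++row)
{
    fwrite(&PStore[row * bytesPerRow], 1, bytesPerRow, pFile);
    fwrite(pad, 1, paddedRowSize - bytesPerRow, pFile); // 0-3 padding bytes
}

Separately, PStore is a fixed 8,294,400-byte buffer, so a resized viewport that needs more than that (anything beyond roughly 1920x1440 at 3 bytes per pixel) would overflow it; sizing the buffer from FSize instead would avoid that.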

FFmpeg audio encoder new encode function

I would like to update an audio encoder that uses the (deprecated) function avcodec_encode_audio to avcodec_encode_audio2, without modifying the structure of the existing encoder:
outBytes = avcodec_encode_audio(m_handle, dst, sizeBytes, (const short int*)m_samBuf);
where:
1) m_handle AVCodecContext
2) dst, uint8_t * destination buffer
3) sizeBytes, uint32_t size of the destination buffer
4) m_samBuf, void * to the input chunk of data to encode (this is cast to const short int*)
Is there a simple way to do it?
I'm trying with:
int gotPack = 1;
memset (&m_Packet, 0, sizeof (m_Packet));
m_Frame = av_frame_alloc();
av_init_packet(&m_Packet);
m_Packet.data = dst;
m_Packet.size = sizeBytes;
uint8_t* buffer = (uint8_t*)m_samBuf;
m_Frame->nb_samples = m_handle->frame_size;
avcodec_fill_audio_frame(m_Frame,m_handle->channels,m_handle->sample_fmt,buffer,m_FrameSize,1);
outBytes = avcodec_encode_audio2(m_handle, &m_Packet, m_Frame, &gotPack);
char error[256];
av_strerror(outBytes,error,256);
if (outBytes < 0) {
    m_server->log(1, 1, "Input data: %d, encode function call error: %s \n", gotPack, error);
    return AUDIOWRAPPER_ERROR;
}
av_frame_free(&m_Frame);
It compiles, but it does not encode anything; I don't hear audio at the output if I pipe the output stream to mplayer, which was working prior to the upgrade.
What am I doing wrong?
The encoder accepts only two sample formats:
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_FLT, ///< float
here is how the buffer is allocated:
free(m_samBuf);
int bps = 2;
if (m_handle->codec->sample_fmts[0] == AV_SAMPLE_FMT_FLT) {
    bps = 4;
}
m_FrameSize = bps * m_handle->frame_size * m_handle->channels;
m_samBuf = malloc(m_FrameSize);
m_numSam = 0;
avcodec_fill_audio_frame should get you there:
memset(&m_Packet, 0, sizeof(m_Packet));
av_init_packet(&m_Packet);
m_Packet.data = dst;
m_Packet.size = sizeBytes;

m_Frame = av_frame_alloc(); // m_Frame is an AVFrame*, so allocate it rather than memset-ing the pointer
m_Frame->nb_samples = m_handle->frame_size; // the number of samples (per channel) this frame represents (your code already uses frame_size)
avcodec_fill_audio_frame(m_Frame, m_handle->channels, m_handle->sample_fmt,
                         buffer, // your (uint8_t*)m_samBuf
                         sizeBytes, 1);

int gotPack = 1;
avcodec_encode_audio2(m_handle, &m_Packet, m_Frame, &gotPack); // pass the AVFrame* itself, not its address
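Two details matter when replacing the old call: avcodec_encode_audio2 returns 0 on success rather than the number of encoded bytes, so the byte count has to come from the packet, and the packet contents are only valid when got_packet_ptr is set. A sketch of a wrapper that mimics the old return convention (encodeAudio2Compat is a hypothetical name; the parameters mirror the question's variables):

// Sketch: wrapper that behaves like the old avcodec_encode_audio call.
// Returns the number of encoded bytes placed in dst, 0 if the encoder
// buffered the input, or a negative error code.
int encodeAudio2Compat(AVCodecContext* handle, uint8_t* dst, int sizeBytes, const void* samples)
{
    AVPacket pkt;
    av_init_packet(&pkt);
    pkt.data = dst;          // encode straight into the caller's buffer
    pkt.size = sizeBytes;

    AVFrame* frame = av_frame_alloc();
    frame->nb_samples     = handle->frame_size;
    frame->format         = handle->sample_fmt;
    frame->channel_layout = handle->channel_layout;

    int frameBytes = av_samples_get_buffer_size(NULL, handle->channels,
                                                handle->frame_size, handle->sample_fmt, 1);
    avcodec_fill_audio_frame(frame, handle->channels, handle->sample_fmt,
                             (const uint8_t*)samples, frameBytes, 1);

    int gotPacket = 0;
    int ret = avcodec_encode_audio2(handle, &pkt, frame, &gotPacket);
    av_frame_free(&frame);

    if (ret < 0)
        return ret;                      // error
    return gotPacket ? pkt.size : 0;     // bytes written, as the old API reported
}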

Non-audible videos with libwebm (VP8/Opus) -- Syncing audio --

I am trying to create a very simple WebM (VP8/Opus) encoder; however, I cannot get the audio to work.
ffprobe does detect the file format and duration:
Stream #1:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)
The video plays just fine in VLC and Chrome, but with no audio; for some reason the audio input bitrate is always 0.
Most of the audio encoding code was copied from
https://github.com/fnordware/AdobeWebM/blob/master/src/premiere/WebM_Premiere_Export.cpp
Here is the relevant code:
static const long long kTimeScale = 1000000000LL;
MkvWriter writer;
writer.Open("video.webm");
Segment mux_seg;
mux_seg.Init(&writer);
// VPX encoding...
int16_t pcm[SAMPLES];
uint64_t audio_track_id = mux_seg.AddAudioTrack(SAMPLE_RATE, 1, 0);
mkvmuxer::AudioTrack *audioTrack = (mkvmuxer::AudioTrack*)mux_seg.GetTrackByNumber(audio_track_id);
audioTrack->set_codec_id(mkvmuxer::Tracks::kOpusCodecId);
audioTrack->set_seek_pre_roll(80000000);
OpusEncoder *encoder = opus_encoder_create(SAMPLE_RATE, 1, OPUS_APPLICATION_AUDIO, NULL);
opus_encoder_ctl(encoder, OPUS_SET_BITRATE(64000));
opus_int32 skip = 0;
opus_encoder_ctl(encoder, OPUS_GET_LOOKAHEAD(&skip));
audioTrack->set_codec_delay(skip * kTimeScale / SAMPLE_RATE);
mux_seg.CuesTrack(audio_track_id);
uint64_t currentAudioSample = 0;
uint64_t opus_ts = 0;
while (has_frame) {
    int bytes = opus_encode(encoder, pcm, SAMPLES, out, SAMPLES * 8);
    opus_ts = currentAudioSample * kTimeScale / SAMPLE_RATE;
    mux_seg.AddFrame(out, bytes, audio_track_id, opus_ts, true);
    currentAudioSample += SAMPLES;
}
opus_encoder_destroy(encoder);
mux_seg.Finalize();
writer.Close();
Update #1:
It seems that the problem is that WebM requires the audio and video tracks to be interleaved.
However, I cannot figure out how to sync the audio.
Should I calculate the frame duration, then encode the equivalent audio samples?
The problem was that I was missing the Ogg/OpusHead header data (CodecPrivate), and the audio frame timestamps were not accurate.
To complete the answer, here is the pseudocode for the encoder:
const int kTicksPerSecond = 1000000000; // webm timescale
const int kTimeScale = kTicksPerSecond / FPS;
const int kTwoNanoSeconds = 1000000000;

init_opus_encoder();
audioTrack->set_seek_pre_roll(80000000);
audioTrack->set_codec_delay(opus_preskip);
audioTrack->SetCodecPrivate(ogg_header_data, ogg_header_size);

while (has_video_frame) {
    encode_vpx_frame();
    video_pts = frame_index * kTimeScale;
    muxer_segment.addFrame(frame_packet_data, packet_length, video_track_id, video_pts, packet_flags);

    // fill the video frames gap with OPUS audio samples
    while (audio_pts < video_pts + kTimeScale) {
        encode_opus_frame();
        muxer_segment.addFrame(opus_frame_data, opus_frame_data_length, audio_track_id, audio_pts, true /* keyframe */);
        audio_pts = curr_audio_samples * kTwoNanoSeconds / 48000;
        curr_audio_samples += 960;
    }
}
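For reference, the ogg_header_data passed to SetCodecPrivate for Opus-in-WebM is the 19-byte OpusHead structure from the Ogg encapsulation spec (RFC 7845). A sketch of building it for the mono 48 kHz case, reusing the pre-skip value obtained from OPUS_GET_LOOKAHEAD (skip in the first listing):

// Sketch: minimal 19-byte OpusHead block for AudioTrack::SetCodecPrivate().
// Little-endian fields; channel mapping family 0 (mono/stereo).
uint8_t opus_head[19];
memcpy(opus_head, "OpusHead", 8);               // magic signature
opus_head[8] = 1;                               // version
opus_head[9] = 1;                               // channel count (mono here)
opus_head[10] = (uint8_t)(skip & 0xFF);         // pre-skip, 16-bit little-endian
opus_head[11] = (uint8_t)(skip >> 8);
uint32_t input_rate = 48000;                    // original input sample rate
opus_head[12] = (uint8_t)(input_rate & 0xFF);
opus_head[13] = (uint8_t)((input_rate >> 8) & 0xFF);
opus_head[14] = (uint8_t)((input_rate >> 16) & 0xFF);
opus_head[15] = (uint8_t)((input_rate >> 24) & 0xFF);
opus_head[16] = 0;                              // output gain, Q7.8 (0 dB)
opus_head[17] = 0;
opus_head[18] = 0;                              // channel mapping family 0
audioTrack->SetCodecPrivate(opus_head, sizeof(opus_head));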