waveOutWrite compatible with ASIO? - c++

I am writing an application that receives sound data from a low-latency ASIO card. Low latency here means that I get only 128 samples per batch at a 48 kHz sample rate. The ASIO card delivers raw samples as 32-bit signed integers.
Now I want to listen to the sound coming in through the ASIO card, not on the ASIO card itself but on the default Windows output device. I am using waveOutWrite, set up with WAVE_FORMAT_PCM and the same characteristics as the ASIO input, and I call it every time I get a new 128-sample batch. Because the WAV format does not allow 32-bit integer samples, I reduce them to 16 bits.
HWAVEOUT waveOut;
void startListening(){
WAVEFORMATEX format;
format.wFormatTag = WAVE_FORMAT_PCM;
format.nChannels = 1;
format.nSamplesPerSec = sampleRate;
format.nAvgBytesPerSec = sampleRate * 2;
format.nBlockAlign = 2;
format.wBitsPerSample = 16;
format.cbSize = 0;
MMRESULT result = waveOutOpen(waveOut, WAVE_MAPPER, &format, 0, 0, CALLBACK_NULL);
if(result != MMSYSERR_NOERROR){
return;
}
}
typedef struct{
short *buffer;
int length;
HWAVEOUT waveOut;
} ListenInfo;
void newListeningData(void *buffer, int length){
ListenInfo *listenInfo = new ListenInfo();
listenInfo->buffer = new short[length];
listenInfo->length = length;
listenInfo->waveOut = *waveOut;
if(bitrate == 32){
int *bufferInt = (int *)buffer;
for(int i = 0; i < length; i++){
listenInfo->buffer[i] = (bufferInt[i]);
}
CreateThread(NULL, 0, &(listen), listenInfo, 0, NULL);
}
else if(bitrate == 16){
memcpy(listenInfo->buffer, (short *)buffer, length * 2);
CreateThread(NULL, 0, &(listen), listenInfo, 0, NULL);
}
else{
printf("%d: Bitrate is not 16 or 32!\n", index);
}
}
DWORD WINAPI listen(__in LPVOID lpParameter){
ListenInfo *info = (ListenInfo *)lpParameter;
WAVEHDR header;
memset(&header, 0, sizeof(WAVEHDR));
header.dwBufferLength = info->length;
header.lpData = (char *)(info->buffer);
MMRESULT result = waveOutPrepareHeader(info->waveOut, &header, sizeof(WAVEHDR));
result = waveOutWrite(info->waveOut, &header, sizeof(WAVEHDR));
while(waveOutUnprepareHeader(info->waveOut, &header, sizeof(WAVEHDR)) == WAVERR_STILLPLAYING){
Sleep(10);
}
delete[] info->buffer;
delete info;
return 0;
}
The problem is that I can hear only severe clipping and squeaking. The sound is distorted beyond recognition. I know it is not a synchronization error, because I also save the samples into a wav file with the same characteristics and the sound is distorted in the same way.
How can I convert signed 32-bit samples into something that waveOutWrite can play?

The problem was caused by the fact that I was using a different bitrate than I had been led to believe. When I corrected the values in WAVEFORMATEX, it worked!
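For readers who really do have 32-bit samples and a 16-bit output format, note that the plain assignment in the question's loop in practice keeps only the low 16 bits of each sample, which sounds like noise. The sketch below is one assumed way to narrow the samples, not the asker's final code; `narrowTo16` and `byteLength` are made-up helper names. The second helper is a reminder that `WAVEHDR::dwBufferLength` is a byte count, not a sample count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale full-range 32-bit samples down to the 16-bit range.
// Dividing by 65536 avoids relying on how right-shift of negative
// values behaves before C++20.
std::vector<int16_t> narrowTo16(const int32_t* in, std::size_t count) {
    std::vector<int16_t> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<int16_t>(in[i] / 65536);
    }
    return out;
}

// Length in bytes of a buffer of 16-bit samples, as a byte-counted API expects.
std::size_t byteLength(const std::vector<int16_t>& samples) {
    return samples.size() * sizeof(int16_t);
}
```

With a 128-sample mono batch, `byteLength` yields 256, which is the value a byte-counted length field would need.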

Related

Bits Per Sample / Pixel libtiff vs WIC

TIFF *TiffImage;
uint16 photo, bps, spp, fillorder;
uint32 width,height;
unsigned long stripSize;
unsigned long imageOffset, result;
int stripMax, stripCount;
unsigned char *buffer, tempbyte;
unsigned short *buffer16;
unsigned int *buffer32;
unsigned long bufferSize, count;
bool success = true;
int shiftCount = 0;
//read image to InData
const char *InFileName = fileName.c_str();
if((TiffImage = TIFFOpen(InFileName, "r")) == NULL){
ErrMsg("Could not open incoming image\n");
return false;
}
// Check that it is of a type that we support
if(TIFFGetField(TiffImage, TIFFTAG_BITSPERSAMPLE, &bps) == 0) {
ErrMsg("Either undefined or unsupported number of bits per sample\n");
return false;
}
TBitPrecision bitPrecision = (TBitPrecision)bps;
char* imageDesc = NULL;
TIFFGetField(TiffImage, TIFFTAG_IMAGEDESCRIPTION, &imageDesc);
// Get actual bit precision for CP Images
if (bps > 8 && bps <= 16)
{
if (GetCpTiffTag(imageDesc, CP_TIFFTAG_BITPRECISION, (uint32*)&bitPrecision) == true)
{
shiftCount = 16 - bitPrecision;
}
}
In my libtiff implementation I have used 12-bit-per-pixel images and also 10 bpp images.
It is very easy to handle this information with libtiff, but I can't find a similar way to do so in WIC.
uint16_t photo, bps, spp, fillorder;
uint32_t width,height;
unsigned long stripSize;
unsigned long imageOffset, result;
int stripMax, stripCount;
unsigned char *buffer, tempbyte;
unsigned short *buffer16;
unsigned int *buffer32;
unsigned long bufferSize, count;
bool success = true;
int shiftCount = 0;
//read image to InData
const char *InFileName = fileName.c_str();
IWICImagingFactory* piFactory = NULL;
// Create WIC factory
HRESULT hr = CoCreateInstance(
CLSID_WICImagingFactory,
NULL,
CLSCTX_INPROC_SERVER,
IID_PPV_ARGS(&piFactory)
);
// Create a decoder
IWICBitmapDecoder *pIDecoder = NULL;
IWICBitmapFrameDecode *pIDecoderFrame = NULL;
std::wstring ws;
ws.assign(fileName.begin(), fileName.end());
// get temporary LPCWSTR (pretty safe)
LPCWSTR pcwstr = ws.c_str();
hr = piFactory->CreateDecoderFromFilename(
pcwstr, // Image to be decoded
NULL, // Do not prefer a particular vendor
GENERIC_READ, // Desired read access to the file
WICDecodeMetadataCacheOnDemand, // Cache metadata when needed
&pIDecoder // Pointer to the decoder
);
// Retrieve the first bitmap frame.
if (SUCCEEDED(hr))
{
hr = pIDecoder->GetFrame(0, &pIDecoderFrame);
}
else
{
ErrMsg("Could not open incoming image\n");
}
return true;
Am I supposed to use EXIF or XMP? How do I find the TIFF tags in WIC?
The DirectXTex library has lots of examples of using WIC from C++.
You need something like:
using Microsoft::WRL::ComPtr;
ComPtr<IWICMetadataQueryReader> metareader;
hr = pIDecoderFrame->GetMetadataQueryReader(metareader.GetAddressOf());
if (SUCCEEDED(hr))
{
PROPVARIANT value;
PropVariantInit(&value);
if (SUCCEEDED(metareader->GetMetadataByName(L"/ifd/{ushort=258}", &value))
&& value.vt == VT_UI2)
{
// BitsPerSample is in value.uiVal
}
PropVariantClear(&value);
}
You should get in the habit of using a smart-pointer like ComPtr rather than using raw interface pointers to keep the ref counts correct.
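Once the bits-per-sample value has been read (258 is the TIFF BitsPerSample tag), the `shiftCount = 16 - bitPrecision` logic from the libtiff version can be reused as it is. A minimal sketch, assuming the shift is meant to left-align 10- or 12-bit samples that sit in the low bits of 16-bit containers; `expandTo16` is a hypothetical helper, not a libtiff or WIC function:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Left-align samples that carry `bitPrecision` significant bits (9..16)
// inside 16-bit containers, mirroring shiftCount = 16 - bitPrecision.
void expandTo16(std::vector<uint16_t>& samples, unsigned bitPrecision) {
    assert(bitPrecision > 8 && bitPrecision <= 16);
    const unsigned shiftCount = 16 - bitPrecision;
    for (uint16_t& s : samples) {
        s = static_cast<uint16_t>(s << shiftCount);
    }
}
```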

DirectSound API explanation

As a college project we have to develop a Server-Client music streaming application using the DirectSound API. However, due to lack of information, guides or tutorials online, the only source I can gather info about it is the piece of code provided below (which was the only thing provided by the lecturer). Can anyone help me understand the general purpose of these functions and the order they should be implemented in?
Thanks in advance.
IDirectSound8 * directSound = nullptr;
IDirectSoundBuffer * primaryBuffer = nullptr;
IDirectSoundBuffer8 * secondaryBuffer = nullptr;
BYTE * dataBuffer = nullptr;
DWORD dataBufferSize;
DWORD averageBytesPerSecond;
// Search the file for the chunk we want
// Returns the size of the chunk and its location in the file
HRESULT FindChunk(HANDLE fileHandle, FOURCC fourcc, DWORD & chunkSize, DWORD & chunkDataPosition)
{
HRESULT hr = S_OK;
DWORD chunkType;
DWORD chunkDataSize;
DWORD riffDataSize = 0;
DWORD fileType;
DWORD bytesRead = 0;
DWORD offset = 0;
if (SetFilePointer(fileHandle, 0, NULL, FILE_BEGIN) == INVALID_SET_FILE_POINTER)
{
return HRESULT_FROM_WIN32(GetLastError());
}
while (hr == S_OK)
{
if (ReadFile(fileHandle, &chunkType, sizeof(DWORD), &bytesRead, NULL) == 0)
{
hr = HRESULT_FROM_WIN32(GetLastError());
}
if (ReadFile(fileHandle, &chunkDataSize, sizeof(DWORD), &bytesRead, NULL) == 0)
{
hr = HRESULT_FROM_WIN32(GetLastError());
}
switch (chunkType)
{
case fourccRIFF:
riffDataSize = chunkDataSize;
chunkDataSize = 4;
if (ReadFile(fileHandle, &fileType, sizeof(DWORD), &bytesRead, NULL) == 0)
{
hr = HRESULT_FROM_WIN32(GetLastError());
}
break;
default:
if (SetFilePointer(fileHandle, chunkDataSize, NULL, FILE_CURRENT) == INVALID_SET_FILE_POINTER)
{
return HRESULT_FROM_WIN32(GetLastError());
}
}
offset += sizeof(DWORD) * 2;
if (chunkType == fourcc)
{
chunkSize = chunkDataSize;
chunkDataPosition = offset;
return S_OK;
}
offset += chunkDataSize;
if (bytesRead >= riffDataSize)
{
return S_FALSE;
}
}
return S_OK;
}
// Read a chunk of data of the specified size from the file at the specified location into the
// supplied buffer
HRESULT ReadChunkData(HANDLE fileHandle, void * buffer, DWORD buffersize, DWORD bufferoffset)
{
HRESULT hr = S_OK;
DWORD bytesRead;
if (SetFilePointer(fileHandle, bufferoffset, NULL, FILE_BEGIN) == INVALID_SET_FILE_POINTER)
{
return HRESULT_FROM_WIN32(GetLastError());
}
if (ReadFile(fileHandle, buffer, buffersize, &bytesRead, NULL) == 0)
{
hr = HRESULT_FROM_WIN32(GetLastError());
}
return hr;
}
bool Initialise()
{
HRESULT result;
DSBUFFERDESC bufferDesc;
WAVEFORMATEX waveFormat;
// Initialize the direct sound interface pointer for the default sound device.
result = DirectSoundCreate8(NULL, &directSound, NULL);
if (FAILED(result))
{
return false;
}
// Set the cooperative level to priority so the format of the primary sound buffer can be modified.
// We use the handle of the desktop window since we are a console application. If you do write a
// graphical application, you should use the HWnd of the graphical application.
result = directSound->SetCooperativeLevel(GetDesktopWindow(), DSSCL_PRIORITY);
if (FAILED(result))
{
return false;
}
// Setup the primary buffer description.
bufferDesc.dwSize = sizeof(DSBUFFERDESC);
bufferDesc.dwFlags = DSBCAPS_PRIMARYBUFFER | DSBCAPS_CTRLVOLUME;
bufferDesc.dwBufferBytes = 0;
bufferDesc.dwReserved = 0;
bufferDesc.lpwfxFormat = NULL;
bufferDesc.guid3DAlgorithm = GUID_NULL;
// Get control of the primary sound buffer on the default sound device.
result = directSound->CreateSoundBuffer(&bufferDesc, &primaryBuffer, NULL);
if (FAILED(result))
{
return false;
}
// Set up the format of the primary sound buffer.
// In this case it is a .WAV file recorded at 44,100 samples per second in 16-bit stereo (CD audio
// format).
// Really, we should set this up from the wave file format loaded from the file.
waveFormat.wFormatTag = WAVE_FORMAT_PCM;
waveFormat.nSamplesPerSec = 44100;
waveFormat.wBitsPerSample = 16;
waveFormat.nChannels = 2;
waveFormat.nBlockAlign = (waveFormat.wBitsPerSample / 8) * waveFormat.nChannels;
waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
waveFormat.cbSize = 0;
// Set the primary buffer to be the wave format specified.
result = primaryBuffer->SetFormat(&waveFormat);
if (FAILED(result))
{
return false;
}
return true;
}
void Shutdown()
{
// Destroy the data buffer
if (dataBuffer != nullptr)
{
delete[] dataBuffer;
dataBuffer = nullptr;
}
// Release the primary sound buffer pointer.
if (primaryBuffer != nullptr)
{
primaryBuffer->Release();
primaryBuffer = nullptr;
}
// Release the direct sound interface pointer.
if (directSound != nullptr)
{
directSound->Release();
directSound = nullptr;
}
}
// Load the wave file into memory and setup the secondary buffer.
bool LoadWaveFile(TCHAR * filename)
{
WAVEFORMATEXTENSIBLE wfx = { 0 };
WAVEFORMATEX waveFormat;
DSBUFFERDESC bufferDesc;
HRESULT result;
IDirectSoundBuffer * tempBuffer;
DWORD chunkSize;
DWORD chunkPosition;
DWORD filetype;
HRESULT hr = S_OK;
// Open the wave file
HANDLE fileHandle = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0,
NULL);
if (fileHandle == INVALID_HANDLE_VALUE)
{
return false;
}
if (SetFilePointer(fileHandle, 0, NULL, FILE_BEGIN) == INVALID_SET_FILE_POINTER)
{
return false;
}
// Make sure we have a RIFF wave file
FindChunk(fileHandle, fourccRIFF, chunkSize, chunkPosition);
ReadChunkData(fileHandle, &filetype, sizeof(DWORD), chunkPosition);
if (filetype != fourccWAVE)
{
return false;
}
// Locate the 'fmt ' chunk, and copy its contents into a WAVEFORMATEXTENSIBLE structure.
FindChunk(fileHandle, fourccFMT, chunkSize, chunkPosition);
ReadChunkData(fileHandle, &wfx, chunkSize, chunkPosition);
// Find the audio data chunk
FindChunk(fileHandle, fourccDATA, chunkSize, chunkPosition);
dataBufferSize = chunkSize;
// Read the audio data from the 'data' chunk. This is the data that needs to be copied into
// the secondary buffer for playing
dataBuffer = new BYTE[dataBufferSize];
ReadChunkData(fileHandle, dataBuffer, dataBufferSize, chunkPosition);
CloseHandle(fileHandle);
// Set the wave format of the secondary buffer that this wave file will be loaded onto.
// The value of wfx.Format.nAvgBytesPerSec will be very useful to you since it gives you
// an approximate value for how many bytes it takes to hold one second of audio data.
waveFormat.wFormatTag = wfx.Format.wFormatTag;
waveFormat.nSamplesPerSec = wfx.Format.nSamplesPerSec;
waveFormat.wBitsPerSample = wfx.Format.wBitsPerSample;
waveFormat.nChannels = wfx.Format.nChannels;
waveFormat.nBlockAlign = wfx.Format.nBlockAlign;
waveFormat.nAvgBytesPerSec = wfx.Format.nAvgBytesPerSec;
waveFormat.cbSize = 0;
// Set the buffer description of the secondary sound buffer that the wave file will be loaded onto.
// In this example, we setup a buffer the same size as that of the audio data. For the assignment,
// your secondary buffer should only be large enough to hold approximately four seconds of data.
bufferDesc.dwSize = sizeof(DSBUFFERDESC);
bufferDesc.dwFlags = DSBCAPS_CTRLVOLUME | DSBCAPS_GLOBALFOCUS | DSBCAPS_CTRLPOSITIONNOTIFY;
bufferDesc.dwBufferBytes = dataBufferSize;
bufferDesc.dwReserved = 0;
bufferDesc.lpwfxFormat = &waveFormat;
bufferDesc.guid3DAlgorithm = GUID_NULL;
// Create a temporary sound buffer with the specific buffer settings.
result = directSound->CreateSoundBuffer(&bufferDesc, &tempBuffer, NULL);
if (FAILED(result))
{
return false;
}
// Test the buffer format against the direct sound 8 interface and create the secondary buffer.
result = tempBuffer->QueryInterface(IID_IDirectSoundBuffer8, (void**)&secondaryBuffer);
if (FAILED(result))
{
return false;
}
// Release the temporary buffer.
tempBuffer->Release();
tempBuffer = nullptr;
return true;
}
void ReleaseSecondaryBuffer()
{
// Release the secondary sound buffer.
if (secondaryBuffer != nullptr)
{
(secondaryBuffer)->Release();
secondaryBuffer = nullptr;
}
}
bool PlayWaveFile()
{
HRESULT result;
unsigned char * bufferPtr1;
unsigned long bufferSize1;
unsigned char * bufferPtr2;
unsigned long bufferSize2;
BYTE * dataBufferPtr = dataBuffer;
DWORD soundBytesOutput = 0;
bool fillFirstHalf = true;
LPDIRECTSOUNDNOTIFY8 directSoundNotify;
DSBPOSITIONNOTIFY positionNotify[2];
// Set position of playback at the beginning of the sound buffer.
result = secondaryBuffer->SetCurrentPosition(0);
if (FAILED(result))
{
return false;
}
// Set volume of the buffer to 100%.
result = secondaryBuffer->SetVolume(DSBVOLUME_MAX);
if (FAILED(result))
{
return false;
}
// Create an event for notification that playing has stopped. This is only useful
// when your audio file fits in the entire secondary buffer (as in this example).
// For the assignment, you are going to need notifications when the playback has reached the
// first quarter of the buffer or the third quarter of the buffer so that you know when
// you should copy more data into the secondary buffer.
HANDLE playEventHandles[1];
playEventHandles[0] = CreateEvent(NULL, FALSE, FALSE, NULL);
result = secondaryBuffer->QueryInterface(IID_IDirectSoundNotify8, (LPVOID*)&directSoundNotify);
if (FAILED(result))
{
return false;
}
// This notification is used to indicate that we have finished playing the buffer of audio. In
// the assignment, you will need two different notifications as mentioned above.
positionNotify[0].dwOffset = DSBPN_OFFSETSTOP;
positionNotify[0].hEventNotify = playEventHandles[0];
directSoundNotify->SetNotificationPositions(1, positionNotify);
directSoundNotify->Release();
// Now we can fill our secondary buffer and play it. In the assignment, you will not be able to fill
// the buffer all at once since the secondary buffer will not be large enough. Instead, you will need to
// loop through the data that you have retrieved from the server, filling different sections of the
// secondary buffer as you receive notifications.
// Lock the first part of the secondary buffer to write wave data into it. In this case, we lock the entire
// buffer, but for the assignment, you will only want to lock the half of the buffer that is not being played.
// You will definitely want to look up the methods for the IDIRECTSOUNDBUFFER8 interface to see what these
// methods do and what the parameters are used for.
result = secondaryBuffer->Lock(0, dataBufferSize, (void**)&bufferPtr1, (DWORD*)&bufferSize1, (void**)&bufferPtr2, (DWORD*)&bufferSize2, 0);
if (FAILED(result))
{
return false;
}
// Copy the wave data into the buffer. If you need to insert some silence into the buffer, insert values of 0.
memcpy(bufferPtr1, dataBuffer, bufferSize1);
if (bufferPtr2 != NULL)
{
// The second region continues in the source data where the first region ended.
memcpy(bufferPtr2, dataBuffer + bufferSize1, bufferSize2);
}
// Unlock the secondary buffer after the data has been written to it.
result = secondaryBuffer->Unlock((void*)bufferPtr1, bufferSize1, (void*)bufferPtr2, bufferSize2);
if (FAILED(result))
{
return false;
}
// Play the contents of the secondary sound buffer. If you want play to go back to the start of the buffer
// again, set the last parameter to DSBPLAY_LOOPING instead of 0. If play is already in progress, then
// play will just continue.
result = secondaryBuffer->Play(0, 0, 0);
if (FAILED(result))
{
return false;
}
// Wait for notifications. In this case, we only have one notification so we could use WaitForSingleObject,
// but for the assignment you will need more than one notification, so you will need WaitForMultipleObjects
result = WaitForMultipleObjects(1, playEventHandles, FALSE, INFINITE);
// In this case, we have been notified that playback has finished so we can just finish. In the assignment,
// you should use the appropriate notification to determine which part of the secondary buffer needs to be
// filled and handle it accordingly.
CloseHandle(playEventHandles[0]);
return true;
}
DirectSound is deprecated. See below for recommended replacements.
Documentation can be found on Microsoft Docs. The last time samples for DirectSound were shipped was in the legacy DirectX SDK (November 2007) release which is why you are having a hard time finding them. You can find them on GitHub. The headers and link libraries for DirectSound are in the Windows SDK.
Recommendations
For 'real-time mixing and effects' often used in games, the modern replacement is XAudio2. XAudio 2.9 is included in Windows 10, and is available through a simple side-by-side redistribution model for Windows 7, Windows 8.0, and Windows 8.1. Documentation can be found here, samples can be found here, and the redist can be found here. You may also want to take a look at DirectX Tool Kit for Audio.
For other audio output and input, see Windows Core Audio APIs (WASAPI) which is supported on Windows Vista, Windows 7, Windows 8.0, Windows 8.1, and Windows 10. Documentation can be found here. Some samples can be found on GitHub in Xbox-ATG-Samples and Windows-universal-samples--while these are all UWP samples, the API also supports Win32 desktop.
There's also a new Microsoft Spatial Sounds API on Windows 10 (a.k.a. Windows Sonic). Documentation can be found here. Samples can be found on GitHub in Xbox-ATG-Samples.
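On the original question of what the lecturer's functions do: `FindChunk` walks the RIFF container, in which every chunk is a four-character tag followed by a 32-bit little-endian size and then the payload. `LoadWaveFile` uses it to locate the 'fmt ' and 'data' chunks. The sketch below performs the same walk over an in-memory buffer so that it does not depend on the Windows file API. `findChunk` and `readLE32` are illustrative names, and the even-length padding of chunks is handled here although the lecturer's version skips that detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a 32-bit little-endian value starting at `pos`.
static uint32_t readLE32(const std::vector<uint8_t>& d, std::size_t pos) {
    return uint32_t(d[pos]) | (uint32_t(d[pos + 1]) << 8) |
           (uint32_t(d[pos + 2]) << 16) | (uint32_t(d[pos + 3]) << 24);
}

// Locate the chunk named `tag` (e.g. "fmt " or "data") in a RIFF/WAVE image.
// On success stores the payload offset and size and returns true.
bool findChunk(const std::vector<uint8_t>& d, const char* tag,
               std::size_t& offset, uint32_t& size) {
    // 12-byte header: "RIFF", total size, "WAVE".
    if (d.size() < 12 || std::memcmp(d.data(), "RIFF", 4) != 0 ||
        std::memcmp(d.data() + 8, "WAVE", 4) != 0) {
        return false;
    }
    std::size_t pos = 12;
    while (pos + 8 <= d.size()) {
        const uint32_t chunkSize = readLE32(d, pos + 4);
        if (std::memcmp(d.data() + pos, tag, 4) == 0) {
            if (chunkSize > d.size() - (pos + 8)) return false;  // truncated file
            offset = pos + 8;
            size = chunkSize;
            return true;
        }
        // Skip header and payload; chunks are padded to an even length.
        const std::size_t advance = 8 + std::size_t(chunkSize) + (chunkSize & 1);
        if (advance > d.size() - pos) return false;
        pos += advance;
    }
    return false;
}
```

The order in the lecturer's code follows from this: `Initialise` creates the device and primary buffer, `LoadWaveFile` parses the file and creates the secondary buffer, `PlayWaveFile` fills and plays it, and `ReleaseSecondaryBuffer` and `Shutdown` clean up.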

How to record continuous raw audio data into a circular buffer with C++ on Windows 10?

Since Windows Multimedia turned out to be utterly incapable of recording continuous audio, I got the hint to use Windows Core Audio. There is sort of a manual here, but I can't figure out how to write the loads of overhead code to get the recording working. Can anyone provide a complete, minimal implementation of continuous audio recording to a circular buffer?
So far I am stuck at the code below not getting past the line pEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pDevice); because pEnumerator remains nullptr.
#define VC_EXTRALEAN
#define _USE_MATH_DEFINES
#include <Windows.h>
#include <Audioclient.h>
#include <Mmdeviceapi.h>
#define REFTIMES_PER_SEC 10000000
#define REFTIMES_PER_MILLISEC 10000
int main() {
REFERENCE_TIME hnsRequestedDuration = REFTIMES_PER_SEC;
UINT32 bufferFrameCount;
UINT32 numFramesAvailable;
IMMDeviceEnumerator* pEnumerator = NULL;
IMMDevice* pDevice = NULL;
IAudioClient* pAudioClient = NULL;
IAudioCaptureClient* pCaptureClient = NULL;
WAVEFORMATEX* pwfx = NULL;
UINT32 packetLength = 0;
BYTE* pData;
DWORD flags;
CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL, __uuidof(IMMDeviceEnumerator), (void**)&pEnumerator);
pEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pDevice);
pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, NULL, (void**)&pAudioClient);
pAudioClient->GetMixFormat(&pwfx);
pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK, hnsRequestedDuration, 0, pwfx, NULL);
pAudioClient->GetBufferSize(&bufferFrameCount); // Get the size of the allocated buffer.
pAudioClient->GetService(__uuidof(IAudioCaptureClient), (void**)&pCaptureClient);
// Calculate the actual duration of the allocated buffer.
REFERENCE_TIME hnsActualDuration = (double)REFTIMES_PER_SEC* bufferFrameCount / pwfx->nSamplesPerSec;
pAudioClient->Start(); // Start recording.
// Each loop fills about half of the shared buffer.
while(true) {
// Sleep for half the buffer duration.
Sleep(hnsActualDuration/REFTIMES_PER_MILLISEC/2);
pCaptureClient->GetNextPacketSize(&packetLength);
while(packetLength != 0) {
// Get the available data in the shared buffer.
pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL);
if(flags&AUDCLNT_BUFFERFLAGS_SILENT) {
pData = NULL; // Tell CopyData to write silence.
}
// Copy the available capture data to the audio sink.
//hr = pMySink->CopyData(pData, numFramesAvailable, &bDone);
pCaptureClient->ReleaseBuffer(numFramesAvailable);
pCaptureClient->GetNextPacketSize(&packetLength);
}
}
pAudioClient->Stop();
return 0;
}
EDIT (24.07.2021):
Here is an update of my code for troubleshooting:
#define VC_EXTRALEAN
#define _USE_MATH_DEFINES
#include <Windows.h>
#include <Audioclient.h>
#include <Mmdeviceapi.h>
#include <chrono>
class Clock {
private:
typedef chrono::high_resolution_clock clock;
chrono::time_point<clock> t;
public:
Clock() { start(); }
void start() { t = clock::now(); }
double stop() const { return chrono::duration_cast<chrono::duration<double>>(clock::now()-t).count(); }
};
const uint base = 4096;
const uint sample_rate = 48000; // must be supported by microphone
const uint sample_size = 1*base; // must be a power of 2
const uint bandwidth = 5000; // must be <= sample_rate/2
float* wave = new float[sample_size]; // circular buffer
void fill(float* const wave, const float* const buffer, int offset) {
for(int i=sample_size-1; i>=offset; i--) { // start at the last valid index to stay inside the array
wave[i] = wave[i-offset];
}
for(int i=0; i<offset; i++) {
const uint p = offset-1-i;
wave[i] = 0.5f*(buffer[2*p]+buffer[2*p+1]); // left and right channels
}
}
int main() {
for(uint i=0; i<sample_size; i++) wave[i] = 0.0f;
Clock clock;
#define REFTIMES_PER_SEC 10000000
#define REFTIMES_PER_MILLISEC 10000
REFERENCE_TIME hnsRequestedDuration = REFTIMES_PER_SEC;
UINT32 bufferFrameCount;
UINT32 numFramesAvailable;
IMMDeviceEnumerator* pEnumerator = NULL;
IMMDevice* pDevice = NULL;
IAudioClient* pAudioClient = NULL;
IAudioCaptureClient* pCaptureClient = NULL;
WAVEFORMATEX* pwfx = NULL;
UINT32 packetLength = 0;
BYTE* pData;
DWORD flags;
CoInitializeEx(NULL, COINIT_MULTITHREADED);
CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL, __uuidof(IMMDeviceEnumerator), (void**)&pEnumerator);
pEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pDevice);
pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, NULL, (void**)&pAudioClient);
pAudioClient->GetMixFormat(&pwfx);
println(pwfx->wFormatTag);// 65534
println(WAVE_FORMAT_PCM);// 1
println(pwfx->nChannels);// 2
println((uint)pwfx->nSamplesPerSec);// 48000
println(pwfx->wBitsPerSample);// 32
println(pwfx->nBlockAlign);// 8
println(pwfx->wBitsPerSample*pwfx->nChannels/8);// 8
println((uint)pwfx->nAvgBytesPerSec);// 384000
println((uint)(pwfx->nBlockAlign*pwfx->nSamplesPerSec*pwfx->nChannels));// 768000
println(pwfx->cbSize);// 22
pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK, hnsRequestedDuration, 0, pwfx, NULL);
pAudioClient->GetBufferSize(&bufferFrameCount); // Get the size of the allocated buffer.
pAudioClient->GetService(__uuidof(IAudioCaptureClient), (void**)&pCaptureClient);
// Calculate the actual duration of the allocated buffer.
//REFERENCE_TIME hnsActualDuration = (double)REFTIMES_PER_SEC* bufferFrameCount / pwfx->nSamplesPerSec;
pAudioClient->Start(); // Start recording.
while(running) {
pCaptureClient->GetNextPacketSize(&packetLength); // packetLength and numFramesAvailable are either 0 or 480
pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL);
const int offset = (uint)numFramesAvailable;
if(offset>0) {
fill(wave, (float*)pData, offset); // here I add pData to the circular buffer "wave"
}
while(packetLength != 0) {
pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL); // Get the available data in the shared buffer.
if(flags&AUDCLNT_BUFFERFLAGS_SILENT) {
pData = NULL; // Tell CopyData to write silence.
}
pCaptureClient->ReleaseBuffer(numFramesAvailable);
pCaptureClient->GetNextPacketSize(&packetLength);
}
sleep(1.0/120.0-clock.stop());
clock.start();
}
pAudioClient->Stop();
}
You're not calling CoInitializeEx, so all COM calls will fail.
You should also be testing all calls to see if they return an error.
To address the questions posed in the comments:
I believe that if you want to operate the endpoint in shared mode then you have to use the parameters returned by GetMixFormat. This means that:
you are limited to the one sample rate (unless you write code to perform a conversion, which is a non-trivial task)
if you want the samples as floats, you will have to convert them yourself
To write code that runs on all machines, you must cater for whatever the mix format throws at you. This might be:
16 bit integers
24 bit integers (nBlockAlign = 3)
24 bit integers in 32 bit containers (nBlockAlign = 4)
32 bit integers
32 bit floating point (rare)
64 bit floating point (unheard of, in my experience)
The samples will be in the native byte order of the machine your code is running on, and are interleaved.
So, case out on the various parameters in pwfx and write the relevant code for each sample format you want to support.
Assuming you want your floats to be normalised to -1 .. +1, and 2-channel input data, you might do this for 16 bit integers, for example:
const int16_t *inbuf = (const int16_t *) pData;
float *outbuf = ...;
for (int i = 0; i < numFramesAvailable * 2; ++i)
{
int16_t sample = *inbuf++;
*outbuf++ = (float) (sample * (1.0 / 32767));
}
Note that I avoid a (slow) floating point division by multiplying by the reciprocal (the compiler will pre-calculate 1.0 / 32767).
I'll leave the rest to you.
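The other integer formats in the list follow the same pattern. A hedged sketch, assuming little-endian, interleaved input as on x86 Windows; the function names are made up for illustration, and the 1/2^31 scale factor is one common convention that differs slightly from the 1/32767 used above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packed 24-bit little-endian integers (3 bytes per sample) to floats in [-1, 1].
void int24ToFloat(const uint8_t* in, float* out, std::size_t sampleCount) {
    for (std::size_t i = 0; i < sampleCount; ++i) {
        const uint8_t* p = in + 3 * i;
        // Place the three bytes in the top 24 bits of a 32-bit word so the
        // sign bit lands in bit 31, then scale like a 32-bit sample.
        const uint32_t u = (uint32_t(p[0]) << 8) | (uint32_t(p[1]) << 16) |
                           (uint32_t(p[2]) << 24);
        out[i] = static_cast<float>(static_cast<int32_t>(u) * (1.0 / 2147483648.0));
    }
}

// 32-bit integers to floats in [-1, 1].
void int32ToFloat(const int32_t* in, float* out, std::size_t sampleCount) {
    for (std::size_t i = 0; i < sampleCount; ++i) {
        out[i] = static_cast<float>(in[i] * (1.0 / 2147483648.0));
    }
}
```

`sampleCount` here counts individual samples, so for 2-channel data it would be the frame count times two, as in the 16-bit loop.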
You could use this audio library instead. It's way easier to get up and running than trying to interface with the platform-specific SDKs:
http://www.music.mcgill.ca/~gary/rtaudio/recording.html
Also, while removing the sleep might not help in your example you should never call sleep, lock a mutex, or allocate memory during audio processing. The delay introduced by those is completely arbitrary compared to the short buffer times, so will always create problems for you.
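One way to follow that advice in the capture loop is a preallocated ring buffer with a moving write index, so that adding a packet copies only the new samples instead of shifting the whole array as `fill` does in the question. This is a single-threaded sketch with hypothetical names (`RingBuffer`, `push`, `latest`); sharing it between a capture thread and a reader would need synchronisation that is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class RingBuffer {
public:
    // All memory is allocated once, up front.
    explicit RingBuffer(std::size_t capacity) : data_(capacity, 0.0f), head_(0) {
        assert(capacity > 0);
    }

    // Append `count` samples, overwriting the oldest ones when full.
    void push(const float* samples, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            data_[head_] = samples[i];
            head_ = (head_ + 1) % data_.size();
        }
    }

    // Sample written `age` samples ago; age 0 is the newest sample.
    float latest(std::size_t age) const {
        assert(age < data_.size());
        return data_[(head_ + data_.size() - 1 - age) % data_.size()];
    }

private:
    std::vector<float> data_;
    std::size_t head_;  // index of the next slot to write
};
```

Because `push` never allocates, it can be called for every captured packet without the unpredictable delays mentioned above.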

How to accelerate C++ writing speed to the speed tested by CrystalDiskMark?

I currently produce about 3.6 GB of data per second in memory, and I need to write it to my SSD continuously. I used CrystalDiskMark to test the write speed of my SSD; it is almost 6 GB per second, so I had thought this should not be that hard.
[Image: my SSD test result] https://plus.google.com/u/0/photos/photo/106876803948041178149/6649598887699308850?authkey=CNbb5KjF8-jxJQ
My computer runs Windows 10, and I am using Visual Studio 2017 Community.
I found this question and tried the highest-voted answer. Unfortunately, the write speed was only about 1 s/GB for its option_2, far slower than what CrystalDiskMark measured. Then I tried memory mapping; this time writing was faster, about 630 ms/GB, but still much slower. Then I tried multi-threaded memory mapping: with 4 threads the speed was about 350 ms/GB, and when I increased the number of threads the write speed did not improve any further.
Code for memory mapping:
#include <fstream>
#include <chrono>
#include <vector>
#include <cstdint>
#include <numeric>
#include <random>
#include <algorithm>
#include <iostream>
#include <cassert>
#include <thread>
#include <windows.h>
#include <sstream>
// Generate random data
std::vector<int> GenerateData(std::size_t bytes) {
assert(bytes % sizeof(int) == 0);
std::vector<int> data(bytes / sizeof(int));
std::iota(data.begin(), data.end(), 0);
std::shuffle(data.begin(), data.end(), std::mt19937{ std::random_device{}() });
return data;
}
// Memory mapping
int map_write(int* data, int size, int id){
char* name = (char*)malloc(100);
sprintf_s(name, 100, "D:\\data_%d.bin",id);
HANDLE hFile = CreateFile(name, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);//
if (hFile == INVALID_HANDLE_VALUE){
return -1;
}
Sleep(0);
DWORD dwFileSize = size;
char* rname = (char*)malloc(100);
sprintf_s(rname, 100, "data_%d.bin", id);
HANDLE hFileMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, dwFileSize, rname);//create file
if (hFileMap == NULL) {
CloseHandle(hFile);
return -2;
}
PVOID pvFile = MapViewOfFile(hFileMap, FILE_MAP_WRITE, 0, 0, 0);//Acquire the address of file on disk
if (pvFile == NULL) {
CloseHandle(hFileMap);
CloseHandle(hFile);
return -3;
}
PSTR pchAnsi = (PSTR)pvFile;
memcpy(pchAnsi, data, dwFileSize);//memery copy
UnmapViewOfFile(pvFile);
CloseHandle(hFileMap);
CloseHandle(hFile);
return 0;
}
// Multi-thread memory mapping
void Mem2SSD_write(int* data, int size){
int part = size / sizeof(int) / 4;
int index[4];
index[0] = 0;
index[1] = part;
index[2] = part * 2;
index[3] = part * 3;
std::thread ta(map_write, data + index[0], size / 4, 10);
std::thread tb(map_write, data + index[1], size / 4, 11);
std::thread tc(map_write, data + index[2], size / 4, 12);
std::thread td(map_write, data + index[3], size / 4, 13);
ta.join();
tb.join();
tc.join();
td.join();
}
//Test:
int main() {
const std::size_t kB = 1024;
const std::size_t MB = 1024 * kB;
const std::size_t GB = 1024 * MB;
for (int i = 0; i < 10; ++i) {
std::vector<int> data = GenerateData(1 * GB);
auto startTime = std::chrono::high_resolution_clock::now();
Mem2SSD_write(&data[0], 1 * GB);
auto endTime = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
std::cout << "1G writing cost: " << duration << " ms" << std::endl;
}
system("pause");
return 0;
}
So I'd like to ask: is there a faster way to write huge files in C++? Or why can't I write as fast as CrystalDiskMark measures? How does CrystalDiskMark write?
Any help would be greatly appreciated. Thank you!
First of all, this is not a C++ question but an OS-related one. To get maximum performance you need OS-specific low-level API calls, which do not exist in the general C++ libraries. Your code clearly uses the Windows API, so at a minimum look for a Windows-specific solution.
From the CreateFileW function documentation:
When FILE_FLAG_NO_BUFFERING is combined with FILE_FLAG_OVERLAPPED,
the flags give maximum asynchronous performance, because the I/O does
not rely on the synchronous operations of the memory manager.
so we need use combination of this 2 flags in call CreateFileW or FILE_NO_INTERMEDIATE_BUFFERING in call NtCreateFile
also extend file size and valid data length take some time, so better if final file at begin is known - just set file final size via NtSetInformationFile with FileEndOfFileInformation
or via SetFileInformationByHandle with FileEndOfFileInfo. and then set valid data length with SetFileValidData or via NtSetInformationFile with FileValidDataLengthInformation. set valid data length require SE_MANAGE_VOLUME_NAME privilege enabled when opening a file initially (but not when call SetFileValidData)
also look for file compression - if file compressed (it will be compressed by default if created in compressed folder) this is very slow writting. so need disbale file compression via FSCTL_SET_COMPRESSION
When we use asynchronous I/O (the fastest way), we do not need to create several dedicated threads. Instead, we need to decide how many I/O requests run concurrently. CrystalDiskMark actually runs CdmResource\diskspd\diskspd64.exe for its tests, and this number corresponds to its -o<count> parameter (run diskspd64.exe /? > h.txt to see the parameter list).
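One way to keep a fixed number of requests in flight without a dispatcher thread is to let each completed request atomically claim the next block of the file. The sketch below shows only that bookkeeping in portable C++; `BlockDispenser` and `claim` are hypothetical names, and it assumes the total size is a multiple of the block size. It performs no I/O itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any Windows API): hands out file offsets
// to whichever request slot becomes free first.
class BlockDispenser {
public:
    BlockDispenser(std::int64_t totalSize, std::int64_t blockSize)
        : total_(totalSize), block_(blockSize), left_(totalSize) {
        // Assumption: the file size has already been rounded to whole blocks.
        assert(blockSize > 0 && totalSize % blockSize == 0);
    }

    // Stores the offset of the next unwritten block in `offset` and returns
    // true, or returns false once every block has been claimed.
    // Safe to call from several threads at once.
    bool claim(std::int64_t& offset) {
        const std::int64_t before =
            left_.fetch_sub(block_, std::memory_order_relaxed);
        if (before <= 0) {
            return false;  // nothing left to write
        }
        offset = total_ - before;
        return true;
    }

private:
    const std::int64_t total_;
    const std::int64_t block_;
    std::atomic<std::int64_t> left_;
};
```

Each of the N request slots would call `claim` when it starts and again from its completion handler, so the queue depth stays at N until the file is finished.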
Using unbuffered I/O makes the task harder, because there are 3 additional requirements:
Any ByteOffset passed to WriteFile must be a multiple of the sector
size.
The Length passed to WriteFile must be an integral multiple of the sector
size.
Buffers must be aligned in accordance with the alignment requirement
of the underlying device. To obtain this information, call
NtQueryInformationFile with FileAlignmentInformation
or GetFileInformationByHandleEx with FileAlignmentInfo
in most situations, page-aligned memory will also be sector-aligned,
because the case where the sector size is larger than the page size is
rare.
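The three requirements quoted above can be checked before a request is issued. This is a minimal portable sketch; the function names are hypothetical, and it assumes the sector size and the device alignment are powers of two. The real values would come from the alignment queries named in the quote.

```cpp
#include <cassert>
#include <cstdint>

// True when value is a non-zero power of two.
constexpr bool isPowerOfTwo(std::uint64_t value) {
    return value != 0 && (value & (value - 1)) == 0;
}

// Rounds size up to the next multiple of a power-of-two alignment.
constexpr std::uint64_t roundUp(std::uint64_t size, std::uint64_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Checks one unbuffered write request against the three rules:
// offset and length are multiples of the sector size, and the buffer
// address satisfies the device alignment requirement.
inline bool isValidUnbufferedRequest(const void* buffer,
                                     std::uint64_t fileOffset,
                                     std::uint64_t length,
                                     std::uint64_t sectorSize,
                                     std::uint64_t bufferAlignment) {
    assert(isPowerOfTwo(sectorSize) && isPowerOfTwo(bufferAlignment));
    const auto address = reinterpret_cast<std::uintptr_t>(buffer);
    return (fileOffset & (sectorSize - 1)) == 0 &&
           (length & (sectorSize - 1)) == 0 &&
           (address & (bufferAlignment - 1)) == 0;
}
```

`roundUp` is the same mask trick that pads a requested file size to whole blocks, for example `roundUp(size, blockSize)`.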
So buffers allocated with the VirtualAlloc function, in sizes that are a multiple of the page size (4,096 bytes), are almost always OK. In this concrete test I rely on that assumption to keep the code small.
struct WriteTest
{
enum { opCompression, opWrite };
struct REQUEST : IO_STATUS_BLOCK
{
WriteTest* pTest;
ULONG opcode;
ULONG offset;
};
LONGLONG _TotalSize, _BytesLeft;
HANDLE _hFile;
ULONG64 _StartTime;
void* _pData;
REQUEST* _pRequests;
ULONG _BlockSize;
ULONG _ConcurrentRequestCount;
ULONG _dwThreadId;
LONG _dwRefCount;
WriteTest(ULONG BlockSize, ULONG ConcurrentRequestCount)
{
if (BlockSize & (BlockSize - 1))
{
__debugbreak();
}
_BlockSize = BlockSize, _ConcurrentRequestCount = ConcurrentRequestCount;
_dwRefCount = 1, _hFile = 0, _pRequests = 0, _pData = 0;
_dwThreadId = GetCurrentThreadId();
}
~WriteTest()
{
if (_pData)
{
VirtualFree(_pData, 0, MEM_RELEASE);
}
if (_pRequests)
{
delete [] _pRequests;
}
if (_hFile)
{
NtClose(_hFile);
}
PostThreadMessageW(_dwThreadId, WM_QUIT, 0, 0);
}
void Release()
{
if (!InterlockedDecrement(&_dwRefCount))
{
delete this;
}
}
void AddRef()
{
InterlockedIncrementNoFence(&_dwRefCount);
}
void StartWrite()
{
IO_STATUS_BLOCK iosb;
FILE_VALID_DATA_LENGTH_INFORMATION fvdl;
fvdl.ValidDataLength.QuadPart = _TotalSize;
NTSTATUS status;
if (0 > (status = NtSetInformationFile(_hFile, &iosb, &_TotalSize, sizeof(_TotalSize), FileEndOfFileInformation)) ||
0 > (status = NtSetInformationFile(_hFile, &iosb, &fvdl, sizeof(fvdl), FileValidDataLengthInformation)))
{
DbgPrint("FileValidDataLength=%x\n", status);
}
ULONG offset = 0;
ULONG dwNumberOfBytesTransfered = _BlockSize;
_BytesLeft = _TotalSize + dwNumberOfBytesTransfered;
ULONG ConcurrentRequestCount = _ConcurrentRequestCount;
REQUEST* irp = _pRequests;
_StartTime = GetTickCount64();
do
{
irp->opcode = opWrite;
irp->pTest = this;
irp->offset = offset;
offset += dwNumberOfBytesTransfered;
DoWrite(irp++);
} while (--ConcurrentRequestCount);
}
void FillBuffer(PULONGLONG pu, LONGLONG ByteOffset)
{
ULONG n = _BlockSize / sizeof(ULONGLONG);
do
{
*pu++ = ByteOffset, ByteOffset += sizeof(ULONGLONG);
} while (--n);
}
void DoWrite(REQUEST* irp)
{
LONG BlockSize = _BlockSize;
LONGLONG BytesLeft = InterlockedExchangeAddNoFence64(&_BytesLeft, -BlockSize) - BlockSize;
if (0 < BytesLeft)
{
LARGE_INTEGER ByteOffset;
ByteOffset.QuadPart = _TotalSize - BytesLeft;
PVOID Buffer = RtlOffsetToPointer(_pData, irp->offset);
FillBuffer((PULONGLONG)Buffer, ByteOffset.QuadPart);
AddRef();
NTSTATUS status = NtWriteFile(_hFile, 0, 0, irp, irp, Buffer, BlockSize, &ByteOffset, 0);
if (0 > status)
{
OnComplete(status, 0, irp);
}
}
else if (!BytesLeft)
{
// write end
ULONG64 time = GetTickCount64() - _StartTime;
WCHAR sz[64];
StrFormatByteSizeW((_TotalSize * 1000) / time, sz, RTL_NUMBER_OF(sz));
DbgPrint("end:%S\n", sz);
}
}
static VOID NTAPI _OnComplete(
_In_ NTSTATUS status,
_In_ ULONG_PTR dwNumberOfBytesTransfered,
_Inout_ PVOID Ctx
)
{
reinterpret_cast<REQUEST*>(Ctx)->pTest->OnComplete(status, dwNumberOfBytesTransfered, reinterpret_cast<REQUEST*>(Ctx));
}
VOID OnComplete(NTSTATUS status, ULONG_PTR dwNumberOfBytesTransfered, REQUEST* irp)
{
if (0 > status)
{
DbgPrint("OnComplete[%x]: %x\n", irp->opcode, status);
}
else
switch (irp->opcode)
{
default:
__debugbreak();
case opCompression:
StartWrite();
break;
case opWrite:
if (dwNumberOfBytesTransfered == _BlockSize)
{
DoWrite(irp);
}
else
{
DbgPrint(":%I64x != %x\n", dwNumberOfBytesTransfered, _BlockSize);
}
}
Release();
}
NTSTATUS Create(POBJECT_ATTRIBUTES poa, ULONGLONG size)
{
if (!(_pRequests = new REQUEST[_ConcurrentRequestCount]) ||
!(_pData = VirtualAlloc(0, _BlockSize * _ConcurrentRequestCount, MEM_COMMIT, PAGE_READWRITE)))
{
return STATUS_INSUFFICIENT_RESOURCES;
}
ULONGLONG sws = _BlockSize - 1;
LARGE_INTEGER as;
_TotalSize = as.QuadPart = (size + sws) & ~sws;
HANDLE hFile;
IO_STATUS_BLOCK iosb;
NTSTATUS status = NtCreateFile(&hFile,
DELETE|FILE_GENERIC_READ|FILE_GENERIC_WRITE&~FILE_APPEND_DATA,
poa, &iosb, &as, 0, 0, FILE_OVERWRITE_IF,
FILE_NON_DIRECTORY_FILE|FILE_NO_INTERMEDIATE_BUFFERING, 0, 0);
if (0 > status)
{
return status;
}
_hFile = hFile;
if (0 > (status = RtlSetIoCompletionCallback(hFile, _OnComplete, 0)))
{
return status;
}
static USHORT cmp = COMPRESSION_FORMAT_NONE;
REQUEST* irp = _pRequests;
irp->pTest = this;
irp->opcode = opCompression;
AddRef();
status = NtFsControlFile(hFile, 0, 0, irp, irp, FSCTL_SET_COMPRESSION, &cmp, sizeof(cmp), 0, 0);
if (0 > status)
{
OnComplete(status, 0, irp);
}
return status;
}
};
void WriteSpeed(POBJECT_ATTRIBUTES poa, ULONGLONG size, ULONG BlockSize, ULONG ConcurrentRequestCount)
{
BOOLEAN b;
NTSTATUS status = RtlAdjustPrivilege(SE_MANAGE_VOLUME_PRIVILEGE, TRUE, FALSE, &b);
if (0 <= status)
{
status = STATUS_INSUFFICIENT_RESOURCES;
if (WriteTest * pTest = new WriteTest(BlockSize, ConcurrentRequestCount))
{
status = pTest->Create(poa, size);
pTest->Release();
if (0 <= status)
{
MessageBoxW(0, 0, L"Test...", MB_OK|MB_ICONINFORMATION);
}
}
}
}
These are the suggestions that come to my mind:
stop all running processes that are using the disk, in particular
disable Windows Defender realtime protection (or other anti virus/malware)
disable pagefile
use Windows Resource Monitor to find processes reading or writing to your disk
make sure you write continuous sectors on disk
don't take into account file opening and closing times
do not use multithreading (your disk is using DMA so the CPU won't matter)
write data that is in RAM (obviously)
be sure to disable all debugging features when building (build a release)
if using M.2 PCIe disk (seems to be your case) make sure other PCIe
devices aren't stealing PCIe lanes to your disk (the CPU has a
limited number AND mobo too)
don't run the test from your IDE
disable Windows file indexing
Finally, you can find good hints on how to code fast writes in C/C++ in this question's thread: Writing a binary file in C++ very fast
One area that might give you an improvement is to keep your threads running constantly, each reading from a queue.
At the moment, every time you write you spawn 4 threads (which is slow), and they are destroyed at the end of the function. You should save at least the CPU time your function spends creating them if you spawn the threads once at the start and have each of them read from its own queue in an infinite loop.
They simply check after a SMALL delay whether there is anything in their queue; if there is, they write all of it. Your only remaining issue is making sure the order of the data is maintained.
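A minimal sketch of one such long-lived worker follows. The `Worker` class is a hypothetical helper, and it swaps the "check after a small delay" polling for a condition variable, which wakes the thread as soon as work arrives. Jobs posted to one worker run in FIFO order, so ordering within a single queue is preserved.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: one persistent thread that runs queued jobs in order.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}

    // Signals shutdown, then waits until all queued jobs have run.
    ~Worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        thread_.join();
    }

    // Queues a job, for example a lambda that writes one chunk to the file.
    void post(std::function<void()> job) {
        assert(job);  // an empty std::function cannot be called
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) {
                    return;  // stopping and nothing left to do
                }
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so post() is never blocked by I/O
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the members above exist first
};
```

With four such workers created once at startup, each write call would only post four jobs instead of creating and joining four threads.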

run time inconsistency DXVA hardware video decoding

I am currently working on a project that involves using DXVA API and the FFmpeg framework to implement hardware-accelerated decoding of H264 video stream files.
I have done some research on GPU decoding and constructed my code based on the hardware acceleration implementation in VLC. From my understanding, using DXVA in FFmpeg involves initializing the DirectXVideoDecoder and implementing several callback functions in AVCodecContext. The decoding process is done with the FFmpeg function avcodec_decode_video2() and each frame is parsed with av_read_frame(). The decoded frame is stored in the graphics memory and displayed using Direct3D.
I tried to time each process with the ::GetTickCount() function and noticed that the execution time of the program for a 1550-frame video is 35,000 ms, with the display function taking 90% of the time and the decoding function taking 6% of the time.
However, when I tried to comment out the displaying process and execute the code only decoding each frame, the total decoding time surprisingly increased to 25,000ms for the same video, taking 94% of the total time.
Here is the code for the decoding function:
//record start time
DWORD start_time = ::GetTickCount();
//media file to be loaded
const char *filename = "123.mkv";
//time recording parameters
unsigned frame_read_time_total = 0;
unsigned decode_frame_time_total = 0;
unsigned display_time_total = 0;
unsigned setup_time_total = 0;
/*********************Setup and Initialization Code*******************************/
unsigned setup_time_start = ::GetTickCount();
av_register_all();
av_log_set_level(AV_LOG_DEBUG);
int res;
AVFormatContext *file = NULL;
res = avformat_open_input(&file, filename, NULL, NULL);//open the file
if (res < 0) {
printf("error %x in avformat_open_input\n", res);
return 1;
}
res = avformat_find_stream_info(file, NULL);//read the stream information
if (res < 0)
{
printf("error %x in avformat_find_stream_info\n", res);
return 1;
}
av_dump_format(file, 0, filename, 0);//print the stream information of the input file
int i;
int videoindex = -1;
int audioindex = -1;
for (i = 0; i < file->nb_streams; i++){
if (file->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO){
videoindex = i;
}
if (file->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO){
audioindex = i;
}
}
if (videoindex == -1){
av_log(NULL, AV_LOG_DEBUG, "can't find video stream\n");
return 0;
}
AVCodec *codec = avcodec_find_decoder(file->streams[videoindex]->codec->codec_id);//find the decoder based on the stream information
if (!codec){
printf("decoder not found\n");
return 1;
}
AVCodecContext *codecctx = file->streams[videoindex]->codec;
screen_width = codecctx->width;
screen_height = codecctx->height;
//Initialize Win API Window
WNDCLASSEX window;
ZeroMemory(&window, sizeof(window));
window.cbSize = sizeof(window);
window.hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
window.lpfnWndProc = (WNDPROC)WindowProcess;
window.lpszClassName = L"D3D";
window.style = CS_HREDRAW | CS_VREDRAW;
RegisterClassEx(&window);
HWND hwnd_temp = CreateWindow(L"D3D", L"Player", WS_OVERLAPPEDWINDOW,
0, 0, screen_width, screen_height, NULL, NULL, NULL, NULL);
if (hwnd_temp == NULL){
av_log(NULL, AV_LOG_ERROR, "Error: Cannot create window\n");
system("pause");
}
hwnd.push_back(hwnd_temp);
vlc_va_dxva2_t *dxva = vlc_va_NewDxva2(codecctx->codec_id);
if (NULL == dxva){
return 0;
}
res = Setup(dxva, &codecctx->hwaccel_context, &codecctx->pix_fmt, screen_width, screen_height);
if (res < 0) {
printf("error %x in DXVA setup\n", res);
return 1;
}
//Assign callback function
codecctx->opaque = dxva;
codecctx->get_format = ffmpeg_GetFormat;
codecctx->get_buffer = ffmpeg_GetFrameBuf;
codecctx->reget_buffer = ffmpeg_ReGetFrameBuf;
codecctx->release_buffer = ffmpeg_ReleaseFrameBuf;
codecctx->thread_count = 1;
res = avcodec_open2(codecctx, codec, NULL);
if (res < 0) {
printf("error %x in avcodec_open2\n", res);
return 1;
}
//Initialize Packet
AVPacket pkt = { 0 };
AVFrame *picture = avcodec_alloc_frame();
DWORD wait_for_keyframe = 60;
//initialize frame count
int count = 0;
ShowWindow(hwnd.at(0), SW_SHOWNORMAL);
UpdateWindow(hwnd.at(0));
RECT screen_size;
screen_size.top = 0;
screen_size.bottom = screen_height;
screen_size.left = 0;
screen_size.right = screen_width;
unsigned setup_time_end = ::GetTickCount();
setup_time_total = setup_time_end - setup_time_start;
MSG msg;
ZeroMemory(&msg, sizeof(msg));
while(msg.message!=WM_QUIT)
{
if (PeekMessage(&msg, NULL, 0,0, PM_REMOVE)){
TranslateMessage(&msg);
DispatchMessage(&msg);
continue;
}
int read_status;
unsigned read_frame_start = ::GetTickCount();
read_status = av_read_frame(file, &pkt);
if (read_status < 0)
{
av_free_packet(&pkt);
goto done;
}
unsigned read_frame_end = ::GetTickCount();
frame_read_time_total += (read_frame_end - read_frame_start);
int got_picture = 0;
unsigned decode_start = ::GetTickCount();
int bytes_used = avcodec_decode_video2(codecctx, picture, &got_picture, &pkt);
unsigned decode_end = ::GetTickCount();
decode_frame_time_total += (decode_end - decode_start);
if (got_picture)
{
count++;
unsigned display_start = ::GetTickCount();
//display_frame((vlc_va_dxva2_t *)codecctx->opaque, picture, screen_size,0);
unsigned display_end = ::GetTickCount();
display_time_total += (display_end - display_start);
}
av_free_packet(&pkt);
}
done:
UnregisterClass(L"D3D",0);
printf("Frames = %d\n",count);
unsigned stop_time = ::GetTickCount();
unsigned total_time = stop_time - start_time;
printf("total frame = %d\n", count);
printf("time cost = %d\n", total_time);
printf("total setup time = %d, %f %% total execution time\n", setup_time_total,(float) setup_time_total / total_time * 100);
printf("total frame read time = %d, %f %% total execution time\n", frame_read_time_total, (float)frame_read_time_total / total_time*100);
printf("total frame decode time = %d, %f %% total execution time\n", decode_frame_time_total, (float)decode_frame_time_total / total_time*100);
printf("total display time = %d, %f %% of total execution time\n", display_time_total, (float)display_time_total / total_time*100);
av_free(picture);
av_close_input_file(file);
system("pause");
return 0;
What could be the cause of this strange behavior? My guess is that it may be an incorrect use of ::GetTickCount(), or maybe it has to do with the DXVA hardware-accelerated decoding process. Sorry for the long post. Any input and suggestions are appreciated. Thanks in advance.
I think this is correct behaviour if the decoding process is asynchronous. I know FFmpeg uses threads, but that depends on compilation flags or decoder setup.
If the display process takes very long, the decoder keeps decoding frames while the display process executes. So when you ask for rendering, some frames are already decoded, and it's fast.
If you skip the display process, the decoding process takes all the processor time. Normally, the display process uses some sort of timestamp that leaves enough time for the decoding process.
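The effect described here can be reproduced without FFmpeg. The sketch below is only a simulation under stated assumptions: `timeBlockedOnDecode` is a hypothetical function, and the sleeps are placeholders for decoding and rendering. It measures how long the caller is blocked waiting for an asynchronous "decode", with and without a slow "display" step in between.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

// Starts a stand-in decode on another thread, spends displayTime on the
// calling thread, then returns how long the caller had to wait for the
// decode result.
inline Ms timeBlockedOnDecode(Ms decodeTime, Ms displayTime) {
    assert(decodeTime.count() >= 0 && displayTime.count() >= 0);
    std::future<void> decoded = std::async(std::launch::async, [decodeTime] {
        std::this_thread::sleep_for(decodeTime);  // placeholder for decoding
    });
    std::this_thread::sleep_for(displayTime);     // placeholder for rendering
    const Clock::time_point start = Clock::now();
    decoded.get();                                // blocks only if not done yet
    return std::chrono::duration_cast<Ms>(Clock::now() - start);
}
```

Calling it as `timeBlockedOnDecode(Ms(60), Ms(150))` should report a wait close to zero, because the decode finished during the display step. `timeBlockedOnDecode(Ms(60), Ms(0))` should report roughly the full decode time. The work is the same in both cases; only the place where the waiting is measured changes.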
PS: from what I know about FFmpeg and DXVA2, you also need to provide the DirectX texture.
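On the ::GetTickCount() concern raised in the question: its resolution is typically 10 to 16 ms, so summing per-frame intervals that are shorter than one tick can distort the totals. A minimal sketch of a finer-grained accumulator using std::chrono::steady_clock follows; `ScopedTimer`, `toMilliseconds` and `percentOf` are hypothetical helper names.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: adds the lifetime of each instance to a running total.
class ScopedTimer {
public:
    using Clock = std::chrono::steady_clock;

    explicit ScopedTimer(Clock::duration& total)
        : total_(total), start_(Clock::now()) {}
    ~ScopedTimer() { total_ += Clock::now() - start_; }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    Clock::duration& total_;
    Clock::time_point start_;
};

// Converts an accumulated duration to fractional milliseconds.
inline double toMilliseconds(std::chrono::steady_clock::duration d) {
    return std::chrono::duration<double, std::milli>(d).count();
}

// Share of `part` in `whole`, in percent.
inline double percentOf(std::chrono::steady_clock::duration part,
                        std::chrono::steady_clock::duration whole) {
    assert(whole.count() > 0);
    return 100.0 * static_cast<double>(part.count()) /
           static_cast<double>(whole.count());
}
```

In the decode loop, wrapping the decode call in a block that starts with `ScopedTimer t(decodeTotal);` accumulates the time per frame, and `percentOf(decodeTotal, total)` gives the share that the printf lines report.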