Checking whether sound instances created in threads are still valid - C++

I'm currently creating a sound system for my project. Every call to PlayAsync creates an instance of the sound inside a std::thread callback, and the sound data is processed in a loop in that callback. While the thread runs, it stores the sound instance in a static vector. When the thread ends (the sound has finished), it deletes the sound instance and decrements the instance count. When the application ends, it must stop all sounds immediately by sending an interrupt to every sound's loop.
The problem is the array that keeps these sounds. I am not sure, but I think a vector isn't the right choice for this purpose. Here is the code:
void gSound::PlayAsync()
{
std::thread t(gSound::Play,mp_Audio,std::ref(*this));
t.detach();
}
HRESULT gSound::Play(IXAudio2* s_XAudio,gSound& sound)
{
gSound* pSound = new gSound(sound);
pSound->m_Disposed = false;
HRESULT hr;
// Create the source voice
IXAudio2SourceVoice* pSourceVoice;
if( FAILED( hr = s_XAudio->CreateSourceVoice( &pSourceVoice, pSound->pwfx ) ) )
{
gDebug::ShowMessage(L"Error creating source voice");
return hr;
}
// Submit the wave sample data using an XAUDIO2_BUFFER structure
XAUDIO2_BUFFER buffer = {0};
buffer.pAudioData = pSound->pbWaveData;
buffer.Flags = XAUDIO2_END_OF_STREAM; // tell the source voice not to expect any data after this buffer
buffer.AudioBytes = pSound->cbWaveSize;
if( FAILED( hr = pSourceVoice->SubmitSourceBuffer( &buffer ) ) )
{
gDebug::ShowMessage(L"Error submitting source buffer");
pSourceVoice->DestroyVoice();
return hr;
}
hr = pSourceVoice->Start( 0 );
// Let the sound play
BOOL isRunning = TRUE;
m_soundInstanceCount++;
mp_SoundInstances.push_back(pSound); // #MARK2
while( SUCCEEDED( hr ) && isRunning && pSourceVoice != nullptr && !pSound->m_Interrupted)
{
XAUDIO2_VOICE_STATE state;
pSourceVoice->GetState( &state );
isRunning = ( state.BuffersQueued > 0 ) != 0;
Sleep(10);
}
pSourceVoice->DestroyVoice();
delete pSound; pSound = nullptr; // is this correct?
m_soundInstanceCount--;
return 0;
}
void gSound::InterrupAllSoundInstances()
{
for(auto Iter = mp_SoundInstances.begin(); Iter != mp_SoundInstances.end(); Iter++)
{
if(*Iter != nullptr)//#MARK1
{
(*Iter)->m_Interrupted = true;
}
}
}
I call this in the application class immediately after the main application loop, before disposing of the sound objects:
gSound::InterrupAllSoundInstances();
while (gSound::m_soundInstanceCount > 0) // wait until every thread has deleted its sound instance
{
}
Questions:
#MARK1 - How do I check whether the memory an element of the vector points to is still valid? I don't have experience with this, and I get errors when I try to check invalid memory (the pointer is not equal to null).
#MARK2 - How do I use the vector correctly? Or is a vector a bad choice? Every time I create a sound instance, its size increases. That's not good.

A typical issue:
delete pSound;
pSound = nullptr; // issue
This does not do what you think.
It does set pSound to null, but there are other copies of the same pointer (at least one in the vector), and those are not nullified. This is why you never find nullptr in your vector.
Instead, you could record the element's index in the vector and nullify that slot with mp_SoundInstances[index] = nullptr;
However, I am afraid that you simply do not understand memory handling well yet, and that your design lacks structure. For memory handling, it's hard to tell without details, and your system seems complicated enough that it would take too long to explain. For structure, you should read a bit about the Observer pattern.
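A minimal sketch of a safer structure, in portable C++, assuming the XAudio2 polling loop can be reduced to "loop until interrupted". Note that the original code pushes into a shared static vector from several threads without any lock, which is a data race on its own. Here a registry owns the live instances through std::shared_ptr, every access goes through one mutex, a finished sound erases itself instead of leaving a dangling pointer or a nullptr slot behind, and shutdown waits on a std::condition_variable instead of spinning on a counter. SoundInstance, SoundRegistry and PlayAsync are hypothetical names, not part of the original gSound class.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one playing gSound: only the interrupt flag matters here.
struct SoundInstance {
    std::atomic<bool> interrupted{false};
};

// Thread-safe registry of the sounds that are currently playing.
class SoundRegistry {
public:
    std::shared_ptr<SoundInstance> Register() {
        auto inst = std::make_shared<SoundInstance>();
        std::lock_guard<std::mutex> lock(m_mutex);
        m_instances.push_back(inst);
        return inst;
    }

    // A finished sound removes itself, so the vector never holds dead entries.
    void Unregister(const std::shared_ptr<SoundInstance>& inst) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_instances.erase(std::remove(m_instances.begin(), m_instances.end(), inst),
                          m_instances.end());
        m_cv.notify_all();  // wake WaitUntilEmpty(); notified under the lock on purpose
    }

    void InterruptAll() {
        std::lock_guard<std::mutex> lock(m_mutex);
        for (auto& inst : m_instances) inst->interrupted = true;
    }

    // Blocks without spinning until every sound has unregistered.
    void WaitUntilEmpty() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_instances.empty(); });
    }

    std::size_t Count() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_instances.size();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::vector<std::shared_ptr<SoundInstance>> m_instances;
};

// Register first, then start the thread, so InterruptAll() can never miss it.
void PlayAsync(SoundRegistry& registry) {
    std::shared_ptr<SoundInstance> inst = registry.Register();
    std::thread([&registry, inst] {
        // Stand-in for the XAudio2 GetState() polling loop.
        while (!inst->interrupted)
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        registry.Unregister(inst);
    }).detach();
}
```

Because the vector only ever contains sounds that are still playing, its size stays bounded by the number of simultaneous sounds, and there is never a stale pointer to validate.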

Related

SetPerTcpConnectionEStats fails and GetPerTcpConnectionEStats can't be called repeatedly (C++)

I am following the example at https://learn.microsoft.com/en-gb/windows/win32/api/iphlpapi/nf-iphlpapi-getpertcp6connectionestats?redirectedfrom=MSDN to get the TCP statistics. Although I got it working and can get the statistics once, I want to record them at a fixed time interval (which I haven't managed to do yet), and I have the following questions.
SetPerTcpConnectionEStats() fails with a status other than NO_ERROR; the status is 5 (ERROR_ACCESS_DENIED). Even though it fails, I can still get the statistics. Why?
I want to get the statistics every, say, 1 second. I have tried two different ways: a) a while loop with std::this_thread::sleep_for(1s), which gave me the statistics every ~1 s, but the whole app stalled (I suppose I am blocking the main thread); and b) since a) failed, calling TcpStatistics() from another function (in a different class) that is triggered every 1 s (I store clientConnectRow in a global variable). However, in case b), GetPerTcpConnectionEStats() fails with winStatus = 1214 (ERROR_INVALID_NETNAME), and of course TcpStatistics() cannot get any of the statistics.
a)
ClassB::ClassB()
{
UINT winStatus = GetTcpRow(localPort, hostPort, MIB_TCP_STATE_ESTAB, (PMIB_TCPROW)clientConnectRow);
ToggleAllEstats(clientConnectRow, TRUE);
thread t1(&ClassB::TcpStatistics, this, clientConnectRow);
t1.join(); // blocks until TcpStatistics() returns, which its infinite loop never does
}
void ClassB::TcpStatistics(void* row)
{
while (true)
{
GetAndOutputEstats(row, TcpConnectionEstatsBandwidth);
// some more code here
this_thread::sleep_for(milliseconds(1000));
}
}
b)
ClassB::ClassB()
{
MIB_TCPROW client4ConnectRow;
void* clientConnectRow = NULL;
clientConnectRow = &client4ConnectRow;
UINT winStatus = GetTcpRow(localPort, hostPort, MIB_TCP_STATE_ESTAB, (PMIB_TCPROW)clientConnectRow);
m_clientConnectRow = clientConnectRow;
TcpStatistics();
}
void ClassB::TcpStatistics()
{
ToggleAllEstats(m_clientConnectRow, TRUE);
void* row = m_clientConnectRow;
GetAndOutputEstats(row, TcpConnectionEstatsBandwidth);
// some more code here
}
void ClassB::GetAndOutputEstats(void* row, TCP_ESTATS_TYPE type)
{
//...
winStatus = GetPerTcpConnectionEStats((PMIB_TCPROW)row, type, NULL, 0, 0, ros, 0, rosSize, rod, 0, rodSize);
if (winStatus != NO_ERROR) {
wprintf(L"\nGetPerTcpConnectionEStats %s failed. status = %d", estatsTypeNames[type], winStatus);
}
else { ... }
}
void ClassA::FunA()
{
classB_ptr->TcpStatistics();
}
I found a workaround for the second part of my question and am posting it here in case someone else finds it useful. There may be other, more advanced solutions, but this is how I did it. We first have to obtain the MIB_TCPROW corresponding to the TCP connection and then enable EStats collection before dumping the current stats. So I put all of these steps in one function and call it every time I want to get the stats.
void
ClassB::FunSetTcpStats()
{
MIB_TCPROW client4ConnectRow;
void* clientConnectRow = NULL;
clientConnectRow = &client4ConnectRow;
//this is for the statistics
UINT winStatus = GetTcpRow(lPort, hPort, MIB_TCP_STATE_ESTAB, (PMIB_TCPROW)clientConnectRow); // lPort and hPort must be in network byte order (htons)!
if (winStatus != ERROR_SUCCESS) {
wprintf(L"\nGetTcpRow failed on the client established connection with %d", winStatus);
return;
}
//
// Enable Estats collection and dump current stats.
//
ToggleAllEstats(clientConnectRow, TRUE);
TcpStatistics(clientConnectRow); // same as GetAllEstats() in msdn
}
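For the stalling in variant a): the constructor calls t1.join() on a thread whose loop never ends, so the constructor never returns. A hedged sketch of the usual alternative, in portable C++: a small worker (the class name PeriodicWorker is hypothetical) that runs a task every interval on its own thread and can be stopped at any time. The task would be something like FunSetTcpStats() above. Note also that in variant b) m_clientConnectRow stores the address of client4ConnectRow, a local variable of the constructor, so that pointer dangles once the constructor returns; re-fetching the row on every tick, as FunSetTcpStats() does, avoids this.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Runs `task` every `interval` on a background thread until Stop() or destruction.
class PeriodicWorker {
public:
    PeriodicWorker(std::chrono::milliseconds interval, std::function<void()> task)
        : m_interval(interval), m_task(std::move(task)), m_thread([this] { Run(); }) {}

    ~PeriodicWorker() { Stop(); }

    PeriodicWorker(const PeriodicWorker&) = delete;
    PeriodicWorker& operator=(const PeriodicWorker&) = delete;

    void Stop() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();                 // cut the current wait short
        if (m_thread.joinable()) m_thread.join();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (!m_stop) {
            lock.unlock();
            m_task();                      // runs without holding the lock
            lock.lock();
            // Sleeps for the interval, but wakes immediately if Stop() is called.
            m_cv.wait_for(lock, m_interval, [this] { return m_stop; });
        }
    }

    std::chrono::milliseconds m_interval;
    std::function<void()> m_task;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop = false;
    std::thread m_thread;                  // last member: everything above exists before Run() starts
};
```

ClassB could, for example, own a PeriodicWorker member constructed with std::chrono::seconds(1) and [this] { FunSetTcpStats(); }. The constructor then returns immediately, and destroying ClassB stops and joins the worker (declare that member last so it is destroyed first).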

Code runs fine under WinDbg but behaves differently without it

I am debugging an issue with WinDbg that I can consistently reproduce. The problem is that when I run the executable under WinDbg, the issue can't be reproduced. What could be the reason?
Here is the code that behaves differently:
CWnd* pWnd = GetDlgItem(IDOKCANCEL);
if(pWnd)
{
CString sOK;
sOK.LoadString(IDS_OK);
pWnd->SetWindowText(sOK);
}
Here the button text is updated properly when I run under WinDbg, but it is not updated when I run it normally (which is the bug).
Update
As I said in the comments, the issue is not with the code above, because it doesn't even get called. The operation is done in a worker thread that sends update messages to this dialog. The final message, which would execute the code above, is never sent, so that code is never executed.
Why the worker thread doesn't send this message is interesting. It gets stuck on a critical section while opening a database. WinDbg tells me that the main thread owns that critical section, but I can't see from the call stack, or in any other way, where it failed to unlock it.
What complicates the problem is that it works fine if I run it under the debugger. I added log output, but with that change it also starts working fine.
The only way I can catch it with a debugger is to run it normally, reproduce the problem, and then attach the debugger, which shows the worker blocked on the critical section. It shows that the main thread owns that critical section, but it is not clear why it is still locked. The critical section is simply locked and unlocked within one function, and execution then leaves that function.
Update 2
I am using the critical section in only one file in my entire project, and there only in two functions (when opening the database and the recordset).
BOOL CADODatabase::Open(LPCTSTR lpstrConnection, LPCTSTR lpstrUserID, LPCTSTR lpstrPassword)
{
CString database = GetSourceDatabase( lpstrConnection, NULL );
// get the appropriate critical section based on database
g_dbCriticalSection = GetDbCriticalSection( database );
if( g_dbCriticalSection)
g_dbCriticalSection->Lock();
HRESULT hr = S_OK;
if(IsOpen())
Close();
if(wcscmp(lpstrConnection, _T("")) != 0)
m_strConnection = lpstrConnection;
ASSERT(!m_strConnection.IsEmpty());
try
{
if(m_nConnectionTimeout != 0)
m_pConnection->PutConnectionTimeout(m_nConnectionTimeout);
hr = m_pConnection->Open(_bstr_t(m_strConnection), _bstr_t(lpstrUserID), _bstr_t(lpstrPassword), NULL);
if( g_dbCriticalSection)
g_dbCriticalSection->Unlock();
return hr == S_OK;
}
catch(_com_error &e)
{
dump_com_error(e);
if( g_dbCriticalSection)
g_dbCriticalSection->Unlock();
return FALSE;
}
}
The second function has other visible imperfections, but please ignore them; it's legacy code.
BOOL CADORecordset::Open(_ConnectionPtr mpdb, LPCTSTR lpstrExec, int nOption)
{
BSTR bstrConnString;
m_pConnection->get_ConnectionString(&bstrConnString);
CString database = GetSourceDatabase( bstrConnString, m_pConnection );
g_dbCriticalSection = GetDbCriticalSection( database );
if( g_dbCriticalSection)
g_dbCriticalSection->Lock();
Close();
if(wcscmp(lpstrExec, _T("")) != 0)
m_strQuery = lpstrExec;
ASSERT(!m_strQuery.IsEmpty());
if(m_pConnection == NULL)
m_pConnection = mpdb;
m_strQuery.TrimLeft();
BOOL bIsSelect = m_strQuery.Mid(0, _tcslen(_T("Select "))).CompareNoCase(_T("select ")) == 0 && nOption == openUnknown;
int maxRetries = 10;
bool bContinue = true;
CursorTypeEnum adCursorType = adOpenStatic;
if (!m_bSQLEngine)
{
// MDB Engine
adCursorType = adOpenStatic;
m_pConnection->CursorLocation = adUseClient;
}
else
{
// SQL Engine
adCursorType = adOpenDynamic;
m_pConnection->CursorLocation = adUseServer;
}
int currentCommandTimeout = m_pConnection->CommandTimeout;
if( g_dbCriticalSection)
g_dbCriticalSection->Unlock();
for (int iRetry = 0; (iRetry < maxRetries) && bContinue; iRetry++)
{
try
{
// we just use an auto lock object so it is unlocked automatically, it uses same
// critical section object.
if( g_dbCriticalSection)
g_dbCriticalSection->Lock();
int newCommandTimeout = currentCommandTimeout + 15 * iRetry;
m_pConnection->CommandTimeout = newCommandTimeout;
if(bIsSelect || nOption == openQuery || nOption == openUnknown)
{
m_pRecordset->Open((LPCTSTR)m_strQuery, _variant_t((IDispatch*)mpdb, TRUE),
adCursorType, adLockOptimistic, adCmdUnknown);
}
else if(nOption == openTable)
{
m_pRecordset->Open((LPCTSTR)m_strQuery, _variant_t((IDispatch*)mpdb, TRUE),
adOpenDynamic, adLockOptimistic, adCmdTable);
}
else if(nOption == openStoredProc)
{
m_pCmd->ActiveConnection = mpdb;
m_pCmd->CommandText = _bstr_t(m_strQuery);
m_pCmd->CommandType = adCmdStoredProc;
m_pRecordset = m_pCmd->Execute(NULL, NULL, adCmdText);
}
else
{
TRACE( _T("Unknown parameter. %d"), nOption);
if( g_dbCriticalSection)
g_dbCriticalSection->Unlock();
return FALSE;
}
if( g_dbCriticalSection)
g_dbCriticalSection->Unlock();
bContinue = false;
}
catch(_com_error &e)
{
if( g_dbCriticalSection)
g_dbCriticalSection->Unlock();
dump_com_error_without_exception(e, _T("Open"));
// retry Query timeout
CString szDescription;
_bstr_t bstrDescription(e.Description());
szDescription.Format( _T("%s"), (LPCTSTR)bstrDescription);
if ((szDescription.Find(_T("Query timeout expired")) == -1) || (iRetry == maxRetries - 1))
{
m_pConnection->CommandTimeout = currentCommandTimeout;
throw CADOException(e.Error(), e.Description());
}
Sleep (1000);
bContinue = true;
}
}
m_pConnection->CommandTimeout = currentCommandTimeout;
return m_pRecordset != NULL && m_pRecordset->GetState()!= adStateClosed;
}
For the sake of completeness, the above calls this function:
static CCriticalSection* GetDbCriticalSection(const CString& database)
{
// For now we only care about one database and its corresponding critical section
if (database.CompareNoCase( _T("Alr") ) == 0)
return &g_csAlrDb; // g_csAlrDb is defined static global in this file
else
return 0;
}
The Open() function gets called for various databases; I am only guarding access to one database. As you can see, every Lock() has a corresponding Unlock(), so I am not sure how execution can leave these functions with the critical section still locked. Could it be an MFC issue?
In my experience, when C++ software behaves differently between debug and release builds, it is most often because of uninitialized variables, different libraries being linked, or compiler optimizations backfiring.
To trace the bug, try checking variable values and function return values (e.g. the result of LoadString), for example by displaying them with AfxMessageBox().
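One more thing worth checking in the code shown: both Open() functions store the selected lock in the global g_dbCriticalSection and later unlock through that same global. If the worker thread and the main thread run Open() at overlapping times for different databases, one thread can overwrite the pointer (for example with 0 for a database other than "Alr") between the other thread's Lock() and Unlock(). The Unlock() is then skipped, and the critical section stays owned by the thread that locked it. Because this depends on timing, a debugger or extra logging can easily hide it. A minimal sketch of the alternative, in portable C++ with std::mutex standing in for CCriticalSection (GetDbMutex and OpenDatabase are hypothetical names): keep the pointer in a local variable and let an RAII guard release it. In MFC, CSingleLock plays the guard role.

```cpp
#include <cctype>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for g_csAlrDb.
static std::mutex g_alrDbMutex;

static bool EqualsNoCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    return true;
}

// Mirrors GetDbCriticalSection(): a lock for "Alr", nullptr for anything else.
static std::mutex* GetDbMutex(const std::string& database) {
    return EqualsNoCase(database, "Alr") ? &g_alrDbMutex : nullptr;
}

// Sketch of Open(): the chosen lock lives in a local variable, so another
// thread selecting a different database cannot change what this call unlocks,
// and the RAII guard releases it on every exit path, including exceptions.
static bool OpenDatabase(const std::string& database, bool simulateFailure) {
    std::unique_lock<std::mutex> guard;            // owns nothing yet
    if (std::mutex* m = GetDbMutex(database))
        guard = std::unique_lock<std::mutex>(*m);  // locked from here on
    if (simulateFailure)
        throw std::runtime_error("open failed");   // guard still unlocks
    return true;
}                                                  // guard unlocks here if it owns the mutex
```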

Media Foundation - How to change frame-size in MFT (Media Foundation Transform)

I am trying to implement an MFT that is able to rotate a video. The rotation itself would be done inside a transform function. For that, I need to change the output frame size, but I don't know how to do that.
As a starting point, I used the MFT_Grayscale example provided by Microsoft. I included this MFT in a partial topology as a transform node:
HRESULT Player::AddBranchToPartialTopology(
IMFTopology *pTopology,
IMFPresentationDescriptor *pSourcePD,
DWORD iStream
)
{
...
IMFTopologyNode *pTransformNode = NULL;
...
hr = CreateTransformNode(CLSID_GrayscaleMFT, &pTransformNode);
...
hr = pSourceNode->ConnectOutput(0, pTransformNode, 0);
hr = pTransformNode->ConnectOutput(0, pOutputNode, 0);
...
}
This code works so far: the grayscale MFT is applied and works as expected. However, I want to change this MFT to handle video rotation. Let's assume I want to rotate a video by 90 degrees. For that, the width and height of my input frame have to be swapped. I tried different things, but none of them worked as expected.
Based on the first comment in the thread "How to change Media Foundation Transform output frame(video) size?", I started changing the implementation of SetOutputType. I called MFGetAttributeSize inside SetOutputType to get the actual frame size. It fails when I try to set a new frame size: when starting playback I receive HRESULT 0xc00d36b4 (Data specified is invalid, inconsistent, or not supported by this object).
HRESULT CGrayscale::SetOutputType(
DWORD dwOutputStreamID,
IMFMediaType *pType, // Can be NULL to clear the output type.
DWORD dwFlags
)
{ ....
//Receive the actual frame_size of pType (works as expected)
hr = MFGetAttributeSize(
pType,
MF_MT_FRAME_SIZE,
&width,
&height
);
...
//change the framesize
hr = MFSetAttributeSize(
pType,
MF_MT_FRAME_SIZE,
height,
width
);
}
I am sure I am missing something here, so any hint would be greatly appreciated.
Thanks in advance
There is a transform available in Windows 8 and later that is supposed to do rotation. I haven't had much luck with it myself, but presumably it can be made to work. I'm going to assume that's not a viable solution for you.
The more interesting case is creating an MFT to do the transform.
It turns out there are a number of steps to turn 'Grayscale' into a rotator.
1) As you surmised, you need to affect the frame size on the output type. However, changing the type being passed to SetOutputType is just wrong. The pType being sent to SetOutputType is the type that the client is asking you to support. Changing that media type to something other than what they requested, then returning S_OK to say you support it makes no sense.
Instead what you need to change is the value sent back from GetOutputAvailableType.
2) When calculating the type to send back from GetOutputAvailableType, you need to base it on the IMFMediaType the client sent to SetInputType, with a few changes. And yes, you want to adjust MF_MT_FRAME_SIZE, but you probably also need to adjust MF_MT_DEFAULT_STRIDE, MF_MT_GEOMETRIC_APERTURE, and (possibly) MF_MT_MINIMUM_DISPLAY_APERTURE. Conceivably you might need to adjust MF_MT_SAMPLE_SIZE too.
3) You didn't say whether you intended the rotation amount to be fixed at start of stream, or something that varies during play. When I wrote this, I used the IMFAttributes returned from IMFTransform::GetAttributes to specify the rotation. Before each frame is processed, the current value is read. To make this work right, you need to be able to send MF_E_TRANSFORM_STREAM_CHANGE back from OnProcessOutput.
4) Being lazy, I didn't want to figure out how to rotate NV12 or YUY2 or some such. But there are functions readily available to do this for RGB32. So when my GetInputAvailableType is called, I ask for RGB32.
I experimented with supporting other input types, like RGB24, RGB565, etc, but ran into a problem. When your output type is RGB24, MF adds another MFT downstream to convert the RGB24 back into something it can more easily use (possibly RGB32). And that MFT doesn't support changing media types mid-stream. I was able to get this to work by accepting the variety of subtypes for input, but always outputting RGB32, rotated as specified.
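The bookkeeping from point 2 can be sketched without any Media Foundation calls. This assumes RGB32 (4 bytes per pixel) and a top-down image with no row padding, and it uses a plain struct in place of IMFMediaType; in a real MFT the values would be read from the input type with MFGetAttributeSize and written to a clone of it that GetOutputAvailableType returns. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical, portable stand-in for the media-type attributes that matter
// here (MF_MT_FRAME_SIZE, MF_MT_DEFAULT_STRIDE, MF_MT_SAMPLE_SIZE).
struct VideoFormat {
    std::uint32_t width;
    std::uint32_t height;
    std::int32_t stride;        // bytes per row
    std::uint32_t sampleSize;   // bytes per frame
};

// Derive the output format from the *input* format for an RGB32 frame.
// rotateFlip uses the GDI+ RotateFlipType numbering (0..7); odd values
// rotate by 90 or 270 degrees and therefore swap width and height.
VideoFormat OutputFormatFromInput(const VideoFormat& in, int rotateFlip) {
    const bool swapsAxes = (rotateFlip & 1) != 0;
    VideoFormat out{};
    out.width  = swapsAxes ? in.height : in.width;
    out.height = swapsAxes ? in.width  : in.height;
    out.stride = static_cast<std::int32_t>(out.width * 4);   // 4 bytes per RGB32 pixel
    out.sampleSize = static_cast<std::uint32_t>(out.stride) * out.height;
    return out;
}
```

The key point matches the answer: the input type is never modified; a new output type is computed from it.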
This sounds complicated, but mostly it isn't. If you read the code, you'd probably go "Oh, I get it." I'd offer you my source code, but I'm not sure how useful it would be for you: it's in C#, and you were asking about C++.
On the other hand, I'm making a template to make writing MFTs easier: about a dozen lines of C# code create the simplest possible MFT. The C# rotation MFT is about 131 lines as counted by Visual Studio's Analyze/Calculate Code Metrics (excluding the template). I'm experimenting with a C++ version, but it's still a bit rough.
Did I forget something? Probably a bunch of things. Like don't forget to generate a new Guid for your MFT instead of using Grayscale's. But I think I've hit the high points.
Edit: Now that my C++ version of the template is starting to work, I feel comfortable posting some actual code. This may make some of the points above clearer. For instance, in #2 I talk about basing the output type on the input type; you can see that happening in CreateOutputFromInput. The actual rotation code is in WriteIt().
I've simplified the code a bit for size, but hopefully this will get you to "Oh, I get it."
void OnProcessSample(IMFSample *pSample, bool Discontinuity, int InputMessageNumber)
{
HRESULT hr = S_OK;
int i = MFGetAttributeUINT32(GetAttributes(), AttribRotate, 0);
i &= 7;
// Will the output use different dimensions than the input?
bool IsOdd = (i & 1) == 1;
// Does the current AttribRotate rotation give a different
// orientation than the old one?
if (IsOdd != m_WasOdd)
{
// Yes, change the output type.
OutputSample(NULL, InputMessageNumber);
m_WasOdd = IsOdd;
}
// Process it.
DoWork(pSample, (RotateFlipType)i);
// Send the modified input sample to the output sample queue.
OutputSample(pSample, InputMessageNumber);
}
void OnSetInputType()
{
HRESULT hr = S_OK;
m_imageWidthInPixels = 0;
m_imageHeightInPixels = 0;
m_cbImageSize = 0;
m_lInputStride = 0;
IMFMediaType *pmt = GetInputType();
// type can be null to clear
if (pmt != NULL)
{
hr = MFGetAttributeSize(pmt, MF_MT_FRAME_SIZE, &m_imageWidthInPixels, &m_imageHeightInPixels);
ThrowExceptionForHR(hr);
hr = pmt->GetUINT32(MF_MT_DEFAULT_STRIDE, &m_lInputStride);
ThrowExceptionForHR(hr);
// Calculate the image size (not including padding)
m_cbImageSize = m_imageHeightInPixels * m_lInputStride;
}
else
{
// Since the input must be set before the output, nulling the
// input must also clear the output. Note that nulling the
// input is only valid if we are not actively streaming.
SetOutputType(NULL);
}
}
IMFMediaType *CreateOutputFromInput(IMFMediaType *inType)
{
// For some MFTs, the output type is the same as the input type.
// However, since we are rotating, several attributes in the
// media type (like frame size) must be different on our output.
// This routine generates the appropriate output type for the
// current input type, given the current state of m_WasOdd.
IMFMediaType *pOutputType = CloneMediaType(inType);
if (m_WasOdd)
{
HRESULT hr;
UINT32 h, w;
// Intentionally backward
hr = MFGetAttributeSize(inType, MF_MT_FRAME_SIZE, &h, &w);
ThrowExceptionForHR(hr);
hr = MFSetAttributeSize(pOutputType, MF_MT_FRAME_SIZE, w, h);
ThrowExceptionForHR(hr);
MFVideoArea *a = GetArea(inType, MF_MT_GEOMETRIC_APERTURE);
if (a != NULL)
{
a->Area.cy = h;
a->Area.cx = w;
SetArea(pOutputType, MF_MT_GEOMETRIC_APERTURE, a);
}
a = GetArea(inType, MF_MT_MINIMUM_DISPLAY_APERTURE);
if (a != NULL)
{
a->Area.cy = h;
a->Area.cx = w;
SetArea(pOutputType, MF_MT_MINIMUM_DISPLAY_APERTURE, a);
}
hr = pOutputType->SetUINT32(MF_MT_DEFAULT_STRIDE, w * 4);
ThrowExceptionForHR(hr);
}
return pOutputType;
}
void WriteIt(BYTE *pBuffer, RotateFlipType fm)
{
Bitmap *v = new Bitmap((int)m_imageWidthInPixels, (int)m_imageHeightInPixels, (int)m_lInputStride, PixelFormat32bppRGB, pBuffer);
if (v == NULL)
throw (HRESULT)E_OUTOFMEMORY;
try
{
Status s;
s = v->RotateFlip(fm);
if (s != Ok)
throw (HRESULT)E_UNEXPECTED;
Rect r;
if (!m_WasOdd)
{
r.Width = (int)m_imageWidthInPixels;
r.Height = (int)m_imageHeightInPixels;
}
else
{
r.Height = (int)m_imageWidthInPixels;
r.Width = (int)m_imageHeightInPixels;
}
BitmapData bmd;
bmd.Width = r.Width;
bmd.Height = r.Height;
bmd.Stride = 4*bmd.Width;
bmd.PixelFormat = PixelFormat32bppARGB;
bmd.Scan0 = (VOID*)pBuffer;
bmd.Reserved = NULL;
s = v->LockBits(&r, ImageLockModeRead | ImageLockModeUserInputBuf, PixelFormat32bppRGB, &bmd);
if (s != Ok)
throw (HRESULT)E_UNEXPECTED;
s = v->UnlockBits(&bmd);
if (s != Ok)
throw (HRESULT)E_UNEXPECTED;
}
catch(...)
{
delete v;
throw;
}
delete v;
}
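WriteIt() relies on GDI+ Bitmap::RotateFlip. For readers who would rather not pull in GDI+, here is a hand-written alternative for RGB32 only, offered as an illustration rather than as the answerer's code: it rotates a top-down buffer 90 degrees clockwise into a tightly packed output whose stride is the new width times 4, which matches the MF_MT_DEFAULT_STRIDE value set in CreateOutputFromInput(). The function name Rotate90Clockwise is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Rotate a top-down RGB32 image (width x height pixels, srcStride bytes per row)
// 90 degrees clockwise. Returns a tightly packed image of height x width pixels.
std::vector<std::uint32_t> Rotate90Clockwise(const std::uint8_t* src,
                                             std::size_t width, std::size_t height,
                                             std::size_t srcStride) {
    std::vector<std::uint32_t> dst(width * height);
    const std::size_t dstWidth = height;  // output rows are as long as the input is tall
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            std::uint32_t px;
            // memcpy avoids alignment and aliasing problems with the byte buffer
            std::memcpy(&px, src + y * srcStride + x * 4, sizeof px);
            // input (x, y) lands in output row x, column (height - 1 - y)
            dst[x * dstWidth + (height - 1 - y)] = px;
        }
    }
    return dst;
}
```

Rotating into a separate buffer like this is what lets the output frame have different dimensions from the input frame.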

Get continuous frames from camera filter when source filter stops sending?

Hi All,
I have a transform filter which takes two inputs, one from camera and other from a file source. Inside my transform filter I am blending the inputs from two sources.
The transform filter is derived from CTransformFilter:
class CWMTransformFilter : public CTransformFilter
and the extra pin is derived from CTransformInputPin (which in turn derives from CBaseInputPin):
class CFileInputPin : public CTransformInputPin
What happens in my case is that if the file source is short (say 10 seconds), I also get input from the camera for only 10 seconds; after that, the camera stops sending frames to its input pin.
So what I now need is:
1. How do I make the camera keep sending frames even when the file source filter stops sending?
2. How do I restart the source filter when playback of the source file stops (something like playing the file in a loop)?
Update:
STDMETHODIMP CFileInputPin::EndOfStream()
{
//return CTransformInputPin::EndOfStream();
return S_OK;
}
STDMETHODIMP CFileInputPin::Receive(IMediaSample* pSample)
{
HRESULT hr;
BYTE* pBufferIn;
long lBufferLength, lBufferSize;
hr = CBaseInputPin::Receive(pSample);
if (FAILED(hr))
{
printf("Error !!\n");
return hr;
}
hr = pSample->GetPointer(&pBufferIn);
DWORD stat = WaitForSingleObject(m_pFil->m_QSem,0L);
BOOL bSem = FALSE;
if( WAIT_OBJECT_0 == stat )
{
BYTE *pBuf = (BYTE *) malloc( Wsize*2 );
memcpy(pBuf,pBufferIn,Wsize); //lBufferLength);
sEncodedFrame CurFrame={pBuf,Wsize};
m_pFil->m_Q.push(CurFrame); //push it onto the queue
bSem = ReleaseSemaphore(m_pFil->m_QSem,1,NULL);
if(!bSem)
{
printf("ReleaseSemaphore error: %d \n", GetLastError());
}
return S_OK;
}
else
{
printf("Cant Receive frame 0x%x \n",stat);
return E_FAIL;
}
return S_OK;
}
HRESULT CWMTransformFilter::Transform(IMediaSample *pSource, IMediaSample *pDest)
{
unsigned char r,g,b;
unsigned char y,u,v;
BYTE *pBufferIn, *pBufferOut, *pBuf;
HRESULT hr = pSource->GetPointer(&pBufferIn);
hr = pDest->GetPointer(&pBufferOut);
if (FAILED(hr))
{
return hr;
}
long srclen = pSource->GetActualDataLength();
long dstlen = pDest->GetActualDataLength();
LONG pLastCnt;
BOOL bSem = FALSE;
//printf("Waiting to fill buffer %d\n",pSource);
//return S_OK;
//try
//{
while(1)
{
//if(1)
DWORD ret = WaitForSingleObject(m_QSem,0L);
if(ret != WAIT_OBJECT_0)
{
printf("Get error %d \n",GetLastError());
}
if( WAIT_OBJECT_0 == ret )
{
sEncodedFrame Frame;
if( m_Q.empty() == false )
{
Frame = m_Q.front();
m_Q.pop();
pBuf = (BYTE*) malloc(dstlen*2);
pLastCnt = Frame.iValidSize;
printf("Copy onto queue \n");
memcpy(pBuf,Frame.pFrame,pLastCnt); //Frame.iValidSize);
free(Frame.pFrame);
//delete &Frame;
bSem = ReleaseSemaphore(m_QSem,1,NULL);
if(!bSem)
{
printf("ReleaseSemaphore error: %d \n", GetLastError());
}
hr = S_OK;
break;
}
else
{
bSem = ReleaseSemaphore(m_QSem,1,NULL);
if(!bSem)
{
printf("ReleaseSemaphore error: %d \n", GetLastError());
}
//return S_OK;
hr = E_FAIL;
}
}
else
{
//return S_OK;
hr = E_FAIL;
}
}
for(int i = 0; i < windowWidth*2*windowHeight ; i+=4)
{
y = pBufferIn[i];
u = pBufferIn[i+1];
v = pBufferIn[i+3];
r = y + 1.4075 * (v - 128);
g = y - 0.3455 * (u - 128) - (0.7169 * (v - 128));
b = y + 1.7790 * (u - 128);
if(((r > b) &&(g > b)) && (g <= 200) )
{
pBufferIn[i] = pBuf[i];
pBufferIn[i+1] = pBuf[i+1];
pBufferIn[i+2] = pBuf[i+2];
pBufferIn[i+3] = pBuf[i+3];
}
}
free(pBuf); // release the frame copy allocated in the loop above
// Process the data.
memcpy(pBufferOut,pBufferIn,pSource->GetSize()); //after blend
pDest->SetActualDataLength(pDest->GetSize());
pDest->SetSyncPoint(TRUE);
return S_OK;
}
CFileInputPin::Receive is where I receive samples on the file input pin.
CFileInputPin::EndOfStream() notifies that the samples are complete.
CWMTransformFilter::Transform() is where the samples are handed to the output pin and on to the renderer.
Thanks,
Shyam
There are several tricky issues packed into this question.
First of all, the first point of interest is the custom dual-input filter. Whatever the input filters do, this transform filter can override them: it depends solely on its implementation whether it lets both input legs stream or blocks one of them. The common (typical) rule is that if a filter has two or more inputs, it does one of two things:
One of the streams is the master, and the other inputs are either taken into processing when they make sense or discarded.
The filter blocks input legs so that it keeps receiving data with matching time stamps, then merges the streams in the course of its internal processing.
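One detail in the question's Transform() fits the "master stream" option: when m_Q is empty (as it stays once the file has ended), the while(1) loop never breaks, so Transform() never returns and the camera's streaming thread is stuck inside it. That is consistent with the camera frames stopping when the file ends. A minimal portable sketch of treating the camera as the master (the class name is hypothetical, and a std::mutex stands in for the semaphore): each camera frame takes the next queued file frame if there is one, otherwise reuses the last one, and never waits.

```cpp
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// File frames are queued by the file pin; the camera (master) path consumes
// them without ever blocking.
class SecondaryFrameSource {
public:
    void Push(Frame f) {                       // e.g. from CFileInputPin::Receive
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push_back(std::move(f));
    }

    // Called once per camera frame: the next queued file frame if any,
    // else the last one used, else std::nullopt (pass the camera frame through).
    std::optional<Frame> Next() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_queue.empty()) {
            m_last = std::move(m_queue.front());
            m_queue.pop_front();
        }
        return m_last;
    }

private:
    std::mutex m_mutex;
    std::deque<Frame> m_queue;
    std::optional<Frame> m_last;
};
```

In Transform(), an empty result would simply mean "copy the camera frame through unblended" instead of looping.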
The input streams are typically sequences of samples followed by an EOS notification. In particular, a freeze might take place if one of the sources does not send EOS, or if the transform filter does not process it properly.
The second big issue is seeking. You normally don't seek just part of the graph, yet that is exactly what you are trying to do here. You can try to seek the source filter independently by locating its own seeking interfaces, or you can implement a buffer that holds everything the source sends and loop it indefinitely once the source sends EOS. There is no single right answer; you decide what is appropriate in your scenario.
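The buffer-and-loop option can be sketched like this, assuming the clip is short enough to keep every frame in memory. The names are hypothetical; in the filter, OnReceive would be called from CFileInputPin::Receive and OnEndOfStream from CFileInputPin::EndOfStream. Only frame contents are replayed, which is all the question's blending code uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// Records every frame from the file source; after end-of-stream it replays
// the recorded frames in a loop.
class LoopingFrameBuffer {
public:
    void OnReceive(Frame f) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_eos) m_frames.push_back(std::move(f));  // record the first pass only
    }

    void OnEndOfStream() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_eos = true;
    }

    // Copies the frame to blend into `out`; returns false if none exists yet.
    bool NextFrame(Frame& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_frames.empty()) return false;
        if (!m_eos) { out = m_frames.back(); return true; }  // live: latest frame
        out = m_frames[m_pos];
        m_pos = (m_pos + 1) % m_frames.size();               // wrap back to the start
        return true;
    }

private:
    std::mutex m_mutex;
    std::vector<Frame> m_frames;
    std::size_t m_pos = 0;
    bool m_eos = false;
};
```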
Another option is to split the graph into separate graphs and bridge them, so that you can seek/restart the source graph the regular way.

Buffer communication speed nightmare

I'm trying to use buffers to communicate between several 'layers' (threads) in my program, and now that I have visual output of what's going on inside, I realize that a devastating amount of time is being eaten up by these buffers.
Here are some notes about what's going on in my code:
when the rendering mode is triggered in this thread, it begins sending as many points as it can to the layer (thread) below it
the points from the lower thread are then processed and returned to this thread via the output buffer of the lower thread
points received back are mapped (for now) as white pixels in the D3D surface
if I bypass the buffer and put the points directly into the surface pixels, it only takes about 3 seconds to do the whole job
if I hand the point down and then have the lower layer pass it right back up, skipping any actual number-crunching, the whole job takes about 30 minutes (which makes the whole program useless)
changing the size of my buffers has no noticeable effect on the speed
I was originally using mutexes in my buffers but have eliminated them in an attempt to fix the problem
Is there something I can do differently to fix this speed problem?
...something to do with the way I'm handling these messages?
Here's my code
I'm very sorry that it's such a mess. I'm having to move way too fast on this project, and I've left a lot of pieces lying around in comments where I've been experimenting.
DWORD WINAPI CONTROLSUBSYSTEM::InternalExProcedure(__in LPVOID lpSelf)
{
XMSG xmsg;
LPCONTROLSUBSYSTEM lpThis = ((LPCONTROLSUBSYSTEM)lpSelf);
BOOL bStall;
BOOL bRendering = FALSE;
UINT64 iOutstandingPoints = 0; // points that are out being tested
UINT64 iPointsDone = 0;
UINT64 iPointsTotal = 0;
BOOL bAssigning;
DOUBLE dNextX;
DOUBLE dNextY;
while(1)
{
if( lpThis->hwTargetWindow!=NULL && lpThis->d3ddev!=NULL )
{
lpThis->d3ddev->Clear(0,NULL,D3DCLEAR_TARGET,D3DCOLOR_XRGB(0,0,0),1.0f,0);
if(lpThis->d3ddev->BeginScene())
{
lpThis->d3ddev->StretchRect(lpThis->sfRenderingCanvas,NULL,lpThis->sfBackBuffer,NULL,D3DTEXF_NONE);
lpThis->d3ddev->EndScene();
}
lpThis->d3ddev->Present(NULL,NULL,NULL,NULL);
}
//bStall = TRUE;
// read input buffer
if(lpThis->bfInBuffer.PeekMessage(&xmsg))
{
bStall = FALSE;
if( HIBYTE(xmsg.wType)==HIBYTE(CONT_MSG) )
{
// take message off
lpThis->bfInBuffer.GetMessage(&xmsg);
// double check consistency
if( HIBYTE(xmsg.wType)==HIBYTE(CONT_MSG) )
{
switch(LOBYTE(xmsg.wType))
{
case SETRESOLUTION_MSG:
lpThis->iAreaWidth = (UINT)xmsg.dptPoint.X;
lpThis->iAreaHeight = (UINT)xmsg.dptPoint.Y;
lpThis->sfRenderingCanvas->Release();
if(lpThis->d3ddev->CreateOffscreenPlainSurface(
(UINT)xmsg.dptPoint.X,(UINT)xmsg.dptPoint.Y,
D3DFMT_X8R8G8B8,
D3DPOOL_DEFAULT,
&(lpThis->sfRenderingCanvas),
NULL)!=D3D_OK)
{
MessageBox(NULL,"Error resizing surface.","ERROR",MB_ICONERROR);
}
else
{
D3DLOCKED_RECT lrt;
if(D3D_OK == lpThis->sfRenderingCanvas->LockRect(&lrt,NULL,0))
{
lpThis->iPitch = lrt.Pitch;
VOID *data;
data = lrt.pBits;
ZeroMemory(data,lpThis->iPitch*lpThis->iAreaHeight);
lpThis->sfRenderingCanvas->UnlockRect();
MessageBox(NULL,"Surface Resized","yay",0);
}
else
{
MessageBox(NULL,"Error resizing surface.","ERROR",MB_ICONERROR);
}
}
break;
case SETCOLORMETHOD_MSG:
break;
case SAVESNAPSHOT_MSG:
lpThis->SaveSnapshot();
break;
case FORCERENDER_MSG:
bRendering = TRUE;
iPointsTotal = lpThis->iAreaHeight*lpThis->iPitch;
iPointsDone = 0;
MessageBox(NULL,"yay, render something!",":o",0);
break;
default:
break;
}
}// else, lost this message
}
else
{
if( HIBYTE(xmsg.wType)==HIBYTE(MATH_MSG) )
{
XMSG xmsg2;
switch(LOBYTE(xmsg.wType))
{
case RESETFRAME_MSG:
case ZOOMIN_MSG:
case ZOOMOUT_MSG:
case PANUP_MSG:
case PANDOWN_MSG:
case PANLEFT_MSG:
case PANRIGHT_MSG:
// tell self to start a render
xmsg2.wType = CONT_MSG|FORCERENDER_MSG;
if(lpThis->bfInBuffer.PutMessage(&xmsg2))
{
// pass it down
while(!lpThis->lplrSubordinate->PutMessage(&xmsg));
// message passed so pull it from buffer
lpThis->bfInBuffer.GetMessage(&xmsg);
}
break;
default:
// pass it down
if(lpThis->lplrSubordinate->PutMessage(&xmsg))
{
// message passed so pull it from buffer
lpThis->bfInBuffer.GetMessage(&xmsg);
}
break;
}
}
else if( lpThis->lplrSubordinate!=NULL )
// pass message down
{
if(lpThis->lplrSubordinate->PutMessage(&xmsg))
{
// message passed so pull it from buffer
lpThis->bfInBuffer.GetMessage(&xmsg);
}
}
}
}
// read output buffer from subordinate
if( lpThis->lplrSubordinate!=NULL && lpThis->lplrSubordinate->PeekMessage(&xmsg) )
{
bStall = FALSE;
if( xmsg.wType==(REPLY_MSG|TESTPOINT_MSG) )
{
// got point test back
D3DLOCKED_RECT lrt;
if(D3D_OK == lpThis->sfRenderingCanvas->LockRect(&lrt,NULL,0))
{
INT pitch = lrt.Pitch;
VOID *data;
data = lrt.pBits;
INT Y=dRound((xmsg.dptPoint.Y/(DOUBLE)100)*((DOUBLE)lpThis->iAreaHeight));
INT X=dRound((xmsg.dptPoint.X/(DOUBLE)100)*((DOUBLE)pitch));
// decide color
if( xmsg.iNum==0 )
((WORD *)data)[X+Y*pitch] = 0xFFFFFFFF;
else
((WORD *)data)[X+Y*pitch] = 0xFFFFFFFF;
// message handled so remove from buffer
lpThis->lplrSubordinate->GetMessage(&xmsg);
lpThis->sfRenderingCanvas->UnlockRect();
}
}
else if(lpThis->bfOutBuffer.PutMessage(&xmsg))
{
// message sent so pull the real one off the buffer
lpThis->lplrSubordinate->GetMessage(&xmsg);
}
}
if( bRendering && lpThis->lplrSubordinate!=NULL )
{
bAssigning = TRUE;
while(bAssigning)
{
dNextX = 100*((DOUBLE)(iPointsDone%lpThis->iPitch))/((DOUBLE)lpThis->iPitch);
dNextY = 100*(DOUBLE)((INT)(iPointsDone/lpThis->iPitch))/(DOUBLE)(lpThis->iAreaHeight);
xmsg.dptPoint.X = dNextX;
xmsg.dptPoint.Y = dNextY;
//
//xmsg.iNum = 0;
//xmsg.wType = REPLY_MSG|TESTPOINT_MSG;
//
xmsg.wType = MATH_MSG|TESTPOINT_MSG;
/*D3DLOCKED_RECT lrt;
if(D3D_OK == lpThis->sfRenderingCanvas->LockRect(&lrt,NULL,0))
{
INT pitch = lrt.Pitch;
VOID *data;
data = lrt.pBits;
INT Y=dRound((dNextY/(DOUBLE)100)*((DOUBLE)lpThis->iAreaHeight));
INT X=dRound((dNextX/(DOUBLE)100)*((DOUBLE)pitch));
((WORD *)data)[X+Y*pitch] = 0xFFFFFFFF;
lpThis->sfRenderingCanvas->UnlockRect();
}
iPointsDone++;
if( iPointsDone>=iPointsTotal )
{
MessageBox(NULL,"done rendering","",0);
bRendering = FALSE;
bAssigning = FALSE;
}
*/
if( lpThis->lplrSubordinate->PutMessage(&xmsg) )
{
bStall = FALSE;
iPointsDone++;
if( iPointsDone>=iPointsTotal )
{
MessageBox(NULL,"done rendering","",0);
bRendering = FALSE;
bAssigning = FALSE;
}
}
else
{
bAssigning = FALSE;
}
}
}
//if( bStall )
//Sleep(10);
}
return 0;
}
}
Edit:
Here's an example that I perceive to be similar in concept, although this example consumes the messages it produces in the same thread.
#include <Windows.h>
#include "BUFFER.h"
int main()
{
BUFFER myBuffer;
INT jobsTotal = 1024*768;
INT currentJob = 0;
INT jobsOut = 0;
XMSG xmsg;
while(1)
{
if(myBuffer.PeekMessage(&xmsg))
{
// do something with message
// ...
// if successful, remove message
myBuffer.GetMessage(&xmsg);
jobsOut--;
}
while( currentJob<jobsTotal )
{
if( myBuffer.PutMessage(&xmsg) )
{
currentJob++;
jobsOut++;
}
else
{
// buffer is full at the moment
// stop for now and put more on later
break;
}
}
if( currentJob==jobsTotal && jobsOut==0 )
{
MessageBox(NULL,"done","",0);
break;
}
}
return 0;
}
This example also runs in about 3 seconds, as opposed to 30 minutes.
By the way, if anybody knows why Visual Studio keeps trying to make me call PeekMessageA and GetMessageA instead of the names I actually defined, that would be nice to know as well.
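The renaming most likely comes from the Windows headers. `<WinUser.h>` defines `PeekMessage` and `GetMessage` as macros that expand to `PeekMessageA`/`GetMessageA` (or to the `...W` versions when `UNICODE` is defined). Because `BUFFER.h` is included after `<Windows.h>`, the preprocessor rewrites your member names as well, and IntelliSense shows the expanded names. The sketch below reproduces the effect without Windows: it defines the two macros by hand (a simplification of the real header) and uses a hypothetical one-slot `MsgBuffer` class with a stand-in `XMSG`, not the poster's actual `BUFFER`.

```cpp
// Simplified reproduction of what <WinUser.h> does in a non-UNICODE build.
// The real header picks ...A or ...W depending on whether UNICODE is defined.
#define PeekMessage PeekMessageA
#define GetMessage  GetMessageA

// Hypothetical stand-in for the poster's message type.
struct XMSG { int wType = 0; };

// Hypothetical one-slot buffer declared *after* the macros, like BUFFER.h
// being included after <Windows.h>.
class MsgBuffer {
public:
    // The preprocessor silently renames this member to PeekMessageA.
    bool PeekMessage(XMSG* out) const {
        if (!full_) return false;
        *out = msg_;
        return true;
    }
    // Renamed to GetMessageA; the PeekMessage call inside is renamed too.
    bool GetMessage(XMSG* out) {
        if (!PeekMessage(out)) return false;
        full_ = false;
        return true;
    }
    // No Win32 macro named PutMessage exists, so this name is untouched.
    bool PutMessage(const XMSG* in) {
        if (full_) return false;
        msg_ = *in;
        full_ = true;
        return true;
    }
private:
    XMSG msg_{};
    bool full_ = false;
};
```

Since declarations and call sites are rewritten the same way, the code keeps working as long as every translation unit sees the same macros; if one file includes `<Windows.h>` before `BUFFER.h` and another does not, you can get unresolved-symbol linker errors. The usual fixes are to give the methods names that don't collide with Win32 functions, or to `#undef PeekMessage` / `#undef GetMessage` after including `<Windows.h>` (and then call `GetMessageA`/`GetMessageW` explicitly wherever you need the real Win32 function).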
Locking and unlocking an entire rect to change a single point is probably not very efficient. You might be better off building a list of the points you intend to modify, then locking the rect once, iterating over that list to modify all the points, and unlocking the rect.
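A minimal sketch of the batching idea, assuming a hypothetical `FakeSurface` that stands in for an `IDirect3DSurface9` (its `Lock`/`Unlock` play the roles of `LockRect`/`UnlockRect`, and the returned pitch plays the role of `D3DLOCKED_RECT::Pitch`). Points are queued first and then written under a single lock. It also shows the addressing for a 32-bit X8R8G8B8-style surface: the pitch is a byte count, so you step down `y * pitch` bytes and then `x * 4` bytes, rather than indexing a `WORD*` with `X+Y*pitch`.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a lockable 32-bit surface. Pitch is in BYTES and
// can be larger than width * 4 because rows may be padded.
struct FakeSurface {
    int width, height, pitch;
    std::vector<std::uint8_t> bytes;
    int lockCount = 0;  // lets us check how often the surface was locked

    FakeSurface(int w, int h, int rowPaddingBytes)
        : width(w), height(h), pitch(w * 4 + rowPaddingBytes),
          bytes(static_cast<std::size_t>(pitch) * h, 0) {}

    void* Lock(int* outPitch) { ++lockCount; *outPitch = pitch; return bytes.data(); }
    void Unlock() {}
};

// A point waiting to be written; coordinates are assumed to be in range.
struct PendingPoint { int x, y; std::uint32_t color; };

// Lock once, write every queued point, unlock once, then clear the queue.
void FlushPoints(FakeSurface& surface, std::vector<PendingPoint>& pending) {
    if (pending.empty()) return;
    int pitch = 0;
    auto* base = static_cast<std::uint8_t*>(surface.Lock(&pitch));
    for (const PendingPoint& p : pending) {
        // Row offset uses the byte pitch; column offset is 4 bytes per pixel.
        std::uint8_t* pixel = base + static_cast<std::size_t>(p.y) * pitch
                                   + static_cast<std::size_t>(p.x) * 4;
        std::memcpy(pixel, &p.color, sizeof p.color);
    }
    surface.Unlock();
    pending.clear();
}
```

With real Direct3D code you would typically cast the row pointer to `DWORD*` instead of using `memcpy`; the point of the sketch is that the lock cost is paid once per batch rather than once per pixel.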
When you lock the rect you are effectively blocking concurrent access to it, so it's like a mutex for the GPU in that respect, and then you modify only a single pixel. Doing this repeatedly for each pixel will constantly stall the GPU. You could use D3DLOCK_NOSYSLOCK to avoid this to some extent, but I'm not sure it will play nicely in the larger context of your program.
I'm obviously not entirely sure what the goal of your algorithm is, but if you are trying to process the pixels of a D3D surface in parallel, then I think the best approach would be a shader on the GPU.
The idea is to generate an array in system memory, populate it with per-point "input" values, and create a texture on the GPU from that array. Next, paint the texture onto a full-screen quad and render it with a pixel shader to a render target. The shader can be coded to process each point in whatever way you like, and the GPU takes care of parallelizing the work. Then create a texture from that render target and copy it into a system-memory array, from which you can extract all your outputs. If needed, you can also apply several shaders in sequence, rendering each result back into the render target, to pipeline multiple transformations.
A couple of notes:
Don't write your own message-passing code. It may be correct and slow, or fast and buggy. It takes a lot of experience to design threaded code that's fast, and getting it bug-free is really hard because debugging threaded code is hard. Win32 provides a couple of efficient thread-safe queues: SList (interlocked singly linked lists) and the window message queue.
Your design splits up the work in the worst possible way. Passing information between threads is expensive even under the best circumstances, because it causes cache contention on both the data and the synchronization objects. It's MUCH better to split the work into distinct datasets that don't interact (or interact minimally) and give each one to a separate thread, which is then responsible for all stages of processing that dataset.
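A minimal sketch of that partitioning with `std::thread`, assuming the work is a per-pixel computation over a 2D buffer. `ShadePixel` is a hypothetical placeholder for the real per-point math. Each thread owns a contiguous band of rows and does every stage for it, so no messages cross between threads and no two threads write the same element.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-pixel job standing in for the real "test a point" math.
static std::uint32_t ShadePixel(int x, int y) {
    return static_cast<std::uint32_t>((x * 31 + y * 17) & 0xFF);
}

// Split the image into horizontal bands, one per thread. Each thread computes
// and stores its own rows only, so the threads never exchange data.
std::vector<std::uint32_t> RenderInBands(int width, int height, unsigned threadCount) {
    std::vector<std::uint32_t> pixels(static_cast<std::size_t>(width) * height);
    const int bands = static_cast<int>(std::max(1u, threadCount));
    const int rowsPerBand = (height + bands - 1) / bands;  // round up

    std::vector<std::thread> workers;
    for (int b = 0; b < bands; ++b) {
        const int y0 = b * rowsPerBand;
        const int y1 = std::min(height, y0 + rowsPerBand);
        if (y0 >= y1) break;  // fewer rows than threads
        workers.emplace_back([&pixels, width, y0, y1] {
            for (int y = y0; y < y1; ++y)
                for (int x = 0; x < width; ++x)
                    pixels[static_cast<std::size_t>(y) * width + x] = ShadePixel(x, y);
        });
    }
    for (std::thread& t : workers) t.join();  // wait for every band
    return pixels;
}
```

The only synchronization is the final `join`, which is the kind of minimal interaction the answer recommends; `std::thread::hardware_concurrency()` is a reasonable starting value for `threadCount`.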
Don't poll.
That's likely to be the heart of the problem. You have a task continually calling PeekMessage and probably finding nothing there. This will eat all available CPU, so any task that wants to post messages is unlikely to get enough CPU time to do so.
I can't remember how you'd achieve this with the Windows message queue (probably WaitMessage or some variant), but typically you might implement this with a counting semaphore. When the consumer wants data, it waits for the semaphore to be signalled. When the producer has data, it signals the semaphore.
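A minimal sketch of the semaphore approach in portable C++11, assuming a simple producer/consumer queue rather than the Windows message queue. The counting semaphore is built from `std::mutex` and `std::condition_variable` (in C++20 you could use `std::counting_semaphore` instead). The consumer sleeps inside `acquire()` until the producer signals, so nothing spins on a peek.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Counting semaphore: release() adds one permit, acquire() blocks until a
// permit is available and then takes it.
class CountingSemaphore {
public:
    void release() {
        { std::lock_guard<std::mutex> lock(m_); ++count_; }
        cv_.notify_one();
    }
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });  // sleeps, no polling
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    long count_ = 0;
};

// Queue whose semaphore counts the items; pop() waits for a signal.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        { std::lock_guard<std::mutex> lock(m_); items_.push_back(std::move(item)); }
        available_.release();  // producer signals: one more item
    }
    T pop() {
        available_.acquire();  // consumer waits until an item exists
        std::lock_guard<std::mutex> lock(m_);
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }
private:
    std::mutex m_;
    std::deque<T> items_;
    CountingSemaphore available_;
};

// Usage: a producer thread pushes 1..n while the calling thread consumes.
long long SumViaQueue(int n) {
    BlockingQueue<int> q;
    std::thread producer([&q, n] { for (int i = 1; i <= n; ++i) q.push(i); });
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += q.pop();
    producer.join();
    return sum;
}
```

Because each item is pushed before its permit is released, a successful `acquire()` always means at least one item is in the deque.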
I managed to resolve it by redesigning the whole thing: it now passes huge payloads instead of individual tasks.
(I'm the poster.)
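A minimal sketch of what "huge payloads" can mean here, assuming each message describes a range of consecutive points instead of a single point. `PointBatch` and `MakeBatches` are hypothetical names, not the poster's actual code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical payload: one message covers a whole range of points.
struct PointBatch {
    int firstPoint;  // index of the first point in the range
    int count;       // number of consecutive points in the range
};

// Split totalPoints into batches of at most batchSize points (batchSize > 0).
std::vector<PointBatch> MakeBatches(int totalPoints, int batchSize) {
    std::vector<PointBatch> batches;
    for (int first = 0; first < totalPoints; first += batchSize)
        batches.push_back({first, std::min(batchSize, totalPoints - first)});
    return batches;
}
```

For the 1024×768 job in the example above, `MakeBatches(1024 * 768, 4096)` produces 192 messages instead of 786,432, so the per-message synchronization cost is paid far less often.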