setjmp, longjmp and stack reconstruction - c++

Normally setjmp and longjmp do not care about the call stack - the functions just preserve and restore registers.
I would like to use setjmp and longjmp so that the call stack is preserved, and then restored in a different execution context:
void EnableFeature( bool bEnable )
{
    if( bEnable )
    {
        if( setjmp( jmpBuf ) == 0 )
        {
            // back up the call stack
        }
        else
        {
            return; // play back the backed-up call stack + the new call stack
        }
    }
    else
    {
        // restore the saved call stack on top of the current call stack,
        // then modify jmpBuf so we will jump to the new end of the stack
        longjmp( jmpBuf, 1 );
    }
}
Is this kind of approach possible? Could someone provide sample code for this?
The reason I believe it's doable is a similar code snippet I have already coded / prototyped:
Communication protocol and local loopback using setjmp / longjmp
There are two call stacks running simultaneously, independently of each other.
But just to help you out with this task, I'll give you a function for getting the call stack for native and managed code:
//
// Originated from: https://sourceforge.net/projects/diagnostic/
//
// Similar to the Windows API function, captures N frames of the current call stack.
// Unlike the Windows API function, works with managed and native functions.
//
int CaptureStackBackTrace2(
    int FramesToSkip,       //[in] frames to skip, 0 - capture everything.
    int nFrames,            //[in] frames to capture.
    PVOID* BackTrace        //[out] filled callstack with total size nFrames - FramesToSkip
)
{
#ifdef _WIN64
    CONTEXT ContextRecord;
    RtlCaptureContext( &ContextRecord );

    int iFrame;
    for( iFrame = 0; iFrame < nFrames; iFrame++ )
    {
        DWORD64 ImageBase;
        PRUNTIME_FUNCTION pFunctionEntry = RtlLookupFunctionEntry( ContextRecord.Rip, &ImageBase, NULL );

        if( pFunctionEntry == NULL )
        {
            if( iFrame != 0 )
                iFrame--;   // Eat the last frame as it's not valid.
            break;
        }

        PVOID HandlerData;
        DWORD64 EstablisherFrame;
        RtlVirtualUnwind( 0 /*UNW_FLAG_NHANDLER*/,
            ImageBase,
            ContextRecord.Rip,
            pFunctionEntry,
            &ContextRecord,
            &HandlerData,
            &EstablisherFrame,
            NULL );

        if( FramesToSkip > iFrame )
            continue;

        BackTrace[iFrame - FramesToSkip] = (PVOID)ContextRecord.Rip;
    }
#else
    //
    // This approach was taken from StackInfoManager.cpp / FillStackInfo
    // http://www.codeproject.com/Articles/11221/Easy-Detection-of-Memory-Leaks
    // - slightly simplified the function itself.
    //
    int regEBP;
    __asm mov regEBP, ebp;

    long *pFrame = (long*)regEBP;   // pointer to current function frame
    void* pNextInstruction;
    int iFrame = 0;

    //
    // Using __try/__except is faster than using ReadProcessMemory or VirtualProtect.
    // We return whatever frames we have collected so far once an exception is encountered.
    //
    __try {
        for( ; iFrame < nFrames; iFrame++ )
        {
            pNextInstruction = (void*)(*(pFrame + 1));

            if( !pNextInstruction )     // Last frame
                break;

            if( FramesToSkip > iFrame )
                continue;

            BackTrace[iFrame - FramesToSkip] = pNextInstruction;
            pFrame = (long*)(*pFrame);
        }
    }
    __except( EXCEPTION_EXECUTE_HANDLER )
    {
    }
#endif //_WIN64

    iFrame -= FramesToSkip;
    if( iFrame < 0 )
        iFrame = 0;

    return iFrame;
} //CaptureStackBackTrace2
I think it can be modified to obtain the actual stack pointer as well (on x64 it is Rsp in the captured CONTEXT; on x86 the frame pointer is already there).
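For what it's worth, on x64 no extra unwinding is needed to read the stack pointer at the capture point - it is already part of the captured context (a small sketch, not from the original project):

CONTEXT ctx;
RtlCaptureContext( &ctx );
#ifdef _WIN64
DWORD64 stackPointer = ctx.Rsp;   // current RSP at the capture point
#endif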

Legally, setjmp/longjmp can only be used to jump "back" in the nested call sequence. Which means that it never needs to really "reconstruct" anything - at the moment you execute the longjmp, everything is still intact, right there in the stack. All you need to do is roll back the extra stuff accumulated on top of it between the moment of the setjmp and the moment of the longjmp.
longjmp automatically does a "shallow" rollback for you (i.e. it simply purges the raw bytes off the top of the stack without calling any destructors). So, if you wanted to do a proper "deep" rollback (like what exceptions do as they fly up the call hierarchy) you'd have to setjmp at each level that needs deep cleanup, "intercept" the jump, perform the cleanup manually and then longjmp further up the call hierarchy.
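For illustration, here is a minimal sketch of that "intercept and re-throw" pattern (the names and the manually cleaned-up resource are made up; only the chaining of jmp_bufs matters):

#include <csetjmp>
#include <cstdio>

static std::jmp_buf topLevelEnv;   // where the final longjmp lands

void middle()
{
    std::jmp_buf localEnv;
    char* resource = new char[64];        // something needing "deep" cleanup

    if ( setjmp( localEnv ) == 0 )
    {
        // ... deeper code decides to bail out:
        std::longjmp( localEnv, 1 );      // the "throw"
    }
    else
    {
        delete[] resource;                // the manual "destructor" call
        std::longjmp( topLevelEnv, 1 );   // re-throw further up the hierarchy
    }
}

int main()
{
    if ( setjmp( topLevelEnv ) == 0 )
        middle();
    else
        std::printf( "unwound back to the top\n" );
}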
But this would basically be a manual implementation of "poor-man's exception handling". Why would you want to reimplement it manually? I'd understand if you wanted to do it in C code. But why in C++?
P.S. And yes, setjmp/longjmp are sometimes used in a non-standard way to implement co-routines in C, which does involve jumping "across" and a raw form of stack restoration. But that is outside the standard, and in the general case it would be much more painful to pull off in C++, for the very same reasons I mentioned above.
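If you really do need two independent call stacks on Windows (which seems to be what the question is after), the supported route is fibers rather than setjmp/longjmp. A minimal sketch of two stacks yielding to each other:

#include <windows.h>
#include <cstdio>

static LPVOID g_mainFiber;   // fiber to yield back to

static VOID CALLBACK WorkerFiber( LPVOID )
{
    std::printf( "worker: first slice\n" );
    SwitchToFiber( g_mainFiber );   // suspend this stack, resume main's
    std::printf( "worker: second slice\n" );
    SwitchToFiber( g_mainFiber );   // a fiber must never simply return
}

int main()
{
    g_mainFiber = ConvertThreadToFiber( NULL );
    LPVOID worker = CreateFiber( 0, WorkerFiber, NULL );

    SwitchToFiber( worker );        // run the worker until it yields
    std::printf( "main: between slices\n" );
    SwitchToFiber( worker );

    DeleteFiber( worker );
}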

Related

WinAPI's VirtualQueryEx Shows an Inaccurate Amount of Memory Used by a Process

I have the following function, written in C++ using WinAPI:
The function receives a handle to a process, and should return the size of the memory used by the process.
DWORD GetProcessCommitedMemorySize(const HANDLE hProcess)
{
    MEMORY_BASIC_INFORMATION mbi;
    DWORD totalMemory = 0;
    DWORD currAddr = 0;

    // for checking repetition of page base addresses
    std::unordered_set<DWORD> tempAddresses;

    while (VirtualQueryEx(hProcess, (LPVOID)currAddr, &mbi, sizeof(mbi)) == sizeof(mbi))
    {
        if (tempAddresses.find((DWORD)mbi.BaseAddress) == tempAddresses.end()) // not found
        {
            currAddr = (DWORD)mbi.BaseAddress + mbi.RegionSize;
            tempAddresses.insert((DWORD)mbi.BaseAddress);

            if (mbi.State == MEM_COMMIT && mbi.Type == MEM_PRIVATE)
            {
                totalMemory += mbi.RegionSize;
            }
        }
        else
        {
            break;
        }
    }
    return totalMemory;
}
The function always returns a number bigger than the amount of memory the process is shown to use on the 'Details' page of Task Manager.
For example: 86155 KB from the function versus 56924 KB in Task Manager.
The program still isn't complete, so I hard-code the PID for now; I know I'm looking at the right process.
Does this issue occur because the number I'm computing and the number in Task Manager have different meanings, or is my code just not doing what it's supposed to do?
Thanks.
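One way to check whether the two numbers simply measure different things: PSAPI's PrivateUsage is the process's private commit charge, which is essentially what the VirtualQueryEx loop above sums, whereas Task Manager's default memory column shows the private working set - a different metric. A hedged sketch of the cross-check:

#include <windows.h>
#include <psapi.h>   // link with psapi.lib

// Returns the private commit charge of the process, in bytes.
SIZE_T GetPrivateCommitBytes( HANDLE hProcess )
{
    PROCESS_MEMORY_COUNTERS_EX pmc = {};
    pmc.cb = sizeof( pmc );
    if ( !GetProcessMemoryInfo( hProcess,
                                reinterpret_cast<PROCESS_MEMORY_COUNTERS*>( &pmc ),
                                sizeof( pmc ) ) )
        return 0;
    return pmc.PrivateUsage;   // compare against the VirtualQueryEx sum
}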

How to trap stack overflow in a Windows x64 C++ application

I am trying to compile an application for the x64 platform on Windows. A couple of threads, handling the parsing of a scripting language, use this code recommended by Microsoft to trap stack overflows and avoid access violation exceptions:
__try
{
    DoSomethingThatMightUseALotOfStackMemory();
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
    LPBYTE lpPage;
    static SYSTEM_INFO si;
    static MEMORY_BASIC_INFORMATION mi;
    static DWORD dwOldProtect;

    // Get page size of system
    GetSystemInfo(&si);

    // Find SP address
    _asm mov lpPage, esp;

    // Get allocation base of stack
    VirtualQuery(lpPage, &mi, sizeof(mi));

    // Go to page beyond current page
    lpPage = (LPBYTE)(mi.BaseAddress) - si.dwPageSize;

    // Free portion of stack just abandoned
    if (!VirtualFree(mi.AllocationBase,
                     (LPBYTE)lpPage - (LPBYTE)mi.AllocationBase,
                     MEM_DECOMMIT))
    {
        exit(1);
    }

    // Reintroduce the guard page
    if (!VirtualProtect(lpPage, si.dwPageSize,
                        PAGE_GUARD | PAGE_READWRITE,
                        &dwOldProtect))
    {
        exit(1);
    }
    Sleep(2000);
}
Unfortunately it uses one line of inline assembler to get the stack pointer. Visual Studio does not support inline assembly in x64 mode, and I can't find a compiler intrinsic for getting the stack pointer either.
Is it possible to do this in an x64-friendly manner?
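As an aside, if all that is missing is the stack pointer itself, the _AddressOfReturnAddress compiler intrinsic works in x64 mode and yields an address inside the current frame, which is close enough for the page arithmetic above (a sketch, not the solution that was eventually used):

#include <intrin.h>
#include <windows.h>

// Approximates ESP/RSP without inline assembly: the return address lives
// on the stack, so its address is a valid pointer into the current frame.
LPBYTE ApproximateStackPointer()
{
    return reinterpret_cast<LPBYTE>( _AddressOfReturnAddress() );
}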
As pointed out in a comment to the question, the whole "hack" above can be replaced by the _resetstkoflw function. This works fine in both x86 and x64 mode.
The code snippet above then becomes:
// Filter for the stack overflow exception. This function traps
// the stack overflow exception, but passes all other exceptions through.
int stack_overflow_exception_filter(int exception_code)
{
    if (exception_code == EXCEPTION_STACK_OVERFLOW)
    {
        // Do not call _resetstkoflw here, because at this point
        // the stack is not yet unwound. Instead, signal that the
        // handler (the __except block) is to be executed.
        return EXCEPTION_EXECUTE_HANDLER;
    }
    else
        return EXCEPTION_CONTINUE_SEARCH;
}
int example()
{
    int result = 0;
    __try
    {
        DoSomethingThatMightUseALotOfStackMemory();
    }
    __except(stack_overflow_exception_filter(GetExceptionCode()))
    {
        // Here, it is safe to reset the stack.
        result = _resetstkoflw();
    }

    // Terminate if _resetstkoflw failed (returned 0)
    if (!result)
        return 3;
    return 0;
}

How to measure and fix context switching bottlenecks?

I have a multi-threaded socket program. I use boost threadpool (http://threadpool.sourceforge.net/) for executing tasks, and I create one TCP client socket per thread in the threadpool. Whenever I send a large amount of data, say a 500 KB message, the throughput drops significantly. I checked my code for:
1) Waits that might cause context switching
2) Locks/mutexes
For example, a 500 KB message is divided into multiple lines and I send each line through the socket using ::send( ).
typedef std::list< std::string > LinesListType;

// now send the lines to the server
for ( LinesListType::const_iterator it = linesOut.begin( );
      it != linesOut.end( );
      ++it )
{
    std::string line = *it;
    if ( !line.empty( ) && '.' == line[0] )
    {
        line.insert( 0, "." );
    }
    SendData( line + CRLF );
}
SendData:
void SendData( const std::string& data )
{
    try
    {
        uint32_t bytesToSendNo = data.length();
        uint32_t totalBytesSent = 0;

        ASSERT( m_socketPtr.get( ) != NULL );

        while ( bytesToSendNo > 0 )
        {
            try
            {
                int32_t ret = m_socketPtr->Send( data.data( ) + totalBytesSent, bytesToSendNo );
                if ( 0 == ret )
                {
                    throw SocketSendFailed( );
                }
                bytesToSendNo -= ret;
                totalBytesSent += ret;
            }
            catch ( ... )
            {
                // swallow and retry the remaining bytes
            }
        }
    }
    catch ( ... )
    {
    }
}
Send Method in Client Socket:
int Send( const char* buffer, int length )
{
    try
    {
        int bytes = 0;
        do
        {
            bytes = ::send( m_handle, buffer, length, MSG_NOSIGNAL );
        }
        while ( bytes == -1 && errno == EINTR );

        if ( bytes == -1 )
        {
            throw SocketSendFailed( );
        }
        return bytes;
    }
    catch ( ... )
    {
        throw; // re-throw so the caller sees the failure
    }
}
Invoking ::select() before sending caused context switches, since ::select can block. Holding a lock on a shared mutex caused parallel threads to wait and switch context, which hurt performance.
Is there a best practice for avoiding context switches, especially in network programming? I have spent at least a week trying out various tools with no luck (vmstat, callgrind in valgrind). Are there any tools on Linux that would help measure these bottlenecks?
In general, not related to networking, you need one thread for each resource that can be used in parallel. In other words, if you have a single network interface, a single thread is enough to service it. Since you don't typically just receive or send data but also do something with it, that work then consumes a different resource, such as the CPU for computations or the IO channel to the hard disk for storage or retrieval. That work needs to be done in different threads, while the single network thread keeps retrieving messages from the network, roughly as sketched below.
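As a sketch of that division of labour (the names are made up, and C++11 primitives are used for brevity): the network thread pushes complete messages into a queue that worker threads drain.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// One network thread calls Push(); worker threads block in Pop().
class WorkQueue
{
public:
    void Push( std::string msg )
    {
        {
            std::lock_guard<std::mutex> lock( m_mutex );
            m_queue.push( std::move( msg ) );
        }
        m_cond.notify_one();   // wake exactly one worker
    }

    std::string Pop()
    {
        std::unique_lock<std::mutex> lock( m_mutex );
        m_cond.wait( lock, [this] { return !m_queue.empty(); } );
        std::string msg = std::move( m_queue.front() );
        m_queue.pop();
        return msg;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<std::string> m_queue;
};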
As a consequence, your approach of creating a thread per connection seems a simple way to keep things clean and separate, but it simply doesn't scale, because it involves too much unnecessary context switching. Instead, keep the networking in one place if you can. Also, don't reinvent the wheel. There are tools like zeromq out there that serve several connections, assemble whole messages from fragmented network packets, and only invoke a callback when one message has been completely received. And they do so efficiently, so I'd suggest using such a tool as the base for your communication. In addition, zeromq provides a plethora of language bindings, so you can quickly prototype nodes in a scripting language and switch to C++ for performance later on.
Lastly, I'm afraid that the library you are using (which does not seem to be part of Boost!) is abandonware, i.e. its development has been discontinued. I'm not sure of that, but looking at the changelog, they claim compatibility with Boost 1.37, which is really old. Make sure that what you are using is worth your time!

Memory Access error using STK and the FMVoices class

I'm trying to use the STK from Stanford to do some realtime wavetable synthesis. I'm using the FMVoices instrument class https://ccrma.stanford.edu/software/stk/classstk_1_1FMVoices.html
and trying to use it in a callback routine defined below.
int tick( void *outputBuffer, void *inputBuffer, unsigned int nBufferFrames,
          double streamTime, RtAudioStreamStatus status, void *dataPointer )
{
    FMVoices *FM = (FMVoices *) dataPointer;
    register StkFloat *samples = (StkFloat *) outputBuffer;

    for ( unsigned int i = 0; i < nBufferFrames; i++ )
        *samples++ = FM->tick();

    return 0;
}
The issue, I think, is with the type of that last parameter. I'm getting a runtime error: "0xC0000005: Access violation executing location 0x000000001." Now, this is the way the callback is supposed to be written for other STK instruments like Clarinet or even the FileLoop class, but there's something funky about FMVoices. The object is passed to openStream (which handles platform-specific realtime output) as a pointer to void, and the callback is called automatically whenever the system's audio buffer needs filling. A code snippet that implements this and DOES work for other instruments is shown below:
int main()
{
    // Set the global sample rate before creating class instances.
    Stk::setSampleRate( 44100.0 );

    RtAudio dac;
    Instrmnt * instrument_FM;
    int nFrames = 10000;

    try {
        instrument_FM = new FMVoices;
    }
    catch ( StkError & ) {
        goto cleanup;
    }

    instrument_FM->setFrequency(440.0);

    // Figure out how many bytes in an StkFloat and setup the RtAudio stream.
    RtAudio::StreamParameters parameters;
    parameters.deviceId = dac.getDefaultOutputDevice();
    parameters.nChannels = 1;
    RtAudioFormat format = ( sizeof(StkFloat) == 8 ) ? RTAUDIO_FLOAT64 : RTAUDIO_FLOAT32;
    unsigned int bufferFrames = RT_BUFFER_SIZE;

    try {
        dac.openStream( &parameters, NULL, format, (unsigned int)Stk::sampleRate(), &bufferFrames, &tick, (void *)&instrument_FM );
    }
    catch ( RtError &error ) {
        error.printMessage();
        Sleep(1000);
        goto cleanup;
    }
The size of nFrames does not seem to have an effect. It just seemed to me that these kinds of errors usually come from misusing a pointer to void.
The problem is you are taking the address of a pointer, and passing it into openStream.
// pointer to instrument
Instrmnt * instrument_FM;
// snip ...
// &instrument_FM is a pointer to a pointer! i.e. Instrmnt **
dac.openStream( &parameters, /* other params */, (void *)&instrument_FM)
The quickest solution is to just get rid of the & in that line.
Now some comments on C++, and some more fixes to your code. The code looks like a mixture of C and Java, and opens up a lot of pitfalls, one of which led to your problem.
- There is no need to dynamically allocate FMVoices. Use the stack, just like you did for RtAudio dac. No pointers to worry about, no memory to delete, and therefore no memory leaks. Just write FMVoices instrument_FM;
- There is no need for try/catch in most cases, since C++ has destructors that trigger at the end of scope and propagate the error. If you only use the stack, there is no need to worry about delete and cleanup operations.
- Don't ever use goto in C++; it's really not needed (unlike in C, where it can be used for cleanup). There are destructors and RAII for that.
- Use the more fine-grained C++ casts, such as static_cast<> and reinterpret_cast<>, instead of C-style casts. See this article for an explanation.
Here's the revised code:
int main()
{
    // Set the global sample rate before creating class instances.
    Stk::setSampleRate( 44100.0 );

    RtAudio dac;
    FMVoices instrument_FM;
    instrument_FM.setFrequency(440.0);

    // Figure out how many bytes in an StkFloat and setup the RtAudio stream.
    RtAudio::StreamParameters parameters;
    parameters.deviceId = dac.getDefaultOutputDevice();
    parameters.nChannels = 1;
    RtAudioFormat format = ( sizeof(StkFloat) == 8 ) ? RTAUDIO_FLOAT64 : RTAUDIO_FLOAT32;
    unsigned int bufferFrames = RT_BUFFER_SIZE;

    // didn't get rid of this try since you want to print the error message.
    try {
        // note: here I need the ampersand &, because instrument_FM is on the stack
        dac.openStream( &parameters, NULL, format, static_cast<unsigned int>(Stk::sampleRate()), &bufferFrames, &tick, reinterpret_cast<void*>(&instrument_FM) );
    }
    catch ( RtError& error ) {
        error.printMessage();
    }
}

Can I pause the callback from within itself?

I am using SDL audio to play sounds.
The documentation for SDL_LockAudio says:
Do not call this from the callback function or you will cause deadlock.
But SDL_PauseAudio doesn't say that; instead it says:
This function pauses and unpauses the audio callback processing
My mixer callback looks like this:
void AudioPlaybackCallback( void *, core::bty::UInt8 *stream, int len )
{
    // number of bytes left to play in the current sample
    const int thisSampleLeft = currentSample.dataLength - currentSample.dataPos;

    // number of bytes that will be sent to the audio stream
    const int amountToPlay = std::min( thisSampleLeft, len );

    if ( amountToPlay > 0 )
    {
        SDL_MixAudio( stream,
                      currentSample.data + currentSample.dataPos,
                      amountToPlay,
                      currentSample.volume );

        // update the current sample
        currentSample.dataPos += amountToPlay;
    }
    else
    {
        if ( PlayingQueue::QueueHasElements() )
        {
            // update the current sample
            currentSample = PlayingQueue::QueuePop();
        }
        else
        {
            // since the current sample finished, and there are no more samples to
            // play, pause the playback
            SDL_PauseAudio( 1 );
        }
    }
}
PlayingQueue is a class which provides access to a static std::queue object. Nothing fancy.
This worked fine until we decided to update the SDL and ALSA libraries (there is no turning back now). Since then I see this in my log:
ALSA lib pcm.c:7316:(snd_pcm_recover) underrun occurred
If I assume there are no bugs in the SDL or ALSA libraries (after googling this message, that assumption is most likely wrong), I guess it should be possible to change my code to fix, or at least avoid, the underrun.
So, the question is: can I pause the callback from within itself? Can doing so cause the underruns I am seeing?
Finally I figured it out.
When SDL_PauseAudio( 1 ) is called from inside the callback, SDL switches to another callback (which just puts zeros into the audio stream). The current callback still runs to completion after the call.
Therefore, it is safe to call this function from the callback.
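The producer side then only has to unpause when it queues new data. A sketch, assuming a hypothetical QueuePush counterpart to the QueuePop used above (Sample stands in for whatever type currentSample has):

void QueueSample( const Sample& sample )
{
    SDL_LockAudio();                   // fine here - we are NOT inside the callback
    PlayingQueue::QueuePush( sample ); // hypothetical producer-side method
    SDL_UnlockAudio();

    SDL_PauseAudio( 0 );               // resume playback if the callback paused it
}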