I have a weird problem:
In my Win32 C++ app, I have a function that returns immediately after a call to another function, without executing the rest of its body.
void f()
{
//SECTION 1//
if( interactFrame )
{
psFrame->getWindow()->deactivate();
interactFrame = activeFrame = 0;
logFile << "PS deactive" << endl;
}
//SECTION 2//
}
void Window::deactivate()
{
SetLayeredWindowAttributes( handle_, 0, 0, LWA_ALPHA );
SetFocus( applicationWindow_ );
}
After I call f(), the function goes through Section 1, branches into the if statement, completes line 1 (psFrame->...) and returns after that without evaluating the remaining two lines and the Section 2 out of the branch. I had this happen to me when for instance I was calling a method of a variable which was NULL, in this case psFrame, and it would instead of breaking, just return. However it is a legitimate variable, its contents and the pointer returned from getWindow() exists. In fact I can trace until the completion of deactivate() however my breakpoint at the next line is never hit.
In fact this is the same code that runs on one computer, and doesn't on my laptop. Both running Win 7.
What do you think could be the cause of this?
It sounds like something (quite possibly the deactivate, or something it calls) is making a mess of the stack (e.g., overwriting the end of a buffer) and messing up the return address. Much more than that would be a pretty wild guess though.
Given your description, it still sounds like you are getting null dereference errors. Guard your code a bit and see what happens like this:
if( interactFrame )
{
if (psFrame)
{
if (psFrame->getWindow())
{
psFrame->getWindow()->deactivate();
}
// else log("getWindow == null")
}
// else log("psFrame == null")
interactFrame = activeFrame = 0;
logFile << "PS deactive" << endl;
}
Beyond that we'd need to see more code.
UPDATE: OK - you posted more code, and that's pretty odd, unless something very strange is happening like getWindow() is overrunning your stack and trashing the return address. Check any local variables (especially strings and arrays) you have in getWindow().
GMan also has a good point - if psFrame is returning a pointer to a deleted window in getWindow, that could also be a culprit (and you might see different behaviors depending on if the memory has been re-allocated or not yet)
I guess the line
psFrame->getWindow()->deactivate();
simply throws an exception, and your function does not return at all: it is terminated by the exception. To confirm this, set a breakpoint after the call to f() (part of which is the code you've posted); if that breakpoint isn't hit either, an exception is likely (possibly an invalid memory access, or simply a thrown C++ exception).
Stack corruption is also possible, and it will also likely lead to an exception (unless you happen to overwrite the return address with a valid address in executable memory).
Also note that if psFrame happens to be 0 (or another invalid pointer), then an access violation is all but guaranteed if getWindow() accesses any non-static member of its object in any way, and you would see exactly the behaviour you described. The same applies when psFrame->getWindow() returns 0 (or another invalid pointer) and deactivate() accesses a non-static member.
UPD:
You can also watch how the stack contents change while debugging.
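One way to test the exception theory without a debugger is to wrap the call in a try/catch and log what happens. The sketch below is only an illustration, not the asker's code: Window, f, f_completed and trace are hypothetical stand-ins, and deactivate() is made to throw on purpose. Keep in mind that on Windows an access violation is a structured (SEH) exception, which a plain catch only sees under certain compiler settings (for example MSVC's /EHa), so this covers the C++ exception case only.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory log standing in for logFile.
std::vector<std::string> trace;

// Hypothetical window whose deactivate() throws, to reproduce the symptom.
struct Window {
    void deactivate() { throw std::runtime_error("deactivate failed"); }
};

// Looks like it "returns early": the line after deactivate() never runs.
void f(Window& w) {
    trace.push_back("before deactivate");
    w.deactivate();
    trace.push_back("after deactivate");  // skipped when deactivate() throws
}

// Calls f() and reports whether it ran to completion or left via an exception.
bool f_completed(Window& w) {
    try {
        f(w);
        return true;
    } catch (const std::exception& e) {
        trace.push_back(std::string("exception: ") + e.what());
        return false;
    }
}
```

If f_completed() returns false, the function did not return normally; it was unwound by an exception.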
I'm designing a preloader-based lock tracing utility that attaches to Pthreads, and I've run into a weird issue. The program works by providing wrappers that replace relevant Pthreads functions at runtime; these do some logging, and then pass the args to the real Pthreads function to do the work. They do not modify the arguments passed to them, obviously. However, when testing, I discovered that the condition variable pointer passed to my pthread_cond_wait() wrapper does not match the one that gets passed to the underlying Pthreads function, which promptly crashes with "futex facility returned an unexpected error code," which, from what I've gathered, usually indicates an invalid sync object passed in. Relevant stack trace from GDB:
#8 __pthread_cond_wait (cond=0x7f1b14000d12, mutex=0x55a2b961eec0) at pthread_cond_wait.c:638
#9 0x00007f1b1a47b6ae in pthread_cond_wait (cond=0x55a2b961f290, lk=0x55a2b961eec0)
at pthread_trace.cpp:56
I'm pretty mystified. Here's the code for my pthread_cond_wait() wrapper:
int pthread_cond_wait(pthread_cond_t* cond, pthread_mutex_t* lk) {
// log arrival at wait
the_tracer.add_event(lktrace::event::COND_WAIT, (size_t) cond);
// run pthreads function
GET_REAL_FN(pthread_cond_wait, int, pthread_cond_t*, pthread_mutex_t*);
int e = REAL_FN(cond, lk);
if (e == 0) the_tracer.add_event(lktrace::event::COND_LEAVE, (size_t) cond);
else {
the_tracer.add_event(lktrace::event::COND_ERR, (size_t) cond);
}
return e;
}
// GET_REAL_FN is defined as:
#define GET_REAL_FN(name, rtn, params...) \
typedef rtn (*real_fn_t)(params); \
static const real_fn_t REAL_FN = (real_fn_t) dlsym(RTLD_NEXT, #name); \
assert(REAL_FN != NULL) // semicolon absence intentional
And here's the code for __pthread_cond_wait in glibc 2.31 (this is the function that gets called if you call pthread_cond_wait normally, it has a different name because of versioning stuff. The stack trace above confirms that this is the function that REAL_FN points to):
int
__pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex)
{
/* clockid is unused when abstime is NULL. */
return __pthread_cond_wait_common (cond, mutex, 0, NULL);
}
As you can see, neither of these functions modifies cond, yet it is not the same in the two frames. Examining the two different pointers in a core dump shows that they point to different contents, as well. I can also see in the core dump that cond does not appear to change in my wrapper function (i.e. it's still equal to 0x5... in frame 9 at the crash point, which is the call to REAL_FN). I can't really tell which pointer is correct by looking at their contents, but I'd assume it's the one passed in to my wrapper from the target application. Both pointers point to valid segments for program data (marked ALLOC, LOAD, HAS_CONTENTS).
My tool is definitely causing the error somehow, the target application runs fine if it is not attached. What am I missing?
UPDATE: Actually, this doesn't appear to be what's causing the error, because calls to my pthread_cond_wait() wrapper succeed many times before the error occurs, and exhibit similar behavior (pointer value changing between frames without explanation) each time. I'm leaving the question open, though, because I still don't understand what's going on here and I'd like to learn.
UPDATE 2: As requested, here's the code for tracer.add_event():
// add an event to the calling thread's history
// hist_entry ctor gets timestamp & stack trace
void tracer::add_event(event e, size_t obj_addr) {
size_t tid = get_tid();
hist_map::iterator hist = histories.contains(tid);
assert(hist != histories.end());
hist_entry ev (e, obj_addr);
hist->second.push_back(ev);
}
// hist_entry ctor:
hist_entry::hist_entry(event e, size_t obj_addr) :
ts(chrono::steady_clock::now()), ev(e), addr(obj_addr) {
// these are set in the tracer ctor
assert(start_addr && end_addr);
void* buf[TRACE_DEPTH];
int v = backtrace(buf, TRACE_DEPTH);
int a = 0;
// find first frame outside of our own code
while (a < v && start_addr < (size_t) buf[a] &&
end_addr > (size_t) buf[a]) ++a;
// skip requested amount of frames
a += TRACE_SKIP;
if (a >= v) a = v-1;
caller = buf[a];
}
histories is a lock-free concurrent hashmap from libcds (mapping tid->per-thread vectors of hist_entry), and its iterators are guaranteed to be thread-safe as well. GNU docs say backtrace() is thread-safe, and there's no data races mentioned in the CPP docs for steady_clock::now(). get_tid() just calls pthread_self() using the same method as the wrapper functions, and casts its result to size_t.
Hah, figured it out! The issue is that Glibc exposes multiple versions of pthread_cond_wait(), for backwards compatibility. The version I reproduce in my question is the current version, the one we want to call. The version that dlsym() was finding is the backwards-compatible version:
int
__pthread_cond_wait_2_0 (pthread_cond_2_0_t *cond, pthread_mutex_t *mutex)
{
if (cond->cond == NULL)
{
pthread_cond_t *newcond;
newcond = (pthread_cond_t *) calloc (sizeof (pthread_cond_t), 1);
if (newcond == NULL)
return ENOMEM;
if (atomic_compare_and_exchange_bool_acq (&cond->cond, newcond, NULL))
/* Somebody else just initialized the condvar. */
free (newcond);
}
return __pthread_cond_wait (cond->cond, mutex);
}
As you can see, this version tail-calls the current one, which is probably why this took so long to detect: GDB is normally pretty good at detecting frames elided by tail calls, but I'm guessing it didn't detect this one because the functions have the "same" name (and the error doesn't affect the mutex functions because they don't expose multiple versions). This blog post goes into much more detail, coincidentally specifically about pthread_cond_wait(). I stepped through this function many times while debugging and sort of tuned it out, because every call into glibc is wrapped in multiple layers of indirection; I only realized what was going on when I set a breakpoint on the pthread_cond_wait symbol, instead of a line number, and it stopped at this function.
Anyway, this explains the changing pointer phenomenon: what happens is that the old, incorrect function gets called, reinterprets the pthread_cond_t object as a struct containing a pointer to a pthread_cond_t object, allocates a new pthread_cond_t for that pointer, and then passes the newly allocated one to the new, correct function. The frame of the old function gets elided by the tail-call, and to a GDB backtrace after leaving the old function it looks like the correct function gets called directly from my wrapper, with a mysteriously changed argument.
The fix for this was simple: GNU provides the libdl extension dlvsym(), which is like dlsym() but also takes a version string. Looking up pthread_cond_wait with the version string "GLIBC_2.3.2" solves the problem. Note that these version strings do not usually correspond to the current glibc version (e.g. pthread_create()/exit() have the version string "GLIBC_2.2.5"), so they need to be looked up on a per-function basis. The correct string can be determined either by looking at the compat_symbol() or versioned_symbol() macros that are somewhere near the function definition in the glibc source, or by using readelf to see the names of the symbols in the compiled library (mine has "pthread_cond_wait@@GLIBC_2.3.2" and "pthread_cond_wait@@GLIBC_2.2.5").
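A minimal sketch of that lookup, assuming glibc on Linux. lookup_versioned and real_cond_wait are hypothetical helper names, not part of the tracer, and "GLIBC_2.3.2" is the version string quoted above for this machine; it has to be checked per function and per platform.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dlvsym() and RTLD_NEXT are GNU extensions; g++ usually defines this already
#endif
#include <cassert>
#include <dlfcn.h>
#include <pthread.h>

using cond_wait_fn = int (*)(pthread_cond_t*, pthread_mutex_t*);

// Hypothetical helper: look up `name` with an explicit symbol version in the
// libraries that follow the caller in the search order.
// Returns nullptr if no symbol with that name and version exists.
void* lookup_versioned(const char* name, const char* version) {
    return dlvsym(RTLD_NEXT, name, version);
}

// Resolve the current (non-compat) pthread_cond_wait once and cache it.
// The version string is an assumption taken from the readelf output above.
cond_wait_fn real_cond_wait() {
    static const cond_wait_fn fn = reinterpret_cast<cond_wait_fn>(
        lookup_versioned("pthread_cond_wait", "GLIBC_2.3.2"));
    assert(fn != nullptr);
    return fn;
}
```

On older glibc versions this may need to be linked with -ldl.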
I developed a WDM filter driver that sits above the disk driver. I want to send an asynchronous request to write data to the disk. Windows crashes when I delete the writeBuffer memory in the WriteDataIRPCompletion function.
My question is: how can I safely free the writeBuffer memory without crashing?
This is my send request code:
#pragma PAGEDCODE
NTSTATUS WriteToDeviceRoutine() {
PMYDRIVER_WRITE_CONTEXT context = (PMYDRIVER_WRITE_CONTEXT)ExAllocatePool(NonPagedPool,sizeof(PMYDRIVER_WRITE_CONTEXT));
context->writeBuffer = new(NonPagedPool) unsigned char[4096];
PIRP pNewIrp = IoBuildAsynchronousFsdRequest(IRP_MJ_WRITE,
pdx->LowerDeviceObject,
context->writeBuffer,(wroteRecordNodeCount<<SHIFT_BIT),
&startingOffset,NULL);
IoSetCompletionRoutine(pNewIrp,WriteDataIRPCompletion,context,TRUE,TRUE,TRUE);
IoCallDriver(pdx->LowerDeviceObject,pNewIrp);
}
This is my completion routine code:
#pragma LOCKEDCODE
NTSTATUS WriteDataIRPCompletion(IN PDEVICE_OBJECT DeviceObject,IN PIRP driverIrp,IN PVOID Context) {
PMDL mdl,nextMdl;
KdPrint((" WriteDataIRPCompletion \n"));
PMYDRIVER_WRITE_CONTEXT writeContext = (PMYDRIVER_WRITE_CONTEXT) Context;
if(driverIrp->MdlAddress!=NULL){
for(mdl=driverIrp->MdlAddress;mdl!=NULL;mdl = nextMdl) {
nextMdl = mdl->Next;
MmUnlockPages(mdl);
IoFreeMdl(mdl);
KdPrint(("mdl clear\n"));
}
driverIrp->MdlAddress = NULL;
}
delete [] writeContext->writeBuffer;
if(Context)
ExFreePool(Context);
KdPrint(("leave WriteDataIRPCompletion \n"));
return STATUS_CONTINUE_COMPLETION;
}
Your error is in this line:
context = ExAllocatePool(NonPagedPool,sizeof(PMYDRIVER_WRITE_CONTEXT));
which must be
context = ExAllocatePool(NonPagedPool,sizeof(MYDRIVER_WRITE_CONTEXT));
That is, sizeof(MYDRIVER_WRITE_CONTEXT), not sizeof(PMYDRIVER_WRITE_CONTEXT): you are allocating space for a pointer, not for the structure itself.
This goes unnoticed only if MYDRIVER_WRITE_CONTEXT contains the single field writeBuffer and nothing else. Otherwise you write past the end of the allocation (which is only sizeof(PVOID) bytes) and corrupt the pool.
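The difference is easy to see in plain user-mode C++. The sketch below is only an illustration: WriteContext is a hypothetical stand-in for MYDRIVER_WRITE_CONTEXT (its extra fields are assumptions), and std::malloc stands in for ExAllocatePool. Writing sizeof(*ctx) instead of naming a type avoids this class of mistake.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for MYDRIVER_WRITE_CONTEXT; the extra fields are assumptions.
struct WriteContext {
    unsigned char* writeBuffer;
    std::size_t    length;
    long long      offset;
};
using PWriteContext = WriteContext*;

// Size of the pointer type versus size of the structure it points to.
constexpr std::size_t kPointerSize = sizeof(PWriteContext);
constexpr std::size_t kStructSize  = sizeof(WriteContext);

// Allocate room for the whole structure. sizeof(*ctx) always matches the
// pointed-to type, so it cannot silently become the size of a pointer.
PWriteContext allocate_context() {
    PWriteContext ctx = static_cast<PWriteContext>(std::malloc(sizeof(*ctx)));
    if (ctx) *ctx = WriteContext{};  // zero-initialize all fields
    return ctx;
}
```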
As for completing an IRP built by IoBuildAsynchronousFsdRequest: unfortunately the documentation is not very good. It states that:
Before calling IoFreeIrp, an additional step is required to free the
buffer for an IRP built by IoBuildAsynchronousFsdRequest if the
following are all true:
The buffer was allocated from system memory pool.
but then it focuses only on:
The Irp->MdlAddress field is non-NULL.
However, we must also check for IRP_DEALLOCATE_BUFFER|IRP_BUFFERED_IO; otherwise we can leak Irp->AssociatedIrp.SystemBuffer. The following code is needed:
if (Irp->Flags & IRP_BUFFERED_IO)
{
if (Irp->Flags & IRP_INPUT_OPERATION)
{
if (!NT_ERROR(Irp->IoStatus.Status) && Irp->IoStatus.Information)
{
memcpy( Irp->UserBuffer, Irp->AssociatedIrp.SystemBuffer, Irp->IoStatus.Information );
}
}
if (Irp->Flags & IRP_DEALLOCATE_BUFFER)
{
ExFreePool(Irp->AssociatedIrp.SystemBuffer);
Irp->AssociatedIrp.SystemBuffer = 0;
}
Irp->Flags &= ~(IRP_DEALLOCATE_BUFFER|IRP_BUFFERED_IO);
}
Also, checking if (Context) after you have already used writeContext->writeBuffer is pointless. What you really need is to check context != NULL back in WriteToDeviceRoutine().
I'm not too familiar with the specifics of what you're working with, so here're a few details that caught my attention.
In WriteDataIRPCompletion function
PMYDRIVER_WRITE_CONTEXT writeContext = (PMYDRIVER_WRITE_CONTEXT) Context;
// ...
delete [] writeContext->writeBuffer;
if(Context)
ExFreePool(Context);
Notice that your writeContext originates from your Context argument. However, you seem to be deleting/freeing the allocated memory twice.
The ExFreePool function docs state:
Specifies the address of the block of pool memory being deallocated.
It looks like the delete [] writeContext->writeBuffer; line might be causing the problem and it just needs to be removed.
As it is right now, part of the memory that should be freed by the function has already been manually deleted by the time you invoke ExFreePool, but not set to NULL, which in turn causes ExFreePool to receive a now-invalid pointer (i.e. a non-null pointer pointing to de-allocated memory) in its Context argument, causing the crash.
In WriteToDeviceRoutine function
The documentation for ExFreePool explicitly states that it deallocates memory that has been allocated with other functions, such as ExAllocatePool and other friends.
However, your code is trying to allocate/deallocate the writeContext->writeBuffer directly using the new/delete operators respectively. It seems like you should be allocating your memory with ExAllocatePool and then deallocating with ExFreePool instead of trying to do things manually like that.
These functions may be organizing the memory in a specific way and if/when this pre-condition is not met in ExFreePool, it could end up in a crash.
On a separate note, it seems odd that you check if(Context) is null before invoking ExFreePool, but not above before you try to type-cast for your local writeContext variable and use it.
Maybe you should also check at that first point of use? If Context is always non-null, then the check might also be unnecessary prior to invoking ExFreePool.
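The two points above (pair each allocation with its matching release function, and check the pointer before its first use) can be sketched in user-mode C++. This is an analogy under stated assumptions, not driver code: std::malloc/std::free stand in for ExAllocatePool/ExFreePool, and WriteContext, make_context and release_context are hypothetical names. The buffer and the context are two separate allocations, each released exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical context holding one separately allocated buffer.
struct WriteContext {
    unsigned char* writeBuffer;
};

// Allocate the context and its buffer with the same allocator family.
// Returns nullptr (and leaks nothing) if either allocation fails.
WriteContext* make_context(std::size_t bufferSize) {
    WriteContext* ctx = static_cast<WriteContext*>(std::malloc(sizeof(*ctx)));
    if (!ctx) return nullptr;
    ctx->writeBuffer = static_cast<unsigned char*>(std::malloc(bufferSize));
    if (!ctx->writeBuffer) {
        std::free(ctx);
        return nullptr;
    }
    return ctx;
}

// Completion-style cleanup: validate the pointer BEFORE dereferencing it,
// then release the inner buffer and finally the context itself.
bool release_context(void* context) {
    if (!context) return false;
    WriteContext* ctx = static_cast<WriteContext*>(context);
    std::free(ctx->writeBuffer);
    ctx->writeBuffer = nullptr;
    std::free(ctx);
    return true;
}
```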
I have overloaded operator new, but unfortunately I have never been able to get the global handler that requests more memory to execute with my compiler. I also don't understand, in the code snippet below, how invoking the global handler for more memory is going to get memory allocated to p.
I would appreciate it if anybody could shed some light on this.
void * Pool:: operator new ( size_t size ) throw( const char *)
{
int n=0;
while(1)
{
void *p = malloc (100000000L);
if(p==0)
{
new_handler ghd= set_new_handler(0);//deinstall current handler
set_new_handler(ghd);// install global handler for more memory access
if(ghd)
(*ghd)();
else
throw "out of memory exception";
}
else
{
return p;
}
}
}
To have any effect, some other part of the program must have installed a global handler previously. That handler must also have some kind of memory to release when the handler is called (perhaps some buffers or cache that can be discarded).
The default new_handler is just a null pointer, so your code is very likely to end up throwing an exception.
Also, I would have thrown a bad_alloc exception to be consistent with other operator new overloads.
Here are two things to discuss, the first is using new_handler, the second is overloading operator new.
set_new_handler()
When you want to use a new_handler, you have to register it. This is typically the first thing to do after entering main(). You also have to provide the handler yourself.
#include <cstdlib>
#include <iostream>
#include <new>
// Handler called by operator new when an allocation fails.
void noMemory()
{
    std::cout << "no memory" << std::endl;
    std::exit(-1);
}
int main()
{
    std::set_new_handler(noMemory);
    // if this allocation fails, noMemory() will be called
    char *c = new char[100000000L];
    std::cout << "end" << std::endl;
    delete [] c;
}
When no memory can be allocated, your registered handler will be called, and you have the chance to free up some memory. When the handler returns, operator new will give another try to allocate the amount of memory you requested.
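As an illustration of a handler that actually frees something, here is a minimal sketch that keeps an emergency reserve and gives it back on the first allocation failure. The names g_reserve, release_reserve and install_reserve_handler are hypothetical, and the 1 MiB reserve size is an arbitrary assumption.

```cpp
#include <cassert>
#include <new>

// Hypothetical emergency reserve that the handler can give back.
char* g_reserve = nullptr;

// new_handler: free the reserve so that the retry inside operator new can succeed.
void release_reserve() {
    delete[] g_reserve;
    g_reserve = nullptr;
    // Nothing is left to free next time, so uninstall the handler; a further
    // failure then makes operator new throw std::bad_alloc instead of looping.
    std::set_new_handler(nullptr);
}

// Call once early in the program, while memory is still plentiful.
void install_reserve_handler() {
    g_reserve = new char[1024 * 1024];
    std::set_new_handler(release_reserve);
}
```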
operator new
The structure of the default operator new is similar to what you presented. From the point of view of the new_handler, the important part is the while(1) loop, since it is responsible for trying to get memory again after the new_handler has been called.
There are two ways out of this while(1) loop:
getting a valid pointer
throwing an exception
You have to keep this in mind when you provide a new_handler: if you cannot do anything to free up memory, you should deinstall the handler (or terminate, or throw an exception); otherwise you can get stuck in an endless loop.
I guess ignoring the size parameter in your code is just for testing purposes.
Also see Scott Meyers' Effective C++ Item 7 for details. Since operator new must return a valid pointer even when size is 0, the first thing to do in your operator new should be to set size to 1 if the caller asks for 0 bytes. This trick is simple and fairly effective.
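Putting these points together, a class-specific operator new could be sketched as follows. This is one possible shape under stated assumptions, not the asker's final code: it assumes C++11 (for std::get_new_handler), uses malloc/free as the underlying allocator, and the Pool member value exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Pool {
    static void* operator new(std::size_t size);
    static void  operator delete(void* p) noexcept { std::free(p); }
    int value = 0;  // illustrative member
};

void* Pool::operator new(std::size_t size) {
    if (size == 0) size = 1;  // a zero-byte request must still yield a valid pointer
    for (;;) {
        if (void* p = std::malloc(size)) return p;          // way out 1: success
        std::new_handler handler = std::get_new_handler();  // currently installed handler
        if (!handler) throw std::bad_alloc();               // way out 2: exception
        handler();  // the handler may free memory; then the loop retries
    }
}
```

Usage is unchanged for callers: new Pool goes through Pool::operator new, and delete goes through the matching operator delete.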
I have this very annoying issue whenever I call this function:
void renderGame::renderMovingBlock(movingBlock* blockToRender){
sf::Shape blockPolygon;
sf::Shape blockLine = sf::Shape::Line(blockToRender->getLineBegin().x,blockToRender->getLineBegin().y,blockToRender->getLineEnd().x,blockToRender->getLineEnd().y, 3.f,movingBlockLineColor);
for(auto i = blockToRender->getVertexArray()->begin(); i!=blockToRender->getVertexArray()->end(); ++i){
blockPolygon.AddPoint(i->x, i->y, movingBlockBlockColor);
}
renderToWindow->Draw(blockLine);
renderToWindow->Draw(blockPolygon);
}
It is a simple function: it takes a pointer to an object and uses SFML to render it on the screen. The object is a simple polygon that moves on a rail.
getVertexArray() returns a pointer to the object's vector of vertices, and renderToWindow is a pointer to an sf::RenderWindow.
The very weird issue is that when I call this function, it never returns; VC++ breaks and points me to:
int __cdecl atexit (
_PVFV func
)
{
return (_onexit((_onexit_t)func) == NULL) ? -1 : 0;
}
I'm getting weird behavior here: I can stop this function right before it exits by calling the Display() function and system("pause"), and it'll display everything perfectly fine, but one step further and it breaks.
I'll add that I'm passing a dynamically allocated object; when I pass a regular one, everything's fine. It's weird: when I debug the program, the polygon and line have the right coordinates and everything displays properly, but it just can't return from the function.
If a function will not return, it sounds like you corrupted the stack somewhere earlier; most likely through an out-of-bounds write.
Or, since you end up in atexit, an uncaught exception may have been thrown.
Either way, welcome to the joys of programming: now you have to find an error that probably happens long before your function gets stuck.
You could try a tool like valgrind (if it's available for Windows) or some other bounds checker.
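Short of a dedicated tool, one cheap way to surface an out-of-bounds write is to go through bounds-checked accessors while debugging. The sketch below assumes the data lives in a std::vector; checked_write is a hypothetical helper name. std::vector::at() throws std::out_of_range on a bad index instead of silently overwriting neighbouring memory.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: write through at() so that a bad index is reported
// instead of corrupting whatever happens to sit next to the buffer.
bool checked_write(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;  // index was outside [0, v.size())
    }
}
```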
I am asking this question for general coding guidelines:
class A {
A() { ... throw 0; }
};
A obj; // <---global
int main()
{
}
If the constructor of obj throws in the code above, the program will terminate before main() is called. So my question is: what guideline should I follow for such a scenario? Is it OK to declare global objects of such classes or not? Should I always refrain from doing so, or is it good practice to catch the error right at the beginning?
If you NEED a global instance of an object whose constructor can throw, you could make the variable a function-local static instead:
A * f(){
try {
//lock(mutex); -> as Praetorian points out
static A a;
//unlock(mutex);
return &a;
}
catch (...){
return NULL;
}
}
int main() {
A * a = f(); //f() can be called whenever you need to access the global
}
This would alleviate the problem caused by a premature exception.
EDIT: Of course, in this case the solution is 90% of the way to being a Singleton. Why not just fully turn it into one, by moving f() into A?
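A sketch of what moving f() into A could look like, assuming C++11 or later (where initialization of a function-local static is thread-safe, so the manual locking is not needed). The attempts and fail_next members are hypothetical and exist only to make the behaviour observable. If the constructor throws, the static is not considered initialized, so construction is attempted again on the next call.

```cpp
#include <cassert>
#include <stdexcept>

class A {
public:
    // Returns the single instance, or nullptr if construction failed this time.
    static A* instance() noexcept {
        try {
            static A a;  // constructed on first successful pass through here
            return &a;
        } catch (...) {
            return nullptr;
        }
    }
    static int  attempts;   // hypothetical: counts constructor calls
    static bool fail_next;  // hypothetical: forces the constructor to throw
private:
    A() {
        ++attempts;
        if (fail_next) throw std::runtime_error("A: initialization failed");
    }
};
int  A::attempts  = 0;
bool A::fail_next = true;
```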
No, you should not declare such objects global - any exception will be unhandled and very hard to diagnose. The program will just crash which means that it will have very poor (below zero) user experience and will be rather hard to maintain.
As @Kerrek SB has mentioned in the comments, the answer to this is dependent on the reasons that can cause your class to throw. If you're trying to acquire a system resource that might be unavailable, I feel you shouldn't declare a global object. Your program will crash as soon as the user tries to run it; needless to say, that doesn't look very good. If it can throw a std::bad_alloc or some such exception that is unlikely under normal circumstances (assuming you're not trying to allocate a few GB of memory) you could make a global instance; however, I would still not do that.
Instead, you could declare a global pointer to the object, instantiate the object right at the beginning of main (before any threads have been spawned etc.) and point the pointer to this instance, then access it through the pointer. This gives your program a chance to handle exceptions, and maybe prompt the user to take some sort of remedial measures (like popping up a Retry button to try and reacquire the resource, for instance).
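A minimal sketch of that approach, assuming C++14 for std::make_unique. The names g_a and init_global_a and the resource_available flag are hypothetical; a real constructor would try to acquire the actual resource.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical class whose constructor throws when its resource is missing.
struct A {
    explicit A(bool resource_available) {
        if (!resource_available) throw std::runtime_error("resource unavailable");
    }
};

// Global pointer: empty until init_global_a() succeeds.
std::unique_ptr<A> g_a;

// Call at the very start of main(), before any threads are spawned.
// Returns false instead of letting the exception escape, so the caller
// can report the problem or offer a retry.
bool init_global_a(bool resource_available) {
    try {
        g_a = std::make_unique<A>(resource_available);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

main() can then call init_global_a() in a loop, prompting the user to retry until it succeeds or they give up.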
Declaring a global object is fine, but the class as shown is too sketchy to judge; it lacks the detail needed to relate it to practical needs and use.
One solution no one seems to have mentioned is to use a function try
block. Basically, if the situation is that without the constructed
object, the rest of your program won't work or be able to do anything
useful, then the only real problem is that your user will get some sort
of incomprehensible error message if the constructor terminates with an
exception. So you wrap the constructor in a function try block, and
generate a comprehensible message, followed by an error return:
A::A() try
: var1( initVar1 )
// ...
{
// Additional initialization code...
} catch ( std::exception const& ) {
std::cerr << "..." << std::endl;
exit(EXIT_FAILURE);
} catch (...) {
std::cerr << "Unknown error initializing A" << std::endl;
exit(EXIT_FAILURE);
}
This solution is really only appropriate, however, if all instances of
the object are declared statically, or if you can isolate a single
constructor for the static instances; for the non-static instances, it
is probably better to propagate the exception.
As @J T said, you can write it like this:
struct S {
S() noexcept(false);
};
S &globalS() {
try {
static S s;
return s;
} catch (...) {
// Handle error, perhaps by logging it and gracefully terminating the application.
}
// Unreachable.
}
Such a scenario is quite a problem; please read "ERR58-CPP. Handle all exceptions thrown before main() begins executing" for more detail.