C++ - linking to 3rd party DLL - intermittent access violation

C++ - linking to 3rd party DLL - intermittent access violation - c++

I have been provided with a C++ DLL and associated header file in order to integrate it with my application. To begin with, I am simply trying to call the DLL from a simple Win32 console application (I'm using Visual Studio 2008 Express).
I've linked the DLL by specifying it as an additional dependency in the project settings.
The interface (i.e. the only exported function) simply returns a pointer to an instance of the Class that I actually need to call. I can successfully call this, get the pointer and call the first function that I need to (an "init" function).
When I come to call the function that actually does the processing I require, I'm intermittently getting a "0xC0000005: Access violation reading location...." error. That is, I run the program - it works successfully and exits - I try to run it again (changing nothing - all parameters are hard coded) and get the error (and continue to do so).
I can't consistently recreate the problem but I'm beginning to think that it may be something to do with the DLL not being unloaded properly - after getting the error on one occasion I tried deleting the DLL and was told by Windows that it was in use. That said, on another occasion I was able to delete the DLL after getting the error, copy it back, then still got the error on the next run.
Should the DLL be correctly unloaded when my .exe finishes? Would I be better off trying to explicitly load/unload the DLL rather than doing it implicitly?
Any other help or advice greatly appreciated.

It won't have anything to do with the DLL being unloaded; different processes using the same DLL do not share any state. Also, the DLL will be unloaded when the process exits; perhaps not gracefully, but it will be unloaded.
I can think of two likely reasons for an intermittent failure.
Most likely, the DLL has a race condition. This could be one that's exposed if the DLL has been cached, causing the timing to change. That would explain why your first run didn't fail but subsequent ones did.
I think it's also possible that the DLL didn't release its lock on some file. If there is some file you know is accessed by this DLL, try checking to see if that file is locked after the process ends.
Also, get this in a debugger. Turn on first-chance exceptions in visual studio and look at the callstack where the AV is happening, and post it here.

What types are involved in the classes exported by the DLL? We have often had these kinds of problems with Visual Studio when the classes make use of STL - I'd guess that any template usage would probably be a way to potentially cause this sort of problem.

What's your setting for "Runtime library" under "C/C++" in the configuration properties?
You should probably try /MD (or /MDd for debugging).
see http://msdn.microsoft.com/en-us/library/2kzt1wy3(VS.71).aspx

Related

Could imported DLL functions be calling the wrong function versions if the exe statically links to a different msvcr?

I have a strange issue that I am trying to work out for someone. I don't have any access to the code. There is a program that loads a DLL and has somewhat of a plugin framework. They provide virtually no documentation beyond how to import functions from the DLL and what calling convention to use for exports.
This person's plugin imports functions from a DLL (let's assume they used the proper calling conventions and imported properly). It periodically runs into access violations (usually access violation write/read from 0x0000000). Sometimes, it crashes the program and Event Viewer shows exception code 0xc0000005 (another access violation) with faulting module SHLWAPI.dll.
Using depends, I have determined that the program is statically linked to msvcr. I found that the plugin DLL dynamically links to msvcr120.dll.
Yes, I am aware that this is just asking for trouble and the access violations are no surprise, but unfortunately, I have to deal with someone else's problem.
Anyway, my question is this:
Let's say is imported from this DLL and inside is a call to a function that is provided by msvcr120. When the program calls the imported , is it possible that it is calling from the msvcr it is statically linked to rather than from msvcr120?
I realize that it probably depends on the main program's plugin framework, but general feedback would be appreciated.
Thanks in advance!

There are known issues when using multiple copies of the CRT in one program, even when they all use the same version of the CRT (see Potential Errors Passing CRT Objects Across DLL Boundaries). If the CRTs are different versions, there are lots of other problems due to different size or layout of internal structures.
Since the program you use statically links with the CRT, it can not reliably be plugged in to. The anti-debugger code is just plain silly; there are several ways around it. If you paid for it send it back and demand a refund.

MSVC Order in which DLLs are linked

I debug an application which links against two DLLs. When an object from one of these DLLs is instantiated the application segfaults. However, when the order of the .lib files in (VS2010) Linker->Input->Additional Dependencies is swapped then the application runs fine.
This workaround works for now but I still want to understand what caused the problem. Any hints, how can I further debug this?

While without much more information any answer is bound to be a speculation, one potential reason is the following:
in Windows, every DLL has DLLMain() function
as soon as you have two DLLs, they will have two DllMain() function calls
if there is a dependency between the two (i.e. one DLLMain() implicitly relies on an object being initialized within another one) - you have a problem.
I've seen such a problem myself (which is obviously a bad programming practice, but it does happen). However, a number of other explanations may also exist.

The way to debug this would be the way to debug any crash: start by inspecting the stack at the catch of the segfault. If the crash reproduces at debug builds, understanding the direct reason should be straightforward (the root cause usually takes some deeper investigation).
Not much can be said about the cause of the crash without viewing this stack, but it is reasonable to assume that a global resource is involved. A named event, common temporary file, perhaps some internal framework structure (which, if any, frameworks are involved in your code?) - but it must be something that crosses the individual DLL boundaries, and so is probably a resource at application scope.

Breaking as the code execution enters the Dll or Lib space

I'm debugging a Dll used by OS mechanisms using microsoft visual studio(In my case it is a minidriver).
I want the execution to break as it enters the dll code I want it because I don't know when base smart card CSP calls which function.
Is there any means to do it instead of having breakpoints in all exported functions?
If not, do you feel this feature is necessary?

So I'm not sure what you want to achieve - you need the program to break without setting breakpoints, or you need just an information into which function/method from the dll program will enter first? I think you can accomplish that with timeline profiling using for example dotTrace: https://www.jetbrains.com/profiler/whatsnew/

Executable suddenly stopped working: silent exit, no errors, no nothing

I am facing a rather peculiar issue: I have a Qt C++ application that used to work fine. Now, suddenly I cannot start it anymore. No error is thrown, no nothing.
Some more information:
Last line of output when application is started in debug mode with Visual Studio 2012:
The program '[4456] App.exe' has exited with code -1 (0xffffffff).
Actual application code (= first line in main()) is never called or at least no breakpoints are triggered, so debugging is not possible.
The executable process for a few seconds appears in the process list and then disappears again.
Win 7 x64 with latest Windows updates.
The issues simultaneously appeared on two separate machines.
Application was originally built with Qt 5.2.1. Today I test-wise switched to Qt 5.4.1. But as expected no change.
No changes to source code were made. The issue also applies to existing builds of the application.
Running DependencyWalker did not yield anything of interest from my point of view.
I am flat out of ideas. Any pointers on what to try or look at? How can an executable suddenly stop working at all with no error?

I eventually found the reason for this behavior...sort of. The coding (e. g. my singletons) were never the problem (as I expected since the code always worked). Instead an external library (SAP RFC SDK) caused the troubles.
This library depends on the ICU Unicode libraries and apparently specific versions at that. Since I wasn't aware of that fact, I only had the ICU libraries that my currently used Qt version needs in my application directory. The ICU libraries for the SAP RFC SDK must have been loaded from a standard windows path so far.
In the end some software changes (Windows updates, manual application uninstalls, etc.) must have removed those libraries which resulted in that described silent fail. Simply copying the required ICU library version DLLs into my application folder, solved the issue.
The only thing I am not quite sure about, is why this was not visible when tracing the loaded DLLs via DependencyWalker.

"Actual application code (= first line in main()) is never called. So debugging is not possible."
You probably have some static storage initialization failing, that's applied before main() is called.
Do you use any interdependent singletons in your code? Consolidate them to a single singleton if so (remember, there shouldn't be more than one singleton).
Also note, debugging still is possible well for such situation, the trap is ,- for such case as described in my answer -, main()'s bodies' first line is set as the first break point as default, when you start up your program in the debugger.
Nothing hinders you to set breakpoints, that are hit before starting up the code reaches main() actually.
As for your clarification from comments:
"I do use a few singletons ..."
As mentioned above, if you are really sure you need to use a singleton, use actually a single one.
Otherwise you may end up, struggling with undefined order of initialization of static storage.
Anyway, it doesn't matter that much if static storage data depends on each other, provide a single access point to it throughout your code, to avoid cluttering it with heavy coupling to a variety of instances.
Coupling with a single instance, makes it easier to refactor the code to go with an interface, if it turns out singleton wasn't one.

C++ - using 'new' operator cannot be stepped into during debug

COM+ application, building with MS Visual Studio 6, SP 6 on Windows XP SP 3, and debugging remotely.
My main question is this; why would I not be able to step into 'new' if I can step into 'delete'? I'm mostly looking for ideas as to what I should look into.
I'm working on a fairly large project that I'm only just getting familiar with. The current issue is a heap corruption problem in which a release build will eventually exhaust its working set and crash. The problem is so pervasive that the following code will corrupt the heap:
int * iArray = new int [100];
delete [] iArray;
I say 'corrupt the heap' because the debug output displays "heap[dllhost.exe]: invalid address specified to rtlvalidateheap" on the 'delete'.
I can step into the 'delete' call fine and it seems to be calling the proper one (located in DELOP.cpp in ...\Microsoft Visual Studio\VC98\CRT\SRC) but, for whatever reason, I cannot step into any call to 'new'. I'm grasping at straws here but I have a feeling somewhere in the code base someone overrode the 'new' operator and the code I'm looking at is unintentionally using it. The symptoms of the release build seem like memory is being allocated to one heap and attempted to be deleted from another, at least that's my hunch.
EDIT:
Ack! Sorry everyone, I posted too soon, should have given more info.
I've searched the code base and found a few overrides but they are all in classes, not globally defined. The only one that isn't in a class is the following:
struct _new_selector
{
};
inline void* operator new(size_t, void *ptr, _new_selector)
{
return (ptr);
}
But that's a placement new and I'm pretty sure it doesn't count in this situation. What is the library that I should be stepping into for the original 'new'? I'm guessing it's the same as the 'delete' but if not, maybe I just don't have debug info for it?
EDIT 2:
Messing around with this I've found that on a debug build this issue does not exist. I've already looked into the run-time libraries, it's using /MD and /MDd for debug. Not only that, but I've built the release version with /MDd jut to make sure and there was still no change. Looking at the maps of both a debug build and a release build the new operator (with it's mangling) is at the following:
Release:
0001:00061306 ??2#YAPAXI#Z 10062306 f MSVCRTD:MSVCRTD.dll
0002:00000298 _imp??2#YAPAXI#Z 1006e298 MSVCRTD:MSVCRTD.dll
Debug:
0001:00077d06 ??2#YAPAXI#Z 10078d06 f MSVCRTD:MSVCRTD.dll
0004:00000ad4 _imp??2#YAPAXI#Z 100b5ad4 MSVCRTD:MSVCRTD.dll
I also checked the delete operator:
Release:
0001:000611f0 ??3#YAXPAX#Z 100621f0 f msvcprtd:delop_s.obj
Debug:
0001:00077bf0 ??3#YAXPAX#Z 10078bf0 f msvcprtd:delop_s.obj
Also, and I don't have a print out of them but I can get it if it would help, the disassembly of the new operator looks the same in release and debug. So I guess it's not an override? Would an inline override of an operator make this untrue?
Also, being a COM+ application which spawns multiple dllhost.exe processes, would it be possible for a call to the new operator go to another DLL, or the exe, and a call to delete go to the opposite?

Going with your hunch that there is possibly an overloaded new somewhere in the code, few things you can check
the disassembly of the code and find out the library name in that file, generally there is something in the assembly that will give you a hint
If you are not able to find out the library name, check the address in the assembly that you are entering. Then in the debug output window, check the load addresses for the various libraries - that could give you a clue as to which library to check
If the above doesn't help, check if you can generate a map file for the complete project. If you can, then you can look up the address in the map file and that might help
Try to use the debug version of the runtime libraries. Can't recollect what was the option that will turn on the debug_malloc. That can help you figure out what is happening on the heap
The community can add a few more that I might have missed. And finally, if you do crack the problem, please share how you did so. Either here or as a link to your blog. Working on heap problems for a large project is generally not easy and we all can learn a new trick or two.

Ok, so here is what it ended up being.
The program I'm working on has about thirty or so projects within it. Some create libs, others dlls, still others exes. In any case, it's all intermingled. Add into that the fact that we use ATL and COM and it starts getting crazy pretty quick. The end result is that some, not all, of the projects were being built with the _ATL_MIN_CRT compiler definition, even though this is a desktop application, not a web based one in which a client would need to download a few modules.
Here is some info on _ATL_MIN_CRT:
• http://support.microsoft.com/default.aspx?scid=kb;EN-US;q166480
• http://msdn.microsoft.com/en-us/library/y3s1z4aw%28v=vs.80%29.aspx
Notice, from the first link, that this also will eliminate the use of memory allocation routines. I’m unsure what the original motive behind using this was, or if it was really intentional, but it certainly causes issues with allocating memory. Also, this only affects Release builds, hence why it was such a pain to find.
Basically, module A was built with _ATL_MIN_CRT and module B was built without it but had a dependency to module A. Since this is also using COM and everything was run in dllhost.exe, when module B tried to use the new operator it seems to have gone out of its dll to attempt to allocate memory on the heap. Then, when calling delete, it tried to delete it within its dll. Thus, we have a crazy memory leak.
Mind, removing _ATL_MIN_CRT fixes this but what I mention above is only my understanding of it. It may very well be more/less complicated but, regardless, this was the issue.
Thanks to everyone's suggestions, they really did help me find this thing!

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js