Ignore globally overridden new/delete - c++

Hi I'm using a library that has globally overridden new/delete. But I have a problem with this library, the problem is that it has to be manually initialized in the main function.
Now I'm trying to use another library that initializes a few functions before main is called, unfortunately this library uses new within these functions. So I get errors because the memory manager that uses the overridden new/delete keywords are not initialized yet.
I'd really like to use the default memory manager because I want to add unit testing to this library. It would not make much sense to use the memory used my the library I want to test also used by my Unit Testing library.
So my question is if it's possible to ignore global overridden new/delete when including the second library and just use the default new/delete?
I'm using visual studio 2010 on Windows 7 with the standard C++ compiler.

I do not think this is possible without modifying the library itself. I guess it is about a static library (inside a dll, the overridden new/delete will be pointed by the functions inside the dll.)
You can remove an obj file from a static library by using the command (Visual command prompt):
LIB /REMOVE:obj_to_remove /OUT:removed.lib input.lib
To find out what obj to remove, first run:
DUMPBIN /ARCHIVEMEMBERS input.lib
You will see lines such as
Archive member name at 14286: /0 compilation.dir\objfile1.obj
14286 'identifies' the obj file. To see where each symbol is, run:
DUMPBIN /LINKERMEMBER:1 input.lib > members.txt
and look for the new/delete. members.txt will contains the mangled names of each symbols and an identifier of the obj in which this symbol is. For instance
14286 ?_Rank#?$_Arithmetic_traits#C#std##2HB
The 14286 is telling you the obj 'identifier' in which the symbol lies. If you have trouble finding new/delete, you can run:
DUMPBIN /SYMBOLS input.lib > sym.txt
which will flush into sym.txt the mangled and unmangled names for each symbol.
At the end, remove the obj file with the LIB command above by replacing obj_to_remove by compilation.dir\objfile1.obj in our example, and link against removed.lib.
Now, if you are not lucky, other symbols you need may be in the same object file as the new/delete. In that case, you can "hack" the lib using something like this (say renaming new to dew and delete to nelete.)

Can you place the memory manager's initialization out of main into a shared lib ?
If this is possible you could try forcing the initialization order of your libraries (by dependency) to load (and initialize) the memory manager before anything else is loaded. This is however a very brittle (or specific) solution since it'll be dependent on platform specific workarounds for forcing the order in which shared libs are initialized.

if overriding is done with a macro you may use
#pragma push_macro ("new")
#undef new
...code with standard new here ...
#pragma pop_macro ("new")
If it is really done by function override then you may temporarily construct a "new" macro yourself that calls a function with another name placed elsewhere which just calls the standard function.
Macros are resolved before function calls.

Related

How to detect multiply defined symbols

I have a common scenario:
Some source files for an executable which depend on some standard libraries and some user libraries. All the user libraries are statically linked into the executable whereas the standard libraries are linked dynamically to it.
Problem:
I believe that I have multiply defined symbols in my complete package (executable which already includes the user library code + shared standard libraries). The linker obviously has insight into it, but as I understand the linker won't complain unless it encounters multiple strong named symbols. My fear is that, while I am moving my code from solaris 8/sparc platform to solaris10/sparc platform, some standard unix functions have been implemented in the user libraries which are causing the app to crash at runtime. Note that the app runs fine in solaris 8/sparc platform
I have been facing weird issues which have led me to believe this might be the source
Modifying one variable from one library is changing the value of another variable in another library
Solaris 8-10: host2ip conversion problems
What I need:
Is there a way to easily list all multiply defined symbols?
Is there a way to easily list all multiply defined symbols stemming from user libraries?
Do you guys think the issue #1 might be caused by linking issues, or you feel it might be a sign of some other issue?
Edit1:
Since then I know that on generating map file using ld, it has a section of multiply defined symbol which I am going through to find names that look like standard library call. For people who do not know, the linker will only fail to link if it finds multiple symbols with the same name AND the names are strong names.
You could turn on MAP file generation in the compiler (actually linker) settings and look through the map file for symbols that match the UNIX system functions you are concerned about. You'd probably have to write a script to automate it, but this would be a good starting point. The command line switch is probably -map or something similar, it will depend on which compiler/linker you are using.
The actual problem that was happening is:
The library (let's call it lib1) had an array like below
#define ARRAY_SIZE 1024
SomeStruct* global_array[ARRAY_SIZE];
This array is used by my another library (let's call it lib2) which in turn is used by my application using an extern declaration for it.
While compiling lib2 (or is it the app not sure), we did not define ARRAY_SIZE at all. This somehow caused the compiler of lib2 (or the app) to miscalculate the size of global_array in-turn causing it to allocate the memory for some other variable at a location which was already allocated to the global_array.
By defining ARRAY_SIZE again while compiling my libs and apps, everything starts behaving normal. I do not fully understand what caused the issue and why it gets resolved since extern declaration of arrays do not contain the size. Also, if the library really used the MACRO ARRAY_SIZE, then why wouldn't the compilation fail? Also, there is a possibility, that the name used for the define is a standard name (the actual string was FD_SETSIZE)
My initial gut feeling about the linker was wrong.

C++ compilation static variable and shared objects

Description :
a. Class X contains a static private data member ptr and static public function member getptr()/setptr().
In X.cpp, the ptr is set to NULL.
b. libXYZ.so (shared object) contains the object of class X (i.e libXYZ.so contains X.o).
c. libVWX.so (shared object) contains the object of class X (i.e libVWX.so contains X.o).
d. Executable a.exe contains X.cpp as part of translation units and finally is linked to libXYZ.so, libVWX.so
PS:
1. There are no user namespaces involved in any of the classes.
2. The libraries and executable contain many other classes also.
3. no dlopen() has been done. All libraries are linked during compile time using -L and -l flags.
Problem Statement:
When compiling and linking a.exe with other libraries (i.e libXYZ.so and libVWX.so), I expected a linker error (conflict/occurance of same symbol multiple times) but did not get one.
When the program was executed - the behavior was strange in SUSE 10 Linux and HP-UX 11 IA64.
In Linux, when execution flow was pushed across all the objects in different libraries, the effect was registered in only one copy of X.
In HPUX, when execution flow was pushed across all the objects in different libraries, the effect was registered in 3 differnt copies of X (2 belonging to each libraries and 1 for executable)
PS : I mean during running the program, the flow did passed thourgh multiple objects belonging to a.exe, libXYZ.so and libVWX.so) which interacted with static pointer belonging to X.
Question:
Is Expecting linker error not correct? Since two compilers passed through compilation silently, May be there is a standard rule in case of this type of scenario which I am missing. If so, Please let me know the same.
How does the compiler (gcc in Linux and aCC in HPUX) decide how many copies of X to keep in the final executable and refer them in such scenarios.
Is there any flag supported by gcc and aCC which will warn/stop compilation to the users in these kind of scenarios?
Thanks for your help in advance.
I'm not too sure that I've completely understood the scenario. However,
the default behavior on loading dynamic objects under Linux (and other
Unices) is to make all symbols in the library available, and to only use
the first encountered. Thus, if you both libXYZ.so and libVWX.so
contain a symbol X::ourData, it is not an error; if you load them in
that order, libVWX.so will use the X::ourData from libXYZ.so,
instead of its own. Logically, this is a lot like a template definition
in a header: the compiler chooses one, more or less by chance, and if
any of the definitions is not the same as all of the others, it's
undefined behavior. This behavior can be
overridden by passing the flag RTLD_LOCAL to dlopen.
With regards to your questions:
The linker is simply implementing the default behavior of dlopen (that which you get when the system loads the library implicitely). Thus, no error (but the logical equivalent of undefined behavior if any of the definitions isn't the same).
The compiler doesn't decide. The decision is made when the .so is loaded, depending on whether you specify RTLD_GLOBAL or RTLD_LOCAL when calling dlopen. When the runtime calls dlopen implicitly, to resolve a dependency, it will use RTLD_GLOBAL if this occurs when loading the main executable, and what ever was used to load the library when the dependency comes from a library. (This means, of course, that RTLD_GLOBAL will propagate until you invoke dlopen explicitly.)
The function is "public static", so I assume it's OOP-meaning of "static" (does not need instance), not C meaning of static (file-static; local to compilation unit). Therefore the functions are extern.
Now in Linux you have explicit right to override library symbols, both using another library or in the executable. All extern symbols in libraries are resolved using the global offset table, even the one the library actually defines itself. And while functions defined in the executable are normally not resolved like this, but the linker notices the symbols will get to the symbol table from the libraries and puts the reference to the executable-defined one there. So the libraries will see the symbol defined in the executable, if you generated it.
This is explicit feature, designed so you can do things like replace memory allocation functions or wrap filesystem operations. HP-UX probably does not have the feature, so each library ends up calling it's own implementation, while any other object that would have the symbol undefined will see one of them.
There is a difference beetween "extern" symbols (which is the default in c++) and "shared libary extern". By default symbols are only "extern" which means the scope of one "link unit" e.g. an executable or a library.
So the expected behaviour would be: no compiler error and every module works with its own copy.
That leads to problems of course in case of inline compiling etc...etc...
To declare a symbol "shared library extern" you have to use a ".def" file or an compiler declaration.
e.g. in visual c++ that would be "_declspec(dllexport)" and "_declspec(dllimport)".
I do not know the declarations for gcc at the moment but I am sure someone does :-)

C++ wrapper DLLs to static LIBs

I have some statically compiled libraries (.lib) that I use in my project, which is written in C++ and built on both Windows and Linux. At my project's entry-point to these libraries, I use just one or two functions from the 'main' library in the static library suite, really (but I'm sure that these functions call many others in the other libraries in the suite).
I would ideally like to instead have a suite of dynamically linked libraries (DLLs) that wraps around each of the libs in the static lib suite; I've read/heard that the way to do this on Windows (e.g., Visual Studio 2005/2008/2010) is to "create a wrapper DLL" with some exposed functions calling the underlying static library functions. I would very much appreciate if someone can give me some detailed step-by-step including possibly some snippets, of how to go about doing this in MS Visual Studio 2005/2008/2010. I am sure some of you may already be doing this on a day-to-day basis; your experience is very much appreciated.
Edit:
For the benefit of others like myself, I am posting the first 'useful' link I found:
http://tom-shelton.net/index.php/2008/12/11/creating-a-managed-wrapper-for-a-lib-file/
"Convert a library to another library type" seems easy, but it is not. There is no straight-forward step-by-step way to do this because C++ and DLLs do not play well together at all, and your code will need to be adapted to support a DLL interface.
A concise way to describe the problem is this:
A .lib's interface is C++
A .dll's interface is C
Thus, a DLL's interface simply doesn't support C++ and you need to be clever to make it work - this is why the ambiguous existing answers.
One standard way is via COM, which means building an entire COM wrapper for the library, complete with class factory, interfaces, objects, and using BSTR instead of std::string. I would guess is not practical.
Another solution is to create a C interface for your C++ library which is DLL-safe. That means basically creating a winapi-style interface, which again is probably not practical or defeats the purpose of using your library at all. This is what #David Heffernan suggests. But what he doesn't address is how you must change your code to be DLL-compatible.
An important but subtle problem is you cannot pass ANY templated C++ objects across DLL boundaries. This means passing std::string in or out of a DLL function is considered unsafe. Each binary gets its own copy of the std::string code, and there's no guarantee that they will happen to play nicely with each other. Each binary (potentially) also gets its own copy of the CRT and you will mess up internal state of one module by manipulating objects from another.
Edit: You can export C++ objects in MSVC using __declspec(dllexport) and importing them using __declspec(dllimport). But there are a lot of restrictions on this and subtleties that cause problems. Basically this is a shortcut for getting the compiler to create a cheap C-style interface for your exported class or function. The problem is it doesn't warn you about how much unsafe stuff is happening. To reiterate:
If there are ANY templated symbols crossing DLL bounds, it is not safe (std::* for example).
Any objects with CRT-managed state should not cross DLL bounds (FILE* for example).
If you do not care about interface adaptation at all, you can export symbols from a static .lib to a .dll fairly easily. The trick is, you do not use Visual Studio GUI or projects at all, but just the linker (link.exe).
With this method, C symbols will remain C symbols and C++ symbols will remain C++ symbols. If you need to change that, you need to write wrapper code (e.g. extern C interfaces). This method simply presents existing symbols from the .objs in the .lib as official exports from the DLL.
Assume we have a .lib compiled from a source TestLib.c
#include <stdio.h>
void print(char* str)
{
printf("%s\n", str);
}
int add(int a, int b)
{
return a + b;
}
We compiled this into a static lib TestLib.lib. Now we wish to convert TestLib.lib to TestLibDll.dll (the base name should not be the same or you will get issues with the link output since the linker also creates DLL link .lib). To do this, we use link.exe outside Visual Studio GUI. Launch the "x64 Native Tools Command Prompt for Visual Studio xx" to get a cmd with the toolchain in path. (If you need 32 bit version, use x86 Native Tools instead). Change to the folder with TestLib.lib (e.g x64\Release). Then run:
link /DLL /EXPORT:add /EXPORT:print /OUT:TestLibDll.dll TestLib.lib
This should produce TestLibDll.dll. (The linker may complain a bit about there being no .obj, but you can ignore this.) The exports are:
dumpbin /exports TestLibDll.dll
Microsoft (R) COFF/PE Dumper Version 14.29.30040.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file TestLibDll.dll
File Type: DLL
Section contains the following exports for TestLibDll.dll
00000000 characteristics
FFFFFFFF time date stamp
0.00 version
1 ordinal base
2 number of functions
2 number of names
ordinal hint RVA name
1 0 00001080 add
2 1 00001070 print
We have successfully exported the functions.
In the case there are many functions, using /EXPORT is tedious. Instead make a .def file. For our example, here is TestLibDll.def:
LIBRARY TestLibDll
EXPORTS
print #1
add #2
The linker is then run as
link /DLL /DEF:TestLibDll.def /OUT:TestLibDll.dll TestLib.lib
This example uses x64 C symbols, which makes it straightforward. If you have C++ symbols, you need to provide the mangled version of the symbol in the /EXPORT argument or in the def file. For more complex situations than a single static lib, you may need to provide more link libraries on the command line and/or /LIBPATH args to point to link library folders.
Again, this method is only for exporting symbols verbatim from a static library. I personally used it to create a DLL to be loaded in Python with ctypes for a closed source static library. The advantage is you don't need to write any wrapper code or create any additional VS projects at all.
Note: the accepted answer provides a good discussion of pitfalls regarding C++ DLL interface and why C wrappers are a good idea. I did not focus on that here, only on the mechanics of getting the symbols to be exported to the DLL. Using a C interface to DLL if possible remains good advice.
This was a little big to add as a comment to tenfour's response...
If you want to still maintain a C++ API when using the DLL wrapper, you can put C++ to C conversion functions in the header file. This ensures that only C compatible data types ever cross the DLL boundary.
As an example
//MyDLL.h
class MyDLL {
public:
...
int Add2ToValues(std::vector<int>& someValues) {
int* cValues = new int[someValues.size()];
memcpy(cValues, &someValues[0], someValues.size() * sizeof(int));
int retVal = Add2ToValues_Internal(cValues, someValues.size());
someValues.assign(std::begin(cValues), std::end(cValues));
delete [] cValues;
return retVal;
}
private:
int Add2ToValues_Internal(int* valuesOut, const int numValues);
};
//MyDLL.cpp
int MyDLL::Add2ToValues_Internal(int* values, const int numValues)
{
for(int i = 0; i < numValues; ++i) {
values[i] += 2;
}
return 0;
}
One catch I ran into when doing these wrappers is that you must allocate and deallocate any memory within the header file. Since the header file will be compiled by the application that is using your library, it will use the CRT for whatever compiler you are using to build your application. All interaction with the DLL uses C, so you won't run into any runtime mismatches and all memory is allocated and freed either entirely within the DLL or entirely within the application so you don't have any cross DLL memory management issues either. In the example, I both allocated and deallocated in the header. If you need to allocate data in the _Internal function, you'll need to also add a function that allows you to free that memory within the DLL. Once inside of the _Internal functions, you are free to use as much C++ as you want.

Can a Visual Studio produced static library, be stripped of symbols?

I'll divide this questions in 3 parts:
I would like to produce a static library and strip off its symbols. (Debug info is already not included)
Similar to the strip command in linux. Can it be done?
Is there an equivalent tool in windows env, to the nm tool in linux?
When creating a static library using VS2008. Is it possible to define a script that will exclude some of the produced .obj files out of the build and out of the static lib?
Can it be dynamic? I mean I'd define a compilation mode in the script and this would result in specific object files being excluded from the build
If anything is visible that you feel should not be, try declaring it with the "static" keyword. This tells the compiler that it is accessible only to the current module.
There are cases where it would be convenient to be able to strip out all but a small number of "exported" public symbols, but it's not really feasible.
A static library is little more than a collection of .obj files. The internal dependencies haven't been resolved yet, and they won't be resolved until link time.
For example, if your .lib consists of foo.obj and bar.obj, and there's a call in foo.obj to a function defined in bar.obj, then that symbol must be available at link time, even if nothing outside of the library should be able to see it.
For that reason, you cannot strip the symbols (with the possible exception of file-scope static symbols). Even class methods that are protected or private (in the C++-sense) will exist in the symbol table, since the enforcement of the visibility is a compile-time issue, not a link-time one.
In contrast, a dynamic library is a standalone binary that has already been linked. References from foo.obj to bar.obj have already been resolved. Thus a DLL can be stripped of symbols except for the ones that must be exported (and even those can be renamed or replaced by ordinals).
If your DLL exposes a simple C API, then you're all set. But if you want to expose a C++ class, you're probably going to end up exporting all of its methods, even the protected and private ones (since inlining in the external application might result in direct calls to private methods).
No, how do you think the users of the static library would link to it without knowing where are the symbols they use defined?
Yes, try the DUMPBIN utility.
Well, yes. You can run the LIB utility with /REMOVE:foo.
That said, I think you are doing something that either is not worth doing or could be done a lot simpler than with removing library members.
I kept finding the names of certain (but not all) static functions in .obj files produced by VS2010. Interestingly, they were visible in my Release .obj files but not the Debug .obj files. I just used cygwin strings to perform the search:
$ strings myObjectFile.obj | grep myStaticFunctionName
I tracked it down to the "Whole Program Optimization = Yes" setting ("/GL"). When I switched this to "No" the function names no longer appear.
Update: As a followup test I opened the "cleansed" myObjectFile.obj in vim and I can still find them (with either :set encoding=utf-8 or :set encoding=latin1). I'm not sure why strings was missing the matches. Oh well.

static link library

I am writing a hello world c++ application, in the instruction #include help the compiler or linker to import the c++ library. My " cout << "hello world"; " use a cout in the library. The question is after compile and generated exe is about 96k in size, so what instructions are actually contained in this exe file, does this file also contains the iostream library?
Thanks
In the general case, the linker will only bring in what it needs. Once the compiler phase has turned your source code into an object file, it's treated much the same as all other object files. You have:
the C start-up code which prepares the execution environment (sets up argv, argv and so on) then calls your main or equivalent.
your code itself.
whatever object files need to be dragged in from libraries (dynamic linking is a special case of linking that happens at runtime and I won't cover that here since you asked specifically about static linking).
The linker will include all the object files you explicitly specify (unless it's a particularly smart linker and can tell you're not using the object file).
With libraries, it's a little different. Basically, you start with a list of unresolved symbols (like cout). The linker will search all the object files in all the libraries you specify and, when it finds an object file that satisfies that symbol, it will drag it in and fix up the symbol references.
This may, of course, add even more unresolved symbols if, for example, there was something in the object file that relies on the C printf function (unlikely but possible).
The linker continues like this until all symbols are satisfied (when it gives you an executable) or one cannot be satisfied (when it complains to you bitterly about your coding practices).
So as to what is in your executable, it may be the entire iostream library or it may just be the minimum required to do what you asked. It will usually depend on how many object files the iostream library was built into.
I've seen code where an entire subsystem went into one object file so, that if you wanted to just use one tiny bit, you still got the lot. Alternatively, you can put every single function into its own object file and the linker will probably create an executable as small as possible.
There are options to the linker which can produce a link map which will show you how things are organised. You probably won't generally see it if you're using the IDE but it'll be buried deep within the compile-time options dialogs under MSVC.
And, in terms of your added comment, the code:
cout << "hello";
will quite possibly bring in sizeable chunks of both the iostream and string processing code.
Use cl /EHsc hello.cpp -link /MAP. The .map file generated will give you a rough idea which pieces of the static library are present in the .exe.
Some of the space is used by C++ startup code, and the portions of the static library that you use.
In windows, the library or part of the libraries (which are used) are also usually included in the .exe, the case is different in case of Linux. However, there are optimization options.
I guess this Wiki link will be useful : Static Libraries