How do i make dll smaller with MSVC++ 2010? - c++

i have some huge c++ projects, all of then are compiled with msvc++ 2010.
i want the DLL file to be smaller,
can anyone give me some inspiration?

Compile for release, use Link Time Code Generation (LTCG), remove unused references (OPT:ICF), put the CRT in a DLL. Don't export things from the DLL unless necessary.

In addition to the other answers, you can use upx to compress the dll, or some other compressor.
http://upx.sourceforge.net/

In addition to the above suggestions, Make sure in Project Properties->C/C++->Favor Size Or Speed, Favor small code (/Os) is selected.

Compile as release, not debug.
Link with MSVCRT dynamically instead of statically. This means you likely have to distribute the MSVCRT DLL with your program. Depending on how your program is structured, changing the links of CRT can have unintended side effects.
Remove all the code that isn't needed. Use a profiling or code coverage tool to identify code that doesn't appear to be getting called. You might be able to remove it.
Look at all the corresponding .obj files for each .c or .cpp file. If any one obj file is excessively large relative to the size of the code file, that might be a hint something could be reduced there.
Minimize the use of global instances or global data in your DLL. The binary size will bloat by the number of bytes of global variables declared.
Only export the very minimal number of functions you need to have other EXEs and DLLs import. Run "dumpbin /exports yourfile.dll" to get the list of exported functions. Only export functions that are directly called by the code relying on the DLL. If you are exporting something that no one outside of the DLL is going to call directly, don't export it. The linker will optimize it (and it's dependents) from being used if nothing internal is calling it.
Don't export entire C++ classes. Export simple C wrapper functions if your DLL is C++ code.

Related

How to reverse a DLL into C++ code?

I know it's impossible to reverse a dll into a c++ code so I would like to collect as much as possible details from it.
It's not my dll, so I don't have the source code of course. Which program should I use?
Well, if you are skilled you can disassemble the DLL and understand all of its functions. This takes a substantial amount of time, but if you do that you can reverse it back to source by hand.
Otherwise, you can start by using a tool like Dependency Walker to get the DLLs and functions it depends on, and the functions it exports. From there you can find functions that interest you, and use a disassembler like IDA to see what they do.
You can see the list of exported functions by using the dumpbin tool. If C++ functions are exported, you might be able to infer parameters by the name mangling.
You can extract all the resources from the DLL by just "opening" it as a file for resource viewing in Visual Studio. If the DLL is a COM based DLL, there's a small chance the Type Library is embedded as a resource inside it. And if you have the Type Library, you can #import it to reconstruct the header files for the public interfaces.
That's about as good as it gets.
You need a PE file viewer. This will tell you the exports from the DLL and you can get the data in the .text section to see the machine code.

Merge Mach-O executable with a static lib?

Suppose you have
a pre-built iOS executable app (for simulator or device).
a pre-built static archive library static library which among other things contains c++ static initializers.
Now it should be possible to merge the two built products to produce the a new iOS executable which is like the old one, except that it is now also linked with the additional static library, and on execution will run the static library's static initializers.
Which tool (if any) could help solve this merge problem?
Edit: An acceptable solution is also to dynamically load the library using dlopen. The whole purpose of this is for application testing, so the re-linked app will never see app store.
How a compiler work (in a simple explanation)
The most popular C++ compilers (like say, GCC), work by translating all the C++ (and Obj-C, C, etc...) code to ASM.
Then it calls the appropriate assembler for the target processor, and create the object binaries.
Then it calls the linker, that search on those binaries for the symbols that explain what links with what. A common optimisation that linkers can do, is also strip of the final binary anything from the statically linked libraries that was not used, other common optimisation is not attempt to link at all unused libraries.
Also finally, the linker removes the things that only it needed.
What this mean in your case
You have a library, the library has the linking symbols. You also has a executable, that one had its linking symbols stripped, in fact depending on how it was optimised the internal jumps might be only a couple of jmp instructions to arbitrary addresses on the code. No machine, can do what you want in a automatic manner, because you don't have the needed information on the executable.
How to do it anyway
You need to disassemble the executable, figure on your own where are the function calls, and then manually reassemble it with your library, changing those functions call to jump to addresses in your library instead.
This process is sometimes used by game moders to change the video drivers of old games (for example to update their OpenGL version, or to force Glide games to use some newer drivers, and so on).
So if you want to do that anyway (I warn you: it is absurdly crazy to do though...) ask those guys :) I don't remember right now anyone to point to you, but they exist.
Analogy
When you are in normal linking phase, the compiled object files are like a source code that the machine understands, full of function calls as needed.
After it is compiled, all function calls became goto.
So if you are a linker tasked in doing what you want to do, imagine that you would be reading a source code filled with goto to random places in the code (sometimes even to inside loops) and that you have to somehow figure what ones of those you want to change to jump to the new part you are trying to paste there.

How does the Import Library work? Details?

I know this may seem quite basic to geeks. But I want to make it crystal clear.
When I want to use a Win32 DLL, usually I just call the APIs like LoadLibrary() and GetProcAdderss(). But recently, I am developing with DirectX9, and I need to add d3d9.lib, d3dx9.lib, etc files.
I have heard enough that LIB is for static linking and DLL is for dynamic linking.
So my current understanding is that LIB contains the implementation of the methods and is statically linked at link time as part of the final EXE file. While DLL is dynamic loaded at runtime and is not part of the final EXE file.
But sometimes, there're some LIB files coming with the DLL files, so:
What are these LIB files for?
How do they achieve what they are meant for?
Is there any tools that can let me inspect the internals of these LIB files?
Update 1
After checking wikipedia, I remember that these LIB files are called import library.
But I am wondering how it works with my main application and the DLLs to be dynamically loaded.
Update 2
Just as RBerteig said, there're some stub code in the LIB files born with the DLLs. So the calling sequence should be like this:
My main application --> stub in the LIB --> real target DLL
So what information should be contained in these LIBs? I could think of the following:
The LIB file should contain the fullpath of the corresponding DLL; So the DLL could be loaded by the runtime.
The relative address (or file offset?) of each DLL export method's entry point should be encoded in the stub; So correct jumps/method calls could be made.
Am I right on this? Is there something more?
BTW: Is there any tool that can inspect an import library? If I can see it, there'll be no more doubts.
Linking to a DLL file can occur implicitly at compile link time, or explicitly at run time. Either way, the DLL ends up loaded into the processes memory space, and all of its exported entry points are available to the application.
If used explicitly at run time, you use LoadLibrary() and GetProcAddress() to manually load the DLL and get pointers to the functions you need to call.
If linked implicitly when the program is built, then stubs for each DLL export used by the program get linked in to the program from an import library, and those stubs get updated as the EXE and the DLL are loaded when the process launches. (Yes, I've simplified more than a little here...)
Those stubs need to come from somewhere, and in the Microsoft tool chain they come from a special form of .LIB file called an import library. The required .LIB is usually built at the same time as the DLL, and contains a stub for each function exported from the DLL.
Confusingly, a static version of the same library would also be shipped as a .LIB file. There is no trivial way to tell them apart, except that LIBs that are import libraries for DLLs will usually be smaller (often much smaller) than the matching static LIB would be.
If you use the GCC toolchain, incidentally, you don't actually need import libraries to match your DLLs. The version of the Gnu linker ported to Windows understands DLLs directly, and can synthesize most any required stubs on the fly.
Update
If you just can't resist knowing where all the nuts and bolts really are and what is really going on, there is always something at MSDN to help. Matt Pietrek's article An In-Depth Look into the Win32 Portable Executable File Format is a very complete overview of the format of the EXE file and how it gets loaded and run. Its even been updated to cover .NET and more since it originally appeared in MSDN Magazine ca. 2002.
Also, it can be helpful to know how to learn exactly what DLLs are used by a program. The tool for that is Dependency Walker, aka depends.exe. A version of it is included with Visual Studio, but the latest version is available from its author at http://www.dependencywalker.com/. It can identify all of the DLLs that were specified at link time (both early load and delay load) and it can also run the program and watch for any additional DLLs it loads at run time.
Update 2
I've reworded some of the earlier text to clarify it on re-reading, and to use the terms of art implicit and explicit linking for consistency with MSDN.
So, we have three ways that library functions might be made available to be used by a program. The obvious follow up question is then: "How to I choose which way?"
Static linking is how the bulk of the program itself is linked. All of your object files are listed, and get collected together in to the EXE file by the linker. Along the way, the linker takes care of minor chores like fixing up references to global symbols so that your modules can call each other's functions. Libraries can also be statically linked. The object files that make up the library are collected together by a librarian in a .LIB file which the linker searches for modules containing symbols that are needed. One effect of static linking is that only those modules from the library that are used by the program are linked to it; other modules are ignored. For instance, the traditional C math library includes many trigonometry functions. But if you link against it and use cos(), you don't end up with a copy of the code for sin() or tan() unless you also called those functions. For large libraries with a rich set of features, this selective inclusion of modules is important. On many platforms such as embedded systems, the total size of code available for use in the library can be large compared to the space available to store an executable in the device. Without selective inclusion, it would be harder to manage the details of building programs for those platforms.
However, having a copy of the same library in every program running creates a burden on a system that normally runs lots of processes. With the right kind of virtual memory system, pages of memory that have identical content need only exist once in the system, but can be used by many processes. This creates a benefit for increasing the chances that the pages containing code are likely to be identical to some page in as many other running processes as possible. But, if programs statically link to the runtime library, then each has a different mix of functions each laid out in that processes memory map at different locations, and there aren't many sharable code pages unless it is a program that all by itself is run in more than process. So the idea of a DLL gained another, major, advantage.
A DLL for a library contains all of its functions, ready for use by any client program. If many programs load that DLL, they can all share its code pages. Everybody wins. (Well, until you update a DLL with new version, but that isn't part of this story. Google DLL Hell for that side of the tale.)
So the first big choice to make when planning a new project is between dynamic and static linkage. With static linkage, you have fewer files to install, and you are immune from third parties updating a DLL you use. However, your program is larger, and it isn't quite as good citizen of the Windows ecosystem. With dynamic linkage, you have more files to install, you might have issues with a third party updating a DLL you use, but you are generally being friendlier to other processes on the system.
A big advantage of a DLL is that it can be loaded and used without recompiling or even relinking the main program. This can allow a third party library provider (think Microsoft and the C runtime, for example) to fix a bug in their library and distribute it. Once an end user installs the updated DLL, they immediately get the benefit of that bug fix in all programs that use that DLL. (Unless it breaks things. See DLL Hell.)
The other advantage comes from the distinction between implicit and explicit loading. If you go to the extra effort of explicit loading, then the DLL might not even have existed when the program was written and published. This allows for extension mechanisms that can discover and load plugins, for instance.
These .LIB import library files are used in the following project property, Linker->Input->Additional Dependencies, when building a bunch of dll's that need additional information at link time which is supplied by the import library .LIB files. In the example below to not get linker errors I need to reference to dll's A,B,C, and D through their lib files. (note for the linker to find these files you may need to include their deployment path in Linker->General->Additional Library Directories else you will get a build error about being unable to find any of the provided lib files.)
If your solution is building all dynamic libraries you may have been able to avoid this explicit dependency specification by relying instead on the reference flags exposed under the Common Properties->Framework and References dialog. These flags appear to automatically do the linking on your behalf using the *.lib files.
This however is as it says a Common Properties, which is not configuration or platform specific. If you need to support a mixed build scenario as in our application we had a build configuration to render a static build and a special configuration that built a constrained build of a subset of assemblies that were deployed as dynamic libraries. I had used the Use Library Dependency Inputs and Link Library Dependencies flags set to true under various cases to get things to build and later realizing to simplify things but when introducing my code to the static builds I introduced a ton of linker warnings and the build was incredibly slow for the static builds. I wound up introducing a bunch of these sort of warnings...
warning LNK4006: "bool __cdecl XXX::YYY() already defined in CoreLibrary.lib(JSource.obj); second definition ignored D.lib(JSource.obj)
And I wound up using the manual specification of Additional Dependencies to satisfy the linker for the dynamic builds while keeping the static builders happy by not using a common property that slowed them down. When I deploy the dynamic subset build I only deploy the dll files as these lib files are only used at link time, not at runtime.
Here are some related MSDN topics to answer my question:
Linking an Executable to a DLL
Linking Implicitly
Determining Which Linking Method to Use
Building an Import Library and Export File
There are three kinds of libraries: static, shared and dynamically loaded libraries.
The static libraries are linked with the code at the linking phase, so they are actually in the executable, unlike the shared library, which has only stubs (symbols) to look for in the shared library file, which is loaded at run time before the main function gets called.
The dynamically loaded ones are much like the shared libraries, except they are loaded when and if the need arises by the code you've written.
In my mind, there are two method to link dll to exe.
Use dll and the import library (.lib file) implicitly
Use functions like loadlibrary() explicitly

Building C++ source code as a library - where to start?

Over the months I've written some nice generic enough functionality that I want to build as a library and link dynamically against rather than importing 50-odd header/source files.
The project is maintained in Xcode and Dev-C++ (I do understand that I might have to go command line to do what I want) and have to link against OpenGL and SDL (dynamically in SDL's case). Target platforms are Windows and OS X.
What am I looking at at all?
What will be the entry point of my
library if it needs one?
What do I have to change in my code?
(calling conventions?)
How do I release it? My understanding
is that headers and the compiled
library (.dll, .dylib(, .framework),
whatever it'll be) need to be
available for the project -
especially as template functionality
can not be included in the library by
nature.
What else I need to be aware of?
I'd recommend building as a statc library rather than a DLL. A lot of the issues of exporting C++ functions and classes go away if you do this, provided you only intend to link with code produced by the same compiler you built the library with.
Building a static library is very easy as it is just an collection of .o/.obj files - a bit like a ZIP file but without compression. There is no need to export anything - just include the library in the list of files that your application links with. To access specific functions or classes, just include the relevant header file. Note you can't get rid of header files - the C++ compilation model, particularly for templates, depends on them.
It can be problematic to export a C++ class library from a dynamic library, but it is possible.
You need to mark each function to be exported from the DLL (syntax depends on the compiler). I'm poking around to see if I can find how to do this from xcode. In VC it's __declspec(dllexport) and in CodeWarrior it's #pragma export on/#pragma export off.
This is perfectly reasonable if you are only using your binary in-house. However, one issue is that C++ methods are named differently by different compilers. This means that nobody who uses a different compiler will be able to use your DLL, unless you are only exporting C functions.
Also, you need to make sure the calling conventions match in the DLL and the DLL's client. This either means you should have the same default calling convention flag passed to the compiler for both the DLL or the client, or better, explicitly set the calling convention on each exported function in the DLL, so that it won't matter what the default is for the client.
This article explains the naming issue:
http://en.wikipedia.org/wiki/Name_decoration
The C++ standard doesn't define a standard ABI, and that's bad news for people trying to build C++ libraries. This means that you get different behavior from your compiled code depending on which flags were used to compile it, and that can lead to mysterious bugs in code that compiles and links just fine.
This extends beyond just different calling conventions - C++ code can be compiled to support or not support RTTI, exception handling, and with various optimizations that can affect the the memory layout of class instances, which C++ code relies on.
So, what can you do? I would build C++ libraries inside my source tree, and make sure that they're built as part of my project's build, and that all the libraries and the code that links to them use the same compiler flags.
Note that name mangling, which was supposed to at least prevent you from linking object files that were compiled with different compilers/compiler flags only mostly works, and there are certain things you can do, especially with GCC, that will result in code that links just fine and fails at runtime.
You have to be extra careful with vendor supplied dynamic C++ libraries (QT on most Linux distributions, for example.) I've seen instances of vendor supplied libraries that were compiled in ways that prevented certain things from working properly. For example, some Redhat Linux releases (maybe all of them) disabled exceptions in QT, which made it impossible to catch exceptions in main() if the exceptions were thrown in a QT callback. Fun.

why are my visual studio .obj files are massive in size compared to the output .exe?

As a background, I am a developer of an opensource project, a c++ library called openframeworks, that is a wrapper for different libraries, like opengl, quicktime, freeImage, etc. In the next release, we've added a c++ library called POCO, which is similar to boost in some ways in that it's an alternative for java foundation library type functionality.
I've just noticed, that in this latest release where I've added the POCO library as a statically linked library, the .obj files that are produced during the act of compilation are really massive - for example, several .obj files for really small .cpp files are 2mb each. The overall compiled .obj files are about 12mb or so. On the flip side, the exes that are produced are small - 300k to 1mb.
In comparison, the same library compiled in code::blocks produces .obj files that are roughly the same size at the exe - they are all fairly small.
Is there something happening with linking, and the .obj process in visual studio that I don't understand? for example, is it doing some kind of smart prelinking, or other thing, that's adding to the .obj size? I've experimented a bit with settings, such as incremental linking, etc, and not seen any changes.
thanks in advance for any ideas to try or insights !
-zach
note: thanks much! I just tried, dumpbin, which says "anonymous object" and doesn't return info about the object. this might be the reason why....
note 2, after checking out the above link, removing LTCG (link time code generation - /GL) the .obj files are much smaller and dumpbin understands them. thanks again !!
I am not a Visual Studio expert by any stretch of imagination, having hardly used it, but I believe Visual Studio employs link-time optimizations, which can make the resulting code run faster, but can cost a lot of space in the libraries. Also, it may be (I don't know the internals) that debugging information isn't stripped until the actual linking phase.
I'm sure someone's going to come with a better/more detailed answer anyway.
Possibly the difference is debug information.
The compiler outputs the debug information into the .obj, but the linker does not put that data into the .exe or .dll. It is either discarded or put into a .pdb.
In any case use the Visual Studio DUMPBIN utility on the .obj files to see what's in them.
Object files need to contain sufficient information for linking. In C++, this is name-based. Two object files refer to the same object (data/function/class) if they use the same name. This implies that all object files must contain names for all objects that might be referenced by other object files. The executable however will need the names visible from outside the library. In case of a DLL, this means only the names exported. The saving is twofold: there are less names, and those names are present only once in the DLL.
Modern C++ libraries will use namespaces. These namespaces mean that object names become longer, as they include the names of the encapsulating namespaces too.
The compiled library obj files will be huge because they must contain all of the functions, classes and template that your end users might eventually use.
Executables which link to your library will be smaller because they will include only the compiled code that they require to run. This will usually be a tiny subset of the library.