g++ produces big binaries despite small project - c++

Probably this is a common question. In fact I think I asked it years ago... but I can't remember the answer.
The problem is: I have a project that is composed of 6 source files. All of them no more than 200 lines of code. It uses many STL containers, stdlib.h and iostream. Now the executable is around 800kb in size.... I guess I shouldn't statically link libraries. How to do this with GCC? And in Eclipse CDT?
As the responses are drifting away from what I want, I think a clarification is in order. What I want to know is why such a small program produces such a large binary, how that relates to static and shared libraries, and how the two differ. If it's too long a story to tell, feel free to give pointers to docs. Thank you.

If you give g++ dynamic library names, and don't pass the -static flag, it should link dynamically.
To reduce size, you could of course strip the binary, and pass the -Os (optimize for size) optimization flag to g++.

One thing to remember is that using the STL results in having that extra code in your executable even if you are dynamically linking with the C++ library. This is by virtue of the fact that the STL is a bunch of templates that aren't actually compiled until you write and compile your code. Since the library can't anticipate what you might store in a container, there's no way for the library to already contain the code for that particular usage of the container. Same goes with algorithms and everything else in the STL.
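A minimal sketch of that point, with hypothetical names (`calls_for_type`, `stored_bytes`): every distinct element type produces its own copy of the template code in your object file, which the per-type static counter below makes visible.

```cpp
#include <cstddef>
#include <vector>

// Each distinct T makes the compiler generate a separate copy of this
// function (and of its static counter) inside *your* object file.
template <typename T>
int& calls_for_type() {
    static int calls = 0;  // one counter per instantiation
    return calls;
}

// Likewise, stored_bytes<int> and stored_bytes<double> are two different
// functions; neither exists in the prebuilt C++ library.
template <typename T>
std::size_t stored_bytes(const std::vector<T>& v) {
    ++calls_for_type<T>();
    return v.size() * sizeof(T);
}
```

Using `stored_bytes` with a `std::vector<int>` and a `std::vector<double>` leaves two independent counters, one per instantiation, which is the same mechanism that duplicates container code for each type you store.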
I'm not saying this is definitely the reason your executable is so much larger than you expect. But it may be a factor.

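Use the -Os and -s flags to produce a small binary: -Os optimizes for size and -s strips the symbol table. (-O3 optimizes for speed and can make the binary larger.) Also see this link for some more information.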
If you are building for Windows, consider using the Microsoft compiler. It often produces smaller binaries on that platform.

Eclipse should be linking dynamically by default, unless you've set the static flag on the linker in your makefile.
In response to your EDIT:
- When you link statically, the executable contains a full copy of each library you've linked to.
- When you link dynamically, the executable only contains references and hooks to the linked libraries, which is a much smaller amount of code.

The executable has to contain more than just your code.
At the very least, it contains some startup code, setting up the environment and if necessary, loading any external libraries, before the program launches.
If you've statically linked the runtime library, you also get that included in your executable. Otherwise you only get a small stub, just big enough to redirect calls to the external runtime.
Depending on compiler settings, it may also include a lot of debugging info and other non-essential data. If optimizations are enabled, that may have increased code size as well.
The real question is why does this matter? 800KB still fits easily on a floppy disk!
Most of this is a one-time cost. It doesn't mean that if you write twice as much code, it'll take up 1600KB. More likely, it'll take 810KB or something like that.
Don't worry about one-time startup costs.

The size usually results from static libraries being linked into your application.
You can reduce the size of the compiled binary by compiling to RELEASE versions, with optimizations to binary size.
Another source of executable size is the libraries. You said that you don't use external libraries except for the standard library, so I believe you're including the C runtime in your executable, i.e., linking statically. Check whether you can link dynamically instead.
IMO you shouldn't really worry about that, but if you're really paranoid, check this: Smallest x86 ELF Hello World

If C++ compiles to machine code, why do we need to install a 'runtime'?

At the end of the compilation process, the program is in a .exe file in machine code. So shouldn't the machine be able to run it without having to install something like MS Visual Studio C++? Basically, I am making a program with mingw and want to share it with someone else. I do not understand why I can not just send them the .exe file. Clarification will be appreciated.
C++ compiles your code to machine code. If your program is self-contained, that is all you need. However, more complex programs often rely on additional compiled code, which is made available to your program through a library.
Generally, libraries come in two "flavors" - static and dynamic. Static libraries are "baked into" your compiled code. This is not ideal, because multiple programs include identical code, leading to code duplication. Dynamic libraries, on the other hand, are shared among all programs using them, leading to more efficient use of space.
Installing a runtime adds the dynamic libraries used by all programs built with that C++ compiler.
Your program likely calls many functions from the standard library that you didn't write yourself. You need the runtime libraries for that. Your code probably also needs code that runs before main to set up the basic environment expected by a C++ program; the runtime libs do that for you. After main ends, various cleanup also needs to happen according to the standard (and your program probably depends on this too), and the compiler's runtime libraries take care of it.
Your code does not exist in a vacuum (it can, but then it's no longer a standard hosted C++ program). It depends on and relies on the standard runtime libs to provide the environment the C++ standard says you can expect.
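The work done before and after main can be made visible with a namespace-scope object. This is a minimal sketch with hypothetical names (`startup_log`, `Announcer`), assuming a typical hosted implementation where such objects are constructed before main is entered.

```cpp
#include <string>
#include <vector>

// Shared log. A function-local static is constructed on first use, so it
// is safe to call this from other static initializers.
std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type whose constructor has a visible side effect.
struct Announcer {
    Announcer() { startup_log().push_back("constructed before main"); }
    ~Announcer() { /* runs after main returns, during runtime cleanup */ }
};

// A namespace-scope object: the runtime's startup code constructs it
// before main() starts and destroys it after main() returns.
Announcer global_announcer;
```

By the time the first statement of main runs, `startup_log()` already holds one entry, even though nothing in main created an `Announcer`. That bookkeeping is part of what the runtime libraries provide.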

Merge Mach-O executable with a static lib?

Suppose you have
a pre-built iOS executable app (for simulator or device).
a pre-built static library (archive) which, among other things, contains C++ static initializers.
Now it should be possible to merge the two built products to produce a new iOS executable which is like the old one, except that it is now also linked with the additional static library and, on execution, will run the static library's static initializers.
Which tool (if any) could help solve this merge problem?
Edit: An acceptable solution is also to dynamically load the library using dlopen. The whole purpose of this is for application testing, so the re-linked app will never see app store.
How a compiler works (in a simple explanation)
The most popular C++ compilers (GCC, for example) work by translating all the C++ (and Obj-C, C, etc.) code to assembly.
Then the compiler calls the appropriate assembler for the target processor, which creates the object files.
Then it calls the linker, which searches those object files for the symbols that describe what links with what. A common optimisation that linkers can perform is to strip from the final binary anything from the statically linked libraries that was not used; another is not to link unused libraries at all.
Finally, the linker removes the information that only it needed.
What this means in your case
You have a library, and the library has its linking symbols. You also have an executable, which has had its linking symbols stripped; in fact, depending on how it was optimised, the internal calls might be nothing more than a few jmp instructions to arbitrary addresses in the code. No tool can do what you want automatically, because the executable no longer contains the needed information.
How to do it anyway
You need to disassemble the executable, figure out on your own where the function calls are, and then manually reassemble it with your library, changing those function calls to jump to addresses in your library instead.
This process is sometimes used by game modders to change the video drivers of old games (for example to update their OpenGL version, or to force Glide games to use newer drivers, and so on).
So if you want to do that anyway (I warn you: it is absurdly crazy to do), ask those guys :) I can't think of anyone to point you to right now, but they exist.
During the normal linking phase, the compiled object files are like source code that the machine understands, full of function calls as needed.
Once the executable has been linked, all function calls have become plain gotos.
So if you were a linker tasked with doing what you want, imagine reading source code filled with gotos to seemingly random places in the code (sometimes even into the middle of loops) and somehow having to figure out which of those should be changed to jump to the new part you are trying to paste in.

Changes in lines that will not execute break the build!

I have this job of implementing a library that provides a file-sharing feature.
This has already happened twice:
First, with a string in an if-else path: only the if path is executed, but when I change the spelling of a string in the else path, the software crashes inside a std library after a few minutes. I verified with a debugger attached that the changed line was never touched. When I reverted the change, it worked nicely again.
Second, my software crashes in a std library again, this time in the out-of-range check inside a standard basic_string destructor.
I tried everything; all libraries use the same _HAS_ITERATOR_DEBUGGING setting.
After 4 hours I discovered that the problematic file is TorrentFile.cpp/h.
If I add a function (even though it is never called), the program crashes at the end of that file, but if it's not there, there's no bug. The code causing the problem:
std::vector<TorrentFileListPacket> TorrentFile::GetFileMap()
{
    std::vector<TorrentFileListPacket> vFiles;
    return vFiles;
}
If I comment this code out, the crash is gone.
This is really driving me crazy!
I've been a developer for 8 years, and I've never seen something like this before!
Additional Information
My memory is OK, I'm using Visual Studio 2010 with SP1 in Windows 7. The library is libTorrent from RasterBar and it links to boost. The software is using MFC.
This smells strongly of memory corruption in a totally different location from where you would expect from the crashes. Most likely adding and removing functions is changing the memory layout in such a way that causes the memory corruption's effects to be immediately visible or not.
Your best hope is something like Purify or Valgrind to hunt it down.
You probably want to make sure that all your object files and libraries are ABI compatible with each other.
Numerous compiler settings will change the ABI, especially debug versus release builds and iterator debugging. The struct layout of standard containers typically changes when you enable iterator debugging (which I believe is on by default for all debug builds in MSVC, and off for release builds).
So, if a single object file, static library or DLL that you link against is built with an incompatible configuration, you typically see very odd behaviors. With libtorrent you need to make sure you build the library with the same configuration as you link against it with. Many of the TORRENT_* defines will actually change some aspect of some struct layout or function call. Make sure you define the exact same set of those in your client as when building the library. One simple way of dealing with this problem is to simply pull all source files into your project and build everything together.
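To illustrate why mixed configurations misbehave, here is a minimal sketch using a hypothetical `ToyVector` type. The build setting is modelled as a template parameter instead of a preprocessor define, so both layouts can be compared in one program; it is not how MSVC or libtorrent actually lay out their types.

```cpp
#include <cstddef>

// Hypothetical container whose layout depends on a build setting.
template <bool DebugIterators>
struct ToyVector;

// "Release" layout: just the two data pointers.
template <>
struct ToyVector<false> {
    int* begin;
    int* end;
};

// "Debug" layout: an extra bookkeeping pointer comes first.
template <>
struct ToyVector<true> {
    void* debug_proxy;
    int* begin;
    int* end;
};

// The same member sits at a different offset in each configuration, so
// code compiled against one layout reads the wrong bytes of the other.
constexpr std::size_t release_offset = offsetof(ToyVector<false>, begin);
constexpr std::size_t debug_offset = offsetof(ToyVector<true>, begin);
```

If one object file believes `begin` lives at `release_offset` and another wrote it at `debug_offset`, nothing fails at link time; the program simply reads garbage at run time, often far from the code that caused it.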
If you are using libtorrent as a DLL (or boost for that matter), are they compiled against the same C Run-Time?
Often when I run into this type of issue it is because I make a call into a library that was compiled with MinGW (which uses the CRT from VS6.0) or an older version of Visual Studio. If memory is allocated by the library and then free'd by your application, you will often get these types of errors in the destructor.
If you aren't sure, you can open the DLL in question in a tool like the Dependency Walker. Look for the dependency MSVCRT.DLL, MSVCR100.DLL, etc.
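One common way to avoid the allocate-in-library, free-in-application mismatch described above is to have the library export a matching destroy function. This is a minimal sketch with hypothetical names (`Session`, `session_create`, `session_destroy`); it is not libtorrent's API, and in a real project the first two functions would live inside the DLL.

```cpp
#include <memory>

// Hypothetical library-side API. Because both functions are compiled into
// the library, allocation and deallocation use the same C runtime heap.
struct Session {
    int id;
};

extern "C" Session* session_create(int id) { return new Session{id}; }
extern "C" void session_destroy(Session* s) { delete s; }

// Application side: never call delete or free on the pointer directly;
// hand it back to the library through the matching destroy function.
struct SessionDeleter {
    void operator()(Session* s) const { session_destroy(s); }
};
using SessionPtr = std::unique_ptr<Session, SessionDeleter>;

SessionPtr open_session(int id) { return SessionPtr(session_create(id)); }
```

With this arrangement, `auto s = open_session(7);` releases the object through `session_destroy` when `s` goes out of scope, regardless of which CRT the application itself links against.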

Same binary code on Windows and Linux (x86)

I want to compile a bunch of C++ files into raw machine code and then run it with a platform-dependent starter written in C. Something like
fread(buffer, 1, len, file);
How can I tell g++ to output raw code?
Will function calls work? How can I make it work?
I think the calling conventions of Linux and Windows differ. Is this a problem? How can I solve it?
EDIT: I know that PE and ELF prevent the DIRECT starting of the executable. But that's what I have the starter for.
There is one (relatively) simple way of achieving some of this, and that's called "position independent code". See your compiler documentation for this.
Meaning you can compile some sources into a binary which will execute no matter where in the address space you place it. If you have such a piece of x86 binary code in a file and mmap() it (or the Windows equivalent) it is possible to invoke it from both Linux and Windows.
Limitations already mentioned are of course still present - namely, the binary code must restrict itself to using a calling convention that's identical on both platforms / can be represented on both platforms (for 32bit x86, that'd be passing args on the stack and returning values in EAX), and of course the code must be fully self-contained - no DLL function calls as resolving these is system dependent, no system calls either.
You need position-independent code
You must create self-contained code without any external dependencies
You must extract the machine code from the object file.
Then mmap() that file, initialize a function pointer, and (*myblob)(someArgs) may do.
If you're using gcc, the -ffreestanding -nostdinc -fPIC options should give you most of what you want regarding the first two; then use objcopy (for example, objcopy -O binary) to extract the binary blob from the ELF object file afterwards.
Theoretically, some of this is achievable. However there are so many gotchas along the way that it's not really a practical solution for anything.
System call formats are totally incompatible
DEP will prevent data executing as code
Memory layouts are different
You need to effectively dynamically 'relink' the code before you can run it.
.. and so forth...
The same executable cannot be run on both Windows and Linux.
You write your code platform independently (STL, Boost & Qt can help with this), then compile in G++ on Linux to output a linux-binary, and similarly on a compiler on the windows platform.
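A minimal sketch of that approach: one source file, compiled separately on each platform, with the few platform-specific parts selected by the preprocessor. The function name `platform_name` is hypothetical; `_WIN32`, `__linux__` and `__APPLE__` are macros predefined by the common compilers for those targets.

```cpp
#include <string>

// Returns a label for the platform this translation unit was compiled for.
// The choice is made at compile time, so each platform's binary contains
// only its own branch.
std::string platform_name() {
#if defined(_WIN32)
    return "windows";
#elif defined(__linux__)
    return "linux";
#elif defined(__APPLE__)
    return "macos";
#else
    return "unknown";
#endif
}
```

The rest of the program can stay identical across platforms and call `platform_name()` (or similar small wrappers) wherever behavior has to differ.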
Why don't you take a look at wine? It's for using windows executables on Linux. Another solution for that is using Java or .NET bytecode.
You can run .NET executables on Linux (requires mono runtime)
Also have a look at Agner's objconv (disassembling, converting PE executable to ELF etc.)
Someone actually figured this out. It’s called αcτµαlly pδrταblε εxεcµταblε (APE) and you use the Cosmopolitan C library. The gist is that there’s a way to cause Windows PE executable headers to be ignored and treated as a shell script. Same goes for MacOS allowing you to define a single executable. Additionally, they also figured out how to smuggle ZIP into it so that it can incrementally compress the various sections of the file / decompress on run.
Doing such a thing would be rather complicated. It isn't just a matter of the cpu commands being issued, the compiler has dependencies on many libraries that will be linked into the code. Those libraries will have to match at run-time or it won't work.
For example, the STL library is a series of templates and library functions. The compiler will inline some constructs and call the library for others. It'd have to be the exact same library to work.
Now, in theory you could avoid using any library and just write in fundamentals, but even there the compiler may make assumptions about how they work, what type of data alignment is involved, calling convention, etc.
Don't get me wrong, it can work. Look at the WINE project and other native drivers from windows being used on Linux. I'm just saying it isn't something you can quickly and easily do.
Far better would be to recompile on each platform.
That is achievable only if you have WINE available on your Linux system. Otherwise, the difference in the executable file format will prevent you from running Windows code on Linux.

c vs c++ on solaris 9 platform question

I have a program that I am sharing with a third party, and I will be providing a binary executable to them. It is written in C++ but uses some C as well. They are suggesting that it needs to be C only. Do you think this will be a problem, given that I will be compiling and building it on a SPARC station that roughly matches their system specs, such as Solaris 9 and the chipset (32- or 64-bit, depending on what they use)?
Is Solaris 9 able to compile the C++ code that I used, or do they need to add C++ runtime libraries on their end? I am using C++ std classes. In any event, if I am building it all on my end, why worry about what they have? It's not a static/dynamic lib that I am sharing, where I think that would come into play.
I'm just curious, since they are saying it needs to be a C compilation. I suspect that if they are expecting a lib then perhaps I need to address that, but if it's just an executable, then the system specs such as OS and chipset are all that matter?
If I am wrong in this assumption, please let me know where.
Worst case you can always statically link in the C++ runtime library.
If you are only sending them an executable, I don't see why the language makes any difference whatsoever. If you're also sharing code, of course, that's an entirely different story.
Since you're providing them with only an executable (no shared libraries), you shouldn't have too much trouble.
Just run a ldd command on your binary and see which C++ libraries it links against (you might see libstdc++ for instance, if you use g++); you should include those along with your executable. Don't rely on the user having them, they might be missing or might be incompatible. You will want to use the -rpath (linker switch) to make sure your binary will use the libraries you provide and not any library found in the system.
Also, it's better to compile on an older Solaris to provide compatibility, i.e. don't compile on Solaris 10 for Solaris 7, but on 7 for 10. You get the idea...