How can I analyze the size of my firmware image? - c++

We're currently developing firmware for an IoT product based on the popular ESP32 chip using their ESP-IDF framework, which builds the binaries using a GCC/G++ toolchain for the Xtensa architecture (https://github.com/icamgo/xtensa-toolchain).
Recently, I noticed that the binary size got fairly huge (just shy of 1 MB) and decided to take a look and try to reduce it. NDEBUG is defined, -Os is enabled and the output is being stripped.
Basically, the toolchain produces an .elf file, so I took a look at its contents:
nm -S -C --size-sort <my-app>.elf
The six largest functions (6 kB-12 kB in size) are:
4011b24c 0000187b T __ssvfscanf_r
400f9f38 00001ffa T _svfiprintf_r
400f2aa4 000020fe T _vfiprintf_r
4012005c 000030d2 T _svfwprintf_r
400ef4d8 000030de T _svfprintf_r
400f50dc 000031e6 T _vfprintf_r
So, the largest functions in my firmware image are vfprintf and friends, adding up to about 60 kB of binary size alone. Why are they this big? How can I reduce their size or get rid of them (I don't need vfprintf at all since I don't have a file system on the microcontroller)?
Are there any further techniques to reduce the binary size? How would I proceed in my quest?
Edit (Clarification on the reason for the optimization):
There are different versions of the ESP32 with up to 16 MB of flash memory. The one we use has 4 MB. 1 MB of that is reserved for storing pinned server certificates, trusted URLs, and configuration options. And, since we want OTA-update functionality, we need to keep as much flash memory free as the application image consumes, so that a new version can be stored alongside it. That leaves us with 1.5 MB of flash for our application image, which is not too far away from our current 1 MB. Therefore, I think it is justified to think about size reduction before that problem stops us from introducing new features at all.
I do realize that 60 kB of unwanted vfprintf() functions is a small part of 1 MB, but we do need a lot of actually helpful libraries (mbedtls for encryption, a full IP-stack, a thin web server, ...). I cannot get rid of these, so I'd like to reduce the size as much as possible and feasible by removing functions I don't have any use for.

Considering the size of individual functions in isolation is not a sound approach. A single "tiny" function may have hundreds of equally tiny dependencies in its call graph which in aggregate constitute a huge chunk. For example, given the following:
void do_statemachine(void);   // defined elsewhere - could pull in anything

int main()
{
    for(;;)
    {
        do_statemachine();
    }
    return 0;
}
main() will be tiny but ultimately causes all the rest of the application to be linked because of whatever do_statemachine() does, which could be any size. You need to consider the total size of a function and all its dependencies.
The total size of static and constant data initialisers, which must also be stored in ROM, needs consideration too.
You should use the linker to generate a map file and call graph - that is likely to be more useful than using nm after the event.
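For example, with GNU ld you can request a map file plus a cross-reference table (the closest thing to a call-graph listing) at link time; the object file names here are illustrative:

gcc main.o net.o -Wl,-Map=app.map,--cref -o app.elf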
With respect to the specific symbols in your question, you have to ask yourself what you are calling in stdio. For example, printf needs stream access (for stdout), format-specifier parsing and variadic argument traversal - all of which is provided by vfprintf. If that were not so you'd have duplicated code, and while you might link fewer functions, they'd all be very large and potentially exhibit different behaviour. The fact that you have "file"-oriented functions in the link is not an issue as such; stdio operates on streams rather than files - "file" is conceptual, not physical. If you have not hooked your library into a file system (or if one is not provided in the tools already), no filesystem code will be included. The low-level stream access is performed by low-level I/O functions that may or may not support file access.
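To illustrate that last point, here is a hedged sketch of newlib's low-level output hook (ESP-IDF already wires up its own; uart_putc is a made-up stand-in for whatever your board support code provides). All stdio "file" output ultimately funnels into something like this:

#include <stddef.h>

extern void uart_putc(char c);   /* hypothetical board routine */

/* newlib routes stream output through _write(); retarget it to a UART
   and printf works with no file system at all */
int _write(int fd, const char *buf, size_t count)
{
    (void)fd;   /* stdout and stderr both end up at the UART here */
    for (size_t i = 0; i < count; ++i)
        uart_putc(buf[i]);
    return (int)count;
}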
Another possibility is that the library lacks granularity - if all these functions are defined in the same object module, the linker has no choice but to link them all, even if they are not referenced. That might explain why you have integer, floating-point and wide-character versions in the link.

Alongside those symbols there are more wchar-related symbols that you don't need.
You can get rid of those symbols by building the ESP32 toolchain with
-fdata-sections -ffunction-sections
enabled for newlib. Also pass the flag --disable-wchar_t when configuring libstdc++. Then link with -Wl,--gc-sections to get rid of those sections.
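As a hedged sketch of where those flags sit in a plain compile-and-link cycle (when rebuilding the toolchain itself they are passed through newlib's and libstdc++'s configure machinery instead):

xtensa-esp32-elf-gcc -Os -ffunction-sections -fdata-sections -c app.c -o app.o
xtensa-esp32-elf-gcc -Wl,--gc-sections app.o -o app.elf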
You can also try -flto; it should achieve the same for you - but I ran into issues when building newlib and libstdc++ with LTO. It seems that the build tooling of libstdc++ has trouble with the newlib archives then.
Nonetheless, LTO is what you should prefer when possible, as it also nicely detects ODR violations and other broken code that a unit-by-unit compilation might hide.

Related

What's the best way to make DLL size as small as possible?

I am using LoadLibraryA to load my DLLs into my project. I've noticed their sizes are starting to get large as I keep adding more functions, etc. Are there any options in my project settings that can help reduce the size of my DLLs?
As others have mentioned, you can use compiler options to reduce the size. First, try tweaking those options for a better result; they normally affect the size of your code.
But if you have a lot of resources in your EXE/DLL, you will not see much difference. If you really need a small size in this case, I suggest you to use a PE-packer. A very good free PE-packer is UPX.
UPX is an advanced executable file compressor. UPX will typically reduce the file size of programs and DLLs by around 50%-70%, thus reducing disk space, network load times, download times and other distribution and storage costs.
You need to run upx as a post build process to pack your EXE/DLL file with a command like this:
upx --best mydll.dll
PE-packers compress your code and resources and encapsulate them in another EXE/DLL file. Then these files will be unpacked at runtime automatically, so you can use them like a normal EXE/DLL file. Even though PE-packers compress codes too, they are super effective when you have a lot of resources in your EXE/DLL.
The size of DLLs depends mostly on what code is reachable from the exported functions. During the link phase, everything not reachable from any exported function is dropped, but you still end up storing everything that is reachable, even the parts you never actually use from the outside.
This behaves differently from a static library, which also includes everything at first, but where linking is deferred until the library is consumed, so dead code never ends up in the linked output.
LTCG + function-level sections + inline everything + code folding + string folding + /O2 is likely going to cut away a lot of the file size - at the cost of any chance of proper debugging though. (With aggressive inlining and folding of redundant stuff, call stacks stop being resolvable correctly.)
Forget about what you might have read about /O1 being smaller. It is only smaller initially (pre-linking), as it mostly prevents inlining and loop unrolling; with advances in compiler technology this is nowadays a hindrance to exhaustive compile-time constant expression evaluation, which often saves not only computational cost at runtime (and we are often talking a factor of 2-10x!) but also often enough compiles into something more compact.
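Mapping that recipe onto concrete MSVC switches might look like this (standard cl/link flags; mydll.cpp is illustrative): /GL plus /LTCG enable link-time code generation, /Gy emits function-level COMDATs, /GF pools identical strings, /OPT:REF drops unreferenced code, and /OPT:ICF folds identical functions:

cl /O2 /GL /Gy /GF /LD mydll.cpp /link /LTCG /OPT:REF /OPT:ICF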

How can I get my very large program to link?

Our next product has grown too large to link on a machine running 32-bit Windows. The sum total of all the lib files exceeds 2 GB and can only be linked on a 64-bit Windows machine. Eventually we will exceed that boundary too, since our software tends to grow rather than contract and we are using a 32-bit linker (MS Visual Studio 2005): we expect to hit trouble when our lib size total exceeds 3 GB.
How can I reduce the size of the .lib files, or the .obj files, without trimming code? For example, we use a lot of templates: is there any way of reducing their footprint? Is there any way of finding out what's causing the bloat from examining the .lib/.obj files? Can this be automated rather than inspected by eye? 2.5 GB is a lot of text to peer through and compare.
External constraints prevent us from shipping as anything other than a single .exe, so a DLL solution is not available.
I once worked on a project with several MLoC. While ours would still link on a 32-bit machine, link times were abysmal and became a major problem, because developers were reduced to only a dozen edit-compile-test cycles per workday. (Compile times were handled pretty well by doing distributed compilation.)
We switched to dynamic linking. That increased startup time, but this could be managed by delay-loading of DLLs.
First, of course, make sure you compile with the 'Optimize for Size' option.
If you do that, I wouldn't expect inlining, at least, to contribute significantly to the code size. The compiler makes a tradeoff for every inlining candidate regarding how much (if at all) it'd increase code size, compared to the performance boost it'd give. And if you're optimizing for size, the compiler won't risk bloating the code much. (Note that inlining very small functions can actually decrease code size)
Second, have you considered unity builds? That'd pretty much eliminate the linker's workload, and with only one translation unit, there'd be much less duplicate work and, hopefully, a smaller memory footprint.
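A unity build is just a single translation unit that includes all the others; a minimal sketch with made-up file names:

// unity.cpp - the only file handed to the compiler; everything else
// is pulled in as source, so the linker sees one big object
#include "parser.cpp"
#include "codegen.cpp"
#include "optimizer.cpp"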
Finally, I know Visual Studio (or possibly the Windows SDK) has a 64-bit compiler (that is, a compiler that is itself a 64-bit application, not just a compiler producing 64-bit code). Consider using that. (I don't know if there is also a 64-bit linker)
I don't know if the linker is built with the LARGEADDRESSAWARE flag set. If so, running it on a 64-bit machine will let the process consume a full 4 GB of memory instead of the 2 GB it normally gets. (If necessary, you can add the flag yourself by modifying the PE header.)
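editbin ships with Visual Studio and can set that flag on an existing binary; patch a copy of the linker rather than the original (the path to link.exe depends on your installation):

editbin /LARGEADDRESSAWARE link.exe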
Perhaps limiting the linkage of various symbols could help as well. If you know that a symbol won't be needed outside of the current translation unit, put it in an anonymous namespace. That might allow the compiler to trim down unused symbols before passing everything on to the linker
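A minimal sketch of that (names are made up): the helper gets internal linkage, so no external symbol is emitted for the linker to carry around:

// helpers.cpp
namespace {
    int scale(int x) { return 2 * x; }   // internal linkage, discardable if unused
}

int exported_api(int x) { return scale(x); }   // the only external symbol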
Try using the Symbol Sort program to show you where the main bits of bloat are in your code. Also just looking at the size of the raw .obj files will give you a reasonable idea of where to target.
OMFG!!!!! That's huuuuuge!
Apart from the fact that I think it's too big to be rational... can't you use dynamic linking to avoid linking all the mess at compile time and only link at runtime what's necessary (I mean, loading DLLs on demand)?
Does it need to be one big app?
One option is to split various modules into DLLs and load/unload them as needed.
Alternatively, you might be able to split it into several apps and share data using mapped memory, pipes, a DBMS or even simple data files.
First of all, find out how to measure the size used by various features. Don't just go and replace template usage or other things because you suspect they make a significant difference.
Run
dumpbin /HEADERS <somebinary>
to find out which sections in your binary are causing the huge size. Do you have a huge Debug Directory section? Strip symbols then. Is the Import Address Table large? Check the table and locate symbols which you don't need (a problem with templates is that symbols of template instantiations tend to be very, very large). Similar analysis can be done for the Exception Directory, COM Descriptor Directory etc.
I do not think there is any single tool that can give you statistics that you want/need. Using either .map files or the dumpbin utility with /SYMBOLS parameter plus some post-processing of the created log might help you get what you want.
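For example (library and output file names are arbitrary):

dumpbin /SYMBOLS biglib.lib > symbols.txt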
If the statistics confirm your suspicion of template bloat, or even without the confirmation, it might be a good idea to do several things with the source:
Try using explicit instantiations and move template definitions into .cpp files (see the sketch after this list). Of course this works only if you have a limited and well-known set of types/values that you use as arguments to the templates.
Add more abstraction and/or indirection. Factor code that does not depend on your template parameters into their own base classes or free functions. If you have several template type parameters, see if you cannot split the single class template into several base classes without overlapping template parameters. (See http://www2.research.att.com/~bs/SCARY.pdf.)
Try using the pimpl idiom; avoid instantiating templates in headers if you can, instantiate them only in .cpp files.
Templates are nice but sometimes ordinary classes work as well; e.g. avoid passing integer constants as non-type template parameters if you can pass them as parameters to the ctor.
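A minimal sketch of the explicit-instantiation approach from the first point (Widget and its argument types are made up):

// widget.h - declare the template; forbid implicit instantiation for
// the types that are provided centrally
template <typename T>
class Widget {
public:
    T twice(T v);
};
extern template class Widget<int>;
extern template class Widget<double>;

// widget.cpp - the single point of instantiation
#include "widget.h"
template <typename T>
T Widget<T>::twice(T v) { return v + v; }

template class Widget<int>;
template class Widget<double>;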
@hatcat and @jalf: There is indeed a full set of 64-bit tools. For example, you can set an environment variable:
set PreferredToolArchitecture=x64
and then run Visual Studio (from the developer console).

GCC: Empty program == 23202 bytes?

test.c:
int main()
{
    return 0;
}
I haven't used any flags (I am a newb to gcc), just the command:
gcc test.c
I have used the latest TDM build of GCC on win32.
The resulting executable is almost 23KB, way too big for an empty program.
How can I reduce the size of the executable?
Don't follow its suggestions, but for amusement sake, read this 'story' about making the smallest possible ELF binary.
How can I reduce its size?
Don't do it. You're just wasting your time.
Use the -s flag to strip symbols (gcc -s)
By default some standard libraries (e.g. the C runtime) are linked with your executable. Check out the options -nostdlib, -nostartfiles and -nodefaultlibs for details. Link options are described in the gcc linker-options documentation.
For a real program, the second option is to try optimization flags, e.g. -Os (optimize for size).
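Putting the simple options together (standard gcc flags; the resulting size varies by platform):

gcc -Os -s test.c -o test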
Give up. On x86 Linux, gcc 4.3.2 produces a 5K binary. But wait! That's with dynamic linking! The statically linked binary is over half a meg: 516K. Relax and learn to live with the bloat.
And they said Modula-3 would never go anywhere because of a 200K hello world binary!
In case you wonder what's going on, the GNU C library is structured so as to include certain features whether your program depends on them or not. These features include such trivia as malloc and free, dlopen, some string processing, and a whole bucketload of stuff that appears to have to do with locales and internationalization, although I can't find any relevant man pages.
Creating small executables for programs that require minimal services is not a design goal for glibc. To be fair, it has not been a design goal of any run-time system I've ever worked with (about half a dozen).
Actually, if your code does nothing, is it even fair that the compiler still creates an executable? ;-)
Well, on Windows any executable would still have a size, although it can be reasonably small. With the old MS-DOS system, a complete do-nothing application could be just a couple of bytes. (I think four bytes, using the 21h interrupt to close the program.) Then again, those applications were loaded straight into memory.
When the EXE format became more popular, things changed a bit. Now executables had additional information about the process itself, like the relocation of code and data segments plus some checksums and version information.
The introduction of Windows added another header to the format, to tell MS-DOS that it couldn't execute the executable since it needed to run under Windows. And Windows would recognize it without problems.
Of course, the executable format was also extended with resource information, like bitmaps, icons and dialog forms and much, much more.
A do-nothing executable would nowadays be between 4 and 8 kilobytes in size, depending on your compiler and every method you've used to reduce its size. It would be at a size where UPX would actually result in bigger executables! Additional bytes in your executable might be added because you added certain libraries to your code. Especially libraries with initialized data or resources will add a considerable number of bytes. Adding debug information also increases the size of the executable.
But while this all makes a nice exercise in reducing size, you could wonder if it's practical to keep worrying about the bloatedness of applications. Modern hard disks divide files up into segments, and for really large disks the difference would be very small. However, the amount of trouble it takes to keep the size as small as possible will slow down development speed, unless you're an expert developer who is used to these optimizations. These kinds of optimizations don't tend to improve performance, and considering the average disk space of most systems, I don't see why it would be practical. (Still, I do optimize my own code in similar ways, but then again, I am experienced with these optimizations.)
Interested in the EXE header? It starts with the letters MZ, for "Mark Zbikowski". The first part is the old-style MS-DOS header for executables and is used as a stub telling MS-DOS that the program is not an MS-DOS executable. (In the binary, you can find the text 'This program cannot be run in DOS mode.', which is basically all it does: displaying that message.) Next is the PE header, which Windows will recognise and use instead of the MS-DOS header. It starts with the letters PE, for Portable Executable. After this second header there will be the executable itself, divided into several blocks of code and data. The header contains special relocation tables which tell the OS where to load each block. And if you can keep this to a limit, the final executable can be smaller than 4 KB, but 90% of it would then be header information with no functionality.
I like the way the DJGPP FAQ addressed this many many years ago:
In general, judging code sizes by looking at the size of "Hello" programs is meaningless, because such programs consist mostly of the startup code. ... Most of the power of all these features goes wasted in "Hello" programs. There is no point in running all that code just to print a 15-byte string and exit.
What is the purpose of this exercise?
Even with as low-level a language as C, there's still a lot of setup that has to happen before main can be called. Some of that setup is handled by the loader (which needs certain information), some is handled by the code that calls main. And then there's probably a little bit of library code that any normal program would have to have. At the least, there are probably references to the standard libraries, if they are in DLLs.
Examining the binary size of the empty program is a worthless exercise in and of itself. It tells you nothing. If you want to learn something about code size, try writing non-empty (and preferably non-trivial) programs. Compare programs that use standard libraries with programs that do everything themselves.
If you really want to know what's going on in that binary (and why it's so big), then find out the executable format, get a binary dump tool, and take the thing apart.
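For example, with the standard binutils tools (commands are real; output varies by platform):

objdump -h a.out            # list the sections and their sizes
objdump -d a.out | less     # disassemble what's actually in there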
What does 'size a.out' tell you about the size of the code, data, and bss segments? The majority of the code is likely to be the start-up code (classically crt0.o on Unix machines) which is invoked by the o/s and does setup work (like sorting out command line arguments into argc, argv) before invoking main().
Run strip on the binary to get rid of the symbols. With gcc version 3.4.4 (cygming special) I drop from 10k to 4K.
You can try linking a custom run time (the part that calls main) to set up your runtime environment. All programs use the same one, supplied with gcc, but your executable may not need initialised data or zeroed memory. That means you could get rid of unused library functions like memset/memcpy and reduce the crt0 size. When looking for info on this, look at GCC in embedded environments. Embedded developers are generally the only people that use custom runtime environments.
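A hedged sketch of such a custom startup for Linux/x86-64 - only sound if the program needs no argc/argv, no static constructors and no libc initialisation beyond what the thin _exit syscall wrapper needs:

/* tiny_crt0.c - build with: gcc -nostartfiles tiny_crt0.c test.c -o tiny */
extern int main(void);
extern void _exit(int);   /* still available: only the start files are omitted */

void _start(void)
{
    _exit(main());   /* no argument setup, no constructors, no atexit */
}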
The rest is overhead for the OS that loads the executable. You are not going to save much there unless you tune it by hand.
Using GCC, compile your program using -Os rather than one of the other optimization flags (-O2 or -O3). This tells it to optimize for size rather than speed. Incidentally, it can sometimes make programs run faster than the speed optimizations would have, if some critical segment happens to fit more nicely. On the other hand, -O3 can actually induce code-size increases.
There might also be some linker flags telling it to leave out unused code from the final binary.

What are some techniques or tools for profiling excessive code size in C/C++ applications?

I have a C++ library that generates much larger code than I would really expect for what it is doing. From less than 50K lines of source I get shared objects that are almost 4 MB and static archives pushing 9 MB. This is problematic both because the library binaries are quite large and, much worse, because even simple applications linking against it typically gain 500 to 1000 KB in code size. Compiling the library with flags like -Os helps somewhat, but not really very much.
I have also experimented with GCC's -frepo option (even though all the documentation I've seen suggests that on Linux collect2 will merge duplicate templates anyway) and explicit template instantiation on templates that seemed "likely" to be duplicated a lot, but with no real effect in either case. Of course I say "likely" because, as with any kind of profiling, blind guessing like this is almost always wrong.
Is there some tool that makes it easy to profile code size, or some other way I can figure out what is taking up so much room, or, more generally, any other things I should try? Something that works under Linux would be ideal but I'll take what I can get.
If you want to find out what is being put into your executable, then ask your tools. Turn on the ld linker's --print-map (or -M) option to produce a map file showing what it has put in memory and where. Doing this for the statically linked example is probably more informative.
If you're not invoking ld directly, but only via the gcc command line, you can pass ld specific options to ld from the gcc command line by preceding them with -Wl,.
On Linux the linker certainly does merge multiple template instantiations.
Make sure you aren't measuring debug binaries (debug info could take up more than 75% of the final binary size).
One technique to reduce final binary size is to compile with -ffunction-sections and -fdata-sections, then link with -Wl,--gc-sections.
An even bigger reduction (we've seen 25%) may be possible if you use a development version of gold (the new ELF-only linker, part of binutils), and link with -Wl,--icf
Another useful technique is reducing the set of symbols which are "exported" by your shared libraries (everything is exported by default), either via __attribute__((visibility(...))) or by using a linker script. Details here (see "Export control").
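A hedged sketch of the visibility approach (the macro and function names are made up):

/* mylib.c - build with: gcc -fvisibility=hidden -fPIC -shared mylib.c -o libmylib.so */
#define MYLIB_API __attribute__((visibility("default")))

static int helper(int x) { return x + 1; }                /* never exported */

MYLIB_API int mylib_process(int x) { return helper(x); }  /* the .so interface */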
One method that is very crude but very quick is to look at the size of your object files. Not all the code in the object files will be compiled into the final binary, so there may be a few false positives, but it can give a good impression of where the hotspots will be. Once you've found the largest object files you can then delve into them with tools like objdump and nm.
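For example (the object file name is illustrative):

ls -S *.o | head                                 # largest object files first
nm --print-size --size-sort -C biggest.o | tail  # then the biggest symbols inside one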

MAP file analysis - where does my code size come from?

I am looking for a tool to simplify analysing a linker map file for a large C++ project (VC6).
During maintenance, the binaries grow steadily and I want to figure out where that comes from. I suspect some overzealous template expansion in a library shared between different DLLs, but just browsing the map file doesn't give good clues.
Any suggestions?
This is a wonderful analysis/explorer/viewer tool for compiler-generated map files. Check if it can explore gcc-generated map files too.
amap : A tool to analyze .MAP files produced by 32-bit Visual Studio compiler and report the amount of memory being used by data and code.
This app can also read and analyze MAP files produced by the Xbox360, Wii, and PS3 compilers.
The map file should have the size of each section; you can write a quick tool to sort symbols by this size (see the sketch below). There's also a command line tool that comes with MSVC (undname.exe) which you can use to demangle the symbols.
Once you have the symbols sorted by size, you can generate this weekly or daily as you like and compare how the size of each symbol has changed over time.
The map file alone from any single build may not tell much, but a historical report of compiled map files can tell you quite a bit.
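A hedged sketch of such a quick tool: it reads pre-extracted "hex-size symbol" pairs on stdin (pulling those pairs out of a VC6 map file is left to a regex of your choice) and prints them largest-first:

#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::pair<unsigned long, std::string>> syms;
    std::string line;
    while (std::getline(std::cin, line)) {
        std::istringstream in(line);
        unsigned long size;
        std::string name;
        if (in >> std::hex >> size >> name)
            syms.emplace_back(size, name);
    }
    std::sort(syms.rbegin(), syms.rend());   // descending by size
    for (const auto& s : syms)
        std::cout << s.first << '\t' << s.second << '\n';
}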
Have you tried using dumpbin.exe on your .obj files?
Stuff to look for:
Using a lot of STL?
A lot of c++ classes with inline methods?
A lot of constants?
If any of the above applies to you, check whether they have wide visibility, i.e. whether they are used/seen in large parts of your application.
No suggestion for a tool, but a guess as to a possible cause: do you have incremental linking enabled? This can cause expansion during subsequent builds...
The linker will strip unused symbols if you're compiling with /opt:ref, so if you're using that and not using incremental linking, I would expect expansion of the binaries to be only a result of actual new code being added. That's as far as I know... hope it helps a little.
Templates, macros, and the STL in general all use a tremendous amount of space. Heralded as a great universal library, Boost adds much space to projects. BOOST_FOREACH is an example of this: it's hundreds of lines of templated code which could simply be avoided by writing a proper loop by hand, which is in general only a few more keystrokes.
Get Visual AssistX to save the typing instead of using templates. Also consider owning the code you use. Macros and inline function expansion are not necessarily going to show up.
Also, if you can, move away from DLL architecture to statically linking everything into one executable which runs in different "modes". There is absolutely nothing wrong with using the same executable image as many times as you want just passing in a different command line parameter depending on what you want it to do.
DLLs are the worst culprit for wasting space and slowing down the running time of a project. People think they are space savers, when in fact they tend to have the opposite effect, sometimes increasing project size by ten times! Plus they increase swapping. Use fixed code sections (no relocation section) for performance.