When certain features or optimizations are discussed, Code Size is often mentioned.
While I certainly understand the basic concept (a collection of code, compiled to machine code, results in X bytes of machine code plus static data), I have recently realized that I'm very unsure how to actually measure the code size of a given binary.
So, how do you measure Code Size?
Do you just check how big the resulting binary ("executable", .exe) is?
Do you need a tool such as dumpbin.exe or some specific linker flags to get detailed results?
You can tell the linker to produce a map file. This gives about the most detailed information that's easy to get (i.e., the most you'll get short of reverse engineering the code by hand).
Depending on the code, using dumpbin on an object file can produce meaningful results, but it can also simply report "anonymous object" -- especially (exclusively?) when you ask for link-time code generation.
I'd say your best bet is to disassemble the binary.
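If you want a number more meaningful than the raw file size, you can sum just the sections flagged as code in the executable's header. Here's a minimal sketch in Python (used here as a scripting stand-in, since the question is about C++ binaries) that assumes a standard PE layout; it is not a robust parser:

```python
import struct

def code_section_bytes(pe_bytes: bytes) -> int:
    """Sum SizeOfRawData over all sections marked IMAGE_SCN_CNT_CODE."""
    # e_lfanew (offset of the "PE\0\0" signature) lives at 0x3C in the DOS header.
    pe_off = struct.unpack_from("<I", pe_bytes, 0x3C)[0]
    assert pe_bytes[pe_off:pe_off + 4] == b"PE\0\0"
    num_sections = struct.unpack_from("<H", pe_bytes, pe_off + 6)[0]
    opt_hdr_size = struct.unpack_from("<H", pe_bytes, pe_off + 20)[0]
    # Section table starts after the 4-byte signature, the 20-byte COFF
    # header, and the optional header.
    sec_off = pe_off + 24 + opt_hdr_size
    total = 0
    for i in range(num_sections):
        base = sec_off + i * 40           # each section header is 40 bytes
        raw_size = struct.unpack_from("<I", pe_bytes, base + 16)[0]
        characteristics = struct.unpack_from("<I", pe_bytes, base + 36)[0]
        if characteristics & 0x00000020:  # IMAGE_SCN_CNT_CODE
            total += raw_size
    return total
```

This counts on-disk code bytes only; it ignores static data, imports, and resources, which is exactly the distinction the question is asking about.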
In the context of code optimizations, total code size isn't typically what is meant, but rather code size for some specific part of your program.
If you literally mean the size of the .exe in bytes, I think you're over-thinking the question. Your file explorer shows file sizes on the right (if it doesn't, right-click the file and open Properties). The file you're looking for should be in the Debug folder, with a .exe extension.
If it's something else, sorry.
Related
I have some compiled object file with debug symbols, but no access to the sources.
Is there any method to convert this file to be position independent?
As far as I understand the '-fPIC' flag, it makes all jumps relative. I'm wondering whether having debug symbols is enough to fix these jumps and thus create a PIC binary.
If not, please explain why this operation is impossible.
I think this question is platform- rather than compiler-specific, since different platforms implement PIC differently.
Nevertheless, I don't know of any platform where a simple tool could convert conventional code into position-independent code. This is a decision that has to be made at compile/code-generation time. Probably the only way to achieve your goal would be to disassemble the code and change every absolute code/data reference into relative addressing.
The short answer would be: no, (practically) impossible.
I'm trying to write my first 'demoscene' application in MS Visual Studio Express 2010. Suddenly I realized that my binary expanded from 16kb to ~100kb in the fully-optimized-for-size release version. My target size is 64k. Is there any way to somehow "browse" the binary to figure out which methods consume a lot of space and which I should rewrite? I really want to know what my binary consists of.
From what I found on the web, VS2010 is not the best compiler for demoscenes, but I still want to understand what's happening inside my .exe file.
I think you should have MSVC generate a map file for you. This is a file that will tell you the addresses of most of the different functions in your executable. The difference between consecutive addresses should tell you how much space the function takes. To generate a map file, add the /MAP linker option. For more info, see:
http://msdn.microsoft.com/en-us/library/k7xkk3e2(v=VS.100).aspx
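As a sketch of the consecutive-addresses idea, here is a small Python script that estimates symbol sizes from a map file. The line format it matches is a simplified guess at MSVC's map output (real column layouts vary), so treat it as a starting point rather than a finished parser:

```python
import re

def sizes_from_map(map_text: str) -> dict:
    """Estimate each symbol's size as the gap to the next symbol's address.

    Assumes lines roughly shaped like MSVC map output (hypothetical snippet;
    real column widths and fields vary):
      0001:00000000  _main  0000000140001000 f  main.obj
    """
    entries = []
    for line in map_text.splitlines():
        m = re.match(r"\s*\d{4}:[0-9a-fA-F]{8}\s+(\S+)\s+([0-9a-fA-F]{8,16})",
                     line)
        if m:
            entries.append((m.group(1), int(m.group(2), 16)))
    entries.sort(key=lambda e: e[1])
    sizes = {}
    # The last symbol's size can't be computed this way (no successor).
    for (name, addr), (_, next_addr) in zip(entries, entries[1:]):
        sizes[name] = next_addr - addr
    return sizes
```

Note that the gap includes any padding the linker inserted between functions, so the numbers are upper bounds, not exact sizes.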
You can strip off lots of unnecessary stuff from the executable and compress it with utilities such as mew.
I've found this useful for examining executable sizes (although not for demoscene type things): http://aras-p.info/projSizer.html
I will say this: if you are using the standard library at all, then stop immediately. It is a huge code bloater. For example, each unique usage of std::sort adds around 5KB, and the numbers are similar for many of the standard containers (of course, it depends on which functions you use, but in general they add lots of code).
Also, I'm not into the demo scene, but I believe people use Crinkler to compress their executables.
Use your version control system to see what caused the increase. Going forward, I'd log the built exe size during the nightly builds. And don't forget you can optimize for minimal size with the compiler settings.
So I'm trying to figure out how to get my exe to have the same hash code/checksum when it's recompiled. I'm using FastSum to generate the checksum. Currently, no code changes are made, I'm just rebuilding the project in VS and the checksum comes out different. The code is written in c++.
I'm not familiar with using hash codes and/or checksums in this manner, but I did some research and read something about needing a consistent GUID. But I have no idea how that would tie into the checksum generation program...
Well, I'll leave it at that, thanks in advance.
Have you examined the differences between the exes? I suspect the compiler/linker is inserting the date or time into the binary, so each build differs from the last. It can be worse: sometimes compilers/linkers build static tables in their own process memory and then copy them into the binary. Say you have 9 bytes of something and, for alignment reasons, the compiler reserves 12 bytes in the binary; I have seen tools copy whatever 3 bytes happened to be in memory into the file. Ideally the tools would zero out the memory they use for such things so you get repeatable results.
Basically, do a binary diff between the files; you should then find out why they don't match.
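A binary diff can be as simple as listing the offsets where the two files disagree; a minimal Python sketch (dedicated diff tools give far richer output, but this is often enough to spot a timestamp field):

```python
def diff_offsets(a: bytes, b: bytes, limit: int = 10) -> list:
    """Return the first few byte offsets where two binaries differ."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):
        # One file is a prefix of the other; the first "extra" byte differs.
        diffs.append(min(len(a), len(b)))
    return diffs[:limit]
```

If the differing offsets cluster in the headers rather than the code, that points at timestamps or GUIDs rather than real code changes.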
From what I recall, the EXE format includes a build timestamp so a hash of the exe, including that timestamp, would change on each recompile.
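You can check this yourself by reading the COFF TimeDateStamp out of the executable's header. A minimal Python sketch, assuming a well-formed PE file (PE places this field at a fixed spot: the COFF header follows the "PE\0\0" signature, with the timestamp after the 2-byte Machine and 2-byte NumberOfSections fields):

```python
import struct

def pe_timestamp(pe_bytes: bytes) -> int:
    """Read the COFF TimeDateStamp the linker writes at build time."""
    pe_off = struct.unpack_from("<I", pe_bytes, 0x3C)[0]  # e_lfanew
    assert pe_bytes[pe_off:pe_off + 4] == b"PE\0\0"
    # COFF header: Machine(2) + NumberOfSections(2), then TimeDateStamp(4).
    return struct.unpack_from("<I", pe_bytes, pe_off + 8)[0]
```

Run it on two back-to-back builds and compare the values; if only this field (and perhaps a checksum derived from it) changed, the timestamp is your hash mismatch.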
Is this a managed binary? Managed binaries have a GUID section that changes from build to build and there's not much you can do to stop that.
You can get a better look at the changes in your binary by running "link /dump /all [filename]" or "link /dump /disasm [filename]". The /all option shows all the hex values along with their ASCII equivalents, while /disasm disassembles the code and shows it as assembly, which can be easier to read but may ignore trivial differences that caused the hash to change.
I want to print the code of a function in a DLL.
I loaded the dll, I have the name of the desired function, what's next?
Thank you!
Realistically, next is getting the code. What you have in the DLL is object code -- binary code in the form ready for the processor to execute, not ready to be printed.
You can disassemble what's in the DLL. If you're comfortable working with assembly language, that may be useful, but it's definitely not the original source code (nor probably anything very close to it either). If you want to disassemble it, loading it in your program isn't (usually) a very good starting point. Try opening a VS command line and using dumpbin /disasm yourfile.dll. Be prepared for a lot of output unless the DLL in question is really tiny.
Your only option for recovering hints about what that function actually does is to reverse engineer its binary representation. In practice that means using a disassembler (e.g., IDA Pro) or a debugger (e.g., OllyDbg) to translate the opcodes into assembly mnemonics, then working your way through them to understand how the function operates.
Note that since it was compiled from C/C++, lots of information is lost along the way due to optimization and the nature of the process; the resulting assembly can (and probably will) seem cryptic and senseless, but it still does its job exactly the way the programmer wrote it in the higher-level language. It won't be easy. It will take time. You will need luck and nerves. But it IS doable. :)
Nothing. A DLL is compiled binary code; you can't get the source just by downloading it and knowing the name of the function.
If this was a .NET assembly, you might be able to get the source using reflection. However, you mentioned C++, so this is doubtful.
Check out http://www.cprogramming.com/challenges/solutions/self_print.html, the question "Program that prints its own code?", and http://en.wikipedia.org/wiki/Quine_%28computing%29
I am not sure if it will do what you want, but I guess it may help you.
I am looking for a tool to simplify analysing a linker map file for a large C++ project (VC6).
During maintenance, the binaries grow steadily and I want to figure out where the growth comes from. I suspect some overzealous template expansion in a library shared between different DLLs, but just browsing the map file doesn't give good clues.
Any suggestions?
This is a wonderful analysis/explorer/viewer tool for compiler-generated map files. Check whether it can also explore GCC-generated map files.
amap : A tool to analyze .MAP files produced by 32-bit Visual Studio compiler and report the amount of memory being used by data and code.
This app can also read and analyze MAP files produced by the Xbox360, Wii, and PS3 compilers.
The map file should have the size of each section, you can write a quick tool to sort symbols by this size. There's also a command line tool that comes with MSVC (undname.exe) which you can use to demangle the symbols.
Once you have the symbols sorted by size, you can generate this weekly or daily as you like and compare how the size of each symbol has changed over time.
The map file alone from any single build may not tell much, but a historical report of compiled map files can tell you quite a bit.
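The historical comparison itself is simple once you have per-symbol sizes from each build. A sketch in Python, assuming you've already reduced each build's map file to a symbol-to-size dict:

```python
def size_deltas(old: dict, new: dict) -> dict:
    """Report per-symbol size changes between two builds' size reports.

    Symbols missing from one build are treated as size 0 there, so new
    symbols show up as positive deltas and removed ones as negative.
    """
    deltas = {}
    for name in set(old) | set(new):
        delta = new.get(name, 0) - old.get(name, 0)
        if delta:
            deltas[name] = delta
    return deltas
```

Sorting the result by absolute delta makes the culprits of a size regression obvious at a glance.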
Have you tried using dumpbin.exe on your .obj files?
Stuff to look for:
Using a lot of STL?
A lot of c++ classes with inline methods?
A lot of constants?
If any of the above applies to you, check whether those items have wide visibility, i.e. whether they are used/seen in large parts of your application.
No suggestion for a tool, but a guess as to a possible cause: do you have incremental linking enabled? This can cause expansion during subsequent builds...
The linker will strip unused symbols if you're compiling with /opt:ref, so if you're using that and not using incremental linking, I would expect expansion of the binaries to be only a result of actual new code being added. That's as far as I know... hope it helps a little.
Templates, macros, and the STL in general all use a tremendous amount of space. Heralded as a great universal library, Boost adds much space to projects. BOOST_FOR_EACH is an example of this: it's hundreds of lines of templated code that could simply be avoided by writing a plain loop by hand, which is in general only a few more keystrokes.
Get Visual AssistX if your goal is to save typing, rather than using templates. Also consider owning the code you use. Macros and inline function expansion are not necessarily going to show up.
Also, if you can, move away from DLL architecture to statically linking everything into one executable which runs in different "modes". There is absolutely nothing wrong with using the same executable image as many times as you want just passing in a different command line parameter depending on what you want it to do.
DLLs are the worst culprit for wasting space and slowing down a project's running time. People think they are space savers, when in fact they tend to have the opposite effect, sometimes increasing project size by ten times! Plus they increase swapping. Use fixed code sections (no relocation section) for performance.