Checking library integrity - c++

I created a script to remove useless code in many c++ libs (like ifdefs, comments, etc.)
Now, I want to compare the original lib and the "treated" lib to check if my script has done a good job.
The only solution I found is to compare the exported symbols.
I'm wondering if you have any other ideas to check the integrity?

First of all: unit tests are designed for this purpose.
You might also get some mileage out of
compiling both versions without optimization (-O0) and without debug information (or stripping it afterwards)
running objdump -dCS on each result
and comparing the disassemblies. Expect some, possibly many, spurious differences (the strip step is there to prevent needless differences in source line number info). In particular you will have to
ignore addresses
ignore generated label names
But if the transformation really leaves the generated code unmodified, you'd be able to verify it 1:1 using this technique and a little work.
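As a rough sketch of the "ignore addresses" step, each disassembly line could be normalized before diffing. The line shape assumed here (a leading "address:" column, branch targets written as "address <symbol+0xoffset>") is what objdump typically prints for x86, but treat it as an assumption; normalize_disasm_line is a made-up helper name, and raw instruction bytes are assumed to be suppressed (objdump has --no-show-raw-insn for that). Generated label names would need a similar rule of their own.

```cpp
#include <regex>
#include <string>

// Normalize one line of disassembly so that two builds can be compared
// without tripping over addresses.
// Assumed input shape: "  401136:\tcall   401110 <foo+0x4>".
std::string normalize_disasm_line(const std::string& line) {
    // Leading "<hex address>:" column.
    static const std::regex leading_addr(R"(^\s*[0-9a-fA-F]+:\s*)");
    // Absolute target address printed right before "<symbol...>".
    static const std::regex target_addr(R"(\b[0-9a-fA-F]+ <)");
    // Any remaining 0x-prefixed literal (offsets, immediates).
    static const std::regex hex_literal(R"(0x[0-9a-fA-F]+)");

    std::string out = std::regex_replace(line, leading_addr, "");
    out = std::regex_replace(out, target_addr, "ADDR <");
    out = std::regex_replace(out, hex_literal, "0xN");
    return out;
}
```

Replacing every 0x literal is deliberately coarse: it also hides genuine changes to immediates, so a clean diff after this step is a hint rather than proof.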

Assert-based unit tests would help you. Write some test cases, run them against the original library, and then run them again against the version with the code removed.

Related

How to export function names and variable names using GCC or clang?

I am making commercial software and I don't want it to be easily crackable. It is targeted at Linux and I am compiling it using GCC (8.2.1). The problem is that when I compile it, technically anyone can use a disassembler like IDA or Binary Ninja to see all function names. Here is an example (you can see function names on the left panel):
Is there any way to protect my program from this kind of reverse engineering? Is there any way of exporting all of these function names and variables from code automatically (with GCC or clang?), so I can make a simple script to change them to completely random names before compilation?
So you want to hide/mask the names of symbols in your binary. You've decided that, to do this, you need to get a list of them so that you can create a script to modify them. Well, you could get that list with nm but you don't need any of that (rewriting names inside a compiled binary? oof… recipe for disaster).
Instead, just do what everybody does in a release build and strip the symbols! You'll see a much smaller binary, too. Of course this doesn't prevent reverse engineering (nothing does), though it arguably makes said task more difficult.
Honestly you should be stripping your release binaries anyway, and not to prevent cracking. Common wisdom is not to try too hard to prevent cracking, because you'll inevitably fail, and at the cost of wasted dev time in the attempt (and possibly a more complex codebase that's harder to maintain / a more complex executable that is less fast and/or useful for the honest customer).

Reverse engineering your own code c++

I have a compiled program which I want to know if a certain line exist in it. Is there a way, using my source code, I could determine that?
Tony commented on my message so I'll add some info:
I'm using the g++ compiler.
I'm compiling the code on Linux(Scientific)/Unix machine
I only use standard library (nothing downloaded from the web)
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
objdump is a utility that can be used as a disassembler to view executable in assembly form.
Use this command to disassemble a binary,
objdump -Dslx file
It is important to note, though, that a disassembler can only show symbol names and source line information if symbolic debugging information is present in the object file (ELF), so that information should be present in your object files. Also, comments in the source code will not be part of the disassembled output, and constants appear only as raw values, not under their source names.
Summary
Use source code control and keep track of which source code revision the executable's built from... it should write that into the output so you can always cross-reference the two, checkout the same sources and rebuild the executable that gave you those results etc..
Discussion
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
For the very simplest case where you want all the MD simulations to be running the latest source, you can compare timestamps on the source files with the executable to see if you forgot to recompile, and compare the process start time (e.g. as listed by ps) with the executable creation time.
Where you're deliberately deploying multiple versions of the program and only have the latest source, then it gets pretty tricky. A multiplication will typically only generate a single machine code instruction... unless you have some contextual insight you're unlikely to know which multiplication is significant (or if it's missing). The compiler may generate its own multiplications for e.g. array indexing, and may sometimes optimise multiplications into bit shifts (or nothing, as Ira comments), so it's not as simple as saying 'well, it's my only multiplication in function "X"'. If you're printing a specific line that may be easier to distinguish... if there's a unique string literal you can search for it in the executable (e.g. puts("Hello") -> strings program | grep Hello, though that may get other matches too, and the compiler's allowed to reuse string literal sequences so "Well Hello" might cater to your need via a pointer to 'H' too). If there's a new extern symbol involved you might see it in nm output etc..
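If you'd rather do the strings-and-grep check from inside a program, a minimal sketch could look like this. file_contains is a hypothetical helper name, it reads the whole file into memory (fine for typical executables), and the same caveats as above apply: a match can come from another literal that merely contains the text.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true if the raw bytes of the file at `path` contain `needle`.
// An unreadable file is treated as "not found".
bool file_contains(const std::string& path, const std::string& needle) {
    std::ifstream in(path, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return bytes.find(needle) != std::string::npos;
}
```

For example, file_contains("./program", "Hello") answers the same question as strings program | grep Hello, minus the filtering for printable runs that strings performs.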
All that said (woah)... you should do something altogether different really. Best is to use a source control system (e.g. svn, cvs...), and get it configured so you can do something to find out which revision of the codebase was used to create the executable - it should be a FAQ for any revision control system.
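One common way to get the revision into the executable is to have the build pass it in as a macro. The sketch below assumes a macro called BUILD_REVISION (a made-up name) that the build system would define, for instance by passing the output of your revision control tool with -D on the compiler command line; how you obtain that string depends on the tool you use.

```cpp
#include <string>

// BUILD_REVISION is a hypothetical macro expected to be supplied by the build,
// e.g. something like: g++ -DBUILD_REVISION='"r1234"' ...
// Fall back to a placeholder so the code still compiles without it.
#ifndef BUILD_REVISION
#define BUILD_REVISION "unknown"
#endif

// Text the program would print or log once at startup, so every result file
// can be traced back to the sources that produced it.
std::string build_banner() {
    return std::string("built from revision ") + BUILD_REVISION;
}
```

Because the revision is a string literal in the binary, it can also be recovered later from the executable itself with strings.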
Failing that, you could, for example, do something to print out what multipliers or conditions the program was using when it starts running, capturing that in your logs. While hackish, macros allow you to "stringify" their parameters, so you can log and execute something without typing all the code twice. Lots of other options too.
Hope some of that helps....

How do I tell gcov to ignore un-hittable lines of C++ code?

I'm using gcov to measure coverage in my C++ code. I'd like to get to 100% coverage, but am hampered by the fact that there are some lines of code that are theoretically un-hittable (methods that are required to be implemented but which are never called, default branches of switch statements, etc.). Each of these branches contains an assert( false ); statement, but gcov still marks them as un-hit.
I'd like to be able to tell gcov to ignore these branches. Is there any way to give gcov that information -- by annotating the source code, or by any other mechanism?
Please use lcov. It hides gcov's complexity, produces nice output, allows detailed output per test, features easy file filtering and - ta-taa - line markers for already reviewed lines:
From geninfo(1):
The following markers are recognized by geninfo:
LCOV_EXCL_LINE
Lines containing this marker will be excluded.
LCOV_EXCL_START
Marks the beginning of an excluded section. The current line is part of this section.
LCOV_EXCL_STOP
Marks the end of an excluded section. The current line is not part of this section.
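In the source, the markers are just comments. A small illustration (the Shape enum and corner_count function are invented for the example):

```cpp
#include <cassert>

enum class Shape { Circle, Square };  // hypothetical example type

int corner_count(Shape s) {
    switch (s) {
        case Shape::Circle: return 0;
        case Shape::Square: return 4;
        // LCOV_EXCL_START
        default:
            assert(false);  // theoretically unreachable
            return -1;
        // LCOV_EXCL_STOP
    }
}
```

For a single line, putting // LCOV_EXCL_LINE at the end of that line does the same job without the start/stop pair.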
A tool called gcovr can be used to summarise the output of gcov, and (from at least version 3.4) it supports the same exclusion markers as lcov.
From this answer:
The following markers are recognized by geninfo:
LCOV_EXCL_LINE
Lines containing this marker will be excluded.
LCOV_EXCL_START
Marks the beginning of an excluded section. The current line is part of this section.
LCOV_EXCL_STOP
Marks the end of an excluded section. The current line is not part of this section.
You can also replace 'LCOV' above with 'GCOV' or 'GCOVR'. They all work.
Could you introduce unit tests of the relevant functions, that exist solely to shut gcov up by directly attacking the theoretically-unhittable code paths? Since they're unit tests, they could perhaps ignore the "impossibility" of the situations. They could call the functions that are never called, pass invalid enum values to catch default branches, etc.
Then either run those tests only on the version of your code compiled with NDEBUG, or else run them in a harness which tests that the assert is triggered - whatever your test framework supports.
I find it a bit odd though for the spec to say that the code has to be there, rather than the spec containing functional requirements on the code. In particular, it means that your tests aren't testing those requirements, which is as good a reason as any to keep requirements functional. Personally I'd want to modify the spec to say, "if called with an invalid enum value, the function shall fail an assert. Callers shall not call the function with an invalid enum value in release mode". Or some such.
Presumably what it currently says, is along the lines of "all switch statements must have a default case". But that means coding standards are interfering with observable behaviour (at least, observable under gcov) by introducing dead code. Coding standards shouldn't do that, so the functional spec should take account of the coding standards if possible.
Failing that, you could perhaps wrap the unhittable code in #if !GCOV_BUILD, and do a separate build for gcov's benefit. This build will fail some requirements, but conditional on your analysis of the code being correct, it gives you the confidence you want that the test suite tests everything else.
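A sketch of what that could look like, assuming GCOV_BUILD is defined to 1 only in the coverage build (for example with -DGCOV_BUILD=1); the Mode enum and apply function are invented for illustration:

```cpp
#include <cassert>

// Assumption: the coverage build defines GCOV_BUILD=1; every other build
// leaves it undefined, which is treated as 0 here.
#ifndef GCOV_BUILD
#define GCOV_BUILD 0
#endif

enum class Mode { Fast, Safe };  // hypothetical example type

void apply(Mode m, int& value) {
    switch (m) {
        case Mode::Fast: value *= 2; break;
        case Mode::Safe: value += 1; break;
#if !GCOV_BUILD
        default:
            assert(false);  // dead code, compiled out of the coverage build
            break;
#endif
    }
}
```

The coverage build then simply has no default branch to report as un-hit, at the price of not being the exact code you ship.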
Edit: you say you're using a dodgy code generator, but you're also asking for a solution by annotating the source code. If you're changing the source, can you just remove the dead code in many cases? Not that changing generated source is ideal, but needs must...
I do not believe this is possible. gcov depends on gcc to generate extra code to produce the coverage output. gcov itself just parses the data. This means that gcov cannot analyze the code any better than gcc (and I assume you use -Wall and have removed code reported as unreachable).
Remember that relocatable functions can be called from anywhere, potentially even external DLLs or executables, so there is no way the compiler can know which relocatable functions will not be called or what input these functions may have.
You will probably need to use some fancy static analysis tool to get the info that you want.

Can code formatting lead to change in object file content?

I have run a code formatting tool over my C++ files. It is supposed to make only formatting changes. Now when I build my code, I see that the size of the object file for some source files has changed. Since my files are very big and the tool has changed almost every line, I don't know whether it has done something disastrous. Now I am worried about checking this code in to the repo, as it might lead to runtime errors caused by the formatting tool. My question is: will the size of the object file change if the code formatting is changed?
The brief answer is no :)
I would not check your code into the repo without thoroughly checking it first (review, testing).
Pure formatting changes should not change the object file size, unless you've done a debug build (in which case all bets are off). A release build should be not just the same size, but barring your using __DATE__ and such to insert preprocessor content, it should be byte-for-byte the same as well.
If the "reformatting" tool has actually done some micro-optimizations for you (caching repeated access to invariants in local vars, or undoing your having done that unnecessarily), that might affect the optimization choices the compiler makes, which can have an effect on the object file. But I wouldn't assume that that was the case.
If the __LINE__ macro is used, the reformatted code might produce longer strings. How different are the sizes?
(This macro often hides in new and assert messages in debug builds.)
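To see why __LINE__ matters, here is a small sketch of an assert-style message macro (CHECK_MESSAGE and the two functions are made-up names). The line number is baked into a string literal, so moving the same statement to another line changes the data in the object file, and going from a two-digit to a three-digit line number even changes its size.

```cpp
#include <string>

// Two-step expansion so that __LINE__ is replaced by its number before '#'.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Hypothetical assert-style message: condition text, file and line all end up
// as string data in the object file.
#define CHECK_MESSAGE(cond) \
    ("check failed: " #cond " at " __FILE__ ":" STRINGIFY(__LINE__))

// Identical source text on two different lines yields two different strings.
inline std::string message_a() { return CHECK_MESSAGE(x > 0); }
inline std::string message_b() { return CHECK_MESSAGE(x > 0); }
```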
just formatting the code should not change the size of the object file.
It might if you compile with debugging symbols, as it might have added more line number information. Normally it wouldn't though, as has already been pointed out.
Try comparing object files built without debugging symbols.
Try to find a comparison tool that won't care about the formatting changes (like perhaps "diff --ignore-all-space") and check using that before checking in.

MAP file analysis - where does my code size come from?

I am looking for a tool to simplify analysing a linker map file for a large C++ project (VC6).
During maintenance, the binaries grow steadily and I want to figure out where the growth comes from. I suspect some overzealous template expansion in a library shared between different DLLs, but just browsing the map file doesn't give good clues.
Any suggestions?
This is a wonderful analysis/explorer/viewer tool for compiler-generated map files. Check whether it can also explore gcc-generated map files.
amap : A tool to analyze .MAP files produced by 32-bit Visual Studio compiler and report the amount of memory being used by data and code.
This app can also read and analyze MAP files produced by the Xbox360, Wii, and PS3 compilers.
The map file should have the size of each section, so you can write a quick tool to sort symbols by size. There's also a command line tool that comes with MSVC (undname.exe) which you can use to demangle the symbols.
Once you have the symbols sorted by size, you can generate this weekly or daily as you like and compare how the size of each symbol has changed over time.
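A rough sketch of the sorting part, under loud assumptions: parsing is left out because it depends on the map file format, the entries are assumed to be already extracted as name/address pairs from one contiguous section, and each symbol's size is estimated as the gap to the next address. MapSymbol, SizedSymbol and largest_symbols are invented names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct MapSymbol {    // hypothetical, already-parsed map file entry
    std::string name;
    std::uint64_t address;
};

struct SizedSymbol {
    std::string name;
    std::uint64_t size;  // estimated, see below
};

// Estimates each symbol's size as the distance to the next symbol's address
// and returns the symbols sorted largest first. The last symbol gets size 0
// because its end is unknown without the section size.
std::vector<SizedSymbol> largest_symbols(std::vector<MapSymbol> symbols) {
    std::sort(symbols.begin(), symbols.end(),
              [](const MapSymbol& a, const MapSymbol& b) {
                  return a.address < b.address;
              });
    std::vector<SizedSymbol> sized;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        const std::uint64_t size =
            (i + 1 < symbols.size())
                ? symbols[i + 1].address - symbols[i].address
                : 0;
        sized.push_back({symbols[i].name, size});
    }
    std::stable_sort(sized.begin(), sized.end(),
                     [](const SizedSymbol& a, const SizedSymbol& b) {
                         return a.size > b.size;
                     });
    return sized;
}
```

Dumping this list for every build and diffing the dumps gives the historical view of which symbols are growing.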
The map file alone from any single build may not tell much, but a historical report of compiled map files can tell you quite a bit.
Have you tried using dumpbin.exe on your .obj files?
Stuff to look for:
Using a lot of STL?
A lot of c++ classes with inline methods?
A lot of constants?
If any of the above applies to you, check whether they have wide visibility, i.e. whether they are used/seen in large parts of your application.
No suggestion for a tool, but a guess as to a possible cause: do you have incremental linking enabled? This can cause expansion during subsequent builds...
The linker will strip unused symbols if you're compiling with /opt:ref, so if you're using that and not using incremental linking, I would expect expansion of the binaries to be only a result of actual new code being added. That's as far as I know... hope it helps a little.
Templates, macros, and the STL in general all use a tremendous amount of space. Heralded as a great universal library, Boost adds much space to projects. BOOST_FOREACH is an example of this. It's hundreds of lines of templated code, which could simply be avoided by writing a proper loop by hand, which is in general only a few more keystrokes.
Get Visual AssistX to save typing, rather than using templates for that. Also consider owning the code you use. Macros and inline function expansion are not necessarily going to show up.
Also, if you can, move away from DLL architecture to statically linking everything into one executable which runs in different "modes". There is absolutely nothing wrong with using the same executable image as many times as you want just passing in a different command line parameter depending on what you want it to do.
DLLs are the worst culprit for wasting space and slowing down the running time of a project. People think they are space savers, when in fact they tend to have the opposite effect, sometimes increasing project size by ten times! Plus they increase swapping. Use fixed code sections (no relocation section) for performance.