Prevent from reverse engineering C++ binary


I have read several articles on the topic, such as this one, and implemented most of the techniques they describe. But I also want to add some extra unreferenced, never-used code to the binary. Ideally I want to be able to add this code to the already-built binary with a tool. Is there such a tool? Any ideas on how to build one? Or how to generate some never-used code and add it to my C++ program? Where should I put it?
In an analysis of Skype's internals I read that they obfuscate the code as much as possible. One way of achieving this is to compute each call dynamically:
if (sin(a) == 42) {
    do_dummy_stuff();
}
Should execution ever enter the dummy function? Or is the dummy function the never-used code?
Update: the reason I want to add never-used code to the binary is that we ship many e-books. I want each binary to be a little different, so that if one is compromised, the others are not (at least not right away).
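A minimal sketch of the idea in the question, assuming a GCC/Clang-style toolchain: the dummy function is the never-used code, and the condition is an opaque predicate chosen so it can never be true (|sin(a)| is at most 1), so execution never enters the branch. The `BUILD_SEED` macro, the body of `do_dummy_stuff` and `real_work` are hypothetical, added only to show how a per-build `-D` value could make each shipped binary slightly different. An optimizer that knows the range of `sin` may still delete the branch, so it is worth checking the disassembly of each build.

```cpp
#include <cmath>
#include <cstdio>

// BUILD_SEED is a hypothetical macro: give each shipped binary its own value,
// e.g. g++ -O2 -DBUILD_SEED=12345u app.cpp, so the dummy code differs per build.
#ifndef BUILD_SEED
#define BUILD_SEED 1u
#endif

// The never-used code. It is reachable in the control-flow graph, so the
// compiler normally keeps it, but the predicate below means it never runs.
static void do_dummy_stuff(unsigned seed) {
    unsigned x = seed;
    for (int i = 0; i < 16; ++i) {
        x = x * 1103515245u + 12345u;  // arbitrary filler arithmetic
    }
    std::printf("%u\n", x);
}

// Opaque predicate: sin(a) always lies in [-1, 1], so this is always false.
static bool never_true(double a) {
    return std::sin(a) == 42.0;
}

int real_work(int v) {
    // Reading through a volatile stops the compiler from constant-folding
    // sin() for a known argument; a compiler that knows sin's range could
    // still drop the branch, so inspect the generated code.
    volatile double a = v;
    if (never_true(a)) {
        do_dummy_stuff(BUILD_SEED);  // never executed
    }
    return v * 2;  // the real work
}
```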

If I understand you correctly, you are talking about obfuscation.
This question on Stack Overflow covers the topic. There is a lot of software that obfuscates C++ code; a quick search turns up many such applications, e.g. this or this.

Is there such a tool?
Yes, there is. It is called a compiler with the proper parameters, plus a linker. Add strip to this combination and you'll get a proper library.
On a serious note, there is no way to prevent reverse engineering. You can only make it harder (or, better, more annoying) for the cracker. You can take a look at this article (where the developers of Spyro tried all sorts of piracy protection).
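A hedged sketch of the "compiler, linker and strip" point, assuming a shared library built with GCC or Clang on an ELF platform: with default visibility, every function with external linkage goes into the dynamic symbol table, which strip keeps because the dynamic linker needs it. Giving helpers internal linkage (an anonymous namespace, or `-fvisibility=hidden` for whole files) keeps their names out of that table, so after strip only the entry points remain named. `checksum_ok` and `is_valid_key` are hypothetical names, and the check itself is a placeholder, not a protection scheme.

```cpp
#include <string>

namespace {
// Internal linkage: never exported, so once strip removes the static symbol
// table nothing in the binary names this function (it may also be inlined).
// Hypothetical placeholder check, for illustration only.
bool checksum_ok(const std::string& key) {
    unsigned sum = 0;
    for (unsigned char c : key) sum += c;
    return sum % 97 == 0;
}
}  // namespace

// External linkage: the only name the library actually has to export.
bool is_valid_key(const std::string& key) {
    return checksum_ok(key);
}
```

Running `nm -D` on the stripped library should then show the mangled `is_valid_key` but not `checksum_ok`.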

Related

C++ function dependency graph [duplicate]

Possible Duplicate:
C++ code dependency / call-graph “viewer”?
I am working on a huge C++ code base and am currently stuck on the problem of modularizing my code. I have to divide my code into separate, independent modules.
One approach I can think of is to generate the dependency graph and then do a higher-level categorization. Another approach is to start with an entry point (some function abc()) and generate a function call tree in which each node contains the name of the file where that function resides. Thereafter I can use some script to extract those functions into separate modules.
My question is: is there any tool that can help me with this task? Has anybody faced such a problem before? Or can you suggest any approach to achieve this?
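One way to sketch the second approach from the question, assuming the caller/callee edges and a function-to-file map have already been exported by some tool (the data, `CallGraph` and `reachable_by_file` are hypothetical): walk the graph from the entry point and group everything reachable by its source file, which gives a first candidate list for a module.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Call graph as an adjacency list: caller -> callees. In practice the edges
// would come from a tool's output; here they are supplied by hand.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Every function reachable from `entry`, grouped by the file that defines it.
// `file_of` maps function name -> source file (also assumed to come from a tool).
std::map<std::string, std::set<std::string>>
reachable_by_file(const CallGraph& calls,
                  const std::map<std::string, std::string>& file_of,
                  const std::string& entry) {
    std::set<std::string> seen;
    std::vector<std::string> stack{entry};
    while (!stack.empty()) {
        std::string fn = stack.back();
        stack.pop_back();
        if (!seen.insert(fn).second) continue;  // already visited (handles cycles)
        auto it = calls.find(fn);
        if (it == calls.end()) continue;        // leaf or external function
        for (const auto& callee : it->second) stack.push_back(callee);
    }
    std::map<std::string, std::set<std::string>> by_file;
    for (const auto& fn : seen) {
        auto f = file_of.find(fn);
        by_file[f == file_of.end() ? "<unknown>" : f->second].insert(fn);
    }
    return by_file;
}
```

Functions that never show up from any entry point are candidates for dead code rather than for a module.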

The first level of modularization - and I hope you have already done that - is structuring your code in classes. If your code is merely a collection of functions - i.e., C plus some syntactic sugar - then it's high time to rewrite your code, because no amount of dependency-graph-building will save you from maintenance hell.
If you have classes, modularizing should be a matter of finding classes that closely work together (like Customer, Order and Invoice) and separating them from classes that are only loosely coupled with them (like Employer or Facility). Then take it from there.
Modularizing code is something that requires, first and foremost, thought. No automatic procedure is a replacement for that. Actually, from what little you wrote, I would fear that any automated process would make things worse, because apparently too little thought has been invested in this project already. I.e., you wrote 1 million lines of code without thinking about modularization, and now you want modularization to happen while still not actually thinking about it. You are heading for an epic fail.

To get an overview, Doxygen might help. But you have to play around a little with the Doxyfile settings to generate dependency graphs, and if your code base is huge you should disable the dynamic stuff in the generated methods.
Doxygen can create include, inheritance, call, and caller graphs using Graphviz.
Here are simple examples, but it also works for bigger ones.
But Doxygen will only give you an overview; it has no refactoring capabilities.

I regularly use "Understand for C/C++" to investigate these kinds of dependencies.
If the code base is really huge and you are starting your modularization from scratch, you might want to look at some other tools, like:
Cytoscape (which can take the output of "Understand for C/C++" to visualize the dependencies)
Lattix

It sounds like you are looking for a refactoring tool. Try taking a look at the answers on this question: Is there a working C++ refactoring tool?

One method is a bit time-consuming, but what you can do is remove a method and compile to find its dependencies, then group those dependencies into one component. Although this does not fully resolve your issue, it is an approach to start with.
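The grouping step can be sketched with a union-find structure, assuming you record each "A needs B" pair by hand as the compiler complains about the removed method. `Components` and the class names below (borrowed from the Customer/Order example in an earlier answer) are illustrative only; each recorded pair merges two groups, and whatever ends up in the same group is a candidate component.

```cpp
#include <map>
#include <string>
#include <vector>

// Union-find over names: each recorded dependency merges two groups.
class Components {
public:
    void link(const std::string& a, const std::string& b) {
        std::string ra = find(a);
        std::string rb = find(b);
        if (ra != rb) parent_[ra] = rb;
    }

    // Representative of x's group; unseen names start in their own group.
    std::string find(const std::string& x) {
        auto it = parent_.find(x);
        if (it == parent_.end()) {
            parent_[x] = x;
            return x;
        }
        if (it->second == x) return x;      // x is a root
        const std::string next = it->second;  // copy before recursing
        std::string root = find(next);
        parent_[x] = root;                  // path compression
        return root;
    }

    // Every name seen so far, grouped by its representative.
    std::map<std::string, std::vector<std::string>> groups() {
        std::vector<std::string> names;
        for (const auto& kv : parent_) names.push_back(kv.first);
        std::map<std::string, std::vector<std::string>> out;
        for (const auto& n : names) out[find(n)].push_back(n);
        return out;
    }

private:
    std::map<std::string, std::string> parent_;
};
```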

Is it possible for programmer to analyze unknown code fast?

I got a task related to an ANCIENT C++ project which has no documentation or comments at all, and all the code/variables are written in a foreign language. Do I have a chance to analyze this code in 1 working day and make a design/UML to create new features? I have been sitting on it for 3 hours already and I feel so frustrated... Maybe somebody else has had the same problem? Any advice?
BR,

I suspect the biggest issue may be the fact that it's in a foreign language. You can use various static code analysis tools to try and understand what's going on, but if everything is presented in an unfamiliar language, then that's still of no use. Your first step (I believe) is to find someone who speaks this language and get them to translate as you go...

1) Use Doxygen. You can configure Doxygen to extract the code structure from undocumented source files.
2) Use Source Insight. Source Insight is an advanced code editor and browser with built-in analysis for C/C++, C#, and Java programs.

Short answer, no - you probably don't have a chance to understand the code in one day. Reading/maintaining code is one of the hardest things to do, especially when it's lacking documentation. The fact that the code is in a foreign language (!) makes it even harder.
It sounds like you are on a very restricted (unrealistic) time budget, but Working With Legacy Software is a good book if you're working with legacy systems. If you are planning to keep adding new features to the legacy system, it's your responsibility to make your management aware of the scope of the operation. Or at least try.

Under this time constraint (1 day) it may or may not be doable, depending on the size of the project: if it's a few hundred lines of code, then for sure; if it's a serious project with several tens of thousands of lines of code, then likely not.
The first thing you need to know is what this program is supposed to do at all. If you have no idea what it does and how it does it, then analyzing the code will give you the answer, but it will be a long and frustrating task. So my first suggestion would be to get familiar with the outer workings of the software: what it is supposed to do and, generally, how it is supposed to do it. If you are doing this as part of your work, you should be able to get someone to walk you through using the program, even if its UI is in a foreign language (which I hope it isn't, even if the code was written by a foreign-language speaker).
Once you know what the software is attempting to do, it should be fairly straightforward (even if lengthy and daunting) to rewrite all the comments in your own language so you can understand them. I suggest doing so bottom-up: it's easier to understand the small and trivial things a program does than to understand the top-level logic, and a lot of trivial things, in order, make up the logic of the software.
Only once you understand the inner workings of the program (to a large degree, anyway) may you write its functional spec and work on features.

Non-free way on Windows:
You can use CppDepend. This application is able to parse your Visual Studio project or your source files. It gives you a lot of information, like dependency trees. You can try the trial (maybe it will be enough for what you have to do).
Free way multi-platform:
You can use doxygen with a special configuration (extract code structure from undocumented code) and analyze the result.

I was quite happy with a tool called Understand (15-day eval license available) for this kind of task. However, I agree with Guss that the time you'll need depends a lot on the size of the code, and one day is probably just enough for a small program.

cscope & ctags are a must when I work on my own code, and even more so when looking at other people's code.

You may also try this:
http://www.sgvsarc.com/product_crystalflow.htm

Visualizing C++ to help understanding it

I'm a student learning C++ at school. We are using Dev-C++ to make little, short exercises. Sometimes I find it hard to know where I made a mistake or what's really happening in the program. Our teacher taught us to make drawings. They can be useful when working with linked lists and pointers, but sometimes my drawing itself is wrong.
(example of a drawing that visualizes a linked list: nl.wikibooks.org/wiki/Bestand:GelinkteLijst.png )
Is there any software that could interpret my C++ code/program and visualize it (making the drawings for me)?
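A small sketch of what such a drawing tool does in the simplest case, assuming a hand-written singly linked list (`Node` and `draw` are hypothetical, not from any library): it walks the `next` pointers and prints one box per node, with a cap on the node count so an accidental cycle does not loop forever. Printing a list this way after each operation, or inspecting it in a debugger, can reveal a wrong mental drawing early.

```cpp
#include <sstream>
#include <string>

struct Node {
    int value;
    Node* next;
};

// Text "drawing" of a singly linked list, e.g. "[1] -> [2] -> nullptr".
// Stops after max_nodes nodes in case the list accidentally has a cycle.
std::string draw(const Node* head, int max_nodes = 100) {
    std::ostringstream out;
    int count = 0;
    for (const Node* p = head; p != nullptr; p = p->next) {
        if (count++ == max_nodes) {
            out << "... (cycle?)";
            return out.str();
        }
        out << '[' << p->value << "] -> ";
    }
    out << "nullptr";
    return out.str();
}
```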
I found this: link text
other links:
cs.ru.ac.za/research/g05v0090/images/screen1.png and
cs.ru.ac.za/research/g05v0090/index.html
That looks like what I need, but it is not available for download. I tried to contact that person but got no answer.
Does anybody know such software? Could be useful for other students also I guess...
Kind regards,
juFo

This is unrelated to the actual title but I'd like to make a simple suggestion concerning how to understand what's happening in the program.
I don't know if you've looked at a debugger, but it's a great tool that can vastly improve your understanding of what's going on. Depending on your IDE, it'll have more or fewer features; some of them should include:
seeing the current call stack (allows you to understand what function is calling what)
seeing the current accessible variables along with their values
allowing you to walk step by step and see how each value changes
and many, many more.
So I'd advise you to spend some time learning all about the debugger for your particular IDE and to start using all of these features. There's sometimes a lot more to it than simply clicking Next. Some debuggers offer dynamic code evaluation, going back in time, etc.

Have a look at DDD. It is a graphical front-end for debuggers.
Try debuggers in general to understand what your program is doing, they can walk you through your code step-by-step.

Doxygen has, if I recall, a basic form of this but it's really only a minor feature of a much bigger library, so that may be overkill for what you want. (Though it's a great program for documentation!)

Reverse engineering the code to some sort of diagram, will have limited benefit IMO. A better approach to understanding program flow is to step the code in the debugger. If you don't yet use a debugger, you should; it is the more appropriate tool for this particular problem.
Reverse engineering code to diagrams is useful when reusing or maintaining undocumented or poorly documented legacy code, but it seldom exposes the design intent of the code, since it lacks the abstraction that you would use if you were designing the code. You should not have to resort to such things on new code you have just written yourself! Moreover, tools that do this even moderately well are expensive.
Should you be thinking you can avoid design, and just hand in an automatically generated diagram, don't. It will be more than obvious that it is an automatically generated diagram!

What is the most common way of understanding a very large C++ application? [closed]

When having a new C++ project passed along to you, what is the standard way of stepping through it and becoming acquainted with the entire codebase? Do you just start at the top file and start reading through all x-hundred files? Do you use a tool to generate information for you? If so, which tool?

I use change requests/bug reports to guide my learning of a new project. It never makes a lot of sense to me to try to consume the entirety of something all at once. A change order or bug report gives me guidance to focus on one tendril of the system, tracing its activity through the code.
After a reasonable amount of these, I can get a good understanding of the fundamentals of the project.

Here's my general process:
Start by understanding what the application does and how it's used. (I see way too many developers completely skip this critical step.)
Search for any developer documentation related to the project. (However, realize this will nearly always be wrong and out of date - it just will have helpful clues.)
Try to figure out the logic of the organization. How is the main architecture defined? What large-scale patterns are used? (e.g., MVC, MVP, IoC, etc.)
Try to figure out the main classes related to the "large" objects in the project. This helps for the point above.
Slowly start refactoring and cleaning up as you try to maintain the project.
Usually, that will get me at least somewhat up to speed. However, usually I end up given a project like this because something has to be fixed or enhanced, and timing isn't always realistic, in which case I often just have to jump in and pray.

Start working on it, perhaps by adding a small feature.
Step through application startup in the debugger.

You could try running it through Doxygen to at least get a browsable set of documentation, but basically the only way is a debugger, some trace/std::cerr messages, and a lot of coffee.
The suggestion to write test cases is the basis of Working Effectively with Legacy Code and the point of the CppUnit test library. Whether you can take this approach depends on your team and your setup: if you are the new junior, you can't really rewrite the app to support testing.

Try writing unit tests for the various classes.

There is one tool I know of that may help you: CppDepend, currently in beta, which will help you understand the relations between the classes and the projects in the solution.
Other than that you can try to understand the code by reading it:
Start with the header (.h/.hpp) files, reading them would help understand the "interfaces" between the classes
If the solution has several projects, try to understand the responsibility of each project.
Find someone who is familiar with the project who could give you an overview; 5 minutes with the right person can save you an hour with the debugger.

Understanding how the code is used is usually very helpful.
If this is a library, look at client code and unit tests. If there aren't any unit tests, write some.
If this is an application, understand how it works - in detail. Again read & write unit tests.
Essentially, it's all about the interfaces. Understand the interfaces and you'll go a long way towards understanding how the code works. By interface, I mean the API if it's a library, the UI if it's a graphical application, and the content of the inbound & outbound messages if it's a server.

Firstly how large is large?
I don't think you can answer this without knowing the other half of the scenario. What is the requirement for changing the code?
Are you just supporting/fixing it when it goes wrong? Developing new functionality? Porting the code to a new platform? Upgrading the code for a new C++ compiler?
Depending on what your requirement is I would start in different ways.

Here's how I approach the problem
Start by fixing easy bugs. Be extremely diligent with these bugs and use the debugger heavily to find the problem.
Code review every change that goes into the system. On an unbelievably large system, pick a smaller subset and review all of these changes
And most importantly: Ask a lot of questions!

Things to do:
Look at what the sales brochure tells you it does, set the scope of your expectations
Install it, what options do you have in the installer, read the quick start/install guide
Find out what it does, does it even execute, do you have multiple executables
Is there a developer setup guide/wiki, pointers to VCS
Get the code and make your build environment work; document the SDKs and build tools you need if that isn't done already
Look at the build process and project dependencies; is there a build machine/CI service?
Look at generated doc output (if there is any!)
Find an interesting piece of the solution and see how it works, what are the entry points/ how does it work/look for main classes and interfaces
Replicate bugs, stop at interesting features in the program to get an overview and work down to tracing code.
Start to fix things, but make sure you are really fixing them by having appropriate unit tests that show it is broken now and that it is fixed afterwards.

I have been incorporating source code from some mid-sized projects. The most important lesson I learned from this process is that, before going into the source code, you must be sure which part of it interests you most. You should then go into that piece by grepping for logging/warning messages or looking at class/function names. To understand the source code, run it in a debugger or insert your own warning messages. In all, you should focus on the things you are interested in. The last thing you want is to read all the source code.

Try generating a documentation using Doxygen or something similar if it wasn't done already.
Walk through the API and see if there is something that is unclear to you and look at the code, if you still don't get it ask a developer who already worked on it before.
Always examine whatever you have to work on first.
Take a look at whatever UML documents you've got, if you don't have any:
Smack the developer/s who worked on it. It's a shame they didn't do something as basic as UML class diagrams.
Try to generate them from the code. They will not be accurate, but they will give you a head start.
If there is something specific that you don't understand or think is wrong, ask the team who developed it. They will probably know better.

Fixing bugs works just fine for any project, not just a C++ one.

Browse around in the file hierarchy with Total Commander, try getting an overview of the structure. Try identify where the main header files are located. Also find the file where the main() function is located.

Ask a person who is already familiar with the codebase to outline the basic concepts that were used during development.
He doesn't need to explain every detail, but should give you a rough idea of how the software works and how the individual modules are connected with each other.
Additionally, what I've found useful in the past is to first set up a working development environment before starting to think about the code.

Read the documentation. If possible, speak with the former maintainer. Then, check out the code bases from the first commit and the first release from the VCS and spend some time looking at them. Don't go for full understanding yet, just skim and understand which are the major components and what they do. Then read the change logs and the release notes for each of the major releases. Then start breaking everything and see what breaks what. Do some bug fixes. Review the test suite and understand which component each test is focused on. Add some tests. Step through the code in a debugger. Repeat.

As already said, grab Doxygen and build HTML documentation for the source code.
If the code is well designed, you'll easily see a nice class hierarchy, clear call graphs, and many other things that would otherwise take ages to uncover. When the behavior of certain parts appears unclear, look at the unit tests or write your own.
However, if the structure appears to be flat, or messy, or both together, you may find yourself in some sort of trouble.

I'm not sure there is a standard way. There are some for-pay tools that will do C++ class diagrams/call graphs and provide some kind of code-level view. doxygen is a good free one. My low-tech approach is to find the top-level file and start to sort through what it provides and how...taking notes if needed.

In C++, the most common problem is that a lot of energy and time is wasted on low level tasks, such as "memory management".
Things that are no-brainers in managed languages are a pain to do in C++.

How to start modification with big projects

I have to do enhancements to an existing C++ project with above 100k lines of code.
My question is: how and where do I start with such projects?
The problem increases further if the code is not well documented.
Are there any automated tools for studying code flow with large projects?
Thanks,

Use Source Control before you touch anything!

There's a book for you: Working Effectively with Legacy Code
It's not about tools, but about various approaches, processes and techniques you can use to better understand and make changes to the code. It is even written from a mostly C++ perspective.

First study the existing interface well.
Write tests if they are absent, or expand already written ones.
Modify the source code.
Run tests to check if the modification somehow breaks the older behaviour.
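The four steps above can be sketched as a characterization test, in the spirit of Working Effectively with Legacy Code: record what the code returns today (not what it "should" return) and fail if a modification changes it. `legacy_discount` is a hypothetical stand-in for the real legacy code, and a real project would more likely use a test framework such as CppUnit or Google Test than a hand-rolled loop.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-in for an existing legacy function whose exact
// behaviour must be preserved while you modify the code around it.
int legacy_discount(int amount, bool member) {
    if (amount > 100) amount -= amount / 10;
    if (member) amount -= 5;
    return amount;
}

struct Case { int amount; bool member; int expected; };

// Characterization test: outputs recorded from the current code.
bool behaviour_unchanged() {
    const std::vector<Case> recorded = {
        {50, false, 50}, {50, true, 45}, {200, false, 180}, {200, true, 175},
        {101, false, 91},  // 101 - 101 / 10 = 101 - 10 = 91
    };
    bool ok = true;
    for (const auto& c : recorded) {
        int got = legacy_discount(c.amount, c.member);
        if (got != c.expected) {
            std::printf("changed: (%d, %d) -> %d, was %d\n",
                        c.amount, c.member, got, c.expected);
            ok = false;
        }
    }
    return ok;
}
```

Run it before and after every modification; any line it prints points at behaviour the change has altered.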

There is another good book, currently freely available on the net, about object oriented reengineering : http://www.iam.unibe.ch/~scg/OORP/

The book "Code Reading" by Diomidis Spinellis contains lots of advice about how to gain an overview and in-depth knowledge about larger, unknown projects.
Chapter 6 focuses solely on that topic (Tackling Large Projects). The chapters about tooling (Ch. 9) and architecture (Ch. 8) might also contain useful hints for you.
However, the book is about understanding (by reading) the "code". It does not tackle directly the maintenance step.

First thing I would do is try to find the product's requirements.
It's almost unthinkable that a product of this size would be developed without requirements.
By perusing the requirements, you'll be able to:
get a sense of what the product (and hence the code) is at least supposed to be doing
see just how well (or poorly) the code actually fulfills those requirements
Otherwise you're just looking at code, trying to divine the intention of the developers...

If you are able to run the code on a PC, you can try to build a call graph, usually from profiling output.
Cross-referencing tools like cscope, ctags, lxr, etc. can also help a lot.
Spending some time reading, building class diagrams or even adding comments to the parts of the code you took long to understand are steps towards getting familiar with the codebase and getting ready to modify/extend it.

The first thing you need to do is understand how the code works. Read what documentation there is and then watch the program operate under a debugger. If you watch the main function/loop and then slowly work your way deeper into the program, you can gain a pretty good idea how things are operating. Make sure you write down your findings so others who follow after you have a better position to start from.

Run Doxygen with the EXTRACT_ALL tag set to document all the relationships in the code base. It's not going to help you with the code flow, but hopefully it will shed some light on the structure and design of the entire application.

A very good Austrian programmer once told me that in order to understand a program, you first have to understand the data structures it uses.
