Any way or ideas to protect or sign source code? - C++

This is probably a strange question. My project involves a few other people who need to work on the code too. I'm not sure how careful they would be with it, and I don't want it to leak. For this reason I split it into two parts: one is in the form of a library, the rest just plain source code. There is one other guy who needs everything, so he also has the source to the library. I don't want him to make any changes to the library. I put in a version number that gets printed when everything is running, but I have no way of knowing (from looking at the logs) whether the library is authentic (i.e., built by me alone).
I was hoping there is some way I could use a public/private-key signature or something like it - but a signature of what? I probably can't just calculate an MD5 hash of the final executable either, because the linker probably puts the library functions in different places every time.
I realize it's probably not feasible to sign and verify source code, but I would be curious to hear if anybody has any ideas.
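For illustration, the kind of check I had in mind (a rough sketch only: it assumes OpenSSL is available and uses "libcore.a" as a stand-in for the library's real name) would hash the library archive itself, before linking, since those bytes stay stable no matter where the linker later places the functions:

    // Sketch: compute SHA-256 of the library archive (not the linked
    // executable). Build with: g++ hashlib.cpp -lcrypto
    #include <openssl/evp.h>
    #include <cstdio>
    #include <fstream>
    #include <vector>

    int main() {
        std::ifstream in("libcore.a", std::ios::binary);  // placeholder name
        if (!in) { std::fprintf(stderr, "cannot open library\n"); return 1; }

        EVP_MD_CTX* ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, EVP_sha256(), nullptr);

        std::vector<char> buf(4096);
        while (in.read(buf.data(), buf.size()) || in.gcount() > 0)
            EVP_DigestUpdate(ctx, buf.data(), static_cast<size_t>(in.gcount()));

        unsigned char digest[EVP_MAX_MD_SIZE];
        unsigned int len = 0;
        EVP_DigestFinal_ex(ctx, digest, &len);
        EVP_MD_CTX_free(ctx);

        for (unsigned int i = 0; i < len; ++i)
            std::printf("%02x", digest[i]);  // log this next to the version number
        std::printf("\n");
        return 0;
    }

A public/private-key signature over that digest, shipped next to the library, would then let me check that the archive really came from me, which a bare hash alone cannot prove.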

You can use one of the VCS (version control systems) listed here.
In my experience, GitHub is easy to work with.

Related

Are there any good reasons for using a makefile?

In C++ I can achieve the same result by using a shell script in which I write all the compilation commands. So my question is:
Are there any good reasons for using a makefile?
Do you have any examples to demonstrate this?
One of the main reasons to use a makefile is that it will recompile only the source files which have changed since the last time you built your project. Writing a shell script to do this will take much more work than writing the makefile.
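As a minimal sketch of that (file and target names made up here): each rule lists a target's prerequisites, and make rebuilds the target only when a prerequisite is newer than it.

    # Hypothetical two-file project. Recipe lines must start with a tab.
    CXX      = g++
    CXXFLAGS = -Wall -O2

    app: main.o util.o
    	$(CXX) $(CXXFLAGS) -o app main.o util.o

    main.o: main.cpp util.h
    	$(CXX) $(CXXFLAGS) -c main.cpp

    util.o: util.cpp util.h
    	$(CXX) $(CXXFLAGS) -c util.cpp

Run make twice and the second run does nothing; touch util.h and only the two objects that list it as a prerequisite get recompiled, then relinked.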
Wear and tear on the keyboard.
Preventing it from taking ages to recompile everything
Easier to switch between compiling for debugging and for production
As for examples - see most GNU projects written in C/C++.
You might want to take a look at the autotools. They will make a Makefile for you and can help with code portability as well. However, you have to write some relatively simple template files that the autotools use to construct the configure script, so that an end user can run ./configure [options]; make. They provide many features in your makefile that an end user might expect. For a good introduction see: http://www.freesoftwaremagazine.com/articles/brief_introduction_to_gnu_autotools
Let's say you do write a shell script. It will work and you will be happy. You will keep using it every chance you get. You will add parameters to it to allow you to specify options. You will also notice that it re-compiles everything, all the time. So you will then try and make it smarter so it only re-compiles the files that have changed. What you will be doing, in effect, is writing your own make system.
That's fine as long as you had a good reason to do it. For example: Existing make solutions don't do X well, so you wrote one to solve that problem.
You, however, don't have a problem that cannot be solved by an existing make system (or at least, it sounds like you don't :) ). The problem you're trying to solve has already been solved. Just read up and use the solution - a makefile :)
So, to answer your question, yes, there are a lot - most of which you won't be aware of until you need the functionality. When you do, you will be grateful it already does what you want.
It's the same logic you apply to using libraries in code.

Discovering Symbol Usage

Issue
I have recently found myself working with a large, unfamiliar, multi-department C++ codebase in need of better organization. I would like to find a way to map which symbols are used by which source files for any given header, in the hope that if only one department uses a given function, it can be moved out of the shared area and into that department's area.
Attempts
My first thought was to use the symbol table: i.e., compile the project and dump the symbols for each object file. From there I figured I could simply write a script to check whether the symbols from my header file were used. While this approach seems viable, it would require me to create a list of the symbols I am looking for from the headers. With my limited knowledge, I am unsure how to automate such a process, and with hundreds of header files to test, doing it manually is out of the question.
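To make the idea concrete, the sort of automation I have been imagining looks like this (a rough sketch only: Linux/GCC, and dept_a/foo.o is a placeholder path). It dumps the undefined symbols an object file references, i.e. the names it uses but does not define:

    // Sketch: list the symbols one object file imports, via nm.
    // -C demangles C++ names; --undefined-only shows uses, not definitions.
    #include <cstdio>
    #include <iostream>
    #include <string>

    int main() {
        FILE* p = popen("nm -C --undefined-only dept_a/foo.o", "r");
        if (!p) return 1;

        char line[1024];
        while (std::fgets(line, sizeof line, p)) {
            std::string s(line);
            std::size_t pos = s.find(" U ");     // nm format: "<spaces> U <name>"
            if (pos != std::string::npos)
                std::cout << s.substr(pos + 3);  // symbol name (newline included)
        }
        pclose(p);
        return 0;
    }

The matching list of definitions could come from nm --defined-only on the objects or library that implement the header in question, leaving only a set intersection to script. I don't know whether this scales to hundreds of headers, though.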
Questions
Is my approach valid? If so..
What can I use to generate the symbol names from my header file?
If not..
What else can I do?
Additionally, while I am using Linux, most of the development teams work in Windows only environments. What utilities could I use on both platforms?
Any and all help is greatly appreciated.
When I need to clean up APIs I sometimes use information from callcatcher. It basically builds a database of all symbols while compiling and allows you to determine what symbols are used in some build product.
I sometimes also use DXR (code on GitHub, an example installation) to browse where code is defined and how it is used. In contrast to callcatcher, DXR lets you drill down to much finer detail. Setting up DXR is pretty heavy-duty, but it might be worth it if you have enough code to work with.
On the other side of the spectrum there are tools like cscope. Even though it doesn't work super nicely with C++ code, it is still very useful. If you deal with more than a couple hundred kLOC, though, you will quickly feel its limits.
If I had to pick only one of these tools and were working on a large code base (>1 MLOC), I would definitely pick DXR.
You can get a reasonable start on the information that you've described by using doxygen.
Even for source that doesn't contain doxygen-formatted comments, the generated documentation can contain a list of places (i.e., source files) where a particular symbol is used.
And, as doxygen can generate HTML documentation, navigating through your source tree becomes trivial. It can be even better if you enable the dot functionality to generate relationship diagrams for the classes in your source tree.
Very old-school, simple, and possibly Unix-only, but are you aware of etags? There's also GNU Global, which I think is similar.
The GNU Global link refers to the "comparison with similar tools" discussion here, which might also be useful.

Changing parts of compiled binaries

(I learned English as a second language; sorry for the mistakes and awkwardness.)
I have been given a peculiar project to work on. The company has lost the source code for the app, and I have to make changes to it. Now, reverse engineering the whole thing is impossible for one person; it's just too huge. However, patching individual functions would be feasible, since the changes are not that monumental.
So one possible solution would be compiling C code and somehow (after rewriting addresses) patching it into the actual binary: ideally replacing the code a CALL instruction jumps to, or inserting a JMP to my code.
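For instance, something like the following is what I imagine (a sketch only: x86, made-up file offsets, and it skips the virtual-address-to-file-offset translation through the PE section headers that a real patch needs):

    // Sketch: overwrite the start of the old function with a relative JMP
    // (opcode 0xE9) to replacement code appended elsewhere in the file.
    #include <cstdint>
    #include <cstring>
    #include <fstream>

    int main() {
        const std::uint32_t func_off  = 0x1234;  // placeholder: old function
        const std::uint32_t patch_off = 0x9000;  // placeholder: new code

        std::fstream f("app.exe", std::ios::in | std::ios::out | std::ios::binary);
        if (!f) return 1;

        // rel32 is measured from the end of the 5-byte JMP instruction.
        std::int32_t rel = static_cast<std::int32_t>(patch_off - (func_off + 5));

        unsigned char jmp[5] = { 0xE9, 0, 0, 0, 0 };
        std::memcpy(jmp + 1, &rel, 4);  // x86 stores rel32 little-endian

        f.seekp(func_off);
        f.write(reinterpret_cast<const char*>(jmp), sizeof jmp);
        return 0;
    }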
Is there any way to accomplish this using MinGW32? If so, can you provide a simple example? I'm also interested in books that could help me accomplish the task.
Thanks for your help.
I use OllyDbg for this kind of thing. It allows you to see the disassembly and debug it, place breakpoints, etc., and you can also edit the binary. So you could edit the PE header of the program, add a code section with your (compiled) code inside, and then call it from the original program.
I can't give you more specific advice, since I've never actually tried it, although I've thought about it many times. You know, laziness... :)
I would disassemble the program with a high-quality disassembler that produces something that can be assembled back into a runnable app, and then replace the parts you need to modify with C code.
Something like this will let you reverse the machine code into source. It won't be pretty but it does work.
http://www.hex-rays.com/idapro/
There are also tools for runtime patching, http://www.dyninst.org/ for instance. They aren't really made for this kind of patching, but they can do the trick.
And of course the last choice is to just use an assembler and write machine code :)

How should I integrate with and package this third-party library in a Win32 C++ app?

We have a (very large) existing codebase for a custom ActiveX control, and I'd like to integrate libkml into it for the sake of interacting with KML mapping data, rather than reinventing the wheel. The problem is, I'm a relatively new Windows developer, and coming from the Linux world, I'm really not sure what the right way of integrating a third-party library is. Thankfully, libkml does provide MSVC projects for compiling it, so porting isn't a problem. I guess I have a couple of choices that I can think of:
Build and link the library directly. We already have a solution with project files in it for the "main" project; I could add the libkml projects to that solution, but I'd rather not. It's very unlikely that the libkml code will change in relation to our app's code.
Statically link to the .lib files produced by the libkml build. This is unattractive, since there are six .lib files that come out of the libkml solution and it seems inelegant to manually specify them in the linker options, etc.
Package the code as-is in a DLL. Maybe with COM? It seems like if I did this without any translation, I'd end up with a lot of overhead, and since I'm fairly unfamiliar with COM, I don't know how much work would be involved in exposing all the functionality I'd like to use via COM. The library is fairly big, has a lot of classes it uses, and if I had to manually write code to expose it all, I'd be hesitant to go this route.
Write wrapper code to abstract the functionality I need, package that in a COM DLL, and interact with that. This seems sensible, I suppose, but it's difficult to determine how much abstraction I need since I haven't written the code that would use libkml yet.
Let me reiterate: I haven't yet written the code that will interact with libkml, so this is mostly experimental. Options 1 and 2 are also complicated by the fact that libkml additionally relies on three more external libraries that are also in .lib files (and that I had to recompile anyway to get the code generation flags to line up). The goal obviously is to get the code to work, but maintainability and source tree organization are also goals, so I'm leaning towards options 3 and 4; I just don't know the best way to approach those on Windows.
Typing six file names, or using the declarative style with #pragma comment(lib, "foo.lib"), is small potatoes compared to the work you'll have to do to turn this into a DLL or COM server.
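For example, the declarative style keeps the whole list in one source file (the .lib names below are invented; the real ones depend on how you built libkml and its dependencies):

    // One line per static library instead of editing the linker settings
    // in every project configuration. Names are placeholders.
    #pragma comment(lib, "libkml.lib")
    #pragma comment(lib, "expat.lib")
    #pragma comment(lib, "zlib.lib")
    #pragma comment(lib, "uriparser.lib")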
The distribution is heavily biased towards using this as a static link library. There are only spotty declarations available to turn it into a DLL with __declspec(dllexport), and they exist only in the third-party dependencies. All of them use different #defines, of course, so you'll be typing a bunch of names into the preprocessor definitions for the projects.
Furthermore, you'll have a hard time actually getting this DLL loaded at runtime, since you are using it in a COM server. The DLL search path will be the client app's when COM creates your control instance, which is not likely to be anywhere near the place you deployed the DLL.
Making it a COM server is a lot of work, you'll have to write all the interface glue yourself. Again, nothing already in the source code that helps with this at all.
You can also wrap all the functionality you need in a non-COM DLL. Visual Studio supports creating a static wrapper (import) library which, when linked, will make your program use the DLL. This way you only have one dependency to specify instead of six.
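As a sketch of what such a wrapper's exported surface might look like (all names here are hypothetical; libkml stays statically linked inside the wrapper DLL):

    // kmlwrap.h - the only header the ActiveX control would need to see.
    #ifdef KMLWRAP_EXPORTS
    #  define KMLWRAP_API __declspec(dllexport)
    #else
    #  define KMLWRAP_API __declspec(dllimport)
    #endif

    // A narrow, C-style surface: no libkml types cross the DLL boundary.
    extern "C" {
        KMLWRAP_API bool        kmlwrap_load_file(const char* path);
        KMLWRAP_API int         kmlwrap_placemark_count(void);
        KMLWRAP_API const char* kmlwrap_placemark_name(int index);
    }

Keeping the boundary C-style also sidesteps C++ name-mangling and CRT mismatches between the control and the wrapper.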
Other than that, what is wrong with specifying six dependencies? I would assume there is a good reason these are six separate libraries instead of one, so it is prudent to specify exactly which parts you actually use.
Maybe I'm missing something here, but I really don't see what is wrong with (1). I think that even if you had multiple projects that were using libkml, just insert the project file for libkml into your solution file, specify the dependencies, and you should be done. It's dead simple. Even solution (2) is dead simple. If the libraries ever change, you rebuild - you're going to need to do that anyway.
I'm failing to see how (3) or (4) is necessary or even desirable. To me, it sounds like a lot of work for goals (source tree organization and maintainability) that I'm not even sure those options really meet. In fact, you said yourself that "It's very unlikely that the libkml code will change in relation to our app's code."
What I've found over the years is to just keep things simple. If rebuilding KML is potentially time consuming, grab the libs and just statically link to the libraries. Yes, there are other dependencies, but you'll set this up once and be done, hopefully never to worry about it again. Otherwise, stick it in the project and move on. I think that it's worthwhile to ask whether spending a lot of time on this issue is worth the trouble.

Reading/Understanding third-party code

When you get a third-party library (C, C++) that is open source (LGPL, say) and does not have good documentation, what is the best way to go about understanding it so you can integrate it into your application?
The library usually has some example programs, and I end up walking through the code using gdb. Any other suggestions/best practices?
For an example, I just picked one from sourceforge.net, but it's just a broad engineering/programming question:
http://sourceforge.net/projects/aftp/
I frequently use a couple of tools to help me with this:
GNU Global. It generates cross-referencing databases and can produce hyperlinked HTML from source code. Clicking function calls will take you to their definitions, and you can see lists of all references to a function. Only works for C and perhaps C++.
Doxygen. It generates documentation from Javadoc-style comments. If you tell it to generate documentation for undocumented methods, it will give you nice summaries. It can also produce hyperlinked source code listings (and can link into the listings provided by htags).
These two tools, along with just reading code in Emacs and doing some searches with recursive grep, are how I do most of my source reverse-engineering.
One of the better ways to understand it is to attempt to document it yourself. By going and trying to document it yourself, it forces you to really dive in and test and test and test and make sure you know what each statement is doing at what times. Then you can really start to understand what the previous developer may have been thinking (or not thinking for that matter).
Great question. I think that this should be addressed thoroughly, so I'm going to try to make my answer as thorough as possible.
One thing that I do when approaching large projects that I've either inherited or am contributing to is automatically generate documentation from their sources, UML diagrams, and anything else that can ease the various amounts of A.D.D. encountered when learning a new project :)
I believe someone here already mentioned Doxygen; that's a great tool! You should look into it and write a small bash script that will automatically generate documentation for the application you're developing, in some tree structure you've set up.
One thing that I haven't seen people mention is BOUML! It's fantastic and free! It automatically generates reverse UML diagrams from existing sources, and it supports a variety of languages. I use it as a way to really capture the big picture of what's going on in terms of architecture and design before I start reading code.
If you've got the money to spare, look into Understand for %language-here%. It's absolutely great and has helped me in many ways when inheriting legacy code.
EDIT:
Try out ack (betterthangrep.com); it is a pretty convenient script for searching source trees. :)
Familiarize yourself with the information available in the headers. The functions you call will be declared there. Then try to identify the valid arguments and pre-/post-conditions of the functions, as those are your primary guidance (even if they are not documented!). The example programs are your next bet.
If you have code completion/IntelliSense, I like opening up the library and typing '.' or 'namespace::' and seeing what comes up. I always find it helpful; you can navigate through the objects/namespaces and see what functionality they have. This is of course assuming it's an OOP library with relatively good naming of functions/objects.
There really isn't a silver bullet other than just rolling up your sleeves and digging into the code.
This is where we earn our money.
Three things:
(1) Try to run the available test or example apps, set low debug levels, and walk through the logs.
(2) Use a source navigator tool / cscope (available on both Windows and Linux) and browse the code to understand the flow.
(3) In parallel, use gdb to step into the code while running the test/example apps.