Make setup for encrypted individual files in a C++ project

We have a C/C++ project where we wish to encrypt (with GPG) every single source file, and have make (specifically, GNU Make) seamlessly work (as it does now with unencrypted source).
If we encrypt only the C or C++ files, this seems fairly easy to accomplish with a rule like this:
%.o : %.cc.gpg %.hh
	$(GPG) --decrypt $< | $(CXX) $(CFLAGS) -x c++ -c -o $@ -
However, if we start encrypting header files, it gets a lot trickier, as the C file may #include any number of headers. So it seems to me that first I need to generate a dependency list, then decrypt every header that is encrypted, and compile. Ideally, the decryption would be done in-memory, rather than leaving decrypted files lying around while compilation takes place.
Some notes, in anticipation of the comments I'll get:
The users' workflow will involve GPG plugins for their editor, but the rest should be as seamless as possible (i.e. traditional commandline-based Linux svn + make + gcc workflow)
We are using subversion for source control. We know and are OK with source being stored as binary blobs (as well as the implications of this, e.g. breaking svn diff)
The subversion repo lives on an encrypted filesystem (LUKS), and access is only through https
This is a management requirement
In my web research of this problem, I've seen a lot of people argue against encrypting every source file. As I said, it's a management requirement. But one thing that is not addressed by these arguments is keeping the source safe from sysadmins. Yes, at some point you have to trust people, but our source is kind of like the recipe to Coke: if it is uncontrolled, it could literally ruin the company. So why take chances?

You have two problems: 1) decrypting files in the build process and 2) keeping the cleartext in RAM. The second is a little out of my field; I'd suggest air-gapped workstations with nightly disc-scrubbing and a really good auditing system, and anyone who points out a flaw in security gets rewarded, not punished. Anyway let's assume you've solved that problem. (At this point you could just decrypt the whole code base and work normally, but let's try to find a tighter solution.)
For the decryption, you're halfway there. Instead of decrypting in the %.o rule I'd break it into separate rules:
%.cc : %.cc.gpg
	$(GPG) --decrypt -o $@ $<

%.o : %.cc %.hh
	$(CXX) $(CFLAGS) -c -o $@ $<
Now as you say, all you have to do is generate a dependency list. Then you can expand the first rule to cover encrypted headers and you're golden.
If you're using a civilized compiler like g++, you can (in general) generate a dependency list with g++ -M, and use that to write a "smart" %.o rule such as described here, which will handle all dependency problems automatically and invisibly.
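For reference, one common shape of that "smart" rule with GNU make and g++ is sketched below; SRCS is an assumed variable listing the (decrypted) sources, and -MM is used instead of -M so that system headers are skipped:
SRCS := $(wildcard *.cc)

# Write foo.d from foo.cc; -MT names both the object and the .d file as
# targets of the generated rule, so the dependency list keeps itself fresh.
%.d: %.cc
	$(CXX) $(CPPFLAGS) -MM -MF $@ -MT '$*.o $@' $<

-include $(SRCS:.cc=.d)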
The problem is that you can't use g++ -M at first, because you're in a vicious circle: you don't want to decrypt all of the headers, just the ones you need, so you can't do the decryption until you know which headers you need; but you won't know that until you generate the dependency files, which means running g++; and g++ will pitch an error and quit if a needed header isn't there already.
So we'll cheat. Suppose we have a separate directory full of empty header files with the same names as the real header files (trivial to build/maintain with Make). We can direct g++ (and Make) to look there for any headers it can't find in the usual place. That is not enough to actually compile objects, but it is enough to run g++ -M without error. The dependency list it constructs will be incomplete (because the real headers may #include each other) but it is enough for the first iteration. Make can decrypt those headers, then start over; when the results of g++ -M are the same as the list from the previous iteration, the process is complete, all needed headers have been decrypted and compilation can begin.
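A rough sketch of that stub-header trick (the stubs/ directory and the variable names are invented; the decryption rules from above still apply to the real headers):
ENCRYPTED_HEADERS := $(wildcard *.hh.gpg)
STUBS             := $(patsubst %.hh.gpg,stubs/%.hh,$(ENCRYPTED_HEADERS))

# Empty stand-ins, names only, so that g++ -M never dies on a missing header.
stubs/%.hh: %.hh.gpg
	@mkdir -p stubs
	@touch $@

# First-pass dependency list: headers already decrypted into . win because
# . is searched first; anything still encrypted is satisfied by its stub.
%.d: %.cc $(STUBS)
	$(CXX) $(CPPFLAGS) -I. -Istubs -M -MF $@ $<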
Is that outline enough, or do you need help with the nuts and bolts?

Related

getting compile-time date and time without macros

Using C++, I compile my code on an automated schedule and need to use, in the code itself, the time at which the code was compiled. Currently I'm just using the __DATE__ and __TIME__ macros to get the compile-time date and time. However, this causes the binaries to change even if no changes have been made to the source (the macros are expanded at compile time), which is not good: I don't want the setup to think the binary changed when there have been no changes to the source.
Is it possible to get the compile time without using anything that would cause the binary to change when the source has not?
Thanks
The standard __DATE__ and __TIME__ macros do what you observe: they expand to time-dependent strings.
What you can do instead depends on the system (and perhaps the compiler), and notably on the build system (GNU make, for example).
A possible idea is to link in a separate timestamp file, something like this (in make syntax):
timestamp.c:
	date +'const char timestamp[]="%c";' > $@

program: $(OBJECTS) timestamp.c
	$(LINK.cc) $^ -o $@ $(LIBES)
	rm -f timestamp.c
The timestamp would then be regenerated, and your program relinked, at every make (so the generated program will indeed change, but most of the code, coming in through the $(OBJECTS) make variable, stays unchanged).
Alternatively, you could log the time of linking in some database or textual log file, e.g.
program: $(OBJECTS)
	$(LINK.cc) $^ -o $@ $(LIBES)
	date +'$@ built at %c' >> /var/log/build.log
(you might use logger instead of date to get that logged in the syslog)
Then the generated program won't change, but you'll have logged a build timestamp somewhere. BTW, you could also log some checksum of your binary (e.g. $(shell md5sum program) in make syntax).
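For instance, a sketch of the same recipe using logger and a checksum (the "build" tag is an arbitrary choice):
program: $(OBJECTS)
	$(LINK.cc) $^ -o $@ $(LIBES)
	logger -t build "$@ built at `date +%c` (`md5sum $@`)"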
If you use the compile-time IN YOUR binaries, then you will have the binary change.
There are several solutions, but I think the main point is that if you rebuild the binaries on a regular basis, it should really only be done if there are actual changes (either to the build system or to the source code). So make it part of your build system to check whether there are changes, and don't build anything if there aren't any. A simple way to do this is to check what the "latest version" in the version control system for the source code is. If the latest version is the same as the one used in the previous build, then nothing needs to be built. This will save you from generating builds that are identical (apart from the build timestamp), and it resolves the issue of storing __DATE__ and __TIME__ in the binary.
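A hedged sketch of that check, assuming Subversion (svnversion) and an invented stamp file called last-build.rev:
REV := $(shell svnversion -n .)

build-if-changed:
	@if [ "$(REV)" != "$$(cat last-build.rev 2>/dev/null)" ]; then \
		$(MAKE) all && echo "$(REV)" > last-build.rev; \
	else \
		echo "No source changes (still at $(REV)); skipping build."; \
	fi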
It's not clear to me what you want. If it's the last-modified time of the file, getting it will depend on your system and build system: something like -D $(shell ls -l --time-style=long-iso $< | awk '{ print $$7, $$8 }') could be used in the compiler invocation with GNU make under Linux, for example (note the doubled $$ so that awk, not make, sees the field references). But of course, it means that if an include file was changed, but not the source, the time and date wouldn't reflect it.
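A slightly cleaner variant of that idea, assuming GNU date and GNU make (SRC_TIMESTAMP is an invented macro name), asks date directly for the source file's mtime:
%.o: %.cpp
	$(CXX) $(CXXFLAGS) -DSRC_TIMESTAMP="\"$(shell date -r $< '+%F %T')\"" -c -o $@ $<
As with the ls-based version, this reflects only the .cpp file's own modification time, not that of any header it includes.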

C++ binary identification (manifest)

We have a large set of C++ projects (GCC, Linux, mostly static libraries) with many dependencies between them. Then we compile an executable using these libraries and deploy the binary on the front-end. It would be extremely useful to be able to identify that binary. Ideally what we would like to have is a small script that would retrieve the following information directly from the binary:
$ ident binary
binary: Product=PRODUCT_NAME;Version=0.0.1;Build=xxx;User=xxx...
    dependency: Product=PRODUCT_NAME1;Version=0.1.1;Build=xxx;User=xxx...
    dependency: Product=PRODUCT_NAME2;Version=1.0.1;Build=xxx;User=xxx...
So it should display all the information for the binary itself and for all of its dependencies.
Currently our approach is:
During compilation, for each product we generate Manifest.h and Manifest.cpp and then inject Manifest.o into the binary.
The ident script parses the target binary, finds the generated data there, and prints this information.
However, this approach is not always reliable across different versions of gcc.
I would like to ask the SO community: is there a better approach to solve this problem?
Thanks for any advice
One of the catches with storing data in source code (your Manifest.h and .cpp) is the size limit for literal data, which is dependent on the compiler.
My suggestion is to use ld. It allows you to store arbitrary binary data in your ELF file (so does objcopy). If you prefer to write your own solution, have a look at libbfd.
Let us say we have a hello.cpp containing the usual C++ "Hello world" example. Now we have the following make file (GNUmakefile):
hello: hello.o hello.om
	$(LINK.cpp) $^ $(LOADLIBES) $(LDLIBS) -o $@

%.om: %.manifest
	ld -r -b binary -o $@ $<

%.manifest:
	echo "$@" > $@
What I'm doing here is to separate out the linking stage, because I want the manifest (after conversion to ELF object format) linked into the binary as well. Since I am using pattern rules this is one way to go; others are certainly possible, including a better naming scheme for the manifests where they also end up as .o files and GNU make can figure out how to create those. Here I'm being explicit about the recipe. So we have .om files, which are the manifests (arbitrary binary data), created from .manifest files. The recipe states to convert the binary input into an ELF object. The recipe for creating the .manifest itself simply pipes a string into the file.
Obviously the tricky part in your case isn't storing the manifest data, but rather generating it. And frankly I know too little about your build system to even attempt to suggest a recipe for the .manifest generation.
Whatever you throw into your .manifest file should probably be some structured text that can be interpreted by the script you mention or that can even be output by the binary itself if you implement a command line switch (and disregard .so files and .so files hacked into behaving like ordinary executables when run from the shell).
The above make file doesn't take into account the dependencies - or rather it doesn't help you create the dependency list in any way. You can probably coerce GNU make into helping you with that if you express your dependencies clearly for each goal (i.e. the static libraries etc). But it may not be worth it to take that route ...
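If you do want to fold the dependencies in, one hedged extension of the scheme above is to give every library its own manifest object and name them all at link time (foo and bar are placeholder library names); the %.om and %.manifest pattern rules above then take care of the rest:
DEPS      := foo bar                      # placeholder dependency names
MANIFESTS := hello.om $(patsubst %,%.om,$(DEPS))

hello: hello.o $(MANIFESTS)
	$(LINK.cpp) $^ $(LOADLIBES) $(LDLIBS) -o $@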
Also look at:
C/C++ with GCC: Statically add resource files to executable/library and
Is there a Linux equivalent of Windows' "resource files"?
If you want particular names for the symbols generated from the data (in your case the manifest), you need to use a slightly different route and use the method described by John Ripley here.
How to access the symbols? Easy. Declare them as external (C linkage!) data and then use them:
#include <cstdio>
#include <cstddef>

extern "C" char _binary_hello_manifest_start;
extern "C" char _binary_hello_manifest_end;

int main(int argc, char** argv)
{
    const ptrdiff_t len = &_binary_hello_manifest_end - &_binary_hello_manifest_start;
    printf("Hello world: %.*s\n", (int)len, &_binary_hello_manifest_start);
}
The symbols are the exact characters/bytes. You could also declare them as char[], but it would result in problems down the road, e.g. for the printf call.
The reason I am calculating the size myself is because a.) I don't know whether the buffer is guaranteed to be zero-terminated and b.) I didn't find any documentation on interfacing with the *_size variable.
Side-note: the .* in the format string tells printf to read the maximum number of characters to print (the precision) from the argument list, and then take the next argument as the string to print.
You can insert any data you like into a .comment section in your output binary. You can do this with the linker after the fact, but it's probably easier to place it in your C++ code like this:
asm (".section .comment.manifest\n\t"
".string \"hello, this is a comment\"\n\t"
".section .text");
int main() {
....
The asm statement should go outside any function, in this instance. This should work as long as your compiler puts normal functions in the .text section. If it doesn't then you should make the obvious substitution.
The linker should gather all the .comment.manifest sections into one blob in the final binary. You can extract them from any .o or executable with this:
objdump -j .comment.manifest -s example.o
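For completeness, the "after the fact" route mentioned earlier can be done with objcopy from GNU binutils (manifest.txt here is an assumed input file):
objcopy --add-section .comment.manifest=manifest.txt example example.tagged
objdump -j .comment.manifest -s example.tagged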
Have you thought about using the standard packaging system of your distro? In our company we have thousands of packages, and hundreds of them are automatically deployed every day.
We are using Debian packages, which contain all the necessary information:
- A full changelog that includes:
  - authors;
  - versions;
  - short descriptions and timestamps of changes.
- Dependency information: a list of all packages that must be installed for the current one to work correctly.
- Installation scripts that set up the environment for a package.
I think you may not need to create manifests your own way, since a ready-made solution already exists. You can have a look at the Debian packaging HowTo here.
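If you go that route, most of what the ident script would print is already queryable with the stock tooling, for example (your-product is a placeholder package name):
dpkg -s your-product                # version, maintainer, dependency list
apt-cache depends your-product      # resolved dependency tree
zcat /usr/share/doc/your-product/changelog.Debian.gz | head   # authors and timestamps of changes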

How do I define a dependency graph with unknown intermediate node names?

I'm using a tool chain where I do not know the names of all of the intermediate files.
E.g. I know that I start out with a foo.s, and go through several steps to get a foo.XXXX.sym and a foo.XXXX.hex, buried way down deep. And then running other tools on foo.XXXX.hex and foo.XXXX.sym, I eventually end up with something like final.results.
But, the trouble is that I don't know what the XXXX is. It is derived from some other parameters, but may be significantly transformed away from them.
Now, after running the tool/steps that generate foo.XXXX.{sym,hex}, I now typically scan the overall result directory looking for foo.*.{sym,hex}. I.e. I have code that can recognize the intermediate outputs, I just don't know exactly what the names will be.
I typically use make or scons - actually, I prefer scons, but my team highly prefers make. I'm open to other build tools.
What I want to do is be able to say (1) "make final.results", or "scons final.results", (2) and have it scan over the partial tree; (3) figure out that, while it does not know the full path, it definitely knows that it has to run the first step, (4) after that first step, look for and find the foo.XXX.* files; (5) and plug those into the dependency tree.
I.e. I want to finish building the dependency tree after the build has already started.
A friend got frustrated enough with scons' limitations in this area that he wrote his own build tool. Unfortunately it is proprietary.
I guess that I can create a first build graph, say in make with many .PHONY targets, and then after I get through the first step, generate a new makefile with the new names, and have the first make invoke the newly generated second makefile. Seems clumsy. Is there any more elegant way?
GNU make has an "auto re-exec" feature that you might be able to make use of. See How Makefiles Are Remade.
After make finishes reading all the makefiles (both the ones found automatically and/or on the command line, as well as all included makefiles), it will try to rebuild all its makefiles (using the rules it knows about). If any of those makefiles are automatically rebuilt, then make will re-exec itself so it can re-read the newest versions of the makefiles/included files, and starts over (including re-trying to build all the makefiles).
It seems to me that you should be able to do something with this. For example, you can write "-include foo.sym.mk" in your main makefile, and then have a rule that builds foo.sym.mk by invoking the tool on foo.s, running your "recognize the next step" code, and generating a foo.sym.mk that defines a rule for the intermediate output that got created. Something like this (due to the lack of specificity in your question I can't give true examples, you understand):
SRCS = foo.s bar.s baz.s

-include $(patsubst %.s,%.sym.mk,$(SRCS))

%.sym.mk: %.s
	<compile> '$<'
	<recognize output and generate makefile> > '$@'
Now when make runs it will see that foo.sym.mk is out of date (if it is) using normal algorithms and it will rebuild foo.sym.mk, which as a "side effect" causes the foo.s file to be compiled.
And of course, the "foo.sym.mk" file can include ANOTHER file, which can recognize the next step, if necessary.
I'm not saying this will be trivial but it seems do-able based on your description.
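For concreteness, a generated foo.sym.mk might contain something along these lines (the 17a3 part stands in for the unknown XXXX and is invented here):
# hypothetical contents of a generated foo.sym.mk
final.results: foo.17a3.sym foo.17a3.hex
foo.17a3.sym foo.17a3.hex:
	@:  # nothing to do; these were produced as a side effect of building foo.sym.mk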
Make constructs the graph before running any rule, so there won't be a perfect answer. Here are some reasonably clean solutions.
1) Use PHONY intermediates and wildcards in the commands. (You can't use Make wildcards because make expands them before running rules.)
final.results: middle
	# build $@ using $(shell ls foo.*.sym) and $(shell ls foo.*.hex)

.PHONY: middle
middle: foo.s
	# build foo.XXXX.sym and foo.XXXX.hex from $<
2) Use recursive Make (which is not as bad as people say, and sometimes very useful.)
SYM = $(wildcard foo.*.sym)
HEX = $(wildcard foo.*.hex)

# Note that this is the one you should "Make".
# I've put it first so it'll be the default.
.PHONY: first-step
first-step: foo.s
	# build foo.XXXX.sym and foo.XXXX.hex from $<
	@$(MAKE) -s final.results

final.results:
	# build $@ using $(SYM) and $(HEX)
3) Similar to 2, but have a rule for the makefile which will cause Make to run a second time.
SYM = $(wildcard foo.*.sym)
HEX = $(wildcard foo.*.hex)

final.results:
	# build $@ using $(SYM) and $(HEX)

Makefile: foo.s
	# build foo.XXXX.sym and foo.XXXX.hex from $<
	@touch $@

CXXSources-- what are they?

I'm new to compiling C/C++ with the aid of make. I downloaded an open source project and noticed that there is in the make file CXXSources and CXXObjects. I think I understand roughly what the make file is doing with them but...
I don't have any of the source files listed under CXXSources. Are these like dependencies I'm supposed to know how to find? Is there any convention as to what CXXSources is versus just Sources?
Added link to project: http://www.fim.uni-passau.de/en/fim/faculty/chairs/theoretische-informatik/projects.html
More specifically, the GML parser, eg. http://www.fim.uni-passau.de/fileadmin/files/lehrstuhl/brandenburg/projekte/gml/gml-parser.tar.gz
It seems to be getting stuck on the line:
gml_to_graph : $(CXXOBJECTS) gml_scanner.o gml_parser.o
$(CXX) -o gml_to_graph_demo $(CXXOBJECTS) gml_parser.o gml_scanner.o -L$(LEDADIR)/lib -lG -lL -lm
$(CXXOBJECTS) is defined by
CXXSOURCES = gml_to_graph.cc gml_to_graph_demo.cc
CXXOBJECTS = $(CXXSOURCES:.cc=.o)
So I need gml_to_graph.cc, it seems. Or maybe I'm wrong?
Usually, the variables are set before the point where you see them. This could be
(a) via the environment
(b) before including the quoted makefile
(c) in the quoted makefile, but preceding the location quoted
To see (verbosely) what GNU make takes into account, do:
make -Bn
(it will show everything that would get executed)
Even more verbose:
make -p all
It will show you all the internal variable expansions.
If you post a link or more information, we will be able to come up with less generic (and hence possibly less confusing) answers

Can I have one makefile to build a hierarchical project?

I have several hundred files in a non-flat directory structure. My Makefile lists each source file, which, given the size of the project and the fact that there are multiple developers on the project, can create annoyances when we forget to put a new one in or take out old ones. I'd like to generalize my Makefile so that make can simply build all .cpp and .h files without me having to specify all the filenames, given some generic rules for different types of files.
My question: given a large number of files in a directory with lots of subfolders, how do I tell make to build them all without having to specify each and every subfolder as part of the path? And how do I make it so that I can do this with only one Makefile in the root directory?
EDIT: this almost answers my question, but it requires that you specify all filenames :\
I'm sure a pure-gmake solution is possible, but using an external command to modify the makefile, or generate an external one (which you include in your makefile) is probably much simpler.
Something along the lines of:
all: myprog
find_sources:
zsh -c 'for x in **/*.cpp; echo "myprog: ${x/.cpp/.o}" >> deps.mk'
-include deps.mk
and run
make find_sources && make
note: the exact zsh line probably needs some escaping to work in a make file, e.g. $$ instead of $. It can also be replaced with bash + find.
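One possible bash/find equivalent (an untested sketch; note the doubled $$ so that a literal $ reaches the shell):
find_sources:
	find . -name '*.cpp' | sed -e 's|^\./||' -e 's|\.cpp$$|.o|' -e 's|^|myprog: |' > deps.mk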
One way that would be platform independent (meaning independent of whether the shell is on Windows or Linux) is this:
DIRS = relative/path1\
relative/path2
dd = absolute/path/to/subdirectories
all:
	@$(foreach dir, $(DIRS), $(MAKE) -C $(dd)$(dir) build -f ../../Makefile ;)
build:
... build here
Note that the spaces and the semicolon are important here. It is also important to specify absolute paths, and to give the path to the appropriate Makefile at the end (in this case I am using only one Makefile, in the grandparent folder).
But there is a better approach too, which involves .PHONY targets: it shows progress and errors better, and it stops the build if one folder has a problem instead of proceeding to the other targets:
.PHONY: subdirs $(DIRS)

subdirs: $(DIRS)

$(DIRS):
	$(MAKE) -C $@ build -f ../../Makefile

all : prepare subdirs
	...

build :
	... build here
Again I am using only one Makefile here, which is supposed to apply to all sub-projects. For each sub-project in the grandchild folders, the "build" target is run using the one Makefile in the root.
I would start by using a combination of the wildcard function:
http://www.gnu.org/software/make/manual/make.html#Wildcard-Function
VPATH/vpath
http://www.gnu.org/software/make/manual/make.html#Selective-Search
and the file functions
http://www.gnu.org/software/make/manual/make.html#File-Name-Functions
For exclusion (i.e. backups, as Jonathan Leffler mentioned), use a separate folder not in the vpath for backups, and use good implicit rules.
You will still need to define which folders to go into, but not each file in them.
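A hedged sketch of how those pieces can fit together (the directory list in SRCDIRS is invented and still has to be maintained, or generated, by hand; basenames must be unique across directories with this layout):
SRCDIRS := src src/util src/io
vpath %.cpp $(SRCDIRS)

SRCS := $(foreach d,$(SRCDIRS),$(wildcard $(d)/*.cpp))
OBJS := $(notdir $(SRCS:.cpp=.o))

myprog: $(OBJS)
	$(CXX) $(LDFLAGS) -o $@ $^ $(LDLIBS)

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c -o $@ $<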
I'm of two minds on this one. On one hand, if your Make system compiles and links everything it finds, you'll find out in a hurry if someone has left conflicting junk in the source directories. On the other hand, non-conflicting junk will proliferate and you'll have no easy way of distinguishing it from the live code...
I think it depends on a lot of things specific to your shop, such as your source control system and whether you plan to ever have another project with an overlapping code base. That said, if you really want to compile every source file below a given directory and then link them all, I'd suggest simple recursion: to make objects, compile all source files in the current directory, add the resultant objects (with full paths) to a list in the top source directory, and recurse into all directories below. To link, use the list.
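A rough sketch of that recursion, assuming every subdirectory carries this same Makefile and that object paths are collected into an invented objects.lst file at the top:
TOP     ?= $(CURDIR)
OBJLIST ?= $(TOP)/objects.lst
SUBDIRS := $(patsubst %/Makefile,%,$(wildcard */Makefile))
OBJS    := $(patsubst %.cpp,%.o,$(wildcard *.cpp))

.PHONY: objects link
objects: $(OBJS)
	@for o in $(OBJS); do echo "$(CURDIR)/$$o" >> "$(OBJLIST)"; done
	@for d in $(SUBDIRS); do $(MAKE) -C $$d objects TOP="$(TOP)" OBJLIST="$(OBJLIST)"; done

link:
	rm -f "$(OBJLIST)"
	$(MAKE) objects
	$(CXX) $(LDFLAGS) -o myprog $$(cat "$(OBJLIST)") $(LDLIBS)
Make's built-in %.o: %.cpp rule handles the actual compilation in each directory here.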