Remove protobuf c++ compiled path string from binary - c++

When I compile my c++ program that uses Protobuf, and then run the linux strings command on the binary, one of the strings is a path to the generated cc file, with my home directory and everything. Obviously I'd like to eliminate my home directory and other personal information from the binary.
Where does this path come from and how can I prevent it from making it into the compiled binary?

The string comes from the embedded protobuf descriptor, which is used to perform dynamic introspection of protobuf types. Essentially, the descriptor describes your whole .proto file. The descriptor itself is encoded in protobuf format; see google/protobuf/descriptor.proto.
Now, the descriptor normally should not contain absolute paths like you describe. It really wants to contain "canonical" paths -- that is, the path name of the proto file relative to the source code root, or in other words, the path that you'd write in an import statement for that file. For instance, descriptor.proto's own canonical path is google/protobuf/descirptor.proto; to import it, you would write import "google/protobuf/descriptor.proto";.
The reason your descriptors are getting the full absolute filesystem path is because that is the path that you are passing to protoc, and you are not passing a -I flag to tell protoc where the root of your source tree is. Since protoc can't figure out the root of the source code, it is falling back to the file system root.
For instance, say your .proto file is /home/foo/myproj/src/frobber/baz.proto. Say that the src directory in this path is your "source root", meaning that you want people to write import "frobber/baz.proto"; to import your proto file. In that case, you want to invoke protoc like this:
protoc -I/home/foo/myproj/src /home/foo/myproj/src/frobber/baz.proto
Note that if you are running the command from, say, the myproj directory, then you probably shouldn't specify an absolute path at all:
protoc -Isrc src/frobber/baz.proto
It is very important that the -I flag here is a textual prefix of the source file name. protoc is dumb and only knows how to compare strings. It doesn't, for instance, know what the current directory is:
# DOES NOT WORK
cd /home/foo/myproj
protoc -I/home/foo/myproj/src src/frobber/baz.proto
And it also cannot canonicalize "..":
# DOES NOT WORK: protoc doesn't collapse "xyz/../".
protoc -Isrc xyz/../src/frobber/baz.proto
However ".." is OK if it's consistent, because again protoc only cares about a prefix match:
# OK: Prefix is consistent.
protoc -Ixyz/../src xyz/../src/frobber/baz.proto
If you'd rather not have a descriptor
You can compile your proto files in "lite mode" by placing the following line in your .proto file:
option optimize_for = LITE_RUNTIME;
In this mode, descriptors will not be included at all. Additionally, you can link against the "lite" version of the protobuf runtime library, which is much smaller than the regular version. However, many useful features will be disabled. The whole reflection interface will be gone, and anything that depends on reflection will be gone as well. For example, TextFormat, which is what the DebugString() method uses to convert messages into text to print for debugging, will be removed, therefore debugging will be harder.

Related

How to specify directory for compilation

I wrote some tool, based on libclang, which get some file and create translation unit from it and parse ast tree, but the tool needs compile flags, so I must specify all needed flags for compilation the file. So, here is a small problem: I can not specify relative path to incude directories, because for create translation unit the tool create temporary file and copy all input file in it (this is needed, because tool also can work in interactive mode). So paths for temporary file and source file are different. I means that I should set absolute paths in -I flags of manually parse input flags and check it: is that relative or not? Can I specify some prefix for all relative include directories? Or set current directory for compiler?

Loading OCaml modules not in the current directory

I'm writing a large OCaml project. I wrote a file foo.ml, which works perfectly. In a subdirectory of foo.ml's directory, there is a file bar.ml.
bar.ml references code in foo.ml, so its opening line is:
open Foo
This gives me an error at compile time:
Unbound module Foo.
What can I do to fix this without changing the location of foo.ml?
The easy path is to use one of OCaml build system like ocamlbuild or oasis. Another option would be jbuilder but jbuilder is quite opiniated about file organization and does not allow for the kind of subdirectory structure that you are asking for.
The more explicit path comes with a warning: OCaml build process is complicated with many moving parts that can be hard to deal with.
After this customary warning, when looking for modules, OCaml compiler first looks for module in the current compilation environment, then looks for compiled interface ".cmi" files in the directories specified by the "-I" option flags (plus the current directory and the standard library directory).
Thus in order to compile your bar.ml file, you will need to add the parent directory in the list of included directories with the -I .. option.
After all this, you will discover that during the linking phase, all object files (i.e. .cmo or .cmx) need to be listed in a topological order compatible with the dependency graph of your project.
Consequently, let me repeat my advice: use a proper build system.

The path in protobuf

I don't quite understand the path in protobuf. My file layout like this:
Top
A
a.proto
B
C
c.proto // import "A/a.proto";
I have written an RPC system based on protobuf and I need generate two kinds of files(client and server code) from c.proto. Client code should be placed in B and Server code still in C.
I can't write a correct command.
Top> protoc -I=. --client_out=./B/ C/c.proto will generate client code in B/C and the #include in code will have a wrong path.
Top/C> protoc -I=../ -I=./ --client_out=./ ./c.proto lead a protobuf_AddDesc_* error.
For every .proto file, protoc tries to determine the file's "canonical name" -- a name which distinguishes it from any other .proto file that may ever find its way into your system. In fact, ideally, the canonical name is different from every other .proto file in the world. The canonical name is the name you use when you import the .proto file from another .proto file. It is also used to decide where to output the generated files and what #includes to generate.
For .proto files specified on the command line, protoc determines the canonical name by trying to figure out what name you would use to import that file. So, it goes through the import paths (specified with -I) and looks for one that is a prefix of the file name. It then removes that prefix to determine the canonical name.
In your case, if you specify -I=. C/c.proto, then the canonical name is C/c.proto. If you specified -I=C C/c.proto, the canonical name would then simply be c.proto.
It is important that any file which attempts to import your .proto file imports it using exactly the canonical name determined when the file itself was compiled. Otherwise, you get the linker error regarding AddDesc.
In general, everything works well if you designate some directory to be the "root" of your source tree, and all of your code lives in a subdirectory of that with a unique name designating your project. Your "root" directory should be the directory you pass to both -I and --client_out. Alternatively, you can have separate directories for source files vs. generated files, but the generated files directory should have an internal structure that mirrors your source directory. You can then specify the generated files directory to --client_out, and when you run the C++ compiler, specify both the source and generated files directories in the include path.
If you have some other setup -- e.g. one where the .proto files live at a different canonical path from the .pb.h files -- then unfortunately you will have some trouble making protoc do what you want. Though, given that you are writing a custom code generator, you could invent whatever rules you want for the way its output files are organized, but straying from the rules the standard code generator follows might lead to lots of little pitfalls.

Different paths used for #include and other files

I'm quite confused about this weird behaviour of my .cpp project. I've got the following folder structure:
include/mylib.h
myproject/src/eval.cpp
myproject/data/file.csv
myproject/Makefile
In eval.cpp I include mylib.h as follows:
#include "../../include/mylib.h"
and compile it through Makefile:
all:
g++ -I include ../include/mylib.h src/eval.cpp -o eval.out
Now in my eval.cpp I'm reading the file.csv from data directory and if I refer to it like this
../data/file.csv
it doesn't find it (gets empty lines all the time), but this
data/file.csv
works fine.
So, to include mylib.h it goes two directories up (from src folder) which seems right. But it doesn't make sense to me that to refer to another file from the same piece of code it assumes we are in project directory. I suppose it is connected with Makefile somehow, but I'm not sure.
Why is it so?
EDIT: After a few thing I tried it seems that the path which is used is not the path from binary location to the data location, but depends on where from I run the binary as well. I.e., if I have binary in bin directory and run it like:
./bin/eval.out
It works with data/file.csv.
This:
cd bin
./eval.out
works with ../data/file.csv.
Now it seems very confusing to me as depending on where I run the program from it will give different output. Can anyone please elaborate on the reasons for this behaviour and if it is normal or I'm making some mistake?
It is so because (as explained here ) the compiler will search for #included files with quotes (not with brackets) with the current working directory being the location of the source file.
Then, when you try to open your .csv file, it's now your program that looks for a file. But your program runs with the current working directory being myproject/ which explains why you must specify data/file.csv as your file path, and not ../data/file.csv. Your program does not run in your src folder, it will run in the directory the binary ends up being invoked from.
You could have noticed that in your Makefile, your -I options specify a different path for your header file than your .cpp file.
EDIT Answer: It's quite simple actually and completely normal. When you invoke your binary, the directory which you're in is the current working directory. That is, if you run it with the command ./myproject/bin/eval.out, the current working directory is . (e.g. /home/the_user/cpp_projects). My post was a bit misleading about that, I corrected it.
Note: You can use the command pwd in a command prompt to know which is the current working directory of this prompt (pwd stands for "print working directory").

How to find "my" lib directory?

I'm developing a C++ program under Linux. I want to put some stuff (to be specific, LLVM bitcode files, but that's not important) in libraries, so I want the following directory structure:
/somewhere/bin/myBin
/somewhere/lib/myLib.bc
How do I find the lib directory? I tried to compute a relative part from argv[0], but if /somewhere is in my PATH, argv[0] will just contain myBin. Is there some way to get this path? Or do I have to set it at compile time?
How do GNU autotools deal with this? What happens exactly if I supply the --prefix option to ./configure?
Edit: The word library is a bit misleading in my case. My library consist of LLVM bitcode, so it's not an actual (shared) object file, just a file I want to open from my program. You can think of it as an image or text file.
maybe what you want is :
/usr/lib
unix directory reference: http://www.comptechdoc.org/os/linux/usersguide/linux_ugfilestruct.html
Assume your lib directory is "../lib" relative to executable
First you need to identify where myBin located, You can get it by reading /proc/self/exe
Then concat your binary file path with "../lib" will give you the lib directory.
You will have to use a compiler flag to tell the program. For example, if you have a plugin dir:
# Makefile.am
AM_CPPFLAGS = -DPLUGIN_DIR=\"${pkglibdir}\"
bin_PROGRAMS = awesome_prog
pkglib_LTLIBRARIES = someplugin.la
The list of directories to be searched is stored in the file /etc/ld.so.conf.
In Linux, the environment variable LD_LIBRARY_PATH is a colon-separated set of directories where libraries should be searched for first, before the standard set of directories; this is useful when debugging a new library or using a nonstandard library for special purposes.
LD_LIBRARY_PATH is handy for development and testing:
$ export LD_LIBRARY_PATH=/path/to/mylib.so
$ ./myprogram
[read more]
Addressing only the portion of the question "how to GNU autotools deal with this?"...
When you assign a --prefix to configure, basically two things happen: 1) it instructs the build system that everything is to be installed in ${prefix}, and 2) it looks in ${prefix}/share/config.site for any additional information about how the system is set up (it is common for that file not to exist.) It does absolutely nothing to help find libraries, but depends on the user having set up the tool chain properly. If you want to use a library in /foo/lib, you must have your toolchain set up to look there (eg, by putting /foo/lib in /etc/ld.so.conf, or by putting -L/foo/lib in LDFLAGS and "/foo/lib" in LD_LIBRARY_PATH)
The configure script relies on you to have the environment set up. It does not help you set up that environment, but does help by alerting you that you have not done so.
You could use the readlink system call on /proc/self/exe to get the path of your executable. You might then use realpath etc.