Storing Enormous Static Variables in C++

I have a string of data that is roughly 17 KB long. My program will not generate this string or read it into a buffer: the data is already initialized, and I want it compiled as-is from within my code, like a static variable. Moreover, I'd much prefer that it live inside my executable rather than in a separate project file. I've never encountered this problem before; what is the best way to go about it? Should I include it as a resource, or literally copy and paste the enormous stream of data into a variable? What would you recommend?
I forgot to mention: I am using Visual Studio C++ 2015, if that matters.

The GNU linker ld has the ability to directly include custom data as the .data section of an object file:
ld -r -b binary -o example.o example.txt
The resulting example.o file has symbols defined to access start and end of the embedded data (just look at the file with objdump to see how they're named).
Now I don't know whether the linker that comes with Visual Studio has a similar ability, but I guess you could use the GNU linker via either MinGW or Cygwin (since the generated object file won't reference the standard library, you won't need the emulation library that comes with Cygwin). The resulting object file can apparently just be added to your project like a regular source file.
Of course this manual workflow isn't too good if the data changes often...
Alternatively you can write a simple program which puts the contents of the file in a C array, like:
unsigned char const data[] = {
0x12, 0x34, 0x56 };
Of course there's already such a program (xxd, with its -i option) but I don't know whether it's available to you. One potential issue, if you emit the data as a string literal rather than a brace-initialized array, is that you could reach the compiler's limit for the length of string literals. To get around that you could try a (multidimensional) char array.
(When writing this answer I found this blog post very helpful.)
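The "simple program" idea above can be sketched in C++ as follows. This is only a minimal illustration, not what xxd actually does internally; the function name to_c_array and the exact output layout are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render raw bytes as the source text of a C array
// definition named `name`. Intended for non-empty input.
std::string to_c_array(const std::string& bytes, const std::string& name) {
    std::string out = "unsigned char const " + name + "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        char buf[8];
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "0x%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        out += (i == 0 ? " " : ", ");
        out += buf;
    }
    out += " };\n";
    return out;
}
```

A build step would read the data file into a std::string, call to_c_array on it, and write the result to a source file that is then compiled with the rest of the project.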

Related

Insert string at linking time

I want to define an external global symbol (const char*) used by the program, and add the symbol at link time with a given value. This is useful for, say, a commit hash or build time.
I found --defsym, which does something else. The Go linker supports this functionality via the -X option. (Yes, I know Go strings are managed, but I am talking about plain old zero-terminated C strings.)
For example:
extern const char *git_commit;
int main(int argc, char *argv[]) {
puts(git_commit);
return 0;
}
gcc main.o -Wl<something here that adds git_commit and set it to '84e5...'>
I am aware of the config.h approach and of building object files containing those strings. But it's 2019 by now. Such a simple task should be easy.
Edit: a more precise question
Is there an equivalent option in gcc/binutils for the Go linker's -X option?

There is a chance to do it at compile/preprocessing time. Consider this:
#include <stdio.h>
const char * git_commit = GIT_COMMIT;
int main(int argc, char ** argv) {
puts(git_commit);
return 0;
}
and in command line:
gcc -o test test.c -DGIT_COMMIT="\"This is a test\""
assuming you are using GCC compiler.
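One practical wrinkle with the macro approach is that the build breaks when GIT_COMMIT is not passed on the command line. A possible guard, written here as C++ and assuming a fallback value of "unknown" (my choice, not part of the answer above; build_commit is a hypothetical helper), could look like this:

```cpp
#include <string>

// Fallback used only when the build did not pass -DGIT_COMMIT=...
// The value "unknown" is an assumption for this sketch.
#ifndef GIT_COMMIT
#define GIT_COMMIT "unknown"
#endif

const char* git_commit = GIT_COMMIT;

// Hypothetical convenience accessor returning the hash as a std::string.
std::string build_commit() {
    return std::string(git_commit);
}
```

Built without the -D flag, build_commit() yields the fallback; built with -DGIT_COMMIT="\"This is a test\"", it yields that string instead.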

One way:
echo "char const* git_commit = \"$(git rev-parse HEAD)\";" > git_commit.c
gcc -c -o git_commit.o git_commit.c
gcc -o main main.o git_commit.o
When you implement this in a makefile, you may want to recreate git_commit.c only when the revision changes, so that it doesn't relink on each make invocation.
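The "only recreate when the revision changes" part is usually done in the makefile or a shell script, but the underlying idea is just: compare the new content with what is already on disk and leave the file untouched if they match, so its timestamp does not change. A rough C++ sketch of that idea (write_if_changed is a hypothetical helper, not an existing tool):

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Writes `content` to `path` only if the file is missing or differs.
// Returns true if the file was (re)written, false if it was left alone.
bool write_if_changed(const std::string& path, const std::string& content) {
    {
        std::ifstream in(path, std::ios::binary);
        if (in) {
            std::string existing((std::istreambuf_iterator<char>(in)),
                                 std::istreambuf_iterator<char>());
            if (existing == content) {
                return false;  // unchanged: keep the old timestamp
            }
        }
    }
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << content;
    return true;
}
```

A small generator built around this would produce the text of git_commit.c and call write_if_changed on it, so make only sees a newer file when the hash actually changed.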

I'm not aware of any way to do what you're seeking to do directly via a flag using one of the commonly used linkers. In general, if you want to link the definition of such an object into your program, you'll have to provide an object file containing a suitable symbol definition.
Probably the simplest way to get there would be to just invoke the compiler to compile the definition of your variable, with the content fed from a macro defined via the command line, as already suggested in the other answers. If you want to avoid creating temporary source files, gcc can also receive input straight from stdin:
echo "extern const char git_commit[] = GIT_COMMIT;" | gcc -DGIT_COMMIT=\"asdf\" -c -o git_commit.obj -xc++ -
(The extern on the definition is needed because the input is compiled as C++, where a const object at namespace scope otherwise has internal linkage and would not be visible to the linker.)
And in your code you just declare it as
extern const char git_commit[];
Note: I'm using const char git_commit[] rather than const char* git_commit. That way, git_commit will directly be an array of suitable size initialized to hold the contents of the commit hash. const char* git_commit, on the other hand, will create a global pointer object initialized to hold the address of a separate string literal object, which means you introduce an unnecessary indirection. Not that it will really matter here, but it also doesn't really cost you anything to skip the inefficiency, however tiny it might be…
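To make the array-versus-pointer difference concrete, here is a small sketch using the placeholder value "asdf" from the command above. The names commit_array, commit_pointer, array_bytes and pointer_bytes are made up for this illustration.

```cpp
#include <cstddef>

// The array IS the string data: 4 characters plus the terminating '\0'.
const char commit_array[] = "asdf";

// The pointer is a separate object that merely holds the address of a
// string literal stored elsewhere.
const char* const commit_pointer = "asdf";

constexpr std::size_t array_bytes = sizeof(commit_array);
constexpr std::size_t pointer_bytes = sizeof(commit_pointer);

static_assert(array_bytes == 5, "array size includes the terminator");
static_assert(pointer_bytes == sizeof(const char*), "only a pointer is stored");
```

So sizeof on the array gives the length of the embedded text, while sizeof on the pointer only gives the size of an address.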
There is also the objcopy utility, which can be used to wrap arbitrary binary content in an object file; see, e.g., "How do I embed the contents of a binary file in an executable on Mac OS X?". It may even be possible to pass input to objcopy straight via stdin as well. Finally, you could also just write your own tool that directly writes an object file containing a suitable symbol definition. Consider, however, that, at the end of the day, you're seeking to generate an object file that can be linked with the other object files making up your program. Simply using the same compiler you use to compile the rest of the code is probably the most robust way of going about doing that. With other solutions, you'll always have to manually make sure that the object files are compatible in terms of target architecture, memory layout, …

You could embed it as a resource file, for example:
GitInfoHTML HTML "GitInfo.html"
This will be placed into the executable by the linker (at link time) and can then be loaded.
For more info, see:
https://learn.microsoft.com/en-us/cpp/windows/resource-files-visual-studio?view=vs-2019

Accessing files made with mktemp for Linux through C++

I am trying to create a temporary file on a Linux system, but interfacing through C++ (so that the Linux commands are run through the C++ program).
To do so, I am using mktemp, which produces a temporary file.
I would need to later refer back to this file.
However, the filename is randomly generated and I am wondering if there is an easy way to access the filename.

The big honking comment in mktemp(3)'s manual page explicitly tells you to use mkstemp(3) instead of mktemp(3), and explains the good reason why it is so.
If you actually read the manual page for mkstemp(3) it clearly explains that the library function modifies the character buffer that's passed to it as a parameter to reflect the actual name of the created temporary file.
So to determine the name of the temporary file, simply refer to the character buffer you passed to this library function.
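A minimal sketch of that, assuming a POSIX system and a hypothetical /tmp/example prefix (make_temp_file is a name invented for this example):

```cpp
#include <stdlib.h>   // mkstemp (POSIX)
#include <unistd.h>   // close, unlink
#include <string>

// Creates a temporary file and returns its path, or an empty string on
// failure. The "/tmp/example" prefix is an assumption for this sketch.
std::string make_temp_file() {
    char path[] = "/tmp/exampleXXXXXX";  // template must end in six X's
    int fd = mkstemp(path);              // replaces the X's in place
    if (fd == -1) {
        return std::string();
    }
    close(fd);                           // a real program may keep fd open
    return std::string(path);            // the buffer now holds the real name
}
```

The caller is responsible for removing the file (for example with unlink) once it is no longer needed.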

Is there a fully Standard compliant way to make the compiler paste the exact (binary) contents of a file in a source file?

I would like to provide the ease of use of Qt's Resource system (basically an xml containing a list of files, that is precompiled into a C++ source file containing a bunch of char arrays of the binary byte content of these files, which is compiled into the binary), in pure C++.
So I wonder, is the first and pretty much only requirement, even at all possible?
Can I compile a binary file into an object file?
I know it's not simply possible to #include another file in a C++11 raw string literal, but maybe there is a way around this. I would like to ditch the precompile step. Is there a way?
In the worst case, maybe a linker script and some voodoo in the code to access these bytes can make this functional, but I don't know if that's any better than the precompile step (which is certainly a lot more transparent...).

but I don't know if that's any better than the precompile step (which is certainly a lot more transparent...).
^^^^ That's the way to go (emphasis mine).
I don't see anything standard-compliant that lets you #include binary files directly into your code.
You'll need a tool that translates that binary file to something like
uint8_t myBinaryData[] = { 0x00, 0xfb, 0x42, /* ... */ };
and then include that.
A simple Python script or similar will do fine.
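On the consuming side, the generated array is used like any other constant. Because binary data may contain zero bytes, the size has to travel with it rather than relying on a terminator. A sketch with three stand-in bytes (myBinaryDataSize and sum_bytes are names made up for this example):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for what a generator script might emit; real data would be longer.
const std::uint8_t myBinaryData[] = { 0x00, 0xfb, 0x42 };
const std::size_t myBinaryDataSize = sizeof(myBinaryData);

// Trivial consumer: adds up all bytes, including the leading zero byte
// that would cut a C string short.
unsigned sum_bytes(const std::uint8_t* data, std::size_t size) {
    unsigned total = 0;
    for (std::size_t i = 0; i < size; ++i) {
        total += data[i];
    }
    return total;
}
```

Calling sum_bytes(myBinaryData, myBinaryDataSize) walks all three bytes regardless of their values.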

Stripping symbols from shared library and encryption key

I am working on a shared library (.so on Linux) which uses an XML file as a small database, and that XML file is encrypted. Here is an abstract of my code:
void my_fucnt(char *in, char *out)
{
static char key[] = {0x34, 0x6c, 0x54....};
enrcryption(key, in, out);
}
First things first: the other day I was examining the library with objdump and found that many of the symbols (even those declared static) were present in the object file, which I thought revealed most of my code logic. So I searched the internet, found out about the strip utility, and used it.
It would be nice to know what method the strip utility applies: does it store the addresses of symbols instead of their names?
Secondly, I still see the key in the .data section of the object file, which reveals the database key even though I have stripped the symbols. Is there any way I can hide that? Or what other techniques can be applied to encrypt my database file?
Any help would be appreciated.

What strip does is remove debug symbols and information. This can often take up a large part of an executable file, which is the reason the utility exists.
As for the key, it's going to be in there somewhere. You can obfuscate it (encrypting the key itself, store each byte of the key in different places, etc.) but if a cracker wants to find it he or she will find it. They are notoriously good at reverse engineering and figuring out what a piece of assembly code does.
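For illustration only, here is one of the simplest forms of such obfuscation: the key is stored XOR-ed with a mask and rebuilt at run time, so the plain bytes no longer appear in the data section. The three key bytes and the mask 0x5a are sample values chosen for this sketch, and as said above, this only slows down a determined reverse engineer; it is not real protection.

```cpp
#include <array>
#include <cstddef>

// Sample values for this sketch: a 3-byte key {0x34, 0x6c, 0x54} stored
// XOR-ed with the mask 0x5a, so only the masked bytes are in the binary.
constexpr unsigned char kMask = 0x5a;
constexpr std::array<unsigned char, 3> kMaskedKey = {{0x6e, 0x36, 0x0e}};

// Rebuilds the real key on the stack when it is needed.
std::array<unsigned char, 3> recover_key() {
    std::array<unsigned char, 3> key{};
    for (std::size_t i = 0; i < key.size(); ++i) {
        key[i] = static_cast<unsigned char>(kMaskedKey[i] ^ kMask);
    }
    return key;
}
```

Anyone who reads the disassembly of recover_key can undo this in seconds, which is exactly the point made above.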

Registering each C/C++ source file to create a runtime list of used sources

For a debugging and logging library, I want to be able to find, at runtime, a list of all of the source files that the project has compiled and linked. I assume I'll be including some kind of header in each source file, and the preprocessor __FILE__ macro can give me a character constant for that file, so I just need to somehow "broadcast" that information from each file to be gathered by a runtime function.
The question is how to elegantly do this, and especially if it can be done from C as opposed to C++. In C++ I'd probably try to make a class with a static storage to hold the list of filenames. Each header file would create a file-local static instance of that class, which on creation would append the __FILE__ pointer or whatever into the class's static data members, perhaps as a linked list.
But I don't think this will work in C, and even in C++ I'm not sure it's guaranteed that each element will be created.

I wouldn't do that sort of thing right in the code. I would write a tool which parsed the project file (vcproj, makefile or even just scan the project directory for *.c* files) and generated an additional C source file which contained the names of all the source files in some kind of pre-initialized data structure.
I would then make that tool part of the build process so that every time you do a build this would all happen automatically. At run time, all you would have to do is read that data structure that was built.

I agree with Ferruccio: the best way to do this is in the build system, not the code itself. As an expansion of his idea, add a target to your build system which dumps the list of files (which it has to know anyway) to a C file as a string, or array of strings, and compile this file into your program. This avoids a lot of complication in the source, and is extensible if you want to add further information, like the version number from your source code control system, who built the executable, etc.
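As a rough picture of what such a generated file and its run-time use might look like (the file names, kSourceFiles, kSourceFileCount and is_project_source are all invented for this sketch; the build tool would emit the real list):

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical contents of the file the build step generates.
const char* const kSourceFiles[] = { "main.c", "logger.c", "util.c" };
const std::size_t kSourceFileCount =
    sizeof(kSourceFiles) / sizeof(kSourceFiles[0]);

// Run-time side: the logging library simply reads the table.
bool is_project_source(const char* name) {
    for (std::size_t i = 0; i < kSourceFileCount; ++i) {
        if (std::strcmp(kSourceFiles[i], name) == 0) {
            return true;
        }
    }
    return false;
}
```

Nothing in the individual source files has to change; only the generated table is rebuilt on each build.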

There is a standard way on UNIX and Linux: ident. For every source file you create an ID tag; usually it is assigned by your version control system, e.g. SVN keywords.
Then to find out the name and revision of each source file you just use the ident command. If you need to do it at runtime, check out how ident does it; the source for it should be freely available.

There's no way to do it in C. In C++ you can create a class like this:
struct Reg {
    Reg( const char * file ) {
        StaticDictionary::Register( file );
    }
};
where StaticDictionary is a singleton container for all your file names. Then in each source file:
static Reg regthisfile( __FILE__ );
You would want to make the dictionary a Meyers singleton to avoid order of creation problems.
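A fuller sketch of this, with StaticDictionary filled in as a Meyers singleton. The storage type (a vector of strings) and the Files accessor are assumptions of mine; the answer only names the class and its Register function.

```cpp
#include <string>
#include <vector>

class StaticDictionary {
public:
    static void Register(const char* file) { storage().push_back(file); }
    static const std::vector<std::string>& Files() { return storage(); }

private:
    // Meyers singleton: the function-local static is constructed on first
    // use, so it already exists when any Reg constructor runs, whatever the
    // initialization order of the translation units.
    static std::vector<std::string>& storage() {
        static std::vector<std::string> instance;
        return instance;
    }
};

struct Reg {
    explicit Reg(const char* file) { StaticDictionary::Register(file); }
};

// One such line per source file; it runs before main() starts.
static Reg regthisfile(__FILE__);
```

By the time main() runs, StaticDictionary::Files() contains one entry for every translation unit that defines a Reg object.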

I don't think you can do this in the way you outline in a "passive" mode. That is, you are going to somehow run code for each source file to be added to the registry, it's hard to get it to happen automatically.
Of course, it's possible that you can make that code very unobtrusive using macros. It might be problematic for C source files that don't have an "entrypoint", so if your code isn't already organised as "modules", with e.g. an init() function for each module, it might be hard. Static initializing code might be possible, I'm not 100% sure if the order in which things are initialized creates problems here.
Using static storage in the registry module sounds like an excellent idea, a plain linked list or simple hash table should be easy enough to implement, if your project doesn't already include any general-purpose utility library.

In C++ your solution will work. It's guaranteed.
Edit: I just thought of a solution: change a rule in your makefile to add
'-include cfiles_register.h' to each 'g++ file.cpp'.
%.o : %.cpp
	$(CC) -include cfiles_register.h -c -o $@ $<
Put the implementation you proposed in the question into that 'cfiles_register.h'.

Using static instances in C++ would work fine.
You could do this also in C, but you need to use runtime specific features - for MSVC CRT take a look at http://www.codeguru.com/cpp/misc/misc/threadsprocesses/article.php/c6945/
For C, you could do it with a macro: define a variable named after your file, and then scan the symbols of your executable. Just as an idea:
#define TRACK_FILE(name) char _file_tracker_##name;
Use it in your my_c_file.c like this:
TRACK_FILE(my_c_file_c)
and then grep all file/variable names from the binary like this:
nm my-binary | grep _file_tracker
Not really nice, but...

Horrible idea, I'm sure, but use a singleton. And on each file do something like
Singleton.register(__FILE__);
at global scope. It'll only work on cpp files though.
I did something like this years ago as a novice, and it worked. But I'd cringe to do it now. I'd add a build step now.

I agree with those who say that it is better to avoid doing this at run time, but in C++ (not in standard C, where the initializer of a static variable must be a constant expression) you can initialize a static variable with a function call, that is, in every file:
static int doesntmatter = register_file( __FILE__);
