Compile a C++ function inside a C++ program - c++

Consider the following problem,
A C++ program may emit source of a C++ function, for example, say it will create a string with contents as below:
std::vector<std::shared_ptr<C>> get_ptr_vec()
{
std::vector<std::shared_ptr<C>> vec;
vec.push_back(std::shared_ptr<C>(new C(val1)));
vec.push_back(std::shared_ptr<C>(new C(val2)));
vec.push_back(std::shared_ptr<C>(new C(val3)));
vec.push_back(std::shared_ptr<C>(new C(val4)));
return vec;
}
The values of val1 etc. will be determined at runtime, when the program creates the string containing the source above. This source will then be written to a file, say get_ptr_vec.cpp.
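A minimal sketch of the emitting side, assuming the values are plain ints and using a hypothetical helper name emit_get_ptr_vec (neither is stated in the question):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds the source text of get_ptr_vec() for values
// that are only known at runtime. int is an assumption made for brevity.
std::string emit_get_ptr_vec(const std::vector<int>& values)
{
    std::ostringstream out;
    out << "std::vector<std::shared_ptr<C>> get_ptr_vec()\n"
        << "{\n"
        << "    std::vector<std::shared_ptr<C>> vec;\n";
    for (int v : values)
        out << "    vec.push_back(std::shared_ptr<C>(new C(" << v << ")));\n";
    out << "    return vec;\n"
        << "}\n";
    return out.str();
}
```

The returned string could then be written to get_ptr_vec.cpp with an std::ofstream.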
Then another C++ program will need to read this source file, and compile it, and call the get_ptr_vec function and get the object it returns. Kind of like a JIT compiler.
Is there any way I can do this? One workaround I can think of is a script that compiles the file and builds it into a shared library; the second program can then get the function through dlopen. However, is there any way to skip this and have the second program compile the file itself (without a call to system)? Note that the second program will not be able to see this source file at compile time. In fact, there will likely be thousands of such small source files emitted by the first program.
To give a little background, the first program builds a tree of expressions and serializes it by traversing it in postorder. Each node of the tree has a string representation written to the file. The second program reads the list of serialized tree nodes and needs to turn this list of strings into a list of C++ objects (from which I can later reconstruct the tree).
I think the LLVM framework may have something to offer here. Can someone give me some pointers on this? Not necessarily a full answer, just somewhere for me to start.

You can compile your generated code with clang and emit LLVM bitcode (the -emit-llvm flag). Then statically link your program with the parts of LLVM that read bitcode files and JIT-compile them. Finally, load the compiled bitcode and run the JIT on it, so the functions become available in your program's address space.
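For the background scenario in the question (a tree serialized in postorder, one node per record), it may also be possible to rebuild the tree from plain data with a stack, without compiling anything. This is only a sketch under an assumed record format, "<label> <child_count>" per line; the Node type and the function name are hypothetical and not from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical node type standing in for the question's class C.
struct Node {
    std::string label;
    std::vector<std::shared_ptr<Node>> children;
};

// Rebuilds a tree from postorder records of the assumed form
// "<label> <child_count>", separated by whitespace.
std::shared_ptr<Node> rebuild_from_postorder(std::istream& in)
{
    std::vector<std::shared_ptr<Node>> stack;
    std::string label;
    std::size_t child_count = 0;
    while (in >> label >> child_count) {
        if (child_count > stack.size())
            throw std::runtime_error("malformed postorder stream");
        auto node = std::make_shared<Node>();
        node->label = label;
        // In postorder, the last child_count stack entries are this node's
        // children, already in left-to-right order.
        auto first = stack.end() - static_cast<std::ptrdiff_t>(child_count);
        node->children.assign(first, stack.end());
        stack.erase(first, stack.end());
        stack.push_back(node);
    }
    if (stack.size() != 1)
        throw std::runtime_error("stream does not describe a single tree");
    return stack.back();
}
```

Feeding it the records "a 0", "b 0", "add 2", "c 0", "mul 2" yields a root labelled mul whose children are add and c. Whether this fits depends on whether the node strings really need to be arbitrary C++ code.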

Related

if we had a single file project that contained all the code can we not use the linker?

Linker question:
If I had a file.c that has no includes at all, would we still need a linker?
Although the linker is so-named because it links together multiple object files, it performs other functions as well. It may resolve addresses that were left incomplete by the compiler. It produces a program in an executable file format that the system’s program loader can read and load, and that format may differ from that of object modules. Specifics depend on the operating system and build tools.
Further, to have a complete program in one source file, you must provide not just the main routine you are familiar with from C and C++ but also the true start of the program, the entry point that the program loader starts execution at, and you must provide implementations for all functions you use in the program, such as invocations of system services via special trap or system-call instructions to read and write data.
You can create a project that has no typical C startup code, in which case you may not even have a main(). However, you still need a linker, because the linker creates the executable file format required for the given architecture.
It also sets the entry point, where the actual execution starts.
So you can omit the standard libraries and create a binary that is completely void of any C library functions, but you still need the linker to actually make a runnable binary.
The object file format generated by the compiler is very different from the executable file format, because it only provides the information that the linker requires.
Yes. The linker does more than merely link the files. Check out this resource for more info: https://en.wikibooks.org/wiki/C%2B%2B_Programming/Programming_Languages/C%2B%2B/Code/Compiler/Linker#:~:text=The%20linker%20is%20a%20program,translation%20unit%20have%20external%20linkage.
Believe it or not, multiple libraries can be referenced by default. So, even if you don't #include a resource, the compiler may have to internally link or reference something outside of the translation unit. There are also redundancies and other considerations that are "eliminated" by the compiler.
Despite its name the linker is properly a "linker/locater". It performs two functions - 1) linking object code, 2) determining where in memory the data and code elements exist.
The object code out of the compiler is not "located" even if it has no unresolved links.
Also even if you have the simplest possible valid code:
int main(){ return 0; }
with no includes, the linker will normally implicitly link the C runtime start-up code, which is required to do everything necessary before running main(). That may be very little. On some targets, such as ARM Cortex-M, you can in fact run C code directly from the reset vector, so long as you don't assume static initialisation or complete library support. So it is possible to write the reset code entirely in C, but you probably still need code to initialise the vector table with the reset handler (your C start-up function) and the initial stack pointer. On Cortex-M that can perhaps be done using in-line assembler, but it is all rather cumbersome and unnecessary, and it does not forgo the linker.

C++ to evaluate inclusion file during runtime

What I need to do is to "fine tune" some constant values that should be compiled along with the rest of the program, but I want to verify the results at every change without having to modify a value and recompile the whole program each time. So I was thinking at a sort of plain text configuration file to reload every time I change a number in it, and re-initialize part of the program to take action on the new values. It's something that I do often, but this time what I want to do is to have this configuration file under the form of a valid inclusion file with the following syntax:
const MyStructure[] =
{
{ 1, 0.5f, 0.2f, 0.77f, [other values...] },
{ 3, 0.4f, 0.1f, 0.15f, [other values...] },
[other rows...]
};
If I were using an interpreted language such as Perl, I'd have used the eval() function, which of course is not possible with C++. And while I have read other questions about the possibility of having an eval() function in C++, what I want is not to evaluate and run this code, just to parse it and put the values in the variables they belong to.
I would probably use a Regular Expression to parse the C syntax above, but again, RegExp still is not something worth using in C++, so can you suggest an alternative method?
It's probably worth saying that I need to parse this file only during the development phase. I will #include it when the program is ready for the release.
Writing your own parser is probably more work than is appropriate for this use case.
A simpler solution would be to just compile the file containing the variables separately, as a shared object or DLL, which can be loaded dynamically at run time. (Precise details depend on your OS.) You could, if desired, invoke the compiler during program initialisation as well.
If you don't want to deal with the complication of finding the symbols and copying them into static variables, you could also compile the bulk of your program as a shared object, with only a small shim as the main executable. That shim would:
If necessary, invoke the compiler to create the data shared object
Dynamically load the data shared object
Dynamically load the program shared object, and
Invoke the main program using its main entry point (possibly using a different name).
To produce the production version, it is only necessary to compile program and data together, and use it directly without the shim.
Variations on this theme are possible, depending on precise needs.
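As one such variation: if the file is known to contain nothing but rows of numeric literals, as in the question, a narrow scanner (not a real C++ parser) may be enough during development. The following is a sketch under that assumption; the function name scan_rows is hypothetical, and comments, identifiers or expressions inside the rows are not handled:

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Extracts numeric literals from inner rows such as "{ 1, 0.5f, 0.2f },".
// Assumes the rows sit at brace depth 2 and contain only numeric literals.
std::vector<std::vector<double>> scan_rows(const std::string& text)
{
    std::vector<std::vector<double>> rows;
    int depth = 0;
    const char* p = text.c_str();
    while (*p) {
        const unsigned char c = static_cast<unsigned char>(*p);
        if (c == '{') {
            if (++depth == 2) rows.emplace_back();  // a new inner row starts
            ++p;
        } else if (c == '}') {
            --depth;
            ++p;
        } else if (depth == 2 && (std::isdigit(c) || c == '-' || c == '.')) {
            char* end = nullptr;
            const double value = std::strtod(p, &end);
            if (end == p) { ++p; continue; }        // lone '-' or '.', skip it
            rows.back().push_back(value);
            p = end;
            if (*p == 'f' || *p == 'F') ++p;        // drop the float suffix
        } else {
            ++p;
        }
    }
    return rows;
}
```

Each returned row can then be copied into the corresponding structure fields. Anything beyond plain literals is better served by the shared-object approach above.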

Compile lua code, store bytecode then load and execute it

I'm trying to compile a lua script that calls some exported functions, save the resulting bytecode to a file and then load this bytecode and execute it, but I haven't found any example on how to do this. Is there any example available on how to do this? How can I do this?
Edit: I'm using Lua + Luabind (C++)
This is all very simple.
First, you load the Lua script without executing it. It does not matter if you have connected the Lua state with your exported functions; all you're doing is compiling the script file.
You could use luaL_loadfile, which uses C-standard library functions to read a file from disk and load it into the lua_State. Alternatively, you can load the file yourself into a string and use luaL_loadstring to load it into the lua_State.
Both of these functions will emit return values and compiler errors as per the documentation for lua_load.
If the compilation was successful, the lua_State now has the compiled Lua chunk as a Lua function at the top of the stack. To get the compiled binary, you must use the lua_dump function. It's rather complicated as it uses a callback interface to pass you data. See the documentation for details.
After that process, you have the compiled Lua byte code. Shove that into a file of your choice. Just remember: write it as binary, not with text translation.
When it comes time to load the byte code, all you need to do is... exactly what you did before. Well, almost. Lua has heuristics to detect whether a "string" it is given is Lua source or byte code. So yes, you can load byte code with luaL_loadfile just like before.
The difference is that you can't use luaL_loadstring with byte code. That function expects a NUL-terminated string, which is bad: byte code can have embedded NUL characters in it, which would screw everything up. So if you want to do the file IO yourself (because you're using a special filesystem or something), you have to use lua_load directly, or luaL_loadbuffer. lua_load uses a callback interface like lua_dump, while luaL_loadbuffer takes a pointer and an explicit size. So read up on how to use them.
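The binary-mode and embedded-NUL points can be illustrated without Lua at all. The sketch below uses two hypothetical helpers, write_binary and read_binary; the buffer they handle stands in for whatever lua_dump produced, and its size must be carried explicitly rather than recovered with strlen:

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Writes a buffer to disk in binary mode, so no text translation occurs.
void write_binary(const std::string& path, const std::string& bytes)
{
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Reads the whole file back, keeping any embedded zero bytes.
std::string read_binary(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With a 7-byte buffer whose fifth byte is zero, read_binary returns all 7 bytes, while strlen on the same data stops after 4. That is why the data and size pair (for example bytes.data() and bytes.size()) is what should be handed to a size-taking loader such as luaL_loadbuffer.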

Parsing C++ to make some changes in the code

I would like to write a small tool that takes a C++ program (a single .cpp file), finds the "main" function and adds 2 function calls to it, one in the beginning and one in the end.
How can this be done? Can I use g++'s parsing mechanism (or any other parser)?
If you want to make it solid, use clang's libraries.
As suggested by some commenters, let me put forward my idea as an answer:
So basically, the idea is:
... original .cpp file ...
#include <yourHeader>
namespace {
SpecialClass specialClassInstance;
}
Where SpecialClass is something like:
class SpecialClass {
public:
SpecialClass() {
firstFunction();
}
~SpecialClass() {
secondFunction();
}
};
This way, you don't need to parse the C++ file. Since you are declaring a global, its constructor will run before main starts and its destructor will run after main returns.
The downside is that you don't get to know the relative order of when your global is constructed compared to others. So if you need to guarantee that firstFunction is called before any other constructor elsewhere in the entire program, you're out of luck.
I've heard the GCC parser is both hard to use and even harder to get at without invoking the whole toolchain. I would try the clang C/C++ parser (libparse), and the tutorials linked in this question.
Adding a function call at the beginning of main() and another at the end of main() is a bad idea. What if someone returns in the middle?
A better idea is to instantiate a class at the beginning of main() and let that class's destructor call the function you want called at the end. This ensures that the function always gets called.
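A minimal sketch of that idea follows. The names CallGuard, program_body and call_log are hypothetical, program_body stands in for main(), and firstFunction and secondFunction merely log their calls so the order can be observed:

```cpp
#include <cassert>
#include <string>

std::string call_log;                       // records call order for the demo

void firstFunction()  { call_log += "first;"; }
void secondFunction() { call_log += "second;"; }

// Calls firstFunction on construction and secondFunction on destruction.
class CallGuard {
public:
    CallGuard()  { firstFunction(); }
    ~CallGuard() { secondFunction(); }
    CallGuard(const CallGuard&) = delete;
    CallGuard& operator=(const CallGuard&) = delete;
};

// Stand-in for main(): the guard is the first statement in the body.
int program_body(bool leave_early)
{
    CallGuard guard;
    call_log += "body;";
    if (leave_early)
        return 1;                           // the destructor still runs here
    call_log += "tail;";
    return 0;
}
```

Both the early return and the normal return end with secondFunction being called. Note that a call to std::exit() does not destroy automatic objects, so the guard would not fire on that path.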
If you have control of your main program, you can hack a script to do this, and that's by far the easiest way. Simply make sure the insertion points are obvious (odd comments, required placement of tokens, you choose) and unique (including outlawing general coding practices if you have to, to ensure the uniqueness you need is real). Then a dumb string hacking tool that reads the source, finds the unique markers, and inserts your desired calls will work fine.
If the source of the main program comes from other sources and you don't have control, then to do this well you need a full C++ program transformation engine. You don't want to build this yourself, as just the C++ parser is an enormous effort to get right. Others here have mentioned Clang and GCC as answers.
An alternative is our DMS Software Reengineering Toolkit with its C++ front end. DMS, using its C++ front end, can parse code (for a variety of C++ dialects), build ASTs, and carry out full name/type resolution to determine the meaning/definition/use of all symbols. It provides procedural and source-to-source transformations to enable changes to the AST, and can regenerate compilable source code complete with original comments.

How to use CUDA constant memory in a programmer pleasant way?

I'm working on a number crunching app using the CUDA framework. I have some static data that should be accessible to all threads, so I've put it in constant memory like this:
__device__ __constant__ CaseParams deviceCaseParams;
I use the call cudaMemcpyToSymbol to transfer these params from the host to the device:
void copyMetaData(CaseParams* caseParams)
{
cudaMemcpyToSymbol("deviceCaseParams", caseParams, sizeof(CaseParams));
}
which works.
Anyways, it seems (by trial and error, and also from reading posts on the net) that for some sick reason, the declaration of deviceCaseParams and the copy operation on it (the call to cudaMemcpyToSymbol) must be in the same file. At the moment I have these two in a .cu file, but I really want to have the parameter struct in a .cuh file so that any implementation could see it if it wants to. That means that I also have to have the copyMetaData function in a header file, but this messes up linking (symbol already defined), since both .cpp and .cu files include this header (and thus both the MS C++ compiler and nvcc compile it).
Does anyone have any advice on design here?
Update: See the comments
With an up-to-date CUDA (e.g. 3.2) you should be able to do the memcpy from within a different translation unit if you're looking up the symbol at runtime (i.e. by passing a string as the first arg to cudaMemcpyToSymbol as you are in your example).
Also, with Fermi-class devices you can just malloc the memory (cudaMalloc), copy to the device memory, and then pass the argument as a const pointer. The compiler will recognise if you are accessing the data uniformly across the warps and if so will use the constant cache. See the CUDA Programming Guide for more info. Note: you would need to compile with -arch=sm_20.
If you're using pre-Fermi CUDA, you will have found out by now that this problem doesn't just apply to constant memory, it applies to anything you want on the CUDA side of things. The only two ways I have found around this are to either:
Write everything CUDA in a single file (.cu), or
If you need to break out code into separate files, restrict yourself to headers which your single .cu file then includes.
If you need to share code between CUDA and C/C++, or have some common code you share between projects, option 2 is the only choice. It seems very unnatural to start with, but it solves the problem. You still get to structure your code, just not in a typical C-like way. The main overhead is that every time you do a build, you compile everything. The plus side of this (which I think is possibly why it works this way) is that the CUDA compiler has access to all the source code in one hit, which is good for optimisation.