C++ to evaluate inclusion file during runtime

What I need to do is to "fine tune" some constant values that should be compiled along with the rest of the program, but I want to verify the results at every change without having to modify a value and recompile the whole program each time. So I was thinking of a sort of plain text configuration file that I reload every time I change a number in it, re-initializing part of the program to act on the new values. It's something that I do often, but this time I want this configuration file to take the form of a valid inclusion file with the following syntax:
const MyStructure[] =
{
{ 1, 0.5f, 0.2f, 0.77f, [other values...] },
{ 3, 0.4f, 0.1f, 0.15f, [other values...] },
[other rows...]
};
If I were using an interpreted language such as Perl, I'd have used the eval() function, which of course is not possible with C++. And while I have read other questions about the possibility of having an eval() function in C++, what I want is not to evaluate and run this code, just to parse it and put the values in the variables they belong to.
I would probably use a Regular Expression to parse the C syntax above, but again, RegExp still is not something worth using in C++, so can you suggest an alternative method?
It's probably worth saying that I need to parse this file only during the development phase. I will #include it when the program is ready for the release.

Writing your own parser is probably more work than is appropriate for this use case.
A simpler solution would be to just compile the file containing the variables separately, as a shared object or DLL, which can be loaded dynamically at run time. (Precise details depend on your OS.) You could, if desired, invoke the compiler during program initialisation as well.
If you don't want to deal with the complication of finding the symbols and copying them into static variables, you could also compile the bulk of your program as a shared object, with only a small shim as the main executable. That shim would:
If necessary, invoke the compiler to create the data shared object
Dynamically load the data shared object
Dynamically load the program shared object, and
Invoke the main program using its main entry point (possibly using a different name).
To produce the production version, it is only necessary to compile program and data together, and use it directly without the shim.
Variations on this theme are possible, depending on precise needs.
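As a rough, POSIX-only sketch of the "dynamically load the data shared object" step (the details depend on the OS, as noted above), the helper below opens a shared object with dlopen and looks up one exported symbol with dlsym. The name load_symbol is hypothetical, and so are the library path and symbol name in the note that follows; on Windows the corresponding calls would be LoadLibrary and GetProcAddress.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <stdexcept>
#include <string>

// Open a shared object and return the address of one exported symbol.
// An empty path asks dlopen for the running program and its dependencies.
// Throws std::runtime_error if the library or the symbol cannot be found.
void* load_symbol(const std::string& path, const std::string& symbol) {
    assert(!symbol.empty());
    void* handle = dlopen(path.empty() ? nullptr : path.c_str(), RTLD_NOW);
    if (handle == nullptr) {
        const char* why = dlerror();
        throw std::runtime_error(why ? why : "dlopen failed");
    }
    dlerror();  // clear any stale error before calling dlsym
    void* address = dlsym(handle, symbol.c_str());
    if (const char* why = dlerror()) {
        dlclose(handle);
        throw std::runtime_error(why);
    }
    // The handle is deliberately left open so the returned address stays valid.
    return address;
}
```

Assuming the data file were compiled into something like ./libtuning.so and exported its array with external C linkage under a name such as tuning_table, the program could call load_symbol("./libtuning.so", "tuning_table") after each rebuild and cast the result to a pointer to the structure type.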

Related

Compile a C++ function inside a C++ program

Consider the following problem,
A C++ program may emit source of a C++ function, for example, say it will create a string with contents as below:
std::vector<std::shared_ptr<C>> get_ptr_vec()
{
std::vector<std::shared_ptr<C>> vec;
vec.push_back(std::shared_ptr<C>(new C(val1)));
vec.push_back(std::shared_ptr<C>(new C(val2)));
vec.push_back(std::shared_ptr<C>(new C(val3)));
vec.push_back(std::shared_ptr<C>(new C(val4)));
return vec;
}
The values of val1 etc. will be determined at runtime, when the program creates the string of the source above. This source will then be written to a file, say get_ptr_vec.cpp.
Then another C++ program will need to read this source file, and compile it, and call the get_ptr_vec function and get the object it returns. Kind of like a JIT compiler.
Is there any way I can do this? One workaround I can think of is to have a script that compiles the file and builds it into a shared library; the second program can then get the function through dlopen. However, is there any way to skip this and have the second program compile the file itself (without a call to system)? Note that the second program will not be able to see this source file at compile time. In fact, there will likely be thousands of such small source files emitted by the first program.
To give a little background, the first program will build a tree of expressions, and will serialize the tree by traversing through postorder. Each node of tree will have a string representation written to the file. The second program will read the list of this serialized tree nodes, and need to be able to reconstruct this list of strings to a list of C++ objects (and later from this list I can reconstruct the tree).
I think the LLVM framework may have something to offer here. Can someone give me some pointers on this? Not necessarily a full answer, just somewhere for me to start.
You can compile your generated code with clang and emit LLVM bitcode (the -emit-llvm flag). Then statically link your program with the parts of LLVM that read bitcode files and JIT them. Finally, load the compiled bitcode and run the JIT on it, so that the functions become available in your program's address space.

How does it work and compile a C++ extension of TCL with a Macro and no main function

I have a working TCL script plus a C++ extension, but I don't know exactly how it works or how it was compiled. I am using gcc on Arch Linux.
It works as follows: when we execute the test.tcl script, it passes some values to an object of a class defined in the C++ extension. Using these values, the extension (by means of a macro) produces some results and draws some graphics.
In the test.tcl script I have:
#!object
use_namespace myClass
proc simulate {} {
uplevel #0 {
set running 1
for {} {$running} { } {
moveBugs
draw .world.canvas
.statusbar configure -text "t:[tstep]"
}
}
}
set toroidal 1
set nx 100
set ny 100
set mv_dist 4
setup $nx $ny $mv_dist $toroidal
addBugs 100
# size of a grid cell in pixels
set scale 5
myClass.scale 5
The object.cc looks like:
#include //some includes here
MyClass myClass;
make_model(myClass); // --> this is a macro!
The Macro "make_model(myClass)" expands as follows:
namespace myClass_ns { DEFINE_MYLIB_LIBRARY; int TCL_obj_myClass
(mylib::TCL_obj_init(myClass),TCL_obj(mylib::null_TCL_obj,
(std::string)"myClass",myClass),1); };
The Class definition is:
class MyClass:
{
public:
int tstep; //timestep - updated each time moveBugs is called
int scale; //no. pixels used to represent bugs
void setup(TCL_args args) {
int nx=args, ny=args, moveDistance=args;
bool toroidal=args;
Space::setup(nx,ny,moveDistance,toroidal);
}
The whole thing creates a cell-grid with some dots (bugs) moving from one cell to another.
My questions are:
How do the class methods and variables get the script values?
How is it possible to have C++ code and compile it without a main function?
What is that macro doing there in the extension, and how does it work?
Thanks
Whenever a command in Tcl is run, it calls a function that implements that command. That function is written in a language like C or C++, and it is passed in the arguments (either as strings or Tcl_Obj* values). A full extension will also include a function to do the library initialisation; the function (which is external, has C linkage, and which has a name like Foo_Init if your library is foo.dll) does basic setting up tasks like registering the implementation functions as commands, and it's explicit because it takes a reference to the interpreter context that is being initialised.
The implementation functions can do pretty much anything they want, but to return a result they use one of the functions Tcl_SetResult, Tcl_SetObjResult, etc. and they have to return an int containing the relevant exception code. The usual useful ones are TCL_OK (for no exception) and TCL_ERROR (for stuff's gone wrong). This is a C API, so C++ exceptions aren't allowed.
It's possible to use C++ instance methods as command implementations, provided there's a binding function in between. In particular, the function has to get the instance pointer by casting a ClientData value (an alias for void* in reality, remember this is mostly a C API) and then invoking the method on that. It's a small amount of code.
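The binding-function idea can be sketched without the Tcl headers. Everything below is a simplified stand-in written for illustration: ClientData is declared locally as void* (as described above), Counter and counter_add_cmd are hypothetical names, and the command signature is deliberately reduced. A real Tcl object command also receives the interpreter and an array of Tcl_Obj* arguments, and would be registered from the initialisation function.

```cpp
#include <cassert>

// Local stand-in for Tcl's ClientData; a real extension gets it from <tcl.h>.
using ClientData = void*;

// Hypothetical C++ object whose method should be reachable as a command.
class Counter {
public:
    int add(int amount) { total_ += amount; return total_; }
private:
    int total_ = 0;
};

// Binding function with C linkage: it recovers the instance pointer from the
// opaque ClientData value and forwards the call to the instance method.
// Returns 0 on success and 1 on failure, mirroring TCL_OK and TCL_ERROR.
extern "C" int counter_add_cmd(ClientData clientData, int amount, int* result) {
    assert(result != nullptr);
    if (clientData == nullptr) {
        return 1;  // no instance was registered with the command
    }
    Counter* self = static_cast<Counter*>(clientData);
    *result = self->add(amount);
    return 0;
}
```

The pointer handed over when the command is registered is the same one that comes back as clientData on every invocation, which is what lets a plain C callback reach a particular C++ object.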
Compiling things is just building a DLL that links against the right library (or libraries, as required). While extensions are usually recommended to link against the stub library, it's not necessary when you're just developing and testing on one machine. But if you're linking against the Tcl DLL, you'd better make sure that the code gets loaded into a tclsh that uses that DLL. Stub libraries get rid of that tight binding, providing pretty strong ABI stability, but are a little more work to set up; you need to define the right C macro to turn them on and you need to do an extra API call in your initialisation function.
I assume you already know how to compile and link C++ code. I won't tell you how to do it, but there are bound to be other questions here on Stack Overflow if you need assistance.
Using the code? For an extension, it's basically just:
# Dynamically load the DLL and call the init function
load /path/to/your.dll
# Commands are all present, so use them
NewCommand 3
There are some extra steps later on to turn a DLL into a proper Tcl package, abstracting code that uses the DLL away from the fact that it is exactly that DLL and so on, but they're not something to worry about until you've got things working a lot more.

Store results in a separate library for later loading

If I have some set of results which can be calculated at compile-time, and I want to use them elsewhere in a program, can I place them in a (shared?) library for later linking? Will this be slower?
For example, I can calculate factorials at compile time using
template<size_t N>
struct Factorial {
constexpr static size_t value = Factorial<N-1>::value * N;
};
template<>
struct Factorial<0> {
constexpr static size_t value = 1;
};
// Possibly an instantiation for a max value?
// template class Factorial<50>;
Then to use this in code, I just write Factorial<32>::value, or similar.
If I assume that my real values take somewhat longer to compute, then I might want to ensure that they aren't recomputed on each build / on any build that considers the old build to be invalid.
Consequently, I move the calculating code into a separate project, which I compile as a shared library.
Now, to use it, I link my main program to the library, and #include the header.
However, the library file is rather small (and seemingly independent of the value passed to create the template), and so I wonder if in fact, the library is only holding the methods to create a Factorial struct, and not the precomputed data.
How can I calculate a series of values and then use them in a separate program?
Solutions which provide compile-time value injection would be preferred - I note that loading a shared library does not fall into this category (I think)
What's happening here is the actual "code" that does the calculation is still in the header. Putting it into a shared library didn't really do anything; the compiler is still recomputing the factorials for your main program. (So, your intuition is correct.)
A better approach is to write another program to spit out the values as the source code for a C++ constant array, then copy and paste them into your code. This will probably take about 5 lines of Python, and your C++ code will compile and run quickly.
You could calculate the variables as part of your build process (through a separate application which you compile and invoke as part of your build process) and store the result in a generated source file.
With CMake (configure_file), Makefiles or NMake, for example, this should be very easy.
You could use the generated source file to generate a shared library as you suggested or you could link/ include the generated sources into your application directly.
An advantage of this approach is that you are no longer limited to compile-time calculation: the generator can compute the values at its own run time, which will usually be faster since that code is compiled and optimized normally.
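A generator of this kind might look roughly like the sketch below, which builds the text of a source file containing a factorial table. The function name generate_factorial_table and the array name kFactorials are placeholders chosen for this example, and the generated text is assumed to be written to a file that the main program then #includes.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Return the text of a C++ source file that defines a constant table
// holding 0! through (count - 1)!.
std::string generate_factorial_table(unsigned count) {
    // 20! is the largest factorial that fits in an unsigned 64-bit integer.
    assert(count >= 1 && count <= 21);
    std::ostringstream out;
    out << "#include <cstdint>\n";
    out << "const std::uint64_t kFactorials[" << count << "] = {\n";
    std::uint64_t value = 1;
    for (unsigned n = 0; n < count; ++n) {
        if (n > 0) {
            value *= n;  // n! = (n - 1)! * n
        }
        out << "    " << value << "ULL,\n";
    }
    out << "};\n";
    return out.str();
}
```

A build step would run the generator once, save its output, and recompile the main program only when the generated file changes, so the expensive calculation is not repeated on every build.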

What is the difference between linking and binding?

I was reading about the two things and got confused, what are the differences between the two?
Binding is a word that is used in more than one context. It always has to do with connecting one thing to another; however, when the act of binding happens can vary.
There is a concept of Binding Time or the point at which some component is bound to some other component. A basic list of binding time is: (1) binding at compile time, (2) binding at link time, (3) binding at load time, and (4) binding at run time.
Binding at compile time happens when the source code is compiled. For C/C++ there are two main stages, the Preprocessor which does source text replacement such as define replacement or macro replacement and the compilation of the source text which converts the source text into machine code along with the necessary instructions for the linker.
Binding at link time is when the external symbols are linked to a specific set of object files and libraries. You may have several different static libraries that have the same set of function names but the actual implementation of the function is different. So you can choose which library implementation to use by selecting different static libraries.
Binding at load time is when the loader loads the executable into memory along with any dynamic or shared libraries. The loader binds function calls to a particular dynamic or shared library and the library chosen can vary.
Binding at run time is when the program is actually running and makes choices depending on the current thread of execution.
So linking is actually just one of the types of binding. Take a look at the Stack Overflow question "static linking vs dynamic linking", which provides more information about linking and libraries.
You may also be interested in std::bind in C++, so here is a Stack Overflow question: "std::function and std::bind what are they when they should be used".
The longer you wait before binding something to something else, the more flexibility you have in how the software can be used. However, there is often a trade-off between delaying binding and run-time efficiency, as well as complexity of the source.
For an example of bind time consider an application that opens a file and reads from the file then closes it. You can pick a couple of different times when the file name is bound to the file open.
You might hard code the file name, binding at compile time, which means that it can only be used with that one file. To change the file name you have to change the source and recompile.
You might have the file name entered by the user such as with a user prompt or a command line argument, binding the file name to the file open at run time. To change the file name you no longer need to recompile, you can just run the program again with a different file name.
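The two choices can be put side by side in a short sketch. The default file name and the function name choose_file_name are made up for illustration; the point is only that one name is fixed when the program is compiled and the other is supplied when it runs.

```cpp
#include <cassert>
#include <string>

// Bound at compile time: changing this means editing the source and recompiling.
constexpr const char* kDefaultFileName = "input.txt";  // hypothetical name

// Bound at run time: use the first command-line argument if one was given,
// otherwise fall back to the compiled-in name.
std::string choose_file_name(int argc, const char* const argv[]) {
    assert(argc >= 0);
    if (argc > 1 && argv[1] != nullptr) {
        return argv[1];
    }
    return kDefaultFileName;
}
```

Calling choose_file_name(argc, argv) from main() lets the same executable be pointed at a different file on each run, at the cost of a small amount of extra logic.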
Suppose you have a function declared as:
void f(int, char);
and also as:
void f(int);
and you call f(4), which matches the second signature. Choosing that declaration for the call is the binding.
The linker will link with the available definition of the function body for f matching with signature void f(int);
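A small sketch makes the selection visible. It departs from the declarations above in one respect, as an assumption for illustration: the overloads return a string naming their own signature, so the choice the compiler makes can be checked.

```cpp
#include <cassert>
#include <string>

// Two overloads sharing the name f; each reports which signature it has.
std::string f(int, char) { return "f(int, char)"; }
std::string f(int)       { return "f(int)"; }

// The compiler picks the overload from the argument list at each call site.
std::string call_with_one_argument()  { return f(4); }
std::string call_with_two_arguments() { return f(4, 'x'); }

// Checks that each call was bound to the overload with the matching signature.
void demo() {
    assert(call_with_one_argument() == "f(int)");
    assert(call_with_two_arguments() == "f(int, char)");
}
```

The call f(4) is matched to the one-parameter declaration during compilation; the linker then only has to find the definition that belongs to that declaration.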
In fact, both have the same meaning in the context of C programming. Some people say binding and others say linking.
If you want to know what linking is, here is a short explanation.
Suppose you have written a user-defined function called sum() whose declaration is as follows:
int sum(int, int);
then whenever the function is called from the program, your program should know where to jump in memory to execute that function. In simple terms, the called function's address should be known to your program in order to reach its body; this is called binding.
Now sum is a user-defined function, so it will be present in your source code itself. If it is called from main(), the call is resolved when the program is built, because at that time the toolchain knows where your function will be in the executable. This is called static binding.
Now think about printf(), which is a library function whose body is not present in your program. When the C library is linked dynamically, printf's body will not be in your compiled executable. It will be loaded into memory when you execute your program, and its address will be known to main at run time and not at build time, as in the case of sum(). This type of linking is called dynamic linking.

Parsing C++ to make some changes in the code

I would like to write a small tool that takes a C++ program (a single .cpp file), finds the "main" function and adds 2 function calls to it, one in the beginning and one in the end.
How can this be done? Can I use g++'s parsing mechanism (or any other parser)?
If you want to make it solid, use clang's libraries.
As suggested by some commenters, let me put forward my idea as an answer:
So basically, the idea is:
... original .cpp file ...
#include <yourHeader>
namespace {
SpecialClass specialClassInstance;
}
Where SpecialClass is something like:
class SpecialClass {
public:
SpecialClass() {
firstFunction();
}
~SpecialClass() {
secondFunction();
}
};
This way, you don't need to parse the C++ file. Since you are declaring a global, its constructor will run before main starts and its destructor will run after main returns.
The downside is that you don't get to know the relative order of when your global is constructed compared to others. So if you need to guarantee that firstFunction is called before any other constructor elsewhere in the entire program, you're out of luck.
I've heard the GCC parser is both hard to use and even harder to get at without invoking the whole toolchain. I would try the clang C/C++ parser (libparse), and the tutorials linked in this question.
Adding a function call at the beginning of main() and another at the end of main() is a bad idea. What if main() returns in the middle?
A better idea is to instantiate a class at the beginning of main() and let that class's destructor call the function you want called at the end. This ensures that the function gets called on every return path.
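As a minimal sketch of that idea, the guard class below calls one function in its constructor and the other in its destructor. The names ScopeGuard, run and event_log are invented for this example, run stands in for the body of main(), and the two functions simply record that they were called.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of events so the behaviour can be inspected.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-ins for the two calls to be added around main().
void firstFunction()  { event_log().push_back("first"); }
void secondFunction() { event_log().push_back("second"); }

// Guard object: the constructor makes the first call, the destructor the second.
class ScopeGuard {
public:
    ScopeGuard() { firstFunction(); }
    ~ScopeGuard() { secondFunction(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
};

// Stand-in for the body of main(), with a return in the middle.
int run(bool leave_early) {
    ScopeGuard guard;
    event_log().push_back("work");
    if (leave_early) {
        return 1;  // the guard's destructor still runs on this path
    }
    event_log().push_back("more work");
    return 0;
}
```

The destructor runs on every normal return and when an exception unwinds the scope, but not if the program leaves through std::exit or std::abort, so those paths would still need separate handling.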
If you have control of your main program, you can hack a script to do this, and that's by far the easiest way. Simply make sure the insertion points are obvious (odd comments, required placement of tokens, you choose) and unique (including outlawing general coding practices if you have to, to ensure the uniqueness you need is real). Then a dumb string-hacking tool that reads the source, finds the unique markers, and inserts your desired calls will work fine.
If the source of the main program comes from other sources, and you don't have control, then to do this well you need a full C++ program transformation engine. You don't want to build this yourself, as just the C++ parser is an enormous effort to get right. Others here have mentioned Clang and GCC as answers.
An alternative is our DMS Software Reengineering Toolkit with its C++ front end. DMS, using its C++ front end, can parse code (for a variety of C++ dialects), build ASTs, and carry out full name/type resolution to determine the meaning/definition/use of all symbols. It provides procedural and source-to-source transformations to enable changes to the AST, and can regenerate compilable source code complete with original comments.