Store results in a separate library for later loading - C++

If I have some set of results which can be calculated at compile-time, and I want to use them elsewhere in a program, can I place them in a (shared?) library for later linking? Will this be slower?
For example, I can calculate factorials at compile time using
#include <cstddef> // for size_t

template <size_t N>
struct Factorial {
    constexpr static size_t value = Factorial<N - 1>::value * N;
};

template <>
struct Factorial<0> {
    constexpr static size_t value = 1;
};

// Possibly an explicit instantiation for a max value?
// template class Factorial<50>;
Then to use this in code, I just write Factorial<32>::value, or similar.
If I assume that my real values take somewhat longer to compute, then I might want to ensure that they aren't recomputed on each build / on any build that considers the old build to be invalid.
Consequently, I move the calculating code into a separate project, which I compile as a shared library.
Now, to use it, I link my main program to the library, and #include the header.
However, the library file is rather small (and seemingly independent of the value passed to instantiate the template), so I wonder whether, in fact, the library only holds the methods to create a Factorial struct, and not the precomputed data.
How can I calculate a series of values and then use them in a separate program?
Solutions which provide compile-time value injection would be preferred - I note that loading a shared library does not fall into this category (I think).

What's happening here is the actual "code" that does the calculation is still in the header. Putting it into a shared library didn't really do anything; the compiler is still recomputing the factorials for your main program. (So, your intuition is correct.)
A better approach is to write another program to spit out the values as the source code for a C++ constant array, then copy and paste them into your code. This will probably take about 5 lines of Python, and your C++ code will compile and run quickly.
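As a minimal sketch of such a generator, in C++ for consistency with the surrounding examples (the file and array names are invented):

// gen_factorials.cpp -- helper run once per build:
//   g++ gen_factorials.cpp -o gen && ./gen > factorials.h
#include <cstdio>

int main() {
    std::printf("// factorials.h -- generated file, do not edit\n");
    std::printf("constexpr unsigned long long kFactorials[] = {\n");
    unsigned long long f = 1;
    for (int n = 0; n <= 20; ++n) { // 21! no longer fits in 64 bits
        if (n > 0) f *= static_cast<unsigned long long>(n);
        std::printf("    %lluULL, // %d!\n", f, n);
    }
    std::printf("};\n");
    return 0;
}

The consuming code then just writes #include "factorials.h" and indexes kFactorials[n]; the compiler only has to parse literals, not recompute anything.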

You could calculate the values as part of your build process (through a separate helper application which you compile and invoke as a build step) and store the results in a generated source file.
With CMake (configure_file), Makefiles, or NMake, for example, this should be very easy.
You could use the generated source file to build a shared library as you suggested, or you could link / #include the generated sources into your application directly.
An advantage of this approach is that you are no longer limited to compile-time calculation:
the helper can compute the values at run time, which will be faster since that code can be optimized.
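As a sketch, the build step might emit a pair of files like these (names are illustrative); the main program links against the generated translation unit, so the values are baked in at build time and never recomputed:

// generated_values.h -- hand-written, stable interface
extern const double kPrecomputed[];
extern const unsigned kPrecomputedCount;

// generated_values.cpp -- rewritten by the build step on each run
#include "generated_values.h"
const double kPrecomputed[] = { 1.0, 2.0, 6.0, 24.0 /* , ... */ };
const unsigned kPrecomputedCount = sizeof(kPrecomputed) / sizeof(kPrecomputed[0]);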


Is gcc compile time proportional to number of executions or lines of code?

I am using gcc 4.8.5 to compile C++98 code. My C++ code statically initializes an unordered_map of unordered_maps with ~20,000 total key-value pairs, and has an overloaded function that takes ~450 different types. This program will be executed on a continuous stream of data, and for every block of data the overloaded function will return an output.
The problem is, gcc takes too long to compile due to initializing ~20,000 key-value pairs.
The nested unordered_map has the structure map< DATATYPE, map< key, value >>, and only one of the overloaded functions gets called for each data input. In other words, I do not need to statically initialize the entire nested map; I can instead define map<key, value> dynamically for the corresponding datatype when needed. For example, I can check for the definition of a map and, when it is undefined, populate it later at run time. This will result in a map with ~45 key-value pairs on average.
However, I know that dynamic initialization will require longer code. For a simple execution described above (statically initializing the entire map), will another method such as dynamic initialization significantly reduce compile time? My understanding is that whatever alternative I take, I still need to write code to populate all the key-value pairs. Also, the overhead and actual computation behind populating an unordered_map (hash map) should not differ asymptotically in most cases, and should not be significantly slower than running the same number of loop iterations to increment a value.
For reference, I am writing a Python script that reads multiple JSON files and prints out the C++ code, which then gets compiled with gcc. I am not reading the JSON directly from C++, so whatever I do, the C++ source will need to insert the key-value pairs one by one, because it will not have access to the JSON files.
// Below is someEXE.cpp, which is the output of the Python script.
// Every line is inside a Python print "" (using Python 2.7)
// so that the script can write complete C++ that should compile.
someEXE.cpp
#include <cstdio>
#include <string>
#include <unordered_map>

// example of an overloaded function among ~450;
// takes a pointer to the data and the exampleMap created in main
void exampleFunction(DIFFERENT_TYPE1 *data,
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>> exampleMap) {
    printf("this is in specific format: %s", exampleMap["DATATYPE1"]
        [std::to_string(data->member_variable)].c_str());
    // ... more print calls below (~25 per datatype)
}
int main() {
    // current definition of the unordered_map (total ~20,000 pairs)
    std::unordered_map<std::string, std::unordered_map<std::string,
        std::string>> exampleMap = {
        {"DATATYPE1", {{"KEY1", "VAL1"}, {"KEY2", "VAL2"}, /*...*/}}
    };
    // a test call like the one below is created for all ~450 types;
    // when I run the program, the code printfs the values to the screen
    DIFFERENT_TYPE1 testObj = {0};
    DIFFERENT_TYPE1 *testObjPointer = &testObj;
    exampleFunction(testObjPointer, exampleMap);
    return 0;
}
EDIT: My initial question was "Is CMAKE compile time proportional to...". I changed the term "CMAKE" to the actual compiler name, gcc 4.8.5, with the help of the comments.
With the further code you posted, and Jonathan Wakely's answer on the specific issue with your compiler, I can make a suggestion.
When writing my own codegen, if possible, I prefer generating plain old data and leaving logic and behaviour in non-generated code. This way you get a small(er) body of pure C++ code in data-driven style, and a separate block of dumb, easy-to-generate data in declarative style.
For example, directly code this
// GeneratedData.h
#include <cstddef> // for size_t

namespace GeneratedData {
    struct Element {
        const char *type;
        const char *key;
        const char *val;
    };
    Element const *rawElements();
    size_t rawElementCount();
}
and this
// main.cpp
#include "GeneratedData.h"
#include <string>
#include <unordered_map>
using Map = std::unordered_map<std::string, std::string>;
using TypeMap = std::unordered_map<std::string, Map>;
TypeMap buildMap(GeneratedData::Element const *el, size_t count)
{
    TypeMap map;
    for (; count; ++el, --count) {
        map[el->type].emplace(el->key, el->val); // build the whole thing here
    }
    return map;
}
// rest of main can call buildMap once, and keep the big map.
// NB. don't pass it around by value!
and finally generate the big dumb file
// GeneratedData.cpp
#include "GeneratedData.h"

namespace {
    GeneratedData::Element const array[] = {
        // generated elements here
    };
}

namespace GeneratedData {
    Element const *rawElements() { return array; }
    size_t rawElementCount() { return sizeof(array)/sizeof(array[0]); }
}
If you really want to, you can separate even that logic from your codegen by just #include-ing the generated data in the middle of a hand-written file, but it's probably not necessary here.
Original answer
Is CMAKE
CMake.
... compile time
CMake configures a build system which then invokes your compiler. You haven't told us which build system it is configuring for you, but you could probably run it manually for the problematic object file(s), and see how much of the overhead is really CMake's.
... proportional to number of executions or lines of code?
No.
There is some overhead per-execution. Each executed compiler process has some overhead per line of code, but probably much more overhead per enabled optimization, and some optimizations may scale with cyclomatic complexity or other metrics.
statically initializes unordered_map of unordered_maps with ~20,000 total key-value pairs
You should try to hide your giant initialization as much as possible - you haven't shown any code, but if it's only visible in one translation unit, only one object file will take a very long time to compile.
You could also probably use a codegen tool like gperf to build a perfect hash.
I can't give you a lot more detail without seeing at least a fragment of your actual code and some hint as to how your files and translation units are laid out.
Older versions of GCC take a very long time to compile large initializer-lists like this:
unordered_map<string, unordered_map<string, string>> exampleMap = {
{"DATATYPE1", {{"KEY1", "VAL1"}, {"KEY2", "VAL2"}, /*...*/}}
};
The problem is that every new element in the initializer-list causes more code to be added to the block being compiled, and it gets bigger and bigger, needing to allocate more and more memory for the compiler's AST. Recent versions have been changed to process the initializer-list differently, although some problems still remain. Since you're using GCC 4.8.5 the recent improvements won't help you anyway.
However, I know that dynamic initialization will require longer code. For a simple execution described above (statically initializing the entire map), will another method such as dynamic initialization significantly reduce compile time?
Splitting the large initializer-list into separate statements that insert elements one-by-one will definitely reduce the compile time when using older versions of GCC. Each statement can be compiled very quickly that way, instead of having to compile a single huge initialization that requires allocating more and more memory for each element.
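As a sketch, the generating script could emit one small statement per pair instead of one giant braced initializer (reusing the question's names):

#include <string>
#include <unordered_map>

using Map = std::unordered_map<std::string, std::string>;
using TypeMap = std::unordered_map<std::string, Map>;

TypeMap buildExampleMap() {
    TypeMap m;
    // one generated statement per key-value pair (~20,000 lines),
    // each of which compiles quickly in isolation
    m["DATATYPE1"]["KEY1"] = "VAL1";
    m["DATATYPE1"]["KEY2"] = "VAL2";
    // ...
    return m;
}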

C++ to evaluate inclusion file during runtime

What I need to do is to "fine tune" some constant values that should be compiled along with the rest of the program, but I want to verify the results at every change without having to modify a value and recompile the whole program each time. So I was thinking of a sort of plain-text configuration file that I reload every time I change a number in it, re-initializing part of the program to act on the new values. It's something that I do often, but this time what I want is to have this configuration file in the form of a valid inclusion file with the following syntax:
const MyStructure values[] =
{
    { 1, 0.5f, 0.2f, 0.77f, /* other values... */ },
    { 3, 0.4f, 0.1f, 0.15f, /* other values... */ },
    // other rows...
};
If I were using an interpreted language such as Perl, I'd have used the eval() function, which of course is not possible with C++. And while I have read other questions about the possibility of having an eval() function in C++, what I want is not to evaluate and run this code, just to parse it and put the values into the variables they belong to.
I would probably use a regular expression to parse the C syntax above, but again, regular expressions are still not something worth using in C++, so can you suggest an alternative method?
It's probably worth saying that I need to parse this file only during the development phase; I will #include it when the program is ready for release.
Writing your own parser is probably more work than is appropriate for this use case.
A simpler solution would be to just compile the file containing the variables separately, as a shared object or DLL, which can be loaded dynamically at run time. (Precise details depend on your OS.) You could, if desired, invoke the compiler during program initialisation as well.
If you don't want to deal with the complication of finding the symbols and copying them into static variables, you could also compile the bulk of your program as a shared object, with only a small shim as the main executable. That shim would:
If necessary, invoke the compiler to create the data shared object
Dynamically load the data shared object
Dynamically load the program shared object, and
Invoke the main program using its main entry point (possibly using a different name).
To produce the production version, it is only necessary to compile the program and data together and use the result directly, without the shim.
Variations on this theme are possible, depending on precise needs.
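For the dynamic-loading step on a POSIX system, a minimal sketch might look like the following (libdata.so and the values symbol are assumptions; on Windows you would use LoadLibrary/GetProcAddress instead):

// data.cpp -- the tunable constants, rebuilt whenever a value changes:
//   g++ -shared -fPIC data.cpp -o libdata.so
// Inside data.cpp, give the array C linkage so its symbol is predictable:
//   extern "C" const MyStructure values[] = { /* the rows shown above */ };

// shim.cpp -- minimal loader (link with -ldl on Linux)
#include <dlfcn.h>
#include <cstdio>

int main() {
    void *handle = dlopen("./libdata.so", RTLD_NOW);
    if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    void *sym = dlsym(handle, "values");
    if (!sym) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
    // cast sym to the real array type and pass it to the program's
    // re-initialisation code here

    dlclose(handle); // a real shim would keep the handle open while running
    return 0;
}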

How does a C++ extension of Tcl with a macro and no main function work, and how is it compiled?

I have a working set consisting of a Tcl script plus a C++ extension, but I don't know exactly how it works or how it was compiled. I am using gcc and Arch Linux.
It works as follows: when we execute the test.tcl script, it passes some values to an object of a class defined in the C++ extension. Using these values, the extension, via a macro, produces some results and prints some graphics.
In the test.tcl script I have:
#!object
use_namespace myClass
proc simulate {} {
uplevel #0 {
set running 1
for {} {$running} { } {
moveBugs
draw .world.canvas
.statusbar configure -text "t:[tstep]"
}
}
}
set toroidal 1
set nx 100
set ny 100
set mv_dist 4
setup $nx $ny $mv_dist $toroidal
addBugs 100
# size of a grid cell in pixels
set scale 5
myClass.scale 5
The object.cc looks like:
// some #include directives here (elided in the question)
MyClass myClass;
make_model(myClass); // --> this is a macro!
The Macro "make_model(myClass)" expands as follows:
namespace myClass_ns { DEFINE_MYLIB_LIBRARY; int TCL_obj_myClass
(mylib::TCL_obj_init(myClass),TCL_obj(mylib::null_TCL_obj,
(std::string)"myClass",myClass),1); };
The Class definition is:
class MyClass /* : public Space? -- base-class list elided in the question */
{
public:
    int tstep; // timestep - updated each time moveBugs is called
    int scale; // no. of pixels used to represent bugs
    void setup(TCL_args args) {
        int nx=args, ny=args, moveDistance=args;
        bool toroidal=args;
        Space::setup(nx,ny,moveDistance,toroidal);
    }
    // ... (rest of the class not shown)
};
The whole thing creates a cell-grid with some dots (bugs) moving from one cell to another.
My questions are:
How do the class methods and variables get the script's values?
How is it possible to have C++ code and compile it without a main function?
What is that macro doing there in the extension, and how does it work?
Thanks
Whenever a command in Tcl is run, it calls a function that implements that command. That function is written in a language like C or C++, and it is passed the arguments (either as strings or as Tcl_Obj* values). A full extension will also include a function to do the library initialisation; that function (which is external, has C linkage, and has a name like Foo_Init if your library is foo.dll) does basic setup tasks like registering the implementation functions as commands, and it is explicit because it takes a reference to the interpreter context that is being initialised.
The implementation functions can do pretty much anything they want, but to return a result they use one of the functions Tcl_SetResult, Tcl_SetObjResult, etc. and they have to return an int containing the relevant exception code. The usual useful ones are TCL_OK (for no exception) and TCL_ERROR (for stuff's gone wrong). This is a C API, so C++ exceptions aren't allowed.
It's possible to use C++ instance methods as command implementations, provided there's a binding function in between. In particular, the function has to get the instance pointer by casting a ClientData value (an alias for void* in reality, remember this is mostly a C API) and then invoking the method on that. It's a small amount of code.
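For illustration only, a minimal such binding might look like this (the Counter class and the bump command are invented; the Tcl calls are the standard C API ones, and stubs setup is omitted):

#include <tcl.h>

class Counter {
public:
    int bump() { return ++count_; }
private:
    int count_ = 0;
};

// Binding function: recover the C++ instance from ClientData,
// invoke the method, and hand the result back to Tcl.
extern "C" int BumpCmd(ClientData cd, Tcl_Interp *interp,
                       int objc, Tcl_Obj *const objv[]) {
    (void)objc; (void)objv; // this command takes no arguments
    Counter *self = static_cast<Counter *>(cd);
    Tcl_SetObjResult(interp, Tcl_NewIntObj(self->bump()));
    return TCL_OK;
}

// Initialisation function: run by [load libexample.so]; registers the
// command and ties it to one instance.
extern "C" int Example_Init(Tcl_Interp *interp) {
    static Counter instance;
    Tcl_CreateObjCommand(interp, "bump", BumpCmd, &instance, nullptr);
    return TCL_OK;
}

After the load, the Tcl-level command bump returns 1, 2, 3, ... on successive calls.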
Compiling things is just building a DLL that links against the right library (or libraries, as required). While extensions are usually recommended to link against the stub library, it's not necessary when you're just developing and testing on one machine. But if you're linking against the Tcl DLL, you'd better make sure that the code gets loaded into a tclsh that uses that DLL. Stub libraries get rid of that tight binding, providing pretty strong ABI stability, but are a little more work to set up; you need to define the right C macro to turn them on and you need to do an extra API call in your initialisation function.
I assume you already know how to compile and link C++ code. I won't tell you how to do it, but there's bound to be other questions here on Stack Overflow if you need assistance.
Using the code? For an extension, it's basically just:
# Dynamically load the DLL and call the init function
load /path/to/your.dll
# Commands are all present, so use them
NewCommand 3
There are some extra steps later on to turn a DLL into a proper Tcl package, abstracting code that uses the DLL away from the fact that it is exactly that DLL and so on, but they're not something to worry about until you've got things working a lot more.

C++: loading a large amount of data at compile time

I have a C++ object which needs a huge amount of data to instantiate. For example:
class object {
public:
    object() {
        double a[] = { /* array with 1 million double elements */ };
        /* rest of code here */
    }
private:
    /* code here */
};
Now the data (i.e. 1 million double numbers) is in a separate text file. The question: how can I put it after "double a[]" in an efficient way and eventually compile the code? I do not want to read the data from a file at run time; I want it compiled into the object. What can be a solution? Ideally I would like the data to sit in the separate text file where it presently resides and somehow still have an assignment like double a[] = ... as above.
Is this possible? Thanks in advance!
Something like:
class object
{
public:
    object() {
        double a[] = {
            #include "file.h"
        };
        /* rest of code here */
    }
private:
    /* code here */
};
The file has to be formatted correctly though - i.e. contain something like:
//file.h
23, 24, 40,
5, 1.1,
In general, you can use #include directives to paste content into files. I've seen virtual methods being pasted like that, if they were common for most derived classes. I personally don't really like this technique.
One large problem with this design is that 1 million doubles on the stack will probably blow the stack. What you probably want is to put the data in the data segment, or in some kind of resource that is stored in your binary file and can be loaded at run time. If you need more than one copy of the data, duplicate it into a std::vector at run time, so you know the data is on the free store (heap). Mayhap even use a shared_ptr to a std::array to reduce the chance of needless accidental duplication (or a unique_ptr to reduce the chance of reference duplication).
8 MB of data is not going to play all that well on the stack is all I am saying. And locality of reference between an 8 MB array and your other variables is not going to be your biggest concern.
Depending on your compile target platform and framework, there will be ways to stuff this kind of data into a binary resource. I've never done it for a multi-megabyte file, but here is the Visual Studio help on resource files: http://msdn.microsoft.com/en-us/library/7zxb70x7%28v=vs.80%29.aspx
Note that "the data being in the code" does not make it fundamentally faster to load (other than traversing the filesystem once to find it maybe). The OS still has to load the binary, and larger binaries take more time to load, and a big array of values will take up as much room in a binary as it does in a distinct file. The real advantage is that it isn't a file that can be "misplaced" relative to your executable, but resource fork/resource file/etc methods can deal with that.
As noted in the comments below, static const data (and global data) tends to be loaded into the data segment, which is distinct from both the heap (aka free store) and stack (aka automatic store). I forget what the standard calls it. I do know that a static local variable in a function will behave differently than a static or global non-local variable with regards to initialization order (global (static or not) data gets initialized fully prior to main starting, while static local is initialized the first time the function is called, if I remember correctly).
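A sketch of the data-segment variant, reusing the #include trick from above (the kValues name is invented):

#include <iterator>
#include <vector>

// Namespace-scope const array: it lives in the binary's data segment,
// not on the stack of whoever constructs an object.
static const double kValues[] = {
    #include "file.h"
};

class object {
public:
    // Copy into a vector only if the data must be mutated;
    // otherwise read kValues directly and skip the copy.
    object() : a_(std::begin(kValues), std::end(kValues)) {}
private:
    std::vector<double> a_;
};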
The answer of Luchian Grigore is quite correct, but a compiler can have a limit on the length of a source-code line. See for example https://stackoverflow.com/questions/10519738/source-line-length-limit
So try it on your compiler. But I am afraid a simpler solution to your problem would be reading the huge data from a file at run time.

Performance penalty for large C++ DLLs with autogenerated C code

I am working on a piece of software that needs to call a family of optimisation solvers. Each solver is an auto-generated piece of C code with thousands of lines. I am using 200 of these solvers, differing only in the size of the optimisation problem to be solved.
All in all, these auto-generated solvers come to about 180 MB of C code, which I compile into C++ using the extern "C"{ /*200 solvers' headers*/ } syntax, in Visual Studio 2008. Compiling all of this is very slow (with the "maximum speed /O2" optimisation flag, it takes about 8 hours). For this reason I thought it would be a good idea to compile the solvers into a single DLL, which I can then call from a separate piece of software (which would have a reasonable compile time, and allow me to abstract away all this extern "C" stuff from higher-level code). The compiled DLL is then about 37 MB.
The problem is that when executing one of these solvers using the DLL, execution requires about 30 ms. If I compile only that single solver into a DLL and call that from the same program, execution is about 100x faster (<1 ms). Why is this? Can I get around it?
The DLL looks as below. Each solver uses the same structures (i.e. they have the same member variables), but they have different names, hence all the type casting.
extern "C"{
#include "../Generated/include/optim_001.h"
#include "../Generated/include/optim_002.h"
/*etc.*/
#include "../Generated/include/optim_200.h"
}
namespace InterceptionTrajectorySolver
{
__declspec(dllexport) InterceptionTrajectoryExitFlag SolveIntercept(unsigned numSteps, InputParams params, double* optimSoln, OutputInfo* infoOut)
{
int exitFlag;
switch(numSteps)
{
case 1:
exitFlag = optim_001_solve((optim_001_params*) &params, (optim_001_output*) optimSoln, (optim_001_info*) &infoOut);
break;
case 2:
exitFlag = optim_002_solve((optim_002_params*) &params, (optim_002_output*) optimSoln, (optim_002_info*) &infoOut);
break;
/*
...
etc.
...
*/
case 200:
exitFlag = optim_200_solve((optim_200_params*) &params, (optim_200_output*) optimSoln, (optim_200_info*) &infoOut);
break;
}
return exitFlag;
};
};
I do not know if your code is inlined into each case branch in the example. If your functions are inlined and all of them end up inside that one function, it will be much slower, because the code is laid out across a huge span of virtual memory and the CPU has to jump around a lot as the code is executed. If it is not all inlined, then perhaps these suggestions might help.
Your solution might be improved by...
A)
1) Divide the project into 200 separate DLLs, then build them with a .bat file or similar.
2) Make the exported function in each DLL called "MyEntryPoint", and then use dynamic linking to load the libraries as they are needed. This is then the equivalent of a busy music program with a lot of small DLL plugins loaded. Get a function pointer to the entry point with GetProcAddress.
Or...
B) Build each solver as a separate .lib file. Each one will then compile very quickly, and you can link them all together. Build an array of function pointers to all the functions and call through a lookup instead of the switch, roughly:
result = solverTable[numSteps](...); // see the sketch below
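A rough sketch of that lookup table (it assumes the generated solvers can share one type-erased signature, which the casts in the question's switch already presume; names other than the optim_NNN_solve entry points are invented):

// Assumed common shape of every generated entry point, for illustration.
using SolverFn = int (*)(void *params, double *soln, void *info);

static const SolverFn kSolvers[] = {
    nullptr, // index 0 unused
    reinterpret_cast<SolverFn>(&optim_001_solve),
    reinterpret_cast<SolverFn>(&optim_002_solve),
    // ... one generated entry per solver, up to optim_200_solve
};

int SolveByLookup(unsigned numSteps, void *params, double *soln, void *info) {
    if (numSteps == 0 || numSteps > 200)
        return -1; // out of range
    return kSolvers[numSteps](params, soln, info); // no 200-way switch
}

Calling through a cast function pointer is only well-defined if the underlying signatures really are compatible, so this stands or falls on the same assumption the original casts make.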
Combining all the libs into one big lib should not take eight hours. If it takes that long, you are doing something very wrong.
AND...
Try putting the code into different actual .cpp files. Perhaps that specific compiler will do a better job if they are all in different translation units. Then, once each unit has been compiled, it will stay compiled if you do not change anything.
Make sure that you measure and average the timing of multiple calls to the optimizer, because there could be a large overhead in the setup before the first call.
Then also check what that 200-branch conditional statement (your switch) is doing to your performance! Try eliminating it for testing: call just one solver in your test project, but link all of them into the DLL. Do you still see slow performance?
I assume the reason you are generating the code is for better run-time performance, and also for better correctness.
I do the same thing.
I suggest you try random pausing to find out what the run-time performance problem is: interrupt the running program in a debugger a few times and look at where it is.
If you're seeing a 100:1 performance difference, then each time you interrupt it and look at the program's state, there is a 99% chance you will be in the code responsible, so you will see what the problem is.
As far as build time goes, sure it makes sense to modularize it.
None of that should have much effect on run time, unless it means you're doing crazy I/O.