Is gcc compile time proportional to number of executions or lines of code? - c++

I am using gcc 4.8.5 to compile c++98 code. My c++ code statically initializes an unordered_map of unordered_maps with ~20,000 total key-value pairs, and an overloaded function which will take ~450 different types. This program will be executed on a continuous stream of data, and for every block of data, the overloaded function will return an output.
The problem is, gcc takes too long to compile due to initializing ~20,000 key-value pairs.
The nested unordered_map has a structure of map< DATATYPE, map< key, value >>, and only one of the overloaded functions gets called for each data input. In other words, I do not need to statically initialize the entire nested map; I can instead dynamically define map<key, value> for the corresponding datatype when needed. For example, I can check for the definition of a map and, when it is undefined, populate it later at run time. This will result in a map with ~45 key-value pairs on average.
However, I know that dynamic initialization will require longer code. For a simple execution as described above (statically initializing the entire map), will another method such as dynamic initialization significantly reduce compile time? My understanding is that, whatever alternative I take, I still need to write code to populate all of the key-value pairs. Also, the overhead and actual computation that go into populating an unordered_map (hashmap) should not differ asymptotically in most cases, and should not show a significant difference from running the same number of loop iterations that increment a value.
For reference, I am writing a python script that reads in multiple json files and prints out the c++ code, which then gets compiled using gcc. I am not reading the json directly from c++, so whatever I do, the c++ source will need to insert the key-value pairs one by one because it will not have access to the json files.
// below is someEXE.cpp, which is the result of the python script.
// Every line is inside python's print"" (using python 2.7)
// so that it writes complete c++ that should compile.
someEXE.cpp
#include <cstdio>
#include <string>
#include <unordered_map>
// example of an overloaded function among ~450
// takes in a pointer to the data and the exampleMap created in main
void exampleFunction(DIFFERENT_TYPE1 *data,
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>> exampleMap) {
    printf("this is in specific format: %s", exampleMap["DATATYPE1"]
        [std::to_string(data->member_variable)].c_str());
    //... more print functions below (~25 per datatype)
}
int main() {
    // current definition of the unordered_map (total ~20,000 pairs)
    std::unordered_map<std::string, std::unordered_map<std::string,
        std::string>> exampleMap = {
        {"DATATYPE1", {{"KEY1", "VAL1"}, {"KEY2", "VAL2"}, /*...*/}}
    };
    // a test call like the one below is created for all ~450 types
    // when I run the program, the code will printf values to the screen
    DIFFERENT_TYPE1 testObj = {0};
    DIFFERENT_TYPE1 *testObjPointer = &testObj;
    exampleFunction(testObjPointer, exampleMap);
    return 0;
}
EDIT: My initial question was "Is CMAKE compile time proportional to...". Replaced the term "CMAKE" with the actual compiler name, gcc 4.8.5, with help from the comments.

With the further code you posted, and Jonathan Wakely's answer on the specific issue with your compiler, I can make a suggestion.
When writing my own codegen, if possible, I prefer generating plain old data and leaving logic and behaviour in non-generated code. This way you get a small(er) amount of pure C++ code in a data-driven style, and a separate block of dumb and easy-to-generate data in a declarative style.
For example, directly code this
// GeneratedData.h
#include <cstddef>   // for size_t
namespace GeneratedData {
struct Element {
    const char *type;
    const char *key;
    const char *val;
};
Element const *rawElements();
size_t rawElementCount();
}
and this
// main.cpp
#include "GeneratedData.h"
#include <string>
#include <unordered_map>
using Map = std::unordered_map<std::string, std::string>;
using TypeMap = std::unordered_map<std::string, Map>;
TypeMap buildMap(GeneratedData::Element const *el, size_t count)
{
    TypeMap map;
    for (; count; ++el, --count) {
        map[el->type][el->key] = el->val;   // build the whole thing here
    }
    return map;
}
// rest of main can call buildMap once, and keep the big map.
// NB. don't pass it around by value!
and finally generate the big dumb file
// GeneratedData.cpp
#include "GeneratedData.h"
namespace {
GeneratedData::Element const array[] = {
    // generated elements here
};
}
namespace GeneratedData {
Element const *rawElements() { return array; }
size_t rawElementCount() { return sizeof(array) / sizeof(array[0]); }
}
if you really want to, you can separate even that logic from your codegen by just #includeing it in the middle, but it's probably not necessary here.
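For instance, the hand-written GeneratedData.cpp could stay fixed and pull only the initializers from a generated file ("elements.inc" here is a hypothetical name for whatever your script writes):
// GeneratedData.cpp (hand-written, never regenerated)
#include "GeneratedData.h"
namespace {
GeneratedData::Element const array[] = {
#include "elements.inc"   // hypothetical generated file: one {"TYPE", "KEY", "VAL"}, entry per line
};
}
// rawElements() and rawElementCount() as above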
Original answer
Is CMAKE
CMake.
... compile time
CMake configures a build system which then invokes your compiler. You haven't told us which build system it is configuring for you, but you could probably run it manually for the problematic object file(s), and see how much of the overhead is really CMake's.
... proportional to number of executions or lines of code?
No.
There is some overhead per-execution. Each executed compiler process has some overhead per line of code, but probably much more overhead per enabled optimization, and some optimizations may scale with cyclomatic complexity or other metrics.
statically initializes unordered_map of unoredred_maps with ~20,000 total key-value pairs
You should try to hide your giant initialization as much as possible - you haven't shown any code, but if it's only visible in one translation unit, only one object file will take a very long time to compile.
You could also probably use a codegen tool like gperf to build a perfect hash.
I can't give you a lot more detail without seeing at least a fragment of your actual code and some hint as to how your files and translation units are laid out.

Older versions of GCC take a very long time to compile large initializer-lists like this:
unordered_map<string, unordered_map<string, string>> exampleMap = {
{"DATATYPE1", {{"KEY1", "VAL1"}, {"KEY2", "VAL2"}, /*...*/}}
};
The problem is that every new element in the initializer-list causes more code to be added to the block being compiled, and it gets bigger and bigger, needing to allocate more and more memory for the compiler's AST. Recent versions have been changed to process the initializer-list differently, although some problems still remain. Since you're using GCC 4.8.5 the recent improvements won't help you anyway.
However, I know that dynamic initialization will require longer code. For a simple execution described above (statically initializing entire map), will other method such as dynamic initialization significantly reduce time?
Splitting the large initializer-list into separate statements that insert elements one-by-one will definitely reduce the compile time when using older versions of GCC. Each statement can be compiled very quickly that way, instead of having to compile a single huge initialization that requires allocating more and more memory for each element.
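For illustration, a sketch of what the generated code could emit instead of one giant braced initializer (names taken from the question's example):
std::unordered_map<std::string, std::unordered_map<std::string, std::string>> exampleMap;
exampleMap["DATATYPE1"]["KEY1"] = "VAL1";
exampleMap["DATATYPE1"]["KEY2"] = "VAL2";
// ... one small, cheap-to-compile statement per key-value pair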

Related

Build a compile time command look up table using template metaprogramming

I am trying to build a command parser for an embedded system (bare metal), where it will receive commands via messages and call the corresponding function. The structure will look like
struct cmdparse{
    char* commandname;
    void (*function_pointer)();   // some function-pointer type
};
Initially the respective modules register the commands they serve and the corresponding function pointers, and the command parser builds the look-up table during initialisation. Whenever a command is received, it searches the table and calls the corresponding function. Is it possible to build this look-up table at compile time using template metaprogramming? The main advantage I am expecting is that whenever a new command is added, I don't need to check the command parser to see if the array size needs to be increased. Since it is an embedded system project, usage of vector is banned due to its dynamic memory requirements. Also, if this look-up table goes to ROM instead of RAM, it adds a safety benefit of avoiding unintentional corruption.
If you have a decent compiler (enable at least C++11) you can build the table at compile time with:
struct cmdparse{
    const char* commandname;
    void (*fn)();
};
void whatever1();
void whatever2();
constexpr cmdparse commands[] = { //<-- compile time
    cmdparse{"cmd1", &whatever1},
    cmdparse{"cmd2", &whatever2}
};
If you don't have a good compiler you may need to remove constexpr - but otherwise this method should work.
Making room for more commands at runtime is perhaps best done in a separate array:
std::array<cmdparse, 1024> dyn_commands; //<-- supports up to 1024 commands
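For example, a lookup over the compile-time table might be a simple linear scan (a sketch; error handling and the dynamic array are left out):
#include <cstring>
void dispatch(const char *name)
{
    for (const cmdparse &c : commands) {   // the constexpr table above
        if (std::strcmp(c.commandname, name) == 0) {
            c.fn();                        // call the registered handler
            return;
        }
    }
    // unknown command: report an error here
}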

Declaring/defining c++ struct in different file - better compile time using gcc 4.8.5?

I currently have a c++ code and a Json file. The Json file contains enumerations in a 2-D structure, so every outer key in the Json has a map as its value, i.e. {Outer_key : {{Inner_key : Inner_value}, ...}, ...}. The C++ code contains an overloaded print function which parses the input data, and in the process of the function call, the code fetches an Inner_value using an Outer_key and an Inner_key. For each call of the c++ main function, around 0~10 Inner_values are retrieved; however, the entire Json file maps about ~20,000 Inner_values.
I am using python to create the c++ code, and am compiling using gcc (CMAKE). I need to keep some kind of enumeration map within the body of the c++ so I can run the c++ code, get an intermediate integer value and pass it into the enumerations to finally return the associated string.
Right now, I list-initialize a 2-D unordered_map in the main function of the c++ file. This takes the shortest time among all the other compile-time initializations; however, it still takes 5~10 minutes.
One suggestion I received is to divide the 2-D enumeration into multiple 1-D structs (one per Outer_key), store them in a different file, then 'use' a specific 1-D struct when needed.
Two questions I have here.
Even if I divide them up, and put them into different files, doesn't the time to compile remain the same?
If the compile time is reduced by splitting up in multiple 1-D structs, what approach should I take in coding this? Should I declare structs in .h then call them in .cpp main()? Should I go ahead and define the structs in additional .cpp file? Should I just typedef enums? Also, within the main function or the print function, how can I initialize only the struct that I need?
.cpp file generated using python below:
void overLoadedPrint(Particular_Datatype *data, std::unordered_map<std::string, std::unordered_map<std::string, std::string>> enumMap) {
    printf("%s", enumMap["SomeKey"][std::to_string(data->member1.innerMember1)].c_str());
    //data->member1.innerMember1 returns an integer.
    //"SomeKey" is known in python so the corresponding key is inputted.
}
int main() {
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>> enumMap = {{"A", {{"1", "a"}, {"2", "b"}}}, /*...*/};
    //list-initialize enumMap.
    //compile time significantly increases here.
    //info of this map is stored in a single json file.
    overLoadedPrint(someData, enumMap);
    return 0;
}
Even if I divide them up, and put them into different files, doesn't the time to compile remain the same?
Splitting a translation unit into fragments usually increases compilation time from scratch.
However, each translation unit can be compiled separately, so if you change only one, then only that file need to be recompiled. Having to compile a fraction of a program is usually much faster than compiling it entirely.
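As a rough sketch of how the split could look (file and function names here are made up for illustration):
// enum_map.h
#include <string>
#include <unordered_map>
using EnumMap = std::unordered_map<std::string, std::unordered_map<std::string, std::string>>;
void addDatatype1(EnumMap &m);   // one such function per outer key, each in its own .cpp

// datatype1.cpp -- regenerated and recompiled only when its part of the JSON changes
#include "enum_map.h"
void addDatatype1(EnumMap &m) {
    m["DATATYPE1"] = {{"KEY1", "VAL1"}, {"KEY2", "VAL2"} /* ... */};
}

// main.cpp
#include "enum_map.h"
int main() {
    EnumMap enumMap;
    addDatatype1(enumMap);   // call each addDatatypeN once
    // ... use enumMap as before
    return 0;
}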

Store results in a separate library for later loading

If I have some set of results which can be calculated at compile-time, and I want to use them elsewhere in a program, can I place them in a (shared?) library for later linking? Will this be slower?
For example, I can calculate factorials at compile time using
template<size_t N>
struct Factorial {
    constexpr static size_t value = Factorial<N-1>::value * N;
};
template<>
struct Factorial<0> {
    constexpr static size_t value = 1;
};
// Possibly an instantiation for a max value?
// template class Factorial<50>;
Then to use this in code, I just write Factorial<32>::value, or similar.
If I assume that my real values take somewhat longer to compute, then I might want to ensure that they aren't recomputed on each build / on any build that considers the old build to be invalid.
Consequently, I move the calculating code into a separate project, which I compile as a shared library.
Now, to use it, I link my main program to the library, and #include the header.
However, the library file is rather small (and seemingly independent of the value passed to create the template), and so I wonder if in fact, the library is only holding the methods to create a Factorial struct, and not the precomputed data.
How can I calculate a series of values and then use them in a separate program?
Solutions which provide compile-time value injection would be preferred - I note that loading a shared library does not fall into this category (I think)
What's happening here is the actual "code" that does the calculation is still in the header. Putting it into a shared library didn't really do anything; the compiler is still recomputing the factorials for your main program. (So, your intuition is correct.)
A better approach is to write another program to spit out the values as the source code for a C++ constant array, then copy and paste them into your code. This will probably take about 5 lines of Python, and your C++ code will compile and run quickly.
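For instance, the generator's output might be nothing more than a header containing the precomputed array (file name and values here are only illustrative):
// factorials_generated.h -- written by the helper program, then #included where needed
#include <cstddef>
constexpr std::size_t kFactorials[] = {
    1, 1, 2, 6, 24, 120, 720, 5040 /* ... up to the largest value you need */
};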
You could calculate the values as part of your build process (through a separate application which you compile and invoke as part of your build process) and store the result in a generated source file.
With CMake (configure_file), Makefiles or NMake, for example, it should be very easy.
You could use the generated source file to generate a shared library as you suggested, or you could link / include the generated sources into your application directly.
An advantage of this approach is that you are not limited to compile-time calculation anymore; you could also use runtime calculation, which will be faster since it can be optimized.

Proper way of designing functions with and without debug information in C++

When I design a function in a class, I want to balance the information I can extract from it. Some information may be useful for debug but not necessary as the output of the function. I give the following example:
class A
{
    bool my_func(int arg1, int &output, std::vector<int> &intermediate_vec)
    {
        // do something
    }
};
In the function my_func, std::vector<int> &intermediate_vec is not necessary, as the only information I am interested in is stored in the variable output. However, for debugging purposes I am also interested in obtaining intermediate_vec, as it is not convenient to check this variable inside the function for some reason. Therefore, I am considering designing two functions inside class A, one used for debugging and the other for the real application.
class A
{
    // for debug
    bool my_func(int arg1, int &output, std::vector<int> &intermediate_vec)
    {
        // do something
    }
    // invoked by other programs
    bool my_func(int arg1, int &output)
    {
        // do something
        std::vector<int> intermediate_vec;
        return my_func(arg1, output, intermediate_vec);
    }
};
I am just wondering whether there are better ways to do this job. Thanks.
Use a logging library and log those intermediate values at debug log level instead of collecting them as output.
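For example (a minimal sketch using a home-grown macro rather than any particular logging library):
#include <cstdio>
#include <vector>
#ifndef NDEBUG
#define LOG_DEBUG(...) std::fprintf(stderr, __VA_ARGS__)
#else
#define LOG_DEBUG(...) ((void)0)
#endif

bool my_func(int arg1, int &output)
{
    std::vector<int> intermediate_vec;
    // do something, filling intermediate_vec along the way
    for (int v : intermediate_vec)
        LOG_DEBUG("intermediate: %d\n", v);
    output = arg1;   // placeholder for the real result
    return true;
}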
If you plan on using the intermediate_vec in some debug post-processing it can be tricky. However, if you only plan on using it just to print the results it easier.
The main thing I dislike in your idea is having //do something, which seems to be exactly the same thing, in two different places. This is very error-prone and starts to grow into a real PIA when you have to maintain a dozen classes with a dozen methods, and half of them have some debug copy-cat. Every change in logic has to be done twice in a coherent manner.
When I came upon a similar problem, I considered the following things to avoid doubling logic while performing conditional logging and/or additional instrumentation.
#define DEBUG/NDEBUG
You just have one copy of code with some pre-processor conditionals.
template < int DEBUG >.
Basically the same effect but different semantics.
The template method might complicate the coding a little bit, but it will allow you to use both versions at run time, which might come in handy. The #define method does not alter the API at all, but you really need to think when designing the code if you want some fancy selective or multilevel debugging.
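A rough sketch of the template variant described above (the interface is only illustrative):
#include <vector>
class A
{
public:
    template <int DEBUG>
    bool my_func(int arg1, int &output)
    {
        std::vector<int> intermediate_vec;
        if (DEBUG) { /* fill in and inspect intermediate_vec */ }
        // do something
        output = arg1;   // placeholder for the real result
        return true;
    }
};
// usage: a.my_func<1>(x, out) while debugging, a.my_func<0>(x, out) otherwise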
The two functions method was ok in my use-cases when I had to have a safe version and a fast version of a routine. The safe one did some checks and then called the fast number cruncher. This was useful if the number cruncher was used in loops or internally, where it was safe to assume you could skip the checks.
If the debug version is slower (e.g. because you need to initialize and fill a long vector), then you probably do not want to call it in release code. The same goes for logging. If you really need to output one number, but in the debug version you would end up printing megabytes of data (e.g. calculating the norm of a vector and printing the vector itself), you will want to use conditional logging.
So in overall this would look more like this:
class A
{
    bool my_func(int arg1, int &output, std::vector<int> &intermediate_vec)
    {
        if (DEBUG) { /* fill in the vector */ }
        // do something
        if (DEBUG) { /* print out some fancy stuff */ }
    }
    // invoked by other programs
    bool my_func(int arg1, int &output)
    {
        std::vector<int> intermediate_vec;
        return my_func(arg1, output, intermediate_vec);
    }
};
Of course you can then use the short call with debugging enabled, but you won't get the vector back; or the full call in no-debug mode, but then intermediate_vec will not be meaningful.
Anything to avoid copy-pasting application logic stuff. I did it and I was very miserable when it came to changing the logic.

c++ loading large amount of data at compile time

I have a C++ object which needs a huge amount of data to instantiate. For example:
class object {
public:
    object() {
        double a[] = { /* array with 1 million double elements */ };
        /* rest of code here */
    }
private:
    /* code here */
};
Now the data (i.e 1 million double numbers) is in a separate text file. The question: How can I put it after "double a[]" in an efficient way and eventually compile the code? I do not want to read the data at run time from a file. I want it compiled with the object. What can be a solution? Ideally I would like the data to sit in the separate text file as it presently resides and somehow also have an assignment like double a[] =..... above.
Is this possible? Thanks in advance!
Something like:
class object
{
public
object(){ double a[] = {
#include "file.h"
};
/* rest of code here*/};
private:
/* code here*/
}
The file has to be formatted correctly though - i.e. contain something like:
//file.h
23, 24, 40,
5, 1.1,
In general, you can use #include directives to paste content into files. I've seen virtual methods being pasted like that, if they were common for most derived classes. I personally don't really like this technique.
One large problem with this design is that 1 million doubles on the stack will probably blow the stack. What you probably want is to put the data in the data segment, or in some kind of resource that is stored in your binary file and can be loaded at run time. If you need more than one copy of the data, duplicate it into a std::vector at run time, so you know the data is on the free store (heap). Mayhap even use a shared_ptr to a std::array to reduce the chance of needless accidental duplication (or unique_ptr to reduce the chance of reference duplication).
8 MB of data is not going to play all that well is all I am saying. And locality of reference from an 8 MB array to your other variables is not going to be your biggest concern.
Depending on your compiled target platform and framework, there will be ways to stuff this kind of data into a binary resource. I've never done it for a multi-meg file, but here is the Visual Studio help on resource files: http://msdn.microsoft.com/en-us/library/7zxb70x7%28v=vs.80%29.aspx
Note that "the data being in the code" does not make it fundamentally faster to load (other than traversing the filesystem once to find it maybe). The OS still has to load the binary, and larger binaries take more time to load, and a big array of values will take up as much room in a binary as it does in a distinct file. The real advantage is that it isn't a file that can be "misplaced" relative to your executable, but resource fork/resource file/etc methods can deal with that.
As noted in the comments below, static const data (and global data) tends to be loaded into the data segment, which is distinct from both the heap (aka free store) and stack (aka automatic store). I forget what the standard calls it. I do know that a static local variable in a function will behave differently than a static or global non-local variable with regards to initialization order (global (static or not) data gets initialized fully prior to main starting, while static local is initialized the first time the function is called, if I remember correctly).
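For example, moving the array to namespace scope keeps it out of any stack frame and lets the compiler place it in the data segment (a sketch reusing the #include trick from above):
// data.cpp
#include <cstddef>
static const double a[] = {
#include "file.h"   // the generated comma-separated values
};
static const std::size_t a_count = sizeof(a) / sizeof(a[0]);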
The answer of Luchian Grigore is quite correct. But a compiler can have some limit on the length of a source code line. See for example https://stackoverflow.com/questions/10519738/source-line-length-limit
So try it on your compiler. But I am afraid a simpler solution to your problem would be to read the huge data from a file.