Unable to recursively multiply BigInt beyond a certain number of iterations at compile-time in D

I need to get the product of an arbitrary number of variables. The actual number of variables and their values will be known at compile-time, however I cannot hardcode these because they come from reflection done on types at compile-time, using templates.
I can get the product of these into a BigInt at runtime just fine, however if I try to do so at compile-time using templates and immutable variables, I can only get the product for a small number of variables before I get a compiler error.
Here is a condensed example that doesn't use type-traits, but suffers from the same issue:
import std.bigint; // BigInt
import std.stdio;  // writeln

template Product(ulong value) {
    immutable BigInt Product = value;
}

template Product(ulong value, values...) {
    immutable BigInt Product = Product!value * Product!values;
}

immutable BigInt NO_PROBLEM = cast(BigInt)ulong.max * ulong.max * ulong.max;
immutable BigInt ERROR = Product!(ulong.max, ulong.max, ulong.max);

void main() {
    writeln(NO_PROBLEM, " ", ERROR);
}
Trying to compile this with the DMD compiler gives the error message:
/opt/compiler-explorer/dmd2-nightly/dmd2/linux/bin64/../../src/druntime/import/core/cpuid.d(121): Error: static variable `_dataCaches` cannot be read at compile time
/opt/compiler-explorer/dmd2-nightly/dmd2/linux/bin64/../../src/phobos/std/internal/math/biguintcore.d(200): called from here: `dataCaches()`
/opt/compiler-explorer/dmd2-nightly/dmd2/linux/bin64/../../src/phobos/std/internal/math/biguintcore.d(1547): called from here: `getCacheLimit()`
/opt/compiler-explorer/dmd2-nightly/dmd2/linux/bin64/../../src/phobos/std/internal/math/biguintcore.d(758): called from here: `mulInternal(result, cast(const(uint)[])y.data, cast(const(uint)[])x.data)`
/opt/compiler-explorer/dmd2-nightly/dmd2/linux/bin64/../../src/phobos/std/bigint.d(380): called from here: `mul(this.data, y.data)`
/opt/compiler-explorer/dmd2-nightly/dmd2/linux/bin64/../../src/phobos/std/bigint.d(380): called from here: `this.data.opAssign(mul(this.data, y.data))`
/opt/compiler-explorer/dmd2-nightly/dmd2/linux/bin64/../../src/phobos/std/bigint.d(430): called from here: `r.opOpAssign(y)`
<source>(9): called from here: `Product.opBinary(Product)`
<source>(13): Error: template instance `example.Product!(18446744073709551615LU, 18446744073709551615LU, 18446744073709551615LU)` error instantiating
I'm quite puzzled by this. At first glance it would appear that too much memory is being requested at compile time (I would understand if less heap were available during compile-time execution than at runtime), but I'm not sure that's actually the problem, since I can generate the result at compile time, just not through the recursive template.
Could it be a bug in the Phobos runtime, or an undocumented limitation?
std.bigint appears to be designed to be able to produce huge values at compile-time, with lines such as this compiling and executing fine (and bloating the size of the executable!):
immutable BigInt VERY_BIG = BigInt(2) ^^ 10000000;

The error happens on the last line of this function:
https://github.com/dlang/phobos/blob/e0af01c8adf75b164b43832dd7544e297347cf6f/std/internal/math/biguintcore.d#L1824-L1844
It looks like std.bigint is currently not written to work in CTFE in this circumstance. Perhaps simply making the GC.free call conditional on __ctfe will fix the problem.
As to why it only happens past a certain number of iterations: the function has a branch which performs the calculation for small numbers without dynamic memory allocation, so small products never reach the failing code path.

Related

Is gcc compile time proportional to number of executions or lines of code?

I am using gcc 4.8.5 to compile C++98 code. My C++ code statically initializes an unordered_map of unordered_maps with ~20,000 total key-value pairs, and an overloaded function which handles ~450 different types. The program will be executed on a continuous stream of data, and for every block of data the overloaded function will return an output.
The problem is, gcc takes too long to compile due to initializing ~20,000 key-value pairs.
The nested unordered_map has the structure map<DATATYPE, map<key, value>>, and only one of the overloaded functions gets called for each data input. In other words, I do not need to statically initialize the entire nested map; I can instead dynamically define map<key, value> for the corresponding datatype when needed. For example, I can check whether a map is defined and, if not, populate it at run time. This results in maps with ~45 key-value pairs on average.
However, I know that dynamic initialization will require longer code. For a simple execution as described above (statically initializing the entire map), would another method such as dynamic initialization significantly reduce compile time? My understanding is that whatever alternative I take, I still need to write code that populates all the key-value pairs. Also, the overhead and actual work that goes into populating an unordered_map (hashmap) should not differ asymptotically in most cases, and should not be significantly different from running the same number of loop iterations to increment a value.
For reference, I am writing a Python script that reads in multiple JSON files and prints out the C++ code, which then gets compiled using gcc. I am not reading the JSON directly from C++, so whatever I do, the C++ source will need to insert the key-value pairs one by one because it will not have access to the JSON files.
// below is someEXE.cpp, which is a result from python script.
// Every line is inside python's print"" (using python 2.7)
// so that it can write complete c++ that should compile.
someEXE.cpp
// example of an overloaded function among ~450
// takes in pointer to data and exampleMap created above
void exampleFunction(DIFFERENT_TYPE1 *data,
    std::unordered_map<std::string,
        std::unordered_map<std::string, std::string>> exampleMap) {
    printf("this is in specific format: %s", exampleMap["DATATYPE1"]
        [std::to_string(data->member_variable)].c_str());
    //... more print functions below (~25 per datatype)
}

int main() {
    // current definition of the unordered_map (total ~20,000 pairs)
    std::unordered_map<std::string, std::unordered_map<std::string,
        std::string>> exampleMap = {
        {"DATATYPE1", {{"KEY1", "VAL1"}, {"KEY2", "VAL2"}, /*...*/}}
    };

    // create below test function for all ~450 types
    // when I run the program, code will printf values to screen
    DIFFERENT_TYPE1 testObj = {0};
    DIFFERENT_TYPE1 *testObjPointer = &testObj;
    exampleFunction(testObjPointer, exampleMap);
    return 0;
}
EDIT: My initial question was "Is CMAKE compile time proportional to...". Changed the term "CMAKE" with actual compiler name, gcc 4.8.5 with the help from the comments.
With the further code you posted, and Jonathan Wakely's answer on the specific issue with your compiler, I can make a suggestion.
When writing my own codegen, if possible, I prefer generating plain old data and leaving logic and behaviour in non-generated code. This way you get smaller, pure C++ code in a data-driven style, and a separate block of dumb, easy-to-generate data in a declarative style.
For example, directly code this
// GeneratedData.h
#include <cstddef> // size_t

namespace GeneratedData {
    struct Element {
        const char *type;
        const char *key;
        const char *val;
    };
    Element const *rawElements();
    size_t rawElementCount();
}
and this
// main.cpp
#include "GeneratedData.h"
#include <string>
#include <unordered_map>

using Map = std::unordered_map<std::string, std::string>;
using TypeMap = std::unordered_map<std::string, Map>;

TypeMap buildMap(GeneratedData::Element const *el, size_t count)
{
    TypeMap map;
    for (; count; ++el, --count) {
        // build the whole thing here
        map[el->type][el->key] = el->val;
    }
    return map;
}

// rest of main can call buildMap once, and keep the big map.
// NB. don't pass it around by value!
and finally generate the big dumb file
// GeneratedData.cpp
#include "GeneratedData.h"

namespace {
    GeneratedData::Element const array[] = {
        // generated elements here
    };
}

namespace GeneratedData {
    Element const *rawElements() { return array; }
    size_t rawElementCount() { return sizeof(array) / sizeof(array[0]); }
}
if you really want to, you can separate even that logic from your codegen by just #includeing it in the middle, but it's probably not necessary here.
Original answer
Is CMAKE
CMake.
... compile time
CMake configures a build system which then invokes your compiler. You haven't told us which build system it is configuring for you, but you could probably run it manually for the problematic object file(s), and see how much of the overhead is really CMake's.
... proportional to number of executions or lines of code?
No.
There is some overhead per-execution. Each executed compiler process has some overhead per line of code, but probably much more overhead per enabled optimization, and some optimizations may scale with cyclomatic complexity or other metrics.
statically initializes unordered_map of unordered_maps with ~20,000 total key-value pairs
You should try to hide your giant initialization as much as possible - you haven't shown any code, but if it's only visible in one translation unit, only one object file will take a very long time to compile.
You could also probably use a codegen tool like gperf to build a perfect hash.
I can't give you a lot more detail without seeing at least a fragment of your actual code and some hint as to how your files and translation units are laid out.
Older versions of GCC take a very long time to compile large initializer-lists like this:
unordered_map<string, unordered_map<string, string>> exampleMap = {
    {"DATATYPE1", {{"KEY1", "VAL1"}, {"KEY2", "VAL2"}, /*...*/}}
};
The problem is that every new element in the initializer-list causes more code to be added to the block being compiled, and it gets bigger and bigger, needing to allocate more and more memory for the compiler's AST. Recent versions have been changed to process the initializer-list differently, although some problems still remain. Since you're using GCC 4.8.5 the recent improvements won't help you anyway.
However, I know that dynamic initialization will require longer code. For a simple execution described above (statically initializing entire map), will other method such as dynamic initialization significantly reduce time?
Splitting the large initializer-list into separate statements that insert elements one-by-one will definitely reduce the compile time when using older versions of GCC. Each statement can be compiled very quickly that way, instead of having to compile a single huge initialization that requires allocating more and more memory for each element.

Store results in a separate library for later loading

If I have some set of results which can be calculated at compile-time, and I want to use them elsewhere in a program, can I place them in a (shared?) library for later linking? Will this be slower?
For example, I can calculate factorials at compile time using
template<size_t N>
struct Factorial {
    constexpr static size_t value = Factorial<N-1>::value * N;
};

template<>
struct Factorial<0> {
    constexpr static size_t value = 1;
};

// Possibly an instantiation for a max value?
// template class Factorial<50>;
Then to use this in code, I just write Factorial<32>::value, or similar.
If I assume that my real values take somewhat longer to compute, then I might want to ensure that they aren't recomputed on each build / on any build that considers the old build to be invalid.
Consequently, I move the calculating code into a separate project, which I compile as a shared library.
Now, to use it, I link my main program to the library, and #include the header.
However, the library file is rather small (and seemingly independent of the value passed to create the template), and so I wonder if in fact, the library is only holding the methods to create a Factorial struct, and not the precomputed data.
How can I calculate a series of values and then use them in a separate program?
Solutions which provide compile-time value injection would be preferred - I note that loading a shared library does not fall into this category (I think)
What's happening here is the actual "code" that does the calculation is still in the header. Putting it into a shared library didn't really do anything; the compiler is still recomputing the factorials for your main program. (So, your intuition is correct.)
A better approach is to write another program to spit out the values as the source code for a C++ constant array, then copy and paste them into your code. This will probably take about 5 lines of Python, and your C++ code will compile and run quickly.
You could calculate the values as part of your build process (through a separate application which you compile and invoke during the build) and store the results in a generated source file.
With CMake (configure_file), Makefiles, or NMake, for example, this should be very easy.
You could use the generated source file to build a shared library, as you suggested, or you could link or #include the generated sources into your application directly.
An advantage of this approach is that you are no longer limited to compile-time calculation: the generator can also compute the values at runtime, which can be faster since it runs as ordinary optimized code.

What is the better way to use an error code in C++?

I'm a member of a project which uses C++11. I'm not sure when I should use error codes for return values. I have found that RVO in C++ works well even when strings and structs are returned directly, but if I use a return code I cannot get the benefit of RVO and the code becomes a little redundant.
So every time I declare a function, I cannot decide which form to use for the return value. How should I keep my code consistent? Any advice would help.
// return string
MyError getString(string& out);
string getString();
// return struct
MyError getStructData(MyStruct& out);
MyStruct getStructData();
Usually, using exceptions instead of error codes is the preferred alternative in C++. This has several reasons:
As you already observed, using error codes as return values prevents you from using the return values for something more "natural"
Global error codes are not thread-safe and have several other problems
Error codes can be ignored, exceptions can not
Error codes have to be evaluated after every function call that could possibly fail, so you have to litter your code with error handling logic
Exceptions can be thrown and passed several layers up in the call stack without extra handling
Of course there are environments where exceptions are not available, often due to platform restrictions, e.g. in embedded programming. In those cases error codes are the only option.
However, if you use error codes, be consistent in how you pass them. The most appealing use of error codes I have seen that does not occupy the return-value slot and is still thread-safe was passing a reference to a context object into each and every function. The context object holds global or per-thread information, including the error codes from the most recent calls.

Failing compilation if return value is unused for a certain type

I would like to make compilation fail for some function calls but not others. The calls that I want to fail are those that ignore the return value when it is of a certain type. In the example below, not handling a function returning Error is a compilation error, but not handling a function that returns anything else should compile just fine.
Note: our runtime environment (embedded) does not allow us to use the following constructs: RTTI, exceptions.
This code only needs to compile with Clang, and I would prefer not having to annotate each function.
We prefer a solution that fails at compile time instead of at runtime.
enum class Error {
    INVAL,
    NOERR,
};

// do something that can fail.
Error DoThing();

// may return different return codes, we never care (we can't change prototype)
int DoIgnoredThing();

int main() {
    DoThing();        // compilation failure here, unused "Error" result
    DoIgnoredThing(); // compilation succeeds, OK to ignore unused "int" result
    return 0;
}
I don't know of a way to do it with straight C++, but if you're using g++ you can use the warn_unused_result attribute along with the -Werror=unused-result command-line flag. See the documentation for warn_unused_result for how to specify it (you'll have to specify it on every function, unfortunately; I don't believe you can specify it for a type). Then the compiler flag will turn that warning into an error.
If you're not using g++, your compiler may have similar functionality.
You might find it easier to use a code analysis tool to scan the source code.
This might let you use the return type, as you requested, or a different marker for the functions to test like a comment to indicate which functions should be checked.
You might run the analysis tool as part of the compilation process or as part of a build server.
I've tried a few ways to make a class that would do what you want, but I haven't been successful. Have you considered making your "return value" an argument passed by reference? That's the most common way that I've seen APIs force you to pay attention to the return value. So instead of
Error DoThing();
you have
void DoThing(Error& e);
Anything that calls DoThing() has to pass in an Error object or they will get a compiler error. Again, not exactly what you asked for, but maybe good enough?

variably modified type in char[]

I have this struct. What I am trying to do is have a contiguous RAM space so I can memcpy the structs to the hard drive. I have a dynamically created string which I will use as a key, and I want a struct that can handle this, so I used templates and wrote the following.
template <class ItemType> struct INXM_Node {
    ItemType key;
    int left;
    int right;
    int next; // Used for queue.
};
I was running:
INXM_Node<char[100]> *root = new INXM_Node<char[100]>();
Everything was fine until I tried to change 100 to a variable. Then I got the error:
'char [(((long unsigned int)(((long int)attrLength) - 1)) + 1u)]' is a variably modified type
What I ran was:
sizeof(INXM_Node<char[attrLength]>);
I am taking attrLength as an argument to a function.
I need to generate multiple structs with different char arrays.
The problem is that the compiler needs to know what type ItemType is at compile time; when you use a variable, it cannot know. The compiler generates code for each specific ItemType used in your program, and with a variable-length char array it does not know how much memory to allocate for that particular ItemType. You might consider using std::string instead.
The type you use to instantiate a template must be fixed at compile time. When you compile with a template, the compiler emits specific code for each of the types you use with it. This cannot be done at run time (there might not even be a compiler available), and it would be unreasonable, indeed impossible, to expect it to be done for every possible type at compile time.
I think you're taking the wrong approach to your problem in general though. It would be better to use std::string as the key if you need the size to vary at run time and use something like boost::serialize to (portably and safely) save your data to disk.