Compile time initialization of an associative array - d

According to the D Language Reference section on static initialization of associative arrays, an associative array (AA) can be initialized this way:
immutable long[string] aa = [
"foo": 5,
"bar": 10,
"baz": 2000
];
void main()
{
import std.stdio : writefln;
writefln("(aa = %s)", aa);
}
However, the example doesn't compile with a reasonably recent DMD:
$ dmd --version
DMD64 D Compiler v2.083.0
Copyright (C) 1999-2018 by The D Language Foundation, All Rights Reserved written by Walter Bright
$ dmd -de -w so_003.d
so_003.d(3): Error: non-constant expression ["foo":5L, "bar":10L, "baz":2000L]
A bit of googling suggests this is a long-standing bug (?) in the language:
Cannot initialize associative array
What is the syntax for declaring a constant string[char] AA?
Error in Defining an associative array in D
So I know how to work around that with a static constructor. However, considering the issue has existed for about 10 years, has it in practice turned into a feature?
In fact, that is just a prelude to my actual question:
Is it possible to initialize an associative array at compile time?
In the example below I can initialize the module-level string[] doubleUnits with a generator function that is run at compile time (with CTFE), as proven by pragma(msg). And I can initialize int[string] doubleUnitMap at run time. But how can I initialize the AA at compile time?
import std.stdio : writefln;
immutable char[] units = ['a', 'b', 'c'];
immutable string[] doubleUnits = generateDoubleUnits(units);
pragma(msg, "compile time: ", doubleUnits);
string[] generateDoubleUnits(immutable char[] units)
pure
{
import std.format : format;
string[] buffer;
foreach(unit; units) {
buffer ~= format("%s%s", unit, unit);
}
return buffer;
}
immutable int[string] doubleUnitMap;
// pragma(msg) below triggers the following compilation error:
// Error: static variable doubleUnitMap cannot be read at compile time
// while evaluating pragma(msg, "compile time: ", doubleUnitMap)
// pragma(msg, "compile time: ", doubleUnitMap);
shared static this() {
doubleUnitMap = generateDoubleUnitMap(units);
}
int[string] generateDoubleUnitMap(immutable char[] units)
pure
{
import std.format : format;
int[string] buffer;
foreach(unit; units) {
string key = format("%s%s", unit, unit);
buffer[key] = 1;
}
return buffer;
}
void main()
{
writefln("(doubleUnits = %s)", doubleUnits);
writefln("(doubleUnitMap = %s)", doubleUnitMap);
}

It is not possible to initialize the built-in AAs at compile time because the compiler does not know the runtime format. It knows the runtime interface and it knows the compile-time memory layout, but the runtime memory layout is delegated to the library, so the compiler doesn't know how to form it. Hence the error.
But if you were to implement your own AA type, then you could write the CTFE code to lay it out, and the compiler could build it at compile time.
Many years ago, this was proposed as a fix: replace the built-in magic implementation with a library AA that happens to fit the compiler's interface. Then it could do it all. The problem was that library types cannot express all the magic the built-in associative arrays do. I don't remember the exact problems, but I think it was about const and other attribute interaction.
That said, even if it failed as a 100% replacement, your own implementation of a 90% replacement may well be good enough for you. The declarations will look different (MyAA!(string, int) instead of int[string]) and the literals for it are different (though possibly makeMyAA(["foo" : 10]), a helper CTFE function that takes a built-in literal and converts it to your format), but the usage will be basically the same thanks to operator overloading.
Of course, implementing your own AA can be a bit of code and maybe not worth it, but it is the way to make it work if CT initialization is a must have.
(personally I find the static constructor to be plenty good enough...)

At the moment, that is not possible (as described in the language specification document). I've submitted a change in the spec with a note that the feature is not yet implemented. It is definitely planned, but not yet implemented...


C++ wrapper for C-API: Exploring best options for passing `char*`

There have been a number of questions on similar topics but none which I found that explore the options in this way.
Often we need to wrap a legacy C API in C++ to use its very good functionality while protecting ourselves from its vagaries. Here we will focus on just one element: how to wrap legacy C functions which accept char* params. The specific example is an API (the graphviz lib) which accepts many of its params as char* without specifying whether that is const or non-const. There appears to be no attempt to modify them, but we can't be 100% sure.
The use case for the wrapper is that we want to conveniently call the C++ wrapper with a variety of "stringy" properties names and values, so string literals, strings, const strings, string_views, etc. We want to call both singly during setup where performance is non-critical and in the inner loop, 100M+ times, where performance does matter. (Benchmark code at bottom)
The many ways of passing "strings" to functions have been explained elsewhere.
The code below is heavily commented for 4 options of the cpp_wrapper() function being called 5 different ways.
Which is the best / safest / fastest option? Is it a case of Pick 2?
#include <array>
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>
void legacy_c_api(char* s) {
// just for demo, we don't really know what's here.
// specifically we are not 100% sure if the code attempts to write
// to char*. It seems not, but the API is not `const char*` even though C
// supports that
std::puts(s);
}
// the "modern but hairy" option
void cpp_wrapper1(std::string_view sv) {
// 1. nasty const_cast. Does the legacy API modify? It appears not but we
// don't know.
// 2. Is the string view '\0' terminated? our wrapper api can't tell
// so maybe an "assert" for debug build checks? nasty too?!
// our use cases below are all fine, but the API is "not safe": UB?!
assert((int)*(sv.data() + sv.size()) == 0);
legacy_c_api(const_cast<char*>(sv.data()));
}
void cpp_wrapper2(const std::string& str) {
// 1. nasty const_cast. Does the legacy API modify? It appears not but we
// don't know. note that using .data() would not save the const_cast if the
// string is const
// 2. The standard says this is safe and null terminated std::string.c_str();
// we can pass a string literal but we can't pass a string_view to it =>
// logical!
legacy_c_api(const_cast<char*>(str.c_str()));
}
void cpp_wrapper3(std::string_view sv) {
// the slow and safe way. Guaranteed to be '\0' terminated.
// str is non-const so the legacy API can modify it if it wishes => no const_cast
// slow copy? not necessarily if sv.size() < 16 bytes => SBO on stack
auto str = std::string{sv};
legacy_c_api(str.data());
}
void cpp_wrapper4(std::string& str) {
// efficient api by making the proper strings in calling code
// but communicates the wrong thing altogether => effectively leaks the c-api
// to c++
legacy_c_api(str.data());
}
// std::array<std::string_view, N> is a good modern way to "store" a large array
// of "stringy" constants? they end up in .text of elf file (or equiv). They ARE
// '\0' terminated. Although the sv loses that info. Used in inner loop => 100M+
// lookups and calls to legacy_c_api;
static constexpr const auto sv_colours =
std::array<std::string_view, 3>{"color0", "color1", "color2"};
// instantiating these non-const strings seems wrong / a waste (there are about
// 500 small constants): potential heap allocation during static storage init?
// => exceptions cannot be caught... just the wrong model?
static auto str_colours =
std::array<std::string, 3>{"color0", "color1", "color2"};
int main() {
auto my_sv_colour = std::string_view{"my_sv_colour"};
auto my_str_colour = std::string{"my_str_colour"};
cpp_wrapper1(my_sv_colour);
cpp_wrapper1(my_str_colour);
cpp_wrapper1("literal_colour");
cpp_wrapper1(sv_colours[1]);
cpp_wrapper1(str_colours[2]);
// cpp_wrapper2(my_sv_colour); // compile error
cpp_wrapper2(my_str_colour);
cpp_wrapper2("literal_colour");
// cpp_wrapper2(sv_colours[1]); // compile error
cpp_wrapper2(str_colours[2]);
cpp_wrapper3(my_sv_colour);
cpp_wrapper3(my_str_colour);
cpp_wrapper3("literal_colour");
cpp_wrapper3(sv_colours[1]);
cpp_wrapper3(str_colours[2]);
// cpp_wrapper4(my_sv_colour); // compile error
cpp_wrapper4(my_str_colour);
// cpp_wrapper4("literal_colour"); // compile error
// cpp_wrapper4(sv_colours[1]); // compile error
cpp_wrapper4(str_colours[2]);
}
Benchmark code
Not entirely realistic yet, because work in C-API is minimal and non-existent in C++ client. In the full app I know that I can do 10M in <1s. So just changing between these 2 API abstraction styles looks like it might be a 10% change? Early days...needs more work. Note: that's with a short string which fits in SBO. Longer ones with heap allocation just blow it out completely.
#include <benchmark/benchmark.h>
#include <numeric> // std::accumulate
static void do_not_optimize_away(void* p) {
asm volatile("" : : "g"(p) : "memory");
}
void legacy_c_api(char* s) {
// do at least something with the string
auto sum = std::accumulate(s, s+6, 0);
do_not_optimize_away(&sum);
}
// ... wrapper functions as above: I focused on 1&3 which seem
// "the best compromise".
// Then I added wrapper4 because there is an opportunity to use a
// different signature when in main app's tight loop.
void bench_cpp_wrapper1(benchmark::State& state) {
for (auto _: state) {
for (int i = 0; i< 100'000'000; ++i) cpp_wrapper1(sv_colours[1]);
}
}
BENCHMARK(bench_cpp_wrapper1);
void bench_cpp_wrapper3(benchmark::State& state) {
for (auto _: state) {
for (int i = 0; i< 100'000'000; ++i) cpp_wrapper3(sv_colours[1]);
}
}
BENCHMARK(bench_cpp_wrapper3);
void bench_cpp_wrapper4(benchmark::State& state) {
auto colour = std::string{"color1"};
for (auto _: state) {
for (int i = 0; i< 100'000'000; ++i) cpp_wrapper4(colour);
}
}
BENCHMARK(bench_cpp_wrapper4);
Results
-------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------
bench_cpp_wrapper1 58281636 ns 58264637 ns 11
bench_cpp_wrapper3 811620281 ns 811632488 ns 1
bench_cpp_wrapper4 147299439 ns 147300931 ns 5
Correct first, then optimize if needed.
wrapper1 has at least two potential instances of undefined behavior: The dubious const_cast, and (in debug versions) possibly accessing an element past the end of an array. (You can create a pointer to one element past the last, but you cannot access it.)
wrapper2 also has a dubious const_cast, potentially invoking undefined behavior.
wrapper3 doesn't rely on any UB (that I see).
wrapper4 is similar to wrapper3, but exposes details you're trying to encapsulate.
Start by doing the most correct thing, which is to copy the strings and pass a pointer to the copy, which is wrapper3.
If performance is unacceptable in the tight loop, you can look at alternatives. The tight loop may use only a subset of the interfaces. The tight loop may be heavily biased toward short strings or long strings. The compiler might inline enough of your wrapper in the tight loop that it's effectively a no-op. These factors will affect how (and if) you solve the performance problem.
Alternative solutions might involve caching to reduce the number of copies made, investigating the underlying library enough to make some strategic changes (like changing the underlying library to use const where it can), or by making an overload that exposes the char * and passes it straight through (which shifts the burden to the caller to know what's right).
But all of that is implementation detail: Design the API for usability by the callers.
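As a rough illustration of the "reduce the number of copies" idea, here is a minimal sketch that keeps the copy-then-pass contract of cpp_wrapper3 but reuses one buffer per thread. The name cpp_wrapper_buffered and the recording stand-in for legacy_c_api are assumptions made for this example, and whether it actually helps would need to be measured in the real loop.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for the legacy function: it only records its input.
static std::string last_seen;
void legacy_c_api(char* s) { last_seen = s; }

// Same contract as cpp_wrapper3 (copy, then pass a mutable, '\0'-terminated
// buffer), but the buffer is reused between calls on the same thread.
void cpp_wrapper_buffered(std::string_view sv) {
    thread_local std::string buffer;
    buffer.assign(sv.data(), sv.size());  // reuses existing capacity when it is large enough
    legacy_c_api(buffer.data());          // non-const data() (C++17), so no const_cast
}
```

One caveat of this sketch: it is not re-entrant, so it would misbehave if the legacy function ever called back into the wrapper on the same thread.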
Is the string view '\0' terminated?
If it happens to point to a null-terminated string, then sv.data() may be null terminated. But a string view does not need to be null terminated, so one should not assume that it is. Thus cpp_wrapper1 is a bad choice.
Does the legacy API modify? ... we don't know.
If you don't know whether the API modifies the string, then you cannot use const, so cpp_wrapper2 is not an option.
One thing to consider is whether a wrapper is necessary. The most efficient solution is to pass a char*, which is just fine in C++. If using const strings is a typical operation, then cpp_wrapper3 may be useful, but is it typical, considering the operations may modify the string? cpp_wrapper4 is more efficient than 3, but not as efficient as a plain char* if you don't already have a std::string.
You can provide all of the options mentioned above as overloads.
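A minimal sketch of what such an overload set could look like, combining the cpp_wrapper3 and cpp_wrapper4 approaches under one name. The single name cpp_wrapper, the copies_made counter, and the recording stand-in for legacy_c_api are assumptions for illustration only.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for the legacy function: it only records its input.
static std::string last_seen;
static int copies_made = 0;  // instrumentation for this example only
void legacy_c_api(char* s) { last_seen = s; }

// Safe default (the cpp_wrapper3 approach): copy, then pass the copy.
void cpp_wrapper(std::string_view sv) {
    std::string copy{sv};
    ++copies_made;
    legacy_c_api(copy.data());
}

// Fast path (the cpp_wrapper4 approach): selected only for non-const
// std::string lvalues, whose buffer is mutable and '\0'-terminated.
void cpp_wrapper(std::string& str) {
    legacy_c_api(str.data());
}
```

With these two overloads, literals, string_views and const strings go through the copying version, while a caller that already owns a mutable std::string skips the copy without writing anything different at the call site.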

Conditionally Declaring an Enum at Runtime (C++)

I've come across some interesting findings relating to runtime detection that have spurred an interesting question. If I want to declare a global variable based on a condition, I can easily do so during pre-compilation with #ifdefs. However, this produces different binaries, and I am looking for a solution that can be realized at run time so as to constrain behavior to a single binary.
Example
A classic example that I can perform with arrays or other data types (note, the data types here are all const and must remain so - mirroring the immutable behavior of an enum):
Original (Compile Time):
#ifdef CONDITION
static const int faveNums[] = {......};
#else
static const int faveNums[] = {******};
#endif
Modified (Run Time):
static const int conditonalFaveNums[] = {......};
static const int defaultFaveNums[] = {******};
static const int * const faveNums = IsConditionTrue() ? conditonalFaveNums : defaultFaveNums;
Common Pitfall (Scoping):
This is a common pitfall that will not work: declarations inside if/switch blocks are scoped to those blocks and therefore cannot be referenced later. Thank goodness for the ternary operator!
if(IsConditionTrue())
{
static const int faveNums[] = {......};
}
else
{
static const int faveNums[] = {******};
}
Problem
However, the situation appears to change with enums. Let's try the run time solution, but with an enum this time:
enum conditionalFaveNums = {......};
enum defaultFaveNums = {******};
enum faveNums = IsConditionTrue() ? conditonalFaveNums : defaultFaveNums;
This will not compile.
Compile time defines (as with the first example) can solve this, but is there a way to solve this at run time by conditionally declaring a global enum in C++?
While you can't do exactly what you're asking - the difference between your array and enum examples being that the array is simply data, whereas the enum is a type, and type definitions must be resolvable at compile time - perhaps a more helpful answer is that this is a good thing.
Dynamic data should be represented in a dynamic data structure. std::set is a pretty close conceptual match to an enum, and provides many useful (and efficient) methods that may come in handy later. Even better might be defining an enum listing all possible values at compile time, and then dynamically constructing a set of these values based on runtime information. (The set is thus a proper subset of the enum's range.)
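The last suggestion could be sketched roughly as follows. The enumerator names, the values chosen for each branch, and the stub body of IsConditionTrue() are all assumptions for illustration; only the shape (a fixed enum plus a set chosen at run time) comes from the answer above.

```cpp
#include <cassert>
#include <set>

// Every value that could ever be a favourite, fixed at compile time.
enum class FaveNum { Three = 3, Seven = 7, Thirteen = 13, FortyTwo = 42 };

// Hypothetical stand-in for the question's IsConditionTrue().
bool IsConditionTrue() { return true; }

// Builds the subset of the enum that applies for a given condition.
std::set<FaveNum> makeFaveNums(bool condition) {
    if (condition) return {FaveNum::Three, FaveNum::Seven};
    return {FaveNum::Thirteen, FaveNum::FortyTwo};
}

// Initialized once during program start-up and immutable afterwards.
static const std::set<FaveNum> faveNums = makeFaveNums(IsConditionTrue());
```

The type is still fully known at compile time; only the contents of faveNums depend on run-time information.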
You cannot do this. Key points to remember:
Values of variables can be initialized differently based on run time information.
Types MUST be defined/set at compile time.
You can use
static const int * const faveNums = IsConditionTrue() ? conditonalFaveNums : defaultFaveNums;
since that says what faveNums is to be initialized to at run time, using run time information.
You cannot use
enum faveNums = IsConditionTrue() ? conditonalFaveNums : defaultFaveNums;
since that tries to define the type faveNums using run time information.

What is the benefit of using #define to declare a constant? [duplicate]

I have seen a lot of programs using #define at the beginning. Why shouldn't I declare a constant global variable instead ?
(This is a C++ answer. In C, there is a major advantage to using macros, which is that they are pretty much the only way you can get a true constant-expression.)
What is the benefit of using #define to declare a constant?
There isn't one.
I have seen a lot of programs using #define at the beginning.
Yes, there is a lot of bad code out there. Some of it is legacy, and some of it is due to incompetence.
Why shouldn't I declare a constant global variable instead ?
You should.
A const object is not only immutable, but has a type and is far easier to debug, track and diagnose, since it actually exists at compilation time (and, crucially, has a name in a debug build).
Furthermore, if you abide by the one-definition rule, you don't have to worry about causing an almighty palaver when you change the definition of a macro and forget to re-compile literally your entire project, and any code that is a dependent of that project.
And, yes, it's ironic that const objects are still called "variables"; of course, in practice, they are not variable in the slightest.
What is the benefit of using #define to declare a constant?
Declaring a constant with #define is a superior alternative to using literals and magic numbers (that is, code is much better off with a value defined as #define NumDaysInWeek (7) than simply using 7), but not a superior alternative to defining proper constants.
You should declare a constant instead of #define-ing it, for the following reasons:
#define performs a token/textual replacement in the source code, not a semantic replacement.
This screws up namespace use (#defined names are replaced with their values before namespaces are considered, so they cannot carry a fully qualified name).
That is, given:
namespace x {
#define abc 1
}
x::abc is an error, because the compiler actually tries to compile x::1 (which is invalid).
abc on the other hand will always be seen as 1, forbidding you from redefining/reusing the identifier abc in any other local context or namespace.
#define inserts its parameters textually, instead of as variables:
#define max(a, b) a > b ? a : b;
int a = 10, b = 5;
int c = max(a++, b); // expands to a++ > b ? a++ : b; a is incremented twice: c == 11, a == 12
#define has absolutely no semantic information:
#define pi 3.14 // untyped text substitution: how the token 3.14 is treated depends on where it is pasted
/*static*/ const double pi = 3.14; // this is always double
#define makes you (the developer) see different code than the compiler
This may not be a big thing, but the errors created this way are obscure, unexpected and waste a lot of time (you could look at an error, where the code looks perfectly fine to you, and curse the compiler for half a day, only to discover later, that one of the symbols in your expression actually means something completely different).
If you step through code that uses one of the declarations of pi above in a debugger, the first one (the macro) will cause the debugger to tell you that pi is an invalid symbol.
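For comparison, here is a small sketch of typed, scoped replacements for the three macros discussed above. The name max_of is hypothetical (picked to avoid clashing with std::max); the rest mirrors the names used in the points above.

```cpp
#include <cassert>

namespace x {
    constexpr int abc = 1;   // x::abc is a real name that respects the namespace
}

constexpr double pi = 3.14;  // always a double, and a symbol tools can refer to

// Hypothetical function replacement for the max macro: each argument is
// evaluated exactly once before the comparison happens.
template <typename T>
constexpr const T& max_of(const T& a, const T& b) {
    return a > b ? a : b;
}
```

Called as max_of(a++, b) with a = 10 and b = 5, the increment happens once, so the result is 10 and a ends up as 11, unlike the macro version.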
Edit (valid example for a local static const variable):
const result& some_class::some_function(const int key) const
{
if(map.count(key)) // map is a std::map<int,result> member of some_class
return map.at(key); // return a (const result&) to existing element
static const result empty_value{ /* ... */ }; // "static" is required here
return empty_value; // return a (const result&) to empty element
}
This shows a case where you have a const value, but its storage needs to outlast the function, because you are returning a const reference (and the value doesn't exist in the data of some_class). It's a relatively rare case, but valid.
According to the "father" of C++, Stroustrup, defining constants using macros should be avoided.
The biggest problems when using macros as constants include:
Macros override all occurrences in the code, e.g. also variable definitions. This may result in compile errors or undefined behavior.
Macros make the code very difficult to read and understand, because the complexity of a macro can be hidden in a header not clearly visible to the programmer.

Object.Error: Access Violation when printing result of std.algorithm.cartesianProduct

I'm using DMD 2.062 for x86.
module test;
import std.algorithm : cartesianProduct;
import std.traits : EnumMembers;
private enum test1
{
one,
two,
three,
}
private enum test2
{
one,
two,
three,
}
auto ct = cartesianProduct([EnumMembers!test1], [EnumMembers!test2]);
unittest
{
import std.stdio;
foreach (n, m; ct)
{
writeln(n, " ", m);
}
}
This program prints out:
one one
two one
three one
Then an access violation error is thrown. Am I using cartesianProduct incorrectly, or is this a bug in the function?
A tiny bit of both, probably. The issue here is that ct is evaluated at compile time and produces a result range that is then used at run time. I guess either CTFE or cartesianProduct does not expect such a scenario, and something bad happens that involves using invalid memory. I think it should either work or be a compile-time error, but that won't help you and belongs in the bug tracker.
What does matter here, though, is that everything will work if you move the ct initialization into the unittest body or a static this() module constructor. What you seem to miss is that D does not support initialization of global variables at program start-up. The value assigned to a global is always evaluated at compile time, which often "just works", often results in a compile-time error (if the initializer is not CTFE-able), and in this case results in weird behavior :)
What you may want is this code:
auto Test1Members = [ EnumMembers!test1 ];
auto Test2Members = [ EnumMembers!test2 ];
alias CT = typeof(cartesianProduct(Test1Members, Test2Members));
CT ct;
static this()
{
ct = cartesianProduct(Test1Members, Test2Members);
}
In general, the interconnection between compile-time data and run-time data for complex types such as arrays or associative arrays is very tricky with the current D front-end implementation and requires a lot of attention.

Is a compiletime constant index into a compiletime constant array itself compiletime constant?

I am trying to play fancy games which have the C++ compiler synthesize hash values of constant strings at compiletime. This would let me replace the string with a single identifier, with a massive savings in code size and complexity.
For programming clarity and ease, it'd be awesome if I could examine and compute at compiletime with simple inline character strings like "Hello" which are compiletime constant pointers to compiletime constant chars.
If I can index into these at compiletime, I can make a template metaprogram to do what I want. But it is unclear if the C++ standard treats a ct-constant index of a ct-constant array as ct-constant by itself.
Asked another way,
const char v="Hello"[0];
is quite valid C++ (and C). But is the value v a compile time constant?
I already believe the answer is no, but in practice some compilers accept it without even any warning, much less error. For example, the following compiles and runs without even a single warning from Intel's C++ compiler:
#include <iostream>
const char value="Hello"[0];
template<char c> void printMe()
{
std::cout << "Template Val=" << c << std::endl;
}
int main()
{
const char v='H';
printMe<'H'>();
printMe<v>();
printMe<value>(); // The tricky and interesting case!
}
However, Microsoft's compiler will not compile at all, giving a reasonably coherent error message about using a template with an object with internal linkage.
I suspect the answer to my question is "No, you can't assume any array reference even to a constant array with a constant index is constant at compiletime". Does this mean the Intel compiler's successful execution is a bug in the Intel compiler?
It doesn't work on GCC either.
However, outside of a language-compliance viewpoint, it's nice that the compiler optimiser does treat it as a character constant, pretty much. I exploited that fact to allow preprocessor-generated character constants (by using *#foo). See http://cvs.openbsd.org/cgi-bin/query-pr-wrapper?full=yes&numbers=1652, in file hdr.h. With that macro, you could write
DECR(h, e, l, l, o)
rather than
DECR('h', 'e', 'l', 'l', 'o')
Much more readable, in my view. :-)
Good question. Yes, this can be done, it's fine with the standards, and it'll work on Microsoft, GCC, and Intel; the problem is you have the syntax wrong :)
One second I'll cook up a sample...
Ok done, here it is. This sample is valid C++, and I've used it quite often, but indeed most programmers don't know how to get it right.
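#include <cstdio>
template<char* MSG>
class PrintMe
{
public:
void operator()()
{
std::printf("%s", MSG); // print via %s so MSG is never treated as a format string
}
};
char MyMessage[6] = "Hello"; // a string literal cannot be the template argument, but a namespace-scope char array can: its address is fixed at compile time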
int main(int argc, char* argv[])
{
PrintMe<MyMessage> printer;
printer();
return 0;
}
The relevant difference here is the difference between a "Integral Constant Expression" and a mere compile-time constant. "3.0" is a compile-time constant. "int(3.0)" is a compile-time constant, too. But only "3" is an ICE. [See 5.19]
More details at boost.org
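For readers on newer compilers: this thread predates constexpr. In C++11 and later, indexing a string literal with a constant index is usable in constant expressions, and from C++14 a loop can compute a hash at compile time. The sketch below assumes a C++14-or-later compiler; the names printable, fnv1a and StringId are hypothetical, and the hash uses the 32-bit FNV-1a constants only as an example choice.

```cpp
#include <cassert>
#include <cstdint>

// Indexing a literal with a constant index is a constant expression here.
constexpr char value = "Hello"[0];

// Using the value as a template argument proves it is a compile-time constant.
template <char C>
constexpr char printable() { return C; }
static_assert(printable<value>() == 'H', "value must be usable as a template argument");

// Example compile-time string hash (32-bit FNV-1a), evaluated by the compiler
// when called in a constant-expression context.
constexpr std::uint32_t fnv1a(const char* s) {
    std::uint32_t h = 2166136261u;            // FNV offset basis
    for (; *s != '\0'; ++s) {
        h ^= static_cast<unsigned char>(*s);  // mix in one byte
        h *= 16777619u;                       // FNV prime, wraps modulo 2^32
    }
    return h;
}

// A type keyed by the hash of a string, replacing the string with an identifier.
template <std::uint32_t Hash>
struct StringId { static constexpr std::uint32_t id = Hash; };

static_assert(StringId<fnv1a("")>::id == 2166136261u, "empty string hashes to the offset basis");
```

StringId<fnv1a("Hello")> is then a distinct type chosen entirely at compile time, which is close to what the question was after.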