I've implemented a constexpr map-array lookup based on this SO answer, but it leaves me wondering what the memory overhead might be if the map array is very large, and what other gotchas might exist with this technique, particularly if the constexpr function cannot be resolved at compile time.
Here is a contrived code example that hopefully makes my question more clear:
example.h:
enum class MyEnum
{
X0,
X1,
X2,
X3,
X4,
X5
};
struct MyStruct
{
const MyEnum type;
const char* id;
const char* name;
const int size;
};
namespace
{
constexpr MyStruct myMap[] = {
{MyEnum::X0,"X0","Test 0", 0},
{MyEnum::X1,"X1","Test 1", 1},
{MyEnum::X2,"X2","Test 2", 2},
{MyEnum::X3,"X3","Test 3", 3},
{MyEnum::X4,"X4","Test 4", 4},
{MyEnum::X5,"X5","Test 5", 5},
};
constexpr auto mapSize = sizeof myMap/sizeof myMap[0];
}
#include <exception>
class invalid_map_exception : public std::exception {};
// Retrieves a struct based on the associated enum
inline constexpr MyStruct getStruct(MyEnum key, int range = mapSize) {
return (range == 0) ? (throw invalid_map_exception()):
(myMap[range - 1].type == key) ? myMap[range - 1]:
getStruct(key, range - 1);
}
example.cpp:
#include <iostream>
#include <vector>
#include "example.h"
int main()
{
std::vector<MyEnum> enumList = {MyEnum::X0, MyEnum::X1, MyEnum::X2, MyEnum::X3, MyEnum::X4, MyEnum::X5};
int idx;
std::cout << "Enter a number between 0 and 5:" << std::endl;
std::cin >> idx;
MyStruct test = getStruct(enumList[idx]);
std::cout << "choice name: " << test.name << std::endl;
return 0;
}
Output:
Enter a number between 0 and 5:
1
choice name: Test 1
Compiled with g++ with -std=c++14.
In the above example, although getStruct is a constexpr function, it cannot be fully resolved until runtime, since the value of idx is not known until then. Might that change the memory overhead when compiled with optimization flags, or would the full contents of myMap be included in the binary regardless? Does it depend on the compiler and optimization settings used?
Also, what if the header file is included in multiple translation units? Would myMap be duplicated in each one?
I imagine this could be important if the map array becomes enormous and/or the code is going to be used in more resource constrained environments such as embedded devices.
Are there any other potential gotchas with this approach?
If you call a constexpr function with arguments that are not constant expressions, the call is evaluated at run time.
If you call getStruct with a constant expression, the compiler can just call the function at compile time. Then, the getStruct function will be "unused" at runtime, and the compiler will probably optimise it out. At this point, myMap will also be unused, and be optimised out.
In terms of runtime size, it would actually probably be smaller than a std::unordered_map or std::map; it literally stores the minimum information necessary. But its lookup time would be a lot slower, as it has to compare the elements individually in O(N) time, so it doesn't actually do what a map does (reduce lookup time).
If you want to make it more likely that it is optimised out, I would ensure that it is only used in constant-expression situations:
template<MyEnum key>
struct getStruct
{
    static constexpr MyStruct value = _getStruct(key);
};
(where _getStruct is the original recursive lookup function, renamed so the wrapper can take the getStruct name)
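For reference, here is a self-contained sketch of that wrapper pattern. The map is simplified, and the lookup function is renamed findStruct here so the template wrapper can keep the getStruct name; all names are illustrative:

```cpp
#include <cstddef>

enum class MyEnum { X0, X1, X2 };

struct MyStruct {
    MyEnum type;
    const char* name;
};

constexpr MyStruct myMap[] = {
    {MyEnum::X0, "Test 0"},
    {MyEnum::X1, "Test 1"},
    {MyEnum::X2, "Test 2"},
};
constexpr std::size_t mapSize = sizeof myMap / sizeof myMap[0];

// The recursive lookup, usable at compile time or run time.
constexpr MyStruct findStruct(MyEnum key, std::size_t range = mapSize) {
    return (range == 0) ? throw "key not found"
         : (myMap[range - 1].type == key) ? myMap[range - 1]
         : findStruct(key, range - 1);
}

// Wrapper that can only be instantiated with a compile-time key,
// so the lookup is guaranteed to happen during compilation.
template <MyEnum key>
struct getStruct {
    static constexpr MyStruct value = findStruct(key);
};
```

Used as getStruct<MyEnum::X1>::value, any invalid key becomes a compile error instead of a runtime throw.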
Here's some compiler output that shows that the map is optimised out entirely
And about including it in multiple translation units: it would be duplicated in every one, since you define it in an anonymous namespace (each translation unit gets its own copy). If it was optimised out in all of them, there would be no overhead, but it would still be duplicated in every translation unit where you do a runtime lookup.
I would like to perform range check for a std::array at compile time. Here is an example:
#include <iostream>
#include <array>
void rarelyUsedFunction(const std::array<double, 2>& input)
{
std::cout << input[5] << std::endl;
}
int main()
{
std::array<double, 2> testArray;
rarelyUsedFunction(testArray);
}
If I compile this with g++ there is no warning or error, despite the undefined access to an element which is not in the array. The compiled program just prints some random value.
Is there a compiler option in g++ for a suitable range/boundary check, that is performed during compile time? I know that I can add "-D_GLIBCXX_DEBUG" but this will only perform a check during runtime. If I have a function which is not called very often, this won't be triggered.
I am aware, that such a range check could not be performed in all circumstances, but in the case above, the compiler should be able to spot the problem!?
As mentioned in the comments, std::get on a std::array will do that nicely for you, since it is obligated to do such bounds checking:
The index I must be an integer value in range [0, N). This is enforced at compile time, as opposed to at() or operator[].
In your example, it would look like this:
void rarelyUsedFunction(const std::array<double, 2>& input)
{
std::cout << std::get<5>(input) << std::endl; // <---- Compilation error!
}
If the index is not just a literal, you can still calculate it with "complex" code as long as you manage to stuff it in a constexpr variable:
void rarelyUsedFunction(const std::array<double, 2>& input)
{
constexpr std::size_t index = /* Whatever, as long as it compiles... */;
std::cout << std::get<index>(input) << std::endl;
}
Obviously, in either case, this involves providing the compiler with a hard guarantee that the index is known at compile time.
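As a minimal, compilable sketch of the fixed function (firstElement is a name made up for this example):

```cpp
#include <array>

// std::get takes the index as a template argument, so an out-of-range
// index is rejected at compile time rather than causing undefined behaviour.
double firstElement(const std::array<double, 2>& input) {
    return std::get<0>(input);
    // return std::get<5>(input);  // would not compile: 5 is out of range
}
```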
At the moment, we have two primary options for compile-time evaluation: template metaprogramming (generally using template structs and/or variables), and constexpr operations [1].
template<int l, int r> struct sum_ { enum { value = l + r }; }; // With struct.
template<int l, int r> const int sum = sum_<l, r>::value; // With struct & var.
template<int l, int r> const int sub = l - r; // With var.
constexpr int mul(int l, int r) { return l * r; } // With constexpr.
Of these, we are guaranteed that all four can be evaluated at compile time.
template<int> struct CompileTimeEvaluable {};
CompileTimeEvaluable<sum_<2, 2>::value> template_struct; // Valid.
CompileTimeEvaluable<sum<2, 2>> template_struct_with_helper_var; // Valid.
CompileTimeEvaluable<sub<2, 2>> template_var; // Valid.
CompileTimeEvaluable<mul(2, 2)> constexpr_func; // Valid.
We can also guarantee that the first three will only be evaluable at compile time, due to the compile-time nature of templates; we cannot, however, provide this same guarantee for constexpr functions.
int s1 = sum_<1, 2>::value;
//int s2 = sum_<s1, 12>::value; // Error, value of s1 not known at compile time.
int sv1 = sum<3, 4>;
//int sv2 = sum<s1, 34>; // Error, value of s1 not known at compile time.
int v1 = sub<5, 6>;
//int v2 = sub<v1, 56>; // Error, value of v1 not known at compile time.
int c1 = mul(7, 8);
int c2 = mul(c1, 78); // Valid, and executed at run time.
It is possible to use indirection to provide an effective guarantee that a given constexpr function can only be called at compile time, but this guarantee breaks if the function is accessed directly instead of through the indirection helpers (as noted in the linked answer's comments). It is also possible to poison a constexpr function such that calling it at runtime becomes impossible, by throwing an undefined symbol, thus providing this guarantee by awkward hack. Neither of these seems optimal, however.
Considering this, my question is thus: Including current standards, C++20 drafts, proposals under consideration, experimental features, and anything else of the sort, is there a way to provide this guarantee without resorting to hacks or indirection, using only features and tools built into and/or under consideration for being built into the language itself? [Such as, for example, an attribute such as (both theoretical) [[compile_time_only]] or [[no_runtime]], usage of std::is_constant_evaluated, or a concept, perhaps?]
[1]: Macros are technically also an option, but... yeah, no.
C++20 added consteval for this express purpose. A consteval function is a constexpr function that is guaranteed to be only called at compile time.
I want to optimize a little program/library I'm writing, and for two weeks I've been somewhat stuck, now wondering if what I had in mind is even possible like this.
(Please be gentle, I don't have very much experience in metaprogramming.)
My goal is of course to have certain computations done by the compiler, so that the programmer hopefully only has to edit code at one point in the program and the compiler "creates" all the boilerplate. I do have a reasonably good idea how to do what I want with macros, but it is wished that I do it with templates if possible.
My goal is:
Let's say I have a class that a using programmer can derive from. There he can have multiple incoming and outgoing datatypes that I want to register somehow, so that the base class can do its operations on them.
class my_own_multiply : function_base {
in<int> a;
in<float> b;
out<double> c;
// ["..."] // other content of the class that actually does something but is irrelevant
register_ins<a, b> ins_of_function; // example meta-function calls
register_outs<c> outs_of_function;
}
The meta-code I have up till now is this (but it's not yet working/complete):
template <typename... Ts>
struct register_ins {
const std::array<std::unique_ptr<in_type_erasured>, sizeof...(Ts)> ins;
constexpr std::array<std::unique_ptr<in_type_erasured>, sizeof...(Ts)>
build_ins_array() {
std::array<std::unique_ptr<in_type_erasured>, sizeof...(Ts)> ins_build;
for (unsigned int i = 0; i < sizeof...(Ts); ++i) {
ins_build[i] = std::make_unique<in_type_erasured>();
}
return ins_build;
}
constexpr register_ins() : ins(build_ins_array()) {
}
template <typename T>
T getValueOf(unsigned int in_nr) {
return ins[in_nr]->getValue();
}
};
As you may see, I want to call my meta-template code with a variable number of ins. (Variable in the sense that the programmer can put however many he likes in there, but they won't change at runtime, so they can be "baked in" at compile time.)
The meta-code is supposed to create an array that is the length of the number of ins, initialized so that every field points to the original in in the my_own_multiply class - basically giving him an indexable data structure that will always have the correct size, and that I could access from the function_base class to use all ins for certain functions, which are also iterable, making things convenient for me.
Now I have looked into how one might do that, but I am getting the feeling that I might not really be allowed to "create" this array at compile time in a fashion that still allows the ins a and b to be non-static and non-const, so that I can mutate them. From my side they wouldn't have to be const anyway, but my compiler seems to not like them to be free. The only thing I need const is the array with the pointers. But using constexpr possibly "makes" me make them const?
Okay, I will clarify what I don't get:
When I try to create an "instance" of my meta-stuff structure, it fails because it expects all kinds of const, constexpr and so on. But I don't want them, since I need to be able to mutate most of those variables. I only need this meta-stuff to create an array of the correct size at compile time, but I don't want to sacrifice having to make everything static and const in order to achieve this. So is this even possible under these kinds of terms?
I do not get all the things you have in mind (also regarding that std::unique_ptr in your example), but maybe this helps:
Starting from C++14 (or C++11, but that is strictly limited) you may write constexpr functions which can be evaluated at compile time. As a precondition (in simple words), all arguments "passed by the caller" must be constant expressions. If you want to force the compiler to replace that "call" by the result of a compile-time computation, you must assign the result to a constexpr variable.
Writing usual functions (just with constexpr added) allows you to write code which is simple to read. Moreover, you can use the same code for both compile-time and run-time computations.
C++17 example (similar things are possible in C++14, although some stuff from std is just missing the constexpr qualifier):
http://coliru.stacked-crooked.com/a/154e2dfcc41fb6c7
#include <cassert>
#include <array>
template<class T, std::size_t N>
constexpr std::array<T, N> multiply(
const std::array<T, N>& a,
const std::array<T, N>& b
) {
// may be evaluated in `constexpr` or in non-`constexpr` context
// ... in simple man's words this means:
// inside this function, `a` and `b` are not `constexpr`
// but the return can be used as `constexpr` if all arguments are `constexpr` for the "caller"
std::array<T, N> ret{};
for(std::size_t n=0; n<N; ++n) ret[n] = a[n] * b[n];
return ret;
}
int main() {
{// compile-time evaluation is possible if the input data is `constexpr`
constexpr auto a = std::array{2, 4, 6};
constexpr auto b = std::array{1, 2, 3};
constexpr auto c = multiply(a, b);// assigning to a `constexpr` guarantees compile-time evaluation
static_assert(c[0] == 2);
static_assert(c[1] == 8);
static_assert(c[2] == 18);
}
{// for run-time data, the same function can be used
auto a = std::array{2, 4, 6};
auto b = std::array{1, 2, 3};
auto c = multiply(a, b);
assert(c[0] == 2);
assert(c[1] == 8);
assert(c[2] == 18);
}
return 0;
}
I was wondering whether sorting an array of std::pair is faster, or an array of struct?
Here are my code segments:
Code #1: sorting std::pair array (by first element):
#include <algorithm>
pair <int,int> client[100000];
sort(client,client+100000);
Code #2: sort struct (by A):
#include <algorithm>
struct cl{
int A,B;
};
bool cmp(cl x,cl y){
return x.A < y.A;
}
cl clients[100000];
sort(clients,clients+100000,cmp);
code #3: sort struct (by A and internal operator <):
#include <algorithm>
struct cl{
int A,B;
bool operator<(const cl& x) const {
return A < x.A;
}
};
cl clients[100000];
sort(clients,clients+100000);
Update: I used these codes to solve a problem on an online judge. Code #1 exceeded the 2-second time limit, while codes #2 and #3 were accepted (running in 62 milliseconds). Why does code #1 take so much more time than the others? Where is the difference?
You know what std::pair is? It's a struct (or class, which is the same thing in C++ for our purposes). So if you want to know what's faster, the usual advice applies: you have to test it and find out for yourself on your platform. But the best bet is that if you implement the equivalent sorting logic to std::pair, you will have equivalent performance, because the compiler does not care whether your data type's name is std::pair or something else.
But note that the code you posted is not equivalent in functionality to the operator < provided for std::pair. Specifically, you only compare the first member, not both. Obviously this may result in some speed gain (but probably not enough to notice in any real program).
I would estimate that there isn't much difference at all between these two solutions.
But like ALL performance related queries, rather than rely on someone on the internet telling they are the same, or one is better than the other, make your own measurements. Sometimes, subtle differences in implementation will make a lot of difference to the actual results.
Having said that, the implementation of std::pair is a struct (or class) with two members, first and second, so I have a hard time imagining that there is any real difference here - you are just implementing your own pair with your own compare function that does exactly the same thing the already existing pair does... Whether it's an internal function in the class or a standalone function is unlikely to make much of a difference.
Edit: I made the following "mash the code together":
#include <algorithm>
#include <iostream>
#include <iomanip>
#include <cstdlib>
using namespace std;
const int size=100000000;
pair <int,int> clients1[size];
struct cl1{
int first,second;
};
cl1 clients2[size];
struct cl2{
int first,second;
bool operator<(const cl2 x) const {
return first < x.first;
}
};
cl2 clients3[size];
template<typename T>
void fill(T& t)
{
srand(471117); // Use same random number each time/
for(size_t i = 0; i < sizeof(t) / sizeof(t[0]); i++)
{
t[i].first = rand();
t[i].second = -t[i].first;
}
}
void func1()
{
sort(clients1,clients1+size);
}
bool cmp(cl1 x, cl1 y){
return x.first < y.first;
}
void func2()
{
sort(clients2,clients2+size,cmp);
}
void func3()
{
sort(clients3,clients3+size);
}
void benchmark(void (*f)(), const char *name)
{
cout << "running " << name << endl;
clock_t time = clock();
f();
time = clock() - time;
cout << "Time taken = " << (double)time / CLOCKS_PER_SEC << endl;
}
#define bm(x) benchmark(x, #x)
int main()
{
fill(clients1);
fill(clients2);
fill(clients3);
bm(func1);
bm(func2);
bm(func3);
}
The results are as follows:
running func1
Time taken = 10.39
running func2
Time taken = 14.09
running func3
Time taken = 10.06
I ran the benchmark three times, and they are all within ~0.1s of the above results.
Edit2:
And looking at the code generated, it's quite clear why the "middle" function takes quite a bit longer: the comparison is made inline for pair and struct cl2, but can't be made inline for struct cl1 (it goes through a function pointer), so every compare literally makes a function call rather than executing a few instructions inside the sort. This is a large overhead.
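One way to see the distinction in code: a lambda (like pair's operator<) gives std::sort a concrete callable type it can inline, whereas a plain function pointer generally prevents that. A sketch (sortByA is a name made up for this example):

```cpp
#include <algorithm>
#include <vector>

struct cl { int A, B; };

// The lambda's type is known to std::sort's template instantiation,
// so the comparison can be inlined, unlike a function-pointer comparator.
void sortByA(std::vector<cl>& v) {
    std::sort(v.begin(), v.end(),
              [](const cl& x, const cl& y) { return x.A < y.A; });
}
```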
I have an optimisation algorithm which finds the best partition of a graph.
There are many measures for the quality of a partition (the variable being optimised), so I thought it would be a good idea to use function pointers to these quality functions, and pass that into my optimisation algorithm function.
This works fine, but the problem is different quality functions take some different arguments.
For example one quality function is find_linearised_stability and it requires a markov_time parameter:
float find_linearised_stability(cliques::Graph<T> &my_graph, cliques::Partition &my_partition,
std::vector<float> &markov_times, std::vector<float> &stabilities)
and is used in the optimisation function :
cliques::find_optimal_partition_louvain(my_new_graph, markov_times, &cliques::find_linearised_stability);
however another quality function find_modularity requires no markov_time parameter. Of course I could just include it as an argument and not use it in the function, but that seems like bad practice, and would get unwieldy once I start adding a lot of different quality functions.
What is a better design for this kind of situation?
Use function objects. One of those function objects can have a markov_time member that is passed in to the constructor:
struct find_linearised_stability {
std::vector<float> & markov_times_;
find_linearised_stability(std::vector<float> & markov_times)
:markov_times_(markov_times)
{}
float operator () (cliques::Graph<T> &my_graph, cliques::Partition &my_partition,
std::vector<float> &stabilities)
{
// use markov_times_ in here, we didn't need to pass it since it's a member
}
};
(you may need to make adjustments to constness/referenceness to suit your needs)
Then you can call your function like this:
cliques::find_optimal_partition_louvain(my_new_graph, cliques::find_linearised_stability(markov_times));
"what type for the function object do I use when declaring the ... function?"
Make it a function template that takes the function object type as a template parameter, thusly:
template<typename PR>
whatever find_optimal_partition_louvain(my_new_graph, PR & pr)
{
...
pr(my_new_graph, partition, stabilities);
...
}
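Putting the two pieces together, here is a compilable sketch with simplified stand-in types (Graph, Partition, and the return values are placeholders, not the real cliques API):

```cpp
#include <vector>

// Hypothetical stand-ins for the real graph/partition types.
struct Graph {};
struct Partition {};

// Quality functor that carries its extra parameter as state.
struct FindLinearisedStability {
    std::vector<float>& markov_times;
    float operator()(Graph&, Partition&) const {
        // Placeholder computation standing in for the real stability measure.
        return static_cast<float>(markov_times.size());
    }
};

// Quality functor that needs no extra parameters.
struct FindModularity {
    float operator()(Graph&, Partition&) const { return 1.0f; }
};

// The optimiser is a template, so any functor (or lambda) with a
// matching call signature can be passed in.
template <typename Quality>
float findOptimalPartition(Graph& g, Quality quality) {
    Partition p;
    return quality(g, p);
}
```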
Your only option is boost::bind or something like it stored in a boost::function or something like it.
If profiling shows that to be too slow then you'll be stuck with the "poor practice" version because any alternative is going to run afoul of UB and/or end up being just as 'slow' as the more reasonable alternative.
If the parameter is not known beforehand: add an argument (reference/pointer) to every function that contains all the info; every function uses whatever it needs.
If the parameter is known beforehand: use boost::bind, e.g.:
sample source code:
#include <iostream>
#include <cstddef>
#include <algorithm>
#include <boost/bind.hpp>
using namespace std;
void output(int a, int b)
{
cout << a << ", " << b << '\n';
}
int main()
{
int arr[] = { 1, 2, 3, 4, 5 };
for_each(arr, arr + 5, bind(output, 5, _1));
return 0;
}
Outputs:
5, 1
5, 2
5, 3
5, 4
5, 5
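For what it's worth, since C++11 the same partial application can be written with a lambda instead of boost::bind. A sketch, restructured to return strings rather than print, so the result is easy to check (applyToAll is a name made up here):

```cpp
#include <algorithm>
#include <string>
#include <vector>

std::string output(int a, int b) {
    return std::to_string(a) + ", " + std::to_string(b);
}

// Equivalent of for_each(..., bind(output, 5, _1)): the constant 5 is
// baked into the lambda, and each element supplies the second argument.
std::vector<std::string> applyToAll(const std::vector<int>& values) {
    std::vector<std::string> lines;
    std::for_each(values.begin(), values.end(),
                  [&lines](int x) { lines.push_back(output(5, x)); });
    return lines;
}
```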