I'm trying to understand constexpr as best I can. However, I've found a problem that I can't really explain (I don't understand the compiler's decisions on this piece of code). The code was compiled with the -O3 flag on x86-64 gcc 7.2, with C++17 as its std flag (I've been using godbolt.org for the compilation).
Taking this code:
#include <stdlib.h>
#include <stdio.h>
template <size_t N>
class constexpr_sum_array_compile_time
{
public:
inline constexpr constexpr_sum_array_compile_time ()
{
start_arr();
sum();
}
inline constexpr void start_arr()
{
for (int i = 0; i<N; ++i)
{
m_arr[i] = i;
}
}
inline constexpr void sum()
{
m_sum = 0;
for (int i = 0; i<N; ++i)
{
m_sum += m_arr[i];
}
}
constexpr int sum_res()
{
return this->m_sum;
}
private:
int m_arr[N];
int m_sum = 0;
};
#define NUMBER (4)
int main()
{
return constexpr_sum_array_compile_time<NUMBER>().sum_res();
}
In a nutshell, this is a constexpr class that creates an array of a given size, fills it with incremental values (arr[0] = 0, arr[1] = 1, arr[2] = 2, ..., arr[N-1] = N-1) and then sums it at compile time (at least that's what I want it to do).
If the NUMBER define is in the range { 0 <= NUMBER <= 4 or 8 <= NUMBER <= 71 }, the class is optimized away completely and only a single value is returned (as expected).
However, if NUMBER is in the range { 5 <= NUMBER <= 7 or NUMBER >= 72 }, the compiler isn't able to optimize the return value.
How come? What's so special about these values?
You can check the optimizations over at godbolt.org, which shows the raw assembly as the code is compiled.
SOLVED
It seems I needed to store the result in a variable declared constexpr in order to make the compiler calculate it at compile time. The new code is:
#include <stdlib.h>
#include <stdio.h>
template <size_t N>
class constexpr_sum_array_compile_time
{
public:
inline constexpr constexpr_sum_array_compile_time() : m_arr(), m_sum(0)
{
start_arr();
sum();
}
inline constexpr void start_arr()
{
for (int i = 0; i<N; ++i)
{
m_arr[i] = i;
}
}
inline constexpr void sum()
{
m_sum = 0;
for (int i = 0; i<N; ++i)
{
m_sum += m_arr[i];
}
}
inline constexpr int sum_res()
{
return this->m_sum;
}
private:
int m_arr[N];
int m_sum;
};
#define NUMBER (6)
int main()
{
constexpr auto res = constexpr_sum_array_compile_time<NUMBER>().sum_res();
return res;
}
Now no matter what I write in NUMBER (even 100000) it shows the value optimized and calculated at compile-time!
Contrary to your expectation, your class is not usable in a constant expression (and it is not used in one). Writing
constexpr auto res = constexpr_sum_array_compile_time<NUMBER>().sum_res();
would show you the different errors you have.
So what you observe in the assembly is just regular optimization.
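To illustrate the difference between "the optimizer happened to fold it" and "the language requires compile-time evaluation", here is a minimal sketch. The names sum_of_indices, at_compile_time and maybe_at_run_time are made up for this example and do not appear in the question; it assumes C++14 or later.

```cpp
#include <cstddef>

// Hypothetical helper (not from the question): fills an array with 0..N-1 and sums it.
template <std::size_t N>
constexpr int sum_of_indices()
{
    int arr[N > 0 ? N : 1] = {};   // every element is initialized before it is read
    for (std::size_t i = 0; i < N; ++i)
        arr[i] = static_cast<int>(i);
    int sum = 0;
    for (std::size_t i = 0; i < N; ++i)
        sum += arr[i];
    return sum;
}

// The initializer of a constexpr variable must be a constant expression,
// so the compiler has to evaluate this call during compilation (or reject the code).
constexpr int at_compile_time = sum_of_indices<6>();
static_assert(at_compile_time == 15, "0+1+2+3+4+5");

// Here nothing forces constant evaluation; whether the call is folded
// is left to the optimizer and may depend on N and on the compiler version.
inline int maybe_at_run_time() { return sum_of_indices<6>(); }
```

If the function were not actually usable in a constant expression, the constexpr variable (or the static_assert) would fail to compile instead of silently falling back to run time.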
Below is the basic code. I want to make the array global so that I don't have to pass it to every function. How do I initialize the 2D array with -1? I tried memset(arr,-1,sizeof(arr)) just below line 3, but it didn't work. Can anyone tell me what I am doing wrong?
#include <bits/stdc++.h>
using namespace std;
int arr[10][10];
int func(){
//this function will be using the global arr
// if(arr[x][y]!=-1)
//do something
}
int main(){
//the code
}
I do not know a good way to initialize a built-in array in place without code repetition. I do, however, know a way to initialize std::array:
#include <array>
#include <utility>
#include <cstddef>
template<size_t... Ix>
auto constexpr make1array(int v, std::index_sequence<Ix...>) {
auto populate = [](int v, size_t) { return v; };
std::array<int, 10> a = { populate(v, Ix)... };
return a;
}
template<size_t... Ix1, size_t... Ix2>
auto constexpr make2array(int v, std::index_sequence<Ix1...> seq, std::index_sequence<Ix2...>) {
auto populate = [](auto v, size_t) { return v; };
std::array<std::array<int, 10>, 10> a = { populate(make1array(v, seq), Ix2)... };
return a;
}
std::array<std::array<int, 10>, 10> arr = make2array(-1, std::make_index_sequence<10>{}, std::make_index_sequence<10>{});
This code produces an array pre-populated with -1 as the value at compile time.
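The function memset is not a good general tool here: it fills memory byte by byte, so it cannot set multi-byte integers to an arbitrary value (it happens to work for 0 and, on two's-complement machines, for -1). Note also that a call to memset is a statement, so it cannot appear at namespace scope just below the array declaration; it has to be inside a function.
IMHO, your best option is to use std::fill.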
Example:
std::fill(&arr[0][0], &arr[9][9] + 1, -1);
Otherwise, you can always fall back on the nested loop:
for (int r = 0; r < MAX_ROWS; ++r)
{
for (int c = 0; c < MAX_COLUMNS; ++c)
{
arr[r][c] = -1;
}
}
Your best bet is to let the compiler optimize the nested loops.
There may be some micro-optimizations that you could employ, but the compiler probably already has them in its tool chest.
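As a sketch of how this can be wired up for the global array from the question: the fill has to happen inside a function, because statements are not allowed at namespace scope. The helper name reset_arr is made up for this example; the row-by-row std::fill is just one way to write it.

```cpp
#include <algorithm>
#include <iterator>

int arr[10][10];   // global array from the question; zero-initialized at program start

// Hypothetical helper: call it once (for example at the top of main)
// before any function reads arr.
void reset_arr()
{
    // Fill one row at a time so that every std::fill call stays inside a single row.
    for (auto& row : arr)
        std::fill(std::begin(row), std::end(row), -1);
}
```

After reset_arr() has run, every element of arr compares equal to -1, so checks like arr[x][y] != -1 behave as intended.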
There is no direct way to initialize a raw array with values that aren't the result of default initialization. One of the reasons is that an array cannot be returned from a function and cannot be assigned directly from anything that is not a {}-list.
The simplest way (since C++14) is to make it part of a class type with a constexpr constructor. In C++11, a constructor with a non-empty body cannot be constexpr.
#include <iostream>
struct MinusOneArray {
static constexpr int NX = 10;
static constexpr int NY = 10;
int arr[NX][NY];
constexpr MinusOneArray() : arr() {
for(int i = 0; i < NX; ++i)
for(int j = 0; j < NY; ++j)
arr[i][j] = -1;
}
};
int main()
{
MinusOneArray a;
auto &arr = a.arr;
for(auto &line: arr) {
for(auto val: line)
std::cout << val << ",";
std::cout << std::endl;
}
}
An alternative is to use the standard std::array and initialize it with a constexpr function, as SergeyA suggested.
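For comparison, a loop-based version of that std::array approach could look roughly like the sketch below. It assumes C++17 (where the non-const std::array::operator[] is constexpr); make_filled and grid are names invented for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: returns a Rows x Cols std::array with every element set to value.
template <std::size_t Rows, std::size_t Cols>
constexpr std::array<std::array<int, Cols>, Rows> make_filled(int value)
{
    std::array<std::array<int, Cols>, Rows> result{};   // value-initialize first
    for (std::size_t r = 0; r < Rows; ++r)
        for (std::size_t c = 0; c < Cols; ++c)
            result[r][c] = value;
    return result;
}

// The constexpr variable forces the fill to happen during compilation.
constexpr auto grid = make_filled<10, 10>(-1);
static_assert(grid[0][0] == -1 && grid[9][9] == -1, "filled at compile time");
```

Unlike a raw array, the std::array can be returned from the function by value, which is what makes this form possible.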
The question
I am writing software in C++17 for which performance is absolutely critical. In a few key functions I would like to have constants available in arrays that are themselves stored in arrays. It matters that both levels of arrays are accessible by an integer index in this (or a similar) manner:
int main()
{
for (int i = 0; i < size_of_A ; i++)
{
for (int j = 0; j < size_of_B_in_A(i); j++)
{
std::cout << A[i][j];
}
}
}
This would be the kind of array we would like to create assuming some function int f(a, b)
A
{
// B1
{
f(1, 1),
f(1, 2),
f(1, 3),
...
f(1, large number)
},
// B2
{
f(2, 1),
...
f(2, some other large number)
},
... etc
}
The Twist
Each inner array may be of a different size, which will be stored elsewhere; we have to find the size at compile time. I would rather not use std::vector, for they are assumed to be slightly slower.
I also suppose a std::vector would be stored on the heap, which would be a performance issue in my specific case.
Furthermore, std::vector cannot be used as "inline constexpr", which would be necessary as I expect to have a large number of values in those arrays that are never going to change. I am fine with recompiling all those values each time, but by policy I cannot keep them in an external file, as I have to follow a strict coding style.
What I Have Tried
brace initializer
// A.hh
#pragma once
#include <iostream>
void test1();
void test2();
inline constexpr int B1[1] = {1};
inline constexpr int B2[2] = {2, 3};
inline constexpr int B3[3] = {4, 5, 6};
inline constexpr const int *A[3] = {B1, B2, B3};
// main.cc
#include "A.hh"
int main()
{
std::cout << "values : ";
for (int i = 0; i < 3; i++)
{
for (int j = 0; j <= i; j++)
{
std::cout << A[i][j];
}
}
std::cout << "\n\naddress test : \n";
std::cout << &A << '\n';
test1();
test2();
}
// somewhere.cc
#include "A.hh"
void test1()
{
std::cout << &A << '\n';
}
// elsewhere.cc
#include "A.hh"
void test2()
{
std::cout << &A << '\n';
}
which prints :
./a.out
values : 123456
address test :
0x56180505cd70
0x56180505cd70
0x56180505cd70
Therefore A has not been copied in main.cc, somewhere.cc and elsewhere.cc which is good. I would like to go further and be able to create a huge amount of values.
struct with constexpr
Using tips found here, I do this to be able to perform operations during array construction.
// B.hh
#pragma once
#include <iostream>
template <int N>
struct X
{
int arr[N];
constexpr X(): arr()
{
for (int i = 0; i < N; i++)
{
arr[i] = i % 3;
}
}
};
inline constexpr auto A = X<500>();
// main.cc
#include "B.hh"
int main()
{
for (int i = 0; i < 500; i++)
{
std::cout << A.arr[i];
}
}
Which unsurprisingly prints out
012012 (etc)...
Finally, an array of arrays
And this is where I am stuck
#pragma once
#include <iostream>
template <int N>
struct sub_array
{
int arr[N];
constexpr sub_array() : arr()
{
for (int i = 0; i < N; i++)
{
arr[i] = i;
}
}
};
struct array
{
sub_array</*what here ?*/> arr[100];
constexpr array() : arr()
{
for (int i = 0; i < 100; i++)
{
int size = i * 2; // a very large number
// the value of 'size' is not usable in a constant expression
//
// I see why it is, but I can't think of any other way
arr[i] = sub_array<size>;
}
}
};
inline constexpr array A = array();
How can I build this kind of array?
Thank you for your time and consideration.
Just use std::array<std::span<int>, N>, which is a fixed-size array of spans of different sizes (note that std::span requires C++20). To generate this, use a std::index_sequence.
Header:
constexpr std::size_t size_of_A = 500;
extern const std::array<const std::span<const int>, size_of_A>& A;
Implementation:
constexpr std::size_t size_of_B_in_A(std::size_t i) { return i%10+1;}
constexpr int f(std::size_t i, std::size_t j) {return static_cast<int>(i%(j+1));}
template <int I, int N>
struct B
{
std::array<int,N> arr;
explicit constexpr B()
{
for (int j = 0; j < N; j++)
arr[j] = f(I, j);
}
constexpr operator const std::span<const int>() const {return {arr};}
};
template<class index_sequence>
class BGen;
template<std::size_t... I>
struct BGen<std::integer_sequence<std::size_t,I...>> {
static constexpr std::tuple<B<I, size_of_B_in_A(I)>...> bs{};
static constexpr std::array<const std::span<const int>, sizeof...(I)> A {std::get<I>(bs)...};
};
const std::array<const std::span<const int>, size_of_A>& A
= BGen<decltype(std::make_index_sequence<size_of_A>{})>::A;
Usage:
int main()
{
for (unsigned i = 0; i < A.size() ; i++)
{
for (unsigned j = 0; j < A[i].size(); j++)
{
std::cout << A[i][j];
}
}
}
http://coliru.stacked-crooked.com/a/d68b0e9fd6142f86
However, stepping back: This solution is NOT the normal way to go about solving this problem. Since it's all constexpr, this is all data not code. Ergo, the most performant solution is two programs. One generates the data and saves it to a file that ships with (inside?) your program. Then your program simply maps the file into memory, and uses the data directly.
Here's a way of implementing a constexpr jagged array which can be initialized without intermediates. It does require listing the row sizes as template arguments, but there are ways to make that easier too, depending on how the row sizes can be known at compile time.
#include <tuple>
#include <array>
#include <utility>
template <std::size_t ...Sizes>
struct jagged_array
{
const std::tuple<std::array<int,Sizes>...> data;
static constexpr std::size_t num_rows = sizeof...(Sizes);
static constexpr std::size_t length[num_rows]{Sizes...};
int const* const row_ptr[num_rows];
template <std::size_t ...I>
constexpr jagged_array(std::index_sequence<I...>,
const std::array<int, Sizes>& ...arrs)
: data{arrs...}, row_ptr{&std::get<I>(data)[0]...} {}
constexpr jagged_array(const std::array<int, Sizes>& ...arrs)
: jagged_array(std::make_index_sequence<num_rows>{}, arrs...)
{}
constexpr int const* operator[](std::size_t idx) const
{ return row_ptr[idx]; }
};
inline constexpr jagged_array<2,4> jarr = {{2,3}, {4,5,6,7}};
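If the row sizes and values follow a rule rather than being listed by hand, another possible layout (not the one used in the answer above) is a single flat array plus a table of row offsets. The sketch below assumes C++17; row_size and f are hypothetical stand-ins for whatever size rule and element function the real program has, and flat_jagged and table are names invented for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical row-size rule and element function (assumptions for illustration).
constexpr std::size_t row_size(std::size_t i) { return i % 10 + 1; }
constexpr int f(std::size_t i, std::size_t j) { return static_cast<int>(i * 100 + j); }

// Total number of elements over all rows, computed at compile time.
template <std::size_t Rows>
constexpr std::size_t total_size()
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < Rows; ++i)
        n += row_size(i);
    return n;
}

template <std::size_t Rows>
struct flat_jagged
{
    std::array<std::size_t, Rows + 1> offset{};     // offset[i] = index of row i's first element
    std::array<int, total_size<Rows>()> data{};     // all rows stored back to back

    constexpr flat_jagged()
    {
        std::size_t pos = 0;
        for (std::size_t i = 0; i < Rows; ++i) {
            offset[i] = pos;
            for (std::size_t j = 0; j < row_size(i); ++j)
                data[pos++] = f(i, j);
        }
        offset[Rows] = pos;                         // one-past-the-end marker
    }

    constexpr std::size_t size(std::size_t i) const { return offset[i + 1] - offset[i]; }
    constexpr int at(std::size_t i, std::size_t j) const { return data[offset[i] + j]; }
};

inline constexpr flat_jagged<100> table{};
static_assert(table.size(3) == 4, "row 3 has 3 % 10 + 1 elements");
static_assert(table.at(3, 2) == 302, "element is f(3, 2)");
```

Access is table.at(i, j) with j < table.size(i); no bounds checking is done in this sketch.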
I originally wrote this code using a 2D vector and a C-style array. This time I wanted to use std::array, but my code does not work; this is the first time I have used std::array and templates.
It gave me for this line:
array<array<int, sizeY>, sizeX> arr;
this error:
Error C2971 std::array: template parameter _Size: sizeY,sizeX: a variable with non-static storage duration cannot be used as a non-type argument
#include <iostream>
#include <array>
using namespace std;
template <size_t Y, size_t X>
bool IsMagicSquare(array<array<int, Y>, X>& ar)
{
int x = ar.size();
int y = ar[0].size();
if (x == y)
{
int ver[x] = { };
int hor[y] = { };
int cross0 = 0;
int cross1 = 0;
for (int i = 0; i < x; i++)
{
for (int j = 0; j < y; j++)
{
hor[i] += ar[i][j];
ver[j] += ar[i][j];
if (i == j)
cross0 += ar[i][j];
if (i + j == x - 1)
cross1 += ar[i][j];
}
}
if (cross0 != cross1)
return false;
else
{
for (int i = 0; i < x; i++)
if ((cross0 != ver[i]) || (cross1 != hor[i]))
return false;
}
}
else
return false;
return true;
}
int main()
{
int sizeX, sizeY;
cout << "Size of Matrix:";
cin >> sizeX >> sizeY;
array<array<int, sizeY>, sizeX> arr; // error C2971 is reported on this line
cout << "Elements of the Matrix:";
for (int i = 0; i < sizeX; i++)
for (int j = 0; j < sizeY; j++)
cin >> arr[i][j];
if (IsMagicSquare(arr))
{
for (int i = 0; i < sizeX; i++)
{
cout << "\n";
for (int j = 0; j < sizeY; j++)
cout << arr[i][j];
}
}
else
cout << "Matrix is not magical square!";
return 0;
}
The size of an array (or template arguments in general) has to be known at compile-time, so there is no way to use the runtime values sizeX, sizeY as size (template argument) for an array.
You have to use a variable-length container like std::vector instead.
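A minimal sketch of what that could look like for the magic-square check, with the size taken at run time. The function names make_matrix and is_magic_square are made up for this example, and the check is a rewrite along the lines of the question's logic rather than the asker's exact code.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical helper: builds a rows x cols matrix of zeros; sizes may come from std::cin.
Matrix make_matrix(std::size_t rows, std::size_t cols)
{
    return Matrix(rows, std::vector<int>(cols, 0));
}

// Returns true if all rows, all columns and both diagonals have the same sum.
bool is_magic_square(const Matrix& m)
{
    const std::size_t n = m.size();
    if (n == 0) return false;
    for (const auto& row : m)
        if (row.size() != n) return false;          // must be square

    long diag0 = 0, diag1 = 0;
    std::vector<long> row_sum(n, 0), col_sum(n, 0); // sized at run time, no VLA needed
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            row_sum[i] += m[i][j];
            col_sum[j] += m[i][j];
            if (i == j) diag0 += m[i][j];
            if (i + j == n - 1) diag1 += m[i][j];
        }
    }
    if (diag0 != diag1) return false;
    for (std::size_t i = 0; i < n; ++i)
        if (row_sum[i] != diag0 || col_sum[i] != diag0) return false;
    return true;
}
```

In main, the matrix would be created with make_matrix(sizeX, sizeY) after reading the sizes, and filled with cin >> arr[i][j] exactly as before.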
For reference, here's how you can get a std::array with a size which is decided at runtime:
#include <array>
#include <cstddef>
#include <iostream>
#include <memory>
template<typename T>
struct DynArray {
virtual std::size_t size() const = 0;
virtual T * data() = 0;
virtual ~DynArray() {}
};
template<typename T, std::size_t Size>
struct DynArrayImpl : public DynArray<T> {
std::array<T, Size> array;
std::size_t size() const override {
return array.size();
}
T * data() override {
return array.data();
}
};
template<typename T, std::size_t Size>
struct DynArrayFactory {
static DynArray<T> * allocate(std::size_t const size) {
if (size > Size) {
// ERROR
return nullptr;
}
if (size == Size) {
return new DynArrayImpl<T, Size>();
}
return DynArrayFactory<T, Size - 1>::allocate(size);
}
};
template<typename T>
struct DynArrayFactory<T, 0> {
static DynArray<T> * allocate(std::size_t const size) {
if (size > 0) {
return nullptr;
}
return new DynArrayImpl<T, 0>();
}
};
int main() {
std::size_t size;
std::cin >> size;
std::unique_ptr<DynArray<int>> array{DynArrayFactory<int, 100>::allocate(size)};
std::cout << array->size() << std::endl;
}
This requires a maximum size (100 in this case) to be specified at compile time and is a really convoluted way of doing things; thus not recommended.
Accessing the std::array is nearly impossible though, unless with similar templated code which then generates code for each possible size (see below). This will generate a lot of code. One can easily access the contents of the array however, as seen in the example above. But really: use std::vector.
"similar templated code":
template<std::size_t Size>
struct FillWithNumbers {
static void run(std::array<int, Size> & array) {
int n = 0;
std::generate(begin(array), end(array), [&n](){ return n++; });
}
};
template<typename T, std::size_t Size>
struct DynArrayApply {
template<template<std::size_t S> class Fn>
static void apply(DynArray<T> & array) {
if (array.size() > Size) {
// ERROR
}
if (array.size() == Size) {
DynArrayImpl<T, Size> & real = dynamic_cast<DynArrayImpl<T, Size> &>(array);
Fn<Size>::run(real.array);
}
else {
DynArrayApply<T, Size - 1>::template apply<Fn>(array);
}
}
};
template<typename T>
struct DynArrayApply<T,0> {
template<template<std::size_t S> class Fn>
static void apply(DynArray<T> & array) {
if (array.size() > 0) {
// ERROR
}
DynArrayImpl<T, 0> & real = dynamic_cast<DynArrayImpl<T, 0> &>(array);
Fn<0>::run(real.array);
}
};
int main() {
std::size_t size;
std::cin >> size;
std::unique_ptr<DynArray<int>> array{DynArrayFactory<int, 100>::allocate(size)};
DynArrayApply<int, 100>::apply<FillWithNumbers>(*array);
std::cout << array->size() << std::endl;
std::cout << array->data()[array->size() / 2] << std::endl;
}
I wrote this code using 2d Vector and Array.
That is appropriate, as you do not know the size of the matrix until run time.
But I wanted to use std::array this time [...]
Well, that's a problem because the size of a std::array must be known at compile time. Moving away from C-style arrays is a recommended move, but you have to know where to go. Use the correct tool for the job at hand.
Fixed-size arrays: For arrays whose size is known by the compiler, a std::array is a reasonable replacement. In fact, the std::array is probably nothing more than the C-style array with a different interface.
Variable-size arrays: For arrays whose size is not known until run time, a std::vector is a reasonable replacement. Even though the name does not say "array", it is an array. It is a bit more complex than std::array, but that is because it supports sizes not known at compile time.
This distinction tends to be better-known by those not using gcc, as that compiler has an extension that supports declaring variable-size C-style arrays using the same syntax as declaring fixed-size C-style arrays. It is standard C++ to declare an array along the lines of int col[10]. However, it is not standard C++ to declare an array along the lines of int col[sizeY], where sizeY has a value supplied at run time. The latter syntax is supported by gcc as an extension, and some people use it without realizing it is an extension (ported from gcc's C support). To some extent, std::vector makes this extension available in a more portable form.
Today I tried to solve a (kind of) weird question with my friend.
Try to get the sum of 1 + 2 + ... + n without using multiplication and division, for, while, if, else, switch, case, the ternary expression and other keywords.
Here are our solutions
constructor
class Sum
{
public:
Sum() { ++num; sum += num; }
static void Init() { num = 0; sum = 0; }
static unsigned int SumValue() { return sum; }
private:
static unsigned int num;
static unsigned int sum;
};
unsigned int Sum::num = 0;
unsigned int Sum::sum = 0;
unsigned int get_sum(unsigned int n)
{
Sum::Init();
Sum * tmp = new Sum[n];
delete[] tmp;
return Sum::SumValue();
}
recursive
class Ba
{
public:
virtual unsigned int sum(unsigned int n)
{
return 0;
}
};
Ba* sumArray[2];
class D : public Ba
{
public:
virtual unsigned int sum(unsigned int n)
{
return sumArray[!!n]->sum(n - 1) + n;
}
};
unsigned int get_sum2(unsigned int n)
{
Ba b;
D d;
sumArray[0] = &b;
sumArray[1] = &d;
return sumArray[1]->sum(n);
}
We think this question could maybe be solved with a variable template, but we failed to figure it out. Is it possible to do that with templates?
BTW, we tried to find the same question on this site but could not. Sorry if this is a duplicate.
with a minimum of keywords, using short circuit evaluation
unsigned sum(unsigned n) {
unsigned i=0;
n && (i=n+sum(n-1));
return i;
}
I suspect this question will be closed soon, but it sounds like what you're after is the following: a pretty standard introduction to the idea of compile-time recursion, used heavily in template metaprogramming.
#include <iostream>
template <int I>
struct sum {
static constexpr int value = I + sum<I-1>::value;
};
template <>
struct sum<0> {
static constexpr int value = 0;
};
int main() {
std::cout << sum<5>::value << std::endl;
}
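Since the question asks about a variable template specifically, here is one possible sketch using a C++17 fold expression over an index sequence. It needs no recursion, loops or conditionals, but n must be a compile-time constant. The names sum_impl and sum_to are invented for this example.

```cpp
#include <cstddef>
#include <utility>

// I... is 0, 1, ..., N-1; the fold adds (I + 1) for each of them, starting from 0.
template <std::size_t... I>
constexpr std::size_t sum_impl(std::index_sequence<I...>)
{
    return (std::size_t{0} + ... + (I + 1));
}

// Variable template: sum_to<N> is 1 + 2 + ... + N.
template <std::size_t N>
constexpr std::size_t sum_to = sum_impl(std::make_index_sequence<N>{});

static_assert(sum_to<5> == 15, "1+2+3+4+5");
static_assert(sum_to<0> == 0, "empty sum");
```

Usage is simply std::cout << sum_to<5>; for a value of n that is only known at run time, one of the run-time tricks from the question is still needed.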
We have that the sum of the first N numbers is S = N(N+1)/2 = (N^2 + N)/2. Therefore,
int main()
{
int N = 10;
int sum = (N*N + N) >> 1;
}
What about using algorithms?
std::iota
template< class ForwardIterator, class T >
void iota( ForwardIterator first, ForwardIterator last, T value );
Fills the range [first, last) with sequentially increasing values,
starting with value and repetitively evaluating ++value.
http://en.cppreference.com/w/cpp/algorithm/iota
std::accumulate
template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );
Computes the sum of the given value init and the elements in the range
[first, last).
http://en.cppreference.com/w/cpp/algorithm/accumulate
Example:
#include <iostream>
#include <numeric>
#include <vector>
using namespace std;
int sum(const int n) {
std::vector<int> v(n);
std::iota(begin(v), end(v), 1);
return std::accumulate(begin(v), end(v), 0);
}
int main() {
const int n = 12;
std::cout << "Sum: " << sum(n) << std::endl;
return 0;
}
https://ideone.com/ajOhWM
Recursion is probably the expected solution, but by no means as complex as you have made it.
unsigned num( unsigned n ){ return n ; }
unsigned sum1toN( unsigned n ) ;
unsigned (*function[])( unsigned ) = { sum1toN, num } ;
unsigned sum1toN( unsigned n )
{
return n + (function[n==1])( n - 1 ) ;
}
Is template recursion more efficient than non-template recursion?
I.e. which one of the two is better:
typedef std::vector<int> Ivec;
template <int N>
void test1(Ivec& v){
assert(v.size() >= N);
for (int i=0;i<N;i++){v[i]++;}
test1<N-1>(v);
}
template <>
void test1<0>(Ivec& v){}
void test2(Ivec& v,int N){
assert(v.size() >= N);
for (int i=0;i<N;i++){v[i]++;}
if (N == 1) {return;}
test2(v,N-1);
}
I would be surprised if the template version were ever slower. It should be faster most of the time, if not every time. After all, in the template version N is a compile-time constant, so the compiler can unroll the loops and inline the recursive calls.
Here's a program that times the two approaches.
#include <iostream>
#include <cstddef>
#include <vector>
#include <cstdlib>
#include <ctime>
#include <cassert>
typedef std::vector<int> Ivec;
template <int N>
void test1(Ivec& v){
assert(v.size() >= N);
for (int i=0;i<N;i++){v[i]++;}
test1<N-1>(v);
}
template <>
void test1<0>(Ivec& v){}
void test2(Ivec& v,int N){
assert(v.size() >= N);
for (int i=0;i<N;i++){v[i]++;}
if (N == 1) {return;}
test2(v,N-1);
}
void timeFunction(void (*fun)())
{
clock_t start = std::clock();
fun();
clock_t end = std::clock();
double secs = 1.0*(end-start)/CLOCKS_PER_SEC;
std::cout << "Time taken: " << secs << std::endl;
}
void time_test1()
{
Ivec a;
const int N = 500;
for (int i = 0; i < N; ++i )
{
a.push_back(std::rand());
}
for ( int i = 0; i < N*20; ++i )
{
test1<N>(a);
}
}
void time_test2()
{
Ivec a;
const int N = 500;
for (int i = 0; i < N; ++i )
{
a.push_back(std::rand());
}
for ( int i = 0; i < N*20; ++i )
{
test2(a, N);
}
}
int main()
{
std::srand(time(NULL));
timeFunction(time_test1);
timeFunction(time_test2);
return 0;
}
Program built on a Linux machine with g++ version 4.8.4 with the command:
g++ -Wall -std=c++11 socc.cc -o socc
Output:
Time taken: 3.96467
Time taken: 4.32788
The output validates my hunch. As usual, your mileage may vary.
Template recursion should be faster, but you need to know N at compile time.
Which is better?
Usually function recursion since it is more flexible and generates the machine code only once.
But if you know N at compile time (as a define, for example) and not as something you read from a file, and the need for performance outweighs the size of the generated code, then you could take advantage of the optimisations that the compiler can do.
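As a side note on the template variant: with C++17 the explicit specialization for 0 can be replaced by if constexpr, which keeps the whole recursion in one function. This is only a sketch under that assumption; bump_prefixes is a made-up name standing in for test1 from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Increments v[0..N-1], then v[0..N-2], and so on down to v[0].
// The recursion depth is fixed by the template argument.
template <std::size_t N>
void bump_prefixes(std::vector<int>& v)
{
    if constexpr (N > 0) {              // discarded for N == 0, so instantiation stops there
        assert(v.size() >= N);
        for (std::size_t i = 0; i < N; ++i)
            ++v[i];
        bump_prefixes<N - 1>(v);
    }
}
```

Each distinct N still produces its own instantiation, so the code-size trade-off described above applies unchanged.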