C++11 - bad_alloc on a constexpr - c++

Arrays of bitmasks are very common, but they are often tedious to write and make the code less readable. I would like to generate them with a constexpr function. Here is my attempt:
#include <iostream>
#include <cstdint>
#include <vector>
#include <utility>
typedef uint32_t myT;
template <typename T>
constexpr std::vector<T> vecFarm(T &&lower, T &&upper, T &&step) {
// std::cout << lower << " " << upper << " " << step << "\n";
std::vector<T> v;
if (lower < upper) {
for (T count = lower; count < upper; count += step) {
v.push_back(count);
};
}
return (v);
}
int main() {
std::vector<myT> k(std::move(vecFarm(myT(0), ~(myT(0)), myT(256)))); //why
// this doesn't work ?
// std::vector<myT> k(std::move(vecFarm(myT(0), ((~(myT(0))) >> 16), myT(256))));
// but this one works
// let's see what we got
for (const auto &j : k) {
std::cout << j << " ";
}
std::cout << "\n";
return (0);
}
I have used std::move, unnamed objects and a constexpr function. This code compiles fine with
g++-4.8 -O3 -std=c++11 -pthread -Werror -Wall -Wextra
but it fails at run time with a bad_alloc, and I can see my "small" application allocating a lot of memory.
Maybe the error is obvious and I just can't see it, but why doesn't this work?
Why does my application do the allocation at run time? Isn't it supposed to compute everything at compile time? I was expecting this to maybe fail at compile time, not at run time.

std::bad_alloc usually means the program could not allocate any more memory. Changing your loop to the following shows why:
for (T count = lower; count < upper; count += step) {
std::cout << "count:" << count << "\n";
std::cout << "upper:" << upper << "\n";
};
This prints the following on the first loop when I tested it:
count:0
upper:4294967295
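In other words, count has a long way to go before count < upper fails, especially since you add only 256 each time. In fact the condition never fails: the largest multiple of 256 below 4294967295 is 4294967040, and adding 256 to it wraps the 32-bit unsigned count back to 0, so the loop keeps pushing elements until memory runs out. With the shifted upper bound of 65535, count reaches 65536 and the loop stops normally.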
Also, for a constexpr function to be evaluated at compile time, it has to fulfil certain conditions. In C++11, its return type must be a literal type, but your function returns std::vector, which is not one. Its body must also consist of exactly one return statement that uses only literal values, constexpr variables and constexpr functions, but yours contains declarations, an if and a loop. Therefore, your function cannot be evaluated at compile time.
Also note that because vecFarm is a function template, the constexpr specifier is silently ignored for any specialization that does not fulfil these conditions, which is why the code compiles. Turning on -pedantic should give you better diagnostics.

Related

Random exit code on giving larger array sizes in DPC++ Vector Addition

I am trying to run a hello-world DPC++ sample of oneAPI which adds two 1-D Arrays on both CPU and GPU, and verifies the results. Code is shown below:
/*
DataParallel Addition of two Vectors
*/
#include <CL/sycl.hpp>
#include <array>
#include <iostream>
using namespace sycl;
constexpr size_t array_size = 100000;
typedef std::array<int, array_size> IntArray;
// Initialize array with the same value as its index
void InitializeArray(IntArray& a) { for (size_t i = 0; i < a.size(); i++) a[i] = i; }
/*
Create an asynchronous Exception Handler for sycl
*/
static auto exception_handler = [](cl::sycl::exception_list eList) {
for (std::exception_ptr const& e : eList) {
try {
std::rethrow_exception(e);
}
catch (std::exception const& e) {
std::cout << "Failure" << std::endl;
std::terminate();
}
}
};
void VectorAddParallel(queue &q, const IntArray& x, const IntArray& y, IntArray& parallel_sum) {
range<1> num_items{ x.size() };
buffer x_buf(x);
buffer y_buf(y);
buffer sum_buf(parallel_sum.data(), num_items);
/*
Submit a command group to the queue by a lambda
which contains data access permissions and device computation
*/
q.submit([&](handler& h) {
auto xa = x_buf.get_access<access::mode::read>(h);
auto ya = y_buf.get_access<access::mode::read>(h);
auto sa = sum_buf.get_access<access::mode::write>(h);
std::cout << "Adding on GPU (Parallel)\n";
h.parallel_for(num_items, [=](id<1> i) { sa[i] = xa[i] + ya[i]; });
std::cout << "Done on GPU (Parallel)\n";
});
/*
queue runs the kernel asynchronously. Once beyond the scope,
buffers' data is copied back to the host.
*/
}
int main() {
default_selector d_selector;
IntArray a, b, sequential, parallel;
InitializeArray(a);
InitializeArray(b);
try {
// Queue needs: Device and Exception handler
queue q(d_selector, exception_handler);
std::cout << "Accelerator: "
<< q.get_device().get_info<info::device::name>() << "\n";
std::cout << "Vector size: " << a.size() << "\n";
VectorAddParallel(q, a, b, parallel);
}
catch (std::exception const& e) {
std::cout << "Exception while creating Queue. Terminating...\n";
std::terminate();
}
/*
Do the sequential, which is supposed to be slow
*/
std::cout << "Adding on CPU (Scalar)\n";
for (size_t i = 0; i < sequential.size(); i++) {
sequential[i] = a[i] + b[i];
}
std::cout << "Done on CPU (Scalar)\n";
/*
Verify results, the old-school way
*/
for (size_t i = 0; i < parallel.size(); i++) {
if (parallel[i] != sequential[i]) {
std::cout << "Fail: " << parallel[i] << " != " << sequential[i] << std::endl;
std::cout << "Failed. Results do not match.\n";
return -1;
}
}
std::cout << "Success!\n";
return 0;
}
With a relatively small array_size (I tested 100 to 50k elements), the computation works fine.
Sample output:
Accelerator: Intel(R) Gen9
Vector size: 50000
Adding on GPU (Parallel)
Done on GPU (Parallel)
Adding on CPU (Scalar)
Done on CPU (Scalar)
Success!
Note that it takes barely a second to finish the computation on both CPU and GPU.
But when I increase array_size to, say, 100000, I get this seemingly unhelpful error:
C:\Users\myuser\source\repos\dpcpp-iotas\x64\Debug\dpcpp-iotas.exe (process 24472) exited with code -1073741571.
I am not sure at what precise value the error starts occurring, but it seems to happen above roughly 70000 elements. I have no idea why this is happening. Any insights on what could be wrong?
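It turns out this is due to the stack size limit enforced by Visual Studio. Contiguous arrays with too many elements resulted in a stack overflow: main holds four IntArray objects, so with 100000 four-byte ints each they need about 1.6 MB, more than the default 1 MB stack.
As mentioned by @user4581301, the exit code -1073741571 is C00000FD in hex, which is the signed representation of the Windows status code for stack exhaustion/overflow.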
Ways to fix this:
Increase the /STACK reserve to something higher than 1 MB (the default) under Project Properties > Linker > System > Stack Reserve/Commit values.
Use editbin.exe to change the /STACK reserve of the built executable (dumpbin.exe can be used to inspect it).
Use std::vector instead, which allocates its elements dynamically (suggested by @Retired Ninja).
I couldn't find an option to change /STACK in oneAPI the normal way in the Linker properties.
I decided to go with dynamic allocation.
Related: https://stackoverflow.com/a/26311584/9230398
When I program big applications I always do a
ulimit -s unlimited
to explain to the shell that I am grown up and I really want some space on my stack.
Here this is the bash syntax but you can obviously adapt to some other shells.
I guess there might be an equivalent for non-UNIX OS?

How to use vectors to find mean and standard deviation

This is the assignment:
Write two functions that calculate the mean and standard deviation.
Your functions shall implement the following prototypes:
double mean(vector<double>x);
double sd(vector<double>x);
b. Place these functions in a file named “statfun.cpp”.
c. Place their function definitions in a file named “statfun.h”.
Write a main() function in a file named “lab1.cpp”.
Prompt the user to input 10 floating-point values and store them in a vector v.
Print vector v on a single line with each element separated by a space.
Call your functions mean(v) and sd(v) ...
I know how to code the formula for mean, but I'm not sure how to code the formula for standard deviation using vectors. I'm even less sure of how to do this with different files involved. I'm fully aware my code is garbage, but there are so many things I'm not sure of, I don't know what to tackle first.
Edit: Updated the code
//statfun.h
#include <iostream>
#include <vector>
#ifndef STATFUN_H
#define STATFUN_H
using namespace std;
double mean(vector<double> v);
double sd(vector<double> v);
#endif
//statfun.cpp
#include <iostream>
#include <cmath>
#include <vector>
#include "statfun.h"
#ifndef STATFUN_CPP
#define STATFUN_CPP
using namespace std;
double mean(const vector<double> v) {
double result;
double sumVal = 0.0; //Calculating sum of all values
for (int i = 0; i < v.size(); ++i) {
sumVal = sumVal + v.at(i);
}
result = sumVal / v.size(); //Calculating mean
return result;
}
double sd(const vector<double> v) {
double total = 0;
for (int i = 0; i < 10; ++i) { //Calcuating standard deviation
double mean_value = mean(v);
int length = v.size()
total = total + (val - mean_value)*(val - mean_value);
}
return sqrt(total / length);
}
#endif
//lab1.cpp
#include "statfun.cpp"
#include <iomanip>
using namespace std;
vector<double> v;
int main() {
cout << "Enter 10 numbers: " << endl;
float userInput = 0;
for (int i = 0; i < 10; ++i) {
cin >> userInput;
v.push_back(userInput);
}
for (int i = 0; i < 10; ++i) {
cout << v.at(i) << " ";
}
cout << endl;
cout.precision(3);
cout << mean(v) << " " << sd(v) << endl;
cout.precision(5);
cout << scientific << mean(v) << " " << sd(v) << endl;
return 0;
}
Your code contains several mistakes and leaves a lot of room for improvement.
Let me go through them one by one.
The header
Since a header can end up being included several times in one file, each header file needs an include guard to prevent side effects from that.
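// statfun.h
// Names containing a double underscore are reserved, so the guard avoids them.
#ifndef STATFUN_H
# define STATFUN_H
# include <vector>
// Declarations only; the definitions live in statfun.cpp.
double mean(const std::vector<double>&);
double sd(const std::vector<double>&);
#endif
BTW, a function declaration may omit the parameter names.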
Reference
The second mistake is that you didn't use references. In C++, an object is passed by value by default.
Note: This is just like R, except that C++ has no language-level copy-on-write semantics, so passing a std::vector by value really copies all of its elements. A user-defined class can implement copy-on-write itself.
So to avoid the costly copy, pass a reference.
double mean(const std::vector<double>&);
Here I used a const lvalue reference (const &), since mean will not modify the vector passed in.
Function blocks
In C++, a function is defined as below:
return_type func_name(type1 arg1 /* , type2 arg2, ... */)
{
// The function body goes here:
}
So
// statfun.cpp
// c++11
#include "statfun.h"
#include <cmath>
// Arithmetic mean; the caller must pass a non-empty vector.
double mean(const std::vector<double> &v)
{
    double sum = 0;
    for (auto &each: v)
        sum += each;
    return sum / v.size();
}
// Sample standard deviation (divides by n - 1); needs at least two elements.
double sd(const std::vector<double> &v)
{
    double square_sum_of_difference = 0;
    double mean_var = mean(v);
    auto len = v.size();
    double tmp;
    for (auto &each: v) {
        tmp = each - mean_var;
        square_sum_of_difference += tmp * tmp;
    }
    return std::sqrt(square_sum_of_difference / (len - 1));
}
Compile-time variable type deduction
As you might have noticed in the code above, I used auto len = v.size(), which relies on a C++11 language feature: auto.
Since C++11, the compiler can deduce the type of a variable from its initializer at compile time. So instead of defining a variable like typename std::vector<double>::size_type len = v.size(), we can now write auto len = v.size().
range-for loop
If you have learnt Python, you probably know its for loop over a sequence. Since C++11, C++ can do this as well:
for (auto &each: v) {
// Loop body
}
where v can be a std::vector or any other C++ container that provides begin() and end().
IO error check
Last but not least, you didn't check whether the I/O operations you performed on std::cout or std::cin succeeded!
With std::cout or std::cin, you either have to check the stream state, for example with std::cin.fail(), after every I/O operation, or use the following code:
std::cout.exceptions(std::ios_base::failbit | std::ios_base::badbit);
std::cin.exceptions(std::ios_base::failbit | std::ios_base::badbit);
to make std::cout and std::cin throw when an I/O operation fails.
I personally prefer not to handle this error and to let the exception terminate the program, since there is little you can do to clean up and resume the control flow of the program.
Below is the last piece of code:
// lab1.cpp
// c++11
#include "statfun.h"
#include <iostream>
auto get_use_input() -> std::vector<double>
{
std::vector<double> v;
v.reserve(10);
double userInput;
for (int i = 0; i != 10; ++i) {
std::cout << "Please enter the " << i + 1 << " number: ";
std::cin >> userInput;
std::cout << std::endl;
v.push_back(userInput);
}
return v;
}
void print_vec(const std::vector<double> &v)
{
std::cout << "Vector: ";
for (auto &each: v)
std::cout << each << " ";
std::cout << std::endl;
}
int main() {
// Configure std::cout and std::cin to throw if io fails.
std::cout.exceptions(std::ios_base::failbit | std::ios_base::badbit);
std::cin.exceptions(std::ios_base::failbit | std::ios_base::badbit);
/*
* With "-O3" or [c++17, copy elision](https://en.cppreference.com/w/cpp/language/copy_elision),
* the cost of initializing an object using the return value of another function is nearly zero.
*/
std::vector<double> v = get_use_input();
print_vec(v);
std::cout.precision(3);
std::cout << "mean: " << mean(v) << " sd: " << sd(v) << std::endl;
std::cout.precision(5);
std::cout <<std::scientific << "mean: " << mean(v) << " sd: " << sd(v) << std::endl;
return 0;
}
To build this program, you must have a c++ compiler that supports c++11 and pass -std=c++11 to the compiler.
PS: You can also use -std=c++14 or -std=c++17.
A simple Makefile to build the program:
# The flags
CXXFLAGS := -std=c++11
# The two lines below are the flags I used for clang++-8
# CXXFLAGS := -std=c++17 -Ofast -pipe -flto
# LDFLAGS := -flto -pipe -Wl,--icf=all,-O2,--as-needed,--strip-all,--plugin-opt=O3
lab1: lab1.o statfun.o
	$(CXX) $(LDFLAGS) $^ -o $@
statfun.o: statfun.h
lab1.o: statfun.h
.PHONY: clean
clean:
	rm -f lab1.o statfun.o lab1
I believe your first issue is understanding the file structure of your stats assignment. Tackle this first by reading up on headers and on calling functions defined in other files.
The .cpp files contain the implementation of your logic; the .h files are headers that declare objects and functions. When you include a file at the top of your code, think of it as pasting all the code from that file above the current file.
Example:
statfun.h
double mean(vector<double> v);
// other **declaration** stuff....
lab1.cpp at the top of the file
#include "statfun.h" // equivalent to copy/pasting 'double mean(vector<double> v); and other declarations' into your lab1.cpp
// This is to help with cleanliness of your file structure.
// You'll thank yourself when projects become bigger.
Note: lab1.cpp includes statfun.cpp, which includes statfun.h, so lab1.cpp implicitly includes statfun.h and you don't strictly have to include it again in lab1. Typically, though, you include the header, not the .cpp file, and compile the .cpp files separately. The #ifndef include guard is what protects you against including the same header more than once.
b. statfun.cpp should be the place where you code all of your logic for the mean and standard deviation.
example:
statfun.cpp
double mean(vector<double> v) {
    double result = 0.0;
    // Your mean calculation logic here.
    return result;
}
double sd(vector<double> x) {
    double result = 0.0;
    // Your standard deviation calculation logic here.
    return result;
}
c.
So you have lab1.cpp which will be compiled to produce some runnable binary. As the entry point of your program, it should include an int main() function. This main function needs to ask for user input (search the webs for how to take std input).
Store the standard input as a vector (this is still in your main function).
Use cout to print the vector to standard out. There is no operator<< for a whole vector, so loop over it and print each element, e.g. 'cout << element << " ";' (still in your int main()).
Call/use the functions you wrote in statfun.cpp (notably mean() and sd()). Maybe store their return values in a variable to use later. Since you need to call the statfun functions here, the lab1.cpp entry file must include statfun.h so that the compiler knows the signatures of those functions.
Now that the file structure is sorted out, here is a simple way to calculate the standard deviation in pseudocode:
statfun.madeuplanguage
type sd(vector<type> values) {
type total = 0;
type mean_value = mean(values);
for val in values {
total += (val - mean_value)^2;
}
total /= values.length;
return sqrt(total);
}
This in mind, I would structure the lab1.cpp as follows.
lab1.cpp
int main() {
vector<double> v;
// take input and store in v.
// std out - v
double mean_val = mean(v);
double std_dev = sd(v);
// std out - mean_val and std_dev
}
If you have any questions about implementing the above pseudocode in C++, great! It's your assignment/class, so take care to search the webs in doing extremely specific things in C++ (e.g. iterating on a vector, squaring, square rooting, etc...). Good luck learning.

Does std::string move constructor actually move?

So here I have a small test program:
#include <string>
#include <iostream>
#include <memory>
#include <vector>
class Test
{
public:
Test(const std::vector<int>& a_, const std::string& b_)
: a(std::move(a_)),
b(std::move(b_)),
vBufAddr(reinterpret_cast<long long>(a.data())),
sBufAddr(reinterpret_cast<long long>(b.data()))
{}
Test(Test&& mv)
: a(std::move(mv.a)),
b(std::move(mv.b)),
vBufAddr(reinterpret_cast<long long>(a.data())),
sBufAddr(reinterpret_cast<long long>(b.data()))
{}
bool operator==(const Test& cmp)
{
if (vBufAddr != cmp.vBufAddr) {
std::cout << "Vector buffers differ: " << std::endl
<< "Ours: " << std::hex << vBufAddr << std::endl
<< "Theirs: " << cmp.vBufAddr << std::endl;
return false;
}
if (sBufAddr != cmp.sBufAddr) {
std::cout << "String buffers differ: " << std::endl
<< "Ours: " << std::hex << sBufAddr << std::endl
<< "Theirs: " << cmp.sBufAddr << std::endl;
return false;
}
}
private:
std::vector<int> a;
std::string b;
long long vBufAddr;
long long sBufAddr;
};
int main()
{
Test obj1 { {0x01, 0x02, 0x03, 0x04}, {0x01, 0x02, 0x03, 0x04}};
Test obj2(std::move(obj1));
obj1 == obj2;
return 0;
}
Software I used for the test:
Compiler: gcc 7.3.0
Compiler flags: -std=c++11
OS: Linux Mint 19 (tara) with upstream release Ubuntu 18.04 LTS (bionic)
The results I see are that after the move, the vector buffer still has the same address, but the string buffer doesn't. So it looks to me as if it allocated a fresh one instead of just swapping buffer pointers. What causes this behavior?
You're likely seeing the effects of the small/short string optimization (SSO). To avoid an allocation for every tiny string, many implementations of std::string include a small fixed-size array that holds short strings without calling new. This array usually reuses the space of members that aren't needed when no dynamic allocation is in use, so it costs little or no additional memory, for small or large strings. Such short strings don't benefit from std::move: their characters are copied into the new object's internal array (but they're small, so it's fine). Larger strings require dynamic allocation, and a move transfers the pointer as you expect.
Just for demonstration, this code on g++:
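#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
// Move-constructs s2 from s and prints where s2's characters live afterwards.
void move_test(std::string&& s) {
    std::string s2 = std::move(s);
    std::cout << "; After move: " << std::hex << reinterpret_cast<std::uintptr_t>(s2.data()) << std::endl;
}
int main()
{
    std::string sbase;
    for (std::size_t len = 0; len < 32; ++len) {
        std::string s1 = sbase;
        // std::dec is needed because the previous iteration left the stream in hex mode.
        std::cout << "Length " << std::dec << len << " - Before move: " << std::hex << reinterpret_cast<std::uintptr_t>(s1.data());
        move_test(std::move(s1));
        sbase += 'a';
    }
}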
produces high (stack) addresses that change on move construction for lengths of 15 or less (presumably varies with architecture pointer size), but switches to low (heap) addresses that remain unchanged after move construction once you hit length 16 or higher (the switch is at 16, not 17, because it is NUL-terminating the strings, since C++11 and higher require it).
To be 100% clear: This is an implementation detail. No part of the C++ spec requires this behavior, so you should not rely on it occurring at all, and when it occurs, you should not rely on it occurring for specific string lengths.

concatenate string_views in constexpr

I'm trying to concatenate string_views in a constexpr.
The following is a simplified version of my code:
#include <iostream>
#include <string_view>
using namespace std::string_view_literals;
// concatenate two string_views by copying their bytes
// into a newly created buffer
constexpr const std::string_view operator+
(const std::string_view& sv1, const std::string_view& sv2)
{
char buffer[sv1.size()+sv2.size()] = {0};
for(size_t i = 0; i < sv1.size(); i++)
buffer[i] = sv1[i];
for(size_t i = sv1.size(); i < sv1.size()+sv2.size(); i++)
buffer[i] = sv2[i-sv1.size()];
return std::string_view(buffer, sv1.size()+sv2.size());
}
int main()
{
const std::string_view sv1("test1;");
const std::string_view sv2("test2;");
std::cout << sv1 << "|" << sv2 << ": " << (sv1+sv2+sv1) << std::endl;
std::cout << "test1;"sv << "|" << "test2;"sv << ": " <<
("test1;"sv+"test2;"sv) << std::endl;
return 0;
}
However this code does not produce the result I expected. Instead of printing test1;test2;test1 and test1;test2; it prints out correct characters mixed with random characters as if I'm accessing uninitialized memory.
test1;|test2;: F��<��itest;
test1;|test2;: est1;te`�i
However if I remove the constexpr specifier and replace the string_views with strings the above code prints the expected output.
test1;|test2;: test1;test2;test1;
test1;|test2;: test1;test2;
Either I'm missing some obvious mistake in my code or there is something about constexpr that I don't understand (yet). Is it the way I'm creating the buffer for the new string_view? What else could I do? Or is what I'm trying to do impossible? Maybe there is someone who can shed light to this for me.
Your task is fundamentally impossible as written, since a string_view is, by definition, a non-owning view of contiguous storage. It cannot manage the lifetime of the concatenated data, so something else has to own that storage.
You need to create some kind of concatenated_string<> custom range as your return type if you want to do something like this.
As to the specific reason your code yields weird results: buffer no longer exists once the function returns.
In return std::string_view(buffer, sv1.size()+sv2.size()); the returned string_view views buffer, which goes out of scope, so you essentially have a dangling reference.

Foreach range iteration over a vector<int> - auto or auto&?

Game engine micro-optimization situation: I'm using C++11 range for loop to iterate over a vector<int>, with the auto keyword.
What is faster:
for(auto value : ints) ...
or
for(auto& value : ints) ...
?
Before caring about which is faster, you should care about which is semantically correct. If you do not need to alter the element being iterated, you should choose the first version. Otherwise, you should choose the second version.
Sure, you could object that even if you do not need to alter the content of the vector, there is still the option to use a reference to const:
for(auto const& value : ints)
And then the question becomes: Which is faster? By reference to const or by value?
Well, again, you should first consider whether the above is semantically correct at all, and that depends on what you are doing inside the for loop:
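int i = 0;
for (auto const& x : v) // Is this correct? Depends! (see the loop body)
{
    v[i] = 42 + i; // overwrites the very element that x refers to
    ++i;
    std::cout << x; // prints the new value; a by-value copy would print the old one
}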
This said, for fundamental types I would go with for (auto i : x) whenever this is semantically correct.
I do not expect performance to be worse (rather, I expect it to be better), but as always when it comes to performance, the only meaningful way to back up your assumptions is to measure, measure, and measure.
If you modify value and expect it to modify an actual element in the vector you need auto&. If you don't modify value it likely compiles into the exact same code with auto or auto& (profile it to find out for yourself).
I did some timing using VS2012 with a timer based on QueryPerformanceCounter...
m::HighResTimer timer;
std::vector<int> ints(100000000, 17);
int count = 0;
timer.Start();
for(auto& i : ints)
count += i;
timer.Stop();
std::cout << "Count: " << count << '\n'
<< "auto& time: " << duration_cast<duration<double, std::milli>>(timer.Elapsed()).count() << '\n';
count = 0;
timer.Reset();
timer.Start();
for(const auto& i : ints)
count += i;
timer.Stop();
std::cout << "Count: " << count << '\n'
<< "const auto& time: " << duration_cast<duration<double, std::milli>>(timer.Elapsed()).count() << '\n';
count = 0;
timer.Reset();
timer.Start();
for(auto i : ints)
count += i;
timer.Stop();
std::cout << "Count: " << count << '\n'
<< "auto time: " << duration_cast<duration<double, std::milli>>(timer.Elapsed()).count() << '\n';
The Results....
Count: 1700000000
auto& time: 77.0204
Count: 1700000000
const auto& time: 77.0648
Count: 1700000000
auto time: 77.5819
Press any key to continue . . .
I wouldn't read into the time differences here. For all practical purposes they are identical, and fluctuate slightly run to run.
First of all, if you are going to modify the element, use auto&; if not, don't, because otherwise you could change it accidentally.
But there is still a choice between const auto& and plain auto. I believe performance isn't an issue here for std::vector<int>.
Why use auto
It's easier to read.
It allows you to change the loop variable (without changing the vector).
Why use const auto&
It works well with other types too, so it's more generic to write it this way. Moreover, if you change the element type to something more complicated, you will not accidentally get performance issues from copying.
It doesn't allow you to change the value of the variable, so some errors may be caught at compile time.
In both cases you should understand what you are doing. The right choice may depend on whether the loop somehow modifies the range.
In GCC, both versions compile to the same assembly with optimization flags -O1 through -O3.
Since the compiler takes care of the optimization for you, I would use the for (auto value : ints) whenever you don't need to change the value. As Andy points out, you could use const-refs, but if there's no performance gain whatsoever, then I wouldn't bother.