I have written two functions to compare the time cost of std::vector and a dynamically allocated array
#include <iostream>
#include <vector>
#include <chrono>
void A() {
auto t1 = std::chrono::high_resolution_clock::now();
std::vector<float> data(5000000);
auto t2 = std::chrono::high_resolution_clock::now();
float *p = data.data();
for (int i = 0; i < 5000000; ++i) {
p[i] = 0.0f;
}
auto t3 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() << " us\n";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t3 - t2).count() << " us\n";
}
void B() {
auto t1 = std::chrono::high_resolution_clock::now();
auto* data = new float [5000000];
auto t2 = std::chrono::high_resolution_clock::now();
float *ptr = data;
for (int i = 0; i < 5000000; ++i) {
ptr[i] = 0.0f;
}
auto t3 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() << " us\n";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t3 - t2).count() << " us\n";
}
int main(int argc, char** argv) {
A();
B();
return 0;
}
A() costs about 6000 us to initialize the vector, then 1400 us to fill it with zeros.
B() costs less than 10 us to allocate the memory, then 5800 us to fill it with zeros.
Why is there such a large difference in their time costs?
compiler: g++ 9.3.0
flags: -O3 -DNDEBUG
First, note that the std::vector<float> constructor already zeros the vector.
There are many plausible system-level explanations for the behavior you observe:
One very plausible explanation is caching: when you allocate the array using new, the memory referenced by the returned pointer is not in the cache. When you create a vector, the constructor zeros the allocated memory area under the hood, thereby bringing that memory into the cache. The subsequent zeroing loop therefore hits in the cache.
Other reasons might include compiler optimizations. A compiler might realize that your zeroing loop is unnecessary with std::vector, since the constructor already zeroed the elements. Given the figures you obtained, I would discount this here, though.
QuickBench is a nice tool for comparing different ways of doing the same thing.
https://quick-bench.com/q/p4ThYlVCa7VbO6vy6LEVVZ_0CVs
Your array example leaks the allocated memory, and QuickBench reports an error because of that.
The code I used (added two more variants):
static void Vector(benchmark::State& state) {
// Code inside this loop is measured repeatedly
for (auto _ : state) {
std::vector<float> data(500000);
float *p = data.data();
for (int i = 0; i < 500000; ++i) {
p[i] = 0.0f;
}
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(data);
}
}
// Register the function as a benchmark
BENCHMARK(Vector);
static void VectorPushBack(benchmark::State& state) {
for (auto _ : state) {
std::vector<float> data;
for (int i = 0; i < 500000; ++i) {
data.push_back(0.0f);
}
benchmark::DoNotOptimize(data);
}
}
BENCHMARK(VectorPushBack);
static void VectorInit(benchmark::State& state) {
for (auto _ : state) {
std::vector<float> data(500000, 0.0f);
benchmark::DoNotOptimize(data);
}
}
BENCHMARK(VectorInit);
static void Array(benchmark::State& state) {
for (auto _ : state) {
auto* data = new float [500000];
float *ptr = data;
for (int i = 0; i < 500000; ++i) {
ptr[i] = 0.0f;
}
benchmark::DoNotOptimize(data);
delete[] data;
}
}
BENCHMARK(Array);
static void ArrayInit(benchmark::State& state) {
for (auto _ : state) {
auto* data = new float [500000]();
benchmark::DoNotOptimize(data);
delete[] data;
}
}
BENCHMARK(ArrayInit);
static void ArrayMemoryLeak(benchmark::State& state) {
for (auto _ : state) {
auto* data = new float [500000];
float *ptr = data;
for (int i = 0; i < 500000; ++i) {
ptr[i] = 0.0f;
}
benchmark::DoNotOptimize(data);
}
}
//BENCHMARK(ArrayMemoryLeak);
Results:
All variants except the push_back one have almost the same runtime. But the vector is much safer: it is very easy to forget to free the memory (as you demonstrated yourself).
EDIT: Fixed the mistake in the push_back variant. Thanks to t.niese and Scheff's Cat for pointing it out and fixing it.
Related
I have an array of integers with a bunch of numbers from 1-10
Then I have an array of names (strings) which belong with the numbers, e.g.
Numbers[0] = 5, Numbers[1] = 2
Names[0] = "Jeremy", Names[1] = "Samantha".
I can easily order the numbers with:
int n = sizeof(Numbers) / sizeof(Numbers[0]);
sort(Numbers, Numbers + n, greater<int>());
But then the names and numbers don't match at all.
How do I fix this?
A very common approach is to create an array of indices and sort that:
std::vector<int> indices(n); // n = number of elements, as computed in the question
std::iota(indices.begin(), indices.end(), 0);
std::sort(indices.begin(), indices.end(),
[&](int A, int B) -> bool {
return Numbers[A] < Numbers[B];
});
The original arrays are not altered, but now indices can be used to access both arrays in the desired order.
If we want to reorder Numbers or Names in place, then we can
create a set of "back indices" that record where to find the element i in the sorted array:
std::vector<int> back_indices(indices.size());
for (size_t i = 0; i < indices.size(); i++)
back_indices[indices[i]] = i;
Now we can reorder, for example, Names in place in the desired order:
int index = 0;
std::string name = Names[index];
for (int i = 0; i < back_indices.size(); i++) {
index = back_indices[index];
std::swap(name,Names[index]);
}
I've tested this code which should give you the required behavior:
struct numberName {
int num;
string name;
};
bool compare(numberName a, numberName b){
return a.num < b.num; // if equal, no need to sort.
}
int main() {
numberName list[2];
list[0].num = 5, list[1].num = 2;
list[0].name = "Jeremy", list[1].name = "Samantha";
sort(list, list+2, compare);
}
Like HAL9000 said, you want to use a struct, since this keeps variables that belong together in one place. Alternatively you could use a std::pair, but I don't know whether a pair would be good practice for your situation.
This is a great example of the complexities introduced by using parallel arrays.
If you insist on keeping them as parallel arrays, here is a possible approach. Create a vector of integer indexes, initialised to { 0, 1, 2, 3, etc }. Each integer represents one position in your array. Sort your vector of indexes using a custom comparison function that uses the indexes to refer to array1 (Numbers). When finished, you can use the sorted indexes to reorder array1 and array2 (Names).
One could also write their own sort algorithm that performs swaps on the extra array at the same time.
Or one could trick std::sort into sorting both arrays simultaneously by using a cleverly designed proxy. I will demonstrate that such a thing is possible, although the code below may be considered a simple hackish proof of concept.
Tricking std::sort with a cleverly-designed proxy
#include <iostream>
#include <algorithm>
constexpr size_t SZ = 2;
int Numbers[SZ] = {5, 2};
std::string Names[SZ] = {"Jeremy", "Samantha"};
int tempNumber;
std::string tempName;
class aproxy {
public:
const size_t index = 0;
const bool isTemp = false;
aproxy(size_t i) : index(i) {}
aproxy() = delete;
aproxy(const aproxy& b) : isTemp(true)
{
tempName = Names[b.index];
tempNumber = Numbers[b.index];
}
void operator=(const aproxy& b) {
if(b.isTemp) {
Names[index] = tempName;
Numbers[index] = tempNumber;
} else {
Names[index] = Names[b.index];
Numbers[index] = Numbers[b.index];
}
}
bool operator<(const aproxy& other) {
return Numbers[index] < Numbers[other.index];
}
};
int main() {
aproxy toSort[SZ] = {0, 1};
std::sort(toSort, toSort+SZ);
for(int i=0; i<SZ; ++i) {
std::cout << "Numbers[" << i << "]=" << Numbers[i] << std::endl;
std::cout << "Names[" << i << "]=" << Names[i] << std::endl;
}
return 0;
}
...and an even more cleverly-designed proxy could avoid entirely the need to allocate SZ "aproxy" elements.
Tricking std::sort with an "even more cleverly-designed" proxy
#include <iostream>
#include <algorithm>
class aproxy;
constexpr size_t SZ = 2;
int Numbers[SZ] = {5, 2};
std::string Names[SZ] = {"Jeremy", "Samantha"};
aproxy *tempProxyPtr = nullptr;
int tempNumber;
std::string tempName;
class aproxy {
public:
size_t index() const
{
return (this - reinterpret_cast<aproxy*>(Numbers));
}
bool isTemp() const
{
return (this == tempProxyPtr);
}
~aproxy()
{
if(isTemp()) tempProxyPtr = nullptr;
}
aproxy() {}
aproxy(const aproxy& b)
{
tempProxyPtr = this;
tempName = Names[b.index()];
tempNumber = Numbers[b.index()];
}
void operator=(const aproxy& b) {
if(b.isTemp()) {
Names[index()] = tempName;
Numbers[index()] = tempNumber;
} else {
Names[index()] = Names[b.index()];
Numbers[index()] = Numbers[b.index()];
}
}
bool operator<(const aproxy& other) {
return Numbers[index()] < Numbers[other.index()];
}
};
int main() {
aproxy* toSort = reinterpret_cast<aproxy*>(Numbers);
std::sort(toSort, toSort+SZ);
for(int i=0; i<SZ; ++i) {
std::cout << "Numbers[" << i << "]=" << Numbers[i] << std::endl;
std::cout << "Names[" << i << "]=" << Names[i] << std::endl;
}
return 0;
}
Disclaimer: although my final example above may technically violate the strict-aliasing rule (it accesses the same memory as two different types), the underlying memory is only used for its address, never modified through the wrong type, and it did seem to work fine when I tested it. It also relies entirely on std::sort being written in a certain way: using a single temp variable initialized via copy construction, single-threaded, etc. Given all these assumptions, it may be a convenient trick, but it is not very portable, so use at your own risk.
I have the following fragment of code. It contains three sections where I measure memory-access runtime. The first is plain iteration over the array. The second is almost the same, except that the array address is received from a function call. The third is the same as the second, but manually optimized.
#include <map>
#include <cstdlib>
#include <chrono>
#include <iostream>
std::map<void*, void*> cache;
constexpr int elems = 1000000;
double x[elems] = {};
template <typename T>
T& find_in_cache(T& var) {
void* key = &var;
void* value = nullptr;
if (cache.count(key)) {
value = cache[key];
} else {
value = malloc(sizeof(T));
cache[key] = value;
}
return *(T*)value;
}
int main() {
std::chrono::duration<double> elapsed_seconds1, elapsed_seconds2, elapsed_seconds3;
for (int k = 0; k < 100; k++) { // account for cache effects
// first section
auto start = std::chrono::steady_clock::now();
for (int i = 1; i < elems; i++) {
x[i] = (x[i-1] + 1.0) * 1.001;
}
auto end = std::chrono::steady_clock::now();
elapsed_seconds1 = end-start;
// second section
start = std::chrono::steady_clock::now();
for (int i = 1; i < elems; i++) {
find_in_cache(x)[i] = (find_in_cache(x)[i-1] + 1.0) * 1.001;
}
end = std::chrono::steady_clock::now();
elapsed_seconds2 = end-start;
// third section
start = std::chrono::steady_clock::now();
double* y = find_in_cache(x);
for (int i = 1; i < elems; i++) {
y[i] = (y[i-1] + 1.0) * 1.001;
}
end = std::chrono::steady_clock::now();
elapsed_seconds3 = end-start;
}
std::cout << "elapsed time 1: " << elapsed_seconds1.count() << "s\n";
std::cout << "elapsed time 2: " << elapsed_seconds2.count() << "s\n";
std::cout << "elapsed time 3: " << elapsed_seconds3.count() << "s\n";
return x[elems - 1]; // prevent optimizing away
}
The timings of these sections are following:
elapsed time 1: 0.0018678s
elapsed time 2: 0.00423903s
elapsed time 3: 0.00189678s
Is it possible to change the interface of find_in_cache() without changing the body of the second iteration section to make its performance the same as section 3?
template <typename T>
[[gnu::const]]
T& find_in_cache(T& var) { ... }
lets clang optimize the code the way you want, but gcc fails to treat the call as a loop invariant, even with gnu::noinline to make sure the attribute is not lost (maybe worth a bug report?).
How safe such code is may depend on the rest of your code. The attribute is a lie, since the function does use memory, but it may be OK if that memory is private enough to the function. Preventing inlining of find_in_cache may help reduce the risks.
You can also convince gcc to optimize with
template <typename T>
[[gnu::const,gnu::noinline]]
T& find_in_cache(T& var) noexcept { ... }
which would cause your program to terminate if there isn't enough memory to add an element to the cache.
I am trying to understand the difference between AoS and SoA in practical terms.
I've tried this in C# and it yielded no measurable difference, so now I'm trying it in C++.
#include <stdlib.h>
#include <chrono>
#include <iostream>
#include <math.h>
const int iterations = 40000000;
class Entity {
public:
float a, b, c;
};
struct Entities {
public:
float a[iterations];
float b[iterations];
float c[iterations];
};
void AoSTest(int iterations, Entity enArr[]);
void SoATest(int iterations, Entities* entities);
int main()
{
Entity* enArr = new Entity[iterations];
Entities* entities = new Entities;
int A = rand() - 50;
int B = rand() - 50;
int C = rand() - 50;
for (int i = 0; i < iterations; i++)
{
enArr[i].a = A;
enArr[i].b = B;
enArr[i].c = C;
entities->a[i] = A;
entities->b[i] = B;
entities->c[i] = C;
}
auto start = std::chrono::high_resolution_clock::now();
AoSTest(iterations, enArr);
//SoATest(iterations, entities);
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
//std::cout << std::to_string(elapsed.count()) + "time";
std::cout << std::to_string(std::chrono::duration_cast<std::chrono::seconds>(finish - start).count()) + "s";
}
void AoSTest(int iterations, Entity enArr[]) {
for (int i = 0; i < iterations; i++)
{
enArr[i].a = sqrt(enArr[i].a * enArr[i].c);
enArr[i].c = sqrt(enArr[i].c * enArr[i].a);
//std::cout << std::to_string(sqrt(enArr[i].a) + sqrt(enArr[i].b)) + "\n";
}
}
void SoATest(int iterations, Entities* entities) {
for (int i = 0; i < iterations; i++)
{
entities->a[i] = sqrt(entities->a[i] * entities->c[i]);
entities->c[i] = sqrt(entities->c[i] * entities->a[i]);
//std::cout << std::to_string(sqrt(entities->a[i]) + sqrt(entities->b[i])) + "\n";
}
}
My thought was that since the data layout is different, there should, in theory, be a performance difference...
I don't understand why some say that there is a lot to gain here if it is as context-sensitive as it seems to me so far.
Is it completely dependent on SIMD or some specific optimization option?
I'm running it in Visual Studio.
I compiled your code with the Intel compiler 18.0.1 and optimization turned on (-O3). I added a return value, just to make sure nothing can be optimized away.
I found that Structure of Arrays (SoA) is approximately twice as fast as Array of Structures (AoS). This makes sense, since with the SoA approach the quantity b is never loaded into the cache from slow memory (RAM), whereas with the AoS approach it occupies cache space. Please note that I changed the time resolution to nanoseconds; otherwise, I always got 0s as output.
#include <stdlib.h>
#include <chrono>
#include <iostream>
#include <math.h>
const int iterations = 40000000;
class Entity {
public:
float a, b, c;
};
struct Entities {
public:
float a[iterations];
float b[iterations];
float c[iterations];
};
int AoSTest(int iterations, Entity enArr[]);
int SoATest(int iterations, Entities* entities);
int main() {
Entity* enArr = new Entity[iterations];
Entities* entities = new Entities;
int A = rand() - 50;
int B = rand() - 50;
int C = rand() - 50;
for (int i = 0; i < iterations; i++) {
enArr[i].a = A;
enArr[i].b = B;
enArr[i].c = C;
entities->a[i] = A;
entities->b[i] = B;
entities->c[i] = C;
}
auto start = std::chrono::high_resolution_clock::now();
// const auto ret = AoSTest(iterations, enArr);
const auto ret = SoATest(iterations, entities);
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << std::to_string(std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count()) + "ns "
<< "ret=" << ret;
}
int AoSTest(int iterations, Entity enArr[]) {
for (int i = 0; i < iterations; i++) {
enArr[i].a = sqrt(enArr[i].a * enArr[i].c);
enArr[i].c = sqrt(enArr[i].c * enArr[i].a);
}
return enArr[iterations - 1].c;
}
int SoATest(int iterations, Entities* entities) {
for (int i = 0; i < iterations; i++) {
entities->a[i] = sqrt(entities->a[i] * entities->c[i]);
entities->c[i] = sqrt(entities->c[i] * entities->a[i]);
}
return entities->c[iterations - 1];
}
SoA is beneficial for loading and storing your data with SIMD intrinsics; see e.g. https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX&cats=Load&expand=3317 for Intel AVX.
In your specific case you would need to provide more information about the compiler options etc., but your case is probably not easy for the compiler to vectorize.
I would suggest using independent instructions for each entry (here c depends on a) to perform more tests.
Inside a performance-critical, parallel code I have a vector whose elements are:
Very expensive to compute, and the result is deterministic (the value of the element at a given position will depend on the position only)
Random access (typically the number of accesses are larger or much larger than the size of the vector)
Clustered accesses (many accesses request the same value)
The vector is shared by different threads (race condition?)
To avoid heap fragmentation, the object should never be recreated, but whenever possible reset and recycled
The value to be placed in the vector will be provided by a polymorphic object
Currently, I precompute all possible values of the vector, so race conditions should not be an issue.
In order to improve performance, I am considering creating a lazy vector, such that the code performs a computation only when an element of the vector is requested.
In a parallel region, it might happen that more than one thread requests, and perhaps calculates, the same element at the same time.
How do I take care of this possible race condition?
Below is an example of what I want to achieve. It compiles and runs properly under Windows 10, Visual Studio 17. I use C++17.
// Lazy.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <vector>
#include <iostream>
#include <stdlib.h>
#include <chrono>
#include <math.h>
const double START_SUM = 1;
const double END_SUM = 1000;
//base object responsible for providing the values
class Evaluator
{
public:
Evaluator() {};
~Evaluator() {};
//Function with deterministic output, depending on the position
virtual double expensiveFunction(int pos) const = 0;
};
//
class EvaluatorA: public Evaluator
{
public:
//expensive evaluation
virtual double expensiveFunction(int pos) const override {
double t = 0;
for (int j = START_SUM; j++ < END_SUM; j++)
t += log(exp(log(exp(log(j + pos)))));
return t;
}
EvaluatorA() {};
~EvaluatorA() {};
};
class EvaluatorB : public Evaluator
{
public:
//even more expensive evaluation
virtual double expensiveFunction(int pos) const override {
double t = 0;
for (int j = START_SUM; j++ < 10*END_SUM; j++)
t += log(exp(log(exp(log(j + pos)))));
return t;
}
EvaluatorB() {};
~EvaluatorB() {};
};
class LazyVectorTest //vector that contains N possible results
{
public:
LazyVectorTest(int N,const Evaluator & eval) : N(N), innerContainer(N, 0), isThatComputed(N, false), eval_ptr(&eval)
{};
~LazyVectorTest() {};
//reset, to generate a new table of values
//the size of the vector stays constant
void reset(const Evaluator & eval) {
this->eval_ptr = &eval;
for (int i = 0; i<N; i++)
isThatComputed[i] = false;
}
int size() { return N; }
//accessing the same position should yield the same result
//unless the object is reset
const inline double& operator[](int pos) {
if (!isThatComputed[pos]) {
innerContainer[pos] = eval_ptr->expensiveFunction(pos);
isThatComputed[pos] = true;
}
return innerContainer[pos];
}
private:
const int N;
const Evaluator* eval_ptr;
std::vector<double> innerContainer;
std::vector<bool> isThatComputed;
};
//the parallel access will take place here
template <typename T>
double accessingFunction(T& A, const std::vector<int>& elementsToAccess) {
double tsum = 0;
int size = elementsToAccess.size();
//#pragma omp parallel for
for (int i = 0; i < size; i++)
tsum += A[elementsToAccess[i]];
return tsum;
}
std::vector<int> randomPos(int sizePos, int N) {
std::vector<int> elementsToAccess;
for (int i = 0; i < sizePos; i++)
elementsToAccess.push_back(rand() % N);
return elementsToAccess;
}
int main()
{
srand(time(0));
int minAccessNumber = 1;
int maxAccessNumber = 100;
int sizeVector = 50;
auto start = std::chrono::steady_clock::now();
double res = 0;
float numberTest = 100;
typedef LazyVectorTest container;
EvaluatorA eval;
for (int i = 0; i < static_cast<int>(numberTest); i++) {
res = eval.expensiveFunction(i);
}
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double, std::milli>diff(end - start);
double benchmark = diff.count() / numberTest;
std::cout <<"Average time to compute expensive function:" <<benchmark<<" ms"<<std::endl;
std::cout << "Value of the function:" << res<< std::endl;
std::vector<std::vector<int>> indexs(numberTest);
container A(sizeVector, eval);
for (int accessNumber = minAccessNumber; accessNumber < maxAccessNumber; accessNumber++) {
indexs.clear();
for (int i = 0; i < static_cast<int>(numberTest); i++) {
indexs.emplace_back(randomPos(accessNumber, sizeVector));
}
auto start_lazy = std::chrono::steady_clock::now();
for (int i = 0; i < static_cast<int>(numberTest); i++) {
A.reset(eval);
double res_lazy = accessingFunction(A, indexs[i]);
}
auto end_lazy = std::chrono::steady_clock::now();
std::chrono::duration<double, std::milli>diff_lazy(end_lazy - start_lazy);
std::cout << accessNumber << "," << diff_lazy.count() / numberTest << ", " << diff_lazy.count() / (numberTest* benchmark) << std::endl;
}
return 0;
}
Rather than rolling your own locking, I'd first see if you get acceptable performance with std::call_once.
class LazyVectorTest //vector that contains N possible results
{
//Function with deterministic output, depending on the position
void expensiveFunction(int pos) {
double t = 0;
for (int j = START_SUM; j++ < END_SUM; j++)
t += log(exp(log(exp(log(j+pos)))));
values[pos] = t;
}
public:
LazyVectorTest(int N) : values(N), flags(N)
{};
int size() { return values.size(); }
//accessing the same position should yield the same result
double operator[](int pos) {
std::call_once(flags[pos], &LazyVectorTest::expensiveFunction, this, pos);
return values[pos];
}
private:
std::vector<double> values;
std::vector<std::once_flag> flags;
};
call_once is pretty transparent. It allows exactly one thread to run the function to completion. The only potential drawback is that a second thread will block waiting for a possible exception, rather than immediately doing nothing. In this case that is desirable, as you want the modification values[pos] = t; to be sequenced before the read return values[pos];.
Your current code is problematic, mainly because std::vector<bool> is horrible, but also because atomicity and memory consistency are missing. Here is the sketch of a solution based entirely on OpenMP. I would suggest actually using a special marker for missing entries instead of a separate vector<bool> - it makes everything much easier:
class LazyVectorTest //vector that contains N possible results
{
public:
LazyVectorTest(int N,const Evaluator & eval) : N(N), innerContainer(N, invalid), eval_ptr(&eval)
{};
~LazyVectorTest() {};
//reset, to generate a new table of values
//the size of the vector stays constant
void reset(const Evaluator & eval) {
this->eval_ptr = &eval;
for (int i = 0; i<N; i++) {
// Use atomic if this could possibly run in parallel;
// omit it for performance if you don't ever run reset() in parallel
#pragma omp atomic write
innerContainer[i] = invalid;
}
// Flush to make sure invalidation is visible to all threads
#pragma omp flush
}
int size() { return N; }
// Don't return a reference here
double operator[] (int pos) {
double value;
#pragma omp atomic read
value = innerContainer[pos];
if (value == invalid) {
value = eval_ptr->expensiveFunction(pos);
#pragma omp atomic write
innerContainer[pos] = value;
}
return value;
}
private:
// Use infinity or another impossible sentinel value (requires <limits>).
// Note that NaN would NOT work here: NaN == NaN is false, so the
// check in operator[] would never trigger.
static constexpr double invalid = std::numeric_limits<double>::infinity();
const int N;
const Evaluator* eval_ptr;
std::vector<double> innerContainer;
};
In case of a collision, the other threads will just redundantly compute the value - exploiting the deterministic nature. By using omp atomic on both the read and the write of the elements, you ensure that no inconsistent "half-written" values are ever read.
This solution may add some latency in the rare bad cases. In turn, the good cases are optimal, with just a single atomic read. You don't even need any memory flushes / seq_cst - the worst case is a redundant computation. You would need these (sequential consistency) if you wrote the flag and the value separately, to ensure that the order in which the changes become visible is correct.
I am trying to create a void pointer to a class object and have it be initialized inside a function. Unfortunately, the array member of the class cannot escape the function i.e. it cannot be accessed after initialization.
In the code below, the first call to print positions (inside the initialize function) works properly, however, the second call to print positions from outside the initialization function fails. I have a feeling that the array object created in the initialization function is destroyed and not passed along but I am not sure and also don't know how to fix it.
Any help would be greatly appreciated.
#include <iostream>
#include <iomanip>
#include <string>
class Atoms
{
double * positions;
int nAtoms;
public:
// Standard constructor taking a pre-existing array
Atoms(int nAtoms, double * positionsArray)
{
this->nAtoms = nAtoms;
this->positions = positionsArray;
}
// Print positions to screen
void print_positions()
{
std::cout<< "nAtoms: " << this->nAtoms << std::endl;
int nDim = 3;
for (int i = 0; i < nAtoms; i++)
{
for (int j = 0; j < nDim; j++)
{
std::cout << std::setw(6) << this->positions[i * nDim + j] << " ";
}
std::cout << std::endl;
}
std::cout << std::endl;
}
};
void initialize_Atoms_void_pointer(void ** voidAtomsPointer)
{
//Create a new instance of Atoms by a pointer
int numAtoms = 5;
int numDim = 3;
int elemN = numAtoms * numDim;
double data_array[elemN];
for (int i = 0; i < numAtoms; i++)
for (int j = 0; j < numDim; j++)
{
data_array[i * numDim + j] = i * numDim + j + 10;
}
Atoms *atoms = new Atoms(numAtoms, data_array);
// Set the vPointer that the void pointer points to a pointer to Atoms object
*voidAtomsPointer = static_cast<void *>(atoms);
//Test call
std::cout << std::endl << "Initializing atoms" << std::endl;
static_cast<Atoms *>(*voidAtomsPointer)->print_positions();
}
void print_Atoms_pointer_positions(void * voidAtomsPointer)
{
//Cast the pointer as an atoms pointer
Atoms *atomsPointer = static_cast<Atoms *>(voidAtomsPointer);
atomsPointer->print_positions();
}
int main()
{
//Use the initializer function for getting a pointer
void *testVoidAtomsPointer;
initialize_Atoms_void_pointer(&testVoidAtomsPointer);
print_Atoms_pointer_positions(testVoidAtomsPointer);
}
The problem is that in
Atoms *atoms = new Atoms(numAtoms, data_array);
data_array is a local array, which is destroyed when initialize_Atoms_void_pointer returns.
Instead of copying the raw pointer, make a new allocation in Atoms's constructor and copy the content:
Atoms(int nAtoms, double * positionsArray)
{
this->nAtoms = nAtoms;
this->positions = new double[nAtoms * 3]; // 3 coordinates per atom, as in print_positions
for (int ii = 0; ii < nAtoms * 3; ++ii)
this->positions[ii] = positionsArray[ii];
}
~Atoms()
{
delete[] this->positions;
}
A safer implementation would include the use of a std::unique_ptr, which will automatically de-allocate the memory for you when Atoms is destroyed:
#include <memory>
class Atoms {
std::unique_ptr<double[]> positions;
// ...
public:
Atoms(int nAtoms, double * positionsArray) :
positions(new double[nAtoms * 3]) { // 3 coordinates per atom
this->nAtoms = nAtoms;
for (int ii = 0; ii < nAtoms * 3; ++ii)
this->positions[ii] = positionsArray[ii];
}
// ...
};
You'd also need to check whether nAtoms is 0 or negative, whether the input array is null, etc., but I think that falls outside the scope of the question.
If you need to access the raw pointer, you can use the positions.get() method (do not try to delete it or your application will crash due to a double delete).
Update
Of course, another more straightforward solution is simply to use a std::vector<double> instead ;)
#include <vector>
class Atoms {
std::vector<double> positions;
// int nAtoms; -- no longer necessary
public:
Atoms(int nAtoms, double * positionsArray) :
positions(nAtoms * 3) { // 3 coordinates per atom
for (int ii = 0; ii < nAtoms * 3; ++ii)
this->positions[ii] = positionsArray[ii];
}
// ...
};
If you need to access the raw pointer, you can use the positions.data() method (do not try to delete it or your application will crash due to a double delete). The number of atoms is then positions.size() / 3.
As mentioned in a comment, if the only purpose of the Atoms class is to store doubles and not to add any other operations, then just forget about it and use a std::vector<double> directly.