Dealing with Vectors - cudaMemcpyDeviceToHost

It is not obvious how to use std::vector in CUDA, so I have designed my own Vector class:
#ifndef VECTORHEADERDEF
#define VECTORHEADERDEF
#include <cmath>
#include <iostream>
#include <cassert>
template <typename T>
class Vector
{
private:
T* mData; // data stored in vector
int mSize; // size of vector
public:
Vector(const Vector& otherVector); // Copy constructor
Vector(int size); // Constructor
~Vector(); // Destructor
__host__ __device__ int GetSize() const; // get size of the vector
T& operator[](int i); // see element
// change element i
__host__ __device__ void set(size_t i, T value) {
mData[i] = value;
}
template <class S> // output vector
friend std::ostream& operator<<(std::ostream& output, Vector<S>& v);
};
// Overridden copy constructor
// Allocates memory for new vector, and copies entries of other vector into it
template <typename T>
Vector<T>::Vector(const Vector& otherVector)
{
mSize = otherVector.GetSize();
mData = new T [mSize];
for (int i=0; i<mSize; i++)
{
mData[i] = otherVector.mData[i];
}
}
// Constructor for vector of a given size
// Allocates memory, and initialises entries to zero
template <typename T>
Vector<T>::Vector(int size)
{
assert(size > 0);
mSize = size;
mData = new T [mSize];
for (int i=0; i<mSize; i++)
{
mData[i] = 0.0;
}
}
// Overridden destructor to correctly free memory
template <typename T>
Vector<T>::~Vector()
{
delete[] mData;
}
// Method to get the size of a vector
template <typename T>
__host__ __device__ int Vector<T>::GetSize() const
{
return mSize;
}
// Overloading square brackets
// Note that this uses `zero-based' indexing, and a check on the validity of the index
template <typename T>
T& Vector<T>::operator[](int i)
{
assert(i > -1);
assert(i < mSize);
return mData[i];
}
// Overloading the assignment operator
template <typename T>
Vector<T>& Vector<T>::operator=(const Vector& otherVector)
{
assert(mSize == otherVector.mSize);
for (int i=0; i<mSize; i++)
{
mData[i] = otherVector.mData[i];
}
return *this;
}
// Overloading the insertion << operator
template <typename T>
std::ostream& operator<<(std::ostream& output, Vector<T>& v) {
for (int i=0; i<v.mSize; i++) {
output << v[i] << " ";
}
return output;
}
My main function - where I just pass a vector to the device, modify it and pass it back - is as follows (with the kernel designed just for testing purposes):
#include <iostream>
#include "Vector.hpp"
__global__ void alpha(Vector<int>* d_num)
{
int myId = threadIdx.x + blockDim.x * blockIdx.x;
d_num->set(0,100);
d_num->set(2,11);
}
int main()
{
Vector<int> num(10);
for (int i=0; i < num.GetSize(); ++i) num.set(i,i); // initialize elements to 0:9
std::cout << "Size of vector: " << num.GetSize() << "\n";
std::cout << num << "\n"; // print vector
Vector<int>* d_num;
// allocate global memory on the device
cudaMalloc((void **) &d_num, num.GetSize()*sizeof(int));
// copy data from host memory to the device memory
cudaMemcpy(d_num, &num[0], num.GetSize()*sizeof(int), cudaMemcpyHostToDevice);
// launch the kernel
alpha<<<1,100>>>(d_num);
// copy the modified array back to the host, overwriting the contents of h_arr
cudaMemcpy(num, &d_num[0], num.GetSize()*sizeof(int), cudaMemcpyDeviceToHost);
std::cout << num << "\n";
// free GPU memory allocation and exit
cudaFree(d_num);
return 0;
}
The problem I encounter is with cudaMemcpyDeviceToHost. It does not really copy the device vector to the num vector as can be seen from the output.
How should I deal with that? (Please be explicit, I am fairly new to CUDA).

This will create a valid pointer to the first element of the vector num:
cudaMemcpy(d_num, &num[0], num.GetSize()*sizeof(int), cudaMemcpyHostToDevice);
^^^^^^^
This will not:
cudaMemcpy(num, &d_num[0], num.GetSize()*sizeof(int), cudaMemcpyDeviceToHost);
^^^
The name of your Vector object is not a pointer to its first data element. Instead, you should write that line in a similar fashion to the first one you wrote, like this:
cudaMemcpy(&num[0], d_num, num.GetSize()*sizeof(int), cudaMemcpyDeviceToHost);
However this by itself is not a fix. Note that d_num is not a Vector, but is already a pointer, so we can use it directly in these operations. Although it is not wrong to use &(d_num[0]), it is unnecessary to do so.
Because d_num is not a Vector (as you have allocated it, it is a bare pointer to a set of int quantities), your usage of Vector methods in the kernel is also broken. If you want to use Vector methods in the kernel, you will need to pass it an actual Vector object, not just the data. Passing an object requires device data handling within the object (data accessible on the host is not accessible on the device, and vice versa), so this means an extensive rewrite of your Vector class. I've made a limited attempt at that, showing one possible way forward. The basic methodology (i.e., one possible approach) is as follows:
The object will contain pointers to both a host copy of the data and a device copy of the data.
At object instantiation, we will allocate both, and initially set our "reference" pointer to point to the host copy.
Prior to usage on the device, we must copy the host data to the device data, and the to_device() method is used for this purpose. This method also switches our "reference" pointer (mData) to refer to the device-side copy of the Vector data.
In addition to copying host data to device data "internal" to the object, we must make the object itself usable on the device. For this, we copy the object itself via pointer to a device-side copy (d_num).
We can then use the object in the usual way on the device, for those methods which have a __device__ decoration.
After completion of the kernel, we must update the host copy of the data and switch our "reference" pointer back to the host data. The to_host() method is provided for this purpose.
Thereafter the object can be used again in host code, reflecting the data changes if any which occurred in the kernel.
Here is a worked example:
$ cat t101.cu
#include <iostream>
#include <cmath>
#include <cassert>
template <typename T>
class Vector
{
private:
T* mData, *hData, *dData; // data stored in vector
int mSize; // size of vector
public:
Vector(const Vector& otherVector); // Copy constructor
Vector(int size); // Constructor
~Vector(); // Destructor
__host__ __device__ int GetSize() const; // get size of the vector
__host__ __device__ T& operator[](int i); // see element
// change element i
__host__ __device__ void set(size_t i, T value) {
mData[i] = value;
};
__host__ __device__ Vector<T>& operator=(const Vector<T>& otherVector);
void to_device();
void to_host();
template <class S> // output vector
friend std::ostream& operator<<(std::ostream& output, Vector<S>& v);
};
// Overridden copy constructor
// Allocates memory for new vector, and copies entries of other vector into it
template <typename T>
Vector<T>::Vector(const Vector& otherVector)
{
mSize = otherVector.GetSize();
hData = new T [mSize];
cudaMalloc(&dData, mSize*sizeof(T));
mData = hData;
for (int i=0; i<mSize; i++)
{
mData[i] = otherVector.mData[i];
}
}
// Constructor for vector of a given size
// Allocates memory, and initialises entries to zero
template <typename T>
Vector<T>::Vector(int size)
{
assert(size > 0);
mSize = size;
hData = new T [mSize];
cudaMalloc(&dData, mSize*sizeof(T));
mData = hData;
for (int i=0; i<mSize; i++)
{
mData[i] = 0.0;
}
}
// Overridden destructor to correctly free memory
template <typename T>
Vector<T>::~Vector()
{
delete[] hData;
if (dData) cudaFree(dData);
}
// Method to get the size of a vector
template <typename T>
__host__ __device__
int Vector<T>::GetSize() const
{
return mSize;
}
// Overloading square brackets
// Note that this uses `zero-based' indexing, and a check on the validity of the index
template <typename T>
__host__ __device__
T& Vector<T>::operator[](int i)
{
assert(i > -1);
assert(i < mSize);
return mData[i];
}
// Overloading the assignment operator
template <typename T>
__host__ __device__
Vector<T>& Vector<T>::operator=(const Vector<T>& otherVector)
{
assert(mSize == otherVector.mSize);
for (int i=0; i<mSize; i++)
{
mData[i] = otherVector.mData[i];
}
return *this;
}
// Overloading the insertion << operator
// not callable on the device!
template <typename T>
std::ostream& operator<<(std::ostream& output, Vector<T>& v) {
for (int i=0; i<v.mSize; i++) {
output << v[i] << " ";
}
return output;
}
template <typename T>
void Vector<T>::to_device(){
cudaMemcpy(dData, hData, mSize*sizeof(T), cudaMemcpyHostToDevice);
mData = dData;
}
template <typename T>
void Vector<T>::to_host(){
cudaMemcpy(hData, dData, mSize*sizeof(T), cudaMemcpyDeviceToHost);
mData = hData;
}
__global__ void alpha(Vector<int> *d_num)
{
d_num->set(0,100);
d_num->set(2,11);
(*d_num)[1] = 50;
}
int main()
{
Vector<int> num(10);
for (int i=0; i < num.GetSize(); ++i) num.set(i,i); // initialize elements to 0:9
std::cout << "Size of vector: " << num.GetSize() << "\n";
std::cout << num << "\n"; // print vector
Vector<int> *d_num;
cudaMalloc(&d_num, sizeof(Vector<int>));
num.to_device();
cudaMemcpy(d_num, &(num), sizeof(Vector<int>), cudaMemcpyHostToDevice);
// launch the kernel
alpha<<<1,1>>>(d_num);
// copy the modified data back into the host buffer held by num
num.to_host();
std::cout << num << "\n";
// free the device-side copy of the object and exit
// (the data buffer dData is freed by the Vector destructor)
cudaFree(d_num);
return 0;
}
$ nvcc -arch=sm_61 -o t101 t101.cu
$ cuda-memcheck ./t101
========= CUDA-MEMCHECK
Size of vector: 10
0 1 2 3 4 5 6 7 8 9
100 50 11 3 4 5 6 7 8 9
========= ERROR SUMMARY: 0 errors
$
Notes:
According to my testing, your posted code had various compile errors so I had to make other changes to your Vector class just to get it to compile.
Passing an object by value to the kernel will invoke the copy constructor and, subsequently, the destructor, which makes things more difficult. To avoid this, I have elected to pass the object via pointer (which is how you originally had it).
Your kernel call is launching 100 threads. Since they are all doing precisely the same thing, without any read activity going on, there's nothing particularly wrong with this, but I have changed it to just a single thread. It still demonstrates the same capability.

It is not just the cudaMemcpyDeviceToHost part that you're having trouble with.
Vector<int> num(10);
Vector<int>* d_num;
cudaMalloc(&d_num, num.GetSize()*sizeof(int));
This will allocate 40 bytes of CUDA global memory (assuming sizeof(int) is 4), pointed to by d_num, which has type Vector<int>*. I don't think you expect the Vector<int> object itself to be 40 bytes.
Let's try another way.
cudaMalloc(&d_num, sizeof(Vector<int>));
cudaMalloc(&d_num->mData, num.GetSize()*sizeof(int)); // assume mData is a public attribute
Unfortunately, the second line will cause a segmentation fault, because the expression d_num->mData dereferences a device pointer in host code.
So your implementation of the Vector class has several flaws. If you're planning to have a fixed-size array, just declare d_num as a plain pointer.
int* d_num;
cudaMalloc(&d_num, num.GetSize()*sizeof(int));
cudaMemcpy(d_num, &num[0], num.GetSize()*sizeof(int), cudaMemcpyHostToDevice);
// .. some kernel operations
cudaMemcpy(&num[0], d_num, num.GetSize()*sizeof(int), cudaMemcpyDeviceToHost);

Thrust is a library written for CUDA, and it provides vector containers. http://docs.nvidia.com/cuda/thrust/
Maybe it has all the functions you need, so why reinvent the wheel if you don't have to?

Related

C++20 Visual Studio 2022 Compiler Optimization Setting To Implement Move Constructor

I'm exploring move semantics in C++20 using "Beginning C++20 From Novice to Professional" by Horton and Van Weert. I'm using MS Visual Studio 2022 Version 17.2.5 as my IDE and I've tried a few different compiler optimization options under "C/C++ -> Optimization", but they don't seem to have any effect. The option currently selected is "Maximum Optimization (Favor Size) (/O1)". What is supposed to happen is that the number of times the 1000-element Array is moved gets reduced from 20 to 10, as reflected in the output of the program:
The line "Array of 1000 elements moved" should only print 10 times if the move constructor is used by the compiler
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
Array of 1000 elements moved
The .cpp source and .ixx module files are listed below. Has anyone run into a similar situation and successfully tuned the compiler to avoid this issue?
Array.ixx
export module array;
import <stdexcept>;
import <string>;
import <utility>;
import <iostream>;
export template <typename T>
class Array
{
public:
explicit Array(size_t size); // Constructor
~Array(); // Destructor
Array(const Array& array); // Copy constructor
Array(Array&& array); // Move constructor
Array& operator=(const Array& rhs); // Copy assignment operator
void swap(Array& other) noexcept; // Swap member function
T& operator[](size_t index); // Subscript operator
const T& operator[](size_t index) const; // Subscript operator-const arrays
size_t getSize() const { return m_size; } // Accessor for m_size
private:
T* m_elements; // Array of type T
size_t m_size; // Number of array elements
};
// Constructor template
template <typename T>
Array<T>::Array(size_t size) : m_elements{ new T[size] {} }, m_size{ size }
{}
// Copy constructor template
template <typename T>
inline Array<T>::Array(const Array& array) : Array{ array.m_size }
{
std::cout << "Array of " << m_size << " elements copied" << std::endl;
for (size_t i{}; i < m_size; ++i)
m_elements[i] = array.m_elements[i];
}
// Move constructor template
template <typename T>
Array<T>::Array(Array&& moved)
: m_size{ moved.m_size }, m_elements{ moved.m_elements }
{
std::cout << "Array of " << m_size << " elements moved" << std::endl;
moved.m_elements = nullptr;
}
// Destructor template
template <typename T>
Array<T>::~Array() { delete[] m_elements; }
// const subscript operator template
template <typename T>
const T& Array<T>::operator[](size_t index) const
{
if (index >= m_size)
throw std::out_of_range{ "Index too large: " + std::to_string(index) };
return m_elements[index];
}
// Non-const subscript operator template in terms of const one
// Uses the 'const-and-back-again' idiom
template <typename T>
T& Array<T>::operator[](size_t index)
{
return const_cast<T&>(std::as_const(*this)[index]);
}
// Template for exception-safe copy assignment operators
// (expressed in terms of copy constructor and swap member)
template <typename T>
inline Array<T>& Array<T>::operator=(const Array& rhs)
{
Array<T> copy{ rhs }; // Copy... (could go wrong and throw an exception)
swap(copy); // ... and swap! (noexcept)
return *this;
}
// Swap member function template
template <typename T>
void Array<T>::swap(Array& other) noexcept
{
std::swap(m_elements, other.m_elements); // Swap two pointers
std::swap(m_size, other.m_size); // Swap the sizes
}
// Swap non-member function template (optional)
export template <typename T>
void swap(Array<T>& one, Array<T>& other) noexcept
{
one.swap(other); // Forward to public member function
}
Ex18_01.cpp
import array;
import <string>;
import <vector>;
Array<std::string> buildStringArray(const size_t size)
{
Array<std::string> result{ size };
for (size_t i{}; i < size; ++i)
result[i] = "You should learn from you competitor, but never copy. Copy and you die.";
return result;
}
int main()
{
const size_t numArrays{ 10 };
const size_t numStringsPerArray{ 1000 };
std::vector<Array<std::string>> vectorOfArrays;
vectorOfArrays.reserve(numArrays);
for (size_t i{}; i < numArrays; ++i)
{
vectorOfArrays.push_back(buildStringArray(numStringsPerArray));
}
}

The move constructor is being used, otherwise you would see messages saying "elements copied", not "elements moved". This is not an optimization. It is guaranteed by the language. The compiler options don't matter.
It is not guaranteed, no matter what optimization settings you use, that the line will be printed only 10 times. The compiler is allowed to perform NRVO (named return value optimization) in buildStringArray, in which case there will be only 10 lines for the construction inside the std::vector, but the compiler does not have to apply NRVO, nor can it be forced to do that. If it doesn't apply NRVO (for whatever reason), then the line may be printed up to 20 times, since each return result; statement also causes a move construction.
If the book claims that it would be guaranteed that the line is printed only 10 times, then it is wrong. But according to OP's comment under this question it qualifies the statement with "normally", which I guess isn't incorrect. My tests on compiler explorer, putting everything in one translation unit, show that GCC, Clang and MSVC all apply NRVO, but I am not sure what the effect of separating into multiple translation units/modules will be.
Of course, if the compiler does not apply NRVO even with optimizations enabled, you may ask whether this is a missed optimization, but that is purely a quality-of-implementation issue, not a language issue.

Why is this vector implementation more performant?

For learning purposes, I decided to implement my own vector data structure. I called it list because that seems to generally be the more proper name for it but that's unimportant.
I am halfway through implementing this class (inserting and getting are complete) and I decided to write some benchmarks, with surprising results.
My compiler is whatever Visual Studio 2019 uses. I have tried debug and release, in x64 and x86.
For some reason, my implementation is faster than vector and I cannot think of a reason why. I fear that either my implementation or testing method are flawed.
Here are my results (x64, debug):
List: 13269ms
Vector: 78515ms
Release has a much less drastic, but still apparent, difference.
List: 65ms
Vector: 247ms
Here is my code
dataset.hpp:
#ifndef DATASET_H
#define DATASET_H
#include <memory>
#include <stdexcept>
#include <algorithm>
#include <functional>
#include <chrono>
namespace Dataset {
template <class T>
class List {
public:
List();
List(unsigned int);
void push(T);
T& get(int);
void reserve(int);
void shrink();
int count();
int capacity();
~List();
private:
void checkCapacity(int);
void setCapacity(int);
char* buffer;
int mCount, mCapacity;
};
template <class T>
List<T>::List() {
mCount = 0;
mCapacity = 0;
buffer = 0;
setCapacity(64);
}
template <class T>
List<T>::List(unsigned int initcap) {
mCount = 0;
buffer = 0;
setCapacity(initcap);
}
template <class T>
void List<T>::push(T item) {
checkCapacity(1);
new(buffer + (sizeof(T) * mCount++)) T(item);
}
template <class T>
T& List<T>::get(int index) {
return *((T*)(buffer + (sizeof(T) * index)));
}
template <class T>
void List<T>::reserve(int desired) {
if (desired > mCapacity) {
setCapacity(desired);
}
}
template <class T>
void List<T>::shrink() {
if (mCapacity > mCount) {
setCapacity(mCount);
}
}
template <class T>
int List<T>::count() {
return mCount;
}
template <class T>
int List<T>::capacity() {
return mCapacity;
}
template <class T>
void List<T>::checkCapacity(int cap) {
// Can <cap> more items fit in the list? If not, expand!
if (mCount + cap > mCapacity) {
setCapacity((int)((float)mCapacity * 1.5));
}
}
template <class T>
void List<T>::setCapacity(int cap) {
mCapacity = cap;
// Does buffer exist yet?
if (!buffer) {
// Allocate a new buffer
buffer = new char[sizeof(T) * cap];
}
else {
// Reallocate the old buffer
char* newBuffer = new char[sizeof(T) * cap];
if (newBuffer) {
std::copy(buffer, buffer + (sizeof(T) * mCount), newBuffer);
delete[] buffer;
buffer = newBuffer;
}
else {
throw std::runtime_error("Allocation failed");
}
}
}
template <class T>
List<T>::~List() {
for (int i = 0; i < mCount; i++) {
get(i).~T();
}
delete[] buffer;
}
long benchmark(std::function<void()>);
long benchmark(std::function<void()>, long);
long benchmark(std::function<void()> f) {
return benchmark(f, 100000);
}
long benchmark(std::function<void()> f, long iters) {
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
auto start = high_resolution_clock::now();
for (long i = 0; i < iters; i++) {
f();
}
auto end = high_resolution_clock::now();
auto time = duration_cast<std::chrono::milliseconds>(end - start);
return (long)time.count();
}
}
#endif
test.cpp:
#include "dataset.hpp"
#include <iostream>
#include <vector>
/*
TEST CODE
*/
class SimpleClass {
public:
SimpleClass();
SimpleClass(int);
SimpleClass(const SimpleClass&);
void sayHello();
~SimpleClass();
private:
int data;
};
SimpleClass::SimpleClass() {
//std::cout << "Constructed " << this << std::endl;
data = 0;
}
SimpleClass::SimpleClass(int data) {
//std::cout << "Constructed " << this << std::endl;
this->data = data;
}
SimpleClass::SimpleClass(const SimpleClass& other) {
//std::cout << "Copied to " << this << std::endl;
data = other.data;
}
SimpleClass::~SimpleClass() {
//std::cout << "Deconstructed " << this << std::endl;
}
void SimpleClass::sayHello() {
std::cout << "Hello! I am #" << data << std::endl;
}
int main() {
long list = Dataset::benchmark([]() {
Dataset::List<SimpleClass> list = Dataset::List<SimpleClass>(1000);
for (int i = 0; i < 1000; i++) {
list.push(SimpleClass(i));
}
});
long vec = Dataset::benchmark([]() {
std::vector<SimpleClass> list = std::vector<SimpleClass>(1000);
for (int i = 0; i < 1000; i++) {
list.emplace_back(SimpleClass(i));
}
});
std::cout << "List: " << list << "ms" << std::endl;
std::cout << "Vector: " << vec << "ms" << std::endl;
return 0;
}

The std::vector constructor with one size parameter creates a vector that already holds count elements:
explicit vector( size_type count, const Allocator& alloc = Allocator() );
To have something comparable for vector you have to do:
std::vector<SimpleClass> list;
list.reserve( 1000 );
Also, your "vector" copies the objects it holds by simply copying memory, which is only allowed for trivially copyable objects. SimpleClass is not one of them, because it has a user-provided copy constructor and destructor.

This is a really nice start! Clean and simple solution to the exercise. Sadly, your instincts are right that you weren’t testing enough cases.
One thing that jumps out at me is that you never resize your vectors, and therefore don’t measure how most STL implementations can often avoid copying when they grow in size. It also never returns any memory to the heap when it shrinks. You also don’t say whether you were compiling with optimizations enabled (/O2 in MSVC). But my guess is that there’s a small amount of overhead in Microsoft’s implementation, and it would pay off in other tests (especially an array of non-trivially-copyable data that needs to be resized, or a series of vectors that start out big but can be filtered and shrunk, or storing lots of data that can be moved instead of copied).
One bug that jumps out at me is that you call new[] to allocate a buffer of char—which is not guaranteed to meet the alignment requirements of T. On some CPUs, that can crash the program.
Another is that you use std::copy with an uninitialized area of memory as the destination in List::setCapacity. That doesn’t work except in special cases: std::copy expects a validly-initialized object that can be assigned to. For any type where assignment is a non-trivial operation, this will fail when the program tries to call a destructor on garbage data. If that happens to work, the move will then inefficiently clone the data and destroy the original, rather than using the move constructor if one exists. The STL algorithm you really want here is std::uninitialized_move. You might also want to use calloc/realloc, which allows resizing blocks, but only for trivially copyable element types.
Your capacity and size members should be size_t rather than int. Using int not only limits the size to less memory than most implementations can address; calculating a size greater than INT_MAX (i.e., 2 GiB or more on most implementations) also overflows a signed integer, which is undefined behavior.
One thing List::push has going for it is that it uses the semantics of std::vector::emplace_back (which you realize, and use as your comparison). It could, however, be improved. You pass item in by value, rather than by const reference. This creates an unnecessary copy of the data. Fortunately, if T has a move constructor, the extra copy can be moved, and if item is an xvalue, the compiler might be able to optimize the copy away, but it would be better to have List::push(const T&) and List::push(T&&). This will let the class push an xvalue without making any copies at all.
List::get is better, and avoids making copies, but it does not have a const version, so a const List<T> cannot do anything. It also does not check bounds.
Consider putting the code to look up the position of an index within the buffer into a private inline member function, which would drastically cut down the amount of work you will need to do to fix design changes (such as the ones you will need to fix the data-alignment bug).

Own vector error

I am trying to create my own vector. I am just at the beginning, and when I compile and execute the code, I get "Program not responding". This is the code:
struct X
{
X(){};
~X(){};
int v1, v2, v3;
};
template<typename T>
class Vector
{
public:
// constructors
Vector();
Vector(unsigned s);
virtual ~Vector();
// overloaded operators
T operator[](unsigned index);
// others
void clear();
void add(T value);
unsigned getSize();
bool isEmpty();
private:
// pointer to first item of memory block
T* first;
unsigned size;
};
template<typename T>
Vector<T>::Vector()
{
first = NULL;
size = 0;
}
template<typename T>
Vector<T>::Vector(unsigned s)
{
size = s;
first = new T[s];
};
template<typename T>
Vector<T>::~Vector()
{
clear();
}
template<typename T>
void Vector<T>::clear()
{
for(unsigned i = size ; i > 0 ; i--)
delete &first[i];
first = NULL;
}
template<typename T>
void Vector<T>::add(T value)
{
T* temp = new T[size + 1]; // error happens here
// copy data to new location
for(unsigned i = 0 ; i < size ; i++)
temp[i] = first[i];
// delete older data
clear();
// add the new value in last index
temp[size + 1] = value;
// update the pointer
first = temp;
size++;
}
template<typename T>
T Vector<T>::operator[](unsigned index)
{
return first[index];
}
template<typename T>
unsigned Vector<T>::getSize()
{
return size;
}
template<typename T>
bool Vector<T>::isEmpty()
{
return first == NULL;
}
int main(int argc, char* args[])
{
Vector<X> anything;
X thing;
anything.add(thing);
anything.add(thing);
anything.add(thing); // if remove this line, program work fine.
}
As I commented, the error happens in T* temp = new T[size + 1];.
If I initialize v1, v2, v3 of the X class, e.g. X() : v1(0), v2(0), v3(0) { }, the program works correctly.
If I change the type, e.g. to a Vector of int, it works perfectly.
If I put the X class in a std::vector, it works fine too.
Other comments are also welcome.
Can someone help me?

Your description of the problem is incredibly vague, but I can point out problems with your code:
No vector copy constructor (causes double-deletes and crashes)
No vector copy assignment (causes double-deletes and crashes)
clear is incorrectly calling delete (causes crashes and corruption). You should match your single new of an array with a single delete[] of the array; don't loop over elements.
add is writing past the end of the array (causes crashes and corruption)
add is not exception safe
You have to fix at least the first four. The third and fourth are probably the causes of your hang.

You have a buffer overflow occurring.
T* temp = new T[size + 1]; // When size is 0, you allocate 1 space.
You then assign to the temp array, but in location temp[1], which isn't a valid location because your array has only 1 element. This is undefined behavior, and at this point, your program is free to continue however it chooses. In this case, it seems to loop indefinitely.
// add the new value in last index
temp[size + 1] = value; // When size is zero, your array is length '1', but
// you are accessing temp[1] which is outside the
// bounds of your allocated memory.

allocating vectors (or vectors of vectors) dynamically

I need to dynamically allocate 1-D and 2-D arrays whose sizes are given at run-time.
I managed to "discover" std::vector and I think it fits my purposes, but I would like to ask whether what I've written is correct and/or can be improved.
This is what I'm doing:
#include <vector>
typedef std::vector< std::vector<double> > matrix;
//... various code and other stuff
std::vector<double> *name = new std::vector<double> (size);
matrix *name2 = new matrix(sizeX, std::vector<double>(sizeY));

Dynamically allocating arrays is required when your dimensions are given at runtime, as you've discovered.
However, std::vector is already a wrapper around this process, so dynamically allocating vectors is like a double positive. It's redundant.
Just write (C++98):
#include <vector>
typedef std::vector< std::vector<double> > matrix;
matrix name(sizeX, std::vector<double>(sizeY));
or (C++11 and later):
#include <vector>
using matrix = std::vector<std::vector<double>>;
matrix name(sizeX, std::vector<double>(sizeY));

You're conflating two issues, dynamic allocation and resizable containers. You don't need to worry about dynamic allocation, since your container does that for you already, so just say it like this:
matrix name(sizeX, std::vector<double>(sizeY));
This will make name an object with automatic storage duration, and you can access its members via name[i][j].

What you're doing should basically work, however:
In general, don't dynamically allocate objects
If you want a vector, do this:
std::vector<double> vec(size);
not this:
std::vector<double>* vec = new std::vector<double>(size);
The latter gives you a pointer, which you have to delete. The former gives you a vector which, when it goes out of scope, cleans up after itself. (Internally, of course, it dynamically allocates objects, but the trick is that this is handled by the class itself, and you don't need to worry about it in your user code).

It is correct but could be made more efficient.
You could use the boost multidimensional arrays:
http://www.boost.org/doc/libs/1_47_0/libs/multi_array/doc/user.html
Or, you can implement your own class for it and handle the indexing yourself.
Perhaps something like this (which is not well tested):
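#include <vector>
#include <cassert>
#include <algorithm> // std::copy
#include <cstddef> // size_t
#include <memory> // std::allocator
#include <utility> // std::swap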
template <typename T, typename A = std::allocator<T> >
class Array2d
{
public:
typedef Array2d<T, A> self;
typedef std::vector<T, A> Storage;
typedef typename Storage::iterator iterator;
typedef typename Storage::const_iterator const_iterator;
Array2d() : major_(0), minor_(0) {}
Array2d(size_t major, size_t minor)
: major_(major)
, minor_(minor)
, storage_(major * minor)
{}
template <typename U>
Array2d(size_t major, size_t minor, U const& init)
: major_(major)
, minor_(minor)
, storage_(major * minor, init)
{
}
iterator begin() { return storage_.begin(); }
const_iterator begin() const { return storage_.begin(); }
iterator end() { return storage_.end(); }
const_iterator end() const { return storage_.end(); }
iterator begin(size_t major) {
assert(major < major_);
return storage_.begin() + (major * minor_);
}
const_iterator begin(size_t major) const {
assert(major < major_);
return storage_.begin() + (major * minor_);
}
iterator end(size_t major) {
assert(major < major_);
return storage_.begin() + ((major + 1) * minor_);
}
const_iterator end(size_t major) const {
assert(major < major_);
return storage_.begin() + ((major + 1) * minor_);
}
void clear() {
storage_.clear();
major_ = 0;
minor_ = 0;
}
void clearResize(size_t major, size_t minor)
{
clear();
storage_.resize(major * minor);
major_ = major;
minor_ = minor;
}
void resize(size_t major, size_t minor)
{
if ((major != major_) || (minor != minor_)) // resize when either dimension changes
{
Array2d tmp(major, minor);
swap(tmp);
// Get minimum minor axis
size_t const dist = (tmp.minor_ < minor_) ? tmp.minor_ : minor_;
size_t m = 0;
// copy values across
for (; (m < tmp.major_) && (m < major_); ++m) {
std::copy(tmp.begin(m), tmp.begin(m) + dist, begin(m));
}
}
}
void swap(self& other)
{
storage_.swap(other.storage_);
std::swap(major_, other.major_);
std::swap(minor_, other.minor_);
}
size_t minor() const {
return minor_;
}
size_t major() const {
return major_;
}
T* buffer() { return &storage_[0]; }
T const* buffer() const { return &storage_[0]; }
bool empty() const {
return storage_.empty();
}
template <typename ArrRef, typename Ref>
class MajorProxy
{
ArrRef arr_;
size_t major_;
public:
MajorProxy(ArrRef arr, size_t major)
: arr_(arr)
, major_(major)
{}
Ref operator[](size_t index) const {
assert(index < arr_.minor());
return *(arr_.buffer() + (index + (major_ * arr_.minor())));
}
};
MajorProxy<self&, T&>
operator[](size_t major) {
return MajorProxy<self&, T&>(*this, major);
}
MajorProxy<self const&, T const&>
operator[](size_t major) const {
return MajorProxy<self const&, T const&>(*this, major);
}
private:
size_t major_;
size_t minor_;
Storage storage_;
};
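The core idea behind the class above is row-major indexing into a single contiguous buffer. Stripped of iterators and proxies, it can be sketched as follows; `Flat2d` and its members are hypothetical names chosen for this illustration, not part of the answer's class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal 2D view over one contiguous std::vector.
struct Flat2d
{
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;   // rows * cols elements, stored row by row

    Flat2d(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}

    // Element (r, c) lives at offset r * cols + c.
    double& at(std::size_t r, std::size_t c)
    {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};
```

Because all elements sit in one allocation, a pointer to the first element can be handed to APIs that expect a flat array, which is harder to do with a vector of vectors.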

While the points the other answers made are correct (don't dynamically allocate the vector via new, but rather let the vector do the allocation), if you are thinking in terms of vectors and matrices (e.g. linear algebra), you might want to consider using the Eigen matrix library.

You don't allocate containers dynamically. They can automatically manage memory for you if they themselves are not manually managed.
A vector grows when you add new items with push_back (or insert), you can choose its size from the start with arguments to the constructor, and you can resize it later with the resize method.
Creating a vector of vectors with your sizes with the constructor looks like this:
std::vector< std::vector<double> > matrix(size, std::vector<double>(sizeY));
This means: size instances of a std::vector<double>, each containing sizeY doubles (initialized to 0.0).
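The three ways of sizing mentioned above (constructor arguments, push_back, resize) can be sketched together like this. The typedef `Matrix` and the helper names are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector< std::vector<double> > Matrix;

// Size chosen up front: sizeX rows, each holding sizeY zero-initialised doubles.
Matrix make_matrix(std::size_t sizeX, std::size_t sizeY)
{
    return Matrix(sizeX, std::vector<double>(sizeY));
}

// Grow by one row with push_back.
void add_row(Matrix& m, std::size_t sizeY)
{
    m.push_back(std::vector<double>(sizeY));
}

// Change the number of columns later with resize; new entries are 0.0.
void set_columns(Matrix& m, std::size_t sizeY)
{
    for (std::size_t i = 0; i < m.size(); ++i) {
        m[i].resize(sizeY);
    }
}
```

None of these helpers uses new or delete; each inner vector manages its own storage.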

Sometimes your memory requirement is large and you don't want to fill your stack. In that case you may want to create the vector of vectors dynamically, especially when building a table with a given number of rows and columns.
Here is my take on this in C++11:
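#include <iostream>
#include <vector>
int main() {
int row, col;
std::cin >> row >> col;
// The outer vector holds pointers to separately allocated rows.
auto *arr = new std::vector<std::vector<int>*>(row);
for (int i=0; i<row; i++) {
// Each row has col entries, all initialised to 5.
auto *x = new std::vector<int>(col, 5);
(*arr)[i] = x;
}
for (int i=0; i<row; i++) {
for(int j=0; j<col; j++) {
std::cout << arr->at(i)->at(j) << " ";
}
std::cout << std::endl;
}
// Everything created with new has to be released by hand.
for (int i=0; i<row; i++) {
delete (*arr)[i];
}
delete arr;
return 0;
}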
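If the table really has to be created with new, one way to avoid writing the matching delete calls by hand is to let a std::unique_ptr (available since C++11) own it. This is only a sketch; `Table` and `make_table` are hypothetical names, and the rows are stored by value instead of through pointers.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

typedef std::vector< std::vector<int> > Table;

// Creates a row x col table on the heap, every entry set to `value`.
// The returned unique_ptr deletes the table when it goes out of scope.
std::unique_ptr<Table> make_table(std::size_t row, std::size_t col, int value)
{
    return std::unique_ptr<Table>(new Table(row, std::vector<int>(col, value)));
}
```

Access works the same way as with a raw pointer, for example `t->at(i).at(j)`.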

#include <cstdlib>  // system
#include <iostream>
#include <vector>
using namespace std;
int main(){
vector<int>*v = new vector<int>(); // for 1d vector just copy paste it
v->push_back(5);
v->push_back(10);
v->push_back(20);
v->push_back(25);
for(int i=0;i<v->size();i++){
cout<<v->at(i)<<" ";
}
cout<<endl;
delete v;
system("pause");
return 0;
}

If you don't need to resize the arrays at run time, then you can just use standard arrays (allocated at runtime)!
However, if you do need to resize arrays at runtime, then you can use the following (revised) code:
#include <vector>
typedef std::vector< std::vector<double> > matrix;
//... various code and other stuff
std::vector<double> *name = new std::vector<double> (size);
matrix *name2 = new matrix(sizeX, std::vector<double>(sizeY));
In essence, all I've done is remove a single bracket (().

Compiler Error C2106 when trying to simulate a vector

I'm trying to write a fake vector for my class assignment, and I currently get an error in the member function pushBack.
The compiler doesn't seem to like incrementing the SIZE variable which holds the number of elements in the "vector". Is there something I might need to fix?
Your assistance would be highly appreciated for helping me with this, and any other problems you might happen to find.
/*
Write a simple program that simulates the behavior of vectors
-You should be able to add and remove elements to the vector
-You should be able to access an element directly.
-The vector should be able to hold any data type.
*/
#include <stdio.h>
template <class T, int SIZE>
class Vector
{
#pragma region constructors&destructors
private:
T vec[SIZE];
public:
Vector()
{}
~Vector()
{}
#pragma endregion
template <class T/*, int SIZE*/>
void defineVec(T var)
{
for(int i=0; i<SIZE; i++)
{
vec[i] = var;
}
//printf("All elements of the vector have been defined with %", var)
//What should I do when trying to print a data type or variable
//of an unspecified one along with the '%'?
}
template <class T/*, int SIZE*/>
void pushBack(T var)
{
SIZE ++; //C1205
vec[SIZE - 1] = var;
}
template <class T/*, int SIZE*/>
void popBack()
{
vec[SIZE - 1] = NULL;
SIZE--;
}
//template <class T/*, int SIZE*/>
void showElements()
{
for(int i=0; i<SIZE; i++)
{
printf("%d",vec[i]);
printf("\n");
}
}
};
int main()
{
Vector <int, 5> myints;
myints.pushBack(6);
myints.showElements();
return 0;
}

You're passing SIZE as a template parameter. Inside the definition of a template, a non-type template parameter is basically a constant -- i.e., you can't modify it.
You'll need to define a separate variable to keep track of how much of the storage in your vector-alike is currently being used.
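A minimal sketch of that suggestion, assuming the template parameter is kept as a fixed capacity and a separate member counts the elements in use. The names `FixedVector`, `CAPACITY` and `count_` are made up for this example and are not from the question.

```cpp
#include <cassert>
#include <cstddef>

template <class T, std::size_t CAPACITY>
class FixedVector
{
private:
    T vec_[CAPACITY];      // storage; CAPACITY is a compile-time constant
    std::size_t count_;    // number of elements currently in use

public:
    FixedVector() : count_(0) {}

    // Appends var; returns false if the storage is already full.
    bool pushBack(const T& var)
    {
        if (count_ == CAPACITY) return false;
        vec_[count_++] = var;
        return true;
    }

    // Drops the last element; returns false if there is none.
    bool popBack()
    {
        if (count_ == 0) return false;
        --count_;
        return true;
    }

    T& operator[](std::size_t i)
    {
        assert(i < count_);
        return vec_[i];
    }

    std::size_t size() const { return count_; }
};
```

Only `count_` changes at run time, so the compiler no longer sees an attempt to modify a constant. The member functions also do not need their own `template <class T>` line, since they already use the class's `T`.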


Dealing with Vectors - cudaMemcpyDeviceToHost - c++

Thrust is a library written for CUDA, and it has vectors: http://docs.nvidia.com/cuda/thrust/ Maybe it has all the functions you need, so why reinvent the wheel if you don't have to?
