Why is this vector implementation more performant? - c++

For learning purposes, I decided to implement my own vector data structure. I called it list because that seems to generally be the more proper name for it but that's unimportant.
I am halfway through implementing this class (inserting and getting are complete) and I decided to write some benchmarks, with surprising results.
My compiler is whatever Visual Studio 2019 uses. I have tried debug and release, in x64 and x86.
For some reason, my implementation is faster than vector and I cannot think of a reason why. I fear that either my implementation or my testing method is flawed.
Here are my results (x64, debug):
List: 13269ms
Vector: 78515ms
Release has a much less drastic, but still apparent, difference.
List: 65ms
Vector: 247ms
Here is my code
dataset.hpp:
#ifndef DATASET_H
#define DATASET_H
#include <memory>
#include <stdexcept>
#include <algorithm>
#include <functional>
#include <chrono>
namespace Dataset {
template <class T>
class List {
public:
List();
List(unsigned int);
void push(T);
T& get(int);
void reserve(int);
void shrink();
int count();
int capacity();
~List();
private:
void checkCapacity(int);
void setCapacity(int);
char* buffer;
int mCount, mCapacity;
};
template <class T>
List<T>::List() {
mCount = 0;
mCapacity = 0;
buffer = 0;
setCapacity(64);
}
template <class T>
List<T>::List(unsigned int initcap) {
mCount = 0;
buffer = 0;
setCapacity(initcap);
}
template <class T>
void List<T>::push(T item) {
checkCapacity(1);
new(buffer + (sizeof(T) * mCount++)) T(item);
}
template <class T>
T& List<T>::get(int index) {
return *((T*)(buffer + (sizeof(T) * index)));
}
template <class T>
void List<T>::reserve(int desired) {
if (desired > mCapacity) {
setCapacity(desired);
}
}
template <class T>
void List<T>::shrink() {
if (mCapacity > mCount) {
setCapacity(mCount);
}
}
template <class T>
int List<T>::count() {
return mCount;
}
template <class T>
int List<T>::capacity() {
return mCapacity;
}
template <class T>
void List<T>::checkCapacity(int cap) {
// Can <cap> more items fit in the list? If not, expand!
if (mCount + cap > mCapacity) {
setCapacity((int)((float)mCapacity * 1.5));
}
}
template <class T>
void List<T>::setCapacity(int cap) {
mCapacity = cap;
// Does buffer exist yet?
if (!buffer) {
// Allocate a new buffer
buffer = new char[sizeof(T) * cap];
}
else {
// Reallocate the old buffer
char* newBuffer = new char[sizeof(T) * cap];
if (newBuffer) {
std::copy(buffer, buffer + (sizeof(T) * mCount), newBuffer);
delete[] buffer;
buffer = newBuffer;
}
else {
throw std::runtime_error("Allocation failed");
}
}
}
template <class T>
List<T>::~List() {
for (int i = 0; i < mCount; i++) {
get(i).~T();
}
delete[] buffer;
}
long benchmark(std::function<void()>);
long benchmark(std::function<void()>, long);
long benchmark(std::function<void()> f) {
return benchmark(f, 100000);
}
long benchmark(std::function<void()> f, long iters) {
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
auto start = high_resolution_clock::now();
for (long i = 0; i < iters; i++) {
f();
}
auto end = high_resolution_clock::now();
auto time = duration_cast<std::chrono::milliseconds>(end - start);
return (long)time.count();
}
}
#endif
test.cpp:
#include "dataset.hpp"
#include <iostream>
#include <vector>
/*
TEST CODE
*/
class SimpleClass {
public:
SimpleClass();
SimpleClass(int);
SimpleClass(const SimpleClass&);
void sayHello();
~SimpleClass();
private:
int data;
};
SimpleClass::SimpleClass() {
//std::cout << "Constructed " << this << std::endl;
data = 0;
}
SimpleClass::SimpleClass(int data) {
//std::cout << "Constructed " << this << std::endl;
this->data = data;
}
SimpleClass::SimpleClass(const SimpleClass& other) {
//std::cout << "Copied to " << this << std::endl;
data = other.data;
}
SimpleClass::~SimpleClass() {
//std::cout << "Deconstructed " << this << std::endl;
}
void SimpleClass::sayHello() {
std::cout << "Hello! I am #" << data << std::endl;
}
int main() {
long list = Dataset::benchmark([]() {
Dataset::List<SimpleClass> list = Dataset::List<SimpleClass>(1000);
for (int i = 0; i < 1000; i++) {
list.push(SimpleClass(i));
}
});
long vec = Dataset::benchmark([]() {
std::vector<SimpleClass> list = std::vector<SimpleClass>(1000);
for (int i = 0; i < 1000; i++) {
list.emplace_back(SimpleClass(i));
}
});
std::cout << "List: " << list << "ms" << std::endl;
std::cout << "Vector: " << vec << "ms" << std::endl;
return 0;
}

std::vector constructor with one parameter creates vector with count elements:
explicit vector( size_type count, const Allocator& alloc = Allocator() );
To have something comparable for vector you have to do:
std::vector<SimpleClass> list;
list.reserve( 1000 );
Also, your "vector" copies the objects it holds by simply copying memory, which is only allowed for trivially copyable types. SimpleClass is not one of them, because it has a user-provided copy constructor and destructor.
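To make the difference between the two constructions concrete, here is a minimal sketch. The Probe type and the two helper functions are made up for illustration and are not part of the question's code; C++11 or later is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts its default constructions.
struct Probe {
    static int defaultConstructed;
    int value;
    Probe() : value(0) { ++defaultConstructed; }
    explicit Probe(int v) : value(v) {}
};
int Probe::defaultConstructed = 0;

// Sized constructor: creates n default-constructed elements up front.
inline std::size_t size_after_sized_ctor(std::size_t n) {
    std::vector<Probe> v(n);
    return v.size();
}

// reserve(): allocates room for n elements but constructs none of them.
inline std::size_t size_after_reserve(std::size_t n) {
    std::vector<Probe> v;
    v.reserve(n);
    assert(v.capacity() >= n);
    return v.size();
}
```

With n = 1000 the first helper reports a size of 1000 and the second a size of 0. In the question's benchmark that means the std::vector starts with 1000 elements and ends with 2000, typically reallocating on the way, while List only ever holds 1000.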

This is a really nice start! Clean and simple solution to the exercise. Sadly, your instincts are right that you weren’t testing enough cases.
One thing that jumps out at me is that you never resize your vectors, and therefore don’t measure growth, which is where most STL implementations can often avoid copying. Your test also never shrinks a container, so it does not measure returning memory to the heap. You also don’t say whether you were compiling with /O2 to enable optimizations. But my guess is that there’s a small amount of overhead in Microsoft’s implementation, and it would pay off in other tests (especially an array of non-trivially-copyable data that needs to be resized, or a series of vectors that start out big but can be filtered and shrunk, or storing lots of data that can be moved instead of copied).
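One bug that jumps out at me is that you call new[] to allocate a buffer of char, which is only guaranteed to be suitably aligned for types with fundamental alignment. If T is over-aligned (for example, a type declared with a large alignas), the buffer may not meet its alignment requirements. On some CPUs, that can crash the program.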
Another is that List::setCapacity uses std::copy on the raw char buffer to relocate the elements into uninitialized memory. That is a byte-wise copy, and it only works in special cases: it is valid for trivially copyable types, but for any type where copying is a non-trivial operation (one that owns memory, or holds pointers into itself), the new buffer ends up holding objects that were never properly constructed. Even if that happens to work, it never uses the move constructor if one exists. The STL algorithm you really want here is std::uninitialized_move, followed by destroying the originals. You might also want to use calloc/realloc, which allows resizing blocks, but again only when T is trivially copyable.
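As a rough illustration of the std::uninitialized_move approach, here is a minimal sketch of a growth step. The grow_buffer name and signature are hypothetical, it needs C++17, and it makes no attempt at exception safety (if a move constructor throws, the new block leaks).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Hypothetical helper: move `count` live objects from `old` into a freshly
// allocated block with room for `newCap` objects, then release `old`.
template <class T>
T* grow_buffer(T* old, std::size_t count, std::size_t newCap) {
    assert(newCap >= count);
    // ::operator new returns raw storage aligned for any type with fundamental
    // alignment; an over-aligned T would need the std::align_val_t overload.
    T* fresh = static_cast<T*>(::operator new(newCap * sizeof(T)));
    // Move-construct each element into the uninitialized storage.
    std::uninitialized_move(old, old + count, fresh);
    // End the lifetime of the moved-from originals, then free their storage.
    std::destroy_n(old, count);
    ::operator delete(old);
    return fresh;
}
```

The caller stays responsible for destroying the elements and releasing the returned block with ::operator delete, just as List's destructor does for its own buffer.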
Your capacity and size members should be size_t rather than int. Using int not only limits the size to less memory than most implementations can address; calculating a size greater than INT_MAX (i.e., 2 GiB or more on most implementations) also causes undefined behavior.
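A small sketch of what size_t-based capacity arithmetic could look like. Both helper names are invented for this example. Note also that the question's (int)(mCapacity * 1.5) yields the same value for capacities 0 and 1, so a List created with such a capacity would never actually grow; the second helper sidesteps that by always adding at least one slot.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: bytes needed for `count` objects of type T, throwing
// instead of silently wrapping when the product does not fit in std::size_t.
template <class T>
std::size_t bytes_for(std::size_t count) {
    if (count > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
        throw std::length_error("requested capacity is too large");
    }
    return count * sizeof(T);
}

// Roughly 1.5x growth in integer arithmetic; the +1 guarantees progress even
// from a capacity of 0 or 1. Overflow for enormous `cap` is not handled here.
inline std::size_t grown_capacity(std::size_t cap) {
    return cap + cap / 2 + 1;
}
```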
One thing List::push has going for it is that it uses the semantics of std::vector::emplace_back (which you realize, and use as your comparison). It could, however, be improved. You pass item in by value, rather than by const reference. This creates an unnecessary copy of the data. Fortunately, if T has a move constructor, the extra copy can be moved, and if item is an xvalue, the compiler might be able to optimize the copy away, but it would be better to have List::push(const T&) and List::push(T&&). This will let the class push an xvalue without making any copies at all.
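Here is a minimal sketch of the two push overloads. SmallList is a fixed-capacity stand-in invented for this example (growth is left out to keep the focus on push), and Tracked is a made-up element type that counts copies and moves.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical element type that records how it was constructed.
struct Tracked {
    static int copies;
    static int moves;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++copies; }
    Tracked(Tracked&& o) noexcept : value(o.value) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Fixed-capacity stand-in for List<T>.
template <class T, std::size_t Cap>
class SmallList {
public:
    SmallList() : mCount(0) {}
    SmallList(const SmallList&) = delete;
    SmallList& operator=(const SmallList&) = delete;
    ~SmallList() {
        for (std::size_t i = 0; i < mCount; ++i) slot(i)->~T();
    }
    void push(const T& item) {   // lvalue argument: exactly one copy
        assert(mCount < Cap);
        new (slot(mCount)) T(item);
        ++mCount;
    }
    void push(T&& item) {        // rvalue argument: one move, no copy
        assert(mCount < Cap);
        new (slot(mCount)) T(std::move(item));
        ++mCount;
    }
    std::size_t count() const { return mCount; }
private:
    T* slot(std::size_t i) { return reinterpret_cast<T*>(mStorage) + i; }
    alignas(T) unsigned char mStorage[sizeof(T) * Cap];
    std::size_t mCount;
};
```

Pushing a named Tracked object costs one copy, while pushing a temporary or a std::move'd object costs one move and no copy. With the by-value push(T) from the question, each of those calls would construct the parameter first and then copy it again into the buffer.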
List::get is better, and avoids making copies, but it does not have a const version, so a const List<T> cannot do anything. It also does not check bounds.
Consider putting the code to look up the position of an index within the buffer into a private inline member function, which would drastically cut down the amount of work you will need to do to fix design changes (such as the ones you will need to fix the data-alignment bug).
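The last two points could be sketched like this. CheckedList and its member names are illustrative only, and the class has a fixed capacity so that the example stays short.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

template <class T>
class CheckedList {
public:
    explicit CheckedList(std::size_t cap)
        : mBuffer(static_cast<unsigned char*>(::operator new(cap * sizeof(T)))),
          mCount(0), mCapacity(cap) {}
    CheckedList(const CheckedList&) = delete;
    CheckedList& operator=(const CheckedList&) = delete;
    ~CheckedList() {
        for (std::size_t i = 0; i < mCount; ++i) slot(i)->~T();
        ::operator delete(mBuffer);
    }
    void push(const T& item) {
        if (mCount == mCapacity) throw std::length_error("CheckedList is full");
        new (slot(mCount)) T(item);
        ++mCount;
    }
    // Both overloads check bounds; the const one makes const lists readable.
    T& get(std::size_t index) { return *slot(checked(index)); }
    const T& get(std::size_t index) const { return *slot(checked(index)); }
    std::size_t count() const { return mCount; }
private:
    std::size_t checked(std::size_t index) const {
        if (index >= mCount) throw std::out_of_range("CheckedList::get: index out of range");
        return index;
    }
    // The only two functions that know how an index maps to an address.
    T* slot(std::size_t i) {
        assert(i < mCapacity);
        return reinterpret_cast<T*>(mBuffer) + i;
    }
    const T* slot(std::size_t i) const {
        assert(i < mCapacity);
        return reinterpret_cast<const T*>(mBuffer) + i;
    }
    unsigned char* mBuffer;
    std::size_t mCount;
    std::size_t mCapacity;
};
```

If the storage layout changes later (say, to fix alignment), only slot() and the allocation in the constructor need to be touched.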

Related

Finding Bug in implementation of dynamic array class. Crashes after building list of strings

I have written a DynamicArray class in the past analogous to vector which worked.
I have also written as a demo, one where the performance is bad because it has only length and pointer, and has to grow every time. Adding n elements is therefore O(n^2).
The purpose of this code was just to demonstrate placement new. The code works for types that do not use dynamic memory, but with string it crashes, and -fsanitize=address shows that the memory allocated in the addEnd() method is being used in printing. I commented out removeEnd, so the code is only adding elements, then printing them. I'm just not seeing the bug. Can anyone identify what is wrong?
#include <iostream>
#include <string>
#include <memory.h>
using namespace std;
template<typename T>
class BadGrowArray {
private:
uint32_t size;
T* data;
public:
BadGrowArray() : size(0), data(nullptr) {}
~BadGrowArray() {
for (uint32_t i = 0; i < size; i++)
data[i].~T();
delete [] (char*)data;
}
BadGrowArray(const BadGrowArray& orig) : size(orig.size), data((T*)new char[orig.size*sizeof(T)]) {
for (int i = 0; i < size; i++)
new (data + i) T(orig.data[i]);
}
BadGrowArray& operator =(BadGrowArray copy) {
size = copy.size;
swap(data, copy.data);
return *this;
}
void* operator new(size_t sz, void* p) {
return p;
}
void addEnd(const T& v) {
char* old = (char*)data;
data = (T*)new char[(size+1)*sizeof(T)];
memcpy(data, old, size*sizeof(T));
new (data+size) T(v); // call copy constructor placing object at data[size]
size++;
delete [] (char*)old;
}
void removeEnd() {
const char* old = (char*)data;
size--;
data[size].~T();
data = (T*)new char[size*sizeof(T)];
memcpy(data, old, size*sizeof(T));
delete [] (char*)old;
}
friend ostream& operator <<(ostream& s, const BadGrowArray& list) {
for (int i = 0; i < list.size; i++)
s << list.data[i] << ' ';
return s;
}
};
class Elephant {
private:
string name;
public:
Elephant() : name("Fred") {}
Elephant(const string& name) {}
};
int main() {
BadGrowArray<int> a;
for (int i = 0; i < 10; i++)
a.addEnd(i);
for (int i = 0; i < 9; i++)
a.removeEnd();
// should have 0
cout << a << '\n';
BadGrowArray<string> b;
b.addEnd("hello");
string s[] = { "test", "this", "now" };
for (int i = 0; i < sizeof(s)/sizeof(string); i++)
b.addEnd(s[i]);
// b.removeEnd();
cout << b << '\n';
BadGrowArray<string> c = b; // test copy constructor
c.removeEnd();
c = b; // test operator =
}
The use of memcpy is valid only for trivially copyable types.
The compiler may even warn you on that, with something like:
warning: memcpy(data, old, size * sizeof(T));
writing to an object of non-trivially copyable type 'class string'
use copy-assignment or copy-initialization instead [-Wclass-memaccess]
Note that your code does not move the objects, but rather memcpys them. This means that if they have, for example, internal pointers that point to a position inside the object, then your mem-copied object will still point to the old location.
Trivially copyable types wouldn't have internal pointers that point to a position in the object itself (or similar issues that may prevent mem-copying); otherwise the type must take care of them in copying and implement proper copy and assignment operations, which would make it non-trivially copyable.
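To illustrate the internal-pointer problem, here is a small sketch. SelfRef is a made-up type whose cursor member always points at its own text array, similar in spirit to what some std::string implementations do for short strings.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical type holding a pointer into itself.
struct SelfRef {
    char text[8];
    const char* cursor;
    SelfRef() : text{'a', 'b', 'c', '\0'}, cursor(text) {}
    // A proper copy re-targets cursor at the new object's own text.
    SelfRef(const SelfRef& o) : cursor(text) { std::memcpy(text, o.text, sizeof text); }
    SelfRef& operator=(const SelfRef& o) { std::memcpy(text, o.text, sizeof text); return *this; }
    bool consistent() const { return cursor == text; }
};

static_assert(std::is_trivially_copyable<int>::value, "memcpy is fine for int");
static_assert(!std::is_trivially_copyable<std::string>::value, "but not for std::string");
static_assert(!std::is_trivially_copyable<SelfRef>::value, "nor for SelfRef");

// Copies a SelfRef byte by byte (as addEnd effectively does) and reports
// whether the copied cursor points at the copy's own text. It does not:
// the copied pointer still refers to the original object.
inline bool bytewise_copy_points_into_itself() {
    SelfRef original;
    alignas(SelfRef) unsigned char raw[sizeof(SelfRef)];
    std::memcpy(raw, &original, sizeof(SelfRef));
    const char* copiedCursor = nullptr;
    std::memcpy(&copiedCursor, raw + offsetof(SelfRef, cursor), sizeof copiedCursor);
    const char* copiedText = reinterpret_cast<const char*>(raw + offsetof(SelfRef, text));
    return copiedCursor == copiedText;
}
```

Once the old buffer is deleted, a pointer like that dangles, which is consistent with the use-after-free that the sanitizer reports for the string case.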
To fix your addEnd method to do proper copying, for non-trivially copyable types, if you use C++17 you may add to your code an if-constexpr like this:
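if constexpr(std::is_trivially_copyable_v<T>) {
memcpy(data, old, size * sizeof(T));
}
else {
// old is a char* in addEnd, so view it as T* before touching the elements
T* oldItems = reinterpret_cast<T*>(old);
for(std::size_t i = 0; i < size; ++i) {
new (data + i) T(std::move_if_noexcept(oldItems[i]));
oldItems[i].~T(); // end the lifetime of the original before its storage is freed
}
}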
In case you are with C++14 or before, two versions of copying with SFINAE would be needed.
Note that other parts of the code may also require some fixes.
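For the pre-C++17 route, a minimal sketch of the two SFINAE overloads might look like the following. The relocate name is hypothetical; the idea is that addEnd would call relocate(data, (T*)old, size) instead of memcpy.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Selected when T is trivially copyable: a plain byte copy is allowed.
template <class T>
typename std::enable_if<std::is_trivially_copyable<T>::value>::type
relocate(T* dst, T* src, std::size_t n) {
    if (n != 0) std::memcpy(dst, src, n * sizeof(T));
}

// Selected otherwise: construct each element in place in the new storage,
// then destroy the original so that its resources are released exactly once.
template <class T>
typename std::enable_if<!std::is_trivially_copyable<T>::value>::type
relocate(T* dst, T* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        new (static_cast<void*>(dst + i)) T(std::move_if_noexcept(src[i]));
        src[i].~T();
    }
}
```

Exactly one of the two overloads is viable for any given T, so the call site does not need to know which one is picked.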

How can I use a map function, created by me, to print the values of an array

I created this map function which takes a function as input, in my case the print function, and prints the values of an array
template<typename Data>
void Vector<Data>::MapPreOrder(MapFunctor fun, void* par){
for(unsigned long index = 0 ; index < Size; index++){
fun(Elements[index], par);
}
}
template<typename Data>
void print( Data& data){
std::cout << data<< '\n';
}
in main, however, I don't know how to pass the function to the map function. I thought of doing this but it gives me errors
void* noParameter = nullptr;
myVec.MapPreOrder(print(myVec[0]), noParameter);
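Your call fails because print(myVec[0]) invokes print and tries to pass its (void) result, whereas MapPreOrder needs the function itself. Beyond that, passing extra arguments through a void * is a C idiom rather than a C++ one. I think it is much easier to use the full power of C++ (C++14 and higher for this) to get much simpler code: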
template<typename Data>
class Vector
{
// class code here...
template <typename MapFunc, typename ... ParamTypes>
void map_pre_order(MapFunc fun, ParamTypes ... par)
{
for(uint32_t index = 0 ; index < size; index++)
{
fun(elements[index], par...);
}
}
};
int main()
{
constexpr auto print_func = [](const auto& data) { std::cout << data << "\n"; };
constexpr auto print_func_with_params = [](const auto& data, const std::string& extra)
{
std::cout << data << extra;
};
Vector<int> vec{10};
vec.map_pre_order(print_func);
vec.map_pre_order(print_func_with_params, " extra \n");
}
In this code, I made MapFunc a template instead of defining it, and variadic template in order to pack all of the extra arguments for the function. This way, if you have a function with no extra arguments, you don't need to pass anything to it.
I have also used the auto specifier in the lambda, in order to mimic the "template" behavior, just in lambda (this is why C++14 is needed here).
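If you would rather keep the original MapPreOrder(MapFunctor, void*) interface, the call can also be made to work by passing the function itself. The sketch below is only a guess at the surrounding class: the question does not show how MapFunctor is defined, so std::function<void(Data&, void*)> is an assumption, and the std::vector storage and the to_lines helper are stand-ins added for the example.

```cpp
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the asker's class (MapFunctor's definition is assumed).
template <typename Data>
class Vector {
public:
    typedef std::function<void(Data&, void*)> MapFunctor;
    explicit Vector(std::vector<Data> init) : Elements(std::move(init)) {}
    void MapPreOrder(MapFunctor fun, void* par) {
        for (unsigned long index = 0; index < Elements.size(); index++) {
            fun(Elements[index], par);
        }
    }
private:
    std::vector<Data> Elements;
};

// print must accept the same (Data&, void*) pair that MapPreOrder passes.
// Here par is expected to point at the std::ostream to write to.
template <typename Data>
void print(Data& data, void* par) {
    std::ostream& out = *static_cast<std::ostream*>(par);
    out << data << '\n';
}

// Passes the function itself (&print<Data>), not the result of calling it.
template <typename Data>
std::string to_lines(Vector<Data>& v) {
    std::ostringstream out;
    v.MapPreOrder(&print<Data>, static_cast<std::ostream*>(&out));
    return out.str();
}
```

Printing to the console would then be myVec.MapPreOrder(&print<int>, &std::cout); for a Vector<int>.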

Dealing with Vectors - cudaMemcpyDeviceToHost

It is not obvious how to use std::vector in CUDA, so I have designed my own Vector class:
#ifndef VECTORHEADERDEF
#define VECTORHEADERDEF
#include <cmath>
#include <iostream>
#include <cassert>
template <typename T>
class Vector
{
private:
T* mData; // data stored in vector
int mSize; // size of vector
public:
Vector(const Vector& otherVector); // Constructor
Vector(int size); // Constructor
~Vector(); // Destructor
__host__ __device__ int GetSize() const; // get size of the vector
T& operator[](int i); // see element
// change element i
__host__ __device__ void set(size_t i, T value) {
mData[i] = value;
}
template <class S> // output vector
friend std::ostream& operator<<(std::ostream& output, Vector<S>& v);
};
// Overridden copy constructor
// Allocates memory for new vector, and copies entries of other vector into it
template <typename T>
Vector<T>::Vector(const Vector& otherVector)
{
mSize = otherVector.GetSize();
mData = new T [mSize];
for (int i=0; i<mSize; i++)
{
mData[i] = otherVector.mData[i];
}
}
// Constructor for vector of a given size
// Allocates memory, and initialises entries to zero
template <typename T>
Vector<T>::Vector(int size)
{
assert(size > 0);
mSize = size;
mData = new T [mSize];
for (int i=0; i<mSize; i++)
{
mData[i] = 0.0;
}
}
// Overridden destructor to correctly free memory
template <typename T>
Vector<T>::~Vector()
{
delete[] mData;
}
// Method to get the size of a vector
template <typename T>
__host__ __device__ int Vector<T>::GetSize() const
{
return mSize;
}
// Overloading square brackets
// Note that this uses `zero-based' indexing, and a check on the validity of the index
template <typename T>
T& Vector<T>::operator[](int i)
{
assert(i > -1);
assert(i < mSize);
return mData[i];
}
// Overloading the assignment operator
template <typename T>
Vector<T>& Vector<T>::operator=(const Vector& otherVector)
{
assert(mSize == otherVector.mSize);
for (int i=0; i<mSize; i++)
{
mData[i] = otherVector.mData[i];
}
return *this;
}
// Overloading the insertion << operator
template <typename T>
std::ostream& operator<<(std::ostream& output, Vector<T>& v) {
for (int i=0; i<v.mSize; i++) {
output << v[i] << " ";
}
return output;
}
My main function - where I just pass a vector to the device, modify it and pass it back - is as follows (with the kernel designed just for testing purposes):
#include <iostream>
#include "Vector.hpp"
__global__ void alpha(Vector<int>* d_num)
{
int myId = threadIdx.x + blockDim.x * blockIdx.x;
d_num->set(0,100);
d_num->set(2,11);
}
int main()
{
Vector<int> num(10);
for (int i=0; i < num.GetSize(); ++i) num.set(i,i); // initialize elements to 0:9
std::cout << "Size of vector: " << num.GetSize() << "\n";
std::cout << num << "\n"; // print vector
Vector<int>* d_num;
// allocate global memory on the device
cudaMalloc((void **) &d_num, num.GetSize()*sizeof(int));
// copy data from host memory to the device memory
cudaMemcpy(d_num, &num[0], num.GetSize()*sizeof(int), cudaMemcpyHostToDevice);
// launch the kernel
alpha<<<1,100>>>(d_num);
// copy the modified array back to the host, overwriting the contents of h_arr
cudaMemcpy(num, &d_num[0], num.GetSize()*sizeof(int), cudaMemcpyDeviceToHost);
std::cout << num << "\n";
// free GPU memory allocation and exit
cudaFree(d_num);
return 0;
}
The problem I encounter is with cudaMemcpyDeviceToHost. It does not really copy the device vector to the num vector as can be seen from the output.
How should I deal with that? (Please be explicit, I am fairly new to CUDA).
This will create a valid pointer to the first element of the vector num:
cudaMemcpy(d_num, &num[0], num.GetSize()*sizeof(int), cudaMemcpyHostToDevice);
^^^^^^^
This will not:
cudaMemcpy(num, &d_num[0], num.GetSize()*sizeof(int), cudaMemcpyDeviceToHost);
^^^
The name of your Vector object is not a pointer to its first data element. Instead, you should write that line in a similar fashion to the first one you wrote, like this:
cudaMemcpy(&num[0], d_num, num.GetSize()*sizeof(int), cudaMemcpyDeviceToHost);
However this by itself is not a fix. Note that d_num is not a Vector, but is already a pointer, so we can use it directly in these operations. Although it is not wrong to use &(d_num[0]), it is unnecessary to do so.
Because d_num is not a Vector (as you have allocated it - it is a bare pointer to a set of int quantities), your usage of Vector methods in the kernel is also broken. If you want to use Vector methods in the kernel, you will need to pass it an actual Vector object, not just the data. Since passing an object will require device data handling within the object (data accessible on the host is not accessible on the device, and vice-versa), it is an extensive re-write of your Vector class. I've made a limited attempt at that, showing one possible way forward. The basic methodology (ie. one possible approach) is as follows:
The object will contain pointers to both a host copy of the data and a device copy of the data.
At object instantiation, we will allocate both, and initially set our "reference" pointer to point to the host copy.
Prior to usage on the device, we must copy the host data to the device data, and the to_device() method is used for this purpose. This method also switches our "reference" pointer (mData) to refer to the device-side copy of the Vector data.
In addition to copying host data to device data "internal" to the object, we must make the object itself usable on the device. For this, we copy the object itself via pointer to a device-side copy (d_num).
We can then use the object in the usual way on the device, for those methods which have a __device__ decoration.
After completion of the kernel, we must update the host copy of the data and switch our "reference" pointer back to the host data. The to_host() method is provided for this purpose.
Thereafter the object can be used again in host code, reflecting the data changes if any which occurred in the kernel.
Here is a worked example:
$ cat t101.cu
#include <iostream>
#include <cmath>
#include <iostream>
#include <cassert>
template <typename T>
class Vector
{
private:
T* mData, *hData, *dData; // data stored in vector
int mSize; // size of vector
public:
Vector(const Vector& otherVector); // Constructor
Vector(int size); // Constructor
~Vector(); // Destructor
__host__ __device__ int GetSize() const; // get size of the vector
__host__ __device__ T& operator[](int i); // see element
// change element i
__host__ __device__ void set(size_t i, T value) {
mData[i] = value;
};
__host__ __device__ Vector<T>& operator=(const Vector<T>& otherVector);
void to_device();
void to_host();
template <class S> // output vector
friend std::ostream& operator<<(std::ostream& output, Vector<S>& v);
};
// Overridden copy constructor
// Allocates memory for new vector, and copies entries of other vector into it
template <typename T>
Vector<T>::Vector(const Vector& otherVector)
{
mSize = otherVector.GetSize();
hData = new T [mSize];
cudaMalloc(&dData, mSize*sizeof(T));
mData = hData;
for (int i=0; i<mSize; i++)
{
mData[i] = otherVector.mData[i];
}
}
// Constructor for vector of a given size
// Allocates memory, and initialises entries to zero
template <typename T>
Vector<T>::Vector(int size)
{
assert(size > 0);
mSize = size;
hData = new T [mSize];
cudaMalloc(&dData, mSize*sizeof(T));
mData = hData;
for (int i=0; i<mSize; i++)
{
mData[i] = 0.0;
}
}
// Overridden destructor to correctly free memory
template <typename T>
Vector<T>::~Vector()
{
delete[] hData;
if (dData) cudaFree(dData);
}
// Method to get the size of a vector
template <typename T>
__host__ __device__
int Vector<T>::GetSize() const
{
return mSize;
}
// Overloading square brackets
// Note that this uses `zero-based' indexing, and a check on the validity of the index
template <typename T>
__host__ __device__
T& Vector<T>::operator[](int i)
{
assert(i > -1);
assert(i < mSize);
return mData[i];
}
// Overloading the assignment operator
template <typename T>
__host__ __device__
Vector<T>& Vector<T>::operator=(const Vector<T>& otherVector)
{
assert(mSize == otherVector.mSize);
for (int i=0; i<mSize; i++)
{
mData[i] = otherVector.mData[i];
}
return *this;
}
// Overloading the insertion << operator
// not callable on the device!
template <typename T>
std::ostream& operator<<(std::ostream& output, Vector<T>& v) {
for (int i=0; i<v.mSize; i++) {
output << v[i] << " ";
}
return output;
}
template <typename T>
void Vector<T>::to_device(){
cudaMemcpy(dData, hData, mSize*sizeof(T), cudaMemcpyHostToDevice);
mData = dData;
}
template <typename T>
void Vector<T>::to_host(){
cudaMemcpy(hData, dData, mSize*sizeof(T), cudaMemcpyDeviceToHost);
mData = hData;
}
__global__ void alpha(Vector<int> *d_num)
{
d_num->set(0,100);
d_num->set(2,11);
(*d_num)[1] = 50;
}
int main()
{
Vector<int> num(10);
for (int i=0; i < num.GetSize(); ++i) num.set(i,i); // initialize elements to 0:9
std::cout << "Size of vector: " << num.GetSize() << "\n";
std::cout << num << "\n"; // print vector
Vector<int> *d_num;
cudaMalloc(&d_num, sizeof(Vector<int>));
num.to_device();
cudaMemcpy(d_num, &(num), sizeof(Vector<int>), cudaMemcpyHostToDevice);
// launch the kernel
alpha<<<1,1>>>(d_num);
// copy the modified data back to the host and point mData at the host copy again
num.to_host();
std::cout << num << "\n";
// free the device-side copy of the object and exit
cudaFree(d_num);
return 0;
}
$ nvcc -arch=sm_61 -o t101 t101.cu
$ cuda-memcheck ./t101
========= CUDA-MEMCHECK
Size of vector: 10
0 1 2 3 4 5 6 7 8 9
100 50 11 3 4 5 6 7 8 9
========= ERROR SUMMARY: 0 errors
$
Notes:
According to my testing, your posted code had various compile errors so I had to make other changes to your Vector class just to get it to compile.
Passing an object by value to the kernel will invoke the copy constructor, and subsequently the destructor, which makes things more difficult, therefore I have elected to pass the object via pointer (which is how you originally had it), to avoid this.
Your kernel call is launching 100 threads. Since they are all doing precisely the same thing, without any read activity going on, there's nothing particularly wrong with this, but I have changed it to just a single thread. It still demonstrates the same capability.
It is not just cudaMemcpyDeviceToHost part that you're having trouble with.
Vector<int> num(10);
Vector<int>* d_num;
cudaMalloc(&d_num, num.GetSize()*sizeof(int));
This will allocate 40 bytes of CUDA global memory (assuming sizeof(int) is 4), pointed to by d_num, which has type Vector<int>*. I don't think you are expecting the Vector<int> object itself to be 40 bytes.
Let's try another way.
cudaMalloc(&d_num, sizeof(Vector<int>));
cudaMalloc(&d_num->mData, num.GetSize()*sizeof(int)); // assume mData is a public attribute
Unfortunately, the second line will cause a segmentation fault, because you are accessing device memory from host code (d_num->mData).
So your implementation of the Vector class has several flaws. If you're planning to have a fixed-size array, just declare d_num as a pointer.
int* d_num;
cudaMalloc(&d_num, num.GetSize()*sizeof(int));
cudaMemcpy(d_num, &num[0], num.GetSize()*sizeof(int), cudaMemcpyHostToDevice);
// .. some kernel operations
cudaMemcpy(&num[0], d_num, num.GetSize()*sizeof(int), cudaMemcpyDeviceToHost);
Thrust is a library written for CUDA, and it has vectors. http://docs.nvidia.com/cuda/thrust/
Maybe it has all the functions you need, so why reinvent the wheel if you don't have to?

HashTable in C++

I need to implement a HashTable in C++. I thought of using an array.
But I don't know exactly how to create an array of fixed size.
Let's say that my class is named HT.
In the constructor I want to specify the array size, but I don't know how.
I have the members size_type size; and string [] t; in the HT header file.
How can I specify the size of t from the constructor?
HT(size_type s):size(s) {
}
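If it is not possible, what data structure should I use to implement a hash table?
In the constructor for the HT class, pass in (or default to 0) a size variable (s) to specify the size of the array. Then set t to a new string array of size s. Note that t has to be declared as a pointer (string* t) for this to work; string [] t is not a valid member declaration in C++.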
So something like:
HT::HT(size_type s) : size(s)
{
t = new string[s]; // t is a string*; release it with delete[] t in the destructor
}
You could do as std::array does and make the size a compile-time parameter.
If not, there is really no use trying to avoid std::vector, since you'll be doing dynamic allocation no matter what.
So, while you could
struct HT
{
HT(size_t size) : _size(size), _data(new std::string[size]) {}
private:
size_t const _size;
std::unique_ptr<std::string[]> _data;
};
It's only making your class more complex, less flexible and generally less elegant, so I'd go with vector:
#include <memory>
using namespace std;
struct HT
{
HT(size_t size) : _size(size), _data(new std::string[size]) {}
private:
size_t const _size;
std::unique_ptr<std::string[]> _data;
};
#include <vector>
struct HT2
{
HT2(size_t size) : _data(size) {}
private:
std::vector<std::string> _data;
};
int main()
{
HT table1(31);
HT2 table2(31);
}
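Since the original goal was a hash table rather than just a sized array, here is a rough sketch of how the vector-based storage could be put to use. It is a minimal separate-chaining set of strings; the class name, the member names, and the choice of std::hash are all illustrative and not taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Minimal separate-chaining string set. The number of buckets is fixed at
// construction time, as the question asks; each bucket is a small vector.
class StringHashSet {
public:
    explicit StringHashSet(std::size_t bucketCount) : mBuckets(bucketCount) {
        assert(bucketCount > 0);
    }
    // Returns true if the key was inserted, false if it was already present.
    bool insert(const std::string& key) {
        std::vector<std::string>& bucket = mBuckets[indexFor(key)];
        for (const std::string& k : bucket) {
            if (k == key) return false;
        }
        bucket.push_back(key);
        return true;
    }
    bool contains(const std::string& key) const {
        const std::vector<std::string>& bucket = mBuckets[indexFor(key)];
        for (const std::string& k : bucket) {
            if (k == key) return true;
        }
        return false;
    }
private:
    // Maps a key to a bucket index in [0, bucket count).
    std::size_t indexFor(const std::string& key) const {
        return std::hash<std::string>()(key) % mBuckets.size();
    }
    std::vector<std::vector<std::string>> mBuckets;
};
```

Because the buckets live in a std::vector, the class needs no destructor, copy constructor, or assignment operator of its own.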
Most suggestions seem to assume it's ok to implement your hash table container class in terms of the standard library. I wonder exactly what your situation is; how did it come about, this "need" to implement a primitive container class? Is it really cool if you depend on another library?
Everyone else seems to think so, though. I guess std is really a fundamental component of the C++ language, now....
Looking at other answers, I see std::vector, std::string, std::unique_ptr...
But the road doesn't end there. Not even close.
#include <unordered_map>
#include <string>
#include <iostream>
template <typename T>
class CHashTable {
typedef std::string KEYTYPE;
struct HASH_FUNCTOR {
size_t operator ()(const KEYTYPE& key) const {
return CHashTable::MyAmazingHashFunc(key);
} };
typename std::unordered_map<KEYTYPE, T, HASH_FUNCTOR> m_um;
public:
static size_t MyAmazingHashFunc(const KEYTYPE& key) {
size_t h = key.length();
for(auto c : key) {
h = h*143401 + static_cast<size_t>(c)*214517 + 13;
}
h = (~h << (sizeof(h)*4)) + (h >> (sizeof(h)*4));
return h;
}
template <typename KT>
T& operator [] (const KT& key) {
return m_um[KEYTYPE(key)];
}
template <typename KT>
const T& operator [] (const KT& key) const {
return m_um.at(KEYTYPE(key));
}
void DeleteAll() {
m_um.clear();
}
template <typename KT>
void Delete(const KT& key) {
m_um.erase(KEYTYPE(key));
}
template <typename KT>
bool Exists(const KT& key) const {
const auto fit = m_um.find(KEYTYPE(key));
return fit != m_um.end();
}
};
int main() {
CHashTable<int> ht;
// my Universal Translator, a "WIP"
ht["uno"] = 1;
ht["un"] = 1;
ht["one"] = 1;
ht["dos"] = 2;
ht["deux"] = 2;
ht["two"] = 2;
const char* key = "deux";
int value = ht[key];
std::cout << '[' << key << "] => " << value << std::endl;
key = "un";
bool exists = ht.Exists(key);
std::cout << '[' << key << "] "
<< (exists ? "exists" : "does not exist") << std::endl;
key = "trois";
exists = ht.Exists(key);
std::cout << '[' << key << "] "
<< (exists ? "exists" : "does not exist") << std::endl;
return 0;
}
main()'s output:
[deux] => 2
[un] exists
[trois] does not exist
And that's not even the end of the Hash Table std:: Highway! The end is abrupt, at a class that just publicly inherits from std::unordered_map. But I would never suggest THAT in an Answer because I don't want to come across as a sarcastic smartass.

Array with undefined size as Class-member

I'm searching for a way to define an array as a class-member with an undefined size (which will be defined on initialization).
class MyArrayOfInts {
private:
int[] array; // should declare the array with an (yet) undefined length
public:
MyArrayOfInts(int);
int Get(int);
void Set(int, int);
};
MyArrayOfInts::MyArrayOfInts(int length) {
this->array = int[length]; // defines the array here
}
int MyArrayOfInts::Get(int index) {
return this->array[index];
}
void MyArrayOfInts:Set(int index, int value) {
this->array[index] = value;
}
How can I achieve this behaviour ?
Why not just use std::vector<int>?
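For example, the class from the question could be sketched like this. Using std::size_t for lengths and indices and at() for bounds checking are choices made for this example, not something the question asked for, and Size() is an added convenience.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

class MyArrayOfInts {
private:
    std::vector<int> array;  // the length is chosen at construction time
public:
    // Creates `length` elements, all initialized to 0.
    explicit MyArrayOfInts(std::size_t length) : array(length) {}
    // at() throws std::out_of_range for an invalid index.
    int Get(std::size_t index) const { return array.at(index); }
    void Set(std::size_t index, int value) { array.at(index) = value; }
    std::size_t Size() const { return array.size(); }
};
```

Usage is then MyArrayOfInts a(5); a.Set(3, 42); and the memory is released automatically when a goes out of scope.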
Proof Of Concept
Ok, inspired by UncleBens' challenge here, I came up with a Proof-Of-Concept (see below) that lets you actually do:
srand(123);
for (int i=0; i<10; i++)
{
size_t N = rand() % DEMO_MAX; // capped for demo purposes
std::auto_ptr<iarray> dyn(make_dynamic_array(N));
exercise(*dyn);
}
It revolves around a template trick in factory<>::instantiate that actually uses a compile-time meta-binary-search to match the specified (runtime) dimension to a range of explicit static_array class template instantiations.
I feel the need to repeat that this is not good design. I provide the code sample only to show the limits of what can be done, with reasonable effort, to achieve the actual goal of the question. You can see the drawbacks:
the compiler is burdened with a boatload of useless static types and creates classes that are so big that they become a performance liability or a reliability hazard (stack allocation anyone? -> we're on 'stack overflow' already :))
at DEMO_MAX = 256, g++ -Os will actually emit 258 instantiations of factory<>; g++ -O4 will keep 74 of those, inlining the rest[2]
compilation doesn't scale well: at DEMO_MAX = RAND_MAX compilation takes about 2m9s to... run out of memory on a 64-bit 8GB machine; at RAND_MAX>>16 it takes over 25 minutes to possibly compile (?) while nearly running out of memory. It would really require some amount of ugly manual optimization to remove these limits - I haven't gone so insane as to actually do that work, if you'll excuse me.
on the upside, this sample demonstrates the arguably sane range for this class (0..256) and compiles in only 4 seconds and 800Kb on my 64-bit linux. See also a down-scaled, ANSI-proof version at codepad.org
[2] established that with objdump -Ct test | grep instantiate | cut -c62- | sort -k1.10n
Show me the CODE already!
#include <iostream>
#include <memory>
#include <algorithm>
#include <iterator>
#include <stdexcept>
struct iarray
{
typedef int value_type;
typedef value_type* iterator;
typedef value_type const* const_iterator;
typedef value_type& reference;
typedef value_type const& const_reference;
virtual size_t size() const = 0;
virtual iterator begin() = 0;
virtual const_iterator begin() const = 0;
// completely unoptimized plumbing just for demonstration purps here
inline iterator end() { return begin()+size(); }
inline const_iterator end() const { return begin()+size(); }
// boundary checking would be 'gratis' here... for compile-time constant values of 'index'
inline const_reference operator[](size_t index) const { return *(begin()+index); }
inline reference operator[](size_t index) { return *(begin()+index); }
//
virtual ~iarray() {}
};
template <size_t N> struct static_array : iarray
{
static const size_t _size = N;
value_type data[N];
virtual size_t size() const { return _size; }
virtual iterator begin() { return data; }
virtual const_iterator begin() const { return data; }
};
#define DEMO_MAX 256
template <size_t PIVOT=DEMO_MAX/2, size_t MIN=0, size_t MAX=DEMO_MAX>
struct factory
/* this does a binary search in a range of static types
*
* due to the binary search, this will require at most about log2(MAX) levels
* of recursion.
*
* If the parameter (size_t n) is a compile time constant expression,
* together with automatic inlining, the compiler will be able to optimize
* this all the way to simply returning
*
* new static_array<n>()
*
* TODO static assert MIN<=PIVOT<=MAX
*/
{
inline static iarray* instantiate(size_t n)
{
if (n>MAX || n<MIN)
throw std::range_error("unsupported size");
if (n==PIVOT)
return new static_array<PIVOT>();
if (n>PIVOT)
return factory<(PIVOT + (MAX-PIVOT+1)/2), PIVOT+1, MAX>::instantiate(n);
else
return factory<(PIVOT - (PIVOT-MIN+1)/2), MIN, PIVOT-1>::instantiate(n);
}
};
iarray* make_dynamic_array(size_t n)
{
return factory<>::instantiate(n);
}
void exercise(iarray& arr)
{
int gen = 0;
for (iarray::iterator it=arr.begin(); it!=arr.end(); ++it)
*it = (gen+=arr.size());
std::cout << "size " << arr.size() << ":\t";
std::copy(arr.begin(), arr.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
}
int main()
{
{ // boring, oldfashioned method
static_array<5> i5;
static_array<17> i17;
exercise(i5);
exercise(i17);
}
{ // exciting, newfangled, useless method
for (int n=0; n<=DEMO_MAX; ++n)
{
std::auto_ptr<iarray> dyn(make_dynamic_array(n));
exercise(*dyn);
}
try { make_dynamic_array(-1); } catch (const std::range_error&) { std::cout << "range error OK" << std::endl; } // -1 wraps to a huge size_t; catch by const reference to avoid slicing
try { make_dynamic_array(DEMO_MAX + 1); } catch (const std::range_error&) { std::cout << "range error OK" << std::endl; }
return 0;
srand(123);
for (int i=0; i<10; i++)
{
size_t N = rand() % DEMO_MAX; // capped for demo purposes
std::auto_ptr<iarray> dyn(make_dynamic_array(N));
exercise(*dyn);
}
}
return 0;
}
Declare it as:
int* array;
Then you can initialize it this way:
MyArrayOfInts::MyArrayOfInts(int length) {
this->array = new int[length];
}
Don't forget to free the memory in the destructor:
MyArrayOfInts::~MyArrayOfInts() {
delete [] this->array;
}
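One thing to watch with this approach: once the class owns a raw pointer, the compiler-generated copy constructor and copy assignment copy only the pointer, so two objects would end up calling delete [] on the same block. Below is a minimal sketch of the missing pieces. The length member, size(), operator[] and usage_example() are assumptions added for illustration; they are not part of the declaration above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

class MyArrayOfInts {
public:
    // Allocate n ints; the trailing () value-initializes them to 0.
    explicit MyArrayOfInts(std::size_t n) : length(n), array(new int[n]()) {}

    // Copy constructor: deep-copy the elements instead of sharing the pointer.
    MyArrayOfInts(const MyArrayOfInts& other)
        : length(other.length), array(new int[other.length]) {
        std::copy(other.array, other.array + other.length, array);
    }

    // Copy assignment via copy-and-swap: 'other' is already a private copy,
    // and its destructor releases our old block when it goes out of scope.
    MyArrayOfInts& operator=(MyArrayOfInts other) {
        std::swap(length, other.length);
        std::swap(array, other.array);
        return *this;
    }

    ~MyArrayOfInts() { delete[] array; }

    std::size_t size() const { return length; }
    int& operator[](std::size_t i) { return array[i]; }
    const int& operator[](std::size_t i) const { return array[i]; }

private:
    std::size_t length;  // hypothetical member, not in the original declaration
    int* array;
};

// Hypothetical usage: copies are independent of each other.
inline void usage_example() {
    MyArrayOfInts a(3);
    a[0] = 7;
    MyArrayOfInts b(a);  // deep copy
    b[0] = 9;
    assert(a[0] == 7 && b[0] == 9);
    a = b;               // copy assignment
    assert(a[0] == 9);
}
```

If copying is not needed at all, declaring the copy constructor and copy assignment private (or deleted in C++11) is a simpler way to rule out the double delete.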
Is the class declaration complete? If the constructor of the class takes the size of the array as an argument and you don't want to resize the array, then making the class a template on the size can give you the same behaviour, with the size fixed at compile time.
Now, we don't have to pass the size of the array as argument to the constructor.
#include <cstddef> // size_t
#include <iostream> // std::cout
#include <numeric> // std::iota
template<size_t size>
class MyClass
{
public:
MyClass() { std::iota(arr_m, arr_m + size, 1); }
int operator[](int index) const
{
return arr_m[index];
}
int& operator[](int index)
{
return arr_m[index];
}
void Set(size_t index, int value)
{
arr_m[index] = value;
}
private:
int arr_m[size];
};
int main()
{
{
MyClass<5> obj;
std::cout << obj[4] << std::endl;
}
{
MyClass<4> obj;
std::cout << obj[3] << std::endl;
obj.Set(3, 30);
std::cout << obj[3] << std::endl;
}
}
In response to critics in the comments
I think many people fail to notice a crucial given in the question: since the question asks specifically how to declare an int[N] array inside a struct, it follows that each N will yield a distinct static type to the compiler.
As much as my approach is being 'critiqued' for this property, I did not invent it: it is a requirement from the original question. I can join the chorus saying: "just don't" or "impossible" but as a curious engineer I feel I'm often more helped by defining the boundaries of just what is in fact still possible.
I'll take a moment to come up with a sketch of an answer to mainly UncleBen's interesting challenge. Of course I could hand-wave 'just use template metaprogramming', but it sure would be more convincing and fun to come up with a sample[1]
[1] only to follow that sample with a big warning: don't do this in actual life :)
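To make the "each N yields a distinct static type" point concrete, here is a minimal sketch. It assumes a C++11 compiler (for static_assert and std::is_same), and int_block is a hypothetical name used only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical struct holding an int[N] member, as the question asks for.
template <std::size_t N>
struct int_block {
    int data[N];
};

// Each N produces an unrelated type: the size is part of the type itself.
static_assert(!std::is_same<int_block<4>, int_block<5> >::value,
              "different N, different type");

// The storage lives inside the object, so its size is fixed at compile time.
static_assert(sizeof(int_block<5>) >= 5 * sizeof(int),
              "the array is embedded in the struct");
```

A size that is only known at run time therefore cannot pick such a type directly; it has to be mapped onto one of a finite set of types that were instantiated at compile time.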
The TR1 (or c++0x) type std::array does exactly that; you'll need to make the containing class generic to cater for the array size:
template <size_t N> struct MyArrayOfInts : MyArrayOfIntsBase /* for polymorphism */
{
std::array<int, N> _data;
explicit MyArrayOfInts(const int data[N])
{
std::copy(data, data+N, _data.begin()); // the destination must be an iterator, not the array object
}
};
You can make the thing easier to work with by adding a factory function template that deduces N from the array argument:
template <size_t N>
MyArrayOfInts<N> MakeMyArray(const int (&data)[N])
{ return MyArrayOfInts<N>(data); }
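A self-contained sketch of how the deduction plays out, assuming C++11 for std::array. The polymorphic base class is left out to keep it short, the constructor takes the array by reference so N is checked by the compiler, and deduced_size_demo() is a hypothetical helper added for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

template <std::size_t N>
struct MyArrayOfInts {
    std::array<int, N> _data;

    // Taking a reference to int[N] keeps the length in the parameter type.
    explicit MyArrayOfInts(const int (&data)[N]) {
        std::copy(data, data + N, _data.begin());
    }
};

// N is deduced from the argument, so callers never spell it out.
template <std::size_t N>
MyArrayOfInts<N> MakeMyArray(const int (&data)[N]) {
    return MyArrayOfInts<N>(data);
}

// Hypothetical helper: returns the size deduced for a three-element array.
inline std::size_t deduced_size_demo() {
    const int raw[] = {1, 2, 3};
    MyArrayOfInts<3> a = MakeMyArray(raw);  // N deduced as 3
    return a._data.size();
}
```

Note that the deduction only works for real arrays whose length is visible to the compiler; a plain int* argument will not match MakeMyArray.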
I'm working on this too, to solve a dynamic array problem, and I found the answer provided was sufficient to resolve it.
This is tricky because, from my reading, an array declared inside a function does not outlive that function, and arrays have a lot of strange nuances. However, if you are trying to make a dynamic array without being allowed to use a vector, I believe this is the best approach.
Other approaches, such as calling new and delete repeatedly on the same pointed-to array, can lead to a double free, which is undefined behavior, so what actually happens depends on the compiler.
class arrayObject
{
public:
arrayObject();
~arrayObject();
int createArray(int firstArray[]);
void getSize();
void getValue();
void deleting();
// private:
int *array;
int size;
int counter;
int number;
};
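arrayObject::arrayObject()
: size(0), counter(0), number(0) // initialize before use: reading an uninitialized size is undefined behavior
{
this->array = new int[size]; // size is 0 here, so this allocates an empty array
}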
arrayObject::~arrayObject()
{
delete [] this->array;
}