I have a variable length data structure, a multi-dimensional iterator:
class Iterator
{
public:
static Iterator& init(int dim, int* sizes, void* mem)
{
return *(new (mem) Iterator(dim, sizes));
}
static size_t alloc_size(int dim)
{
return sizeof(Iterator) + sizeof(int) * 2 * dim;
}
void operator++()
{
// increment counters, update pos_ and done_
}
bool done() const { return done_; }
int pos() const { return pos_; }
private:
Iterator(int dim, int* sizes) : dim_(dim), pos_(0), done_(false)
{
for (int i=0; i<dim_; ++i) size(i) = sizes[i];
for (int i=0; i<dim_; ++i) counter(i) = 0;
}
int dim_;
int pos_;
bool done_;
int& size (int i) { return reinterpret_cast<int*>(this+1)[i]; }
int& counter(int i) { return reinterpret_cast<int*>(this+1)[dim_+i]; }
};
The dimensionality of the iterator is not known at compile time but probably small, so I allocate memory for the iterator with alloca:
void* mem = alloca(Iterator::alloc_size(dim));
for (Iterator& i = Iterator::init(dim, sizes, mem); !i.done(); ++i)
{
// do something with i.pos()
}
Is there a more elegant way of allocating memory for the iterator? I am aware of the fact that upon returning from a function, its stack is unwound, thus alloca must be used in the caller's stack frame (see e.g. here). This answer suggests that the allocation be performed in a default parameter:
static Iterator& init(int dim, int* sizes, void* mem = alloca(alloc_size(dim)));
However elegant, this solution does not help me: Default argument references parameter 'dim'. Any suggestion for a nice solution?
Unfortunately, given that dim is a run-time value, there isn't any way to do this other than with a macro:
#define CREATE_ITERATOR(dim, sizes) \
Iterator::init(dim, sizes, alloca(Iterator::alloc_size(dim)))
You could have the dimension parameter as a template argument.
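A minimal sketch of the template-argument idea, assuming the set of dimensionalities that can occur is small and known in advance. FixedDimIterator and count_positions are hypothetical names, not part of the code in the question. Because dim is only known at run time, it still has to be mapped onto the instantiated templates by hand:

```cpp
#include <array>
#include <cassert>

// Hypothetical sketch: FixedDimIterator and count_positions are illustrative
// names, not part of the code in the question.
template <int Dim>
class FixedDimIterator {
public:
    explicit FixedDimIterator(const int* sizes) {
        for (int i = 0; i < Dim; ++i) {
            sizes_[i] = sizes[i];
            if (sizes[i] <= 0) done_ = true;  // an empty extent leaves nothing to visit
        }
    }
    void operator++() {
        ++pos_;
        // Odometer-style increment: the last dimension varies fastest.
        for (int i = Dim - 1; i >= 0; --i) {
            if (++counters_[i] < sizes_[i]) return;
            counters_[i] = 0;
        }
        done_ = true;  // every counter wrapped around
    }
    bool done() const { return done_; }
    int pos() const { return pos_; }
private:
    std::array<int, Dim> sizes_{};     // storage lives inside the object
    std::array<int, Dim> counters_{};
    int pos_ = 0;
    bool done_ = false;
};

// Counts the visited positions for one compile-time dimensionality.
template <int Dim>
int count_positions(const int* sizes) {
    int n = 0;
    for (FixedDimIterator<Dim> it(sizes); !it.done(); ++it) ++n;
    return n;
}

// A run-time dim is dispatched onto the supported instantiations.
int count_positions(int dim, const int* sizes) {
    switch (dim) {
        case 1: return count_positions<1>(sizes);
        case 2: return count_positions<2>(sizes);
        case 3: return count_positions<3>(sizes);
        default: return -1;  // dimensionality not instantiated in this sketch
    }
}
```

The trade-off is that every supported dimensionality gets its own instantiation, so this only pays off when the range of dim is small.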
My suggestion might not be what you are looking for but why not have a create|make_iterator function that does the alloca call?
I wouldn't recommend using alloca at all. If dim is small, a fixed-size buffer inside the class is sufficient. If dim is large, the cost of a heap allocation is negligible compared to the other operations performed on your iterator (note that for very large dim values alloca may cause a stack overflow). You can choose between the fixed buffer and heap allocation at runtime, depending on dim.
So I'd recommend an approach similar to the small string optimization in std::string.
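The small-buffer idea could look roughly like the sketch below. SmallDimIterator and kInlineDims are assumed names, and the threshold of 8 dimensions is arbitrary. Sizes and counters live in an in-object array when they fit and in a std::vector otherwise, so the object can be copied normally and needs neither alloca nor placement new:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: SmallDimIterator and kInlineDims are assumed names;
// the threshold of 8 dimensions is arbitrary.
class SmallDimIterator {
public:
    static constexpr int kInlineDims = 8;

    SmallDimIterator(int dim, const int* sizes) : dim_(dim), done_(dim <= 0) {
        // Fall back to the heap only when the inline buffer is too small.
        if (dim_ > kInlineDims) heap_.resize(2 * static_cast<std::size_t>(dim_));
        for (int i = 0; i < dim_; ++i) {
            size(i) = sizes[i];
            counter(i) = 0;
            if (sizes[i] <= 0) done_ = true;  // an empty extent leaves nothing to visit
        }
    }
    void operator++() {
        ++pos_;
        // Odometer-style increment: the last dimension varies fastest.
        for (int i = dim_ - 1; i >= 0; --i) {
            if (++counter(i) < size(i)) return;
            counter(i) = 0;
        }
        done_ = true;
    }
    bool done() const { return done_; }
    int pos() const { return pos_; }
    bool on_heap() const { return !heap_.empty(); }
private:
    // Computed on demand instead of cached, so default copies stay valid.
    int* data() { return heap_.empty() ? inline_.data() : heap_.data(); }
    int& size(int i) { return data()[i]; }
    int& counter(int i) { return data()[dim_ + i]; }

    int dim_;
    int pos_ = 0;
    bool done_;
    std::array<int, 2 * kInlineDims> inline_{};  // sizes followed by counters
    std::vector<int> heap_;                      // used only for large dim
};
```

Usage then mirrors the loop in the question, for example `for (SmallDimIterator it(dim, sizes); !it.done(); ++it) { /* use it.pos() */ }`.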
Perhaps some kind of COW (copy-on-write, http://en.wikipedia.org/wiki/Copy-on-write) technique may also be useful for your iterators.
Note that this technique cannot be used with alloca, only with heap allocation. Moreover, it's almost impossible to copy or copy-initialize your iterators if they use alloca (at least without more and uglier macros).
Alloca is evil :)
I am interfacing with some C++ code that has a method providing a pointer and object size (some proprietary library that I can't change). The interface looks something like this:
float *arrayPtr();
int arraySize();
I have a class that needs to copy this data into a vector in order to extend its lifetime. When the pointer is not a nullptr, the constructor is fairly simple:
struct A {
std::vector<float> vec;
A(float *ptr, int size) : vec( ptr, std::next(ptr, size) ) {}
};
I am, however, a little unsure of how best to handle the initialization when ptr is a nullptr. I could default initialize, and then move everything to the constructor body, but that feels quite inefficient:
A(float *ptr, int size) {
if (ptr) {
vec = std::vector<float>( ptr, std::next(ptr, size));
}
}
Are there any other alternatives?
When ptr is a nullptr, I would like the vector to just be default initialized to an empty vector.
EDIT:
It just occurred to me that I should probably be doing this:
A(float *ptr, int size) : vec( ptr ? std::vector<float>(ptr, std::next(ptr, size)) : std::vector<float>() ) {}
But perhaps there is a better form??
This is not necessarily "better" (actually I think your original code is fine, since there is not really any cost to default initialization of a vector), but when you need to perform some logic for an initializer you can use a helper function:
struct A {
std::vector<float> vec;
A(float *ptr, int size) : vec( make_A_vector(ptr, size) ) {}
private:
static std::vector<float> make_A_vector(float const *begin, int size)
{
if ( size < 0 || (size > 0 && !begin) )
throw std::runtime_error("invalid array length for A");
if ( size == 0 )
return {};
return std::vector<float>(begin, begin + size);
}
};
Another common design is to keep your class simple and have construction logic entirely in a free function:
struct A
{
std::vector<float> vec;
};
inline A make_A(float const *ptr, int size)
{
// sanity check omitted for brevity
if ( size == 0 )
return A{};
return A{ std::vector<float>(ptr, ptr + size) };
}
It's an experience-based judgement call as to what would be overkill and what would be aesthetic :)
Look at what std::next and the std::vector constructor actually do: your first attempt works just fine, but only if size is reliably 0 whenever the pointer is a nullptr.
std::next(it, n)
it - Iterator to base position. ForwardIterator shall be at least a
forward iterator.
n - Number of element positions offset (1 by default).
This shall only be negative for random-access and bidirectional
iterators. difference_type is the numerical type that represents
distances between iterators of the ForwardIterator type.
If you can be sure that size is 0 whenever you get a nullptr, std::next returns the same position that your pointer points to.
std::vector
You use this ctor:
Constructs the container with the contents of the range [first, last).
If first and last are the same, you get an empty std::vector, so the nullptr is never a problem.
Regardless of these findings, you should ALWAYS check for nullptrs. For the sake of defensive programming: ALL INPUT IS EVIL!
You mentioned that the library is known for being error-prone. Save yourself a lot of trouble and check the integrity of every parameter you get out of it.
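The defensive check described above can be folded into one small helper. This is only a sketch: copy_or_empty is a hypothetical name, and treating a negative size like an empty array is an assumption rather than something the library documents:

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: returns an empty vector for a null pointer or a
// non-positive size, otherwise copies [ptr, ptr + size).
std::vector<float> copy_or_empty(const float* ptr, int size) {
    if (ptr == nullptr || size <= 0) {
        return {};
    }
    return std::vector<float>(ptr, std::next(ptr, size));
}
```

A constructor could then initialize its member with `vec(copy_or_empty(ptr, size))`, which keeps the member initializer list free of conditionals.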
One comment on your worry that your implementation might perform worse than another: run the candidate implementations in a loop with thousands or millions of iterations. Then you can judge whether the difference affects you.
I have come across some code which allocates a 2d array with following approach:
auto a = new int[10][10];
Is this a valid thing to do in C++? I have searched through several C++ reference books, and none of them mentions this approach.
Normally I would have done the allocation manually as follows:
int **a = new int *[10];
for (int i = 0; i < 10; i++) {
a[i] = new int[10];
}
If the first approach is valid, then which one is preferred?
The first example:
auto a = new int[10][10];
That allocates a multidimensional array or array of arrays as a contiguous block of memory.
The second example:
int** a = new int*[10];
for (int i = 0; i < 10; i++) {
a[i] = new int[10];
}
That is not a true multidimensional array. It is, in fact, an array of pointers and requires two indirections to access each element.
The expression new int[10][10] means to allocate an array of ten elements of type int[10], so yes, it is a valid thing to do.
The type of the pointer returned by new is int(*)[10]; one could declare a variable of such a type via int (*ptr)[10];.
For the sake of legibility, one probably shouldn't use that syntax, and should prefer to use auto as in your example, or use a typedef to simplify as in
using int10 = int[10]; // typedef int int10[10];
int10 *ptr;
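To make the type concrete, here is a small sketch (fill_and_sum is a hypothetical helper, not from the question) that checks the deduced type at compile time and shows that the whole block is released with a single delete[]:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: allocates `rows` rows of ten ints in one block,
// fills them, and returns the sum so the result can be checked.
int fill_and_sum(std::size_t rows) {
    auto grid = new int[rows][10]();  // one allocation, zero-initialized

    // new int[rows][10] yields a pointer to the first row, i.e. int (*)[10].
    static_assert(std::is_same<decltype(grid), int (*)[10]>::value,
                  "grid points to rows of type int[10]");
    // Each row is exactly ten ints wide, so rows are packed back to back.
    static_assert(sizeof(*grid) == 10 * sizeof(int), "row stride is 10 ints");

    int sum = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < 10; ++c) {
            grid[r][c] = static_cast<int>(r * 10 + c);
            sum += grid[r][c];
        }
    }
    delete[] grid;  // a single delete[] releases every row
    return sum;
}
```

Only the first dimension may be a run-time value here; the inner dimension has to be a compile-time constant.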
In this case, for small arrays, it is more efficient to allocate them on the stack. Perhaps even using a convenience wrapper such as std::array<std::array<int, 10>, 10>. However, in general, it is valid to do something like the following:
auto arr = new int[a][b];
Where a is a std::size_t and b is a constexpr std::size_t. This results in more efficient allocation, as there should be only one call to operator new[] with sizeof(int) * a * b as the argument, instead of a calls to operator new[] with sizeof(int) * b as the argument. As stated by Galik in his answer, there is also the potential for faster access times, due to better cache locality (the entire array is contiguous in memory).
However, the only reason I can imagine one using something like this would be with a compile-time-sized matrix/tensor, where all of the dimensions are known at compile time, but it allocates on the heap if it exceeds the stack size.
In general, it is probably best to write your own RAII wrapper class as follows (you would also need to add accessors for height/width and implement copy/move constructors and assignment, but the general idea is here):
template <typename T>
class Matrix {
public:
Matrix( std::size_t height, std::size_t width ) : m_height( height ), m_width( width )
{
m_data = new T[height * width]();
}
~Matrix() { delete[] m_data; m_data = nullptr; } // memory from new[] must be released with delete[]
public:
T& operator()( std::size_t x, std::size_t y )
{
// Add bounds-checking here depending on your use-case
// by throwing a std::out_of_range if x/y are outside
// of the valid domain.
return m_data[x + y * m_width];
}
const T& operator()( std::size_t x, std::size_t y ) const
{
return m_data[x + y * m_width];
}
private:
std::size_t m_height;
std::size_t m_width;
T* m_data;
};
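One way to avoid writing the copy/move operations by hand is to let std::vector own the storage. The following is a sketch of that variant, not the answer's original design: VecMatrix is an assumed name, and the bounds check is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical variant: std::vector owns the storage, so the compiler-generated
// copy, move, and destructor are already correct.
template <typename T>
class VecMatrix {
public:
    VecMatrix(std::size_t height, std::size_t width)
        : m_height(height), m_width(width), m_data(height * width) {}

    std::size_t height() const { return m_height; }
    std::size_t width() const { return m_width; }

    // x is the column, y is the row, matching the indexing in the class above.
    T& operator()(std::size_t x, std::size_t y) { return m_data[index(x, y)]; }
    const T& operator()(std::size_t x, std::size_t y) const { return m_data[index(x, y)]; }

private:
    std::size_t index(std::size_t x, std::size_t y) const {
        if (x >= m_width || y >= m_height) {
            throw std::out_of_range("VecMatrix index out of range");
        }
        return x + y * m_width;  // row-major layout in one contiguous block
    }

    std::size_t m_height;
    std::size_t m_width;
    std::vector<T> m_data;  // value-initialized elements
};
```

Copies are deep because std::vector copies its elements, so modifying a copy leaves the original untouched.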
As my usually used C++ compilers allow variable-length arrays (e.g. arrays depending on runtime size), I wonder if there is something like std::array with variable size? Of course std::vector is of variable size, but it allocates on heap, and reallocates on need.
I like to have a stack allocated array with size defined at runtime. Is there any std-template that may feature this? Maybe using std::vector with a fixed maximum size?
There are two proposals currently being worked on to bring run-time fixed size arrays to C++ which may be of interest to you:
Runtime-sized arrays with automatic storage duration. This would make runtime-sized arrays a language feature (like variable-length arrays in C99). So you could do:
void foo(std::size_t size) {
int arr[size];
}
C++ Dynamic Arrays. This would bring a new container to the library, std::dynarray, which is given a fixed size at construction. It is intended to be optimized to be allocated on the stack when possible.
void foo(std::size_t size) {
std::dynarray<int> arr(size);
}
These are both being worked on as part of an Array Extensions Technical Specification, which will be released alongside C++14.
UPDATE (25 Aug 2021): std::dynarray has still not been implemented. Please refer to "What is the status on dynarrays?"
As Daniel stated in the comment, size of the std::array is specified as a template parameter, so it cannot be set during runtime.
You can, though, give a std::vector a minimum capacity up front by calling reserve():
#include <cstdio>
#include <iostream>
#include <vector>
int main(int argc, char * argv[])
{
std::vector<int> a;
a.reserve(5); // capacity is now at least 5, size is still 0
std::cout << a.capacity() << "\n";
std::cout << a.size();
std::getchar();
}
But the vector's contents will still be stored on the heap, not on the stack. The compiler has to know how much space to reserve for a function's stack frame before it executes, so standard C++ offers no portable way to store variable-length data on the stack.
Maybe using std::vector with a fixed maximal size?
If boost is allowed then boost::container::static_vector and boost::container::small_vector are the closest I can think of
boost::container::static_vector<int, 1024> my_array;
boost::container::small_vector<int, 1024> my_vector;
static_vector is a sequence container like boost::container::vector with contiguous storage that can change in size, along with the static allocation, low overhead, and fixed capacity of boost::array.
The size of each object is still fixed but it can be worth it if the number of allocations is significant and/or the item count is small
If the vector can grow beyond the limit then just use boost::container::small_vector. The heap is only touched when the size is larger than the defined limit
small_vector is a vector-like container optimized for the case when it contains few elements. It contains some preallocated elements in-place, which can avoid the use of dynamic storage allocation when the actual number of elements is below that preallocated threshold.
If you use Qt then QVarLengthArray is another way to go:
QVarLengthArray is an attempt to work around this gap in the C++ language. It allocates a certain number of elements on the stack, and if you resize the array to a larger size, it automatically uses the heap instead. Stack allocation has the advantage that it is much faster than heap allocation.
Example:
int myfunc(int n)
{
QVarLengthArray<int, 1024> array(n + 1);
...
return array[n];
}
Some other similar solutions:
llvm::SmallVector
Facebook's folly::small_vector
Electronic Arts Standard Template Library's eastl::fixed_vector
If a 3rd party solution isn't allowed then you can roll your own solution by wrapping std::array in a struct to get a static vector
template<typename T, size_t N>
struct my_static_vector
{
explicit my_static_vector(size_t size) { } // ...
size_t size() const noexcept { return curr_size; }
// a static member function cannot be const-qualified
static constexpr size_t capacity() noexcept { return N; }
T& operator[](size_t pos) { return data[pos]; }
void push_back(const T& value) { data[curr_size++] = value; }
// ...
private:
std::array<T, N> data;
std::size_t curr_size = 0;
};
And if small_vector is required then you can use std::variant to contain both the my_static_vector and the vector
template<typename T, size_t N>
struct my_small_vector
{
explicit my_small_vector(size_t size) { } // ...
size_t size() const noexcept {
if (data.index() == 0) {
return std::get<0>(data).size();
} else {
return std::get<1>(data).size();
}
}
// not static: the answer depends on which alternative is active
size_t capacity() const noexcept {
if (data.index() == 0) {
return std::get<0>(data).capacity();
} else {
return std::get<1>(data).capacity();
}
}
// ...
private:
std::variant<my_static_vector<T, N>, std::vector<T>> data;
};
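Another option that stays within the standard library, assuming a C++17 toolchain, is a std::pmr::vector backed by a std::pmr::monotonic_buffer_resource that hands out memory from a local buffer first. The sketch below is only an illustration: sum_upto is a hypothetical function, and the 1024-byte buffer size is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Hypothetical example: sums 0 .. n-1 using a vector whose storage comes from
// a stack buffer when it fits. The 1024-byte buffer size is arbitrary.
long sum_upto(std::size_t n) {
    alignas(std::max_align_t) std::byte buffer[1024];
    // Serves allocations from `buffer`; once the buffer is exhausted it falls
    // back to the default (heap) memory resource.
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof buffer);

    std::pmr::vector<long> values(n, &pool);
    std::iota(values.begin(), values.end(), 0L);
    return std::accumulate(values.begin(), values.end(), 0L);
}
```

The vector must not outlive the buffer and the resource, which is why all three are locals declared in this order. A monotonic resource also never reuses freed memory, so it suits containers that are sized once rather than ones that grow repeatedly.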
I am working with a list of arrays in C++, each held in an object, and I want to split some of them.
They are allocated dynamically.
I want to do the split in constant time, as is theoretically possible:
from
[ pointer, size1 ]
to
[ pointer, size2 ]; [ other array ]; [ pointer + size2, size1-size2 ]
(+ other data each time)
I tried to use malloc and simply create a new pointer offset by the size.
As could be expected, I got an error when the memory was freed automatically.
I also tried a realloc starting at the second address, but as "what is the difference between malloc and calloc" on this site already told me, that is not possible.
Is there a way to avoid copying the second part and to define the pointer correctly?
Having a linear cost where I know constant time is possible is frustrating.
class TableA
{
public:
// (constructor)
void divide(int size); // the one I am trying to implement
// (other members, getters, setters)
private:
Evenement* _el;
vector<bool>** _old; // the arrays in question
int _size;
};
Nothing really complicated.
Basically, the malloc library can't cope with you mallocing a chunk of memory and then freeing it in slices.
You can do what you want, but you must only free the memory all at once right at the end using the original pointer that malloc handed you.
e.g.
int* p = malloc(9 * sizeof(int));
int* q = p + 3;
int* r = p + 6;
// Now we have three pointers to three arrays of three integers.
// Do stuff with p, q, r
free(p); // p is the only pointer it is valid to free.
By the way, if this is really about C++, there are probably standard C++ data structures you can use.
I don't think you can free only part of a dynamic array by keeping track of pointers and lengths. However, you can fake it by writing a new class that manages the original array: allocate it however you want and have it owned by a std::shared_ptr.
Each slice is simply an object containing a shared_ptr to the memory, a plain pointer to its first element, and its size. When a slice goes out of scope, the shared_ptr's reference count is decremented, and when no slices of the memory are in use anymore, the memory is freed.
You need to be careful with this, though, as multiple objects may reference the same memory, but there are ways around that (for example, marking the original as invalid with a bool after splitting).
[Edit] Below is a very basic implementation of this idea. A split() operation can easily be implemented in terms of two slice() operations. I'm not sure how you would fit this into your example above, as I'm not sure how you manage your vector<bool> **, but if you want to be able to split a vector<bool>, you could instantiate a SharedVector<bool>, or if you have an array of vector<bool>, make a SharedVector<vector<bool>> instead.
#ifndef SHARED_VECTOR_H
#define SHARED_VECTOR_H
#include <memory>
#include <cassert>
#include <cstddef>
template <typename T>
class SharedVector {
std::shared_ptr<T> _data;
T *_begin;
size_t _size;
// perhaps add size_t capacity if need for limited resizing arises.
public:
explicit SharedVector(size_t const size)
: _data(new T[size], []( T *p ) { delete[] p; }), _begin(_data.get()), _size(size)
{}
// standard copy and move constructors work fine
// take the shared_ptr by const reference; it is copied once into _data
SharedVector(std::shared_ptr<T> const &data, T *begin, size_t size)
: _data(data), _begin(begin), _size(size)
{}
T& operator[] (const size_t nIndex) {
assert(nIndex < _size);
return _begin[nIndex];
}
T const & operator[] (const size_t nIndex) const {
assert(nIndex < _size);
return _begin[nIndex]; // _begin is a raw pointer, so there is no get()
}
size_t size() const {
return _size;
}
// returns a view of the half-open range [begin, end) that shares ownership
SharedVector<T> slice(size_t const begin, size_t const end) {
assert(begin <= end && end <= _size);
return SharedVector<T>(_data, _begin + begin, end - begin);
}
T *begin() {
return _begin;
}
T *end() {
return _begin + _size;
}
};
#endif
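The split itself can be sketched independently of the class above. IntSlice, make_slice, and split are hypothetical names used only for this illustration. Both halves share ownership of the one allocation, so the split is O(1) and the memory is freed when the last slice goes away:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical minimal slice type: shared ownership of the whole block plus a
// view (first, count) onto part of it.
struct IntSlice {
    std::shared_ptr<int> owner;
    int* first;
    std::size_t count;
};

// Allocates n zero-initialized ints owned by a shared_ptr with an array deleter.
IntSlice make_slice(std::size_t n) {
    std::shared_ptr<int> owner(new int[n](), std::default_delete<int[]>());
    int* first = owner.get();
    return IntSlice{owner, first, n};
}

// Splits s at index `at` in constant time: no element is copied, only the
// pointer, the length, and the reference count change.
std::pair<IntSlice, IntSlice> split(const IntSlice& s, std::size_t at) {
    assert(at <= s.count);
    return { IntSlice{s.owner, s.first, at},
             IntSlice{s.owner, s.first + at, s.count - at} };
}
```

Dropping the original slice after the split leaves the two halves as the only owners, so the block stays alive exactly as long as either half does.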
I think the idea of using malloc and creating a new pointer is good, but I think you need to free the memory manually.
Maybe you can use std::copy.
int* p = (int*)malloc(sizeof(int) * 5);
for (int i = 0; i < 5; i++)
p[i] = i;
for (int i = 0; i < 5; i++)
std::cout << p[i];
// tmp will hold 3 values
// p2 will hold 2 values
// so we want to copy first 3 into tmp
// and last 2 into p2
int tmp[3];
std::copy(p, p+3, tmp);
for (int i = 0; i < 3; i++)
std::cout << tmp[i];
int p2[2];
std::copy(p+3, p+5, p2);
for (int i = 0; i < 2; i++)
std::cout << p2[i];
// get rid of original when done
free(p);
Output:
0123401234
The question is relatively unclear, but going by your title,
splitting dynamically allocated array without linear time copy
I would suggest using a linked list instead of an array. You don't need to copy anything (and therefore don't need to free anything unless you want to delete one of the items); manipulating the pointers is sufficient for splitting a linked list.
As in the title: is it possible to join a number of arrays together without copying, using only pointers? I'm spending a significant amount of computation time copying smaller arrays into larger ones.
Note that I can't use vectors, since umfpack (a matrix solving library) does not allow me to, or I don't know how.
As an example:
int n = 5;
// dynamically allocate array with use of pointer
int *a = new int[n];
// define array pointed by *a as [1 2 3 4 5]
for(int i=0;i<n;i++) {
a[i]=i+1;
}
// pointer to array of pointers ??? --> this does not work
int *large_a = new int[4];
for(int i=0;i<4;i++) {
large_a[i] = a;
}
Note: There is already a simple solution I know and that is just to iteratively copy them to a new large array, but would be nice to know if there is no need to copy repeated blocks that are stored throughout the duration of the program. I'm in a learning curve atm.
thanks for reading everyone
as in the title is it possible to join a number of arrays together without copying and only using pointers?
In short, no.
A pointer is simply an address into memory - like a street address. You can't move two houses next to each other, just by copying their addresses around. Nor can you move two houses together by changing their addresses. Changing the address doesn't move the house, it points to a new house.
note I can't use vectors since umfpack (some matrix solving library) does not allow me to or i don't know how.
In most cases, you can pass the address of the first element of a std::vector when an array is expected.
std::vector<int> a = {0, 1, 2}; // C++11 list initialization
void c_fn_call(int*);
c_fn_call(&a[0]);
This works because vector guarantees that the storage for its contents is always contiguous.
However, when you insert or erase an element from a vector, it invalidates pointers and iterators that came from it. Any pointers you might have gotten from taking an element's address no longer point to the vector, if the storage that it has allocated must change size.
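If the total size is known in advance, the invalidation problem can be sidestepped by reserving the capacity once and building the blocks directly inside the final vector. The sketch below assumes exactly that: c_style_sum is a hypothetical stand-in for a library routine that expects a raw array, and build_and_sum is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C-style library routine that wants a raw array.
long c_style_sum(const int* data, std::size_t n) {
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Builds `blocks` blocks of 1..block_len directly inside one vector.
long build_and_sum(std::size_t blocks, std::size_t block_len) {
    std::vector<int> all;
    all.reserve(blocks * block_len);  // one allocation up front

    const int* first = nullptr;
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t i = 0; i < block_len; ++i) {
            all.push_back(static_cast<int>(i + 1));
            if (first == nullptr) first = all.data();
        }
    }
    // Staying within the reserved capacity never reallocates, so the address
    // of the first element did not change while the vector grew.
    assert(all.empty() || all.data() == first);

    return c_style_sum(all.data(), all.size());
}
```

Since the pieces are written in place, no separate join step is needed before handing `all.data()` to the library.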
No. The memory of two arrays are not necessarily contiguous so there is no way to join them without copying. And array elements must be in contiguous memory...or pointer access would not be possible.
I'd probably use memcpy/memmove, which is still going to be copying the memory around, but at least it's been optimized and tested by your compiler vendor.
Of course, the "real" C++ way of doing it would be to use standard containers and iterators. If you've got memory scattered all over the place like this, it sounds like a better idea to me to use a linked list, unless you are going to do a lot of random access operations.
Also, keep in mind that if you use pointers and dynamically allocated arrays instead of standard containers, it's a lot easier to cause memory leaks and other problems. I know sometimes you don't have a choice, but just saying.
If you want to join arrays without copying the elements and at the same time you want to access the elements using subscript operator i.e [], then that isn't possible without writing a class which encapsulates all such functionalities.
I wrote the following class with minimal consideration, but it demonstrates the basic idea, and you can extend it if you need functionality it doesn't currently have. I left out most error handling to keep it short, but I believe you will understand the code and can handle the error cases accordingly.
template<typename T>
class joinable_array
{
std::vector<T*> m_data;
std::vector<size_t> m_size;
size_t m_allsize;
public:
joinable_array() : m_allsize() { }
joinable_array(T *a, size_t len) : m_allsize() { join(a,len);}
void join(T *a, size_t len)
{
m_data.push_back(a);
m_size.push_back(len);
m_allsize += len;
}
T & operator[](size_t i)
{
index ix = get_index(i);
return m_data[ix.v][ix.i];
}
const T & operator[](size_t i) const
{
index ix = get_index(i);
return m_data[ix.v][ix.i];
}
size_t size() const { return m_allsize; }
private:
struct index
{
size_t v;
size_t i;
};
index get_index(size_t i) const
{
index ix = { 0, i};
for(auto it = m_size.begin(); it != m_size.end(); it++)
{
if ( ix.i >= *it ) { ix.i -= *it; ix.v++; }
else break;
}
return ix;
}
};
And here is one test code:
#define alen(a) (sizeof(a)/sizeof(*(a)))
int main() {
int a[] = {1,2,3,4,5,6};
int b[] = {11,12,13,14,15,16,17,18};
joinable_array<int> arr(a,alen(a));
arr.join(b, alen(b));
arr.join(a, alen(a)); //join it again!
for(size_t i = 0 ; i < arr.size() ; i++ )
std::cout << arr[i] << " ";
}
Output:
1 2 3 4 5 6 11 12 13 14 15 16 17 18 1 2 3 4 5 6
Online demo : http://ideone.com/VRSJI
Here's how to do it properly:
template<class T, class K1, class K2>
class JoinArray {
public: // the constructor and accessors must be reachable from outside
JoinArray(K1 &k1, K2 &k2) : k1(k1), k2(k2) { }
T operator[](int i) const { int s = k1.size(); if (i < s) return k1[i]; else return k2[i-s]; }
int size() const { return k1.size() + k2.size(); }
private:
K1 &k1;
K2 &k2;
};
// T cannot be deduced from the arguments, so the caller names it explicitly.
template<class T, class K1, class K2>
JoinArray<T,K1,K2> join(K1 &k1, K2 &k2) { return JoinArray<T,K1,K2>(k1,k2); }
template<class T>
class NativeArray
{
public:
NativeArray(T *ptr, int size) : ptr(ptr), size_(size) { }
T operator[](int i) const { return ptr[i]; }
int size() const { return size_; }
private:
T *ptr;
int size_; // renamed: a data member cannot share its name with size()
};
int main() {
int array[2] = { 0,1 };
int array2[2] = { 2,3 };
NativeArray<int> na(array, 2);
NativeArray<int> na2(array2, 2);
auto joinarray = join<int>(na,na2);
}
A variable that is a pointer to a pointer must be declared as such.
This is done by placing an additional asterisk in front of its name.
Hence: int **large_a = new int*[4]; Your large_a is meant to hold pointers, but you declared it as a pointer to int. It has to be declared as a pointer to a pointer; a plain declaration would be int **large_a;.
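Put together, the corrected version of the example from the question could look like this sketch (sum_via_pointer_table is a hypothetical wrapper added only so the result can be checked). Note that the table only refers to the same block four times; it does not produce one contiguous array:

```cpp
#include <cassert>

// Hypothetical wrapper around the question's example: four table entries all
// point at the same block, so nothing is copied.
int sum_via_pointer_table() {
    const int n = 5;
    int* a = new int[n];
    for (int i = 0; i < n; ++i) {
        a[i] = i + 1;  // a = [1 2 3 4 5]
    }

    int** large_a = new int*[4];  // array of four pointers to int
    for (int i = 0; i < 4; ++i) {
        large_a[i] = a;  // store the address, not the elements
    }

    int sum = 0;
    for (int block = 0; block < 4; ++block) {
        for (int i = 0; i < n; ++i) {
            sum += large_a[block][i];  // two indirections per access
        }
    }

    delete[] large_a;  // frees only the pointer table
    delete[] a;        // frees the shared block exactly once
    return sum;
}
```

A routine that expects a single int* to contiguous data cannot consume large_a directly, so this helps only when the consumer can work block by block.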