C++: using array or vector for many items

The following code is an extract from the source of a university graph-theory project (a representation of a directed weighted graph):
struct Edge
{
    int neighboor : 20;
    int weight : 12;
} e;
struct Node
{
    double x;
    double y;
    vector<Edge> neighboors;
};
...
vector<Node> list;
list.resize(countNode);
Is there a way to save even more memory by replacing vector<Edge> with, for example, an array?
I'm thinking that there are thousands of vector<Edge> instances being created, and that takes a lot of memory, doesn't it?
Sorry for my English. Thanks in advance.

You may share one vector of Edge across all nodes.
Something like:
struct Edge
{
    int32_t neighboor : 20;
    int32_t weight : 12;
} e;
struct Node
{
    double x;
    double y;
    int32_t indexInEdges : 28;
    int32_t neighboorCount : 4; // You may adjust these numbers
};
std::vector<Node> nodes;
std::vector<Edge> edges; // So edges contains the edges of nodes[0],
                         // then those of nodes[1], and so on.
You may also reduce the size of Node by using float instead of double.
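For illustration, here is a minimal sketch of how the shared edges vector could be filled from per-node adjacency lists (the build function and adjacency parameter are illustrative names, not from the original post; the Edge and Node definitions above are assumed):
#include <cstdint>
#include <vector>

// Fill the shared layout: edges stores each node's edges contiguously,
// and indexInEdges points at the node's first edge.
// Note: neighboorCount is only 4 bits here, so at most 15 edges per node.
void build(std::vector<Node>& nodes, std::vector<Edge>& edges,
           const std::vector<std::vector<Edge>>& adjacency)
{
    nodes.resize(adjacency.size());
    edges.clear();
    for (size_t i = 0; i < adjacency.size(); ++i) {
        nodes[i].indexInEdges = static_cast<int32_t>(edges.size());
        nodes[i].neighboorCount = static_cast<int32_t>(adjacency[i].size());
        edges.insert(edges.end(), adjacency[i].begin(), adjacency[i].end());
    }
}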

An array is an ordered collection with static length, and a vector is an ordered collection with dynamic length. Whether an array or a vector is the most memory-efficient data type for you depends on the number of elements that you intend to store. If all elements in the graph have the same number of edges, an array is more memory-efficient, because you get rid of the overhead for variable-length storage.
If the number of edges per element varies strongly, vector is definitely the better choice, because you don't have to statically over-allocate memory on every node for the edges that might be there, but aren't.
If you have small variation in the number of edges (say, 100 to 103 edges per element), an array might still be the better choice, because you can trade static overallocation for the overhead of both a dynamic allocation and the bookkeeping for the vector's size and capacity. Just how large this overhead is depends strongly on your vector implementation, and an experiment is the best way to decide whether it's worth it.
If an experiment seems too much hassle or if you don't have enough data for an experiment, don't overoptimize, and use the vector.

If you use C++98, you can save a significant amount of memory by using arrays instead of vector<Edge> and vector<Node>. Vectors consume more memory than they actually need, because they reserve additional capacity to reduce the number of reallocations as the vector grows.
If you use C++11, you can call vector::shrink_to_fit() to free the unused memory.
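For example (C++11):
#include <vector>

std::vector<Edge> edges;
// ... fill edges; capacity may now exceed size() ...
edges.shrink_to_fit(); // non-binding request to drop the spare capacity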

Related

How does the performance of a Multidimensional array compare to an array of objects?

Say I have 3 values:
int age;
int girth;
int length;
Would it be more efficient to store and loop through these values in a multidimensional array OR to declare a class with these 3 member variables and then fill an array with objects of this class?
If you interpret every dimension as one of your values, it will be fine for small array sizes. But once you fall out of cache, you will probably see a huge drop in performance when you access the first element of the first dimension, then of the second and third dimensions. That is due to the memory layout of multidimensional arrays and CPU caching.
If you keep the values in a struct, or interpret all 3 values as one dimension of the array, and you access that array in a traversal manner (i.e. not jumping around), you will probably get the best performance possible.
So, it should be like this:
#include <array>

struct entry
{
    int age;
    int girth;
    int length;
};
std::array<entry, /*size*/> entries;
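A minimal sketch of the cache-friendly traversal this enables (assuming a concrete size has been filled in above):
long total = 0;
for (const entry& e : entries)            // one forward pass over contiguous memory
    total += e.age + e.girth + e.length;  // all three fields sit on the same cache line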

Which container should I use for random access, cheap addition and removal (without de/allocation), with a known maximum size?

I need the lightest container that can store up to 128 unsigned ints.
It must let me add, edit and remove each element with fast access, without allocating new memory every time (I already know there will be at most 128).
Such as:
add int 40 at index 4 (1/128 items used)
add int 36 at index 90 (2/128 items used)
edit the element at index 4 to value 42
add int 36 at index 54 (3/128 items used)
remove the element at index 90 (2/128 items used)
remove the element at index 4 (1/128 items used)
... and so on. So at any time I can iterate through only the real number of elements added to the container, not all of them while checking which are NULL and which are not.
During this process, as I said, it must not allocate/reallocate memory, since I'm working on an app that manages audio data, and that means a glitch every time I touch the memory.
Which container would be the right candidate?
It sounds like an "indexes" queue.
As I understand the question, you have two operations:
Insert/replace the element value at cell index
Delete the element at cell index
and one predicate:
Is cell index currently occupied?
This is an array and a bitmap. When you insert/replace, you stick the value in the array cell and set the bitmap bit. When you delete, you clear the bitmap bit. When you ask, you query the bitmap bit.
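A minimal sketch of that scheme, using std::bitset for the bitmap (the function names are illustrative):
#include <bitset>
#include <cstddef>

unsigned values[128];      // the payload cells
std::bitset<128> occupied; // one bit per cell

void setCell(std::size_t i, unsigned v) { values[i] = v; occupied.set(i); }
void clearCell(std::size_t i)           { occupied.reset(i); }
bool isUsed(std::size_t i)              { return occupied.test(i); }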
You can just use std::vector<int> and do vector.reserve(128); to keep the vector from allocating memory. This doesn't allow you to keep track of particular indices though.
If you need to keep track of an 'index' you could use std::vector<std::pair<int, int>>. This doesn't allow random access though.
If you only need cheap setting and erasing of values, just use an array. You can keep track of which cells are used by marking them in another array (or bitmap), or by defining one value (e.g. 0 or -1) as an "unused" value.
Of course, if you need to iterate over all used cells, you need to scan the whole array. But that's a tradeoff you have to make: either do more work during adding and erasing, or do more work during a search. (Note that an insert() in the middle of a vector<> will move data around.)
In any case, 128 elements is so few, that a scan through the whole array will be negligible work. And frankly, I think anything more complex than a vector will be total overkill. :)
Roughly:
unsigned data[128] = {0}; // values
unsigned used[128] = {0}; // 1 if the cell is occupied
data[index] = newvalue; used[index] = 1; // set value
data[index] = used[index] = 0;           // unset value
// check whether a cell is used and act on it
if (used[index]) { /* do something */ } else { /* do something else */ }
I'd suggest a tandem of vectors, one to hold the active indices, the other to hold the data:
#include <algorithm> // std::lower_bound
#include <cstddef>   // size_t
#include <vector>

class Container
{
    std::vector<size_t> indices;
    std::vector<int> data;

    size_t index_worldToData(size_t worldIndex) const
    {
        auto it = std::lower_bound(indices.begin(), indices.end(), worldIndex);
        return it - indices.begin();
    }

public:
    Container()
    {
        indices.reserve(128);
        data.reserve(128);
    }

    int& operator[] (size_t worldIndex)
    {
        return data[index_worldToData(worldIndex)];
    }

    void addElement(size_t worldIndex, int element)
    {
        auto dataIndex = index_worldToData(worldIndex);
        indices.insert(indices.begin() + dataIndex, worldIndex);
        data.insert(data.begin() + dataIndex, element);
    }

    void removeElement(size_t worldIndex)
    {
        auto dataIndex = index_worldToData(worldIndex);
        indices.erase(indices.begin() + dataIndex);
        data.erase(data.begin() + dataIndex);
    }

    class iterator
    {
        Container *cnt;
        size_t dataIndex;
    public:
        iterator(Container *c, size_t i) : cnt(c), dataIndex(i) {}
        int& operator* () const { return cnt->data[dataIndex]; }
        iterator& operator++ () { ++dataIndex; return *this; }
        bool operator!= (const iterator &other) const { return dataIndex != other.dataIndex; }
    };

    iterator begin() { return iterator(this, 0); }
    iterator end() { return iterator(this, indices.size()); }
};
(Disclaimer: code not touched by compiler, preconditions checks omitted)
This one has logarithmic time element access, linear time insertion and removal, and allows iterating over non-empty elements.
You could use a doubly-linked list and an array of node pointers.
Preallocate 128 list nodes and keep them on a freelist.
Create an empty itemlist.
Allocate an array of 128 node pointers called items.
To insert at i: pop the head node from freelist, add it to itemlist, and set items[i] to point at it.
To access/change a value, use items[i]->value.
To delete at i: remove the node pointed to by items[i] and reinsert it in freelist.
To iterate, just walk itemlist.
Everything is O(1) except iteration, which is O(n) in the number of active items. The only caveat is that iteration is not in index order.
The freelist can be singly-linked, or even an array of nodes, as all you need is push and pop.
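A rough sketch of this layout (all names are illustrative; error checking omitted):
#include <cstddef>

struct ListNode { unsigned value; ListNode *prev, *next; };

ListNode pool[128];                              // preallocated storage for all nodes
ListNode *freehead = nullptr;                    // singly-linked freelist (uses next only)
ListNode itemlist = { 0, &itemlist, &itemlist }; // circular sentinel for the item list
ListNode *items[128] = {};                       // index -> node, null when unused

void initFreelist() {
    for (int i = 0; i < 128; ++i) { pool[i].next = freehead; freehead = &pool[i]; }
}

void insertAt(std::size_t i, unsigned v) {
    ListNode *n = freehead; freehead = freehead->next; // pop a free node
    n->value = v;
    n->prev = &itemlist; n->next = itemlist.next;      // splice in after the sentinel
    itemlist.next->prev = n;
    itemlist.next = n;
    items[i] = n;
}

void removeAt(std::size_t i) {
    ListNode *n = items[i]; items[i] = nullptr;
    n->prev->next = n->next; n->next->prev = n->prev;  // unlink from the item list
    n->next = freehead; freehead = n;                  // push back onto the freelist
}
// To iterate: for (ListNode *p = itemlist.next; p != &itemlist; p = p->next) ...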
#include <cstddef>
#include <set>

class Container {
private:
    std::set<std::size_t> indices;
    unsigned int buffer[128];
public:
    void set_elem(const std::size_t index, const unsigned int element) {
        buffer[index] = element;
        indices.insert(index);
    }
    // and so on -- iterate over the indices if necessary
};
There are multiple approaches you can use; I will cite them in order of effort expended.
The most affordable solution is to use the Boost non-standard containers, of particular interest is flat_map. Essentially, a flat_map offers the interface of a map over the storage provided by a dynamic array.
You can call its reserve member at the start to avoid memory allocation afterward.
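For instance, a small sketch using Boost.Container's flat_map:
#include <boost/container/flat_map.hpp>
#include <cstddef>

boost::container::flat_map<std::size_t, unsigned> m;

void example() {
    m.reserve(128); // one allocation up front; none afterwards while size() <= 128
    m[4] = 40;      // insert or overwrite at "index" 4
    m.erase(4);     // erasing keeps the capacity, so no deallocation either
}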
A slightly more involved solution is to code your own memory allocator.
The interface of an allocator is relatively easy to deal with, so that coding an allocator is quite simple. Create a pool-allocator which will never release any element, warm it up (allocate 128 elements) and you are ready to go: it can be plugged in any collection to make it memory-allocation-free.
Of particular interest, here, is of course std::map.
Finally, there is the do-it-yourself road. Much more involved, quite obviously: the number of operations supported by standard containers is just... huge.
Still, if you have the time or can live with only a subset of those operations, then this road has one undeniable advantage: you can tailor the container specifically to your needs.
Of particular interest here is the idea of having a std::vector<boost::optional<int>> of 128 elements... except that since this representation is quite space inefficient, we use the Data-Oriented Design to instead make it two vectors: std::vector<int> and std::vector<bool>, which is much more compact, or even...
#include <bitset>
#include <cstddef>

struct Container {
    static constexpr std::size_t Size = 128;
    int array[Size];
    std::bitset<Size> marker;
};
which is both compact and allocation-free.
Now, iterating requires iterating the bitset for present elements, which might seem wasteful at first, but said bitset is only 16 bytes long so it's a breeze! (because at such scale memory locality trumps big-O complexity)
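For illustration, scanning the bitset for present elements could look like this:
Container c{}; // value-initialized: array zeroed, all marker bits clear
for (std::size_t i = 0; i < Container::Size; ++i)
    if (c.marker.test(i)) {
        // use c.array[i]
    }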
Why not use std::map<int, int>? It provides random access and is sparse.
If a vector (pre-reserved) is not handy enough, look into Boost.Container for the various "flat" varieties of indexed collections. They store everything in a vector and need no memory manipulation, but add a layer on top to make it a set or map, indexable by which elements are present and able to tell which are not.

Is it worth using a vector for making a map?

I have a class that represents a 2D map with size 40x40.
I read some data from sensors and build this map, marking cells when my sensors find something and setting a value for the probability that there is an obstacle. For example, when I find an obstacle in cell [52,22], I add 10 to its value and add 5 to the surrounding cells.
So each cell of this map should keep some small value (probably nothing bigger). When a cell has been marked three times by the sensor, its value will be 30 and the surrounding cells will have 15.
My question is: is it worth using a plain array, or is it better to use a vector, even though I don't sort the cells, don't remove them, etc.? I just set their values and read them later.
Update:
Actually I have in my header file:
using cell = uint8_t;
class Grid {
private:
    int xSize, ySize;
    cell *cells;
public:
    // some methods
};
In the cpp:
using cell = uint8_t;
Grid::Grid(int xSize, int ySize) : xSize(xSize), ySize(ySize) {
    cells = new cell[xSize * ySize];
    for (int x = 0; x < xSize; x++) {
        for (int y = 0; y < ySize; y++)
            cells[x + y * xSize] = 0;
    }
}
Grid::~Grid(void) {
    delete[] cells; // new[] must be matched by delete[]
}
inline cell* Grid::getCell(int x, int y) const {
    return &cells[x + y * xSize];
}
Does it look fine?
I'd use std::array rather than std::vector.
For fixed size arrays you get the benefits of STL containers with the performance of 'naked' arrays.
http://en.cppreference.com/w/cpp/container/array
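A small sketch of what that could look like here (W, H and at are illustrative names):
#include <array>
#include <cstdint>

using cell = uint8_t;
constexpr int W = 40, H = 40;

std::array<cell, W * H> cells{}; // zero-initialized, fixed size, no heap allocation

inline cell& at(std::array<cell, W * H>& grid, int x, int y) {
    return grid[x + y * W];
}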
A static (C-style) array is possible in your case, since the size is known at compile time.
BUT it may be interesting to have the data on the heap instead of the stack:
If the array is a global variable, it's ugly and bug-prone (avoid that when you can).
If the array is a local variable (say, in your main() function), a stack overflow may occur. Well, it's very unlikely for a 40*40 array of tiny things, but I'd prefer to have my data on the heap, to keep things safe, clean, and future-proof.
So, IMHO, you should definitely go for the vector: it's fast, clean and readable, and you don't have to worry about stack overflow, memory allocation, etc.
About your data: if you know your values fit in a single byte, go for it!
A uint8_t (same as unsigned char) can store values from 0 to 255. If that's enough, use it.
using cell = uint8_t; // define a nice name for your data type
size_t size = 40;
std::vector<cell> myMap(size * size); // every cell created and zero-initialized up front
Side note: don't use new[]. Well, you can, but it has no advantage over a vector; you will probably only gain headaches from handling memory manually.
Some advantages of using a std::vector are that it can be dynamically allocated (flexible size, resizable during execution, etc.) and easily passed to or returned from a function. Since you have a fixed 40x40 size and you know you have one element in every cell, I don't think that matters much in your case, and I would NOT suggest using a class object (std::vector) for this simple task.

Vector Object Inventory: an object that can store other object types?

I'm trying to create an Inventory system that can hold any object.
For example:
struct Ore {
    string name;
    int size;
};
struct Wood {
    string name;
    int size;
    int color;
};
My idea is to create a struct with 2 vectors: one for numeric values (items with Attack, Defense and the like), and the other for the name, description, or other text,
with multiple constructors for different item types.
The problem I have with it is that I've heard vectors can take up more memory, and I expect this program to create hundreds or thousands of items.
So I was looking for suggestions for better memory storage.
struct Inventory {
    vector<float> Number;
    vector<string> Word;
    Inventory(string n, float a)
    { Word.push_back(n); Number.push_back(a); }
    Inventory(string n, float a, float b)
    { Word.push_back(n); Number.push_back(a); Number.push_back(b); }
};
vector<Inventory> Bag_Space;
You are trying to optimize too early.
Go with whatever is the cleanest thing to use. Vectors are not an insane choice (see: Using arrays or std::vectors in C++, what's the performance gap?).
Deal with a performance issue if/when it arises.
Check out the following discussions on premature optimization:
When is optimisation premature?
https://softwareengineering.stackexchange.com/questions/80084/is-premature-optimization-really-the-root-of-all-evil
BTW, I stumbled upon this interesting discussion on potential performance issues with vectors. In summary: if your vectors shrink, the memory footprint won't shrink with the vector size unless you use the swap trick.
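That classic C++98 idiom looks like this:
#include <vector>

std::vector<int> v;
// ... v grows large, then most of it is erased ...
std::vector<int>(v).swap(v); // the temporary copy's capacity fits its size;
                             // swapping gives v the snug buffer and frees the old one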
And if you are making a lot of vectors and don't need their elements initialized to 0s, then instead of
vector<int> bigarray(N);
try
vector<int> bigarray;
bigarray.reserve(N);

Memory allocation for struct (low performance)

I have a question related to the slow performance of allocating memory for several structs.
I have a struct which looks like this:
typedef struct _node
{
    // Pointers to leaves & neighbours
    struct _node *children[nrChild], *neighb[nrNeigh];
    // Pointer to parent node
    struct _node *parentNode;
    struct _edgeCorner *edgePointID[nrOfEdge];
    int indexID;      // Value
    double f[latDir]; // Lattice velocities
    double rho;       // Density
    double Umag;      // Mag. velocity
    int depth;        // Depth of octree element
} node;
At the beginning of my code I have to create a lot of them (100,000 – 1,000,000) using:
tree = new node();
and initializing the members afterwards.
Unfortunately, this is pretty slow, so does anyone have an idea how to improve the performance?
Firstly, you'll want to fix it so that it's actually written in C++.
#include <array>
#include <memory>

struct node
{
    // Pointers to leaves & neighbours
    std::array<std::unique_ptr<node>, nrChild> children;
    std::array<node*, nrNeigh> neighb;
    // Pointer to parent node
    node* parentNode;
    std::array<_edgeCorner*, nrOfEdge> edgePointID;
    int indexID;                  // Value
    std::array<double, latDir> f; // Lattice velocities
    double rho;                   // Density
    double Umag;                  // Mag. velocity
    int depth;                    // Depth of octree element
};
Secondly, in order to improve your performance, you will require a custom allocator. Boost.Pool would be a fine choice: it's a pre-existing solution explicitly designed for repeated allocations of the same size, in this case sizeof(node). There are other schemes, like a memory arena, that can be even faster, depending on your deallocation needs.
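A minimal sketch of the Boost.Pool route (assuming the node definition above; makeNode is an illustrative name):
#include <boost/pool/object_pool.hpp>

boost::object_pool<node> pool; // recycles fixed-size blocks of sizeof(node)

node* makeNode() {
    return pool.construct(); // default-constructs a node from pooled memory
}
// pool.destroy(p) returns a node to the pool early; otherwise everything
// is released when the pool itself is destroyed.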
If you know how many nodes you will have, you could allocate them all in one go:
node* Nodes = new node[1000000];
You will need to set the values afterwards, just as you would if you allocated them one by one. If this turns out to be a lot faster, you could try an architecture where you find out how many nodes you will need before allocating them, even if you don't have that number right now.