C++: Dynamically growing 2d array - c++

I have the following situation solved with a vector, but one of my older colleagues told me in a discussion that it would be much faster with an array.
I calculate lots (and I mean lots!) of 12-dimensional vectors from lots of audio files and have to store them for processing. I really need all those vectors before I can start my calculation. However, I cannot predict how many audio files there will be, nor how many vectors are extracted from each one. Therefore I need a structure that holds the vectors dynamically.
Therefore I create a new double array for each vector and push it to a vector.
I now want to test whether my colleague is right that the calculation can be sped up by also using an array instead of a vector for storage.
vector<double*>* Features = new vector<double*>();
double* feature = new double[12];
// adding elements
Features->push_back(feature);
As far as I know, to create a dynamic 2D array I need to know the number of rows.
double** container = new double*[rows];
container[0] = new double[12];
// and so on..
I only know the number of rows after processing all audio files, and I don't want to process the audio twice.
Does anyone have an idea how to solve this and append rows, or is it simply not possible that way, so that I should either use a vector or create my own structure (which I assume may be slower than a vector)?

Unless you have strong reasons not to, I would suggest something like this:
std::vector<std::array<double, 12>> Features;
You get all the memory locality you could want, and all of the automagic memory management you need.
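A minimal sketch of how that container could be filled, assuming each extracted feature arrives as a raw buffer of 12 doubles. The names `Feature` and `append_feature` are made up for illustration and are not part of the original answer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One 12-dimensional feature vector, stored inline in the outer vector
// (no separate heap allocation per feature).
using Feature = std::array<double, 12>;

// Hypothetical helper: copies 12 values from a raw buffer into the container.
void append_feature(std::vector<Feature>& features, const double* values) {
    assert(values != nullptr);
    Feature f{};
    for (std::size_t i = 0; i < f.size(); ++i) {
        f[i] = values[i];
    }
    features.push_back(f);  // the vector grows automatically
}
```

Because every `Feature` lives directly inside the vector's buffer, no `new` or `delete` appears anywhere, and iterating over all features walks through one block of memory.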

You can certainly do this with raw arrays, but it would be much better to do it with std::vector. To grow a 2D array dynamically, you would have to perform all of the following steps yourself.
Create a temporary 2D Array
Allocate memory to it.
Allocate memory for each of its component arrays.
Copy data into its component arrays.
Delete each component array of the original 2D Array.
Delete the 2D Array.
Take new Input.
Add new item to the temporary 2D array.
Create the original 2D Array and allocate memory to it.
Allocate memory to its component arrays.
Copy temporary data into it again.
With all of this work at every growth step, it is hard to believe that arrays would be any faster. Use std::vector. The answers above explain why.

Using a vector makes the problem easier because growing the data is automatic. Unfortunately, because of how vectors grow, they may not be the best solution for a large data set: the vector has to reallocate and copy many times. On the other hand, if you set the initial size of the vector quite large but only need a small number of 12-element arrays, you have just wasted a large amount of memory. If there is some way to produce a guess of the required size, you might use that guess to dynamically allocate arrays or to set the vector to that size initially.
If you are only going to calculate with the data once or twice, then maybe you should consider using a map or list. For large data sets, these two structures allocate memory node by node to match your exact needs and avoid the extra time spent regrowing an array. On the other hand, the calculations with these data structures will be slower.
I hope these thoughts add some alternative solutions to this discussion.
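The idea of sizing the vector from a guess could look roughly like this. It is only a sketch: `make_feature_store` and the estimate passed to it are assumptions for illustration, and `reserve` is used instead of an initial size so that `size()` still reflects the number of features actually stored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Feature = std::array<double, 12>;

// Hypothetical helper: pre-allocates room for an estimated number of features.
// If the estimate is too low, the vector still grows on its own; if it is too
// high, shrink_to_fit() can be requested once all features are stored.
std::vector<Feature> make_feature_store(std::size_t estimated_count) {
    std::vector<Feature> features;
    features.reserve(estimated_count);  // changes capacity only; size() stays 0
    assert(features.capacity() >= estimated_count);
    return features;
}
```

As long as the number of stored features stays within the reserved capacity, `push_back` does not reallocate, so the cost of repeated growth mentioned above does not arise.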

Related

Declaring 3D array structure in c++ using vector

Hi, I am a graduate student studying scientific computing using C++. Some of our research focuses on the speed of algorithms, so it is important to construct an array structure that is fast enough.
I've seen two ways of constructing 3D Arrays.
The first one is to use the vector library.
vector<vector<vector<double>>> a(isize, vector<vector<double>>(jsize, vector<double>(ksize, 0)));
This gives 3D array structure of size isize x jsize x ksize.
The other one is to construct a structure containing 1d array of size isize* jsize * ksize using
new double[isize*jsize*ksize]. To access the specific location (i,j,k) easily, operator overloading is necessary (am I right?).
From what I have experienced, the first one is much faster since it can access location (i,j,k) directly, while the latter has to compute the location and return the value. But I have seen some people prefer the latter over the first. Why do they prefer the latter setting? And is there any disadvantage to using the first one?
Thanks in advance.
Main difference between those will be the layout:
vector<vector<vector<T>>>
This will get you a 1D array of vector<vector<T>>.
Each item will be a 1D array of vector<T>.
And each item of those 1D array will be a 1D array of T.
The point is, vector itself does not store its content. It manages a chunk of memory, and stores the content there. This has a number of bad consequences:
For a matrix of dimension X·Y·Z, you will end up allocating 1 + X + X·Y memory chunks. That's horribly slow, and will trash the heap. Imagine: a cube matrix of size 20 would trigger 421 calls to new!
To access a cell, you have 3 levels of indirection:
You must access the vector<vector<vector<T>>> object to get pointer to top-level memory chunk.
You must then access the vector<vector<T>> object to get pointer to second-level memory chunk.
You must then access the vector<T> object to get pointer to the leaf memory chunk.
Only then can you access the T data.
Those memory chunks will be spread around the heap, causing a lot of cache misses and slowing the overall computation.
Should you get it wrong at some point, it is possible to end up with some lines in your matrix having different lengths. After all, they're independent 1-d arrays.
Having a contiguous memory block (like new T[X * Y * Z]) on the other hand gives:
You allocate 1 memory chunk. No heap trashing, O(1).
You only need to access the pointer to the memory chunk, then can go straight for desired element.
The whole matrix is contiguous in memory, which is cache-friendly.
These days, a single cache miss means dozens or hundreds of lost computing cycles, so do not underestimate the cache-friendliness aspect.
By the way, there is a probably better way that you didn't mention: using one of the numerous matrix libraries that will handle this for you automatically and provide nice support tools (like SSE-accelerated matrix operations). One such library is Eigen, but there are plenty of others.
→ You want to do scientific computing? Let a lib handle the boilerplate and the basics so you can focus on the scientific computing part.
From my point of view, std::vector has many advantages over plain arrays.
In short, here are some:
It is much harder to create memory leaks with std::vector. This point alone is one of the biggest advantages. It has nothing to do with performance, but should be considered all the time.
std::vector is part of the STL, one of the most widely used parts of C++. Thousands of people use the STL, so it gets "tested" every day. Over the years it has been optimized so heavily that it no longer lacks performance. (Please correct me if I'm wrong.)
Working with std::vector is as easy as 1, 2, 3. No pointer handling, nothing. You just access it via the [] operator or its other member functions.
First of all, the idea that you access (i,j,k) in your vector^3 directly is somewhat flawed. What you have is a structure of pointers where you need to dereference three pointers along the way. Note that I have no idea whether that is faster or slower than computing the position within a one-dimensional array, though. You'd need to test that, and it might depend on the size of your data (especially whether it fits in a chunk).
Second, the vector^3 requires pointers and vector sizes, which require more memory. In many cases this will be irrelevant (as the image grows cubically but the memory difference only quadratically), but if your algorithm is really going to fill all available memory, that can matter.
Third, the raw array stores everything in consecutive memory, which is good for streaming and can be good for certain algorithms because of quick cache accesses. For example when you add one 3D image to another.
Note that all of this is about hyper-optimization that you might not need. The advantages of vectors that skratchi.at pointed out in his answer are quite strong, and I add the advantage that vectors usually increase readability. If you do not have very good reasons not to use vectors, then use them.
If you should decide for the raw array, in any case, make sure that you wrap it well and keep the class small and simple, in order to counter problems regarding leaks and such.
Welcome to SO.
If those two alternatives are all you have, then the first one could be better.
Prefer using STL array or vector instead of a C array
You should avoid plain C arrays in C++, since you have to manage the memory yourself with new/delete and write other boilerplate code such as tracking the size and checking bounds. In plain words: "C arrays are less safe, and have no advantages over array and vector."
However, there are some important drawbacks in the first alternative. Something I would like to highlight is that:
std::vector<std::vector<std::vector<T>>>
is not a 3-d matrix. In a matrix, all the rows must have the same size. In a "vector of vectors", on the other hand, there is no guarantee that all the nested vectors have the same length. The reason is that a vector is a linear 1-D structure, as pointed out in @spectras's answer. Hence, to avoid all sorts of bad or unexpected behaviour, you must include guards in your code to guarantee the rectangular invariant.
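Such a guard could be sketched as follows. The alias `Nested3D` and the function `is_rectangular` are hypothetical names, and the check is only one possible way to verify the invariant (for example before starting a computation).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
using Nested3D = std::vector<std::vector<std::vector<T>>>;

// Returns true when every plane has the same number of rows and every row
// has the same number of cells, i.e. the nested vectors form a proper box.
template <typename T>
bool is_rectangular(const Nested3D<T>& a) {
    if (a.empty()) return true;
    const std::size_t rows = a[0].size();
    const std::size_t cols = rows == 0 ? 0 : a[0][0].size();
    for (const auto& plane : a) {
        if (plane.size() != rows) return false;
        for (const auto& row : plane) {
            if (row.size() != cols) return false;
        }
    }
    return true;
}
```

Note that the check itself has to visit every nested vector, which is part of the price of this layout.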
Luckily, the first alternative is not the only one you may have in hands.
For example, you can replace the c-style array by a std::array:
constexpr std::size_t n = i_size * j_size * k_size;  // the sizes must be compile-time constants
std::array<int, n> myFlattenMatrix;
or use std::vector if the matrix dimensions are only known at run time or can change.
Accessing element by its 3 coordinates
Regarding your question
To access the specific location of (i,j,k) easily, operator
overloading is necessary(am I right?).
Not exactly. Neither std::vector nor std::array has a three-parameter access operator, and you cannot add one to them. But you can create a class template or function that wraps the container for you. Either way, you must dereference the three nested vectors or calculate the flattened index of the element in the linear storage.
Consider not using a third-party matrix library like Eigen for your experiments
You aren't coding this for production but for research, and your research is precisely about the performance of algorithms. In that case, I would not recommend a third-party library like Eigen. Of course it depends a lot on what kind of "speed of an algorithm" metrics you want to gather, but Eigen, for instance, does a lot of things under the hood (like vectorization) that will have a tremendous influence on your experiments. Since those unseen optimizations are hard to control, these library features may lead you to wrong conclusions about your algorithms.
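A wrapper of that kind might look like the sketch below. The class name `Grid3D` and its interface are assumptions made for illustration; it stores everything in one std::vector and uses row-major index arithmetic, which is one common choice among several.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one contiguous buffer plus index arithmetic.
template <typename T>
class Grid3D {
public:
    Grid3D(std::size_t ni, std::size_t nj, std::size_t nk, const T& init = T())
        : ni_(ni), nj_(nj), nk_(nk), data_(ni * nj * nk, init) {}

    // Row-major layout: k varies fastest, then j, then i.
    T& operator()(std::size_t i, std::size_t j, std::size_t k) {
        assert(i < ni_ && j < nj_ && k < nk_);
        return data_[(i * nj_ + j) * nk_ + k];
    }
    const T& operator()(std::size_t i, std::size_t j, std::size_t k) const {
        assert(i < ni_ && j < nj_ && k < nk_);
        return data_[(i * nj_ + j) * nk_ + k];
    }

    const T* data() const { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t ni_, nj_, nk_;
    std::vector<T> data_;  // single allocation, freed automatically
};
```

Usage then reads `Grid3D<double> g(isize, jsize, ksize); g(i, j, k) = 1.0;`. The index computation costs two multiplications and two additions, which is typically cheap compared with the three dependent pointer loads of the nested-vector layout, though that should be measured for the workload at hand.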
Algorithm's performance and big-o notation
Usually, the performance of algorithms is analysed using the big-O approach, where factors like the actual time spent, hardware speed, or programming language traits aren't taken into account. The book "Data Structures and Algorithms in C++" by Adam Drozdek can provide more details about it.

Dynamic Jagged Arrays on the GPU

I have an array of arrays in which each array is of a different size. This is all stored in GPU memory, so it is compacted into a one-dimensional array. I need each inner array to be able to resize independently without touching (read -> move -> write) any of the other data. For example, when one of the inner arrays needs to grow, I could remove the data, compact the remaining data, and place the new, larger array at the end of the jagged array. However, as the jagged array contains a lot of data, this would be extremely inefficient. Empty spaces can exist, but due to the nature of the application, the maximum amount of GPU memory would quickly be reached if I only added to the end.
I have a fallback option where I would insert new arrays into available spaces if they fit perfectly, but this would require a second write on empty data to remove useless information, something that I would preferably like to avoid as it would also involve a serial algorithm in a parallel environment.
If anyone has solution, or even a topic that I could research more into, I would appreciate it immensely.

Data structure in C/C++ for multiple variable size arrays

This is the problem at hand:
I have several tens of thousands of arrays. Each array could be anywhere between 2 and 15 units in length.
The total length of all the elements in all the arrays and the number of arrays can be computed using some very low cost calculations. But the exact number of elements in each array is not known until some fairly expensive computation is completed.
Since I know the total length of all the elements in all the arrays, I would like to allocate data for it using just one new/malloc and set pointers within this allocation. In my current implementation I use memmove to move the data after a certain item is inserted and update all pointers accordingly.
Is there a better way of doing this?
Thanks,
Sid
It's not clear what you mean by a better way. If you are looking for something that works faster and can afford some extra memory, then you can keep two arrays: one with the data, and the other with the index of the array each item belongs to. After you have added all the data, you can sort by the index, and you have all your data split by arrays. Finally, you sweep the arrays and get the pointer to where each array begins.
Regarding memory consumption, depending on how many arrays you have, and how big is your data, you can squeeze the index data to the last bits of your data, if you have it bounded by some number. This way, you only need to sort the numbers, and when you are sweeping retrieving the pointer where each array begins, you can clean the top bits.
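The sort-then-sweep idea could be sketched like this, assuming the data and owner index are kept together as pairs rather than in two parallel arrays, and using a counting pass plus prefix sum for the sweep. The names `Tagged` and `group_by_owner` are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Each produced value is tagged with the index of the array it belongs to.
using Tagged = std::pair<std::size_t, double>;  // (owner array, value)

// Groups the values by owner and returns the start offset of every array.
// The result has array_count + 1 entries; after the call, array a occupies
// the range [starts[a], starts[a + 1]) in `items`.
std::vector<std::size_t> group_by_owner(std::vector<Tagged>& items,
                                        std::size_t array_count) {
    // stable_sort keeps the insertion order inside each array.
    std::stable_sort(items.begin(), items.end(),
                     [](const Tagged& a, const Tagged& b) {
                         return a.first < b.first;
                     });

    std::vector<std::size_t> starts(array_count + 1, 0);
    for (const Tagged& t : items) {
        assert(t.first < array_count);
        ++starts[t.first + 1];       // count the items of each array
    }
    for (std::size_t a = 0; a < array_count; ++a) {
        starts[a + 1] += starts[a];  // prefix sum turns counts into offsets
    }
    return starts;
}
```

With this layout nothing has to be moved while the expensive computation is still producing values; the single sort at the end replaces the repeated memmove calls.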
Since I know the total length of all the elements in all the arrays, I would like to just allocate data for it using just one new/malloc and just set pointers within this allocation.
You can use one large vector. You'll need to manually calculate the offset of each sub-array yourself.
vectors guarantee that their data is stored in contiguous memory, but be careful of maintaining references or pointers to individual elements if the vector is used in such a way that may make it reallocate. Shouldn't be a problem since you're not adding anything beyond the initial size.
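int main() {
    // Construct the vector with its full size. reserve() alone would leave
    // size() == 0, and indexing past size() is undefined behavior.
    std::vector<T> vec(calc_total_size());
    // now you'll need to manually translate the offset of
    // a given "array" and then add the offset of the element to that
    T someElem = vec[array_offset + element_offset];
}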
Yes, there is a better way:
std::vector<std::vector<Item>> array;
array.resize(cheap_calc());
for (std::size_t i = 0; i < array.size(); ++i) {
    array[i].resize(expensive_calc(i));
    for (std::size_t j = 0; j < array[i].size(); ++j) {
        array[i][j] = Item(some_other_calc());
    }
}
No pointers, no muss, no fuss.
Are you looking for memory efficiency, speed efficiency, or simplicity?
You can always write or download a dead-simple pool allocator, then pass that as the allocator to the appropriate data structures. Because you know the total size in advance, and never need to resize vectors or add new ones, this can be even simpler than a typical pool allocator. Just malloc all of the storage in one big block, and keep a single pointer to the next block. To allocate n bytes, T *ret = nextBlock; nextBlock += n; return ret;. If your objects are trivial and don't need destruction, you can even just do one big free at the end.
This means you can use any data structure you want, or compare and contrast different ones. A vector of vectors? A giant vector of cells plus a vector of offsets? Something else you came up with that sounds crazy but just might work? You can compare their readability, usability, and performance without worrying about the memory allocation side of things.
(Of course if your goal is speed, packing things this way may not be the best answer. You can often gain a lot of speed by wasting a little space to improve your cache and/or page alignment. You could write a fancy allocator that, e.g., allocates vector space in a transposed way to improve the performance of your algorithm that does column-major where it should do row-major and vice-versa, but at that point, it's probably easier to tweak your algorithms than your allocator.)
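The "one big block plus a next pointer" idea from this answer could be sketched as below. This is a hypothetical minimal arena, not a standard-conforming allocator class that can be passed to std::vector; the class name `Arena` and its interface are assumptions, and it is only meant for trivial types that need no destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical minimal bump ("arena") allocator: one malloc up front,
// allocation is a pointer increment, and everything is freed at once.
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : begin_(static_cast<char*>(std::malloc(bytes))),
          next_(begin_),
          capacity_(begin_ != nullptr ? bytes : 0) {}
    ~Arena() { std::free(begin_); }  // the one big free at the end
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns storage for `count` objects of trivial type T, or nullptr when
    // the arena is exhausted. No constructors or destructors are run.
    template <typename T>
    T* allocate(std::size_t count) {
        const std::size_t used_bytes = used();
        // Round the start up to the alignment T requires.
        const std::size_t start =
            (used_bytes + alignof(T) - 1) / alignof(T) * alignof(T);
        const std::size_t bytes = count * sizeof(T);
        if (start > capacity_ || bytes > capacity_ - start) return nullptr;
        next_ = begin_ + start + bytes;
        return reinterpret_cast<T*>(begin_ + start);
    }

    std::size_t used() const { return static_cast<std::size_t>(next_ - begin_); }

private:
    char* begin_;
    char* next_;
    std::size_t capacity_;
};
```

Unlike the one-line version in the text, this sketch also rounds each allocation up to the alignment of the requested type, which matters as soon as differently sized types share the block.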

How to create an array with size more than C++ limits

I have a little problem here. I wrote C++ code to create an array, but when I set the array size to 100,000,000 or more I get an error.
this is my code:
int i=0;
double *a = new double[n*n];
this part is so important for my project.
When you think you need an array of 100,000,000 elements, what you actually need is a different data structure that you probably have never heard of before. Maybe a hash map, or maybe a sparse matrix.
If you tell us more about the actual problem you are trying to solve, we can provide better help.
In general, the only reason that would fail would be due to lack of memory/memory fragmentation/available address space. That is, trying to allocate 800MB of memory. Granted, I have no idea why your system's virtual memory can't handle that, but maybe you allocated a bunch of other stuff. It doesn't matter.
Your alternatives are tricks like memory-mapped files, sparse arrays, and so forth instead of an explicit C-style array.
If you do not have sufficient memory, you may need to use a file to store your data and process it in smaller chunks.
Don't know if IMSL provides what you are looking for, however, if you want to work on smaller chunks you might devise an algorithm that can call IMSL functions with these small chunks and later merge the results. For example, you can do matrix multiplication by combining multiplication of sub-matrices.
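As an illustration of combining sub-matrix products, here is a plain C++ sketch that does not use IMSL at all. The function name `blocked_multiply` and the tile size parameter `block` are assumptions; the matrices are square and stored row-major in a std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Multiplies two n x n row-major matrices tile by tile. Only one tile each of
// A, B and C is touched at a time, so the same loop structure would also work
// if the tiles were loaded from a file instead of held in memory.
std::vector<double> blocked_multiply(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     std::size_t n, std::size_t block) {
    assert(a.size() == n * n && b.size() == n * n && block > 0);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i0 = 0; i0 < n; i0 += block)
        for (std::size_t k0 = 0; k0 < n; k0 += block)
            for (std::size_t j0 = 0; j0 < n; j0 += block)
                // C[i0.., j0..] += A[i0.., k0..] * B[k0.., j0..]
                for (std::size_t i = i0; i < std::min(i0 + block, n); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + block, n); ++k)
                        for (std::size_t j = j0; j < std::min(j0 + block, n); ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
```

In a real out-of-core setting the three inner loops would be replaced by a call to whatever library routine multiplies two small tiles, with the outer loops deciding which tiles to load.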

Dynamic memory allocation, C++

I need to write a function that can read a file, and add all of the unique words to a dynamically allocated array. I know how to create a dynamically allocated array if, for instance, you are asking for the number of entries in the array:
int value;
cin >> value;
int *number;
number = new int[value];
My problem is that I don't know ahead of time how many unique words are going to be in the file, so I can't initially just read the value or ask for it. Also, I need to make this work with arrays, and not vectors. Is there a way to do something similar to a push_back using a dynamically allocated array?
Right now, the only thing I can come up with is first to create an array that stores ALL of the words in the file (1000), then have it pass through it and find the number of unique words. Then use that value to create a dynamically allocated array which I would then pass through again to store all the unique words. Obviously, that solution sounds pretty overboard for something that should have a more effective solution.
Can someone point me in the right direction, as to whether or not there is a better way? I feel like this would be rather easy to do with vectors, so I think it's kind of silly to require it to be an array (unless there's some important thing that I need to learn about dynamically allocated arrays in this homework assignment).
EDIT: Here's another question. I know there are going to be 1000 words in the file, but I don't know how many unique words there will be. Here's an idea. I could create a 1000-element array and write all of the unique words into that array while keeping track of how many I've stored. Once I've finished, I could dynamically allocate a new array with that count, and then just copy the words from the initial array to the second. Not sure if that's the most efficient, but with us not being able to use vectors, I don't think efficiency is a huge concern in this assignment.
A vector really is a better fit for this than an array. Really.
But if you must use an array, you can at least make it behave like a vector :-).
Here's how: allocate the array with some capacity. Store the allocated capacity in a "capacity" variable. Each time you add to the array, increment a separate "length" variable. When you go to add something to the array and discover it's not big enough (length == capacity), allocate a second, longer array, then copy the original's contents to the new one, then finally deallocate the original.
This gives you the effect of being able to grow the array. If performance becomes a concern, grow it by more than one element at a time.
Congrats, after following these easy steps you have implemented a small subset of std::vector functionality atop an array!
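A minimal sketch of those steps, assuming the elements are std::string words and the capacity doubles on each growth. The class `WordArray` and its member names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical minimal growable array of words built on new[]/delete[].
class WordArray {
public:
    WordArray() : data_(new std::string[4]), length_(0), capacity_(4) {}
    ~WordArray() { delete[] data_; }
    WordArray(const WordArray&) = delete;
    WordArray& operator=(const WordArray&) = delete;

    // Appends a word, growing the storage first when it is full.
    void push_back(std::string word) {
        if (length_ == capacity_) grow();
        data_[length_++] = std::move(word);
    }

    // Linear search, enough to filter out duplicates in a small file.
    bool contains(const std::string& word) const {
        for (std::size_t i = 0; i < length_; ++i) {
            if (data_[i] == word) return true;
        }
        return false;
    }

    const std::string& operator[](std::size_t i) const {
        assert(i < length_);
        return data_[i];
    }
    std::size_t size() const { return length_; }
    std::size_t capacity() const { return capacity_; }

private:
    // Allocate a longer array, copy the contents, release the original.
    void grow() {
        const std::size_t new_capacity = capacity_ * 2;
        std::string* bigger = new std::string[new_capacity];
        for (std::size_t i = 0; i < length_; ++i) bigger[i] = data_[i];
        delete[] data_;
        data_ = bigger;
        capacity_ = new_capacity;
    }

    std::string* data_;
    std::size_t length_;
    std::size_t capacity_;
};
```

Reading the file then becomes a loop of `if (!words.contains(w)) words.push_back(w);`. Doubling the capacity keeps the number of reallocations logarithmic in the final size, which is the "grow by more than one element" advice in practice.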
As you have rightly pointed out this is trivial with a Vector.
However, given that you are limited to using an array, you will likely need to do one of the following:
Initialize the array with a suitably large size and live with poor memory utilization
Write your own code to dynamically increase the size of the array at run time (basically the internals of a Vector)
If you were permitted to do so, some sort of hash map or linked list would also be a good solution.
If I had to use an array, I'd just allocate one with some initial size, then keep doubling that size when I fill it to accommodate any new values that won't fit in an array with the previous sizes.
Since this question is about C++, memory allocation would be done with the new keyword. It would be nice if one could use the realloc() function, which resizes the memory and retains the values in the previously allocated memory, so one wouldn't need to copy the values from the old array to the new array by hand. However, realloc() must not be used on memory allocated with new; it only works with memory obtained from malloc(), calloc(), or realloc().
You can "resize" array like this (N is size of currentArray, T is type of its elements):
// create new array
T *newArray = new T[N * 2];
// Copy the data
for ( int i = 0; i < N; i++ )
    newArray[i] = currentArray[i];
// Change the size to match
N *= 2;
// Destroy the old array
delete [] currentArray;
// set currentArray to newArray
currentArray = newArray;
Using this solution you have to copy the data. There might be a solution that does not require it.
But I think it would be more convenient for you to use std::vectors. You can just push_back into them and they will resize automatically for you.
You can cheat a bit:
use std::set to get all the unique words then copy the set into a dynamically allocated array (or preferably vector).
#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <string>

int main() {
    // Copy into a set
    // this will make sure they are all unique
    std::set<std::string> data;
    std::copy(std::istream_iterator<std::string>(std::cin),
              std::istream_iterator<std::string>(),
              std::inserter(data, data.end()));
    // Copy the data into your array (or vector).
    std::string* words = new std::string[data.size()];
    std::copy(data.begin(), data.end(), words);
    // ... use words[0] .. words[data.size() - 1] here ...
    delete[] words;  // release the array when done
}
This could be going a bit overboard, but you could implement a linked list in C++... it would actually allow you to use a vector-like implementation without actually using vectors (which are actually the best solution).
The implementation is fairly easy: each node holds just a pointer to the next and previous nodes, and you store the "head" node somewhere you can easily access. Looping through the list then lets you check which words are already in it and which are not. You could even implement a counter and count the number of times each word is repeated throughout the text.
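A sketch of such a list with a per-word counter follows. It is simplified to a singly linked list (only a next pointer), since the previous pointer is not needed for this task; the names `WordNode` and `WordList` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One list node: the word, how often it was seen, and the next node.
struct WordNode {
    std::string word;
    std::size_t count;
    WordNode* next;
};

class WordList {
public:
    WordList() : head_(nullptr), unique_(0) {}
    ~WordList() {
        while (head_ != nullptr) {
            WordNode* next = head_->next;
            delete head_;
            head_ = next;
        }
    }
    WordList(const WordList&) = delete;
    WordList& operator=(const WordList&) = delete;

    // Adds a word, or bumps its counter when it is already in the list.
    void add(const std::string& word) {
        for (WordNode* n = head_; n != nullptr; n = n->next) {
            if (n->word == word) {
                ++n->count;
                return;
            }
        }
        head_ = new WordNode{word, 1, head_};  // prepend a new node
        ++unique_;
    }

    // Number of times the word was added; 0 if it never was.
    std::size_t count(const std::string& word) const {
        for (const WordNode* n = head_; n != nullptr; n = n->next) {
            if (n->word == word) return n->count;
        }
        return 0;
    }

    std::size_t unique_words() const { return unique_; }

private:
    WordNode* head_;
    std::size_t unique_;
};
```

Once the whole file has been read, `unique_words()` gives the exact size for the dynamically allocated array the assignment asks for, and one more pass over the list copies the words into it.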