Numerical array too long in c++ : how can I circumvent this?

DISCLAIMER: I am at a very entry level in c++ (or any language)... I searched for similar questions but found none
I am trying to write a simple program which should perform some operations on an array as big as int pop[100000000][4] (10^8 rows); however, my program crashes even for an int pop[130000][4] array... Is there any way out? Am I using the wrong approach?
(For now I am limiting myself to a very simple program, my aim is to generate random numbers in the array[][0] every "turn" to simulate a population and work with that).
Thanks for your time and attention

An array of 130000 * 4 ints takes about 2 MB, which is likely not something you want stored locally (in reality, on the stack, where it generally won't fit).
Instead you can use dynamic allocation to get heap storage; the recommended means would be a vector of vectors:
std::vector<std::vector<int>> pop(130000, std::vector<int>(4));
pop[12000][1] = 9; // expected syntax
Vectors are dynamic, so be aware that their size can be changed by calls such as push_back and resize.
If you're a new programmer trying to write a simple program, you should consider not using about 2031 KiB (roughly 2 MiB, assuming 4-byte ints) of ints.
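As a rough sketch of the heap-backed approach, assuming every row really has exactly four int fields: a single vector of fixed-size rows keeps the whole population in one contiguous block instead of 130000 small allocations. The names Individual and make_population are made up for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical name: one individual with four int attributes.
using Individual = std::array<int, 4>;

// Allocates the whole population as one contiguous heap block,
// with every field zero-initialised. Throws std::bad_alloc (or
// std::length_error) if the request cannot be satisfied.
std::vector<Individual> make_population(std::size_t count) {
    return std::vector<Individual>(count, Individual{});
}
```

Usage stays the same as with the built-in array: auto pop = make_population(130000); pop[12000][1] = 9;. Note that 10^8 rows of four 4-byte ints would still need about 1.6 GB of RAM, wherever it is stored.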

Related

Storing large amounts of compile time constant data

I hope that this question hasn’t been asked before, but I couldn’t find any answer after googling for an hour.
I have the following problem: I do numerical mathematics and I have about 40 MB of doubles in the form of certain matrices that are compile time constants. I very frequently use these matrices throughout my program. The creation of these matrices takes a week of computation, so I do not want to compute them on the fly but use precomputed values.
What is the best way of doing this in a portable manner? Right now I have some very large CPP-files, in which I create dynamically allocated matrices with the constants as initialiser lists. Something along the lines:
data.cpp:
namespace // Anonymous
{
// math::matrix uses dynamic memory internally
const math::matrix K { 3.1337e2, 3.1415926e00, /* a couple of hundred-thousand numbers */ };
}
const math::matrix& get_K() { return K; }
data.h
const math::matrix& get_K();
I am not sure if this can cause problems with too much data on the stack, due to the initialiser list. Or if this can crash some compilers. Or if this is the right way to go about things. It does seem to be working though.
I also thought about loading the matrices at program startup from an external file, but that also seems messy to me.
Thank you very much in advance!

I am not sure if this can cause problems with too much data on the stack, due to the initialiser list.
There should not be a problem assuming it has static storage with non-dynamic initialisation. Which should be the case if math::matrix is an aggregate.
Given that the values will be compile time constant, you might consider defining them in a header, so that all translation units can take advantage of them at compile time. How beneficial that would be depends on how the values are used.
I also thought about loading the matrices at program startup from an external file
The benefit of this approach is the added flexibility that you gain because you would not need to recompile the program when you change the data. This is particularly useful if the program is expensive to compile.
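A minimal sketch of the external-file variant, assuming the precomputed values were dumped as raw doubles with no header and are read back on the same kind of machine (raw binary is not portable across different endianness or floating-point formats). The function name load_doubles and the file layout are assumptions, not anything from the question.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads `count` raw doubles from a binary file.
// Assumed format: plain native-endian doubles, no header.
std::vector<double> load_doubles(const std::string& path, std::size_t count) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::vector<double> values(count);
    const auto bytes = static_cast<std::streamsize>(count * sizeof(double));
    in.read(reinterpret_cast<char*>(values.data()), bytes);
    if (in.gcount() != bytes)
        throw std::runtime_error("file too short: " + path);
    return values;
}
```

The loaded vector could then be handed to whatever constructor math::matrix offers; that part depends on the matrix class and is not shown here.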

A slightly cleaner approach:
// math::matrix uses dynamic memory internally
const math::matrix K {
#include "matrix_initial_values.h"
};
And, in the included header,
3.1337e2, 3.1415926e00, 1,2,3,4,5e6, 7.0,...
Considering your comment of "a couple of hundred-thousand" values: 1M double values take 8,000,000 bytes, or about 7.6 MB. That exceeds common default stack sizes (1 MB on Windows, 8 MB on Linux, though both can be raised), but the stack limit only matters if these values are actually placed on the stack, which they should not be given that K is const.
This is probably implementation-specific, but a large block of literals is typically stored as a big chunk of code-space data that's loaded directly into the process's memory space. The identifier (K) is really just a label for that data. It doesn't exist on the stack or the heap any more than code does.

What advantages do arrays hold over vectors?

Well, after a full year of programming and only knowing of arrays, I was made aware of the existence of vectors (by some members of StackOverflow on a previous post of mine). I did a load of researching and studying them on my own and rewrote an entire application I had written with arrays and linked lists, with vectors. At this point, I'm not sure if I'll still use arrays, because vectors seem to be more flexible and efficient. With their ability to grow and shrink in size automatically, I don't know if I'll be using arrays as much. At this point, the only advantage I personally see is that arrays are much easier to write and understand. The learning curve for arrays is nothing, whereas there is a small learning curve for vectors. Anyway, I'm sure there's probably a good reason for using arrays in some situations and vectors in others; I was just curious what the community thinks. I'm entirely a novice, so I assume that I'm just not well-informed enough on the strict usages of either.
And in case anyone is even remotely curious, this is the application I'm practicing using vectors with. It's really rough and needs a lot of work: https://github.com/JosephTLyons/Joseph-Lyons-Contact-Book-Application

A std::vector manages a dynamic array. If your program needs an array that changes its size dynamically at run-time then you would end up writing code to do all the things a std::vector does but probably much less efficiently.
What the std::vector does is wrap all that code up in a single class so that you don't need to keep writing the same code to do the same stuff over and over.
Accessing the data in a std::vector is no less efficient than accessing the data in a dynamic array because the std::vector functions are all trivial inline functions that the compiler optimizes away.
If, however, you need a fixed size then you can get slightly more efficient than a std::vector with a raw array. However, you won't lose anything by using a std::array in those cases.
The places I still use raw arrays are cases where I need a temporary fixed-size buffer that isn't going to be passed around to other functions:
// some code
{ // new scope for temporary buffer
char buffer[1024]; // buffer
file.read(buffer, sizeof(buffer)); // use buffer
} // buffer is destroyed here
But I find it hard to justify ever using a raw dynamic array over a std::vector.
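For the fixed-size case, a small sketch of what std::array buys over a raw array: the size is part of the type, it is available through size(), and the array can be returned by value. The function name make_squares is just an illustrative choice.

```cpp
#include <array>
#include <cstddef>

// Fills a fixed-size array with i*i. No heap allocation is involved;
// the storage lives wherever the std::array object itself lives.
std::array<int, 8> make_squares() {
    std::array<int, 8> squares{};
    for (std::size_t i = 0; i < squares.size(); ++i)
        squares[i] = static_cast<int>(i * i);
    return squares;  // unlike a raw array, this can be returned by value
}
```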

This is not a full answer, but one thing I can think of is that the "ability to grow and shrink" is not such a good thing if you know what you want. For example: assume you want to store 1000 objects, but the vector is filled at a rate that causes it to grow repeatedly. The overhead of those reallocations will be costly when you could simply define a fixed-size array.
Generally speaking: if you use an array over a vector, you will have more power at your hands, meaning no "background" function calls you don't actually need (resizing) and no extra memory kept for things you don't use (size of vector...).
Additionally, using memory on the stack (array) is faster than heap (vector*) as shown here
*as shown here it's not entirely precise to say vectors reside on the heap, but they sure hold more memory on the heap than the array (that holds none on the heap)
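It is worth adding that the growth overhead can usually be avoided when the final size is known up front, by calling reserve() first. A small sketch, with a made-up helper name count_reallocations, that counts how often the capacity changes while filling:

```cpp
#include <cstddef>
#include <vector>

// Appends `count` ints and reports how many times the vector's
// capacity changed (i.e. how many reallocations happened) on the way.
std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(count);  // one allocation up front

    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {
            ++reallocations;
            capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the count is zero, because push_back never reallocates while the size stays within the reserved capacity. Without it, the exact number depends on the implementation's growth factor.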

One reason is that if you have a lot of really small structures, small fixed length arrays can be memory efficient.
compare
struct point
{
float coords[4];
};
with
struct point
{
std::vector<float> coords;
};
Alternatives include std::array for cases like this. Also, std::vector implementations may over-allocate, meaning that if you resize to 4 slots, you might have memory allocated for 16 slots.
Furthermore, the memory locations will be scattered and hard to predict, killing performance. Using an exceptionally large number of std::vectors may also lead to memory fragmentation issues, where new starts failing.
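To make the comparison concrete, a sketch that exposes the in-object footprint of both layouts (the struct names PointArray and PointVector are chosen here only so both can coexist in one file). The exact size of a std::vector object is implementation-specific; on common 64-bit implementations it is three pointers, and the four floats then live in a separate heap block on top of that.

```cpp
#include <cstddef>
#include <vector>

struct PointArray  { float coords[4]; };          // data stored inline
struct PointVector { std::vector<float> coords; }; // data stored on the heap

// Bytes occupied by each object itself, excluding any heap block
// that the vector allocates for its elements.
constexpr std::size_t inline_bytes_array  = sizeof(PointArray);
constexpr std::size_t inline_bytes_vector = sizeof(PointVector);
```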

I think this question is best answered flipped around:
What advantages does std::vector have over raw arrays?
I think this list is more easily enumerable (not to say this list is comprehensive):
Automatic dynamic memory allocation
Proper stack, queue, and sort implementations attached
Integration with C++ 11 related syntactical features such as iterator
If you aren't using such features there's not any particular benefit to std::vector over a "raw array" (though, similarly, in most cases the downsides are negligible).
Despite me saying this, for typical user applications (i.e. running on windows/unix desktop platforms) std::vector or std::array is (probably) typically the preferred data structure because even if you don't need all these features everywhere, if you're already using std::vector anywhere else you may as well keep your data types consistent so your code is easier to maintain.
However, since at the core std::vector simply adds functionality on top of "raw arrays", I think it's important to understand how arrays work in order to take full advantage of std::vector or std::array (knowing when to use std::array being one example) so you can reduce the "carbon footprint" of std::vector.
Additionally, be aware that you are going to see raw arrays when working with
Embedded code
Kernel code
Signal processing code
Cache efficient matrix implementations
Code dealing with very large data sets
Any other code where performance really matters
The lesson shouldn't be to freak out and say "must std::vector all the things!" when you encounter this in the real world.
Also: THIS!!!!
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
Embedding arrays can localise memory access, improving cache hits and therefore performance.

C++ - using a 2D array of which the dimension is [250][12]

I have a CSV file which contains 250 lines and each line contains 12 items separated by commas. I am going to store this in a 2D array of which the dimension is [250][12].
My question is: is it bad programming practice to use such a huge array?
I am going to pass this array to a method which takes a 2D array as the argument. It comes with openCV.
Will there be a memory overflow?

Well, if you don't have to use it, it would be better. For example, read the file line by line and enter each line into the csv parser. That way each line is dealt with, and you rely on the (hopefully professional and optimized) memory management.
However, if it works it works. If you don't need this in a production environment, I don't see why you should have to change it, other than good practice.

First, you have to be clear about how you'll break a line of text into 12 fields typed as expected by openCV. You may want that to be the central area of the design.
No problem using a static array if the size 250x12 will never change and memory consumption is suitable for the hardware your program is supposed to run on. You face a trade-off between memory usage and complexity of code: if memory is a concern or if you have flexibility in mind then you should process line by line or even token by token, provided openCV implements those modes.

If you know the size of the array is going to be limited to 250*12 then that is not a huge array, assuming you are using a reasonable machine. Even with long double type elements your array is going to take only about 36 to 48 KB of space (3000 elements at 12 or 16 bytes each, depending on the platform). If, however, your underlying elements are objects with sub-elements then you may want to re-think your approach, e.g., processing the array row-by-row or element-by-element instead of reading it into memory all at once.
As for passing the array to the function, you will not pass the array by value, you will pass a pointer to the array so it should not be a big overhead.
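A short sketch of what that looks like, assuming double elements (the question does not say which type openCV expects) and using made-up names kRows, kCols and sum_all. The parameter below is really a pointer to a row of 12 doubles, so calling the function copies one pointer, not 3000 values.

```cpp
#include <cstddef>

constexpr std::size_t kRows = 250;
constexpr std::size_t kCols = 12;

// `table` decays to `const double (*)[kCols]`: only a pointer is passed.
// The row count has to be supplied separately because it is not part
// of the parameter's type.
double sum_all(const double table[][kCols], std::size_t rows) {
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < kCols; ++c)
            total += table[r][c];
    return total;
}
```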

How to create an array with size more than C++ limits

I have a little problem here: I wrote C++ code to create an array, but when I set the array size to 100,000,000 or more I get an error.
This is my code:
int i=0;
double *a = new double[n*n];
This part is very important for my project.

When you think you need an array of 100,000,000 elements, what you actually need is a different data structure that you probably have never heard of before. Maybe a hash map, or maybe a sparse matrix.
If you tell us more about the actual problem you are trying to solve, we can provide better help.

In general, the only reason that would fail would be due to lack of memory/memory fragmentation/available address space. That is, trying to allocate 800MB of memory. Granted, I have no idea why your system's virtual memory can't handle that, but maybe you allocated a bunch of other stuff. It doesn't matter.
Your alternatives are tricks like memory-mapped files, sparse arrays, and so forth instead of an explicit C-style array.

If you do not have sufficient memory, you may need to use a file to store your data and process it in smaller chunks.
Don't know if IMSL provides what you are looking for, however, if you want to work on smaller chunks you might devise an algorithm that can call IMSL functions with these small chunks and later merge the results. For example, you can do matrix multiplication by combining multiplication of sub-matrices.
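To illustrate the sub-matrix idea, here is a sketch of a blocked multiplication of two n x n row-major matrices. It uses plain loops rather than IMSL calls, and it still keeps everything in memory; it only shows how the full product decomposes into products of small blocks that could be loaded and processed one at a time. The function name blocked_multiply and the block parameter are assumptions for this example, and block must be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// C = A * B for n x n row-major matrices, computed block by block.
// Each (i0, k0, j0) step multiplies one sub-matrix of A by one
// sub-matrix of B and accumulates into the matching sub-matrix of C.
std::vector<double> blocked_multiply(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     std::size_t n, std::size_t block) {
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i0 = 0; i0 < n; i0 += block)
        for (std::size_t k0 = 0; k0 < n; k0 += block)
            for (std::size_t j0 = 0; j0 < n; j0 += block) {
                const std::size_t i1 = std::min(i0 + block, n);
                const std::size_t k1 = std::min(k0 + block, n);
                const std::size_t j1 = std::min(j0 + block, n);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t k = k0; k < k1; ++k)
                        for (std::size_t j = j0; j < j1; ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
    return c;
}
```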

c++ variable array reservation problem

I have a problem when reserving an array. The problem is a heap error. The software I am making works like this:
I am making a small program to render a model of a specific format. The model contains several groups, and every group contains an array of vertices and an array of indices for these verts (such as a motorcycle model of 3 groups: front wheel, back wheel and body). After I load the model into memory, I want to render it as a VBO, but the model is made of several groups as mentioned, so I am merging all the verts in all groups into one array of verts, and the same goes for indices. When merging, a heap error occurs when reserving the array. The code is like this:
int index=0;
for(int i=0;i<this->groupsSize;i++)
index+=this->groups[i]->capacity.vertsSize;
mdl_vert *m_pVertices=new mdl_vert[index];
index=0;
for(int i=0;i<this->groupsSize;i++)
index+=this->groups[i]->capacity.indicesSize;
unsigned int *m_pIndices=new unsigned int[index];
index=0;
for(int i=0;i<this->groupsSize;i++)
{
for(int j=0;j<this->groups[i]->capacity.vertsSize;j++)
{
m_pVertices[index]=this->groups[i]->verts[j];
index++;
}
}
The heap error occurs when I reserve the indices. I also tried std::vector, but the same error occurs. Can anybody give me a hint about what I am doing wrong in this case?
N.B. mdl_vert is a struct that consists of float x,y,z;

You don't supply enough info to pinpoint the problem, or even what the problem is.
But there are things you can do to clean up the code, and maybe that will help.
1. Use std::vector instead of `new`-ing raw arrays
Instead of
unsigned int *m_pIndices=new unsigned int[index];
use
std::vector<unsigned> indices( index );
Note that this std::vector is not itself dynamically allocated.
It uses dynamic allocation inside, and it does that correctly for you.
Even better, just use …
std::vector<int> indices( index );
… because unsigned arithmetic can easily screw up.
2. Don't use misleading naming
The m_ prefix makes it seem as if you really want to access data members, not local variables.
But you are defining local variables.
Either use the data members, or drop the m_ name prefixes.
3. Don't "reuse" variables
You're using the variable index for multiple successive purposes.
Declare and use one (properly named) variable for each purpose.
4. Don't rely on side-effects from earlier code.
For example, you are relying on the value of index after a for-loop where it's used as a loop counter.
Instead, directly use the value that you have deduced that it will have.
5. Don't obscure the code with do-nothing things.
This is just a style issue, but consider removing all the this-> qualifications. It's verbose and obscures the code; it makes the code less readable and less clear. Yes, with primitive tools like Visual Studio such qualifications can help with getting names in drop-down lists, but that's a disservice: it makes it more difficult to remember things, and without remembering things you can't have the understanding needed to write correct code.
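Put together, the points above might look roughly like the following sketch. The Group and MergedModel types are hypothetical stand-ins, since the question does not show the real group class, and the indices are kept as unsigned as in the question. Offsetting each index by the number of vertices already merged is an assumption: it is only right if every group's indices are relative to that group's own vertex array.

```cpp
#include <cstddef>
#include <vector>

struct mdl_vert { float x, y, z; };

// Hypothetical stand-in for the question's group type.
struct Group {
    std::vector<mdl_vert> verts;
    std::vector<unsigned> indices;
};

struct MergedModel {
    std::vector<mdl_vert> verts;
    std::vector<unsigned> indices;
};

MergedModel merge_groups(const std::vector<Group>& groups) {
    // One clearly named counter per purpose, no reuse.
    std::size_t vert_count = 0;
    std::size_t index_count = 0;
    for (const Group& g : groups) {
        vert_count += g.verts.size();
        index_count += g.indices.size();
    }

    MergedModel merged;
    merged.verts.reserve(vert_count);
    merged.indices.reserve(index_count);
    for (const Group& g : groups) {
        // Assumption: group indices are local to the group's own verts.
        const unsigned base = static_cast<unsigned>(merged.verts.size());
        merged.verts.insert(merged.verts.end(), g.verts.begin(), g.verts.end());
        for (unsigned idx : g.indices)
            merged.indices.push_back(base + idx);
    }
    return merged;
}
```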
Cheers & hth.,

How big is the resulting index? Probably just overflow occurs.

Do you know what the error is? If not, you may want to try putting the code into a try/catch for std::exception. Generally, when I have gotten errors along those lines, they were std::bad_alloc errors (GCC reports the type as St9bad_alloc), which essentially means the size supplied to new was invalid or too big (either in terms of actual memory, or because of memory limits imposed by the system). If so, validate the numbers supplied to new, and check the system's resource limits (try the 'limit' or 'ulimit' command if using Linux). Good luck.
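A minimal sketch of that try/catch, using a std::vector for the allocation so nothing has to be freed by hand; the function name describe_allocation is made up for the example. The text returned by what() is implementation-specific, so treat it as a diagnostic hint only.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Attempts to allocate `count` doubles and reports what happened.
// Returns "ok" on success, otherwise the exception's message
// (typically from std::bad_alloc or std::length_error).
std::string describe_allocation(std::size_t count) {
    try {
        std::vector<double> block(count);
        return "ok";
    } catch (const std::exception& e) {
        return e.what();
    }
}
```

Printing the requested count right before the allocation, as suggested elsewhere in this thread, is a cheap way to spot an uninitialized or overflowed size.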

One or more of your capacity.indicesSize is likely uninitialized, so you get a way too large number. Print index before allocating
