Shortening the execution time of two-dimensional dynamic array algorithms in C++

I defined two two-dimensional dynamic arrays and allocated memory for them. Both arrays have the same dimensions (256*256):
double **I1, **I2;
int M = 256;
int N = 256;
int i, j;

I1 = new double *[M+1];
for (i = 1; i <= M; i++) {
    I1[i] = new double[N+1];
}

I2 = new double *[M+1];
for (i = 1; i <= M; i++) {
    I2[i] = new double[N+1];
}
Then I assigned values to the elements of the arrays. I have to execute mathematical algorithms on these arrays. I used a lot of for loops, and my code runs very, very slowly.
For example, to subtract I2 from I1 and assign the result to another two-dimensional array I3, I used this code:
double **I3;
double temp;

// allocate I3
I3 = new double *[M+1];
for (i = 1; i <= M; i++) {
    I3[i] = new double[N+1];
}

// I3 = I1 - I2
for (i = 1; i <= M; i++) {
    for (j = 1; j <= N; j++) {
        temp = I1[i][j] - I2[i][j];
        I3[i][j] = temp;
    }
}
How can I shorten the execution time of this C++ code without using for loops?
Could you suggest other methods, please?
Best Regards..

First of all, in most cases I would advise against manually managing your memory like this. I'm sure you have heard that C++ offers container classes to which "algorithms" can be applied. These containers are less error prone (especially in the case of exceptions), the operations are more expressive, optimized and usually well-tested, so proven to work.
In your case, with the size of the array known in advance, a std::vector can be used with no performance loss (except at creation), since the memory is guaranteed to be contiguous and can thus be used like an array. You should also think about flattening your array; calling an allocation routine in a loop is not exactly speedy - allocation is costly. When doing matrix multiplication, consider allocating in row-major / column-major pairs, this helps caching... but I digress.
This is only a general advice, though - I am not advising you to re-implement this using containers, I just felt the need to mention them.
In this specific case, since you mentioned you want to "execute mathematical algorithms" I would suggest you have a look at a numeric library that is able to do matrix / vector operations, as this seems to be what you are after.
For C++, there is Newmat for example, and the (more or less) canonical BLAS/LAPACK implementations (e.g. Netlib, AMD's ACML, ATLAS). These allow you to perform common (and not so common) operations like adding/subtracting vectors, multiplying matrices etc. much faster, both by using optimized algorithms and by using optimizations such as the SIMD instructions your processor may offer (e.g. SSE).
Obviously, there is no way to avoid iterating over these values when doing computations, but you can do it in an optimized manner and with a standard interface.

In order of importance:
Switch on compiler optimization.
Allocate a single array for each matrix and use something like N*i + j for indexing (see the sketch after this list). This will allocate faster and, perhaps more importantly, be more compact and less fragmented than multiple allocations.
Get used to indexing starting at zero; this will save you one array element per dimension, and comparisons against zero have the potential to be faster.
I see nothing wrong in using for loops.
If you are willing to spend even more effort, you could either use a vectorized 3rd party linear algebra lib or vectorize yourself by using things like SSE* or GPUs.
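For illustration, here is a minimal sketch of the single-allocation idea from point 2, using zero-based indexing (the variable names are just placeholders):

#include <cstddef>

int main() {
    const std::size_t M = 256, N = 256;

    // One contiguous block per matrix instead of M separate row allocations.
    double *I1 = new double[M * N];
    double *I2 = new double[M * N];
    double *I3 = new double[M * N];

    // ... fill I1 and I2 ...

    // I3 = I1 - I2; element (i, j) lives at flat index i * N + j.
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            I3[i * N + j] = I1[i * N + j] - I2[i * N + j];

    delete[] I1;
    delete[] I2;
    delete[] I3;
}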

Some architectures have hardware support for vector arithmetic, such that a single instruction will sum all the elements of an array of doubles.
However, the first thing you must do to speed up a program is measure it. Have you timed your program to see where the slowdown occurs?
For example, one thing you appear to be doing in a for loop is lots of heap allocation, which tends to be slow. You could combine all your arrays into one array for greater speed.
You are currently doing the logical equivalent of this:
I3 = I1 - I2;
If you did this:
I1 -= I2;
Now I1 would be storing the result. This would destroy the original value of I1, but would avoid allocating a new array-of-arrays.
Also the intention of C++ is that you define classes to represent a data type and the operations on it. So you could write a class to represent your dynamic array storage. Or use an existing one - check out the uBLAS library.
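As a rough sketch of that idea (a hypothetical minimal Matrix class with a flat buffer and an in-place operator-=, not uBLAS itself):

#include <vector>
#include <cstddef>

// Hypothetical minimal matrix type: one contiguous buffer, in-place subtraction.
class Matrix {
    std::size_t rows_, cols_;
    std::vector<double> data_;
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    double &operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }

    Matrix &operator-=(const Matrix &other) {
        for (std::size_t k = 0; k < data_.size(); ++k)
            data_[k] -= other.data_[k];
        return *this;
    }
};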

I don't understand why you say that this is very slow. You're doing 256*256 subtractions here. I don't think there is a way to avoid for loops here (even if you're using a matrix library it will probably still do the same).
You might consider allocating all 256*256 doubles in one go instead of calling new 256 times (and then using some indexing arithmetic, because you have only one index), but then it is probably easier to find a matrix library that does this for you.

Everything is already in STL, use valarray.
See also: How can I use a std::valarray to store/manipulate a contiguous 2D array?
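A minimal sketch of how the original subtraction could look with one flat valarray per image (row-major layout assumed):

#include <valarray>
#include <cstddef>

int main() {
    const std::size_t M = 256, N = 256;

    std::valarray<double> I1(M * N), I2(M * N);

    // ... fill I1 and I2; element (i, j) is I1[i * N + j] ...

    // Whole-array arithmetic, no explicit loop in user code.
    std::valarray<double> I3 = I1 - I2;
}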


Do 2d+ vectors cause a performance hit?

In our C++ course they suggest not to use C++ arrays on new projects anymore. As far as I know Stroustrup himself suggests not to use arrays. But are there significant performance differences?
Using C++ arrays with new (that is, using dynamic arrays) should be avoided. There is the problem that you have to keep track of the size, and you need to delete them manually and do all sorts of housekeeping.
Using arrays on the stack is also discouraged because you don't have range checking, and passing the array around will lose any information about its size (array to pointer conversion). You should use std::array in that case, which wraps a C++ array in a small class and provides a size function and iterators to iterate over it.
Now, std::vector vs. native C++ arrays (taken from the internet):
// Comparison of assembly code generated for basic indexing, dereferencing,
// and increment operations on vectors and arrays/pointers.
// Assembly code was generated by gcc 4.1.0 invoked with g++ -O3 -S on a
// x86_64-suse-linux machine.
#include <vector>
struct S
{
int padding;
std::vector<int> v;
int * p;
std::vector<int>::iterator i;
};
int pointer_index (S & s) { return s.p[3]; }
// movq 32(%rdi), %rax
// movl 12(%rax), %eax
// ret
int vector_index (S & s) { return s.v[3]; }
// movq 8(%rdi), %rax
// movl 12(%rax), %eax
// ret
// Conclusion: Indexing a vector is the same damn thing as indexing a pointer.
int pointer_deref (S & s) { return *s.p; }
// movq 32(%rdi), %rax
// movl (%rax), %eax
// ret
int iterator_deref (S & s) { return *s.i; }
// movq 40(%rdi), %rax
// movl (%rax), %eax
// ret
// Conclusion: Dereferencing a vector iterator is the same damn thing
// as dereferencing a pointer.
void pointer_increment (S & s) { ++s.p; }
// addq $4, 32(%rdi)
// ret
void iterator_increment (S & s) { ++s.i; }
// addq $4, 40(%rdi)
// ret
// Conclusion: Incrementing a vector iterator is the same damn thing as
// incrementing a pointer.
Note: If you allocate arrays with new and allocate non-class objects (like plain int) or classes without a user defined constructor and you don't want to have your elements initialized initially, using new-allocated arrays can have performance advantages because std::vector initializes all elements to default values (0 for int, for example) on construction (credits to #bernie for reminding me).
Preamble for micro-optimizer people
Remember:
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%".
(Thanks to metamorphosis for the full quote)
Don't use a C array instead of a vector (or whatever) just because you believe it's faster as it is supposed to be lower-level. You would be wrong.
Use by default vector (or the safe container adapted to your need), and then if your profiler says it is a problem, see if you can optimize it, either by using a better algorithm, or changing container.
This said, we can go back to the original question.
Static/Dynamic Array?
The C++ array classes are better behaved than the low-level C array because they know a lot about themselves, and can answer questions C arrays can't. They are able to clean up after themselves. And more importantly, they are usually written using templates and/or inlining, which means that what appears to be a lot of code in debug resolves to little or no code produced in a release build, meaning no difference with their built-in, less safe competition.
All in all, it falls into two categories:
Dynamic arrays
Using a pointer to a malloc-ed/new-ed array will be at best as fast as the std::vector version, and a lot less safe (see litb's post).
So use a std::vector.
Static arrays
Using a static array will be at best:
as fast as the std::array version
and a lot less safe.
So use a std::array.
Uninitialized memory
Sometimes, using a vector instead of a raw buffer incurs a visible cost because the vector will initialize the buffer at construction, while the code it replaces didn't, as remarked by bernie in his answer.
If this is the case, then you can handle it by using a unique_ptr instead of a vector or, if the case is not exceptional in your codebase, actually write a class buffer_owner that will own that memory and give you easy and safe access to it, including bonuses like resizing it (using realloc?), or whatever you need.
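A minimal sketch of the unique_ptr variant (note that new double[n] leaves the doubles uninitialized, unlike std::vector<double>(n)):

#include <memory>
#include <cstddef>

int main() {
    const std::size_t n = 256 * 256;

    // Owns the allocation and cleans up automatically, but does not
    // value-initialize the doubles the way std::vector<double>(n) would.
    std::unique_ptr<double[]> buffer(new double[n]);

    buffer[0] = 1.0;  // elements must be written before they are read
}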
Vectors are arrays under the hood.
The performance is the same.
One place where you can run into a performance issue, is not sizing the vector correctly to begin with.
As a vector fills, it will resize itself, and that can imply a new array allocation, followed by n copy-constructor calls, followed by about n destructor calls, followed by an array delete.
If your construct/destruct is expensive, you are much better off making the vector the correct size to begin with.
There is a simple way to demonstrate this. Create a simple class that shows when it is constructed/destroyed/copied/assigned. Create a vector of these things, and start pushing them on the back end of the vector. When the vector fills, there will be a cascade of activity as the vector resizes. Then try it again with the vector sized to the expected number of elements. You will see the difference.
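A rough sketch of that experiment (an instrumented type with hypothetical names; run it once as-is and once with the reserve line uncommented, and compare the output):

#include <cstdio>
#include <vector>

struct Noisy {
    Noisy()                          { std::puts("construct"); }
    Noisy(const Noisy &)             { std::puts("copy"); }
    Noisy &operator=(const Noisy &)  { std::puts("assign"); return *this; }
    ~Noisy()                         { std::puts("destroy"); }
};

int main() {
    std::vector<Noisy> v;
    // v.reserve(8);   // uncomment to pre-size and silence the reallocation cascade
    for (int i = 0; i < 8; ++i)
        v.push_back(Noisy());   // without reserve, growth triggers extra copies + destroys
}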
To respond to something Mehrdad said:
However, there might be cases where you still need arrays. When interfacing with low level code (i.e. assembly) or old libraries that require arrays, you might not be able to use vectors.
Not true at all. Vectors degrade nicely into arrays/pointers if you use:
std::vector<double> v;
v.push_back(42);
double *array = &(*v.begin());  // or v.data() in C++11
// pass the array to whatever low-level code you have
This works for all major STL implementations. In the next standard, it will be required to work (even though it does just fine today).
You have even fewer reasons to use plain arrays in C++11.
There are 3 kinds of arrays in nature, from fastest to slowest, depending on the features they have (of course the quality of the implementation can make things really fast even for case 3 in the list):
Static with size known at compile time. --- std::array<T, N>
Dynamic with size known at runtime and never resized. The typical optimization here is that the array can be allocated on the stack directly. -- Not available in the standard. Maybe dynarray in a C++ TS after C++14. In C there are VLAs.
Dynamic and resizable at runtime. --- std::vector<T>
For 1. plain static arrays with fixed number of elements, use std::array<T, N> in C++11.
For 2. fixed-size arrays specified at runtime that won't change their size, this was discussed for C++14 but was ultimately moved out into a technical specification.
For 3. std::vector<T> will usually ask for memory on the heap. This could have performance consequences, though you could use std::vector<T, MyAlloc<T>> to improve the situation with a custom allocator. The advantage compared to T *mytype = new T[n]; is that you can resize it and that it will not decay to a pointer, as plain arrays do.
Use the standard library types mentioned to avoid arrays decaying to pointers. You will save debugging time and the performance is exactly the same as with plain arrays if you use the same set of features.
There is definitely a performance impact to using an std::vector vs a raw array when you want an uninitialized buffer (e.g. to use as destination for memcpy()). An std::vector will initialize all its elements using the default constructor. A raw array will not.
The C++ spec for the std::vector constructor taking a count argument (it's the third form) states:
Constructs a new container from a variety of data sources, optionally using a user supplied allocator alloc.
Constructs the container with count default-inserted instances of T. No copies are made.
Complexity
2-3) Linear in count
A raw array does not incur this initialization cost.
Note that with a custom allocator, it is possible to avoid "initialization" of the vector's elements (i.e. to use default initialization instead of value initialization). See these questions for more details:
Is this behavior of vector::resize(size_type n) under C++11 and Boost.Container correct?
How can I avoid std::vector<> to initialize all its elements?
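One commonly used sketch of such an allocator adaptor (it default-initializes instead of value-initializing, so plain doubles are left uninitialized; treat it as an illustration, not a drop-in component):

#include <memory>
#include <new>
#include <utility>
#include <vector>

// Allocator adaptor that turns value-initialization into default-initialization.
template <typename T, typename A = std::allocator<T>>
struct default_init_allocator : A {
    using A::A;

    template <typename U>
    struct rebind {
        using other = default_init_allocator<U,
            typename std::allocator_traits<A>::template rebind_alloc<U>>;
    };

    template <typename U>
    void construct(U *ptr) {
        ::new (static_cast<void *>(ptr)) U;   // default-init: no zeroing for double
    }
    template <typename U, typename... Args>
    void construct(U *ptr, Args &&...args) {
        std::allocator_traits<A>::construct(static_cast<A &>(*this), ptr,
                                            std::forward<Args>(args)...);
    }
};

// Usage: std::vector<double, default_init_allocator<double>> buf(1 << 20);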
Go with STL. There's no performance penalty. The algorithms are very efficient and they do a good job of handling the kinds of details that most of us would not think about.
STL is a heavily optimized library. In fact, it's even suggested to use STL in games where high performance might be needed. Arrays are too error prone to be used in day to day tasks. Today's compilers are also very smart and can really produce excellent code with STL. If you know what you are doing, STL can usually provide the necessary performance. For example by initializing vectors to required size (if you know from start), you can basically achieve the array performance. However, there might be cases where you still need arrays. When interfacing with low level code (i.e. assembly) or old libraries that require arrays, you might not be able to use vectors.
About duli's contribution with my own measurements.
The conclusion is that arrays of integers are faster than vectors of integers (5 times faster in my example). However, arrays and vectors are around the same speed for more complex / not aligned data.
If you compile the software in debug mode, many compilers will not inline the accessor functions of the vector. This will make the stl vector implementation much slower in circumstances where performance is an issue. It will also make the code easier to debug since you can see in the debugger how much memory was allocated.
In optimized mode, I would expect the stl vector to approach the efficiency of an array. This is since many of the vector methods are now inlined.
The performance difference between the two is very much implementation dependent - if you compare a badly implemented std::vector to an optimal array implementation, the array would win, but turn it around and the vector would win...
As long as you compare apples with apples (either both the array and the vector have a fixed number of elements, or both get resized dynamically), I would think that the performance difference is negligible as long as you follow good STL coding practice. Don't forget that using standard C++ containers also allows you to make use of the pre-rolled algorithms that are part of the standard C++ library, and most of them are likely to perform better than the average implementation of the same algorithm you build yourself.
That said, IMHO the vector wins in a debug scenario with a debug STL, as most STL implementations with a proper debug mode can at least highlight/catch the typical mistakes made by people when working with standard containers.
Oh, and don't forget that the array and the vector share the same memory layout so you can use vectors to pass data to legacy C or C++ code that expects basic arrays. Keep in mind that most bets are off in that scenario, though, and you're dealing with raw memory again.
If you're using vectors to represent multi-dimensional behavior, there is a performance hit.
Do 2d+ vectors cause a performance hit?
The gist is that there is a small amount of overhead, since each sub-vector stores its own size information, and the data will not necessarily be laid out contiguously (as it is with multi-dimensional C arrays). This lack of contiguity costs more than a micro-optimization. If you're doing multi-dimensional arrays, it may be best to just extend std::vector and roll your own get/set/resize functions, as sketched below.
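A minimal sketch of that kind of wrapper (flat row-major storage behind a 2D get/set/resize interface; the names are hypothetical):

#include <vector>
#include <cstddef>

// Flat row-major storage with a 2D interface.
class Grid2D {
    std::size_t rows_ = 0, cols_ = 0;
    std::vector<double> data_;
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    double get(std::size_t i, std::size_t j) const      { return data_[i * cols_ + j]; }
    void   set(std::size_t i, std::size_t j, double v)  { data_[i * cols_ + j] = v; }

    void resize(std::size_t rows, std::size_t cols) {
        data_.assign(rows * cols, 0.0);   // note: does not preserve old contents
        rows_ = rows; cols_ = cols;
    }
};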
If you do not need to dynamically adjust the size, you have the memory overhead of saving the capacity (one pointer/size_t). That's it.
There might be some edge case where you have a vector access inside an inline function inside an inline function, where you've gone beyond what the compiler will inline and it will force a function call. That would be so rare as to not be worth worrying about - in general I would agree with litb.
I'm surprised nobody has mentioned this yet - don't worry about performance until it has been proven to be a problem, then benchmark.
I'd argue that the primary concern isn't performance, but safety. You can make a lot of mistakes with arrays (consider resizing, for example), where a vector would save you a lot of pain.
Vectors use a tiny bit more memory than arrays since they contain the size of the array. They also increase the hard disk size of programs and probably the memory footprint of programs. These increases are tiny, but may matter if you're working with an embedded system. Though most places where these differences matter are places where you would use C rather than C++.
The following simple test:
C++ Array vs Vector performance test explanation
contradicts the conclusions from "Comparison of assembly code generated for basic indexing, dereferencing, and increment operations on vectors and arrays/pointers."
There must be a difference between the arrays and vectors. The test says so... just try it, the code is there...
Sometimes arrays are indeed better than vectors. If you are always manipulating a fixed-length set of objects, arrays are better. Consider the following code snippets:
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// Define one of the two X classes shown below before main().
// (The array version of X takes int v[3] directly; the vector version would
//  need vector<int> v = {1, 2, 3} in main instead.)
int main() {
    int v[3];
    v[0] = 1; v[1] = 2; v[2] = 3;
    int sum = 0;
    int starttime = time(NULL);
    cout << starttime << endl;
    for (int i = 0; i < 50000; i++)
        for (int j = 0; j < 10000; j++) {
            X x(v);
            sum += x.first();
        }
    int endtime = time(NULL);
    cout << endtime << endl;
    cout << endtime - starttime << endl;
}
where the vector version of X is
class X {
    vector<int> vec;
public:
    X(const vector<int>& v) { vec = v; }
    int first() { return vec[0]; }
};
and the array version of X is:
class X {
    int f[3];
public:
    X(int a[]) { f[0] = a[0]; f[1] = a[1]; f[2] = a[2]; }
    int first() { return f[0]; }
};
The array version of main() will be faster because we avoid the overhead of "new" every time in the inner loop.
(This code was posted to comp.lang.c++ by me).
For fixed-length arrays the performance is the same (vs. vector<>) in release build, but in debug build low-level arrays win by a factor of 20 in my experience (MS Visual Studio 2015, C++ 11).
So the "save time debugging" argument in favor of STL might be valid if you (or your coworkers) tend to introduce bugs in your array usage, but maybe not if your debugging time is mostly waiting on your code to run to the point you are currently working on so that you can step through it.
Experienced developers working on numerically intensive code sometimes fall into the second group (especially if they use vector :) ).
Assuming a fixed-length array (e.g. int* v = new int[1000]; vs std::vector<int> v(1000);, with the size of v being kept fixed at 1000), the only performance consideration that really matters (or at least mattered to me when I was in a similar dilemma) is the speed of access to an element. I looked up the STL's vector code, and here is what I found:
const_reference
operator[](size_type __n) const
{ return *(this->_M_impl._M_start + __n); }
This function will most certainly be inlined by the compiler. So, as long as the only thing that you plan to do with v is access its elements with operator[], it seems like there shouldn't really be any difference in performance.
There is no argument about which of them is the best to use. They both have their own use cases, and they both have pros and cons. The behavior of the two containers differs in various respects. One of the main difficulties with arrays is that they are fixed in size: once defined or initialized, you cannot change their size. Vectors, on the other hand, are flexible; you can change a vector's size whenever you want, because an array has static memory allocation while a vector has dynamic (heap) memory allocation (we can push and pop elements into/from a vector). The creator of C++, Bjarne Stroustrup, has said that vectors are more flexible to use than arrays.
We can also easily insert, push, and pop values in vectors, which is not as easy with arrays.
Performance-wise, if you are working with small, fixed amounts of data, arrays are fine; if you are working at a larger scale, you should go with vector (vectors handle large amounts of data better than arrays).

Declaring 3D array structure in c++ using vector

Hi, I am a graduate student studying scientific computing using C++. Some of our research focuses on the speed of an algorithm, so it is important to construct an array structure that is fast enough.
I've seen two ways of constructing 3D Arrays.
The first one is to use the vector library:
vector<vector<vector<double>>> a(isize, vector<vector<double>>(jsize, vector<double>(ksize, 0)));
This gives 3D array structure of size isize x jsize x ksize.
The other one is to construct a structure containing a 1D array of size isize * jsize * ksize using new double[isize*jsize*ksize]. To access the specific location (i,j,k) easily, operator overloading is necessary (am I right?).
From what I have experienced, the first one is much faster, since it can access location (i,j,k) easily, while the latter has to compute the location and return the value. But I have seen some people prefer the latter over the first one. Why do they prefer the latter setting, and is there any disadvantage to using the first one?
Thanks in advance.
Main difference between those will be the layout:
vector<vector<vector<T>>>
This will get you a 1D array of vector<vector<T>>.
Each item will be a 1D array of vector<T>.
And each item of those 1D arrays will be a 1D array of T.
The point is, vector itself does not store its content. It manages a chunk of memory, and stores the content there. This has a number of bad consequences:
For a matrix of dimension X·Y·Z, you will end up allocating 1 + X + X·Y memory chunks. That's horribly slow, and will trash the heap. Imagine: a cube matrix of size 20 would trigger 421 calls to new!
To access a cell, you have 3 levels of indirection:
You must access the vector<vector<vector<T>>> object to get pointer to top-level memory chunk.
You must then access the vector<vector<T>> object to get pointer to second-level memory chunk.
You must then access the vector<T> object to get pointer to the leaf memory chunk.
Only then you can access the T data.
Those memory chunks will be spread around the heap, causing a lot of cache misses and slowing the overall computation.
Should you get it wrong at some point, it is possible to end up with some lines in your matrix having different lengths. After all, they're independent 1-d arrays.
Having a contiguous memory block (like new T[X * Y * Z]) on the other hand gives:
You allocate 1 memory chunk. No heap trashing, O(1).
You only need to access the pointer to the memory chunk, then can go straight for desired element.
All matrix is contiguous in memory, which is cache-friendly.
These days, a single cache miss means dozens or hundreds of lost computing cycles; do not underestimate the cache-friendliness aspect.
By the way, there is a probably better way you didn't mention: using one of the numerous matrix libraries that will handle this for you automatically and provide nice support tools (like SSE-accelerated matrix operations). One such library is Eigen, but there are plenty others.
→ You want to do scientific computing? Let a lib handle the boilerplate and the basics so you can focus on the scientific computing part.
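For instance, with Eigen the subtraction from the very first question above collapses to something like this (a sketch, assuming Eigen's headers are available on the include path):

#include <Eigen/Dense>

int main() {
    Eigen::MatrixXd I1 = Eigen::MatrixXd::Random(256, 256);
    Eigen::MatrixXd I2 = Eigen::MatrixXd::Random(256, 256);

    Eigen::MatrixXd I3 = I1 - I2;   // vectorized internally, no hand-written loops
}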
From my point of view, there are many advantages that std::vector has over plain arrays.
In short here are some:
It is much harder to create memory leaks with std::vector. This point alone is one of the biggest advantages. This has nothing to do with performance, but should be considered all the time.
std::vector is part of the STL. This part of C++ is one of the most heavily used. Thousands of people use the STL, so it gets "tested" every day. Over the last years it has been optimized so radically that it doesn't lack performance anymore. (Please correct me if I am seeing this wrong.)
Handling a std::vector is as easy as 1, 2, 3. No pointer handling, nothing... just access it via the []-operator and its other methods.
First of all, the idea that you access (i,j,k) in your vec^3 directly is somewhat flawed. What you have is a structure of pointers where you need to dereference three pointers along the way. Note that I have no idea whether that is faster or slower than computing the position within a one-dimensional array, though. You'd need to test that and it might depend on the size of your data (especially whether it fits in a chunk).
Second, the vector^3 requires pointers and vector sizes, which require more memory. In many cases this will be irrelevant (as the image grows cubically but the memory difference only quadratically), but if your algorithm is really going to fill out any memory available, that can matter.
Third, the raw array stores everything in consecutive memory, which is good for streaming and can be good for certain algorithms because of quick cache accesses. For example when you add one 3D image to another.
Note that all of this is about hyper-optimization that you might not need. The advantages of vectors that skratchi.at pointed out in his answer are quite strong, and I add the advantage that vectors usually increase readability. If you do not have very good reasons not to use vectors, then use them.
If you should decide for the raw array, in any case, make sure that you wrap it well and keep the class small and simple, in order to counter problems regarding leaks and such.
Welcome to SO.
If everything what you have are the two alternatives, then the first one could be better.
Prefer using STL array or vector instead of a C array
You should avoid using plain C++ arrays, since you need to manage the memory yourself (allocating/deallocating with new/delete) along with other boilerplate such as keeping track of the size and checking bounds. In plain words: "C arrays are less safe, and have no advantages over array and vector."
However, there are some important drawbacks in the first alternative. Something I would like to highlight is that:
std::vector<std::vector<std::vector<T>>>
is not a 3-D matrix. In a matrix, all the rows must have the same size. On the other hand, in a "vector of vectors" there is no guarantee that all the nested vectors have the same length. The reason is that a vector is a linear 1-D structure, as pointed out in the #spectras answer. Hence, to avoid all sorts of bad or unexpected behaviour, you must include guards in your code to guarantee the rectangular invariant.
Luckily, the first alternative is not the only one you may have in hands.
For example, if the dimensions are known at compile time, you can replace the C-style array by a std::array:
constexpr int n = i_size * j_size * k_size;
std::array<int, n> myFlattenedMatrix;
or use std::vector in case your matrix dimensions can change (or are too large for the stack).
Accessing element by its 3 coordinates
Regarding your question
To access the specific location of (i,j,k) easily, operator
overloading is necessary(am I right?).
Not exactly. Since there is no 3-parameter subscript operator for either std::vector or array, you can't overload it directly. But you can create a template class or function to wrap it for you. In any case you will have to either dereference the 3 nested vectors or compute the flattened index of the element in the linear storage.
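A minimal sketch of such a wrapper (flat row-major storage with a 3-argument operator(); the names are hypothetical):

#include <vector>
#include <cstddef>

template <typename T>
class Array3D {
    std::size_t ni_, nj_, nk_;
    std::vector<T> data_;
public:
    Array3D(std::size_t ni, std::size_t nj, std::size_t nk)
        : ni_(ni), nj_(nj), nk_(nk), data_(ni * nj * nk) {}

    // Element (i, j, k) lives at flat index (i * nj + j) * nk + k.
    T &operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data_[(i * nj_ + j) * nk_ + k];
    }
    const T &operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data_[(i * nj_ + j) * nk_ + k];
    }
};

// Usage: Array3D<double> a(isize, jsize, ksize); a(1, 2, 3) = 4.0;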
Consider not using a third-party matrix library like Eigen for your experiments
You aren't coding this for production but for research purposes. In particular, your research is exactly about the performance of algorithms. In that case, I would not recommend using a third-party library like Eigen at all. Of course it depends a lot on what kind of "speed of an algorithm" metrics you are willing to gather, but Eigen, for instance, will do a lot of things under the hood (like vectorization) which will have a tremendous influence on your experiments. Since it will be hard for you to control those unseen optimizations, these library features may lead you to wrong conclusions about your algorithms.
Algorithm's performance and big-o notation
Usually, the performance of algorithms is analysed using the big-O approach, where factors like the actual time spent, hardware speed, or programming language traits aren't taken into account. The book "Data Structures and Algorithms in C++" by Adam Drozdek provides more details about it.

What is the best way to treat large arrays of numbers in C++?

I need to deal with large arrays of float numbers (>200,000 numbers) and perform some maths with these arrays.
What do you suggest to treat these arrays so that I do not get any stack overflow problems?
UPDATE: I want to do simple and complex maths (sums, products, sin, cos, arctan) operations.
Plain numerical data you need to sequentially operate on?
std::valarray<double>
If profiling shows this is slowing you down, look for ways to make it faster by using
std::valarray<double>::resize()
(yes, there's no reserve(), unfortunately).
Why std::valarray<double> for numerical data? If you want to perform an operation on every element, just call
std::valarray<double>::apply(somefunction)
For more information, see a C++ reference.
If you want to be able to reserve(), you'll need std::vector, which is also fine but doesn't have overloads for the math functions you may want to use.
EDIT: This is of course assuming you have enough memory to fit all your arrays into std::valarrays. If not, you should split the 200,000 rows up so that only so many floats are in memory at the same time.
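A small sketch of the whole-array style this enables (sums, element-wise sin, and apply; shifted_atan is just an illustrative function):

#include <valarray>
#include <cmath>
#include <iostream>

double shifted_atan(double x) { return std::atan(x) + 1.0; }

int main() {
    std::valarray<double> a(200000);

    // ... fill a ...

    std::valarray<double> b = std::sin(a) * 2.0;       // element-wise sin and scaling
    std::valarray<double> c = a.apply(shifted_atan);   // arbitrary per-element function

    std::cout << b.sum() << " " << c.sum() << "\n";
}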
If your data are sparse, you could probably use Boost's sparse_matrix http://www.boost.org/doc/libs/1_41_0/libs/numeric/ublas/doc/matrix_sparse.htm to represent your data structure and significantly reduce memory requirements.
Otherwise I would suggest looking into ways that you can split the data into chunks and work on one chunk in memory, then store that state to file and repeat.
I suggest you process them in chunks of 10,000 at a time and then combine the results.
It depends on what operations you are doing.
Depends on what you want to do with them.
Also, as chris said in the comments, dynamically allocate memory for your array (to get memory from the heap) and avoid using it as a local variable (which is allocated in the stack).

How to create an array with size more than C++ limits

I have a little problem here. I wrote C++ code to create an array, but when I set the array size to 100,000,000 or more, I get an error.
this is my code:
int i=0;
double *a = new double[n*n];
This part is very important for my project.
When you think you need an array of 100,000,000 elements, what you actually need is a different data structure that you probably have never heard of before. Maybe a hash map, or maybe a sparse matrix.
If you tell us more about the actual problem you are trying to solve, we can provide better help.
In general, the only reason that would fail would be due to lack of memory/memory fragmentation/available address space. That is, trying to allocate 800MB of memory. Granted, I have no idea why your system's virtual memory can't handle that, but maybe you allocated a bunch of other stuff. It doesn't matter.
Your alternatives are tricks like memory-mapped files, sparse arrays, and so forth, instead of an explicit C-style array.
If you do not have sufficient memory, you may need to use a file to store your data and process it in smaller chunks.
I don't know if IMSL provides what you are looking for; however, if you want to work on smaller chunks, you might devise an algorithm that calls IMSL functions on these small chunks and later merges the results. For example, you can do matrix multiplication by combining multiplications of sub-matrices.

Optimising C++ 2-D arrays

I need a way to represent a 2-D array (a dense matrix) of doubles in C++, with absolute minimum accessing overhead.
I've done some timing on various linux/unix machines and gcc versions. An STL vector of vectors, declared as:
vector<vector<double> > matrix(n,vector<double>(n));
and accessed through matrix[i][j] is between 5% and 100% slower to access than an array declared as:
double *matrix = new double[n*n];
accessed through an inlined index function matrix[index(i,j)], where index(i,j) evaluates to i+n*j. Other ways of arranging a 2-D array without STL - an array of n pointers to the start of each row, or defining the whole thing on the stack as a constant size matrix[n][n] - run at almost exactly the same speed as the index function method.
Recent GCC versions (> 4.0) seem to be able to compile the STL vector-of-vectors to nearly the same efficiency as the non-STL code when optimisations are turned on, but this is somewhat machine-dependent.
I'd like to use STL if possible, but will have to choose the fastest solution. Does anyone have any experience in optimising STL with GCC?
If you're using GCC the compiler can analyze your matrix accesses and change the order in memory in certain cases. The magic compiler flag is defined as:
-fipa-matrix-reorg
Perform matrix flattening and transposing. Matrix flattening tries to replace an m-dimensional matrix with its equivalent n-dimensional matrix, where n < m. This reduces the level of indirection needed for accessing the elements of the matrix. The second optimization is matrix transposing, which attempts to change the order of the matrix's dimensions in order to improve cache locality. Both optimizations need the -fwhole-program flag. Transposing is enabled only if profiling information is available.
Note that this option is not enabled by -O2 or -O3. You have to pass it yourself.
My guess is that the fastest approach, for a matrix, is to use a 1D STL array and overload the () operator to use it as a 2D matrix.
However, the STL also defines a type specifically designed for numerical arrays: valarray. You also get various optimisations for in-place operations.
valarray accept as argument a numerical type:
valarray<double> a;
Then, you can use slices, indirect arrays, ... and of course, you can inherit the valarray and define your own operator()(int i, int j) for 2D arrays ...
Very likely this is a locality-of-reference issue. vector uses new to allocate its internal array, so each row will be at least a little apart in memory due to each block's header; it could be a long distance apart if memory is already fragmented when you allocate them. Different rows of the array are likely to at least incur a cache-line fault and could incur a page fault; if you're really unlucky two adjacent rows could be on memory lines that share a TLB slot and accessing one will evict the other.
In contrast your other solutions guarantee that all the data is adjacent. It could help your performance if you align the structure so it crosses as few cache lines as possible.
vector is designed for resizable arrays. If you don't need to resize the arrays, use a regular C++ array. STL operations can generally operate on C++ arrays.
Do be sure to walk the array in the correct direction, i.e. across (consecutive memory addresses) rather than down. This will reduce cache faults.
My recommendation would be to use Boost.UBLAS, which provides fast matrix/vector classes.
To be fair, it depends on the algorithms you are using on the matrix.
The double name[n*m] format is very fast when you are accessing data by rows, both because it has almost no overhead besides a multiplication and an addition, and because your rows are packed data that will be coherent in cache.
If your algorithms access column-ordered data, then other layouts might have much better cache coherence. If your algorithm accesses data in quadrants of the matrix, still other layouts might be better.
Try to do some research directed at the type of usage and algorithms you are using. That is especially important if the matrices are very large, since cache misses may hurt your performance far more than needing 1 or 2 extra math operations to access each address.
You could just as easily do vector< double >( n*m );
You may want to look at the Eigen C++ template library at http://eigen.tuxfamily.org/ . It generates AltiVec or sse2 code to optimize the vector/matrix calculations.
There is the uBLAS implementation in Boost. It is worth a look.
http://www.boost.org/doc/libs/1_36_0/libs/numeric/ublas/doc/matrix.htm
Another related library is Blitz++: http://www.oonumerics.org/blitz/docs/blitz.html
Blitz++ is designed to optimize array manipulation.
I have done this some time back for raw images by declaring my own 2-dimensional array classes.
In a normal 2D array, you access the elements like array[2][3]. To get that effect, you'd have a class array with an overloaded [] accessor. But this would essentially return another array, thereby giving you the second dimension.
The problem with this approach is that it has a double function call overhead.
The way I did it was to use the () style overload.
So instead of array[2][3], I had it use this style: array(2,3).
That () function was very tiny and I made sure it was inlined.
See this link for the general concept of that:
http://www.learncpp.com/cpp-tutorial/99-overloading-the-parenthesis-operator/
You can template the type if you need to.
The difference I had was that my array was dynamic. I had a block of char memory I'd declare. And I employed a column cache, so I knew where in my sequence of bytes the next row began. Access was optimized for accessing neighbouring values, because I was using it for image processing.
It's hard to explain without the code but essentially the result was as fast as C, and much easier to understand and use.
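A rough sketch of the general idea (not the original class; a dynamic 2D byte buffer with a small operator() meant to be inlined, names are hypothetical):

#include <cstddef>
#include <vector>

class Image2D {
    std::size_t width_, height_;
    std::vector<unsigned char> bytes_;   // one contiguous block of raw memory
public:
    Image2D(std::size_t width, std::size_t height)
        : width_(width), height_(height), bytes_(width * height) {}

    // Tiny accessor; intended to be inlined, so access costs one multiply-add.
    unsigned char &operator()(std::size_t x, std::size_t y) {
        return bytes_[y * width_ + x];
    }
};

// Usage: Image2D img(640, 480); img(2, 3) = 255;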