2D array access time comparison - C++

I have two ways of constructing a 2D array:
int arr[NUM_ROWS][NUM_COLS];
//...
tmp = arr[i][j];
and flattened array
int arr[NUM_ROWS*NUM_COLS];
//...
tmp = arr[i*NUM_COLS+j];
I am doing image processing, so even a small improvement in access time matters. Which one is faster? My instinct says the first, since the second needs an explicit calculation, but then the first requires two address lookups, so I am not sure.

I don't think there is any performance difference. The system will allocate the same amount of contiguous memory in both cases. The calculation i*NUM_COLS+j has to happen either way: either you write it explicitly for the 1D array, or the compiler generates it for the 2D case. The only real concern is ease of use.
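To check this on a given compiler, a minimal benchmark sketch along these lines can be used (the array sizes and the summing workload are arbitrary choices, not from the question):

#include <chrono>
#include <iostream>

constexpr int NUM_ROWS = 1024;
constexpr int NUM_COLS = 1024;

static int arr2d[NUM_ROWS][NUM_COLS];
static int arrflat[NUM_ROWS * NUM_COLS];

int main()
{
    using clock_type = std::chrono::steady_clock;
    long long sum = 0;

    auto t0 = clock_type::now();
    for (int i = 0; i < NUM_ROWS; ++i)
        for (int j = 0; j < NUM_COLS; ++j)
            sum += arr2d[i][j];
    auto t1 = clock_type::now();
    for (int i = 0; i < NUM_ROWS; ++i)
        for (int j = 0; j < NUM_COLS; ++j)
            sum += arrflat[i * NUM_COLS + j];
    auto t2 = clock_type::now();

    std::cout << "2D:   " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    std::cout << "flat: " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
    std::cout << "(checksum " << sum << ")\n"; // keeps the loops from being optimized away entirely
}

With optimizations enabled, both loops typically compile down to the same address arithmetic, so the two times usually come out equal.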

Trust your compiler's ability to optimize standard code, and trust modern CPUs to have fast integer multiplication instructions.
Don't agonize over one form or the other!
Decades ago I sped some code up considerably by using pointers instead of 2D-array index calculation, but this will a) only be useful if you can keep the pointer around, e.g. across a loop, and b) probably have low impact, since I would guess modern CPUs do 2D array access in very few cycles. Worth measuring! It may also depend on the array size.
In any case, pointers using ptr++ or ptr += NUM_COLS will certainly be a little faster where they are applicable; see the sketch below.
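For illustration, a minimal sketch of that pointer-increment idea, assuming a row-major flattened buffer as in the question (the function name is made up):

// Sketch of the ptr++ idea: walk a row-major buffer with a moving pointer
// instead of recomputing i * NUM_COLS + j for every element.
long long sum_all(const int *arrflat, int num_rows, int num_cols)
{
    long long sum = 0;
    const int *p = arrflat;
    for (int i = 0; i < num_rows; ++i)
        for (int j = 0; j < num_cols; ++j)
            sum += *p++; // one pointer increment per element, no multiply
    return sum;
}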

The first method will almost always be faster. IN GENERAL (because there are always corner cases), processor and memory architectures as well as compilers may have optimizations built in to help with 2D arrays and other similar data structures. For example, GPUs are optimized for matrix (2D array) math.
So, again in general, I would let the compiler and hardware optimize your memory layout and address arithmetic where possible.
...also, I agree with @Paul R: there are much bigger performance considerations than your array allocation and address arithmetic.

There are two cases to consider: compile-time and run-time definition of the array size. There is a big difference in performance between them.
Static allocation, global or file scope, fixed size array:
The compiler knows the size of the array and tells the linker to allocate space in the data / memory section. This is the fastest method.
Example:
#define ROWS 5
#define COLUMNS 6
int array[ROWS][COLUMNS];
int buffer[ROWS * COLUMNS];
Run-time allocation, function local scope, fixed size array:
The compiler knows the size of the array, and tells the code to allocate space in the local memory (a.k.a. stack) for the array. In general, this means adding a value to a stack register. Usually one or two instructions.
Example:
void my_function(void)
{
    unsigned short my_array[ROWS][COLUMNS];
    unsigned short buffer[ROWS * COLUMNS];
}
Run-time allocation, dynamic memory, fixed size array:
Again, the compiler has already calculated the amount of memory required for the array, since it was declared with a fixed size. The compiler emits code to call the memory allocation function with the required amount (usually passed as a parameter). A little slower because of the function call and the overhead required to find a suitable block of dynamic memory (plus whatever bookkeeping the allocator performs).
Example:
void another_function(void)
{
    unsigned char * array = new unsigned char [ROWS * COLUMNS];
    //...
    delete[] array;
}
Run-time allocation, dynamic memory, variable size:
Regardless of the dimensions of the array, the compiler must emit code to calculate the amount of memory to allocate. This quantity is then passed to the memory allocation function. A little slower than above because of the code required to calculate the size.
Example:
int * create_board(unsigned int rows, unsigned int columns)
{
    int * board = new int [rows * columns];
    return board;
}

Since your goal is image processing, I would assume your images are too large for static arrays. The correct question to ask is about dynamically allocated arrays.
In C/C++ there are multiple ways to allocate a dynamic 2D array (see How do I work with dynamic multi-dimensional arrays in C?). To make this work in both C and C++ we can use malloc with a cast (in C++ only, you could use new instead).
Method 1:
int** arr1 = (int**)malloc(NUM_ROWS * sizeof(int*));
for (int i = 0; i < NUM_ROWS; i++)
    arr1[i] = (int*)malloc(NUM_COLS * sizeof(int));
Method 2:
int** arr2 = (int**)malloc(NUM_ROWS * sizeof(int*));
int* arrflat = (int*)malloc(NUM_ROWS * NUM_COLS * sizeof(int));
for (int i = 0; i < NUM_ROWS; i++)
    arr2[i] = arrflat + (i * NUM_COLS);
Method 2 essentially creates a contiguous 2D array: arrflat[NUM_COLS*i+j] and arr2[i][j] should have identical performance. However, arrflat[NUM_COLS*i+j] and arr1[i][j] from method 1 should not be expected to perform identically, since arr1's rows are separate allocations and not contiguous. Method 1, however, seems to be the most commonly used approach for dynamic arrays.
In general, I use arrflat[NUM_COLS*i+j] so I don't have to think about how to allocate dynamic 2D arrays.
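One practical difference worth noting is cleanup. A sketch of the matching deallocation for both methods, assuming the names above (in C++ you would use delete[] analogously):

#include <stdlib.h>

/* Method 1: every row was a separate malloc, so free each row,
   then the row-pointer table itself. */
void free_method1(int **arr1, int num_rows)
{
    for (int i = 0; i < num_rows; i++)
        free(arr1[i]);
    free(arr1);
}

/* Method 2: the data is one contiguous block, so only two frees are needed. */
void free_method2(int **arr2, int *arrflat)
{
    free(arrflat); /* releases all NUM_ROWS * NUM_COLS elements at once */
    free(arr2);    /* releases the row-pointer table */
}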

Related

How fast is memory access to a vector compared to a normal array?

I have a function that is called thousands of times a second (it's an audio effect) and I need a buffer for reading and writing audio data. Is there a considerable difference in performance between declaring a float array as a plain array or as a vector?
Once declared, my array is not resized during the audio loop, but at initialization I don't know the exact length because it depends on the audio sampling rate. So, for example, if I need a 2-second audio buffer at a sampling rate of 44100 Hz, I normally do this:
declaration:
int size;
float *buffer;

void init (int sr)
{
    size = sr * 2;
    buffer = new float[size]();
}

~MyEffect() // the owning class's destructor; the class name is not shown in the question
{
    delete [] buffer;
}
Allocating memory dynamically has a small cost, as does the later deallocation, but you've already illustrated the use of new, so your costs are equivalent in every way to a vector given an adequate initial size or reserve call.
Once allocated, the operations can be expected to be just as fast in any optimised build, but you should profile yourself if you have any reason to care.
It's not relevant to your new-ing code, but FYI there is at least a potential difference due to addressing: a global or static array might have its virtual address known at compile time, and a stack-based array might sit at a known offset from the stack pointer, but on most architectures there's no appreciable performance difference between those and indexing relative to a runtime-determined pointer.
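For comparison, a sketch of what the questioner's buffer would look like with std::vector (the struct wrapper and its name are hypothetical):

#include <vector>

struct AudioEffect // hypothetical owning class
{
    std::vector<float> buffer;

    void init(int sr)
    {
        // One allocation, zero-initialized: the same up-front cost
        // as new float[size]().
        buffer.assign(sr * 2, 0.0f);
    }
    // No destructor needed: the vector frees its storage automatically.
};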

Getting User store segfault error

I am receiving the error "User store segfault # 0x000000007feff598" for a large convolution operation.
I have defined the resultant array as
int t3_isize = 0;
int t3_irowcount = 0;
t3_irowcount=atoi(argv[2]);
t3_isize = atoi(argv[3]);
int iarray_size = t3_isize*t3_irowcount;
uint64_t t_result[iarray_size];
I noticed that if the array size is less than 2^16 - 1, the operation doesn't fail, but for the array size 2^16 or higher, I get the segfault error.
Any idea why this is happening? And how can I rectify it?
“I noticed that if the array size is greater than 2^16 - 1, the operation doesn't fail, but for the array size 2^16 or higher, I get the segfault error”
↑ Seems a bit self-contradictory.
But probably you're just allocating a too large array on the stack. Using dynamic memory allocation (e.g., just switch to using std::vector) you avoid that problem. For example:
std::vector<uint64_t> t_result(iarray_size);
In passing, I would ditch the Hungarian-notation-like prefixes. For example, t_ reads as if it denoted a type. The time for Hungarian notation was the late 1980s, and its purpose was to support Microsoft's Programmer's Workbench, a product discontinued long ago.
You're probably declaring too large an array for the stack. 2^16 elements of 8 bytes each is quite a lot (512 KB).
If you just need static allocation, move the array to file scope.
Otherwise, consider using std::vector, which will allocate storage from the heap and manage it for you.
Using malloc() solved the issue.
uint64_t* t_result = (uint64_t*) malloc(sizeof(uint64_t)*iarray_size);
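For completeness, a sketch of that fix with the matching free() added and the allocation checked (the function wrapper is made up; iarray_size is computed from argv as in the question):

#include <cstdint>
#include <cstdlib>

int run_convolution(int iarray_size)
{
    uint64_t *t_result = (uint64_t *) malloc(sizeof(uint64_t) * iarray_size);
    if (t_result == nullptr)
        return 1;     // heap allocation can fail; check instead of crashing
    // ... the convolution writes into t_result ...
    free(t_result);   // the stack array needed no cleanup; this block does
    return 0;
}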

C++ dynamically allocated variables, what is the flow of execution?

I have a few questions:
1) Why, when I create more than two dynamically allocated variables, is the difference between their memory addresses 16 bytes? (I thought one of the advantages of dynamic variables is saving memory, since you can delete an unused variable and free its memory.) But if the gap between two dynamic variables is 16 bytes even for a short integer, then there is a lot of memory I get no benefit from.
2) Creating a dynamically allocated variable using the new operator:
int x;
cin >> x;
int* a = new int(3);
int y = 4;
int z = 1;
In the example above, what is the flow of execution of this program? Is it going to store all the variables (x, a, y and z) on the stack and then store the value 3 at the address that a points to?
3) Creating a dynamically allocated array:
int x;
cin >> x;
int* array = new int[x];
int y = 4;
int z = 1;
And the same question here.
4) Does the size of the heap (free store) depend on how much memory I'm using in the code area, the stack area, and the global area?
Storing small values like integers on the heap is fairly pointless, because you use the same amount of memory or more just to store the pointer. The 16-byte alignment is just so the CPU can access the memory as efficiently as possible.
Yes, although the stack variables might be allocated to registers; that is up to the compiler.
Same as 2.
The size of the heap is controlled by the operating system and expanded as necessary as you allocate more memory.
Yes, in the examples, a and array are both "stack" variables. The data they point to is not.
I put stack in quotes because we are not going to concern ourselves with hardware details here, just the semantics. They have the semantics of stack variables.
The chunks of heap memory you allocate need to store some housekeeping data so that the allocator (the code working behind new) can do its job. That data usually includes the chunk length and the address of the next allocated chunk, among other things, depending on the actual allocator.
In your case, the housekeeping data is stored directly in front of (and maybe also behind) the actual allocated chunk. This, plus likely alignment, is the reason for the 16-byte gap you observe.
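A minimal sketch to observe that gap yourself (the exact spacing is allocator- and platform-specific):

#include <iostream>

int main()
{
    // Allocate several small objects and print their addresses; the spacing
    // shows per-allocation overhead plus alignment, commonly 16 bytes or
    // more on 64-bit systems.
    short *a = new short(1);
    short *b = new short(2);
    short *c = new short(3);
    std::cout << a << '\n' << b << '\n' << c << '\n';
    delete a;
    delete b;
    delete c;
}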

Size of std::array, std::vector and raw array

Let's say we have:
std::array <int,5> STDarr;
std::vector <int> VEC(5);
int RAWarr[5];
I tried to get their sizes as:
std::cout << sizeof(STDarr) + sizeof(int) * STDarr.max_size() << std::endl;
std::cout << sizeof(VEC) + sizeof(int) * VEC.capacity() << std::endl;
std::cout << sizeof(RAWarr) << std::endl;
The outputs are,
40
20
40
Are these calculations correct? Considering that I don't have enough memory for std::vector and no way of escaping dynamic allocation, what should I use? If I knew that std::array resulted in a lower memory requirement, I could change the program to use a static array.
These numbers are wrong. Moreover, I don't think they represent what you think they represent, either. Let me explain.
First, the part about them being wrong. You, unfortunately, don't show the value of sizeof(int), so we must derive it. On the system you are using, the size of an int can be computed as
size_t sizeof_int = sizeof(RAWarr) / 5; // => sizeof(int) == 8
because this is essentially the definition of sizeof(T): it is the number of bytes between the starts of two adjacent objects of type T in an array. This happens to be inconsistent with the number printed for STDarr: the class template std::array<T, n> is specified to have an array of n objects of type T embedded in it. Moreover, std::array<T, n>::max_size() is a constant expression yielding n. That is, we have:
40 // is identical to
sizeof(STDarr) + sizeof(int) * STDarr.max_size() // is greater than or equal to
sizeof(RAWarr) + sizeof_int * 5 // is identical to
40 + 40 // is identical to
80
That is, 40 >= 80, a contradiction.
Similarly, the second computation is also inconsistent with the third: the std::vector<int> holds at least 5 elements, and its capacity() has to be at least as large as its size(). Moreover, the std::vector<int> object's own size is non-zero. That is, the following always has to be true:
sizeof(RAWarr) < sizeof(VEC) + sizeof(int) * VEC.capacity()
Anyway, all this is pretty much irrelevant to what your actual question seems to be: what is the overhead of representing n objects of type T using a built-in array of T, a std::array<T, n>, and a std::vector<T>? The answer is:
A built-in array T[n] uses sizeof(T) * n.
An std::array<T, n> uses the same size as a T[n].
A std::vector<T>(n) needs some control data (the size, the capacity, and possibly an allocator) plus at least n * sizeof(T) bytes to represent its actual data. It may also choose a capacity() that is bigger than n.
In addition to these numbers, actually using any of these data structures may require additional memory:
All objects are aligned at appropriate addresses, which may require padding bytes in front of an object.
When an object is allocated on the heap, the memory management system may add a couple of bytes on top of the memory made available. This may be just a word holding the size, but it may be whatever the allocation mechanism fancies. This memory may also live somewhere other than the allocated block, e.g. in a hash table.
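Putting that breakdown into a corrected measurement (output values are platform-dependent; the vector's heap block and its allocator overhead come in addition to the handle size):

#include <array>
#include <iostream>
#include <vector>

int main()
{
    std::array<int, 5> STDarr;
    std::vector<int>   VEC(5);
    int                RAWarr[5];

    // Handle sizes only; the vector's elements live on the heap as well.
    std::cout << "int[5]:      " << sizeof(RAWarr) << '\n'; // 5 * sizeof(int)
    std::cout << "std::array:  " << sizeof(STDarr) << '\n'; // same as int[5]
    std::cout << "std::vector: " << sizeof(VEC)
              << " (+ heap block of at least "
              << sizeof(int) * VEC.capacity() << " bytes)\n";
}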
OK, I hope this provided some insight. However, here comes the important message: if std::vector<T> isn't capable of holding the amount of data you have, there are two situations:
You have extremely little memory, and most of this discussion is futile because you need entirely different approaches to cope with the few bytes you have. This would be the case on extremely resource-constrained embedded systems.
You have too much data, and using T[n] or std::array<T, n> won't be of much help, because the overhead we are talking about is typically less than 32 bytes.
Maybe you can describe what you are actually trying to do and why std::vector<T> is not an option.

What exactly do pointers store? (C++)

I know that pointers store the address of the value they point to, but if you display the value of a pointer directly on the screen, you get a hexadecimal number. If that number is exactly what the pointer stores, then when saying
pA = pB; // both are pointers
you're copying the address. So wouldn't there be a bigger overhead to using pointers when working with very small items like ints and bools?
A pointer is essentially just a number. It stores the address in RAM where the data is. The pointer itself is pretty small (probably the same size as an int on 32 bit architectures, long on 64 bit).
You are correct though that an int * would not save any space when working with ints. But that is not the point (no pun intended). Pointers are there so you can have references to things, not just use the things themselves.
Memory addresses.
That is, the locations in memory where other stuff is.
Pointers are generally the word size of the processor, so they can generally be moved around in a single instruction cycle. In short, they are fast.
As others have said, a pointer stores a memory address, which is "just a number", but that is an abstraction. Depending on the processor architecture it may be more than one number, for instance a base and an offset that must be added to dereference the pointer. In that case the overhead is slightly higher than if the address were a single number.
Yes, there is overhead in accessing an int or a bool via a pointer versus directly, where the processor can keep the variable in a register. Pointers are usually used where the value of the indirection outweighs any overhead, e.g. when traversing an array.
I've been referring to time overhead; I'm not sure whether the OP was more concerned with space or time.
The number refers to its address in memory. The size of a pointer is typically the native size of the computer's architecture so there is no additional overhead compared to any other primitive type.
On some architectures there is additional overhead for pointers to characters, because the architecture only supports addressing words (32- or 64-bit values). A pointer to a character is therefore stored as a word address plus an offset of the character within that word. Dereferencing the pointer involves fetching the word and then shifting and masking its value to extract the character.
Let me start from the basics. First of all, you need to know what variables are and how they are used.
Variables are basically memory locations (usually containing some value), and we use an identifier (the variable name) to refer to that memory location and use the value present there.
To understand this better, suppose we want information from memory cells at some location relative to the current variable. Can we use the identifier to extract information from those nearby cells?
No, because the identifier (variable name) will only give us the value contained in that particular cell.
But if somehow we can get the memory address at which this variable is present, then we can easily move to nearby locations and use their information as well (at runtime).
This is where pointers come into play. They are used to store the location of that variable so that we can use the additional address information whenever required.
Syntax: To store the address of a variable we can simply use the & (address-of) operator.
foo = &bar
Here foo stores the address of variable bar.
Now, what if we want to know the value present at that address?
For that, we can simply use the * (dereference) operator.
value = *foo
Since we have to store the address of a variable, we need memory for it just as we do for any other variable. This means pointers are also stored in memory the same way as other variables, so, just as with ordinary variables, we can store the address of a pointer in yet another pointer.
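Putting those pieces together, a tiny runnable sketch (the names follow the fragments above):

#include <iostream>

int main()
{
    int bar = 42;
    int *foo = &bar;   // & : foo now stores the address of bar
    int **pp = &foo;   // a pointer lives in memory too, so it has an address

    std::cout << *foo << '\n';  // * : 42, the value at the address foo holds
    std::cout << **pp << '\n';  // 42 again, dereferencing twice through pp
    std::cout << foo  << '\n';  // the address itself, printed as a number
}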
An address in memory. Points to somewhere! :-)
Yes, you're right, both in terms of speed and memory.
Pointers almost always take up more bytes than your standard int and, especially, bool and char data types. On modern machines pointers are typically 8 bytes, while a char is almost always just 1 byte.
In this example, accessing the char and bool from Foo requires more machine instructions than accessing them from Bar:
struct Foo
{
    char * c; // single character
    bool * b; // single bool
};
struct Bar
{
    char c;
    bool b;
};
...And if we decide to make some arrays, then arrays of Foo would be 8 times larger, and the data is more spread out, which means you'll end up with a lot more cache misses.
#include <vector>
int main()
{
    int size = 1000000;
    std::vector<Foo> foo(size);
    std::vector<Bar> bar(size);
    return 0;
}
As dmckee pointed out, a single copy of a one-byte bool and a single copy of a pointer are just as fast:
bool num1, num2, *p1, *p2;
num1 = num2; // this takes one clock cycle
p1 = p2; // this takes another
As dmckee said, this is true when you're using a 64-bit architecture.
However, copying arrays of ints, bools and chars can be much faster, because we can squeeze several of them into each register:
#include <cstdint>

int main ()
{
    const int n_elements = 100000 * sizeof(int64_t);
    bool A[n_elements];
    bool B[n_elements];
    // Reinterpret the byte arrays as 64-bit words so each assignment
    // moves 8 bools at once (assumes sizeof(bool) == 1).
    int64_t * A_fast = (int64_t *) A;
    int64_t * B_fast = (int64_t *) B;
    const int n_quick_elements = n_elements / sizeof(int64_t);
    for (int i = 0; i < 10000; ++i)
        for (int j = 0; j < n_quick_elements; ++j)
            A_fast[j] = B_fast[j];
    return 0;
}
The STL containers and other good libraries do this sort of thing for us, using type traits (std::is_trivially_copyable) and std::memcpy. Using pointers under the false assumption that they're always just as fast can prevent those libraries from optimising.
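A sketch of that library technique, assuming a trivially copyable element type (the helper name is made up):

#include <cstring>
#include <type_traits>

// Copy a whole array with one memcpy when the element type permits it,
// as standard-library implementations do internally.
template <typename T, std::size_t N>
void copy_array(T (&dst)[N], const T (&src)[N])
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "bulk byte copy is only valid for trivially copyable types");
    std::memcpy(dst, src, sizeof src);
}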
Conclusion: it may seem obvious from these examples, but only use pointers/references to basic data types when you need to give or take access to the original object.