Higher dimensional array vs 1-D array efficiency in C++

I'm curious about the efficiency of using a higher dimensional array vs a one-dimensional array. Do you lose anything when defining and iterating through an array like this:
array[i][j][k];
or defining and iterating through an array like this:
array[k + j*kmax + i*jmax*kmax];
My inclination is that there wouldn't be a difference, but I'm still learning about high efficiency programming (I've never had to care about this kind of thing before).
Thanks!

The only way to know for sure is to benchmark both ways (with optimization flags turned on in the compiler, of course). The one thing you lose for sure with the second method is readability.
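A minimal sketch of such a benchmark (the sizes, the double element type, and the timing code are all illustrative; compile with -O2 or -O3 and compare against the equivalent loop over a real 3-D array):
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int imax = 100, jmax = 100, kmax = 100;

    // Flat 1-D storage with manual row-major indexing.
    std::vector<double> flat(imax * jmax * kmax, 1.0);

    auto t0 = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (int i = 0; i < imax; ++i)
        for (int j = 0; j < jmax; ++j)
            for (int k = 0; k < kmax; ++k)
                sum += flat[k + j * kmax + i * jmax * kmax];
    auto t1 = std::chrono::steady_clock::now();

    auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    std::printf("flat: %lld us (sum=%f)\n", (long long)us, sum);

    // Repeat the same triple loop over a real 3-D array
    // (e.g. a static double a[100][100][100]) and compare the timings.
    return 0;
}
In practice you will often see no measurable difference at -O2, which is exactly why measuring beats guessing.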

The two ways of accessing the array are essentially identical once compiled. Keep in mind that accessing memory locations that are close to one another does make a difference in performance, because of how they are cached. So if you're storing a high-dimensional matrix, make sure you store the rows one after the other if that's how you're going to access them.
In general, CPU caches exploit temporal and spatial locality. That is, if you access memory address X, the odds of you accessing X+1 next are higher. It's much more efficient to operate on values within the same cache line.
Check out this article on CPU caches for more information on how different storage policies affect performance: http://en.wikipedia.org/wiki/CPU_cache
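To illustrate the "store rows one after the other" point, here is a rough sketch (matrix size and names are made up) comparing a traversal that follows the storage order with one that fights it:
#include <cstdio>
#include <vector>

const int rows = 4096, cols = 4096;
std::vector<int> m(rows * cols, 1);   // row-major: row r starts at index r * cols

long long sum_row_major() {
    long long s = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)      // consecutive addresses: cache friendly
            s += m[r * cols + c];
    return s;
}

long long sum_column_major() {
    long long s = 0;
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)      // jumps cols elements per step: many cache misses
            s += m[r * cols + c];
    return s;
}

int main() {
    std::printf("%lld %lld\n", sum_row_major(), sum_column_major());
    return 0;
}
Both functions compute the same sum; only the access order differs, and on large matrices the second is typically noticeably slower.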

If you can rewrite the indexing, so can the compiler. I wouldn't worry about that.
Trust your compiler(tm)!

It probably depends on the implementation, but I'd say it more or less compiles down to the same thing as your one-dimensional version.

Do yourself a favor and care about such things only after profiling the code. It is very unlikely that something like that will affect the performance of the application as a whole. Using the correct algorithms is much more important.
And even if it does matter, it is most certainly only a single inner loop that needs attention.

Related

Being cache efficient with data, mainly arrays

I have recently started to look into being cache efficient by trying to avoid cache misses in C++. So far I have taken away the following:
Try to avoid linked-list-style objects where possible when processing. Instead use them to point to contiguous data that you can store in cache and perform operations on.
Be careful of holding state in classes as it makes the above potentially more difficult.
Use structs when allocating on the heap, as this helps in localising data.
Try and use 1D arrays when possible for lists of data.
So my question is broken into two parts:
Is the above correct? Have I made any fundamental misunderstandings?
When dealing with 2D arrays I have seen other users recommend the use of Hilbert curves. I do not understand how this provides a speed increase over using division and modulus operators on an index to simulate a 2D array, since that is surely fewer instructions, which is good for speed and instruction cache usage?
Thanks for reading.
P.S. I do not have a CompSci background; therefore, if you notice anything I have said that is incorrect, I would appreciate it if you could alert me so that I can read around that topic.
Your approach is flawed for at least one reason: you are willing to sacrifice everything to avoid cache misses. How do you know whether cache misses are actually the major performance factor in your code?
For example, there are MANY cases where a linked list is better than a contiguous array, specifically where you frequently insert / delete items. You would pay dearly for compacting or expanding an array.
So the answer to your first question is: yes, you will improve data locality by following those four principles. But possibly at a cost greater than the savings.
For the second question, I suggest you read about Hilbert curves. You don't need them if you are processing your 2D array in order, row by row. They help a lot (with data locality) if you process some rectangular area of your 2D array, because the distance between elements in the same column but different rows is much smaller that way.
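A tiny sketch of why that matters, assuming the usual row-major flattening (the width value is arbitrary): horizontal neighbours sit next to each other in memory, but vertical neighbours are a full row apart, so walking a rectangular region column by column keeps jumping across cache lines.
#include <cstddef>

constexpr std::size_t width = 1024;   // illustrative grid width

constexpr std::size_t idx(std::size_t row, std::size_t col) {
    return row * width + col;         // row-major flattening
}

// Horizontal neighbours are adjacent in memory...
static_assert(idx(10, 21) - idx(10, 20) == 1, "one element apart");
// ...but vertical neighbours are a whole row apart.
static_assert(idx(11, 20) - idx(10, 20) == width, "width elements apart");

int main() { return 0; }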

GPU Programming Strategy

I am trying to program a type of neural network using C in CUDA. I have one basic question. For the programming, I can either use big arrays or a different-naming strategy. For example, for the weights, I can put all the weights in one big array, or use different arrays for different layers with different names, such as weight1 for layer one, weight2 for layer two, and so on. The first strategy is a little troublesome, while the second one is easier for me. However, I am wondering: if I use the different-naming strategy, does it make the program slower to run on the GPU?
As long as all the arrays are allocated only once and not resized, the difference in performance should be negligible.
If you are constantly reallocating memory and resizing arrays holding the weights, then there might be a performance benefit in managing your own memory within the big array.
That, however, is very implementation specific; if you don't know what you are doing, managing your own memory/arrays could make your code slower and less robust. Also, if your NN is huge, you might have trouble finding a contiguous block of memory large enough to hold your memory/array block.
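For what it's worth, here is a host-side C++ sketch of the "one big array" layout (the layer sizes are hypothetical); on the GPU the same offsets would be applied to a single device allocation, e.g. one obtained from cudaMalloc:
#include <cstddef>
#include <numeric>
#include <vector>

int main() {
    // Hypothetical per-layer weight counts.
    std::vector<std::size_t> layer_sizes = {784 * 128, 128 * 64, 64 * 10};

    // One contiguous allocation holding the weights of all layers.
    std::vector<float> weights(
        std::accumulate(layer_sizes.begin(), layer_sizes.end(), std::size_t{0}));

    // Per-layer views are just offsets into that single block.
    std::vector<float*> layer = {weights.data()};
    for (std::size_t i = 0; i + 1 < layer_sizes.size(); ++i)
        layer.push_back(layer[i] + layer_sizes[i]);

    // layer[0], layer[1], layer[2] can now be used much like the separately
    // named arrays weight1, weight2, ... from the question.
    (void)layer;
    return 0;
}
Whether you pass layer[0], layer[1], ... or a single base pointer plus offsets is mostly a matter of convenience; the memory layout is the same either way.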
This is my 2 cents.
The drawbacks of having 1 very large array:
harder to resize, which is a problem if you intend to resize individual layers.
As Daniel said, it might be hard to find a contiguous block of memory (keep in mind that something might feel large but isn't from a technical/hardware perspective).
The drawbacks of separate arrays or containers:
If you have a very granular, unpredictable access pattern, access times can be slower when it takes multiple steps to find a single location. For example, with a list of pointers to a list of pointers to a list of pointers, you have to take three (slightly expensive) dereference steps every time. This can be avoided with proper coding.
In general I would be in favor of splitting up.

Microoptimizing D jagged arrays

Is there a reason to linearize multidimensional arrays into flat ones for performance? I mean, hypothetically you are just taking the pointer arithmetic away from the compiler and doing it explicitly when computing indexes. So what's the point?
Making them flat might make them stay in the CPU cache longer (since they're in the same block of memory), which would reduce the number of cache misses and therefore improve performance. But you'd have to profile and benchmark the code to see exactly what the actual performance implications would be with any particular program. Certainly, I wouldn't worry about that sort of thing unless profiling indicated that you should figure out how to optimize with regards to the jagged arrays in order to reduce a performance bottleneck.
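As a rough C++ illustration of what flattening a jagged array means (row lengths and names are made up): all the elements go into one contiguous buffer, and a second array records where each row starts.
#include <cstddef>
#include <vector>

struct FlatJagged {
    std::vector<int> data;            // all elements, back to back
    std::vector<std::size_t> offset;  // offset[i] = start of row i; offset.back() = data.size()

    int* row(std::size_t i) { return data.data() + offset[i]; }
    std::size_t row_len(std::size_t i) const { return offset[i + 1] - offset[i]; }
};

int main() {
    FlatJagged j;
    j.offset = {0};
    for (std::size_t len : {3u, 5u, 2u}) {      // three rows of different lengths
        j.data.insert(j.data.end(), len, 0);
        j.offset.push_back(j.data.size());
    }
    // Iterating j.data touches one contiguous block, unlike jagged rows
    // that may be scattered across the heap.
    return 0;
}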

Cache locality performance

If I had a C or C++ program where I was using say 20 integers throughout the program, would it improve performance to create an array of size 20 to store the integers and then create aliases for each number?
Would this improve the cache locality (rather than just creating 20 normal ints) because the ints would be loaded into the cache together as part of the int array (or at least improve the chances of this)?
The question is how you allocate space for them. I doubt that you just randomly do new int 20 times here and there in the code. If they are local variables then they will end up on the stack and get cached.
The main question is: is it worth bothering? Try to write your program in a readable and elegant way first, then remove the major bottlenecks, and only after that start messing with micro-optimizations. If you are processing 20 ints, shouldn't they essentially be an array anyway?
Also, is this a theoretical question? If it is, then yes, an array will likely be cached better than 20 random areas in memory. If it is a practical question, then I doubt that this is really important unless you are writing performance-critical code, and even then micro-optimizations are the last thing to deal with.
It might improve performance a bit, yes. It might also completely ruin your performance. Or it might have no impact whatsoever because the compiler already did something similar for you. Or it might have no impact because you're just not using those integers often enough for this to make a difference.
It also depends on whether one or multiple threads access these integers, and whether they just read, or also modify the numbers. (if you have multiple threads and you write to those integers, then putting them in an array will cause false sharing which will hurt your performance far more than anything you'd hoped to gain)
So why don't you just try it?
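Since false sharing came up, here is a rough sketch of the effect (the 64-byte line size and the iteration counts are assumptions): two threads updating adjacent counters versus counters padded onto separate cache lines.
#include <atomic>
#include <thread>

// Two counters packed next to each other likely share one cache line,
// so two threads hammering them keep invalidating each other's copy.
struct Packed {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Padding each counter to a (typically 64-byte) cache line avoids that.
struct Padded {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <class Counters>
void hammer(Counters& c) {
    std::thread t1([&] { for (int i = 0; i < 10'000'000; ++i) c.a.fetch_add(1); });
    std::thread t2([&] { for (int i = 0; i < 10'000'000; ++i) c.b.fetch_add(1); });
    t1.join();
    t2.join();
}

int main() {
    Packed p;  hammer(p);   // time this...
    Padded q;  hammer(q);   // ...and this; the padded version is usually much faster
    return 0;
}
Timing hammer(p) against hammer(q) on a typical multi-core machine usually shows a clear gap, which is the cost the answer above is warning about.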
There is no simple, single answer. The only serious answer you're going to get is "it depends". If you want to know how it would behave in your case, then you have two options:
try it and see what happens, or
gain a thorough understanding of how your CPU works, gather data on exactly how often these values are accessed and in which patterns, so you can make an educated guess at how the change would affect your performance.
If you choose #2, you'll likely need to follow it up with #1 anyway, to verify that your guess was correct.
Performance isn't simple. There are few universal rules, and everything depends on context. A change which is an optimization in one case might slow everything down in another.
If you're serious about optimizing your code, then there's no substitute for the two steps above. And if you're not serious about it, don't do it. :)
Yes, the theoretical chance of the 20 integers ending up on the same cache line would be higher, although I think a good compiler would almost always be able to replicate the same performance for you even when not using an array.
So, you currently have int positionX, positionY, positionZ; then somewhere else int fuzzy; and int foo;, etc., to make about 20 integers?
And you want to do something like this:
int arr[20];
#define positionX arr[0]
#define positionY arr[1]
#define positionZ arr[2]
#define fuzzy arr[3]
#define foo arr[4]
I would expect that if there is ANY performance difference, it may make things slower, because the compiler will notice that you are using arr in some other place and thus can't use registers to store the value of foo, since it sees that you call update_position, which touches arr[0]..arr[2]. It depends on how fine-grained the compiler's detection of "we're touching the same data" is. And I suspect it may quite often be based on the "object" rather than individual fields of an object - particularly for arrays.
However, if you do have data that is used close together, e.g. position variables, it would probably help to have them next to each other.
But I seriously think that you are wasting your time trying to put variables next to each other, and using an array is almost certainly a BAD idea.
It would likely decrease performance. Modern compilers will move variables around in memory when you're not looking, and may store two variables at the same address when they're not used concurrently. With your array idea, those variables cannot overlap, and must use distinct cache lines.
Yes, this may improve your performance, but then again it may not; it's really variables that get used together that should be stored together.
So if they are used together, then yes. Variables and objects should really be declared in the function in which they are used, as they will be stored on the stack, which is usually hot in the level-1 cache.
So yes, if you are going to use them together, i.e. they are relevant to each other, then this would probably be a little more efficient, providing you also take into consideration how you allocate their memory.

Dynamically allocate or waste memory?

I have a 2d integer array used for a tile map.
The size of the map is unknown and read in from a file at runtime. Currently the biggest file is 2500 items (a 50x50 grid).
I have a working method of dynamic memory allocation from an earlier question, but people keep saying that it is a bad idea, so I have been thinking about whether or not to just use a big array and not fill it all up when using a smaller map.
Do people know of any pros or cons to either solution? Any advice or personal opinions welcome.
C++, btw.
edit: all the maps are made by me so I can pick a max size.
Probably the easiest way is to use, for example, a std::vector<std::vector<int> >, which allows it to be dynamically sized AND lets the library do all the allocations for you. This will prevent accidentally leaking memory.
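A sketch of that approach, assuming a made-up file format of width and height followed by the tile values in row order:
#include <cstddef>
#include <fstream>
#include <vector>

// Assumed file format: width and height, then width*height tile ids.
std::vector<std::vector<int>> load_map(const char* path) {
    std::ifstream in(path);
    std::size_t width = 0, height = 0;
    in >> width >> height;

    std::vector<std::vector<int>> map(height, std::vector<int>(width, 0));
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            in >> map[y][x];
    return map;   // sized exactly to the file, nothing to free by hand
}
The vector sizes itself to whatever the file contains and releases its memory automatically when it goes out of scope.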
My preference would be to dynamically allocate. That way should you encounter a surprisingly large map you (hopefully) won't overflow if you've written it correctly, whereas with the fixed size your only option is to return an error and fail.
Presumably loading tile maps is a pretty infrequent operation. I'd be willing to bet too that you can't even measure a meaningful difference in speed between the two. Unless there is a measurable performance reduction, or you're actually hitting something else which is causing you problems, the static-sized one seems like a premature optimisation and is asking for trouble later on.
It depends entirely on requirements that you haven't stated :-)
If you want your app to be as blazingly fast as possible, with no ability to handle larger tile maps, then by all means just use a big array. For small PIC-based embedded systems this could be an ideal approach.
But, if you want your code to be robust, extensible, maintainable and generally suitable for a wider audience, use STL containers.
Or, if you just want to learn stuff, and have no concern about maintainability or performance, try and write your own dynamically allocating containers from scratch.
I believe the issue people refer to with dynamic allocation comes from allocating randomly sized blocks of memory and not being able to effectively manage the random-sized holes left behind when they are deallocated. If you're allocating fixed-size tiles then this may not be an issue.
I see quite a few people suggest allocating a large block of memory and managing it themselves. That might be an alternative solution.
Is allocating the memory dynamically a bottleneck in your program? Is it the cause of a performance issue? If not, then simply keep the dynamic allocation; you can handle any map size. If yes, then maybe use a data structure that does not deallocate the memory it has allocated, but rather reuses its old buffer and, if needed, reallocates more memory.
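A sketch of that last idea (the names are illustrative): keep one long-lived buffer and resize it per map, so loading a smaller map reuses the capacity already allocated.
#include <cstddef>
#include <vector>

class TileMap {
public:
    void reset(std::size_t width, std::size_t height) {
        width_ = width;
        height_ = height;
        tiles_.resize(width * height);   // grows the old buffer only when needed;
                                         // capacity is kept when loading a smaller map
    }
    int& at(std::size_t x, std::size_t y) { return tiles_[y * width_ + x]; }

private:
    std::size_t width_ = 0, height_ = 0;
    std::vector<int> tiles_;             // never shrinks unless shrink_to_fit() is called
};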