Performance using IDs and arrays (vectors)

I have been taught at school to use databases with integer IDs, and I want to know if it's also a good way to do so in C/C++. I'm making a game, using Ogre3D, so I'd like my game code to use as few cycles as possible.
This is not the exact code (I'm using vectors, and it's about characters, abilities and such), but I'm curious to know whether the line where I access the weight is going to cause a bottleneck, since I'm doing several array subscripts.
struct item
{
    float weight;
    int mask;
    item(): weight(0.0f), mask(0) {}
}
items[2000];
struct shipment
{
    int item_ids[20];
}
shipments[10000];
struct order
{
    int shipment_ids[20];
}
orders[3000];
int main()
{
    unsigned int s = 0;
    // if I want to access an item's data of a certain order, I do:
    // (item slot 0 is picked here only so that the expression compiles)
    for (int i = 0; i < 3000; ++i)
    {
        if (items[shipments[orders[4].shipment_ids[5]].item_ids[0]].weight > 23.0f)
            s |= (1u << 31);
    }
}
I have heard that putting data into arrays is the best way to gain performance when looping over data repeatedly, I just want to know your opinion on this code...

A good optimizer should be able to compute the exact memory offset of each of those items. There is no dependency between loop iterations, so the compiler should be able to unroll the loop and possibly vectorize it (SIMD). Looks great, IMHO. If you can avoid floats, that will also help you.
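To make the chained lookup concrete, here is a minimal sketch of following order -> shipment -> item by integer ID. The helper name order_item_weight, the slot parameters, and the use of std::vector and std::array are assumptions for illustration, not the asker's actual code. When the indices are only known at run time, each subscript is a load whose address depends on the result of the previous one, which is the cost the question is asking about.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Item     { float weight = 0.0f; int mask = 0; };
struct Shipment { std::array<int, 20> item_ids{}; };
struct Order    { std::array<int, 20> shipment_ids{}; };

// Hypothetical helper: resolve an item's weight by following integer IDs.
// Assumes every stored ID is a valid index into the corresponding container.
inline float order_item_weight(const std::vector<Item>& items,
                               const std::vector<Shipment>& shipments,
                               const Order& order,
                               std::size_t shipment_slot,
                               std::size_t item_slot)
{
    // First hop: the order names a shipment by ID.
    const Shipment& sh = shipments[order.shipment_ids[shipment_slot]];
    // Second hop: the shipment names an item by ID.
    return items[sh.item_ids[item_slot]].weight;
}
```

Binding the intermediate shipment to a reference keeps the expression readable and lets you reuse it when you need several items from the same shipment.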

Related

Which is the best way to index through a for loop?

I am trying to do a product operand on the values inside of a vector. It is a huge mess of code.. I have posted it previously but no one was able to help. I just wanna confirm which is the correct way to do a single part of it. I currently have:
vector<double> taylorNumerator;
double NewNumber = 1.0;
for (a = 0; a <= (constant); a++) {
    double Number = /* equation involving a to get numerous values */;
    taylorNumerator.push_back(Number);
}
for (b = 0; b <= (constant); b++) {
    NewNumber *= taylorNumerator[b];
}
This is just a snapshot; it is much shorter than what I actually have. Someone told me it is better to do vector.at(index) instead. Which is the correct or best way to accomplish this? If you so desire I can paste all of the code; it works, but the values I get are wrong.

When possible, you should probably avoid using indexes at all. Your options are:
A range-based for loop:
for (auto numerator : taylorNumerators) { ... }
An iterator-based loop:
for (auto it = taylorNumerators.begin(); it != taylorNumerators.end(); ++it) { ... }
A standard algorithm, perhaps with a lambda:
#include <algorithm>
std::for_each(taylorNumerators.begin(), taylorNumerators.end(), [&](double numerator) { ... });
In particular, note that some algorithms let you specify a number of iterations, like std::generate_n, so you can create exactly n items without counting to n yourself.
If you need the index in the calculation, then it can be appropriate to use a traditional for loop. You have to watch for a couple of pitfalls: std::vector<T>::size() returns a std::vector<T>::size_type, which is typically identical to std::size_t, which is (1) unsigned and (2) quite possibly larger than an int.
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) { ... }
Your calculations probably deal with doubles or some numerical type other than std::size_t, so you have to consider the best way to convert it. Many programmers would rely on implicit conversions, but that can be dangerous unless you know the conversion rules very well. I'd generally start by doing a static cast of the index to the type I actually need. For example:
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) {
const auto x = static_cast<double>(i);
/* calculation involving x */
}
In C++, it's probably far more common to make sure the index is in range and then use operator[] rather than to use at(). Many projects disable exceptions, so the safety guarantee of at() wouldn't really be available. And, if you can check the range once yourself, then it'll be faster to use operator[] than to rely on the range-check built into at() on each index operation.
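As a concrete instance of the index-free options above, here is a minimal sketch that computes the product of a vector's elements twice: once with a range-based for loop and once with std::accumulate. The function names product_loop and product_accumulate are made up for this example.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Product via a range-based for loop: there is no index to get wrong.
inline double product_loop(const std::vector<double>& values)
{
    double product = 1.0;  // multiplicative identity
    for (double v : values) {
        product *= v;
    }
    return product;
}

// The same computation expressed with a standard algorithm.
inline double product_accumulate(const std::vector<double>& values)
{
    return std::accumulate(values.begin(), values.end(), 1.0,
                           std::multiplies<double>());
}
```

Note that the accumulator starts at 1.0 rather than 0.0; starting a running product at zero is an easy way to get wrong values.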

What you have is fine. Modern compilers can optimize the heck out of the above such that the code is just as fast as the equivalent C code accessing the items directly.
The only optimization for using vector I recommend is to invoke taylorNumerator.reserve(constant + 1) (the loop runs from 0 through constant inclusive) to allocate the needed storage up front instead of having the vector resize itself as new items are added.
About the only worthwhile optimization after that is to not use vector at all and just use a plain fixed-size array, especially if constant is known at compile time and small enough that it doesn't blow up the stack (or the binary size if global).
double taylorNumerator[constant + 1];
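A minimal sketch of the reserve suggestion follows. The function term is a hypothetical stand-in for the asker's "equation involving a", and build_terms is a made-up name. Since the question's loop runs from 0 through constant inclusive, it produces constant + 1 values, so that is the amount reserved here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's "equation involving a".
inline double term(std::size_t a)
{
    return static_cast<double>(a) + 1.0;
}

inline std::vector<double> build_terms(std::size_t constant)
{
    std::vector<double> out;
    out.reserve(constant + 1);        // allocate once, up front
    for (std::size_t a = 0; a <= constant; ++a) {
        out.push_back(term(a));       // stays within the reserved capacity
    }
    return out;
}
```

Whether this makes a measurable difference depends on how large constant is; for a handful of elements the reallocation cost is usually negligible.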

How to perform GroupBy Sum query on a list?

Background
I have worked with C#.Net + LINQ wherever possible and am now trying my hand at C++ development for a project I am involved in. Of course, I fully realize that C# and C++ are two different worlds.
Question
I have an std::list<T> where T is a struct as follows:
struct SomeStruct{
int id;
int rate;
int value;
};
I need to get a result of group by rate and sum of value. How can I perform GroupBy Sum aggregate function on this list?
Example:
SomeStruct s1;
SomeStruct s2;
SomeStruct s3;
s1.id=1;
s1.rate=5;
s1.value=100;
s2.id=2;
s2.rate=10;
s2.value=50;
s3.id=3;
s3.rate=10;
s3.value=200;
std::list<SomeStruct> myList;
myList.push_front(s1);
myList.push_front(s2);
myList.push_front(s3);
With these inputs I would like to get following output:
rate|value
----|-----
5| 100
10| 250
I found a few promising libs such as CINQ and cppitertools, but I couldn't fully understand them as I lack sufficient knowledge. It would be great if someone could guide me in the right direction; I am more than willing to learn new things.

Computing a Group-By sum is relatively straightforward:
using sum_type = int; // but maybe you want a larger type
auto num_groups = max_rate + 1;
std::vector<sum_type> rate_sums(num_groups); // this is initialized to 0
for (const auto& s : myList) {
    rate_sums[s.rate] += s.value;
}
This works when the rate values lie between 0 and max_rate, and max_rate is not too large relative to myList.size(); otherwise the memory use might be excessive (and you'll have some overhead initializing the vector).
If the rate values are scattered over a large range relative to myList.size(), consider using an std::unordered_map instead of an std::vector.
The code above can also be parallelized. The way to parallelize it depends on your hardware, and there are all sorts of libraries to help you do this. Since C++17, the standard library also offers parallel versions of many algorithms through execution policies (such as std::execution::par).
Remember, though, that linked lists are rather slow to work with, because you have to dereference an arbitrary address to get from one element to the next. If you can get your input in an std::vector or a plain array, that would be faster; and if you can't, it's probably worthless to bother with parallelization.
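For the case where the rate values are scattered, the std::unordered_map variant mentioned above could look like the following sketch. The function name sum_by_rate and the choice of long long for the sums are assumptions for illustration.

```cpp
#include <list>
#include <unordered_map>

struct SomeStruct {
    int id;
    int rate;
    int value;
};

// Group by `rate` and sum `value`, without needing to know max_rate.
inline std::unordered_map<int, long long>
sum_by_rate(const std::list<SomeStruct>& items)
{
    std::unordered_map<int, long long> sums;
    for (const auto& s : items) {
        // operator[] inserts a value-initialized (zero) sum for a new rate.
        sums[s.rate] += s.value;
    }
    return sums;
}
```

An unordered_map does not iterate in any particular key order, so if you want the output sorted by rate as in the question's table, std::map is a drop-in alternative.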

For loop or no loop? (dataset is small and not subject to change)

Let's say I have a situation where I have a matrix of a small, known size where the size is unlikely to change over the life of the software. If I need to examine each matrix element, would it be more efficient to use a loop or to manually index into each matrix location?
For example, let's say I have a system made up of 3 windows, 2 panes per window. I need to keep track of state for each window pane. In my system, there will only ever be 3 windows, 2 panes per window.
static const int NUMBER_OF_WINDOWS = 3;
static const int NUMBER_OF_PANES = 2;
static const int WINDOW_LEFT = 0;
static const int WINDOW_MIDDLE = 1;
static const int WINDOW_RIGHT = 2;
static const int PANE_TOP = 0;
static const int PANE_BOTTOM = 1;
paneState windowPanes[NUMBER_OF_WINDOWS][NUMBER_OF_PANES];
Which of these accessing methods would be more efficient?
loop version:
for (int ii=0; ii<NUMBER_OF_WINDOWS; ii++)
{
    for (int jj=0; jj<NUMBER_OF_PANES; jj++)
    {
        doSomething(windowPanes[ii][jj]);
    }
}
vs.
manual access version:
doSomething(windowPanes[WINDOW_LEFT][PANE_TOP]);
doSomething(windowPanes[WINDOW_MIDDLE][PANE_TOP]);
doSomething(windowPanes[WINDOW_RIGHT][PANE_TOP]);
doSomething(windowPanes[WINDOW_LEFT][PANE_BOTTOM]);
doSomething(windowPanes[WINDOW_MIDDLE][PANE_BOTTOM]);
doSomething(windowPanes[WINDOW_RIGHT][PANE_BOTTOM]);
Will the loop code generate branch instructions, and will those be more costly than the instructions that would be generated on the manual access?

The classic efficiency vs. organization trade-off. The for loops are much more human readable, while the manual way is closer to what the machine ends up executing.
I recommend you use the loops, because the compiler, if optimization is enabled, will typically generate the manual code for you when it sees that the upper bounds are small constants. That way you get the best of both worlds.

First of all: how complex is your function doSomething? If it is complex (most likely it is), then you will not notice any difference.
In general, calling your function sequentially will be slightly more efficient than the loop. But once again, the gain will be so tiny that it is not worth discussing.
Bear in mind that optimizing compilers do loop unrolling. This essentially means generating code that runs the loop fewer times while doing more work in each iteration (for example, calling your function 2-4 times in sequence). When the number of iterations is small and fixed, the compiler may easily eliminate the loop completely.
Look at your code from the point of view of clarity and ease of modification. In many cases the compiler will do a lot of useful performance tricks for you.

You may linearize your multi-dimensional array
paneState windowPanes[NUMBER_OF_WINDOWS * NUMBER_OF_PANES];
and then
for (auto& pane : windowPanes) {
    doSomething(pane);
}
This avoids the extra loop in case the compiler doesn't optimize it away.
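If you linearize the array but still want to address a specific window and pane, a small index helper does the mapping. The sketch below assumes row-major order; pane_index, PaneStates and count_nonzero are hypothetical names, and int stands in for the paneState type, which the question does not define.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t NUMBER_OF_WINDOWS = 3;
constexpr std::size_t NUMBER_OF_PANES = 2;

// Hypothetical helper: map (window, pane) to a flat, row-major index.
constexpr std::size_t pane_index(std::size_t window, std::size_t pane)
{
    return window * NUMBER_OF_PANES + pane;
}

// `int` is a stand-in for the question's paneState type.
using PaneStates = std::array<int, NUMBER_OF_WINDOWS * NUMBER_OF_PANES>;

inline int count_nonzero(const PaneStates& panes)
{
    int n = 0;
    for (int state : panes) {   // one flat loop over all six panes
        if (state != 0) {
            ++n;
        }
    }
    return n;
}
```

With this layout, panes[pane_index(2, 1)] plays the role of windowPanes[WINDOW_RIGHT][PANE_BOTTOM] in the original two-dimensional array.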

Couple performance questions (one bigger vector vs smaller chunks vectors) and Is it worth to store iteration index for jump access of vector?

I am a bit curious about vector optimization and have a couple of questions about it. (I am still a beginner in programming.)
example:
struct GameInfo{
EnumType InfoType;
// Other info...
};
int _lastPosition;
// _gameInfoV is sorted beforehand
std::vector<GameInfo> _gameInfoV;
// The tick function is called every game frame (in "perfect" condition it's every 1.0/60 second)
void BaseClass::tick()
{
    for (unsigned int i = _lastPosition; i < _gameInfoV.size(); i++){
        auto & info = _gameInfoV[i];
        if( !info.bhasbeenAdded ){
            if( DoWeNeedNow() ){
                _lastPosition++;
                info.bhasbeenAdded = true;
                _otherPointer->DoSomething(info.InfoType);
                // Do something more with "info"....
            }
            else return; //Break the cycle since we don't need now other "info"
        }
    }
}
The _gameInfoV vector size can be between 2000 and 5000.
My main 2 questions are:
Is it better to leave it the way it is, or to split it into smaller chunks, one for each different GameInfo.InfoType?
Is it worth the hassle of storing the last start position index of the vector instead of iterating from the beginning?
Note that if I use smaller vectors, there will be about 3 to 6 of them.
The third thing is that I am not using vector iterators. Would it be safe to use them like this?
std::vector<GameInfo>::iterator it = _gameInfoV.begin() + _lastPosition;
for (; it != _gameInfoV.end(); ++it){
    //Do something
}
Note: It will be used in smartphones, so every optimization will be appreciated, when targeting weaker phones.
-Thank you

Don't split it up, unless you frequently move memory around.
Storing the position is no hassle if you do it correctly:
// Note: a stored iterator is invalidated if the vector reallocates.
std::vector<GameInfo>::iterator _lastPosition(_gameInfoV.begin());
// ...
for (std::vector<GameInfo>::iterator info = _lastPosition; info != _gameInfoV.end(); ++info)
{
    if (!info->bhasbeenAdded)
    {
        if (DoWeNeedNow())
        {
            ++_lastPosition;
            _otherPointer->DoSomething(info->InfoType);
            // Do something more with "info"....
        }
        else return; //Break the cycle since we don't need now other "info"
    }
}

Breaking one vector up into several smaller vectors in general doesn't improve performance. It could even slightly degrade performance because the compiler has to manage more variables, which take up more CPU registers etc.
I don't know about gaming so I don't understand the implication of GameInfo.InfoType. Your processing time and CPU resource requirements are going to increase if you do more total iterations through loops (where each loop iteration performs the same type of operation). So if separating the vectors causes you to avoid some loop iterations because you can skip entire vectors, that's going to increase performance of your app.
Iterators are the safest way to iterate through containers. But for a vector I often just use the index operator [] and my own index (a plain old unsigned integer).
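To illustrate the "remember where you stopped" idea with a plain index, here is a minimal sketch. The class InfoQueue, its members, and the callable parameters are all hypothetical; needed_now plays the role of DoWeNeedNow() from the question, and GameInfo is reduced to a single stand-in field.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct GameInfo {
    int infoType = 0;   // stand-in for the question's EnumType InfoType
};

// Hypothetical sketch: hands out sorted entries in order and remembers
// where it stopped, so each tick resumes instead of rescanning.
class InfoQueue {
public:
    explicit InfoQueue(std::vector<GameInfo> infos) : infos_(std::move(infos)) {}

    // Handle entries while `needed_now()` returns true; returns how many ran.
    template <typename Pred, typename Fn>
    std::size_t tick(Pred&& needed_now, Fn&& handle)
    {
        std::size_t done = 0;
        while (next_ < infos_.size() && needed_now()) {
            handle(infos_[next_]);
            ++next_;
            ++done;
        }
        return done;
    }

    std::size_t remaining() const { return infos_.size() - next_; }

private:
    std::vector<GameInfo> infos_;
    std::size_t next_ = 0;   // an index stays valid if the vector reallocates
};
```

Because everything before next_ has already been handled, a separate "has been added" flag per entry is not needed in this variant.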

Fastest Possible Struct-of-Arrays to Array-of-Structs Conversion

I have a structure that looks like this:
struct SoA
{
int arr1[COUNT];
int arr2[COUNT];
};
And I want it to look like this:
struct AoS
{
int arr1_data;
int arr2_data;
};
std::vector<AoS> points;
as quickly as possible. Order must be preserved.
Is constructing each AoS object individually and pushing it back the fastest way to do this, or is there a faster option?
SoA before;
std::vector<AoS> after;
for (int i = 0; i < COUNT; i++)
    after.push_back(AoS{before.arr1[i], before.arr2[i]});
There are SoA/AoS related questions on StackOverflow, but I haven't found one related to fastest-possible conversion. Because of struct packing differences I can't see any way to avoid copying the data from one format to the next, but I'm hoping someone can tell me there's a way to simply reference the data differently and avoid a copy.
Off the wall solutions especially encouraged.

The binary layouts of SoA and AoS[]/std::vector<AoS> are different, so there is really no way to transform one into the other without a copy operation.
The code you have is pretty close to optimal. One improvement may be to pre-allocate the vector with the expected number of elements. Alternatively, try a raw array, both with constructing whole elements and with per-property initialization. Changes need to be measured carefully (definitely measure using a fully optimized build with the array sizes you expect) and weighed against the readability/correctness of the code.
If you don't need the exact binary layout (which seems to be the case, as you are using vector), you may be able to achieve similar-looking syntax by creating a couple of custom classes that expose the existing data differently. This avoids copying altogether.
You would need an "array" type (providing indexing/iteration over an instance of SoA) and an "element" type (initialized with a reference to an instance of SoA and an index, exposing accessors for the separate fields at that index).
Rough sketch of code (add iterators,...):
class AoS_Element
{
    SoA& soa;
    int index;
public:
    AoS_Element(SoA& soa, int index) : soa(soa), index(index) {}
    int arr1_data() const { return soa.arr1[index]; }
    int arr2_data() const { return soa.arr2[index]; }
};
class AoS
{
    SoA& soa;
public:
    AoS(SoA& _soa) : soa(_soa) {}
    AoS_Element operator[](int index) { return AoS_Element(soa, index); }
};
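If you do need a real std::vector<AoS>, the copying route with pre-allocation mentioned above might look like this sketch. The value of COUNT and the function name to_aos are assumptions; the structs follow the question.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t COUNT = 4;   // assumed size, for illustration only

struct SoA {
    int arr1[COUNT];
    int arr2[COUNT];
};

struct AoS {
    int arr1_data;
    int arr2_data;
};

// Copy element by element, preserving order, with a single allocation.
inline std::vector<AoS> to_aos(const SoA& before)
{
    std::vector<AoS> after;
    after.reserve(COUNT);   // avoid repeated reallocation while pushing back
    for (std::size_t i = 0; i < COUNT; ++i) {
        after.push_back(AoS{before.arr1[i], before.arr2[i]});
    }
    return after;
}
```

As the answer says, measure this against alternatives in an optimized build before assuming any one variant is fastest.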
