Quick access to a cell in a 2D matrix which wraps around - C++

I have a matrix which wraps around.
m_matrixOffset points to the first cell (0, 0) of the wrapped-around matrix. To access a cell we have the function GetCellInMatrix below. The wrap-around logic (in the while loops) is executed every time someone accesses a cell, which happens thousands of times per second. Is there any way to optimize this, using a lookup table or some other approach? MAX_ROWS and MAX_COLS may not be powers of 2.
struct Cell
{
    int rowId;
    int colId;
};

int matData[MAX_ROWS][MAX_COLS];

int GetCellInMatrix(const Cell& cellIndex)
{
    Cell newCellIndex = { cellIndex.rowId + m_matrixOffset.rowId,
                          cellIndex.colId + m_matrixOffset.colId };
    while (newCellIndex.rowId >= MAX_ROWS)
    {
        newCellIndex.rowId -= MAX_ROWS;
    }
    while (newCellIndex.colId >= MAX_COLS)
    {
        newCellIndex.colId -= MAX_COLS;
    }
    return matData[newCellIndex.rowId][newCellIndex.colId];
}

You might be interested in the concept of division with remainder, usually implemented as a % b for the remainder.
Thus
return matData[newCellIndex.rowId % MAX_ROWS][newCellIndex.colId % MAX_COLS];
does not need the while loops before it.
As per the comment, the implied integer division in the remainder computation is too costly if done on each query. Assuming that m_matrixOffset is constant over a large number of queries, reduce its coordinates once using the remainder operation. The coordinates of newCellIndex are then less than twice the maximum, so they need to be reduced at most once. It is therefore safe to replace each while with an if, sparing one comparison.
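A minimal sketch of that, assuming both cellIndex and the pre-reduced m_matrixOffset components already lie within [0, MAX_ROWS) and [0, MAX_COLS):

int GetCellInMatrix(const Cell& cellIndex)
{
    int row = cellIndex.rowId + m_matrixOffset.rowId; // < 2 * MAX_ROWS
    int col = cellIndex.colId + m_matrixOffset.colId; // < 2 * MAX_COLS
    if (row >= MAX_ROWS) row -= MAX_ROWS;
    if (col >= MAX_COLS) col -= MAX_COLS;
    return matData[row][col];
}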
If you can sacrifice memory for speed, then double the matrix dimensions and fill the excess entries with the repeated matrix elements. You have to make sure this pattern holds when updating the matrix.
Then, again assuming that both m_matrixOffset and cellIndex are inside the maxima for rows and columns, you can access the cell of the extended matrix without any further reduction. This would be a variant on the "lookup table" idea.
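A sketch of that idea (matDataDoubled and RebuildDoubledMatrix are hypothetical names; the doubled copy must be rebuilt, or written through, whenever matData changes):

int matDataDoubled[2 * MAX_ROWS][2 * MAX_COLS];

void RebuildDoubledMatrix() // run after every update of matData
{
    for (int r = 0; r < 2 * MAX_ROWS; ++r)
        for (int c = 0; c < 2 * MAX_COLS; ++c)
            matDataDoubled[r][c] = matData[r % MAX_ROWS][c % MAX_COLS];
}

int GetCellInMatrix(const Cell& cellIndex)
{
    // no reduction needed while both summands are below the maxima
    return matDataDoubled[cellIndex.rowId + m_matrixOffset.rowId]
                         [cellIndex.colId + m_matrixOffset.colId];
}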
Or use real lookup tables, but then you execute 3 array cell lookups, as in:
return matData[repeatedRowIndex[newCellIndex.rowId]][repeatedColIndex[newCellIndex.colId]];
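The tables only need to be built once, for example (a sketch, sized to cover indices up to twice the maxima):

int repeatedRowIndex[2 * MAX_ROWS];
int repeatedColIndex[2 * MAX_COLS];

void BuildIndexTables() // run once at startup
{
    for (int r = 0; r < 2 * MAX_ROWS; ++r) repeatedRowIndex[r] = r % MAX_ROWS;
    for (int c = 0; c < 2 * MAX_COLS; ++c) repeatedColIndex[c] = c % MAX_COLS;
}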

It depends on whether the wrap is small or large in relation to the matrix.
The most common case is that all you need is the nearest neighbour. So make the matrix N+2 by M+2 and duplicate the wrap. That makes reads fast but writes a bit fiddly (often a good trade-off).
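A sketch of that halo layout, with hypothetical dimensions N and M (note the corners also need duplicating for diagonal neighbours):

constexpr int N = 64, M = 64; // hypothetical logical dimensions

int grid[N + 2][M + 2]; // interior is [1..N] x [1..M]; the border duplicates the wrap

void setCell(int r, int c, int v) // r in [0, N), c in [0, M)
{
    grid[r + 1][c + 1] = v;
    if (r == 0)     grid[N + 1][c + 1] = v; // first row reappears below the last
    if (r == N - 1) grid[0][c + 1] = v;     // last row reappears above the first
    if (c == 0)     grid[r + 1][M + 1] = v;
    if (c == M - 1) grid[r + 1][0] = v;
    if (r == 0 && c == 0)         grid[N + 1][M + 1] = v; // corners
    if (r == 0 && c == M - 1)     grid[N + 1][0] = v;
    if (r == N - 1 && c == 0)     grid[0][M + 1] = v;
    if (r == N - 1 && c == M - 1) grid[0][0] = v;
}

Reads then use plain grid[r + 1][c + 1] with neighbours at +/-1 and no wrapping logic at all.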
If that's no good, specialise the functions. Work out which cells are edge cells and handle them specially. (You must be able to do this more cheaply than simply hard-coding the wrap logic into every access, of course; if only one or two cells change every pass that will hold, but not if you generate a random list every pass.)

Which is the best way to create and store cycles using C/C++?

I have the structs:
struct Arc {
    int i, j;
    Arc() {}
    Arc(const Arc& obj) : i(obj.i), j(obj.j) {}
    Arc(int _i, int _j) : i(_i), j(_j) {}
};

struct CYCLE {
    vector<Arc> route;
    float COST;
};
To store the cycles that have already been created, I thought about using:
vector<CYCLE> ConjCycles;
For each cycle created, I need to verify that it has not already been added to ConjCycles.
The cycle: 1-2-2-1; is the same as the cycle: 2-2-1-2.
How can I detect that cycles like those are the same?
I thought about using a map to control this.
However, I don't know how to define a key for a cycle so that the two cycles described above get the same key.
You have quite a lot of redundancy in your cycle representation, e.g. for a cycle 1-3-2-4-1:
{ (1, 3), (3, 2), (2, 4), (4, 1) }
If we consider a cycle being a cyclic graph, then you store the edges in your data structure. It would be more efficient to store the vertices instead:
struct Cycle
{
    std::vector<int> vertices;
};
The edges you get implicitly from vertices[n] and vertices[n + 1]; the last vertex is always the same as the first one, so do not store it explicitly. The last edge is then (vertices[vertices.size() - 1], vertices[0]).
Be aware that this is only the internal representation; you can still construct the cycle from a sequence of edges (Arcs). You'd most likely check the sequence in the constructor and possibly throw an exception if it is invalid (there are alternatives, though, if you dislike exceptions...).
Then you need some kind of equivalence. My proposition would be:
1. If the number of vertices is not equal, the cycles cannot be equal.
2. It might shorten the rest of the algorithm (but that would have to be evaluated!) to count the number of occurrences of each vertex id; these must match.
3. Search for the minimum vertex id in each cycle; from there on, compare each subsequent value, wrapping around in the vector when the end is reached.
4. If the sequences match, you're done. This does not yet cover the case of multiple minimum values, though; if that happens, you might just repeat the step trying the next minimum value in one cycle while staying with the same one in the other. You might try to do the same in parallel with the maxima, or, if you have counted occurrences anyway (see above), use whichever of the minima/maxima occurs less often.
Edit: Further improvement (idea inspired by [Scheff]'s comment to the question):
Instead of re-trying each minimum found, we should preferably select some kind of absolute minimum from the relative minima found so far: a relative minimum x is smaller than a relative minimum y if the successor of x is smaller than the successor of y; if both successors are equal, look at the next successors, and so on. If you discover more than one absolute minimum (i.e. some indirect successor becomes equal to the initial minimum), then you have a sub-cycle repeating itself multiple times (1-2-3-1-2-3-1-2-3), and it does not matter which "absolute" minimum you select.
You'd definitely skip step 2 above then, though.
Find the minimum already in the constructor and store it. Then comparison gets easy, you just start in both cycles at their respective minimum...
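A sketch of that comparison using a canonical rotation (canonicalForm and sameCycle are my names; taking the lexicographically smallest rotation also sidesteps the multiple-minima case, at the cost of an O(n^2) worst-case canonicalisation):

#include <algorithm>
#include <vector>

// Canonical form: the lexicographically smallest rotation of the vertex list.
std::vector<int> canonicalForm(const std::vector<int>& vertices)
{
    std::vector<int> best = vertices;
    std::vector<int> rotated = vertices;
    for (std::size_t i = 1; i < vertices.size(); ++i)
    {
        std::rotate(rotated.begin(), rotated.begin() + 1, rotated.end());
        if (rotated < best)
            best = rotated;
    }
    return best;
}

bool sameCycle(const Cycle& a, const Cycle& b)
{
    return a.vertices.size() == b.vertices.size()
        && canonicalForm(a.vertices) == canonicalForm(b.vertices);
}

The canonical form would also serve directly as the map key the question asks about.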

(effectively) storing a polynomial dynamically

What I am trying to accomplish is to store a polynomial of unknown size using arrays.
What I have seen over the internet is using an array in which each cell contains the coefficient and the degree is the cell number, but that is not efficient, because for a polynomial like 6x^14+x+5 we would have zeros throughout the cells from 1 till 13. I've already looked at some solutions with vectors and linked lists, but is there any other way to effectively tackle this problem, without the use of std::vector or std::list?
Unless there is a compelling reason to act otherwise (this is a programming assignment where you are required to use C-style arrays), you should use a std::vector from the standard library. Libraries are there for a reason: to make your life easier. The overhead is probably insignificant in the context of your program.
You mention that storing a polynomial (such as 4*x^5 + x - 1) in an std::vector with the indices representing the power (such as [-1, 1, 0, 0, 0, 4]) is inefficient. This is true, but unless you are storing polynomials of degree greater than 1000, this waste is entirely insignificant. For "sparse" polynomials, of high degree but with few coefficients, you could consider using a vector of pairs, with the first value of each pair storing the power and the second value storing the coefficient.
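For instance, the 6x^14 + x + 5 example above could be stored as (a sketch):

#include <utility>
#include <vector>

// {power, coefficient} pairs; zero coefficients are simply absent
std::vector<std::pair<int, int>> poly = { {14, 6}, {1, 1}, {0, 5} };

Keeping the pairs sorted by power makes evaluation and addition straightforward to implement in a single pass.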
A sparse polynomial can be represented with a map, where a zero element is represented by a nonexistent key. Here is an example of such a class:
#include <map>

// Example of a sparse integer polynomial
class SparsePolynomial
{
    std::map<int, int> coeff; // only non-zero coefficients are stored
public:
    int& operator[](const int& degree);
    int get(int degree);
    void update(int degree, int val);
};
Whenever you try to get or update the coefficient of an element, its existence in the map is checked. Every time the coefficient of an element is updated, it is checked whether the value is zero. Hence, the size of the map can always be minimal.
We could replace these two methods with operator[]. However, in that case we would not be able to check for zero during an update operation, so the storage would not be as efficient as with two separate methods for access and update.
int SparsePolynomial::get(int degree)
{
    std::map<int, int>::iterator it = coeff.find(degree);
    if (it == coeff.end()) {
        return 0;
    } else {
        return it->second;
    }
}
void SparsePolynomial::update(int degree, int val)
{
    if (val == 0) {
        std::map<int, int>::iterator it = coeff.find(degree);
        if (it != coeff.end()) {
            coeff.erase(it);
        }
    } else {
        coeff[degree] = val;
    }
}
While this method gives us more efficient storage, it requires more time for access and update than a vector does. However, in the case of a sparse polynomial the difference can be small: given a std::map of size N, the average search complexity for an element is O(log N). Suppose you have a sparse polynomial of degree d with N non-zero coefficients. If N is much smaller than d, the access and update times will be small enough not to notice.

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with a huge number (millions or even billions) of elements, where the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, are std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually I want to sort a vector of my own class, with an integer member representing the processor index.
As there are other members inside the class, two elements with the same integer member used for comparing might still not be regarded as the same data.
Counting sort would be the right choice if you know that your range is this limited. If the range is [0, m), the most efficient way is a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if ((int)counts.size() <= i) {
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the counts vector is grown lazily, but you can resize it once up front if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if ((int)count_sorted.size() <= i) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers. The space complexity went from O(m) to O(n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort is in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the object themselves and move them in. If you need count_sorted to outlive to_sort you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array, taking the value of each count into account. If you want STL-like iterators you might need a custom data structure that encapsulates that behavior.
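For instance, emitting the sorted values from the counts array is just (a sketch, reusing to_sort and counts from above):

std::vector<int> sorted;
sorted.reserve(to_sort.size());
for (int v = 0; v < (int)counts.size(); ++v)
    for (int k = 0; k < counts[v]; ++k)
        sorted.push_back(v); // value v appears counts[v] times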
Note: in a previous version of this answer I mentioned multiset as a way to count-sort with a standard container. This would be efficient in some Java implementations (I believe the Guava implementation would be) but not in C++, where the keys in the red-black tree are simply repeated many times.
You say "in-place", so I assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jumping ahead to the current cumulative sum for that value, so that you don't revisit any elements you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X), where X is the maximum key value you allow in the sort.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of each item, we could just swap the items into place. Doing only that would work if there were no repetitions, but since it is almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place, we have to increment the value in the prefix sum, so that the next element with the same value does not displace the previous element from its place.
Second, either:
keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place; this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array; or
keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
#include <algorithm> // std::iter_swap
#include <iterator>  // std::distance

template<class It, class KeyOf>
void countsort(It begin, It end, KeyOf key_of) {
    constexpr int max_value = 1000;
    int final_destination[max_value] = {}; // zero initialized
    int destination[max_value] = {};       // zero initialized
    // Record counts
    for (It it = begin; it != end; ++it)
        final_destination[key_of(*it)]++;
    // Build prefix sum of counts
    for (int i = 1; i < max_value; ++i) {
        final_destination[i] += final_destination[i - 1];
        destination[i] = final_destination[i - 1];
    }
    for (auto it = begin; it != end; ++it) {
        auto key = key_of(*it);
        // while item is not in the correct position
        while (std::distance(begin, it) != destination[key] &&
               // and not all items of this value have reached their final position
               final_destination[key] != destination[key]) {
            // swap into the right place
            std::iter_swap(it, begin + destination[key]);
            // tidy up for next iteration
            ++destination[key];
            key = key_of(*it);
        }
    }
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const& p) {
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort; here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and counting sort is definitely the way to go. However, I personally prefer not to resize the vector progressively; I'd rather do it this way (assuming your range is [0, 1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1);
It is essentially the same, but there is no need to constantly manage the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

Finding all possible pairs of subsets using recursion

I am given
struct point
{
    int x;
    int y;
};
and the table of points:
point tab[MAX];
Program should return the minimal distance between the centers of gravity of any possible pair of subsets from tab. Subset can be any size (of course >=1 and < MAX).
I am obliged to write this program using recursion.
So my function will be of int type, because I have to return an int.
I globally set a variable min (because while doing recursion I have to compare some values against this min):
int min = 0;
My function should take the number of elements added so far, the sum of the Y coordinates, and the sum of the X coordinates:
int return_min_distance(int sY, int sX, int number, bool iftaken[])
I will be glad for any further help.
I thought about another table, of bools, which I pass as a parameter to determine whether I have taken a value from the table or not. Still, my problem is how to implement this; I do not know even how to start.
I think you need a function that can iterate through all subsets of the table, starting with either nothing or an existing iterator. The code then gets easy:
#include <climits> // INT_MAX

int min_distance = INT_MAX;
SubsetIterator si1(0, tab);
while (si1.hasNext())
{
    SubsetIterator si2(&si1, tab);
    while (si2.hasNext())
    {
        int d = subsetDistance(tab, si1.subset(), si2.subset());
        if (d < min_distance)
        {
            min_distance = d;
        }
    }
}
The SubsetIterators can be simple binary counters with one bit per table entry, where a 1 bit indicates membership in the subset. Yes, it's an O(N^2) algorithm over the subsets, but I think it has to be.
The trick is incorporating recursion. Sorry, I just don't see how it helps here. If I can think of a way to use it, I'll edit my answer.
Update: I thought about this some more, and while I still can't see a use for recursion, I found a way to make the subset processing easier. Rather than running through the entire table for every distance computation, the SubsetIterators could store precomputed sums of the x and y values for easy distance computation. Then, on every iteration, you subtract the values that are leaving the subset and add the values that are joining. A simple bit-and operation can reveal these. To be even more efficient, you could use Gray coding instead of plain binary counting for the membership bitmap. This guarantees that at each iteration exactly one value enters or leaves the subset. Minimal work.
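A sketch of that Gray-code walk (gray and walkSubsets are my names; __builtin_ctz is a GCC/Clang intrinsic returning the index of the lowest set bit):

#include <cstdint>

static inline std::uint32_t gray(std::uint32_t n) { return n ^ (n >> 1); }

void walkSubsets(const point tab[], int n) // n < 32
{
    long sumX = 0, sumY = 0;
    int count = 0;
    for (std::uint32_t i = 1; i < (1u << n); ++i)
    {
        std::uint32_t changed = gray(i) ^ gray(i - 1); // exactly one bit set
        int idx = __builtin_ctz(changed);
        if (gray(i) & changed) { sumX += tab[idx].x; sumY += tab[idx].y; ++count; }
        else                   { sumX -= tab[idx].x; sumY -= tab[idx].y; --count; }
        // centre of gravity of the current subset: (sumX / count, sumY / count)
    }
}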

Optimization of Point to Voxel mapping

I used a profiler to look over some code which does not yet run fast enough. It found that the following function took most of the time, and that half of the time in this function was spent in floor. Now, there are two possibilities: optimizing this function, or going one level up and reducing the calls to it. I wonder if the first one is possible.
int Sph::gridIndex(Vector3 position) const {
    int mx = ((int)floor(position.x / _gridIntervalSize) % _gridSize);
    int my = ((int)floor(position.y / _gridIntervalSize) % _gridSize);
    int mz = ((int)floor(position.z / _gridIntervalSize) % _gridSize);
    if (mx < 0) {
        mx += _gridSize;
    }
    if (my < 0) {
        my += _gridSize;
    }
    if (mz < 0) {
        mz += _gridSize;
    }
    int x = mx * _gridSize * _gridSize;
    int y = my * _gridSize;
    int z = mz * 1;
    return x + y + z;
}
Vector3 is just some simple class which stores three floats and provides some overloaded operators. _gridSize is of type int and _gridIntervalSize is a float. There are _gridSize ^ 3 buckets.
The purpose of the function is to provide hash table support. Every 3d-point is mapped to an index, and points which lie in the same voxel of size _gridIntervalSize ^ 3 should land in the same bucket.
First rule of optimization when there is math involved: Eliminate division, square roots, and trig functions.
const float inverse_size = 1.0f / _gridIntervalSize;
...that should be computed only once (for example as a member, set whenever _gridIntervalSize changes), not once per call.
int mx = ((int)floor(position.x * inverse_size) % _gridSize);
int my = ((int)floor(position.y * inverse_size) % _gridSize);
int mz = ((int)floor(position.z * inverse_size) % _gridSize);
I would also recommend dropping the mod operation, because that's another division. If your grid size is a power of 2, you can use & (gridSize - 1) instead, which also lets you delete the conditional code at the bottom; another big saving.
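Combining both suggestions might look like this (a sketch; _invGridIntervalSize is a hypothetical member caching 1.0f / _gridIntervalSize, and _gridSize must be a power of 2):

int Sph::gridIndex(Vector3 position) const {
    // & (_gridSize - 1) acts as a positive modulo for powers of two,
    // even when the cast result is negative (two's complement)
    int mx = (int)floor(position.x * _invGridIntervalSize) & (_gridSize - 1);
    int my = (int)floor(position.y * _invGridIntervalSize) & (_gridSize - 1);
    int mz = (int)floor(position.z * _invGridIntervalSize) & (_gridSize - 1);
    return mx * _gridSize * _gridSize + my * _gridSize + mz;
}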
On another note, using overloaded operators may be hurting you. This is a touchy subject here so I'll let you experiment with it and decide for yourself.
I assume you use floor because negative values are possible, and because you don't want an anomaly due to the default truncation when you cast to int (values rounding toward zero from both sides, making some oversized voxels).
If you can specify a safe most-negative value for each component of the vector, you could subtract that (negative) value, or rather the nearest more-negative multiple of _gridIntervalSize, before the cast, and drop the floor.
Using fmod may ensure you have a safe most-negative value, and replace the integer %, but it's probably an anti-optimisation. Still, as a quick change, it may be worth checking.
Also, check whether your platform supports vector instructions, and whether your compiler can easily be encouraged to use them. x86 chips certainly have integer vector instructions as well as float (the old Pentium 1 MMX instructions, for a start) and might be able to handle this much more efficiently than the "normal" CPU instruction set. This may even be a case for digging out the list of vector instruction intrinsics for your compiler and doing some hand-optimisation. Just check what the compiler can do for you first - I'm not sure how much of this kind of optimisation compilers will do for you already.
One probably trivial piece of micro-optimisation...
return (mx * _gridSize + my) * _gridSize + mz;
Saves one integer multiplication. Trivial, of course, and the compiler may catch it anyway, but this is an old habitual thing.
Oh - watch the leading underscores. Those are reserved identifiers. Not likely to cause a problem, but you can't complain if they do.
EDIT
Another way to avoid the floor is to handle positive and negative values separately, if you are willing to accept that items exactly on the edge of a grid cell may land in the wrong cell (possible anyway, since floats should be considered approximate). Just apply a -1 offset in the negative case, to pull the value away from zero by almost exactly the right amount to compensate for the truncation. You might consider a bit-fiddling increment-the-mantissa afterwards (to get already-integer values into the cell you'd expect), but this is probably unnecessary.
If you can impose power-of-two limitations to your sizes, there may be a bit-fiddling way to efficiently extract the grid position from a float, avoiding some or all of the multiply, floor and % for each of x, y and z, assuming a standard floating point representation (ie this is non-portable). Again, handle positive and negative separately. Extract the exponent, bit-shift the mantissa accordingly, then mask out unwanted bits.
I think you need to look higher up the hierarchy to get real speed improvements. That is, is storing points in a hash-map really the most efficent solution? I assume you have an array of Vector3 arrays, i.e:
Vector3 *points [size][size][size]
where each element in the 3D array is an array of Vector3.
The algorithm you're using doesn't guarantee uniform distribution of points in each Vector3 array, which may be a problem. A cluster of points within _gridIntervalSize will map to the same array.
An alternative method would be to use oct-trees, which are like binary trees but each node has eight child nodes. Each node requires the min/max x/y/z values to define the volume the node covers. To add values to the tree:
Recursively search the tree to find the smallest node that can contain the point.
Add the point to that node.
If the number of points in the node now exceeds the upper limit for a node, create child nodes and move the points into them.
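A bare-bones node for such a tree might look like this (a hypothetical sketch):

#include <vector>

struct OctreeNode
{
    Vector3 minCorner, maxCorner; // the volume this node covers
    std::vector<Vector3> points;  // points held here until the node splits
    OctreeNode* children[8] = {}; // one per octant, null until split
};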
You may want to use quad-trees if there is little variation in values along a particular axis. Another method is to use BSPs - divide the world into two halves and recurse to find the container to add your point to. Again, these can be dynamic.
Converting the floats to ints and having the division planes lie on integer values will speed up the process as well.
Googling the above terms will lead you to more in depth analysis of the algorithms.
Finally, using floats (or doubles) for co-ordinates in an infinite plane is a bad idea: the further you get from (0,0,0), the less precision you have (the gap between representable floating point values increases as the values increase). You will need to 'reset' the floating point values to keep the precision. One method is to 'tile' the space and split each co-ordinate into an integer part and a floating point part; the integer part defines the 'tile' and the floating point part defines the position within the tile. This gets you a much simpler hashing method: just use the integer parts, with no call to floor and only integer calculations required. Another approach is to use fixed-point values rather than floating point values, but this would constrain your precision; it would, however, make calculations across tile boundaries much easier.
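A sketch of the tile-plus-offset representation for one axis (the names are mine):

#include <cmath>

struct TiledCoord
{
    int   tile;  // which tile along this axis
    float local; // offset within the tile, in [0, tileSize)
};

TiledCoord toTiled(float x, float tileSize)
{
    TiledCoord t;
    t.tile  = (int)std::floor(x / tileSize);
    t.local = x - t.tile * tileSize;
    return t;
}

Points stored in this form can be hashed from the integer tile parts alone, with no floor per query.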
If you could expand on what the top-level requirements of your coordinate system are, there are probably better algorithms available to you.