C++ optimizations - c++

I'm doing some real-time stuff and I need a lot of speed. But in my code, I have this:
float maxdepth;
uint32_t faceindex;
for (uint32_t tr_iterator = 0; tr_iterator < facesNum-1; tr_iterator++)
{
maxdepth = VXTrisDepth[tr_iterator];
faceindex = tr_iterator;
uint32_t tr_literator = 3*tr_iterator;
uint32_t facelindex = 3*faceindex;
for (uint32_t tr_titerator = tr_iterator+1; tr_titerator < facesNum; tr_titerator++)
{
float depth = VXTrisDepth[tr_titerator];
if (depth > maxdepth)
{
maxdepth = depth;
faceindex = tr_titerator;
}
}
Vei2 itmpx = trs[tr_literator+0];
trs[tr_literator+0] = trs[facelindex+0];
trs[facelindex+0] = itmpx;
itmpx = trs[tr_literator+1];
trs[tr_literator+1] = trs[facelindex+1];
trs[facelindex+1] = itmpx;
itmpx = trs[tr_literator+2];
trs[tr_literator+2] = trs[facelindex+2];
trs[facelindex+2] = itmpx;
float id = VXTrisDepth[tr_iterator];
VXTrisDepth[tr_iterator] = VXTrisDepth[faceindex];
VXTrisDepth[faceindex] = id;
}
VXTrisDepth is just an array of float, faceindex is a uint32_t and can be a big number, trs is an array of Vei2, and Vei2 is just an integer 2D vector.
The problem is that with something like 16074 in facesNum, this loop takes 700 ms to run on my computer, which is way too much. Any ideas for optimizations?

I've rewritten it a bit to find out what you were really doing.
Warning: all code is untested.
// needs <algorithm>, <iterator> and <utility>
float maxdepth;
uint32_t faceindex;
for (uint32_t tr_iterator = 0; tr_iterator < facesNum-1; tr_iterator++) {
    uint32_t tr_literator = 3*tr_iterator;
    auto fi = std::max_element(&VXTrisDepth[tr_iterator], &VXTrisDepth[facesNum]);
    maxdepth = *fi;
    faceindex = std::distance(&VXTrisDepth[0], fi);
    // compute facelindex only after faceindex is known; computed earlier
    // it equals tr_literator and the swaps below do nothing
    uint32_t facelindex = 3*faceindex;
    // hmm was this originally a VEC3...
    std::swap(trs[tr_literator+0], trs[facelindex+0]);
    std::swap(trs[tr_literator+1], trs[facelindex+1]);
    std::swap(trs[tr_literator+2], trs[facelindex+2]);
    // with the above this looks like a struct of arrays. SOA vs AOS
    std::swap(VXTrisDepth[tr_iterator], VXTrisDepth[faceindex]);
}
Now it looks like a selection sort of two arrays, which is O(N^2); no wonder it feels slow.
There are multiple ways to sort this:
External index: make an array of length facesNum, initialized from zero to facesNum-1, and sort it using each index's entry in VXTrisDepth as the key. Then reorder the 2 original arrays according to the index array.
External pair of index and key: to make it easy, use std::pair, sort it, and then reorder the original 2 arrays.
Sort the 2 arrays as if they were one, which is a slight hack: using std::swap, you need to specialize it on a type so that it can be misused to swap the 2 arrays. No extra storage needed.
Let's try an easy version with the external pair.
We need 3 stages:
make helper array O(N)
sort helper array O(N lg N)
reorder original arrays O(N)
And some more code:
// needs <algorithm>, <functional>, <utility> and <vector>
// make helper array
using hPair = std::pair<float, uint32_t>; // order is important: depth first, so it is the sort key
std::vector<hPair> helper;
helper.reserve(facesNum);
for (uint32_t idx = 0; idx < facesNum; idx++)
    helper.emplace_back(VXTrisDepth[idx], idx);
// sort it using std::pair's operator > (largest depth first, like the original loop) or write your own
std::sort(helper.begin(), helper.end(), std::greater<hPair>());
// reorder the SOA arrays: gather into a temporary, because swapping in place
// would move faces that later iterations still need to read
std::vector<Vei2> sorted;
sorted.reserve(3 * facesNum);
uint32_t dst = 0;
for (auto& help : helper) {
    uint32_t facelindex = 3 * help.second;
    sorted.push_back(trs[facelindex+0]);
    sorted.push_back(trs[facelindex+1]);
    sorted.push_back(trs[facelindex+2]);
    VXTrisDepth[dst++] = help.first; // we already have the sorted depth in helper.
}
std::copy(sorted.begin(), sorted.end(), &trs[0]);
Remember to test that it still works ... you already have a test framework right?
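For comparison, the first method in the list (external index) could be sketched roughly as follows. This is only a sketch under assumptions: both arrays are taken to be std::vector, Vei2 is a stand-in struct with two ints, and the function name sort_faces_by_depth is made up for the example. It sorts with the largest depth first, as the original loop does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Stand-in for the question's integer 2D vector (assumed layout).
struct Vei2 { int x; int y; };

// Hypothetical helper: sorts faces by depth, largest first.
// trs is assumed to hold exactly 3 Vei2 entries per face.
void sort_faces_by_depth(std::vector<float>& depths, std::vector<Vei2>& trs) {
    const std::size_t n = depths.size();

    // Index array 0, 1, ..., n-1, sorted by the depth each index refers to.
    std::vector<uint32_t> order(n);
    std::iota(order.begin(), order.end(), 0u);
    std::sort(order.begin(), order.end(),
              [&depths](uint32_t a, uint32_t b) { return depths[a] > depths[b]; });

    // Gather both arrays into temporaries in sorted order, then swap them in.
    std::vector<float> sorted_depths(n);
    std::vector<Vei2> sorted_trs(3 * n);
    for (std::size_t dst = 0; dst < n; ++dst) {
        const std::size_t src = order[dst];
        sorted_depths[dst] = depths[src];
        for (std::size_t k = 0; k < 3; ++k)
            sorted_trs[3 * dst + k] = trs[3 * src + k];
    }
    depths.swap(sorted_depths);
    trs.swap(sorted_trs);
}
```

The temporaries cost O(N) extra memory, which is the price for avoiding the in-place permutation logic.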

Related

Fastest way to determine if a uint64 has been "seen" already

I've been interested in optimizing "renumbering" algorithms that can relabel an arbitrary array of integers with duplicates into labels starting from 1. Sets and maps are too slow for what I've been trying to do, as are sorts. Is there a data structure that reliably remembers only whether a number has been seen or not? I was considering experimenting with a Bloom filter, but I have >12M integers and the target performance is faster than a good hashmap. Is this possible?
Here's a simple example pseudo-c++ algorithm that would be slow:
// note: all elements guaranteed > 0
std::vector<uint64_t> arr = { 21942198, 91292, 21942198, ... millions more };
std::unordered_map<uint64_t, uint64_t> renumber;
renumber.reserve(arr.size());
uint64_t next_label = 1;
for (uint64_t i = 0; i < arr.size(); i++) {
uint64_t elem = arr[i];
if (renumber[elem]) {
arr[i] = renumber[elem];
}
else {
renumber[elem] = next_label;
arr[i] = next_label;
++next_label;
}
}
Example input/output:
{ 12, 38, 1201, 120, 12, 39, 320, 1, 1 }
->
{ 1, 2, 3, 4, 1, 5, 6, 7, 7 }
Your algorithm is not bad, but the appropriate data structure to use for the map is a hash table with open addressing.
As explained in this answer, std::unordered_map can't be implemented that way: https://stackoverflow.com/a/31113618/5483526
So if the STL container is really too slow for you, then you can do better by making your own.
Note, however, that:
90% of the time, when someone complains about STL containers being too slow, they are running a debug build with optimizations turned off. Make sure you are running a release build compiled with optimizations on. Running your code on 12M integers should take a few milliseconds at most.
You are accessing the map multiple times when only once is required, like this:
uint64_t next_label = 1;
for (size_t i = 0; i < arr.size(); i++) {
uint64_t elem = arr[i];
uint64_t &label = renumber[elem];
if (!label) {
label = next_label++;
}
arr[i] = label;
}
Note that the unordered_map operator [] returns a reference to the associated value (creating it if it doesn't exist), so you can test and modify the value without having to search the map again.
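To make the open-addressing suggestion concrete, here is a minimal sketch of what such a hand-rolled table might look like. It is an illustration under assumptions, not a tuned implementation: the function name renumber_open_addressing, the multiplicative hash constant, and the load factor of at most 0.5 are all choices made for this example. It relies on the question's guarantee that every element is greater than 0, so that 0 can mark an empty slot.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: relabels arr in place with 1, 2, 3, ... in order of
// first appearance, using linear probing instead of std::unordered_map.
// Assumes all elements are > 0, so key 0 means "empty slot".
void renumber_open_addressing(std::vector<uint64_t>& arr) {
    // Power-of-two capacity of at least twice the element count keeps the
    // load factor at or below 0.5, so probing always finds a free slot.
    std::size_t capacity = 16;
    while (capacity < 2 * arr.size()) capacity <<= 1;
    const std::size_t mask = capacity - 1;

    std::vector<uint64_t> keys(capacity, 0);
    std::vector<uint64_t> labels(capacity, 0);
    uint64_t next_label = 1;

    for (uint64_t& elem : arr) {
        // Multiplicative hash; the upper bits of the product are better mixed.
        std::size_t slot =
            static_cast<std::size_t>((elem * 0x9E3779B97F4A7C15ull) >> 32) & mask;
        // Linear probing: step forward until we hit the key or an empty slot.
        while (keys[slot] != 0 && keys[slot] != elem)
            slot = (slot + 1) & mask;
        if (keys[slot] == 0) {  // first time this value is seen
            keys[slot] = elem;
            labels[slot] = next_label++;
        }
        elem = labels[slot];
    }
}
```

Whether this actually beats std::unordered_map on a given machine is something to measure in an optimized build rather than assume.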
Updated with bug fix
First, anytime you experience "slowness" with a std:: collection class like vector or map, just recompile with optimizations (release build). There is usually a 10x speedup.
Now to your problem. I'll show a two-pass solution that runs in O(N) time. I'll leave it as an exercise for you to convert to a one-pass solution. But I'll assert that this should be fast enough, even for vectors with millions of items.
First, declare not one, but two unordered maps:
std::unordered_map<uint64_t, uint64_t> element_to_label;
std::unordered_map<uint64_t, std::pair<uint64_t, std::vector<uint64_t>>> label_to_elements;
The first map, element_to_label, maps an integer value found in the original array to its unique label.
The second map, label_to_elements, maps a label to both the element value and the list of indices at which that element occurs in the original array.
Now to build these maps:
element_to_label.reserve(arr.size());
label_to_elements.reserve(arr.size());
uint64_t next_label = 1;
for (size_t index = 0; index < arr.size(); index++)
{
const uint64_t elem = arr[index];
auto itor = element_to_label.find(elem);
if (itor == element_to_label.end())
{
// new element
element_to_label[elem] = next_label;
auto &p = label_to_elements[next_label];
p.first = elem;
p.second.push_back(index);
next_label++;
}
else
{
// existing element
uint64_t label = itor->second;
label_to_elements[label].second.push_back(index);
}
}
When the above code has run, it has built up a database of all values in the array, their labels, and the indices where they occur.
So now, to renumber the array such that all elements are replaced with their smaller label value:
for (auto itor = label_to_elements.begin(); itor != label_to_elements.end(); itor++)
{
uint64_t label = itor->first;
auto& p = itor->second;
uint64_t elem = p.first; // technically, this isn't needed. It's just useful to know which element value we are replacing from the original array
const auto& vec = p.second;
for (size_t j = 0; j < vec.size(); j++)
{
size_t index = vec[j];
arr[index] = label;
}
}
Notice where I assign variables by reference with the & operator to avoid making an expensive copy of any value in the maps.
So if your original vector or array was this:
{ 100001, 2000002, 300003, 400004, 400004, 300003, 2000002, 100001 };
Then the application of labels would render the array as this:
{1,2,3,4,4,3,2,1}
And what's nice is that you still have a quick O(1) lookup to map any label in that set back to its original element value using label_to_elements.
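As one possible take on the one-pass exercise mentioned above, the sketch below relabels in a single loop and still keeps a reverse mapping. It is an assumption-laden variation, not the answer's own code: the function name renumber_one_pass is invented, and because labels are dense (1, 2, 3, ...), it stores the reverse mapping in a plain std::vector instead of a second unordered_map. It does not keep the per-element index lists.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper: relabels arr in place in one pass and returns the
// reverse mapping, where result[label - 1] is the original element value.
std::vector<uint64_t> renumber_one_pass(std::vector<uint64_t>& arr) {
    std::unordered_map<uint64_t, uint64_t> element_to_label;
    element_to_label.reserve(arr.size());
    std::vector<uint64_t> label_to_element;

    for (uint64_t& elem : arr) {
        // emplace inserts only if the key is new; either way it returns an
        // iterator to the entry, so the map is searched once per element.
        auto result = element_to_label.emplace(elem, label_to_element.size() + 1);
        if (result.second)
            label_to_element.push_back(elem);  // remember the original value
        elem = result.first->second;           // overwrite with its label
    }
    return label_to_element;
}
```

With the example array above, the returned vector would hold the four distinct values in order of first appearance.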

Unrestricted conversion from Array to TypedArray<std::complex<double>>?

I have tried many things and just cannot get this to work when writing a mex-function.
I have an input from MATLAB which I pass to a method as a const matlab::data::Array. This array may contain complex data; sometimes it's only real. So the most straightforward approach, in my naive view, would be to simply convert the Array to a TypedArray<std::complex<double>>, getting full complex values if the array contains complex values and complex values with imag=0 if the array contains only real values. It seems to be impossible... This conversion is not accepted in any case, and MATLAB even simply crashes when trying to cast single elements from a real-valued Array to std::complex<double>.
Does anybody have a solution for getting a TypedArray<std::complex<double>> in all cases, so I can use it in C++ code?
Story of my life: trying for hours, and after posting here I find something that works within half an hour... The following code seems to do the job:
void prepareObject(const matlab::data::Array& corners, const matlab::data::Array& facets)
{
size_t N_facet_rows = facets.getDimensions()[0];
size_t N_facet_columns = facets.getDimensions()[1];
matlab::data::TypedArray<std::complex<double>> complex_facets = arrayFactory.createArray<std::complex<double>>(facets.getDimensions());
// Convert the facets to a complex-valued array.
if (facets.getType() == ArrayType::DOUBLE) {
std::complex<double> v;
// Input is DOUBLE, so for each value init a complex<double> and store that in the complex array.
v.imag(0);
for (int i_r = 0; i_r < N_facet_rows; i_r++) {
for (int i_c = 0; i_c < N_facet_columns; i_c++) {
v.real(facets[i_r][i_c]);
complex_facets[i_r][i_c] = v;
}
}
}
else {
// Input is COMPLEX_DOUBLE, so simply copy all values.
for (int i_r = 0; i_r < N_facet_rows; i_r++) {
for (int i_c = 0; i_c < N_facet_columns; i_c++) {
complex_facets[i_r][i_c] = (std::complex<double>) facets[i_r][i_c];
}
}
}
}

How to concatenate smaller matrices or vectors to form a larger one OpenCV c++ cv::Matx

Suppose you have three 5-vectors and you want to fill a 15-vector with them. (If I know how to do this, I will know how to work with cv::Matx matrices too.)
int const DIM = 5;
int const STS = 15;
typedef cv::Matx<double, DIM,1> VecDim;
typedef cv::Matx<double, STS,1> VecSts;
.
.
.
VecDim xSt, ySt, wSt;
Then some processing occurs and their values are set.
Is there any way in OpenCV 3.1 to do:
VecSts allVals;
allVals << xSt, ySt, wSt;
Or is the only "elegant" way (elegant because it is a bit better than filling the 15-vector element by element) a for loop, as follows?
for(int id = 0; id < DIM; ++id)
{
allVals(id,0) = xSt(id,0);
allVals(id+DIM,0) = ySt(id,0);
allVals(id+2*DIM,0) = wSt(id,0);
}

How to generate a hashmap for huge chunk of data?

I want to make a map such that a set of pointers point to arrays of dynamic size.
I used hashing with chaining. But since the data I am using it for is huge, the program gives std::bad_alloc after a few iterations. The reason may be the new used to generate the linked list.
Could someone please suggest which data structure I should use?
Or anything else that can improve memory usage with my hash table?
The program is in C++.
This is what my code looks like:
Initialization of the hashtable:
class Link
{
public:
double iData;
Link* pNext;
Link(double it) : iData(it), pNext(NULL) // pNext must start out null: insert() leaves it untouched for the first link
{ }
void displayLink()
{ cout << iData << " "; }
};
class List
{
private:
Link* pFirst;
public:
List()
{ pFirst = NULL; }
void insert(double key)
{
if(pFirst==NULL)
pFirst = new Link(key);
else
{
Link* pLink = new Link(key);
pLink->pNext = pFirst;
pFirst = pLink;
}
}
};
class HashTable
{
public:
int arraySize;
vector<List*> hashArray;
HashTable(int size)
{
hashArray.resize(size);
for(int j=0; j<size; j++)
hashArray[j] = new List;
}
};
main snippet:
int t_sample = 1000;
for(int i=0; i < k; i++) // initialize random position
{
x[i] = (cal_rand() * dom_sizex); //dom_sizex = 20e-10 cal_rand() generates rand no between 0 and 1
y[i] = (cal_rand() * dom_sizey); //dom_sizey = 10e-10
}
for(int t=0; t < t_sample; t++)
{
int size;
size = cell_nox * cell_noy; //size of hash table cell_nox = 212, cell_noy = 424
HashTable theHashTable(size); //make table
int hashValue = 0;
for(int n=0; n<k; n++) // k = 10*212*424
{
int m = x[n] /cell_width; //cell_width = 4.7e-8
int l = y[n] / cell_width;
hashValue = (kx*l)+m;
theHashTable.hashArray[hashValue]->insert(n);
}
-------
-------
}
First things first, use a Standard Container. In your specific case, you might want:
either std::unordered_multimap<int, double>
or std::unordered_map<int, std::vector<double>>
(Note: if you do not have C++11, those are available in Boost)
Your main loop becomes (using the second option):
typedef std::unordered_map<int, std::vector<double>> HashTable;
for(int t = 0; t < t_sample; ++t)
{
size_t const size = cell_nox * cell_noy;
// size of hash table cell_nox = 212, cell_noy = 424
HashTable theHashTable;
theHashTable.reserve(size);
for (int n = 0; n < k; ++n) // k = 10*212*424
{
int m = x[n] / cell_width; //cell_width = 4.7e-8
int l = y[n] / cell_width;
int const cellId = (kx*l)+m;
theHashTable[cellId].push_back(n);
}
}
This will not leak memory (reliably), although of course you might have other leaks, and thus will give you a reliable baseline. It is also probably faster than your approach, with a more convenient interface, etc...
In general you should not re-invent the wheel, unless you have a specific need that is not addressed by the available wheels or you are actually trying to learn how to create a wheel or to create a better wheel.
The OS has to solve the same issues with memory pages, so maybe it's worth looking at how that is done. First of all, let's assume all pages are on disk. A page is a fixed-size memory chunk. For your use case, let's say it's an array of your records. Because RAM is limited, the OS maintains a mapping between the page number and its location in RAM.
So, let's say your pages have 1000 records, and you want to access record 2024, you would ask the OS for page 2, and read record 24 from that page. That way, your map is only 1/1000 in size.
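The arithmetic in that example can be written down in a few lines. The names PageAddress and locate are hypothetical and only serve this illustration.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only.
struct PageAddress {
    uint64_t page;    // which page holds the record
    uint64_t offset;  // position of the record inside that page
};

// Splits a record number into page number and offset within the page.
PageAddress locate(uint64_t record, uint64_t records_per_page) {
    return PageAddress{record / records_per_page, record % records_per_page};
}
```

Calling locate(2024, 1000) gives page 2 and offset 24, matching the numbers above.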
Now, if your page has no mapping to a memory location, then it is either on disk or has never been accessed before (is empty). Then you need to swap out another page, and load that page from disk (and update the location mapping).
This is a very simplified description of what happens, and I wouldn't be surprised if someone jumps down my throat for describing it like this.
The point is: what does this mean for you?
First of all, your data exceeds your RAM, so you won't get around writing to disk unless you want to try compression first.
Second, your chains can work as pages if you want, but I wonder whether just paging your hashcode would work better. What I mean is: use the upper bits as the page number and the lower bits as the offset in the page. Avoiding collisions is still key, as you want to load as few pages as possible. You can still chain your pages, and you end up with a much smaller map.
Third, a crucial part is deciding which pages to swap out to make room for new pages. LRU should do OK. If you can better predict which pages you will (not) need, so much the better for you.
Fourth, you need placeholders for your pages to tell you whether they are in memory or on disk.
Hope this helps.
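A rough sketch of two of the pieces described above: splitting a hash code into page number and offset, and LRU bookkeeping for which pages are resident. Everything here is an assumption made for illustration (the 10-bit offset, the names page_of, offset_of, LruPageSet and TouchResult); actually reading and writing pages on disk is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Assumed split: the low 10 bits are the offset, the rest is the page number.
const unsigned kOffsetBits = 10;
inline uint64_t page_of(uint64_t hash) { return hash >> kOffsetBits; }
inline uint64_t offset_of(uint64_t hash) {
    return hash & ((uint64_t(1) << kOffsetBits) - 1);
}

// What happened when a page was touched.
struct TouchResult {
    bool was_resident;      // page was already in memory
    bool evicted;           // another page had to be swapped out
    uint64_t evicted_page;  // valid only if evicted is true
};

// Tracks which pages are in memory and picks the least recently used one
// as the eviction victim. Assumes max_pages >= 1.
class LruPageSet {
public:
    explicit LruPageSet(std::size_t max_pages) : max_pages_(max_pages) {}

    TouchResult touch(uint64_t page) {
        TouchResult r = {false, false, 0};
        auto it = where_.find(page);
        if (it != where_.end()) {
            // Already resident: move it to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            r.was_resident = true;
            return r;
        }
        if (order_.size() == max_pages_) {
            // Full: the page at the back is the least recently used.
            r.evicted = true;
            r.evicted_page = order_.back();
            where_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(page);
        where_[page] = order_.begin();
        return r;
    }

private:
    std::size_t max_pages_;
    std::list<uint64_t> order_;  // front = most recently used
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> where_;
};
```

A caller would write the evicted page to disk and load the requested one whenever touch reports that the page was not resident.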

Template Sort In C++

Hey all, I'm trying to write a sort function but am having trouble figuring out how to initialize a value and how to make this function work as a generic template. The sort works by:
Find a pair (ii,jj) with a minimum value ii+jj such that A[ii]>A[jj]
If such a pair exists, then
swap A[ii] and A[jj] else
break;
The function I have written is as follows:
template <typename T>
void sort(T *A, int size)
{
T min =453;
T temp=0;
bool swapper = false;
int index1 = 0, index2 = 0;
for (int ii = 0; ii < size-1; ii++){
for (int jj = ii + 1; jj < size; jj++){
if((min >= (A[ii]+A[jj])) && (A[ii] > A[jj])){
min = (A[ii]+A[jj]);
index1 = ii;
index2 = jj;
swapper = true;
}
}
}
if (!swapper)
return;
else
{
temp = A[index1];
A[index1] = A[index2];
A[index2] = temp;
sort(A,size);
}
}
This function will successfully sort an array of integers, but not an array of chars. I do not know how to properly initialize the min value for the start of the comparison. I tried initializing the value by simply adding the first two elements of the array together (min = A[0] + A[1]), but it looks to me like for this algorithm it will fail. I know this is sort of a strange type of sort, but it is practice for a test, so thanks for any input.
The most likely reason it fails is that char = 453 does not produce 453 but rather a different number, depending on whether char is signed or unsigned. Your immediate solution would be to use std::numeric_limits: http://www.cplusplus.com/reference/std/limits/numeric_limits/
You may also need to think about the design: because char has a small range, you are likely to overflow often when adding two chars.
The maximum value of any type is std::numeric_limits<T>::max(). It's defined in <limits>.
Also, consider a redesign. This is not a good algorithm. And I would make sure I knew what I was doing before calling my sort function recursively.
I haven't put too much time reading your algorithm, but as an alternative to std::numeric_limits, you can use the initial element in your array as the initial minimum value. Then you don't have to worry about what happens if you call the function with a class that doesn't specialize std::numeric_limits, and thus can't report a maximum value.
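One way to sidestep the initial-value problem entirely is to track whether a candidate pair has been found yet, so no sentinel is needed at all. The sketch below is a variation under assumptions, not the asker's exact function: the name pair_sort is made up, the recursion is replaced by a loop, and it keeps the posted code's criterion of the smallest A[ii]+A[jj]. For char, the sums are computed as int through integer promotion, so they do not wrap around.

```cpp
#include <utility>

// Hypothetical rewrite: repeatedly swaps the out-of-order pair with the
// smallest A[ii] + A[jj] until no out-of-order pair remains.
template <typename T>
void pair_sort(T* A, int size) {
    for (;;) {
        bool found = false;  // replaces the magic initial value of min
        int index1 = 0, index2 = 0;
        for (int ii = 0; ii < size - 1; ii++) {
            for (int jj = ii + 1; jj < size; jj++) {
                // The first out-of-order pair is always accepted; later ones
                // only if their sum is smaller than the best so far.
                if (A[ii] > A[jj] &&
                    (!found || A[ii] + A[jj] < A[index1] + A[index2])) {
                    index1 = ii;
                    index2 = jj;
                    found = true;
                }
            }
        }
        if (!found)
            return;  // no out-of-order pair left: the array is sorted
        std::swap(A[index1], A[index2]);
    }
}
```

Each swap removes at least one inversion, so the loop terminates, though the overall cost is still far worse than std::sort.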