Initializing a C++ vector to random values... fast

Hey, I'd like to make this as fast as possible because it gets called A LOT in a program I'm writing. Is there any faster way to initialize a C++ vector to random values than:
double range; // set to the range of a particular function I want to evaluate
std::vector<double> x(30, 0.0);
for (int i = 0; i < x.size(); i++) {
    x.at(i) = (rand() / (double)RAND_MAX) * range;
}
EDIT: Fixed x's initializer.

Right now (before your edit, when x was default-constructed and therefore empty), this was really fast, since the loop never executed.
Personally, I'd probably use something like this:
struct gen_rand {
    double range;
public:
    gen_rand(double r = 1.0) : range(r) {}
    double operator()() {
        return (rand() / (double)RAND_MAX) * range;
    }
};
std::vector<double> x(num_items);
std::generate_n(x.begin(), num_items, gen_rand());
Edit: It's purely a micro-optimization that might make no difference at all, but you might consider rearranging the computation to get something like:
struct gen_rand {
    double factor;
public:
    gen_rand(double r = 1.0) : factor(r / RAND_MAX) {}
    double operator()() {
        return rand() * factor;
    }
};
Of course, there's a really good chance the compiler will already do this (or something equivalent) but it won't hurt to try it anyway (though it's really only likely to help with optimization turned off).
Edit2: "sbi" (as is usually the case) is right: you might gain a bit by initially reserving space, then using an insert iterator to put the data into place:
std::vector<double> x;
x.reserve(num_items);
std::generate_n(std::back_inserter(x), num_items, gen_rand());
As before, we're into such microscopic optimization that I'm not at all sure I'd really expect to see a difference. In particular, since this is all done with templates, there's a pretty good chance most (if not all) of the code will be generated inline. In that case, the optimizer is likely to notice that the initial data all gets overwritten, and skip initializing it.
In the end, however, nearly the only part that's really likely to make a significant difference is getting rid of the .at(i). The others might, but with optimizations turned on, I wouldn't really expect them to.

I have been using Jerry Coffin's functor method for some time, but with the arrival of C++11, we have loads of cool new random number functionality. To fill an array with random float values we can now do something like the following:
#include <random>
#include <algorithm>
#include <functional>
#include <vector>

const size_t elements = 300;
std::vector<float> y(elements);
std::uniform_real_distribution<float> distribution(0.0f, 2.0f); // values between 0 and 2
std::mt19937 engine; // Mersenne Twister MT19937
auto generator = std::bind(distribution, engine); // note: bind copies the engine; wrap it in std::ref(engine) to share its state
std::generate_n(y.begin(), elements, generator);
See the relevant section of Wikipedia for more engines and distributions.

Yes: x.at(i) does bounds checking, while x[i] does not. Also, your original code was incorrect because you failed to specify the size of x in advance. You need to use std::vector<double> x(n), where n is the number of elements that you want to use; otherwise, your loop there will never execute.
Alternatively, you may want to make a custom iterator that generates random values and fill the vector from it; since the std::vector constructor initializes its elements anyway, a custom iterator class that generates random values may let you eliminate a pass over the items.
In terms of implementing an iterator of your own, here is my untested code:
class random_iterator
{
public:
    typedef std::input_iterator_tag iterator_category;
    typedef double value_type;
    typedef int difference_type;
    typedef double* pointer;
    typedef double& reference;

    random_iterator() : _range(1.0), _count(0) {}
    random_iterator(double range, int count) :
        _range(range), _count(count) {}
    random_iterator(const random_iterator& o) :
        _range(o._range), _count(o._count) {}
    ~random_iterator() {}

    double operator*() const { return (rand() / (double)RAND_MAX) * _range; }
    int operator-(const random_iterator& o) const { return o._count - _count; }
    random_iterator& operator++() { --_count; return *this; }
    random_iterator operator++(int) { random_iterator cpy(*this); --_count; return cpy; }
    bool operator==(const random_iterator& o) const { return _count == o._count; }
    bool operator!=(const random_iterator& o) const { return _count != o._count; }

private:
    double _range;
    int _count;
};
With the code above, it should be possible to use:
std::vector<double> x(random_iterator(range,number),random_iterator());
That said, the std::generate_n code in the other answer is simpler, and frankly, I would just explicitly fill the vector without resorting to anything fancy like this... but it is kind of cool to think about.

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <cstdlib>

struct functor {
    functor(double v) : val(v) {}
    double operator()() const {
        return (rand() / (double)RAND_MAX) * val;
    }
private:
    double val;
};

int main(int argc, const char** argv) {
    const int size = 10;
    const double range = 3.0;
    std::vector<double> dvec;
    std::generate_n(std::back_inserter(dvec), size, functor(range));
    // print all
    std::copy(dvec.begin(), dvec.end(), std::ostream_iterator<double>(std::cout, "\n"));
    return 0;
}
Too late :(

You may consider using a pseudo-random number generator that produces its output as a sequence. Since most PRNGs just provide a sequence anyway, that will be a lot more efficient than calling rand() over and over again.
But then, I think I really need to know more about your situation.
Why does this piece of code execute so much? Can you restructure your code to avoid re-generating random data so frequently?
How big are your vectors?
How "good" does your random number generator need to be? High-quality distributions tend to be more expensive to calculate.
If your vectors are large, are you reusing their buffer space, or are you throwing it away and reallocating it elsewhere? Creating new vectors willy-nilly is a great way to destroy your cache.
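To make that last point concrete, here is a minimal sketch of reusing one vector's buffer across iterations instead of constructing a new vector each time (num_iterations and process are hypothetical names, and gen_rand is the functor from Jerry Coffin's answer):
std::vector<double> x;
for (int iter = 0; iter < num_iterations; ++iter) {
    x.clear();  // keeps the allocated capacity
    std::generate_n(std::back_inserter(x), num_items, gen_rand(range));
    process(x); // hypothetical consumer of the random data
}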

Jerry Coffin's answer looks very good. Two other thoughts, though:
Inlining - All of your vector access will be very fast, but if the call to rand() is out-of-line, the function call overhead might dominate. If that's the case, you may need to roll your own pseudorandom number generator.
SIMD - If you're going to roll your own PRNG, you might as well make it compute 2 doubles (or 4 floats) at once. This will reduce the number of int-to-float conversions as well as the multiplications. I've never tried it, but apparently there's a SIMD version of the Mersenne Twister that's quite good. A simple linear congruential generator might be good enough too (and that's probably what rand() is using already); a sketch follows.
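A rough sketch of what rolling your own might look like (an assumption about the shape of such a generator, using Knuth's MMIX constants; not a vetted PRNG):
struct lcg {
    unsigned long long state;
    explicit lcg(unsigned long long seed) : state(seed) {}
    double next(double range) {
        // One multiply and one add per number; trivially inlined.
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        // Build a double in [0, range) from the top 53 bits.
        return (state >> 11) * (range / 9007199254740992.0); // 2^53
    }
};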

#include <vector>
#include <algorithm>
#include <ctime>
#include <cstdlib>

int main() {
    int size = 10;
    srand(time(NULL));
    std::vector<int> vec(size);
    std::generate(vec.begin(), vec.end(), rand);
    std::vector<int> vec_2(size);
    std::generate(vec_2.begin(), vec_2.end(), []() { return rand() % 50; });
}
You need to include vector, algorithm, ctime, and cstdlib (done above).

The way I think about these is a rubber-meets-the-road approach.
In other words, there are certain minimal things that have to happen, no getting around it, such as:
the rand() function has to be called N times.
the result of rand() has to be converted to double and then multiplied by something.
the resulting numbers have to get stored in consecutive elements of an array.
The object is, at a minimum, to get those things done.
Other concerns, like whether or not to use an std::vector and iterators are fine as long as they don't add any extra cycles.
The easiest way to see if they add significant extra cycles is to single-step the code at the assembly language level.
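As a concrete baseline for that exercise, the minimal sequence of operations listed above looks something like this sketch (using n for x.size(), with the divide hoisted out of the loop):
const double factor = range / RAND_MAX; // one divide, outside the loop
double* p = &x[0];
for (int i = 0; i < n; ++i)
    p[i] = rand() * factor; // one call, one multiply, one store per element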


How to correctly implement a function that will generate pseudo-random integers with C++20

I want to note that in C++ the generation of pseudo-random numbers is overcomplicated. If you remember old languages like Pascal, they had the function Random(n), where n is an integer and the generation range is 0 to n-1. Going back to modern C++, I want to get a similar interface, but with a function random_int(a,b) which generates numbers in the range [a,b].
Consider the following example:
#include <random>

namespace utils
{
    namespace implementation_details
    {
        struct eng_wrap {
            std::mt19937 engine;

            eng_wrap()
            {
                std::random_device device;
                engine.seed(device());
            }

            std::mt19937& operator()()
            {
                return engine;
            }
        };

        eng_wrap rnd_eng;
    }

    template <typename int_t, int_t a, int_t b> int_t random_int()
    {
        static_assert(a <= b);
        static std::uniform_int_distribution<int_t> distr(a, b);
        return distr(implementation_details::rnd_eng());
    }
}
You can see that distr is marked with the static keyword. Because of this, repeated calls with the same template arguments will not construct a new std::uniform_int_distribution each time.
In some cases, we do not know the generation boundaries at compile time. Therefore, we have to rewrite this function:
template <typename int_t> int_t random_int2(int_t a, int_t b)
{
    std::uniform_int_distribution<int_t> distr(a, b);
    return distr(implementation_details::rnd_eng());
}
Next, suppose the second version of this function is called many times:
int a, b;
std::cin >> a >> b;
for (int i = 1; i != 1000000; ++i)
    std::cout << utils::random_int2(a, b) << ' ';
Question
What is the cost of creating std::uniform_int_distribution in each iteration of the loop?
Can you suggest a more optimized function that returns a pseudo-random number in the passed range for a normal desktop application?
If you want to use the same a and b repeatedly, use a class with a member function; that's what classes are for. If you don't want to expose your rnd_eng (choosing instead to preclude useful multithreaded clients), write the class to use it:
template<class T>
struct random_int {
    random_int(T a, T b) : d(a, b) {}
    // Not const: operator() on a distribution is a non-const member function.
    T operator()() { return d(implementation_details::rnd_eng()); }
private:
    std::uniform_int_distribution<T> d;
};
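A usage sketch (assuming the class sits alongside implementation_details, as above): the distribution is constructed once and reused for every draw.
random_int<int> d6(1, 6);      // construct the distribution once
for (int i = 1; i != 1000000; ++i)
    std::cout << d6() << ' ';  // reuse it for every draw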
IMO, for most simple programs such as games, graphics, and Monte Carlo simulations, the API you actually want is
static xoshiro256ss g;

// Generate a random number between 0 and n-1.
// For example, randint0(2) flips a coin; randint0(6) rolls a die.
int randint0(int n) {
    return g() % n;
}

// This version is useful for games like NetHack, where you often
// want to express an ad-hoc percentage chance of something happening.
bool pct(int n) {
    return randint0(100) < n;
}
(or substitute std::mt19937 for xoshiro256ss but be aware you're trading away performance in exchange for... something. :))
The % n above is mathematically dubious, when n is astronomically large (e.g. if you're rolling a 12297829382473034410-sided die, you'll find that values between 0 and 6148914691236517205 come up twice as often as they should). So you may prefer to use C++11's uniform_int_distribution:
int randint0(int n) {
    return std::uniform_int_distribution<int>(0, n-1)(g);
}
However, again be aware you're gaining mathematical perfection at the cost of raw speed. uniform_int_distribution is more for when you don't already trust your random number engine to be sane (e.g. if the engine's output range might be 0 to 255 but you want to generate numbers from 1 to 1000), or when you're writing template code to work with any arbitrary integer distribution (e.g. binomial_distribution, geometric_distribution) and need a uniform distribution object of that same general "shape" to plug into your template.
The answer to your question #1 is "The cost is free." You will not gain anything by stashing the result of std::uniform_int_distribution<int>(0, n-1) into a static variable. A distribution object is very small, trivially copyable, and basically free to construct. In fact, the cost of constructing the uniform_int_distribution in this case is orders of magnitude cheaper than the cost of thread-safe static initialization.
(There are special cases such as std::normal_distribution where not-stashing the distribution object between calls can result in your doing twice as much work as needed; but uniform_int_distribution is not one of those cases.)
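To illustrate that special case, a sketch (my addition, not from the original answer; g is the engine from above):
double random_normal() {
    // normal_distribution may generate values in pairs and cache the second,
    // so keeping one object alive between calls avoids redoing that work.
    static std::normal_distribution<double> dist(0.0, 1.0);
    return dist(g);
}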

How to calculate the standard deviation with iterators and lambda functions

After learning that calculating the mean of data stored in a std::vector<std::vector<double>> data can be done the following way:
void calculate_mean(std::vector<std::vector<double>>::iterator dataBegin,
                    std::vector<std::vector<double>>::iterator dataEnd,
                    std::vector<double>& rowmeans) {
    auto Mean = [](std::vector<double> const& vec) {
        return std::accumulate(vec.begin(), vec.end(), 0.0) / vec.size();
    };
    std::transform(dataBegin, dataEnd, rowmeans.begin(), Mean);
}
I made a function which takes iterators to the begin and end of the data vector, and a std::vector<double>& in which I store the result.
My first question is how to handle the return value of a function when working with vectors. In this case I pass a reference and modify the vector I initialized before calling the function, so there is no copying back, which is nice. Is this good programming practice?
Second, my main question is how to adapt this function so one can calculate the standard deviation of each row in a similar way. I tried really hard, but it only gave a huge mess where nothing worked properly. If someone sees right away how to do that, I would be glad for an insight. Thank you.
Edit: Solution
So here is my solution to the problem. Given a std::vector<std::vector<double>> data(rows, std::vector<double>(columns)), where the data is stored in the rows, the following function calculates the sample standard deviation of each row:
auto begin = data.begin();
auto end = data.end();
std::vector<double> stds(data.size()); // don't name this "std": that collides with the namespace
void calculate_std(std::vector<std::vector<double>>::iterator dataBegin,
                   std::vector<std::vector<double>>::iterator dataEnd,
                   std::vector<double>& rowstds) {
    auto test = [](std::vector<double> const& vec) {
        double sum = std::accumulate(vec.begin(), vec.end(), 0.0);
        double mean = sum / vec.size();
        double stdSum = 0.0;
        auto Std = [&](const double x) { stdSum += (x - mean) * (x - mean); };
        std::for_each(vec.begin(), vec.end(), Std);
        return std::sqrt(stdSum / (vec.size() - 1)); // needs <cmath>
    };
    std::transform(dataBegin, dataEnd, rowstds.begin(), test);
}
I tested it and it works just fine. So if anyone has suggestions for improvement, please let me know. And is this piece of code good performance-wise?
You will relatively often find the convention of writing functions with input parameters first, followed by input/output parameters.
Output parameters (which you write your function's results to) are usually a pointer or a reference to the data.
So your solution seems fine from that point of view.
Source:
Google's C++ coding conventions
I mean in this case I make an Alias and modify in this way the vector I initialized before calling this function, so there is no copying back which is nice. So is this good programming practice?
No, you should use a local vector<double> variable and return by value. Any compiler worth using would optimize away the copying/moving, and any conforming C++11 compiler is required to perform a move if for whatever reason it cannot elide the copy/move altogether.
Your code as written imposes additional requirements on the caller that are not obvious. For instance, rowmeans must contain enough elements to store the means, or undefined behavior results.
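For comparison, a return-by-value sketch of the mean computation (the helper name row_means is mine; needs <numeric> and <vector>):
std::vector<double> row_means(const std::vector<std::vector<double>>& data) {
    std::vector<double> means;
    means.reserve(data.size()); // the caller can't get the size wrong
    for (const auto& row : data)
        means.push_back(std::accumulate(row.begin(), row.end(), 0.0) / row.size());
    return means; // moved or elided into the caller's vector, not deep-copied
}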

How do I create a vector of object values from a vector of object pointers?

The "naive" solution is;
std::vector<T> vector_of_objects;
vector_of_objects.reserve(vector_of_pointers.size());
for (T const* p : vector_of_pointers)
    vector_of_objects.push_back(*p);
The above seems cumbersome and perhaps not immediately obvious.
Is there a solution that is at least not significantly less efficient and perhaps a little quicker and more intuitive? I'm thinking C++11 might have a solution that I am not aware of...
Does writing everything in one line mean shorter code? No.
In my opinion these two lines are shorter and more readable:
for (auto p : vector_of_pointers)
    vector_of_objects.emplace_back(*p);
Function std::for_each is not shorter than a range-based loop, and is sometimes bigger due to the lambda expressions being passed.
Function std::transform is even longer than std::for_each; however, the word transform is an advantage for figuring out what is happening in the code that follows.
You are doing it the correct way. Another way is to use the standard algorithms library, like this:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>

int main() {
    // Create a vector of pointers
    std::vector<int*> vptr;
    vptr.push_back(new int(1));
    vptr.push_back(new int(2));

    // Copy to vector of objects
    std::vector<int> vobj;
    std::for_each(vptr.begin(), vptr.end(), [&](int* n) { vobj.emplace_back(*n); });

    // Free the pointers
    std::for_each(vptr.begin(), vptr.end(), [&](int* n) { delete n; });

    // Print out the vector of objects
    std::copy(vobj.begin(), vobj.end(), std::ostream_iterator<int>(std::cout, " "));
    return 0;
}
The idiomatic way would be to use std::transform:
std::transform(vector_of_pointers.begin(),
               vector_of_pointers.end(),
               std::back_inserter(vector_of_objects),
               [](T* p) { return *p; });
Whether this is "better" than what you've written is another question: it has the advantage of being idiomatic, and of actually naming what is going on (which makes the code slightly clearer). On the other hand, the "transformation" is very, very simple, so it would be easily recognized in the loop, and the new form for writing such loops makes things fairly clear as well.
No, you have to call the copy-ctor as in your solution, there's no way around that.
std::vector<T*> vecPointers;
std::vector<T> vecValues;
for (size_t x = 0; x < vecPointers.size(); x++)
{
    vecValues.push_back(*vecPointers[x]);
}
I believe that if type T is a custom object then it will need to be copyable. The implicitly generated copy constructor is fine for simple memberwise copies; otherwise you will need to write one for class T:
class T
{
private:
    int someValue;

public:
    T()
    {
    }

    T(const T& o) // copy constructor
    {
        someValue = o.someValue;
    }

    virtual ~T()
    {
    }
};
It seems to me that the real question is whether you're doing this often enough for it to be worth writing extra code in one place to clean up the code in the other places.
I can imagine writing a deref_iterator that would allow you to do something like this:
std::vector<T> vector_of_objects{
    deref_iterator(std::begin(vector_of_pointers)),
    deref_iterator(std::end(vector_of_pointers))};
Now, we're left with the question of whether this is really shorter than the original loop or not. In terms of simple number of key strokes, it's probably going to depend on the names you give things. If you didn't care about readable names, it could be:
vector<T> v_o{d(begin(v_p)), d(end(v_p))};
The short names obviously make it short, but I certainly wouldn't advise them -- if I hadn't just typed this in, I'd have no clue in the world what it meant. A longer name (that needs to be repeated a couple of times) obviously adds more key-strokes, but I can't imagine anybody thinking the readability wasn't worth it.
In any case, the deref_iterator itself would clearly take up some code. An iterator has enough boiler-plate that it typically takes around 100 lines of code or so. Let's (somewhat arbitrarily) decide that this saves one line of code every time you use it. On that basis, you'd have to use it 100 times to break even.
I'm not sure that's accurate in characterizing the code overall -- the code for an iterator is mostly boiler-plate, and other than a typo, there's not much that could really go wrong with it. For the most part, it would be a matter of including the right header, and using it, not of virtually ever having to look at the code for the iterator itself.
That being the case, I might accept it as an improvement even if the total number of lines of code increased. Writing it to use only once would clearly be a loss, but I don't think it'd need to be a full 100 times to qualify as breaking even either.
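For what it's worth, a minimal sketch of such a deref_iterator (my addition; forward-iterator flavor only, and a production version would forward more of the underlying iterator's capabilities):
#include <iterator>
#include <type_traits>
#include <cstddef>

template <class It>
struct deref_iter {
    typedef typename std::iterator_traits<It>::value_type pointer_type; // T*
    typedef typename std::remove_pointer<pointer_type>::type value_type;
    typedef std::forward_iterator_tag iterator_category;
    typedef std::ptrdiff_t difference_type;
    typedef value_type* pointer;
    typedef value_type& reference;

    It it;
    explicit deref_iter(It i) : it(i) {}
    reference operator*() const { return **it; } // the extra dereference
    deref_iter& operator++() { ++it; return *this; }
    deref_iter operator++(int) { deref_iter t(*this); ++it; return t; }
    bool operator==(const deref_iter& o) const { return it == o.it; }
    bool operator!=(const deref_iter& o) const { return it != o.it; }
};

// Factory function, matching the usage shown above.
template <class It>
deref_iter<It> deref_iterator(It i) { return deref_iter<It>(i); }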

Millions of random numbers generated "overflow" rand_r?

I am having trouble with rand_r. I have a simulation that generates millions of random numbers. I have noticed that at a certain point in time, these numbers are no longer uniform. What could be the problem?
What I do: I create an instance of a generator and give it its own seed.
mainRGen= new nativeRandRUni(idumSeed_g);
Here is the class/object definition:
class nativeRandRUni {
public:
    unsigned seed;

    nativeRandRUni(unsigned sd) { seed = sd; }

    float genP() { return rand_r(&seed) / float(RAND_MAX); } // [0,1]
    int genI(int R) { return rand_r(&seed) % R; }            // [0,R-1]
};
numbers are simply generated by:
newIntNumber= mainRGen->genI(desired_max);
newFloatNumber= mainRGen->genP();
The simulations have the problem described above. I know this is happening because I have checked the distribution of the generated numbers after the point in time where a signature shows up in the results (see the top image at http://ubuntuone.com/0tbfidZaXfGNTfiVr3x7DR).
Also, if I print the seed at t-1 and t, t being the time point of the signature, I can see the seed jumping from 263069042 to 1069048066.
If I run the code with a different seed, the problem is always present, but at different time points.
Also, if I use rand() instead of my object, all goes well... I DO need the object because I sometimes use threads. The example above does not have threads.
I am really lost here, any clues?
EDIT
It can be reproduced by looping enough times; the problem is that, like I said, it takes millions of iterations for the problem to arise. For seed -158342163 I get it at generation t=134065568. One can check the numbers generated before (uniform) and after (not uniform). I get the same problem if I change the seed manually at given t's; see (*) in the code. Something I also did not expect to matter?
#include <cstdlib> // rand_r (POSIX)
#include <fstream>
#include <sstream>
#include <iostream>

using std::ofstream;
using std::cout;
using std::endl;

class nativeRandRUni {
public:
    unsigned seed;
    long count;

    nativeRandRUni(unsigned sd) { seed = sd; count = 0; }

    float genP() { count++; return rand_r(&seed) / float(RAND_MAX); } // [0,1]
    int genI(int R) { count++; return rand_r(&seed) % R; }            // [0,R-1]
};

int main(int argc, char* argv[]) {
    long timePointOfProblem = 134065568;
    nativeRandRUni* mainRGen = new nativeRandRUni(-158342163);
    int rr;
    //ofstream* fout_metaAux = new ofstream();
    //fout_metaAux->open("random.numbers");
    for (long i = 0; i < timePointOfProblem; i++) {
        rr = mainRGen->genI(1009200);
        //(*fout_metaAux) << rr << endl;
        //if (i % 1000 == 0) mainRGen->seed = 111111; //(*) FORCE
    }
    //fout_metaAux->close();
}
Given that random numbers are key to your simulation, you should implement your own generator. I don't know what algorithm rand_r is using, but it could be something pretty crappy like a linear congruential generator.
I'd look into implementing something fast and with good qualities where you know the underlying algorithm. I'd start by looking at implementing the Mersenne Twister:
http://en.wikipedia.org/wiki/Mersenne_twister
It's simple to implement and very fast; it requires no divides.
I ended up trying a simple solution from Boost, changing the generator to:
// the types below come from Boost.Random: #include <boost/random.hpp>
class nativeRandRUni {
public:
    typedef mt19937 EngineType;
    typedef uniform_real<> DistributionType;
    typedef variate_generator<EngineType, DistributionType> VariateGeneratorType;

    nativeRandRUni(long s, float min, float max) : gen(EngineType(s), DistributionType(min, max)) {}

    VariateGeneratorType gen;
};
I don't get the problem anymore... though that solved it, I don't feel very comfortable with not understanding what it was. I think Rafael is right: I should not trust rand_r for this intensive a number of generations.
Now, this is slower than before, so I may look for ways of optimizing it.
QUESTION: Would a Mersenne Twister implementation in principle be faster?
And thanks to all!
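For comparison, a sketch of the same generator using C++11's <random> directly, without the Boost variate_generator layer (an alternative I have not benchmarked against the Boost version):
#include <random>

class nativeRandUni11 { // hypothetical name
public:
    nativeRandUni11(unsigned long s, float min, float max) : engine(s), dist(min, max) {}
    float operator()() { return dist(engine); }
private:
    std::mt19937 engine;
    std::uniform_real_distribution<float> dist;
};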

Sort objects of dynamic size

Problem
Suppose I have a large array of bytes (think up to 4GB) containing some data. These bytes correspond to distinct objects in such a way that every s bytes (think s up to 32) will constitute a single object. One important fact is that this size s is the same for all objects, not stored within the objects themselves, and not known at compile time.
At the moment, these objects are logical entities only, not objects in the programming language. I have a comparison on these objects which consists of a lexicographical comparison of most of the object data, with a bit of different functionality to break ties using the remaining data. Now I want to sort these objects efficiently (this is really going to be a bottleneck of the application).
Ideas so far
I've thought of several possible ways to achieve this, but each of them appears to have some rather unfortunate consequences. You don't necessarily have to read all of these. I tried to print the central question of each approach in bold. If you are going to suggest one of these approaches, then your answer should respond to the related questions as well.
1. C quicksort
Of course the C quicksort algorithm is available in C++ applications as well. Its signature matches my requirements almost perfectly. But the fact that using that function will prohibit inlining of the comparison function will mean that every comparison carries a function invocation overhead. I had hoped for a way to avoid that. Any experience about how C qsort_r compares to STL in terms of performance would be very welcome.
2. Indirection using Objects pointing at data
It would be easy to write a bunch of objects holding pointers to their respective data. Then one could sort those. There are two aspects to consider here. On the one hand, just moving around pointers instead of all the data would mean less memory operations. On the other hand, not moving the objects would probably break memory locality and thus cache performance. Chances that the deeper levels of quicksort recursion could actually access all their data from a few cache pages would vanish almost completely. Instead, each cached memory page would yield only very few usable data items before being replaced. If anyone could provide some experience about the tradeoff between copying and memory locality I'd be very glad.
3. Custom iterator, reference and value objects
I wrote a class which serves as an iterator over the memory range. Dereferencing this iterator yields not a reference but a newly constructed object holding the pointer to the data and the size s which is given at construction of the iterator. So these objects can be compared, and I even have an implementation of std::swap for them. Unfortunately, it appears that std::swap isn't enough for std::sort. In some parts of the process, my gcc implementation uses insertion sort (as implemented in __insertion_sort in file stl_algo.h) which moves a value out of the sequence, moves a number of items by one step, and then moves the first value back into the sequence at the appropriate position:
typename iterator_traits<_RandomAccessIterator>::value_type
    __val = _GLIBCXX_MOVE(*__i);
_GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);
*__first = _GLIBCXX_MOVE(__val);
Do you know of a standard sorting implementation which doesn't require a value type but can operate with swaps alone?
So I'd not only need my class which serves as a reference, but I would also need a class to hold a temporary value. And as the size of my objects is dynamic, I'd have to allocate that on the heap, which means memory allocations at the very leaves of the recursion tree. Perhaps one alternative would be a value type with a static size that should be large enough to hold objects of the sizes I currently intend to support. But that would mean there would be even more hackery in the relation between the reference_type and the value_type of the iterator class. And it would mean I would have to update that size for my application to one day support larger objects. Ugly.
If you can think of a clean way to get the above code to manipulate my data without having to allocate memory dynamically, that would be a great solution. I'm using C++11 features already, so using move semantics or similar won't be a problem.
4. Custom sorting
I even considered reimplementing all of quicksort. Perhaps I could make use of the fact that my comparison is mostly a lexicographical compare, i.e. I could sort sequences by first byte and only switch to the next byte when the first byte is the same for all elements. I haven't worked out the details on this yet, but if anyone can suggest a reference, an implementation or even a canonical name to be used as a keyword for such a byte-wise lexicographical sorting, I'd be very happy. I'm still not convinced that with reasonable effort on my part I could beat the performance of the STL template implementation. A rough sketch of the idea follows.
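(Sketch added for concreteness: this byte-wise recursion is usually known as MSD radix sort or multi-key quicksort. Untested, operating on an array of pointers to the objects, and ignoring the tie-breaking part of the comparison.)
#include <algorithm>

void msd_sort(unsigned char** first, unsigned char** last, unsigned byte, unsigned s) {
    if (last - first < 2 || byte >= s) return;
    // Order by the current byte only.
    std::sort(first, last, [byte](const unsigned char* a, const unsigned char* b) {
        return a[byte] < b[byte];
    });
    // Recurse on each run of elements whose current byte is equal.
    while (first != last) {
        unsigned char** run = first;
        while (run != last && (*run)[byte] == (*first)[byte]) ++run;
        msd_sort(first, run, byte + 1, s);
        first = run;
    }
}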
5. Completely different algorithm
I know there are many, many kinds of sorting algorithms out there. Some of them might be better suited to my problem. Radix sort comes to mind first, but I haven't really thought it through yet. If you can suggest a sorting algorithm more suited to my problem, please do so. Preferably with an implementation, but even without.
Question
So basically my question is this:
“How would you efficiently sort objects of dynamic size in heap memory?”
Any answer to this question which is applicable to my situation is good, no matter whether it is related to my own ideas or not. Answers to the individual questions marked in bold, or any other insight which might help me decide between my alternatives, would be useful as well, particularly if no definite answer to a single approach turns up.
The most practical solution is to use the C style qsort that you mentioned.
template <unsigned S>
struct my_obj {
    enum { SIZE = S };
    const void* p_;

    my_obj(const void* p) : p_(p) {}

    //...accessors to get data from pointer

    static int c_style_compare(const void* a, const void* b) {
        my_obj aa(a);
        my_obj bb(b);
        return (aa < bb) ? -1 : (bb < aa);
    }
};

template <unsigned N, typename OBJ>
void my_sort(char (&large_array)[N], const OBJ&) {
    // Note: the array can't be const; qsort permutes it in place.
    qsort(large_array, N / OBJ::SIZE, OBJ::SIZE, OBJ::c_style_compare);
}
(Or, you can call qsort_r if you prefer.) Since STL sort inlines the comparison calls, you may not get the fastest possible sorting. If all your system does is sorting, it may be worth it to add the code to get custom iterators to work. But if most of the time your system is doing something other than sorting, the extra gain you get may just be noise to your overall system.
Since there are only 32 different object sizes (1 to 32 bytes), you could easily create an object type for each and select a call to std::sort based on a switch statement, as sketched below. Each call will get inlined and highly optimized.
Some object sizes might require a custom iterator, as the compiler will insist on padding native objects to align to address boundaries. Pointers can be used as iterators in the other cases since a pointer has all the properties of an iterator.
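A sketch of that dispatch (names are hypothetical; memcmp stands in for the question's mostly-lexicographical comparison, and the cast assumes the buffer is suitably aligned):
#include <algorithm>
#include <cstring>
#include <cstddef>

template <unsigned S>
struct blob { unsigned char b[S]; };

template <unsigned S>
void sort_as(unsigned char* data, std::size_t bytes) {
    blob<S>* p = reinterpret_cast<blob<S>*>(data);
    std::sort(p, p + bytes / S,
              [](const blob<S>& l, const blob<S>& r) { return std::memcmp(l.b, r.b, S) < 0; });
}

void sort_dynamic(unsigned char* data, std::size_t bytes, unsigned s) {
    switch (s) { // one fully inlined std::sort instantiation per size
    case 4:  sort_as<4>(data, bytes);  break;
    case 8:  sort_as<8>(data, bytes);  break;
    case 16: sort_as<16>(data, bytes); break;
    case 32: sort_as<32>(data, bytes); break;
    // ...one case per supported size
    }
}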
I'd agree with std::sort using a custom iterator, reference and value type; it's best to use the standard machinery where possible.
You worry about memory allocations, but modern memory allocators are very efficient at handing out small chunks of memory, particularly when being repeatedly reused. You could also consider using your own (stateful) allocator, handing out length s chunks from a small pool.
If you can overlay an object onto your buffer, then you can use std::sort, as long as your overlay type is copyable. (In this example, 4 64-bit integers.) With 4GB of data, you're going to need a lot of memory, though.
As discussed in the comments, you can have a selection of possible sizes based on some number of fixed-size templates. You would have to pick from these types at runtime (using a switch statement, for example). Here's an example of the template type with various sizes and an example of sorting the 64-bit size.
Here's a simple example:
#include <vector>
#include <algorithm>
#include <iostream>
#include <ctime>
#include <cstdint>
#include <cstdlib>

template <int WIDTH>
struct variable_width
{
    unsigned char w_[WIDTH];
};

typedef variable_width<8> vw8;
typedef variable_width<16> vw16;
typedef variable_width<32> vw32;
typedef variable_width<64> vw64;
typedef variable_width<128> vw128;
typedef variable_width<256> vw256;
typedef variable_width<512> vw512;
typedef variable_width<1024> vw1024;

bool operator<(const vw64& l, const vw64& r)
{
    const std::int64_t* l64 = reinterpret_cast<const std::int64_t*>(l.w_);
    const std::int64_t* r64 = reinterpret_cast<const std::int64_t*>(r.w_);
    return *l64 < *r64;
}

std::ostream& operator<<(std::ostream& out, const vw64& w)
{
    const std::int64_t* w64 = reinterpret_cast<const std::int64_t*>(w.w_);
    out << *w64;
    return out;
}

int main()
{
    srand(time(NULL));
    std::vector<unsigned char> buffer(10 * sizeof(vw64));
    vw64* w64_arr = reinterpret_cast<vw64*>(&buffer[0]);

    for (int x = 0; x < 10; ++x)
    {
        *(std::int64_t*)w64_arr[x].w_ = rand();
    }

    std::sort(w64_arr, w64_arr + 10);

    for (int x = 0; x < 10; ++x)
    {
        std::cout << w64_arr[x] << '\n';
    }
    std::cout << std::endl;
    return 0;
}
Given the enormous size (4GB), I would seriously consider dynamic code generation. Compile a custom sort into a shared library, and dynamically load it. The only non-inlined call should be the call into the library.
With precompiled headers, the compilation times may actually be not that bad. The whole <algorithm> header doesn't change, nor does your wrapper logic. You just need to recompile a single predicate each time. And since it's a single function you get, linking is trivial.
#include <vector>
#include <algorithm>
#include <cstdlib>

#define OBJECT_SIZE 32

struct structObject
{
    unsigned char* pObject;

    bool operator<(const structObject& n) const
    {
        for (int i = 0; i < OBJECT_SIZE; i++)
        {
            if (*(pObject + i) != *(n.pObject + i))
                return (*(pObject + i) < *(n.pObject + i));
        }
        return false;
    }
};

int main()
{
    std::vector<structObject> vObjects;
    unsigned char* pObjects = (unsigned char*)malloc(10 * OBJECT_SIZE); // 10 objects

    for (int i = 0; i < 10; i++)
    {
        structObject stObject;
        stObject.pObject = pObjects + (i * OBJECT_SIZE);
        *stObject.pObject = 'A' + 9 - i; // add a value to the start to check the sort
        vObjects.push_back(stObject);
    }

    std::sort(vObjects.begin(), vObjects.end());

    free(pObjects);
    return 0;
}
To skip the #define:
struct structObject
{
    unsigned char* pObject;
};

struct structObjectComparerAscending
{
    int iSize;

    structObjectComparerAscending(int _iSize)
    {
        iSize = _iSize;
    }

    bool operator()(const structObject& stLeft, const structObject& stRight) const
    {
        for (int i = 0; i < iSize; i++)
        {
            if (*(stLeft.pObject + i) != *(stRight.pObject + i))
                return (*(stLeft.pObject + i) < *(stRight.pObject + i));
        }
        return false;
    }
};

int main()
{
    int iObjectSize = 32; // read it from somewhere
    std::vector<structObject> vObjects;
    unsigned char* pObjects = (unsigned char*)malloc(10 * iObjectSize);

    for (int i = 0; i < 10; i++)
    {
        structObject stObject;
        stObject.pObject = pObjects + (i * iObjectSize);
        *stObject.pObject = 'A' + 9 - i; // add a value to the start to work with something...
        vObjects.push_back(stObject);
    }

    std::sort(vObjects.begin(), vObjects.end(), structObjectComparerAscending(iObjectSize));

    free(pObjects);
    return 0;
}