Implement qsort() in terms of std::sort() - c++

For stupid reasons, I'd like to write a function with the following signature (in which the (^) represents Apple's "blocks" extension to C++):
extern "C" void my_qsort_b(void *arr, size_t nelem, size_t eltsize, int (^)(const void *, const void *));
where the function is implemented in terms of std::sort. (Note that I can't use qsort because it takes a function pointer, not a block pointer; and I can't use qsort_b because I might not have Apple's standard library. I won't accept answers that involve qsort_b.)
Is it possible to implement this function in C++ using std::sort? Or do I have to write my own quicksort implementation from scratch?
Please provide working code. The devil is in the details here; I'm not asking "How do I use std::sort?"

Doing this is harder than it seems it should be — although std::sort is clearly more powerful than qsort, the impedance mismatch between the two is sufficient to make implementing the latter in terms of the former a daunting task.
Still, it can be done. Here is a working implementation of my_qsort_b (here called block_qsort) that uses std::sort as the workhorse. The code is adapted from an implementation of qsort in terms of std::sort done as an exercise, and trivially modified to compare by invoking a block. The code is tested to compile and work with clang++ 3.3 on x86_64 Linux.
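#include <algorithm>
#include <cstddef>   // size_t, ptrdiff_t
#include <cstring>   // memcpy
#include <iterator>  // std::iterator, std::random_access_iterator_tag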
struct Elem {
char* location;
size_t size;
bool needs_deleting;
Elem(char* location_, size_t size_):
location(location_), size(size_), needs_deleting(false) {}
Elem(const Elem& rhs): size(rhs.size) {
location = new char[size];
*this = rhs;
needs_deleting = true;
}
Elem& operator=(const Elem& rhs) {
memcpy(location, rhs.location, size);
return *this;
}
~Elem() {
if (needs_deleting)
delete[] location;
}
};
struct Iter: public std::iterator<std::random_access_iterator_tag, Elem> {
Elem elem;
Iter(char* location, size_t size): elem(location, size) {}
// Must define custom copy/assignment to avoid copying of iterators
// making copies of elem.
Iter(const Iter& rhs): elem(rhs.elem.location, rhs.elem.size) {}
Iter& operator=(const Iter& rhs) {elem.location = rhs.elem.location; return *this;}
char* adjust(ptrdiff_t offset) const {
return elem.location + ptrdiff_t(elem.size) * offset;
}
// Operations required for random iterator.
Iter operator+(ptrdiff_t diff) const {return Iter(adjust(diff), elem.size);}
Iter operator-(ptrdiff_t diff) const {return Iter(adjust(-diff), elem.size);}
ptrdiff_t operator-(const Iter& rhs) const {
return (elem.location - rhs.elem.location) / ptrdiff_t(elem.size);
}
Iter& operator++() {elem.location=adjust(1); return *this;}
Iter& operator--() {elem.location=adjust(-1); return *this;}
Iter operator++(int) {Iter old = *this; ++*this; return old;}
Iter operator--(int) {Iter old = *this; --*this; return old;}
bool operator!=(const Iter& rhs) const {return elem.location != rhs.elem.location;}
bool operator==(const Iter& rhs) const {return elem.location == rhs.elem.location;}
bool operator<(const Iter& rhs) const {return elem.location < rhs.elem.location;}
Elem& operator*() {return elem;}
};
struct Cmp_adaptor {
typedef int (^Qsort_comparator)(const void*, const void*);
Qsort_comparator cmp;
Cmp_adaptor(Qsort_comparator cmp_) : cmp(cmp_) {}
bool operator()(const Elem& a, const Elem& b) {
return cmp(a.location, b.location) < 0;
}
};
void block_qsort(void* base, size_t nmemb, size_t size,
int (^compar)(const void *, const void *))
{
Iter begin = Iter(static_cast<char*>(base), size);
std::sort(begin, begin + nmemb, Cmp_adaptor(compar));
}
If block_qsort needs to be called from C, you can declare it extern "C", since it uses no C++ features in its interface. To test the function, compile and run this additional code:
// test block_qsort
#include <iostream>
#include <cstring>
int main(int argc, char** argv)
{
// sort argv[1..argc].
block_qsort(argv + 1, argc - 1, sizeof (char*),
^int (const void* a, const void* b) {
return strcmp(*(char**) a, *(char**) b);
});
for (++argv; *argv; argv++)
std::cout << *argv << std::endl;
return 0;
}

To use std::sort, you would have to write an iterator class and a functor class that wraps the block. Implementing quicksort yourself seems like the shorter alternative.
BTW: the block should be returning bool, not void, right?
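If the custom iterator is the main obstacle, one possible workaround is to let std::sort order an array of indices and then move the records in a single pass. The following is only a sketch under stated assumptions: the name sort_records and the generic Compare parameter are hypothetical, and the callable is assumed to have the qsort-style three-way signature (a lambda is used for testing here; a block should be usable the same way where the compiler supports blocks).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <numeric>
#include <vector>

// Hypothetical helper: sorts nelem records of eltsize bytes each.
// cmp(const void*, const void*) must return <0, 0 or >0, like a qsort comparator.
template <class Compare>
void sort_records(void* arr, std::size_t nelem, std::size_t eltsize, Compare cmp) {
    if (nelem < 2 || eltsize == 0) return;  // nothing to reorder
    assert(arr != nullptr);
    char* base = static_cast<char*>(arr);

    // Step 1: sort indices instead of records. The comparator only ever sees
    // pointers into the original buffer, which is not modified during the sort.
    std::vector<std::size_t> order(nelem);
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) {
                  return cmp(base + a * eltsize, base + b * eltsize) < 0;
              });

    // Step 2: gather the records in sorted order into a scratch buffer,
    // then copy the whole buffer back over the input.
    std::vector<char> scratch(nelem * eltsize);
    for (std::size_t i = 0; i < nelem; ++i)
        std::memcpy(&scratch[i * eltsize], base + order[i] * eltsize, eltsize);
    std::memcpy(base, scratch.data(), scratch.size());
}
```

The trade-off is memory: unlike qsort, this variant needs O(n) extra space for the index array and the scratch copy, so it may not suit very large buffers.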

Start with this:
struct memblockref {
void* location;
size_t size;
memblockref( void* loc, size_t s ):location(loc), size(s) {}
memblockref& operator=( memblockref const& right ) {
assert( size == right.size ); // assert from <cassert>
memcpy( location, right.location, std::min( size, right.size ));
return *this;
}
private:
memblockref( memblockref const& ) = delete; // or leave unimplemented in C++03
memblockref() = delete; // or leave unimplemented in C++03
};
then use http://www.boost.org/doc/libs/1_52_0/libs/iterator/doc/iterator_facade.html to create iterators of memblockref to your memory buffer.
Then turn the block into a function pointer, or wrap it in a lambda or functor, and call std::sort, where you call your block based comparison on the location field of the left and right memblockref.
You may have to specialize swap or iter_swap as well, but maybe not.

Related

Cannot use range-for loop for a custom type inside a template function - C++ [duplicate]

(also see Is there a good way not to hand-write all twelve required Container functions for a custom type in C++? )
For a class such as
namespace JDanielSmith {
class C
{
const size_t _size;
const std::unique_ptr<int[]> _data;
public:
C(size_t size) : _size(size), _data(new int[size]) {}
inline const int* get() const noexcept { return _data.get(); }
inline int* get() noexcept { return _data.get(); }
size_t size() const noexcept { return _size; }
};
}
what is the preferred way to expose iteration? Should I write begin()/end() (and cbegin()/cend()) member functions?
const int* cbegin() const {
return get();
}
const int* cend() const {
return cbegin() + size();
}
or should these be non-member functions?
const int* cbegin(const C& c) {
return c.get();
}
const int* cend(const C& c) {
return cbegin(c) + c.size();
}
Should begin()/end() have both const and non-const overloads?
const int* begin() const {
return get();
}
int* begin() {
return get();
}
Are there any other things to consider? Are there tools/techniques to make this "easy to get right" and reduce the amount of boiler-plate code?
Some related questions/discussion include:
Should custom containers have free begin/end functions?
Why use non-member begin and end functions in C++11?
When to use std::begin and std::end instead of container specific versions
There is a standard which describes what your class interfaces should look like if you want them to be congruent with the STL. C++ has this notion of 'concepts', which pin down the requirements a class must meet to be a sufficient implementation of a given concept. This almost became a language feature in C++11 (and eventually did in C++20).
A concept you may be interested in is the Container concept. As you can see, in order to meet the requirements of the Container concept, you need begin, cbegin, end, and cend as member functions (among other things).
Since it looks like you're storing your data in an array, you might also be interested in SequenceContainer.
I'll take option C.
The main problem here is that std::begin() doesn't actually work for finding non-member begin() with ADL. So the real solution is to write your own that does:
namespace details {
using std::begin;
template <class C>
constexpr auto adl_begin(C& c) noexcept(noexcept(begin(c)))
-> decltype(begin(c))
{
return begin(c);
}
}
using details::adl_begin;
Now it doesn't matter whether you write your begin() as a member or a non-member function: just use adl_begin(x) everywhere and it will work, for the standard containers and raw arrays as well. This conveniently side-steps the member vs. non-member discussion.
And yes, you should have const and non-const overloads of begin() and friends, if you want to expose const and non-const access.
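To illustrate how the adl_begin approach behaves across the three cases, here is a small sketch. The companion adl_end, the demo::Triple type, and the helpers sum_all and first_of are hypothetical names added for this example; the noexcept and constexpr decorations are dropped for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

namespace details {
    using std::begin;
    using std::end;
    // Unqualified calls: std::begin/std::end are candidates via the
    // using-declarations, and ADL adds any begin/end in the argument's namespace.
    template <class C>
    auto adl_begin(C& c) -> decltype(begin(c)) { return begin(c); }
    template <class C>
    auto adl_end(C& c) -> decltype(end(c)) { return end(c); }
}
using details::adl_begin;
using details::adl_end;

namespace demo {
    // Hypothetical type that only offers NON-member begin/end.
    struct Triple { int data[3]; };
    inline int* begin(Triple& t) { return t.data; }
    inline int* end(Triple& t) { return t.data + 3; }
}

// Works for member begin/end, non-member begin/end, and raw arrays alike.
template <class C>
int sum_all(C& c) {
    int total = 0;
    for (auto it = adl_begin(c), last = adl_end(c); it != last; ++it)
        total += *it;
    return total;
}

template <class C>
auto first_of(C& c) -> decltype(*adl_begin(c)) {
    assert(adl_begin(c) != adl_end(c));  // precondition: non-empty range
    return *adl_begin(c);
}
```

Calling sum_all on a demo::Triple, a std::vector<int>, or an int array goes through the same code path, which is the point of the wrapper.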
I suggest creating both sets of functions -- member functions as well as non-member functions -- to allow for maximum flexibility.
namespace JDanielSmith {
class C
{
const size_t _size;
const std::unique_ptr<int[]> _data;
public:
C(size_t size) : _size(size), _data(new int[size]) {}
inline const int* get() const { return _data.get(); }
inline int* get() { return _data.get(); }
size_t size() const { return _size; }
int* begin() { return get(); }
int* end() { return get() + _size; }
const int* begin() const { return get(); }
const int* end() const { return get() + _size; }
const int* cbegin() const { return get(); }
const int* cend() const { return get() + _size; }
};
int* begin(C& c) { return c.begin(); }
int* end(C& c) { return c.end(); }
const int* begin(C const& c) { return c.begin(); }
const int* end(C const& c) { return c.end(); }
const int* cbegin(C const& c) { return c.begin(); }
const int* cend(C const& c) { return c.end(); }
}
The member functions are necessary if you want to be able to use objects of type C as arguments to std::begin, std::end, std::cbegin, and std::cend.
The non-member functions are necessary if you want to be able to use objects of type C as arguments to just begin, end, cbegin, and cend. ADL will make sure that the non-member functions will be found for such usages.
int main()
{
JDanielSmith::C c1(10);
{
// Non-const member functions are found
auto b = std::begin(c1);
auto e = std::end(c1);
for (int i = 0; b != e; ++b, ++i )
{
*b = i*10;
}
}
JDanielSmith::C const& c2 = c1;
{
// Const member functions are found
auto b = std::begin(c2);
auto e = std::end(c2);
for ( ; b != e; ++b )
{
std::cout << *b << std::endl;
}
}
{
// Non-member functions with const-objects as argument are found
auto b = begin(c2);
auto e = end(c2);
for ( ; b != e; ++b )
{
std::cout << *b << std::endl;
}
}
}
In order to create a valid iterator, you must ensure that std::iterator_traits is valid. This means you must set the iterator category among other things.
An iterator should implement iterator(), iterator(iterator&&), iterator(iterator const&), operator==, operator !=, operator++, operator++(int), operator*, operator=, and operator->. It's also a good idea to add operator< and operator+ if you can (you can't always, e.g. linked lists.)
template <typename T>
class foo
{
public:
using value_type = T;
class iterator
{
public:
using value_type = foo::value_type;
using iterator_category = std::random_access_iterator_tag;
// or whatever type of iterator you have...
using pointer = value_type*;
using reference = value_type&;
using difference_type = std::ptrdiff_t;
// ...
};
class const_iterator
{
// ...
};
iterator begin() { /*...*/ }
iterator end() { /*...*/ }
const_iterator cbegin() const { /*...*/ }
const_iterator cend() const { /*...*/ }
/* ... */
};
See: http://en.cppreference.com/w/cpp/iterator/iterator_traits for more information on what you need to make a valid iterator. (Note: You also need certain properties to be a valid "container", like .size())
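Ideally you should use member functions for begin and end, but it's not required: you can instead provide non-member begin and end functions in your class's own namespace, where argument-dependent lookup (and range-based for) will find them. Do not add overloads to namespace std; the standard does not permit that. If you don't know how to do this, I suggest you use member functions.
You should also create begin() const and end() const, but they should behave like cbegin() and cend() and return a const_iterator, NEVER the mutable iterator that the non-const begin() returns!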
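As a rough illustration of the points above, here is a minimal read-only forward iterator with the five member types that std::iterator_traits picks up. The small_buffer class and the total helper are hypothetical and exist only for this sketch; a real container would also offer a mutable iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>

// Hypothetical fixed-size container, for illustration only.
class small_buffer {
    int data_[4] = {1, 2, 3, 4};
public:
    class const_iterator {
        const int* p_ = nullptr;
    public:
        // The five member types std::iterator_traits looks for.
        using iterator_category = std::forward_iterator_tag;
        using value_type        = int;
        using difference_type   = std::ptrdiff_t;
        using pointer           = const int*;
        using reference         = const int&;

        const_iterator() = default;
        explicit const_iterator(const int* p) : p_(p) {}
        reference operator*() const { assert(p_ != nullptr); return *p_; }
        pointer operator->() const { return p_; }
        const_iterator& operator++() { ++p_; return *this; }
        const_iterator operator++(int) { const_iterator old = *this; ++p_; return old; }
        bool operator==(const const_iterator& o) const { return p_ == o.p_; }
        bool operator!=(const const_iterator& o) const { return p_ != o.p_; }
    };

    // begin() const and cbegin() hand out the same read-only iterator type.
    const_iterator begin() const { return const_iterator(data_); }
    const_iterator end() const { return const_iterator(data_ + 4); }
    const_iterator cbegin() const { return begin(); }
    const_iterator cend() const { return end(); }
};

static_assert(std::is_same<
                  std::iterator_traits<small_buffer::const_iterator>::iterator_category,
                  std::forward_iterator_tag>::value,
              "iterator_traits must see the category");

// Standard algorithms rely on the traits above.
inline int total(const small_buffer& b) {
    return std::accumulate(b.begin(), b.end(), 0);
}
```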

Iterator gives different results based on usage c++

I recently started c++ programming. I shifted from Java.
I was building my own Iterable class template like this:
template<class T> class Iterable
{
T start,stop;
public:
explicit Iterable(T s,T e) {start=s; stop=e;;}
public:
virtual void next(T& i) =0;
public:
class iterator: public std::iterator<
std::input_iterator_tag, // iterator_category
T, // value_type
long, // difference_type
const T*, // pointer
T // reference
>{
T current;
Iterable<T>* obj;
public:
explicit iterator(T t,Iterable<T>* o) : obj(o) {current=t;}
iterator& operator++() {obj->next(current); return *this;}
iterator operator++(int) {iterator retval = *this; ++(*this); return retval;}
bool operator==(iterator other) const {return current == other.current;}
bool operator!=(iterator other) const {return !(*this == other);}
const T& operator*() const {return current;}
};
iterator begin() {return iterator(start,this);}
iterator end() {return iterator(stop,this);}
};
When I tried to use this iterator, I got different results when invoked differently:
for(auto S=SI.begin();S!=SI.end();S++)
{
cout << *S << "\n";
//cout << contains(seqs,S) << "\n";
if(!contains(seqs,*S))
seqs.push_back(*(new Sequence(*S)));
}
gave different results from:
for(Sequence S : SI)
{
cout << S << "\n";
//cout << contains(seqs,S) << "\n";
if(!contains(seqs,S))
seqs.push_back(*(new Sequence(S)));
}
even in the loop.
My SeqIter class (SI is object of this class) is as follows:
class SeqIter : public flex::Iterable<Sequence>
{
int n;
public:
SeqIter(int s) : Iterable(Sequence(copyList(0,s),s),Sequence(copyList(3,s),s)) {n=s;}
void next(Sequence& s)
{
char ch;
for(int i=0;i<n;i++)
{
ch=nextBase(s[i]);
s[i]=ch;
if(ch!=0)
break;
}
}
};
Sorry if this is too much code, but I do not know how much code is required.
Also, a brief explanation on the Sequence class:
It is a class that has an array of numbers (in this case I tried with 3), and it generates next sequences based on the first, i.e. 000, 100, 200, 300; 010,110 ...
Each digit ranges from 0-3 (both included)
I am unable to understand why both loops give different sequences (first gives 000 100 200 300 010 110 whereas second gives 000 100 200 300 000 110)
I thought both the loops were fundamentally same, and that the first was just the expansion of the second. Is that not so?
Also sequence class: (Sorry for delay, but I guess this is the problem)
class Sequence
{
int size=1;
char* bps;
public:
Sequence() {size=0;}
Sequence(int s)
{
size=s;
bps=new char[s];
}
Sequence(char* arr,int s)
{
size=s;
bps=arr;
}
Sequence(const Sequence& seq)
{
size=seq.size;
bps=new char[size];
strcpy(bps,seq.bps);
}
String toString() const {return *(new String(bps,size));}
inline char* toCharArray() {return bps;}
inline int getSize() const {return size;}
//operator overloading
public:
bool operator==(const Sequence& s2) const
{
if(s2.size!=size)
return false;
String r1=toString();
String r2=s2.toString();
return (r1==r2 || r1==r2.reverse());
}
inline bool operator!=(const Sequence& s2) const {return !operator==(s2);}
const char& operator[](int n) const
{
if(n>=size)
throw commons::IndexOutOfBoundsException(n,size);
return bps[n];
}
char& operator[](int n)
{
if(n>=size)
throw commons::IndexOutOfBoundsException(n,size);
return bps[n];
}
Sequence& operator=(const Sequence& seq)
{
size=seq.size;
bps=new char[size];
strcpy(bps,seq.bps);
}
};
Sorry everyone. Answering my own question after debugging:
In my Sequence class, I was using strcpy in copying char*, where the array did not end with a '\0'
Probably that caused the error:
I read online a bit more to find that the expansion was as follows:
for(Sequence S : SI)
{
...
}
is equivalent to
for(auto i=SI.begin();i!=SI.end();i++)
{
Sequence S=*i;
...
}
So in the copy (Sequence S=*i, which invokes the copy constructor) the data was not properly copied.
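A way to convince yourself that the two loop forms make the same copies is to count copy-constructor calls. This is only a sketch: the Tracked type and the three helper functions are hypothetical, and the manual loop is a simplified version of the expansion the language defines for range-based for.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how often it is copy-constructed.
struct Tracked {
    int value;
    static int copies;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++copies; }
    Tracked& operator=(const Tracked&) = default;
};
int Tracked::copies = 0;

// Range-based for with a by-value loop variable: one copy per element.
inline int copies_in_range_for(const std::vector<Tracked>& items) {
    Tracked::copies = 0;
    for (Tracked t : items) { (void)t; }
    return Tracked::copies;
}

// Hand-written loop mirroring the expansion: "Tracked t = *it" is
// copy-initialization, so it calls the copy constructor, not operator=.
inline int copies_in_manual_loop(const std::vector<Tracked>& items) {
    Tracked::copies = 0;
    auto&& range = items;
    for (auto it = range.begin(), last = range.end(); it != last; ++it) {
        Tracked t = *it;
        (void)t;
    }
    return Tracked::copies;
}

inline void check_same_copies(const std::vector<Tracked>& items) {
    assert(copies_in_range_for(items) == copies_in_manual_loop(items));
}
```

Declaring the loop variable as const Tracked& t instead would avoid the copy in both forms.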
Sorry for all the trouble
fixed by removing assignment operator overload, and changing copy-constructor to:
Sequence(const Sequence& seq)
{
size=seq.size;
bps=new char[size];
for(int i=0;i<size;i++)
bps[i]=seq[i];
}
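An alternative that avoids hand-written copy logic altogether is to let std::vector<char> own the bytes, so the compiler-generated copy constructor and copy assignment copy every element, embedded zeros included. The class below is a hypothetical cut-down stand-in for Sequence, not the original class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Sequence: the buffer is NOT null-terminated and
// may contain 0 anywhere, so strcpy-style copying would be wrong for it.
class SequenceSketch {
    std::vector<char> bps_;
public:
    explicit SequenceSketch(std::size_t n) : bps_(n, 0) {}

    std::size_t size() const { return bps_.size(); }

    char& operator[](std::size_t i) {
        assert(i < bps_.size());
        return bps_[i];
    }
    const char& operator[](std::size_t i) const {
        assert(i < bps_.size());
        return bps_[i];
    }
    // No copy constructor, assignment operator or destructor is declared:
    // the implicit ones copy and release the vector correctly.
};
```

With this layout a copy such as SequenceSketch b = a; is a deep copy by construction, and there is no new[] to leak.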

Standard containers encapsulation and range-based for loops

I'm designing a class which has two standard vectors as members. I would like to be able to use range-based for loops on the vector elements and I came up with this solution
#include <iostream>
#include <vector>
using namespace std;
class MyClass {
public:
void addValue1(int val){data1_.push_back(val);}
void addValue2(int val){data2_.push_back(val);}
vector<int> const & data1() const {return data1_;}
vector<int> const & data2() const {return data2_;}
// ...
private:
vector<int> data1_;
vector<int> data2_;
// ...
};
void print1(MyClass const & mc) {
for (auto val : mc.data1()){
cout << val << endl;
}
}
void print2(MyClass const & mc) {
for (auto val : mc.data2()){
cout << val << endl;
}
}
int main(){
MyClass mc;
mc.addValue1(1);
mc.addValue1(2);
mc.addValue1(3);
print1(mc);
}
Clearly, the alternative of defining begin() and end() functions doesn't make sense since I have two distinct vectors.
I would like to ask the following questions:
A shortcoming of the proposed solution is that the contents of the two vectors cannot be changed (due to the const qualifier). In the case I need to modify the vector elements how can I modify the code?
EDIT: the modification should preserve encapsulation
Considering data encapsulation, do you think it is bad practice to return a (const) reference to the two vectors?
Use something like gsl::span<int> and gsl::span<const int>.
Here is a minimal one:
template<class T>
struct span {
T* b = 0; T* e = 0;
T* begin() const { return b; }
T* end() const { return e; }
span( T* s, T* f ):b(s),e(f) {}
span( T* s, std::size_t len ):span(s, s+len) {}
template<std::size_t N>
span( T(&arr)[N] ):span(arr, N) {}
// todo: ctor from containers with .data() and .size()
// useful helpers:
std::size_t size() const { return end()-begin(); }
bool empty() const { return size()==0; }
T& operator[](std::size_t i) const { return begin()[i]; }
T& front() const { return *begin(); }
T& back() const { return *(std::prev(end())); }
// I like explicit defaults of these:
span() = default;
span(span const&) = default;
span& operator=(span const&) = default;
~span() = default;
};
now you can write:
span<int const> data1() const {return {data1_.data(), data1_.size()};}
span<int const> data2() const {return {data2_.data(), data2_.size()};}
span<int> data1() {return {data1_.data(), data1_.size()};}
span<int> data2() {return {data2_.data(), data2_.size()};}
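Put together, the idea might look like the sketch below. It uses a deliberately tiny hypothetical elem_view type instead of gsl::span, and the names MyClassSketch, double_all and sum1 are made up for the example. Callers can change element values through the non-const accessor, but they cannot resize or reassign the vector itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal view: a pair of pointers usable in range-based for.
template <class T>
struct elem_view {
    T* b;
    T* e;
    T* begin() const { return b; }
    T* end() const { return e; }
    std::size_t size() const { return static_cast<std::size_t>(e - b); }
    T& operator[](std::size_t i) const { assert(i < size()); return b[i]; }
};

class MyClassSketch {
    std::vector<int> data1_;
public:
    void addValue1(int val) { data1_.push_back(val); }
    // Read-only view for const objects, element-mutable view otherwise.
    elem_view<const int> data1() const {
        return {data1_.data(), data1_.data() + data1_.size()};
    }
    elem_view<int> data1() {
        return {data1_.data(), data1_.data() + data1_.size()};
    }
};

// Elements can be modified in place through the view...
inline void double_all(MyClassSketch& mc) {
    for (int& v : mc.data1()) v *= 2;
}

// ...while a const object only allows reading.
inline int sum1(const MyClassSketch& mc) {
    int total = 0;
    for (int v : mc.data1()) total += v;
    return total;
}
```

A view is invalidated by anything that reallocates the vector (for example addValue1), so it should be used immediately rather than stored.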
A shortcoming of the proposed solution is that the contents of the two vectors cannot be changed (due to the const qualifier). In the case I need to modify the vector elements how can I modify the code?
First of all, you should add non-const versions of data1() and data2() that return a reference to the data1_ and data2_ members
vector<int> const & data1() const {return data1_;}
vector<int> const & data2() const {return data2_;}
vector<int> & data1() {return data1_;}
vector<int> & data2() {return data2_;}
Second: if you want to modify the elements in print1() (for example), you have to receive mc as a non-const reference
// ..........vvvvvvvvv no more const
void print1 (MyClass & mc) {
so you can change mc.
Third: in the range-based loop you have to declare val as a reference, so that modifying it also modifies the referenced value inside the vector
// ........V by reference
for ( auto & val : mc.data1() ) {
++val ; // this modifies the value in the vector inside mc
cout << val << endl;
}
Considering data encapsulation, do you think it is bad practice to return a (const) reference to the two vectors?
IMHO: if the reference is const, not at all: it's good practice because it permits safe use of the member without the need to duplicate it.
If the reference isn't const, I don't see much difference from declaring the member public.

Is it possible to define the length of the type to sort in STXXL at run time?

I have an application that requires a built-in sort and I'm hoping to replace the existing sort mechanism with the sort provided by STXXL. I have successfully tested it using STXXL, but my problem is that, although a specific run of the sort needs to operate on fixed length strings, the length is determined at run-time and can be anywhere between 10 bytes and 4000 bytes. Always allowing for 4000 bytes will obviously be grossly inefficient if the actual length is small.
For those not familiar with STXXL, I believe the problem roughly equates to defining a std::vector without knowing the size of the objects at compilation time. However, I'm not a C++ expert - the application is written in C.
In my test this is the type that I am sorting:
struct string80
{
char x[80];
};
and this is the type definition for the STXXL sorter:
typedef stxxl::sorter<string80, sort_comparator80> stxxl_sorter80;
The problem is that I don't want to hard-code the array size to '80'.
The only solution I can come up with, is to define a number of structures of varying lengths and pick the closest at run-time. Am I missing a trick? Am I thinking in C rather than C++?
What if we store the objects (records) of size n in a flat stxxl::vector of chars, and then define a custom iterator based on stxxl::vector::iterator that simply skips n bytes on each increment? This works with std::sort, and even tbb::sort, when std::vector is used instead of STXXL's vector. I see that STXXL's ExtIterator has a lot of additional traits. Is it possible to define them correctly for such an iterator?
#include <vector>
#include <cassert>
#include <cstdlib>
#include <stxxl.h>
#include <iostream>
#include <algorithm>
typedef std::vector<char>::iterator It;
class ObjectValue;
//This class defines a reference object that handles assignment operations
//during a sorting
class ObjectReference
{
public:
ObjectReference() : recordSize_(0) {}
ObjectReference(It ptr, size_t recordSize) : ptr_(ptr), recordSize_(recordSize) {}
void operator = (ObjectReference source) const
{
std::copy(source.ptr_, source.ptr_ + recordSize_, ptr_);
}
void operator = (const ObjectValue & source) const;
It GetIterator() const
{
return ptr_;
}
size_t GetRecordSize() const
{
return recordSize_;
}
private:
It ptr_;
size_t recordSize_;
};
//This class defines a value object that is used when a temporary value of a
//record is required somewhere
class ObjectValue
{
public:
ObjectValue() {}
ObjectValue(ObjectReference prx) : object_(prx.GetIterator(), prx.GetIterator() + prx.GetRecordSize()) {}
ObjectValue(It ptr, size_t recordSize) : object_(ptr, ptr + recordSize) {}
std::vector<char>::const_iterator GetIterator() const
{
return object_.begin();
}
private:
std::vector<char> object_;
};
//We need to support copying from a reference to an object
void ObjectReference::operator = (const ObjectValue & source) const
{
std::copy(source.GetIterator(), source.GetIterator() + recordSize_, ptr_);
}
//The comparator passed to a sorting algorithm. It receives iterators, converts
//them to char pointers, and passes those to the actual comparator that handles
//object comparison
template<class Cmp>
class Comparator
{
public:
Comparator() {}
Comparator(Cmp cmp) : cmp_(cmp) {}
bool operator () (const ObjectReference & a, const ObjectReference & b) const
{
return cmp_(&*a.GetIterator(), &*b.GetIterator());
}
bool operator () (const ObjectValue & a, const ObjectReference & b) const
{
return cmp_(&*a.GetIterator(), &*b.GetIterator());
}
bool operator () (const ObjectReference & a, const ObjectValue & b) const
{
return cmp_(&*a.GetIterator(), &*b.GetIterator());
}
bool operator () (const ObjectValue & a, const ObjectValue & b) const
{
return cmp_(&*a.GetIterator(), &*b.GetIterator());
}
private:
Cmp cmp_;
};
//The iterator that operates on flat byte area. If the record size is $n$, it
//just skips $n$ bytes on each increment operation to jump to the next record
class RecordIterator : public std::iterator<std::random_access_iterator_tag, ObjectValue, size_t, RecordIterator, ObjectReference>
{
public:
RecordIterator() : recordSize_(0) {}
RecordIterator(It ptr, size_t recordSize) : ptr_(ptr), recordSize_(recordSize) {}
ObjectReference operator * () const
{
return ObjectReference(ptr_, recordSize_);
}
ObjectReference operator [] (size_t diff) const
{
return *(*this + diff);
}
It GetIterator() const
{
return ptr_;
}
size_t GetRecordSize() const
{
return recordSize_;
}
RecordIterator& operator ++()
{
ptr_ += recordSize_;
return *this;
}
RecordIterator& operator --()
{
ptr_ -= recordSize_;
return *this;
}
RecordIterator operator ++(int)
{
RecordIterator ret = *this;
ptr_ += recordSize_;
return ret;
}
RecordIterator operator --(int)
{
RecordIterator ret = *this;
ptr_ -= recordSize_;
return ret;
}
friend bool operator < (RecordIterator it1, RecordIterator it2);
friend bool operator > (RecordIterator it1, RecordIterator it2);
friend bool operator == (RecordIterator it1, RecordIterator it2);
friend bool operator != (RecordIterator it1, RecordIterator it2);
friend size_t operator - (RecordIterator it1, RecordIterator it2);
friend RecordIterator operator - (RecordIterator it1, size_t shift);
friend RecordIterator operator + (RecordIterator it1, size_t shift);
private:
It ptr_;
size_t recordSize_;
};
bool operator < (RecordIterator it1, RecordIterator it2)
{
return it1.ptr_ < it2.ptr_;
}
bool operator > (RecordIterator it1, RecordIterator it2)
{
return it1.ptr_ > it2.ptr_;
}
bool operator == (RecordIterator it1, RecordIterator it2)
{
return it1.ptr_ == it2.ptr_;
}
bool operator != (RecordIterator it1, RecordIterator it2)
{
return !(it1 == it2);
}
RecordIterator operator - (RecordIterator it1, size_t shift)
{
return RecordIterator(it1.ptr_ - shift * it1.recordSize_, it1.recordSize_);
}
RecordIterator operator + (RecordIterator it1, size_t shift)
{
return RecordIterator(it1.ptr_ + shift * it1.recordSize_, it1.recordSize_);
}
size_t operator - (RecordIterator it1, RecordIterator it2)
{
return (it1.ptr_ - it2.ptr_) / it1.recordSize_;
}
namespace std
{
//We need to specialize the swap for the sorting to work correctly
template<>
void swap(ObjectReference & it1, ObjectReference & it2)
{
ObjectValue buf(it1.GetIterator(), it1.GetRecordSize());
std::copy(it2.GetIterator(), it2.GetIterator() + it2.GetRecordSize(), it1.GetIterator());
std::copy(buf.GetIterator(), buf.GetIterator() + it1.GetRecordSize(), it2.GetIterator());
}
}
//Finally, here is the "user"-defined code. In the example, "records" are
//4-byte integers, although actual size of a record can be changed at runtime
class RecordComparer
{
public:
bool operator ()(const char * aRawPtr, const char * bRawPtr) const
{
const int * aPtr = reinterpret_cast<const int*>(aRawPtr);
const int * bPtr = reinterpret_cast<const int*>(bRawPtr);
return *aPtr < *bPtr;
}
};
int main(int, char*[])
{
size_t size = 100500;
//Although it is a constant here, it is easy to change at runtime
size_t recordSize = sizeof(int);
std::vector<int> intVector(size);
std::generate(intVector.begin(), intVector.end(), rand);
const char * source = reinterpret_cast<const char*>(&intVector[0]);
std::vector<char> recordVector(size * recordSize); //must be sized before copying into it
std::copy(source, source + recordVector.size(), &recordVector[0]);
RecordIterator begin(recordVector.begin(), recordSize);
RecordIterator end(recordVector.end(), recordSize);
//Sort "records" as blocks of bytes
std::sort(begin, end, Comparator<RecordComparer>());
//Sort "records" as usual
std::sort(intVector.begin(), intVector.end());
//Checking that arrays are the same:
for (; begin != end; ++begin)
{
size_t i = begin - RecordIterator(recordVector.begin(), recordSize);
It it = (*(begin)).GetIterator();
int* value = reinterpret_cast<int*>(&(*it));
assert(*value == intVector[i]);
}
return 0;
}
There is no good solution here, at least not with STXXL.
The STXXL sorter is highly optimized, and the code requires the data type's size to be provided at compile time via template parameters. I don't see that this will, or even should change.
The method of instantiating classes for many different parameters is not nice, but pretty common practise. Just think of all the different std::vector instances used in simple C++ programs, which could all be handled via void* functions in C.
Depending on how much code you are willing to generate, try instantiating for powers of two, and then add finer-grained sizes around your most common parameters.
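The powers-of-two idea could be wired up roughly as follows. This sketch deliberately uses std::sort on an in-memory buffer rather than any STXXL API, and every name in it (fixed_record, record_less, sort_as, bucket_for, sort_padded) is hypothetical. It assumes the records are stored padded to the chosen bucket size, and that byte-wise memcmp order is the order you want.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Compile-time sized record, analogous to string80 in the question.
template <std::size_t N>
struct fixed_record { char x[N]; };

template <std::size_t N>
struct record_less {
    std::size_t key_len;  // meaningful bytes per record, known only at run time
    bool operator()(const fixed_record<N>& a, const fixed_record<N>& b) const {
        return std::memcmp(a.x, b.x, key_len) < 0;
    }
};

// Sorts count records stored back to back with a stride of N bytes.
template <std::size_t N>
void sort_as(void* buf, std::size_t count, std::size_t key_len) {
    static_assert(sizeof(fixed_record<N>) == N, "records must be densely packed");
    assert(key_len <= N);
    fixed_record<N>* first = static_cast<fixed_record<N>*>(buf);
    std::sort(first, first + count, record_less<N>{key_len});
}

// Smallest power-of-two bucket (starting at 16) that can hold len bytes.
inline std::size_t bucket_for(std::size_t len) {
    std::size_t n = 16;
    while (n < len) n *= 2;
    return n;
}

// Run-time dispatch to one of the pre-instantiated sizes.
// Returns false when len exceeds the largest bucket.
inline bool sort_padded(void* buf, std::size_t count, std::size_t len) {
    switch (bucket_for(len)) {
        case 16:   sort_as<16>(buf, count, len);   return true;
        case 32:   sort_as<32>(buf, count, len);   return true;
        case 64:   sort_as<64>(buf, count, len);   return true;
        case 128:  sort_as<128>(buf, count, len);  return true;
        case 256:  sort_as<256>(buf, count, len);  return true;
        case 512:  sort_as<512>(buf, count, len);  return true;
        case 1024: sort_as<1024>(buf, count, len); return true;
        case 2048: sort_as<2048>(buf, count, len); return true;
        case 4096: sort_as<4096>(buf, count, len); return true;
        default:   return false;
    }
}
```

With power-of-two buckets the padding overhead stays below a factor of two, at the cost of nine instantiations for the 10 to 4000 byte range mentioned in the question.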

How to define iterator for a special case in order to use it in a for loop using the auto keyword

I would like to define the subsequent code in order to be able to use it like
"for (auto x:c0){ printf("%i ",x); }"
But there is something I do not understand, and I have been searching for some time.
The error I get is:
error: invalid type argument of unary ‘*’ (have ‘CC::iterator {aka int}’)
#include <stdio.h>
class CC{
int a[0x20];
public: typedef int iterator;
public: iterator begin(){return (iterator)0;}
public: iterator end(){return (iterator)0x20;}
public: int& operator*(iterator i){return this->a[(int)i];}
} ;
int main(int argc, char **argv)
{ class CC c0;
for (auto x:c0){
printf("%i ",x);
}
printf("\n");
return 0;
}
It seems you are trying to use int as your iterator type, with the member operator*() as the dereference operation. That won't work:
The operator*() you defined is a binary operator (multiplication) rather than a unary dereference operation.
You can't overload operators for built-in types and an iterator type needs to have a dereference operator.
To be able to use the range-based for you'll need to create a forward iterator type which needs a couple of operations:
Life-time management, i.e., copy constructor, copy assignment, and destruction (typically the generated ones are sufficient).
Positioning, i.e., operator++() and operator++(int).
Value access, i.e., operator*() and potentially operator->().
Validity check, i.e., operator==() and operator!=().
Something like this should be sufficient:
class custom_iterator {
int* array;
int index;
public:
typedef int value_type;
typedef std::size_t size_type;
custom_iterator(int* array, int index): array(array), index(index) {}
int& operator*() { return this->array[this->index]; }
int const& operator*() const { return this->array[this->index]; }
custom_iterator& operator++() {
++this->index;
return *this;
}
custom_iterator operator++(int) {
custom_iterator rc(*this);
this->operator++();
return rc;
}
bool operator== (custom_iterator const& other) const {
return this->index == other.index;
}
bool operator!= (custom_iterator const& other) const {
return !(*this == other);
}
};
Your begin() and end() methods would then return a suitably constructed version of this iterator. You may want to hook the iterator up with suitable std::iterator_traits<...> but I don't think these are required for use with range-based for.
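For completeness, here is a compact sketch of how the container side might look once such an iterator exists. The names index_iterator, CCSketch and fill_and_sum are hypothetical, and the iterator is reduced to the operations that range-based for actually uses (operator*, prefix operator++ and operator!=).

```cpp
#include <cassert>
#include <cstddef>

// Index-based iterator: stores the array and a position, like custom_iterator.
class index_iterator {
    int* array_;
    std::size_t index_;
public:
    index_iterator(int* array, std::size_t index) : array_(array), index_(index) {}
    int& operator*() const {
        assert(array_ != nullptr);
        return array_[index_];
    }
    index_iterator& operator++() { ++index_; return *this; }
    bool operator==(const index_iterator& o) const {
        return array_ == o.array_ && index_ == o.index_;
    }
    bool operator!=(const index_iterator& o) const { return !(*this == o); }
};

// Hypothetical counterpart of the CC class from the question.
class CCSketch {
    int a_[0x20];
public:
    CCSketch() : a_() {}  // value-initialize all 32 elements to zero
    index_iterator begin() { return index_iterator(a_, 0); }
    index_iterator end() { return index_iterator(a_, 0x20); }
};

// Writes 1..32 into the container, then reads the values back and sums them.
inline int fill_and_sum(CCSketch& c) {
    int i = 0;
    for (int& x : c) x = ++i;
    int total = 0;
    for (int x : c) total += x;
    return total;
}
```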
Dietmar Kühl explained well why your code does not work: you cannot make int behave as an iterator.
For the given case, a suitable iterator can be defined as a pointer to int. The following code is tested at ideone:
#include <stdio.h>
class CC{
int a[0x20];
public: typedef int* iterator;
public: iterator begin() {return a;}
public: iterator end() {return a+0x20;}
} ;
int main(int argc, char **argv)
{
class CC c0;
int i = 0;
for (auto& x:c0){
x = ++i;
}
for (auto x:c0){
printf("%i ",x);
}
printf("\n");
return 0;
}