memcmp sort - c++

I have a single buffer, and several pointers into it. I want to sort the pointers based upon the bytes in the buffer they point at.
qsort() and std::sort() can be given custom comparison functions. For example, if the buffer was zero-terminated I could use strcmp:
int my_strcmp(const void* a,const void* b) {
const char* const one = *(const char* const*)a;
const char* const two = *(const char* const*)b;
return ::strcmp(one,two);
}
However, if the buffer is not zero-terminated, I have to use memcmp(), which requires a length parameter.
Is there a tidy, efficient way to get the length of the buffer into my comparison function without a global variable?

With std::sort, you can use a functor like this:
struct CompString {
CompString(int len) : m_Len(len) {}
// std::sort calls the comparator through operator() and expects "a sorts before b"
bool operator()(const char *a, const char *b) const {
return std::memcmp(a, b, m_Len) < 0;
}
private:
int m_Len;
};
Then you can do this:
std::sort(begin(), end(), CompString(4)); // all strings are 4 chars long
EDIT: from the comment suggestions (I guess both strings are in a common buffer?):
struct CompString {
CompString (const unsigned char* e) : end(e) {}
bool operator()(const unsigned char *a, const unsigned char *b) const {
return std::memcmp(a, b, std::min(end - a, end - b)) < 0;
}
private:
const unsigned char* const end;
};
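Note that the comparator above treats two suffixes as equivalent when one is a prefix of the other. A minimal sketch of one way to break that tie (shorter suffix first), assuming every pointer lies inside one buffer that ends at `end`. The names `SuffixLess` and `sortedOffsets` are made up for illustration and are not from the original answer:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Orders pointers into one shared buffer by the bytes they point at.
// Hypothetical helper; assumes a and b both lie within [buffer start, end].
struct SuffixLess {
    explicit SuffixLess(const unsigned char* e) : end(e) {}

    bool operator()(const unsigned char* a, const unsigned char* b) const {
        assert(a <= end && b <= end);
        const std::size_t lenA = static_cast<std::size_t>(end - a);
        const std::size_t lenB = static_cast<std::size_t>(end - b);
        const int r = std::memcmp(a, b, std::min(lenA, lenB));
        if (r != 0) return r < 0;
        return lenA < lenB;  // common prefix is equal: the shorter suffix sorts first
    }

    const unsigned char* end;
};

// Sorts the given offsets into buf by the bytes that start at each offset.
std::vector<std::size_t> sortedOffsets(const std::vector<unsigned char>& buf,
                                       const std::vector<std::size_t>& offsets) {
    const unsigned char* base = buf.data();
    std::vector<const unsigned char*> ptrs;
    for (std::size_t off : offsets) ptrs.push_back(base + off);
    std::sort(ptrs.begin(), ptrs.end(), SuffixLess(base + buf.size()));

    std::vector<std::size_t> out;
    for (const unsigned char* p : ptrs) out.push_back(static_cast<std::size_t>(p - base));
    return out;
}
```

With the tie-break in place the ordering is a strict weak ordering over distinct pointers, which is what std::sort requires.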

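With the C function qsort(), no, there is no way to pass the length to your comparison function without using a global variable, which means it can't be done in a thread-safe manner. Some systems have a qsort_r() function (r stands for reentrant) which allows you to pass an extra context parameter, which then gets passed on to your comparison function. Its signature is not portable (glibc's version takes the comparison function before the context argument and passes the context to the comparison function last); the BSD form looks like this: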
int my_comparison_func(void *context, const void *a, const void *b)
{
return memcmp(*(const void **)a, *(const void **)b, (size_t)context);
}
qsort_r(data, n, sizeof(void*), (void*)number_of_bytes_to_compare, &my_comparison_func);

Is there a reason you can't null-terminate your buffers?
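If you can't, since you're using C++ you can write your own function object:
struct MyStrCmp {
MyStrCmp (int n): length(n) { }
// operator() makes this usable as a std::sort comparator;
// use memcmp instead of strncmp if the data may contain zero bytes
inline bool operator() (const char *lhs, const char *rhs) const {
return ::strncmp (lhs, rhs, length) < 0;
}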
int length;
};
// ...
std::sort (myList.begin (), myList.end (), MyStrCmp (STR_LENGTH));

Can you pack your buffer pointer + length into a structure and pass a pointer of that structure as void *?
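A rough sketch of that idea, assuming the array being sorted holds small structs instead of bare pointers, so qsort hands the comparator a `void *` to a struct that already carries its own length. `Slice`, `sliceCmp` and `sortSlices` are illustrative names only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// One element of the array being sorted: where the bytes start and how many there are.
struct Slice {
    const char* ptr;
    std::size_t len;
};

// qsort-style comparator: a and b point at Slice elements.
int sliceCmp(const void* a, const void* b) {
    const Slice* sa = static_cast<const Slice*>(a);
    const Slice* sb = static_cast<const Slice*>(b);
    const int r = std::memcmp(sa->ptr, sb->ptr, std::min(sa->len, sb->len));
    if (r != 0) return r;
    // Equal common prefix: order by length without risking overflow.
    return (sa->len > sb->len) - (sa->len < sb->len);
}

void sortSlices(Slice* slices, std::size_t count) {
    assert(slices != nullptr || count == 0);
    std::qsort(slices, count, sizeof(Slice), sliceCmp);
}
```

No global state is involved, at the cost of storing a length next to every pointer.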

You could use a hack like:
int buffcmp(const void *b1, const void *b2)
{
static int bsize=-1;
if(b2==NULL) {bsize=*(int*)(b1); return 0;}
return memcmp(b1, b2, bsize);
}
which you would first call as buffcmp(&bsize, NULL) and then pass it as the comparison function to qsort.
You could of course make the comparison behave more naturally in the case of buffcmp(NULL, NULL) etc by adding more if statements.

You could use functors (give the length to the functor's constructor) or Boost.Lambda (use the length in-place).
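With a C++11 compiler, a lambda that captures the length plays the same role as either option. A minimal sketch, assuming every pointer refers to at least `len` readable bytes; `sortByBytes` is a made-up name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Sorts pointers to fixed-length records; the length travels inside the lambda.
void sortByBytes(std::vector<const char*>& ptrs, std::size_t len) {
    std::sort(ptrs.begin(), ptrs.end(),
              [len](const char* a, const char* b) {
                  assert(a != nullptr && b != nullptr);
                  return std::memcmp(a, b, len) < 0;
              });
}
```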

I'm not clear on what you're asking. But I'll try, assuming that
You have a single buffer
You have an array of pointers of some kind which has been processed in some way so that some or all of its contents point into the buffer
That is code equivalent to:
char *buf = (char*)malloc(sizeof(char)*bufsize);
for (int i=0; i<bufsize; ++i){
buf[i] = some_cleverly_chosen_value(i);
}
char *ary[arraysize] = {0};
for(int i=0; i<arraysize; ++i){
ary[i] = buf + some_clever_function(i);
}
/* ...do the sort here */
Now if you control the allocation of the buffer, you could substitute
char *buf = (char*)malloc(sizeof(char)*(bufsize+1));
buf[bufsize]='\0';
and go ahead using strcmp. This may be possible even if you don't control the filling of the buffer.
If you have to live with a buffer handed you by someone else you can
Use some global storage (which you asked to avoid and good thinking).
Hand the sort function something more complicated than a raw pointer (the address of a struct or class that supports the extra data). For this you need to control the definition of ary in the above code.
Use a sort function which supports an extra input. Either qsort_r as suggested by Adam, or a home-rolled solution (which I do recommend as an exercise for the student, and don't recommend in real life). In either case the extra data is probably a pointer to the end of the buffer.

memcmp should stop on the first byte that is unequal, so the length should be large, i.e. to-the-end-of-the-buffer. Then the only way it can return zero is if it does go to the end of the buffer.
(BTW, I lean toward merge sort myself. It's stable and well-behaved.)

Related

Custom implementation of a bool vector with bit representation - how to implement operator[]

Disclaimer - this is a school assignment, however the problem is still interesting I hope!
I have implemented a custom class called Vector<bool>, which stores the bool entries as bits in an array of numbers.
Everything has gone fine except for implementing this:
bool& operator[](std::size_t index) {
validate_bounds(index);
???
}
The const implementation is quite straightforward, just reading out the value. Here, however, I can't really understand what to do, and the course is a specialization course on C++, so I'm guessing I should do some typedef-ing or something. The data is represented by an array of type unsigned int and should be dynamic (e.g. push_back(bool value) should be implemented).
I solved this implementing a proxy class:
class BoolVectorProxy {
public:
explicit BoolVectorProxy(unsigned int& reference, unsigned char index) {
this->reference = &reference;
this->index = index;
}
void operator=(const bool v) {
if (v) *reference |= 1u << index; // 1u: shifting a signed 1 into the top bit is not safe
else *reference &= ~(1u << index);
}
operator bool() const {
return (*reference >> index) & 1;
}
private:
unsigned int* reference;
unsigned char index;
};
And inside the main class:
BoolVectorProxy operator[](std::size_t index) {
validate_bounds(index);
return BoolVectorProxy(array[index / BLOCK_CAPACITY], index % BLOCK_CAPACITY);
}
I also use Catch as a testing library, the code passes this test:
TEST_CASE("access and assignment with brackets", "[Vector]") {
Vector<bool> a(10);
a[0] = true;
a[0] = false;
REQUIRE(!a[0]);
a[1] = true;
REQUIRE(a[1]);
const Vector<bool> &b = a;
REQUIRE(!b[0]);
REQUIRE(b[1]);
a[0] = true;
REQUIRE(a[0]);
REQUIRE(b[0]);
REQUIRE(b.size() == 10);
REQUIRE_THROWS(a[-1]);
REQUIRE_THROWS(a[10]);
REQUIRE_THROWS(b[-1]);
REQUIRE_THROWS(b[10]);
}
If anyone finds any issues or improvements that can be made, please comment, thanks!
Implementing operator[] is basically the same as implementing the const operator[], as you might expect; the difference is that one is writable (it yields an lvalue) and the other is read-only (it yields an rvalue).
I think you've understood the problem: you can extract a bool from an unsigned int using bitwise operations, and you can also say "if the nth bool is modified in X, do a bitwise operation on X and it's done". But this operator means: I want an lvalue for the bool, so I can modify it whenever I want and have the change show up in the associated integer. That requires a reference to a bool, or in your case a reference to a single bit, so you can modify that bit on the fly. Unfortunately you can't reference a single bit; the smallest addressable unit is a whole byte (a char), so a real reference would drag at least 7 other booleans along with it. That's not what you want.
That being said, I understand that this may be required by your assignment, but packing bools into multiple unsigned ints looks like a needless C-style optimization to me. You would be better off with a single C-style array of bools and doing the memory handling manually, because that is almost what you are doing already. Plus, with that approach you could actually reference a single boolean (and modify it) without touching the others. Is it mandatory to use an array of unsigned int for this assignment?

segmentation fault using q sort?

I am trying to sort a pointer array of characters using qsort and keep getting a segmentation fault when I compile. I will post the code for my qsort call and the compare function and any help would be greatly appreciated.
//count declaration
size_t count = (sizeof (strPtrsQsort)/sizeof (*strPtrsQsort));
//function call
qsort ((char *)ptr, size, sizeof(char), compare);
//compare function
int compare (const void *a, const void *b)
{
const char **ia = (const char **)a;
const char **ib = (const char **)b;
return strcmp (*ia, *ib);
}
Judging by your qsort call, you are sorting an array of char elements: the base pointer type is passed to qsort as char * value and the element size is sizeof(char). However, your comparison function is written for an array of pointers to char. That's completely incorrect and inconsistent. That is what is causing the crash.
In the accompanying text you state that you are "trying to sort a pointer array of characters". Why in that case are you specifying the element size as sizeof(char) and not as, say, sizeof (char *)?
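For comparison, a minimal sketch of a consistent call, assuming the data really is an array of `const char*`. The wrapper name `sortStrings` is made up here; the point is that the element size is the size of one pointer and the count is the number of pointers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// qsort hands the comparator pointers to the elements, i.e. pointers to char*.
int compareStrings(const void* a, const void* b) {
    const char* const* ia = static_cast<const char* const*>(a);
    const char* const* ib = static_cast<const char* const*>(b);
    return std::strcmp(*ia, *ib);
}

// count is the number of pointers in the array, not the number of characters.
void sortStrings(const char** strings, std::size_t count) {
    assert(strings != nullptr || count == 0);
    std::qsort(strings, count, sizeof(*strings), compareStrings);
}
```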
Note that even when you're required to work with C-style raw arrays you can still use C++ STL algorithms, since pointers are in fact RandomAccessIterators. For example, this works:
#include <algorithm>
#include <iostream>
#include <cstring>
static
bool compare(const char *a, const char *b)
{
return std::strcmp(a, b) < 0;
}
int main()
{
const char *stringarray[] = {
"zyxulsusd",
"abcdef",
"asdf"
};
std::sort(stringarray, stringarray + 3, compare);
// -----------^
// Just like a normal iterator the end iterator points
// to an imaginary element behind the data.
for(int i = 0; i < 3; i++) {
std::cout << stringarray[i] << std::endl;
}
return 0;
}
The primary advantage of this approach is type safety, so it avoids most of the pitfalls common with C-style functions like qsort.

C++ variable length arrays in struct

I am writing a program for creating, sending, receiving and interpreting ARP packets. I have a structure representing the ARP header like this:
struct ArpHeader
{
unsigned short hardwareType;
unsigned short protocolType;
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
unsigned short operationCode;
unsigned char senderHardwareAddress[6];
unsigned char senderProtocolAddress[4];
unsigned char targetHardwareAddress[6];
unsigned char targetProtocolAddress[4];
};
This only works for hardware addresses with length 6 and protocol addresses with length 4. The address lengths are given in the header as well, so to be correct the structure would have to look something like this:
struct ArpHeader
{
unsigned short hardwareType;
unsigned short protocolType;
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
unsigned short operationCode;
unsigned char senderHardwareAddress[hardwareAddressLength];
unsigned char senderProtocolAddress[protocolAddressLength];
unsigned char targetHardwareAddress[hardwareAddressLength];
unsigned char targetProtocolAddress[protocolAddressLength];
};
This obviously won't work since the address lengths are not known at compile time. Template structures aren't an option either since I would like to fill in values for the structure and then just cast it from (ArpHeader*) to (char*) in order to get a byte array which can be sent on the network or cast a received byte array from (char*) to (ArpHeader*) in order to interpret it.
One solution would be to create a class with all header fields as member variables, a function to create a byte array representing the ARP header which can be sent on the network and a constructor which would take only a byte array (received on the network) and interpret it by reading all header fields and writing them to the member variables. This is not a nice solution though since it would require a LOT more code.
In contrast, a similar structure for a UDP header, for example, is simple since all header fields are of known constant size. I use
#pragma pack(push, 1)
#pragma pack(pop)
around the structure declaration so that I can actually do a simple C-style cast to get a byte array to be sent on the network.
Is there any solution I could use here which would be close to a structure or at least not require a lot more code than a structure?
I know the last field in a structure (if it is an array) does not need a specific compile-time size, can I use something similar like that for my problem? Just leaving the sizes of those 4 arrays empty will compile, but I have no idea how that would actually function. Just logically speaking it cannot work since the compiler would have no idea where the second array starts if the size of the first array is unknown.
You want a fairly low level thing, an ARP packet, and you are trying to find a way to define a datastructure properly so you can cast the blob into that structure. Instead, you can use an interface over the blob.
struct ArpHeader {
mutable std::vector<uint8_t> buf_;
template <typename T>
struct ref {
uint8_t * const p_;
ref (uint8_t *p) : p_(p) {}
operator T () const { T t; memcpy(&t, p_, sizeof(t)); return t; }
T operator = (T t) const { memcpy(p_, &t, sizeof(t)); return t; }
};
template <typename T>
ref<T> get (size_t offset) const {
if (offset + sizeof(T) > buf_.size()) throw std::out_of_range("ArpHeader: offset past end of buffer"); // requires <stdexcept>
return ref<T>(&buf_[0] + offset);
}
ref<uint16_t> hwType() const { return get<uint16_t>(0); }
ref<uint16_t> protType () const { return get<uint16_t>(2); }
ref<uint8_t> hwAddrLen () const { return get<uint8_t>(4); }
ref<uint8_t> protAddrLen () const { return get<uint8_t>(5); }
ref<uint16_t> opCode () const { return get<uint16_t>(6); }
uint8_t *senderHwAddr () const { return &buf_[0] + 8; }
uint8_t *senderProtAddr () const { return senderHwAddr() + hwAddrLen(); }
uint8_t *targetHwAddr () const { return senderProtAddr() + protAddrLen(); }
uint8_t *targetProtAddr () const { return targetHwAddr() + hwAddrLen(); }
};
If you need const correctness, you remove mutable, create a const_ref, and duplicate the accessors into non-const versions, and make the const versions return const_ref and const uint8_t *.
Short answer: you just cannot have variable-sized types in C++.
Every type in C++ must have a size that is known (and fixed) at compile time; that is, sizeof must give a consistent answer. Note that you can have types that hold a variable amount of data (e.g. std::vector<int>) by using the heap, yet the size of the object itself is always constant.
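A tiny illustration of that last point. The function name is made up, and the only thing it shows is that sizeof does not depend on how many elements a vector currently holds:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when an empty vector and a 1000-element vector occupy the same
// number of bytes as objects; the elements themselves live on the heap.
bool vectorObjectSizeIsFixed() {
    std::vector<int> empty;
    std::vector<int> big(1000, 42);
    assert(empty.size() != big.size());   // the amount of data held differs
    return sizeof(empty) == sizeof(big);  // the object size does not
}
```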
So, you can never produce a type declaration that you would cast and get the fields magically adjusted. This goes deeply into the fundamental object layout - every member (aka field) must have a known (and stable) offset.
Usually, this issue is solved by writing (or generating) member functions that parse the input data and initialize the object's data. This is basically the age-old data serialization problem, which has been solved countless times in the last 30 or so years.
Here is a mockup of a basic solution:
class packet {
public:
// simple things
uint16_t hardware_type() const;
// variable-sized things
size_t sender_address_len() const;
bool copy_sender_address_out(char *dest, size_t dest_size) const;
// initialization
bool parse_in(const char *src, size_t len);
private:
uint16_t hardware_type_;
std::vector<char> sender_address_;
};
Notes:
the code above shows the very basic structure that would let you do the following:
packet p;
if (!p.parse_in(input, sz))
return false;
the modern way of doing the same thing via RAII would look like this:
if (!packet::validate(input, sz))
return false;
packet p = packet::parse_in(input, sz); // static function
// returns an instance or throws
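To make the parsing step concrete, here is a rough sketch of what a parse function for the ARP layout from the question could look like. It assumes the wire layout described there (8 fixed bytes, then sender/target addresses whose lengths come from the header) with 16-bit fields in network byte order. `ArpPacket`, `readU16` and `parseArp` are hypothetical names, not part of the mockup above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Parsed form of an ARP packet; the variable-length fields live in vectors.
struct ArpPacket {
    std::uint16_t hardwareType = 0;
    std::uint16_t protocolType = 0;
    std::uint16_t operationCode = 0;
    std::vector<std::uint8_t> senderHardwareAddress;
    std::vector<std::uint8_t> senderProtocolAddress;
    std::vector<std::uint8_t> targetHardwareAddress;
    std::vector<std::uint8_t> targetProtocolAddress;
};

// Reads a big-endian (network order) 16-bit value.
static std::uint16_t readU16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Returns false (leaving out untouched) if the buffer is too short
// for the address lengths it declares.
bool parseArp(const std::uint8_t* src, std::size_t len, ArpPacket& out) {
    const std::size_t fixed = 8;  // bytes before the first address
    if (src == nullptr || len < fixed) return false;
    const std::size_t hwLen = src[4];
    const std::size_t protLen = src[5];
    if (len < fixed + 2 * (hwLen + protLen)) return false;

    out.hardwareType = readU16(src);
    out.protocolType = readU16(src + 2);
    out.operationCode = readU16(src + 6);

    const std::uint8_t* p = src + fixed;
    out.senderHardwareAddress.assign(p, p + hwLen);   p += hwLen;
    out.senderProtocolAddress.assign(p, p + protLen); p += protLen;
    out.targetHardwareAddress.assign(p, p + hwLen);   p += hwLen;
    out.targetProtocolAddress.assign(p, p + protLen);
    assert(static_cast<std::size_t>(p + protLen - src) <= len);
    return true;
}
```

The offsets of the last three addresses are computed at run time from the two length bytes, which is exactly what a fixed struct layout cannot express.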
If you want to keep access to the data simple and the data itself public, there is a way to achieve what you want without changing the way you access data. First, you can use std::string instead of the char arrays to store the addresses:
#include <string>
using namespace std; // using this to shorten notation. Preferably put 'std::'
// everywhere you need it instead.
struct ArpHeader
{
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
string senderHardwareAddress;
string senderProtocolAddress;
string targetHardwareAddress;
string targetProtocolAddress;
};
Then, you can overload the conversion operator operator const char*() and the constructor ArpHeader(const char*) (and of course operator=(const char*) preferably too), in order to keep your current sending/receiving functions working, if that's what you need.
A simplified conversion operator (skipped some fields, to make it less complicated, but you should have no problem in adding them back), would look like this:
operator const char*(){
char* myRepresentation;
unsigned char mySize
= 2+ senderHardwareAddress.length()
+ senderProtocolAddress.length()
+ targetHardwareAddress.length()
+ targetProtocolAddress.length();
// We need to store the size, since it varies
myRepresentation = new char[mySize+1];
myRepresentation[0] = mySize;
myRepresentation[1] = hardwareAddressLength;
myRepresentation[2] = protocolAddressLength;
unsigned int offset = 3; // just to shorten notation
memcpy(myRepresentation+offset, senderHardwareAddress.c_str(), senderHardwareAddress.size());
offset += senderHardwareAddress.size();
memcpy(myRepresentation+offset, senderProtocolAddress.c_str(), senderProtocolAddress.size());
offset += senderProtocolAddress.size();
memcpy(myRepresentation+offset, targetHardwareAddress.c_str(), targetHardwareAddress.size());
offset += targetHardwareAddress.size();
memcpy(myRepresentation+offset, targetProtocolAddress.c_str(), targetProtocolAddress.size());
return myRepresentation;
}
While the constructor can be defined as such:
ArpHeader& operator=(const char* buffer){
hardwareAddressLength = buffer[1];
protocolAddressLength = buffer[2];
unsigned int offset = 3; // just to shorten notation
senderHardwareAddress = string(buffer+offset, hardwareAddressLength);
offset += hardwareAddressLength;
senderProtocolAddress = string(buffer+offset, protocolAddressLength);
offset += protocolAddressLength;
targetHardwareAddress = string(buffer+offset, hardwareAddressLength);
offset += hardwareAddressLength;
targetProtocolAddress = string(buffer+offset, protocolAddressLength);
return *this;
}
ArpHeader(const char* buffer){
*this = buffer; // Re-using the operator=
}
Then using your class is as simple as:
ArpHeader h1, h2;
h1.hardwareAddressLength = 3;
h1.protocolAddressLength = 10;
h1.senderHardwareAddress = "foo";
h1.senderProtocolAddress = "something1";
h1.targetHardwareAddress = "bar";
h1.targetProtocolAddress = "something2";
cout << h1.senderHardwareAddress << ", " << h1.senderProtocolAddress
<< " => " << h1.targetHardwareAddress << ", " << h1.targetProtocolAddress << endl;
const char* gottaSendThisSomewhere = h1;
h2 = gottaSendThisSomewhere;
cout << h2.senderHardwareAddress << ", " << h2.senderProtocolAddress
<< " => " << h2.targetHardwareAddress << ", " << h2.targetProtocolAddress << endl;
delete[] gottaSendThisSomewhere;
Which should offer you the utility needed, and keep your code working without changing anything out of the class.
Note however that if you're willing to change the rest of the code a bit (talking here about the code you've written already, outside of the class), jxh's answer should work as fast as this, and is more elegant on the inner side.

Void pointer values comparing C++

My actual question is it really possible to compare values contained in two void pointers, when you actually know that these values are the same type? For example int.
void compVoids(void *firstVal, void *secondVal){
if (firstVal < secondVal){
cout << "This will not make any sense as this will compare addresses, not values" << endl;
}
}
Actually I need to compare two void pointer values, while outside the function it is known that the type is int. I do not want to use comparison of int inside the function.
So this will not work for me as well: if (*(int*)firstVal > *(int*)secondVal)
Any suggestions?
Thank you very much for help!
In order to compare the data pointed to by a void*, you must know what the type is. If you know what the type is, there is no need for a void*. If you want to write a function that can be used for multiple types, you use templates:
template<typename T>
bool compare(const T& firstVal, const T& secondVal)
{
if (firstVal < secondVal)
{
// do something
}
return something;
}
To illustrate why attempting to compare void pointers blind is not feasible:
bool compare(void* firstVal, void* secondVal)
{
if (*firstVal < *secondVal) // ERROR: cannot dereference a void*
{
// do something
}
return something;
}
So, you need to know the size to compare, which means you either need to pass in a std::size_t parameter, or you need to know the type (and really, in order to pass in the std::size_t parameter, you have to know the type):
bool compare(void* firstVal, void* secondVal, std::size_t size)
{
if (0 > memcmp(firstVal, secondVal, size))
{
// do something
}
return something;
}
int a = 5;
int b = 6;
bool test = compare(&a, &b, sizeof(int)); // you know the type!
This was required in C as templates did not exist. C++ has templates, which make this type of function declaration unnecessary and inferior (templates allow for enforcement of type safety - void pointers do not, as I'll show below).
The problem comes in when you do something (silly) like this:
int a = 5;
short b = 6;
bool test = compare(&a, &b, sizeof(int)); // DOH! This will try to compare memory outside the bounds of b.
bool test2 = compare(&a, &b, sizeof(short)); // DOH! This will compare the first part of a with b. Endianness will be an issue.
As you can see, by doing this, you lose all type safety and have a whole host of other issues you have to deal with.
It is definitely possible, but since they are void pointers you must specify how much data is to be compared and how.
The memcmp function may be what you are looking for. It takes two void pointers and an argument for the number of bytes to be compared and returns a comparison. Some comparisons, however, are not contingent upon all of the data being equal. For example: comparing the direction of two vectors ignoring their length.
This question doesn't have a definite answer unless you specify how you want to compare the data.
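One way to keep the generic function free of any knowledge of int is to pass the comparison in as a callback, the way qsort does. A minimal sketch; `Comparator`, `isLess` and `compareInts` are hypothetical names chosen for this example:

```cpp
#include <cassert>

// Signature of a qsort-style comparison callback.
typedef int (*Comparator)(const void*, const void*);

// Generic part: knows nothing about the pointee type.
bool isLess(const void* first, const void* second, Comparator cmp) {
    assert(cmp != nullptr);
    return cmp(first, second) < 0;
}

// Type-specific part: the only place that knows the values are int.
int compareInts(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // negative, zero or positive, without overflow
}
```

The caller, who knows the real type, picks the callback: `isLess(&a, &b, compareInts)`.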
You need to dereference them and cast, with
if (*(int*) firstVal < *(int*) secondVal)
Why do you not want to use the int comparison inside the function, if you know that the two values will be int and that you want to compare the int values that they're pointing to?
You mentioned a comparison function for comparing data on inserts; for a comparison function, I recommend this:
int
compareIntValues (void *first, void *second)
{
int a = *(int*) first, b = *(int*) second;
return (a > b) - (a < b); /* avoids the overflow that a - b can cause */
}
It follows the convention of negative if the first is smaller, 0 if they're equal, positive if the first is larger. Simply call this function when you want to compare the int data.
Yes. In fact, your code is correct if the values themselves are stored in the pointers as unsigned integers. Casting int values to void pointers is common practice, even though it is not recommended.
You could also cast the pointers, but you have to cast them directly to the integer type:
if ((int)firstVal < (int)secondVal)
Note: no * at all.
You may run into address-model issues doing this if you build for both 32 and 64 bits, though. Look at the intptr_t type, which avoids that:
if ((intptr_t)firstVal < (intptr_t)secondVal)

Returning the first pointer of a private array (edit)

I'm trying to learn C++ and OpenGL...
I want to return the first pointer of the array... but maintaining the const correctness...
it happened that:
class Foo{
private:
GLubyte array[64][64][4];
public:
const GLubyte& get_array(){return array;}
};
gives me this compiler error:
:28: error: invalid initialization of reference of type 'const GLubyte&' from expression of type 'GLubyte (*)[64][4]'
can you help me out in understanding how to return the const correctness first pointer?
It has nothing to do with const correctness. If you want the first pointer... well, you need to return a pointer:
const GLubyte* get_array(){return (GLubyte*)array;}
The cast works because arrays are laid out contiguously in memory.
But i'm pretty sure a better solution to what you're trying to achieve can be devised with std::vector instead of C-style arrays. What exactly are you trying to do?
This has nothing to do with const correctness. You can't return an array of GLubytes as if it was a single GLubyte. You would get the same error message if you removed the const and the & (except that the error message would no longer contain the const and the & either, of course).
Edit in response to your edit: If you want to return a reference to the first element, just return the first element: return array[0][0][0];. If you want to return a pointer to the first element, return the address of the first element (return &array[0][0][0]) and change the return type to GLubyte* instead of GLubyte&.
The type of array is GLubyte[64][64][4], which decays to GLubyte (*)[64][4] (as the error message shows), so it cannot be converted to a reference to a single GLubyte. You have to write
GLubyte const & get_array() const { return array[0][0][0]; }
so you get a reference to the first element of array. But if you want the pointer, you have to change your code as follows
GLubyte const * get_array() const { return &(array[0][0][0]); }
If you want something better for your C++ code, you can see also Boost.MultiArray.
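If the goal is to hand out the whole array read-only rather than just its first element, returning a const reference to the array type is another option. A sketch, assuming `GLubyte` is an unsigned 8-bit type (a stand-in typedef is used here so the example does not need the OpenGL headers); the `Texture` typedef and the `set` and `data` members are additions for illustration:

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char GLubyte;  // assumption: stands in for the OpenGL typedef

class Foo {
public:
    typedef GLubyte Texture[64][64][4];

    Foo() : array() {}  // value-initialize: all bytes start at zero

    // Read-only view of the whole array, dimensions preserved.
    const Texture& get_array() const { return array; }

    // Pointer to the first byte, e.g. for passing to a C API.
    const GLubyte* data() const { return &array[0][0][0]; }

    void set(std::size_t x, std::size_t y, std::size_t c, GLubyte v) {
        assert(x < 64 && y < 64 && c < 4);
        array[x][y][c] = v;
    }

private:
    Texture array;
};
```

Callers can then index the result directly, as in `foo.get_array()[1][2][3]`, and the compiler rejects writes through it.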
If you are after some sort of multi array wrapper then I think this might be close to what you are looking for
class Foo{
private:
std::vector<GLubyte> array;
size_t x_sz, y_sz, z_sz;
public:
Foo(size_t x, size_t y=1, size_t z=1)
: array(x*y*z),
x_sz(x), y_sz(y), z_sz(z)
{}
const GLubyte& element_at(size_t x, size_t y=0, size_t z=0) const
{
return array.at(z*y_sz*x_sz+y*x_sz+x);
}
GLubyte& element_at(size_t x, size_t y=0, size_t z=0)
{
return array.at(z*y_sz*x_sz+y*x_sz+x);
}
GLubyte *data() //returns array
{
return &array[0];
}
const GLubyte *data() const //returns const array
{
return &array[0];
}
};