C++ Custom std::map<> key class causing memory violation - c++

For the first time I've written a class that is supposed to be usable as a key type for std::map<>. I've overloaded copy constructor, assignment, and operator < as suggested in other questions on SO. But for some reason it crashes when I'm trying to insert using operator []. This class is meant to hold a buffer of binary data whose length is indicated by the member m_nLen.
Here is the code :
class SomeKeyClass
{
public:
unsigned char m_buffer[ SOME_LENGTH_CONSTANT ];
size_t m_nLen;
public:
inline SomeKeyClass( const unsigned char * data, size_t nLen )
{
m_nLen = min( SOME_LENGTH_CONSTANT, nLen );
memcpy( m_buffer, data, m_nLen );
}
inline SomeKeyClass( const SomeKeyClass& oKey )
{
*this = oKey;
}
inline bool operator < ( const SomeKeyClass& oKey ) const
{
return memcmp( m_buffer, oKey.m_buffer, min( m_nLen, oKey.m_nLen ) ) < 0;
}
inline SomeKeyClass & operator = ( const SomeKeyClass& oKey )
{
memcpy( m_buffer, oKey.m_buffer, oKey.m_nLen );
return *this;
}
};
Is there anything wrong with this class? Could I use std::string<unsigned char> for using binary data as keys instead?

The issue is that you were not setting the m_nLen member in the copy constructor or the assignment operator. Thus whenever you use the object that has the uninitialized or wrong m_nLen value, things may go wrong leading to possible crashes (in general, undefined behavior).
When implementing a user-defined copy constructor and assignment operator, you should strive to make sure that what comes out at the end is an actual copy of the object in question (reference counted objects are a special case, but it still implies that a copy is being done). Otherwise, programs that produce incomplete or wrong copies of the object are very fragile, and an awful burden to debug.
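A minimal sketch of the fix described above, assuming SOME_LENGTH_CONSTANT is 16 (the real value is not shown in the question). The assignment operator now copies m_nLen as well as the bytes, so the copy constructor can delegate to it safely. The helper copyIsFaithful is made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumption: the real value of SOME_LENGTH_CONSTANT is not shown in the question.
constexpr std::size_t SOME_LENGTH_CONSTANT = 16;

class SomeKeyClass
{
public:
    unsigned char m_buffer[SOME_LENGTH_CONSTANT];
    std::size_t m_nLen;

    SomeKeyClass(const unsigned char* data, std::size_t nLen)
        : m_nLen(std::min(SOME_LENGTH_CONSTANT, nLen))
    {
        std::memcpy(m_buffer, data, m_nLen);
    }

    SomeKeyClass(const SomeKeyClass& oKey) : m_nLen(0)
    {
        *this = oKey;
    }

    SomeKeyClass& operator=(const SomeKeyClass& oKey)
    {
        if (this != &oKey)
        {
            m_nLen = oKey.m_nLen;  // the line missing from the original
            std::memcpy(m_buffer, oKey.m_buffer, m_nLen);
        }
        return *this;
    }
};

// Returns true when a copy carries both the length and the bytes of its source.
inline bool copyIsFaithful()
{
    const unsigned char raw[] = {'a', 'b', 'c'};
    const SomeKeyClass original(raw, sizeof raw);
    const SomeKeyClass copy(original);
    return copy.m_nLen == original.m_nLen
        && std::memcmp(copy.m_buffer, original.m_buffer, copy.m_nLen) == 0;
}
```

Since the class only holds an array and an integer, leaving out both user-defined functions and letting the compiler generate them should give the same result.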

See Paul McKenzie's answer for the reason it crashes.
Is there anything wrong with this class ?
Yes, your operator< is broken.
Consider the case where you have one key "abc" and another key "abcd", your less-than operator will say they are equivalent, because you only test the first 3 characters.
A correct implementation needs to compare the lengths when memcmp says they are equal, because the memcmp call doesn't necessarily compare the full strings:
bool operator<(const SomeKeyClass& oKey) const
{
const std::size_t len = std::min(m_nLen, oKey.m_nLen);
if (len > 0)
{
const int cmp = memcmp(m_buffer, oKey.m_buffer, len);
if (cmp != 0)
return cmp < 0;
}
return m_nLen < oKey.m_nLen;
}
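To see the effect of the length tie-break, here is a small sketch that uses std::vector<unsigned char> as a stand-in key, with a hypothetical comparator PrefixThenLength that mirrors the logic above. With it, "abc" and "abcd" end up as two distinct keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical comparator: same logic as the corrected operator< above.
struct PrefixThenLength
{
    bool operator()(const Bytes& a, const Bytes& b) const
    {
        const std::size_t len = std::min(a.size(), b.size());
        if (len > 0)
        {
            const int cmp = std::memcmp(a.data(), b.data(), len);
            if (cmp != 0)
                return cmp < 0;
        }
        return a.size() < b.size();  // equal prefix: the shorter key sorts first
    }
};

// Inserts "abc" and "abcd" and reports how many distinct keys the map holds.
inline std::size_t countDistinctKeys()
{
    std::map<Bytes, int, PrefixThenLength> m;
    m[Bytes{'a', 'b', 'c'}] = 1;
    m[Bytes{'a', 'b', 'c', 'd'}] = 2;
    return m.size();
}
```

std::vector<unsigned char> also has a lexicographic operator< of its own, so the custom comparator is only there to mirror the code above.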


Reallocate array with memcpy and memset

I've taken over some code, and came across a weird reallocation of an array. This is a function from within an Array class (used by the JsonValue)
void reserve( uint32_t newCapacity ) {
if ( newCapacity > length + additionalCapacity ) {
newCapacity = std::min( newCapacity, length + std::numeric_limits<decltype( additionalCapacity )>::max() );
JsonValue *newPtr = new JsonValue[newCapacity];
if ( length > 0 ) {
memcpy( newPtr, values, length * sizeof( JsonValue ) );
memset( values, 0, length * sizeof( JsonValue ) );
}
delete[] values;
values = newPtr;
additionalCapacity = uint16_t( newCapacity - length );
}
}
I get the point of this; it is just allocating a new array, and doing a copy of the memory contents from the old array into the new array, then zero-ing out the old array's contents. I also know this was done in order to prevent calling destructors, and moves.
The JsonValue is a class with functions, and some data which is stored in a union (string, array, number, etc.).
My concern is whether this is actually defined behaviour or not. I know it works, and has not had a problem since we began using it a few months ago; but if it's undefined, that doesn't mean it is going to keep working.
EDIT:
JsonValue looks something like this:
struct JsonValue {
// …
~JsonValue() {
switch ( details.type ) {
case Type::Array:
case Type::Object:
array.destroy();
break;
case Type::String:
delete[] string.buffer;
break;
default: break;
}
}
private:
struct Details {
Key key = Key::Unknown;
Type type = Type::Null; // (0)
};
union {
Array array;
String string;
EmbedString embedString;
Number number;
Details details;
};
};
Where Array is a wrapper around an array of JsonValues, String is a char*, EmbedString is char[14], Number is a union of int, unsigned int, and double, Details contains the type of value it holds. All values have 16-bits of unused data at the beginning, which is used for Details. Example:
struct EmbedString {
uint16_t : 16;
char buffer[14] = { 0 };
};
Whether this code has well-defined behavior basically depends on two things: 1) is JsonValue trivially-copyable and, 2) if so, are a bunch of all-zero bytes a valid object representation for a JsonValue.
If JsonValue is trivially-copyable, then the memcpy from one array of JsonValues to another will indeed be equivalent to copying all the elements over [basic.types]/3. If all-zeroes is a valid object representation for a JsonValue, then the memset should be ok (I believe this actually falls into a bit of a grey-area with the current wording of the standard, but I believe at least the intention would be that this is fine).
I'm not sure why you'd need to "prevent calling destructors and moves", but overwriting objects with zeroes does not prevent destructors from running. delete[] values will call the destructors of the array members. And moving the elements of an array of trivially-copyable type should compile down to just copying over the bytes anyway.
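Whether a type qualifies can be checked at compile time with std::is_trivially_copyable. The sketch below uses two hypothetical structs, not the real JsonValue. It suggests that a user-provided destructor, like the one shown in the question's edit, is enough to make a type not trivially copyable.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in: plain data, no user-provided special members.
struct PlainValue
{
    std::uint16_t type;
    char buffer[14];
};

// Hypothetical stand-in: owns memory and releases it in a user-provided destructor.
struct OwningValue
{
    char* buffer = nullptr;
    ~OwningValue() { delete[] buffer; }
};

static_assert(std::is_trivially_copyable<PlainValue>::value,
              "memcpy between PlainValue objects is well-defined");
static_assert(!std::is_trivially_copyable<OwningValue>::value,
              "a non-trivial destructor rules out trivial copyability");

// Runtime mirror of the compile-time checks above.
inline bool destructorBlocksTrivialCopy()
{
    return std::is_trivially_copyable<PlainValue>::value
        && !std::is_trivially_copyable<OwningValue>::value;
}
```

Putting a static_assert like this next to the memcpy call would turn a silent assumption into a compile error if the type ever changes.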
Furthermore, I would suggest to get rid of these String and EmbedString classes and simply use std::string. At least, it would seem to me that the sole purpose of EmbedString is to manually perform small string optimization. Any std::string implementation worth its salt is already going to do exactly that under the hood. Note that std::string is not guaranteed (and will often not be) trivially-copyable. Thus, you cannot simply replace String and EmbedString with std::string while keeping the rest of this current implementation.
If you can use C++17, I would suggest to simply use std::variant instead of or at least inside this custom JsonValue implementation as that seems to be exactly what it's trying to do. If you need some common information stored in front of whatever the variant value may be, just have a suitable member holding that information in front of the member that holds the variant value rather than relying on every member of the union starting with the same couple of members (which would only be well-defined if all union members are standard-layout types that keep this information in their common initial sequence [class.mem]/23).
The sole purpose of Array would seem to be to serve as a vector that zeroes memory before deallocating it for security reasons. If this is the case, I would suggest to just use an std::vector with an allocator that zeros memory before deallocating instead. For example:
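template <typename T>
struct ZeroingAllocator
{
    using value_type = T;
    ZeroingAllocator() = default;
    // converting constructor, needed when a container rebinds the allocator
    template <typename U>
    ZeroingAllocator(const ZeroingAllocator<U>&) noexcept {}
    T* allocate(std::size_t N)
    {
        return reinterpret_cast<T*>(new unsigned char[N * sizeof(T)]);
    }
    void deallocate(T* buffer, std::size_t N) noexcept
    {
        auto ptr = reinterpret_cast<volatile unsigned char*>(buffer);
        std::fill(ptr, ptr + N * sizeof(T), 0); // N counts objects, not bytes
        delete[] reinterpret_cast<unsigned char*>(buffer);
    }
};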
template <typename A, typename B>
bool operator ==(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return true; }
template <typename A, typename B>
bool operator !=(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return false; }
and then
using Array = std::vector<JsonValue, ZeroingAllocator<JsonValue>>;
Note: I fill the memory via volatile unsigned char* to prevent the compiler from optimizing away the zeroing. If you need to support overaligned types, you can replace the new[] and delete[] with direct calls to ::operator new and ::operator delete (doing this will prevent the compiler from optimizing away allocations). Pre C++17, you will have to allocate a sufficiently large buffer and then manually align the pointer, e.g., using std::align…

Byte to bits Operator Overloading C++

I've been writing C++ a long time and maybe it's because I don't need to do this very often, but I seem to be lacking with regard to operator overloading. I use it from time to time, but never needed to do what I wanted to do recently and found it somewhat problematic.
class foo
{
public:
static const size_t ARRAY_SIZE = 100000;
uint8_t& operator[](const size_t& index) { return my_array[index >> 3]; }
// problematic equality operator
bool operator==(const size_t& index) const { return my_array[index >> 3] & (1 << (index & 7)); }
//
// Need an assignment operator to do:
// my_array[index >> 3] |= 1 << (index & 7);
// ^------------------^ might not needed as it's returned from [] operator
private:
std::array<uint8_t, (ARRAY_SIZE >> 3) + ((ARRAY_SIZE & 7) ? 1 : 0)> my_array;
};
Now as you can see from the above, what is being done here is to take a size_t number and store it in its relative bit position. So, 5 for instance would be stored in bit 5 of byte 0 and 9 would be stored in bit 1 of byte 1 in the array etc.
Now the subscript operator works fine and returns the correct byte from the array, but that left the problem of things like this:
if (foo[n]) // where n is a size_t integer representing a bit position
It then dawned on me that the above is an abbreviated form of:
if (foo[n] == true)
and so that led to me writing the above equality operator, but for some reason I don't understand, the operator isn't called. I thought it would have been called following the subscript operator, or is it not called because it's not an object of type foo anymore? What's the best way to fix this? Is it to write an external operator== and make it a friend of foo?
Oh and some pointers regarding the construction of the assignment operator would be appreciated too. Thanks very much...
EDIT:
Thanks for all the help people. I do think it's incredibly harsh to get downvoted for asking a question about something I didn't quite understand. It's not like it was a stupid question or anything and if you re-read my original question properly, I did actually question that foo might not be the correct type after the subscript operator, that a few of you have pointed out. Anyway, here's a bit more context. I haven't had chance to properly study all the great replies...
I did originally write the operator like this, which did actually return the correct bit from the array. Something someone has already pointed out.
bool operator[](const size_t index) const { return my_array[index >> 3] & (1 << (index & 7)); }
What I then had a problem with was setting the bits in the array:
foo f;
if (f[3]) // this is fine
But doing something like:
f[6] = true;
I guess what I was hoping for was a more elegant way of doing this than writing the following:-
class Foo
{
public:
static const size_t MAX_LIST_SIZE = 100000;
bool get(const size_t index) const { return my_array[index >> 3] & (1 << (index & 7)); }
void set(const size_t index) { my_array[index >> 3] |= 1 << (index & 7); }
private:
std::array<uint8_t, ((MAX_LIST_SIZE >> 3) + ((MAX_LIST_SIZE & 7) ? 1 : 0))> my_array;
};
and then using the class like this:
Foo f;
f.set(10);
if (f.get(10))
...
I just thought it would be easier to overload the operators, but from the look of it, it seems more cumbersome. (Oh and someone asked why I used uint8_t rather than bool, well this is because on this particular platform, bool is actually 32bits!)
Here we have several deep-ish misunderstandings.
Now the subscript operator works fine and returns the correct byte
from the array, but that left the problem of things like this:
if (foo[n]) // where n is a size_t integer representing a bit position
Your problem here is not the if per se; it's that you are returning the wrong thing. If you are building a packed bit set, your operator[] should just return the value of the bit at the requested position. So:
bool operator[](size_t index) { return (my_array[index >> 3]) & (1<<(index&7)); }
and here your if, as well as any other operation involving your operator[], will work as expected.
It then dawned on me that the above is an abbreviated form of:
if (foo[n] == true)
It is not. if evaluates the expression inside the parentheses, and (essentially) casts it to a boolean; if the result is true, it executes the branch, otherwise it does not.
and so that led to me writing the above equality operator, but for some reason I don't understand, the operator isn't called.
The operator isn't called because:
as explained above, the operator== is never involved in if (foo[n]);
even if you explicitly wrote if (foo[n]==true), your operator wouldn't be invoked, because once your operator[] returns, foo is no longer involved.
Think about it: even in your "original" operator[] you return a reference to uint8_t. The statement:
if (a[n] == true)
(with a being of type foo)
is effectively the same as:
uint8_t &temp = a[n];
if (temp == true)
Now, in the expression temp == true the type of a is never mentioned - there's only temp, which is an uint8_t&, independently of how it was ever obtained, and true, a bool literal. Your operator== would be considered if you were comparing a with a size_t, but that would make no sense.
Finally, about your comment:
// Need an assignment operator to do:
// my_array[index >> 3] |= 1 << (index & 7);
// ^------------------^ might not needed as it's returned from [] operator
this, again, won't work for the exact same reason - you need an operator overload to work on the return value of operator[], not on the foo class itself.
This is generally accomplished by having operator[] return not the value itself, but a proxy object, which remembers its parent and the requested index, and provides its own operator== and operator= that perform what you were trying to put straight in the foo class (along with extra operators that make it possible for it to pass for a reference to a boolean).
Something like:
struct PackedBitVector {
    static const size_t ARRAY_SIZE = 100000;
    struct ElementProxy {
        PackedBitVector &parent;
        size_t idx;
        operator bool() const { return parent.data[idx>>3] & (1<<(idx&7)); }
        bool operator==(bool other) const { return bool(*this) == other; }
        bool operator!=(bool other) const { return !(*this == other); }
        ElementProxy &operator=(bool other) {
            if(other) parent.data[idx>>3] |= 1<<(idx&7);
            else parent.data[idx>>3] &= ~(1<<(idx&7));
            return *this;
        }
    };
    ElementProxy operator[](size_t index) { return ElementProxy{*this, index}; }
private:
    // value-initialized so that every bit starts out cleared
    std::array<uint8_t, (ARRAY_SIZE >> 3) + ((ARRAY_SIZE & 7) ? 1 : 0)> data{};
};
To make this work in general you'd have to add a full bucket of other operators, so that this proxy object could credibly pass as a reference to a bool, which is what std::vector<bool> does.
About this, from your remark about bool being 32 bit wide on your platform you seem not to know that std::vector<bool> already sports this "packed bit array" space optimization, so you could directly use it, without reimplementing a broken version of the real thing.
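As a quick illustration of that last point, std::vector<bool> supports both the reading and the writing syntax the question asks for. A minimal sketch follows; the size 100000 is taken from the question, and the function name is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sets one bit through operator[] and reads it back, along with its neighbour.
// Assumes index + 1 is smaller than the vector size used below.
inline bool setAndTest(std::size_t index)
{
    std::vector<bool> bits(100000);  // all bits start out false
    bits[index] = true;              // operator[] returns a proxy that writes the bit
    const bool isSet = bits[index];  // the proxy converts to bool on read
    const bool neighbourClear = !bits[index + 1];
    return isSet && neighbourClear;
}
```

If the size is fixed at compile time, std::bitset is another standard option with the same f[6] = true syntax.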

Custom implementation of a bool vector with bit representation - how to implement operator[]

Disclaimer - this is a school assignment, however the problem is still interesting I hope!
I have implemented a custom class called Vector<bool>, which stores the bool entries as bits in an array of numbers.
Everything has gone fine except for implementing this:
bool& operator[](std::size_t index) {
validate_bounds(index);
???
}
The const implementation is quite straight forward, just reading out the value. Here however I can't really understand what to do, and the course is a specialization course on C++ so I'm guessing I should do some type-deffing or something. The data is represented by an array of type unsigned int and should be dynamic (e.g. push_back(bool value) should be implemented).
I solved this implementing a proxy class:
class BoolVectorProxy {
public:
explicit BoolVectorProxy(unsigned int& reference, unsigned char index) {
this->reference = &reference;
this->index = index;
}
void operator=(const bool v) {
    // 1u keeps the shift unsigned, so the top bit of the block can be reached safely
    if (v) *reference |= 1u << index;
    else *reference &= ~(1u << index);
}
operator bool() const {
return (*reference >> index) & 1;
}
private:
unsigned int* reference;
unsigned char index;
};
And inside the main class:
BoolVectorProxy operator[](std::size_t index) {
validate_bounds(index);
return BoolVectorProxy(array[index / BLOCK_CAPACITY], index % BLOCK_CAPACITY);
}
I also use Catch as a testing library, the code passes this test:
TEST_CASE("access and assignment with brackets", "[Vector]") {
Vector<bool> a(10);
a[0] = true;
a[0] = false;
REQUIRE(!a[0]);
a[1] = true;
REQUIRE(a[1]);
const Vector<bool> &b = a;
REQUIRE(!b[0]);
REQUIRE(b[1]);
a[0] = true;
REQUIRE(a[0]);
REQUIRE(b[0]);
REQUIRE(b.size() == 10);
REQUIRE_THROWS(a[-1]);
REQUIRE_THROWS(a[10]);
REQUIRE_THROWS(b[-1]);
REQUIRE_THROWS(b[10]);
}
If anyone finds any issues or improvements that can be made, please comment, thanks!
Basically implementing operator[] is the same as implementing const operator[] as you might expect, it's just that one is writable (lvalue) and the other is read only (rvalue).
I think you've got an understanding of the problem: you can convert an unsigned int into a bool using bitwise operations, and you can also say "if the nth bool is modified in X, do a bitwise operation on X and it's done!". But this operator means: I want an lvalue of the bool so I can modify it whenever I want and have that affect the associated integer. In other words, you want a reference to a bool, or in your case a reference to a single bit, so you can modify that bit on the fly. Unfortunately you can't reference a single bit; the smallest thing you can reference is a whole byte (a char), so you would have to take the 7 neighbouring booleans along with it. That's not what you want.
That being said, I understand that it might be required for your assignment, but packing bools into multiple unsigned ints looks like a needless C-style optimization to me. You would be better off with a single array of bools (C-style) and doing the memory handling manually, because that is almost what you are doing already. Plus, with that method, you would actually be able to reference one single boolean (and modify it) without touching the others. Is it mandatory to use an array of unsigned int for this assignment?

HW Seems too easy - Overloading equivalency operator as a member and as a non-member function

The assignment is to create a class that implements a dynamic cstring (null-terminated char array). The default constructor should create an empty array and there should also be an overloaded constructor that creates an array of size n. There should also be a function that will grow the array to a larger size (he said that this should be in the class but we will not utilize it until a later assignment). We're also supposed to create two versions of this class. In one version, we will overload the equivalency operator as a member function, and in the second version, we will overload the equivalency operator as a non-member function.
Something feels weird to me because this just seems way too easy.
For the member version, I set it to return true if the two class sizes were equal.
bool CSTR::operator ==(const CSTR & rhs) {
return (size == rhs.size);
}
For the non-member version, I just created a member function to return its size as an integer and then compare them when the operator is overloaded.
bool operator ==(const CSTR2 & CSTR2_1, const CSTR2 & CSTR2_2) {
return (CSTR2_1.getSize() == CSTR2_2.getSize());
}
I'm just kind of terrified to turn this in without any outside input because this solution of mine seems way too simple compared to everything we have been going over in class. I know we are going to expand on this program for a later assignment, but if any of you see anything that I'm missing here, some input would be awesome.
Here is the code I'm trying to use to compare the cstring. Note: there is nothing in the assignment description that says anything about inputting values into the cstrings.
#include "CSTR.h"
#include <cstring>
using namespace std;
class CSTR {
public:
CSTR();
CSTR(unsigned int n);
~CSTR();
bool operator ==(const CSTR & rhs);
private:
unsigned int size;
char *elems;
bool grow(unsigned int newSize);
};
=================================
CSTR::CSTR() {
size = 0;
elems = new char [0];
}
CSTR::CSTR(unsigned int n) {
if (n > 0) {
size = n;
elems = new char [size];
}
else {
size = 0;
elems = new char [0];
}
}
CSTR::~CSTR() {
delete [] elems;
}
bool CSTR::operator ==(const CSTR & rhs) {
return ((elems == rhs.elems) == 0);
}
I've initialized two objects of CSTR with different sizes, and when I test for equivalency it is returning that they are equal.
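One reading of the result above: elems == rhs.elems compares the two pointers, not the characters they point to. Two separately allocated arrays have different addresses, so that comparison yields false, and false == 0 is true, which would explain why any two distinct objects compare equal. A minimal sketch of a content-based comparison follows, assuming both objects hold null-terminated text; the struct CStrView is a hypothetical stand-in for CSTR.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for CSTR: a size plus a pointer to null-terminated text.
struct CStrView
{
    unsigned int size;
    const char* elems;
};

// Equal only if the sizes match and the characters match.
inline bool operator==(const CStrView& lhs, const CStrView& rhs)
{
    return lhs.size == rhs.size && std::strcmp(lhs.elems, rhs.elems) == 0;
}

// Same content compares equal; different size and content does not.
inline bool demoCompare()
{
    const CStrView a{3, "abc"};
    const CStrView b{3, "abc"};
    const CStrView c{4, "abcd"};
    return (a == b) && !(a == c);
}
```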

memcmp sort

I have a single buffer, and several pointers into it. I want to sort the pointers based upon the bytes in the buffer they point at.
qsort() and std::sort() can be given custom comparison functions. For example, if the buffer was zero-terminated I could use strcmp:
int my_strcmp(const void* a,const void* b) {
    const char* const one = *(const char* const*)a;
    const char* const two = *(const char* const*)b;
    return ::strcmp(one,two);
}
however, if the buffer is not zero-terminated, I have to use memcmp() which requires a length parameter.
Is there a tidy, efficient way to get the length of the buffer into my comparison function without a global variable?
With std::sort, you can use a Functor like this:
struct CompString {
    CompString(int len) : m_Len(len) {}
    // std::sort calls the comparator through operator(); it must return "a sorts before b"
    bool operator()(const char *a, const char *b) const {
        return std::memcmp(a, b, m_Len) < 0;
    }
private:
    int m_Len;
};
Then you can do this:
std::sort(begin(), end(), CompString(4)); // all strings are 4 chars long
EDIT: from the comment suggestions (I guess both strings are in a common buffer?):
struct CompString {
CompString (const unsigned char* e) : end(e) {}
bool operator()(const unsigned char *a, const unsigned char *b) const {
return std::memcmp(a, b, std::min(end - a, end - b)) < 0;
}
private:
const unsigned char* const end;
};
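The end-pointer idea can also be written as a lambda that captures the end of the buffer, so no global is involved. This sketch adds a length tie-break for the case where one run of bytes is a prefix of the other; the function names and the sample buffer are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Sorts pointers into one buffer by the bytes between each pointer and bufEnd.
inline void sortBySuffix(std::vector<const unsigned char*>& ptrs,
                         const unsigned char* bufEnd)
{
    std::sort(ptrs.begin(), ptrs.end(),
              [bufEnd](const unsigned char* a, const unsigned char* b) {
                  const std::size_t lenA = static_cast<std::size_t>(bufEnd - a);
                  const std::size_t lenB = static_cast<std::size_t>(bufEnd - b);
                  const int cmp = std::memcmp(a, b, std::min(lenA, lenB));
                  if (cmp != 0)
                      return cmp < 0;
                  return lenA < lenB;  // equal prefix: the shorter run sorts first
              });
}

// Sorts four pointers into the bytes of "banana" and checks the resulting order.
inline bool demoSortBySuffix()
{
    static const unsigned char buf[] = {'b', 'a', 'n', 'a', 'n', 'a'};
    const unsigned char* end = buf + sizeof buf;
    std::vector<const unsigned char*> ptrs = {buf, buf + 1, buf + 3, buf + 5};
    sortBySuffix(ptrs, end);
    // Expected order of the runs: "a", "ana", "anana", "banana"
    return ptrs[0] == buf + 5 && ptrs[1] == buf + 3
        && ptrs[2] == buf + 1 && ptrs[3] == buf;
}
```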
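With the C function qsort(), no, there is no way to pass the length to your comparison function without using a global variable, which means it can't be done in a thread-safe manner. Some systems have a qsort_r() function (r stands for reentrant) which allows you to pass an extra context parameter, which then gets passed on to your comparison function. The example below uses the BSD/macOS argument order; glibc's qsort_r() takes the comparison function before the context and passes the context as the comparator's last argument: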
int my_comparison_func(void *context, const void *a, const void *b)
{
return memcmp(*(const void **)a, *(const void **)b, (size_t)context);
}
qsort_r(data, n, sizeof(void*), (void*)number_of_bytes_to_compare, &my_comparison_func);
Is there a reason you can't null-terminate your buffers?
If not, since you're using C++ you can write your own function object:
struct MyStrCmp {
    MyStrCmp (int n): length(n) { }
    // strncmp compares at most length characters
    inline bool operator() (const char *lhs, const char *rhs) const {
        return ::strncmp (lhs, rhs, length) < 0;
    }
    int length;
};
// ...
std::sort (myList.begin (), myList.end (), MyStrCmp (STR_LENGTH));
Can you pack your buffer pointer + length into a structure and pass a pointer of that structure as void *?
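A minimal sketch of that idea, assuming the array being sorted can hold small records instead of raw pointers. The struct Slice and the function names are hypothetical. Each element carries its own length, so plain qsort needs neither a global nor an extra context argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical record: where the bytes start and how many to compare.
struct Slice
{
    const unsigned char* data;
    std::size_t len;
};

// qsort passes pointers to the array elements, so each argument is a Slice*.
inline int compareSlices(const void* a, const void* b)
{
    const Slice* lhs = static_cast<const Slice*>(a);
    const Slice* rhs = static_cast<const Slice*>(b);
    const std::size_t n = lhs->len < rhs->len ? lhs->len : rhs->len;
    const int cmp = std::memcmp(lhs->data, rhs->data, n);
    if (cmp != 0)
        return cmp;
    // Equal prefix: order by length.
    return (lhs->len > rhs->len) - (lhs->len < rhs->len);
}

// Sorts three two-byte slices of one buffer and checks the resulting order.
inline bool demoSortSlices()
{
    static const unsigned char buf[] = {'c', 'c', 'a', 'a', 'b', 'b'};
    Slice slices[] = {{buf, 2}, {buf + 2, 2}, {buf + 4, 2}};
    std::qsort(slices, 3, sizeof(Slice), compareSlices);
    return slices[0].data == buf + 2 && slices[1].data == buf + 4
        && slices[2].data == buf;
}
```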
You could use a hack like:
int buffcmp(const void *b1, const void *b2)
{
static int bsize=-1;
if(b2==NULL) {bsize=*(int*)(b1); return 0;}
return memcmp(b1, b2, bsize);
}
which you would first call as buffcmp(&bsize, NULL) and then pass it as the comparison function to qsort.
You could of course make the comparison behave more naturally in the case of buffcmp(NULL, NULL) etc by adding more if statements.
You could use functors (give the length to the functor's constructor) or Boost.Lambda (use the length in-place).
I'm not clear on what you're asking. But I'll try, assuming that
You have a single buffer
You have an array of pointers of some kind which has been processed in some way so that some or all of its contents point into the buffer
That is code equivalent to:
char *buf = (char*)malloc(sizeof(char)*bufsize);
for (int i=0; i<bufsize; ++i){
buf[i] = some_cleverly_chosen_value(i);
}
char *ary[arraysize] = {0};
for(int i=0; i<arraysize; ++i){
ary[i] = buf + some_clever_function(i);
}
/* ...do the sort here */
Now if you control the allocation of the buffer, you could substitute
char *buf = (char*)malloc(sizeof(char)*(bufsize+1));
buf[bufsize]='\0';
and go ahead using strcmp. This may be possible even if you don't control the filling of the buffer.
If you have to live with a buffer handed you by someone else you can
Use some global storage (which you asked to avoid and good thinking).
Hand the sort function something more complicated than a raw pointer (the address of a struct or class that supports the extra data). For this you need to control the definition of ary in the above code.
Use a sort function which supports an extra input. Either qsort_r as suggested by Adam, or a home-rolled solution (which I do recommend as an exercise for the student, and don't recommend in real life). In either case the extra data is probably a pointer to the end of the buffer.
memcmp should stop on the first byte that is unequal, so the length should be large, i.e. to-the-end-of-the-buffer. Then the only way it can return zero is if it does go to the end of the buffer.
(BTW, I lean toward merge sort myself. It's stable and well-behaved.)