Byte to bits Operator Overloading C++

I've been writing C++ a long time and maybe it's because I don't need to do this very often, but I seem to be lacking with regard to operator overloading. I use it from time to time, but never needed to do what I wanted to do recently and found it somewhat problematic.
class foo
{
public:
static const size_t ARRAY_SIZE = 100000;
uint8_t& operator[](const size_t& index) { return my_array[index >> 3]; }
// problematic equality operator
bool operator==(const size_t& index) const { return my_array[index >> 3] & (1 << (index & 7)); }
//
// Need an assignment operator to do:
// my_array[index >> 3] |= 1 << (index & 7);
// ^------------------^ might not needed as it's returned from [] operator
private:
std::array<uint8_t, (ARRAY_SIZE >> 3) + ((ARRAY_SIZE & 7) ? 1 : 0)> my_array;
};
Now as you can see from the above, what is being done here is to take a size_t number and store it in its relative bit position. So 5, for instance, would be stored in bit 5 of byte 0, and 9 would be stored in bit 1 of byte 1 in the array, etc.
Now the subscript operator works fine and returns the correct byte from the array, but that left the problem of things like this:
if (foo[n]) // where n is a size_t integer representing a bit position
It then dawned on me that the above is an abbreviated form of:
if (foo[n] == true)
and so that led to me writing the above equality operator, but for some reason I don't understand, the operator isn't called. I thought it would have been called following the subscript operator, or is it not called because it's not an object of type foo anymore? What's the best way to fix this? Is it to write an external operator== and make it a friend of foo?
Oh and some pointers regarding the construction of the assignment operator would be appreciated too. Thanks very much...
EDIT:
Thanks for all the help people. I do think it's incredibly harsh to get downvoted for asking a question about something I didn't quite understand. It's not like it was a stupid question or anything and if you re-read my original question properly, I did actually question that foo might not be the correct type after the subscript operator, that a few of you have pointed out. Anyway, here's a bit more context. I haven't had chance to properly study all the great replies...
I did originally write the operator like this, which did actually return the correct bit from the array. Something someone has already pointed out.
bool operator[](const size_t index) const { return my_array[index >> 3] & (1 << (index & 7)); }
What I then had a problem with was setting the bits in the array:
foo f;
if (f[3]) // this is fine
But doing something like:
f[6] = true;
I guess what I was hoping for was a more elegant way of doing this than writing the following:-
class Foo
{
public:
static const size_t MAX_LIST_SIZE = 100000;
bool get(const size_t index) const { return my_array[index >> 3] & (1 << (index & 7)); }
void set(const size_t index) { my_array[index >> 3] |= 1 << (index & 7); }
private:
std::array<uint8_t, ((MAX_LIST_SIZE >> 3) + ((MAX_LIST_SIZE & 7) ? 1 : 0))> my_array;
};
and then using the class like this:
Foo f;
f.set(10);
if (f.get(10))
...
I just thought it would be easier to overload the operators, but from the look of it, it seems more cumbersome. (Oh, and someone asked why I used uint8_t rather than bool: this is because on this particular platform, bool is actually 32 bits!)

Here we have several deep-ish misunderstandings.
Now the subscript operator works fine and returns the correct byte
from the array, but that left the problem of things like this:
if (foo[n]) // where n is a size_t integer representing a bit position
Your problem here is not the if per se; it's that you are returning the wrong thing. If you are building a packed bit set, your operator[] should just return the value of the bit at the requested position. So:
bool operator[](size_t index) { return (my_array[index >> 3]) & (1<<(index&7)); }
and here your if, as well as any other operation involving your operator[], will work as expected.
It then dawned on me that the above is an abbreviated form of:
if (foo[n] == true)
It is not. if evaluates the expression inside the parentheses and (essentially) converts it to a boolean; if the result is true, it executes the branch, otherwise it does not.
and so that led to me writing the above equality operator, but for some reason I don't understand, the operator isn't called.
The operator isn't called because:
as explained above, the operator== is never involved in if (foo[n]);
even if you explicitly wrote if (foo[n]==true), your operator wouldn't be invoked, because once your operator[] returns, foo is no longer involved.
Think about it: even in your "original" operator[] you return a reference to uint8_t. The statement:
if (a[n] == true)
(with a being of type foo)
is effectively the same as:
uint8_t &temp = a[n];
if (temp == true)
Now, in the expression temp == true the type of a is never mentioned - there's only temp, which is a uint8_t&, independently of how it was ever obtained, and true, a bool literal. Your operator== would be considered if you were comparing a with a size_t, but that would make no sense.
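To make the point above concrete, here is a minimal sketch. Probe is a hypothetical stand-in for the question's foo, with a call counter added purely for illustration; it is not the original class.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the question's foo, with a call counter added.
struct Probe {
    std::array<std::uint8_t, 4> bytes{};  // room for 32 bits
    mutable int eq_calls = 0;             // times Probe::operator== ran

    std::uint8_t& operator[](std::size_t index) { return bytes[index >> 3]; }

    bool operator==(std::size_t index) const {
        ++eq_calls;
        return (bytes[index >> 3] & (1u << (index & 7))) != 0;
    }
};

inline void demo_equality_not_called() {
    Probe p;
    p[9] = 0x02;                        // assigns the whole of byte 1

    bool taken = false;
    if (p[9]) taken = true;             // uint8_t converted to bool
    assert(taken);
    assert(p.eq_calls == 0);            // Probe::operator== was not used

    const bool same = (p[9] == true);   // built-in comparison: 2 == 1
    assert(!same);
    assert(p.eq_calls == 0);            // still not used

    assert(p == std::size_t{9});        // only this form reaches the member
    assert(p.eq_calls == 1);
}
```

Note that p[9] == true is false even though the byte is non-zero, because the built-in comparison promotes both sides to int and compares 2 with 1.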
Finally, about your comment:
// Need an assignment operator to do:
// my_array[index >> 3] |= 1 << (index & 7);
// ^------------------^ might not needed as it's returned from [] operator
this, again, won't work for the exact same reason - you need an operator overload to work on the return value of operator[], not on the foo class itself.
This is generally accomplished by having operator[] return not the value itself, but a proxy object, which remembers its parent and the requested index, and provides its own operator== and operator= that perform what you were trying to put straight in the foo class (along with extra operators that make it possible for it to pass for a reference to a boolean).
Something like:
struct PackedBitVector {
static const size_t ARRAY_SIZE = 100000;
struct ElementProxy {
PackedBitVector &parent;
size_t idx;
operator bool() const { return parent.data[idx>>3] & (1<<(idx&7)); }
bool operator==(bool other) const { return bool(*this) == other; }
bool operator!=(bool other) const { return !(*this == other); }
ElementProxy &operator=(bool other) {
if(other) parent.data[idx>>3] |= 1<<(idx&7);
else parent.data[idx>>3] &= ~(1<<(idx&7));
return *this;
}
};
ElementProxy operator[](size_t index) { return ElementProxy{*this, index}; }
private:
std::array<uint8_t, (ARRAY_SIZE >> 3) + ((ARRAY_SIZE & 7) ? 1 : 0)> data;
};
To make this work in general you'd have to add a full bucket of other operators, so that this proxy object could credibly pass as a reference to a bool, which is what std::vector<bool> does.
About this, from your remark about bool being 32 bit wide on your platform you seem not to know that std::vector<bool> already sports this "packed bit array" space optimization, so you could directly use it, without reimplementing a broken version of the real thing.
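As a rough sketch of what using the standard containers looks like: std::vector<bool> is typically implemented as a packed bit array, and std::bitset is the fixed-size counterpart when the bit count is known at compile time. The constant kBitCount and the function name are made up for this example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBitCount = 100000;  // mirrors ARRAY_SIZE in the question

inline void demo_standard_bit_containers() {
    // Runtime-sized; operator[] returns a proxy, so assignment works.
    std::vector<bool> dynamic_bits(kBitCount);  // all bits start as false
    dynamic_bits[6] = true;
    dynamic_bits[6] = false;
    dynamic_bits[10] = true;
    assert(!dynamic_bits[6]);
    assert(dynamic_bits[10]);

    // Compile-time-sized alternative with the same proxy-style operator[].
    std::bitset<kBitCount> fixed_bits;          // all bits start as false
    fixed_bits[10] = true;
    fixed_bits.set(3);
    assert(fixed_bits.test(10));
    assert(fixed_bits[3]);
    assert(!fixed_bits[6]);
    assert(fixed_bits.count() == 2);
}
```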

Related

Conversion of uintptr_t into bool slows down SSO benchmark several times

Consider the following class that implements (very basically for the sake of MCVE) small string optimization (assuming little endian, 64 bit pointers, etc.):
class String {
char* data_;
bool sso() const { return reinterpret_cast<uintptr_t>(data_) & 1; }
public:
String(const char * arg = "") {
auto len = strlen(arg);
if (len > 6) {
data_ = new char[len + 1];
memcpy(data_, arg, len + 1);
}
else {
data_ = reinterpret_cast<char*>((uintptr_t)1);
memcpy(reinterpret_cast<char*>(&data_) + 1, arg, len + 1);
}
}
~String() { if (sso() == false) delete data_; }
// ~String() { if (reinterpret_cast<uintptr_t>(data_) & 1 == 0) delete data_; }
};
Note that there are 2 versions of the destructor. When I measured the difference between these 2 versions with Quick C++ Benchmark:
static void CreateShort(benchmark::State& state) {
for (auto _ : state) {
String s("hello");
benchmark::DoNotOptimize(s);
}
}
I got a 5.7 times faster running time in the second case with GCC. I don't understand why the compiler cannot generate the same optimized assembly here. What hinders compiler optimizations when the result of the bitwise AND operation is additionally converted into bool? (Though I am not an assembler expert, I can see some differences in the assembly outputs for both variants, but cannot figure out why they are there.)
Benchmark link: http://quick-bench.com/wZhYuffRc1LMwFJ4rx4Xxy330Sw
Godbolt link: https://godbolt.org/z/dAUI_u
With Clang, there is no difference and both variants are fast.
The problem is with conversion to bool, not with inlining. The destructor of the following form causes the same problem:
~String() { if ((bool)(reinterpret_cast<uintptr_t>(data_) & 1) == false) delete data_; }
For this code:
if (reinterpret_cast<uintptr_t>(data_) & 1 == 0) delete data_;
it can be optimized out entirely: 1 == 0 is always 0, and x & 0 is always false for all x. The first case is slower because it is actually doing something.
I suppose you meant:
if ( (reinterpret_cast<uintptr_t>(data_) & 1) == 0) delete data_;
A mnemonic I use for the precedence of & and | is to recall that in the precursors of C there were no separate & and && operators; the & operator fulfilled both roles (and you manually converted to the boolean range if you wanted a logical comparison). So x == y & z == w was normal code for checking whether both equalities held.
When && was introduced, in order to avoid breaking existing code, && was given lower precedence than &, but & itself remained unchanged, below ==.
The C++ language did not alter these precedences either; presumably this was deliberate, to minimize incompatibilities between the two languages.
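A small sketch of the two readings, with the grouping written out explicitly so it compiles without precedence warnings. The function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// How the compiler groups `x & 1 == 0`: the comparison binds first,
// so this is x & 0, which is zero for every x.
constexpr bool parsed_as_written(std::uintptr_t x) {
    return (x & static_cast<std::uintptr_t>(1 == 0)) != 0;
}

// What was presumably intended: mask the low bit, then compare with zero.
constexpr bool low_bit_clear(std::uintptr_t x) {
    return (x & 1) == 0;
}

inline void demo_precedence() {
    assert(!parsed_as_written(0x1000));  // always false ...
    assert(!parsed_as_written(0x1001));  // ... regardless of the low bit
    assert(low_bit_clear(0x1000));       // even address: heap pointer case
    assert(!low_bit_clear(0x1001));      // odd value: SSO tag set
}
```

With the first form the delete can never run, which is why the compiler is free to drop the branch entirely.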

Custom implementation of a bool vector with bit representation - how to implement operator[]

Disclaimer - this is a school assignment, however the problem is still interesting I hope!
I have implemented a custom class called Vector<bool>, which stores the bool entries as bits in an array of numbers.
Everything has gone fine except for implementing this:
bool& operator[](std::size_t index) {
validate_bounds(index);
???
}
The const implementation is quite straightforward, just reading out the value. Here, however, I can't really understand what to do, and the course is a specialization course on C++, so I'm guessing I should do some typedef-ing or something. The data is represented by an array of type unsigned int and should be dynamic (e.g. push_back(bool value) should be implemented).
I solved this implementing a proxy class:
class BoolVectorProxy {
public:
explicit BoolVectorProxy(unsigned int& reference, unsigned char index) {
this->reference = &reference;
this->index = index;
}
void operator=(const bool v) {
if (v) *reference |= 1 << index;
else *reference &= ~(1 << index);
}
operator bool() const {
return (*reference >> index) & 1;
}
private:
unsigned int* reference;
unsigned char index;
};
And inside the main class:
BoolVectorProxy operator[](std::size_t index) {
validate_bounds(index);
return BoolVectorProxy(array[index / BLOCK_CAPACITY], index % BLOCK_CAPACITY);
}
I also use Catch as a testing library, the code passes this test:
TEST_CASE("access and assignment with brackets", "[Vector]") {
Vector<bool> a(10);
a[0] = true;
a[0] = false;
REQUIRE(!a[0]);
a[1] = true;
REQUIRE(a[1]);
const Vector<bool> &b = a;
REQUIRE(!b[0]);
REQUIRE(b[1]);
a[0] = true;
REQUIRE(a[0]);
REQUIRE(b[0]);
REQUIRE(b.size() == 10);
REQUIRE_THROWS(a[-1]);
REQUIRE_THROWS(a[10]);
REQUIRE_THROWS(b[-1]);
REQUIRE_THROWS(b[10]);
}
If anyone finds any issues or improvements that can be made, please comment, thanks!
Implementing the non-const operator[] is basically the same as implementing the const one, as you might expect; the difference is that one must return something writable (an lvalue) while the other is read-only (an rvalue).
I think you've understood the problem: you can extract a bool from an unsigned int with bitwise operations, and you can also say "if the nth bool is modified in X, apply a bitwise operation to X and it's done". But this operator means: I want an lvalue for the bool, so that I can modify it whenever I want and have the change reflected in the underlying integer. In other words you want a reference to a bool, or in your case a reference to a single bit, so you can modify that bit on the fly. Unfortunately you can't reference a single bit; the smallest thing you can reference is a whole byte (a char), so you would be dragging at least 7 other booleans along with it. That's not what you want.
That being said, I understand that it might be required for your assignment, but packing bools into unsigned ints looks like needless C-style optimization to me. You would be better off with a single C-style array of bools and doing the memory handling manually, because that is almost what you are doing anyway. With that approach you would also be able to reference a single boolean (and modify it) without touching the others. Is it mandatory to use an array of unsigned int for this assignment?
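For reference, here is one way the const and non-const overloads can sit side by side when the packed representation is kept. This is only a sketch: BitVector, BitRef and every member name are hypothetical, and std::vector is used for storage to keep the example short, which the assignment may not allow.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical packed bit vector; names are illustrative only.
class BitVector {
    using Block = unsigned int;
    static constexpr std::size_t kBlockBits = sizeof(Block) * CHAR_BIT;

    std::vector<Block> blocks_;
    std::size_t size_ = 0;

    void check(std::size_t i) const {
        if (i >= size_) throw std::out_of_range("BitVector index");
    }

public:
    // Stands in for bool&: remembers the block and the mask of one bit.
    class BitRef {
        Block& block_;
        Block mask_;
    public:
        BitRef(Block& block, Block mask) : block_(block), mask_(mask) {}
        BitRef(const BitRef&) = default;
        BitRef& operator=(bool v) {
            if (v) block_ |= mask_; else block_ &= ~mask_;
            return *this;
        }
        // Assigning one proxy to another copies the bit value.
        BitRef& operator=(const BitRef& other) {
            return *this = static_cast<bool>(other);
        }
        operator bool() const { return (block_ & mask_) != 0; }
    };

    explicit BitVector(std::size_t n = 0)
        : blocks_((n + kBlockBits - 1) / kBlockBits), size_(n) {}

    std::size_t size() const { return size_; }

    void push_back(bool v) {
        if (size_ % kBlockBits == 0) blocks_.push_back(0);  // need a new block
        ++size_;
        (*this)[size_ - 1] = v;
    }

    // Read-only access returns the value itself.
    bool operator[](std::size_t i) const {
        check(i);
        return ((blocks_[i / kBlockBits] >> (i % kBlockBits)) & 1u) != 0;
    }

    // Writable access returns the proxy.
    BitRef operator[](std::size_t i) {
        check(i);
        return BitRef(blocks_[i / kBlockBits], Block{1} << (i % kBlockBits));
    }
};

inline void demo_bit_vector() {
    BitVector a(10);
    a[1] = true;
    const BitVector& b = a;
    assert(b[1] && !b[0]);
    a[0] = a[1];            // proxy-to-proxy assignment copies the bit
    assert(b[0]);
}
```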

C++ Custom std::map<> key class causing memory violation

For the first time I've written a class that is supposed to be usable as a key type for std::map<>. I've overloaded copy constructor, assignment, and operator < as suggested in other questions on SO. But for some reason it crashes when I'm trying to insert using operator []. This class is meant to hold a buffer of binary data whose length is indicated by the member m_nLen.
Here is the code :
class SomeKeyClass
{
public:
unsigned char m_buffer[ SOME_LENGTH_CONSTANT ];
size_t m_nLen;
public:
inline SomeKeyClass( const unsigned char * data, size_t nLen )
{
m_nLen = min( SOME_LENGTH_CONSTANT, nLen );
memcpy( m_buffer, data, m_nLen );
}
inline SomeKeyClass( const SomeKeyClass& oKey )
{
*this = oKey;
}
inline bool operator < ( const SomeKeyClass& oKey ) const
{
return memcmp( m_buffer, oKey.m_buffer, min( m_nLen, oKey.m_nLen ) ) < 0;
}
inline SomeKeyClass & operator = ( const SomeKeyClass& oKey )
{
memcpy( m_buffer, oKey.m_buffer, oKey.m_nLen );
return *this;
}
};
Is there anything wrong with this class? Could I use std::basic_string<unsigned char> for using binary data as keys instead?
The issue is that you are not setting the m_nLen member in the copy constructor or the assignment operator. Thus whenever you use an object that has an uninitialized or wrong m_nLen value, things may go wrong, leading to possible crashes (in general, undefined behavior).
When implementing a user-defined copy constructor and assignment operator, you should strive to make sure that what comes out at the end is an actual copy of the object in question (reference counted objects are a special case, but it still implies that a copy is being done). Otherwise, programs that produce incomplete or wrong copies of the object are very fragile, and an awful burden to debug.
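A minimal sketch of copy operations that produce a complete copy, assuming a value of 16 for SOME_LENGTH_CONSTANT (the question does not give one). The demo function is invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t SOME_LENGTH_CONSTANT = 16;  // assumed; not in the question

class SomeKeyClass {
public:
    unsigned char m_buffer[SOME_LENGTH_CONSTANT];
    std::size_t m_nLen;

    SomeKeyClass(const unsigned char* data, std::size_t nLen)
        : m_nLen(std::min(SOME_LENGTH_CONSTANT, nLen)) {
        std::memcpy(m_buffer, data, m_nLen);
    }

    // Copy the length as well as the bytes.
    SomeKeyClass(const SomeKeyClass& oKey) : m_nLen(oKey.m_nLen) {
        std::memcpy(m_buffer, oKey.m_buffer, m_nLen);
    }

    SomeKeyClass& operator=(const SomeKeyClass& oKey) {
        if (this != &oKey) {  // memcpy must not be given overlapping ranges
            m_nLen = oKey.m_nLen;
            std::memcpy(m_buffer, oKey.m_buffer, m_nLen);
        }
        return *this;
    }
};

inline void demo_copy() {
    const unsigned char raw[] = {'a', 'b', 'c'};
    SomeKeyClass original(raw, 3);
    SomeKeyClass copy(original);
    assert(copy.m_nLen == 3);
    assert(std::memcmp(copy.m_buffer, raw, 3) == 0);
}
```

Since both members are trivially copyable, simply omitting the two copy operations and relying on the compiler-generated ones would give the same result.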
See Paul McKenzie's answer for the reason it crashes.
Is there anything wrong with this class ?
Yes, your operator< is broken.
Consider the case where you have one key "abc" and another key "abcd": your less-than operator will say they are equivalent, because you only test the first 3 characters.
A correct implementation needs to compare the lengths when memcmp says they are equal, because the memcmp call doesn't necessarily compare the full strings:
bool operator<(const SomeKeyClass& oKey) const
{
const std::size_t len = std::min(m_nLen, oKey.m_nLen);
if (len > 0)
{
const int cmp = memcmp(m_buffer, oKey.m_buffer, len);
if (cmp != 0)
return cmp < 0;
}
return m_nLen < oKey.m_nLen;
}
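To see the effect on std::map, here is a sketch using a trimmed-down, hypothetical BinaryKey type (built from a C string for brevity) with the same comparison logic as above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>

// Hypothetical, simplified key type; relies on the implicit copy operations.
struct BinaryKey {
    unsigned char buf[16];
    std::size_t len;

    explicit BinaryKey(const char* s)
        : len(std::min<std::size_t>(sizeof buf, std::strlen(s))) {
        std::memcpy(buf, s, len);
    }

    bool operator<(const BinaryKey& o) const {
        const std::size_t n = std::min(len, o.len);
        if (n > 0) {
            const int cmp = std::memcmp(buf, o.buf, n);
            if (cmp != 0) return cmp < 0;
        }
        return len < o.len;  // common prefix equal: the shorter key sorts first
    }
};

inline void demo_prefix_keys() {
    std::map<BinaryKey, int> m;
    m[BinaryKey("abc")] = 1;
    m[BinaryKey("abcd")] = 2;   // distinct key, not a duplicate of "abc"
    m[BinaryKey("abc")] = 3;    // overwrites the first entry
    assert(m.size() == 2);
    assert(m[BinaryKey("abc")] == 3);
    assert(m[BinaryKey("abcd")] == 2);
}
```

With the original prefix-only comparison, "abc" and "abcd" would be treated as the same key and the map would end up with a single entry.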

2 Nibbles Struct, assign operator and bitset. How to?

I have an old device which sends, through a serial port, a large array of 7-bit bytes (the most significant bit is always 0). Each byte is sent split into two nibbles, so a byte 0abcdefg is received as 00000abc 0000defg.
So I created a double-byte struct, Dbyte, to store each of them:
struct Dbyte {
BYTE h;
BYTE l;
};
Now, I have 2 problems:
1) What I want to do is to access each Dbyte as a normal, basic-type variable, so I tried to overload some operators:
struct Dbyte {
BYTE h;
BYTE l;
Dbyte& operator =(const Dbyte& a) {
l = a.l; h = a.h;
return *this;
}
Dbyte& operator =(const int& i) {
l = i&0x0F; h = i>>4 & 0x07;
return *this;
}
int operator = (const Dbyte& a) const {
return (int)(a.h<<4 + a.l);
}
};
With my code I can do:
Dbyte db1, db2;
db1 = 20;
db2 = db1;
but I can't compile
int i = db1;
What am I doing wrong?
2)
Some of these Dbytes are bitfields (or the low nibble is a value and the high nibble is a bitfield).
Is there a handy way to access them as bits? I've just learned of the existence of std::bitset but I still don't know how to use it, and in my dreams I would like to write something like
if (db1[6]==true) ...
or
if (db1.bit[6]==true) ...
Is there a way to obtain something like this?
If this matters, I'm working with C++Builder 2006, but the project could be migrated to a Qt Quick app, so I'd prefer a "standard C++" solution.
Thanks
To convert your type to another, you need a conversion, not assignment, operator:
operator int() const {
return (h<<4) + l;
}
For the opposite conversion, you would be better off with a converting constructor, rather than an assignment operator:
Dbyte(int i) : h((i>>4) & 0xf), l(i & 0xf) {}
This can be used for initialising a Dbyte as well as assigning to it:
Dbyte db = 42; // not assignment - requires a constructor
Assignment still works with just this constructor, and no special assignment operator - an integer value can be converted using the constructor, then assigned with the implicit assignment operator.
Finally, there's no point writing your own copy-assignment operator here. The implicit one does exactly the same thing.
int operator = (const Dbyte& a) const tries to be a copy-assignment operator but can't be because it's const. What you wanted to write was: operator int() const
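Putting those pieces together, a sketch of the struct might look like the following. It assumes BYTE is an 8-bit unsigned type, keeps the question's 0x07 mask for the high nibble (the data is 7-bit), and adds a hypothetical bit() helper for the second part of the question. A default constructor is declared because adding the int constructor suppresses the implicit one.

```cpp
#include <cassert>
#include <cstdint>

using BYTE = std::uint8_t;  // assumption: 8-bit unsigned, as in the Windows headers

struct Dbyte {
    BYTE h = 0;  // high nibble, received as 00000abc
    BYTE l = 0;  // low nibble, received as 0000defg

    Dbyte() = default;  // keeps `Dbyte db1, db2;` compiling

    // Converting constructor: covers both initialisation and assignment from int.
    Dbyte(int i)
        : h(static_cast<BYTE>((i >> 4) & 0x07)),
          l(static_cast<BYTE>(i & 0x0F)) {}

    // Conversion operator: makes `int i = db1;` compile.
    operator int() const { return (h << 4) + l; }

    // Hypothetical helper: tests bit n (0..6) of the recombined 7-bit value.
    bool bit(unsigned n) const { return ((int(*this) >> n) & 1) != 0; }
};

inline void demo_dbyte() {
    Dbyte db1, db2;
    db1 = 20;            // int -> Dbyte via the constructor, then implicit copy
    db2 = db1;
    int i = db2;         // Dbyte -> int via the conversion operator
    assert(i == 20);
    assert(db1.h == 1 && db1.l == 4);   // 20 is 0x14
    assert(db1.bit(4) && db1.bit(2) && !db1.bit(6));
}
```

If the std::bitset interface is preferred, constructing a std::bitset<7> from the converted integer value gives the db[6] style of access on a copy of the bits.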
I think your overloaded = operator for int is out of scope because you placed it in the struct for Dbyte. For your Dbyte[i] == true concept, try overloading the [] operator to return a boolean.

How do I set an integer's interval?

I know this is a very noob-ish question, but how do I define an integer's interval?
If I want an integer X to satisfy 56 <= X <= 1234, how do I declare X?
The best way would be to create your own integer class with bounds on it and overloaded operators like +, * and ==; basically all the operations a normal integer supports. You will have to decide the behavior when the number gets too high or too low. I'll give you a start on the class:
struct mynum {
int value;
static const int upper = 100000;
static const int lower = -100000;
operator int() const {
return value;
}
explicit mynum(int v) {
value=v;
if (value > upper)value=upper;
if (value < lower)value=lower;
}
};
mynum operator +(const mynum & first, const mynum & second) {
return mynum(first.value + second.value);
}
There is already a question on Stack Overflow like yours. It has a more complete version of what I was doing; it may be a little hard to digest for a beginner, but it seems to be exactly what you want.