Bit Array Program - c++

So I'm in a summer OO class and we have a test tomorrow based around this project. Basically we need to create an array that holds an unspecified amount of bits and write four functions that perform operations on this array- Set() //set bit with given index to 1, Unset() //set bit with given index to 0, Flip() // change bit (with given index) and Query() // return true if the given bit is set to 1, false otherwise.
Here's a complete description if anyone is interested: http://pastebin.com/v7BCCYjh and some sample runs: http://pastebin.com/1ijh5p7p
The problem I'm having is with the high level concept. I'm pretty sure we're meant to store byte representations of the bits in each index of the array. If that is true, then I'm completely at a loss for how to implement the functions. If anyone can give me some pointers on how to approach this (I need to have a good understanding of it by tonight because I have to write out some pseudo code for it tomorrow for a midterm) I would be much, much appreciative.
Here's my .h if it helps
// bitarray.h
//
// BitArray class declaration
#ifndef _BITARRAY_H
#define _BITARRAY_H
#include <iostream>
using namespace std;
class BitArray
{
friend ostream& operator<< (ostream& os, const BitArray& a);
friend bool operator== (const BitArray&, const BitArray&);
friend bool operator!= (const BitArray&, const BitArray&);
public:
BitArray(unsigned int n); // Construct an array that can handle n bits
BitArray(const BitArray&); // copy constructor
~BitArray(); // destructor
BitArray& operator= (const BitArray& a); // assignment operator
unsigned int Length() const; // return number of bits in bitarray
void Set (unsigned int index); // set bit with given index to 1
void Unset (unsigned int index); // set bit with given index to 0
void Flip (unsigned int index); // change bit (with given index)
bool Query (unsigned int index) const; // return true if the given bit
// is set to 1, false otherwise
private:
unsigned char* barray; // pointer to the bit array
int arraySize;
};
#endif
And my constructor:
BitArray::BitArray(unsigned int n){
int size = sizeof(char);
if(n%(8*size) != 0)
arraySize = ((n/(8*size))+1);
else
arraySize = n/(8*size);
barray = new unsigned char[arraySize];
for(int i = 0; i < arraySize; i++)
barray[i] = 0;
}

For Set() and Query(), find the position of the word that holds the bit you are interested in. (Your code seems to use char as words.) Then, find the position of the bit within this word. Create a bitmask that addresses the specific bit, you will need a shifting operator for this. Recall the bitwise operators which will finally help you do the job. Sometimes the bitwise assignment operators will be more elegant.
Do you remember the bitwise XOR operator in C++? Use this with the concept learned from Set() to implement Flip(). Use the bitwise negation operator to finally implement Unset().
Note that your way of determining the array size is overly complicated. Recall that ceil(a/b) == floor((a+b-1)/b) in the cases that can happen here.
Consider using std::vector instead of a plain array if you are allowed to. SPOILER BELOW!
There is also an interesting specialization of this class.
Impress your teacher by turning your class into a template where you can specify the actual storage unit (char, uint16_t, ...) as parameter. For starters, say typedef char WORD_TYPE and see if your code later compiles when you change the definition of WORD_TYPE.
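The hints above can be sketched roughly as follows. This is a minimal illustration, not the assignment's reference solution: the class name BitArraySketch, the helpers byte_of and mask_of, and the use of std::vector for storage are all assumptions made for this example. Bit 0 is taken to be the least significant bit of the first byte.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's BitArray (names are assumptions).
class BitArraySketch {
public:
    // Round up: ceil(n / CHAR_BIT) == (n + CHAR_BIT - 1) / CHAR_BIT.
    explicit BitArraySketch(std::size_t n)
        : bits_(n), bytes_((n + CHAR_BIT - 1) / CHAR_BIT, 0) {}

    std::size_t Length() const { return bits_; }

    // OR with the mask turns the bit on.
    void Set(std::size_t i)   { bytes_[byte_of(i)] |= mask_of(i); }
    // AND with the negated mask turns the bit off.
    void Unset(std::size_t i) { bytes_[byte_of(i)] &= static_cast<unsigned char>(~mask_of(i)); }
    // XOR with the mask inverts the bit.
    void Flip(std::size_t i)  { bytes_[byte_of(i)] ^= mask_of(i); }
    // AND with the mask isolates the bit.
    bool Query(std::size_t i) const { return (bytes_[byte_of(i)] & mask_of(i)) != 0; }

private:
    // Index of the byte ("word") that holds bit i.
    std::size_t byte_of(std::size_t i) const {
        assert(i < bits_);
        return i / CHAR_BIT;
    }
    // Mask with only the bit for position i set within its byte.
    static unsigned char mask_of(std::size_t i) {
        return static_cast<unsigned char>(1u << (i % CHAR_BIT));
    }

    std::size_t bits_;
    std::vector<unsigned char> bytes_;
};
```

Using a vector sidesteps the copy constructor, destructor and assignment operator that the original header has to write by hand; with a raw new[] array the four bit operations would look the same.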

You could treat an array of integers as an array of bits.
Say you have an array A = [0xC30FF0C3, 0xC20FF0C3] and you want to access bit 53 (counting from 0).
You can find the index of the int that holds bit 53 by computing floor(53 / 32), which is 1, and the bit position within that int by computing 53 % 32, which is 21.
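As a small sketch of that index arithmetic, assuming 32-bit words and that bit 0 is the least significant bit of the first word (the answer does not say which end it counts from), a hypothetical helper query_bit could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads bit `pos` from an array of 32-bit words,
// assuming bit 0 is the least significant bit of words[0].
bool query_bit(const std::uint32_t* words, std::size_t word_count, std::size_t pos) {
    const std::size_t word_index = pos / 32;  // which int holds the bit
    const std::size_t bit_offset = pos % 32;  // position inside that int
    assert(word_index < word_count);
    return ((words[word_index] >> bit_offset) & 1u) != 0;
}
```

With the array from the example, query_bit(A, 2, 53) looks at bit 21 of 0xC20FF0C3, which is 0 under this bit-numbering assumption.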
As for the Flip function...
Well, you already have Query(), Set(), Unset().
Simple
void BitArray::Flip(unsigned int i) {
Query(i) ? Unset(i) : Set(i);
}
would do the job.

Related

Byte to bits Operator Overloading C++

I've been writing C++ a long time and maybe it's because I don't need to do this very often, but I seem to be lacking with regard to operator overloading. I use it from time to time, but never needed to do what I wanted to do recently and found it somewhat problematic.
class foo
{
public:
static const size_t ARRAY_SIZE = 100000;
uint8_t& operator[](const size_t& index) { return my_array[index >> 3]; }
// problematic equality operator
bool operator==(const size_t& index) const { return my_array[index >> 3] & (1 << (index & 7)); }
//
// Need an assignment operator to do:
// my_array[index >> 3] |= 1 << (index & 7);
// ^------------------^ might not needed as it's returned from [] operator
private:
std::array<uint8_t, (ARRAY_SIZE >> 3) + ((ARRAY_SIZE & 7) ? 1 : 0)> my_array;
};
Now as you can see from the above, what is being done here is to take a size_t number and store it in its relative bit position. So 5, for instance, would be stored in bit 5 of byte 0, and 9 would be stored in bit 1 of byte 1 in the array, etc.
Now the subscript operator works fine and returns the correct byte from the array, but that left the problem of things like this:
if (foo[n]) // where n is a size_t integer representing a bit position
It then dawned on me that the above is an abbreviated form of:
if (foo[n] == true)
and so that led to me writing the above equality operator, but for some reason I don't understand, the operator isn't called. I thought it would have been called following the subscript operator, or is it not called because it's not an object of type foo anymore? What's the best way to fix this? Is it to write an external operator== and make it a friend of foo?
Oh and some pointers regarding the construction of the assignment operator would be appreciated too. Thanks very much...
EDIT:
Thanks for all the help people. I do think it's incredibly harsh to get downvoted for asking a question about something I didn't quite understand. It's not like it was a stupid question or anything and if you re-read my original question properly, I did actually question that foo might not be the correct type after the subscript operator, that a few of you have pointed out. Anyway, here's a bit more context. I haven't had chance to properly study all the great replies...
I did originally write the operator like this, which did actually return the correct bit from the array. Something someone has already pointed out.
bool operator[](const size_t index) const { return my_array[index >> 3] & (1 << (index & 7)); }
What I then had a problem with was setting the bits in the array:
foo f;
if (f[3]) // this is fine
But doing something like:
f[6] = true;
I guess what I was hoping for was a more elegant way of doing this than writing the following:-
class Foo
{
public:
static const size_t MAX_LIST_SIZE = 100000;
bool get(const size_t index) const { return my_array[index >> 3] & (1 << (index & 7)); }
void set(const size_t index) { my_array[index >> 3] |= 1 << (index & 7); }
private:
std::array<uint8_t, ((MAX_LIST_SIZE >> 3) + ((MAX_LIST_SIZE & 7) ? 1 : 0))> my_array;
};
and then using the class like this:
Foo f;
f.set(10);
if (f.get(10))
...
I just thought it would be easier to overload the operators, but from the look of it, that seems more cumbersome. (Oh, and someone asked why I used uint8_t rather than bool: this is because on this particular platform, bool is actually 32 bits!)
Here we have several deep-ish misunderstandings.
Now the subscript operator works fine and returns the correct byte
from the array, but that left the problem of things like this:
if (foo[n]) // where n is a size_t integer representing a bit position
Your problem here is not the if per se; it's that you are returning the wrong thing. If you are building a packed bit set, your operator[] should just return the value of the bit at the requested position. So:
bool operator[](size_t index) { return (my_array[index >> 3]) & (1<<(index&7)); }
and here your if, as well as any other operation involving your operator[], will work as expected.
It then dawned on me that the above is an abbreviated form of:
if (foo[n] == true)
It is not. if evaluates the expression inside the parentheses and (essentially) casts it to a boolean; if the result is true, it executes the branch, otherwise it does not.
and so that led to me writing the above equality operator, but for some reason I don't understand, the operator isn't called.
The operator isn't called because:
as explained above, the operator== is never involved in if (foo[n]);
even if you explicitly wrote if (foo[n]==true), your operator wouldn't be invoked, because once your operator[] returns, foo is no longer involved.
Think about it: even in your "original" operator[] you return a reference to uint8_t. The statement:
if (a[n] == true)
(with a being of type foo)
is effectively the same as:
uint8_t &temp = a[n];
if (temp == true)
Now, in the expression temp == true the type of a is never mentioned: there's only temp, which is a uint8_t&, regardless of how it was obtained, and true, a bool literal. Your operator== would be considered if you were comparing a with a size_t, but that would make no sense.
Finally, about your comment:
// Need an assignment operator to do:
// my_array[index >> 3] |= 1 << (index & 7);
// ^------------------^ might not needed as it's returned from [] operator
this, again, won't work for the exact same reason - you need an operator overload to work on the return value of operator[], not on the foo class itself.
This is generally accomplished by having operator[] return not the value itself, but a proxy object, which remembers its parent and the requested index, and provides its own operator== and operator= that perform what you were trying to put straight in the foo class (along with extra operators that make it possible for it to pass for a reference to a boolean).
Something like:
struct PackedBitVector {
static const size_t ARRAY_SIZE = 100000;
struct ElementProxy {
PackedBitVector &parent;
size_t idx;
operator bool() const { return parent.data[idx>>3] & (1<<(idx&7)); }
bool operator==(bool other) const { return bool(*this) == other; }
bool operator!=(bool other) const { return !(*this == other); }
ElementProxy &operator=(bool other) {
if(other) parent.data[idx>>3] |= 1<<(idx&7);
else parent.data[idx>>3] &= ~(1<<(idx&7));
return *this;
}
};
ElementProxy operator[](size_t index) { return ElementProxy{*this, index}; }
private:
std::array<uint8_t, (ARRAY_SIZE >> 3) + ((ARRAY_SIZE & 7) ? 1 : 0)> data;
};
To make this work in general you'd have to add a full bucket of other operators, so that this proxy object could credibly pass as a reference to a bool, which is what std::vector<bool> does.
On that note: from your remark about bool being 32 bits wide on your platform, you seem not to know that std::vector<bool> already provides this "packed bit array" space optimization, so you could use it directly instead of reimplementing a broken version of the real thing.
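For comparison, here is a short sketch of the same get/set usage with std::vector<bool>. The helper names make_bits and count_set are made up for this example. Implementations typically pack the elements one bit each, although the standard only recommends that optimization rather than requiring it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many bits are set.
std::size_t count_set(const std::vector<bool>& bits) {
    std::size_t n = 0;
    for (bool b : bits) {
        if (b) ++n;
    }
    return n;
}

// Hypothetical helper: builds a 100000-bit set and toggles a few entries.
std::vector<bool> make_bits() {
    std::vector<bool> bits(100000, false);
    bits[6] = true;    // assignment goes through the library's proxy reference
    bits[10] = true;
    bits[10] = false;  // clearing works the same way
    assert(bits[6]);   // reading converts the proxy to bool
    return bits;
}
```

Both f[6] = true and if (f[6]) from the question work here without any user-written operators.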

Custom implementation of a bool vector with bit representation - how to implement operator[]

Disclaimer - this is a school assignment, however the problem is still interesting I hope!
I have implemented a custom class called Vector<bool>, which stores the bool entries as bits in an array of numbers.
Everything has gone fine except for implementing this:
bool& operator[](std::size_t index) {
validate_bounds(index);
???
}
The const implementation is quite straightforward: it just reads out the value. Here, however, I can't really understand what to do, and the course is a specialization course on C++, so I'm guessing I should do some typedef-ing or something. The data is represented by an array of type unsigned int and should be dynamic (e.g. push_back(bool value) should be implemented).
I solved this implementing a proxy class:
class BoolVectorProxy {
public:
explicit BoolVectorProxy(unsigned int& reference, unsigned char index) {
this->reference = &reference;
this->index = index;
}
void operator=(const bool v) {
if (v) *reference |= 1 << index;
else *reference &= ~(1 << index);
}
operator bool() const {
return (*reference >> index) & 1;
}
private:
unsigned int* reference;
unsigned char index;
};
And inside the main class:
BoolVectorProxy operator[](std::size_t index) {
validate_bounds(index);
return BoolVectorProxy(array[index / BLOCK_CAPACITY], index % BLOCK_CAPACITY);
}
I also use Catch as a testing library, the code passes this test:
TEST_CASE("access and assignment with brackets", "[Vector]") {
Vector<bool> a(10);
a[0] = true;
a[0] = false;
REQUIRE(!a[0]);
a[1] = true;
REQUIRE(a[1]);
const Vector<bool> &b = a;
REQUIRE(!b[0]);
REQUIRE(b[1]);
a[0] = true;
REQUIRE(a[0]);
REQUIRE(b[0]);
REQUIRE(b.size() == 10);
REQUIRE_THROWS(a[-1]);
REQUIRE_THROWS(a[10]);
REQUIRE_THROWS(b[-1]);
REQUIRE_THROWS(b[10]);
}
If anyone finds any issues or improvements that can be made, please comment, thanks!
Basically, implementing the non-const operator[] is the same as implementing the const operator[], as you might expect; it's just that one returns something writable (an lvalue) and the other something read-only (an rvalue).
I think you already understand the problem: you can extract a bool from an unsigned int using bitwise operations, and you can also say "if the nth bool is modified in X, do a bitwise operation on X and it's done!". But this operator means: I want an lvalue for the bool, so that I can modify it whenever I want and have the change show up in the associated integer. In other words, you want a reference to a bool, or in your case a reference to a single bit, so you can modify that bit on the fly. Unfortunately you can't take a reference to a single bit; the smallest unit you can reference is a whole byte (a char), so you would have to take a chunk of at least 7 other booleans along with it. That's not what you want.
That being said, I understand that it might be required for your assignment, but packing bools into unsigned ints looks like a needless C-style optimization to me. You would be better off with a plain (C-style) array of bools and doing the memory handling manually, because that is almost what you are doing already. With that approach, you would actually be able to reference a single boolean (and modify it) without touching the others. Is it mandatory to use an array of unsigned int for this assignment?

C++ bitwise operations on structs and classes

I'm developing a generic Genetic Algorithm library, where the chromosome of each organism is its bit representation in memory. So, for instance, if I want to mutate an organism, I randomly flip the bits of the object itself.
At first, I tried using the bitset class from the C++ standard library, but, when converting back to an object T, my only option was using the to_ullong member function, which was a problem for representations with a number of bits larger than the size of an unsigned long long.
Then I decided to create a generic library for bitwise operations on any object T, so I could apply these operations directly onto the objects themselves, instead of converting them first to a bitset.
So you can see what I'm trying to achieve, here's a function from the library:
template<typename T>
void flip(T& x, size_t const i)
{
x ^= 1 << i;
}
And it's used in the GA library like this:
template<typename T>
void GeneticAlgorithm<T>::mutate(T& organism, double const rate)
{
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(0, 1);
for(size_t i = 0; i < m_nBits; ++i)
if(dist(mt) <= rate)
bit::flip(organism, i);
}
It would be really nice if this worked, however now I'm getting this error message from the VC++ 2015 RC compiler:
Error C2677: binary '^': no global operator found which takes type 'T' (or there is no acceptable conversion)
Project: GeneticAlgorithm, File: path\geneticalgorithm\geneticalgorithm\BitManip.hpp, Line: 57
If I correct this error for the ^, I get more for the other operators.
I haven't used bitwise operators before in my code, so I guess these operators are not supposed to be used with any object? If so, how could I work around the problem?
What you want to achieve can be done like that (see Peter Schneider's comment):
template<typename T> void flip(T& x, size_t const i) {
unsigned char* data = reinterpret_cast<unsigned char*>(&x);
data[i/8] ^= (1 << (i%8));
}
What it does is reinterpret your data x as an array of bytes (unsigned char), then determine which byte should be modified (i/8), then which bit within that byte to flip (i%8).
Note: in addition, it may be safer to add at the beginning of the function:
assert(i < sizeof(T)*8);
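Putting the byte-wise flip and the bounds check together might look like the sketch below. The name flip_bit, the static_assert on trivially copyable types, and the two-byte Gene struct are assumptions added for illustration; they are not part of the answer above.

```cpp
#include <cassert>
#include <climits>      // CHAR_BIT
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Flips bit i of the object representation of x.
// Only meaningful for types that are safe to treat as raw bytes.
template <typename T>
void flip_bit(T& x, std::size_t i) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "flip_bit expects a trivially copyable type");
    assert(i < sizeof(T) * CHAR_BIT);
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&x);
    bytes[i / CHAR_BIT] ^= static_cast<unsigned char>(1u << (i % CHAR_BIT));
}

// Hypothetical two-byte organism, used only to show the indexing.
struct Gene {
    std::uint8_t a;
    std::uint8_t b;
};
static_assert(sizeof(Gene) == 2, "example assumes no padding");
```

Flipping bit 9 of a Gene touches bit 1 of its second byte, so starting from zero, g.b becomes 2. Note that flipping arbitrary bits in types with padding, pointers or floating-point members can produce values the rest of the program does not expect.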
I am under the impression that you are not yet fully appreciating the object oriented features C++ offers. (That's not untypical when coming from a more data-centric programming in C. C++ is specifically designed to make that transition at the desired speed and to make it painless.)
My suggestion is to encapsulate the flip operation in an organism and let the organism handle it. As an illustration (untested, but compiles):
#include<climits> // CHAR_BIT
#include<cstdlib> // exit()
class string;
void log(const char *);
// inaccessible from the outside
constexpr int NUM_TRAITS = 1000;
constexpr size_t TRAIT_ARR_SZ = (NUM_TRAITS+CHAR_BIT-1)/CHAR_BIT;
class Organism
{
char traits[TRAIT_ARR_SZ];
int flips[NUM_TRAITS];
/////////////////////////////////////////////////////////////
public:
Organism() { /* set traits and flips zero */ }
// Consider a virtual function if you may derive
/** Invert the trait at index traitIndex */
void flipTrait(int traitIndex)
{
if( traitIndex >= NUM_TRAITS ) { log("trait overflow"); exit(1); }
int charInd = traitIndex / CHAR_BIT;
int bitInd = traitIndex % CHAR_BIT;
traits[charInd] ^= 1 << bitInd;
flips[traitIndex]++;
}
// Organisms can do so much more!
void display();
void store(string &path);
void load(string &path);
void mutate(float traitRatio);
Organism clone();
};

Void pointer values comparing C++

My actual question is: is it really possible to compare the values that two void pointers point to, when you know that these values are of the same type, for example int?
void compVoids(void *firstVal, void *secondVal){
if (firstVal < secondVal){
cout << "This will not make any sense as this will compare addresses, not values" << endl;
}
}
Actually I need to compare two void pointer values, while outside the function it is known that the type is int. I do not want to use comparison of int inside the function.
So this will not work for me as well: if (*(int*)firstVal > *(int*)secondVal)
Any suggestions?
Thank you very much for help!
In order to compare the data pointed to by a void*, you must know what the type is. If you know what the type is, there is no need for a void*. If you want to write a function that can be used for multiple types, you use templates:
template<typename T>
bool compare(const T& firstVal, const T& secondVal)
{
if (firstVal < secondVal)
{
// do something
}
return something;
}
To illustrate why attempting to compare void pointers blind is not feasible:
bool compare(void* firstVal, void* secondVal)
{
if (*firstVal < *secondVal) // ERROR: cannot dereference a void*
{
// do something
}
return something;
}
So, you need to know the size to compare, which means you either need to pass in a std::size_t parameter, or you need to know the type (and really, in order to pass in the std::size_t parameter, you have to know the type):
bool compare(void* firstVal, void* secondVal, std::size_t size)
{
if (0 > memcmp(firstVal, secondVal, size))
{
// do something
}
return something;
}
int a = 5;
int b = 6;
bool test = compare(&a, &b, sizeof(int)); // you know the type!
This was required in C as templates did not exist. C++ has templates, which make this type of function declaration unnecessary and inferior (templates allow for enforcement of type safety - void pointers do not, as I'll show below).
The problem comes in when you do something (silly) like this:
int a = 5;
short b = 6;
bool test = compare(&a, &b, sizeof(int)); // DOH! this will try to compare memory outside the bounds of the size of b
bool test = compare(&a, &b, sizeof(short)); // DOH! This will compare the first part of a with b. Endianness will be an issue.
As you can see, by doing this, you lose all type safety and have a whole host of other issues you have to deal with.
It is definitely possible, but since they are void pointers you must specify how much data is to be compared and how.
The memcmp function may be what you are looking for. It takes two void pointers and an argument for the number of bytes to be compared and returns a comparison. Some comparisons, however, are not contingent upon all of the data being equal. For example: comparing the direction of two vectors ignoring their length.
This question doesn't have a definite answer unless you specify how you want to compare the data.
You need to dereference them and cast, with
if (*(int*) firstVal < *(int*) secondVal)
Why do you not want to use the int comparison inside the function, if you know that the two values will be int and that you want to compare the int values that they're pointing to?
You mentioned a comparison function for comparing data on inserts; for a comparison function, I recommend this:
int
compareIntValues (void *first, void *second)
{
return (*(int*) first - *(int*) second);
}
It follows the convention of negative if the first is smaller, 0 if they're equal, positive if the first is larger. Simply call this function when you want to compare the int data.
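One caveat with returning the difference: for operands far apart (for example INT_MIN and a positive value) the subtraction overflows, which is undefined behaviour for int. A variant that keeps the same negative/zero/positive convention without subtracting is sketched here; the name compare_int_values and the const-qualified parameters (chosen to match what qsort expects) are this example's own choices.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Three-way comparison of the ints behind two void pointers.
// Returns -1, 0 or 1 and never overflows.
int compare_int_values(const void* first, const void* second) {
    assert(first != nullptr && second != nullptr);
    const int a = *static_cast<const int*>(first);
    const int b = *static_cast<const int*>(second);
    return (a > b) - (a < b);
}
```

It can be passed directly to std::qsort, e.g. std::qsort(values, count, sizeof(int), compare_int_values).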
Yes. In fact, your code is correct if the values (of type unsigned int) were themselves cast to void pointers rather than pointed to. Casting int values to void pointers is common practice, even though it is not recommended.
Also you could cast the pointers but you have to cast them directly to the int type:
if ((int)firstVal < (int)secondVal)
Note: no * at all.
You may have address model issues doing this though if you build 32 and 64 bits. Check the intptr_t type that you could use to avoid that.
if ((intptr_t)firstVal < (intptr_t)secondVal)

memcmp sort

I have a single buffer, and several pointers into it. I want to sort the pointers based upon the bytes in the buffer they point at.
qsort() and std::sort() can be given custom comparison functions. For example, if the buffer was zero-terminated I could use strcmp:
int my_strcmp(const void* a,const void* b) {
const char* const one = *(const char* const*)a;
const char* const two = *(const char* const*)b;
return ::strcmp(one,two);
}
however, if the buffer is not zero-terminated, I have to use memcmp() which requires a length parameter.
Is there a tidy, efficient way to get the length of the buffer into my comparison function without a global variable?
With std::sort, you can use a Functor like this:
struct CompString {
CompString(int len) : m_Len(len) {}
bool operator()(const char *a, const char *b) const {
return std::memcmp(a, b, m_Len) < 0;
}
private:
int m_Len;
};
Then you can do this:
std::sort(begin(), end(), CompString(4)); // all strings are 4 chars long
EDIT: following the suggestions in the comments (I guess both strings are in a common buffer?):
struct CompString {
CompString (const unsigned char* e) : end(e) {}
bool operator()(const unsigned char *a, const unsigned char *b) const {
return std::memcmp(a, b, std::min(end - a, end - b)) < 0;
}
private:
const unsigned char* const end;
};
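The same idea can also be written with a lambda that captures the length, which avoids a separate functor type. This is a sketch under assumptions: the function name sort_by_bytes is invented here, and every pointer is assumed to have at least len readable bytes after it inside the shared buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Sorts pointers into one shared buffer by the `len` bytes each points at.
// Assumes each pointer has at least `len` readable bytes after it.
void sort_by_bytes(std::vector<const unsigned char*>& ptrs, std::size_t len) {
    for (const unsigned char* p : ptrs) {
        assert(p != nullptr);
    }
    std::sort(ptrs.begin(), ptrs.end(),
              [len](const unsigned char* a, const unsigned char* b) {
                  // memcmp returns <0, 0 or >0; std::sort needs a strict "less than".
                  return std::memcmp(a, b, len) < 0;
              });
}
```

For a buffer holding the bytes "dogcatant" and pointers to offsets 0, 3 and 6, calling sort_by_bytes(ptrs, 3) reorders the pointers to offsets 6, 3, 0 (ant, cat, dog).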
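With the C function qsort(), no, there is no way to pass the length to your comparison function without using a global variable, which means it can't be done in a thread-safe manner. Some systems have a qsort_r() function (r stands for reentrant) which allows you to pass an extra context parameter, which then gets passed on to your comparison function. The argument order differs between platforms: the example below follows the BSD signature, while glibc's version takes the context as the last argument of both qsort_r() and the comparison function.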
int my_comparison_func(void *context, const void *a, const void *b)
{
return memcmp(*(const void **)a, *(const void **)b, (size_t)context);
}
qsort_r(data, n, sizeof(void*), (void*)number_of_bytes_to_compare, &my_comparison_func);
Is there a reason you can't null-terminate your buffers?
If not, since you're using C++ you can write your own function object:
struct MyStrCmp {
MyStrCmp (int n): length(n) { }
inline bool operator() (const char *lhs, const char *rhs) const {
return ::memcmp (lhs, rhs, length) < 0;
}
int length;
};
// ...
std::sort (myList.begin (), myList.end (), MyStrCmp (STR_LENGTH));
Can you pack your buffer pointer + length into a structure and pass a pointer of that structure as void *?
You could use a hack like:
int buffcmp(const void *b1, const void *b2)
{
static int bsize=-1;
if(b2==NULL) {bsize=*(int*)(b1); return 0;}
return memcmp(b1, b2, bsize);
}
which you would first call as buffcmp(&bsize, NULL) and then pass it as the comparison function to qsort.
You could of course make the comparison behave more naturally in the case of buffcmp(NULL, NULL) etc by adding more if statements.
You could use functors (give the length to the functor's constructor) or Boost.Lambda (use the length in-place).
I'm not clear on what you're asking. But I'll try, assuming that
You have a single buffer
You have an array of pointers of some kind which has been processed in some way so that some or all of its contents point into the buffer
That is code equivalent to:
char *buf = (char*)malloc(sizeof(char)*bufsize);
for (int i=0; i<bufsize; ++i){
buf[i] = some_cleverly_chosen_value(i);
}
char *ary[arraysize] = {0};
for(int i=0; i<arraysize; ++i){
ary[i] = buf + some_clever_function(i);
}
/* ...do the sort here */
Now if you control the allocation of the buffer, you could substitute
char *buf = (char*)malloc(sizeof(char)*(bufsize+1));
buf[bufsize]='\0';
and go ahead using strcmp. This may be possible even if you don't control the filling of the buffer.
If you have to live with a buffer handed you by someone else you can
Use some global storage (which you asked to avoid and good thinking).
Hand the sort function something more complicated than a raw pointer (the address of a struct or class that supports the extra data). For this you need to control the definition of ary in the above code.
Use a sort function which supports an extra input. Either qsort_r as suggested by Adam, or a home-rolled solution (which I do recommend as an exercise for the student, and don't recommend in real life). In either case the extra data is probably a pointer to the end of the buffer.
memcmp should stop on the first byte that is unequal, so the length should be large, i.e. to-the-end-of-the-buffer. Then the only way it can return zero is if it does go to the end of the buffer.
(BTW, I lean toward merge sort myself. It's stable and well-behaved.)