C++ bitwise operations on structs and classes

I'm developing a generic Genetic Algorithm library, where the chromosome of each organism is its bit representation in memory. So, for instance, if I want to mutate an organism, I randomly flip bits of the object itself.
At first, I tried using the bitset class from the C++ standard library, but, when converting back to an object T, my only option was using the to_ullong member function, which was a problem for representations with a number of bits larger than the size of an unsigned long long.
Then I decided to create a generic library for bitwise operations on any object T, so I could apply these operations directly onto the objects themselves, instead of converting them first to a bitset.
So you can see what I'm trying to achieve, here's a function from the library:
template<typename T>
void flip(T& x, size_t const i)
{
x ^= 1 << i;
}
And it's used in the GA library like this:
template<typename T>
void GeneticAlgorithm<T>::mutate(T& organism, double const rate)
{
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(0, 1);
for(size_t i = 0; i < m_nBits; ++i)
if(dist(mt) <= rate)
bit::flip(organism, i);
}
It would be really nice if this worked, however now I'm getting this error message from the VC++ 2015 RC compiler:
Severity Code Description Project File Line Error C2677 binary '^': no
global operator found which takes type 'T' (or there is no acceptable
conversion) GeneticAlgorithm path\geneticalgorithm\geneticalgorithm\BitManip.hpp 57
If I correct this error for the ^, I get more for the other operators.
I haven't used bitwise operators before in my code, so I guess these operators are not supposed to be used with any object? If so, how could I work around the problem?

What you want to achieve can be done like this (see Peter Schneider's comment):
template<typename T> void flip(T& x, size_t const i) {
unsigned char* data = reinterpret_cast<unsigned char*>(&x);
data[i/8] ^= (1 << (i%8));
}
It reinterprets your data x as an array of bytes (unsigned char), then determines which byte holds the bit (i/8) and which bit within that byte to flip (i%8).
Note: in addition, it is safer to add a bounds check at the beginning of the function:
assert(i < sizeof(T)*8);
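As a hedged sketch, the function and the bounds check can be combined as follows. The name flip_bit, the use of CHAR_BIT instead of a literal 8, and the static_assert that restricts the function to trivially copyable types are assumptions added here, not part of the answer above; flipping raw bits of any other kind of object is unlikely to leave it in a valid state.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

// flip_bit is a hypothetical name; it mirrors flip() from the answer above.
template <typename T>
void flip_bit(T& x, std::size_t i)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw bit flipping is only sketched for trivially copyable types");
    assert(i < sizeof(T) * CHAR_BIT);  // bit index must lie inside the object

    // View the object as bytes, pick the byte, then toggle one bit in it.
    unsigned char* data = reinterpret_cast<unsigned char*>(&x);
    data[i / CHAR_BIT] ^= static_cast<unsigned char>(1u << (i % CHAR_BIT));
}
```

Which value a flipped bit corresponds to depends on the byte order and layout of T, so this only suits cases such as the mutation in the question, where any bit is as good as any other.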

I am under the impression that you are not yet fully appreciating the object-oriented features C++ offers. (That's not unusual when coming from more data-centric programming in C. C++ is specifically designed to let you make that transition at your own pace and to make it painless.)
My suggestion is to encapsulate the flip operation in an organism and let the organism handle it. As an illustration (untested, but compiles):
#include<climits> // CHAR_BIT
#include<cstdlib> // exit()
class string;
void log(const char *);
// inaccessible from the outside
constexpr int NUM_TRAITS = 1000;
constexpr size_t TRAIT_ARR_SZ = (NUM_TRAITS+CHAR_BIT-1)/CHAR_BIT;
class Organism
{
char traits[TRAIT_ARR_SZ];
int flips[NUM_TRAITS];
/////////////////////////////////////////////////////////////
public:
Organism() { /* set traits and flips zero */ }
// Consider a virtual function if you may derive
/** Invert the trait at index traitIndex */
void flipTrait(int traitIndex)
{
if( traitIndex < 0 || traitIndex >= NUM_TRAITS ) { log("trait overflow"); exit(1); }
int charInd = traitIndex / CHAR_BIT;
int bitInd = traitIndex % CHAR_BIT;
traits[charInd] ^= 1 << bitInd; // index the byte that holds the bit, not the trait number
flips[traitIndex]++;
}
// Organisms can do so much more!
void display();
void store(string &path);
void load(string &path);
void mutate(float traitRatio);
Organism clone();
};
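The declared mutate member could look roughly like the sketch below. It is not the answer's implementation: SketchOrganism, kNumTraits and countSetTraits are hypothetical names, the rate is a double as in the question, and the random engine is passed in by the caller (an assumption) so that it is not re-seeded on every call.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <random>

// Hypothetical stand-ins for NUM_TRAITS and TRAIT_ARR_SZ above.
constexpr std::size_t kNumTraits = 1000;
constexpr std::size_t kTraitBytes = (kNumTraits + CHAR_BIT - 1) / CHAR_BIT;

class SketchOrganism
{
    std::array<unsigned char, kTraitBytes> traits_{};  // all bits start at zero

public:
    /** Invert the trait at index traitIndex */
    void flipTrait(std::size_t traitIndex)
    {
        assert(traitIndex < kNumTraits);
        traits_[traitIndex / CHAR_BIT] ^=
            static_cast<unsigned char>(1u << (traitIndex % CHAR_BIT));
    }

    /** Flip each trait independently with probability rate */
    template <typename Engine>
    void mutate(double rate, Engine& engine)
    {
        std::bernoulli_distribution shouldFlip(rate);
        for (std::size_t i = 0; i < kNumTraits; ++i)
            if (shouldFlip(engine))
                flipTrait(i);
    }

    /** Number of traits currently set to 1 */
    std::size_t countSetTraits() const
    {
        std::size_t count = 0;
        for (std::size_t i = 0; i < kNumTraits; ++i)
            count += (traits_[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
        return count;
    }
};
```

A caller would create one engine, for example std::mt19937 seeded from std::random_device, and reuse it for every organism.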

Related

concatenating uint16_t and uint32_t values for hashing

I am trying to concatenate (not add) two uint16_t struct members and two uint32_t struct members and assign the result to a const void *p for the purpose of hashing. The struct and the concatenation function that I am trying to implement are as follows.
struct xyz {
....
uint32_t a;
uint32_t b;
....
uint16_t c;
uint16_t d;
....
}
const void *p=concatenation(xyz.a,xyz.b,xyz.c,xyz.d)
Edited:
I have to use pre-defined hash functions. The most suitable hash function for my task seems to be this.
uint32_t hash(const uint32_t p[], size_t n)
{
//Returns the hash of the 'n' 32-bit words at 'p'
}
or
uint32_t hash64(const uint64_t p[], size_t n)
{
//Returns the hash of the 'n' 64-bit words at 'p'
}
Regarding "for the purpose of hashing": in this case, I would rather provide a custom hash function, or specialise std::hash for xyz. For use with standard templates, this might look like this:
namespace std // any extension of std namespace is UB
// sole exception: specialising templates, which we are going to do
{
template <>
struct hash<xyz>
{
size_t operator()(xyz const& i) const
{
// TODO: need to calculate the value from a, b, c, and d appropriately
return 0;
};
};
// if xyz is polymorphic, you might need to operate on pointers
// no problem either:
template <>
struct hash<xyz*>
{
size_t operator()(xyz const* i) const
{
return hash<xyz>()(*i);
// or, if the hash value is type dependent, use this instead:
// return i->hash(); // custom virtual hash member function needs to be implemented
}
};
} // namespace std
// now you can have
std::unordered_set<xyz> someSet;
void demo()
{
someSet.insert(xyz());
}
(Untested code, in case of errors please fix yourself.)
A list of hashing algorithms which might be used can be found on Wikipedia.
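One way to fill in the TODO is sketched below, under clearly stated assumptions: xyz_sketch is a stand-in for the question's struct reduced to the four members, hash_mix is a hypothetical helper, and its mixing step is written in the style of boost::hash_combine rather than taken from the answer. Unordered containers also need operator==, which the code above does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Stand-in for the question's struct, reduced to the four hashed members.
struct xyz_sketch {
    std::uint32_t a;
    std::uint32_t b;
    std::uint16_t c;
    std::uint16_t d;
};

// Unordered containers need equality as well as a hash.
inline bool operator==(const xyz_sketch& lhs, const xyz_sketch& rhs)
{
    return lhs.a == rhs.a && lhs.b == rhs.b && lhs.c == rhs.c && lhs.d == rhs.d;
}

// Hypothetical helper: folds one more value into seed.
inline void hash_mix(std::size_t& seed, std::size_t value)
{
    seed ^= value + 0x9e3779b9u + (seed << 6) + (seed >> 2);
}

namespace std {
template <>
struct hash<xyz_sketch> {
    size_t operator()(const xyz_sketch& v) const
    {
        size_t seed = 0;
        ::hash_mix(seed, std::hash<std::uint32_t>()(v.a));
        ::hash_mix(seed, std::hash<std::uint32_t>()(v.b));
        ::hash_mix(seed, std::hash<std::uint16_t>()(v.c));
        ::hash_mix(seed, std::hash<std::uint16_t>()(v.d));
        return seed;
    }
};
} // namespace std

// Usage sketch: equal values land in the same slot, so the duplicate is dropped.
inline void hash_example()
{
    std::unordered_set<xyz_sketch> seen;
    seen.insert(xyz_sketch{1, 2, 3, 4});
    seen.insert(xyz_sketch{1, 2, 3, 4});
    assert(seen.size() == 1);
}
```

The resulting hash values are not portable across standard library implementations, because std::hash for integers is implementation-defined.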
If you want the value to fit into a pointer, the full value can be 32 bits on x86 or 64 bits on x64. I'm going to assume you are compiling for 64 bit machines.
This means you can only fit 2 uint16 and one uint32, or 2 uint32s.
Either way, you would shift the values into a uint64, for example (uint64_t(c) | (uint64_t(d) << 16) | (uint64_t(a) << 32)), and then convert that value to a void*.
Edit: for clarification, you cannot fit all of the struct's members, bit-shifted one after another, into a single pointer. You need a minimum of 96 bits to hold the packed struct, which means at least two 64-bit pointers.
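Since 96 bits do not fit into one pointer, an alternative is to pack the four members into three 32-bit words and pass those to the hash(const uint32_t p[], size_t n) function from the question. The sketch below shows only the packing; to_words is a hypothetical helper name, and putting c in the low half and d in the high half of the third word is an arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// to_words is a hypothetical helper name.
std::array<std::uint32_t, 3> to_words(std::uint32_t a, std::uint32_t b,
                                      std::uint16_t c, std::uint16_t d)
{
    // c occupies the low 16 bits of the third word, d the high 16 bits.
    const std::uint32_t cd =
        static_cast<std::uint32_t>(c) | (static_cast<std::uint32_t>(d) << 16);
    return {{a, b, cd}};
}

// Usage sketch: the result could be passed to the question's function as
// hash(words.data(), words.size()).
inline void to_words_example()
{
    const std::array<std::uint32_t, 3> words = to_words(1u, 2u, 3u, 4u);
    assert(words[0] == 1u && words[1] == 2u);
    assert(words[2] == 0x00040003u);
}
```

Because the words are built with shifts rather than by copying memory, their values do not depend on the byte order of the machine.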
There are a few things to consider:
Does that hash value need to be portable across systems? If it does, then you will need to be careful to order the bytes the same way on different systems. If not, then the implementation can be simpler.
Do you want to hash every member of the class, does the class have no padding, and does every member have exactly one bit pattern per distinct value (so that two equal values never hash differently)?
If both simplifications apply (no portability needed, and the conditions of the second question hold), then your function is fast and easy to implement, but violating that precondition will break the hash. If not, then you must serialise the data into a buffer, which practically means that you cannot simply return a pointer.
Here is a super simple implementation for the case that you don't need portability, and you hash all members, and there is no padding:
xyz example;
static_assert(std::has_unique_object_representations_v<xyz>);
const void* p = &example;
Note that this doesn't work with (IEEE-754) float members due to peculiarities of NaN.
A more robust solution that can produce hashes that are portable across systems is to use a general purpose serialisation scheme, and hash the serialised result. There is no standard serialisation functionality in C++.
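A minimal hand-rolled serialisation for this particular struct might look like the sketch below. It is not a general purpose scheme: append_le and serialise are hypothetical names, and the choice of little-endian output is an assumption made only so that every machine produces the same byte sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Appends value to out, least significant byte first, whatever the host byte order.
template <typename UInt>
void append_le(std::vector<std::uint8_t>& out, UInt value)
{
    static_assert(std::is_unsigned<UInt>::value, "unsigned integer types only");
    for (std::size_t i = 0; i < sizeof(UInt); ++i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Serialises the four members in a fixed order and a fixed byte order.
std::vector<std::uint8_t> serialise(std::uint32_t a, std::uint32_t b,
                                    std::uint16_t c, std::uint16_t d)
{
    std::vector<std::uint8_t> out;
    out.reserve(sizeof a + sizeof b + sizeof c + sizeof d);
    append_le(out, a);
    append_le(out, b);
    append_le(out, c);
    append_le(out, d);
    assert(out.size() == 12);  // 4 + 4 + 2 + 2 bytes, no padding
    return out;
}
```

The buffer owns its bytes, which also sidesteps the question of who owns the memory behind a bare pointer.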
void* has problems like: Who owns the memory? What's the type you are going to reinterpret the pointer as?
A more typed solution would be to use std::array of std::byte then you at least know that you're looking at an array of raw bytes and nothing else:
#include <cstdint>
#include <array>
#include <cstddef>
#include <cstring>
auto concat(std::uint32_t a, std::uint32_t b, std::uint16_t c, std::uint16_t d) {
std::array<std::byte, sizeof a + sizeof b + sizeof c + sizeof d> res;
std::byte* p = res.data();
std::memcpy(p, &a, sizeof a);
std::memcpy(p += sizeof a, &b, sizeof b);
std::memcpy(p += sizeof b, &c, sizeof c);
std::memcpy(p += sizeof c, &d, sizeof d);
return res;
}
int main() {
std::uint32_t a = 1, b = 0;
std::uint16_t c = 1, d = 0;
auto res = concat(a, b, c, d);
return 0;
}

C++ and dynamically typed languages

Today I talked to a friend about the differences between statically and dynamically typed languages (more info about the difference between static and dynamic typed languages in this SO question). After that, I was wondering what kind of trick can be used in C++ to emulate such dynamic behavior.
In C++, as in other statically typed languages, the type of a variable is specified at compile time. For example, let's say I have to read a large quantity of numbers from a file, most of which are quite small, small enough to fit in an unsigned short. Here comes the tricky part: a small number of these values are much bigger, big enough to need an unsigned long long to be stored.
Since I assume I'm going to do calculations with all of them, I want all of them stored in the same container, in consecutive positions of memory, in the same order as I read them from the input file. The naive approach would be to store them in a vector of unsigned long long, but this typically means using up to 4 times the space that is actually needed (unsigned short 2 bytes, unsigned long long 8 bytes).
In dynamically typed languages, the type of a variable is interpreted at runtime and coerced to a type where it fits. How can I achieve something similar in C++?
My first idea is to do that by pointers, depending on its size I will store the number with the appropriate type. This has the obvious drawback of having to also store the pointer, but since I assume I'm going to store them in the heap anyway, I don't think it matters.
I'm totally sure that many of you can give me way better solutions than this ...
#include <iostream>
#include <vector>
#include <limits>
#include <sstream>
#include <fstream>
int main() {
std::ifstream f ("input_file");
if (f.is_open()) {
std::vector<void*> v;
unsigned long long int num;
while(f >> num) {
if (num > std::numeric_limits<unsigned short>::max()) {
v.push_back(new unsigned long long int(num));
}
else {
v.push_back(new unsigned short(num));
}
}
for (auto i: v) {
delete i;
}
f.close();
}
}
Edit 1:
The question is not about saving memory, I know in dynamically typed languages the necessary space to store the numbers in the example is going to be way more than in C++, but the question is not about that, it's about emulating a dynamically typed language with some c++ mechanism.
Options include...
Discriminated union
The code specifies a set of distinct, supported types T0, T1, T2, T3..., and, conceptually, creates a management type along these lines:
struct X
{
enum { F0, F1, F2, F3... } type_;
union { T0 t0_; T1 t1_; T2 t2_; T3 t3_; ... };
};
Because there are limitations on the types that can be placed into unions, and if they're bypassed using placement-new care needs to be taken to ensure adequate alignment and correct destructor invocation, a generalised implementation becomes more complicated, and it's normally better to use boost::variant<>. Note that the type_ field requires some space, the union will be at least as large as the largest of sizeof t0_, sizeof t1_..., and padding may be required.
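For the two integer types from the question, a hand-written discriminated union could look like the following sketch. Number and its members are hypothetical names. Both alternatives are trivial types, so none of the placement-new or destructor care mentioned above is needed here.

```cpp
#include <cassert>
#include <limits>

// Number is a hypothetical name: it holds either an unsigned short
// or an unsigned long long, and remembers which one.
class Number
{
public:
    enum class Kind : unsigned char { Short, Long };

    explicit Number(unsigned long long value)
    {
        if (value <= std::numeric_limits<unsigned short>::max()) {
            kind_ = Kind::Short;
            short_ = static_cast<unsigned short>(value);
        } else {
            kind_ = Kind::Long;
            long_ = value;
        }
    }

    Kind kind() const { return kind_; }

    // Reads only the member that was last written.
    unsigned long long value() const
    {
        return kind_ == Kind::Short ? short_ : long_;
    }

private:
    Kind kind_;
    union {
        unsigned short short_;
        unsigned long long long_;
    };
};

// Usage sketch.
inline void number_example()
{
    const Number small(42);
    const Number big(1ULL << 40);
    assert(small.kind() == Number::Kind::Short && small.value() == 42);
    assert(big.kind() == Number::Kind::Long && big.value() == (1ULL << 40));
}
```

As the paragraph above notes, the union is as large as its largest member and the tag adds to that, so every Number is larger than a plain unsigned long long and this form emulates dynamic typing without saving any memory.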
std::type_info
It's also possible to have a templated constructor and assignment operator that call typeid and record the std::type_info, allowing future operations like "recover-the-value-if-it's-of-a-specific-type". The easiest way to pick up this behaviour is to use boost::any.
Run-time polymorphism
You can create a base type with virtual destructor and whatever functions you need (e.g. virtual void output(std::ostream&)), then derive a class for each of short and long long. Store pointers to the base class.
Custom solutions
In your particular scenario, you've only got a few large numbers: you could do something like reserve one of the short values to be a sentinel indicating that the actual value at this position can be recreated by bitwise shifting and ORing of the following 4 values. For example...
10 299 32767 0 0 192 3929 38
...could encode:
10
299
// 32767 is a sentinel indicating next 4 values encode long long
(0 << 48) + (0 << 32) + (192 << 16) + 3929
38
The concept here is similar to UTF-8 encoding for international character sets. This will be very space efficient, but it suits forward iteration, not random access indexing a la [123].
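The sentinel scheme can be sketched as a pair of functions. The names encode, decode and kSentinel are hypothetical, and two details are assumptions that the description leaves open: the four following slots are stored most significant first (matching the shifts in the example), and a value equal to the sentinel itself is written in the long form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::uint16_t kSentinel = 32767;  // same sentinel as in the example above

// Small values take one slot; the sentinel value itself and anything above
// 16 bits take the sentinel followed by four 16-bit pieces.
std::vector<std::uint16_t> encode(const std::vector<std::uint64_t>& values)
{
    std::vector<std::uint16_t> out;
    for (std::uint64_t v : values) {
        if (v <= 0xFFFFu && v != kSentinel) {
            out.push_back(static_cast<std::uint16_t>(v));
        } else {
            out.push_back(kSentinel);
            for (int shift = 48; shift >= 0; shift -= 16)
                out.push_back(static_cast<std::uint16_t>(v >> shift));
        }
    }
    return out;
}

// Walks the slots front to back, expanding each sentinel group.
std::vector<std::uint64_t> decode(const std::vector<std::uint16_t>& slots)
{
    std::vector<std::uint64_t> out;
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (slots[i] != kSentinel) {
            out.push_back(slots[i]);
            continue;
        }
        if (slots.size() - i < 5)
            throw std::invalid_argument("sentinel without four following slots");
        std::uint64_t v = 0;
        for (std::size_t k = 1; k <= 4; ++k)
            v = (v << 16) | slots[i + k];
        out.push_back(v);
        i += 4;  // skip the four slots just consumed
    }
    return out;
}

// Usage sketch with the sequence from the text.
inline void sentinel_example()
{
    const std::vector<std::uint16_t> slots{10, 299, 32767, 0, 0, 192, 3929, 38};
    const std::vector<std::uint64_t> values = decode(slots);
    assert(values.size() == 4);
    assert(values[2] == (192ULL << 16) + 3929);
}
```

Decoding has to start from the first slot, which is the forward-iteration limitation described above.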
You could create a class for storing dynamic values (the snippet needs <cassert>, <cstddef>, <cstdint>, <new>, <string>, <utility> and <vector>):
enum class dyn_type {
none_type,
integer_type,
fp_type,
string_type,
boolean_type,
array_type,
// ...
};
class dyn {
dyn_type type_ = dyn_type::none_type;
// Unrestricted union:
union {
std::int64_t integer_value_;
double fp_value_;
std::string string_value_;
bool boolean_value_;
std::vector<dyn> array_value_;
};
public:
// Constructors
dyn()
{
type_ = dyn_type::none_type;
}
dyn(std::nullptr_t) : dyn() {}
dyn(bool value)
{
type_ = dyn_type::boolean_type;
boolean_value_ = value;
}
dyn(std::int32_t value)
{
type_ = dyn_type::integer_type;
integer_value_ = value;
}
dyn(std::int64_t value)
{
type_ = dyn_type::integer_type;
integer_value_ = value;
}
dyn(double value)
{
type_ = dyn_type::fp_type;
fp_value_ = value;
}
dyn(const char* value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(value);
}
dyn(std::string const& value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(value);
}
dyn(std::string&& value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(std::move(value));
}
// ....
// Clear
void clear()
{
switch(type_) {
case dyn_type::string_type:
string_value_.std::string::~string();
break;
//...
}
type_ = dyn_type::none_type;
}
~dyn()
{
this->clear();
}
// Copy:
dyn(dyn const&);
dyn& operator=(dyn const&);
// Move:
dyn(dyn&&);
dyn& operator=(dyn&&);
// Assign:
dyn& operator=(std::nullptr_t);
dyn& operator=(std::int64_t);
dyn& operator=(double);
dyn& operator=(bool);
// Operators:
dyn operator+(dyn const&) const;
dyn& operator+=(dyn const&);
// ...
// Query
dyn_type type() const { return type_; }
std::string& string_value()
{
assert(type_ == dyn_type::string_type);
return string_value_;
}
// ....
// Conversion
explicit operator bool() const
{
switch(type_) {
case dyn_type::none_type:
return false; // an empty (none) value converts to false
case dyn_type::integer_type:
return integer_value_ != 0;
case dyn_type::fp_type:
return fp_value_ != 0.0;
case dyn_type::boolean_type:
return boolean_value_;
// ...
default:
return false; // placeholder for the types not handled above
}
}
// ...
};
Used with:
std::vector<dyn> xs;
xs.push_back(3);
xs.push_back(2.0);
xs.push_back("foo");
xs.push_back(false);
An easy way to get dynamic language behavior in C++ is to use a dynamic language engine, e.g. for Javascript.
Or, for example, the Boost library provides an interface to Python.
Possibly that will deal with a collection of numbers in a more efficient way than you could do yourself, but still it's extremely inefficient compared to just using an appropriate single common type in C++.
The normal way of dynamic typing in C++ is a boost::variant or a boost::any.
But in many cases you don't want to do that. C++ is a great statically typed language and it's just not your best use case to try to force it to be dynamically typed (especially not to save memory use). Use an actual dynamically typed language instead as it is very likely better optimized (and easier to read) for that use case.

c++ stringstream into variables of different types

I'm getting a string containing raw binary data which needs to be converted to integers. The problem is that these values are not always in the same order and do not always appear. So the format of the binary data is described in a config file, and the types of the values read from the binary data are not known at compile time.
I'm thinking of a solution similar to this:
enum BinaryType {
TYPE_UINT16,
TYPE_UINT32,
TYPE_INT32
};
long convert(BinaryType t, std::stringstream& ss) { // streams cannot be copied, so pass by reference
long return_value = 0;
switch(t) {
case TYPE_UINT16:
unsigned short us_value;
ss.read(reinterpret_cast<char*>(&us_value), sizeof(unsigned short));
return_value = us_value;
break;
case TYPE_UINT32:
unsigned int ui_value;
ss.read(reinterpret_cast<char*>(&ui_value), sizeof(unsigned int));
return_value = ui_value;
break;
case TYPE_INT32:
signed int si_value;
ss.read(reinterpret_cast<char*>(&si_value), sizeof(signed int));
return_value = si_value;
break;
}
return return_value;
}
The goal is to output these values in decimal.
My questions are:
This code is very repetitive. Is there a simpler solution? (Templates?)
Should I make use of the standard types like signed int if the value needs to be 32 bits? If not, what should I use instead? What about endianness?
A simple solution: define a base class for converters:
class Converter {
public:
virtual ~Converter() {}
virtual int64_t convert(std::stringstream& ss) = 0;
};
Next define a concrete converter for each binary type. Have a map/array mapping from binary types identifiers to your converters, e.g.:
Converter* converters[MAX_BINARY_TYPES];
converters[TYPE_UINT16] = new ConverterUINT16;
...
Now, you can use it like this (variables defined like in your function convert):
cout << converters[t]->convert(ss);
For portability, instead of basic types like int, long, etc., you should use the fixed-width types from <cstdint> such as int32_t and int64_t, which have the same width on every platform that provides them.
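The concrete converters differ only in the type they read, so one class template can stand in for all of them. The sketch below is an assumption about how that could look, not part of the answer: FixedWidthConverter is a hypothetical name, and it reads the bytes in host byte order without any endianness handling.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

class Converter {
public:
    virtual ~Converter() = default;
    virtual std::int64_t convert(std::stringstream& ss) = 0;
};

// FixedWidthConverter is a hypothetical name: one template replaces
// ConverterUINT16, ConverterUINT32, and so on.
template <typename T>
class FixedWidthConverter : public Converter {
public:
    std::int64_t convert(std::stringstream& ss) override
    {
        T value{};
        // Copies sizeof(T) raw bytes in host byte order.
        ss.read(reinterpret_cast<char*>(&value), sizeof value);
        return static_cast<std::int64_t>(value);
    }
};

// Usage sketch: write a value as raw bytes, then read it back.
inline void converter_example()
{
    const std::uint16_t raw = 513;
    const std::string bytes(reinterpret_cast<const char*>(&raw), sizeof raw);
    std::stringstream ss(bytes);
    FixedWidthConverter<std::uint16_t> converter;
    assert(converter.convert(ss) == 513);
}
```

Each entry of the converters array would then be an instance such as FixedWidthConverter<std::uint16_t> for TYPE_UINT16.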
Of course, if your code is meant to deal with different endianness, you need to deal with it explicitly. For the above example code you can have two different sets of converters, one for decoding little-endian data and another for big-endian data. Another thing you can do is to write a wrapper class for std::stringstream, let's call it StringStream, which defines functions for reading int32, uint32, etc., and swaps the bytes if the endianness of the data differs from that of the system your code is running on. You can make the class a template and instantiate it with one of the two:
class SameByteOrder {
public:
template<typename T> static void swap(T &) {}
};
class OtherByteOrder {
public:
template<typename T> static void swap(T &o)
{
char *p = reinterpret_cast<char *>(&o);
size_t size = sizeof(T);
for (size_t i=0; i < size / 2; ++i)
std::swap(p[i], p[size - i - 1]);
}
};
Then use the swap function inside your StringStream's functions to swap (or not) the bytes.
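Put together, the wrapper could look like the sketch below. StringStream is the name used in the text, but its interface here (a constructor taking the raw bytes and a read<T>() member template) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>

struct SameByteOrder {
    template <typename T> static void swap(T&) {}
};

struct OtherByteOrder {
    template <typename T> static void swap(T& o)
    {
        char* p = reinterpret_cast<char*>(&o);
        const std::size_t size = sizeof(T);
        for (std::size_t i = 0; i < size / 2; ++i)
            std::swap(p[i], p[size - i - 1]);
    }
};

// The interface of this wrapper is an assumption made for illustration.
template <typename ByteOrder>
class StringStream {
public:
    explicit StringStream(const std::string& bytes) : ss_(bytes) {}

    // Reads sizeof(T) raw bytes, then swaps them (or not) according to ByteOrder.
    template <typename T>
    T read()
    {
        T value{};
        ss_.read(reinterpret_cast<char*>(&value), sizeof value);
        assert(!ss_.fail());  // the stream must still hold sizeof(T) bytes
        ByteOrder::swap(value);
        return value;
    }

private:
    std::stringstream ss_;
};
```

Which policy to instantiate still has to be decided by comparing the byte order of the data with that of the host; that check is not shown here.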

Bit Array Program

So I'm in a summer OO class and we have a test tomorrow based around this project. Basically we need to create an array that holds an unspecified amount of bits and write four functions that perform operations on this array- Set() //set bit with given index to 1, Unset() //set bit with given index to 0, Flip() // change bit (with given index) and Query() // return true if the given bit is set to 1, false otherwise.
Here's a complete description if anyone is interested: http://pastebin.com/v7BCCYjh and some sample runs: http://pastebin.com/1ijh5p7p
The problem I'm having is with the high level concept. I'm pretty sure we're meant to store byte representations of the bits in each index of the array. If that is true, then I'm completely at a loss for how to implement the functions. If anyone can give me some pointers on how to approach this (I need to have a good understanding of it by tonight because I have to write out some pseudo code for it tomorrow for a midterm) I would be much, much appreciative.
Here's my .h if it helps
// bitarray.h
//
// BitArray class declaration
#ifndef _BITARRAY_H
#define _BITARRAY_H
#include <iostream>
using namespace std;
class BitArray
{
friend ostream& operator<< (ostream& os, const BitArray& a);
friend bool operator== (const BitArray&, const BitArray&);
friend bool operator!= (const BitArray&, const BitArray&);
public:
BitArray(unsigned int n); // Construct an array that can handle n bits
BitArray(const BitArray&); // copy constructor
~BitArray(); // destructor
BitArray& operator= (const BitArray& a); // assignment operator
unsigned int Length() const; // return number of bits in bitarray
void Set (unsigned int index); // set bit with given index to 1
void Unset (unsigned int index); // set bit with given index to 0
void Flip (unsigned int index); // change bit (with given index)
bool Query (unsigned int index) const; // return true if the given bit
// is set to 1, false otherwise
private:
unsigned char* barray; // pointer to the bit array
int arraySize;
};
#endif
And my constructor:
BitArray::BitArray(unsigned int n){
int size = sizeof(char);
if(n%(8*size) != 0)
arraySize = ((n/(8*size))+1);
else
arraySize = n/(8*size);
barray = new unsigned char[arraySize];
for(int i = 0; i < arraySize; i++)
barray[i] = 0;
}
For Set() and Query(), find the position of the word that holds the bit you are interested in. (Your code seems to use char as words.) Then, find the position of the bit within this word. Create a bitmask that addresses the specific bit, you will need a shifting operator for this. Recall the bitwise operators which will finally help you do the job. Sometimes the bitwise assignment operators will be more elegant.
Do you remember the bitwise XOR operator in C++? Use this with the concept learned from Set() to implement Flip(). Use the bitwise negation operator to finally implement Unset().
Note that your way of determining the array size is overly complicated. Recall that ceil(a/b) == floor((a+b-1)/b) in the cases that can happen here.
Consider using std::vector instead of a plain array if you are allowed to. SPOILER BELOW!
There is also an interesting specialization of this class: std::vector<bool>.
Impress your teacher by turning your class into a template where you can specify the actual storage unit (char, uint16_t, ...) as parameter. For starters, say typedef char WORD_TYPE and see if your code later compiles when you change the definition of WORD_TYPE.
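For readers who want to check their own attempt, the hints above translate into roughly the following sketch. BitArraySketch is a hypothetical class, not the BitArray from the assignment: it uses std::vector for storage and std::size_t for indices, and it leaves out the copy operations and stream operators.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

class BitArraySketch {
public:
    // Rounds up to whole bytes using the (a + b - 1) / b idiom.
    explicit BitArraySketch(std::size_t n)
        : bytes_((n + CHAR_BIT - 1) / CHAR_BIT), size_(n) {}

    std::size_t Length() const { return size_; }

    void Set(std::size_t i)   { bytes_[byteOf(i)] |= maskOf(i); }
    void Unset(std::size_t i) { bytes_[byteOf(i)] &= static_cast<unsigned char>(~maskOf(i)); }
    void Flip(std::size_t i)  { bytes_[byteOf(i)] ^= maskOf(i); }
    bool Query(std::size_t i) const { return (bytes_[byteOf(i)] & maskOf(i)) != 0; }

private:
    // Index of the byte that holds bit i.
    std::size_t byteOf(std::size_t i) const
    {
        assert(i < size_);
        return i / CHAR_BIT;
    }

    // Mask with only the bit for position i set within its byte.
    static unsigned char maskOf(std::size_t i)
    {
        return static_cast<unsigned char>(1u << (i % CHAR_BIT));
    }

    std::vector<unsigned char> bytes_;  // zero-initialised storage
    std::size_t size_;                  // number of bits
};
```

Set uses OR with the mask, Unset uses AND with the negated mask, and Flip uses XOR, in that order of the hints.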
You could treat an array of integers as an array of bits.
Say you have an array A = [0xC30FF0C3, 0xC20FF0C3] of 32-bit integers, and you want to access bit 53.
You can find the index of the int that holds bit 53 with floor(53 / 32), which is 1, and the bit position within that int with 53 % 32, which is 21.
As for the Flip function...
Well, you already have Query(), Set(), Unset().
A simple
void BitArray::Flip(unsigned int i) {
Query(i) ? Unset(i) : Set(i);
}
would do the job.

Extract subset from boost dynamic_bitset

I need to extract and decode the bits (idx, idx+1, ... idx+n_bits) from a given boost dynamic_bitset.
I have created the following solution:
boost::dynamic_bitset<> mybitset(...);
// build mask 2^{idx+n_bits} - 2^{idx}
const boost::dynamic_bitset<> mask(mybitset.size(), (1 << idx+n_bits) - (1 << idx));
// shift the masked result idx times and get long
unsigned long u = ((mybitset & mask) >> idx ).to_ulong();
It works well, but as this code is critical for the performance of my application, I am curious if there exists a better way to achieve this?
The solution is easy:
#include <tuple>
using std::get;
using std::tuple;
using std::make_tuple;
#include <boost/dynamic_bitset.hpp>
using boost::dynamic_bitset;
template <typename Block, typename Allocator>
unsigned block_index(const boost::dynamic_bitset<Block, Allocator>& b, unsigned pos)
{ return pos / b.bits_per_block; }
namespace boost {
template <>
inline void
to_block_range(const dynamic_bitset<>& b, tuple<unsigned, unsigned, unsigned long&> param)
{
{
unsigned beg = get<0>(param);
unsigned len = get<1>(param);
unsigned block1 = block_index(b, beg);
unsigned block2 = block_index(b, beg + len -1);
unsigned bit_index = beg % b.bits_per_block;
unsigned long bitmask = (1UL << len) - 1; // len must be smaller than the width of unsigned long
get<2>(param) = ((b.m_bits[block1] >> bit_index) |
(b.m_bits[block2] << (b.bits_per_block - bit_index) )) &
bitmask;
return;
}
}
}
To call (begin_bit and length_bits must be of type unsigned to match the tuple type, and std::ref needs <functional>):
boost::dynamic_bitset<> bits;
unsigned long result;
to_block_range(bits, make_tuple(begin_bit, length_bits, std::ref(result)));
There is no direct, native support in dynamic_bitset.
To get a range of bits, you have to get inside dynamic_bitset, get access to the underlying storage, and extract the bits yourself.
The code to do this is trivial, but the data (dynamic_bitset::m_bits) is inside the private part of the class. There are three ways to hack past the private wall:
(1) Pretend your compiler is non-conforming: #define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS. This changes private to public by changing BOOST_DYNAMIC_BITSET_PRIVATE.
(2) Hack the dynamic_bitset.hpp header to expose m_bits.
(3) Work around the current code.
(1) and (2) are brittle, frontal assaults which will be a maintenance nightmare.
Luckily for (3), there are template functions which are friends of dynamic_bitset. We can substitute our own function to do our own extraction by taking over (specialising) this template.
template <typename Block, typename Allocator, typename BlockOutputIterator>
inline void
to_block_range(const dynamic_bitset<Block, Allocator>& b,
BlockOutputIterator result)
{
std::copy(b.m_bits.begin(), b.m_bits.end(), result);
}
The canonical template function copies the entire bitset to iterator BlockOutputIterator which is not what we want.
We are going to specialise boost::to_block_range using a single custom type in place of BlockOutputIterator which will hold all 3 i/o parameters: namely
begin_bit,
length_of_range and
destination.
Provided you call to_block_range with the requisite type, it will call your own function instead of the standard template, but with full access to the internals as well. You have essentially subverted the C++ access specification system!
N.B. The example code does no error checking. No attempt to make sure
that the range fits in unsigned long or
that the range does not exceed the bounds of the bitset or
that the bitset uses unsigned longs internally.
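To show what the missing checks might look like, here is the same two-block extraction written against a plain std::vector of 64-bit blocks. This is a sketch of the arithmetic only: extract_bits is a hypothetical function, the block layout (block 0 holds bits 0 to 63) is an assumption, and nothing here touches dynamic_bitset or its private m_bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// blocks[0] holds bits 0..63, blocks[1] holds bits 64..127, and so on.
std::uint64_t extract_bits(const std::vector<std::uint64_t>& blocks,
                           std::size_t pos, std::size_t len)
{
    constexpr std::size_t bits_per_block = 64;
    assert(len >= 1 && len <= bits_per_block);            // result must fit
    assert(pos + len <= blocks.size() * bits_per_block);  // range inside the data

    const std::size_t block1 = pos / bits_per_block;
    const std::size_t block2 = (pos + len - 1) / bits_per_block;
    const std::size_t bit_index = pos % bits_per_block;

    std::uint64_t value = blocks[block1] >> bit_index;
    if (block2 != block1) {
        // The range spills into the next block, so bit_index is non-zero
        // and the shift count below stays in 1..63.
        value |= blocks[block2] << (bits_per_block - bit_index);
    }

    // Shifting by the full width is undefined, so handle len == 64 separately.
    const std::uint64_t mask = (len == bits_per_block)
                                   ? ~std::uint64_t{0}
                                   : ((std::uint64_t{1} << len) - 1);
    return value & mask;
}
```

The guard on block2 avoids the shift by bits_per_block that the unchecked version performs when the range starts on a block boundary.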