Access bits in memory - C++

I want to assemble a message bit by bit, then handle the message as a vector of unsigned characters (e.g. to calculate the CRC).
I can assemble the message OK, using either a std::vector<bool> or a std::bitset.
I can copy the assembled message into a std::vector<unsigned char> by doing it bit by bit. (Note: the message is padded so that its length is an integer number of bytes.)
// assemble message
std::vector<bool> bitMessage;
...
// copy the bits one by one into bytes and add them to the message
std::vector<unsigned char> myMessage;
// loop over bytes
for (int kbyte = 0; kbyte < bitMessage.size() / 8; kbyte++)
{
    unsigned char byte = 0;
    // loop over bits
    for (int kbit = 0; kbit < 8; kbit++)
    {
        // add bit to byte
        byte += bitMessage[8 * kbyte + kbit] << kbit;
    }
    // add byte to message
    myMessage.push_back(byte);
}
This works.
But it seems awfully slow! I would like to use std::memcpy.
For a 'normal' vector I would do
memcpy(
    myMessage.data(),
    bitMessage.data(),
    bitMessage.size() / 8);
or
memcpy(
    &myMessage[0],
    &bitMessage[0],
    bitMessage.size() / 8);
but neither of these is possible with either a std::vector<bool> or a std::bitset.
Question: Is there a way to get a pointer to the memory where the bits are stored?
The answer is: not with std::vector<bool> or std::bitset
However, with some hints, especially from @Ayxan Haqverdili, it is possible to write a small class that accepts single bits and builds a well-mannered std::vector<unsigned char> as we go along.
#include <vector>

/** Build a message bit by bit, creating an unsigned character vector of integer byte length
 *
 * Hides the messy bit twiddling required,
 * allowing bits to be added to the end of the message
 *
 * The message is automatically padded at the end with zeroes
 */
class cTwiddle
{
public:
    std::vector<unsigned char> myMessage;

    cTwiddle() : myBitLength(0) {}

    /** add a bit to end of message
     * \param[in] bit
     */
    void add(bool bit)
    {
        // check if the last byte of the message vector is full
        if (!(myBitLength % 8))
        {
            // add a new byte to end of message
            myMessage.push_back(0);
        }
        // control order bits are added to a byte
        int shift = 7 - (myBitLength % 8); // add bits from left to right (MSB first)
        // int shift = myBitLength % 8;    // add bits from right to left (LSB first)
        myMessage.back() += (1 & bit) << shift;
        myBitLength++;
    }

private:
    int myBitLength;
};
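A minimal usage sketch (assuming the cTwiddle class above; the printed value assumes the MSB-first shift shown):
#include <iostream>
#include <vector>

int main()
{
    cTwiddle twiddle;
    // add the bits 1,0,1,1 followed by four padding zeroes
    for (bool bit : {true, false, true, true, false, false, false, false})
        twiddle.add(bit);
    // myMessage is now a plain std::vector<unsigned char>,
    // ready for memcpy, CRC routines, etc.
    std::cout << (int)twiddle.myMessage[0] << '\n'; // prints 176 (0b10110000)
}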

Apparently neither of those classes defines the layout. Just write your own class and define the layout you want:
#include <cstddef>

template <int size>
class BitSet final {
private:
    unsigned char buffer[size / 8 + (size % 8 != 0)] = {};

public:
    constexpr bool get(std::size_t index) const noexcept {
        return (buffer[index / 8] >> (index % 8)) & 1U;
    }
    constexpr void set(std::size_t index) noexcept {
        buffer[index / 8] |= (1U << (index % 8));
    }
    constexpr void clear(std::size_t index) noexcept {
        buffer[index / 8] &= ~(1U << (index % 8));
    }
};
Memcpy-ing this class is perfectly fine. Otherwise, you might also provide direct access to the byte array.
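For example, a short sketch (assuming the BitSet template above is in scope):
#include <cstring>
#include <vector>

int main()
{
    BitSet<64> bits;
    bits.set(0);
    bits.set(9);
    // BitSet holds nothing but the byte array, so copying the
    // whole object with memcpy is well defined
    std::vector<unsigned char> bytes(sizeof bits);
    std::memcpy(bytes.data(), &bits, sizeof bits);
    // bytes[0] == 0x01, bytes[1] == 0x02
}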
Alternatively, you can dynamically allocate the buffer:
#include <cstddef>
#include <memory>

class DynBitSet final {
private:
    std::size_t size = 0;
    std::unique_ptr<unsigned char[]> buffer;

public:
    explicit DynBitSet(std::size_t bitsize)
        : size(bitsize / 8 + (bitsize % 8 != 0)),
          buffer(new unsigned char[size]{}) {}

    bool get(std::size_t index) const noexcept {
        return (buffer[index / 8] >> (index % 8)) & 1U;
    }
    void set(std::size_t index) noexcept {
        buffer[index / 8] |= (1U << (index % 8));
    }
    void clear(std::size_t index) noexcept {
        buffer[index / 8] &= ~(1U << (index % 8));
    }

    auto bitSize() const noexcept { return size * 8; }
    auto byteSize() const noexcept { return size; }
    auto const* byteBuffer() const noexcept { return buffer.get(); }
};

Is there a way to get a pointer to the memory where the bits are stored [in std::vector]?
No. The idea is that it should not be possible.
Is there a way
Fun fact: in libstdc++ (the GNU C++ standard library, rather than glibc) the member in the iterator is public:
#include <vector>
#include <iostream>

int main() {
    std::vector<bool> vec{1, 0, 1, 0, 1, 1, 1, 1};
    std::cout << *vec.begin()._M_p << '\n';
}

Is there a way I can use a 2-bit size type instead of an int, by just plugging in the new type name instead of int?

I have an application where I need to save as much memory as possible. I need to store a large amount of data that can take exactly three possible values. So, I have been trying to use a 2-bit sized type.
One possibility is using bit fields. I could do
struct myType {
    uint8_t twoBits : 2;
};
This is a suggestion from this thread.
However, everywhere where I have used int variables prior to this, I would need to change their usage by appending a .twoBits. I checked if I can create a bit field outside of a struct, such as
uint8_t twoBits : 2;
but this thread says it is not possible. However, that thread is specific to C, so I am not sure if it applies to C++.
Is there a clean way I can define a 2-bit type, so that by simply replacing int with my type, I can run the program correctly? Or is using bit fields the only possible way?
The CPU (and thus the memory, the bus, and the compiler too) uses only bytes or groups of bytes. There's no way to store a 2-bit type without also storing the other 6 remaining bits.
What you can do is define a struct that only uses some bits. But be aware that it will not save memory.
You can pack several x-bit types in a struct, as you already know. Or you can do bit operations to pack/unpack them into an integer type, as sketched below.
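For instance, a minimal sketch of the pack/unpack approach (pack2/unpack2 are illustrative helpers, not standard functions), storing four 2-bit values per byte:
#include <cstdint>
#include <iostream>

// store value v (0..3) in slot i (0..3) of byte b
void pack2(std::uint8_t& b, unsigned i, unsigned v)
{
    b = (b & ~(3u << (2 * i))) | ((v & 3u) << (2 * i));
}

// read slot i of byte b
unsigned unpack2(std::uint8_t b, unsigned i)
{
    return (b >> (2 * i)) & 3u;
}

int main()
{
    std::uint8_t b = 0;
    pack2(b, 0, 2);
    pack2(b, 3, 1);
    std::cout << unpack2(b, 0) << ' ' << unpack2(b, 3) << '\n'; // prints 2 1
}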
Is there a clean way I can define a 2-bit type, so that by simply
replacing int with my type, I can run the program correctly? Or is
using bit fields the only possible way?
You can try to make the struct as transparent as possible by providing implicit conversion operators and constructors:
#include <cstdint>
#include <iostream>
template <std::size_t N, typename T = unsigned>
struct bit_field {
T rep : N;
operator T() { return rep; }
bit_field(T i) : rep{ i } { }
bit_field() = default;
};
using myType = bit_field<2, std::uint8_t>;
int main() {
myType mt;
mt = 3;
std::cout << mt << "\n";
}
So objects of type myType somewhat behave like real 2-bit unsigned integers, despite occupying more than 2 bits.
Of course, the residual bits are unused, but as single bits are not addressable on most systems, this is the best way to go.
I'm not convinced that you will save anything with your existing structure, as the surrounding structure still gets rounded up to a whole number of bytes.
You can write the following to squeeze 4 2-bit counters into 1 byte, but as you say, you have to name them myInst.f0:
typedef unsigned char ubyte_t; // not a standard type; define it yourself

struct MyStruct
{
    ubyte_t f0 : 2,
            f1 : 2,
            f2 : 2,
            f3 : 2;
} myInst;
In C and C++98, you can declare this struct anonymously, but the usage is deprecated. You can then access the 4 values directly by name:
struct
{ // deprecated!
    ubyte_t f0 : 2,
            f1 : 2,
            f2 : 2,
            f3 : 2;
};
You could declare some sort of template that wraps a single instance with an operator int and an operator =(int), and then define a union to put the 4 instances at the same location - but again, anonymous unions are deprecated. You could instead declare references to your 4 values, but then you are paying for the references, which are bigger than the bytes you were trying to save!
template <class Size, int offset, int bits>
struct Bitz
{
    Size ignore : offset,
         value : bits;
    operator Size() const { return value; }
    Size operator = (Size val) { return (value = val); }
};

template <class Size, int bits>
struct Bitz0
{ // I know this can be done better
    Size value : bits;
    operator Size() const { return value; }
    Size operator = (Size val) { return (value = val); }
};
static union
{ // Still deprecated!
    Bitz0<char, 2> F0;
    Bitz<char, 2, 2> F1;
    Bitz<char, 4, 2> F2;
    Bitz<char, 6, 2> F3;
};

union
{
    Bitz0<char, 2> F0;
    Bitz<char, 2, 2> F1;
    Bitz<char, 4, 2> F2;
    Bitz<char, 6, 2> F3;
} bitz;

Bitz0<char, 2>& F0 = bitz.F0; /// etc...
Alternatively, you could simply declare macros to replace the dotted name with a simple name (how 1970s):
#define myF0 myInst.f0
Note that you can't pass bit fields by reference or pointer, as they don't have a byte address; you can only use them by value and assignment.
A very minimal example of a bit array with a proxy class that looks (for the most part) like you were dealing with an array of very small integers.
#include <cstdint>
#include <iostream>
#include <vector>

class proxy
{
    uint8_t & byte;
    unsigned int shift;
public:
    proxy(uint8_t & byte,
          unsigned int shift):
        byte(byte),
        shift(shift)
    {
    }
    proxy(const proxy & src):
        byte(src.byte),
        shift(src.shift)
    {
    }
    proxy & operator=(const proxy &) = delete;
    proxy & operator=(unsigned int val)
    {
        if (val <= 3)
        {
            uint8_t wipe = 3 << shift;
            byte &= ~wipe;
            byte |= val << shift;
        }
        // might want to throw std::out_of_range here
        return *this;
    }
    operator int() const
    {
        return (byte >> shift) & 0x03;
    }
};
Proxy holds a reference to a byte and knows how to extract two specific bits and look like an int to anyone who uses it.
If we wrap an array of bits packed into bytes with a class that returns this proxy object wrapped around the appropriate byte, we now have something that looks a lot like an array of very small ints.
class bitarray
{
    size_t size;
    std::vector<uint8_t> data;
public:
    bitarray(size_t size):
        size(size),
        data((size + 3) / 4)
    {
    }
    proxy operator[](size_t index)
    {
        return proxy(data[index / 4], (index % 4) * 2);
    }
};
If you want to extend this and go the distance, Writing your own STL Container should help you make a fully armed and operational bit-packed array.
There's room for abuse here. The caller can hold onto a proxy and get up to whatever manner of evil this allows.
Use of this primitive example:
int main()
{
    bitarray arr(10);
    arr[0] = 1;
    arr[1] = 2;
    arr[2] = 3;
    arr[3] = 1;
    arr[4] = 2;
    arr[5] = 3;
    arr[6] = 1;
    arr[7] = 2;
    arr[8] = 3;
    arr[9] = 1;
    for (size_t i = 0; i < 10; i++)
        std::cout << arr[i] << std::endl;
}
Simply build on top of std::bitset, something like:
#include <bitset>
#include <cstdint>
#include <iostream>
using namespace std;

template<int N>
class mydoublebitset
{
public:
    uint_least8_t operator[](size_t index)
    {
        return 2 * b[index * 2 + 1] + b[index * 2];
    }
    void set(size_t index, uint_least8_t store)
    {
        switch (store)
        {
        case 3:
            b[index * 2] = 1;
            b[index * 2 + 1] = 1;
            break;
        case 2:
            b[index * 2] = 0;
            b[index * 2 + 1] = 1;
            break;
        case 1:
            b[index * 2] = 1; // low bit set, high bit clear
            b[index * 2 + 1] = 0;
            break;
        case 0:
            b[index * 2] = 0;
            b[index * 2 + 1] = 0;
            break;
        default:
            throw exception();
        }
    }
private:
    bitset<N * 2> b;
};
int main()
{
    mydoublebitset<12> mydata;
    mydata.set(0, 0);
    mydata.set(1, 2);
    mydata.set(2, 2);
    cout << (unsigned int)mydata[0] << (unsigned int)mydata[1] << (unsigned int)mydata[2] << endl;
    system("pause");
    return 0;
}
Basically, use a bitset with twice the size and index it accordingly. It's simple and as memory-efficient as you require.

C++ most optimal conversion

Is this the most optimal (for speed) way to convert data on the server?
Could I change it for better performance?
This is used in a packet parser to set/get packet data.
void Packet::setChar(char val, unsigned int offset)
{
    raw[offset + 8] = val;
}

short Packet::getChar(unsigned int offset)
{
    return raw[offset + 8];
}

void Packet::setShort(short val, unsigned int offset)
{
    raw[offset + 8] = val & 0xff;
    raw[offset + 9] = (val >> 8) & 0xff;
}

short Packet::getShort(unsigned int offset)
{
    return (short)((raw[offset + 9] & 0xff) << 8) | (raw[offset + 8] & 0xff);
}

void Packet::setInt(int val, unsigned int offset)
{
    raw[offset + 8] = val & 0xff;
    raw[offset + 9] = (val >> 8) & 0xff;
    raw[offset + 10] = (val >> 16) & 0xff;
    raw[offset + 11] = (val >> 24) & 0xff;
}

int Packet::getInt(unsigned int offset)
{
    return (int)((raw[offset + 11] & 0xff) << 24) | ((raw[offset + 10] & 0xff) << 16) |
           ((raw[offset + 9] & 0xff) << 8) | (raw[offset + 8] & 0xff);
}
Class definition:
class Packet
{
public:
    Packet(unsigned int length);
    Packet(char * raw);
    /// header
    void setChar(char val, unsigned int offset);
    short getChar(unsigned int offset);
    void setShort(short val, unsigned int offset);
    short getShort(unsigned int offset);
    void setInt(int val, unsigned int offset);
    int getInt(unsigned int offset);
    void setLong(long long val, unsigned int offset);
    long getLong(unsigned int offset);
    char * getRaw();
    ~Packet();
private:
    char * raw;
};
EDIT: added class definitions.
The char raw member is initialized when the Packet is constructed (with new char).
I do agree with the comments saying "if it's not shown to be a problem, don't change it".
If your hardware is little endian, AND you either know that the offset is always aligned or the processor supports unaligned accesses (e.g. x86), then you could speed up the setting of the larger data types by simply storing the whole item in one move. (Yes, there will probably be people saying "it's undefined" - and it may well be undefined, but I've yet to see a compiler that doesn't do this correctly, because it's a fairly common thing to do in various types of code.)
So something like this:
void Packet::setInt(int val, unsigned int offset)
{
    int *ptr = reinterpret_cast<int*>(&raw[offset + 8]);
    *ptr = val;
}

int Packet::getInt(unsigned int offset)
{
    int *ptr = reinterpret_cast<int*>(&raw[offset + 8]);
    return *ptr;
}
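If you would rather stay on well-defined ground, std::memcpy does the same job without the aliasing question, and compilers typically turn it into the same single move (a sketch against the Packet class above; like the cast version, it stores in host byte order):
#include <cstring>

void Packet::setInt(int val, unsigned int offset)
{
    std::memcpy(&raw[offset + 8], &val, sizeof val);
}

int Packet::getInt(unsigned int offset)
{
    int val;
    std::memcpy(&val, &raw[offset + 8], sizeof val);
    return val;
}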
Another thing I would DEFINITELY do is to ensure that the functions are present in a header file, so that the compiler has the option to inline them. This will quite likely give you MORE benefit than fiddling with the code inside the functions, because the overhead of a function call vs. using the function inline will be quite noticeable. So that would be my first step - assuming you think it is a problem in the first place. For most things, stuffing the data into the buffer is not the "slow part" of sending a data packet - it is either the forming of the content, or the bytes passing down the wire to the other machine (which of the two depends on how fast your line is and what calculations go into preparing the data in the first place).
It looks like your implementation is already almost as efficient as possible. It simply isn't possible to optimize it further without a major overhaul of the application, and even then you'll save only a few CPU cycles.
By the way, make sure that the function definitions are present in the header file, or are #included into it. Otherwise, each output operation will need a function call, which is quite expensive for what you're doing.

How to construct a hash function for a user-defined type?

For example, in the following struct:
1) editLine is a pointer to a data line which has CRLF,
2) nDisplayLine is the display line index of this editLine,
3) start is the offset in the display line,
4) len is the length of the text.
struct CacheKey {
    const CEditLine* editLine;
    int32 nDisplayLine;
    int32 start;
    int32 len;

    friend bool operator==(const CacheKey& item1, const CacheKey& item2) {
        return (item1.start == item2.start && item1.len == item2.len &&
                item1.nDisplayLine == item2.nDisplayLine && item1.editLine == item2.editLine);
    }

    CacheKey() {
        editLine = NULL;
        nDisplayLine = 0;
        start = 0;
        len = 0;
    }

    CacheKey(const CEditLine* editLine, int32 dispLine, int32 start, int32 len) :
        editLine(editLine), nDisplayLine(dispLine), start(start), len(len)
    {
    }

    int hash() {
        return (int)((unsigned char*)editLine - 0x10000) + nDisplayLine * nDisplayLine + start * 2 - len * 1000;
    }
};
Now I need to put it into a std::unordered_map<int, CacheItem> cacheMap_
The problem is how to design the hash function for this structure; are there any guidelines?
How could I make sure the hash function is collision-free?
To create a hash function, you can use std::hash, which is defined for integers and pointers. Then, you can combine the results "as the Boost guys do" (because writing a good hash is non-trivial), as explained here: http://en.cppreference.com/w/cpp/utility/hash.
Here is a hash_combine method:
inline void hash_combine(std::size_t& seed, std::size_t v)
{
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
So the "guideline" is more or less what's is shown on cppreference.
You CAN'T be sure your hash function is collision-free. Collision-free means that you do not lose data (or that you restrict yourself to a small set of possibilities for your class). If any int32 value is allowed for each field, a collision-free hash is a monstrously big index, and it won't fit in a small table. Let unordered_map take care of collisions, and combine std::hash hashes as explained above.
In your case, it will look something like:
std::size_t hash() const
{
    std::size_t h1 = std::hash<const CEditLine*>()(editLine);
    // Your int32 type is probably a typedef of a hashable type. Otherwise,
    // you'll have to static_cast<> it to a type supported by std::hash.
    std::size_t h2 = std::hash<int32>()(nDisplayLine);
    std::size_t h3 = std::hash<int32>()(start);
    std::size_t h4 = std::hash<int32>()(len);

    std::size_t hash = 0;
    hash_combine(hash, h1);
    hash_combine(hash, h2);
    hash_combine(hash, h3);
    hash_combine(hash, h4);
    return hash;
}
Then, you can specialize std::hash for your class:
namespace std
{
    template<>
    struct hash<CacheKey>
    {
        std::size_t operator()(CacheKey const& s) const
        {
            return s.hash();
        }
    };
}
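With that specialization in place, CacheKey can be used directly as the key type (a sketch; CacheItem stands in for whatever value type is cached, and CacheKey::hash() is assumed to be the const member shown above):
#include <unordered_map>

struct CacheItem { /* placeholder value type */ };

int main()
{
    std::unordered_map<CacheKey, CacheItem> cacheMap_;
    CacheKey key(nullptr, 1, 0, 10); // a null editLine, just for the demo
    cacheMap_[key] = CacheItem{};    // std::hash<CacheKey> and operator== do the rest
}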

Converting float values from big endian to little endian

Is it possible to convert floats from big to little endian? I have a big-endian value from a PowerPC platform that I am sending via TCP to a Windows process (little endian). This value is a float, but when I memcpy the value into a Win32 float type and then call _byteswap_ulong on that value, I always get 0.0000.
What am I doing wrong?
Simply reversing the four bytes works:
float ReverseFloat( const float inFloat )
{
    float retVal;
    char *floatToConvert = ( char* ) & inFloat;
    char *returnFloat = ( char* ) & retVal;

    // swap the bytes into a temporary buffer
    returnFloat[0] = floatToConvert[3];
    returnFloat[1] = floatToConvert[2];
    returnFloat[2] = floatToConvert[1];
    returnFloat[3] = floatToConvert[0];

    return retVal;
}
Here is a function that can reverse the byte order of any type.
template <typename T>
T bswap(T val) {
    T retVal;
    char *pVal = (char*) &val;
    char *pRetVal = (char*) &retVal;
    int size = sizeof(T);
    for (int i = 0; i < size; i++) {
        pRetVal[size - 1 - i] = pVal[i];
    }
    return retVal;
}
I found something roughly like this a long time ago. It was good for a laugh, but ingest at your own peril. I've not even compiled it:
void * endian_swap(void * arg)
{
    unsigned int n = *((unsigned int*)arg);
    n = ((n >> 8) & 0x00ff00ff) | ((n << 8) & 0xff00ff00);
    n = ((n >> 16) & 0x0000ffff) | ((n << 16) & 0xffff0000);
    *((unsigned int*)arg) = n;
    return arg;
}
An elegant way to do the byte exchange is to use a union:
float big2little (float f)
{
    union
    {
        float f;
        char b[4];
    } src, dst;

    src.f = f;
    dst.b[3] = src.b[0];
    dst.b[2] = src.b[1];
    dst.b[1] = src.b[2];
    dst.b[0] = src.b[3];
    return dst.f;
}
Following jjmerelo's recommendation to write a loop, a more generic solution could be:
typedef float number_t;
#define NUMBER_SIZE sizeof(number_t)

number_t big2little (number_t n)
{
    union
    {
        number_t n;
        char b[NUMBER_SIZE];
    } src, dst;

    src.n = n;
    for (size_t i = 0; i < NUMBER_SIZE; i++)
        dst.b[i] = src.b[NUMBER_SIZE - 1 - i];
    return dst.n;
}
Don't memcpy the data directly into a float type. Keep it as char data, swap the bytes and then treat it as a float.
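A sketch of that order of operations - receive into chars, swap, and only then copy into a float (readBigEndianFloat is an illustrative name):
#include <cstring>

float readBigEndianFloat(const char* wire) // the 4 bytes as received
{
    char swapped[4] = { wire[3], wire[2], wire[1], wire[0] };
    float f;
    std::memcpy(&f, swapped, sizeof f); // treat as float only after swapping
    return f;
}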
It might be easier to use ntohl and the related functions to convert from network order to host order and back; the advantage is that it would be portable. Here is a link to an article that explains how to do this.
From SDL_endian.h with slight changes:
#include <cstdint>

std::uint32_t Swap32(std::uint32_t x)
{
    return static_cast<std::uint32_t>((x << 24) | ((x << 8) & 0x00FF0000) |
                                      ((x >> 8) & 0x0000FF00) | (x >> 24));
}

float SwapFloat(float x)
{
    union
    {
        float f;
        std::uint32_t ui32;
    } swapper;
    swapper.f = x;
    swapper.ui32 = Swap32(swapper.ui32);
    return swapper.f;
}
This value is a float, but when I "memcpy" the value into a win32 float type and then call _byteswap_ulong on that value, I always get 0.0000?
This should work. Can you post the code you have?
However, if you care about performance (perhaps you do not; in that case you can ignore the rest), it should be possible to avoid the memcpy, either by loading the data directly into the target location and swapping the bytes there, or by using a swap that does the swapping while copying.
In some cases, especially on Modbus, the network byte order for a float is:
nfloat[0] = float[1]
nfloat[1] = float[0]
nfloat[2] = float[3]
nfloat[3] = float[2]
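A sketch of that reordering (modbusToHostFloat is an illustrative name; the mapping above is its own inverse, a byte swap within each 16-bit word):
#include <cstring>

float modbusToHostFloat(const unsigned char* n) // 4 bytes off the wire
{
    unsigned char h[4] = { n[1], n[0], n[3], n[2] }; // swap within each word
    float f;
    std::memcpy(&f, h, sizeof f);
    return f;
}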
Boost libraries have already been mentioned by @morteza and @AnotherParker, stating that the support for float was removed. However, it was added back in a subset of the library since they wrote their comments.
Using Boost.Endian conversion functions, version 1.77.0 as I wrote this answer, you can do the following:
float input = /* some value */;
float reversed = input;
boost::endian::endian_reverse_inplace(reversed);
Check the FAQ to learn why the support was removed and then partially added back (mainly, because a reversed float may not be valid anymore), and here for the support history.

Data structures with different sized bit fields

If I have a requirement to create a data structure that has the following fields:
16-bit Size field
3-bit Version field
1-bit CRC field
How would I code this struct? I know the Size field would be an unsigned short type, but what about the other two fields?
First, unsigned short isn't guaranteed to be only 16 bits, just at least 16 bits.
You could do this:
struct Data
{
    unsigned short size : 16;
    unsigned char version : 3;
    unsigned char crc : 1;
};
Assuming you want no padding between the fields, you'll have to issue the appropriate instructions to your compiler. With gcc, you can decorate the structure with __attribute__((packed)):
struct Data
{
    // ...
} __attribute__((packed));
In Visual C++, you can use #pragma pack:
#pragma pack(push, 1)
struct Data
{
    // ...
};
#pragma pack(pop)
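Either way, a compile-time size check catches silent padding (a sketch; 3 bytes assumes the exact layout above, with no padding between the fields):
static_assert(sizeof(Data) == 3, "Data should pack into 2 + 1 = 3 bytes");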
The following class implements the fields you are looking for as a kind of hand-rolled bit field.
struct Identifier
{
    unsigned int a; // only bits 0-19 are used

    unsigned int getSize() const {
        return a & 0xFFFF; // access bits 0-15
    }
    unsigned int getVersion() const {
        return (a >> 16) & 7; // access bits 16-18
    }
    unsigned int getCrc() const {
        return (a >> 19) & 1; // access bit 19
    }
    void setSize(unsigned int size) {
        a = a - (a & 0xFFFF) + (size & 0xFFFF);
    }
    void setVersion(unsigned int version) {
        a = a - (a & (7 << 16)) + ((version & 7) << 16);
    }
    void setCrc(unsigned int crc) {
        a = a - (a & (1 << 19)) + ((crc & 1) << 19);
    }
};
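A round-trip usage sketch:
#include <iostream>

int main()
{
    Identifier id{}; // zero-initialize a, since the struct has no constructor
    id.setSize(1024);
    id.setVersion(5);
    id.setCrc(1);
    std::cout << id.getSize() << ' '    // 1024
              << id.getVersion() << ' ' // 5
              << id.getCrc() << '\n';   // 1
}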