Alignment and padding of data inside a blob - c++

I'm using a large blob (allocated memory) to store data continuously in the memory.
I want data inside the blob to be organized like this:
| data1 type | data1 | data2 type | data2 | dataN type | dataN |
dataN type is an int that I use in a switch to convert the dataN to the appropriate type.
The problem is that I want to keep the data properly aligned, so I want to force everything inside the blob to be 8-byte packed (I chose 8 bytes for packing because it will probably keep the data properly aligned?). This way the data will be tightly packed (there won't be holes between data/data types because of alignment).
I tried this:
#pragma pack(8)
class A
{
public:
short b;
int x;
char v;
};
But it doesn't work because using sizeof(A) I get 12 bytes instead of the expected 16 bytes.
P.S: Is there any data type larger than 8 bytes in either x86 or x64 architectures?

This answer assumes two things:
You want the binary blob to be packed tightly (no holes).
You don't want the data members to be accessed in an unaligned fashion (which is slow compared to accessing data members that are aligned the way the compiler wants by default).
If this is the case, then you should consider a design where you treat the large "blob" as a byte-oriented stream. In this stream, you marshall/demarshall tag/value pairs that populate objects having natural alignment.
With this scheme, you get the best of both worlds. You get a tightly packed blob, but once you extract objects from the blob, accessing object members is fast because of the natural alignment. It is also portable(1) and does not rely on compiler extensions. The disadvantage is the boilerplate code that you need to write for every type that can be put in the blob.
Rudimentary example:
#include <cassert>
#include <iomanip>
#include <iostream>
#include <stdint.h>
#include <vector>

enum BlobKey
{
    kBlobKey_Widget,
    kBlobKey_Gadget
};

class Blob
{
public:
    Blob() : cursor_(0) {}

    // Extract a value from the blob. The key associated with this value should
    // already have been extracted.
    template <typename T>
    Blob& operator>>(T& value)
    {
        assert(cursor_ + sizeof(T) <= bytes_.size());
        char* dest = reinterpret_cast<char*>(&value);
        for (size_t i=0; i<sizeof(T); ++i)
            dest[i] = bytes_[cursor_++];
        return *this;
    }

    // Insert a value into the blob
    template <typename T>
    Blob& operator<<(const T& value)
    {
        const char* src = reinterpret_cast<const char*>(&value);
        for (size_t i=0; i<sizeof(T); ++i)
            bytes_.push_back(src[i]);
        return *this;
    }

    // Overloads of << and >> for std::string might be useful

    bool atEnd() const {return cursor_ >= bytes_.size();}
    void rewind() {cursor_ = 0;}
    void clear() {bytes_.clear(); rewind();}

    void print() const
    {
        using namespace std;
        for (size_t i=0; i<bytes_.size(); ++i)
            cout << setfill('0') << setw(2) << hex << int(bytes_[i]) << " ";
        std::cout << "\n" << dec << bytes_.size() << " bytes\n";
    }

private:
    std::vector<uint8_t> bytes_;
    size_t cursor_;
};
class Widget
{
public:
    explicit Widget(int a=0, short b=0, char c=0) : a_(a), b_(b), c_(c) {}

    void print() const
    {
        std::cout << "Widget: a_=" << a_ << " b=" << b_
                  << " c_=" << c_ << "\n";
    }

private:
    int a_;
    short b_;
    char c_;

    friend Blob& operator>>(Blob& blob, Widget& widget)
    {
        // Demarshall members from blob
        blob >> widget.a_;
        blob >> widget.b_;
        blob >> widget.c_;
        return blob;
    }

    friend Blob& operator<<(Blob& blob, const Widget& widget)
    {
        // Marshall members to blob
        blob << kBlobKey_Widget;
        blob << widget.a_;
        blob << widget.b_;
        blob << widget.c_;
        return blob;
    }
};
class Gadget
{
public:
    explicit Gadget(long a=0, char b=0, short c=0) : a_(a), b_(b), c_(c) {}

    void print() const
    {
        std::cout << "Gadget: a_=" << a_ << " b=" << b_
                  << " c_=" << c_ << "\n";
    }

private:
    long a_;
    char b_;
    short c_;

    friend Blob& operator>>(Blob& blob, Gadget& gadget)
    {
        // Demarshall members from blob
        blob >> gadget.a_;
        blob >> gadget.b_;
        blob >> gadget.c_;
        return blob;
    }

    friend Blob& operator<<(Blob& blob, const Gadget& gadget)
    {
        // Marshall members to blob
        blob << kBlobKey_Gadget;
        blob << gadget.a_;
        blob << gadget.b_;
        blob << gadget.c_;
        return blob;
    }
};
int main()
{
    Widget w1(1,2,3), w2(4,5,6);
    Gadget g1(7,8,9), g2(10,11,12);

    // Fill blob with widgets and gadgets
    Blob blob;
    blob << w1 << g1 << w2 << g2;
    blob.print();

    // Retrieve widgets and gadgets from blob
    BlobKey key;
    while (!blob.atEnd())
    {
        blob >> key;
        switch (key)
        {
            case kBlobKey_Widget:
            {
                Widget w;
                blob >> w;
                w.print();
            }
            break;
            case kBlobKey_Gadget:
            {
                Gadget g;
                blob >> g;
                g.print();
            }
            break;
            default:
                std::cout << "Unknown object type in blob\n";
                assert(false);
        }
    }
}
If you can use Boost, you might want to use Boost.Serialization with a binary memory stream, as in this answer.
(1) Portable means that the source code should compile anywhere. The resulting binary blob will not be portable if transferred to other machines with different endianness and integer sizes.

It looks like in this case #pragma pack(8) has no effect.
In MS compiler documentation the parameter of pack is described in the following way:
Specifies the value, in bytes, to be used for packing. The default value for n is 8. Valid values are 1, 2, 4, 8, and 16. The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller.
Thus, the #pragma pack directive cannot increase the alignment of a member, but rather can decrease it (using #pragma pack(1), for example). In your case, the alignment of the whole structure is chosen so that its biggest element (int, which is usually 4 bytes on both 32- and 64-bit CPUs) is naturally aligned. As a result, the total size is 4 * 3 = 12 bytes.

@Negai explained why you get the observed size.
You should also reconsider your assumptions about "tightly packed" data. With the above structure there are holes in the structure. Assuming a 32-bit int and a 16-bit short, there is a two-byte hole after the short and a three-byte hole after the char. But it does not matter, as this space is inside the structure.
In other words either you get a tightly packed data structure, or you get an aligned data structure, but not both.
Typically, you won't have anything special to do to get the "aligned" behavior; that is what the compiler does by default. #pragma pack is useful if you want your data "packed" instead of aligned, that is, removing the holes the compiler introduces to keep data aligned.

Did you try this?
class A {
public:
union {
uint64_t dummy;
int data;
};
};
Instances of A and its data member will now always be aligned to 8 bytes. Of course, this is pointless if you squeeze a 4-byte data type in front of it; that would have to be 8 bytes too.


Reading struct/union members from a character buffer

I need to process data that is given to me as a char buffer where the actual structure of the data depends on the values of some of its fields.
More specifically, consider the following header file:
struct IncomingMsgStruct
{
MsgHdrStruct msgHdr;
char msgData[MSG_DATA_MAX_SIZE]; // Can hold any of several structures
};
struct RelevantMessageData
{
DateTimeStruct dateTime;
CommonDataStruct commonData;
MsgBodyUnion msgBody;
};
struct DateTimeStruct { /* ... */ };
struct CommonDataStruct
{
char name[NAME_MAX_SIZE + 1];
MsgTypeEnum msgType;
// more elements here
};
union MsgBodyUnion
{
MsgBodyType1Struct msgBodyType1;
MsgBodyType2Struct msgBodyType2;
// ...
MsgBodyTypeNStruct msgBodyTypeN;
};
struct MsgBodyType1Struct { /* ... */ };
struct MsgBodyType2Struct { /* ... */ };
// ...
struct MsgBodyTypeNStruct { /* ... */ };
The structures contain data members (some of which are also structures) and member functions for initialization, conversion to string, etc. There are no constructors, destructors, virtual functions, or inheritance.
Please note that this is in the context of a legacy code that I have no control over. The header and the definitions in it are used by other components, and some of them can change with time.
The data is made available to me as a buffer of characters, so my processing function will look like:
ResultType processRelevantMessage(char const* inBuffer);
It is guaranteed that inBuffer contains an IncomingMsgStruct structure, and that its msgData member holds a RelevantMessageData structure. Correct alignment and endianness are also guaranteed, as the data originated from the corresponding structures on the same platform.
For simplicity, let's assume that I am only interested in the case where msgType equals to a specific value, so only the members of, say MsgBodyType2Struct, will need to be accessed (and an error returned otherwise). I can generalize it to handle several types later.
My understanding is that a naive implementation using reinterpret_cast can run afoul of the C++ strict aliasing rules.
My question is:
How can I do it in standard-compliant C++ without invoking undefined behaviour, without changing or duplicating the definitions, and without extra copying or allocations?
Or, if that is not possible, how can I do it in GCC (possibly using flags such as -fno-strict-aliasing etc.)?
EDIT:
Since the data comes from the same platform, there should be no endianness concerns.
As mentioned above, I prefer to avoid copying.
Upon further reading, it seems to me that placement-new should be safe. So is the following implementation compliant?
ResultType processRelevantMessageType2(char const* in)
{
    IncomingMsgStruct const* pMsgStruct = new (in) IncomingMsgStruct;
    RelevantMessageData const* pRelevantMessageData = new (pMsgStruct->msgData) RelevantMessageData;
    // Assume we're only interested in the MsgBodyType2Struct case
    if (pRelevantMessageData->commonData.msgType == MSG_TYPE_2) {
        MsgBodyType2Struct const& msgBodyType2Struct = pRelevantMessageData->msgBody.msgBodyType2;
        // Can access the fields of msgBodyType2Struct here?
        // ...
    }
    // ...
}
My understanding is that a naive implementation using reinterpret_cast can run afoul of the C++ strict aliasing rules.
Indeed. Also, consider that an array of bytes might start at an arbitrary address in memory, whereas a struct typically has some alignment restrictions that need to be satisfied. The safest way to deal with this is to create a new object of the desired type, and use std::memcpy() to copy the bytes from the buffer into the object:
ResultType processRelevantMessage(char const* inBuffer) {
    MsgHdrStruct hdr;
    std::memcpy(&hdr, inBuffer, sizeof hdr);
    ...
    RelevantMessageData data;
    std::memcpy(&data, inBuffer + sizeof hdr, sizeof data);
    ...
}
The above is well-defined C++ code, you can use hdr and data afterwards without problems (as long as those are POD types that don't contain any pointers).
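The memcpy pattern can be wrapped in a small helper template if it appears often. A sketch (the name extract is mine, not from any library; the caller must still guarantee the buffer really holds a T):

```cpp
#include <cstring>
#include <type_traits>

// Wraps the well-defined "copy bytes into a fresh object" idiom.
// T must be trivially copyable for the memcpy to be valid.
template <class T>
T extract(char const* buffer)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "extract<T> requires a trivially copyable type");
    T value;
    std::memcpy(&value, buffer, sizeof value);
    return value;
}
```

Usage would then be, e.g., MsgHdrStruct hdr = extract<MsgHdrStruct>(inBuffer);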
I suggest using a serialization library or writing operator<< and operator>> overloads for those structs. You could use the functions htonl and ntohl, which are available on most platforms, or write a support class to stream numeric values yourself.
Such a class could look like this:
#include <bit>
#include <algorithm>
#include <cstring>
#include <iostream>
#include <iterator>
#include <limits>
#include <type_traits>
template<class T>
struct tfnet { // to/from net (or file)
    static_assert(std::endian::native == std::endian::little ||
                  std::endian::native == std::endian::big); // endianness must be known
    static_assert(std::numeric_limits<double>::is_iec559); // only support IEEE754
    static_assert(std::is_arithmetic_v<T>); // only for arithmetic types

    tfnet(T& v) : val(&v) {} // store a pointer to the value to be streamed

    // write a value to a stream
    friend std::ostream& operator<<(std::ostream& os, const tfnet& n) {
        if constexpr(std::endian::native == std::endian::little) {
            // reverse byte order to be in network byte order
            char buf[sizeof(T)];
            std::memcpy(buf, n.val, sizeof buf);
            std::reverse(std::begin(buf), std::end(buf));
            os.write(buf, sizeof buf);
        } else {
            // already in network byte order
            os.write(reinterpret_cast<const char*>(n.val), sizeof(T));
        }
        return os;
    }

    // read a value from a stream
    friend std::istream& operator>>(std::istream& is, const tfnet& n) {
        char buf[sizeof(T)];
        if(is.read(buf, sizeof buf)) {
            if constexpr(std::endian::native == std::endian::little) {
                // reverse byte order to be in host byte order
                std::reverse(std::begin(buf), std::end(buf));
            }
            std::memcpy(n.val, buf, sizeof buf);
        }
        return is;
    }

    T* val;
};
Now, if you have a set of structs:
#include <cstdint>
struct data {
    std::uint16_t x = 10;
    std::uint32_t y = 20;
    std::uint64_t z = 30;
};

struct compound {
    data x;
    int y = 40;
};
You can add the streaming operators for them:
std::ostream& operator<<(std::ostream& os, const data& d) {
    return os << tfnet{d.x} << tfnet{d.y} << tfnet{d.z};
}
std::istream& operator>>(std::istream& is, data& d) {
    return is >> tfnet{d.x} >> tfnet{d.y} >> tfnet{d.z};
}
std::ostream& operator<<(std::ostream& os, const compound& d) {
    return os << d.x << tfnet{d.y}; // using data's operator<< for d.x
}
std::istream& operator>>(std::istream& is, compound& d) {
    return is >> d.x >> tfnet{d.y}; // using data's operator>> for d.x
}
And reading/writing the structs:
#include <sstream>
int main() {
    std::stringstream ss;
    compound x;
    compound y{{0,0,0},0};
    ss << x; // write to stream
    ss >> y; // read from stream
}
If you can't use the streaming operators directly on the source streams, you can put the char buffer you do get in an istringstream and extract the data from that using the added operators.

Dynamically allocate memory to arrays in a union

I'm using union to fill some message fields in a char type message buffer. If the length of the message is constant, it works correctly. See the simplified code sample below.
The problem is, my message can have variable length. Specifically, the value of N will be decided at runtime. Is there a way to keep using unions by dynamically allocating memory for buf?
I'm exploring smart pointers but haven't had any luck so far.
const int N = 4;

struct repeating_group_t {
    uint8_t field1;
    uint8_t field2;
} rpt_group;

struct message_t
{
    union
    {
        char buf[2 + 2*N];
        struct {
            uint8_t header;
            uint8_t block_len;
            std::array<repeating_group_t, N> group;
        };
    };
};
int main()
{
    message_t msg;
    msg.header = 0x32;
    msg.block_len = 8;
    for (auto i = 0; i < N; i++)
    {
        msg.group[i].field1 = i;
        msg.group[i].field2 = 10*i;
    }
    // msg.buf is correctly filled
    return 0;
}
As said in the comments, use std::vector.
int main() {
    // before C++17 use unsigned char instead of std::byte
    std::vector<std::byte> v;
    v.push_back(std::byte{0x32});
    v.push_back(std::byte{8});
    for (auto i = 0; i < N; i++) {
        v.push_back(std::byte(i));
        const uint16_t a = 10 * i;
        // store uint16_t in big endian
        v.push_back(std::byte(a >> 8));
        v.push_back(std::byte(a & 0xff));
    }
}
For custom datatypes, you could provide your own stream-like or container-like container and overload operator>> or another custom function of your choice for your datatypes.
struct Message {
    std::vector<std::byte> v;

    Message& push8(uint8_t t) { ... }
    // push 16 bits little endian
    Message& push16le(uint16_t t) { ... }
    // push 16 bits big endian
    Message& push16be(uint16_t t) { ... }
    // etc
    Message& push(const Repeating_group& t) {
        v.push_back(std::byte{t.field1});
        v.push_back(std::byte{t.field2});
        return *this;
    }
    // etc.
};
int main() {
    Message v;
    v.push8(0x32).push8(8);
    for (...) {
        v.push(Repeating_group(i, i * 10));
    }
}
You can't have N evaluated at runtime, because both C arrays (your buf) and std::array encode their size in the type.
Also, using a union for (de)serialization is not good practice: the size of your structure will depend on the alignment rules of the machine it is compiled for, and so on. You could add a packed attribute to overcome that, but you still have plenty of platform-dependency problems here.
Regarding variable length: you'd need to write a custom (de)serializer that understands and stores/reads that size information, so the container can be recreated on the other end.
Where do you want to pass these messages?

How to write an easy to expand set of color classes in C++

Background
I'm working with a number of different displays (hardware) and different canvases (is that the plural of canvas?). Each can work with different colors. Example cases:
RGB16 canvas -> RGB16 display (this is straight forward, but I didn't want exotic combinations only)
RGB16 canvas -> RGB24 display
Monochrome Canvas -> RGB16 display, where 'true' must show up as a color set at runtime.
At the moment I'm indeed working with the mono -> rgb16 combo (3) to have everything displayed in red and dimmable. Each new display might also come with slightly different color types.
What I'd like to have
I'd like to have an easy to expand set of color classes (C++). My goal is to be able to write simple assignments, as in
Monochrome m; // default value set at runtime
RGB556 rgb; // default value set at runtime
rgb = m; // conversion function known at compile time
and also
pixelBuffer<Monochrome,w*h> src;
pixelBuffer<RGB556,w*h> dest;
std::copy(src.begin(), src.end(), dest.begin());
This should be possible with templates. However, I'm not sure how to do that, and how I can keep the whole thing simple enough so that more color types can be added later without having to rewrite most of the code, but still being able to influence the details.
I'm sure it's possible to write a class template that takes a type and automagically offers conversion methods to 24-bit RGB, so that any color can be converted to that and then "down"-converted to anything smaller than that.
What I've tried
Well, nothing specific, to be honest. I thought about using the CRTP to provide commonly used methods with compile-time inheritance, but I failed to come up with an implementation that compiles and does what I want. When I have something more I'll add it here.
Totally different approaches are also welcome as I haven't yet written code that uses a predefined interface that my color classes would have to provide.
Experiment 1: Abstract interface vs direct assignment
As suggested in the comments and the first answer, I wrote a simple test to get a feeling for the timing. Here's my header:
#ifndef COLORS_H
#define COLORS_H
#include <stdint.h>
class Color
{
public:
    virtual uint8_t r() const = 0;
    virtual uint8_t g() const = 0;
    virtual uint8_t b() const = 0;
    virtual void setR(uint8_t v) = 0;
    virtual void setG(uint8_t v) = 0;
    virtual void setB(uint8_t v) = 0;
    static uint32_t copies;
};

class RGB24 : public Color
{
public:
    RGB24(uint8_t r = 0, uint8_t g = 0, uint8_t b = 0)
        : r_(r),
          g_(g),
          b_(b)
    {
    }

    uint8_t r() const {return r_;}
    uint8_t g() const {return g_;}
    uint8_t b() const {return b_;}
    void setR(uint8_t v) {r_ = v;}
    void setG(uint8_t v) {g_ = v;}
    void setB(uint8_t v) {b_ = v;}

    RGB24& operator=(const Color& other)
    {
        copies++;
        setR(other.r());
        setG(other.g());
        setB(other.b());
        return *this;
    }

    RGB24& operator=(bool b)
    {
        copies++;
        if (b)
        {
            setR(0xFF);
            setG(0xFF);
            setB(0xFF);
        }
        return *this;
    }

private:
    uint8_t r_;
    uint8_t g_;
    uint8_t b_;
};

class Monochrome : public Color
{
public:
    Monochrome(bool b = false)
        : b_(b)
    {
    }

    uint8_t r() const {return b_ ? 0xFF : 0;}
    uint8_t g() const {return b_ ? 0xFF : 0;}
    uint8_t b() const {return b_ ? 0xFF : 0;}
    void setR(uint8_t v) {b_ = (v != 0);}
    void setG(uint8_t v) {b_ = (v != 0);}
    void setB(uint8_t v) {b_ = (v != 0);}

    Monochrome& operator=(const Color& other)
    {
        setR(other.r());
        setG(other.g());
        setB(other.b());
        return *this;
    }

private:
    bool b_;
};
#endif // COLORS_H
Here's the code I used to get some execution times (running on the target hardware, a Cortex-M4):
RGB24 rgb[N];
Monochrome m[N] = {true};
elapsedMicros t;
Serial.printf("Abstract Interface:\n");
std::copy(m, &m[N], rgb);
uint32_t us = t;
Serial.printf("time = %u us\n", us);
Serial.printf("rgb[N/2].r() = 0x%02x\n", rgb[N/2].r());
Serial.printf("copies: %u\n\n", Color::copies);
bool b[N] = {false};
Serial.printf("Direct copy:");
t = elapsedMicros();
std::copy(b, &b[N], rgb);
us = t;
Serial.printf("time = %u us\n", us);
Serial.printf("rgb[N/2].r() = 0x%02x\n", rgb[N/2].r());
Serial.printf("copies: %u\n\n", Color::copies);
And the output:
Abstract Interface:
time = 1241 us
rgb[N/2].r() = 0x00
copies: 1000
Direct copy:
time = 157 us
rgb[N/2].r() = 0x00
copies: 2000
When I extrapolate the first number, 1241 microseconds, for a 128*128 display at 25 fps, just converting between the colors would take 50% of the CPU time. The calculation is: 128*128*25 pixels per second * 1241 us / 1000 pixels = 0.51 seconds. A canvas/driver combo written for this situation can do it in about 0.1 seconds and it indeed has to copy and convert this many pixels per second because the whole display is drawn in every frame.
This comparison is probably a bit unfair, but bear with me. I'm not very experienced in profiling; and writing code that makes a fair comparison between what I have and what I'd like to have is simply not possible. The point is that Monochrome is essentially just a bool, and the compiler should be able to optimize for that when I have proper code.
Experiment 2: Separate classes with a templated convert function
As André suggested, I wrote an RGB24 class, defined it as the MostPreciseFormat, and a templated free convert function. That said, I'm not sure if this is exactly what he meant:
class RGB24
{
public:
    RGB24() : r_(0), g_(0), b_(0) {}

    uint8_t r_, g_, b_;

    uint8_t r() const {return r_;}
    uint8_t g() const {return g_;}
    uint8_t b() const {return b_;}
    void setR(const uint8_t& r) {r_ = r;}
    void setG(const uint8_t& g) {g_ = g;}
    void setB(const uint8_t& b) {b_ = b;}

    template<typename Other>
    RGB24(const Other& other)
    {
        convert(*this, other);
    }

    template <typename Other>
    RGB24& operator=(const Other& other)
    {
        convert(*this, other);
        return *this;
    }
};
typedef RGB24 MostPreciseFormat;
template <typename To, typename From>
void convert(To& to, const From& from)
{
    // Serial.println("Convert() called"); Serial.flush();
    MostPreciseFormat precise;
    precise.setR(from.r());
    precise.setG(from.g());
    precise.setB(from.b());
    to = precise;
}

template <>
void convert(RGB24& to, const bool& from)
{
    if (from)
    {
        to.setR(0xFF);
        to.setG(0xFF);
        to.setB(0xFF);
    }
    else
    {
        to.setR(0);
        to.setG(0);
        to.setB(0);
    }
}
This conversion needs 209 microseconds for 1000 pixels, which seems reasonable. But did I get it right?
What I have now
This works as intended, based on André's answer. It has some issues, and probably needs some restructuring here and there. I have not yet looked at the CPU time it takes:
#include <bitset>
#include <iostream>
#include <stdint.h>
using namespace std;
namespace channel
{

static constexpr struct left_aligned_t {} left_aligned = left_aligned_t();
static constexpr struct right_aligned_t {} right_aligned = right_aligned_t();

template<typename T, unsigned int Offset_, unsigned int Width_>
class Proxy
{
public:
    /* Some checks and typedefs */
    static_assert(std::is_unsigned<T>::value, "ChannelProxy: T must be an unsigned arithmetic type.");
    typedef T data_type;
    static constexpr unsigned int Width = Width_;
    static_assert(Width <= 8, "ChannelProxy: Width must be <= 8.");
    static constexpr unsigned int Offset = Offset_;
    static_assert((Offset + Width) <= 8*sizeof(T), "ChannelProxy: Channel is out of the data type's bounds. Check data type, offset and width.");

    Proxy(T& data) : data_(data) {}

    uint8_t read(right_aligned_t) const
    {
        return ((data_ & read_mask) >> Offset);
    }

    uint8_t read(left_aligned_t) const
    {
        return read(right_aligned) << (8-Width);
    }

    void write(const uint8_t& value, right_aligned_t)
    {
        // input data is right aligned
        data_ = (data_ & write_mask) | ((value & value_mask) << Offset);
    }

    void write(const uint8_t& value, left_aligned_t)
    {
        // input data is left aligned, so shift right to right align, then write
        write(value >> (8-Width), right_aligned);
    }

private:
    static constexpr uint8_t value_mask = (uint8_t)((1<<Width)-1);
    static constexpr T read_mask = (value_mask << Offset);
    static constexpr T write_mask = (T)~read_mask;
    T& data_;
};

} // namespace channel

struct RGB24
{
    typedef channel::Proxy<uint8_t, 0, 8> proxy;
    typedef channel::Proxy<const uint8_t, 0, 8> const_proxy;

    RGB24() : r_(0), g_(0), b_(0) {}
    RGB24(const uint8_t& r, const uint8_t& g, const uint8_t& b)
        : r_(r), g_(g), b_(b) {}

    // unfortunately, we need different proxies for read and write access (data_type constness)
    const_proxy r() const {return const_proxy(r_);}
    proxy r() {return proxy(r_);}
    const_proxy g() const {return const_proxy(g_);}
    proxy g() {return proxy(g_);}
    const_proxy b() const {return const_proxy(b_);}
    proxy b() {return proxy(b_);}

    template <typename From>
    RGB24& operator=(const From& from)
    {
        convert(*this, from);
        return *this;
    }

    uint8_t r_;
    uint8_t g_;
    uint8_t b_;
};

struct RGB565 // 16 bits: MSB | RRRRR GGGGGG BBBBB | LSB
{
    typedef uint16_t data_type;
    typedef channel::Proxy<data_type, 0, 5> b_proxy;
    typedef channel::Proxy<const data_type, 0, 5> const_b_proxy;
    typedef channel::Proxy<data_type, 5, 6> g_proxy;
    typedef channel::Proxy<const data_type, 5, 6> const_g_proxy;
    typedef channel::Proxy<data_type, 11, 5> r_proxy;
    typedef channel::Proxy<const data_type, 11, 5> const_r_proxy;

    RGB565() : data_(0) {}

    template <typename alignment_type = channel::right_aligned_t>
    RGB565(const uint8_t& r_, const uint8_t& g_, const uint8_t& b_, alignment_type = alignment_type())
    {
        alignment_type alignment;
        r().write(r_, alignment);
        g().write(g_, alignment);
        b().write(b_, alignment);
    }

    template <typename From>
    RGB565& operator=(const From& from)
    {
        convert(*this, from);
        return *this;
    }

    const_r_proxy r() const {return const_r_proxy(data_);}
    r_proxy r() {return r_proxy(data_);}
    const_g_proxy g() const {return const_g_proxy(data_);}
    g_proxy g() {return g_proxy(data_);}
    const_b_proxy b() const {return const_b_proxy(data_);}
    b_proxy b() {return b_proxy(data_);}

    data_type data_;
};

typedef bool Monochrome;

template <typename To, typename From>
void convert(To& to, const From& from)
{
    to.r().write(from.r().read(channel::left_aligned), channel::left_aligned);
    to.g().write(from.g().read(channel::left_aligned), channel::left_aligned);
    to.b().write(from.b().read(channel::left_aligned), channel::left_aligned);
}

/* bool to RGB565 wouldn't work without this: */
template <>
void convert<RGB565, Monochrome>(RGB565& to, const Monochrome& from)
{
    to.data_ = from ? 0xFFFF : 0;
}

int main()
{
    cout << "Initializing RGB24 color0(0b11111101, 0, 0)\n\n";
    RGB24 color0(0b11111101, 0, 0);
    cout << "Initializing RGB24 color1(default)\n\n";
    RGB24 color1;

    cout << "color 1 = color0\n";
    color1 = color0;
    cout << "color1.r() = " << std::bitset<8*sizeof(uint8_t)>(color1.r().read(channel::right_aligned)) << "\n";
    cout << "color1.g() = " << std::bitset<8*sizeof(uint8_t)>(color1.g().read(channel::right_aligned)) << "\n";
    cout << "color1.b() = " << std::bitset<8*sizeof(uint8_t)>(color1.b().read(channel::right_aligned)) << "\n\n";

    cout << "Initializing RGB565 color2(0b10001, 0b100100, 0b10100)\n";
    RGB565 color2(0b10001, 0b100100, 0b10100);
    cout << "color2.data = " << std::bitset<8*sizeof(uint16_t)>(color2.data_) << "\n";
    cout << "color2.b(right aligned) = " << std::bitset<8*sizeof(uint8_t)>(color2.b().read(channel::right_aligned)) << "\n";
    cout << "color2.b(left aligned) = " << std::bitset<8*sizeof(uint8_t)>(color2.b().read(channel::left_aligned)) << "\n\n";

    cout << "color 0 = color2\n";
    color0 = color2;
    cout << "color0.b(right aligned) = " << std::bitset<8*sizeof(uint8_t)>(color0.b().read(channel::right_aligned)) << "\n";
    cout << "color0.b(left aligned) = " << std::bitset<8*sizeof(uint8_t)>(color0.b().read(channel::left_aligned)) << "\n\n";

    cout << "Initializing Monochrome color3(true)\n\n";
    Monochrome color3 = true;
    cout << "color 2 = color3\n";
    color2 = color3;
    cout << "color2.data = " << std::bitset<8*sizeof(uint16_t)>(color2.data_) << "\n";
    cout << "color2.b(right aligned) = " << std::bitset<8*sizeof(uint8_t)>(color2.b().read(channel::right_aligned)) << "\n";
    cout << "color2.b(left aligned) = " << std::bitset<8*sizeof(uint8_t)>(color2.b().read(channel::left_aligned)) << "\n\n";

    return 0;
}
With this code, converting 1000 pixels from RGB565 to RGB24 takes 296 us with the source pixels generated from ADC noise (the compiler cannot have taken any shortcut here regarding the source data). Converting 1000 pixels from Monochrome to RGB24 takes 313 us, using a template specialization of convert().
Firstly, the question makes clear that the solution needs very fast execution, so this requirement should drive the approaches we try.
This leads to the conclusion that we should avoid virtual function calls on a per-pixel basis; otherwise the CPU has to make one extra, unneeded indirection for each pixel. We should not avoid virtual functions entirely, though, as it is perfectly acceptable to use them for per-canvas operations.
So, the general solution I propose is to focus on run-time flexibility on canvas classes, so that you can, for instance, use inheritance for each canvas type, and to focus on compile-time binding for pixel operations.
The problem suggests the most important feature to be addressed by the color classes is the conversion of colors formats between them, so I'll focus on that for now. You can implement conversion functions between types using three approaches:
Star: every color format is convertible to and from the most precise format. To convert A into B, first convert A to the most precise format, then from this format to B. This is simple and very extensible, as adding a new format just requires the definition of two more functions, i.e., the number of functions grows linearly with the number of formats (O(N)).
Fully connected: every color format is convertible to and from every other color format. This is much faster, because it requires only one conversion with minimum effort and maximum potential for optimization. However, the number of functions is O(N^2).
Hybrid: if a direct conversion is defined, it is used, otherwise, use the most precise format as intermediary.
For the compiler to pick the right conversion function, templates are the most elegant solution I can figure out. I'll try to keep it simple, but of course this can lead to some limitations.
Added: One of the limitations is that C++ has no partial function template specialization, which would be useful when some conversion code applies to more than one pair of formats. A suggested approach to deal with this is to use a traits system to describe a color format; the convert function in the code below would then be written as a template method in the format traits. This is not covered in this answer.
Let's have some code (Edited):
// Complete and repeat the class definition below for every color format.
// There is no specific interface to follow, but all classes must have a
// template constructor and a template assignment operator to convert from
// other color formats.
class ColorXYZ {
public:
    ...
    template <class Other>
    ColorXYZ(const Other& other) {
        convert(*this, other);
    }

    template <class Other>
    ColorXYZ& operator=(const Other& other) {
        convert(*this, other);
        return *this;
    }
    ...
};
// These should be class definitions, not just forward declarations:
class ColorMono;
class ColorRGB16;
class ColorRGB24;
// Every format must be able to convert to and from the MostPreciseFormat
typedef ColorRGB24 MostPreciseFormat;
// Generic conversion of color formats that converts to MostPreciseFormat and
// then to the required format.
template <class To, class From>
void convert(To& to, const From& from) {
    MostPreciseFormat precise(from);
    convert(to, precise);
}

// Specialization to convert from Mono to RGB24.
template <>
void convert<ColorRGB24, ColorMono>(ColorRGB24& to, const ColorMono& from) {
    // specific code to convert from mono to RGB24.
    to.setR(from.value() ? 255 : 0);
    to.setG(from.value() ? 255 : 0);
    to.setB(from.value() ? 255 : 0);
}
... // A lot of other specializations of convert here.
The compiler will pick the most specialized conversion function when it exists; otherwise it falls back to the conversion that goes through MostPreciseFormat.
Added: It is important that all specializations of convert are defined for MostPreciseFormat in both To and From, including the case where that format is the same for To and From. To be more specific, for the code above, at least the following specializations are needed:
convert<ColorRGB24, ColorMono>
convert<ColorRGB24, ColorRGB16>
convert<ColorRGB24, ColorXYZ>
convert<ColorRGB24, ColorRGB24>
convert<ColorMono, ColorRGB24>
convert<ColorRGB16, ColorRGB24>
convert<ColorXYZ, ColorRGB24>
Added: Other conversions, such as Mono to RGB16, can use the generic routine, which will be instantiated as a conversion from Mono to RGB24 followed by RGB24 to RGB16. This is less efficient (a "star" topology), but it works. For common cases it may be worth adding direct specializations as well (moving towards a fully connected topology).
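A minimal compilable sketch of the star topology with one direct edge added for a hot path. The color classes here are hypothetical stand-ins (plain structs instead of the classes with converting constructors above), so the generic routine hops through RGB24 with two convert calls:

```cpp
#include <cstdint>

// Hypothetical minimal formats, invented for illustration.
struct ColorMono  { bool on; };
struct ColorRGB16 { std::uint16_t r, g, b; };
struct ColorRGB24 { std::uint8_t  r, g, b; };

typedef ColorRGB24 MostPreciseFormat;

// Generic "star" conversion: hop through the most precise format.
// Relies on the RGB24 spokes below existing, or it would recurse forever.
template <class To, class From>
void convert(To& to, const From& from) {
    MostPreciseFormat precise;
    convert(precise, from);  // From -> RGB24
    convert(to, precise);    // RGB24 -> To
}

// Spokes of the star: every format to/from RGB24.
template <>
void convert<ColorRGB24, ColorMono>(ColorRGB24& to, const ColorMono& from) {
    to.r = to.g = to.b = from.on ? 255 : 0;
}
template <>
void convert<ColorRGB16, ColorRGB24>(ColorRGB16& to, const ColorRGB24& from) {
    to.r = from.r << 8; to.g = from.g << 8; to.b = from.b << 8;
}

// Direct edge (towards fully connected): skips the RGB24 hop, and can
// even be more precise than the two-hop route.
template <>
void convert<ColorRGB16, ColorMono>(ColorRGB16& to, const ColorMono& from) {
    to.r = to.g = to.b = from.on ? 0xFFFF : 0;
}
```

The compiler picks the direct Mono-to-RGB16 edge when it exists; removing it would silently fall back to the two-hop generic routine with the same call syntax.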
You should note that there is no common base class for the colors, and there shouldn't be one. I expect the user to have a Canvas class that abstracts the details of working with every pixel format by means of virtual methods. For instance:
class Canvas {
public:
...
virtual void setPixel(unsigned int index, ColorRGB24 color) = 0;
...
};
class CanvasMono : public Canvas {
public:
...
virtual void setPixel(unsigned int index, ColorRGB24 color) {
pixel[index] = color; // converts from RGB24 to mono via ColorMono's template assignment operator
}
...
private:
ColorMono* pixel;
};
Depending on the most common use cases, it might be worth having an overload for each format, or at least for the most common ones. But if the user doesn't work with formatted color values frequently, a single overload that always goes through the convert function may be enough.
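A sketch of that overload-per-format idea, using hypothetical minimal color structs and a fixed-size pixel buffer for illustration. The native format gets a fast non-template overload; everything else falls back to a template overload that routes through convert:

```cpp
#include <cstdint>

// Hypothetical minimal formats, invented for illustration.
struct ColorMono  { bool on; };
struct ColorRGB24 { std::uint8_t r, g, b; };

template <class To, class From> void convert(To&, const From&);
template <>
void convert<ColorRGB24, ColorMono>(ColorRGB24& to, const ColorMono& from) {
    to.r = to.g = to.b = from.on ? 255 : 0;
}

class CanvasRGB24 {
public:
    // Fast path: the canvas's native format, no conversion at all.
    void setPixel(unsigned int index, ColorRGB24 color) { pixel[index] = color; }
    // Fallback: any other format goes through convert(). The non-template
    // overload is preferred when both are viable.
    template <class Other>
    void setPixel(unsigned int index, const Other& other) {
        ColorRGB24 c;
        convert(c, other);
        pixel[index] = c;
    }
    ColorRGB24 getPixel(unsigned int index) const { return pixel[index]; }
private:
    ColorRGB24 pixel[16];  // fixed size only to keep the sketch self-contained
};
```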
I hope I covered the most relevant aspects of the color system you want to create.
Create a Color interface with setter/getter methods
class Color {
public:
virtual void setR(unsigned char) = 0;
virtual unsigned char getR() = 0;
...
};
and inherit from it in custom color classes as you like, with bit-field data and overridden methods:
class RGB556 : public Color {
unsigned char r:5;
unsigned char g:5;
unsigned char b:6;
public:
void setR(unsigned char r) { this->r=r; }
unsigned char getR() { return r; }
...
};
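A completed, compilable version of that interface sketch (the halveRed helper is made up to show generic code working through the interface). Note that the 5-bit field silently truncates values above 31:

```cpp
class Color {
public:
    virtual void setR(unsigned char r) = 0;
    virtual unsigned char getR() = 0;
    virtual ~Color() {}
};

class RGB556 : public Color {
    unsigned char r : 5;
    unsigned char g : 5;
    unsigned char b : 6;
public:
    RGB556() : r(0), g(0), b(0) {}
    void setR(unsigned char r) { this->r = r; }  // values > 31 are truncated
    unsigned char getR() { return r; }
};

// Hypothetical generic code: works on any format through the interface.
void halveRed(Color& c) { c.setR(c.getR() / 2); }
```

The cost of this design is a virtual call per channel access, which matters if you touch every pixel of a large image.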

change bit size of packed struct in derived class

The existing code:
typedef unsigned int uint;
class A
{
union xReg
{
uint allX;
struct
{
uint x3 : 9;
uint x2 : 9;
uint x1 : 14;
}__attribute__((__packed__)) __attribute__((aligned(4)));
};
};
My requirement:
Now, I need to derive a class from A, and in the derived class the bit sizes of x1, x2 and x3 have to change.
How do I do this?
Thanks for your help!
EDIT
I have a class (let's say A) with approx. 7-8 unions (each representing a HW register), and around 20 functions. Most of these functions create instances of these unions and use the bits (x1, x2, x3 etc. in my example).
Now, my requirement is to add code for a new hardware which has 95% of the functionality in common. The changes include a change in register bit sizes, and some functionality changes. So, among the 20 functions, at least 5 need their implementation changed. This is the reason I selected inheritance and overriding these functions.
For the remaining 15 functions, the only change is the bit sizes. So I don't want to override these functions, but use the base class ones. But the bit sizes of the registers (unions) should change. How do I do that?
You cannot change a bit-field length in a derived class in C++.
What you could try, however, is parametrizing your class with the bit-field lengths.
template <size_t N1, size_t N2, size_t N3 = 32 - N1 - N2>
struct myStruct
{
uint bitField1 : N1;
uint bitField2 : N2;
uint bitField3 : N3;
};
Now you can instantiate the struct with any N1, N2, N3 you wish, for example:
myStruct<9, 9> s;
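A self-contained version of the sketch above, with two hypothetical register variants to show that each instantiation is a distinct type with its own field widths:

```cpp
#include <cstddef>

typedef unsigned int uint;

// The widths are template parameters, so they are compile-time constants
// and legal as bit-field lengths.
template <std::size_t N1, std::size_t N2, std::size_t N3 = 32 - N1 - N2>
struct myStruct
{
    uint bitField1 : N1;
    uint bitField2 : N2;
    uint bitField3 : N3;
};

// Two hardware variants of the same logical register (widths invented):
typedef myStruct<9, 9> RegOld;  // 9 / 9 / 14 bits
typedef myStruct<8, 9> RegNew;  // 8 / 9 / 15 bits
```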
With the given design you cannot solve it. The problem is that while you can derive and override methods, data members cannot be overridden: the base-class functions that you do not override would still access the fields in exactly the same way they do now, and the result is that you would be using different sizes in different places.
Runtime polymorphism
I haven't thought much about the design, but a first simple runtime approach would be to refactor the existing code so that, instead of accessing the fields directly, it goes through accessors (setters and getters) that map the arguments to the storage types. You could then override those accessors, and the functions would not depend on the exact size of each bit-field. On the negative side, making the accessors virtual has a performance impact, so you might consider the static alternative below instead.
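A minimal sketch of that accessor refactoring, with explicit shifts and masks instead of bit-fields (the positions below assume x1 sits in the top bits, matching a typical little-endian layout for the unions in the question, but bit-field placement is implementation-defined, so real code would match the compiler's layout):

```cpp
// Base class for the old hardware: x1 is 14 bits starting at bit 18.
class RegAccess {
public:
    RegAccess() : raw(0) {}
    virtual unsigned int x1() const { return (raw >> 18) & 0x3FFFu; }
    virtual void x1(unsigned int v) {
        raw = (raw & ~(0x3FFFu << 18)) | ((v & 0x3FFFu) << 18);
    }
    virtual ~RegAccess() {}
protected:
    unsigned int raw;  // the whole register, like the union's allX
};

// New hardware: x1 grows to 15 bits starting at bit 17. Only the
// accessors change; every function written against RegAccess still works.
class RegAccessNew : public RegAccess {
public:
    virtual unsigned int x1() const { return (raw >> 17) & 0x7FFFu; }
    virtual void x1(unsigned int v) {
        raw = (raw & ~(0x7FFFu << 17)) | ((v & 0x7FFFu) << 17);
    }
};
```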
Compile time (or static) polymorphism
You can refactor the class so that it is a template that takes the type of the union as an argument. That way, you can instantiate the template with a different union in what, in your current design, would be a derived class. Adding new member functions (if you want to use member functions) would not be so simple, and you might end up having to use CRTP or some other approach to create the base of the implementation while allowing you to extend it with specialization.
#include <iostream>

template <typename R>
class A
{
R xReg;
public:
unsigned int read_x1() const {
return xReg.x1;
}
// rest of implementation
};
union xReg1 {
unsigned int all;
struct {
unsigned int x3 : 9;
unsigned int x2 : 9;
unsigned int x1 : 14;
};
};
union xReg2 {
unsigned int all;
struct {
unsigned int x3 : 8;
unsigned int x2 : 9;
unsigned int x1 : 15;
};
};
int main() {
A< xReg1 > initial;
std::cout << initial.read_x1() << std::endl;
A< xReg2 > second;
std::cout << second.read_x1() << std::endl;
}
Given your additional problem statement, a variant on what Armen suggested might be applicable. It sounds as though you do not need actual inheritance here, just some way to reuse some of the common code.
You might not want member functions at all, for instance.
template<typename reg>
union hardware_register {
unsigned all;
struct {
unsigned i : reg::i;
unsigned j : reg::j;
unsigned k : reg::k;
};
};
template<typename hardware>
void print_fields(const hardware& hw) {
cout << hw.i << " " << hw.j << " " << hw.k << endl;
}
//this method needs special handling depending on what type of hardware you're on
template<typename hardware>
void print_largest_field(const hardware& hw);
struct register_a {
static const unsigned i = 9;
static const unsigned j = 4;
static const unsigned k = 15;
};
struct register_b {
static const unsigned i = 4;
static const unsigned j = 15;
static const unsigned k = 9;
};
template<>
void print_largest_field< hardware_register<register_a> >(const hardware_register<register_a>& a) {
cout << a.k << endl;
}
template<>
void print_largest_field< hardware_register<register_b> >(const hardware_register<register_b>& b) {
cout << b.j << endl;
}
int main() {
hardware_register<register_a> a;
hardware_register<register_b> b;
print_fields(a);
print_fields(b);
print_largest_field(a);
print_largest_field(b);
}
Alternatively, you can wrap up all the common functionality into a templated base class. You derive from that base class, and implement whatever special behavior you need.
template<typename HW>
struct base {
void print_fields() {
cout << hw.i << hw.j << hw.k << endl;
}
protected:
HW hw;
};
struct hw_a : base< hardware_register<register_a> > {
void print_largest_field() {
cout << hw.k << endl;
}
};
struct hw_b : base< hardware_register<register_b> > {
void print_largest_field() {
cout << hw.j << endl;
}
};
You can provide multiple template parameters for your different types of registers, or expand the underlying trait structure so that it defines more than one register at a time.
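For example, a single trait could describe a whole hardware variant with one nested trait per register. All names and widths below are invented for illustration, and the anonymous struct inside the union is a common compiler extension, as in the code above:

```cpp
// Same wrapper as before: widths come from a trait type.
template<typename reg>
union hardware_register {
    unsigned all;
    struct {
        unsigned i : reg::i;
        unsigned j : reg::j;
        unsigned k : reg::k;
    };
};

// One trait per hardware variant, with one nested trait per register.
struct hardware_a {
    struct ctrl { static const unsigned i = 9,  j = 4,  k = 15; };
    struct stat { static const unsigned i = 4,  j = 15, k = 9;  };
};

// The register file for a variant is assembled from its nested traits.
template<typename hw>
struct register_file {
    hardware_register<typename hw::ctrl> control;
    hardware_register<typename hw::stat> status;
};
```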

Problem using reinterpret_cast<> in c++

I am trying to cast a data stream into a struct, since the data stream consists of fixed-width messages and each message has fully defined fixed-width fields as well. I was planning on creating a struct and then using reinterpret_cast to cast a pointer to the data stream to the struct to get the fields. I made some test code and get weird results. Could anyone explain why I am getting these, or how to correct the code? (The data stream will be binary and alphanumeric mixed, but I'm just testing with strings.)
#pragma pack(push,1)
struct Header
{
char msgType[1];
char filler[1];
char third[1];
char fourth[1];
};
#pragma pack(pop)
int main(void)
{
cout << sizeof(Header) << endl;
char* data = "four";
Header* header = reinterpret_cast<Header*>(data);
cout << header->msgType << endl;
cout << header ->filler << endl;
cout << header->third << endl;
cout << header->fourth << endl;
return 0;
}
The result that are coming up are
4
four
our
ur
r
I think "four", "our" and "ur" are printing because it can't find the null terminator. How do I get around the null terminator issue?
In order to print an array of chars, and distinguish it from a null-terminated string, you need a different operator<< definition:
template< size_t N >
std::ostream& operator<<( std::ostream& out, const char (&array)[N] ) {
for( size_t i = 0; i != N; ++i ) out << array[i];
return out;
}
You're right about the lack of a null terminator: each char[1] member decays to a char* and the stream keeps printing past the one-character array until it happens to hit a zero byte. Instead of "char[1]", why not just declare those members as plain "char"?
struct Header
{
char msgType;
char filler;
char third;
char fourth;
};
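With plain char members, each field prints as exactly one character. The parseHeader helper below is a hypothetical addition showing a safe way to populate the struct: memcpy from the stream instead of reinterpret_cast, which sidesteps alignment and aliasing concerns:

```cpp
#include <cstring>

#pragma pack(push,1)
struct Header
{
    char msgType;
    char filler;
    char third;
    char fourth;
};
#pragma pack(pop)

// Copy the fixed-width message out of the stream into the struct.
Header parseHeader(const char* data)
{
    Header h;
    std::memcpy(&h, data, sizeof h);
    return h;
}
```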
The issue is not reinterpret_cast (although using it is a very bad idea) but in the types of the things in the struct. They should be of type 'char', not of type 'char[1]'.
#pragma pack(push,1)
template<int N>
struct THeader
{
char msgType[1+N];
char filler[1+N];
char third[1+N];
char fourth[1+N];
};
typedef THeader<0> Header0;
typedef THeader<1> Header1;
Header1 Convert(const Header0 & h0) {
Header1 h1 = {0};
std::copy(h0.msgType, h0.msgType + sizeof(h0.msgType)/sizeof(h0.msgType[0]), h1.msgType);
std::copy(h0.filler, h0.filler+ sizeof(h0.filler)/sizeof(h0.filler[0]), h1.filler);
std::copy(h0.third , h0.third + sizeof(h0.third) /sizeof(h0.third [0]), h1.third);
std::copy(h0.fourth, h0.fourth+ sizeof(h0.fourth)/sizeof(h0.fourth[0]), h1.fourth);
return h1;
}
#pragma pack(pop)
int main(void)
{
cout << sizeof(Header0) << endl;
const char* data = "four";
const Header0* header0 = reinterpret_cast<const Header0*>(data);
Header1 header = Convert(*header0);
cout << header.msgType << endl;
cout << header.filler << endl;
cout << header.third << endl;
cout << header.fourth << endl;
return 0;
}
In my experience, using #pragma pack has caused headaches -- partially due to a compiler that doesn't correctly pop, but also due to developers forgetting to pop in one header. One mistake like that and structs end up defined differently depending on which order headers get included in a compilation unit. It's a debugging nightmare.
I try not to do memory overlays for that reason -- you can't trust that your struct is properly aligned with the data you are expecting. Instead, I create structs (or classes) that contain the data from a message in a "native" C++ format. For example, you don't need a "filler" field defined if it's just there for alignment purposes. And perhaps it makes more sense for the type of a field to be int than for it to be char[4]. As soon as possible, translate the datastream into the "native" type.
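A sketch of that translate-to-native approach, using a made-up wire format (a 1-byte type tag followed by a 4-byte big-endian integer): the decoder assembles values byte by byte, so host endianness, padding, and alignment never matter, and no #pragma pack is needed:

```cpp
// "Native" C++ representation of the message: convenient types,
// no filler fields, independent of the wire layout.
struct NativeMsg {
    int msgType;
    int value;
};

// Decode a 5-byte message: tag byte, then a big-endian 32-bit payload.
NativeMsg decode(const unsigned char* wire)
{
    NativeMsg m;
    m.msgType = wire[0];
    m.value = (wire[1] << 24) | (wire[2] << 16) | (wire[3] << 8) | wire[4];
    return m;
}
```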
Assuming you want to keep using an overlayable struct (which is sensible, since it avoids the copy in Alexey's code), you can replace your raw char arrays with a wrapper like the following:
#include <cstring> // for memchr

template <int N> struct FixedStr {
char v[N];
};
template <int N>
std::ostream& operator<<( std::ostream& out, FixedStr const &str) {
char const *nul = (char const *)memchr(str.v, 0, N);
int n = (nul == NULL) ? N : nul-str.v;
out.write(str.v, n);
return out;
}
Then your generated structures will look like:
struct Header
{
FixedStr<1> msgType;
FixedStr<1> filler;
FixedStr<1> third;
FixedStr<40> forty;
};
and your existing code should work fine.
NB. You can add methods to FixedStr if you want (e.g., std::string FixedStr::toString()); just don't add virtual methods or inheritance, and it will overlay fine.
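A sketch of that toString method, assuming the same semantics as the operator<< above (stop at the first NUL, or take all N characters if there is none). Non-virtual members don't change the struct's size or layout, so it still overlays a bare char[N]:

```cpp
#include <cstring>
#include <string>

template <int N> struct FixedStr {
    char v[N];

    // Convert to std::string, stopping at an embedded NUL if present.
    std::string toString() const {
        const char* nul = static_cast<const char*>(std::memchr(v, 0, N));
        return std::string(v, nul ? nul - v : N);
    }
};
```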