C++ class initialization of multiple data members with one assignment - c++

Assume I have the following class that is more or less a generic mechanism to store arbitrary values/types in the same class (this is in the context of a database)
class RawStruct {
private:
uint16_t value1; /// value[0..1]
uint16_t value2; /// value[2..3]
uint16_t value3; /// value[4..5]
uint16_t value4; /// value[6..7]
uint8_t flags; ///
uint8_t padding; /// unused.
INLINE_HOST_DEVICE RawStruct(const int64_t value) : RawStruct(toFlags(XMLDatatype::xsd_long),(uint64_t)value) {}
INLINE_HOST_DEVICE RawStruct(uint8_t flags, uint64_t value) {setValue(flags,value);}
INLINE_HOST_ONLY void setValue(uint8_t flags, uint64_t value) {
*((uint64_t*)&this->value1) = value;
/*
*((uint16_t*)&this->value1) = (value);
*((uint16_t*)&this->value2) = (value) >> 16;
*((uint16_t*)&this->value3) = (value) >> 32;
*((uint16_t*)&this->value4) = (value) >> 48;
*/
this->flags = flags;
this->padding = 0;
}
Now some code that invokes the above:
int value=42;
char* data = (char*)malloc(6*sizeof(RawStruct));
RawStruct* rsArray = (RawStruct*)data;
rsArray[0] = RawStruct((long) value);
Now my problem: When compiling a complex piece of code that more or less behaves like the invocation above I get some incorrect value being stored in the RawStruct array. This only happens in our testing on the ARM platform with optimizations (-O3). On X86 all our testing passes. They also pass on arm when using debug mode (-g and no optimizations).
My question would be : Are we violating some rule inside that setValue method by looking at the four uint16_t-s as one big uint64_t ?
Do I hit maybe some incorrect code generation by the compiler ?
The code works fine if I comment the current cast/assignment and replace with the four individual assignments that are commented out above (inside setValue() helper).

The line
*((uint64_t*)&this->value1) = value;
has undefined behavior. &this->value1 is a pointer to a uint16_t. You cannot use a pointer of type uint64_t* to write to that uint16_t object.
It violates the aliasing rules and even if they weren't a thing, the C++ object model wouldn't have any concept of writing to multiple consecutive objects in one access like this.
The alignment requirements of the types might also be a problem. this->value1 may not be aligned correctly for a uint64_t, causing undefined behavior as well (at least in general, probably not with the malloc).
And even further, there is no guarantee that there won't be padding between the members in general, although you would probably know if this was the case based on the target ABI.
If you find yourself using C-style casts in C++, rethink what you are doing. If it can't be done with a C++-style static_cast instead, it is very likely leading to UB (and even static_cast allows some potentially dangerous cast).
For example (uint64_t)value can also be written as static_cast<uint64>(value) and will work, while static_cast<uint64_t*>(&this->value1) will fail, which should be considered a warning that there is danger in the cast. (C++-style would require reinterpret_cast here.)
Also, malloc in C++ code is very likely wrong and leading to UB as well. It should be new (or rather std::unique_ptr or std::vector or some equivalent) instead. (In your particular example here it turns out to be fine, but if your class wasn't that trivial, it wouldn't.)

Related

Passing pointers to arrays of unrelated, but compatible, types without copying?

(Disclaimer: At this point, this is mostly academic interest.)
Imagine I have such an external interface, that is, I do not control it's code:
// Provided externally: Cannot (easily) change this:
// fill buffer with n floats:
void data_source_external(float* pDataOut, size_t n);
// send n data words from pDataIn:
void data_sink_external(const uint32_t* pDataIn, size_t n);
Is it possible within standard C++ to "move" / "stream" data between these two interfaces without copying?
That is, is there any way to make the following be non-UB, without copying of the data between two correctly typed buffers?
int main()
{
constexpr size_t n = 64;
float fbuffer[n];
data_source_external(fbuffer, n);
// These hold and can be checked statically:
static_assert(sizeof(float) == sizeof(uint32_t), "same size");
static_assert(alignof(float) == alignof(uint32_t), "same alignment");
static_assert(std::numeric_limits<float>::is_iec559 == true, "IEEE 754");
// This is clearly UB. Any way to make this work without copying the data?
const uint32_t* buffer_alias = static_cast<uint32_t*>(static_cast<void*>(fbuffer));
// **Note**:
// + reinterpret_cast would also be UB.
data_sink_external(buffer_alias, n);
// ...
As far as I can tell the following would be defined behavior, at least with regard to strict aliasing:
...
uint32_t ibuffer[n];
std::memcpy(ibuffer, fbuffer, n * sizeof(uint32_t));
data_sink_external(ibuffer, n);
but given that the ibuffer will have exactly the same bits as the fbuffer this seems quite insane.
Or would we expect optimizing compilers to optimize even this copy away? (In a now deleted comment-like answer a user posted a godbolt link that seems to indicate, at least on first glance, that clang 11 indeed would be able to optimize out the memcpy.)
I didn't test and can't comment yet (cause not enough reputation). But reinterpret_cast may help in this situation.
Documentation
Basically it tells the compiler, hey treat this pointer as if it was the specified type in the cast.

Is there any portable way to ensure a struct is defined without padding bytes using c++11? [duplicate]

This question already has answers here:
Compile-time check to make sure that there is no padding anywhere in a struct
(4 answers)
Closed 3 years ago.
Lets consider the following task:
My C++ module as part of an embedded system receives 8 bytes of data, like: uint8_t data[8].
The value of the first byte determines the layout of the rest (20-30 different). In order to get the data effectively, I would create different structs for each layout and put each to a union and read the data directly from the address of my input through a pointer like this:
struct Interpretation_1 {
uint8_t multiplexer;
uint8_t timestamp;
uint32_t position;
uint16_t speed;
};
// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
Interpretation_1 movement;
//Interpretation_2 temperatures;
//etc...
};
...
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << +interpreter->movement.position << "\n";
The problem I have is, the compiler can insert padding bytes to the interpretation structs and this kills my idea. I know I can use
with gcc: struct MyStruct{} __attribute__((__packed__));
with MSVC: I can use #pragma pack(push, 1) MyStruct{}; #pragma pack(pop)
with clang: ? (I could check it)
But is there any portable way to achieve this? I know c++11 has e.g. alignas for alignment control, but can I use it for this? I have to use c++11 but I would be just interested if there is a better solution with later version of c++.
But is there any portable way to achieve this?
No, there is no (standard) way to "make" a type that would have padding to not have padding in C++. All objects are aligned at least as much as their type requires and if that alignment doesn't match with the previous sub objects, then there will be padding and that is unavoidable.
Furthermore, there is another problem: You're accessing through a reinterpreted pointed that doesn't point to an object of compatible type. The behaviour of the program is undefined.
We can conclude that classes are not generally useful for representing arbitrary binary data. The packed structures are non-standard, and they also aren't compatible across different systems with different representations for integers (byte endianness).
There is a way to check whether a type contains padding: Compare the size of the sub objects to the size of the complete object, and do this recursively to each member. If the sizes don't match, then there is padding. This is quite tricky however because C++ has minimal reflection capabilities, so you need to resort either hard coding or meta programming.
Given such check, you can make the compilation fail on systems where the assumption doesn't hold.
Another handy tool is std::has_unique_object_representations (since C++17) which will always be false for all types that have padding. But note that it will also be false for types that contain floats for example. Only types that return true can be meaningfully compared for equality with std::memcmp.
Reading from unaligned memory is undefined behavior in C++. In other words, the compiler is allowed to assume that every uint32_t is located at a alignof(uint32_t)-byte boundary and every uint16_t is located at a alignof(uint16_t)-byte boundary. This means that if you somehow manage to pack your bytes portably, doing interpreter->movement.position will still trigger undefined behaviour.
(In practice, on most architectures, unaligned memory access will still work, but albeit incur a performance penalty.)
You could, however, write a wrapper, like how std::vector<bool>::operator[] works:
#include <cstdint>
#include <cstring>
#include <iostream>
#include <type_traits>
template <typename T>
struct unaligned_wrapper {
static_assert(std::is_trivial<T>::value);
std::aligned_storage_t<sizeof(T), 1> buf;
operator T() const noexcept {
T ret;
memcpy(&ret, &buf, sizeof(T));
return ret;
}
unaligned_wrapper& operator=(T t) noexcept {
memcpy(&buf, &t, sizeof(T));
return *this;
}
};
struct Interpretation_1 {
unaligned_wrapper<uint8_t> multiplexer;
unaligned_wrapper<uint8_t> timestamp;
unaligned_wrapper<uint32_t> position;
unaligned_wrapper<uint16_t> speed;
};
// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
Interpretation_1 movement;
//Interpretation_2 temperatures;
//etc...
};
int main(){
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << interpreter->movement.position << "\n";
}
This would ensure that every read or write to the unaligned integer is transformed to a bytewise memcpy, which does not have any alignment requirement. There might be a performance penalty for this on architectures with the ability to access unaligned memory quickly, but it would work on any conforming compiler.

Advantages of boolean values to bit-fields

The code base I work in is quite old. While we compile nearly everything with c++11. Much of the code was written in c many years ago. When developing new classes in old areas I always find myself in a situation where I have to choose between matching old methodologies, or going with a more modern approach.
In most cases, I prefer sticking with more modern techniques when possible. However, one common old practice I often see, which I have a hard time arguing the use of, is bitfields. We pass a lot of messages, here, many times, they are full of single bit values. Take the example below:
class NewStructure
{
public:
const bool getValue1() const
{
return value1;
}
void setValue1(const bool input)
{
value1 = input;
}
private:
bool value1;
bool value2;
bool value3;
bool value4;
bool value5;
bool value6;
bool value7;
bool value8;
};
struct OldStructure
{
const bool getValue1() const
{
return value1;
}
void setValue1(const bool input)
{
value1 = input;
}
unsigned char value1 : 1;
unsigned char value2 : 1;
unsigned char value3 : 1;
unsigned char value4 : 1;
unsigned char value5 : 1;
unsigned char value6 : 1;
unsigned char value7 : 1;
unsigned char value8 : 1;
};
In this case the sizes are 8 bytes for the New Structure and 1 for the old.
I added a "getter" and "setter" to illustrate the point that from a user perspective, they can be identical. I realize that perhaps you could make the case of readability for the next developer, but other than that, is there a reason to avoid bit fields? I know that packed fields take a performance hit, but because these are all characters, padding rules are still in place.
There are several things to consider when using bitfields. Those are (order of importance would depend on situation)
Performance
Bitfields operation incur performance penalty when set or read (as compared to direct types). A simple example of codegen shows the extra instructions emitted: https://gcc.godbolt.org/z/DpcErN However, bitfields provide for more compact data, which becomes more cache-friendly, and that could completely outweigh any drawbacks of additional operations. The only way to understand the real performance impact is by benchmarking actual application in the real use case.
ABI Interoperability
Endiannes of bit fields is implementation defined, so layout of the same struct produced by two compiler can differ.
Usability
There is no reference binding to a bitfield, nor you can take it's address. This could affect code and make it less clear.
For you, as the programmer, there's not much difference. But the machine code to access a whole byte is much simpler/shorter than to access an individual bit, so using bitfields bulks up the generated code.
In a pseudo-assembly-language, your setter might turn into something like:
ldb input1,b ; get the new value into accumulator b
movb b,value1 ; put it into the variable
rts ; return from subroutine
But it's not so easy for bitfields:
ldb input1,b ; get the new value into accumulator b
movb bitfields,a ; get current bitfield values into accumulator a
cmpb b,#0 ; See what to do.
brz clearvalue1: ; If it's zero, go to clearing the bit
orb #$80,a ; set the bit representing value1.
bra resume: ; skip the clearing code.
clearvalue1:
andb #$7f,a ; clear the bit representing value1
resume:
movb a,bitfields ; put the value back
rts ; return
And it has to do that for each of your 8 members' setters, and something similar for getters. It adds up. Additionally, even today's dumbest compilers would probably inline the full-byte setter code, rather than actually make a subroutine call. For the bitfield setter, it may depend whether you're compiling optimizing for speed vs. space.
And you've only asked about booleans. If they were integer bitfields, then the compiler has to deal with loading, masking out prior values, shifting the value to align into its field, masking out unused bits, and/or the value into place, then writing it back to memory.
So why would you use one vs. the other?
Bitfields are slower, but pack data more efficiently.
Non-bitfields are faster, and require less machine code to access.
As the developer, it's your judgement call. If you will be keeping many instances of Structure in memory at once, the memory savings may be worth it. If you aren't going to have many instances of that structure in memory at once, the compiled code bloat offsets memory savings and you're sacrificing speed.
template<typename enum_type,size_t n_bits>
class bit_flags{
std::bitset<n_bits> bits;
auto operator[](enum_type bit){return bits[bit];};
auto& set(enum_type bit)){return set(bit);};
auto& reset(enum_type bit)){return set(bit);};
//go on with flip et al...
static_assert(std::is_enum<enum_type>{});
};
enum class v_flags{v1,v2,/*...*/vN};
bit_flags<v_flags,v_flags::vN+1> my_flags;
my_flags.set(v_flags::v1);
my_flags.[v_flags::v2]=true;
std::bitset is as efficient as bool bit fields. You can wrap it in a class to force using every bit by names defined in an enum. Now you have a small but scalable utility to use for multiple defferent sets of bool flags. C++17 makes it even more convenient:
template<auto last_flag, typename enum_type=decltype(last_flag)>
class bit_flags{
std::bitset<last_flag+1> bits;
//...
};
bit_flags<v_flags::vN+1> my_flags;

Are casts as safe as unions?

I want to split large variables like floats into byte segments and send these serially byte by byte via UART. I'm using C/C++.
One method could be to deepcopy the value I want to send to a union and then send it. I think that would be 100% safe but slow. The union would look like this:
union mySendUnion
{
mySendType sendVal;
char[sizeof(mySendType)] sendArray;
}
Another option could be to cast the pointer to the value I want to send, into a pointer to a particular union. Is this still safe?
The third option could be to cast the pointer to the value I want to send to a char, and then increment a pointer like this:
sendType myValue = 443.2;
char* sendChar = (char*)myValue;
for(char i=0; i< sizeof(sendType) ; i++)
{
Serial.write(*(sendChar+j), 1);
}
I've had succes with the above pointer arithmetics, but I'm not sure if it's safe under all circumstances. My concern is, what if we for instance is using a 32 bit processor and want to send a float. The compiler choose to store this 32 bit float into one memory cell, but does only store one single char into each 32 bit cell.
Each counter increment would then make the program pointer increment one whole memory cell, and we would miss the float.
Is there something in the C standard that prevents this, or could this be an issue with a certain compiler?
First off, you can't write your code in "C/C++". There's no such language as "C/C++", as they are fundamentally different languages. As such, the answer regarding unions differs radically.
As to the title:
Are casts as safe as unions?
No, generally they aren't, because of the strict aliasing rule. That is, if you type-pun a pointer of one certain type with a pointer to an incompatible type, it will result in undefined behavior. The only exception to this rule is when you read or manipulate the byte-wise representation of an object by aliasing it through a pointer to (signed or unsigned) char. As in your case.
Unions, however, are quite different bastards. Type punning via copying to and reading from unions is permitted in C99 and later, but results in undefined behavior in C89 and all versions of C++.
In one direction, you can also safely type pun (in C99 and later) using a pointer to union, if you have the original union as an actual object. Like this:
union p {
char c[sizeof(float)];
float f;
} pun;
union p *punPtr = &pun;
punPtr->f = 3.14;
send_bytes(punPtr->c, sizeof(float));
Because "a pointer to a union points to all of its members and vice versa" (C99, I don't remember the exact pargraph, it's around 6.2.5, IIRC). This isn't true in the other direction, though:
float f = 3.14;
union p *punPtr = &f;
send_bytes(punPtr->c, sizeof(float)); // triggers UB!
To sum up: the following code snippet is valid in both C89, C99, C11 and C++:
float f = 3.14;
char *p = (char *)&f;
size_t i;
for (i = 0; i < sizeof f; i++) {
send_byte(p[i]); // hypotetical function
}
The following is only valid in C99 and later:
union {
char c[sizeof(float)];
float f;
} pun;
pun.f = 3.14;
send_bytes(pun.c, sizeof float); // another hypotetical function
The following, however, would not be valid:
float f = 3.14;
unsigned *u = (unsigned *)&f;
printf("%u\n", *u); // undefined behavior triggered!
Another solution that is always guaranteed to work is memcpy(). The memcpy() function does a bytewise copying between two objects. (Don't get me started on it being "slow" -- in most modern compilers and stdlib implementations, it's an intrinsic function).
A general advice when sending floating point data on a byte stream would be to use some serialization technology, to ensure that the data format is well defined (and preferably architecture neutral, beware of endianness issues!).
You could use XDR -or perhaps ASN1- which is a binary format (see xdr(3) for more). For C++, see also libs11n
Unless speed or data size is very critical, I would suggest instead a textual format like JSON or perhaps YAML (textual formats are more verbose, but easier to debug and to document). There are several good libraries supporting it (e.g. jsoncpp for C++ or jansson for C).
Notice that serial ports are quite slow (w.r.t. CPU). So the serialization processing time is negligible.
Whatever you do, please document the serialization format (even for an internal project).
The cast to [[un]signed] char [const] * is legal and it won't cause issues when reading, so that is a fine option (that is, after fixing char *sendChar = reinterpret_cast<char*>(&myValue);, and since you are at it, make it const)
Now the next problem comes on the other side, when reading, as you cannot safely use the same approach for reading. In general, the cost of copying the variables is much less than the cost of sending over the UART, so I would just use the union when reading out of the serial.

Should I use #define, enum or const?

In a C++ project I'm working on, I have a flag kind of value which can have four values. Those four flags can be combined. Flags describe the records in database and can be:
new record
deleted record
modified record
existing record
Now, for each record I wish to keep this attribute, so I could use an enum:
enum { xNew, xDeleted, xModified, xExisting }
However, in other places in code, I need to select which records are to be visible to the user, so I'd like to be able to pass that as a single parameter, like:
showRecords(xNew | xDeleted);
So, it seems I have three possible appoaches:
#define X_NEW 0x01
#define X_DELETED 0x02
#define X_MODIFIED 0x04
#define X_EXISTING 0x08
or
typedef enum { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } RecordType;
or
namespace RecordType {
static const uint8 xNew = 1;
static const uint8 xDeleted = 2;
static const uint8 xModified = 4;
static const uint8 xExisting = 8;
}
Space requirements are important (byte vs int) but not crucial. With defines I lose type safety, and with enum I lose some space (integers) and probably have to cast when I want to do a bitwise operation. With const I think I also lose type safety since a random uint8 could get in by mistake.
Is there some other cleaner way?
If not, what would you use and why?
P.S. The rest of the code is rather clean modern C++ without #defines, and I have used namespaces and templates in few spaces, so those aren't out of question either.
Combine the strategies to reduce the disadvantages of a single approach. I work in embedded systems so the following solution is based on the fact that integer and bitwise operators are fast, low memory & low in flash usage.
Place the enum in a namespace to prevent the constants from polluting the global namespace.
namespace RecordType {
An enum declares and defines a compile time checked typed. Always use compile time type checking to make sure arguments and variables are given the correct type. There is no need for the typedef in C++.
enum TRecordType { xNew = 1, xDeleted = 2, xModified = 4, xExisting = 8,
Create another member for an invalid state. This can be useful as error code; for example, when you want to return the state but the I/O operation fails. It is also useful for debugging; use it in initialisation lists and destructors to know if the variable's value should be used.
xInvalid = 16 };
Consider that you have two purposes for this type. To track the current state of a record and to create a mask to select records in certain states. Create an inline function to test if the value of the type is valid for your purpose; as a state marker vs a state mask. This will catch bugs as the typedef is just an int and a value such as 0xDEADBEEF may be in your variable through uninitialised or mispointed variables.
inline bool IsValidState( TRecordType v) {
switch(v) { case xNew: case xDeleted: case xModified: case xExisting: return true; }
return false;
}
inline bool IsValidMask( TRecordType v) {
return v >= xNew && v < xInvalid ;
}
Add a using directive if you want to use the type often.
using RecordType ::TRecordType ;
The value checking functions are useful in asserts to trap bad values as soon as they are used. The quicker you catch a bug when running, the less damage it can do.
Here are some examples to put it all together.
void showRecords(TRecordType mask) {
assert(RecordType::IsValidMask(mask));
// do stuff;
}
void wombleRecord(TRecord rec, TRecordType state) {
assert(RecordType::IsValidState(state));
if (RecordType ::xNew) {
// ...
} in runtime
TRecordType updateRecord(TRecord rec, TRecordType newstate) {
assert(RecordType::IsValidState(newstate));
//...
if (! access_was_successful) return RecordType ::xInvalid;
return newstate;
}
The only way to ensure correct value safety is to use a dedicated class with operator overloads and that is left as an exercise for another reader.
Forget the defines
They will pollute your code.
bitfields?
struct RecordFlag {
unsigned isnew:1, isdeleted:1, ismodified:1, isexisting:1;
};
Don't ever use that. You are more concerned with speed than with economizing 4 ints. Using bit fields is actually slower than access to any other type.
However, bit members in structs have practical drawbacks. First, the ordering of bits in memory varies from compiler to compiler. In addition, many popular compilers generate inefficient code for reading and writing bit members, and there are potentially severe thread safety issues relating to bit fields (especially on multiprocessor systems) due to the fact that most machines cannot manipulate arbitrary sets of bits in memory, but must instead load and store whole words. e.g the following would not be thread-safe, in spite of the use of a mutex
Source: http://en.wikipedia.org/wiki/Bit_field:
And if you need more reasons to not use bitfields, perhaps Raymond Chen will convince you in his The Old New Thing Post: The cost-benefit analysis of bitfields for a collection of booleans at http://blogs.msdn.com/oldnewthing/archive/2008/11/26/9143050.aspx
const int?
namespace RecordType {
static const uint8 xNew = 1;
static const uint8 xDeleted = 2;
static const uint8 xModified = 4;
static const uint8 xExisting = 8;
}
Putting them in a namespace is cool. If they are declared in your CPP or header file, their values will be inlined. You'll be able to use switch on those values, but it will slightly increase coupling.
Ah, yes: remove the static keyword. static is deprecated in C++ when used as you do, and if uint8 is a buildin type, you won't need this to declare this in an header included by multiple sources of the same module. In the end, the code should be:
namespace RecordType {
const uint8 xNew = 1;
const uint8 xDeleted = 2;
const uint8 xModified = 4;
const uint8 xExisting = 8;
}
The problem of this approach is that your code knows the value of your constants, which increases slightly the coupling.
enum
The same as const int, with a somewhat stronger typing.
typedef enum { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } RecordType;
They are still polluting the global namespace, though.
By the way... Remove the typedef. You're working in C++. Those typedefs of enums and structs are polluting the code more than anything else.
The result is kinda:
enum RecordType { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } ;
void doSomething(RecordType p_eMyEnum)
{
if(p_eMyEnum == xNew)
{
// etc.
}
}
As you see, your enum is polluting the global namespace.
If you put this enum in an namespace, you'll have something like:
namespace RecordType {
enum Value { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } ;
}
void doSomething(RecordType::Value p_eMyEnum)
{
if(p_eMyEnum == RecordType::xNew)
{
// etc.
}
}
extern const int ?
If you want to decrease coupling (i.e. being able to hide the values of the constants, and so, modify them as desired without needing a full recompilation), you can declare the ints as extern in the header, and as constant in the CPP file, as in the following example:
// Header.hpp
namespace RecordType {
extern const uint8 xNew ;
extern const uint8 xDeleted ;
extern const uint8 xModified ;
extern const uint8 xExisting ;
}
And:
// Source.hpp
namespace RecordType {
const uint8 xNew = 1;
const uint8 xDeleted = 2;
const uint8 xModified = 4;
const uint8 xExisting = 8;
}
You won't be able to use switch on those constants, though. So in the end, pick your poison...
:-p
Have you ruled out std::bitset? Sets of flags is what it's for. Do
typedef std::bitset<4> RecordType;
then
static const RecordType xNew(1);
static const RecordType xDeleted(2);
static const RecordType xModified(4);
static const RecordType xExisting(8);
Because there are a bunch of operator overloads for bitset, you can now do
RecordType rt = whatever; // unsigned long or RecordType expression
rt |= xNew; // set
rt &= ~xDeleted; // clear
if ((rt & xModified) != 0) ... // test
Or something very similar to that - I'd appreciate any corrections since I haven't tested this. You can also refer to the bits by index, but it's generally best to define only one set of constants, and RecordType constants are probably more useful.
Assuming you have ruled out bitset, I vote for the enum.
I don't buy that casting the enums is a serious disadvantage - OK so it's a bit noisy, and assigning an out-of-range value to an enum is undefined behaviour so it's theoretically possible to shoot yourself in the foot on some unusual C++ implementations. But if you only do it when necessary (which is when going from int to enum iirc), it's perfectly normal code that people have seen before.
I'm dubious about any space cost of the enum, too. uint8 variables and parameters probably won't use any less stack than ints, so only storage in classes matters. There are some cases where packing multiple bytes in a struct will win (in which case you can cast enums in and out of uint8 storage), but normally padding will kill the benefit anyhow.
So the enum has no disadvantages compared with the others, and as an advantage gives you a bit of type-safety (you can't assign some random integer value without explicitly casting) and clean ways of referring to everything.
For preference I'd also put the "= 2" in the enum, by the way. It's not necessary, but a "principle of least astonishment" suggests that all 4 definitions should look the same.
Here are couple of articles on const vs. macros vs. enums:
Symbolic Constants
Enumeration Constants vs. Constant Objects
I think you should avoid macros especially since you wrote most of your new code is in modern C++.
If possible do NOT use macros. They aren't too much admired when it comes to modern C++.
With defines I lose type safety
Not necessarily...
// signed defines
#define X_NEW 0x01u
#define X_NEW (unsigned(0x01)) // if you find this more readable...
and with enum I lose some space (integers)
Not necessarily - but you do have to be explicit at points of storage...
struct X
{
RecordType recordType : 4; // use exactly 4 bits...
RecordType recordType2 : 4; // use another 4 bits, typically in the same byte
// of course, the overall record size may still be padded...
};
and probably have to cast when I want to do bitwise operation.
You can create operators to take the pain out of that:
RecordType operator|(RecordType lhs, RecordType rhs)
{
return RecordType((unsigned)lhs | (unsigned)rhs);
}
With const I think I also lose type safety since a random uint8 could get in by mistake.
The same can happen with any of these mechanisms: range and value checks are normally orthogonal to type safety (though user-defined-types - i.e. your own classes - can enforce "invariants" about their data). With enums, the compiler's free to pick a larger type to host the values, and an uninitialised, corrupted or just miss-set enum variable could still end up interpretting its bit pattern as a number you wouldn't expect - comparing unequal to any of the enumeration identifiers, any combination of them, and 0.
Is there some other cleaner way? / If not, what would you use and why?
Well, in the end the tried-and-trusted C-style bitwise OR of enumerations works pretty well once you have bit fields and custom operators in the picture. You can further improve your robustness with some custom validation functions and assertions as in mat_geek's answer; techniques often equally applicable to handling string, int, double values etc..
You could argue that this is "cleaner":
enum RecordType { New, Deleted, Modified, Existing };
showRecords([](RecordType r) { return r == New || r == Deleted; });
I'm indifferent: the data bits pack tighter but the code grows significantly... depends how many objects you've got, and the lamdbas - beautiful as they are - are still messier and harder to get right than bitwise ORs.
BTW /- the argument about thread safety's pretty weak IMHO - best remembered as a background consideration rather than becoming a dominant decision-driving force; sharing a mutex across the bitfields is a more likely practice even if unaware of their packing (mutexes are relatively bulky data members - I have to be really concerned about performance to consider having multiple mutexes on members of one object, and I'd look carefully enough to notice they were bit fields). Any sub-word-size type could have the same problem (e.g. a uint8_t). Anyway, you could try atomic compare-and-swap style operations if you're desperate for higher concurrency.
Enums would be more appropriate as they provide "meaning to the identifiers" as well as type safety. You can clearly tell "xDeleted" is of "RecordType" and that represent "type of a record" (wow!) even after years. Consts would require comments for that, also they would require going up and down in code.
Even if you have to use 4 byte to store an enum (I'm not that familiar with C++ -- I know you can specify the underlying type in C#), it's still worth it -- use enums.
In this day and age of servers with GBs of memory, things like 4 bytes vs. 1 byte of memory at the application level in general don't matter. Of course, if in your particular situation, memory usage is that important (and you can't get C++ to use a byte to back the enum), then you can consider the 'static const' route.
At the end of the day, you have to ask yourself, is it worth the maintenance hit of using 'static const' for the 3 bytes of memory savings for your data structure?
Something else to keep in mind -- IIRC, on x86, data structures are 4-byte aligned, so unless you have a number of byte-width elements in your 'record' structure, it might not actually matter. Test and make sure it does before you make a tradeoff in maintainability for performance/space.
If you want the type safety of classes, with the convenience of enumeration syntax and bit checking, consider Safe Labels in C++. I've worked with the author, and he's pretty smart.
Beware, though. In the end, this package uses templates and macros!
Do you actually need to pass around the flag values as a conceptual whole, or are you going to have a lot of per-flag code? Either way, I think having this as class or struct of 1-bit bitfields might actually be clearer:
struct RecordFlag {
unsigned isnew:1, isdeleted:1, ismodified:1, isexisting:1;
};
Then your record class could have a struct RecordFlag member variable, functions can take arguments of type struct RecordFlag, etc. The compiler should pack the bitfields together, saving space.
I probably wouldn't use an enum for this kind of a thing where the values can be combined together, more typically enums are mutually exclusive states.
But whichever method you use, to make it more clear that these are values which are bits which can be combined together, use this syntax for the actual values instead:
#define X_NEW (1 << 0)
#define X_DELETED (1 << 1)
#define X_MODIFIED (1 << 2)
#define X_EXISTING (1 << 3)
Using a left-shift there helps to indicate that each value is intended to be a single bit, it is less likely that later on someone would do something wrong like add a new value and assign it something a value of 9.
Based on KISS, high cohesion and low coupling, ask these questions -
Who needs to know? my class, my library, other classes, other libraries, 3rd parties
What level of abstraction do I need to provide? Does the consumer understand bit operations.
Will I have have to interface from VB/C# etc?
There is a great book "Large-Scale C++ Software Design", this promotes base types externally, if you can avoid another header file/interface dependancy you should try to.
If you are using Qt you should have a look for QFlags.
The QFlags class provides a type-safe way of storing OR-combinations of enum values.
I would rather go with
typedef enum { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } RecordType;
Simply because:
It is cleaner and it makes the code readable and maintainable.
It logically groups the constants.
Programmer's time is more important, unless your job is to save those 3 bytes.
Not that I like to over-engineer everything but sometimes in these cases it may be worth creating a (small) class to encapsulate this information.
If you create a class RecordType then it might have functions like:
void setDeleted();
void clearDeleted();
bool isDeleted();
etc... (or whatever convention suits)
It could validate combinations (in the case where not all combinations are legal, eg if 'new' and 'deleted' could not both be set at the same time). If you just used bit masks etc then the code that sets the state needs to validate, a class can encapsulate that logic too.
The class may also give you the ability to attach meaningful logging info to each state, you could add a function to return a string representation of the current state etc (or use the streaming operators '<<').
For all that if you are worried about storage you could still have the class only have a 'char' data member, so only take a small amount of storage (assuming it is non virtual). Of course depending on the hardware etc you may have alignment issues.
You could have the actual bit values not visible to the rest of the 'world' if they are in an anonymous namespace inside the cpp file rather than in the header file.
If you find that the code using the enum/#define/ bitmask etc has a lot of 'support' code to deal with invalid combinations, logging etc then encapsulation in a class may be worth considering. Of course most times simple problems are better off with simple solutions...