how do I declare and initialize an array of bytes in C++

Is this really the best way to declare a byte (or array of bytes)?
static constexpr byte kGuard1_[] = {
byte{0x45}, byte{0x23}, byte{0x12}, byte{0x56}, byte{0x99}, byte{0x76}, byte{0x12}, byte{0x55},
};
why isn't there some suffix (like b) that you can use to directly mark the number as a byte? Or is the problem just with my use of uniform initialization?

I can't say if this is better but it's different since you won't have to repeat byte for every element:
#include <array>
#include <utility>
template <class T, class... Args>
constexpr auto mkarr(Args&&... args) {
return std::array{static_cast<T>(std::forward<Args>(args))...};
}
static constexpr auto kGuard1 = mkarr<std::byte>(0x45, 0x23, 0x12, 0x56,
0x99, 0x76, 0x12, 0x55);
Note that it uses a std::array<std::byte, 8> instead of a std::byte[8].
why isn't there some suffix (like b) that you can use to directly mark the number as a byte?
I can't say, but if you want you can define your own user-defined literal that could be used with C arrays.
constexpr std::byte operator""_B(unsigned long long x) {
// note: no space between "" and _B above
return static_cast<std::byte>(x);
}
static constexpr std::byte kGuard1[]{0x45_B, 0x23_B, 0x12_B, 0x56_B,
0x99_B, 0x76_B, 0x12_B, 0x55_B};
Or is the problem just with my use of uniform initialization?
No, it's just that there is no implicit conversion to std::byte.
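To make the point about the missing implicit conversion concrete, here is a minimal sketch (assuming C++17, where std::byte was introduced). The names ok and as_int are only illustrative.

```cpp
#include <cstddef>

// std::byte is a scoped enumeration, so it accepts direct-list-initialization
// from an integer constant but never an implicit conversion.
constexpr std::byte ok{0x45};            // direct-list-initialization: allowed
// constexpr std::byte bad = 0x45;       // would not compile: no implicit conversion
// constexpr std::byte arr[] = {0x45};   // would not compile for the same reason

// Converting back to an integer has to be explicit as well.
constexpr int as_int = std::to_integer<int>(ok);
static_assert(as_int == 0x45, "round trip through std::byte preserves the value");
```

This is why every element of a std::byte array needs either byte{...}, a cast, or a user-defined literal.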

Related

Convert last characters of std::array<char, 10> to int

Given the following array: std::array<char, 10> stuff I'd like to convert the last 4 characters to the corresponding int32 value.
I tried to chain OR operations on the last items but doesn't seem to be the right way:
int a = int(stuff[6] | stuff[7] | stuff[8] | stuff[9])
Is there an elegant way to solve this?

What you tried to do has elegance that comes across in not needing an endianness check in order to work properly. What you missed was some shifting to indicate significance in the final value:
int a = stuff[6] << 24 | stuff[7] << 16 | stuff[8] << 8 | stuff[9];
This alone does not care about endianness because from the language's perspective, it is based on values rather than bytes. You determine which values are most significant.
That said, this also assumes an 8-bit byte and at least 4-byte int. If you want elegance of use, you can get it with a safe and general abstraction:
#include <array>
#include <climits>
#include <cstddef>
#include <utility> // std::index_sequence, std::make_index_sequence
namespace detail {
// Could be replaced by an inline lambda-template in C++20.
template<typename T, std::size_t N, std::size_t... Is>
constexpr T pack_into_impl(const std::array<std::byte, N>& bytes, std::index_sequence<Is...>) {
// Build final value from right to left to make the math more clear
// and to use the least significant bytes available when N < sizeof(T).
// e.g., bytes[3] << 0 | bytes[2] << 8 | bytes[1] << 16 | bytes[0] << 24
return ((static_cast<T>(bytes[N-Is-1]) << (CHAR_BIT * Is)) | ...);
}
}
// Takes bytes to pack from most significant to least significant.
// N.B. this is not a production-ready doc comment for this function.
template<typename T, std::size_t N>
constexpr T pack_into(std::array<std::byte, N> bytes) {
static_assert(sizeof(T) >= N, "Destination type is too small for this many bytes");
return detail::pack_into_impl<T>(bytes, std::make_index_sequence<N>{});
}
// Convenience overload.
template<typename T, typename... Bytes>
constexpr T pack_into(Bytes... bytes) {
// Check that each Bytes type can be static_cast to std::byte.
// Maybe check that values fit within a byte.
return pack_into<T>(std::array{static_cast<std::byte>(bytes)...});
}
int main() {
static_assert(pack_into<int>(0x12, 0x34, 0x56, 0x78) == 0x12345678);
static_assert(pack_into<int>(0x01, 0x02) == 0x0102);
// pack_into<int>(0x01, 0x02, 0x03, 0x04, 0x05); // static_assert
}
Some of this can be cleaned up in C++20 by using concepts and a []<std::size_t... Is> lambda, but you get the idea. Naturally, you're also free to transform the API to make the size unknown at compile-time for convenience and live with a possible runtime check when too many bytes are given. It depends on your use case.

Believe it or not, even though this is C++, memcpy() is the recommended way to do this kind of thing:
#include <cstdint>
#include <cstring>
std::int32_t a;
std::memcpy(&a, stuff.data() + 6, sizeof a);
It avoids strict aliasing violations, and compilers will optimize the memcpy call away.
Be aware of endianness differences if the data you're loading was created on a different machine with a different CPU architecture.
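The two approaches in this thread (shifting values versus copying bytes) can be compared side by side. The following is only a sketch: the function names are hypothetical, and it assumes 8-bit bytes and that the four chars are meant as a big-endian value in the shift-based version. Each char goes through unsigned char first, because plain char may be signed and would otherwise be sign-extended before the shift.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Assembles the last four chars as a big-endian 32-bit value.
// The result does not depend on the byte order of the host.
constexpr std::uint32_t last_four_big_endian(const std::array<char, 10>& s) {
    return (std::uint32_t{static_cast<unsigned char>(s[6])} << 24) |
           (std::uint32_t{static_cast<unsigned char>(s[7])} << 16) |
           (std::uint32_t{static_cast<unsigned char>(s[8])} << 8) |
            std::uint32_t{static_cast<unsigned char>(s[9])};
}

// Copies the same four bytes in whatever byte order the host uses.
inline std::uint32_t last_four_native(const std::array<char, 10>& s) {
    std::uint32_t value;
    std::memcpy(&value, s.data() + 6, sizeof value);
    return value;
}
```

On a big-endian host both functions return the same number; on a little-endian host the memcpy version returns the bytes in reversed significance.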

Type safe enum bit flags

I'm looking to use a set of bit flags for my current issue. These flags are (nicely) defined as part of an enum, however I understand that when you OR two values from an enum the return type of the OR operation has type int.
What I'm currently looking for is a solution which will allow the users of the bit mask to remain type safe, as such I have created the following overload for operator |
enum ENUM
{
ONE = 0x01,
TWO = 0x02,
THREE = 0x04,
FOUR = 0x08,
FIVE = 0x10,
SIX = 0x20
};
ENUM operator | ( ENUM lhs, ENUM rhs )
{
// Cast to int first otherwise we'll just end up recursing
return static_cast< ENUM >( static_cast< int >( lhs ) | static_cast< int >( rhs ) );
}
void enumTest( ENUM v )
{
}
int main( int argc, char **argv )
{
// Valid calls to enumTest
enumTest( ONE | TWO | FIVE );
enumTest( TWO | THREE | FOUR | FIVE );
enumTest( ONE | TWO | THREE | FOUR | FIVE | SIX );
return 0;
}
Does this overload really provide type safety? Does casting an int containing values not defined in the enum cause undefined behaviour? Are there any caveats to be aware of?

Does this overload really provide type safety?
In this case, yes. The valid range of values for the enumeration goes at least up to (but not necessarily including) the next largest power of two after the largest named enumerator, in order to allow it to be used for bitmasks like this. So any bitwise operation on two values will give a value representable by this type.
Does casting an int containing values not defined in the enum cause undefined behaviour?
No, as long as the values are representable by the enumeration, which they are here.
Are there any caveats to be aware of?
If you were doing operations such as arithmetic, which could take the value out of range, then you'd get an implementation-defined result, but not undefined behaviour.
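As a sketch of how the overload from the question might be rounded out, the version below is constexpr, casts through std::underlying_type instead of int, and adds an operator& plus a hypothetical has_flag helper. Giving the enum a fixed underlying type (unsigned here) is an assumption of this sketch, not part of the question; with a fixed underlying type every value of that type is a valid value of the enumeration.

```cpp
#include <type_traits>

// Assumption: a fixed underlying type, so any bit combination is a valid value.
enum ENUM : unsigned { ONE = 0x01, TWO = 0x02, THREE = 0x04 };

constexpr ENUM operator|(ENUM lhs, ENUM rhs) {
    using U = std::underlying_type<ENUM>::type;
    // Cast to the underlying type first so the built-in | is used (no recursion).
    return static_cast<ENUM>(static_cast<U>(lhs) | static_cast<U>(rhs));
}

constexpr ENUM operator&(ENUM lhs, ENUM rhs) {
    using U = std::underlying_type<ENUM>::type;
    return static_cast<ENUM>(static_cast<U>(lhs) & static_cast<U>(rhs));
}

// Hypothetical helper: true when every bit of 'flag' is present in 'set'.
constexpr bool has_flag(ENUM set, ENUM flag) {
    return (set & flag) == flag;
}

static_assert(std::is_same<decltype(ONE | TWO), ENUM>::value, "result stays an ENUM");
static_assert(has_flag(ONE | THREE, THREE), "THREE was set");
static_assert(!has_flag(ONE | THREE, TWO), "TWO was not set");
```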

If you think about type safety, it is better to use std::bitset
enum BITS { A, B, C, D };
std::bitset<4> bset, bset1;
bset.set(A); bset.set(C);
bset1[B] = 1;
assert(bset[A] == bset[C]);
assert(bset[A] != bset[B]);
assert(bset1 != bset);

The values of your constants are not closed under OR. In other words, it's possible that the result of an OR of two ENUM constants will result in a value that is not an ENUM constant:
0x30 == (FIVE | SIX);
The standard says that this is OK: an enumeration can have a value not equal to any of its enumerators (constants). Presumably it's to allow this type of usage.
In my opinion this is not type safe because if you were to look at the implementation of enumTest you have to be aware that the argument type is ENUM but it might have a value that's not an ENUM enumerator.
I think that if these are simply bit flags then do what the compiler wants you to: use an int for the combination of flags.

With a simple enum such as yours:
enum ENUM
{
ONE = 0x01,
TWO = 0x02,
...
};
it is implementation-defined what the underlying type is (most likely int) [1], but as long as you only use | (bitwise or) to create masks, the result will never require a wider type than the largest value from this enum.
[1] "The underlying type of an enumeration is an integral type that can represent all the enumerator values defined in the enumeration. It is implementation-defined which integral type is used as the underlying type for an enumeration except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int."

This is my approach to bit flags:
template<typename E>
class Options {
unsigned long values;
constexpr Options(unsigned long v, int) : values{v} {}
public:
constexpr Options() : values(0) {}
constexpr Options(unsigned n) : values{1UL << n} {}
constexpr bool operator==(Options const& other) const {
return (values & other.values) == other.values;
}
constexpr bool operator!=(Options const& other) const {
return !operator==(other);
}
constexpr Options operator+(Options const& other) const {
return {values | other.values, 0};
}
Options& operator+=(Options const& other) {
values |= other.values;
return *this;
}
Options& operator-=(Options const& other) {
values &= ~other.values;
return *this;
}
};
#define DECLARE_OPTIONS(name) class name##__Tag; using name = Options<name##__Tag>
#define DEFINE_OPTION(name, option, index) constexpr name option(index)
You can use it like so:
DECLARE_OPTIONS(ENUM);
DEFINE_OPTION(ENUM, ONE, 0);
DEFINE_OPTION(ENUM, TWO, 1);
DEFINE_OPTION(ENUM, THREE, 2);
DEFINE_OPTION(ENUM, FOUR, 3);
Then ONE + TWO is still of type ENUM. And you can re-use the class to define multiple bit flag sets that are of different, incompatible types.
I personally don't like using | and & to set and test bits. It's the logical operation that needs to be done to set and test, but they don't express the meaning of the operation unless you think about bitwise operations. If you read out ONE | TWO you might think that you want either ONE or TWO, not necessarily both. This is why I prefer using + to add flags together and == to test if a flag is set.
See this blog post for more details on my suggested implementation.

C macro computing the number of bytes that a given compile-time constant requires

Often I have some compile-time constant number that is also the upper limit of possible values assumed by the variables. And thus I'm interested in choosing the smallest type that can accommodate those values. For example I may know that variables will fit into <-30 000, 30 000> range, so when looking for a suitable type I would start with signed short int. But since I'm switching between platforms and compilers I would like a compile-time assert checking whether the constant upper values really fit within that type. BOOST_STATIC_ASSERT( sizeof(T) >= required_number_of_bytes_for_number ) works fine but the problem is:
How to automatically determine the number of bytes required for storing a given compile-time constant, signed or unsigned? I guess a C macro could do this job? Could anyone write it for me?
I might use std::numeric_limits::max() and min() instead of computing the bytes but then I would have to switch to run-time assert :(

Now that this is tagged with c++, I suggest using Boost.Integer for appropriate type selection. boost::int_max_value_t< MyConstant >::least would give the type you are looking for.

You may use the following code. It works only for positive 8/16/32/64bit integers. But you may do the appropriate changes for negative values as well.
template <typename T, T x> class TypeFor
{
template <T x>
struct BitsRequired {
static const size_t Value = 1 + BitsRequired<x/2>::Value;
};
template <>
struct BitsRequired<0> {
static const size_t Value = 0;
};
static const size_t Bits = BitsRequired<x>::Value;
static const size_t Bytes = (Bits + 7) / 8;
static const size_t Complexity = 1 + BitsRequired<Bytes-1>::Value;
template <size_t c> struct Internal {
};
template <> struct Internal<1> {
typedef UCHAR Type;
};
template <> struct Internal<2> {
typedef USHORT Type;
};
template <> struct Internal<3> {
typedef ULONG Type;
};
template <> struct Internal<4> {
typedef ULONGLONG Type;
};
public:
typedef typename Internal<Complexity>::Type Type;
};
TypeFor<UINT, 117>::Type x;
P.S. this compiles under MSVC. Some adjustment will probably be needed to adapt it for gcc/mingw/etc.

How about you avoid the problem:
BOOST_STATIC_ASSERT((1LL << (8*sizeof(T))) >= number);

How about BOOST_STATIC_ASSERT(int(60000)==60000)? This tests whether 60000 fits in an int. If int is 16 bits, int(60000) typically wraps to -5536 (the exact result is implementation-defined before C++20). For the comparison, that value is then sign-extended back to a 32-bit long, compares unequal to 60000, and the assertion fails reliably.
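Since C++11, std::numeric_limits<T>::min() and max() are constexpr, so the range check no longer has to be deferred to run time. The following is a minimal sketch of that idea; fits_in and bytes_for are hypothetical names, the byte count assumes 8-bit bytes, and bytes_for only handles non-negative constants.

```cpp
#include <limits>

// True when every value in [lo, hi] is representable by the integer type T.
// Assumption of this sketch: T's own limits fit in long long, so unsigned
// types as wide as long long are rejected up front.
template <typename T>
constexpr bool fits_in(long long lo, long long hi) {
    static_assert(std::numeric_limits<T>::is_integer, "T must be an integer type");
    static_assert(std::numeric_limits<T>::is_signed || sizeof(T) < sizeof(long long),
                  "T's maximum must be representable as long long");
    return lo >= static_cast<long long>(std::numeric_limits<T>::min()) &&
           hi <= static_cast<long long>(std::numeric_limits<T>::max());
}

// Number of bytes needed to store a non-negative constant (8-bit bytes assumed).
constexpr unsigned bytes_for(unsigned long long v) {
    return v < 256 ? 1u : 1u + bytes_for(v >> 8);
}

static_assert(fits_in<short>(-30000, 30000), "short must cover the required range");
static_assert(bytes_for(30000) == 2, "30000 needs two bytes");
```

If the first assertion fails on some platform, the build stops there instead of silently truncating values.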

constexpr and endianness

A common question that comes up from time to time in the world of C++ programming is compile-time determination of endianness. Usually this is done with barely portable #ifdefs. But does the C++11 constexpr keyword along with template specialization offer us a better solution to this?
Would it be legal C++11 to do something like:
constexpr bool little_endian()
{
const static unsigned num = 0xAABBCCDD;
return reinterpret_cast<const unsigned char*> (&num)[0] == 0xDD;
}
And then specialize a template for both endian types:
template <bool LittleEndian>
struct Foo
{
// .... specialization for little endian
};
template <>
struct Foo<false>
{
// .... specialization for big endian
};
And then do:
Foo<little_endian()>::do_something();

New answer (C++20)
C++20 introduced a new standard library header, <bit>.
Among other things it provides a clean, portable way to check the endianness.
Since my old method relies on some questionable techniques, I suggest that anyone who uses it switch to the check provided by the standard library.
Here's an adapter that lets you use the new way of checking endianness without having to update code that relies on the interface of my old class:
#include <bit>
class Endian
{
public:
Endian() = delete;
static constexpr bool little = std::endian::native == std::endian::little;
static constexpr bool big = std::endian::native == std::endian::big;
static constexpr bool middle = !little && !big;
};
Old answer
I was able to write this:
#include <cstdint>
class Endian
{
private:
static constexpr uint32_t uint32_ = 0x01020304;
static constexpr uint8_t magic_ = (const uint8_t&)uint32_;
public:
static constexpr bool little = magic_ == 0x04;
static constexpr bool middle = magic_ == 0x02;
static constexpr bool big = magic_ == 0x01;
static_assert(little || middle || big, "Cannot determine endianness!");
private:
Endian() = delete;
};
I've tested it with g++ and it compiles without warnings. It gives a correct result on x64.
If you have a big-endian or middle-endian processor, please confirm in a comment that this works for you.

It is not possible to determine endianness at compile time using constexpr (before C++20). reinterpret_cast is explicitly forbidden by [expr.const]p2, as is iain's suggestion of reading from a non-active member of a union. Casting to a different reference type is also forbidden, as such a cast is interpreted as a reinterpret_cast.
Update:
This is now possible in C++20. One way:
#include <bit>
#include <climits>
#include <concepts>
template<std::integral T>
constexpr bool is_little_endian() {
for (unsigned bit = 0; bit != sizeof(T) * CHAR_BIT; ++bit) {
unsigned char data[sizeof(T)] = {};
// In little-endian, bit i of the raw bytes ...
data[bit / CHAR_BIT] = 1 << (bit % CHAR_BIT);
// ... corresponds to bit i of the value.
if (std::bit_cast<T>(data) != T(1) << bit)
return false;
}
return true;
}
static_assert(is_little_endian<int>());
(Note that C++20 guarantees two's complement integers -- with an unspecified bit order -- so we just need to check that every bit of the data maps to the expected place in the integer.)
But if you have a C++20 standard library, you can also just ask it:
#include <bit>
constexpr bool is_little_endian = std::endian::native == std::endian::little;
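For code that cannot use C++20 yet, a run-time check sidesteps the constant-expression restrictions described in this answer. This is only a sketch with a hypothetical function name; it inspects the object representation through memcpy, which is well defined, and the result is not usable as a template argument.

```cpp
#include <cstdint>
#include <cstring>

// Run-time check: copies the bytes of a known 32-bit pattern and looks at
// which byte is stored first in memory.
inline bool is_little_endian_runtime() {
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x04;  // least significant byte stored first
}
```

Compilers typically fold this function to a constant when optimizing, but that is an optimization rather than a guarantee.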

Assuming N2116 is the wording that gets incorporated, then your example is ill-formed (notice that there is no concept of "legal/illegal" in C++). The proposed text for [decl.constexpr]/3 says
its function-body shall be a compound-statement of the form
{ return expression; }
where expression is a potential constant expression (5.19);
Your function violates the requirement in that it also declares a local variable.
Edit: This restriction could be overcome by moving num outside of the function. The function still wouldn't be well-formed, then, because expression needs to be a potential constant expression, which is defined as
An expression is a potential constant expression if it is a constant
expression when all occurrences of function parameters are replaced
by arbitrary constant expressions of the appropriate type.
IOW, reinterpret_cast<const unsigned char*> (&num)[0] == 0xDD would have to be a constant expression. However, it is not: &num would be an address constant-expression (5.19/4). Accessing the value of such a pointer is, however, not allowed for a constant expression:
The subscripting operator [] and the class member access . and -> operators, the & and * unary operators, and pointer casts (except dynamic_casts, 5.2.7) can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators.
Edit: The above text is from C++98. Apparently, C++0x is more permissive in what it allows for constant expressions. The expression involves an lvalue-to-rvalue conversion of the array reference, which is banned from constant expressions unless
it is applied to an lvalue of effective integral type that refers
to a non-volatile const variable or static data member initialized
with constant expressions
It's not clear to me whether (&num)[0] "refers to" a const variable, or whether only a literal num "refers to" such a variable. If (&num)[0] refers to that variable, it is then unclear whether reinterpret_cast<const unsigned char*> (&num)[0] still "refers to" num.

There is std::endian in the upcoming C++20.
#include <bit>
constexpr bool little_endian() noexcept
{
return std::endian::native == std::endian::little;
}

My first post. Just wanted to share some code that I'm using.
//Some handy defines magic, thanks overflow
#define IS_LITTLE_ENDIAN ('ABCD'==0x41424344UL) //41 42 43 44 = 'ABCD' hex ASCII code
#define IS_BIG_ENDIAN ('ABCD'==0x44434241UL) //44 43 42 41 = 'DCBA' hex ASCII code
#define IS_UNKNOWN_ENDIAN (IS_LITTLE_ENDIAN == IS_BIG_ENDIAN)
//Next in code...
struct Quad
{
union
{
#if IS_LITTLE_ENDIAN
struct { std::uint8_t b0, b1, b2, b3; };
#elif IS_BIG_ENDIAN
struct { std::uint8_t b3, b2, b1, b0; };
#elif IS_UNKNOWN_ENDIAN
#error "Endianness not implemented!"
#endif
std::uint32_t dword;
};
};
Constexpr version:
namespace Endian
{
namespace Impl //Private
{
//41 42 43 44 = 'ABCD' hex ASCII code
static constexpr std::uint32_t LITTLE_{ 0x41424344u };
//44 43 42 41 = 'DCBA' hex ASCII code
static constexpr std::uint32_t BIG_{ 0x44434241u };
//Converts chars to uint32 on current platform
static constexpr std::uint32_t NATIVE_{ 'ABCD' };
}
//Public
enum class Type : size_t { UNKNOWN, LITTLE, BIG };
//Compare
static constexpr bool IS_LITTLE = Impl::NATIVE_ == Impl::LITTLE_;
static constexpr bool IS_BIG = Impl::NATIVE_ == Impl::BIG_;
static constexpr bool IS_UNKNOWN = IS_LITTLE == IS_BIG;
//Endian type on current platform
static constexpr Type NATIVE_TYPE = IS_LITTLE ? Type::LITTLE : IS_BIG ? Type::BIG : Type::UNKNOWN;
//Uncomment for test.
//static_assert(!IS_LITTLE, "This platform has little endian.");
//static_assert(!IS_BIG, "This platform has big endian.");
//static_assert(!IS_UNKNOWN, "Error: Unsupported endian!");
}

That is a very interesting question.
I am not a language lawyer, but you might be able to replace the reinterpret_cast with a union.
const union {
unsigned int int_value;
unsigned char char_value[4];
} Endian = { 0xAABBCCDD };
constexpr bool little_endian()
{
return Endian.char_value[0] == 0xDD;
}

This may seem like cheating, but you can always include endian.h... BYTE_ORDER == BIG_ENDIAN is a valid constexpr...

Here is a simple C++11-compliant version, inspired by @no-name's answer:
constexpr bool is_system_little_endian(int value = 1) {
return static_cast<const unsigned char&>(value) == 1;
}
Using a default value to crank everything on one line is to meet C++11 requirements on constexpr functions: they must only contain a single return statement.
The good thing with doing it (and testing it!) in a constexpr context is that it makes sure that there is no undefined behavior in the code.

If your goal is to ensure that the compiler optimizes little_endian() into a constant true or false at compile-time, without any of its contents winding up in the executable or being executed at runtime, and only generating code from the "correct" one of your two Foo templates, I fear you're in for a disappointment.
I also am not a language lawyer, but it looks to me like constexpr is like inline or register: a keyword that alerts the compiler writer to the presence of a potential optimization. Then it's up to the compiler writer whether or not to take advantage of that. Language specs typically mandate behaviors, not optimizations.
Also, have you actually tried this on a variety of C++0x-compliant compilers to see what happens? I would guess most of them would choke on your dual templates, since they won't be able to figure out which one to use if invoked with false.

Variable-sized bitfields with aliasing

I have some structs containing a bitfield, which may vary in size. Example:
struct BitfieldSmallBase {
uint8_t a:2;
uint8_t b:3;
....
};
struct BitfieldLargeBase {
uint8_t a:4;
uint8_t b:5;
....
};
and a union to access all bits at once:
template<typename T>
union Bitfield
{
T bits;
uint8_t all; // <------------- Here is the problem
bool operator & (Bitfield<T> x) const {
return !!(all & x.all);
}
Bitfield<T> operator + (Bitfield<T> x) const {
Bitfield<T> temp;
temp.all = all + x.all; //works, because I can assume no overflow will happen
return temp;
}
....
};
typedef Bitfield<BitfieldSmallBase> BitfieldSmall;
typedef Bitfield<BitfieldLargeBase> BitfieldLarge;
The problem is: for some bitfield base classes, a uint8_t is not sufficient. BitfieldSmall does fit into a uint8_t, but BitfieldLarge does not. The data needs to be packed as tightly as possible (it will be handled by SSE instructions later), so always using uint16_t is out of the question. Is there a way to declare the "all" field with an integral type whose size is the same as the bitfield? Or another way to access the bits as a whole?
I can of course forgo the template and declare every kind of bitfield explicitly, but I would like to avoid code repetition (there is quite a list of operators and member functions).

You could make the integral type a template parameter as well.
template<typename T, typename U>
union Bitfield
{
T bits;
U all;
};
typedef Bitfield<BitfieldSmallBase, uint8_t> BitfieldSmall;
typedef Bitfield<BitfieldLargeBase, uint16_t> BitfieldLarge;

I've learnt the hard way that whilst bit widths on members are a convenient way of getting the compiler to do your masking and shifting for you, you cannot make assumptions about the order and padding of the members in the struct. It's compiler-dependent, and the compiler really does change the order and such depending on the other code in your project.
If you want to treat a byte as discrete fields, you really have to do it the hard way.

You can use template metaprogramming to define a trait that maps BitfieldSmallBase, BitfieldLargeBase, etc. to another type (uint8_t by default, and uint16_t for BitfieldLargeBase via a template specialization) and then use it like this:
template<typename T>
union Bitfield
{
T bits;
typename F<T>::holder_type all;
};
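The trait F<T> is not shown in this answer. One possible sketch, which selects the holder by sizeof(T) rather than by explicit specialization, is given below. HolderFor is a hypothetical name, the two base structs are cut-down stand-ins for the ones in the question, and the code assumes C++14 (std::conditional_t) and 8-bit bytes.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-ins for the bitfield structs from the question.
struct BitfieldSmallBase { std::uint8_t a : 2; std::uint8_t b : 3; };
struct BitfieldLargeBase { std::uint8_t a : 4; std::uint8_t b : 5; };

// Hypothetical trait: smallest fixed-width unsigned type at least as large as T.
template <typename T>
struct HolderFor {
    static_assert(sizeof(T) <= sizeof(std::uint64_t), "bitfield too large for this sketch");
    using holder_type =
        std::conditional_t<sizeof(T) <= 1, std::uint8_t,
        std::conditional_t<sizeof(T) <= 2, std::uint16_t,
        std::conditional_t<sizeof(T) <= 4, std::uint32_t, std::uint64_t>>>;
};

template <typename T>
union Bitfield {
    T bits;
    typename HolderFor<T>::holder_type all;
};

static_assert(sizeof(Bitfield<BitfieldSmallBase>::all) >= sizeof(BitfieldSmallBase),
              "holder must cover the whole bitfield");
```

As in the question, reading all after writing bits relies on union type punning, which the C++ standard does not guarantee.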

You might want to consider std::bitset or boost::dynamic_bitset rather than rolling your own. In any case, steer clear of std::vector<bool>!

Make the number of bytes you need part of the template parameters:
template <typename T, int S=1>
struct Bitfield
{
union
{
T bits;
unsigned char bytes[S];
};
};
typedef Bitfield<BitfieldSmallBase, 1> BitfieldSmall;
typedef Bitfield<BitfieldLargeBase, 2> BitfieldLarge;

How about this?
#include <limits.h>
template <class T>
union BitField
{
T bits;
unsigned all : sizeof(T) * CHAR_BIT;
};
