Storing 8 logical true/false values inside 1 byte? - c++

I'm working on a microcontroller with only 2KB of SRAM and desperately need to conserve some memory. I'm trying to pack 8 0/1 values into a single byte using a bit-field, but can't quite get it to work.
#include <cstdint>
#include <iostream>
using std::cout;

struct Bits
{
    int8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

int main(){
    Bits b;
    b.b0 = 0;
    b.b1 = 1;
    cout << (int)b.b0; // outputs 0, correct
    cout << (int)b.b1; // outputs -1, should be outputting 1
}
What gives?

All of your bitfield members are signed 1-bit integers. On a two's complement system, that means they can represent only either 0 or -1. Use uint8_t if you want 0 and 1:
struct Bits
{
    uint8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

As a word of caution - the standard doesn't really enforce an implementation scheme for bitfields. There is no guarantee that Bits will be 1 byte, and hypothetically it is entirely possible for it to be larger.
In practice, however, implementations usually follow the obvious logic and it will "almost always" be 1 byte in size; but again, that is not guaranteed. If you want to be sure, you can do the bit manipulation manually.
BTW, -1 is still "true" in a boolean context, but -1 != true.
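If you do go the manual route, a minimal sketch might look like this (the function names are illustrative, not from the question):
#include <cstdint>

// Pack 8 logical flags into one byte by hand: bit n of 'flags' holds flag n.
inline void setBit(uint8_t &flags, unsigned n, bool value)
{
    if (value)
        flags |= static_cast<uint8_t>(1u << n);    // set bit n
    else
        flags &= static_cast<uint8_t>(~(1u << n)); // clear bit n
}

inline bool getBit(uint8_t flags, unsigned n)
{
    return (flags >> n) & 1u;
}

// usage:
// uint8_t flags = 0;
// setBit(flags, 1, true);
// bool b1 = getBit(flags, 1); // true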

As noted, these variables consist of only a sign bit, so the only available values are 0 and -1.
A more appropriate type for these bitfields would be bool. C++14 §9.6/4:
If the value true or false is stored into a bit-field of type bool of any size (including a one bit bit-field), the original bool value and the value of the bit-field shall compare equal.
Yes, std::uint8_t will do the job, but you might as well use the best fit. You won't need things like the cast for std::cout << (int)b.b0;.
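A minimal sketch of the bool version (whether it occupies exactly 1 byte is still up to the implementation, as the caution above explains):
#include <iostream>

struct Bits
{
    bool b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

int main(){
    Bits b{};           // zero-initialize all flags
    b.b0 = false;
    b.b1 = true;
    std::cout << b.b0;  // prints 0
    std::cout << b.b1;  // prints 1, no cast needed
}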

Signed and unsigned integers are the answer.
Keep in mind that signedness is just an interpretation of bits: -1 or 1 is just how the "print" serializer interprets the variable's type, as revealed to cout (via operator overloading) by the compiler. The bit itself is the same, and so is its value (on/off), since you have only 1 bit.
It is good practice to be explicit, though, so prefer declaring the members unsigned; this tells the compiler to generate the proper code when the value is handed to a serializer like "print" (cout).
"COUT" OPERATOR OVERLOADING:
cout works through a set of overloaded functions, and the parameter types tell the compiler which overload to call. There is one overload that receives an unsigned value and another that receives a signed one, so they can interpret the same data differently, and you can change which one is chosen with a cast. See cout << myclass

Related

How to iterate over every bit of a type in C++

I wanted to write a Digital Search Tree in C++ using templates. To do that, given a type T and data of type T, I have to iterate over the bits of this data. Doing this on integers is easy; one can just shift the number to the right an appropriate number of positions and "&" the number with 1, as described for example here: How to get nth bit values. The problem starts when one tries to get the i-th bit of templated data. I wrote something like this
#include <cstdint>
#include <iostream>

template<typename T>
bool getIthBit (T data, unsigned int bit) {
    // read byte (bit / 8) of the object representation, then extract bit (bit % 8)
    return ((*(((char*)&data)+(bit>>3)))>>(bit&7))&1;
}

int main() {
    uint32_t a = 16;
    for (int i = 0; i < 32; i++) {
        std::cout << getIthBit (a, i);
    }
    std::cout << std::endl;
}
This works, but I am not entirely sure whether it is undefined behavior. The problem is that to iterate over all bits of the data, one has to know how many there are, which is hard for struct types because of padding. For example, here
#include <cstdint>
#include <iostream>

struct s {
    uint32_t i;
    char c;
};

int main() {
    std::cout << sizeof (s) << std::endl;
}
The actual data occupies 5 bytes, but the program says the struct has 8. I don't know how to get the actual size of the data, or whether it is possible at all. A question about this was asked here: How to check the size of struct w/o padding?, but the answers are just "don't".
It's easy to know how many bits there are in a type: there are exactly CHAR_BIT * sizeof(T) of them. sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C++ to know which of those bits, which are part of the type, are padding.
I recommend not attempting to support types that have padding as keys of your DST.
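For example, a generic loop over all object-representation bits could look like this (a sketch reusing getIthBit from the question; printAllBits is an illustrative name):
#include <climits>
#include <cstdint>
#include <iostream>

template<typename T>
bool getIthBit (T data, unsigned int bit);   // from the question above

template<typename T>
void printAllBits (T data) {
    // CHAR_BIT * sizeof(T) covers every bit of the object representation,
    // including any padding bits the type may happen to have
    for (unsigned int i = 0; i < CHAR_BIT * sizeof(T); i++) {
        std::cout << getIthBit(data, i);
    }
    std::cout << std::endl;
}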
The following trick might work for finding the padding bits of trivially copyable classes:
Use std::memset to set all bits of the object to 0.
For each sub object with no sub objects of their own, set all bits to 1 using std::memset.
For each sub object with their own sub objects, perform the previous and this step recursively.
Check which bits stayed 0.
I'm not sure if there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-class types that have padding, and the described trick won't detect those; long double is a typical example, and I don't know if there are others. This also probably won't detect unused bits of integers that underlie bit-fields.
So, there are a lot of caveats, but it should work in your example case:
// needs <bitset>, <climits>, <cstring>, <iostream> and the struct s from the question
s sobj;
std::memset(&sobj, 0, sizeof sobj);
std::memset(&sobj.i, -1, sizeof sobj.i);   // all value bits of i -> 1
std::memset(&sobj.c, -1, sizeof sobj.c);   // all value bits of c -> 1
std::cout << "non-padding bits:\n";
unsigned long long ull = 0;
std::memcpy(&ull, &sobj, sizeof sobj);
std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;
There's a Standard way to know if a type has unique representation or not. It is std::has_unique_object_representations, available since C++17.
So if an object has unique representations, it is safe to assume that every bit is significant.
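For example (C++17; the struct names here are illustrative, and the assertions reflect typical ABIs rather than guarantees):
#include <cstdint>
#include <type_traits>

struct NoPadding  { uint32_t i; uint32_t j; };  // typically no padding: unique representation
struct HasPadding { uint32_t i; char c; };      // the struct from the question, usually padded

static_assert(std::has_unique_object_representations_v<NoPadding>,
              "every bit participates in the value");
static_assert(!std::has_unique_object_representations_v<HasPadding>,
              "some bits are padding, so representations are not unique");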
There's no standard way to know if non-unique representation caused by padding bytes/bits like in struct { long long a; char b; }, or by equivalent representations¹. And no standard way to know padding bits/bytes offsets.
Note that "actual size" concept may be misleading, as padding can be in the middle, like in struct { char a; long long b; }
Internally the compiler has to distinguish padding bits from value bits to implement C++20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compilers may use another name or another approach, or may not expose atomic<T>::compare_exchange_* internals at this level.
¹ like multiple NaN floating point values

MSVC 1 bit enum type equals -1 and equality test fails

I have defined a bitfield of enum types to match a set of bits in an embedded system. I'm trying to write a test harness in MSVC for the code, but comparing what should be equal values fails.
The definition looks like this:
typedef enum { SERIAL, PARALLEL } MODE_e;
/* TYPE_e, POSITION_e, NET_e and TIME_e are enums defined elsewhere in the same way */

typedef union {
    struct {
        TYPE_e Type : 1;        // 1
        POSITION_e Pos1 : 1;    // 2
        POSITION_e Pos2 : 1;    // 3
        bool Enable : 1;        // 4
        NET_e Net : 1;          // 5
        TYPE_e Type2 : 1;       // 6
        bool En : 1;            // 7
        TIME_e Time : 3;        // 8-10
        MODE_e Mode : 1;        // 11
        bool TestEn : 1;        // 12
        bool DelayEn : 1;       // 13
        MODE_e Mode2 : 1;       // 14
        bool xEn : 1;           // 15
        MODE_e yMode : 1;       // 16
        bool zEnable : 1;       // 17
    } Bits;
    uint32_t Word;
} BITS_t;
Later the following comparison fails:
store.Bits.Mode = PARALLEL;
if (store.Bits.Mode == PARALLEL)
...
I examined Mode in the debugger, and it looked odd: its value is -1.
It's as if MSVC considers the value to be a two's complement number, but 1 bit wide, so 0b1 is decimal -1. The enum sets PARALLEL to 1, so the two do not match.
The comparison works fine on the embedded side using LLVM or GCC.
Which behavior is correct? I assume GCC and LLVM have better support for the C standards than MSVC in areas such as bit fields. More importantly, can I work around this difference without making major changes to the embedded code?
Dissecting this in detail, you have the following problems:
There is no guarantee that Type : 1 is the MSB or LSB. Generally, there are no guarantees of the bit-field layout in memory at all.
As mentioned in other answers, enumeration variables (unlike enumeration constants) have implementation-defined size. Meaning that you can't know their size, portably. In addition, if the size is something which isn't the same as either int or _Bool, the compiler need not support it at all.
Enums are most often a signed integer type. And when you create a bit-field of size 1 with a signed type, nobody including the standard knows what it means. Is it the sign bit you intend to store there or is it data?
The size of what the C standard calls a "storage unit" inside the bit-field is unspecified. Typically it is alignment-based. The C standard does guarantee that several consecutive bit-fields of the same type must be merged into the same storage unit (if there is room). For different types, there are no such guarantees.
It is fairly common that when you go from one type like POSITION_e to a different type bool, the compiler places them in different storage units. In practice meaning that there's a high risk of padding bit insertion whenever this happens. Lots of mainstream compilers do in fact behave just like that.
In addition, a struct or union may contain padding bytes anywhere.
In addition, there is the endianness problem.
Conclusion: bit-fields cannot be used in programs that need any form of portability. They cannot be used for the purpose of memory mapping.
Also, you really don't need all these abstraction layers - it's a simple dip-switch, not a space shuttle! :)
Solution:
I would strongly recommend dropping all of this in favour of a plain uint32_t. You can mask individual bits with plain integer constants:
#define DIP_TYPE (1u << 31)
#define DIP_POS (1u << 30)
...
uint32_t dipswitch = ...;
bool actuator_active = dipswitch & DIP_TYPE; // read
dipswitch |= DIP_POS; // write
This is massively portable, well-defined, standardized, MISRA-C compliant - you can even port it between different endianess architectures. It solves all of the above mentioned problems.
I would use the following approach.
typedef enum { SERIAL_TEST_MODE = 0, PARALLEL_TEST_MODE = 1 } TEST_MODE_e;
Then set the value and test the value as follows.
config.jumpers.Bits.TestMode = PARALLEL_TEST_MODE;
if (config.jumpers.Bits.TestMode & PARALLEL_TEST_MODE)
...
A value of 1 will have the least significant bit turned on, and a value of 0 will have the least significant bit turned off.
And this should be portable across multiple compilers.
The exact type used to represent an enum is implementation defined. So what is most likely happening is that MSVC is using char for this particular enum which is signed. So declaring a 1-bit bitfield of this type means you get 0 and -1 for the values.
Rather than declaring the bit-fields as the type of the enum, declare them as unsigned int or unsigned char so the values are properly represented.
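For example, a minimal sketch of that fix, keeping the question's names (this assumes you can change the struct on both sides, and it still carries all the portability caveats from the other answer):
#include <stdint.h>

typedef enum { SERIAL, PARALLEL } MODE_e;

typedef union {
    struct {
        unsigned int Mode : 1;   /* holds 0 or 1, regardless of how MODE_e is represented */
        /* ... the remaining fields, likewise declared unsigned ... */
    } Bits;
    uint32_t Word;
} BITS_t;

/* usage:
   store.Bits.Mode = PARALLEL;
   if (store.Bits.Mode == PARALLEL)   -- now compares equal on MSVC too
*/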
A simple fix I came up with, which is only valid for MSVC and GCC/LLVM, is:
#ifdef _WIN32
#define JOFF 0
#define JON -1
#else
#define JOFF 0
#define JON 1
#endif
typedef enum { SERIAL = JOFF, PARALLEL = JON } TEST_MODE_e;

Why is parameter to isdigit integer?

The function std::isdigit is:
int isdigit(int ch);
The return value (non-zero if the character is a numeric character, zero otherwise) smells like the function was inherited from C, but even that does not explain why the parameter type is int rather than char, while at the same time...
The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF.
Is there any technical reason why isdigit takes an int and not a char?
The reason is to allow EOF as input. And EOF is (from here):
EOF integer constant expression of type int and negative value
The accepted answer is correct, but I believe the question deserves more detail.
A char in C++ is either signed or unsigned depending on your implementation (and, yet, it's a distinct type from signed char and unsigned char).
Where C grew up, char was typically unsigned and assumed to be an n-bit byte that could represent [0..2^n-1]. (Yes, there were machines with byte sizes other than 8 bits.) In fact, chars were considered virtually indistinguishable from bytes, which is why functions like memcpy take char * rather than something like uint8_t *, why sizeof(char) is always 1, and why CHAR_BIT isn't named BYTE_BIT.
But the C standard, which was the baseline for C++, only promised that char could hold any value in the execution character set. They might hold additional values, but there was no guarantee. The source character set (basically 7-bit ASCII minus some control characters) required something like 97 values. For a while, the execution character set could be smaller, but in practice it almost never was. Eventually there was an explicit requirement that a char be large enough to hold an 8-bit byte.
But the range was still uncertain. If unsigned, you could rely on [0..255]. Signed chars, however, could, in theory, use a sign+magnitude representation that would give you a range of [-127..127]. Note that's only 255 unique values, not the 256 values ([-128..127]) you'd get from two's complement. If you were language-lawyerly enough, you could argue that you cannot store every possible value of an 8-bit byte in a char, even though that was a fundamental assumption throughout the design of the language and its run-time library. C++20 finally closed that apparent loophole by requiring two's complement representation for all signed integer types.
When it came time to design fundamental input/output functions, they had to think about how to return a value or a signal that you've reached the end of the file. It was decided to use a special value rather than an out-of-band signaling mechanism. But what value to use? The Unix folks generally had [128..255] available and others had [-128..-1].
But that's only if you're working with text. The Unix/C folks thought of textual characters and binary byte values as the same thing. So getc() was also for reading bytes from a binary file. All 256 possible values of a char, regardless of its signedness, were already claimed.
K&R C (before the first ANSI standard) didn't require function prototypes. The compiler made assumptions about parameter and return types. This is why C and C++ have the "default promotions," even though they're less important now than they once were. In effect, you couldn't return anything smaller than an int from a function. If you did, it would just be converted to int anyway.
The natural solution was therefore to have getc() return an int containing either the character value or a special end-of-file value, imaginatively dubbed EOF, a macro for -1.
The default promotions not only mandated a function couldn't return an integral type smaller than an int, they also made it difficult to pass in a small type. So int was also the natural parameter type for functions that expected a character. And thus we ended up with function signatures like int isdigit(int ch).
If you're a Posix fan, this is basically all you need.
For the rest of us, there's a remaining gotcha: If your chars are signed, then -1 might represent a legitimate character in your execution character set. How can you distinguish between them?
The answer is that functions don't really traffic in char values at all. They're really using unsigned char values dressed up as ints.
int x = getc(source_file);
if (x == EOF) { /* reached end of file */ }
else if (0 <= x && x < 128) { /* plain 7-bit character */ }
else if (128 <= x && x < 256) {
    // Here it gets interesting.
    bool b1 = isdigit(x); // OK
    bool b2 = isdigit(static_cast<char>(x)); // NOT PORTABLE
    bool b3 = isdigit(static_cast<unsigned char>(x)); // CORRECT!
}

Cleanest way to combine two shorts to an int

I have two 16-bit shorts (s1 and s2), and I'm trying to combine them into a single 32-bit integer (i1). According to the spec I'm dealing with, s1 is the most significant word, and s2 is the least significant word, and the combined word appears to be signed. (i.e. the top bit of s1 is the sign.)
What is the cleanest way to combine s1 and s2?
I figured something like
const utils::int32 i1 = ((s1<<16) | (s2));
would do, and it seems to work, but I'm worried about left-shifting a short by 16.
Also, I'm interested in the idea of using a union to do the job, any thoughts on whether this is a good or bad idea?
What you are doing is only meaningful if the shorts and the int are all unsigned. If either of the shorts is signed and has a negative value, the idea of combining them into a single int is meaningless, unless you have been provided with a domain-specific specification to cover such an eventuality.
What you've got looks nearly correct, but will probably fail if the second part is negative; the implicit conversion to int will probably sign-extend and fill the upper 16 bits with ones. A cast to unsigned short would probably prevent that from happening, but the best way to be sure is to mask off the bits.
const utils::int32 combineddata = ((data.first<<16) | ((data.second) & 0xffff));
I know this is an old post, but the quality of the presently posted answers is depressing...
These are the issues to consider:
Implicit integer promotion of shorts (or other small integer types) will result in an operand of type int which is signed. This will happen regardless of the signedness of the small integer type. Integer promotion happens in the shift operations and in the bitwise OR.
In case of the shift operators, the resulting type is that of the promoted left operand. In case of bitwise OR, the resulting type is obtained from "the usual arithmetic conversions".
Left-shifting a negative number results in undefined behavior. Right-shifting a negative number results in implementation-defined behavior (logical vs arithmetic shift). Therefore signed numbers should not be used together with bit shifts in 99% of all use-cases.
Unions, arrays and similar are poor solutions since they make the code endianness-dependent. In addition, type punning through unions is also not well-defined behavior in C++ (unlike C). Pointer-based solutions are bad since they will end up violating "the strict aliasing rule".
A proper solution will therefore:
Use operands with types that are guaranteed to be unsigned and will not be implicitly promoted.
Use bit-shifts, since these are endianness-independent.
Not use some non-portable hogwash solution with unions or pointers. There is absolutely nothing gained from such solutions except non-portability. Such solutions are however likely to invoke one or several cases of undefined behavior.
It will look like this:
int32_t i32 = (int32_t)( (uint32_t)(uint16_t)s1 << 16 | (uint16_t)s2 );  /* reduce each short to 16 unsigned bits first, so a negative s2 cannot sign-extend */
Any other solution is highly questionable and at best non-portable.
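A quick way to see why the reduction to 16 unsigned bits matters in the line above (a sketch; it assumes the usual 32-bit int, and the values are arbitrary):
#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main()
{
    int16_t s1 = 0x1234, s2 = -2;   // -2 is 0xFFFE as 16 bits

    uint32_t bad  = (uint32_t)s1 << 16 | (uint32_t)s2;             // s2 sign-extends to 0xFFFFFFFE first
    uint32_t good = (uint32_t)(uint16_t)s1 << 16 | (uint16_t)s2;   // reduce to 16 bits before combining

    std::printf("bad  = %08" PRIx32 "\n", bad);   // fffffffe - the upper half is clobbered
    std::printf("good = %08" PRIx32 "\n", good);  // 1234fffe
}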
Since nobody posted it, this is what the union would look like. But the comments about endian-ness definitely apply.
Big-endian:
typedef union {
struct {
uint16_t high;
uint16_t low;
} pieces;
uint32_t all;
} splitint_t;
Little-endian:
typedef union {
struct {
uint16_t low;
uint16_t high;
} pieces;
uint32_t all;
} splitint_t;
Try casting data.second explicitly to a short type, like:
const utils::int32 combineddata = ((data.first<<16) | ((short)data.second));
edit: I am a C# dev, so the casting probably looks different in your language, but the idea could be the same.
You want to cast data.first to an int32 before you do the shift, otherwise the shift will overflow the storage before it gets a chance to be automatically promoted when it is assigned to combineddata.
Try:
const utils::int32 combineddata = (static_cast<utils::int32>(data.first) << 16) | data.second;
This is of course assuming that data.first and data.second are types that are guaranteed to be precisely 16 bits long, otherwise you have bigger problems.
I really don't understand your statement "if data.second gets too big, the | won't take account of the fact that they're both shorts."
Edit: And Neil is absolutely right about signedness.
Using a union to do the job looks like a good choice, but is a portability issue due to endian differences of processors. It's doable, but you need to be prepared to modify your union based on the target architecture. Bit shifting is portable, but please write a function/method to do it for you. Inline if you like.
As to the signedness of the shorts, for this type of operation it's the meaning that matters, not the data type. In other words, if s1 and s2 are meant to be interpreted as two halves of a 32-bit word, having bit 15 set only matters if you do something that would cause s2 to be sign-extended. See Mark Ransom's answer, which might be better written as
inline utils::int32 CombineWord16toLong32(utils::int16 s1, utils::int16 s2)
{
    return ((s1 << 16) | (s2 & 0xffff));
}

Is it safe to use -1 to set all bits to true?

I've seen this pattern used a lot in C & C++.
unsigned int flags = -1; // all bits are true
Is this a good portable way to accomplish this? Or is using 0xffffffff or ~0 better?
I recommend you do it exactly as you have shown, since it is the most straightforward way. Initialize to -1, which will always work, independent of the actual sign representation, while ~ can sometimes have surprising behavior because you have to have the right operand type; only then will you get the highest value of an unsigned type.
For an example of a possible surprise, consider this one:
unsigned long a = ~0u;
It won't necessarily store a pattern with all bits 1 into a. It first creates a pattern with all bits 1 in an unsigned int and then assigns it to a; if unsigned long has more bits, not all of them end up being 1.
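A quick way to see this on a typical LP64 system (a sketch that assumes a 32-bit unsigned int and a 64-bit unsigned long):
#include <iostream>

int main()
{
    unsigned long a = ~0u;    // only the low 32 bits end up set: 0x00000000ffffffff
    unsigned long b = ~0ul;   // all 64 bits set
    unsigned long c = -1;     // all bits set, whatever the widths are
    std::cout << std::hex << a << '\n' << b << '\n' << c << '\n';
}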
And consider this one, which will fail on a non-two's complement representation:
unsigned int a = ~0; // Should have done ~0u !
The reason is that ~0 has to invert all bits. Inverting all bits of 0 yields -1 on a two's complement machine (which is the value we need!), but it does not yield -1 on other representations: on a one's complement machine it yields zero. Thus, on a one's complement machine, the above will initialize a to zero.
The thing you should understand is that it's all about values - not bits. The variable is initialized with a value. If in the initializer you modify the bits of the variable used for initialization, the value will be generated according to those bits. The value you need, to initialize a to the highest possible value, is -1 or UINT_MAX. The second will depend on the type of a - you will need to use ULONG_MAX for an unsigned long. However, the first will not depend on its type, and it's a nice way of getting the highest value.
We are not talking about whether -1 has all bits one (it doesn't always have). And we're not talking about whether ~0 has all bits one (it has, of course).
But what we are talking about is what the result of the initialized flags variable is. And for it, only -1 will work with every type and machine.
unsigned int flags = -1; is portable.
unsigned int flags = ~0; isn't portable because it relies on a two's-complement representation.
unsigned int flags = 0xffffffff; isn't portable because
it assumes 32-bit ints.
If you want to set all bits in a way guaranteed by the C standard, use the first one.
Frankly, I think all fff's is more readable. As to the comment that it's an antipattern: if you really care that all the bits are set/cleared, I would argue you are probably in a situation where you care about the size of the variable anyway, which would call for something like boost::uint16_t, etc.
A way which avoids the problems mentioned is to simply do:
unsigned int flags = 0;
flags = ~flags;
Portable and to the point.
I am not sure using an unsigned int for flags is a good idea in the first place in C++. What about bitset and the like?
std::numeric_limits<unsigned int>::max() is better, because 0xffffffff assumes that unsigned int is a 32-bit integer.
unsigned int flags = -1; // all bits are true
"Is this a good[,] portable way to accomplish this?"
Portable? Yes.
Good? Debatable, as evidenced by all the confusion shown on this thread. Being clear enough that your fellow programmers can understand the code without confusion should be one of the dimensions we measure for good code.
Also, this method is prone to compiler warnings. To elide the warning without crippling your compiler, you'd need an explicit cast. For example,
unsigned int flags = static_cast<unsigned int>(-1);
The explicit cast requires that you pay attention to the target type. If you're paying attention to the target type, then you'll naturally avoid the pitfalls of the other approaches.
My advice would be to pay attention to the target type and make sure there are no implicit conversions. For example:
unsigned int flags1 = UINT_MAX;
unsigned int flags2 = ~static_cast<unsigned int>(0);
unsigned long flags3 = ULONG_MAX;
unsigned long flags4 = ~static_cast<unsigned long>(0);
All of which are correct and more obvious to your fellow programmers.
And with C++11: We can use auto to make any of these even simpler:
auto flags1 = UINT_MAX;
auto flags2 = ~static_cast<unsigned int>(0);
auto flags3 = ULONG_MAX;
auto flags4 = ~static_cast<unsigned long>(0);
I consider correct and obvious better than simply correct.
Converting -1 into any unsigned type is guaranteed by the standard to result in all-ones. Use of ~0U is generally bad since 0 has type unsigned int and will not fill all the bits of a larger unsigned type, unless you explicitly write something like ~0ULL. On sane systems, ~0 should be identical to -1, but since the standard allows ones-complement and sign/magnitude representations, strictly speaking it's not portable.
Of course it's always okay to write out 0xffffffff if you know you need exactly 32 bits, but -1 has the advantage that it will work in any context even when you do not know the size of the type, such as macros that work on multiple types, or if the size of the type varies by implementation. If you do know the type, another safe way to get all-ones is the limit macros UINT_MAX, ULONG_MAX, ULLONG_MAX, etc.
Personally I always use -1. It always works and you don't have to think about it.
As long as you have #include <limits.h> as one of your includes, you should just use
unsigned int flags = UINT_MAX;
If you want a long's worth of bits, you could use
unsigned long flags = ULONG_MAX;
These values are guaranteed to have all the value bits of the result set to 1, regardless of how signed integers are implemented.
Yes. As mentioned in other answers, -1 is the most portable; however, it is not very semantic and triggers compiler warnings.
To solve these issues, try this simple helper:
#include <type_traits>

static const struct All1s
{
    template<typename UnsignedType>
    inline operator UnsignedType(void) const
    {
        static_assert(std::is_unsigned<UnsignedType>::value, "This is designed only for unsigned types");
        return static_cast<UnsignedType>(-1);
    }
} ALL_BITS_TRUE;
Usage:
unsigned a = ALL_BITS_TRUE;
uint8_t b = ALL_BITS_TRUE;
uint16_t c = ALL_BITS_TRUE;
uint32_t d = ALL_BITS_TRUE;
uint64_t e = ALL_BITS_TRUE;
On Intel's 64-bit processors (IA-32e, the 64-bit extension of IA-32) it is OK to write 0xFFFFFFFF to a 64-bit register and still get all bits set. This is because IA-32e only supports 32-bit immediates in most instructions, and in 64-bit instructions those 32-bit immediates are sign-extended to 64 bits.
The following is illegal:
mov rax, 0ffffffffffffffffh
The following puts 64 1s in RAX:
mov rax, 0ffffffffh
Just for completeness, the following puts 32 1s in the lower part of RAX (aka EAX):
mov eax, 0ffffffffh
And in fact I've had programs fail when I wanted to write 0xffffffff to a 64-bit variable and got 0xffffffffffffffff instead, because the 32-bit value went through a signed intermediate and was sign-extended. In C the effect looks like this:
uint64_t x;
x = (int32_t)0xffffffff;  /* -1 as a signed 32-bit value, then sign-extended to 64 bits */
printf("x is %" PRIx64 "\n", x);
the result is:
x is ffffffffffffffff
I thought to post this as a comment to all the answers that said that 0xFFFFFFFF assumes 32 bits, but so many people answered it I figured I'd add it as a separate answer.
See litb's answer for a very clear explanation of the issues.
My disagreement is that, very strictly speaking, there are no guarantees for either case. I don't know of any architecture that does not represent an unsigned value of 'one less than two to the power of the number of bits' as all bits set, but here is what the Standard actually says (3.9.1/7 plus note 44):
The representations of integral types shall define values by use of a pure binary numeration system. [Note 44:]A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.
That leaves the possibility for one of the bits to be anything at all.
I would not do the -1 thing. It's rather non-intuitive (to me at least). Assigning signed data to an unsigned variable just seems to be a violation of the natural order of things.
In your situation, I always use 0xFFFF. (Use the right number of Fs for the variable size of course.)
[BTW, I very rarely see the -1 trick done in real-world code.]
Additionally, if you really care about the individual bits in a variable, it would be a good idea to start using the fixed-width uint8_t, uint16_t, uint32_t types.
Although the 0xFFFF (or 0xFFFFFFFF, etc.) may be easier to read, it can break portability in code which would otherwise be portable. Consider, for example, a library routine to count how many items in a data structure have certain bits set (the exact bits being specified by the caller). The routine may be totally agnostic as to what the bits represent, but still need to have an "all bits set" constant. In such a case, -1 will be vastly better than a hex constant since it will work with any bit size.
The other possibility, if a typedef value is used for the bitmask, would be to use ~(bitMaskType)0; if bitmask happens to only be a 16-bit type, that expression will only have 16 bits set (even if 'int' would otherwise be 32 bits) but since 16 bits will be all that are required, things should be fine provided that one actually uses the appropriate type in the typecast.
Incidentally, expressions of the form longvar &= ~[hex_constant] have a nasty gotcha if the hex constant is too large to fit in an int, but will fit in an unsigned int. If an int is 16 bits, then longvar &= ~0x4000; or longvar &= ~0x10000; will clear one bit of longvar, but longvar &= ~0x8000; will clear out bit 15 and all bits above that. Values which fit in int will have the complement operator applied to a type int, but the result will be sign extended to long, setting the upper bits. Values which are too big for unsigned int will have the complement operator applied to type long. Values which are between those sizes, however, will apply the complement operator to type unsigned int, which will then be converted to type long without sign extension.
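The same gotcha can be reproduced on a common ILP32/LP64 setup by scaling the constants up (a sketch assuming 32-bit int and 64-bit long; on a 16-bit-int target, the original constants apply):
#include <cstdio>

int main()
{
    unsigned long longvar;   // assumed 64-bit here

    longvar = ~0ul;
    longvar &= ~0x40000000;      // fits in int: ~ yields a negative int, sign-extended -> clears only bit 30
    std::printf("%016lx\n", longvar);

    longvar = ~0ul;
    longvar &= ~0x100000000;     // too big for unsigned int: ~ applied at long width -> clears only bit 32
    std::printf("%016lx\n", longvar);

    longvar = ~0ul;
    longvar &= ~0x80000000;      // fits only in unsigned int: ~ yields 0x7fffffff, widened without sign extension
    std::printf("%016lx\n", longvar);  // -> clears bit 31 and every bit above it
}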
As others have mentioned, -1 is the correct way to create an integer that will convert to an unsigned type with all bits set to 1. However, the most important thing in C++ is using correct types. Therefore, the correct answer to your problem (which includes the answer to the question you asked) is this:
std::bitset<32> const flags(-1);
This will always contain the exact amount of bits you need. It constructs a std::bitset with all bits set to 1 for the same reasons mentioned in other answers.
It is certainly safe, as -1 will always have all available bits set, but I like ~0 better. -1 just doesn't make much sense for an unsigned int. 0xFF... is not good because it depends on the width of the type.
Practically: Yes
Theoretically: No.
-1 = 0xFFFFFFFF (or whatever size an int is on your platform) is only true with two's complement arithmetic. In practice, it will work, but there are legacy machines out there (IBM mainframes, etc.) where you've got an actual sign bit rather than a two's complement representation. Your proposed ~0 solution should work everywhere.
I say:
int x;
memset(&x, 0xFF, sizeof(int));
This will always give you the desired result.
Leveraging the fact that assigning all bits to one for an unsigned type is equivalent to taking the maximum possible value for the given type,
and extending the scope of the question to all unsigned integer types:
Assigning -1 works for any unsigned integer type (unsigned int, uint8_t, uint16_t, etc.) for both C and C++.
As an alternative, for C++, you can either:
Include <limits> and use std::numeric_limits< your_type >::max()
Write a custom templated function (this would also allow a sanity check, i.e. that the destination type really is an unsigned type); see the sketch below.
The purpose would be to add clarity, as assigning -1 always needs some explanatory comment.
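A minimal sketch of such a templated helper (C++14; the name all_bits_set is illustrative):
#include <limits>
#include <type_traits>

template <typename T>
constexpr T all_bits_set()
{
    static_assert(std::is_unsigned<T>::value, "all_bits_set requires an unsigned integer type");
    return std::numeric_limits<T>::max();   // the same value that assigning -1 would produce
}

// usage:
// auto flags = all_bits_set<unsigned int>();
// auto mask  = all_bits_set<uint16_t>();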
A way to make the meaning a bit more obvious and yet avoid repeating the type:
const auto flags = static_cast<unsigned int>(-1);
An additional effort to emphasize why Adrian McCarthy's approach here might be the best solution, at least since C++11, in terms of a compromise between standard conformity, type safety/explicit clearness, and reduction of possible ambiguities:
unsigned int flagsPreCpp11 = ~static_cast<unsigned int>(0);
auto flags = ~static_cast<unsigned int>(0); // C++11 initialization
predeclaredflags = ~static_cast<decltype(predeclaredflags)>(0); // C++11 assignment to already declared variable
I'm going to explain my preference in detail below. As Johannes mentioned quite correctly, the fundamental source of irritation here is the question of value semantics vs. bit-representation semantics, and of which types we are talking about exactly (the assigned value's type vs. the type of a possible compile-time integral constant). Since there is no standard built-in mechanism that explicitly sets all bits to 1 for the OP's concrete use case of unsigned integer values (std::bitset is a common, purely bit-level container, but the question was about unsigned integers in general), it is obviously impossible to be fully independent of value semantics here. But we might be able to reduce the ambiguity.
Comparison of the 'better' standard compliant approaches:
The OP's way:
unsigned int flags = -1;
PROs:
is "established" and short
is quite intuitive if you think of the conversion in terms of modulo arithmetic mapping the value onto its "natural" bit representation
changing the target unsigned type to unsigned long, for instance, is possible without any further adaptation
CONs:
At least beginners might not be sure about the standard conformity ("Do I have to worry about padding bits?").
It crosses type boundaries, and in the heavier way: signed to unsigned.
From the code alone, you do not directly see any association with bit semantics.
Referring to maximum values via defines:
unsigned int flags = UINT_MAX;
This circumvents the signed-to-unsigned conversion issue of the -1 approach, but introduces several new problems: if in doubt, one has to look twice here again, at the latest when you want to change the target type to unsigned long, for instance. And one has to be sure that the maximum value leads to all bits set to 1 by the standard (plus the padding-bit concerns again). The bit semantics are, once more, not obvious from the code alone.
Referring to maximum values more explicitly:
auto flags = std::numeric_limits<unsigned int>::max();
In my opinion, this is the better maximum-value approach since it is macro/define-free and explicit about the involved type. But all the other concerns about the maximum-value approach itself remain.
Adrian's approach (and why I think it's the preferred one, both before C++11 and since):
unsigned int flagsPreCpp11 = ~static_cast<unsigned int>(0);
auto flagsCpp11 = ~static_cast<unsigned int>(0);
PROs:
Only the simplest integral compile-time constant is used: 0. So no worries about bit representations or (implicit) casts are justified. From an intuitive point of view, I think we can all agree that the bit representation of zero is commonly clearer than that of maximum values, and not only for unsigned integrals.
No type ambiguities are involved, no further look-ups required in doubt.
Explicit bit semantics are involved here via the complement ~. So it is quite clear from the code what the intention was. And it is also very explicit which type, and therefore which value range, the complement is applied to.
CONs:
If assigned to a member, for instance, there is a small chance of mismatching types pre-C++11:
Declaration in class:
unsigned long m_flags;
Initialization in constructor:
m_flags(~static_cast<unsigned int>(0))
But since C++11, decltype + auto are powerful tools for preventing most of these possible issues. And some of these type-mismatch scenarios (on interface boundaries, for instance) are also possible with the -1 approach.
Robust final C++11 approach for pre-declared variables:
m_flags(~static_cast<decltype(m_flags)>(0)) // member initialization case
So, with a full view of the weighting of the PROs and CONs of all the approaches here, I recommend this one as the preferred approach, at least since C++11.
Update: Thanks to a hint by Andrew Henle, I removed the statement about its readability, since that might be too subjective. But I still think its readability is at least no worse than that of most of the maximum-value approaches or the ones providing the maximum value explicitly via compile-time integrals/literals, since static_cast usage is "established" too and built in, in contrast to defines/macros and even the standard library.
Yes, the representation shown is correct. If we did it the other way round, we would need an operator to flip all the bits, but here the logic is quite straightforward once we consider the size of the integers on the machine.
For instance, suppose an integer on the machine is 2 bytes = 16 bits. The maximum value it can hold is 2^16 - 1 = 65535 (and 2^16 = 65536).
0 % 65536 = 0
-1 % 65536 = 65535, which corresponds to 1111.............1, i.e. all bits set to 1 (if we consider residue classes mod 65536).
Hence it is quite straightforward, I guess.
If you consider this notion, it is perfectly fine for unsigned ints, and it actually works out.
Just check the following program fragment:
#include <cmath>
#include <cstdio>
#include <iostream>

int main()
{
    unsigned int a = 2;
    // 2^32, cast to unsigned long long because it does not fit into an unsigned int
    std::cout << (unsigned long long)std::pow(double(a), double(sizeof(a) * 8));
    unsigned int b = -1;
    std::cout << "\n" << b;
    std::getchar();
    return 0;
}
The output for b is 4294967295, which is -1 % 2^32 with 4-byte integers.
Hence it is perfectly valid for unsigned integers.
In case of any discrepancies, please report.