Is std::bitset bit-order portable? - c++

Does C++ say anything about bit ordering? I'm working on protocol packet layouts in particular, and I wonder whether there is a portable way to specify that a certain number be written into bits 5, 6, 7, where bit 5 is the 'most significant'.
My questions:
is 0x01 always represented as a byte with bit 7 set?
is bitset<8>().set(7).to_ulong() always equal to 1?

From 20.5/3 (ISO/IEC 14882:2011):
When converting between an object of class bitset and a value of some integral type, bit position pos corresponds to the bit value 1 << pos.
That is, bitset<8>().set(7).to_ulong() is guaranteed to be (1 << 7) == 128.
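A minimal check of that guarantee, just for illustration:
#include <bitset>
#include <cassert>

int main()
{
    // Bit position pos corresponds to the bit value 1 << pos, in both directions.
    assert(std::bitset<8>().set(7).to_ulong() == (1u << 7)); // 128
    assert(std::bitset<8>(1u << 7).test(7));                 // and back again
}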

bitset doesn't do serialization, so you don't (need to) know. Use serialization/deserialization.
is bitset<8>().set(7).to_ulong() always equal to 1
No, not on my machine (see below).
However, I'd certainly expect the iostream operators to behave portably:
#include <bitset>
#include <sstream>
#include <iostream>

int main()
{
    std::bitset<8> bits;
    std::cout << bits.set(7).to_ulong() << std::endl;

    std::stringstream ss;
    ss << bits;
    std::cout << ss.rdbuf() << std::endl;

    std::bitset<8> cloned;
    ss >> cloned;
    std::cout << cloned.set(7).to_ulong() << std::endl;
    std::cout << cloned << std::endl;
}
Prints
128
10000000
128
10000000

If the question is whether you can happily ignore the endianness of the platform while sending binary objects over the network, the answer is you cannot. If the question is whether the same code compiled on two different platforms will yield the same results, then the answer is yes.
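For illustration, here is a minimal sketch of what explicit serialization of one protocol field could look like; write_be32 is a hypothetical helper name, and big-endian is just one possible wire order:
#include <cstdint>
#include <fstream>

// Write a 32-bit value byte by byte in big-endian wire order,
// so the host's endianness never enters the picture.
void write_be32(std::ofstream& os, std::uint32_t value)
{
    os.put(static_cast<char>((value >> 24) & 0xFF));
    os.put(static_cast<char>((value >> 16) & 0xFF));
    os.put(static_cast<char>((value >> 8) & 0xFF));
    os.put(static_cast<char>(value & 0xFF));
}

int main()
{
    std::ofstream os("packet.bin", std::ofstream::binary);
    write_be32(os, 0x01020304); // bytes on the wire: 01 02 03 04 on any host
}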

Related

conversion of integers into binary in c++

As we know, each value is stored in binary form inside memory. So, in C++, will these two values have different binary numbers when stored inside memory?
unsigned int a = 90;
signed int b = 90;
So, in C++, will these two values have different binary numbers when stored inside memory?
The C++ language doesn't specify whether they do. Ultimately, the binary representation is dictated by the hardware, so the answer technically depends on that.
That said, I haven't encountered a hardware and C++ implementation where identically valued signed and unsigned variants of an integer didn't have identical binary representations. As such, I would find it surprising if the binary representations were different.
Sidenote: Since "byte" is the smallest addressable unit of memory in C++, there isn't a way in the language to observe a directional order of individual bits in memory.
Consider the value 63. In binary it is 111111 and in hex it is 3f.
Because char is special in C++, and any object can be viewed as a sequence of bytes, you can directly look at the binary representation:
#include <iostream>
#include <iomanip>

int main()
{
    unsigned int a = 63;
    signed int b = 63;
    std::cout << std::hex;

    char* a_bin = reinterpret_cast<char*>(&a);
    for (int i = 0; i < sizeof(unsigned int); ++i)
        std::cout << std::setw(4) << std::setfill('0') << static_cast<unsigned>(*(a_bin + i)) << " ";
    std::cout << "\n";

    char* b_bin = reinterpret_cast<char*>(&b);
    for (int i = 0; i < sizeof(signed int); ++i)
        std::cout << std::setw(4) << std::setfill('0') << static_cast<unsigned>(*(b_bin + i)) << " ";
}
Unfortunately, there is no std::bin io-manipulator, so I used std::hex (it is sticky). The reinterpret_cast is ok, because of the aforementioned special rules for char. Because std::cout << has special overload to print characters, but we want to see numerical values, another cast is needed. The output of the above is:
003f 0000 0000 0000
003f 0000 0000 0000
As already mentioned in a comment, the byte order is implementation-defined. Moreover, I have to admit that I am not aware of the exact details of what the standard has to say about this. Be careful with assumptions about byte representation, especially when transferring objects between two programs or over a wire. You would typically use some form of de-/serialization, so that you are in control of the byte representations being transferred.
TL;DR: Typically yes, but in general you need to carefully consider what the C++ standard actually mandates, and I am not aware of any guarantee that signed and unsigned types have the same byte representation.
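As a small, platform-specific check, one could compare the two objects byte for byte; this demonstrates the claim on the machine it runs on, it does not prove it in general:
#include <cstring>
#include <iostream>

int main()
{
    unsigned int a = 90;
    signed int b = 90;
    // The object sizes are equal by definition; identical bytes is what we
    // expect on common platforms.
    std::cout << (std::memcmp(&a, &b, sizeof a) == 0 ? "identical" : "different")
              << " representation\n";
}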

How to avoid 0xFF prefix when converting char to short?

When I do:
cout << std::hex << (short)('\x3A') << std::endl;
cout << std::hex << (short)('\x8C') << std::endl;
I expect the following output:
3a
8c
but instead, I have:
3a
ff8c
I suppose that this is due to the way char—and more precisely a signed char—is stored in memory: everything below 0x80 would not be prefixed; the value 0x80 and above, on the other hand, would be prefixed with 0xFF.
When given a signed char, how do I get a hexadecimal representation of the actual character inside it? In other words, how do I get 0x3A for \x3A, and 0x8C for \x8C?
I don't think conditional logic is well suited here. While I can subtract 0xFF00 from the resulting short when needed, it doesn't seem very clear.
Your output might make more sense if you looked at it in decimal instead of hexadecimal:
std::cout << std::dec << (short)('\x3A') << std::endl;
std::cout << std::dec << (short)('\x8C') << std::endl;
output:
58
-116
The values were cast to short, so we are (most commonly) dealing with 16-bit values. The 16-bit binary representation of -116 is 1111 1111 1000 1100, which becomes FF8C in hexadecimal. So the output is correct given what you requested (on systems where char is a signed type). It is not so much the way the char is stored in memory as the way the bits are interpreted: as a signed value, the 8-bit pattern 1000 1100 represents -116, and the conversion to short is supposed to preserve this value rather than preserving the bits.
Your desired output of a hexadecimal 8C corresponds (for a short) to the decimal value 140. To get this value out of 8 bits, the value has to be interpreted as an unsigned 8-bit value (since the largest signed 8-bit value is 127). So the data needs to be interpreted as an unsigned char before it gets expanded to some flavor of short. For a character literal like in the example code, this would look like the following.
std::cout << std::hex << (unsigned short)(unsigned char)('\x3A') << std::endl;
std::cout << std::hex << (unsigned short)(unsigned char)('\x8C') << std::endl;
Most likely, the real code would have variables instead of character literals. If that is the case, then rather than casting to an unsigned char, it might be more convenient to declare the variable to be of unsigned char type. Which is possibly the type you should be using anyway, based on the fact that you want to see its hexadecimal value. Not definitively, but this does suggest that the value is seen simply as a byte of data rather than as a number, and that suggests that an unsigned type is appropriate. Have you looked at std::byte?
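For reference, a minimal C++17 sketch of how that could look with std::byte; the value is just the one from the question:
#include <cstddef>
#include <iostream>

int main()
{
    std::byte b{0x8C};                    // a raw byte, with no arithmetic meaning
    std::cout << std::hex
              << std::to_integer<int>(b)  // prints 8c, no sign extension possible
              << '\n';
}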
One other nifty thought to throw out: the following also gives the desired output as a reasonable facsimile of using an unsigned char variable.
#include <iostream>

unsigned char operator "" _u (char c) { return c; } // Suffix for unsigned char literals

int main()
{
    std::cout << std::hex << (unsigned short)('\x3A'_u) << std::endl;
    std::cout << std::hex << (unsigned short)('\x8C'_u) << std::endl;
}
A more straightforward approach is to cast a signed char to an unsigned char. In other words, this:
cout << std::hex << (short)(unsigned char)('\x3A') << std::endl;
cout << std::hex << (short)(unsigned char)('\x8C') << std::endl;
produces the expected result:
3a
8c
Not sure this is particularly clear, though.

How to access range of bits in a bitset?

I have a bitset which is very large, say, 10 billion bits.
What I'd like to do is write this to a file. However using .to_string() actually freezes my computer.
What I'd like to do is iterate over the bits and take 64 bits at a time, turn it into a uint64 and then write it to a file.
However, I'm not aware how to access different ranges of the bitset. How would I do that? I am new to C++ and wasn't sure how to access the underlying bitset::reference, so please provide an example with your answer.
I tried using a pointer but did not get what I expected. Here's an example of what I'm trying so far.
#include <iostream>
#include <bitset>
#include <cstring>
using namespace std;

int main()
{
    bitset<50> bit_array(302332342342342323);
    cout << bit_array << "\n";

    bitset<50>* p;
    p = &bit_array;
    p++;

    int some_int;
    memcpy(&some_int, p, 2);

    cout << &bit_array << "\n";
    cout << &p << "\n";
    cout << some_int << "\n";
    return 0;
}
The output:
10000110011010100111011101011011010101011010110011
0x7ffe8aa2b090
0x7ffe8aa2b098
17736
The last number seems to change on each run which is not what I expect.
There are a couple of errors in the program. The maximum value a bitset<50> can hold is 1125899906842623, which is much less than what bit_array has been initialized with in the program.
some_int has to be defined as unsigned long, and you should verify that unsigned long has 64 bits on your platform.
After this, test each bit of bit_array in a loop and then do the appropriate bitwise (OR and shift) operations and store the result into some_int.
unsigned long some_int = 0;  // must be a 64-bit type on your platform
unsigned long mask = 1;
std::size_t start_bit = 0;
std::size_t end_bit = 64;    // must not exceed bit_array.size()

for (std::size_t i = start_bit; i < end_bit; i++) {
    if (bit_array[i])
        some_int |= mask;
    mask <<= 1;
}
You can change the values of start_bit and end_bit appropriately as you navigate through the large bitset.
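As a rough sketch of how that navigation could look when writing the whole bitset to a file 64 bits at a time (the bitset size and file name here are placeholders, and note that the raw write of some_int is in host byte order):
#include <algorithm>
#include <bitset>
#include <cstdint>
#include <fstream>

int main()
{
    std::bitset<1000> bit_array;   // stand-in for the huge bitset
    bit_array.set(0).set(999);     // some test data

    std::ofstream os("bits.bin", std::ofstream::binary);
    for (std::size_t start_bit = 0; start_bit < bit_array.size(); start_bit += 64) {
        std::uint64_t some_int = 0;
        std::uint64_t mask = 1;
        const std::size_t end_bit = std::min<std::size_t>(start_bit + 64, bit_array.size());
        for (std::size_t i = start_bit; i < end_bit; ++i) {
            if (bit_array[i])
                some_int |= mask;
            mask <<= 1;
        }
        // Raw write of the 8 bytes of some_int, in host byte order.
        os.write(reinterpret_cast<const char*>(&some_int), sizeof some_int);
    }
}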
For accessing ranges of a bitset, you should look at the provided interface. The lack of something like bitset::data() indicates that you should not try to access the underlying data directly. Doing so, even if it had seemed to work, is fragile, hacky, and probably undefined behavior of some sort.
I see two possibilities for converting a massive bitset into more manageable pieces. A fairly straightforward approach is to just go through bit by bit and collect these into an integer of some sort (or write them directly to a file as '0' or '1' if you're not that concerned about file size). Looks like P.W already provided code for this, so I'll skip an example for now.
The second possibility is to use bitwise operators and to_ullong(). The downside of this approach is that it nominally uses auxiliary storage space, specifically two additional bitsets the same size as your original. I say "nominally", though, because a compiler might be clever enough to optimize them away. Might. Maybe not. And you are dealing with sizes over a gigabyte each. Realistically, the bit-by-bit approach is probably the way to go, but I think this example is interesting at a theoretical level.
#include <iostream>
#include <iomanip>
#include <bitset>
#include <cstdint>
using namespace std;

constexpr size_t FULL_SIZE = 120;  // Some large number
constexpr size_t CHUNK_SIZE = 64;  // Currently the mask assumes 64. Otherwise, this code just
                                   // assumes CHUNK_SIZE is nonzero and at most the number of
                                   // bits in long long (which is at least 64).

int main()
{
    // Generate some large bitset. This is just test data, so don't read too much into this.
    bitset<FULL_SIZE> bit_array(302332342342342323);
    bit_array |= bit_array << (FULL_SIZE / 2);
    cout << "Source: " << bit_array << "\n";

    // The mask avoids overflow in to_ullong().
    // The mask should have exactly its CHUNK_SIZE low-order bits set.
    // As long as we're dealing with 64-bit chunks, there's a handy constant to handle this.
    constexpr bitset<FULL_SIZE> mask64(UINT64_MAX);
    cout << "Mask: " << mask64 << "\n";

    // Extract chunks.
    const size_t num_chunks = (FULL_SIZE + CHUNK_SIZE - 1) / CHUNK_SIZE;  // Round up.
    for (size_t i = 0; i < num_chunks; ++i) {
        // Extract the next CHUNK_SIZE bits, then convert to an integer.
        const bitset<FULL_SIZE> chunk_set{(bit_array >> (CHUNK_SIZE * i)) & mask64};
        unsigned long long chunk_val = chunk_set.to_ullong();

        // NOTE: as long as CHUNK_SIZE <= 64, chunk_val can be converted safely to the desired uint64_t.
        cout << "Chunk " << dec << i << ": 0x" << hex << setfill('0') << setw(16) << chunk_val << "\n";
    }
    return 0;
}
The output:
Source: 010000110010000110011010100111011101011011010101011010110011010000110010000110011010100111011101011011010101011010110011
Mask: 000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111
Chunk 0: 0x343219a9dd6d56b3
Chunk 1: 0x0043219a9dd6d56b

Is it more portable to use ~0 or -1 to represent a type with all bits flipped to 1?

I saw a code example today which used the following form to check against -1 for an unsigned 64-bit integer:
if (a == (uint64_t)~0)
Is there any use case where you would WANT to compare against ~0 instead of something like std::numeric_limits<uint64_t>::max() or straight up -1? The original intent was unclear to me as I'd not seen a comparison like this before.
To clarify, the comparison is checking for an error condition where the unsigned integer type will have all of its bits set to 1.
UPDATE
According to https://stackoverflow.com/a/809341/1762276, -1 does not always represent all bits flipped to 1 but ~0 does. Is this correct?
I recommend you to do it exactly as you have shown, since it is the most straightforward one. Initialize to -1, which will always work, independent of the actual sign representation, while ~ will sometimes have surprising behavior because you will have to have the right operand type. Only then will you get the highest value of an unsigned type.
I believe this error case is handled so long as ~0 is always cast to the correct type (as indicated). So this would suggest that (uint64_t)~0 is indeed a more accurate and portable representation of an unsigned type with all bits flipped?
All of the following seem to be true (GCC x86_64):
#include <iostream>
#include <limits>
#include <cstdint>
using namespace std;

int main() {
    uint64_t a = 0xFFFFFFFFFFFFFFFF;
    cout << (int)(a == -1) << endl;
    cout << (int)(a == ~0) << endl;
    cout << (int)(a == (uint64_t)-1) << endl;
    cout << (int)(a == (uint64_t)~0) << endl;
    cout << (int)(a == static_cast<uint64_t>(-1)) << endl;
    cout << (int)(a == static_cast<uint64_t>(~0)) << endl;
    cout << (int)(a == std::numeric_limits<uint64_t>::max()) << endl;
    return 0;
}
Result:
1
1
1
1
1
1
1
In general you should be casting before applying the operator, because casting to a wider unsigned type may or may not cause sign extension depending on whether the source type is signed.
If you want a value of primitive type T with all bits set, the most portable approach is ~T(0). It should work on any number-like classes as well.
As Mr. Bingley said, the types from stdint.h are guaranteed to be two's-complement, so that -T(1) will also give a value with all bits set.
The source you reference has the right thought but misses some of the details; for example, neither (T)~0u nor (T)-1u will be the same as ~T(0u) and -T(1u). (To be fair, litb wasn't talking about widening in the answer you linked.)
Note that if there are no variables, just an unsuffixed literal 0 or -1, then the source type is guaranteed to be signed and none of the above concerns apply. But why write different code when dealing with literals, when the universally correct code is no more complex?
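A small sketch of the difference; all_bits_set is a hypothetical helper name, and the pitfall line assumes unsigned int is 32 bits wide:
#include <cstdint>
#include <cstdio>

// Illustrates the ~T(0) pattern: convert first, then apply the operator.
template <typename T>
constexpr T all_bits_set() { return static_cast<T>(~T(0)); }

int main()
{
    static_assert(all_bits_set<std::uint8_t>() == 0xFF, "all 8 bits set");
    static_assert(all_bits_set<std::uint64_t>() == UINT64_MAX, "all 64 bits set");

    // The widening pitfall: applying ~ before converting to the wider type.
    std::uint64_t from_u32 = static_cast<std::uint64_t>(~0u);          // only the low 32 bits are set
    std::printf("%llx\n", static_cast<unsigned long long>(from_u32));  // prints ffffffff
}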
std::numeric_limits<uint64_t>::max() is the same as (uint64_t)~0, which is the same as (uint64_t)-1.
Look at this example:
#include <iostream>
#include <limits>
#include <stdint.h>
using namespace std;

int main()
{
    bool x = false;
    cout << x << endl;
    x = std::numeric_limits<uint64_t>::max() == (uint64_t)~0;
    cout << x << endl;
    x = false;
    cout << x << endl;
    x = std::numeric_limits<uint64_t>::max() == (uint64_t)-1;
    cout << x;
}
Result:
0
1
0
1
So it is simpler to write (uint64_t)~0 or (uint64_t)-1 than std::numeric_limits<uint64_t>::max() in the code.
The fixed-width integer types like uint64_t are guaranteed to be represented in two's complement, so for those -1 and ~0 are equivalent. For the normal integer types (like int or long) this is not necessarily the case, since the C++ standard does not specify their bit representations.

Write BitSet of 8 bits to file (C++)

I have a BitSet of 8 bits.
How would I convert those 8 bits to a byte then write to file?
I have looked everywhere and only found examples of converting the other way.
Thanks a lot!
Assuming that you are talking about the C++ STL bitset, the answer is to convert the bitset to an integer (unsigned long, to be precise) and cast the result to a char.
Example:
#include <bitset>
#include <iostream>
using namespace std;

int main()
{
    bitset<8> x;
    char byte;
    cout << "Enter an 8-bit bitset in binary: " << flush;
    cin >> x;
    cout << "x = " << x << endl;
    byte = (char) x.to_ulong();
    cout << "As byte: " << (int) byte << endl;
}
http://www.cplusplus.com/reference/stl/bitset/
They can also be inserted into and extracted from streams directly, as text made up of '0' and '1' characters.
You don't need to convert anything, you just write them to the output stream.
Aside from that, if you really wanted to extract them into something you're used to, to_ulong and to_string methods are provided.
If you have more bits in the set than an unsigned long can hold and don't want to write them out directly to the stream, then you're either going to have to convert to a string and go that route, or access each bit using the [] operator and shift them into bytes that you're writing out.
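A rough sketch of that last approach; the bitset size is arbitrary, and the bit-to-byte ordering (low index into the low-order bit) is a choice, not something the language dictates:
#include <bitset>
#include <cstddef>
#include <fstream>

int main()
{
    std::bitset<100> bits;   // placeholder for a set larger than unsigned long
    bits.set(0);
    bits.set(99);

    std::ofstream os("bits.bin", std::ofstream::binary);
    // Pack 8 bits at a time into one byte and write it out.
    for (std::size_t i = 0; i < bits.size(); i += 8) {
        unsigned char byte = 0;
        for (std::size_t j = 0; j < 8 && i + j < bits.size(); ++j)
            if (bits[i + j])
                byte |= static_cast<unsigned char>(1u << j);
        os.put(static_cast<char>(byte));
    }
}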
You could use std::ofstream from <fstream>:
#include <bitset>
#include <fstream>

std::ofstream os("myfile.txt", std::ofstream::binary);
// Cast to char so exactly one byte is written, rather than a formatted number.
os << static_cast<char>(std::bitset<8>("01101001").to_ulong());
os.close();