reinterpret_cast swaps bits? - c++

I was testing a simple compiler when I noticed that its output was completely wrong. In fact, the output had its endianness swapped from little to big. Upon closer examination, the offending code turned out to be this:
const char *bp = reinterpret_cast<const char*>(&command._instruction);
for (int i = 0; i < 4; ++i)
out << bp[i];
A four-byte instruction is reinterpreted as a sequence of one-byte characters and printed to stdout (it's clunky, yes, but that decision was not mine). It doesn't seem logical to me why the bits would be swapped, since the char pointer should initially point to the most significant byte (on this x86 system). For example, given 0x00...04, the char pointer should point to 0x00, not 0x04. Yet it points to 0x04.
I have created a simple demonstration of code:
#include <bitset>
#include <iostream>
#include <stdint.h>
int main()
{
int32_t foo = 4;
int8_t* cursor = reinterpret_cast<int8_t*>(&foo);
std::cout << "Using a moving 8-bit pointer:" << std::endl;
for (int i = 0; i < 4; ++i)
std::cout << std::bitset<8>(cursor[i]) << " "; // <-- why?
std::cout << std::endl << "Using original 4-byte int:" << std::endl;
std::cout << std::bitset<32>(foo) << std::endl;
return 0;
}
Output:
Using a moving 8-bit pointer:
00000100 00000000 00000000 00000000
Using original 4-byte int:
00000000000000000000000000000100

It doesn't seem logical to me why the bits would be swapped, since the char pointer should be pointing to the most-significant (on this x86 system) bits at first.
On an x86 system, a pointer to the base of a multi-byte object does not point at the most significant byte, but at the least-significant byte. This is called "little endian" byte order.
In C, if we take the address of an object that occupies multiple bytes and convert it to char *, the result points to the base of the object. That is the byte at the lowest address, from which the pointer can be positively displaced (with +, ++, etc.) to reach the other bytes.

Related

conversion of integers into binary in c++

As we know, each value is stored in binary form in memory. So, in C++, will these two values have different binary representations when stored in memory?
unsigned int a = 90;
signed int b = 90;
So, in C++, will these two values have different binary representations when stored in memory?
For a value like 90 that both types can represent, the C++ standard does pin this down. Each signed integer type and its corresponding unsigned type occupy the same amount of storage, and the same non-negative value has the same representation in both ([basic.fundamental]). What that representation looks like in memory, byte order for example, is ultimately dictated by the hardware.
In practice, I haven't encountered hardware and a C++ implementation where identically valued signed and unsigned variants of an integer had different binary representations, so I would find it surprising if they did.
Sidenote: Since "byte" is the smallest addressable unit of memory in C++, there isn't a way in the language to observe a directional order of individual bits in memory.
Consider the value 63. In binary it is 111111 and in hex it is 3f.
Because char is special in C++, and any object can be viewed as a sequence of bytes, you can directly look at the binary representation:
#include <iostream>
#include <iomanip>
#include <cstddef>
int main()
{
    unsigned int a = 63;
    signed int b = 63;
    std::cout << std::hex;
    // unsigned char avoids sign extension for bytes of 0x80 and above
    unsigned char* a_bin = reinterpret_cast<unsigned char*>(&a);
    for (std::size_t i = 0; i < sizeof(unsigned int); ++i)
        std::cout << std::setw(4) << std::setfill('0') << static_cast<unsigned>(*(a_bin+i)) << " ";
    std::cout << "\n";
    unsigned char* b_bin = reinterpret_cast<unsigned char*>(&b);
    for (std::size_t i = 0; i < sizeof(signed int); ++i)
        std::cout << std::setw(4) << std::setfill('0') << static_cast<unsigned>(*(b_bin+i)) << " ";
    std::cout << "\n";
}
Unfortunately, there is no std::bin I/O manipulator, so I used std::hex (which is sticky). The reinterpret_cast is OK because of the aforementioned special rules for char. std::cout << has a special overload that prints characters, but we want to see numerical values, so another cast is needed. The output of the above is:
003f 0000 0000 0000
003f 0000 0000 0000
As already mentioned in a comment, the byte order is implementation-defined. Moreover, I have to admit that I am not aware of the details of what the standard has to say about this. Be careful with assumptions about byte representation, especially when transferring objects between two programs or over a wire. You would typically use some form of (de)serialization, so that you are in control of the byte representations being transferred.
TL;DR: Typically the representations are the same. In general, though, you need to carefully consider what the C++ standard mandates, and I am not aware of signed and unsigned being guaranteed to have the same byte representations.

How can I create bitmapped data in C?

I am trying to create bitmapped data in C, but I am not able to figure out the right logic. Here's my code:
bool a=1;
bool b=0;
bool c=1;
bool d=0;
uint8_t output = a|b|c|d;
printf("outupt = %X", output);
I want my output to be "1010", which is equivalent to hex "0x0A". How do I do it?
The bitwise OR operator ORs the bits in each position. The result of a|b|c|d will be 1 because all of your 0s and 1s are ORed together in the least significant position.
You can shift (<<) the bits to the correct positions like this:
uint8_t output = a << 3 | b << 2 | c << 1 | d;
This will result in
00001000 (a << 3)
00000000 (b << 2)
00000010 (c << 1)
| 00000000 (d; d << 0)
--------
00001010 (output)
Strictly speaking, the calculation happens with ints and the intermediate results have more leading zeroes, but in this case we do not need to care about that.
If you want a very simple way to set, clear, or access specific bits and you can use C++, you could consider std::bitset:
#include <bitset>
#include <iostream>
int main()
{
    bool a = 1, b = 0, c = 1, d = 0;
    std::bitset<8> s;  // bit set of 8 bits
    s[3] = a;          // access individual bits, as if it was an array
    s[2] = b;
    s[1] = c;
    s[0] = d;          // the first bit is the least significant bit
    std::cout << s << '\n';                                 // streams the bitset as a string of '0' and '1'
    std::cout << "0x" << std::hex << s.to_ulong() << '\n';  // convert the bitset to unsigned long
    std::cout << s[3] << '\n';                              // access a specific bit
    std::cout << "Number of bits set: " << std::dec << s.count() << '\n';
}
The advantage is that the code is easier to read and maintain, especially if you're modifying bitmapped data. Setting specific bits with a combination of the << and | operators, as explained by Antti, is a workable solution. But clearing specific bits in an existing bitmap is a little more tricky, because it combines << and ~ (to create a bit mask) with &.
Another advantage is that you can easily manage large bitsets of hundreds of bits, much larger than the largest built-in type unsigned long long (although doing so will not allow you to convert as easily to an unsigned long or an unsigned long long: you'll have to go via a string).
C only
I would use bit-fields. I know that they are not portable, but for a particular embedded target (especially microcontrollers) their layout is well defined by the compiler.
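#include <string.h>
#include <stdio.h>
#include <stdbool.h>
typedef union
{
    struct  /* anonymous struct members require C11 */
    {
        /* Bit-field order is implementation-defined. With the common
           LSB-first allocation (e.g. GCC on x86 or ARM) the first field
           is bit 0, so d, c, b come first to make a bit 3. */
        bool d:1;
        bool c:1;
        bool b:1;
        bool a:1;
        bool e:1;
        bool f:1;
    };
    unsigned char byte;
} mydata;
int main(void)
{
    mydata d;
    memset(&d, 0, sizeof d);  /* clear every bit, including the unused ones */
    d.a = 1;
    d.b = 0;
    d.c = 1;
    d.d = 0;
    printf("output = %hhX\n", d.byte);  /* A with LSB-first allocation */
}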

How to access range of bits in a bitset?

I have a bitset which is very large, say, 10 billion bits.
What I'd like to do is write this to a file. However using .to_string() actually freezes my computer.
What I'd like to do is iterate over the bits and take 64 bits at a time, turn it into a uint64 and then write it to a file.
However, I don't know how to access different ranges of the bitset. How would I do that? I am new to C++ and wasn't sure how to access the underlying bitset::reference, so please provide an example in your answer.
I tried using a pointer but did not get what I expected. Here's an example of what I'm trying so far.
#include <iostream>
#include <bitset>
#include <cstring>
using namespace std;
int main()
{
bitset<50> bit_array(302332342342342323);
cout<<bit_array << "\n";
bitset<50>* p;
p = &bit_array;
p++;
int some_int;
memcpy(&some_int, p , 2);
cout << &bit_array << "\n";
cout << &p << "\n";
cout << some_int << "\n";
return 0;
}
the output
10000110011010100111011101011011010101011010110011
0x7ffe8aa2b090
0x7ffe8aa2b098
17736
The last number seems to change on each run which is not what I expect.
There are a couple of errors in the program. The maximum value bitset<50> can hold is 1125899906842623 and this is much less than what bit_array has been initialized with in the program.
some_int has to be declared as unsigned long, and you should verify that unsigned long has 64 bits on your platform (std::uint64_t from <cstdint> is exactly 64 bits wherever it exists).
After this, test each bit of bit_array in a loop and then do the appropriate bitwise (OR and shift) operations and store the result into some_int.
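std::uint64_t some_int = 0;  // collects the extracted bits
std::uint64_t mask = 1;      // selects the destination bit, starting at bit 0
std::size_t start_bit = 0;
std::size_t end_bit = 64;    // must not exceed bit_array.size()
for (std::size_t i = start_bit; i < end_bit; i++) {
    if (bit_array[i])
        some_int |= mask;
    mask <<= 1;
}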
You can change the values of start_bit and end_bit appropriately as you navigate through the large bitset.
For accessing ranges of a bitset, you should look at the provided interface. The lack of something like bitset::data() indicates that you should not try to access the underlying data directly. Doing so, even if it had seemed to work, is fragile, hacky, and probably undefined behavior of some sort.
I see two possibilities for converting a massive bitset into more manageable pieces. A fairly straightforward approach is to go through bit by bit and collect the bits into an integer of some sort (or write them directly to a file as '0' or '1' if you're not that concerned about file size). It looks like P.W already provided code for this, so I'll skip an example for now.
The second possibility is to use bitwise operators and to_ullong(). The downside of this approach is that it nominally uses auxiliary storage space, specifically two additional bitsets the same size as your original. I say "nominally", though, because a compiler might be clever enough to optimize them away. Might. Maybe not. And you are dealing with sizes over a gigabyte each. Realistically, the bit-by-bit approach is probably the way to go, but I think this example is interesting at a theoretical level.
#include <iostream>
#include <iomanip>
#include <bitset>
#include <cstdint>
using namespace std;
constexpr size_t FULL_SIZE = 120; // Some large number
constexpr size_t CHUNK_SIZE = 64; // Currently the mask assumes 64. Otherwise, this code just
// assumes CHUNK_SIZE is nonzero and at most the number of
// bits in long long (which is at least 64).
int main()
{
// Generate some large bitset. This is just test data, so don't read too much into this.
bitset<FULL_SIZE> bit_array(302332342342342323);
bit_array |= bit_array << (FULL_SIZE/2);
cout << "Source: " << bit_array << "\n";
// The mask avoids overflow in to_ullong().
// The mask should have exactly its CHUNK_SIZE low-order bits set.
// As long as we're dealing with 64-bit chunks, there's a handy constant to handle this.
constexpr bitset<FULL_SIZE> mask64(UINT64_MAX);
cout << "Mask: " << mask64 << "\n";
// Extract chunks.
const size_t num_chunks = (FULL_SIZE + CHUNK_SIZE - 1)/CHUNK_SIZE; // Round up.
for ( size_t i = 0; i < num_chunks; ++i ) {
// Extract the next CHUNK_SIZE bits, then convert to an integer.
const bitset<FULL_SIZE> chunk_set{(bit_array >> (CHUNK_SIZE * i)) & mask64};
unsigned long long chunk_val = chunk_set.to_ullong();
// NOTE: as long as CHUNK_SIZE <= 64, chunk_val can be converted safely to the desired uint64_t.
cout << "Chunk " << dec << i << ": 0x" << hex << setfill('0') << setw(16) << chunk_val << "\n";
}
return 0;
}
The output:
Source: 010000110010000110011010100111011101011011010101011010110011010000110010000110011010100111011101011011010101011010110011
Mask: 000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111
Chunk 0: 0x343219a9dd6d56b3
Chunk 1: 0x0043219a9dd6d56b

Bit representation of float using an int pointer

I have the following exercise:
Implement a function void float_to_bits(float x) which prints the bit representation of x. Hint: Casting a float to an int truncates the fractional part, but no information is lost casting a float pointer to an int pointer.
Now, I know that a float is represented by a sign-bit, some bits for its mantissa, some bits for the basis and some bits for the exponent. It depends on my system how many bits are used.
The problem we are facing here is that our number basically has two parts. Let's consider 8.7: the bit representation of this number would be (to my understanding) the following: 1000.0111
Now, floats are stored with a leading zero, so 8.8 would become 0.88*10^1
So I somehow have to get all the information out of memory. I don't really see how I should do that. What should that hint hint me to? What's the difference between an integer pointer and a float pointer?
Currently I have this:
void float_to_bits() {
float a = 4.2345678f;
int* b;
b = (int*)(&a);
*b = a;
std::cout << *(b) << "\n";
}
But I really don't get the bigger picture behind the hint here. How do I get the mantissa, the exponent, the sign and the basis? I also tried playing around with the bitwise operators >> and <<. But I just don't see how they should help me here, since they won't change the pointer's position. They're useful for getting, e.g., the bit representation of an integer, but that's about it; I have no idea what use they'd be here.
The hint your teacher gave is misleading. Casting a pointer to a different type is allowed, but reading a float through an int* violates the strict aliasing rules and is undefined behavior. However, std::memcpy()ing an object into a suitably sized array of unsigned char is well defined. The content of the resulting array can then be decomposed into bits. Here is a quick hack to represent the bits using hexadecimal values:
#include <iostream>
#include <iomanip>
#include <cstring>
int main() {
    float f = 8.7f;
    unsigned char bytes[sizeof(float)];
    std::memcpy(bytes, &f, sizeof(float));  // copy the object representation byte by byte
    std::cout << std::hex << std::setfill('0');
    for (int b: bytes) {                     // bytes appear in memory order (lowest address first)
        std::cout << std::setw(2) << b;
    }
    std::cout << '\n';
}
Note that IEEE 754 binary floating-point formats do not store the full significand (the standard doesn't use "mantissa" as a term) except for denormalized values. 32-bit floats store:
1 bit for the sign
8 bits for the exponent
23 bits for the normalized significand with the non-zero high bit being implied
The hint shows you how to get the float's bits into an integer without going through a value conversion.
When you convert a floating-point value to an integer, the fractional part is discarded: int i = (int) 4.502f; results in i = 4.
But when you make an int pointer (int*) point to a float's location, no conversion is made, including when you read the value through the int*.
To show the representation I like hex numbers, which is why my first example was given in hex (each hexadecimal digit represents 4 binary digits). It is also possible to print in binary, and there are many ways to do it (I like this one best!).
Here is annotated example code:
#include <iostream>
#include <bitset>
using namespace std;
int main()
{
    float a = 4.2345678f;     // allocate space for a float called 'a' and store 4.2345678f in it
    unsigned int* b;          // allocate space for a pointer (address) called 'b'; it will point to an unsigned integer
    b = (unsigned int*)(&a);  // GREAT, exactly what you needed! Take the float 'a' and get its address with '&'.
    // By default that address has type float*, so you cast it to an integer pointer.
    // Bottom line: set 'b' to the address of 'a', but treat it as the address of an int!
    // (Strictly speaking, reading a float through an unsigned int* breaks the aliasing rules;
    // it works on common compilers, but std::memcpy into an unsigned int is the portable way.)
    // The hint implied that this won't cause a type conversion:
    // int someInt = a;  // would cause `someInt = 4`, the same as your line below:
    // *b = a;           // <<<< this was your error.
    // 1st: it isn't required, as 'b' already points to the address of 'a' and hence sees its bits.
    // 2nd: it sets the value pointed to by 'b' to 'a' converted to an int (4),
    //      so the bits stored in 'a' actually change too.
    cout << a << " in binary " << bitset<32>(*b) << endl;  // assumes a 32-bit unsigned int
    cout << "Sign " << bitset<1>(*b >> 31) << endl;         // 1 bit (31)
    cout << "Exp " << bitset<8>(*b >> 23) << endl;          // 8 bits (23-30)
    cout << "Mantissa " << bitset<23>(*b) << endl;          // 23 bits (0-22)
}

Why is (int)'\xff' != 0xff but (int)'\x7f' == 0x7f?

Consider this code:
#include <iostream>
typedef union
{
    int integer_;
    char mem_[4];
} MemoryView;
int main()
{
    MemoryView mv;
    mv.integer_ = (int)'\xff';
    for(int i=0;i<4;i++)
        std::cout << mv.mem_[i]; // output is \xff\xff\xff\xff
    mv.integer_ = 0xff;
    for(int i=0;i<4;i++)
        std::cout << mv.mem_[i]; // output is \xff\x00\x00\x00
    // now i try with a value less than 0x80
    mv.integer_ = (int)'\x7f';
    for(int i=0;i<4;i++)
        std::cout << mv.mem_[i]; // output is \x7f\x00\x00\x00
    mv.integer_ = 0x7f;
    for(int i=0;i<4;i++)
        std::cout << mv.mem_[i]; // output is \x7f\x00\x00\x00
    // now i try with 0x80
    mv.integer_ = (int)'\x80';
    for(int i=0;i<4;i++)
        std::cout << mv.mem_[i]; // output is \x80\xff\xff\xff
    mv.integer_ = 0x80;
    for(int i=0;i<4;i++)
        std::cout << mv.mem_[i]; // output is \x80\x00\x00\x00
}
I tested it with both GCC 4.6 and MSVC 2010, and the results were the same.
When I try values less than 0x80, the output is correct, but with values of 0x80 and above,
the three higher-order bytes are '\xff'.
CPU : Intel 'core 2 Duo'
Endianness : little
OS : Ubuntu 12.04LTS (64bit), Windows 7(64 bit)
It's implementation-specific whether type char is signed or unsigned.
Assigning a variable of type char the value of 0xFF might either yield 255 (if type is really unsigned) or -1 (if type is really signed) in most implementations (where the number of bits in char is 8).
Values less, or equal to, 0x7F (127) will fit in both an unsigned char and a signed char which explains why you are getting the result you are describing.
#include <iostream>
#include <limits>
int
main (int argc, char *argv[])
{
std::cerr << "unsigned char: "
<< +std::numeric_limits<unsigned char>::min ()
<< " to "
<< +std::numeric_limits<unsigned char>::max ()
<< ", 0xFF = "
<< +static_cast<unsigned char> ('\xFF')
<< std::endl;
std::cerr << " signed char: "
<< +std::numeric_limits<signed char>::min ()
<< " to "
<< +std::numeric_limits<signed char>::max ()
<< ", 0xFF = "
<< +static_cast<signed char> ('\xFF')
<< std::endl;
}
Typical output:
unsigned char: 0 to 255, 0xFF = 255
signed char: -128 to 127, 0xFF = -1
To circumvent the problem you are experiencing, explicitly declare your variable as either signed or unsigned. In this case, casting your value to unsigned char is sufficient:
mv.integer_ = static_cast<unsigned char> ('\xFF'); /* 255, NOT -1 */
Side note:
You are invoking undefined behaviour when reading a member of a union that is not the last member you wrote to. The standard doesn't specify what will happen in this case. Sure, under most implementations it will work as expected: accessing union.mem_[0] will most probably yield the first byte of union.integer_, but this is not guaranteed.
The type of '\xff' is char. char is a signed integral type on a lot of platforms, so the value of '\xff' is negative (-1 rather than 255). When you convert (cast) that to an int (also signed), you get an int with the same, negative, value.
Anything strictly less than 0x80 will be positive, and you'll get a positive out of the conversion.
'\xff' has type char, which is signed by default on many architectures (but not always). So when it is converted to an integer, it is sign-extended to the width of int (32 bits in this case).
In binary arithmetic, nearly all negative representations use the highest bit to indicate "this is negative" and some sort of "inverse" logic to represent the value. The most common is "two's complement", where there is no "negative zero". In this form, all ones is -1, and the "most negative number" is a 1 followed by zeros. So 0x80 in 8 bits is -128, 0x8000 in 16 bits is -32768, and 0x80000000 in 32 bits is -2147483648.
A solution, in this case, would be to use static_cast<unsigned char>('\xff').
Basically, 0xff stored in a signed 8-bit char is -1. Whether a char without a signed or unsigned specifier is signed or unsigned depends on the compiler and/or platform, and in this case it appears to be signed.
Cast to an int, it keeps the value -1, which stored in a 32 bit signed int is 0xffffffff.
0x7f on the other hand stored in an 8 bit signed char is 127, which cast to a 32 bit int is 0x0000007f.