Appending bits in C/C++

I want to append two unsigned 32-bit integers into one 64-bit integer. I have tried the code below, but it fails. However, it works for combining two 16-bit integers into one 32-bit integer.
Code:
char buffer[33];
char buffer2[33];
char buffer3[33];
/*
uint16 int1 = 6535;
uint16 int2 = 6532;
uint32 int3;
*/
uint32 int1 = 653545;
uint32 int2 = 562425;
uint64 int3;
int3 = int1;
int3 = (int3 << 32 /*(when I am doing 16 bit integers, this 32 turns into a 16)*/) | int2;
itoa(int1, buffer, 2);
itoa(int2, buffer2, 2);
itoa(int3, buffer3, 2);
std::cout << buffer << "|" << buffer2 << " = \n" << buffer3 << "\n";
Output when the 16bit portion is enabled:
1100110000111|1100110000100 =
11001100001110001100110000100
Output when the 32bit portion is enabled:
10011111100011101001|10001001010011111001 =
10001001010011111001
Why is it not working? Thanks

I see nothing wrong with this code. It works for me. If there's a bug, it's in the code that's not shown.
Here is a version of the given code, using standardized type declarations and iostream manipulators instead of platform-specific library calls. The bit operations are identical to the example given.
#include <iostream>
#include <iomanip>
#include <stdint.h>
int main()
{
uint32_t int1 = 653545;
uint32_t int2 = 562425;
uint64_t int3;
int3 = int1;
int3 = (int3 << 32) | int2;
std::cout << std::hex << std::setw(8) << std::setfill('0')
<< int1 << " "
<< std::setw(8) << std::setfill('0')
<< int2 << "="
<< std::setw(16) << std::setfill('0')
<< int3 << std::endl;
return (0);
}
Resulting output:
0009f8e9 000894f9=0009f8e9000894f9
The bitwise operation looks correct to me. When working with bits, hexadecimal is more convenient. Any bug, if there is one, is in the code that was not shown in the question. As far as "appending bits in C++" goes, what you have in your code appears to be correct.

Try declaring buffer3 as buffer3[65]: a 64-bit value can need up to 64 binary digits plus a terminating null, so a 33-byte buffer is too small.
Edit:
Sorry, but the output is actually just what you should expect, and you can infer it from your own result for the 16-bit input.
itoa takes a plain int (32 bits here), so when you pass the 64-bit int3 it is truncated to its low 32 bits, which hold only the second integer; the upper 32 bits containing the first integer are discarded before anything is printed. In the 16-bit case the combined 32-bit value still fits in an int, so nothing is lost. To print the full 64-bit value you need a conversion that accepts a 64-bit argument (or std::bitset, as sketched below).
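A minimal sketch of that fix (assuming a C++11 compiler), using std::bitset instead of the non-standard itoa so the full 64-bit pattern is printed; note that, unlike itoa, bitset prints leading zeroes:
#include <bitset>
#include <cstdint>
#include <iostream>

int main()
{
    uint32_t int1 = 653545;
    uint32_t int2 = 562425;
    uint64_t int3 = (static_cast<uint64_t>(int1) << 32) | int2;

    // std::bitset keeps all 64 bits, so nothing is squeezed through
    // itoa's 32-bit int parameter and truncated.
    std::cout << std::bitset<32>(int1) << "|" << std::bitset<32>(int2)
              << " =\n" << std::bitset<64>(int3) << "\n";
    return 0;
}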

Showing binary representation of floating point types in C++ [closed]

Consider the following code for integral types:
#include <bitset>
#include <iostream>
#include <string>

template <class T>
std::string as_binary_string( T value ) {
    return std::bitset<sizeof( T ) * 8>( value ).to_string();
}

int main() {
    unsigned char a(2);
    char b(4);
    unsigned short c(2);
    short d(4);
    unsigned int e(2);
    int f(4);
    unsigned long long g(2);
    long long h(4);
    std::cout << "a = " << +a << " " << as_binary_string( a ) << std::endl;
    std::cout << "b = " << +b << " " << as_binary_string( b ) << std::endl;
    std::cout << "c = " << c << " " << as_binary_string( c ) << std::endl;
    std::cout << "d = " << d << " " << as_binary_string( d ) << std::endl;
    std::cout << "e = " << e << " " << as_binary_string( e ) << std::endl;
    std::cout << "f = " << f << " " << as_binary_string( f ) << std::endl;
    std::cout << "g = " << g << " " << as_binary_string( g ) << std::endl;
    std::cout << "h = " << h << " " << as_binary_string( h ) << std::endl;
    std::cout << "\nPress any key and enter to quit.\n";
    char q;
    std::cin >> q;
    return 0;
}
Pretty straightforward; it works well and is quite simple.
EDIT
How would one go about writing a function to extract the binary or bit pattern of arbitrary floating point types at compile time?
When it comes to floats, I have not found anything similar in any library I know of. I searched Google for days looking for one, and then resorted to trying to write my own function, without success. I no longer have the attempted code available from when I originally asked this question, so I cannot show all of the different attempted implementations along with their build errors. I was interested in generating the bit pattern for floats in a generic way at compile time and wanted to integrate that into my existing class, which seamlessly does the same for any integral type. As for the floating types themselves, I have taken into consideration the different formats as well as architecture endianness. For my general purposes, the standard IEEE versions of the floating point types are all that I should need to be concerned with.
iBug suggested that I write my own function when I originally asked this question, which I was already attempting to do. I understand binary numbers, memory sizes, and the mathematics, but putting it all together with how floating point types are stored in memory, with their different parts (sign bit, exponent, and mantissa), is where I was having the most trouble.
Since then, with the suggestions of those who gave a great answer and example, I was able to write a function that fits nicely into my existing class template, and it now works for my intended purposes.
What about writing one yourself?
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float is not 32 bits wide");
static_assert(sizeof(double) == sizeof(std::uint64_t), "double is not 64 bits wide");

std::string as_binary_string( float value ) {
    std::uint32_t t;
    std::memcpy(&t, &value, sizeof(value));
    return std::bitset<sizeof(float) * 8>(t).to_string();
}

std::string as_binary_string( double value ) {
    std::uint64_t t;
    std::memcpy(&t, &value, sizeof(value));
    return std::bitset<sizeof(double) * 8>(t).to_string();
}
You may need to change the helper variable t in case the sizes for the floating point numbers are different.
You can alternatively copy them bit by bit. This is slower but works for any type.
template <typename T>
std::string as_binary_string( T value )
{
const std::size_t nbytes = sizeof(T), nbits = nbytes * CHAR_BIT; // CHAR_BIT is defined in <climits>
std::bitset<nbits> b;
std::uint8_t buf[nbytes];
std::memcpy(buf, &value, nbytes);
for(std::size_t i = 0; i < nbytes; ++i)
{
std::uint8_t cur = buf[i];
int offset = i * CHAR_BIT;
for(int bit = 0; bit < CHAR_BIT; ++bit)
{
b[offset] = cur & 1;
++offset; // Move to next bit in b
cur >>= 1; // Move to next bit in array
}
}
return b.to_string();
}
You said it doesn't need to be standard. So, here is what works in clang on my computer:
#include <iostream>
#include <algorithm>
using namespace std;
int main()
{
char *result;
result=new char[33];
fill(result,result+32,'0');
result[32]='\0'; // new char[33] does not zero-initialize, so terminate explicitly
float input;
cin >>input;
asm(
"mov %0,%%eax\n"
"mov %1,%%rbx\n"
".intel_syntax\n"
"mov rcx,20h\n"
"loop_begin:\n"
"shr eax\n"
"jnc loop_end\n"
"inc byte ptr [rbx+rcx-1]\n"
"loop_end:\n"
"loop loop_begin\n"
".att_syntax\n"
:
: "m" (input), "m" (result)
: "eax", "rbx", "rcx", "cc", "memory" // registers and flags the asm clobbers
);
cout <<result <<endl;
delete[] result;
return 0;
}
This code makes a bunch of assumptions about the computer architecture and I am not sure on how many computers it would work.
EDIT:
My computer is a 64-bit MacBook Air. This program basically works by allocating a 33-byte buffer, filling the first 32 bytes with '0', and setting the 33rd byte to '\0' (new char[33] does not zero-initialize it for you).
Then it uses inline assembly to store the float into a 32-bit register and then it repeatedly shifts it to the right by one bit.
If the last bit in the register was 1 before the shift, it gets stored into the carry flag.
The assembly code then checks the carry flag and, if it contains 1, it increases the corresponding byte in the string by 1.
Since it was previously initialized to '0', it will turn to '1'.
So, effectively, when the loop in the assembly is finished, the binary representation of a float is stored into a string.
This code only works for x64 (it uses 64-bit registers "rbx" and "rcx" to store the pointer and the counter for the loop), but I think it's easy to tweak it to work on other processors.
An IEEE double-precision floating point number looks like the following:
sign    exponent    mantissa
1 bit   11 bits     52 bits
Note that there's a hidden 1 before the mantissa, and the exponent
is biased, so a stored value of 1023 means 0; it is not two's complement.
By memcpy()ing to a 64-bit unsigned integer you can then apply AND masks
and shifts to pick out the bit pattern. The arrangement could be big endian
or little endian.
You can easily work out which arrangement you have by passing easy numbers
such as 1 or 2.
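A minimal sketch of that memcpy-and-mask approach, assuming a 64-bit IEEE-754 (binary64) double:
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    double d = 2.0;

    // Reinterpret the object representation as a 64-bit unsigned integer.
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof(bits));

    // Assumes IEEE-754 binary64: 1 sign bit, 11 exponent bits, 52 mantissa bits.
    std::uint64_t sign     = bits >> 63;
    std::uint64_t exponent = (bits >> 52) & 0x7FF;        // biased by 1023
    std::uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;   // hidden leading 1 not stored

    std::cout << "sign=" << sign
              << " exponent=" << exponent
              << " (unbiased " << static_cast<long long>(exponent) - 1023 << ")"
              << " mantissa=0x" << std::hex << mantissa << "\n";
    return 0;
}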
Generally people either use std::hexfloat or cast a pointer to the floating-point value to a pointer to an unsigned integer of the same size and print the indirected value in hex format (note that the pointer cast formally breaks strict aliasing; memcpy into an integer is the safer spelling). Both methods facilitate bit-level analysis of floating-point in a productive fashion.
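For instance, a small sketch of the std::hexfloat route (available since C++11):
#include <iostream>

int main()
{
    double d = -100.0;
    // hexfloat prints the exact significand and binary exponent, e.g. -0x1.9p+6,
    // which exposes the value at the bit level without any casting tricks.
    std::cout << std::hexfloat << d << "\n";
    return 0;
}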
You could roll your own by casting the address of the float/double to a char pointer and iterating over the bytes that way:
#include <memory>
#include <iostream>
#include <limits>
#include <iomanip>
#include <string>
template <typename T>
std::string getBits(T t) {
std::string returnString{""};
char *base{reinterpret_cast<char *>(std::addressof(t))};
char *tail{base + sizeof(t) - 1};
do {
for (int bits = std::numeric_limits<unsigned char>::digits - 1; bits >= 0; bits--) {
returnString += ( ((*tail) & (1 << bits)) ? '1' : '0');
}
} while (--tail >= base);
return returnString;
}
int main() {
float f{10.0};
double d{100.0};
double nd{-100.0};
std::cout << std::setprecision(1);
std::cout << getBits(f) << std::endl;
std::cout << getBits(d) << std::endl;
std::cout << getBits(nd) << std::endl;
}
Output on my machine (note the sign flip in the third output):
01000001001000000000000000000000
0100000001011001000000000000000000000000000000000000000000000000
1100000001011001000000000000000000000000000000000000000000000000

Why can't I pack these ints together?

I have the following code. The goal is to combine the two uint32_ts into a single uint64_t and then retrieve the values.
#include <iostream>
#include <cstdint>
int main()
{
uint32_t first = 5;
uint32_t second = 6;
uint64_t combined = (first << 32) | second;
uint32_t firstR = combined >> 32;
uint32_t secondR = combined & 0xffffffff;
std::cout << "F: " << firstR << " S: " << secondR << std::endl;
}
It outputs
F: 0 S: 7
How do I successfully retrieve the values correctly?
first is a 32-bit type and you bit-shift it by 32 bits. Shifting a type by its full width (or more) is undefined behaviour; on x86, for example, the shift count is masked to the low 5 bits, so first << 32 simply yields first again, which is why combined ended up as 5 | 6 = 7 and the retrieved values came out as 0 and 7. You need to cast it to a larger type before bit-shifting it.
uint64_t combined = (static_cast<uint64_t>(first) << 32) | second;
When you perform first << 32, you are trying to shift a 32-bit value by its entire width, so none of the original bits have anywhere to go; the behaviour is undefined and you cannot rely on the result. You need to convert the first value to 64 bits before you shift it:
uint64_t combined = (uint64_t(first) << 32) | second;
As per the comments:
#include <iostream>
#include <cstdint>
int main()
{
uint32_t first = 5;
uint32_t second = 6;
uint64_t combined = (uint64_t(first) << 32) | second;
uint32_t firstR = combined >> 32;
uint32_t secondR = combined & 0xffffffff;
std::cout << "F: " << firstR << " S: " << secondR << std::endl;
}
The result type of a shift is the (promoted) type of its left operand, so shifting a uint32_t can never produce a 64-bit result. You need to cast it to uint64_t first so that it has room for the second value.

Why must I cast a `uint8_t` to `uint64_t` *before* left-shifting it?

I just want to concatenate my uint8_t array into a uint64_t. In fact, I solved my problem, but I need to understand the reason. Here is my code:
uint8_t byte_array[5];
byte_array[0] = 0x41;
byte_array[1] = 0x42;
byte_array[2] = 0x43;
byte_array[3] = 0x44;
byte_array[4] = 0x45;
cout << "index 0: " << byte_array[0] << " index 1: " << byte_array[1] << " index 2: " << byte_array[2] << " index 3: " << byte_array[3] << " index 4: " << byte_array[4] << endl;
/* This does not work */
uint64_t reverse_of_value = (byte_array[0] & 0xff) | ((byte_array[1] & 0xff) << 8) | ((byte_array[2] & 0xff) << 16) | ((byte_array[3] & 0xff) << 24) | ((byte_array[4] & 0xff) << 32);
cout << reverse_of_value << endl;
/* this works fine */
reverse_of_value = (uint64_t)(byte_array[0] & 0xff) | ((uint64_t)(byte_array[1] & 0xff) << 8) | ((uint64_t)(byte_array[2] & 0xff) << 16) | ((uint64_t)(byte_array[3] & 0xff) << 24) | ((uint64_t)(byte_array[4] & 0xff) << 32);
cout << reverse_of_value << endl;
The first output is "44434245" and the second is "4544434241" (both shown in hex); the second is what I want.
So, as we can see, the code works when I cast each byte to uint64_t; however, if I do not cast, it gives an irrelevant result. Can anybody explain the reason?
Left-shifting a uint8_t that many bits isn't necessarily going to work. The left-hand operand will be promoted to int, whose width you don't know. It could already be 64-bit, but it could be 32-bit or even 16-bit, in which case… where would the result go? There isn't enough room for it! It doesn't matter that your code later puts the result into a uint64_t: the expression is evaluated in isolation.
You've correctly fixed that in your second version, by converting to uint64_t before the left-shift takes place. In this situation, the expression will assuredly have the desired behaviour.
Here is an example showing left-shift turning the char to 0. At least it does so on my machine, gcc 4.8.4, Ubuntu 14.04 LTS, x86_64.
#include <iostream>
using std::cout;
int main()
{
unsigned char ch;
ch = 0xFF;
cout << "Char before shift: " << static_cast<int>(ch) << '\n';
ch <<= 10;
cout << "Char after shift: " << static_cast<int>(ch) << '\n';
}
Note also, as mentioned in a comment on the original question above: on some platforms the 0x45 shifted by 32 bits actually ends up in the least significant byte of the result, because the shift count is reduced modulo the operand width.
Shifting a type by more than the number of bits in the type is undefined behavior in C++. See this answer for more detail: https://stackoverflow.com/a/7401981/1689844
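A minimal sketch of the same packing written as a loop, assuming (as in the question) that byte_array[0] is the least significant byte; every byte is widened to uint64_t before it is shifted, so no shift exceeds the operand's width:
#include <cstddef>
#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t byte_array[5] = { 0x41, 0x42, 0x43, 0x44, 0x45 };

    std::uint64_t value = 0;
    for (std::size_t i = 0; i < sizeof(byte_array); ++i)
    {
        // The cast happens before the shift, so shifting by 32 bits or more
        // is well defined inside the 64-bit type.
        value |= static_cast<std::uint64_t>(byte_array[i]) << (8 * i);
    }

    std::cout << std::hex << value << "\n";   // prints 4544434241
    return 0;
}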

Flip bits using XOR 0xffffffff or ~ in C++?

If I want to flip some bits, I was wondering which way is better. Should I flip them using XOR 0xffffffff or by using ~?
I'm afraid that there will be some cases where I might need to pad bits onto the end in one of these ways and not the other, which would make the other way safer to use. I'm wondering if there are times when it's better to use one over the other.
Here is some code that uses both on the same input value, and the output values are always the same.
#include <iostream>
#include <iomanip>
void flipBits(unsigned long value)
{
const unsigned long ORIGINAL_VALUE = value;
std::cout << "Original value:" << std::setw(19) << std::hex << value << std::endl;
value ^= 0xffffffff;
std::cout << "Value after XOR:" << std::setw(18) << std::hex << value << std::endl;
value = ORIGINAL_VALUE;
value = ~value;
std::cout << "Value after bit negation: " << std::setw(8) << std::hex << value << std::endl << std::endl;
}
int main()
{
flipBits(0x12345678);
flipBits(0x11223344);
flipBits(0xabcdef12);
flipBits(15);
flipBits(0xffffffff);
flipBits(0x0);
return 0;
}
Output:
Original value: 12345678
Value after XOR: edcba987
Value after bit negation: edcba987
Original value: 11223344
Value after XOR: eeddccbb
Value after bit negation: eeddccbb
Original value: abcdef12
Value after XOR: 543210ed
Value after bit negation: 543210ed
Original value: f
Value after XOR: fffffff0
Value after bit negation: fffffff0
Original value: ffffffff
Value after XOR: 0
Value after bit negation: 0
Original value: 0
Value after XOR: ffffffff
Value after bit negation: ffffffff
Use ~:
You won't be relying on any specific width of the type; for example, int is not 32 bits on all platforms (see the sketch after this list).
It removes the risk of accidentally typing one f too few or too many.
It makes the intent clearer.
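A small sketch of the width pitfall from the first point, using a fixed 64-bit type so the difference is visible:
#include <cstdint>
#include <iostream>

int main()
{
    std::uint64_t value = 0x12345678;

    // XOR with a 32-bit mask flips only the low 32 bits ...
    std::uint64_t xored = value ^ 0xffffffff;

    // ... while ~ flips every bit the type actually has.
    std::uint64_t inverted = ~value;

    std::cout << std::hex << xored << "\n"   // edcba987
              << inverted << "\n";           // ffffffffedcba987
    return 0;
}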
As you're asking about C++ specifically, simply use std::bitset:
#include <iostream>
#include <iomanip>
#include <bitset>
#include <limits>
void flipBits(unsigned long value) {
std::bitset<std::numeric_limits<unsigned long>::digits> bits(value);
std::cout << "Original value : 0x" << std::hex << value;
value = bits.flip().to_ulong();
std::cout << ", Value after flip: 0x" << std::hex << value << std::endl;
}
As for your concern about just using the ~ operator on the unsigned long value and flipping more bits than actually wanted: since std::bitset<NumberOfBits> specifies exactly the number of bits to operate on, it handles that problem correctly.

Converting hex String to structure

I've got a file containing a large string of hexidecimal. Here's the first few lines:
0000038f
0000111d
0000111d
03030303
//Goes on for a long time
I have a large struct that is intended to hold that data:
typedef struct
{
unsigned int field1: 5;
unsigned int field2: 11;
unsigned int field3: 16;
//Goes on for a long time
}calibration;
What I want to do is read the above string and store it in the struct. I can assume the input is valid (it's verified before I get it).
I've already got a loop that reads the file and puts the whole item in a string:
std::string line = "";
std::string hexText = "";
while(std::getline(readFile, line))
{
hexText += line;
}
//Convert string into calibration
//Convert string into long int
long int hexInt = strtol(hexText.c_str(), NULL, 16);
//Here I get stuck: How to get from long int to calibration...?
How to get from long int to calibration...?
Cameron's answer is good, and probably what you want.
I offer here another (maybe not so different) approach.
Note1: Your file input needs re-work. I suggest:
a) use getline() to fetch one line at a time into a string
b) convert that one entry to a uint32_t (I would use stringstream instead of strtol);
once you learn how to detect and recover from invalid input,
you could then work on combining a) and b) into one step
c) then install the uint32_t in your structure, for which my
offering below might offer insight.
Note2: I have worked many years with bit fields, and have developed a distaste for them.
I have never found them more convenient than the alternatives.
The alternative I prefer is bit masks and field shifting.
So far as we can tell from your problem statement, it appears your problem does not need bit-fields (which Cameron's answer illustrates).
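As a rough sketch of that mask-and-shift alternative (the helper names are made up, and the field positions assume field1 occupies the least significant bits, which is how most compilers lay out these bit fields but is ultimately implementation-defined):
#include <cstdint>
#include <iostream>

// Hypothetical accessors matching the question's widths:
// bits 0..4 -> field1, bits 5..15 -> field2, bits 16..31 -> field3.
constexpr std::uint32_t get_field1(std::uint32_t w) { return  w        & 0x1Fu;   }
constexpr std::uint32_t get_field2(std::uint32_t w) { return (w >>  5) & 0x7FFu;  }
constexpr std::uint32_t get_field3(std::uint32_t w) { return (w >> 16) & 0xFFFFu; }

int main()
{
    std::uint32_t word = 0x0000038f;   // first word from the question's file
    std::cout << std::hex
              << get_field1(word) << " "    // f
              << get_field2(word) << " "    // 1c
              << get_field3(word) << "\n";  // 0
    return 0;
}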
Note3: Not all compilers will pack these bit fields for you.
The last compiler I used required what is called a "pragma".
G++ 4.8 on Ubuntu seemed to pack the bytes just fine (i.e. no pragma needed);
the sizeof(calibration) for your original code is 4, i.e. packed.
Another issue is that packing can unexpectedly change when you change options, upgrade the compiler, or change the compiler.
My team's work-around was to always have an assert against the struct size and a few byte offsets in the constructor.
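A hedged sketch of that kind of guard (using C++11 static_assert; the duplicate struct name is only for illustration):
#include <cstdint>

typedef struct
{
    std::uint32_t field1 : 5;
    std::uint32_t field2 : 11;
    std::uint32_t field3 : 16;
} calibration_check;

// Fails the build, rather than silently mis-reading data at run time,
// if the compiler ever packs the bit fields differently than expected.
static_assert(sizeof(calibration_check) == sizeof(std::uint32_t),
              "calibration bit fields are not packed into one 32-bit word");

int main() { return 0; }   // nothing to do at run time; the check is at compile time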
Note4: I did not illustrate the use of 'union' to align a uint32_t array over your calibration struct.
This may be preferred over the reinterpret cast approach. Check your requirements, team lead, professor.
Anyway, in the spirit of your original effort, consider the following additions to your struct calibration:
typedef struct
{
uint32_t field1 : 5;
uint32_t field2 : 11;
uint32_t field3 : 16;
//Goes on for a long time
// I made up these next 2 fields for illustration
uint32_t field4 : 8;
uint32_t field5 : 24;
// ... add more fields here
// something typically done by ctor or used by ctor
void clear() { field1 = 0; field2 = 0; field3 = 0; field4 = 0; field5 = 0; }
void show123(const char* lbl=0) {
if(0 == lbl) lbl = " ";
std::cout << std::setw(16) << lbl;
std::cout << " " << std::setw(5) << std::hex << field3 << std::dec
<< " " << std::setw(5) << std::hex << field2 << std::dec
<< " " << std::setw(5) << std::hex << field1 << std::dec
<< " 0x" << std::hex << std::setfill('0') << std::setw(8)
<< *(reinterpret_cast<uint32_t*>(this))
<< " => " << std::dec << std::setfill(' ')
<< *(reinterpret_cast<uint32_t*>(this))
<< std::endl;
} // show
// I did not create show456() ...
// 1st uint32_t: set new val, return previous
uint32_t set123(uint32_t nxtVal) {
uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
uint32_t prevVal = myVal[0];
myVal[0] = nxtVal;
return (prevVal);
}
// return current value of the combined field1, field2 field3
uint32_t get123(void) {
uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
return (myVal[0]);
}
// 2nd uint32_t: set new val, return previous
uint32_t set45(uint32_t nxtVal) {
uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
uint32_t prevVal = myVal[1];
myVal[1] = nxtVal;
return (prevVal);
}
// return current value of the combined field4, field5
uint32_t get45(void) {
uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
return (myVal[1]);
}
// guess that next 4 fields fill 32 bits
uint32_t get6789(void) {
uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
return (myVal[2]);
}
// ... tedious expansion
} calibration;
Here is some test code to illustrate the use:
uint32_t t125()
{
const char* lbl =
"\n 16 bits 11 bits 5 bits hex => dec";
calibration cal;
cal.clear();
std::cout << lbl << std::endl;
cal.show123();
cal.field1 = 1;
cal.show123("field1 = 1");
cal.clear();
cal.field1 = 31;
cal.show123("field1 = 31");
cal.clear();
cal.field2 = 1;
cal.show123("field2 = 1");
cal.clear();
cal.field2 = (2047 & 0x07ff);
cal.show123("field2 = 2047");
cal.clear();
cal.field3 = 1;
cal.show123("field3 = 1");
cal.clear();
cal.field3 = (65535 & 0x0ffff);
cal.show123("field3 = 65535");
cal.set123 (0xABCD6E17);
cal.show123 ("set123(0x...)");
cal.set123 (0xffffffff);
cal.show123 ("set123(0x...)");
cal.set123 (0x0);
cal.show123 ("set123(0x...)");
std::cout << "\n";
cal.clear();
std::cout << "get123(): " << cal.get123() << std::endl;
std::cout << " get45(): " << cal.get45() << std::endl;
// values from your file:
cal.set123 (0x0000038f);
cal.set45 (0x0000111d);
std::cout << "get123(): " << "0x" << std::hex << std::setfill('0')
<< std::setw(8) << cal.get123() << std::endl;
std::cout << " get45(): " << "0x" << std::hex << std::setfill('0')
<< std::setw(8) << cal.get45() << std::endl;
// cal.set6789 (0x03030303);
// std::cout << "get6789(): " << cal.get6789() << std::endl;
// ...
return(0);
}
And the test code output:
                 16 bits  11 bits  5 bits         hex => dec
                       0        0       0  0x00000000 => 0
 field1 = 1            0        0       1  0x00000001 => 1
 field1 = 31           0        0      1f  0x0000001f => 31
 field2 = 1            0        1       0  0x00000020 => 32
 field2 = 2047         0      7ff       0  0x0000ffe0 => 65,504
 field3 = 1            1        0       0  0x00010000 => 65,536
 field3 = 65535     ffff        0       0  0xffff0000 => 4,294,901,760
 set123(0x...)      abcd      370      17  0xabcd6e17 => 2,882,366,999
 set123(0x...)      ffff      7ff      1f  0xffffffff => 4,294,967,295
 set123(0x...)         0        0       0  0x00000000 => 0
get123(): 0
get45(): 0
get123(): 0x0000038f
get45(): 0x0000111d
The goal of this code is to help you see how the bit fields map into the lsbyte through msbyte of the data.
If you care at all about efficiency, don't read the whole thing into a string and then convert it. Simply read one word at a time, and convert that. Your loop should look something like:
calibration c;
uint32_t* dest = reinterpret_cast<uint32_t*>(&c);
uint32_t* const end = dest + sizeof(calibration) / sizeof(uint32_t);
while (true) {
    char hexText[8];
    // TODO: Attempt to read 8 bytes from file and then skip whitespace
    // TODO: Break out of the loop on EOF
    std::uint32_t hexValue = 0; // TODO: Convert hex to dword
    // Assumes the structure padding & packing matches the dump version's
    // Assumes the structure size is exactly a multiple of 4 bytes (w/ padding)
    static_assert(sizeof(calibration) % 4 == 0, "calibration must be a whole number of 32-bit words");
    assert(dest < end && "Too much data");
    *dest++ = hexValue;
}
assert(dest == end && "Too little data");
Converting 8 chars of hex to an actual 4-byte int is a good exercise and is well-covered elsewhere, so I've left it out (along with the file reading, which is similarly well-covered).
Note the two assumptions in the loop: the first one cannot be checked either at run-time or compile time, and must be either agreed upon in advance or extra work has to be done to properly serialize the structure (handling structure packing and padding, etc.). The last one can at least be checked at compile time with the static_assert.
Also, care has to be taken to ensure that the endianness of the hex bytes in the file matches the endianness of the architecture executing the program when converting the hex string. This will depend on whether the hex was written in a specific endianness in the first place (in which case you can convert it from the known endianness to the current architecture's endianness quite easily), or whether it's architecture-dependent (in which case you have no choice but to assume the endianness is the same as your current architecture).
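If a swap does turn out to be necessary, a hedged sketch of the conversion step might look like the following (the helper name is made up, and std::stoul stands in for the hex parsing the answer leaves as an exercise):
#include <cstdint>
#include <iostream>
#include <string>

// Parse 8 hex characters into a 32-bit word, then optionally swap the bytes
// when the on-disk byte order does not match the running architecture.
std::uint32_t parse_hex_word(const std::string& hex8, bool swap_bytes)
{
    std::uint32_t value = static_cast<std::uint32_t>(std::stoul(hex8, nullptr, 16));
    if (swap_bytes)
    {
        value = (value >> 24) | ((value >> 8) & 0x0000FF00u)
              | ((value << 8) & 0x00FF0000u) | (value << 24);
    }
    return value;
}

int main()
{
    std::cout << std::hex << parse_hex_word("0000038f", false) << "\n";  // 38f
    std::cout << std::hex << parse_hex_word("0000038f", true)  << "\n";  // 8f030000
    return 0;
}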