How do you concatenate 4 UINT8 variables into one UINT32 variable? - c++

I have the following UINT8 variables:
UINT8 var1 = 0b00000001; //0000 0001
UINT8 var2 = 0b00000011; //0000 0011
UINT8 var3 = 0b00000111; //0000 0111
UINT8 var4 = 0b00001111; //0000 1111
I would like to pack these four UINT8 variables into one UINT32 variable with the following value:
UINT32 var1 = 0b00000001000000110000011100001111; //00000001 00000011 00000111 00001111
Would the following code do it correctly and safely?
UINT32 var1 = (var1<<24) + (var2<<16) + (var3<<8) + var4;

Short answer, yes.
I'm not going to worry about how you wrote your binary numbers. I will enter them in hex and point you to this related SO question for binary literals: Can I use a binary literal in C or C++?
#include "stdafx.h" // you are using devstudio
#include <Windows.h> // you are using windows types
#include <iostream> // I print out the result
#include <bitset> // I use bitset to print the binary string
int main()
{
    UINT8 var1 = 0x01; //0000 0001
    UINT8 var2 = 0x03; //0000 0011
    UINT8 var3 = 0x07; //0000 0111
    UINT8 var4 = 0x0F; //0000 1111
    UINT32 bigvar = (var1 << 24) + (var2 << 16) + (var3 << 8) + var4;
    std::cout << std::bitset<32>(bigvar) << std::endl;
}
Your math is correct and safe. The bytes are independently declared, so you don't have to worry about byte order. The types are all unsigned, so no UB issues with the sign bit. The shifts all fit in the correct bit count, so no overflow. I generated:
00000001000000110000011100001111
Alternatively, you could have read in a 32 bit integer as 4 bytes, and reconstructed the 32 bit number, but that would not be portable, because sometimes the numbers are stored in reverse order. For example, in TIFF, you read in a header value which tells you whether you would put var1 first and count up, or var4 first and count down. Byte order is something you have to watch out for in almost all practical applications of combining a bunch of bytes into a larger integer type. Look up big-endian and little-endian for more info.
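To make the byte-order caveat concrete, here is a minimal sketch (not from the original answer; the buffer contents are just the four example bytes) that assembles the same four bytes read from a buffer in both orders:
#include <cstdint>
#include <iostream>

int main()
{
    // Hypothetical 4-byte buffer, e.g. bytes read from a file header.
    unsigned char buf[4] = { 0x01, 0x03, 0x07, 0x0F };

    // Treat the buffer as big-endian: the first byte is the most significant.
    std::uint32_t be = (std::uint32_t(buf[0]) << 24) | (std::uint32_t(buf[1]) << 16)
                     | (std::uint32_t(buf[2]) << 8)  |  std::uint32_t(buf[3]);

    // Treat the same buffer as little-endian: the first byte is the least significant.
    std::uint32_t le = (std::uint32_t(buf[3]) << 24) | (std::uint32_t(buf[2]) << 16)
                     | (std::uint32_t(buf[1]) << 8)  |  std::uint32_t(buf[0]);

    std::cout << std::hex << be << " " << le << "\n"; // prints 103070f f070301
}
Same bytes, two different 32-bit values; a format like TIFF tells you in its header which interpretation to use.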

Related

Expanding packed nibbles to 5-bit groups

I currently have an unsigned int of 64 bits that contains:
0100
0100
0100
0100
0000...
And I would like to change it to:
01000
01000
01000
01000
00000...
Is there a way to do that?
Thanks
📎 Hi! It looks like you are trying to expand 4-bit nibbles into 5-bit groups.
In general, you can do it like this:
uint64_t value = YOUR_DATA; // this has your bits
for (int i = 0; i < (int)(sizeof(value) * 2); i++) {
    uint8_t nibble = value & 0xF; // grab the lowest 4 bits
    nibble <<= 1;                 // shift left 1 bit, appending a 0
    STORE(nibble, i);
    value >>= 4;                  // advance to the next nibble
}
This will call STORE once for each group of 4 bits. The arguments to STORE are the "expanded" 5 bit value, and the nibble counter, where 0 represents the least significant 4 bits.
The design question to answer is how to store the result: 64 bits / 4 * 5 = 80 bits, so you either need 2 words, or you have to throw away data at one end.
Assuming 2 words with the anchor at the LSB, STORE could look like
static uint64_t result[2] = {0, 0};

void STORE(uint8_t value, int n) {
    int idx = (n > 12);                                    // which result word?
    result[idx] |= (uint64_t)value << (n * 5 - idx * 64);  // cast before shifting past bit 31
    if (n == 12) result[1] |= value >> 4;                  // the 65th bit goes into the 2nd result word
}
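Putting the loop and STORE together, here is a self-contained sketch; the sample input value and the final print are illustrative assumptions, not part of the original answer:
#include <cstdint>
#include <iostream>
#include <bitset>

static std::uint64_t result[2] = {0, 0};

// Place an expanded 5-bit group at position n (0 = least significant group).
void STORE(std::uint8_t value, int n)
{
    int idx = (n > 12);                                        // which 64-bit result word?
    result[idx] |= (std::uint64_t)value << (n * 5 - idx * 64);
    if (n == 12) result[1] |= value >> 4;                      // the 65th bit spills into word 1
}

int main()
{
    // Hypothetical input: nibbles 0100 0100 0100 0100 followed by zeros.
    std::uint64_t value = 0x4444000000000000ull;

    for (int i = 0; i < (int)(sizeof(value) * 2); i++) {
        std::uint8_t nibble = value & 0xF;
        STORE((std::uint8_t)(nibble << 1), i); // append a 0 to form the 5-bit group
        value >>= 4;
    }

    // Print the 80-bit result, most significant group first.
    std::cout << std::bitset<16>(result[1]) << std::bitset<64>(result[0]) << "\n";
}
The output starts with 01000 01000 01000 01000 00000..., i.e. each 0100 nibble expanded to a 5-bit group.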
A terser observation: the leading 0 serves no purpose, so expanding each group amounts to nothing more than a left shift by one bit.

Using bit fields in struct and reading an instance from Memory with C++

unsigned __int16 var16bit= 6545; //00011001 10010001
unsigned __int8 var8bit;
memcpy(&var8bit, &var16bit, sizeof(var8bit));
cout << var8bit; // result is 145 which is 10010001
On a little-endian machine the least significant byte is written to memory first, with higher-order bytes at increasing addresses. In this example a 16-bit integer is copied into an 8-bit integer, and it loses its most significant byte because the target is not big enough. The reverse happens on big-endian machines.
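To see that byte order directly, here is a small sketch (not part of the original question) that dumps the two bytes of the same 16-bit value as they sit in memory:
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    std::uint16_t var16bit = 6545; // 0x1991 = 0001 1001 1001 0001
    unsigned char bytes[2];
    std::memcpy(bytes, &var16bit, sizeof(bytes));

    // On a little-endian machine this prints "91 19" (low byte first);
    // on a big-endian machine it would print "19 91".
    std::printf("%02x %02x\n", bytes[0], bytes[1]);
}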
There is another example below
Here is my structure:
struct Account
{
    unsigned int dollar : 4;
    unsigned int euro : 4;
    unsigned int pound : 4;
    unsigned int ruble : 4;
};

int _tmain(int argc, _TCHAR* argv[])
{
    Account myWealthyAccount = {};

    myWealthyAccount.dollar = 2; // 0010
    myWealthyAccount.euro = 3;   // 0011
    myWealthyAccount.pound = 4;  // 0100
    myWealthyAccount.ruble = 5;  // 0101

    unsigned __int16 sum;
    memcpy(&sum, &myWealthyAccount, sizeof(myWealthyAccount));
    cout << sum;
    // result is 0101010000110010 ruble pound euro dollar

    unsigned __int8 sum8Bit;
    memcpy(&sum8Bit, &myWealthyAccount, sizeof(sum8Bit));
    cout << sum8Bit; // result is 50 which is 0011 0010 euro dollar

    return 0;
}
0101010000110010 ruble pound euro dollar
Why was it ordered as ruble pound euro dollar and not the opposite way?
Why is ruble beside the MSB and dollar beside the LSB?
Is this about the machine being little endian or big endian? Or is this about the struct? Is this about the compiler?
If I run this example on a big-endian machine, what result would occur? Why?
I'm open to any advice, such as articles, videos, or websites about the topic.
Thanks
"Why was it ordered as ruble pound euro dollar why not the opposite way?"
I copied your code and got the exact same result.
In your first example the 16-bit integer's value is 00011001 10010001; on your little-endian machine the low byte is stored first, so taking only one byte gives you 10010001. Your bit field, however, is stored as 0101 0100 0011 0010.
If you look on http://en.cppreference.com/w/cpp/language/bit_field under notes:
"Also, on some platforms, bit fields are packed left-to-right, on others right-to-left". So it is implementation defined which order you have.

Safely convert 2 bytes to short

I'm making an emulator for the Intel 8080. One of the opcodes requires a 16 bit address by combining the b and c registers (both 1 byte). I have a struct with the registers adjacent to each other. The way I combine the two registers is:
using byte = char;
struct {
    ... code
    byte b;
    byte c;
    ... code
} state;
...somewhere in code
// memory is an array of byte with a size of 65535
memory[*reinterpret_cast<short*>(&state.b)]
I was thinking I could just OR them together, but that doesn't work:
short address = state.b | state.c
Another way I tried doing this was by creating a short, and setting the 2 bytes individually.
short address;
*reinterpret_cast<byte*>(&address) = state.b;
*(reinterpret_cast<byte*>(&address) + 1) = state.c;
Is there a better/safer way to achieve what I am trying to do?
short j;
j = state.b;
j <<= 8;
j |= state.c;
Reverse the state.b and state.c if you need the opposite endianness.
short address = ((unsigned short)state.b << 8) | (unsigned char)state.c;
That's the portable way. Your way, with reinterpret_cast, is not really that terrible, as long as you understand that it will only work on an architecture with the correct endianness.
As others have mentioned, there are concerns with endianness, but you can also use a union to manipulate the memory without the need to do any shifting.
Example Code
#include <cstdint>
#include <iostream>

using byte = std::uint8_t;

struct Regs
{
    union
    {
        std::uint16_t bc;
        struct
        {
            // The order of these bytes matters
            byte c;
            byte b;
        };
    };
};

int main()
{
    Regs regs;
    regs.b = 1; // 0000 0001
    regs.c = 7; // 0000 0111

    // Read these vertically to know the value associated with each bit
    //
    //            2 1
    //            5 2631
    //            6 8426 8421
    //
    //    0000 0001 0000 0111   <- the overall binary
    //
    // 256 + 4 + 2 + 1 = 263
    std::cout << regs.bc << "\n";

    return 0;
}
Example Output
263
You can use:
unsigned short address = state.b * 0x100u + state.c;
Using multiplication instead of shift avoids all the issues relating to shifting the sign bit etc.
The address should be unsigned otherwise you will cause out-of-range assignment, and probably you want to use 0 to 65535 as your address range anyway, instead of -32768 to 32767.
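For completeness, here is a minimal, self-contained sketch along those lines; the helper names make_bc and split_bc are hypothetical, and it uses unsigned fixed-width types rather than the question's char-based byte alias:
#include <cstdint>
#include <cassert>

// Combine the B and C registers into a 16-bit address (B is the high byte).
std::uint16_t make_bc(std::uint8_t b, std::uint8_t c)
{
    return static_cast<std::uint16_t>(b * 0x100u + c);
}

// Split a 16-bit address back into the B and C registers.
void split_bc(std::uint16_t bc, std::uint8_t& b, std::uint8_t& c)
{
    b = static_cast<std::uint8_t>(bc >> 8);
    c = static_cast<std::uint8_t>(bc & 0xFF);
}

int main()
{
    std::uint16_t address = make_bc(0x12, 0x34);
    assert(address == 0x1234);

    std::uint8_t b = 0, c = 0;
    split_bc(address, b, c);
    assert(b == 0x12 && c == 0x34);
}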

Writing on MSB and on LSB of an unsigned Char

I have an unsigned char and I want to write 0x06 on its four most significant bits and 0x04 on its four least significant bits.
So the char representation should be 0110 0100.
Can someone guide me on how I can do this in C?
c = (0x06 << 4) | 0x04;
Because:
0x04    = 0000 0100
0x06    = 0000 0110
0x06<<4 = 0110 0000
or op:  = 0110 0100
Shift values into the right position with the bitwise shift operators, and combine with bitwise or.
unsigned char c = (0x6 << 4) | 0x4;
To reverse the process and extract bitfields, you can use bitwise and with a mask containing just the bits you're interested in:
unsigned char lo4 = c & 0xf;
unsigned char hi4 = c >> 4;
First, ensure there are eight bits per unsigned char:
#include <limits.h>
#if CHAR_BIT != 8
#error "This code does not support character sizes other than 8 bits."
#endif
Now, suppose you already have an unsigned char defined with:
unsigned char x;
Then, if you want to completely set an unsigned char to have 6 in the high four bits and 4 in the low four bits, use:
x = 0x64;
If you want to set the high bits to a and the low bits to b, then use:
// Shift a to high four bits and combine with b.
x = a << 4 | b;
If you want to set the high bits to a and leave the low bits unchanged, use:
// Shift a to high four bits, extract low four bits of x, and combine.
x = a << 4 | x & 0xf;
If you want to set the low bits to b and leave the high bits unchanged, use:
// Extract high four bits of x and combine with b.
x = x & 0xf0 | b;
The above presumes that a and b contain only four-bit values. If they might have other bits set, use (a & 0xf) and (b & 0xf) in place of a and b above, respectively.
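A small sketch tying those formulas together (the concrete values 0x6, 0x4, 0x9 and 0x2 are just illustrative):
#include <cstdio>

int main()
{
    unsigned char a = 0x6, b = 0x4;
    unsigned char x = 0;

    x = a << 4 | b;           // set both halves at once: x == 0x64
    std::printf("%02X\n", x);

    x = 0x9 << 4 | (x & 0xf); // replace only the high nibble: x == 0x94
    std::printf("%02X\n", x);

    x = (x & 0xf0) | 0x2;     // replace only the low nibble: x == 0x92
    std::printf("%02X\n", x);
}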

C++ copying integer to char[] or unsigned char[] error

So I'm using the following code to put an integer into a char[] or an unsigned char[]
(unsigned???) char test[12];
test[0] = (i >> 24) & 0xFF;
test[1] = (i >> 16) & 0xFF;
test[2] = (i >> 8) & 0xFF;
test[3] = (i >> 0) & 0xFF;
int j = test[3] + (test[2] << 8) + (test[1] << 16) + (test[0] << 24);
printf("Its value is...... %d", j);
When I use type unsigned char and value 1000000000 it prints correctly.
When I use type char (same value) I get 983157248 printed instead.
So, the question really is can anyone explain what the hell is going on??
Upon examining the binary for the two different numbers I still can't work out what's going on. I thought signed was when the MSB was set to 1 to indicate a negative value (but a negative char? wth?)
I'm explicitly telling the buffer what to insert into it, and how to interpret the contents, so don't see why this could be happening.
I have included binary/hex below for clarity in what I examined.
0011 1010 1001 1001 1100 1010 0000 0000 // Binary for 983157248
0011 1011 1001 1010 1100 1010 0000 0000 // Binary for 1000000000
0x3A99CA00 // Hex for 983157248
0x3B9ACA00 // Hex for 1000000000
In addition to the answer by Kerrek SB please consider the following:
Computers (almost always) use something called twos-complement notation for negative numbers, with the high bit functioning as a 'negative' indicator. Ask yourself what happens when you perform shifts on a signed type considering that the computer will handle the signed bit specially.
You may want to read Why does left shift operation invoke Undefined Behaviour when the left side operand has negative value? right here on StackOverflow for a hint.
When you say i & 0xFF etc., you're creating values in the range [0, 256). But (your) char has a range of [-128, +128), and so you cannot actually store those values sensibly (i.e. the behaviour is implementation defined and tedious to reason about).
Use unsigned char for unsigned values. The clue is in the name.
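A minimal sketch of that advice (not part of either answer), reconstructing the same value with unsigned char:
#include <cstdio>

int main()
{
    int i = 1000000000; // 0x3B9ACA00

    unsigned char test[4];
    test[0] = (i >> 24) & 0xFF;
    test[1] = (i >> 16) & 0xFF;
    test[2] = (i >> 8) & 0xFF;
    test[3] = (i >> 0) & 0xFF;

    // With unsigned char the bytes reassemble exactly.
    int j = test[3] | (test[2] << 8) | (test[1] << 16) | (test[0] << 24);
    std::printf("%d\n", j); // prints 1000000000

    // With plain (signed) char you would have to mask off the sign-extended
    // bits before shifting, e.g. (test[2] & 0xFF) << 8, to get the same result.
}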
This all has to do with internal representation and the way each type interprets that data. In the internal representation of a signed character, the first bit of the byte holds the sign and the remaining bits hold the value. When the first bit is 1, the number is negative, and the bit pattern is the two's complement of its positive value. For example:
unsigned char c; // whose internal representation we will set to 1100 1011
c = (1 << 7) + (1 << 6) + (1 << 3) + (1 << 1) + (1 << 0); // 128 + 64 + 8 + 2 + 1
cout << +c; // will print 203

// inversely:
char d = c; // not unsigned
cout << +d; // will print -53
// because the first bit is 1, d is negative, and its magnitude is
// the two's complement of the bit pattern:
// 1100 1011 -> invert and add 1 -> 0011 0100 + 1 = 0011 0101 = 53, so d is -53

// furthermore:
char e; // whose internal representation we will set to 0011 0101
e = (1 << 5) + (1 << 4) + (1 << 2) + (1 << 0); // 32 + 16 + 4 + 1
cout << +e; // will print 53