Safely convert 2 bytes to short


I'm making an emulator for the Intel 8080. One of the opcodes requires a 16 bit address by combining the b and c registers (both 1 byte). I have a struct with the registers adjacent to each other. The way I combine the two registers is:
using byte = char;
struct {
... code
byte b;
byte c;
... code
} state;
...somewhere in code
// memory is an array of byte with a size of 65535
memory[*reinterpret_cast<short*>(&state.b)]
I was thinking I can just OR them together, but that doesn't work.
short address = state.b | state.c
Another way I tried doing this was by creating a short, and setting the 2 bytes individually.
short address;
*reinterpret_cast<byte*>(&address) = state.b;
*(reinterpret_cast<byte*>(&address) + 1) = state.c;
Is there a better/safer way to achieve what I am trying to do?

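unsigned short j;
j = static_cast<unsigned char>(state.b); // the cast avoids sign extension when char is signed
j <<= 8;
j |= static_cast<unsigned char>(state.c);
Swap state.b and state.c if you need the opposite byte order.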

short address = ((unsigned short)state.b << 8) | (unsigned char)state.c;
That's the portable way. Your way, with reinterpret_cast, is not really that terrible, as long as you understand that it'll only work on an architecture with the matching endianness.
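A minimal sketch of the shift-and-OR approach wrapped in a function, assuming the two registers are passed as unsigned char. The name make_address is hypothetical and not part of the original code.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original posts): builds a 16-bit
// address from a high and a low byte without any pointer casts.
constexpr std::uint16_t make_address(unsigned char high, unsigned char low)
{
    // Both operands are promoted to int, so the shift cannot overflow,
    // and unsigned char inputs rule out sign extension.
    return static_cast<std::uint16_t>((high << 8) | low);
}

static_assert(make_address(0x01, 0x07) == 0x0107, "high byte lands in bits 8-15");
static_assert(make_address(0xFF, 0xFF) == 0xFFFF, "no sign extension");
```

With the struct from the question this would be used as memory[make_address(state.b, state.c)]; a plain char argument is converted to unsigned char on the way in.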

As others have mentioned there are concerns with endian-ness but you can also use a union to manipulate the memory without the need to do any shifting.
Example Code
#include <cstdint>
#include <iostream>
using byte = std::uint8_t;
struct Regs
{
    union
    {
        std::uint16_t bc;
        struct
        {
            // The order of these bytes matters
            byte c;
            byte b;
        };
    };
};
int main()
{
    Regs regs;
    regs.b = 1; // 0000 0001
    regs.c = 7; // 0000 0111
    // Read these vertically to know the value associated with each bit
    //
    //                             2 1
    //                             5 2631
    //                             6 8426 8421
    //
    // The overall binary: 0000 0001 0000 0111
    //
    // 256 + 4 + 2 + 1 = 263
    std::cout << regs.bc << "\n";
    return 0;
}
Example Output
263
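One caveat with the union: in standard C++, reading a union member other than the one last written is undefined behaviour (GCC documents it as an extension), and anonymous structs are a compiler extension too. A sketch of an alternative that keeps the two bytes separate and computes the pair on demand is below. RegPair, bc() and set_bc() are hypothetical names, and it assumes b is the high byte as in the example above.

```cpp
#include <cstdint>

// Hypothetical register pair: b and c are stored as separate bytes and the
// 16-bit value is computed on demand, so host endianness never matters.
struct RegPair
{
    std::uint8_t b = 0; // high byte of BC
    std::uint8_t c = 0; // low byte of BC

    // Combine the two bytes into the 16-bit pair.
    std::uint16_t bc() const
    {
        return static_cast<std::uint16_t>((b << 8) | c);
    }

    // Split a 16-bit value back into the two registers.
    void set_bc(std::uint16_t value)
    {
        b = static_cast<std::uint8_t>(value >> 8);
        c = static_cast<std::uint8_t>(value & 0xFF);
    }
};
```

The cost is a shift and an OR per access, which compilers typically reduce to very little.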

You can use:
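unsigned short address = static_cast<unsigned char>(state.b) * 0x100u + static_cast<unsigned char>(state.c);
Using multiplication instead of a shift avoids the issues related to shifting into the sign bit. The casts to unsigned char matter because byte is char, which may be signed: a negative state.c would otherwise be converted to a huge unsigned value and borrow one from the high byte.
The address should be unsigned; otherwise the assignment can be out of range, and you probably want addresses from 0 to 65535 rather than -32768 to 32767 anyway.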

Related

c++ combining 2 uint8_t into one uint16_t not working?

So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0, and the second uint8_t is 1, I expect the uint16_t to also be one.
However, this is in my code not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to make the bytes uint8_t pointer into a uint16_t pointer, and then take the value. I expect that value to be 1 since x86 is little endian. However it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!

There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
Does endianness also affect the order of the bits in the byte or not?
There is no way to access a bit by "address" [1], so there is no concept of endianness for bits. When converting to text, bits are conventionally shown most significant on the left and least significant on the right, just like the digits of decimal numbers. I don't know if this is true in right-to-left writing systems.
[1] You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields, i.e. whether the first bitfield is the most or least significant, is implementation defined and not necessarily related to byte endianness at all.
Here is a correct way to set the two octets of a uint16_t. The result will depend on the endianness of the system:
// requires <cstddef> (std::byte) and <cstdint> (uint16_t)
// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in the language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus the following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
data[0] = std::byte{1}; // std::byte has no implicit conversion from int
data[1] = std::byte{0};
Note that assuming that input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating through network, or accessing files that may be shared to other systems.
In these cases, the communication protocol or the file format typically specifies that the data is in a specific endianness, which may or may not be the same as the native endianness of your target system. The de facto standard in network communication is big endian. Data in a particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
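To illustrate the bit-shift conversion, here is a small sketch that decodes two octets whose order is fixed by a protocol or file format. read_u16_be and read_u16_le are hypothetical names; the result is the same on every host because no memory is reinterpreted.

```cpp
#include <cstdint>

// Hypothetical helper: the first octet is the most significant one
// (big endian, the usual network byte order).
constexpr std::uint16_t read_u16_be(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: the first octet is the least significant one
// (little endian).
constexpr std::uint16_t read_u16_le(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

For the octets 0, 1 from the question, read_u16_be gives 1 and read_u16_le gives 256, whatever the endianness of the machine running the code.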

In a little endian system the small bytes are placed first. In other words: The low byte is placed on offset 0, and the high byte on offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
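You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between arithmetic and logical shifts: right-shifting an unsigned value is a logical shift, while right-shifting a negative signed value is an arithmetic shift on mainstream compilers (it is implementation-defined in C and in C++ before C++20).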

If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since the Intel is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
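If you really want the native in-memory interpretation, a sketch using std::memcpy is shown below; unlike the pointer cast it has no alignment or aliasing problems. load_native_u16 and host_is_little_endian are hypothetical names introduced here for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies two bytes into a uint16_t. The result still
// depends on the byte order of the machine the code runs on.
inline std::uint16_t load_native_u16(const std::uint8_t* bytes)
{
    std::uint16_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Returns true when the byte at the lowest address is the least
// significant one, i.e. the host is little endian.
inline bool host_is_little_endian()
{
    const std::uint8_t bytes[2] = {1, 0};
    return load_native_u16(bytes) == 1;
}
```

On a little-endian host, loading the bytes 0, 1 yields 256, matching what the question observed.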

Your code works, but you misinterpreted how to read the bytes:
#include <cstdint>
#include <cstddef>
#include <iostream>
int main()
{
uint8_t *in = new uint8_t[2];
in[0] = 3;
in[1] = 1;
uint16_t out = *((uint16_t*)in);
std::cout << "out: " << out << "\n in: " << in[1]*256 + in[0]<< std::endl;
return 0;
}
By the way, you should take care of alignment when casting this way.

One way to think about the numbers is in terms of MSB and LSB, where MSB is the most significant (highest) bit and LSB is the least significant (lowest) bit.
For example:
(u)int32: MSB: Bit 31 ... LSB: Bit 0
(u)int16: MSB: Bit 15 ... LSB: Bit 0
(u)int8 : MSB: Bit 7  ... LSB: Bit 0
On a little-endian machine, your cast to a 16-bit value arranges the bytes like this:
16 bit                   <=   8 bit        8 bit
MSB ... LSB                   BYTE[1]      BYTE[0]
Bit 15 ... Bit 0              Bit 7 .. 0   Bit 7 .. 0
0000 0001 0000 0000           0000 0001    0000 0000
which is 256 -> the correct value.

How can i store 2 numbers in a 1 byte char?

I have the question of the title, but If not, how could I get away with using only 4 bits to represent an integer?
EDIT really my question is how. I am aware that there are 1 byte data structures in a language like c, but how could I use something like a char to store two integers?

In C or C++ you can use a struct to allocate the required number of bits to a variable as given below:
#include <stdio.h>
struct packed {
unsigned char a:4, b:4;
};
int main() {
struct packed p;
p.a = 10;
p.b = 20;
printf("p.a %d p.b %d size %zu\n", p.a, p.b, sizeof(struct packed)); /* %zu is the format for size_t */
return 0;
}
The output is p.a 10 p.b 4 size 1, showing that p takes only 1 byte to store, and that numbers with more than 4 bits (larger than 15) get truncated, so 20 (0x14) becomes 4. This is simpler to use than the manual bitshifting and masking used in the other answer, but it is probably not any faster.

You can store two 4-bit numbers in one byte (call it b which is an unsigned char).
Using hex is easy to see that: in b=0xAE the two numbers are A and E.
Use a mask to isolate them:
a = (b & 0xF0) >> 4
and
e = b & 0x0F
You can easily define functions to set/get both numbers in the proper portion of the byte.
Note: if the 4-bit numbers need to have a sign, things can become a tad more complicated since the sign must be extended correctly when packing/unpacking.
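The set/get functions mentioned above could look roughly like this for unsigned 4-bit values. pack_nibbles, high_nibble and low_nibble are hypothetical names, and signed nibbles are deliberately left out.

```cpp
#include <cstdint>

// Hypothetical helper: stores two unsigned 4-bit values in one byte.
constexpr std::uint8_t pack_nibbles(unsigned high, unsigned low)
{
    // Mask each input to 4 bits so an out-of-range value cannot spill
    // into the other half of the byte.
    return static_cast<std::uint8_t>(((high & 0x0F) << 4) | (low & 0x0F));
}

// Extract the upper 4 bits.
constexpr unsigned high_nibble(std::uint8_t b) { return (b & 0xF0) >> 4; }

// Extract the lower 4 bits.
constexpr unsigned low_nibble(std::uint8_t b) { return b & 0x0F; }

static_assert(pack_nibbles(0xA, 0xE) == 0xAE, "matches the b=0xAE example");
```

As with the bitfield version, a value above 15 is truncated: pack_nibbles(20, 0) stores 4 in the high half.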

How to set specific bits?

Let's say I've got a uint16_t variable where I must set specific bits.
Example:
uint16_t field = 0;
That would mean the bits are all zero: 0000 0000 0000 0000
Now I get some values that I need to set at specific positions.
val1=1; val2=2, val3=0, val4=4, val5=0;
The structure how to set the bits is the following
0|000| 0000| 0000 000|0
val1 should be set at the first bit on the left, so it's only one or zero.
val2 should be set at the next three bits, val3 on the next four bits, val4 on the next seven bits and val5 on the last bit.
The result would be this:
1010 0000 0000 1000
I only found out how to set one specific bit, but not 'groups' (shift or bitset).
Does anyone have an idea how to solve this issue?

There are (at least) two basic approaches. One would be to create a struct with some bitfields:
struct bits {
    unsigned a : 1; // val1
    unsigned b : 3; // val2
    unsigned c : 4; // val3
    unsigned d : 7; // val4
    unsigned e : 1; // val5
};
bits b;
b.a = val1;
b.b = val2;
b.c = val3;
b.d = val4;
b.e = val5;
To get the 16-bit value, you could (for one example) create a union of that struct with a uint16_t. Just one minor problem: the standard doesn't guarantee what order the bit fields will end up in when you look at the 16-bit value. Just for example, you might need to reverse the order I've given above to get the order from most to least significant bits that you really want (but changing compilers might muck things up again).
The other obvious possibility would be to use shifting and masking to put the pieces together into a number:
uint16_t result = (val1 << 15) | (val2 << 12) | (val3 << 8) | (val4 << 1) | val5;
For the moment, I've assumed each of the inputs starts out in the correct range (i.e., has a value that can be represented in the chosen number of bits). If there's a possibility that could be wrong, you'd want to mask it to the correct number of bits first. The usual way to do that is something like:
uint16_t result = input & ((1 << num_bits) - 1);
In case you're curious about the math there, it works like this. Let's assume we want to ensure an input fits in 4 bits. Shifting 1 left 4 bits produces 00010000 (in binary). Subtracting one from that then clears the one bit that's set, and sets all the less significant bits, giving 00001111 for our example. That gives us the four least significant bits set. When we do a bit-wise AND between that and the input, any higher bits that were set in the input are cleared in the result.
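The masking step can be sketched as a tiny function. keep_low_bits is a hypothetical name, and the sketch assumes num_bits is smaller than the width of unsigned int so that the shift is well defined.

```cpp
#include <cstdint>

// Hypothetical helper: keeps only the num_bits least significant bits of
// input. Assumes num_bits is smaller than the width of unsigned int.
constexpr std::uint16_t keep_low_bits(std::uint16_t input, unsigned num_bits)
{
    // (1u << num_bits) - 1u has exactly num_bits low bits set.
    return static_cast<std::uint16_t>(input & ((1u << num_bits) - 1u));
}

// 20 is 0001 0100; AND with 0000 1111 leaves 0000 0100.
static_assert(keep_low_bits(20, 4) == 4, "upper bits are cleared");
```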

One of the solutions would be to set a K-bit value starting at the N-th bit of field as:
uint16_t value_mask = ((1<<K)-1) << N; // for K=4 and N=3 will be 00..01111000
field = field & ~value_mask; // zeroing according bits inside the field
field = field | ((value << N) & value_mask); // AND with value_mask is for extra safety
Or, if you can use a struct instead of uint16_t, you can use bit fields and let the compiler perform all these actions for you.
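Wrapped into a function, the three lines above might look like the sketch below. set_field is a hypothetical name, bit 0 is taken to be the least significant bit, and N + K is assumed to be at most 16.

```cpp
#include <cstdint>

// Hypothetical helper: writes the K-bit value into field starting at bit N
// (bit 0 is the least significant bit). Assumes N + K <= 16.
constexpr std::uint16_t set_field(std::uint16_t field, unsigned value,
                                  unsigned N, unsigned K)
{
    return static_cast<std::uint16_t>(
        (field & ~(((1u << K) - 1u) << N))            // clear the target bits
        | ((value << N) & (((1u << K) - 1u) << N)));  // insert the new bits
}
```

Applied to the values from the question, starting from field = 0: set_field(field, 1, 15, 1), then (2, 12, 3), (0, 8, 4), (4, 1, 7) and (0, 0, 1) produce 1010 0000 0000 1000, i.e. 0xA008.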

finalvle = 0;
finalvle = (val1 & 0x01) << 15;
finalvle += (val2 & 0x07) << 12;
finalvle += (val3 & 0x0f) << 8;
finalvle += (val4 & 0x7f) << 1; // val4 is 7 bits wide
finalvle += (val5 & 0x01);

You can use the bitwise or and shift operators to achieve this.
Use shift << to 'move bits to the left':
int i = 1; // ...0001
int j = i << 3; // ...1000
You can then use bitwise or | to put it at the right place (assuming you have all zeros at the bits you are trying to overwrite).
int k = 0; // ...0000
k |= i; // ...0001
k |= j; // ...1001
Edit: Note that @Inspired's answer also explains how to zero out a certain range of bits. Overall it explains how you would go about implementing it properly.

Try this code:
uint16_t shift(uint16_t num, int shift)
{
    // an integer shift builds the single-bit mask without going through floating-point pow()
    return num | (1u << shift);
}
where shift is the position of the bit that you want to set.

convert 4 bytes to 3 bytes in C++

I have a requirement, where 3 bytes (24 bits) need to be populated in a binary protocol. The original value is stored in an int (32 bits). One way to achieve this would be as follows:-
Technique1:-
long x = 24;
long y = htonl(x);
long z = y>>8;
memcpy(dest, z, 3);
Please let me know if above is the correct way to do it?
The other way, which i dont understand was implemented as below
Technique2:-
typedef struct {
char data1;
char data2[3];
} some_data;
typedef union {
long original_data;
some_data data;
} combined_data;
long x = 24;
combined_data somedata;
somedata.original_data = htonl(x);
memcpy(dest, somedata.data.data2, 3);
What I don't understand is how the 3 bytes ended up in somedata.data.data2. Shouldn't the first byte go into somedata.data.data1 and the next 2 bytes into somedata.data.data2?
This is x86_64 platform running 2.6.x linux and gcc.
PARTIALLY SOLVED:-
On x86_64 platform, memory is addressed from right to left. So a variable of type long with value 24, will have following memory representation
|--Byte4--|--Byte3--|--Byte2--|--Byte1--|
0 0 0 0x18
With htonl() performed on above long type, the memory becomes
|--Byte4--|--Byte3--|--Byte2--|--Byte1--|
0x18 0 0 0
In the struct some_data, the
data1 = Byte1
data2[0] = Byte2
data2[1] = Byte3
data2[2] = Byte4
But my Question still holds, Why not simply right shift by 8 as shown in technique 1 ?

A byte takes 8 bits :-)
int x = 24;
int y = x<<8;
Shifting by 0 changes nothing. Shifting by 1 multiplies by 2, by 2 multiplies by 4, and by 8 multiplies by 256.
If we are on a BIG ENDIAN machine, the 4 bytes are laid out in memory as: 2143. Such algorithms won't work for numbers greater than 2^15. On the other hand, on a BIG ENDIAN machine you should define what "putting an integer in 3 bytes" means.
Hmm. I think the second proposed algorithm will be OK, but change the order of the bytes:
You have them as 2143. You need 321, I think. But better check it.
Edit: I checked on the wiki: x86 is little endian, they say, so the algorithms are OK.
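A way to sidestep the host byte order entirely is to write the three bytes with shifts. The sketch below assumes the protocol wants the low 24 bits in big-endian (network) order, which the use of htonl in the question suggests but does not confirm; put_u24_be is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: stores the low 24 bits of value into dest[0..2],
// most significant byte first. Assumes the protocol is big endian.
inline void put_u24_be(unsigned char* dest, std::uint32_t value)
{
    dest[0] = static_cast<unsigned char>((value >> 16) & 0xFF); // bits 16-23
    dest[1] = static_cast<unsigned char>((value >> 8) & 0xFF);  // bits 8-15
    dest[2] = static_cast<unsigned char>(value & 0xFF);         // bits 0-7
}
```

For the value 24 this writes the bytes 0x00, 0x00, 0x18, and it does so on little-endian and big-endian hosts alike, with no htonl, union or memcpy involved.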

C++ How to combine two signed 8 Bit numbers to a 16 Bit short? Unexplainable results

I need to combine two signed 8 Bit _int8 values to a signed short (16 Bit) value. It is important that the sign is not lost.
My code is:
unsigned short lsb = -13;
unsigned short msb = 1;
short combined = (msb << 8 )| lsb;
The result I get is -13. However, I expect it to be 499.
For the following examples, I get the correct results with the same code:
msb = -1; lsb = -6; combined = -6;
msb = 1; lsb = 89; combined = 345;
msb = -1; lsb = 13; combined = -243;
However, msb = 1; lsb = -84; combined = -84; where I would expect 428.
It seems that if the lsb is negative and the msb is positive, something goes wrong!
What is wrong with my code? How does the computer get to these unexpected results (Win7, 64 Bit and VS2008 C++)?

Your lsb in this case contains 0xfff3. When you OR it with 1 << 8 nothing changes because there is already a 1 in that bit position.
Try short combined = (msb << 8 ) | (lsb & 0xff);
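The same idea as a function taking the two signed 8-bit inputs directly. combine_signed is a hypothetical name; the final conversion back to a signed 16-bit value wraps modulo 2^16, which is guaranteed since C++20 and is what mainstream compilers did before that.

```cpp
#include <cstdint>

// Hypothetical helper: combines two signed 8-bit values into one signed
// 16-bit value, treating both inputs as raw 8-bit patterns.
inline std::int16_t combine_signed(std::int8_t msb, std::int8_t lsb)
{
    // Converting to uint8_t first keeps only the 8 payload bits, so a
    // negative lsb cannot smear 1-bits over the msb.
    const std::uint16_t bits = static_cast<std::uint16_t>(
        (static_cast<std::uint8_t>(msb) << 8) | static_cast<std::uint8_t>(lsb));
    // Reinterpret the 16-bit pattern as a signed value.
    return static_cast<std::int16_t>(bits);
}
```

With msb = 1 and lsb = -13 the pattern is 0x01F3, which is the 499 the question expected.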

Or using a union:
#include <iostream>
union Combine
{
short target;
char dest[ sizeof( short ) ];
};
int main()
{
Combine cc;
cc.dest[0] = -13, cc.dest[1] = 1;
std::cout << cc.target << std::endl;
}

It is possible that lsb is being automatically sign-extended to 16 bits. I notice you only have a problem when it is negative and msb is positive, and that is what you would expect to happen given the way you're using the or operator. Although, you're clearly doing something very strange here. What are you actually trying to do here?

The Raisonance C compiler for STM8 (and, possibly, many other compilers) generates ugly code for classic C code when writing 16-bit variables into 8-bit hardware registers.
Note - STM8 is big-endian, for little-endian CPUs code must be slightly modified. Read/Write byte order is important too.
So, standard C code piece:
unsigned int ch1Sum;
...
TIM5_CCR1H = ch1Sum >> 8;
TIM5_CCR1L = ch1Sum;
Is being compiled to:
;TIM5_CCR1H = ch1Sum >> 8;
LDW X,ch1Sum
CLR A
RRWA X,A
LD A,XL
LD TIM5_CCR1,A
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Too long, too slow.
My version:
unsigned int ch1Sum;
...
TIM5_CCR1H = ((u8*)&ch1Sum)[0];
TIM5_CCR1L = ch1Sum;
That is compiled into adequate two MOVes
;TIM5_CCR1H = ((u8*)&ch1Sum)[0];
MOV TIM5_CCR1,ch1Sum
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Opposite direction:
unsigned int uSonicRange;
...
((unsigned char *)&uSonicRange)[0] = TIM1_CCR2H;
((unsigned char *)&uSonicRange)[1] = TIM1_CCR2L;
instead of
unsigned int uSonicRange;
...
uSonicRange = TIM1_CCR2H << 8;
uSonicRange |= TIM1_CCR2L;

Some things you should know about the datatypes (un)signed short and char:
char is an 8-bit value on practically every platform; that's what you were looking for for lsb and msb. short is 16 bits in length.
You should also not store signed values in unsigned variables unless you know what you are doing.
You can take a look at the two's complement. It describes the representation of negative values (for integers, not for floating-point values) in C/C++ and many other programming languages.
There are multiple versions of making your own two's complement:
int a;
// setting a
a = -a; // Clean version. Easier to understand and read. Use this one.
a = (~a)+1; // The arithmetical version. Does the same, but takes more steps.
// Don't use the last one unless you need it!
// It can be 'optimized away' by the compiler.
stdint.h (with inttypes.h) exists to give your variables exact widths. If you really need a variable to have a specific width, you should use it (here you need it).
You should always use the datatypes that fit your needs best. Your code should therefore look like this:
signed char lsb; // signed 8-bit value
signed char msb; // signed 8-bit value
signed short combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
or like this:
#include <stdint.h>
int8_t lsb; // signed 8-bit value
int8_t msb; // signed 8-bit value
int16_t combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
For the last one the compiler will use signed 8/16-bit values every time, regardless of what length int has on your platform. Wikipedia has a nice explanation of the int8_t and int16_t datatypes (and all the other datatypes).
btw: cppreference.com is useful for looking up the ANSI C standards and other things that are worth knowing about C/C++.

You wrote that you need to combine two 8-bit values. Why are you using unsigned short then?
As Dan already said, lsb is automatically extended to 16 bits. Try the following code:
uint8_t lsb = -13;
uint8_t msb = 1;
int16_t combined = (msb << 8) | lsb;
This gives you the expected result: 499.

If this is what you want:
msb: 1, lsb: -13, combined: 499
msb: -6, lsb: -1, combined: -1281
msb: 1, lsb: 89, combined: 345
msb: -1, lsb: 13, combined: -243
msb: 1, lsb: -84, combined: 428
Use this:
short combine(unsigned char msb, unsigned char lsb) {
return (msb<<8u)|lsb;
}
I don't understand why you would want msb -6 and lsb -1 to generate -6 though.
