c++ combining 2 uint8_t into one uint16_t not working?


So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0, and the second uint8_t is 1, I expect the uint16_t to also be one.
However, this is not the case in my code.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to make the bytes uint8_t pointer into a uint16_t pointer, and then take the value. I expect that value to be 1 since x86 is little endian. However it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!

There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
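A minimal sketch of a well-defined alternative to the pointer cast, assuming the goal really is to reinterpret two bytes in the machine's native byte order. The name load_native_u16 is made up for illustration; std::memcpy copies the bytes into a real uint16_t object, so neither aliasing nor alignment is a problem.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret two bytes as a uint16_t in native byte order.
// std::memcpy copies the bytes into an actual uint16_t object, which avoids
// the undefined behaviour of dereferencing a casted uint8_t pointer.
std::uint16_t load_native_u16(const std::uint8_t* bytes) {
    std::uint16_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

On a little-endian machine, passing the bytes {0, 1} yields 256, matching the result in the question; on a big-endian machine it would yield 1.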
Does endianness also affect the order of the bits in the byte or not?
There is no way to access a bit by "address"[1], so there is no concept of endianness. When converting to text, bits are conventionally shown most significant on the left and least significant on the right, just like the digits of decimal numbers. I don't know if this is true in right-to-left writing systems.
[1] You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields, i.e. whether the first bitfield is most or least significant, is implementation-defined and not necessarily related to byte endianness at all.
Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:
// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out); // needs <cstddef>
data[0] = std::byte{1}; // std::byte has no implicit conversion from int
data[1] = std::byte{0};
Note that assuming that input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating through network, or accessing files that may be shared to other systems.
In these cases, the communication protocol, or the file format typically specify that the data is in specific endianness which may or may not be the same as the native endianness of your target system. De facto standard in network communication is to use big endian. Data in particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
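As a rough sketch of the shift-based conversion mentioned above, assuming the input is a buffer of at least two bytes in a known byte order. The names read_be16 and read_le16 are hypothetical, not from any library.

```cpp
#include <cstdint>

// Hypothetical helpers: decode a 16-bit value from bytes stored in a known
// order. The result does not depend on the endianness of the host machine.
std::uint16_t read_be16(const std::uint8_t* p) {
    // Big endian (network order): most significant byte comes first.
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

std::uint16_t read_le16(const std::uint8_t* p) {
    // Little endian: least significant byte comes first.
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

For the bytes {0, 1}, read_be16 gives 1 and read_le16 gives 256 on every platform.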

In a little endian system the least significant bytes are placed first. In other words: The low byte is placed on offset 0, and the high byte on offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
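You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between arithmetic and logical shifts: in C/C++, unsigned values use logical shifts, while right-shifting a negative signed value is implementation-defined before C++20 (in practice almost always an arithmetic shift, which C++20 makes mandatory).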
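The X, Y, Z pattern from the explanation can be sketched like this. The function names pack3 and pack3_mul are invented for the example, and the operands are widened to an unsigned 32-bit type first so that all shifts happen on unsigned values.

```cpp
#include <cstdint>

// Hypothetical helper: pack three bytes as zzzzzzzz xxxxxxxx yyyyyyyy.
// Each operand is widened to uint32_t before shifting.
std::uint32_t pack3(std::uint8_t z, std::uint8_t x, std::uint8_t y) {
    return (static_cast<std::uint32_t>(z) << 16)
         + (static_cast<std::uint32_t>(x) << 8)
         + y;
}

// The same value written with multiplications instead of shifts.
std::uint32_t pack3_mul(std::uint8_t z, std::uint8_t x, std::uint8_t y) {
    return z * 65536u + x * 256u + y;
}
```

Both functions return 0x123456 for the inputs 0x12, 0x34, 0x56, regardless of the machine's endianness.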

If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since Intel x86 is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
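One way to check which of the two definitions applies on a given machine is to look at the byte stored at the lowest address of a known value. This is only a sketch; is_little_endian is a made-up name.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true when the byte at the lowest address of a
// uint16_t holds bits 0-7, i.e. when the machine is little-endian.
bool is_little_endian() {
    const std::uint16_t value = 1;        // only bit 0 is set
    unsigned char first_byte;
    std::memcpy(&first_byte, &value, 1);  // copy the byte at the lowest address
    return first_byte == 1;
}
```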

Your code works, but you misinterpreted how to read the bytes.
#include <cstdint>
#include <cstddef>
#include <iostream>
int main()
{
uint8_t *in = new uint8_t[2];
in[0] = 3;
in[1] = 1;
uint16_t out = *((uint16_t*)in);
std::cout << "out: " << out << "\n in: " << in[1]*256 + in[0]<< std::endl;
return 0;
}
By the way, you should take care of alignment when casting this way.

One way to think about numbers is in MSB and LSB order, where MSB is the highest bit and LSB is the lowest bit on little-endian machines.
For example:
(u)int32: MSB:Bit 31 ... LSB: Bit 0
(u)int16: MSB:Bit 15 ... LSB: Bit 0
(u)int8 : MSB:Bit 7 ... LSB: Bit 0
With your cast to a 16-bit value, the bytes are arranged like this:
16Bit                  <=  8Bit       8Bit
MSB ... LSB                BYTE[1]    BYTE[0]
Bit15 ... Bit0             Bit7 .. 0  Bit7 .. 0
0000 0001 0000 0000        0000 0001  0000 0000
which is 256, the correct value.

Related

C/C++ Bitwise Operations not resulting in expected output?

I'm currently working on bitwise operations but I am confused right now... Here's the scoop and why
I have a byte 0xCD in bits this is 1100 1101
I am shifting the bits left 7, then I'm saying & 0xFF since 0xFF in bits is 1111 1111
unsigned int bit = (0xCD << 7) & 0xFF<<7;
Now I would make the assumption that both 0xCD and 0xFF would get shifted to the left 7 times and the remaining bit would be 1&1 = 1, but I'm not getting that for output. I would also assume that shifting by 6 would give me bits 0&1 = 0, but again I'm getting a number above 1, like 205. Is there something incorrect about the way I am trying to process bit shifting in my head? If so, what is it that I am doing wrong?
Code Below:
unsigned char byte_now = 0xCD;
printf("Bits for byte_now: 0x%02x: ", byte_now);
/*
* We want to get the first bit in a byte.
* To do this we will shift the bits over 7 places for the last bit
* we will compare it to 0xFF since it's (1111 1111) if bit&1 then the bit is one
*/
unsigned int bit_flag = 0;
int bit_pos = 7;
bit_flag = (byte_now << bit_pos) & 0xFF;
printf("%d", bit_flag);

Is there something incorrect about the way I am trying to process bit shifting in my head?
There seems to be.
If so what is it that I am doing wrong?
That's unclear, so I offer a reasonably full explanation.
In the first place, it is important to understand that C does not perform any arithmetic directly on integers smaller than int. Consider, then, your expression byte_now << bit_pos. The integer promotions are performed on the operands, resulting in the left operand being converted to the int value 0xCD. The result has the same pattern of least-significant value bits as byte_now, but also a bunch of leading zero bits.
Left shifting the result by 7 bits produces the bit pattern 110 0110 1000 0000, equivalent to 0x6680. You then perform a bitwise and operation on the result, masking off all but the least-significant 8 bits, thus yielding 0x80. What happens when you assign that to bit_flag depends on the type of that variable, but if it is an integer type that is either unsigned or has more than 7 value bits then the assignment is well-defined and value-preserving. Note that it is bit 7 that is nonzero, not bit 0.
The type of bit_flag is more important when you pass it to printf(). You've paired it with a %d field descriptor, which is correct if bit_flag has type int and incorrect otherwise. If bit_flag does have type int, then I would expect the program to print 128.
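To make the difference concrete, here is a small sketch, assuming the goal is to read a single bit as 0 or 1. The names get_bit and shifted_and_masked are hypothetical; the second function just reproduces the expression from the question.

```cpp
#include <cstdint>

// Hypothetical helper: return bit `pos` (0 = least significant) of a byte as
// 0 or 1. Shifting right moves the wanted bit down to position 0, and masking
// with 1 discards everything else.
unsigned get_bit(std::uint8_t byte, unsigned pos) {
    return (byte >> pos) & 1u;
}

// What the expression from the question computes: the byte is promoted to
// int, shifted left, and only the low 8 bits of the result are kept.
unsigned shifted_and_masked(std::uint8_t byte, unsigned pos) {
    return (byte << pos) & 0xFFu;
}
```

With the byte 0xCD, get_bit(0xCD, 7) gives 1, while shifted_and_masked(0xCD, 7) gives 0x80 (128), the value predicted above.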

Unsure of normalising double values loaded as 2 bytes each

The code that I'm using for reading .wav file data into a 2D array:
int signal_frame_width = wavHeader.SamplesPerSec / 100; //10ms frame
int total_number_of_frames = numSamples / signal_frame_width;
double** loadedSignal = new double *[total_number_of_frames]; //array that contains the whole signal
int iteration = 0;
int16_t* buffer = new int16_t[signal_frame_width];
while ((bytesRead = fread(buffer, sizeof(buffer[0]), signal_frame_width, wavFile)) > 0)
{
loadedSignal[iteration] = new double[signal_frame_width];
for(int i = 0; i < signal_frame_width; i++){
//value normalisation:
int16_t c = (buffer[i + 1] << 8) | buffer[i];
double normalisedValue = c/32768.0;
loadedSignal[iteration][i] = normalisedValue;
}
iteration++;
}
The problem is in this part, I don't exactly understand how it works:
int16_t c = (buffer[i + 1] << 8) | buffer[i];
It's an example taken from here.
I'm working on 16bit .wav files only. As you can see, my buffer is loading (for ex. sampling freq. = 44.1kHz) 441 elements (each is 2byte signed sample). How should I change above code?

The original example, from which you constructed your code, used an array where each individual element represented a byte. It therefore needs to combine two consecutive bytes into a 16-bit value, which is what this line does:
int16_t c = (buffer[i + 1] << 8) | buffer[i];
It shifts the byte at index i+1 (here assumed to be the most significant byte) left by 8 positions, and then ORs the byte at index i onto that. For example, if buffer[i+1]==0x12 and buffer[i]==0x34, then you get
buffer[i+1] << 8 == 0x12 << 8 == 0x1200
0x1200 | buffer[i] == 0x1200 | 0x34 == 0x1234
(The | operator is a bitwise OR.)
Note that you need to be careful whether your WAV file is little-endian or big-endian (but the original post explains that quite well).
Now, if you store the resulting value in a signed 16-bit integer, you get a value between −32768 and +32767. The point in the actual normalization step (dividing by 32768) is just to bring the value range down to [−1.0, 1.0).
In your case above, you appear to already be reading into a buffer of 16-bit values. Note that your code will therefore only work if the endianness of your platform matches that of the WAV file you are working with. But if this assumption is correct, then you don't need the code line which you do not understand. You can just convert every array element into a double directly:
double normalisedValue = buffer[i]/32768.0;
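If the file were instead read into a buffer of raw bytes, a platform-independent version could look roughly like the sketch below. It assumes little-endian 16-bit PCM data, as used by standard RIFF WAV files; the function name normalise_pcm16le is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert raw little-endian 16-bit PCM bytes into
// doubles in the range [-1.0, 1.0). A trailing odd byte is ignored.
std::vector<double> normalise_pcm16le(const std::vector<std::uint8_t>& raw) {
    std::vector<double> samples;
    samples.reserve(raw.size() / 2);
    for (std::size_t i = 0; i + 1 < raw.size(); i += 2) {
        // Low byte first, high byte second; combine as an unsigned pattern.
        const std::uint16_t bits =
            static_cast<std::uint16_t>(raw[i] | (raw[i + 1] << 8));
        // Interpret the 16-bit pattern as a two's complement signed value.
        const int value = bits >= 0x8000 ? static_cast<int>(bits) - 0x10000
                                         : static_cast<int>(bits);
        samples.push_back(value / 32768.0);
    }
    return samples;
}
```

Because the bytes are combined explicitly, this version does not rely on the platform's endianness matching that of the file.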

If buffer was an array of bytes, then that piece of code would interpret two consecutive bytes as a single 16-bit integer (assuming little-endian encoding). The | operator will perform a bit-wise OR on the bits of the two bytes. Since we wish to interpret the two bytes as a single 2-byte integer, then we must shift the bits of one of them 8 bits (1 byte) to the left. Which one depends on whether they are ordered in little-endian or big-endian order. Little-endian means that the least significant byte comes first, so we shift the second byte 8 bits to the left.
Example:
First byte: 0101 1100
Second byte: 1111 0100
Now shift second byte:
Second "byte": 1111 0100 0000 0000
First "byte": 0000 0000 0101 1100
Bitwise OR-operation (if either is 1, then 1. If both are 0, then 0):
16-bit integer: 1111 0100 0101 1100
In your case however, the bytes in your file have already been interpreted as 16-bit ints using whatever endianness the platform has. So you do not need this step. However, in order to correctly interpret the bytes in the file, one must assume the same byte-order as they were written in. Therefore, one usually adds this step to ensure that the code works independent of the endianness of the platform, instead relying on the expected byte-order of the files (as most file formats will specify what the byte-order should be).

Shifting syntax error

I have a byte array:
byte data[2]
I want to keep the 7 least significant bits from the first and the 3 most significant bits from the second.
I do this:
unsigned int the=((data[0]<<8 | data[1])<<1)>>6;
Can you give me a hint why this does not work?
If I do it in different lines it works fine.

Can you give me a hint why this does not work?
Hint:
You have two bytes and want to preserve 7 less significant bits from the first and the 3 most significant bits from the second:
data[0]: -xxxxxxx data[1]: xxx-----
-'s represent bits to remove, x's represent bits to preserve.
After this
(data[0]<<8 | data[1])<<1
you have:
the: 00000000 0000000- xxxxxxxx xx-----0
Then you make >>6 and result is:
the: 00000000 00000000 00000-xx xxxxxxxx
See, you did not remove high bit from data[0].

Keep the 7 less significant bits from the first and the 3 most significant bits from the second.
Assuming the 10 bits to be preserved should be the LSB of the unsigned int value, and should be contiguous, and that the 3 bits should be the LSB of the result, this should do the job:
unsigned int value = ((data[0] & 0x7F) << 3) | ((data[1] & 0xE0) >> 5);
You might not need all the masking operands; it depends in part on the definition of byte (probably unsigned char, or perhaps plain char on a machine where char is unsigned), but what's written should work anywhere (16-bit, 32-bit or 64-bit int; signed or unsigned 8-bit (or 16-bit, or 32-bit, or 64-bit) values for byte).
Your code does not remove the high bit from data[0] at any point — unless, perhaps, you're on a platform where unsigned int is a 16-bit value, but if that's the case, it is unusual enough these days to warrant a comment.
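A small sketch comparing the masked expression with a repaired version of the original one-liner, assuming byte is an unsigned 8-bit type. Both function names are hypothetical. The second variant masks the shifted value to 16 bits, which is the step the original expression was missing.

```cpp
#include <cstdint>

// Hypothetical helper: keep the low 7 bits of `first` and the top 3 bits of
// `second`, packed into a contiguous 10-bit result.
unsigned extract10(std::uint8_t first, std::uint8_t second) {
    return ((first & 0x7Fu) << 3) | ((second & 0xE0u) >> 5);
}

// The single-expression approach from the question, with an explicit 16-bit
// mask that drops the high bit of `first` after the left shift.
unsigned extract10_shift(std::uint8_t first, std::uint8_t second) {
    const unsigned combined = (static_cast<unsigned>(first) << 8) | second;
    return ((combined << 1) & 0xFFFFu) >> 6;
}
```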

Extremely fast hash function with collisions allowed

My key is a 64-bit address and the output is a 1-byte number (0-255). Collisions are allowed but the probability of them occurring should be low. Also, assume that the number of elements to be inserted is low, let's say not more than 255, so as to minimize the pigeonhole effect.
The addresses are addresses of the functions in the program.

uint64_t addr = ...
uint8_t hash = addr & 0xFF;
I think that meets all of your requirements.

I would XOR together the 2 LSB (least significant bytes); if this distributes badly, then add a 3rd one, and so forth.
The rationale behind this is the following: function addresses do not distribute uniformly. The problem normally lies in the lower (lsb) bits. Functions usually need to begin in addresses divisible by 4/8/16 so the 2-4 lsb are probably meaningless. By XORing with the next byte, you should get rid of most of these problems and it's still pretty fast.
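A minimal sketch of that idea, XORing the two least significant bytes of the address. The name hash_xor2 is invented here, and how well it distributes depends on the actual set of addresses.

```cpp
#include <cstdint>

// Hypothetical helper: hash a 64-bit address to one byte by XORing its two
// least significant bytes.
std::uint8_t hash_xor2(std::uint64_t addr) {
    return static_cast<std::uint8_t>((addr ^ (addr >> 8)) & 0xFF);
}
```

For example, the address 0x1234 hashes to 0x12 ^ 0x34 = 0x26.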

Function addresses are, I think, quite likely to be aligned (see this question, for instance). That seems to indicate that you want to skip least significant bits, depending on the alignment.
So, perhaps take the 8 bits starting from bit 3, i.e. skipping the least significant 3 bits (bits 0 through 2):
const uint8_t hash = (address >> 3);
This should be obvious from inspection of your set of addresses. In hex, watch the rightmost digit.

How about:
uint64_t data = 0x1213121212121B12; // 16 hex digits, so the literal fits in 64 bits
uint32_t d1 = (data >> 32) ^ (uint32_t)(data);
uint16_t d2 = (d1 >> 16) ^ (uint16_t)(d1);
uint8_t d3 = (d2 >> 8) ^ (uint8_t)(d2);
return d3;
It combines all bits of your 8 bytes with three shifts and three XOR instructions.

C++ How to combine two signed 8 Bit numbers to a 16 Bit short? Unexplainable results

I need to combine two signed 8 Bit _int8 values to a signed short (16 Bit) value. It is important that the sign is not lost.
My code is:
unsigned short lsb = -13;
unsigned short msb = 1;
short combined = (msb << 8 )| lsb;
The result I get is -13. However, I expect it to be 499.
For the following examples, I get the correct results with the same code:
msb = -1; lsb = -6; combined = -6;
msb = 1; lsb = 89; combined = 345;
msb = -1; lsb = 13; combined = -243;
However, msb = 1; lsb = -84; combined = -84; where I would expect 428.
It seems that if the lsb is negative and the msb is positive, something goes wrong!
What is wrong with my code? How does the computer get to these unexpected results (Win7, 64 Bit and VS2008 C++)?

Your lsb in this case contains 0xfff3. When you OR it with 1 << 8 nothing changes because there is already a 1 in that bit position.
Try short combined = (msb << 8 ) | (lsb & 0xff);
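The effect of the mask can be sketched as follows, assuming both inputs really are signed 8-bit values. The function name combine_bytes is hypothetical. Both bytes are masked to 8 bits before combining, so the sign-extension bits added by integer promotion never reach the result.

```cpp
#include <cstdint>

// Hypothetical helper: combine a signed high byte and a signed low byte into
// a signed 16-bit value. Masking with 0xFF strips the sign-extension bits
// that integer promotion adds to a negative byte.
std::int16_t combine_bytes(std::int8_t msb, std::int8_t lsb) {
    const unsigned high = static_cast<unsigned>(msb) & 0xFFu;
    const unsigned low  = static_cast<unsigned>(lsb) & 0xFFu;
    // The conversion to int16_t wraps modulo 2^16 (guaranteed since C++20,
    // implementation-defined but near-universal before that).
    return static_cast<std::int16_t>((high << 8) | low);
}
```

With msb = 1 and lsb = -13 this returns 499, the value expected in the question.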

Or using a union:
#include <iostream>
union Combine
{
short target;
char dest[ sizeof( short ) ];
};
int main()
{
Combine cc;
cc.dest[0] = -13, cc.dest[1] = 1;
std::cout << cc.target << std::endl;
}

It is possible that lsb is being automatically sign-extended to 16 bits. I notice you only have a problem when it is negative and msb is positive, and that is what you would expect to happen given the way you're using the or operator. Although, you're clearly doing something very strange here. What are you actually trying to do here?

The Raisonance C compiler for STM8 (and possibly many other compilers) generates ugly code for classic C code that writes 16-bit variables into 8-bit hardware registers.
Note - STM8 is big-endian, for little-endian CPUs code must be slightly modified. Read/Write byte order is important too.
So, standard C code piece:
unsigned int ch1Sum;
...
TIM5_CCR1H = ch1Sum >> 8;
TIM5_CCR1L = ch1Sum;
Is being compiled to:
;TIM5_CCR1H = ch1Sum >> 8;
LDW X,ch1Sum
CLR A
RRWA X,A
LD A,XL
LD TIM5_CCR1,A
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Too long, too slow.
My version:
unsigned int ch1Sum;
...
TIM5_CCR1H = ((u8*)&ch1Sum)[0];
TIM5_CCR1L = ch1Sum;
That compiles into just two MOV instructions:
;TIM5_CCR1H = ((u8*)&ch1Sum)[0];
MOV TIM5_CCR1,ch1Sum
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Opposite direction:
unsigned int uSonicRange;
...
((unsigned char *)&uSonicRange)[0] = TIM1_CCR2H;
((unsigned char *)&uSonicRange)[1] = TIM1_CCR2L;
instead of
unsigned int uSonicRange;
...
uSonicRange = TIM1_CCR2H << 8;
uSonicRange |= TIM1_CCR2L;

Some things you should know about the datatypes (un)signed short and char:
char is an 8-bit value; that's what you were looking for for lsb and msb. short is 16 bits in length.
You should also not store signed values in unsigned types unless you know what you are doing.
You can take a look at the two's complement. It describes the representation of negative values (for integers, not for floating-point values) in C/C++ and many other programming languages.
There are multiple versions of making your own two's complement:
int a;
// setting a
a = -a; // Clean version. Easier to understand and read. Use this one.
a = (~a)+1; // The arithmetical version. Does the same, but takes more steps.
// Don't use the last one unless you need it!
// It can be 'optimized away' by the compiler.
stdint.h (with inttypes.h) is more for the purpose of having exact lengths for your variable. If you really need a variable to have a specific byte-length you should use that (here you need it).
You should always use the data types that best fit your needs. Your code should therefore look like this:
signed char lsb; // signed 8-bit value
signed char msb; // signed 8-bit value
signed short combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
or like this:
#include <stdint.h>
int8_t lsb; // signed 8-bit value
int8_t msb; // signed 8-bit value
int16_t combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
For the last one the compiler will use signed 8/16-bit values every time, regardless of what length int has on your platform. Wikipedia has a nice explanation of the int8_t and int16_t data types (and all the other data types).
btw: cppreference.com is useful for looking up the ANSI C standards and other things that are worth to know about C/C++.

You wrote that you need to combine two 8-bit values. Why are you using unsigned short then?
As Dan already said, lsb is automatically extended to 16 bits. Try the following code:
uint8_t lsb = -13;
uint8_t msb = 1;
int16_t combined = (msb << 8) | lsb;
This gives you the expected result: 499.

If this is what you want:
msb: 1, lsb: -13, combined: 499
msb: -6, lsb: -1, combined: -1281
msb: 1, lsb: 89, combined: 345
msb: -1, lsb: 13, combined: -243
msb: 1, lsb: -84, combined: 428
Use this:
short combine(unsigned char msb, unsigned char lsb) {
return (msb<<8u)|lsb;
}
I don't understand why you would want msb -6 and lsb -1 to generate -6 though.
