If I receive a message via RS232 that is 2 bytes long, e.g. 0000 0001 0001 1100 (that is 1 0001 1100 with the LSB on the right), I want to save it to a variable called value.
I am "decoding" the byte stream with this step:
rxByte = Serial1.read();
messageContent[0] = rxByte;
messageContent[1] = rxByte;
with the first rxByte having the value 0000 0001 and the second 0001 1100.
Or are those values already converted internally to HEX or DEC?
Now I have seen code that saves it this way to value:
uint32_t value = messageContent[0] * 256 + messageContent[1];
How does this work?
messageContent[0] * 256 is essentially a bit shift: the code is equivalent to (and more readable as)
uint32_t value = (messageContent[0] << 8) + messageContent[1];
So if `messageContent[0] == 0x01` and `messageContent[1] == 0x1C`:
value = (0x01 << 8)+0x1C
value = (0x0100)+0x1C
value = 0x011C
That works fine; depending on the endianness of your machine, it is equivalent to:
uint32_t value = *((uint16_t*)(messageContent));
Decoding procedure:
//char messageContent[2]; //Always keep the data types in use in mind!
messageContent[0] = Serial1.read();
messageContent[1] = Serial1.read();
The way you were doing it placed the same value in both positions.
If you want to read both bytes into a 16-bit or bigger integer:
int high = Serial1.read(); // the high byte arrives first, so read it first
short int messageContent = (high << 8) + Serial1.read();
Or are those values already converted internally to HEX or DEC?
Data is always binary; hex or decimal is just a representation. Saying "variable x has a value of 123" is a human interpretation: in reality, variable x is a block of memory made up of some bytes, which are themselves groups of 8 bits.
Now I have seen code that saves it this way to value:
uint32_t value = messageContent[0] * 256 + messageContent[1];
That's like telling you "45 thousand and 123", so you build the number as 45*1000 + 123 = 45123. Here 256 is 2^8, the weight of one full byte: b'1 0000 0000'.
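To make that concrete, here is a minimal sketch (plain C++; the byte values are the ones from the question) showing that the multiplication form and the shift form produce the same result:
#include <cstdint>
#include <cstdio>

int main() {
    uint8_t high = 0x01;                       // first received byte: 0000 0001
    uint8_t low  = 0x1C;                       // second received byte: 0001 1100
    uint16_t value_mul   = high * 256 + low;   // "place value" form
    uint16_t value_shift = (high << 8) | low;  // bit-shift form, same result
    std::printf("%u %u\n", value_mul, value_shift);  // prints "284 284", i.e. 0x011C twice
    return 0;
}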
Related
I have to parse a date from the raw bytes I get from the database for my C++ application. I've found out that a date in MySQL is 4 bytes and that the last two are the month and day respectively. But the first two bytes encode the year in a strange way: if the date is 2002-08-30, the content will be 210, 15, 8, 31. If the date is 1996-12-22, the date will be stored as 204, 15, 12, 22.
Obviously, the first byte can't be bigger than 255, so I've checked year 2047 -- it's 255, 15, and 2048 -- it's 128, 16.
At first I thought that the key is binary operations, but I did not quite understand the logic:
2047: 0111 1111 1111
255: 0000 1111 1111
15: 0000 0000 1111
2048: 1000 0000 0000
128: 0000 1000 0000
16: 0000 0001 0000
Any idea?
It seems that the encoding logic is to drop the most significant bit of the first byte and to write the second byte's bits in front of the remaining 7 bits, like this:
2002 from 210 and 15:
210 = 1101 0010 -> drop the MSB -> _101 0010
15 = 000 1111, written in front -> 000 1111 101 0010 = 0111 1101 0010 = 2002
2048 from 128 and 16:
128 = 1000 0000 -> drop the MSB -> _000 0000
16 = 001 0000, written in front -> 001 0000 000 0000 = 1000 0000 0000 = 2048
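In code, that reconstruction is just a mask, a shift, and an OR (a minimal sketch; the byte values are taken from the question):
#include <cstdint>
#include <cstdio>

int main() {
    uint8_t b0 = 210, b1 = 15;             // first two raw bytes of the 2002 date
    int year = (b0 & 0x7F) | (b1 << 7);    // low 7 bits from b0, upper bits from b1
    std::printf("%d\n", year);             // prints 2002
    return 0;
}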
We had the same issue and developed the following C++20 helper methods for production use with mysqlx (MySQL Connector/C++ 8.0 X DevAPI) to properly read DATE, DATETIME and TIMESTAMP fields:
#pragma once

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

#include <mysqlx/xdevapi.h>

namespace mysqlx {

static inline std::vector<uint64_t>
mysqlx_raw_as_u64_vector(const mysqlx::Value& in_value)
{
    std::vector<uint64_t> out;
    const auto bytes = in_value.getRawBytes();
    auto ptr = reinterpret_cast<const std::byte*>(bytes.first);
    auto end = reinterpret_cast<const std::byte*>(bytes.first) + bytes.second;
    // Each value is a protobuf-style varint: 7 payload bits per byte,
    // with the MSB set on every byte except the last byte of the value.
    while (ptr != end) {
        static constexpr std::byte carry_flag{0b1000'0000};
        static constexpr std::byte value_mask{0b0111'1111};
        uint64_t v = 0;
        uint64_t shift = 0;
        bool is_carry;
        do {
            auto byte = *ptr;
            is_carry = (byte & carry_flag) == carry_flag;
            v |= std::to_integer<uint64_t>(byte & value_mask) << shift;
            ++ptr;
            shift += 7;
        } while (is_carry && ptr != end && shift <= 63);
        out.push_back(v);
    }
    return out;
}

static inline std::chrono::year_month_day
read_date(const mysqlx::Value& value)
{
    const auto vector = mysqlx_raw_as_u64_vector(value);
    if (vector.size() < 3)
        throw std::out_of_range{"Value is not a valid DATE"};
    return std::chrono::year{static_cast<int>(vector.at(0))}
           / static_cast<int>(vector.at(1))
           / static_cast<int>(vector.at(2));
}

static inline std::chrono::system_clock::time_point
read_date_time(const mysqlx::Value& value)
{
    const auto vector = mysqlx_raw_as_u64_vector(value);
    if (vector.size() < 3)
        throw std::out_of_range{"Value is not a valid DATETIME"};
    auto ymd = std::chrono::year{static_cast<int>(vector.at(0))}
               / static_cast<int>(vector.at(1))
               / static_cast<int>(vector.at(2));
    auto sys_days = std::chrono::sys_days{ymd};
    auto out = std::chrono::system_clock::time_point(sys_days);
    // Optional trailing fields: hours, minutes, seconds, microseconds.
    auto it = vector.begin() + 2;
    auto end = vector.end();
    if (++it == end)
        return out;
    out += std::chrono::hours{*it};
    if (++it == end)
        return out;
    out += std::chrono::minutes{*it};
    if (++it == end)
        return out;
    out += std::chrono::seconds{*it};
    if (++it == end)
        return out;
    out += std::chrono::microseconds{*it};
    return out;
}

} // namespace mysqlx
Which can then be used as follows:
auto row = table.select("datetime", "date").execute().fetchOne();
auto time_point = read_date_time(row[0]);
auto year_month_day = read_date(row[1]);
Based on what you provide, it seems to be N1 - 128 + N2 * 128 (for example, 210 - 128 + 15 * 128 = 2002).
Which version???
DATETIME used to be encoded in packed decimal (8 bytes). But, when fractional seconds were added, the format was changed to something like
Length indication (1 byte)
INT UNSIGNED for seconds-since-1970 (4 bytes)
fractional seconds (0-3 bytes)
DATE is stored like MEDIUMINT UNSIGNED (3 bytes) as days since 0000-00-00 (or something like that).
How did you get the "raw bytes"? There is no function to let you do that. SELECT HEX(some-date) first converts the date to a string (like "2022-03-22") and then takes the hex of that string. That gives you 323032322D30332D3232.
Regarding the code, refer to the answer above. Regarding the documentation trail, check the following:
The getBytes documentation links to the ColumnMetaData documentation.
The ColumnMetaData documentation links to the protobuf encoding documentation.
The protobuf encoding page of the Protocol Buffers documentation says:
Base 128 Varints
Variable-width integers, or varints, are at the core of the wire
format. They allow encoding unsigned 64-bit integers using anywhere
between one and ten bytes, with small values using fewer bytes.
Each byte in the varint has a continuation bit that indicates if the
byte that follows it is part of the varint. This is the most
significant bit (MSB) of the byte (sometimes also called the sign
bit). The lower 7 bits are a payload; the resulting integer is built
by appending together the 7-bit payloads of its constituent bytes.
So, for example, here is the number 1, encoded as 01 – it’s a single
byte, so the MSB is not set:
0000 0001
^ msb
And here is 150, encoded as 9601 – this is a bit more complicated:
10010110 00000001
^ msb ^ msb
How do you figure out that this is 150? First you drop the MSB from
each byte, as this is just there to tell us whether we’ve reached the
end of the number (as you can see, it’s set in the first byte as there
is more than one byte in the varint). Then we concatenate the 7-bit
payloads, and interpret it as a little-endian, 64-bit unsigned
integer:
10010110 00000001 // Original inputs.
0010110 0000001 // Drop continuation bits.
0000001 0010110 // Put into little-endian order.
10010110 // Concatenate.
128 + 16 + 4 + 2 = 150 // Interpret as integer.
Because varints are so crucial to protocol buffers, in protoscope
syntax, we refer to them as plain integers. 150 is the same as 9601.
Each byte in the varint has a continuation bit that indicates if the byte that follows it is part of the varint.
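As a quick illustration of that rule (a minimal sketch, independent of any library), here is the documentation's 150 example decoded from its two bytes:
#include <cstdint>
#include <cstdio>

int main() {
    const uint8_t bytes[] = {0x96, 0x01};   // varint encoding of 150
    uint64_t value = 0;
    int shift = 0;
    for (uint8_t b : bytes) {
        value |= static_cast<uint64_t>(b & 0x7F) << shift;  // take the 7 payload bits
        shift += 7;
        if ((b & 0x80) == 0)                // continuation bit clear: this was the last byte
            break;
    }
    std::printf("%llu\n", static_cast<unsigned long long>(value));  // prints 150
    return 0;
}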
So I have a little piece of code that takes two uint8_t values, places them next to each other, and returns a uint16_t. The point is not adding the two variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0 and the second uint8_t is 1, I expect the uint16_t to also be 1.
However, in my code this is not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to turn the uint8_t pointer bytes into a uint16_t pointer and then take the value. I expect that value to be 1 since x86 is little-endian. However, it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!
There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
Does endianness also affect the order of the bits in the byte or not?
There is no way to access a bit by "address" [1], so there is no concept of endianness. When converting to text, bits are conventionally shown most significant on the left and least significant on the right, just like the digits of decimal numbers. I don't know whether this holds in right-to-left writing systems.
[1] You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields, i.e. whether the first bitfield is most or least significant, is implementation defined and not necessarily related to byte endianness at all.
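For illustration, here is a minimal sketch of such a bitfield; which declared field lands in which bit position is implementation defined, so the printed byte is not guaranteed:
#include <cstdint>
#include <cstdio>
#include <cstring>

// Eight one-bit fields covering one byte. How the declared fields map to bit
// positions within the byte is implementation defined and not necessarily
// related to the machine's byte endianness.
struct Bits {
    uint8_t a : 1, b : 1, c : 1, d : 1, e : 1, f : 1, g : 1, h : 1;
};

int main() {
    Bits bits{};
    bits.a = 1;                      // set only the first declared field
    uint8_t raw;
    std::memcpy(&raw, &bits, 1);     // inspect the underlying byte
    std::printf("0x%02X\n", raw);    // often 0x01 (a = least significant bit), but not guaranteed
    return 0;
}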
Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:
// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
data[0] = 1;
data[1] = 0;
Note that assuming the input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating over a network or accessing files that may be shared with other systems.
In these cases, the communication protocol or the file format typically specifies that the data is in a specific endianness, which may or may not be the same as the native endianness of your target system. The de facto standard in network communication is big endian. Data in a particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
In a little-endian system the low-order bytes are placed first. In other words: the low byte is placed at offset 0, and the high byte at offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between Arithmetic and Logical shifts - in C/C++ unsigned values use logical shifts, and signed use arithmetic shifts.
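A small sketch of that difference (the values are arbitrary; the signed result is what common platforms produce, since the standard historically left signed right shifts implementation defined):
#include <cstdint>
#include <cstdio>

int main() {
    uint16_t u = 0x8000;                        // unsigned value with the top bit set
    int16_t  s = -32768;                        // the same bit pattern as a signed value
    std::printf("%u\n", (unsigned)(u >> 1));    // prints 16384 (0x4000): the top bit is not replicated
    std::printf("%d\n", (int)(s >> 1));         // typically prints -16384: the sign bit is replicated
    return 0;
}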
If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since Intel processors are little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
Your code works, but you misinterpreted how to read the "bytes":
#include <cstdint>
#include <cstddef>
#include <iostream>
int main()
{
    uint8_t *in = new uint8_t[2];
    in[0] = 3;
    in[1] = 1;
    uint16_t out = *((uint16_t*)in);
    std::cout << "out: " << out << "\n in: " << in[1] * 256 + in[0] << std::endl;
    return 0;
}
By the way, you should take care of alignment when casting this way.
One way to think about the numbers is in MSB/LSB order, where the MSB is the highest bit and the LSB is the lowest bit. For example:
(u)int32: MSB: Bit 31 ... LSB: Bit 0
(u)int16: MSB: Bit 15 ... LSB: Bit 0
(u)int8 : MSB: Bit 7 ... LSB: Bit 0
With your cast to a 16-bit value, the bytes are arranged like this:
16-bit value          <=  8-bit       8-bit
MSB ........... LSB       BYTE[1]     BYTE[0]
Bit 15 ....... Bit 0      Bit 7..0    Bit 7..0
0000 0001 0000 0000       0000 0001   0000 0000
which is 256 -> the correct value.
In the code below, the variable Speed is of type int. How is it stored in two variables of char type? I also don't understand the comment // 16 bits - 2 x 8 bits variables.
Can you explain the type conversion with an example? When I run the code, it shows symbols after the type conversion.
void AX12A::turn(unsigned char ID, bool SIDE, int Speed)
{
    if (SIDE == LEFT)
    {
        char Speed_H, Speed_L;
        Speed_H = Speed >> 8;
        Speed_L = Speed; // 16 bits - 2 x 8 bits variables
    }
}

int main()
{
    ax12a.turn(ID, LEFT, 200);
}
It seems that on your platform a variable of type int is stored in 16 bits and a variable of type char is stored in 8 bits.
This is not always the case, as the C++ standard does not guarantee the size of these types. I made my assumption based on the code and the comment. Use data types of fixed size, such as the ones described here, to make sure this assumption always holds.
Both int and char are integral types. When converting from a larger integral type to a smaller one (e.g. int to char), the most significant bits are discarded and the least significant bits are kept (in this case, you keep the last 8 bits).
Before fully understanding the code, you also need to know about the right shift. It simply moves the bits to the right (for the purpose of this answer, it does not matter what is shifted in on the left). The least significant bit (the rightmost bit) is discarded, and every other bit moves one position to the right, very much like dividing by 10 in the decimal system.
Now, you have your variable Speed, which has 16 bits.
Speed_H = Speed >> 8;
This shifts Speed with 8 bits to the right, and then assigns the 8 least significant bits to Speed_H. This basically means that you will have in Speed_H the 8 most significant bits (the "upper" half of Speed).
Speed_L = Speed;
Simply assigns to Speed_L the least significant 8 bits.
The comment basically states that you split a variable of 16 bits into 2 variables of 8 bits, with the first (most significant) 8 bits being stored in Speed_H and the last (least significant) 8 bits being stored in Speed_L.
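As a small sketch of the split and the reverse operation (recombining the two bytes back into the original value), using the value from the question:
#include <cstdio>

int main() {
    int Speed = 200;                            // 0x00C8
    char Speed_H = Speed >> 8;                  // high byte: 0x00
    char Speed_L = Speed;                       // low byte:  0xC8 (only the low 8 bits are kept)
    // Cast to unsigned char when recombining to avoid sign extension of the low byte.
    int recombined = ((unsigned char)Speed_H << 8) | (unsigned char)Speed_L;
    std::printf("%d\n", recombined);            // prints 200
    return 0;
}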
From your code I understand that sizeof(int) = 2 bytes in your case.
Let us take an example, as shown below:
int my_var = 200;
my_var is allocated 2 bytes of memory because its data type is int.
The value assigned to my_var is 200.
Note that 200 decimal = 0x00C8 hexadecimal = 0000 0000 1100 1000 binary.
The higher byte 0000 0000 is stored in one of the addresses allocated to my_var,
and the lower byte 1100 1000 is stored in the other address, depending on endianness.
To know about endianness, check this link
https://www.geeksforgeeks.org/little-and-big-endian-mystery/
In your code :
int Speed = 200;
Speed_H = Speed >> 8;
=> 200 decimal value right shifted 8 times
=> that means 0000 0000 1100 1000 binary value right shifted by 8 bits
=> that means Speed_H = 0000 0000 binary
Speed_L = Speed;
=> Speed_L = 200;
=> Speed_L = 0000 0000 1100 1000 binary
=> Speed_L is of type char so it can accommodate only one byte
=> The value 0000 0000 1100 1000 will be narrowed (in other words "cut-off") to least significant byte and assigned to Speed_L.
=> Speed_L = 1100 1000 binary = 200 decimal
I have a function that takes an int8_t val and converts it to an "int7_t".
//Bit [7] reserved
//Bits [6:0] = signed -64 to +63 offset value
// user who calls this function will use it correctly (-64 to +63)
uint8_t func_int7_t(int8_t val){
    uint8_t val_6 = val & 0b01111111;
    if (val & 0x80)
        val_6 |= 0x40;
    //...
    //do stuff...
    return val_6;
}
What is the best and fastest way to convert the int8 to an int7? Did I do it efficiently and fast, or is there a better way?
The target is ARM Cortex M0+ if that matters
UPDATE:
After reading the different answers I can say the question was asked wrongly (or my code in the question is what gave others wrong assumptions). My intention was to convert an int8 to an int7.
So it can be done by doing nothing, because:
8bit:
63 = 0011 1111
62 = 0011 1110
0 = 0000 0000
-1 = 1111 1111
-2 = 1111 1110
-63 = 1100 0001
-64 = 1100 0000
7bit:
63 = 011 1111
62 = 011 1110
0 = 000 0000
-1 = 111 1111
-2 = 111 1110
-63 = 100 0001
-64 = 100 0000
The fastest way is probably:
uint8_t val_7 = (val & 0x3f) | ((val >> 1) & 0x40);
val & 0x3f takes the 6 lower bits (truncation), and ((val >> 1) & 0x40) moves the sign bit from position 8 down to position 7.
The advantage of not using an if is shorter code (you could also use a conditional expression) and code without a branch.
To clear the reserved bit, just
return val & 0x7f;
To leave the reserved bit exactly like how it was from input, nothing needs to be done
return val;
and the low 7 bits will contain the values in [-64, 63], because in two's complement, down-casting is done by simple truncation; the value remains the same. That's what happens in an assignment like (int8_t)some_int_value.
There's no such thing as 0bX1100001: there is no undefined bit in machine language. That state only exists in hardware, like the high-Z state or the undefined state in Verilog and other hardware description languages.
Use a bitfield to narrow the value and let the compiler choose whatever sequence of shifts and/or masks is most efficient for that on your platform.
#include <cstdint>

inline uint8_t to7bit(int8_t x)
{
    struct { uint8_t x : 7; } s;
    return s.x = x;
}
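For example, together with the to7bit definition above (a small usage sketch; the printed values assume two's complement, which C++20 guarantees):
#include <cstdio>

int main() {
    std::printf("0x%02X\n", (unsigned)to7bit(63));    // prints 0x3F
    std::printf("0x%02X\n", (unsigned)to7bit(-1));    // prints 0x7F
    std::printf("0x%02X\n", (unsigned)to7bit(-64));   // prints 0x40
    return 0;
}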
If you are not concerned about what happens to out-of-range values, then
return val & 0x7f;
is enough. This correctly handles values in the range -64 <= val <= 63.
You haven't said how you want to handle out-of-range values, so I have nothing to say about that.
Updated to add: the question has been updated to stipulate that the function will never be called with out-of-range values, so this method qualifies unambiguously as "best and fastest".
The user who calls this function knows they should pass data from -64 to +63.
So not considering any other values, the really fastest thing you can do is not doing anything at all!
You have a 7-bit value stored in eight bits. Any value within the specified range has the same value in bit 7 and bit 6, and when you process the 7-bit value you just ignore the MSB (of the 8-bit value), whether it is set or not, e.g.:
for (unsigned int bit = 0x40; bit; bit >>= 1)   // NOT: 0x80!
    std::cout << ((value & bit) ? 1 : 0);
The other way round is more critical: whenever you receive these seven bits via some communication channel, you need to do manual sign extension to eight (or more) bits to be able to use that value correctly.
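A minimal sketch of that sign extension (the helper name from7bit is illustrative, not from any library):
#include <cstdint>
#include <cstdio>

// Sign-extend a 7-bit two's complement value, received in the low bits of a
// byte, to a full int8_t.
int8_t from7bit(uint8_t raw)
{
    raw &= 0x7F;                   // ignore the reserved bit 7
    if (raw & 0x40)                // bit 6 is the sign bit of the 7-bit value
        raw |= 0x80;               // replicate it into bit 7
    return static_cast<int8_t>(raw);
}

int main() {
    std::printf("%d\n", from7bit(0x3F));   // prints 63
    std::printf("%d\n", from7bit(0x7F));   // prints -1
    std::printf("%d\n", from7bit(0x40));   // prints -64
    return 0;
}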
The code that I'm using for reading .wav file data into an 2D array:
int signal_frame_width = wavHeader.SamplesPerSec / 100; // 10 ms frame
int total_number_of_frames = numSamples / signal_frame_width;
double** loadedSignal = new double*[total_number_of_frames]; // array that contains the whole signal
int iteration = 0;

int16_t* buffer = new int16_t[signal_frame_width];
while ((bytesRead = fread(buffer, sizeof(buffer[0]), signal_frame_width, wavFile)) > 0)
{
    loadedSignal[iteration] = new double[signal_frame_width];
    for (int i = 0; i < signal_frame_width; i++) {
        // value normalisation:
        int16_t c = (buffer[i + 1] << 8) | buffer[i];
        double normalisedValue = c / 32768.0;
        loadedSignal[iteration][i] = normalisedValue;
    }
    iteration++;
}
The problem is in this part; I don't exactly understand how it works:
int16_t c = (buffer[i + 1] << 8) | buffer[i];
It's example taken from here.
I'm working with 16-bit .wav files only. As you can see, my buffer is loading (e.g. for a sampling frequency of 44.1 kHz) 441 elements, each a 2-byte signed sample. How should I change the above code?
The original example, from which you constructed your code, used an array where each individual element represented a byte. It therefore needs to combine two consecutive bytes into a 16-bit value, which is what this line does:
int16_t c = (buffer[i + 1] << 8) | buffer[i];
It shifts the byte at index i+1 (here assumed to be the most significant byte) left by 8 positions, and then ORs the byte at index i onto that. For example, if buffer[i+1]==0x12 and buffer[i]==0x34, then you get
buffer[i+1] << 8 == 0x12 << 8 == 0x1200
0x1200 | buffer[i] == 0x1200 | 0x34 == 0x1234
(The | operator is a bitwise OR.)
Note that you need to be careful whether your WAV file is little-endian or big-endian (but the original post explains that quite well).
Now, if you store the resulting value in a signed 16-bit integer, you get a value between −32768 and +32767. The point of the actual normalization step (dividing by 32768) is just to bring the value range down to [−1.0, 1.0).
In your case above, you appear to already be reading into a buffer of 16-bit values. Note that your code will therefore only work if the endianness of your platform matches that of the WAV file you are working with. But if this assumption is correct, then you don't need the code line which you do not understand. You can just convert every array element into a double directly:
double normalisedValue = buffer[i]/32768.0;
If buffer were an array of bytes, then that piece of code would interpret two consecutive bytes as a single 16-bit integer (assuming little-endian encoding). The | operator performs a bitwise OR on the bits of the two bytes. Since we wish to interpret the two bytes as a single 2-byte integer, we must shift the bits of one of them 8 bits (1 byte) to the left. Which one depends on whether they are ordered in little-endian or big-endian order. Little-endian means that the least significant byte comes first, so we shift the second byte 8 bits to the left.
Example:
First byte: 0101 1100
Second byte: 1111 0100
Now shift second byte:
Second "byte": 1111 0100 0000 0000
First "byte": 0000 0000 0101 1100
Bitwise OR-operation (if either is 1, then 1. If both are 0, then 0):
16-bit integer: 1111 0100 0101 1100
In your case however, the bytes in your file have already been interpreted as 16-bit ints using whatever endianness the platform has, so you do not need this step. However, to correctly interpret the bytes in the file, one must assume the same byte order they were written in. Therefore, one usually adds this step so that the code works independently of the platform's endianness, relying instead on the expected byte order of the files (as most file formats specify what the byte order should be).
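As a sketch of that portable approach (assuming a little-endian sample stream, which is what WAV data specifies; the buffer handling is simplified and the helper name is illustrative):
#include <cstdint>
#include <cstdio>

// Combine two consecutive bytes of a little-endian stream into one signed
// 16-bit sample, independent of the platform's endianness.
int16_t sample_from_le_bytes(uint8_t low, uint8_t high)
{
    return static_cast<int16_t>(static_cast<uint16_t>(low) |
                                (static_cast<uint16_t>(high) << 8));
}

int main() {
    const uint8_t raw[] = {0x5C, 0xF4};               // the example bytes from the answer above
    int16_t c = sample_from_le_bytes(raw[0], raw[1]); // 0xF45C interpreted as a signed sample
    double normalised = c / 32768.0;
    std::printf("%d %f\n", c, normalised);            // prints -2980 and about -0.090942
    return 0;
}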