I have a requirement where 3 bytes (24 bits) need to be populated in a binary protocol. The original value is stored in an int (32 bits). One way to achieve this would be as follows:
Technique 1:
uint32_t x = 24;       /* use a 32-bit type; long is 8 bytes on x86_64 */
uint32_t y = htonl(x); /* convert to network (big-endian) byte order */
uint32_t z = y >> 8;
memcpy(dest, &z, 3);   /* memcpy needs the address of z */
Please let me know if the above is the correct way to do it.
The other way, which I don't understand, was implemented as below:
Technique 2:
typedef struct {
    char data1;
    char data2[3];
} some_data;

typedef union {
    uint32_t original_data;
    some_data data;
} combined_data;
uint32_t x = 24;
combined_data somedata;
somedata.original_data = htonl(x);
memcpy(dest, somedata.data.data2, 3);  /* the variable, not the type; data2 decays to a pointer */
What I don't understand is how the 3 bytes ended up in somedata.data.data2. Shouldn't the first byte go into somedata.data.data1 and only the next 2 bytes into somedata.data.data2?
This is an x86_64 platform running 2.6.x Linux with gcc.
PARTIALLY SOLVED:
The x86_64 platform is little endian: the least significant byte is stored at the lowest address. So a 32-bit variable with value 24 has the following memory representation (Byte1 is the lowest address, shown on the right):

|--Byte4--|--Byte3--|--Byte2--|--Byte1--|
|    0    |    0    |    0    |   0x18  |

With htonl() performed on that value, the memory becomes

|--Byte4--|--Byte3--|--Byte2--|--Byte1--|
|   0x18  |    0    |    0    |    0    |
In the struct some_data:
data1 = Byte1
data2[0] = Byte2
data2[1] = Byte3
data2[2] = Byte4
But my question still holds: why not simply right shift by 8 as shown in Technique 1?
A byte takes 8 bits :-)

int x = 24;
int y = x << 8;

Shifting by 0 changes nothing; shifting by 1 multiplies by 2, by 2 multiplies by 4, by 8 multiplies by 256.
If we are on a BIG ENDIAN machine, the 4 bytes are put in memory as 2143, and such algorithms won't work for numbers greater than 2^15. On the other hand, on a BIG ENDIAN machine you would have to define what "putting an integer in 3 bytes" means.
Hmm. I think the second proposed algorithm will be OK, but check the order of the bytes: you have them as 2143, and you need 321, I think. But better check it.
Edit: I checked on the wiki - x86 is little endian, they say, so the algorithms are OK.
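For what it's worth, here is a minimal sketch (the helper name put_u24_be is mine, not from the thread) of an endianness-independent way to write the low 24 bits in network order; the shifts operate on the value rather than on the memory representation, so no htonl is needed:

#include <stdint.h>

/* Write the low 24 bits of v to dest in network (big-endian) order.
   Works the same on little- and big-endian hosts. */
static void put_u24_be(unsigned char *dest, uint32_t v)
{
    dest[0] = (v >> 16) & 0xFF;  /* most significant byte first */
    dest[1] = (v >> 8) & 0xFF;
    dest[2] = v & 0xFF;
}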
Related
So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0 and the second uint8_t is 1, I expect the uint16_t to also be 1.
However, this is in my code not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to turn the uint8_t pointer bytes into a uint16_t pointer, and then read the value. I expect that value to be 1 since x86 is little endian. However, it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!
There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well-defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored at the lowest address. Thus the 2-byte value 1 is stored as 1, 0 and not 0, 1.
Does endianness also affect the order of the bits in the byte or not?
There is no way to access a bit by "address" [1], so there is no concept of endianness. When converting to text, bits are conventionally shown most significant on the left and least significant on the right, just like the digits of decimal numbers. I don't know if this is true in right-to-left writing systems.
[1] You can sort of create "virtual addresses" for bits using bit-fields. The order of bit-fields, i.e. whether the first bit-field is most or least significant, is implementation defined and not necessarily related to byte endianness at all.
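As a small illustration of that footnote, the snippet below (my example, not from the answer) shows where the first bit-field lands on one ABI; gcc and clang on x86 happen to put it in the least significant bit, but 0x80 would be an equally conforming result:

#include <cstdio>
#include <cstring>

struct Bits {
    unsigned char first  : 1;  // where this bit lands is implementation defined
    unsigned char second : 7;
};

int main()
{
    Bits b = {};
    b.first = 1;
    unsigned char raw;
    std::memcpy(&raw, &b, 1);
    std::printf("0x%02x\n", raw);  // 0x01 on x86 gcc/clang; 0x80 elsewhere is also legal
}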
Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:
#include <cstddef>  // std::byte

// no need to complicate a simple example with dynamic allocation
uint16_t out;

// note that there is an exception in the language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus the following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
data[0] = std::byte{1};  // std::byte requires an explicit conversion
data[1] = std::byte{0};
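In case it is useful, here is a self-contained version of that sketch; what it prints depends on the native endianness of the machine it runs on:

#include <cstddef>
#include <cstdint>
#include <iostream>

int main()
{
    std::uint16_t out;
    std::byte* data = reinterpret_cast<std::byte*>(&out);
    data[0] = std::byte{1};  // lowest address
    data[1] = std::byte{0};
    std::cout << out << '\n';  // 1 on little endian, 256 on big endian
}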
Note that assuming the input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating through a network, or accessing files that may be shared with other systems.
In these cases, the communication protocol, or the file format typically specify that the data is in specific endianness which may or may not be the same as the native endianness of your target system. De facto standard in network communication is to use big endian. Data in particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
In a little-endian system the low-order bytes are placed first. In other words: the low byte is placed at offset 0, and the high byte at offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between arithmetic and logical shifts: in C/C++, right-shifting an unsigned value is a logical shift, while right-shifting a signed value is (on typical implementations) an arithmetic shift.
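To make the round trip concrete, here is a short self-contained example (the variable names are mine) that composes a uint16_t from two bytes with shifts and then splits it apart again; the asserts hold on any endianness:

#include <cassert>
#include <cstdint>

int main()
{
    std::uint8_t low = 1, high = 0;

    // Compose: shifts work on values, not on memory layout,
    // so this is endianness-independent.
    std::uint16_t out = static_cast<std::uint16_t>(low | (high << 8));
    assert(out == 1);

    // Decompose the same way:
    std::uint8_t low2  = out & 0xFF;
    std::uint8_t high2 = (out >> 8) & 0xFF;
    assert(low2 == 1 && high2 == 0);
}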
If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since the Intel is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
Your code works, but you misinterpreted how to read "bytes":
#include <cstdint>
#include <iostream>

int main()
{
    uint8_t *in = new uint8_t[2];
    in[0] = 3;
    in[1] = 1;
    uint16_t out = *((uint16_t*)in);
    std::cout << "out: " << out << "\n in: " << in[1] * 256 + in[0] << std::endl;
    delete[] in;  // don't leak the buffer
    return 0;
}
By the way, you should take care of alignment when casting this way.
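One alignment-safe alternative (a sketch using the same values as above) is to memcpy the two bytes into a properly aligned integer instead of casting the pointer; the result is still in native byte order:

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    unsigned char in[2] = {3, 1};
    std::uint16_t out;
    std::memcpy(&out, in, sizeof out);  // no alignment or aliasing problems
    std::cout << out << '\n';           // 259 on little endian (1 * 256 + 3)
    return 0;
}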
One way to think about numbers is in MSB/LSB order, where MSB is the most significant bit and LSB the least significant bit.
For example:

(u)int32: MSB: bit 31 ... LSB: bit 0
(u)int16: MSB: bit 15 ... LSB: bit 0
(u)int8 : MSB: bit 7  ... LSB: bit 0
With your cast to a 16-bit value, the bytes arrange like this:

16 bit               <=  8 bit      8 bit
MSB ......... LSB        BYTE[1]    BYTE[0]
bit 15 .... bit 0        bit 7..0   bit 7..0
0000 0001 0000 0000      0000 0001  0000 0000

which is 256 -> the correct value.
I am an amateur at software programming.
However, I have a task to convert a data frame that is 40 bytes long, given in hex values, to binary and subsequently to decimal values. I tried converting the values from hex to binary after reading them byte by byte. That didn't work very well, because some of the fields in the frame span more than a single byte.
Let me explain in a little more detail. I have a 40-byte data frame that reads in hex like this:
0 40 ffffff82 2 0 0 28 6d
ffffffaf ffffffc8 0 41 0 8 78 8
72 17 16 16 0 42 0 2
1 2 1 16 ffffffff ffffffff 0 43
0 0 3 0 0 2 8 0
The reason I prefer not to convert this data by reading one byte at a time is that not every displayed byte is meaningful on its own. Please read on to understand what I mean.
For example:
The 1st to 6th bytes represent data that are just 1 byte each: the 1st byte is status, the 2nd byte is unit voltage, the 3rd is unit current, and so forth.
The 7th and 8th bytes together represent one 2-byte datum, unit SOC; that is, unit SOC is a 16-bit value.
The 9th, 10th and 11th bytes together indicate Module 1 cell failure information, i.e. the failure information is a 24-bit value.
The 12th, 13th and 14th bytes together indicate Module 2 cell failure information, etc.
This being the case, how can I convert the incoming data frame into binary and subsequently to decimal without reading it byte after byte?
I would appreciate it if someone could lend a helping hand with this.
Suppose you have read your data frame into a buffer like this:
unsigned char inputbuffer[40];
Set a pointer pointing to the beginning of the buffer:
unsigned char *p = inputbuffer;
You can extract single-byte fields trivially:
int status = *p++; /* first byte */
int maxvoltage = *p++; /* second byte */
int current = *p++; /* third byte */
A two-byte field is only slightly more complicated:
int soc = *p++;
soc = (soc << 8) | *p++;
This reads two bytes for soc, concatenating them together as firstbyte+secondbyte. That assumes that the data frame uses what's called "big endian" byte order (that is, most-significant or "biggest" byte first). If that gives you crazy values, it's likely that the data uses "little endian" order, in which case you can flip the bytes around, yielding secondbyte+firstbyte, by reading them like this instead:
int soc = *p++;
soc = soc | (*p++ << 8);
Alternatively, you can dispense with the pointer p, and access various bytes out of the inputbuffer array directly, although in that case you need to remember that arrays in C are 0-based:
int status = inputbuffer[0]; /* first byte */
int maxvoltage = inputbuffer[1]; /* second byte */
int current = inputbuffer[2]; /* third byte */
int soc = (inputbuffer[6] << 8) | inputbuffer[7];
or, for little-endian data:
int soc = inputbuffer[6] | (inputbuffer[7] << 8);
You can almost follow the same pattern for your 24-bit fields, except that for portability (and especially if you're on an old 16-bit machine) you need to take care to use a long int:
long int module_1_cell_failure = *p++;
module_1_cell_failure = (module_1_cell_failure << 8) | *p++;
module_1_cell_failure = (module_1_cell_failure << 8) | *p++;
or, if the data is little endian:
long int module_1_cell_failure = *p++;
module_1_cell_failure |= (*p++ << 8);
module_1_cell_failure |= ((unsigned long)*p++ << 16);
or, equivalently, without the pointer:
long int module_1_cell_failure =
inputbuffer[8] | (inputbuffer[9] << 8) |
((unsigned long)inputbuffer[10] << 16);
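Putting it together on the sample frame from the question (assuming big-endian fields; the ffffff.. values in the dump are sign-extended bytes, of which only the low byte matters):

#include <stdio.h>

int main(void)
{
    /* first 8 bytes of the sample frame */
    unsigned char inputbuffer[8] = {0x00, 0x40, 0x82, 0x02, 0x00, 0x00, 0x28, 0x6d};

    int status = inputbuffer[0];                       /* 1-byte field */
    int soc = (inputbuffer[6] << 8) | inputbuffer[7];  /* 2-byte field, big endian */
    printf("status = %d, soc = %d\n", status, soc);    /* soc = 0x286d = 10349 */
    return 0;
}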
I'm programming with a PLC and I'm reading values out of it.
It gives me the data as unsigned char. That's fine, but the values in my PLC can be over 255, and since an unsigned char can't hold a value over 255 I get the wrong information.
The structure I get from the library:
struct PlcVarValue
{
unsigned long ulTimeStamp ALIGNATTRIB;
unsigned char bQuality ALIGNATTRIB;
unsigned char byData[1] ALIGNATTRIB;
};
ulTimeStamp gives the time
bQuality gives true/false (be able to read it or not)
byData[1] gives the data.
Anyways, I'm trying this now (where ppValues[0] points to a PlcVarValue):
unsigned char* variableValue = ppValues[0]->byData;
int iVariableValue = *variableValue;
This works fine... until the value in ppValues[0]->byData is > 255.
When I try the following when the number is for example 257:
unsigned char testValue = ppValues[0]->byData[0];
unsigned char testValue2 = ppValues[0]->byData[1];
the output is testValue = 1 and testValue2 = 1.
That doesn't make sense to me.
So my question is, how can I get this solved so it gives me the correct number?
That actually looks like a variable-sized structure; putting an array of size 1 at the end is a common way to implement one. See e.g. this tutorial about it.
In this case, both bytes being 1 for the value 257 is correct. Think of the two bytes as a 16-bit value and combine the bits: one byte becomes the high byte, where a 1 corresponds to 256; then add the low byte, which is 1, and you have 256 + 1, which of course is equal to 257. Simple binary arithmetic.
Which byte is the high one and which is the low one we can't say, but it's easy to check if you can force a message that contains the value 258 instead: then one byte will still be 1, but the other will be 2.
How to combine it into a single unsigned 16-bit value is also easy if you know the bitwise shift and or operators:
uint8_t high_byte = ...
uint8_t low_byte = ...
uint16_t word = high_byte << 8 | low_byte;
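A sketch of how that could look with the struct from the question, assuming byData[0] turns out to be the low byte (verify with the value-258 test above, and swap the bytes if not):

#include <cstdint>

// Assumption: byData[0] is the low byte and byData[1] the high byte.
std::uint16_t plc_value(const unsigned char* byData)
{
    return static_cast<std::uint16_t>(byData[0] | (byData[1] << 8));
}

// usage: int iVariableValue = plc_value(ppValues[0]->byData);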
I am writing a program and using memcpy to copy some bytes of data, using the following code;
#define ETH_ALEN 6
unsigned char sourceMAC[6];
unsigned char destMAC[6];
char* txBuffer;
....
memcpy((void*)txBuffer, (void*)destMAC, ETH_ALEN);
memcpy((void*)(txBuffer+ETH_ALEN), (void*)sourceMAC, ETH_ALEN);
Now I want to copy some data onto the end of this buffer (txBuffer) that doesn't finish on a whole byte boundary, i.e. its size is not a multiple of 8 bits, so memcpy() can't be used directly (I believe?).
I want to add 16 more bits worth of data, which is a round 2 bytes. First I need to put a value into the next 3 bits of txBuffer, which I have stored in an int, then a fourth bit which is always 0. Next I need to copy in another 12-bit value, again stored in an int.
So the first decimal value, stored in an int, is between 0 and 7 inclusive; the second number, which goes into the final 12 bits, is within the range of 2^12. Should I, for example, 'bit-copy' the last three bits of the int into memory, or merge all these values together somehow?
Is there a way I can compose these three values into 2 bytes to copy with memcpy, or should I use something like bitset to copy them in, a bit at a time?
How should I solve this issue?
Thank you.
Assuming int is 4 bytes on your platform
int composed = 0;
int three_bits = something;
int twelve_bits = something_else;
/* bits 0-2: three_bits; bit 3: always 0; bits 4-15: twelve_bits */
composed = (three_bits & 0x7) | ((twelve_bits & 0xFFF) << 4);
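To then append those 16 bits to txBuffer, one could serialize composed in an explicit byte order (a sketch; the high-byte-first order and the 2 * ETH_ALEN offset are assumptions to adjust to the real protocol):

/* Append the 16 meaningful bits of `composed` to the buffer,
   high byte first. */
void append_fields(unsigned char *buf, unsigned composed)
{
    buf[0] = (composed >> 8) & 0xFF;  /* high byte */
    buf[1] = composed & 0xFF;         /* low byte */
}

/* usage: append_fields((unsigned char*)txBuffer + 2 * ETH_ALEN, composed); */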
I am using MS C++. I am using a struct like:
struct header {
    unsigned port : 16;
    unsigned destport : 16;
    unsigned not_used : 7;
    unsigned packet_length : 9;
};

struct header HR;
I need to put this header value into a separate char array. I did

memcpy(&REQUEST[0], &HR, sizeof(HR));

but the value of packet_length is not appearing properly. For example, if I assign HR.packet_length = 31, I get -128 (at the fifth byte) and 15 (at the sixth byte).
I'd appreciate help with this, or a more elegant way to do it.
Thanks
Sounds like the expected behaviour with your struct, as you defined packet_length to be 9 bits long, so the lowest bit of its value already falls within the fifth byte of the memory. Hence the value -128 you see there (a 1 in the highest bit of a signed char is interpreted as a negative value), and the value 15 is what is left in the 6th byte.
The memory bits look like this (in reverse order, i.e. higher to lower bits):

byte 6          | byte 5          | ...
0 0 0 0 1 1 1 1 | 1 0 0 0 0 0 0 0 | ...
<--- packet_length (9 bits) ---><-- not_used (7 bits) --> ...
Note also that this approach may not be portable, as the byte order inside multibyte variables is platform dependent (see endianness).
Update: I am not an expert in cross-platform development, and you didn't give many details about the layout of your request etc. Anyway, in this situation I would try to set the fields of the request individually instead of memcpying the struct into it, as sketched below. That way I can at least control the exact value of each individual field.
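For instance, setting the fields individually could look like this (a sketch assuming REQUEST is a byte array and assuming a big-endian wire layout; both are assumptions to adjust to the actual protocol):

/* 16 + 16 + 7 + 9 bits = 6 bytes on the wire */
REQUEST[0] = HR.port >> 8;
REQUEST[1] = HR.port & 0xFF;
REQUEST[2] = HR.destport >> 8;
REQUEST[3] = HR.destport & 0xFF;
REQUEST[4] = (HR.not_used << 1) | (HR.packet_length >> 8);  /* 7 bits + top bit of the 9 */
REQUEST[5] = HR.packet_length & 0xFF;                       /* low 8 bits */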
#include <stdio.h>

struct header {
    unsigned port : 16;
    unsigned destport : 16;
    unsigned not_used : 7;
    unsigned packet_length : 9;
};

int main(){
    struct header HR = {.packet_length = 31};
    printf("%u\n", HR.packet_length);
}
$ gcc new.c && ./a.out
31
Update (from the asker): I know that I can print the value directly by using the struct member. But I need to send this struct over the network, and there I am using Java.
In that case, pack the fields into an array of chars (16+16+7+9 = 48 bits, i.e. 6 bytes) and parse it on the other side in Java.
The size of the array will be less than that of the struct, and more packing is possible within a single MTU.