Memory layout in memset - c++

I have this "buggy" code :
int arr[15];
memset(arr, 1, sizeof(arr));
memset sets each byte to 1, but since int is generally 4 bytes, it won't give the desired output. I know that each int in the array will be initialized to 0x01010101 = 16843009. Since I have a (very) weak understanding of hex values and memory layouts, can someone explain why it gets initialized to that hex value? What will be the case if I have, say, 4 in place of 1?

If I trust the man page
The memset() function writes len bytes of value c (converted to an unsigned char) to the string b.
In your case it will convert 0x00000001 (as an int) into 0x01 (as an unsigned char), then fill each byte of the memory with this value. You can fit four of those in an int, so each int will become 0x01010101.
If you had 4, it would be cast to the unsigned char 0x04, and each int would be filled with 0x04040404.
Does that make sense to you?

What memset does is
Converts the value ch to unsigned char and copies it into each of the first count characters of the object pointed to by dest.
So, first your value (1) will be converted to unsigned char, which occupies 1 byte, so that will be 0b00000001. Then memset will fill the whole array's memory with this value. Since an int takes 4 bytes on your machine, the value of each int in the array will be 0b00000001000000010000000100000001, which is 16843009. If you place another value instead of 1, the array's memory will be filled with that value instead.

Note that memset converts its second argument to an unsigned char which is one byte. One byte is eight bits, and you're setting each byte to the value 1. So we get
0b00000001 00000001 00000001 00000001
or in hexadecimal,
0x01010101
or the decimal number 16843009. Why that value? Because
0b00000001000000010000000100000001 = 1*2^0 + 1*2^8 + 1*2^16 + 1*2^24
= 1 + 256 + 65536 + 16777216
= 16843009
Each group of four binary digits corresponds to one hexadecimal digit. Since 0b0000 = 0x0 and 0b0001 = 0x1, your final value is 0x01010101. With memset(arr, 4, sizeof(arr)); you would get 0x04040404 and with 12 you would get 0x0c0c0c0c.
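For what it's worth, here is a minimal sketch (standard C++, nothing beyond <cstring> and <iostream> assumed) that lets you see the filled value directly:
#include <cstring>
#include <iostream>

int main()
{
    int arr[15];
    std::memset(arr, 1, sizeof(arr));          // every byte of arr becomes 0x01
    std::cout << arr[0] << '\n';               // 16843009
    std::cout << std::hex << arr[0] << '\n';   // 1010101, i.e. 0x01010101

    std::memset(arr, 4, sizeof(arr));          // every byte becomes 0x04
    std::cout << arr[0] << '\n';               // 4040404 (std::hex is still in effect)
}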

Related

converting unsigned char > 255 to int

I'm programming with a PLC and I'm reading values out of it.
It gives me the data as unsigned char. That's fine, but the values in my PLC can be over 255, and since an unsigned char can't hold a value over 255, I get the wrong information.
The structure I get from the library:
struct PlcVarValue
{
    unsigned long ulTimeStamp ALIGNATTRIB;
    unsigned char bQuality ALIGNATTRIB;
    unsigned char byData[1] ALIGNATTRIB;
};
ulTimeStamp gives the time
bQuality gives true/false (whether it could be read or not)
byData[1] gives the data.
Anyways I'm trying this now: (where ppValues is an object of PlcVarValue)
unsigned char* variableValue = ppValues[0]->byData;
int iVariableValue = *variableValue;
This works fine... until ppValues[0]->byData is > 255.
When I try the following when the number is for example 257:
unsigned char testValue = ppValues[0]->byData[0];
unsigned char testValue2 = ppValues[0]->byData[1];
the output is testValue = 1 and testValue2 = 1
that doesn't make sense to me.
So my question is, how can I get this solved so it gives me the correct number?
That actually looks like a variable-sized structure, where an array of size 1 at the end is a common way to declare it. See e.g. this tutorial about it.
In this case, both bytes being 1 is the correct result for the value 257. Think of the two bytes as one 16-bit value and combine them: one byte becomes the high byte, where a 1 corresponds to 256; add the low byte, which is 1, and you have 256 + 1, which of course equals 257. Simple binary arithmetic.
Which byte is the high one and which is the low one we can't say, but it's easy to check if you can force a message that contains the value 258 instead: then one byte will still be 1 but the other will be 2.
How to combine it into a single unsigned 16-bit value is also easy if you know the bitwise shift and or operators:
uint8_t high_byte = ...
uint8_t low_byte = ...
uint16_t word = high_byte << 8 | low_byte;
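Sketching that out against your struct (which index is the high byte is an assumption here; verify it with the 258 test described above):
#include <cstdint>

// Combine two bytes from the PLC buffer into one 16-bit value.
// byData[1] as the high byte is an assumption; if your 258 test says otherwise, swap the indices.
uint16_t combine(const unsigned char* byData)
{
    return static_cast<uint16_t>(byData[1]) << 8   // assumed high byte: 1 -> 256
         | byData[0];                              // assumed low byte:  1 -> +1
}

// Hypothetical usage: uint16_t value = combine(ppValues[0]->byData);  // 257 for bytes {1, 1}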

What's the difference among '0', '\0' and 0 with sizeof() and strlen()?

#include <iostream>
#include <cstring>

int main(int argc, char* argv[])
{
    int pt[4] = {'0','\0',0};
    std::cout << "size of pt: " << sizeof(pt) << std::endl;
    std::cout << "strlen of pt: " << strlen((char*)pt) << std::endl;
}
the result is:
size of pt: 16
strlen of pt: 1
and when I change int pt[4] = {'0','\0',0}; to int pt[4] = {'\0','0',0};
the result is
size of pt: 16
strlen of pt: 0
Why?
'0' is the "ASCII character 0" and has the value 0x30.
'\0' is the character representing the value 0 and has the value 0.
0 is just the value 0.
pt is an array of 4 integers, so its size is 4x the size of an integer on your machine (which is evidently 4), so you get 16.
Since pt is an array of integers whose first element is '0', i.e. 0x30, that element as an integer is 0x00000030. When you cast pt to a character pointer, it looks (because of the endianness of your particular architecture) like a pointer to a character string whose first byte is 0x30 followed by three zero bytes, so the strlen is 1. After the swap the first element is '\0', so the very first byte is zero and the strlen is 0.
'0' is a character with the value 48, representing the printable and displayable digit.
'\0' and 0 are both the value 0, with the first having a character type and the second being an integer literal.
sizeof gives the number of bytes in an object or array. strlen counts the number of bytes from the start of an array of char to the first byte with the value 0, and does not include the terminating 0. In the case of your example, you have an array of 4 ints, with each int taking 4 bytes; 4*4=16.
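To make that concrete, here is a small hedged sketch (it assumes a little-endian machine with 4-byte int, matching your output) that dumps the bytes strlen actually walks over:
#include <cstring>
#include <iostream>

int main()
{
    int a[4] = {'0', '\0', 0};   // first int is 0x30: bytes 30 00 00 00 ... -> strlen 1
    int b[4] = {'\0', '0', 0};   // first int is 0x00: bytes 00 00 00 00 ... -> strlen 0

    const unsigned char* pa = reinterpret_cast<const unsigned char*>(a);
    for (std::size_t i = 0; i < sizeof(a); ++i)
        std::cout << std::hex << static_cast<int>(pa[i]) << ' ';
    std::cout << '\n';

    std::cout << std::dec
              << std::strlen(reinterpret_cast<const char*>(a)) << ' '
              << std::strlen(reinterpret_cast<const char*>(b)) << '\n';
}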

c++; Is bitset the solution for me?

I am writing a program and using memcpy to copy some bytes of data, using the following code;
#define ETH_ALEN 6
unsigned char sourceMAC[6];
unsigned char destMAC[6];
char* txBuffer;
....
memcpy((void*)txBuffer, (void*)destMAC, ETH_ALEN);
memcpy((void*)(txBuffer+ETH_ALEN), (void*)sourceMAC, ETH_ALEN);
Now I want to copy some data onto the end of this buffer (txBuffer) that is not a multiple of 8 bits (it doesn't finish on a whole byte boundary), so memcpy() can't be used (I believe?).
I want to add 16 more bits' worth of data, which is a round 2 bytes. First I need to add a value into the next 3 bits of txBuffer, which I have stored in an int, and then a fourth bit which is always 0. Next I need to copy another 12-bit value, which I also have in an int.
So the first value, stored in an int, is between 0 and 7 inclusive; the second number, which goes into the final 12 bits, is also stored in an int and its value is within the range of 2^12. Should I, for example, 'bit-copy' the last three bits of the int into memory, or merge all these values together somehow?
Is there a way I can combine these three values into 2 bytes to copy with memcpy, or should I use something like bitset to copy them in, a bit at a time?
How should I solve this issue?
Thank you.
Assuming int is 4 bytes on your platform:
int composed = 0;
int three_bits = something;        // a value in the range 0..7
int twelve_bits = something_else;  // a value in the range 0..4095
composed = (three_bits & 0x07)           // bits 0-2: the 3-bit value
         | (0 << 3)                      // bit 3: always 0, as you describe
         | ((twelve_bits & 0xFFF) << 4); // bits 4-15: the 12-bit value
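And if it helps, a hedged sketch of appending those 16 bits to the buffer with memcpy (it writes the two bytes in whatever order your machine uses; a wire format may require a specific byte order instead):
#include <cstdint>
#include <cstring>

// Append the packed 16 bits after the two MAC addresses already in the buffer.
// The bit layout follows the description in the question.
void append_fields(char* txBuffer, int three_bits, int twelve_bits)
{
    uint16_t composed = (three_bits & 0x07)            // bits 0-2
                      | (0 << 3)                       // bit 3, always 0
                      | ((twelve_bits & 0xFFF) << 4);  // bits 4-15
    std::memcpy(txBuffer + 12 /* 2 * ETH_ALEN */, &composed, sizeof(composed));
}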

How is a pipe reading with a size of 4 bytes into a 4 byte int returning more data?

Reading from a pipe:
unsigned int sample_in = 0; // 4 bytes - 32 bits, right?
unsigned int len = sizeof(sample_in); // = 4 in debugger
while (len > 0)
{
    if (0 == ReadFile(hRead,
                      &sample_in,
                      sizeof(sample_in),
                      &bytesRead,
                      0))
    {
        printf("ReadFile failed\n");
    }
    len -= bytesRead; // bytesRead always = 4, so far
}
In the debugger, first iteration through:
sample_in = 536739282 //36 bits?
How is this possible if sample_in is an unsigned int? I think I'm missing something very basic, go easy on me!
Thanks
Judging from your comment that says //36 bits? I suspect that you're expecting the data to be sent in a BCD-style format: in other words, each decimal digit stored in four bits, i.e. two digits per byte. That would waste space, however: each digit would use four bits, but the values 10 to 15 would never occur.
In fact, integers are represented in binary internally, which allows a 32-bit number to represent 2^32 different values. The largest of these (unsigned) is 4,294,967,295, which happens to be rather larger than the number you saw in sample_in.
536739282 is well within the maximum of an unsigned 4-byte integer, which is upwards of 4 billion.
536,739,282 will easily fit in an unsigned int and in 32 bits. The cap on an unsigned int is 4,294,967,295 (about 4.29 billion).
unsigned int, your 4 byte unsigned integer, allows for values from 0 to 4,294,967,295. This will easily fit your value of 536,739,282. (This would, in fact, even fit in a standard signed int.)
For details on allowable ranges, see MSDN's Data Type Ranges page for C++.
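If you'd rather check the bounds on your own platform than trust documentation, a quick sketch with std::numeric_limits:
#include <iostream>
#include <limits>

int main()
{
    // Maximum value an unsigned int can hold on this platform (4294967295 for a 32-bit unsigned int).
    std::cout << std::numeric_limits<unsigned int>::max() << '\n';
    // 536739282 is comfortably below that maximum.
    std::cout << (536739282u <= std::numeric_limits<unsigned int>::max()) << '\n';  // prints 1
}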

unsigned char array to unsigned int back to unsigned char array via memcpy is reversed

This isn't cross-platform code... everything is being performed on the same platform (i.e. endianess is the same.. little endian).
I have this code:
unsigned char array[4] = {'t', 'e', 's', 't'};
unsigned int out = ((array[0]<<24)|(array[1]<<16)|(array[2]<<8)|(array[3]));
std::cout << out << std::endl;
unsigned char buff[4];
memcpy(buff, &out, sizeof(unsigned int));
std::cout << buff << std::endl;
I'd expect the output of buff to be "test" (with a garbage trailing character because of the lack of '\0') but instead the output is "tset". Obviously changing the order of the characters that I'm shifting (3, 2, 1, 0 instead of 0, 1, 2, 3) fixes the problem, but I don't understand the problem. Is memcpy not acting the way I expect?
Thanks.
This is because your CPU is little-endian. In memory, the array is stored as:
        +----+----+----+----+
array:  | 74 | 65 | 73 | 74 |
        +----+----+----+----+
This is represented with increasing byte addresses to the right. However, the integer is stored in memory with its least significant byte first, i.e. at the left:
        +----+----+----+----+
out:    | 74 | 73 | 65 | 74 |
        +----+----+----+----+
This happens to represent the integer 0x74657374. Using memcpy() to copy that into buff reverses the bytes from your original array.
You're running this on a little-endian platform.
On a little-endian platform, a 32-bit int is stored in memory with the least significant byte in the lowest memory address. So bits 0-7 are stored at address P, bits 8-15 in address P + 1, bits 16-23 in address P + 2 and bits 24-31 in address P + 3.
In your example: bits 0-7 = 't', bits 8-15 = 's', bits 16-23 = 'e', bits 24-31 = 't'
So that's the order that the bytes are written to memory: "tset"
If you address the memory then as separate bytes (unsigned chars), you'll read them in the order they are written to memory.
On a little-endian platform the output should be tset. The original sequence was test from lower addresses to higher addresses. Then you put it into an unsigned int with the first 't' going into the most significant byte and the last 't' going into the least significant byte. On a little-endian machine the least significant byte is stored at the lower address, and that is the order in which the bytes are copied into the final buff, so the output runs from the last 't' to the first 't', i.e. tset.
On a big-endian machine you would not observe the reversal.
You have written a test for platform byte order, and it has concluded: little endian.
How about adding a '\0' to your buff?
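If the goal is simply to see "test" again, here are two hedged options (both assume the same little-endian machine as in your output): skip the integer round-trip entirely, or build the int so its in-memory byte order matches the array:
#include <cstring>
#include <iostream>

int main()
{
    unsigned char array[4] = {'t', 'e', 's', 't'};

    // Option 1: copy the bytes as bytes; no integer, so no byte-order question.
    char buff1[5] = {};
    std::memcpy(buff1, array, 4);                  // buff1 is "test"

    // Option 2: put array[0] in the least significant byte, which a little-endian
    // machine stores first in memory.
    unsigned int out = (array[3] << 24) | (array[2] << 16) | (array[1] << 8) | array[0];
    char buff2[5] = {};
    std::memcpy(buff2, &out, sizeof(out));         // buff2 is "test" on this machine

    std::cout << buff1 << '\n' << buff2 << '\n';
}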