gcc calloc behavior when allocating memory to void* - c++

I know this is not a good way to use void pointers, but I am curious about this behavior.
#include <iostream>
using namespace std;

int main()
{
    void* ptr;
    void* next;
    ptr = ::operator new(8);
    next = (void*)((char*)ptr + 1);
    *(int*)next = 90;
    *(int*)ptr = 100;
    cout << "Ptr = " << ptr << endl;
    cout << "Ptr-> " << *(int*)ptr << endl;
    cout << "Next = " << next << endl;
    cout << "Next-> " << *(int*)next << endl;
}
---------------
Result :
Ptr = 0x234e010
Ptr-> 100
Next = 0x234e011
Next-> 0
---------------
On both 64-bit (x86_64) and 32-bit (x86) platforms (Linux and Windows), when I add more than 4 bytes to ptr, the 90 remains where next points; but when I add less than 4, *next suddenly becomes 0 once *ptr = 100 executes. I get the same result with calloc and malloc.
Why does changing the content where ptr points also change the content where ptr+1 points? And why does this not happen when we go more than 4 bytes in? If it is about memory alignment, why is the behavior the same on both 32-bit and 64-bit platforms?
I also get the same result in Delphi (Windows 32-bit/Embarcadero).
Thank you, and sorry for my bad English.

As you're on an x86 architecture, alignment won't be a problem: the processor can access unaligned memory, even if it's slower.
Intel processors store data with the least significant byte FIRST. Google "endianness" for a more in-depth explanation. What happens is this:
vvvvvvvvvvvvvvvv this is ptr
+===+===+===+===+===+===+===+===+
|   | 90| 0 | 0 | 0 |   |   |   |
+===+===+===+===+===+===+===+===+
    ^^^^^^^^^^^^^^^^ this is next, after you execute '*next = 90'
Now when you store 100 to *ptr, this changes to
vvvvvvvvvvvvvvvv this is ptr
+===+===+===+===+===+===+===+===+
|100| 0 | 0 | 0 | 0 |   |   |   |
+===+===+===+===+===+===+===+===+
    ^^^^^^^^^^^^^^^^ this is next
so *next becomes 0, because its first 3 bytes get overwritten with 0, and its last byte was 0 anyway.
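You can watch this happen by dumping the raw bytes after each store. Here is a minimal sketch of that idea (the dump helper and the expected-output comments are mine, assuming a little-endian machine like the asker's; the bytes marked ?? are whatever the allocator left there):

#include <cstdio>
#include <new>

static void dump(const void* p, int n) {
    const unsigned char* b = static_cast<const unsigned char*>(p);
    for (int i = 0; i < n; ++i)
        std::printf("%02X ", b[i]);
    std::printf("\n");
}

int main() {
    void* ptr = ::operator new(8);
    void* next = (char*)ptr + 1;
    *(int*)next = 90;   dump(ptr, 8); // ?? 5A 00 00 00 ?? ?? ??
    *(int*)ptr = 100;   dump(ptr, 8); // 64 00 00 00 00 ?? ?? ??
    ::operator delete(ptr);
}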

ints are (typically) 4 bytes. You're only adding one byte to ptr for next, rather than 4. This means the ints at ptr and next overlap, so your assignment of 100 clobbers three of the four bytes of the 90, and the one byte it doesn't touch was already 0.
You need:
next = (void*)((char*)ptr + sizeof(int));
And really you need to just use properly typed pointers.
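For completeness, a sketch of the question's program with that one-line fix applied (includes added so it compiles standalone):

#include <iostream>
using namespace std;

int main()
{
    void* ptr = ::operator new(8);
    void* next = (void*)((char*)ptr + sizeof(int)); // ints no longer overlap
    *(int*)next = 90;
    *(int*)ptr = 100;
    cout << "Ptr-> " << *(int*)ptr << endl;   // 100
    cout << "Next-> " << *(int*)next << endl; // 90
    ::operator delete(ptr);
}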

What appears to be happening is that the eight bytes of memory have the following contents after each operation:
*(int*)next = 90; // 90 == 0x00 00 00 5A
--> 0x?? 5A 00 00 00 ?? ?? ??
*(int*)ptr = 100; // 100 == 0x00 00 00 64
--> 0x64 00 00 00 00 ?? ?? ??
As a result, *(int*)next is now equal to zero. Note that you can't count on this behavior, since it depends on the underlying hardware.

This has nothing to do with allocation or void*, and everything to do with your assumptions about how values are stored in memory.
You wrote an int, a unit occupying 4 bytes on your system, to address 101, and then wrote another to address 100, overwriting 3 bytes of the first value.
Unfortunately, the system you used doesn't have the endianness/byte ordering you assumed, whereby the least significant byte of the first value would occupy address 104. The result was that one of the three zero bytes from your second write landed on the first value's only nonzero byte, making it zero.
This happened because you invoked undefined behavior, using char offsets to bit-twiddle int values.
If ints ever become 8 bytes, anyone doing this kind of hand manipulation of pointers will have the same problem.
Incidentally, you should be able to tell from this something about the byte order of your system's numeric representation.

Related

Assign char address to an int pointer, and consequently write an int-size piece of memory

From the book Stroustrup, Programming: Principles and Practice Using C++, §17.3, about memory, addresses and pointers, it is supposed to be allowed to assign a char* to int*:
char ch1 = 'a';
char ch2 = 'b';
char ch3 = 'c';
char ch4 = 'd';
int* pi = &ch3; // point to ch3, a char-size piece of memory
*pi = 12345; // write to an int-size piece of memory
*pi = 67890;
Graphically we have something like this (the book's figure is not reproduced here); quoting from the source:
Had the compiler allowed the code, we would have been writing 12345 to the memory starting at &ch3. That would definitely have changed the value of some nearby memory, such as ch2 or ch4, or we would have overwritten part of pi itself.
In that case, the next assignment *pi = 67890 would place 67890 in some completely different part of memory.
I don't understand why the next assignment would place it "in some completely different part of memory". The address stored in int* pi is still &ch3, so that assignment would overwrite the content at that address, i.e. the 12345. Why isn't that so?
Please, can you help me? Many thanks!
char ch3 = 'c';
int* pi = &ch3;
it is supposed to be allowed to assign a char* to int*:
Not quite - there is an alignment concern. It is undefined behavior (UB):
If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. C17dr § 6.3.2.3 7
Example: some processors require an int* to be even, and if &ch3 is odd, storing the address might fail, and dereferencing it certainly fails: bus error.
The next line is certainly UB, as the destination is outside the memory of ch3.
ch1, ch2, ch4 might be nearby and provide some seemingly reasonable result, but it is still UB.
// undefined behavior
*pi = 12345; // write to an int-size piece of memory
When code attempts to write outside its bounds - it is UB, anything may happen, including writing into neighboring data.
The address stored in int *pi is still &ch3
Maybe, maybe not. UB has occurred.
why the next assignment would place it: in some completely different part of memory?
The abusive code suggests that pi itself is overwritten by *pi = 12345;. This might happen, it might not. It is UB. A subsequent use of *pi is simply more UB.
Recall with UB you might get what you hope for, you might not - it is not defined by C.
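If the goal really is to move an int's bytes into char storage, the well-defined way is memcpy into a buffer that is actually big enough; a minimal sketch (variable names are illustrative):

#include <cstdio>
#include <cstring>

int main() {
    char buf[sizeof(int)]; // big enough, and alignment is irrelevant for memcpy
    int v = 12345;
    std::memcpy(buf, &v, sizeof v);   // well-defined: copies the object representation
    int back;
    std::memcpy(&back, buf, sizeof back);
    std::printf("%d\n", back);        // 12345
}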
You seem to have skipped part of the explanation you quoted:
or we would have overwritten part of pi itself
Think of it this way: since ints are larger than chars, if an int* points to a location that stores a char, assigning an integer value to that location overflows the char's storage. You only have a single byte of memory allocated there but are assigning 4 bytes' worth of data; you cannot fit 4 bytes into one, so the other 3 bytes have to go somewhere.
Assume then that the overflowing bytes partially change the value stored in pi. Now the next assignment will go to some unrelated memory location.
Let's assume the memory address layout is:
0 1 2 3 4 5 6 7
From the left 0, 1, 2 and 3 are characters. From the right 4, 5, 6 and 7 are an int*.
The values in each byte in hex may be:
61 62 63 64 02 00 00 00
Note how the first four are ASCII values and the last four are the address of ch3. Writing *pi = 12345; changes the values like so:
61 62 39 30 00 00 00 00
with 39 30 00 00 being 12345 (0x3039) laid out in little-endian byte order.
The next write, *pi = 67890;, would therefore start from memory address 00 00 00 00, not 02 00 00 00 as one might expect.
Firstly, you have to understand that everything is a number; a char, an int, and an int* all just hold numbers, and memory addresses are numbers too. Let's assume the current example compiles and we have memory like the following:
--------------------------
Address | Variable | Value
--------------------------
0x01    | ch1      | a
0x02    | ch2      | b
0x03    | ch3      | c
0x04    | ch4      | d
0x05    | pi       | &ch3 = 0x03
Now let's dereference pi and reassign a new value to ch3:
*pi = 12345;
Let's assume int is 4 bytes. Since pi is an int pointer, it will write a 4-byte value to the location pointed to by pi. Now, a char can only hold one-byte values, so what would happen if we try to write 4 bytes to that location? Strictly speaking, this is undefined behaviour, but I will try to explain what the author means.
Since char cannot hold values larger than 1 byte, *pi = 12345 will overflow ch3. When this overflow happens, the remaining 3 of the 4 bytes may get written to the nearby memory locations. What memory locations do we have nearby? ch4 and pi itself! ch4 can only hold 1 byte as well, which leaves us with 2 bytes, and the next location is pi itself. Meaning pi will overwrite its own value!
--------------------------
Address | Variable | Value
--------------------------
0x01    | ch1      | a
0x02    | ch2      | b
0x03    | ch3      | 12  // 12 ended up here
0x04    | ch4      | 34  // 34 ended up here
0x05    | pi       | 5   // the 5 got written here, clobbering part of pi
As you can see, pi now points to some other memory address, which is definitely not &ch3.

Escape sequence in char confusion

I came across this example:
#include <iostream>

int main()
{
    int i = 7;
    char* p = reinterpret_cast<char*>(&i);
    if (p[0] == '\x7') // POINT OF INTEREST
        std::cout << "This system is little-endian\n";
    else
        std::cout << "This system is big-endian\n";
}
What I'm confused about is the if statement. How do the escape sequences behave here? I get the same result with p[0] == '\07' (\x marks a hexadecimal escape sequence, while '\07' is octal). How would checking p[0] == '\x7' tell me whether the system is little- or big-endian?
The layout of a (32-bit) integer in memory is:
Big endian:
+-----+-----+-----+-----+
| 0 | 0 | 0 | 7 |
+-----+-----+-----+-----+
^ pointer to int points here
Little endian:
+-----+-----+-----+-----+
| 7 | 0 | 0 | 0 |
+-----+-----+-----+-----+
^ pointer to int points here
What the code basically does is read the first char that the integer pointer points to, which in the case of little endian is \x7, and in the case of big endian is \x0.
Hex 7, octal 7, and decimal 7 all happen to be the same value.
The check is intended to try to determine if the value ends up in the first or last byte of the int.
A little endian system will store the bytes of the value in "reverse" order, with the lower part first
07 00 00 00
A big endian system would store the "big" end first
00 00 00 07
By reading the first byte, the code will see if the 7 ends up there, or not.
7 in decimal is the same as 7 in hexadecimal and 7 in octal, so it doesn't matter if you use '\x7', '\07', or even just 7 (numeric literal, not a character one).
As for the endianness test: the value of i is 7, meaning it will have the number 7 in its least significant byte, and 0 in all other bytes. The cast char* p = reinterpret_cast<char*>(&i); makes p point to the first byte in the representation of i. The test then checks whether that byte's value is 7. If so, it's the least significant byte, implying a little-endian system. If the value is not 7, it's not a little-endian system. The code assumes that it's big-endian, although that's not strictly established (I believe there were exotic systems with some sort of mixed endianness as well, although the code will probably not run on such a system in practice).
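As an aside: if C++20 is available, the same question can be answered without any pointer tricks, using std::endian from <bit>. A minimal sketch:

#include <bit>
#include <iostream>

int main()
{
    if constexpr (std::endian::native == std::endian::little)
        std::cout << "This system is little-endian\n";
    else if constexpr (std::endian::native == std::endian::big)
        std::cout << "This system is big-endian\n";
    else
        std::cout << "This system is mixed-endian\n";
}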

Checksum for binary PLC communication

I've been scratching my head around calculating a checksum to communicate with Unitronics PLCs using binary commands. They offer the source code but it's in a Windows-only C# implementation, which is of little help to me other than basic syntax.
Specification PDF (the checksum calculation is near the end)
C# driver source (checksum calculation in Utils.cs)
Intended Result
Below is the byte index, message description and the sample which does work.
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | 24 25 26 27 28 29 | 30 31 32
# sx--------------- id FE 01 00 00 00 cn 00 specific--------- lengt CHKSM | numbr ot FF addr- | CHKSM ex
# 2F 5F 4F 50 4C 43 00 FE 01 00 00 00 4D 00 00 00 00 00 01 00 06 00 F1 FC | 01 00 01 FF 01 00 | FE FE 5C
The specification calls for calculating the accumulated value of the 22-byte message header and, separately, the 6+ byte detail, taking each sum modulo 65536, and then returning the two's complement of that value.
Attempt #1
My understanding is that the tilde (~) operator in Python is directly derived from C/C++. After a day of writing the Python that creates the message, I came up with this (stripped-down version):
#!/usr/bin/env python

def Checksum( s ):
    x = ( int( s, 16 ) ) % 0x10000
    x = ( ~x ) + 1
    return hex( x ).split( 'x' )[1].zfill( 4 )

Details = ''
Footer = ''
Header = ''
Message = ''

Details += '0x010001FF0100'
Header += '0x2F5F4F504C4300FE010000004D000000000001000600'
Header += Checksum( Header )
Footer += Checksum( Details )
Footer += '5C'
Message += Header.split( 'x' )[1].zfill( 4 )
Message += Details.split( 'x' )[1].zfill( 4 )
Message += Footer

print Message
Message: 2F5F4F504C4300FE010000004D000000000001000600600L010001FF010001005C
I see an L in there, which is a different result from yesterday's, which wasn't any closer. If you want a quick formula result based on the rest of the message: Checksum(Header) should return F1FC and Checksum(Details) should return FEFE.
The value it returns is nowhere near the specification's example. I believe the issue is one of two things: the Checksum method isn't calculating the sum of the hex string correctly, or the Python ~ operator is not equivalent to the C++ ~ operator.
Attempt #2
A friend has given me his C++ interpretation of what the calculation SHOULD be; I just can't get my head around this code, as my C++ knowledge is minimal.
short PlcBinarySpec::CalcHeaderChecksum( std::vector<byte> _header ) {
    short bytesum = 0;
    for ( std::vector<byte>::iterator it = _header.begin(); it != _header.end(); ++it ) {
        bytesum = bytesum + ( *it );
    }
    return ( ~( bytesum % 0x10000 ) ) + 1;
}
I'm not entirely sure what the correct code should be… but if the intention is for Checksum(Header) to return f705, and it's returning 08fb, here's the problem:
x = ( ~( x % 0x10000 ) ) + 1
The short version is that you want this:
x = (( ~( x % 0x10000 ) ) + 1) % 0x10000
The problem here isn't that ~ means something different. As the documentation says, ~x returns "the bits of x inverted", which is effectively the same thing it means in C (at least on 2s-complement platforms, which includes all Windows platforms).
You can run into a problem with the difference between C and Python types here (C integral types are fixed-size, and overflow; Python integral types are effectively infinite-sized, and grow as needed). But I don't think that's your problem here.
The problem is just a matter of how you convert the result to a string.
The result of calling Checksum(Header), up to the formatting, is -2299, or -0x08fb, in both versions.
In C, you can pretty much treat a signed integer as an unsigned integer of the same size (although you may have to ignore warnings to do so, in some cases). What exactly that does depends on your platform, but on a 2s-complement platform, signed short -0x08fb is the bit-for-bit equivalent of unsigned 0xf705. So, for example, if you do sprintf(buf, "%04hx", -0x08fb), it works just fine—and it gives you (on most platforms, including everything Windows) the unsigned equivalent, f705.
But in Python, there are no unsigned integers. The int -0x08fb has nothing to do with 0xf705. If you do "%04hx" % -0x08fb, you'll get -8fb, and there's no way to forcibly "cast it to unsigned" or anything like that.
Your code actually does hex(-0x08fb), which gives you -0x8fb, which you then split on the x, giving you 8fb, which you zfill to 08fb, which makes the problem a bit harder to notice (because that looks like a perfectly valid pair of hex bytes, instead of a minus sign and three hex digits), but it's the same problem.
Anyway, you have to explicitly decide what you mean by "unsigned equivalent", and write the code to do that. Since you're trying to match what C does on a 2s-complement platform, you can write that explicit conversion as % 0x10000. If you do "%04hx" % (-0x08fb % 0x10000), you'll get f705, just as you did in C. And likewise for your existing code.
It's quite simple. I checked your friend's algorithm by adding all the header bytes manually on a calculator, and it yields the correct result (0xfcf1).
Now, I don't actually know Python, but it looks to me like you are adding up half-byte values. You have made your header string like this:
Header = '2F5F4F504C4300FE010000004D000000000001000600'
And then you go through converting each character in that string from hex and adding it. That means you are dealing with values from 0 to 15. You need to consider each pair of hex characters as one byte and convert that (values from 0 to 255). Or you need to use actual binary data instead of a text representation of the binary data.
At the end of the algorithm, you don't really need to use the ~ operator if you don't trust it. Instead you can do (0xffff - (x % 0x10000)) + 1. Bear in mind that prior to adding 1, the value might be 0xffff, so you need to modulo the entire result by 0x10000 afterwards. Your friend's C++ version uses the short datatype, so no modulo is necessary at all, because the short will naturally overflow.
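To make that concrete, here is a small C++ sketch of the algorithm both answers describe, run over the sample header bytes from the question (the function name is mine). Summing whole bytes, reducing modulo 0x10000, and taking the two's complement gives 0xFCF1, which appears in the sample message low byte first as F1 FC:

#include <cstdint>
#include <cstdio>
#include <vector>

// sum of bytes, mod 0x10000, two's complement, kept in 16 bits
uint16_t checksum(const std::vector<uint8_t>& bytes) {
    uint32_t sum = 0;
    for (uint8_t b : bytes)
        sum += b;
    return static_cast<uint16_t>((0x10000 - (sum % 0x10000)) % 0x10000);
}

int main() {
    std::vector<uint8_t> header = {
        0x2F, 0x5F, 0x4F, 0x50, 0x4C, 0x43, 0x00, 0xFE, 0x01, 0x00, 0x00,
        0x00, 0x4D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x06, 0x00
    };
    std::printf("%04X\n", checksum(header)); // FCF1, sent on the wire as F1 FC
}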

Using bitwise operators in C++ to change 4 chars to int

What I must do is open a file in binary mode that contains stored data that is intended to be interpreted as integers. I have seen other examples, such as Reading "integer" size bytes from a char* array on Stack Overflow, but I want to try taking a different approach (I may just be stubborn, or stupid :/). I first created a simple binary file in a hex editor that reads as follows.
00 00 00 47 00 00 00 17 00 00 00 41
This (should) equal 71, 23, and 65 if the 12 bytes were divided into 3 integers.
After opening this file in binary mode and reading 4 bytes into an array of chars, how can I use bitwise operations to make char[0]'s bits be the first 8 bits of the int, and so on, until the bits of each char are part of the int?
My integer = 00      00      00      00
             ^       ^       ^       ^
Chars:       char[0] char[1] char[2] char[3]
             00      00      00      47
So my integer (hex) = 00 00 00 47 = numerical value of 71
Also, I don't know how the endianness of my system comes into play here, so is there anything that I need to keep in mind?
Here is a code snippet of what I have so far, I just don't know the next steps to take.
std::fstream myfile;
myfile.open("C:\\Users\\Jacob\\Desktop\\hextest.txt", std::ios::in | std::ios::out | std::ios::binary);
if (myfile.is_open() == false)
{
    std::cout << "Error" << std::endl;
}
char* mychar;
std::cout << myfile.is_open() << std::endl;
mychar = new char[4];
myfile.read(mychar, 4);
I eventually plan on dealing with reading floats from a file and maybe a custom data type eventually, but first I just need to get more familiar with using bitwise operations.
Thanks.
You want the bitwise left shift operator:
typedef unsigned char u8; // in case char is signed by default on your platform
unsigned num = ((u8)chars[0] << 24) | ((u8)chars[1] << 16) | ((u8)chars[2] << 8) | (u8)chars[3];
What it does is shift the left argument a specified number of bits to the left, adding zeros from the right as stuffing. For example, 2 << 1 is 4, since 2 is 10 in binary and shifting one to the left gives 100, which is 4.
This can be written in a more general loop form:
unsigned num = 0;
for (int i = 0; i != 4; ++i) {
    num |= (u8)chars[i] << (24 - i * 8); // += could have also been used
}
The endianness of your system doesn't matter here; you know the endianness of the representation in the file, which is constant (and therefore portable), so when you read in the bytes you know what to do with them. The internal representation of the integer in your CPU/memory may be different from that of the file, but the logical bitwise manipulation of it in code is independent of your system's endianness; the least significant bits are always at the right, and the most at the left (in code). That's why shifting is cross-platform -- it operates at the logical bit level :-)
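Putting the pieces together, a minimal end-to-end sketch using the same shift logic (the file path is the asker's; error handling kept short):

#include <cstdio>
#include <fstream>

int main() {
    std::ifstream myfile("C:\\Users\\Jacob\\Desktop\\hextest.txt", std::ios::binary);
    if (!myfile)
        return 1;
    typedef unsigned char u8;
    char chars[4];
    while (myfile.read(chars, 4)) {
        unsigned num = ((u8)chars[0] << 24) | ((u8)chars[1] << 16)
                     | ((u8)chars[2] << 8)  |  (u8)chars[3];
        std::printf("%u\n", num); // prints 71, 23, 65 for the sample file
    }
}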
Have you thought of using Boost.Spirit to make a binary parser? You might hit a bit of a learning curve when you start, but if you want to expand your program later to read floats and structured types, you'll have an excellent base to start from.
Spirit is very well-documented and is part of Boost. Once you get around to understanding its ins and outs, it's really mind-boggling what you can do with it, so if you have a bit of time to play around with it, I'd really recommend taking a look.
Otherwise, if you want your binary to be "portable" - i.e. you want to be able to read it on a big-endian and a little-endian machine, you'll need some sort of byte-order mark (BOM). That would be the first thing you'd read, after which you can simply read your integers byte by byte. Simplest thing would probably be to read them into a union (if you know the size of the integer you're going to read), like this:
union U
{
    unsigned char uc_[4];
    unsigned long ui_;
};
Read the data into the uc_ member, swap the bytes around if you need to change endianness, and read the value from the ui_ member. There's no shifting etc. to be done - except for the swapping if you want to change endianness.
HTH
rlc
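For reference, a sketch of that union approach under two assumptions worth flagging: it uses unsigned rather than unsigned long (the latter is 8 bytes on common 64-bit Linux targets, so it would not match uc_[4]), and it assumes a little-endian host (e.g. x86) reading big-endian file data. Note also that in C++, unlike C, reading a union member other than the one last written is formally undefined, even though compilers commonly support it:

#include <cstdio>
#include <cstring>
#include <utility>

union U
{
    unsigned char uc_[4];
    unsigned      ui_;
};

int main() {
    unsigned char filebytes[4] = { 0x00, 0x00, 0x00, 0x47 }; // big-endian on disk
    U u;
    std::memcpy(u.uc_, filebytes, 4);
    std::swap(u.uc_[0], u.uc_[3]); // swap to little-endian host order
    std::swap(u.uc_[1], u.uc_[2]);
    std::printf("%u\n", u.ui_);    // 71
}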

Can someone explain this "endian-ness" function for me?

Write a program to determine whether a computer is big-endian or little-endian.
bool endianness() {
    int i = 1;
    char *ptr;
    ptr = (char*) &i;
    return (*ptr);
}
So I have the above function. I don't really get it. ptr = (char*) &i, which I think means a pointer to a character at the address where i is sitting. So if an int is 4 bytes, say ABCD, are we talking about A or D when we cast to char*? And why?
Would someone please explain this in more detail? Thanks.
So specifically: with ptr = (char*) &i;, when you cast it to char*, what part of &i do I get?
If you have a little-endian architecture, i will look like this in memory (in hex):
01 00 00 00
^
If you have a big-endian architecture, i will look like this in memory (in hex):
00 00 00 01
^
The cast to char* gives you a pointer to the first byte of the int (to which I have pointed with a ^), so the value pointed to by the char* will be 01 if you are on a little-endian architecture and 00 if you are on a big-endian architecture.
When you return that value, 0 is converted to false and 1 is converted to true. So, if you have a little-endian architecture, this function will return true and if you have a big-endian architecture, it will return false.
Whether ptr points to byte A or D depends on the endianness of the machine. ptr points to the byte of the integer that is at the lowest address (the other bytes would be at ptr+1, ...).
On a big-endian machine, the most significant byte of the integer (which is 0x00) will be stored at this lowest address, so the function will return zero.
On a little-endian machine it is the opposite: the least significant byte of the integer (0x01) will be stored at the lowest address, so the function will return one in this case.
This is using type punning to access an integer as an array of characters. If the machine is big endian, this will be the major byte, and will have a value of zero, but if the machine is little endian, it will be the minor byte, which will have a value of one. (Instead of accessing i as a single integer, the same memory is accessed as an array of four chars).
Whether *((char*)&i) is byte A or byte D gets to the heart of endianness. On a little-endian system, the integer 0x41424344 will be laid out in memory as 44 43 42 41 (least significant byte first; in ASCII, this is "DCBA"). On a big-endian system, it will be laid out as 41 42 43 44. A pointer to this integer will hold the address of the first byte. Consider the pointer as an integer pointer, and you get the whole integer. Consider the pointer as a char pointer, and you get the first byte, since that's the size of a char.
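That "DCBA" observation is easy to check directly; a minimal sketch (reading an object's bytes through a char pointer is one of the accesses the language explicitly permits):

#include <cstdio>

int main() {
    unsigned x = 0x41424344;  // bytes 'A' 'B' 'C' 'D'
    const char* p = reinterpret_cast<const char*>(&x);
    // prints "DCBA" on a little-endian system, "ABCD" on a big-endian one
    std::printf("%c%c%c%c\n", p[0], p[1], p[2], p[3]);
}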
Assume int is 4 bytes (in C it may not be). This assumption is just to simplify the example...
You can look at each of these 4 bytes individually.
char is a byte, so it's looking at the first byte of a 4 byte buffer.
If the first byte is non-zero, that tells you the least significant byte is stored in the first (lowest-addressed) byte.
I randomly chose the number 42 to avoid confusion of any special meaning in the value 1.
int num = 42;
if (*(char *)&num == 42)
{
    printf("\nLittle-Endian\n");
}
else
{
    printf("Big-Endian\n");
}
Breakdown:
int num = 42;
//memory of the 4 bytes is either: (where each byte is 0 to 255)
//1) 0 0 0 42
//2) 42 0 0 0
char* p = (char*)&num; /* cast the int pointer to a char pointer, pointing to the first byte */
bool firstByteOf4Is42 = *p == 42; /* checks whether the first byte is 42 */
//Advance to the 2nd byte
++p;
assert(*p == 0);
//Advance to the 3rd byte
++p;
assert(*p == 0);
//Advance to the 4th byte
++p;
bool lastByteOf4Is42 = *p == 42;
assert(firstByteOf4Is42 == !lastByteOf4Is42);
If firstByteOf4Is42 is true you have little-endian. If lastByteOf4Is42 is true then you have big-endian.
Sure, let's take a look:
bool endianness() {
    int i = 1;        // this is 0x1
    char *ptr;
    ptr = (char*) &i; // pointer to the first byte of i
    return (*ptr);
}
If the machine is little endian, the bytes of i will be laid out in memory as 01 00 00 00, so *ptr will hold 0x1.
If the machine is big endian, the byte order is inverted: i will be stored as 00 00 00 01, so *ptr will hold 0x0.
Finally, the return (*ptr) is equivalent to
if (*ptr == 0x1) // little endian: returns true
else             // big endian: returns false