Char array representation in memory - c++

I was trying to perform bitwise operations on a char array, as if it were an int, essentially treating bytes like a contiguous area in memory. The code below illustrates my problem.
char *cstr = new char[5];
std::strcpy(cstr, "abcd");
int *p = (int *)(void *)cstr;
std::cout << *p << " " << p << "\n";
std::cout << cstr << " " << (void *)cstr << "\n";
std::cout << sizeof(*p) << "\n";
(*p)++;
std::cout << *p << " " << p << "\n";
std::cout << cstr << " " << (void *)cstr << "\n";
The following output is produced:
1684234849 0x55f046e7de70
abcd 0x55f046e7de70
4
1684234850 0x55f046e7de70
bbcd 0x55f046e7de70
Quick explanation of the code and how it works (to my understanding):
I initialize cstr with "abcd"
char *cstr = new char[5];
std::strcpy(cstr, "abcd");
I point p to the address of cstr and specify that I want it to be an int
int *p = (int *)(void *)cstr;
I test that p is pointing where it should and that it occupies 4 bytes
std::cout << *p << " " << p << "\n";
std::cout << cstr << " " << (void *)cstr << "\n";
std::cout << sizeof(*p) << "\n";
I then increment the integer at the address p is pointing to
(*p)++;
So now, since "abcd" is a contiguous block of 32 bits in memory, I expected incrementing by 1 to produce "abce". Instead, the code increments the integer successfully, but leaves the char array as "bbcd". This last part checks the new values of the integer and cstr
std::cout << *p << " " << p << "\n";
std::cout << cstr << " " << (void *)cstr << "\n";
Is this expected behavior?
PS: I compiled the code on a linux machine using this command: g++ main.cpp -o main.
file main
produces the following output: "1: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0"

x86-64 CPUs (like yours) are little-endian: they store the least significant byte of a multi-byte integer at the lowest memory address. So incrementing the integer that "abcd" corresponds to increments its least significant byte, which is stored first in memory. That turned the "a" character into a "b". How code like this behaves depends heavily on how the CPU encodes integers and strings, and your expectations of what this code will do have to take those details into account.
To expect the string "abce", you have to make lots of assumptions:
1. You have to expect integers to occupy 4 bytes.
2. You have to expect the least significant byte to be stored last.
3. You have to expect the encoding of the character "e" to be one more than the encoding of the character "d".
4. You have to expect that incrementing a "d" to an "e" won't overflow when viewed as a signed integer increment.
Some of these are reasonable assumptions and some of them aren't, but unless you have reasonable grounds for all these assumptions, your expectation isn't justified.
Is this expected behavior?
It is what people familiar with your platform would expect. But generally it's easy to avoid relying on these kinds of assumptions and so the best advice is not to rely on them. Assumption 3 is often unavoidable and reasonable on all modern platforms.
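The byte-order effect described in this answer can be reproduced without casting the char pointer to int*. The following is a minimal sketch, assuming a 32-bit std::uint32_t is available and an ASCII-compatible character set; the helper names load_native, is_little_endian and increment_as_integer are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Copy the first four bytes of a buffer into a 32-bit integer.
// std::memcpy sidesteps the aliasing and alignment questions raised by
// reading a char buffer through an int pointer.
inline std::uint32_t load_native(const char* bytes) {
    std::uint32_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// True when the least significant byte is stored at the lowest address.
inline bool is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first = 0;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Treat the first four bytes as one integer, add 1, and write it back.
inline void increment_as_integer(char* bytes) {
    std::uint32_t value = load_native(bytes);
    ++value;
    std::memcpy(bytes, &value, sizeof value);
}
```

On a little-endian machine such as x86-64, calling increment_as_integer on a buffer holding "abcd" leaves "bbcd"; on a big-endian machine it would leave "abce". To get "abce" regardless of byte order, increment the last character directly with ++cstr[3].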

Related

Why casting in C++ prints an unexpected result?

I know basics of casting in C++—or I thought I knew.
Yesterday, I was trying to convert an 8-element uint8_t array to a 2-element uint32_t array.
I cast my values to 32-bit format, but when I try to display them as 32-bit values, the computer gives me their address, not their values. The comments in the code show where I became confused.
#include <cstdint>
#include <iostream>
int main()
{
uint8_t info[8];
info[0] = '2';
info[1] = '0';
info[2] = '2';
info[3] = '0';
info[4] = '0';
info[5] = '0';
info[6] = '0';
info[7] = '0';
uint32_t *divided = (uint32_t*)&info[0];
uint32_t *dividedTwo = (uint32_t*)&info[4];
std::cout << "Address of info " << &info << std::endl; //Output 0x7ffdd25cabe0 as expected.
std::cout << "Expected value of info " << (uint8_t*)info<< std::endl; //Output 20200000 as expected.
std::cout << "Expected value of divided " << (uint32_t*)divided << std::endl; //Output 0x7ffdd25cabe0 not as my expected. What is the reason?
std::cout << "But why this return the true value? " << (uint8_t*)divided << std::endl; //Output 20200000 but why 8bit returns the true value instead of 32bit casting?
std::cout << "Same here, my value was type of 32... " << (uint8_t*)dividedTwo << std::endl; //Output 0000
}
Many of your print statements cause undefined behaviour or don't make sense.
std::cout << "Address of info " << &info << std::endl; //Output 0x7ffdd25cabe0 as expected.
This line prints the address of your array. As you see, you get some pointer value out.
std::cout << "Expected value of info " << (uint8_t*)info<< std::endl; //Output 20200000 as expected.
This line prints the contents of your array as a C string, and causes undefined behaviour since your string is not null terminated.
std::cout << "Expected value of divided " << (uint32_t*)divided << std::endl; //Output 0x7ffdd25cabe0 not as my expected. What is the reason?
This line prints the same pointer as in #1, just with a different type.
std::cout << "But why this return the true value? " << (uint8_t*)divided << std::endl; //Output 20200000 but why 8bit returns the true value instead of 32bit casting?
Same pointer, same string as in #2. Same undefined behaviour.
std::cout << "Same here, my value was type of 32... " << (uint8_t*)dividedTwo << std::endl; //Output 0000
This line prints the second half of your string - same undefined behaviour.
What you actually mean to print is likely something along these lines (in C to take the implicit type handling of iostream out of the example):
#include <inttypes.h>
printf("%p\n", (void *)info); // address of the array
printf("%" PRIx64 "\n", *(uint64_t *)info); // entire 64-bit value
printf("%" PRIx32 "\n", *divided); // first 32 bits of the 64-bit array
printf("%" PRIx32 "\n", *dividedTwo); // second 32 bits of the 64-bit array
Note that you are filling your array with char literals (e.g. '2'), not integer literals - you may want to fix that to make the output clearer for yourself.
Watch out for potential alignment problems with this type of casting - it's not (strictly speaking) legit.
std::cout << "Expected value of info " << (uint8_t*)info<< std::endl; //Output 20200000 as expected.
uint8_t is not only an integer type, but also a character type. It is an alias of unsigned char. When you insert a pointer to a character type into a character stream, the behaviour is to treat it as a null-terminated character string. Null termination is a precondition, and the lack of it results in undefined behaviour.
Your array is not null terminated. Therefore the behaviour of the program is undefined.
std::cout << "But why this return the true value? " << (uint8_t*)divided << std::endl; //Output 20200000 but why 8bit returns the true value instead of 32bit casting?
std::cout << "Same here, my value was type of 32... " << (uint8_t*)dividedTwo << std::endl; //Output 0000
Both of these are the same: attempts to print a string that is not null-terminated, resulting in undefined behaviour.
std::cout << "Expected value of divided " << (uint32_t*)divided << std::endl; //Output 0x7ffdd25cabe0 not as my expected. What is the reason?
uint32_t is not a character type. Pointers to types other than character types are treated differently. Instead of printing a null terminated character string, the address of the pointed object is printed. In this case the address happens to be 0x7ffdd25cabe0. It is unclear what you expected instead.
Note that attempting to access the pointed-to object through the reinterpreted divided and dividedTwo pointers would result in undefined behaviour, because no object of that type exists at the pointed-to address.
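A small sketch of the overload behaviour described above, assuming (as on mainstream implementations) that uint8_t is an alias of unsigned char and that the character set is ASCII. The helper names as_number and as_character are illustrative only and not part of any library.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Format a byte as a number. The cast to unsigned selects the integer
// overload of operator<< instead of the character overload.
inline std::string as_number(std::uint8_t byte) {
    std::ostringstream out;
    out << static_cast<unsigned>(byte);
    return out.str();
}

// Format a byte the way operator<< treats character types by default:
// as a single character.
inline std::string as_character(std::uint8_t byte) {
    std::ostringstream out;
    out << byte;
    return out.str();
}
```

With info[0] = '2', as_character(info[0]) gives the text 2, while as_number(info[0]) gives 50, the ASCII code of the character '2'.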
Is there any better solution to convert an 8-element uint8_t array to a 2-element uint32_t array instead of "shifting (<< >> etc.)"?
Shifting is usually the best way because it can be used to produce the same output regardless of the byte endianness of the CPU, and is therefore portable and can be used for communication between separate systems over the network or transfer of files.
Other, non-shifting ways to convert produce output depending on the endianness, so they cannot be used for example in communication between different systems. But, here is a correct example of how to do that:
uint8_t info8 [8] = ...;
uint32_t info32[sizeof info8 / sizeof(uint32_t)];
std::memcpy(info32, info8, sizeof info32);
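For comparison, here is a sketch of the shifting approach the answer recommends. It assumes the first byte of each group of four is the most significant one (big-endian or "network" order); that choice, and the names pack_be32 and pack_words, are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Combine four bytes into one 32-bit value, first byte most significant.
// The result is the same on little-endian and big-endian CPUs.
constexpr std::uint32_t pack_be32(const std::uint8_t* b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8)  |
            static_cast<std::uint32_t>(b[3]);
}

// Convert an 8-byte array into two 32-bit words.
inline void pack_words(const std::uint8_t (&in)[8], std::uint32_t (&out)[2]) {
    out[0] = pack_be32(in);
    out[1] = pack_be32(in + 4);
}
```

With the question's array ('2', '0', '2', '0', '0', '0', '0', '0' in ASCII), this yields 0x32303230 and 0x30303030 on any platform, whereas the memcpy version gives a byte order that depends on the CPU.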

What is the relationship between an array and its address?

The following code:
#include<iostream>
int main (void) {
int lista[5] = {0,1,2,3,4};
std::cout << lista << std::endl;
std::cout << &lista << std::endl;
std::cout << lista+1 << std::endl;
std::cout << &lista+1 << std::endl;
std::cout << lista+2 << std::endl;
std::cout << &lista+2 << std::endl;
std::cout << lista+3 << std::endl;
std::cout << &lista+3 << std::endl;
return (0);
}
Outputs:
0x22ff20
0x22ff20
0x22ff24
0x22ff34
0x22ff28
0x22ff48
0x22ff2c
0x22ff5c
I understood that an array is another form to express a pointer, but we cannot change its address to point anywhere else after declaration. I also understood that an array has its value as the first position in memory. Therefore, 0x22ff20 in this example is the location of the array's starting position and the first variable is stored there.
What I did not understand is: why the other variables are not stored in sequence with the array address? I mean, why lista+1 is different from &lista+1. Should not they be the same?
In pointer arithmetic, types matter.
It's true that the value is the same for both lista and &lista, but their types are different: lista (in the expression used in the cout call) has type int* whereas &lista has type int (*)[5].
So when you add 1 to lista, it points to the "next" int. But &lista + 1 points to the location just past all 5 ints, i.e. one past the end of the array, which must not be dereferenced.
Answering the question as asked:
std::cout << &lista+1 << std::endl;
In this code you take the address of the array lista and add 1 to it. The size of the array is sizeof(int) * 5, so incrementing a pointer to it by 1 adds sizeof(int) * 5 bytes to the address (20 bytes here, from 0x22ff20 to 0x22ff34), which gives the number you see.
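The two step sizes can be measured directly. This is a sketch only; byte_distance, element_step and array_step are hypothetical helper names, and the array mirrors the one in the question.

```cpp
#include <cstddef>

// Byte distance between two addresses, measured through char pointers.
inline std::ptrdiff_t byte_distance(const void* from, const void* to) {
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

// Step produced by adding 1 to the decayed pointer, which has type int*.
inline std::ptrdiff_t element_step() {
    int lista[5] = {0, 1, 2, 3, 4};
    return byte_distance(lista, lista + 1);
}

// Step produced by adding 1 to the pointer to the whole array,
// which has type int (*)[5].
inline std::ptrdiff_t array_step() {
    int lista[5] = {0, 1, 2, 3, 4};
    return byte_distance(&lista, &lista + 1);
}
```

element_step() returns sizeof(int) and array_step() returns 5 * sizeof(int), matching the 4-byte and 20-byte jumps visible in the question's output.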

Converting char* to int after using strdup()

Why does (int)value produce different output after using strdup(value) than before?
How to get the same output?
My short example went bad, please use the long one:
Here the full code for tests:
#include <stdio.h>
#include <string.h> // strdup (POSIX)
#include <iostream>
int main()
{
//The First Part
char *c = "ARD-642564";
char *ca = "ARD-642564";
std::cout << c << std::endl;
std::cout << ca << std::endl;
//c and ca are equal
std::cout << (int)c << std::endl;
std::cout << (int)ca << std::endl;
//The Second Part
c = strdup("ARD-642564");
ca = strdup("ARD-642564");
std::cout << c << std::endl;
std::cout << ca << std::endl;
//c and ca are NOT equal Why?
std::cout << (int)c << std::endl;
std::cout << (int)ca << std::endl;
int x;
std::cin >> x;
}
Because an array decays to a pointer in your case, you are printing a pointer (i.e., on non-exotic computers, a memory address). There is no guarantee that a pointer fits in an int.
In the first part of your code, c and ca don't have to be equal. Your compiler performs a memory optimization: it stores identical string literals only once, so both pointers refer to the same object.
In the second part, strdup dynamically allocates a new string on each call, so the returned pointers are not equal. The compiler does not optimize these calls away because it does not see the definition of strdup.
In both cases, c and ca may not be equal.
"The strdup() function shall return a pointer to a new string, which is a duplicate of the string pointed to by s1."
So it's quite understandable that the pointers differ.
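To get "the same output" for two copies, compare the characters rather than the addresses. The sketch below uses a hypothetical duplicate helper as a portable stand-in for strdup, and std::uintptr_t (an optional but widely available type) instead of int when an address really has to be shown as an integer.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for strdup: copies s into freshly allocated memory.
// The caller releases the result with std::free.
inline char* duplicate(const char* s) {
    const std::size_t n = std::strlen(s) + 1;
    char* copy = static_cast<char*>(std::malloc(n));
    if (copy != nullptr) std::memcpy(copy, s, n);
    return copy;
}

// Compare the characters, not the addresses.
inline bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Convert a pointer to an integer type wide enough to hold it.
inline std::uintptr_t address_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

Two calls to duplicate("ARD-642564") return pointers for which same_text is true while address_of differs, which is the situation in the second part of the question.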

Weird Pointer Address for Individual Struct Data Member

I observed some weird behavior today. The code is as follows:
The code:
#include <iostream>
struct text
{
char c;
};
int main(void)
{
text experim = {'b'};
char * Cptr = &(experim.c);
std::cout << "The Value \t: " << *Cptr << std::endl ;
std::cout << "The Address \t: " << Cptr << std::endl ; //Print weird stuff
std::cout << "\n\n";
*Cptr = 'z'; //Attempt to change the value
std::cout << "The New Value \t: " << *Cptr <<std::endl ;
std::cout << "The Address \t: " << Cptr << std::endl ; //Weird address again
return 0;
}
The questions:
1.) Why does cout print some weird value for the address in the code above?
2.) Why can I still change the value of the member c by dereferencing the pointer which has a weird address?
Thank you.
Consider fixing the code like this:
std::cout << "The Address \t: " << (void *)Cptr << std::endl ;
There's a std::ostream& operator<< (std::ostream& out, const char* s ); overload that takes a char*, so you have to cast to void* to print the address rather than the string it "points" to.
I think the "weird" stuff shows up because cout thinks it's a cstring, i.e. a 0-terminated character array, so it doesn't print the address as you expected. And since your "string" isn't 0-terminated, all it can do is walk the memory until it encounters a 0. To sum it up, you're not actually printing the address.
Why I can still change the value of the member c by dereferencing the pointer which has weird address
The address isn't weird, as explained above. In your code Cptr points to a valid memory location and you can do pretty much anything you want with it.
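The difference between the two overloads can be seen by capturing what each one writes. This is a sketch under the assumption that the buffer passed to the string version is null-terminated (which the single char in the question's struct is not); the helper names are made up.

```cpp
#include <sstream>
#include <string>

// Text produced when a char* is streamed: the characters up to the first '\0'.
// The buffer must be null-terminated, otherwise the behaviour is undefined.
inline std::string streamed_as_string(const char* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}

// Text produced when the same pointer is streamed as void*: the address,
// in an implementation-defined format.
inline std::string streamed_as_address(const char* p) {
    std::ostringstream out;
    out << static_cast<const void*>(p);
    return out.str();
}
```

For a buffer char buf[2] = {'b', '\0'}, streamed_as_string(buf) yields b, while streamed_as_address(buf) yields the address text that the (void *) cast in the fix prints.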

C++ free() changing other memory

I started noticing that sometimes when deallocating memory in some of my programs, they would inexplicably crash. I began narrowing down the culprit and have come up with an example that illustrates a case that I am having difficulty understanding:
#include <iostream>
#include <stdlib.h>
#include <string.h> // memset, memcpy
using namespace std;
int main() {
char *tmp = (char*)malloc(16);
char *tmp2 = (char*)malloc(16);
long address = reinterpret_cast<long>(tmp);
long address2 = reinterpret_cast<long>(tmp2);
cout << "tmp = " << address << "\n";
cout << "tmp2 = " << address2 << "\n";
memset(tmp, 1, 16);
memset(tmp2, 1, 16);
char startBytes[4] = {0};
char endBytes[4] = {0};
memcpy(startBytes, tmp - 4, 4);
memcpy(endBytes, tmp + 16, 4);
cout << "Start: " << static_cast<int>(startBytes[0]) << " " << static_cast<int>(startBytes[1]) << " " << static_cast<int>(startBytes[2]) << " " << static_cast<int>(startBytes[3]) << "\n";
cout << "End: " << static_cast<int>(endBytes[0]) << " " << static_cast<int>(endBytes[1]) << " " << static_cast<int>(endBytes[2]) << " " << static_cast<int>(endBytes[3]) << "\n";
cout << "---------------\n";
free(tmp);
memcpy(startBytes, tmp - 4, 4);
memcpy(endBytes, tmp + 16, 4);
cout << "Start: " << static_cast<int>(startBytes[0]) << " " << static_cast<int>(startBytes[1]) << " " << static_cast<int>(startBytes[2]) << " " << static_cast<int>(startBytes[3]) << "\n";
cout << "End: " << static_cast<int>(endBytes[0]) << " " << static_cast<int>(endBytes[1]) << " " << static_cast<int>(endBytes[2]) << " " << static_cast<int>(endBytes[3]) << "\n";
free(tmp2);
return 0;
}
Here is the output that I am seeing:
tmp = 8795380
tmp2 = 8795400
Start: 16 0 0 0
End: 16 0 0 0
---------------
Start: 17 0 0 0
End: 18 0 0 0
I am using Borland's free compiler. I am aware that the header bytes that I am looking at are implementation specific, and that things like "reinterpret_cast" are bad practice. The question I am merely looking to find an answer to is: why does the first byte of "End" change from 16 to 18?
The 4 bytes that are considered "end" are 16 bytes after tmp, which are 4 bytes before tmp2. They are tmp2's header - why does a call to free() on tmp affect this place in memory?
I have tried the same example using new [] and delete [] to create/delete tmp and tmp2 and the same results occur.
Any information or help in understanding why this particular place in memory is being affected would be much appreciated.
You will have to ask your libc implementation why it changes. In any case, why does it matter? This is a memory area that libc has not allocated to you, and may be using to maintain its own data structures or consistency checks, or may not be using at all.
Basically you are looking at memory you didn't allocate. You can't make any assumptions about what happens to the memory outside what you requested (i.e. the 16 bytes you allocated). There is nothing abnormal going on.
The runtime and compiler are free to do whatever they want with those bytes, so you should not use them in your programs. The runtime probably changes the values of those bytes to keep track of its internal state.
Deallocating memory is very unlikely to crash a program. On the other hand, accessing memory you have deallocated, as in your sample, is a big programming mistake that is likely to do so.
A good way to avoid this is to set any pointers you free to NULL. Doing so makes your program crash, on most platforms, as soon as it accesses a freed variable.
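One way to make the "set it to NULL" habit hard to forget is a tiny wrapper. This is only a sketch; free_and_clear is a hypothetical name, and it clears just the one pointer variable passed in, so any other copies of the pointer still dangle.

```cpp
#include <cstdlib>

// Free a block obtained from malloc and null the caller's pointer in one step.
// Calling it again on the same variable is harmless, since free(nullptr)
// does nothing.
template <typename T>
void free_and_clear(T*& ptr) {
    std::free(ptr);
    ptr = nullptr;
}
```

In the question's program, free_and_clear(tmp) would replace free(tmp), after which tmp compares equal to nullptr.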
It's possible that the act of removing an allocated element from the heap modifies other heap nodes, or that the implementation reserves one or more bytes of headers for use as guard bytes from previous allocations.
The memory manager must remember, for example, the size of each memory block that has been allocated with malloc. There are different ways to do this, but probably the simplest one is to allocate 4 bytes more than the size requested in the call and store the size value just before the pointer returned to the caller.
The implementation of free can then subtract 4 bytes from the passed pointer to get a pointer to where the size has been stored, and can then link the block (for example) into a list of free reusable blocks of that size (perhaps reusing those 4 bytes to store the link to the next block).
You are not supposed to change or even look at bytes before/after the area you have allocated. The result of accessing, even just for reading, memory that you didn't allocate is Undefined Behavior (and yes, you really can get a program to really crash or behave crazily just because of reading memory that wasn't allocated).
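The size-header scheme described above can be sketched as a toy layer on top of malloc. This is not how Borland's runtime or any particular libc works; toy_alloc, toy_size, toy_free and the header size kHeader are all invented for illustration, and the header is padded to the strictest alignment rather than being exactly 4 bytes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Header size, chosen as the strictest alignment so the payload that follows
// stays suitably aligned. (Toy layout; real allocators differ.)
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must hold a size_t");

// Allocate n bytes and record n in a header placed just before the payload.
inline void* toy_alloc(std::size_t n) {
    unsigned char* raw = static_cast<unsigned char*>(std::malloc(kHeader + n));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &n, sizeof n);  // store the size in the header
    return raw + kHeader;            // hand out the bytes after the header
}

// Recover the recorded size by stepping back from the payload pointer.
inline std::size_t toy_size(const void* payload) {
    std::size_t n = 0;
    std::memcpy(&n, static_cast<const unsigned char*>(payload) - kHeader, sizeof n);
    return n;
}

// Release the block by freeing the pointer that malloc originally returned.
inline void toy_free(void* payload) {
    if (payload != nullptr) std::free(static_cast<unsigned char*>(payload) - kHeader);
}
```

Here the allocator itself is allowed to read the bytes in front of the payload because it owns them; user code that peeks at tmp - 4, as in the question, has no such guarantee.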