Understanding how malloc() is used in this context - c++

I'm not sure if this is a good question by community standards (let me know if there is a better way or place for this question please).
I'm working on understanding a piece of code which I've come across while trying to learn C++. Code is as follows:
MessageHdr *msg;
size_t msgsize = sizeof(MessageHdr) + sizeof(joinaddr->addr) + sizeof(long) + 1;
msg = (MessageHdr *) malloc(msgsize * sizeof(char));
// create JOINREQ message: format of data is {struct Address myaddr}
msg->msgType = JOINREQ;
memcpy((char *)(msg+1), &memberNode->addr.addr, sizeof(memberNode->addr.addr));
memcpy((char *)(msg+1) + 1 + sizeof(memberNode->addr.addr), &memberNode->heartbeat, sizeof(long));
emulNet->ENsend(&memberNode->addr, joinaddr, (char *)msg, msgsize);
What is the point of casting to MessageHdr * in line 3?
I feel like we're building a char[]. We're just using MessageHdr* to refer (to point) to it but I am not sure why? Wouldn't a char* be a better choice?
Receiving code is as follows (shortened):
int EmulNet::ENsend(Address *myaddr, Address *toaddr, char *data, int size) {
en_msg *em;
...
em = (en_msg *)malloc(sizeof(en_msg) + size);
em->size = size;
memcpy(&(em->from.addr), &(myaddr->addr), sizeof(em->from.addr));
memcpy(&(em->to.addr), &(toaddr->addr), sizeof(em->from.addr));
memcpy(em + 1, data, size);
...
I'm beyond confused at this point - sorry for the vague question. Is this idiomatic C++? I feel as if this could have been done in much cleaner ways instead of passing around a char[] and referencing it via pointers of random struct types.
I guess what I'm ultimately trying to ask is, while I kind of understand the code, it feels very unnatural. Is this a valid/common approach of doing things?
EDIT
MessageHdr is a struct as follows:
typedef struct MessageHdr {
enum MsgTypes msgType;
}MessageHdr;
joinaddr points to an instance of the following class:
class Address {
public:
char addr[6];
Address() {}
// Copy constructor
Address(const Address &anotherAddress);
// Overloaded = operator
Address& operator =(const Address &anotherAddress);
bool operator ==(const Address &anotherAddress);
Address(string address) {
size_t pos = address.find(":");
int id = stoi(address.substr(0, pos));
short port = (short)stoi(address.substr(pos + 1, address.size()-pos-1));
memcpy(&addr[0], &id, sizeof(int));
memcpy(&addr[4], &port, sizeof(short));
}
string getAddress() {
int id = 0;
short port;
memcpy(&id, &addr[0], sizeof(int));
memcpy(&port, &addr[4], sizeof(short));
return to_string(id) + ":" + to_string(port);
}
void init() {
memset(&addr, 0, sizeof(addr));
}
};

The code is really confusing. I'll try to explain the first part as I understand it. The intention was clearly to create a (structured) char buffer to send over the network. It was probably originally written in C, or by a C programmer.
MessageHdr *msg;
This calculates the size of the resulting send buffer:
size_t msgsize = sizeof(MessageHdr) + sizeof(joinaddr->addr) + sizeof(long) + 1;
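This allocates the buffer. The cast is needed for the code to compile as C++; without it the compiler reports an error, because C++ does not implicitly convert the void* returned by malloc to MessageHdr*.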
msg = (MessageHdr *) malloc(msgsize * sizeof(char));
This sets a field in the buffer. Since msg is of type MessageHdr*, the value is written at the correct place in the buffer:
// create JOINREQ message: format of data is {struct Address myaddr}
msg->msgType = JOINREQ;
These commands use pointer arithmetic on the MessageHdr* type to write data into the buffer beyond the MessageHdr itself. msg + 1 skips sizeof(MessageHdr) bytes from the start of the buffer.
memcpy((char *)(msg+1), &memberNode->addr.addr, sizeof(memberNode->addr.addr));
memcpy((char *)(msg+1) + 1 + sizeof(memberNode->addr.addr), &memberNode->heartbeat, sizeof(long));
This sends the buffer as a plain sequence of bytes, by casting it to char* first:
emulNet->ENsend(&memberNode->addr, joinaddr, (char *)msg, msgsize);
The receiving code seems to add yet another header, containing the addresses, to the data before sending it further (similar to TCP/IP).
This allocates another buffer with the size of the en_msg header + size of the data.
em = (en_msg *)malloc(sizeof(en_msg) + size);
em->size = size; // keeps data size in the en_msg struct
This fills out the address fields in the en_msg part of the buffer:
memcpy(&(em->from.addr), &(myaddr->addr), sizeof(em->from.addr));
memcpy(&(em->to.addr), &(toaddr->addr), sizeof(em->from.addr));
And this copies the data into the buffer, starting just beyond the en_msg header:
memcpy(em + 1, data, size);
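The header-plus-payload layout described above can also be sketched without malloc and struct-pointer casts. This is only an illustration under assumptions: the MsgTypes enumerators, the constant kAddrSize, and the helper names buildJoinRequest and readHeartbeat are made up here and are not part of the original code base.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

enum MsgTypes { JOINREQ, JOINREP };    // assumed enumerators, for illustration

struct MessageHdr {
    MsgTypes msgType;
};

constexpr std::size_t kAddrSize = 6;   // matches char addr[6] in Address

// Builds {MessageHdr}{6-byte address}{1 spare byte}{heartbeat} as raw bytes.
std::vector<char> buildJoinRequest(const char (&addr)[kAddrSize], long heartbeat) {
    std::vector<char> buf(sizeof(MessageHdr) + kAddrSize + 1 + sizeof(long));
    MessageHdr hdr{JOINREQ};
    std::size_t offset = 0;
    std::memcpy(buf.data() + offset, &hdr, sizeof(hdr));
    offset += sizeof(hdr);
    std::memcpy(buf.data() + offset, addr, kAddrSize);
    offset += kAddrSize + 1;           // the original code also skips one byte here
    std::memcpy(buf.data() + offset, &heartbeat, sizeof(heartbeat));
    return buf;
}

// Reads the heartbeat back out of a buffer produced by buildJoinRequest.
long readHeartbeat(const std::vector<char>& buf) {
    long heartbeat = 0;
    std::memcpy(&heartbeat, buf.data() + sizeof(MessageHdr) + kAddrSize + 1,
                sizeof(heartbeat));
    return heartbeat;
}
```

The vector owns the bytes, so nothing has to be freed by hand, and buf.data() together with buf.size() can be passed to any function that expects a char* and a length.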

You didn't give details about MessageHdr, Address and en_msg, but some of them might be structs rather than simple types.
For the first question:
malloc returns a void*, but in line 3 the allocated memory is assigned to a pointer of type MessageHdr*, so the return value of malloc needs to be cast to the correct type.
It is quite common to use structs in this way, as they provide a simple way of grouping, let's say, multiple variables of different types that belong together (e.g. Address could be a struct with an int variable for the port and a char[] for the hostname).
Example:
struct Data
{
int something;
char somethingElse[10];
};
void* foo = malloc(100); // allocate 100 bytes of memory
void* bar = malloc(sizeof(struct Data)); // allocate a piece of memory with the size of struct Data
Data* data = (Data*)bar; // use the piece of memory
data->something = 10; // as Data struct
strcpy(data->somethingElse, "bla"); // copy a short string into the char array member
Note that of course you could use the piece of allocated memory in any way you want. E. g. in the above you could just do memcpy(foo, someSource, 100) to copy 100 bytes into the allocated buffer.
In C++ you would normally use the new operator, which works slightly differently. In addition to allocating memory for an object of a given class, it also calls the class's constructor.
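A minimal sketch of that difference, using a hypothetical Counter class that is not part of the question: malloc would only hand back raw bytes, while new also runs the constructor. The helper name makeWithNew is likewise just an illustration.

```cpp
#include <memory>

// Hypothetical class, used only to make constructor calls observable.
struct Counter {
    static int constructed;   // how many constructors have run so far
    int value;
    Counter() : value(7) { ++constructed; }
};
int Counter::constructed = 0;

// Allocates a Counter with new: memory is obtained AND the constructor runs.
// The unique_ptr deletes the object automatically when it goes out of scope.
std::unique_ptr<Counter> makeWithNew() {
    return std::unique_ptr<Counter>(new Counter());
}
```

With std::malloc(sizeof(Counter)) instead, neither value nor the counter would be set, because no constructor is called.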
For question 2:
Again you didn't give details about MessageHdr. In case it is not a struct, but only a typedef to e. g. char[10], you are right in that you could just use char[10] instead.
However, imagine throughout your program or library you need to deal with "MessageHdr" (Message-Header?) over and over again, and every time it is a char array with the length 10. Using a typedef you gain the benefit of:
Having a named type, which others might instantly recognize and understand what it does.
The possibility to easily change the size of the char array in case at some later point it needs to change.

This code is invalid C++ code. The pointer is cast to MessageHdr* so that the compiler does not complain about this code:
msg->msgType = JOINREQ;
But this code is undefined behaviour if MessageHdr has non-vacuous initialization (see below): it accesses a member of an object outside its lifetime. n.m., in a comment, suggests that you read a book, and that is the best solution. If you read code, it is better to read well-written C++ code, for example libstdc++ or libc++.
According to the c++ standard [basic.life]/1
The lifetime of an object or reference is a runtime property of the object or reference. An object is said to have
non-vacuous initialization if it is of a class or aggregate type and it or one of its subobjects is initialized by a
constructor other than a trivial default constructor. [ Note: Initialization by a trivial copy/move constructor
is non-vacuous initialization. — end note ] The lifetime of an object of type T begins when:
(1.1) — storage with the proper alignment and size for type T is obtained, and
(1.2) — if the object has non-vacuous initialization, its initialization is complete
So if MessageHdr has non-vacuous initialization (which is the point of using C++), then [basic.life]/6 applies:
Before the lifetime of an object has started but after the storage which the object will occupy has been
allocated or, after the lifetime of an object has ended and before the storage which the object occupied is
reused or released, any pointer that represents the address of the storage location where the object will be or
was located may be used but only in limited ways. For an object under construction or destruction, see 15.7.
Otherwise, such a pointer refers to allocated storage (6.7.4.2), and using the pointer as if the pointer were of
type void*, is well-defined. Indirection through such a pointer is permitted but the resulting lvalue may only
be used in limited ways, as described below. The program has undefined behavior if:[...]
(6.2) — the pointer is used to access a non-static data member or call a non-static member function of the
object, o
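One way to avoid depending on these lifetime rules is to construct the object explicitly in the allocated storage with placement new. The following is only a sketch under assumptions: the MsgTypes enumerators and the helper names allocateMessage and freeMessage are invented for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

enum MsgTypes { JOINREQ, JOINREP };   // assumed enumerators, for illustration

struct MessageHdr {
    MsgTypes msgType;
};

// Allocates room for a header plus payloadSize bytes and starts the
// header's lifetime explicitly with placement new.
MessageHdr* allocateMessage(std::size_t payloadSize, MsgTypes type) {
    void* raw = std::malloc(sizeof(MessageHdr) + payloadSize);
    if (raw == nullptr) {
        return nullptr;
    }
    return new (raw) MessageHdr{type};   // constructs the header in place
}

// MessageHdr is trivially destructible, so releasing the storage is enough.
void freeMessage(MessageHdr* msg) {
    std::free(msg);
}
```

After allocateMessage returns, msg->msgType refers to an object whose lifetime has clearly begun, whatever kind of initialization the type has.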

Related

Why can't we use a void* to operate on the object it addresses

I am learning C++ using C++ Primer, 5th edition. In particular, I read about void*. There it is written that:
We cannot use a void* to operate on the object it addresses—we don’t know that object’s type, and the type determines what operations we can perform on that object.
void*: Pointer type that can point to any nonconst type. Such pointers may not
be dereferenced.
My question is: if we're not allowed to use a void* to operate on the object it addresses, then why do we need a void*? Also, I am not sure whether the statement quoted above from C++ Primer is technically correct, because I am not able to understand what it is conveying. Maybe some examples can help me understand what the author meant when he said that "we cannot use a void* to operate on the object it addresses". So can someone please provide an example to clarify what the author meant, and whether he is correct or incorrect in saying this?
My question is: if we're not allowed to use a void* to operate on the object it addresses, then why do we need a void*?
It's indeed quite rare to need void* in C++. It's more common in C.
But where it's useful is type-erasure. For example, try to store an object of any type in a variable, determining the type at runtime. You'll find that hiding the type becomes essential to achieve that task.
What you may be missing is that it is possible to convert the void* back to the typed pointer afterwards (or in special cases, you can reinterpret as another pointer type), which allows you to operate on the object.
Maybe some examples can help me understand what the author meant when he said that "we cannot use a void* to operate on the object it addresses"
Example:
int i;
int* int_ptr = &i;
void* void_ptr = &i;
*int_ptr = 42; // OK
*void_ptr = 42; // ill-formed
As the example demonstrates, we cannot modify the pointed int object through the pointer to void.
so since a void* has no size(as written in the answer by PMF)
Their answer is misleading or you've misunderstood. The pointer has a size. But since there is no information about the type of the pointed object, the size of the pointed object is unknown. In a way, that's part of why it can point to an object of any size.
so how can a int* on the right hand side be implicitly converted to a void*
All pointers to objects can implicitly be converted to void* because the language rules say so.
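The round trip described above (erase the type by storing a void*, then convert back once the real type is known) might look like this. ErasedSlot, readAsInt and readAsString are hypothetical names chosen for this sketch.

```cpp
#include <string>

// A tiny type-erased slot: it remembers an address, but not the type.
struct ErasedSlot {
    void* ptr;
};

// The caller must know the real type to get a usable pointer back.
int readAsInt(const ErasedSlot& slot) {
    return *static_cast<int*>(slot.ptr);
}

std::string readAsString(const ErasedSlot& slot) {
    return *static_cast<std::string*>(slot.ptr);
}
```

Calling readAsInt on a slot that actually holds a std::string would compile but is undefined behaviour, which is why code using void* usually stores some kind of type tag next to the pointer.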
Yes, the author is right.
A pointer of type void* cannot be dereferenced, because it has no size [1]. The compiler would not know how much data it needs to read from that address if you tried to access it:
void* myData = std::malloc(1000); // Allocate some memory (note that the return type of malloc() is void*)
int value = *myData; // Error, can't dereference
int field = myData->myField; // Error, a void pointer obviously has no fields
The first example fails because the compiler doesn't know how much data to get. We need to tell it the size of the data to get:
int value = *(int*)myData; // Now fine, we have casted the pointer to int*
int value = *(char*)myData; // Fine too, but NOT the same as above!
or, to be more in the C++-world:
int value = *static_cast<int*>(myData);
int value = *static_cast<char*>(myData);
The two examples return a different result, because the first gets an integer (32 bit on most systems) from the target address, while the second only gets a single byte and then moves that to a larger variable.
The reason why the use of void* is sometimes still useful is when the type of data doesn't matter much, like when just copying stuff around. Methods such as memset or memcpy take void* parameters, since they don't care about the actual structure of the data (but they need to be given the size explicitly). When working in C++ (as opposed to C) you'll not use these very often, though.
[1] "No size" applies to the size of the destination object, not the size of the variable containing the pointer. sizeof(void*) is perfectly valid and returns the size of a pointer variable. On virtually all of today's platforms this equals the size of any other object pointer, so sizeof(void*), sizeof(int*) and sizeof(MyClass*) all have the same value. The type of the pointer, however, defines the size of the element it points to. The compiler needs that so it knows how much data to read or, when the pointer is used with + or -, how much to add or subtract to get the address of the next or previous element.
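To make the last point concrete, here is a small sketch that measures how far an address moves when 1 is added to a typed pointer. The helper strideOf is a hypothetical name introduced for this illustration.

```cpp
#include <cstddef>

// Returns how many bytes the address advances when 1 is added to a T*.
template <typename T>
std::size_t strideOf() {
    T items[2];   // only the addresses are used, never the values
    auto first = reinterpret_cast<const unsigned char*>(&items[0]);
    auto second = reinterpret_cast<const unsigned char*>(&items[0] + 1);
    return static_cast<std::size_t>(second - first);
}
```

For a void* there is no element type, so standard C++ rejects the equivalent arithmetic at compile time.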
void * is basically a catch-all type. Any object pointer type can be implicitly converted to void * without getting any errors. As such, it is mostly used in low-level data manipulation, where all that matters is the data that some memory block contains, rather than what the data represents. On the flip side, when you have a void * pointer, it is impossible to determine directly which type it was originally. That's why you can't operate on the object it addresses.
if we try something like
typedef struct foo {
int key;
int value;
} t_foo;
void try_fill_with_zero(void *destination) {
destination->key = 0;
destination->value = 0;
}
int main() {
t_foo *foo_instance = malloc(sizeof(t_foo));
try_fill_with_zero(foo_instance);
}
we will get a compilation error because it is impossible to determine what type void *destination was, as soon as the address gets into try_fill_with_zero. That's an example of being unable to "use a void* to operate on the object it addresses"
Typically you will see something like this:
typedef struct foo {
int key;
int value;
} t_foo;
void init_with_zero(void *destination, size_t bytes) {
unsigned char *to_fill = (unsigned char *)destination;
for (size_t i = 0; i < bytes; i++) {
to_fill[i] = 0;
}
}
int main() {
t_foo *foo_instance = malloc(sizeof(t_foo));
int test_int;
init_with_zero(foo_instance, sizeof(t_foo));
init_with_zero(&test_int, sizeof(int));
}
Here we can operate on the memory that we pass to init_with_zero represented as bytes.
You can think of void * as representing missing knowledge about the associated type of the data at this address. You may still cast it to something else and then dereference it, if you know what is behind it. Example:
int n = 5;
void * p = (void *) &n;
At this point, we have lost the type information for p, and thus the compiler does not know what to do with it. But if you know that p holds the address of an integer, then you can use that information:
int * q = (int *) p;
int m = *q;
And m will be equal to n.
void is not a type like any other. There is no object of type void. Hence, there exists no way of operating on such pointers.
This is one of my favourite kinds of questions, because at first I was also confused about void pointers.
As the other answers say, void * refers to a generic type of data.
Being a void pointer, it only holds the address of some kind of data or object.
It carries no other information about the object itself. At first you may ask yourself why you would even need this if it can only hold an address. The answer is that you can still cast the pointer to a more specific type, and that's the real power:
making generic functions that work with all kinds of data.
To be more concrete, let's say you want to implement a generic sorting algorithm.
The sorting algorithm has basically two parts:
The algorithm itself.
The comparison between the objects.
Here we will also talk about function pointers.
Let's take, for example, the standard library function qsort:
void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void*))
We see that it takes the next parameters:
base − This is the pointer to the first element of the array to be sorted.
nitems − This is the number of elements in the array pointed by base.
size − This is the size in bytes of each element in the array.
compar − This is the function that compares two elements.
And based on the article that I referenced above we can do something like this:
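#include <stdio.h>
#include <stdlib.h>
int values[] = { 88, 56, 100, 2, 25 };
// Compares two ints; avoids the overflow that plain subtraction could cause.
int cmpfunc (const void * a, const void * b) {
int x = *(const int*)a;
int y = *(const int*)b;
return (x > y) - (x < y);
}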
int main () {
int n;
printf("Before sorting the list is: \n");
for( n = 0 ; n < 5; n++ ) {
printf("%d ", values[n]);
}
qsort(values, 5, sizeof(int), cmpfunc);
printf("\nAfter sorting the list is: \n");
for( n = 0 ; n < 5; n++ ) {
printf("%d ", values[n]);
}
return(0);
}
You can define your own compare function to match any kind of data, even a more complex data structure such as an instance of a class you define. Let's say a Person class that has a field age, and you want to sort all Persons by age.
That's one example where you can use void *; you can generalize from it to other use cases.
It is true that this is a C example, but since void * originated in C, I think it shows its real usage more clearly. If you understand what you can do with void *, you are good to go.
For C++ you can also check templates, templates can let you achieve a generic type for your functions / objects.
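As a rough sketch of the C++ alternative, the Person example mentioned above could be sorted with std::sort, which is a function template, so the comparator works with typed references instead of void *. The Person struct and sortByAge are hypothetical names for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical type matching the Person example mentioned above.
struct Person {
    std::string name;
    int age;
};

// Sorts in place by age; the comparator receives typed references,
// so no void* casts are needed.
void sortByAge(std::vector<Person>& people) {
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.age < b.age; });
}
```

Because the element type is known at compile time, a mistake such as comparing the wrong field type is caught by the compiler rather than at run time.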

Why is this example using memcpy to convert uint8_t* parameter to a structure?

I was using a TCP library that has an incoming data handler with the following signature:
static void handleData(void *arg, AsyncClient *client, void *data, size_t len)
When I tried to cast the data like the following to access the field values of the structure, the board crashed.
MyStructure* m = (MyStructure*)data;
In an example of an unrelated communication library, I had seen it using memcpy like the following, so I changed the casting code above to memcpy then it worked. But why is the example using memcpy instead of casting?
// Callback when data is received
void OnDataRecv(uint8_t * mac, uint8_t *incomingData, uint8_t len) {
memcpy(&incomingReadings, incomingData, sizeof(incomingReadings));
incomingTemp = incomingReadings.temp;
incomingHum = incomingReadings.hum;
}
The incomingReadings is declared as a global variable, but that variable is only used inside of that function, and only the fields which are copied to other global variables incomingTemp and incomingHum are used elsewhere. What if the example function were like the following, would it crash?
void OnDataRecv(uint8_t * mac, uint8_t *incomingData, uint8_t len) {
struct_message* incoming = (struct_message*)incomingData;
incomingTemp = incoming->temp;
incomingHum = incoming->hum;
}
PS: About the crashing above, I have tested more things to reproduce it with simpler code. It seems that the board does not crash at casting, but at accessing the cast variable.
The structure is as simple as
typedef struct TEST_TYPE
{
unsigned long a;
} TEST_TYPE;
and in the client, I sent a, created like this:
TEST_TYPE *a = new TEST_TYPE();
a->a = 1;
In the server's handleData, I modified the code like below:
static void handleData(void *arg, AsyncClient *client, void *data, size_t len)
{
Serial.printf("Data length = %i\n", len);
uint8_t* x = (uint8_t*)data;
for(int i =0; i<len; i++)
{
Serial.printf("%X, ", x[i]);
}
Serial.println("Casting.");
TEST_TYPE* a = (TEST_TYPE*)data;
Serial.println("Printing.");
Serial.printf("\nType = %i\n", a->a);
and the output was:
Data length = 4
1, 0, 0, 0, Casting.
Printing.
--------------- CUT HERE FOR EXCEPTION DECODER ---------------
Exception (9):
epc1=0x40201117 epc2=0x00000000 epc3=0x00000000 excvaddr=0x3fff3992 depc=0x00000000
>>>stack>>>
ctx: sys
sp: 3fffec30 end: 3fffffb0 offset: 0190
PS2: Seems like it indeed is an alignment issue. The exception code is 9 above, and according to this page, 9 means:
LoadStoreAlignmentCause Load or store to an unaligned address
I have found an old answer for a similar case. The author suggested some possible solutions
adding __attribute__((aligned(4))) to the buffer: I think this is not applicable in my case, because I did not create the data parameter.
adding __attribute__((packed)) to the structure: I have modified my structure like the following, and it did not crash this time.
typedef struct TEST_TYPE
{
unsigned long a;
} __attribute__((packed)) TEST_TYPE;
Reading it one byte at a time and constructing the fields manually: This seems like too much work.
Without the full picture of the lifetimes of all the data, it's hard to say what's going wrong in your particular case. Some thoughts:
uint8_t *bytes;
...
MyStructure* m = (MyStructure*)bytes;
What the snippet above is doing is using m to interpret the region of memory pointed to by bytes as a MyStructure. It's important to note that m is only valid as long as bytes is valid. When bytes goes out of scope (or is freed, etc.), m is no longer valid.
uint8_t *bytes;
MyStructure m;
...
memcpy(&m, bytes, sizeof(MyStructure));
This snippet is copying the data referred to by bytes into m. At this point, m's lifetime is separate from bytes. Note that you could do the same thing with this syntax:
uint8_t *bytes;
MyStructure m;
...
m = *((MyStructure*)bytes);
This snippet is saying "treat bytes as a pointer to a MyStructure, then dereference the pointer and make a copy of it".
As @danadam points out in a comment, memcpy() should be used in the case of alignment issues.
Would it crash? Perhaps.
Essentially you're touching alignment and aliasing here.
The rules are here:
https://en.cppreference.com/w/cpp/language/object#Alignment
Your struct most probably has higher alignment requirements than 1 and therefore it depends on where in the memory the converted bytes are located if it will crash or not. As neither you nor the compiler can be sure of that, the cast is undefined behavior.
The only way your cast from void* to MyStructure* wouldn't be UB is when the void* was obtained from a MyStructure* in the first place.
uint8_t, char, etc. have minimal alignment requirements (only 1 byte) and are therefore valid anywhere in a chunk of memory. That can be used to copy the memory into your correctly aligned object.
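A minimal sketch of that copy-based approach, assuming a struct with the same single member as TEST_TYPE from the question. The names TestType and readTestType are made up for this illustration and do not belong to any library.

```cpp
#include <cstddef>
#include <cstring>

// Same single member as the TEST_TYPE struct in the question.
struct TestType {
    unsigned long a;
};

// Copies the bytes into a properly aligned object instead of
// dereferencing a possibly misaligned pointer. Returns false if the
// buffer is too short to contain a TestType.
bool readTestType(const void* data, std::size_t len, TestType& out) {
    if (len < sizeof(TestType)) {
        return false;
    }
    std::memcpy(&out, data, sizeof(TestType));
    return true;
}
```

Inside a receive callback, this would be called with the incoming data pointer and length, and the fields would then be read from the local copy. The length check also guards against short packets, which a plain cast would silently ignore.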

Strcpy behavior with stack array c++

Here is my program :
#include <cstring>
const int SIZE =10;
int main()
{
char aName [SIZE]; // creates an array on the stack
std::strcpy(aName, "Mary");
return 0;
}
This program is obviously useless, I am just trying to understand the behavior of the strcpy function.
Here is its signature:
char * strcpy ( char * destination, const char * source )
so when I do :
std::strcpy(aName, "Mary");
I am passing by value the variable aName. I know that the aName (in the main) contains the address of the array.
So is this assertion correct : strcpy creates a local variable called destination that has as value the address of the array aName that I have created on the stack in the main function?
I am asking this because it is very confusing to me. Whenever I have encountered addresses it usually was to point to a memory allocated on the heap...
Thanks!
Whenever you encounter addresses it doesn't mean it will always point to memory allocated to heap.
You can assign the address of a variable to a pointer like this
int a=5;
int *myPtr= &a;
Now, myPtr is a pointer to int that points to the memory of the variable a, which is created on the stack and has the value 5.
In contrast, whenever you create a pointer and assign it memory obtained with the new keyword, that memory is allocated on the heap. So, if I assign the value like this, it will be on the heap:
int *myPtr= new int[5];
So is this assertion correct : strcpy creates a local variable called destination that has as value the address of the array aName that I have created on the stack in the main function?
Yes.
Whenever I have encountered addresses it usually was to point to a memory allocated on the heap...
Yep, usually. But not always.
Pointers to non-dynamically-allocated things are fairly rare in C++, though in C it's more common as that's the only way to have "out arguments" (C does not have references).
strcpy is a function from C's standard library.
Maybe it would help to look at an example implementation of strcpy():
char* strcpy(char* d, const char* s)
{
char* tmp = d;
while (*tmp++ = *s++)
;
return d;
}
That's really all there is to it. Copy characters from the source to the destination until the source character is null (including the null). Return the pointer to the beginning of the destination. Done.
Pointers point to memory. It doesn't matter if that memory is "stack", "heap" or "static".
Function parameters are its local variables.
In this call
std::strcpy(aName, "Mary");
the two arrays (the one created in main with automatic storage duration, and the string literal, which has static storage duration) are implicitly converted to pointers to their first elements.
So you may imagine this call and the function definition the following way
std::strcpy(aName, "Mary");
// …
char * strcpy ( /* char * destination, const char * source */ )
{
char *destination = aName;
const char *source = "Mary";
// …
return destination;
}
Or even like
char *p_to_aName = &aName[0];
const char *p_to_literal = &"Mary"[0];
std::strcpy( p_to_aName, p_to_literal );
// …
char * strcpy ( /* char * destination, const char * source */ )
{
char *destination = p_to_aName;
const char *source = p_to_literal;
// …
return destination;
}
That is, within the function its parameters are local variables of pointer types with automatic storage duration that are initialized by pointers to the first characters of the passed character arrays.
So is this assertion correct : strcpy creates a local variable called destination that has as value the address of the array aName that I have created on the stack in the main function?
Yes. That is correct. Though I probably wouldn't call it a local variable. It is a parameter. Local variable usually means something like this:
int localVariable;
The word "parameter" is often associated with things like this:
int myFunction(int parameter) {
// use parameter some where...
}
The point is roughly the same though: it creates a variable that will go out of scope once the function exits.
I am asking this because it is very confusing to me. Whenever I have encountered addresses it usually was to point to a memory allocated on the heap...
Yes, this is the most common use case for them. But it isn't their only use. Pointers are addresses, and every variable has an address in memory regardless of whether it is allocated on the "heap" or "stack."
The use here is probably because pointers to char are commonly used to store strings, particularly with older compilers. Combined with the fact that arrays "decay" into pointers, it is probably easier to work with pointers. It is also certainly more backwards compatible to do it this way.
The function could have just as easily used an array, like this:
char * strcpy ( char destination[], const char source[] )
But I'm going to assume it is easier to work with pointers here instead (Note: I don't think you can return an array in C++, so I'm still using char *. However, even if you could, I would imagine it is still easier to work with pointers anyway, so I don't think it makes a lot of difference here.).
Another common use of pointers is using them as a way to sort of "pass by reference":
#include <iostream>
void foo(int * myX) {
*myX = 4;
}
int main() {
int x = 0;
foo(&x);
std::cout << x; // prints "4"
return 0;
}
However, in modern C++, actually passing by reference is preferred to this:
#include <iostream>
void foo(int & myX) {
myX = 4;
}
int main() {
int x = 0;
foo(x);
std::cout << x; // prints "4"
return 0;
}
But I bring it up as another example to help drive the point home: memory allocated on the heap isn't the only use of pointers, merely the most common one (though actually dynamically allocated memory has been mostly replaced in modern C++ by things like std::vector, but that is beside the point here).
I know that the aName (in the main) contains the address of the array.
You knew wrong. aName is an array. It contains the elements, not an address.
But when you use the name of the array as a value such as when passing it to strcpy, it is implicitly converted to a pointer to first element of the array (the value of a pointer is the memory address of the pointed object). Such implicit conversion is called decaying.
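The effect of decaying can be observed with sizeof. In this sketch, sizeSeenByCallee and sizeOfArray are hypothetical helper names; SIZE is the constant from the question.

```cpp
#include <cstddef>
#include <cstring>

const int SIZE = 10;

// Receives a copy of the pointer, just like strcpy's destination parameter.
std::size_t sizeSeenByCallee(char* destination) {
    return sizeof(destination);   // size of a pointer, not of the array
}

// Taking the array by reference keeps its full type, so no decay happens.
std::size_t sizeOfArray(char (&array)[SIZE]) {
    return sizeof(array);         // SIZE bytes
}
```

Passing aName to the first function hands over only the address of its first element, while the second function still sees all ten elements.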
So is this assertion correct : strcpy creates a local variable called destination that has as value the address of the array aName that I have created on the stack in the main function?
This is correct enough. To clarify: It is a function argument rather than a local variable. But the distinction is not important here. Technically, it is the caller who is responsible for pushing the arguments onto the stack or storing them into registers, so it could be considered that main "creates" the variable.
Whenever I have encountered addresses it usually was to point to a memory allocated on the heap
Pointers are not uniquely associated with "heap". Pretty much any object can be pointed at, whether it has dynamic, static or automatic storage or even if it is a subobject.

const char* getting modified after assigning to char*

int FunctionName(const char *pValueName, const char *pValueData, long iMaxValueSize)
{
char *pDataToStore = const_cast<char *>(pValueData);
int iActualSiz = ProcessData(pDataToStore, iMaxValueSize);
...
...
}
In the code snippet above, the ProcessData() function modifies the char* it receives as a parameter. Even though pValueData was assigned to pDataToStore, after ProcessData() has executed, the data pointed to by pValueData has changed in the same way as the data pointed to by pDataToStore.
My aim is to keep intact value of pValueData which is being passed as const char*
My aim is to keep intact value of pValueData which is being passed as const char*
That's impossible. Passing via const means it cannot be modified, except when it was originally not constant.
Example:
char *ptr1 = new char[100]; // not const
char *ptr2 = new char[100]; // not const
int i = FunctionName(ptr1, ptr2, 123);
In this case, you could technically keep the const_cast. But what for? Just change your function parameters to take char *:
int FunctionName(char *pValueName, char *pValueData, long iMaxValueSize)
{
int iActualSiz = ProcessData(pValueData, iMaxValueSize);
// ...
}
However, you most likely want to be able to pass constant strings. For example string literals:
int i = FunctionName("name", "data", 123);
String literals are unmodifiable and thus require your function to take char const *. A later attempt to modify them causes undefined behaviour.
As you can see, the error is in the general architecture and code logic. You want to modify something and at the same time you do not want to allow to modify it.
The question is: What happens with your pDataToStore when ProcessData is done with it? Does the caller of FunctionName need to be aware of the modifications? Or is it just internal business of FunctionName?
If it's just internal business of FunctionName, then you can keep its signature intact and have ProcessData modify a copy of the passed data. Here is a simplified (not exception-safe, no error checks) example:
int FunctionName(const char *pValueName, const char *pValueData, long iMaxValueSize)
{
char *copy = new char[strlen(pValueData) + 1];
strcpy(copy, pValueData);
int iActualSiz = ProcessData(copy, iMaxValueSize);
// ...
delete[] copy;
}
The nice thing is that you can now massively improve the interface of FunctionName by hiding all the low-level pointer business. In fact, why use so many pointers at all when C++ standard classes can do all the work for you?
int FunctionName(std::string const &valueName, std::string const &valueData, long maxValueSize)
{
std::vector<char> copy(valueData.begin(), valueData.end());
int actualSize = ProcessData(&copy[0], maxValueSize);
// ...
// no more delete[] needed here
}
The std::vector<char> automatically allocates enough memory to hold a copy of valueData, and performs the copy. It fully automatically frees the memory when it is no longer needed, even if exceptions are thrown. And &copy[0] (which in C++11 can be written as copy.data()) is guaranteed to yield a pointer to the internally used data, so that low-level C functions can modify the vector's elements.
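To make the copy-then-modify idea concrete, here is a minimal sketch. The body of ProcessData is a hypothetical stand-in (it upper-cases the buffer in place), since the real function is not shown, and ProcessCopy is a helper name invented for this illustration. It assumes C++11 for copy.data().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the legacy ProcessData: it upper-cases the
// buffer in place and returns the number of bytes it touched.
int ProcessData(char *data, long maxSize)
{
    long i = 0;
    for (; i < maxSize && data[i] != '\0'; ++i)
        data[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(data[i])));
    return static_cast<int>(i);
}

// Copies valueData (plus a terminating '\0') into a vector and lets the
// legacy function modify only that copy. Returns the modified copy.
std::string ProcessCopy(std::string const &valueData)
{
    std::vector<char> copy(valueData.begin(), valueData.end());
    copy.push_back('\0'); // guarantees the vector is non-empty and terminated
    int actualSize = ProcessData(copy.data(), static_cast<long>(copy.size()));
    assert(actualSize >= 0 && static_cast<std::size_t>(actualSize) < copy.size());
    return std::string(copy.data(), static_cast<std::size_t>(actualSize));
}
```

Calling ProcessCopy(original) leaves original untouched, because only the vector's elements are ever handed to the modifying function.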
(I've also taken the chance to remove the Microsoft-style Hungarian Notation. It's a failed experiment from the 90s, and you've even used it incorrectly, supposing that a leading i is supposed to indicate an int.)
The bottom line is really:
If you need a const_cast anywhere in your code to make it compile, then somewhere else there is either at least one const missing or one too many. A const_cast always makes up for a mistake in another piece of code. It is always a workaround and never a solution designed up front.
Well, I have solved the issue by copying the data into heap memory.
char *pDataToStore = new char[iMaxValueSize];
memcpy(pDataToStore, pValueData, iMaxValueSize*sizeof(char));
int iActualSiz = ProcessData(pDataToStore, iMaxValueSize);
...
....
delete []pDataToStore;
You have to distinguish between a const-qualified type and a const-qualified object.
The standard states in section 7.1.6.1: cv-qualifiers: (cv = const or volatile)
A pointer or reference to a cv-qualified type need not actually point
or refer to a cv-qualified object, but it is treated as if it does; a
const-qualified access path cannot be used to modify an object even if
the object referenced is a non-const object and can be modified
through some other access path.
If your pointer points to a non-const object, casting the const away will enable you to modify the object, but as someone else said, you are lying to the user of your function.
If your pointer points to a real const object (i.e. in const-protected memory), the compiler will compile your code, but you might get a segmentation fault, which is typical for undefined behaviour.
Here is an example, using the fact that "Ordinary string literal (...) has type “array of n const char”, where n is the size of the string (...)" (see standard, section 2.14.5):
char *my_realconst = "This is a real constant string"; // pointer does not claim that it points to const object
(*my_realconst)++; // Try to increment the first letter, will compile but will not run properly !!
So if your function ProcessData() is legacy code that only reads the data but omits the const in its parameter list, your cast-away will work. If your function alters the data, however, it might work or it might fail, depending on how the data pointed to was created!
So try to avoid casting const away unless you are 100% sure what the effects will be! It is better to clone your object the hard way, creating a temporary object and copying the content.
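The difference between a const access path and a const object can be sketched as follows. All names here (Increment, BumpThroughConstPath, Demo) are made up for the illustration and do not come from the question.

```cpp
// Legacy-style function that takes a non-const pointer and really writes.
void Increment(int *p) { ++*p; }

// The access path is const, but the caller promises the object itself is not.
void BumpThroughConstPath(const int *p)
{
    Increment(const_cast<int *>(p)); // well-defined only if *p is a non-const object
}

int Demo()
{
    int counter = 41; // non-const object, so modifying it after the cast is fine
    BumpThroughConstPath(&counter);
    // const int fixed = 41;
    // BumpThroughConstPath(&fixed); // would compile, but is undefined behaviour
    return counter;
}
```

The cast itself is identical in both cases; only the way the pointed-to object was created decides whether the write is legal.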
Here is a small template to handle this kind of issue easily:
template <typename T>
class Buffer {
size_t sz; // size
T* addr; // pointed
public:
Buffer(const T*source, size_t l) : sz(l), addr(new T[l]) { std::copy(source, source + l, addr); } // allocate and copy
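~Buffer() { delete[]addr; } // destroy memory
Buffer(const Buffer&) = delete; // C++11: forbid copies, which would otherwise delete[] the same memory twice
Buffer& operator=(const Buffer&) = delete;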
operator T* () { return addr; } // convert to pointer
};
You may use your existing code almost as is:
Buffer<char> pDataToStore(pValueData, iMaxValueSize); // create the automatic buffer
int iActualSiz = ProcessData(pDataToStore, iMaxValueSize); // automatic use of pointer to buffer
cout << "modified copy: " << pDataToStore << endl;
cout << "original: " << pValueData << endl;
The buffer will be automatically released once pDataToStore is no longer in scope.
If you have similar issues with wchar_t buffers or anything else, it will work as well.
For explanations on the evil of casting away const, see my other answer

Deallocate structure using pointer arithmetics and a pointer to an element of that structure

I have the following structure in C++ :
struct wrapper
{
// Param constructor
wrapper(unsigned int _id, const char* _string1, unsigned int _year,
unsigned int _value, unsigned int _usage, const char* _string2)
:
id(_id), year(_year), value(_value), usage(_usage)
{
int len = strlen(_string1);
string1 = new char[len + 1]();
strncpy(string1, _string1, len);
len = strlen(_string2);
string2 = new char[len + 1]();
strncpy(string2, _string2, len);
};
// Destructor
~wrapper()
{
if(string1 != NULL)
delete [] string1;
if(string2 != NULL)
delete [] string2;
}
// Elements
unsigned int id;
unsigned int year;
unsigned int value;
unsigned int usage;
char* string1;
char* string2;
};
In main.cpp let's say I allocate memory for one object of this structure :
wrapper* testObj = new wrapper(125600, "Hello", 2013, 300, 0, "bye bye");
Can I now delete the entire object using pointer arithmetic and a pointer that points to one of the structure elements ?
Something like this :
void* ptr = &(testObj->string2);
ptr -= 0x14;
delete (wrapper*)ptr;
I've tested myself and apparently it works but I'm not 100% sure that is equivalent to delete testObj.
Thanks.
Technically, code like this would work (ignoring the fact that wrapper testObj should be wrapper* testObj and that the offset is not necessarily 0x14, e.g. debug builds sometimes pad the structures, and maybe some other detail I missed), but it is a horrible, horrible idea. I can't stress enough how horrible it is.
Instead of 0x14 you could use the offsetof macro.
If you like spending nights in the company of the debugger, sure, feel free to do so.
I will assume that the reason for the question is sheer curiosity about whether it is possible to use pointer arithmetic to navigate from members to parent, and not that you would like to really do it in production code. Please tell me I am right.
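Purely to satisfy that curiosity, the offsetof variant might look roughly like this. Record is a simplified, hypothetical stand-in for the question's wrapper, and RecordFromString2 is a name invented here. The trick is widely used in C-style code but remains fragile, so treat it as a sketch rather than a recommendation.

```cpp
#include <cstddef> // offsetof

// Simplified, hypothetical stand-in for the question's wrapper struct.
struct Record
{
    unsigned int id;
    unsigned int year;
    char *string1;
    char *string2;
};

// Recovers the owning Record from a pointer to its string2 member.
Record *RecordFromString2(char **member)
{
    // Cast to char* first so the subtraction moves in single bytes;
    // offsetof supplies the distance instead of a magic number like 0x14.
    char *raw = reinterpret_cast<char *>(member) - offsetof(Record, string2);
    return reinterpret_cast<Record *>(raw);
}
```

Because the compiler computes the offset, padding and alignment differences between builds no longer break the arithmetic, although storing the original pointer is still the safer choice.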
Can I now delete the entire object using pointer arithmetic and a pointer that points to one of the structure elements ?
Theoretically, yes.
The pointer that you give to delete needs to have the correct value, and it doesn't really matter whether that value comes from an existing pointer variable, or by "adjusting" one in this manner.
You also need to consider the type of the pointer; if nothing else, you should cast to char* before performing your arithmetic so that you are moving in steps of single bytes. Your current code will not compile because ISO C++ forbids incrementing a pointer of type 'void*' (how big is a void?).
However, I recommend not doing this at all. Your magic number 0x14 is unreliable, given alignment and padding and the potential of your structure to change shape.
Instead, store a pointer to the actual object. Also stop with all the horrid memory mess, and use std::string. At present, your lack of copy constructor is presenting a nasty bug.
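As a rough sketch of the std::string suggestion, the struct from the question could be reduced to something like the following. Member and parameter names are kept from the question; everything else is one possible rewrite, not the only one.

```cpp
#include <string>

struct wrapper
{
    wrapper(unsigned int _id, const char *_string1, unsigned int _year,
            unsigned int _value, unsigned int _usage, const char *_string2)
        : id(_id), year(_year), value(_value), usage(_usage),
          string1(_string1), string2(_string2)
    {
    }

    // No destructor or copy constructor is needed: std::string manages
    // its own memory and copies deeply.
    unsigned int id;
    unsigned int year;
    unsigned int value;
    unsigned int usage;
    std::string string1;
    std::string string2;
};
```

With this version, copying a wrapper yields independent strings, so the double delete[] that the original struct would perform after a copy cannot happen.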
You can do this sort of thing with pointer arithmetic. Whether you should is an entirely different story. Consider this macro (I know... I know...) that will give you the base address of a structure given its type, the name of a structure member and a pointer to that member:
#define ADDRESS_FROM_MEMBER(T, member, ptr) reinterpret_cast<T*>( \
reinterpret_cast<unsigned char *>(ptr) - (ptrdiff_t)(&(reinterpret_cast<T*>(0))->member))