Why use array size 1 instead of pointer? - c++

In one C++ open source project, I see this.
struct SomeClass {
...
size_t data_length;
char data[1];
...
}
What are the advantages of doing so rather than using a pointer?
struct SomeClass {
...
size_t data_length;
char* data;
...
}
The only thing I can think of is with the size 1 array version, users aren't expected to see NULL. Is there anything else?

With this, you don't have to allocate the memory elsewhere and make the pointer point to that.
No extra memory management
Accesses to the memory will hit the memory cache (much) more likely
The trick is to allocate more memory than sizeof (SomeClass), and make a SomeClass* point to it. Then the initial memory will be used by your SomeClass object, and the remaining memory can be used by the data. That is, you can say p->data[0] but also p->data[1] and so on up until you hit the end of memory you allocated.
Points can be made that this use results in undefined behavior though, because you declared your array to only have one element, but access it as if it contained more. But real compilers do allow this with the expected meaning because C++ has no alternative syntax to formulate these means (C99 has, it's called "flexible array member" there).

This is usually a quick(and dirty?) way of avoiding multiple memory allocations and deallocations, though it's more C stylish than C++.
That is, instead of this:
struct SomeClass *foo = malloc(sizeof *foo);
foo->data = malloc(data_len);
memcpy(foo->data,data,data_len);
....
free(foo->data);
free(foo);
You do something like this:
struct SomeClass *foo = malloc(sizeof *foo + data_len);
memcpy(foo->data,data,data_len);
...
free(foo);
In addition to saving (de)allocation calls, this can also save a bit of memory as there's no space for a pointer and you could even use space that otherwise could have been struct padding.

Usually you see this as the final member of a structure. Then whoever mallocs the structure, will allocate all the data bytes consecutively in memory as one block to "follow" the structure.
So if you need 16 bytes of data, you'd allocate an instance like this:
SomeClass * pObj = malloc(sizeof(SomeClass) + (16 - 1));
Then you can access the data as if it were an array:
pObj->data[12] = 0xAB;
And you can free all the stuff with one call, of course, as well.
The data member is a single-item array by convention because older C compilers (and apparently the current C++ standard) doesn't allow a zero-sized array. Nice further discussion here: http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

They are semantically different in your example.
char data[1] is a valid array of char with one uninitialized element allocated on the stack. You could write data[0] = 'w' and your program would be correct.
char* data; simply declares a pointer that is invalid until initialized to point to a valid address.

The structure can be simply allocated as a single block of memory instead of multiple allocations that must be freed.
It actually uses less memory because it doesn't need to store the pointer itself.
There may also be performance advantages with caching due to the memory being contiguous.

The idea behind this particular thing is that the rest of data fits in memory directly after the struct. Of course, you could just do that anyway.

Related

Delete on a pointer in a Union

I have tried some interesting code(at least for me !). Here it is.
#include <iostream>
struct myStruct{
int one;
/*Destructor: Program crashes if the below code uncommented*/
/*
~myStruct(){
std::cout<<"des\n";
}
*/
};
struct finalStruct {
int noOfChars;
int noOfStructs;
union {
myStruct *structPtr;
char *charPtr;
}U;
};
int main(){
finalStruct obj;
obj.noOfChars = 2;
obj.noOfStructs = 1;
int bytesToAllocate = sizeof(char)*obj.noOfChars
+ sizeof(myStruct)*obj.noOfStructs;
obj.U.charPtr = new char[bytesToAllocate];
/*Now both the pointers charPtr and structPtr points to same location*/
delete []obj.U.structPtr;
}
I have allocated memory to charPtr and deleted with structPtr. It is crashing when I add a destructor to myStruct otherwise no issues.
What exactly happens here. As I know delete[] will call the destructor as many times as number given in new[]. Why it is not crashing when there is no destructor in myStruct?
First off, storing one member of a union and then reading another in the way you're doing it is Undefined Behaviour, plain and simple. It's just wrong and anything could happen.
That aside, it's quite likely the type pun you're attempting with the union actually works (but remember it's not guaranteed). If that's the case, the following happens:
You allocate an array of bytesToAllocate objects of type char and store the address in the unionised pointer.
Then, you call delete[] on the unionised pointer typed as myStruct*. Which means that it assumes it's an array of myStruct objects, and it will invoke the destructor on each of these objects. However, the array does not contain any myStruct objects, it contains char objects. In fact, the size in bytes of the array is not even a multiple of the size of myStruct! The delete implementation must be thoroughly confused. It probably interprets the first sizeof(myStruct) bytes as one myStruct object and calls the destructor in those bytes. Then, there's less than sizeof(myStruct) bytes left, but there are still some left, so the destructor is called on those incomplete bytes, reaches beyond the array, and hilarity ensues.
Of course, since this is just UB, my guess at the behaviour above could be way off. Plain and simple, you've confused it, so it acts confused.
delete makes two things, call destructor and deallocate memory.
You allocate data for one type, but delete if faking another type.
You shouldn't do it. There are many things one could do in C/C++, take a look at IOCCC for more inspirations :-)
A struct in C++ without any function and having only plain old data is itself a POD. It never calls a constructor/destructor when created/deleted.
Even not standard c-tors/d-tors. Just for performance reasons.
A Struct having (EDIT) user-defined copy-assignment operator, virtual function or d-tor is internally a little bit more complicated. It has a table of member function pointers.
If you allocate the memory block with chars, this table is not initialized. When you try to delete this memory block using a not POD-type, it first calls the destructor. And as the destructor function pointer is not initialized, it calls any memory block in your memory space, thinking it was the function. That's why it crashes.
It works because myStruct does not have a destructor. [Edit: I now see that you tried that, and it does crash. I would find the question interesting why it crashes with that dtor, since the dtor does not access any memory of the object.]
As others said, the second function of free[] besides potentially calling the elements' dtors (which doesn't happen here, as described) is to free the memory.
That works perfectly in your implementation because typically free store implementations just allocate a block of memory for that purpose whose size is kept in a book keeping location in that very memory. The size (once allocated) is type independent, i.e. is not derived from the pointer type on free. Cf. How does delete[] "know" the size of the operand array?. The malloc like, type agnostic allocator returns the chunk of memory and is happy.
Note that, of course, what you do is bogous and don't do that at home and don't publish it and make ppl sign non liability disagreements and don't use it in nuclear facilities and always return int from main().
the problem is that obj.U.structPtr points to a struct, which can have a constructor and destructor.
delete also requires the correct type, otherwise it cannot call the destructor.
So it is illegal to create a char array with new and delete it as an struct pointer.
It would be okay if you use malloc and free. This won't call the constructor and destructor.

Using any buffer to store any data, with placement new and memcpy

It is a while now that I am reading documentation about alignment, aliasing, padding, and placement new, but still I am not sure about how to solve this problem - it is the first time I face a memory issue on this level, so I am not confident.
The problem is: I have a buffer, an array of data, which uses some type typeT, e.g.
typeT buff[N];
Inside that buffer, I have to store some other data, of type typeU, at the beginning of buff. Let's assume to have K elements of typeU, which is less or equal to M elements of type typeT.
EDIT: This problem depends on the data type, but my question is generic, and hold for POD and non-POD. But you can assume that non-POD is just an aggregation of POD types, there is no dynamic memory to be copied and all the data is contained inside the sizeof(typeX) bytes of the structure itself. That is, do not worry about deep/shallow copying.
I am mostly interested in the case where typeT and typeU have different alignments. Which alignment size is bigger is not known.
Question 1: (most important question) is it always safe to use placement new for storing my data on buff, considering the fact that I will always access to those data using typeU, like in the following code?
typeU *allocData = new (buff) typeU[K];
allocData[0] = foo;
Question 2: is it always safe to copy any data from typeU* inside a buffer of type typeT* using memcpy, given that I will always access the data using typeU*, like in the following code?
typeU prevData[K];
memcpy(buff, prevData, sizeof(typeU) * K);
// Is it safe to do the following?
typeU *accessData = reinterpret_cast<typeU *>(buff);
accessData[1] = foo;
Question 3: (least important question) is it true that it is not always safe to just cast pointers and write the data using those? Like in
typeU *castData = (typeU*)buff;
castData[2] = something;
even if I always use the first region of memory (first M elements) using typeU and the second region (elements from M to N) using typeT?
I hope it is clear what I mean... I searched a lot, but I am still confused.
Also, please consider I am talking about a C++98 or 03, not 11, with a minimal subset of STL, no boost and not always on x86 - but I can use memcpy and placement new, as I said.
Provided that the size and alignment requirements of typeT and typeU are identical, you can use uninitialized storage allocated for typeT to hold a value of typeU.
If typeT and/or typeU have constructors or destructors, you must ensure that they are called appropriately, and in the right order.
static_assert(sizeof(typeT) == sizeof(typeU))
static_assert(alignof(typeT) == alignof(typeU))
typeT t[10];
&t[0]->~typeT(); // placement delete t[0] to uninitialize
typeU* p = &t[0];
new (p) typeU(); // construct typeU at t[0]
typeU u& = *p;
u.doStuff();
p->~typeU(); // placement delete typeU at t[0]
new (&t[0]) typeT(); // reconstruct typeT at t[0]
I just discovered this article by Sutter which introduces something I didn't knew
Any memory that's allocated dynamically via new or malloc is
guaranteed to be properly aligned for objects of any type, but buffers
that are not allocated dynamically have no such guarantee
This means, first of all, that doing like I was doing is wrong:
typeT buff[N];
new (buff) typeU[K];
Because buff is not allocated dynamically, so there is no guarantee.
So, the point is: to do this kind of handling, memory has to be dynamically allocated. If it is not, the answer depends on the alignment.

Dynamic sized array in C

Here is a code snippet I was working with:
int *a;
int p = 10;
*(a+0) = 10;
*(a+1) = 11;
printf("%d\n", a[0]);
printf("%d\n", a[1]);
Now, I expect it to print
10
11
However, a window appears that says program.exe has stopped working.
The if I comment out the second line of code int p = 10; and then tun the code again it works.
Why is this happening? (What I wanted to do was create an array of dynamic size.)
There are probably at least 50 duplicates of this, but finding them may be non-trivial.
Anyway, you're defining a pointer, but no memory for it to point at. You're writing to whatever random address the pointer happened to contain at startup, producing undefined behavior.
Also, your code won't compile, because int *a, int p = 10; isn't syntactically correct -- the comma needs to become a semicolon (or you can get rid of the second int, but I wouldn't really recommend that).
In C, you probably want to use an array instead of a pointer, unless you need to allocate the space dynamically (oops, rereading, you apparently do want to -- so you need to use malloc to allocate the space, like a = malloc(2); -- but you also want to check the return value to before you use it -- at least in theory, malloc can return a null pointer). In C++, you probably want to use a std::vector instead of an array or pointer (it'll manage dynamic allocation for you).
No memory is being allocated for a, it's just an uninitialized pointer to an int (so there are two problems).
Therefore when data is stored in that location, the behavior is undefined. That means you may sometimes not even get a segmentation fault/program crash, or you may -> undefined. (Since C doesn't do any bounds checking, it won't alert you to these sort of problems. Unfortunately, one of the strength of C is also one of its major weaknesses, it will happily do what you ask of it)
You're not even allocating memory so you're accessing invalid memory...
Use malloc to allocate enough memory for your array:
int* a = (int*) malloc(sizeof(int)*arraySize);
//Now you can change the contents of the array
You will need to use malloc to assign memory that array.
If you want the size to by dynamic you will need to use realloc every time you wish to increase the size of the array without destroying the data that is already there
You have to allocate storage for the array. Use malloc if you're in C, new if it's C++, or use a C++ std::vector<int> if you really need the array size to be dynamic.
First a has not been initialized. What does it point to? Nothing, hopefully zero, but you do not know.
Then you are adding 1 to it and accessing that byte. IF a were 0, a+1 would be 1. What is in memory location 1?
Also you are bumping the address by one addressable memory unit. This may or may not be the size of an integer on that machine.

Is this a memory leak, if not is it guaranteed by the standard?

Related to a recent question, I wrote the following code:
int main()
{
char* x = new char[33];
int* sz = (int*)x;
sz--;
sz--;
sz--;
sz--;
int szn = *sz; //szn is 33 :)
}
I do know it's not safe and would never use it, but it brings to mind a question:
Is the following safe? Is it a memory leak?
char* allocate()
{
return new char[20];
}
int main()
{
char* x = allocate();
delete[] x;
}
If it's safe, doesn't that mean we can actually find the size of the array? Granted, not in a standard way, but is the compiler required to store information about the size of the array?
I am not using or plan on using this code. I know it is undefined behavior. I know it isn't guaranteed by anything. It's just a theoretical question!
Is the following safe?
Yes, of course that's safe. First snippet has UB however.
If it's safe, doesn't that mean we can actually find the size of the array? Granted, not in a standard way, but is the compiler required to store information about the size of the array?
Yes, generally extra data is stored before the first element. This is used to call the correct number of destructors. It's UB to access this.
required to store information about the size of the array?
No. It only requires delete[] work as expected. new int[10] could simply be a plain malloc call, which would not necessarily store the requested size 10.
This is safe, and is not a memory leak. The standards require that delete[] handle the freeing of memory by any array allocation.
If it's safe, doesn't that mean we can actually find the size of the array?
The standards don't put specific requirements on where and how the allocated size is stored. This could be discoverable as shown above, but different compilers/platforms could also use a completely different methodology. As such, it's not safe to rely on this technique to discover the size.
I know that in c, the size of any malloc on the heap resides before the pointer. The code for free relies on this. This is documented in K&R.
But you should not rely on this always being there or always being in the same position.
If you want to know the array length then I would suggest you create a class or struct to record capcity along side the actual array, and pass that around your program where you would previously just pass a char*.
int main()
{
char* x = new char[33];
int* sz = (int*)x;
sz--;
sz--;
sz--;
sz--;
int szn = *sz; //szn is 33 :)
}
This is an undefined behavior, because you access the memory location that you didn't allocate.
is the compiler required to store information about the size of the array?
No.
If it's safe, doesn't that mean we can actually find the size of the array?
You do not do anything special in the 2nd code snipet, therefore it's safe. But there are no ways to get the size of the array.
I am not sure it the delete must know the size of the array when the array is allocated with basic types (that doesn't demand a call to the destructor). In visual studio compilers, the value is stored only for user defined objects (in this case, delete[] must know the size of the array, as it must call their destructors).
Where in the memory the size is allocated is undefined (in visual studio it is in the same place of the gcc).
http://www.parashift.com/c++-faq-lite/freestore-mgmt.html#faq-16.14
There are two ways to destroy an array, depending on how it was created. In both cases the compiler is required to call a destructor for each element of the array, so the number of elements in the array must be known.
If the array is an automatic variable on the stack, the number of elements is known at compile time. The compiler can hard-code the number of elements in the code it emits for destroying the array.
If the array is dynamically allocated on the heap, there must be another mechanism for knowing the element count. That mechanism is not specified by the standard, nor is it exposed in any other fashion. I think that putting the count at an offset from the front of the array is a common implementation, but it's certainly not the only way, and the actual offset is just a private implementation detail.
Since the compiler must know how many elements are in the array, you'd think it would be possible for the standard to mandate a way of making that count available to programs. Unfortunately this is not possible because the count is only known at destruction time. Imagine that the standard included a function count_of that could access that hidden information:
MyClass array1[33];
MyClass * array2 = new MyClass[33];
cout << count_of(array1) << count_of(array2); // outputs 33 33
Foo(array1);
Foo(array2);
MyClass * not_array = new MyClass;
Foo(not_array);
void Foo(MyClass * ptr)
{
for (int i = 0; i < count_of(ptr); ++i) // how can count_of work here?
...
}
Since the pointer passed to Foo has lost all its context, there's no consistent way for the compiler to know how many elements are in the array, or even if it's an array at all.

How do I over-allocate memory using new to allocate variables within a struct?

So I have a couple of structs...
struct myBaseStruct
{
};
struct myDerivedStruct : public myBaseStruct
{
int a, b, c, d;
unsigned char* ident;
};
myDerivedStruct* pNewStruct;
...and I want to dynamically allocate enough space so that I can 'memcpy' in some data, including a zero-terminated string. The size of the base struct is apparently '1' (I assume because it can't be zero) and the size of the derived is 20, which seems to make sense (5 x 4).
So, I have a data buffer which is a size of 29, the first 16 bytes being the ints and the remaining 13 being the string.
How can I allocate enough memory for pNewStruct so that there is enough for the string? Ideally, I just want to go:
allocate 29 bytes at pNewStruct;
memcpy from buffer into pNewStruct;
Thanks,
You go back to C or abandon these ideas and actually use C++ as it's intended.
Use the constructor to allocate memory and destructor to delete it.
Don't let some other code write into your memory space, create a function that will ensure memory is allocated.
Use a std:string or std::vector to hold the data rather than rolling your own container class.
Ideally you should just say:
myDerivedClass* foo = new myDerivedClass(a, b, c, d, ident);
In the current C++ standard, myDerivedStruct is non-POD, because it has a base class. The result of memcpying anything into it is undefined.
I've heard that C++0x will relax the rules, so that more classes are POD than in C++98, but I haven't looked into it. Also, I doubt that very many compilers would lay out your class in a way that's incompatible with PODs. I expect you'd only have trouble with something that didn't do the empty base class optimisation. But there it is.
If it was POD, or if you're willing to take your chances with your implementation, then you could use malloc(sizeof(myStruct)+13) or new char[sizeof(myStruct)+13] to allocate enough space, basically the same as you would in C. The motivation presumably is to avoid the memory and time overhead of just putting a std::string member in your class, but at the cost of having to write the code for the manual memory management.
You can overallocate for any class instance, but it implies a certain amount of management overhead. The only valid way to do this is by using a custom memory allocation call. Without changing the class definition, you can do this.
void* pMem = ::operator new(sizeof(myDerivedStruct) + n);
myDerivedStruct* pObject = new (pMem) myDerivedStruct;
Assuming that you don't overload operator delete in the hierarchy then delete pObject will be a correct way to destroy pObject and deallocate the allocated memory. Of course, if you allocate any objects in the excess memory area then you must correctly free them before deallocating the memory.
You then have access to n bytes of raw memory at this address: void* p = pObject + 1. You can memcpy data to and from this area as you like. You can assign to the object itself and shouldn't need to memcpy its data.
You can also provide a custom memory allocator in the class itself that takes an extra size_t describing the amount of excess memory to allocate enabling you to do the allocation in a single new expression, but this requires more overhead in the class design.
myDerivedStruct* pObject = new (n) myDerivedStruct;
and
struct myDerivedStruct
{
// ...
void* operator new(std::size_t objsize, std::size_t excess storage);
// other operator new and delete overrides to make sure that you have no memory leaks
};
You can allocate any size you want with malloc:
myDerivedStruct* pNewStruct = (myDerivedStruct*) malloc(
sizeof(myDerivedStruct) + sizeof_extra data);
You have a different problem though, in that myDerivedStruct::ident is a very ambigous construct. It is a pointer to a char (array), then the structs ends with the address where the char array starts? ident can point to anywhere and is very ambigous who owns the array ident points to. It seems to me that you expect the struct to end with the actual char array itself and the struct owns the extra array. Such structures usualy have a size member to keep track of teir own size so that API functions can properly manage them and copy them, and the extra data starts, by convention, after the structure ends. Or they end with a 0 length array char ident[0] although that creates problems with some compilers. For many reasons, there is no place for inheritance in such structs:
struct myStruct
{
size_t size;
int a, b, c, d;
char ident[0];
};
Mixing memcpy and new seems like a terrible idea in this context. Consider using malloc instead.
You can dynamically allocate space by doing:
myDerivedStruct* pNewStruct = reinterpret_cast<myDerivedStruct*>(new char[size]);
however
Are you sure you want to do this?
Also, note that if you are intending to use ident as the pointer to the start of your string, that would be incorrect. You infact need &ident, since the ident variable is itself at the start of your unused space, interpreting what is at that space as a pointer is most likely going to be meaningless. Hence, it would make more sense if ident were unsigned char or char rather than unsigned char*.
[edit again]
I'd just like to emphasise that what you're doing is really a really really bad idea.
char* buffer = [some data here];
myDerivedStruct* pNewStruct = new myDerivedStruct();
memcpy(buffer,pNewStruct,4*sizeof(int));
pNewStruct->ident = new char[ strlen(buffer+(4*sizeof int)) ];
strcpy(pNewStruct->ident,buffer+(4*sizeof int));
Something like that.
Is the buffer size known at compile time? A statically allocated array would be an easier solution in that case. Otherwise, see Remus Rusanu's answer above. That's how the win32 api manages variable sized structs.
struct myDerivedStruct : public myBaseStruct
{
int a, b, c, d;
unsigned char ident[BUFFER_SIZE];
};
Firstly, I don't get what's the point of having a myBaseStruct base. You proivided no explanation.
Secondly, what you declared in your original post will no work with the data layout you described. For what you described in the OP, you need the last member of the struct to be an array, not a pointer
struct myDerivedStruct : public myBaseStruct {
int a, b, c, d;
unsigned char ident[1];
};
Array size doesn't matter, but it should be greater than 0. Arrays of size 0 are explicitly illegal in C++.
Thirdly, if you for some reason want to use new specifically, you'll have to allocate a buffer of char objects of required size and then convert the resultant pointer to your pointer type
char *raw_buffer = new char[29];
myDerivedStruct* pNewStruct = reinterpret_cast<myDerivedStruct*>(raw_buffer);
After that you can do your memcpy, assuming that the size is right.