what is the point of having static arrays - d

I don't have a background in C or C++, so static arrays puzzle me a little. What are they for? Why are they allocated on the stack?
I imagine there's a performance benefit. Stack allocation is faster and there's no need for garbage collection. But why does the length need to be known at compile time? Couldn't you create a fixed-size array at runtime and allocate it on the stack?
Dynamic arays or slices in D are represented by a struct that contains a pointer and a length property. Is the same true for static arrays? How are they represented?
If you pass them to a function, they are copied in their entirety (unless you use ref), what is the rationale behind that?
I realize that dynamic arrays and slices are much more imprtant in D than static arrays, that's why the documentation doesn't dwell on them very long, but I'd still like to have a bit more background. I guess the peculiarities of static arrays have to do with how stack allocation works.

static arrays hail from the C language where calls to (the slow) alloc were discouraged to the extreme because of memory leaks (when you forgot to free the allocated array), double freeing (as a result of...), dangling pointers (all dangers of manual memory management that can be avoided with GC)
this meant that constructs such as
int foo(char* inp){
char[80] buff;
strcpy(inp,buff);//don't do this this is a invite for a buffer overflow
//...
return 1;
}
were common instead of the alloc/free calls where you need to be sure everything you allocated was freed exactly once over the course of the program
technically you CAN dynamically allocate on the stack (using assembly if you want to) however this can cause some issues with the code as the length will only be known at runtime and lessen possible optimization that the compiler may apply (unrolling an iteration over it for example)
static arrays are mostly used for buffers because of the fast allocation possible on the stack
ubyte[1024] buff=void;//assigning void avoids the initializer for each element cause we are writing to it first thing anyway
ubyte[] b;
while((b=f.rawRead(buff[])).length>0){
//...
}
they can implicitly convert to a slice of the array (or explicitly with the slice operator []) so you can use them near interchangeably with normal dynamic arrays

Static arrays are value types in D2. Without static arrays, there would be no simple way to have 100 elements in a struct that are actually stored in the struct.
Static arrays carry their size as part of their type. This allows you to e.g. declare: alias ubyte[16] IPv6Address;
Unlike in C, D2 static arrays are value types through and through. This means that they are passed by value to functions, like structs. Static arrays generally behave like structs with N members as far as memory allocation and copying goes.
By the way, you can use alloca to allocate a variable amount of memory on the stack. C also has variable-length arrays.

Related

Why is it impossible to allocate an array of an arbitrary size on the stack?

Why can't I write the following?
char acBuf[nSize];
Only to prevent the stack from overgrowing?
Or is there a possibility to do something similar, if I can ensure that I always take just a few hundred kilobytes?
As far as I know, the std::string uses the memory of its members to store the assigned strings, as long as they are 15 characters or less. Only if the strings are longer, it uses this memory to store the address of some heap-allocated memory, which then takes the data.
It seems like it has to be 100%ly determined, during compile-time, how the stack will be aligned during runtime. Is that true? Why is that?
It has nothing to do with preventing stack overflow, you can overflow the stack just fine with char a[SOME_LARGE_CONSTANT]. In C++ the array size has to be known at compile time, this is among other things needed to compute the size of structures containing arrays.
C on the other hand had Variable Length Arrays since C99, which adds an exception and allow runtime dependant size for arrays within function scope. As to why C++ does not have this? It was never adopted by a C++ standard.
Why can't I write the following?
char acBuf[nSize];
Those are called Variable Length Arrays (VLA's) and aren't supported by C++. The reason being that the stack is very fast but tiny compared to the free store (the heap in your words). Which means that at any moment that you add lots of elements to a VLA your stack might just overflow and you get a vague runtime exception. This can also happen with compile-time sized stack arrays but these are way easier to catch because the behaviour of the program doesn't influence their size. Which means that x doesn't have to happen after y to create a stack overflow, it's just there right off the bat. This covers it in more detail and rage.
Containers like std::vector use the free store which is way bigger and has a way to deal with over-allocation (throws bad_alloc).
Unlike C, C++ doesn't support variable length arrays. If you want them, you can use non-standard extensions such as alloca or GNU extensions (supported by clang and GCC). They have their caveats, so be sure to read the manual to make sure you use them safely.
The reason the stack layout is mostly determined statically is so that the generated code has to perform fewer computations (additions, multiplications, and pointer dereferencing) to figure out where the data is on the stack. The offsets can instead be hardcoded into the generated machine code.
My advice is to take a look at alloca.h
void *alloca(size_t size);
The alloca() function allocates size bytes of space in the stack
frame of the caller. This temporary space is automatically freed
when the function that called alloca() returns to its caller.
One possible problem I see with VLA in C++ is the type.
What is the type of acBuf in char acBuf[nSize] or even worse in char acBuf[nSize][nSize] ?
template <typename T> void foo(const T&);
void foo(int n)
{
char mat[n][n];
foo(mat);
}
You cannot pass that array by reference to
template <typename T, std::size_t N>
void foo_on_array(const T (&a)[N]);
You should be happy that the C++ standard discourages a dangerous practice (variable-length arrays on the stack) and instead encourages a less dangerous practice (variable-length arrays and std::vector with heap allocation).
Variable-length arrays on the stack are more dangerous because:
The available stack space is typically 8 MB, much smaller than the 2 GB (or more) of available heap space.
When the stack space is exhausted, the program crashes with SIGSEGV, and it requires special software such as GNU libsigsegv to recover from such a situation.
In typical C++ programs, the programmer does not know whether the array length will definitely stay under a limit such as 4 MB.
Why can't I write the following?
char acBuf[nSize];
You can't do that because in C++ the lenght of the array has to be known at compile time, that's because the compiler reserves the specified memory for the array and it can not be modified during runtime. it's not about prevent a stack overflow, it's about memory layout.
If you want to make a dynamic array you should use the new operator so it will be stored in heap.
char *acBuf = new char[nsize];

Why is new int[n] valid when int array[n] is not?

For the following code:
foo(int n){
int array[n];
}
I understand that this is invalid syntax and that it is invalid because the c++ standard requires array size to be set at compile time (although some compilers support the following syntax).
However I also understand the the following is valid syntax:
bar(int n){
int *array = new int[n];
}
I don't understand why this is allowed, isn't it the same as creating an array where the size is determined at runtime? Is it good practice to do this or should I be using a vector if I need to do this instead?
That's because the former is allocated on the stack and the latter on the heap.
When you allocate something on the stack, knowing the size of the object is essential for correctly building it. C99 allows the size to be specified at run time, and this introduces some complications in building and dismantling the aforementioned stack, since you cannot calculate its size at compile time. Machine code must be emitted in order to perform said calculation during the execution of the program. This is probably the main reason why this feature wasn't included in the C++ standard.²
On the contrary, the heap has no fixed structure, as the name implies. Blocks of any size can be allocated with no particular order, as long as they do not overlap and you have enough (virtual) memory¹. In this case, knowing the size at compile time is not that relevant.
Also, remember that the stack has a limited size, mostly to detect infinite recursions before they consume all the available memory. Usually the limit is fixed around 1MB, and you rarely reach that. Unless you allocate large objects, which should be placed in the heap.
As of what you should use, probably a std::vector<int>. But it really depends on what you are trying to do.
Also note that C++11 has a std::array class, whose size must be known at compile time. C++14 should have introduced std::dynarray, but it was postponed because there is still much work to do concerning compile-time unknown size stack allocation.
¹ blocks are usually allocated sequentially for performance reasons, but that's not required.
² as pointed out, knowing the size at compile time is not a hard requirement, but it makes things simpler.
In the first case you are allocating the memory space statically to hold the integers. This is done when the program is compiled and so the amount of storage is inflexible.
In the latter case you are dynamically allocating a memory space to hold the integers. This is done when the program is run, and so the amount of storage required can be flexible.
The second call is actually a function that talks to the operating system to go and find a place in memory to use. That same process does not happen in the first case.
int array[n] allocates a fixed-length array on the call stack at compile-time, and thus n needs to be known at compile-time (unless a compiler-specific extension is used to allow the allocation at runtime, but the array is still on the stack).
int *array = new int[n] allocates a dynamic-length array on the heap at run-time, so n does not need to be known at compile-time.
The one and only valid answer to your question is, because the standard says so.
In contrast to C99, C++ never bothered to specify variable length arrays (VLAs), so the only way to get variably sized arrays is using dynamic allocation, with malloc, new or some other memory-manager.
In fairness to C++, having runtime-sized stack-allocations slightly complicates stack-unwinding, which would also make exception-handling for the functions using the feature consequently more bothersome.
Anyway, even if your compiler provides that C99-feature as an extension, it's a good idea to always keep a really tight rein on your stack-usage:
There is no way to recover from blowing the stack-limit, and the error-case is simply left Undefined Behavior for a reason.
The easiest way to simulate VLAs in C++, though without the performance-benefit of avoiding dynamic allocation (and the danger of blowing the limit):
unique_ptr<T[]> array{new T[n]};
In the expression
new int[n]
int[n] is not the type. C++ treats "new with arrays" and "new with non-arrays" differently. The N3337 standard draft has this to say about new:
When the allocated object is an array (that is, the noptr-new-declarator syntax is used or the new-type-id or type-id denotes an array type), the new-expression yields a pointer to the initial element (if any) of the array.
The noptr-new-declarator refers to this special case (evaluate n and create the array of this size), see:
noptr-new-declarator:
    [ expression ] attribute-specifier-seqopt
    noptr-new-declarator [ constant-expression ] attribute-specifier-seqopt
However you can't use this in the "usual" declarations like
int array[n];
or in the typedef
typedef int variable_array[n];
This is different with C99 VLAs, where both are allowed.
Should I be using vectors instead?
Yes, you should. You should use vectors all the time, unless you have a very strong reason to do otherwise (there was one time during the last 7 years when I used new - when I was implementing vector for a school assignment).
No, the second is not declaring an array. It's using the array form of operator new, and that specifically permits the first dimension to be variable.
This is because the C++ language does not have the C feature introduced in C99 known as "variable length arrays" (VLA).
C++ is lagging in adopting this C feature because the std::vector type from its library fulfills most of the requirements.
Furthermore, the 2011 standard of C backpedaled and made VLA's an optional feature.
VLA's, in a nutshell, allow you to use a run-time value to determine the size of a local array that is allocated in automatic storage:
int func(int variable)
{
long array[variable]; // VLA feature
// loop over array
for (size_t i = 0; i < sizeof array / sizeof array[0]; i++) {
// the above sizeof is also a VLA feature: it yields the dynamic
// size of the array, and so is not a compile-time constant,
// unlike other uses of sizeof!
}
}
VLA's existed in the GNU C dialect long before C99. In dialects of C without VLA's, array dimensions in a declaration must be constant expressions.
Even in dialects of C with VLA's, only certain arrays can be VLA's. For instance static arrays cannot be, and neither can dynamic arrays (for instance arrays inside a structure, even if instances of that structure are allocated dynamically).
In any case, since you're coding in C++, this is moot!
Note that storage allocated with operator new are not the VLA feature. This is a special C++ syntax for dynamic allocation, which returns a pointer type, as you know:
int *p = new int[variable];
Unlike a VLA's, this object will persist until it is explicitly destroyed with delete [], and can be returned from the surrounding scope.
Because it has different semantics:
If n is a compile-time constant (unlike in your example):
int array[n]; //valid iff n is compile-time constant, space known at compile-time
But consider when n is a runtime value:
int array[n]; //Cannot have a static array with a runtime value in C++
int * array = new int[n]; //This works because it happens at run-time,
// not at compile-time! Different semantics, similar syntax.
In C99 you can have a runtime n for an array and space will be made in the stack at runtime.
There are some proposals for similar extensions in C++, but none of them is into the standard yet.
You can allocate memory statically on the stack or dynamically on the heap.
In your first case, your function contains a declaration of an array with a possible variable length, but this is not possible, since raw arrays must have fixed size at compile time, because they are allocated on the stack. For this reason their size must be specified as a constant, for example 5. You could have something like this:
foo(){
int array[5]; // raw array with fixed size 5
}
Using pointers you can specify a variable size for the memory that will be pointed, since this memory will be allocated dynamically on the heap. In your second case, you are using the parameter n to specify the space of memory that will be allocated.
Concluding, we can say that pointers are not arrays: the memory allocated using a pointer is allocated on the heap, whereas the memory allocated for a raw array is allocated on the stack.
There are good alternatives to raw arrays, for example the standard container vector, which is basically a container with variable length size.
Make sure you understand well the difference between dynamic and static memory allocation, the difference between memory allocated on the stack and memory allocated on the heap.

Getting the size of a struct with dynamic variables?

I have a struct with a dynamic array:
struct test{int* arr;};
After allocating space for the arr array(arr=new int[100]), using sizeof returns 4 bytes, which is the size of the struct without the array elements. Is there another built-in function like sizeof that can return the size while keeping dynamically allocated space into account? Or do I have to do this myself?
+
I need this because I want to make it easier to save/load the contents of the struct to/from a file.
There's no way to get the memory usage due to an object and the other objects it points to, because it's not a well defined concept.
Two objects might point to the same arr block. Are they both responsible for consuming the memory?
What about recursion if you have an array of structures containing pointers? What about a cycle?
Maybe arr points to the stack. Does that count as using memory?
malloc might round up the requested allocation size, or allocate internal bookkeeping structures. Do such effects count?
Some operating systems do provide a facility to retrieve the argument to malloc (or sometimes a rounded-up value, because the underlying system might genuinely have no use for the original argument), but in standard C and C++, POSIX and in general practice, you are responsible for tracking allocation sizes yourself.
Unfortunately I think you are out of luck. The size returned by sizeof is the size of the pointer which cannot lead to the correct size of what it points to (in your case a dynamic array).
I suggest you to use std::vector. It has a size() member function that returns the number of elements in use:
struct test {
std::vector<int> arr;
}
test x;
// ...
x.arr.size();

What's the advantage of malloc?

What is the advantage of allocating a memory for some data. Instead we could use an array of them.
Like
int *lis;
lis = (int*) malloc ( sizeof( int ) * n );
/* Initialize LIS values for all indexes */
for ( i = 0; i < n; i++ )
lis[i] = 1;
we could have used an ordinary array.
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing? And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
Your question seems to rather compare dynamically allocated C-style arrays with variable-length arrays, which means that this might be what you are looking for: Why aren't variable-length arrays part of the C++ standard?
However the c++ tag yields the ultimate answer: use std::vector object instead.
As long as it is possible, avoid dynamic allocation and responsibility for ugly memory management ~> try to take advantage of objects with automatic storage duration instead. Another interesting reading might be: Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization)
"And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?"
- If you still consider n to be the amount of integers that it is possible to store in this array, you will most likely experience undefined behavior.
More fundamentally, I think, apart from the stack vs heap and variable vs constant issues (and apart from the fact that you shouldn't be using malloc() in C++ to begin with), is that a local array ceases to exist when the function exits. If you return a pointer to it, that pointer is going to be useless as soon as the caller receives it, whereas memory dynamically allocated with malloc() or new will still be valid. You couldn't implement a function like strdup() using a local array, for instance, or sensibly implement a linked representation list or tree.
The answer is simple. Local1 arrays are allocated on your stack, which is a small pre-allocated memory for your program. Beyond a couple thousand data, you can't really do much on a stack. For higher amounts of data, you need to allocate memory out of your stack.
This is what malloc does.
malloc allocates a piece of memory as big as you ask it. It returns a pointer to the start of that memory, which could be treated similar to an array. If you write beyond the size of that memory, the result is undefined behavior. This means everything could work alright, or your computer may explode. Most likely though you'd get a segmentation fault error.
Reading values from the memory (for example for printing) is the same as reading from an array. For example printf("%d", list[5]);.
Before C99 (I know the question is tagged C++, but probably you're learning C-compiled-in-C++), there was another reason too. There was no way you could have an array of variable length on the stack. (Even now, variable length arrays on the stack are not so useful, since the stack is small). That's why for variable amount of memory, you needed the malloc function to allocate memory as large as you need, the size of which is determined at runtime.
Another important difference between local arrays, or any local variable for that matter, is the life duration of the object. Local variables are inaccessible as soon as their scope finishes. malloced objects live until they are freed. This is essential in practically all data structures that are not arrays, such as linked-lists, binary search trees (and variants), (most) heaps etc.
An example of malloced objects are FILEs. Once you call fopen, the structure that holds the data related to the opened file is dynamically allocated using malloc and returned as a pointer (FILE *).
1 Note: Non-local arrays (global or static) are allocated before execution, so they can't really have a length determined at runtime.
I assume you are asking what is the purpose of c maloc():
Say you want to take an input from user and now allocate an array of that size:
int n;
scanf("%d",&n);
int arr[n];
This will fail because n is not available at compile time. Here comes malloc()
you may write:
int n;
scanf("%d",&n);
int* arr = malloc(sizeof(int)*n);
Actually malloc() allocate memory dynamically in the heap area
Some older programming environments did not provide malloc or any equivalent functionality at all. If you needed dynamic memory allocation you had to code it yourself on top of gigantic static arrays. This had several drawbacks:
The static array size put a hard upper limit on how much data the program could process at any one time, without being recompiled. If you've ever tried to do something complicated in TeX and got a "capacity exceeded, sorry" message, this is why.
The operating system (such as it was) had to reserve space for the static array all at once, whether or not it would all be used. This phenomenon led to "overcommit", in which the OS pretends to have allocated all the memory you could possibly want, but then kills your process if you actually try to use more than is available. Why would anyone want that? And yet it was hyped as a feature in mid-90s commercial Unix, because it meant that giant FORTRAN simulations that potentially needed far more memory than your dinky little Sun workstation had, could be tested on small instance sizes with no trouble. (Presumably you would run the big instance on a Cray somewhere that actually had enough memory to cope.)
Dynamic memory allocators are hard to implement well. Have a look at the jemalloc paper to get a taste of just how hairy it can be. (If you want automatic garbage collection it gets even more complicated.) This is exactly the sort of thing you want a guru to code once for everyone's benefit.
So nowadays even quite barebones embedded environments give you some sort of dynamic allocator.
However, it is good mental discipline to try to do without. Over-use of dynamic memory leads to inefficiency, of the kind that is often very hard to eliminate after the fact, since it's baked into the architecture. If it seems like the task at hand doesn't need dynamic allocation, perhaps it doesn't.
However however, not using dynamic memory allocation when you really should have can cause its own problems, such as imposing hard upper limits on how long strings can be, or baking nonreentrancy into your API (compare gethostbyname to getaddrinfo).
So you have to think about it carefully.
we could have used an ordinary array
In C++ (this year, at least), arrays have a static size; so creating one from a run-time value:
int lis[n];
is not allowed. Some compilers allow this as a non-standard extension, and it's due to become standard next year; but, for now, if we want a dynamically sized array we have to allocate it dynamically.
In C, that would mean messing around with malloc; but you're asking about C++, so you want
std::vector<int> lis(n, 1);
to allocate an array of size n containing int values initialised to 1.
(If you like, you could allocate the array with new int[n], and remember to free it with delete [] lis when you're finished, and take extra care not to leak if an exception is thrown; but life's too short for that nonsense.)
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
malloc in C and new in C++ allocate persistent memory from the "free store". Unlike memory for local variables, which is released automatically when the variable goes out of scope, this persists until you explicitly release it (free in C, delete in C++). This is necessary if you need the array to outlive the current function call. It's also a good idea if the array is very large: local variables are (typically) stored on a stack, with a limited size. If that overflows, the program will crash or otherwise go wrong. (And, in current standard C++, it's necessary if the size isn't a compile-time constant).
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?
You haven't allocated enough space for n integers; so code that assumes you have will try to access memory beyond the end of the allocated space. This will cause undefined behaviour; a crash if you're lucky, and data corruption if you're unlucky.
And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
You mean something like this?
for (i = 0; i < len; ++i) std::cout << lis[i] << '\n';

What is the "proper" way to allocate variable-sized buffers in C++?

This is very similar to this question, but the answers don't really answer this, so I thought I'd ask again:
Sometimes I interact with functions that return variable-length structures; for example, FSCTL_GET_RETRIEVAL_POINTERS in Windows returns a variably-sized RETRIEVAL_POINTERS_BUFFER structure.
Using malloc/free is discouraged in C++, and so I was wondering:
What is the "proper" way to allocate variable-length buffers in standard C++ (i.e. no Boost, etc.)?
vector<char> is type-unsafe (and doesn't guarantee anything about alignment, if I understand correctly), new doesn't work with custom-sized allocations, and I can't think of a good substitute. Any ideas?
I would use std::vector<char> buffer(n). There's really no such thing as a variably sized structure in C++, so you have to fake it; throw type safety out the window.
If you like malloc()/free(), you can use
RETRIEVAL_POINTERS_BUFFER* ptr=new char [...appropriate size...];
... do stuff ...
delete[] ptr;
Quotation from the standard regarding alignment (expr.new/10):
For arrays of char and unsigned char, the difference between the
result of the new-expression and the address returned by the
allocation function shall be an integral multiple of the strictest
fundamental alignment requirement (3.11) of any object type whose size
is no greater than the size of the array being created. [ Note:
Because allocation functions are assumed to return pointers to storage
that is appropriately aligned for objects of any type with fundamental
alignment, this constraint on array allocation overhead permits the
common idiom of allocating character arrays into which objects of
other types will later be placed. — end note ]
I don't see any reason why you can't use std::vector<char>:
{
std::vector<char> raii(memory_size);
char* memory = &raii[0];
//Now use `memory` wherever you want
//Maybe, you want to use placement new as:
A *pA = new (memory) A(/*...*/); //assume memory_size >= sizeof(A);
pA->fun();
pA->~A(); //call the destructor, once done!
}//<--- just remember, memory is deallocated here, automatically!
Alright, I understand your alignment problem. It's not that complicated. You can do this:
A *pA = new (&memory[i]) A();
//choose `i` such that `&memory[i]` is multiple of four, or whatever alignment requires
//read the comments..
You may consider using a memory pool and, in the specific case of the RETRIEVAL_POINTERS_BUFFER structure, allocate pool memory amounts in accordance with its definition:
sizeof(DWORD) + sizeof(LARGE_INTEGER)
plus
ExtentCount * sizeof(Extents)
(I am sure you are more familiar with this data structure than I am -- the above is mostly for future readers of your question).
A memory pool boils down to "allocate a bunch of memory, then allocate that memory in small pieces using your own fast allocator".
You can build your own memory pool, but it may be worth looking at Boosts memory pool, which is a pure header (no DLLs!) library. Please note that I have not used the Boost memory pool library, but you did ask about Boost so I thought I'd mention it.
std::vector<char> is just fine. Typically you can call your low-level c-function with a zero-size argument, so you know how much is needed. Then you solve your alignment problem: just allocate more than you need, and offset the start pointer:
Say you want the buffer aligned to 4 bytes, allocate needed size + 4 and add 4 - ((&my_vect[0] - reinterpret_cast<char*>(0)) & 0x3).
Then call your c-function with the requested size and the offsetted pointer.
Ok, lets start from the beginning. Ideal way to return variable-length buffer would be:
MyStruct my_func(int a) { MyStruct s; /* magic here */ return s; }
Unfortunately, this does not work since sizeof(MyStruct) is calculated on compile-time. Anything variable-length just do not fit inside a buffer whose size is calculated on compile-time. The thing to notice that this happens with every variable or type supported by c++, since they all support sizeof. C++ has just one thing that can handle runtime sizes of buffers:
MyStruct *ptr = new MyStruct[count];
So anything that is going to solve this problem is necessarily going to use the array version of new. This includes std::vector and other solutions proposed earlier. Notice that tricks like the placement new to a char array has exactly the same problem with sizeof. Variable-length buffers just needs heap and arrays. There is no way around that restriction, if you want to stay within c++. Further it requires more than one object! This is important. You cannot make variable-length object with c++. It's just impossible.
The nearest one to variable-length object that the c++ provides is "jumping from type to type". Each and every object does not need to be of same type, and you can on runtime manipulate objects of different types. But each part and each complete object still supports sizeof and their sizes are determined on compile-time. Only thing left for programmer is to choose which type you use.
So what's our solution to the problem? How do you create variable-length objects? std::string provides the answer. It needs to have more than one character inside and use the array alternative for heap allocation. But this is all handled by the stdlib and programmer do not need to care. Then you'll have a class that manipulates those std::strings. std::string can do it because it's actually 2 separate memory areas. The sizeof(std::string) does return a memory block whose size can be calculated on compile-time. But the actual variable-length data is in separate memory block allocated by the array version of new.
The array version of new has some restrictions on it's own. sizeof(a[0])==sizeof(a[1]) etc. First allocating an array, and then doing placement new for several objects of different types will go around this limitation.