c++ why isn't there something like length(array)? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Well, I don't think that it's really important, but since the program has to store the length because of delete[] anyway, why can't we get this "stored information"?

The implementation only needs to store the length, and typically only does, if the type is not trivially destructible (i.e., it needs to generate calls to a destructor) and the array was created with the new[] operator.
Since that property of the arrayed type bears no relation to the size of the array, it is more elegant simply to call the length "cookie" a private implementation detail.
To get the length of a complete array object (not a mere pointer), you can use std::extent< decltype( arr ) >::value or std::end( arr ) - std::begin( arr ).
Using new[] with a class with a destructor is a code smell. Consider std::vector instead. The overhead vs raw new[] (considering all bytes that need to be allocated, wherever they are) is one pointer's worth of bytes, and the benefits are innumerable.
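The two expressions mentioned in this answer can be tried out with a small sketch like the one below. The helper names (local_example, extent_of, distance_count) are made up for illustration and are not part of the standard library.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

// Element count of a local array, read from its type via std::extent.
std::size_t local_example() {
    double samples[8] = {};
    return std::extent<decltype(samples)>::value;
}

// Hypothetical helper: the same idea for any argument. For a pointer,
// std::extent reports 0 because a pointer type has no array bound.
template <typename Array>
constexpr std::size_t extent_of(const Array&) {
    return std::extent<Array>::value;
}

// Hypothetical helper: element count as an iterator distance.
// It only accepts real arrays, because the parameter is a reference to T[N].
template <typename T, std::size_t N>
std::size_t distance_count(T (&arr)[N]) {
    return static_cast<std::size_t>(std::end(arr) - std::begin(arr));
}
```

Passing an int[5] to either helper yields 5, while extent_of applied to an int* yields 0, which is the "complete array object, not a mere pointer" caveat in practice.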

Consider the case:
char* a = new char[100];
Now a needs to point to a buffer that's at least 100 chars big, but the system might have allocated a bigger buffer to fulfill this.
With this in mind, we can see that the system is free to immediately forget the size the program asked for, as long as it can still deallocate the buffer properly later. (Either by remembering the size of the allocated buffer or doing the memory allocation with some smart data structure where only the pointer to the start is required)
So, in the general case, the information you are looking for is not, in fact, stored anywhere.

Not all arrays are allocated by new.
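#include <cstddef>
void f(int* arr, std::size_t n)
{
// arr is only a pointer here; the size has to arrive separately as n
length(arr); // ???
}
int main()
{
int a[5];
f(a, 5); // the array decays to a pointer to its first element
}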
It's trivial to write though, just call (std::end(arr) - std::begin(arr)), although it only works for arrays, not pointers that point to the start of arrays.

My understanding is that the philosophy of c++ is to not force on people any feature that has a potential cost unless unavoidable.
There may be additional costs in storing this information, and the user may not want to pay that cost if they don't need the information. And as it's trivial to store the length yourself if you want it, there is no reason to provide a language feature that has a cost to everyone using an array.

For proper arrays, that is, int a[length], you already have that facility: just do
#define length(A) (sizeof(A) / sizeof(*(A)))
and you are done with it.
If you are talking about getting the length of the array pointed to by a pointer, well, pointers and arrays are two different concepts and coupling them makes no sense, even if proper arrays "decay" to pointers when needed and you access arrays through pointer arithmetic.
But even if we set that aside and look at the technical aspects, the C++ runtime may not know what the length of your array is, since new could rely on malloc, which stores the size of the block in its own way, understood only by free. The C++ runtime stores extra information only when the elements have non-trivial destructors. A pretty messy picture, huh?
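For the proper-array case in this answer, a function template is a common alternative to the length macro. The sketch below uses the hypothetical names array_length and LENGTH_MACRO; since C++17 the standard library offers std::size in <iterator> for the same purpose.

```cpp
#include <cstddef>

// Hypothetical name. The parameter is a reference to an array of N elements,
// so N is deduced from the argument's type and pointers are rejected.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) noexcept {
    return N;
}

// The macro form, with the argument parenthesized. It also accepts a pointer,
// in which case it silently computes sizeof(pointer) / sizeof(element).
#define LENGTH_MACRO(A) (sizeof(A) / sizeof(*(A)))
```

With int a[7], both forms give 7. With int* p = a, array_length(p) fails to compile, whereas LENGTH_MACRO(p) compiles and returns a value unrelated to the array.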

Because it's up to the implementation where it stores this information, there is no general way to do a length(array).

The length is most certainly not stored on all implementations. C++, for example, allows for garbage collection (e.g. boehmgc), and many collectors do not need to know the length. In traditional allocators, the length stored will often be larger than the actual length, i.e. the length allocated, not the length used.

But, what exactly is the length of the array? Is it the number of bytes in the array or the number of elements in the array?
For example, for some class A whose size is 100 bytes,
A* myArray = new A[100];
should length(myArray) return 100 or 100 * 100? Somebody might want 100, somebody might want 10000. So there is no decisive argument for either choice.

The C++ type that works most like the "arrays" in languages that support length(array) is std::vector<>, and it does have std::vector<>::size().
The size of plain [] arrays is known to the compiler in scopes where the size is explicit, of course, but it is possible to pass them to scopes where the size is not known to the compiler. This gives C++ more ways to handle array-like data than languages that must support a length query (because they have to ensure that the size is always passed).

Related

Why use std::vector instead of realloc? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Here, in this question, it's stated that there is no realloc-like operator or function in C++. If you wish to resize an array, just use std::vector instead. I've even seen Stroustrup saying the same thing here.
I believe it's not hard to implement one. There should be a reason for not implementing one. The answers only say to use std::vector but not why it's not implemented.
What is the reason for not implementing realloc-like operator or function and preferring to use std::vector instead?
"What is the reason for not implementing realloc-like operator or function and preferring to use std::vector instead?"
Save time.
Don't chase bugs in your own code for a problem that has long been solved.
Idiomatic C++ and readability.
Get answers to your questions easily and quickly.
Customize the realloc part by an allocator.
"I believe it's not hard to implement one"
That heavily depends on what you need from the template you intend to write. For a general-purpose std::vector-like one, have a look at the source code (libcxx's 3400 line vector header is here). I bet you will revise your initial assumption about the low complexity of such a construct.
There are several advantages.
Vector keeps track of its size and capacity, which means you don't have to do this yourself.
Because the current size is part of the vector object itself, you can pass a vector (by reference or by value) without needing an additional size parameter. This is especially useful when returning a vector as the caller doesn't need to receive the size through some side-channel.
When reallocating, vector will add more capacity than is needed to add just the element(s) requested to be added. This sounds wasteful but saves time as fewer reallocations are needed.
Vector manages its own memory; using vector lets you focus on the more interesting parts of your program instead of the details of managing memory, which are relatively uninteresting and tricky to get exactly right.
Vector supports many operations that arrays don't natively support, such as removing elements from the middle and making copies of an entire vector.
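The point about the size travelling with the vector can be illustrated with a minimal sketch. The function names first_squares and sum are invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical producer: the caller learns the element count from the
// returned object itself, with no side channel for the size.
std::vector<int> first_squares(std::size_t count) {
    std::vector<int> result;
    result.reserve(count);  // one allocation up front instead of repeated growth
    for (std::size_t i = 0; i < count; ++i) {
        result.push_back(static_cast<int>(i * i));
    }
    return result;
}

// Hypothetical consumer: no separate length parameter is needed.
long long sum(const std::vector<int>& values) {
    long long total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

The raw-array equivalent would need an extra size_t parameter on sum and an output parameter (or a struct) to return the count from first_squares.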
realloc's expectation that there might be sufficient free space after the current allocation just does not fit well with modern allocators and modern programs.
(There are many more allocations going on, many allocation sizes go to a dedicated pool for that size, and the heap is shared between all the threads in a program.)
In most cases, realloc will have to move content to a completely new allocation, just like vector does. But unlike vector<T>, realloc does not know how to move elements of type T, it only knows how to copy plain data.
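To make the last point concrete, here is a rough sketch of what growing an array of non-trivial elements involves. The helper grow is hypothetical, and it is deliberately simple: if a move assignment throws, the new block leaks, which is one of the details std::vector handles for you.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Byte-wise copying, which is all realloc can do, is only valid for
// trivially copyable types.
static_assert(std::is_trivially_copyable<int>::value, "int can be copied byte-wise");
static_assert(!std::is_trivially_copyable<std::string>::value, "std::string cannot");

// Hypothetical helper: grow a new[]-allocated array by moving each element
// into a freshly allocated block, then releasing the old block.
template <typename T>
T* grow(T* old_data, std::size_t old_count, std::size_t new_count) {
    T* new_data = new T[new_count];
    for (std::size_t i = 0; i < old_count && i < new_count; ++i) {
        new_data[i] = std::move(old_data[i]);
    }
    delete[] old_data;
    return new_data;
}
```

Elements beyond old_count are default-constructed by new T[new_count], so for std::string they start out empty.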
The other answers have explained nicely why to use vectors, so I will simply elaborate on why realloc was not carried over. For this, you need to look at what realloc actually does. Conceptually, it resizes a block with the equivalent of malloc(), a copy, and free(). Although it appears to simply grow the block, it is only allowed, not required, to extend it in place; in general it allocates another block of the required size and copies the old contents over (that explains the name realloc).
Take a look at the following lines:
int* iarr = (int*)malloc(5 * sizeof(*iarr)); // needs <cstdlib>; room for 5 ints
iarr = (int*)realloc(iarr, 6 * sizeof(*iarr)); // discouraged: if realloc fails, the only pointer to the old block is overwritten
//what you're supposed to do instead of the line above is:
int* iarr2 = (int*)realloc(iarr, 6 * sizeof(*iarr)); //copies the old data to the new block implicitly
//this keeps iarr valid if realloc fails, and lets you check iarr2 to see whether it succeeded
In C++, this can (if you must) be achieved by writing:
int* iarr = new int[5];
int* iarr2 = new int[6];
for(int i = 0; i < 5; i++) {
iarr2[i] = iarr[i];
}
delete[] iarr;
The only use of realloc was to increase memory capacity. C arrays do not grow automatically, so C had to provide a mechanism to do so. Most C++ containers implement that mechanism implicitly, which makes the original reason for having realloc moot.

C++ doesn't tell you the size of a dynamic array. But why?

I know that there is no way in C++ to obtain the size of a dynamically created array, such as:
int* a;
a = new int[n];
What I would like to know is: Why? Did people just forget this in the specification of C++, or is there a technical reason for this?
Isn't the information stored somewhere? After all, the command
delete[] a;
seems to know how much memory it has to release, so it seems to me that delete[] has some way of knowing the size of a.
It's a follow on from the fundamental rule of "don't pay for what you don't need". In your example delete[] a; doesn't need to know the size of the array, because int doesn't have a destructor. If you had written:
std::string* a;
a = new std::string[n];
...
delete [] a;
Then the delete has to call destructors (and needs to know how many to call) - in which case the new has to save that count. However, given it doesn't need to be saved on all occasions, Bjarne decided not to give access to it.
(In hindsight, I think this was a mistake ...)
Even with int of course, something has to know about the size of the allocated memory, but:
Many allocators round up the size to some convenient multiple (say 64 bytes) for alignment and convenience reasons. The allocator knows that a block is 64 bytes long - but it doesn't know whether that is because n was 1 ... or 16.
The C++ run-time library may not have access to the size of the allocated block. If for example, new and delete are using malloc and free under the hood, then the C++ library has no way to know the size of a block returned by malloc. (Usually of course, new and malloc are both part of the same library - but not always.)
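Whether an element count is saved for a given type can be observed indirectly. The sketch below gives two otherwise identical types a class-level operator new[] that records how many bytes were requested. All names (Tracked, Trivial, WithDtor, overhead_bytes) are invented for this example, and the amount of overhead is implementation-specific; the standard only says the request may exceed n * sizeof(T).

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Bytes most recently requested from the class-level operator new[] below.
static std::size_t last_request = 0;

// Hypothetical base class that records the size of each array allocation.
struct Tracked {
    static void* operator new[](std::size_t bytes) {
        last_request = bytes;
        void* p = std::malloc(bytes);
        if (p == nullptr) throw std::bad_alloc{};
        return p;
    }
    static void operator delete[](void* p) noexcept { std::free(p); }
};

struct Trivial : Tracked { int value; };                  // no destructor to run
struct WithDtor : Tracked { int value; ~WithDtor() {} };  // destructor must run per element

// Extra bytes requested on top of n * sizeof(T) for an array of n elements.
template <typename T>
std::size_t overhead_bytes(std::size_t n) {
    T* p = new T[n];
    const std::size_t requested = last_request;
    delete[] p;
    return requested - n * sizeof(T);
}
```

On many common ABIs overhead_bytes<Trivial>(10) comes out as 0 and overhead_bytes<WithDtor>(10) as the size of one stored count, but neither figure is guaranteed.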
One fundamental reason is that there is no difference between a pointer to the first element of a dynamically allocated array of T and a pointer to any other T.
Consider a fictitious function that returns the number of elements a pointer points to.
Let's call it "size".
Sounds really nice, right?
If it weren't for the fact that all pointers are created equal:
char* p = new char[10];
size_t ps = size(p+1); // What?
char a[10] = {0};
size_t as = size(a); // Hmm...
size_t bs = size(a + 1); // Wut?
char i = 0;
size_t is = size(&i); // OK?
You could argue that the first should be 9, the second 10, the third 9, and the last 1, but to accomplish this you need to add a "size tag" on every single object.
A char will require 128 bits of storage (because of alignment) on a 64-bit machine. This is sixteen times more than what is necessary.
(Above, the ten-character array a would require at least 168 bytes.)
This may be convenient, but it's also unacceptably expensive.
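The cost can be estimated with a hypothetical "tagged" layout in which every object carries its own count. The struct below is only an illustration of that idea, not something any implementation does.

```cpp
#include <cstddef>

// Hypothetical layout: a single char that carries its own "size tag".
struct TaggedChar {
    std::size_t count;  // the tag
    char value;         // the one byte of actual payload
};

// Padding makes the struct a multiple of the tag's alignment, so the total
// is larger than sizeof(std::size_t) + 1 on typical platforms.
constexpr std::size_t tagged_bytes = sizeof(TaggedChar);
constexpr std::size_t plain_bytes = sizeof(char);
```

On a typical 64-bit platform tagged_bytes is 16 while plain_bytes is 1, which matches the sixteen-fold figure quoted above.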
You could of course envision a version that is only well-defined if the argument really is a pointer to the first element of a dynamic allocation by the default operator new, but this isn't nearly as useful as one might think.
You are right that some part of the system will have to know something about the size. But getting that information is probably not covered by the API of memory management system (think malloc/free), and the exact size that you requested may not be known, because it may have been rounded up.
You will often find that memory managers will only allocate space in a certain multiple, 64 bytes for example.
So, you may ask for new int[4], i.e. 16 bytes, but the memory manager will allocate 64 bytes for your request. To free this memory it doesn't need to know how much memory you asked for, only that it has allocated you one block of 64 bytes.
The next question may be, can it not store the requested size? This is an added overhead which not everybody is prepared to pay for. An Arduino Uno for example only has 2k of RAM, and in that context 4 bytes for each allocation suddenly becomes significant.
If you need that functionality then you have std::vector (or equivalent), or you have higher-level languages. C/C++ was designed to enable you to work with as little overhead as you choose to make use of, this being one example.
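The rounding described in this answer can be sketched as a small function. The 64-byte granule is the answer's own example figure, and round_up is a hypothetical name; real allocators use a variety of size classes.

```cpp
#include <cstddef>

// Hypothetical allocator policy: every request is rounded up to a
// multiple of `granule` bytes.
constexpr std::size_t round_up(std::size_t bytes, std::size_t granule) {
    return (bytes + granule - 1) / granule * granule;
}
```

With a 64-byte granule, a 16-byte request and a 64-byte request both end up as one 64-byte block, so the block size alone cannot tell the two original requests apart.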
There is a curious case of overloading the operator delete that I found in the form of:
void operator delete[](void *p, size_t size);
The parameter size seems to default to the size (in bytes) of the block of memory to which void *p points. If this is true, it is reasonable to at least hope that it has a value passed by the invocation of operator new and, therefore, would merely need to be divided by sizeof(type) to deliver the number of elements stored in the array.
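The two-parameter form can be explored with a class-level overload, as in the sketch below. The names Sample and bytes_reported_for are invented; the assumption being illustrated is that the size handed to operator delete[] is the size of the whole block that was requested from operator new[], which may include implementation overhead, so dividing by sizeof(type) is not guaranteed to give the exact element count.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Size most recently passed to the sized operator delete[] below.
static std::size_t bytes_seen_by_delete = 0;

// Hypothetical class that declares only the sized form of operator delete[].
struct Sample {
    double payload;

    static void* operator new[](std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p == nullptr) throw std::bad_alloc{};
        return p;
    }
    static void operator delete[](void* p, std::size_t bytes) noexcept {
        bytes_seen_by_delete = bytes;
        std::free(p);
    }
};

// Allocates and frees an array of n elements, returning the reported size.
std::size_t bytes_reported_for(std::size_t n) {
    Sample* p = new Sample[n];
    delete[] p;
    return bytes_seen_by_delete;
}
```

For ten elements the reported value is at least 10 * sizeof(Sample); how much larger it is depends on the implementation.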
As for the "why" part of your question, Martin's rule of "don't pay for what you don't need" seems the most logical.
There's no way to know how you are going to use that array.
The allocation size does not necessarily match the element number so you cannot just use the allocation size (even if it was available).
This is a deep flaw in other languages not in C++.
You achieve the functionality you desire with std::vector yet still retain raw access to arrays. Retaining that raw access is critical for any code that actually has to do some work.
Many times you will perform operations on subsets of the array and when you have extra book-keeping built into the language you have to reallocate the sub-arrays and copy the data out to manipulate them with an API that expects a managed array.
Just consider the trite case of sorting the data elements.
If you have managed arrays then you can't use recursion without copying data to create new sub-arrays to pass recursively.
Another example is an FFT which recursively manipulates the data starting with 2x2 "butterflies" and works its way back to the whole array.
To fix the managed array you now need "something else" to patch over this defect, and that "something else" is called 'iterators'. (You now have managed arrays but almost never pass them to any functions, because you need iterators more than 90% of the time.)
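The sorting case can be sketched with a recursive routine that works on a [first, last) range, so sub-arrays are passed as a pair of positions and nothing is copied. This is a plain quicksort written for illustration under that assumption, not a replacement for std::sort.

```cpp
#include <algorithm>
#include <iterator>

// Recursive in-place quicksort over the half-open range [first, last).
// Works with raw pointers into an array as well as container iterators.
template <typename It>
void quick_sort(It first, It last) {
    const auto length = std::distance(first, last);
    if (length < 2) return;

    // Copy the pivot so that partitioning cannot move it from under us.
    const auto pivot = *std::next(first, length / 2);

    // Elements less than the pivot, then elements equal to it, then the rest.
    It less_end = std::partition(first, last,
                                 [&](const auto& x) { return x < pivot; });
    It equal_end = std::partition(less_end, last,
                                  [&](const auto& x) { return !(pivot < x); });

    quick_sort(first, less_end);   // recurse on a sub-range of the same storage
    quick_sort(equal_end, last);
}
```

Calling quick_sort(a + 1, a + 4) on a raw array sorts only that slice, which is the kind of sub-array operation the answer has in mind.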
The size of an array allocated with new[] is not visibly stored anywhere, so you can't access it. And the new[] operator doesn't return an array, just a pointer to the array's first element. If you want to know the size of a dynamic array, you must store it manually or use classes from libraries, such as std::vector.

Why arrays in C++ didn't have member function size() until C++11? [closed]

Closed 10 years ago.
Why didn't arrays in C++ have a size() member function until C++11 (std::array)? What was the reason behind this? Questions like 'how to obtain size of an array?' are quite frequent.
Edit
As plain arrays are inherited from C, what is the reason they don't have such functionality? After all, we can check the type of the data in an array.
Because they come from C, and C doesn't have member functions (nor is array even a class-type).
Compatibility with C was paramount, because allowing existing C codebases to compile as C++ greatly increased the adoption rate of the language. Mucking with something that works while worrying about compatibility was low on the priorities list, when you've got new language features to design.
In C++11 the language reached a point where all these nice little things got to come into existence. So why not add it now? Because C arrays still work just fine and it would be a ton of work to change that. It's much easier to introduce a new container (and a very simple one at that), and that's exactly what happened.
Arrays in C++ (not std::array) are no different from arrays in C. They're just a contiguous region of memory allocated to hold some data (of the same type) in memory.
The type could be primitive data types like int, float, char, etc. or complex data types like structures or classes. Irrespective of that accessing an element of the array is just converted to:
base address + (sizeof(datatype) * index)
In contrast std::array is a class which overloads the [] operator for array access and has special methods to retrieve the size. It is quite similar to std::vector. Internally these classes might use an array to access the elements, but that is cleverly hidden by overloading the [] operator.
A primitive array is not an object and thus does not have member functions. The std::array is a class which wraps an array and thus can have member functions and logic to obtain the length of the array.
If you declare an array like:
int * myArray = new int[5];
It will have no methods to obtain its length.
std::array is a template class, like the other container classes. Plain arrays still don't have members in C++11.
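As a small illustration of the difference, std::array keeps its length in its type, so the length is still available after the object has been passed to a function. The helper name count_elements is hypothetical.

```cpp
#include <array>
#include <cstddef>

// The length N is part of the parameter's type, so nothing is lost in the call.
template <typename T, std::size_t N>
constexpr std::size_t count_elements(const std::array<T, N>& values) {
    return values.size();
}
```

For a std::array<int, 5> this returns 5; a plain int* obtained from new int[5] offers no equivalent member.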
As plain arrays are inherited from C, what is the reason they don't have such functionality?
You can find out the number of elements in a plain array using the sizeof operator:
int a[10];
std::cout << sizeof(a) / sizeof(*a);
This will print "10". You might be confusing arrays with pointers. Here:
int* a = new int[10];
a is not an array. It's a pointer.
C arrays do have a known and retrievable size, which is accessed by the sizeof operator. You appear to be confusing arrays and pointers. We cannot know how large a block of memory a pointer is pointing to any more than you can tell me how many houses are on my street if I told you I live at 10 Downing Street.
C arrays are closely tied to pointers, although they are distinct types. While the original array variable is visible in the scope where it was declared, the compiler knows its type and is able to tell us the size of the array. However, arrays frequently decay into pointers. This may be because the array was passed to another function or stored in a pointer variable. When this happens, the compiler can no longer be sure that the pointer points at an array (in all cases, that is), so we lose the ability to know the size of the array referenced by the pointer.
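The loss of size information can be seen by comparing what sizeof reports on each side of a function call. Both function names below are made up for this sketch.

```cpp
#include <cstddef>

// The argument arrives as a pointer, so sizeof measures the pointer itself.
std::size_t bytes_seen_through_pointer(const int* p) {
    return sizeof(p);
}

// The argument arrives as a reference to the whole array, so sizeof still
// measures all N elements.
template <std::size_t N>
std::size_t bytes_seen_through_reference(const int (&arr)[N]) {
    return sizeof(arr);
}
```

For int a[10], the first function returns sizeof(int*) and the second returns 10 * sizeof(int).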

Is there any way to get the length of a dynamically allocated array in C++? [duplicate]

Closed 11 years ago.
Possible Duplicate:
Is there any way to determine the size of a C++ array programmatically? And if not, why?
Can I get the length of a dynamically allocated array in C++? If not, how does C++'s delete operator know how many elements to free?
I searched the forum and found that, for dynamically allocated arrays, the length is stored in the four bytes before the array's header or somewhere else. How can I get this value?
The four bytes before the header are definitely an implementation detail you shouldn't use. If you need to know the size of your array, use a std::vector.
In short, you can't. This is implementation defined, so you cannot access it. You can (and have to), however, store it in a variable, or use some types that control the size, such as std::vector, std::string, etc.
The delete[] operator knows because the implementation of the C++ library has that information, but it is not available to the C++ program. Some implementations store it in a few bytes just before the address the pointer holds, but again, you cannot rely on that.
You cannot. However, std::vector<T> v(N) is almost exactly the same thing as new T[N] and offers v.size(), so you should really be using that, unless you have a terrific reason why you want to use the manual array.
Another option is to define your own allocator (overloading operator new[] and delete[] for whatever class you care about or doing #define malloc mymalloc and #define free myfree depending on your situation). Where you store your array length is up to you but generally one would allocate extra bytes (whatever amount is the proper amount for memory alignment - generally 4, 8, or 16) before the beginning of the array and store the length there.
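A minimal sketch of the "store the length in front of the array" idea is shown below for int elements. All three function names are hypothetical, and the header is sized as one std::max_align_t so that the elements following it stay suitably aligned.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Header size: one max_align_t keeps the element storage suitably aligned.
constexpr std::size_t kHeaderBytes = sizeof(std::max_align_t);
static_assert(kHeaderBytes >= sizeof(std::size_t),
              "header must be able to hold the count");

// Hypothetical helper: allocate `count` ints and record `count` in front of them.
int* allocate_ints(std::size_t count) {
    void* raw = std::malloc(kHeaderBytes + count * sizeof(int));
    if (raw == nullptr) throw std::bad_alloc{};
    std::memcpy(raw, &count, sizeof count);
    return reinterpret_cast<int*>(static_cast<unsigned char*>(raw) + kHeaderBytes);
}

// Reads the count back from the header that sits before the elements.
std::size_t stored_length(const int* elements) {
    std::size_t count = 0;
    std::memcpy(&count,
                reinterpret_cast<const unsigned char*>(elements) - kHeaderBytes,
                sizeof count);
    return count;
}

// Releases a block obtained from allocate_ints (and only from it).
void release_ints(int* elements) noexcept {
    if (elements != nullptr) {
        std::free(reinterpret_cast<unsigned char*>(elements) - kHeaderBytes);
    }
}
```

stored_length is only meaningful for pointers returned by allocate_ints; handing it any other pointer reads unrelated memory, which is the same "all pointers look alike" problem discussed in the other threads.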

What is the "proper" way to allocate variable-sized buffers in C++?

This is very similar to this question, but the answers don't really answer this, so I thought I'd ask again:
Sometimes I interact with functions that return variable-length structures; for example, FSCTL_GET_RETRIEVAL_POINTERS in Windows returns a variably-sized RETRIEVAL_POINTERS_BUFFER structure.
Using malloc/free is discouraged in C++, and so I was wondering:
What is the "proper" way to allocate variable-length buffers in standard C++ (i.e. no Boost, etc.)?
vector<char> is type-unsafe (and doesn't guarantee anything about alignment, if I understand correctly), new doesn't work with custom-sized allocations, and I can't think of a good substitute. Any ideas?
I would use std::vector<char> buffer(n). There's really no such thing as a variably sized structure in C++, so you have to fake it; throw type safety out the window.
If you like malloc()/free(), you can use
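char* raw = new char[...appropriate size...];
RETRIEVAL_POINTERS_BUFFER* ptr = reinterpret_cast<RETRIEVAL_POINTERS_BUFFER*>(raw);
... do stuff ...
delete[] raw; // delete through the char* that new[] returned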
Quotation from the standard regarding alignment (expr.new/10):
For arrays of char and unsigned char, the difference between the
result of the new-expression and the address returned by the
allocation function shall be an integral multiple of the strictest
fundamental alignment requirement (3.11) of any object type whose size
is no greater than the size of the array being created. [ Note:
Because allocation functions are assumed to return pointers to storage
that is appropriately aligned for objects of any type with fundamental
alignment, this constraint on array allocation overhead permits the
common idiom of allocating character arrays into which objects of
other types will later be placed. — end note ]
I don't see any reason why you can't use std::vector<char>:
{
std::vector<char> raii(memory_size);
char* memory = &raii[0];
//Now use `memory` wherever you want
//Maybe, you want to use placement new as:
A *pA = new (memory) A(/*...*/); //assume memory_size >= sizeof(A);
pA->fun();
pA->~A(); //call the destructor, once done!
}//<--- just remember, memory is deallocated here, automatically!
Alright, I understand your alignment problem. It's not that complicated. You can do this:
A *pA = new (&memory[i]) A();
//choose `i` such that `&memory[i]` is multiple of four, or whatever alignment requires
//read the comments..
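Instead of choosing the index by hand, the standard function std::align from <memory> can find a suitably aligned position inside the buffer. The sketch below assumes a stand-in type Widget and hypothetical helper names; it returns nullptr when the buffer is too small.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <vector>

// Stand-in for the type to be placed in the buffer (trivially destructible,
// so no explicit destructor call is needed in this sketch).
struct Widget {
    double a;
    int b;
};

// Hypothetical helper: true if `p` is a multiple of `alignment`.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: construct a Widget at the first suitably aligned
// position inside `storage`, or return nullptr if it does not fit.
Widget* construct_aligned(std::vector<char>& storage) {
    void* p = storage.data();
    std::size_t space = storage.size();
    if (std::align(alignof(Widget), sizeof(Widget), p, space) == nullptr) {
        return nullptr;
    }
    return ::new (p) Widget{1.5, 2};
}
```

Sizing the vector as sizeof(Widget) + alignof(Widget) leaves enough slack for any starting address.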
You may consider using a memory pool and, in the specific case of the RETRIEVAL_POINTERS_BUFFER structure, allocate pool memory amounts in accordance with its definition:
sizeof(DWORD) + sizeof(LARGE_INTEGER)
plus
ExtentCount * sizeof(Extents)
(I am sure you are more familiar with this data structure than I am -- the above is mostly for future readers of your question).
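One caveat with adding up member sizes is padding between members. A more robust way to size such a buffer is offsetof, sketched below with stand-in structs that only mimic the shape described above (a 32-bit count, a 64-bit value, then a trailing array of extents). The struct and function names are invented; real code would use the definitions from the Windows headers.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for illustration only; not the Windows SDK definitions.
struct ExtentSketch {
    std::int64_t next_vcn;
    std::int64_t lcn;
};

struct PointersBufferSketch {
    std::uint32_t extent_count;
    std::int64_t starting_vcn;
    ExtentSketch extents[1];  // really `extent_count` entries
};

// Bytes needed for a buffer holding `extent_count` extents, including any
// padding the compiler inserted before the trailing array.
constexpr std::size_t buffer_bytes(std::size_t extent_count) {
    return offsetof(PointersBufferSketch, extents)
         + extent_count * sizeof(ExtentSketch);
}
```

On platforms where the 64-bit member is 8-byte aligned, the offset of extents is 16 rather than the 12 that summing the two member sizes would suggest.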
A memory pool boils down to "allocate a bunch of memory, then allocate that memory in small pieces using your own fast allocator".
You can build your own memory pool, but it may be worth looking at Boost's memory pool, which is a pure header (no DLLs!) library. Please note that I have not used the Boost memory pool library, but you did mention Boost, so I thought I'd bring it up.
std::vector<char> is just fine. Typically you can call your low-level C function with a zero-size argument, so you know how much is needed. Then you solve your alignment problem: just allocate more than you need, and offset the start pointer:
Say you want the buffer aligned to 4 bytes: allocate the needed size + 4 and add 4 - (reinterpret_cast<std::uintptr_t>(&my_vect[0]) & 0x3) to the start pointer.
Then call your C function with the requested size and the offset pointer.
OK, let's start from the beginning. The ideal way to return a variable-length buffer would be:
MyStruct my_func(int a) { MyStruct s; /* magic here */ return s; }
Unfortunately, this does not work, since sizeof(MyStruct) is calculated at compile time. Anything variable-length just does not fit inside a buffer whose size is calculated at compile time. Note that this applies to every variable and type in C++, since they all support sizeof. C++ has just one thing that can handle run-time buffer sizes:
MyStruct *ptr = new MyStruct[count];
So anything that is going to solve this problem is necessarily going to use the array version of new. This includes std::vector and the other solutions proposed earlier. Notice that tricks like placement new into a char array have exactly the same problem with sizeof. Variable-length buffers simply need the heap and arrays. There is no way around that restriction if you want to stay within C++. Furthermore, it requires more than one object. This is important: you cannot make a variable-length object in C++. It's just impossible.
The nearest thing to a variable-length object that C++ provides is "jumping from type to type". Not every object needs to be of the same type, and you can manipulate objects of different types at run time. But each part and each complete object still supports sizeof, and their sizes are determined at compile time. The only thing left for the programmer is to choose which type to use.
So what's our solution to the problem? How do you create variable-length objects? std::string provides the answer. It needs to hold more than one character, and it uses the array form of heap allocation. But this is all handled by the standard library, and the programmer does not need to care. Then you'll have a class that manipulates those std::strings. std::string can do this because it actually consists of two separate memory areas. sizeof(std::string) gives the size of a memory block that can be calculated at compile time, but the actual variable-length data is in a separate memory block allocated by the array version of new.
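The split between a fixed-size handle and run-time-sized data can be checked directly. In this sketch handle_bytes and payload_chars are made-up names, and the exact value of handle_bytes depends on the standard library in use.

```cpp
#include <cstddef>
#include <string>

// The handle object has a size fixed at compile time...
constexpr std::size_t handle_bytes = sizeof(std::string);

// ...while the amount of character data it manages is only known at run time.
std::size_t payload_chars(const std::string& s) {
    return s.size();
}
```

A two-character string and a thousand-character string report the same sizeof but different size() values.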
The array version of new has some restrictions of its own: sizeof(a[0])==sizeof(a[1]), etc. First allocating an array and then doing placement new for several objects of different types gets around this limitation.