Heap_size in heap_sort - c++

I'm reading Cormen's "Introduction to Algorithms", and I'm trying to implement a heap-sort, and there's one thing I continually fail to understand: how do we calculate the heap_size for a given array?
My textbook says
An array A that represents a heap is an object with two attributes:
A.length, which (as usual) gives the number of elements in the array,
and A.heap-size, which represents how many elements in the heap are
stored within array A. That is, although A[1 .. A.length] may contain
numbers, only the elements in A[1..A.heap-size],where 0 <= A.heap-size <=
A.length, are valid elements of the heap.
If I implement an array as std::vector<T> Arr, then its' size would be Arr.size, but what would its' heap_size be is currently beyond me.

The heap size should be a separately stored variable, which you manage yourself.
Whenever you remove from or add to the heap, you should decrement or increment the value appropriately.
In C++, using a vector, you may actually be able to use the size, since the underlying representation is an array that's at least as big as the size of the vector, and it's guaranteed to stay the same size if you call resize with a smaller size. (So the underlying array will be the array size and the vector size will be the heap size).

Related

How are vectors(in C++) having elements of variable size allocated?

I wrote the following code to accept test-cases on a competetive programming website. It uses a vector input of the structure case to store the inputs for given test-cases all at once, and then process them one at a time( I have left out the loops that take the input and calculate the output because they are irrelevant to the question.)
#include<iostream>
#include<vector>
using namespace std;
struct case{
int n, m;
vector<int> jobsDone;
};
int main(){
int testCase;
cin>>testCase;
vector<case> input;
input.reserve(testCase);
//The rest of the code is supposed to be here
return 0;
}
As I was writing this code, I realised that the working of input.reserve(t) in such a case where the element size is variable(since each instance of the structure case also has a vector of variable size) would be difficult. Infact, even if I had not explicitly written the reserve() statement, the vector still would have reserved a minumum number of elemtns.
For this particular situation, I have the following questions regarding the vector input:
Wouldn't random access in O(1) time be impossible in this case, since the beginning position of every element is not known?
How would the vector input manage element access at all when the beginning location of every element cannot be calculated? Will it pad all the entries to the size of the maximum entry?
Should I rather be implementing cases using a vector of pointers pointing to each instance of case? I am thinking about this because if the vector pads each element to a size and wastes space, or it maintains the location to each element, and random access is not constant in time, hence there is no use for a vector anyway.
Every object type has a fixed size. This is what sizeof returns. A vector itself typically holds a pointer to the array of objects, the number of objects for which space has been allocated, and the number of objects actually contained. The size of these three things is independent of the number of elements in the vector.
For example, a vector<int> might contain:
1) An int * holding the address of the data.
2) A size_t holding the number of objects we've allocated space for
3) A size_t holding the number of objects contained in the vector.
This will probably be somewhere around 24 bytes, regardless of how many objects are in the vector. And this is what sizeof(vector<int>) will return.

trim array to elements between i and j

A classic, I'm looking for optimisation here : I have an array of things, and after some processing I know I'm only interested in elements i to j. How to trim my array in the fatset, lightest way, with complete deletions/freeing of memory of elements before i and after j ?
I'm doing mebedded C++, so I may not be able to compile all sorts of library let's say. But std or vector things welcome in a first phase !
I've tried, for array A to be trimmed between i and j, with variable numElms telling me the number of elements in A :
A = &A[i];
numElms = i-j+1;
As it is this yields an incompatibility error. Can that be fixed, and even when fixed, does that free the memory at all for now-unused elements?
A little context : This array is the central data set of my module, and it can be heavy. It will live as long as the module lives. And there's no need to carry dead weight all this time. This is the very first thing that is done - figuring which segment of the data set has to be at all analyzed, and trimming and dumping the rest forever, never to use it again (until the next cycle where we get a fresh array with possibily a compeltely different size).
When asking questions about speed your millage may very based on the size of the array you're working with, but:
Your fastest way will be to not trim the array, just use A[index + i] to find the elements you want.
The lightest way to do this would be to:
Allocate a dynamic array with malloc
Once i and j are found copy that range to the head of the dynamic array
Use realloc to resize the dynamic array to the size j - i + 1
However you have this tagged as C++ not C, so I believe that you're also interested in readability and the required programming investment, not raw speed or weight. If this is true then I would suggest use of a vector or deque.
Given vector<thing> A or a deque<thing> A you could do:
A.erase(cbegin(A), next(cbegin(A), i));
A.resize(j - i + 1);
There is no way to change aloocated memory block size in standard C++ (unless you have POD data — in this case C facilities like realloc could be used). The only way to trim an array is to allocate new array. copy/move needed elements and destroy old array.
You can do it manually, or using vectors:
int* array = new int[10]{0,1,2,3,4,5,6,7,8,9};
std::vector<int> vec {0,1,2,3,4,5,6,7,8,9};
//We want only elements 3-5
{
int* new_array = new int[3];
std::copy(array + 3, array + 6, new_array);
delete[] array;
array = new_array;
}
vec = std::vector<int>(vec.begin()+3, vec.begin()+6);
If you are using C++11, both approaches should have same perfomance.
If you only want to remove extra elements and do not really want to release memory (for example you might want to add more elements later) you can follow NathanOliver link
However, you should consider: do you really need that memory freed immideately? Do you need to move elements right now? Will you array live for such long time that this memory would be lost for your program completely? Maybe you need a range or perharps a view to the array content? In many cases you can store two pointers (or pointer and size) to denote your "new" array, while keeping old one to be released all at once.

Filling a vector with out-of-order data in C++

I'd like to fill a vector with a (known at runtime) quantity of data, but the elements arrive in (index, value) pairs rather than in the original order. These indices are guaranteed to be unique (each index from 0 to n-1 appears exactly once) so I'd like to store them as follows:
vector<Foo> myVector;
myVector.reserve(n); //total size of data is known
myVector[i_0] = v_0; //data v_0 goes at index i_0 (not necessarily 0)
...
myVector[i_n_minus_1] = v_n_minus_1;
This seems to work fine for the most part; at the end of the code, all n elements are in their proper places in the vector. However, some of the vector functions don't quite work as intended:
...
cout << myVector.size(); //prints 0, not n!
It's important to me that functions like size() still work--I may want to check for example, if all the elements were actually inserted successfully by checking if size() == n. Am I initializing the vector wrong, and if so, how should I approach this otherwise?
myVector.reserve(n) just tells the vector to allocate enough storage for n elements, so that when you push_back new elements into the vector, the vector won't have to continually reallocate more storage -- it may have to do this more than once, because it doesn't know in advance how many elements you will insert. In other words you're helping out the vector implementation by telling it something it wouldn't otherwise know, and allowing it to be more efficient.
But reserve doesn't actually make the vector be n long. The vector is empty, and in fact statements like myVector[0] = something are illegal, because the vector is of size 0: on my implementation I get an assertion failure, "vector subscript out of range". This is on Visual C++ 2012, but I think that gcc is similar.
To create a vector of the required length simply do
vector<Foo> myVector(n);
and forget about the reserve.
(As noted in the comment you an also call resize to set the vector size, but in your case it's simpler to pass the size as the constructor parameter.)
You need to call myVector.resize(n) to set (change) the size of the vector. calling reserve doesn't actually resize the vector, it just makes it so you can later resize without reallocating memory. Writing past the end of the vector (as you are doing here -- the vector size is still 0 when you write to it) is undefined behavior.

STL's vector resizing

I can't find this piece of information. I'm dealing with an odd situation here where i'm inside of a loop and i can get a random information at any given time. This information has to be stored in a vector. Now each frame i have to set this vector to ensure that i won't exeed the space (i'm writing values into random points in the vector using indexing).
Now assuming there's no way to change this piece of code, i want to know, does the vector "ignore" the resize() function if i send an argument that's exactly the size of the vector? Where can i find this information?
From MSDN reference1
If the container's size is less than the requested size, _Newsize, elements are added to the vector until it reaches the requested size. If the container's size is larger than the requested size, the elements closest to the end of the container are deleted until the container reaches the size _Newsize. If the present size of the container is the same as the requested size, no action is taken
The ISO C++ standard (page 485 2) specifies this behaviour for vector::resize
void resize ( size_type sz , T c = T ());
if ( sz > size ())
insert ( end () , sz - size () , c );
else if ( sz < size ())
erase ( begin ()+ sz , end ());
else
; // Does nothing
So yes, the vector ignores it and you don't need to perform a check on your own.
Kinda-sorta.
Simply resizing a vector with resize() can only result in more memory being used by the vector itself (will change how much is used by its elements). If there's not enough room in the reserved space, it will reallocate (and sometimes they like to pad themselves a bit so even if there is you might grow). If there is already plenty of room for the requested size and whatever padding it wants to do, it will not regrow.
When the specification says that the elements past the end of the size will be deleted, it means in place. Basically it will call _M_buff[i].~T() for each element it is deleting. Thus any memory your object allocates will be deleted, assuming a working destructor, but the space that the object itself occupies (it's size) will not be. The vector will grow, and grow, and grow to the maximum size you ever tell it to and will not reshrink while it exists.

basic question on std::vector in C++

C++ textbooks, and threads, like these say that vector elements are physically contiguous in memory.
But when we do operations like v.push_back(3.14) I would assume the STL is using the new operator to get more memory to store the new element 3.14 just introduced into the vector.
Now say the vector of size 4 is stored in computer memory cells labelled 0x7, 0x8, 0x9, 0xA. If cell 0xB contains some other unrelated data, how will 3.14 go into this cell? Does that mean cell 0xB will be copied somewhere else, erased to make room for 3.14?
The short answer is that the entire array holding the vector's data is moved around to a location where it has space to grow. The vector class reserves a larger array than is technically required to hold the number of elements in the vector. For example:
vector< int > vec;
for( int i = 0; i < 100; i++ )
vec.push_back( i );
cout << vec.size(); // prints "100"
cout << vec.capacity(); // prints some value greater than or equal to 100
The capacity() method returns the size of the array that the vector has reserved, while the size() method returns the number of elements in the array which are actually in use. capacity() will always return a number larger than or equal to size(). You can change the size of the backing array by using the reserve() method:
vec.reserve( 400 );
cout << vec.capacity(); // returns "400"
Note that size(), capacity(), reserve(), and all related methods refer to individual instances of the type that the vector is holding. For example, if vec's type parameter T is a struct that takes 10 bytes of memory, then vec.capacity() returning 400 means that the vector actually has 4000 bytes of memory reserved (400 x 10 = 4000).
So what happens if more elements are added to the vector than it has capacity for? In that case, the vector allocates a new backing array (generally twice the size of the old array), copies the old array to the new array, and then frees the old array. In pseudo-code:
if(capacity() < size() + items_added)
{
size_t sz = capacity();
while(sz < size() + items_added)
sz*=2;
T* new_data = new T[sz];
for( int i = 0; i < size(); i++ )
new_data[ i ] = old_data[ i ];
delete[] old_data;
old_data = new_data;
}
So the entire data store is moved to a new location in memory that has enough space to store the current data plus a number of new elements. Some vectors may also dynamically decrease the size of their backing array if they have far more space allocated than is actually required.
std::vector first allocates a bigger buffer, then copies existing elements from the "old" buffer to the "new" buffer, then it deletes the "old buffer", and finally adds the new element into the "new" buffer.
Generally, std::vector implementation grow their internal buffer by doubling the capacity each time it's necessary to allocate a bigger buffer.
As Chris mentioned, every time the buffer grows, all existing iterators are invalidated.
When std::vector allocates memory for the values, it allocates more than it needs; you can find out how much by calling capacity. When that capacity is used up, it allocates a bigger chunk, again larger than it needs, and copies everything from the old memory to the new; then it releases the old memory.
If there is not enough space to add the new element, more space will be allocated (as you correctly indicated), and the old data will be copied to the new location. So cell 0xB will still contain the old value (as it might have pointers to it in other places, it is impossible to move it without causing havoc), but the whole vector in question will be moved to the new location.
A vector is an array of memory. Typical implementation is that it grabs more memory than is required. It that footprint needs to expand over any other memory - the whole lot is copied. The old stuff is freed. The vector memory is on the stack - and that should be noted. It is also a good idea to say the maximum size is required.
In C++, which comes from C, memory is not 'managed' the way you describe - Cell 0x0B's contents will not be moved around. If you did that, any existing pointers would be made invalid! (The only way this could be possible is if the language had no pointers and used only references for similar functionality.)
std::vector allocates a new, larger buffer and stores the value of 3.14 to the "end" of the buffer.
Usually, though, for optimized this->push_back()s, a std::vector allocates memory about twice its this->size(). This ensures that a reasonable amount of memory is exchanged for performance. So, it is not guaranteed 3.14 will cause a this->resize(), and may simply be put into this->buffer[this->size()++] if and only if this->size() < this->capacity().