C++ vector.erase() function bug

I have this vector:
std::vector<std::string> list;
list.push_back("one");
list.push_back("two");
list.push_back("three");
I use list.erase(list.begin() + 1) to delete the "two" and it works. But when I try to output the list again:
cout<<list[0]<<endl;
cout<<list[1]<<endl;
cout<<list[2]<<endl;
produces:
one
three
three
I tried targeting the last element for erasing with list.erase(list.begin() + 2), but the duplicate threes remain. I expected index 2 to have been shifted out, so that list[2] would output nothing. list[3] outputs nothing, as it should.
I'm trying to erase the "two" and output the list as only:
one
three

When using cout<<list[2]<<endl; you assume that you still have three elements, but in fact you are accessing leftover data in a part of memory that is no longer in use.
You should use list.size() to obtain the number of elements. So, something like:
for ( size_t i = 0; i < list.size(); i++ )
{
    cout << list[i] << endl;
}

But you erased the element, thus the size of your container was decreased by one, i.e. from 3 to 2.
So, after the erase, you shouldn't do this:
cout<<list[0]<<endl;
cout<<list[1]<<endl;
cout<<list[2]<<endl; // Undefined Behaviour!!
but this:
cout<<list[0]<<endl;
cout<<list[1]<<endl;

In your case, the "three" is just copied to index 1, which is expected, and vector.size() is now 2.
The stale value is still visible at index 2 because the vector pre-allocates memory, which helps to improve performance.
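To see the difference between size() and capacity() after the erase, here is a small self-contained sketch (the exact capacity values are implementation-defined):
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> list;
    list.push_back("one");
    list.push_back("two");
    list.push_back("three");

    list.erase(list.begin() + 1);

    // size() drops to 2, but capacity() stays at least 3: the storage
    // past size() is not cleared, it is simply no longer part of the vector.
    std::cout << "size: " << list.size()
              << ", capacity: " << list.capacity() << "\n";
    for (std::size_t i = 0; i < list.size(); i++)
        std::cout << list[i] << "\n";
    return 0;
}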

To keep from having to resize with every change, vector grabs a block of memory bigger than it needs and keeps it until forced to get bigger or instructed to get smaller.
To brutally simplify, think of it as
string * array = new string[100];
int capacity = 100;
int size = 0;
In this case you can write all through that 100 element array without the program crashing because it is good and valid memory, but only values beneath size have been initialized and are meaningful. What happens when you read above size is undefined. Because reading out of bounds is a bad idea and preventing it has a performance cost that should not be paid by correct usage, the C++ standard didn't waste any time defining what the penalty for doing so is. Some debug or security critical versions will test and throw exceptions or mark unused portions with a canary value to assist in detecting faults, but most implementations are aiming for maximum speed and do nothing.
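Incidentally, if you want checked access rather than maximum speed, std::vector::at() throws std::out_of_range instead of silently reading stale memory; a quick example:
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> list = {"one", "three"};
    // list[2] would be undefined behaviour; at() checks the index instead:
    try {
        std::cout << list.at(2) << "\n";
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << "\n";
    }
    return 0;
}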
Now you push_back "one", "two", and "three". The array is still 100 elements, capacity is still 100, but size is now 3.
You erase array[1], and every element after index 1 up to size is copied one position toward the front (note the potentially huge performance cost here: vector is not the right data structure choice if you are adding and removing items at random locations), and size is reduced by one, resulting in "one", "three", and "three". The array is still 100 elements, capacity is still 100, but size is now 2.
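In the simplified model above, erase amounts to something like:
// erase at position p: shift everything after p one slot toward the front,
// then shrink size; capacity is untouched
for (int index = p; index < size - 1; index++)
{
    array[index] = array[index + 1];
}
size = size - 1;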
Say you add another 99 strings. size increases each time a string is added, and when size would exceed capacity, a new array is made, the old array is copied to the new, and the old one is freed. Something along the lines of:
capacity *= 1.5; // the growth factor varies by implementation
string * temp = new string[capacity];
for (int index = 0; index < size; index++)
{
    temp[index] = array[index];
}
delete [] array;
array = temp;
The array is now 150 elements, capacity is now 150, and size is now 101.
Result:
There is usually a bit of fluff around the end of a vector that will allow reading out of bounds without the program crashing, but do not confuse this with the program working.

Related

Shrink QVector to last 10 elements & rearrange one item

I'm trying to find the most "efficient" way, or at least one fast enough for a 10k-item vector, to shrink it to its last 10 items and move the last selected item to the end of it.
I initially thought of using this method for shrinking:
QVector<QModelIndex> newVec(listPrimary.end() - 10, listPrimary.end());
But that does not work, and I'm not sure how to use the Qt / std iterators to get it to work...
And then once that's done, do this test:
if(newVec.contains(lastItem))
{
newVec.insert(newVec[vewVec.indexOf(newVec)],newVec.size());
}
else{
newVec.push_back(lastItem);
}
The QVector class has a method that does what you want:
QVector<T> QVector::mid(int pos, int length = -1) const
Returns a sub-vector which contains elements from this vector, starting at position pos. If length is -1 (the default), all elements after pos are included; otherwise length elements (or all remaining elements if there are fewer than length elements) are included.
So as suggested in the comments, you can do something like this:
auto newVec = listPrimary.mid(listPrimary.size() - 10);
You do not have to pass length because its default value ensures that all elements after pos are included.
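Untested, but combining mid() with the reordering step from the question could look like this (assuming lastItem is the QModelIndex you want moved to the end):
// Keep only the last 10 items (assumes listPrimary.size() >= 10).
QVector<QModelIndex> newVec = listPrimary.mid(listPrimary.size() - 10);

// Move lastItem to the end: drop it if it is already present, then append.
int pos = newVec.indexOf(lastItem);
if (pos != -1)
    newVec.remove(pos);
newVec.append(lastItem);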

C++ vector performance with predefined capacity

There are two ways to define a std::vector (that I know of):
std::vector<int> vectorOne;
and
std::vector<int> vectorTwo(300);
So if I don't size the first one and then fill it with 300 ints, it has to reallocate memory to store those ints. That would mean its storage would not be, for example, addresses 0x0 through 0x300; the data could end up somewhere else entirely because it has to be reallocated as the vector grows, whereas the second vector already has those addresses reserved, so there would be no moving around.
Does this affect performance at all, and how could I measure it?
std::vector is guaranteed to always store its data in a continuous block of memory. That means that when you add items, it has to try and increase its range of memory in use. If something else is in the memory following the vector, it needs to find a free block of the right size somewhere else in memory and copy all the old data + the new data to it.
This is a fairly expensive operation in terms of time, so it tries to mitigate by allocating a slightly larger block than what you need. This allows you to add several items before the whole reallocate-and-move-operation takes place.
Vector has two properties: size and capacity. The former is how many elements it actually holds, the latter is how many places are reserved in total. For example, if you have a vector with size() == 10 and capacity() == 18, it means you can add 8 more elements before it needs to reallocate.
How and when the capacity increases exactly, is up to the implementer of your STL version. You can test what happens on your computer with the following test:
#include <iostream>
#include <vector>

int main() {
    using std::cout;
    using std::vector;

    // Create a vector holding 10 default-initialized ints
    vector<int> v(10);
    cout << "v has size " << v.size() << " and capacity " << v.capacity() << "\n";

    // Now add 90 values, and print the size and capacity after each insert
    for (int i = 11; i <= 100; ++i)
    {
        v.push_back(i);
        cout << "v has size " << v.size() << " and capacity " << v.capacity()
             << ". Memory range: " << &v.front() << " -- " << &v.back() << "\n";
    }
    return 0;
}
I ran it on IDEOne and got the following output:
v has size 10 and capacity 10
v has size 11 and capacity 20. Memory range: 0x9899a40 -- 0x9899a68
v has size 12 and capacity 20. Memory range: 0x9899a40 -- 0x9899a6c
v has size 13 and capacity 20. Memory range: 0x9899a40 -- 0x9899a70
...
v has size 20 and capacity 20. Memory range: 0x9899a40 -- 0x9899a8c
v has size 21 and capacity 40. Memory range: 0x9899a98 -- 0x9899ae8
...
v has size 40 and capacity 40. Memory range: 0x9899a98 -- 0x9899b34
v has size 41 and capacity 80. Memory range: 0x9899b40 -- 0x9899be0
You see the capacity increase and re-allocations happening right there, and you also see that this particular compiler chooses to double the capacity every time you hit the limit.
On some systems the algorithm will be more subtle, growing faster as you insert more items (so if your vector is small, you waste little space, but if it notices you insert a lot of items into it, it allocates more to avoid having to increase the capacity too often).
PS: Note the difference between setting the size and the capacity of a vector.
vector<int> v(10);
will create a vector with capacity at least 10, and size() == 10. If you print the contents of v, you will see that it contains
0 0 0 0 0 0 0 0 0 0
i.e. 10 integers with their default values. The next element you push into it may (and likely will) cause a re-allocation. On the other hand,
vector<int> v;
v.reserve(10);
will create an empty vector, but with its initial capacity set to at least 10 rather than the default. (Beware that vector<int> v(); would declare a function, not a vector.) You can be certain that the first 10 elements you push into it will not cause an allocation (the 11th probably will, but not necessarily, as reserve may actually set the capacity to more than what you requested).
You should use reserve() method:
std::vector<int> vec;
vec.reserve(300);
assert(vec.size() == 0); // But memory is allocated
This solves the problem.
In your example it affects the performance greatly. You can expect that when you overflow the vector, it doubles the allocated memory. So, if you push_back() into the vector N times (and you haven't called reserve()), you can expect O(log N) reallocations. Since the amounts copied at each reallocation form a geometric series, the total copying work is O(N), i.e. amortized constant per push_back; the exact growth factor, however, is not specified by the C++ standard. Reserving up front avoids all of those reallocations and copies.
The differences can also be dramatic for other containers: if data is not adjacent in memory, it may have to be fetched from main memory, which is roughly 200 times slower than an L1 cache fetch. This will not happen with a vector, because data in a vector is required to be adjacent.
see https://www.youtube.com/watch?v=YQs6IC-vgmo
Use std::vector::reserve when you can, to avoid reallocation events. The C++ <chrono> header has good time utilities to measure the time difference in high-resolution ticks.
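For instance, a minimal micro-benchmark along these lines (results vary by compiler, standard library, and machine, so treat it only as a sketch):
#include <chrono>
#include <iostream>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;
    const int N = 10000000;

    // Fill without reserve(): repeated reallocations and copies.
    auto t0 = clock::now();
    std::vector<int> a;
    for (int i = 0; i < N; ++i) a.push_back(i);
    auto t1 = clock::now();

    // Fill with reserve(): one allocation up front.
    std::vector<int> b;
    b.reserve(N);
    for (int i = 0; i < N; ++i) b.push_back(i);
    auto t2 = clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "no reserve: " << ms(t1 - t0).count() << " ms\n";
    std::cout << "reserve:    " << ms(t2 - t1).count() << " ms\n";
    return 0;
}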

Why is removing the first element of an ArrayList slow?

Somewhere I've read that removing the first element, arrayList.remove(0), is slower than removing the last one, arrayList.remove(arrayList.size()-1). Could someone provide a detailed explanation? Thanks in advance
In ArrayList the elements reside in contiguous memory locations.
So when you remove the first element, all elements from 2 to n have to be shifted.
E.g. if you remove 1 from [1,2,3,4], then 2, 3 and 4 have to be shifted to the left to maintain contiguous memory allocation.
This makes it a little slower.
On the other hand, if you remove the last element, there is no shifting required since all the remaining elements are in the proper place.
Implementation of remove:
public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // Let gc do its work

    return oldValue;
}
The values are stored in an array, so if the last one is removed it only sets that slot in the array to null (elementData[--size] = null). But if the element is anywhere else, it needs arraycopy to move all the elements after it. So in the code above you can see it clearly: any index other than size - 1 implies the arraycopy call (the extra time used).
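The same cost model applies to std::vector in C++; a quick, untested sketch to measure it (numbers will vary by machine):
#include <chrono>
#include <iostream>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;
    std::vector<int> front_v(30000, 1), back_v(30000, 1);

    // Erasing the front shifts every remaining element each time.
    auto t0 = clock::now();
    while (!front_v.empty()) front_v.erase(front_v.begin());
    auto t1 = clock::now();

    // Removing the back touches nothing else.
    while (!back_v.empty()) back_v.pop_back();
    auto t2 = clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "erase front: " << ms(t1 - t0).count() << " ms\n";
    std::cout << "pop_back:    " << ms(t2 - t1).count() << " ms\n";
    return 0;
}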

C++ - read 1000 floats and insert them into a vector of size 10 by keeping the lowest 10 numbers only

So I am pretty new to c++ and I am not sure if there is a data structure already created to facilitate what I am trying to do (so I do not reinvent the wheel):
What I am trying to do
I am reading a file where I need to parse the file, do some calculations on every floating value on every row of the file, and return the top 10 results from the file in ascending order.
What I am trying to optimize
I am dealing with a 1k-row file and a 1.9-million-row file. Each row yields a result of size 72, so for 1k rows I would need to allocate a vector of 72,000 elements, and for the 1.9 million rows... well, you get the idea.
What I have so far
I am currently collecting the results in a vector, which I then sort and resize down to 10.
const unsigned int vector_space = circularVector.size()*72;
//vector for the results
std::vector<ResultType> results;
results.reserve(vector_space);
but this is extremely inefficient.
What I want to accomplish
I want to keep a vector of only size 10, and whenever I perform a calculation, simply insert the value into the vector and remove the largest value present, thus maintaining the 10 lowest results in ascending order.
Is there a structure already in C++ that has such behavior?
Thanks!
EDIT: Changed to use the 10 lowest elements rather than the highest elements as the question now makes clear which is required
You can use a std::vector of 10 elements as a max heap, in which the elements are partially sorted such that the first element always contains the maximum value. Note that the following is all untested, but hopefully it should get you started.
// Create an empty vector to hold the lowest values
std::vector<ResultType> results;

// Iterate over the first 10 entries in the file and put the results in the vector
for (... ; i < 10; i++) {
    // Calculate the value of this row
    ResultType r = ....
    // Add it to the vector
    results.push_back(r);
}

// Now that the vector is "full", turn it into a heap
std::make_heap(results.begin(), results.end());

// Iterate over all the remaining rows, keeping values which are lower than
// the current maximum
for (i = 10; .....) {
    // Calculate the value for this row
    ResultType r = ....
    // Compare it to the max element in the heap
    if (r < results.front()) {
        // Move the current maximum to the back of the vector...
        std::pop_heap(results.begin(), results.end());
        // ...overwrite it with the new, smaller value...
        results.back() = r;
        // ...and sift the new value back into heap order.
        std::push_heap(results.begin(), results.end());
    }
}

// Finally, sort the results to put them all in order
// (using sort_heap just because we can)
std::sort_heap(results.begin(), results.end());
Yes. What you want is a priority queue or heap, set up so that the largest value can be removed. You just need to do such a removal whenever the size after an insertion is greater than 10. You can do this with the STL classes.
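A minimal sketch of that idea with std::priority_queue (double stands in for the question's ResultType; anything with operator< works):
#include <cstddef>
#include <queue>

// Max-heap of the 10 lowest values seen so far: top() is the largest of
// the kept values, so it is the one to evict.
std::priority_queue<double> lowest;

void consider(double r) {
    const std::size_t keep = 10;
    if (lowest.size() < keep) {
        lowest.push(r);
    } else if (r < lowest.top()) {
        lowest.pop();    // drop the current largest
        lowest.push(r);  // keep the smaller newcomer
    }
}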
You can also use std::set, since in std::set all values are kept sorted from min to max (note that a set discards duplicates; use std::multiset if duplicates must be kept). To keep the 10 lowest values, erase the largest element when the set grows past 10:
void insert_value(std::set<ResultType>& myset, const ResultType& value) {
    myset.insert(value);
    const std::size_t limit = 10;
    if (myset.size() > limit) {
        // *std::prev(myset.end()) is the largest value; drop it
        myset.erase(std::prev(myset.end()));
    }
}
I think a max-heap will work for this problem:
1. Create a max heap of size 10.
2. Fill the heap with the first 10 elements.
3. For the 11th element, compare it with the largest element, i.e. the root (the element at index 0).
4. If the 11th element is smaller, replace the root with it and heapify again.
Repeat the same steps until the whole file is parsed.

Copying part of an array stored in a memory mapped file

I have an array of doubles stored in a memory mapped file, and I want to read the last 3 entries of the array (or some arbitrary entry for that matter).
It is possible to copy the entire array stored in the MMF to an auxiliary array:
void ReadDataArrayToMMF(double* dataArray, int arrayLength, LPCTSTR* pBufPTR)
{
    CopyMemory(dataArray, (PVOID)*pBufPTR, sizeof(double)*arrayLength);
}
and use the needed entries, but that would mean copying the entire array for just a few values actually needed.
I can shrink arrayLength to some number n in order to get the first n entries, but I'm having problems copying a part of the array that doesn't start at the first entry. I tried playing with the pBufPTR pointer but only got runtime errors.
Any ideas on how to access/copy memory from the middle of the array without needing to copy the entire array?
To find the start offset of the nth element:
const double *offset = reinterpret_cast<const double*>( *pBufPTR ) + n;
To copy the last 3 elements:
CopyMemory( dataArray, reinterpret_cast<const double*>( *pBufPTR ) + arrayLength - 3, 3 * sizeof(double) );
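Wrapped up as a general helper (the name and signature here are mine, not from the question):
// Hypothetical helper: copies `count` doubles starting at element `start`
// of the mapped array into dst.
void ReadDataSubArrayFromMMF(double* dst, int start, int count, LPCTSTR* pBufPTR)
{
    const double* src = reinterpret_cast<const double*>(*pBufPTR) + start;
    CopyMemory(dst, src, sizeof(double) * count);
}

// e.g. the last 3 entries of an arrayLength-sized mapped array:
// ReadDataSubArrayFromMMF(dataArray, arrayLength - 3, 3, &pBuf);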
You can do the same with simple pointer arithmetic: add the (0-based) index of the first element you want to the mapped buffer pointer, and the address is advanced by that index times sizeof(double). Make sure you also pass a count reflecting the number of elements to copy, not the original array length.
For example, to copy the last 10 elements (assuming you've already checked there are at least 10 elements)...
CopyMemory(dataArray, reinterpret_cast<const double*>(*pBufPTR) + arrayLength - 10, 10 * sizeof(double));
Similarly, to get elements 20..24:
CopyMemory(dataArray, reinterpret_cast<const double*>(*pBufPTR) + 20, 5 * sizeof(double));