Two short questions about std::vector - c++

When a vector is created it has a default allocation size (probably this is not the right term to use, maybe step size?). When the number of elements reaches this size, the vector is resized. Is this size compiler specific? Can I control it? Is this a good idea?
Do repeated calls to vector::size() recount the number of elements (O(n) calculation) or is this value stored somewhere (O(1) lookup). For example, in the code below
// Split given string on whitespace
vector<string> split( const string& s )
{
    vector<string> tokens;
    string::size_type i, j;
    i = 0;
    while ( i != s.size() ) {
        // ignore leading blanks (bounds check first, then dereference)
        while ( i != s.size() && isspace(s[i]) ) {
            i++;
        }
        // found a word, now find its end
        j = i;
        while ( j != s.size() && !isspace(s[j]) ) {
            j++;
        }
        // if we found a word, add it to the vector
        if ( i != j ) {
            tokens.push_back( s.substr(i, j-i) );
            i = j;
        }
    }
    return tokens;
}
assuming s can be very large, should I call s.size() only once and store the result?
Thanks!

In most cases, you should leave the allocation alone unless you know the number of items ahead of time, so you can reserve the correct amount of space.
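For instance, a minimal sketch (my addition, not from the original answer; make_squares is a made-up name):

#include <cstddef>
#include <vector>

// If the final element count is known, one reserve() call up front avoids
// every intermediate reallocation.
std::vector<int> make_squares(std::size_t n)
{
    std::vector<int> v;
    v.reserve(n);                       // one allocation, no regrowth below
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i * i));
    return v;
}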
At least in every case of which I'm aware, std::vector::size() just returns a stored value, so it has constant complexity. In theory, the C++ standard allows it to do otherwise. There are reasons to allow otherwise for some other containers, primarily std::list, and rather than make a special case for those, the standard simply recommends constant time for all containers instead of requiring it for any. I can't quite imagine a vector::size() that counted elements, though -- I'm pretty sure no such thing has ever existed.
P.S., an easier way to do what your code above does is something like this:
std::vector<std::string> split(std::string const &input) {
    std::vector<std::string> ret;
    std::istringstream buffer(input);
    std::copy(std::istream_iterator<std::string>(buffer),   // read from the stream, not the string
              std::istream_iterator<std::string>(),
              std::back_inserter(ret));
    return ret;
}
Edit: IMO, The C++ Standard Library, by Nicolai Josuttis is an excellent reference on such things.

The actual size of each capacity increment is implementation-dependent, but it has to be (roughly) exponential to meet the container's amortized-complexity requirements. As an example, the Visual C++ standard library allocates exactly the space required for the first few elements (five, if I recall correctly), then grows the capacity exponentially after that.
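If you're curious what your own implementation does, a small probe like the following (my addition, not part of the answer) prints the capacity each time it changes:

#include <cstddef>
#include <iostream>
#include <vector>

// Reveals the growth policy of whatever standard library you compile against.
int main()
{
    std::vector<int> v;
    std::size_t cap = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != cap) {
            cap = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << cap << '\n';
        }
    }
}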
The size has to be stored somehow in the vector, otherwise it doesn't know where the end of the sequence is! However, it may not necessarily be stored as an integer. The Visual C++ implementation (again, as an example) stores three pointers:
a pointer to the beginning of the underlying array,
a pointer to the current end of the sequence, and
a pointer to the end of the underlying array.
The size can be computed from (1) and (2); the capacity can be computed from (1) and (3).
Other implementations might store the information differently.
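As a rough illustration (member names invented here, not the real MSVC ones), the layout described above amounts to something like:

#include <cstddef>

// Illustrative only -- a simplified picture of the three-pointer layout.
template <typename T>
struct vector_sketch {
    T* first;    // (1) start of the underlying array
    T* last;     // (2) one past the last constructed element
    T* end_cap;  // (3) one past the end of the allocation

    std::size_t size() const     { return static_cast<std::size_t>(last - first); }     // from (1) and (2)
    std::size_t capacity() const { return static_cast<std::size_t>(end_cap - first); }  // from (1) and (3)
};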

It's library-specific. You might be able to control the incremental allocation, but you might not.
The size is stored, so it is very fast (constant time) to retrieve. How else could it work? C++ has no way of knowing, in general, whether a memory location holds "real data" or not.

The resizing mechanism is usually fixed. (Most implementations double the capacity of the vector when it reaches the limit.) The C++ standard specifies no way to control this behaviour.
The size is internally updated whenever you insert/remove elements and when you call size(), it's returned immediately. So yes, it's O(1).

Unrelated to your actual questions, but here's a more "STL" way of doing what you're doing:
vector<string> split(const string& s)
{
    istringstream stream(s);
    istream_iterator<string> iter(stream), eos;
    vector<string> tokens;
    copy(iter, eos, back_inserter(tokens));
    return tokens;
}
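For completeness, a quick usage sketch of the function above; it assumes split() and the relevant includes/using-declarations are already in scope:

#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Exercising split() defined above:
int main()
{
    for (const string& tok : split("the quick  brown fox"))
        cout << '[' << tok << "]\n";   // prints [the] [quick] [brown] [fox]
}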

When the number of elements reaches this size, the vector is resized. Is this size compiler specific? Can I control it? Is this a good idea?
In general, this is library-specific behavior. A custom allocator lets you control where the memory comes from, but the growth factor itself is chosen by the implementation, not by the allocator.
Do repeated calls to vector::size() recount the number of elements (O(n) calculation) or is this value stored somewhere (O(1) lookup).
Most implementations store the size as a member. It's a single memory read.


Best way to concatenate and condense a std::vector<std::string>

Disclaimer: This problem is more of a theoretical, rather than a practical interest. I want to find out various different ways of doing this, with speed as icing on the new year cake.
The Problem
I want to be able to store a list of strings, and be able to quickly combine them into 1 if needed.
In short, I want to condense a structure (currently a std::vector<std::string>) that looks like
["Hello, ", "good ", "day ", " to", " you!"]
to
["Hello, good day to you!"]
Is there any idiomatic way to achieve this, à la Python's [ ''.join(list_of_strings) ]?
What is the best way to achieve this in C++, in terms of time?
Possible Approaches
The first idea I had is to
loop over the vector,
append each element to the first,
simultaneously delete the element.
We will be concatenating with += and reserve(). I assume that max_size() will not be reached.
Approach 1 (The Greedy Approach)
So called because it ignores conventions and operates in-place.
#if APPROACH == 'G'
// Greedy Approach
void condense(std::vector< std::string >& my_strings, int total_characters_in_list)
{
    // Reserve the size for all characters, less than max_size()
    my_strings[0].reserve(total_characters_in_list);
    // There are strings left, ...
    for(auto itr = my_strings.begin()+1; itr != my_strings.end();)
    {
        // append, and...
        my_strings[0] += *itr;
        // delete, until...
        itr = my_strings.erase(itr);
    }
}
#endif
Now I know, you would say that this is risky and bad. So:
loop over the vector,
append each element to another std::string,
clear the vector and make the string first element of the vector.
Approach 2 (The "Safe" Haven)
So called because it does not modify the container while iterating over it.
#if APPROACH == 'H'
// Safe Haven Approach
void condense(std::vector< std::string >& my_strings, int total_characters_in_list)
{
    // Store the whole vector here
    std::string condensed_string;
    condensed_string.reserve(total_characters_in_list);
    // There are strings left...
    for(auto itr = my_strings.begin(); itr != my_strings.end(); ++itr)
    {
        // append, until...
        condensed_string += *itr;
    }
    // remove all elements except the first
    my_strings.resize(1);
    // and set it to condensed_string
    my_strings[0] = condensed_string;
}
#endif
Now for the standard algorithms...
Using std::accumulate from <algorithm>
Approach 3 (The Idiom?)
So called simply because it is a one-liner.
#if APPROACH == 'A'
// Accumulate Approach
void condense(std::vector< std::string >& my_strings, int total_characters_in_list)
{
    // Reserve the size for all characters, less than max_size()
    my_strings[0].reserve(total_characters_in_list);
    // Accumulate all the strings
    my_strings[0] = std::accumulate(my_strings.begin(), my_strings.end(), std::string(""));
    // And resize
    my_strings.resize(1);
}
#endif
Why not try to store it all in a stream?
Using std::stringstream from <sstream>.
Approach 4 (Stream of Strings)
So called due to the analogy of C++'s streams with flow of water.
#if APPROACH == 'S'
// Stringstream Approach
void condense(std::vector< std::string >& my_strings, int) // you can remove the int
{
    // Create the out stream (default-constructed: the loop below writes
    // every string, including my_strings[0])
    std::stringstream buffer;
    // There are strings left, ...
    for(auto itr = my_strings.begin(); itr != my_strings.end(); ++itr)
    {
        // add until...
        buffer << *itr;
    }
    // resize and assign
    my_strings.resize(1);
    my_strings[0] = buffer.str();
}
#endif
However, maybe we can use another container rather than std::vector?
In that case, what else?
(Possible) Approach 5 (The Great Indian "Rope" Trick)
I have heard about the rope data structure, but have no idea if (and how) it can be used here.
Benchmark and Verdict:
Ordered by their time efficiency (currently, and surprisingly), the approaches are¹:
Approaches           Vector Size: 40        Vector Size: 1600      Vector Size: 64000
SAFE_HAVEN:          0.1307962699997006     0.12057728999934625    0.14202970000042114
STREAM_OF_STRINGS:   0.12656566000077873    0.12249500000034459    0.14765803999907803
ACCUMULATE_WEALTH:   0.11375975999981165    0.12984520999889354    3.748660090001067
GREEDY_APPROACH:     0.12164988000004087    0.13558526000124402    22.6994204800023
timed with²:
import timeit

NUM_OF_ITERATIONS = 100
test_cases = [ 'greedy_approach', 'safe_haven' ]
for approach in test_cases:
    time_taken = timeit.timeit(
        f'system("{approach + ".exe"}")',
        'from os import system',
        number = NUM_OF_ITERATIONS
    )
    print(approach + ": ", time_taken / NUM_OF_ITERATIONS)
Can we do better?
Update: I tested it with 4 approaches (so far), as many as I could manage in the little time I had. More incoming soon. It would have been better to fold the code so that more approaches could be added to this post, but the suggestion was declined.
¹ Note that these readings are only a rough estimate. There are a lot of things that influence the execution time, and there are some inconsistencies here as well.
² This is the old code, used to test only the first two approaches. The current code is a good deal longer and more integrated, so I am not sure I should add it here.
Conclusions:
Deleting elements is very costly.
You should just copy the strings somewhere, and resize the vector.
In fact, better to reserve enough space too, if copying to another string.
You could also try std::accumulate:
auto s = std::accumulate(my_strings.begin(), my_strings.end(), std::string());
Won't be any faster, but at least it's more compact.
With range-v3 (and soon with C++20 ranges), you might do:
std::vector<std::string> v{"Hello, ", "good ", "day ", " to", " you!"};
std::string s = v | ranges::view::join;
By default, I would use std::stringstream: simply construct the stream, stream in all the strings from the vector, and then return the output string. It isn't very efficient, but it is clear what it does.
In most cases, one doesn't need a fast method when dealing with strings and printing, so the "easy to understand and safe" methods are better. Plus, compilers nowadays are good at optimizing away inefficiencies in simple cases.
The most efficient way... it is a hard question. Some applications require efficiency on multiple fronts. In these cases you might need to utilize multithreading.
Personally, I'd construct a second vector to hold a single "condensed" string, construct the condensed string, and then swap vectors when done.
void Condense(std::vector<std::string> &strings)
{
    std::vector<std::string> condensed(1);      // one default-constructed std::string
    std::string &constr = condensed.front();    // reference to first element of condensed
    for (const auto &str : strings)
        constr.append(str);
    std::swap(strings, condensed);              // swap newly constructed vector into original
}
If an exception is thrown for some reason, then the original vector is left unchanged, and cleanup occurs - i.e. this function gives a strong exception guarantee.
Optionally, to reduce resizing of the "condensed" string, after initialising constr in the above, one could do
// optional: compute the length of the condensed string and reserve
std::size_t total_characters_in_list = 0;
for (const auto &str : strings)
    total_characters_in_list += str.size();
constr.reserve(total_characters_in_list);
// end optional reservation
As to how efficient this is compared with alternatives, that depends. I'm also not sure it's relevant - if strings keep on being appended to the vector, and needing to be appended, there is a fair chance that the code that obtains the strings from somewhere (and appends them to the vector) will have a greater impact on program performance than the act of condensing them.

Copying vector elements to a vector pair

In my C++ code,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
So I combined these two vectors into a single one,
void combineVectors(vector<string>& strVector, vector<int>& intVector, vector<pair<string, int>>& pairVector)
{
    for (int i = 0; i < strVector.size() || i < intVector.size(); ++i)
    {
        pairVector.push_back(pair<string, int>(strVector.at(i), intVector.at(i)));
    }
}
Now this function is called like this,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
vector < pair <string, int>> pairVector;
combineVectors(strVector, intVector, pairVector);
//rest of the implementation
The combineVectors function uses a loop to add the elements of the other 2 vectors to the vector of pairs. I doubt this is an efficient way, as this function gets called hundreds of times passing different data, and it might cause a performance issue because it goes through the loop every time.
My goal is to copy both vectors to the vector of pairs in "one go", i.e., without using a loop. I'm not sure whether that's even possible.
Is there a better way of achieving this without compromising the performance?
You have clarified that the arrays will always be of equal size. That's a prerequisite condition.
So, your situation is as follows. You have vector A over here, and vector B over there. You have no guarantees whether the actual memory that vector A uses and the actual memory that vector B uses are next to each other. They could be anywhere.
Now you're combining the two vectors into a third vector, C. Again, no guarantees where vector C's memory is.
So, you have really very little to work with, in terms of optimizations. You have no additional guarantees whatsoever. This is pretty much fundamental: you have two chunks of bytes, and those two chunks need to be copied somewhere else. That's it. That's what has to be done, that's what it all comes down to, and there is no other way to get it done, other than doing exactly that.
But there is one thing that can be done to make things a little bit faster. A vector typically allocates memory for its values in incremental steps, reserving some extra space up front. As values get added one by one and eventually fill the reserved size, the vector has to grab a new, larger block of memory, copy everything in the vector to the larger block, delete the older block, and only then add the next value. Then the cycle begins again.
But you know, in advance, how many values you are about to add to the vector, so you simply instruct the vector to reserve() enough size in advance, so it doesn't have to repeatedly grow itself, as you add values to it. Before your existing for loop, simply:
pairVector.reserve(pairVector.size()+strVector.size());
Now, the for loop will proceed and insert new values into pairVector which is guaranteed to have enough space.
A couple of other things are possible. Since you have stated that both vectors will always have the same size, you only need to check the size of one of them:
for (int i = 0; i < strVector.size(); ++i )
Next step: at() performs bounds checking. This loop ensures that i will never be out of bounds, so at()'s bounds checking is overhead you can get rid of safely:
pairVector.push_back(pair<string, int> (strVector[i], intVector[i]));
Next: with a modern C++ compiler, the compiler should be able to optimize away, automatically, several redundant temporaries, and temporary copies here. It's possible you may need to help the compiler, a little bit, and use emplace_back() instead of push_back() (assuming C++11, or later):
pairVector.emplace_back(strVector[i], intVector[i]);
Going back to the loop condition, strVector.size() gets evaluated on each iteration of the loop. It's very likely that a modern C++ compiler will optimize it away, but just in case, you can also help your compiler check the vector's size() only once:
int n = strVector.size();
for (int i = 0; i < n; ++i)
This is really a stretch, but it might eke out a few extra quantums of execution time. And that's pretty much all the obvious optimizations here. Realistically, the most to be gained is by using reserve(). The other optimizations might help things a little bit more, but it all boils down to moving a certain number of bytes from one area in memory to another. There aren't really special ways of doing that which are faster than other ways.
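Putting all of those suggestions together, a sketch of the resulting function (assuming, as stated, both input vectors are always the same size) might look like:

#include <cstddef>
#include <string>
#include <utility>
#include <vector>

void combineVectors(const std::vector<std::string>& strVector,
                    const std::vector<int>& intVector,
                    std::vector<std::pair<std::string, int>>& pairVector)
{
    const std::size_t n = strVector.size();          // size() read once
    pairVector.reserve(pairVector.size() + n);       // grow at most once
    for (std::size_t i = 0; i < n; ++i)
        pairVector.emplace_back(strVector[i], intVector[i]);  // no temporaries, no bounds checks
}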
We can use std::generate() to achieve this:
#include <bits/stdc++.h>
using namespace std;

vector<string> strVector{ "hello", "world" };
vector<int> intVector{ 2, 3 };

pair<string, int> f()
{
    static int i = -1;
    ++i;
    return make_pair(strVector[i], intVector[i]);
}

int main() {
    int min_Size = min(strVector.size(), intVector.size());
    vector< pair<string,int> > pairVector(min_Size);
    generate(pairVector.begin(), pairVector.end(), f);
    for( int i = 0 ; i < 2 ; i++ )
        cout << pairVector[i].first << " " << pairVector[i].second << endl;
}
I'll try and summarize what you want, with some possible answers depending on your situation. You say you want a new vector that is essentially a zipped version of two other vectors containing two heterogeneous types, where you can access the two types as some sort of pair?
If you want to make this more efficient, you need to think about what you are using the new vector for. I can see three scenarios with what you are doing.
The new vector is a copy of your data, so you can do stuff with it without affecting the original vectors (i.e. you still need the original two vectors).
The new vector is now the storage mechanism for your data (i.e. you no longer need the original two vectors).
You are simply coupling the vectors together to make use and representation easier (i.e. where they are stored doesn't actually matter).
1) Not much you can do aside from copying the data into your new vector. Explained more in Sam Varshavchik's answer.
3) You do something like Shakil's answer or some type of customized iterator.
2) Here you can make some optimisations where you do zero copying of the data, with the use of a wrapper class. Note: a wrapper class works if you don't need to use the actual std::vector<std::pair> class. You can make a class that the data is moved into, with access operators. If you can do this, it also allows you to decompose the wrapper back into the original two vectors without copying. Something like this might suffice:
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class StringIntContainer {
public:
    StringIntContainer(std::vector<std::string>& _string_vec, std::vector<int>& _int_vec)
        : string_vec_(std::move(_string_vec)), int_vec_(std::move(_int_vec))
    {
        assert(string_vec_.size() == int_vec_.size());
    }

    std::pair<std::string, int> operator[] (std::size_t _i) const
    {
        return std::make_pair(string_vec_[_i], int_vec_[_i]);
    }

    /* You may want methods that return references to the data so you can edit it */

    std::pair<std::vector<std::string>, std::vector<int>> Decompose()
    {
        return std::make_pair(std::move(string_vec_), std::move(int_vec_));
    }

private:
    std::vector<std::string> string_vec_;
    std::vector<int> int_vec_;
};

Why std::vector<uint8_t>::insert works 5 times faster than std::copy with MSVC 2015 compiler?

I have a trivial function that copies a byte block to std::vector:
std::vector<uint8_t> v;

void Write(const uint8_t * buffer, size_t count)
{
    //std::copy(buffer, buffer + count, std::back_inserter(v));
    v.insert(v.end(), buffer, buffer + count);
}
v.reserve(<buffer size>);
v.resize(0);
Write(<some buffer>, <buffer size>);
If I use std::vector<uint8_t>::insert, it works 5 times faster than if I use std::copy.
I tried compiling this code with MSVC 2015 with optimization enabled and disabled, and got the same result.
It looks like something is strange with the std::copy or std::back_inserter implementation.
The standard library implementation is written with performance in mind, but that performance is achieved only when optimization is ON.
//This reduces the performance dramatically if the optimization is switched off.
Trying to measure a function performance with optimization OFF is as pointless as asking ourselves if the law of gravitation would still be true if there were no mass left in the Universe.
The call to v.insert is calling a member function of the container. The member function knows how the container is implemented, so it can do things that a more generic algorithm can't do. In particular, when inserting a range of values designated by random-access iterators into a vector, the implementation knows how many elements are being added, so it can resize the internal storage once and then just copy the elements.
The call to std::copy with an insert-iterator, on the other hand, has to call insert for each element. It can't preallocate, because std::copy works with sequences, not containers; it doesn't know how to adjust the size of the container. So for large insertions into a vector the internal storage gets resized each time the vector is full and a new insertion is needed. The overhead of that reallocation is amortized constant time, but the constant is much larger than the constant when only one resizing is done.
With the call to reserve (which I overlooked, thanks, #ChrisDrew), the overhead of reallocating is not as significant. But the implementation of insert knows how many values are being copied, and it knows that those values are contiguous in memory (because the iterator is a pointer), and it knows that the values are trivially copyable, so it will use std::memcpy to blast the bits in all at once. With std::copy, none of that applies; the back inserter has to check whether a reallocation is necessary, and that code can't be optimized out, so you end up with a loop that copies an element at a time, checking for the end of the allocated space for each element. That's much more expensive than a plain std::memcpy.
In general, the more the algorithm knows about the internals of the data structure that it's accessing, the faster it can be. STL algorithms are generic, and the cost of that genericity can be more overhead than that of a container-specific algorithm.
With a good implementation of std::vector, v.insert(v.end(), buffer, buffer + count); might be implemented as:
size_t count = last-first;
resize(size() + count);
memcpy(data+offset, first, count);
std::copy(buffer, buffer + count, std::back_inserter(v)) on the other hand will be implemented as:
while ( first != last )
{
    *output++ = *first++;
}
which is equivalent to:
while ( first != last )
{
    v.push_back( *first++ );
}
or (roughly):
while ( first != last )
{
    // push_back should be slightly more efficient than this
    v.resize(v.size() + 1);
    v.back() = *first++;
}
Whilst in theory the compiler could optimise the above into a memcpy, it's unlikely to; at best you'll probably get the methods inlined so that you don't have function call overhead, but it'll still be writing one byte at a time, whereas a memcpy will normally use vector instructions to copy multiple bytes at once.
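One workaround, if std::copy must be used (my sketch, not from the answer above): resize first so the copy writes into existing elements, avoiding the per-element growth check of back_inserter:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

void Write(std::vector<std::uint8_t>& v, const std::uint8_t* buffer, std::size_t count)
{
    const std::size_t old_size = v.size();
    v.resize(old_size + count);                  // single growth step
    std::copy(buffer, buffer + count, v.begin() + old_size);
}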

strncpy equivalent of std::copy

The STL provides std::copy, but it is tricky to use with fixed-size output containers, as there is no bounds checking on the output iterator.
So I invented my own, something like below
template<class InputIterator, class OutputIterator>
void safecopy( InputIterator srcStart, InputIterator srcEnd,
               OutputIterator destStart, OutputIterator destEnd )
{
    while ( srcStart != srcEnd && destStart != destEnd )
    {
        *destStart = *srcStart;
        ++srcStart;
        ++destStart;
    }
}

int main()
{
    std::istream_iterator<char> begin(std::cin), end;
    char buffer[3];
    safecopy( begin, end, buffer, buffer + 3 );
    return 0;
}
Questions:
Am I reinventing the wheel here? Is there an STL algorithm to do what I want?
Are there any deficiencies in my safecopy? Does it work for everything std::copy works for?
Let me promote my comment to an answer, so I have a bit more space.
First off, your implementation looks good.
Now, why isn't this in the standard? (The new standard adds std::copy_n, but that does something different, too.*)
Think about it like this: strncpy isn't really a "good" idea; it's just not a terrible idea. Since C doesn't have any dynamic data structures, a length-checked version is the best you can do.
But in C++ this doesn't fit nicely into the general idea of dynamic containers: You would rarely want to overwrite some elements, but rather create all elements, which you do by std::copy plus std::inserter. strncpy is a crutch which requires you to preallocate the destination data structure, but in C++ we can do a lot better than this. With dynamic containers, iterators and inserters, we can copy anything without needing to worry about allocation.
In other words, any abstract algorithm that you might conceive should have a better, more specific method of obtaining iterators and iterator ranges (think remove/erase); it is rarely the case that the ultimate goal of an algorithm is to only produce an output range that is constrained by some other destination range.
In summary: Yes, you can do that, but you can probably do better.
*) Though copy_n plus min of source and destination size could be used to create a bounded copy.
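That footnote's idea, sketched out (assuming both sizes are known up front):

#include <algorithm>
#include <cstddef>

// A bounded copy built from std::copy_n plus std::min of the two sizes.
template <class InputIt, class OutputIt>
OutputIt bounded_copy(InputIt src, std::size_t src_size,
                      OutputIt dest, std::size_t dest_size)
{
    return std::copy_n(src, std::min(src_size, dest_size), dest);
}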
I would make one minor adjustment to your implementation. Give it a return value. Either the final output iterator, or an integer indicating the number of elements copied.
The main use case I can see for your function would be reading fixed size chunks from an input stream and you don't know when it will end. If it does end, you need some way of knowing that, and you need to know how many elements were copied before it actually ended. If you know how many elements were copied, and it didn't meet or exceed the size of the output range, that's how you can know it ended.
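A sketch of that tweak, returning both final iterators so the caller can compute how many elements were copied from either side:

#include <utility>

// Same loop as the questioner's safecopy, but with a return value.
template <class InputIterator, class OutputIterator>
std::pair<InputIterator, OutputIterator>
safecopy(InputIterator srcStart, InputIterator srcEnd,
         OutputIterator destStart, OutputIterator destEnd)
{
    while (srcStart != srcEnd && destStart != destEnd)
    {
        *destStart = *srcStart;
        ++srcStart;
        ++destStart;
    }
    return std::make_pair(srcStart, destStart);  // how far each range got
}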
Yes. You're reinventing the wheel again!
For example, you could use std::copy as:
std::copy(s.begin(), s.begin() + 3 , buffer);
instead of this,
safecopy(s.begin(), s.end() , buffer, buffer + 3);
The usage of std::copy in this way is NOT less safe than your safecopy.
Or even better is std::copy_n which comes with C++11:
std::copy_n(s.begin(), 3, buffer);
This would work even if the iterator is not a random-access iterator.
As for when you use std::vector<char>, you could use its constructor directly as:
std::vector<char> v(s.begin(), s.end());
No need of even std::copy.

Remove an element from the middle of an std::heap

I'm using a priority queue as a scheduler with one extra requirement. I need to be able to cancel scheduled items. This equates to removing an item from the middle of the priority queue.
I can't use std::priority_queue as access to any element other than top is protected.
I'm trying to use the heap functions from <algorithm>, but I'm still missing the piece I need: when I remove an element from the middle of the heap, I want it to rebuild itself efficiently. C++ provides these heap functions:
std::make_heap O(3n)
std::push_heap O(lg(n))
std::pop_heap O(2 lg(n))
I want a new function like std::repair_heap with a big-O < 3n. I'd provide it with location of the hole where the canceled item used to reside and it would properly adjust the heap.
It seems to be a huge oversight not to provide a std::repair_heap function. Am I missing something obvious?
Is there a library that provides an STL-compliant std::repair_heap?
Is there a better data structure for modeling a scheduler?
NOTE:
I'm not using an std::map for a few reasons.
A heap has constant memory overhead.
A heap has awesome cache locality.
I guess you know which element in the heap container (index n) you want to delete.
Set the value v[n] = BIG, where BIG is bigger than any other value in the heap.
Call std::push_heap( v.begin(), v.begin()+n+1 );
Call std::pop_heap( v.begin(), v.end() );
Call v.pop_back();
Done
The operation is O(log n).
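Wrapped up as one function, the steps might look like this (a sketch assuming a max-heap of int, with INT_MAX playing the role of BIG):

#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

void heap_erase(std::vector<int>& v, std::size_t n)   // n: index of the doomed element
{
    v[n] = INT_MAX;                                   // 1. overwrite with BIG
    std::push_heap(v.begin(), v.begin() + n + 1);     // 2. sift it up to the root
    std::pop_heap(v.begin(), v.end());                // 3. move it to the back
    v.pop_back();                                     // 4. discard it
}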
RE: request for proof
First, a qualifier:
This method assumes something about the algorithm used by std::push_heap.
Specifically, it assumes that std::push_heap( v.begin(), v.begin()+n+1 )
will only alter the range [0, n],
and only for those elements which are ancestors of n, i.e., indices in the following set:
A(n) = {n, (n-1)/2, ((n-1)/2-1)/2, ..., 0}.
Here is a typical spec for std::push_heap:
http://www.cplusplus.com/reference/algorithm/push_heap/
"Given a heap range [first,last-1), this function extends the range considered a heap to [first,last) by placing the value in (last-1) into its corresponding location in it."
Does it guarantee to use the "normal heap algorithm" that you read about in textbooks?
You tell me.
Anyway, here is the code which you can run and see, empirically, that it works.
I am using VC 2005.
#include <algorithm>
#include <vector>
#include <iostream>
#include <cstdlib>   // srand, rand

bool is_heap_valid(const std::vector<int> &vin)
{
    std::vector<int> v = vin;
    std::make_heap(v.begin(), v.end());
    return std::equal(vin.begin(), vin.end(), v.begin());
}

int _tmain(int argc, _TCHAR* argv[])   // VC-specific entry point; a plain main() also works
{
    srand(0);
    std::vector<int> v;
    for (int i = 0; i < 100; i++)
    {
        v.push_back( rand() % 0x7fff );
    }
    std::make_heap(v.begin(), v.end());
    bool bfail = false;
    while( v.size() >= 2 )
    {
        int n = v.size()/2;
        v[n] = 0x7fffffff;
        std::push_heap(v.begin(), v.begin()+n+1);
        std::pop_heap(v.begin(), v.end());
        v.resize(v.size()-1);
        if (!is_heap_valid(v))
        {
            std::cout << "heap is not valid" << std::endl;
            bfail = true;
            break;
        }
    }
    if (!bfail)
        std::cout << "success" << std::endl;
    return 0;
}
But I have another problem, which is how to know the index "n" which needs to be deleted. I cannot see how to keep track of that (i.e., know an item's place in the heap) while using std::push_heap and std::pop_heap. I think you need to write your own heap code, and write the index in the heap to an object every time the object is moved in the heap. Sigh.
Unfortunately, the standard is missing this (fairly important) function. With g++, you can use the non-standard function std::__adjust_heap to do this, but there's no easy portable way of doing it -- and __adjust_heap is slightly different in different versions of g++, so you can't even do it portably over g++ versions.
How does your repair_heap() work? Here's my guess:
Say your heap is defined by some iterator range (heapBegin, heapEnd). The element you want to remove is the root of some subtree of the heap, which is defined by some subrange (subHeapBegin, subHeapEnd). Use std::pop_heap(subHeapBegin, subHeapEnd); then, if subHeapEnd != heapEnd, swap the values at *(subHeapEnd-1) and *(heapEnd-1), which should put your deleted item at the end of the heap container. Now you have to percolate the element at *(subHeapEnd-1) up in your subheap. If I haven't missed something, which is possible, then all that remains is to chop the end element off of the heap container.
Before going to the trouble of trying to code that correctly (I've skipped some details like calculating subHeapBegin and subHeapEnd), I'd run some tests to determine if make_heap() really slows you down. Big-O is useful, but it's not the same thing as actual execution time.
It seems to me that removing from the middle of a heap might mean the entire heap has to be rebuilt: The reason there's no repair_heap is because it would have to do the same (big-oh) work as make_heap.
Are you able to do something like put std::pair<bool, Item> in the heap and just invalidate items instead of removing them? Then when they finally get to the top just ignore the item and move along.
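A sketch of that lazy-invalidation idea (all names here are invented for illustration): entries are heap-allocated and shared, so a handle kept by the caller can flag one as cancelled, and pop() discards dead entries when they surface at the top:

#include <memory>
#include <queue>
#include <vector>

struct Entry {
    int priority;
    bool cancelled;
};

struct ByPriority {
    bool operator()(const std::shared_ptr<Entry>& a,
                    const std::shared_ptr<Entry>& b) const
    { return a->priority < b->priority; }
};

class Scheduler {
public:
    std::shared_ptr<Entry> push(int priority)
    {
        auto e = std::make_shared<Entry>(Entry{priority, false});
        queue_.push(e);
        return e;                                 // keep this handle to cancel later
    }
    static void cancel(const std::shared_ptr<Entry>& e) { e->cancelled = true; }
    std::shared_ptr<Entry> pop()                  // returns nullptr when empty
    {
        while (!queue_.empty()) {
            std::shared_ptr<Entry> top = queue_.top();
            queue_.pop();
            if (!top->cancelled) return top;      // skip over cancelled entries
        }
        return nullptr;
    }
private:
    std::priority_queue<std::shared_ptr<Entry>,
                        std::vector<std::shared_ptr<Entry>>,
                        ByPriority> queue_;
};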
You can try std::multiset. It isn't a heap (it's typically implemented as a balanced binary search tree), but it keeps its elements ordered with the same O(log n) insertion cost, and it supports erasing from the middle: find the element, then erase it.
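A sketch of how that might look (using ints as stand-ins for scheduled fire times); note that erase(iterator) removes exactly that one element, even when duplicates exist:

#include <set>

int main()
{
    std::multiset<int> schedule;
    auto handle = schedule.insert(42);   // keep the iterator as a cancellation handle
    schedule.insert(7);
    int next = *schedule.begin();        // earliest item; next == 7 here
    schedule.erase(handle);              // cancel: removes exactly that one element
    (void)next;
}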
Here's a bit of Delphi code I used to remove items from a heap. I don't know this C++ of which you speak and don't have a repair function, but hey..
First the pop, so you get an idea of how the thing works:
function THeap.Pop: HeapItem;
begin
  if fNextIndex > 1 then begin
    Dec(fNextIndex);
    Result := fBuckets[1]; //no zero element
    fBuckets[1] := fBuckets[fNextIndex];
    fBuckets[fNextIndex] := nil;
    FixHeapDown; //this has a param defaulting to
  end
  else
    Result := nil;
end;
Now, by contrast, the deletion:
procedure THeap.Delete(Item: HeapItem);
var
  i: integer;
begin
  for i := 1 to pred(fNextIndex) do
    if Item = fBuckets[i] then begin
      dec(fNextIndex);
      fBuckets[i] := fBuckets[fNextIndex];
      fBuckets[fNextIndex] := nil;
      FixHeapDown(i);
      break;
    end;
end;
It's of course a no-no to even think about doing what we're doing here, but hey, costs do change sometimes and jobs do get canceled.
Enjoy. I hope this helps.