How to prevent an iterator from going out of range? - c++

I am using a vector to map source line numbers to code addresses. It looks like, if the address argument is higher than the highest address in the table, the iterator ends up pointing to the next, non-existing element. For error protection, I want to disallow out-of-range input arguments. Is there a more elegant method than the one I use below?
int findLinenoToAddress(unsigned int A)
{
    if (A > AddressToLineMap[AddressToLineMap.size()-1]->Address)
        A = AddressToLineMap[AddressToLineMap.size()-1]->Address;
    std::vector<AddressToLineRecord*>::const_iterator it;
    for (it = AddressToLineMap.begin(); it != AddressToLineMap.end(); it+=1)
    {
        if ((*it)->Address >= A)
            break;
    }
    return (*it)->Lineno;
}

Indeed, as AndyG commented, your code suggests that the vector is sorted.
Because of this you should really use a binary search algorithm:
https://en.wikipedia.org/wiki/Binary_search_algorithm,
Where can I get a "useful" C++ binary search algorithm?
That linear scan is also why the current code is slow and definitely should not be used.
But, to answer the exact question as asked, the minimal changes to your code could look like this (note the check for emptiness and the immediate returns from the ifs):
int findLinenoToAddress(unsigned int A)
{
    if (AddressToLineMap.empty())
        return 0;
    if (A > AddressToLineMap[AddressToLineMap.size()-1]->Address)
        return AddressToLineMap[AddressToLineMap.size()-1]->Lineno;
    std::vector<AddressToLineRecord*>::const_iterator it;
    for (it = AddressToLineMap.begin(); it != AddressToLineMap.end(); it+=1)
    {
        if ((*it)->Address >= A) break;
    }
    return (*it)->Lineno;
}
The other method is to use a "sentinel":
https://en.wikipedia.org/wiki/Sentinel_node
This method requires you to guarantee that your vector ALWAYS has an additional item at its end with UINT_MAX as its Address (which also means it is never empty).
Then the code could look like this:
int findLinenoToAddress(unsigned int A)
{
    std::vector<AddressToLineRecord*>::const_iterator it;
    for (it = AddressToLineMap.cbegin(); it != AddressToLineMap.cend(); ++it)
    {
        if ((*it)->Address >= A)
            return (*it)->Lineno;
    }
    throw "unreachable code"; // the sentinel guarantees a match
}
This code could be further improved by using find_if (see: Using find_if on a vector of object), but it will be just as slow as the other examples here.
So again - choose binary search instead.
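Since the answer recommends binary search without showing it, here is a minimal sketch of what that could look like with std::lower_bound. The AddressToLineRecord struct definition is an assumption (the question never shows it), and the table is passed as a parameter just to keep the example self-contained:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record type matching the question's usage; the real
// definition isn't shown in the question.
struct AddressToLineRecord {
    unsigned int Address;
    int Lineno;
};

// O(log N) lookup: std::lower_bound finds the first record whose Address
// is >= A, which is exactly the linear loop's break condition.
int findLinenoToAddress(const std::vector<AddressToLineRecord*>& AddressToLineMap,
                        unsigned int A)
{
    if (AddressToLineMap.empty())
        return 0;
    auto it = std::lower_bound(
        AddressToLineMap.begin(), AddressToLineMap.end(), A,
        [](const AddressToLineRecord* rec, unsigned int addr) {
            return rec->Address < addr;
        });
    if (it == AddressToLineMap.end())
        return AddressToLineMap.back()->Lineno; // A is past the last address
    return (*it)->Lineno;
}
```

This keeps the same clamping behavior as the question's version (addresses past the end map to the last record's line number) while doing only O(log N) comparisons.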

Related

faster erase-remove idiom when I don't care about order and don't have duplicates?

I have a vector of objects and want to delete an element by value. However, the value occurs at most once, and I don't care about the order.
Obviously, if such delete-by-values were extremely common, and/or the data set quite big, a vector wouldn't be the best data structure. But let's say I've determined that not to be the case.
To be clear, if my code were C, I'd be happy with the following:
void delete_by_value( int* const piArray, int& n, int iValue ) {
    for ( int i = 0; i < n; i++ ) {
        if ( piArray[ i ] == iValue ) {
            piArray[ i ] = piArray[ --n ];
            return;
        }
    }
}
It seems that the "modern idiom" approach using std::algos and container methods would be:
v.erase(std::remove(v.begin(), v.end(), iValue), v.end());
But that should be far slower: for a random existing element it is n/2 moves and n compares, while my version is 1 move and n/2 compares.
Surely there's a better way to do this in "the modern idiom" than erase-remove-idiom? And if not why not?
Use std::find to replace the loop. Take the replacement value from the predecessor of the end iterator, and use that same iterator to erase the element. As this iterator refers to the last element, the erase is cheap. Bonus: a bool return for success checking, and templating over the element type instead of hard-coding int.
template<typename T>
bool delete_by_value(std::vector<T> &v, T const &del) {
    auto final = v.end();
    auto found = std::find(v.begin(), final, del);
    if (found == final) return false;
    *found = *--final;
    v.erase(final);
    return true;
}
Surely there's a better way to do this in "the modern idiom" than erase-remove-idiom?
There isn't a ready-made function for every niche use case in the standard library. Unstable remove is one of the functions that is not provided; it has been proposed (p0041r0) a while back, though. Likewise, there are no special versions of algorithms for the special case of vectors that contain no duplicates.
So, you'll need to implement the algorithm yourself if you want an optimal one. There is std::find for the linear search. After that, you only need to assign from the last element and finally pop it off.
Most implementations of std::vector::resize will not reallocate if you make the size of the vector smaller. So, the following will probably have similar performance to the C example.
void find_and_delete(std::vector<int>& v, int value) {
    auto it = std::find(v.begin(), v.end(), value);
    if (it != v.end()) {
        *it = v.back();
        v.resize(v.size() - 1);
    }
}
The C++ way would be mostly identical with std::vector:
template <typename T>
void delete_by_value(std::vector<T>& v, const T& value) {
    auto it = std::find(v.begin(), v.end(), value);
    if (it != v.end()) {
        *it = std::move(v.back());
        v.pop_back();
    }
}
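To make the trade-off concrete, here is a small side-by-side sketch of the two idioms discussed above: the stable erase-remove, which shifts the whole tail, and the unstable swap-and-pop from the answers, which does a single move but scrambles the order. The function names and data are illustrative:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Stable removal: every element after the match is shifted left; O(n) moves.
void erase_remove(std::vector<int>& v, int value) {
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}

// Unstable removal (swap-and-pop): the last element overwrites the match;
// one move, but the relative order of elements is not preserved.
void swap_and_pop(std::vector<int>& v, int value) {
    auto it = std::find(v.begin(), v.end(), value);
    if (it != v.end()) {
        *it = std::move(v.back());
        v.pop_back();
    }
}
```

Deleting 2 from {1, 2, 3, 4, 5} leaves {1, 3, 4, 5} with erase_remove but {1, 5, 3, 4} with swap_and_pop, which is exactly why the latter is only valid when, as the question states, order does not matter.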

c++ – Disappearing Variable

I am writing a function to take the intersection of two sorted vector<size_t>s named a and b. The function iterates through both vectors removing anything from a that is not also in b so that whatever remains in a is the intersection of the two. Code here:
void intersect(vector<size_t> &a, vector<size_t> &b) {
    vector<size_t>::iterator aItr = a.begin();
    vector<size_t>::iterator bItr = b.begin();
    vector<size_t>::iterator aEnd = a.end();
    vector<size_t>::iterator bEnd = b.end();
    while (aItr != aEnd) {
        while (*bItr < *aItr) {
            bItr++;
            if (bItr == bEnd) {
                a.erase(aItr, aEnd);
                return;
            }
        }
        if (*aItr == *bItr) aItr++;
        else aItr = a.erase(aItr, aItr+1);
    }
}
I am getting a very strange bug. I am stepping through with the debugger, and once it passes line 8, "while(*bItr < *aItr)", b seems to disappear. The debugger seems not to know that b even exists! When b comes back into existence after the loop returns to the top, it has taken on the values of a!
This is the kind of behavior I would expect from a dynamic memory error, but as you can see I am not managing any dynamic memory here. I am very confused and could really use some help.
Thanks in advance!
Well, perhaps you should first address a major issue with your code: iterator invalidation.
See: Iterator invalidation rules here on StackOverflow.
When you erase an element from a vector, iterators into that vector at the point of deletion and beyond are no longer guaranteed to be valid. Your code, though, assumes such validity for aEnd (thanks #SidS).
I would guess either this is the reason for what you're seeing, or maybe it's your compiler's optimization flags, which can change the execution flow, the lifetimes of variables that are no longer needed, etc.
Plus, as #KT. notes, your erases can be really expensive, making your algorithm potentially quadratic-time in the length of a.
You are assuming that b contains at least one element. To address that, you can add this before your first loop:
if (bItr == bEnd)
{
    a.clear();
    return;
}
Also, since you're erasing elements from a, aEnd will become invalid. Replace every use of aEnd with a.end().
std::set_intersection could do all of this for you :
void intersect(vector<size_t> &a, const vector<size_t> &b)
{
    auto it = set_intersection(a.begin(), a.end(), b.begin(), b.end(), a.begin());
    a.erase(it, a.end());
}
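One caveat worth knowing: the standard formally requires std::set_intersection's output range not to overlap either input, so writing through a.begin() as above is technically outside its guarantees (even though the output position never outruns the first input's read position in practice). A sketch of a variant that stays strictly within the requirements by writing to a fresh vector:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Intersection of two sorted vectors into a separate output vector,
// avoiding the overlapping-range question entirely.
std::vector<std::size_t> intersection(const std::vector<std::size_t>& a,
                                      const std::vector<std::size_t>& b)
{
    std::vector<std::size_t> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```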

std::lower_bound with skipping invalid elements

I have a list of file names, each representing a point in time. The list typically has thousands of elements. Given a time point, I'd like to convert these file names into time objects (I'm using boost::ptime), and then find the std::lower_bound of this time point with respect to the file names.
Example:
Filenames (with date + time, increasing by one minute per file):
station01_20170612_030405.hdf5
station01_20170612_030505.hdf5
station01_20170612_030605.hdf5
station01_20170612_030705.hdf5
station01_20170612_030805.hdf5
station01_20170612_030905.hdf5
If I have a time-point 2017-06-12 03:06:00, then it fits here:
station01_20170612_030405.hdf5
station01_20170612_030505.hdf5
<--- The lower bound I am looking for is here
station01_20170612_030605.hdf5
station01_20170612_030705.hdf5
station01_20170612_030805.hdf5
station01_20170612_030905.hdf5
So far, everything is simple. Now the problem is that the list of files may be doped with invalid file names, for which the conversion to a time point fails.
Currently, I'm doing this the easy/inefficient way, and I'd like to optimize it, because this program will run on a server and the cost of operation matters. The naive way: create a new list of time points, pushing only the ones that are valid:
vector<ptime> filesListTimePoints;
filesListTimePoints.reserve(filesList.size());
ptime time;
for (long i = 0; i < filesList.size(); i++) {
    ErrorCode error = ConvertToTime(filesList[i], time);
    if (error.errorCode() == SUCCESS)
        filesListTimePoints.push_back(time);
}
//now use std::lower_bound() on filesListTimePoints
You see, the problem is that I'm using a linear solution for a problem that can be solved with O(log(N)) complexity. I don't need to convert all the files, or even look at all of them!
My question: how can I embed this into std::lower_bound so that it retains optimal complexity?
My idea of a possible solution:
On cppreference, there's a basic implementation of std::lower_bound. I'm thinking of modifying that to get a working solution, but I'm not sure what to do when a conversion fails, since this algorithm depends heavily on monotonic behavior. Does this problem have a solution, even mathematically speaking?
Here's the version I'm thinking about initially:
template<class ForwardIt, class T>
ForwardIt lower_bound(ForwardIt first, ForwardIt last, const T& value)
{
    ForwardIt it;
    typename std::iterator_traits<ForwardIt>::difference_type count, step;
    count = std::distance(first, last);
    while (count > 0) {
        it = first;
        step = count / 2;
        std::advance(it, step);
        ptime time;
        ErrorCode error = ConvertToTime(*it, time);
        if (error.errorCode() == SUCCESS)
        {
            if (time < value) {
                first = ++it;
                count -= step + 1;
            }
            else
                count = step;
        }
        else {
            // skip/ignore this point?
        }
    }
    return first;
}
My last-resort solution (which might sound stupid) is to make this method a mutator of the list, erasing the invalid elements. Is there a cleaner solution?
You can simply index by optional<ptime>. If you want to cache the converted values, consider making it a multimap<optional<ptime>, File>.
Better yet, make a datatype representing the file, and calculate the timepoint inside its constructor:
struct File {
    File(std::string fname) : _fname(std::move(fname)), _time(parse_time(_fname)) { }
    std::string _fname;  // declared before _time so it is initialized first
    boost::optional<boost::posix_time::ptime> _time;
    static boost::optional<boost::posix_time::ptime> parse_time(std::string const& fname) {
        // return ptime or boost::none
    }
};
Now, simply define operator< suitably or use e.g. boost::multi_index_container to index by _time
Further notes:
in case it wasn't clear, such a map/set will have its own lower_bound, upper_bound and equal_range operations, and will obviously also work with std::lower_bound and friends.
there's always filter_iterator adaptor: http://www.boost.org/doc/libs/1_64_0/libs/iterator/doc/filter_iterator.html
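A minimal sketch of the optional-key idea above, using std::optional (C++17) in place of boost::optional and plain int timestamps in place of ptime to keep it short. File and find_first_at_or_after are illustrative names, not from the question:

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

struct File {
    std::string name;
    std::optional<int> time; // std::nullopt for unparseable filenames
};

// optional's operator< orders nullopt before every real value, so after one
// sort all invalid files sit at the front and the valid tail stays monotonic;
// lower_bound then works in O(log N) without special-casing failures.
std::vector<File>::const_iterator
find_first_at_or_after(const std::vector<File>& sorted_by_time, int t)
{
    return std::lower_bound(
        sorted_by_time.begin(), sorted_by_time.end(), t,
        [](const File& f, int value) {
            return f.time < std::optional<int>(value);
        });
}
```

The precondition is that the vector is sorted by the optional time (one O(N log N) sort, or kept sorted on insertion); after that, each query touches only O(log N) elements, which is what the question asks for.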

C++ elegant way to mark index which doesn't belong to a vector

I was wondering about a proper and elegant way to mark an index which doesn't belong to a vector/an array. Let me show you a brief example of what I mean (using some pseudocode phrases):
std::vector<**type**> vector;

int getIndex()
{
    if (**user has selected something**)
    {
        return **index of the thing in our vector**;
    }
    else
        return -1;
}

int main()
{
    int selectedItem = getIndex();
    if (selectedItem < vector.size()) // checking if the selected index is valid; -1 is not
    {
        **do something using the selected object**
    }
}
Of course I mean to use it in a much more sophisticated way, but I hope the example shows the problem. Is it a good idea to mark an index which is not in the vector using the -1 constant? It leads to a warning about comparing signed and unsigned values, but it still works as I want it to.
I don't want to additionally check whether my selectedItem variable is -1; that adds one unnecessary condition. So is this a good solution, or should I consider something else?
The most elegant way to indicate that something you're looking for wasn't found in a vector is to use the C++ Standard Library facilities the way they were intended -- with iterators:
std::vector<type>::iterator it = std::find (vec.begin(), vec.end(), something_to_find);
if (it != vec.end())
{
// we found it
}
else
{
// we didn't find it -- it's not there
}
It's better to use iterators, but if you decide to stick with the indices, it's better to make getIndex return size_t as string::find() does:
size_t getIndex()
{
    //...
    return -1; // the same as std::numeric_limits<size_t>::max()
}
This way getIndex(element) < vec.size() if and only if the element is present in vector.
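A minimal sketch of that convention, with a hypothetical index_of helper (not part of the question's code):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: linear search returning an index, with
// size_t(-1) as the "not found" sentinel (the same value std::string::npos uses).
std::size_t index_of(const std::vector<int>& v, int value)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == value)
            return i;
    return static_cast<std::size_t>(-1);
}
```

A single unsigned comparison, index_of(v, x) < v.size(), then covers both "found" and "in range" with no signed/unsigned warning.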
If you insist on using integer indexes instead of iterators, then -1 is the usual sentinel value for "not found". However, instead of comparing against vec.size(), you should compare against 0 (i.e. check selectedItem >= 0) to avoid the signed/unsigned mismatch.
struct SelectableItem { bool selected; /* more stuff here */ };

struct IsSelected {
    bool operator()(const SelectableItem& sel) const { return sel.selected; }
};

int main(int argc, char** argv)
{
    std::vector<SelectableItem> vec;
    // populate vector
    auto found = std::find_if(vec.begin(), vec.end(), IsSelected());
    if (found != vec.end())
    {
        SelectableItem& selected_item = *found;
        /* do something */
    }
}
Don't reinvent the wheel.
If you decide to use vec.end() then you can guard yourself against invalidated iterators (e.g. you insert an element in the vector after you have created the iterator) by compiling with -D_GLIBCXX_DEBUG in debug mode.
I would use -1, though, but with the size_t type everywhere. Iterators are error prone, and the ISO standard is ambiguous and diffuse when it comes to the details.

C++ iterators problem

I'm working with iterators in C++ and I'm having some trouble. It says "Debug Assertion Failed" on the expression (this->_Has_container()) at the line interIterator++.
distanceList is a vector< vector< DistanceNode > >. What am I doing wrong?
vector< vector<DistanceNode> >::iterator externIterator = distanceList.begin();
while (externIterator != distanceList.end()) {
    vector<DistanceNode>::iterator interIterator = externIterator->begin();
    while (interIterator != externIterator->end()) {
        if (interIterator->getReference() == tmp) {
            //remove element pointed by interIterator
            externIterator->erase(interIterator);
        } // if
        interIterator++;
    } // while
    externIterator++;
} // while
vector's erase() returns a new iterator to the next element. All iterators to the erased element and to elements after it become invalidated. Your loop ignores this, however, and continues to use interIterator.
Your code should look something like this:
if (condition)
interIterator = externIterator->erase(interIterator);
else
++interIterator; // (generally better practice to use pre-increment)
You can't remove elements from a sequence container while iterating over it — at least not the way you are doing it — because calling erase invalidates the iterator. You should assign the return value from erase to the iterator and suppress the increment:
while (interIterator != externIterator->end()){
if (interIterator->getReference() == tmp){
interIterator = externIterator->erase(interIterator);
} else {
++interIterator;
}
}
Also, never use post-increment (i++) when pre-increment (++i) will do.
I'll take the liberty to rewrite the code:
class ByReference: public std::unary_function<DistanceNode, bool>
{
public:
    explicit ByReference(const Reference& r): mReference(r) {}
    bool operator()(const DistanceNode& node) const
    {
        return node.getReference() == mReference;
    }
private:
    Reference mReference;
};
typedef std::vector< std::vector< DistanceNode > >::iterator iterator_t;

for (iterator_t it = dl.begin(), end = dl.end(); it != end; ++it)
{
    it->erase(
        std::remove_if(it->begin(), it->end(), ByReference(tmp)),
        it->end()
    );
}
Why?
The first loop (externIterator) iterates over a full range of elements without ever modifying the range itself; that is exactly what a for loop is for, and this way you won't forget to increment (admittedly a for_each would be better, but the syntax can be awkward).
The second loop is tricky: simply speaking, you're cutting off the branch you're sitting on when you call erase, which requires jumping around (using the returned iterator). In this case the operation you want to accomplish (purging the list according to a certain criterion) is exactly what the erase-remove idiom is tailored for.
Note that the code could be tidied up if we had true lambda support at our disposal. In C++0x we would write:
std::for_each(distanceList.begin(), distanceList.end(),
    [&tmp](std::vector<DistanceNode>& vec)
    {
        vec.erase(
            std::remove_if(vec.begin(), vec.end(),
                [&tmp](const DistanceNode& dn) { return dn.getReference() == tmp; }
            ),
            vec.end()
        );
    }
);
As you can see, we don't see any iterator incrementing / dereferencing taking place any longer, it's all wrapped in dedicated algorithms which ensure that everything is handled appropriately.
I'll grant you the syntax looks strange, but I guess it's because we are not used to it yet.