C++ multiset iterator sorting

I have a multiset mymulti whose elements are sorted by a class member m_a.
For all sorted elements, I then want to check whether the difference in m_a between neighbouring elements of mymulti is less than a given threshold, say 0.001. If it is, I want to prefer the element with the smaller value of another class member, m_b.
This is where I am stuck: I have no experience with multisets or iterators, and I don't know how to compare iterators from two iterations. If you can provide correct code for what I want to do, I will be very grateful!
My attempt, which is not much, just the concept:
//all before I got stuck
for(it = mymulti.begin(); it!= mymulti.end(); ++it) //or it++?
if( (it+1)->mymulti.m_a - (it)->mymulti.m_a < 0.001)
if ((it+1)->mymulti.m_b < (it)->mymulti.m_b)
//swap them. but how to swap two fields in a multiset, not two multisets?
// otherwise do nothing

You cannot (or if you can, depending on your STL implementation, should not) modify items once they have been inserted into a multiset, as it could violate the provided ordering of the items in the multiset. So swapping would be a bad idea, even if you could do it.
See https://stackoverflow.com/a/2038534/713961 and http://www.cplusplus.com/reference/set/multiset/
If you would like to remove items, use multiset::erase, which takes an iterator. I believe the standard practice for "modifying" an item in a multiset is to remove it, then insert the modified version.
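A minimal sketch of the erase-then-insert pattern, in case it helps. The Item struct, the ByA comparator and the helper replace_a are hypothetical names made up for illustration; only the members m_a and m_b come from the question.

```cpp
#include <set>

// Hypothetical element type; only the member names m_a and m_b come from the question.
struct Item {
    double m_a;
    int m_b;
};

// Hypothetical comparator: orders items by m_a only, as the question describes.
struct ByA {
    bool operator()(const Item& lhs, const Item& rhs) const {
        return lhs.m_a < rhs.m_a;
    }
};

typedef std::multiset<Item, ByA> ItemSet;

// "Modifies" the element at pos by replacing it: copy, erase, change, re-insert.
// Returns an iterator to the re-inserted element.
ItemSet::iterator replace_a(ItemSet& items, ItemSet::iterator pos, double new_a) {
    Item copy = *pos;           // elements inside the multiset must not be changed in place
    items.erase(pos);           // invalidates only pos
    copy.m_a = new_a;           // change the key on the copy
    return items.insert(copy);  // the multiset places it at the correct sorted position
}
```

Because the element is re-inserted, the container's ordering is never violated, which is the point of the advice above.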
As a side note, I noticed you're checking if two floating point numbers are close enough in value by using a fixed epsilon (0.001). As explained in this article, this only works if all the floats you are comparing are sufficiently small. See the article for a comparison that works equally well for large and small floating-point values.
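For illustration, here is one common way to write a comparison that scales with the magnitude of the values. It is only a sketch and not necessarily the exact method the article recommends; the function name nearly_equal and both default tolerances are assumptions.

```cpp
#include <algorithm>
#include <cmath>

// Returns true if a and b differ by at most rel_eps relative to the larger magnitude.
// abs_eps is an absolute floor for values near zero, where a purely relative test breaks down.
// Both default tolerances are illustrative assumptions, not recommended values.
bool nearly_equal(double a, double b, double rel_eps = 1e-9, double abs_eps = 1e-12) {
    double diff = std::fabs(a - b);
    if (diff <= abs_eps) {
        return true;
    }
    double largest = std::max(std::fabs(a), std::fabs(b));
    return diff <= largest * rel_eps;
}
```

With a fixed epsilon of 0.001, two values around 1e12 that differ by 1.0 would be reported as different, while a relative test like this one treats them as close.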

Related

Efficient way to search and find elements in one vector from the other vector?

I have two std::vector<long>s. One has 5 elements and the other has 100 elements. I want to compare elements from the smaller vector with the larger vector, and if an element is not found in the larger vector I want to push it back into the larger vector. I have my code here:
vector<long> Matvec, MatTempVec;
// assume Matvec has 5 elements and MatTempVec has 100 elements
vector<long>::iterator It1;
for (auto it = Matvec.begin(); it != Matvec.end(); ++it)
{
    // look up the value *it, not the iterator itself
    It1 = find(MatTempVec.begin(), MatTempVec.end(), *it);
    if (It1 == MatTempVec.end()) // not found in the larger vector
        MatTempVec.push_back(*it);
}
Please suggest a more efficient way to do this search than what I have done above.
First, I hope you are either asking this hypothetically, to find the "best algorithm" for the problem, or talking about much larger data sets.
For the amount of data you have, it is not worth thinking about optimizations like this.
In answer to your question:
This really depends on how many constraints you have on your vectors.
If you know they are sorted, this is easy to solve with one iteration over the two vectors.
If you know they are unique, you probably want to use set.
If you know nothing, you may be tempted to still use a set, just as a temporary data structure, for faster lookup. This may or may not be faster in the real world, due to locality.
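The "temporary set" option could look roughly like this. The function name append_missing is hypothetical, and std::unordered_set is my choice here; std::set would work the same way with O(log n) lookups.

```cpp
#include <unordered_set>
#include <vector>

// Appends to large every value of small_vec that large does not already contain.
// A temporary std::unordered_set gives average O(1) lookups instead of a linear find.
void append_missing(const std::vector<long>& small_vec, std::vector<long>& large) {
    std::unordered_set<long> seen(large.begin(), large.end());
    for (long value : small_vec) {
        // insert().second is true only if value was not in the set yet
        if (seen.insert(value).second) {
            large.push_back(value);
        }
    }
}
```

Whether this beats the plain std::find loop for 5 and 100 elements is something you would have to measure; building the set has its own cost.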
As others have commented, I think you're using the wrong tool for the job. It would be better to use a structure that supported uniqueness inherently, like a std::set. Using a vector means the complexity of checking for the existence of a value in your list is O(n) (linearly proportional to the size of the list), while using a std::set would get you O(log(n)) complexity - which is much better - as std::sets are usually based on red/black trees.
If you really insist on doing it with vectors, and they're not sorted, then you're in the worst of all worlds and you'll end up doing a "Cartesian-Product Join" where the number of comparisons you're doing is the product of the number of rows in each set (i.e. 5x100 = 500 in this case). When the vectors are small, that may be acceptable, but as they grow it will quickly kill your performance.
So, one way out of this is to:
Sort your vectors
Perform a sort merge join on the result.
However, be careful in your choice of sorting algorithm too, as sorting can also be expensive. Ideally, store the sorted result and maintain the vectors in sorted order; if you're re-sorting all the time, that will also kill performance.
(Or, go back to the top of this answer and reconsider your decision to stick with a vector...)
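A sketch of the sort-then-merge idea using standard algorithms, under the assumption that it is acceptable to reorder the larger vector. The function name merge_missing is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Sorts both inputs, then appends to large the values of small_vec that large lacks.
// small_vec is taken by value so the caller's copy is left untouched.
void merge_missing(std::vector<long> small_vec, std::vector<long>& large) {
    std::sort(small_vec.begin(), small_vec.end());
    // Drop repeats so each candidate value is considered once.
    small_vec.erase(std::unique(small_vec.begin(), small_vec.end()), small_vec.end());
    std::sort(large.begin(), large.end());

    // One linear pass over both sorted ranges collects values found only in small_vec.
    std::vector<long> missing;
    std::set_difference(small_vec.begin(), small_vec.end(),
                        large.begin(), large.end(),
                        std::back_inserter(missing));

    // Append the new values, then merge the two sorted halves so large stays sorted.
    auto mid = large.insert(large.end(), missing.begin(), missing.end());
    std::inplace_merge(large.begin(), mid, large.end());
}
```

If large is kept sorted between calls, the std::sort on it becomes unnecessary, which is the "maintain the vectors in sorted order" advice above.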

Finding the number of elements less than or equal to k in a multiset

I have a multiset, implemented as follows:
#include <bits/stdc++.h>
using namespace std;
multiset <int> M;
int numunder(int k){
/*this function must return the number of elements smaller than or equal to k
in M (taking multiplicity into account).
*/
}
At first I thought I could just return M.upper_bound(k)-M.begin()+1. Unfortunately it seems we cannot subtract pointers like that. We ended up having to implement an AVLNodes structure. Is there a way to get this to work taking advantage of the c++ std?
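Sticking closely to your proposed M.upper_bound(k)-M.begin()+1 solution (which does not compile, because the multiset iterator is a bidirectional iterator that does not implement operator-), you could use std::distance to get the distance between two multiset iterators. The +1 has to be dropped, though: upper_bound(k) already points one past the last element that is not greater than k, so std::distance(M.begin(), M.upper_bound(k)) is exactly the number of elements less than or equal to k.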
Note that this solution will have O(n) complexity, because if the iterator is not a random access iterator, std::distance will just increment the iterator passed in as first parameter, until it finds the iterator passed in as second argument.
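Spelled out, the std::distance version might look like this. As an assumption for the sketch, the multiset is passed as a parameter instead of being the global M from the question.

```cpp
#include <iterator>
#include <set>

// Number of elements <= k, counting repeats. Linear in the number of elements counted.
int numunder(const std::multiset<int>& m, int k) {
    // upper_bound(k) points at the first element greater than k,
    // so its distance from begin() is already the count; no +1 is needed.
    return static_cast<int>(std::distance(m.begin(), m.upper_bound(k)));
}
```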
I also don't really think that this problem can be solved in better than O(n) complexity with std::multiset.
This can be solved using the policy-based data structures available in GCC. You can use the red-black tree with order statistics; here is a discussion
GCC implements multisets as red-black trees. In a binary tree there is no efficient way to get the "sorted index" of a node without storing extra information in the node, such as the number of nodes in its subtree.
Also note that advancing the iterators returned by find, upper_bound, etc. walks the tree, because the iterators are not random access. See https://en.cppreference.com/w/cpp/container/multiset
If you want to only use built-in data structures you could maintain a separate vector that you can perform binary search on. This is more organizational work but if you are only inserting or erasing then it is pretty simple. Anything more complicated probably warrants its own data structure.
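A rough sketch of the sorted-vector idea. The class name SortedBag and its member functions are hypothetical; insertion and erasure are O(n) because elements have to be shifted, but the count is O(log n).

```cpp
#include <algorithm>
#include <vector>

// A std::vector<int> kept sorted at all times, used in place of a multiset.
class SortedBag {
public:
    void insert(int value) {
        // Inserting at upper_bound keeps the vector sorted and equal values adjacent.
        data_.insert(std::upper_bound(data_.begin(), data_.end(), value), value);
    }

    // Erases one occurrence of value; returns false if it is not present.
    bool erase_one(int value) {
        std::vector<int>::iterator pos =
            std::lower_bound(data_.begin(), data_.end(), value);
        if (pos == data_.end() || *pos != value) {
            return false;
        }
        data_.erase(pos);
        return true;
    }

    // Number of elements <= k. Iterator subtraction works because
    // vector iterators are random access.
    int count_at_most(int k) const {
        return static_cast<int>(
            std::upper_bound(data_.begin(), data_.end(), k) - data_.begin());
    }

private:
    std::vector<int> data_;
};
```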

Remove duplicated members from vector while maintaining order [duplicate]

This question already has answers here:
How to make elements of vector unique? (remove non adjacent duplicates)
I know this question has been asked a lot, but I could not find the best (the most efficient) way to remove duplicated members (type double) from vector while maintaining 1 copy and order of the original vector.
If your data was not doubles, simply doing a pass with an unordered_set keeping track of what you have already seen via the remove_if-erase idiom would work.
doubles are, however, bad news when checking for equality: two derivations that you may think should produce the same value may generate different results. A set would allow looking for nearby values. Instead of find, use lower_bound with the value minus epsilon and check whether the result is no greater than the value plus epsilon; that tells you whether a value approximately equal to yours appeared earlier in the vector. Then use the same remove-erase idiom.
The remove erase idiom looks like:
vec.erase( std::remove_if( vec.begin(), vec.end(), [&](double x)->bool{
  // return true if you want to remove the double x
}), vec.end());
In C++03 it cannot be done inline, because C++03 has no lambdas; the predicate has to be a named function or function object.
The lambda above will be called for each element in order, like the body of a loop.
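Putting the pieces together for doubles, one possible sketch is below. The function name remove_near_duplicates and the eps parameter are assumptions, and the code relies on the predicate being applied once per element in order, as described above.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Removes every value that lies within eps of a value kept earlier, preserving order.
void remove_near_duplicates(std::vector<double>& vec, double eps) {
    std::set<double> seen;  // values kept so far
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [&](double x) -> bool {
                                 // First kept value that is >= x - eps, if any.
                                 std::set<double>::iterator it = seen.lower_bound(x - eps);
                                 if (it != seen.end() && *it <= x + eps) {
                                     return true;  // a close enough value was already kept
                                 }
                                 seen.insert(x);
                                 return false;
                             }),
              vec.end());
}
```

Note that "approximately equal" is not transitive, so the result depends on which value of a cluster comes first in the vector.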
If you have to/wish to use a vector* then it might be easiest to trap duplicates on insertion - if the point to be inserted is already there, bin it.
Another approach for a really large collection would be to do a sort and search for duplicates after every N insertions, where N is the perfect number of insertions to wait before doing a sort and search for duplicates. (Calculating N is left as an exercise for the reader.)
Your approach, and the value of N if relevant, depends on the number of elements, how often the array is changed, how often the contents are examined, and the likelihood of duplicates occurring.
(*apparently, vectors are great as their disadvantages lie where modern computers tend to kick butt so hard it doesn't matter, and are blisteringly fast with linear searches. At least I think that's what Bjarne's saying here comparing a vector to a linked list.)

C++ Set: No match for - operator

I have a set, namely of type multiset, and I'm trying to use the upper_bound function to find the index of the element that the returned iterator points to. With vectors, this usually works if I take the iterator and subtract vector.begin() from it.
However, when I try this with a set it gives an STL error saying "no match for 'operator-' in ..." (omitting STL details).
Is there a fundamental reason for this (sets being implemented as RB-trees and all)? If so, can anyone suggest an alternative? (I'm trying to solve a question on a programming site.)
Thanks!
Yes, there are different types of iterators and operator- is not supported for set iterators which are not random access.
You can use std::distance( mySet.begin(), iter );
For std::set (and multiset) this is an O(N) operation, because their iterators are bidirectional and std::distance has to step from one iterator to the other. It is constant time for vector and also linear for list.
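To show that the same std::distance call works for both kinds of container, here is a small sketch. The helper names index_of, rank_in_set and rank_in_vector are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Index of pos within c. std::distance uses subtraction for random-access
// iterators (vector) and step-by-step counting for bidirectional ones (set).
template <typename Container>
typename Container::difference_type index_of(const Container& c,
                                             typename Container::const_iterator pos) {
    return std::distance(c.begin(), pos);
}

// Index of the first element greater than value in a multiset; linear overall.
std::ptrdiff_t rank_in_set(const std::multiset<int>& s, int value) {
    return index_of(s, s.upper_bound(value));
}

// The same question for a sorted vector; logarithmic search, constant-time distance.
std::ptrdiff_t rank_in_vector(const std::vector<int>& v, int value) {
    return index_of(v, std::upper_bound(v.begin(), v.end(), value));
}
```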
Are you sure you want to be storing your data in a std::multiset? You could use a sorted vector instead. Where the vector would be slower is if it is regularly edited, i.e. you are trying to insert and remove elements from anywhere, whilst retaining its sorted state.
If the data is built once then accessed many times, a sorted vector can sometimes be more efficient.
If the data set is very large, consider using std::deque rather than std::vector because deque is more scalable in not requiring a contiguous memory block.

Complexity of STL max_element

So according to the link here: http://www.cplusplus.com/reference/algorithm/max_element/ , the max_element function is O(n), apparently for all STL containers. Is this correct? Shouldn't it be O(log n) for a set (implemented as a binary tree)?
On a somewhat related note, I've always used cplusplus.com for questions which are easier to answer, but I would be curious what others think of the site.
It's linear because it touches every element.
It's pointless to even use it on a set or other ordered container using the same comparator because you can just use .rbegin() in constant time.
If you're not using the same comparison function there's no guarantee that the orders will coincide so, again, it has to touch every element and has to be at least linear.
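As a small illustration for a set ordered with the default comparator, both of the following return the same value, but only the second avoids touching every element. The function names are hypothetical, and both assume the set is not empty.

```cpp
#include <algorithm>
#include <set>

// Linear: max_element visits every element of the range.
int max_by_scan(const std::set<int>& s) {
    return *std::max_element(s.begin(), s.end());
}

// Constant time: the last element of a set ordered by std::less is the largest.
int max_by_order(const std::set<int>& s) {
    return *s.rbegin();
}
```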
Although algorithms may be specialized for different iterator categories, there is no way to specialize them based on whether an iterator range is ordered.
Most algorithms work on unordered ranges (max_element included), a few require the ranges to be ordered (e.g. set_union, set_intersection), and some require other properties of the range (e.g. push_heap, pop_heap).
The max_element function is O(n) for all STL containers.
This is incorrect, because max_element applies to iterators, not containers. Should you give it iterators from a set, it has no way of knowing they come from a set and will therefore traverse all of them in order looking for the maximum. So the correct sentence is:
The max_element function is O(n) for all forward iterators
Besides, if you know that you're manipulating a set, you already have access to methods that give you the max element faster than O(n), so why use max_element ?
It is an STL algorithm, so it does not know anything about the container. A linear search is therefore the best it can do on a pair of forward iterators.
STL algorithms do not know what container you took the iterators from, whether or not it is ordered and what order constraints were used. It is a linear algorithm that checks all elements in the range while keeping track of the maximum value seen so far.
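The linear scan described here can be sketched as follows. This is an illustrative reimplementation with a hypothetical name, linear_max, not the code of any particular standard library.

```cpp
// Returns an iterator to the largest element in [first, last), or last if the range is empty.
// It needs nothing beyond forward iteration and operator<, so it must visit every element.
template <typename ForwardIt>
ForwardIt linear_max(ForwardIt first, ForwardIt last) {
    if (first == last) {
        return last;
    }
    ForwardIt best = first;  // largest element seen so far
    for (++first; first != last; ++first) {
        if (*best < *first) {
            best = first;
        }
    }
    return best;
}
```

Nothing in the signature tells the function where the iterators came from, which is why it cannot take a shortcut for a set.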
Note that even if you could use metaprogramming techniques to detect what type of container the iterators were obtained from, that is no guarantee that you can just skip to the last element to obtain the maximum:
int values[] = { 1, 2, 3, 4, 5 };
std::set<int, std::greater<int> > the_set( values, values+5 );
std::max_element( the_set.begin(), the_set.end() ); //??
Even though the iterators come from a set, it is the first element, not the last, that holds the maximum. With more complex data types the set can be ordered by some other key that is unrelated to the min/max values.