Existence map in C++ - c++

I want something like an std::map, but I only want to see if the item exists or not, I don't actually need a key AND a value. What should I use?

Looks like you need a std::set.

If you want the same type of behavior as std::map, then you want std::set.
If you are mixing insert/delete and query operations, then std::set is probably the best choice. However, if you can populate the set first and then follow it with the queries, it might be worth looking at using std::vector, sorting it, and then using a binary search to check for existence in the vector.

If you really need existence only, and not even an order, you need an unordered_set. It is available from your favorite C++0x vendor or boost.org.

If your data is numerical, you can use a std::vector<bool>, which is optimized for space:
D:\Temp>type vectorbool.cpp
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<bool> vb(10);
vb[5] = true;
for (vector<bool>::const_iterator ci = vb.begin(); ci != vb.end(); ++ci) {
cout << *ci << endl;
}
}
D:\Temp>cl /nologo /W4 /EHsc vectorbool.cpp
vectorbool.cpp
D:\Temp>vectorbool.exe
0
0
0
0
0
1
0
0
0
0

You should probably look at std::set for what you need. A std::bitset is another option.
Which of these is better depends on how you need to use the information. A set is a sorted data structure; insertion, find, and deletion take O(log N) time. If you need to iterate over all the values that you have marked for "existence", then the set is the way to go.
If you only need to mark and look up the fact that something is a member of a set, then the bitset might be better for you. Insertion, find, and delete take only O(1), but you can only store integer values (indexes into a fixed-size bitset). Iterating over all the marked values takes O(N), as you need to go through the whole bitset to find the members that are set to true. You can use it in concert with a std::map to map from the values you have to the numerical values the bitset needs.
Look at the operations that you need to perform with the values in your set and you should be able to choose the appropriate data structure.

You can keep using std::map for the desired purpose.
To check if a particular item (of key type) exists in the map or not, you can use following code:
if (mapObj.count(item) != 0)
{
// item exists
}
As answered earlier, std::set will do the job as well. Interestingly, both set and map are typically implemented as trees internally.

If the key IS the value, then you might also consider a "bloom filter" rather than a set.

Related

C++: insert into std::map without knowing a key

I need to insert values into a std::map (or its equivalent) at any free position and then get its key (to remove or modify it later). Something like:
std::map<int, std::string> myMap;
const int key = myMap.insert("hello");
Is it possible to do so with std::map, or is there some appropriate container for that?
Thank you.
In addition to using a set, you can keep a list of allocated (or free)
keys, and find a new key before inserting. For a map indexed by
int, you can simply take the last element, and increment its key. But
I rather think I'd go with a simple std::vector; if deletion isn't
supported, you can do something simple like:
int key = myVector.size();
myVector.push_back( newEntry );
If you need to support deletions, then using a vector of some sort of
"maybe" type (boost::optional, etc.—you probably already have
one in your toolbox, maybe under the name of Fallible or Maybe) might be
appropriate. Depending on use patterns (number of deletions compared to
total entries, etc.), you may want to search the vector in order to
reuse entries. If you're really ambitious, you could keep a bitmap of the
free entries, setting a bit each time you delete an entry, and
resetting it whenever you reuse the space.
You can add objects to a std::set, and then later put the whole set into a map. But no, you can't put a value into a map without a key.
The closest thing to what you're trying to do is probably
myMap[myMap.size()] = "some string";
The only advantage this has over std::set is that you can pass the integer indexes around to other modules without them needing to know the type of std::set<Foo>::iterator or similar.
It is impossible. Such an operation would require intricate knowledge of the key type to know which keys are available. For example, std::map would have to increment int values for int maps or append to strings for string maps.
You could use a std::set and drop keying altogether.
If you want to achieve something similar to automatically generated primary keys in SQL databases, then you can maintain a counter and use it to generate a unique key. But perhaps std::set is what you really need.

getting a C++ std::set's members by index

Is there a way to use one of the STL algorithms defined in <algorithm> to get a member of a set using its index position in the set?
I could use a utility method like the one below, but I've got to think this exists already in some generic form in the STL:
ElementPtr elementAt(int elementNumber)
{
list<ElementPtr>::iterator elementIt = elements.begin();
for (int counter = 0; counter < elementNumber && elementIt != elements.end(); counter++, elementIt++)
{
}
return *elementIt;
}
#include <iterator>
list<ElementPtr>::iterator elementIt = elements.begin();
std::advance(elementIt, elementNumber);
x = *elementIt;
Which does essentially what your code does.
But the fact that you want to do this most likely indicates that your data structures are wrong. Sets are not designed to be processed like this.
There isn't a usable index mechanism if it's implemented as a binary tree or a hash table, both of which are common for sets.
Are you actually using the right container type? Consider using a sorted vector instead.
You could do this using Boost.MultiIndex to build both ordering and random access indices on the same underlying data.
I don't believe so, as "index-of" doesn't really make sense in terms of a generalized std::set. Unless your set is constructed (and initialized) once and never changed, then you cannot guarantee that the results of calls to the index-of operator would always return a predictable result.
The best you are going to get is an iterator. Sets are containers where the value is the index (well, more of a reference in a hash table). Maybe we could better answer your question if we knew what you were trying to do.
I think you are equating a set to an array; they are structured quite differently, a numerical index does not apply.
You say set, but your code actually indicates list. The two are not the same. Sets are designed to have their elements retrieved by their value. Lists, you can just advance along them using std::advance.
There is no such thing as a numerical index into a set. You need to use a vector instead. And what's more, if you do happen to "get the nth item" in the set, it is not guaranteed it will be there (in the same place) after the set is modified.

How to associate to a number another number without using array

Let's say we have read these values:
3
1241
124515
5322353
341
43262267234
1241
1241
3213131
And I have an array like this (with the elements above):
a[0]=1241
a[1]=124515
a[2]=43262267234
a[3]=3
...
The thing is that the elements' order in the array is not constant (I have to change it somewhere else in my program).
How can I find out at which positions an element appears in the input that was read?
Note that I can not do:
vector <int> a[1000000000000];
a[number].push_back(all_positions);
Because a will be too large (there's a memory restriction). (Let's say I have only 3000 elements, but their values are from 0 to 2^32.)
So, in the example above, I would want to know all the positions 1241 is appearing on without iterating again through all the read elements.
In other words, how can I associate to the number "1241" the positions "1,6,7" so I can simply access them in O(1) (where 1 actually is the number of positions the element appears)
If there's no O(1) I want to know what's the optimal one ...
I don't know if I've made myself clear. If not, just say it and I'll update my question :)
You need to use some sort of dynamic array, like a vector (std::vector) or other similar containers (std::list, maybe, it depends on your needs).
Such data structures are safer and easier to use than C-style array, since they take care of memory management.
If you also need to look for an element in O(1) you should consider using some structures that will associate both an index to an item and an item to an index. I don't think STL provides any, but boost should have something like that.
If O(log n) is a cost you can afford, also consider std::map
You can use what is commonly referred to as a multimap. That is, it stores a key and multiple values. Lookup takes O(log n) time.
If you're working with Visual Studio, it provides its own hash_multimap; otherwise, may I suggest using boost::unordered_map with a list as your value?
You don't need a sparse array of 1000000000000 elements; use an std::map to map positions to values.
If you want bi-directional lookup (that is, you sometimes want "what are the indexes for this value?" and sometimes "what value is at this index?") then you can use a boost::bimap.
Things get further complicated as you have values appearing more than once. You can sacrifice the bi-directional lookup and use a std::multimap.
You could use a map for that. Like:
std::map<int, std::vector<int>> MyMap;
So every time you encounter a value while reading the file, you append its position to the map. Say X is the value you read and Y is the position; then you just do
MyMap[X].push_back( Y );
Instead of your array, use
std::map<int, vector<int> > a;
You need an associative collection, but you might want to associate each key with multiple values.
You can use std::multimap< int, int >
or
you can use std::map< int, std::set< int > >
I have found in practice the latter is easier for removing items if you just need to remove one element. It is unique on key-value combinations but not on key or value alone.
If you need higher performance, then you may wish to use a hash_map instead of map. For the inner collection, though, you will not gain much performance by using a hash, as you will have very few duplicates, and it is better to use std::set.
There are many implementations of hash_map, and it is in the new standard. If you don't have the new standard, go for boost.
It seems you need a std::map<int,int>. You can store the mapping such as 1241->0 124515->1 etc. Then perform a look up on this map to get the array index.
Besides the std::map solution offered by others here (O(log n)), there's the approach of a hash map (implemented as boost::unordered_map or std::unordered_map in C++0x, supported by modern compilers).
It would give you O(1) lookup on average, which often is faster than a tree-based std::map. Try for yourself.
You can use a std::multimap to store both a key (e.g. 1241) and multiple values (e.g. 1, 6 and 7).
An insert has logarithmic complexity, but you can speed it up if you give the insert method a hint where it can insert the item.
For O(1) lookup you could hash the number to find its entry (key) in a hash map (boost::unordered_map, dictionary, stdex::hash_map etc)
The value could be a vector of indices where the number occurs or a 3000 bit array (375 bytes) where the bit number for each respective index where the number (key) occurs is set.
boost::unordered_map<unsigned long, std::vector<unsigned long>> myMap;
for(unsigned long i = 0; i < sizeof(a)/sizeof(*a); ++i)
{
myMap[a[i]].push_back(i);
}
Instead of storing an array of integers, you could store an array of structures, each containing the integer value and all its positions in an array or vector.

Is this usage of unordered map efficient/right way?

I want to learn about mapping functions in c/c++ in general so this is a basic program on unordered mapping. I use unordered mapping because my input data are not sorted and I read that unordered_map is very efficient. Here I've an array with which I'm creating the hash table and use the lookup function to find if the elements in another array are in the hash table or not. I've several questions regarding this implementation:
#include <stdio.h>
#include <unordered_map>
using namespace std;
typedef std::unordered_map<int,int> Mymap;
int main()
{
int x,z,l=0;
int samplearray[5] = {0,6,4,3,8};
int testarray[10] = {6,3,8,67,78,54,64,74,22,77};
Mymap c1;
for ( x=0;x< sizeof(samplearray)/sizeof(int);x++)
c1.insert(Mymap::value_type(samplearray[x], x));
for ( z=0;z< sizeof(testarray)/sizeof(int);z++)
if((c1.find(testarray[z]) != c1.end()) == true)
l++;
printf("The number of elements equal are : %d\n",l);
printf("the size of samplearray and testarray are : %d\t%d\n",sizeof(samplearray)/sizeof(int),sizeof(testarray)/sizeof(int));
}
First of all, is this the right way to implement it? I'm getting the right answers, but it seems that I use too many for loops.
This seems fairly okay with very small data, but if I'm dealing with files larger than 500MB, it seems that the hash table for a 500MB file will itself be twice as large, about 1000MB. Is this always the case?
What is the difference between std::unordered map and boost::unordered map?
Finally, a small request. I'm new to C/C++ so if you are giving suggestions like using some other typedef/libraries, I'd highly appreciate if you could use a small example or implement it on my code. Thanks
You're starting off on the wrong foot. A map (ordered or otherwise) is intended to store a key along with some associated data. In your case, you're only storing a number (twice, as both the key and the data). For this situation, you want a set (again, ordered or otherwise) instead of a map.
I'd also avoid at least the first for loop, and use std::copy instead:
// There are better ways to do this, but it'll work for now:
#define end(array) ((array) + (sizeof(array)/sizeof((array)[0])))
// std::inserter needs both the container and an insertion position:
std::copy(samplearray,
end(samplearray),
std::inserter(Myset, Myset.end()));
If you only need to count how many items are common between the two sets, your for loop is fairly reasonable. If you need/want to actually know what items are common between them, you might want to consider using std::set_intersection:
std::set<int> myset, test_set, common;
std::copy(samplearray, end(samplearray), std::inserter(myset, myset.end()));
std::copy(testarray, end(testarray), std::inserter(test_set, test_set.end()));
std::set_intersection(myset.begin(), myset.end(),
test_set.begin(), test_set.end(),
std::inserter(common, common.end()));
// show the common elements (including a count):
std::cout << common.size() << " common elements:\t";
std::copy(common.begin(), common.end(),
std::ostream_iterator<int>(std::cout, "\t"));
Note that you don't need to have an actual set to use set_intersection -- all you need is a sorted collection of items, so if you preferred to you could just sort your two arrays, then use set_intersection on them directly. Likewise, the result could go in some other collection (e.g., a vector) if you prefer.
As mentioned by Jerry, you could use a for loop for the search if you only need to know the number of matches. If that is the case, I would recommend using an unordered_set since you don't need the elements to be sorted.

STL sorted set where the conditions of order may change

I have a C++ STL set with a custom ordering defined.
The idea was that when items get added to the set, they're naturally ordered as I want them.
However, what I've just realised is that the ordering predicate can change as time goes by.
Presumably, the items in the set will then no longer be in order.
So two questions really:
Is it harmful that the items would then be out of order? Am I right in saying that the worst that can happen is that new entries may get put into the wrong place (which actually I can live with). Or, could this cause crashes, lost entries etc?
Is there a way to "refresh" the ordering of the set? You can't seem to use std::sort() on a set. The best I can come up with is dumping out the contents to a temp container and re-add them.
Any ideas?
Thanks,
John
set uses the ordering to look up items. If you insert N items according to ordering1 and then insert an item according to ordering2, the set cannot find out whether the item is already in it.
It will violate the class invariant that every item is in there only once.
So it does harm.
The only safe way to do this with the STL is to create a new set with the changed predicate. For example you could do something like this when you needed to sort the set with a new predicate:
std::set<int> newset( oldset.begin(), oldset.end(), NewPred() );
This is actually implementation dependent.
The STL implementation can, and usually will, assume that the predicate used for sorting is stable (otherwise, "sorted" would not be defined). It is at least possible to construct a valid STL implementation that formats your hard drive when you change the behavior of the predicate instance.
So, yes, you need to re-insert the items into a new set.
Alternatively, you could construct your own container, e.g. a vector + sort + lower_bound for binary search. Then you could re-sort when the predicate's behavior changes.
I agree with the other answers, that this is going to break in some strange and hard to debug ways. If you go the refresh route, you only need to do the copy once. Create a tmp set with the new sorting strategy, add each element from the original set to the tmp set, then do
orig.swap(tmp);
This will swap the internals of the sets.
If this were me, I would wrap this up in a new class that handles all of the details, so that you can change implementations as needed. Depending on your access patterns and the number of times the sort order changes, the previously mentioned vector, sort, lower_bound solution may be preferable.
If you can live with an unordered set, then why are you adding them into a set in the first place?
The only case I can think of is where you just want to make sure the list is unique when you add them. If that's the case then you could use a temporary set to protect additions:
if (ts.insert (value).second) {
// insertion took place
realContainer.push_back (value);
}
Alternatively, depending on how frequently you'll be modifying the entries in the set, you can test whether an entry will end up in a different location (by using the set's compare functionality), and if its position would move, remove the old entry and re-add the new one.
As everyone else has pointed out, having the set unordered really smells bad, and I would also guess that it is possibly undefined behaviour according to the standard.
While this doesn't give you exactly what you want, boost::multi_index gives you similar functionality. Due to the way templates work, you will never be able to "change" the ordering predicate for a container; it is set in stone at compile time. The exception is a sorted vector or something similar, where you are the one maintaining the invariant and you can sort it however you want at any given time.
Multi_index however gives you a way to order a set of elements based on multiple ordering predicates at the same time. You can then select views of the container that behave like an std::set ordered by the predicate that you care about at the time.
This can cause lost entries. When searching for an element in a set, the ordering operator is used. This means that if an element was placed to the left of the root and the ordering operator now says it belongs to the right, that element will no longer be found.
Here's a simple test for you:
struct comparer : public std::binary_function<int, int, bool>
{
static enum CompareType {CT_LESS, CT_GREATER} CompareMode;
bool operator()(int lhs, int rhs) const
{
if(CompareMode == CT_LESS)
{
return lhs < rhs;
}
else
{
return lhs > rhs;
}
}
};
comparer::CompareType comparer::CompareMode = comparer::CT_LESS;
typedef std::set<int, comparer> is_compare_t;
void check(const is_compare_t &is, int v)
{
is_compare_t::const_iterator it = is.find(v);
if(it != is.end())
{
std::cout << "HAS " << v << std::endl;
}
else
{
std::cout << "ERROR NO " << v << std::endl;
}
}
int main()
{
is_compare_t is;
is.insert(20);
is.insert(5);
check(is, 5);
comparer::CompareMode = comparer::CT_GREATER;
check(is, 5);
is.insert(27);
check(is, 27);
comparer::CompareMode = comparer::CT_LESS;
check(is, 5);
check(is, 27);
return 0;
}
So, basically if you intend to be able to find the elements you once inserted you should not change the predicate used for insertions and find.
Just a follow up:
While running this code the Visual Studio C debug libraries started throwing exceptions complaining that the "<" operator was invalid.
So, it does seem that changing the sort ordering is a bad thing. Thanks everyone!
1) Harmful - no. Result in crashes - no. The worst is indeed a non-sorted set.
2) "Refreshing" would be the same as re-adding anyway!