Why is a set used instead of a map? C++

Sets are used to get information about an object by providing all of its information, usually to check whether the data exists. A map is used to get information about an object by using a key (a single piece of data). Correct me if I am wrong. Now the question is: why would we need a set in the first place? Can't we use a map to see if the data exists? Why would we need to provide all the information just to see if it exists?

There are many operations where you just need a set. Using a map would be just extra space.
Set operations (Union, Intersection etc.).
Keeping unique elements from a collection of numbers, objects etc.

A set serves to group items of the same type that are different among themselves (i.e., they are not equal). For example, the numbers 1 and 2 are both of int type, but 1!=2.
set containers are useful when you want to keep track of collections of homogeneous things as a group and perform mathematical operations on such groups (intersection, union, difference, etc.). For example, imagine a set of search results containing all the documents mentioning the words cat and dog, and another set containing all the documents mentioning the word pet. The union of those two sets would give you every document that appears in either result, that is, the documents mentioning cat and dog, or pet. Notice that this group will have no repetitions (i.e., if a document was in both sets initially, it will appear only once in the resulting set).
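The search-result example can be sketched roughly like this. DocId, unite and intersect are made-up names for illustration, and documents are assumed to be identified by plain integers:

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Hypothetical document id; any type with operator< would work.
using DocId = int;

// Union: every document that appears in either result set, exactly once.
std::set<DocId> unite(const std::set<DocId>& a, const std::set<DocId>& b) {
    std::set<DocId> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}

// Intersection: only the documents present in both result sets.
std::set<DocId> intersect(const std::set<DocId>& a, const std::set<DocId>& b) {
    std::set<DocId> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}
```

Both algorithms require sorted input ranges, which std::set provides by construction.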
maps are most certainly not a set, but they can be seen as an arrangement which allows you to associate a value with every element of a set. They are used to represent relationships. For example, the set of people working for a company have an associated employee_number; in this case a map would be a useful structure to represent such relationship.
Going back to the previous example, if you wanted to know how many times has each page been accessed, you could probably create a map along the lines of std::map<Page, int>, that is, a relationship between the pages, and the number of times each has been visited.
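A minimal sketch of such a visit counter, assuming a Page can be identified by its URL string (the alias and the function name count_visits are only illustrative):

```cpp
#include <map>
#include <string>
#include <vector>

// Assumption: a page is identified by its URL.
using Page = std::string;

// Counts how many times each page occurs in an access log.
std::map<Page, int> count_visits(const std::vector<Page>& log) {
    std::map<Page, int> visits;
    for (const Page& p : log) {
        ++visits[p];  // operator[] inserts a zero count the first time a page is seen
    }
    return visits;
}
```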
Notice that the keys of a map form a set (probably this is what confuses many people), and an implication of this property is that you can only have a given key once (there are some esoteric containers where a key can be mapped to different values though).
So, if you need to interact with groups and collections as a whole, and with the members of the group itself, you probably want a set. If you need to associate certain things with members of a group or a collection, you probably want a map. If a single key needs to be associated with several values, you probably want a std::multimap.
Note that in C++ std::set and std::map are ordered. C++11 offers alternative unordered containers called std::unordered_set and std::unordered_map.

A set contains unique values (std::set keeps them in sorted order), while the values stored in a map do not have to be unique; they are accessed through a key, and it is the keys that must be unique.
Either could be used to determine if an object exists, it depends on your use case and how you need to be able to access that object - can you test to see if the Set contains an object that you have a reference to, or do you need to look it up by one or more keys to be able to compare it?
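To make the two access patterns concrete, here is a small sketch. Employee, is_known and find_employee are hypothetical names invented for this example:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical record type, used only for illustration.
struct Employee {
    int number;
    std::string name;
};

// With a set you ask: "is this exact value present?"
bool is_known(const std::set<std::string>& names, const std::string& name) {
    return names.count(name) != 0;
}

// With a map you ask: "what is stored under this key?"
// Returns nullptr when the key is absent.
const Employee* find_employee(const std::map<int, Employee>& by_number, int number) {
    auto it = by_number.find(number);
    return it == by_number.end() ? nullptr : &it->second;
}
```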

Related

How would one look for entities with specific components in an entity component system?

In my current implementation I'm storing components in a
std::unordered_map< entity_id, std::unordered_map<type_index, Component *> >.
So if a system needs access to entities with specific components, what is the most efficient way to access them.
I currently have 2 ideas:
Iterate through the map and skip the entities that don't have those components.
Create "mappers" or "views" that hold a pointer to the entity and update them every time a component is assigned to or removed from an entity.
I saw some approaches with bitmasks and such, but that doesn't seem scalable.
Your situation calls for std::unordered_multimap.
The "find" method returns an iterator to an element that matches the key in the multimap. The "equal_range" method returns a pair of iterators: the first points to the first element matching your key, and the second points one past the last matching element.
Actually what unordered_multimap allows you to create is an in-memory key-value database that stores a bunch of objects for the same key.
If your "queries" get more complicated than "give me all entities with component T" and turn into something like "give me all entities that have components T and B at the same time", you would be better off creating a class that has an unordered_multimap as a member and a bunch of utility methods for querying it.
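A rough sketch of such a wrapper class, under a few assumptions: component types are keyed by a string name (standing in for std::type_index), entities are plain integer ids, and ComponentIndex with its methods is a name invented here:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using EntityId = unsigned;          // assumption: entities are integer ids
using ComponentName = std::string;  // stand-in for std::type_index

class ComponentIndex {
public:
    void add(const ComponentName& c, EntityId e) { index_.emplace(c, e); }

    // All entities registered under one component type (order unspecified).
    std::vector<EntityId> with(const ComponentName& c) const {
        std::vector<EntityId> out;
        auto range = index_.equal_range(c);  // [first match, one past last match)
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back(it->second);
        }
        return out;
    }

    // Entities that have both component types a and b.
    std::vector<EntityId> with_both(const ComponentName& a,
                                    const ComponentName& b) const {
        std::vector<EntityId> out;
        for (EntityId e : with(a)) {
            if (has(b, e)) out.push_back(e);
        }
        return out;
    }

private:
    // Linear scan over the entries stored for one component type.
    bool has(const ComponentName& c, EntityId e) const {
        auto range = index_.equal_range(c);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == e) return true;
        }
        return false;
    }

    std::unordered_multimap<ComponentName, EntityId> index_;
};
```

The combined query here is deliberately naive (it rescans the second range for every candidate); it only shows where such utility methods would live.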
More on the subject:
http://www.cplusplus.com/reference/unordered_map/unordered_multimap/equal_range/
unordered_multimap - iterating the result of find() yields elements with different value (somewhat related question - the accepted answer could be helpful)
The way I do it involves storing a back index to the entity from the component (32-bits). It adds a bit of memory overhead but the total memory overhead of a component in mine is around 8 bytes which is usually not too bad for my use cases, and around 4 bytes per entity.
Now when you have a back index to an entity, what you can do when satisfying a query for all entities that have 2 or more component types is to use parallel bit sets.
For example, if you are looking for entities with two component types, Motion, and Sprite, then we start out by iterating through the motion components and set the associated bits for the entities that own them.
Next we iterate through the sprite components and look for the entity bits already set by the pass through motion components. If the entity index appears in both the motion components and the sprite components, then we add the entity to the list of entities that contain both. A diagram of the idea as well as how to multithread it and pool the entity-parallel bit arrays:
That gives you a set intersection in linear time and with a very small m (very, very cheap work per iteration as we're just marking and inspecting a bit -- much, much, much cheaper than a hash table, e.g.). I can actually perform a set intersection between two sets with 100 million elements each in under a second using this technique. As a bonus, with some minor effort, you can make it give you the entities back in sorted order for cache-friendly access patterns if you use the bitset to grab the indices of the entities that belong in 2 or more components.
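The two passes described above could look roughly like this. It is only a sketch under stated assumptions: each component carries a 32-bit back index to its entity, an entity owns at most one component of each type, and std::vector<bool> stands in for the entity-parallel bit set. The struct layouts and the function name are made up for the example:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: each component stores the index of the entity that owns it.
struct Motion { std::uint32_t entity; };
struct Sprite { std::uint32_t entity; };

std::vector<std::uint32_t> entities_with_both(const std::vector<Motion>& motions,
                                              const std::vector<Sprite>& sprites,
                                              std::size_t entity_count) {
    // Pass 1: mark every entity that owns a Motion component.
    std::vector<bool> has_motion(entity_count, false);
    for (const Motion& m : motions) has_motion[m.entity] = true;

    // Pass 2: keep the Sprite owners whose bit was set in pass 1.
    std::vector<std::uint32_t> out;
    for (const Sprite& s : sprites) {
        if (has_motion[s.entity]) out.push_back(s.entity);
    }
    return out;
}
```

The result comes back in the order of the sprite array; scanning the bit set itself instead would yield the entities in sorted index order.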
There are ways to do this in better than linear time (Log(N)/Log(64)) though it gets considerably more involved where you can actually perform a set intersection between two sets containing a hundred million elements each in under a millisecond. Here's a hint:

Hierarchical filtered lookup in C++

I have been pondering a data structure problem for a while, but can't seem to come up with a good solution. I can not shake off the feeling that the solution is simple and I'm just not seeing it, however, so hopefully you guys can help!
Here is the problem: I have a large collection of objects in memory. Each of them has a number of data fields. Some of the data fields, such as an ID, are unique for each objects, but others, such as a name, can appear in multiple objects.
class Object {
    size_t id;
    std::string name;
    Histogram histogram;
    Type type;
    ...
};
I need to organize these objects in a way that will allow me to quickly (even if the number of objects is relatively large, i.e. millions) filter the collection given a specification of an arbitrary number of object members while all members that are left unspecified count as wildcards. For example, if I specify a given name, I want to retrieve all the objects whose name member equals the given name. However, if I then add a histogram to the query, I would like the query to return only the objects that match in both the name and the histogram fields, and so on. So, for example, I'd like a function
std::set<Object*> retrieve(size_t, std::string, Histogram, Type)
that can both do
retrieve(42, WILDCARD, WILDCARD, WILDCARD)
as well as
retrieve(42, WILDCARD, WILDCARD, Type_foo)
where the second call would return fewer or equally as many objects as the first one. Which data structure allows queries like this and can both be constructed and queried in reasonable time for object counts in the millions?
Thanks for the help!
First, you could use Boost Multi-index to implement efficient lookup over different members of your Object. This could help to limit the number of elements to consider. As a second step, you can simply use a lambda expression as the predicate for std::find_if to get the first matching element, or use std::copy_if to copy all matching elements to a target sequence. If you decide to use Boost, you can also use Boost Range with filtering.
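The std::copy_if step might be sketched as follows, without Boost. This is a plain linear scan, not an indexed lookup. The Object here is cut down to two fields, and the Query struct, which uses a null pointer to mean "wildcard", is an assumption made for the example:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Reduced version of the question's class, for illustration only.
struct Object {
    std::size_t id;
    std::string name;
};

// Hypothetical query type: a null pointer means the field is a wildcard.
struct Query {
    const std::size_t* id = nullptr;
    const std::string* name = nullptr;
};

// Copies every object that matches all specified (non-wildcard) fields.
std::vector<Object> retrieve(const std::vector<Object>& all, const Query& q) {
    std::vector<Object> out;
    std::copy_if(all.begin(), all.end(), std::back_inserter(out),
                 [&q](const Object& o) {
                     return (!q.id || o.id == *q.id) &&
                            (!q.name || o.name == *q.name);
                 });
    return out;
}
```

With millions of objects, the idea from the answer is to first narrow the candidates with an index on one field and apply a predicate like this only to what remains.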

STL Map versus Static Array

I have to store information about contents in a lookup table so that it can be accessed very quickly. I might need to refer to some of the elements in the lookup table recursively to get complete information about the contents. Which data structure would be better to use:
Map with one of parameter, which will be unique to all the entries in look up table, as key and rest of the information as value
Use static array for each unique entries and access them when needed according to key(which will be same as the one used in MAP).
I want my software to be robust as if we have any crash it will be catastrophic for my product.
It depends on the range of keys that you have.
Usually, when you say lookup table, you mean a smallish table which you can index directly ( O(1) ). As a dumb example, for a substitution cipher, you could have a char cipher[256] and simply index with the ASCII code of a character to get the substitution character. If the keys are complex objects or simply too many, you're probably stuck with a map.
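A small sketch of the directly indexed table from the cipher example. The substitution chosen here (shift lowercase letters by 3, leave everything else alone) is arbitrary, the function names are invented, and ASCII is assumed:

```cpp
#include <array>
#include <string>

// Builds a 256-entry substitution table: lowercase letters are shifted by 3
// (an arbitrary choice for illustration), every other byte maps to itself.
std::array<unsigned char, 256> make_cipher() {
    std::array<unsigned char, 256> table{};
    for (int c = 0; c < 256; ++c) table[c] = static_cast<unsigned char>(c);
    for (int c = 'a'; c <= 'z'; ++c) {
        table[c] = static_cast<unsigned char>('a' + (c - 'a' + 3) % 26);
    }
    return table;
}

// Each character is translated with a single O(1) array access.
std::string encode(const std::string& text,
                   const std::array<unsigned char, 256>& table) {
    std::string out;
    out.reserve(text.size());
    for (unsigned char ch : text) out.push_back(static_cast<char>(table[ch]));
    return out;
}
```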
You might also consider a hashtable (see unordered_map).
Reply:
If the key itself can be any 32-bit number, it wouldn't make sense to store a very sparse 4-billion element array.
If however your keys are themselves between say 0..10000, then you can have a 10000-element array containing pointers to your objects (or the objects themselves), with only 2000-5000 of your elements containing non-null pointers (or meaningful data, respectively). Access will be O(1).
If you can have large keys, then I'd probably go with the unordered_map. With a map of 5000 elements, O(log n) means around 12 accesses, while a hash table should need pretty much one or two accesses at most.
I'm not familiar with perfect hashes, so I can't advise about their implementation. If you do choose that, I'd be grateful for a link or two with ideas to keep in mind.
The lookup time in a std::map is O(log n), whereas a linear search in a static array is O(n) in the worst case.
I'd strongly opt for a std::map even if it has a larger memory footprint (which should not matter in most cases).
Also you can make "maps of maps" or even deeper structures:
typedef std::map<MyKeyType, std::map<MyKeyType, MyValueType> > MyDoubleMapType;

Multiple keys Hash Table (unordered_map)

I need to use multiple keys(int type) to store and retrieve a single value from a hash table. I would use multiple key to index a single item. I need fast insertion and look up for the hash table. By the way, I am not allowed to use the Boost library in the implementation.
How could I do that?
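If you mean that two ints form a single key, then use unordered_map<std::pair<int,int>, value_type>; note that the standard library provides no std::hash specialization for std::pair, so you have to pass your own hash functor as the third template argument. If you want to index the same set of data by multiple keys, then look at Boost.MultiIndex.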
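For the pair-as-key case, a minimal sketch without Boost. PairHash and Table are made-up names, std::string is a placeholder value type, and the mixing formula is one common way to combine two hashes rather than the only valid choice:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hash functor for a pair of ints, needed because std::hash has no
// specialization for std::pair.
struct PairHash {
    std::size_t operator()(const std::pair<int, int>& p) const {
        std::size_t h1 = std::hash<int>()(p.first);
        std::size_t h2 = std::hash<int>()(p.second);
        // Mix the two hashes so that (a, b) and (b, a) usually differ.
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// std::string is only a placeholder for the real value type.
using Table = std::unordered_map<std::pair<int, int>, std::string, PairHash>;
```

Equality comparison uses std::pair's own operator==, so only the hash has to be supplied.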
If the key to your container is comprised of the combination of multiple ints, you could use boost::tuple as your key, to encapsulate the ints without more work on your part. This holds provided your count of key int subcomponents is fixed.
Easiest way is probably to keep a map of pointers/indexes to the elements in a list.
A few more details are needed here though: do you need to support deletion? How are the elements set up? Can you use boost::shared_ptr? (Rather helpful if you need to support deletion.)
I'm assuming that the value object in this case is large, or there is some other reason you can't simply duplicate values in a regular map.
If it's always going to be a combination of keys for retrieval, then it's better to form a single compound key from the multiple keys.
You can do this by either:
Storing the key as a concatenated string of ints, like
(int1,int2,int3) => data
Using a wider data type like uint64_t, into which you can pack the individual values to form a key
// Refer comment below for the approach
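The uint64_t variant can be sketched like this for the two-int case, assuming 32-bit ints. make_key is a hypothetical helper, and the values are packed with shifts rather than arithmetically added, so that different pairs cannot collide:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Packs two 32-bit ints into one 64-bit key: a in the high half, b in the low half.
// Going through uint32_t first avoids sign extension for negative values.
std::uint64_t make_key(std::int32_t a, std::int32_t b) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(a)) << 32) |
           static_cast<std::uint32_t>(b);
}

// The packed key works with the default std::hash<std::uint64_t>;
// std::string is only a placeholder value type.
using PackedTable = std::unordered_map<std::uint64_t, std::string>;
```

This only scales to as many ints as fit into the chosen integer width; three 32-bit ints would already need a different scheme.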

set map implementation in C++

I find that both set and map are implemented as trees. Is set a binary search tree and map a self-balancing binary search tree, such as a red-black tree? I am confused about the difference in their implementations. The differences I can imagine are as follows:
1) element in set has only one value(key), element in map has two values.
2) set is used to store and fetch elements by itself. map is used to store and fetch elements via key.
What else are important?
Maps and sets have almost identical behavior and it's common for the implementation to use the exact same underlying technique.
The only important difference is map doesn't use the whole value_type to compare, just the key part of it.
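A quick way to see that difference, as a sketch (the two function names are invented): insert two entries that share a key but differ in the mapped value, once into a map and once into a set of pairs.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// A map compares keys only, so the second entry with key 1 is rejected.
std::size_t size_as_map() {
    std::map<int, std::string> m;
    m.emplace(1, "first");
    m.emplace(1, "second");  // no effect: key 1 is already present
    return m.size();
}

// A set of pairs compares the whole pair, so both entries are kept.
std::size_t size_as_set_of_pairs() {
    std::set<std::pair<int, std::string>> s;
    s.emplace(1, "first");
    s.emplace(1, "second");  // accepted: the pairs differ as whole values
    return s.size();
}
```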
Usually you'll know right away which you need: if you just have a bool for the "value" argument to the map, you probably want a set instead.
Set is a discrete mathematics concept that, in my experience, pops up again and again in programming. The STL set class is a relatively efficient way to keep track of sets where the most common operations are insert/remove/find.
Maps are used where objects have a unique identity that is small compared to their entire set of attributes. For example, a web page can be defined as a URL and a byte stream of contents. You could put that byte stream in a set, but the binary search process would be extremely slow (since the contents are much bigger than the URL) and you wouldn't be able to look up a web page if its contents change. The URL is the identity of the web page, so it is the key of the map.
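Conceptually, a map behaves like a set< std::pair<const Key, Value> > whose elements are ordered and compared by the key alone; in practice both containers are usually built on the same balanced tree.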
The set is used when you want an ordered list to quickly search for an item, basically, while a map is used when you want to retrieve a value given its key.
In both cases, the key (for map) or value (for set) must be unique. If you want to store multiple values that are the same, you would use multimap or multiset.
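A short sketch of the uniqueness difference, with hypothetical helper names: a std::set collapses equal values, while a std::multiset keeps every copy.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Number of distinct values: std::set silently drops duplicates.
std::size_t distinct(const std::vector<int>& values) {
    std::set<int> unique(values.begin(), values.end());
    return unique.size();
}

// Number of times x occurs: std::multiset stores equal values side by side.
std::size_t occurrences(const std::vector<int>& values, int x) {
    std::multiset<int> bag(values.begin(), values.end());
    return bag.count(x);
}
```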