Separate chaining with priority queue (using std::map) - c++

I have just started learning about hash tables, and while experimenting with std::map I came up with this question: when using separate chaining to resolve collisions, can I use std::priority_queue instead of a plain list?
For example, there is a big group of people and I have their first names and ages. What I want is, for each first name (e.g. 'David'), a list of the people with that name sorted by age.
To do this I first use the first name as the key to put these people into the map; people with the same name, who cause a collision, would then be handled with a std::priority_queue ordered by age.
Is this the right way to solve this problem?
I also just realized that I don't really know what happens inside std::map: does it use separate chaining or linear probing to resolve collisions? I couldn't find the answer to that.
Here is some simple code for the problem I described, which might help clarify it a bit:
class people {
public:
people(string inName, int inAge):firstName(inName), age(inAge){};
private:
string firstName;
int age;
};
int main(int argc, char ** argv) {
string name;
int age;
name = "David";
age = 25;
people aPerson(name, age);
//This is just an example, there are usually more than two attributes to deal with.
std::map <string, people> peopleList;
peopleList[name] = aPerson;
//now how do I implement the priority queue for collision first names?
}
Thanks in advance!
EDIT: since I need O(1) search, I should use unordered map instead of map.

Right now you have a mapping from a name to a single people object. Change it to a mapping from a name to a std::priority_queue, with a custom comparator for the priority queue. The comparator has to be a type, so a function object is the simplest choice (it also needs access to age, which is private in your class):
struct AgeLess {
bool operator()(const people& p1, const people& p2) const
{ return p1.age < p2.age; }
};
std::map<std::string,
std::priority_queue<people, std::vector<people>, AgeLess>> peopleList;
// ...
peopleList[name].push(aPerson);
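Putting the pieces together, a minimal sketch might look like the following. It assumes a Person struct with public members and the helper names OldestFirst, PeopleByName, addPerson and agesFor, none of which come from the question. It uses std::unordered_map, as the question's edit suggests, so the name lookup is O(1) on average. Note that std::map is an ordered container (typically a balanced tree), so it has no hash collisions to resolve; and even with std::unordered_map, two people called 'David' are not a collision but the same key, which is why the queue has to be the mapped value.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the question's class, with public members for brevity.
struct Person {
    std::string firstName;
    int age;
};

// Comparator type: std::priority_queue keeps the "largest" element on top,
// so with this ordering the oldest person is popped first.
struct OldestFirst {
    bool operator()(const Person& a, const Person& b) const {
        return a.age < b.age;
    }
};

using AgeQueue = std::priority_queue<Person, std::vector<Person>, OldestFirst>;
using PeopleByName = std::unordered_map<std::string, AgeQueue>;

// Adds a person to the queue that belongs to their first name.
void addPerson(PeopleByName& people, const Person& p) {
    people[p.firstName].push(p);
}

// Returns the ages of everyone with the given name, oldest first.
// The queue is copied before draining so the stored data is left untouched.
std::vector<int> agesFor(const PeopleByName& people, const std::string& name) {
    std::vector<int> ages;
    auto it = people.find(name);
    if (it == people.end()) return ages;
    AgeQueue copy = it->second;
    while (!copy.empty()) {
        ages.push_back(copy.top().age);
        copy.pop();
    }
    return ages;
}
```

A priority queue only exposes its top element, so reading the whole sorted list means popping a copy as agesFor does. If you mostly iterate over the group, a std::multiset or a sorted std::vector as the mapped value may be a better fit.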

Related

Best way to group string members of object in a vector

I am trying to store a vector of objects and sort them by a string member possessed by each object. It doesn't need to be sorted alphabetically, it only needs to group every object with an identical string together in the vector.
IE reading through the vector and outputting the strings from beginning to end should return something like:
string_bulletSprite
string_bulletSprite
string_bulletSprite
string_playerSprite
string_enemySprite
string_enemySprite
But should NEVER return something like:
string_bulletSprite
string_playerSprite
string_bulletSprite
[etc.]
Currently I am using std::sort and a custom comparison function:
std::vector<GameObject*> worldVector;
[...]
std::sort(worldVector.begin(), worldVector.end(), compString);
And the comparison function used in the std::sort looks like this:
bool compString(GameObject* a, GameObject* b)
{
return a->getSpriteNameAndPath() < b->getSpriteNameAndPath();
}
getSpriteNameAndPath() is a simple accessor which returns a normal string.
This seems to work fine. I've stress tested this a fair bit and it seems to always group things together the way I wanted.
My question is, is this the ideal or most logical/efficient way of accomplishing the stated goal? I get the impression Sort isn't quite meant to be used this way and I'm wondering if there's a better way to do this if all I want to do is group but don't care about doing so in alphabetic order.
Or is this fine?
If your range contains many equivalent elements (that is, only a few distinct values), std::sort can be less efficient than grouping the elements manually.
You can do this by moving the minimum elements to the beginning of the range, and then repeating the process on the remaining non-minimum elements. Each round is a linear pass, and there is one round per distinct value:
// given some range v
auto b = std::begin(v); // keeps track of remaining elements
while (b != std::end(v)) // while there's elements to be arranged
{
auto min = *std::min_element(b, std::end(v)); // find the minimum
// move elements matching that to the front
// and simultaneously update the remaining range
b = std::partition(b, std::end(v),
[=](auto const & i) {
return i == min;
});
}
Of course, a custom comparator can be passed to min_element, and the lambda in partition can be modified if equivalence is defined some other way.
Note that if you have very few equivalent elements, this method is much less efficient than using std::sort.
Here's a demo with a range of ints.
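A self-contained version of the loop for a vector of ints might look like this. The function name groupEqual is made up for the sketch, and the lambda captures the minimum by value just as the snippet above does.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Groups equal values together using repeated min_element + partition.
// The groups come out in ascending order because each round moves the
// current minimum to the front of the remaining range.
void groupEqual(std::vector<int>& v) {
    auto b = v.begin();  // start of the elements not yet arranged
    while (b != v.end()) {
        int min = *std::min_element(b, v.end());
        // Move everything equal to min to the front and advance b past it.
        b = std::partition(b, v.end(), [min](int i) { return i == min; });
    }
}
```

For ints the result is simply the sorted vector; the approach only pays off when the number of distinct values is small compared with the number of elements.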
I hope I understood your question correctly. If so, here is a small example using std::map, which is great for grouping things by key; the key will most probably be a std::string.
Please take a look:
class Sprite
{
public:
Sprite(/* args */)
{
}
~Sprite()
{
}
};
int main(int argc, char ** argv){
std::map <std::string, std::map<std::string, Sprite>> sprites;
std::map <std::string, Sprite> spaceships;
spaceships.insert(std::make_pair("executor", Sprite()));
spaceships.insert(std::make_pair("millennium Falcon", Sprite()));
spaceships.insert(std::make_pair("death star", Sprite()));
sprites.insert(std::make_pair("spaceships",spaceships));
// Look up a sprite by group and name, then call whichever member you need on it.
Sprite& executor = sprites["spaceships"]["executor"];
return 0;
}
Seems like Functor or Lambda is the way to go for this particular program, but I realized some time after posting that I could just create an ID for the images and sort those instead of strings. Thanks for the help though, everyone!

Usefulness of KeyEqual in std::unordered_set/std::unordered_map

I understand that this may be a vague question, but I wonder what the real-world cases are in which a custom comparator is useful for the hash containers in std.
I understand its usefulness in ordered containers, but for hash containers it seems a bit weird.
The reason is that the hash value must be the same for elements that are equal according to the comparator, and I believe that in most cases this actually means converting the element being looked up or inserted to some common representation (which is faster and easier to implement).
For example:
set of case insensitive strings: if you want to hash properly you need to uppercase/lowercase the entire string anyway.
set of fractions(where 2/3 == 42/63): you need to convert 42/63 to 2/3 and then hash that...
So I wonder if someone can provide some real-world examples where customizing the std::unordered_ template parameters is useful, so that I can recognize those patterns in future code I write.
Note 1: the "symmetry argument" (std::map enables customization of the comparator, so std::unordered_ should be customizable too) is something I considered, and I do not find it convincing.
Note 2: I mixed two kinds of comparators (< and ==) in the post for brevity; I know that std::map uses < and std::unordered_map uses ==.
As per https://en.cppreference.com/w/cpp/container/unordered_set
Internally, the elements are not sorted in any particular order, but
organized into buckets. Which bucket an element is placed into depends
entirely on the hash of its value. This allows fast access to
individual elements, since once a hash is computed, it refers to the
exact bucket the element is placed into.
So the hash function determines the bucket your element ends up in, but once the bucket is decided, operator == is used to find the element within it.
Basically, operator == is used to resolve hash collisions, and hence your hash function and your operator == need to be consistent. Furthermore, if your operator == says that two elements are equal, the set will not allow a duplicate.
As for customization, I think the case-insensitive set of strings is a good example: given two strings, you need to provide a case-insensitive hash function so that the set can determine which bucket to store the string in. Then you need to provide a custom KeyEqual so that the set can actually retrieve the element.
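As a rough illustration of the case-insensitive set, here is a sketch with a matching hash and KeyEqual. The names toLower, CaseInsensitiveHash, CaseInsensitiveEqual and CaseInsensitiveSet are invented for the example, and the lowercase conversion is ASCII-only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical helper: ASCII-only lowercase copy of a string.
inline std::string toLower(const std::string& s) {
    std::string out = s;
    for (char& c : out)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Hash of the lowercased string, so "Hello" and "HELLO" share a bucket.
struct CaseInsensitiveHash {
    std::size_t operator()(const std::string& s) const {
        return std::hash<std::string>()(toLower(s));
    }
};

// Equality consistent with the hash above: equal strings hash equally.
struct CaseInsensitiveEqual {
    bool operator()(const std::string& a, const std::string& b) const {
        return toLower(a) == toLower(b);
    }
};

using CaseInsensitiveSet =
    std::unordered_set<std::string, CaseInsensitiveHash, CaseInsensitiveEqual>;
```

Compared with lowercasing every string before inserting it into a plain std::unordered_set, this version keeps the spelling that was inserted first, which is one reason to customize the container rather than normalize the keys.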
A case I had to deal with, in the past, was a way to allow users to insert strings, keeping track of their order of insertion but avoiding duplicates. So, given a struct like:
struct keyword{
std::string value;
int sequenceCounter;
};
You want to detect duplicates according only to value. One of the solutions I came up with was an unordered_set with a custom comparator/hash function, that used only value. This allowed me to check for the existence of a key before allowing insertion.
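A sketch of that idea could look as follows. KeywordHash, KeywordEqual, KeywordSet and addKeyword are hypothetical names, and using the current set size as the next sequence number is an assumption made for the example (it only works if nothing is ever erased).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct keyword {
    std::string value;
    int sequenceCounter;
};

// Hash and equality look at value only; sequenceCounter is ignored.
struct KeywordHash {
    std::size_t operator()(const keyword& k) const {
        return std::hash<std::string>()(k.value);
    }
};
struct KeywordEqual {
    bool operator()(const keyword& a, const keyword& b) const {
        return a.value == b.value;
    }
};

using KeywordSet = std::unordered_set<keyword, KeywordHash, KeywordEqual>;

// Inserts the string with the next sequence number unless it is already present.
// Returns true if it was inserted.
bool addKeyword(KeywordSet& set, const std::string& value) {
    int next = static_cast<int>(set.size());
    return set.insert(keyword{value, next}).second;
}
```

Looking an entry up with any sequenceCounter finds the stored one, so the insertion order of a string can be recovered from its value alone.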
One interesting usage is to define memory efficient indexes (database sense of the term) for a given set of objects.
Example
Let's say we have a program that has a collection of N objects of this class:
struct Person {
// each object has a unique firstName/lastName pair
std::string firstName;
std::string lastName;
// each object has a unique ssn value
std::string socialSecurityNumber;
// each object has a unique email value
std::string email;
};
And we need to retrieve efficiently objects by the value of any unique property.
Implementations comparison
Time complexities are given assuming string comparisons are constant time (strings have limited length).
1) Single unordered_map
With a single map indexed by a single key (ex: email):
std::unordered_map<std::string,Person> indexedByEmail;
Time complexity: lookup by any unique property other than email requires a traversal of the map: average O(N).
Memory usage: the email value is duplicated. This could be avoided by using a single set with custom hash & compare (see 3).
2) Multiple unordered_map, no custom hash & compare
With a map for each unique property, with default hash & comparisons:
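// note: std::hash has no specialization for std::pair, so this one map still needs a small pair hasher (omitted here)
std::unordered_map<std::pair<std::string,std::string>, Person> byName;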
std::unordered_map<std::string, const Person*> byEmail;
std::unordered_map<std::string, const Person*> bySSN;
Time complexity: by using the appropriate map, a lookup by any unique property is average O(1).
Memory usage: inefficient, because of all the string duplications.
3) Multiple unordered_set, custom hash & comparison:
With custom hash & comparison, we define different unordered_sets, each of which hashes & compares only specific fields of the objects. These sets can be used to perform lookups as if the items were stored in maps as in 2), but without duplicating any field.
using StrHash = std::hash<std::string>;
// --------------------
struct PersonNameHash {
std::size_t operator()(const Person& p) const {
// not the best hashing function in the world, but good enough for demo purposes.
return StrHash()(p.firstName) + StrHash()(p.lastName);
}
};
struct PersonNameEqual {
bool operator()(const Person& p1, const Person& p2) const {
return (p1.firstName == p2.firstName) && (p1.lastName == p2.lastName);
}
};
std::unordered_set<Person, PersonNameHash, PersonNameEqual> byName;
// --------------------
struct PersonSsnHash {
std::size_t operator()(const Person* p) const {
return StrHash()(p->socialSecurityNumber);
}
};
struct PersonSsnEqual {
bool operator()(const Person* p1, const Person* p2) const {
return p1->socialSecurityNumber == p2->socialSecurityNumber;
}
};
std::unordered_set<const Person*, PersonSsnHash, PersonSsnEqual> bySSN;
// --------------------
struct PersonEmailHash {
std::size_t operator()(const Person* p) const {
return StrHash()(p->email);
}
};
struct PersonEmailEqual {
bool operator()(const Person* p1, const Person* p2) const {
return p1->email == p2->email;
}
};
std::unordered_set<const Person*,PersonEmailHash,PersonEmailEqual> byEmail;
Time complexity: a lookup by any unique property is still O(1) average.
Memory usage: much better than 2): no string duplication.
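Since such a set has no separate key, a lookup needs an object to compare against. One way to do that, sketched below, is to fill a temporary "probe" Person with just the field being searched. The names EmailHash, EmailEqual, EmailIndex and findByEmail are made up for this sketch; the functors mirror PersonEmailHash and PersonEmailEqual above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Person {
    std::string firstName;
    std::string lastName;
    std::string socialSecurityNumber;
    std::string email;
};

struct EmailHash {
    std::size_t operator()(const Person* p) const {
        return std::hash<std::string>()(p->email);
    }
};
struct EmailEqual {
    bool operator()(const Person* a, const Person* b) const {
        return a->email == b->email;
    }
};
using EmailIndex = std::unordered_set<const Person*, EmailHash, EmailEqual>;

// Looks a person up by email using a temporary probe object.
// Returns nullptr if nobody in the index has that email.
const Person* findByEmail(const EmailIndex& index, const std::string& email) {
    Person probe;
    probe.email = email;  // only the indexed field matters
    auto it = index.find(&probe);
    return it == index.end() ? nullptr : *it;
}
```

The index stores pointers, so the Person objects must outlive it and their email must not change while they are indexed.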
The hash function extracts the features of an element that matter, and the comparator's job is to decide whether two sets of features are the same.
If you wrap the data in a "shell" that exposes only those features, you may not need to modify the comparator at all.
In short: put a feature shell around the data and let the features be responsible for the comparison.
To be honest, I don't fully understand your problem description, so my explanation may be a little confused. Please bear with me.
:)

Graph based on unordered_map performance (short version)

Hello :) I am implementing a graph whose vertices are strings. I do many things with them, so working with the strings directly would be highly inefficient. That is why I am using indexes, simple ints. But although the rest of the class works pretty fast, I have trouble with the part I copied below. I've read somewhere that unordered_map needs some hash function; should I add one? If yes, how? The code below contains EVERYTHING that I am doing with the unordered_map.
Thank you in advance for help :)
class Graph
{
private:
unordered_map <string, int> indexes_of_vertices;
int number_of_vertices;
int index_counter;
int get_index(string vertex)
{
if (indexes_of_vertices.count(vertex) == 0) // the key is not present yet
{
indexes_of_vertices[vertex] = index_counter;
return index_counter++;
}
else
return indexes_of_vertices[vertex];
}
public:
Graph(int number_of_vertices)
{
this->number_of_vertices = number_of_vertices;
index_counter = 0;
}
};
Here's a quick optimization for what seems to be the important function:
int get_index(const string& vertex)
{
typedef unordered_map <string, int> map_t;
pair<map_t::iterator, bool> inserted =
indexes_of_vertices.insert(map_t::value_type(vertex, index_counter));
if (inserted.second) // the key was missing until now
return index_counter++;
else // inserted.second is false, means vertex was already there
return inserted.first->second; // this is the value
}
The optimizations are:
Take argument by const-ref.
Do a single map lookup instead of two: we speculatively insert() then see if it worked or not, which saves a redundant lookup in either case.
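On a C++17 compiler the same single-lookup idea could also be written with try_emplace, which inserts only when the key is absent. This is a sketch of an alternative, not the code from the answer; the VertexIndex wrapper class is hypothetical and exists only to make the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical free-standing version of the graph's index table.
class VertexIndex {
public:
    // Returns the index for `vertex`, assigning the next free one on first sight.
    int get_index(const std::string& vertex) {
        // try_emplace (C++17) performs one lookup and inserts only if the key is absent.
        auto result = indexes_of_vertices.try_emplace(vertex, index_counter);
        if (result.second)  // a new entry was created, so consume the index
            ++index_counter;
        return result.first->second;
    }

private:
    std::unordered_map<std::string, int> indexes_of_vertices;
    int index_counter = 0;
};
```

As for the hash function: std::unordered_map already uses std::hash<std::string> by default, so nothing has to be added for string keys.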
Please let us know how much difference that makes. Another idea, if your keys are usually small, is to use a self-contained string type like GCC's vstring which avoids ex-situ memory allocation for strings under one or two dozen characters. And then to consider whether your data are really large enough to benefit from a hash table, or if another data structure would be more efficient.

How to find an object in std::vector?

Suppose I have a class called Bank, with attributes
class Bank {
string _name;
};
Now I declare a vector of Bank.
vector<Bank> list;
Given a string, how do I search the vector list for that particular Bank object that has the same string name?
I'm trying to avoid doing loops and see if there is an stl function that can do this.
You can use good old linear search:
auto it = std::find_if(list.begin(), list.end(), [&](const Bank& bank)
{
return bank._name == the_name_you_are_looking_for;
});
If there is no such bank in the list, the end iterator will be returned:
if (it == list.end())
{
// no bank in the list with the name you were looking for :-(
}
else
{
// *it is the first bank in the list with the name you were looking for :-)
}
If your compiler is from the stone ages, it won't understand lambdas and auto. Untested C++98 code:
struct NameFinder
{
const std::string& captured_name;
bool operator()(const Bank& bank) const
{
return bank._name == captured_name;
}
};
NameFinder finder = {the_name_you_are_looking_for};
std::vector<Bank>::iterator it = std::find_if(list.begin(), list.end(), finder);
As per popular request, just a side note to warn potential beginners attracted by this question in the future:
std::find is using a linear method, because the underlying object (a vector in that case) is not designed with search efficiency in mind.
Using a vector for data where search time is critical will possibly work, given the computing power available in your average PC, but could become slow quickly if the volume of data to handle grows.
If you need to search quickly, there are other containers (std::set, std::map and a few variants) that allow retrieval in logarithmic time.
You can even use hash tables for (near) instant access in containers like unordered_set and unordered_map, but the cost of other operations grows accordingly. It's all a matter of balance.
You can also sort the vector first and then perform a binary search with the std:: algorithms: binary_search to test whether an element is present, or lower_bound, upper_bound and equal_range to locate the matching element or the range of equivalent elements.
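The sort-then-search approach might be sketched like this. It assumes a Bank with a public name member (unlike the question's private _name), and sortByName and findSorted are names invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical Bank with a public name so the comparators can read it.
struct Bank {
    std::string name;
};

// Sorts the banks by name; must be done before calling findSorted.
void sortByName(std::vector<Bank>& banks) {
    std::sort(banks.begin(), banks.end(),
              [](const Bank& a, const Bank& b) { return a.name < b.name; });
}

// Binary search on a vector that is already sorted by name.
// Returns a pointer to the matching bank, or nullptr if there is none.
const Bank* findSorted(const std::vector<Bank>& banks, const std::string& name) {
    auto it = std::lower_bound(
        banks.begin(), banks.end(), name,
        [](const Bank& b, const std::string& n) { return b.name < n; });
    // lower_bound returns the first element not less than name; check it matches.
    if (it != banks.end() && it->name == name) return &*it;
    return nullptr;
}
```

Sorting costs O(N log N) once, after which each search is O(log N); this is worthwhile when the vector is searched many times between modifications.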
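std::find and std::find_if (from <algorithm>) let you search through the vector, either for an element equal to a given value or for the first element that satisfies a predicate of your choice.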

Trying to keep age/name pairs matched after sorting

I'm writing a program where the user inputs names and then ages. The program then sorts the list alphabetically and outputs the pairs. However, I'm not sure how to keep the ages matched up with the names after sorting them alphabetically. All I've got so far is...
Edit: Changed the code to this -
#include "std_lib_facilities.h"
struct People{
string name;
int age;
};
int main()
{
vector<People>nameage;
cout << "Enter name then age until done. Press enter, 0, enter to continue.:\n";
People name;
People age;
while(name != "0"){
cin >> name;
nameage.push_back(name);
cin >> age;
nameage.push_back(age);}
vector<People>::iterator i = (nameage.end()-1);
nameage.erase(i);
}
I get compiler errors for the != operator and the cin operators. Not sure what to do.
Rather than two vectors (one for names, and one for ages), have a vector of a new type that contains both:
struct Person
{
string name;
double age;
};
vector<Person> people;
edit for comments:
Keep in mind what you're now pushing onto the vector. You must push something of type Person. You can do this in a couple of ways:
Push back a default constructed person and then set the name and age fields:
people.push_back(Person());
people.back().name = name;
people.back().age = age;
Give Person a constructor that takes a name and an age, and push a Person with some values:
struct Person
{
Person(const string& name_, double age_) : name(name_), age(age_) {}
string name;
double age;
};
people.push_back(Person(name, age));
Create a Person, give it some values, and push that into the vector:
Person person;
person.name = name;
person.age = age;
people.push_back(person);
Or more simply:
Person person = { name, age };
people.push_back(person);
(thanks avakar)
In addition to the solution posted by jeje and luke, you can also insert the pairs into a map (or multimap, in case duplicate names are allowed).
assert(names.size() == ages.size());
map<string, double> people;
for (size_t i = 0; i < names.size(); ++i)
people[names[i]] = ages[i];
// The sequence [people.begin(), people.end()) is now sorted
Note that using vector<Person> will be faster if you fill it up only once in advance. map will be faster if you decide to add/remove people dynamically.
You should consider putting names and ages together in structured record.
Then sort the records.
J.
You could have a vector of structs/classes, where each one has both a name and an age. When sorting, use a custom comparator that only looks at the name field.
Alternately, build an additional vector of integers [0,names.size()-1]. Sort that, with a custom comparator that instead of comparing a < b compares names[a] < names[b]. After sorting, the integer vector will give you the permutation that you can apply to both the names and ages vectors.
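The index-vector variant could be sketched as below, assuming the names and ages live in two parallel vectors as in the original attempt. The function names sortedOrder and applyOrder are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Returns the permutation that sorts `names` alphabetically:
// result[k] is the original position of the k-th smallest name.
std::vector<std::size_t> sortedOrder(const std::vector<std::string>& names) {
    std::vector<std::size_t> order(names.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
    std::sort(order.begin(), order.end(),
              [&names](std::size_t a, std::size_t b) { return names[a] < names[b]; });
    return order;
}

// Applies a permutation to any vector and returns the reordered copy.
template <typename T>
std::vector<T> applyOrder(const std::vector<T>& values,
                          const std::vector<std::size_t>& order) {
    std::vector<T> out;
    out.reserve(values.size());
    for (std::size_t i : order) out.push_back(values[i]);
    return out;
}
```

Applying the same order to both vectors keeps every age next to its name without merging the two into one record.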
You either need to swap elements in both vectors at the same time (the FORTRAN way), or store a vector of structs or pairs. The latter approach is more idiomatic for C-like languages.
You should use the pair<> utility template.
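With std::pair the sort needs no custom comparator, because pairs compare by first and then by second. A minimal sketch, where the NameAge alias and the sortByName function are names chosen for the example and the age is assumed to be an int:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using NameAge = std::pair<std::string, int>;

// std::pair's operator< compares first, then second, so a plain std::sort
// orders by name and keeps each age attached to its name.
void sortByName(std::vector<NameAge>& people) {
    std::sort(people.begin(), people.end());
}
```

People with the same name end up ordered by age as a side effect of the pair comparison.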
G'day,
Given how you're trying to model this, my gut feeling is that you haven't approached the problem from an OO perspective. Try using a class instead of a struct.
struct's are soooo K&R! (-:
Think of a Person as an object and they have attributes that are tightly coupled, e.g. Name and Age. Maybe even address, email, Twitter, weight, height, etc.
Then add to your objects the functions that are meaningful, e.g. comparing ages, weights, etc. Writing a < operator for email addresses or Twitter id's is a bit bizarre though.
OOA is just looking at what attributes your "objects" have in real life and that gives you a good starting point for designing your objects.
To get a better idea of OOA have a look at the excellent book "Object Oriented Systems Analysis: Modeling the World in Data" by Sally Shlaer and Stephen Mellor. Don't faint at the Amazon price though: $83.33 indeed! At least it's $0.01 second hand... (-:
HTH
cheers,