Does hash_map automatically sort [C++]? - c++

In the code below, hash_map appears to be sorting automatically, or inserting elements in sorted order. Any idea why it does this? Suggestions, please?
This is NOT a homework problem; I'm trying to solve an interview question posted on glassdoor dot com.
#include <iostream>
#include <vector>
#include <ext/hash_map>
#include <map>
#include <string.h>
#include <sstream>

using namespace __gnu_cxx;
using namespace std;

struct eqstr
{
    bool operator()(int i, int j) const
    {
        return i == j;
    }
};

typedef hash_map<int, int, hash<int>, eqstr> myHash;

int main()
{
    myHash array;
    int inputArr[20] = {1,43,4,5,6,17,12,163,15,16,7,18,19,20,122,124,125,126,128,100};
    for (int i = 0; i < 20; i++) {
        array[inputArr[i]] = inputArr[i]; // save value
    }
    myHash::iterator it = array.begin();
    int data;
    for (; it != array.end(); ++it) {
        data = it->first;
        cout << ":: " << data;
    }
}
//!Output ::: 1:: 4:: 5:: 6:: 7:: 12:: 15:: 16:: 17:: 18:: 19:: 20:: 43:: 100:: 122:: 124:: 125:: 126:: 128:: 163

hash_map will not automatically sort your data. In fact, the iteration order is unspecified; it depends on your hash function and the input. It just happens that in your case the numbers come out sorted.
You may want to read about hash tables to see how this container stores its data.
A clear counterexample can be created by replacing that 100 with 999999999. The result is
:: 1:: 4:: 5:: 6:: 7:: 12:: 15:: 16:: 17:: 18:: 19:: 20:: 999999999:: 43:: 122:: 124:: 125:: 126:: 128:: 163
(The actual reason: the hash_map's bucket_count is 193 and the hash function for int is the identity function, so any key below 193 lands in the bucket whose index equals its own value and therefore appears in sorted order.)
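As a minimal illustration of that explanation (the bucket count of 193 and the identity hash are the implementation details quoted above), the bucket index is essentially key % bucket_count:

#include <cstddef>
#include <iostream>

int main() {
    const std::size_t bucket_count = 193;              // initial bucket count mentioned above
    int keys[] = { 1, 20, 43, 100, 163, 999999999 };
    for (int k : keys)
        // hash<int> is the identity, so the bucket is just the key modulo the bucket count;
        // every key below 193 lands in the bucket numbered by its own value,
        // hence the sorted-looking iteration order
        std::cout << k << " -> bucket " << static_cast<std::size_t>(k) % bucket_count << '\n';
}

999999999 falls into bucket 28, which is why it shows up between 20 and 43 in the counterexample output.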

A hash map may appear to be sorted when a few factors line up:
1. The hash function returns values that are in the same order as the input, for the range of input you are providing.
2. There are no hash collisions.
There is no guarantee that the results will be sorted; it's only a coincidence if they come out that way.

Think about how the hash function operates. A hash is a function f: input -> output that maps the input set I into a (usually smaller) output set O so that the input set is approximately uniformly distributed across the output set.
There is no requirement that order should be preserved; in fact, it's unusual that it would be preserved, because (since the output set is smaller) there will be values i, j that have the same hash. This is called a collision.
On the other hand, there's no reason it shouldn't be. In fact, it can be proven that there will always exist at least one input sequence whose order is preserved.
But there's another possibility: if ALL the values collide, then they get stored in some other kind of data structure, like a list, and that other structure may impose an order.
So, three possibilities: hash_map happens to sort that particular sequence, or hash_map is actually implemented as an array, or the values collide and the implementation stores collisions in a way that gives a sorted order.

Related

error in iterating maps to get input

In chess, each type of coin has some weight. Given the name of each coin and its weight, write C++ code to print the names of the coins in ascending order of their weight. Assume that the weight of each coin is unique.
I want to use a map.
My code is here:
#include <iostream>
#include <map>
using namespace std;

int main(){
    int n, i = 0;
    char name;
    int weight;
    cin >> n;
    class std::map<char,int> coins;
    while (i < n)
    {
        i++;
        cin >> name;
        cin >> weight;
        coins[name] = weight;
    }
    coins.sort(coins.begin(), coins.end(), weight);
    while (i < n){
        i++;
        cout << coins;
    }
}
You cannot sort a map. The elements are already sorted in an order defined by the key of each element.
If it is guaranteed that every piece will have a unique weight, none the same as any other, then you can use weight instead of name as your key, in a map of type std::map<int,char>. Then simply iterating through the map will give you the elements in order of increasing weight.
But if you do that, and if it ever happens that someone specifies two pieces with the same weight in the input to your program, one of the pieces will be lost and will not appear at all in the output list.
For the reason just mentioned, using a map this way has a bad "code smell" and I would be reluctant to use it in real life. I would also be hesitant to give this as an answer to an exercise in a programming course.
If you use multimap instead of map, however, you can have multiple elements with the same key, still sorted according to the order of their keys. That seems like a much better idea, as the sketch below shows.
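A minimal sketch of that multimap approach (the loop structure and variable names are mine; input parsing is kept as simple as in the question's code):

#include <iostream>
#include <map>

int main() {
    int n;
    std::cin >> n;

    // Key on the weight: the multimap keeps its elements ordered by key,
    // so iterating prints the coins in ascending order of weight,
    // and duplicate weights are tolerated as well.
    std::multimap<int, char> coins;
    for (int i = 0; i < n; ++i) {
        char name;
        int weight;
        std::cin >> name >> weight;
        coins.insert({ weight, name });
    }

    for (const auto& entry : coins)
        std::cout << entry.second << '\n';
}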
No need to sort; only a list would need to be sorted. A map keeps its data in sorted order by default.

Efficient C++ data structure to search for intervals in sets of integer values

I'm looking for a data structure (and a C++ implementation) that allows me to search efficiently for all elements having an integer value within a given interval. Example: say the set contains:
3,4,5,7,11,13,17,20,21
Now I want to know all elements from this set within [5,19]. So the answer should be 5,7,11,13,17.
For my usage a trivial linear search is not an option, as the number of elements is large (several million) and I have to do the search quite often. Any suggestions?
For this, you typically use std::set, which is an ordered set that has a search tree built on top (at least that's one possible implementation).
To get the elements in the queried interval, find the iterator to the first element inside it and the iterator just past the last one. That's a use case for lower_bound (for x) and upper_bound (for y), which together treat both interval limits as inclusive: [x,y]. (If you want the end to be exclusive, use lower_bound for the end as well.) On a std::set, use the member functions s.lower_bound and s.upper_bound; the free algorithms std::lower_bound/std::upper_bound are only logarithmic with random-access iterators, so on a set they would walk the elements linearly.
The member functions have logarithmic complexity in the size of the set: O(log n).
Note that you may also use a sorted std::vector and the free algorithms instead. This might be advantageous in some situations, but if you always want the elements kept sorted, use std::set, as it does that automatically for you. (A sketch of the vector variant follows the demo below.)
Live demo
#include <set>
#include <iostream>

int main()
{
    // Your set (note that these numbers don't have to be given in order):
    std::set<int> s = { 3,4,5,7,11,13,17,20,21 };

    // Your query:
    int x = 5;
    int y = 19;

    // The iterators (member functions, so each lookup is O(log n)):
    auto lower = s.lower_bound(x);
    auto upper = s.upper_bound(y);

    // Iterating over the range:
    for (auto it = lower; it != upper; ++it) {
        // Do something with *it, or just print it:
        std::cout << *it << '\n';
    }
}
Output:
5
7
11
13
17
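A minimal sketch of the sorted-vector variant mentioned above, running the same query (the free algorithms are fine here because vector iterators are random access):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = { 3,4,5,7,11,13,17,20,21 };
    std::sort(v.begin(), v.end());   // required before any binary search

    // Same inclusive query [5,19] as above
    auto lower = std::lower_bound(v.begin(), v.end(), 5);
    auto upper = std::upper_bound(v.begin(), v.end(), 19);

    for (auto it = lower; it != upper; ++it)
        std::cout << *it << '\n';    // prints 5 7 11 13 17
}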
For searching within intervals like the one you mention, segment trees also work well. In competitive programming, many questions are based on this data structure.
One such implementation can be found here:
http://www.sanfoundry.com/cpp-program-implement-segement-tree/
You might need to modify the code to suit your question, but the basic implementation remains the same.

Word Frequency Statistics

In a pre-interview, I was faced with a question like this:
Given a string consisting of words separated by single spaces, print out the words in descending order, sorted by the number of times they appear in the string.
For example, an input string of “a b b” would generate the following output:
b : 2
a : 1
Firstly, I'd say it is not clear whether the input string is made up of single-letter words or multi-letter words. If it's the former, it could be simple.
Here is my thought:
int c[26] = {0};
const char *pIn = strIn;
while (*pIn != 0)
{
    if (*pIn != ' ')
        ++c[*pIn - 'a'];   /* count the letter; skip the spaces instead of stopping at them */
    ++pIn;
}
/* how to sort the array c[26] and remember the original index? */
I can get the frequency of every single-letter word in the input string, and I can sort the counts (using quicksort or whatever). But after the count array is sorted, how do I get the single-letter word associated with each count, so that I can print them out as pairs later?
If the input string is made up of multi-letter words, I plan to use a map<const char *, int> to track the frequency. But again, how do I sort the map's key-value pairs?
The question is in C or C++, and any suggestion is welcome.
Thanks!
I would use a std::map<std::string, int> to store the words and their counts. Then I would use something like this to get the words:
std::string word;
while (std::cin >> word) {
    // increment the map's count for that word
}
Finally, you just need to figure out how to print them in order of frequency; I'll leave that as an exercise for you.
You're definitely wrong in assuming that you only need 26 slots, because your employer will want to allow multi-character words as well (and maybe even numbers?).
This means you're going to need a container of variable length. I strongly recommend using a vector or, even better, a map.
To find the character sequences in the string, take your current position (start at 0) and find the position of the next space; the text in between is the word. Set the current position to just past the space and do it again. Keep repeating this until you're at the end (see the sketch after this answer).
By using the map you'll already have the word/count pairs available.
If the job you're applying for requires university-level skills, I strongly recommend optimizing the map lookup with some kind of hashing function. However, judging by the difficulty of the question, I assume that is not the case.
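A minimal sketch of the find-the-next-space loop described above, counting into a map (the variable names are mine and the example input comes from the question):

#include <iostream>
#include <map>
#include <string>

int main() {
    std::string input = "a b b";
    std::map<std::string, int> counts;

    // Find the next space, take the word in between, then continue from just past the space.
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t space = input.find(' ', pos);
        if (space == std::string::npos)
            space = input.size();
        if (space > pos)                               // skip empty tokens
            ++counts[input.substr(pos, space - pos)];
        pos = space + 1;
    }

    for (const auto& entry : counts)
        std::cout << entry.first << " : " << entry.second << '\n';
}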
Taking the C-language case:
I like brute-force, straightforward algorithms, so I would do it this way:
1. Tokenize the input string to give an unsorted array of words. I'll have to actually, physically move each word (because each is of variable length), and I think I'll need an array of char*, which I'll use as the argument to qsort().
2. qsort() (descending) that array of words. (In the COMPAR function of qsort(), pretend that bigger words are smaller words so that the array acquires descending sort order.)
3.a. Go through the now-sorted array, looking for subarrays of identical words. The end of a subarray, and the beginning of the next, is signalled by the first non-identical word I see.
3.b. When I get to the end of a subarray (or to the end of the sorted array), I know (1) the word and (2) the number of identical words in the subarray.
4. (EDIT: new step) Save, in another array (call it array2), a char* to a word in the subarray and the count of identical words in the subarray.
5. When there are no more words in the sorted array, I'm done; it's time to print.
6. qsort() array2 by word frequency.
7. Go through array2, printing each word and its frequency.
I'M DONE! Let's go to lunch. A rough sketch of these steps follows.
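A rough sketch of those steps (C-style code compiled as C++; strtok does the tokenizing, the fixed-size arrays are an assumption to keep the example short, and for simplicity step 2 sorts the words in plain ascending order, which groups identical words just as well as the descending trick):

#include <cstdio>
#include <cstdlib>
#include <cstring>

struct Entry { const char* word; int count; };

// Compare two char* elements for qsort (step 2).
static int cmp_words(const void* a, const void* b) {
    return std::strcmp(*static_cast<const char* const*>(a),
                       *static_cast<const char* const*>(b));
}
// Compare two Entry elements by count, descending (step 6).
static int cmp_counts_desc(const void* a, const void* b) {
    return static_cast<const Entry*>(b)->count - static_cast<const Entry*>(a)->count;
}

int main() {
    char input[] = "a b b";          // the example string from the question
    const char* words[64];
    int n = 0;

    // Step 1: tokenize into an array of char*.
    for (char* tok = std::strtok(input, " "); tok != nullptr; tok = std::strtok(nullptr, " "))
        words[n++] = tok;

    // Step 2: sort the words so identical words form contiguous runs.
    std::qsort(words, n, sizeof(words[0]), cmp_words);

    // Steps 3-4: collapse each run into a (word, count) entry in array2.
    Entry array2[64];
    int m = 0;
    for (int i = 0; i < n; ) {
        int j = i;
        while (j < n && std::strcmp(words[i], words[j]) == 0) ++j;
        array2[m++] = { words[i], j - i };
        i = j;
    }

    // Steps 5-7: sort by frequency (descending) and print.
    std::qsort(array2, m, sizeof(array2[0]), cmp_counts_desc);
    for (int i = 0; i < m; ++i)
        std::printf("%s : %d\n", array2[i].word, array2[i].count);
}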
None of the answers prior to mine really gave a complete answer, so let us think about a potential solution.
There is a more or less standard approach for counting things in a container.
We can use an associative container like a std::map or a std::unordered_map, and associate a key, in this case the word, with a value, in this case the count of that specific word.
Luckily the maps have a very nice index operator[]. This will look for the given key and, if found, return a reference to the value. If not found, it will create a new entry with that key and return a reference to the new entry. So, in both cases, we get a reference to the value used for counting. And then we can simply write:
std::unordered_map<std::string, int> counter{};
counter[word]++;
And that looks really intuitive.
After this operation, you already have the frequency table: either sorted by the key (the word), if you use a std::map, or unsorted but faster to access, with a std::unordered_map.
Now you want to sort according to the frequency/count. Unfortunately this is not possible with maps.
Therefore we need to use a second container, like a std::vector, which we can then sort using std::sort with any given predicate, or we can copy the values into a container like a std::multiset that implicitly orders its elements.
For getting the words out of a std::string we simply use a std::istringstream and the standard extraction operator >>. No big deal at all.
And because writing all these long names for the std containers is tedious, we create alias names with the using keyword.
After all this, we can now write ultra-compact code and fulfill the task with just a few lines:
#include <iostream>
#include <string>
#include <sstream>
#include <utility>
#include <set>
#include <unordered_map>
#include <iomanip>

// ------------------------------------------------------------
// Create aliases. Saves typing work and makes the code more readable
using Pair = std::pair<std::string, unsigned int>;

// Standard approach for a counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;

// Sorted values will be stored in a multiset
struct Comp {
    bool operator ()(const Pair& p1, const Pair& p2) const {
        return (p1.second == p2.second) ? p1.first < p2.first : p1.second > p2.second;
    }
};
using Rank = std::multiset<Pair, Comp>;
// ------------------------------------------------------------

std::istringstream text{ " 4444 55555 1 22 4444 333 55555 333 333 4444 4444 55555 55555 55555 22 " };

int main() {
    Counter counter;

    // Count
    for (std::string word{}; text >> word; counter[word]++);

    // Sort
    Rank rank(counter.begin(), counter.end());

    // Output
    for (const auto& [word, count] : rank)
        std::cout << std::setw(15) << word << " : " << count << '\n';
}

Simple and effective way to store data that can be accessed through key or ordinal in C++

I need to create a data structure whose elements can be accessed by a string key or by their ordinal.
The class currently uses an array of nodes that contain the string key and a pointer to the element. This allows O(n) looping through, or O(1) access by ordinal; however, the only way I've found to find an element by key is an O(n) loop comparing keys until I find what I want, which is SLOW when there are 1000+ elements. Is there a way to use the key to reference the pointer, or am I out of luck?
EDIT: The by-ordinal access is not so much important as the O(n) looping. This is going to be used as a base structure that will be inherited for use in other ways; for instance, if it were a structure of drawable objects, I'd want to be able to draw all of them in a single loop.
You can use std::map for O(log n) search. See this thread for more details; it discusses exactly your situation (fast retrieval of values by a string and/or ordinal key).
Small example (ordinal keys are used; you can do similar things with strings):
#include <map>
#include <string>

using std::map;
using std::string;

struct dummy {
    unsigned ordinal_key;
    string dummy_body;
};

int main()
{
    map<unsigned, dummy> lookup_map;
    dummy d1;
    d1.ordinal_key = 10;
    lookup_map[d1.ordinal_key] = d1;
    // ...
    unsigned some_key = 20;
    // determine whether an element with the desired key is present in the map
    if (lookup_map.find(some_key) != lookup_map.end()) {
        // do stuff
    }
}
If you seldom modify your array, you can just keep it sorted and use binary_search on it to find an element by key in O(log n) time (technically O(k log n), since you're comparing strings, where k is the average length of a key string).
Of course this (just like using a map or unordered_map) will mess up your ordinal retrieval, since the elements will be stored in sorted order, not insertion order.
Use a vector and a map together:
std::vector<your_struct> elements;
std::map<std::string, int> index;
The map lets you retrieve a key's ordinal in O(log n) time, whereas the vector allows O(1) element access by index.
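A minimal sketch of how the two containers fit together (your_struct is replaced here by a hypothetical YourStruct whose name field is used as the key):

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct YourStruct { std::string name; int value; };   // hypothetical element type

int main() {
    std::vector<YourStruct> elements;                  // owns the elements, preserves insertion order
    std::map<std::string, int> index;                  // key -> ordinal in the vector

    // Insert: append to the vector and remember the ordinal in the map.
    auto add = [&](const YourStruct& e) {
        index[e.name] = static_cast<int>(elements.size());
        elements.push_back(e);
    };
    add({ "alpha", 1 });
    add({ "beta", 2 });

    // O(1) access by ordinal:
    std::cout << elements[0].name << '\n';

    // O(log n) access by key:
    auto it = index.find("beta");
    if (it != index.end())
        std::cout << elements[it->second].value << '\n';

    // O(n) loop over everything, in insertion order:
    for (const auto& e : elements)
        std::cout << e.name << ' ' << e.value << '\n';
}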
Use a hashmap

Sorting 1000-2000 elements with many cache misses

I have an array of 1000-2000 elements which are pointers to objects. I want to keep my array sorted, and obviously I want to do this as quickly as possible. They are sorted by a member, and the objects are not allocated contiguously, so assume a cache miss whenever I access the sort-by member.
Currently I'm sorting on demand rather than on add, but because of the cache misses and the [presumably] non-inlined member access, the inner loop of my quicksort is slow.
I'm doing tests and trying things now (and seeing what the actual bottleneck is), but can anyone recommend a good alternative to speed this up?
Should I do an insertion sort instead of quicksorting on demand, or should I try to change my model to make the elements contiguous and reduce cache misses?
OR, is there a sort algorithm I've not come across which is good for data that is going to cache miss?
Edit: Maybe I worded this wrong :). I don't actually need my array sorted all the time (I'm not iterating through it sequentially for anything); I just need it sorted when I'm doing a binary chop to find a matching object, and doing that quicksort at that time (when I want to search) is currently my bottleneck, because of the cache misses and jumps. (I'm using a < operator on my object, but I'm hoping that inlines in release.)
Simple approach: insertion sort on every insert. Since your elements are not contiguous in memory, I'm guessing a linked list. If so, you could transform it into a linked list with jumps to the 10th element, the 100th and so on (essentially a skip list). This is kind of similar to the next suggestion.
Or you could reorganize your container into a binary tree (or whatever tree you like: B, B*, red-black, ...) and insert elements as you would insert them into a search tree.
Running a quicksort on each insertion is enormously inefficient. Doing a binary search and insert operation would likely be orders of magnitude faster. Using a binary search tree instead of a linear array would reduce the insert cost.
Edit: I missed that you were doing sort on extraction, not insert. Regardless, keeping things sorted amortizes sorting time over each insert, which almost has to be a win, unless you have a lot of inserts for each extraction.
If you want to keep the sort on-extract methodology, then maybe switch to merge sort, or another sort that has good performance for mostly-sorted data.
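A minimal sketch of the binary-search-and-insert idea, assuming a hypothetical Obj type with a key member:

#include <algorithm>
#include <vector>

struct Obj { int key; };   // hypothetical element type

// Keep a std::vector<Obj*> sorted on every insert: the binary search costs
// O(log n) comparisons (and thus O(log n) potential cache misses), and the
// insert only shifts pointers, which is cheap and cache-friendly.
void sorted_insert(std::vector<Obj*>& v, Obj* p) {
    auto pos = std::lower_bound(v.begin(), v.end(), p,
        [](const Obj* a, const Obj* b) { return a->key < b->key; });
    v.insert(pos, p);
}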
I think the best approach in your case would be changing your data structure to something logarithmic and rethinking your architecture, because the bottleneck of your application is not the sorting itself but the question of why you have to sort everything at all and then compensate with an on-demand sort.
Another thing you could try (based on your current implementation) is an external pointer-to-key mapping table (or function) and sorting those secondary keys instead, but I actually doubt it would help in this case.
Instead of an array of pointers, you may consider an array of structs which contain both a pointer to your object and the sort criterion. That is, instead of
struct MyType {
    // ...
    int m_SomeField; // this is the sort criterion
};
std::vector<MyType*> arr;
you may do this:
struct ArrayElement {
    MyType* m_pObj;        // the actual object
    int m_SortCriteria;    // should always be equal to m_pObj->m_SomeField
};
std::vector<ArrayElement> arr;
You may also remove the m_SomeField field from your struct, if you only access your objects via this array.
This way, in order to sort your array you won't need to dereference m_pObj on every iteration, so you'll make better use of the cache.
Of course you must keep m_SortCriteria synchronized with the object's m_SomeField (in case you edit it).
As you mention, you're going to have to do some profiling to determine whether this is a bottleneck and whether other approaches provide any relief.
Alternatives to using an array are std::set or std::multiset, which are normally implemented as red-black binary trees and so have good performance for most applications. You're going to have to weigh using them against the frequency of the sort-when-searched pattern you implemented.
In either case, I wouldn't recommend rolling your own sort or search unless you're interested in learning more about how it's done.
I would think that sorting on insertion would be better. We are talking O(log N) comparisons here, so roughly ceil(log2 N) + 1 retrievals of the data to compare against.
For N = 2000, that amounts to about 12.
What's great about this is that you can buffer the data of the element to be inserted, so those dozen or so retrievals are the only accesses you need to actually insert.
You may wish to look at some inlining, but do profile before you're sure THIS is the tight spot.
Nowadays you could use a set: either a std::set, if the values of your structure member are unique, or a std::multiset if there are duplicate values in your structure member.
One side note: the concept of working through raw pointers is in general not advisable.
STL containers (if used correctly) nearly always give you optimized performance.
Anyway, please see some example code:
#include <iostream>
#include <array>
#include <algorithm>
#include <set>
#include <iterator>

// Demo data structure, whatever
struct Data {
    int i{};
};

// -----------------------------------------------------------------------------------------
// Everything in the section below is executed at compile time, not at runtime.
// It creates an array of some thousands of pointers.
constexpr std::size_t DemoSize = 4000u;
using DemoPtrData = std::array<const Data*, DemoSize>;
using DemoData = std::array<Data, DemoSize>;

consteval DemoData createDemoData() {
    DemoData dd{};
    int k{};
    for (Data& d : dd)
        d.i = k++ * 2;
    return dd;
}
constexpr DemoData demoData = createDemoData();

consteval DemoPtrData createDemoPtrData(const DemoData& dd) {
    DemoPtrData dpd{};
    for (std::size_t k{}; k < dpd.size(); ++k)
        dpd[k] = &dd[k];
    return dpd;
}
constexpr DemoPtrData dpd = createDemoPtrData(demoData);
// -----------------------------------------------------------------------------------------

struct Comp {
    bool operator () (const Data* d1, const Data* d2) const { return d1->i < d2->i; }
};
using MySet = std::multiset<const Data*, Comp>;

int main() {
    // Add some thousand pointers. They will be sorted according to the struct member.
    MySet mySet{ dpd.begin(), dpd.end() };

    // Extract a range of data: integer values between 42 and 52
    const Data* p42 = dpd[21];
    const Data* p52 = dpd[26];

    // Show the result
    for (auto iptr = mySet.lower_bound(p42); iptr != mySet.upper_bound(p52); ++iptr)
        std::cout << (*iptr)->i << '\n';

    // Insert a new element
    Data d1{ 47 };
    mySet.insert(&d1);

    // Show again
    std::cout << "\n\n";
    for (auto iptr = mySet.lower_bound(p42); iptr != mySet.upper_bound(p52); ++iptr)
        std::cout << (*iptr)->i << '\n';
}