C++ to compare 2 lists of strings

C++ to compare 2 lists of strings - c++

In Python, set is pretty handy for comparing 2 lists of strings (see this link). I was wondering if there's a good solution for C++ in terms of performance. As each list has over 1 million strings in it.
It's case-sensitive matching.

The data types std::set<> (usually implemented as a balanced tree) and std::unordered_set<> (from C++11, implemented as a hash) are available. There is also a convenience algorithm called std::set_intersection that computes the actual intersection.
Here is an example.
#include <iostream>
#include <vector>
#include <string>
#include <set> // for std::set
#include <algorithm> // for std::set_intersection
int main()
{
std::set<std::string> s1 { "red", "green", "blue" };
std::set<std::string> s2 { "black", "blue", "white", "green" };
/* Collecting the results in a vector. The vector may grow quite
large -- it may be more efficient to print the elements directly. */
std::vector<std::string> s_both {};
std::set_intersection(s1.begin(),s1.end(),
s2.begin(),s2.end(),
std::back_inserter(s_both));
/* Printing the elements collected by the vector, just to show that
the result is correct. */
for (const std::string &s : s_both)
std::cout << s << ' ';
std::cout << std::endl;
return 0;
}
Note. If you want to use std::unordered_set<>, the std::set_intersection cannot be used like this, because it expects the input sets to be ordered. You'd have to use the usual technique of a for-loop iterating over the smaller set and finding the elements in the larger one to determine the intersection. Nevertheless, for a large number of elements (especially, strings), the hash-based std::unordered_set<> may be faster. There are also STL-compatible implementations such as the one in Boost (boost::unordered_set) and the one created by Google (sparse_hash_set and dense_hash_set). For various other implementations and benchmarks (including one for strings), see here.

If you don't need much performance I suggest using map/set from STL:
list<string> list, list2;
...
set<string> sndList;
list<string> result;
for(list<string>::iterator it = list2.begin(); it != list2.end(); ++it)
sndList.insert(*it);
for(list<string>::iteratir it = list.begin(); it != list.end(); ++it)
if(sndList.count(*it) > 0)
result.push_back(*it);
Otherwise I suggest some hashing function for comparison.

If it really is a std::list you have, sort them and use set_intersection:
list<string> words1;
list<string> words2;
list<string> common_words;
words1.sort();
words2.sort();
set_intersection(words1.begin(), words1.end(),
words2.begin(), words2.end(),
back_inserter(common_words));

Related

Is there any better way to carry out partial match of a vector of strings in c++?

I have two vectors of strings and want to carry out a partial/full match between each string i.e
std::vector<std::string> A = { "AA", "ABC" }
std::vector<std::string> B = { "AABB", "AABC", "ABC", "BC"}
I want to perform a partial/full match for all strings in A with all strings in B.
Simplest approach would be just run loops over both vectors
for(const string& s1 : A)
{
for(const string& s2 : B)
{
if(s2.find(s1) != std::string::npos)
std::cout << "partial matched" << std::endl;
}
}
Results for the above example would be:
"AA" --> "AABB", "AABC"
"ABC" --> "AABC", "ABC"
Is there any better way to do this?

The brute force method is by far the simplest. You can potentially make the code cleaner by using nested calls to std::any_of.
To be more efficient, you need to wade deep into complex string searching algorithms. I believe the state of the art in multiple-pattern search is Aho-Corasick. A variant is Commentz-Walter.
There's a BSD 2-clause (looks like)-licensed implementation of Aho-Corasick on Github, though you'd probably want to add a modified main searching function to immediately return if any match is found, instead of collecting all.
If you really want to wade deep into the reeds, you could start with this paper and follow the references.

std::set_intersection might be what you want since you do not elaborate on the context.
Assumes the request is:
The match means "full string comparison".
Need to know the partial matched result.
The time complexity of std::intersection is at most 2*(a.size() + b.size()) while the brute force method is O(n^2).
Reference here
Note: you might want to know the construction and insertion time of std::set as well.
Note 2: you may also use sorted std::vector which is cache friendly for std::set_intersection
#include <set>
#include <string>
#include <iostream>
#include <algorithm>
int main()
{
std::set<std::string> a{"bar","foo"};
std::set<std::string> b{"oof","foo","par"};
std::vector<std::string> intersect;
std::set_intersection(b.begin(),b.end(),
a.begin(),a.end(),
std::back_inserter(intersect));
std::cout << intersect.size() << std::endl;
for(std::string& matched: intersect){
std::cout << matched << std::endl;
}
return 0;
}
Live Demo

Comparing unordered_map vs unordered_set

First of all, what is the main difference between them?
The only thing i've found is that unordered_set has no operator [].
How should i access an element in unordered_set, since there is no []?
Which container is using random access to memory(or both)?
And which one of them faster in any sense or using less memory?

They are nearly identical. unordered_set only contains keys, and no values. There is no mapping from a key to a value, so no need for an operator[]. unordered_map maps a key to a value.
You can use the various find methods within unordered_set to locate things.

you can use iterators to access elements.
unordered_set <string> u{
"Dog",
"Cat",
"Rat",
"Parrot",
"bee"
};
for(auto& s:u){
cout << s << ' ';
}
unordered_set<string>::const_iterator point = u.find("bee");

How should I access an element in unordered_set (C++17)?
In C++ 17 a new function extract is added to unordered_set.
Specially, this is the only way to take move only object out of the set.
https://en.cppreference.com/w/cpp/container/unordered_set/extract
For example if you want third element of your unordered set.
Advance the iterator
std::advance(it,2);
Then extarct the value
s.extract(it).value();
Here is the complete code. try on any C++17 compiler.
#include <iostream>
#include <string>
#include <unordered_set>
#include <iterator>
int main()
{
//CREATE AN OBJECT
std::unordered_set<std::string> s;
//INSERT DATA
s.insert("aee");
s.insert("bee");
s.insert("cee");
s.insert("dee");
//NEED TO INCLUDE "iterator" HEADER TO USE "std::advance"
auto it = s.begin();
std::advance(it,2);
//USING EXTRACT
std::string sval = s.extract(it).value();
std::cout<<sval;
}
Note: if queried for out of bound index, nothing happens. No result.
Try changing your code
//ONLY FOUR ELEMENTS
std::advance(it,8);
//USING EXTRACT
std::string sval = s.extract(it).value();

How to preserve insertion order in Map? [duplicate]

I currently have a std::map<std::string,int> that stores an integer value to a unique string identifier, and I do look up with the string. It does mostly what I want, except that it does not keep track of the insertion order. So when I iterate the map to print out the values, they are sorted according to the string; but I want them to be sorted according to the order of (first) insertion.
I thought about using a vector<pair<string,int>> instead, but I need to look up the string and increment the integer values about 10,000,000 times, so I don't know whether a std::vector will be significantly slower.
Is there a way to use std::map or is there another std container that better suits my need?
I'm on GCC 3.4, and I have probably no more than 50 pairs of values in my std::map.

If you have only 50 values in std::map you could copy them to std::vector before printing out and sort via std::sort using appropriate functor.
Or you could use boost::multi_index. It allows to use several indexes.
In your case it could look like the following:
struct value_t {
string s;
int i;
};
struct string_tag {};
typedef multi_index_container<
value_t,
indexed_by<
random_access<>, // this index represents insertion order
hashed_unique< tag<string_tag>, member<value_t, string, &value_t::s> >
>
> values_t;

You might combine a std::vector with a std::tr1::unordered_map (a hash table). Here's a link to Boost's documentation for unordered_map. You can use the vector to keep track of the insertion order and the hash table to do the frequent lookups. If you're doing hundreds of thousands of lookups, the difference between O(log n) lookup for std::map and O(1) for a hash table might be significant.
std::vector<std::string> insertOrder;
std::tr1::unordered_map<std::string, long> myTable;
// Initialize the hash table and record insert order.
myTable["foo"] = 0;
insertOrder.push_back("foo");
myTable["bar"] = 0;
insertOrder.push_back("bar");
myTable["baz"] = 0;
insertOrder.push_back("baz");
/* Increment things in myTable 100000 times */
// Print the final results.
for (int i = 0; i < insertOrder.size(); ++i)
{
const std::string &s = insertOrder[i];
std::cout << s << ' ' << myTable[s] << '\n';
}

Tessil has a very nice implementaion of ordered map (and set) which is MIT license. You can find it here: ordered-map
Map example
#include <iostream>
#include <string>
#include <cstdlib>
#include "ordered_map.h"
int main() {
tsl::ordered_map<char, int> map = {{'d', 1}, {'a', 2}, {'g', 3}};
map.insert({'b', 4});
map['h'] = 5;
map['e'] = 6;
map.erase('a');
// {d, 1} {g, 3} {b, 4} {h, 5} {e, 6}
for(const auto& key_value : map) {
std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl;
}
map.unordered_erase('b');
// Break order: {d, 1} {g, 3} {e, 6} {h, 5}
for(const auto& key_value : map) {
std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl;
}
}

Keep a parallel list<string> insertionOrder.
When it is time to print, iterate on the list and do lookups into the map.
each element in insertionOrder // walks in insertionOrder..
print map[ element ].second // but lookup is in map

If you need both lookup strategies, you will end up with two containers. You may use a vector with your actual values (ints), and put a map< string, vector< T >::difference_type> next to it, returning the index into the vector.
To complete all that, you may encapsulate both in one class.
But I believe boost has a container with multiple indices.

What you want (without resorting to Boost) is what I call an "ordered hash", which is essentially a mashup of a hash and a linked list with string or integer keys (or both at the same time). An ordered hash maintains the order of the elements during iteration with the absolute performance of a hash.
I've been putting together a relatively new C++ snippet library that fills in what I view as holes in the C++ language for C++ library developers. Go here:
https://github.com/cubiclesoft/cross-platform-cpp
Grab:
templates/detachable_ordered_hash.cpp
templates/detachable_ordered_hash.h
templates/detachable_ordered_hash_util.h
If user-controlled data will be placed into the hash, you might also want:
security/security_csprng.cpp
security/security_csprng.h
Invoke it:
#include "templates/detachable_ordered_hash.h"
...
// The 47 is the nearest prime to a power of two
// that is close to your data size.
//
// If your brain hurts, just use the lookup table
// in 'detachable_ordered_hash.cpp'.
//
// If you don't care about some minimal memory thrashing,
// just use a value of 3. It'll auto-resize itself.
int y;
CubicleSoft::OrderedHash<int> TempHash(47);
// If you need a secure hash (many hashes are vulnerable
// to DoS attacks), pass in two randomly selected 64-bit
// integer keys. Construct with CSPRNG.
// CubicleSoft::OrderedHash<int> TempHash(47, Key1, Key2);
CubicleSoft::OrderedHashNode<int> *Node;
...
// Push() for string keys takes a pointer to the string,
// its length, and the value to store. The new node is
// pushed onto the end of the linked list and wherever it
// goes in the hash.
y = 80;
TempHash.Push("key1", 5, y++);
TempHash.Push("key22", 6, y++);
TempHash.Push("key3", 5, y++);
// Adding an integer key into the same hash just for kicks.
TempHash.Push(12345, y++);
...
// Finding a node and modifying its value.
Node = TempHash.Find("key1", 5);
Node->Value = y++;
...
Node = TempHash.FirstList();
while (Node != NULL)
{
if (Node->GetStrKey()) printf("%s => %d\n", Node->GetStrKey(), Node->Value);
else printf("%d => %d\n", (int)Node->GetIntKey(), Node->Value);
Node = Node->NextList();
}
I ran into this SO thread during my research phase to see if anything like OrderedHash already existed without requiring me to drop in a massive library. I was disappointed. So I wrote my own. And now I've shared it.

Here is solution that requires only standard template library without using boost's multiindex:
You could use std::map<std::string,int>; and vector <data>; where in map you store the index of the location of data in vector and vector stores data in insertion order. Here access to data has O(log n) complexity. displaying data in insertion order has O(n) complexity. insertion of data has O(log n) complexity.
For Example:
#include<iostream>
#include<map>
#include<vector>
struct data{
int value;
std::string s;
}
typedef std::map<std::string,int> MapIndex;//this map stores the index of data stored
//in VectorData mapped to a string
typedef std::vector<data> VectorData;//stores the data in insertion order
void display_data_according_insertion_order(VectorData vectorData){
for(std::vector<data>::iterator it=vectorData.begin();it!=vectorData.end();it++){
std::cout<<it->value<<it->s<<std::endl;
}
}
int lookup_string(std::string s,MapIndex mapIndex){
std::MapIndex::iterator pt=mapIndex.find(s)
if (pt!=mapIndex.end())return it->second;
else return -1;//it signifies that key does not exist in map
}
int insert_value(data d,mapIndex,vectorData){
if(mapIndex.find(d.s)==mapIndex.end()){
mapIndex.insert(std::make_pair(d.s,vectorData.size()));//as the data is to be
//inserted at back
//therefore index is
//size of vector before
//insertion
vectorData.push_back(d);
return 1;
}
else return 0;//it signifies that insertion of data is failed due to the presence
//string in the map and map stores unique keys
}

You cannot do that with a map, but you could use two separate structures - the map and the vector and keep them synchronized - that is when you delete from the map, find and delete the element from the vector. Or you could create a map<string, pair<int,int>> - and in your pair store the size() of the map upon insertion to record position, along with the value of the int, and then when you print, use the position member to sort.

One thing you need to consider is the small number of data elements you are using. It is possible that it will be faster to use just the vector. There is some overhead in the map that can cause it to be more expensive to do lookups in small data sets than the simpler vector. So, if you know that you will always be using around the same number of elements, do some benchmarking and see if the performance of the map and vector is what you really think it is. You may find the lookup in a vector with only 50 elements is near the same as the map.

Another way to implement this is with a map instead of a vector. I will show you this approach and discuss the differences:
Just create a class that has two maps behind the scenes.
#include <map>
#include <string>
using namespace std;
class SpecialMap {
// usual stuff...
private:
int counter_;
map<int, string> insertion_order_;
map<string, int> data_;
};
You can then expose an iterator to iterator over data_ in the proper order. The way you do that is iterate through insertion_order_, and for each element you get from that iteration, do a lookup in the data_ with the value from insertion_order_
You can use the more efficient hash_map for insertion_order since you don't care about directly iterating through insertion_order_.
To do inserts, you can have a method like this:
void SpecialMap::Insert(const string& key, int value) {
// This may be an over simplification... You ought to check
// if you are overwriting a value in data_ so that you can update
// insertion_order_ accordingly
insertion_order_[counter_++] = key;
data_[key] = value;
}
There are a lot of ways you can make the design better and worry about performance, but this is a good skeleton to get you started on implementing this functionality on your own. You can make it templated, and you might actually store pairs as values in data_ so that you can easily reference the entry in insertion_order_. But I leave these design issues as an exercise :-).
Update: I suppose I should say something about efficiency of using map vs. vector for insertion_order_
lookups directly into data, in both cases are O(1)
inserts in the vector approach are O(1), inserts in the map approach are O(logn)
deletes in the vector approach are O(n) because you have to scan for the item to remove. With the map approach they are O(logn).
Maybe if you are not going to use deletes as much, you should use the vector approach. The map approach would be better if you were supporting a different ordering (like priority) instead of insertion order.

This is somewhat related to Faisals answer. You can just create a wrapper class around a map and vector and easily keep them synchronized. Proper encapsulation will let you control the access method and hence which container to use... the vector or the map. This avoids using Boost or anything like that.

// Should be like this man!
// This maintains the complexity of insertion is O(logN) and deletion is also O(logN).
class SpecialMap {
private:
int counter_;
map<int, string> insertion_order_;
map<string, int> insertion_order_reverse_look_up; // <- for fast delete
map<string, Data> data_;
};

There is no need to use a separate std::vector or any other container for keeping track of the insertion order. You can do what you want as shown below.
If you want to keep the insertion order then you can use the following program(version 1):
Version 1: For counting unique strings using std::map<std::string,int> in insertion order
#include <iostream>
#include <map>
#include <sstream>
int findExactMatchIndex(const std::string &totalString, const std::string &toBeSearched)
{
std::istringstream ss(totalString);
std::string word;
std::size_t index = 0;
while(ss >> word)
{
if(word == toBeSearched)
{
return index;
}
++index;
}
return -1;//return -1 when the string to be searched is not inside the inputString
}
int main() {
std::string inputString = "this is a string containing my name again and again and again ", word;
//this map maps the std::string to their respective count
std::map<std::string, int> wordCount;
std::istringstream ss(inputString);
while(ss >> word)
{
//std::cout<<"word:"<<word<<std::endl;
wordCount[word]++;
}
std::cout<<"Total unique words are: "<<wordCount.size()<<std::endl;
std::size_t i = 0;
std::istringstream gothroughStream(inputString);
//just go through the inputString(stream) instead of map
while( gothroughStream >> word)
{
int index = findExactMatchIndex(inputString, word);
if(index != -1 && (index == i)){
std::cout << word <<"-" << wordCount.at(word)<<std::endl;
}
++i;
}
return 0;
}
The output of the above program is as follows:
Total unique words are: 9
this-1
is-1
a-1
string-1
containing-1
my-1
name-1
again-3
and-2
Note that in the above program, if you have a comma or any other delimiter then it is counted as a separate word. So for example lets say you have the string this is, my name is then the string is, has count of 1 and the string is has count of 1. That is is, and is are different. This is because the computer doesn't know our definition of a word.
Note
The above program is a modification of my answer to How do i make the char in an array output in order in this nested for loop? which is given as version 2 below:
Version 2: For counting unique characters using std::map<char, int> in insertion order
#include <iostream>
#include <map>
int main() {
std::string inputString;
std::cout<<"Enter a string: ";
std::getline(std::cin,inputString);
//this map maps the char to their respective count
std::map<char, int> charCount;
for(char &c: inputString)
{
charCount[c]++;
}
std::size_t i = 0;
//just go through the inputString instead of map
for(char &c: inputString)
{
std::size_t index = inputString.find(c);
if(index != inputString.npos && (index == i)){
std::cout << c <<"-" << charCount.at(c)<<std::endl;
}
++i;
}
return 0;
}
In both cases/versions there is no need to use a separate std::vector or any other container to keep track of the insertion order.

Use boost::multi_index with map and list indices.

A map of pair (str,int) and static int that increments on insert calls indexes pairs of data. Put in a struct that can return the static int val with an index () member perhaps?

how to find duplicates in std::vector<string> and return a list of them?

So if I have a vector of words like:
Vec1 = "words", "words", "are", "fun", "fun"
resulting list: "fun", "words"
I am trying to determine which words are duplicated, and return an alphabetized vector of 1 copy of them. My problem is that I don't even know where to start, the only thing close to it I found was std::unique_copy which doesn't exactly do what I need. And specifically, I am inputting a std::vector<std::string> but outputting a std::list<std::string>. And if needed, I can use functor.
Could someone at least push me in the right direction please? I already tried reading stl documentation,but I am just "brain" blocked right now.

In 3 lines (not counting the vector and list creation nor the superfluous line-breaks in name of readability):
vector<string> vec{"words", "words", "are", "fun", "fun"};
list<string> output;
sort(vec.begin(), vec.end());
set<string> uvec(vec.begin(), vec.end());
set_difference(vec.begin(), vec.end(),
uvec.begin(), uvec.end(),
back_inserter(output));
EDIT
Explanation of the solution:
Sorting the vector is needed in order to use set_difference() later.
The uvec set will automatically keep elements sorted, and eliminate duplicates.
The output list will be populated by the elements of vec - uvec.

Make an empty std::unordered_set<std::string>
Iterator your vector, checking whether each item is a member of the set
If it's already in the set, this is a duplicate, so add to your result list
Otherwise, add to the set.
Since you want each duplicate only listed once in the results, you can use a hashset (not list) for the results as well.

IMO, Ben Voigt started with a good basic idea, but I would caution against taking his wording too literally.
In particular, I dislike the idea of searching for the string in the set, then adding it to your set if it's not present, and adding it to the output if it was present. This basically means every time we encounter a new word, we search our set of existing words twice, once to check whether a word is present, and again to insert it because it wasn't. Most of that searching will be essentially identical -- unless some other thread mutates the structure in the interim (which could give a race condition).
Instead, I'd start by trying to add it to the set of words you've seen. That returns a pair<iterator, bool>, with the bool set to true if and only if the value was inserted -- i.e., was not previously present. That lets us consolidate the search for an existing string and the insertion of the new string together into a single insert:
while (input >> word)
if (!(existing.insert(word)).second)
output.insert(word);
This also cleans up the flow enough that it's pretty easy to turn the test into a functor that we can then use with std::remove_copy_if to produce our results quite directly:
#include <set>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
class show_copies {
std::set<std::string> existing;
public:
bool operator()(std::string const &in) {
return existing.insert(in).second;
}
};
int main() {
std::vector<std::string> words{ "words", "words", "are", "fun", "fun" };
std::set<std::string> result;
std::remove_copy_if(words.begin(), words.end(),
std::inserter(result, result.end()), show_copies());
for (auto const &s : result)
std::cout << s << "\n";
}
Depending on whether I cared more about code simplicity or execution speed, I might use an std::vector instead of the set for result, and use std::sort followed by std::unique_copy to produce the final result. In such a case I'd probably also replace the std::set inside of show_copies with an std::unordered_set instead:
#include <unordered_set>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
class show_copies {
std::unordered_set<std::string> existing;
public:
bool operator()(std::string const &in) {
return existing.insert(in).second;
}
};
int main() {
std::vector<std::string> words{ "words", "words", "are", "fun", "fun" };
std::vector<std::string> intermediate;
std::remove_copy_if(words.begin(), words.end(),
std::back_inserter(intermediate), show_copies());
std::sort(intermediate.begin(), intermediate.end());
std::unique_copy(intermediate.begin(), intermediate.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
This is marginally more complex (one whole line longer!) but likely to be substantially faster when/if the number of words gets very large. Also note that I'm using std::unique_copy primarily to produce visible output. If you just want the result in a collection, you can use the standard unique/erase idiom to get unique items in intermediate.

In place (no additional storage). No string copying (except to result list). One sort + one pass:
#include <string>
#include <vector>
#include <list>
#include <iostream>
#include <algorithm>
using namespace std;
int main() {
vector<string> vec{"words", "words", "are", "fun", "fun"};
list<string> dup;
sort(vec.begin(), vec.end());
const string empty{""};
const string* prev_p = ∅
for(const string& s: vec) {
if (*prev_p==s) dup.push_back(s);
prev_p = &s;
}
for(auto& w: dup) cout << w << ' ';
cout << '\n';
}

You can get a pretty clean implementation using a std::map to count the occurrences, and then relying on std::list::sort to sort the resulting list of words. For example:
std::list<std::string> duplicateWordList(const std::vector<std::string>& words) {
std::map<std::string, int> temp;
std::list<std::string> ret;
for (std::vector<std::string>::const_iterator iter = words.begin(); iter != words.end(); ++iter) {
temp[*iter] += 1;
// only add the word to our return list on the second copy
// (first copy doesn't count, third and later copies have already been handled)
if (temp[*iter] == 2) {
ret.push_back(*iter);
}
}
ret.sort();
return ret;
}
Using a std::map there seems a little wasteful, but it gets the job done.

Here's a better algorithm than the ones other people have proposed:
#include <algorithm>
#include <vector>
template<class It> It unique2(It const begin, It const end)
{
It i = begin;
if (i != end)
{
It j = i;
for (++j; j != end; ++j)
{
if (*i != *j)
{ using std::swap; swap(*++i, *j); }
}
++i;
}
return i;
}
int main()
{
std::vector<std::string> v;
v.push_back("words");
v.push_back("words");
v.push_back("are");
v.push_back("fun");
v.push_back("words");
v.push_back("fun");
v.push_back("fun");
std::sort(v.begin(), v.end());
v.erase(v.begin(), unique2(v.begin(), v.end()));
std::sort(v.begin(), v.end());
v.erase(unique2(v.begin(), v.end()), v.end());
}
It's better because it only requires swap with no auxiliary vector for storage, which means it will behave optimally for earlier versions of C++, and it doesn't require elements to be copyable.
If you're more clever, I think you can avoid sorting the vector twice as well.

stdext::hash_map unclear hash function

#include <iostream>
#include <hash_map>
using namespace stdext;
using namespace std;
class CompareStdString
{
public:
bool operator ()(const string & str1, const string & str2) const
{
return str1.compare(str2) < 0;
}
};
int main()
{
hash_map<string, int, hash_compare<string, CompareStdString> > Map;
Map.insert(make_pair("one", 1));
Map.insert(make_pair("two", 2));
Map.insert(make_pair("three", 3));
Map.insert(make_pair("four", 4));
Map.insert(make_pair("five", 5));
hash_map<string, int, hash_compare<string, CompareStdString> > :: iterator i;
for (i = Map.begin(); i != Map.end(); ++i)
{
i -> first; // they are ordered as three, five, two, four, one
}
return 0;
}
I want to use hash_map to keep std::string as a key. But when i insert the next pair order is confused. Why order is do not match to insert order ? how should i get the order one two three four five ??

Why order is do not match to insert order?
That's because a stdext::hash_map (and the platform-independent standard library version std::unordered_map from C++11) doesn't maintain/guarantee any reasonable order of its elements, not even insertion order. That's because it is a hashed container, with the individual elements' position based on their hash value and the size of the container. So you won't be able to maintain a reasonable order for your data with such a container.
What you can use to keep your elements in a guaranteed order is a good old std::map. But this also doesn't order elements by insertion order, but by the order induced by the comparison predicate (which can be confugured to respect insertion time, but that would be quite unintuitive and not that easy at all).
For anything else you won't get around rolling your own (or search for other libraries, don't know if boost has something like that). For example add all elements to a linear std::vector/std::list for insertion order iteration and maintain an additional std::(unordered_)map pointing into that vector/list for O(1)/O(log n) retrieval if neccessary.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ to compare 2 lists of strings - c++

In Python, set is pretty handy for comparing 2 lists of strings (see this link). I was wondering if there's a good solution for C++ in terms of performance. As each list has over 1 million strings in it. It's case-sensitive matching.

If it really is a std::list you have, sort them and use set_intersection: list<string> words1; list<string> words2; list<string> common_words; words1.sort(); words2.sort(); set_intersection(words1.begin(), words1.end(), words2.begin(), words2.end(), back_inserter(common_words));

Related

Is there any better way to carry out partial match of a vector of strings in c++?

Comparing unordered_map vs unordered_set

How to preserve insertion order in Map? [duplicate]

how to find duplicates in std::vector<string> and return a list of them?

stdext::hash_map unclear hash function

Categories

Resources