unordered_multiset pointer in C++ - c++

I am a beginner in C++, and I want to use the insert function of the unordered_multiset pointer below to add a new element:
struct Customer {
size_t operator()(const char& c) const;
};
unordered_multiset<char, Customer>* ms
Can anyone help?

void populate_multiset(const string& s, unordered_multiset<char, CustomHasher>* ms)
Given that this function accepts a string and your unordered_multiset holds char, you can only insert one char at a time:
for(size_t i = 0; i<s.size(); i++) {
ms->insert(s[i]); // insert each individual char
}
Or use the iterators to insert a range of char
ms->insert(s.begin(), s.end());
Also, since the standard library already provides a way to hash a char, you can simply declare
unordered_multiset<char> ms;
However, if you do want to provide a custom hash function, you can. And the syntax is exactly like what you have in your question.
A far more common way to pass a container to a function is by reference, e.g.
void populate_multiset(const string& s, unordered_multiset<char, CustomHasher>& ms)
Then, you can use . instead of -> to do the exact same thing.
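To make the by-reference version concrete, here is a minimal sketch. The body of CustomHasher is my own assumption (the question only shows its declaration), and count_in is a hypothetical helper added just to show the result:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Assumed hasher body; the question only declares operator().
struct CustomHasher {
    std::size_t operator()(const char& c) const {
        return static_cast<std::size_t>(static_cast<unsigned char>(c));
    }
};

// The container is taken by reference, so members are reached with '.'.
void populate_multiset(const std::string& s,
                       std::unordered_multiset<char, CustomHasher>& ms) {
    ms.insert(s.begin(), s.end());  // range insert: one element per char
}

// Hypothetical helper: how many times does c occur in s?
std::size_t count_in(const std::string& s, char c) {
    std::unordered_multiset<char, CustomHasher> ms;
    populate_multiset(s, ms);
    assert(ms.size() == s.size());  // a multiset keeps duplicates
    return ms.count(c);
}
```

For example, count_in("hello", 'l') gives 2, because a multiset stores both occurrences.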

Related

How to use std::string as key in stxxl::map

I am trying to use std::string as a key in the stxxl::map
The insertion was fine for small number of strings about 10-100.
But while trying to insert large number of strings about 100000 in it, I am getting segmentation fault.
The code is as follows:
struct CompareGreaterString {
bool operator () (const std::string& a, const std::string& b) const {
return a > b;
}
static std::string max_value() {
return "";
}
};
// template parameter <KeyType, DataType, CompareType, RawNodeSize, RawLeafSize, PDAllocStrategy (optional)>
typedef stxxl::map<std::string, unsigned int, CompareGreaterString, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> name_map;
name_map strMap((name_map::node_block_type::raw_size)*3, (name_map::leaf_block_type::raw_size)*3);
for (unsigned int i = 0; i < 1000000; i++) { /// Inserting 1 million strings
std::stringstream strStream;
strStream << (i);
Console::println("Inserting: " + strStream.str());
strMap[strStream.str()]=i;
}
I cannot work out why I am unable to insert a larger number of strings. I get a segmentation fault exactly while inserting "1377". However, I am able to add any number of integers as keys. I suspect that the variable size of the strings might be causing the trouble.
Also I am unable to understand what to return for max_value of the string. I simply returned a blank string.
According to documentation:
CompareType must also provide a static max_value method, that returns a value of type KeyType that is larger than any key stored in map
Because empty string happens to compare as smaller than any other string, it breaks this precondition and may thus cause unspecified behaviour.
Here's a max_value that should work. MAX_KEY_LEN is just an integer which is larger or equal to the length of the longest possible string key that the map can have.
struct CompareGreaterString {
// ...
static std::string max_value() {
return std::string(MAX_KEY_LEN, std::numeric_limits<unsigned char>::max());
}
};
I have finally found the solution to my problem with great help from Timo Bingmann, user2079303 and Martin Ba. Thank you.
I would like to share it with you.
First, stxxl supports POD types only. That means it stores fixed-size structures only, so std::string cannot be a key. stxxl::map worked for about 100-1000 strings because they still fit in physical memory. When more strings are inserted, it has to write to disk, which internally causes problems.
Hence we need to use a fixed string using char[] as follows:
static const int MAX_KEY_LEN = 16;
class FixedString {
public:
char charStr[MAX_KEY_LEN];
bool operator< (const FixedString& fixedString) const {
return std::lexicographical_compare(charStr, charStr+MAX_KEY_LEN,
fixedString.charStr, fixedString.charStr+MAX_KEY_LEN);
}
bool operator==(const FixedString& fixedString) const {
return std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
}
bool operator!=(const FixedString& fixedString) const {
return !std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
}
};
struct comp_type : public std::less<FixedString> {
static FixedString max_value()
{
FixedString s;
std::fill(s.charStr, s.charStr+MAX_KEY_LEN, 0x7f);
return s;
}
};
Please note that all the operators (mainly (), == and !=) need to be overloaded for all the stxxl::map functions to work.
Now we may define fixed_name_map for map as follows:
typedef stxxl::map<FixedString, unsigned int, comp_type, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> fixed_name_map;
fixed_name_map myFixedMap((fixed_name_map::node_block_type::raw_size)*5, (fixed_name_map::leaf_block_type::raw_size)*5);
Now the program compiles fine and accepts about 10^8 strings without any problem.
We can also use myFixedMap like std::map itself (for example, myFixedMap[fixedString] = 10).
If you are using C++11, then as an alternative to the FixedString class you could use std::array<char, MAX_KEY_LEN>. It is an STL layer on top of an ordinary fixed-size C array, implementing comparisons and iterators as you are used to from std::string, but it's a POD type, so STXXL should support it.
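A rough sketch of the std::array idea follows. I have not run this against STXXL, so std::map stands in for stxxl::map here, and make_key and lookup_demo are hypothetical helpers of my own. Note that operator< on std::array<char, N> compares plain char, which may be signed on your platform.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <type_traits>

const std::size_t MAX_KEY_LEN = 16;
typedef std::array<char, MAX_KEY_LEN> FixedKey;

// Fixed size and trivially copyable, which is what a POD-only container needs.
static_assert(std::is_trivially_copyable<FixedKey>::value,
              "FixedKey must be trivially copyable");

// Hypothetical helper: copies at most MAX_KEY_LEN chars and zero-pads the rest.
FixedKey make_key(const std::string& s) {
    FixedKey key;
    key.fill('\0');
    std::copy_n(s.begin(), std::min(s.size(), MAX_KEY_LEN), key.begin());
    return key;
}

// std::map is only a stand-in for stxxl::map in this sketch.
unsigned int lookup_demo() {
    std::map<FixedKey, unsigned int> m;
    m[make_key("1377")] = 1377;
    m[make_key("42")] = 42;
    assert(make_key("42") < make_key("43"));  // lexicographic operator<
    return m[make_key("1377")];
}
```

Keys longer than MAX_KEY_LEN are silently truncated by this helper, so two long keys with the same prefix would collide.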
Alternatively, you can use serialization_sort in TPIE. It can sort elements of type std::pair<std::string, unsigned int> just fine, so if all you need is to insert everything in bulk and then access it in bulk, this will be sufficient for your case (and probably faster depending on the exact case).

Sorting Array of Struct's based on String

I've been reading all the topics related to sorting arrays of structs, but haven't had any luck as of yet, so I'll just ask. I have a struct:
struct question{
string programNum;
string programDesc;
string programPoints;
string programInput;
string programQuestion;
};
I populate an array of question in main, so I now have an array called questions[]. Now I need to write a sort that will sort questions[] based on question.programQuestion. Based on what I've read, this is where I'm at, but I'm not sure if it's even close:
int myCompare (const void *v1, const void *v2 ) {
const struct question* p1 = static_cast<const struct question*>(v1);
const struct question* p2 = static_cast<const struct question*>(v2);
if (p1->programQuestion > p2->programQuestion){
return(+1);}
else if (p1->programQuestion < p2->programQuestion){
return(-1);}
else{
return(0);}
}
If this is right I'm not sure how to call it in main. Thanks for any help!
If you're intending to use std::sort to sort this array, you likely want to declare an operator< as a method in this struct. Something like this:
struct question{
string programNum;
string programDesc;
string programPoints;
string programInput;
string programQuestion;
bool operator<( const question &rhs) const;
};
bool question::operator<( const question &rhs ) const
{
return programQuestion < rhs.programQuestion;
}
The comparison function you were attempting to declare above appears to be the type qsort expects, and I would not recommend trying to qsort an array of these struct questions.
Just use std::sort. It's safer, nearly always faster (sometimes by huge margins), and generally easier to get right.
Unless there is some important reason not to do so, I would use a std::vector instead of a plain array. It is easier and safer. You could use the following code to sort your vector:
std::vector<question> questions;
// add some elements to the vector
std::sort(begin(questions), end(questions),
[](const question& q1, const question& q2) {
return q1.programQuestion < q2.programQuestion;
});
This code uses some C++11 features. But you could achieve the same in previous versions of C++ by using a function object, or simply by implementing operator< in the struct (assuming you always want to sort such a struct based on that field).
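For the function-object route, something along these lines should work without lambdas. ByProgramQuestion, sort_questions and first_question_after_sort are names I made up for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct question {
    std::string programNum;
    std::string programDesc;
    std::string programPoints;
    std::string programInput;
    std::string programQuestion;
};

// Function object: orders two questions by their programQuestion field.
struct ByProgramQuestion {
    bool operator()(const question& a, const question& b) const {
        return a.programQuestion < b.programQuestion;
    }
};

// Sorting a std::vector.
void sort_questions(std::vector<question>& questions) {
    std::sort(questions.begin(), questions.end(), ByProgramQuestion());
}

// Sorting a plain array: pointers act as iterators.
void sort_questions(question* questions, std::size_t count) {
    std::sort(questions, questions + count, ByProgramQuestion());
}

// Hypothetical demo with made-up data.
std::string first_question_after_sort() {
    question a, b, c;
    a.programQuestion = "pear";
    b.programQuestion = "apple";
    c.programQuestion = "mango";
    std::vector<question> qs;
    qs.push_back(a);
    qs.push_back(b);
    qs.push_back(c);
    sort_questions(qs);
    assert(qs.size() == 3);
    return qs.front().programQuestion;
}
```

The array overload answers the "how do I call it in main" part: pass questions and the number of elements you filled in.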

How to sort vector of pointer-to-struct

I'm trying to sort a concurrent_vector<hits_object*>, where hits_object is:
struct hits_object{
unsigned long int hash;
int position;
};
Here is the code I'm using:
concurrent_vector<hits_object*> hits;
for(i=0;...){
hits_object *obj=(hits_object*)malloc(sizeof(hits_object));
obj->position=i;
obj->hash=_prevHash[tid];
hits[i]=obj;
}
Now I have filled up a concurrent_vector<hits_object*> called hits.
But I want to sort this concurrent_vector on position property!!!
Here is an example of what's inside a typical hits object:
0 1106579628979812621
4237 1978650773053442200
512 3993899825106178560
4749 739461489314544830
1024 1629056397321528633
5261 593672691728388007
1536 5320457688954994196
5773 9017584181485751685
2048 4321435111178287982
6285 7119721556722067586
2560 7464213275487369093
6797 5363778283295017380
3072 255404511111217936
7309 5944699400741478979
3584 1069999863423687408
7821 3050974832468442286
4096 5230358938835592022
8333 5235649807131532071
I want to sort this based on the first column ("position" of type int). The second column is "hash" of type unsigned long int.
Now I've tried to do the following:
std::sort(hits.begin(),hits.end(),compareByPosition);
where compareByPosition is defined as:
int compareByPosition(const void *elem1,const void *elem2 )
{
return ((hits_object*)elem1)->position > ((hits_object*)elem2)->position? 1 : -1;
}
but I keep getting segmentation faults when I put in the line std::sort(hits.begin(),hits.end(),compareByPosition);
Please help!
Your compare function needs to return a boolean 0 or 1, not an integer 1 or -1, and it should have a strongly-typed signature:
bool compareByPosition(const hits_object *elem1, const hits_object *elem2 )
{
return elem1->position < elem2->position;
}
The errors you were seeing are due to std::sort interpreting every non-zero value returned from the comp function as true, meaning that the left-hand side is less than the right-hand side.
NOTE : This answer has been heavily edited as the result of conversations with sbi and Mike Seymour.
int (*)(void*, void*) is the comparator for C qsort() function. In C++ std::sort() the prototype to the comparator is:
bool cmp(const hits_object* lhs, const hits_object* rhs)
{
return lhs->position < rhs->position;
}
std::sort(hits.begin(), hits.end(), &cmp);
Alternatively, you can store std::pair values instead; a pair's default operator< compares the first members and, if they are equal, the second members:
typedef std::pair<int, unsigned long int> hits_object; // first = position, second = hash
// ...
std::sort(hits.begin(), hits.end());
Without knowing what concurrent_vector is, I can't be sure what's causing the segmentation fault. Assuming it's similar to std::vector, you need to populate it with hits.push_back(obj) rather than hits[i] = obj; you cannot use [] to access elements beyond the end of a vector, or to access an empty vector at all.
The comparison function should be equivalent to a < b, returning a boolean value; it's not a C-style comparison function returning negative, positive, or zero. Also, since sort is a template, there's no need for C-style void * arguments; everything is strongly typed:
bool compareByPosition(hits_object const * elem1, hits_object const * elem2) {
return elem1->position < elem2->position;
}
Also, you usually don't want to use new (and certainly never malloc) to create objects to store in a vector; the simplest and safest container would be vector<hits_object> (and a comparator that takes references, rather than pointers, as arguments). If you really must store pointers (because the objects are expensive to copy and not movable, or because you need polymorphism - neither of which apply to your example), either use smart pointers such as std::unique_ptr, or make sure you delete them once you're done with them.
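As a sketch of both suggestions, here is the by-value version and a std::unique_ptr version, using std::vector since I don't know what concurrent_vector provides. The hash values are made up, and the two function names are mine:

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct hits_object {
    unsigned long int hash;
    int position;
};

// Comparator taking references, for a vector that stores objects by value.
bool compareByPosition(const hits_object& a, const hits_object& b) {
    return a.position < b.position;
}

std::vector<hits_object> sorted_by_value() {
    std::vector<hits_object> hits;
    hits.push_back(hits_object{111UL, 4237});  // made-up hashes
    hits.push_back(hits_object{222UL, 0});
    hits.push_back(hits_object{333UL, 512});
    std::sort(hits.begin(), hits.end(), compareByPosition);
    assert(std::is_sorted(hits.begin(), hits.end(), compareByPosition));
    return hits;
}

// Only if pointers are really needed: unique_ptr frees the objects for you.
std::vector<std::unique_ptr<hits_object>> sorted_by_pointer() {
    std::vector<std::unique_ptr<hits_object>> hits;
    hits.emplace_back(new hits_object{111UL, 4237});
    hits.emplace_back(new hits_object{222UL, 0});
    hits.emplace_back(new hits_object{333UL, 512});
    std::sort(hits.begin(), hits.end(),
              [](const std::unique_ptr<hits_object>& a,
                 const std::unique_ptr<hits_object>& b) {
                  return a->position < b->position;
              });
    return hits;
}
```

Both versions grow the vector with push_back or emplace_back instead of writing through [] past the end.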
The third argument you pass to std::sort() must have a signature similar to, and the semantics of, operator<():
bool is_smaller_position(const hits_object* lhs, const hits_object* rhs)
{
return lhs->position < rhs->position;
}
When you store pointers in a vector, you cannot overload operator<(), because smaller-than is fixed for all built-in types.
On a sidenote: Do not use malloc() in C++, use new instead. Also, I wonder why you are not using objects, rather than pointers. Finally, if concurrent_vector is anything like std::vector, you need to explicitly make it expand to accommodate new objects. This is what your code would then look like:
concurrent_vector<hits_object> hits; // stores objects, not pointers
for(i=0;...){
hits_object obj;
obj.position=i;
obj.hash=_prevHash[tid];
hits.push_back(obj);
}
This doesn't look right:
for(i=0;...){
hits_object *obj=(hits_object*)malloc(sizeof(hits_object));
obj->position=i;
obj->hash=_prevHash[tid];
hits[i]=obj;
}
Here the array is already sorted by position, because you set position to i, which is also the index into hits!
Also, why use malloc? You should use new (and delete) instead. You could then add a simple constructor to the structure to initialize the hits_object,
e.g.
struct hits_object
{
int position;
unsigned long int hash; // same type as in the question
hits_object( int p, unsigned long int h ) : position(p), hash(h) {}
};
then later write instead
hits_object* obj = new hits_object( i, _prevHash[tid] );
or even
hits.push_back( new hits_object( i, _prevHash[tid] ) );
Finally, your compare function should use the same data type as vector for its arguments
bool cmp( hits_object* p1, hits_object* p2 )
{
return p1->position < p2->position;
}
You can pass a lambda instead of a function to std::sort.
struct test
{
int x;
};
std::vector<test> tests;
std::sort(tests.begin(), tests.end(),
[](const test& a, const test& b) // the vector holds objects, so take references
{
return a.x < b.x;
});

Avoid temporaries in std::map/std::unordered_map lookup with std::string key [duplicate]

Consider the following code:
std::map<std::string, int> m1;
auto i = m1.find("foo");
const char* key = ...
auto j = m1.find(key);
This will create a temporary std::string object for every map lookup. What are the canonical ways to avoid it?
Don't use pointers; instead, pass strings directly. Then you can take advantage of references:
void do_something(std::string const & key)
{
auto it = m.find(key);
// ....
}
C++ typically becomes "more correct" the more you use its idioms and don't try to write C with it.
You can avoid the temporary by giving the std::map a custom comparator class, that can compare char *s. (The default will use the pointer's address, which isn't what you want. You need to compare on the string's value.)
Thus, something like:
class StrCmp
{
public:
bool operator () (const char *a, const char *b) const // const, so a const comparator can be called
{
return strcmp(a, b) < 0;
}
};
// Later:
std::map<const char *, int, StrCmp> m;
Then, use it like a normal map, but pass char *'s. Keep in mind that anything you store in the map must remain alive for the lifetime of the map. That means you need string literals, or you have to keep the data pointed to by the pointer alive on your own. For these reasons, I'd go with a std::map<std::string, int> and eat the temporary until profiling showed that the above was really needed.
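A small usage sketch of the comparator approach, under the assumption that the keys are string literals (which live for the whole program). StrCmp is written as a struct here, and lookup is a hypothetical helper:

```cpp
#include <cassert>
#include <cstring>
#include <map>

// Orders C strings by content, not by address.
struct StrCmp {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Hypothetical helper: returns the mapped value, or -1 if the key is absent.
int lookup(const char* key) {
    std::map<const char*, int, StrCmp> m;
    m["foo"] = 1;  // string literals outlive the map
    m["bar"] = 2;
    assert(m.size() == 2);
    std::map<const char*, int, StrCmp>::const_iterator it = m.find(key);
    return it == m.end() ? -1 : it->second;
}
```

Because the comparison looks at the characters, a key held in a different buffer with the same contents still finds the entry, and no std::string is constructed for the lookup.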
There is no way to avoid a temporary std::string instance that copies character data. Note that this cost is very low and does not incur dynamic memory allocation if your standard library implementation uses short string optimizations.
However, if you need to proxy C-style strings on a frequent basis, you can still come up with custom solutions that will by-pass this allocation. This can pay off if you have to do this really often and your strings are lengthy enough not to benefit from short string optimizations.
If you only need a very small subset of string functionality (e.g. only assignment and copies), then you can write a small special-purpose string class that stores a const char * pointer and a function to release the memory.
class cheap_string
{
public:
typedef void(*Free)(const char*);
private:
const char * myData;
std::size_t mySize;
Free myFree;
public:
// direct member assignments, use with care.
cheap_string ( const char * data, std::size_t size, Free free );
// releases using custom deleter (a no-op for proxies).
~cheap_string ();
// create real copies (safety first).
cheap_string ( const cheap_string& );
cheap_string& operator= ( const cheap_string& );
cheap_string ( const char * data );
cheap_string ( const char * data, std::size_t size )
: myData(0), mySize(size), myFree(&destroy)
{
// myData points to const char, so fill a writable buffer first.
char * copy = new char[size+1];
memcpy(copy, data, size);
copy[size] = '\0';
myData = copy;
}
const char * data () const;
std::size_t size () const;
// whatever string functionality you need.
bool operator< ( const cheap_string& ) const;
bool operator== ( const cheap_string& ) const;
// create proxies for existing character buffers.
static const cheap_string proxy ( const char * data )
{
return cheap_string(data, strlen(data), &abandon);
}
static const cheap_string proxy ( const char * data, std::size_t size )
{
return cheap_string(data, size, &abandon);
}
private:
// deleter for proxies (no-op)
static void abandon ( const char * data )
{
// no-op, this is used for proxies, which don't own the data!
}
// deleter for copies (delete[]).
static void destroy ( const char * data )
{
delete [] data;
}
};
Then, you can use this class as:
std::map<cheap_string, int> m1;
auto i = m1.find(cheap_string::proxy("foo"));
The temporary cheap_string instance does not create a copy of the character buffer like std::string does, yet it preserves safe copy semantics for storing instances of cheap_string in standard containers.
notes: if your implementation does not use return value optimization, you'll want to find an alternate syntax for the proxy method, such as a constructor with a special overload (taking a custom proxy_t type à la std::nothrow for placement new).
Well, map's find actually accepts a constant reference to the key, so you cannot avoid creating it at one point or another.
For the first part of the code you can have a constant static std::string with value "foo" to look up. That way you won't create copies.
If you want to go the Spartan way, you can always create your own type that can be used like a string, but can also hold a pointer to a string literal.
But in any event, the overhead associated with map lookups is so large that this doesn't really make sense. If I were you I'd first replace map/unordered_map with Google's dense hash. Then I would run Intel's VTune (Amplifier these days), see where the time is going and optimize those places. I doubt strings as keys will show up in a top-ten list of bottlenecks.
Take a look at the StringRef class from llvm.
They can be constructed very cheaply from C strings, string literals or std::string. If you made a map of those instead of std::string, the construction would be very fast.
It's a very fragile system though. You need to be sure that whatever the source of the strings you insert stays alive and unmodified for the lifetime of the map.
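One more option, if a C++14 compiler is available: std::map supports heterogeneous lookup when its comparator is transparent, such as std::less<>. find then compares the stored std::string keys directly against the const char*, so no temporary key is built. As far as I know, the unordered containers only gained an equivalent in C++20. A minimal sketch, where TransparentMap, find_or and lookup_demo are names I chose:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// std::less<> is a transparent comparator (C++14).
typedef std::map<std::string, int, std::less<>> TransparentMap;

// Hypothetical helper: looks up a C string without building a std::string.
int find_or(const TransparentMap& m, const char* key, int fallback) {
    auto it = m.find(key);  // template overload of find, enabled by std::less<>
    return it == m.end() ? fallback : it->second;
}

int lookup_demo() {
    TransparentMap m1;
    m1["foo"] = 7;
    assert(m1.count("foo") == 1);  // count is heterogeneous as well
    return find_or(m1, "foo", -1);
}
```

Insertion still constructs a std::string, since the map owns its keys; only the lookups avoid the temporary.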

Templates, STL, C++

I wrote this routine to order items, keep only unique items, where it takes in an array of type T, and the size of the array. It returns the new size of the array after processing.
template <class T>
int reduce(T array[], int size) {
T *begin = array;
T *end = array + size;
sort(begin, end);
T *end_new = unique(begin, end);
return end_new - array;
}
My question is I was expecting it to sort const char *data like
{"aa", "bb", "bc", "ca", "bc", "aa", "cc", "cd", "ca", "bb"};
into //aa bb bc ca cc cd
However it does it the opposite way, : "cd cc ca bc bb aa"
Why does it do that? Does it not use the standard string comparisons? If I wanted to, how could I alter it so it would order const char * alphabetically? thanks.
sort() uses operator< by default, which would just compare the addresses in your case.
If you want to sort C-strings, you have to pass a comparator to sort(). To do this generically you can let the user pass a comparator, use specialization on a comparator function or a combination of these:
template<class T> bool my_comp(T a, T b) {
return a < b;
}
template<> bool my_comp<const char*>(const char* a, const char* b) {
return std::strcmp(a, b) < 0;
}
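// Comp needs its own default: it cannot be deduced from a default argument (C++11).
template<class T, class Comp = bool (*)(T, T)>
int reduce(T array[], size_t size, Comp comp = my_comp<T>) {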
// ...
std::sort(begin, end, comp);
// ...
}
std::sort uses < by default, and < on char pointers compares addresses, which has nothing to do with the lexicographic ordering of the strings. You can provide an additional parameter to sort to tell it how to compare, or you can use an array of std::string or similar instead of const char*.
const char*'s < operator performs pointer comparison, not string data comparison. Either use std::string for your string data, or specialize reduce so that it calls sort with a special comparator based on strcmp. I'm guessing you got the output you did because your compiler decided to reverse-alphabetize all of its string constants in memory.
unique also isn't doing anything -- it only works in the first place because your compiler pooled all of the strings in memory at compile time, so that all of the "bb" strings would use the same memory. If you read the exact same strings from a file, your array wouldn't change. The solutions to this problem are the same.
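Putting the two points together, here is a sketch that passes a strcmp-based ordering to sort and a strcmp-based equality to unique, so the result no longer depends on how the compiler lays out string literals. reduce_by, CStrLess, CStrEqual, reduce_strings and demo are my own names, not from the question:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Orders C strings by content.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Treats C strings with the same content as duplicates.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Sorts, drops duplicates, and returns the new logical size.
template <class T, class Less, class Equal>
std::size_t reduce_by(T array[], std::size_t size, Less less, Equal equal) {
    T* end = array + size;
    std::sort(array, end, less);
    T* new_end = std::unique(array, end, equal);
    return static_cast<std::size_t>(new_end - array);
}

std::size_t reduce_strings(const char* array[], std::size_t size) {
    return reduce_by(array, size, CStrLess(), CStrEqual());
}

std::size_t demo() {
    const char* data[] = {"aa", "bb", "bc", "ca", "bc",
                          "aa", "cc", "cd", "ca", "bb"};
    std::size_t n = reduce_strings(data, 10);
    assert(std::strcmp(data[0], "aa") == 0);  // smallest string comes first
    return n;
}
```

With the data from the question, the first six elements end up as aa bb bc ca cc cd.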