Undefined behavior in std::unordered_set with custom predicate - c++

I have an std::unordered_set that is supposed to store pointers to values stored in an std::list. The values are first added to the list, then their pointers are inserted into the set. The set uses a predicate that compares the values the pointers point to instead of the addresses. This produces undefined behavior.
Here's a minimal working example:
#include <unordered_set>
#include <iostream>
#include <cstdint>
#include <string>
#include <list>
using namespace std;
template<typename T> struct set_hash {
size_t operator()(const T* p) const noexcept {
return reinterpret_cast<uintptr_t>(p);
}
};
template<typename T> struct set_eq {
bool operator()(const T* a, const T* b) const noexcept {
std::cout << "*a["<<*a<<"] == *b["<<*b<<"] "
<< boolalpha << (*a == *b) << std::endl;
return *a == *b;
}
};
template<typename T> using set_t =
std::unordered_set<const T*, set_hash<T>, set_eq<T>>;
int main()
{
set_t<string> set;
list<string> list{"a", "b", "a", "c", "a", "d"};
for (auto& str : list) {
set.insert(&str);
cout << str << ' ';
}
cout << endl;
for (auto p : set) cout << *p << ' ';
cout << endl;
string c("c");
cout << **set.find(&c) << endl;
return 0;
}
After running the program multiple times I get three possible outputs:
a b a c a d
d a c a b a
Segmentation fault

a b a c a d
d a c a b a
*a[c] == *b[a] false
Segmentation fault

a b a c a d
d a c a b a
*a[c] == *b[c] true
c
The output I expect is
a b a c a d
a b c d (not necessarily in this order)
c
with some lines like *a[c] == *b[c] true, depending on how many times the predicate is called.
I do not understand what results in undefined behavior.
I get identical results with gcc4.8.2, gcc4.9.1, and gcc4.9.2.

The problem is that the hash function hashes the pointers, while the comparison function compares the values the pointers point to. If two different list elements hold the same value, the hash function gives them different hashes (derived from their addresses), yet the comparison function reports them as equal. An unordered container requires that keys which compare equal also hash equal.
The hash function needs to be consistent with the comparison: it must hash the values instead of the pointers.
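A minimal sketch of a consistent pair of functors, assuming std::hash<T> is available for the pointed-to type. The names value_hash, value_eq, value_set and demo_value_set are made up for this illustration and are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_set>

// Hash the pointed-to value, so equal values always produce equal hashes.
template <typename T>
struct value_hash {
    std::size_t operator()(const T* p) const { return std::hash<T>{}(*p); }
};

// Compare the pointed-to values, as set_eq does in the question.
template <typename T>
struct value_eq {
    bool operator()(const T* a, const T* b) const { return *a == *b; }
};

template <typename T>
using value_set = std::unordered_set<const T*, value_hash<T>, value_eq<T>>;

// Hypothetical demo: the same data as in the question.
inline void demo_value_set() {
    std::list<std::string> items{"a", "b", "a", "c", "a", "d"};
    value_set<std::string> set;
    for (const auto& s : items) set.insert(&s);
    assert(set.size() == 4);  // the distinct values a, b, c, d

    // Lookup through a pointer to a different object with an equal value.
    std::string c("c");
    auto it = set.find(&c);
    assert(it != set.end());  // check before dereferencing
    assert(**it == "c");
    assert(*it != &c);        // the stored pointer refers to the list element
}
```

Calling demo_value_set() from main should pass all assertions. Checking the result of find against end() before dereferencing it also avoids a crash when a lookup fails.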

Related

How does comparator in a set works with functor in C++?

Here is a simple program to show my point:
#include <iostream>
#include <set>
class comparator
{
public:
bool operator()(int* a, int* b){return *a < *b;}
};
int main()
{
std::set<int*> v1{new int{1}, new int{1}, new int{2}, new int{2}};
std::set<int*, comparator> v2{new int{1}, new int{1}, new int{2}, new int{2}};
std::cout << v1.size() << std::endl; // 4
std::cout << v2.size() << std::endl; // 2
return 0;
}
Without the functor, the set removes duplicates based on the addresses of the integers. With the functor, it removes duplicates based on the values. The problem is that in the functor I didn't define the operator to return true on duplicate values, so why would it show this behavior?
I didn't define the operator to return true on duplicate values, so why would it show this behavior?
Because a std::set is designed to work with a "less than" comparator, and it is implemented that way. If, for two values x and y, both x<y and y<x are false, then x and y are considered equivalent and are therefore treated as duplicates.
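The rule above can be sketched directly. The names deref_less, equivalent and demo_equivalence are hypothetical and only serve this illustration.

```cpp
#include <cassert>
#include <set>

// "Less than" on the pointed-to values, like the comparator in the question.
struct deref_less {
    bool operator()(const int* a, const int* b) const { return *a < *b; }
};

// Two keys are equivalent when neither one orders before the other.
inline bool equivalent(const int* a, const int* b) {
    deref_less less;
    return !less(a, b) && !less(b, a);
}

// Hypothetical demo: three distinct addresses, two distinct values.
inline void demo_equivalence() {
    int a = 1, b = 1, c = 2;
    assert(equivalent(&a, &b));   // equal values: neither is less
    assert(!equivalent(&a, &c));  // 1 < 2, so they are not equivalent

    std::set<const int*, deref_less> s;
    s.insert(&a);
    s.insert(&b);  // rejected: equivalent to &a
    s.insert(&c);
    assert(s.size() == 2);
}
```

The set never calls an equality operator; it derives equivalence from the comparator alone, which is why a "less than" functor is enough to collapse duplicate values.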

Why is the output of the maximum of two string literals wrong?

Can someone explain, why the output is "C" in this code?
#include <iostream>
using namespace std;
template<class X>
X maximum(X a,X b)
{
if(a > b)
return a;
else
return b;
}
int main() {
cout << maximum("C","D") << endl;
}
Note that in your case the type X will be inferred as const char*, hence you are comparing two const char *s i.e. the addresses of the two string literals.
If you want to get the expected result, use something like the following
cout << maximum("C"s, "D"s) << endl;
This passes std::string objects instead of the addresses of the string literals.
See string literal operator
Or use characters instead of string literals, i.e. 'C' and 'D'; in that case X will be inferred as char.
See also: Why is "using namespace std;" considered bad practice?
When you use maximum("C","D"), the template parameter is char const*. You end up comparing two pointers. There is no guarantee which pointer will be greater: the result of comparing pointers to unrelated objects is unspecified.
If you want to compare the string "C" and string "D", you can use:
cout << maximum(std::string("C"), std::string("D")) << endl; // or
cout << maximum("C"s, "D"s) << endl; // or
cout << maximum<std::string>("C", "D");
If you want to compare just the characters C and D, you should use
cout << maximum('C', 'D') << endl;
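The deduction described above can be checked at compile time. This is a small sketch, not part of the original answers; demo_maximum is a hypothetical name, and the template is the one from the question written in conditional-operator form.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template <class X>
X maximum(X a, X b) {
    return a > b ? a : b;
}

// With string literal arguments X is deduced as const char*, so the
// comparison inside maximum compares addresses, not characters.
static_assert(std::is_same<decltype(maximum("C", "D")), const char*>::value,
              "string literals decay to const char*");

// With character arguments X is deduced as char.
static_assert(std::is_same<decltype(maximum('C', 'D')), char>::value,
              "character literals have type char");

// Hypothetical demo of the alternatives listed above.
inline void demo_maximum() {
    assert(maximum(std::string("C"), std::string("D")) == "D");
    assert(maximum<std::string>("C", "D") == "D");  // literals converted to std::string
    assert(maximum('C', 'D') == 'D');
}
```

Only the std::string and char calls are asserted at run time, because the result of the const char* call depends on where the compiler places the two literals.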
If you want this to work for cstring-literals also, add a specialization.
#include <cstring>
template<class X>
X maximum(X a, X b)
{
return a > b ? a : b;
}
template<>
char const* maximum<char const*>(char const* a, char const* b)
{
return std::strcmp(a, b) > 0 ? a : b;
}

insertion into unordered_map got lost

I have the following code, but after running it, result is empty. Any ideas why? A reference to result in main was passed to MyClass, so I thought addToResult would actually add data to result, and I expected a map with key "test" and value "1": "1". I'm fairly new to C++. Thanks!
#include <iostream>
#include <string>
#include <unordered_map>
using LookUpTable = std::unordered_map<std::string, std::string>;
using DLTable = std::unordered_map<std::string, LookUpTable>;
class MyClass
{
public:
MyClass(DLTable& dltable) {
m_dltable = dltable;
};
void addToResult() {
LookUpTable ee;
ee.emplace("1", "1");
m_dltable.emplace("test", ee);
};
private:
DLTable m_dltable;
};
int main ()
{
DLTable result;
MyClass myclass(result);
myclass.addToResult();
std::cout << "myrecipe contains:" << std::endl;
for (auto& x: result) {
std::cout << x.first << ": "<< std::endl;
for (auto& xx : x.second) {
std::cout << xx.first << ": " << xx.second << std::endl;
}
}
std::cout << std::endl;
return 0;
}
Let's look at a simplified example:
int a = 0;
int &b = a;
int c = b;
c = 123;
Will the last assignment modify a? Of course not. It does not matter whether the value reaches c through a reference or not: c is a completely independent variable that was merely initialized from a reference.
Your case is the same: m_dltable is a separate variable, and initializing it from a reference does not change that. (Your case is even worse: you did not initialize it from the reference, you assigned to it.)
In general your approach is wrong. If you want to access that variable directly, just make it public; do not create convoluted workarounds to reach it. If you want encapsulation, add members that let you iterate over the container, for example return a const reference to it, or provide begin() and end() methods that return (const) iterators.
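A rough sketch of both points, assuming the class should own its table and expose it read-only. TableOwner, table() and demo_copy_vs_accessor are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using LookUpTable = std::unordered_map<std::string, std::string>;
using DLTable = std::unordered_map<std::string, LookUpTable>;

// Hypothetical variant of MyClass: it owns the table and exposes it
// through a const reference instead of copying a caller-supplied one.
class TableOwner {
public:
    void addToResult() {
        LookUpTable ee;
        ee.emplace("1", "1");
        m_dltable.emplace("test", ee);
    }

    const DLTable& table() const { return m_dltable; }

private:
    DLTable m_dltable;
};

inline void demo_copy_vs_accessor() {
    // The simplified example: c is a copy, so changing it leaves a alone.
    int a = 0;
    int& b = a;
    int c = b;
    c = 123;
    assert(a == 0);
    assert(c == 123);

    // The accessor returns a reference to the member, so the caller sees
    // what addToResult inserted.
    TableOwner owner;
    owner.addToResult();
    const DLTable& result = owner.table();
    assert(result.size() == 1);
    assert(result.at("test").at("1") == "1");
}
```

The loop from the question's main can then iterate over owner.table() instead of a separate result variable.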

String comparison using general comparators

I am trying to see what happens when we compare strings directly using operators like <, >, etc. The two usages in the code below surprisingly give different answers. Aren't they exactly the same thing written in two ways?
#include <iostream>
template <class T>
T max(T a, T b)
{
//Usage 1:
if (a > b) return a; else return b;
//Usage 2:
return a > b ? a : b ;
}
int main()
{
std::cout << "max(\"Alladin\", \"Jasmine\") = " << max("Alladin", "Jasmine") << std::endl ;
}
Usage 1 gives "Jasmine" while usage 2 gives "Alladin".
When you use:
max("Alladin", "Jasmine")
it is equivalent to using:
max<char const*>("Alladin", "Jasmine")
In the function, you end up comparing pointers. The outcome of the call will depend on the values of the pointers. It is not guaranteed to be predictable.
Perhaps you want to use:
max(std::string("Alladin"), std::string("Jasmine"))
or
max<std::string>("Alladin", "Jasmine")
Be warned that some compilers might pick up std::max when you do that. You may want to rename max to my_max or something similar.
You are not actually comparing the strings in your code. "Alladin" and "Jasmine" are of type const char[N], and they decay into pointers when you call max("Alladin", "Jasmine"). This means that in your function you are comparing the addresses of the strings and not their contents.
If you meant to test std::strings then you need to create std::strings and pass them to your max function.
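To make the address comparison visible without relying on where the compiler places two separate literals, the following sketch stores both C strings in one array, where comparing the pointers is well defined. The buffer contents and the name demo_pointer_vs_content are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

template <class T>
T my_max(T a, T b) {
    return a > b ? a : b;
}

inline void demo_pointer_vs_content() {
    // Two C strings stored back to back in one array: "zebra" starts at
    // index 0 and "apple" at index 6, after the embedded terminator.
    static const char buffer[] = "zebra\0apple";
    const char* zebra = buffer;
    const char* apple = buffer + 6;

    // With T = const char*, my_max returns the higher address: "apple".
    assert(std::strcmp(my_max(zebra, apple), "apple") == 0);

    // With T = std::string, my_max compares the characters: "zebra".
    assert(my_max(std::string(zebra), std::string(apple)) == "zebra");
}
```

The same template gives opposite answers for the same text, depending only on whether T is a pointer or a std::string.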
Both usages are wrong. Applying > to C-style character strings compares the pointers, not the contents.
You can compare std::string instead:
#include <algorithm>
#include <iostream>
#include <string>
template <class T>
T my_max(T a, T b)
{
return a > b ? a : b;
}
int main()
{
std::string a = "Alladin";
std::string b = "Jasmine";
std::cout << "my max: " << my_max(a, b) << std::endl;
// standard max function, declared in <algorithm>:
std::cout << "standard max: " << std::max(a, b) << std::endl;
}
The expected result should always be "Jasmine".

String-bool comparison - why?

I was working with boost::variant<int,std::string,bool> and its visitors when I ran into unexpected behavior: the string and bool values were comparable. I don't know why it works like this, but I found it interesting. My only idea is that the variant with the bool value was interpreted as a char. Could someone explain it to me?
The comparison visitor:
#include <iostream>
#include <algorithm>
#include <vector>
#include <boost/variant.hpp>
#include <boost/function.hpp>
struct my_less : boost::static_visitor<bool*>
{
template<typename T>
bool* operator()(T a, T b) const
{
return a<b ? new bool(true) : new bool(false);
}
template<typename T, typename U>
bool* operator()(T a, U b) const
{
return NULL;
}
};
int main()
{
typedef boost::variant<int,bool,std::string> datatype;
datatype *a = new datatype(false);
datatype *b = new datatype("abc");
my_less cmp;
bool* val = boost::apply_visitor(cmp,*a,*b);
if(val)
{
std::cout << *val;
}
else
{
std::cout << "NULL";
}
}
EDIT
Here is an extended main function with some test cases:
void show_result(bool* val)
{
if(val)
{
std::cout << *val << std::endl;
}
else
{
std::cout << "NULL" << std::endl;
}
}
int main()
{
//std::string a = "bbb";
//bool b = true;
//std::cout << b<a; //compilation error
typedef boost::variant<int,bool,std::string> datatype;
datatype int_value_1(4);
datatype int_value_2(3);
datatype string_value("abc");
datatype bool_value(true);
my_less cmp;
std::cout<<"First result, compare ints 4 and 3:"<<std::endl;
bool* val = boost::apply_visitor(cmp,int_value_1,int_value_2);
show_result(val);
std::cout<<"Second result, compare int to string 4 to abc " << std::endl;
val = boost::apply_visitor(cmp,int_value_1,string_value);
show_result(val);
std::cout <<"Third result, int 4 to bool true:" << std::endl;
val = boost::apply_visitor(cmp,int_value_1,bool_value);
show_result(val);
std::cout<<"Fourth result, string abc to bool true" << std::endl;
val = boost::apply_visitor(cmp,string_value,bool_value);
show_result(val);
}
The output:
First result, compare ints 4 and 3:
0
Second result, compare int to string 4 to abc
NULL
Third result, int 4 to bool true:
NULL
Fourth result, string abc to bool true
0
OK, now that you've completely changed your program, let me try again.
The problem is:
datatype *b = new datatype("abc");
"abc" is a string literal that decays to const char*; it is not a std::string. If you want to create a std::string variant, you need to do so explicitly. Otherwise, you'll end up creating a bool variant, because all pointers are convertible to bool, including const char* pointers.
Try this
datatype *b = new datatype(std::string("abc"));
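The preference for bool can be reproduced without Boost using two plain overloads. This is only an analogy under that assumption; it does not show boost::variant's actual constructor machinery, and the names chosen and demo_overload_choice are hypothetical.

```cpp
#include <cassert>
#include <string>

// Two overloads standing in for the bool and std::string alternatives.
inline const char* chosen(bool) { return "bool"; }
inline const char* chosen(const std::string&) { return "std::string"; }

inline void demo_overload_choice() {
    // const char* -> bool is a standard conversion, while
    // const char* -> std::string needs a user-defined conversion
    // (a constructor), which ranks lower in overload resolution.
    assert(std::string(chosen("abc")) == "bool");

    // An explicit std::string argument matches the second overload.
    assert(std::string(chosen(std::string("abc"))) == "std::string");
}
```

Some compilers warn about the literal-to-bool conversion in the first call, which can be a useful hint when this happens by accident.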
This interaction between bool and std::string is apparently well-known and somewhat irritating. boost::variant provides a templated constructor, but the resolution rules prefer the built-in conversion to bool, and there's no way in C++ to specify explicit template arguments for a constructor. It is possible to specify them explicitly for assignment, so you can write:
datatype b;
b.operator=<std::string>("abc");
which might be marginally more efficient but much less readable than
datatype b;
b = std::string("abc");
If you don't include bool as a variant, then string literals do automatically convert to std::string. Maybe it's possible to use some sort of proxy pseudo-boolean class. I've never tried.