using a custom comparator with std::set - c++

I'm trying to create a list of words read from a file arranged by their length. For that, I'm trying to use std::set with a custom comparator.
class Longer {
public:
bool operator() (const string& a, const string& b)
{ return a.size() > b.size();}
};
set<string, Longer> make_dictionary (const string& ifile){
// produces a set of the words in 'ifile', ordered by their length
ifstream ifs {ifile};
if (!ifs) throw runtime_error ("couldn't open file for reading");
string word;
set<string, Longer> words;
while (ifs >> word){
strip(word);
tolower(word);
words.insert(word);
}
remove_plurals(words);
if (ifs.eof()){
return words;
}
else
throw runtime_error ("input failed");
}
From this, I expect a list of all words in a file arranged by their length. Instead, I get a very short list, with exactly one word for each length occurring in the input:
polynomially-decidable
complexity-theoretic
linearly-decidable
lexicographically
alternating-time
finite-variable
newenvironment
documentclass
binoppenalty
investigate
usepackage
corollary
latexsym
article
remark
logic
12pt
box
on
a
Any idea of what's going on here?

With your comparator, equal-length words are equivalent, and you can't have duplicate equivalent entries in a set.
To maintain multiple words, you should modify your comparator so that it also performs, say, a lexicographic comparison if the lengths are the same.
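To make the effect concrete, here is a minimal sketch contrasting a length-only comparator with one that breaks ties alphabetically. The names LongerOnly, LongerThenAlpha and count_kept are made up for this illustration and do not appear in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Length-only ordering: two strings of equal length are equivalent,
// so a std::set keeps only the first one it sees for each length.
struct LongerOnly {
    bool operator()(const std::string& a, const std::string& b) const {
        return a.size() > b.size();
    }
};

// Longest first, then alphabetical order as a tie-breaker:
// only identical strings are equivalent now.
struct LongerThenAlpha {
    bool operator()(const std::string& a, const std::string& b) const {
        if (a.size() != b.size()) return a.size() > b.size();
        return a < b;
    }
};

// Hypothetical helper: how many of the words survive insertion
// into a std::set ordered by Compare.
template <class Compare>
std::size_t count_kept(const std::vector<std::string>& words) {
    std::set<std::string, Compare> s(words.begin(), words.end());
    return s.size();
}
```

With the words "box", "cat", "on", "a", "dog", the first comparator keeps one word per length, while the second keeps all five.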

Your comparator only compares by length, which means that different strings of equal length are treated as equivalent by std::set. (std::set treats a and b as equivalent if neither comp(a, b) nor comp(b, a) is true, with comp being your custom comparator.)
That means your comparator should also consider the string contents to avoid this situation. The keyword here is lexicographic comparison, meaning you take multiple comparison criteria into account. The first criterion would be your string length, and the second would be the string itself. An easy way to write a lexicographic comparison is to use std::tuple, whose operator< compares the components lexicographically.
Your "reverse" ordering by length, which you wrote with operator>, has to be made compatible with the usual operator<. Negating the sizes looks tempting, but size() is unsigned, so -a.size() wraps around and an empty string (size 0) would sort as the longest. Swapping the sizes between the two sides avoids that: rewrite a.size() > b.size() as b.size() < a.size(), then compose each size with the string itself into a tuple and compare the tuples with <:
class Longer {
public:
    bool operator() (const string& a, const string& b) const
    {
        // sizes are swapped (b on the left, a on the right) so that longer strings come first
        return std::make_tuple(b.size(), a )
             < std::make_tuple(a.size(), b );
        //                     ^^^^^^^^  ^^^
        //                     first     second
        //                     criterion criterion
    }
};

Related

Sorting structures inside vector by two criteria in alphabetical order

I have a following data structure (first string as "theme" of the school)
map<string, vector<School>> information;
And the school is:
struct School {
string name;
string location;
};
I have trouble printing my whole data structure out in alphabetical order (first theme, then location, then name). For example:
"map key string : struct location : struct name"
"technology : berlin : university_of_berlin"
So far I managed to loop through the initial map by
for (auto const& key : information) {
    // access to the vector of structs (copied so that it can be sorted)
    vector<School> v = key.second;
    // sorting by location name
    // comparison done by a separate function that returns school1.location < school2.location
    sort(v.begin(), v.end(), compare);
    // ... print key.first and the elements of v ...
}
If I print out the theme (key.first) and the location of each element of v, it's almost complete. The map is ordered by default and the location comparison works. But I can't figure out how to add a second comparison by name. If I do another sort, this time by name, then I lose the original order by location. Is it somehow possible to do a "double sort" where one criterion is more important than the other?
You can; you just need to add a condition to compare:
bool compare(School const& lhs, School const& rhs)
{
    if (lhs.location != rhs.location)
        return lhs.location < rhs.location;
    return lhs.name < rhs.name;
}
Or you can overload the < operator like @ceorron did.
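The same two-criteria comparison can also be written with std::tie, which builds tuples of references and compares them component by component. This is only a sketch: by_location_then_name and sort_schools are names invented here, and School is repeated so the block stands on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

struct School {
    std::string name;
    std::string location;
};

// Orders by location first and by name second.
// std::tie creates tuples of references, so no strings are copied.
bool by_location_then_name(const School& lhs, const School& rhs) {
    return std::tie(lhs.location, lhs.name) < std::tie(rhs.location, rhs.name);
}

// Sorts one theme's schools in place using the comparison above.
void sort_schools(std::vector<School>& v) {
    std::sort(v.begin(), v.end(), by_location_then_name);
}
```

Adding a third criterion is then just a matter of appending another member to both std::tie calls.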
There is a simple answer to this. I assume that you want to order first by "location", then by "name".
The simple way is to implement a less-than operator in the School struct.
Example code:
//in School.hpp
struct School {
string name;
string location;
bool operator<(const School& rhs) const;
};
//in School.cpp
bool School::operator<(const School& rhs) const {
if(this->location < rhs.location)
return true;
if(rhs.location < this->location)
return false;
if(this->name < rhs.name)
return true;
if(rhs.name < this->name)
return false;
return false;
}
There are other ways, though. With this operator in place, you can now call sort like this:
sort(v.begin(), v.end());
I am adding this answer just to be pedantic. See JustANewbie’s response for what is correct for this particular case (and I would say in most normal cases).
It is totally possible to perform multiple-pass sorts. The trick is to use a stable sorting method for each additional pass. (A stable sort preserves relative ordering of equivalent elements.)
std::sort is typically implemented as Introsort, which is not a stable sort (it uses Quicksort plus Insertion Sort, but switches to Heapsort when the Quicksort recursion gets too deep).
Conveniently, the Standard Library provides us the std::stable_sort algorithm for when we need a stable sort.
A stable sort is typically slower than a non-stable sort, which is why we tend to prefer the non-stable sort whenever possible. For the first pass you can use a non-stable sort, but you must use a stable sort for all additional passes.
std::sort ( xs.begin(), xs.end(), compare_names ); // 1st pass: secondary criterion
std::stable_sort( xs.begin(), xs.end(), compare_locations ); // 2nd pass: primary criterion
The final order will be sorted primarily by location and secondarily by name.
You can add as many sorting passes as you need. Just remember that you apply the passes in reverse order of their significance. For example, if you want to sort people by (last name, first name, age), you must apply the sorting in reverse order: age, first name, last name.
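As a sketch of the three-pass example just described, assuming a hypothetical Person struct with last, first and age members (none of these names come from the question):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type used only for this illustration.
struct Person {
    std::string last;
    std::string first;
    int age;
};

// Sorts by (last name, first name, age) using one pass per criterion,
// applied from least significant to most significant.
void sort_people(std::vector<Person>& xs) {
    // 1st pass: least significant criterion, stability not needed yet
    std::sort(xs.begin(), xs.end(),
              [](const Person& a, const Person& b) { return a.age < b.age; });
    // 2nd pass: keeps the age order among people with equal first names
    std::stable_sort(xs.begin(), xs.end(),
                     [](const Person& a, const Person& b) { return a.first < b.first; });
    // 3rd pass: most significant criterion
    std::stable_sort(xs.begin(), xs.end(),
                     [](const Person& a, const Person& b) { return a.last < b.last; });
}
```

A single std::sort with a combined comparison gives the same order in one pass; the multi-pass form is mainly useful when the criteria are chosen at run time.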

How not to use custom comparison function of std::map in searching ( map::find)?

As you can see in my code, lenMap is a std::map with a custom comparison function. This function just checks the string's length.
Now when I want to search for some key (using map::find), the map still uses that custom comparison function.
How can I force my map not to use it when I search for a key?
Code:
struct CompareByLength : public std::binary_function<string, string, bool>
{
bool operator()(const string& lhs, const string& rhs) const
{
return lhs.length() < rhs.length();
}
};
int main()
{
typedef map<string, string, CompareByLength> lenMap;
lenMap mymap;
mymap["one"] = "one";
mymap["a"] = "a";
mymap["foobar"] = "foobar";
// Now In mymap: [a, one, foobar]
string target = "b";
if (mymap.find(target) == mymap.end())
cout << "Not Found :) !";
else
cout << "Found :( !"; // I don't want to reach here because of "a" item !
return 0;
}
The map itself does not offer such an operation. The idea of the comparison functor is to create an internal ordering for faster lookup, so the elements are actually ordered according to your functor.
If you need to search for elements in a different way, you can either use the STL algorithm std::find_if() (which has linear time complexity) or create a second map that uses another comparison functor.
In your specific example, since you seem only to be interested in the string's length, you should rather use the length (of type std::size_t) and not the string itself as a key.
By the way, std::binary_function is not needed as a base class. It has been deprecated since C++11 and was removed in C++17.
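The std::find_if approach mentioned above could look roughly like this. It is a sketch only: LenMap and find_exact are names chosen for the illustration, and the search is linear in the size of the map.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

struct CompareByLength {
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        return lhs.length() < rhs.length();
    }
};

using LenMap = std::map<std::string, std::string, CompareByLength>;

// Linear scan that compares keys with operator== and therefore
// ignores the map's own (length-based) ordering.
LenMap::const_iterator find_exact(const LenMap& m, const std::string& target) {
    return std::find_if(m.begin(), m.end(),
                        [&target](const LenMap::value_type& kv) {
                            return kv.first == target;
                        });
}
```

With the map from the question, mymap.find("b") still lands on the "a" entry, whereas find_exact(mymap, "b") returns mymap.end().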
The comparison function tells the map how to order elements and how to differentiate between them. If it only compares the length, two different strings with the same length will occupy the same position in the map (one will overwrite the other).
Either store your strings in a different data structure and sort them, or perhaps try this comparison function:
struct CompareByLength
{
bool operator()(const string& lhs, const string& rhs) const
{
if (lhs.length() < rhs.length())
{
return true;
}
else if (rhs.length() < lhs.length())
{
return false;
}
else
{
return lhs < rhs;
}
}
};
I didn't test it, but I believe this will first order strings by length, and then however strings normally compare.
You could also use std::map<std::string::size_type, std::map<std::string, std::string>> and use the length for the first map and the string value for the second map. You would probably want to wrap this in a class to make it easier to use, as there is no protection against messing it up.
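A minimal sketch of that nested-map idea follows. The alias ByLength and the helpers put, contains_exact and count_with_length are invented for the illustration; a real wrapper class would hide the two levels behind its own interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Outer key: string length. Inner map: the strings of that length.
using ByLength =
    std::map<std::string::size_type, std::map<std::string, std::string>>;

// Stores value under key, filed in the bucket for key's length.
void put(ByLength& m, const std::string& key, const std::string& value) {
    m[key.size()][key] = value;
}

// Exact lookup: the length bucket must exist and contain the key itself.
bool contains_exact(const ByLength& m, const std::string& key) {
    auto bucket = m.find(key.size());
    return bucket != m.end() && bucket->second.count(key) > 0;
}

// Number of stored keys that have exactly the given length.
std::size_t count_with_length(const ByLength& m, std::string::size_type len) {
    auto bucket = m.find(len);
    return bucket == m.end() ? 0 : bucket->second.size();
}
```

This keeps both kinds of lookup logarithmic: by length through the outer map, and by exact string through the inner one.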

How to sort a vector of structs based on a vector<string> within the vector to be sorted?

What is the best way to alphabetically sort a vector of structures based on the first word in each structure's vector of words?
struct sentence{
vector<string> words;
};
vector<sentence> allSentences;
In other words, how to sort allSentences based on words[0]?
EDIT: I used the following solution:
bool cmp(const sentence& lhs, const sentence & rhs)
{
return lhs.words[0] < rhs.words[0];
}
std::sort(allSentences.begin(), allSentences.end(), cmp);
Provide a suitable comparison binary function and pass it on to std::sort. For example
bool cmp(const sentence& lhs, const sentence & rhs)
{
return lhs.words[0] < rhs.words[0];
}
then
std::sort(allSentences.begin(), allSentences.end(), cmp);
Alternatively, in C++11 you can use a lambda anonymous function
std::sort(allSentences.begin(), allSentences.end(),
[](const sentence& lhs, const sentence & rhs) {
return lhs.words[0] < rhs.words[0];}
);
You need some comparison function that you can pass to std::sort:
bool compare(const sentence& a, const sentence& b)
{
return a.words[0] < b.words[0];
}
As you can see, it takes two sentences and returns true if the first word of the first sentence is "less than" the first word of the second sentence.
Then you can sort allSentences very easily:
std::sort(allSentences.begin(), allSentences.end(), compare);
Of course, using this comparison means that sentences like {"hello", "world"} and {"hello", "friend"} will compare equal. But that's what you've asked for.
Generally, there are three different types of scenarios for comparison implementations you should consider.
1. A comparison of your object that always makes sense, independent of the scenario in which you want to compare objects. Then: implement operator< for your class. This operator is used whenever two objects are compared (with <, which the standard algorithms do by default). (For individual scenarios, you can still override this behavior using the other methods below.)
For this, extend your class with the following function:
struct sentence{
    vector<string> words;
    bool operator<(const sentence &other) const {
        return this->words[0] < other.words[0];
    }
};
Then, just call the standard sorting algorithm on your vector of sentences without other arguments:
std::sort(allSentences.begin(), allSentences.end());
However, this doesn't sound like the best method for your scenario, since comparing by the first word is probably not something you always want, maybe only in this one case.
2. A comparison of your object that will be used only once. In C++11, you have lambda functions (anonymous functions defined inline), which can be passed directly to the algorithm in which they will be used, like std::sort in this scenario. This is my favorite solution:
// Sort lexicographically by first word
std::sort(allSentences.begin(), allSentences.end(),
          [](const sentence& a, const sentence& b) {
              return a.words[0] < b.words[0];
          });
In C++03, where you don't have lambdas, use the 3rd solution:
3. A set of different, reusable comparison methods, maybe a parameterized comparison function. Examples are: compare by the first word, compare by length, compare by something else... In this case, implement the comparison function(s) either as free-standing functions and use function pointers, or implement them as functors (which can be parameterized). Lambdas stored in variables also do the job in this case.
This method has the advantage of naming the comparison methods, giving them a meaning. If you use different comparisons for the same object and reuse them, this is a huge advantage:
// Lexicographical comparison by the first word only
bool compareLexByFirstWord(const sentence& a, const sentence& b) {
return a.words[0] < b.words[0];
}
// Lexicographical comparison by all words
bool compareLex(const sentence& a, const sentence& b) {
return a.words < b.words;
}
// Decide which behavior to use when actually using the comparison:
std::sort(allSentences.begin(), allSentences.end(), compareLexByFirstWord);
std::sort(allSentences.begin(), allSentences.end(), compareLex);
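The parameterized functor mentioned above might be sketched as follows. CompareByWord and sort_by_word are hypothetical names, and the sketch assumes every sentence has at least index + 1 words (words.at throws std::out_of_range otherwise).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct sentence {
    std::vector<std::string> words;
};

// Functor parameterized by the position of the word to compare.
struct CompareByWord {
    std::size_t index;
    explicit CompareByWord(std::size_t i) : index(i) {}

    bool operator()(const sentence& a, const sentence& b) const {
        // at() checks bounds, so a too-short sentence fails loudly
        return a.words.at(index) < b.words.at(index);
    }
};

// Sorts the sentences by their word at the given position.
void sort_by_word(std::vector<sentence>& sentences, std::size_t index) {
    std::sort(sentences.begin(), sentences.end(), CompareByWord(index));
}
```

The same functor type then covers "by first word", "by second word", and so on, with the choice made at the call site.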

std::map with a char[5] key that may contain null bytes

The keys are binary garbage and I only defined them as chars because I need a 1-byte array.
They may contain null bytes.
Now the problem is, when I have two keys, ab(0)a and ab(0)b ((0) being a null byte), the map treats them as strings, considers them equal, and I don't get two unique map entries.
What's the best way to solve this?
Why not use std::string as key:
//must use this as:
std::string key1("ab\0a",4);
std::string key2("ab\0b",4);
std::string key3("a\0b\0b",5);
std::string key4("a\0\0b\0b",6);
Second argument should denote the size of the c-string. All of the above use this constructor:
string ( const char * s, size_t n );
description of which is this:
Content is initialized to a copy of the string formed by the first n characters in the array of characters pointed by s.
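Used as map keys, such strings stay distinct because std::string compares all of its characters, embedded null bytes included. A small sketch, where make_key is a hypothetical helper and not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Builds a key from raw bytes; the explicit length keeps embedded
// null bytes instead of stopping at the first one.
std::string make_key(const char* bytes, std::size_t n) {
    return std::string(bytes, n);
}

// Inserts the two keys from the question and reports how many
// distinct entries the map ends up with.
std::size_t distinct_entries() {
    std::map<std::string, int> m;
    m[make_key("ab\0a", 4)] = 1;
    m[make_key("ab\0b", 4)] = 2;
    return m.size();
}
```

Without the length argument, std::string("ab\0a") would stop at the null byte and both keys would collapse to "ab".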
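Use std::array<char,5>, or maybe even better (if you really want to handle keys as binary values) std::bitset. Note that std::bitset defines no operator<, so as a std::map key it would need a custom comparator.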
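A sketch of the std::array variant: std::array provides an operator< that compares all elements in order, so it works as a map key without extra code. The alias Key and the helper to_key are made up here, and to_key assumes the pointer refers to at least five readable bytes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <map>

using Key = std::array<char, 5>;

// Copies exactly five bytes into a Key; null bytes are copied like
// any other byte. Assumes raw points to at least five bytes.
Key to_key(const char* raw) {
    Key k;
    std::copy(raw, raw + 5, k.begin());
    return k;
}
```

A std::map<Key, int> then treats ab(0)a... and ab(0)b... as two different keys.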
If you really want to use char[5] as your key, consider writing your own comparison class to compare between keys correctly. The map class requires one of these in order to organize its contents. By default, it is using a version that doesn't work with your key.
Here's a page on the map class that shows the parameters for map. You'd want to write your own Compare class to replace less<Key> which is the third template parameter to map.
If you only need to distinguish them and don't rely on a lexicographical ordering you could treat each key as uint64_t. This has the advantage, that you could easily replace std::map by a hashmap implementation and that you don't have to do anything by hand.
Otherwise you can also write your own comparator somehow like this:
class MyKeyComp
{
public:
    // compares the five bytes in order; the first differing byte decides
    bool operator()(const char* lhs, const char* rhs) const
    {
        return lhs[0] != rhs[0] ? lhs[0] < rhs[0]
             : lhs[1] != rhs[1] ? lhs[1] < rhs[1]
             : lhs[2] != rhs[2] ? lhs[2] < rhs[2]
             : lhs[3] != rhs[3] ? lhs[3] < rhs[3]
             : lhs[4] < rhs[4];
    }
};
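The uint64_t idea from this answer could be sketched like this. pack_key is a hypothetical helper; it assumes the pointer refers to at least five bytes and puts the first byte in the most significant position.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Packs five bytes into the low 40 bits of a 64-bit integer.
// Each byte is converted to unsigned char first so that negative
// char values cannot smear sign bits over the other bytes.
std::uint64_t pack_key(const char* raw) {
    std::uint64_t k = 0;
    for (int i = 0; i < 5; ++i) {
        k = (k << 8) | static_cast<unsigned char>(raw[i]);
    }
    return k;
}
```

The packed value works directly as a key for std::unordered_map<std::uint64_t, T>. Because the first byte ends up most significant, comparing packed values also matches a byte-wise comparison of the keys as unsigned bytes.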

Idiomatic C++ for finding a range of equal length strings, given a vector of strings (ordered by length)

Given a std::vector< std::string > that is ordered by string length, how can I find a range of equal-length strings?
I am looking for an idiomatic solution in C++.
I have found this solution:
// any idea for a better name? (English is not my mother tongue)
bool less_length( const std::string& lhs, const std::string& rhs )
{
return lhs.length() < rhs.length();
}
std::vector< std::string > words;
words.push_back("ape");
words.push_back("cat");
words.push_back("dog");
words.push_back("camel");
size_t length = 3;
// this will give a range from "ape" to "dog" (included):
std::equal_range( words.begin(), words.end(), std::string( length, 'a' ), less_length );
Is there a standard way of doing this (beautifully)?
I expect that you could write a comparator as follows:
struct LengthComparator {
bool operator()(const std::string &lhs, std::string::size_type rhs) {
return lhs.size() < rhs;
}
bool operator()(std::string::size_type lhs, const std::string &rhs) {
return lhs < rhs.size();
}
bool operator()(const std::string &lhs, const std::string &rhs) {
return lhs.size() < rhs.size();
}
};
Then use it:
std::equal_range(words.begin(), words.end(), length, LengthComparator());
I expect the third overload of operator() is never used, because the information it provides is redundant. The range has to be pre-sorted, so there's no point the algorithm comparing two items from the range, it should be comparing items from the range against the target you supply. But the standard doesn't guarantee that. [Edit: and defining all three means you can use the same comparator class to put the vector in order in the first place, which might be convenient].
This works for me (gcc 4.3.4), and while I think this will work on your implementation too, I'm less sure that it is actually valid. It implements the comparisons that the description of equal_range says will be true of the result, and 25.3.3/1 doesn't require that the template parameter T must be exactly the type of the objects referred to by the iterators. But there might be some text I've missed which adds more restrictions, so I'd do more standards-trawling before using it in anything important.
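Put together, the approach could be wrapped like this. The wrapper same_length_range is a name invented for the sketch, and the same caveat applies as above: it relies on std::equal_range accepting a value whose type differs from the element type.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct LengthComparator {
    bool operator()(const std::string& lhs, std::string::size_type rhs) const {
        return lhs.size() < rhs;
    }
    bool operator()(std::string::size_type lhs, const std::string& rhs) const {
        return lhs < rhs.size();
    }
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        return lhs.size() < rhs.size();
    }
};

using WordIter = std::vector<std::string>::const_iterator;

// Returns the half-open range of words with exactly the given length.
// Precondition: words is sorted by non-decreasing length.
std::pair<WordIter, WordIter> same_length_range(
        const std::vector<std::string>& words, std::string::size_type length) {
    return std::equal_range(words.begin(), words.end(), length,
                            LengthComparator());
}
```

For the vector from the question, same_length_range(words, 3) spans "ape" through "dog", and a length that does not occur yields an empty range.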
Your way is definitely not unidiomatic, but having to construct a dummy string with the target length does not look very elegant and it isn't very readable either.
I'd perhaps write my own helper function (e.g. string_length_range), encapsulating a plain, simple loop through the string list. There is no need to use std:: tools for everything.
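Such a helper might look like the sketch below. The signature is an assumption, since the answer only names the function. It scans linearly and returns the first contiguous run of words with the requested length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using WordIter = std::vector<std::string>::const_iterator;

// Plain loops, no binary search: skip words until one has the wanted
// length, then advance while the length still matches.
std::pair<WordIter, WordIter> string_length_range(
        const std::vector<std::string>& words, std::size_t length) {
    WordIter first = words.begin();
    while (first != words.end() && first->size() != length) {
        ++first;
    }
    WordIter last = first;
    while (last != words.end() && last->size() == length) {
        ++last;
    }
    return std::make_pair(first, last);
}
```

Unlike std::equal_range, this takes linear time. When the vector is ordered by length, the run it returns is the whole range of words with that length.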
std::equal_range does a binary search, which means the words vector must be sorted; in this case, it must be non-decreasing in length.
I think your solution is a good one, definitely better than writing your own implementation of binary search which is notoriously error prone and hard to prove correct.
If doing a binary search was not your intent, then I agree with Alexander. Just a simple for loop through the words is the cleanest.