How to create the right interface for std::transform - c++

The signature of transform is:
OutputIterator transform (InputIterator first1, InputIterator last1,
OutputIterator result, UnaryOperation op);
I want to create a generic token-replacing functor, msg_parser (below), so that I can use any container (a string in the example below) and pass the container's begin and end to transform. That's the idea.
But I can't get this to compile.
Here is my code. Any help would be much appreciated.
#include <iostream>
#include <iterator>
#include <string>
#include <map>
#include <algorithm>
class msg_parser {
public:
msg_parser(const std::map<std::string, std::string>& mapping, const char token = '$')
: map_(mapping), token_(token) {}
// I can use a generic istream type interface to handle the parsing.
std::ostream_iterator operator() (std::istream_iterator in) {
//body will go through input and when get to end of input return output
}
private:
const char token_;
const std::map<std::string, std::string>& map_;
};
int main(int argc, char* argv[]) {
std::map<std::string, std::string> mapping;
mapping["author"] = "Winston Churchill";
std::string str_test("I am $(author)");
std::string str_out;
std::transform(str_test.begin(), str_test.end(), str_out.begin(), msg_parser(mapping));
return 0;
}

Since std::string is a collection of chars, std::transform will iterate over chars exactly distance(first1, last1) times, so in your case it's not possible to change the size of the string. You may be able to transform "$(author)" into another string exactly the same size, though, but I guess it's not what you want.
You probably want to iterate over stream iterators instead of chars:
std::stringstream istrstr(str_test);
std::stringstream ostrstr;
std::transform(std::istream_iterator<std::string>(istrstr),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(ostrstr, " "), // note the delimiter
msg_parser(mapping));
std::cout << ostrstr.str() << std::endl;
By the way, your UnaryOperation works on the iterated type, not on iterators, so operator() should be:
std::string operator() (std::string in) { // ...

You should read the documentation and examples for std::transform in a C++ reference.
You'll notice that the operation takes an element of the input range and produces an element for the output range. Since your containers are strings and the elements are chars, the signature should be char operator()(char). Stream iterators are wrong in this case: the iterators of std::string iterate over chars, so the std::istream_iterator and std::ostream_iterator in your operator() make no sense.
Having said that, you will notice that transform, applied to your string, works on single characters, not on the whole "author" substring. What you are trying to do is better achieved with C++11's std::regex regular expression library than with std::transform.

Related

Is there some way to use std::remove_if on std::string_view iterators?

I want to effectively trim an already created std::string_view using an iterator obtained from std::remove_if() that doesn't point to the trimmed characters. However, I can't use std::remove_if() on a std::basic_string_view::iterator directly, because that's really a std::basic_string_view::const_iterator and std::remove_if() needs iterators through which elements can be move-assigned.
The only workaround I've thought of is converting the std::string_view to a std::string and then taking the iterator. Here's an example of that:
#include <string>
#include <string_view>
#include <algorithm>
#include <locale>
int main() {
std::string_view foo{"Whitepace...\nThe Final Frontier"};
const auto is_space{
[](const auto& character) {
return std::isspace(character, std::locale{});
}
};
// Doesn't compile
//auto without_conversion{
// std::remove_if(foo.begin(), foo.end(), is_space)
//};
// Works, for the most part.
auto with_conversion{
std::remove_if(std::string{foo}.begin(), std::string{foo}.end(), is_space)
};
}
But this kinda defeats the whole point of using std::string_view, as a string_view constructed from this iterator wouldn't be viewing the original string.
Is there some (preferably elegant) way to do this while keeping the view on the original string? Perhaps some way to make the string_view iterator non-const?
If your goal is to trim a string_view of spaces, and store the result in a std::string, then you should choose the appropriate algorithm that allows const iterators.
One such algorithm is std::copy_if:
#include <iostream>
#include <string_view>
#include <algorithm>
#include <iterator>
#include <cctype>
int main()
{
std::string_view foo{"Whitepace...\nThe Final Frontier"};
std::string result;
std::copy_if(foo.begin(), foo.end(), std::back_inserter(result), [](char ch)
{ return !std::isspace(static_cast<unsigned char>(ch)); });
std::cout << result;
}
Output:
Whitepace...TheFinalFrontier
std::string_view is a constant view of the string sequence.
For example, begin returns a const_iterator.
https://en.cppreference.com/w/cpp/string/basic_string_view/begin
Maybe you will have better luck with std::span; however, take into account that string literals in the program are always immutable.
You have to make a copy first anyway.
Also your last line doesn't do what you think because you are iterating over different temporaries, even if it compiles.
The correct code is, for example:
std::string FOO = foo;
auto with_conversion{
std::remove_if(FOO.begin(), FOO.end(), is_space)
};
In other words, the whole idea of your program (that you can modify a "program" string) is flawed in the first place.

Iterating and printing std::map using std::for_each

Recently, I learnt about the STL types and templates, and we were given a challenge as part of practicing the STL and getting used to it:
Iterate over a std::map<std::string, size_t>
Print its contents
Restrictions:
Can only use: std::vector, std::map, std::string, std::algorithm, std::functional
Cannot define complex types nor templates
Cannot use the . (member access), -> (member access via pointer), * (dereference) operators
Cannot use for, while, do-while nor if-else, switch and other conditionals
Can use std::for_each and other functions of function templates to iterate over collection of elements
No lambdas
No std::cout, std::cerr, std::ostream etc.
No auto types
Can use other STL templates so long as they are included in the headers described at (1)
Allowed to use these functions:
void print(const std::string& str)
{
std::cout << str << std::endl;
}
std::string split(const std::pair<std::string, size_t> &r)
{
std::string name;
std::tie(name, std::ignore) = r;
return name;
}
Originally, I wanted to use std::for_each(std::begin(mymap), std::end(mymap), print) to iterate over the map and use the print function to print the contents. Then I realised that I am actually working with std::pair<std::string, size_t>, which made me consider using std::bind and std::tie to break the std::pair up. But since I think I need to do it inside the std::for_each expression, how can I break up the std::pair while also calling print on the elements?
I have also considered using structured bindings, but I am not allowed to use auto.
So, the question is: how do I use the STL to iterate over the map, extract the keys, and print them using the helper functions provided? Obviously, without the restrictions the challenge would be very easy, but I am at a loss as to which STL functions are appropriate in light of them.
I used your function, which takes a const std::pair&, inside the callable passed as the third argument of std::for_each.
I use printf() to print the values.
#include <string>
#include <iostream>
#include <cstdio>   // printf
#include <map>
#include <algorithm>
#include <tuple>    // std::tie, std::ignore
#include <vector>
using namespace std;
std::string Split(const std::pair<std::string, size_t> &r)
{
std::string name;
std::tie(name, std::ignore) = r;
return name;
}
int main()
{
string name1{ "John" };
string name2{ "Jack" };
std::map<std::string, size_t> sample = { {name1, 31}, {name2, 35} };
static vector<std::string> names;
std::for_each(sample.begin(), sample.end(), [](std::pair<std::string, size_t> pickup)
{
static int i = 0;
names.push_back(Split(pickup));
printf("%s\n", names[i].c_str());
i++;
});
}

Why the error no match for 'operator==', when using `std::find`?

I am using std::find to check that a string isn't in a std::vector<std::vector<string>>.
Error:
no match for 'operator==' (operand types are 'std::vector<std::__cxx11::basic_string<char> >' and 'const char [6]')
Is it that the types don't match?
vector<vector<string>> data;
if (find(data.begin(), data.end(), "START") == data.end()) {
printf("Missing \"START\"\n");
return true;
}
The reason for the error message has been well explained in the other answer. I would like to provide a solution to the problem.
As you are trying to find out whether any std::string element in the vector of vectors matches "START", you could use the standard algorithm std::any_of in combination with a unary predicate that returns std::find(vec.cbegin(), vec.cend(), str) != vec.cend();, where vec is each row of the vector of vectors.
#include <algorithm>
#include <string>
#include <iostream>
#include <vector>
bool isFound(const std::vector<std::vector<std::string>>& data, const std::string &str)
{
const auto found_in_vector = [&str](const std::vector<std::string> & vec)-> bool {
return std::find(vec.cbegin(), vec.cend(), str) != vec.cend(); // if found
};
if (std::any_of(data.cbegin(), data.cend(), found_in_vector))
{
std::cout << "Found\n";
return true;
}
std::cout << "Missing \"" << str << "\"\n";
return false;
}
int main()
{
std::vector<std::vector<std::string>> data;
std::vector<std::string> test{ "START","test" };
data.emplace_back(test);
std::cout << std::boolalpha << isFound(data, std::string{ "START" } );
}
Yes and no. The error is triggered because you've got a "vector of vectors of strings", i.e. there's one dimension too many. Define data as a std::vector<std::string> instead and it will work.
But why does the error talk about a missing operator?
std::find() is a function template, which is instantiated for the actual types of your arguments rather than living as a precompiled function somewhere in a library. This allows the compiler to fully optimize based on those types.
What it actually does is compare each element of the range with the value you passed, using ==. Here each element is a std::vector<std::string> and the value is a const char[6], so the compiler looks for an operator==() that accepts those two operands. The important fact is that it won't find any version of operator==() that can accept the string literal, either directly or through a conversion. The reason is that your vector contains vectors, so the only valid right-hand operand would be another vector of strings.

how to find duplicates in std::vector<string> and return a list of them?

So if I have a vector of words like:
Vec1 = "words", "words", "are", "fun", "fun"
resulting list: "fun", "words"
I am trying to determine which words are duplicated, and return an alphabetized vector of 1 copy of them. My problem is that I don't even know where to start, the only thing close to it I found was std::unique_copy which doesn't exactly do what I need. And specifically, I am inputting a std::vector<std::string> but outputting a std::list<std::string>. And if needed, I can use functor.
Could someone at least push me in the right direction please? I already tried reading the STL documentation, but I am just brain-blocked right now.
In 3 lines (not counting the vector and list creation, nor the superfluous line breaks added in the name of readability):
vector<string> vec{"words", "words", "are", "fun", "fun"};
list<string> output;
sort(vec.begin(), vec.end());
set<string> uvec(vec.begin(), vec.end());
set_difference(vec.begin(), vec.end(),
uvec.begin(), uvec.end(),
back_inserter(output));
EDIT
Explanation of the solution:
Sorting the vector is needed in order to use set_difference() later.
The uvec set will automatically keep elements sorted, and eliminate duplicates.
The output list will be populated by the elements of vec - uvec.
Make an empty std::unordered_set<std::string>
Iterate over your vector, checking whether each item is a member of the set
If it's already in the set, this is a duplicate, so add to your result list
Otherwise, add to the set.
Since you want each duplicate only listed once in the results, you can use a hashset (not list) for the results as well.
IMO, Ben Voigt started with a good basic idea, but I would caution against taking his wording too literally.
In particular, I dislike the idea of searching for the string in the set, then adding it to your set if it's not present, and adding it to the output if it was present. This basically means every time we encounter a new word, we search our set of existing words twice, once to check whether a word is present, and again to insert it because it wasn't. Most of that searching will be essentially identical -- unless some other thread mutates the structure in the interim (which could give a race condition).
Instead, I'd start by trying to add it to the set of words you've seen. That returns a pair<iterator, bool>, with the bool set to true if and only if the value was inserted -- i.e., was not previously present. That lets us consolidate the search for an existing string and the insertion of the new string together into a single insert:
while (input >> word)
if (!(existing.insert(word)).second)
output.insert(word);
This also cleans up the flow enough that it's pretty easy to turn the test into a functor that we can then use with std::remove_copy_if to produce our results quite directly:
#include <set>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
class show_copies {
std::set<std::string> existing;
public:
bool operator()(std::string const &in) {
return existing.insert(in).second;
}
};
int main() {
std::vector<std::string> words{ "words", "words", "are", "fun", "fun" };
std::set<std::string> result;
std::remove_copy_if(words.begin(), words.end(),
std::inserter(result, result.end()), show_copies());
for (auto const &s : result)
std::cout << s << "\n";
}
Depending on whether I cared more about code simplicity or execution speed, I might use an std::vector instead of the set for result, and use std::sort followed by std::unique_copy to produce the final result. In such a case I'd probably also replace the std::set inside of show_copies with an std::unordered_set instead:
#include <unordered_set>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
class show_copies {
std::unordered_set<std::string> existing;
public:
bool operator()(std::string const &in) {
return existing.insert(in).second;
}
};
int main() {
std::vector<std::string> words{ "words", "words", "are", "fun", "fun" };
std::vector<std::string> intermediate;
std::remove_copy_if(words.begin(), words.end(),
std::back_inserter(intermediate), show_copies());
std::sort(intermediate.begin(), intermediate.end());
std::unique_copy(intermediate.begin(), intermediate.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
This is marginally more complex (one whole line longer!) but likely to be substantially faster when/if the number of words gets very large. Also note that I'm using std::unique_copy primarily to produce visible output. If you just want the result in a collection, you can use the standard unique/erase idiom to get unique items in intermediate.
In place (no additional storage). No string copying (except to result list). One sort + one pass:
#include <string>
#include <vector>
#include <list>
#include <iostream>
#include <algorithm>
using namespace std;
int main() {
vector<string> vec{"words", "words", "are", "fun", "fun"};
list<string> dup;
sort(vec.begin(), vec.end());
const string empty{""};
const string* prev_p = &empty;
for(const string& s: vec) {
if (*prev_p==s) dup.push_back(s);
prev_p = &s;
}
for(auto& w: dup) cout << w << ' ';
cout << '\n';
}
You can get a pretty clean implementation using a std::map to count the occurrences, and then relying on std::list::sort to sort the resulting list of words. For example:
std::list<std::string> duplicateWordList(const std::vector<std::string>& words) {
std::map<std::string, int> temp;
std::list<std::string> ret;
for (std::vector<std::string>::const_iterator iter = words.begin(); iter != words.end(); ++iter) {
temp[*iter] += 1;
// only add the word to our return list on the second copy
// (first copy doesn't count, third and later copies have already been handled)
if (temp[*iter] == 2) {
ret.push_back(*iter);
}
}
ret.sort();
return ret;
}
Using a std::map there seems a little wasteful, but it gets the job done.
Here's a better algorithm than the ones other people have proposed:
#include <algorithm>
#include <string>
#include <vector>
template<class It> It unique2(It const begin, It const end)
{
It i = begin;
if (i != end)
{
It j = i;
for (++j; j != end; ++j)
{
if (*i != *j)
{ using std::swap; swap(*++i, *j); }
}
++i;
}
return i;
}
int main()
{
std::vector<std::string> v;
v.push_back("words");
v.push_back("words");
v.push_back("are");
v.push_back("fun");
v.push_back("words");
v.push_back("fun");
v.push_back("fun");
std::sort(v.begin(), v.end());
v.erase(v.begin(), unique2(v.begin(), v.end()));
std::sort(v.begin(), v.end());
v.erase(unique2(v.begin(), v.end()), v.end());
}
It's better because it only requires swap with no auxiliary vector for storage, which means it will behave optimally for earlier versions of C++, and it doesn't require elements to be copyable.
If you're more clever, I think you can avoid sorting the vector twice as well.

What is wrong with `std::set`?

In the other topic I was trying to solve this problem. The problem was to remove duplicate characters from a std::string.
std::string s= "saaangeetha";
Since the order was not important, I sorted s first, then used std::unique, and finally resized it to get the desired result:
aeghnst
That is correct!
Now I want to do the same, but keep the order of the characters intact. That is, I want this output:
sangeth
So I wrote this:
template<typename T>
struct is_repeated
{
std::set<T> unique;
bool operator()(T c) { return !unique.insert(c).second; }
};
int main() {
std::string s= "saaangeetha";
s.erase(std::remove_if(s.begin(), s.end(), is_repeated<char>()), s.end());
std::cout << s ;
}
Which gives this output:
saangeth
That is, a is repeated, though the other repetitions are gone. What is wrong with the code?
Anyway, I changed my code a bit (see the comments):
template<typename T>
struct is_repeated
{
std::set<T> & unique; //made reference!
is_repeated(std::set<T> &s) : unique(s) {} //added line!
bool operator()(T c) { return !unique.insert(c).second; }
};
int main() {
std::string s= "saaangeetha";
std::set<char> set; //added line!
s.erase(std::remove_if(s.begin(),s.end(),is_repeated<char>(set)),s.end());
std::cout << s ;
}
Output:
sangeth
Problem gone!
So what is wrong with the first solution?
Also, if I don't make the member variable unique a reference, the problem doesn't go away.
What is wrong with std::set or is_repeated functor? Where exactly is the problem?
I also note that if the is_repeated functor is copied somewhere, then every member of it is also copied. I don't see the problem here!
Functors are supposed to be designed in a way where a copy of a functor is identical to the original functor. That is, if you make a copy of one functor and then perform a sequence of operations, the result should be the same no matter which functor you use, or even if you interleave the two functors. This gives the STL implementation the flexibility to copy functors and pass them around as it sees fit.
With your first functor, this claim does not hold because if I copy your functor and then call it, the changes you make to its stored set do not reflect in the original functor, so the copy and the original will perform differently. Similarly, if you take your second functor and make it not store its set by reference, the two copies of the functor will not behave identically.
The reason that your final version of the functor works, though, is that storing the set by reference means any number of copies of the functor will behave identically to one another.
Hope this helps!
In GCC (libstdc++), remove_if is implemented essentially as
template<typename It, typename Pred>
It remove_if(It first, It last, Pred predicate) {
first = std::find_if(first, last, predicate);
// ^^^^^^^^^
if (first == last)
return first;
else {
It result = first;
++ first; // skip the element find_if stopped at; its slot gets overwritten
for (; first != last; ++ first) {
if (!predicate(*first)) {
// ^^^^^^^^^
*result = std::move(*first);
++ result;
}
}
return result;
}
}
Note that your predicate is passed by-value to find_if, so the struct, and therefore the set, modified inside find_if will not be propagated back to caller.
Since the first duplicate appears at:
saaangeetha
// ^
The initial "sa" will be kept after the find_if call. Meanwhile, the predicate's set is empty (the insertions within find_if are local). Therefore the loop afterwards will keep the 3rd a.
sa | angeth
// ^^ ^^^^^^
// || kept by the loop in remove_if
// ||
// kept by find_if
Not really an answer, but as another interesting tidbit to consider, this does work, even though it uses the original functor:
#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
template<typename T>
struct is_repeated {
std::set<T> unique;
bool operator()(T c) { return !unique.insert(c).second; }
};
int main() {
std::string s= "saaangeetha";
std::remove_copy_if(s.begin(), s.end(),
std::ostream_iterator<char>(std::cout),
is_repeated<char>());
return 0;
}
Edit: I don't think it affects this behavior, but I've also corrected a minor slip in your functor (operator() should apparently take a parameter of type T, not char).
I suppose the problem could lie in that the is_repeated functor is copied somewhere inside the implementation of std::remove_if. If that is the case, the default copy constructor is used and this in turn calls std::set copy constructor. You end up with two is_repeated functors possibly used independently. However as the sets in both of them are distinct objects, they don't see the mutual changes. If you turn the field is_repeated::unique to a reference, then the copied functor still uses the original set which is what you want in this case.
Functor classes should be pure functions and have no state of their own. See Item 39 in Scott Meyers's Effective STL for a good explanation of this. The gist is that your functor class may be copied one or more times inside the algorithm.
The other answers are correct in that the issue is that the functor you are using is not safe to copy. In particular, the STL that comes with gcc (4.2) implements std::remove_if as a combination of std::find_if, to locate the first element to delete, followed by std::remove_copy_if to complete the operation.
template <typename ForwardIterator, typename Predicate>
ForwardIterator remove_if( ForwardIterator first, ForwardIterator end, Predicate pred ) {
first = std::find_if( first, end, pred ); // [1]
ForwardIterator i = first;
return first == end ? first
: std::remove_copy_if( ++i, end, first, pred ); // [2]
}
The copy in [1] means that the elements seen by find_if, including the first duplicate 'a', are recorded only in the copy of the functor, so that knowledge is lost when find_if returns. The functor is also copied in [2], which would be fine if the original being copied were not still an empty functor.
Depending on the implementation, remove_if can make copies of your predicate. Either refactor your functor to make it stateless, or use Boost.Ref, which is meant "for passing references to function templates (algorithms) that would usually take copies of their arguments", like so:
#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <boost/ref.hpp>
#include <boost/bind.hpp>
template<typename T>
struct is_repeated {
std::set<T> unique;
bool operator()(T c) { return !unique.insert(c).second; }
};
int main() {
std::string s= "saaangeetha";
is_repeated<char> pred; // named object: boost::ref cannot bind to a temporary
s.erase(std::remove_if(s.begin(), s.end(), boost::bind<bool>(boost::ref(pred),_1)), s.end());
std::cout << s;
return 0;
}