C++ Unexpected behavior with remove_if - c++

I am trying to use std::remove_if to remove spaces from a simple string, but I am getting weird results. Could someone help me figure out what's going on?
The Code is:
#include <iostream>
#include <algorithm>
#include <string>
int main(int argc, char * argv[])
{
std::string test = "a b";
std::remove_if(test.begin(), test.end(), isspace);
std::cout << "test : " << test << std::endl;
return 0;
}
I expect this to simply print out:
test : ab
but instead I get
test : abb
Trying with another string, I get:
Input: "a bcde uv xy"
Output: "abcdeuvxy xy"
It seems like it is duplicating the last "word", and sometimes it adds a space. How can I just get it to remove all spaces without doing weird stuff?

std::remove_if removes by shifting elements: the elements to keep are moved to the front of the range, but nothing is actually erased from the container, so its size stays the same. STL algorithms don't have that privilege; they only see iterators, and only a container can erase its own elements.
(emphasis mine)
Removing is done by shifting (by means of move assignment) the
elements in the range in such a way that the elements that are not to
be removed appear in the beginning of the range. Relative order of the
elements that remain is preserved and the physical size of the
container is unchanged. Iterators pointing to an element between the
new logical end and the physical end of the range are still
dereferenceable, but the elements themselves have unspecified values
(as per MoveAssignable post-condition). A call to remove is typically
followed by a call to a container's erase method, which erases the
unspecified values and reduces the physical size of the container to
match its new logical size.
You can erase the removed elements afterward (which is known as erase–remove idiom).
test.erase(std::remove_if(test.begin(), test.end(), isspace), test.end());

Related

Where will a new element be inserted in a std::set?

I have a loop like this (where mySet is a std::set):
for(auto iter=mySet.begin(); iter!=mySet.end(); ++iter){
if (someCondition){mySet.insert(newElement);}
if (someotherCondition){mySet.insert(anothernewElement);}
}
I am experiencing some strange behavior, and I am asking myself if this could be due to the inserted element being inserted "before" the current iterator position in the loop. Namely, I have an Iteration where both conditions are true, but still the distance
distance(iter, mySet.end())
is only 1, not 2 as I would expect. Is my guess about set behavior right? And more importantly, can I still do what I want to do?
What I'm trying to do is to build "chains" on a hexagonal board between fields of the same color. I have a set containing all fields of my color, and the conditions check the color of neighboring fields, and if they are of the same color, copy this field to mySet, so the chain.
I am trying to use std::set for this because it allows no fields to be in the chain more than once. Reading the comments so far I fear I need to switch to std::vector, where push_back() will surely add the element at the end, but then I will run into new problems due to having to think of a way to forbid duplicate elements. I therefore am hoping for advice on how to solve this the best way.
Depending on the new element's value, it may be inserted before or after current iterator value. Below is an example of inserting before and after an iterator.
#include <iostream>
#include <iterator>
#include <set>
int main()
{
std::set<int> s;
s.insert(3);
auto it = s.begin();
std::cout << std::distance(it, s.end()) << std::endl; // prints 1
s.insert(2); // 2 will be inserted before it
std::cout << std::distance(it, s.end()) << std::endl; // prints 1
s.insert(5); // 5 will be inserted after it
std::cout << std::distance(it, s.end()) << std::endl; // prints 2
}
Regarding your question in the comments ("In my particular case, modifying it while iterating is basically exactly what I want, but of course I need to add everything after the current position"): no, you cannot manually arrange the order of the elements. A new value's position is determined by comparing it with the existing elements. Below is the quote from cppreference.
std::set is an associative container that contains a sorted set of unique objects of type Key. Sorting is done using the key comparison function Compare. Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.
Thus, the implementation of the set will decide where exactly it will be placed.
If you really need to add values after current position, you need to use a different container. For example, simply a vector would be suitable:
it = myvector.insert ( it+1 , 200 ); // +1 to add after it
If you have a small number of items, doing a brute-force check to see if they're inside a vector can actually be faster than checking if they're in a set. This is because vectors store their elements contiguously and so tend to have better cache locality than node-based containers such as std::set.
We can write a function to do this pretty easily:
template<class T>
void insert_unique(std::vector<T>& vect, T const& elem) {
// std::find returns end() when elem is NOT present; only then do we append
if(std::find(vect.begin(), vect.end(), elem) == vect.end()) {
vect.push_back(elem);
}
}

Is it OK to insert an empty set into another set?

Sorry for the naive question: is it OK to insert an empty set into another set using the range insert function, or is it undefined behavior?
Test run in https://ideone.com/RNGIFT seems fine, checking the reference saying
If the container is empty, the returned iterator will be equal to end().
#include <iostream>
#include <set>
#include <string>
using namespace std;
int main() {
std::set<string> to_be_inserted;
std::set<string> res;
cout << "check everything is fine" << endl;
res.insert(to_be_inserted.begin(), to_be_inserted.end());
cout << "how about now ?" << endl;
return 0;
}
Yes, most things in C++ relating to iterators will work this way in edge cases such as empty containers so that algorithms relying on the begin and end member functions on containers do not require special code in such circumstances.
Since begin will return the end iterator in the case of the set being empty as you showed, it will effectively make a range of [end, end), which has a length of 0 (as can be checked by functions like std::distance), thus performing no insertion operations (while also being defined behavior).
This can be seen in practice in a standard library implementation, such as libc++ here where that specific overload of insert walks down the range with a for loop which has an exit condition of the two iterators (first and last) being equal, inserting elements as it goes. In the case of passing an empty range like that to it where the first and last are equal, it'll not even enter the loop.

Don't understand results of std::remove in C++ STL

I was reading Josuttis' "The C++ Standard Library, 2nd ed.". In section 6.7.1 the author explains that the code given below will give unexpected results. I still don't understand how std::remove() works, and why I am getting this strange result. (Though I understand that you need to call erase() in order to actually remove elements, and that for a list it is actually better to use list::remove() rather than the combination of std::remove() and erase().)
list<int> coll;
// insert elements from 6 to 1 and 1 to 6
for (int i=1; i<=6; ++i) {
coll.push_front(i);
coll.push_back(i);
}
// print
copy (coll.cbegin(), coll.cend(), // source
ostream_iterator<int>(cout," ")); // destination
cout << endl;
// remove all elements with value 3
remove (coll.begin(), coll.end(), // range
3); // value
// print (same as above)
and the results are
pre: 6 5 4 3 2 1 1 2 3 4 5 6
post: 6 5 4 2 1 1 2 4 5 6 5 6 (???)
This explanation should help:
Removing is done by shifting the elements in the range in such a way
that elements to be erased are overwritten. Relative order of the
elements that remain is preserved and the physical size of the
container is unchanged. Iterators pointing to an element between the
new logical end and the physical end of the range are still
dereferenceable, but the elements themselves have unspecified values.
A call to remove is typically followed by a call to a container's
erase method, which erases the unspecified values and reduces the
physical size of the container to match its new logical size.
Note that the return value from std::remove() is the iterator that represents the new end. Therefore, calling the container's erase() member with this new end and the old end gets rid of the leftover elements.
std::remove doesn't actually shorten the list. It can't - as it only gets iterators and not the container itself.
What it does is copy the remaining values so that they end up at the beginning of the container. But the final elements of the container (in your case the last two, '5' and '6') are still there.
After using std::remove you have to shorten the container yourself to remove the remaining "junk" copies.
You asked the algorithm to remove the elements equal to 3. While walking the container, the algorithm shifts the following content toward the front whenever something is removed from the middle. That happens twice in your case, which is why you see a leftover "5 6" at the end (the logical end moved two positions toward the front while the physical end stayed put). Calling the container's erase() then gets rid of the tail zombies.
To quote from everyone's favorite c++ website:
The function cannot alter the properties of the object containing the
range of elements (i.e., it cannot alter the size of an array or a
container): The removal is done by replacing the elements that compare
equal to val by the next element that does not, and signaling the new
size of the shortened range by returning an iterator to the element
that should be considered its new past-the-end element.
So std::remove doesn't change the size of the list. It overwrites the matching elements and returns an iterator that represents the new logical end of the list. To actually erase the extraneous elements, you then need to do:
auto it = remove(coll.begin(), coll.end(), 3);
coll.erase(it, coll.end());

Word Frequency Statistics

In a pre-interview, I was faced with a question like this:
Given a string consists of words separated by a single white space, print out the words in descending order sorted by the number of times they appear in the string.
For example an input string of “a b b” would generate the following output:
b : 2
a : 1
Firstly, I'd say it is not so clear whether the input string is made up of single-letter words or multiple-letter words. If the former is the case, it could be simple.
Here is my thought:
int c[26] = {0};
char *pIn = strIn;
while (*pIn != 0)
{
/* skip the separating spaces; assumes every word is a single lowercase letter a-z */
if (*pIn != ' ') ++c[*pIn - 'a'];
++pIn;
}
/* how to sort the array c[26] and remember the original index? */
I can get the statistics of the frequency of every single-letter word in the input string, and I can get it sorted (using QuickSort or whatever). But after the count array is sorted, how do I get the single-letter word associated with each count so that I can print them out in pairs later?
If the input string is made up of multiple-letter words, I plan to use a map<const char *, int> to track the frequency. But again, how do I sort the map's key-value pairs?
The question is in C or C++, and any suggestion is welcome.
Thanks!
I would use a std::map<std::string, int> to store the words and their counts. Then I would use something like this to get the words:
while(std::cin >> word) {
// increment map's count for that word
}
finally, you just need to figure out how to print them in order of frequency, I'll leave that as an exercise for you.
You're definitely wrong in assuming that you need only 26 options, 'cause your employer will want to allow multiple-character words as well (and maybe even numbers?).
This means you're going to need an array with a variable length. I strongly recommend using a vector or, even better, a map.
To find the character sequences in the string, find your current position (start at 0) and the position of the next space. Then that's the word. Set the current position to the space and do it again. Keep repeating this until you're at the end.
By using the map you'll already have the word/count available.
If the job you're applying for requires university skills, I strongly recommend optimizing the map by adding some kind of hashing function. However, judging by the difficulty of the question I assume that that is not the case.
Taking the C-language case:
I like brute-force, straightforward algos so I would do it in this way:
1. Tokenize the input string to give an unsorted array of words. I'll have to actually, physically move each word (because each is of variable length); and I think I'll need an array of char*, which I'll use as the arg to qsort( ).
2. qsort( ) (descending) that array of words. (In the COMPAR function of qsort(), pretend that bigger words are smaller words so that the array acquires descending sort order.)
3.a. Go through the now-sorted array, looking for subarrays of identical words. The end of a subarray, and the beginning of the next, is signalled by the first non-identical word I see.
3.b. When I get to the end of a subarray (or to the end of the sorted array), I know (1) the word and (2) the number of identical words in the subarray.
4. (EDIT, new step) Save, in another array (call it array2), a char* to a word in the subarray and the count of identical words in the subarray.
5. When there are no more words in the sorted array, I'm done. It's time to print.
6. qsort( ) array2 by word frequency.
7. Go through array2, printing each word and its frequency.
I'M DONE! Let's go to lunch.
All the answers prior to mine did not give really an answer.
Let us think on a potential solution.
There is a more or less standard approach for counting something in a container.
We can use an associative container like a std::map or a std::unordered_map. Here we associate a "key", in this case the word, with a value, in this case the count of that specific word.
And luckily the maps have a very nice index operator[]. This will look for the given key and, if found, return a reference to the value. If not found, then it will create a new entry with the key and return a reference to the new entry. So, in both cases, we will get a reference to the value used for counting. And then we can simply write:
std::unordered_map<std::string, int> counter{};
counter[word]++;
And that looks really intuitive.
After this operation, you have already the frequency table. Either sorted by the key (the word), by using a std::map or unsorted, but faster accessible with a std::unordered_map.
Now you want to sort according to the frequency/count. Unfortunately this is not possible with maps.
Therefore we need to use a second container, like a std::vector, which we can then sort using std::sort with any given predicate, or we can copy the values into a container like a std::multiset that implicitly orders its elements.
For getting the words out of a std::string we simply use a std::istringstream and the standard extraction operator >>. No big deal at all.
And because writing all these long names for the std containers is tedious, we create alias names with the using keyword.
After all this, we now write ultra compact code and fulfill the task with just a few lines of code:
#include <iostream>
#include <string>
#include <sstream>
#include <utility>
#include <set>
#include <unordered_map>
#include <type_traits>
#include <iomanip>
// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;
// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;
// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Rank = std::multiset<Pair, Comp>;
// ------------------------------------------------------------
std::istringstream text{ " 4444 55555 1 22 4444 333 55555 333 333 4444 4444 55555 55555 55555 22 "};
int main() {
Counter counter;
// Count
for (std::string word{}; text >> word; counter[word]++);
// Sort
Rank rank(counter.begin(), counter.end());
// Output
for (const auto& [word, count] : rank) std::cout << std::setw(15) << word << " : " << count << '\n';
}

Modifying contents of vector in BOOST_FOREACH

This is a question about how BOOST_FOREACH checks its loop termination.
cout << "Testing BOOST_FOREACH" << endl;
vector<int> numbers; numbers.reserve(8);
numbers.push_back(1); numbers.push_back(2); numbers.push_back(3);
cout << "capacity = " << numbers.capacity() << endl;
BOOST_FOREACH(int elem, numbers)
{
cout << elem << endl;
if (elem == 2) numbers.push_back(4);
}
cout << "capacity = " << numbers.capacity() << endl;
gives the output
Testing BOOST_FOREACH
capacity = 8
1
2
3
capacity = 8
But what about the number 4 which was inserted half way through the loop? If I change the type to a list the newly inserted number will be iterated over. The vector push_back operation will invalidate any pointers IF a reallocation is required, however that is not happening in this example. So the question I guess is why does the end() iterator appear to only be evaluated once (before the loop) when using vector but has a more dynamic evaluation when using a list?
Under the covers, BOOST_FOREACH uses
iterators to traverse the element
sequence. Before the loop is executed,
the end iterator is cached in a local
variable. This is called hoisting, and
it is an important optimization. It
assumes, however, that the end
iterator of the sequence is stable. It
usually is, but if we modify the
sequence by adding or removing
elements while we are iterating over
it, we may end up hoisting ourselves
on our own petard.
http://www.boost.org/doc/libs/1_40_0/doc/html/foreach/pitfalls.html
If you don't want the end() iterator to change use resize on the vector rather than reserve.
http://www.cplusplus.com/reference/stl/vector/resize/
Note that then you wouldn't want to push_back but use the operator[] instead. But be careful of going out of bounds.
The question was raised in the comments as to why the Microsoft debug runtime raises an assertion during iteration over the vector but not over the list. The reason is that insert is defined differently for list and vector (note that push_back is just an insert at the end of the sequence).
Per the C++ standard (ISO/IEC 14882:2003 23.2.4.3, vector modifiers):
[on insertion], if no reallocation happens, all the iterators and references before the insertion point remain valid.
(23.2.2.3, list modifiers):
[insert] does not affect the validity of iterators and references.
So, if you use push_back (and are sure that it's not going to cause a reallocation), it's okay with either container to continue using your iterator to iterate over the rest of the sequence.
In the case of the vector, however, it's undefined behavior to use the end iterator that you obtained before the push_back.
This is a roundabout answer to the question; it's a direct answer to the discussion in the question's comments.
boost's foreach will terminate when its iterator == numbers.end()
Be careful though, calling push_back can/will invalidate any current iterators you have.