Replace multiple spaces with one space in a string - c++

How would I do something in c++ similar to the following code:
//Lang: Java
string.replaceAll(" +", " ");
This code-snippet would replace all multiple spaces in a string with a single space.

bool BothAreSpaces(char lhs, char rhs) { return (lhs == rhs) && (lhs == ' '); }
std::string::iterator new_end = std::unique(str.begin(), str.end(), BothAreSpaces);
str.erase(new_end, str.end());
How this works: std::unique has two forms. The first form goes through a range and removes adjacent duplicates, so the string "abbaaabbbb" becomes "abab". The second form, which I used, takes a predicate that accepts two elements and returns true if they should be considered duplicates. The function I wrote, BothAreSpaces, serves this purpose. It determines exactly what its name implies: that both of its parameters are spaces. So when combined with std::unique, duplicate adjacent spaces are removed.
Just like std::remove and remove_if, std::unique doesn't actually make the container smaller, it just moves elements at the end closer to the beginning. It returns an iterator to the new end of range so you can use that to call the erase function, which is a member function of the string class.
Breaking it down, the erase function takes two parameters: a begin and an end iterator for the range to erase. For its first parameter I'm passing the return value of std::unique, because that's where I want to start erasing. For its second parameter, I am passing the string's end iterator.
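To make the two forms concrete, here is a minimal sketch. The function names collapse_all_runs and collapse_spaces are made up for illustration; only std::unique and std::string::erase come from the standard library.

```cpp
#include <algorithm>
#include <string>

// First form of std::unique (no predicate): every run of equal
// adjacent characters is reduced to a single character.
std::string collapse_all_runs(std::string s) {
    s.erase(std::unique(s.begin(), s.end()), s.end());
    return s;
}

// Second form (with predicate): only runs of spaces are reduced,
// other repeated characters are left alone.
std::string collapse_spaces(std::string s) {
    auto both_spaces = [](char lhs, char rhs) { return lhs == ' ' && rhs == ' '; };
    s.erase(std::unique(s.begin(), s.end(), both_spaces), s.end());
    return s;
}
```

Both functions take the string by value, so the caller's string is left untouched and the cleaned-up copy is returned.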

I tried an approach with std::remove_if and a lambda expression. To my eyes it is easier to follow than the code above, although it lacks that "wow, neat, I didn't realize you could do that" quality. I'm posting it anyway, if only for learning purposes:
bool prev(false); // true if the previous character was a space
char rem(' ');
auto iter = std::remove_if(str.begin(), str.end(), [&] (char c) -> bool {
    if (c == rem && prev) {
        return true; // drop a space that directly follows another space
    }
    prev = (c == rem);
    return false;
});
str.erase(iter, str.end());
EDIT: I realized that std::remove_if returns an iterator that can be used directly, so I removed the unnecessary code.
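One thing to keep in mind with the lambda above is that it carries state (prev) from one call to the next, so it depends on the algorithm visiting the characters in order, once each. Common implementations do that, but as far as I know the standard only promises the number of predicate calls, not their order. If that worries you, a plain loop avoids the assumption. This is only a sketch, and collapse_spaces_loop is a made-up name:

```cpp
#include <string>

// Copies `in` to a new string, skipping any space that directly
// follows another space. The traversal order is explicit here.
std::string collapse_spaces_loop(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    bool prev_space = false;
    for (char c : in) {
        const bool is_space = (c == ' ');
        if (!(is_space && prev_space)) {
            out.push_back(c);
        }
        prev_space = is_space;
    }
    return out;
}
```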

A variant of Benjamin Lindley's answer that uses a lambda expression to make things cleaner:
std::string::iterator new_end =
    std::unique(str.begin(), str.end(),
        [](char lhs, char rhs){ return (lhs == rhs) && (lhs == ' '); }
    );
str.erase(new_end, str.end());

Why not use a regular expression:
str = boost::regex_replace(str, boost::regex(" {2,}"), " ");
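If you'd rather not pull in Boost, the same idea should work with the standard <regex> header in C++11 and later. A minimal sketch, with collapse_spaces_regex as a made-up name; note that regex_replace returns a new string rather than modifying its argument, and that the pattern here matches spaces only (no quote characters inside a character class):

```cpp
#include <regex>
#include <string>

// Replaces every run of two or more spaces with a single space.
std::string collapse_spaces_regex(const std::string& in) {
    static const std::regex two_or_more_spaces(" {2,}");
    return std::regex_replace(in, two_or_more_spaces, " ");
}
```

For a single pass over a short string this is likely slower than the std::unique approach, so it is mostly worth it when the pattern may grow more complicated later.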

How about isspace(lhs) && isspace(rhs), to handle all types of whitespace?
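A sketch of that suggestion, assuming the std::unique approach from the accepted answer. The name collapse_whitespace is made up, and the casts to unsigned char are there because passing a negative char to isspace is undefined behavior. Be aware that the first character of each whitespace run is the one that survives, so a tab followed by a space leaves the tab, not a space.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Reduces every run of whitespace characters to its first character.
std::string collapse_whitespace(std::string s) {
    auto both_ws = [](char lhs, char rhs) {
        return std::isspace(static_cast<unsigned char>(lhs)) &&
               std::isspace(static_cast<unsigned char>(rhs));
    };
    s.erase(std::unique(s.begin(), s.end(), both_ws), s.end());
    return s;
}
```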

Related

How to adapt a string splitting algorithm using pointers so it uses iterators instead?

The code below comes from an answer to this question on string splitting. It uses pointers, and a comment on that answer suggested it could be adapted for std::string. How can I use the features of std::string to implement the same algorithm, for example using iterators?
#include <vector>
#include <string>
using namespace std;
vector<string> split(const char *str, char c = ',')
{
    vector<string> result;
    do
    {
        const char *begin = str;
        while(*str != c && *str)
            str++;
        result.push_back(string(begin, str));
    } while (0 != *str++);
    return result;
}
Ok so I obviously replaced char by string but then I noticed he is using a pointer to the beginning of the character. Is that even possible for strings? How do the loop termination criteria change? Is there anything else I need to worry about when making this change?
You can use iterators instead of pointers. Iterators provide a way to traverse containers, and can usually be thought of as analogous to pointers.
In this case, you can use the begin() member function (or cbegin() if you don't need to modify the elements) of a std::string object to obtain an iterator that references the first character, and the end() (or cend()) member function to obtain an iterator for "one-past-the-end".
For the inner loop, your termination criterion is the same; you want to stop when you hit the delimiter on which you'll be splitting the string. For the outer loop, instead of comparing the character value against '\0', you can compare the iterator against the end iterator you already obtained from the end() member function. The rest of the algorithm is pretty similar; iterators work like pointers in terms of dereference and increment:
std::vector<std::string> split(const std::string& str, const char delim = ',') {
    std::vector<std::string> result;
    auto end = str.cend();
    auto iter = str.cbegin();
    while (iter != end) {
        auto begin = iter;
        while (iter != end && *iter != delim) ++iter;
        result.push_back(std::string(begin, iter));
        if (iter != end) ++iter; // See note (**) below.
    }
    return result;
}
Note the subtle difference in the inner loop condition: it now tests whether we've hit the end before trying to dereference. This is because we can't dereference an iterator that points to the end of a container, so we must check this before trying to dereference. The original algorithm assumes that a null character ends the string, so we're ok to dereference a pointer to that position.
(**) Whether iter++ != end is valid when iter is already end is discussed in the question "Are end+1 iterators for std::string allowed?"
I've added this if statement to the original algorithm to break the loop when iter reaches end in the inner loop. This avoids adding one to an iterator which is already the end iterator, and avoids the potential problem.
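One difference worth knowing about: the pointer version from the question emits an empty field for a trailing delimiter ("a," gives "a" and "") and a single empty field for an empty input, whereas the iterator loop above stops as soon as it reaches end and so drops them. If you need the pointer version's behavior, a sketch along these lines should preserve it. It uses std::find for the inner loop, and split_find is a made-up name:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Splits on `delim`, keeping empty fields (including a trailing one),
// which mirrors what the do/while pointer loop produces.
std::vector<std::string> split_find(const std::string& str, char delim = ',') {
    std::vector<std::string> result;
    auto begin = str.cbegin();
    const auto end = str.cend();
    while (true) {
        auto pos = std::find(begin, end, delim);
        result.emplace_back(begin, pos);
        if (pos == end) break;  // no delimiter left, so this was the last field
        begin = pos + 1;        // safe: pos != end here
    }
    return result;
}
```

The increment only happens after checking pos != end, so the end+1 question never comes up.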

Logical negation of a predicate in C++

I'm following the book Accelerated C++, and to write a function to split a string into a vector of words (separated by space characters), find_if is utilized.
vector<string> split(const string& str) {
    typedef string::const_iterator iter;
    vector<string> ret;
    iter i = str.begin();
    while (i != str.end()) {
        i = find_if(i, str.end(), not_space);
        iter j = find_if(i, str.end(), space);
        if (i != str.end())
            ret.push_back(string(i, j));
        i = j;
    }
    return ret;
}
and the definitions of space and not_space:
bool space(char c) {
    return isspace(c);
}
bool not_space(char c) {
    return !isspace(c);
}
Is it necessary to write two separate predicates here, or could one simply pass !space in place of not_space?
Just use std::not1(std::ptr_fun(space)). std::not1 is declared in <functional>.
(There is also a std::not2 for use with binary predicates; std::not1 is for unary predicates.)
You cannot simply use !space instead of not_space because all you'll be doing in that case is passing false to find_if. That happens because space will decay to a pointer to function, and function pointers are implicitly convertible to bool. Applying ! to the boolean value will always result in false (because the function pointer is never going to be nullptr).
You can reuse the function space by wrapping it in std::not1, which will negate the result of the predicate passed to it. Unfortunately, it's not as simple as writing std::not1(space), because not1 requires that the predicate define a nested type named argument_type, which your predicate doesn't satisfy.
To convert your function into a predicate usable with not1, you must first wrap it in std::ptr_fun. So the line in your split function becomes:
i = find_if(i, str.end(), std::not1(std::ptr_fun(space)));
With C++11, there's no need for the not1 and ptr_fun shenanigans; just use a lambda expression:
i = find_if(i, str.end(), [](char c) {return !space(c);});
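Another C++11 option is to sidestep the negation altogether: <algorithm> has std::find_if_not, which returns the first element for which the predicate is false. A minimal sketch of the split function written that way; split_words is a made-up name, and the unsigned char cast in space is my addition rather than the book's code:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

bool space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Splits `str` into whitespace-separated words using one predicate.
std::vector<std::string> split_words(const std::string& str) {
    std::vector<std::string> ret;
    auto i = str.begin();
    while (i != str.end()) {
        i = std::find_if_not(i, str.end(), space);  // skip leading whitespace
        auto j = std::find_if(i, str.end(), space); // find the end of the word
        if (i != str.end()) {
            ret.emplace_back(i, j);
        }
        i = j;
    }
    return ret;
}
```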
You can also declare
template <bool find_space> bool space(char c) {
    return find_space ^ (!isspace(c));
}
and then refer to it as space<true> and space<false> in the argument to find_if(). Much more versatile than std::not1().

Standard algorithm for accumulating a container into a string with a delimiter separating the entries?

I am looking for a standard library equivalent of this code for accumulating elements of an std container into a string with a delimiter separating consecutive entries:
string accumulate_with_delimiter( vector<string> strvect, string delimiter )
{
    string answer;
    for( vector<string>::const_iterator it = strvect.begin(); it != strvect.end(); ++it )
    {
        answer += *it;
        if( it + 1 != strvect.end() )
        {
            answer += delimiter;
        }
    }
    return answer;
}
Such code seems to be very common: printing out an array with delimiter " ", or saving into a CSV file with delimiter ",", etc. Therefore it's likely that a piece of code like that made its way into a standard library. std::accumulate comes close, but doesn't have a delimiter.
I don't think the standard C++ library has a nice approach to delimiting sequences. I typically end up using something like
std::ostringstream out;
if (!container.empty()) {
    auto end(container.end());
    std::copy(container.begin(), --end, std::ostream_iterator<T>(out, ", "));
    out << *end;
}
Using std::accumulate() has a similar problem, although with the first element rather than the last. With a custom add function, you could use it something like this:
std::string concat;
if (!container.empty()) {
    auto begin(container.begin());
    concat = std::accumulate(++begin, container.end(), container.front(),
                             [](std::string f, std::string s) { return f + ", " + s; });
}
In both cases the iterators need to be moved to another element. The code copies each iterator into a named variable before moving it because the container may use plain pointers as iterators, in which case a pre-increment or pre-decrement applied directly to the result of begin() or end() doesn't work (a pointer returned by value is not a modifiable lvalue).
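Since C++11, std::next and std::prev from <iterator> return an advanced copy of an iterator, which also works when the iterator is a plain pointer. A minimal sketch of the accumulate variant using std::next; join_accumulate is a made-up name and the container is assumed to be a vector of strings:

```cpp
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Joins `parts` with `delimiter` between consecutive entries.
std::string join_accumulate(const std::vector<std::string>& parts,
                            const std::string& delimiter) {
    if (parts.empty()) {
        return std::string();
    }
    // Start from the first element and fold in the rest, each
    // preceded by the delimiter.
    return std::accumulate(std::next(parts.begin()), parts.end(), parts.front(),
                           [&delimiter](const std::string& acc, const std::string& s) {
                               return acc + delimiter + s;
                           });
}
```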
std::accumulate might be the correct answer, but you need the version which takes a custom adder. You can then provide your own lambda.
Remember to pass front() as the first value to accumulate, and start adding at begin() + 1. And test for empty vectors first of course.
I'm not sure if there is one in the recent Standard Library or not, but there is always boost::algorithm::join(strvec, delimiter).
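If Boost isn't available, a small hand-rolled helper covers most cases. The sketch below is not Boost's implementation, and join_range is a made-up name. It takes an iterator pair and writes to a std::ostringstream, so it works for any element type that can be streamed with operator<<:

```cpp
#include <sstream>
#include <string>

// Streams the elements of [first, last) into a string, with
// `delimiter` between consecutive elements.
template <typename Iter>
std::string join_range(Iter first, Iter last, const std::string& delimiter) {
    std::ostringstream out;
    if (first != last) {
        out << *first;
        for (++first; first != last; ++first) {
            out << delimiter << *first;
        }
    }
    return out.str();
}
```

Because first is a by-value parameter, it is a named variable inside the function and can be incremented even when Iter is a raw pointer.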

C++ removing punctuation on strings, erase()/iterator issue

I know I'm not the first person to bring up the issue with reverse iterators trying to call the erase() method on strings. However, I wasn't able to find any good ways around this.
I'm reading the contents of a file, which contains a bunch of words. When I read in a word, I want to pass it to a function I have called stripPunct. However, I ONLY want to strip punctuation at the beginning and end of a string, not in the middle.
So for instance:
(word) should strip '(' and ')' resulting in just word
don't! should strip '!' resulting in just don't
So my logic (which I'm sure could be improved) was to have two while loops, one starting at the end and one at the beginning, traversing and erasing until it hits a non-punctuation char.
void stripPunct(string & str) {
    string::iterator itr1 = str.begin();
    string::reverse_iterator itr2 = str.rbegin();
    while ( ispunct(*itr1) ) {
        str.erase(itr1);
        itr1++;
    }
    while ( ispunct(*itr2) ) {
        str.erase(itr2);
        itr2--;
    }
}
However, obviously it's not working because erase() requires a regular iterator and not a reverse_iterator. But anyways, I feel like that logic is pretty inefficient.
Also, I tried instead of a reverse_iterator using just a regular iterator, starting it at str.end(), then decremented it, but it says I cannot dereference the iterator if I start it at str.end().
Can anyone help me with a good way to do this? Or maybe point out a workaround for what I already have?
Thank you so much in advance!
------------------ [ EDIT ] ----------------------------
found a solution, although it may not be the best solution:
// Call the stripPunct method:
stripPunct(str);
if ( !str.empty() ) { // make sure string is still valid
// perform other code
}
And here is the stripPunct method:
void stripPunct(string & str) {
    string::iterator itr1 = str.begin();
    string::iterator itr2 = str.end();
    while ( !(str.empty()) && ispunct(*itr1) )
        itr1 = str.erase(itr1);
    itr2--;
    if ( itr2 != str.begin() ) {
        while ( !(str.empty()) && ispunct(*itr2) ) {
            itr2 = str.erase(itr2);
            itr2--;
        }
    }
}
First, note a couple of problems with your code:
after you call erase() using itr1, you've invalidated itr2.
when using a reverse_iterator to go backwards through a sequence, you want to use ++, not -- (that's kind of the reason reverse iterators exist).
Now, to improve the logic, you can avoid erasing each character individually by finding the first character you don't want to erase and erasing everything up to that point. find_if() can be used to help with that:
int not_punct(char c) {
    return !ispunct((unsigned char) c);
}
void stripPunct(string & str) {
    string::iterator itr = find_if( str.begin(), str.end(), not_punct);
    str.erase( str.begin(), itr);
    string::reverse_iterator ritr = find_if( str.rbegin(), str.rend(), not_punct);
    str.erase( ritr.base(), str.end());
}
Note that I've used base() to get the 'regular' iterator corresponding to the reverse_iterator. I find the logic for whether base() needs to be adjusted confusing (reverse iterators in general confuse me)- in this case it doesn't because we happen to want to start the erase after the character that's found.
This article by Scott Meyers, http://drdobbs.com/cpp/184401406, has a good treatment of reverse_iterator::base() in the section "Guideline 3: Understand How to Use a reverse_iterator's Base iterator". The information in that article has also been incorporated into Meyers's "Effective STL" book.
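To see the base() relationship without any erasing, here is a sketch that returns a trimmed copy instead of modifying the string in place. The name strip_punct_copy is made up. The key point is that base() of a reverse iterator refers to the element one position after the one the reverse iterator designates, which is exactly the "one past the last kept character" position needed for the end of the range.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns `str` without leading and trailing punctuation; punctuation
// in the middle of the word is kept.
std::string strip_punct_copy(const std::string& str) {
    auto not_punct = [](char c) {
        return !std::ispunct(static_cast<unsigned char>(c));
    };
    // First character to keep (or end() if there is none).
    auto first = std::find_if(str.begin(), str.end(), not_punct);
    // Search backwards, but no further back than `first`.
    auto rlast = std::find_if(str.rbegin(),
                              std::string::const_reverse_iterator(first),
                              not_punct);
    // base() is one past the last character to keep.
    return std::string(first, rlast.base());
}
```

Limiting the backward search to stop at first means a word made entirely of punctuation comes back as an empty string without any special case.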
You can't dereference the iterator returned by end() because it refers to the position one past the last element, not to an element, so you have to decrement it first.
And one final note: if the word consists only of punctuation, your program will fail, so be sure to handle that.
If you don't mind negative logic, you can do the following (note that it removes every punctuation character, including ones in the middle of the word):
string tmp_str="";
tmp_str.reserve(str.length());
for (string::iterator itr1 = str.begin(); itr1 != str.end(); itr1++)
{
    if (!ispunct(*itr1))
    {
        tmp_str.push_back(*itr1);
    }
}
str = tmp_str;

C++ STL: Trouble with iterators

I'm having a beginner problem:
bool _isPalindrome(const string& str)
{
    return _isPalindrome(str.begin(), str.end()); // won't compile
}
bool _isPalindrome(string::iterator begin, string::iterator end)
{
    return begin == end || *begin == *end && _isPalindrome(++begin, --end);
}
What am I doing wrong here? Why doesn't str.begin() get type checked to be a string::iterator?
Update: Better version:
bool BrittlePalindrome::_isPalindrome(string::const_iterator begin, string::const_iterator end)
{
    return begin >= end || *begin == *(end - 1) && _isPalindrome(++begin, --end);
}
Assuming that you have a declaration of the second function before the first function, the main issue is that you are passing the strings by const reference.
This means that the only overloads of begin() and end() that you have access to are the const versions which return std::string::const_iterator and not std::string::iterator.
The convention for iterators is that the end iterator points one beyond the end of a range and is not dereferenceable - certainly if you pass str.end() as the end parameter. This means that *begin == *end is not valid; you need to decrement end once first. You are also going to have an issue with ranges with odd numbers of elements. By doing ++begin and --end with no further checking, your iterators may cross over in the recursion rather than triggering the begin == end condition.
Also note that for maximum portability, global identifiers shouldn't start with an underscore.
str.begin() is non-const, while the argument str is const.
You can either change the iterator-accepting method to accept const_iterators, or you can change the string-accepting method to accept a non-const string.
Or you could cast away str's const-ness, but that would be a patent Bad Idea TM.
(I would also parenthesize your return statement on the iterator-accepting method to make your intent more clear, but that's neither here nor there.)
As previously mentioned your iterators need to be constant iterators, but there's something else wrong with your algorithm. It works fine if you have a string of odd length, but do you see what happens when your string is even length? Consider the palindrome:
aa
Your algorithm will pass in an iterator pointing to the front and to the end. All's good, then it will go to the next level, and all will still be good, but it won't end, because your first condition will never be true. You need to check not only if begin==end but if begin+1==end or begin==end-1 if you prefer. Otherwise your iterators are going to be upset.
What error are you getting?
Have you tried this?
bool _isPalindrome(string::const_iterator begin, string::const_iterator end)
replace iterator by const_iterator
swap function definitions
decrement end
Code:
bool isPalindrome(string::const_iterator begin, string::const_iterator end)
{
    return (begin == end || begin == --end ||
            *begin == *end && isPalindrome(++begin, end));
}
bool isPalindrome(const string& str)
{
    return isPalindrome(str.begin(), str.end());
}
You haven't declared the second function before calling it in the first function. The compiler can't find it and thus tries to convert str.begin() (a string::const_iterator, because str is const) into a const string &. You can move the first function after the second function.
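If the recursion isn't a requirement, the crossing-iterator and end-dereference problems can be avoided entirely by comparing the first half of the string against the reversed string with std::equal. This is a sketch of a different technique than the recursive one discussed above, and is_palindrome is a made-up name:

```cpp
#include <algorithm>
#include <string>

// Compares the first half of `s` with the second half read backwards.
// For odd lengths the middle character is never compared.
bool is_palindrome(const std::string& s) {
    return std::equal(s.begin(), s.begin() + s.size() / 2, s.rbegin());
}
```

Because str.rbegin() already starts at the last character, there is no end iterator to decrement, and taking the string by const reference is fine since only const iterators are needed.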