How C char is different from C++ string element accessed using index - c++

I am using the C++ erase-remove idiom and have run into a weird problem.
If I pass the element to remove via a string index, the result is not what I expect.
string str = "A man, a plan, a canal: Panama";
str.erase(remove(str.begin(), str.end(), str[1]), str.end());
Result : Aman, a plan, a canal: Panaa
If I pass a character literal instead, the result is as expected.
string str = "A man, a plan, a canal: Panama";
str.erase(remove(str.begin(), str.end(), ' '), str.end());
Result : Aman,aplan,acanal:Panama

Look at the signature of std::remove:
template< class ForwardIt, class T >
ForwardIt remove( ForwardIt first, ForwardIt last, const T& value );
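The value is passed by const reference, so the algorithm keeps referring to str[1] itself rather than to a copy of the space. Once the first space is removed, the following characters are shifted left, str[1] holds 'm', and from then on 'm' is what gets removed. It's unsafe to pass a reference to a container element to an algorithm that modifies that container.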

The algorithm std::remove is declared the following way
template<class ForwardIterator, class T>
ForwardIterator remove(
ForwardIterator first, ForwardIterator last,
const T& value // <-- reference!
);
As you can see the third parameter is declared as a reference const T& value.
So in this call
str.erase(remove(str.begin(), str.end(), str[1]), str.end());
the third argument is a reference to the object str[1], whose value the algorithm changes to the letter 'm' as soon as the first ' ' is removed and the following characters are shifted left.
If you wrote, for example,
str.erase(remove(str.begin(), str.end(), str[1] + 0), str.end());
you would get the expected result because in this case the reference refers to a temporary object.
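Another way to get the same effect is to copy the character into a local variable before calling std::remove, so the algorithm compares against a value that cannot change while elements are shifted. A minimal sketch, assuming a hypothetical helper named remove_char_at (not part of the question's code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: removes every character equal to the one stored at `index`.
std::string remove_char_at(std::string str, std::size_t index) {
    assert(index < str.size());
    const char target = str[index];  // a copy, not a reference into str
    str.erase(std::remove(str.begin(), str.end(), target), str.end());
    return str;
}
```

Called with the string from the question and index 1, this removes all spaces, because target stays ' ' for the whole pass.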

Related

Remove duplicate values from a vector<vector<string>>, not case sensitive? c++

Hi, I'm trying to remove duplicate values from my vector.
It is set up as a vector of vectors: the outer vector contains a list of interior vectors, and each interior vector holds 3 strings.
I tried:
removeCopies.erase( unique(removeCopies.begin(), removeCopies.end() ), removeCopies.end());
but it still leaves some strings inside the interior vector like:
mainVector: {
interiorVector1: string 1: "book", string 2: "noun", string3: "A book"
interiorVector2: string 1: "book", string 2: "noun", string3: "a BOok"
}
I also can't just change it all to lowercase, I can't edit the values inside of the vector.
If you need a better explanation, please ask. Thank you.
edit:
I tried
unique(stringVec.begin(), stringVec.end(), [](const string &a, const string
&b) { return lowercase(a) == lowercase(b); }), stringVec.end()
where lowercase() turns the entire string to lowercase. But it doesn't allow me to access the interior vector strings to do this.
Just like std::sort, std::unique accepts a BinaryPredicate that in the case of unique is used for equals-comparison:
template< class ForwardIt, class BinaryPredicate >
constexpr ForwardIt unique( ForwardIt first, ForwardIt last, BinaryPredicate p );
If you provide a case-insensitive predicate, then it should work just fine. I recommend boost::iequals if you don't want to re-invent the wheel.
The following code won't work for your specific nested-vector example, but if you only had strings inside the vector the resultant code would look something like:
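// boost::iequals is a function template, so wrap it in a lambda to pass it as a predicate
removeCopies.erase(std::unique(begin(removeCopies), end(removeCopies), [](const std::string& a, const std::string& b) { return boost::iequals(a, b); }), end(removeCopies));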
In your case, you'll probably want to write a custom lambda that internally uses iequals to perform element-wise comparison.
Edit: Here is a discount-version of iequals:
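#include <cctype>
#include <string>

// Returns true if both strings are equal when case is ignored.
bool iequals(const std::string& lhs, const std::string& rhs)
{
    if (lhs.size() != rhs.size())
        return false;
    for (std::size_t i = 0; i < lhs.size(); ++i)
    {
        // Cast to unsigned char: std::tolower is undefined for negative values.
        if (std::tolower(static_cast<unsigned char>(lhs[i])) != std::tolower(static_cast<unsigned char>(rhs[i])))
            return false;
    }
    return true;
}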
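For the nested-vector case from the question, the element-wise comparison could be sketched roughly as follows. The names Entry, iequals_ci and remove_adjacent_copies are hypothetical, and the comparison is ASCII-oriented. Note that std::unique only removes adjacent duplicates, so duplicates that are not next to each other would need the vector to be sorted (with a matching case-insensitive ordering) first.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical alias for one interior vector of strings.
using Entry = std::vector<std::string>;

// Case-insensitive string comparison (sketch; assumes single-byte characters).
bool iequals_ci(const std::string& lhs, const std::string& rhs) {
    return std::equal(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}

// Removes interior vectors that equal their predecessor when case is ignored.
void remove_adjacent_copies(std::vector<Entry>& entries) {
    auto same_entry = [](const Entry& a, const Entry& b) {
        // Compare the interior vectors string by string.
        return std::equal(a.begin(), a.end(), b.begin(), b.end(), iequals_ci);
    };
    entries.erase(std::unique(entries.begin(), entries.end(), same_entry),
                  entries.end());
}
```

The stored strings are never modified; only the comparison lowercases characters, which fits the constraint that the values inside the vector cannot be edited.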

lambda to take `char` argument

How should I define a lambda that takes a char from a string iterator? In the code below, the lambda detect_bracket has a problem with its input parameter x.
I don't want to delete ALL brackets from the string, just at the beginning and at the end.
auto detect_bracket = [](char* x){ return(')' == x || '(' == x);};
this->str.erase(std::remove_if(str.begin(), str.begin(),
detect_bracket)
);
this->str.erase(std::remove_if(str.back(), str.back(),
detect_bracket)
);
You should take char as the parameter type of the lambda, since std::remove_if passes each element of the range (a char here) directly to the predicate.
auto detect_bracket = [](char x){ return(')' == x || '(' == x);};
this->str.erase(std::remove_if(str.begin(), str.end(),
detect_bracket),
str.end()
);
Note that std::string::back() won't work with std::remove_if: it returns a char, whereas std::remove_if expects a range expressed by iterators.
Also, str.begin(), str.begin() is just an empty range. If you only want to remove the elements at the beginning and at the end, you could write:
auto detect_bracket = [](char x){ return(')' == x || '(' == x);};
if (!this->str.empty()) {
this->str.erase(std::remove_if(str.begin(), str.begin() + 1, detect_bracket), str.begin() + 1);
}
if (!this->str.empty()) {
this->str.erase(std::remove_if(str.end() - 1, str.end(), detect_bracket), str.end());
}
Note that we need to pass the correct end iterator to std::string::erase: std::remove_if returns an iterator even if it found nothing, and the single-iterator form of erase would then remove a character that should have been kept.
std::remove_if is a function with the following signature:
template< class ForwardIt, class UnaryPredicate >
ForwardIt remove_if( ForwardIt first, ForwardIt last, UnaryPredicate p );
p - unary predicate which returns true if the element should be removed.
The signature of the predicate function should be equivalent to the following:
bool pred(const Type &a);
The type Type must be such that an object of type ForwardIt can be dereferenced and then implicitly converted to Type.
All you need is to change your function parameter from char* to char.
Multiple remove_if and erase calls modify the string and invalidate its iterators anyway. Why not simply create a new string, starting the copy at position 0 or 1 of the source depending on whether it begins with a bracket, and ending it at the last or second-to-last character depending on whether it ends with one?
string target;
target.assign(source.begin() + skip_if_bracket_at_begin,
source.end() - skip_if_bracket_at_end);
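A fuller sketch of that idea, with the two skip flags computed explicitly. The names is_bracket and strip_outer_brackets are hypothetical; the extra size check keeps a one-character string such as "(" from being counted as both the leading and the trailing bracket.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate, equivalent to detect_bracket above.
bool is_bracket(char c) { return c == '(' || c == ')'; }

// Returns a copy of `source` without a leading and/or trailing bracket.
std::string strip_outer_brackets(const std::string& source) {
    if (source.empty()) return source;
    const bool skip_front = is_bracket(source.front());
    // Only inspect the back if it is a different character than the skipped front.
    const bool skip_back =
        source.size() > (skip_front ? 1u : 0u) && is_bracket(source.back());
    std::string target;
    target.assign(source.begin() + (skip_front ? 1 : 0),
                  source.end() - (skip_back ? 1 : 0));
    return target;
}
```

Brackets in the middle of the string are left untouched, since only the first and last characters are ever inspected.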

How to adapt a string splitting algorithm using pointers so it uses iterators instead?

The code below comes from an answer to this question on string splitting. It uses pointers, and a comment on that answer suggested it could be adapted for std::string. How can I use the features of std::string to implement the same algorithm, for example using iterators?
#include <vector>
#include <string>
using namespace std;
vector<string> split(const char *str, char c = ',')
{
vector<string> result;
do
{
const char *begin = str;
while(*str != c && *str)
str++;
result.push_back(string(begin, str));
} while (0 != *str++);
return result;
}
Ok so I obviously replaced char by string but then I noticed he is using a pointer to the beginning of the character. Is that even possible for strings? How do the loop termination criteria change? Is there anything else I need to worry about when making this change?
You can use iterators instead of pointers. Iterators provide a way to traverse containers, and can usually be thought of as analogous to pointers.
In this case, you can use the begin() member function (or cbegin() if you don't need to modify the elements) of a std::string object to obtain an iterator that references the first character, and the end() (or cend()) member function to obtain an iterator for "one-past-the-end".
For the inner loop, your termination criterion is the same; you want to stop when you hit the delimiter on which you'll be splitting the string. For the outer loop, instead of comparing the character value against '\0', you can compare the iterator against the end iterator you already obtained from the end() member function. The rest of the algorithm is pretty similar; iterators work like pointers in terms of dereference and increment:
std::vector<std::string> split(const std::string& str, const char delim = ',') {
std::vector<std::string> result;
auto end = str.cend();
auto iter = str.cbegin();
while (iter != end) {
auto begin = iter;
while (iter != end && *iter != delim) ++iter;
result.push_back(std::string(begin, iter));
if (iter != end) ++iter; // See note (**) below.
}
return result;
}
Note the subtle difference in the inner loop condition: it now tests whether we've hit the end before trying to dereference. This is because we can't dereference an iterator that points to the end of a container, so we must check this before trying to dereference. The original algorithm assumes that a null character ends the string, so we're ok to dereference a pointer to that position.
(**) The validity of iter++ != end when iter is already end is under discussion in the question "Are end+1 iterators for std::string allowed?"
I've added this if statement to the original algorithm to break the loop when iter reaches end in the inner loop. This avoids adding one to an iterator which is already the end iterator, and avoids the potential problem.
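One behavioral difference is worth knowing: the pointer version produces an empty token for an empty input and after a trailing delimiter, while the iterator loop above does not. If that behavior matters, a variant built on std::find could look like the following sketch (the name split_keep_empty is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Splits on `delim` and keeps empty tokens, including a trailing one.
std::vector<std::string> split_keep_empty(const std::string& str, char delim = ',') {
    std::vector<std::string> result;
    auto begin = str.cbegin();
    const auto end = str.cend();
    while (true) {
        // Find the next delimiter, or end if there is none.
        const auto pos = std::find(begin, end, delim);
        result.emplace_back(begin, pos);
        if (pos == end) break;  // last token stored; never step past end
        begin = pos + 1;        // safe: pos != end, so pos + 1 is at most end
    }
    return result;
}
```

Here the loop always stores one token before checking for the end, which mirrors the do/while structure of the pointer version.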

String replacement in C++ on string of arbitrary length

I have a string I get from ostringstream. I'm currently trying to replace some characters in this string (content.replace(content.begin(), content.end(), "\n", "");) but sometimes I get an exception:
malloc: *** mach_vm_map(size=4294955008) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
std::bad_alloc
I suspect that this happens because the string is too big. What's the best practice for these situations? Declare the string on the heap?
Update
My full method:
xml_node HTMLDocument::content() const {
xml_node html = this->doc.first_child();
xml_node body = html.child("body");
xml_node section = body.child("section");
std::ostringstream oss;
if (section.type() != xml_node_type::node_null) {
section.print(oss);
} else {
body.print(oss);
}
string content;
content = oss.str();
content.replace(content.begin(), content.end(), "<section />", "<section></section>");
content.replace(content.begin(), content.end(), "\t", "");
xml_node node;
return node;
}
std::string::replace has no overload that accepts a pair of iterators, a const char* to search for, and a const char* to use as the replacement, and this is where your problem comes from:
content.replace(content.begin(), content.end(), "\n", "");
matches the following overload:
template <class InputIterator>
string& replace(iterator i1, iterator i2,
InputIterator first, InputIterator last);
that is, "\n" and "" are treated as the range [first, last), which, depending on the addresses they happen to have, may or may not crash your program.
You have to either use std::regex or implement your own logic that iterates through std::string and replaces any encountered pattern with a replacement string.
The lines:
content.replace(content.begin(), content.end(), "<section />", "<section></section>");
content.replace(content.begin(), content.end(), "\t", "");
result in undefined behavior. They match the function:
template<class InputIterator>
std::string& std::string::replace(
const_iterator i1, const_iterator i2,
InputIterator j1, InputIterator j2);
with InputIterator resolving to char const*. The problem is
that the distance between the two iterators, and whether the
second can be reached from the first, is undefined, since they
point to totally unrelated bits of memory.
From your code, I don't think you understand what
std::string::replace does. It replaces the range [i1,i2) in
the string with the text defined by the range [j1,j2). It
does not do any search and comparison; it is for use after
you have found the range which needs replacing. Calling:
content.replace(content.begin(), content.end(), "<section />", "<section></section>");
has exactly the same effect as:
content = std::string( "<section />", "<section></section>");
which is certainly not what you want.
In C++11, there's a regex_replace function which may be of
some use, although if you're really doing this on very large
strings, it may not be the most performant (the added
flexibility of regular expressions comes at a price); I'd
probably use something like:
std::string
searchAndReplace(
std::string const& original,
std::string const& from,
std::string const& to)
{
std::string results;
std::string::const_iterator current = original.begin();
std::string::const_iterator end = original.end();
std::string::const_iterator next = std::search( current, end, from.begin(), from.end() );
while ( next != end ) {
results.append( current, next );
results.append( to );
current = next + from.size();
next = std::search( current, end, from.begin(), from.end() );
}
results.append( current, next );
return results;
}
For very large strings, some heuristic for guessing the size,
and then doing a reserve on results is probably a good idea
as well.
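If modifying the string in place is acceptable, the position-based overloads std::string::find(str, pos) and std::string::replace(pos, len, str) can be combined into a short loop. This is only a sketch with a hypothetical name (replace_all); unlike the append-based version above, each replacement may shift the tail of the string, so it can be slower on very large inputs with many matches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every occurrence of `from` in `text` with `to`, in place.
void replace_all(std::string& text, const std::string& from, const std::string& to) {
    if (from.empty()) return;  // an empty pattern would match at every position
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue searching after the inserted text
    }
}
```

Advancing pos past the inserted text also prevents an endless loop when to contains from.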
Finally, since your second line just removes '\t', you'd be
better off using std::remove:
content.erase( std::remove( content.begin(), content.end(), '\t' ), content.end() );
AFAIK STL strings are always allocated on the heap once they exceed a certain (small) size; the exact threshold is implementation-specific, typically somewhere around 15 to 22 chars.
What you can do if you get allocation exceptions:
Use a custom allocator
Use a "rope" class.
bad_alloc might not mean you've run out of memory; more likely you've run out of contiguous memory. A rope class might suit you better, as it allocates strings in pieces internally.
This is one of the correct (and reasonably efficient) ways to remove characters from a string if you want to make a copy and leave the original intact:
#include <algorithm>
#include <string>
std::string delete_char(std::string src, char to_remove)
{
// note: src is a copy so we can mutate it
// move all offending characters to the end and get the iterator to last good char + 1
auto begin_junk = std::remove_if(src.begin(),
src.end(),
[&to_remove](const char c) { return c == to_remove; });
// chop off all the characters we wanted to remove
src.erase(begin_junk,
src.end());
// return by value; a by-value parameter is moved implicitly on return
return src;
}
called like this:
std::string src("a\nb\nc");
auto dest = delete_char(src, '\n');
assert(dest == "abc");
If you'd prefer to modify the string in place then simply:
src.erase(std::remove_if(src.begin(), src.end(), [](char c) { return c == '\n'; }), src.end());

Replace multiple spaces with one space in a string

How would I do something in c++ similar to the following code:
//Lang: Java
string.replaceAll(" ", " ");
This code-snippet would replace all multiple spaces in a string with a single space.
bool BothAreSpaces(char lhs, char rhs) { return (lhs == rhs) && (lhs == ' '); }
std::string::iterator new_end = std::unique(str.begin(), str.end(), BothAreSpaces);
str.erase(new_end, str.end());
How this works: std::unique has two forms. The first form goes through a range and removes adjacent duplicates, so the string "abbaaabbbb" becomes "abab". The second form, which I used, takes a predicate that receives two elements and returns true if they should be considered duplicates. The function I wrote, BothAreSpaces, serves this purpose. It determines exactly what its name implies: that both of its parameters are spaces. So when combined with std::unique, duplicate adjacent spaces are removed.
Just like std::remove and remove_if, std::unique doesn't actually make the container smaller, it just moves elements at the end closer to the beginning. It returns an iterator to the new end of range so you can use that to call the erase function, which is a member function of the string class.
Breaking it down, the erase function takes two parameters, a begin and an end iterator for the range to erase. For its first parameter I'm passing the return value of std::unique, because that's where I want to start erasing. For its second parameter, I am passing the string's end iterator.
So, I tried an approach with std::remove_if and a lambda expression. To my eyes it is easier to follow than the code above, though it lacks that "wow, neat, didn't realize you could do that" quality. I'm posting it anyway, if only for learning purposes:
bool prev(false);
char rem(' ');
auto iter = std::remove_if(str.begin(), str.end(), [&] (char c) -> bool {
if (c == rem && prev) {
return true;
}
prev = (c == rem);
return false;
});
str.erase(iter, str.end());
EDIT: I realized that std::remove_if returns an iterator which can be used directly, so I removed the unnecessary code.
A variant of Benjamin Lindley's answer that uses a lambda expression to make things cleaner:
std::string::iterator new_end =
std::unique(str.begin(), str.end(),
[=](char lhs, char rhs){ return (lhs == rhs) && (lhs == ' '); }
);
str.erase(new_end, str.end());
Why not use a regular expression:
str = boost::regex_replace(str, boost::regex(" {2,}"), " ");
How about isspace(lhs) && isspace(rhs) to handle all types of whitespace?
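A rough sketch of that suggestion, assuming single-byte characters and a hypothetical function name (collapse_whitespace). The arguments are taken as unsigned char because std::isspace is undefined for negative values. Note that each run of whitespace is reduced to its first character, which is not necessarily a plain space.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Collapses each run of whitespace characters to the first character of the run.
void collapse_whitespace(std::string& str) {
    auto both_space = [](unsigned char lhs, unsigned char rhs) {
        return std::isspace(lhs) && std::isspace(rhs);
    };
    str.erase(std::unique(str.begin(), str.end(), both_space), str.end());
}
```

If every run should become exactly one ' ', a follow-up pass with std::replace_if over the remaining whitespace characters would be one option.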