C++ removing punctuation on strings, erase()/iterator issue

I know I'm not the first person to bring up the issue with reverse iterators trying to call the erase() method on strings. However, I wasn't able to find any good ways around this.
I'm reading the contents of a file, which contains a bunch of words. When I read in a word, I want to pass it to a function I have called stripPunct. However, I ONLY want to strip punctuation at the beginning and end of a string, not in the middle.
So for instance:
(word) should strip '(' and ')' resulting in just word
don't! should strip '!' resulting in just don't
So my logic (which I'm sure could be improved) was to have two while loops, one starting at the end and one at the beginning, traversing and erasing until it hits a non-punctuation char.
void stripPunct(string & str) {
    string::iterator itr1 = str.begin();
    string::reverse_iterator itr2 = str.rbegin();
    while ( ispunct(*itr1) ) {
        str.erase(itr1);
        itr1++;
    }
    while ( ispunct(*itr2) ) {
        str.erase(itr2);
        itr2--;
    }
}
However, obviously it's not working because erase() requires a regular iterator and not a reverse_iterator. But anyways, I feel like that logic is pretty inefficient.
Also, I tried instead of a reverse_iterator using just a regular iterator, starting it at str.end(), then decremented it, but it says I cannot dereference the iterator if I start it at str.end().
Can anyone help me with a good way to do this? Or maybe point out a workaround for what I already have?
Thank you so much in advance!
------------------ [ EDIT ] ----------------------------
found a solution, although it may not be the best solution:
// Call the stripPunct method:
stripPunct(str);
if ( !str.empty() ) { // make sure string is still valid
// perform other code
}
And here is the stripPunct method:
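void stripPunct(string & str) {
    // strip leading punctuation; erase() returns the next valid iterator
    string::iterator itr1 = str.begin();
    while ( !str.empty() && ispunct((unsigned char) *itr1) )
        itr1 = str.erase(itr1);
    if ( str.empty() )
        return;
    // take end() only now: the erases above invalidate older iterators
    string::iterator itr2 = str.end();
    itr2--;
    while ( itr2 != str.begin() && ispunct((unsigned char) *itr2) ) {
        itr2 = str.erase(itr2);
        itr2--;
    }
}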

First, note a couple of problems with your code:
after you call erase() using itr1, you've invalidated itr2.
when using a reverse_iterator to go backwards through a sequence, you want to use ++, not -- (that's kind of the reason reverse iterators exist).
Now, to improve the logic, you can avoid erasing each character individually by finding the first character you don't want to erase and erasing everything up to that point. find_if() can be used to help with that:
int not_punct(char c) {
    return !ispunct((unsigned char) c);
}
void stripPunct(string & str) {
    string::iterator itr = find_if( str.begin(), str.end(), not_punct);
    str.erase( str.begin(), itr);
    string::reverse_iterator ritr = find_if( str.rbegin(), str.rend(), not_punct);
    str.erase( ritr.base(), str.end());
}
Note that I've used base() to get the 'regular' iterator corresponding to the reverse_iterator. I find the logic for whether base() needs to be adjusted confusing (reverse iterators in general confuse me). In this case it doesn't, because we happen to want to start the erase after the character that's found.
This article by Scott Meyers, http://drdobbs.com/cpp/184401406, has a good treatment of reverse_iterator::base() in the section "Guideline 3: Understand How to Use a reverse_iterator's Base iterator". The information in that article has also been incorporated into Meyers' "Effective STL" book.
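To make the base() relationship a bit more concrete, here is a minimal sketch that returns a stripped copy instead of erasing in place. The names notPunct and stripCopy are made up for this example; the point is only that base() on the reverse iterator gives the forward iterator one past the last character to keep.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True for any character that is not punctuation. The cast to unsigned char
// avoids undefined behavior in ispunct() for negative char values.
static bool notPunct(char c) {
    return !std::ispunct(static_cast<unsigned char>(c));
}

// Returns a copy of `word` with leading and trailing punctuation removed.
// (stripCopy is a hypothetical name used only for this sketch.)
std::string stripCopy(const std::string& word) {
    // First character to keep, searching forward.
    std::string::const_iterator first =
        std::find_if(word.begin(), word.end(), notPunct);
    // Last character to keep, searching backward. base() yields the forward
    // iterator one past that character, i.e. the end of the range to keep.
    std::string::const_iterator last =
        std::find_if(word.rbegin(), word.rend(), notPunct).base();
    // For an empty or all-punctuation word, first is end() and last is begin().
    if (first >= last)
        return std::string();
    return std::string(first, last);
}
```

With this sketch, stripCopy("(word)") gives word and stripCopy("don't!") gives don't, while a word made only of punctuation comes back empty.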

You can't dereference str.end() because it doesn't refer to a character; it marks the position one past the last character of the string, so you have to decrement it first.
And one final note: if the word consists only of punctuation, your program will fail, so be sure to handle that case.
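Both points can be handled with a regular iterator if you always look at the character before the iterator rather than at the iterator itself. This is only a sketch, and stripTrailing is a made-up helper name:

```cpp
#include <cctype>
#include <string>

// Removes trailing punctuation in place using a regular (forward) iterator.
// (stripTrailing is a hypothetical helper name for this sketch.)
void stripTrailing(std::string& str) {
    std::string::iterator it = str.end();
    // Step back only while there is something before `it`, and inspect the
    // character *before* `it`, so end() itself is never dereferenced.
    while (it != str.begin() &&
           std::ispunct(static_cast<unsigned char>(*(it - 1)))) {
        --it;
    }
    // One range erase; a no-op when nothing needs removing.
    str.erase(it, str.end());
}
```

Because the loop stops at begin(), a word that is all punctuation simply ends up empty instead of running off the front of the string.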

If you don't mind negative logic, you can do the following:
string tmp_str="";
tmp_str.reserve(str.length());
for (string::iterator itr1 = str.begin(); itr1 != str.end(); itr1++)
{
    if (!ispunct(*itr1))
    {
        tmp_str.push_back(*itr1);
    }
}
str = tmp_str;
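The same filtering can probably be written with the erase-remove idiom, which avoids the temporary string. Note that, like the loop above, this removes every punctuation character, including ones in the middle of a word, so it does not match the original requirement of stripping only the ends. The names isPunctChar and removeAllPunct are assumptions for this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Predicate wrapper so ispunct() always receives a non-negative value.
static bool isPunctChar(char c) {
    return std::ispunct(static_cast<unsigned char>(c)) != 0;
}

// Removes every punctuation character, including ones inside the word.
// (removeAllPunct is a hypothetical name for this sketch.)
void removeAllPunct(std::string& str) {
    // remove_if() shifts the kept characters to the front and returns the
    // new logical end; erase() then drops the leftover tail.
    str.erase(std::remove_if(str.begin(), str.end(), isPunctChar), str.end());
}
```

For example, "(don't!)" becomes dont, with the apostrophe gone as well.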

Related

Trouble using iterator on a list of objects

So I have a class called symbol, which is made up of 4 strings which are all public. I created a list of these and I want to do a look ahead on this list. This is what I have so far. I looked up the iterator methods and it says it supports the + operator but I get an error for this.
bool parser::parse(list<symbol> myList){
    //Will read tokens by type to make sure that they pass the parse
    std::list<symbol>::const_iterator lookAhead = myList.begin();
    if ((lookAhead + 1) != myList.end)
        lookAhead++;
    for (std::list<symbol>::const_iterator it = myList.begin(); it != myList.end(); ++it){
        if (it->type == "") {
        }
    }
    return true;
}
I get an error when trying to add 1 to lookAhead. What are some good ways of creating a look ahead for a list?
Thanks,
Binx
A linked list does not support random access iterators, i.e. you cannot add an integer to its iterators.
Use std::next(lookAhead) to get the next iterator instead, or std::advance(lookAhead, 1). These functions know what kind of iterator is being passed, and will use a random seek if possible (e.g. with std::vector's random-access iterators), or manually advance (with a loop in the case of std::advance()) otherwise, as in this case.
Be careful advancing on iterators unconditionally, though -- advancing past end() is undefined!
You can read more about the different categories of C++ iterators here.
Side note: You're copying the entire list when it's passed in, since you're passing it by value. You probably want to pass it by reference instead (list<symbol> const& myList). You can also simplify your code using the C++11 auto keyword, which deduces the type automatically from the type of the expression that initializes the variable:
bool parser::parse(list<symbol> const& myList){
    // Will read tokens by type to make sure that they pass the parse
    auto lookAhead = myList.begin();
    if (lookAhead != myList.end() && std::next(lookAhead) != myList.end())
        ++lookAhead;
    for (auto it = myList.begin(); it != myList.end(); ++it){
        if (it->type == "") {
        }
    }
    return true;
}
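If the look-ahead is needed at every position rather than just once, one possible shape is to compute it inside the loop, where the current iterator is already known not to be end(). The sketch below uses a list of plain strings instead of the symbol class, and countAdjacentRepeats is a made-up example function, not part of the parser in the question.

```cpp
#include <iterator>
#include <list>
#include <string>

// Counts positions where an element is immediately followed by an equal one.
// (countAdjacentRepeats is a hypothetical example, not the question's parser.)
int countAdjacentRepeats(const std::list<std::string>& tokens) {
    int repeats = 0;
    for (auto it = tokens.begin(); it != tokens.end(); ++it) {
        // Safe: `it` is not end() inside the loop body, so the result is
        // at most end() and never past it.
        auto lookAhead = std::next(it);
        if (lookAhead == tokens.end())
            break;  // last element, nothing to peek at
        if (*it == *lookAhead)
            ++repeats;
    }
    return repeats;
}
```

The check against end() before dereferencing lookAhead is what keeps this within defined behavior.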

How to adapt a string splitting algorithm using pointers so it uses iterators instead?

The code below comes from an answer to this question on string splitting. It uses pointers, and a comment on that answer suggested it could be adapted for std::string. How can I use the features of std::string to implement the same algorithm, for example using iterators?
#include <vector>
#include <string>
using namespace std;
vector<string> split(const char *str, char c = ',')
{
    vector<string> result;
    do
    {
        const char *begin = str;
        while(*str != c && *str)
            str++;
        result.push_back(string(begin, str));
    } while (0 != *str++);
    return result;
}
Ok so I obviously replaced char by string but then I noticed he is using a pointer to the beginning of the character. Is that even possible for strings? How do the loop termination criteria change? Is there anything else I need to worry about when making this change?
You can use iterators instead of pointers. Iterators provide a way to traverse containers, and can usually be thought of as analogous to pointers.
In this case, you can use the begin() member function (or cbegin() if you don't need to modify the elements) of a std::string object to obtain an iterator that references the first character, and the end() (or cend()) member function to obtain an iterator for "one-past-the-end".
For the inner loop, your termination criterion is the same; you want to stop when you hit the delimiter on which you'll be splitting the string. For the outer loop, instead of comparing the character value against '\0', you can compare the iterator against the end iterator you already obtained from the end() member function. The rest of the algorithm is pretty similar; iterators work like pointers in terms of dereference and increment:
std::vector<std::string> split(const std::string& str, const char delim = ',') {
    std::vector<std::string> result;
    auto end = str.cend();
    auto iter = str.cbegin();
    while (iter != end) {
        auto begin = iter;
        while (iter != end && *iter != delim) ++iter;
        result.push_back(std::string(begin, iter));
        if (iter != end) ++iter; // See note (**) below.
    }
    return result;
}
Note the subtle difference in the inner loop condition: it now tests whether we've hit the end before trying to dereference. This is because we can't dereference an iterator that points to the end of a container, so we must check this before trying to dereference. The original algorithm assumes that a null character ends the string, so we're ok to dereference a pointer to that position.
(**) The validity of iter++ != end when iter is already end is discussed in the question "Are end+1 iterators for std::string allowed?"
I've added this if statement to the original algorithm to break the loop when iter reaches end in the inner loop. This avoids adding one to an iterator which is already the end iterator, and avoids the potential problem.
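One behavioral difference worth knowing: for input that is empty or ends with the delimiter, the pointer-based original produces a final empty string, while the iterator version above does not. If that trailing empty field matters, a variant built on std::find could look like the sketch below. The name splitFind is an assumption for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Splits `str` on `delim`, keeping empty fields (including a trailing one),
// which mirrors the pointer-based original.
// (splitFind is a hypothetical name for this sketch.)
std::vector<std::string> splitFind(const std::string& str, char delim = ',') {
    std::vector<std::string> result;
    auto begin = str.cbegin();
    const auto end = str.cend();
    while (true) {
        // Find the next delimiter, or end if there is none.
        auto pos = std::find(begin, end, delim);
        result.emplace_back(begin, pos);
        if (pos == end)
            break;
        // pos refers to a real character here, so pos + 1 is at most end.
        begin = pos + 1;
    }
    return result;
}
```

With this sketch, "a,b,,c" yields four fields (the third empty), "a," yields two fields, and an empty input yields a single empty field.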

Check if the following position of a pointer is null or end of line C++

Well, I have an iterator over a string in C++
std::string::iterator itB = itC->begin();
And I want to check if the next position of the iterator reaches the end of the line. I've tried this:
if(*itB ++ != NULL)
Also this:
itB++ == itC->end()
But now I'm really confused and I need some help, since I'm a rookie at pointers in C++.
You want to check it without modifying it. Both of your attempts involve modifying itB. If you have C++11, that's just std::next:
std::next(itB) == itC->end()
If you don't, just make another iterator:
std::string::iterator next = itB;
++next;
next == itC->end()
Although, in this case specifically we know that std::string::iterator is a random access iterator, so we can also just do:
itB + 1 == itC->end()
(The previous two work for any iterator category)
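One caveat with all three forms: they assume itB is not already the end iterator, since advancing past end() is undefined. A small guard, sketched here with a made-up helper name isLastChar, checks that first:

```cpp
#include <iterator>
#include <string>

// True when `it` refers to the last character of `s`.
// (isLastChar is a hypothetical helper name for this sketch.)
bool isLastChar(const std::string& s, std::string::const_iterator it) {
    // Test for end() first: std::next(s.end()) would be undefined behavior.
    return it != s.end() && std::next(it) == s.end();
}
```

The iterator is taken by value, so the caller's iterator is not modified by the check.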

Function with set<string> and iterator

this is my homework:
Write a function that prints all strings with a length of 3. Your
solution must use a for loop with iterators.
void print3(const set<string> & str)
And this is my code:
void print3(const set<string>& str){
    string st;
    set<string,less<string>>::iterator iter;
    for(iter=str.begin();iter!=str.end();++iter)
    {
        st=*iter;
        if(st.length()==3) cout<<st<<' ';
    }
}
But I think it's not good. Does anyone have better code? Please help me improve it.
I have another question about iterators:
string name[]={"halohg","nui","ght","jiunji"};
set<string> nameSet(name,name+4);
set<string>::iterator iter;
iter=name.begin();
How can I access name[2]="ght" by using iterator?
I tried iter+2 but it has some problems. I think I have to use random access iterator but I don't know how to use it.
Please, help me. Thanks a lot!
Some thoughts on improvement:
You can get rid of string st; and just check if (iter->length() == 3).
Another improvement would be to use a const_iterator instead of an iterator, since you aren't modifying any of the items.
Also, adding less<string> as a template parameter is kind of useless, since that's the default compare functor anyway, so it can be removed.
And lastly, it's generally a good idea to declare your locals with minimal scope (so they don't pollute other scopes or introduce unexpected hiding issues), so usually you want to declare your iter in the for.
So it becomes:
for (set<string>::const_iterator iter = str.begin(); iter != str.end(); ++iter) {
    if (iter->length() == 3) cout << *iter << ' ';
}
That's about as good as you can get, given your requirements.
As for your second question, set's iterator is not a random access iterator. It's a (constant) Bidirectional Iterator. You can use std::advance if you want, though, and do:
std::set<std::string>::iterator iter;
iter = nameSet.begin();
std::advance(iter, 2);
// iter now points to the third element in the set's sorted order
Just remember that set sorts its elements, so this is not necessarily name[2].
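To see the effect of the sorting, here is a small sketch that wraps the advance in a bounds check. The helper name nthInOrder is an assumption for this example, and returning an empty string for an out-of-range index is just one possible choice.

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <string>

// Returns the element at position `n` in the set's sorted order, or an empty
// string when `n` is out of range. (nthInOrder is a hypothetical helper.)
std::string nthInOrder(const std::set<std::string>& s, std::size_t n) {
    if (n >= s.size())
        return std::string();  // never advance past end()
    std::set<std::string>::const_iterator it = s.begin();
    // For a bidirectional iterator, advance() steps one element at a time.
    std::advance(it, static_cast<std::ptrdiff_t>(n));
    return *it;
}
```

With the four names from the question, the set holds them as ght, halohg, jiunji, nui, so position 2 is jiunji rather than ght.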

C++ STL: Trouble with iterators

I'm having a beginner problem:
bool _isPalindrome(const string& str)
{
    return _isPalindrome(str.begin(), str.end()); // won't compile
}
bool _isPalindrome(string::iterator begin, string::iterator end)
{
    return begin == end || *begin == *end && _isPalindrome(++begin, --end);
}
What am I doing wrong here? Why doesn't str.begin() get type checked to be a string::iterator?
Update: Better version:
bool BrittlePalindrome::_isPalindrome(string::const_iterator begin, string::const_iterator end)
{
    return begin >= end || *begin == *(end - 1) && _isPalindrome(++begin, --end);
}
Assuming that you have a declaration of the second function before the first function, the main issue is that you are passing the strings by const reference.
This means that the only overloads of begin() and end() that you have access to are the const versions which return std::string::const_iterator and not std::string::iterator.
The convention for iterators is that the end iterator points one beyond the end of a range and is not dereferenceable - certainly if you pass str.end() as the end parameter. This means that *begin == *end is not valid; you need to decrement end once first. You are also going to have an issue with ranges with odd numbers of elements. By doing ++begin and --end with no further checking, your iterators may cross over in the recursion rather than triggering the begin == end condition.
Also note that for maximum portability, global identifiers shouldn't start with an underscore.
The iterator-accepting method takes a non-const string::iterator, while the argument str is const, so str.begin() returns a string::const_iterator.
You can either change the iterator-accepting method to accept const_iterators, or you can change the string-accepting method to accept a non-const string.
Or you could cast away str's const-ness, but that would be a patent Bad Idea™.
(I would also parenthesize your return statement on the iterator-accepting method to make your intent more clear, but that's neither here nor there.)
As previously mentioned your iterators need to be constant iterators, but there's something else wrong with your algorithm. It works fine if you have a string of odd length, but do you see what happens when your string is even length? Consider the palindrome:
aa
Your algorithm will pass in an iterator pointing to the front and to the end. All's good, then it will go to the next level, and all will still be good, but it won't end, because your first condition will never be true. You need to check not only if begin==end but also if begin+1==end (or begin==end-1 if you prefer). Otherwise your iterators are going to be upset.
What error are you getting?
Have you tried this?
bool _isPalindrome(string::const_iterator begin, string::const_iterator end)
replace iterator by const_iterator
swap function definitions
decrement end
Code:
bool isPalindrome(string::const_iterator begin, string::const_iterator end)
{
    return (begin == end || begin == --end ||
            *begin == *end && isPalindrome(++begin, end));
}
bool isPalindrome(const string& str)
{
    return isPalindrome(str.begin(), str.end());
}
You haven't declared the second function before calling it in the first function. The compiler can't find it and thus tries to convert str.begin() (string::iterator) into a const string &. You can move the first function behind the second function.
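For comparison, the same check can be written as a loop with two const_iterators, which sidesteps both the overload-ordering question and the recursion. This is only a sketch; isPalindromeLoop is a made-up name, and it follows the same "decrement end before dereferencing" rule discussed in the answers.

```cpp
#include <string>

// Iterative palindrome check using two iterators that walk toward each other.
// (isPalindromeLoop is a hypothetical name for this sketch.)
bool isPalindromeLoop(const std::string& str) {
    std::string::const_iterator left = str.begin();
    std::string::const_iterator right = str.end();
    // Stop when the iterators meet, either before or after stepping `right`
    // back. This covers both even and odd lengths without crossing over.
    while (left != right && left != --right) {
        if (*left != *right)
            return false;
        ++left;
    }
    return true;
}
```

The two-part loop condition plays the same role as the begin == end || begin == --end test in the recursive version: the first part catches even-length ranges, the second catches odd-length ones.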