String iterators and understanding a function


bool e_broj(const string &s){
    string::const_iterator it = s.begin();
    while(it != s.end() && isdigit(*it)){
        ++it;
    }
    return !s.empty() && it == s.end();
}
I have this function to check if a string is a number. I found this snippet online and I would like to understand how it works.
// this declares it as the beginning of the string (iterator)
string::const_iterator it = s.begin();
// this checks until the end of the string and
// checks if each character of the iterator is a digit?
while(it != s.end() && isdigit(*it)){
// this line increases the iterator for next
// character after checking the previous character?
++it;
// this line returns true (is number) if the iterator
// came to the end of the string and the string is empty?
return !s.empty() && it == s.end();

Your understanding is almost right. The only mistake was at the end:
// this line returns true (is number) if the iterator
// came to the end of the string and the string is empty?
return !s.empty() && it == s.end();
This should say "and the string is not empty", because the expression is !s.empty(), rather than just s.empty().
You may just have worded this funny, but to be clear, the condition on the while loop will keep the iterator moving through the string while it's not at the end and while the characters are still digits.
Your terminology with regards to the iterator makes me think you don't quite understand fully what it's doing. You can think of an iterator as being like a pointer (actually, pointers are iterators, but not necessarily vice-versa). The first line gives you an iterator that "points at" the first character in the string. Doing it++ moves the iterator to the next character. s.end() gives an iterator that points one past the end of the string (this is a valid iterator). *it gives you the character that the iterator is "pointing at".
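To make the iterator mechanics concrete, here is a minimal sketch of the same loop that reports how far the iterator got. The name leading_digits is made up for illustration, and the cast to unsigned char is an added precaution (passing a negative char to std::isdigit is undefined behaviour), not something the original snippet does.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: counts how many digits the string starts with.
std::size_t leading_digits(const std::string& s) {
    // "Points at" the first character (or equals s.end() if s is empty).
    std::string::const_iterator it = s.begin();
    // Advance while we are not at the end and the current character is a digit.
    while (it != s.end() && std::isdigit(static_cast<unsigned char>(*it))) {
        ++it;
    }
    // Distance from the start to where the loop stopped.
    return static_cast<std::size_t>(std::distance(s.begin(), it));
}
```

If the returned count equals s.size(), the iterator reached s.end(), which is exactly the it == s.end() test in the question.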

The while loop stops at the end of the string OR when a non-digit shows up.
So, if we did not advance all the way to the end (it != s.end()), then the string has non-digit and therefore is not a number.
Empty string is a special case: it has no non-digits but it's not a number either.
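The same check can also be written without an explicit loop. The following is only a sketch under a few assumptions: the name is_all_digits is hypothetical, and std::all_of is swapped in for the hand-written while loop. The empty-string special case still has to be handled separately, because std::all_of returns true for an empty range.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper with the same contract as e_broj in the question:
// true only for a non-empty string made up entirely of digits.
bool is_all_digits(const std::string& s) {
    return !s.empty() &&
           std::all_of(s.begin(), s.end(), [](unsigned char c) {
               // Taking the character as unsigned char keeps std::isdigit well-defined.
               return std::isdigit(c) != 0;
           });
}
```

Note that, like the original, this rejects signs and decimal points, so "-12" and "3.5" are not considered numbers.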

Related

Bounds of std::string::find_first_of

Suppose I have a string foo and I want to search for the second period, if any.
I'm using this code:
std::size_t start = foo.find_first_of('.');
if (start != std::string::npos){
std::size_t next = foo.find_first_of('.', start + 1);
/*and so on*/
I'm wondering if this is well-defined if the first period is at the end of the string.
I think it is since start + 1 will be on the null-terminator, so I'm not in any danger of accessing any memory I shouldn't.
Am I correct?

If the first dot is at the end of the string, it's at index size() - 1.
So then start + 1 == size(), meaning that find_first_of will look in the interval [size(), size()). This is an empty interval, so no memory accesses will be made at all.

Before C++11 there might not be a null-terminator at that point. (The standard did not guarantee it: c_str() was required to add one if necessary.) Since C++11, the character at index size() is guaranteed to be a null character.
But your code is fine in any case. The behaviour on setting a pointer to point to 1-past-an-array is well-defined, so it's permissible to call the function with start + 1 if start is the last character in your string. Internally, a dereference of that pointer will not take place because you're outside the region that find_first_of will search.

The C++ Standard does not impose any restriction on the value of the second parameter.
The function tries to calculate an actual position xpos the following way
pos <= xpos and xpos < size()
If it is unable to find such a value, it returns std::string::npos.
For example
std::string s( "AB" );
auto pos = s.find_first_of( "A", std::string::npos );
if ( pos == std::string::npos ) std::cout << "Not found" << std::endl;
The output is
Not found
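Putting the answers together, the asker's search for a second period might look like the sketch below. The function name second_dot is invented for this example; the only library calls used are std::string::find_first_of and std::string::npos.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: index of the second '.' in foo, or npos if there is none.
std::size_t second_dot(const std::string& foo) {
    const std::size_t first = foo.find_first_of('.');
    if (first == std::string::npos) {
        return std::string::npos;  // no period at all
    }
    // first + 1 may equal foo.size(); the search range is then empty
    // and the call simply returns npos.
    return foo.find_first_of('.', first + 1);
}
```

For a string such as "abc." the first period sits at index size() - 1, so the second call searches an empty range and reports npos.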

Removing Special Characters from C++ String (Except ' and - ) [duplicate]

I'm trying to remove special characters from a string using an isWordChar() method. However, I need to keep two special characters, " ' " and " - ", such as the apostrophe in "isn't" and the hyphens in mother-in-law. Here's what I'm trying to implement:
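std::string WordCount::stripWord(std::string wrd) {
    for(unsigned int i = 0; i < wrd.size(); ++i)
    {
        // 39 is the apostrophe (') and 45 is the hyphen (-) in ASCII
        if( !isWordChar(wrd[i]) && (wrd[i]!=39 && wrd[i]!=45))
        {
            wrd.erase(wrd.begin()+i);
            --i; // re-check the character that slid into position i
        }
    }
    return wrd;
}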
After adding the special cases in my boolean, I can't get seem to correctly add the exception. Any hints or advice? Thanks!

I would use the remove/erase idiom:
word.erase(std::remove_if(word.begin(),
word.end(),
[](char c) {
return !(isWordChar(c) || '-' == c || '\'' == c);
}), word.end());
The way you're erasing characters has complexity of approximately O(N * M) (where N is the original length of the string and M is the number of characters you remove). This has a complexity of approximately O(N), so if you're removing very many characters (or the string is very long) it's likely to give a substantial speed improvement.
If you care about why it's so much faster, it's because it works somewhat differently. To be specific, when you erase an element from the middle of a string, the erase function immediately copies all the letters after that to fill the hole where you erased the character. If you do this M times, all those characters get copied once for each character you remove.
When you use remove_if, it does something more like this:
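template <class Iter, class F>
Iter remove_if(Iter b, Iter e, F f)
{
    // dst marks where the next retained element should be written
    auto dst = b;
    for (auto src = b; src != e; ++src) {
        if (!f(*src))
            *dst++ = *src; // keep this element, copying it at most once
    }
    // everything from dst to e is leftover and can be erased by the caller
    return dst;
}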
This way, each character that's retained is only copied once, rather than being copied every time you remove one character from the string. Then when you do the final erase, it just removes characters from the end of the string, so it's basically just adjusting the length of the string downward.
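For reference, a complete version of the remove/erase approach could look like the sketch below. Since the asker's isWordChar() is not shown, is_word_char here is a stand-in that assumes "word character" means alphanumeric; strip_word is likewise a free function invented for the example rather than the asker's member function.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Stand-in for the asker's isWordChar(); assumed to mean "letter or digit".
bool is_word_char(unsigned char c) {
    return std::isalnum(c) != 0;
}

// Hypothetical free-function version of stripWord using remove/erase.
std::string strip_word(std::string word) {
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) {
                                  // Remove anything that is not a word character,
                                  // a hyphen, or an apostrophe.
                                  return !(is_word_char(c) || c == '-' || c == '\'');
                              }),
               word.end());
    return word;
}
```

With these assumptions, "isn't!" would come back as "isn't" and "mother-in-law," as "mother-in-law".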

Your logic is incorrect. It should be: !isWordChar(wrd[i]) && wrd[i] != 39 && wrd[i] != 45. Read as: If the character isn't a word character, and it's not an apostrophe, and it's not a hyphen, do whatever is in the if-statement.

Removing consecutive duplicate characters from a std::string

I'm currently trying to remove duplicate characters. For example:
maaaaaaa becomes ma
aaaaassssdddddd becomes asd
I have written the following piece of code:
string.erase(remove(string.find_first_of(string[i]) + 1, string.end(), string[i]), string.end());
but apparently std::string returns a pointer to the last + 1 character of the string, rather than the size, any ideas how I could remove string[i] from my string starting from the position next to that char?

string.find_first_of returns an integer position (and string::npos if not found). This is not compatible with std::remove, which expects iterators. You can convert from a position to an iterator by adding the position to the begin iterator.
char to_remove = string[i];
auto beg = string.begin() + string.find_first_of(to_remove) + 1;
auto new_end = std::remove(beg, string.end(), to_remove);
string.erase(new_end, string.end());
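Note that the code above removes every later occurrence of the character, not only the adjacent ones. If the goal is strictly what the examples in the question show (collapsing runs of the same character), a different standard algorithm, std::unique, may be a closer fit. The sketch below uses it in place of std::remove; the name collapse_runs is made up for illustration.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: collapses each run of identical adjacent
// characters into a single character.
std::string collapse_runs(std::string s) {
    // std::unique shifts the first character of each run to the front and
    // returns the new logical end; erase then trims the leftover tail.
    s.erase(std::unique(s.begin(), s.end()), s.end());
    return s;
}
```

Non-adjacent repeats are left alone, so "abab" stays "abab".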

C++: Find method

So if I were to enter patricia (don't worry, I'm converting it with toupper), that string would be loaded into my vector.
My question is about the find functions. I am counting characters, correct? So if I were to enter patricia and j were on ABBOT, PATRICIA, the value in comma would be 5. OK, I'm good so far, but what happens in my found variable?
bool NameSearch::findFirstNames(vector<string> &vsFirst, string name)
{
    int j = 0;
    bool bName = false;
    vsFirst.clear();
    while(j < total)
    {
        int comma;
        comma = names[j].find(',');
        //Confused here
        int found = names[j].find(name, comma);
        if(found > -1)
        {
            vsFirst.push_back(names[j]);
            bName = true;
        }
        j++;
    }
    return bName;
}

The if (found > -1) test probably works on your platform but is technically dubious.
The return type of std::string::find() is std::string::size_type, and if the substring you're searching is not found, the returned value is std::string::npos (on the other hand, if the substring is found, the returned value is the character index of its first occurrence).
Now the std::string::npos value happens to be the greatest possible value of type std::string::size_type. Storing it in an int narrows it, and on typical implementations the result is -1, which is why the found > -1 test rejects the not-found case.
However, no assumptions can be made in general about the type of std::string::size_type. Thus, I suggest declaring found as std::string::size_type and rewriting the test as:
if (found != std::string::npos)
{
...
}

This is misleading code. std::string::find() returns a size_t, not an int.
int comma;
comma = names[j].find(',');
This is misleading code. When std::string::find() fails, it returns std::string::npos, not -1. In your environment, it's equivalent to -1 by coincidence.
if(found > -1)
The if statement is effectively trying to check "if a result was found" by making sure it isn't std::string::npos.

There are two other answers that point out what is wrong with this code, but I feel like neither of them explains what the author was doing, and that's the explanation you want. :)
Let's look at the following snippet first.
int comma;
comma = names[j].find(',');
As pointed out, it should be rewritten as:
size_t comma;
comma = names[j].find(',');
There are 4 overloads of the find method in std::string.
The code above uses this one:
size_t find (char c, size_t pos = 0) const;
It returns the index, at which the character passed as the first argument (in your case it's ',') appears in the string or std::string::npos if that character isn't found. Apparently the author is sure the ',' character must be present in the string names[j] and doesn't check the result.
In the line:
int found = names[j].find(name, comma);
which again should be rewritten as:
size_t found = names[j].find(name, comma);
the following overload of the find method is used:
size_t find (const string& str, size_t pos = 0) const;
This one searches the string names[j] for the first occurrence of the string passed as the first argument (in your case name) and returns the index at which the match starts if there is a match or std::string::npos otherwise.
As you can see, both mentioned overloads of the find method have a second parameter with default value of 0. This second parameter allows a user to specify, at what index to start the search in the searched string (in your case names[j])
The call:
comma = names[j].find(',');
is equivalent to the call:
comma = names[j].find(',', 0);
and it means: look for the character ',' in the string names[j] starting from the beginning and return the index of the first occurrence of that character or std::string::npos, if there is no such character in that string.
The call:
size_t found = names[j].find(name, comma);
means: look for the substring equal to name in the string names[j], but start from the position where the comma was found and return the index of the first occurrence of that substring or std::string::npos if there is no such substring in that string, after the comma.
Maybe comma_position instead of comma would have been a better name for the variable.
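The two find calls can be pulled into a small standalone function to see them in isolation. This is only a sketch: first_name_matches is a hypothetical name, and unlike the original code it checks the result of the comma search instead of assuming a comma is present.

```cpp
#include <string>

// Hypothetical helper: true if `name` occurs in `entry` at or after the
// first comma, i.e. in the "FIRST" part of a "LAST, FIRST" entry.
bool first_name_matches(const std::string& entry, const std::string& name) {
    // Index of the first ',' or npos if there is none.
    const std::string::size_type comma_position = entry.find(',');
    if (comma_position == std::string::npos) {
        return false;  // not in "LAST, FIRST" form
    }
    // Start the substring search at the comma, skipping the last name.
    return entry.find(name, comma_position) != std::string::npos;
}
```

For the entry "ABBOT, PATRICIA" the comma is at index 5, so a search for "PATRICIA" starts there and succeeds, while a search for "ABBOT" fails because it only appears before the comma.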

How to use the penultimate position of an iterator in c++

I have the following piece of code which helps me to write a bunch of values into a comma separated file format. My problem is, that I do not want a comma after the last element written to normcsv. How can I use beg in an If clause of the kind:
if(beg == penultimate element)
then.... bla bla...
Everything I tried out ended up with the iterator being made invalid.
ReadLine.erase(0,17);
int offsets[] = {8,8,8,8,8,8};
boost::offset_separator f(offsets, offsets+6);
boost::tokenizer<boost::offset_separator> RVBEARline(ReadLine,f);
boost::tokenizer<boost::offset_separator>::iterator beg;
for( beg=RVBEARline.begin(); beg!=RVBEARline.end();++beg )
{
    copy=*beg;
    boost::trim(copy);
    if(copy.compare(0,1,".")==0)
    {
        copy.insert(0,"0");
    }
    normcsv << copy <<",";
}

Instead of printing the comma after the element except during the last iteration, print it before the element except during the first iteration. For that, you can use if(beg != RVBEARline.begin()).
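A minimal sketch of that idea, using a std::vector<std::string> as a stand-in for the Boost tokenizer and a std::ostringstream in place of normcsv (both are assumptions made to keep the example self-contained; join_csv is a made-up name):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins the fields with commas, with no trailing comma.
std::string join_csv(const std::vector<std::string>& fields) {
    std::ostringstream out;
    for (auto it = fields.begin(); it != fields.end(); ++it) {
        // Write the separator before every element except the first.
        if (it != fields.begin()) {
            out << ',';
        }
        out << *it;
    }
    return out.str();
}
```

The comparison against begin() only needs a forward iterator, so the same pattern should carry over to the tokenizer loop in the question.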

As an alternative to ruakh's "first plus rest" approach, you can make do with one fewer local variable by using a loop-and-a-half construct:
{
    auto it = x.begin(), end = x.end();
    if (it != end)
    {
        for ( ; ; )
        {
            process(*it);
            if (++it == end) break;
            print_delimiter();
        }
    }
}
Here x.begin() and x.end() are only called once. There is one mandatory comparison per loop round, the minimum possible. The check for emptiness is hoisted outside.

Couldn't you just always remove the last character since you know it will be an extraneous comma?
