Bounds of std::string::find_first_of - c++

Suppose I have a string foo and I want to search for the second period, if any.
I'm using this code:
std::size_t start = foo.find_first_of('.');
if (start != std::string::npos){
std::size_t next = foo.find_first_of('.', start + 1);
/*and so on*/
I'm wondering if this is well-defined if the first period is at the end of the string.
I think it is since start + 1 will be on the null-terminator, so I'm not in any danger of accessing any memory I shouldn't.
Am I correct?

If the first dot is at the end of the string, it's at index size() - 1.
So then start + 1 == size(), meaning that find_first_of will look in the interval [size(), size()). This is an empty interval, so no memory accesses will be made at all.
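A minimal sketch of the search from the question, wrapped in a function so the edge case is easy to try. The name second_period is hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the second '.' in text, or std::string::npos if
// text contains fewer than two periods.
std::size_t second_period(const std::string& text)
{
    const std::size_t first = text.find_first_of('.');
    if (first == std::string::npos) {
        return std::string::npos;
    }
    // If the first period is the last character, first + 1 == text.size(),
    // so the searched interval is empty and npos comes back.
    return text.find_first_of('.', first + 1);
}
```

For "abc." the first period is at index 3, the second search starts at index 4 == size(), and the function returns npos without reading past the string.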

Before C++11 there may well not be a null-terminator at that point: the standard did not guarantee one, and c_str() was only required to add one if necessary. (Since C++11 the string's buffer is always null-terminated.)
But your code is fine in any case. Forming a pointer to one past the end of an array is well-defined, so it's permissible to call the function with start + 1 if start is the last character in your string. Internally, that pointer will not be dereferenced, because you're outside the region that find_first_of will search.

The C++ Standard does not impose any restriction on the value of the second parameter.
The function tries to determine the lowest position xpos that holds a matching character and satisfies
pos <= xpos and xpos < size()
If it is unable to find such a value, it returns std::string::npos.
For example
std::string s( "AB" );
auto pos = s.find_first_of( "A", std::string::npos );
if ( pos == std::string::npos ) std::cout << "Not found" << std::endl;
The output is
Not found
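The rule above can be modelled with a plain loop. This is only an illustrative sketch of the single-character case, not how any standard library actually implements it, and model_find_char is a made-up name.

```cpp
#include <cstddef>
#include <string>

// Hand-written model of the search rule: return the lowest xpos with
// pos <= xpos, xpos < s.size() and s[xpos] == ch, or npos if none exists.
std::size_t model_find_char(const std::string& s, char ch, std::size_t pos)
{
    // When pos >= s.size() the loop body never runs, so no character is read.
    for (std::size_t xpos = pos; xpos < s.size(); ++xpos) {
        if (s[xpos] == ch) {
            return xpos;
        }
    }
    return std::string::npos;
}
```

Any starting position, including std::string::npos itself, simply yields an empty search range.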

Related

Why string constructor(s, pos) exception is "pos > s.size()" and not "pos >= s.size()"?

Sample below:
string s1 = "abcde";
string s2(s1, s1.size()); // s1.size() = 5.
Notice that s1.size() = 5 and the last allowable index = 4 (for character 'e'). The above runs fine, returning an empty string. Only when pos = 6 does it fail with an out-of-range exception. Why?
According to the cppreference site:
Exceptions
3) std::out_of_range if pos > other.size()
Shouldn't the correct exception be "if pos >= other.size()"?
Since C++11, s1[s1.size()] is required to work and will return a reference to the '\0' at the end of the string. Changing the '\0' to something else however leads to undefined behavior. You are however allowed to write '\0' there.
Notice that s1.size() = 5 and the last allowable index = 4 (for character 'e').
This is wrong. Since C++11, the last allowable character index is size(), not size()-1. std::basic_string's data buffer is now required to be null terminated, and accessing index size() (even on an empty string) is required to return a reference to a valid '\0' character.
The above runs fine, returning an empty string.
As it should be.
Only when pos = 6 does it fail with an out-of-range exception.
Correct, because that really is out of bounds.
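A small sketch to check both claims on your own compiler. The helper names suffix_throws and ends_with_terminator are hypothetical; the second one relies on the C++11 guarantee described above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// True if constructing std::string(source, pos) throws std::out_of_range.
bool suffix_throws(const std::string& source, std::size_t pos)
{
    try {
        const std::string suffix(source, pos);
        (void)suffix;  // only the construction matters here
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// True if reading index size() yields '\0' (guaranteed since C++11).
bool ends_with_terminator(const std::string& s)
{
    return s[s.size()] == '\0';
}
```

With s1 = "abcde", pos = 5 constructs an empty string and pos = 6 throws.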

Does substr change the position where the find function starts searching?

I have a char * named search_text containing the following text:
ABC_NAME = 'XYZSomeone' AND ABC_CLASS = 'XYZSomething'
I want to display the "ABC_NAME" value from that string.
Here is what I am doing:
std::cout << std::string(search_text).substr ( 12, std::string( search_text ).find ("'", 13 )-1) << std::endl;
My logic in the above in the substr is as follows:
The ABC_NAME value always begins at the 12th character, so start the substring there.
Do a find for the character ' (single quotation mark), starting from the 13th character (the second argument of the find() function). The resulting number will be the outer bound of the substr.
However, my code prints out the following:
XYZSomeone' AND ABC_C
However, when I try to display the value of the find() function directly, I do get the correct number for the location of the second ' (single quotation mark)
std::cout << std::string( search_text ).find ("'", 13 ) << std::endl;
This prints out:
22
So why is it that the substr is not finding the value of 22 as its second argument ?
It's a rather simple matter to evaluate your expression by hand, seeing how you already verified the result of find:
std::string(search_text).substr ( 12, std::string( search_text ).find ("'", 13 )-1)
std::string("ABC_NAME = 'XYZSomeone' AND ABC_CLASS = 'XYZSomething'").substr ( 12, 22-1)
Now check the documentation for substr: "Returns a substring [pos, pos+count)". The character at position 12 is the 'X' for the name portion, and the character at position 12+21 = 33 is the 'L' from the class portion. So we expect the substring starting at that 'X' and going up to just before that 'L', which is "XYZSomeone' AND ABC_C". Check.
(It is understandable to forget whether substr takes a length or a position at which to end. Different languages do disagree on this. Hence the link to the documentation.)
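Since substr takes a count, the end position returned by find has to be converted into a length first. A minimal sketch of that conversion, assuming the value starts at a known index; quoted_value is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Returns the text from value_start up to (not including) the next single
// quote. If there is no closing quote, the rest of the string is returned.
std::string quoted_value(const std::string& text, std::size_t value_start)
{
    const std::size_t closing_quote = text.find('\'', value_start);
    if (closing_quote == std::string::npos) {
        return text.substr(value_start);
    }
    // substr wants a count, so turn the end position into a length.
    return text.substr(value_start, closing_quote - value_start);
}
```

For the string in the question, value_start = 12 and the closing quote is at 22, so the count is 10 and the result is XYZSomeone.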
Unsolicited commentary
Trying to do so much in one line makes your code harder to read and harder to debug. In this case, it also hurts performance. There is no need to convert search_text to a std::string twice.
std::string search_string{search_text};
std::size_t found = search_string.find('\'', 12);
if ( found != std::string::npos )
found -= 12;
std::cout << search_string.substr(12, found) << std::endl;
This cuts the number of times a string is constructed (hence the times the string data is copied) from three to two.
If you are using C++17, you can improve the performance even more by constructing no strings. Just use std::string_view instead of std::string. For this scenario, it has the same member functions taking the same parameters; all you have to change is the type of search_string. This puts the performance on par with C code.
Even better: since string views are so cheap to create, you could even write your code – without a performance hit – so that it doesn't matter whether substr takes a length or takes the past-the-end position.
std::string_view search_string{search_text};
std::string_view ltrimmed = search_string.substr(12);
std::size_t found = ltrimmed.find('\'');
std::cout << ltrimmed.substr(0, found) << std::endl;
Constructive laziness FTW!

Find substring between two indices in C++

I want to find the substring between two indices. The substr(start_index, number_of_characters) function in C++ returns substring based on number of characters. Hence a small hack to use it with start and end indices is as follows:
// extract 'go' from 'iamgoodhere'
string s = "iamgoodhere";
int start = 3, end = 4;
cout<<s.substr(start,end-start+1); // go
What other methods exist in C++ to get the substring between two indices?
You can do this:
std::string(&s[start], &s[end+1])
or this:
std::string(s.c_str() + start, s.c_str() + end + 1)
or this:
std::string(s.begin() + start, s.begin() + end + 1)
These approaches require that end is less than s.size(), whereas substr() does not require that.
Don't complain about the +1--ranges in C++ are always specified as inclusive begin and exclusive end.
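A sketch of the iterator-pair variant with the precondition made explicit. The function name inclusive_slice and the choice to throw std::out_of_range are assumptions for illustration, not part of the answer.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the characters s[start] .. s[end], both indices inclusive.
std::string inclusive_slice(const std::string& s, std::size_t start, std::size_t end)
{
    // Unlike substr, the iterator-pair constructor does no clamping,
    // so the range has to be validated by hand.
    if (start > end || end >= s.size()) {
        throw std::out_of_range("inclusive_slice: bad index range");
    }
    const auto first = s.begin() + static_cast<std::string::difference_type>(start);
    const auto last = s.begin() + static_cast<std::string::difference_type>(end) + 1;
    return std::string(first, last);
}
```

Calling it with "iamgoodhere", 3 and 4 gives "go", matching the substr version from the question.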
In addition to John Zwinck's answer you can use substr in combination with std::distance:
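// itStart and itEnd are iterators into myStr; substr needs an index, not an iterator
auto size = std::distance(itStart, itEnd);
std::string newStr = myStr.substr(std::distance(myStr.begin(), itStart), size);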
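The same idea as a self-contained function, assuming both iterators point into the string that is passed in. substr_between is a hypothetical name.

```cpp
#include <iterator>
#include <string>

// Returns the characters in [first, last), where both iterators must refer
// to positions inside str and first must not come after last.
std::string substr_between(const std::string& str,
                           std::string::const_iterator first,
                           std::string::const_iterator last)
{
    // substr takes an index and a count, so both are derived with std::distance.
    const auto pos = std::distance(str.begin(), first);
    const auto count = std::distance(first, last);
    return str.substr(static_cast<std::string::size_type>(pos),
                      static_cast<std::string::size_type>(count));
}
```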
Here is one solution,
std::string distance_finder(std::string str, int start, int end)
{
    return str.substr(start, end - start);
}
Though end must never be less than start (here end is the past-the-end index, so end == start gives an empty string).

C++: Find method

So if I were to enter patricia (don't worry, I'm converting it with toupper), that string would be loaded into my vector.
My question is about the find functions. I am counting characters, correct? So if I were to enter patricia and j were on ABBOT, PATRICIA, the value in comma would be 5. OK, I'm good so far, but what happens in my found variable?
bool NameSearch::findFirstNames(vector<string> &vsFirst, string name)
{
    int j = 0;
    bool bName = false;
    vsFirst.clear();
    while(j < total)
    {
        int comma;
        comma = names[j].find(',');
        //Confused here
        int found = names[j].find(name, comma);
        if(found > -1)
        {
            vsFirst.push_back(names[j]);
            bName = true;
        }
        j++;
    }
    return bName;
}
The if (found > -1) test probably works on your platform but is technically dubious.
The return type of std::string::find() is std::string::size_type, and if the substring you're searching is not found, the returned value is std::string::npos (on the other hand, if the substring is found, the returned value is the character index of its first occurrence).
Now the std::string::npos value is defined as size_type(-1), which is the greatest possible value of type std::string::size_type. Storing it in an int, as your code does, typically yields -1 (before C++20 the result of that conversion is implementation-defined), which is why the test found > -1 happens to work.
However, no assumptions can be made in general about the type of std::string::size_type. Thus, I suggest declaring found as std::string::size_type and rewriting the test as:
if (found != std::string::npos)
{
...
}
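A minimal sketch of the suggested test, keeping the result of find in its own type instead of int. The wrapper name contains_from is made up for this example.

```cpp
#include <string>

// True if needle occurs in haystack at or after index from.
bool contains_from(const std::string& haystack, const std::string& needle,
                   std::string::size_type from = 0)
{
    // Keep the result in size_type so the comparison with npos is exact.
    const std::string::size_type found = haystack.find(needle, from);
    return found != std::string::npos;
}
```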
This is misleading code. std::string::find() returns a size_t, not an int.
int comma;
comma = names[j].find(',');
This is misleading code. When std::string::find() fails, it returns std::string::npos, not -1. In your environment, it's equivalent to -1 by coincidence.
if(found > -1)
The if statement is effectively trying to check "if a result was found" by making sure it isn't std::string::npos.
There are two other answers, that point out what is wrong with this code, but I feel like none of them explains to you, what the author was doing, and that's the explanation you want. :)
Let's look at the following snippet first.
int comma;
comma = names[j].find(',');
As pointed out, it should be rewritten as:
size_t comma;
comma = names[j].find(',');
std::string has several overloads of the find method.
The code above uses this one:
size_t find (char c, size_t pos = 0) const;
It returns the index, at which the character passed as the first argument (in your case it's ',') appears in the string or std::string::npos if that character isn't found. Apparently the author is sure the ',' character must be present in the string names[j] and doesn't check the result.
In the line:
int found = names[j].find(name, comma);
which again should be rewritten as:
size_t found = names[j].find(name, comma);
the following overload of the find method is used:
size_t find (const string& str, size_t pos = 0) const;
This one searches the string names[j] for the first occurrence of the string passed as the first argument (in your case name) and returns the index at which the match starts if there is a match or std::string::npos otherwise.
As you can see, both mentioned overloads of the find method have a second parameter with default value of 0. This second parameter allows a user to specify, at what index to start the search in the searched string (in your case names[j])
The call:
comma = names[j].find(',');
is equivalent to the call:
comma = names[j].find(',', 0);
and it means: look for the character ',' in the string names[j], starting from the beginning, and return the index of the first occurrence of that character or std::string::npos if there is no such character in that string.
The call:
size_t found = names[j].find(name, comma);
means: look for the substring equal to name in the string names[j], but start from the position where the comma was found and return the index of the first occurrence of that substring or std::string::npos if there is no such substring in that string, after the comma.
Maybe comma_position instead of comma would have been a better name for the variable.
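Putting the pieces together, here is a sketch of the same search as a free function with size_type results and an explicit check for a missing comma. It assumes entries look like "LAST, FIRST"; the function name first_name_matches and the decision to skip entries without a comma are illustrative choices, not the original author's code.

```cpp
#include <string>
#include <vector>

// Returns every entry whose part after the comma contains name.
std::vector<std::string> first_name_matches(const std::vector<std::string>& names,
                                            const std::string& name)
{
    std::vector<std::string> matches;
    for (const std::string& entry : names) {
        const std::string::size_type comma_position = entry.find(',');
        if (comma_position == std::string::npos) {
            continue;  // no comma, so there is no first-name part to search
        }
        // Start the search at the comma so the last name cannot match.
        if (entry.find(name, comma_position) != std::string::npos) {
            matches.push_back(entry);
        }
    }
    return matches;
}
```

With the entries "ABBOT, PATRICIA" and "PATRICIA, JOHN", searching for PATRICIA matches only the first one, because in the second the search starts at index 8, after the last name.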

How to use the penultimate position of an iterator in c++

I have the following piece of code which helps me to write a bunch of values into a comma separated file format. My problem is, that I do not want a comma after the last element written to normcsv. How can I use beg in an If clause of the kind:
if(beg == penultimate element)
then.... bla bla...
Everything I tried out ended up with the iterator being made invalid.
ReadLine.erase(0,17);
int offsets[] = {8,8,8,8,8,8};
boost::offset_separator f(offsets, offsets+6);
boost::tokenizer<boost::offset_separator> RVBEARline(ReadLine,f);
boost::tokenizer<boost::offset_separator>::iterator beg;
for( beg=RVBEARline.begin(); beg!=RVBEARline.end();++beg )
{
    copy=*beg;
    boost::trim(copy);
    if(copy.compare(0,1,".")==0)
    {
        copy.insert(0,"0");
    }
    normcsv << copy <<",";
}
Instead of printing the comma after the element except during the last iteration, print it before the element except during the first iteration. For that, you can use if(beg != RVBEARline.begin()).
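A sketch of that idea on a plain vector of strings rather than the boost tokenizer, writing into a std::ostringstream as a stand-in for normcsv. The function name join_csv is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Joins fields with commas by writing the comma before every element
// except the first one.
std::string join_csv(const std::vector<std::string>& fields)
{
    std::ostringstream out;
    for (auto it = fields.begin(); it != fields.end(); ++it) {
        if (it != fields.begin()) {
            out << ',';
        }
        out << *it;
    }
    return out.str();
}
```

This only ever compares against begin(), so it also works with iterators that cannot step backwards or look ahead.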
As an alternative to ruakh's "first plus rest" approach, you can make do with one less local variable by using a loop-and-a-half construct:
{
    auto it = x.begin(), end = x.end();
    if (it != end)
    {
        for ( ; ; )
        {
            process(*it);
            if (++it == end) break;
            print_delimiter();
        }
    }
}
Here x.begin() and x.end() are only called once. There is one mandatory comparison per loop round, the minimum possible. The check for emptiness is hoisted outside.
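A concrete version of the loop-and-a-half pattern, with process and print_delimiter replaced by appends to a result string. This is a sketch on a std::vector<std::string>; the function name is made up.

```cpp
#include <string>
#include <vector>

// Joins fields with delimiter using a loop whose exit test sits in the middle.
std::string join_loop_and_a_half(const std::vector<std::string>& fields, char delimiter)
{
    std::string out;
    auto it = fields.begin();
    const auto end = fields.end();
    if (it != end) {               // emptiness check hoisted out of the loop
        for (;;) {
            out += *it;            // "process" step
            if (++it == end) break;
            out += delimiter;      // only reached when another element follows
        }
    }
    return out;
}
```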
Couldn't you just always remove the last character since you know it will be an extraneous comma?
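A sketch of that approach, assuming the line is first built in a std::string, since characters already written to an output stream cannot easily be taken back. join_then_trim is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Appends a comma after every field, then drops the one trailing comma.
std::string join_then_trim(const std::vector<std::string>& fields)
{
    std::string out;
    for (const std::string& field : fields) {
        out += field;
        out += ',';
    }
    // Guard against an empty input, where there is no comma to remove.
    if (!out.empty()) {
        out.pop_back();
    }
    return out;
}
```

The result can then be written to the file in one go.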