Check if input string has leading or trailing whitespaces in C++?

I am trying to validate a single-line input string in C++11 to see if it contains any leading / trailing whitespaces. My code now looks like this:
bool is_valid(const std::string& s) {
auto start = s.begin();
auto end = s.end();
if (std::isspace(*start) || std::isspace(*end)) {
return false;
}
return true;
}
int main() {
std::string name{};
std::getline(std::cin, name);
if (!is_valid(name)) {
std::cout << "Invalid!";
}
return 0;
}
But now the program can only detect leading whitespace. For example, for " John" it would print Invalid! but for "Mary " it would classify the input as valid, which it is not. Does anyone know what's wrong with my program?

A simple test for std::string::front() and std::string::back() could have been done after testing for the empty string:
bool is_valid(const std::string& s)
{
return s.empty() ||
(!std::isspace(static_cast<unsigned char>(s.front())) &&
!std::isspace(static_cast<unsigned char>(s.back())));
}

The end iterator does not point to an element in the container. It points one past the last element, so you may not dereference it. For a std::string you can use its operator[]:
char last_char = s[s.size()-1];
advance the begin iterator:
auto it = s.begin() + s.size()-1;
char last_char = *it;
or decrement the end iterator:
auto it = s.end() -1;
char last_char = *it;
Other alternatives are back() or using the reverse iterator rbegin().
Note that they all require s.size() != 0. For an empty string s.begin() == s.end(). You should check that first in the function and return true for that case.

s.end() is one past the end of the string, just like for any other container in C++, so dereferencing it invokes undefined behavior. You need to use std::prev(s.end()) instead (which is valid only if the string contains at least one character, so you need to check the string length first).
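As a rough sketch of that suggestion, the question's iterator-based check could be repaired as follows. Treating an empty string as valid is an assumption, not something the question specifies.

```cpp
#include <cctype>
#include <iterator>
#include <string>

// Iterator-based variant of the question's is_valid.
// Assumption: an empty string counts as valid.
bool is_valid(const std::string& s)
{
    if (s.empty())
        return true;  // begin() == end(), so there is nothing to dereference
    auto first = s.begin();
    auto last = std::prev(s.end());  // last real character, not past-the-end
    // The casts avoid undefined behavior for negative char values.
    return !std::isspace(static_cast<unsigned char>(*first)) &&
           !std::isspace(static_cast<unsigned char>(*last));
}
```

The empty check has to come first, because std::prev(s.end()) is only meaningful when there is at least one character.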

.end() returns an iterator to one past the last element. You can use std::string::rbegin to get an iterator to the last element.
auto end = s.rbegin();
NB: std::string::starts_with and std::string::ends_with are available from C++20.

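Yes, .end() refers to the past-the-end element. Why not use .back() instead?
bool is_valid(std::string const& str) {
// Cast to unsigned char: std::isspace has undefined behavior for negative values.
return str.empty() || !(std::isspace(static_cast<unsigned char>(str.front())) || std::isspace(static_cast<unsigned char>(str.back())));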
}

Related

fastest way to read the last line of a string?

I'd like to know the fastest way for reading the last line in a std::string object.
Technically, the string after the last occurrence of \n in the fastest possible way?
This can be done using just string::find_last_of and string::substr like so
std::string get_last_line(const std::string &str)
{
auto position = str.find_last_of('\n');
if (position == std::string::npos)
return str;
else
return str.substr(position + 1);
}
see: example
I would probably use std::string::rfind and std::string::substr combined with guaranteed std::string::npos wrap around to be succinct:
inline std::string last_line_of(std::string const& s)
{
return s.substr(s.rfind('\n') + 1);
}
If s.rfind('\n') doesn't find anything it returns std::string::npos. The C++ standard says std::string::npos + 1 == 0. And returning s.substr(0) is always safe.
If s.rfind('\n') does find something then you want the substring starting from the next character. If the newline is the last character, that start index equals s.size(), and s.substr(s.size()) is still safe according to the standard: it returns an empty string.
NOTE: In C++17 the returned temporary benefits from guaranteed copy elision, so no extra copy of the result is made.
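To make the wrap-around easier to follow, here is the same idea spelled out step by step. It is only an illustrative expansion of the one-liner above, with the intermediate names pos and start added for clarity.

```cpp
#include <string>

// Same approach as last_line_of above, written out in separate steps.
std::string last_line_of(const std::string& s)
{
    // rfind returns npos (the largest size_type value) when there is no '\n'.
    const std::string::size_type pos = s.rfind('\n');
    // Unsigned arithmetic wraps, so npos + 1 is 0 and the whole string is returned.
    const std::string::size_type start = pos + 1;
    // start is at most s.size(), which substr accepts (yielding "").
    return s.substr(start);
}
```

For "a\nb\nc" this yields "c", for a string without any newline it yields the whole string, and for a string ending in '\n' it yields an empty string.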
I thought of a way that reads the string backwards, storing what it reads
std::string get_last_line(const std::string &str)
{
    std::string last_line_reversed;
    // Walk backwards from the last character; l - 1 is the current index,
    // so index 0 is included and an empty string is handled safely.
    for (size_t l = str.length(); l > 0; --l)
    {
        char c = str[l - 1];
        if (c == '\n')
            break;
        last_line_reversed += c;
    }
    // Reverse the collected characters to restore the original order.
    return std::string(last_line_reversed.rbegin(), last_line_reversed.rend());
}
until it encounters a '\n' character, then reverses the stored string and returns it. If the target string is big and has a lot of newlines, this only touches the characters of the last line, although it still copies them twice.

Bug with Iterating over a string c++

So I have a function called split_alpha() that takes in a std::string and splits the string into words, using any non-alphanumeric character as a delimiter. It also maps the words to their lower-cased versions.
vector<string> split_alpha(string to_split) {
vector<string> results;
string::iterator start = to_split.begin();
string::iterator it = start;
++it;
//get rid of any non-alphanumeric chars at the front of the string
while (!isalnum(*start)) {
++start;
++it;
}
while (it != to_split.end()) {
if (!isalnum(*it)) {
string to_add = string(start, it);
lower_alpha(to_add);
results.push_back(to_add);
++it;
if (it == to_split.end()) { break; }
while (!isalnum(*it)) {
++it;
if (it == to_split.end()) { break; }
}
start = it;
++it;
}
else {
++it;
if (it == to_split.end()) { break; }
}
}
//adds the last word
string to_add = string(start, it);
lower_alpha(to_add);
results.push_back(to_add);
return results;
}
The function works fine 99% of the time, but when I give it the string "Sending query: “SELECT * FROM users”" (not including the quotations around the whole string), it does something really weird. It essentially goes into an infinite loop (within that while loop) and never finds the end of the string. Instead it keeps reading random characters/strings from somewhere?? My vector ends up with a size of about 200 before it finally segfaults. Anyone know what could be causing this? I tried printing out the string and it seems perfectly fine. Once again, the code works on every other string I've tried.
Thanks!!
isn't the while loop doing that?
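Yes, but you can have several ++it triggering before the while loop check, and in any one of those cases the iterator could already be at the end of the string. For instance, when the inner while loop breaks because it reached to_split.end(), the start = it; ++it; that follows pushes it past the end, and the outer it != to_split.end() test then never becomes false. Most likely the other strings you tried did not cause a failure because they all end with an alphanumeric character.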
Invert the order of the ++it and the check:
if (it == to_split.end()) { break; }
++it;
Explanation: the following assert will fail, as the iterator will no longer be pointing to the end of the string (but one character further):
if (it == to_split.end())
{
++it;
assert(it == to_split.end());
}
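One way to avoid this class of bug is to compare against end() before every dereference or increment. Here is a rough sketch of split_alpha restructured that way. It folds the lower-casing into the loop with std::tolower, which is an assumption about what lower_alpha does in the question.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits on any non-alphanumeric character and lower-cases each word.
// Every dereference is preceded by a comparison against end().
std::vector<std::string> split_alpha(const std::string& to_split)
{
    std::vector<std::string> results;
    auto it = to_split.begin();
    const auto end = to_split.end();
    while (it != end) {
        // Skip a run of delimiters.
        while (it != end && !std::isalnum(static_cast<unsigned char>(*it)))
            ++it;
        // Collect one word, lower-casing as we go.
        std::string word;
        while (it != end && std::isalnum(static_cast<unsigned char>(*it))) {
            word += static_cast<char>(std::tolower(static_cast<unsigned char>(*it)));
            ++it;
        }
        if (!word.empty())
            results.push_back(word);
    }
    return results;
}
```

Because the iterator is never incremented without a preceding end() test, a string that ends in punctuation (or in the bytes of a multi-byte quote character) simply terminates the outer loop.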
Since the origin of the bug in your function has been pointed out already, may I suggest slightly different approach to your word splitting, using regex:
#include <iostream>
#include <regex>
#include <vector>
#include <string>
#include <cctype>
std::vector<std::string> split_alpha(std::string str)
{
std::regex RE{ "([a-zA-Z0-9]+)" }; // isalnum equivalent
std::vector<std::string> result;
// find every word
for (std::smatch matches; std::regex_search(str, matches, RE); str = matches.suffix())
{
//push word to the vector
result.push_back(matches[1].str());
//transform to lower
for (char &c : result[result.size() - 1])
c = std::tolower(c);
}
return result;
}
int main()
{
// test the function
for (auto &word : split_alpha("Sending query: “SELECT * FROM users”"))
std::cout << word << std::endl;
return 0;
}
Result:
sending
query
select
from
users

What design pattern should I use for a function that parses HTML attributes? Is this a job for Regex?

I'm wondering if you guys can help me start this out. I have a function that is defined as follows:
bool HtmlProcessor::_hasNextAttribute(std::string::iterator & it1, const std::string::iterator & it2, const std::pair<std::string, std::string> attrHolder)
{
/* Parses the first HTML attributes in the iterator range [it1, it2), adding them to attrHolder; eg.
"class="myClass1 myClass2" id="myId" onsubmit = "myFunction()""
---------- _hasNextAttribute -------->
attrHolder = ("class", "myClass1 myClass2")
When the function terminates, it1 will be the iterator to the last character parsed, will be equal to
it2 if no characters were parsed.
*/
}
In other words, it looks for the first pattern of
[someString][possibleWhiteSpace]=[possibleWhiteSpace][quotationMark][someOtherString][quotationMark]
and puts that in a pair (someString, someOtherString).
What sort of algorithm should I be using to do this elegantly?
Bonus question:
Where I use the function,
while (_hasNextAttribute(it1, it2, thisAttribute))
I am getting a compiler error
Non-const lvalue reference to type '__wrap_iter<pointer>' cannot bind to a value of unrelated type '__wrap_iter<const_pointer>'
Any idea why that might be?
Regular expressions can be useful to parse well-structured input. When taking input from users, I find it more flexible to use my custom reading functions.
The example below returns whether valid attribute following your pattern was found. If so, the first iterator is advanced beyond that attribute and the name and value are stored in the pair. (The pair should be a reference, so that changes are reflected.) If not, the iterator stays as it is. If after reading all attributes the iterator is not the end of the string, not all input was parsed.
As is, the function emulates the behaviour of a specialised regular expression. (I've annotated the code with the sub-expressions it corresponds to.) But because you have complete control over the code, you could modify it and extend it. For example, you could replace each occurrence of return false with an appropriate error code so you can generate good error messages.
Anyway, here goes:
#include <cctype>
#include <iostream>
#include <string>
#include <utility>
bool nextAttribute(std::string::iterator &iter,
const std::string::iterator &end,
std::pair<std::string, std::string> &attr)
{
std::string::iterator it = iter;
std::string::iterator start;
while (it != end && isspace(*it)) ++it; // \s*
if (it == end) return false;
start = it; // (
while (it != end && isalnum(*it)) ++it; // \w+
if (it == start) return false;
attr.first = std::string(start, it); // )
while (it != end && isspace(*it)) ++it; // \s*
if (it == end) return false;
if (*it != '=') return false; // =
++it;
while (it != end && isspace(*it)) ++it; // \s*
if (it == end) return false;
if (*it != '"') return false; // "
++it;
start = it; // (
while (it != end && *it != '"') ++it; // [^"]*
if (it == end) return false;
attr.second = std::string(start, it); // )
++it;
while (it != end && isspace(*it)) ++it; // \s*
iter = it;
return true;
}
int main()
{
std::string str("class=\"big red\" id=\"007\" onsubmit = \"go()\"");
std::pair<std::string, std::string> attr;
std::string::iterator it = str.begin();
while (nextAttribute(it, str.end(), attr)) {
std::cout << attr.first << ": '" << attr.second << "'\n";
}
if (it != str.end()) {
std::cout << "Incomplete: "
<< std::string(it, str.end()) << "\n";
}
return 0;
}
I'd suggest a top-down approach:
1. Locate the first = character, which separates the attribute name from the attribute value.
2. Locate the first non-whitespace character preceding the = character.
3. Locate the first " character following the =.
4. Locate the second " character, following the first ".
The attribute name is everything from the beginning to the first non-whitespace character you found in step 2. The attribute value is everything between the two quotation marks you found in 3. and 4.
That being said, I'd not recommend dealing with iterators into std::string objects: the whole std::string API is built around indices, e.g. std::string::find_last_not_of (which is useful for implementing step 2 above) takes and returns an integer position.
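The four steps can be sketched with the index-based std::string member functions. The function name next_attribute, its signature and the whitespace set are made up for illustration. Note that this sketch does not check what lies between the = and the opening quote.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: parses the first name="value" pair in text, starting
// at index pos. On success it stores the pair in attr, moves pos past the
// closing quote and returns true; otherwise pos and attr are left untouched.
bool next_attribute(const std::string& text, std::string::size_type& pos,
                    std::pair<std::string, std::string>& attr)
{
    const std::string::size_type npos = std::string::npos;
    const char* const ws = " \t\r\n";

    // Skip leading whitespace so the name starts at a real character.
    const auto name_begin = text.find_first_not_of(ws, pos);
    if (name_begin == npos) return false;

    // Step 1: the first '=' separates name from value.
    const auto eq = text.find('=', name_begin);
    if (eq == npos || eq == name_begin) return false;

    // Step 2: last non-whitespace character before '='. It cannot lie before
    // name_begin, because text[name_begin] itself is not whitespace.
    const auto name_last = text.find_last_not_of(ws, eq - 1);

    // Step 3: opening quote after '='.
    const auto open = text.find('"', eq + 1);
    if (open == npos) return false;

    // Step 4: closing quote after the opening one.
    const auto close = text.find('"', open + 1);
    if (close == npos) return false;

    attr.first = text.substr(name_begin, name_last - name_begin + 1);
    attr.second = text.substr(open + 1, close - open - 1);
    pos = close + 1;
    return true;
}
```

Called repeatedly on class="big red" id="007" onsubmit = "go()", starting with pos = 0, it produces the three pairs in order and then returns false with pos equal to the length of the string.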

Efficient way to check if std::string has only spaces

I was just talking with a friend about what would be the most efficient way to check if a std::string has only spaces. He needs to do this on an embedded project he is working on and apparently this kind of optimization matters to him.
I came up with the following code, which uses strtok().
bool has_only_spaces(std::string& str)
{
char* token = strtok(const_cast<char*>(str.c_str()), " ");
while (token != NULL)
{
if (*token != ' ')
{
return true;
}
}
return false;
}
I'm looking for feedback on this code and more efficient ways to perform this task are also welcome.
if(str.find_first_not_of(' ') != std::string::npos)
{
// There's a non-space.
}
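In C++11, the all_of algorithm can be employed:
// Check if s consists only of whitespace.
// The lambda takes unsigned char because isspace is undefined for negative values.
bool whiteSpacesOnly = std::all_of(s.begin(), s.end(), [](unsigned char c) { return std::isspace(c) != 0; });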
Why so much work, so much typing?
bool has_only_spaces(const std::string& str) {
return str.find_first_not_of (' ') == str.npos;
}
Wouldn't it be easier to do:
bool has_only_spaces(const std::string &str)
{
for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
{
if (*it != ' ') return false;
}
return true;
}
This has the advantage of returning early as soon as a non-space character is found, so it will be marginally more efficient than solutions that examine the whole string.
To check if a string has only whitespace in C++11:
bool is_whitespace(const std::string& s) {
return std::all_of(s.begin(), s.end(), [](unsigned char c) { return std::isspace(c) != 0; });
}
In pre-C++11:
bool is_whitespace(const std::string& s) {
for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
if (!isspace(static_cast<unsigned char>(*it))) {
return false;
}
}
return true;
}
Here's one that only uses the standard library (requires C++11):
inline bool isBlank(const std::string& s)
{
return std::all_of(s.cbegin(), s.cend(), [](unsigned char c) { return std::isspace(c) != 0; });
}
It relies on the fact that std::all_of also returns true for an empty range (begin == end).
Here is a small test program: http://cpp.sh/2tx6
Using strtok like that is bad style! strtok modifies the buffer it tokenizes (it replaces the delimiter chars with \0).
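Here's a non-modifying version. Note that it returns true when a non-space character is found, so negate the result to test for spaces only.
const char* p = str.c_str();
while(*p == ' ') ++p;
return *p != 0;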
It can be optimized even further, if you iterate through it in machine word chunks. To be portable, you would also have to take alignment into consideration.
I do not approve of you const_casting above and using strtok.
A std::string can contain embedded nulls, but let's assume it will be all space characters (ASCII 32) before you hit the null terminator.
One way you can approach this is with a simple loop, and I will assume const char *.
bool all_spaces( const char * v )
{
for ( ; *v; ++v )
{
if( *v != ' ' )
return false;
}
return true;
}
For larger strings, you can compare one machine word at a time, expecting each 32-bit word (say) to be 0x20202020, and check the leftover bytes at the end individually; this may be faster.
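A minimal sketch of the word-at-a-time idea, assuming an ASCII-compatible character set where ' ' is 0x20. Whether it actually beats the simple loop depends on the compiler and target and has not been measured here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Checks four bytes at a time against 0x20202020, then finishes bytewise.
// std::memcpy sidesteps alignment issues, and because all four bytes of the
// pattern are identical, byte order does not matter.
bool all_spaces(const std::string& str)
{
    const char* p = str.data();
    std::size_t n = str.size();
    const std::uint32_t pattern = 0x20202020u;

    while (n >= sizeof(std::uint32_t)) {
        std::uint32_t word;
        std::memcpy(&word, p, sizeof word);
        if (word != pattern)
            return false;
        p += sizeof word;
        n -= sizeof word;
    }
    // Fewer than four bytes remain; check them one by one.
    for (; n > 0; ++p, --n)
        if (*p != ' ')
            return false;
    return true;
}
```

Taking the length from the std::string rather than scanning for a terminator also means embedded null characters are handled correctly.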
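Something like the following (std::bind2nd was deprecated in C++11 and removed in C++17; with a newer standard, use a lambda instead):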
return std::find_if(
str.begin(), str.end(),
std::bind2nd( std::not_equal_to<char>(), ' ' ) )
== str.end();
If you're interested in white space, and not just the space character,
then the best thing to do is to define a predicate, and use it:
struct IsNotSpace
{
bool operator()( char ch ) const
{
return ! ::isspace( static_cast<unsigned char>( ch ) );
}
};
If you're doing any text processing at all, a collection of such simple
predicates will be invaluable (and they're easy to generate
automatically from the list of functions in <ctype.h>).
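For illustration, such a predicate plugs straight into std::find_if. The wrapper name is_whitespace_only is made up for this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Predicate object: true for any character that is not whitespace.
struct IsNotSpace
{
    bool operator()(char ch) const
    {
        return !std::isspace(static_cast<unsigned char>(ch));
    }
};

// True when str contains no non-whitespace character (including when empty).
bool is_whitespace_only(const std::string& str)
{
    return std::find_if(str.begin(), str.end(), IsNotSpace()) == str.end();
}
```

std::find_if stops at the first character for which the predicate holds, so the search ends as soon as a non-whitespace character is seen.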
It's highly unlikely you'll beat a compiler-optimized naive algorithm for this, e.g.
string::const_iterator it(str.begin()), end(str.end());
for(; it != end && *it == ' '; ++it);
return it == end;
EDIT: Actually - there is a quicker way (depending on size of string and memory available)..
std::string ns(str.size(), ' ');
return ns == str;
EDIT: actually above is not quick.. it's daft... stick with the naive implementation, the optimizer will be all over that...
EDIT AGAIN: dammit, I guess it's better to look at the functions in std::string
return str.find_first_not_of(' ') == string::npos;
I had a similar problem in a programming assignment, and here is one other solution I came up with after reviewing others. Here I simply create a new sentence without the extra spaces. If there are double spaces I simply skip over them.
string sentence;
string newsent; //reconstruct new sentence
string dbl = " ";
getline(cin, sentence);
int len = sentence.length();
for (int i = 0; i < len; i++) {
    // If there are multiple whitespace characters in a row, skip ahead past them.
    if (isspace(sentence[i]) && isspace(sentence[i+1])) {
        do {
            i++;
        } while (isspace(sentence[i]));
        i--; // step back one to keep a single whitespace character
    }
    newsent += sentence[i];
}
cout << newsent << "\n";
Hm...I'd do this:
for (auto i = str.begin(); i != str.end(); ++i)
if (!isspace(static_cast<unsigned char>(*i)))
return false;
return true;
isspace is declared in <cctype> in C++.
Edit: Thanks to James for pointing out that isspace has undefined behavior on negative char values, hence the cast to unsigned char.
If you are using CString, you can do
CString myString = " "; // All whitespace
if(myString.Trim().IsEmpty())
{
// string is all whitespace
}
This has the benefit of trimming all newline, space and tab characters.

Difference between these two functions that find Palindromes

I wrote a function to check whether a word is palindrome or not but "unexpectedly", that function failed quite badly, here it is:
bool isPalindrome (const string& s){
string reverse = "";
string original = s;
for (string_sz i = 0; i != original.size(); ++i){
reverse += original.back();
original.pop_back();
}
if (reverse == original)
return true;
else
return false;
}
It gives me a "string iterator offset out of range" error when you pass in a string with only one character, and returns true even if we pass in an empty string (although I know it's because of the initialisation of the reverse variable) and also when you pass in an unassigned string, for example:
string input;
isPalindrome(input);
Later, I found a better function which works as you would expect:
bool found(const string& s)
{
bool found = true;
for (string::const_iterator i = s.begin(), j = s.end() - 1; i < j; ++i, --j) {
if (*i != *j)
found = false;
}
return found;
}
Unlike the first function, this function correctly fails when you give it an unassigned string variable or an empty string and works for single characters and such...
So, good people of stackoverflow please point out to me why the first function is so bad...
Thank You.
for (string_sz i = 0; i != original.size(); ++i) {
reverse += original.back();
original.pop_back();
}
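original.size() changes as you pop elements off the back. Effectively, you keep incrementing i and decrementing original.size(). For an even-length string they meet halfway; for an odd-length string they are never equal, so the loop keeps going until original is empty and then calls back() on an empty string, which is undefined behavior (this is the out-of-range error you see for a single character).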
if (reverse == original)
This does not compare what you intend. By this point original has lost every character that was moved into reverse, so it no longer holds the full input. Build the complete reversed string and compare it against s instead.
Your found function could very well rely on the standard std::equal function and on the begin()/end() and rbegin()/rend() functions of the string, and could be a one-line function:
return std::equal(s.begin(), s.end(), s.rbegin());
The std::equal() function compares two ranges of the same length.
The begin()/end() functions provide forward iterators while rbegin() provides a reverse iterator, ie an iterator that starts at the end of the string and goes to the beginning.
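Since each pair of characters only needs to be compared once, the range can be cut in half. A small sketch along those lines follows; treating the empty string as a palindrome is an assumption.

```cpp
#include <algorithm>
#include <string>

// Compares the first half of s with the reversed second half.
// Assumption: an empty string counts as a palindrome.
bool is_palindrome(const std::string& s)
{
    // For odd lengths the middle character is skipped; it always matches itself.
    return std::equal(s.begin(), s.begin() + s.size() / 2, s.rbegin());
}
```

For an empty or single-character string the first range is empty, so std::equal returns true without dereferencing anything.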
This is probably not what you want, but reverse is already implemented as an algorithm in STL:
bool isPalindrome( const std::string & str )
{
std::string rev( str );
std::reverse( rev.begin(), rev.end() );
return str==rev;
}
As @James McNellis points out, this can be further condensed (without needing any algorithm) by constructing the reversed string directly with reverse iterators on the original string:
bool isPalindrome( const std::string & str )
{
return str == std::string( str.rbegin(), str.rend() );
}
The loop that purports to reverse the string doesn't in fact do so. As you're removing characters from the string, you're also incrementing i. In some cases (any odd-length string) i skips past the current size and the loop keeps running.
Instead of your loop, you can use reverse:
std::reverse(original.begin(), original.end());
And then do the rest of the work. It's up to your requirements if an empty string is a palindrome or not.
Your solutions are far too complicated ;)
bool is_palindrome(std::string const& s) {
if (s.empty()) return false; // if this is required.
// The sequences are equal only if neither one orders before the other.
return !std::lexicographical_compare(s.begin(), s.end(), s.rbegin(), s.rend()) &&
!std::lexicographical_compare(s.rbegin(), s.rend(), s.begin(), s.end());
}
Edit: Or, as Etienne noted, just use std::equal ...
bool is_palindrome(std::string const& s) {
if (s.empty()) return false; // if this is required.
return std::equal(s.begin(), s.end(), s.rbegin());
}