Efficient way to check if std::string has only spaces - c++

I was just talking with a friend about what would be the most efficient way to check if a std::string has only spaces. He needs to do this on an embedded project he is working on and apparently this kind of optimization matters to him.
I've come up with the following code, which uses strtok():
bool has_only_spaces(std::string& str)
{
char* token = strtok(const_cast<char*>(str.c_str()), " ");
while (token != NULL)
{
if (*token != ' ')
{
return true;
}
}
return false;
}
I'm looking for feedback on this code and more efficient ways to perform this task are also welcome.

if(str.find_first_not_of(' ') != std::string::npos)
{
// There's a non-space.
}

In C++11, the all_of algorithm can be employed:
// Check if s consists only of whitespaces
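// (a lambda taking unsigned char avoids passing a negative char to isspace, which is undefined)
bool whiteSpacesOnly = std::all_of(s.begin(), s.end(), [](unsigned char c) { return std::isspace(c) != 0; });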

Why so much work, so much typing?
bool has_only_spaces(const std::string& str) {
return str.find_first_not_of (' ') == str.npos;
}

Wouldn't it be easier to do:
bool has_only_spaces(const std::string &str)
{
for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
{
if (*it != ' ') return false;
}
return true;
}
This has the advantage of returning early as soon as a non-space character is found, so it will be marginally more efficient than solutions that examine the whole string.

To check if a string has only whitespace in C++11:
bool is_whitespace(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}
In pre-C++11:
bool is_whitespace(const std::string& s) {
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        // Cast to unsigned char: isspace is undefined for negative char values.
        if (!isspace(static_cast<unsigned char>(*it))) {
            return false;
        }
    }
    return true;
}

Here's one that only uses the STL (requires C++11):
inline bool isBlank(const std::string& s)
{
    return std::all_of(s.cbegin(), s.cend(), [](unsigned char c) { return std::isspace(c) != 0; });
}
It relies on the fact that std::all_of also returns true if the string is empty (begin == end).
Here is a small test program: http://cpp.sh/2tx6

Using strtok like that is bad style! strtok modifies the buffer it tokenizes (it replaces the delimiter chars with \0).
Here's a non-modifying version (it returns true if the string contains only spaces):
const char* p = str.c_str();
while (*p == ' ') ++p;  // skip the spaces
return *p == '\0';      // only spaces if we stopped at the terminator
It can be optimized even further if you iterate through it in machine-word chunks. To be portable, you would also have to take alignment into consideration.

I do not approve of your const_cast above, or of using strtok.
A std::string can contain embedded nulls, but let's assume it will be all ASCII 32 (space) characters before you hit the null terminator.
One way you can approach this is with a simple loop, and I will assume const char *.
bool all_spaces( const char * v )
{
for ( ; *v; ++v )
{
if( *v != ' ' )
return false;
}
return true;
}
For larger strings, it may be faster to check a word at a time: in an all-space buffer every 32-bit word (say) equals 0x20202020, so you can compare whole words until fewer than four bytes remain and then check the tail byte by byte.
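The word-at-a-time idea can be sketched as follows. This is an illustration under assumptions, not the answerer's code: it uses 64-bit words rather than the 32-bit example, reads them with std::memcpy so alignment does not matter, and the name all_spaces_wordwise is hypothetical. Whether it actually beats the plain byte loop depends on the compiler and target, so measure before relying on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Compare 8 bytes at a time against a word filled with spaces.
// std::memcpy avoids unaligned reads and strict-aliasing problems; compilers
// usually turn a fixed-size memcpy into a single load.
bool all_spaces_wordwise(const std::string& s)
{
    const char* p = s.data();
    const std::size_t n = s.size();   // uses size(), so embedded nulls are handled

    std::uint64_t spaces;
    std::memset(&spaces, ' ', sizeof spaces);   // 0x2020202020202020

    std::size_t i = 0;
    for (; i + sizeof spaces <= n; i += sizeof spaces) {
        std::uint64_t word;
        std::memcpy(&word, p + i, sizeof word);
        if (word != spaces)
            return false;
    }
    // Check the remaining tail (fewer than 8 bytes) one byte at a time.
    for (; i < n; ++i)
        if (p[i] != ' ')
            return false;
    return true;
}
```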

Something like:
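// C++03 style: std::bind2nd and std::not_equal_to come from <functional>.
// bind2nd is deprecated in C++11 and removed in C++17; use a lambda there.
return std::find_if(
    str.begin(), str.end(),
    std::bind2nd( std::not_equal_to<char>(), ' ' ) )
    == str.end();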
If you're interested in white space, and not just the space character,
then the best thing to do is to define a predicate, and use it:
struct IsNotSpace
{
    bool operator()( char ch ) const
    {
        // std::isspace is declared in <cctype>
        return ! std::isspace( static_cast<unsigned char>( ch ) );
    }
};
If you're doing any text processing at all, a collection of such simple
predicates will be invaluable (and they're easy to generate
automatically from the list of functions in <ctype.h>).
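As one possible way to generate such predicates (a sketch, not taken from the answer), a macro can stamp out an IsX / IsNotX pair for each <cctype> function; DEFINE_CHAR_PREDICATES and has_only_whitespace are hypothetical names:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical macro: creates an IsName / IsNotName pair of predicates from
// one <cctype> classification function. The cast to unsigned char avoids
// undefined behavior for negative char values.
#define DEFINE_CHAR_PREDICATES(Name, fn)                        \
    struct Is##Name {                                           \
        bool operator()(char ch) const                          \
        { return fn(static_cast<unsigned char>(ch)) != 0; }     \
    };                                                          \
    struct IsNot##Name {                                        \
        bool operator()(char ch) const                          \
        { return fn(static_cast<unsigned char>(ch)) == 0; }     \
    };

DEFINE_CHAR_PREDICATES(Space, std::isspace)
DEFINE_CHAR_PREDICATES(Alpha, std::isalpha)
DEFINE_CHAR_PREDICATES(Digit, std::isdigit)

// Usage: a string is blank if no non-whitespace character can be found.
bool has_only_whitespace(const std::string& str)
{
    return std::find_if(str.begin(), str.end(), IsNotSpace()) == str.end();
}
```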

It's highly unlikely you'll beat a compiler-optimized naive algorithm for this, e.g.:
std::string::const_iterator it(str.begin()), end(str.end());
for (; it != end && *it == ' '; ++it)
    ;  // skip the spaces
return it == end;
EDIT: Actually, there is a quicker way (depending on the size of the string and the memory available)...
std::string ns(str.size(), ' ');
return ns == str;
EDIT: Actually, the above is not quick... it's daft. Stick with the naive implementation; the optimizer will be all over that.
EDIT AGAIN: dammit, I guess it's better to look at the member functions of std::string:
return str.find_first_not_of(' ') == string::npos;

I had a similar problem in a programming assignment, and here is one other solution I came up with after reviewing the others. Here I simply build a new sentence without the extra spaces: if there are several consecutive whitespace characters, only one of them is kept.
string sentence;
string newsent; // reconstructed sentence
getline(cin, sentence);
size_t len = sentence.length();
for (size_t i = 0; i < len; i++) {
    // If there are multiple whitespaces, skip ahead to the last one of the run.
    if (isspace(static_cast<unsigned char>(sentence[i])) && i + 1 < len &&
        isspace(static_cast<unsigned char>(sentence[i + 1]))) {
        do {
            i++;
        } while (i < len && isspace(static_cast<unsigned char>(sentence[i])));
        i--; // dial back one to keep exactly one whitespace character
    }
    newsent += sentence[i];
}
cout << newsent << "\n";

Hm... I'd do this:
for (auto i = str.begin(); i != str.end(); ++i)
    if (!isspace(static_cast<unsigned char>(*i)))
        return false;
return true;
This is a function body; isspace is declared in <cctype> for C++.
Edit: Thanks to James for pointing out that isspace has undefined behavior for negative (signed) char values, hence the cast to unsigned char.

If you are using CString, you can do
CString myString = " "; // All whitespace
if(myString.Trim().IsEmpty())
{
// string is all whitespace
}
This has the benefit of trimming all newline, space and tab characters. Note that Trim() modifies myString in place.
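Without MFC/ATL, a similar test on a std::string might look like this. It is a sketch: the whitespace set is an assumption (the characters isspace accepts in the "C" locale), and unlike Trim() it leaves the string unchanged. is_blank is a hypothetical name.

```cpp
#include <string>

// True if s is empty or consists only of space, \t, \n, \v, \f or \r.
bool is_blank(const std::string& s)
{
    return s.find_first_not_of(" \t\n\v\f\r") == std::string::npos;
}
```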

Related

string::replace not working correctly 100% of the time?

I'm trying to replace every space character with '%20' in a string, and I'm thinking of using the built-in replace function of the string class.
Currently, I have:
void replaceSpace(string& s)
{
int len = s.length();
string str = "%20";
for(int i = 0; i < len; i++) {
if(s[i] == ' ') {
s.replace(i, 1, str);
}
}
}
When I pass in the string "_a_b_c_e_f_g__", where the underscores represent space, my output is "%20a%20b%20c%20e_f_g__". Again, underscores represent space.
Why is it that the spaces near the beginning of the string are replaced, but the spaces towards the end aren't?
You are making s longer with each replacement, but you are not updating len which is used in the loop condition.
Modifying the string that you are just scanning is like cutting the branch under your feet. It may work if you are careful, but in this case you aren't.
Namely, you take the string's length at the beginning, but with each replacement your string gets longer and the remaining replacement places are pushed further away (so you never reach all of them).
The correct way to cut this branch is from its end (tip) towards the trunk - this way you always have a safe footing:
void replaceSpace(string& s)
{
int len = s.length();
string str = "%20";
for(int i = len - 1; i >= 0; i--) {
if(s[i] == ' ') {
s.replace(i, 1, str);
}
}
}
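Each replace() in the backward loop still shifts the rest of the string, so for k spaces the loop does O(n·k) work. If that matters, one option (a different technique from the answer above) is to count the spaces, resize once, and fill from the back. A sketch, with replaceSpaceLinear as a hypothetical name:

```cpp
#include <cstddef>
#include <string>

// Replace every ' ' with "%20" in O(n): grow the string once, then copy
// characters from the back so nothing is moved more than once.
void replaceSpaceLinear(std::string& s)
{
    std::size_t spaces = 0;
    for (char c : s)
        if (c == ' ')
            ++spaces;

    const std::size_t oldLen = s.size();
    s.resize(oldLen + 2 * spaces);          // each space grows by two characters

    std::size_t w = s.size();               // write position (one past the end)
    for (std::size_t r = oldLen; r-- > 0; ) {   // read position, walking backwards
        if (s[r] == ' ') {
            s[--w] = '0';
            s[--w] = '2';
            s[--w] = '%';
        } else {
            s[--w] = s[r];
        }
    }
}
```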
You're growing the string but only looping to its initial size.
Looping over a collection while modifying it is very prone to error.
Here's a solution that doesn't:
void replace(string& s)
{
string s1;
std::for_each(s.begin(),
s.end(),
[&](char c) {
if (c == ' ') s1 += "%20";
else s1 += c;
});
s.swap(s1);
}
As others have already mentioned, the problem is you're using the initial string length in your loop, but the string gets bigger along the way. Your loop never reaches the end of the string.
You have a number of ways to fix this. You can correct your solution and make sure you go to the end of the string as it is now, not as it was before you started looping.
Or you can use @molbdnilo's way, which builds a new copy of the string along the way.
Or you can use something like this:
std::string input = " a b c e f g ";
std::string::size_type pos = 0;
while ((pos = input.find(' ', pos)) != std::string::npos)
{
    input.replace(pos, 1, "%20");
    pos += 3; // continue searching after the inserted "%20"
}
Here's a function that can make it easier for you:
string replace_char_str(string str, const string& find_str, const string& replace_str)
{
    if (find_str.empty()) return str;  // an empty pattern would match forever
    size_t pos = str.find(find_str);
    while (pos != std::string::npos)
    {
        str.replace(pos, find_str.length(), replace_str);
        // Resume after the replacement so it is never rescanned.
        pos = str.find(find_str, pos + replace_str.length());
    }
    return str;
}
So when you want to replace the spaces, call it like this:
string new_str = replace_char_str(yourstring, " ", "%20");
Hope this helps! :)

Vector's of unsigned char iterators not working

I want to cut the CRLFs at the end of the vector, but my code is not working (on the first iteration of the while loop, equal is called and returns false). In debug mode, "i" == 0 and its "ptr" value is "0x002e4cfe".
string testS = "\r\n\r\n\r\n<-3 CRLF Testing trim new lines 3 CRLF->\r\n\r\n\r\n";
vector<uint8> _data; _data.clear();
_data.insert(_data.end(), testS.begin(), testS.end());
vector<uint8>::iterator i = _data.end();
uint32 bytesToCut = 0;
while(i != _data.begin()) {
if(equal(i - 1, i, "\r\n")) {
bytesToCut += 2;
--i; if(i == _data.begin()) return; else --i;
} else {
if(bytesToCut) _data.erase(_data.end() - bytesToCut, _data.end());
return;
}
}
Thanks a lot for your answers. But I need a version with iterators, because my code is used when parsing chunked HTTP transfer data, which is written to a vector, and I need a function that takes a pointer to a vector and an iterator defining the position from which to remove CRLFs backwards. And all my problems, I think, are in the iterators.
Your code is wrong, at the very least because it passes an incorrect range to the std::equal algorithm:
if(equal(i - 1, i, "\r\n")) {
In this expression you compare only one element of the vector, the one pointed to by the iterator i - 1, with '\r'. You have to write something like
if(equal(i - 2, i, "\r\n")) {
If you need to remove pairs "\r\n" from the vector then I can suggest the following approach (I used my own variable names and included testing output):
std::string s = "\r\n\r\n\r\n<-3 CRLF Testing trim new lines 3 CRLF->\r\n\r\n\r\n";
std::vector<unsigned char> v( s.begin(), s.end() );
std::cout << v.size() << std::endl;
auto last = v.end();
auto prev = v.end();
while ( prev != v.begin() && *--prev == '\n' && prev != v.begin() && *--prev == '\r' )
{
last = prev;
}
v.erase( last, v.end() );
std::cout << v.size() << std::endl;
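Instead of reinventing the wheel, you can use the existing STL algorithms with something like:
std::string s;
// +1 keeps the last non-whitespace character; if there is none, npos + 1 wraps to 0 and s becomes empty
s = s.substr(0, s.find_last_not_of(" \r\n") + 1);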
If you just need to trim '\r' and '\n' from the end, a simple substr will do:
std::string str = "\r\n\r\n\r\nSome string\r\n\r\n\r\n";
size_t newLength = str.length();
while (newLength > 0 && (str[newLength - 1] == '\r' || str[newLength - 1] == '\n')) newLength--;
str = str.substr(0, newLength);
std::cout << str;
Don't sweat small stuff :)
Removing all '\r' and '\n' characters could be as simple as (C++03):
#include <iostream>
#include <string>
#include <algorithm>
int main() {
std::string str = "\r\n\r\n\r\nSome string\r\n\r\n\r\n";
str.erase(std::remove(str.begin(), str.end(), '\r'), str.end());
str.erase(std::remove(str.begin(), str.end(), '\n'), str.end());
std::cout << str;
}
or:
bool isUnwantedChar(char c) {
return (c == '\r' || c == '\n');
}
int main() {
std::string str = "\r\n\r\n\r\nSome string\r\n\r\n\r\n";
str.erase(std::remove_if(str.begin(), str.end(), isUnwantedChar), str.end());
std::cout << str;
}
First of all, your vector initialization is ... non-optimal. All you needed to do is:
string testS = "\r\n\r\n\r\n<-3 CRLF Testing trim new lines 3 CRLF->\r\n\r\n\r\n";
vector<uint8> _data(testS.begin(), testS.end());
Second, if you wanted to remove the \r and \n characters, you could have done it in the string:
testS.erase(std::remove_if(testS.begin(), testS.end(), [](char c)
{
return c == '\r' || c == '\n';
}), testS.end());
If you wanted to do it in the vector, it is the same basic process:
_data.erase(std::remove_if(_data.begin(), _data.end(), [](uint8 ui)
{
return ui == static_cast<uint8>('\r') || ui == static_cast<uint8>('\n');
}), _data.end());
Your problem is likely due to the use of invalidated iterators in your loop that removes elements one by one (the loop has several other logical issues, but since it shouldn't exist anyway, I won't go into them).
If you wanted to remove the items just from the end of the string/vector, it would be slightly different, but still the same basic pattern:
std::string::size_type start = testS.find_first_not_of("\r\n", 0); // finds the first non-\r\n character in the string
std::string::size_type end = testS.find_first_of("\r\n", start); // finds the first \r or \n after the real characters
// assuming neither start nor end is equal to std::string::npos - this should be checked
testS.erase(testS.begin() + end, testS.end()); // erase the `\r\n`s at the end of the string.
or alternatively (if \r\n can be in the middle of the string as well):
std::string::reverse_iterator rit = std::find_if_not(testS.rbegin(), testS.rend(), [](char c)
{
return c == '\r' || c == '\n';
});
testS.erase(rit.base(), testS.end());

Separating alphabetic characters in C++ STL

I've been practicing C++ for a competition next week, and the sample problem I've been working on requires splitting paragraphs into words. Of course, that's easy. But this problem is so weird that words like isn't should be separated as well, into isn and t. I know it's weird, but I have to follow this.
I have a function split() that takes a constant char delimiter as one of its parameters. It's what I use to separate words on spaces. But I can't figure out this one. Even words containing digits, like phil67bs, should be separated into phil and bs.
And no, I don't ask for full code. A pseudocode will do, or something that will help me understand what to do. Thanks!
PS: Please no recommendations for external libs. Just the STL. :)
Filter out digits, spaces and anything else that isn't a letter by using a suitable locale. See this SO thread about treating everything but digits as whitespace. So use a mask and do something similar to what Jerry Coffin suggests, but only for letters:
struct alphabet_only: std::ctype<char>
{
alphabet_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
std::fill(&rc['A'], &rc['['], std::ctype_base::upper);
std::fill(&rc['a'], &rc['{'], std::ctype_base::lower);
return &rc[0];
}
};
And, boom! You're golden.
Or... you could just do a transform:
// needs <algorithm>, <cctype>, <iterator> and <vector>
char changeToLetters(char input){ return isalpha(static_cast<unsigned char>(input)) ? input : ' '; }
vector<char> output;
output.reserve( myVector.size() );
transform( myVector.begin(), myVector.end(), back_inserter(output), changeToLetters );
Which, um, is much easier to grok, just not as efficient as Jerry's idea.
Edit:
Changed 'Z' to '[' so that the value 'Z' is filled. Likewise with 'z' to '{'.
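To actually use such a facet, imbue it into a stream so that operator>> treats every non-letter as a separator. A sketch, assuming ASCII letters; the table is restated with plain loops so the block compiles on its own, and letter_words is a hypothetical helper:

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Classification table in the spirit of the answer above: every char is
// whitespace except the ASCII letters.
struct alphabet_only : std::ctype<char>
{
    alphabet_only() : std::ctype<char>(get_table()) {}
    static const std::ctype_base::mask* get_table()
    {
        static std::vector<std::ctype_base::mask> rc(std::ctype<char>::table_size);
        for (std::size_t i = 0; i < rc.size(); ++i)
            rc[i] = std::ctype_base::space;
        for (char c = 'A'; c <= 'Z'; ++c)
            rc[static_cast<unsigned char>(c)] = std::ctype_base::upper;
        for (char c = 'a'; c <= 'z'; ++c)
            rc[static_cast<unsigned char>(c)] = std::ctype_base::lower;
        return &rc[0];
    }
};

// Split text into runs of letters: operator>> stops at anything the
// imbued facet classifies as space.
std::vector<std::string> letter_words(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new alphabet_only));
    std::vector<std::string> words;
    for (std::string w; in >> w; )
        words.push_back(w);
    return words;
}
```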
This sounds like a perfect job for the find_first_of function which finds the first occurrence of a set of characters. You can use this to look for arbitrary stop characters and generate words from the spaces between such stop characters.
Roughly:
size_t previous = 0;
for (; ;) {
size_t next = str.find_first_of(" '1234567890", previous);
// Do processing
if (next == string::npos)
break;
previous = next + 1;
}
Just change your function to delimit on anything that isn't an alphabetic character. Is there anything in particular that you are having trouble with?
Break down the problem: First, write a function that gets the first "word" from the sentence. This is easy; just look for the first non-alphabetic character. The next step is to remove all leading non-alphabetic characters from the remaining string. From there, just repeat.
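The break-down approach above maps onto std::string's find_first_of and find_first_not_of. A sketch, assuming only the ASCII letters count as word characters; split_letters is a hypothetical name:

```cpp
#include <string>
#include <vector>

// Take a word, skip the separators after it, repeat.
std::vector<std::string> split_letters(const std::string& str)
{
    static const std::string letters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    std::vector<std::string> words;
    // Skip leading non-letters to the start of the first word.
    std::string::size_type start = str.find_first_of(letters);
    while (start != std::string::npos) {
        // The word ends at the first non-letter (npos means end of string).
        std::string::size_type end = str.find_first_not_of(letters, start);
        words.push_back(str.substr(start, end - start));
        // Skip the run of separators to the start of the next word.
        start = str.find_first_of(letters, end);
    }
    return words;
}
```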
You can do something like this:
vector<string> split(const string& str)
{
vector<string> splits;
string cur;
for(size_t i = 0; i < str.size(); ++i)
{
if(!isalpha(static_cast<unsigned char>(str[i]))) // any non-letter (digit, space, apostrophe) ends the word
{
if(!cur.empty())
{
splits.push_back(cur);
}
cur="";
}
else
{
cur += str[i];
}
}
if(! cur.empty())
{
splits.push_back(cur);
}
return splits;
}
Let's assume that the input is in a std::string (use std::getline(cin, line), for example, to read a full line from cin):
std::vector<std::string> split(std::string const& input)
{
    std::string::const_iterator it(input.begin()), end(input.end());
    std::string current;
    std::vector<std::string> words;
    for(; it != end; ++it)
    {
        if (isalpha(static_cast<unsigned char>(*it)))
        {
            current.push_back(*it); // add this char to the current word
        }
        else if (!current.empty())
        {
            // push the current word into the result list
            words.push_back(current);
            current.clear(); // next word
        }
    }
    if (!current.empty())
        words.push_back(current); // the last word may not be followed by a separator
    return words;
}
I've not tested it, but I guess it ought to work...

C++ Remove new line from multiline string

What's the most efficient way of removing a 'newline' from a std::string?
#include <algorithm>
#include <string>
std::string str;
str.erase(std::remove(str.begin(), str.end(), '\n'), str.end());
The behavior of std::remove may not quite be what you'd expect.
A call to remove is typically followed by a call to a container's erase method, which erases the unspecified values and reduces the physical size of the container to match its new logical size.
See an explanation of it here.
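To make the two steps concrete, here is a sketch (strip_newlines is a hypothetical helper) that separates the std::remove call from the erase call:

```cpp
#include <algorithm>
#include <string>

// std::remove only shifts the kept characters forward and returns the new
// logical end; the string's size is unchanged until erase() is called.
std::string strip_newlines(std::string s)
{
    std::string::iterator new_end = std::remove(s.begin(), s.end(), '\n');
    // At this point s.size() is still the original size, and
    // [new_end, s.end()) holds leftover characters with unspecified values.
    s.erase(new_end, s.end());
    return s;
}
```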
If the newline is expected to be at the end of the string, then:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
If the string can contain many newlines anywhere in the string:
std::string::size_type i = 0;
while (i < s.length()) {
    i = s.find('\n', i);
    if (i == std::string::npos) {
        break;
    }
    s.erase(i, 1);  // erase just the newline, not the rest of the string
}
You should use the erase-remove idiom, looking for '\n'. This will work for any standard sequence container; not just string.
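Here is one for DOS or Unix newlines; it cuts the string at the first '\r' or '\n', discarding that line ending and everything after it:
void chomp( string &s)
{
    string::size_type pos = s.find_first_of("\r\n");
    if (pos != string::npos)
        s.erase(pos);
}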
Slight modification of edW's solution to remove all existing endline chars:
void chomp(string &s){
size_t pos;
while (((pos=s.find('\n')) != string::npos))
s.erase(pos,1);
}
Note that pos is typed as size_t: find returns std::string::size_type (in practice size_t), an unsigned type, and npos is the largest value of that type. Storing the result in an int narrows it and makes the comparison with npos depend on signed/unsigned conversions, which compilers warn about and which are easy to get wrong.
s.erase(std::remove(s.begin(), s.end(), '\n'), s.end());
The code removes all newlines from the string s.
O(N) implementation best served without comments on SO and with comments in production.
std::string::size_type shift=0;
for (std::string::size_type i=0; i<str.length(); ++i){
if (str[i] == '\n') {
++shift;
}else{
str[i-shift] = str[i];
}
}
str.resize(str.length() - shift);
std::string some_str = SOME_VAL;
if ( some_str.size() > 0 && some_str[some_str.length()-1] == '\n' )
some_str.resize( some_str.length()-1 );
or (removes several newlines at the end)
some_str.resize( some_str.find_last_not_of("\n")+1 );
Another way to do it, with a for loop:
void rm_nl(string &s) {
for (string::size_type p = s.find("\n"); p != string::npos; p = s.find("\n", p))
s.erase(p,1);
}
Usage:
string data = "\naaa\nbbb\nccc\nddd\n";
rm_nl(data);
cout << data; // data = aaabbbcccddd
All these answers seem a bit heavy to me.
If you just flat out remove the '\n' and move everything else back a spot, you are liable to have some characters slammed together in a weird-looking way. So why not just do the simple (and most efficient) thing: Replace all '\n's with spaces?
for (int i = 0; i < str.length();i++) {
if (str[i] == '\n') {
str[i] = ' ';
}
}
There may be ways to improve the speed of this at the edges, but it will be way quicker than moving whole chunks of the string around in memory.
If it's anywhere in the string, then you can't do better than O(n).
And the only way is to search for '\n' in the string and erase it. For a single newline:
for(size_t i=0;i<s.length();i++) if(s[i]=='\n') { s.erase(s.begin()+i); break; }
For multiple newlines:
int n=0;
for(int i=0;i<s.length();i++){
if(s[i]=='\n'){
n++;//we increase the number of newlines we have found so far
}else{
s[i-n]=s[i];
}
}
s.resize(s.length()-n);//delete, in one go, the last n elements, which are now stale leftovers
It erases all the newlines in a single pass.
About answer 3's code, which removes only the last \n from the string:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
Will the if condition not fail if the string is really empty?
Is it not better to do :
if (!s.empty())
{
if (s[s.length()-1] == '\n')
s.erase(s.length()-1);
}
To extend @Greg Hewgill's answer for C++11:
If you just need to delete a newline at the very end of the string:
This in C++98:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
...can now be done like this in C++11:
if (!s.empty() && s.back() == '\n') {
s.pop_back();
}
Optionally, wrap it up in a function. Note that I pass it by pointer here simply so that, when you take its address to pass it to the function, you are reminded that the string will be modified in place inside the function.
void remove_trailing_newline(std::string* str)
{
if (str->empty())
{
return;
}
if (str->back() == '\n')
{
str->pop_back();
}
}
// usage
std::string str = "some string\n";
remove_trailing_newline(&str);
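If the string may also end in Windows-style "\r\n", or in several line breaks, the same C++11 members can be used in a loop. A sketch with the hypothetical name remove_trailing_newlines, keeping the pass-by-pointer convention:

```cpp
#include <string>

// Strip every trailing '\n' and '\r', so "\r\n" endings are removed too.
void remove_trailing_newlines(std::string* str)
{
    while (!str->empty() && (str->back() == '\n' || str->back() == '\r'))
    {
        str->pop_back();
    }
}
```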
What's the most efficient way of removing a 'newline' from a std::string?
As far as the most efficient way goes--that I'd have to speed test/profile and see. I'll see if I can get back to you on that and run some speed tests between the top two answers here, and a C-style way like I did here: Removing elements from array in C. I'll use my nanos() timestamp function for speed testing.
Other References:
See these "new" C++11 functions in this reference wiki here: https://en.cppreference.com/w/cpp/string/basic_string
https://en.cppreference.com/w/cpp/string/basic_string/empty
https://en.cppreference.com/w/cpp/string/basic_string/back
https://en.cppreference.com/w/cpp/string/basic_string/pop_back

How can I make this work with every delimiter in C++?

I just wrote a program that tokenizes a char array using pointers. The program only needed to work with a space as the delimiter character. I just turned it in and got full credit, but after turning it in, I realized that this program worked only if the delimiter character was a space.
My question is, how could I make this program work with an arbitrary delimiter character?
The function I've shown you below returns a pointer to the next word in the char array. This is what I believe I need to change for it to work with any delimiter character.
Thanks!
Code:
char* StringTokenizer::Next(void) {
pNextWord = pStart;
if (*pStart == '\0') { return NULL; }
while (*pStart != delim) {
pStart++;
}
if (*pStart == '\0') { return NULL; }
*pStart = '\0';
pStart++;
return pNextWord;
}
The printing loop in main():
while ((nextWord = tk.Next()) != NULL) {
cout << nextWord << endl;
}
The simplest way is to change your
while (*pStart != delim)
to something like
while (*pStart != '\0' && *pStart != ' ' && *pStart != '\n' && *pStart != '\t')
Or, you could make delim a string, and create a function that checks if a char is in the string:
bool isDelim(char c, const char *delim) {
while (*delim) {
if (*delim == c)
return true;
delim++;
}
return false;
}
while ( *pStart != '\0' && !isDelim(*pStart, " \n\t") )
Or, perhaps the best solution is to use one of the prebuilt functions for doing all this, such as strtok.
Just change the line
while (*pStart != delim)
as follows:
while (*pStart != '\0' && strchr(" \t\n", *pStart) == NULL)
The standard strchr function (declared in the string.h header, or <cstring> in C++)
looks for a character (given in the second argument) in a C-string
(given in the first argument) and returns a pointer to the position
where that character occurs for the first time, or NULL if it does not
occur. Hence, the expression strchr(" \t\n", *pStart) == NULL is true
if the current character (*pStart) cannot be found in the string " \t\n"
and, therefore, is not a delimiter. (Modify the delimiter string to adapt
it to your needs, of course.)
This approach provides a short and simple way to test whether a given
character belongs to a (small) set of characters of interest. And it
uses a standard function.
By the way, you can do this using not only a C-string, but with
a std::string, too. All you need is to declare a const std::string
with " \t\n"-like value and then replace the call to the strchr
function with the find method of the declared delimiter string.
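A sketch of that std::string variant, assuming the delimiter set " \t\n"; kDelims, is_delim and skip_to_delim are hypothetical names. The explicit '\0' check matters here: strchr also matches the terminating null, but std::string::find does not find '\0' in " \t\n", so without the check the loop would run off the end of the buffer.

```cpp
#include <string>

// The delimiter set lives in a const std::string; find() replaces strchr().
const std::string kDelims = " \t\n";

bool is_delim(char c)
{
    // find() returns std::string::npos when c is not in the set.
    return kDelims.find(c) != std::string::npos;
}

// Advance p to the first delimiter or to the terminating '\0'.
char* skip_to_delim(char* p)
{
    while (*p != '\0' && !is_delim(*p))
        ++p;
    return p;
}
```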
Hmm...this doesn't look quite right:
if (*pStart = '\0')
The condition can never be true. I'm guessing you intended == instead of =? You also have a bit of a problem here:
while (*pStart != delim)
If the last word in the string isn't followed by a delimiter, this is going to run off the end of the string, which will cause serious problems.
Edit: Unless you really need to do this on your own, consider using a stringstream for the job. It already has all the right mechanism in place and quite heavily tested. It does add overhead, but it's quite acceptable in a lot of cases.
Not compiled, but I'd do something like this.
const char delimList[] = {' ',',','.',';', '|', '!', '$', '\n'};//all delims here.
const int N = sizeof(delimList) / sizeof(delimList[0]);
char* StringTokenizer::Next(void)
{
if (*pStart == '\0') { return NULL; }
pNextWord = pStart;
while (1){
for (int x = 0; x < N; x++){
if (*pStart == delimList[x]){ //this is it.
*pStart = '\0';
pStart++;
return pNextWord;
}
}
if ('\0' == *pStart){ //last word.. maybe.
return pNextWord;
}
pStart++;
}
}
// (!compiled).
I assume that we want to stick to C instead of C++. The functions strspn and strcspn are good for tokenizing by a set of delimiters. You can use strcspn to find where the next separator begins (i.e. where the current token ends) and then strspn to find where the separator ends (i.e. where the next token begins). Loop until you reach the end.
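A sketch of the strspn/strcspn loop. It is written in C++ to match the rest of the page, but the two calls behave the same in C; tokenize is a hypothetical name, and consecutive delimiters are treated as one separator (no empty tokens are produced).

```cpp
#include <cstring>
#include <string>
#include <vector>

//   strspn(p, delims)  = length of the leading run made only of delimiters
//   strcspn(p, delims) = length of the leading run containing no delimiter
std::vector<std::string> tokenize(const char* p, const char* delims)
{
    std::vector<std::string> tokens;
    for (;;) {
        p += std::strspn(p, delims);                 // skip the separator run
        if (*p == '\0')
            break;                                   // nothing left
        std::size_t len = std::strcspn(p, delims);   // token runs up to the next separator
        tokens.emplace_back(p, len);
        p += len;
    }
    return tokens;
}
```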