C++ Remove new line from multiline string - c++

What's the most efficient way of removing a 'newline' from a std::string?

#include <algorithm>
#include <string>
std::string str;
str.erase(std::remove(str.begin(), str.end(), '\n'), str.end());
The behavior of std::remove may not quite be what you'd expect.
A call to remove is typically followed by a call to a container's erase method, which erases the unspecified values and reduces the physical size of the container to match its new logical size.
See an explanation of it here.
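Here is a minimal sketch of the two-step behaviour described above (C++11). The helper names remove_newlines and tail_after_remove_only are hypothetical. std::remove only compacts the kept characters and reports the new logical end, and the erase call is what actually shortens the string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Removes every '\n' from s in place using the erase-remove idiom.
void remove_newlines(std::string& s)
{
    // std::remove shifts the kept characters to the front and returns the new
    // logical end; it does not change s.size().
    auto new_end = std::remove(s.begin(), s.end(), '\n');
    // erase drops the leftover tail, shrinking the string to its logical size.
    s.erase(new_end, s.end());
}

// Returns how many leftover characters std::remove alone leaves after the new end.
std::size_t tail_after_remove_only(std::string s)
{
    auto new_end = std::remove(s.begin(), s.end(), '\n');
    return static_cast<std::size_t>(s.end() - new_end);
}
```

For "a\nb\n", std::remove alone leaves the size at 4 with two leftover characters after the new end. The erase call brings the string down to "ab".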

If the newline is expected to be at the end of the string, then:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
If the string can contain many newlines anywhere in the string:
std::string::size_type i = 0;
while (i < s.length()) {
    i = s.find('\n', i);
    if (i == std::string::npos) {
        break;
    }
    s.erase(i, 1); // erase only the '\n'; s.erase(i) would cut off the rest of the string
}

You should use the erase-remove idiom, looking for '\n'. This will work for any standard sequence container; not just string.
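The following sketch shows the idiom as a generic helper. It assumes a container with an erase(first, last) member. erase_all is a hypothetical name, not a library function. Since C++20 the standard library provides std::erase(container, value) for std::string, std::vector, std::deque and std::list, among others, and it does the same job.

```cpp
#include <algorithm>
#include <deque>
#include <string>
#include <vector>

// Erase-remove for any container whose erase() accepts an iterator range.
template <typename Container, typename T>
void erase_all(Container& c, const T& value)
{
    // remove() compacts the elements to keep; erase() trims the leftover tail.
    c.erase(std::remove(c.begin(), c.end(), value), c.end());
}
```

The same call works unchanged on a std::string, a std::vector<char> or a std::deque<int>.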

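Here is one for DOS or Unix newlines. It truncates the string at the first '\r' or '\n':
void chomp(std::string &s)
{
    std::string::size_type pos = s.find_first_of("\r\n");
    if (pos != std::string::npos)
        s.erase(pos); // drops the line ending and everything after it
}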

Slight modification on edW's solution to remove all existing newline characters:
void chomp(string &s){
size_t pos;
while (((pos=s.find('\n')) != string::npos))
s.erase(pos,1);
}
Note that pos is declared as size_t rather than int. find() returns std::string::size_type (size_t in practice), and npos is the largest value of that unsigned type. Storing the result in an int narrows it. Positions above INT_MAX cannot be represented, and the comparison with npos only works if converting npos to int yields -1, which is implementation-defined before C++20.

s.erase(std::remove(s.begin(), s.end(), '\n'), s.end());

The code removes all newlines from the string str.
An O(N) implementation: each character is examined once and moved at most once.
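std::string::size_type shift = 0; // number of '\n' seen so far
for (std::string::size_type i = 0; i < str.length(); ++i) {
    if (str[i] == '\n') {
        ++shift;
    } else {
        str[i - shift] = str[i]; // slide each kept character left over the removed newlines
    }
}
str.resize(str.length() - shift); // drop the leftover tail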

std::string some_str = SOME_VAL;
if ( some_str.size() > 0 && some_str[some_str.length()-1] == '\n' )
some_str.resize( some_str.length()-1 );
or (removes several newlines at the end)
some_str.resize( some_str.find_last_not_of('\n')+1 );
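If the text may come from Windows files, a variant of the line above can also strip "\r\n" endings. This is a sketch. strip_trailing_newlines is a hypothetical name, and it uses erase rather than resize, which is equivalent here.

```cpp
#include <string>

// Strips every trailing '\n' and '\r', so both "\n" and "\r\n" endings are removed.
void strip_trailing_newlines(std::string& s)
{
    // find_last_not_of returns npos when s is empty or consists only of line-ending
    // characters; npos + 1 wraps to 0, so the whole string is erased in that case.
    s.erase(s.find_last_not_of("\r\n") + 1);
}
```

Newlines in the middle of the string are left alone. Only the trailing run is removed.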

Another way to do it in the for loop
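void rm_nl(std::string &s) {
    // after erasing at p the next character moves into p, so the search resumes there
    for (std::string::size_type p = s.find('\n'); p != std::string::npos; p = s.find('\n', p))
        s.erase(p, 1);
}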
Usage:
string data = "\naaa\nbbb\nccc\nddd\n";
rm_nl(data);
cout << data; // data = aaabbbcccddd

All these answers seem a bit heavy to me.
If you just flat out remove the '\n' and move everything else back a spot, you are liable to have some characters slammed together in a weird-looking way. So why not just do the simple (and most efficient) thing: Replace all '\n's with spaces?
for (std::string::size_type i = 0; i < str.length(); i++) {
    if (str[i] == '\n') {
        str[i] = ' ';
    }
}
There may be ways to speed this up further. Because no characters are moved, it avoids the shifting that erase() does, which is especially costly when erase() is called once per newline. Note that the string keeps its original length.
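The standard library already has an algorithm for this in-place substitution: std::replace from <algorithm>. Here is a minimal sketch. The wrapper name newlines_to_spaces is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Replaces each '\n' with a space in place; the string's length does not change.
void newlines_to_spaces(std::string& str)
{
    std::replace(str.begin(), str.end(), '\n', ' ');
}
```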

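If it's anywhere in the string, then you can't do better than O(n), because every character has to be examined.
The simplest way is to search for '\n' in the string and erase it:
for (std::string::size_type i = 0; i < s.length(); ) { if (s[i] == '\n') s.erase(s.begin() + i); else ++i; } // after an erase, the next character is already at i
Each erase shifts the rest of the string, though, so with many newlines this becomes O(n^2).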
If there are many newlines, do this instead:
std::string::size_type n = 0;
for (std::string::size_type i = 0; i < s.length(); i++) {
    if (s[i] == '\n') {
        n++; // we increase the number of newlines we have found so far
    } else {
        s[i - n] = s[i];
    }
}
s.resize(s.length() - n); // delete, in one step, the last n elements, which are now leftovers
It erases all the newlines in a single pass.

About answer 3, the code that removes only the last \n from the string:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
Won't the if condition fail if the string is really empty?
Isn't it better to do:
if (!s.empty())
{
if (s[s.length()-1] == '\n')
s.erase(s.length()-1);
}

To extend @Greg Hewgill's answer for C++11:
If you just need to delete a newline at the very end of the string:
This in C++98:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
...can now be done like this in C++11:
if (!s.empty() && s.back() == '\n') {
s.pop_back();
}
Optionally, wrap it up in a function. Note that I pass it by pointer here simply so that taking its address at the call site reminds you that the string will be modified in place inside the function.
void remove_trailing_newline(std::string* str)
{
if (str->empty())
{
return;
}
if (str->back() == '\n')
{
str->pop_back();
}
}
// usage
std::string str = "some string\n";
remove_trailing_newline(&str);
What's the most efficient way of removing a 'newline' from a std::string?
As for the most efficient way, I'd have to speed test/profile and see. I'll see if I can get back to you on that and run some speed tests comparing the top two answers here with a C-style way like the one I used here: Removing elements from array in C. I'll use my nanos() timestamp function for speed testing.
Other References:
See these "new" C++11 functions in this reference wiki here: https://en.cppreference.com/w/cpp/string/basic_string
https://en.cppreference.com/w/cpp/string/basic_string/empty
https://en.cppreference.com/w/cpp/string/basic_string/back
https://en.cppreference.com/w/cpp/string/basic_string/pop_back

Related

makeValidWord(std::string word) not working properly

I'm programming a hash table thing in C++, but this specific piece of code will not run properly. It should return a string of alpha characters and ' and -, but I get cases like "t" instead of "art" when I try to input "'aRT-*".
isWordChar() returns a bool value indicating whether the input is a valid word character, using isalpha().
// Words cannot contain any digits, or special characters EXCEPT for
// hyphens (-) and apostrophes (') that occur in the middle of a
// valid word (the first and last characters of a word must be an alpha
// character). All upper case characters in the word should be converted
// to lower case.
// For example, "can't" and "good-hearted" are considered valid words.
// "12mOnkEYs-$" will be converted to "monkeys".
// "Pa55ive" will be stripped to "paive".
std::string WordCount::makeValidWord(std::string word) {
if (word.size() == 0) {
return word;
}
string r = "";
string in = "";
size_t incr = 0;
size_t decr = word.size() - 1;
while (incr < word.size() && !isWordChar(word.at(incr))) {
incr++;
}
while (0 < decr && !isWordChar(word.at(decr))) {
decr--;
}
if (incr > decr) {
return r;
}
while (incr <= decr) {
if (isWordChar(word.at(incr)) || word.at(incr) == '-' || word.at(incr) == '\'') {
in =+ word.at(incr);
}
incr++;
}
for (size_t i = 0; i < in.size(); i++) {
r += tolower(in.at(i));
}
return r;
}
Assuming you can use standard algorithms, it's better to rewrite your function using them. This achieves two goals:
the code is more readable, since using algorithms shows intent along with the code itself
there is less chance of making an error
So it should be something like this:
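std::string WordCount::makeValidWord(std::string word) {
    auto first = std::find_if(word.cbegin(), word.cend(), isWordChar);
    if (first == word.cend())
        return std::string(); // no word characters at all
    auto last = std::find_if(word.crbegin(), word.crend(), isWordChar);
    std::string i;
    // last is a reverse iterator; last.base() points just past the last word character
    std::copy_if(first, last.base(), std::back_inserter(i), [](char c) {
        return isWordChar(c) || c == '-' || c == '\'';
    });
    std::string r;
    // cast to unsigned char: std::tolower is undefined for negative char values
    std::transform(i.cbegin(), i.cend(), std::back_inserter(r),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return r;
}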
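To check the algorithm-based version outside the class, here is a free-function sketch. It assumes that isWordChar() means "is an ASCII letter", standing in for the question's helper. is_word_char and make_valid_word are hypothetical names. Under that assumption it reproduces the examples from the question's comment block.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Assumption: a "word character" is a letter (stands in for the question's isWordChar()).
static bool is_word_char(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Free-function version of the algorithm-based makeValidWord.
std::string make_valid_word(const std::string& word)
{
    auto first = std::find_if(word.cbegin(), word.cend(), is_word_char);
    if (first == word.cend())
        return std::string(); // no letters at all
    auto last = std::find_if(word.crbegin(), word.crend(), is_word_char);
    std::string r;
    // [first, last.base()) runs from the first letter through the last letter.
    std::copy_if(first, last.base(), std::back_inserter(r), [](char c) {
        return is_word_char(c) || c == '-' || c == '\'';
    });
    // Lower-case in place; the unsigned char parameter keeps std::tolower defined.
    std::transform(r.begin(), r.end(), r.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return r;
}
```

For the question's failing input, make_valid_word("'aRT-*") gives "art".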
I am going to echo @Someprogrammerdude and say: Learn to use a debugger!
I pasted your code into Visual Studio (changed isWordChar() to isalpha()), and stepped it through with the debugger. Then it was pretty trivial to notice this happening:
(Screenshots: the debugger state after the first and second iterations of while (incr <= decr).)
Ooh, look at that; the variable in does not update correctly - instead of collecting a string of the correct characters it only holds the last one. How can that be?
in =+ word.at(incr); Hey, that is not right, that operator should be +=.
Many errors are that easy and effortless to find and correct if you use a debugger. Pick one up today. :)

fastest way to read the last line of a string?

I'd like to know the fastest way for reading the last line in a std::string object.
Technically, the string after the last occurrence of \n in the fastest possible way?
This can be done using just string::find_last_of and string::substr like so
std::string get_last_line(const std::string &str)
{
auto position = str.find_last_of('\n');
if (position == std::string::npos)
return str;
else
return str.substr(position + 1);
}
I would probably use std::string::rfind and std::string::substr combined with guaranteed std::string::npos wrap around to be succinct:
inline std::string last_line_of(std::string const& s)
{
return s.substr(s.rfind('\n') + 1);
}
If s.rfind('\n') doesn't find anything it returns std::string::npos. The C++ standard says std::string::npos + 1 == 0. And returning s.substr(0) is always safe.
If s.rfind('\n') does find something then you want the substring starting from the next character. Again returning s.substr(s.size()) is safe according to the standard.
NOTE: In C++17 this method will benefit from guaranteed return value optimization so it should be super efficient.
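You can check the wrap-around argument on the edge cases: no newline, a trailing newline, and an empty string. Here is a sketch. last_line is a hypothetical name for the same one-liner.

```cpp
#include <string>

// rfind returns npos when there is no '\n'; npos + 1 wraps to 0, so the whole
// string is returned. A trailing '\n' gives substr(size()), i.e. an empty string.
std::string last_line(const std::string& s)
{
    return s.substr(s.rfind('\n') + 1);
}
```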
I thought of a way that reads the string backwards, storing each character it reads until it encounters a '\n' character. It then reverses the stored string and returns it:
std::string get_last_line(const std::string &str)
{
    std::string last_line_reversed;
    // Walk from the last character towards the front; stop at the first '\n' found.
    for (std::size_t l = str.length(); l > 0; --l)
    {
        char c = str[l - 1];
        if (c == '\n')
            break;
        last_line_reversed += c;
    }
    // Undo the reversal.
    return std::string(last_line_reversed.rbegin(), last_line_reversed.rend());
}
Because it stops at the last '\n', it only ever reads the final line, so a big string with many earlier lines costs no more than its last line does.

Loop quitting for no reason

I have a question regarding C++. This is my current function:
string clarifyWord(string str) {
//Remove all spaces before string
unsigned long i = 0;
int currentASCII = 0;
while (i < str.length()) {
currentASCII = int(str[i]);
if (currentASCII == 32) {
str.erase(i);
i++;
continue;
} else {
break;
}
}
//Remove all spaces after string
i = str.length();
while (i > -1) {
currentASCII = int(str[i]);
if (currentASCII == 32) {
str.erase(i);
i--;
continue;
} else {
break;
}
}
return str;
}
Just to get the basic and obvious things out of the way, I have #include <string> and using namespace std; so I do have access to the string functions.
The thing is though that the loop is quitting and sometimes skipping the second loop. I am passing in the str to be " Cheese " and it should remove all the spaces before the string and after the string.
In the main function, I am also assigning a variable to clarifyWord(str) where str is above. It doesn't seem to print that out either using cout << str;.
Is there something I am missing with printing out strings or looping with strings? Also ASCII code 32 is Space.
Okay so the erase function you are calling looks like this:
string& erase ( size_t pos = 0, size_t n = npos );
The n parameter is the number of characters to delete. Its default, npos, means delete everything up to the end of the string, so set the second parameter to 1:
str.erase(i, 1);
[EDIT]
You could change the first loop to this:
while (str.length() > 0 && str[0] == ' ')
{
str.erase(0,1);
}
and the second loop to this:
while (str.length() > 0 && str[str.length() - 1] == ' ')
{
str.erase(str.length() - 1, 1);
}
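Here is a quick check of the two erase calls on the question's own input. erase_from and erase_one are hypothetical helpers.

```cpp
#include <string>

// erase(pos) uses the default n = npos: everything from pos to the end is removed.
std::string erase_from(std::string s, std::string::size_type pos)
{
    s.erase(pos);
    return s;
}

// erase(pos, 1) removes exactly one character.
std::string erase_one(std::string s, std::string::size_type pos)
{
    s.erase(pos, 1);
    return s;
}
```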
In your second loop, you can't initialize i to str.length().
str[str.length()] is going to be after the end of your string, and so is unlikely to be a space (thus triggering the break out of the second loop).
You're using erase (modifying the string) while you're in a loop checking its size. This is a dangerous way of processing the string. As you return a new string, I would recommend you first to search for the first occurrence in the string of the non-space character, and then the last one, and then returning a substring. Something along the lines of (not tested):
size_t init = str.find_first_not_of(' ');
if (init == std::string::npos)
return "";
size_t fini = str.find_last_not_of(' ');
return str.substr(init, fini - init + 1);
You see, no loops, erases, etc.
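Here is the same substring approach as a complete function. It assumes you want to trim tabs and line endings as well as spaces. trim_copy and its default character set are assumptions, not part of the original answer.

```cpp
#include <string>

// Returns a copy of str without leading and trailing characters from ws.
std::string trim_copy(const std::string& str, const char* ws = " \t\r\n")
{
    const std::string::size_type first = str.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string(); // empty, or nothing but trimmable characters
    const std::string::size_type last = str.find_last_not_of(ws);
    return str.substr(first, last - first + 1);
}
```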
unsigned long i ... while (i > -1) Well, that's not right, is it? How would you expect that to work? The compiler will in fact convert both operands to the same type: while (i > static_cast<unsigned long>(-1)). And that's just another way to write ULONG_MAX, i.e. while (i > ULONG_MAX). In other words, while(false).
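When you do need to walk an unsigned index backwards, a common idiom is to test before decrementing, so the index never has to be compared against a negative value. Here is a sketch. reversed is a hypothetical helper used only to exercise the loop.

```cpp
#include <climits>
#include <string>

// -1 converted to unsigned long is ULONG_MAX, so "i > -1" is never true for unsigned i.
static_assert(static_cast<unsigned long>(-1) == ULONG_MAX, "unsigned conversion is modulo 2^N");

// Visits indices size()-1 down to 0: "i-- > 0" tests i, then decrements it.
std::string reversed(const std::string& s)
{
    std::string out;
    for (std::string::size_type i = s.size(); i-- > 0; )
        out += s[i];
    return out;
}
```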
You're using erase incorrectly. With a single argument it erases from pos to the end of the string, because n defaults to npos.
i.e. string& erase ( size_t pos = 0, size_t n = npos );
See: http://www.cplusplus.com/reference/string/string/erase/
A better way to do this is to note the position of the first non space and where the spaces occur at the end of the string. Then use either substr or erase twice.
You also don't need to go to the trouble of doing this:
currentASCII = int(str[i]);
if (currentASCII == 32) {
Instead do this:
if (str[i] == ' ') {
Which I think you'll agree is a lot easier to read.
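So, you can shorten it somewhat with something like this:
string clarifyWord(string str) {
    string::size_type start = 0, end = str.length();
    while (start < end && str[start] == ' ') // skip leading spaces
        ++start;
    while (end > start && str[end - 1] == ' ') // skip trailing spaces; end is one past the last kept char
        --end;
    return str.substr(start, end - start); // substr takes a length, not an end position
}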

Efficient way to check if std::string has only spaces

I was just talking with a friend about what would be the most efficient way to check if a std::string has only spaces. He needs to do this on an embedded project he is working on and apparently this kind of optimization matters to him.
I've come up with the following code; it uses strtok().
bool has_only_spaces(std::string& str)
{
char* token = strtok(const_cast<char*>(str.c_str()), " ");
while (token != NULL)
{
if (*token != ' ')
{
return true;
}
}
return false;
}
I'm looking for feedback on this code and more efficient ways to perform this task are also welcome.
if(str.find_first_not_of(' ') != std::string::npos)
{
// There's a non-space.
}
In C++11, the all_of algorithm can be employed:
// Check if s consists only of whitespaces
bool whiteSpacesOnly = std::all_of(s.begin(), s.end(), [](unsigned char c) { return std::isspace(c) != 0; });
Why so much work, so much typing?
bool has_only_spaces(const std::string& str) {
return str.find_first_not_of (' ') == str.npos;
}
Wouldn't it be easier to do:
bool has_only_spaces(const std::string &str)
{
for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
{
if (*it != ' ') return false;
}
return true;
}
This has the advantage of returning early as soon as a non-space character is found, so it will be marginally more efficient than solutions that examine the whole string.
To check if string has only whitespace in c++11:
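bool is_whitespace(const std::string& s) {
    // unsigned char parameter: isspace is undefined for negative char values
    return std::all_of(s.begin(), s.end(), [](unsigned char c) { return std::isspace(c) != 0; });
}
in pre-C++11:
bool is_whitespace(const std::string& s) {
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        if (!isspace(static_cast<unsigned char>(*it))) {
            return false;
        }
    }
    return true;
}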
Here's one that only uses STL (Requires C++11)
inline bool isBlank(const std::string& s)
{
return std::all_of(s.cbegin(), s.cend(), [](unsigned char c) { return std::isspace(c) != 0; });
}
It relies on the fact that if the string is empty (begin == end), std::all_of returns true.
Here is a small test program: http://cpp.sh/2tx6
Using strtok like that is bad style! strtok modifies the buffer it tokenizes (it replaces the delimiter chars with \0).
Here's a non modifying version.
const char* p = str.c_str();
while(*p == ' ') ++p;
return *p != 0; // true if a non-space character was found (same sense as the question's code)
It can be optimized even further, if you iterate through it in machine word chunks. To be portable, you would also have to take alignment into consideration.
I do not approve of your const_cast above or of the use of strtok.
A std::string can contain embedded nulls, but let's assume it will be all ASCII 32 (space) characters before you hit the null terminator.
One way you can approach this is with a simple loop; I will assume a const char *.
bool all_spaces( const char * v )
{
for ( ; *v; ++v )
{
if( *v != ' ' )
return false;
}
return true;
}
For larger strings, you can check a machine word at a time, comparing each 32-bit word (say) against 0x20202020 (four spaces), and then check the few leftover bytes at the end individually. This may be faster.
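Here is a sketch of the word-at-a-time idea, assuming 32-bit chunks. memcpy is used for the loads, so alignment and strict aliasing are not an issue. Because all four bytes of the pattern are 0x20, the comparison does not depend on byte order. all_spaces_wordwise is a hypothetical name. Unlike the loop above, it works on the std::string's full size, so an embedded '\0' counts as a non-space. Whether it actually beats the simple loop depends on the compiler, which may already vectorize that loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Returns true if every character of s is ' ' (an empty string counts as all spaces).
bool all_spaces_wordwise(const std::string& s)
{
    const char* p = s.data();
    std::size_t n = s.size();
    const std::uint32_t pattern = 0x20202020u; // four ' ' bytes
    // Compare 4 bytes at a time.
    while (n >= sizeof pattern) {
        std::uint32_t word;
        std::memcpy(&word, p, sizeof word);
        if (word != pattern)
            return false;
        p += sizeof word;
        n -= sizeof word;
    }
    // Check the 0-3 leftover bytes one at a time.
    for (; n > 0; --n, ++p)
        if (*p != ' ')
            return false;
    return true;
}
```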
Something like this (pre-C++11 style; std::bind2nd was removed in C++17, where a lambda does the same job):
return std::find_if(
str.begin(), str.end(),
std::bind2nd( std::not_equal_to<char>(), ' ' ) )
== str.end();
If you're interested in white space, and not just the space character,
then the best thing to do is to define a predicate, and use it:
struct IsNotSpace
{
bool operator()( char ch ) const
{
return ! std::isspace( static_cast<unsigned char>( ch ) );
}
};
If you're doing any text processing at all, a collection of such simple
predicates will be invaluable (and they're easy to generate
automatically from the list of functions in <ctype.h>).
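Here is a sketch of how such a predicate is used. is_all_whitespace is a hypothetical helper, and the predicate calls std::isspace from <cctype>.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True for any character that is not whitespace in the current C locale.
struct IsNotSpace
{
    bool operator()(char ch) const
    {
        return !std::isspace(static_cast<unsigned char>(ch));
    }
};

// A string is all whitespace when no character satisfies IsNotSpace.
bool is_all_whitespace(const std::string& str)
{
    return std::find_if(str.begin(), str.end(), IsNotSpace()) == str.end();
}
```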
It's highly unlikely you'll beat a compiler-optimized naive algorithm for this, e.g.
string::iterator it(str.begin()), end(str.end());
for(; it != end && *it == ' '; ++it);
return it == end;
EDIT: Actually - there is a quicker way (depending on size of string and memory available)..
std::string ns(str.size(), ' ');
return ns == str;
EDIT: actually above is not quick.. it's daft... stick with the naive implementation, the optimizer will be all over that...
EDIT AGAIN: dammit, I guess it's better to look at the functions in std::string
return str.find_first_not_of(' ') == string::npos;
I had a similar problem in a programming assignment, and here is another solution I came up with after reviewing others. I simply build a new sentence without the extra spaces: where there are several whitespace characters in a row, only one is kept.
string sentence;
string newsent; // reconstructed sentence
getline(cin, sentence);
string::size_type len = sentence.length();
for (string::size_type i = 0; i < len; i++) {
    // In a run of whitespace, skip ahead to its last character so that exactly one is kept.
    while (i + 1 < len && isspace((unsigned char)sentence[i]) && isspace((unsigned char)sentence[i + 1]))
        i++;
    newsent += sentence[i];
}
cout << newsent << "\n";
Hm... I'd do this:
for (auto i = str.begin(); i != str.end(); ++i)
    if (!isspace(static_cast<unsigned char>(*i)))
        return false;
return true;
isspace is declared in <cctype> for C++.
Edit: Thanks to James for pointing out that isspace has undefined behavior for negative char values, hence the cast to unsigned char.
If you are using CString, you can do
CString myString = " "; // All whitespace
if(myString.Trim().IsEmpty())
{
// string is all whitespace
}
This has the benefit of trimming all newline, space and tab characters.

Separating alphabetic characters in C++ STL

I've been practicing C++ for a competition next week. The sample problem I've been working on requires splitting paragraphs into words. Of course, that's easy. But this problem is so weird that words like isn't should be separated as well: isn and t. I know it's weird, but I have to follow this.
I have a function split() that takes a constant char delimiter as one of its parameters. It's what I use to separate words at spaces. But I can't figure out this one. Even mixed tokens like phil67bs should be separated as phil and bs.
And no, I don't ask for full code. A pseudocode will do, or something that will help me understand what to do. Thanks!
PS: Please no recommendations for external libs. Just the STL. :)
Filter out numbers, spaces and anything else that isn't a letter by using a proper locale. See this SO thread about treating everything but numbers as a whitespace. So use a mask and do something similar to what Jerry Coffin suggests but only for letters:
struct alphabet_only: std::ctype<char>
{
alphabet_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
std::fill(&rc['A'], &rc['['], std::ctype_base::upper);
std::fill(&rc['a'], &rc['{'], std::ctype_base::lower);
return &rc[0];
}
};
And, boom! You're golden.
Or... you could just do a transform:
char changeToLetters(char input){ return isalpha(static_cast<unsigned char>(input)) ? input : ' '; }
vector<char> output;
output.reserve( myVector.size() );
transform( myVector.begin(), myVector.end(), back_inserter(output), changeToLetters );
Which, um, is much easier to grok, just not as efficient as Jerry's idea.
Edit:
Changed 'Z' to '[' so that the value 'Z' is filled. Likewise with 'z' to '{'.
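Here is a sketch of how the facet is put to work, assuming the goal from the question: split on anything that is not an ASCII letter. The facet is repeated so the block stands alone, and letter_words is a hypothetical name. Reading with operator>> through a stream imbued with this locale treats every non-letter as whitespace.

```cpp
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// ctype facet that classifies everything except A-Z and a-z as whitespace.
struct alphabet_only : std::ctype<char>
{
    alphabet_only() : std::ctype<char>(get_table()) {}
    static const std::ctype_base::mask* get_table()
    {
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        std::fill(&rc['A'], &rc['['], std::ctype_base::upper);
        std::fill(&rc['a'], &rc['{'], std::ctype_base::lower);
        return &rc[0];
    }
};

// Splits text into runs of letters by reading it through the custom locale.
std::vector<std::string> letter_words(const std::string& text)
{
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new alphabet_only)); // the locale takes ownership
    std::vector<std::string> words;
    std::string w;
    while (in >> w)
        words.push_back(w);
    return words;
}
```

With this facet, letter_words("isn't phil67bs") yields isn, t, phil and bs.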
This sounds like a perfect job for the find_first_of function which finds the first occurrence of a set of characters. You can use this to look for arbitrary stop characters and generate words from the spaces between such stop characters.
Roughly:
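size_t previous = 0;
for (;;) {
    size_t next = str.find_first_of(" '1234567890", previous);
    // The word runs from previous up to next; if next is npos, substr clamps to the end.
    std::string word = str.substr(previous, next - previous);
    // Do processing (adjacent stop characters produce empty words you may want to skip)
    if (next == std::string::npos)
        break;
    previous = next + 1;
}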
Just change your function to delimit on anything that isn't an alphabetic character. Is there anything in particular that you are having trouble with?
Break down the problem: first, write a function that gets the first "word" from the sentence. This is easy; just look for the first non-alphabetic character. The next step is to remove all leading non-alphabetic characters from the remaining string. From there, just repeat.
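The steps above can be sketched as follows. split_letters is a hypothetical name, and "alphabetic" here means std::isalpha in the default "C" locale.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits s into maximal runs of alphabetic characters, e.g. "isn't" -> {"isn", "t"}.
std::vector<std::string> split_letters(const std::string& s)
{
    auto is_letter = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };
    std::vector<std::string> words;
    std::string::size_type i = 0;
    while (i < s.size()) {
        // Skip leading non-alphabetic characters.
        while (i < s.size() && !is_letter(s[i]))
            ++i;
        // The next word runs up to the next non-alphabetic character.
        std::string::size_type start = i;
        while (i < s.size() && is_letter(s[i]))
            ++i;
        if (i > start)
            words.push_back(s.substr(start, i - start));
    }
    return words;
}
```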
You can do something like this:
vector<string> split(const string& str)
{
vector<string> splits;
string cur;
for(std::string::size_type i = 0; i < str.size(); ++i)
{
if(!isalpha(static_cast<unsigned char>(str[i]))) // any non-letter ends the current word
{
if(!cur.empty())
{
splits.push_back(cur);
}
cur="";
}
else
{
cur += str[i];
}
}
if(! cur.empty())
{
splits.push_back(cur);
}
return splits;
}
Let's assume that the input is in a std::string (use std::getline(cin, line), for example, to read a full line from cin):
std::vector<std::string> split(std::string const& input)
{
    std::string::const_iterator it(input.begin()), end(input.end());
    std::string current;
    std::vector<std::string> words;
    for(; it != end; ++it)
    {
        if (isalpha(static_cast<unsigned char>(*it)))
        {
            current.push_back(*it); // add this char to the current word
        }
        else if (!current.empty())
        {
            // push the current word into the result list
            words.push_back(current);
            current.clear(); // next word
        }
    }
    if (!current.empty())
        words.push_back(current); // the last word has no delimiter after it
    return words;
}
I've not tested it, but I guess it ought to work...