Related
I write this split function, can't find easy way to split by string_view(several chars).
My function:
size_t split(std::vector<std::string_view>& result, std::string_view in, char sep) {
result.reserve(std::count(in.begin(), in.end(), in.find(sep) != std::string::npos) + 1);
for (auto pfirst = in.begin();; ++pfirst) {
auto pbefore = pfirst;
pfirst = std::find(pfirst, in.end(), sep);
result.emplace_back(q, pfirst-pbefore);
if (pfirst == in.end())
return result.size();
}
}
I want to call this split function with string_view separator. For example:
str = "apple, phone, bread\n keyboard, computer"
split(result, str, "\n,")
Result:['apple', 'phone', 'bread', 'keyboard', 'computer']
My question is, how can i implement this function as fast as possible?
First, you are using std::count() incorrectly.
Second, std::string_view has its own find_first_of() and substr() methods, which you can use in this situation, instead of using iterators. find_first_of() allows you to specify multiple characters to search for.
Try something more like this:
size_t split(std::vector<std::string_view>& result, std::string_view in, std::string_view seps) {
result.reserve(std::count_if(in.begin(), in.end(), [&](char ch){ return seps.find(ch) != std::string_view::npos; }) + 1);
std::string_view::size_type start = 0, end;
while ((end = in.find_first_of(seps, start)) != std::string_view::npos) {
result.push_back(in.substr(start, end-start));
start = in.find_first_not_of(' ', end+1);
}
if (start != std::string_view::npos)
result.push_back(in.substr(start));
return result.size();
}
Online Demo
This is my take on splitting a string view, just loops once over all the characters in the string view and returns a vector of string_views (so no copying of data)
The calling code can still use words.size() to get the size if needed.
(I use C++20 std::set contains function)
Live demo here : https://onlinegdb.com/tHfPIeo1iM
#include <iostream>
#include <set>
#include <string_view>
#include <vector>
auto split(const std::string_view& string, const std::set<char>& separators)
{
std::vector<std::string_view> words;
auto word_begin{ string.data() };
std::size_t word_len{ 0ul };
for (const auto& c : string)
{
if (!separators.contains(c))
{
word_len++;
}
else
{
// we found a word and not a seperator repeat
if (word_len > 0)
{
words.emplace_back(word_begin, word_len);
word_begin += word_len;
word_len = 0;
}
word_begin++;
}
}
// string_view doesn't have a trailing zero so
// also no trailing separator so if there is still
// a word in the "pipeline" add it too
if (word_len > 0)
{
words.emplace_back(word_begin, word_len);
}
return words;
}
int main()
{
std::set<char> seperators{ ' ', ',', '.', '!', '\n' };
auto words = split("apple, phone, bread\n keyboard, computer", seperators);
bool comma = false;
std::cout << "[";
for (const auto& word : words)
{
if (comma) std::cout << ", ";
std::cout << word;
comma = true;
}
std::cout << "]\n";
return 0;
}
I do not know about performance, but this code seems a lot simpler
std::vector<std::string> ParseDelimited(
const std::string &l, char delim )
{
std::vector<std::string> token;
std::stringstream sst(l);
std::string a;
while (getline(sst, a, delim))
token.push_back(a);
return token;
}
Given a string:
std::string str{"This i_s A stRIng"};
Is it possible to transform it to lowercase and remove all non-alpha characters in a single pass?
Expected result:
this is a string
I know you can use std::transform(..., ::tolower) and string.erase(remove_if()) combination to make two passes, or it can be done manually by iterating each character, but is there a way to do something that would combine the std::transform and erase calls without having to run through the string multiple times?
C++20 ranges allow algorithms to be combined, in a one-pass fashion.
std::string str{"This i_s A stRIng"};
std::string out;
auto is_alpha_or_space = [](unsigned char c){ return isalpha(c) || isspace(c); };
auto safe_tolower = [](unsigned char c){ return tolower(c); };
std::ranges::copy( str
| std::views::filter(is_alpha_or_space)
| std::views::transform(safe_tolower)
, std::back_inserter(out));
See it on Compiler Explorer
First let me notice that you seem to want to filter alphabetic characters or spaces, that is, characters l for which std::isalpha(l) || std::isspace(l) returns true.
Assuming this, you can achieve what you want using std::accumulate
str = std::accumulate(str.begin(), str.end(), std::string{},
[](const std::string& s, const auto& l) {
if (std::isalpha(l, std::locale()) || std::isspace(l, std::locale()))
return s + std::tolower(l, std::locale());
else
return s;
});
See it Live on Coliru.
I know you can ... do it manually by iterating each character ...
That is exactly would you would have to do, eg:
std::string str{"This i_s A stRIng"};
std::string::size_type pos = 0;
while (pos < str.size())
{
unsigned char ch = str[pos];
if (!::isalpha(ch))
{
if (!::isspace(ch))
{
str.erase(pos, 1);
continue;
}
}
else
{
str[pos] = (char) ::tolower(ch);
}
++pos;
}
Or:
std::string str{"This i_s A stRIng"};
auto iter = str.begin();
while (iter != str.end())
{
unsigned char ch = *iter;
if (!::isalpha(ch))
{
if (!::isspace(ch))
{
iter = str.erase(iter);
continue;
}
}
else
{
*iter = (char) ::tolower(ch);
}
++iter;
}
but is there a way to do something that would combine the std::transform and erase calls without having to run through the string multiple times?
You can use the standard std::accumulate() algorithm, as shown in francesco's answer. Although, that will not manipulate the std::string in-place, as the code above does. It will create a new std::string instead (and will do so on each iteration, for that matter).
Otherwise, you could use C++20 ranges, ie by combining std::views::filter() with std::views::transform(), eg (I'm not familiar with the <ranges> library, so this syntax might need tweaking):
#include <ranges>
auto alphaOrSpace = [](unsigned char ch){ return ::isalpha(ch) || ::isspace(ch); }
auto lowercase = [](unsigned char ch){ return ::tolower(ch); };
std::string str{"This i_s A stRIng"};
str = str | std::views::filter(alphaOrSpace) | std::views::transform(lowercase);
But, this would actually be a multi-pass solution, just coded into a single operation.
I'm currently using the following code to right-trim all the std::strings in my programs:
std::string s;
s.erase(s.find_last_not_of(" \n\r\t")+1);
It works fine, but I wonder if there are some end-cases where it might fail?
Of course, answers with elegant alternatives and also left-trim solution are welcome.
EDIT Since c++17, some parts of the standard library were removed. Fortunately, starting with c++11, we have lambdas which are a superior solution.
#include <algorithm>
#include <cctype>
#include <locale>
// trim from start (in place)
static inline void ltrim(std::string &s) {
s.erase(s.begin(), std::find_if(s.begin(), s.end(), [](unsigned char ch) {
return !std::isspace(ch);
}));
}
// trim from end (in place)
static inline void rtrim(std::string &s) {
s.erase(std::find_if(s.rbegin(), s.rend(), [](unsigned char ch) {
return !std::isspace(ch);
}).base(), s.end());
}
// trim from both ends (in place)
static inline void trim(std::string &s) {
rtrim(s);
ltrim(s);
}
// trim from start (copying)
static inline std::string ltrim_copy(std::string s) {
ltrim(s);
return s;
}
// trim from end (copying)
static inline std::string rtrim_copy(std::string s) {
rtrim(s);
return s;
}
// trim from both ends (copying)
static inline std::string trim_copy(std::string s) {
trim(s);
return s;
}
Thanks to https://stackoverflow.com/a/44973498/524503 for bringing up the modern solution.
Original answer:
I tend to use one of these 3 for my trimming needs:
#include <algorithm>
#include <functional>
#include <cctype>
#include <locale>
// trim from start
static inline std::string <rim(std::string &s) {
s.erase(s.begin(), std::find_if(s.begin(), s.end(),
std::not1(std::ptr_fun<int, int>(std::isspace))));
return s;
}
// trim from end
static inline std::string &rtrim(std::string &s) {
s.erase(std::find_if(s.rbegin(), s.rend(),
std::not1(std::ptr_fun<int, int>(std::isspace))).base(), s.end());
return s;
}
// trim from both ends
static inline std::string &trim(std::string &s) {
return ltrim(rtrim(s));
}
They are fairly self-explanatory and work very well.
EDIT: BTW, I have std::ptr_fun in there to help disambiguate std::isspace because there is actually a second definition which supports locales. This could have been a cast just the same, but I tend to like this better.
EDIT: To address some comments about accepting a parameter by reference, modifying and returning it. I Agree. An implementation that I would likely prefer would be two sets of functions, one for in place and one which makes a copy. A better set of examples would be:
#include <algorithm>
#include <functional>
#include <cctype>
#include <locale>
// trim from start (in place)
static inline void ltrim(std::string &s) {
s.erase(s.begin(), std::find_if(s.begin(), s.end(),
std::not1(std::ptr_fun<int, int>(std::isspace))));
}
// trim from end (in place)
static inline void rtrim(std::string &s) {
s.erase(std::find_if(s.rbegin(), s.rend(),
std::not1(std::ptr_fun<int, int>(std::isspace))).base(), s.end());
}
// trim from both ends (in place)
static inline void trim(std::string &s) {
rtrim(s);
ltrim(s);
}
// trim from start (copying)
static inline std::string ltrim_copy(std::string s) {
ltrim(s);
return s;
}
// trim from end (copying)
static inline std::string rtrim_copy(std::string s) {
rtrim(s);
return s;
}
// trim from both ends (copying)
static inline std::string trim_copy(std::string s) {
trim(s);
return s;
}
I am keeping the original answer above though for context and in the interest of keeping the high voted answer still available.
Using Boost's string algorithms would be easiest:
#include <boost/algorithm/string.hpp>
std::string str("hello world! ");
boost::trim_right(str);
str is now "hello world!". There's also trim_left and trim, which trims both sides.
If you add _copy suffix to any of above function names e.g. trim_copy, the function will return a trimmed copy of the string instead of modifying it through a reference.
If you add _if suffix to any of above function names e.g. trim_copy_if, you can trim all characters satisfying your custom predicate, as opposed to just whitespaces.
What you are doing is fine and robust. I have used the same method for a long time and I have yet to find a faster method:
const char* ws = " \t\n\r\f\v";
// trim from end of string (right)
inline std::string& rtrim(std::string& s, const char* t = ws)
{
s.erase(s.find_last_not_of(t) + 1);
return s;
}
// trim from beginning of string (left)
inline std::string& ltrim(std::string& s, const char* t = ws)
{
s.erase(0, s.find_first_not_of(t));
return s;
}
// trim from both ends of string (right then left)
inline std::string& trim(std::string& s, const char* t = ws)
{
return ltrim(rtrim(s, t), t);
}
By supplying the characters to be trimmed you have the flexibility to trim non-whitespace characters and the efficiency to trim only the characters you want trimmed.
Use the following code to right trim (trailing) spaces and tab characters from std::strings (ideone):
// trim trailing spaces
size_t endpos = str.find_last_not_of(" \t");
size_t startpos = str.find_first_not_of(" \t");
if( std::string::npos != endpos )
{
str = str.substr( 0, endpos+1 );
str = str.substr( startpos );
}
else {
str.erase(std::remove(std::begin(str), std::end(str), ' '), std::end(str));
}
And just to balance things out, I'll include the left trim code too (ideone):
// trim leading spaces
size_t startpos = str.find_first_not_of(" \t");
if( string::npos != startpos )
{
str = str.substr( startpos );
}
Try this, it works for me.
inline std::string trim(std::string& str)
{
str.erase(str.find_last_not_of(' ')+1); //suffixing spaces
str.erase(0, str.find_first_not_of(' ')); //prefixing spaces
return str;
}
Bit late to the party, but never mind. Now C++11 is here, we have lambdas and auto variables. So my version, which also handles all-whitespace and empty strings, is:
#include <cctype>
#include <string>
#include <algorithm>
inline std::string trim(const std::string &s)
{
auto wsfront=std::find_if_not(s.begin(),s.end(),[](int c){return std::isspace(c);});
auto wsback=std::find_if_not(s.rbegin(),s.rend(),[](int c){return std::isspace(c);}).base();
return (wsback<=wsfront ? std::string() : std::string(wsfront,wsback));
}
We could make a reverse iterator from wsfront and use that as the termination condition in the second find_if_not but that's only useful in the case of an all-whitespace string, and gcc 4.8 at least isn't smart enough to infer the type of the reverse iterator (std::string::const_reverse_iterator) with auto. I don't know how expensive constructing a reverse iterator is, so YMMV here. With this alteration, the code looks like this:
inline std::string trim(const std::string &s)
{
auto wsfront=std::find_if_not(s.begin(),s.end(),[](int c){return std::isspace(c);});
return std::string(wsfront,std::find_if_not(s.rbegin(),std::string::const_reverse_iterator(wsfront),[](int c){return std::isspace(c);}).base());
}
http://ideone.com/nFVtEo
std::string trim(const std::string &s)
{
std::string::const_iterator it = s.begin();
while (it != s.end() && isspace(*it))
it++;
std::string::const_reverse_iterator rit = s.rbegin();
while (rit.base() != it && isspace(*rit))
rit++;
return std::string(it, rit.base());
}
I like tzaman's solution, the only problem with it is that it doesn't trim a string containing only spaces.
To correct that 1 flaw, add a str.clear() in between the 2 trimmer lines
std::stringstream trimmer;
trimmer << str;
str.clear();
trimmer >> str;
With C++17 you can use basic_string_view::remove_prefix and basic_string_view::remove_suffix:
std::string_view trim(std::string_view s)
{
s.remove_prefix(std::min(s.find_first_not_of(" \t\r\v\n"), s.size()));
s.remove_suffix(std::min(s.size() - s.find_last_not_of(" \t\r\v\n") - 1, s.size()));
return s;
}
A nice alternative:
std::string_view ltrim(std::string_view s)
{
s.remove_prefix(std::distance(s.cbegin(), std::find_if(s.cbegin(), s.cend(),
[](int c) {return !std::isspace(c);})));
return s;
}
std::string_view rtrim(std::string_view s)
{
s.remove_suffix(std::distance(s.crbegin(), std::find_if(s.crbegin(), s.crend(),
[](int c) {return !std::isspace(c);})));
return s;
}
std::string_view trim(std::string_view s)
{
return ltrim(rtrim(s));
}
In the case of an empty string, your code assumes that adding 1 to string::npos gives 0. string::npos is of type string::size_type, which is unsigned. Thus, you are relying on the overflow behaviour of addition.
s.erase(0, s.find_first_not_of(" \n\r\t"));
s.erase(s.find_last_not_of(" \n\r\t")+1);
Hacked off of Cplusplus.com
std::string choppa(const std::string &t, const std::string &ws)
{
std::string str = t;
size_t found;
found = str.find_last_not_of(ws);
if (found != std::string::npos)
str.erase(found+1);
else
str.clear(); // str is all whitespace
return str;
}
This works for the null case as well. :-)
My solution based on the answer by #Bill the Lizard.
Note that these functions will return the empty string if the input string contains nothing but whitespace.
const std::string StringUtils::WHITESPACE = " \n\r\t";
std::string StringUtils::Trim(const std::string& s)
{
return TrimRight(TrimLeft(s));
}
std::string StringUtils::TrimLeft(const std::string& s)
{
size_t startpos = s.find_first_not_of(StringUtils::WHITESPACE);
return (startpos == std::string::npos) ? "" : s.substr(startpos);
}
std::string StringUtils::TrimRight(const std::string& s)
{
size_t endpos = s.find_last_not_of(StringUtils::WHITESPACE);
return (endpos == std::string::npos) ? "" : s.substr(0, endpos+1);
}
str.erase(0, str.find_first_not_of("\t\n\v\f\r ")); // left trim
str.erase(str.find_last_not_of("\t\n\v\f\r ") + 1); // right trim
Try it online!
With C++11 also came a regular expression module, which of course can be used to trim leading or trailing spaces.
Maybe something like this:
std::string ltrim(const std::string& s)
{
static const std::regex lws{"^[[:space:]]*", std::regex_constants::extended};
return std::regex_replace(s, lws, "");
}
std::string rtrim(const std::string& s)
{
static const std::regex tws{"[[:space:]]*$", std::regex_constants::extended};
return std::regex_replace(s, tws, "");
}
std::string trim(const std::string& s)
{
return ltrim(rtrim(s));
}
My answer is an improvement upon the top answer for this post that trims control characters as well as spaces (0-32 and 127 on the ASCII table).
std::isgraph determines if a character has a graphical representation, so you can use this to alter Evan's answer to remove any character that doesn't have a graphical representation from either side of a string. The result is a much more elegant solution:
#include <algorithm>
#include <functional>
#include <string>
/**
* #brief Left Trim
*
* Trims whitespace from the left end of the provided std::string
*
* #param[out] s The std::string to trim
*
* #return The modified std::string&
*/
std::string& ltrim(std::string& s) {
s.erase(s.begin(), std::find_if(s.begin(), s.end(),
std::ptr_fun<int, int>(std::isgraph)));
return s;
}
/**
* #brief Right Trim
*
* Trims whitespace from the right end of the provided std::string
*
* #param[out] s The std::string to trim
*
* #return The modified std::string&
*/
std::string& rtrim(std::string& s) {
s.erase(std::find_if(s.rbegin(), s.rend(),
std::ptr_fun<int, int>(std::isgraph)).base(), s.end());
return s;
}
/**
* #brief Trim
*
* Trims whitespace from both ends of the provided std::string
*
* #param[out] s The std::string to trim
*
* #return The modified std::string&
*/
std::string& trim(std::string& s) {
return ltrim(rtrim(s));
}
Note: Alternatively you should be able to use std::iswgraph if you need support for wide characters, but you will also have to edit this code to enable std::wstring manipulation, which is something that I haven't tested (see the reference page for std::basic_string to explore this option).
This is what I use. Just keep removing space from the front, and then, if there's anything left, do the same from the back.
void trim(string& s) {
while(s.compare(0,1," ")==0)
s.erase(s.begin()); // remove leading whitespaces
while(s.size()>0 && s.compare(s.size()-1,1," ")==0)
s.erase(s.end()-1); // remove trailing whitespaces
}
An elegant way of doing it can be like
std::string & trim(std::string & str)
{
return ltrim(rtrim(str));
}
And the supportive functions are implemented as:
std::string & ltrim(std::string & str)
{
auto it = std::find_if( str.begin() , str.end() , [](char ch){ return !std::isspace<char>(ch , std::locale::classic() ) ; } );
str.erase( str.begin() , it);
return str;
}
std::string & rtrim(std::string & str)
{
auto it = std::find_if( str.rbegin() , str.rend() , [](char ch){ return !std::isspace<char>(ch , std::locale::classic() ) ; } );
str.erase( it.base() , str.end() );
return str;
}
And once you've all these in place, you can write this as well:
std::string trim_copy(std::string const & str)
{
auto s = str;
return ltrim(rtrim(s));
}
Here is a solution for trim with regex
#include <string>
#include <regex>
string trim(string str){
return regex_replace(str, regex("(^[ ]+)|([ ]+$)"),"");
}
I guess if you start asking for the "best way" to trim a string, I'd say a good implementation would be one that:
Doesn't allocate temporary strings
Has overloads for in-place trim and copy trim
Can be easily customized to accept different validation sequences / logic
Obviously there are too many different ways to approach this and it definitely depends on what you actually need. However, the C standard library still has some very useful functions in <string.h>, like memchr. There's a reason why C is still regarded as the best language for IO - its stdlib is pure efficiency.
inline const char* trim_start(const char* str)
{
while (memchr(" \t\n\r", *str, 4)) ++str;
return str;
}
inline const char* trim_end(const char* end)
{
while (memchr(" \t\n\r", end[-1], 4)) --end;
return end;
}
inline std::string trim(const char* buffer, int len) // trim a buffer (input?)
{
return std::string(trim_start(buffer), trim_end(buffer + len));
}
inline void trim_inplace(std::string& str)
{
str.assign(trim_start(str.c_str()),
trim_end(str.c_str() + str.length()));
}
int main()
{
char str [] = "\t \nhello\r \t \n";
string trimmed = trim(str, strlen(str));
cout << "'" << trimmed << "'" << endl;
system("pause");
return 0;
}
For what it's worth, here is a trim implementation with an eye towards performance. It's much quicker than many other trim routines I've seen around. Instead of using iterators and std::finds, it uses raw c strings and indices. It optimizes the following special cases: size 0 string (do nothing), string with no whitespace to trim (do nothing), string with only trailing whitespace to trim (just resize the string), string that's entirely whitespace (just clear the string). And finally, in the worst case (string with leading whitespace), it does its best to perform an efficient copy construction, performing only 1 copy and then moving that copy in place of the original string.
void TrimString(std::string & str)
{
if(str.empty())
return;
const auto pStr = str.c_str();
size_t front = 0;
while(front < str.length() && std::isspace(int(pStr[front]))) {++front;}
size_t back = str.length();
while(back > front && std::isspace(int(pStr[back-1]))) {--back;}
if(0 == front)
{
if(back < str.length())
{
str.resize(back - front);
}
}
else if(back <= front)
{
str.clear();
}
else
{
str = std::move(std::string(str.begin()+front, str.begin()+back));
}
}
Trim C++11 implementation:
static void trim(std::string &s) {
s.erase(s.begin(), std::find_if_not(s.begin(), s.end(), [](char c){ return std::isspace(c); }));
s.erase(std::find_if_not(s.rbegin(), s.rend(), [](char c){ return std::isspace(c); }).base(), s.end());
}
Contributing my solution to the noise. trim defaults to creating a new string and returning the modified one while trim_in_place modifies the string passed to it. The trim function supports c++11 move semantics.
#include <string>
// modifies input string, returns input
std::string& trim_left_in_place(std::string& str) {
size_t i = 0;
while(i < str.size() && isspace(str[i])) { ++i; };
return str.erase(0, i);
}
std::string& trim_right_in_place(std::string& str) {
size_t i = str.size();
while(i > 0 && isspace(str[i - 1])) { --i; };
return str.erase(i, str.size());
}
std::string& trim_in_place(std::string& str) {
return trim_left_in_place(trim_right_in_place(str));
}
// returns newly created strings
std::string trim_right(std::string str) {
return trim_right_in_place(str);
}
std::string trim_left(std::string str) {
return trim_left_in_place(str);
}
std::string trim(std::string str) {
return trim_left_in_place(trim_right_in_place(str));
}
#include <cassert>
int main() {
std::string s1(" \t\r\n ");
std::string s2(" \r\nc");
std::string s3("c \t");
std::string s4(" \rc ");
assert(trim(s1) == "");
assert(trim(s2) == "c");
assert(trim(s3) == "c");
assert(trim(s4) == "c");
assert(s1 == " \t\r\n ");
assert(s2 == " \r\nc");
assert(s3 == "c \t");
assert(s4 == " \rc ");
assert(trim_in_place(s1) == "");
assert(trim_in_place(s2) == "c");
assert(trim_in_place(s3) == "c");
assert(trim_in_place(s4) == "c");
assert(s1 == "");
assert(s2 == "c");
assert(s3 == "c");
assert(s4 == "c");
}
This can be done more simply in C++11 due to the addition of back() and pop_back().
while ( !s.empty() && isspace(s.back()) ) s.pop_back();
Here's what I came up with:
std::stringstream trimmer;
trimmer << str;
trimmer >> str;
Stream extraction eliminates whitespace automatically, so this works like a charm.
Pretty clean and elegant too, if I do say so myself. ;)
I'm not sure if your environment is the same, but in mine, the empty string case will cause the program to abort. I would either wrap that erase call with an if(!s.empty()) or use Boost as already mentioned.
Here is my version:
size_t beg = s.find_first_not_of(" \r\n");
return (beg == string::npos) ? "" : in.substr(beg, s.find_last_not_of(" \r\n") - beg);
Here's a solution easy to understand for beginners not used to write std:: everywhere and not yet familiar with const-correctness, iterators, STL algorithms, etc...
#include <string>
#include <cctype> // for isspace
using namespace std;
// Left trim the given string (" hello! " --> "hello! ")
string left_trim(string str) {
int numStartSpaces = 0;
for (int i = 0; i < str.length(); i++) {
if (!isspace(str[i])) break;
numStartSpaces++;
}
return str.substr(numStartSpaces);
}
// Right trim the given string (" hello! " --> " hello!")
string right_trim(string str) {
int numEndSpaces = 0;
for (int i = str.length() - 1; i >= 0; i--) {
if (!isspace(str[i])) break;
numEndSpaces++;
}
return str.substr(0, str.length() - numEndSpaces);
}
// Left and right trim the given string (" hello! " --> "hello!")
string trim(string str) {
return right_trim(left_trim(str));
}
Hope it helps...
The above methods are great, but sometimes you want to use a combination of functions for what your routine considers to be whitespace. In this case, using functors to combine operations can get messy so I prefer a simple loop I can modify for the trim. Here is a slightly modified trim function copied from the C version here on SO. In this example, I am trimming non alphanumeric characters.
string trim(char const *str)
{
// Trim leading non-letters
while(!isalnum(*str)) str++;
// Trim trailing non-letters
end = str + strlen(str) - 1;
while(end > str && !isalnum(*end)) end--;
return string(str, end+1);
}
What about this...?
#include <iostream>
#include <string>
#include <regex>
std::string ltrim( std::string str ) {
return std::regex_replace( str, std::regex("^\\s+"), std::string("") );
}
std::string rtrim( std::string str ) {
return std::regex_replace( str, std::regex("\\s+$"), std::string("") );
}
std::string trim( std::string str ) {
return ltrim( rtrim( str ) );
}
int main() {
std::string str = " \t this is a test string \n ";
std::cout << "-" << trim( str ) << "-\n";
return 0;
}
Note: I'm still relatively new to C++, so please forgive me if I'm off base here.
This function, vec2string takes a vector of char's and converts to a hex string representation but with a blank space between each byte value. Just a formatting requirement in my app. Can anyone think of a way to remove the need for that.
std::string& vec2string(const std::vector<char>& vec, std::string& s) {
static const char hex_lookup[] = "0123456789ABCDEF";
for(std::vector<char>::const_iterator it = vec.begin(); it != vec.end(); ++it) {
s.append(1, hex_lookup[(*it >> 4) & 0xf]);
s.append(1, hex_lookup[*it & 0xf]);
s.append(1, ' ');
}
//remove very last space - I would ideally like to remove this***
if(!s.empty())
s.erase(s.size()-1);
return s;
}
std::string& vec2string(const std::vector<char>& vec, std::string& s) {
static const char hex_lookup[] = "0123456789ABCDEF";
if (vec.empty())
return s;
s.reserve(s.size() + vec.size() * 3 - 1);
std::vector<char>::const_iterator it = vec.begin();
s.append(1, hex_lookup[(*it >> 4) & 0xf]);
s.append(1, hex_lookup[*it & 0xf]);
for(++it; it != vec.end(); ++it) {
s.append(1, ' ');
s.append(1, hex_lookup[(*it >> 4) & 0xf]);
s.append(1, hex_lookup[*it & 0xf]);
}
return s;
}
You could add a check in the loop, before appending the characters, if the string is not empty and then add a space. Like:
for (...)
{
if (!s.empty())
s += ' ';
...
}
If you have Boost, use algorithm/string/join.hpp. Otherwise, you could try the Duff's Device approach:
string hex_str(const vector<char>& bytes)
{
string result;
if (!bytes.empty())
{
const char hex_lookup[] = "0123456789ABCDEF";
vector<char>::const_iterator it = bytes.begin();
goto skip;
do {
result += ' ';
skip:
result += hex_lookup[(*it >> 4) & 0xf];
result += hex_lookup[*it & 0xf];
} while (++it != bytes.end());
}
return result;
}
Use boost::trim documentation
boost::trim(your_string);
I'd start by separating the hex conversion into its own little function:
std::string to_hex(char in) {
static const char hex_lookup[] = "0123456789ABCDEF";
std::string s;
s.push_back(hex_lookup[(in>>4) & 0xf];
s.push_back(hex_lookup[in & 0xf];
return s;
}
Then I'd use std::transform to apply that to the whole vector, with my infix_ostream_iterator and a std::stringstream to put the pieces together.
#include <sstream>
#include <algorithm>
#include "infix_iterator"
std::string vec2string(const std::vector<char>& vec) {
std::stringstream s;
std::transform(vec.begin(), vec.end(),
infix_ostream_iterator<std::string>(s, " "),
to_hex);
return s.str();
}
Also note that rather than modifying an existing string, this creates and returns a new string. At least in my opinion, modifying an existing string is a poor idea -- simply producing a string is much cleaner and more modular. If the caller wants to combine the results into a longer string, that's fine, but it's better for the lower-level function to just do one thing cleanly, and let the higher level function decide what to do with the result.
I'm currently using the following code to right-trim all the std::strings in my programs:
std::string s;
s.erase(s.find_last_not_of(" \n\r\t")+1);
It works fine, but I wonder if there are some end-cases where it might fail?
Of course, answers with elegant alternatives and also left-trim solution are welcome.
EDIT Since c++17, some parts of the standard library were removed. Fortunately, starting with c++11, we have lambdas which are a superior solution.
#include <algorithm>
#include <cctype>
#include <locale>
// trim from start (in place)
static inline void ltrim(std::string &s) {
s.erase(s.begin(), std::find_if(s.begin(), s.end(), [](unsigned char ch) {
return !std::isspace(ch);
}));
}
// trim from end (in place)
static inline void rtrim(std::string &s) {
s.erase(std::find_if(s.rbegin(), s.rend(), [](unsigned char ch) {
return !std::isspace(ch);
}).base(), s.end());
}
// trim from both ends (in place)
static inline void trim(std::string &s) {
rtrim(s);
ltrim(s);
}
// trim from start (copying)
static inline std::string ltrim_copy(std::string s) {
ltrim(s);
return s;
}
// trim from end (copying)
static inline std::string rtrim_copy(std::string s) {
rtrim(s);
return s;
}
// trim from both ends (copying)
static inline std::string trim_copy(std::string s) {
trim(s);
return s;
}
Thanks to https://stackoverflow.com/a/44973498/524503 for bringing up the modern solution.
Original answer:
I tend to use one of these 3 for my trimming needs:
#include <algorithm>
#include <functional>
#include <cctype>
#include <locale>
// trim from start
static inline std::string <rim(std::string &s) {
s.erase(s.begin(), std::find_if(s.begin(), s.end(),
std::not1(std::ptr_fun<int, int>(std::isspace))));
return s;
}
// trim from end
static inline std::string &rtrim(std::string &s) {
s.erase(std::find_if(s.rbegin(), s.rend(),
std::not1(std::ptr_fun<int, int>(std::isspace))).base(), s.end());
return s;
}
// trim from both ends
static inline std::string &trim(std::string &s) {
return ltrim(rtrim(s));
}
They are fairly self-explanatory and work very well.
EDIT: BTW, I have std::ptr_fun in there to help disambiguate std::isspace because there is actually a second definition which supports locales. This could have been a cast just the same, but I tend to like this better.
EDIT: To address some comments about accepting a parameter by reference, modifying and returning it. I Agree. An implementation that I would likely prefer would be two sets of functions, one for in place and one which makes a copy. A better set of examples would be:
#include <algorithm>
#include <functional>
#include <cctype>
#include <locale>
// trim from start (in place)
static inline void ltrim(std::string &s) {
s.erase(s.begin(), std::find_if(s.begin(), s.end(),
std::not1(std::ptr_fun<int, int>(std::isspace))));
}
// trim from end (in place)
static inline void rtrim(std::string &s) {
s.erase(std::find_if(s.rbegin(), s.rend(),
std::not1(std::ptr_fun<int, int>(std::isspace))).base(), s.end());
}
// trim from both ends (in place)
static inline void trim(std::string &s) {
rtrim(s);
ltrim(s);
}
// trim from start (copying)
static inline std::string ltrim_copy(std::string s) {
ltrim(s);
return s;
}
// trim from end (copying)
static inline std::string rtrim_copy(std::string s) {
rtrim(s);
return s;
}
// trim from both ends (copying)
static inline std::string trim_copy(std::string s) {
trim(s);
return s;
}
I am keeping the original answer above though for context and in the interest of keeping the high voted answer still available.
Using Boost's string algorithms would be easiest:
#include <boost/algorithm/string.hpp>
std::string str("hello world! ");
boost::trim_right(str);
str is now "hello world!". There's also trim_left and trim, which trims both sides.
If you add _copy suffix to any of above function names e.g. trim_copy, the function will return a trimmed copy of the string instead of modifying it through a reference.
If you add _if suffix to any of above function names e.g. trim_copy_if, you can trim all characters satisfying your custom predicate, as opposed to just whitespaces.
What you are doing is fine and robust. I have used the same method for a long time and I have yet to find a faster method:
const char* ws = " \t\n\r\f\v";
// trim from end of string (right)
inline std::string& rtrim(std::string& s, const char* t = ws)
{
s.erase(s.find_last_not_of(t) + 1);
return s;
}
// trim from beginning of string (left)
inline std::string& ltrim(std::string& s, const char* t = ws)
{
s.erase(0, s.find_first_not_of(t));
return s;
}
// trim from both ends of string (right then left)
inline std::string& trim(std::string& s, const char* t = ws)
{
return ltrim(rtrim(s, t), t);
}
By supplying the characters to be trimmed you have the flexibility to trim non-whitespace characters and the efficiency to trim only the characters you want trimmed.
Use the following code to right trim (trailing) spaces and tab characters from std::strings (ideone):
// trim trailing spaces
size_t endpos = str.find_last_not_of(" \t");
size_t startpos = str.find_first_not_of(" \t");
if( std::string::npos != endpos )
{
str = str.substr( 0, endpos+1 );
str = str.substr( startpos );
}
else {
str.erase(std::remove(std::begin(str), std::end(str), ' '), std::end(str));
}
And just to balance things out, I'll include the left trim code too (ideone):
// trim leading spaces
size_t startpos = str.find_first_not_of(" \t");
if( string::npos != startpos )
{
str = str.substr( startpos );
}
Try this, it works for me.
inline std::string trim(std::string& str)
{
str.erase(str.find_last_not_of(' ')+1); //suffixing spaces
str.erase(0, str.find_first_not_of(' ')); //prefixing spaces
return str;
}
Bit late to the party, but never mind. Now C++11 is here, we have lambdas and auto variables. So my version, which also handles all-whitespace and empty strings, is:
#include <cctype>
#include <string>
#include <algorithm>
inline std::string trim(const std::string &s)
{
auto wsfront=std::find_if_not(s.begin(),s.end(),[](int c){return std::isspace(c);});
auto wsback=std::find_if_not(s.rbegin(),s.rend(),[](int c){return std::isspace(c);}).base();
return (wsback<=wsfront ? std::string() : std::string(wsfront,wsback));
}
We could make a reverse iterator from wsfront and use that as the termination condition in the second find_if_not but that's only useful in the case of an all-whitespace string, and gcc 4.8 at least isn't smart enough to infer the type of the reverse iterator (std::string::const_reverse_iterator) with auto. I don't know how expensive constructing a reverse iterator is, so YMMV here. With this alteration, the code looks like this:
inline std::string trim(const std::string &s)
{
auto wsfront=std::find_if_not(s.begin(),s.end(),[](int c){return std::isspace(c);});
return std::string(wsfront,std::find_if_not(s.rbegin(),std::string::const_reverse_iterator(wsfront),[](int c){return std::isspace(c);}).base());
}
http://ideone.com/nFVtEo
std::string trim(const std::string &s)
{
std::string::const_iterator it = s.begin();
while (it != s.end() && isspace(*it))
it++;
std::string::const_reverse_iterator rit = s.rbegin();
while (rit.base() != it && isspace(*rit))
rit++;
return std::string(it, rit.base());
}
I like tzaman's solution, the only problem with it is that it doesn't trim a string containing only spaces.
To correct that 1 flaw, add a str.clear() in between the 2 trimmer lines
std::stringstream trimmer;
trimmer << str;
str.clear();
trimmer >> str;
With C++17 you can use basic_string_view::remove_prefix and basic_string_view::remove_suffix:
std::string_view trim(std::string_view s)
{
s.remove_prefix(std::min(s.find_first_not_of(" \t\r\v\n"), s.size()));
s.remove_suffix(std::min(s.size() - s.find_last_not_of(" \t\r\v\n") - 1, s.size()));
return s;
}
A nice alternative:
std::string_view ltrim(std::string_view s)
{
s.remove_prefix(std::distance(s.cbegin(), std::find_if(s.cbegin(), s.cend(),
[](int c) {return !std::isspace(c);})));
return s;
}
std::string_view rtrim(std::string_view s)
{
s.remove_suffix(std::distance(s.crbegin(), std::find_if(s.crbegin(), s.crend(),
[](int c) {return !std::isspace(c);})));
return s;
}
std::string_view trim(std::string_view s)
{
return ltrim(rtrim(s));
}
In the case of an empty string, your code assumes that adding 1 to string::npos gives 0. string::npos is of type string::size_type, which is unsigned. Thus, you are relying on the overflow behaviour of addition.
s.erase(0, s.find_first_not_of(" \n\r\t"));
s.erase(s.find_last_not_of(" \n\r\t")+1);
Hacked off of Cplusplus.com
std::string choppa(const std::string &t, const std::string &ws)
{
std::string str = t;
size_t found;
found = str.find_last_not_of(ws);
if (found != std::string::npos)
str.erase(found+1);
else
str.clear(); // str is all whitespace
return str;
}
This works for the null case as well. :-)
My solution based on the answer by #Bill the Lizard.
Note that these functions will return the empty string if the input string contains nothing but whitespace.
const std::string StringUtils::WHITESPACE = " \n\r\t";
std::string StringUtils::Trim(const std::string& s)
{
return TrimRight(TrimLeft(s));
}
std::string StringUtils::TrimLeft(const std::string& s)
{
size_t startpos = s.find_first_not_of(StringUtils::WHITESPACE);
return (startpos == std::string::npos) ? "" : s.substr(startpos);
}
std::string StringUtils::TrimRight(const std::string& s)
{
size_t endpos = s.find_last_not_of(StringUtils::WHITESPACE);
return (endpos == std::string::npos) ? "" : s.substr(0, endpos+1);
}
str.erase(0, str.find_first_not_of("\t\n\v\f\r ")); // left trim
str.erase(str.find_last_not_of("\t\n\v\f\r ") + 1); // right trim
Try it online!
With C++11 also came a regular expression module, which of course can be used to trim leading or trailing spaces.
Maybe something like this:
std::string ltrim(const std::string& s)
{
static const std::regex lws{"^[[:space:]]*", std::regex_constants::extended};
return std::regex_replace(s, lws, "");
}
std::string rtrim(const std::string& s)
{
static const std::regex tws{"[[:space:]]*$", std::regex_constants::extended};
return std::regex_replace(s, tws, "");
}
std::string trim(const std::string& s)
{
return ltrim(rtrim(s));
}
My answer is an improvement upon the top answer for this post that trims control characters as well as spaces (0-32 and 127 on the ASCII table).
std::isgraph determines if a character has a graphical representation, so you can use this to alter Evan's answer to remove any character that doesn't have a graphical representation from either side of a string. The result is a much more elegant solution:
#include <algorithm>
#include <functional>
#include <string>
/**
* #brief Left Trim
*
* Trims whitespace from the left end of the provided std::string
*
* #param[out] s The std::string to trim
*
* #return The modified std::string&
*/
std::string& ltrim(std::string& s) {
s.erase(s.begin(), std::find_if(s.begin(), s.end(),
std::ptr_fun<int, int>(std::isgraph)));
return s;
}
/**
* #brief Right Trim
*
* Trims whitespace from the right end of the provided std::string
*
* #param[out] s The std::string to trim
*
* #return The modified std::string&
*/
std::string& rtrim(std::string& s) {
s.erase(std::find_if(s.rbegin(), s.rend(),
std::ptr_fun<int, int>(std::isgraph)).base(), s.end());
return s;
}
/**
* #brief Trim
*
* Trims whitespace from both ends of the provided std::string
*
* #param[out] s The std::string to trim
*
* #return The modified std::string&
*/
std::string& trim(std::string& s) {
return ltrim(rtrim(s));
}
Note: Alternatively you should be able to use std::iswgraph if you need support for wide characters, but you will also have to edit this code to enable std::wstring manipulation, which is something that I haven't tested (see the reference page for std::basic_string to explore this option).
This is what I use. Just keep removing space from the front, and then, if there's anything left, do the same from the back.
void trim(string& s) {
while(s.compare(0,1," ")==0)
s.erase(s.begin()); // remove leading whitespaces
while(s.size()>0 && s.compare(s.size()-1,1," ")==0)
s.erase(s.end()-1); // remove trailing whitespaces
}
An elegant way of doing it can be like
std::string & trim(std::string & str)
{
return ltrim(rtrim(str));
}
And the supportive functions are implemented as:
std::string & ltrim(std::string & str)
{
auto it = std::find_if( str.begin() , str.end() , [](char ch){ return !std::isspace<char>(ch , std::locale::classic() ) ; } );
str.erase( str.begin() , it);
return str;
}
std::string & rtrim(std::string & str)
{
auto it = std::find_if( str.rbegin() , str.rend() , [](char ch){ return !std::isspace<char>(ch , std::locale::classic() ) ; } );
str.erase( it.base() , str.end() );
return str;
}
And once you've all these in place, you can write this as well:
std::string trim_copy(std::string const & str)
{
auto s = str;
return ltrim(rtrim(s));
}
Here is a solution for trim with regex
#include <string>
#include <regex>
string trim(string str){
return regex_replace(str, regex("(^[ ]+)|([ ]+$)"),"");
}
I guess if you start asking for the "best way" to trim a string, I'd say a good implementation would be one that:
Doesn't allocate temporary strings
Has overloads for in-place trim and copy trim
Can be easily customized to accept different validation sequences / logic
Obviously there are too many different ways to approach this and it definitely depends on what you actually need. However, the C standard library still has some very useful functions in <string.h>, like memchr. There's a reason why C is still regarded as the best language for IO - its stdlib is pure efficiency.
inline const char* trim_start(const char* str)
{
while (memchr(" \t\n\r", *str, 4)) ++str;
return str;
}
inline const char* trim_end(const char* end)
{
while (memchr(" \t\n\r", end[-1], 4)) --end;
return end;
}
inline std::string trim(const char* buffer, int len) // trim a buffer (input?)
{
return std::string(trim_start(buffer), trim_end(buffer + len));
}
inline void trim_inplace(std::string& str)
{
str.assign(trim_start(str.c_str()),
trim_end(str.c_str() + str.length()));
}
int main()
{
char str [] = "\t \nhello\r \t \n";
string trimmed = trim(str, strlen(str));
cout << "'" << trimmed << "'" << endl;
system("pause");
return 0;
}
For what it's worth, here is a trim implementation with an eye towards performance. It's much quicker than many other trim routines I've seen around. Instead of using iterators and std::finds, it uses raw c strings and indices. It optimizes the following special cases: size 0 string (do nothing), string with no whitespace to trim (do nothing), string with only trailing whitespace to trim (just resize the string), string that's entirely whitespace (just clear the string). And finally, in the worst case (string with leading whitespace), it does its best to perform an efficient copy construction, performing only 1 copy and then moving that copy in place of the original string.
void TrimString(std::string & str)
{
if(str.empty())
return;
const auto pStr = str.c_str();
size_t front = 0;
while(front < str.length() && std::isspace(int(pStr[front]))) {++front;}
size_t back = str.length();
while(back > front && std::isspace(int(pStr[back-1]))) {--back;}
if(0 == front)
{
if(back < str.length())
{
str.resize(back - front);
}
}
else if(back <= front)
{
str.clear();
}
else
{
str = std::move(std::string(str.begin()+front, str.begin()+back));
}
}
Trim C++11 implementation:
static void trim(std::string &s) {
s.erase(s.begin(), std::find_if_not(s.begin(), s.end(), [](char c){ return std::isspace(c); }));
s.erase(std::find_if_not(s.rbegin(), s.rend(), [](char c){ return std::isspace(c); }).base(), s.end());
}
Contributing my solution to the noise. trim defaults to creating a new string and returning the modified one while trim_in_place modifies the string passed to it. The trim function supports c++11 move semantics.
#include <string>
// modifies input string, returns input
std::string& trim_left_in_place(std::string& str) {
size_t i = 0;
while(i < str.size() && isspace(str[i])) { ++i; };
return str.erase(0, i);
}
std::string& trim_right_in_place(std::string& str) {
size_t i = str.size();
while(i > 0 && isspace(str[i - 1])) { --i; };
return str.erase(i, str.size());
}
std::string& trim_in_place(std::string& str) {
return trim_left_in_place(trim_right_in_place(str));
}
// returns newly created strings
std::string trim_right(std::string str) {
return trim_right_in_place(str);
}
std::string trim_left(std::string str) {
return trim_left_in_place(str);
}
std::string trim(std::string str) {
return trim_left_in_place(trim_right_in_place(str));
}
#include <cassert>
int main() {
std::string s1(" \t\r\n ");
std::string s2(" \r\nc");
std::string s3("c \t");
std::string s4(" \rc ");
assert(trim(s1) == "");
assert(trim(s2) == "c");
assert(trim(s3) == "c");
assert(trim(s4) == "c");
assert(s1 == " \t\r\n ");
assert(s2 == " \r\nc");
assert(s3 == "c \t");
assert(s4 == " \rc ");
assert(trim_in_place(s1) == "");
assert(trim_in_place(s2) == "c");
assert(trim_in_place(s3) == "c");
assert(trim_in_place(s4) == "c");
assert(s1 == "");
assert(s2 == "c");
assert(s3 == "c");
assert(s4 == "c");
}
This can be done more simply in C++11 due to the addition of back() and pop_back().
while ( !s.empty() && isspace(s.back()) ) s.pop_back();
Here's what I came up with:
std::stringstream trimmer;
trimmer << str;
trimmer >> str;
Stream extraction eliminates whitespace automatically, so this works like a charm.
Pretty clean and elegant too, if I do say so myself. ;)
I'm not sure if your environment is the same, but in mine, the empty string case will cause the program to abort. I would either wrap that erase call with an if(!s.empty()) or use Boost as already mentioned.
Here is my version:
size_t beg = s.find_first_not_of(" \r\n");
return (beg == string::npos) ? "" : in.substr(beg, s.find_last_not_of(" \r\n") - beg);
Here's a solution easy to understand for beginners not used to write std:: everywhere and not yet familiar with const-correctness, iterators, STL algorithms, etc...
#include <string>
#include <cctype> // for isspace
using namespace std;
// Left trim the given string (" hello! " --> "hello! ")
string left_trim(string str) {
int numStartSpaces = 0;
for (int i = 0; i < str.length(); i++) {
if (!isspace(str[i])) break;
numStartSpaces++;
}
return str.substr(numStartSpaces);
}
// Right trim the given string (" hello! " --> " hello!")
string right_trim(string str) {
int numEndSpaces = 0;
for (int i = str.length() - 1; i >= 0; i--) {
if (!isspace(str[i])) break;
numEndSpaces++;
}
return str.substr(0, str.length() - numEndSpaces);
}
// Left and right trim the given string (" hello! " --> "hello!")
string trim(string str) {
return right_trim(left_trim(str));
}
Hope it helps...
The above methods are great, but sometimes you want to use a combination of functions for what your routine considers to be whitespace. In this case, using functors to combine operations can get messy so I prefer a simple loop I can modify for the trim. Here is a slightly modified trim function copied from the C version here on SO. In this example, I am trimming non alphanumeric characters.
string trim(char const *str)
{
// Trim leading non-letters
while(!isalnum(*str)) str++;
// Trim trailing non-letters
end = str + strlen(str) - 1;
while(end > str && !isalnum(*end)) end--;
return string(str, end+1);
}
What about this...?
#include <iostream>
#include <string>
#include <regex>
std::string ltrim( std::string str ) {
return std::regex_replace( str, std::regex("^\\s+"), std::string("") );
}
std::string rtrim( std::string str ) {
return std::regex_replace( str, std::regex("\\s+$"), std::string("") );
}
std::string trim( std::string str ) {
return ltrim( rtrim( str ) );
}
int main() {
std::string str = " \t this is a test string \n ";
std::cout << "-" << trim( str ) << "-\n";
return 0;
}
Note: I'm still relatively new to C++, so please forgive me if I'm off base here.