How to strip all non alphanumeric characters from a string in c++? - c++

I am writing a piece of software, and It require me to handle data I get from a webpage with libcurl. When I get the data, for some reason it has extra line breaks in it. I need to figure out a way to only allow letters, numbers, and spaces. And remove everything else, including line breaks. Is there any easy way to do this? Thanks.

Write a function that takes a char and returns true if you want to remove that character or false if you want to keep it:
bool my_predicate(char c);
Then use the std::remove_if algorithm to remove the unwanted characters from the string:
std::string s = "my data";
s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());
Depending on your requirements, you may be able to use one of the Standard Library predicates, like std::isalnum, instead of writing your own predicate (you said you needed to match alphanumeric characters and spaces, so perhaps this doesn't exactly fit what you need).
If you want to use the Standard Library std::isalnum function, you will need a cast to disambiguate between the std::isalnum function in the C Standard Library header <cctype> (which is the one you want to use) and the std::isalnum in the C++ Standard Library header <locale> (which is not the one you want to use, unless you want to perform locale-specific string processing):
s.erase(std::remove_if(s.begin(), s.end(), (int(*)(int))std::isalnum), s.end());
This works equally well with any of the sequence containers (including std::string, std::vector and std::deque). This idiom is commonly referred to as the "erase/remove" idiom. The std::remove_if algorithm will also work with ordinary arrays. The std::remove_if makes only a single pass over the sequence, so it has linear time complexity.

Previous uses of std::isalnum won't compile with std::ptr_fun without passing the unary argument is requires, hence this solution with a lambda function should encapsulate the correct answer:
s.erase(std::remove_if(s.begin(), s.end(),
[]( auto const& c ) -> bool { return !std::isalnum(c); } ), s.end());

You could always loop through and just erase all non alphanumeric characters if you're using string.
#include <cctype>
size_t i = 0;
size_t len = str.length();
while(i < len){
if (!isalnum(str[i]) || str[i] == ' '){
str.erase(i,1);
len--;
}else
i++;
}
Someone better with the Standard Lib can probably do this without a loop.
If you're using just a char buffer, you can loop through and if a character is not alphanumeric, shift all the characters after it backwards one (to overwrite the offending character):
#include <cctype>
size_t buflen = something;
for (size_t i = 0; i < buflen; ++i)
if (!isalnum(buf[i]) || buf[i] != ' ')
memcpy(buf[i], buf[i + 1], --buflen - i);

Just extending James McNellis's code a little bit more. His function is deleting alnum characters instead of non-alnum ones.
To delete non-alnum characters from a string. (alnum = alphabetical or numeric)
Declare a function (isalnum returns 0 if passed char is not alnum)
bool isNotAlnum(char c) {
return isalnum(c) == 0;
}
And then write this
s.erase(remove_if(s.begin(), s.end(), isNotAlnum), s.end());
then your string is only with alnum characters.

Benchmarking the different methods.
If you are looking for a benchmark I made one.
(115830 cycles) 115.8ms -> using stringstream
( 40434 cycles) 40.4ms -> s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !isalnum(c); }), s.end());
( 40389 cycles) 40.4ms -> s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return ispunct(c); }), s.end());
( 42386 cycles) 42.4ms -> s.erase(remove_if(s.begin(), s.end(), not1(ptr_fun( (int(*)(int))isalnum ))), s.end());
( 42969 cycles) 43.0ms -> s.erase(remove_if(s.begin(), s.end(), []( auto const& c ) -> bool { return !isalnum(c); } ), s.end());
( 44829 cycles) 44.8ms -> alnum_from_libc(s) see below
( 24505 cycles) 24.5ms -> Puzzled? My method, see below
( 9717 cycles) 9.7ms -> using mask and bitwise operators
Original length: 8286208, current len with alnum only: 5822471
Stringstream gives terrible results (but we all know that)
The different answers already given gives about the same runtime
Doing it the C way consistently give better runtime (almost twice faster!), it is definitely worth considering, and on top of that it is compatible with C language.
My bitwise method (also C compatible) is more than 400% faster.
NB the selected answer had to be modified as it was keeping only the special characters
NB2: The test file is a (almost) 8192 kb text file with roughly 62 alnum and 12 special characters, randomly and evenly written.
Benchmark source code
#include <ctime>
#include <iostream>
#include <sstream>
#include <string>
#include <algorithm>
#include <locale> // ispunct
#include <cctype>
#include <fstream> // read file
#include <streambuf>
#include <sys/stat.h> // check if file exist
#include <cstring>
using namespace std;
bool exist(const char *name)
{
struct stat buffer;
return !stat(name, &buffer);
}
constexpr int SIZE = 8092 * 1024;
void keep_alnum(string &s) {
stringstream ss;
int i = 0;
for (i = 0; i < SIZE; i++)
if (isalnum(s[i]))
ss << s[i];
s = ss.str();
}
/* my method, best runtime */
void old_school(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
unsigned char c = s[i] - 0x30; // '0'
if (c < 10 || (c -= 0x11) < 26 || (c -= 0x20) < 26) // 0x30 + 0x11 = 'A' + 0x20 = 'a'
s[n++] = s[i];
}
s[n] = '\0';
}
void alnum_from_libc(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
if (isalnum(s[i]))
s[n++] = s[i];
}
s[n] = '\0';
}
#define benchmark(x) printf("\033[30m(%6.0lf cycles) \033[32m%5.1lfms\n\033[0m", x, x / (CLOCKS_PER_SEC / 1000))
int main(int ac, char **av) {
if (ac < 2) {
cout << "usage: ./a.out \"{your string} or ./a.out FILE \"{your file}\n";
return 1;
}
string s;
s.reserve(SIZE+1);
string s1;
s1.reserve(SIZE+1);
char s4[SIZE + 1], s5[SIZE + 1];
if (ac == 3) {
if (!exist(av[2])) {
for (size_t i = 0; i < SIZE; i++)
s4[i] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnoporstuvwxyz!##$%^&*()__+:\"<>?,./'"[rand() % 74];
s4[SIZE] = '\0';
ofstream ofs(av[2]);
if (ofs)
ofs << s4;
}
ifstream ifs(av[2]);
if (ifs) {
ifs.rdbuf()->pubsetbuf(s4, SIZE);
copy(istreambuf_iterator<char>(ifs), {}, s.begin());
}
else
cout << "error\n";
ifs.seekg(0, ios::beg);
s.assign((istreambuf_iterator<char>(ifs)), istreambuf_iterator<char>());
}
else
s = av[1];
double elapsedTime;
clock_t start;
bool is_different = false;
s1 = s;
start = clock();
keep_alnum(s1);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
string tmp = s1;
s1 = s;
start = clock();
s1.erase(std::remove_if(s1.begin(), s1.end(), [](char c) { return !isalnum(c); }), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(std::remove_if(s1.begin(), s1.end(), [](char c) { return ispunct(c); }), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(remove_if(s1.begin(), s1.end(), not1(ptr_fun( (int(*)(int))isalnum ))), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(remove_if(s1.begin(), s1.end(), []( auto const& c ) -> bool { return !isalnum(c); } ), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
memcpy(s4, s.c_str(), SIZE);
start = clock();
alnum_from_libc(s4);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s4);
memcpy(s4, s.c_str(), SIZE);
start = clock();
old_school(s4);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s4);
cout << "Original length: " << s.size() << ", current len with alnum only: " << strlen(s4) << endl;
// make sure that strings are equivalent
printf("\033[3%cm%s\n", ('3' + !is_different), !is_different ? "OK" : "KO");
return 0;
}
My solution
For the bitwise method you can check it directly on my github, basically I avoid branching instructions (if) thanks to the mask. I avoid posting bitwise operations with C++ tag, I get a lot of hate for it.
For the C style one, I iterate over the string and have two index: n for the characters we keep and i to go through the string, where we test one after another if it is a digit, a uppercase or a lowercase.
Add this function:
void strip_special_chars(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
unsigned char c = s[i] - 0x30;
if (c < 10 || (c -= 0x11) < 26 || (c -= 0x20) < 26) // 0x30 + 0x11 = 'A' + 0x20 = 'a'
s[n++] = s[i];
}
s[n] = '\0';
}
and use as:
char s1[s.size() + 1]
memcpy(s1, s.c_str(), s.size());
strip_special_chars(s1);

The remove_copy_if standard algorithm would be very appropriate for your case.

#include <cctype>
#include <string>
#include <functional>
std::string s = "Hello World!";
s.erase(std::remove_if(s.begin(), s.end(),
std::not1(std::ptr_fun(std::isalnum)), s.end()), s.end());
std::cout << s << std::endl;
Results in:
"HelloWorld"
You use isalnum to determine whether or not each character is alpha numeric, then use ptr_fun to pass the function to not1 which NOTs the returned value, leaving you with only the alphanumeric stuff you want.

You can use the remove-erase algorithm this way -
// Removes all punctuation
s.erase( std::remove_if(s.begin(), s.end(), &ispunct), s.end());

Below code should work just fine for given string s. It's utilizing <algorithm> and <locale> libraries.
std::string s("He!!llo Wo,#rld! 12 453");
s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !std::isalnum(c); }), s.end());

The mentioned solution
s.erase( std::remove_if(s.begin(), s.end(), &std::ispunct), s.end());
is very nice, but unfortunately doesn't work with characters like 'Ñ' in Visual Studio (debug mode), because of this line:
_ASSERTE((unsigned)(c + 1) <= 256)
in isctype.c
So, I would recommend something like this:
inline int my_ispunct( int ch )
{
return std::ispunct(unsigned char(ch));
}
...
s.erase( std::remove_if(s.begin(), s.end(), &my_ispunct), s.end());

The following works for me.
str.erase(std::remove_if(str.begin(), str.end(), &ispunct), str.end());
str.erase(std::remove_if(str.begin(), str.end(), &isspace), str.end());

void remove_spaces(string data)
{ int i=0,j=0;
while(i<data.length())
{
if (isalpha(data[i]))
{
data[i]=data[i];
i++;
}
else
{
data.erase(i,1);}
}
cout<<data;
}

Related

Deleting spaces from the beginning of strings [duplicate]

How to remove spaces from a string object in C++.
For example, how to remove leading and trailing spaces from the below string object.
//Original string: " This is a sample string "
//Desired string: "This is a sample string"
The string class, as far as I know, doesn't provide any methods to remove leading and trailing spaces.
To add to the problem, how to extend this formatting to process extra spaces between words of the string. For example,
// Original string: " This is a sample string "
// Desired string: "This is a sample string"
Using the string methods mentioned in the solution, I can think of doing these operations in two steps.
Remove leading and trailing spaces.
Use find_first_of, find_last_of, find_first_not_of, find_last_not_of and substr, repeatedly at word boundaries to get desired formatting.
This is called trimming. If you can use Boost, I'd recommend it.
Otherwise, use find_first_not_of to get the index of the first non-whitespace character, then find_last_not_of to get the index from the end that isn't whitespace. With these, use substr to get the sub-string with no surrounding whitespace.
In response to your edit, I don't know the term but I'd guess something along the lines of "reduce", so that's what I called it. :) (Note, I've changed the white-space to be a parameter, for flexibility)
#include <iostream>
#include <string>
std::string trim(const std::string& str,
const std::string& whitespace = " \t")
{
const auto strBegin = str.find_first_not_of(whitespace);
if (strBegin == std::string::npos)
return ""; // no content
const auto strEnd = str.find_last_not_of(whitespace);
const auto strRange = strEnd - strBegin + 1;
return str.substr(strBegin, strRange);
}
std::string reduce(const std::string& str,
const std::string& fill = " ",
const std::string& whitespace = " \t")
{
// trim first
auto result = trim(str, whitespace);
// replace sub ranges
auto beginSpace = result.find_first_of(whitespace);
while (beginSpace != std::string::npos)
{
const auto endSpace = result.find_first_not_of(whitespace, beginSpace);
const auto range = endSpace - beginSpace;
result.replace(beginSpace, range, fill);
const auto newStart = beginSpace + fill.length();
beginSpace = result.find_first_of(whitespace, newStart);
}
return result;
}
int main(void)
{
const std::string foo = " too much\t \tspace\t\t\t ";
const std::string bar = "one\ntwo";
std::cout << "[" << trim(foo) << "]" << std::endl;
std::cout << "[" << reduce(foo) << "]" << std::endl;
std::cout << "[" << reduce(foo, "-") << "]" << std::endl;
std::cout << "[" << trim(bar) << "]" << std::endl;
}
Result:
[too much space]
[too much space]
[too-much-space]
[one
two]
Easy removing leading, trailing and extra spaces from a std::string in one line
value = std::regex_replace(value, std::regex("^ +| +$|( ) +"), "$1");
removing only leading spaces
value.erase(value.begin(), std::find_if(value.begin(), value.end(), std::bind1st(std::not_equal_to<char>(), ' ')));
or
value = std::regex_replace(value, std::regex("^ +"), "");
removing only trailing spaces
value.erase(std::find_if(value.rbegin(), value.rend(), std::bind1st(std::not_equal_to<char>(), ' ')).base(), value.end());
or
value = std::regex_replace(value, std::regex(" +$"), "");
removing only extra spaces
value = regex_replace(value, std::regex(" +"), " ");
I am currently using these functions:
// trim from left
inline std::string& ltrim(std::string& s, const char* t = " \t\n\r\f\v")
{
s.erase(0, s.find_first_not_of(t));
return s;
}
// trim from right
inline std::string& rtrim(std::string& s, const char* t = " \t\n\r\f\v")
{
s.erase(s.find_last_not_of(t) + 1);
return s;
}
// trim from left & right
inline std::string& trim(std::string& s, const char* t = " \t\n\r\f\v")
{
return ltrim(rtrim(s, t), t);
}
// copying versions
inline std::string ltrim_copy(std::string s, const char* t = " \t\n\r\f\v")
{
return ltrim(s, t);
}
inline std::string rtrim_copy(std::string s, const char* t = " \t\n\r\f\v")
{
return rtrim(s, t);
}
inline std::string trim_copy(std::string s, const char* t = " \t\n\r\f\v")
{
return trim(s, t);
}
Boost string trim algorithm
#include <boost/algorithm/string/trim.hpp>
[...]
std::string msg = " some text with spaces ";
boost::algorithm::trim(msg);
This is my solution for stripping the leading and trailing spaces ...
std::string stripString = " Plamen ";
while(!stripString.empty() && std::isspace(*stripString.begin()))
stripString.erase(stripString.begin());
while(!stripString.empty() && std::isspace(*stripString.rbegin()))
stripString.erase(stripString.length()-1);
The result is "Plamen"
Here is how you can do it:
std::string & trim(std::string & str)
{
return ltrim(rtrim(str));
}
And the supportive functions are implemeted as:
std::string & ltrim(std::string & str)
{
auto it2 = std::find_if( str.begin() , str.end() , [](char ch){ return !std::isspace<char>(ch , std::locale::classic() ) ; } );
str.erase( str.begin() , it2);
return str;
}
std::string & rtrim(std::string & str)
{
auto it1 = std::find_if( str.rbegin() , str.rend() , [](char ch){ return !std::isspace<char>(ch , std::locale::classic() ) ; } );
str.erase( it1.base() , str.end() );
return str;
}
And once you've all these in place, you can write this as well:
std::string trim_copy(std::string const & str)
{
auto s = str;
return ltrim(rtrim(s));
}
C++17 introduced std::basic_string_view, a class template that refers to a constant contiguous sequence of char-like objects, i.e. a view of the string. Apart from having a very similar interface to std::basic_string, it has two additional functions: remove_prefix(), which shrinks the view by moving its start forward; and
remove_suffix(), which shrinks the view by moving its end backward. These can be used to trim leading and trailing space:
#include <string_view>
#include <string>
std::string_view ltrim(std::string_view str)
{
const auto pos(str.find_first_not_of(" \t\n\r\f\v"));
str.remove_prefix(std::min(pos, str.length()));
return str;
}
std::string_view rtrim(std::string_view str)
{
const auto pos(str.find_last_not_of(" \t\n\r\f\v"));
str.remove_suffix(std::min(str.length() - pos - 1, str.length()));
return str;
}
std::string_view trim(std::string_view str)
{
str = ltrim(str);
str = rtrim(str);
return str;
}
int main()
{
std::string str = " hello world ";
auto sv1{ ltrim(str) }; // "hello world "
auto sv2{ rtrim(str) }; // " hello world"
auto sv3{ trim(str) }; // "hello world"
//If you want, you can create std::string objects from std::string_view objects
std::string s1{ sv1 };
std::string s2{ sv2 };
std::string s3{ sv3 };
}
Note: the use of std::min to ensure pos is not greater than size(), which happens when all characters in the string are whitespace and find_first_not_of returns npos. Also, std::string_view is a non-owning reference, so it's only valid as long as the original string still exists. Trimming the string view has no effect on the string it is based on.
Example for trim leading and trailing spaces following jon-hanson's suggestion to use boost (only removes trailing and pending spaces):
#include <boost/algorithm/string/trim.hpp>
std::string str = " t e s t ";
boost::algorithm::trim ( str );
Results in "t e s t"
There is also
trim_left results in "t e s t "
trim_right results in " t e s t"
/// strip a string, remove leading and trailing spaces
void strip(const string& in, string& out)
{
string::const_iterator b = in.begin(), e = in.end();
// skipping leading spaces
while (isSpace(*b)){
++b;
}
if (b != e){
// skipping trailing spaces
while (isSpace(*(e-1))){
--e;
}
}
out.assign(b, e);
}
In the above code, the isSpace() function is a boolean function that tells whether a character is a white space, you can implement this function to reflect your needs, or just call the isspace() from "ctype.h" if you want.
Example for trimming leading and trailing spaces
std::string aString(" This is a string to be trimmed ");
auto start = aString.find_first_not_of(' ');
auto end = aString.find_last_not_of(' ');
std::string trimmedString;
trimmedString = aString.substr(start, (end - start) + 1);
OR
trimmedSring = aString.substr(aString.find_first_not_of(' '), (aString.find_last_not_of(' ') - aString.find_first_not_of(' ')) + 1);
Using the standard library has many benefits, but one must be aware of some special cases that cause exceptions. For example, none of the answers covered the case where a C++ string has some Unicode characters. In this case, if you use the function isspace, an exception will be thrown.
I have been using the following code for trimming the strings and some other operations that might come in handy. The major benefits of this code are: it is really fast (faster than any code I have ever tested), it only uses the standard library, and it never causes an exception:
#include <string>
#include <algorithm>
#include <functional>
#include <locale>
#include <iostream>
typedef unsigned char BYTE;
std::string strTrim(std::string s, char option = 0)
{
// convert all whitespace characters to a standard space
std::replace_if(s.begin(), s.end(), (std::function<int(BYTE)>)::isspace, ' ');
// remove leading and trailing spaces
size_t f = s.find_first_not_of(' ');
if (f == std::string::npos) return "";
s = s.substr(f, s.find_last_not_of(' ') - f + 1);
// remove consecutive spaces
s = std::string(s.begin(), std::unique(s.begin(), s.end(),
[](BYTE l, BYTE r){ return l == ' ' && r == ' '; }));
switch (option)
{
case 'l': // convert to lowercase
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
return s;
case 'U': // convert to uppercase
std::transform(s.begin(), s.end(), s.begin(), ::toupper);
return s;
case 'n': // remove all spaces
s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
return s;
default: // just trim
return s;
}
}
This might be the simplest of all.
You can use string::find and string::rfind to find whitespace from both sides and reduce the string.
void TrimWord(std::string& word)
{
if (word.empty()) return;
// Trim spaces from left side
while (word.find(" ") == 0)
{
word.erase(0, 1);
}
// Trim spaces from right side
size_t len = word.size();
while (word.rfind(" ") == --len)
{
word.erase(len, len + 1);
}
}
To add to the problem, how to extend this formatting to process extra spaces between words of the string.
Actually, this is a simpler case than accounting for multiple leading and trailing white-space characters. All you need to do is remove duplicate adjacent white-space characters from the entire string.
The predicate for adjacent white space would simply be:
auto by_space = [](unsigned char a, unsigned char b) {
return std::isspace(a) and std::isspace(b);
};
and then you can get rid of those duplicate adjacent white-space characters with std::unique, and the erase-remove idiom:
// s = " This is a sample string "
s.erase(std::unique(std::begin(s), std::end(s), by_space),
std::end(s));
// s = " This is a sample string "
This does potentially leave an extra white-space character at the front and/or the back. This can be removed quite easily:
if (std::size(s) && std::isspace(s.back()))
s.pop_back();
if (std::size(s) && std::isspace(s.front()))
s.erase(0, 1);
Here's a demo.
I've tested this, it all works. So this method processInput will just ask the user to type something in. it will return a string that has no extra spaces internally, nor extra spaces at the begining or the end. Hope this helps. (also put a heap of commenting in to make it simple to understand).
you can see how to implement it in the main() at the bottom
#include <string>
#include <iostream>
string processInput() {
char inputChar[256];
string output = "";
int outputLength = 0;
bool space = false;
// user inputs a string.. well a char array
cin.getline(inputChar,256);
output = inputChar;
string outputToLower = "";
// put characters to lower and reduce spaces
for(int i = 0; i < output.length(); i++){
// if it's caps put it to lowercase
output[i] = tolower(output[i]);
// make sure we do not include tabs or line returns or weird symbol for null entry array thingy
if (output[i] != '\t' && output[i] != '\n' && output[i] != 'Ì') {
if (space) {
// if the previous space was a space but this one is not, then space now is false and add char
if (output[i] != ' ') {
space = false;
// add the char
outputToLower+=output[i];
}
} else {
// if space is false, make it true if the char is a space
if (output[i] == ' ') {
space = true;
}
// add the char
outputToLower+=output[i];
}
}
}
// trim leading and tailing space
string trimmedOutput = "";
for(int i = 0; i < outputToLower.length(); i++){
// if it's the last character and it's not a space, then add it
// if it's the first character and it's not a space, then add it
// if it's not the first or the last then add it
if (i == outputToLower.length() - 1 && outputToLower[i] != ' ' ||
i == 0 && outputToLower[i] != ' ' ||
i > 0 && i < outputToLower.length() - 1) {
trimmedOutput += outputToLower[i];
}
}
// return
output = trimmedOutput;
return output;
}
int main() {
cout << "Username: ";
string userName = processInput();
cout << "\nModified Input = " << userName << endl;
}
Why complicate?
std::string removeSpaces(std::string x){
if(x[0] == ' ') { x.erase(0, 1); return removeSpaces(x); }
if(x[x.length() - 1] == ' ') { x.erase(x.length() - 1, x.length()); return removeSpaces(x); }
else return x;
}
This works even if boost was to fail, no regex, no weird stuff nor libraries.
EDIT:
Fix for M.M.'s comment.
No boost, no regex, just the string library. It's that simple.
string trim(const string& s) { // removes whitespace characters from beginnig and end of string s
const int l = (int)s.length();
int a=0, b=l-1;
char c;
while(a<l && ((c=s[a])==' '||c=='\t'||c=='\n'||c=='\v'||c=='\f'||c=='\r'||c=='\0')) a++;
while(b>a && ((c=s[b])==' '||c=='\t'||c=='\n'||c=='\v'||c=='\f'||c=='\r'||c=='\0')) b--;
return s.substr(a, 1+b-a);
}
The constant time and space complexity for removing leading and trailing spaces can be achieved by using pop_back() function in the string. Code looks as follows:
void trimTrailingSpaces(string& s) {
while (s.size() > 0 && s.back() == ' ') {
s.pop_back();
}
}
void trimSpaces(string& s) {
//trim trailing spaces.
trimTrailingSpaces(s);
//trim leading spaces
//To reduce complexity, reversing and removing trailing spaces
//and again reversing back
reverse(s.begin(), s.end());
trimTrailingSpaces(s);
reverse(s.begin(), s.end());
}
char *str = (char*) malloc(50 * sizeof(char));
strcpy(str, " some random string (<50 chars) ");
while(*str == ' ' || *str == '\t' || *str == '\n')
str++;
int len = strlen(str);
while(len >= 0 &&
(str[len - 1] == ' ' || str[len - 1] == '\t' || *str == '\n')
{
*(str + len - 1) = '\0';
len--;
}
printf(":%s:\n", str);
void removeSpaces(string& str)
{
/* remove multiple spaces */
int k=0;
for (int j=0; j<str.size(); ++j)
{
if ( (str[j] != ' ') || (str[j] == ' ' && str[j+1] != ' ' ))
{
str [k] = str [j];
++k;
}
}
str.resize(k);
/* remove space at the end */
if (str [k-1] == ' ')
str.erase(str.end()-1);
/* remove space at the begin */
if (str [0] == ' ')
str.erase(str.begin());
}
string trim(const string & sStr)
{
int nSize = sStr.size();
int nSPos = 0, nEPos = 1, i;
for(i = 0; i< nSize; ++i) {
if( !isspace( sStr[i] ) ) {
nSPos = i ;
break;
}
}
for(i = nSize -1 ; i >= 0 ; --i) {
if( !isspace( sStr[i] ) ) {
nEPos = i;
break;
}
}
return string(sStr, nSPos, nEPos - nSPos + 1);
}
For leading- and trailing spaces, how about:
string string_trim(const string& in) {
stringstream ss;
string out;
ss << in;
ss >> out;
return out;
}
Or for a sentence:
string trim_words(const string& sentence) {
stringstream ss;
ss << sentence;
string s;
string out;
while(ss >> s) {
out+=(s+' ');
}
return out.substr(0, out.length()-1);
}
neat and clean
void trimLeftTrailingSpaces(string &input) {
input.erase(input.begin(), find_if(input.begin(), input.end(), [](int ch) {
return !isspace(ch);
}));
}
void trimRightTrailingSpaces(string &input) {
input.erase(find_if(input.rbegin(), input.rend(), [](int ch) {
return !isspace(ch);
}).base(), input.end());
}
This was the most intuitive way for me to solve this problem:
/**
* #brief Reverses a string, a helper function to removeLeadingTrailingSpaces
*
* #param line
* #return std::string
*/
std::string reverseString (std::string line) {
std::string reverse_line = "";
for(int i = line.length() - 1; i > -1; i--) {
reverse_line += line[i];
}
return reverse_line;
}
/**
* #brief Removes leading and trailing whitespace
* as well as extra whitespace within the line
*
* #param line
* #return std::string
*/
std::string removeLeadingTrailingSpaces(std::string line) {
std::string filtered_line = "";
std::string curr_line = line;
for(int loop = 0; loop < 2; loop++) {
bool leading_spaces_exist = true;
filtered_line = "";
std::string prev_char = "";
for(int i = 0; i < line.length(); i++) {
// Ignores leading whitespace
if(leading_spaces_exist) {
if(curr_line[i] != ' ') {
leading_spaces_exist = false;
}
}
// Puts the rest of the line in a variable
// and ignore back-to-back whitespace
if(!leading_spaces_exist) {
if(!(curr_line[i] == ' ' && prev_char == " ")) {
filtered_line += curr_line[i];
}
prev_char = curr_line[i];
}
}
/*
Reverses the line so that after we remove the leading whitespace
the trailing whitespace becomes the leading whitespace.
After the second round, it needs to reverse the string back to
its regular order.
*/
curr_line = reverseString(filtered_line);
}
return curr_line;
}
Basically, I looped through the string and removed the leading whitespace, then flipped the string and repeated the same process, then flipped back to normal.
I also added the functionality of cleaning up the line if there were back-to-back spaces.
My Solution for this problem not using any STL methods but only C++ string's own methods is as following:
void processString(string &s) {
if ( s.empty() ) return;
//delete leading and trailing spaces of the input string
int notSpaceStartPos = 0, notSpaceEndPos = s.length() - 1;
while ( s[notSpaceStartPos] == ' ' ) ++notSpaceStartPos;
while ( s[notSpaceEndPos] == ' ' ) --notSpaceEndPos;
if ( notSpaceStartPos > notSpaceEndPos ) { s = ""; return; }
s = s.substr(notSpaceStartPos, notSpaceEndPos - notSpaceStartPos + 1);
//reduce multiple spaces between two words to a single space
string temp;
for ( int i = 0; i < s.length(); i++ ) {
if ( i > 0 && s[i] == ' ' && s[i-1] == ' ' ) continue;
temp.push_back(s[i]);
}
s = temp;
}
I have used this method to pass a LeetCode problem Reverse Words in a String
void TrimWhitespaces(std::wstring& str)
{
if (str.empty())
return;
const std::wstring& whitespace = L" \t";
std::wstring::size_type strBegin = str.find_first_not_of(whitespace);
std::wstring::size_type strEnd = str.find_last_not_of(whitespace);
if (strBegin != std::wstring::npos || strEnd != std::wstring::npos)
{
strBegin == std::wstring::npos ? 0 : strBegin;
strEnd == std::wstring::npos ? str.size() : 0;
const auto strRange = strEnd - strBegin + 1;
str.substr(strBegin, strRange).swap(str);
}
else if (str[0] == ' ' || str[0] == '\t') // handles non-empty spaces-only or tabs-only
{
str = L"";
}
}
void TrimWhitespacesTest()
{
std::wstring EmptyStr = L"";
std::wstring SpacesOnlyStr = L" ";
std::wstring TabsOnlyStr = L" ";
std::wstring RightSpacesStr = L"12345 ";
std::wstring LeftSpacesStr = L" 12345";
std::wstring NoSpacesStr = L"12345";
TrimWhitespaces(EmptyStr);
TrimWhitespaces(SpacesOnlyStr);
TrimWhitespaces(TabsOnlyStr);
TrimWhitespaces(RightSpacesStr);
TrimWhitespaces(LeftSpacesStr);
TrimWhitespaces(NoSpacesStr);
assert(EmptyStr == L"");
assert(SpacesOnlyStr == L"");
assert(TabsOnlyStr == L"");
assert(RightSpacesStr == L"12345");
assert(LeftSpacesStr == L"12345");
assert(NoSpacesStr == L"12345");
}
What about the erase-remove idiom?
std::string s("...");
s.erase( std::remove(s.begin(), s.end(), ' '), s.end() );
Sorry. I saw too late that you don't want to remove all whitespace.

How to delete the symbols from string using c++? [duplicate]

I am writing a piece of software, and It require me to handle data I get from a webpage with libcurl. When I get the data, for some reason it has extra line breaks in it. I need to figure out a way to only allow letters, numbers, and spaces. And remove everything else, including line breaks. Is there any easy way to do this? Thanks.
Write a function that takes a char and returns true if you want to remove that character or false if you want to keep it:
bool my_predicate(char c);
Then use the std::remove_if algorithm to remove the unwanted characters from the string:
std::string s = "my data";
s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());
Depending on your requirements, you may be able to use one of the Standard Library predicates, like std::isalnum, instead of writing your own predicate (you said you needed to match alphanumeric characters and spaces, so perhaps this doesn't exactly fit what you need).
If you want to use the Standard Library std::isalnum function, you will need a cast to disambiguate between the std::isalnum function in the C Standard Library header <cctype> (which is the one you want to use) and the std::isalnum in the C++ Standard Library header <locale> (which is not the one you want to use, unless you want to perform locale-specific string processing):
s.erase(std::remove_if(s.begin(), s.end(), (int(*)(int))std::isalnum), s.end());
This works equally well with any of the sequence containers (including std::string, std::vector and std::deque). This idiom is commonly referred to as the "erase/remove" idiom. The std::remove_if algorithm will also work with ordinary arrays. The std::remove_if makes only a single pass over the sequence, so it has linear time complexity.
Previous uses of std::isalnum won't compile with std::ptr_fun without passing the unary argument is requires, hence this solution with a lambda function should encapsulate the correct answer:
s.erase(std::remove_if(s.begin(), s.end(),
[]( auto const& c ) -> bool { return !std::isalnum(c); } ), s.end());
You could always loop through and just erase all non alphanumeric characters if you're using string.
#include <cctype>
size_t i = 0;
size_t len = str.length();
while(i < len){
if (!isalnum(str[i]) || str[i] == ' '){
str.erase(i,1);
len--;
}else
i++;
}
Someone better with the Standard Lib can probably do this without a loop.
If you're using just a char buffer, you can loop through and if a character is not alphanumeric, shift all the characters after it backwards one (to overwrite the offending character):
#include <cctype>
size_t buflen = something;
for (size_t i = 0; i < buflen; ++i)
if (!isalnum(buf[i]) || buf[i] != ' ')
memcpy(buf[i], buf[i + 1], --buflen - i);
Just extending James McNellis's code a little bit more. His function is deleting alnum characters instead of non-alnum ones.
To delete non-alnum characters from a string. (alnum = alphabetical or numeric)
Declare a function (isalnum returns 0 if passed char is not alnum)
bool isNotAlnum(char c) {
return isalnum(c) == 0;
}
And then write this
s.erase(remove_if(s.begin(), s.end(), isNotAlnum), s.end());
then your string is only with alnum characters.
Benchmarking the different methods.
If you are looking for a benchmark I made one.
(115830 cycles) 115.8ms -> using stringstream
( 40434 cycles) 40.4ms -> s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !isalnum(c); }), s.end());
( 40389 cycles) 40.4ms -> s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return ispunct(c); }), s.end());
( 42386 cycles) 42.4ms -> s.erase(remove_if(s.begin(), s.end(), not1(ptr_fun( (int(*)(int))isalnum ))), s.end());
( 42969 cycles) 43.0ms -> s.erase(remove_if(s.begin(), s.end(), []( auto const& c ) -> bool { return !isalnum(c); } ), s.end());
( 44829 cycles) 44.8ms -> alnum_from_libc(s) see below
( 24505 cycles) 24.5ms -> Puzzled? My method, see below
( 9717 cycles) 9.7ms -> using mask and bitwise operators
Original length: 8286208, current len with alnum only: 5822471
Stringstream gives terrible results (but we all know that)
The different answers already given gives about the same runtime
Doing it the C way consistently give better runtime (almost twice faster!), it is definitely worth considering, and on top of that it is compatible with C language.
My bitwise method (also C compatible) is more than 400% faster.
NB the selected answer had to be modified as it was keeping only the special characters
NB2: The test file is a (almost) 8192 kb text file with roughly 62 alnum and 12 special characters, randomly and evenly written.
Benchmark source code
#include <ctime>
#include <iostream>
#include <sstream>
#include <string>
#include <algorithm>
#include <locale> // ispunct
#include <cctype>
#include <fstream> // read file
#include <streambuf>
#include <sys/stat.h> // check if file exist
#include <cstring>
using namespace std;
bool exist(const char *name)
{
struct stat buffer;
return !stat(name, &buffer);
}
constexpr int SIZE = 8092 * 1024;
void keep_alnum(string &s) {
stringstream ss;
int i = 0;
for (i = 0; i < SIZE; i++)
if (isalnum(s[i]))
ss << s[i];
s = ss.str();
}
/* my method, best runtime */
void old_school(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
unsigned char c = s[i] - 0x30; // '0'
if (c < 10 || (c -= 0x11) < 26 || (c -= 0x20) < 26) // 0x30 + 0x11 = 'A' + 0x20 = 'a'
s[n++] = s[i];
}
s[n] = '\0';
}
void alnum_from_libc(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
if (isalnum(s[i]))
s[n++] = s[i];
}
s[n] = '\0';
}
#define benchmark(x) printf("\033[30m(%6.0lf cycles) \033[32m%5.1lfms\n\033[0m", x, x / (CLOCKS_PER_SEC / 1000))
int main(int ac, char **av) {
if (ac < 2) {
cout << "usage: ./a.out \"{your string} or ./a.out FILE \"{your file}\n";
return 1;
}
string s;
s.reserve(SIZE+1);
string s1;
s1.reserve(SIZE+1);
char s4[SIZE + 1], s5[SIZE + 1];
if (ac == 3) {
if (!exist(av[2])) {
for (size_t i = 0; i < SIZE; i++)
s4[i] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnoporstuvwxyz!##$%^&*()__+:\"<>?,./'"[rand() % 74];
s4[SIZE] = '\0';
ofstream ofs(av[2]);
if (ofs)
ofs << s4;
}
ifstream ifs(av[2]);
if (ifs) {
ifs.rdbuf()->pubsetbuf(s4, SIZE);
copy(istreambuf_iterator<char>(ifs), {}, s.begin());
}
else
cout << "error\n";
ifs.seekg(0, ios::beg);
s.assign((istreambuf_iterator<char>(ifs)), istreambuf_iterator<char>());
}
else
s = av[1];
double elapsedTime;
clock_t start;
bool is_different = false;
s1 = s;
start = clock();
keep_alnum(s1);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
string tmp = s1;
s1 = s;
start = clock();
s1.erase(std::remove_if(s1.begin(), s1.end(), [](char c) { return !isalnum(c); }), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(std::remove_if(s1.begin(), s1.end(), [](char c) { return ispunct(c); }), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(remove_if(s1.begin(), s1.end(), not1(ptr_fun( (int(*)(int))isalnum ))), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(remove_if(s1.begin(), s1.end(), []( auto const& c ) -> bool { return !isalnum(c); } ), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
memcpy(s4, s.c_str(), SIZE);
start = clock();
alnum_from_libc(s4);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s4);
memcpy(s4, s.c_str(), SIZE);
start = clock();
old_school(s4);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s4);
cout << "Original length: " << s.size() << ", current len with alnum only: " << strlen(s4) << endl;
// make sure that strings are equivalent
printf("\033[3%cm%s\n", ('3' + !is_different), !is_different ? "OK" : "KO");
return 0;
}
My solution
For the bitwise method you can check it directly on my github, basically I avoid branching instructions (if) thanks to the mask. I avoid posting bitwise operations with C++ tag, I get a lot of hate for it.
For the C style one, I iterate over the string and have two index: n for the characters we keep and i to go through the string, where we test one after another if it is a digit, a uppercase or a lowercase.
Add this function:
void strip_special_chars(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
unsigned char c = s[i] - 0x30;
if (c < 10 || (c -= 0x11) < 26 || (c -= 0x20) < 26) // 0x30 + 0x11 = 'A' + 0x20 = 'a'
s[n++] = s[i];
}
s[n] = '\0';
}
and use as:
char s1[s.size() + 1]
memcpy(s1, s.c_str(), s.size());
strip_special_chars(s1);
The remove_copy_if standard algorithm would be very appropriate for your case.
#include <cctype>
#include <string>
#include <functional>
std::string s = "Hello World!";
s.erase(std::remove_if(s.begin(), s.end(),
std::not1(std::ptr_fun(std::isalnum)), s.end()), s.end());
std::cout << s << std::endl;
Results in:
"HelloWorld"
You use isalnum to determine whether or not each character is alpha numeric, then use ptr_fun to pass the function to not1 which NOTs the returned value, leaving you with only the alphanumeric stuff you want.
You can use the remove-erase algorithm this way -
// Removes all punctuation
s.erase( std::remove_if(s.begin(), s.end(), &ispunct), s.end());
Below code should work just fine for given string s. It's utilizing <algorithm> and <locale> libraries.
std::string s("He!!llo Wo,#rld! 12 453");
s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !std::isalnum(c); }), s.end());
The mentioned solution
s.erase( std::remove_if(s.begin(), s.end(), &std::ispunct), s.end());
is very nice, but unfortunately doesn't work with characters like 'Ñ' in Visual Studio (debug mode), because of this line:
_ASSERTE((unsigned)(c + 1) <= 256)
in isctype.c
So, I would recommend something like this:
inline int my_ispunct( int ch )
{
return std::ispunct(unsigned char(ch));
}
...
s.erase( std::remove_if(s.begin(), s.end(), &my_ispunct), s.end());
The following works for me.
str.erase(std::remove_if(str.begin(), str.end(), &ispunct), str.end());
str.erase(std::remove_if(str.begin(), str.end(), &isspace), str.end());
void remove_spaces(string data)
{ int i=0,j=0;
while(i<data.length())
{
if (isalpha(data[i]))
{
data[i]=data[i];
i++;
}
else
{
data.erase(i,1);}
}
cout<<data;
}

Remove character in c++ with cstring [duplicate]

I am using following:
replace (str1.begin(), str1.end(), 'a' , '')
But this is giving compilation error.
Basically, replace replaces a character with another and '' is not a character. What you're looking for is erase.
See this question which answers the same problem. In your case:
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), 'a'), str.end());
Or use boost if that's an option for you, like:
#include <boost/algorithm/string.hpp>
boost::erase_all(str, "a");
All of this is well-documented on reference websites. But if you didn't know of these functions, you could easily do this kind of things by hand:
std::string output;
output.reserve(str.size()); // optional, avoids buffer reallocations in the loop
for(size_t i = 0; i < str.size(); ++i)
if(str[i] != 'a') output += str[i];
The algorithm std::replace works per element on a given sequence (so it replaces elements with different elements, and can not replace it with nothing). But there is no empty character. If you want to remove elements from a sequence, the following elements have to be moved, and std::replace doesn't work like this.
You can try to use std::remove() together with str.erase()1 to achieve this.
str.erase(std::remove(str.begin(), str.end(), 'a'), str.end());
Using copy_if:
#include <string>
#include <iostream>
#include <algorithm>
int main() {
std::string s1 = "a1a2b3c4a5";
std::string s2;
std::copy_if(s1.begin(), s1.end(), std::back_inserter(s2),
[](char c){
std::string exclude = "a";
return exclude.find(c) == std::string::npos;}
);
std::cout << s2 << '\n';
return 0;
}
Starting with C++20, std::erase() has been added to the standard library, which combines the call to str.erase() and std::remove() into just one function:
std::erase(str, 'a');
The std::erase() function overload acting on strings is defined directly in the <string> header file, so no separate includes are required. Similiar overloads are defined for all the other containers.
string RemoveChar(string str, char c)
{
string result;
for (size_t i = 0; i < str.size(); i++)
{
char currentChar = str[i];
if (currentChar != c)
result += currentChar;
}
return result;
}
This is how I did it.
Or you could do as Antoine mentioned:
See this
question
which answers the same problem. In your case:
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), 'a'), str.end());
In case you have a predicate and/or a non empty output to fill with the filtered string, I would consider:
output.reserve(str.size() + output.size());
std::copy_if(str.cbegin(),
str.cend(),
std::back_inserter(output),
predicate});
In the original question the predicate is [](char c){return c != 'a';}
This code removes repetition of characters i.e, if the input is aaabbcc then the output will be abc. (the array must be sorted for this code to work)
cin >> s;
ans = "";
ans += s[0];
for(int i = 1;i < s.length();++i)
if(s[i] != s[i-1])
ans += s[i];
cout << ans << endl;
Based on other answers, here goes one more example where I removed all special chars in a given string:
#include <iostream>
#include <string>
#include <algorithm>
std::string chars(".,?!.:;_,!'\"-");
int main(int argc, char const *argv){
std::string input("oi?");
std::string output = eraseSpecialChars(input);
return 0;
}
std::string eraseSpecialChars(std::string str){
std::string newStr;
newStr.assign(str);
for(int i = 0; i < str.length(); i++){
for(int j = 0; j < chars.length(); j++ ){
if(str.at(i) == chars.at(j)){
char c = str.at(i);
newStr.erase(std::remove(newStr.begin(), newStr.end(), c), newStr.end());
}
}
}
return newStr;
}
Input vs Output:
Input:ra,..pha
Output:rapha
Input:ovo,
Output:ovo
Input:a.vo
Output:avo
Input:oi?
Output:oi
I have a string being read:
"\"internet\""
and I want to remove the quotes. I used the std::erase solution suggested above:
str.erase(std::remove(str.begin(), str.end(), '\"'), str.end());
but when I then did a compare on the result it failed:
if (str == "internet") {}
What I actually got was:
"internet "
The std::erase / std::remove solution doesn't shorten the string when it removes the end. I added this (from https://stackoverflow.com/a/21815483/264822):
str.erase(std::find_if(str.rbegin(), str.rend(), std::bind1st(std::not_equal_to<char>(), ' ')).base(), str.end());
to remove the trailing space(s).
I guess the method std:remove works but it was giving some compatibility issue with the includes so I ended up writing this little function:
string removeCharsFromString(const string str, char* charsToRemove )
{
char c[str.length()+1]; // + terminating char
const char *p = str.c_str();
unsigned int z=0, size = str.length();
unsigned int x;
bool rem=false;
for(x=0; x<size; x++)
{
rem = false;
for (unsigned int i = 0; charsToRemove[i] != 0; i++)
{
if (charsToRemove[i] == p[x])
{
rem = true;
break;
}
}
if (rem == false) c[z++] = p[x];
}
c[z] = '\0';
return string(c);
}
Just use as
myString = removeCharsFromString(myString, "abc\r");
and it will remove all the occurrence of the given char list.
This might also be a bit more efficient as the loop returns after the first match, so we actually do less comparison.
This is how I do it:
std::string removeAll(std::string str, char c) {
size_t offset = 0;
size_t size = str.size();
size_t i = 0;
while (i < size - offset) {
if (str[i + offset] == c) {
offset++;
}
if (offset != 0) {
str[i] = str[i + offset];
}
i++;
}
str.resize(size - offset);
return str;
}
Basically whenever I find a given char, I advance the offset and relocate the char to the correct index. I don't know if this is correct or efficient, I'm starting (yet again) at C++ and i'd appreciate any input on that.
70% Faster Solution than the top answer
void removeCharsFromString(std::string& str, const char* charsToRemove)
{
size_t charsToRemoveLen = strlen(charsToRemove);
std::remove_if(str.begin(), str.end(), [charsToRemove, charsToRemoveLen](char ch) -> bool
{
for (int i = 0; i < charsToRemoveLen; ++i) {
if (ch == charsToRemove[i])
return true;
}
return false;
});
}
#include <string>
#include <algorithm>
std::string str = "YourString";
char chars[] = {'Y', 'S'};
str.erase (std::remove(str.begin(), str.end(), chars[i]), str.end());
Will remove capital Y and S from str, leaving "ourtring".
Note that remove is an algorithm and needs the header <algorithm> included.

convertint date and time string to just integers in C++

How to convert
std::string strdate = "2012-06-25 05:32:06.963";
To some thing like this
std::string strintdate = "20120625053206963" // basically i removed -, :, space and .
I think I should use strtok or string functions, but I am not able to do it, can any one please help me here with sampel code.
so that i convert it to unsigned __int64 by using
// crt_strtoui64.c
#include <stdio.h>
unsigned __int64 atoui64(const char *szUnsignedInt) {
return _strtoui64(szUnsignedInt, NULL, 10);
}
int main() {
unsigned __int64 u = atoui64("18446744073709551615");
printf( "u = %I64u\n", u );
}
bool nondigit(char c) {
return c < '0' || c > '9';
}
std::string strdate = "2012-06-25 05:32:06.963";
strdate.erase(
std::remove_if(strdate.begin(), strdate.end(), nondigit),
strdate.end()
);
std::istringstream ss(strdate);
unsigned __int64 result;
if (ss >> result) {
// success
} else {
// handle failure
}
Btw, your representation as a 64-bit int might be a bit fragile. Make sure that the date/time 2012-06-25 05:32:06 is input as 2012-06-25 05:32:06.000, otherwise the integer you get out of the end is smaller than expected (and hence might be confused for a date/time in the year 2AD).
If your compiler supports C++11 features:
#include <iostream>
#include <algorithm>
#include <string>
int main()
{
std::string s("2012-06-25 05:32:06.963");
s.erase(std::remove_if(s.begin(),
s.end(),
[](const char a_c) { return !isdigit(a_c); }),
s.end());
std::cout << s << "\n";
return 0;
}
use string replace to replace the unwanted chars with no chars
http://www.cplusplus.com/reference/string/string/replace/
Here you go:
bool not_digit (int c) { return !std::isdigit(c); }
std::string date="2012-06-25 05:32:06.963";
// construct a new string
std::string intdate(date.begin(), std::remove_if(date.begin(), date.end(), not_digit));
I wouldn't use strtok. Here's a fairly simple method that just uses std::string member functions:
std::string strdate = "2012-06-25 05:32:06.963";
size_t pos = strdate.find_first_not_of("1234567890");
while (pos != std::string::npos)
{
size_t endpos = strdate.find_first_of("1234567890", pos);
strdate.erase(pos, endpos - pos);
pos = strdate.find_first_not_of("1234567890");
}
This is not a super efficient approach, but it will work.
A perhaps more efficient approach might use a stringstream...
std::string strdate = "2012-06-25 05:32:06.963";
std::stringstream out;
for (auto i = strdate.begin(); i != strdate.end(); i++)
if (std::isdigit(*i)) out << *i;
strdate = out.str();
I make no promises about time or space complexity, but I suspect that using string::erase multiple times might involve a little more memory shuffling.
std::string strdate = "2012-06-25 05:32:06.963";
std::string result ="";
for(std::string::iterator itr = strdate.begin(); itr != strdate.end(); itr++)
{
if(itr[0] >= '0' && itr[0] <= '9')
{
result.push_back(itr[0]);
}
}

Removing leading and trailing spaces from a string

How to remove spaces from a string object in C++.
For example, how to remove leading and trailing spaces from the below string object.
//Original string: " This is a sample string "
//Desired string: "This is a sample string"
The string class, as far as I know, doesn't provide any methods to remove leading and trailing spaces.
To add to the problem, how to extend this formatting to process extra spaces between words of the string. For example,
// Original string: " This is a sample string "
// Desired string: "This is a sample string"
Using the string methods mentioned in the solution, I can think of doing these operations in two steps.
Remove leading and trailing spaces.
Use find_first_of, find_last_of, find_first_not_of, find_last_not_of and substr, repeatedly at word boundaries to get desired formatting.
This is called trimming. If you can use Boost, I'd recommend it.
Otherwise, use find_first_not_of to get the index of the first non-whitespace character, then find_last_not_of to get the index from the end that isn't whitespace. With these, use substr to get the sub-string with no surrounding whitespace.
In response to your edit, I don't know the term but I'd guess something along the lines of "reduce", so that's what I called it. :) (Note, I've changed the white-space to be a parameter, for flexibility)
#include <iostream>
#include <string>
std::string trim(const std::string& str,
const std::string& whitespace = " \t")
{
const auto strBegin = str.find_first_not_of(whitespace);
if (strBegin == std::string::npos)
return ""; // no content
const auto strEnd = str.find_last_not_of(whitespace);
const auto strRange = strEnd - strBegin + 1;
return str.substr(strBegin, strRange);
}
std::string reduce(const std::string& str,
const std::string& fill = " ",
const std::string& whitespace = " \t")
{
// trim first
auto result = trim(str, whitespace);
// replace sub ranges
auto beginSpace = result.find_first_of(whitespace);
while (beginSpace != std::string::npos)
{
const auto endSpace = result.find_first_not_of(whitespace, beginSpace);
const auto range = endSpace - beginSpace;
result.replace(beginSpace, range, fill);
const auto newStart = beginSpace + fill.length();
beginSpace = result.find_first_of(whitespace, newStart);
}
return result;
}
int main(void)
{
const std::string foo = " too much\t \tspace\t\t\t ";
const std::string bar = "one\ntwo";
std::cout << "[" << trim(foo) << "]" << std::endl;
std::cout << "[" << reduce(foo) << "]" << std::endl;
std::cout << "[" << reduce(foo, "-") << "]" << std::endl;
std::cout << "[" << trim(bar) << "]" << std::endl;
}
Result:
[too much space]
[too much space]
[too-much-space]
[one
two]
Easy removing leading, trailing and extra spaces from a std::string in one line
value = std::regex_replace(value, std::regex("^ +| +$|( ) +"), "$1");
removing only leading spaces
value.erase(value.begin(), std::find_if(value.begin(), value.end(), std::bind1st(std::not_equal_to<char>(), ' ')));
or
value = std::regex_replace(value, std::regex("^ +"), "");
removing only trailing spaces
value.erase(std::find_if(value.rbegin(), value.rend(), std::bind1st(std::not_equal_to<char>(), ' ')).base(), value.end());
or
value = std::regex_replace(value, std::regex(" +$"), "");
removing only extra spaces
value = regex_replace(value, std::regex(" +"), " ");
I am currently using these functions:
// trim from left
inline std::string& ltrim(std::string& s, const char* t = " \t\n\r\f\v")
{
s.erase(0, s.find_first_not_of(t));
return s;
}
// trim from right
inline std::string& rtrim(std::string& s, const char* t = " \t\n\r\f\v")
{
s.erase(s.find_last_not_of(t) + 1);
return s;
}
// trim from left & right
inline std::string& trim(std::string& s, const char* t = " \t\n\r\f\v")
{
return ltrim(rtrim(s, t), t);
}
// copying versions
inline std::string ltrim_copy(std::string s, const char* t = " \t\n\r\f\v")
{
return ltrim(s, t);
}
inline std::string rtrim_copy(std::string s, const char* t = " \t\n\r\f\v")
{
return rtrim(s, t);
}
inline std::string trim_copy(std::string s, const char* t = " \t\n\r\f\v")
{
return trim(s, t);
}
Boost string trim algorithm
#include <boost/algorithm/string/trim.hpp>
[...]
std::string msg = " some text with spaces ";
boost::algorithm::trim(msg);
This is my solution for stripping the leading and trailing spaces ...
std::string stripString = " Plamen ";
while(!stripString.empty() && std::isspace(*stripString.begin()))
stripString.erase(stripString.begin());
while(!stripString.empty() && std::isspace(*stripString.rbegin()))
stripString.erase(stripString.length()-1);
The result is "Plamen"
Here is how you can do it:
std::string & trim(std::string & str)
{
return ltrim(rtrim(str));
}
And the supportive functions are implemeted as:
std::string & ltrim(std::string & str)
{
auto it2 = std::find_if( str.begin() , str.end() , [](char ch){ return !std::isspace<char>(ch , std::locale::classic() ) ; } );
str.erase( str.begin() , it2);
return str;
}
std::string & rtrim(std::string & str)
{
auto it1 = std::find_if( str.rbegin() , str.rend() , [](char ch){ return !std::isspace<char>(ch , std::locale::classic() ) ; } );
str.erase( it1.base() , str.end() );
return str;
}
And once you've all these in place, you can write this as well:
std::string trim_copy(std::string const & str)
{
auto s = str;
return ltrim(rtrim(s));
}
C++17 introduced std::basic_string_view, a class template that refers to a constant contiguous sequence of char-like objects, i.e. a view of the string. Apart from having a very similar interface to std::basic_string, it has two additional functions: remove_prefix(), which shrinks the view by moving its start forward; and
remove_suffix(), which shrinks the view by moving its end backward. These can be used to trim leading and trailing space:
#include <string_view>
#include <string>
std::string_view ltrim(std::string_view str)
{
const auto pos(str.find_first_not_of(" \t\n\r\f\v"));
str.remove_prefix(std::min(pos, str.length()));
return str;
}
std::string_view rtrim(std::string_view str)
{
const auto pos(str.find_last_not_of(" \t\n\r\f\v"));
str.remove_suffix(std::min(str.length() - pos - 1, str.length()));
return str;
}
std::string_view trim(std::string_view str)
{
str = ltrim(str);
str = rtrim(str);
return str;
}
int main()
{
std::string str = " hello world ";
auto sv1{ ltrim(str) }; // "hello world "
auto sv2{ rtrim(str) }; // " hello world"
auto sv3{ trim(str) }; // "hello world"
//If you want, you can create std::string objects from std::string_view objects
std::string s1{ sv1 };
std::string s2{ sv2 };
std::string s3{ sv3 };
}
Note: the use of std::min to ensure pos is not greater than size(), which happens when all characters in the string are whitespace and find_first_not_of returns npos. Also, std::string_view is a non-owning reference, so it's only valid as long as the original string still exists. Trimming the string view has no effect on the string it is based on.
Example for trim leading and trailing spaces following jon-hanson's suggestion to use boost (only removes trailing and pending spaces):
#include <boost/algorithm/string/trim.hpp>
std::string str = " t e s t ";
boost::algorithm::trim ( str );
Results in "t e s t"
There is also
trim_left results in "t e s t "
trim_right results in " t e s t"
/// strip a string, remove leading and trailing spaces
void strip(const string& in, string& out)
{
string::const_iterator b = in.begin(), e = in.end();
// skipping leading spaces
while (isSpace(*b)){
++b;
}
if (b != e){
// skipping trailing spaces
while (isSpace(*(e-1))){
--e;
}
}
out.assign(b, e);
}
In the above code, the isSpace() function is a boolean function that tells whether a character is a white space, you can implement this function to reflect your needs, or just call the isspace() from "ctype.h" if you want.
Example for trimming leading and trailing spaces
std::string aString(" This is a string to be trimmed ");
auto start = aString.find_first_not_of(' ');
auto end = aString.find_last_not_of(' ');
std::string trimmedString;
trimmedString = aString.substr(start, (end - start) + 1);
OR
trimmedSring = aString.substr(aString.find_first_not_of(' '), (aString.find_last_not_of(' ') - aString.find_first_not_of(' ')) + 1);
Using the standard library has many benefits, but one must be aware of some special cases that cause exceptions. For example, none of the answers covered the case where a C++ string has some Unicode characters. In this case, if you use the function isspace, an exception will be thrown.
I have been using the following code for trimming the strings and some other operations that might come in handy. The major benefits of this code are: it is really fast (faster than any code I have ever tested), it only uses the standard library, and it never causes an exception:
#include <string>
#include <algorithm>
#include <functional>
#include <locale>
#include <iostream>
typedef unsigned char BYTE;
std::string strTrim(std::string s, char option = 0)
{
// convert all whitespace characters to a standard space
std::replace_if(s.begin(), s.end(), (std::function<int(BYTE)>)::isspace, ' ');
// remove leading and trailing spaces
size_t f = s.find_first_not_of(' ');
if (f == std::string::npos) return "";
s = s.substr(f, s.find_last_not_of(' ') - f + 1);
// remove consecutive spaces
s = std::string(s.begin(), std::unique(s.begin(), s.end(),
[](BYTE l, BYTE r){ return l == ' ' && r == ' '; }));
switch (option)
{
case 'l': // convert to lowercase
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
return s;
case 'U': // convert to uppercase
std::transform(s.begin(), s.end(), s.begin(), ::toupper);
return s;
case 'n': // remove all spaces
s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
return s;
default: // just trim
return s;
}
}
This might be the simplest of all.
You can use string::find and string::rfind to find whitespace from both sides and reduce the string.
void TrimWord(std::string& word)
{
if (word.empty()) return;
// Trim spaces from left side
while (word.find(" ") == 0)
{
word.erase(0, 1);
}
// Trim spaces from right side
size_t len = word.size();
while (word.rfind(" ") == --len)
{
word.erase(len, len + 1);
}
}
To add to the problem, how to extend this formatting to process extra spaces between words of the string.
Actually, this is a simpler case than accounting for multiple leading and trailing white-space characters. All you need to do is remove duplicate adjacent white-space characters from the entire string.
The predicate for adjacent white space would simply be:
auto by_space = [](unsigned char a, unsigned char b) {
return std::isspace(a) and std::isspace(b);
};
and then you can get rid of those duplicate adjacent white-space characters with std::unique, and the erase-remove idiom:
// s = " This is a sample string "
s.erase(std::unique(std::begin(s), std::end(s), by_space),
std::end(s));
// s = " This is a sample string "
This does potentially leave an extra white-space character at the front and/or the back. This can be removed quite easily:
if (std::size(s) && std::isspace(s.back()))
s.pop_back();
if (std::size(s) && std::isspace(s.front()))
s.erase(0, 1);
Here's a demo.
I've tested this, it all works. So this method processInput will just ask the user to type something in. it will return a string that has no extra spaces internally, nor extra spaces at the begining or the end. Hope this helps. (also put a heap of commenting in to make it simple to understand).
you can see how to implement it in the main() at the bottom
#include <string>
#include <iostream>
string processInput() {
char inputChar[256];
string output = "";
int outputLength = 0;
bool space = false;
// user inputs a string.. well a char array
cin.getline(inputChar,256);
output = inputChar;
string outputToLower = "";
// put characters to lower and reduce spaces
for(int i = 0; i < output.length(); i++){
// if it's caps put it to lowercase
output[i] = tolower(output[i]);
// make sure we do not include tabs or line returns or weird symbol for null entry array thingy
if (output[i] != '\t' && output[i] != '\n' && output[i] != 'Ì') {
if (space) {
// if the previous space was a space but this one is not, then space now is false and add char
if (output[i] != ' ') {
space = false;
// add the char
outputToLower+=output[i];
}
} else {
// if space is false, make it true if the char is a space
if (output[i] == ' ') {
space = true;
}
// add the char
outputToLower+=output[i];
}
}
}
// trim leading and tailing space
string trimmedOutput = "";
for(int i = 0; i < outputToLower.length(); i++){
// if it's the last character and it's not a space, then add it
// if it's the first character and it's not a space, then add it
// if it's not the first or the last then add it
if (i == outputToLower.length() - 1 && outputToLower[i] != ' ' ||
i == 0 && outputToLower[i] != ' ' ||
i > 0 && i < outputToLower.length() - 1) {
trimmedOutput += outputToLower[i];
}
}
// return
output = trimmedOutput;
return output;
}
int main() {
cout << "Username: ";
string userName = processInput();
cout << "\nModified Input = " << userName << endl;
}
Why complicate?
std::string removeSpaces(std::string x){
if(x[0] == ' ') { x.erase(0, 1); return removeSpaces(x); }
if(x[x.length() - 1] == ' ') { x.erase(x.length() - 1, x.length()); return removeSpaces(x); }
else return x;
}
This works even if boost was to fail, no regex, no weird stuff nor libraries.
EDIT:
Fix for M.M.'s comment.
No boost, no regex, just the string library. It's that simple.
string trim(const string& s) { // removes whitespace characters from beginnig and end of string s
const int l = (int)s.length();
int a=0, b=l-1;
char c;
while(a<l && ((c=s[a])==' '||c=='\t'||c=='\n'||c=='\v'||c=='\f'||c=='\r'||c=='\0')) a++;
while(b>a && ((c=s[b])==' '||c=='\t'||c=='\n'||c=='\v'||c=='\f'||c=='\r'||c=='\0')) b--;
return s.substr(a, 1+b-a);
}
The constant time and space complexity for removing leading and trailing spaces can be achieved by using pop_back() function in the string. Code looks as follows:
void trimTrailingSpaces(string& s) {
while (s.size() > 0 && s.back() == ' ') {
s.pop_back();
}
}
void trimSpaces(string& s) {
//trim trailing spaces.
trimTrailingSpaces(s);
//trim leading spaces
//To reduce complexity, reversing and removing trailing spaces
//and again reversing back
reverse(s.begin(), s.end());
trimTrailingSpaces(s);
reverse(s.begin(), s.end());
}
char *str = (char*) malloc(50 * sizeof(char));
strcpy(str, " some random string (<50 chars) ");
while(*str == ' ' || *str == '\t' || *str == '\n')
str++;
int len = strlen(str);
while(len >= 0 &&
(str[len - 1] == ' ' || str[len - 1] == '\t' || *str == '\n')
{
*(str + len - 1) = '\0';
len--;
}
printf(":%s:\n", str);
void removeSpaces(string& str)
{
/* remove multiple spaces */
int k=0;
for (int j=0; j<str.size(); ++j)
{
if ( (str[j] != ' ') || (str[j] == ' ' && str[j+1] != ' ' ))
{
str [k] = str [j];
++k;
}
}
str.resize(k);
/* remove space at the end */
if (str [k-1] == ' ')
str.erase(str.end()-1);
/* remove space at the begin */
if (str [0] == ' ')
str.erase(str.begin());
}
string trim(const string & sStr)
{
int nSize = sStr.size();
int nSPos = 0, nEPos = 1, i;
for(i = 0; i< nSize; ++i) {
if( !isspace( sStr[i] ) ) {
nSPos = i ;
break;
}
}
for(i = nSize -1 ; i >= 0 ; --i) {
if( !isspace( sStr[i] ) ) {
nEPos = i;
break;
}
}
return string(sStr, nSPos, nEPos - nSPos + 1);
}
For leading- and trailing spaces, how about:
string string_trim(const string& in) {
stringstream ss;
string out;
ss << in;
ss >> out;
return out;
}
Or for a sentence:
string trim_words(const string& sentence) {
stringstream ss;
ss << sentence;
string s;
string out;
while(ss >> s) {
out+=(s+' ');
}
return out.substr(0, out.length()-1);
}
neat and clean
void trimLeftTrailingSpaces(string &input) {
input.erase(input.begin(), find_if(input.begin(), input.end(), [](int ch) {
return !isspace(ch);
}));
}
void trimRightTrailingSpaces(string &input) {
input.erase(find_if(input.rbegin(), input.rend(), [](int ch) {
return !isspace(ch);
}).base(), input.end());
}
This was the most intuitive way for me to solve this problem:
/**
* #brief Reverses a string, a helper function to removeLeadingTrailingSpaces
*
* #param line
* #return std::string
*/
std::string reverseString (std::string line) {
std::string reverse_line = "";
for(int i = line.length() - 1; i > -1; i--) {
reverse_line += line[i];
}
return reverse_line;
}
/**
* #brief Removes leading and trailing whitespace
* as well as extra whitespace within the line
*
* #param line
* #return std::string
*/
std::string removeLeadingTrailingSpaces(std::string line) {
std::string filtered_line = "";
std::string curr_line = line;
for(int loop = 0; loop < 2; loop++) {
bool leading_spaces_exist = true;
filtered_line = "";
std::string prev_char = "";
for(int i = 0; i < line.length(); i++) {
// Ignores leading whitespace
if(leading_spaces_exist) {
if(curr_line[i] != ' ') {
leading_spaces_exist = false;
}
}
// Puts the rest of the line in a variable
// and ignore back-to-back whitespace
if(!leading_spaces_exist) {
if(!(curr_line[i] == ' ' && prev_char == " ")) {
filtered_line += curr_line[i];
}
prev_char = curr_line[i];
}
}
/*
Reverses the line so that after we remove the leading whitespace
the trailing whitespace becomes the leading whitespace.
After the second round, it needs to reverse the string back to
its regular order.
*/
curr_line = reverseString(filtered_line);
}
return curr_line;
}
Basically, I looped through the string and removed the leading whitespace, then flipped the string and repeated the same process, then flipped back to normal.
I also added the functionality of cleaning up the line if there were back-to-back spaces.
My Solution for this problem not using any STL methods but only C++ string's own methods is as following:
void processString(string &s) {
if ( s.empty() ) return;
//delete leading and trailing spaces of the input string
int notSpaceStartPos = 0, notSpaceEndPos = s.length() - 1;
while ( s[notSpaceStartPos] == ' ' ) ++notSpaceStartPos;
while ( s[notSpaceEndPos] == ' ' ) --notSpaceEndPos;
if ( notSpaceStartPos > notSpaceEndPos ) { s = ""; return; }
s = s.substr(notSpaceStartPos, notSpaceEndPos - notSpaceStartPos + 1);
//reduce multiple spaces between two words to a single space
string temp;
for ( int i = 0; i < s.length(); i++ ) {
if ( i > 0 && s[i] == ' ' && s[i-1] == ' ' ) continue;
temp.push_back(s[i]);
}
s = temp;
}
I have used this method to pass a LeetCode problem Reverse Words in a String
void TrimWhitespaces(std::wstring& str)
{
if (str.empty())
return;
const std::wstring& whitespace = L" \t";
std::wstring::size_type strBegin = str.find_first_not_of(whitespace);
std::wstring::size_type strEnd = str.find_last_not_of(whitespace);
if (strBegin != std::wstring::npos || strEnd != std::wstring::npos)
{
strBegin == std::wstring::npos ? 0 : strBegin;
strEnd == std::wstring::npos ? str.size() : 0;
const auto strRange = strEnd - strBegin + 1;
str.substr(strBegin, strRange).swap(str);
}
else if (str[0] == ' ' || str[0] == '\t') // handles non-empty spaces-only or tabs-only
{
str = L"";
}
}
void TrimWhitespacesTest()
{
std::wstring EmptyStr = L"";
std::wstring SpacesOnlyStr = L" ";
std::wstring TabsOnlyStr = L" ";
std::wstring RightSpacesStr = L"12345 ";
std::wstring LeftSpacesStr = L" 12345";
std::wstring NoSpacesStr = L"12345";
TrimWhitespaces(EmptyStr);
TrimWhitespaces(SpacesOnlyStr);
TrimWhitespaces(TabsOnlyStr);
TrimWhitespaces(RightSpacesStr);
TrimWhitespaces(LeftSpacesStr);
TrimWhitespaces(NoSpacesStr);
assert(EmptyStr == L"");
assert(SpacesOnlyStr == L"");
assert(TabsOnlyStr == L"");
assert(RightSpacesStr == L"12345");
assert(LeftSpacesStr == L"12345");
assert(NoSpacesStr == L"12345");
}
What about the erase-remove idiom?
std::string s("...");
s.erase( std::remove(s.begin(), s.end(), ' '), s.end() );
Sorry. I saw too late that you don't want to remove all whitespace.