I am using following:
replace (str1.begin(), str1.end(), 'a' , '')
But this is giving compilation error.
Basically, replace replaces a character with another and '' is not a character. What you're looking for is erase.
See this question which answers the same problem. In your case:
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), 'a'), str.end());
Or use boost if that's an option for you, like:
#include <boost/algorithm/string.hpp>
boost::erase_all(str, "a");
All of this is well-documented on reference websites. But if you didn't know of these functions, you could easily do this kind of things by hand:
std::string output;
output.reserve(str.size()); // optional, avoids buffer reallocations in the loop
for(size_t i = 0; i < str.size(); ++i)
if(str[i] != 'a') output += str[i];
The algorithm std::replace works per element on a given sequence (so it replaces elements with different elements, and can not replace it with nothing). But there is no empty character. If you want to remove elements from a sequence, the following elements have to be moved, and std::replace doesn't work like this.
You can try to use std::remove() together with str.erase()1 to achieve this.
str.erase(std::remove(str.begin(), str.end(), 'a'), str.end());
Using copy_if:
#include <string>
#include <iostream>
#include <algorithm>
int main() {
std::string s1 = "a1a2b3c4a5";
std::string s2;
std::copy_if(s1.begin(), s1.end(), std::back_inserter(s2),
[](char c){
std::string exclude = "a";
return exclude.find(c) == std::string::npos;}
);
std::cout << s2 << '\n';
return 0;
}
Starting with C++20, std::erase() has been added to the standard library, which combines the call to str.erase() and std::remove() into just one function:
std::erase(str, 'a');
The std::erase() function overload acting on strings is defined directly in the <string> header file, so no separate includes are required. Similiar overloads are defined for all the other containers.
string RemoveChar(string str, char c)
{
string result;
for (size_t i = 0; i < str.size(); i++)
{
char currentChar = str[i];
if (currentChar != c)
result += currentChar;
}
return result;
}
This is how I did it.
Or you could do as Antoine mentioned:
See this
question
which answers the same problem. In your case:
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), 'a'), str.end());
In case you have a predicate and/or a non empty output to fill with the filtered string, I would consider:
output.reserve(str.size() + output.size());
std::copy_if(str.cbegin(),
str.cend(),
std::back_inserter(output),
predicate});
In the original question the predicate is [](char c){return c != 'a';}
This code removes repetition of characters i.e, if the input is aaabbcc then the output will be abc. (the array must be sorted for this code to work)
cin >> s;
ans = "";
ans += s[0];
for(int i = 1;i < s.length();++i)
if(s[i] != s[i-1])
ans += s[i];
cout << ans << endl;
Based on other answers, here goes one more example where I removed all special chars in a given string:
#include <iostream>
#include <string>
#include <algorithm>
std::string chars(".,?!.:;_,!'\"-");
int main(int argc, char const *argv){
std::string input("oi?");
std::string output = eraseSpecialChars(input);
return 0;
}
std::string eraseSpecialChars(std::string str){
std::string newStr;
newStr.assign(str);
for(int i = 0; i < str.length(); i++){
for(int j = 0; j < chars.length(); j++ ){
if(str.at(i) == chars.at(j)){
char c = str.at(i);
newStr.erase(std::remove(newStr.begin(), newStr.end(), c), newStr.end());
}
}
}
return newStr;
}
Input vs Output:
Input:ra,..pha
Output:rapha
Input:ovo,
Output:ovo
Input:a.vo
Output:avo
Input:oi?
Output:oi
I have a string being read:
"\"internet\""
and I want to remove the quotes. I used the std::erase solution suggested above:
str.erase(std::remove(str.begin(), str.end(), '\"'), str.end());
but when I then did a compare on the result it failed:
if (str == "internet") {}
What I actually got was:
"internet "
The std::erase / std::remove solution doesn't shorten the string when it removes the end. I added this (from https://stackoverflow.com/a/21815483/264822):
str.erase(std::find_if(str.rbegin(), str.rend(), std::bind1st(std::not_equal_to<char>(), ' ')).base(), str.end());
to remove the trailing space(s).
I guess the method std:remove works but it was giving some compatibility issue with the includes so I ended up writing this little function:
string removeCharsFromString(const string str, char* charsToRemove )
{
char c[str.length()+1]; // + terminating char
const char *p = str.c_str();
unsigned int z=0, size = str.length();
unsigned int x;
bool rem=false;
for(x=0; x<size; x++)
{
rem = false;
for (unsigned int i = 0; charsToRemove[i] != 0; i++)
{
if (charsToRemove[i] == p[x])
{
rem = true;
break;
}
}
if (rem == false) c[z++] = p[x];
}
c[z] = '\0';
return string(c);
}
Just use as
myString = removeCharsFromString(myString, "abc\r");
and it will remove all the occurrence of the given char list.
This might also be a bit more efficient as the loop returns after the first match, so we actually do less comparison.
This is how I do it:
std::string removeAll(std::string str, char c) {
size_t offset = 0;
size_t size = str.size();
size_t i = 0;
while (i < size - offset) {
if (str[i + offset] == c) {
offset++;
}
if (offset != 0) {
str[i] = str[i + offset];
}
i++;
}
str.resize(size - offset);
return str;
}
Basically whenever I find a given char, I advance the offset and relocate the char to the correct index. I don't know if this is correct or efficient, I'm starting (yet again) at C++ and i'd appreciate any input on that.
70% Faster Solution than the top answer
void removeCharsFromString(std::string& str, const char* charsToRemove)
{
size_t charsToRemoveLen = strlen(charsToRemove);
std::remove_if(str.begin(), str.end(), [charsToRemove, charsToRemoveLen](char ch) -> bool
{
for (int i = 0; i < charsToRemoveLen; ++i) {
if (ch == charsToRemove[i])
return true;
}
return false;
});
}
#include <string>
#include <algorithm>
std::string str = "YourString";
char chars[] = {'Y', 'S'};
str.erase (std::remove(str.begin(), str.end(), chars[i]), str.end());
Will remove capital Y and S from str, leaving "ourtring".
Note that remove is an algorithm and needs the header <algorithm> included.
Related
This Function is supposed to return a letter that has single occurrence, upper cased.
I was trying to find a mistake for 30 minutes and I seriously don't know what is the problem. May someone take a look at this?
#include <algorithm>
#include <string>
using namespace std;
char singleOccurrence(string str) {
transform(str.begin(), str.end(), str.begin(), ::toupper);
sort(str.begin(), str.end());
for(int i=0; i<str.length(); i++) {
if(i==str.length()-1)
return str[i];
else if(str[i] != str[i+1])
return str[i];
}
}
int main()
{
string str = "ala";
cout << singleOccurrence(str);
}
This if-else statement
if(i==str.length()-1)
{
return str[i];
}
else
if(str[i] != str[i+1])
{
return str[i];
}
does not make a sense.
For example consider string s = "aab". As s[1] != s[2] your function will return the letter 'a' though it is present in the string more than one time.
Or consider another example of a string like "aaa". In this case your function again will return the letter 'a'.
If you want to use an approach with a sorted sequence of characters then you could within the function use for example the standard container std::map<char, size_t> without sorting the passed string itself.
Otherwise you could use nested for loops for the original string.
In the both cases the function should be declared like
char singleOccurrence( const std::string &str );
or
char singleOccurrence( std::string_view str );
If for example in a string there is no such a character you could return the character '\0'.
if(i==str.length()-1)
{
return str[i];
}
else
if(str[i] != str[i+1])
{
return str[i];
}
}
After "else" statement you should put and "{", idk if that could be a problem. Second of all that if doesn't return something different, in both cases you return str[i]. Third I don't see iostream library included, don't forget to include if you want to use cout or cin(for reading from input). Also you call this function once in main and it will always give you the first letter after the letters will be sorted. You can try to use map, which is implemented in STL in map library. In map you can give your letter from that string as a key and key value will be the number of occurrences.
#include <algorithm>
#include <iostream>
#include <string>
#include <map>
using namespace std;
// char singleOccurrence(string str)
// {
// transform(str.begin(), str.end(), str.begin(), ::toupper);
// sort(str.begin(), str.end());
// for(int i=0; i<str.length(); i++)
// {
// if(i==str.length()-1)
// {
// return str[i];
// }
// else
// if(str[i] != str[i+1])
// {
// return str[i];
// }
// }
// }
int main()
{
map<char, int> occurrences;//declare a map, your key will be a char and you key value an int
string str="ala";
for(int i = 0; i < str.length(); i++){
occurrences[str[i]]++;
}
for(auto n : occurrences){
cout<<n.first<<" "<<n.second<<endl;//n will be an iterator, you use iterators to get map values
}//if you want to check the number and show only the letter with a single occurrence just add an if statement and check in key value (n.second) is equal to one
}
I hope this helps! :)
I am writing a piece of software, and It require me to handle data I get from a webpage with libcurl. When I get the data, for some reason it has extra line breaks in it. I need to figure out a way to only allow letters, numbers, and spaces. And remove everything else, including line breaks. Is there any easy way to do this? Thanks.
Write a function that takes a char and returns true if you want to remove that character or false if you want to keep it:
bool my_predicate(char c);
Then use the std::remove_if algorithm to remove the unwanted characters from the string:
std::string s = "my data";
s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());
Depending on your requirements, you may be able to use one of the Standard Library predicates, like std::isalnum, instead of writing your own predicate (you said you needed to match alphanumeric characters and spaces, so perhaps this doesn't exactly fit what you need).
If you want to use the Standard Library std::isalnum function, you will need a cast to disambiguate between the std::isalnum function in the C Standard Library header <cctype> (which is the one you want to use) and the std::isalnum in the C++ Standard Library header <locale> (which is not the one you want to use, unless you want to perform locale-specific string processing):
s.erase(std::remove_if(s.begin(), s.end(), (int(*)(int))std::isalnum), s.end());
This works equally well with any of the sequence containers (including std::string, std::vector and std::deque). This idiom is commonly referred to as the "erase/remove" idiom. The std::remove_if algorithm will also work with ordinary arrays. The std::remove_if makes only a single pass over the sequence, so it has linear time complexity.
Previous uses of std::isalnum won't compile with std::ptr_fun without passing the unary argument is requires, hence this solution with a lambda function should encapsulate the correct answer:
s.erase(std::remove_if(s.begin(), s.end(),
[]( auto const& c ) -> bool { return !std::isalnum(c); } ), s.end());
You could always loop through and just erase all non alphanumeric characters if you're using string.
#include <cctype>
size_t i = 0;
size_t len = str.length();
while(i < len){
if (!isalnum(str[i]) || str[i] == ' '){
str.erase(i,1);
len--;
}else
i++;
}
Someone better with the Standard Lib can probably do this without a loop.
If you're using just a char buffer, you can loop through and if a character is not alphanumeric, shift all the characters after it backwards one (to overwrite the offending character):
#include <cctype>
size_t buflen = something;
for (size_t i = 0; i < buflen; ++i)
if (!isalnum(buf[i]) || buf[i] != ' ')
memcpy(buf[i], buf[i + 1], --buflen - i);
Just extending James McNellis's code a little bit more. His function is deleting alnum characters instead of non-alnum ones.
To delete non-alnum characters from a string. (alnum = alphabetical or numeric)
Declare a function (isalnum returns 0 if passed char is not alnum)
bool isNotAlnum(char c) {
return isalnum(c) == 0;
}
And then write this
s.erase(remove_if(s.begin(), s.end(), isNotAlnum), s.end());
then your string is only with alnum characters.
Benchmarking the different methods.
If you are looking for a benchmark I made one.
(115830 cycles) 115.8ms -> using stringstream
( 40434 cycles) 40.4ms -> s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !isalnum(c); }), s.end());
( 40389 cycles) 40.4ms -> s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return ispunct(c); }), s.end());
( 42386 cycles) 42.4ms -> s.erase(remove_if(s.begin(), s.end(), not1(ptr_fun( (int(*)(int))isalnum ))), s.end());
( 42969 cycles) 43.0ms -> s.erase(remove_if(s.begin(), s.end(), []( auto const& c ) -> bool { return !isalnum(c); } ), s.end());
( 44829 cycles) 44.8ms -> alnum_from_libc(s) see below
( 24505 cycles) 24.5ms -> Puzzled? My method, see below
( 9717 cycles) 9.7ms -> using mask and bitwise operators
Original length: 8286208, current len with alnum only: 5822471
Stringstream gives terrible results (but we all know that)
The different answers already given gives about the same runtime
Doing it the C way consistently give better runtime (almost twice faster!), it is definitely worth considering, and on top of that it is compatible with C language.
My bitwise method (also C compatible) is more than 400% faster.
NB the selected answer had to be modified as it was keeping only the special characters
NB2: The test file is a (almost) 8192 kb text file with roughly 62 alnum and 12 special characters, randomly and evenly written.
Benchmark source code
#include <ctime>
#include <iostream>
#include <sstream>
#include <string>
#include <algorithm>
#include <locale> // ispunct
#include <cctype>
#include <fstream> // read file
#include <streambuf>
#include <sys/stat.h> // check if file exist
#include <cstring>
using namespace std;
bool exist(const char *name)
{
struct stat buffer;
return !stat(name, &buffer);
}
constexpr int SIZE = 8092 * 1024;
void keep_alnum(string &s) {
stringstream ss;
int i = 0;
for (i = 0; i < SIZE; i++)
if (isalnum(s[i]))
ss << s[i];
s = ss.str();
}
/* my method, best runtime */
void old_school(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
unsigned char c = s[i] - 0x30; // '0'
if (c < 10 || (c -= 0x11) < 26 || (c -= 0x20) < 26) // 0x30 + 0x11 = 'A' + 0x20 = 'a'
s[n++] = s[i];
}
s[n] = '\0';
}
void alnum_from_libc(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
if (isalnum(s[i]))
s[n++] = s[i];
}
s[n] = '\0';
}
#define benchmark(x) printf("\033[30m(%6.0lf cycles) \033[32m%5.1lfms\n\033[0m", x, x / (CLOCKS_PER_SEC / 1000))
int main(int ac, char **av) {
if (ac < 2) {
cout << "usage: ./a.out \"{your string} or ./a.out FILE \"{your file}\n";
return 1;
}
string s;
s.reserve(SIZE+1);
string s1;
s1.reserve(SIZE+1);
char s4[SIZE + 1], s5[SIZE + 1];
if (ac == 3) {
if (!exist(av[2])) {
for (size_t i = 0; i < SIZE; i++)
s4[i] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnoporstuvwxyz!##$%^&*()__+:\"<>?,./'"[rand() % 74];
s4[SIZE] = '\0';
ofstream ofs(av[2]);
if (ofs)
ofs << s4;
}
ifstream ifs(av[2]);
if (ifs) {
ifs.rdbuf()->pubsetbuf(s4, SIZE);
copy(istreambuf_iterator<char>(ifs), {}, s.begin());
}
else
cout << "error\n";
ifs.seekg(0, ios::beg);
s.assign((istreambuf_iterator<char>(ifs)), istreambuf_iterator<char>());
}
else
s = av[1];
double elapsedTime;
clock_t start;
bool is_different = false;
s1 = s;
start = clock();
keep_alnum(s1);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
string tmp = s1;
s1 = s;
start = clock();
s1.erase(std::remove_if(s1.begin(), s1.end(), [](char c) { return !isalnum(c); }), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(std::remove_if(s1.begin(), s1.end(), [](char c) { return ispunct(c); }), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(remove_if(s1.begin(), s1.end(), not1(ptr_fun( (int(*)(int))isalnum ))), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(remove_if(s1.begin(), s1.end(), []( auto const& c ) -> bool { return !isalnum(c); } ), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
memcpy(s4, s.c_str(), SIZE);
start = clock();
alnum_from_libc(s4);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s4);
memcpy(s4, s.c_str(), SIZE);
start = clock();
old_school(s4);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s4);
cout << "Original length: " << s.size() << ", current len with alnum only: " << strlen(s4) << endl;
// make sure that strings are equivalent
printf("\033[3%cm%s\n", ('3' + !is_different), !is_different ? "OK" : "KO");
return 0;
}
My solution
For the bitwise method you can check it directly on my github, basically I avoid branching instructions (if) thanks to the mask. I avoid posting bitwise operations with C++ tag, I get a lot of hate for it.
For the C style one, I iterate over the string and have two index: n for the characters we keep and i to go through the string, where we test one after another if it is a digit, a uppercase or a lowercase.
Add this function:
void strip_special_chars(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
unsigned char c = s[i] - 0x30;
if (c < 10 || (c -= 0x11) < 26 || (c -= 0x20) < 26) // 0x30 + 0x11 = 'A' + 0x20 = 'a'
s[n++] = s[i];
}
s[n] = '\0';
}
and use as:
char s1[s.size() + 1]
memcpy(s1, s.c_str(), s.size());
strip_special_chars(s1);
The remove_copy_if standard algorithm would be very appropriate for your case.
#include <cctype>
#include <string>
#include <functional>
std::string s = "Hello World!";
s.erase(std::remove_if(s.begin(), s.end(),
std::not1(std::ptr_fun(std::isalnum)), s.end()), s.end());
std::cout << s << std::endl;
Results in:
"HelloWorld"
You use isalnum to determine whether or not each character is alpha numeric, then use ptr_fun to pass the function to not1 which NOTs the returned value, leaving you with only the alphanumeric stuff you want.
You can use the remove-erase algorithm this way -
// Removes all punctuation
s.erase( std::remove_if(s.begin(), s.end(), &ispunct), s.end());
Below code should work just fine for given string s. It's utilizing <algorithm> and <locale> libraries.
std::string s("He!!llo Wo,#rld! 12 453");
s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !std::isalnum(c); }), s.end());
The mentioned solution
s.erase( std::remove_if(s.begin(), s.end(), &std::ispunct), s.end());
is very nice, but unfortunately doesn't work with characters like 'Ñ' in Visual Studio (debug mode), because of this line:
_ASSERTE((unsigned)(c + 1) <= 256)
in isctype.c
So, I would recommend something like this:
inline int my_ispunct( int ch )
{
return std::ispunct(unsigned char(ch));
}
...
s.erase( std::remove_if(s.begin(), s.end(), &my_ispunct), s.end());
The following works for me.
str.erase(std::remove_if(str.begin(), str.end(), &ispunct), str.end());
str.erase(std::remove_if(str.begin(), str.end(), &isspace), str.end());
void remove_spaces(string data)
{ int i=0,j=0;
while(i<data.length())
{
if (isalpha(data[i]))
{
data[i]=data[i];
i++;
}
else
{
data.erase(i,1);}
}
cout<<data;
}
There are a few examples about this question. However most of the answers are not what I am looking for.
I am looking for a way to implement an efficient and easy function rather than using boost or any other non STL libraries. If you ask me why, in most coding competitions and interviews, you are not allowed to use them.
Here is the closest that I can approach:
vector<string> SplitString(const char *str, char c)
{
vector<string> result;
do {
const char *begin = str;
while(*str != c && *str) {
str++;
}
result.push_back(string(begin, str));
} while (0 != *str++);
return result;
}
int main() {
string mainString = "This is a sentence. Another sentence. The third sentence. This is the last sentence.";
vector<string> sentences;
sentences = SplitString(mainString.c_str(), '.');
while (!sentences.empty()) {
cout << sentences.back() << endl;
sentences.pop_back();
}
return 0;
}
Now the problem with this is, it can only have a char delimiter not string. I have thought of implementing a few ways but they seemed way too complex. The easiest one that I thought was, convert delimiter to char array use c as the first char of the delimiter char array after this:
while(*str != c && *str) {
str++;
}
const char *beginDelim = *cArr;
while(1) {
if (*str == *cArr && *str && *cArr) {
str++;
cArr++;
}
else if (!*cArr) {
break;
}
else if (*cArr) {
cArr = beginDelim;
}
}
And the code continues from result.push_back() part.
So I was wondering if are there any way to implement an efficient and easy function for splitting a string with a string delimiter?
Generally speaking, a string is a char pointer. So you should search for the first character in the delimeter, then check the very next character. Also in looking at your code I am not sure that while (0 != *str++) is doing what you think it is. I think you mean for it to be null terminated.
something like this should do it:
vector<string> SplitString(const char* str,const char* d) {
vector<string> result;
size_t len = strlen(d);
const char* start = str;
while ( str = strstr(start,d) ) {
result.push_back(string(start,len));
start = str + len;
}
result.push_back(start);
return result;
}
How's this:
#include <vector>
#include <algorithm>
#include <string>
using namespace std;
vector<string> SplitString(const string &str, const string &delim)
{
vector<string> ret;
string::const_iterator prev = str.begin();
for (string::const_iterator i = str.begin(); i < str.end() - delim.length()+1; ++i)
{
if (equal(delim.begin(), delim.end(), i)) {
ret.push_back(string(prev,i));
i += delim.length()-1;
prev = i+1;
}
}
ret.push_back(string(prev,str.end()));
return ret;
}
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> SplitString(string str, const string &delim) {
vector<string> result;
size_t found;
while((found = str.find(delim)) != string::npos) {
result.push_back(str.substr(0, found));
str = str.substr(found + delim.size());
}
return result;
}
int main() {
string mainString = "This is a sentence. Another sentence. The third sentence. This is the last sentence.";
vector<string> sentences;
sentences = SplitString(mainString, ".");
for(auto& sentence : sentences) {
cout << sentence << endl;
}
return 0;
}
vector<string>split(string str, const char d){
string temp;
vector<string>vct;
for(int i = 0; str[i] != '\0'; i++){
if(str[i] != d){
temp += str[i];
}else if(!empty(temp)){
vct.push_back(temp), temp.clear();
}
}
vct.push_back(temp);
return vct;
}
Takes two arguments
const char d as delimiter.
string str as string to be splitted.
stores splitted string in a vector and returns it.
Although, I'm not sure about efficiency of this code. :)
I am writing a piece of software, and It require me to handle data I get from a webpage with libcurl. When I get the data, for some reason it has extra line breaks in it. I need to figure out a way to only allow letters, numbers, and spaces. And remove everything else, including line breaks. Is there any easy way to do this? Thanks.
Write a function that takes a char and returns true if you want to remove that character or false if you want to keep it:
bool my_predicate(char c);
Then use the std::remove_if algorithm to remove the unwanted characters from the string:
std::string s = "my data";
s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());
Depending on your requirements, you may be able to use one of the Standard Library predicates, like std::isalnum, instead of writing your own predicate (you said you needed to match alphanumeric characters and spaces, so perhaps this doesn't exactly fit what you need).
If you want to use the Standard Library std::isalnum function, you will need a cast to disambiguate between the std::isalnum function in the C Standard Library header <cctype> (which is the one you want to use) and the std::isalnum in the C++ Standard Library header <locale> (which is not the one you want to use, unless you want to perform locale-specific string processing):
s.erase(std::remove_if(s.begin(), s.end(), (int(*)(int))std::isalnum), s.end());
This works equally well with any of the sequence containers (including std::string, std::vector and std::deque). This idiom is commonly referred to as the "erase/remove" idiom. The std::remove_if algorithm will also work with ordinary arrays. The std::remove_if makes only a single pass over the sequence, so it has linear time complexity.
Previous uses of std::isalnum won't compile with std::ptr_fun without passing the unary argument is requires, hence this solution with a lambda function should encapsulate the correct answer:
s.erase(std::remove_if(s.begin(), s.end(),
[]( auto const& c ) -> bool { return !std::isalnum(c); } ), s.end());
You could always loop through and just erase all non alphanumeric characters if you're using string.
#include <cctype>
size_t i = 0;
size_t len = str.length();
while(i < len){
if (!isalnum(str[i]) || str[i] == ' '){
str.erase(i,1);
len--;
}else
i++;
}
Someone better with the Standard Lib can probably do this without a loop.
If you're using just a char buffer, you can loop through and if a character is not alphanumeric, shift all the characters after it backwards one (to overwrite the offending character):
#include <cctype>
size_t buflen = something;
for (size_t i = 0; i < buflen; ++i)
if (!isalnum(buf[i]) || buf[i] != ' ')
memcpy(buf[i], buf[i + 1], --buflen - i);
Just extending James McNellis's code a little bit more. His function is deleting alnum characters instead of non-alnum ones.
To delete non-alnum characters from a string. (alnum = alphabetical or numeric)
Declare a function (isalnum returns 0 if passed char is not alnum)
bool isNotAlnum(char c) {
return isalnum(c) == 0;
}
And then write this
s.erase(remove_if(s.begin(), s.end(), isNotAlnum), s.end());
then your string is only with alnum characters.
Benchmarking the different methods.
If you are looking for a benchmark I made one.
(115830 cycles) 115.8ms -> using stringstream
( 40434 cycles) 40.4ms -> s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !isalnum(c); }), s.end());
( 40389 cycles) 40.4ms -> s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return ispunct(c); }), s.end());
( 42386 cycles) 42.4ms -> s.erase(remove_if(s.begin(), s.end(), not1(ptr_fun( (int(*)(int))isalnum ))), s.end());
( 42969 cycles) 43.0ms -> s.erase(remove_if(s.begin(), s.end(), []( auto const& c ) -> bool { return !isalnum(c); } ), s.end());
( 44829 cycles) 44.8ms -> alnum_from_libc(s) see below
( 24505 cycles) 24.5ms -> Puzzled? My method, see below
( 9717 cycles) 9.7ms -> using mask and bitwise operators
Original length: 8286208, current len with alnum only: 5822471
Stringstream gives terrible results (but we all know that)
The different answers already given gives about the same runtime
Doing it the C way consistently give better runtime (almost twice faster!), it is definitely worth considering, and on top of that it is compatible with C language.
My bitwise method (also C compatible) is more than 400% faster.
NB the selected answer had to be modified as it was keeping only the special characters
NB2: The test file is a (almost) 8192 kb text file with roughly 62 alnum and 12 special characters, randomly and evenly written.
Benchmark source code
#include <ctime>
#include <iostream>
#include <sstream>
#include <string>
#include <algorithm>
#include <locale> // ispunct
#include <cctype>
#include <fstream> // read file
#include <streambuf>
#include <sys/stat.h> // check if file exist
#include <cstring>
using namespace std;
bool exist(const char *name)
{
struct stat buffer;
return !stat(name, &buffer);
}
constexpr int SIZE = 8092 * 1024;
void keep_alnum(string &s) {
stringstream ss;
int i = 0;
for (i = 0; i < SIZE; i++)
if (isalnum(s[i]))
ss << s[i];
s = ss.str();
}
/* my method, best runtime */
void old_school(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
unsigned char c = s[i] - 0x30; // '0'
if (c < 10 || (c -= 0x11) < 26 || (c -= 0x20) < 26) // 0x30 + 0x11 = 'A' + 0x20 = 'a'
s[n++] = s[i];
}
s[n] = '\0';
}
void alnum_from_libc(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
if (isalnum(s[i]))
s[n++] = s[i];
}
s[n] = '\0';
}
#define benchmark(x) printf("\033[30m(%6.0lf cycles) \033[32m%5.1lfms\n\033[0m", x, x / (CLOCKS_PER_SEC / 1000))
int main(int ac, char **av) {
if (ac < 2) {
cout << "usage: ./a.out \"{your string} or ./a.out FILE \"{your file}\n";
return 1;
}
string s;
s.reserve(SIZE+1);
string s1;
s1.reserve(SIZE+1);
char s4[SIZE + 1], s5[SIZE + 1];
if (ac == 3) {
if (!exist(av[2])) {
for (size_t i = 0; i < SIZE; i++)
s4[i] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnoporstuvwxyz!##$%^&*()__+:\"<>?,./'"[rand() % 74];
s4[SIZE] = '\0';
ofstream ofs(av[2]);
if (ofs)
ofs << s4;
}
ifstream ifs(av[2]);
if (ifs) {
ifs.rdbuf()->pubsetbuf(s4, SIZE);
copy(istreambuf_iterator<char>(ifs), {}, s.begin());
}
else
cout << "error\n";
ifs.seekg(0, ios::beg);
s.assign((istreambuf_iterator<char>(ifs)), istreambuf_iterator<char>());
}
else
s = av[1];
double elapsedTime;
clock_t start;
bool is_different = false;
s1 = s;
start = clock();
keep_alnum(s1);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
string tmp = s1;
s1 = s;
start = clock();
s1.erase(std::remove_if(s1.begin(), s1.end(), [](char c) { return !isalnum(c); }), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(std::remove_if(s1.begin(), s1.end(), [](char c) { return ispunct(c); }), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(remove_if(s1.begin(), s1.end(), not1(ptr_fun( (int(*)(int))isalnum ))), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
s1 = s;
start = clock();
s1.erase(remove_if(s1.begin(), s1.end(), []( auto const& c ) -> bool { return !isalnum(c); } ), s1.end());
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s1.c_str());
memcpy(s4, s.c_str(), SIZE);
start = clock();
alnum_from_libc(s4);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s4);
memcpy(s4, s.c_str(), SIZE);
start = clock();
old_school(s4);
elapsedTime = (clock() - start);
benchmark(elapsedTime);
is_different |= !!strcmp(tmp.c_str(), s4);
cout << "Original length: " << s.size() << ", current len with alnum only: " << strlen(s4) << endl;
// make sure that strings are equivalent
printf("\033[3%cm%s\n", ('3' + !is_different), !is_different ? "OK" : "KO");
return 0;
}
My solution
For the bitwise method you can check it directly on my github, basically I avoid branching instructions (if) thanks to the mask. I avoid posting bitwise operations with C++ tag, I get a lot of hate for it.
For the C style one, I iterate over the string and have two index: n for the characters we keep and i to go through the string, where we test one after another if it is a digit, a uppercase or a lowercase.
Add this function:
void strip_special_chars(char *s) {
int n = 0;
for (int i = 0; i < SIZE; i++) {
unsigned char c = s[i] - 0x30;
if (c < 10 || (c -= 0x11) < 26 || (c -= 0x20) < 26) // 0x30 + 0x11 = 'A' + 0x20 = 'a'
s[n++] = s[i];
}
s[n] = '\0';
}
and use as:
char s1[s.size() + 1]
memcpy(s1, s.c_str(), s.size());
strip_special_chars(s1);
The remove_copy_if standard algorithm would be very appropriate for your case.
#include <cctype>
#include <string>
#include <functional>
std::string s = "Hello World!";
s.erase(std::remove_if(s.begin(), s.end(),
std::not1(std::ptr_fun(std::isalnum)), s.end()), s.end());
std::cout << s << std::endl;
Results in:
"HelloWorld"
You use isalnum to determine whether or not each character is alpha numeric, then use ptr_fun to pass the function to not1 which NOTs the returned value, leaving you with only the alphanumeric stuff you want.
You can use the remove-erase algorithm this way -
// Removes all punctuation
s.erase( std::remove_if(s.begin(), s.end(), &ispunct), s.end());
Below code should work just fine for given string s. It's utilizing <algorithm> and <locale> libraries.
std::string s("He!!llo Wo,#rld! 12 453");
s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !std::isalnum(c); }), s.end());
The mentioned solution
s.erase( std::remove_if(s.begin(), s.end(), &std::ispunct), s.end());
is very nice, but unfortunately doesn't work with characters like 'Ñ' in Visual Studio (debug mode), because of this line:
_ASSERTE((unsigned)(c + 1) <= 256)
in isctype.c
So, I would recommend something like this:
inline int my_ispunct( int ch )
{
return std::ispunct(unsigned char(ch));
}
...
s.erase( std::remove_if(s.begin(), s.end(), &my_ispunct), s.end());
The following works for me.
str.erase(std::remove_if(str.begin(), str.end(), &ispunct), str.end());
str.erase(std::remove_if(str.begin(), str.end(), &isspace), str.end());
void remove_spaces(string data)
{ int i=0,j=0;
while(i<data.length())
{
if (isalpha(data[i]))
{
data[i]=data[i];
i++;
}
else
{
data.erase(i,1);}
}
cout<<data;
}
How can I count the number of "_" in a string like "bla_bla_blabla_bla"?
#include <algorithm>
std::string s = "a_b_c";
std::string::difference_type n = std::count(s.begin(), s.end(), '_');
Pseudocode:
count = 0
For each character c in string s
Check if c equals '_'
If yes, increase count
EDIT: C++ example code:
int count_underscores(string s) {
int count = 0;
for (int i = 0; i < s.size(); i++)
if (s[i] == '_') count++;
return count;
}
Note that this is code to use together with std::string, if you're using char*, replace s.size() with strlen(s).
Also note: I can understand you want something "as small as possible", but I'd suggest you to use this solution instead. As you see you can use a function to encapsulate the code for you so you won't have to write out the for loop everytime, but can just use count_underscores("my_string_") in the rest of your code. Using advanced C++ algorithms is certainly possible here, but I think it's overkill.
Old-fashioned solution with appropriately named variables. This gives the code some spirit.
#include <cstdio>
int _(char*__){int ___=0;while(*__)___='_'==*__++?___+1:___;return ___;}int main(){char*__="_la_blba_bla__bla___";printf("The string \"%s\" contains %d _ characters\n",__,_(__));}
Edit: about 8 years later, looking at this answer I'm ashamed I did this (even though I justified it to myself as a snarky poke at a low-effort question). This is toxic and not OK. I'm not removing the post; I'm adding this apology to help shifting the atmosphere on StackOverflow. So OP: I apologize and I hope you got your homework right despite my trolling and that answers like mine did not discourage you from participating on the site.
Using the lambda function to check the character is "_" then the only count will be incremented else not a valid character
std::string s = "a_b_c";
size_t count = std::count_if( s.begin(), s.end(), []( char c ){return c =='_';});
std::cout << "The count of numbers: " << count << std::endl;
#include <boost/range/algorithm/count.hpp>
std::string str = "a_b_c";
int cnt = boost::count(str, '_');
You name it... Lambda version... :)
using namespace boost::lambda;
std::string s = "a_b_c";
std::cout << std::count_if (s.begin(), s.end(), _1 == '_') << std::endl;
You need several includes... I leave you that as an exercise...
Count character occurrences in a string is easy:
#include <bits/stdc++.h>
using namespace std;
int main()
{
string s="Sakib Hossain";
int cou=count(s.begin(),s.end(),'a');
cout<<cou;
}
There are several methods of std::string for searching, but find is probably what you're looking for. If you mean a C-style string, then the equivalent is strchr. However, in either case, you can also use a for loop and check each character—the loop is essentially what these two wrap up.
Once you know how to find the next character given a starting position, you continually advance your search (i.e. use a loop), counting as you go.
I would have done it this way :
#include <iostream>
#include <string>
using namespace std;
int main()
{
int count = 0;
string s("Hello_world");
for (int i = 0; i < s.size(); i++)
{
if (s.at(i) == '_')
count++;
}
cout << endl << count;
cin.ignore();
return 0;
}
You can find out occurrence of '_' in source string by using string functions.
find() function takes 2 arguments , first - string whose occurrences we want to find out and second argument takes starting position.While loop is use to find out occurrence till the end of source string.
example:
string str2 = "_";
string strData = "bla_bla_blabla_bla_";
size_t pos = 0,pos2;
while ((pos = strData.find(str2, pos)) < strData.length())
{
printf("\n%d", pos);
pos += str2.length();
}
The range based for loop comes in handy
int countUnderScores(string str)
{
int count = 0;
for (char c: str)
if (c == '_') count++;
return count;
}
int main()
{
string str = "bla_bla_blabla_bla";
int count = countUnderScores(str);
cout << count << endl;
}
I would have done something like that :)
const char* str = "bla_bla_blabla_bla";
char* p = str;
unsigned int count = 0;
while (*p != '\0')
if (*p++ == '_')
count++;
Try
#include <iostream>
#include <string>
using namespace std;
int WordOccurrenceCount( std::string const & str, std::string const & word )
{
int count(0);
std::string::size_type word_pos( 0 );
while ( word_pos!=std::string::npos )
{
word_pos = str.find(word, word_pos );
if ( word_pos != std::string::npos )
{
++count;
// start next search after this word
word_pos += word.length();
}
}
return count;
}
int main()
{
string sting1="theeee peeeearl is in theeee riveeeer";
string word1="e";
cout<<word1<<" occurs "<<WordOccurrenceCount(sting1,word1)<<" times in ["<<sting1 <<"] \n\n";
return 0;
}
public static void main(String[] args) {
char[] array = "aabsbdcbdgratsbdbcfdgs".toCharArray();
char[][] countArr = new char[array.length][2];
int lastIndex = 0;
for (char c : array) {
int foundIndex = -1;
for (int i = 0; i < lastIndex; i++) {
if (countArr[i][0] == c) {
foundIndex = i;
break;
}
}
if (foundIndex >= 0) {
int a = countArr[foundIndex][1];
countArr[foundIndex][1] = (char) ++a;
} else {
countArr[lastIndex][0] = c;
countArr[lastIndex][1] = '1';
lastIndex++;
}
}
for (int i = 0; i < lastIndex; i++) {
System.out.println(countArr[i][0] + " " + countArr[i][1]);
}
}