I know how to replace all occurrences of a character with another character in a string (How to replace all occurrences of a character in string?)
But what if I want to replace all even digits in a string with a given string? I am confused between std::replace, std::replace_if and the member replace/find functions of basic_string, because the function signatures require old_val and new_val to be the same type, but here old_val is a char and new_val is a string. Is there an efficient way to do this without multiple loops?
e.g. if the input string is
"asjkdn3vhsjdvcn2asjnbd2vd"
and the replacement text is
"whatever"
, the result should be
"asjkdn3vhsjdvcnwhateverasjnbdwhatevervd"
You can use std::string::replace() to replace a character with a string. A working example is below:
#include <string>
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string_view>
void replace_even_with_string(std::string &inout)
{
auto is_even = [](char ch)
{
return std::isdigit(static_cast<unsigned char>(ch)) && ((ch - '0') % 2) == 0;
};
std::string_view replacement_str = "whatever";
// Find each even digit in turn; after a replacement, resume the search
// just past the inserted text so it is never scanned again.
for (std::string::size_type pos{};
(pos = (std::find_if(inout.begin() + pos, inout.end(), is_even) - inout.begin())) < inout.length();
pos += replacement_str.length())
{
inout.replace(pos, 1, replacement_str);
}
}
int main()
{
std::string test = "asjkdn3vhsjdvcn2asjnbd2vd";
std::cout << test << std::endl;
replace_even_with_string(test);
std::cout << test << std::endl;
}
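The same replacement can also be sketched as a single pass that builds a new string, which avoids shifting the tail of the string on every replace() call. This is only an illustration: replace_even_digits is a hypothetical helper name, not something from the question, and it assumes C++17 for std::string_view.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: copies `input` into a new string, writing
// `replacement` in place of every even digit (0, 2, 4, 6, 8).
std::string replace_even_digits(std::string_view input, std::string_view replacement)
{
    std::string out;
    out.reserve(input.size());
    for (char ch : input)
    {
        const bool even_digit =
            std::isdigit(static_cast<unsigned char>(ch)) && ((ch - '0') % 2) == 0;
        if (even_digit)
            out.append(replacement);  // substitute the whole replacement text
        else
            out.push_back(ch);        // copy every other character unchanged
    }
    return out;
}
```

Called with the input from the question and "whatever" as the replacement, it should give the result the question asks for.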
While using a regex can add unnecessary complexity in many cases, here it's actually simple to read and write:
std::string str = /* ... some text ... */;
std::regex r{R"~~([02468])~~"}; // this will match even digits
str = std::regex_replace(str, r, "rep"); // replace with the needed text
// and overwrite string
I want a function that splits text by an array of delimiters. I have a demo that works perfectly, but it is really slow. Here is an example of the parameters.
text:
"pop-pap-bab bob"
vector of delimiters:
"-"," "
the result:
"pop", "-", "pap", "-", "bab", "bob"
So the function loops through the string and tries to find delimiters. If it finds one, it pushes the text and the delimiter that was found to the result array; if the text only contains spaces or is empty, the text is not pushed.
std::string replace(std::string str,std::string old,std::string new_str){
size_t pos = 0;
while ((pos = str.find(old)) != std::string::npos) {
str.replace(pos, old.length(), new_str);
}
return str;
}
std::vector<std::string> split_with_delimeter(std::string str,std::vector<std::string> delimeters){
std::vector<std::string> result;
std::string token;
int flag = 0;
for(int i=0;i<(int)str.size();i++){
for(int j=0;j<(int)delimeters.size();j++){
if(str.substr(i,delimeters.at(j).size()) == delimeters.at(j)){
if(token != ""){
result.push_back(token);
token = "";
}
if(replace(delimeters.at(j)," ","") != ""){
result.push_back(delimeters.at(j));
}
i += delimeters.at(j).size()-1;
flag = 1;
break;
}
}
if(flag == 0){token += str.at(i);}
flag = 0;
}
if(token != ""){
result.push_back(token);
}
return result;
}
My issue is that the function is really slow since it has 3 loops. I am wondering if anyone knows how to make the function faster. I am sorry if I wasn't clear enough; my English isn't the best.
It might be a good idea to use Boost.Xpressive. It is a powerful tool for many string operations and can be easier than struggling with string::find_xx, hand-written for-loops or std::regex.
Concise explanation:
+as_xpr(" ") matches one or more repetitions, like + in a regex, and the prefix "-" makes the match non-greedy (shortest match).
If you define a static regex as sregex rex = "(" >> (+_w | +as_xpr("_")) >> ":" >> +_d >> ")", it would match (port_num:8080). Here ">>" concatenates sub-expressions and (+_w | +as_xpr("_")) matches one or more word characters or underscores.
#include <algorithm>
#include <cstdio>
#include <vector>
#include <string>
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace std;
using namespace boost::xpressive;
int main() {
string source = "Nigeria is a multi&&national state in--habited by more than 2;;50 ethnic groups speak###ing 500 distinct languages";
vector<string> delimiters{ " ", " ", "&&", "-", ";;", "###"};
vector<sregex> pss{ -+as_xpr(delimiters.front()) };
for (const auto& d : delimiters) pss.push_back(pss.back() | -+as_xpr(d));
vector<string> ret;
size_t pos = 0;
auto push = [&](auto s, auto e) { ret.push_back(source.substr(s, e)); };
for_each(sregex_iterator(source.begin(), source.end(), pss.back()), {}, [&](smatch const& m) {
if (m.position() - pos) push(pos, m.position() - pos);
pos = m.position() + m.str().size();
}
);
push(pos, source.size() - pos);
for (auto& s : ret) printf("%s\n", s.c_str());
}
The output is split on multiple string delimiters:
Nigeria
is
a
multi
national
state
in
habited
by
more
than
2
50
ethnic
groups
speak
ing
500
distinct
languages
Maybe, as an alternative, you could use a regex? But it may also be too slow for you . . .
With a regex life would be very simple.
Please see the following example:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <iterator>
const std::regex re(R"((\w+|[\- ]))");
int main() {
std::string s{"pop-pap-bab bob"};
std::vector<std::string> part{std::sregex_token_iterator(s.begin(),s.end(),re),{}};
for (const std::string& p : part) std::cout << p << '\n';
}
We use std::sregex_token_iterator in combination with std::vector's range constructor to extract everything the regex matches and put it all into the std::vector.
The regex itself is also simple. It specifies words or delimiters.
Maybe it's worth a try . . .
NOTE: You've complained that your code is slow, but it's important to understand that most answers only offer options that might speed up the program. Even if the author of an option measured a speed-up, that option may be slower on your machine, so do not forget to measure the execution speed yourself.
If I were you, I would create a separate function that receives an array of strings and outputs an array of delimited strings. The problem with this approach may be that if one delimiter includes another delimiter, the result may not be what you expect, but it will be easier to iterate through different options for string splitting and find the best one.
My solution would look like this (though it requires C++20):
#include <iomanip>
#include <iostream>
#include <ranges>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_elems_of_array(const std::vector<std::string>& array, const std::string& delim)
{
std::vector<std::string> result;
for(const auto& str: array)
{
for (const auto word : std::views::split(str, delim))
{
std::string chunk(word.begin(), word.end());
if(!chunk.empty() && chunk != " ")
result.push_back(chunk + delim);
}
}
return result;
}
std::vector<std::string> split_string(std::string str, std::vector<std::string> delims)
{
std::vector<std::string> result = {std::string(str)};
for(const auto&delim: delims)
result = split_elems_of_array(result, delim);
return {result.begin(), result.end()};
}
On my machine, my approach is about 76 times faster: 67 ms versus 5112 ms. The string has 1000000 characters and there are 100 delimiters of length 100.
Here is the standard splitting algorithm. If you split pop-pap-bab bob by {'-', ' '} it gives you ["pop", "pap", "bab", "bob"]. It does not store the delimiters and does not check for empty tokens, but you can change it to do those things too.
Define a vector of strings named result.
Define a string variable named buffer.
Loop over your string; if the current character is not a delimiter, append it to buffer.
If the current character is a delimiter, append buffer to result and clear buffer.
After the loop, append any remaining buffer to result and return result.
std::vector<std::string> split(std::string str, std::vector<char> delimiters)
{
std::vector<std::string> result;
std::string buffer;
for (const auto ch : str)
{
if (std::find(delimiters.begin(), delimiters.end(), ch) == delimiters.end())
buffer += ch;
else
{
result.insert(result.end(), buffer);
buffer.clear();
}
}
if (buffer.length())
result.insert(result.end(), buffer);
return result;
}
Its time complexity is O(n·m), where n is the length of the string and m is the number of delimiters.
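As a rough sketch of how the loop could be changed to keep the delimiters and skip empty tokens, as the question asks: split_keep_delimiters is a hypothetical name, the delimiters are assumed to be single characters passed as one string, and a space delimiter is assumed to be dropped rather than stored.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical variant of split(): keeps each delimiter as its own token
// (except a space) and never emits an empty token.
std::vector<std::string> split_keep_delimiters(std::string_view text, std::string_view delimiters)
{
    std::vector<std::string> result;
    std::string buffer;
    for (char ch : text)
    {
        if (delimiters.find(ch) == std::string_view::npos)
        {
            buffer += ch;  // ordinary character: extend the current token
            continue;
        }
        if (!buffer.empty())
        {
            result.push_back(buffer);  // flush the token before the delimiter
            buffer.clear();
        }
        if (ch != ' ')
            result.push_back(std::string(1, ch));  // store the delimiter itself
    }
    if (!buffer.empty())
        result.push_back(buffer);  // flush the trailing token
    return result;
}
```

For "pop-pap-bab bob" with the delimiters "- " this should yield "pop", "-", "pap", "-", "bab", "bob". The cost is still proportional to n·m, since each character is checked against the delimiter list.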
I'm really stuck here. I can't edit the main function, and inside it there is a function call whose only parameter is the string. How can I make this function put each word from the string into a vector, without using the auto keyword? I realize that this code is probably really wrong, but it's my best attempt at what it should look like.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
vector<string> extract_words(const char * sentence[])
{
string word = "";
vector<string> list;
for (int i = 0; i < sentence.size(); ++i)
{
while (sentence[i] != ' ')
{
word = word + sentence[i];
}
list.push_back(word);
}
}
int main()
{
sentence = "Help me please" /*In the actual code a function call is here that gets input sentence.*/
if (sentence.length() > 0)
{
words = extract_words(sentence);
}
}
Do you know how to read "words" from std::cin?
Then you can put that string in a std::istringstream which works like std::cin but for "reading" strings instead.
Use the stream extract operator >> in a loop to get all the words one by one, and add them to the vector.
Perhaps something like:
std::vector<std::string> get_all_words(std::string const& string)
{
std::vector<std::string> words;
std::istringstream in(string);
std::string word;
while (in >> word)
{
words.push_back(word);
}
return words;
}
With a little more knowledge of C++ and its standard classes and functions, you can actually make the function a lot shorter:
std::vector<std::string> get_all_words(std::string const& string)
{
std::istringstream in(string);
return std::vector<std::string>(std::istream_iterator<std::string>(in),
std::istream_iterator<std::string>());
}
I recommend making the argument to the function a const std::string& instead of const char * sentence[]. A std::string has many member functions, like find_first_of, find_first_not_of and substr and more that could help a lot.
Here's an example using those mentioned:
std::vector<std::string> extract_words(const std::string& sentence)
{
/* Control char's, "whitespaces", that we don't want in our words:
\a audible bell
\b backspace
\f form feed
\n line feed
\r carriage return
\t horizontal tab
\v vertical tab
*/
static const char whitespaces[] = " \t\n\r\a\b\f\v";
std::vector<std::string> list;
std::size_t begin = 0;
while(true)
{
// Skip whitespaces by finding the first non-whitespace, starting at
// "begin":
begin = sentence.find_first_not_of(whitespaces, begin);
// If no non-whitespace char was found, break out:
if(begin == std::string::npos) break;
// Search for a whitespace starting at "begin + 1":
std::size_t end = sentence.find_first_of(whitespaces, begin + 1);
// Store the result by creating a substring from "begin" with the
// length "end - begin":
list.push_back(sentence.substr(begin, end - begin));
// If no whitespace was found, break out:
if(end == std::string::npos) break;
// Set "begin" to the char after the found whitespace before the loop
// makes another lap:
begin = end + 1;
}
return list;
}
With the added restriction "no breaks", this could be a variant. It does exactly the same as the above, but without using break:
std::vector<std::string> extract_words(const std::string& sentence)
{
static const char whitespaces[] = " \t\n\r\a\b\f\v";
std::vector<std::string> list;
std::size_t begin = 0;
bool loop = true;
while(loop)
{
begin = sentence.find_first_not_of(whitespaces, begin);
if(begin == std::string::npos) {
loop = false;
} else {
std::size_t end = sentence.find_first_of(whitespaces, begin + 1);
list.push_back(sentence.substr(begin, end - begin));
if(end == std::string::npos) {
loop = false;
} else {
begin = end + 1;
}
}
}
return list;
}
I want to count how many unique words are in string 's', where punctuation and newline characters (\n) separate the words. So far I've used the logical OR operator to check how many word separators are in the string, and added 1 to the result to get the number of words in s.
My current code returns 12 as the number of words. Since 'ab', 'AB', 'aB', 'Ab' (and likewise the 'zzzz' variants) are all the same word and not unique, how can I ignore the variants of a word? I followed the link: http://www.cplusplus.com/reference/algorithm/unique/, but that reference counts unique items in a vector, and I am using a string, not a vector.
Here is my code:
#include <iostream>
#include <string>
using namespace std;
bool isWordSeparator(char & c) {
return c == ' ' || c == '-' || c == '\n' || c == '?' || c == '.' || c == ','
|| c == '?' || c == '!' || c == ':' || c == ';';
}
int countWords(string s) {
int wordCount = 0;
if (s.empty()) {
return 0;
}
for (int x = 0; x < s.length(); x++) {
if (isWordSeparator(s.at(x))) {
wordCount++;
}
}
return wordCount+1;
}
int main() {
string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
int number_of_words = countWords(s);
cout << "Number of Words: " << number_of_words << endl;
return 0;
}
What you need to make your code case-insensitive is tolower().
You can apply it to your original string using std::transform:
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
I should add however that your current code is much closer to C than to C++, perhaps you should check out what standard library has to offer.
I suggest istringstream + istream_iterator for tokenizing and either unique_copy or set for getting rid of the duplicates, like this: https://ideone.com/nb4BEH
You could create a set of strings, save the position of the last separator (starting with 0) and use substr to extract each word, then insert it into the set. When done, just return the set's size.
You could make the whole operation easier by using a split function. std::string has no split member, but a library function such as boost::split tokenizes the string for you. All you have to do is insert all of the returned elements into the set and again return its size.
Edit: as per comments, you need a custom comparator to ignore case for comparisons.
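A minimal sketch of the set-plus-substr idea, under a few assumptions: count_unique_words is a hypothetical name, any whitespace or punctuation character is treated as a separator, and each word is lowercased before insertion instead of using a custom comparator.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: counts distinct words, ignoring case.
// Assumption: whitespace and punctuation characters separate words.
std::size_t count_unique_words(const std::string& s)
{
    std::set<std::string> words;
    std::size_t start = 0;  // position just after the last separator
    for (std::size_t i = 0; i <= s.size(); ++i)
    {
        const bool at_end = (i == s.size());
        const unsigned char c = at_end ? ' ' : static_cast<unsigned char>(s[i]);
        if (at_end || std::isspace(c) || std::ispunct(c))
        {
            if (i > start)  // skip empty pieces between adjacent separators
            {
                std::string word = s.substr(start, i - start);
                for (char& ch : word)
                    ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
                words.insert(word);  // the set silently drops duplicates
            }
            start = i + 1;
        }
    }
    return words.size();
}
```

For the string in the question this should report 2 unique words, since every token is a case variant of either "ab" or "zzzz".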
First of all I'd suggest rewriting isWordSeparator like this:
bool isWordSeparator(char c) {
return std::isspace(static_cast<unsigned char>(c)) || std::ispunct(static_cast<unsigned char>(c));
}
since your current implementation doesn't handle all punctuation and whitespace characters, such as \t or +.
Also, incrementing wordCount whenever isWordSeparator is true is incorrect, for example if you have adjacent separators like ?!.
So, a less error-prone approach would be to replace all separators with spaces and then iterate over the words, inserting them into an (unordered) set:
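#include <algorithm>
#include <cctype>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_set>
int countWords(std::string s) {
// Turn every separator into a space and lowercase everything else.
std::transform(s.begin(), s.end(), s.begin(), [](char c) -> char {
if (isWordSeparator(c)) {
return ' ';
}
return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
});
std::unordered_set<std::string> uniqWords;
std::stringstream ss(s);
// operator>> splits on whitespace; the set discards duplicates.
std::copy(std::istream_iterator<std::string>(ss), std::istream_iterator<std::string>(), std::inserter(uniqWords, uniqWords.end()));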
return uniqWords.size();
}
While splitting the string into words, insert all words into a std::set. This will get rid of the duplicates. Then it's just a matter of calling set::size() to get the number of unique words.
I'm using the boost::split() function from the Boost String Algorithms library in my solution, because it is almost standard nowadays.
Explanations in the comments in code...
#include <iostream>
#include <string>
#include <set>
#include <boost/algorithm/string.hpp>
using namespace std;
// Function suggested by user 'mshrbkv':
bool isWordSeparator(char c) {
return std::isspace(static_cast<unsigned char>(c)) || std::ispunct(static_cast<unsigned char>(c));
}
// This is used to make the set case-insensitive.
// Alternatively you could call boost::to_lower() to make the
// string all lowercase before calling boost::split().
struct IgnoreCaseCompare {
bool operator()( const std::string& a, const std::string& b ) const {
return boost::ilexicographical_compare( a, b );
}
};
int main()
{
string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
// Define a set that will contain only unique strings, ignoring case.
set< string, IgnoreCaseCompare > words;
// Split the string by using your isWordSeparator function
// to define the delimiters. token_compress_on collapses multiple
// consecutive delimiters into only one.
boost::split( words, s, isWordSeparator, boost::token_compress_on );
// Now the set contains only the unique words.
cout << "Number of Words: " << words.size() << endl;
for( auto& w : words )
cout << w << endl;
return 0;
}
Demo: http://coliru.stacked-crooked.com/a/a3b51a6c6a3b4ee8
I have a string in form "blah-blah..obj_xx..blah-blah" where xx are digits. E.g. the string may be "root75/obj_43.dat".
I want to read "xx" (or 43 from the sample above) as an integer. How do I do it?
I tried to find "obj_" first:
std::string::size_type const cpos = name.find("obj_");
assert(std::string::npos != cpos);
but what's next?
My GCC doesn't support regexes fully, but I think this should work:
#include <iostream>
#include <string>
#include <regex>
#include <iterator>
int main ()
{
std::string input ("blah-blah..obj_42..blah-blah");
std::regex expr ("obj_([0-9]+)");
std::sregex_iterator i = std::sregex_iterator(input.begin(), input.end(), expr);
std::smatch match = *i;
// Group 0 is the whole match ("obj_42"); group 1 holds only the digits.
int number = std::stoi(match.str(1));
std::cout << number << '\n';
}
With something this simple you can do
auto b = name.find_first_of("0123456789", cpos);
auto e = name.find_first_not_of("0123456789", b);
if (b != std::string::npos)
{
auto digits = name.substr(b, e - b); // second argument is a length, not an end position
int n = std::stoi(digits);
}
else
{
// Error handling
}
For anything more complicated I would use regex.
How about:
#include <iostream>
#include <string>
int main()
{
const std::string test("root75/obj_43.dat");
int number;
// validate input:
const auto index = test.find("obj_");
if(index != std::string::npos)
{
number = std::stoi(test.substr(index+4));
std::cout << "number: " << number << ".\n";
}
else
std::cout << "Input validation failed.\n";
}
This includes (very) basic input validation (e.g. it will fail if the string contains multiple obj_) and handles variable-length numbers at the end, or even more text following them (adjust the substr call accordingly). You can pass a second argument to std::stoi to find out how many characters were converted, and std::stoi throws if no conversion is possible.
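One way the validation could be tightened is sketched below. parse_obj_number is a hypothetical helper, and returning std::optional (C++17) rather than printing a message is an assumption made for the example.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the number that follows the first "obj_",
// or std::nullopt when the marker or the digits are missing.
std::optional<int> parse_obj_number(const std::string& name)
{
    const std::string marker = "obj_";
    const auto index = name.find(marker);
    if (index == std::string::npos)
        return std::nullopt;
    const std::string tail = name.substr(index + marker.size());
    // Reject signs and leading whitespace, which std::stoi would accept.
    if (tail.empty() || tail[0] < '0' || tail[0] > '9')
        return std::nullopt;
    try
    {
        return std::stoi(tail);  // stops at the first non-digit, e.g. ".dat"
    }
    catch (const std::out_of_range&)
    {
        return std::nullopt;  // digits present, but too large for int
    }
}
```

With "root75/obj_43.dat" this should return 43, while a name without obj_ or without digits after it gives an empty optional.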
Here's another option
//your code:
std::string::size_type const cpos = name.find("obj_");
assert(std::string::npos != cpos);
//my code starts here:
int n;
std::stringstream sin(name.substr(cpos+4));
sin>>n;
Dirt simple method, though probably pretty inefficient, and doesn't take advantage of the STL:
(Note that I didn't try to compile this)
unsigned GetFileNumber(std::string &s)
{
const std::string extension = ".dat";
/// get starting position - first character to the left of the file extension
/// in a real implementation, you'd want to verify that the string actually contains
/// the correct extension.
int i = (int)(s.size() - extension.size() - 1);
unsigned sum = 0;
int tensMultiplier = 1;
while (i >= 0)
{
/// get the integer value of this digit - subtract (int)'0' rather than
/// using the ASCII code of `0` directly for clarity. Optimizer converts
/// it to a literal immediate at compile time, anyway.
int digit = s[i] - (int)'0';
/// if this is a valid numeric character
if (digit >= 0 && digit <= 9)
{
/// add the digit's value, adjusted for its place within the numeric
/// substring, to the accumulator
sum += digit * tensMultiplier;
/// set the tens place multiplier for the next digit to the left.
tensMultiplier *= 10;
}
else
{
break;
}
i--;
}
return sum;
}
If you need it as a string, just append the found digits to a result string rather than accumulating their values in sum.
This also assumes that .dat is the last part of your string. If not, I'd start at the end, count left until you find a numeric character, and then start the above loop. This is nice because it's O(n), but may not be as clear as the regex or find approaches.
I use boost framework, so it could be helpful, but I haven't found a necessary function.
For usual fast splitting I can use:
string str = ...;
vector<string> strs;
boost::split(strs, str, boost::is_any_of("mM"));
but it removes the m and M characters.
I also can't simply use a regexp because it searches the string for the longest match of the defined pattern.
P.S. There are a lot of similar questions, but they describe this implementation in other programming languages only.
Untested, but rather than using vector<string>, you could try a vector<boost::iterator_range<std::string::iterator>>, so you get a pair of iterators into the main string for each token. Then iterate from (start of range - 1), as long as the start of the range is not begin() of the main string, to the end of the range.
EDIT: Here is an example:
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/range/iterator_range.hpp>
int main(void)
{
std::string str = "FooMBarMSFM";
std::vector<boost::iterator_range<std::string::iterator>> tokens;
boost::split(tokens, str, boost::is_any_of("mM"));
for(auto r : tokens)
{
std::string b(r.begin(), r.end());
std::cout << b << std::endl;
if (r.begin() != str.begin())
{
std::string bm(std::prev(r.begin()), r.end());
std::cout << "With token: [" << bm << "]" << std::endl;
}
}
}
What you need goes beyond what a plain split does. If you want to keep 'm' or 'M', you could write a specialised split using strstr, strchr, strtok or find. You could adapt the following code into a more flexible split function.
Here is an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void split(char *src, const char *separator, char **dest, int *num)
{
char *pNext;
int count = 0;
if (src == NULL || strlen(src) == 0) return;
if (separator == NULL || strlen(separator) == 0) return;
pNext = strtok(src,separator);
while(pNext != NULL)
{
*dest++ = pNext;
++count;
pNext = strtok(NULL,separator);
}
*num = count;
}
Besides, you could try boost::regex.
My current solution is the following (but it is not universal and looks too complex).
I choose one character which cannot appear in this string. In my case it is '|'.
string str = ...;
vector<string> strs;
boost::split(strs, str, boost::is_any_of("m"));
str = boost::join(strs, "|m");
boost::split(strs, str, boost::is_any_of("M"));
str = boost::join(strs, "|M");
if (boost::iequals(str.substr(0, 1), "|")) {
str = str.substr(1);
}
boost::split(strs, str, boost::is_any_of("|"));
I add "|" before each m/M character, except at the very first position in the string. Then I split the string into substrings on this extra character, which removes it again.
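The same effect can be sketched without a placeholder character by cutting the string in front of every delimiter with find_first_of. This is only an illustration using the standard library; split_before is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: cuts `str` in front of every character found in
// `delims`, so each piece after the first starts with its delimiter.
std::vector<std::string> split_before(const std::string& str, const std::string& delims)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (start < str.size())
    {
        // Look for the next delimiter after the first character of this piece.
        const auto next = str.find_first_of(delims, start + 1);
        // When next is npos, substr() simply takes the rest of the string.
        parts.push_back(str.substr(start, next - start));
        if (next == std::string::npos)
            break;
        start = next;  // the delimiter becomes the first character of the next piece
    }
    return parts;
}
```

For "FooMBarMSFM" and the delimiters "mM" this should give "Foo", "MBar", "MSF" and "M", and it works even when '|' or any other character appears in the input.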