Parsing a string in C++ with a specific format

I have this string: post "ola tudo bem como esta" alghero.jpg. I want to break it into 3 pieces: post, ola tudo bem como esta (without the quotes), and alghero.jpg. I tried it in C style because I'm new to C++ and not very good at it yet, but it's not working: strtok also splits on the spaces inside the quotes. Is there a more efficient way of doing this in C++?
Program:
int main()
{
char* token1 = new char[128];
char* token2 = new char[128];
char* token3 = new char[128];
char str[] = "post \"ola tudo bem como esta\" alghero.jpg";
char *token;
/* get the first token */
token = strtok(str, " ");
//walk through other tokens
while( token != NULL ) {
printf( " %s\n", token );
token = strtok(NULL, " ");
}
return(0);
}

In C++14 and later, you can use std::quoted to read quoted strings from any std::istream, such as a std::istringstream, e.g.:
#include <iostream>
#include <sstream>
#include <string>
#include <iomanip>
int main()
{
std::string token1, token2, token3;
std::string str = "post \"ola tudo bem como esta\" alghero.jpg";
std::istringstream(str) >> token1 >> std::quoted(token2) >> token3;
std::cout << token1 << "\n";
std::cout << token2 << "\n";
std::cout << token3 << "\n";
return 0;
}
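If the input may be malformed, the stream state can be checked after the extraction. The following is only a sketch: the Command struct and the parse_command function are hypothetical names introduced for illustration, not part of the answer above.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical holder for the three pieces of a line such as
//   post "ola tudo bem como esta" alghero.jpg
struct Command {
    std::string verb;
    std::string text;
    std::string file;
};

// Returns true and fills `out` only if all three tokens were read.
bool parse_command(const std::string& line, Command& out) {
    std::istringstream in(line);
    Command cmd;
    // std::quoted strips the surrounding quotes and honours backslash escapes.
    if (!(in >> cmd.verb >> std::quoted(cmd.text) >> cmd.file)) {
        return false;
    }
    out = cmd;
    return true;
}
```

A caller would test the return value before using the fields, for example `Command c; if (parse_command(line, c)) { ... }`.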

Use find to locate the two quotes, then use substr to take the text before the first quote, the text between the quotes, and the text after the second quote. The offsets below assume exactly one space on either side of the quoted part.
std::string s = "post \"ola tudo bem como esta\" alghero.jpg";
auto first = s.find('\"');
if (first != s.npos) {
auto second = s.find('\"', first + 1);
if (second != s.npos) {
std::cout << s.substr(0, first-1) << '\n';
std::cout << s.substr(first+1, second-first-1) << '\n';
std::cout << s.substr(second+2) << '\n';
}
}
Output:
post
ola tudo bem como esta
alghero.jpg
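The offsets in the snippet above depend on single spaces around the quoted part. A minimal sketch that tolerates extra spaces might look like this; trim and split_on_quotes are hypothetical helper names, not from the original answer.

```cpp
#include <string>

// Strip leading and trailing spaces and tabs (hypothetical helper).
std::string trim(const std::string& s) {
    const char* ws = " \t";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Splits `s` into the text before, inside, and after the first pair of
// double quotes. Returns false, leaving the outputs untouched, when two
// quotes are not found.
bool split_on_quotes(const std::string& s, std::string& before,
                     std::string& inside, std::string& after) {
    const auto open = s.find('"');
    if (open == std::string::npos) return false;
    const auto close = s.find('"', open + 1);
    if (close == std::string::npos) return false;
    before = trim(s.substr(0, open));
    inside = s.substr(open + 1, close - open - 1);
    after = trim(s.substr(close + 1));
    return true;
}
```

Only the outer pieces are trimmed; the quoted text is returned exactly as written.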

One option for parsing strings is using regular expressions, for example:
#include <iostream>
#include <regex>
#include <string>
// struct to hold return value of parse function
struct parse_result_t
{
bool parsed{ false };
std::string token1;
std::string token2;
std::string token3;
};
// the parse function
auto parse(const std::string& string)
{
// The regex, piece by piece (each backslash is doubled for the C++ string literal):
// ^ matches the start of the string
// (.*?) lazily captures the first token
// \s+ matches one or more whitespace characters
// \"(.*?)\" captures the text between the double quotes
// \s+(.*)$ skips whitespace, then captures everything up to the end of the string
// see it live here : https://regex101.com/r/XnkAZV/1
static std::regex rx{ "^(.*?)\\s+\\\"(.*?)\\\"\\s+(.*)$" };
std::smatch match;
parse_result_t result;
if (std::regex_search(string, match, rx))
{
result.parsed = true;
result.token1 = match[1];
result.token2 = match[2];
result.token3 = match[3];
}
return result;
}
int main()
{
auto result = parse("post \"ola tudo bem como esta\" alghero.jpg");
std::cout << "parse result = " << (result.parsed ? "success" : "failed") << "\n";
std::cout << "token 1 = " << result.token1 << "\n";
std::cout << "token 2 = " << result.token2 << "\n";
std::cout << "token 3 = " << result.token3 << "\n";
return 0;
}

If the strings are always separated by a single space, you can find the first and last space using std::string::find and std::string::rfind, split on those characters, and unquote the middle string:
#include <iostream>
#include <tuple>
#include <string>
std::string unquote(const std::string& str) {
if (str.front() != '"' || str.back() != '"') {
return str;
}
return str.substr(1, str.size() - 2);
}
std::tuple < std::string, std::string, std::string> parse_triple_with_quoted_middle(const std::string& str) {
auto iter1 = str.begin() + str.find(' ');
auto iter2 = str.begin() + str.rfind(' ');
auto str1 = std::string(str.begin(),iter1);
auto str2 = std::string(iter1 + 1, iter2);
auto str3 = std::string(iter2 + 1, str.end() );
return { str1, unquote(str2), str3 };
}
int main()
{
std::string test = "post \"ola tudo bem como esta\" alghero.jpg";
auto [str1, str2, str3] = parse_triple_with_quoted_middle(test);
std::cout << str1 << "\n";
std::cout << str2 << "\n";
std::cout << str3 << "\n";
}
You should probably put more input validation into the above, however.
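As one possible shape for that validation, here is a sketch under the same single-space assumption. The Triple struct and the parse_triple function are hypothetical names used only for this illustration.

```cpp
#include <string>

// Hypothetical result type for the three pieces.
struct Triple {
    std::string first;
    std::string middle;
    std::string last;
};

// Returns false instead of reading out of range when the input does not
// look like: token "quoted text" token
bool parse_triple(const std::string& str, Triple& out) {
    const auto sp1 = str.find(' ');
    const auto sp2 = str.rfind(' ');
    // Need two distinct separators and non-empty first and last tokens.
    if (sp1 == std::string::npos || sp1 == sp2 || sp1 == 0 ||
        sp2 + 1 == str.size()) {
        return false;
    }
    const std::string mid = str.substr(sp1 + 1, sp2 - sp1 - 1);
    // The middle token must be quoted on both ends.
    if (mid.size() < 2 || mid.front() != '"' || mid.back() != '"') {
        return false;
    }
    out.first = str.substr(0, sp1);
    out.middle = mid.substr(1, mid.size() - 2);
    out.last = str.substr(sp2 + 1);
    return true;
}
```

Rejecting the input outright is a design choice; a caller that prefers the lenient behaviour of unquote could return the middle token unchanged instead.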

You could use regular expressions for this:
The pattern to search for repeatedly is: optional leading whitespace, \s*; then ([^\"]*), zero or more characters other than a quote (zero or more because several quotes could follow one another), which we capture (hence the parentheses); and finally either a quote \" or (|) the end of the input $, in a group we don't capture (?:). We use std::regex to store the pattern, wrapping it all within R"()" so that we can write the raw expression.
The while loop does a few things: it searches for the next match with regex_search, extracts the captured group, and updates line so that the next search starts where the current one finished. The loop stops once line is empty, because the pattern can also match an empty string. matches is an array whose first element, matches[0], is the part of line matching the whole pattern; the following elements correspond to the pattern's captured groups.
#include <iostream> // cout
#include <regex> // regex_search, smatch
#include <string> // string
int main() {
std::string line{"post \"ola tudo bem como esta\" alghero.jpg"};
std::regex pattern{R"(\s*([^\"]*)(?:\"|$))"};
std::smatch matches{};
// Stop once the input is consumed: the pattern also matches an empty string.
while (!line.empty() && std::regex_search(line, matches, pattern))
{
std::cout << matches[1] << "\n";
line = matches.suffix();
}
}
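A related regex approach is to match the tokens themselves rather than the text up to each quote. The sketch below assumes that a quoted token always starts with a double quote and contains no escaped quotes; tokenize_quoted is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects tokens that are either double-quoted runs (quotes removed)
// or runs of non-whitespace characters.
std::vector<std::string> tokenize_quoted(const std::string& line) {
    // Group 1: text inside double quotes. Group 2: a bare word.
    static const std::regex token_rx{R"rx("([^"]*)"|(\S+))rx"};
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(line.begin(), line.end(), token_rx), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        tokens.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return tokens;
}
```

Because std::sregex_iterator advances through the input by itself, there is no need to reassign the input string inside the loop.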

Related

c++ : istream_iterator skip spaces but not newline

Suppose I have
istringstream input("x = 42\n"s);
I'd like to iterate over this stream using std::istream_iterator<std::string>:
int main() {
std::istringstream input("x = 42\n");
std::istream_iterator<std::string> iter(input);
for (; iter != std::istream_iterator<std::string>(); iter++) {
std::cout << *iter << std::endl;
}
}
I get the following output as expected:
x
=
42
Is it possible to have the same iteration skipping spaces but not a newline symbol? So I'd like to have
x
=
42
\n

std::istream_iterator isn't really the right tool for this job, because it doesn't let you specify the delimiter character to use. Instead, use std::getline, which does. Then check for the newline manually and strip it off if found:
#include <iostream>
#include <string>
#include <sstream>
int main() {
std::istringstream input("x = 42\n");
std::string s;
while (getline (input, s, ' '))
{
bool have_newline = !s.empty () && s.back () == '\n';
if (have_newline)
s.pop_back ();
std::cout << "\"" << s << "\"" << std::endl;
if (have_newline)
std::cout << "\"\n\"" << std::endl;
}
}
Output:
"x"
"="
"42"
"
"

If you can use Boost, use this:
boost::algorithm::split_regex(cont, str, boost::regex("\\s"));
where cont is the result container and str is your input string. Note the doubled backslash: "\s" is not a valid escape sequence in an ordinary C++ string literal.
https://www.boost.org/doc/libs/1_76_0/doc/html/boost/algorithm/split_regex.html

I'm trying to split a string by whitespace using C++, where the data comes from a database [duplicate]

What would be easiest method to split a string using c++11?
I've seen the method used by this post, but I feel that there ought to be a less verbose way of doing it using the new standard.
Edit: I would like to have a vector<string> as a result and be able to split on a single character.

std::regex_token_iterator performs generic tokenization based on a regex. It may or may not be overkill for doing simple splitting on a single character, but it works and is not too verbose:
std::vector<std::string> split(const std::string& input, const std::string& regex) {
// passing -1 as the submatch index parameter performs splitting
std::regex re(regex);
std::sregex_token_iterator
first{input.begin(), input.end(), re, -1},
last;
return {first, last};
}

Here is a (maybe less verbose) way to split string (based on the post you mentioned).
#include <string>
#include <sstream>
#include <vector>
std::vector<std::string> split(const std::string &s, char delim) {
std::stringstream ss(s);
std::string item;
std::vector<std::string> elems;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
// elems.push_back(std::move(item)); // if C++11 (based on comment from @mchiasson)
}
return elems;
}
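One property of the std::getline loop is worth knowing: when the input ends with the delimiter, no final empty field is produced, although empty fields in the middle are kept. If the trailing field matters, a sketch like the following adds it back; split_keep_trailing is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Like the getline-based split, but also reports the empty field that
// follows a trailing delimiter.
std::vector<std::string> split_keep_trailing(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    // getline yields nothing after a final delimiter, so add that field by hand.
    if (!s.empty() && s.back() == delim) {
        elems.emplace_back();
    }
    return elems;
}
```

With this variant, "a,b," yields three fields (the last one empty), while an empty input still yields no fields at all.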

Here's an example of splitting a string and populating a vector with the extracted elements using boost.
#include <boost/algorithm/string.hpp>
std::string my_input("A,B,EE");
std::vector<std::string> results;
boost::algorithm::split(results, my_input, boost::is_any_of(","));
assert(results[0] == "A");
assert(results[1] == "B");
assert(results[2] == "EE");

Another regex solution inspired by other answers but hopefully shorter and easier to read:
std::string s{"String to split here, and here, and here,..."};
std::regex regex{R"([\s,]+)"}; // split on space and comma
std::sregex_token_iterator it{s.begin(), s.end(), regex, -1};
std::vector<std::string> words{it, {}};

I don't know if this is less verbose, but it might be easier to grok for those more seasoned in dynamic languages such as JavaScript. The only C++11 features it uses are auto and the range-based for loop.
#include <string>
#include <cctype>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
string s = "hello how are you won't you tell me your name";
vector<string> tokens;
string token;
for (const auto& c: s) {
if (!isspace(static_cast<unsigned char>(c))) // cast avoids undefined behaviour for negative char values
token += c;
else {
if (token.length()) tokens.push_back(token);
token.clear();
}
}
if (token.length()) tokens.push_back(token);
return 0;
}

#include <iostream>
#include <algorithm>
#include <cctype>
#include <vector>
#include <string>
using namespace std;
vector<string> split(const string& str, int delimiter(int) = ::isspace){
vector<string> result;
auto e=str.end();
auto i=str.begin();
while(i!=e){
i=find_if_not(i,e, delimiter);
if(i==e) break;
auto j=find_if(i,e, delimiter);
result.push_back(string(i,j));
i=j;
}
return result;
}
int main(){
string line;
getline(cin,line);
vector<string> result = split(line);
for(auto s: result){
cout<<s<<endl;
}
}

My choice is boost::tokenizer, though I haven't used it for heavy workloads or tested it with huge data.
Example from boost doc with lambda modification:
#include <algorithm>
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
#include <vector>
int main()
{
using namespace std;
using namespace boost;
string s = "This is, a test";
vector<string> v;
tokenizer<> tok(s);
for_each (tok.begin(), tok.end(), [&v](const string & s) { v.push_back(s); } );
// result 4 items: 1)This 2)is 3)a 4)test
return 0;
}

This is my answer. Verbose, readable and efficient.
std::vector<std::string> tokenize(const std::string& s, char c) {
auto end = s.cend();
auto start = end;
std::vector<std::string> v;
for( auto it = s.cbegin(); it != end; ++it ) {
if( *it != c ) {
if( start == end )
start = it;
continue;
}
if( start != end ) {
v.emplace_back(start, it);
start = end;
}
}
if( start != end )
v.emplace_back(start, end);
return v;
}

#include <string>
#include <vector>
#include <sstream>
inline std::vector<std::string> split(const std::string& s) {
std::vector<std::string> result;
std::istringstream iss(s);
for (std::string w; iss >> w; )
result.push_back(w);
return result;
}

Here is a C++11 solution that uses only std::string::find(). The delimiter can be any number of characters long. Parsed tokens are output via an output iterator, which is typically a std::back_inserter in my code.
I have not tested this with UTF-8, but I expect it should work as long as the input and delimiter are both valid UTF-8 strings.
#include <string>
template<class Iter>
Iter splitStrings(const std::string &s, const std::string &delim, Iter out)
{
if (delim.empty()) {
*out++ = s;
return out;
}
size_t a = 0, b = s.find(delim);
for ( ; b != std::string::npos;
a = b + delim.length(), b = s.find(delim, a))
{
*out++ = s.substr(a, b - a);
}
*out++ = s.substr(a, s.length() - a);
return out;
}
Some test cases:
void test()
{
std::vector<std::string> out;
size_t counter;
std::cout << "Empty input:" << std::endl;
out.clear();
splitStrings("", ",", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, empty delimiter:" << std::endl;
out.clear();
splitStrings("Hello, world!", "", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", no delimiter in string:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxya", "xyz", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string:" << std::endl;
out.clear();
splitStrings("abxycdxy!!xydefxya", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string"
", input contains blank token:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxya", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string"
", nothing after last delimiter:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxy", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", only delimiter exists string:" << std::endl;
out.clear();
splitStrings("xy", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
}
Expected output:
Empty input:
0:
Non-empty input, empty delimiter:
0: Hello, world!
Non-empty input, non-empty delimiter, no delimiter in string:
0: abxycdxyxydefxya
Non-empty input, non-empty delimiter, delimiter exists string:
0: ab
1: cd
2: !!
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, input contains blank token:
0: ab
1: cd
2:
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, nothing after last delimiter:
0: ab
1: cd
2:
3: def
4:
Non-empty input, non-empty delimiter, only delimiter exists string:
0:
1:

One possible way of doing this is finding all occurrences of the split string and storing locations to a list. Then count input string characters and when you get to a position where there is a 'search hit' in the position list then you jump forward by 'length of the split string'. This approach takes a split string of any length. Here is my tested and working solution.
#include <iostream>
#include <string>
#include <list>
#include <vector>
using namespace std;
vector<string> Split(string input_string, string search_string)
{
list<int> search_hit_list;
vector<string> word_list;
size_t search_position, search_start = 0;
// Find start positions of every substring occurrence and store positions to a hit list.
while ( (search_position = input_string.find(search_string, search_start) ) != string::npos) {
search_hit_list.push_back(search_position);
search_start = search_position + search_string.size();
}
// Iterate through hit list and reconstruct substring start and length positions
int character_counter = 0;
int start, length;
for (auto hit_position : search_hit_list) {
// Skip over substrings we are splitting with. This also skips over repeating substrings.
if (character_counter == hit_position) {
character_counter = character_counter + search_string.size();
continue;
}
start = character_counter;
character_counter = hit_position;
length = character_counter - start;
word_list.push_back(input_string.substr(start, length));
character_counter = character_counter + search_string.size();
}
// If the search string is not found in the input string, then return the whole input_string.
if (word_list.size() == 0) {
word_list.push_back(input_string);
return word_list;
}
// The last substring might still be unprocessed, get it.
if (character_counter < input_string.size()) {
word_list.push_back(input_string.substr(character_counter, input_string.size() - character_counter));
}
return word_list;
}
int main() {
vector<string> word_list;
string search_string = " ";
// search_string = "the";
string text = "thetheThis is some text to test with the split-thethe function.";
word_list = Split(text, search_string);
for (auto item : word_list) {
cout << "'" << item << "'" << endl;
}
cout << endl;
}

Replace a string to another string using C++

The problem is that I don't know the length of the input string.
My function can only do the replacement when the run is exactly "yyyy". One idea is to first convert the input string back to "yyyy" and then use my function to complete the work.
Here's my function:
void findAndReplaceAll(std::string & data, std::string toSearch, std::string replaceStr)
{
// Get the first occurrence
size_t pos = data.find(toSearch);
// Repeat till end is reached
while( pos != std::string::npos)
{
// Replace this occurrence of Sub String
data.replace(pos, toSearch.size(), replaceStr);
// Get the next occurrence from the current position
pos = data.find(toSearch, pos + replaceStr.size());
}
}
My main function
std::string format = "yyyyyyyyyydddd";
findAndReplaceAll(format, "yyyy", "%Y");
findAndReplaceAll(format, "dd", "%d");
My expected output should be :
%Y%d

Use regular expressions.
Example:
#include <iostream>
#include <string>
#include <regex>
int main(){
std::string text = "yyyyyy";
std::string sentence = "This is a yyyyyyyyyyyy.";
std::cout << "Text: " << text << std::endl;
std::cout << "Sentence: " << sentence << std::endl;
// Regex
std::regex y_re("y+"); // matches a run of one or more 'y' characters
// replacing
std::string r1 = std::regex_replace(text, y_re, "%y"); // using lowercase
std::string r2 = std::regex_replace(sentence, y_re, "%Y"); // using uppercase
// showing result
std::cout << "Text replace: " << r1 << std::endl;
std::cout << "Sentence replace: " << r2 << std::endl;
return 0;
}
Output:
Text: yyyyyy
Sentence: This is a yyyyyyyyyyyy.
Text replace: %y
Sentence replace: This is a %Y.
If you want to make it even better you can use:
// Regex
std::regex y_re("[yY]+");
That will match a run of any length containing any mix of lowercase and uppercase 'y'.
Example output with that Regex:
Sentence: This is a yYyyyYYYYyyy.
Sentence replace: This is a %Y.
This is just a simple example of what you can do with regex. I'd recommend reading up on the topic itself; there is plenty of information here on SO and on other sites.
Extra:
If you want to inspect the match first and choose the replacement based on it, you can do something like:
// Regex
std::string text = "yyaaaa";
std::cout << "Text: " << text << std::endl;
std::regex y_re("y+"); // matches a run of one or more 'y' characters
std::string output = "";
std::smatch ymatches;
if (std::regex_search(text, ymatches, y_re)) {
if (ymatches[0].length() == 2 ) {
output = std::regex_replace(text, y_re, "%y");
} else {
output = std::regex_replace(text, y_re, "%Y");
}
}
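The same idea, collapsing each run of a character into one specifier, can also be written without regex. The sketch below is only an illustration; replace_runs is a hypothetical name and is not the function from the question.

```cpp
#include <cstddef>
#include <string>

// Replaces every maximal run of `ch` in `data` with `replacement`.
std::string replace_runs(const std::string& data, char ch,
                         const std::string& replacement) {
    std::string out;
    for (std::size_t i = 0; i < data.size(); ) {
        if (data[i] == ch) {
            out += replacement;
            // Skip the rest of the run so it is replaced only once.
            while (i < data.size() && data[i] == ch) {
                ++i;
            }
        } else {
            out += data[i++];
        }
    }
    return out;
}
```

For the input from the question, calling `replace_runs(replace_runs(format, 'y', "%Y"), 'd', "%d")` turns "yyyyyyyyyydddd" into "%Y%d", regardless of how long each run is.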

How to get the substring up to the first newline character from a string in C++?

I have a string with large content. I have to separate out content of string before the first newline character and after the newline character.
string content is as follows:
std::string = "exption is theo from my fimnct!
mt nsamre id kjsdf dskfk djfhj
/vonsfs/sdvfs/sdvjisd/dd.so
dfjg dfk dflkkm sdfk "
From the above, I have to get the content of the first line, up to the newline character, into another string and keep the remaining content unchanged. The characters in the first line are not fixed; it is a variable string.

What about string::substr and string::find:
#include <iostream>
#include <string>
int main()
{
std::string s = "foo\nbar";
std::cout << "first line: " << s.substr(0, s.find('\n')) << "\n";
}

You could do it like this:
std::string first, second, all = "...";
size_t pos = all.find('\n');
if(pos != std::string::npos)
{
first = all.substr(0, pos);
second = all.substr(pos+1);
}
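Wrapped in a function, the same find/substr idea might look like the sketch below, which also covers input without any newline. split_first_line is a hypothetical name.

```cpp
#include <string>
#include <utility>

// Returns {first line, everything after the first newline}.
// When there is no newline, the whole input is the first line.
std::pair<std::string, std::string> split_first_line(const std::string& all) {
    const auto pos = all.find('\n');
    if (pos == std::string::npos) {
        return {all, std::string()};
    }
    return {all.substr(0, pos), all.substr(pos + 1)};
}
```

The newline itself is dropped from both parts; later newlines stay in the second part unchanged.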

Try the standard algorithms (std::find from <algorithm>):
#include <algorithm>
#include <iostream>
#include <string>
int main (void)
{
std::string input(
"exption is theo from my fimnct!\n"
"mt nsamre id kjsdf dskfk djfhj\n"
"/vonsfs/sdvfs/sdvjisd/dd.so\n"
"dfjg dfk dflkkm sdfk"
);
// Locate the first newline once and split around it.
auto newline = std::find(input.begin(), input.end(), '\n');
std::string first_line(input.begin(), newline);
// Skip the newline itself so the rest does not start with an empty line.
std::string rest_lines(newline == input.end() ? newline : newline + 1, input.end());
std::cout << first_line << std::endl;
std::cout << "---" << std::endl;
std::cout << rest_lines << std::endl;
return 0;
}
This prints out
exption is theo from my fimnct!
---
mt nsamre id kjsdf dskfk djfhj
/vonsfs/sdvfs/sdvjisd/dd.so
dfjg dfk dflkkm sdfk

Use std::string::substr and std::string::find_first_of.
