Parse integers in string in C++ - c++

I have a string that looks like this:
"{{2,3},{10,1},9}"
and I want to convert it to an array (or vector) of strings:
["{", "{", "2", "}", ",", "{", "10", ",", "1", "}", ",", "9", "}"]
I can't just pull out each character separately because some of the integers may be double-digit, and I can't figure out how to use stringstream because there are multiple delimiters on the integers (could be followed by , or })

Just walk through the string. If we're on a digit, walk 'til we're not on a digit:
std::vector<std::string> split(std::string const& s)
{
std::vector<std::string> results;
std::locale loc{};
for (auto it = s.begin(); it != s.end(); )
{
if (std::isdigit(*it, loc)) {
auto next = std::find_if(it+1, s.end(), [&](char c){
return !std::isdigit(c, loc);
});
results.emplace_back(it, next);
it = next;
}
else {
results.emplace_back(1, *it);
++it;
}
}
return results;
}

The logic is straightforward:
Iterate over the string.
push_back() each character as a string of its own into the output vector, unless it's a digit and the last character was a digit, in which case append the digit to the last string in the output vector:
That's it.
std::string s="{{2,3},{10,1},9}";
std::vector<std::string> v;
bool last_character_was_a_digit=false;
for (auto c:s)
{
if ( c >= '0' && c <= '9')
{
if (last_character_was_a_digit)
{
v.back().push_back(c);
continue;
}
last_character_was_a_digit=true;
}
else
{
last_character_was_a_digit=false;
}
v.push_back(std::string(&c, 1));
}

Thank you Sam and Barry for answering! I actually came up with my own solution (sometimes posting a question here helps me see things more clearly) but yours are much more elegant and delimiter-independent, I'll study them further!
std::vector<std::string> myVector;
std::string nextSubStr;
for (int i=0; i<myString.size(); i++)
{
nextSubStr = myString[i];
// if we pulled a single-digit integer out of myString
if (nextSubStr != "{" && nextSubStr != "}" && nextSubStr != ",")
{
// let's make sure we get the whole integer!
int j=i;
peekNext = myString[++j];
while (peekNext != "{" && peekNext != "}" && peekNext != ",")
{
// another digit on the integer
nextSubStr += peekNext;
peekNext = myString[++j];
i++;
}
}
myVector.push_back(nextSubStr);
}

Related

C++ separate string by selected commas

I was reading the following question Parsing a comma-delimited std::string on how to split a string by a comma (Someone gave me the link from my previous question) and one of the answers was:
stringstream ss( "1,1,1,1, or something else ,1,1,1,0" );
vector<string> result;
while( ss.good() )
{
string substr;
getline( ss, substr, ',' );
result.push_back( substr );
}
But what if my string was like the following, and I wanted to separate values only by the bold commas and ignoring what appears inside <>?
<a,b>,<c,d>,,<d,l>,
I want to get:
<a,b>
<c,d>
"" //Empty string
<d,l>
""
Given:<a,b>,,<c,d> It should return: <a,b> and "" and <c,d>
Given:<a,b>,<c,d> It should return:<a,b> and <c,d>
Given:<a,b>, It should return:<a,b> and ""
Given:<a,b>,,,<c,d> It should return:<a,b> and "" and "" and <c,d>
In other words, my program should behave just like the given solution above separated by , (Supposing there is no other , except the bold ones)
Here are some suggested solution and their problems:
Delete all bold commas: This will result in treating the following 2 inputs the same way while they shouldn't
<a,b>,<c,d>
<a,b>,,<c,d>
Replace all bold commas with some char and use the above algorithm: I can't select some char to replace the commas with since any value could appear in the rest of my string
Adding to #Carlos' answer, apart from regex (take a look at my comment); you can implement the substitution like the following (Here, I actually build a new string):
#include <algorithm>
#include <iostream>
#include <string>
int main() {
std::string str;
getline(std::cin,str);
std::string str_builder;
for (auto it = str.begin(); it != str.end(); it++) {
static bool flag = false;
if (*it == '<') {
flag = true;
}
else if (*it == '>') {
flag = false;
str_builder += *it;
}
if (flag) {
str_builder += *it;
}
}
}
Why not replace one set of commas with some known-to-not-clash character, then split it by the other commas, then reverse the replacement?
So replace the commas that are inside the <> with something, do the string split, replace again.
I think what you want is something like this:
vector<string> result;
string s = "<a,b>,,<c,d>"
int in_string = 0;
int latest_comma = 0;
for (int i = 0; i < s.size(); i++) {
if(s[i] == '<'){
result.push_back(s[i]);
in_string = 1;
latest_comma = 0;
}
else if(s[i] == '>'){
result.push_back(s[i]);
in_string = 0;
}
else if(!in_string && s[i] == ','){
if(latest_comma == 1)
result.push_back('\n');
else
latest_comma = 1;
}
else
result.push_back(s[i]);
}
Here is a possible code that scans a string one char at a time and splits it on commas (',') unless they are masked between brackets ('<' and '>').
Algo:
assume starting outside brackets
loop for each character:
if not a comma, or if inside brackets
store the character in the current item
if a < bracket: note that we are inside brackets
if a > bracket: note that we are outside brackets
else (an unmasked comma)
store the current item as a string into the resulting vector
clear the current item
store the last item into the resulting vector
Only 10 lines and my rubber duck agreed that it should work...
C++ implementation: I will use a vector to handle the current item because it is easier to build it one character at a time
std::vector<std::string> parse(const std::string& str) {
std::vector<std::string> result;
bool masked = false;
std::vector<char> current; // stores chars of the current item
for (const char c : str) {
if (masked || (c != ',')) {
current.push_back(c);
switch (c) {
case '<': masked = true; break;
case '>': masked = false;
}
}
else { // unmasked comma: store item and prepare next
current.push_back('\0'); // a terminating null for the vector data
result.push_back(std::string(&current[0]));
current.clear();
}
}
// do not forget the last item...
current.push_back('\0');
result.push_back(std::string(&current[0]));
return result;
}
I tested it with all your example strings and it gives the expected results.
Seems quite straight forward to me.
vector<string> customSplit(string s)
{
vector<string> results;
int level = 0;
std::stringstream ss;
for (char c : s)
{
switch (c)
{
case ',':
if (level == 0)
{
results.push_back(ss.str());
stringstream temp;
ss.swap(temp); // Clear ss for the new string.
}
else
{
ss << c;
}
break;
case '<':
level += 2;
case '>':
level -= 1;
default:
ss << c;
}
}
results.push_back(ss.str());
return results;
}

Split strings into tokens with delimiter (/ and -) in c++ [duplicate]

This question already has answers here:
Right way to split an std::string into a vector<string>
(12 answers)
Closed 11 months ago.
The community reviewed whether to reopen this question 11 months ago and left it closed:
Original close reason(s) were not resolved
I have some text (meaningful text or arithmetical expression) and I want to split it into words.
If I had a single delimiter, I'd use:
std::stringstream stringStream(inputString);
std::string word;
while(std::getline(stringStream, word, delimiter))
{
wordVector.push_back(word);
}
How can I break the string into tokens with several delimiters?
Assuming one of the delimiters is newline, the following reads the line and further splits it by the delimiters. For this example I've chosen the delimiters space, apostrophe, and semi-colon.
std::stringstream stringStream(inputString);
std::string line;
while(std::getline(stringStream, line))
{
std::size_t prev = 0, pos;
while ((pos = line.find_first_of(" ';", prev)) != std::string::npos)
{
if (pos > prev)
wordVector.push_back(line.substr(prev, pos-prev));
prev = pos+1;
}
if (prev < line.length())
wordVector.push_back(line.substr(prev, std::string::npos));
}
If you have boost, you could use:
#include <boost/algorithm/string.hpp>
std::string inputString("One!Two,Three:Four");
std::string delimiters("|,:");
std::vector<std::string> parts;
boost::split(parts, inputString, boost::is_any_of(delimiters));
Using std::regex
A std::regex can do string splitting in a few lines:
std::regex re("[\\|,:]");
std::sregex_token_iterator first{input.begin(), input.end(), re, -1}, last;//the '-1' is what makes the regex split (-1 := what was not matched)
std::vector<std::string> tokens{first, last};
Try it yourself
I don't know why nobody pointed out the manual way, but here it is:
const std::string delims(";,:. \n\t");
inline bool isDelim(char c) {
for (int i = 0; i < delims.size(); ++i)
if (delims[i] == c)
return true;
return false;
}
and in function:
std::stringstream stringStream(inputString);
std::string word; char c;
while (stringStream) {
word.clear();
// Read word
while (!isDelim((c = stringStream.get())))
word.push_back(c);
if (c != EOF)
stringStream.unget();
wordVector.push_back(word);
// Read delims
while (isDelim((c = stringStream.get())));
if (c != EOF)
stringStream.unget();
}
This way you can do something useful with the delims if you want.
And here, ages later, a solution using C++20:
constexpr std::string_view words{"Hello-_-C++-_-20-_-!"};
constexpr std::string_view delimeters{"-_-"};
for (const std::string_view word : std::views::split(words, delimeters)) {
std::cout << std::quoted(word) << ' ';
}
// outputs: Hello C++ 20!
Required headers:
#include <ranges>
#include <string_view>
Reference: https://en.cppreference.com/w/cpp/ranges/split_view
If you interesting in how to do it yourself and not using boost.
Assuming the delimiter string may be very long - let say M, checking for every char in your string if it is a delimiter, would cost O(M) each, so doing so in a loop for all chars in your original string, let say in length N, is O(M*N).
I would use a dictionary (like a map - "delimiter" to "booleans" - but here I would use a simple boolean array that has true in index = ascii value for each delimiter).
Now iterating on the string and check if the char is a delimiter is O(1), which eventually gives us O(N) overall.
Here is my sample code:
const int dictSize = 256;
vector<string> tokenizeMyString(const string &s, const string &del)
{
static bool dict[dictSize] = { false};
vector<string> res;
for (int i = 0; i < del.size(); ++i) {
dict[del[i]] = true;
}
string token("");
for (auto &i : s) {
if (dict[i]) {
if (!token.empty()) {
res.push_back(token);
token.clear();
}
}
else {
token += i;
}
}
if (!token.empty()) {
res.push_back(token);
}
return res;
}
int main()
{
string delString = "MyDog:Odie, MyCat:Garfield MyNumber:1001001";
//the delimiters are " " (space) and "," (comma)
vector<string> res = tokenizeMyString(delString, " ,");
for (auto &i : res) {
cout << "token: " << i << endl;
}
return 0;
}
Note: tokenizeMyString returns vector by value and create it on the stack first, so we're using here the power of the compiler >>> RVO - return value optimization :)
Using Eric Niebler's range-v3 library:
https://godbolt.org/z/ZnxfSa
#include <string>
#include <iostream>
#include "range/v3/all.hpp"
int main()
{
std::string s = "user1:192.168.0.1|user2:192.168.0.2|user3:192.168.0.3";
auto words = s
| ranges::view::split('|')
| ranges::view::transform([](auto w){
return w | ranges::view::split(':');
});
ranges::for_each(words, [](auto i){ std::cout << i << "\n"; });
}

Getting the words from a sentence and storing them in a vector of strings

Alright, guys ...
Here's my set that has all the letters. I'm defining a word as consisting of consecutive letters from the set.
const char LETTERS_ARR[] = {"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::set<char> LETTERS_SET(LETTERS_ARR, LETTERS_ARR + sizeof(LETTERS_ARR)/sizeof(char));
I was hoping that this function would take in a string representing a sentence and return a vector of strings that are the individual words in the sentence.
std::vector<std::string> get_sntnc_wrds(std::string S) {
std::vector<std::string> retvec;
std::string::iterator it = S.begin();
while (it != S.end()) {
if (LETTERS_SET.count(*it) == 1) {
std::string str(1,*it);
int k(0);
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1) == 1))) {
str.push_back(*(it + (++k)));
}
retvec.push_back(str);
it += k;
}
else {
++it;
}
}
return retvec;
}
For instance, the following call should return a vector of the strings "Yo", "dawg", etc.
std::string mystring("Yo, dawg, I heard you life functions, so we put a function inside your function so you can derive while you derive.");
std::vector<std::string> mystringvec = get_sntnc_wrds(mystring);
But everything isn't going as planned. I tried running my code and it was putting the entire sentence into the first and only element of the vector. My function is very messy code and perhaps you can help me come up with a simpler version. I don't expect you to be able to trace my thought process in my pitiful attempt at writing that function.
Try this instead:
#include <vector>
#include <cctype>
#include <string>
#include <algorithm>
// true if the argument is whitespace, false otherwise
bool space(char c)
{
return isspace(c);
}
// false if the argument is whitespace, true otherwise
bool not_space(char c)
{
return !isspace(c);
}
vector<string> split(const string& str)
{
typedef string::const_iterator iter;
vector<string> ret;
iter i = str.begin();
while (i != str.end())
{
// ignore leading blanks
i = find_if(i, str.end(), not_space);
// find end of next word
iter j = find_if(i, str.end(), space);
// copy the characters in [i, j)
if (i != str.end())
ret.push_back(string(i, j));
i = j;
}
return ret;
}
The split function will return a vector of strings, each element containing one word.
This code is taken from the Accelerated C++ book, so it's not mine, but it works. There are other superb examples of using containers and algorithms for solving every-day problems in this book. I could even get a one-liner to show the contents of a file at the output console. Highly recommended.
It's just a bracketing issue, my advice is (almost) never put in more brackets than are necessary, it's only confuses things
while (it+k+1 != S.end() && LETTERS_SET.count(*(it+k+1)) == 1) {
Your code compares the character with 1 not the return value of count.
Also although count does return an integer in this context I would simplify further and treat the return as a boolean
while (it+k+1 != S.end() && LETTERS_SET.count(*(it+k+1))) {
You should use the string steam with std::copy like so:
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>
int main() {
std::string sentence = "And I feel fine...";
std::istringstream iss(sentence);
std::vector<std::string> split;
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(split));
// This is to print the vector
for(auto iter = split.begin();
iter != split.end();
++iter)
{
std::cout << *iter << "\n";
}
}
I would use another more simple approach based on member functions of class std::string. For example
const char LETTERS[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
std::string s( "This12 34is 56a78 test." );
std::vector<std::string> v;
for ( std::string::size_type first = s.find_first_of( LETTERS, 0 );
first != std::string::npos;
first = s.find_first_of( LETTERS, first ) )
{
std::string::size_type last = s.find_first_not_of( LETTERS, first );
v.push_back(
std::string( s, first, last == std::string::npos ? std::string::npos : last - first ) );
first = last;
}
for ( const std::string &s : v ) std::cout << s << ' ';
std::cout << std::endl;
Here you make 2 mistakes, I have correct in the following code.
First, it should be
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1)) == 1))
and, it should move to next by
it += (k+1);
and the code is
std::vector<std::string> get_sntnc_wrds(std::string S) {
std::vector<std::string> retvec;
std::string::iterator it = S.begin();
while (it != S.end()) {
if (LETTERS_SET.count(*it) == 1) {
std::string str(1,*it);
int k(0);
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1)) == 1)) {
str.push_back(*(it + (++k)));
}
retvec.push_back(str);
it += (k+1);
}
else {
++it;
}
}
return retvec;
}
The output have been tested.

Complex algorithm to extract numbers/number range from a string

I am working on a algorithm where I am trying the following output:
Given values/Inputs:
char *Var = "1-5,10,12,15-16,25-35,67,69,99-105";
int size = 29;
Here "1-5" depicts a range value, i.e. it will be understood as "1,2,3,4,5" while the values with just "," are individual values.
I was writing an algorithm where end output should be such that it will give complete range of output as:
int list[]=1,2,3,4,5,10,12,15,16,25,26,27,28,29,30,31,32,33,34,35,67,69,99,100,101,102,103,104,105;
If anyone is familiar with this issue then the help would be really appreciated.
Thanks in advance!
My initial code approach was as:
if(NULL != strchr((char *)grp_range, '-'))
{
int_u8 delims[] = "-";
result = (int_u8 *)strtok((char *)grp_range, (char *)delims);
if(NULL != result)
{
start_index = strtol((char*)result, (char **)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
while(NULL != result)
{
end_index = strtol((char*)result, (char**)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
while(start_index <= end_index)
{
grp_list[i++] = start_index;
start_index++;
}
}
else if(NULL != strchr((char *)grp_range, ','))
{
int_u8 delims[] = ",";
result = (unison_u8 *)strtok((char *)grp_range, (char *)delims);
while(result != NULL)
{
grp_list[i++] = strtol((char*)result, (char**)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
}
But it only works if I have either "0-5" or "0,10,15". I am looking forward to make it more versatile.
Here is a C++ solution for you to study.
#include <vector>
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int ConvertString2Int(const string& str)
{
stringstream ss(str);
int x;
if (! (ss >> x))
{
cerr << "Error converting " << str << " to integer" << endl;
abort();
}
return x;
}
vector<string> SplitStringToArray(const string& str, char splitter)
{
vector<string> tokens;
stringstream ss(str);
string temp;
while (getline(ss, temp, splitter)) // split into new "lines" based on character
{
tokens.push_back(temp);
}
return tokens;
}
vector<int> ParseData(const string& data)
{
vector<string> tokens = SplitStringToArray(data, ',');
vector<int> result;
for (vector<string>::const_iterator it = tokens.begin(), end_it = tokens.end(); it != end_it; ++it)
{
const string& token = *it;
vector<string> range = SplitStringToArray(token, '-');
if (range.size() == 1)
{
result.push_back(ConvertString2Int(range[0]));
}
else if (range.size() == 2)
{
int start = ConvertString2Int(range[0]);
int stop = ConvertString2Int(range[1]);
for (int i = start; i <= stop; i++)
{
result.push_back(i);
}
}
else
{
cerr << "Error parsing token " << token << endl;
abort();
}
}
return result;
}
int main()
{
vector<int> result = ParseData("1-5,10,12,15-16,25-35,67,69,99-105");
for (vector<int>::const_iterator it = result.begin(), end_it = result.end(); it != end_it; ++it)
{
cout << *it << " ";
}
cout << endl;
}
Live example
http://ideone.com/2W99Tt
This is my boost approach :
This won't give you array of ints, instead a vector of ints
Algorithm used: (nothing new)
Split string using ,
Split the individual string using -
Make a range low and high
Push it into vector with help of this range
Code:-
#include<iostream>
#include<vector>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>
int main(){
std::string line("1-5,10,12,15-16,25-35,67,69,99-105");
std::vector<std::string> strs,r;
std::vector<int> v;
int low,high,i;
boost::split(strs,line,boost::is_any_of(","));
for (auto it:strs)
{
boost::split(r,it,boost::is_any_of("-"));
auto x = r.begin();
low = high =boost::lexical_cast<int>(r[0]);
x++;
if(x!=r.end())
high = boost::lexical_cast<int>(r[1]);
for(i=low;i<=high;++i)
v.push_back(i);
}
for(auto x:v)
std::cout<<x<<" ";
return 0;
}
You're issue seems to be misunderstanding how strtok works. Have a look at this.
#include <string.h>
#include <stdio.h>
int main()
{
int i, j;
char delims[] = " ,";
char str[] = "1-5,6,7";
char *tok;
char tmp[256];
int rstart, rend;
tok = strtok(str, delims);
while(tok != NULL) {
for(i = 0; i < strlen(tok); ++i) {
//// range
if(i != 0 && tok[i] == '-') {
strncpy(tmp, tok, i);
rstart = atoi(tmp);
strcpy(tmp, tok + i + 1);
rend = atoi(tmp);
for(j = rstart; j <= rend; ++j)
printf("%d\n", j);
i = strlen(tok) + 1;
}
else if(strchr(tok, '-') == NULL)
printf("%s\n", tok);
}
tok = strtok(NULL, delims);
}
return 0;
}
Don't search. Just go through the text one character at a time. As long as you're seeing digits, accumulate them into a value. If the digits are followed by a - then you're looking at a range, and need to parse the next set of digits to get the upper bound of the range and put all the values into your list. If the value is not followed by a - then you've got a single value; put it into your list.
Stop and think about it: what you actually have is a comma
separated list of ranges, where a range can be either a single
number, or a pair of numbers separated by a '-'. So you
probably want to loop over the ranges, using recursive descent
for the parsing. (This sort of thing is best handled by an
istream, so that's what I'll use.)
std::vector<int> results;
std::istringstream parser( std::string( var ) );
processRange( results, parser );
while ( isSeparator( parser, ',' ) ) {
processRange( results, parser );
}
with:
bool
isSeparator( std::istream& source, char separ )
{
char next;
source >> next;
if ( source && next != separ ) {
source.putback( next );
}
return source && next == separ;
}
and
void
processRange( std::vector<int>& results, std::istream& source )
{
int first = 0;
source >> first;
int last = first;
if ( isSeparator( source, '-' ) ) {
source >> last;
}
if ( last < first ) {
source.setstate( std::ios_base::failbit );
}
if ( source ) {
while ( first != last ) {
results.push_back( first );
++ first;
}
results.push_back( first );
}
}
The isSeparator function will, in fact, probably be useful in
other projects in the future, and should be kept in your
toolbox.
First divide whole string into numbers and ranges (using strtok() with "," delimiter), save strings in array, then, search through array looking for "-", if it present than use sscanf() with "%d-%d" format, else use sscanf with single "%d" format.
Function usage is easily googling.
One approach:
You need a parser that identifies 3 kinds of tokens: ',', '-', and numbers. That raises the level of abstraction so that you are operating at a level above characters.
Then you can parse your token stream to create a list of ranges and constants.
Then you can parse that list to convert the ranges into constants.
Some code that does part of the job:
#include <stdio.h>
// Prints a comma after the last digit. You will need to fix that up.
void print(int a, int b) {
for (int i = a; i <= b; ++i) {
printf("%d, ", i);
}
}
int main() {
enum { DASH, COMMA, NUMBER };
struct token {
int type;
int value;
};
// Sample input stream. Notice the sentinel comma at the end.
// 1-5,10,
struct token tokStream[] = {
{ NUMBER, 1 },
{ DASH, 0 },
{ NUMBER, 5 },
{ COMMA, 0 },
{ NUMBER, 10 },
{ COMMA, 0 } };
// This parser assumes well formed input. You have to add all the error
// checking yourself.
size_t i = 0;
while (i < sizeof(tokStream)/sizeof(struct token)) {
if (tokStream[i+1].type == COMMA) {
print(tokStream[i].value, tokStream[i].value);
i += 2; // skip to next number
}
else { // DASH
print(tokStream[i].value, tokStream[i+2].value);
i += 4; // skip to next number
}
}
return 0;
}

Parse string to create a list of element

I have a string like this:
"\r color=\"red\" name=\"Jon\" \t\n depth=\"8.26\" "
And I want to parse this string and create a std::list of this object:
class data
{
std::string name;
std::string value;
};
Where for example:
name = color
value = red
What is the fastest way? I can use boost.
EDIT:
This is what i've tried:
vector<string> tokens;
split(tokens, str, is_any_of(" \t\f\v\n\r"));
if(tokens.size() > 1)
{
list<data> attr;
for_each(tokens.begin(), tokens.end(), [&attr](const string& token)
{
if(token.empty() || !contains(token, "="))
return;
vector<string> tokens;
split(tokens, token, is_any_of("="));
erase_all(tokens[1], "\"");
attr.push_back(data(tokens[0], tokens[1]));
}
);
}
But it does not work if there are spaces inside " ": like color="red 1".
Assuming that there will always be at least one white-space before the name, i think the following algorithm is fast enough:
list<data> l;
size_t fn, fv, lv = 0;
while((fv = str.find("\"", ++lv)) != string::npos &&
(lv = str.find("\"", fv+1)) != string::npos)
{
fn = str.find_last_of(" \t\n\v\f\r", fv);
l.push_back(data(str.substr(++fn, fv-fn-2), str.substr(++fv, lv-fv)));
}
Where str is your std::string and data has a constructor of this type:
data(string name, string value)
: name(name), value(value)
{ }
As you can see there was no need to use boost or regex, simply the standard library.
after your edit:
you could do the following for the space problem:
(replace all spaces that are not within " " quotes with a \n)
void PrepareForTokanization(std::string &str)
{
int quoteCount = 0;
int strLen = str.length();
for(int i=0; i<strLen; ++i){
if (str[i] == '"' && (i==0 || (str[i-1] != '\\')))
quoteCount++;
if(str[i] == ' ' && quoteCount%2 == 0)
str[i] = '\n';
}
}
and before you call split, prepare the string, and then remove the space character from the split is_any_of
PrepareForTokanization(str);
split(tokens, str, is_any_of("\t\f\v\n\r"));