Split string into key-value pairs using C++

I have a string like this:
"CA: ABCD\nCB: ABFG\nCC: AFBV\nCD: 4567"
Now ": " splits key from value while \n separates the pairs. I want to add the key-value pairs to a map in C++.
Is there an efficient way of doing this, with performance in mind?

Well, I have two methods here. The first one is the easy, obvious method that I use all the time (performance is rarely an issue). The second method is likely more efficient, although I have not done any formal timings.
In my informal tests the second method is about 3 times faster.
#include <map>
#include <string>
#include <sstream>
#include <iostream>
std::map<std::string, std::string> mappify1(std::string const& s)
{
std::map<std::string, std::string> m;
std::string key, val;
std::istringstream iss(s);
while(std::getline(std::getline(iss, key, ':') >> std::ws, val))
m[key] = val;
return m;
}
std::map<std::string, std::string> mappify2(std::string const& s)
{
std::map<std::string, std::string> m;
std::string::size_type key_pos = 0;
std::string::size_type key_end;
std::string::size_type val_pos;
std::string::size_type val_end;
while((key_end = s.find(':', key_pos)) != std::string::npos)
{
if((val_pos = s.find_first_not_of(": ", key_end)) == std::string::npos)
break;
val_end = s.find('\n', val_pos);
m.emplace(s.substr(key_pos, key_end - key_pos), s.substr(val_pos, val_end - val_pos));
key_pos = val_end;
if(key_pos != std::string::npos)
++key_pos;
}
return m;
}
int main()
{
std::string s = "CA: ABCD\nCB: ABFG\nCC: AFBV\nCD: 4567";
std::cout << "mappify1: " << '\n';
auto m = mappify1(s);
for(auto const& p: m)
std::cout << '{' << p.first << " => " << p.second << '}' << '\n';
std::cout << "mappify2: " << '\n';
m = mappify2(s);
for(auto const& p: m)
std::cout << '{' << p.first << " => " << p.second << '}' << '\n';
}
Output:
mappify1:
{CA => ABCD}
{CB => ABFG}
{CC => AFBV}
{CD => 4567}
mappify2:
{CA => ABCD}
{CB => ABFG}
{CC => AFBV}
{CD => 4567}

This format is called "Tag-Value".
The most performance-critical place where such an encoding is used in the industry is probably the financial FIX protocol ('=' as the key-value separator and '\001' as the entry delimiter). So if you are on x86 hardware, your best bet would be to google 'SSE4 FIX protocol parser github' and reuse the open-sourced findings of HFT shops.
If you still want to delegate the vectorization part to the compiler and can spare a few nanoseconds for readability, then the most elegant solution is to store the result in a std::string (data) + boost::flat_map<boost::string_ref, boost::string_ref> (view). Parsing is a matter of taste: a while-loop or strtok would be easiest for the compiler to parse, while a Boost.Spirit-based parser would be easiest for a human (familiar with Boost.Spirit) to read.
C++ for-loop based solution
#include <boost/container/flat_map.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/range/iterator_range_io.hpp>
#include <algorithm>
#include <iostream>
#include <stdexcept>
#include <string>
// g++ -std=c++1z ~/aaa.cc
int main()
{
using range_t = boost::iterator_range<std::string::const_iterator>;
using map_t = boost::container::flat_map<range_t, range_t>;
char const sep = ':';
char const dlm = '\n';
// this part can be reused for parsing multiple records
map_t result;
result.reserve(1024);
std::string const input {"hello:world\n bye: world"};
// this part is per-line/per-record
result.clear();
for (auto _beg = begin(input), _end = end(input), it = _beg; it != _end;)
{
auto sep_it = std::find(it, _end, sep);
if (sep_it != _end)
{
auto dlm_it = std::find(sep_it + 1, _end, dlm);
result.emplace(range_t {it, sep_it}, range_t {sep_it + 1, dlm_it});
it = dlm_it + (dlm_it != _end);
}
else throw std::runtime_error("cannot parse");
}
for (auto& x: result)
std::cout << x.first << " => " << x.second << '\n';
return 0;
}

The format is simple enough that doing the parsing "by hand" is, IMO, the best option, and the result remains quite readable.
This should also be reasonably efficient (the key and value strings are always the same - albeit cleared, so the reallocations inside the main loop should just stop after a few iterations); ret also should qualify for NRVO, OTOH in case of problems with that you can always change to an output parameter.
Of course std::map may not be the fastest gun in the west, but it's a request in the problem text.
std::map<std::string, std::string> parseKV(const std::string &sz) {
std::map<std::string, std::string> ret;
std::string key;
std::string value;
const char *s=sz.c_str();
while(*s) {
// parse the key
while(*s && !(*s==':' && s[1]==' ')) {
key.push_back(*s);
++s;
}
// if we quit due to the end of the string exit now
if(!*s) break;
// skip the ": "
s+=2;
// parse the value
while(*s && *s!='\n') {
value.push_back(*s);
++s;
}
ret[key]=value;
key.clear(); value.clear();
// skip the newline (unless we are already at the end of the string)
if(*s) ++s;
}
return ret;
}

If worried about performance, you should probably rethink the need for the end result to be a map. That could end up being a lot of char buffers in memory. Ideally keeping track of just the char* and length of each sub string will be faster/smaller.
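As a rough sketch of that idea, assuming C++17 is available and that the input string outlives the result, each key and value can be stored as a std::string_view (a pointer plus a length) instead of a copied std::string. The function name parse_views is hypothetical, and lines without a ": " separator are simply skipped here.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: the returned views point into the caller's buffer,
// so that buffer must stay alive (and unmodified) while the result is used.
std::vector<std::pair<std::string_view, std::string_view>>
parse_views(std::string_view input)
{
    std::vector<std::pair<std::string_view, std::string_view>> out;
    while (!input.empty()) {
        // take one line (up to '\n' or the end of the input)
        auto const eol = input.find('\n');
        std::string_view const line = input.substr(0, eol);
        input.remove_prefix(eol == std::string_view::npos ? input.size() : eol + 1);

        // split the line at ": "; lines without the separator are skipped
        auto const sep = line.find(": ");
        if (sep != std::string_view::npos)
            out.emplace_back(line.substr(0, sep), line.substr(sep + 2));
    }
    return out;
}
```

No character data is copied; only the vector of views allocates. The trade-off is the lifetime requirement on the input buffer.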

Here is a solution, using strtok as a splitting means. Please note that strtok changes your string, it puts '\0' at the split char.
#include <iostream>
#include <string>
#include <map>
#include <string.h>
using namespace std;
int main (int argc, char *argv[])
{
char s1[] = "CA: ABCD\nCB: ABFG\nCC: AFBV\nCD: 4567";
map<string, string> mymap;
char *token;
token = strtok(s1, "\n");
while (token != NULL) {
string s(token);
size_t pos = s.find(": ");
if (pos != string::npos) mymap[s.substr(0, pos)] = s.substr(pos + 2); // skip the ": " so the value has no leading space
token = strtok(NULL, "\n");
}
for (auto keyval : mymap)
cout << keyval.first << "/" << keyval.second << endl;
return 0;
}

I doubt you should worry about optimization for reading this string and converting it into a std::map. If you really want to optimize this fixed-content map, change it to a std::vector<std::pair<>> and sort it once.
That said, the most elegant way of creating the std::map with standard C++ features is the following:
std::map<std::string, std::string> deserializeKeyValue(const std::string &sz) {
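// std::string cannot portably be a constexpr variable; use std::string_view (C++17, <string_view>)
constexpr std::string_view ELEMENT_SEPARATOR = ": ";
constexpr std::string_view LINE_SEPARATOR = "\n";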
std::map<std::string, std::string> result;
std::size_t begin{0};
std::size_t end{0};
while (begin < sz.size()) {
// Search key
end = sz.find(ELEMENT_SEPARATOR, begin);
assert(end != std::string::npos); // Replace by error handling
auto key = sz.substr(begin, /*size=*/ end - begin);
begin = end + ELEMENT_SEPARATOR.size();
// Search value
end = sz.find(LINE_SEPARATOR, begin);
auto value = sz.substr(begin, end == std::string::npos ? std::string::npos : /*size=*/ end - begin);
begin = (end == std::string::npos) ? sz.size() : end + LINE_SEPARATOR.size();
// Store key-value
[[maybe_unused]] auto emplaceResult = result.emplace(std::move(key), std::move(value));
assert(emplaceResult.second); // Replace by error handling
}
return result;
}
The performance of this might not be ideal, but every C++ programmer should be able to understand this code.
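The sorted-vector alternative mentioned at the top of this answer could look roughly like the sketch below. The names kv_list, sort_by_key and lookup are made up for illustration; the assumption is that all pairs are collected first, sorted once, and only read afterwards.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using kv_list = std::vector<std::pair<std::string, std::string>>;

// Sort once by key after all pairs have been collected.
void sort_by_key(kv_list& kv)
{
    std::sort(kv.begin(), kv.end(),
              [](auto const& a, auto const& b) { return a.first < b.first; });
}

// Binary search on the sorted vector; returns nullptr when the key is absent.
std::string const* lookup(kv_list const& kv, std::string const& key)
{
    auto it = std::lower_bound(kv.begin(), kv.end(), key,
        [](auto const& p, std::string const& k) { return p.first < k; });
    return (it != kv.end() && it->first == key) ? &it->second : nullptr;
}
```

Lookups stay logarithmic, as with std::map, but the elements sit in one contiguous allocation instead of one node per entry.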

A very simple solution using Boost is the following; it also works with partial tokens (e.g. keys without values, or empty pairs).
#include <string>
#include <list>
#include <map>
#include <iostream>
#include <boost/foreach.hpp>
#include <boost/algorithm/string.hpp>
using namespace std;
using namespace boost;
int main() {
string s = "CA: ABCD\nCB: ABFG\nCC: AFBV\nCD: 4567";
list<string> tokenList;
split(tokenList,s,is_any_of("\n"),token_compress_on);
map<string, string> kvMap;
BOOST_FOREACH(string token, tokenList) {
size_t sep_pos = token.find(':');
string key = token.substr(0,sep_pos);
string value = (sep_pos == string::npos ? "" : trim_copy(token.substr(sep_pos+1))); // trim_copy strips the space after ':' and is safe when nothing follows it
kvMap[key] = value;
cout << "[" << key << "] => [" << kvMap[key] << "]" << endl;
}
return 0;
}

void splitString(std::map<std::string, std::string> &mymap, const std::string &text, char sep)
{
std::string::size_type start = 0, end1 = 0, end2 = 0; // same type as npos, so the comparisons below are well defined
while ((end1 = text.find(sep, start)) != std::string::npos && (end2 = text.find(sep, end1+1)) != std::string::npos) {
std::string key = text.substr(start, end1 - start);
std::string val = text.substr(end1 + 1, end2 - end1 - 1);
mymap.insert(std::pair<std::string,std::string>(key, val));
start = end2 + 1;
}
}
For example:
std::string text = "key1;val1;key2;val2;key3;val3;";
std::map<std::string, std::string> mymap;
splitString(mymap, text, ';');
Will result in a map of size 3: { key1="val1", key2="val2", key3="val3" }
More examples:
"key1;val1;key2;" => {key1="val1"} (no 2nd val, so 2nd key doesn't count)
"key1;val1;key2;val2" => {key1="val1"} (no delim at end of the 2nd val, so it doesn't count)
"key1;val1;key2;;" => {key1="val1",key2=""} (key2 holds empty string)

I looked through the accepted answer and tried to extend it a bit so that it works in more general cases. The test run can be found here. All kinds of comments or modifications are welcome.
#include <iostream>
#include <string>
#include <sstream>
#include <map>
#include <algorithm>
#include <vector>
size_t find(const std::string& line, std::vector<std::string> vect, int pos=0) {
int eol1;
eol1 = 0;
for (std::vector<std::string>::iterator iter = vect.begin(); iter != vect.end(); ++iter) {
//std::cout << *iter << std::endl;
int eol2 = line.find(*iter, pos);
if (eol1 == 0 && eol2 > 0)
eol1 = eol2;
else if (eol2 > 0 && eol2 < eol1)
eol1 = eol2;
}
return eol1;
}
std::map<std::string, std::string> mappify(std::string const& s, char delim='=') {
std::map<std::string, std::string> m;
std::string::size_type key_pos = 0, i, j;
std::string::size_type key_end;
std::string::size_type val_pos;
std::string::size_type lim_pos;
std::string::size_type val_end;
while ((key_end = s.find(delim, key_pos)) != std::string::npos) {
if ((val_pos = s.find_first_not_of(delim, key_end + 1)) == std::string::npos)break;
while (key_end - 1 > 0 && (s[key_end - 1] <= 32 || s[key_end - 1] == ';'))
key_end--;
while (val_pos < s.size() && (s[val_pos] <= 32 || s[val_pos] == ';'))
val_pos++;
val_end = s.find('\n', val_pos);
i = s.find('\"', val_pos);
if (i != std::string::npos)
j = s.find('\"', i + 1);
else
j = 0;
lim_pos = find(s.substr(0, i), { " ",";","\t" }, val_pos + 1);
//std::cout << "s.substr(j):" << s.substr(j)<<std::endl;
if (lim_pos == 0 && j != std::string::npos)lim_pos = find(s.substr(j), { " ",";","\t" }) + j;
if (lim_pos < val_pos)lim_pos = val_pos + 1;
if (j > 0)val_end = j + 1;
if (val_end > lim_pos)val_end = lim_pos;
m.emplace(s.substr(key_pos, key_end - key_pos), s.substr(val_pos, val_end - val_pos));
key_pos = val_end;
while (key_pos < s.size() && (s[key_pos] <= 32 || s[key_pos] == ';'))
++key_pos;
if (val_end == 0)break;
}
return m;
}
int main() {
std::string s ="\
File=\"c:\\dir\\ocean\\\nCCS_test.txt\"\n\
iEcho=10000; iHrShift=0 rho_Co2 = 1.15d0;\n\
Liner=01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890";
auto m = mappify(s);
for (auto const& p : m)
std::cout << '{' << p.first << " :=> " << p.second << '}' << '\n';
return 0;
}

Related

Counting words in an input string in C++ with consideration for typos

I've been looking for ways to count the number of words in a string, but specifically for strings that may contain typos (i.e. "_This_is_a___test" as opposed to "This_is_a_test"). Most of the pages I've looked at only handle single spaces.
This is actually my first time programming in C++, and I don't have much other programming experience to speak of (2 years of college in C and Java). Although what I have is functional, I'm also aware it's complex, and I'm wondering if there is a more efficient way to achieve the same results?
This is what I have currently. Before I run the string through numWords(), I run it through a trim function that removes leading whitespace, then check that there are still characters remaining.
int numWords(string str) {
int count = 1;
for (int i = 0; i < str.size(); i++) {
if (str[i] == ' ' || str[i] == '\t' || str[i] == '\n') {
bool repeat = true;
int j = 1;
while (j < (str.size() - i) && repeat) {
if (str[i + j] != ' ' && str[i + j] != '\t' && str[i + j] != '\n') {
repeat = false;
i = i + j;
count++;
}
else
j++;
}
}
}
return count;
}
Also, I wrote mine to take a string argument, but most of the examples I've seen used (char* str) instead, which I wasn't sure how to use with my input string.

You don't need all those stringstreams to count word boundaries:
#include <string>
#include <cctype>
int numWords(std::string str)
{
bool space = true; // not in word
int count = 0;
for(auto c:str){
if(std::isspace(static_cast<unsigned char>(c)))space=true; // the cast avoids undefined behaviour for negative char values
else{
if(space)++count;
space=false;
}
}
return count;
}

One solution is to utilize std::istringstream to count the number of words and to skip over spaces automatically.
#include <sstream>
#include <string>
#include <iostream>
int numWords(std::string str)
{
int count = 0;
std::istringstream strm(str);
std::string word;
while (strm >> word)
++count;
return count;
}
int main()
{
std::cout << numWords(" This is a test ");
}
Output:
4
That said, as mentioned, std::istringstream is heavier in terms of performance than writing your own loop.

Sam's comment made me write a function that does not allocate strings for the words, but just creates string_views on the input string.
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>
#include <string_view>
#include <iostream>
std::vector<std::string_view> get_words(const std::string& input)
{
std::vector<std::string_view> words;
// the first word begins at an alpha character
auto begin_of_word = std::find_if(input.begin(), input.end(), [](const char c) { return std::isalpha(c); });
auto end_of_word = begin_of_word; // start scanning at the first word, so leading non-alpha characters are skipped
auto end_of_input = input.end();
// parse the whole string
while (end_of_word != end_of_input)
{
// as long as you see text characters, advance end_of_word by one
while ((end_of_word != end_of_input) && std::isalpha(*end_of_word)) end_of_word++;
// create a string view from begin of word to end of word.
// no new string memory will be allocated
// std::vector will do some dynamic memory allocation to store string_view (metadata of word positions)
words.emplace_back(begin_of_word, end_of_word);
// then skip all non readable characters.
while ((end_of_word != end_of_input) && !std::isalpha(*end_of_word) ) end_of_word++;
// and if we haven't reached the end then we are at the beginning of a new word.
if ( end_of_word != input.end()) begin_of_word = end_of_word;
}
return words;
}
int main()
{
std::string input{ "This, this is a test!" };
auto words = get_words(input);
for (const auto& word : words)
{
std::cout << word << "\n";
}
return 0;
}

You can use the standard function std::distance with std::istringstream in the following way:
#include <iostream>
#include <sstream>
#include <string>
#include <iterator>
int main()
{
std::string s( " This is a test" );
std::istringstream iss( s );
auto count = std::distance( std::istream_iterator<std::string>( iss ),
std::istream_iterator<std::string>() );
std::cout << count << '\n';
}
The program output is
4
If you want you can place the call of std::distance in a separate function like
#include <iostream>
#include <sstream>
#include <string>
#include <iterator>
size_t numWords( const std::string &s )
{
std::istringstream iss( s );
return std::distance( std::istream_iterator<std::string>( iss ),
std::istream_iterator<std::string>() );
}
int main()
{
std::string s( " This is a test" );
std::cout << numWords( s ) << '\n';
}
If the separators can include characters other than white space, for example punctuation, then you should use the find_first_of and find_first_not_of member functions of std::string or std::string_view.
Here is a demonstration program.
#include <iostream>
#include <string>
#include <string_view>
size_t numWords( const std::string_view s, std::string_view delim = " \t" )
{
size_t count = 0;
for ( std::string_view::size_type pos = 0;
( pos = s.find_first_not_of( delim, pos ) ) != std::string_view::npos;
pos = s.find_first_of( delim, pos ) )
{
++count;
}
return count;
}
int main()
{
std::string s( "Is it a test ? Yes ! Now we will run it ..." );
std::cout << numWords( s, " \t!?.," ) << '\n';
}
The program output is
10

You can do it easily with a regex:
#include <iterator>
#include <regex>
#include <string>
int numWords(std::string str)
{
std::regex re("\\S+"); // or `[^ \t\n]+` to exactly match the question
return std::distance(
std::sregex_iterator(str.begin(), str.end(), re),
std::sregex_iterator()
);
}

Problem with only reading certain values in a string

For the following string, string s1 = "172.16.254.01";, I only want to read the numeric values and push them into a vector. This works well for that input, but if I do the same for, say, string s1 = "172.16.254...01"; it also pushes two empty strings into my vector, which I don't want. This ruins my new vector: not only do I have two empty strings, but the size of the vector also increases, which is problematic in my case.
This is my code:
string s1 = "172.16.254.01";
vector<string> res;
string delimiter = ".";
size_t pos = 0;
while ((pos = s1.find(delimiter)) != std::string::npos) {
res.push_back(s1.substr(0, pos));
s1.erase(0, pos + delimiter.length());
}
res.push_back(s1);
for (auto value : res)
{
cout << value << endl;
}
This will output:
172
16
254
01
As you can see I use "." as a delimiter and read everything into a new string until it hits another ".". How can I manipulate my loop so that it fulfills the given requirements I mentioned at the start?

You need to check that between two delimiters there is a non-empty substring.
For example, if the source string starts with the delimiter ".", like ".1", then in the first iteration of the while loop an empty string will be pushed onto the vector:
while ((pos = s1.find(delimiter)) != std::string::npos) {
res.push_back(s1.substr(0, pos));
This statement after the while loop
res.push_back(s1);
can also append an empty string to the vector.
There is also little point in erasing parts of the source string.
I can suggest the following solution.
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string s( "172.16.254...01" );
std::vector<std::string> v;
const char delim = '.';
for (std::string::size_type pos = 0; pos != std::string::npos; )
{
auto last = s.find( delim, pos );
if (pos != last)
{
pos = s.find_first_not_of( ' ', pos );
if (pos != last)
{
v.push_back( s.substr( pos, last - pos ) );
}
pos = last;
}
if (pos != std::string::npos) ++pos;
}
for (const auto &item : v)
{
std::cout << item << ' ';
}
std::cout << '\n';
}
The program output is
172 16 254 01

Your code does not skip consecutive delimiters; it inserts every substring into res directly. Try something like this to skip the extra delimiters:
if (pos < s1.size())
{
if (s1.substr(pos + 1, delimiter.size()) == delimiter)
{
// Erase the extra delimiter
s1.erase(pos + 1, delimiter.length()); continue;
}
}
Final code:
#include <iostream>
#include <vector>
#include <string>
int main()
{
std::string s1 = "172.16.254...01";
std::vector<std::string> res;
std::string delimiter = ".";
size_t pos = 0;
while ((pos = s1.find(delimiter)) != std::string::npos)
{
if (s1.substr(pos + 1, delimiter.size()) == delimiter)
{
// Erase the extra delimiter
s1.erase(pos + 1, delimiter.length()); continue;
}
res.push_back(s1.substr(0, pos));
s1.erase(0, pos + delimiter.length());
}
// keep whatever follows the last delimiter (here "01")
if (!s1.empty())
res.push_back(s1);
for (auto value : res)
{
std::cout << value << std::endl;
}
}
Also, you should consider not using the following line in your code:
using namespace std;
...as it is considered bad practice. Use the std:: prefix every time instead.

Is there any inbuilt function available to get strings between two delimiter strings in C/C++?

Is there any inbuilt function available to get the strings between two delimiter strings in C++?
Input string
(23567)=(58765)+(67888)+(65678)
Expected Output
23567
58765
67888
65678
#include <iostream>
#include <stdexcept>
#include <string>
#include <sstream>
#include <vector>
std::vector<std::string> tokenize(const std::string& input)
{
std::vector<std::string> result;
std::istringstream stream(input);
std::string thingie; // please choose a better name, my inspiration is absent today
while(std::getline(stream, thingie, '('))
{
if(std::getline(stream, thingie, ')'))
result.push_back(thingie);
else
throw std::runtime_error("expected \')\' to match \'(\'.");
}
return result;
}
void rtc()
{
ifstream myfile(test.txt);
if(myfile.is_open())
while (!myfile.eof())
{
getline(myfile,line);
auto tokens = tokenize(line);
for(auto&& item : tokens)
std::cout << item << '\n';
}
Error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
Error C2440: 'initializing': cannot convert from 'std::vector<_Ty>' to 'int'
Error C2059: syntax error: 'empty declaration'
Error C2143: syntax error: missing ';' before '&&'
Error C2059: syntax error: ')'

Use std::getline:
#include <iostream>
#include <stdexcept>
#include <string>
#include <sstream>
#include <vector>
std::vector<std::string> tokenize(const std::string& input)
{
std::vector<std::string> result;
std::istringstream stream(input);
std::string thingie; // please choose a better name, my inspiration is absent today
while(std::getline(stream, thingie, '('))
{
if(std::getline(stream, thingie, ')'))
result.push_back(thingie);
else
throw std::runtime_error("expected \')\' to match \'(\'.");
}
return result;
}
int main()
{
std::string test = "(23567)=(58765)+(67888)+(65678)";
auto tokens = tokenize(test);
for(auto&& item : tokens)
std::cout << item << '\n';
}
Live example here.
For those not entirely convinced by the awesome robustness of this solution, I specialized this for double inputs between the parentheses, and used boost::lexical_cast to verify the input:
#include <iostream>
#include <stdexcept>
#include <string>
#include <sstream>
#include <vector>
#include <boost/lexical_cast.hpp>
std::vector<double> tokenize(const std::string& input)
{
std::vector<double> result;
std::istringstream stream(input);
std::string thingie; // please choose a better name, my inspiration is absent today
while(std::getline(stream, thingie, '('))
{
if(std::getline(stream, thingie, ')'))
{
try
{
result.push_back(boost::lexical_cast<double>(thingie));
}
catch(...)
{
throw std::runtime_error("This wasn't just a number, was it?");
}
}
else
throw std::runtime_error("expected \')\' to match \'(\'.");
}
return result;
}
int main()
{
std::string test = "(23567)=(58765)+(67888)+(65678)";
auto tokens = tokenize(test);
for(auto&& item : tokens)
std::cout << item << '\n';
test = "(2h567)=(58765)+(67888)+(65678)";
tokens = tokenize(test);
}
Live example here. Now go cry about how bad strtok really is, or how bad/unportable the general <regex> implementations are currently. Also, for those who doubt boost::lexical_cast performance-wise, please see the results for yourself.

strpbrk can be used to find the start of each token
or strcspn can be used to count the characters until the next token
then strspn can be used to find the length of each token.
const char tokenChars[] = "0123456789";
const char *token = input; // say input is "(23567)=(58765)+(67888)+(65678)"
while( 0 != (token = strpbrk( token, tokenChars )) ) // find token
{
size_t tokenLen = strspn( token, tokenChars ); // find length of token
// print out tokenLen characters of token here!
token+= tokenLen; // go to end of token
}
http://www.cplusplus.com/reference/cstring/strspn/
http://www.cplusplus.com/reference/cstring/strcspn/
http://www.cplusplus.com/reference/cstring/strpbrk/
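The strcspn variant mentioned above is not shown in the snippet, so here is a minimal sketch of it. It assumes the tokens are runs of decimal digits; the function name digit_tokens is hypothetical.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits found in `input`.
std::vector<std::string> digit_tokens(const char* input)
{
    const char digits[] = "0123456789";
    std::vector<std::string> tokens;
    const char* p = input;
    while (*p != '\0') {
        p += std::strcspn(p, digits);                    // skip everything that is not a digit
        std::size_t const len = std::strspn(p, digits);  // length of the digit run
        if (len == 0) break;                             // only the terminator is left
        tokens.emplace_back(p, len);                     // copy len characters starting at p
        p += len;                                        // continue after the token
    }
    return tokens;
}
```

Unlike strtok, neither function writes to the input, so this also works on string literals and const buffers.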

Here's the answer if you wanna use pointers:
char test[32] = "(23567)=(58765)+(67888)+(65678)";
char *output = NULL;
char *pos = (char *)test;
int length = 0;
while (*pos != '\0') {
if(*pos == '(' || *pos == ')' || *pos == '+' || *pos == '=') {
*pos = '\0';
if (length > 0) {
output = new char[length + 1];
strncpy_s(output, length + 1, pos - length, length + 1);
length = 0;
cout << output << endl;
delete [] output;
output = NULL;
}
} else {
length++;
}
pos++;
}

While some of the commentators may hate it I like this:
for (p = std::strtok(input, "+"); p != NULL; p = std::strtok(NULL, "+"))
{
// do more stuff
}
This won't work off the bat (the delimiter set needs expanding), but it demonstrates the ease of use.
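With the delimiter set expanded to cover the sample input, the loop might be wrapped up as in the sketch below. The function name split_with_strtok and the delimiter string "()+=" are assumptions chosen to fit the question's input, not part of the answer above.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper. The string is taken by value because strtok
// overwrites delimiters in its buffer with '\0'.
std::vector<std::string> split_with_strtok(std::string input)
{
    std::vector<std::string> tokens;
    // "()+=" covers every separator character in the sample input
    for (char* p = std::strtok(&input[0], "()+="); p != nullptr;
         p = std::strtok(nullptr, "()+="))
    {
        tokens.emplace_back(p);
    }
    return tokens;
}
```

Keep in mind that strtok keeps hidden state between calls, so it is not safe to interleave two tokenizations or to call it from several threads.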

const char input[] = "(2av67q)=(ble ble)+(67888)+(qpa)";
int s = 0;
for(int i = 0; input[i]; i++)
{
if ( input[i] == ')' )
{
cout << endl;
s = 0;
}
else if ( input[i] == '(' )
{
s = 1;
continue;
}
else
{
if ( s == 1 )
{
cout << input[i];
}
}
}
result:
2av67q
ble ble
67888
qpa

Here is a solution using a regular expression:
std::vector<std::string> get_numbers(std::string const& s)
{
// Match each parenthesised run of digits; capture group 1 holds the digits.
// (A repeated capture group in one whole-string pattern only keeps its last match.)
static std::regex const regex(R"(\((\d+)\))",
std::regex_constants::ECMAScript
| std::regex_constants::optimize);
std::vector<std::string> results;
for (std::sregex_iterator it(s.cbegin(), s.cend(), regex), last;
it != last;
++it)
{
results.push_back((*it)[1].str());
}
return results;
}

Complex algorithm to extract numbers/number range from a string

I am working on an algorithm that should produce the following output.
Given values/Inputs:
char *Var = "1-5,10,12,15-16,25-35,67,69,99-105";
int size = 29;
Here "1-5" depicts a range value, i.e. it will be understood as "1,2,3,4,5" while the values with just "," are individual values.
The algorithm's final output should be the fully expanded list:
int list[] = {1,2,3,4,5,10,12,15,16,25,26,27,28,29,30,31,32,33,34,35,67,69,99,100,101,102,103,104,105};
If anyone is familiar with this issue then the help would be really appreciated.
Thanks in advance!
My initial code approach was as:
if(NULL != strchr((char *)grp_range, '-'))
{
int_u8 delims[] = "-";
result = (int_u8 *)strtok((char *)grp_range, (char *)delims);
if(NULL != result)
{
start_index = strtol((char*)result, (char **)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
while(NULL != result)
{
end_index = strtol((char*)result, (char**)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
while(start_index <= end_index)
{
grp_list[i++] = start_index;
start_index++;
}
}
else if(NULL != strchr((char *)grp_range, ','))
{
int_u8 delims[] = ",";
result = (unison_u8 *)strtok((char *)grp_range, (char *)delims);
while(result != NULL)
{
grp_list[i++] = strtol((char*)result, (char**)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
}
But it only works if I have either "0-5" or "0,10,15". I would like to make it more versatile.

Here is a C++ solution for you to study.
#include <cstdlib>
#include <vector>
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int ConvertString2Int(const string& str)
{
stringstream ss(str);
int x;
if (! (ss >> x))
{
cerr << "Error converting " << str << " to integer" << endl;
abort();
}
return x;
}
vector<string> SplitStringToArray(const string& str, char splitter)
{
vector<string> tokens;
stringstream ss(str);
string temp;
while (getline(ss, temp, splitter)) // split into new "lines" based on character
{
tokens.push_back(temp);
}
return tokens;
}
vector<int> ParseData(const string& data)
{
vector<string> tokens = SplitStringToArray(data, ',');
vector<int> result;
for (vector<string>::const_iterator it = tokens.begin(), end_it = tokens.end(); it != end_it; ++it)
{
const string& token = *it;
vector<string> range = SplitStringToArray(token, '-');
if (range.size() == 1)
{
result.push_back(ConvertString2Int(range[0]));
}
else if (range.size() == 2)
{
int start = ConvertString2Int(range[0]);
int stop = ConvertString2Int(range[1]);
for (int i = start; i <= stop; i++)
{
result.push_back(i);
}
}
else
{
cerr << "Error parsing token " << token << endl;
abort();
}
}
return result;
}
int main()
{
vector<int> result = ParseData("1-5,10,12,15-16,25-35,67,69,99-105");
for (vector<int>::const_iterator it = result.begin(), end_it = result.end(); it != end_it; ++it)
{
cout << *it << " ";
}
cout << endl;
}
Live example
http://ideone.com/2W99Tt

This is my Boost approach.
It won't give you an array of ints, but a vector of ints instead.
Algorithm used: (nothing new)
Split string using ,
Split the individual string using -
Make a range low and high
Push it into vector with help of this range
Code:-
#include<iostream>
#include<vector>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>
int main(){
std::string line("1-5,10,12,15-16,25-35,67,69,99-105");
std::vector<std::string> strs,r;
std::vector<int> v;
int low,high,i;
boost::split(strs,line,boost::is_any_of(","));
for (auto it:strs)
{
boost::split(r,it,boost::is_any_of("-"));
auto x = r.begin();
low = high =boost::lexical_cast<int>(r[0]);
x++;
if(x!=r.end())
high = boost::lexical_cast<int>(r[1]);
for(i=low;i<=high;++i)
v.push_back(i);
}
for(auto x:v)
std::cout<<x<<" ";
return 0;
}

Your issue seems to be a misunderstanding of how strtok works. Have a look at this.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
int i, j;
char delims[] = " ,";
char str[] = "1-5,6,7";
char *tok;
char tmp[256];
int rstart, rend;
tok = strtok(str, delims);
while(tok != NULL) {
for(i = 0; i < strlen(tok); ++i) {
//// range
if(i != 0 && tok[i] == '-') {
strncpy(tmp, tok, i);
tmp[i] = '\0'; // strncpy does not terminate the copy here
rstart = atoi(tmp);
strcpy(tmp, tok + i + 1);
rend = atoi(tmp);
for(j = rstart; j <= rend; ++j)
printf("%d\n", j);
i = strlen(tok) + 1;
}
else if(strchr(tok, '-') == NULL) {
printf("%s\n", tok);
break; // print a plain number once, not once per character
}
}
tok = strtok(NULL, delims);
}
return 0;
}

Don't search. Just go through the text one character at a time. As long as you're seeing digits, accumulate them into a value. If the digits are followed by a - then you're looking at a range, and need to parse the next set of digits to get the upper bound of the range and put all the values into your list. If the value is not followed by a - then you've got a single value; put it into your list.
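A minimal sketch of that single pass could look like this. It assumes well-formed input with non-negative numbers and does no overflow checking; the function name expand_ranges is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper; assumes well-formed input such as "1-5,10,12".
std::vector<int> expand_ranges(const std::string& text)
{
    std::vector<int> out;
    std::size_t i = 0;

    // accumulate consecutive digits starting at position i into a value
    auto read_number = [&]() {
        int value = 0;
        while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
            value = value * 10 + (text[i++] - '0');
        return value;
    };

    while (i < text.size()) {
        if (!std::isdigit(static_cast<unsigned char>(text[i]))) {
            ++i;                       // skip ',' and any other separator
            continue;
        }
        int const first = read_number();
        int last = first;              // a single value is a range of length one
        if (i < text.size() && text[i] == '-') {
            ++i;                       // consume '-'
            last = read_number();      // upper bound of the range
        }
        for (int v = first; v <= last; ++v)
            out.push_back(v);
    }
    return out;
}
```

For the string in the question this yields 29 values, which matches the size given there.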

Stop and think about it: what you actually have is a comma
separated list of ranges, where a range can be either a single
number, or a pair of numbers separated by a '-'. So you
probably want to loop over the ranges, using recursive descent
for the parsing. (This sort of thing is best handled by an
istream, so that's what I'll use.)
std::vector<int> results;
std::istringstream parser{ std::string( var ) }; // braces avoid the "most vexing parse" (a function declaration)
processRange( results, parser );
while ( isSeparator( parser, ',' ) ) {
processRange( results, parser );
}
with:
bool
isSeparator( std::istream& source, char separ )
{
char next;
source >> next;
if ( source && next != separ ) {
source.putback( next );
}
return source && next == separ;
}
and
void
processRange( std::vector<int>& results, std::istream& source )
{
int first = 0;
source >> first;
int last = first;
if ( isSeparator( source, '-' ) ) {
source >> last;
}
if ( last < first ) {
source.setstate( std::ios_base::failbit );
}
if ( source ) {
while ( first != last ) {
results.push_back( first );
++ first;
}
results.push_back( first );
}
}
The isSeparator function will, in fact, probably be useful in
other projects in the future, and should be kept in your
toolbox.

First divide the whole string into numbers and ranges (using strtok() with "," as the delimiter) and save the strings in an array. Then search through the array looking for "-": if it is present, use sscanf() with the "%d-%d" format; otherwise use sscanf() with a single "%d" format.
Usage of these functions is easy to look up.
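The two steps could be sketched as follows. The function name expand_with_sscanf is hypothetical, a std::vector stands in for the array, and malformed pieces are silently skipped, which is an assumption of this sketch rather than part of the suggestion.

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper; takes the string by value because strtok modifies its buffer.
std::vector<int> expand_with_sscanf(std::string text)
{
    // step 1: split on ',' and save the pieces
    std::vector<std::string> pieces;
    for (char* tok = std::strtok(&text[0], ","); tok != nullptr;
         tok = std::strtok(nullptr, ","))
        pieces.emplace_back(tok);

    // step 2: a piece containing '-' is a range, anything else a single value
    std::vector<int> out;
    for (const std::string& piece : pieces) {
        int lo = 0, hi = 0;
        if (piece.find('-') != std::string::npos) {
            if (std::sscanf(piece.c_str(), "%d-%d", &lo, &hi) != 2)
                continue;              // malformed range: skipped in this sketch
        } else {
            if (std::sscanf(piece.c_str(), "%d", &lo) != 1)
                continue;              // not a number: skipped in this sketch
            hi = lo;
        }
        for (int v = lo; v <= hi; ++v)
            out.push_back(v);
    }
    return out;
}
```

Checking the return value of sscanf is what tells a complete "%d-%d" match apart from a partial one.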

One approach:
You need a parser that identifies 3 kinds of tokens: ',', '-', and numbers. That raises the level of abstraction so that you are operating at a level above characters.
Then you can parse your token stream to create a list of ranges and constants.
Then you can parse that list to convert the ranges into constants.
Some code that does part of the job:
#include <stdio.h>
// Prints a comma after the last digit. You will need to fix that up.
void print(int a, int b) {
for (int i = a; i <= b; ++i) {
printf("%d, ", i);
}
}
int main() {
enum { DASH, COMMA, NUMBER };
struct token {
int type;
int value;
};
// Sample input stream. Notice the sentinel comma at the end.
// 1-5,10,
struct token tokStream[] = {
{ NUMBER, 1 },
{ DASH, 0 },
{ NUMBER, 5 },
{ COMMA, 0 },
{ NUMBER, 10 },
{ COMMA, 0 } };
// This parser assumes well formed input. You have to add all the error
// checking yourself.
size_t i = 0;
while (i < sizeof(tokStream)/sizeof(struct token)) {
if (tokStream[i+1].type == COMMA) {
print(tokStream[i].value, tokStream[i].value);
i += 2; // skip to next number
}
else { // DASH
print(tokStream[i].value, tokStream[i+2].value);
i += 4; // skip to next number
}
}
return 0;
}
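The sample above hard-codes its token stream. A minimal sketch of the missing tokenizing step could look like the following. The names Token and tokenize are made up for illustration, and the choice to ignore whitespace and any other unexpected character is an assumption, not part of the answer above.

```cpp
#include <cctype>
#include <string>
#include <vector>

enum TokenType { DASH, COMMA, NUMBER };

struct Token {
    TokenType type;
    int value;  // only meaningful when type == NUMBER
};

// Hypothetical tokenizer: turns "1-5,10," into NUMBER DASH NUMBER COMMA ...
// Assumption: characters other than digits, '-' and ',' are ignored.
std::vector<Token> tokenize(const std::string& text)
{
    std::vector<Token> tokens;
    std::string::size_type i = 0;
    while (i < text.size()) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isdigit(c)) {
            // Accumulate a run of digits into one NUMBER token.
            int value = 0;
            while (i < text.size() &&
                   std::isdigit(static_cast<unsigned char>(text[i]))) {
                value = value * 10 + (text[i] - '0');
                ++i;
            }
            tokens.push_back(Token{NUMBER, value});
        } else {
            if (c == '-')
                tokens.push_back(Token{DASH, 0});
            else if (c == ',')
                tokens.push_back(Token{COMMA, 0});
            ++i;
        }
    }
    return tokens;
}
```

For the input "1-5,10," this yields the same six tokens as the hand-written tokStream array, so the parsing loop can then run over the returned vector instead.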

Add items to a vector recursively

I'm attempting to create a recursive function that outputs a vector of strings containing all possible word combinations (while retaining the order of letters) of a given string. Basically, it is the foundation of an auto-correct typing program that produces effects similar to that of the iPhone.
vector<string> allPossibleWords(string str, vector<vector<char> > & adjacentKeys)
{
vector<string> words;
cout << str << endl;
if (str.length() == 0)
{
return words;
}
char firstLetter = str[0];
string restOf = str.substr(1, str.length() - 1);
int position = position_in_vector(firstLetter);
for (int i = 0; i < adjacentKeys[position].size(); i++)
{
string temp(1, adjacentKeys[position][i]);
words.push_back(temp);
}
//allPossibleWords(restOf, adjacentKeys);
}
int position_in_vector(char letter)
{
return (letter % 97);
}
For instance, if str is "yp", the output should be a vector containing the values {"yp", "tp", "gp", "hp", "up", "yo", "to", "go", "ho", "uo", "yl", "tl", "gl", "hl", "ul"}. If str is "y", the output should be a vector containing the values {"y", "t", "g", "h", "u"}.
The 26 vectors stored in adjacentKeys contain the letters adjacent to the letter that is stored in the first position of the vector.
a qwsz
b vghjn
c xdfgv
d zserfcx
//and so on
I am stuck with this function, and can't figure out how to recursively build this vector.

(Update: 2130 GMT Sunday: I've significantly changed my answer. I think this works now.)
Here is a complete program. There are other changes I think I would make, but I'm trying to keep to the spirit of your initial solution. It's important to return a single empty word when str.length()==0.
#include <vector>
#include <string>
#include <iostream>
using namespace std;
vector<string> allPossibleWords(string str, vector<vector<char> > & adjacentKeys)
{
vector<string> words;
// cout << "str=" << str << endl;
if (str.length() == 0)
{
words.push_back("");
return words;
}
char firstLetter = str[0];
// cout << "firstLetter=" << firstLetter << endl;
int positionInAdjacentKeys = 0;
while(positionInAdjacentKeys < adjacentKeys.size() && adjacentKeys.at(positionInAdjacentKeys).front() != firstLetter) {
++ positionInAdjacentKeys;
}
vector<char> & adjacent = adjacentKeys.at(positionInAdjacentKeys);
string restOf = str.substr(1, str.length() - 1);
// cout << firstLetter << ":" << restOf << endl;
// int position = position_in_vector(firstLetter);
vector<string> recursiveWords = allPossibleWords(restOf, adjacentKeys);
for (vector<char>::size_type i = 0; i < adjacent.size(); i++)
{
const string temp(1, adjacent[i]);
// cout << " temp=" << temp << endl;
// The inner iterator gets its own name so it no longer shadows the outer index i.
for(vector<string>::const_iterator word = recursiveWords.begin(); word != recursiveWords.end(); word++)
{
// cout << "new word=" << temp + *word << endl;
words.push_back(temp + *word);
}
}
return words;
}
int main() {
vector<vector<char> > adj;
vector<char> v1;
v1.clear();
v1.push_back('p');
v1.push_back('o');
v1.push_back('l');
adj.push_back(v1);
v1.clear();
v1.push_back('y');
v1.push_back('t');
v1.push_back('g');
v1.push_back('h');
v1.push_back('u');
adj.push_back(v1);
adj.push_back(v1);
vector<string> words = allPossibleWords("yp", adj);
for(vector<string> :: const_iterator i = words.begin(); i != words.end(); i++) {
cout << *i << endl;
}
}

Maybe something like this? I haven't tested it because I don't have your adjacentKeys matrix. It can probably be optimised a bit, but I don't think this approach will scale well at all.
I'd suggest attacking the problem from a different angle, perhaps storing your dictionary in some kind of K-ary tree, and having several pointers walking the tree, following branches based on your adjacency matrix. This would stop the generation of invalid words (and subsequent lookups to check validity) as branches would only exist where valid words exist.
using namespace std;
void allPossibleWordsHelper(const string& str,
string::size_type index,
const vector<vector<char> >& adjacentKeys,
vector<string>& results)
{
if (str.length() == 0)
{
return;
}
std::string head = (index > 0) ? str.substr(0, index) : "";
std::string tail = (index < str.length() - 1) ? str.substr(index + 1) : "";
vector<string> possibleHeads;
string::size_type headIndex = (str.length() - index) / 2;
allPossibleWordsHelper(head, headIndex, adjacentKeys, possibleHeads);
vector<string> possibleTails;
allPossibleWordsHelper(tail, index + headIndex, adjacentKeys, possibleTails);
int pos = str[index] - 'a';
vector<string>::const_iterator headi;
vector<string>::const_iterator headi_end = possibleHeads.end();
vector<string>::const_iterator taili;
vector<string>::const_iterator taili_end = possibleTails.end();
vector<char>::const_iterator aki;
vector<char>::const_iterator aki_end = adjacentKeys[pos].end();
for(headi = possibleHeads.begin(); headi != headi_end; ++headi)
{
for (aki = adjacentKeys[pos].begin(); aki != aki_end; ++aki)
{
for (taili = possibleTails.begin(); taili != taili_end; ++taili)
{
string suggestedWord = *headi + *aki + *taili;
results.push_back(suggestedWord);
}
}
}
}
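The tree-based idea suggested above can be sketched roughly as follows. Everything here is an assumption made for illustration: the names TrieNode, insertWord, collect and suggestWords are invented, and adjacency is stored as a std::map from the typed letter to its candidate letters (the letter itself included) instead of the vector<vector<char> > used in the question.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical dictionary tree: one node per letter, flagged where a word ends.
struct TrieNode {
    bool isWord = false;
    std::map<char, std::unique_ptr<TrieNode>> children;
};

void insertWord(TrieNode& root, const std::string& word)
{
    TrieNode* node = &root;
    for (char c : word) {
        std::unique_ptr<TrieNode>& child = node->children[c];
        if (!child)
            child.reset(new TrieNode);
        node = child.get();
    }
    node->isWord = true;
}

// Walks the tree, following only branches that exist for an adjacent key.
void collect(const TrieNode& node, const std::string& typed,
             std::string::size_type index,
             const std::map<char, std::string>& adjacentKeys,
             std::string& prefix, std::vector<std::string>& results)
{
    if (index == typed.size()) {
        if (node.isWord)
            results.push_back(prefix);
        return;
    }
    const auto adjacent = adjacentKeys.find(typed[index]);
    if (adjacent == adjacentKeys.end())
        return;  // no adjacency data for this letter
    for (char candidate : adjacent->second) {
        const auto child = node.children.find(candidate);
        if (child == node.children.end())
            continue;  // no dictionary word continues this way
        prefix.push_back(candidate);
        collect(*child->second, typed, index + 1, adjacentKeys, prefix, results);
        prefix.pop_back();
    }
}

std::vector<std::string> suggestWords(const TrieNode& root,
                                      const std::string& typed,
                                      const std::map<char, std::string>& adjacentKeys)
{
    std::vector<std::string> results;
    std::string prefix;
    collect(root, typed, 0, adjacentKeys, prefix, results);
    return results;
}
```

With a dictionary holding "go", "to" and "up", and adjacency entries 'y' -> "ytghu" and 'p' -> "pol", suggestWords(root, "yp", adjacentKeys) returns "to", "go" and "up". Combinations such as "hp" are never generated because no branch exists for them.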


Split string into key-value pairs using C++ - c++

I have a string like this: "CA: ABCD\nCB: ABFG\nCC: AFBV\nCD: 4567" Now ": " splits key from value while \n separates the pairs. I want to add the key-value pairs to a map in C++. Is there any efficient way of doing this considering optimization in mind?

If you are worried about performance, you should probably reconsider whether the end result needs to be a map at all. A map of strings could end up being a lot of char buffers in memory. Ideally, keeping track of just the char* and length of each substring will be faster and smaller.
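One way to keep only a pointer and a length per substring is std::string_view (C++17). The following is a minimal sketch, not a measured optimization: the name splitPairs is made up, the ": " and '\n' separators come from the question, and skipping lines without a separator is an assumption. The views point into the original text, so that text must outlive the result.

```cpp
#include <string_view>
#include <utility>
#include <vector>

using KeyValueView = std::pair<std::string_view, std::string_view>;

// Hypothetical helper: returns views into `text` and copies no characters.
// Assumption: lines without a ": " separator are skipped.
std::vector<KeyValueView> splitPairs(std::string_view text)
{
    std::vector<KeyValueView> pairs;
    std::string_view::size_type pos = 0;
    while (pos < text.size()) {
        // Find the end of the current line (or the end of the text).
        std::string_view::size_type lineEnd = text.find('\n', pos);
        if (lineEnd == std::string_view::npos)
            lineEnd = text.size();
        const std::string_view line = text.substr(pos, lineEnd - pos);

        // Split the line at the first ": ".
        const std::string_view::size_type sep = line.find(": ");
        if (sep != std::string_view::npos)
            pairs.emplace_back(line.substr(0, sep), line.substr(sep + 2));

        pos = lineEnd + 1;
    }
    return pairs;
}
```

For the string from the question this yields four pairs, starting with ("CA", "ABCD") and ending with ("CD", "4567"). If lookups by key are needed afterwards, the vector can be sorted by key and searched with std::lower_bound, which still avoids one allocation per entry.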
