Parsing key/value pairs from a string in C++ - c++

I'm working in C++11, no Boost. I have a function that takes as input a std::string that contains a series of key-value pairs, delimited with semicolons, and returns an object constructed from the input. All keys are required, but may be in any order.
Here is an example input string:
Top=0;Bottom=6;Name=Foo;
Here's another:
Name=Bar;Bottom=20;Top=10;
There is a corresponding concrete struct:
struct S
{
    const uint8_t top;
    const uint8_t bottom;
    const string name;
};
I've implemented the function by repeatedly running a regular expression on the input string, once per member of S, and assigning the captured group of each to the relevant member of S, but this smells wrong. What's the best way to handle this sort of parsing?

For an easily readable solution, you can e.g. use std::regex_token_iterator and a sorted container to distinguish the attribute-value pairs (alternatively, use an unsorted container and std::sort).
std::regex r{R"([^;]+;)"};
std::set<std::string> tokens{std::sregex_token_iterator{std::begin(s), std::end(s), r}, std::sregex_token_iterator{}};
Now the attribute value strings are sorted lexicographically in the set tokens, i.e. the first is Bottom, then Name and last Top.
Lastly use a simple std::string::find and std::string::substr to extract the desired parts of the string.
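Putting these steps together, a minimal sketch might look like the following. The names parse_sorted and value_of are hypothetical, and the sketch assumes well-formed input in which each of the three keys appears exactly once and every pair ends with a semicolon.

```cpp
#include <cstdint>
#include <iterator>
#include <regex>
#include <set>
#include <stdexcept>
#include <string>

struct S
{
    const std::uint8_t top;
    const std::uint8_t bottom;
    const std::string name;
};

// Hypothetical helper: returns the text between '=' and the trailing ';'.
std::string value_of(const std::string& token)
{
    const std::string::size_type eq = token.find('=');
    if (eq == std::string::npos)
        throw std::invalid_argument("missing '=' in token: " + token);
    // token looks like "Key=Value;", so drop the key, the '=' and the ';'.
    return token.substr(eq + 1, token.size() - eq - 2);
}

// Hypothetical function name; assumes exactly the keys Bottom, Name and Top.
S parse_sorted(const std::string& s)
{
    const std::regex r{R"([^;]+;)"};
    const std::set<std::string> tokens(
        std::sregex_token_iterator{std::begin(s), std::end(s), r},
        std::sregex_token_iterator{});
    if (tokens.size() != 3)
        throw std::invalid_argument("expected exactly three key/value pairs");

    // Lexicographic order of the tokens: "Bottom=...", "Name=...", "Top=...".
    auto it = tokens.begin();
    const std::string bottom = value_of(*it++);
    const std::string name = value_of(*it++);
    const std::string top = value_of(*it);

    // std::stoi returns int, so the narrowing to uint8_t is made explicit.
    return S{static_cast<std::uint8_t>(std::stoi(top)),
             static_cast<std::uint8_t>(std::stoi(bottom)),
             name};
}
```

Relying on the sort order keeps the code short, but it ties the parser to the exact set of key names; adding a key means revisiting the order in which the tokens are read.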

Do you care about performance or readability? If readability is good enough, then pick your favorite version of split from this question and away we go:
std::map<std::string, std::string> tag_map;
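for (const std::string& tag : split(input, ';')) {
    auto key_val = split(tag, '=');  // split the current pair, not the whole input
    tag_map.insert(std::make_pair(key_val[0], key_val[1]));
}
// Keys are case-sensitive and must match the input ("Top", not "top").
// std::stoi returns int, so cast explicitly to avoid a narrowing error.
S s{static_cast<uint8_t>(std::stoi(tag_map["Top"])),
    static_cast<uint8_t>(std::stoi(tag_map["Bottom"])),
    tag_map["Name"]};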
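Since split is left to the reader above, here is one possible self-contained version of the whole approach. This is only a sketch: this particular split (built on std::getline, skipping empty pieces) and the function name parse_with_map are assumptions, not the implementation from the linked question.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct S
{
    const std::uint8_t top;
    const std::uint8_t bottom;
    const std::string name;
};

// One possible split(): cuts `text` at every `delim`, skipping empty pieces.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, delim)) {
        if (!piece.empty())
            parts.push_back(piece);
    }
    return parts;
}

// Hypothetical function name; throws if a pair is malformed or a key is missing.
S parse_with_map(const std::string& input)
{
    std::map<std::string, std::string> tag_map;
    for (const std::string& tag : split(input, ';')) {
        const std::vector<std::string> key_val = split(tag, '=');
        if (key_val.size() != 2)
            throw std::invalid_argument("malformed pair: " + tag);
        tag_map[key_val[0]] = key_val[1];
    }
    // at() throws std::out_of_range for a missing key instead of silently
    // inserting an empty value the way operator[] would.
    return S{static_cast<std::uint8_t>(std::stoi(tag_map.at("Top"))),
             static_cast<std::uint8_t>(std::stoi(tag_map.at("Bottom"))),
             tag_map.at("Name")};
}
```

Because all keys are required, using at() rather than operator[] turns a missing key into an exception rather than an empty string passed to std::stoi.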

Related

Why does `std::string::find()` not return the end iterator on failures?

I find the behaviour of std::string::find to be inconsistent with standard C++ containers.
E.g.
std::map<int, int> myMap = {{1, 2}};
auto it = myMap.find(10); // it == myMap.end()
But for a string,
std::string myStr = "hello";
auto it = myStr.find('!'); // it == std::string::npos
Why shouldn't the failed myStr.find('!') return myStr.end() instead of std::string::npos?
Since the std::string is somewhat special when compared with other containers, I am wondering whether there is some real reason behind this.
(Surprisingly, I couldn't find anyone questioning this anywhere).
To begin with, the std::string interface is well known to be bloated and inconsistent; see Herb Sutter's GotW #84 on this topic. Nevertheless, there is a rationale for std::string::find returning an index: std::string::substr. This convenience member function operates on indices, e.g.
const std::string src = "abcdefghijk";
std::cout << src.substr(2, 5) << "\n";
You could implement substr such that it accepts iterators into the string, but then we wouldn't need to wait long for loud complaints that std::string is unusable and counterintuitive. So given that std::string::substr accepts indices, how would you find the index of the first occurrence of 'd' in the above input string in order to print out everything starting from that character?
const auto it = src.find('d'); // imagine this returns an iterator
std::cout << src.substr(std::distance(src.cbegin(), it));
This might also not be what you want. Hence we can let std::string::find return an index, and here we are:
const std::string extracted = src.substr(src.find('d'));
If you want to work with iterators, use <algorithm>. It lets you write the above as
auto it = std::find(src.cbegin(), src.cend(), 'd');
std::copy(it, src.cend(), std::ostream_iterator<char>(std::cout));
This is because std::string has two interfaces:
The general iterator based interface found on all containers
The std::string specific index based interface
std::string::find is part of the index based interface, and therefore returns indices.
Use std::find to use the general iterator based interface.
Use std::vector<char> if you don't want the index based interface (don't do this).
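To make the contrast between the two interfaces concrete, here is a small sketch; both function names are hypothetical. Note that with the index-based interface the failure case has to be checked explicitly, because passing std::string::npos to substr throws std::out_of_range.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: everything from the first `c` onwards, or "" if absent.
std::string tail_from_index(const std::string& src, char c)
{
    const std::string::size_type pos = src.find(c);  // index-based interface
    if (pos == std::string::npos)                    // failure is signalled by npos
        return std::string();
    return src.substr(pos);
}

// The same result through the general iterator-based interface.
std::string tail_from_iterator(const std::string& src, char c)
{
    const auto it = std::find(src.cbegin(), src.cend(), c);
    return std::string(it, src.cend());              // empty range if not found
}
```

With iterators, the "not found" result is cend(), which still forms a valid empty range; with indices, npos is not a valid position and has to be handled before use.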

In place Tokenization of std::string into a map of Key Value

In C, the delimiters can be replaced by null characters, and a map of char* -> char* with a comparison function would work.
I am trying to figure out the fastest possible way to do this in modern C++. The idea is to avoid copying characters into the map.
std::string sample_string("name=alpha;title=something;job=nothing");
to
std::map<std::string,std::string> sample_map;
Without copying characters.
It's ok to lose original input string.
Two std::strings cannot point to the same underlying bytes, so no, this is not possible with std::string.
To avoid copying bytes, you could use iterators:
struct Slice {
    std::string::iterator begin, end;
    bool operator<(const Slice& that) const {
        return std::lexicographical_compare(begin, end, that.begin, that.end);
    }
};
std::map<Slice, Slice> sample_map;
And beware that modifying the original string may invalidate all of these iterators.
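As a rough sketch of how such a map could be filled, assuming the input never changes after tokenizing: the helper tokenize and the str() member are hypothetical additions, and const_iterator is used here instead of iterator because the slices are only read.

```cpp
#include <algorithm>
#include <map>
#include <string>

// A non-owning view of a character range inside another string.
struct Slice {
    std::string::const_iterator begin, end;
    bool operator<(const Slice& that) const {
        return std::lexicographical_compare(begin, end, that.begin, that.end);
    }
    // Copies the characters only when a real std::string is requested.
    std::string str() const { return std::string(begin, end); }
};

// Hypothetical helper: maps each key slice to its value slice.
// The returned slices stay valid only while `input` is alive and unmodified.
std::map<Slice, Slice> tokenize(const std::string& input)
{
    std::map<Slice, Slice> result;
    auto pos = input.cbegin();
    const auto last = input.cend();
    while (pos != last) {
        const auto pair_end = std::find(pos, last, ';');
        const auto eq = std::find(pos, pair_end, '=');
        if (eq != pair_end)  // skip pieces that have no '='
            result.emplace(Slice{pos, eq}, Slice{eq + 1, pair_end});
        pos = (pair_end == last) ? last : pair_end + 1;
    }
    return result;
}
```

A lookup needs a Slice as the key, for example one built from the cbegin() and cend() of a std::string holding "title". The map nodes themselves are still allocated; only the character data is shared with the input.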

vector of string, need to search for a particular character in it

I have a vector of strings, and I need to search for a particular substring in it:
vector<string> users;
users.push_back("user25_5");
users.push_back("user65_6");
users.push_back("user95_9");
I have to search for the number 65 in the vector.
std::find on a vector only matches entire strings; it does not look for particular characters inside each string.
You can use std::find_if with a suitable functor:
bool has_65(const std::string& s)
{
    // true if "65" occurs anywhere in s
    return s.find("65") != std::string::npos;
}
then
auto it = std::find_if(users.begin(), users.end(), has_65);
For finding strings inside strings, have a look at std::string::find.
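If the text to search for is not fixed, the same idea can be written with a lambda that captures it. This is a sketch with a hypothetical helper name, find_containing.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: iterator to the first element containing `needle`,
// or v.end() if no element contains it.
std::vector<std::string>::const_iterator
find_containing(const std::vector<std::string>& v, const std::string& needle)
{
    return std::find_if(v.begin(), v.end(),
                        [&needle](const std::string& s) {
                            return s.find(needle) != std::string::npos;
                        });
}
```

Note that this is a plain substring match, so searching for "65" would also match an element such as "user165_1".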

C++: storing CSV in container

I have a std::string that contains comma-separated values, and I need to store those values in a suitable container, e.g. an array, a vector or some other container. Is there a built-in function that does this, or do I need to write custom code?
If you're willing and able to use the Boost libraries, Boost Tokenizer would work really well for this task.
That would look like:
std::string str = "some,comma,separated,words";
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
boost::char_separator<char> sep(",");
tokenizer tokens(str, sep);
std::vector<std::string> vec(tokens.begin(), tokens.end());
You basically need to tokenize the string using , as the delimiter. This earlier Stack Overflow thread should help you with it.
Here is another relevant post.
I don't think there is anything available in the standard library for this. I would approach it like this:
Tokenize the string on the , delimiter using strtok.
Convert each token to an integer using the atoi function.
push_back the value to the vector.
If you are comfortable with boost library, check this thread.
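The tokenize, convert and push_back steps above can also be sketched with standard C++ streams instead of strtok and atoi. Both helper names below are hypothetical, and the sketch does not handle quoted fields or escaped commas.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` at each comma.
std::vector<std::string> split_csv(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ','))
        fields.push_back(field);
    return fields;
}

// Hypothetical helper for numeric data; std::stoi throws on non-numeric input,
// whereas atoi would silently return 0.
std::vector<int> split_csv_ints(const std::string& line)
{
    std::vector<int> values;
    for (const std::string& field : split_csv(line))
        values.push_back(std::stoi(field));
    return values;
}
```

Unlike strtok, this version leaves the input string untouched and keeps no hidden state between calls.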
Using AXE parser generator you can easily parse your csv string, e.g.
std::string input = "aaa,bbb,ccc,ddd";
std::vector<std::string> v; // your strings get here
auto value = *(r_any() - ',') >> r_push_back(v); // rule for single value
auto csv = *(value & ',') & value & r_end(); // rule for csv string
csv(input.begin(), input.end());
Disclaimer: I didn't test the code above, it might have some superficial errors.

predicate for a map from string to int

I have this small program that reads a line of input & prints the words in it, with their respective number of occurrences. I want to sort the elements in the map that stores these values according to their occurrences. I mean, the words that only appear once will be ordered to be at the beginning, then the words that appeared twice, & so on. I know that the predicate should return a bool value, but I don't know what the parameters should be. Should it be two iterators to the map? If someone could explain this, it would be greatly appreciated. Thank you in advance.
#include<iostream>
#include<map>
#include<string>
using std::cout;
using std::cin;
using std::endl;
using std::string;
using std::map;
int main()
{
string s;
map<string,int> counters; //store each word & an associated counter
//read the input, keeping track of each word & how often we see it
while(cin>>s)
{
++counters[s];
}
//write the words & associated counts
for(map<string,int>::const_iterator iter = counters.begin();iter != counters.end();iter++)
{
cout<<iter->first<<"\t"<<iter->second<<endl;
}
return 0;
}
std::map is always sorted according to its key. You cannot sort the elements by their value.
You need to copy the contents to another data structure (for example std::vector<std::pair<string, int> >) which can be sorted.
Here is a predicate that can be used to sort such a vector. Note that sorting algorithms in C++ standard library need a "less than" predicate which basically says "is a smaller than b".
bool cmp(std::pair<string, int> const &a, std::pair<string, int> const &b) {
return a.second < b.second;
}
You can't re-sort a map; its order is predefined (by default, by std::less on the key type). The easiest solution for your problem would be to create a std::multimap<int, string> and insert your values there, then just loop over the multimap, which is ordered on the key type (int, the number of occurrences). This gives you the order that you want without having to define a predicate.
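A minimal sketch of the multimap idea, using a hypothetical helper named by_count:

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: re-keys the word counts by count, so that iterating
// over the result visits the least frequent words first.
std::multimap<int, std::string> by_count(const std::map<std::string, int>& counters)
{
    std::multimap<int, std::string> result;
    for (const auto& entry : counters)
        result.insert(std::make_pair(entry.second, entry.first));
    return result;
}
```

A multimap is needed rather than a map because several words can share the same count.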
You are not going to be able to do this with one pass with an std::map. It can only be sorted on one thing at a time, and you cannot change the key in-place. What I would recommend is to use the code you have now to maintain the counters map, then use std::max_element with a comparison function that compares the second field of each std::pair<string, int> in the map.
A map has its keys sorted, not its values. That's what makes the map efficient. You cannot sort it by occurrences without using another data structure (maybe a reversed index!)
As stated, it simply won't work -- a map always remains sorted by its key value, which would be the strings.
As others have noted, you can copy the data to some other structure, and sort by the value. Another possibility would be to use a Boost bimap instead. I've posted a demo of the basic idea previously.
You probably want to transform map<string,int> into a vector<pair<string, int> > (the key cannot stay const, because std::sort must be able to reassign elements), then sort the vector on the int member.
You could do
struct PairLessSecond
{
template< typename P >
bool operator()( const P& pairLeft, const P& pairRight ) const
{
return pairLeft.second < pairRight.second;
}
};
You can probably also construct all this somehow using a lambda with a bind.
Now
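// The map's value_type has a const key and cannot be reassigned by std::sort,
// so copy the entries into pairs with a non-const key first.
std::vector< std::pair<std::string,int> > byCount( counters.begin(), counters.end() );
std::sort( byCount.begin(), byCount.end(), PairLessSecond() );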
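For comparison, the copy-and-sort approach can be sketched with a lambda in place of the functor. The helper name sorted_by_count is hypothetical, and std::stable_sort is a substitution for std::sort, chosen here so that words with equal counts keep the map's alphabetical order.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: copies the map into a vector and sorts it by count.
std::vector<std::pair<std::string, int>>
sorted_by_count(const std::map<std::string, int>& counters)
{
    // The element type drops the const on the key so elements can be reassigned.
    std::vector<std::pair<std::string, int>> byCount(counters.begin(), counters.end());
    // stable_sort keeps words with equal counts in the map's alphabetical order.
    std::stable_sort(byCount.begin(), byCount.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second < b.second;
                     });
    return byCount;
}
```

The result can then be printed with the same loop as in the question, iterating over the vector instead of the map.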