Iterate over an INI file in C++, probably using boost::property_tree::ptree?

My task is trivial: I just need to parse a file like this:
Apple = 1
Orange = 2
XYZ = 3950
But I do not know the set of available keys in advance. Parsing this file was relatively easy in C#; here is the source code:
public static Dictionary<string, string> ReadParametersFromFile(string path)
{
string[] linesDirty = File.ReadAllLines(path);
string[] lines = linesDirty.Where(
str => !String.IsNullOrWhiteSpace(str) && !str.StartsWith("//")).ToArray();
var dict = lines.Select(s => s.Split(new char[] { '=' }))
.ToDictionary(s => s[0].Trim(), s => s[1].Trim());
return dict;
}
Now I need to do the same thing in C++. I was thinking of using boost::property_tree::ptree, but it seems I cannot iterate over the INI file. Reading the INI file is easy:
boost::property_tree::ptree pt;
boost::property_tree::ini_parser::read_ini(path, pt);
But it seems it is not possible to iterate over it; see this question: Boost program options - get all entries in section
The question is: what is the easiest way to write an analog of the C# code above in C++?

To answer your question directly: of course iterating a property tree is possible. In fact it's trivial:
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/ini_parser.hpp>
#include <iostream>
#include <string>
int main()
{
using boost::property_tree::ptree;
ptree pt;
read_ini("input.txt", pt);
for (auto& section : pt)
{
std::cout << '[' << section.first << "]\n";
for (auto& key : section.second)
std::cout << key.first << "=" << key.second.get_value<std::string>() << "\n";
}
}
This results in output like:
[Cat1]
name1=100 #skipped
name2=200 \#not \\skipped
name3=dhfj dhjgfd
[Cat_2]
UsagePage=9
Usage=19
Offset=0x1204
[Cat_3]
UsagePage=12
Usage=39
Offset=0x12304
I've written a very full-featured Inifile parser using boost-spirit before:
Cross-platform way to get line number of an INI file where given option was found
It supports comments (single line and block), quotes, escapes etc.
(as a bonus, it optionally records the exact source locations of all the parsed elements, which was the subject of that question).
For your purpose, though, I think I'd recommend Boost Property Tree.

For the moment, I've simplified the problem a bit, leaving out the logic for comments (which looks broken to me anyway).
#include <map>
#include <fstream>
#include <iostream>
#include <string>
#include <iterator> // std::istream_iterator
#include <utility>  // std::pair
typedef std::pair<std::string, std::string> entry;
// This isn't officially allowed (it's an overload, not a specialization) but is
// fine with every compiler of which I'm aware.
namespace std {
std::istream &operator>>(std::istream &is, entry &d) {
std::getline(is, d.first, '=');
std::getline(is, d.second);
return is;
}
}
int main() {
// open an input file.
std::ifstream in("myfile.ini");
// read the file into our map:
std::map<std::string, std::string> dict((std::istream_iterator<entry>(in)),
std::istream_iterator<entry>());
// Show what we read:
for (entry const &e : dict)
std::cout << "Key: " << e.first << "\tvalue: " << e.second << "\n";
}
Personally, I think I'd write the comment skipping as a filtering stream buffer, but for those unfamiliar with the C++ standard library, it's open to argument that this would be a somewhat roundabout solution. Another possibility would be a comment_iterator that skips the remainder of a line, starting from a designated comment delimiter. I don't like that as much, but it's probably simpler in some ways.
Note that the only code we really write here is to read one, single entry from the file into a pair. The istream_iterator handles pretty much everything from there. As such, there's little real point in writing a direct analog of your function -- we just initialize the map from the iterators, and we're done.

Related

Insert into array specific strings from text file

ArticlesDataset.txt file contains all the metadata information of documents. unigramCount contains all unique words and their number of occurrences for each document. There are 1500 publications recorded in the txt file. Here is an example entry for a document:
{"creator":["Romain Allais","Julie Gobert"],
"datePublished":"2018-05-30",
"docType":"article",
"doi":"10.1051\/mattech\/2018010",
"id":"ark:\/\/27927\/phz10hn2bh3",
"isPartOf":"Mat\u00e9riaux & Techniques",
"issueNumber":"5-6",
"language":["eng"],
"outputFormat":["unigram","bigram","trigram"],
"pageCount":7,
"pagination":"pp. null-null",
"provider":"portico",
"publicationYear":2018,
"publisher":"EDP Sciences",
"sequence":3.0,
"tdmCategory":["Applied sciences -Engineering"],
"title":"Environmental assessment of PSS",
"url":"http:\/\/doi.org\/10.1051\/mattech\/2018010",
"volumeNumber":"105",
"wordCount":4446,
"unigramCount":{"others":1,"air":1,"networks,":1,"conventional":1,"IEEE":1}}
My goal is to pull out the unigram counts for each document and store them in a suitable array. How can I do that using the fstream library?
How can I improve the code below to reach my goal?
std::string dummy;
std::ifstream data("PublicationsDataSet.txt");
while (data.good())
{
getline(data, dummy, ',');
}
Your question covers two different topics: one is parsing the data, the other is storing it in memory.
On the first point: you'll need a parser. You can write one yourself, which involves a lexer that converts each meaningful character or keyword into a token, and an interpreter that builds a data object according to the token that precedes or follows the data, e.g.:
'[' = start of an array, every value after this is part of the array
']' = end of an array, return to the previous parsing state
':' = separates key and value, left-hand side is the key, right-hand side is the value
...
This is a fine exercise to sharpen one's skills, but it is arduous and can turn into a never-ending bug-fixing road. As other comments also recommend, using an existing library is probably the easier route when time is short.
Another thing to point out: plain arrays in C++ have a fixed size, so since you are parsing the values you will most likely use a std::vector, which allows insertion. Once you are done processing the file, if you really need to hand the data back as an array, you can copy it directly from the vector:
std::vector<YourObjectType> parsedObject;
// ... fill parsedObject while parsing ...
YourObjectType* arr = new YourObjectType[parsedObject.size()];
std::copy(parsedObject.begin(), parsedObject.end(), arr);
This is pseudocode; a lot depends on the implementation, but it gives the idea.
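As a side note, a copy is often not needed at all: std::vector stores its elements contiguously, so data() and size() can be handed to code that expects a plain array. A minimal sketch, where sum_array stands in for some hypothetical array-based API and sum_parsed_counts is an illustrative caller:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-style API that expects a plain array and its length.
int sum_array(const int* values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) total += values[i];
    return total;
}

// Collect values in a vector while parsing, then pass the underlying array on
// without copying it.
int sum_parsed_counts(const std::vector<int>& counts) {
    return sum_array(counts.data(), counts.size());
}
```

The pointer returned by data() stays valid only as long as the vector is alive and is not resized.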
A starting point for writing a parser is this article, which goes into great detail on how a parser works and on its components. Mind you, every parser implements its own language (yes, C++ and other languages are all parsed too), so you'll need to extend the concept with your own commands:
expression parser
Here's a simplified solution of what you could do using std::regex:
Read the lines of a stream (std::cin in this case) one by one.
Check if the line contains a unigramCount element.
If that's the case, walk the different entries within the unigramCount element.
About the regular expressions used:
"unigramCount":{}, allowing:
zero or more whitespaces basically everywhere, and
zero or more characters within the braces.
"<key>":<value>, where:
<key> is one or more characters other than a double quote,
<value> is one or more digits, and
you could have whitespaces at both sides of the :.
A good data structure for storing your unigramCount entries could be a std::map.
[Demo]
#include <iostream> // cout
#include <map>
#include <regex> // regex_match, regex_search, sregex_iterator
#include <string> // stoi
int main()
{
std::string line{};
std::map<std::string, int> unigram_counts{};
while (std::getline(std::cin, line))
{
const std::regex unigram_count_pattern{R"(^\s*\"unigramCount\"\s*:\s*\{.*\}\s*$)"};
if (std::regex_match(line, unigram_count_pattern))
{
const std::regex entry_pattern{R"(\"([^\"]+)\"\s*:\s*([0-9]+))"};
for (auto entry_it{std::sregex_iterator(line.cbegin(), line.cend(), entry_pattern)};
entry_it != std::sregex_iterator{};
++entry_it)
{
auto matches{*entry_it};
auto& key{matches[1]};
auto& value{matches[2]};
unigram_counts[key] = std::stoi(value);
}
}
}
for (auto& [key, value] : unigram_counts)
{
std::cout << "'" << key << "' : " << value << "\n";
}
}
// Outputs:
//
// 'IEEE' : 1
// 'air' : 1
// 'conventional' : 1
// 'networks,' : 1
// 'others' : 1
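The demo above merges every entry into a single map. If the counts are needed per document, as the question asks, one option is a vector with one map per document. The sketch below assumes each document occupies exactly one line of the file and that the unigramCount object is flat (no nested braces, no escaped quotes or '}' inside keys); the names UnigramCounts, parse_unigram_counts and read_all_unigram_counts are made up for illustration.

```cpp
#include <istream>
#include <map>
#include <regex>
#include <string>
#include <vector>

using UnigramCounts = std::map<std::string, int>;

// Extracts the entries of the first "unigramCount" object found in `text`.
// Assumption: the object is flat and its keys contain no '}' or escaped quotes.
UnigramCounts parse_unigram_counts(const std::string& text) {
    static const std::regex object_pattern{R"re("unigramCount"\s*:\s*\{([^}]*)\})re"};
    static const std::regex entry_pattern{R"re("([^"]+)"\s*:\s*([0-9]+))re"};
    UnigramCounts counts;
    std::smatch object_match;
    if (!std::regex_search(text, object_match, object_pattern)) return counts;
    const std::string body = object_match[1].str();
    for (std::sregex_iterator it(body.begin(), body.end(), entry_pattern), end; it != end; ++it) {
        counts[(*it)[1].str()] = std::stoi((*it)[2].str());
    }
    return counts;
}

// Assumption: each document occupies exactly one line of the input.
std::vector<UnigramCounts> read_all_unigram_counts(std::istream& in) {
    std::vector<UnigramCounts> documents;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("\"unigramCount\"") != std::string::npos)
            documents.push_back(parse_unigram_counts(line));
    }
    return documents;
}
```

With a std::ifstream opened on the data file, read_all_unigram_counts(file)[i] would then hold the counts for the i-th document. For anything beyond this simple shape, a real JSON parser is the safer choice.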

c++ boost parse ini file when containing multikeys

I need to parse an INI file in C++ using the Boost library. The file contains repeated keys (multikeys). For example:
[section_1]
key_1=value_1
key_1=value_2
...
key_n=value_n
[section_2]
key1=value_1
key1=value_2
...
key_n=value_1
key_n=value_2
[]
...
[section_n]
...
I tried boost::property_tree::ini_parser::read_ini(), but it cannot handle repeated keys in an INI file and throws an exception. I then tried boost::program_options::parse_config_file(), but it's not what I need.
What functionality should I use to parse the INI file so that for each section I get its own structure with the relevant key values?
Your input is simply not an INI file, as INI files do not permit duplicate keys. You can write your own parser, e.g. using the code I wrote here:¹
Cross-platform way to get line number of an INI file where given option was found
If you replace the section_t map
typedef std::map<textnode_t, textnode_t> section_t;
with multimap:
typedef std::multimap<textnode_t, textnode_t> section_t;
you can parse repeated keys:
[section_1]
key_1=value_1
key_1=value_2
key_n=value_n
[section_2]
key1=value_1
key2=value_2
key_n=value_1
key_n=value_2
[section_n]
See full code here: https://gist.github.com/sehe/068b1ae81547b98a3cec02a530f220df
¹ or Learning Boost.Spirit: parsing INI and http://coliru.stacked-crooked.com/view?id=cd1d516ae0b19bd6f9af1e3f1b132211-0d2159870a1c6cb0cd1457b292b97230 and possibly others
A SSCCE that might help you
Live On Coliru
#include <boost/property_tree/ini_parser.hpp>
#include <iostream>
#include <sstream>
#include <string>
using boost::property_tree::ptree;
int main() {
std::istringstream iss(R"([section_1]
key_1=value_1
key_2=value_2
key_n=value_n
[section_2]
key1=value_1
key2=value_2
key_n=value_n
key_m=value_m
[]
[section_n])");
ptree pt;
read_ini(iss, pt);
for (auto& section : pt) {
std::cout << "[" << section.first << "]\n";
for (auto& key : section.second) {
std::cout << key.first << "=" << key.second.get_value("") << "\n";
}
}
}
Prints
[section_1]
key_1=value_1
key_2=value_2
key_n=value_n
[section_2]
key1=value_1
key2=value_2
key_n=value_n
key_m=value_m

boost property tree cannot read multiple json data in one file

I really need help solving my problem. I am using Boost property tree to parse Twitter messages stored in a JSON file. All messages are saved in one JSON file and I need to parse them one by one.
Here is the Twitter JSON data saved in the file. It has 3 different messages. (The messages below are reduced, for testing only.)
{"id":593393012970926082,"in_reply_to_status_id":1,"user":{"id":2292380240,"followers_count":2},"retweet_count":0}
{"id":654878454684687878,"in_reply_to_status_id":7,"user":{"id":2292380241,"followers_count":4},"retweet_count":5}
{"id":123487894154878414,"in_reply_to_status_id":343,"user":{"id":2292380242,"followers_count":773},"retweet_count":654}
And here is my C++ code for parsing the message, using property tree.
#include <boost/property_tree/json_parser.hpp>
#include <iostream>
#include <string>
using namespace std;
using namespace boost::property_tree;
string jsonfile = "./twitter.json";
int main()
{
ptree pt;
read_json( jsonfile, pt );
cout<<"in_reply_to_status_id: "<<pt.get("in_reply_to_status_id",0)<<"\n";
}
I want to get all in_reply_to_status_id values from the file. Currently it prints only the value from the first line. The output is as follows:
in_reply_to_status_id: 1
I would like to get all values like below.
in_reply_to_status_id: 1
in_reply_to_status_id: 7
in_reply_to_status_id: 343
How can I get all values from the file?
Please help me. Thank you very much.
You need a valid JSON file, for example like this:
[
{"id":593393012970926082,"in_reply_to_status_id":1,"user":{"id":2292380240,"followers_count":2},"retweet_count":0},
{"id":654878454684687878,"in_reply_to_status_id":7,"user":{"id":2292380241,"followers_count":4},"retweet_count":5},
{"id":123487894154878414,"in_reply_to_status_id":343,"user":{"id":2292380242,"followers_count":773},"retweet_count":654}
]
And the code should look like this:
for (const auto& p : pt)
{
cout << p.second.get("in_reply_to_status_id",0) << endl;
}
Instead of range-based for, you can use BOOST_FOREACH for example.
BOOST_FOREACH(const ptree::value_type& p, pt)
You can look at my example: first get the child tree, then iterate over it. My code:
string str = "{\"key\":[{\"id\":1}, {\"id\":2}]}";
stringstream ss(str);
boost::property_tree::ptree parser, child;
boost::property_tree::json_parser::read_json(ss, parser);
child = parser.get_child("key");
for(auto& p : child)
cout << p.second.get<uint32_t>("id") << endl;
I hope this can help you.

C++ boost/regex regex_search

Consider the following string content:
string content = "{'name':'Fantastic gloves','description':'Theese gloves will fit any time period.','current':{'trend':'high','price':'47.1000'}";
I have never used regex_search and I have been searching around for ways to use it - I still do not quite get it. From that random string (it's from an API) how could I grab two things:
1) the price - in this example it is 47.1000
2) the name - in this example Fantastic gloves
From what I have read, regex_search would be the best approach here. I plan on using the price as an integer value, so I will use regex_replace to remove the "." from the string before converting it. I have only used regex_replace so far and found it easy to work with; I don't know why I am struggling so much with regex_search.
Key notes:
Content is contained inside ' '
Content id and value are separated by :
Content/value pairs are separated by ,
The values of the name and price ids will vary.
My first thought was to locate, for instance, price, then move 3 characters ahead (':') and gather everything until the next '. However, I am not sure whether I am completely off track here.
Any help is appreciated.
boost::regex would not be needed. Regular expressions are used for more general pattern matching, whereas your example is very specific. One way to handle your problem is to break the string up into individual tokens. Here is an example using boost::tokenizer:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <map>
int main()
{
std::map<std::string, std::string> m;
std::string content = "{'name':'Fantastic gloves','description':'Theese gloves will fit any time period.','current':{'trend':'high','price':'47.1000'}";
boost::char_separator<char> sep("{},':");
boost::tokenizer<boost::char_separator<char>> tokenizer(content, sep);
std::string id;
for (auto tok = tokenizer.begin(); tok != tokenizer.end(); ++tok)
{
// Since "current" is a special case I added code to handle that
if (*tok != "current")
{
id = *tok++;
m[id] = *tok;
}
else
{
id = *++tok;
m[id] = *++tok; // trend
id = *++tok;
m[id] = *++tok; // price
}
}
std::cout << "Name: " << m["name"] << std::endl;
std::cout << "Price: " << m["price"] << std::endl;
}
Link to live code.
As the string you are attempting to parse appears to be JSON (JavaScript Object Notation), consider using a specialized JSON parser.
You can find a comprehensive list of JSON parsers in many languages including C++ at http://json.org/. Also, I found a discussion on the merits of several JSON parsers for C++ in response to this SO question.
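Since the question asked specifically about regex_search, here is a minimal sketch of that approach as well. It uses std::regex from C++11 rather than boost::regex (Boost.Regex offers a very similar interface), and find_quoted_value is a hypothetical helper name. It assumes the key contains no regex metacharacters and that values contain no embedded single quotes.

```cpp
#include <regex>
#include <string>

// Returns the single-quoted value that follows 'key': in `content`,
// or an empty string if the key is not found.
// Assumption: values contain no escaped or embedded single quotes,
// and `key` contains no regex metacharacters.
std::string find_quoted_value(const std::string& content, const std::string& key) {
    const std::regex pattern("'" + key + "'\\s*:\\s*'([^']*)'");
    std::smatch match;
    if (std::regex_search(content, match, pattern)) return match[1].str();
    return "";
}
```

For the string in the question, find_quoted_value(content, "name") yields Fantastic gloves and find_quoted_value(content, "price") yields 47.1000. The parenthesised group in the pattern is what match[1] refers to; match[0] would be the whole matched text including the key.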

How does one extract the sequence of parsed options using Boost Program Options?

I'm building a graph generator using Boost Graph and Program Options. There are, for example, two types of components C and W, each with 1 source, 1 sink and some additional parameters to specify topology in between. I'd like to be able to stitch them together in the sequence provided by the order of the command line arguments.
For example:
./bin/make_graph -c4,5,1 -w3,3 -c3,1,2
Should create a graph resembling the following:
C -- W -- C
But:
./bin/make_graph -c4,5,1 -c3,1,2 -w3,3
Should create a graph resembling the following:
C -- C -- W
Using boost::program_options, I was unable to determine how to extract the exact order since it "composes" the options of the same string_key into a map with value_type == vector< string > (in my case).
By iterating over the map, the order is lost. Is there a way to not duplicate the parsing, but have a function called (perhaps a callback) every time an option is parsed? I couldn't find documentation in this direction. Any other suggestions?
To convince you that I'm not making this up, here's what I have so far:
namespace bpo = boost::program_options;
std::vector<std::string> args_cat, args_grid, args_web;
bpo::options_description desc("Program options:");
desc.add_options()
.operator ()("help,h","Displays this help message.")
.operator ()("caterpillar,c",bpo::value< std::vector<std::string> >(&args_cat)->default_value( std::vector<std::string>(1,"4,7,2"), "4,7,2" ),"Caterpillar tree with 3 parameters")
.operator ()("grid,g",bpo::value< std::vector<std::string> >(&args_grid)->default_value( std::vector<std::string>(1,"3,4"), "3,4" ),"Rectangular grid with 2 parameters")
.operator ()("web,w",bpo::value< std::vector<std::string> >(&args_web)->default_value( std::vector<std::string>(1,"3,4"), "3,4" ),"Web with 2 parameters")
;
bpo::variables_map ops;
bpo::store(bpo::parse_command_line(argc,argv,desc),ops);
bpo::notify(ops);
if((argc < 2) || (ops.count("help"))) {
std::cout << desc << std::endl;
return;
}
//TODO: remove the following scope block after testing
{
typedef bpo::variables_map::iterator OptionsIterator;
OptionsIterator it = ops.begin(), it_end = ops.end();
while(it != it_end) {
std::cout << it->first << ": ";
BOOST_FOREACH(std::string value, it->second.as< std::vector<std::string> >()) {
std::cout << value << " ";
}
std::cout << std::endl;
++it;
}
return;
}
I realize that I could also include the type as a parameter and solve this problem trivially, e.g.:
./bin/make_graph --component c,4,5,1 --component w,3,3 --component c,3,1,2
but that's moving in the direction of writing a parser/validator myself (maybe even without using Boost Program Options):
./bin/make_graph --custom c,4,5,1,w,3,3,c,3,1,2
./bin/make_graph c,4,5,1,w,3,3,c,3,1,2
How would you guys recommend I do this in an elegant way?
Thanks in advance!
PS: I've searched on SO for "[boost] +sequence program options" and "[boost-program-options] +order" (and their variants) before posting this, so I apologize in advance if this turns out to be a duplicate.
Since posting the question, I did some digging and have a "hack" that works with the existing examples I had above.
bpo::parsed_options p_ops = bpo::parse_command_line(argc,argv,desc);
typedef std::vector< bpo::basic_option<char> >::iterator OptionsIterator;
OptionsIterator it = p_ops.options.begin(), it_end = p_ops.options.end();
while(it != it_end) {
std::cout << it->string_key << ": ";
BOOST_FOREACH(std::string value, it->value) {
std::cout << value << " ";
}
std::cout << std::endl;
++it;
}
The reason I call it a hack is because it accesses all arguments as strings, and one would have to extract the types from it much like bpo::variables_map does with the .as<T>() member function. EDIT: It also accesses a member of the options struct directly.
How about this:
./bin/make_graph c,4,5,1 c,3,1,2 w,3,3
Where "c,4,5,1", "c,3,1,2" and "w,3,3" are positional arguments which are stored (in order) in a std::vector<std::string> (just like --input-file in this tutorial). Then use Boost.Tokenizer or boost::algorithm::split to extract the subtokens from each argument string.
If the graphs can be complex, you should consider making it possible for the user to specify an input file that contains the graph parameters. Boost.Program_Options can parse a user config file that uses the same syntax as the command line options.
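To illustrate the splitting step, here is a minimal sketch that uses std::getline on a string stream instead of Boost.Tokenizer or boost::algorithm::split. The Component struct and the parse_component / parse_components functions are hypothetical names; the code assumes the first field is a single letter and the remaining fields are integers (std::stoi throws otherwise).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one graph component, e.g. "c,4,5,1".
struct Component {
    char kind = '\0';
    std::vector<int> params;
};

// Splits "c,4,5,1" into kind 'c' and params {4, 5, 1}.
// Assumption: the first field is a single letter and the rest are integers;
// std::stoi throws on anything else.
Component parse_component(const std::string& arg) {
    Component result;
    std::istringstream fields(arg);
    std::string field;
    bool first = true;
    while (std::getline(fields, field, ',')) {
        if (first) {
            result.kind = field.empty() ? '\0' : field[0];
            first = false;
        } else {
            result.params.push_back(std::stoi(field));
        }
    }
    return result;
}

// Converts the positional arguments, preserving their command-line order.
std::vector<Component> parse_components(const std::vector<std::string>& args) {
    std::vector<Component> components;
    for (const auto& arg : args) components.push_back(parse_component(arg));
    return components;
}
```

Feeding it the vector of positional arguments collected by Program Options keeps the components in the order the user typed them, which is the property the question is after.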