Learning Boost.Spirit: parsing INI - c++

I started learning Boost.Spirit and have finished reading the Qi - Writing Parsers section. While reading, everything seems easy and understandable. But when I try to do something myself, I get a lot of errors, because there are so many includes and namespaces and I don't yet know when to include or use them. As practice, I want to write a simple INI parser.
Here is the code (the includes, like almost everything else, are taken from one of the examples in the Spirit library):
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <map>
namespace client
{
typedef std::map<std::string, std::string> key_value_map_t;
struct mini_ini
{
std::string name;
key_value_map_t key_values_map;
};
} // client
BOOST_FUSION_ADAPT_STRUCT(
client::mini_ini,
(std::string, name)
(client::key_value_map_t, key_values_map)
)
namespace client
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
template <typename Iterator>
struct ini_grammar : qi::grammar<Iterator, mini_ini(), ascii::space_type>
{
ini_grammar() : ini_grammar::base_type(section_, "section")
{
using qi::char_;
using qi::on_error;
using qi::fail;
using namespace qi::labels;
using phoenix::construct;
using phoenix::val;
key_ = +char_("a-zA-Z_0-9");
pair_ = key_ >> '=' >> *char_;
section_ = '[' >> key_ >> ']' >> '\n' >> *(pair_ >> '\n');
key_.name("key");
pair_.name("pair");
section_.name("section");
on_error<fail>
(
section_
, std::cout
<< val("Error! Expecting ")
<< _4 // what failed?
<< val(" here: \"")
<< construct<std::string>(_3, _2) // iterators to error-pos, end
<< val("\"")
<< std::endl
);
}
qi::rule<Iterator, std::string(), ascii::space_type> key_;
qi::rule<Iterator, mini_ini(), ascii::space_type> section_;
qi::rule<Iterator, std::pair<std::string, std::string>(), ascii::space_type> pair_;
};
} // client
int
main()
{
std::string storage =
"[section]\n"
"key1=val1\n"
"key2=val2\n";
client::mini_ini ini;
typedef client::ini_grammar<std::string::const_iterator> ini_grammar;
ini_grammar grammar;
using boost::spirit::ascii::space;
std::string::const_iterator iter = storage.begin();
std::string::const_iterator end = storage.end();
bool r = phrase_parse(iter, end, grammar, space, ini);
if (r && iter == end)
{
std::cout << "-------------------------\n";
std::cout << "Parsing succeeded\n";
std::cout << "-------------------------\n";
return 0;
}
else
{
std::cout << "-------------------------\n";
std::cout << "Parsing failed\n";
std::cout << "-------------------------\n";
std::cout << std::string(iter, end) << "\n";
return 1;
}
return 0;
}
As you can see, I want to parse the following text into the mini_ini struct:
"[section]"
"key1=val1"
"key2=val2";
The parse fails, and std::string(iter, end) is the full input string.
My questions:
Why do I see a failure but never see the on_error<fail> handler being called?
Do you have any recommendations on how to learn Boost.Spirit? (I understand the documentation well in theory, but in practice I have a lot of WHY???)
Thanks

Q. Why do I see a failure but don't see the on_error handler?
The on_error handler fires only for the rule it is registered on (section_), and only when an expectation point fails.
Your grammar doesn't contain any expectation points (only >> is used, never >).
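To make the distinction concrete, here is a minimal plain-C++ sketch of the two behaviours. It deliberately does not use Spirit: `Cursor`, `match`, `expect`, `key`, `header_soft` and `header_strict` are names made up for this illustration. A sequence-style match (the analogue of `>>`) fails softly by returning false, while an expectation-style match (the analogue of `>`) raises an error once the parser has committed. In Qi the error raised is `qi::expectation_failure`, and that is what an `on_error` handler reacts to.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helpers for illustration only; none of this is Spirit API.
struct Cursor {
    explicit Cursor(const std::string& t) : text(t), pos(0) {}
    const std::string& text;
    std::size_t pos;
};

// Analogue of `a >> b`: on mismatch, return false and leave the position alone.
bool match(Cursor& c, char expected) {
    if (c.pos < c.text.size() && c.text[c.pos] == expected) {
        ++c.pos;
        return true;
    }
    return false;
}

// Analogue of `a > b`: a mismatch at this point is a hard error.
void expect(Cursor& c, char expected) {
    if (!match(c, expected)) {
        throw std::runtime_error(std::string("expected '") + expected +
                                 "' at offset " + std::to_string(c.pos));
    }
}

// Reads one or more of [A-Za-z0-9_] into `out`.
bool key(Cursor& c, std::string& out) {
    out.clear();
    while (c.pos < c.text.size() &&
           (std::isalnum(static_cast<unsigned char>(c.text[c.pos])) ||
            c.text[c.pos] == '_')) {
        out += c.text[c.pos++];
    }
    return !out.empty();
}

// '[' >> key >> ']' : every step may fail quietly, nothing is reported.
bool header_soft(const std::string& s, std::string& name) {
    Cursor c(s);
    return match(c, '[') && key(c, name) && match(c, ']');
}

// '[' > key > ']' : once '[' has matched, the rest is mandatory.
bool header_strict(const std::string& s, std::string& name) {
    Cursor c(s);
    if (!match(c, '[')) return false;  // not committed yet, soft failure
    if (!key(c, name)) {
        throw std::runtime_error("expected key at offset " + std::to_string(c.pos));
    }
    expect(c, ']');
    return true;
}
```

With the input `[section` (closing bracket missing), `header_soft` just returns false, which matches what the question observes: a failed parse and no diagnostic. `header_strict` throws instead, which is the situation in which an error handler gets something to report.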
Q. Do you have any recommendations on how to learn Boost.Spirit? (I understand the documentation well in theory, but in practice I have a lot of WHY???)
Just build the parsers you need. Copy good conventions from the docs and SO answers; there are a lot of them. As you have seen, quite a number contain full examples of INI parsers with varying levels of error reporting too.
Bonus hints:
Do more detailed status reporting:
bool ok = phrase_parse(iter, end, grammar, space, ini);
if (ok) {
    std::cout << "Parse success\n";
} else {
    std::cout << "Parse failure\n";
}
if (iter != end) {
    std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
}
return (ok && iter == end) ? 0 : 1;
Use BOOST_SPIRIT_DEBUG:
#define BOOST_SPIRIT_DEBUG
// and later
BOOST_SPIRIT_DEBUG_NODES((key_)(pair_)(section_))
Prints:
<section_>
<try>[section]\nkey1=val1\n</try>
<key_>
<try>section]\nkey1=val1\nk</try>
<success>]\nkey1=val1\nkey2=val</success>
<attributes>[[s, e, c, t, i, o, n]]</attributes>
</key_>
<fail/>
</section_>
Parse failure
Remaining unparsed: '[section]
key1=val1
key2=val2
'
You'll notice that the section header isn't parsed because the newline is never matched. Your skipper (space_type) skips the newline before the '\n' literal gets a chance to see it, so that literal can never match. See also: Boost spirit skipper issues
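The effect of the skipper can be reproduced with a small plain-C++ sketch. This is not Spirit itself: `pre_skip`, `match_lit`, `header_end` and the two predicates are hypothetical names, and the predicates only roughly approximate `ascii::space` and `ascii::blank`. Before each literal is matched, the skipper consumes every character it accepts. A space-like skipper accepts '\n', so the '\n' literal finds nothing left to match; a blank-like skipper accepts only spaces and tabs and leaves the newline in place.

```cpp
#include <cstddef>
#include <string>

// Illustrative stand-ins for the two skippers (assumptions, not Spirit API).
bool is_space_like(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }
bool is_blank_like(char c) { return c == ' ' || c == '\t'; }

typedef bool (*Skipper)(char);

// Advance past everything the skipper accepts.
void pre_skip(const std::string& in, std::size_t& pos, Skipper skip) {
    while (pos < in.size() && skip(in[pos])) ++pos;
}

// Match one literal character after pre-skipping, as a phrase-level parser does.
bool match_lit(const std::string& in, std::size_t& pos, char lit, Skipper skip) {
    pre_skip(in, pos, skip);
    if (pos < in.size() && in[pos] == lit) {
        ++pos;
        return true;
    }
    return false;
}

// Try to match ']' followed by a newline, starting at `pos`.
bool header_end(const std::string& in, std::size_t pos, Skipper skip) {
    return match_lit(in, pos, ']', skip) && match_lit(in, pos, '\n', skip);
}
```

Calling `header_end("]\nkey1=val1\n", 0, is_space_like)` returns false because the newline is skipped before it can be matched, whereas the same call with `is_blank_like` returns true.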
Fix skipper
When using blank_type as the skipper you'll get a successful parse:
<section_>
<try>[section]\nkey1=val1\n</try>
<key_>
<try>section]\nkey1=val1\nk</try>
<success>]\nkey1=val1\nkey2=val</success>
<attributes>[[s, e, c, t, i, o, n]]</attributes>
</key_>
<pair_>
<try>key1=val1\nkey2=val2\n</try>
<key_>
<try>key1=val1\nkey2=val2\n</try>
<success>=val1\nkey2=val2\n</success>
<attributes>[[k, e, y, 1]]</attributes>
</key_>
<success></success>
<attributes>[[[k, e, y, 1], [v, a, l, 1,
, k, e, y, 2, =, v, a, l, 2,
]]]</attributes>
</pair_>
<success>key1=val1\nkey2=val2\n</success>
<attributes>[[[s, e, c, t, i, o, n], []]]</attributes>
</section_>
Parse success
Remaining unparsed: 'key1=val1
key2=val2
'
NOTE: The parse succeeds but doesn't do what you want. This is because *char_ also matches newlines, so the first value swallows the rest of the input. Exclude the line end from the value, e.g.:
pair_ = key_ >> '=' >> *(char_ - qi::eol); // or
pair_ = key_ >> '=' >> *~char_("\r\n"); // etc
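As an illustration of why the value has to stop at the end of the line, here is a hand-written plain-C++ sketch (not Spirit; `parse_pairs` is a hypothetical helper, and it does no whitespace skipping or comment handling). Each value is taken from the '=' up to, but not including, the next '\r' or '\n', which mirrors the intent of `*(char_ - qi::eol)`.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Parse "key=value" lines into a map. Lines without '=' are ignored.
std::map<std::string, std::string> parse_pairs(const std::string& in) {
    std::map<std::string, std::string> result;
    std::size_t pos = 0;
    while (pos < in.size()) {
        // The current line ends at the first CR or LF (or at end of input).
        std::size_t stop = in.find_first_of("\r\n", pos);
        if (stop == std::string::npos) stop = in.size();
        const std::string line = in.substr(pos, stop - pos);

        // The value is the rest of this line only, never the following lines.
        const std::size_t eq = line.find('=');
        if (eq != std::string::npos) {
            result[line.substr(0, eq)] = line.substr(eq + 1);
        }

        // Step over the line terminator(s); npos ends the loop.
        pos = in.find_first_not_of("\r\n", stop);
    }
    return result;
}
```

For the input from the question, `parse_pairs("key1=val1\nkey2=val2\n")` yields two entries rather than a single `key1` whose value contains the second line.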
Full code
Live On Coliru
#define BOOST_SPIRIT_DEBUG
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <map>
namespace client
{
typedef std::map<std::string, std::string> key_value_map_t;
struct mini_ini
{
std::string name;
key_value_map_t key_values_map;
};
} // client
BOOST_FUSION_ADAPT_STRUCT(
client::mini_ini,
(std::string, name)
(client::key_value_map_t, key_values_map)
)
namespace client
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
template <typename Iterator>
struct ini_grammar : qi::grammar<Iterator, mini_ini(), ascii::blank_type>
{
ini_grammar() : ini_grammar::base_type(section_, "section")
{
using qi::char_;
using qi::on_error;
using qi::fail;
using namespace qi::labels;
using phoenix::construct;
using phoenix::val;
key_ = +char_("a-zA-Z_0-9");
pair_ = key_ >> '=' >> *(char_ - qi::eol); // value stops at the end of the line
section_ = '[' >> key_ >> ']' >> '\n' >> *(pair_ >> '\n');
BOOST_SPIRIT_DEBUG_NODES((key_)(pair_)(section_))
on_error<fail>
(
section_
, std::cout
<< val("Error! Expecting ")
<< _4 // what failed?
<< val(" here: \"")
<< construct<std::string>(_3, _2) // iterators to error-pos, end
<< val("\"")
<< std::endl
);
}
qi::rule<Iterator, std::string(), ascii::blank_type> key_;
qi::rule<Iterator, mini_ini(), ascii::blank_type> section_;
qi::rule<Iterator, std::pair<std::string, std::string>(), ascii::blank_type> pair_;
};
} // client
int
main()
{
std::string storage =
"[section]\n"
"key1=val1\n"
"key2=val2\n";
client::mini_ini ini;
typedef client::ini_grammar<std::string::const_iterator> ini_grammar;
ini_grammar grammar;
using boost::spirit::ascii::blank;
std::string::const_iterator iter = storage.begin();
std::string::const_iterator end = storage.end();
bool ok = phrase_parse(iter, end, grammar, blank, ini);
if (ok) {
std::cout << "Parse success\n";
} else {
std::cout << "Parse failure\n";
}
if (iter != end) {
std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
}
return ok && (iter==end)? 0 : 1;
}

Related

Spirit Qi First Parser

What did I mess up here? I'm getting 'start': undeclared identifier but I stuck pretty closely to the tutorial, so I'm not sure where I made a typo, or what I did wrong. Any hints? You all see the same thing, right?
#include <iostream>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <iostream>
#include <string>
#include <boost/array.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi_no_skip.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
using qi::lit;
using qi::int_;
using qi::double_;
using ascii::char_;
using boost::spirit::qi::phrase_parse;
using boost::spirit::qi::no_skip;
using qi::eoi;
struct LETTER
{
char hi;
// int fourtytwo;
// char mom;
};
BOOST_FUSION_ADAPT_STRUCT(
LETTER,
(char, hi)
// (int, fourtytwo)
// (char, mom)
)
template <typename Iterator>
struct LETTERParser : qi::grammar<Iterator, LETTER(), ascii::space_type>
{
LETTERParser(): LETTERParser::base_type(start)
{
start %= lit("LETTER") >> char_;
// >> char_
// >> int_
// >> char_
// >> eoi
// ;
}
};
const std::string wat("Z");
int main()
{
LETTERParser<std::string::const_iterator> f;
LETTER example;
phrase_parse(wat.begin(), wat.end(), f, no_skip, example);
return 0;
}
There are a number of issues, one of which is non-obvious:
where's no_skip? Why are you passing it to a grammar that requires ascii::space_type?
where is the start rule declared?
don't pollute global namespace - it creates hard problems in generic code
handle errors
the grammar starts with a mandatory character sequence, which doesn't match the input
the non-obvious one: single-element structs interfere in unfortunate ways in Spirit/Fusion land.
Simplify:
Fixing the above and modernizing (c++11) the fusion adaptation:
live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp> // operator<< for fusion sequences
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
struct LETTER {
    char hi;
    int fourtytwo;
    char mom;
};
BOOST_FUSION_ADAPT_STRUCT(LETTER, hi, fourtytwo, mom)
template <typename Iterator> struct LETTERParser : qi::grammar<Iterator, LETTER(), qi::ascii::space_type> {
    LETTERParser() : LETTERParser::base_type(start) {
        using qi::char_;
        using qi::int_;
        start = "LETTER" >> char_ >> int_ >> char_;
    }
  private:
    qi::rule<Iterator, LETTER(), qi::ascii::space_type> start;
};
int main() {
    const std::string input("LETTER Z 42m");
    using It = std::string::const_iterator;
    LETTERParser<It> parser;
    LETTER example;
    It f = input.begin(), l = input.end();
    if (phrase_parse(f, l, parser, qi::ascii::space, example)) {
        std::cout << "parsed: " << boost::fusion::as_vector(example) << "\n";
    } else {
        std::cout << "couldn't parse '" << input << "'\n";
    }
    if (f != l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Prints
parsed: (Z 42 m)
Single Element:
You're in luck, it doesn't bite in your case:
Live On Coliru
Prints
parsed: (Z)
Remaining unparsed input: '42m'
as expected. If it strikes in the future, see e.g. Size of struct with a single element
Bonus
Consider encapsulating the choice of skipper. The caller should probably never be able to override it: Live On Coliru. See also Boost spirit skipper issues

How to incrementally parse (and act on) a large file with Boost.Spirit.Qi?

I've created a Qi parser for a custom text file format. There are tens of thousands of entries to process and each entry usually has between 1-10 subentries. I put a trimmed-down working example of my parser here.
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using std::string;
using std::vector;
using std::cout;
using std::endl;
namespace model
{
namespace qi = boost::spirit::qi;
struct spectrum
{
string comment;
string file;
string nativeId;
double precursorMz;
int precursorCharge;
double precursorIntensity;
};
struct cluster
{
string id;
vector<spectrum> spectra;
};
struct clustering
{
string name;
vector<cluster> clusters;
};
}
// Tell fusion about the data structures to make them first-class fusion citizens.
// Must be at global scope.
BOOST_FUSION_ADAPT_STRUCT(
model::spectrum,
(string, comment)
(string, file)
(string, nativeId)
(double, precursorMz)
(int, precursorCharge)
(double, precursorIntensity)
)
BOOST_FUSION_ADAPT_STRUCT(
model::cluster,
(string, id)
(std::vector<model::spectrum>, spectra)
)
BOOST_FUSION_ADAPT_STRUCT(
model::clustering,
(string, name)
(std::vector<model::cluster>, clusters)
)
namespace {
struct ReportError
{
template<typename, typename, typename, typename> struct result { typedef void type; };
// contract the string to the surrounding new-line characters
template<typename Iter>
void operator()(Iter first_iter, Iter last_iter,
Iter error_iter, const boost::spirit::qi::info& what) const
{
std::string first(first_iter, error_iter);
std::string last(error_iter, last_iter);
auto first_pos = first.rfind('\n');
auto last_pos = last.find('\n');
auto error_line = ((first_pos == std::string::npos) ? first
: std::string(first, first_pos + 1))
+ std::string(last, 0, last_pos);
//auto error_pos = (error_iter - first_iter) + 1;
/*auto error_pos = error
if (first_pos != std::string::npos)
error_pos -= (first_pos + 1);*/
std::cerr << "Error parsing in " << what << std::endl
<< error_line << std::endl
//<< std::setw(error_pos) << '^'
<< std::endl;
}
};
const boost::phoenix::function<ReportError> report_error = ReportError();
}
namespace model
{
template <typename Iterator>
struct cluster_parser : qi::grammar<Iterator, clustering(), qi::blank_type>
{
cluster_parser() : cluster_parser::base_type(clusters)
{
using qi::int_;
using qi::lit;
using qi::double_;
using qi::bool_;
using qi::lexeme;
using qi::eol;
using qi::ascii::char_;
using qi::on_error;
using qi::fail;
using namespace qi::labels;
using boost::phoenix::construct;
using boost::phoenix::val;
quoted_string %= lexeme['"' > +(char_ - '"') > '"'];
spectrum_start %=
lit("SPEC") >
"#" > +(char_ - "File:") >
"File:" > quoted_string > lit(",") >
"NativeID:" > quoted_string >
bool_ > double_ > int_ > double_;
cluster_start %=
"=Cluster=" > eol >
"id=" > +(char_ - eol) > eol >
spectrum_start % eol;
clusters %=
"name=" > +(char_ - eol) > eol >
eol >
cluster_start % eol;
BOOST_SPIRIT_DEBUG_NODES((clusters)(cluster_start)(quoted_string)(spectrum_start))
//on_error<fail>(clusters, report_error(_1, _2, _3, _4));
//on_error<fail>(cluster_start, report_error(_1, _2, _3, _4));
//on_error<fail>(spectrum_start, report_error(_1, _2, _3, _4));
//on_error<fail>(quoted_string, report_error(_1, _2, _3, _4));
// on_success(cluster_start, quantify_cluster(_1, _2, _3, _4)); ??
}
qi::rule<Iterator, std::string(), qi::blank_type> quoted_string;
qi::rule<Iterator, cluster(), qi::blank_type> cluster_start;
qi::rule<Iterator, spectrum(), qi::blank_type> spectrum_start;
qi::rule<Iterator, clustering(), qi::blank_type> clusters;
};
}
int main()
{
using namespace model;
cluster_parser<boost::spirit::istream_iterator> g; // Our grammar
string str;
//std::ifstream input("c:/test/Mo_tai.clustering");
std::istringstream input("name=GreedyClustering_0.99\n"
"\n"
"=Cluster=\n"
"id=9c8c5830-5841-4f77-b819-64180509615b\n"
"SPEC\t#file=w:\\test\\Mo_Tai_iTRAQ_f4.mgf#id=index=219#title=Mo_Tai_iTRAQ_f4.1254.1254.2 File:\"Mo_Tai_iTRAQ_f4.raw\", NativeID:\"controllerType=0 controllerNumber=1 scan=1254\"\ttrue\t\t300.1374\t2\t\t\t0.0\n"
"=Cluster=\n"
"id=f8f384a1-3d5f-4af1-9581-4d03a5aa3342\n"
"SPEC\t#file=w:\\test\\Mo_Tai_iTRAQ_f9.mgf#id=index=560#title=Mo_Tai_iTRAQ_f9.1666.1666.3 File:\"Mo_Tai_iTRAQ_f9.raw\", NativeID:\"controllerType=0 controllerNumber=1 scan=1666\"\ttrue\t\t300.14413\t3\t\t\t0.0\n"
"SPEC\t#file=w:\\test\\Mo_Tai_iTRAQ_f9.mgf#id=index=520#title=Mo_Tai_iTRAQ_f9.1621.1621.3 File:\"Mo_Tai_iTRAQ_f9.raw\", NativeID:\"controllerType=0 controllerNumber=1 scan=1621\"\ttrue\t\t300.14197\t3\t\t\t0.0\n"
"=Cluster=\n"
"id=b84b79e1-44bc-44c0-a9af-5391ca02582d\n"
"SPEC\t#file=w:\\test\\Mo_Tai_iTRAQ_f2.mgf#id=index=7171#title=Mo_Tai_iTRAQ_f2.12729.12729.2 File:\"Mo_Tai_iTRAQ_f2.raw\", NativeID:\"controllerType=0 controllerNumber=1 scan=12729\"\ttrue\t\t300.15695\t2\t\t\t0.0");
input.unsetf(std::ios::skipws);
boost::spirit::istream_iterator begin(input);
boost::spirit::istream_iterator end;
clustering clusteringResults;
bool r = phrase_parse(begin, end, g, qi::blank, clusteringResults);
if (r && begin == end)
{
cout << "Parsing succeeded (" << clusteringResults.clusters.size() << " clusters)\n";
/*for (size_t i = 0; i < std::min((size_t)10, clusteringResults.clusters.size()); ++i)
{
cluster& c = clusteringResults.clusters[i];
cout << "Cluster " << c.id << " - avg. precursor m/z: " << c.avgPrecursorMz << ", num. spectra: " << c.spectra.size() << endl;
}*/
return 0;
}
else
{
std::cout << "Parsing failed (" << clusteringResults.clusters.size() << " clusters)\n";
if (!clusteringResults.clusters.empty())
{
cluster& c = clusteringResults.clusters.back();
cout << "Last cluster parsed " << c.id << ", num. spectra: " << c.spectra.size() << endl;
}
return 1;
}
}
I don't want to parse the entire file into memory before processing it. How can I make it queue up an entry (cluster) for processing after each cluster is finished parsing, delete the cluster after processing, then continue parsing? Even better would be to have another thread handle the processing asynchronously.
Just use streaming iterators.
Or operate on a memory mapped file.
On the processing side, push actions onto a queue from inside a semantic action.
Note: you could run into a suspected bug where the backtrack buffers are not cleared properly. You might want to check for this and take preventative measures as described in this answer: Boost spirit memory leak using flush_multi_pass
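The general shape of "hand each finished entry to a handler instead of collecting everything" can be sketched without Spirit. In this plain-C++ illustration the format is simplified to whole lines, and `Cluster`, `for_each_cluster` and the handler signature are assumptions made up for the example, not part of the question's code or of any library. Only one cluster is held in memory at a time.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for the question's model::cluster (an assumption).
struct Cluster {
    std::string id;
    std::vector<std::string> spectra;  // raw SPEC lines, left unparsed here
};

// Reads the stream line by line and calls `handler` as soon as a cluster is
// complete. Returns the number of clusters handed over.
std::size_t for_each_cluster(std::istream& in,
                             const std::function<void(const Cluster&)>& handler) {
    Cluster current;
    bool open = false;  // true while a cluster is being collected
    std::size_t count = 0;
    std::string line;

    // Hand over the cluster collected so far (if any) and start afresh.
    auto flush = [&] {
        if (open) {
            handler(current);
            ++count;
        }
        current = Cluster();
        open = false;
    };

    while (std::getline(in, line)) {
        if (line == "=Cluster=") {
            flush();  // the previous cluster is finished
            open = true;
        } else if (open && line.compare(0, 3, "id=") == 0) {
            current.id = line.substr(3);
        } else if (open && line.compare(0, 4, "SPEC") == 0) {
            current.spectra.push_back(line);
        }
    }
    flush();  // the last cluster ends at end of input
    return count;
}
```

The handler plays the role of the semantic action: it can process the cluster directly or push it onto a queue for another thread, and the parser keeps no reference to it afterwards.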
Live Demo
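#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/fusion/include/io.hpp>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>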
namespace model
{
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
struct spectrum {
std::string comment;
std::string file;
std::string nativeId;
double precursorMz;
int precursorCharge;
double precursorIntensity;
};
struct cluster {
std::string id;
std::vector<spectrum> spectra;
};
}
BOOST_FUSION_ADAPT_STRUCT(model::spectrum, comment, file, nativeId, precursorMz, precursorCharge, precursorIntensity)
BOOST_FUSION_ADAPT_STRUCT(model::cluster, id, spectra)
namespace model
{
template <typename Iterator>
struct cluster_parser : qi::grammar<Iterator>
{
cluster_parser(std::function<void(std::string const&, model::cluster const&)> handler)
: cluster_parser::base_type(start),
submit_(handler)
{
using namespace qi;
quoted_string %= lexeme['"' > +(char_ - '"') > '"'];
spectrum_start %=
lit("SPEC") >
"#" > +(char_ - "File:") >
"File:" > quoted_string > lit(",") >
"NativeID:" > quoted_string >
bool_ > double_ > int_ > double_;
cluster_start %=
"=Cluster=" > eol >
"id=" > +(char_ - eol) > eol >
spectrum_start % eol;
clusters %=
"name=" > qi::as_string[ +(char_ - eol) ][ name_ = _1 ] > eol > eol >
cluster_start [ submit_(name_, _1) ] % eol;
start = skip(blank) [clusters];
BOOST_SPIRIT_DEBUG_NODES((start)(clusters)(cluster_start)(quoted_string)(spectrum_start))
}
private:
qi::_a_type name_;
px::function<std::function<void(std::string const&, model::cluster const&)> > submit_;
qi::rule<Iterator, std::string(), qi::blank_type> quoted_string;
qi::rule<Iterator, cluster(), qi::blank_type> cluster_start;
qi::rule<Iterator, spectrum(), qi::blank_type> spectrum_start;
qi::rule<Iterator, qi::locals<std::string>, qi::blank_type> clusters;
qi::rule<Iterator> start;
};
}
int main()
{
using namespace model;
cluster_parser<boost::spirit::istream_iterator> g([&](auto const&...){std::cout << "handled\n";}); // Our grammar
std::string str;
//std::ifstream input("c:/test/Mo_tai.clustering");
std::istringstream input(R"(name=GreedyClustering_0.99

=Cluster=
id=9c8c5830-5841-4f77-b819-64180509615b
SPEC #file=w:\test\Mo_Tai_iTRAQ_f4.mgf#id=index=219#title=Mo_Tai_iTRAQ_f4.1254.1254.2 File:"Mo_Tai_iTRAQ_f4.raw", NativeID:"controllerType=0 controllerNumber=1 scan=1254" true 300.1374 2 0.0
=Cluster=
id=f8f384a1-3d5f-4af1-9581-4d03a5aa3342
SPEC #file=w:\test\Mo_Tai_iTRAQ_f9.mgf#id=index=560#title=Mo_Tai_iTRAQ_f9.1666.1666.3 File:"Mo_Tai_iTRAQ_f9.raw", NativeID:"controllerType=0 controllerNumber=1 scan=1666" true 300.14413 3 0.0
SPEC #file=w:\test\Mo_Tai_iTRAQ_f9.mgf#id=index=520#title=Mo_Tai_iTRAQ_f9.1621.1621.3 File:"Mo_Tai_iTRAQ_f9.raw", NativeID:"controllerType=0 controllerNumber=1 scan=1621" true 300.14197 3 0.0
=Cluster=
id=b84b79e1-44bc-44c0-a9af-5391ca02582d
SPEC #file=w:\test\Mo_Tai_iTRAQ_f2.mgf#id=index=7171#title=Mo_Tai_iTRAQ_f2.12729.12729.2 File:"Mo_Tai_iTRAQ_f2.raw", NativeID:"controllerType=0 controllerNumber=1 scan=12729" true 300.15695 2 0.0)");
input.unsetf(std::ios::skipws);
boost::spirit::istream_iterator begin(input);
boost::spirit::istream_iterator end;
bool r = phrase_parse(begin, end, g, qi::blank);
if (r && begin == end) {
std::cout << "Parsing succeeded\n";
}
else {
std::cout << "Parsing failed\n";
}
if (begin!=end) {
std::cout << "Unparsed remaining input: '" << std::string(begin, end) << "'\n";
}
return (r && begin==end)? 0 : 1;
}
Prints
handled
handled
handled
Parsing succeeded
BONUS: Threaded workers
Here's a version that dispatches the clusters for asynchronous processing on a thread pool.
Note that the submit method posts a lambda to the service. The lambda captures its arguments by value because they must stay alive until the processing has finished, which may be long after submit has returned.
Live On Coliru
#include <boost/asio.hpp>
#include <boost/thread.hpp>
namespace ba = boost::asio;
struct Processing {
    Processing() {
        for (unsigned i=0; i < boost::thread::hardware_concurrency(); ++i)
            _threads.create_thread([this] { _svc.run(); });
    }
    ~Processing() {
        _work.reset();
        _threads.join_all();
    }
    void submit(std::string const& name, model::cluster const& cluster) {
        _svc.post([=] { do_processing(name, cluster); });
    }
  private:
    void do_processing(std::string const& name, model::cluster const& cluster) {
        std::cout << "Thread " << boost::this_thread::get_id() << ": " << name << " cluster of " << cluster.spectra.size() << " spectra\n";
        boost::this_thread::sleep_for(boost::chrono::milliseconds(950));
    }
    ba::io_service _svc;
    boost::optional<ba::io_service::work> _work = ba::io_service::work(_svc);
    boost::thread_group _threads;
};
[...snip...] and in main:
Processing processing;
auto handler = [&processing](auto&... args) { processing.submit(args...); };
cluster_parser<boost::spirit::istream_iterator> g(handler); // Our grammar
The rest is unmodified, and now it prints (e.g.):
Thread 7f0144a5b700: GreedyClustering_0.99 cluster of 1 spectra
Thread 7f014425a700: GreedyClustering_0.99 cluster of 2 spectra
Parsing succeeded
Thread 7f0143a59700: GreedyClustering_0.99 cluster of 1 spectra
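For readers without Boost.Asio at hand, the same idea can be sketched with the standard library only. This is an illustration under stated assumptions, not the answer's implementation: `WorkQueue` and `process_all` are hypothetical names, there is a single worker thread rather than a pool, and jobs are plain `std::function<void()>` objects. As in the Asio version, each job captures its data by value so that the data stays valid after `submit` returns.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-worker job queue (standard library only).
class WorkQueue {
public:
    WorkQueue() : done_(false), worker_([this] { run(); }) {}

    ~WorkQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains the remaining jobs, then stops
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // done_ is set and nothing is left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so submit() is never blocked by work
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()> > jobs_;
    bool done_;
    std::thread worker_;  // declared last: starts after the other members exist
};

// Example producer. `name` is copied into each job (capture by value), so the
// job still has valid data after the loop iteration that created it has ended.
std::vector<std::string> process_all(const std::vector<std::string>& ids) {
    std::vector<std::string> results;
    std::mutex results_mutex;
    {
        WorkQueue queue;
        for (const std::string& id : ids) {
            std::string name = "cluster " + id;  // local, short-lived
            queue.submit([name, &results, &results_mutex] {
                std::lock_guard<std::mutex> lock(results_mutex);
                results.push_back(name);
            });
        }
    }  // the destructor joins the worker: all jobs have run at this point
    return results;
}
```

Capturing `name` by reference instead would leave the job with a dangling reference, which is the lifetime problem the by-value capture in `submit` avoids. On some toolchains the program has to be built with thread support enabled (for example `-pthread`).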

making a vector of shared pointers from Spirit Qi

This is a followup question from a previous question.
I can parse into vectors of strings from my grammar, but I cannot seem to parse into a vector of shared pointers to strings; i.e. std::vector<std::shared_ptr<std::string> >, and need a bit of help.
My compiling header:
#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/phoenix/bind/bind_member_function.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
// this solution for lazy make shared comes from the SO forum, user sehe.
// https://stackoverflow.com/questions/21516201/how-to-create-boost-phoenix-make-shared
// post found using google search terms `phoenix construct shared_ptr`
// changed from boost::shared_ptr to std::shared_ptr
namespace {
template <typename T>
struct make_shared_f
{
template <typename... A> struct result
{ typedef std::shared_ptr<T> type; };
template <typename... A>
typename result<A...>::type operator()(A&&... a) const {
return std::make_shared<T>(std::forward<A>(a)...);
}
};
template <typename T>
using make_shared_ = boost::phoenix::function<make_shared_f<T> >;
}
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template<typename Iterator, typename Skipper = ascii::space_type>
struct SystemParser : qi::grammar<Iterator, std::vector<std::shared_ptr<std::string> >(), Skipper>
{
SystemParser() : SystemParser::base_type(variable_group_)
{
namespace phx = boost::phoenix;
using qi::_1;
using qi::_val;
using qi::eps;
using qi::lit;
var_counter = 0;
declarative_symbols.add("variable_group",0);
variable_group_ = "variable_group" > genericvargp_ > ';';
genericvargp_ = new_variable_ % ','; //
new_variable_ = unencountered_symbol_ [_val = make_shared_<std::string>() (_1)];
unencountered_symbol_ = valid_variable_name_ - ( encountered_variables | declarative_symbols );
valid_variable_name_ = +qi::alpha >> *(qi::alnum | qi::char_("[]_") );
// debug(variable_group_); debug(unencountered_symbol_); debug(new_variable_); debug(genericvargp_);
// BOOST_SPIRIT_DEBUG_NODES((variable_group_) (valid_variable_name_) (unencountered_symbol_) (new_variable_) (genericvargp_))
}
// rule declarations. these are member variables for the parser.
qi::rule<Iterator, std::vector<std::shared_ptr<std::string> >(), Skipper > variable_group_;
qi::rule<Iterator, std::vector<std::shared_ptr<std::string> >(), Skipper > genericvargp_;
qi::rule<Iterator, std::shared_ptr<std::string()> > new_variable_;
qi::rule<Iterator, std::string()> unencountered_symbol_;
qi::rule<Iterator, std::string()> valid_variable_name_;
unsigned var_counter;
qi::symbols<char,int> encountered_variables;
qi::symbols<char,int> declarative_symbols;
};
with driver code:
int main(int argc, char** argv)
{
std::vector<std::shared_ptr<std::string> > V;
std::string str = "variable_group x, y, z; ";
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
SystemParser<std::string::const_iterator> S;
bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);
if (s)
{
std::cout << "Parse succeeded: " << V.size() << " variables\n";
for (auto& s : V)
std::cout << " - '" << s << "'\n";
}
else
std::cout << "Parse failed\n";
if (iter!=end)
std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
return 0;
}
The text is parsed correctly, but the resulting vector is of length 0, while it should be of length 3. Somehow, the std::shared_ptr<string> is not pushed onto the back of the vector resulting from the rule genericvargp_.
I've tried many things, including reading all the debug information from a test parse, and placement of the %= signs for rule definitions, which should be used for rules for which there is a semantic action that does not assign _val unless I am mistaken. I've also played all night and day with using phx::bind to manually push onto the back of _val, but got nowhere. I've further verified that the make_shared_ provided by sehe in another answer is in fact lazy for std::shared_ptr.
As an aside, I have also struggled with getting the result of an unencountered_symbol_ to add to encountered_variables so as to enforce uniqueness of variable names...
The problem seems to be the propagation of the result of the new_variable_ rule onto the desired vector of shared pointers in the genericvargp_ rule.
This declaration
qi::rule<Iterator, std::shared_ptr<std::string()> > new_variable_;
doesn't match the desired type:
qi::rule<Iterator, std::shared_ptr<std::string>() > new_variable_;
Sadly, in old Spirit V2 this attribute is silently ignored and no attribute propagation is done. This also explains why it didn't produce an error at compile time.
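The difference between the two spellings can be checked at compile time. The sketch below uses only the standard library; `attribute_of`, `good_sig` and `bad_sig` are hypothetical names written for this illustration (Qi's own template-argument handling is more involved). The trait extracts `T` from a signature of the form `T()`. The spelling from the question is a smart pointer whose element type is a function type, so it does not have the `T()` form at all.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical trait: yields the attribute type encoded in a signature `T()`.
// Anything that is not of that form maps to `void`, meaning "no attribute".
template <typename Sig>
struct attribute_of {
    typedef void type;
};

template <typename T>
struct attribute_of<T()> {
    typedef T type;
};

// Intended: a function signature returning shared_ptr<string>.
using good_sig = std::shared_ptr<std::string>();
// Spelling from the question: a shared_ptr to the function type `std::string()`.
using bad_sig = std::shared_ptr<std::string()>;

static_assert(std::is_function<good_sig>::value,
              "good_sig is a function type");
static_assert(!std::is_function<bad_sig>::value,
              "bad_sig is a class type, not a signature");
static_assert(std::is_same<attribute_of<good_sig>::type,
                           std::shared_ptr<std::string> >::value,
              "the attribute is shared_ptr<string>");
static_assert(std::is_same<attribute_of<bad_sig>::type, void>::value,
              "no attribute can be extracted from bad_sig");
```

Both spellings are valid type names, which is consistent with the compiler accepting the original declaration without complaint.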
Live On Coliru
#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#define BOOST_SPIRIT_DEBUG 1
#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/phoenix/bind/bind_member_function.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
// this solution for lazy make shared comes from the SO forum, user sehe.
// https://stackoverflow.com/questions/21516201/how-to-create-boost-phoenix-make-shared
// post found using google search terms `phoenix construct shared_ptr`
// changed from boost::shared_ptr to std::shared_ptr
namespace {
template <typename T> struct make_shared_f {
template <typename... A> struct result { typedef std::shared_ptr<T> type; };
template <typename... A> typename result<A...>::type operator()(A &&... a) const {
return std::make_shared<T>(std::forward<A>(a)...);
}
};
template <typename T> using make_shared_ = boost::phoenix::function<make_shared_f<T> >;
}
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Iterator, typename Skipper = ascii::space_type>
struct SystemParser : qi::grammar<Iterator, std::vector<std::shared_ptr<std::string> >(), Skipper> {
SystemParser() : SystemParser::base_type(variable_group_) {
namespace phx = boost::phoenix;
using qi::_1;
using qi::_val;
using qi::eps;
using qi::lit;
var_counter = 0;
declarative_symbols.add("variable_group", 0);
variable_group_ = "variable_group" > genericvargp_ > ';';
genericvargp_ = new_variable_ % ','; //
new_variable_ = unencountered_symbol_ [_val = make_shared_<std::string>()(_1)];
unencountered_symbol_ = valid_variable_name_ - (encountered_variables | declarative_symbols);
valid_variable_name_ = +qi::alpha >> *(qi::alnum | qi::char_("[]_"));
BOOST_SPIRIT_DEBUG_NODES((variable_group_) (valid_variable_name_) (unencountered_symbol_) (new_variable_) (genericvargp_))
}
// rule declarations. these are member variables for the parser.
qi::rule<Iterator, std::vector<std::shared_ptr<std::string> >(), Skipper> variable_group_;
qi::rule<Iterator, std::vector<std::shared_ptr<std::string> >(), Skipper> genericvargp_;
qi::rule<Iterator, std::shared_ptr<std::string>() > new_variable_;
qi::rule<Iterator, std::string()> unencountered_symbol_;
qi::rule<Iterator, std::string()> valid_variable_name_;
unsigned var_counter;
qi::symbols<char, qi::unused_type> encountered_variables;
qi::symbols<char, qi::unused_type> declarative_symbols;
};
int main()
{
std::vector<std::shared_ptr<std::string> > V;
std::string str = "variable_group x, y, z; ";
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
SystemParser<std::string::const_iterator> S;
bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);
if (s)
{
std::cout << "Parse succeeded: " << V.size() << " variables\n";
for (auto& s : V)
std::cout << " - '" << *s << "'\n";
}
else
std::cout << "Parse failed\n";
if (iter!=end)
std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
}
Prints
Parse succeeded: 3 variables
- 'x'
- 'y'
- 'z'
As well as a lot of debug information

unhandled exception using Boost Spirit to parse grammar

I am trying to use Boost Spirit to parse the following grammar:
sentence:
noun verb
sentence conjunction sentence
conjunction:
"and"
noun:
"birds"
"cats"
verb:
"fly"
"meow"
Parsing succeeds when the grammar only includes the noun >> verb rule.
When the grammar is modified to include the sentence >> conjunction >> sentence rule and I supply an invalid input such as "birds fly" instead of "birdsfly", I get an unhandled exception when the program runs.
Here is the code, which is modified from examples found in the Boost documentation:
#define BOOST_VARIANT_MINIMIZE_SIZE
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_statement.hpp>
#include <boost/spirit/include/phoenix_container.hpp>
#include <iostream>
#include <string>
using namespace boost::spirit;
using namespace boost::spirit::ascii;
template <typename Lexer>
struct token_list : lex::lexer<Lexer>
{
token_list()
{
noun = "birds|cats";
verb = "fly|meow";
conjunction = "and";
this->self.add
(noun)
(verb)
(conjunction)
;
}
lex::token_def<std::string> noun, verb, conjunction;
};
template <typename Iterator>
struct Grammar : qi::grammar<Iterator>
{
template <typename TokenDef>
Grammar(TokenDef const& tok)
: Grammar::base_type(sentence)
{
sentence = (tok.noun>>tok.verb)
|
(sentence>>tok.conjunction>>sentence)>>eoi
;
}
qi::rule<Iterator> sentence;
};
int main()
{
typedef lex::lexertl::token<char const*, boost::mpl::vector<std::string>> token_type;
typedef lex::lexertl::lexer<token_type> lexer_type;
typedef token_list<lexer_type>::iterator_type iterator_type;
token_list<lexer_type> word_count;
Grammar<iterator_type> g (word_count);
std::string str = "birdsfly";
//std::string str = "birds fly"; this input caused unhandled exception
char const* first = str.c_str();
char const* last = &first[str.size()];
bool r = lex::tokenize_and_parse(first, last, word_count, g);
if (r) {
std::cout << "Parsing passed"<< "\n";
}
else {
std::string rest(first, last);
std::cerr << "Parsing failed\n" << "stopped at: \""
<< rest << "\"\n";
}
system("PAUSE");
return 0;
}
You have left recursion in the second branch of the sentence rule.
sentence = sentence >> ....
will always recurse into sentence before consuming any input, so the recursion never terminates and you're seeing a stack overflow.
I suggest writing the rule like this, for example:
sentence =
(tok.noun >> tok.verb)
>> *(tok.conjunction >> sentence)
>> qi::eoi
;
Now the result reads
g++ -Wall -pedantic -std=c++0x -g -O0 test.cpp -o test
Parsing failed
stopped at: " fly"
(and the inevitable "sh: PAUSE: command not found" of course...)
PS. Please don't use using namespace directives. Use namespace aliases instead:
namespace qi = boost::spirit::qi;
namespace lex = boost::spirit::lex;
Here's a cleaned up version with some other stuff removed/fixed: http://coliru.stacked-crooked.com/view?id=1fb26ca3e8c207979eaaf4592c319316-e223fd4a885a77b520bbfe69dda8fb91
#define BOOST_VARIANT_MINIMIZE_SIZE
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
// #include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace lex = boost::spirit::lex;
template <typename Lexer>
struct token_list : lex::lexer<Lexer>
{
token_list()
{
noun = "birds|cats";
verb = "fly|meow";
conjunction = "and";
this->self.add
(noun)
(verb)
(conjunction)
;
}
lex::token_def<std::string> noun, verb, conjunction;
};
template <typename Iterator>
struct Grammar : qi::grammar<Iterator>
{
template <typename TokenDef>
Grammar(TokenDef const& tok) : Grammar::base_type(sentence)
{
sentence =
(tok.noun >> tok.verb)
>> *(tok.conjunction >> sentence)
>> qi::eoi
;
}
qi::rule<Iterator> sentence;
};
int main()
{
typedef std::string::const_iterator It;
typedef lex::lexertl::token<It, boost::mpl::vector<std::string>> token_type;
typedef lex::lexertl::lexer<token_type> lexer_type;
typedef token_list<lexer_type>::iterator_type iterator_type;
token_list<lexer_type> word_count;
Grammar<iterator_type> g(word_count);
//std::string str = "birdsfly";
const std::string str = "birds fly";
It first = str.begin();
It last = str.end();
bool r = lex::tokenize_and_parse(first, last, word_count, g);
if (r) {
std::cout << "Parsing passed"<< "\n";
}
else {
std::string rest(first, last);
std::cerr << "Parsing failed\n" << "stopped at: \"" << rest << "\"\n";
}
}

How do you output the original unparsed code (as a comment) from a Spirit parser?

Given the input string A = 23; B = 5, I currently get the (expected) output:
Output: 0x50000023
Output: 0x50010005
-------------------------
I would like to see this instead:
Output: 0x50000023 // A = 23
Output: 0x50010005 // B = 5
-------------------------
The core line of code is:
statement = eps[_val = 0x50000000] >> identifier[_val += _1<<16] >>
"=" >> hex[_val += (_1 & 0x0000FFFF)];
Where identifier is a qi::symbols table lookup.
The rest of my code looks like this:
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <iostream>
#include <iomanip>
#include <ios>
#include <string>
#include <complex>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
struct reg16_ : qi::symbols<char,unsigned> {
reg16_() {
add ("A", 0) ("B", 1) ("C", 2) ("D", 3) ;
}
} reg16;
template <typename Iterator>
struct dash_script_parser : qi::grammar<Iterator, std::vector<unsigned>(), ascii::space_type> {
dash_script_parser() : dash_script_parser::base_type(start) {
using qi::hex;
using qi::_val;
using qi::_1;
using qi::eps;
identifier %= reg16;
start %= (statement % ";" );
statement = eps[_val = 0x50000000] >> identifier[_val += _1<<16]>> "=" >> hex[_val += (_1 & 0x0000FFFF)];
}
qi::rule<Iterator, std::vector<unsigned>(), ascii::space_type> start;
qi::rule<Iterator, unsigned(), ascii::space_type> statement;
qi::rule<Iterator, unsigned()> identifier;
};
int
main()
{
std::cout << "\t\tA parser for Spirit...\n\n" << "Type [q or Q] to quit\n\n";
dash_script_parser<std::string::const_iterator> g;
std::string str;
while (getline(std::cin, str))
{
if (str.empty() || str[0] == 'q' || str[0] == 'Q') break;
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
std::vector<unsigned> strs;
bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, strs);
if (r && iter == end) {
for(std::vector<unsigned>::const_iterator it=strs.begin(); it<strs.end(); ++it)
std::cout << "Output: 0x" << std::setw(8) << std::setfill('0') << std::hex <<*it << "\n";
} else
std::cout << "Parsing failed\n";
}
return 0;
}
Update: A newer answer brought iter_pos (from the Boost Spirit Repository) to my attention:
How do I capture the original input into the synthesized output from a spirit grammar?
This basically does the same as below, but without 'abusing' semantic actions (making it a much better fit, especially with automatic attribute propagation).
My gut feeling says that it will probably be easier to isolate statements into raw source iterator ranges first, and then parse the statements in isolation. That way, you'll have the corresponding source text at the start.
With that out of the way, here is an approach I tested to work without subverting your sample code too much:
1. Make the attribute type a struct
Replace the primitive unsigned with a struct that also contains the source snippet, verbatim, as a string:
struct statement_t
{
unsigned value;
std::string source;
};
BOOST_FUSION_ADAPT_STRUCT(statement_t, (unsigned, value)(std::string, source));
2. Make the parser fill both fields
The good thing is that you were already using semantic actions, so this merely builds on that. Note that the result is not very pretty and would benefit hugely from being converted into a (fused) functor, but it shows the technique very clearly:
start %= (statement % ";" );
statement = qi::raw [
raw[eps] [ at_c<0>(_val) = 0x50000000 ]
>> identifier [ at_c<0>(_val) += _1<<16 ]
>> "=" >> hex [ at_c<0>(_val) += (_1 & 0x0000FFFF) ]
]
[ at_c<1>(_val) = construct<std::string>(begin(_1), end(_1)) ]
;
3. Print
So, at_c<0>(_val) corresponds to statement_t::value, and at_c<1>(_val) corresponds to statement_t::source. This slightly modified output loop:
for(std::vector<statement_t>::const_iterator it=strs.begin(); it<strs.end(); ++it)
std::cout << "Output: 0x" << std::setw(8) << std::setfill('0') << std::hex << it->value << " // " << it->source << "\n";
outputs:
Output: 0x50000023 // A = 23
Output: 0x50010005 // B = 5
Full sample
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <iostream>
#include <iomanip>
#include <ios>
#include <string>
#include <complex>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
namespace phx = boost::phoenix;
struct reg16_ : qi::symbols<char,unsigned> {
reg16_() {
add ("A", 0) ("B", 1) ("C", 2) ("D", 3) ;
}
} reg16;
struct statement_t
{
unsigned value;
std::string source;
};
BOOST_FUSION_ADAPT_STRUCT(statement_t, (unsigned, value)(std::string, source));
template <typename Iterator>
struct dash_script_parser : qi::grammar<Iterator, std::vector<statement_t>(), ascii::space_type> {
dash_script_parser() : dash_script_parser::base_type(start) {
using qi::hex;
using qi::_val;
using qi::_1;
using qi::eps;
using qi::raw;
identifier %= reg16;
using phx::begin;
using phx::end;
using phx::at_c;
using phx::construct;
start %= (statement % ";" );
statement = raw [
raw[eps] [ at_c<0>(_val) = 0x50000000 ]
>> identifier [ at_c<0>(_val) += _1<<16 ]
>> "=" >> hex [ at_c<0>(_val) += (_1 & 0x0000FFFF) ]
]
[ at_c<1>(_val) = construct<std::string>(begin(_1), end(_1)) ]
;
}
qi::rule<Iterator, std::vector<statement_t>(), ascii::space_type> start;
qi::rule<Iterator, statement_t(), ascii::space_type> statement;
qi::rule<Iterator, unsigned()> identifier;
};
int
main()
{
std::cout << "\t\tA parser for Spirit...\n\n" << "Type [q or Q] to quit\n\n";
dash_script_parser<std::string::const_iterator> g;
std::string str;
while (getline(std::cin, str))
{
if (str.empty() || str[0] == 'q' || str[0] == 'Q') break;
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
std::vector<statement_t> strs;
bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, strs);
if (r && iter == end) {
for(std::vector<statement_t>::const_iterator it=strs.begin(); it<strs.end(); ++it)
std::cout << "Output: 0x" << std::setw(8) << std::setfill('0') << std::hex << it->value << " // " << it->source << "\n";
} else
std::cout << "Parsing failed\n";
}
return 0;
}