Parsing comma separated grammars when unordered - c++

From a previous post I found a way to parse with boost::spirit a struct of this type:
"parameter" : {
"name" : "MyName" ,
"type" : "MyType" ,
"unit" : "MyUnit" ,
"cardinality" : "MyCardinality",
"value" : "MyValue"
}
It's a simple JSON with key-value pairs. Now I want to parse this struct regardless of the order of its keys, i.e. I also want to parse the following into the same object:
"parameter" : {
"type" : "MyType" ,
"value" : "MyValue" ,
"unit" : "MyUnit" ,
"cardinality" : "MyCardinality",
"name" : "MyName"
}
I know that I can use the ^ operator to parse data in any order, but I don't know how to handle the commas that end every line except the last. How can I parse both structures?
This is @sehe's code from the previous post. The grammar is defined here.
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
// This is pasted and copied from another header file
namespace StateMachine {
namespace Private {
struct LuaParameterData {
std::wstring name;
std::wstring type;
std::wstring unit;
std::wstring cardinality;
std::wstring value;
};
} // namespace Private
} // namespace StateMachine
BOOST_FUSION_ADAPT_STRUCT(
StateMachine::Private::LuaParameterData,
(std::wstring, name)
(std::wstring, type)
(std::wstring, unit)
(std::wstring, cardinality)
(std::wstring, value)
)
namespace qi = boost::spirit::qi;
// From here original file continues
namespace StateMachine {
namespace Private {
template<typename Iterator>
struct LuaParameterDataParser : qi::grammar<Iterator, LuaParameterData(), qi::ascii::space_type>
{
LuaParameterDataParser() : LuaParameterDataParser::base_type(start)
{
quotedString = qi::lexeme['"' >> +(qi::ascii::char_ - '"') >> '"'];
start =
qi::lit("\"parameter\"")
>> ':'
>> '{'
>> qi::lit("\"name\"" ) >> ':' >> quotedString >> ','
>> qi::lit("\"type\"" ) >> ':' >> quotedString >> ','
>> qi::lit("\"unit\"" ) >> ':' >> quotedString >> ','
>> qi::lit("\"cardinality\"") >> ':' >> quotedString >> ','
>> qi::lit("\"value\"" ) >> ':' >> quotedString
>> '}'
;
BOOST_SPIRIT_DEBUG_NODES((start)(quotedString));
}
qi::rule<Iterator, std::string(), qi::ascii::space_type> quotedString;
qi::rule<Iterator, LuaParameterData(), qi::ascii::space_type> start;
};
} // namespace Private
} // namespace StateMachine
int main() {
using It = std::string::const_iterator;
std::string const input = R"(
"parameter" : {
"name" : "name" ,
"type" : "type" ,
"unit" : "unit" ,
"cardinality" : "cardinality",
"value" : "value"
}
)";
It f = input.begin(),
l = input.end();
StateMachine::Private::LuaParameterDataParser<It> p;
StateMachine::Private::LuaParameterData data;
bool ok = qi::phrase_parse(f, l, p, qi::ascii::space, data);
if (ok) {
std::wcout << L"Parsed: \n";
std::wcout << L"\tname: " << data.name << L'\n';
std::wcout << L"\ttype: " << data.type << L'\n';
std::wcout << L"\tunit: " << data.unit << L'\n';
std::wcout << L"\tcardinality: " << data.cardinality << L'\n';
std::wcout << L"\tvalue: " << data.value << L'\n';
} else {
std::wcout << L"Parse failure\n";
}
if (f!=l)
std::wcout << L"Remaining unparsed: '" << std::wstring(f,l) << L"'\n";
}

I'm going to refer to a set of recent answers where I've been over things quite extensively:
Parsing heterogeneous data using Boost::Spirit
ad-hoc JSON-like parsing: Reading JSON file with C++ and BOOST
application of a more general JSON grammar: Reading JSON file with C++ and BOOST
Tangentially related:
Boost Spirit : something like permutation, but not exactly
http://boost-spirit.com/home/2011/04/16/the-keyword-parser/: the keyword parser
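One way to make the commas unproblematic is to stop encoding the key order in the grammar: read the body as a list of generic "key" : "value" pairs separated by ',' (the shape that p % ',' expresses in Qi), then assign the fields by key. The following is only a minimal sketch of that idea, hand-written without Spirit so that it stands alone. ParameterData, parse_parameter and the helpers in detail are hypothetical names, plain std::string is used instead of std::wstring, and the scanner assumes the simple escape-free quoted strings from the question.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical target type; mirrors the fields of LuaParameterData.
struct ParameterData {
    std::string name, type, unit, cardinality, value;
};

namespace detail {
    inline void skip_ws(std::string const& s, std::size_t& pos) {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    }
    // Consume one expected character after optional whitespace.
    inline bool expect(std::string const& s, std::size_t& pos, char c) {
        skip_ws(s, pos);
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }
    // Parse a non-empty "quoted" string without escapes (assumption of this sketch).
    inline bool quoted(std::string const& s, std::size_t& pos, std::string& out) {
        if (!expect(s, pos, '"')) return false;
        std::size_t const end = s.find('"', pos);
        if (end == std::string::npos || end == pos) return false;
        out = s.substr(pos, end - pos);
        pos = end + 1;
        return true;
    }
}

// Parses: "parameter" : { pair (',' pair)* } with the five keys in any order.
std::optional<ParameterData> parse_parameter(std::string const& s) {
    using namespace detail;
    std::size_t pos = 0;
    std::string key, val;
    if (!quoted(s, pos, key) || key != "parameter") return std::nullopt;
    if (!expect(s, pos, ':') || !expect(s, pos, '{')) return std::nullopt;

    std::map<std::string, std::string> fields;
    do {
        if (!quoted(s, pos, key) || !expect(s, pos, ':') || !quoted(s, pos, val))
            return std::nullopt;
        if (!fields.emplace(key, val).second) return std::nullopt; // duplicate key
    } while (expect(s, pos, ',')); // ',' is a separator, so none follows the last pair

    if (!expect(s, pos, '}')) return std::nullopt;

    // Require exactly the five known keys, in whatever order they arrived.
    ParameterData d;
    std::map<std::string, std::string*> const slots{
        {"name", &d.name}, {"type", &d.type}, {"unit", &d.unit},
        {"cardinality", &d.cardinality}, {"value", &d.value}};
    if (fields.size() != slots.size()) return std::nullopt;
    for (auto const& [k, v] : fields) {
        auto const it = slots.find(k);
        if (it == slots.end()) return std::nullopt; // unknown key
        *it->second = v;
    }
    return d;
}
```

Calling parse_parameter on either of the two inputs from the question should yield the same ParameterData. The design choice is to validate the set of keys after parsing rather than inside the grammar, which keeps the comma handling trivial.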

Related

boost spirit parsing with no skipper

Think about a preprocessor which will read the raw text (no significant white space or tokens).
There are 3 rules.
resolve_para_entry should resolve a single argument inside a call. The top-level text is returned as a string.
resolve_para should resolve the whole parameter list and put all the top-level parameters in a string list.
resolve is the entry point.
Along the way I track the iterator and extract the corresponding text portion.
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seems that the "!lit(',')" won't let the parser step outside ..
Rules:
resolve_para_entry = +(
(iter_pos >> lit('(') >> (resolve_para_entry | eps) >> lit(')') >> iter_pos) [_val= phoenix::bind(&appendString, _val, _1,_3)]
| (!lit(',') >> !lit(')') >> !lit('(') >> (wide::char_ | wide::space)) [_val = phoenix::bind(&appendChar, _val, _1)]
);
resolve_para = (lit('(') >> lit(')'))[_val = std::vector<std::wstring>()] // empty para -> old style
| (lit('(') >> resolve_para_entry >> *(lit(',') >> resolve_para_entry) > lit(')'))[_val = phoenix::bind(&appendStringList, _val, _1, _2)]
| eps;
;
resolve = (iter_pos >> name_valid >> iter_pos >> resolve_para >> iter_pos);
In the end this doesn't seem very elegant. Maybe there is a better way to parse such input without a skipper?
Indeed this should be a lot simpler.
First off, I fail to see why the absence of a skipper is at all relevant.
Second, exposing the raw input is best done using qi::raw[] instead of dancing with iter_pos and clumsy semantic actions¹.
Among the other observations I see:
negating a charset is done with ~, so e.g. ~char_(",()")
(p|eps) would be better spelled -p
(lit('(') >> lit(')')) could be just "()" (after all, there's no skipper, right)
p >> *(',' >> p) is equivalent to p % ','
With the above, resolve_para simplifies to this:
resolve_para = '(' >> -(resolve_para_entry % ',') >> ')';
resolve_para_entry seems weird, to me. It appears that any nested parentheses are simply swallowed. Why not actually parse a recursive grammar so you detect syntax errors?
Here's my take on it:
Define An AST
I prefer to make this the first step because it helps me think about the parser productions:
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
Creating The Grammar Rules
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
And their definitions:
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
Notes:
No more semantic actions
No more eps
No more iter_pos
I've opted to make arglist not-optional. If you really wanted that, change it back:
resolve = identifier >> -arglist;
But in our sample it will generate a lot of noisy output.
Of course your entry point (start) will be different. I just did the simplest thing that could possibly work, using another handy parser directive from the Spirit Repository (like iter_pos that you were already using): seek[]
The hold is there for this reason: boost::spirit::qi duplicate parsing on the output - You might not need it in your actual parser.
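For comparison, here is a hand-written sketch of a variation on the arg rule, without Spirit and with hypothetical names (split_args, sketch::parse_arg, sketch::parse_list). As far as I can tell, the arg rule above admits at most one arg inside a nested pair of parentheses ('(' >> -arg >> ')'), so a nested call(a,b) cannot be a single argument. The variation below lets a parenthesized group contain a whole comma-separated list, so commas only split at the top level and unbalanced parentheses are reported as errors. Treat it as an illustration under those assumptions, not as a drop-in replacement.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {
    bool parse_list(std::string_view s, std::size_t& pos);

    // arg := ( '(' list? ')' | any char other than ',' '(' ')' )+
    // Returns false if nothing was consumed or a parenthesis is unbalanced.
    bool parse_arg(std::string_view s, std::size_t& pos) {
        std::size_t const start = pos;
        while (pos < s.size()) {
            if (s[pos] == '(') {
                std::size_t const inner = ++pos;
                if (!parse_list(s, pos)) pos = inner;               // the list is optional
                if (pos >= s.size() || s[pos] != ')') return false; // unbalanced
                ++pos;
            } else if (s[pos] != ',' && s[pos] != ')') {
                ++pos;
            } else {
                break;
            }
        }
        return pos > start;
    }

    // list := arg (',' arg)*
    bool parse_list(std::string_view s, std::size_t& pos) {
        if (!parse_arg(s, pos)) return false;
        while (pos < s.size() && s[pos] == ',') {
            ++pos;
            if (!parse_arg(s, pos)) return false;
        }
        return true;
    }
}

// Splits "name(arg,arg,...)" into its top-level arguments.
// Returns nullopt on syntax errors; no whitespace is skipped (no skipper).
std::optional<std::vector<std::string>> split_args(std::string_view call) {
    std::size_t pos = call.find('(');
    if (pos == std::string_view::npos) return std::nullopt;
    ++pos;
    std::vector<std::string> args;
    if (pos + 1 == call.size() && call[pos] == ')') return args; // name()
    for (;;) {
        std::size_t const start = pos;
        if (!sketch::parse_arg(call, pos)) return std::nullopt;
        args.emplace_back(call.substr(start, pos - start));
        if (pos < call.size() && call[pos] == ',') { ++pos; continue; }
        break;
    }
    if (pos + 1 != call.size() || call[pos] != ')') return std::nullopt;
    return args;
}
```

With this shape, split_args("sometext(call(a,b))") yields the single argument call(a,b), while an input such as "f(a" is rejected.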
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
};
#include <iostream>
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
for (auto& resolve: data) {
std::cout << " - " << resolve.name << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << arg << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- sometext
(
para
)
- sometext
(
para1
para2
)
- sometext
(
call(a)
)
- call
(
a
)
- call
(
a
b
)
- lit
(
'
'
)
More Ideas
That last output shows you a problem with your current grammar: lit(',') should obviously not be seen as a call with two parameters.
I recently did an answer on extracting (nested) function calls with parameters which does things more neatly:
Boost spirit parse rule is not applied
or this one boost spirit reporting semantic error
BONUS
Bonus version that uses string_view and also shows exact line/column information of all extracted words.
Note that it still doesn't require any phoenix or semantic actions. Instead it simply defines the necessary trait to assign to boost::string_view from an iterator range.
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
namespace Ast {
using Source = boost::string_view;
using ArgList = std::list<Source>;
struct Resolve {
Source name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static void call(It f, It l, boost::string_view& attr) {
attr = boost::string_view { f.base(), size_t(std::distance(f.base(),l.base())) };
}
};
} } }
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = raw [ char_("a-zA-Z_") >> *char_("a-zA-Z0-9_") ];
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, Ast::Source()> arg, identifier;
};
#include <iostream>
struct Annotator {
using Ref = boost::string_view;
struct Manip {
Ref fragment, context;
friend std::ostream& operator<<(std::ostream& os, Manip const& m) {
return os << "[" << m.fragment << " at line:" << m.line() << " col:" << m.column() << "]";
}
size_t line() const {
return 1 + std::count(context.begin(), fragment.begin(), '\n');
}
size_t column() const {
return 1 + (fragment.begin() - start_of_line().begin());
}
Ref start_of_line() const {
return context.substr(context.substr(0, fragment.begin()-context.begin()).find_last_of('\n') + 1);
}
};
Ref context;
Manip operator()(Ref what) const { return {what, context}; }
};
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
Annotator annotate{samples};
for (auto& resolve: data) {
std::cout << " - " << annotate(resolve.name) << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << annotate(arg) << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- [sometext at line:3 col:1]
(
[para at line:3 col:10]
)
- [sometext at line:4 col:1]
(
[para1 at line:4 col:10]
[para2 at line:4 col:16]
)
- [sometext at line:5 col:1]
(
[call(a) at line:5 col:10]
)
- [call at line:5 col:34]
(
[a at line:5 col:39]
)
- [call at line:6 col:10]
(
[a at line:6 col:15]
[b at line:6 col:17]
)
- [lit at line:6 col:62]
(
[' at line:6 col:66]
[' at line:6 col:68]
)
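The line/column arithmetic in Annotator works because every extracted fragment is a view into the same buffer as the context. A minimal standalone sketch of that computation, using std::string_view instead of boost::string_view (an assumption made for self-containedness; Position and locate are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

struct Position { std::size_t line, column; }; // both 1-based

// Precondition: `fragment` is a sub-view of `context` (same underlying buffer).
Position locate(std::string_view context, std::string_view fragment) {
    assert(fragment.data() >= context.data());
    assert(fragment.data() <= context.data() + context.size());
    auto const offset = static_cast<std::size_t>(fragment.data() - context.data());
    std::string_view const before = context.substr(0, offset);

    // Line: one more than the number of newlines before the fragment.
    auto const newlines = std::count(before.begin(), before.end(), '\n');
    std::size_t const line = 1 + static_cast<std::size_t>(newlines);

    // Column: distance from the start of the current line.
    std::size_t const last_nl = before.rfind('\n');
    std::size_t const line_start = (last_nl == std::string_view::npos) ? 0 : last_nl + 1;
    return {line, 1 + offset - line_start};
}
```

Because only pointers and offsets are involved, no position-tracking iterator is needed during parsing; the positions are derived afterwards, on demand.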
¹ Boost Spirit: "Semantic actions are evil"?

Regex: Finding all subexpressions (using boost::regex)

I have a file which contains some "entity" data in Valve's format. It's basically a key-value deal, and it looks like this:
{
"world_maxs" "3432 4096 822"
"world_mins" "-2408 -4096 -571"
"skyname" "sky_alpinestorm_01"
"maxpropscreenwidth" "-1"
"detailvbsp" "detail_sawmill.vbsp"
"detailmaterial" "detail/detailsprites_sawmill"
"classname" "worldspawn"
"mapversion" "1371"
"hammerid" "1"
}
{
"origin" "553 -441 322"
"targetname" "tonemap_global"
"classname" "env_tonemap_controller"
"hammerid" "90580"
}
Each pair of {} counts as one entity, and the rows inside count as KeyValues. As you can see, it's fairly straightforward.
I want to process this data into a vector<map<string, string> > in C++. To do this, I've tried using regular expressions that come with Boost. Here is what I have so far:
static const boost::regex entityRegex("\\{(\\s*\"([A-Za-z0-9_]+)\"\\s*\"([^\"]+)\")+\\s*\\}");
boost::smatch what;
while (regex_search(entitiesString, what, entityRegex)) {
cout << what[0] << endl;
cout << what[1] << endl;
cout << what[2] << endl;
cout << what[3] << endl;
break; // TODO
}
Easier-to-read regex:
\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\}
I'm not sure the regex is well-formed for my problem yet, but it seems to print the last key-value pair (hammerid, 1) at least.
My question is, how would I go about extracting the "nth" matched subexpression within an expression? Or is there not really a practical way to do this? Would it perhaps be better to write two nested while-loops, one which searches for the {} patterns, and then one which searches for the actual key-value pairs?
Thanks!
Using a parser generator you can code a proper parser.
For example, using Boost Spirit you can define the rules of the grammar inline as C++ expressions:
start = *entity;
entity = '{' >> *entry >> '}';
entry = text >> text;
text = '"' >> *~char_('"') >> '"';
Here's a full demo:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <map>
using Entity = std::map<std::string, std::string>;
using ValveData = std::vector<Entity>;
namespace qi = boost::spirit::qi;
template <typename It, typename Skipper = qi::space_type>
struct Grammar : qi::grammar<It, ValveData(), Skipper>
{
Grammar() : Grammar::base_type(start) {
using namespace qi;
start = *entity;
entity = '{' >> *entry >> '}';
entry = text >> text;
text = '"' >> *~char_('"') >> '"';
BOOST_SPIRIT_DEBUG_NODES((start)(entity)(entry)(text))
}
private:
qi::rule<It, ValveData(), Skipper> start;
qi::rule<It, Entity(), Skipper> entity;
qi::rule<It, std::pair<std::string, std::string>(), Skipper> entry;
qi::rule<It, std::string()> text;
};
int main()
{
using It = boost::spirit::istream_iterator;
Grammar<It> parser;
It f(std::cin >> std::noskipws), l;
ValveData data;
bool ok = qi::phrase_parse(f, l, parser, qi::space, data);
if (ok) {
std::cout << "Parsing success:\n";
int count = 0;
for(auto& entity : data)
{
++count;
for (auto& entry : entity)
std::cout << "Entity " << count << ": [" << entry.first << "] -> [" << entry.second << "]\n";
}
} else {
std::cout << "Parsing failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Which prints (for the input shown):
Parsing success:
Entity 1: [classname] -> [worldspawn]
Entity 1: [detailmaterial] -> [detail/detailsprites_sawmill]
Entity 1: [detailvbsp] -> [detail_sawmill.vbsp]
Entity 1: [hammerid] -> [1]
Entity 1: [mapversion] -> [1371]
Entity 1: [maxpropscreenwidth] -> [-1]
Entity 1: [skyname] -> [sky_alpinestorm_01]
Entity 1: [world_maxs] -> [3432 4096 822]
Entity 1: [world_mins] -> [-2408 -4096 -571]
Entity 2: [classname] -> [env_tonemap_controller]
Entity 2: [hammerid] -> [90580]
Entity 2: [origin] -> [553 -441 322]
Entity 2: [targetname] -> [tonemap_global]
I think doing it all with a single regular expression is hard because of the variable number of entries inside each entity {}. Personally I would consider simply using std::getline to do your parsing.
#include <map>
#include <vector>
#include <string>
#include <sstream>
#include <iostream>
std::istringstream iss(R"~(
{
"world_maxs" "3432 4096 822"
"world_mins" "-2408 -4096 -571"
"skyname" "sky_alpinestorm_01"
"maxpropscreenwidth" "-1"
"detailvbsp" "detail_sawmill.vbsp"
"detailmaterial" "detail/detailsprites_sawmill"
"classname" "worldspawn"
"mapversion" "1371"
"hammerid" "1"
}
{
"origin" "553 -441 322"
"targetname" "tonemap_global"
"classname" "env_tonemap_controller"
"hammerid" "90580"
}
)~");
int main()
{
std::string skip;
std::string entity;
std::vector<std::map<std::string, std::string> > vm;
// skip to open brace, read entity until close brace
while(std::getline(iss, skip, '{') && std::getline(iss, entity, '}'))
{
// turn entity into input stream
std::istringstream iss(entity);
// temporary map
std::map<std::string, std::string> m;
std::string key, val;
// skip to open quote, read key to close quote
while(std::getline(iss, skip, '"') && std::getline(iss, key, '"'))
{
// skip to open quote read val to close quote
if(std::getline(iss, skip, '"') && std::getline(iss, val, '"'))
m[key] = val;
}
// move map (no longer needed)
vm.push_back(std::move(m));
}
for(auto& m: vm)
{
for(auto& p: m)
std::cout << p.first << ": " << p.second << '\n';
std::cout << '\n';
}
}
Output:
classname: worldspawn
detailmaterial: detail/detailsprites_sawmill
detailvbsp: detail_sawmill.vbsp
hammerid: 1
mapversion: 1371
maxpropscreenwidth: -1
skyname: sky_alpinestorm_01
world_maxs: 3432 4096 822
world_mins: -2408 -4096 -571
classname: env_tonemap_controller
hammerid: 90580
origin: 553 -441 322
targetname: tonemap_global
I would have written it like this:
^\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\}$
Or split the regex into two: first match the curly braces, then loop through the content between the braces line by line.
Match curly braces: ^(\{[^\}]+)$
Match the lines: ^(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)"\s*)$
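The two-step split can be sketched with two nested iterator loops. The reason a single pattern is not enough is that a capture group inside a repetition keeps only its last match, which is why the code in the question printed only the final pair. This sketch uses std::regex rather than boost::regex (an assumption for self-containedness; the iterator interfaces are closely analogous), parse_entities is a hypothetical name, and the patterns are adapted from the question.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using Entity = std::map<std::string, std::string>;

std::vector<Entity> parse_entities(std::string const& text) {
    // Outer pattern: one { ... } block; group 1 is the block body.
    static std::regex const block_re(R"(\{([^}]*)\})");
    // Inner pattern: one "key" "value" pair. Unlike the question's pattern,
    // empty values are allowed here.
    static std::regex const pair_re(R"rx("([A-Za-z0-9_]+)"\s*"([^"]*)")rx");

    std::vector<Entity> result;
    std::sregex_iterator const end;
    for (std::sregex_iterator b(text.begin(), text.end(), block_re); b != end; ++b) {
        std::string const body = (*b)[1].str();
        Entity entity;
        // Each match of the inner pattern is one key/value pair.
        for (std::sregex_iterator p(body.begin(), body.end(), pair_re); p != end; ++p)
            entity[(*p)[1].str()] = (*p)[2].str();
        result.push_back(std::move(entity));
    }
    return result;
}
```

The outer loop finds the entities and the inner loop enumerates every pair, so no "nth subexpression" lookup is required.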

Boost::spirit (classic) primitives vs custom parsers

I'm a beginner in Boost::spirit and I want to define a grammar that parses the TTCN language.
(http://www.trex.informatik.uni-goettingen.de/trac/wiki/ttcn-3_4.5.1)
I'm trying to define rules for 'primitive' parsers such as Alpha and AlphaNum so that they stay faithful, one to one, to the original grammar, but obviously I'm doing something wrong, because the grammar defined this way does not work.
When I use the built-in primitive parsers in place of the TTCN ones, it starts to work.
Can someone tell me why the 'manually' defined rules do not work as expected?
How can I fix it? I would like to stick close to the original grammar.
Is it a beginner's bug in my code or something different?
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/classic_symbols.hpp>
#include <boost/spirit/include/classic_tree_to_xml.hpp>
#include <boost/spirit/include/classic_position_iterator.hpp>
#include <boost/spirit/include/classic_core.hpp>
#include <boost/spirit/include/classic_parse_tree.hpp>
#include <boost/spirit/include/classic_ast.hpp>
#include <iostream>
#include <string>
#include <boost/spirit/home/classic/debug.hpp>
using namespace boost::spirit::classic;
using namespace std;
using namespace BOOST_SPIRIT_CLASSIC_NS;
typedef node_iter_data_factory<int> factory_t;
typedef position_iterator<std::string::iterator> pos_iterator_t;
typedef tree_match<pos_iterator_t, factory_t> parse_tree_match_t;
typedef parse_tree_match_t::const_tree_iterator iter_t;
struct ParseGrammar: public grammar<ParseGrammar>
{
template<typename ScannerT>
struct definition
{
definition(ParseGrammar const &)
{
KeywordImport = str_p("import");
KeywordAll = str_p("all");
SemiColon = ch_p(';');
Underscore = ch_p('_');
NonZeroNum = range_p('1','9');
Num = ch_p('0') | NonZeroNum;
UpperAlpha = range_p('A', 'Z');
LowerAlpha = range_p('a', 'z');
Alpha = UpperAlpha | LowerAlpha;
AlphaNum = Alpha | Num;
//this does not!
Identifier = lexeme_d[Alpha >> *(AlphaNum | Underscore)];
// Uncomment below line to make rule work
// Identifier = lexeme_d[alpha_p >> *(alnum_p | Underscore)];
Module = KeywordImport >> Identifier >> KeywordAll >> SemiColon;
BOOST_SPIRIT_DEBUG_NODE(Module);
BOOST_SPIRIT_DEBUG_NODE(KeywordImport);
BOOST_SPIRIT_DEBUG_NODE(KeywordAll);
BOOST_SPIRIT_DEBUG_NODE(Identifier);
BOOST_SPIRIT_DEBUG_NODE(SemiColon);
}
rule<ScannerT> KeywordImport,KeywordAll,Module,Identifier,SemiColon;
rule<ScannerT> Alpha,UpperAlpha,LowerAlpha,Underscore,Num,AlphaNum;
rule<ScannerT> NonZeroNum;
rule<ScannerT> const&
start() const { return Module; }
};
};
int main()
{
ParseGrammar resolver; // Our parser
BOOST_SPIRIT_DEBUG_NODE(resolver);
string content = "import foobar all;";
pos_iterator_t pos_begin(content.begin(), content.end());
pos_iterator_t pos_end;
tree_parse_info<pos_iterator_t, factory_t> info;
info = ast_parse<factory_t>(pos_begin, pos_end, resolver, space_p);
std::cout << "\ninfo.length : " << info.length << std::endl;
std::cout << "info.full : " << info.full << std::endl;
if(info.full)
{
std::cout << "OK: Parsing succeeded\n\n";
}
else
{
int line = info.stop.get_position().line;
int column = info.stop.get_position().column;
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "-------------------------\n";
}
return 0;
}
I don't do Spirit Classic (which has been deprecated for some years now).
I can only assume you've mixed something up with skippers. Here's the thing translated into Spirit V2:
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
namespace qi = boost::spirit::qi;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
template <typename Iterator = pos_iterator_t, typename Skipper = qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar() : ParseGrammar::base_type(Module)
{
using namespace qi;
KeywordImport = lit("import");
KeywordAll = lit("all");
SemiColon = lit(';');
#if 1
// this rule obviously works
Identifier = lexeme [alpha >> *(alnum | '_')];
#else
// this does too, but less efficiently
Underscore = lit('_');
NonZeroNum = char_('1','9');
Num = char_('0') | NonZeroNum;
UpperAlpha = char_('A', 'Z');
LowerAlpha = char_('a', 'z');
Alpha = UpperAlpha | LowerAlpha;
AlphaNum = Alpha | Num;
Identifier = lexeme [Alpha >> *(AlphaNum | Underscore)];
#endif
Module = KeywordImport >> Identifier >> KeywordAll >> SemiColon;
BOOST_SPIRIT_DEBUG_NODES((Module)(KeywordImport)(KeywordAll)(Identifier)(SemiColon))
}
qi::rule<Iterator, Skipper> Module;
qi::rule<Iterator> KeywordImport,KeywordAll,Identifier,SemiColon;
qi::rule<Iterator> Alpha,UpperAlpha,LowerAlpha,Underscore,Num,AlphaNum;
qi::rule<Iterator> NonZeroNum;
};
int main()
{
std::string const content = "import \r\n\r\nfoobar\r\n\r\n all; bogus";
pos_iterator_t first(content.begin()), iter=first, last(content.end());
ParseGrammar<pos_iterator_t> resolver; // Our parser
bool ok = phrase_parse(iter, last, resolver, qi::space);
std::cout << std::boolalpha;
std::cout << "\nok : " << ok << std::endl;
std::cout << "full : " << (iter == last) << std::endl;
if(ok && iter==last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
I've added a little "bogus" at the end of input, so the output becomes a nicer demonstration:
<Module>
<try>import \r\n\r\nfoobar\r\n\r</try>
<KeywordImport>
<try>import \r\n\r\nfoobar\r\n\r</try>
<success> \r\n\r\nfoobar\r\n\r\n all;</success>
<attributes>[]</attributes>
</KeywordImport>
<Identifier>
<try>foobar\r\n\r\n all; bogu</try>
<success>\r\n\r\n all; bogus</success>
<attributes>[]</attributes>
</Identifier>
<KeywordAll>
<try>all; bogus</try>
<success>; bogus</success>
<attributes>[]</attributes>
</KeywordAll>
<SemiColon>
<try>; bogus</try>
<success> bogus</success>
<attributes>[]</attributes>
</SemiColon>
<success> bogus</success>
<attributes>[]</attributes>
</Module>
ok : true
full : false
-------------------------
ERROR: Parsing failed or not complete
stopped at: 3:8
remaining: 'bogus'
-------------------------
That all said, this is what I'd probably reduce it to:
template <typename Iterator, typename Skipper = qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar() : ParseGrammar::base_type(Module)
{
using namespace qi;
Identifier = alpha >> *(alnum | '_');
Module = "import" >> Identifier >> "all" >> ';';
BOOST_SPIRIT_DEBUG_NODES((Module)(Identifier))
}
qi::rule<Iterator, Skipper> Module;
qi::rule<Iterator> Identifier;
};
As you can see, the Identifier rule is implicitly a lexeme because it isn't declared to use a skipper.
See it Live on Coliru
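The skipper/lexeme distinction can also be pictured without Spirit. In the hand-rolled sketch below (all names are hypothetical), whitespace is skipped before each token, which plays the role of the skipper, but never inside the identifier, which is what lexeme[] or a skipper-less rule guarantees. Like the literal "import" in the grammar above, the keyword matcher does not check for a word boundary.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

namespace lex {
    inline bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    inline bool is_alnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

    // Plays the role of the skipper: consume whitespace between tokens.
    inline void skip_space(std::string_view s, std::size_t& pos) {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    }

    // Match a literal after pre-skipping.
    inline bool keyword(std::string_view s, std::size_t& pos, std::string_view kw) {
        skip_space(s, pos);
        if (s.substr(pos, kw.size()) != kw) return false;
        pos += kw.size();
        return true;
    }

    // Identifier := alpha (alnum | '_')*, with no skipping inside the token.
    inline std::optional<std::string> identifier(std::string_view s, std::size_t& pos) {
        skip_space(s, pos); // pre-skip only
        std::size_t const start = pos;
        if (pos >= s.size() || !is_alpha(s[pos])) return std::nullopt;
        ++pos;
        while (pos < s.size() && (is_alnum(s[pos]) || s[pos] == '_')) ++pos;
        return std::string(s.substr(start, pos - start));
    }
}

// Module := "import" Identifier "all" ';'
// Returns the identifier if the whole input matches.
std::optional<std::string> parse_import(std::string_view s) {
    std::size_t pos = 0;
    if (!lex::keyword(s, pos, "import")) return std::nullopt;
    auto id = lex::identifier(s, pos);
    if (!id) return std::nullopt;
    if (!lex::keyword(s, pos, "all") || !lex::keyword(s, pos, ";")) return std::nullopt;
    lex::skip_space(s, pos);
    if (pos != s.size()) return std::nullopt; // trailing input such as "bogus"
    return id;
}
```

If skip_space were also called inside the identifier loop, "foo bar" would be read as one identifier, which is the kind of behavior that lexeme[] exists to prevent.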

Parsing a number of named sets of other named sets

So I want to write a... well... not-so-simple parser with boost::spirit::qi. I know the bare basics of boost spirit, having gotten acquainted with it for the first time in the past couple of hours.
Basically I need to parse this:
# comment
# other comment
set "Myset A"
{
figure "AF 1"
{
i 0 0 0
i 1 2 5
i 1 1 1
f 3.1 45.11 5.3
i 3 1 5
f 1.1 2.33 5.166
}
figure "AF 2"
{
i 25 5 1
i 3 1 3
}
}
# comment
set "Myset B"
{
figure "BF 1"
{
f 23.1 4.3 5.11
}
}
set "Myset C"
{
include "Myset A" # includes all figures from Myset A
figure "CF"
{
i 1 1 1
f 3.11 5.33 3
}
}
Into this:
struct int_point { int x, y, z; };
struct float_point { float x, y, z; };
struct figure
{
string name;
vector<int_point> int_points;
vector<float_point> float_points;
};
struct figure_set
{
string name;
vector<figure> figures;
};
vector<figure_set> figure_sets; // fill with the data of the input
Now, obviously having somebody write it for me would be too much, but can you please provide some tips on what to read and how to structure the grammar and parsers for this task?
And also... it may be the case that boost::spirit is not the best library I could use for the task. If so, which one is?
EDIT:
Here's where I've gotten so far. But I'm not yet sure how to go on: http://liveworkspace.org/code/212c31dfc0b6fbdf6c462d8d931c0e9f
I am able to read a single figure but, I don't yet have an idea how to parse a set of figures.
Here's my take on it
I believe the rule that will have been the blocker for you would be
figure = eps >> "figure"
>> name [ at_c<0>(_val) = _1 ] >> '{' >>
*(
ipoints [ push_back(at_c<1>(_val), _1) ]
| fpoints [ push_back(at_c<2>(_val), _1) ]
) >> '}';
This is actually a symptom of the fact that you parse inter-mixed i and f lines into separate containers.
See below for an alternative.
Here's my full code: test.cpp
//#define BOOST_SPIRIT_DEBUG // before including Spirit
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <fstream>
namespace Format
{
struct int_point { int x, y, z; };
struct float_point { float x, y, z; };
struct figure
{
std::string name;
std::vector<int_point> int_points;
std::vector<float_point> float_points;
friend std::ostream& operator<<(std::ostream& os, figure const& o);
};
struct figure_set
{
std::string name;
std::set<std::string> includes;
std::vector<figure> figures;
friend std::ostream& operator<<(std::ostream& os, figure_set const& o);
};
typedef std::vector<figure_set> file_data;
}
BOOST_FUSION_ADAPT_STRUCT(Format::int_point,
(int, x)(int, y)(int, z))
BOOST_FUSION_ADAPT_STRUCT(Format::float_point,
(float, x)(float, y)(float, z))
BOOST_FUSION_ADAPT_STRUCT(Format::figure,
(std::string, name)
(std::vector<Format::int_point>, int_points)
(std::vector<Format::float_point>, float_points))
BOOST_FUSION_ADAPT_STRUCT(Format::figure_set,
(std::string, name)
(std::set<std::string>, includes)
(std::vector<Format::figure>, figures))
namespace Format
{
std::ostream& operator<<(std::ostream& os, figure const& o)
{
using namespace boost::spirit::karma;
return os << format_delimited(
"\n figure" << no_delimit [ '"' << string << '"' ] << "\n {"
<< *("\n i" << int_ << int_ << int_)
<< *("\n f" << float_ << float_ << float_)
<< "\n }"
, ' ', o);
}
std::ostream& operator<<(std::ostream& os, figure_set const& o)
{
using namespace boost::spirit::karma;
return os << format_delimited(
"\nset" << no_delimit [ '"' << string << '"' ] << "\n{"
<< *("\n include " << no_delimit [ '"' << string << '"' ])
<< *stream
<< "\n}"
, ' ', o);
}
}
namespace /*anon*/
{
namespace phx=boost::phoenix;
namespace qi =boost::spirit::qi;
template <typename Iterator> struct skipper
: public qi::grammar<Iterator>
{
skipper() : skipper::base_type(start, "skipper")
{
using namespace qi;
comment = '#' >> *(char_ - eol) >> (eol|eoi);
start = comment | qi::space;
BOOST_SPIRIT_DEBUG_NODE(start);
BOOST_SPIRIT_DEBUG_NODE(comment);
}
private:
qi::rule<Iterator> start, comment;
};
template <typename Iterator> struct parser
: public qi::grammar<Iterator, Format::file_data(), skipper<Iterator> >
{
parser() : parser::base_type(start, "parser")
{
using namespace qi;
using phx::push_back;
using phx::at_c;
name = eps >> lexeme [ '"' >> *~char_('"') >> '"' ];
include = eps >> "include" >> name;
ipoints = eps >> "i" >> int_ >> int_ >> int_;
fpoints = eps >> "f" >> float_ >> float_ >> float_;
figure = eps >> "figure"
>> name [ at_c<0>(_val) = _1 ] >> '{' >>
*(
ipoints [ push_back(at_c<1>(_val), _1) ]
| fpoints [ push_back(at_c<2>(_val), _1) ]
) >> '}';
set = eps >> "set" >> name >> '{' >> *include >> *figure >> '}';
start = *set;
}
private:
qi::rule<Iterator, std::string() , skipper<Iterator> > name, include;
qi::rule<Iterator, Format::int_point() , skipper<Iterator> > ipoints;
qi::rule<Iterator, Format::float_point(), skipper<Iterator> > fpoints;
qi::rule<Iterator, Format::figure() , skipper<Iterator> > figure;
qi::rule<Iterator, Format::figure_set() , skipper<Iterator> > set;
qi::rule<Iterator, Format::file_data() , skipper<Iterator> > start;
};
}
namespace Parser {
bool parsefile(const std::string& spec, Format::file_data& data)
{
std::ifstream in(spec.c_str());
in.unsetf(std::ios::skipws);
std::string v;
v.reserve(4096);
v.insert(v.end(), std::istreambuf_iterator<char>(in.rdbuf()), std::istreambuf_iterator<char>());
if (!in)
return false;
typedef char const * iterator_type;
iterator_type first = &v[0];
iterator_type last = first+v.size();
try
{
parser<iterator_type> p;
skipper<iterator_type> s;
bool r = qi::phrase_parse(first, last, p, s, data);
r = r && (first == last);
if (!r)
std::cerr << spec << ": parsing failed at: \"" << std::string(first, last) << "\"\n";
return r;
}
catch (const qi::expectation_failure<char const *>& e)
{
std::cerr << "FIXME: expected " << e.what_ << ", got '" << std::string(e.first, e.last) << "'" << std::endl;
return false;
}
}
}
int main()
{
Format::file_data data;
bool ok = Parser::parsefile("input.txt", data);
std::cerr << "Parse " << (ok?"success":"failed") << std::endl;
std::cout << "# figure sets exported automatically by karma\n\n";
for (auto& set : data)
std::cout << set;
}
It outputs the parsed data as a verification: output.txt
Parse success
# figure sets exported automatically by karma
set "Myset A"
{
figure "AF 1"
{
i 0 0 0
i 1 2 5
i 1 1 1
i 3 1 5
f 3.1 45.11 5.3
f 1.1 2.33 5.166
}
figure "AF 2"
{
i 25 5 1
i 3 1 3
}
}
set "Myset B"
{
figure "BF 1"
{
f 23.1 4.3 5.11
}
}
set "Myset C"
{
include "Myset A"
figure "CF"
{
i 1 1 1
f 3.11 5.33 3.0
}
}
You will note that
the order of the point lines has changed (all int_points precede all float_points)
also, non-significant digits are added, e.g. 3.0 instead of 3 in the last line, to show that the type is float.
you had 'forgotten' (?) about the includes in your question
Alternative
Have something that keeps the actual point lines in original order:
typedef boost::variant<int_point, float_point> if_point;
struct figure
{
std::string name;
std::vector<if_point> if_points;
};
Now the rules become simply:
name = eps >> lexeme [ '"' >> *~char_('"') >> '"' ];
include = eps >> "include" >> name;
ipoints = eps >> "i" >> int_ >> int_ >> int_;
fpoints = eps >> "f" >> float_ >> float_ >> float_;
figure = eps >> "figure" >> name >> '{' >> *(ipoints | fpoints) >> '}';
set = eps >> "set" >> name >> '{' >> *include >> *figure >> '}';
start = *set;
Note the elegance in
figure = eps >> "figure" >> name >> '{' >> *(ipoints | fpoints) >> '}';
And the output stays in the exact order of the input: output.txt
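To make the ordering point concrete without pulling in Spirit, here is a minimal standard-library sketch. It assumes C++17 and uses std::variant as a stand-in for boost::variant. The helpers read_points and kinds are hypothetical names made up for this illustration; they are not part of the demo code.

```cpp
#include <sstream>
#include <string>
#include <variant>
#include <vector>

struct int_point   { int   x, y, z; };
struct float_point { float x, y, z; };

// std::variant stands in for boost::variant here (assumption: C++17).
using if_point = std::variant<int_point, float_point>;

// Hypothetical helper: reads "i a b c" / "f a b c" lines into ONE sequence,
// so the elements stay in input order.
std::vector<if_point> read_points(const std::string& text)
{
    std::vector<if_point> points;
    std::istringstream in(text);
    std::string tag;
    while (in >> tag) {
        if (tag == "i") {
            int_point p{};
            if (in >> p.x >> p.y >> p.z) points.push_back(p);
        } else if (tag == "f") {
            float_point p{};
            if (in >> p.x >> p.y >> p.z) points.push_back(p);
        } else {
            break; // unknown tag: stop reading
        }
    }
    return points;
}

// Hypothetical helper: one letter per stored alternative, in storage order.
std::string kinds(const std::vector<if_point>& points)
{
    std::string s;
    for (const auto& p : points)
        s += std::holds_alternative<int_point>(p) ? 'i' : 'f';
    return s;
}
```

For the input "i 1 1 1\nf 3.11 5.33 3.0\ni 2 2 2\n", kinds(read_points(...)) yields "ifi": the float line stays between the two int lines. With two separate vectors, that interleaving cannot be reconstructed.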
Once again, full demo code (on github only): test.cpp
Bonus update
Finally, I made my first proper Karma grammar to output the results:
name = no_delimit ['"' << string << '"'];
include = "include" << name;
ipoints = "\n i" << int_ << int_ << int_;
fpoints = "\n f" << float_ << float_ << float_;
figure = "figure" << name << "\n {" << *(ipoints | fpoints) << "\n }";
set = "set" << name << "\n{"
<< *("\n " << include)
<< *("\n " << figure) << "\n}";
start = "# figure sets exported automatically by karma\n\n"
<< set % eol;
That was actually considerably more comfortable than I had expected. See it in the latest version of the fully updated gist: test.hpp

Resolve ambiguous boost::spirit::qi grammar with lookahead

I want to parse a list of name-value pairs. Each list is terminated by a '.' and EOL. Within a pair, the name and the value are separated by a ':'. The pairs in a list are separated by ';'. E.g.
NAME1: VALUE1; NAME2: VALUE2; NAME3: VALUE3.<EOL>
The problem I have is that the values contain '.' and the last value always consumes the '.' at the EOL. Can I use some sort of lookahead to ensure the last '.' before the EOL is treated differently?
I have created a sample that presumably looks like what you have. The tweak is in the following line:
value = lexeme [ *(char_ - ';' - ("." >> (eol|eoi))) ];
Note how - ("." >> (eol|eoi)) means: exclude any '.' that is immediately followed by end-of-line or end-of-input.
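If it helps to see the lookahead spelled out, the same rule can be written as a plain loop. This is a hand-rolled sketch using only the standard library, not Spirit. scan_value is a hypothetical name, and whitespace skipping is deliberately left out.

```cpp
#include <cstddef>
#include <string>

// Hand-written counterpart of  *(char_ - ';' - ('.' >> (eol|eoi))).
// Returns the value text starting at pos and leaves pos on the first
// character that was NOT consumed: a ';', the terminating '.', or the end.
std::string scan_value(const std::string& in, std::size_t& pos)
{
    std::string out;
    while (pos < in.size()) {
        const char c = in[pos];
        if (c == ';')
            break;                      // pair separator ends the value
        if (c == '.') {
            // Lookahead: peek at the next character without consuming it.
            const bool at_end = (pos + 1 == in.size());
            const bool at_eol = !at_end &&
                                (in[pos + 1] == '\n' || in[pos + 1] == '\r');
            if (at_end || at_eol)
                break;                  // this '.' terminates the list
        }
        out += c;                       // ordinary character, including inner '.'
        ++pos;
    }
    return out;
}
```

Called on "\"more fun!\"....\n" with pos = 0, it returns "\"more fun!\"..." and leaves pos on the fourth dot. Only the '.' directly before the newline is held back for the enclosing rule, which is the behaviour the Spirit expression asks for.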
Test case (also live on http://liveworkspace.org/code/949b1d711772828606ddc507acf4fb4b):
const std::string input =
"name1: value 1; other name : value #2.\n"
"name.sub1: value.with.periods; other.sub2: \"more fun!\"....\n";
bool ok = doParse(input, qi::blank);
Output:
parse success
data: name1 : value 1 ; other name : value #2 .
data: name.sub1 : value.with.periods ; other.sub2 : "more fun!"... .
Full code:
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <iostream>
#include <map>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
namespace phx = boost::phoenix;
typedef std::map<std::string, std::string> map_t;
typedef std::vector<map_t> maps_t;
template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, maps_t(), Skipper>
{
parser() : parser::base_type(start)
{
using namespace qi;
name = lexeme [ +~char_(':') ];
value = lexeme [ *(char_ - ';' - ('.' >> (eol|eoi))) ];
line = ((name >> ':' >> value) % ';') >> '.';
start = line % eol;
}
private:
qi::rule<It, std::string(), Skipper> name, value;
qi::rule<It, map_t(), Skipper> line;
qi::rule<It, maps_t(), Skipper> start;
};
template <typename C, typename Skipper>
bool doParse(const C& input, const Skipper& skipper)
{
auto f(std::begin(input)), l(std::end(input));
parser<decltype(f), Skipper> p;
maps_t data;
try
{
bool ok = qi::phrase_parse(f,l,p,skipper,data);
if (ok)
{
std::cout << "parse success\n";
for (auto& line : data)
std::cout << "data: " << karma::format_delimited((karma::string << ':' << karma::string) % ';' << '.', ' ', line) << '\n';
}
else std::cerr << "parse failed: '" << std::string(f,l) << "'\n";
//if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
return ok;
} catch(const qi::expectation_failure<decltype(f)>& e)
{
std::string frag(e.first, e.last);
std::cerr << e.what() << "'" << frag << "'\n";
}
return false;
}
int main()
{
const std::string input =
"name1: value 1; other name : value #2.\n"
"name.sub1: value.with.periods; other.sub2: \"more fun!\"....\n";
bool ok = doParse(input, qi::blank);
return ok? 0 : 255;
}