I'm trying to make a JSON parser but my object rule doesn't compile...
Code (complete code here):
// AST
using Object = std::map<std::string, struct Value>; // (Value is a variant which can contain a float, a string, an Object or an Array)
// Grammar def
using ObjectType = x3::rule<struct ObjectClass, Object>;
const ObjectType obj{"object"};
const auto obj_def = '{' > ((quotedString > ':' > val) % ',') > '}';
Error (complete error here):
/usr/include/boost/spirit/home/x3/support/traits/container_traits.hpp:77:56: error: no type named 'value_type' in 'std::pair<std::basic_string<char>, Json::Value>'
: detail::remove_value_const<typename Container::value_type>
~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
The type std::pair<std::basic_string<char>, Json::Value> is correct for a single element, but the attribute should be a container of such pairs (a std::vector<std::pair<std::basic_string<char>, Json::Value>>, or in my case a std::map<std::basic_string<char>, Json::Value>).
What is the problem?
Your diagnosis is off the mark. You can just eliminate rules and defs until you find the culprit. The obj_def is the culprit, which you can confirm by commenting it out:
const auto obj_def = x3::eps; // '{' > ((quotedString > ':' > val) % ',') > '}';
In your grammardef.hpp you need to include
#include <boost/fusion/adapted.hpp>
so that Fusion knows how to deal with std::pair<std::string, Json::Value>.
This has been a FAQ entry since the early days of Spirit V2 (http://boost-spirit.com/home/articles/qi-example/parsing-a-list-of-key-value-pairs-using-spirit-qi/).
Also, bear in mind that some implementations will expect properties to be ordered (this is not actually specified) and you might want to check against duplicate keys (especially after normalizing unicode escapes).
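The duplicate-key check mentioned above can be sketched in plain C++, independent of Spirit. This is only an illustration: it assumes the members were first collected into an ordered list of pairs (which also preserves property order), and the names Members and duplicate_keys are made up for this example. The value type is a std::string placeholder standing in for Json::Value.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: members in source order, with std::string
// in place of the real Json::Value.
using Members = std::vector<std::pair<std::string, std::string>>;

// Returns each key that occurs more than once, in order of its
// second occurrence.
std::vector<std::string> duplicate_keys(const Members& members) {
    std::map<std::string, int> counts;
    std::vector<std::string> duplicates;
    for (const auto& member : members) {
        // Report a key exactly once, when its count reaches 2.
        if (++counts[member.first] == 2) {
            duplicates.push_back(member.first);
        }
    }
    return duplicates;
}
```

Collecting into a sequence first matters because a std::map holds only one entry per key, so a duplicate can no longer be detected once the pairs have gone straight into the map.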
Related
I'm writing a little compiler just for fun and I'm using Boost Spirit Qi to describe my grammar. Now I want to make a minor change in the grammar to prepare some further additions. Unfortunately these changes won't compile and I would like to understand why this is the case.
Here is a snippet from the code I want to change. I hope the provided information is enough to understand the idea. The complete code is a bit large, but if you want to look at it or even test it (Makefile and Travis CI is provided), see https://github.com/Kruecke/BFGenerator/blob/8f66aa5/bf/compiler.cpp#L433.
typedef boost::variant<
function_call_t,
variable_declaration_t,
variable_assignment_t,
// ...
> instruction_t;
struct grammar : qi::grammar<iterator, program_t(), ascii::space_type> {
grammar() : grammar::base_type(program) {
instruction = function_call
| variable_declaration
| variable_assignment
// | ...
;
function_call = function_name >> '(' > -(variable_name % ',') > ')' > ';';
// ...
}
qi::rule<iterator, instruction::instruction_t(), ascii::space_type> instruction;
qi::rule<iterator, instruction::function_call_t(), ascii::space_type> function_call;
// ...
};
So far, everything is just working fine. Now I want to move the parsing of the trailing semicolon (> ';') from the function_call rule to the instruction rule. My code now looks like this:
struct grammar : qi::grammar<iterator, program_t(), ascii::space_type> {
grammar() : grammar::base_type(program) {
instruction = (function_call > ';') // Added trailing semicolon
| variable_declaration
| variable_assignment
// | ...
;
// Removed trailing semicolon here:
function_call = function_name >> '(' > -(variable_name % ',') > ')';
// ...
}
From my understanding the rules haven't really changed because the character parser ';' doesn't yield any attribute and so it shouldn't matter where this parser is positioned. However, this change won't compile:
/usr/include/boost/spirit/home/support/container.hpp:278:13: error: no matching function for call to ‘std::basic_string<char>::insert(std::basic_string<char>::iterator, const bf::instruction::function_call_t&)’
c.insert(c.end(), val);
^
(This error comes from the instruction = ... line.)
Why is this change not compiling? I'm rather looking for an explanation to understand what's going on than a workaround.
OK, so after looking at this closely: you are trying to insert multiple strings into your function_call_t type, which is a Fusion sequence that can be converted to from a single std::string. However, you are probably going to run into issues with your function_call rule because its attribute is actually tuple<std::string, optional<vector<std::string>>>. I'd imagine that Spirit is having trouble flattening that structure and that this is causing your issue; however, I don't have a compiler to test it at the moment.
I am using Spirit Qi as my parser, to parse mathematical expressions into an expression tree. I keep track of such things as the types of the symbols which are encountered as I parse, and which must be declared in the text I am parsing. Namely, I am parsing Bertini input files, a simple-ish example of which is here, a complicated example is here, and for completeness purposes, as below:
%input: our first input file
variable_group x,y;
function f,g;
f = x^2 - 1;
g = y^2 - 4;
END;
The grammar I have been working on will ideally
find declaration statements, and then parse the following comma-separated list of symbols of the type being declared, and store the resulting vector of symbols in the class object being parsed into; e.g. variable_group x, y;
find a previously declared symbol, which is followed by an equals sign, and is the definition of that symbol as an evaluatable mathematical object; e.g. f = x^2 - 1; This part I mostly have under control.
find a not-previously declared symbol followed by =, and parse it as a subfunction. I think I can handle this, too.
The problem I have been struggling to solve seems like it is so trivial, yet after hours of searching, I still haven't gotten there. I've read dozens of Boost Spirit mailing list posts, SO posts, the manual, and the headers for Spirit themselves, yet still don't quite grok a few critical things about Spirit Qi parsing.
Here is the problematic basic grammar definition, which would go in system_parsing.hpp:
#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template<typename Iterator>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type>
{
SystemParser() : SystemParser::base_type(variable_group_)
{
namespace phx = boost::phoenix;
using qi::_1;
using qi::_val;
using qi::eps;
using qi::lit;
qi::symbols<char,int> encountered_variables;
qi::symbols<char,int> declarative_symbols;
declarative_symbols.add("variable_group",0);
// wraps the vector between its appropriate declaration and line termination.
BOOST_SPIRIT_DEBUG_NODE(variable_group_);
debug(variable_group_);
variable_group_.name("variable_group_");
variable_group_ %= lit("variable_group") >> genericvargp_ >> lit(';');
// creates a vector of strings
BOOST_SPIRIT_DEBUG_NODE(genericvargp_);
debug(genericvargp_);
genericvargp_.name("genericvargp_");
genericvargp_ %= new_variable_ % ',';
// will in the future make a shared pointer to an object using the string
BOOST_SPIRIT_DEBUG_NODE(new_variable_);
debug(new_variable_);
new_variable_.name("new_variable_");
new_variable_ %= unencountered_symbol_;
// this rule gets a string.
BOOST_SPIRIT_DEBUG_NODE(unencountered_symbol_);
debug(unencountered_symbol_);
unencountered_symbol_.name("unencountered_symbol");
unencountered_symbol_ %= valid_variable_name_ - ( encountered_variables | declarative_symbols);
// get a string which fits the naming rules.
BOOST_SPIRIT_DEBUG_NODE(valid_variable_name_);
valid_variable_name_.name("valid_variable_name_");
valid_variable_name_ %= +qi::alpha >> *(qi::alnum | qi::char_('_') | qi::char_('[') | qi::char_(']') );
}
// rule declarations. these are member variables for the parser.
qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > variable_group_;
qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > genericvargp_;
qi::rule<Iterator, std::string(), ascii::space_type> new_variable_;
qi::rule<Iterator, std::string(), ascii::space_type > unencountered_symbol_;// , ascii::space_type
// the rule which determines valid variable names
qi::rule<Iterator, std::string()> valid_variable_name_;
};
and some code which uses it:
#include "system_parsing.hpp"
int main(int argc, char** argv)
{
std::vector<std::string> V;
std::string str = "variable_group x, y, z;";
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
SystemParser<std::string::const_iterator> S;
bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);
std::cout << "the unparsed string:\n" << std::string(iter,end);
return 0;
}
It compiles under Clang 4.9.x on OSX just fine. When I run it, I get:
Assertion failed: (px != 0), function operator->, file /usr/local/include/boost/smart_ptr/shared_ptr.hpp, line 648.
Alternately, if I use expectation operator > rather than >> in the definition of the variable_group_ rule, I get our dear old friend Segmentation fault: 11.
In my learning process, I've come across such excellent posts as how to tell the type spirit is trying to generate, attribute propagation, how to interact with symbols, an example of infinite left recursion which led to a segfault, information on parsing into classes, not structs which has a link to using Customization points (yet the links contain no examples), the Nabialek trick which couples keywords to actions, and perhaps most relevant for what I am trying to do, dynamic difference parsing, which is certainly something I need: the set of already-encountered symbols starts empty and grows, and I disallow using them as another type later. That is, the rules for parsing are dynamic.
So here's where I am at. My current problem is the assert/segfault generated by this particular example. However, I am unclear on some things, and need guiding advice, which I just haven't put together from any of the sources I have consulted, and the request for which hopefully makes this SO question disjoint from others previously asked:
When is it appropriate to use lexeme? I just don't know when to use lexeme, and not.
What are some guidelines for when to use > rather than >>?
I've seen many Fusion adapt examples where there is a struct to be parsed into, and a set of rules to do so. My input files will possibly have multiple occurrences of declarations of function, variables, etc, which all need to go the same place, so I need to be able to add to fields of the terminal class object into which I am parsing, in any order, multiple times. I think I would like to use getter/setters for the class object, so that parsing is not the only pathway to object construction. Is this a problem?
Any kind advice for this beginner is most welcome.
You reference the symbols variables. But they are locals so they don't exist once the constructor returns. This invokes Undefined Behaviour. Anything can happen.
Make the symbol tables members of the class.
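The lifetime rule behind this advice can be shown with a plain-C++ analogy (this is not Spirit code). A stored callable plays the role of a rule that refers to the symbol tables, and the tables are members so they live as long as the object that uses them. NameChecker and declare are hypothetical names invented for this sketch.

```cpp
#include <functional>
#include <set>
#include <string>

class NameChecker {
public:
    NameChecker() {
        reserved_.insert("variable_group");
        // The callable refers to the member tables through `this`.
        // Had the tables been locals of this constructor, the callable
        // would refer to destroyed objects after the constructor returns.
        is_new_ = [this](const std::string& name) {
            return reserved_.count(name) == 0 && seen_.count(name) == 0;
        };
    }

    // The callable captures `this`, so copying would leave the copy
    // pointing at the original object; forbid it in this sketch.
    NameChecker(const NameChecker&) = delete;
    NameChecker& operator=(const NameChecker&) = delete;

    // Accepts a name only if it is neither reserved nor already seen,
    // then records it, so the set of rejected names grows over time.
    bool declare(const std::string& name) {
        if (!is_new_(name)) {
            return false;
        }
        seen_.insert(name);
        return true;
    }

private:
    std::set<std::string> reserved_;  // keywords such as variable_group
    std::set<std::string> seen_;      // symbols declared so far
    std::function<bool(const std::string&)> is_new_;
};
```

With this shape, declare("x") succeeds the first time and fails the second, and declare("variable_group") always fails, which mirrors the intent of valid_variable_name_ - (encountered_variables | declarative_symbols).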
I also simplified the dance around:
the skippers (see Boost spirit skipper issues). That link also answers your "When is it appropriate to use lexeme[]?" question. In your sample you lacked the lexeme[] around encountered_variables|declarative_symbols, for example.
the debug macros
the operator%=, and some generally unused stuff
the mapped type of the symbols<>: guessing you didn't need it (because the int wasn't consumed), I simplified the initialization there
Demo
Live On Coliru
#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#define BOOST_SPIRIT_DEBUG 1
#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Iterator, typename Skipper = ascii::space_type>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), Skipper> {
SystemParser() : SystemParser::base_type(variable_group_)
{
declarative_symbols += "variable_group";
variable_group_ = "variable_group" >> genericvargp_ >> ';';
genericvargp_ = new_variable_ % ',';
valid_variable_name_ = qi::alpha >> *(qi::alnum | qi::char_("_[]"));
unencountered_symbol_ = valid_variable_name_ - (encountered_variables|declarative_symbols);
new_variable_ = unencountered_symbol_;
BOOST_SPIRIT_DEBUG_NODES((variable_group_) (valid_variable_name_) (unencountered_symbol_) (new_variable_) (genericvargp_))
}
private:
qi::symbols<char, qi::unused_type> encountered_variables, declarative_symbols;
// rule declarations. these are member variables for the parser.
qi::rule<Iterator, std::vector<std::string>(), Skipper> variable_group_;
qi::rule<Iterator, std::vector<std::string>(), Skipper> genericvargp_;
qi::rule<Iterator, std::string()> new_variable_;
qi::rule<Iterator, std::string()> unencountered_symbol_; // , Skipper
// the rule which determines valid variable names
qi::rule<Iterator, std::string()> valid_variable_name_;
};
//#include "system_parsing.hpp"
int main() {
using It = std::string::const_iterator;
std::string const str = "variable_group x, y, z;";
SystemParser<It> S;
It iter = str.begin(), end = str.end();
std::vector<std::string> V;
bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);
if (s)
{
std::cout << "Parse succeeded: " << V.size() << "\n";
for (auto& s : V)
std::cout << " - '" << s << "'\n";
}
else
std::cout << "Parse failed\n";
if (iter!=end)
std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
}
Prints
Parse succeeded: 3
- 'x'
- 'y'
- 'z'
I have a simple grammar consisting of mixed variables ($(name)) and variable-value pairs ($(name:value)). I have a hand-coded recursive parser, but am interested in using it as an exercise to learn Spirit, which I'll need for more complex grammars eventually(/soon).
Anyway, the set of possible forms I'm working with (simplified from the full grammar) is:
$(variable) // Uses simple look-up, recursion and inline replace
$(name:value) // Inserts a new variable into the local lookup table
My current rules look something like:
typedef std::map<std::string, std::string> dictionary;
template <typename Iterator>
bool parse_vars(Iterator first, Iterator last, dictionary & vars, std::string & output)
{
using qi::phrase_parse;
using qi::_1;
using ascii::char_;
using ascii::string;
using ascii::space;
using phoenix::insert;
dictionary statevars;
typedef qi::rule<Iterator, std::string()> string_rule;
typedef qi::rule<Iterator, std::pair<std::string, std::string>()> pair_rule;
string_rule state = string >> ':' >> string; // Error 3
pair_rule variable =
(
char_('$') >> '(' >>
(
state[insert(phoenix::ref(statevars), _1)] |
string[output += vars[_1]] // Error 1, will eventually need to recurse
) >> ')'
); // Error 2
bool result = phrase_parse
(
first, last,
(
variable % ','
),
space
);
return result;
}
If it wasn't obvious, I have no idea how Spirit works and the docs have everything but actual explanations, so this is about an hour of throwing examples together.
The part I particularly question is the leading char_('$') in the variable rule, but removing it causes a shift operator error (the compiler interprets '$' >> '(' as a right-shift).
When compiling, I get errors related to the state rule, particularly creating the pair, and the lookup:
1. error C2679: binary '[' : no operator found which takes a right-hand operand of type 'const boost::spirit::_1_type' (or there is no acceptable conversion)
2. error C2512: 'boost::spirit::qi::rule::rule' : no appropriate default constructor available
Changing the lookup (vars[_1]) to a simple += gives:
3. error C2665: 'boost::spirit::char_class::classify::is' : none of the 15 overloads could convert all the argument types
Error 1 seems to relate to the type (attribute?) of the _1 placeholder, but that should be a string, and is when used for printing or concatenation to the output string. 2 appears to be noise caused by 1.
Error 3, digging down the stack of template errors, seems to relate to not being able to turn the state rule into a pair, which seems odd as it almost exactly matches one of the rules from this example.
How can I modify the variable rule to properly handle both input forms?
A few things to note:
To adapt std::pair (so you can use it with maps) you should include (at least)
#include <boost/fusion/adapted/std_pair.hpp>
It looks like you are trying to create a symbol table. You could use qi::symbols for that
Avoid mixing output generation with parsing; it complicates matters unduly.
I haven't 'fixed' all of the above (due to lack of context), but I'd be happy to help out with any other questions arising from those.
Here is a fixed code version staying pretty close to the OP. Edit: I have tested it now too; output below:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <map>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
typedef std::map<std::string, std::string> dictionary;
template <typename Iterator, typename Skipper = qi::space_type>
struct parser : qi::grammar<Iterator, Skipper>
{
parser(dictionary& statevars, std::string& output) : parser::base_type(start)
{
using namespace qi;
using phx::insert;
with_initializer = +~char_(":)") >> ':' >> *~char_(")");
simple = +~char_(")");
variable =
"$(" >> (
with_initializer [ insert(phx::ref(statevars), qi::_1) ]
| simple [ phx::ref(output) += phx::ref(statevars)[_1] ]
) >> ')';
start = variable % ',';
BOOST_SPIRIT_DEBUG_NODE(start);
BOOST_SPIRIT_DEBUG_NODE(variable);
BOOST_SPIRIT_DEBUG_NODE(simple);
BOOST_SPIRIT_DEBUG_NODE(with_initializer);
}
private:
qi::rule<Iterator, std::pair<std::string, std::string>(), Skipper> with_initializer;
qi::rule<Iterator, std::string(), Skipper> simple;
qi::rule<Iterator, Skipper> variable;
qi::rule<Iterator, Skipper> start;
};
template <typename Iterator>
bool parse_vars(Iterator &first, Iterator last, dictionary & vars, std::string & output)
{
parser<Iterator> p(vars, output);
return qi::phrase_parse(first, last, p, qi::space);
}
int main()
{
const std::string input = "$(name:default),$(var),$(name)";
std::string::const_iterator f(input.begin());
std::string::const_iterator l(input.end());
std::string output;
dictionary table;
if (!parse_vars(f,l,table,output))
std::cerr << "oops\n";
if (f!=l)
std::cerr << "Unparsed: '" << std::string(f,l) << "'\n";
std::cout << "Output: '" << output << "'\n";
}
Output:
Output: 'default'
You have to have char_('$'); otherwise the >> has a plain 'char' on both sides. You need at least one Spirit type in there to get the overloaded operator >>.
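The reason can be checked without Spirit at all. The sketch below uses a made-up Lit type as a stand-in for a parser object (it is not a Spirit class). With two plain char operands the compiler can only pick the built-in integer shift, while one class-type operand is enough for an overloaded operator>> to be considered.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// With two plain chars, >> is the built-in shift: both operands are
// promoted to int and the result is an int, not a parser.
static_assert(
    std::is_same<decltype(std::declval<char>() >> std::declval<char>()),
                 int>::value,
    "char >> char is an integer shift");

// Hypothetical stand-in for a parser type; it only records what was
// sequenced together.
struct Lit {
    std::string text;
};

// An overload is only found when at least one operand is a class type.
Lit operator>>(const Lit& lhs, char rhs) {
    return Lit{lhs.text + rhs};
}

std::string sequence_text() {
    // The leftmost operand is a Lit, and each >> returns a Lit, so the
    // whole chain uses the overload.
    return (Lit{"$"} >> '(' >> ')').text;
}
```

The same logic is why a string literal on the left of a rule, as in "$(" >> (...), works: the right-hand operand is already a Spirit expression.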
You may also need to use _1 from phoenix.
Also take a look at:
http://boost-spirit.com/home/articles/qi-example/parsing-a-list-of-key-value-pairs-using-spirit-qi/
I have the following code:
std::string test("1.1");
std::pair<int, int> d;
bool r = qi::phrase_parse(
test.begin(),
test.end(),
qi::int_ >> '.' >> qi::int_,
space,
d
);
So I'm trying to parse the string test and place the result in the std::pair d. However it is not working, I suspect it has to do with the Compound Attribute Rules.
Any hints to how to get this working?
The compiler error is the following:
error: no matching function for call to 'std::pair::pair(const int&)'
It should work. What people forget very often is to add a
#include <boost/fusion/include/std_pair.hpp>
to their list of includes. This is necessary to make std::pair a full blown Fusion citizen.
I am trying to parse C-function-like tree expressions such as the following (using the Spirit Parser Framework):
F( A() , B( GREAT( SOME , NOT ) ) , C( YES ) )
For this I am trying to use the three rules on the following grammar:
template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), space_type> {
InputGrammar() : InputGrammar::base_type( ) {
tag = ( qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9") )[ push_back( at_c<0>(qi::_val) , qi::_1 ) ];
command = tag [ at_c<0>(qi::_val) = at_c<0>(qi::_1) ] >> "(" >> (*instruction >> ",")
[ push_back( at_c<1>(qi::_val) , qi::_1 ) ] >> ")";
instruction = ( command | tag ) [qi::_val = qi::_1];
}
qi::rule< Iterator , ExpressionAST() , space_type > tag;
qi::rule< Iterator , ExpressionAST() , space_type > command;
qi::rule< Iterator , ExpressionAST() , space_type > instruction;
};
Notice that my tag rule just tries to capture the identifiers used in the expressions (the 'function' names). Also notice that the tag rule's signature returns an ExpressionAST instead of a std::string, as in most examples. The reason I want to do it like this is actually pretty simple: I hate using variants and I will avoid them if possible. It would be great to have my cake and eat it too, I guess.
A command should start with a tag (the name of the current node, first string field of the AST node) and a variable number of arguments enclosed by parentheses, and each of the arguments can be a tag itself or another command.
However, this example does not work at all. It compiles and everything, but at run time it fails to parse all my test strings. And the thing that really annoys me is that I can't figure how to fix it, since I can't really debug the above code, at least in the traditional meaning of the word. Basically the only way I see I can fix the above code is by knowing what I am doing wrong.
So, the question is that I don't know what is wrong with the above code. How would you define the above grammar?
The ExpressionAST type I am using is:
struct MockExpressionNode {
std::string name;
std::vector< MockExpressionNode > operands;
typedef std::vector< MockExpressionNode >::iterator iterator;
typedef std::vector< MockExpressionNode >::const_iterator const_iterator;
iterator begin() { return operands.begin(); }
const_iterator begin() const { return operands.begin(); }
iterator end() { return operands.end(); }
const_iterator end() const { return operands.end(); }
bool is_leaf() const {
return ( operands.begin() == operands.end() );
}
};
BOOST_FUSION_ADAPT_STRUCT(
MockExpressionNode,
(std::string, name)
(std::vector<MockExpressionNode>, operands)
)
As for debugging, it's possible to use a normal break-and-watch approach. This is made difficult by how you've formatted the rules, though. If you format per the Spirit examples (roughly one parser per line, one Phoenix statement per line), breakpoints will be much more informative.
Your data structure doesn't have a way to distinguish A() from SOME in that they are both leaves (let me know if I'm missing something). From your variant comment, I don't think this was your intention, so to distinguish these two cases, I added a bool commandFlag member variable to MockExpressionNode (true for A() and false for SOME), with a corresponding fusion adapter line.
For the code specifically, you need to pass the start rule to the base constructor, i.e.:
InputGrammar() : InputGrammar::base_type(instruction) {...}
This is the entry point of the grammar, and is why you were not getting any data parsed. I'm surprised it compiled without it; I thought that the grammar type was required to match the type of the first rule. Even so, this is a convenient convention to follow.
For the tag rule, there are actually two parsers: qi::char_("a-zA-Z_"), which is _1 with type char, and *qi::char_("a-zA-Z_0-9"), which is _2 with type (basically) vector<char>. It's not possible to coerce these into a string without autorules, but it can be done by attaching an action to each parsed char:
tag = qi::char_("a-zA-Z_")
[ at_c<0>(qi::_val) = qi::_1 ]
>> *qi::char_("a-zA-Z_0-9") //[] has precedence over *, so _1 is
[ at_c<0>(qi::_val) += qi::_1 ]; // a char rather than a vector<char>
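The precedence remark in the snippet above is ordinary C++ operator precedence: postfix [] binds tighter than prefix *. A small sketch with a made-up Expr type (not a Spirit class) that just records how the operators were applied makes the grouping visible.

```cpp
#include <string>

// Hypothetical expression type that records which operator was
// applied to what.
struct Expr {
    std::string text;

    // Prefix *: stands in for "zero or more".
    Expr operator*() const { return Expr{"*(" + text + ")"}; }

    // Postfix []: stands in for attaching an action.
    Expr operator[](const std::string& action) const {
        return Expr{text + "[" + action + "]"};
    }
};

// Without parentheses the action attaches to the element, and the
// repetition then wraps the element together with its action.
std::string action_on_element() {
    Expr p{"p"};
    return (*p["act"]).text;  // "*(p[act])"
}

// Parentheses are needed to attach the action to the whole repetition.
std::string action_on_repetition() {
    Expr p{"p"};
    return (*p)["act"].text;  // "*(p)[act]"
}
```

In the tag rule this is what makes the second action run once per matched character rather than once for the whole vector<char>.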
However, it's much cleaner to let Spirit do this conversion. So define a new rule:
qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
identifier %= qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9");
And don't worry about it ;). Then tag becomes
tag = identifier
[
at_c<0>(qi::_val) = qi::_1,
ph::at_c<2>(qi::_val) = false //commandFlag
];
For command, the first part is fine, but there are a couple of problems with (*instruction >> ",")[ push_back( at_c<1>(qi::_val) , qi::_1 ) ]. This will parse zero or more instruction rules followed by a single ",". It also attempts to push_back a vector<MockExpressionNode> (not sure why this compiled either; maybe it was not instantiated because of the missing start rule?). I think you want the following (with the identifier modification):
command =
identifier
[
ph::at_c<0>(qi::_val) = qi::_1,
ph::at_c<2>(qi::_val) = true //commandFlag
]
>> "("
>> -(instruction % ",")
[
ph::at_c<1>(qi::_val) = qi::_1
]
>> ")";
This uses the optional operator - and the list operator %, the latter is equivalent to instruction >> *("," >> instruction). The phoenix expression then just assigns the vector directly to the structure member, but you could also attach the action directly to the instruction match and use push_back.
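To make the equivalence a % ',' == a >> *(',' >> a) concrete, here is a hand-written sketch of what that expression matches when a is an identifier. It is not Spirit code and it ignores whitespace skipping; parse_identifier and parse_list are names invented for this illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Matches [A-Za-z_][A-Za-z_0-9]* at pos. On success stores the match in
// out and advances pos; on failure leaves pos untouched.
bool parse_identifier(const std::string& in, std::size_t& pos,
                      std::string& out) {
    std::size_t p = pos;
    auto is_start = [](unsigned char c) {
        return std::isalpha(c) != 0 || c == '_';
    };
    auto is_rest = [](unsigned char c) {
        return std::isalnum(c) != 0 || c == '_';
    };
    if (p >= in.size() || !is_start(static_cast<unsigned char>(in[p]))) {
        return false;
    }
    ++p;
    while (p < in.size() && is_rest(static_cast<unsigned char>(in[p]))) {
        ++p;
    }
    out = in.substr(pos, p - pos);
    pos = p;
    return true;
}

// Hand-rolled equivalent of: identifier >> *(',' >> identifier)
bool parse_list(const std::string& in, std::size_t& pos,
                std::vector<std::string>& out) {
    std::string item;
    if (!parse_identifier(in, pos, item)) {
        return false;  // a list needs at least one element
    }
    out.push_back(item);
    for (;;) {
        const std::size_t save = pos;
        if (pos < in.size() && in[pos] == ',') {
            ++pos;
            if (parse_identifier(in, pos, item)) {
                out.push_back(item);
                continue;
            }
        }
        // The (',' >> identifier) group failed as a whole: restore the
        // position so a trailing comma is left unconsumed.
        pos = save;
        return true;
    }
}
```

Wrapping the list in the optional operator, as in -(instruction % ","), then additionally allows the empty argument list of A().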
The instruction rule is fine, I'll just mention that it is equivalent to instruction %= (command|tag).
One last thing, if there actually is no distinction between A() and SOME (i.e. your original structure with no commandFlag), you can write this parser using only autorules:
template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), ascii::space_type> {
InputGrammar() : InputGrammar::base_type( command ) {
identifier %=
qi::char_("a-zA-Z_")
>> *qi::char_("a-zA-Z_0-9");
command %=
identifier
>> -(
"("
>> -(command % ",")
>> ")");
}
qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
qi::rule< Iterator , ExpressionAST(void) , ascii::space_type > command;
};
This is the big benefit of using a fusion wrapped structure that models the input closely.