Using Boost Spirit's stream parser with custom grammar - c++

Synopsis
I would like to harness Boost Spirit's stream parser API to parse an std::istream incrementally. However, I could not find a good example of how to use it with an iterator-based grammar. Conceptually, my goal is to parse an infinite stream of objects of type T.
Details
A grammar in Qi with an attribute of type T and skipper S typically has the form:
template <typename Iterator>
struct grammar : qi::grammar<Iterator, T(), S>;
How do I use such a grammar with the stream-based API? Specifically, my mental model for the stream API is that I can do something along the lines of:
// Callback invoked for each successfully parsed instance of T.
void f(T const& x)
{
}
// What iterator type?
grammar<???> parser;
skipper<???> skipper;
T x;
std::ifstream ifs("/path/to/file");
ifs.unsetf(std::ios::skipws);
while (! ifs.eof())
{
ifs >> phrase_match(parser, skipper, x);
if (ifs.good() || ifs.eof())
f(x);
}
I am struggling with bringing together traditional grammars requiring iterators. How does that fit together with the stream API?

You're missing the Spirit multi-pass iterator. Note, however, that parsing of the stream will not be done incrementally unless you go out of your way to make sure your grammar has minimal backtracking.

Related

Tagging std::function with a name?

I'm working on a parser combinator library, and I'd really like my parser to simply be some callable object:
typedef std::function<parse_result(parse_stream)> parser;
Which makes the parser combinators nice, eg:
parser operator &(parser a, parser b) { return both(a,b); }
but I'd like two features:
1) I'd like string literals to get promoted to a parser automatically so you can do things like:
parser option = "<" & regexp("[^+>]+");
2) I'd like a parser to have a name that I can use for error formatting. In the case of the "both" parser above, I could print that I expected a.name() and b.name() for example.
The two options I've tried so far are
A parser class that's callable. This lets me build from strings and std::function instances, but a general callable has to be converted to a std::function first and from there to a parser, and C++ won't do two implicit conversions.
Inheriting from std::function so I can implicitly convert the functions, but this seems to have a lot of gotchas in terms of only converting callables into a parser.
Does anyone have any thoughts on how to structure this?
You don't want a raw typedef of std::function; your parser is more than just any std::function.
struct parser: std::function<parse_result(parse_stream)>{
using base = std::function<parse_result(parse_stream)>;
using base::base;
};
this should permit
parser p = []( parse_stream str ) { return parse_result(7); };
as we use inheriting constructors to expose the raw std::function ctors in parser.
While you can overload:
parser operator&(parser a, parser b) { return both(a,b); }
with the typedef version by putting & in the namespace of parse_result or parse_stream, I'd advise against it; there has been discussion in standardization about restricting that kind of template-argument ADL. With a bare parser type, the spot to put such operator overloads is clear.
In addition, some operators cannot be overloaded outside of the class, such as =, (), [] and ->. With a struct you can define them in there.
None of this fixes
parser option = "<" & regexp("[^+>]+");
as the problem here is that the right hand side has no idea what the left hand side is doing (unless regexp is a function returning a parser).
First do this:
struct parser: std::function<parse_result(parse_stream)>{
using base = std::function<parse_result(parse_stream)>;
parser( char const* str ):
base( [str=std::string(str)](parse_stream stream)->parse_result { /* do work */ } )
{}
parser( char c ):
base( [c](parse_stream str)->parse_result { /* do work */ } )
{}
using base::base;
};
then you can add
namespace parsing {
// parser definition goes here
inline namespace literals {
inline parser operator""_p( char const* str, std::size_t ) { return str; } // string literal operators also receive the length
}
}
and using namespace parsing::literals means "hello"_p is a parser that attempts to parse the string "hello".
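Putting the pieces together, a minimal self-contained sketch might look like the following. The definitions of parse_stream and parse_result are assumptions made up for illustration (the question does not show them), and the literal-matching body replaces the "do work" placeholder with one possible implementation.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

namespace parsing {

// Hypothetical stand-ins for the question's parse_stream / parse_result.
struct parse_stream { std::string text; std::size_t pos; };
struct parse_result { bool ok; std::size_t pos; };

struct parser : std::function<parse_result(parse_stream)> {
    using base = std::function<parse_result(parse_stream)>;
    using base::base; // any suitable callable converts to parser

    // A string literal becomes a parser that matches exactly that text.
    parser(char const* lit)
        : base([s = std::string(lit)](parse_stream in) -> parse_result {
              bool ok = in.text.compare(in.pos, s.size(), s) == 0;
              return {ok, ok ? in.pos + s.size() : in.pos};
          }) {}
};

// Sequence: run a, then run b from where a stopped.
inline parser operator&(parser a, parser b) {
    return [a = std::move(a), b = std::move(b)](parse_stream in) -> parse_result {
        parse_result r = a(in);
        if (!r.ok) return r;
        in.pos = r.pos;
        return b(in);
    };
}

inline namespace literals {
// String literal operators take the pointer and the length.
inline parser operator""_p(char const* str, std::size_t) { return parser(str); }
}

} // namespace parsing
```

With using namespace parsing; in scope, "ab"_p & "cd"_p builds a parser that accepts "abcd". The parsers are moved into the combining lambda rather than copied from non-const lvalues, which sidesteps one of the overload-resolution surprises that inheriting std::function's constructors can cause.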

Parsing into a set using boost spirit x3

I am interested in whether there is a way to parse into a set using Boost Spirit X3. The background is that I have a string of tokens, where each token represents an enum value. Now I want to create a parser which checks that every token occurs at most once in the string. It would be great if I could get all the parsed tokens into a std::set while parsing.
To get the enums back from the parsed string I am using a symbol_table:
enum class foo{bar, baz, bla, huh};
struct enum_table : x3::symbols<foo> {
enum_table() {
add("bar", foo::bar)
("baz", foo::baz)
("huh", foo::huh);
}
} const enum_parser;
I am interested in if there is a way to parse into a set using boost spirit x3.
Spirit can parse into std::set<> out of the box (at least as of Boost 1.61.0), so the following already works with the types you've shown:
std::set<foo> foos;
x3::phrase_parse(
input.begin(), input.end(),
+enum_parser,
x3::space,
foos
);
To get your parser to fail upon encountering duplicates, this is most easily accomplished with semantic actions:
std::set<foo> foos;
auto insert_foo_or_fail = [&foos](auto& ctx) {
_pass(ctx) = foos.insert(_attr(ctx)).second;
};
x3::phrase_parse(
input.begin(), input.end(),
+x3::omit[enum_parser[insert_foo_or_fail]],
x3::space
);
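The semantic action above relies on std::set::insert returning a pair whose second member is false when the element was already present. Stripped of Spirit, the same check can be sketched as follows; collect_unique is a hypothetical helper name, and the token vector stands in for the attributes the parser would produce one by one.

```cpp
#include <set>
#include <vector>

enum class foo { bar, baz, bla, huh };

// Inserts tokens into 'out' and stops at the first duplicate,
// mirroring what assigning insert(...).second to _pass(ctx) does.
bool collect_unique(std::vector<foo> const& tokens, std::set<foo>& out)
{
    for (foo f : tokens)
        if (!out.insert(f).second) // second == false: already in the set
            return false;
    return true;
}
```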

Boost spirit changing variable value in semantic action

I want to change a local variable value in a semantic action, like the following:
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
namespace qi = boost::spirit::qi;
namespace spirit = boost::spirit;
namespace ascii = boost::spirit::ascii;
using boost::phoenix::ref;
using boost::phoenix::bind;
void dummy(const std::vector<char>& v, int& var)
{
var = 7;
}
template <typename Iterator>
struct x_grammar : public qi::grammar<Iterator, std::string(), ascii::space_type>
{
public:
x_grammar() : x_grammar::base_type(start_rule, "x_grammar")
{
using namespace qi;
int local_var = 0;
start_rule = (+(char_ - ";"))[bind(dummy, _1, ref(local_var))];
//repeat(ref(local_var))[some_rule];
}
private:
qi::rule<Iterator, std::string(), ascii::space_type> start_rule;
};
int main()
{
typedef std::string::const_iterator iter;
std::string storage("string;aaa");
iter it_begin(storage.begin());
iter it_end(storage.end());
std::string read_data;
using boost::spirit::ascii::space;
x_grammar<iter> g;
try {
bool r = qi::phrase_parse(it_begin, it_end, g, space, read_data);
std::cout << "Pass!\n";
} catch (const qi::expectation_failure<iter>& x) {
std::cout << "Error!\n";
}
}
I am getting some annoying compile errors using GCC 4.6.1 with boost 1.55.
I can't help but note that if compiler errors annoy you, then perhaps you should write valid code :/
Instructive Hat On...
While that's of course a flippant remark, it's also somewhat enlightening.
I've told you twice now that the whole idea of using constructor local variables in your grammar is fundamentally broken:
Boost spirit semantic action not invoked
Boost spirit using local variables
What you want is
inherited attributes
qi::locals
maayyyyybe, maaaayyybe grammar member variables; with the caveat that they make your rules non-re-entrant.
The important thing here to really get inside your head is
Boost Spirit generates parser from expression templates. Expression templates are 90% static information (type only), and get "compiled" (.compile()) into "invokable" (.parse()) form.
Most importantly, while you can write control flow in your semantic actions, none of this is actually executed at the definition site. It's "compiled" into a lazy actor that can later be invoked.
The generated parser will conditionally invoke the lazy actor when the corresponding parse expression matches.
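The deferred-execution point can be illustrated without Spirit at all. In this sketch std::bind plays the role of the lazy actor and std::function the role of the compiled parser; deferred_demo is a hypothetical type, and the variable is a member rather than a constructor local so that the stored reference stays valid after the constructor returns.

```cpp
#include <functional>

struct deferred_demo
{
    int var = 0;                  // member: outlives the constructor
    std::function<void()> action; // stands in for the compiled semantic action

    deferred_demo()
    {
        // Building the callable runs nothing; it only records what to do later.
        action = std::bind([](int& v) { v = 7; }, std::ref(var));
    }

    // 'action' refers to this->var, so copying would leave it pointing elsewhere.
    deferred_demo(deferred_demo const&) = delete;
    deferred_demo& operator=(deferred_demo const&) = delete;
};
```

After construction var is still 0; it only becomes 7 once action() is invoked. Had var been a local of the constructor, that later call would write through a dangling reference.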
Constructive Hat On...
It looks like you just want to transform attributes using a function.
Here's what you can do:
transform as part of the semantic action, placing the result into the regular attribute (maintaining 'functional' semantics for parser composition):
qi::rule<Iterator, exposed(), Skipper> myrule;
myrule = int_ [ _val = phx::bind(custom_xform, _1) ];
Where custom_xform is any old-school callable (including polymorphic ones):
exposed custom_xform(int i) { return make_new_exposed(i); }
// or
struct custom_xform_t {
template <typename> struct result { typedef exposed type; };
template <typename Int>
exposed operator()(Int i) const {
return make_new_exposed(i);
}
};
static const custom_xform_t custom_xform;
You can add some syntactic sugar [1]
qi::rule<Iterator, exposed(), Skipper> myrule;
myrule = int_ [ _val = custom_xform(_1) ];
This requires that custom_xform be defined as a lazy actor:
phx::function<custom_xform_t> custom_xform; // `custom_xform_t` again the (polymorphic) functor
You may note this wouldn't work for a regular function. You could wrap it in a callable object, or use the BOOST_PHOENIX_ADAPT_FUNCTION macro to do just that for you.
If you have some more involved transformations that you want to apply more often, consider using the Spirit Customization Points:
Customization of Spirit's Attribute Handling, specifically:
Transform an Attribute to a Different Type
Store a Parsed Attribute Value
These work most smoothly if you choose specific types for your attributes (e.g. Ast::Multiplicity or Ast::VelocityRanking, instead of int or double).
[1] using BOOST_SPIRIT_USE_PHOENIX_V3
The code compiles with C++03. However, when using GCC 4.6's C++11 support, the code fails to compile. Here are the relevant excerpts from the error:
/usr/local/include/boost/spirit/home/support/action_dispatch.hpp: In static
member function 'static void boost::spirit::traits::action_dispatch<
Component>::caller(F&&, A&& ...) [with F =
const std::_Bind<with Boost.Phoenix actors>]'
...
main.cpp:25:9: instantiated from 'x_grammar<Iterator>::x_grammar() [...]
/usr/local/include/boost/spirit/home/support/action_dispatch.hpp:142:13: error:
no matching function for call to 'boost::spirit::traits::
action_dispatch<...>::do_call(const std::_Bind<with Boost.Phoenix actors>)'
Despite the using boost::phoenix::bind directive, the unqualified call to bind() is resolving to std::bind() rather than boost::phoenix::bind(), but the arguments are resolving to Boost.Phoenix actors. The Boost.Spirit documentation specifically warns about not mixing placeholders from different libraries:
You have to make sure not to mix placeholders with a library they don't belong to and not to use different libraries while writing a semantic action.
Hence, the compilation problem can be resolved by being explicit when defining the semantic action. Use either:
std::bind(dummy, std::placeholders::_1, std::ref(local_var))
or:
boost::phoenix::bind(dummy, _1, ref(local_var))
While that resolves the compiler error, it is worth noting that the ref(local_var) object will hold a dangling reference, as its lifetime extends beyond that of local_var. One way to get a working example is to extend local_var's lifetime beyond the scope of the constructor by making it static.

How to tell a boost::karma::rule not to consume its attribute without providing a valid generator?

Say we have the following source code:
#include <iostream>
#include <string>
#include <iterator>
#include <boost/spirit/include/karma.hpp>
namespace karma = boost::spirit::karma;
template <typename OutputIterator> struct grammar : karma::grammar<OutputIterator, std::nullptr_t()> {
grammar() : grammar::base_type(query) {
query = "yeah";
}
karma::rule<OutputIterator, std::nullptr_t()> query;
};
int main(void) {
typedef std::back_insert_iterator<std::string> iterator_type;
std::string generated;
iterator_type output_it(generated);
//keys_and_values<sink_type> g;
grammar<iterator_type> g;
bool result = karma::generate(output_it, g, nullptr);
std::cout << result << ":" << generated << std::endl;
return 0;
}
This fails to compile because karma lacks some traits for std::nullptr_t (those are boost::spirit::traits::extract_c_string and boost::spirit::traits::char traits). More specifically, it fails because karma is not able to find a generator for an attribute of type std::nullptr_t.
I see several ways to cope with that:
Replace std::nullptr_t by karma::unused_type in the grammar definition : It works on this example but may introduce ambiguity in a more complex grammar.
Defining the traits specialization : In my opinion, this is dirty and not generic. Plus, it exposes my specialization of a standard type for everyone, leading to potential conflicts.
Specializing an attribute transform : Same problem of specializing a standard type just for me.
Write a custom generator : The best candidate so far, but it makes a serious load of highly templated code lines compared to the task complexity.
Put an intermediate rule with a karma::unused_type attribute. A quick fix that works but makes no sense.
Question : How can I tell the karma::rule to generate a simple literal and not care about having or not a generator for its attribute ?
You seem to have stumbled on the inverse of the infamous single-element fusion sequence conundrum[1] :(
I noticed, because the error emanates from the code trying to verify that the input string matches the attribute (lit.hpp):
// fail if attribute isn't matched by immediate literal
typedef typename attribute<Context>::type attribute_type;
typedef typename spirit::result_of::extract_from<attribute_type, Attribute>::type
extracted_string_type;
using spirit::traits::get_c_string;
if (!detail::string_compare(
get_c_string(
traits::extract_from<attribute_type>(attr, context))
, get_c_string(str_), char_encoding(), Tag()))
{
return false;
}
However, that makes no sense at all, since the docs state:
lit, like string, also emits a string of characters. The main difference is that lit does not consumes [sic] an attribute. A plain string like "hello" or a std::basic_string is equivalent to a lit
So I just... on a whim thought to coerce things a little, by using the same workaround that works for single-element fusion sequences on the Qi side:
query = karma::eps << "yeah";
And, voilà: it works: Live On Coliru
[1] See
How do I use a class with only one attribute in a AST with Boost Spirit?
Spirit Qi attribute propagation issue with single-member struct
Compiler error when adapting struct with BOOST_FUSION_ADAPT_STRUCT
Adapt class containing a string member as synthesized attribute
Etc. This is a sad flaw that will probably need to be worked around for SpiritV2.
Possible answer : After posting this, I found a solution which satisfies me for the moment. That is : introduce an intermediate rule.
template <typename OutputIterator> struct grammar : karma::grammar<OutputIterator, std::nullptr_t()> {
grammar() : grammar::base_type(query) {
query = null_rule;
null_rule = "null";
}
karma::rule<OutputIterator, std::nullptr_t()> query;
karma::rule<OutputIterator, karma::unused_type()> null_rule;
};
I'm still interested in any comment, reproach or other solution.

How to use Boost Spirit auto rules with AST?

EDIT: I expanded sehe's example to show the problem when I want to use it on another rule: http://liveworkspace.org/code/22lxL7$17
I'm trying to improve the performance of my Boost Spirit parser and I saw that since C++11, it is possible to use auto-rules like this:
auto comment = "/*" >> *(char_ - "*/") >> "*/";
(or with BOOST_AUTO or BOOST_SPIRIT_AUTO).
I have a rule declared like that:
qi::rule<lexer::Iterator, ast::SimpleType()> simple_type;
and defined like that:
simple_type %=
const_
>> lexer.identifier;
If I declare it with auto, it compiles, but it cannot be used as AST in other rules.
Is it possible to define rules creating AST with auto rules ?
I'm also interested in other ways to speedup AST creation in Boost Spirit.
First of all, I tried a simple example and "it works for me" with a simple adapted struct:
struct fixed
{
int integral;
unsigned fractional;
};
BOOST_FUSION_ADAPT_STRUCT(fixed, (int, integral)(unsigned, fractional));
template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, std::vector<fixed>(), Skipper>
{
parser() : parser::base_type(start)
{
using namespace qi;
BOOST_SPIRIT_AUTO(qi, fixed_rule, lexeme [ int_ >> -('.' >> uint_ | attr(0u)) ]);
start = *fixed_rule;
BOOST_SPIRIT_DEBUG_NODE(start);
}
private:
qi::rule<It, std::vector<fixed>(), Skipper> start;
};
This happily parses the inputs: http://liveworkspace.org/code/22lxL7$1
I think you might mean where attribute compatibility is required, and
attr_cast
as<T>(parse_expr)
should be able to help out just fine in those cases.
See this answer for more details on attr_cast (and attribute compatibility in general): String parser with boost variant recursive wrapper
There's no such thing as an "auto rule". When you auto-capture an expression like that, you're using all of the defaults to create a rule. So the attribute of this "auto rule" will be exactly and only the attribute of the expression, with no attribute conversion.
If you need to create special attribute data (i.e. you need to convert the incoming attribute type to your own data), you must either use a rule or a semantic action.