boost::spirit::x3 attribute compatibility rules, intuition or code? - c++

Is there a document somewhere which describes how various spirit::x3 rule definition operations affect attribute compatibility?
I was surprised when:
x3::lexeme[ x3::alpha > *(x3::alnum | x3::char_('_')) ]
could not be moved into a fusion-adapted struct:
struct Name {
std::string value;
};
For the time being, I got rid of the first mandatory alphabetical character, but I would still like to express a rule which defines that the name string must begin with a letter. Is this one of those situations where I need to try adding eps around until it works, or is there a stated reason why the above couldn't work?
I apologize if this has been written down somewhere, I couldn't find it.

If you're not on the develop branch you don't have the fix for that single-element sequence adaptiation bug, so yeah it's probably that.
Due to the genericity of attribute transformation/propagation, there's a lot of wiggle room, but of course it's just documented and ultimately in the code. In other words: there's no magic.
In the Qi days I'd have "fixed" this by just spelling out the desired transform with qi::as<> or qi::attr_cast<>. X3 doesn't have it (yet), but you can use a rule to mimick it very easily:
Live On Coliru
#include <iostream>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
struct Name {
std::string value;
};
BOOST_FUSION_ADAPT_STRUCT(Name, value)
int main() {
std::string const input = "Halleo123_1";
Name out;
bool ok = x3::parse(input.begin(), input.end(),
x3::rule<struct _, std::string>{} =
x3::alpha >> *(x3::alnum | x3::char_('_')),
out);
if (ok)
std::cout << "Parsed: " << out.value << "\n";
else
std::cout << "Parse failed\n";
}
Prints:
Parsed: Halleo123_1
Automate it
Because X3 works so nicely with c++14 core language features, it's not hard to reduce typing:
Understanding the List Operator (%) in Boost.Spirit

Related

Type safety of Boost Qi/X3 parse

Consider this code using Boost Spirit X3 (conceptually same goes for Boost Spirit Qi):
string command;
string value;
x3::parse(command.begin(), command.end(), "float:" >> x3::double_, value);
Why is this code not generating any error during compilation? Shouldn't "float:" >> x3::double_ parser have attribute of type double and therefore not accept std::string as 4th argument to parse?
BTW, I know I could do this:
string value;
auto assign = [](auto& ctx){value = _attr(ctx)};
x3::parse(command.begin(), command.end(), "float:" >> x3::double_[assign], value)
which would generate an error, but it is more complicated than necessary.
As a last resort: any there any well-known replacements for sscanf that would be type safe (possibly in boost)?
It's because automatic attribute propagation into a container is very flexible.
And std::string is a container.
Perhaps if we do the following experiment it makes a bit more sense:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
int main() {
//std::string command = "f: hello world";
std::string command = "f: 104 101 108 108 111 32 119 111 114 108 100";
std::string value;
x3::phrase_parse(
command.begin(),
command.end(),
"f:" >> *x3::double_, x3::blank,
value);
std::cout << value << "\n";
}
As you might surmise, it prints:
hello world
For strings this seems surprising. It would be much less of a surprise if the target was std::vector<float> - but it would require precisely the same amount of conversions.
It's fair for each element to enjoy propagation conversions. What if you were parsing Ast nodes like:
struct Operation {
Operator op;
std::vector<Expression> operands;
}
You would hate if operands didn't implicitly convert into Expression from Unary, FunctionCall, StringLiteral etc.
#tomasz-grobelny, using the explicit_convert_parser_type from
here, and with #defined REPRODUCE_SO_62986317, correctly compile-time
errors when passing anything but a double as the attribute.
HTH,
Larry

Spirit X3: attribute of alternative parser, not `char`, but `variant<char, char>`

According to spirit x3 attribute collapsing rule, the attribute of an anternative parser with 2
alternation having same attribute, should collapse, i.e. a: A, b: A --> (a | b): A, but the code below shows its not, is it a bug or my fault?
Here is the spirit x3 documentation of compound attribute rules https://www.boost.org/doc/libs/develop/libs/spirit/doc/x3/html/spirit_x3/quick_reference/compound_attribute_rules.html
I've got a code snippet to reproduce this question
Here is the code https://wandbox.org/permlink/7MvN03yiX7ir3esE
And the original code in case the link expired
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
namespace fusion = boost::fusion;
using x3::_val;
using x3::_attr;
using x3::_where;
using fusion::at_c;
auto const string_def = x3::lexeme["\"" >> *(x3::symbols<char>({ {"\\\"", '"'} }) | (x3::print - '\"')) > "\""][([](auto& ctx) {
std::cout << typeid(_attr(ctx)).name();
//should _attr(ctx) be a vector<char>, according to attribute collapsing rule?
})];
int main() {
std::string input = R"__("hello\"world")__";
x3::parse(input.begin(), input.end(), string_def);
}
Bascally, the code parse a string literal, which contains a number of chars (*), which is either a character except '"' (x3::print-'"'), or a escape sequence "\\\"" representing the character '"' (x3::symbol<char>...), both of them has a attribute of char, and char | char should be char, not variant<char, char>.
the cout part of the code shows result like St6vectorIN5boost7variantINS0_6detail7variant13over_sequenceINS0_3mpl6l_itemIN4mpl_5long_ILl2EEEcNS6_INS8_ILl1EEEcNS5_5l_endEEEEEEEJEEESaISF_EE, after demangling, it is vector<variant<over_sequence<...>>> and variant<over_sequence<> is basically a workaround of variant<T...> in pre-C++11 times.
So what's wrong with the code, should it be vector<char>, not vector<variant>? And BTW, can I customize attribute collapsing rule myself?
As #IgorR. mentioned in comment, semantic action suppresses attribute propagation, I ignored that point. So problem solved, I might need just parse this into std::string.
For some reason I need parsing this into a unique_ptr object, so I need separate the parser into a rule<StringContent, std::string> and a rule<StringLiteral, std::unique_ptr<ast::StringLiteral>, true>, problem solved

undefined behaviour somewhere in boost::spirit::qi::phrase_parse

I am learning to use boost::spirit library. I took this example http://www.boost.org/doc/libs/1_56_0/libs/spirit/example/qi/num_list1.cpp and compiled it on my computer - it works fine.
However if I modify it a little - if I initialize the parser itself
auto parser = qi::double_ >> *(',' >> qi::double_);
somewhere as global variable and pass it to phrase_parse, everything goes crazy. Here is the complete modified code (only 1 line is modified and 1 added) - http://pastebin.com/5rWS3pMt
If I run the original code and pass "3.14, 3.15" to stdin, it says Parsing succeeded, but with my modified version it fails. I tried a lot of modifications of the same type - assigning the parser to global variable - in some variants on some compilers it segfaults.
I don't understand why and how it is so.
Here is another, simpler version which prints true and then segfaults on clang++ and just segfaults on g++
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
const auto doubles_parser_global = qi::double_ >> *(',' >> qi::double_);
int main() {
const auto doubles_parser_local = qi::double_ >> *(',' >> qi::double_);
const std::string nums {"3.14, 3.15, 3.1415926"};
std::cout << std::boolalpha;
std::cout
<< qi::phrase_parse(
nums.cbegin(), nums.cend(), doubles_parser_local, ascii::space
)
<< std::endl; // works fine
std::cout
<< qi::phrase_parse(
nums.cbegin(), nums.cend(), doubles_parser_global, ascii::space
) // this segfaults
<< std::endl;
}
You cannot use auto to store parser expressions¹
Either you need to evaluate from the temporary expression directly, or you need to assign to a rule/grammar:
const qi::rule<std::string::const_iterator, qi::space_type> doubles_parser_local = qi::double_ >> *(',' >> qi::double_);
You can have your cake and eat it too on most recent BOost versions (possibly the dev branch) there should be a BOOST_SPIRIT_AUTO macro
This is becoming a bit of a FAQ item:
Assigning parsers to auto variables
boost spirit V2 qi bug associated with optimization level
¹ I believe this is actually a limitation of the underlying Proto library. There's a Proto-0x lib version on github (by Eric Niebler) that promises to solve these issues by being completely redesigned to be aware of references. I think this required some c++11 features that Boost Proto currently cannot use.

boost spirit semantic action parameters

in this article about boost spirit semantic actions it is mentioned that
There are actually 2 more arguments
being passed: the parser context and a
reference to a boolean ‘hit’
parameter. The parser context is
meaningful only if the semantic action
is attached somewhere to the right
hand side of a rule. We will see more
information about this shortly. The
boolean value can be set to false
inside the semantic action invalidates
the match in retrospective, making the
parser fail.
All fine, but i've been trying to find an example passing a function object as semantic action that uses the other parameters (parser context and hit boolean) but i haven't found any. I would love to see an example using regular functions or function objects, as i barely can grok the phoenix voodoo
This a really good question (and also a can of worms) because it gets at the interface of qi and phoenix. I haven't seen an example either, so I'll extend the article a little in this direction.
As you say, functions for semantic actions can take up to three parameters
Matched attribute - covered in the article
Context - contains the qi-phoenix interface
Match flag - manipulate the match state
Match flag
As the article states, the second parameter is not meaningful unless the expression is part of a rule, so lets start with the third. A placeholder for the second parameter is still needed though and for this use boost::fusion::unused_type. So a modified function from the article to use the third parameter is:
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>
void f(int attribute, const boost::fusion::unused_type& it, bool& mFlag){
//output parameters
std::cout << "matched integer: '" << attribute << "'" << std::endl
<< "match flag: " << mFlag << std::endl;
//fiddle with match flag
mFlag = false;
}
namespace qi = boost::spirit::qi;
int main(void){
std::string input("1234 6543");
std::string::const_iterator begin = input.begin(), end = input.end();
bool returnVal = qi::phrase_parse(begin, end, qi::int_[f], qi::space);
std::cout << "return: " << returnVal << std::endl;
return 0;
}
which outputs:
matched integer: '1234'
match flag: 1
return: 0
All this example does is switch the match to a non-match, which is reflected in the parser output. According to hkaiser, in boost 1.44 and up setting the match flag to false will cause the match to fail in the normal way. If alternatives are defined, the parser will backtrack and attempt to match them as one would expect. However, in boost<=1.43 a Spirit bug prevents backtracking, which causes strange behavior. To see this, add phoenix include boost/spirit/include/phoenix.hpp and change the expression to
qi::int_[f] | qi::digit[std::cout << qi::_1 << "\n"]
You'd expect that, when the qi::int parser fails, the alternative qi::digit to match the beginning of the input at "1", but the output is:
matched integer: '1234'
match flag: 1
6
return: 1
The 6 is the first digit of the second int in the input which indicates the alternative is taken using the skipper and without backtracking. Notice also that the match is considered succesful, based on the alternative.
Once boost 1.44 is out, the match flag will be useful for applying match criteria that might be otherwise difficult to express in a parser sequence. Note that the match flag can be manipulated in phoenix expressions using the _pass placeholder.
Context parameter
The more interesting parameter is the second one, which contains the qi-phoenix interface, or in qi parlance, the context of the semantic action. To illustrate this, first examine a rule:
rule<Iterator, Attribute(Arg1,Arg2,...), qi::locals<Loc1,Loc2,...>, Skipper>
The context parameter embodies the Attribute, Arg1, ... ArgN, and qi::locals template paramters, wrapped in a boost::spirit::context template type. This attribute differs from the function parameter: the function parameter attribute is the parsed value, while this attribute is the value of the rule itself. A semantic action must map the former to the latter. Here's an example of a possible context type (phoenix expression equivalents indicated):
using namespace boost;
spirit::context< //context template
fusion::cons<
int&, //return int attribute (phoenix: _val)
fusion::cons<
char&, //char argument1 (phoenix: _r1)
fusion::cons<
float&, //float argument2 (phoenix: _r2)
fusion::nil //end of cons list
>,
>,
>,
fusion::vector2< //locals container
char, //char local (phoenix: _a)
unsigned int //unsigned int local (phoenix: _b)
>
>
Note the return attribute and argument list take the form of a lisp-style list (a cons list). To access these variables within a function, access the attribute or locals members of the context struct template with fusion::at<>(). For example, for a context variable con
//assign return attribute
fusion::at_c<0>(con.attributes) = 1;
//get the second rule argument
float arg2 = fusion::at_c<2>(con.attributes);
//assign the first local
fusion::at_c<1>(con.locals) = 42;
To modify the article example to use the second argument, change the function definition and phrase_parse calls:
...
typedef
boost::spirit::context<
boost::fusion::cons<int&, boost::fusion::nil>,
boost::fusion::vector0<>
> f_context;
void f(int attribute, const f_context& con, bool& mFlag){
std::cout << "matched integer: '" << attribute << "'" << std::endl
<< "match flag: " << mFlag << std::endl;
//assign output attribute from parsed value
boost::fusion::at_c<0>(con.attributes) = attribute;
}
...
int matchedInt;
qi::rule<std::string::const_iterator,int(void),ascii::space_type>
intRule = qi::int_[f];
qi::phrase_parse(begin, end, intRule, ascii::space, matchedInt);
std::cout << "matched: " << matchedInt << std::endl;
....
This is a very simple example that just maps the parsed value to the output attribute value, but extensions should be fairly apparent. Just make the context struct template parameters match the rule output, input, and local types. Note that this type of a direct match between parsed type/value to output type/value can be done automatically using auto rules, with a %= instead of a = when defining the rule:
qi::rule<std::string::const_iterator,int(void),ascii::space_type>
intRule %= qi::int_;
IMHO, writing a function for each action would be rather tedious, compared to the brief and readable phoenix expression equivalents. I sympathize with the voodoo viewpoint, but once you work with phoenix for a little while, the semantics and syntax aren't terribly difficult.
Edit: Accessing rule context w/ Phoenix
The context variable is only defined when the parser is part of a rule. Think of a parser as being any expression that consumes input, where a rule translates the parser values (qi::_1) into a rule value (qi::_val). The difference is often non-trivial, for example when qi::val has a Class type that needs to be constructed from POD parsed values. Below is a simple example.
Let's say part of our input is a sequence of three CSV integers (x1, x2, x3), and we only care out an arithmetic function of these three integers (f = x0 + (x1+x2)*x3 ), where x0 is a value obtained elsewhere. One option is to read in the integers and calculate the function, or alternatively use phoenix to do both.
For this example, use one rule with an output attribute (the function value), and input (x0), and a local (to pass information between individual parsers with the rule). Here's the full example.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
int main(void){
std::string input("1234, 6543, 42");
std::string::const_iterator begin = input.begin(), end = input.end();
qi::rule<
std::string::const_iterator,
int(int), //output (_val) and input (_r1)
qi::locals<int>, //local int (_a)
ascii::space_type
>
intRule =
qi::int_[qi::_a = qi::_1] //local = x1
>> ","
>> qi::int_[qi::_a += qi::_1] //local = x1 + x2
>> ","
>> qi::int_
[
qi::_val = qi::_a*qi::_1 + qi::_r1 //output = local*x3 + x0
];
int ruleValue, x0 = 10;
qi::phrase_parse(begin, end, intRule(x0), ascii::space, ruleValue);
std::cout << "rule value: " << ruleValue << std::endl;
return 0;
}
Alternatively, all the ints could be parsed as a vector, and the function evaluated with a single semantic action (the % below is the list operator and elements of the vector are accessed with phoenix::at):
namespace ph = boost::phoenix;
...
qi::rule<
std::string::const_iterator,
int(int),
ascii::space_type
>
intRule =
(qi::int_ % ",")
[
qi::_val = (ph::at(qi::_1,0) + ph::at(qi::_1,1))
* ph::at(qi::_1,2) + qi::_r1
];
....
For the above, if the input is incorrect (two ints instead of three), bad thing could happen at run time, so it would be better to specify the number of parsed values explicitly, so parsing will fail for a bad input. The below uses _1, _2, and _3 to reference the first, second, and third match value:
(qi::int_ >> "," >> qi::int_ >> "," >> qi::int_)
[
qi::_val = (qi::_1 + qi::_2) * qi::_3 + qi::_r1
];
This is a contrived example, but should give you the idea. I've found phoenix semantic actions really helpful in constructing complex objects directly from input; this is possible because you can call constructors and member functions within semantic actions.

Error avalanche in Boost.Spirit.Qi usage

I'm not being able to figure out what's wrong with my code. Boost's templates are making me go crazy! I can't make heads or tails out of all this, so I just had to ask.
What's wrong with this?
#include <iostream>
#include <boost/lambda/lambda.hpp>
#include <boost/spirit/include/qi.hpp>
void parsePathTest(const std::string &path)
{
namespace lambda = boost::lambda;
using namespace boost::spirit;
const std::string permitted = "._\\-##a-zA-Z0-9";
const std::string physicalPermitted = permitted + "/\\\\";
const std::string archivedPermitted = permitted + ":{}";
std::string physical,archived;
// avoids non-const reference to rvalue
std::string::const_iterator begin = path.begin(),end = path.end();
// splits a string like "some/nice-path/while_checking:permitted#symbols.bin"
// as physical = "some/nice-path/while_checking"
// and archived = "permitted#symbols.bin" (if this portion exists)
// I could barely find out the type for this expression
auto expr
= ( +char_(physicalPermitted) ) [lambda::var(physical) = lambda::_1]
>> -(
':'
>> (
+char_(archivedPermitted) [lambda::var(archived) = lambda::_1]
)
)
;
// the error occurs in a template instantiated from here
qi::parse(begin,end,expr);
std::cout << physical << '\n' << archived << '\n';
}
The number of errors is immense; I would suggest people who want to help compiling this on their on (trust me, pasting here is unpractical). I am using the latest TDM-GCC version (GCC 4.4.1) and Boost version 1.39.00.
As a bonus, I would like to ask another two things: whether C++0x's new static_assert functionality will help Boost in this sense, and whether the implementation I've quoted above is a good idea, or if I should use Boost's String Algorithms library. Would the latter likely give a much better performance?
Thanks.
-- edit
The following very minimal sample fails at first with the exact same error as the code above.
#include <iostream>
#include <boost/spirit/include/qi.hpp>
int main()
{
using namespace boost::spirit;
std::string str = "sample";
std::string::const_iterator begin(str.begin()), end(str.end());
auto expr
= ( +char_("a-zA-Z") )
;
// the error occurs in a template instantiated from here
if (qi::parse(begin,end,expr))
{
std::cout << "[+] Parsed!\n";
}
else
{
std::cout << "[-] Parsing failed.\n";
}
return 0;
}
-- edit 2
I still don't know why it didn't work in my old version of Boost (1.39), but upgrading to Boost 1.42 solved the problem. The following code compiles and runs perfectly with Boost 1.42:
#include <iostream>
#include <boost/spirit/include/qi.hpp>
int main()
{
using namespace boost::spirit;
std::string str = "sample";
std::string::const_iterator begin(str.begin()), end(str.end());
auto expr
= ( +qi::char_("a-zA-Z") ) // notice this line; char_ is not part of
// boost::spirit anymore (or maybe I didn't
// include the right headers, but, regardless,
// khaiser said I should use qi::char_, so here
// it goes)
;
// the error occurs in a template instantiated from here
if (qi::parse(begin,end,expr))
{
std::cout << "[+] Parsed!\n";
}
else
{
std::cout << "[-] Parsing failed.\n";
}
return 0;
}
Thanks for the tips, hkaiser.
Several remarks: a) don't use the Spirit V2 beta version distributed with Boost V1.39 and V1.40. Use at least Spirit V2.1 (as released with Boost V1.41) instead, as it contains a lot of bug fixes and performance enhancements (both, compile time and runtime performance). If you can't switch Boost versions, read here for how to proceed. b) Try to avoid using boost::lambda or boost::bind with Spirit V2.x. Yes, I know, the docs say it works, but you have to know what you're doing. Use boost::phoenix expressions instead. Spirit 'knows' about Phoenix, which makes writing semantic actions easier. If you use Phoenix, your code will look like:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;
std::string physical, archived;
auto expr
= ( +char_(physicalPermitted) ) [phoenix::ref(physical) = qi::_1]
>> -(
':'
>> ( +char_(archivedPermitted) )[phoenix::ref(archived) = qi::_1]
)
;
But your overall parser will get even simpler if you utilize Spirit's built-in attribute propagation rules:
std::string physical;
boost::optional<std::string> archived;
qi::parse(begin, end,
+qi::char_(physicalPermitted) >> -(':' >> +qi::char_(archivedPermitted)),
physical, archived);
i.e. no need to have semantic actions at all. If you need more information about the attribute handling, see the article series about the Magic of Attributes on Spirit's web site.
Edit:
Regarding your static_assert question: yes static_assert, can improve error messages as it can be used to trigger compiler errors as early as possible. In fact, Spirit uses this technique extensively already. But it is not possible to protect the user from getting those huge error messages in all cases, but only for those user errors the programmer did expect. Only concepts (which unfortunately didn't make it into the new C++ Standard) could have been used to generally reduce teh size of the error messages.
Regarding your Boost's String Algorithms question: certainly it's possible to utilize this library for simple tasks as yours. You might even be better off using Boost.Tokenizer (if all you need is to split the input string at the ':'). The performance of Spirit should be comparable to the corresponding performance of the string algorithms, but this certainly depends on the code you will write. If you assume that the utilized string algorithm will require one pass over the input string data, then Spirit won't be faster (as it's doing one pass as well).
Neither the Boost String algorithms nor the Boost Tokenizer can give you the verification of the characters matched. Your Spirit grammar matches only the characters you specified in the character classes. So if you need this matching/verification, you should use either Spirit or Boost Regex.