I've reduced my code to the absolute minimum needed to reproduce the error (sadly that is still 60 lines, not quite Minimal, but its VCE at least).
I'm using Boost 1.56 in Visual Studio 2013 (Platform Toolset v120).
The code below gives me an Access Violation unless I uncomment the marked lines. By doing some tests, it seems boost::spirit doesn't like it if the enum starts at 0 (in my full code I have more values in the enum and I just set IntLiteral = 1 and it also got rid of the access violation error, although the names were wrong because ToString was off by one when indexing into the array).
Is this a bug in boost::spirit or did I do something wrong?
#include <iostream>
#include <string>
#include <vector>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
namespace lex = boost::spirit::lex;
typedef lex::lexertl::token<char const*> LexToken;
typedef lex::lexertl::actor_lexer<LexToken> LexerType;
typedef boost::iterator_range<char const*> IteratorRange;
enum TokenType
{
//Unused, // <-- Uncommenting this fixes the error (1/2)
IntLiteral,
};
std::string tokenTypeNames[] = {
//"unused", // <-- Uncommenting this line fixes the error (2/2)
"int literal",
};
std::string ToString(TokenType t)
{
return tokenTypeNames[t];
}
template <typename T>
struct Lexer : boost::spirit::lex::lexer < T >
{
Lexer()
{
self.add
// Literals
("[1-9][0-9]*", TokenType::IntLiteral);
}
};
int main(int argc, char* argv[])
{
std::cout << "Boost version: " << BOOST_LIB_VERSION << std::endl;
std::string input = "33";
char const* inputIt = input.c_str();
char const* inputEnd = &input[input.size()];
Lexer<LexerType> tokens;
LexerType::iterator_type token = tokens.begin(inputIt, inputEnd);
LexerType::iterator_type end = tokens.end();
for (; token->is_valid() && token != end; ++token)
{
auto range = boost::get<IteratorRange>(token->value());
std::cout << ToString(static_cast<TokenType>(token->id())) << " (" << std::string(range.begin(), range.end()) << ')' << std::endl;
}
std::cin.get();
return 0;
}
If I uncomment the lines I get:
Boost version: 1_56
int literal (33)
The fact that it "works" if you uncomment those lines, is pure accident.
From the docs spirit/lex/tutorials/lexer_quickstart2.html:
To ensure every token gets assigned a id the Spirit.Lex library internally assigns unique numbers to the token definitions, starting with the constant defined by boost::spirit::lex::min_token_id
See also this older answer:
Spirit Lex: Which token definition generated this token?
So you can just fix it using the offset, but I guess it will keep on being a brittle solution as it is very easy to let the enum go out of synch with actual token definitions in the lexer tables.
I'd suggest using the nameof() approach as given in the linked answer, which leverages named token_def<> objects.
Live On Coliru
#include <iostream>
#include <string>
#include <vector>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
namespace lex = boost::spirit::lex;
typedef lex::lexertl::token<char const*> LexToken;
typedef lex::lexertl::actor_lexer<LexToken> LexerType;
typedef boost::iterator_range<char const*> IteratorRange;
enum TokenType {
IntLiteral = boost::spirit::lex::min_token_id
};
std::string const& ToString(TokenType t) {
static const std::string tokenTypeNames[] = {
"int literal",
};
return tokenTypeNames[t - boost::spirit::lex::min_token_id];
}
template <typename T>
struct Lexer : boost::spirit::lex::lexer<T> {
Lexer() {
this->self.add
// Literals
("[1-9][0-9]*", TokenType::IntLiteral);
}
};
int main() {
std::cout << "Boost version: " << BOOST_LIB_VERSION << std::endl;
std::string input = "33";
char const* inputIt = input.c_str();
char const* inputEnd = &input[input.size()];
Lexer<LexerType> tokens;
LexerType::iterator_type token = tokens.begin(inputIt, inputEnd);
LexerType::iterator_type end = tokens.end();
for (; token->is_valid() && token != end; ++token)
{
auto range = boost::get<IteratorRange>(token->value());
std::cout << ToString(static_cast<TokenType>(token->id())) << " (" << std::string(range.begin(), range.end()) << ')' << std::endl;
}
}
Related
I am trying to learn Boost.Spirit, but I have found a difficulty.
I am trying to parse a string into the following structure:
struct employee {
std::string name;
std::string location;
};
And it seems that when two attributes with the same type are back to back, they collapse (logically) into a std::vector of that type. Because of that rule, the following parser
+x3::ascii::alnum >>
+x3::space >>
+x3::ascii::alnum
would have the attribute of std::vector<std::string>.
But I am trying to parse this into that struct, which means that the ideal attribute for me would be a boost::fusion::tuple<std::string, std::string>, so I can adapt my struct to it.
The complete version of the not working code (referenced above):
// Example program
#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
struct employee {
std::string name;
std::string location;
};
BOOST_FUSION_ADAPT_STRUCT(employee,
(std::string, name),
(std::string, location)
)
namespace x3 = boost::spirit::x3;
x3::rule<struct parse_emp_id, employee> const parse_emp = "Employee Parser";
auto parse_emp_def =
+x3::ascii::alnum >>
+x3::space >>
+x3::ascii::alnum
;
BOOST_SPIRIT_DEFINE(parse_emp);
int main()
{
std::string input = "Joe Fairbanks";
employee ret;
x3::parse(input.begin(), input.end(), parse_emp, ret);
std::cout << "Name: " << ret.name << "\tLocation: " << ret.location << std::endl;
}
See it live
This code triggers a static_assert telling me that my attribute isn't correct:
error: static_assert failed "Attribute does not have the expected size."
With the command of
clang++ -std=c++14 test.cpp
(it also fails under GCC).
What I have tried
I have found a workaround to this problem, but it is messy, and I can't believe that this is the cleanest way:
// Example program
#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
struct employee {
std::string name;
std::string location;
};
namespace x3 = boost::spirit::x3;
x3::rule<struct parse_emp_id, employee> const parse_emp = "Employee Parser";
auto parse_emp_def =
x3::eps [
([](auto& ctx) {
x3::_val(ctx) = employee{};
})
]>>
(+x3::ascii::alnum)[
([](auto& ctx) {
x3::_val(ctx).name = x3::_attr(ctx);
})
]>>
+x3::space >>
(+x3::ascii::alnum)[
([](auto& ctx) {
x3::_val(ctx).location = x3::_attr(ctx);
})
]
;
BOOST_SPIRIT_DEFINE(parse_emp);
int main()
{
std::string input = "Joe Fairbanks";
employee ret;
x3::parse(input.begin(), input.end(), parse_emp, ret);
std::cout << "Name: " << ret.name << "\tLocation: " << ret.location << std::endl;
}
See it live
I really don't like that solution: it kinda ruins the amazing expressiveness of spirit and makes it super ugly, also if I want to add new fields into the employee struct, then I have to add an extra lambda, instead of just updating my BOOST_FUSION_ADAPT_STRUCT, which is much easier.
So the question is: Is there some way to (hopefully) cleanly split two consecutive attributes of the same type from the std::vector and into a boost::fusion::vector?
Thank you in advance for getting this far ;).
The problem is that unlike character literals, x3::space has an attribute. So you don't have an attribute of two separate character sequences separated by whitespace, but rather an attribute of one big character sequence which includes the whitespace.
The omit directive is what you're after, and with that single addition your 'not working code' works. :-]
// Example program
#include <string>
#include <iostream>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
struct employee {
std::string name;
std::string location;
};
BOOST_FUSION_ADAPT_STRUCT(employee, name, location)
x3::rule<struct parse_emp_id, employee> const parse_emp = "Employee Parser";
auto parse_emp_def
= +x3::ascii::alnum
>> x3::omit[+x3::space]
>> +x3::ascii::alnum
;
BOOST_SPIRIT_DEFINE(parse_emp)
int main()
{
std::string const input = "Joe Fairbanks";
employee ret;
x3::parse(input.begin(), input.end(), parse_emp, ret);
std::cout << "Name: " << ret.name << "\tLocation: " << ret.location << '\n';
}
Online Demo
I am continuing to learn the Boost Spirit library and have comile issue with example that I couldn`t compile. The source of example you can find here: source place.
Also you can look at this code and compile result on Coliru
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
//#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_statement.hpp>
#include <boost/spirit/include/phoenix_algorithm.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <string>
#include <iostream>
namespace lex = boost::spirit::lex;
struct distance_func
{
template <typename Iterator1, typename Iterator2>
struct result : boost::iterator_difference<Iterator1> {};
template <typename Iterator1, typename Iterator2>
typename result<Iterator1, Iterator2>::type
operator()(Iterator1& begin, Iterator2& end) const
{
return std::distance(begin, end);
}
};
boost::phoenix::function<distance_func> const distance = distance_func();
//[wcl_token_definition
template <typename Lexer>
struct word_count_tokens : lex::lexer<Lexer>
{
word_count_tokens()
: c(0), w(0), l(0)
, word("[^ \t\n]+") // define tokens
, eol("\n")
, any(".")
{
using boost::spirit::lex::_start;
using boost::spirit::lex::_end;
using boost::phoenix::ref;
// associate tokens with the lexer
this->self
= word [++ref(w), ref(c) += distance(_start, _end)]
| eol [++ref(c), ++ref(l)]
| any [++ref(c)]
;
}
std::size_t c, w, l;
lex::token_def<> word, eol, any;
};
//]
///////////////////////////////////////////////////////////////////////////////
//[wcl_main
int main(int argc, char* argv[])
{
typedef
lex::lexertl::token<char const*, lex::omit, boost::mpl::false_>
token_type;
/*< This defines the lexer type to use
>*/ typedef lex::lexertl::actor_lexer<token_type> lexer_type;
/*< Create the lexer object instance needed to invoke the lexical analysis
>*/ word_count_tokens<lexer_type> word_count_lexer;
/*< Read input from the given file, tokenize all the input, while discarding
all generated tokens
>*/ std::string str;
char const* first = str.c_str();
char const* last = &first[str.size()];
/*< Create a pair of iterators returning the sequence of generated tokens
>*/ lexer_type::iterator_type iter = word_count_lexer.begin(first, last);
lexer_type::iterator_type end = word_count_lexer.end();
/*< Here we simply iterate over all tokens, making sure to break the loop
if an invalid token gets returned from the lexer
>*/ while (iter != end && token_is_valid(*iter))
++iter;
if (iter == end) {
std::cout << "lines: " << word_count_lexer.l
<< ", words: " << word_count_lexer.w
<< ", characters: " << word_count_lexer.c
<< "\n";
}
else {
std::string rest(first, last);
std::cout << "Lexical analysis failed\n" << "stopped at: \""
<< rest << "\"\n";
}
return 0;
}
When I try to compile it I receive a lot of errors, see full list on Coliru.
What wrong with this example? What and why need be changed to compile it?
Apparently something changed in the internals of Lex, and the iterator(s) are now rvalues sometimes.
You need to adjust the distance_func to either read
operator()(Iterator1 begin, Iterator2 end) const
or
operator()(Iterator1 const& begin, Iterator2 const& end) const
Then it works. See Live On Coliru
question asked: Spirit-general list
Hello all,
I'm not sure if my subject is correct, but the testcode will probably show
what I want to achieve.
I'm trying to parse things like:
'%40' to '#'
'%3C' to '<'
I have a minimal testcase below. I don't understand why
this doesn't work. It's probably me making a mistake but I don't see it.
Using:
Compiler: gcc 4.6
Boost: current trunk
I use the following compile line:
g++ -o main -L/usr/src/boost-trunk/stage/lib -I/usr/src/boost-trunk -g -Werror -Wall -std=c++0x -DBOOST_SPIRIT_USE_PHOENIX_V3 main.cpp
#include <iostream>
#include <string>
#define BOOST_SPIRIT_UNICODE
#include <boost/cstdint.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix/phoenix.hpp>
typedef boost::uint32_t uchar; // Unicode codepoint
namespace qi = boost::spirit::qi;
int main(int argc, char **argv) {
// Input
std::string input = "%3C";
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
using qi::xdigit;
using qi::_1;
using qi::_2;
using qi::_val;
qi::rule<std::string::const_iterator, uchar()> pchar =
('%' > xdigit > xdigit) [_val = (_1 << 4) + _2];
std::string result;
bool r = qi::parse(begin, end, pchar, result);
if (r && begin == end) {
std::cout << "Output: " << result << std::endl;
std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
} else {
std::cerr << "Error" << std::endl;
return 1;
}
return 0;
}
Regards,
Matthijs Möhlmann
qi::xdigit does not do what you think it does: it returns the raw character (i.e. '0', not 0x00).
You could leverage qi::uint_parser to your advantage, making your parse much simpler as a bonus:
typedef qi::uint_parser<uchar, 16, 2, 2> xuchar;
no need to rely on phoenix (making it work on older versions of Boost)
get both characters in one go (otherwise, you might have needed to add copious casting to prevent integer sign extensions)
Here is a fixed up sample:
#include <iostream>
#include <string>
#define BOOST_SPIRIT_UNICODE
#include <boost/cstdint.hpp>
#include <boost/spirit/include/qi.hpp>
typedef boost::uint32_t uchar; // Unicode codepoint
namespace qi = boost::spirit::qi;
typedef qi::uint_parser<uchar, 16, 2, 2> xuchar;
const static xuchar xuchar_ = xuchar();
int main(int argc, char **argv) {
// Input
std::string input = "%3C";
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::rule<std::string::const_iterator, uchar()> pchar = '%' > xuchar_;
uchar result;
bool r = qi::parse(begin, end, pchar, result);
if (r && begin == end) {
std::cout << "Output: " << result << std::endl;
std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
} else {
std::cerr << "Error" << std::endl;
return 1;
}
return 0;
}
Output:
Output: 60
Expected: < (LESS-THAN SIGN)
'<' is indeed ASCII 60
I'm newbie in Boost.Spirit.Lex.
Some strange error appears every time I try to use lex::_val in semantics actions in my simple lexer:
#ifndef _TOKENS_H_
#define _TOKENS_H_
#include <iostream>
#include <string>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_statement.hpp>
#include <boost/spirit/include/phoenix_container.hpp>
namespace lex = boost::spirit::lex;
namespace phx = boost::phoenix;
enum tokenids
{
ID_IDENTIFICATOR = 1,
ID_CONSTANT,
ID_OPERATION,
ID_BRACKET,
ID_WHITESPACES
};
template <typename Lexer>
struct mega_tokens
: lex::lexer<Lexer>
{
mega_tokens()
: identifier(L"[a-zA-Z_][a-zA-Z0-9_]*", ID_IDENTIFICATOR)
, constant (L"[0-9]+(\\.[0-9]+)?", ID_CONSTANT )
, operation (L"[\\+\\-\\*/]", ID_OPERATION )
, bracket (L"[\\(\\)\\[\\]]", ID_BRACKET )
{
using lex::_tokenid;
using lex::_val;
using phx::val;
this->self
= operation [ std::wcout
<< val(L'<') << _tokenid
// << val(L':') << lex::_val
<< val(L'>')
]
| identifier [ std::wcout
<< val(L'<') << _tokenid
<< val(L':') << _val
<< val(L'>')
]
| constant [ std::wcout
<< val(L'<') << _tokenid
// << val(L':') << _val
<< val(L'>')
]
| bracket [ std::wcout
<< val(L'<') << _tokenid
// << val(L':') << lex::_val
<< val(L'>')
]
;
}
lex::token_def<wchar_t, wchar_t> operation;
lex::token_def<std::wstring, wchar_t> identifier;
lex::token_def<double, wchar_t> constant;
lex::token_def<wchar_t, wchar_t> bracket;
};
#endif // _TOKENS_H_
and
#include <cstdlib>
#include <iostream>
#include <locale>
#include <boost/spirit/include/lex_lexertl.hpp>
#include "tokens.h"
int main()
{
setlocale(LC_ALL, "Russian");
namespace lex = boost::spirit::lex;
typedef std::wstring::iterator base_iterator;
typedef lex::lexertl::token <
base_iterator,
boost::mpl::vector<wchar_t, std::wstring, double, wchar_t>,
boost::mpl::true_
> token_type;
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
typedef mega_tokens<lexer_type>::iterator_type iterator_type;
mega_tokens<lexer_type> mega_lexer;
std::wstring str = L"alfa+x1*(2.836-x2[i])";
base_iterator first = str.begin();
bool r = lex::tokenize(first, str.end(), mega_lexer);
if (r) {
std::wcout << L"Success" << std::endl;
}
else {
std::wstring rest(first, str.end());
std::wcerr << L"Lexical analysis failed\n" << L"stopped at: \""
<< rest << L"\"\n";
}
return EXIT_SUCCESS;
}
This code causes an error in Boost header 'boost/spirit/home/lex/argument.hpp' on line 167 while compiling:
return: can't convert 'const
boost::variant' to
'boost::variant &'
When I don't use lex::_val program compiles with no errors.
Obviously, I use _val in wrong way, but I do not know how to do this correctly. Help, please! :)
P.S. And sorry for my terrible English…
I believe this is a problem in the current Phoenix related to using iostreams. As a workaround I suggest to define a custom (Phoenix) function doing the actual output:
struct output_operation_impl
{
template <typename TokenId, typename Val>
struct result { typedef void type; };
template <typename TokenId, typename Val>
void operator()(T1 const& tokenid, T2 const& val) const
{
std::wcout << L'<' << tokenid << L':' << val << L'>';
}
};
boost::phoenix::function<output_operation_impl> const output_operation =
output_operation_impl();
calling it as:
this->self = operation[ output_operation(_tokenid, _val) ] ... ;
Regards Hartmut
I would like to use a parsed value as the input to a loop parser.
The grammar defines a header that specifies the (variable) size of the following string. For example, say the following string is the input to some parser.
12\r\nTest Payload
The parser should extract the 12, convert it to an unsigned int and then read twelve characters. I can define a boost spirit grammar that compiles, but an assertion in the boost spirit code fails at runtime.
#include <iostream>
#include <boost/spirit.hpp>
using namespace boost::spirit;
struct my_closure : public closure<my_closure, std::size_t> {
member1 size;
};
struct my_grammar : public grammar<my_grammar> {
template <typename ScannerT>
struct definition {
typedef rule<ScannerT> rule_type;
typedef rule<ScannerT, my_closure::context_t> closure_rule_type;
closure_rule_type header;
rule_type payload;
rule_type top;
definition(const my_grammar &self)
{
using namespace phoenix;
header = uint_p[header.size = arg1];
payload = repeat_p(header.size())[anychar_p][assign_a(self.result)];
top = header >> str_p("\r\n") >> payload;
}
const rule_type &start() const { return top; }
};
my_grammar(std::string &p_) : result(p_) {}
std::string &result;
};
int
main(int argc, char **argv)
{
const std::string content = "12\r\nTest Payload";
std::string payload;
my_grammar g(payload);
if (!parse(content.begin(), content.end(), g).full) {
std::cerr << "there was a parsing error!\n";
return -1;
}
std::cout << "Payload: " << payload << std::endl;
return 0;
}
Is it possible to tell spirit that the closure variable should be evaluated lazily? Is this behaviour supported by boost spirit?
This is much easier with the new qi parser available in Spirit 2. The following code snippet provides a full example that mostly works. An unexpected character is being inserted into the final result.
#include <iostream>
#include <string>
#include <boost/version.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_repeat.hpp>
#include <boost/spirit/include/qi_grammar.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
using boost::spirit::qi::repeat;
using boost::spirit::qi::uint_;
using boost::spirit::ascii::char_;
using boost::spirit::ascii::alpha;
using boost::spirit::qi::_1;
namespace phx = boost::phoenix;
namespace qi = boost::spirit::qi;
template <typename P, typename T>
void test_parser_attr(
char const* input, P const& p, T& attr, bool full_match = true)
{
using boost::spirit::qi::parse;
char const* f(input);
char const* l(f + strlen(f));
if (parse(f, l, p, attr) && (!full_match || (f == l)))
std::cout << "ok" << std::endl;
else
std::cout << "fail" << std::endl;
}
static void
straight_forward()
{
std::string str;
int n;
test_parser_attr("12\r\nTest Payload",
uint_[phx::ref(n) = _1] >> "\r\n" >> repeat(phx::ref(n))[char_],
str);
std::cout << "str.length() == " << str.length() << std::endl;
std::cout << n << "," << str << std::endl; // will print "12,Test Payload"
}
template <typename P, typename T>
void
test_phrase_parser(char const* input, P const& p,
T& attr, bool full_match = true)
{
using boost::spirit::qi::phrase_parse;
using boost::spirit::qi::ascii::space;
char const* f(input);
char const* l(f + strlen(f));
if (phrase_parse(f, l, p, space, attr) && (!full_match || (f == l)))
std::cout << "ok" << std::endl;
else
std::cout << "fail" << std::endl;
}
template <typename Iterator>
struct test_grammar
: qi::grammar<Iterator, std::string(), qi::locals<unsigned> > {
test_grammar()
: test_grammar::base_type(my_rule)
{
using boost::spirit::qi::_a;
my_rule %= uint_[_a = _1] >> "\r\n" >> repeat(_a)[char_];
}
qi::rule<Iterator, std::string(), qi::locals<unsigned> > my_rule;
};
static void
with_grammar_local_variable()
{
std::string str;
test_phrase_parser("12\r\nTest Payload", test_grammar<const char*>(), str);
std::cout << str << std::endl; // will print "Test Payload"
}
int
main(int argc, char **argv)
{
std::cout << "boost version: " << BOOST_LIB_VERSION << std::endl;
straight_forward();
with_grammar_local_variable();
return 0;
}
What you are looking for is lazy_p, check the example here: http://www.boost.org/doc/libs/1_35_0/libs/spirit/doc/the_lazy_parser.html