Parsing escaped strings using Boost::Spirit

Parsing escaped strings using Boost::Spirit - c++

I would like to write a boost::spirit parser that parses a simple string in double quotes that uses escaped double quotes, e.g. "a \"b\" c".
Here is what I tried:
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace client
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Iterator>
bool parse(Iterator first, Iterator last)
{
using qi::char_;
qi::rule< Iterator, std::string(), ascii::space_type > text;
qi::rule< Iterator, std::string() > content;
qi::rule< Iterator, char() > escChar;
text = '"' >> content >> '"';
content = +(~char_('"') | escChar);
escChar = '\\' >> char_("\"");
bool r = qi::phrase_parse(first, last, text, ascii::space);
if (first != last) // fail if we did not get a full match
return false;
return r;
}
}
int main() {
std::string str = "\"a \\\"b\\\" c\"";
if (client::parse(str.begin(), str.end()))
std::cout << str << " Parses OK: " << std::endl;
else
std::cout << "Fail\n";
return 0;
}
It follows the example on Parsing escaped strings with boost spirit, but the output is "Fail". How can I get it to work?

Been a while since I had a go at spirit, but I think one of your rules is the wrong way round.
Try:
content = +(escChar | ~char_('"'))
instead of:
content = +(~char_('"') | escChar)
It is matching your \ using ~char('"') and therefore never gets round to checking if escChar matches. It then reads the next " as the end of the string and stops parsing.

Related

boost spirit grammar for parsing header columns

I want to parse header columns of a text file. The column names should be allowed to be quoted and any case of letters. Currently I am using the following grammar:
#include <string>
#include <iostream>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename Iterator, typename Skipper>
struct Grammar : qi::grammar<Iterator, void(), Skipper>
{
static constexpr char colsep = '|';
Grammar() : Grammar::base_type(header)
{
using namespace qi;
using ascii::char_;
#define COL(name) (no_case[name] | ('"' >> no_case[name] >> '"'))
header = (COL("columna") | COL("column_a")) >> colsep >>
(COL("columnb") | COL("column_b")) >> colsep >>
(COL("columnc") | COL("column_c")) >> eol >> eoi;
#undef COL
}
qi::rule<Iterator, void(), Skipper> header;
};
int main()
{
const std::string s{"columnA|column_B|column_c\n"};
auto begin(std::begin(s)), end(std::end(s));
Grammar<std::string::const_iterator, qi::blank_type> p;
bool ok = qi::phrase_parse(begin, end, p, qi::blank);
if (ok && begin == end)
std::cout << "Header ok" << std::endl;
else if (ok && begin != end)
std::cout << "Remaining unparsed: '" << std::string(begin, end) << "'" << std::endl;
else
std::cout << "Parse failed" << std::endl;
return 0;
}
Is this possible without the use of a macro? Further I would like to ignore any underscores at all. Can this be achieved with a custom skipper? In the end it would be ideal if one could write:
header = col("columna") >> colsep >> col("columnb") >> colsep >> column("columnc") >> eol >> eoi;
where col would be an appropriate grammar or rule.

#sehe how can I fix this grammar to support "\"Column_A\"" as well? 6 hours ago
By this time you should probably have realized that there's two different things going on here.
Separate Yo Concerns
On the one hand you have a grammar (that allows |-separated columns like columna or "Column_A").
On the other hand you have semantic analysis (the phase where you check that the parsed contents match certain criteria).
The thing that is making your life hard is trying to conflate the two. Now, don't get me wrong, there could be (very rare) circumstances where fusing those responsibilities together is absolutely required - but I feel that would always be an optimization. If you need that, Spirit is not your thing, and you're much more likely to be served with a handwritten parser.
Parsing
So let's get brain-dead simple about the grammar:
static auto headers = (quoted|bare) % '|' > (eol|eoi);
The bare and quoted rules can be pretty much the same as before:
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
As you can see this will implicitly take care of quoting and escaping as well whitespace skipping outside lexemes. When applied simply, it will result in a clean list of column names:
std::string const s = "\"columnA\"|column_B| column_c \n";
std::vector<std::string> headers;
bool ok = phrase_parse(begin(s), end(s), Grammar::headers, x3::blank, headers);
std::cout << "Parse " << (ok?"ok":"invalid") << std::endl;
if (ok) for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
Prints Live On Coliru
Parse ok
"columnA"
"column_B"
"column_c"
INTERMEZZO: Coding Style
Let's structure our code so that the separation of concerns is reflected. Our parsing code might use X3, but our validation code doesn't need to be in the same translation unit (cpp file).
Have a header defining some basic types:
#include <string>
#include <vector>
using Header = std::string;
using Headers = std::vector<Header>;
Define the operations we want to perform on them:
Headers parse_headers(std::string const& input);
bool header_match(Header const& actual, Header const& expected);
bool headers_match(Headers const& actual, Headers const& expected);
Now, main can be rewritten as just:
auto headers = parse_headers("\"columnA\"|column_B| column_c \n");
for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
bool valid = headers_match(headers, {"columna","columnb","columnc"});
std::cout << "Validation " << (valid?"passed":"failed") << "\n";
And e.g. a parse_headers.cpp could contain:
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace Grammar {
using namespace x3;
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
static auto headers = (quoted|bare) % '|' > (eol|eoi);
}
Headers parse_headers(std::string const& input) {
Headers output;
if (phrase_parse(begin(input), end(input), Grammar::headers, x3::blank, output))
return output;
return {}; // or throw, if you prefer
}
Validating
This is what is known as "semantic checks". You take the vector of strings and check them according to your logic:
#include <boost/range/adaptors.hpp>
#include <boost/algorithm/string.hpp>
bool header_match(Header const& actual, Header const& expected) {
using namespace boost::adaptors;
auto significant = [](unsigned char ch) {
return ch != '_' && std::isgraph(ch);
};
return boost::algorithm::iequals(actual | filtered(significant), expected);
}
bool headers_match(Headers const& actual, Headers const& expected) {
return boost::equal(actual, expected, header_match);
}
That's all. All the power of algorithms and modern C++ at your disposal, no need to fight with constraints due to parsing context.
Full Demo
The above, Live On Wandbox
Both parts got significantly simpler:
your parser doesn't have to deal with quirky comparison logic
your comparison logic doesn't have to deal with grammar concerns (quotes, escapes, delimiters and whitespace)

modify regex to include comma

I have the following string:
arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21')
The regex to extract the value looks like:
boost::regex re_arg_values("('[^']*(?:''[^']*)*'[^)]*)");
The above regex properly extracts the values. BUT when I include a comma , the code fails. For eg:
arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**')
How shall I modify this regex to include the comma?
FYI. The value can contain spaces, special characters, and also tabs. The code is in CPP.
Thanks in advance.

I'd not use a regex here.
The goal MUST be to parse values, and no doubt they will have useful values, that you need interpreted.
I'd devise a datastructure like:
#include <map>
namespace Config {
using Key = std::string;
using Value = boost::variant<int, std::string, bool>;
using Setting = std::pair<Key, Value>;
using Settings = std::map<Key, Value>;
}
For this you can write 1:1 a parser using Boost Spirit:
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
namespace Parser {
using It = std::string::const_iterator;
using namespace Config;
namespace qi = boost::spirit::qi;
using Skip = qi::blank_type;
qi::rule<It, std::string()> quoted_ = "'" >> *(
"'" >> qi::char_("'") // double ''
| '\\' >> qi::char_ // any character escaped
| ~qi::char_("'") // non-quotes
) >> "'";
qi::rule<It, Key()> key_ = +qi::char_("a-zA-Z0-9_"); // for example
qi::rule<It, Value()> value_ = qi::int_ | quoted_ | qi::bool_;
qi::rule<It, Setting(), Skip> setting_ = key_ >> '(' >> value_ >> ')';
qi::rule<It, Settings()> settings_ = qi::skip(qi::blank) [*setting_];
}
Note how this
interprets non-string values correctly
specifies what keys look like and parses them too
interprets string escapes, so the Value in the map contains the "real" string, after un-escaping
ignores whitespace outside values (use space_type if you want to ignore newlines as whitespace as well)
You can use it like:
int main() {
std::string const input = R"( arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**'))";
Config::Settings map;
if (parse(input.begin(), input.end(), Parser::settings_, map)) {
for(auto& entry : map)
std::cout << "config setting {" << entry.first << ", " << entry.second << "}\n";
}
}
Which prints
config setting {arg1, value1}
config setting {arg2, value ')2}
config setting {arg3, user'~!##$%^&*_~!##$%^&"*_-=+[{]}|;:<.>?21**,**}
Live Demo
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <map>
#include <boost/fusion/adapted/std_pair.hpp>
namespace Config {
using Key = std::string;
using Value = boost::variant<int, std::string, bool>;
using Setting = std::pair<Key, Value>;
using Settings = std::map<Key, Value>;
}
namespace Parser {
using It = std::string::const_iterator;
using namespace Config;
namespace qi = boost::spirit::qi;
using Skip = qi::blank_type;
qi::rule<It, std::string()> quoted_ = "'" >> *(
"'" >> qi::char_("'") // double ''
| '\\' >> qi::char_ // any character escaped
| ~qi::char_("'") // non-quotes
) >> "'";
qi::rule<It, Key()> key_ = +qi::char_("a-zA-Z0-9_"); // for example
qi::rule<It, Value()> value_ = qi::int_ | quoted_ | qi::bool_;
qi::rule<It, Setting(), Skip> setting_ = key_ >> '(' >> value_ >> ')';
qi::rule<It, Settings()> settings_ = qi::skip(qi::blank) [*setting_];
}
int main() {
std::string const input = R"( arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**'))";
Config::Settings map;
if (parse(input.begin(), input.end(), Parser::settings_, map)) {
for(auto& entry : map)
std::cout << "config setting {" << entry.first << ", " << entry.second << "}\n";
}
}
BONUS
For comparison, here's the "same" but using regex:
Live On Coliru
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <iostream>
#include <map>
namespace Config {
using Key = std::string;
using RawValue = std::string;
using Settings = std::map<Key, RawValue>;
Settings parse(std::string const& input) {
Settings settings;
boost::regex re(R"((\w+)\(('.*?')\))");
auto f = boost::make_regex_iterator(input, re);
for (auto& match : boost::make_iterator_range(f, {}))
settings.emplace(match[1].str(), match[2].str());
return settings;
}
}
int main() {
std::string const input = R"( arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**'))";
Config::Settings map = Config::parse(input);
for(auto& entry : map)
std::cout << "config setting {" << entry.first << ", " << entry.second << "}\n";
}
Prints
config setting {arg1, 'value1'}
config setting {arg2, 'value ''}
config setting {arg3, 'user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**'}
Notes:
it no longer interprets and converts any values
it no longer processes escapes
it requires an additional runtime library dependency on boost_regex

Boost Spirit Signals Successful Parsing Despite Token Being Incomplete

I have a very simple path construct that I am trying to parse with boost spirit.lex.
We have the following grammar:
token := [a-z]+
path := (token : path) | (token)
So we're just talking about colon separated lower-case ASCII strings here.
I have three examples "xyz", "abc:xyz", "abc:xyz:".
The first two should be deemed valid. The third one, which has a trailing colon, should not be deemed valid. Unfortunately the parser I have recognizes all three as being valid. The grammar should not allow an empty token, but apparently spirit is doing just that. What am I missing to get the third one rejected?
Also, if you read the code below, in comments there is another version of the parser that demands that all paths end with semi-colons. I can get appropriate behavior when I activate those lines, (i.e. rejection of "abc:xyz:;"), but this is not really what I want.
Anyone have any ideas?
Thanks.
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
using namespace boost::spirit;
using boost::phoenix::val;
template<typename Lexer>
struct PathTokens : boost::spirit::lex::lexer<Lexer>
{
PathTokens()
{
identifier = "[a-z]+";
separator = ":";
this->self.add
(identifier)
(separator)
(';')
;
}
boost::spirit::lex::token_def<std::string> identifier, separator;
};
template <typename Iterator>
struct PathGrammar
: boost::spirit::qi::grammar<Iterator>
{
template <typename TokenDef>
PathGrammar(TokenDef const& tok)
: PathGrammar::base_type(path)
{
using boost::spirit::_val;
path
=
(token >> tok.separator >> path)[std::cerr << _1 << "\n"]
|
//(token >> ';')[std::cerr << _1 << "\n"]
(token)[std::cerr << _1 << "\n"]
;
token
= (tok.identifier) [_val=_1]
;
}
boost::spirit::qi::rule<Iterator> path;
boost::spirit::qi::rule<Iterator, std::string()> token;
};
int main()
{
typedef std::string::iterator BaseIteratorType;
typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::string> > TokenType;
typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
typedef PathTokens<LexerType>::iterator_type TokensIterator;
typedef std::vector<std::string> Tests;
Tests paths;
paths.push_back("abc");
paths.push_back("abc:xyz");
paths.push_back("abc:xyz:");
/*
paths.clear();
paths.push_back("abc;");
paths.push_back("abc:xyz;");
paths.push_back("abc:xyz:;");
*/
for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
{
std::string str = *iter;
std::cerr << "*****" << str << "*****\n";
PathTokens<LexerType> tokens;
PathGrammar<TokensIterator> grammar(tokens);
BaseIteratorType first = str.begin();
BaseIteratorType last = str.end();
bool r = boost::spirit::lex::tokenize_and_parse(first, last, tokens, grammar);
std::cerr << r << " " << (first==last) << "\n";
}
}

I addition to to what llonesmiz already said, here's a trick using qi::eoi that I sometimes use:
path = (
(token >> tok.separator >> path) [std::cerr << _1 << "\n"]
| token [std::cerr << _1 << "\n"]
) >> eoi;
This makes the grammar require eoi (end-of-input) at the end of a successful match. This leads to the desired result:
http://liveworkspace.org/code/23a7adb11889bbb2825097d7c553f71d
*****abc*****
abc
1 1
*****abc:xyz*****
xyz
abc
1 1
*****abc:xyz:*****
xyz
abc
0 1

The problem lies in the meaning of first and last after your call to tokenize_and_parse. first==last checks if your string has been completely tokenized, you can't infer anything about grammar. If you isolate the parsing like this, you obtain the expected result:
PathTokens<LexerType> tokens;
PathGrammar<TokensIterator> grammar(tokens);
BaseIteratorType first = str.begin();
BaseIteratorType last = str.end();
LexerType::iterator_type lexfirst = tokens.begin(first,last);
LexerType::iterator_type lexlast = tokens.end();
bool r = parse(lexfirst, lexlast, grammar);
std::cerr << r << " " << (lexfirst==lexlast) << "\n";

This is what I finally ended up with. It uses the suggestions from both #sehe and #llonesmiz. Note the conversion to std::wstring and the use of actions in the grammar definition, which were not present in the original post.
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/bind.hpp>
#include <iostream>
#include <string>
//
// This example uses boost spirit to parse a simple
// colon-delimited grammar.
//
// The grammar we want to recognize is:
// identifier := [a-z]+
// separator = :
// path= (identifier separator path) | identifier
//
// From the boost spirit perspective this example shows
// a few things I found hard to come by when building my
// first parser.
// 1. How to flag an incomplete token at the end of input
// as an error. (use of boost::spirit::eoi)
// 2. How to bind an action on an instance of an object
// that is taken as input to the parser.
// 3. Use of std::wstring.
// 4. Use of the lexer iterator.
//
// This using directive will cause issues with boost::bind
// when referencing placeholders such as _1.
// using namespace boost::spirit;
//! A class that tokenizes our input.
template<typename Lexer>
struct Tokens : boost::spirit::lex::lexer<Lexer>
{
Tokens()
{
identifier = L"[a-z]+";
separator = L":";
this->self.add
(identifier)
(separator)
;
}
boost::spirit::lex::token_def<std::wstring, wchar_t> identifier, separator;
};
//! This class provides a callback that echoes strings to stderr.
struct Echo
{
void echo(boost::fusion::vector<std::wstring> const& t) const
{
using namespace boost::fusion;
std::wcerr << at_c<0>(t) << "\n";
}
};
//! The definition of our grammar, as described above.
template <typename Iterator>
struct Grammar : boost::spirit::qi::grammar<Iterator>
{
template <typename TokenDef>
Grammar(TokenDef const& tok, Echo const& e)
: Grammar::base_type(path)
{
using boost::spirit::_val;
path
=
((token >> tok.separator >> path)[boost::bind(&Echo::echo, e,::_1)]
|
(token)[boost::bind(&Echo::echo, &e, ::_1)]
) >> boost::spirit::eoi; // Look for end of input.
token
= (tok.identifier) [_val=boost::spirit::qi::_1]
;
}
boost::spirit::qi::rule<Iterator> path;
boost::spirit::qi::rule<Iterator, std::wstring()> token;
};
int main()
{
// A set of typedefs to make things a little clearer. This stuff is
// well described in the boost spirit documentation/examples.
typedef std::wstring::iterator BaseIteratorType;
typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::wstring> > TokenType;
typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
typedef Tokens<LexerType>::iterator_type TokensIterator;
typedef LexerType::iterator_type LexerIterator;
// Define some paths to parse.
typedef std::vector<std::wstring> Tests;
Tests paths;
paths.push_back(L"abc");
paths.push_back(L"abc:xyz");
paths.push_back(L"abc:xyz:");
paths.push_back(L":");
// Parse 'em.
for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
{
std::wstring str = *iter;
std::wcerr << L"*****" << str << L"*****\n";
Echo e;
Tokens<LexerType> tokens;
Grammar<TokensIterator> grammar(tokens, e);
BaseIteratorType first = str.begin();
BaseIteratorType last = str.end();
// Have the lexer consume our string.
LexerIterator lexFirst = tokens.begin(first, last);
LexerIterator lexLast = tokens.end();
// Have the parser consume the output of the lexer.
bool r = boost::spirit::qi::parse(lexFirst, lexLast, grammar);
// Print the status and whether or note all output of the lexer
// was processed.
std::wcerr << r << L" " << (lexFirst==lexLast) << L"\n";
}
}

Spirit Qi sequence parsing issues

I have some issues with parser writing with Spirit::Qi 2.4.
I have a series of key-value pairs to parse in following format <key name>=<value>.
Key name can be [a-zA-Z0-9] and is always followed by = sign with no white-space between key name and = sign. Key name is also always preceded by at least one space.
Value can be almost any C expression (spaces are possible as well), with the exception of the expressions containing = char and code blocks { }.
At the end of the sequence of the key value pairs there's a { sign.
I struggle a lot with writing parser for this expression. Since the key name always is preceded by at least one space and followed by = and contains no spaces I defined it as
KeyName %= [+char_("a-zA-Z0-9_") >> lit("=")] ;
Value can be almost anything, but it can not contain = nor { chars, so I defined it as:
Value %= +(char_ - char_("{=")) ;
I thought about using look-ahead's like this to catch the value:
ValueExpression
%= (
Value
>> *space
>> &(KeyName | lit("{"))
)
;
But it won't work, for some reason (seems like the ValueExpression greedily goes up to the = sign and "doesn't know" what to do from there). I have limited knowledge of LL parsers, so I'm not really sure what's cooking here. Is there any other way I could tackle this kind of sequence?
Here's example series:
EXP1=FunctionCall(A, B, C) TEST="Example String" \
AnotherArg=__FILENAME__ - 'BlahBlah' EXP2= a+ b+* {
Additional info: since this is a part of a much larger grammar I can't really solve this problem any other way than by a Spirit.Qi parser (like splitting by '=' and doing some custom parsing or something similar).
Edit:
I've created minimum working example here: http://ideone.com/kgYD8
(compiled under VS 2012 with boost 1.50, but should be fine on older setups as well).

I'd suggest you have a look at the article Parsing a List of Key-Value Pairs Using Spirit.Qi.
I've greatly simplified your code, while
adding attribute handling
removing phoenix semantic actions
debugging of rules
Here it is, without further ado:
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <map>
namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;
typedef std::map<std::string, std::string> data_t;
template <typename It, typename Skipper>
struct grammar : qi::grammar<It, data_t(), Skipper>
{
grammar() : grammar::base_type(Sequence)
{
using namespace qi;
KeyName = +char_("a-zA-Z0-9_") >> '=';
Value = qi::no_skip [+(~char_("={") - KeyName)];
Sequence = +(KeyName > Value);
BOOST_SPIRIT_DEBUG_NODE(KeyName);
BOOST_SPIRIT_DEBUG_NODE(Value);
BOOST_SPIRIT_DEBUG_NODE(Sequence);
}
private:
qi::rule<It, data_t(), Skipper> Sequence;
qi::rule<It, std::string()> KeyName; // no skipper, removes need for qi::lexeme
qi::rule<It, std::string(), Skipper> Value;
};
template <typename Iterator>
data_t parse (Iterator begin, Iterator end)
{
grammar<Iterator, qi::space_type> p;
data_t data;
if (qi::phrase_parse(begin, end, p, qi::space, data)) {
std::cout << "parse ok\n";
if (begin!=end) {
std::cout << "remaining: " << std::string(begin,end) << '\n';
}
} else {
std::cout << "failed: " << std::string(begin,end) << '\n';
}
return data;
}
int main ()
{
std::string test(" ARG=Test still in first ARG ARG2=Zombie cat EXP2=FunctionCall(A, B C) {" );
auto data = parse(test.begin(), test.end());
for (auto& e : data)
std::cout << e.first << "=" << e.second << '\n';
}
Output will be:
parse ok
remaining: {
ARG=Test still in first ARG
ARG2=Zombie cat
EXP2=FunctionCall(A, B C)
If you really wanted '{' to be part of the last value, change this line:
Value = qi::no_skip [+(char_ - KeyName)];

Passing file-path string to semantic action in Boost.Spirit

I am new to Boost.Spirit, and I have a question related to a mini-interpreter I am trying to implement using the library. As a sub-task of parsing my language, I need to extract a file-path from an input of the form:
"path = \"/path/to/file\""
and pass it as a string (without quotes) to a semantic action.
I wrote some code which can parse this type of input, but passing the parsed string does not work as expected, probably due to my lack of experience with Boost.Spirit.
Can anyone help?
In reality, my grammar is more complex, but I have isolated the problem to:
#include <string>
#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/phoenix_core.hpp"
#include "boost/spirit/include/phoenix_operator.hpp"
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
namespace parser {
// Semantic action (note: in reality, this would use file_path_string in non-trivial way)
void display_path(std::string file_path_string) {
std::cout << "Detected file-path: " << file_path_string << std::endl;
}
// Grammar
template <typename Iterator>
struct path_command : qi::grammar<Iterator, ascii::space_type> {
path_command() : path_command::base_type(path_specifier) {
using qi::string;
using qi::lit;
path = +(qi::char_("/") >> *qi::char_("a-zA-Z_0-9"));
quoted_path_string = lit('"') >> (path- lit('"')) >> lit('"');
path_specifier = lit("path") >> qi::lit("=")
>> quoted_path_string[&display_path];
}
qi::rule<Iterator, ascii::space_type> path_specifier;
qi::rule<Iterator, std::string()> path, quoted_path_string;
};
}
int main() {
using ascii::space;
typedef std::string::const_iterator iterator_type;
typedef parser::path_command<iterator_type> path_command;
bool parse_res;
path_command command_instance; // Instance of our Grammar
iterator_type iter, end;
std::string test_command1 = "path = \"/file1\"";
std::string test_command2 = "path = \"/dirname1/dirname2/file2\"";
// Testing example command 1
iter = test_command1.begin();
end = test_command1.end();
parse_res = phrase_parse(iter, end, command_instance, space);
std::cout << "Parse result for test 1: " << parse_res << std::endl;
// Testing example command 2
iter = test_command2.begin();
end = test_command2.end();
parse_res = phrase_parse(iter, end, command_instance, space);
std::cout << "Parse result for test 2: " << parse_res << std::endl;
return EXIT_SUCCESS;
}
The output is:
Detected file-path: /
Parse result for test 1: 1
Detected file-path: ///
Parse result for test 2: 1
but I would like to obtain:
Detected file-path: /file1
Parse result for test 1: 1
Detected file-path: /dirname1/dirname2/file2
Parse result for test 2: 1

Almost everything is fine with your parser. The problem is a bug in Spirit (upto Boost V1.46) preventing the correct handling of the attribute in cases like this. This has been recently fixed in SVN and will be available in Boost V1.47 (I tried running your unchanged program with this version and everything works just fine).
For now, you can work around this problem by utilizing the raw[] directive (see below).
I said 'almost' above, because you can a) simplify what you have, b) you should use no_skip[] to avoid invoking the skip parser in between the qutoes.
path = raw[+(qi::char_("/") >> *qi::char_("a-zA-Z_0-9"))];
quoted_path_string = no_skip['"' >> path >> '"'];
path_specifier = lit("path") >> qi::lit("=")
>> quoted_path_string[&display_path];
You can omit the - lit('"') part because your path parser does not recognize quotes in the first place.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Parsing escaped strings using Boost::Spirit - c++

Related

boost spirit grammar for parsing header columns

modify regex to include comma

Boost Spirit Signals Successful Parsing Despite Token Being Incomplete

Spirit Qi sequence parsing issues

Passing file-path string to semantic action in Boost.Spirit

Categories

Resources