I need to update a parser to support these new features, but I can't manage to get them all working at once:
Commands must accept an arbitrary number of parameters (at least one).
Parameters might be numbers, unquoted strings or quoted strings.
Parameters are separated by commas.
Within quoted strings, opening/closing parentheses must be allowed.
(These requirements are easier to understand by looking at the source code example.)
My current code, including checks, is as follows:
Godbolt link: https://godbolt.org/z/5d6o53n9h
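#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <array>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>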
namespace script
{
struct Command
{
enum Type { NONE, WRITE_LOG, INSERT_LABEL, START_PROCESS, END_PROCESS, COMMENT, FAIL };
Type type{ Type::NONE };
std::vector<std::string> args;
};
using Commands = std::vector<Command>;
}//namespace script
BOOST_FUSION_ADAPT_STRUCT(script::Command, type, args)
namespace script
{
namespace qi = boost::spirit::qi;
template <typename It>
class Parser : public qi::grammar<It, Commands()>
{
private:
qi::symbols<char, Command::Type> type;
qi::rule<It, Command(), qi::blank_type> none, command, comment, fail;//By its very nature "fail" must be the last one to be checked
qi::rule<It, Commands()> start;
public:
Parser() : Parser::base_type(start)
{
using namespace qi;//NOTE: "as_string" is necessary in all args due to std::vector<std::string>
auto empty_args = copy(attr(std::vector<std::string>{}));
type.add
("WriteLog", Command::WRITE_LOG)
("InsertLabel", Command::INSERT_LABEL)
("StartProcess", Command::START_PROCESS)
("EndProcess", Command::END_PROCESS);
none = omit[*blank] >> &(eol | eoi)
>> attr(Command::NONE)
>> empty_args;//ignore args
command = type >> '('
>> as_string[lexeme[+~char_("(),\r\n")]] % ',' >> ')';
comment = lit("//")
>> attr(Command::COMMENT)
>> as_string[lexeme[*~char_("\r\n")]];
fail = omit[*~char_("\r\n")]
>> attr(Command::FAIL)
>> empty_args;//ignore args
start = skip(blank)[(none | command | comment | fail) % eol] >> eoi;
}
};
Commands parse(std::istream& in)
{
using It = boost::spirit::istream_iterator;
static const Parser<It> parser;
Commands commands;
It first(in >> std::noskipws), last;//No white space skipping
if (!qi::parse(first, last, parser, commands))
throw std::runtime_error("command parse error");
return commands;
}
}//namespace script
std::stringstream ss{
R"(// just a comment
WriteLog("this is a log")
WriteLog("this is also (in another way) a log")
WriteLog("but this is just a fail)
StartProcess(17, "program.exe", True)
StartProcess(17, "this_is_a_fail.exe, True)
)"};
int main()
{
using namespace script;
try
{
auto commands = script::parse(ss);
std::array args{ 0, 0, 1, 1, -1, 0, 3, -1, 0 };//Fails may have any number of arguments; -1 is a convenience flag meaning "don't care"
std::array types{ Command::COMMENT, Command::NONE, Command::WRITE_LOG, Command::WRITE_LOG, Command::FAIL, Command::NONE, Command::START_PROCESS, Command::FAIL, Command::NONE };
std::cout << std::boolalpha << "size correct? " << (commands.size() == 9) << std::endl;
std::cout << "types correct? " << std::equal(commands.begin(), commands.end(), types.begin(), types.end(), [](auto& cmd, auto& type) { return cmd.type == type; }) << std::endl;
std::cout << "arguments correct? " << std::equal(commands.begin(), commands.end(), args.begin(), args.end(), [](auto& cmd, auto arg) { return cmd.args.size() == arg || arg == -1; }) << std::endl;
}
catch (std::exception const& e)
{
std::cout << e.what() << "\n";
}
}
Any help with this will be appreciated.
You say you want to allow parentheses within quoted strings. But you don't even support quoted strings!
So the problem is your argument rule, which doesn't even exist as a separate rule. It would be roughly this part:
argument = +~char_("(),\r\n");
command = type >> '(' >> argument % ',' >> ')';
Where argument might be declared as
qi::rule<It, Argument()> argument;
In fact, rewriting the tests in an organized fashion, here's what we get right now:
Live On Compiler Explorer
static const Commands expected{
{Command::COMMENT, {"just a comment"}},
{Command::NONE, {}},
{Command::WRITE_LOG, {"this is a log"}},
{Command::WRITE_LOG, {"this is also (in another way) a log"}},
{Command::FAIL, {}},
{Command::NONE, {}},
{Command::START_PROCESS, {"17", "program.exe", "True"}},
{Command::FAIL, {}},
{Command::NONE, {}},
};
try {
auto parsed = script::parse(ss);
fmt::print("Parsed all correct? {} -- {} parsed (vs. {} expected)\n",
(parsed == expected), parsed.size(), expected.size());
for (auto i = 0u; i < std::min(expected.size(), parsed.size()); ++i) {
if (expected[i] != parsed[i]) {
fmt::print("index #{} expected {}\n"
" actual: {}\n",
i, expected[i], parsed[i]);
} else {
fmt::print("index #{} CORRECT ({})\n", i, parsed[i]);
}
}
} catch (std::exception const& e) {
fmt::print("Exception: {}\n", e.what());
}
Prints
Parsed all correct? false -- 9 parsed (vs. 9 expected)
index #0 CORRECT (Command(COMMENT, ["just a comment"]))
index #1 CORRECT (Command(NONE, []))
index #2 expected Command(WRITE_LOG, ["this is a log"])
actual: Command(WRITE_LOG, ["\"this is a log\""])
index #3 expected Command(WRITE_LOG, ["this is also (in another way) a log"])
actual: Command(FAIL, [])
index #4 expected Command(FAIL, [])
actual: Command(WRITE_LOG, ["\"but this is just a fail"])
index #5 CORRECT (Command(NONE, []))
index #6 expected Command(START_PROCESS, ["17", "program.exe", "True"])
actual: Command(START_PROCESS, ["17", "\"program.exe\"", "True"])
index #7 expected Command(FAIL, [])
actual: Command(START_PROCESS, ["17", "\"this_is_a_fail.exe", "True"])
index #8 CORRECT (Command(NONE, []))
As you can see, by my expectations it gets quoted strings wrong too. That's because quoting is a language construct: in the AST (the parsed result) you don't care exactly how a value was written in the source. E.g. "hello\ world\041" might be equivalent to "hello world!", so both should result in the argument value hello world!.
So, let's do as we say:
argument = quoted_string | number | boolean | raw_string;
We can add a few rules:
// notice these are lexemes (no internal skipping):
qi::rule<It, Argument()> argument, quoted_string, number, boolean, raw_string;
And define them:
quoted_string = '"' >> *~char_('"') >> '"';
number = raw[double_];
boolean = raw[bool_];
raw_string = +~char_("(),\r\n");
argument = quoted_string | number | boolean | raw_string;
If you want to allow escaped quotes, use something like this:
quoted_string = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';
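To make the semantics of that rule concrete, here is a hand-written sketch in plain C++ (no Spirit) of what it accepts and which value it produces. The helper name parse_quoted is hypothetical. Like the rule, it treats a backslash as making the next character literal (there is no \n or octal translation), and it returns only the unquoted value, not the source spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper mirroring the Spirit rule
//   quoted_string = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';
// On success it returns the value (quotes removed, backslashes dropped) and
// advances `pos` past the closing quote. On failure it returns std::nullopt
// and leaves `pos` unchanged, so another alternative can be tried.
std::optional<std::string> parse_quoted(std::string_view in, std::size_t& pos) {
    std::size_t i = pos;
    if (i >= in.size() || in[i] != '"')
        return std::nullopt;                 // must start with '"'
    std::string value;
    for (++i; i < in.size() && in[i] != '"'; ++i) {
        if (in[i] == '\\' && i + 1 < in.size())
            ++i;                             // '\\' >> char_: drop the backslash...
        value += in[i];                      // ...and keep the next char verbatim
    }
    if (i >= in.size())
        return std::nullopt;                 // no closing '"'
    pos = i + 1;                             // consume the closing quote
    return value;
}
```

For example, the input "say \"hi\" (ok)" yields the value say "hi" (ok): the escaped quotes do not terminate the string and the parentheses need no special treatment.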
Now, I'd say you probably want Argument to be something like variant<double, std::string, bool>, instead of just std::string.
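As an aside on that suggestion, a rough plain-C++ sketch of a typed Argument might look like the following. It assumes Argument = std::variant<double, std::string, bool>; the classify helper is hypothetical and only mimics the order number | boolean | raw_string for an unquoted token (a quoted token would always stay a string). Two assumptions to note: std::strtod is more permissive than double_ (it also accepts hex floats, for instance), and this sketch only recognizes lowercase true/false.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <variant>

// Assumed AST type: a typed argument instead of a plain std::string.
using Argument = std::variant<double, std::string, bool>;

// Hypothetical helper: classify an *unquoted* token, trying number, then
// boolean, then falling back to a raw string (same order as the grammar).
Argument classify(std::string const& token) {
    if (!token.empty()) {
        char* end = nullptr;
        double d = std::strtod(token.c_str(), &end);
        if (end == token.c_str() + token.size())  // whole token consumed
            return d;
    }
    if (token == "true")  return true;            // assumption: lowercase only
    if (token == "false") return false;
    return token;                                 // raw_string fallback
}
```

With this, 17 becomes a double, program.exe stays a string, and True also stays a string under the lowercase-only assumption.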
With only these rule changes (Argument is still a plain std::string), practically all the problems have vanished: Live On Compiler Explorer:
Parsed all correct? false -- 9 parsed (vs. 9 expected)
index #0 CORRECT (Command(COMMENT, ["just a comment"]))
index #1 CORRECT (Command(NONE, []))
index #2 CORRECT (Command(WRITE_LOG, ["this is a log"]))
index #3 CORRECT (Command(WRITE_LOG, ["this is also (in another way) a log"]))
index #4 CORRECT (Command(FAIL, []))
index #5 CORRECT (Command(NONE, []))
index #6 CORRECT (Command(START_PROCESS, ["17", "program.exe", "True"]))
index #7 expected Command(FAIL, [])
actual: Command(START_PROCESS, ["17", "this_is_a_fail.exe, True)\n\"this_is_a_fail.exe", "True"])
index #8 CORRECT (Command(NONE, []))
Now, index #7 looks very funky, but it's actually a well-known phenomenon in Spirit¹. Enabling BOOST_SPIRIT_DEBUG demonstrates it:
<argument>
<try>"this_is_a_fail.exe,</try>
<quoted_string>
<try>"this_is_a_fail.exe,</try>
<fail/>
</quoted_string>
<number>
<try>"this_is_a_fail.exe,</try>
<fail/>
</number>
<boolean>
<try>"this_is_a_fail.exe,</try>
<fail/>
</boolean>
<raw_string>
<try>"this_is_a_fail.exe,</try>
<success>, True)</success>
<attributes>[[t, h, i, s, _, i, s, _, a, _, f, a, i, l, ., e, x, e, ,, , T, r, u, e, ), ", t, h, i, s, _, i, s, _, a, _, f, a, i, l, ., e, x, e]]</attributes>
</raw_string>
<success>, True)</success>
<attributes>[[t, h, i, s, _, i, s, _, a, _, f, a, i, l, ., e, x, e, ,, , T, r, u, e, ), ", t, h, i, s, _, i, s, _, a, _, f, a, i, l, ., e, x, e]]</attributes>
</argument>
So the string gets accepted as a raw string, even though it starts with ". That's easily fixed, but the duplicated text is a separate issue, and we can get rid of it by applying qi::hold:
argument = qi::hold[quoted_string] | number | boolean | raw_string;
Result:
actual: Command(START_PROCESS, ["17", "\"this_is_a_fail.exe", "True"])
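Why does this happen? Roughly speaking, when a branch of a Qi alternative fails, the iterator is restored but characters already appended to a container attribute are not removed, so the next branch appends to the leftovers. The following is a deliberately simplified model in plain C++, not Spirit's implementation; all names are hypothetical. It shows the effect, and how a hold-like wrapper that parses into a copy and commits only on success avoids it (the documented behaviour of qi::hold).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <string>
#include <string_view>
#include <utility>

// A branch appends to `out` and may advance `pos`; it returns false on failure.
// As in the model of a Qi alternative, a failed branch leaves `pos` alone
// but does NOT undo what it appended to `out`.
using Branch = std::function<bool(std::string_view, std::size_t&, std::string&)>;

// Simplified quoted string: appends as it goes, fails without a closing quote.
bool quoted_branch(std::string_view in, std::size_t& pos, std::string& out) {
    std::size_t i = pos;
    if (i >= in.size() || in[i] != '"') return false;
    for (++i; i < in.size() && in[i] != '"'; ++i) out += in[i];
    if (i >= in.size()) return false;    // pos untouched, `out` polluted
    pos = i + 1;
    return true;
}

// Raw string: one or more characters up to ',' or ')'.
bool raw_branch(std::string_view in, std::size_t& pos, std::string& out) {
    std::size_t i = pos;
    while (i < in.size() && in[i] != ',' && in[i] != ')') out += in[i++];
    if (i == pos) return false;
    pos = i;
    return true;
}

// hold-like wrapper: parse into a copy, commit only on success.
Branch hold(Branch b) {
    return [b](std::string_view in, std::size_t& pos, std::string& out) {
        std::string tmp = out;
        if (!b(in, pos, tmp)) return false;
        out = std::move(tmp);
        return true;
    };
}

// Ordered choice: the first branch that succeeds wins.
bool alternative(std::initializer_list<Branch> branches, std::string_view in,
                 std::size_t& pos, std::string& out) {
    for (auto const& b : branches)
        if (b(in, pos, out)) return true;
    return false;
}
```

On the input "this_is_a_fail.exe, True) the plain alternative produces the same kind of garbled attribute as index #7 above, while the held version yields just "this_is_a_fail.exe.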
However, if you expect it to fail, fix that other problem:
raw_string = +~char_("\"(),\r\n"); // note the \"
Note: On the off chance that you really only require it to not start with a quote:
raw_string = !lit('"') >> +~char_("(),\r\n");
I guess by now you see the problem with a "loose rule" like that, so I don't recommend it.
You could express the requirement another way, though, saying "if an argument starts with '"' then it MUST be a quoted_string". Use an expectation point there:
quoted_string = '"' > *('\\' >> char_ | ~char_('"')) > '"';
This has the effect that failing to parse a complete quoted_string throws a qi::expectation_failure exception.
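The difference between >> and > after the opening quote can be modelled by hand as follows. This is plain C++ with a hypothetical argument function; std::runtime_error stands in for qi::expectation_failure, escapes are left out for brevity, and empty raw strings are not rejected. Once the opening quote has matched, a missing closing quote is a hard error instead of a silent fallback to raw_string.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical model of
//   argument      = quoted_string | raw_string;
//   quoted_string = '"' > *~char_('"') > '"';
//   raw_string    = +~char_("\"(),\r\n");
std::string argument(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && in[pos] == '"') {
        std::size_t close = in.find('"', pos + 1);
        if (close == std::string_view::npos)   // expectation point: no backtracking
            throw std::runtime_error("expected '\"' at offset " + std::to_string(in.size()));
        std::string value(in.substr(pos + 1, close - pos - 1));
        pos = close + 1;
        return value;
    }
    // raw_string: anything except quote, comma, parentheses or line breaks
    std::size_t end = in.find_first_of("\"(),\r\n", pos);
    if (end == std::string_view::npos) end = in.size();
    std::string value(in.substr(pos, end - pos));
    pos = end;
    return value;
}
```

The caller then decides what a thrown error means, for example turning the whole line into a FAIL command or aborting the parse.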
Summary / Listing
This is what we end up with:
Live On Compiler Explorer
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fmt/ranges.h>
#include <array>
#include <sstream>
namespace script {
using Argument = std::string;
using Arguments = std::vector<Argument>;
struct Command {
enum Type {
NONE,
WRITE_LOG,
INSERT_LABEL,
START_PROCESS,
END_PROCESS,
COMMENT,
FAIL
};
Type type{Type::NONE};
Arguments args;
auto operator<=>(Command const&) const = default;
};
using Commands = std::vector<Command>;
} // namespace script
BOOST_FUSION_ADAPT_STRUCT(script::Command, type, args)
namespace script {
namespace qi = boost::spirit::qi;
template <typename It> class Parser : public qi::grammar<It, Commands()> {
public:
Parser() : Parser::base_type(start) {
using namespace qi; // NOTE: "as_string" is necessary in all args
auto empty_args = copy(attr(Arguments{}));
type.add //
("WriteLog", Command::WRITE_LOG) //
("InsertLabel", Command::INSERT_LABEL) //
("StartProcess", Command::START_PROCESS) //
("EndProcess", Command::END_PROCESS); //
none = omit[*blank] >> &(eol | eoi) //
>> attr(Command{Command::NONE, {}});
quoted_string = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';
number = raw[double_];
boolean = raw[bool_];
raw_string = +~char_("\"(),\r\n");
argument = qi::hold[quoted_string] | number | boolean | raw_string;
command = type >> '(' >> argument % ',' >> ')';
comment = "//" //
>> attr(Command::COMMENT) //
>> as_string[lexeme[*~char_("\r\n")]]; //
fail = omit[*~char_("\r\n")] >> attr(Command{Command::FAIL, {}});
line = none | command | comment | fail; // keep fail last
start = skip(blank)[line % eol] >> eoi;
BOOST_SPIRIT_DEBUG_NODES((start)(line)(fail)(comment)(command)(
argument)(none)(quoted_string)(raw_string)(boolean)(number))
}
private:
qi::symbols<char, Command::Type> type;
qi::rule<It, Command(), qi::blank_type> line, none, command, comment, fail;
// notice these are lexemes (no internal skipping):
qi::rule<It, Argument()> argument, quoted_string, number, boolean, raw_string;
qi::rule<It, Commands()> start;
};
Commands parse(std::istream& in)
{
using It = boost::spirit::istream_iterator;
static const Parser<It> parser;
Commands commands;
return qi::parse(It{in >> std::noskipws}, {}, parser, commands)
? commands
: throw std::runtime_error("command parse error");
}
struct Formatter {
static constexpr auto name(script::Command::Type type) {
return std::array{"NONE", "WRITE_LOG", "INSERT_LABEL",
"START_PROCESS", "END_PROCESS", "COMMENT",
"FAIL"}
.at(static_cast<int>(type));
}
auto parse(auto& ctx) const { return ctx.begin(); }
auto format(script::Command const& cmd, auto& ctx) const {
return format_to(ctx.out(), "Command({}, {})", name(cmd.type), cmd.args);
}
};
} // namespace script
template <> struct fmt::formatter<script::Command> : script::Formatter {};
std::stringstream ss{
R"(// just a comment
WriteLog("this is a log")
WriteLog("this is also (in another way) a log")
WriteLog("but this is just a fail)
StartProcess(17, "program.exe", True)
StartProcess(17, "this_is_a_fail.exe, True)
)"};
int main() {
using namespace script;
static const Commands expected{
{Command::COMMENT, {"just a comment"}},
{Command::NONE, {}},
{Command::WRITE_LOG, {"this is a log"}},
{Command::WRITE_LOG, {"this is also (in another way) a log"}},
{Command::FAIL, {}},
{Command::NONE, {}},
{Command::START_PROCESS, {"17", "program.exe", "True"}},
{Command::FAIL, {}},
{Command::NONE, {}},
};
try {
auto parsed = script::parse(ss);
fmt::print("Parsed all correct? {} -- {} parsed (vs. {} expected)\n",
(parsed == expected), parsed.size(), expected.size());
for (auto i = 0u; i < std::min(expected.size(), parsed.size()); ++i) {
if (expected[i] != parsed[i]) {
fmt::print("index #{} expected {}\n"
" actual: {}\n",
i, expected[i], parsed[i]);
} else {
fmt::print("index #{} CORRECT ({})\n", i, parsed[i]);
}
}
} catch (std::exception const& e) {
fmt::print("Exception: {}\n", e.what());
}
}
Prints
Parsed all correct? true -- 9 parsed (vs. 9 expected)
index #0 CORRECT (Command(COMMENT, ["just a comment"]))
index #1 CORRECT (Command(NONE, []))
index #2 CORRECT (Command(WRITE_LOG, ["this is a log"]))
index #3 CORRECT (Command(WRITE_LOG, ["this is also (in another way) a log"]))
index #4 CORRECT (Command(FAIL, []))
index #5 CORRECT (Command(NONE, []))
index #6 CORRECT (Command(START_PROCESS, ["17", "program.exe", "True"]))
index #7 CORRECT (Command(FAIL, []))
index #8 CORRECT (Command(NONE, []))
¹ see e.g. boost::spirit alternative parsers return duplicates (which links to three more of the same kind)
I have a simplified grammar that should parse a sequence of terminal literals: id, '<', '>' and ":action".
I need to allow brackets '(' ')' that do nothing but improve readability. (The full example is here: http://coliru.stacked-crooked.com/a/dca93f5c8f37a889 )
A snippet of my grammar:
start = expression % eol;
expression = (simple_def >> -expression)
| (qi::lit('(') > expression > ')');
simple_def = qi::lit('<') [qi::_val = Command::left]
| qi::lit('>') [qi::_val = Command::right]
| key [qi::_val = Command::id]
| qi::lit(":action") [qi::_val = Command::action]
;
key = +qi::char_("a-zA-Z_0-9");
When I try to parse const std::string s = "(a1 > :action)"; everything works like a charm.
But when I add a little more complexity with brackets, "(a1 (>) :action)", I get a core dump. For information: the core dump happens on Coliru, while the MSVC-compiled example just reports a parse failure.
So my questions are: (1) what's wrong with the brackets, and (2) how exactly can brackets be introduced into an expression?
P.S. This is a simplified grammar; my real case is more complicated, but this is a minimal reproducible example.
You should just handle the expectation failure:
terminate called after throwing an instance of 'boost::wrapexcept<boost::spirit::qi::expectation_failure<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >'
what(): boost::spirit::qi::expectation_failure
Aborted (core dumped)
If you handle the expectation failure, the program will not have to terminate.
Fixing The Grammar
Your 'nested expression' rule only accepts a single expression. I think that
expression = (simple_def >> -expression)
is intended to match "one or more simple_def". However, the alternative branch:
| ('(' > expression > ')');
doesn't accept the same: it just stops after parsing ')'. This means that your input is simply invalid according to the grammar.
I suggest simplifying by expressing intent. You were on the right path with semantic typedefs. Let's avoid the "weaselly" Line Of Lines (what even is that?):
using Id = std::string;
using Line = std::vector<Command>;
using Script = std::vector<Line>;
And use these typedefs consistently. Now, we can express the grammar as we "think" about it:
start = skip(blank)[script];
script = line % eol;
line = +simple;
simple = group | command;
group = '(' > line > ')';
See, by simplifying our mental model and sticking to it, we avoided the entire problem you had a hard time spotting.
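To see why this grammar accepts nested groups, here is a hand-written recursive-descent sketch of the same rules in plain C++ (not Spirit; the LineParser name is hypothetical, and the script = line % eol level is left out). group recurses into line, which may consume several commands before the closing parenthesis is expected; a missing ')' throws, mirroring the expectation point.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <vector>

enum class Command { id, left, right, action };
using Line = std::vector<Command>;

// Hypothetical recursive-descent model of:
//   line = +simple;  simple = group | command;  group = '(' > line > ')';
struct LineParser {
    std::string_view in;
    std::size_t pos = 0;

    void skip_blanks() {
        while (pos < in.size() && (in[pos] == ' ' || in[pos] == '\t')) ++pos;
    }
    static bool is_key_char(char c) {  // key = +char_("a-zA-Z_0-9")
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }
    bool command(Line& out) {
        skip_blanks();
        if (pos >= in.size()) return false;
        if (in[pos] == '<') { ++pos; out.push_back(Command::left); return true; }
        if (in[pos] == '>') { ++pos; out.push_back(Command::right); return true; }
        if (in.substr(pos, 7) == ":action") { pos += 7; out.push_back(Command::action); return true; }
        std::size_t start = pos;
        while (pos < in.size() && is_key_char(in[pos])) ++pos;
        if (pos == start) return false;
        out.push_back(Command::id);
        return true;
    }
    bool group(Line& out) {
        skip_blanks();
        if (pos >= in.size() || in[pos] != '(') return false;
        ++pos;
        if (!line(out)) throw std::runtime_error("expected line");  // '(' > line
        skip_blanks();
        if (pos >= in.size() || in[pos] != ')') throw std::runtime_error("expected ')'");
        ++pos;
        return true;
    }
    bool simple(Line& out) { return group(out) || command(out); }
    bool line(Line& out) {             // +simple: one or more
        if (!simple(out)) return false;
        while (simple(out)) {}
        return true;
    }
};
```

Parsing "a1 (>) :action" with this sketch yields id, right, action as one flat line, which matches the shape of the Spirit output shown further down.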
Here's a quick demo that includes error handling, optional debug output, both test cases and encapsulating the skipper as it is part of the grammar: Live On Compiler Explorer
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
enum class Command { id, left, right, action };
static inline std::ostream& operator<<(std::ostream& os, Command cmd) {
switch (cmd) {
case Command::id: return os << "[ID]";
case Command::left: return os << "[LEFT]";
case Command::right: return os << "[RIGHT]";
case Command::action: return os << "[ACTION]";
}
return os << "[???]";
}
using Id = std::string;
using Line = std::vector<Command>;
using Script = std::vector<Line>;
template <typename It>
struct ExprGrammar : qi::grammar<It, Script()> {
ExprGrammar() : ExprGrammar::base_type(start) {
using namespace qi;
start = skip(blank)[script];
script = line % eol;
line = +simple;
simple = group | command;
group = '(' > line > ')';
command =
lit('<') [ _val = Command::left ] |
lit('>') [ _val = Command::right ] |
key [ _val = Command::id ] |
lit(":action") [ _val = Command::action ] ;
key = +char_("a-zA-Z_0-9");
BOOST_SPIRIT_DEBUG_NODES((command)(line)(simple)(group)(script)(key));
}
private:
qi::rule<It, Script()> start;
qi::rule<It, Line(), qi::blank_type> line, simple, group;
qi::rule<It, Script(), qi::blank_type> script;
qi::rule<It, Command(), qi::blank_type> command;
// lexemes
qi::rule<It, Id()> key;
};
int main() {
using It = std::string::const_iterator;
ExprGrammar<It> const p;
for (const std::string s : {
"a1 > :action\na1 (>) :action",
"(a1 > :action)\n(a1 (>) :action)",
"a1 (> :action)",
}) {
It f(begin(s)), l(end(s));
try {
Script parsed;
bool ok = qi::parse(f, l, p, parsed);
if (ok) {
fmt::print("Parsed {}\n", parsed);
} else {
fmt::print("Parsed failed\n");
}
if (f != l) {
fmt::print("Remaining unparsed: '{}'\n", std::string(f, l));
}
} catch (qi::expectation_failure<It> const& ef) {
fmt::print("{}\n", ef.what()); // TODO add more details :)
}
}
}
Prints
Parsed {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
Parsed {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
Parsed {{[ID], [RIGHT], [ACTION]}}
BONUS
However, I think this can all be greatly simplified using qi::symbols for the commands. In fact it looks like you're only tokenizing (you confirm this when you say that the parentheses are not important).
line = +simple;
simple = group | command | (omit[key] >> attr(Command::id));
group = '(' > line > ')';
key = +char_("a-zA-Z_0-9");
Now you don't need Phoenix at all: Live On Compiler Explorer, printing
ok? true {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
ok? true {{[ID], [RIGHT], [ACTION]}, {[ID], [RIGHT], [ACTION]}}
ok? true {{[ID], [RIGHT], [ACTION]}}
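qi::symbols matches the longest registered key at the current position. A rough plain-C++ approximation of that lookup is sketched below; it uses a std::map and a hypothetical match_symbol helper, not the ternary search tree Spirit uses internally, and is only meant to show the longest-match behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

enum class Command { id, left, right, action };

// Hypothetical stand-in for qi::symbols<char, Command>: returns the value of
// the longest key that is a prefix of in.substr(pos), advancing `pos` past it.
std::optional<Command> match_symbol(std::map<std::string, Command> const& table,
                                    std::string_view in, std::size_t& pos) {
    if (pos > in.size()) return std::nullopt;
    std::string_view rest = in.substr(pos);
    std::optional<Command> found;
    std::size_t best = 0;
    for (auto const& [key, value] : table) {
        if (key.size() > best && rest.substr(0, key.size()) == key) {
            best = key.size();  // prefer the longer match
            found = value;
        }
    }
    if (found) pos += best;
    return found;
}
```

A failed lookup leaves pos untouched, which is what lets the key fallback (omit[key] >> attr(Command::id)) take over.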
Even Simpler?
Since I observe that you're basically tokenizing line-wise, why not simply skip the parentheses, and simplify all the way down to:
script = line % eol;
line = *(command | omit[key] >> attr(Command::id));
That's all. See it Live On Compiler Explorer again:
#include <array>
#include <boost/spirit/include/qi.hpp>
#include <fmt/ostream.h>
#include <fmt/ranges.h>
namespace qi = boost::spirit::qi;
enum class Command { id, left, right, action };
using Id = std::string;
using Line = std::vector<Command>;
using Script = std::vector<Line>;
static inline std::ostream& operator<<(std::ostream& os, Command cmd) {
return os << (std::array{"ID", "LEFT", "RIGHT", "ACTION"}.at(int(cmd)));
}
template <typename It>
struct ExprGrammar : qi::grammar<It, Script()> {
ExprGrammar() : ExprGrammar::base_type(start) {
using namespace qi;
start = skip(skipper.alias())[line % eol];
line = *(command | omit[key] >> attr(Command::id));
key = +char_("a-zA-Z_0-9");
BOOST_SPIRIT_DEBUG_NODES((line)(key));
}
private:
using Skipper = qi::rule<It>;
qi::rule<It, Script()> start;
qi::rule<It, Line(), Skipper> line;
Skipper skipper = qi::char_(" \t\b\f()");
qi::rule<It /*, Id()*/> key; // omit attribute for efficiency
struct cmdsym : qi::symbols<char, Command> {
cmdsym() { this->add("<", Command::left)
(">", Command::right)
(":action", Command::action);
}
} command;
};
int main() {
using It = std::string::const_iterator;
ExprGrammar<It> const p;
for (const std::string s : {
"a1 > :action\na1 (>) :action",
"(a1 > :action)\n(a1 (>) :action)",
"a1 (> :action)",
})
try {
It f(begin(s)), l(end(s));
Script parsed;
bool ok = qi::parse(f, l, p, parsed);
fmt::print("ok? {} {}\n", ok, parsed);
if (f != l)
fmt::print(" -- Remaining '{}'\n", std::string(f, l));
} catch (qi::expectation_failure<It> const& ef) {
fmt::print("{}\n", ef.what()); // TODO add more details :)
}
}
Prints
ok? true {{ID, RIGHT, ACTION}, {ID, RIGHT, ACTION}}
ok? true {{ID, RIGHT, ACTION}, {ID, RIGHT, ACTION}}
ok? true {{ID, RIGHT, ACTION}}
Note that I very subtly changed +(...) to *(...) so it also accepts empty lines. This may or may not be what you want.
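The same line-wise tokenizing idea can be sketched without Spirit, as a rough illustration only. The tokenize function below is hypothetical: it splits on '\n', treats blanks and parentheses as skippable (like the " \t\b\f()" skipper), maps the known symbols and classifies any run of [a-zA-Z_0-9] as an id. Like the *(...) version, an empty line yields an empty Line. Unlike the Spirit version, which stops and leaves the rest unparsed, this sketch throws on an unrecognized character.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

enum class Command { id, left, right, action };
using Line = std::vector<Command>;
using Script = std::vector<Line>;

// Hypothetical hand-written counterpart of
//   start = skip(char_(" \t\b\f()"))[line % eol];
//   line  = *(command | omit[key] >> attr(Command::id));
Script tokenize(std::string_view text) {
    static constexpr std::string_view skip = " \t\b\f()";  // parentheses are just skipped
    auto is_key_char = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; };
    Script script;
    std::size_t begin = 0;
    while (true) {
        std::size_t end = text.find('\n', begin);
        std::string_view s = text.substr(begin, end == std::string_view::npos ? end : end - begin);
        Line line;
        for (std::size_t i = 0; i < s.size();) {
            if (skip.find(s[i]) != std::string_view::npos) { ++i; continue; }
            if (s[i] == '<') { line.push_back(Command::left); ++i; }
            else if (s[i] == '>') { line.push_back(Command::right); ++i; }
            else if (s.substr(i, 7) == ":action") { line.push_back(Command::action); i += 7; }
            else if (is_key_char(s[i])) {
                while (i < s.size() && is_key_char(s[i])) ++i;
                line.push_back(Command::id);  // key text discarded, like omit[key]
            }
            else throw std::runtime_error(std::string("unexpected '") + s[i] + "'");
        }
        script.push_back(std::move(line));
        if (end == std::string_view::npos) break;
        begin = end + 1;
    }
    return script;
}
```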
I have to deal with a lot of files with a well-defined syntax and semantics, for example:
the first line is a header with special info
the other lines contain a key at the start of the line that tells you how to parse and handle the content of that line
if there is a comment, it starts with a given token
etc.
Now, boost::program_options, as far as I can tell, does pretty much the same job, but I only care about importing the content of those text files, without any extra work in between: just parse it and store it in my data structure.
The key point for me is that I would like to be able to do this parsing with:
regular expressions, since I need to detect different semantics and I can't really imagine another way to do this
error checking (corrupted file, unmatched keys even after parsing the entire file, etc.)
So, can I use this library for this job? Is there a more functional approach?
Okay, here's a starting point for a Spirit grammar:
_Name = "newmtl" >> lexeme [ +graph ];
_Ns = "Ns" >> double_;
_Ka = "Ka" >> double_ >> double_ >> double_;
_Kd = "Kd" >> double_ >> double_ >> double_;
_Ks = "Ks" >> double_ >> double_ >> double_;
_d = "d" >> double_;
_illum %= "illum" >> qi::int_ [ _pass = (_1>=0) && (_1<=10) ];
comment = '#' >> *(char_ - eol);
statement=
comment
| _Ns [ bind(&material::_Ns, _r1) = _1 ]
| _Ka [ bind(&material::_Ka, _r1) = _1 ]
| _Kd [ bind(&material::_Kd, _r1) = _1 ]
| _Ks [ bind(&material::_Ks, _r1) = _1 ]
| _d [ bind(&material::_d, _r1) = _1 ]
| _illum [ bind(&material::_illum, _r1) = _1 ]
;
_material = -comment % eol
>> _Name [ bind(&material::_Name, _val) = _1 ] >> eol
>> -statement(_val) % eol;
start = _material % -eol;
I only implemented the MTL file subset grammar from your sample files.
Note: This is a rather simplistic grammar. But, you know, first things first. In reality I'd probably consider using the keyword list parser from the Spirit repository. It has facilities to 'require' a certain number of occurrences for the different 'field types'.
Note: Spirit Karma (and some ~50 other lines of code) are only here for demonstration purposes.
With the following contents of untitled.mtl:
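For comparison, here is a rough sketch of reading the same MTL subset line by line with plain iostreams, without Spirit. The names Material and parse_mtl are hypothetical, Ns is read as a double, trailing text after the expected values is not rejected, and error reporting is reduced to exceptions carrying the line number. It mirrors the grammar's illum range check (0 to 10) and additionally rejects unknown keywords and statements before the first newmtl.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Rgb = std::array<double, 3>;

// Hypothetical plain-C++ counterpart of blender::mtl::material.
struct Material {
    std::string name;
    std::optional<double> Ns, d;
    std::optional<Rgb> Ka, Kd, Ks;
    std::optional<int> illum;
};

std::vector<Material> parse_mtl(std::istream& in) {
    std::vector<Material> result;
    std::string text;
    for (int lineno = 1; std::getline(in, text); ++lineno) {
        std::istringstream ls(text);
        std::string key;
        if (!(ls >> key) || key[0] == '#') continue;  // blank line or comment
        auto fail = [&](std::string const& what) {
            return std::runtime_error("line " + std::to_string(lineno) + ": " + what);
        };
        if (key == "newmtl") {
            Material m;
            if (!(ls >> m.name)) throw fail("missing material name");
            result.push_back(std::move(m));
            continue;
        }
        if (result.empty()) throw fail("'" + key + "' before any newmtl");
        Material& m = result.back();
        auto rgb = [&]() {
            Rgb c{};
            if (!(ls >> c[0] >> c[1] >> c[2])) throw fail("expected 3 numbers after " + key);
            return c;
        };
        if (key == "Ka") m.Ka = rgb();
        else if (key == "Kd") m.Kd = rgb();
        else if (key == "Ks") m.Ks = rgb();
        else if (key == "Ns" || key == "d") {
            double v;
            if (!(ls >> v)) throw fail("expected a number after " + key);
            (key == "Ns" ? m.Ns : m.d) = v;
        } else if (key == "illum") {
            int v;
            if (!(ls >> v) || v < 0 || v > 10) throw fail("illum must be 0..10");
            m.illum = v;
        } else throw fail("unknown keyword '" + key + "'");
    }
    return result;
}
```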
# Blender MTL File: 'None'
# Material Count: 2
newmtl None
Ns 0
Ka 0.000000 0.000000 0.000000
Kd 0.8 0.8 0.8
Ks 0.8 0.8 0.8
d 1
illum 2
# Added just for testing:
newmtl Demo
Ns 1
Ks 0.9 0.9 0.9
d 42
illum 7
The output reads
phrase_parse -> true
remaining input: ''
void dump(const T&) [with T = std::vector<blender::mtl::material>]
-----
material {
Ns:0
Ka:{r:0,g:0,b:0}
Kd:{r:0.8,g:0.8,b:0.8}
Ks:{r:0.8,g:0.8,b:0.8}
d:1
illum:2(Highlight on)
}
material {
Ns:1
Ka:(unspecified)
Kd:(unspecified)
Ks:{r:0.9,g:0.9,b:0.9}
d:42
illum:7(Transparency: Refraction on/Reflection: Fresnel on and Ray trace on)
}
-----
Here's the listing
#define BOOST_SPIRIT_USE_PHOENIX_V3
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp> // for debug output/streaming
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
namespace wavefront { namespace obj
{
} }
namespace blender { namespace mtl // material?
{
struct Ns { int exponent; }; // specular exponent
struct Reflectivity { double r, g, b; };
using Name = std::string;
using Ka = Reflectivity;
using Kd = Reflectivity;
using Ks = Reflectivity;
using dissolve_factor = double;
enum class illumination_model {
color, // 0 Color on and Ambient off
color_ambient, // 1 Color on and Ambient on
highlight, // 2 Highlight on
reflection_ray, // 3 Reflection on and Ray trace on
glass_ray, // 4 Transparency: Glass on
// Reflection: Ray trace on
fresnel_ray, // 5 Reflection: Fresnel on and Ray trace on
refract_ray, // 6 Transparency: Refraction on
// Reflection: Fresnel off and Ray trace on
refract_ray_fresnel,// 7 Transparency: Refraction on
// Reflection: Fresnel on and Ray trace on
reflection, // 8 Reflection on and Ray trace off
glass, // 9 Transparency: Glass on
// Reflection: Ray trace off
shadow_invis, // 10 Casts shadows onto invisible surfaces
};
struct material
{
Name _Name;
boost::optional<Ns> _Ns;
boost::optional<Reflectivity> _Ka;
boost::optional<Reflectivity> _Kd;
boost::optional<Reflectivity> _Ks;
boost::optional<dissolve_factor> _d;
boost::optional<illumination_model> _illum;
};
using mtl_file = std::vector<material>;
///////////////////////////////////////////////////////////////////////
// Debug output helpers
std::ostream& operator<<(std::ostream& os, blender::mtl::illumination_model o)
{
using blender::mtl::illumination_model;
switch(o)
{
case illumination_model::color: return os << "0(Color on and Ambient off)";
case illumination_model::color_ambient: return os << "1(Color on and Ambient on)";
case illumination_model::highlight: return os << "2(Highlight on)";
case illumination_model::reflection_ray: return os << "3(Reflection on and Ray trace on)";
case illumination_model::glass_ray: return os << "4(Transparency: Glass on/Reflection: Ray trace on)";
case illumination_model::fresnel_ray: return os << "5(Reflection: Fresnel on and Ray trace on)";
case illumination_model::refract_ray: return os << "6(Transparency: Refraction on/Reflection: Fresnel off and Ray trace on)";
case illumination_model::refract_ray_fresnel: return os << "7(Transparency: Refraction on/Reflection: Fresnel on and Ray trace on)";
case illumination_model::reflection: return os << "8(Reflection on and Ray trace off)";
case illumination_model::glass: return os << "9(Transparency: Glass on/Reflection: Ray trace off)";
case illumination_model::shadow_invis: return os << "10(Casts shadows onto invisible surfaces)";
default: return os << "ILLEGAL VALUE";
}
}
std::ostream& operator<<(std::ostream& os, blender::mtl::Reflectivity const& o)
{
return os << "{r:" << o.r << ",g:" << o.g << ",b:" << o.b << "}";
}
std::ostream& operator<<(std::ostream& os, blender::mtl::material const& o)
{
using namespace boost::spirit::karma;
return os << format("material {"
"\n\tNs:" << (auto_ | "(unspecified)")
<< "\n\tKa:" << (stream | "(unspecified)")
<< "\n\tKd:" << (stream | "(unspecified)")
<< "\n\tKs:" << (stream | "(unspecified)")
<< "\n\td:" << (stream | "(unspecified)")
<< "\n\tillum:" << (stream | "(unspecified)")
<< "\n}", o);
}
} }
BOOST_FUSION_ADAPT_STRUCT(blender::mtl::Reflectivity,(double, r)(double, g)(double, b))
BOOST_FUSION_ADAPT_STRUCT(blender::mtl::Ns, (int, exponent))
BOOST_FUSION_ADAPT_STRUCT(blender::mtl::material,
(boost::optional<blender::mtl::Ns>, _Ns)
(boost::optional<blender::mtl::Ka>, _Ka)
(boost::optional<blender::mtl::Kd>, _Kd)
(boost::optional<blender::mtl::Ks>, _Ks)
(boost::optional<blender::mtl::dissolve_factor>, _d)
(boost::optional<blender::mtl::illumination_model>, _illum))
namespace blender { namespace mtl { namespace parsing
{
template <typename It>
struct grammar : qi::grammar<It, qi::blank_type, mtl_file()>
{
template <typename T=qi::unused_type> using rule = qi::rule<It, qi::blank_type, T>;
rule<Name()> _Name;
rule<Ns()> _Ns;
rule<Reflectivity()> _Ka;
rule<Reflectivity()> _Kd;
rule<Reflectivity()> _Ks;
rule<dissolve_factor()> _d;
rule<illumination_model()> _illum;
rule<mtl_file()> start;
rule<material()> _material;
rule<void(material&)> statement;
rule<> comment;
grammar() : grammar::base_type(start)
{
using namespace qi;
using phx::bind;
using blender::mtl::material;
_Name = "newmtl" >> lexeme [ +graph ];
_Ns = "Ns" >> double_;
_Ka = "Ka" >> double_ >> double_ >> double_;
_Kd = "Kd" >> double_ >> double_ >> double_;
_Ks = "Ks" >> double_ >> double_ >> double_;
_d = "d" >> double_;
_illum %= "illum" >> qi::int_ [ _pass = (_1>=0) && (_1<=10) ];
comment = '#' >> *(char_ - eol);
statement=
comment
| _Ns [ bind(&material::_Ns, _r1) = _1 ]
| _Ka [ bind(&material::_Ka, _r1) = _1 ]
| _Kd [ bind(&material::_Kd, _r1) = _1 ]
| _Ks [ bind(&material::_Ks, _r1) = _1 ]
| _d [ bind(&material::_d, _r1) = _1 ]
| _illum [ bind(&material::_illum, _r1) = _1 ]
;
_material = -comment % eol
>> _Name [ bind(&material::_Name, _val) = _1 ] >> eol
>> -statement(_val) % eol;
start = _material % -eol;
BOOST_SPIRIT_DEBUG_NODES(
(start)
(statement)
(_material)
(_Name) (_Ns) (_Ka) (_Kd) (_Ks) (_d) (_illum)
(comment))
}
};
} } }
#include <fstream>
template <typename T>
void dump(T const& data)
{
using namespace boost::spirit::karma;
std::cout << __PRETTY_FUNCTION__
<< "\n-----\n"
<< format(stream % eol, data)
<< "\n-----\n";
}
void testMtl(const char* const fname)
{
std::ifstream mtl(fname, std::ios::binary);
mtl.unsetf(std::ios::skipws);
boost::spirit::istream_iterator f(mtl), l;
using namespace blender::mtl::parsing;
static const grammar<decltype(f)> p;
blender::mtl::mtl_file data;
bool ok = qi::phrase_parse(f, l, p, qi::blank, data);
std::cout << "phrase_parse -> " << std::boolalpha << ok << "\n";
std::cout << "remaining input: '" << std::string(f,l) << "'\n";
dump(data);
}
int main()
{
testMtl("untitled.mtl");
}
Yes, at least if your config file is as simple as a map of key-value pairs (something like a simple .ini file).
From documentation:
The program_options library allows program developers to obtain
program options, that is (name, value) pairs from the user, via
conventional methods such as command line and config file.
...
Options can be read from anywhere. Sooner or later the command line
will be not enough for your users, and you'll want config files or
maybe even environment variables. These can be added without
significant effort on your part.
See the "multiple sources" sample for details.
But if you need (or might need in the future) more sophisticated config files (XML, JSON or binary, for example), it is worth using a standalone library.
It's most likely possible, but not necessarily convenient. If you want to parse something, you want a parser; whether you use an existing one or write one yourself depends on what you are parsing.
If there is no way to parse your format with any existing tool, then just write your own parser. You can use lex/flex/flex++ with yacc/bison/bison++, or boost::spirit.
I think in the long run, learning to maintain your own parser will be more useful than forcibly adapting a boost::program_options config, though not as convenient as using an existing parser that already matches your needs.