Spirit Grammar To break Up a String by Number of Characters

I am learning to write Spirit grammars, and I am trying to create a basic base-16 to base-64 converter that takes in a string representing hex, for example:
49276d206b696c
parses out six or fewer characters at a time (fewer if the string's length isn't a multiple of six), and generates a base-64 encoded string from the input. One grammar I figured would probably work is something like this:
// 6 characters
`(qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F")[/*action*/]) |
// or 5 characters
(qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
qi::char_("0-9a-fA-F")[/*action*/]) | ...`
and so on, all the way down to one character, or having a different rule defined for each number of characters. I think there must be a better way to specify the grammar, though. I read about Spirit's repeat directive and was thinking maybe I could do something like
+(boost::spirit::repeat(1, 6)[qi::char_("0-9a-fA-F")][/*action on characters*/])
However, the compiler throws an error on this because of the semantic action portion of the grammar. Is there a simpler way to specify a grammar that operates on six or fewer characters at a time?
Edit
Here is what I have done so far...
base16convertergrammar.hpp
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <iostream>
namespace grammar {
namespace qi = boost::spirit::qi;
void toBase64(const std::string& p_input, std::string& p_output)
{
if (p_input.length() < 6)
{
// pad length
}
// use back inserter and generator to append to end of p_output.
}
template <typename Iterator>
struct Base16Grammar : qi::grammar<Iterator, std::string()>
{
Base16Grammar() : Base16Grammar::base_type(start, "base16grammar"),
m_base64String()
{
// get six characters at a time and send them off to be encoded
// if there is less than six characters just parse what we have
start = +(boost::spirit::repeat(1, 6)[qi::char_("0-9a-fA-F")][boost::phoenix::bind(toBase64, qi::_1,
boost::phoenix::ref(m_base64String))]);
}
qi::rule<Iterator, std::string()> start;
std::string m_base64String;
};
}
And here is the usage...
base16converter.cpp
#include "base16convertergrammar.hpp"
const std::string& convertHexToBase64(const std::string& p_hexString)
{
grammar::Base16Grammar<std::string::const_iterator> g;
bool r = boost::spirit::qi::parse(p_hexString.begin(), p_hexString.end(), g);
}
int main(int argc, char** argv)
{
std::string test("49276d206b696c6c");
convertHexToBase64(test);
}

First of all, repeat()[] exposes a vector, so the semantic action receives a std::vector<char>, not a std::string. The signature would need to be:
void toBase64(const std::vector<char>& p_input, std::string& p_output)
Secondly, please don't do all that work. You don't tell us what the input means, but as long as you want to group it in sixes, I'm assuming you want them interpreted as /something/. You could e.g. use the int_parser:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <cassert>
#include <cstdint>
#include <string>
#include <iostream>
namespace grammar {
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;
    template <typename Iterator>
    struct Base16Grammar : qi::grammar<Iterator, std::string()>
    {
        Base16Grammar() : Base16Grammar::base_type(start, "base16grammar")
        {
            // parse 1 to 6 hex digits as one integer and append its decimal form to the result
            start = +qi::int_parser<uint64_t, 16, 1, 6>() [ qi::_val += to_string(qi::_1) + "; " ];
        }
      private:
        // lazy Phoenix wrapper around std::to_string, usable inside semantic actions
        struct to_string_f { template <typename T> std::string operator()(T const& v) const { return std::to_string(v); } };
        px::function<to_string_f> to_string;
        qi::rule<Iterator, std::string()> start;
    };
}
std::string convertHexToBase64(const std::string& p_hexString)
{
    grammar::Base16Grammar<std::string::const_iterator> g;
    std::string result;
    bool r = boost::spirit::qi::parse(p_hexString.begin(), p_hexString.end(), g, result);
    assert(r);
    return result;
}
int main()
{
    for (std::string test : {"49276d206b696c6c"})
        std::cout << test << " -> " << convertHexToBase64(test) << "\n";
}
Prints
49276d206b696c6c -> 4794221; 2124649; 27756;
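To make the grouping concrete, here is a minimal plain-C++ sketch, without Spirit, of what the int_parser<uint64_t, 16, 1, 6> loop computes: it takes up to six hex digits at a time and interprets each group as one integer. The function name hexGroups is hypothetical and exists only for this illustration, and the code assumes the input contains nothing but hex digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of Spirit): split a hex string into groups of
// at most `width` digits and convert each group to an integer.
// Assumption: the input consists only of hex digits; no validation is done here.
std::vector<std::uint64_t> hexGroups(const std::string& hex, std::size_t width = 6)
{
    std::vector<std::uint64_t> groups;
    for (std::size_t pos = 0; pos < hex.size(); pos += width) {
        // substr clamps the length, so the last group may be shorter than `width`
        groups.push_back(std::stoull(hex.substr(pos, width), nullptr, 16));
    }
    return groups;
}
```

For the input 49276d206b696c6c this yields 4794221, 2124649 and 27756, the same three values printed by the grammar.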

Going out on a limb, you just want to transcode hex-encoded binary into base64.
Since you're already using Boost:
Live On Coliru
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/insert_linebreaks.hpp>
#include <boost/archive/iterators/transform_width.hpp>
// for hex decoding
#include <boost/iterator/function_input_iterator.hpp>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <iostream>
#include <functional>
std::string convertHexToBase64(const std::string &hex) {
    // generator that yields one decoded byte (two hex digits) per call
    struct get_byte_f {
        using result_type = uint8_t;
        std::string::const_iterator hex_it;
        result_type operator()() {
            auto nibble = [](uint8_t ch) {
                if (!std::isxdigit(ch)) throw std::runtime_error("invalid hex input");
                return std::isdigit(ch) ? ch - '0' : std::tolower(ch) - 'a' + 10;
            };
            auto hi = nibble(*hex_it++);
            auto lo = nibble(*hex_it++);
            return hi << 4 | lo;
        }
    } get_byte{ hex.begin() };
    using namespace boost::archive::iterators;
    using It = boost::iterators::function_input_iterator<get_byte_f, size_t>;
    typedef insert_linebreaks<      // insert line breaks every 72 characters
        base64_from_binary<         // convert binary values to base64 characters
            transform_width<        // retrieve 6 bit integers from a sequence of 8 bit bytes
                It, 6, 8> >,
        72> B64;                    // compose all the above operations in to a new iterator
    return { B64(It{get_byte, 0}), B64(It{get_byte, hex.size()/2}) };
}
int main() {
    for (std::string test : {
            "49276d206b696c6c",
            "736f6d65206c656e67746879207465787420746f2073686f77207768617420776f756c642068617070656e206174206c696e6520777261700a"
        })
    {
        std::cout << " === hex: " << test << "\n" << convertHexToBase64(test) << "\n";
    }
}
Prints
=== hex: 49276d206b696c6c
SSdtIGtpbGw
=== hex: 736f6d65206c656e67746879207465787420746f2073686f77207768617420776f756c642068617070656e206174206c696e6520777261700a
c29tZSBsZW5ndGh5IHRleHQgdG8gc2hvdyB3aGF0IHdvdWxkIGhhcHBlbiBhdCBsaW5lIHdy
YXAK
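Note that the first output carries no '=' padding; standard padded Base64 for those eight bytes would be SSdtIGtpbGw=. If padded output is wanted, or Boost is not an option, a standard-library-only sketch could look like the following. The name hexToBase64 is made up for this illustration, and the code assumes an even number of hex digits and inserts no line breaks.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-alone helper: decode hex to bytes, then base64-encode
// three bytes at a time, padding the last group with '='.
std::string hexToBase64(const std::string& hex)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd number of hex digits");

    // convert one hex digit to its value 0..15
    auto nibble = [](unsigned char ch) -> unsigned {
        if (!std::isxdigit(ch)) throw std::invalid_argument("invalid hex digit");
        return std::isdigit(ch) ? ch - '0' : std::tolower(ch) - 'a' + 10;
    };

    std::string bytes;
    for (std::size_t i = 0; i < hex.size(); i += 2)
        bytes.push_back(static_cast<char>(nibble(hex[i]) << 4 | nibble(hex[i + 1])));

    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        std::size_t n = std::min<std::size_t>(3, bytes.size() - i); // bytes in this group
        std::uint32_t v = 0;                                        // 24-bit buffer
        for (std::size_t k = 0; k < 3; ++k)
            v = v << 8 | (k < n ? static_cast<unsigned char>(bytes[i + k]) : 0u);
        out.push_back(table[(v >> 18) & 63]);
        out.push_back(table[(v >> 12) & 63]);
        out.push_back(n > 1 ? table[(v >> 6) & 63] : '=');
        out.push_back(n > 2 ? table[v & 63] : '=');
    }
    return out;
}
```

Three input bytes (six hex digits) map to exactly four output characters, which is where the group size of six in the question comes from.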

Related

How to skip (not output) tokens in Boost Spirit?

I'm new to Boost Spirit. I haven't been able to find examples for some simple things. For example, suppose I have an even number of space-delimited integers. (That matches *(qi::int_ >> qi::int_). So far so good.) I'd like to save just the even ones to a std::vector<int>. I've tried a variety of things like *(qi::int_ >> qi::skip[qi::int_]) https://godbolt.org/z/KPToo3xh6 but that still records every int, not just even ones.
#include <stdexcept>
#include <fmt/format.h>
#include <fmt/ranges.h>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
// Example based off https://raw.githubusercontent.com/bingmann/2018-cpp-spirit-parsing/master/spirit1_simple.cpp:
// Helper to run a parser, check for errors, and capture the results.
template <typename Parser, typename Skipper, typename ... Args>
void PhraseParseOrDie(
const std::string& input, const Parser& p, const Skipper& s,
Args&& ... args)
{
std::string::const_iterator begin = input.begin(), end = input.end();
boost::spirit::qi::phrase_parse(begin, end, p, s, std::forward<Args>(args) ...);
if (begin != end) {
fmt::print("Unparseable: \"{}\"\n", std::string(begin, end));
}
}
void test(std::string input)
{
std::vector<int> out_int_list;
PhraseParseOrDie(
// input string
input,
// parser grammar
*(qi::int_ >> qi::skip[qi::int_]),
// skip parser
qi::space,
// output list
out_int_list);
fmt::print("test() parse result: {}\n", out_int_list);
}
int main(int argc, char* argv[])
{
test("12345 42 5 2");
return 0;
}
Prints
test() parse result: [12345, 42, 5, 2]

You're looking for qi::omit[]:
*(qi::int_ >> qi::omit[qi::int_])
Note that you can also implicitly omit things by declaring a rule without an attribute type (which makes it bind to qi::unused_type for silent compatibility).
Also note that if you're making an ad hoc, sloppy grammar to scan for certain "landmarks" in a larger body of text, consider spirit::repository::qi::seek, which can be significantly faster and more expressive.
Finally, note that Spirit X3 comes with a similar seek[] directive out of the box.
Simplified Demo
Much simplified: https://godbolt.org/z/EY4KdxYv9
#include <fmt/ranges.h>
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>
// Parse whitespace-separated integers, keeping only the first of each pair.
void test(std::string const& input)
{
    std::vector<int> out_int_list;
    namespace qi = boost::spirit::qi;
    qi::parse(input.begin(), input.end(),                          //
        qi::expect[                                                //
            qi::skip(qi::space)[                                   //
                *(qi::int_ >> qi::omit[qi::int_]) > qi::eoi]],     //
        out_int_list);
    fmt::print("test() parse result: {}\n", out_int_list);
}
int main() { test("12345 42 5 2"); }
Prints
test() parse result: [12345, 5]
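For comparison, the effect of omit[] (consume the second integer of each pair, but do not store it) can be pictured with a plain standard-library loop. This is only an illustrative sketch; keepFirstOfEachPair is a hypothetical name, and the code is not how Spirit implements the directive.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical illustration: read whitespace-separated integers in pairs,
// store the first of each pair and discard the second.
std::vector<int> keepFirstOfEachPair(const std::string& input)
{
    std::istringstream in(input);
    std::vector<int> kept;
    int first = 0, second = 0;
    while (in >> first >> second) {  // both must parse, like int_ >> omit[int_]
        kept.push_back(first);       // `second` is consumed but never stored
    }
    return kept;
}
```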
But Wait
Seeing your comment
// Parse a bracketed list of integers with spaces between symbols
Did you really mean that? Because that sounds a ton more like:
'[' > qi::auto_ % +qi::graph > ']'
See it live: https://godbolt.org/z/eK6Thzqea
//#define BOOST_SPIRIT_DEBUG
#include <fmt/ranges.h>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_auto.hpp>
//#include <boost/fusion/adapted.hpp>
// Helper to run a parser, check for errors, and capture the results.
template <typename T> auto test(std::string const& input) {
std::vector<T> out;
using namespace boost::spirit::qi;
rule<std::string::const_iterator, T()> v = auto_;
BOOST_SPIRIT_DEBUG_NODE(v);
phrase_parse( //
input.begin(), input.end(), //
'[' > -v % lexeme[+(graph - ']')] > ']', //
space, out);
return out;
}
int main() {
fmt::print("ints: {}\n", test<int>("[12345 USD 5 PUT]"));
fmt::print("doubles: {}\n", test<double>("[ 1.2345 42 -inf 'hello' 3.1415 ]"));
}
Prints
ints: [12345, 5]
doubles: [1.2345, -inf, 3.1415]

Boost::Spirit doubles character when followed by a default value

I use boost::spirit to parse (a part) of a monomial like x, y, xy, x^2, x^3yz. I want to save the variables of the monomial into a map, which also stores the corresponding exponent. Therefore the grammar should also save the implicit exponent of 1 (so x stores as if it was written as x^1).
start = +(potVar);
potVar=(varName>>'^'>>exponent)|(varName>> qi::attr(1));// First try: This doubles the variable name
//potVar = varName >> (('^' >> exponent) | qi::attr(1));// Second try: This works as intended
exponent = qi::int_;
varName = qi::char_("a-z");
When using the default attribute as in the line "First try", Spirit doubles the variable name.
Everything works as intended when using the default attribute as in the line "Second try".
'First try' reads a variable x and stores the pair [xx, 1].
'Second try' reads a variable x and stores the pair [x, 1].
I think I solved the original problem myself. The second try works. However, I don't see how I doubled the variable name. Because I am about to get familiar with boost::spirit, which is a collection of challenges for me, and there are probably more to come, I would like to understand this behavior.
This is the whole code to recreate the problem. The frame of the grammar is copied from a presentation of the KIT https://panthema.net/2018/0912-Boost-Spirit-Tutorial/ , and Stackoverflow was already very helpful, when I needed the header, which enables me to use the std::pair.
#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <cmath>
#include <map>
#include <utility>//for std::pair
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp> //https://stackoverflow.com/questions/53953642/parsing-map-of-variants-with-boost-spirit-x3
namespace qi = boost::spirit::qi;
template <typename Parser, typename Skipper, typename ... Args>
void PhraseParseOrDie(
const std::string& input, const Parser& p, const Skipper& s,
Args&& ... args)
{
std::string::const_iterator begin = input.begin(), end = input.end();
boost::spirit::qi::phrase_parse(
begin, end, p, s, std::forward<Args>(args) ...);
if (begin != end) {
std::cout << "Unparseable: "
<< std::quoted(std::string(begin, end)) << std::endl;
throw std::runtime_error("Parse error");
}
}
class ArithmeticGrammarMonomial : public qi::grammar<
std::string::const_iterator,
std::map<std::string, int>(), qi::space_type>
{
public:
using Iterator = std::string::const_iterator;
ArithmeticGrammarMonomial() : ArithmeticGrammarMonomial::base_type(start)
{
start = +(potVar);
potVar=(varName>>'^'>>exponent)|(varName>> qi::attr(1));
//potVar = varName >> (('^' >> exponent) | qi::attr(1));
exponent = qi::int_;
varName = qi::char_("a-z");
}
qi::rule<Iterator, std::map<std::string, int>(), qi::space_type> start;
qi::rule<Iterator, std::pair<std::string, int>(), qi::space_type> potVar;
qi::rule<Iterator, int()> exponent;
qi::rule<Iterator, std::string()> varName;
};
void test2(std::string input)
{
std::map<std::string, int> out_map;
PhraseParseOrDie(input, ArithmeticGrammarMonomial(), qi::space, out_map);
std::cout << "test2() parse result: "<<std::endl;
for(auto &it: out_map)
std::cout<< it.first<<it.second << std::endl;
}
/******************************************************************************/
int main(int argc, char* argv[])
{
std::cout << "Parse Monomial 1" << std::endl;
test2(argc >= 2 ? argv[1] : "x^3y^1");
test2(argc >= 2 ? argv[1] : "xy");
return 0;
}
Live demo

"I think I solved the original problem myself. The second try works."
Indeed. It's how I'd do this (always match the AST with your parser expressions).
"However, I don't see how I doubled the variable name."
It's due to backtracking with container attributes. They don't get rolled back. So the first branch parses varName into the string and then fails at '^'; the parser backtracks into the second branch, which parses varName again and appends to the same string.
boost::spirit::qi duplicate parsing on the output
Understanding Boost.spirit's string parser
Parsing with Boost::Spirit (V2.4) into container
Boost Spirit optional parser and backtracking
boost::spirit alternative parsers return duplicates
It can also crop up with semantic actions:
Boost Semantic Actions causing parsing issues
Boost Spirit optional parser and backtracking
In short:
match your AST structure in your rule expression, or use qi::hold to force the issue (at performance cost)
avoid semantic actions (Boost Spirit: "Semantic actions are evil"?)
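The doubling can be reproduced without Spirit. The sketch below is a hand-written analogy, not Spirit's actual implementation: both alternatives append to the same output string, and nothing undoes the first branch's append when it fails. All function names are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mini-parser for one lowercase letter; appends to `name` on success.
bool parseVarName(const std::string& in, std::size_t& pos, std::string& name)
{
    if (pos < in.size() && in[pos] >= 'a' && in[pos] <= 'z') {
        name.push_back(in[pos++]);
        return true;
    }
    return false;
}

// Mimics (varName >> '^') | varName with a shared container attribute.
// With hold == false the partial result of the failed first branch stays in `name`.
std::string parseAlternative(const std::string& in, bool hold)
{
    std::string name;
    std::size_t pos = 0;
    std::string backup = name;               // snapshot, only used when hold == true
    if (parseVarName(in, pos, name) && pos < in.size() && in[pos] == '^')
        return name;                         // first branch matched
    pos = 0;                                 // backtrack the input position...
    if (hold) name = backup;                 // ...and the attribute only if asked to
    parseVarName(in, pos, name);             // second branch appends again
    return name;
}
```

Called with hold set to false, the input x yields xx, as in the question; with hold set to true it yields x, which is roughly what qi::hold[] achieves by parsing into a copy of the attribute.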
For inspiration, here's a simplified take using Spirit X3
Live On Compiler Explorer
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
#include <map>
namespace Parsing {
namespace x3 = boost::spirit::x3;
auto exponent = '^' >> x3::int_ | x3::attr(1);
auto varName = x3::repeat(1)[x3::char_("a-z")];
auto potVar
= x3::rule<struct P, std::pair<std::string, int>>{}
= varName >> exponent;
auto start = x3::skip(x3::space)[+potVar >> x3::eoi];
template <typename T = x3::unused_type>
void StrictParse(std::string_view input, T&& into = {})
{
auto f = input.begin(), l = input.end();
if (!x3::parse(f, l, start, into)) {
fmt::print(stderr, "Error at: '{}'\n", std::string(f, l));
throw std::runtime_error("Parse error");
}
}
} // namespace Parsing
void test2(std::string input) {
std::map<std::string, int> out_map;
Parsing::StrictParse(input, out_map);
fmt::print("{} -> {}\n", input, out_map);
}
int main() {
for (auto s : {"x^3y^1", "xy"})
test2(s);
}
Prints
x^3y^1 -> [("x", 3), ("y", 1)]
xy -> [("x", 1), ("y", 1)]
Bonus Notes
It looks to me like you should be more careful. Even if you assume that all variables are 1 letter and no terms can occur (only factors), then still you need to correctly handle x^5y^2x to be x^6y^2 right?
Here's a Qi version that uses semantic actions to correctly accumulate like factors:
Live On Coliru
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <map>
namespace qi = boost::spirit::qi;
using Iterator = std::string::const_iterator;
using Monomial = std::map<char, int>;
struct ArithmeticGrammarMonomial : qi::grammar<Iterator, Monomial()> {
ArithmeticGrammarMonomial() : ArithmeticGrammarMonomial::base_type(start) {
using namespace qi;
exp_ = '^' >> int_ | attr(1);
start = skip(space)[ //
+(char_("a-z") >> exp_)[_val[_1] += _2] //
];
}
private:
qi::rule<Iterator, Monomial()> start;
qi::rule<Iterator, int(), qi::space_type> exp_;
};
void do_test(std::string_view input) {
Monomial output;
static const ArithmeticGrammarMonomial p;
Iterator f(begin(input)), l(end(input));
qi::parse(f, l, qi::eps > p, output);
std::cout << std::quoted(input) << " -> " << std::endl;
for (auto& [var,exp] : output)
std::cout << " - " << var << '^' << exp << std::endl;
}
int main() {
for (auto s : {"x^3y^1", "xy", "x^5y^2x"})
do_test(s);
}
Prints
"x^3y^1" ->
- x^3
- y^1
"xy" ->
- x^1
- y^1
"x^5y^2x" ->
- x^6
- y^2

Parse only specific numbers with Boost.Spirit

How can I build a Boost.Spirit parser that matches only numbers in a certain range?
Consider the simple parser qi::uint_. It matches all unsigned integers. Is it possible to construct a parser that matches the numbers 0 to 12345 but not 12346 and larger?

One way is to attach to the qi::uint_ parser a semantic action that checks the parser's attribute and sets the semantic action's third parameter accordingly:
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
qi::rule<std::string::const_iterator, unsigned(), qi::ascii::space_type> rule;
const auto not_greater_than_12345 = [](const unsigned& attr, auto&, bool& pass) {
pass = !(attr > 12345U);
};
rule %= qi::uint_[not_greater_than_12345];
std::vector<std::string> numbers{"0", "123", "1234", "12345", "12346", "123456"};
for (const auto& number : numbers) {
unsigned result;
auto iter = number.cbegin();
if (qi::phrase_parse(iter, number.cend(), rule, qi::ascii::space, result) &&
iter == number.cend()) {
std::cout << result << '\n'; // 0 123 1234 12345
}
}
}
Live on Wandbox
The semantic action can be written more concisely with the Phoenix placeholders _pass and _1:
#include <iostream>
#include <string>
#include <vector>
#include <boost/phoenix/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
qi::rule<std::string::const_iterator, unsigned(), qi::ascii::space_type> rule;
rule %= qi::uint_[qi::_pass = !(qi::_1 > 12345U)];
std::vector<std::string> numbers{"0", "123", "1234", "12345", "12346", "123456"};
for (const auto& number : numbers) {
unsigned result;
auto iter = number.cbegin();
if (qi::phrase_parse(iter, number.cend(), rule, qi::ascii::space, result) &&
iter == number.cend()) {
std::cout << result << '\n'; // 0 123 1234 12345
}
}
}
Live on Wandbox
From Semantic Actions with Parsers
The possible signatures for functions to be used as semantic actions are:
...
template <typename Attrib, typename Context>
void fa(Attrib& attr, Context& context, bool& pass);
... Here Attrib is the attribute type of the parser attached to the semantic action. ... The third parameter, pass, can be used by the semantic action to force the associated parser to fail. If pass is set to false the action parser will immediately return false as well, while not invoking p and not generating any output.
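The same idea, parse first and then veto the result, can be sketched without Spirit using std::from_chars (C++17). The helper name parseUpTo is hypothetical, and the limit is passed in as a parameter rather than hard-coded.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: parse an unsigned decimal number that spans the whole
// string and reject it if it exceeds `limit`.
std::optional<unsigned> parseUpTo(const std::string& text, unsigned limit)
{
    unsigned value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt; // not a complete number
    if (value > limit) return std::nullopt;                    // the "pass = false" case
    return value;
}
```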

Trace the position of boost::spirit

How could I trace the position of the attribute of spirit?
A simple example
template <typename Iterator>
bool trace_numbers(Iterator first, Iterator last)
{
using boost::spirit::qi::double_;
using boost::spirit::qi::phrase_parse;
using boost::spirit::ascii::space;
bool r = phrase_parse(first, last,
// Begin grammar
(
double_ % ','
)
,
// End grammar
space);
if (first != last) // fail if we did not get a full match
return false;
return r;
}
I want to trace the position (line and column) of each "double_". I found line_pos_iterator but have no idea how to use it. I also found multi-pass, but I don't know whether it can be used to trace positions (and if it can, how?).

After some research, I found that using spirit::lex alone, or combining it with spirit::qi, is a solution.
#include <boost/config/warning_disable.hpp>
//[wcp_includes
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_statement.hpp>
#include <boost/spirit/include/phoenix_container.hpp>
//]
#include <iostream>
#include <string>
#include <vector>
namespace spiritParser
{
//[wcp_namespaces
using namespace boost::spirit;
using namespace boost::spirit::ascii;
//[wcp_token_ids
enum tokenids
{
IDANY = lex::min_token_id + 10
};
//]
//[wcp_token_definition
template <typename Lexer>
struct number_position_track_tokens : lex::lexer<Lexer>
{
number_position_track_tokens()
{
// define patterns (lexer macros) to be used during token definition
// below
this->self.add_pattern
("NUM", "[0-9]+")
;
number = "{NUM}"; // reference the pattern 'NUM' as defined above
this->self.add
(number) // no token id is needed here
(".", IDANY) // characters are usable as tokens as well
;
}
lex::token_def<std::string> number;
};
//]
template<typename Iterator>
struct numberGrammar : qi::grammar<Iterator>
{
template <typename TokenDef>
numberGrammar(TokenDef const &tok)
: numberGrammar::base_type(start)
, num(0), position(0)
{
using boost::phoenix::ref;
using boost::phoenix::push_back;
using boost::phoenix::size;
//"34, 44, 55, 66, 77, 88"
start = *( tok.number [++ref(num),
boost::phoenix::push_back(boost::phoenix::ref(numPosition), boost::phoenix::ref(position)),
ref(position) += size(_1)
]
| qi::token(IDANY) [++ref(position)]
)
;
}
std::size_t num, position;
std::vector<size_t> numPosition;
qi::rule<Iterator> start;
};
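void lex_word_count_1()
{
using token_type = lex::lexertl::token<char const*, boost::mpl::vector<std::string> >;
// lexer and iterator types, as in the Boost.Lex word-count example this code is based on
using lexer_type = lex::lexertl::lexer<token_type>;
using iterator_type = number_position_track_tokens<lexer_type>::iterator_type;
number_position_track_tokens<lexer_type> word_count; // Our lexer
numberGrammar<iterator_type> g (word_count); // Our parser
// the input to tokenize and parse
std::string str ("34, 44, 55, 66, 77, 88");
char const* first = str.c_str();
char const* last = &first[str.size()];
// tokenize the input and feed the tokens to the grammar
bool r = lex::tokenize_and_parse(first, last, word_count, g);
if (r) {
std::cout << "nums: " << g.num << ", size: " << g.position <<std::endl;
for(auto data : g.numPosition){
std::cout<<"position : "<<data<<std::endl;
}
}
else {
std::string rest(first, last);
std::cerr << "Parsing failed\n" << "stopped at: \""
<< rest << "\"\n";
}
}
}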
This is the example from the documentation, Quickstart 3 - Counting Words Using a Parser, with some alterations. In my humble opinion, this is far from easy for such a small task. Unless the patterns are hard to describe with std::regex, or you need more speed, or both, using spirit::lex to track the locations of simple patterns (like the example I show) is overkill.
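As the remark about std::regex suggests, a simple pattern can be located without a lexer. Here is a minimal sketch, assuming the numbers are plain runs of digits and that a zero-based character offset is enough; numberPositions is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the zero-based offset of every run of digits.
std::vector<std::size_t> numberPositions(const std::string& text)
{
    static const std::regex number("[0-9]+");
    std::vector<std::size_t> positions;
    for (std::sregex_iterator it(text.begin(), text.end(), number), end; it != end; ++it)
        positions.push_back(static_cast<std::size_t>(it->position()));
    return positions;
}
```

For the input "34, 44, 55, 66, 77, 88" the offsets are 0, 4, 8, 12, 16 and 20.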

trigger warning from boost spirit parser

How can I add warnings to a Boost Spirit parser?
Edit: ... that could report the issue with position
For example if I have an integer parser:
('0' >> oct)
| int_
I would like to be able to do something like this:
('0' >> oct)
| "-0" --> trigger warning("negative octal values are not supported, it will be interpreted as negative decimal value and the leading 0 will be ignored")
| int_

Q. Can I create my own callback? How?
A. Sure. Any way you'd normally do it in C++ (or look at Boost Signals2 and/or Boost Log)
parser(std::function<bool(std::string const& s)> callback)
: parser::base_type(start),
callback(callback)
{
using namespace qi;
start %=
as_string[+graph]
[ _pass = phx::bind(callback, _1) ]
% +space
;
BOOST_SPIRIT_DEBUG_NODES((start));
}
As you can see, you can even make the handler decide whether the warning should be ignored or cause the match to fail.
UPDATE #1 I've extended the sample to show some of the unrelated challenges you mentioned in the comments (position, duplicate checking). Hope this helps
Here's a simple demonstration: see it Live on Coliru (Word)
UPDATE #2 I've even made it (a) store the source information instead of the iterators, (b) made it "work" with floats (or any other exposed attribute type, really).
Note how uncannily similar it is, s/Word/Number/, basically: Live On Coliru (Number)
#define BOOST_RESULT_OF_USE_DECLTYPE // needed for gcc 4.7, not clang++
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <functional>
#include <iostream>
#include <set>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
// okay, so you want position reporting (actually unrelated):
#include <boost/spirit/include/support_line_pos_iterator.hpp>
using It = boost::spirit::line_pos_iterator<std::string::const_iterator>;
// AST type that represents a Number 'token' (with source and location
// information)
struct Number
{
double value;
size_t line_pos;
std::string source;
explicit Number(double value = 0.0, boost::iterator_range<It> const& range = {})
:
value(value),
line_pos(get_line(range.begin())),
source(range.begin(), range.end())
{}
bool operator< (const Number& other) const { return (other.value - value) > 0.0001; }
};
// the exposed attribute for the parser:
using Words = std::set<Number>;
// the callback signature for our warning; you could make it more like
// `on_error` so it takes the iterators directly, but again, I'm doing the
// simple thing for the demo
using Callback = std::function<bool(Number const& s)>;
template <typename It>
struct parser : qi::grammar<It, Words()>
{
parser(Callback warning)
: parser::base_type(start),
warning(warning)
{
using namespace qi;
auto check_unique = phx::end(_val) == phx::find(_val, _1);
word =
raw [ double_ [ _a = _1 ] ] [ _val = phx::construct<Number>(_a, _1) ]
;
start %=
- word [ _pass = check_unique || phx::bind(warning, _1) ]
% +space
>> eoi
;
}
private:
Callback warning;
qi::rule<It, Number(), qi::locals<double> > word;
qi::rule<It, Words()> start;
};
int main(int argc, const char *argv[])
{
// parse command line arguments
const auto flags = std::set<std::string> { argv+1, argv+argc };
const bool fatal_warnings = end(flags) != flags.find("-Werror");
// test input
const std::string input("2.4 2.7 \n\n\n-inf \n\nNaN 88 -2.40001 \n3.14 240001e-5\n\ninf");
// warning handler
auto warning_handler = [&](Number const& w) {
std::cerr << (fatal_warnings?"Error":"Warning")
<< ": Near-identical entry '" << w.source << "' at L:" << w.line_pos << "\n";
return !fatal_warnings;
};
// do the parse
It f(begin(input)), l(end(input));
bool ok = qi::parse(f, l, parser<It>(warning_handler));
// report results
if (ok) std::cout << "parse success\n";
else std::cerr << "parse failed\n";
if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
// exit code
return ok? 0 : 255;
}
Prints:
Warning: Near-identical entry 'NaN' at L:6
Warning: Near-identical entry '240001e-5' at L:7
parse success
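For the "-0" case from the question, the callback idea can be sketched in plain C++ as follows. This is an illustration only, not Spirit code: parseIntLiteral and WarningHandler are hypothetical names, the handler's return value decides whether parsing continues, and the reported offset is simply the start of the literal.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <optional>
#include <string>

// Hypothetical warning callback: receives a message and the offset it refers to;
// returning false turns the warning into a failure.
using WarningHandler = std::function<bool(const std::string& message, std::size_t offset)>;

// Hypothetical parser for the three alternatives in the question:
// '0' followed by octal digits, "-0..." (warn, then treat as decimal), or a decimal int.
std::optional<long> parseIntLiteral(const std::string& text, const WarningHandler& warn)
{
    if (text.empty()) return std::nullopt;
    try {
        std::size_t used = 0;
        long value = 0;
        if (text.size() > 1 && text[0] == '0') {
            value = std::stol(text, &used, 8);             // octal literal
        } else {
            if (text.size() > 2 && text[0] == '-' && text[1] == '0' &&
                !warn("negative octal values are not supported; parsing as decimal", 0))
                return std::nullopt;                       // handler made the warning fatal
            value = std::stol(text, &used, 10);            // leading zeros are ignored
        }
        if (used != text.size()) return std::nullopt;      // trailing garbage
        return value;
    } catch (const std::exception&) {
        return std::nullopt;                               // not a number or out of range
    }
}
```

A handler that returns true lets "-017" through as -17 after reporting the warning; one that returns false makes the same input fail, mirroring the -Werror switch in the demo.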
