128 bit string to array using boost::spirit::* - c++

I am just getting started with boost::spirit::*. I am trying to parse a 128-bit value, given as a 32-digit hex string, into a plain C array of matching size. I wrote a short test that does the job:
boost::spirit::qi::int_parser< boost::uint8_t, 16, 2, 2 > uint8_hex;
std::string src( "00112233445566778899aabbccddeeff" );
boost::uint8_t dst[ 16 ];
bool r = true;
for( std::size_t i = 0; r && i < 16; ++i )
{
    // each step parses exactly two hex digits into one byte; stop at the first failure
    r = boost::spirit::qi::parse( src.begin( ) + 2 * i, src.begin( ) + 2 * i + 2, uint8_hex, dst[ i ] );
}
I have a feeling this is not the smartest way to do it :) Any ideas how to define a rule so I can avoid the loop?
Update:
In the meantime I came up with the following code, which does the job very well:
using namespace boost::spirit;
using namespace boost::phoenix;
qi::int_parser< boost::uint8_t, 16, 2, 2 > uint8_hex;
std::string src( "00112233445566778899aabbccddeeff" );
boost::uint8_t dst[ 16 ];
std::size_t i = 0;
bool r = qi::parse( src.begin( ),
src.end( ),
qi::repeat( 16 )[ uint8_hex[ ref( dst )[ ref( i )++ ] = qi::_1 ] ] );

Straying a bit from the literal question: if you really just want to parse the hexadecimal representation of a 128-bit integer, you can do so portably using the uint128_t type from Boost Multiprecision:
qi::int_parser<uint128_t, 16, 16, 16> uint128_hex;
uint128_t parsed;
bool r = qi::parse(f, l, uint128_hex, parsed);
This is likely the quickest way, especially on platforms where 128-bit types are supported in the instruction set.
Live On Coliru
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main() {
using boost::multiprecision::uint128_t;
using It = std::string::const_iterator;
qi::int_parser<uint128_t, 16, 16, 16> uint128_hex;
std::string const src("00112233445566778899aabbccddeeff");
auto f(src.begin()), l(src.end());
uint128_t parsed;
bool r = qi::parse(f, l, uint128_hex, parsed);
if (r) std::cout << "Parse succeeded: " << std::hex << std::showbase << parsed << "\n";
else std::cout << "Parse failed at '" << std::string(f,l) << "'\n";
}
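If Boost Multiprecision isn't available, here is a minimal standard C++17 sketch of the same idea, assuming the input is exactly 32 hex digits with no prefix or sign. The U128 struct and parse_hex128 function are hypothetical stand-ins (not part of Boost): the value is split into two 64-bit halves and std::from_chars does the digit parsing.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical 128-bit value stored as two 64-bit halves (not Boost's uint128_t).
struct U128 {
    std::uint64_t hi = 0;
    std::uint64_t lo = 0;
};

// Parse exactly 32 hex digits (no "0x" prefix, no sign); nullopt otherwise.
std::optional<U128> parse_hex128(std::string_view s) {
    if (s.size() != 32)
        return std::nullopt;

    // Parse one 16-digit half and require that every character was consumed.
    auto parse_half = [](std::string_view part, std::uint64_t& value) {
        const char* last = part.data() + part.size();
        auto [ptr, ec] = std::from_chars(part.data(), last, value, 16);
        return ec == std::errc{} && ptr == last;
    };

    U128 result;
    if (!parse_half(s.substr(0, 16), result.hi) || !parse_half(s.substr(16), result.lo))
        return std::nullopt;
    return result;
}
```

For "00112233445566778899aabbccddeeff" this yields hi == 0x0011223344556677 and lo == 0x8899aabbccddeeff.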

There's an unfortunate combination of factors that makes this a painful edge case:
Boost Fusion can adapt (boost::)array<>, but that requires the parser to produce a tuple of elements, not a container
Boost Fusion can adapt these sequences, but it needs to be configured to allow 16 elements (define this before including any Boost headers):
#define FUSION_MAX_VECTOR_SIZE 16
Even when you do, the qi::repeat(n)[] parser directive expects the attribute to be a container type.
You might work around all this in an ugly way (e.g. Live On Coliru). This makes everything hard to work with down the road.
I'd prefer a tiny semantic action here, so that the result is assigned from within qi::repeat(n)[]:
using data_t = boost::array<uint8_t, 16>;
data_t dst {};
qi::rule<It, data_t(), qi::locals<data_t::iterator> > rule =
qi::eps [ qi::_a = phx::begin(qi::_val) ]
>> qi::repeat(16) [
uint8_hex [ *qi::_a++ = qi::_1 ]
];
This works without too much noise. The idea is to store the array's begin iterator in the rule local qi::_a and write each parsed byte through it, advancing to the next element on each iteration.
Live On Coliru
#include <boost/array.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <cstdint>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
int main() {
using It = std::string::const_iterator;
qi::int_parser<uint8_t, 16, 2, 2> uint8_hex;
std::string const src("00112233445566778899aabbccddeeff");
auto f(src.begin()), l(src.end());
using data_t = boost::array<uint8_t, 16>;
data_t dst {};
qi::rule<It, data_t(), qi::locals<data_t::iterator> > rule =
qi::eps [ qi::_a = phx::begin(qi::_val) ]
>> qi::repeat(16) [
uint8_hex [ *qi::_a++ = qi::_1 ]
];
bool r = qi::parse(f, l, rule, dst);
if (r) {
std::cout << "Parse succeeded\n";
for(unsigned i : dst) std::cout << std::hex << std::showbase << i << " ";
std::cout << "\n";
} else {
std::cout << "Parse failed at '" << std::string(f,l) << "'\n";
}
}
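Outside of Spirit, the same "keep an iterator and write one element per step" idea is just a short loop. This is a plain C++17 sketch, not a Spirit rule; parse_hex_bytes is a hypothetical helper that assumes the input holds exactly two hex digits per byte.

```cpp
#include <array>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <system_error>

// Parse `count` bytes, two hex digits each, writing every byte through `out`
// and advancing it once per element (the role `*qi::_a++ = qi::_1` plays above).
template <typename OutIt>
bool parse_hex_bytes(std::string_view s, std::size_t count, OutIt out) {
    if (s.size() != 2 * count)
        return false;
    for (std::size_t i = 0; i < count; ++i) {
        const char* first = s.data() + 2 * i;
        std::uint8_t byte = 0;
        auto [ptr, ec] = std::from_chars(first, first + 2, byte, 16);
        if (ec != std::errc{} || ptr != first + 2)
            return false; // each byte must be exactly two hex digits
        *out++ = byte;
    }
    return true;
}
```

Any output iterator works here, e.g. std::back_inserter over a std::vector<std::uint8_t>.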

Related

Boost::Spirit doubles character when followed by a default value

I use boost::spirit to parse (part of) a monomial like x, y, xy, x^2, x^3yz. I want to save the variables of the monomial into a map, which also stores the corresponding exponent. Therefore the grammar should also save the implicit exponent of 1 (so x is stored as if it were written as x^1).
start = +(potVar);
potVar=(varName>>'^'>>exponent)|(varName>> qi::attr(1));// First try: This doubles the variable name
//potVar = varName >> (('^' >> exponent) | qi::attr(1));// Second try: This works as intended
exponent = qi::int_;
varName = qi::char_("a-z");
When using the default attribute as in the line "First try", Spirit doubles the variable name.
Everything works as intended when using the default attribute as in the line "Second try".
'First try' reads a variable x and stores the pair [xx, 1].
'Second try' reads a variable x and stores the pair [x, 1].
I think I solved the original problem myself: the second try works. However, I don't see how I doubled the variable name. Since I am just getting familiar with boost::spirit, which is a collection of challenges for me (and there are probably more to come), I would like to understand this behavior.
This is the whole code to reproduce the problem. The skeleton of the grammar is copied from a KIT presentation (https://panthema.net/2018/0912-Boost-Spirit-Tutorial/), and Stack Overflow was already very helpful when I needed the header that lets me use std::pair.
#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <cmath>
#include <map>
#include <utility>//for std::pair
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp> //https://stackoverflow.com/questions/53953642/parsing-map-of-variants-with-boost-spirit-x3
namespace qi = boost::spirit::qi;
template <typename Parser, typename Skipper, typename ... Args>
void PhraseParseOrDie(
const std::string& input, const Parser& p, const Skipper& s,
Args&& ... args)
{
std::string::const_iterator begin = input.begin(), end = input.end();
boost::spirit::qi::phrase_parse(
begin, end, p, s, std::forward<Args>(args) ...);
if (begin != end) {
std::cout << "Unparseable: "
<< std::quoted(std::string(begin, end)) << std::endl;
throw std::runtime_error("Parse error");
}
}
class ArithmeticGrammarMonomial : public qi::grammar<
std::string::const_iterator,
std::map<std::string, int>(), qi::space_type>
{
public:
using Iterator = std::string::const_iterator;
ArithmeticGrammarMonomial() : ArithmeticGrammarMonomial::base_type(start)
{
start = +(potVar);
potVar=(varName>>'^'>>exponent)|(varName>> qi::attr(1));
//potVar = varName >> (('^' >> exponent) | qi::attr(1));
exponent = qi::int_;
varName = qi::char_("a-z");
}
qi::rule<Iterator, std::map<std::string, int>(), qi::space_type> start;
qi::rule<Iterator, std::pair<std::string, int>(), qi::space_type> potVar;
qi::rule<Iterator, int()> exponent;
qi::rule<Iterator, std::string()> varName;
};
void test2(std::string input)
{
std::map<std::string, int> out_map;
PhraseParseOrDie(input, ArithmeticGrammarMonomial(), qi::space, out_map);
std::cout << "test2() parse result: "<<std::endl;
for(auto &it: out_map)
std::cout<< it.first<<it.second << std::endl;
}
/******************************************************************************/
int main(int argc, char* argv[])
{
std::cout << "Parse Monomial 1" << std::endl;
test2(argc >= 2 ? argv[1] : "x^3y^1");
test2(argc >= 2 ? argv[1] : "xy");
return 0;
}
Live demo
I think I solved the original problem myself. The second try works.
Indeed. It's how I'd do this (always match the AST with your parser expressions).
However, I don't see how I doubled the variable name.
It's due to backtracking with container attributes: they don't get rolled back. The first branch parses varName into the pair's string and then fails because there is no '^'. The parser backtracks into the second branch, which parses varName into the same string again, so "x" becomes "xx".
boost::spirit::qi duplicate parsing on the output
Understanding Boost.spirit's string parser
Parsing with Boost::Spirit (V2.4) into container
Boost Spirit optional parser and backtracking
boost::spirit alternative parsers return duplicates
It can also crop up with semantic actions:
Boost Semantic Actions causing parsing issues
Boost Spirit optional parser and backtracking
In short:
match your AST structure in your rule expression, or use qi::hold to force the issue (at performance cost)
avoid semantic actions (Boost Spirit: "Semantic actions are evil"?)
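To make the mechanism concrete, here is a hand-written C++ imitation of the two alternatives. It is not how Qi is implemented internally, only an illustration assuming (as described above) that both branches append into the same string attribute; the hold flag mimics qi::hold by snapshotting the attribute before the first branch. Exponents are limited to a single digit to keep it short.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <utility>

// The attribute shape from the question: (variable name, exponent).
using Attr = std::pair<std::string, int>;

// varName: one lowercase letter, APPENDED to the string attribute.
bool var_name(std::string_view& in, std::string& out) {
    if (in.empty() || !std::islower(static_cast<unsigned char>(in.front())))
        return false;
    out += in.front();
    in.remove_prefix(1);
    return true;
}

// Imitates (varName >> '^' >> digit) | (varName >> attr(1)).
// When `hold` is true, the string is restored before trying branch 2,
// which is the effect qi::hold[] has on branch 1.
bool pot_var(std::string_view in, Attr& attr, bool hold) {
    const std::string_view saved_input = in;   // input position: always restored
    const std::string saved_name = attr.first; // attribute: only restored with hold

    // Branch 1: varName >> '^' >> exponent (single digit for brevity)
    if (var_name(in, attr.first) && !in.empty() && in.front() == '^') {
        in.remove_prefix(1);
        if (!in.empty() && std::isdigit(static_cast<unsigned char>(in.front()))) {
            attr.second = in.front() - '0';
            return true;
        }
    }

    // Backtrack: the input is rewound, but appended characters stay put.
    in = saved_input;
    if (hold)
        attr.first = saved_name;

    // Branch 2: varName >> attr(1)
    if (var_name(in, attr.first)) {
        attr.second = 1;
        return true;
    }
    return false;
}
```

With hold set to false, parsing "x" leaves "xx" in the attribute, which is exactly the symptom from the question; with hold set to true it leaves "x".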
For inspiration, here's a simplified take using Spirit X3
Live On Compiler Explorer
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>
namespace Parsing {
namespace x3 = boost::spirit::x3;
auto exponent = '^' >> x3::int_ | x3::attr(1);
auto varName = x3::repeat(1)[x3::char_("a-z")];
auto potVar
= x3::rule<struct P, std::pair<std::string, int>>{}
= varName >> exponent;
auto start = x3::skip(x3::space)[+potVar >> x3::eoi];
template <typename T = x3::unused_type>
void StrictParse(std::string_view input, T&& into = {})
{
auto f = input.begin(), l = input.end();
if (!x3::parse(f, l, start, into)) {
fmt::print(stderr, "Error at: '{}'\n", std::string(f, l));
throw std::runtime_error("Parse error");
}
}
} // namespace Parsing
void test2(std::string input) {
std::map<std::string, int> out_map;
Parsing::StrictParse(input, out_map);
fmt::print("{} -> {}\n", input, out_map);
}
int main() {
for (auto s : {"x^3y^1", "xy"})
test2(s);
}
Prints
x^3y^1 -> [("x", 3), ("y", 1)]
xy -> [("x", 1), ("y", 1)]
Bonus Notes
It looks to me like you should be more careful. Even if you assume that all variables are one letter and no terms can occur (only factors), you still need to handle x^5y^2x correctly as x^6y^2, right?
Here's a Qi version that uses semantic actions to correctly accumulate like factors:
Live On Coliru
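#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <map>
#include <string_view>
namespace qi = boost::spirit::qi;
// iterate the string_view directly (converting its iterators to std::string iterators is not portable)
using Iterator = std::string_view::const_iterator;
using Monomial = std::map<char, int>;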
struct ArithmeticGrammarMonomial : qi::grammar<Iterator, Monomial()> {
ArithmeticGrammarMonomial() : ArithmeticGrammarMonomial::base_type(start) {
using namespace qi;
exp_ = '^' >> int_ | attr(1);
start = skip(space)[ //
+(char_("a-z") >> exp_)[_val[_1] += _2] //
];
}
private:
qi::rule<Iterator, Monomial()> start;
qi::rule<Iterator, int(), qi::space_type> exp_;
};
void do_test(std::string_view input) {
Monomial output;
static const ArithmeticGrammarMonomial p;
Iterator f(begin(input)), l(end(input));
qi::parse(f, l, qi::eps > p, output);
std::cout << std::quoted(input) << " -> " << std::endl;
for (auto& [var,exp] : output)
std::cout << " - " << var << '^' << exp << std::endl;
}
int main() {
for (auto s : {"x^3y^1", "xy", "x^5y^2x"})
do_test(s);
}
Prints
"x^3y^1" ->
- x^3
- y^1
"xy" ->
- x^1
- y^1
"x^5y^2x" ->
- x^6
- y^2
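For comparison, here is the same accumulation rule (a repeated variable adds its exponent, a missing exponent counts as 1) in plain C++17, without Spirit. parse_monomial is a hypothetical helper; like the Qi grammar, it accepts single lowercase letters with optional ^int exponents and skips whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <charconv>
#include <cstddef>
#include <map>
#include <optional>
#include <string_view>
#include <system_error>

using Monomial = std::map<char, int>;

// Parse factors like x, y^2, x^5y^2x, adding up exponents of repeated variables
// (the effect of `_val[_1] += _2`). Returns nullopt on malformed input.
std::optional<Monomial> parse_monomial(std::string_view s) {
    Monomial result;
    std::size_t i = 0;
    auto skip_spaces = [&] {
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i])))
            ++i;
    };

    skip_spaces();
    if (i == s.size())
        return std::nullopt; // like +(...): at least one factor is required

    while (i < s.size()) {
        if (!std::islower(static_cast<unsigned char>(s[i])))
            return std::nullopt; // expected a variable a-z
        const char var = s[i++];
        int exponent = 1; // implicit exponent, like attr(1)

        skip_spaces();
        if (i < s.size() && s[i] == '^') {
            ++i;
            skip_spaces();
            auto [ptr, ec] = std::from_chars(s.data() + i, s.data() + s.size(), exponent);
            if (ec != std::errc{})
                return std::nullopt; // '^' must be followed by an integer
            i = static_cast<std::size_t>(ptr - s.data());
        }
        result[var] += exponent;
        skip_spaces();
    }
    return result;
}
```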

Spirit X3, ascii::cntrl why disparity with std::iscntrl?

I'm concentrating on checking for error conditions in a parser design using Spirit X3. One of these is the character-category checks such as isalpha or ispunct. According to the X3 documentation (Character Parsers), they should match what C++ provides as std::isalpha and std::ispunct. However, with the demonstration code below I get different results.
#include <cstddef>
#include <cstdio>
#include <cstdint>
#include <cctype>
#include <iostream>
#include <boost/spirit/home/x3/version.hpp>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::space;
using x3::skip;
x3::rule<class main_rule_id, char> const main_rule_ = "main_rule";
const auto main_rule__def = ascii::cntrl;
BOOST_SPIRIT_DEFINE( main_rule_ )
const auto entry_point = skip(space) [ main_rule_ ];
}
int main()
{
printf( "Spirit X3 version: %4.4x\n", SPIRIT_X3_VERSION );
char output;
bool r = false;
bool r2 = false; // answer according to default "C" locale
char input[2];
input[1] = 0;
printf( "ascii::cntrl\n" );
uint8_t i = 0;
next_char:
input[0] = (char)i;
r = parse( (char*)input, input+1, client::parser::entry_point, output );
r2 = (bool)std::iscntrl( (unsigned char)i );
printf( "%2.2x:%d%d", i, r, r2 );
if ( i == 0x7f ) { goto exit_loop; }
++i;
if ( i % 8 ) { putchar( ' ' ); } else { putchar( '\n' ); }
goto next_char;
exit_loop:
return 0;
}
The output is:
Spirit X3 version: 3004
ascii::cntrl
00:11 01:11 02:11 03:11 04:11 05:11 06:11 07:11
08:11 09:01 0a:01 0b:01 0c:01 0d:01 0e:11 0f:11
10:11 11:11 12:11 13:11 14:11 15:11 16:11 17:11
18:11 19:11 1a:11 1b:11 1c:11 1d:11 1e:11 1f:11
20:00 21:00 22:00 23:00 24:00 25:00 26:00 27:00
28:00 29:00 2a:00 2b:00 2c:00 2d:00 2e:00 2f:00
30:00 31:00 32:00 33:00 34:00 35:00 36:00 37:00
38:00 39:00 3a:00 3b:00 3c:00 3d:00 3e:00 3f:00
40:00 41:00 42:00 43:00 44:00 45:00 46:00 47:00
48:00 49:00 4a:00 4b:00 4c:00 4d:00 4e:00 4f:00
50:00 51:00 52:00 53:00 54:00 55:00 56:00 57:00
58:00 59:00 5a:00 5b:00 5c:00 5d:00 5e:00 5f:00
60:00 61:00 62:00 63:00 64:00 65:00 66:00 67:00
68:00 69:00 6a:00 6b:00 6c:00 6d:00 6e:00 6f:00
70:00 71:00 72:00 73:00 74:00 75:00 76:00 77:00
78:00 79:00 7a:00 7b:00 7c:00 7d:00 7e:00 7f:11
So the first bit after the colon is X3's answer and the second bit is the C++ answer. The mismatch happens exactly on the characters that also fall into the isspace category. I've recently been looking more into the library headers, but I still haven't found anything that explains this behavior.
Why the disparity? Have I missed something?
Oh yeah, I love my goto statements. And my retro C style. I hope you do too! Even for an X3 parser.
You've accidentally run afoul of the skipper, which eats any whitespace before ascii::cntrl gets a chance to parse it.
I simplified the parser and now it succeeds:
As a note about style: there's no reason ever to
use C-style casts (they're dangerous)
write a loop with goto (considered harmful)
use cryptic variable names (r, r2?)
Live On Coliru
#include <boost/spirit/home/x3/version.hpp>
#include <boost/spirit/home/x3.hpp>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <iomanip>
namespace client::parser {
using namespace boost::spirit::x3;
//const auto entry_point = skip(space)[ ascii::cntrl ];
const auto entry_point = ascii::cntrl;
}
int main() {
std::cout << std::boolalpha << std::hex << std::setfill('0');
std::cout << "Spirit X3 version: " << SPIRIT_X3_VERSION << "\n";
for (uint8_t i = 0; i <= 0x7f; ++i) {
auto from_x3 = parse(&i, &i + 1, client::parser::entry_point);
auto from_std = !!std::iscntrl(i);
if (from_x3 != from_std) {
std::cout << "0x" << std::setw(2) << static_cast<unsigned>(i) << "\tx3:" << from_x3 << "\tstd:" << from_std << '\n';
}
}
std::cout << "Done\n";
}
Prints simply
Spirit X3 version: 3000
Done
With the "bad line" commented in instead:
Live On Coliru
Spirit X3 version: 3000
0x09 x3:false std:true
0x0a x3:false std:true
0x0b x3:false std:true
0x0c x3:false std:true
0x0d x3:false std:true
Done
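The overlap can be checked with the standard library alone. This sketch assumes the default "C" locale (which every program starts in); cntrl_and_space lists the characters for which both std::iscntrl and std::isspace hold, and skip_space_then_cntrl is a hypothetical emulation of skip(space)[cntrl] on a single-character input.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Characters 0x00..0x7f that are both control characters and whitespace
// in the default "C" locale.
std::string cntrl_and_space() {
    std::string result;
    for (int c = 0; c <= 0x7f; ++c)
        if (std::iscntrl(c) && std::isspace(c))
            result += static_cast<char>(c);
    return result;
}

// Hypothetical emulation of skip(space)[cntrl] on a one-character input:
// the skipper runs first, so a whitespace character is consumed before
// cntrl ever gets to test it.
bool skip_space_then_cntrl(char ch) {
    const auto c = static_cast<unsigned char>(ch);
    if (std::isspace(c))
        return false; // skipper ate the only character; nothing left to match
    return std::iscntrl(c) != 0;
}
```

cntrl_and_space() returns "\t\n\v\f\r", i.e. 0x09 to 0x0d: the same five characters the skip(space) version reported.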

Alternative to sscanf_s in C++

result = sscanf_s(line.c_str(), "data (%d,%d)", &a, &b);
In the code above I am using sscanf_s to extract two integer values from the given string line. Is there another way, more object-oriented, of doing that in C++11? (std::stringstream and/or regular expressions?)
EDIT: I tried two solutions; the first one doesn't work, the second one does:
// solution one (doesn't work)
// let line = "data (3,4)"
std::regex re("data (.*,.*)");
std::smatch m;
if (std::regex_search(line, m, re) )
cout << m[0] << " "<< m[1]; // I get the string "data (3,4) (3,4)"
// solution two (works but is somewhat ugly)
std::string name;
char openParenthesis;
char comma;
char closeParenthesis;
int x = 0, y = 0;
std::istringstream stream(line);
stream >> name >> openParenthesis >> x >> comma >> y >> closeParenthesis;
if( name=="data" && openParenthesis == '(' && comma == ',' && closeParenthesis == ')' )
{
a = x;
b = y;
}
EDIT 2: With Shawn's input, the following works perfectly:
std::regex re(R"(data \(\s*(\d+),\s*(\d+)\))");
std::smatch m;
if (std::regex_search(line, m, re) )
{
a = std::stoi(m[1]);
b = std::stoi(m[2]);
}
If it doesn't have to be a regex, you could use Boost.Spirit. The following is a slight modification of this example and gives you any number of comma-separated integers in a vector. (That is not exactly what you asked for, but it shows off a bit of what else is possible, and I didn't want to put more effort into changing the example.)
This works on iterators, i.e. strings as well as streams. It's also trivially expandable to more complex grammars, and you can create stand-alone grammar objects you can re-use, or combine into yet more complex grammars. (Not demonstrated here.)
#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/phoenix_core.hpp"
#include "boost/spirit/include/phoenix_operator.hpp"
#include "boost/spirit/include/phoenix_stl.hpp"
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
template < typename Iterator >
bool parse_data( Iterator first, Iterator last, std::vector< int > & v )
{
bool r = qi::phrase_parse( first, last,
// Begin grammar
(
qi::lit( "data" ) >> '('
>> qi::int_[ phoenix::push_back( phoenix::ref( v ), qi::_1 ) ]
>> *( ',' >> qi::int_[ phoenix::push_back( phoenix::ref( v ), qi::_1 ) ] )
>> ')'
),
// End grammar
ascii::space );
if ( first != last ) // fail if we did not get a full match
{
return false;
}
return r;
}
int main()
{
std::string input = "data (38,4)";
std::vector< int > v;
if ( parse_data( input.begin(), input.end(), v ) )
{
std::cout << "Read:\n";
for ( auto i : v )
{
std::cout << i << "\n";
}
}
else
{
std::cout << "Failed.\n";
}
return 0;
}
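If you'd rather stay with the standard library, the stream-based attempt from the question can be tidied up so that any mismatch simply puts the stream into a failed state. This is a sketch; expect and parse_data_stream are hypothetical helpers, and unlike the sscanf_s format this version needs whitespace after data, because >> into a std::string reads up to the next space.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Read the next non-whitespace character and fail the stream unless it is `c`.
std::istream& expect(std::istream& in, char c) {
    char got;
    if (in >> got && got != c)
        in.setstate(std::ios::failbit);
    return in;
}

// Parse "data (a,b)"; whitespace around the punctuation is allowed.
std::optional<std::pair<int, int>> parse_data_stream(const std::string& line) {
    std::istringstream in(line);
    std::string keyword;
    int a = 0, b = 0;
    if (!(in >> keyword) || keyword != "data")
        return std::nullopt;
    expect(in, '(') >> a;
    expect(in, ',') >> b;
    expect(in, ')');
    if (!in)
        return std::nullopt; // some piece was missing or malformed
    return std::make_pair(a, b);
}
```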

How do I take the output of a parse and use it to look up in a symbols

As can be seen from the code, I'm taking the output of one parse and using it to look up the number in the symbols table in a second parse.
How do I do this as a single rule? Looking at the docs and doing a lot of searching leads me to believe this can be done with a local var, but I can't figure out how to use my symbols quad on that var.
int main()
{
using boost::phoenix::ref;
using qi::_1;
using qi::_val;
using qi::no_case;
using qi::_a;
using qi::symbols;
using qi::char_;
using qi::omit;
symbols<char, int> quad;
quad.add
("1", 1)
("2", 2)
("3", 3)
("4", 4)
("NE", 1)
("SE", 2)
("SW", 3)
("NW", 4)
;
std::wstring s = L"N44°30'14.950\"W";
std::wstring out;
int iQuad;
qi::parse(s.begin(), s.end(),
no_case[char_('N')] >> omit[*(qi::char_ - no_case[char_("NSEW")])] >> no_case[char_('W')],
out);
qi::parse(out.begin(), out.end(), quad, iQuad);
return 0;
}
Yes, it can be done with a local variable.
However, that demotes symbols to a regular map, so let's just use a map¹
1. The simplest thing
Firstly, I'd consider doing the simplest thing:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
namespace Rules {
namespace qi = boost::spirit::qi;
qi::rule<std::wstring::const_iterator, int()> quad = qi::no_case [
('N' >> *~qi::char_("EW") >> 'E')[ qi::_val = 1 ] |
('S' >> *~qi::char_("EW") >> 'E')[ qi::_val = 2 ] |
('S' >> *~qi::char_("EW") >> 'W')[ qi::_val = 3 ] |
('N' >> *~qi::char_("EW") >> 'W')[ qi::_val = 4 ]
];
}
int main() {
for (std::wstring const s : {
L"NE", L"SE", L"SW", L"NW",
L"N44°30'14.950\"E",
L"N44°30'14.950\"W",
L"S44°30'14.950\"W",
L"S44°30'14.950\"E",
L"1", L"2", L"3", L"4",
})
{
int iQuad;
auto f = s.begin(), l = s.end();
bool ok = parse(f, l, Rules::quad, iQuad);
if (ok)
std::wcout << L"Parsed: '" << s << L"' -> " << iQuad << L"\n";
else
std::wcout << L"Parse failed '" << s << L"'\n";
if (f!=l)
std::wcout << L"Remaining unparsed: '" << std::wstring(f,l) << L"'\n";
}
}
Which prints
Live On Coliru
Parsed: 'NE' -> 1
Parsed: 'SE' -> 2
Parsed: 'SW' -> 3
Parsed: 'NW' -> 4
Parsed: 'N44?30'14.950"E' -> 1
Parsed: 'N44?30'14.950"W' -> 4
Parsed: 'S44?30'14.950"W' -> 3
Parsed: 'S44?30'14.950"E' -> 2
Parse failed '1'
Remaining unparsed: '1'
Parse failed '2'
Remaining unparsed: '2'
Parse failed '3'
Remaining unparsed: '3'
Parse failed '4'
Remaining unparsed: '4'
If you want the numeric forms to parse as well, just add an alternative for them:
qi::rule<std::wstring::const_iterator, int()> quad = qi::no_case [
(qi::int_(1) | qi::int_(2) | qi::int_(3) | qi::int_(4)) [ qi::_val = qi::_1 ] |
('N' >> *~qi::char_("EW") >> 'E')[ qi::_val = 1 ] |
('S' >> *~qi::char_("EW") >> 'E')[ qi::_val = 2 ] |
('S' >> *~qi::char_("EW") >> 'W')[ qi::_val = 3 ] |
('N' >> *~qi::char_("EW") >> 'W')[ qi::_val = 4 ]
];
All this can be optimized, but I'll venture a guess that it's already more efficient than anything based on symbols and a two-phase parse.
2. Using a map lookup
Just... use a map:
template <typename It> struct MapLookup : qi::grammar<It, int()> {
MapLookup() : MapLookup::base_type(start) {
namespace px = boost::phoenix;
start = qi::as_string [
qi::char_("1234") |
qi::char_("nsNS") >> qi::omit[*~qi::char_("weWE")] >> qi::char_("weWE")
] [ qi::_val = px::ref(_lookup)[qi::_1] ];
}
private:
struct ci {
template <typename A, typename B>
bool operator()(A const& a, B const& b) const { return boost::ilexicographical_compare(a, b); }
};
std::map<std::string, int, ci> _lookup = {
{ "NE", 1 }, { "SE", 2 }, { "SW", 3 }, { "NW", 4 },
{ "1" , 1 }, { "2", 2 }, { "3", 3 }, { "4", 4 } };
qi::rule<It, int()> start;
};
See it Live On Coliru too.
3. Optimizing it
qi::symbols uses a trie. You might think that makes it faster. It is, in fact, pretty fast for lookups, but not on very small key sets, not in a node-based container, and not with dynamically allocated temporary keys.
In other words, we can do much better:
template <typename It> struct FastLookup : qi::grammar<It, int()> {
FastLookup() : FastLookup::base_type(start) {
namespace px = boost::phoenix;
start =
qi::int_ [ qi::_pass = (qi::_1 > 0 && qi::_1 <= 4), qi::_val = qi::_1 ] |
qi::raw [
qi::char_("nsNS") >> qi::omit[*~qi::char_("weWE")] >> qi::char_("weWE")
] [ qi::_val = _lookup(qi::_1) ];
}
private:
struct lookup_f {
template <typename R> int operator()(R const& range) const {
using key = std::tuple<char, char>;
static constexpr key index[] = { key {'N','E'}, key {'S','E'}, key {'S','W'}, key {'N','W'}, };
using namespace std;
auto a = std::toupper(*range.begin());
auto b = std::toupper(*(range.end()-1));
return 1 + (find(begin(index), end(index), key(a, b)) - begin(index));
}
};
boost::phoenix::function<lookup_f> _lookup;
qi::rule<It, int()> start;
};
See it Live Again On Coliru
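For reference, the same mapping can be written without Spirit at all. This is a sketch under the assumption that the whole input must match (unlike qi::parse, which may leave a tail): quadrant is a hypothetical helper that accepts "1" to "4", or a leading N/S and a trailing E/W (case-insensitive) with no E or W in between, mirroring the *~char_("EW") rules above. Other bytes in between, such as a UTF-8 encoded degree sign, are simply passed over.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical helper: NE -> 1, SE -> 2, SW -> 3, NW -> 4, and "1".."4" as-is.
std::optional<int> quadrant(std::string_view s) {
    if (s.size() == 1 && s[0] >= '1' && s[0] <= '4')
        return s[0] - '0';
    if (s.size() < 2)
        return std::nullopt;

    auto upper = [](char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    };
    // Like *~char_("EW"): nothing between the two letters may be E or W.
    for (char c : s.substr(1, s.size() - 2))
        if (upper(c) == 'E' || upper(c) == 'W')
            return std::nullopt;

    const char first = upper(s.front());
    const char last = upper(s.back());
    if (first == 'N' && last == 'E') return 1;
    if (first == 'S' && last == 'E') return 2;
    if (first == 'S' && last == 'W') return 3;
    if (first == 'N' && last == 'W') return 4;
    return std::nullopt;
}
```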
¹ if you insist you can use symbols in your own code

trigger warning from boost spirit parser

How can I add warnings to a Boost Spirit parser?
Edit: ... ones that can report the issue along with its position
For example if I have an integer parser:
('0' >> oct)
| int_
I would like to be able to do something like this:
('0' >> oct)
| "-0" --> trigger warning("negative octal values are not supported, it will be interpreted as negative decimal value and the leading 0 will be ignored")
| int_
Q. Can I create my own callback? How?
A. Sure. Any way you'd normally do it in C++ (or look at Boost.Signals2 and/or Boost.Log)
parser(std::function<bool(std::string const& s)> callback)
: parser::base_type(start),
callback(callback)
{
using namespace qi;
start %=
as_string[+graph]
[ _pass = phx::bind(callback, _1) ]
% +space
;
BOOST_SPIRIT_DEBUG_NODES((start));
}
As you can see, you can even make the handler decide whether the warning should be ignored or cause the match to fail.
UPDATE #1 I've extended the sample to show some of the unrelated challenges you mentioned in the comments (position, duplicate checking). Hope this helps.
Here's a simple demonstration: see it Live on Coliru (Word)
UPDATE #2 I've even made it (a) store the source information instead of the iterators and (b) "work" with floats (or any other exposed attribute type, really).
Note how uncannily similar it is, s/Word/Number/, basically: Live On Coliru (Number)
#define BOOST_RESULT_OF_USE_DECLTYPE // needed for gcc 4.7, not clang++
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <functional>
#include <iostream>
#include <set>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
// okay, so you want position reporting (actually unrelated):
#include <boost/spirit/include/support_line_pos_iterator.hpp>
using It = boost::spirit::line_pos_iterator<std::string::const_iterator>;
// AST type that represents a Number 'token' (with source and location
// information)
struct Number
{
double value;
size_t line_pos;
std::string source;
explicit Number(double value = 0.0, boost::iterator_range<It> const& range = {})
:
value(value),
line_pos(get_line(range.begin())),
source(range.begin(), range.end())
{}
bool operator< (const Number& other) const { return (other.value - value) > 0.0001; }
};
// the exposed attribute for the parser:
using Words = std::set<Number>;
// the callback signature for our warning; you could make it more like
// `on_error` so it takes the iterators directly, but again, I'm doing the
// simple thing for the demo
using Callback = std::function<bool(Number const& s)>;
template <typename It>
struct parser : qi::grammar<It, Words()>
{
parser(Callback warning)
: parser::base_type(start),
warning(warning)
{
using namespace qi;
auto check_unique = phx::end(_val) == phx::find(_val, _1);
word =
raw [ double_ [ _a = _1 ] ] [ _val = phx::construct<Number>(_a, _1) ]
;
start %=
- word [ _pass = check_unique || phx::bind(warning, _1) ]
% +space
>> eoi
;
}
private:
Callback warning;
qi::rule<It, Number(), qi::locals<double> > word;
qi::rule<It, Words()> start;
};
int main(int argc, const char *argv[])
{
// parse command line arguments
const auto flags = std::set<std::string> { argv+1, argv+argc };
const bool fatal_warnings = end(flags) != flags.find("-Werror");
// test input
const std::string input("2.4 2.7 \n\n\n-inf \n\nNaN 88 -2.40001 \n3.14 240001e-5\n\ninf");
// warning handler
auto warning_handler = [&](Number const& w) {
std::cerr << (fatal_warnings?"Error":"Warning")
<< ": Near-identical entry '" << w.source << "' at L:" << w.line_pos << "\n";
return !fatal_warnings;
};
// do the parse
It f(begin(input)), l(end(input));
bool ok = qi::parse(f, l, parser<It>(warning_handler));
// report results
if (ok) std::cout << "parse success\n";
else std::cerr << "parse failed\n";
if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
// exit code
return ok? 0 : 255;
}
Prints:
Warning: Near-identical entry 'NaN' at L:6
Warning: Near-identical entry '240001e-5' at L:7
parse success
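Stripped of Spirit, the core pattern is just a callback whose return value decides whether parsing continues, like `_pass = check_unique || phx::bind(warning, _1)` above. This is a plain C++ sketch: Entry and parse_numbers are hypothetical, numbers are read with std::strtod, line numbers are counted by hand instead of with line_pos_iterator, and near-duplicates are found with a simple linear scan using the same 0.0001 tolerance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the Number AST node: value plus source location.
struct Entry {
    double value;
    std::size_t line;
    std::string source;
};

// Return true to treat the issue as a warning, false to fail the parse.
using Callback = std::function<bool(Entry const&)>;

// Reads whitespace-separated numbers, tracking 1-based line numbers.
// A near-duplicate (within 0.0001) is reported to `warning`; its result
// decides whether parsing continues.
bool parse_numbers(std::string const& input, Callback const& warning, std::vector<Entry>& out) {
    std::istringstream lines(input);
    std::string text;
    for (std::size_t line = 1; std::getline(lines, text); ++line) {
        std::istringstream words(text);
        std::string word;
        while (words >> word) {
            char* end = nullptr;
            const double value = std::strtod(word.c_str(), &end);
            if (end != word.c_str() + word.size())
                return false; // not entirely a number: hard error

            const Entry entry{value, line, word};
            bool unique = true;
            for (Entry const& prev : out)
                if (std::fabs(prev.value - value) <= 0.0001)
                    unique = false;

            if (unique)
                out.push_back(entry);
            else if (!warning(entry))
                return false; // handler escalated the warning to an error
        }
    }
    return true;
}
```

Passing a handler that returns false gives the same effect as the -Werror flag in the demo above.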