boost::spirit::x3 parsing is slower than strsep parsing

I wrote an X3 parser to parse a structured text file. Here is the demo code:
int main() {
char buf[10240];
type_t example; // def see below
FILE* fp = fopen("text", "r");
while (fgets(buf, 10240, fp)) // read to the buffer
{
int n = strlen(buf);
example.clear();
if (client::parse_numbers(buf, buf+n, example)) // def see below
{
// do nothing here, only parse the buf and fill into the example
}
}
}
struct type_t {
int id;
std::vector<int> fads;
std::vector<int> fbds;
std::vector<float> fvalues;
float target;
void clear() {
fads.clear();
fbds.clear();
fvalues.clear();
}
};
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
namespace client {
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, type_t& example)
{
using x3::int_;
using x3::double_;
using x3::phrase_parse;
using x3::parse;
using x3::_attr;
using ascii::space;
auto fn_id = [&](auto& ctx) { example.id = _attr(ctx); };
auto fn_fad = [&](auto& ctx) { example.fads.push_back(_attr(ctx)); };
auto fn_fbd = [&](auto& ctx) { example.fbds.push_back(_attr(ctx)); };
auto fn_value = [&](auto& ctx) { example.fvalues.push_back(_attr(ctx)); };
auto fn_target = [&](auto& ctx) { example.target = _attr(ctx); };
bool r = phrase_parse(first, last,
// Begin grammar
(
int_[fn_id] >>
double_[fn_target] >>
+(int_[fn_fad] >> ':' >> int_[fn_fbd] >> ':' >> double_[fn_value])
)
,
// End grammar
space);
if (first != last) // fail if we did not get a full match
return false;
return r;
}
} // namespace client
Am I doing this the right way, and how could I improve it? I'd like to know whether any optimization is possible before I switch back to my strsep-based implementation, which is much faster than this X3 version.

Why do you use semantic actions for this? sehe's article Boost Spirit: “Semantic actions are evil”? and the notes linked from it are worth reading.
Parsing into an AST structure, as shown by the X3 examples (e.g. Employee - Parsing into structs), is IMO much more natural. You can then evaluate the data later on, for example with the visitor pattern.
One solution is shown here:
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
namespace ast {
struct triple {
double fad;
double fbd;
double value;
};
struct data {
int id;
double target;
std::vector<ast::triple> triple;
};
}
BOOST_FUSION_ADAPT_STRUCT(ast::triple, fad, fbd, value)
BOOST_FUSION_ADAPT_STRUCT(ast::data, id, target, triple)
namespace x3 = boost::spirit::x3;
namespace parser {
using x3::int_; using x3::double_;
auto const triple = x3::rule<struct _, ast::triple>{ "triple" } =
int_ >> ':' >> int_ >> ':' >> double_;
auto const data = x3::rule<struct _, ast::data>{ "data" } =
int_ >> double_ >> +triple;
}
int main()
{
std::stringstream buffer;
std::ifstream file{ R"(C:\data.txt)" };
if(file.is_open()) {
buffer << file.rdbuf();
file.close();
}
std::string const input = buffer.str(); // keep one copy alive; str() returns a temporary
auto iter = input.begin();
auto const end = input.end();
ast::data data;
bool parse_ok = x3::phrase_parse(iter, end, parser::data, x3::space, data);
return (parse_ok && iter == end) ? 0 : 1; // exit code 0 signals success
}
It compiles (see Wandbox) but is untested because I have no input data (you can of course generate some inside main()); you are interested in benchmarking only anyway.
Also note the use of a stringstream to read the rdbuf. There are several ways to skin this cat; I refer here to How to read in a file in C++, where the rdbuf reading approach is fast.
Further, how did you benchmark? Did you measure only the time required by x3::phrase_parse() and by the strsep part respectively, or the whole binary, including file loading time? The two measurements must be comparable! Also consider OS filesystem caching etc.
BTW, it would be interesting to see the results and the test environment (data file size, strsep implementation etc.).
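One way to keep the two measurements comparable is to load the file once and time only the parsing call. The sketch below is a minimal illustration; average_microseconds is a hypothetical helper name, not something from the question or from Boost.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` `repeats` times and returns the average
// wall-clock time per run in microseconds. Only the callable is timed, so
// file loading and other setup stay outside the measurement.
template <typename F>
double average_microseconds(F&& work, int repeats) {
    assert(repeats > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    auto const start = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    auto const stop = Clock::now();
    std::chrono::duration<double, std::micro> const total = stop - start;
    return total.count() / repeats;
}
```

Passing a lambda that parses the same in-memory buffer, once with the X3 parser and once with the strsep version, would give two numbers that exclude I/O. Make sure the parsed result is actually used afterwards so the compiler cannot optimize the work away.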
Addendum:
If you know approximately how much data to expect, you can pre-allocate memory for the vector using data.triple.reserve(10240); (or write your own constructor that takes this as an argument). This prevents reallocation during parsing (don't forget to enclose this in a try/catch block to catch std::bad_alloc etc.). IIRC the default capacity is 1000 on older gcc (unverified).
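The effect of reserve() can be observed directly by counting how often the capacity changes while appending. This is only a sketch using the standard library; count_reallocations is a hypothetical name, and the exact number of reallocations without reserve() depends on the standard library's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often push_back had to grow the vector's storage while
// appending `n` elements, after reserving `reserved` slots up front.
std::size_t count_reallocations(std::size_t n, std::size_t reserved) {
    std::vector<double> values;
    values.reserve(reserved);  // reserve(0) changes nothing
    std::size_t reallocations = 0;
    std::size_t capacity = values.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(static_cast<double>(i));
        if (values.capacity() != capacity) {  // storage was reallocated
            capacity = values.capacity();
            ++reallocations;
        }
    }
    assert(values.size() == n);
    return reallocations;
}
```

With reserved >= n the function returns 0; with reserved == 0 it returns however many growth steps the implementation needed.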

Related

Boost::Spirit doubles character when followed by a default value

I use boost::spirit to parse (a part) of a monomial like x, y, xy, x^2, x^3yz. I want to save the variables of the monomial into a map, which also stores the corresponding exponent. Therefore the grammar should also save the implicit exponent of 1 (so x stores as if it was written as x^1).
start = +(potVar);
potVar=(varName>>'^'>>exponent)|(varName>> qi::attr(1));// First try: This doubles the variable name
//potVar = varName >> (('^' >> exponent) | qi::attr(1));// Second try: This works as intended
exponent = qi::int_;
varName = qi::char_("a-z");
When using the default attribute as in the line "First try", Spirit doubles the variable name.
Everything works as intended when using the default attribute as in the line "Second try".
'First try' reads a variable x and stores the pair [xx, 1].
'Second try' reads a variable x and stores the pair [x, 1].
I think I solved the original problem myself. The second try works. However, I don't see how I doubled the variable name. Because I am about to get familiar with boost::spirit, which is a collection of challenges for me, and there are probably more to come, I would like to understand this behavior.
This is the whole code to recreate the problem. The frame of the grammar is copied from a KIT presentation, https://panthema.net/2018/0912-Boost-Spirit-Tutorial/, and Stack Overflow was already very helpful when I needed the header that enables me to use std::pair.
#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <cmath>
#include <map>
#include <utility>//for std::pair
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp> //https://stackoverflow.com/questions/53953642/parsing-map-of-variants-with-boost-spirit-x3
namespace qi = boost::spirit::qi;
template <typename Parser, typename Skipper, typename ... Args>
void PhraseParseOrDie(
const std::string& input, const Parser& p, const Skipper& s,
Args&& ... args)
{
std::string::const_iterator begin = input.begin(), end = input.end();
boost::spirit::qi::phrase_parse(
begin, end, p, s, std::forward<Args>(args) ...);
if (begin != end) {
std::cout << "Unparseable: "
<< std::quoted(std::string(begin, end)) << std::endl;
throw std::runtime_error("Parse error");
}
}
class ArithmeticGrammarMonomial : public qi::grammar<
std::string::const_iterator,
std::map<std::string, int>(), qi::space_type>
{
public:
using Iterator = std::string::const_iterator;
ArithmeticGrammarMonomial() : ArithmeticGrammarMonomial::base_type(start)
{
start = +(potVar);
potVar=(varName>>'^'>>exponent)|(varName>> qi::attr(1));
//potVar = varName >> (('^' >> exponent) | qi::attr(1));
exponent = qi::int_;
varName = qi::char_("a-z");
}
qi::rule<Iterator, std::map<std::string, int>(), qi::space_type> start;
qi::rule<Iterator, std::pair<std::string, int>(), qi::space_type> potVar;
qi::rule<Iterator, int()> exponent;
qi::rule<Iterator, std::string()> varName;
};
void test2(std::string input)
{
std::map<std::string, int> out_map;
PhraseParseOrDie(input, ArithmeticGrammarMonomial(), qi::space, out_map);
std::cout << "test2() parse result: "<<std::endl;
for(auto &it: out_map)
std::cout<< it.first<<it.second << std::endl;
}
/******************************************************************************/
int main(int argc, char* argv[])
{
std::cout << "Parse Monomial 1" << std::endl;
test2(argc >= 2 ? argv[1] : "x^3y^1");
test2(argc >= 2 ? argv[1] : "xy");
return 0;
}
Live demo

I think I solved the original problem myself. The second try works.
Indeed. It's how I'd do this (always match the AST with your parser expressions).
However, I don't see how I doubled the variable name.
It's due to backtracking with container attributes: they don't get rolled back. The first branch parses varName into the string and then fails at '^'; the parser backtracks into the second branch, which parses varName again into the same string.
boost::spirit::qi duplicate parsing on the output
Understanding Boost.spirit's string parser
Parsing with Boost::Spirit (V2.4) into container
Boost Spirit optional parser and backtracking
boost::spirit alternative parsers return duplicates
It can also crop up with semantic actions:
Boost Semantic Actions causing parsing issues
Boost Spirit optional parser and backtracking
In short:
match your AST structure in your rule expression, or use qi::hold to force the issue (at performance cost)
avoid semantic actions (Boost Spirit: "Semantic actions are evil"?)
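The effect can be reproduced without Spirit. The following is a hand-written analogy in plain C++, not Spirit's implementation: var_name, lit, exponent and the three *_try functions are hypothetical names. The input position is rewound on backtracking, but the string attribute is not. The "hold" variant parses into a copy and commits it only on success, which is roughly what qi::hold does as far as I understand it.

```cpp
#include <cassert>
#include <string>
#include <utility>

using Iter = std::string::const_iterator;
using PotVar = std::pair<std::string, int>;

// Like a parser with a container attribute: appends to `out`, never clears it.
bool var_name(Iter& it, Iter end, std::string& out) {
    if (it == end || *it < 'a' || *it > 'z') return false;
    out.push_back(*it++);
    return true;
}

bool lit(Iter& it, Iter end, char c) {
    if (it == end || *it != c) return false;
    ++it;
    return true;
}

// Parses one or more decimal digits into `out`.
bool exponent(Iter& it, Iter end, int& out) {
    if (it == end || *it < '0' || *it > '9') return false;
    out = 0;
    while (it != end && *it >= '0' && *it <= '9') out = out * 10 + (*it++ - '0');
    return true;
}

// (varName >> '^' >> exponent) | (varName >> attr(1))
PotVar first_try(const std::string& s) {
    PotVar attr;
    Iter it = s.begin();
    bool ok = var_name(it, s.end(), attr.first) && lit(it, s.end(), '^') &&
              exponent(it, s.end(), attr.second);
    if (!ok) {
        it = s.begin();  // the input position is rewound...
        ok = var_name(it, s.end(), attr.first);  // ...but attr.first is not
        attr.second = 1;
    }
    assert(ok);  // precondition: input starts with a lowercase letter
    return attr;
}

// Same alternative, but branch 1 works on a copy that is swapped in on success.
PotVar first_try_with_hold(const std::string& s) {
    PotVar attr;
    Iter it = s.begin();
    PotVar scratch = attr;
    bool ok = var_name(it, s.end(), scratch.first) && lit(it, s.end(), '^') &&
              exponent(it, s.end(), scratch.second);
    if (ok) {
        attr.swap(scratch);
    } else {
        it = s.begin();
        ok = var_name(it, s.end(), attr.first);
        attr.second = 1;
    }
    assert(ok);
    return attr;
}

// varName >> (('^' >> exponent) | attr(1)): the name is parsed exactly once.
PotVar second_try(const std::string& s) {
    PotVar attr;
    Iter it = s.begin();
    bool const ok = var_name(it, s.end(), attr.first);
    assert(ok);
    if (!(lit(it, s.end(), '^') && exponent(it, s.end(), attr.second))) {
        attr.second = 1;
    }
    return attr;
}
```

For the input "x", first_try yields ("xx", 1), while first_try_with_hold and second_try both yield ("x", 1). The copy made by the hold variant is where its extra cost comes from.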
For inspiration, here's a simplified take using Spirit X3
Live On Compiler Explorer
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
#include <map>
namespace Parsing {
namespace x3 = boost::spirit::x3;
auto exponent = '^' >> x3::int_ | x3::attr(1);
auto varName = x3::repeat(1)[x3::char_("a-z")];
auto potVar
= x3::rule<struct P, std::pair<std::string, int>>{}
= varName >> exponent;
auto start = x3::skip(x3::space)[+potVar >> x3::eoi];
template <typename T = x3::unused_type>
void StrictParse(std::string_view input, T&& into = {})
{
auto f = input.begin(), l = input.end();
if (!x3::parse(f, l, start, into)) {
fmt::print(stderr, "Error at: '{}'\n", std::string(f, l));
throw std::runtime_error("Parse error");
}
}
} // namespace Parsing
void test2(std::string input) {
std::map<std::string, int> out_map;
Parsing::StrictParse(input, out_map);
fmt::print("{} -> {}\n", input, out_map);
}
int main() {
for (auto s : {"x^3y^1", "xy"})
test2(s);
}
Prints
x^3y^1 -> [("x", 3), ("y", 1)]
xy -> [("x", 1), ("y", 1)]
Bonus Notes
It looks to me like you should be more careful. Even if you assume that all variables are 1 letter and no terms can occur (only factors), then still you need to correctly handle x^5y^2x to be x^6y^2 right?
Here's a Qi version that uses semantic actions to correctly accumulate like factors:
Live On Coliru
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <map>
namespace qi = boost::spirit::qi;
using Iterator = std::string::const_iterator;
using Monomial = std::map<char, int>;
struct ArithmeticGrammarMonomial : qi::grammar<Iterator, Monomial()> {
ArithmeticGrammarMonomial() : ArithmeticGrammarMonomial::base_type(start) {
using namespace qi;
exp_ = '^' >> int_ | attr(1);
start = skip(space)[ //
+(char_("a-z") >> exp_)[_val[_1] += _2] //
];
}
private:
qi::rule<Iterator, Monomial()> start;
qi::rule<Iterator, int(), qi::space_type> exp_;
};
void do_test(std::string_view input) {
Monomial output;
static const ArithmeticGrammarMonomial p;
Iterator f(begin(input)), l(end(input));
qi::parse(f, l, qi::eps > p, output);
std::cout << std::quoted(input) << " -> " << std::endl;
for (auto& [var,exp] : output)
std::cout << " - " << var << '^' << exp << std::endl;
}
int main() {
for (auto s : {"x^3y^1", "xy", "x^5y^2x"})
do_test(s);
}
Prints
"x^3y^1" ->
- x^3
- y^1
"xy" ->
- x^1
- y^1
"x^5y^2x" ->
- x^6
- y^2

How do you get a string out of a Boost Spirit X3 lexeme parser?

What is the simplest way to make a semantic action that extracts a string from a typical identifier parser based on boost::spirit::x3::lexeme?
I thought it might be possible to bypass needing to unpack the attribute and just use iterators into the input stream but apparently x3::_where does not do what I think it does.
The following yields output being empty. I expected it to contain "foobar_hello".
namespace x3 = boost::spirit::x3;
using x3::_where;
using x3::lexeme;
using x3::alpha;
auto ctx_to_string = [&](auto& ctx) {
_val(ctx) = std::string(_where(ctx).begin(), _where(ctx).end());
};
x3::rule<class identifier_rule_, std::string> const identifier_rule = "identifier_rule";
auto const identifier_rule_def = lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')][ctx_to_string];
BOOST_SPIRIT_DEFINE(identifier_rule);
int main()
{
std::string input = "foobar_hello";
std::string output;
auto result = x3::parse(input.begin(), input.end(), identifier_rule, output);
}
Do I need to somehow extract the string from the boost::fusion objects in x3::_attr(ctx) or am I doing something wrong?

You can simply use automatic attribute propagation, meaning you don't need the semantic action(1)
Live On Coliru
#include <iostream>
#include <iomanip>
#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace P {
x3::rule<class identifier_rule_, std::string> const identifier_rule = "identifier_rule";
auto const identifier_rule_def = x3::lexeme[(x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_'))];
BOOST_SPIRIT_DEFINE(identifier_rule)
}
int main() {
std::string const input = "foobar_hello";
std::string output;
auto result = x3::parse(input.begin(), input.end(), P::identifier_rule, output);
}
Prints
<identifier_rule>
<try>foobar_hello</try>
<success></success>
<attributes>[f, o, o, b, a, r, _, h, e, l, l, o]</attributes>
</identifier_rule>
Note I changed '_' to x3::char_('_') to capture the underscores (x3::lit does not capture what it matches)
If you insist on semantic actions,
consider using rule<..., std::string, true> to also force automatic attribute propagation
don't assume _where points to what you hope: http://coliru.stacked-crooked.com/a/336c057dabc86a84
use x3::raw[] to expose a controlled source iterator range (http://coliru.stacked-crooked.com/a/80a69ae9b99a4c61)
auto ctx_to_string = [](auto& ctx) {
std::cout << "\nSA: '" << _attr(ctx) << "'" << std::endl;
_val(ctx) = std::string(_attr(ctx).begin(), _attr(ctx).end());
};
x3::rule<class identifier_rule_, std::string> const identifier_rule = "identifier_rule";
auto const identifier_rule_def = x3::raw[ lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')] ] [ctx_to_string];
BOOST_SPIRIT_DEFINE(identifier_rule)
Note now the char_('_') doesn't make a difference anymore
consider using the built-in attribute helpers: http://coliru.stacked-crooked.com/a/3e3861330494e7c9
auto ctx_to_string = [](auto& ctx) {
using x3::traits::move_to;
move_to(_attr(ctx), _val(ctx));
};
Note how this approximates the builtin attribute propagation, though it's much less flexible than letting Spirit manage it
(1) mandatory link: Boost Spirit: "Semantic actions are evil"?
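The idea behind x3::raw[], exposing the matched source range instead of the synthesized attribute, can be illustrated without Spirit. This is a hand-written sketch, not X3 code; match_identifier is a hypothetical helper that implements the same (alpha | '_') >> *(alnum | '_') pattern and returns a view into the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: matches (alpha | '_') >> *(alnum | '_') at the start of
// `input` and returns the matched source range, or nullopt if nothing matches.
std::optional<std::string_view> match_identifier(std::string_view input) {
    auto is_head = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_tail = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (input.empty() || !is_head(input.front())) return std::nullopt;
    std::size_t length = 1;
    while (length < input.size() && is_tail(input[length])) ++length;
    assert(length <= input.size());
    return input.substr(0, length);  // a view into the source, nothing copied
}
```

Because the result is a range over the input, underscores are included whether or not the individual sub-parsers would have exposed them as attributes, which mirrors the note above about char_('_') no longer making a difference.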

Spirit Grammar To break Up a String by Number of Characters

I am working on learning to write spirit grammars and I am trying to create a basic base 16 to base 64 converter that takes in a string representing hex, for example:
49276d206b696c
parse out 6 or less characters (less if the string isn't a perfect multiple of 6) and generate a base 64 encoded string from the input. One grammar I figured would probably work is something like this:
// 6 characters
(qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F")[/*action*/]) |
// or 5 characters
(qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
qi::char_("0-9a-fA-F")[/*action*/]) | ...
etc.... all the way down to one character, Or having a different rule defined for each number of characters, but I think there must be a better way to specify the grammar. I read about spirit repeat and was thinking maybe I could do something like
+(boost::spirit::repeat(1, 6)[qi::char_("0-9a-fA-F")][/*action on characters*/])
however the compiler throws an error on this, because of the semantic action portion of the grammar. Is there a simpler way to specify a grammar to operate on exactly 6 or less characters at a time?
Edit
Here is what I have done so far...
base16convertergrammar.hpp
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <iostream>
namespace grammar {
namespace qi = boost::spirit::qi;
void toBase64(const std::string& p_input, std::string& p_output)
{
if (p_input.length() < 6)
{
// pad length
}
// use back inserter and generator to append to end of p_output.
}
template <typename Iterator>
struct Base16Grammar : qi::grammar<Iterator, std::string()>
{
Base16Grammar() : Base16Grammar::base_type(start, "base16grammar"),
m_base64String()
{
// get six characters at a time and send them off to be encoded
// if there is less than six characters just parse what we have
start = +(boost::spirit::repeat(1, 6)[qi::char_("0-9a-fA-F")][boost::phoenix::bind(toBase64, qi::_1,
boost::phoenix::ref(m_base64String))]);
}
qi::rule<Iterator, std::string()> start;
std::string m_base64String;
};
}
And here is the usage...
base16converter.cpp
#include "base16convertergrammar.hpp"
const std::string& convertHexToBase64(const std::string& p_hexString)
{
grammar::Base16Grammar<std::string::const_iterator> g;
bool r = boost::spirit::qi::parse(p_hexString.begin(), p_hexString.end(), g);
}
int main(int argc, char** argv)
{
std::string test("49276d206b696c6c");
convertHexToBase64(test);
}

First of all, repeat()[] exposes a vector, so vector<char>, not a string.
void toBase64(const std::vector<char>& p_input, std::string& p_output)
Secondly, please don't do all that work. You don't tell us what the input means, but as long as you want to group it in sixes, I'm assuming you want them interpreted as /something/. You could e.g. use the int_parser:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <iostream>
namespace grammar {
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
template <typename Iterator>
struct Base16Grammar : qi::grammar<Iterator, std::string()>
{
Base16Grammar() : Base16Grammar::base_type(start, "base16grammar")
{
start = +qi::int_parser<uint64_t, 16, 1, 6>() [ qi::_val += to_string(qi::_1) + "; " ];
}
private:
struct to_string_f { template <typename T> std::string operator()(T const& v) const { return std::to_string(v); } };
px::function<to_string_f> to_string;
qi::rule<Iterator, std::string()> start;
};
}
std::string convertHexToBase64(const std::string& p_hexString)
{
grammar::Base16Grammar<std::string::const_iterator> g;
std::string result;
bool r = boost::spirit::qi::parse(p_hexString.begin(), p_hexString.end(), g, result);
assert(r);
return result;
}
int main()
{
for (std::string test : {"49276d206b696c6c"})
std::cout << test << " -> " << convertHexToBase64(test) << "\n";
}
Prints
49276d206b696c6c -> 4794221; 2124649; 27756;
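For comparison, the same grouping can be sketched with the standard library alone. This is not the Spirit approach from the answer; hex_groups is a hypothetical helper that uses std::from_chars (C++17) on chunks of up to six hex digits. Unlike the `+` in the grammar, it accepts empty input and returns an empty vector.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: splits `hex` into groups of up to `width` hex digits and
// converts each group to an integer. Returns nullopt on any non-hex character.
std::optional<std::vector<std::uint64_t>> hex_groups(std::string_view hex,
                                                     std::size_t width = 6) {
    assert(width >= 1 && width <= 16);  // 16 hex digits fill a uint64_t
    std::vector<std::uint64_t> groups;
    for (std::size_t pos = 0; pos < hex.size(); pos += width) {
        std::string_view const chunk = hex.substr(pos, width);
        std::uint64_t value = 0;
        char const* const first = chunk.data();
        char const* const last = first + chunk.size();
        auto const result = std::from_chars(first, last, value, 16);
        // Reject the input unless the whole chunk was consumed as hex digits.
        if (result.ec != std::errc{} || result.ptr != last) return std::nullopt;
        groups.push_back(value);
    }
    return groups;
}
```

For "49276d206b696c6c" this yields 4794221, 2124649 and 27756, the same values as the Spirit output above.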

Going out on a limb, you just want to transcode hex-encoded binary into base64.
Since you're already using Boost:
Live On Coliru
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/insert_linebreaks.hpp>
#include <boost/archive/iterators/transform_width.hpp>
// for hex decoding
#include <boost/iterator/function_input_iterator.hpp>
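#include <cctype>     // std::isxdigit, std::isdigit, std::tolower
#include <cstdint>    // uint8_t
#include <stdexcept>  // std::runtime_error
#include <string>
#include <iostream>
#include <functional>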
std::string convertHexToBase64(const std::string &hex) {
struct get_byte_f {
using result_type = uint8_t;
std::string::const_iterator hex_it;
result_type operator()() {
auto nibble = [](uint8_t ch) {
if (!std::isxdigit(ch)) throw std::runtime_error("invalid hex input");
return std::isdigit(ch) ? ch - '0' : std::tolower(ch) - 'a' + 10;
};
auto hi = nibble(*hex_it++);
auto lo = nibble(*hex_it++);
return hi << 4 | lo;
}
} get_byte{ hex.begin() };
using namespace boost::archive::iterators;
using It = boost::iterators::function_input_iterator<get_byte_f, size_t>;
typedef insert_linebreaks< // insert line breaks every 72 characters
base64_from_binary< // convert binary values to base64 characters
transform_width< // retrieve 6 bit integers from a sequence of 8 bit bytes
It, 6, 8> >,
72> B64; // compose all the above operations in to a new iterator
return { B64(It{get_byte, 0}), B64(It{get_byte, hex.size()/2}) };
}
int main() {
for (std::string test : {
"49276d206b696c6c",
"736f6d65206c656e67746879207465787420746f2073686f77207768617420776f756c642068617070656e206174206c696e6520777261700a"
})
{
std::cout << " === hex: " << test << "\n" << convertHexToBase64(test) << "\n";
}
}
Prints
=== hex: 49276d206b696c6c
SSdtIGtpbGw
=== hex: 736f6d65206c656e67746879207465787420746f2073686f77207768617420776f756c642068617070656e206174206c696e6520777261700a
c29tZSBsZW5ndGh5IHRleHQgdG8gc2hvdyB3aGF0IHdvdWxkIGhhcHBlbiBhdCBsaW5lIHdy
YXAK
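Note that the output above carries no '=' padding. If padded output and no Boost dependency are wanted, a hand-rolled version is short. The following is a sketch under those assumptions, not the answer's method; nibble and hex_to_base64 are hypothetical names, and no line breaks are inserted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Converts one hex digit to its value; throws on anything else.
inline unsigned nibble(char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    throw std::runtime_error("invalid hex input");
}

// Hypothetical helper: decodes `hex` to bytes and encodes them as padded base64.
std::string hex_to_base64(std::string_view hex) {
    if (hex.size() % 2 != 0) throw std::runtime_error("odd number of hex digits");
    static constexpr char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        bytes.push_back(
            static_cast<std::uint8_t>((nibble(hex[i]) << 4) | nibble(hex[i + 1])));
    }

    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        std::size_t const n = std::min<std::size_t>(3, bytes.size() - i);
        std::uint32_t block = 0;  // up to three bytes packed into 24 bits
        for (std::size_t k = 0; k < 3; ++k) {
            block = (block << 8) | (k < n ? bytes[i + k] : 0u);
        }
        out += alphabet[(block >> 18) & 63];
        out += alphabet[(block >> 12) & 63];
        out += n > 1 ? alphabet[(block >> 6) & 63] : '=';
        out += n > 2 ? alphabet[block & 63] : '=';
    }
    assert(out.size() % 4 == 0);  // padded base64 is always a multiple of 4
    return out;
}
```

For "49276d206b696c6c" this returns "SSdtIGtpbGw=", which is the Boost result above plus the padding character.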

Parse a C-string of floating numbers

I have a C-string which contains a list of floating numbers separated by commas and spaces. Each pair of numbers is separated by one (or more) spaces and represents a point where the x and y fields are separated by a comma (and optionally by spaces).
" 10,9 2.5, 3 4 ,150.32 "
I need to parse this string in order to fill a list of Point(x, y).
Following is my current implementation:
const char* strPoints = getString();
std::istringstream sstream(strPoints);
float x, y;
char comma;
while (sstream >> x >> comma >> y)
{
myList.push(Point(x, y));
}
Since I need to parse a lot (up to 500,000) of these strings I'm wondering if there is a faster solution.

Look at Boost Spirit:
How to parse space-separated floats in C++ quickly?
It supports NaN, positive and negative infinity just fine. Also it allows you to express the constraining grammar succinctly.
Simple adaptation of the code
Here is the adapted sample for your grammar:
struct Point { float x,y; };
typedef std::vector<Point> data_t;
// And later:
bool ok = phrase_parse(f,l,*(double_ > ',' > double_), space, data);
The iterators can be any iterators. So you can hook it up with your C string just fine.
Here's a straight adaptation of the linked benchmark case. This shows you how to parse from any std::istream or directly from a memory mapped file.
Live On Coliru
Further optimizations (strictly for C strings)
Here's a version that doesn't need to know the length of the string up front (this is neat because it avoids the strlen call in case you didn't have the length available):
template <typename OI>
static inline void parse_points(OI out, char const* it, char const* last = std::numeric_limits<char const*>::max()) {
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
bool ok = qi::phrase_parse(it, last,
*(qi::double_ >> ',' >> qi::double_) [ *phx::ref(out) = phx::construct<Point>(qi::_1, qi::_2) ],
qi::space);
if (!ok || !(it == last || *it == '\0')) {
throw it; // TODO proper error reporting?
}
}
Note how I made it take an output iterator so that you get to decide how to accumulate the results. The obvious wrapper to /just/ parse to a vector would be:
static inline data_t parse_points(char const* szInput) {
data_t pts;
parse_points(back_inserter(pts), szInput);
return pts;
}
But you can also do different things (like append to an existing container, that could have reserved a known capacity up front etc.). Things like this often allow truly optimized integration in the end.
Here's that code fully demo-ed in ~30 lines of essential code:
Live On Coliru
Extra Awesome Bonus
To show off the flexibility of this parser; if you just wanted to check the input and get a count of the points, you can replace the output iterator with a simple lambda function that increments a counter instead of adds a newly constructed point.
int main() {
int count = 0;
parse_points(boost::make_function_output_iterator([&](Point const&){count++;}), " 10,9 2.5, 3 4 ,150.32 ");
std::cout << "elements in sample: " << count << "\n";
}
Live On Coliru
Since everything is inlined the compiler will notice that the whole Point doesn't need to be constructed here and eliminate that code: http://paste.ubuntu.com/9781055/
The main function is seen directly invoking the very parser primitives. Handcoding the parser won't get you better tuning here, at least not without a lot of effort.
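The two design points of this answer, stopping at the terminating NUL instead of calling strlen first and writing results through an output iterator, do not depend on Spirit. Below is a minimal sketch of the same interface built on std::strtof; it is a swapped-in technique, not the Qi parser above, and parse_points_strtof is a hypothetical name. Note that strtof honours the current C locale and also accepts forms such as "inf", "nan" and hex floats.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <iterator>
#include <vector>

struct Point { float x, y; };

// Hypothetical helper: parses "x , y" pairs from a NUL-terminated string and
// writes each Point to `out`. Returns false on malformed input.
template <typename OutputIt>
bool parse_points_strtof(char const* s, OutputIt out) {
    assert(s != nullptr);
    auto skip_space = [](char const* p) {
        // isspace('\0') is false, so this never runs past the terminator.
        while (std::isspace(static_cast<unsigned char>(*p))) ++p;
        return p;
    };
    for (s = skip_space(s); *s != '\0'; s = skip_space(s)) {
        char* end = nullptr;
        float const x = std::strtof(s, &end);
        if (end == s) return false;            // no number where x was expected
        s = skip_space(end);
        if (*s != ',') return false;           // x and y must be comma-separated
        ++s;
        float const y = std::strtof(s, &end);  // strtof skips leading whitespace
        if (end == s) return false;
        s = end;
        *out++ = Point{x, y};
    }
    return true;
}
```

Calling it with std::back_inserter on a vector collects the points. Any other output iterator works the same way, so the caller still decides how results are accumulated.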

I got much better performance parsing out the points using a combination of std::find and std::strtof and the code wasn't much more complicated. Here's the test I ran:
#include <iostream>
#include <sstream>
#include <random>
#include <chrono>
#include <cctype>
#include <algorithm>
#include <cstdlib>
#include <forward_list>
#include <iterator>  // std::next
#include <string>
#include <tuple>     // std::tuple, std::tie
struct Point { float x; float y; };
using PointList = std::forward_list<Point>;
using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;
std::string generate_points(int n) {
static auto random_generator = std::mt19937{std::random_device{}()};
std::ostringstream oss;
std::uniform_real_distribution<float> distribution(-1, 1);
for (int i=0; i<n; ++i) {
oss << distribution(random_generator) << " ," << distribution(random_generator) << "\t \n";
}
return oss.str();
}
PointList parse_points1(const char* s) {
std::istringstream iss(s);
PointList points;
float x, y;
char comma;
while (iss >> x >> comma >> y)
points.push_front(Point{x, y});
return points;
}
inline
std::tuple<Point, const char*> parse_point2(const char* x_first, const char* last) {
auto is_whitespace = [](char c) { return std::isspace(c); };
auto x_last = std::find(x_first, last, ',');
auto y_first = std::find_if_not(std::next(x_last), last, is_whitespace);
auto y_last = std::find_if(y_first, last, is_whitespace);
auto x = std::strtof(x_first, (char**)&x_last);
auto y = std::strtof(y_first, (char**)&y_last);
auto next_x_first = std::find_if_not(y_last, last, is_whitespace);
return std::make_tuple(Point{x, y}, next_x_first);
}
PointList parse_points2(const char* i, const char* last) {
PointList points;
Point point;
while (i != last) {
std::tie(point, i) = parse_point2(i, last);
points.push_front(point);
}
return points;
}
int main() {
auto s = generate_points(500000);
auto time0 = Clock::now();
auto points1 = parse_points1(s.c_str());
auto time1 = Clock::now();
auto points2 = parse_points2(s.data(), s.data() + s.size());
auto time2 = Clock::now();
std::cout << "using stringstream: "
<< std::chrono::duration_cast<milliseconds>(time1 - time0).count() << '\n';
std::cout << "using strtof: "
<< std::chrono::duration_cast<milliseconds>(time2 - time1).count() << '\n';
return 0;
}
outputs:
using stringstream: 1262
using strtof: 120

You can first try to disable the synchronization with the C I/O:
std::ios::sync_with_stdio(false);
Source: Using scanf() in C++ programs is faster than using cin?
You can also try to use alternatives to iostream:
boost::lexical_cast, with BOOST_LEXICAL_CAST_ASSUME_C_LOCALE defined
scanf
I think you should give the sync_with_stdio(false) a try. The other alternatives require more coding, and I'm not sure that you will win much (if any).

Boost spirit, returned value from a semantic action interferes with the rule attribute

The following program is an artificial example (reduced from a larger grammar on which I'm working) to exhibit a strange behaviour.
The output of the program run as is is "hello" and is incorrect.
If I remove the (useless in this example) semantic action from the quoted_string rule the output is the expected "foo=hello".
#define BOOST_RESULT_OF_USE_DECLTYPE
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <vector>
#include <string>
#include <iostream>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include "utils.hpp"
namespace t {
using std::vector;
using std::string;
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
template <typename Iterator, typename Skipper=qi::space_type>
struct G1 : qi::grammar<Iterator, string(), Skipper> {
template <typename T>
using rule = qi::rule<Iterator, T, Skipper>;
qi::rule<Iterator, string(), qi::locals<char>> quoted_string;
rule<string()> start;
G1() : G1::base_type(start, "G1") {
{
using qi::_1;
using qi::_a;
using attr_signature = vector<char>;
auto handler = [](attr_signature const& elements) -> string {
string output;
for(auto& e : elements) {
output += e;
}
return output;
};
quoted_string = (qi::omit[qi::char_("'\"")[_a = _1]]
>> +(qi::char_ - qi::char_(_a))
>> qi::lit(_a))[qi::_val = phx::bind(handler, _1)];
}
start = qi::string("foo") >> -(qi::string("=") >> quoted_string);
}
};
string parse(string const input) {
G1<string::const_iterator> g;
string result;
phrase_parse(begin(input), end(input), g, qi::standard::space, result);
return result;
}
};
int main() {
using namespace std;
auto r = t::parse("foo='hello'");
cout << r << endl;
}
I can definitely find a workaround, but I'd like to figure out what I am missing.

As @cv_and_he explained, you're overwriting the attribute with the result of handler(_1). Since attributes are passed by reference, you lose the original value.
Automatic attribute propagation rules know how to concatenate "string" container values, so why don't you just use the default implementation?
quoted_string %= qi::omit[qi::char_("'\"")[_a = _1]]
>> +(qi::char_ - qi::char_(_a))
>> qi::lit(_a);
(Note the %=; this enables automatic propagation even in the presence of semantic actions).
Alternatively, you can push-back from inside the SA:
>> +(qi::char_ - qi::char_(_a)) [ phx::push_back(qi::_val, _1) ]
And, if you really need some processing done in handler, make it take the string by reference:
auto handler = [](attr_signature const& elements, std::string& attr) {
for(auto& e : elements) {
attr += e;
}
};
quoted_string = (qi::omit[qi::char_("'\"")[_a = _1]]
>> +(qi::char_ - qi::char_(_a))
>> qi::lit(_a)) [ phx::bind(handler, _1, qi::_val) ];
All these approaches work.
For really heavy duty things, I have in the past used a custom string type with boost::spirit::traits customization points to do the transformations:
http://www.boost.org/doc/libs/1_55_0/libs/spirit/doc/html/spirit/advanced/customize.html
