I wonder whether it is possible to change the parser at runtime, given that the change does not affect the compound attribute.
Let's say I want to be able to change, at runtime, the character my parser uses to detect that a line has to be joined, from ; to ~. Both are just characters, and since the C++ types and the template instantiations don't vary (in both cases we are talking about a char), I think there must be some way, but I can't find it. So is this possible?
My concrete situation is that I am calling the X3 parser via C++/CLI and need the character to be adjustable from .NET. I hope the following example is enough to understand my problem.
http://coliru.stacked-crooked.com/a/1cc2f2836dbfaa46
Kind regards
You cannot change the parser at runtime (except a DSO trick I described under your other question https://stackoverflow.com/a/56135824/3621421), but you can make your parser context-sensitive via semantic actions and/or stateful parsers (like x3::symbols).
The state for semantic actions (or probably for your custom parser) can also be stored in a parser context. However, usually I see that folks use global or function local variables for this purpose.
A simple example:
#include <boost/spirit/home/x3.hpp>
#include <cstring>
#include <iostream>
namespace x3 = boost::spirit::x3;
int main()
{
char const* s = "sep=,\n1,2,3", * e = s + std::strlen(s);
auto p = "sep=" >> x3::with<struct sep_tag, char>('\0')[
x3::char_[([](auto& ctx) { x3::get<struct sep_tag>(ctx) = _attr(ctx); })] >> x3::eol
>> x3::int_ % x3::char_[([](auto& ctx) { _pass(ctx) = x3::get<struct sep_tag>(ctx) == _attr(ctx); })]
];
if (parse(s, e, p) && s == e)
std::cout << "OK\n";
else
std::cout << "Failed\n";
}
In a data processing project, I need to detect and split words in Chinese (words in Chinese are not separated by spaces).
Is there a way to detect Chinese characters using a native C++ feature or the Boost.Locale library?
Generally speaking, if you want full Unicode support in C++, there is little to no way around ICU. Boost provides some access to its features (through Boost.Locale and Boost.Regex), but it requires Boost to be compiled with ICU support for this. So instead of making sure the Boost of the target platform is compiled thusly you are probably better off using the ICU API directly.
If you are looking for word boundaries, icu::BreakIterator (more specifically, icu::BreakIterator::createWordInstance) is the starting point. You then pass the text to be iterated over via setText and move the iterator via next et al. (yes, ICU is a bit non-idiomatic this way, as it originated in Java land).
Alternatively, if you don't want to go for the full C++ API, there's ublock_getCode which will tell you the UBlockCode of the code point in question.
Here is my attempt using only boost and standard library:
#include <iostream>
#include <string>
#include <boost/regex/pending/unicode_iterator.hpp>
#include <functional>
#include <algorithm>
using Iter = boost::u8_to_u32_iterator<std::string::const_iterator>;
template <::boost::uint32_t a, ::boost::uint32_t b>
class UnicodeRange
{
static_assert(a <= b, "Proper range");
public:
constexpr bool operator()(::boost::uint32_t x) const noexcept
{
return x >= a && x <= b;
}
};
using UnifiedIdeographs = UnicodeRange<0x4E00, 0x9FFF>;
using UnifiedIdeographsA = UnicodeRange<0x3400, 0x4DBF>;
using UnifiedIdeographsB = UnicodeRange<0x20000, 0x2A6DF>;
using UnifiedIdeographsC = UnicodeRange<0x2A700, 0x2B73F>;
using UnifiedIdeographsD = UnicodeRange<0x2B740, 0x2B81F>;
using UnifiedIdeographsE = UnicodeRange<0x2B820, 0x2CEAF>;
using CompatibilityIdeographs = UnicodeRange<0xF900, 0xFAFF>;
using CompatibilityIdeographsSupplement = UnicodeRange<0x2F800, 0x2FA1F>;
constexpr bool isChineese(::boost::uint32_t x) noexcept
{
return UnifiedIdeographs{}(x)
|| UnifiedIdeographsA{}(x) || UnifiedIdeographsB{}(x) || UnifiedIdeographsC{}(x)
|| UnifiedIdeographsD{}(x) || UnifiedIdeographsE{}(x)
|| CompatibilityIdeographs{}(x) || CompatibilityIdeographsSupplement{}(x);
}
int main()
{
std::string s;
while (std::getline(std::cin, s))
{
auto start = std::find_if(Iter{s.cbegin()}, Iter{s.cend()}, isChineese);
auto stop = std::find_if_not(start, Iter{s.cend()}, isChineese);
std::cout << std::string{start.base(), stop.base()} << '\n';
}
return 0;
}
https://wandbox.org/permlink/FtxKa8D2LtR3ko9t
You should be able to polish this approach into something fully functional.
I do not know how to cover this properly with tests, and I am not sure which characters should be included in the check.
I'm writing some code that uses Z3 strings to evaluate permissions in ACLs. So far, with SMT2, this has been relatively easy. An example of what I'm trying to achieve:
(declare-const Group String)
(declare-const Resource String)
(define-fun acl1() Bool
(or (and
(= Group "employee")
(str.prefixof "shared/News_" Resource))
(and
(= Group "manager")
(or (str.prefixof "shared/Internal_" Resource)
(str.prefixof "shared/News_" Resource))
)))
(define-fun acl2() Bool
(and (and (str.prefixof "shared/" Resource)
(str.in.re Group re.allchar))
(not (and (str.prefixof "shared/Internal_" Resource)
(= Group "employee")))))
;; perm(acl1) <= perm(acl2) iff acl1 => acl2
(define-fun conjecture() Bool
(=> (= acl1 true)
(= acl2 true)))
(assert (not conjecture))
(check-sat)
Reading the Z3 C++ bindings, I can't yet figure out how to fit a z3::function into this. So far, assuming that define-fun is just a Lisp-style macro, I have this:
#include <iostream>
#include <string>
#include <z3++.h>
z3::expr acl1(z3::context& c, z3::expr& G, z3::expr& R)
{
return (((G == c.string_val("employee")) &&
z3::prefixof(c.string_val("shared/News_"), R)) ||
((G == c.string_val("manager")) &&
(z3::prefixof(c.string_val("shared/Internal_"), R) ||
z3::prefixof(c.string_val("shared/News_"), R))));
}
z3::expr acl2(z3::context& c, z3::expr& G, z3::expr& R)
{
return ((z3::prefixof(c.string_val(""), G) &&
z3::prefixof(c.string_val("shared/"), R)) &&
!((G == c.string_val("employee")) &&
(z3::prefixof(c.string_val("shared/Internal"), R))));
}
z3::expr MakeStringFunction(z3::context* c, std::string s) {
z3::sort sort = c->string_sort();
z3::symbol name = c->str_symbol(s.c_str());
return c->constant(name, sort);
}
void acl_eval()
{
z3::context c;
auto Group = MakeStringFunction(&c, "Group");
auto Resource = MakeStringFunction(&c, "Resource");
auto acl1_f = acl1(c, Group, Resource);
auto acl2_f = acl2(c, Group, Resource);
auto conjecture = implies(acl1_f == c.bool_val(true),
acl2_f == c.bool_val(true));
z3::solver s(c);
s.add(!conjecture);
std::cout << s.to_smt2() << std::endl;
switch(s.check()){
case z3::unsat: std::cout<< "Valid Conjecture" << std::endl; break;
case z3::sat: std::cout << "Invalid Conjecture" << std::endl; break;
case z3::unknown: [[fallthrough]];
default:
std::cout << "Unknown" << std::endl;
}
}
int main(){
acl_eval();
return 0;
}
Is this how functions are meant to be handled in the C++ bindings?
The SMT2 code generated by the C++ bindings doesn't look exactly like the hand-written version, but I see a whole expression inside an assert with let bindings, which more or less does what I want. Additionally, I also want to know whether the C++ bindings support regex functions like the ones Z3's SMT-LIB interface exposes. I can't find any examples and the docs aren't very clear.
In general, you do not need to create "functions" in SMTLib when you're using the C++ (or any other high-level) API. Instead, you simply write functions in those languages, which generate the required code directly. This does sound confusing at first, but it is the intended use case: SMTLib functions get replaced by functions in the host language. Running them in the host language then produces the necessary syntax trees in the object language; i.e., Z3's internal AST representation. Especially in your case, you do not need any "arguments" passed to these functions, so you shouldn't be creating any at all. So, what you did here is correct.
(Side note: There can be scenarios where you do want to spit out functions in SMTLib. For instance if you want to use uninterpreted functions. Or perhaps you want to use the recursive function definitions, which you cannot really do in the host language. But let's not conflate the matters here. If you do feel you actually do need them, please ask a separate question about that. From your description, I see no reason for them.)
Regarding regular expressions: they're all available in the C++ API; take a look here: https://z3prover.github.io/api/html/z3_09_09_8h_source.html#l03334
In particular, the functions you're looking for are:
in_re: For checking membership
re_full: Regular expression accepting all strings (Somewhat confusingly, SMTLib's allchar is called re_full in the C++ API.)
Hopefully that'll get you started!
Let's say we want to parse a recursive block like this. When "skip_comments_tag" is prefixed to a block, we skip all comments (/*...*/) within that block, recursively.
{
{}
{
skip_comments_tag{
{} /*comments*/
{ /*comments*/ }
}
}
}
It's easy to come up with a recursive parser as in Coliru.
namespace Parser {
auto const ruleComment = x3::lit("/*") >> *(x3::char_ - "*/") >> "*/" | x3::space;
x3::rule<struct SBlockId> const ruleBlock;
auto const ruleBlock_def = x3::lit('{') >> *(ruleBlock | "skip_comments_tag" >> x3::skip(ruleComment)[ruleBlock]) >> '}';
BOOST_SPIRIT_DEFINE(ruleBlock)
}
But it doesn't compile (when the parse function is called) because it will generate an infinite context (by x3::make_context in x3::skip_directive). x3::no_case and x3::with also have this problem because they all use x3::make_context in the implementation.
Questions:
Is there a better way to write parsers for this kind of problem that avoids such compile errors, and if so, how?
Or is the x3::make_context implementation considered flawed for this kind of problem?
Honestly, I do think this is a limitation in the make_context facility, and yes, it has bitten me before.
You might be able to skirt it by using TU separation (the BOOST_SPIRIT_DECLARE, BOOST_SPIRIT_DEFINE and BOOST_SPIRIT_INSTANTIATE macros).
I'd report it on the mailing list: [spirit-general]
See also http://boost.2283326.n4.nabble.com/Horrible-compiletimes-and-memory-usage-while-compiling-a-parser-with-X3-td4689104i20.html (FWIW I feel the "sequence partitioning" issue is unrelated)
I played a little with the code at link and I have another question. I added a semantic action to:
action = actions_ >> '(' >> parameters >> ')'[ /* semantic action placed here */];
so I can reuse the rule, together with its verification, in multiple places. The problem is that Spirit then stops propagating my attribute type to the upper rules (which use action as a parser). I read at link that operator %= should be used to re-enable this (to have both semantic actions and attribute propagation). But then I get a compiler error saying that it is not possible to convert boost::fusion::vector2<ast::actionid, ast::parameters> to ast::action. Is there any macro in Fusion to enable assignment in the other direction? Or what should I do so that the rule still exposes the same attribute as the one passed to the semantic action, instead of a fusion vector?
Sample code:
#include "stdafx.h"
// boost
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_iter_pos.hpp>
#include <boost/bind.hpp>
#include <boost/phoenix.hpp>
// std
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
namespace bsqi = boost::spirit::qi;
namespace bsqi_coding = boost::spirit::standard_wide;
namespace bsqi_repos = boost::spirit::repository::qi;
//////////////////////////////////////////////////////////////////////////
enum class Action : uint8_t
{
eAction0 = 0,
eAction1,
eAction2
};
//////////////////////////////////////////////////////////////////////////
struct ActionSymbols : public boost::spirit::qi::symbols<wchar_t, Action>
{
ActionSymbols()
{
add
(L"action0", Action::eAction0)
(L"action1", Action::eAction1)
(L"action2", Action::eAction2)
;
}
} actionSymbols;
//////////////////////////////////////////////////////////////////////////
using ParameterValue = boost::variant<int, std::wstring>;
struct Parameter
{
std::wstring::const_iterator source; ///< position within the input where parameter begins
ParameterValue value; ///< type and value of the parameter
};
//////////////////////////////////////////////////////////////////////////
using Parameters = std::vector<Parameter>;
//////////////////////////////////////////////////////////////////////////
struct ActionParameters
{
Action action;
Parameters parameters;
};
//////////////////////////////////////////////////////////////////////////
BOOST_FUSION_ADAPT_STRUCT(Parameter, (std::wstring::const_iterator, source), (ParameterValue, value));
BOOST_FUSION_ADAPT_STRUCT(ActionParameters, (Action, action), (Parameters, parameters));
//////////////////////////////////////////////////////////////////////////
class SyntaxError : public std::runtime_error
{
public:
SyntaxError()
: std::runtime_error("Syntax error!")
{ }
};
//////////////////////////////////////////////////////////////////////////
template<typename IteratorT>
struct ScriptGrammar : bsqi::grammar<IteratorT, std::vector<ActionParameters>, bsqi_coding::space_type>
{
/// helper type to define all rules
template<typename T>
using RuleT = bsqi::rule<iterator_type, T, bsqi_coding::space_type>;
using result_type = std::vector<ActionParameters>;
explicit ScriptGrammar()
: base_type(start, "script")
{
// supported parameter types (int or quoted strings)
// note: iter_pos is used for saving the iterator for the parameter to enable generating more detailed error reports
parameter = bsqi_repos::iter_pos >> (bsqi::int_ | bsqi::lexeme[L'"' > *(bsqi_coding::char_ - L'"') > L'"']);
parameter.name("parameter");
// comma separator list of parameters (or no parameters)
parameters = -(parameter % L',');
parameters.name("parameters");
// action with parameters
action = (actionSymbols > L'(' > parameters > L')')[bsqi::_pass = boost::phoenix::bind(&ScriptGrammar::ValidateAction, this, bsqi::_1, bsqi::_2)];
action.name("action");
// action(..) [-> event(..) -> event(..) -> ..]
// eps = force to use this rule for parsing
// eoi = the rule must consume whole input
start = bsqi::eps > (action % L';') > L';' > bsqi::eoi;
}
private:
bool ValidateAction(Action action, const Parameters& parameters)
{
return true;
}
RuleT<Parameter> parameter;
RuleT<Parameters> parameters;
RuleT<ActionParameters> action;
RuleT<std::vector<ActionParameters>> start;
};
//////////////////////////////////////////////////////////////////////////
int _tmain(int argc, _TCHAR* argv[])
{
using ScriptParser = ScriptGrammar<std::wstring::const_iterator>;
ScriptParser parser;
auto input = std::wstring(L"\taction0(1, 2, 3); action1(\"str1\", \"str2\"); action2(\"strmix\", 0);\t\t");
auto it = input.begin();
ScriptParser::result_type output;
try
{
if(!phrase_parse(it, input.end(), parser, bsqi_coding::space, output))
throw SyntaxError();
}
catch(bsqi::expectation_failure<ScriptParser::iterator_type>& e)
{
std::cout << "Error! Expecting " << e.what_ << " here: \"" << std::string(e.first, e.last) << "\"";
}
catch(SyntaxError& e)
{
std::cout << e.what() << "\n";
}
return 0;
}
I am trying to get an attribute from the start rule. I get correctly parsed values in my semantic action (ValidateAction), but the attribute of the start rule receives only uninitialized values (the output std::vector has size 3, but its elements are uninitialized). I tried initializing the rules with %= instead of plain =, but then the compilation error mentioned above pops up.
There is BOOST_SPIRIT_ACTIONS_ALLOW_ATTR_COMPAT, which is supposed to allow attribute compatibility rules to work inside semantic actions the way they work during automatic attribute propagation.
However, the superior solution is to specify the conversions you wish, when you wish them.
The most obvious approaches are
wrap the intermediate into a qi::rule<..., T()>
Incidentally, I already solved your particular issue that way in my answer to your previous question, "boost spirit reporting semantic error".
Actually, I suppose you would like to have a stateful validator working on the fly, and you can use Attribute Traits to transform your intermediates to the desired AST (e.g. if you don't want to actually store the iterators in your AST)
wrap the sub-expression in a qi::attr_cast<T>(p) directive.
Beware of a bug in some versions of Boost Spirit that requires you to explicitly deep-copy the subexpression in attr_cast (use qi::copy(p)).
I want to change the value of a local variable in a semantic action, like the following:
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace spirit = boost::spirit;
namespace ascii = boost::spirit::ascii;
using boost::phoenix::ref;
using boost::phoenix::bind;
void dummy(const std::vector<char>& v, int& var)
{
var = 7;
}
template <typename Iterator>
struct x_grammar : public qi::grammar<Iterator, std::string(), ascii::space_type>
{
public:
x_grammar() : x_grammar::base_type(start_rule, "x_grammar")
{
using namespace qi;
int local_var = 0;
start_rule = (+(char_ - ";"))[bind(dummy, _1, ref(local_var))];
//repeat(ref(local_var))[some_rule];
}
private:
qi::rule<Iterator, std::string(), ascii::space_type> start_rule;
};
int main()
{
typedef std::string::const_iterator iter;
std::string storage("string;aaa");
iter it_begin(storage.begin());
iter it_end(storage.end());
std::string read_data;
using boost::spirit::ascii::space;
x_grammar<iter> g;
try {
bool r = qi::phrase_parse(it_begin, it_end, g, space, read_data);
std::cout << "Pass!\n";
} catch (const qi::expectation_failure<iter>& x) {
std::cout << "Error!\n";
}
}
I am getting some annoying compile errors using GCC 4.6.1 with boost 1.55.
I can't help but note that if compiler errors annoy you, then perhaps you should write valid code :/
Instructive Hat On...
While that's of course a flippant remark, it's also somewhat enlightening.
I've told you twice now that the whole idea of using constructor local variables in your grammar is fundamentally broken:
Boost spirit semantic action not invoked
Boost spirit using local variables
What you want is
inherited attributes
qi::locals
maayyyyybe, maaaayyybe grammar member variables; with the caveat that they make your rules non-re-entrant.
The important thing here to really get inside your head is
Boost Spirit generates parsers from expression templates. Expression templates are 90% static information (type only), and get "compiled" (.compile()) into "invokable" (.parse()) form.
Most importantly, while you can write control flow in your semantic actions, none of this is actually executed at the definition site. It's "compiled" into a lazy actor that can later be invoked.
The generated parser will conditionally invoke the lazy actor when the corresponding parse expression matches.
Constructive Hat On...
It looks like you just want to transform attributes using a function.
Here's what you can do:
transform as part of the semantic action, placing the result into the regular attribute (maintaining 'functional' semantics for parser composition):
qi::rule<Iterator, exposed(), Skipper> myrule;
myrule = int_ [ _val = phx::bind(custom_xform, _1) ];
Where custom_xform is any old-school callable (including polymorphic ones):
exposed custom_xform(int i) { return make_new_exposed(i); }
// or
struct custom_xform_t {
template <typename> struct result { typedef exposed type; };
template <typename Int>
exposed operator()(Int i) const {
return make_new_exposed(i);
}
};
static const custom_xform_t custom_xform;
You can add some syntactic sugar [1]
qi::rule<Iterator, exposed(), Skipper> myrule;
myrule = int_ [ _val = custom_xform(_1) ];
This requires custom_xform is defined as a lazy actor:
phx::function<custom_xform_t> custom_xform; // `custom_xform_t` again the (polymorphic) functor
You may note this wouldn't work for a regular function. You could wrap it in a callable object, or use the BOOST_PHOENIX_ADAPT_FUNCTION macro to do just that for you.
If you have some more involved transformations that you want to apply more often, consider using the Spirit Customization Points:
Customization of Spirit's Attribute Handling, specifically:
Transform an Attribute to a Different Type
Store a Parsed Attribute Value
These work most smoothly if you choose specific types for your attributes (e.g. Ast::Multiplicity or Ast::VelocityRanking, instead of int or double).
[1] using BOOST_SPIRIT_USE_PHOENIX_V3
The code compiles with C++03. However, when using GCC 4.6's C++11 support, the code fails to compile. Here are the relevant excerpts from the error:
/usr/local/include/boost/spirit/home/support/action_dispatch.hpp: In static
member function 'static void boost::spirit::traits::action_dispatch<
Component>::caller(F&&, A&& ...) [with F =
const std::_Bind<with Boost.Phoenix actors>]'
...
main.cpp:25:9: instantiated from 'x_grammar<Iterator>::x_grammar() [...]
/usr/local/include/boost/spirit/home/support/action_dispatch.hpp:142:13: error:
no matching function for call to 'boost::spirit::traits::
action_dispatch<...>::do_call(const std::_Bind<with Boost.Phoenix actors>)'
Despite the using boost::phoenix::bind declaration, the unqualified call to bind() is resolving to std::bind() rather than boost::phoenix::bind(), while the arguments are resolving to Boost.Phoenix actors. The Boost.Spirit documentation specifically warns against mixing placeholders from different libraries:
You have to make sure not to mix placeholders with a library they don't belong to and not to use different libraries while writing a semantic action.
Hence, the compilation problem can be resolved by being explicit when defining the semantic action. Use either:
std::bind(dummy, std::placeholders::_1, std::ref(local_var))
or:
boost::phoenix::bind(dummy, _1, ref(local_var))
While that resolves the compiler error, it is worth noting that the ref(local_var) object will hold a dangling reference, as its lifetime extends beyond that of local_var. Here is a working example where local_var's lifetime is extended beyond the scope of the constructor by making it static.