I have a section of a schema for a model that I need to parse. Let's say it looks like the following.
{
type = "Standard";
hostname="x.y.z";
port="123";
}
The properties are:
The elements may appear in any order.
All elements that are part of the schema must appear, and no others.
All of the elements' synthesised attributes go into a struct.
(optional) The schema might in the future depend on the type field -- i.e., different fields based on type -- however I am not concerned about this at the moment.
According to the Spirit forums, the following is the answer.
You might want to have a look at the
permutation parser:
a ^ b ^ c
which matches a, b, or c (or any combination thereof) in any order.
If the objective is to parse into a struct, then the best way to test whether all essential members have been initialized is to wrap the struct members in boost::optional<>. Attribute presence can then easily be tested at run time, after parsing.
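For illustration, here is a minimal, self-contained sketch along those lines (the Settings struct, rule names, and input are mine, assuming Boost.Spirit Qi and Boost.Fusion are available). It parses the example block with the permutation operator and verifies after parsing that every mandatory element appeared:

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/optional.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

// Each member is optional so the permutation parser can fill them in any order.
struct Settings {
    boost::optional<std::string> type;
    boost::optional<std::string> hostname;
    boost::optional<std::string> port;
};

BOOST_FUSION_ADAPT_STRUCT(Settings,
    (boost::optional<std::string>, type)
    (boost::optional<std::string>, hostname)
    (boost::optional<std::string>, port))

int main() {
    std::string const input = "{ type = \"Standard\"; hostname=\"x.y.z\"; port=\"123\"; }";

    qi::rule<std::string::const_iterator, std::string(), qi::space_type> quoted =
        qi::lexeme['"' >> *~qi::char_('"') >> '"'];

    Settings s;
    auto it = input.begin(), end = input.end();
    bool ok = qi::phrase_parse(it, end,
        '{' >> (   ("type"     >> qi::lit('=') >> quoted >> ';')
                 ^ ("hostname" >> qi::lit('=') >> quoted >> ';')
                 ^ ("port"     >> qi::lit('=') >> quoted >> ';'))
            >> '}',
        qi::space, s);

    // The permutation parser also succeeds when elements are missing,
    // so presence has to be verified after the parse.
    if (ok && it == end && s.type && s.hostname && s.port)
        std::cout << *s.hostname << ":" << *s.port << "\n";
}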
What can be the use case of std::variant holding the same type more than once?
Refer to https://en.cppreference.com/w/cpp/utility/variant
The issue only surfaces when you start to call std::get<T>(v): if T appears more than once among the alternatives, the call is ill-formed because the alternative is ambiguous, whereas std::get<I>(v) by index still works.
Say we want to represent a token that can be a keyword, an identifier, or a symbol. One possible implementation is thus:
enum TokenType : std::size_t {
Keyword = 0, Identifier = 1, Symbol = 2
};
using Token = std::variant<std::string, std::string, char>;
Now it is possible to use, for example:
std::get<TokenType::Keyword>(token)
to access the alternatives.
Whether this is a good idea is of course up for debate, but it does show the existence of such use cases.
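To make the example concrete, here is a small self-contained sketch. Note that construction has to go through std::in_place_index, because constructing from a plain std::string would be ambiguous between the Keyword and Identifier alternatives:

#include <iostream>
#include <string>
#include <variant>

enum TokenType : std::size_t { Keyword = 0, Identifier = 1, Symbol = 2 };
using Token = std::variant<std::string, std::string, char>;

int main() {
    // Token t = std::string("while");   // would not compile: ambiguous alternative
    Token t(std::in_place_index<TokenType::Keyword>, "while");

    std::cout << t.index() << "\n";                         // prints 0
    std::cout << std::get<TokenType::Keyword>(t) << "\n";   // index-based access is fine
    // std::get<std::string>(t);   // ill-formed: std::string appears more than once
}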
std::variant is designed for use in generic code as a sum type.
It needs the algebraic property that when you concatenate the alternative lists of two variants a and b (only one of which is active), a value from a keeps its alternative index, while a value from b gets the count of a's alternatives plus its index in b.
Because, honestly, anything else is madness there.
Fundamentally, the name of a variant member is its index, not its type.
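To see that property in code, here is a contrived sketch of my own: concatenating the alternative lists of two variants offsets the indices of the second by the first's alternative count, and this works even when the result contains duplicate types.

#include <cassert>
#include <variant>

using A  = std::variant<int, char>;                 // alternatives 0, 1
using B  = std::variant<double, char>;              // alternatives 0, 1
using AB = std::variant<int, char, double, char>;   // A's list followed by B's

int main() {
    B b{'x'};   // holds alternative 1 of B

    // A value coming from b lands at (number of alternatives in A) + (its index in B).
    AB merged(std::in_place_index<std::variant_size_v<A> + 1>, std::get<1>(b));
    assert(merged.index() == 3);
}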
You are free to write a variant alias that bans duplicate types for your own code. It is relatively easy.
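A minimal sketch of such an alias (the names all_distinct and unique_variant are mine):

#include <type_traits>
#include <variant>

template <typename...> struct all_distinct : std::true_type {};
template <typename T, typename... Ts>
struct all_distinct<T, Ts...>
    : std::bool_constant<(!std::is_same_v<T, Ts> && ...) && all_distinct<Ts...>::value> {};

// Behaves like std::variant<Ts...> but rejects duplicate alternatives at compile time.
template <typename... Ts>
struct unique_variant_impl {
    static_assert(all_distinct<Ts...>::value, "duplicate alternative types are banned");
    using type = std::variant<Ts...>;
};
template <typename... Ts>
using unique_variant = typename unique_variant_impl<Ts...>::type;

unique_variant<int, double> ok{1.5};
// unique_variant<int, int> bad{1};   // fails the static_assert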
If that (alternatives named by type) were the base, writing index-based variants on top of it would be quite annoying.
With reflection, making variants with named alternatives will also become plausible. Again, this would be harder if we blocked index-based variants with duplicate types.
I am trying to create a C++ flex/bison parser. I used this tutorial as a starting point and did not change any bison/flex configuration. I am now stuck trying to unit test the lexer.
I have a function in my unit tests that directly calls yylex, and checks the result of it:
private: static void checkIntToken(MyScanner &scanner, Compiler *comp, unsigned long expected, unsigned char size, char isUnsigned, unsigned int line, const std::string &label) {
yy::MyParser::location_type loc;
yy::MyParser::semantic_type semantic; // <---- it seems like the destructor of this variable causes the crash
int type = scanner.yylex(&semantic, &loc, comp);
Assert::equals(yy::MyParser::token::INT, type, label + "__1");
MyIntToken* token = semantic.as<MyIntToken*>();
Assert::equals(expected, token->value, label + "__2");
Assert::equals(size, token->size, label + "__3");
Assert::equals(isUnsigned, token->isUnsigned, label + "__4");
Assert::equals(line, loc.begin.line, label + "__5");
//execution comes to this point, and then, program crashes
}
The error message is:
program: ../src/__autoGenerated__/MyParser.tab.hh:190: yy::variant<32>::~variant() [S = 32]: Assertion `!yytypeid_' failed.
I tried to follow the logic in the auto-generated bison files and make some sense of it, but I did not succeed and ultimately gave up. I then searched the web for advice about this error message, but found nothing.
The location indicated by the error has the following code:
~variant ()
{
  YYASSERT (!yytypeid_);
}
EDIT: The problem disappears only if I remove the
%define parse.assert
option from the bison file. But I am not sure if this is a good idea...
What is the proper way to obtain the value of the token generated by flex, for unit testing purposes?
Note: I've tried to explain bison variant types to the best of my knowledge. I hope it is accurate but I haven't used them aside from some toy experiments. It would be an error to assume that this explanation in any way implies an endorsement of the interface.
The so-called "variant" type provided by bison's C++ interface is not a general-purpose variant type. That was a deliberate decision based on the fact that the parser is always able to figure out the semantic type associated with a semantic value on the parser stack. (This fact also allows a C union to be used safely within the parser.) Recording type information within the "variant" would therefore be redundant. So they don't. In that sense, it is not really a discriminated union, despite what one might expect of a type named "variant".
(The bison variant type is a template with an integer (non-type) template argument. That argument is the size in bytes of the largest type which is allowed in the variant; it does not in any other way specify the possible types. The semantic_type alias serves to ensure that the same template argument is used for every bison variant object in the parser code.)
Because it is not a discriminated union, its destructor cannot destruct the current value; it has no way to know how to do that.
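To illustrate the point, here is a sketch of my own (not bison's actual code) of such an externally discriminated union: the object stores raw bytes and never records which type it holds, so the caller must name the type at every step, and the destructor can do nothing.

#include <cstddef>
#include <new>
#include <string>
#include <utility>

template <std::size_t Size>
struct raw_variant {
    alignas(std::max_align_t) unsigned char buf[Size];

    template <typename T, typename... Args>
    T& build(Args&&... args) {          // caller promises the buffer is empty
        return *new (buf) T(std::forward<Args>(args)...);
    }
    template <typename T>
    T& as() {                           // caller promises a T is currently stored
        return *std::launder(reinterpret_cast<T*>(buf));
    }
    template <typename T>
    void destroy() { as<T>().~T(); }    // caller must name the right type
    // ~raw_variant() cannot destroy the value: it does not know the type.
};

int main() {
    raw_variant<sizeof(std::string)> v;
    v.build<std::string>("hello");
    v.as<std::string>() += " world";
    v.destroy<std::string>();           // forgetting this leaks the string's buffer
}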
This design decision is actually mentioned in the (lamentably insufficient) documentation for the Bison "variant" type. (When reading this, remember that it was originally written before std::variant existed. These days, it would be std::variant which was being rejected as "redundant", although it is also possible that the existence of std::variant might have had the happy result of revisiting this design decision). In the chapter on C++ Variant Types, we read:
Warning: We do not use Boost.Variant, for two reasons. First, it appeared unacceptable to require Boost on the user’s machine (i.e., the machine on which the generated parser will be compiled, not the machine on which bison was run). Second, for each possible semantic value, Boost.Variant not only stores the value, but also a tag specifying its type. But the parser already “knows” the type of the semantic value, so that would be duplicating the information.
Therefore we developed light-weight variants whose type tag is external (so they are really like unions for C++ actually).
And indeed they are. So any use of a bison "variant" must have a definite type:
You can build a variant with an argument of the type to build. (This is the only case where you don't need a template parameter, because the type is deduced from the argument. You would have to use an explicit template parameter only if the argument were not of the precise type; for example, an integer of lesser rank.)
You can get a reference to the value of known type T with as<T>. (This is undefined behaviour if the value has a different type.)
You can destruct the value of known type T with destroy<T>.
You can copy or move the value from another variant of known type T with copy<T> or move<T>. (move<T> involves constructing and then destructing a T(), so you might not want to do it if T had an expensive default constructor. On the whole, I'm not convinced by the semantics of the move method. And its name conflicts semantically with std::move, but again it came first.)
You can swap the values of two variants which both have the same known type T with swap<T>.
Now, the generated parser understands all these restrictions, and it always knows the real types of the "variants" it has at its disposal. But you might come along and try to do something with one of these objects in a way that violates a constraint. Since the object really doesn't have any way to check the constraint, you'll end up with undefined behaviour which will probably have some disastrous eventual consequence.
So they also implemented an option which allows the "variant" to check the constraints. Unsurprisingly, this consists of adding a discriminator. But since the discriminator is only used to validate and not to modify behaviour, it is not a small integer which chooses between a small number of known alternatives, but rather a pointer to a std::type_info (or NULL if the variant does not yet contain a value). (To be fair, in most cases alignment constraints mean that using a pointer for this purpose is no more expensive than using a small enum. All the same...)
So that's what you're running into. You enabled assertions with %define parse.assert; that option was provided specifically to prevent you from doing what you are trying to do, which is let the variant object's destructor run before the variant's value is explicitly destructed.
So the "correct" way to avoid the problem is to insert an explicit call at the end of the scope:
// execution comes to this point, and then, without the following
// call, the program will fail on an assertion
semantic.destroy<MyIntToken*>();
}
With the parse assertion enabled, the variant object will be able to verify that the types specified as template parameters to semantic.as<T> and semantic.destroy<T> are the same types as the value stored in the object. (Without parse.assert, that too is your responsibility.)
Warning: opinion follows.
In case anyone reading this cares, my preference for using real std::variant types comes from the fact that it is actually quite common for the semantic value of an AST node to require a discriminated union. The usual solution (in C++) is to construct a type hierarchy which is, in some ways, entirely artificial, and it is quite possible that std::variant can better express the semantics.
In practice, I use the C interface and my own discriminated union implementation.
I want to recognize all of my servers over my office network. They have a particular naming pattern which only I use. I've defined it in a simpleType.
Now I was told I have to filter my servers out of a list of full DNS names (like www.bla.moo.oneofmyservers.foo.loo). My naming strategy has a length limit; if not for that, I would simply have embedded it between wildcards, as in *mystrategy*.
Is there a way to reference my type from within a pattern definition?
It didn't work when I wrote *mytype*.
Assuming that what you're asking is something like this:
I have a pattern and I've used it as a constraining facet in a simple type; now, I want to make another type, and for maintenance purposes, I wish to somehow reference that pattern, so that I don't have to maintain it in two different places...
The answer is no, you can't. Constraining facets in XSD are not referenceable entities; nor are types referenceable from within constraining facets.
In the toplevel, I get the following output:
# `B;;
- : [> `B ] = `B
What does `B mean? Why do we need it?
Sincerely!
An identifier prefixed with a backquote like `B is a constructor of a polymorphic variant type. It's similar to the constructor of an algebraic type:
type abc = A | B | C
However, you can use polymorphic variant values without declaring them, and in general they're much more flexible than the usual algebraic types. The tradeoff is that they're also quite a bit trickier to use.
One thing people use them for is as simple named values, like enum values in C. Or, more precisely, like atoms in Lisp. You can use ordinary algebraic types for this, but you need to carefully maintain your definitions of them and guard against duplication. With polymorphic variants, you don't need to do either of these. You can use them without declaring them, and the constructors aren't required to be unique (two different types can have the same constructor).
Polymorphic variant constructors can also take parameters, as algebraic constructors can. So you can also write (`B 77), a constructor with a single int parameter.
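For instance, the following works with no type declaration at all; the tags and function are just a sketch of mine, and the compiler infers a type like [< `B | `Num of int ] -> string from the uses:

(* No prior declaration needed; the type is inferred from use. *)
let describe = function
  | `B -> "just B"
  | `Num n -> "number " ^ string_of_int n

let () = print_endline (describe (`Num 77))   (* prints "number 77" *)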
This is a pretty big topic; see the section of the OCaml manual on polymorphic variants for more details.
It's a polymorphic variant. From the documentation:
Variants as presented in section 1.4 are a powerful tool to build data structures and algorithms. However they sometimes lack flexibility when used in modular programming. This is due to the fact that every constructor reserves a name to be used with a unique type. One cannot use the same name in another type, or consider a value of some type to belong to some other type with more constructors.
With polymorphic variants, this original assumption is removed. That is, a variant tag does not belong to any type in particular, the type system will just check that it is an admissible value according to its use. You need not define a type before using a variant tag. A variant type will be inferred independently for each of its uses.
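As a small sketch of that flexibility, the same tag may appear in several polymorphic variant types, which is impossible with ordinary constructors:

type color = [ `Red | `Green | `Blue ]
type light = [ `Red | `Amber | `Green ]

let warm = (`Red : color)
let stop = (`Red : light)   (* same tag, different type: perfectly fine *)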
I often need to declare a type which contains a map or a list, for instance:
type my_type_1 = my_type_0 IntMap.t
type my_type_2 = my_type_0 list
Also I have seen another style of declaration which encapsulates map or list in a record, for instance:
type my_type_1 =
| Bot_1
| Nb_1 of my_type_0 IntMap.t
type my_type_2 =
| Bot_2
| Nb_2 of my_type_0 list
My question is: are there cases where the second style is necessary, or better than the first style?
Thank you very much!
The two types you give are not equivalent, because of the Bot constructor added in the second case. This means that the two my_type_1 do not have the same semantics. Incidentally, the construction Bot | Foo of 'a is already provided by the standard type 'a option, with constructors Some and None, so the type my_type_1 of your second sample is equivalent to a my_type_1 option in the first one.
Whether to use an option type or your own constructor names is up to you. In general, I would advise you to use an option type if the semantics of your type coincide with the option idea of failure, being absent, or being undefined. Given your name Bot, I assume this is probably what you're doing, but defining your own constructor names is also fine and can be clearer in some circumstances. The matter has been discussed in depth in this blog post from ezyang.
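Concretely, the two formulations are interconvertible; in this sketch, my_type_0 = int is just a placeholder payload:

type my_type_0 = int
module IntMap = Map.Make (Int)

type my_type_1 = Bot_1 | Nb_1 of my_type_0 IntMap.t   (* custom constructor names *)
type my_type_1' = my_type_0 IntMap.t option            (* None plays the role of Bot_1 *)

let to_option = function Bot_1 -> None | Nb_1 m -> Some m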
Now, assuming your two type definitions were equivalent (that is, in the absence of the Bot constructor), what would be the purpose of adding an algebraic datatype layer, a new constructor, instead of using a simple type alias? Well, it has the effect of making your type distinct from the representation type. For example, if you define type 'a stack = Stack of 'a list, 'a stack and 'a list cannot be confused for each other, and the compiler will raise an error if you confuse them. So that can be used to enforce a (light) type separation, with the constructor acting as a type annotation:
let empty = Stack []
let length (Stack li) = List.length li
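For instance, given the definitions above, the wrapper is accepted where a bare list is rejected:

let _ = length (Stack [1; 2; 3])   (* ok: evaluates to 3 *)
(* let _ = length [1; 2; 3] *)     (* type error: 'a list is not 'a stack *)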
I'd say it's mostly a matter of taste, but I would recommend using an algebraic datatype instead of an alias when you want to be sure that there can be no mistake with the original type. The downside is that you have to wrap the operations of the original datatype, as I did in my length function above.
Those are not different styles but different types: the first pair of declarations are abbreviations for specialized instances (at my_type_0) of the polymorphic list and IntMap.t types.
The second set of definitions presents a "constructed" type, for which Bot_1 (and Bot_2) provide values. Those "alternatives" can be used, for example, to create functions of type T -> my_type_1 which return Bot_1 in a special case where the computation cannot return a map, much as an option type permits. This is impossible with the first set of definitions (which must always provide the required payload).
The second one isn't a "record" (which is a different thing). It creates an algebraic data type. I'm not sure how to explain it but if you've used Haskell or Standard ML you'll know. It's basically a tagged union. A my_type_1 is either a Bot_1 (which carries no data) or a Nb_1 (which carries a my_type_0 IntMap.t as data).
The first one is simply a type synonym (like a typedef in C).