Flex/Bison: cannot use semantic_type - c++

I try to create a c++ flex/bison parser. I used this tutorial as a starting point and did not change any bison/flex configurations. I am stuck now to the point of trying to unit test the lexer.
I have a function in my unit tests that directly calls yylex, and checks the result of it:
private: static void checkIntToken(MyScanner &scanner, Compiler *comp, unsigned long expected, unsigned char size, char isUnsigned, unsigned int line, const std::string &label) {
yy::MyParser::location_type loc;
yy::MyParser::semantic_type semantic; // <---- is seems like the destructor of this variable causes the crash
int type = scanner.yylex(&semantic, &loc, comp);
Assert::equals(yy::MyParser::token::INT, type, label + "__1");
MyIntToken* token = semantic.as<MyIntToken*>();
Assert::equals(expected, token->value, label + "__2");
Assert::equals(size, token->size, label + "__3");
Assert::equals(isUnsigned, token->isUnsigned, label + "__4");
Assert::equals(line, loc.begin.line, label + "__5");
//execution comes to this point, and then, program crashes
}
The error message is:
program: ../src/__autoGenerated__/MyParser.tab.hh:190: yy::variant<32>::~variant() [S = 32]: Assertion `!yytypeid_' failed.
I have tried to follow the logic in the auto-generated bison files, and make some sense out of it. But I did not succeed on that and ultimately gave up. I searched then for any advice on the web about this error message but did not find any.
The location indicated by the error has the following code:
~variant (){
YYASSERT (!yytypeid_);
}
EDIT: The problem disappears only if I remove the
%define parse.assert
option from the bison file. But I am not sure if this is a good idea...
What is the proper way to obtain the value of the token generated by flex, for unit testing purposes?

Note: I've tried to explain bison variant types to the best of my knowledge. I hope it is accurate but I haven't used them aside from some toy experiments. It would be an error to assume that this explanation in any way implies an endorsement of the interface.
The so-called "variant" type provided by bison's C++ interface is not a general-purpose variant type. That was a deliberate decision based on the fact that the parser is always able to figure out the semantic type associated with a semantic value on the parser stack. (This fact also allows a C union to be used safely within the parser.) Recording type information within the "variant" would therefore be redundant. So they don't. In that sense, it is not really a discriminated union, despite what one might expect of a type named "variant".
(The bison variant type is a template with an integer (non-type) template argument. That argument is the size in bytes of the largest type which is allowed in the variant; it does not in any other way specify the possible types. The semantic_type alias serves to ensure that the same template argument is used for every bison variant object in the parser code.)
Because it is not a discriminated union, its destructor cannot destruct the current value; it has no way to know how to do that.
This design decision is actually mentioned in the (lamentably insufficient) documentation for the Bison "variant" type. (When reading this, remember that it was originally written before std::variant existed. These days, it would be std::variant which was being rejected as "redundant", although it is also possible that the existence of std::variant might have had the happy result of revisiting this design decision). In the chapter on C++ Variant Types, we read:
Warning: We do not use Boost.Variant, for two reasons. First, it appeared unacceptable to require Boost on the user’s machine (i.e., the machine on which the generated parser will be compiled, not the machine on which bison was run). Second, for each possible semantic value, Boost.Variant not only stores the value, but also a tag specifying its type. But the parser already “knows” the type of the semantic value, so that would be duplicating the information.
Therefore we developed light-weight variants whose type tag is external (so they are really like unions for C++ actually).
And indeed they are. So any use of a bison "variant" must have a definite type:
You can build a variant with an argument of the type to build. (This is the only case where you don't need a template parameter, because the type is deduced from the argument. You would have to use an explicit template parameter only if the argument were not of the precise type; for example, an integer of lesser rank.)
You can get a reference to the value of known type T with as<T>. (This is undefined behaviour if the value has a different type.)
You can destruct the value of known type T with destroy<T>.
You can copy or move the value from another variant of known type T with copy<T> or move<T>. (move<T> involves constructing and then destructing a T(), so you might not want to do it if T had an expensive default constructor. On the whole, I'm not convinced by the semantics of the move method. And its name conflicts semantically with std::move, but again it came first.)
You can swap the values of two variants which both have the same known type T with swap<T>.
Now, the generated parser understands all these restrictions, and it always knows the real types of the "variants" it has at its disposal. But you might come along and try to do something with one of these objects in a way that violates a constraint. Since the object really doesn't have any way to check the constraint, you'll end up with undefined behaviour which will probably have some disastrous eventual consequence.
So they also implemented an option which allows the "variant" to check the constraints. Unsurprisingly, this consists of adding a discriminator. But since the discriminator is only used to validate and not to modify behaviour, it is not a small integer which chooses between a small number of known alternatives, but rather a pointer to a std::typeid (or NULL if the variant does not yet contain a value.) (To be fair, in most cases alignment constraints mean that using a pointer for this purpose is no more expensive than using a small enum. All the same...)
So that's what you're running into. You enabled assertions with %define parse.assert; that option was provided specifically to prevent you from doing what you are trying to do, which is let the variant object's destructor run before the variant's value is explicitly destructed.
So the "correct" way to avoid the problem is to insert an explicit call at the end of the scope:
// execution comes to this point, and then, without the following
// call, the program will fail on an assertion
semantic.destroy<MyIntType*>();
}
With the parse assertion enabled, the variant object will be able to verify that the types specified as template parameters to semantic.as<T> and semantic.destroy<T> are the same types as the value stored in the object. (Without parse.assert, that too is your responsibility.)
Warning: opinion follows.
In case anyone reading this cares, my preference for using real std::variant types comes from the fact that it is actually quite common for the semantic value of an AST node to require a discriminated union. The usual solution (in C++) is to construct a type hierarchy which is, in some ways, entirely artificial, and it is quite possible that std::variant can better express the semantics.
In practice, I use the C interface and my own discriminated union implementation.

Related

How is typeid implemented?

I've read that it is implementation specific depending on your compiler, but what might be one way it is implemented? I am asking mainly because I want to know how it creates a name for a type. I'm relatively knew to programming and to C++ but what I've seen so far, it doesn't seem possible to turn a type into a string before runtime.
I was thinking the name generation could be done with macros and token-pasting sort of like this:
#define Put_In_Quotes(input) #input
template<class T>
const char* type_name(T data_type){
return Put_In_Quotes(T);
}
But I think this would simply return the literal string "T" instead of the type name. Not to mention it doesn't explain how typeid lets you enter just types and not values for its datatype parameter, eg. typeid(int).name()
All answers or guides to more information are greatly appreciated
Welcome to SE!
The compiler knows all the possible types your program might employ just like it knows which template forms your program needs instantiated. (As for the particular name strings it generates, different compilers use all sorts of conventions for how they name things internally.)
In simple terms, the typeid operator just returns a type_info object, right? Broadly speaking, that object is essentially all there is to implementing a type's RTTI: a type_info kept in the vtable, which is returned by typeid.
You mentioned you were new to C++, but if you look up how to do a vtable dump for your compiler, you can examine the actual data yourself. The formatting specifics of every compiler will differ, of course, but for a polymorphic type, you'll usually find this info right alongside the rest of that type's vtable data. Depending on how your compiler formats that output, it may look something like an additional struct member of that type and may (usually?) immediately precede or follow the rest of the type data. Static types will have similar data residing elsewhere.

Why does std::unique_lock use type tags to differentiate constructors?

In C++11, the std::unique_lock constructor is overloaded to accept the type tags defer_lock_t, try_to_lock_t, and adopt_lock_t:
unique_lock( mutex_type& m, std::defer_lock_t t );
unique_lock( mutex_type& m, std::try_to_lock_t t );
unique_lock( mutex_type& m, std::adopt_lock_t t );
These are empty classes (type tags) defined as follows:
struct defer_lock_t { };
struct try_to_lock_t { };
struct adopt_lock_t { };
This allows the user to disambiguate between the three constructors by passing one of the pre-defined instances of these classes:
constexpr std::defer_lock_t defer_lock {};
constexpr std::try_to_lock_t try_to_lock {};
constexpr std::adopt_lock_t adopt_lock {};
I am surprised that this is not implemented as an enum. As far as I can tell, using an enum would:
be simpler to implement
not change the syntax
allow the argument to be changed at runtime (albeit not very useful in this case).
(probably) could be inlined by the compiler with no performance hit
Why does the standard library use type tags, instead of an enum, to disambiguate these constructors? Perhaps more importantly, should I also prefer to use type tags in this situation when writing my own C++ code?
Tag dispatching
It is a technique known as tag dispatching. It allows the appropriate constructor to be called given the behaviour required by the client.
The reason for tags is that the types used for the tags are thus unrelated and will not conflict during overload resolution. Types (and not values as in the case of enums) are used to resolve overloaded functions. In addition, tags can be used to resolve calls that would otherwise have been ambiguous; in this case the tags would typically be based on some type trait(s).
Tag dispatching with templates means that only code that is required to be used given the construction is required to be implemented.
Tag dispatching allows for easier reading code (in my opinion at least) and simpler library code; the constructor won't have a switch statement and the invariants can be established in the initialiser list, based on these arguments, before executing the constructor itself. Sure, your milage may vary but this has been my general experience using tags.
Boost.org has a write up on the tag dispatching technique. It has a history of use that seems to go back at least as far as the SGI STL.
Why use it?
Why does the standard library use type tags, instead of an enum, to disambiguate these constructors?
Types would be more powerful and flexible when used during overload resolution and the possible implementation than enums; bear in mind the enums were originally unscoped and limited in how they could be used (by contrast to the tags).
Additional noteworthy reasons for tags;
Compile time decisions can be made over which constructor to use, and not runtime.
Disallows more "hacky" code where a integer is cast to the enum type with a value that is not catered for - design decisions would need to be made out to handle this and then code implemented to cater for any resultant exceptions or errors.
Keep in mind that the shared_lock and lock_guard also use these tags, but in the case of the lock_guard, only the adopt_lock is used. An enumeration would introduce more potential error conditions.
I think precedence and history also plays a role here. Given the wide spread use in the Standard Library and elsewhere; it is unlikely to change how situations, such as the original example, are implemented in the library.
Perhaps more importantly, should I also prefer to use type tags in this situation when writing my own C++ code?
This is essentially a design decision. Both can and should be used to target the problems they solve. I have used tags to "route" data and types to the correct function; particular when the implementation would be incompatible at compile time and if there are any overload resolutions in play.
The Standard Library std::advance is often given as an example of how tag dispatching can be used to implement and optimise an algorithm based on traits (or characteristics) of the types used (in this case when the iterators are random access iterators).
It is a powerful technique when used appropriately and should not be ignored. If using enums, favour the newer scoped enums over the older unscoped ones.
Using these tags enables you to take advantage of the type system of the language. This is closely related to template meta-programming. Simply speaking, using these tags allows the dispatch decision concerning which constructor to invoke to be made statically at compile time. This leaves room for compiler optimization, improves run-time efficiency, and makes template meta-programming with std::unique_lock easier. This is possible, because the tags are of different static types. With an enum, this cannot be done, for the value of an enum cannot be foreseen at compile time. Note that, using tags for differentiating purposes is a common template meta-programming technique. Just see those iterator tags used by the standard library.
The point is that if you want to add another function using enum, you should edit your enum, then rebuild all projects, which use your functions and enum. In addition there will be one function taking enum as argument and using switch or something. This will bring excess code into your application.
Otherwise if you use overloaded functions with tags, you can easily add another tag and add another overloaded function, without touching old ones. This is more back-compatible.
I suspect it was optimization. Notice that using a type (as is) the correct version is selected at compile time. As you point out using an enum is (potentially) selected in some conditional statement (maybe a switch) at run-time.
In many implementations locks are acquired and released at extremely high frequency and maybe designers thought with branch prediction and the implied memory synchronization events that might be a significant issue.
The flaw in my argument (which you also point out) is that the constructor is likely to be inline and it is likely that the condition would be optimized away anyway.
Notice that using 'dummy' parameters is the closest possible analogue to actually providing named constructors.
This method is called tag dispatching (I may be wrong). Enum type with different values is just one type in compile time and enum values can't be used to overload constructor. So with enum it will be one constructor with switch statement in it. Tag dispatching is equivalent to switch statement in compile time. Each tag type specify: what this constructor would do, how it will try to acquire the lock. You should use type tags, when you want to make decision in compile time and use enum to make decision in run-time.
Because, in std::unique_lock<Mutex>, you don't want to force Mutex to have a lock or try_lock method if it may never need to be called.
If it accepted an enum parameter, then both of those methods would need to be present.

C++ equivalent of Rust's Result<T, E> type?

I like using std::experimental::optional in my C++ code, but the problem is value_or requires the default value to be of the same type as the optional's value.
This doesn't work very well when I want an optional that either contains an int or contains an error message.
I guess I could use a union struct that has a boolean to indicate if the value is there or it's an error, but it sure would be nice if C++ just had a Result<T, E> type like Rust.
Is there any such type? Why hasn't Boost implemented it?
Result is really much more useful than Option, and surely the people at Boost are aware of its existence. Maybe I'll go read the Rust implementation and then copy it to C++?
Ex:
// Function either returns a file descriptor for a listening socket or fails
// and returns a nullopt value.
// My issue: error messages are distributed via perror.
std::experimental::optional<int> get_tcp_listener(const char *ip_and_port);
// You can use value_or to handle error, but the error message isn't included!
// I have to write my own error logger that is contained within
// get_tcp_listener. I would really appreciate if it returned the error
// message on failure, rather than an error value.
int fd = get_tcp_listener("127.0.0.1:9123").value_or(-1);
// Rust has a type which does what I'm talking about:
let fd = match get_tcp_listener("127.0.0.1:9123") {
Ok(fd) => fd,
Err(msg) => { log_error(msg); return; },
}
In c++17, optional<T> is an asymmetric type safe union of T and nothingness (nullopt_t). You can query if it has a T with explicit operator bool, and get the T out with unary *. The asymmetry means that optional "prefers" to be a T, which is why unqualified operations (like * or operator bool) refer to its Tness.
In c++17 variant<A,B,C> from paper n4218 is a symmetric type safe union of A, B and C (etc). boost::variant is always engaged, and std::variant is almost always engaged (in order to preserve some exception guarantees, it can become valueless by exception if the types it store don't have the right exception semantics).
As it is symmetric, there is no unique type for unary * to return, and explicit operator bool cannot say much of interest, so neither are supported.
Instead, you have to visit it, or query it for particular types.
In c++23 std::expected<T, E> from paper n4015 is an asymmetric type-safe union. It is either a T, or an E. But like optional, it "prefers" to be a T; it has an explicit operator bool that tells you if it is a T, and unary * gets the T.
In a sense, expected<T,E> is an optional<T>, but when empty instead of wasting the space it stores an E, which you can query.
Result<T,E> seems close to expected<T,E> (note that as of n4015, the order of parameters are swapped compared to Result, but the published version did not).
What you are looking for is exactly Alexandrescu's Expected. I recommend listening to his talk for an in depth understanding: https://www.youtube.com/watch?v=kaI4R0Ng4E8. He actually goes through the implementation line by line, and you can easily write it yourself and use it well after that.
Variant is a more general tool, it can be coerced to do what you want but you're better off with expected.
If not only boost is involved u can use result. This is nice single header container.
optional by design either contains a value of some type or nothing.
You may be looking for something like Boost::Variant.
This is not yet part of the standard library, although something like it may be eventually.
Since this was asked, the C++23 standard's std::expected does exactly this. A summarizing quote from Cpp Reference:
The class template std::expected provides a way to store either of two values. An object of std::expected at any given time either holds an expected value of type T, or an unexpected value of type E. std::expected is never valueless.

How does c++11 resolve constexpr into assembly?

The basic question:
Edit: v-The question-v
class foo {
public:
constexpr foo() { }
constexpr int operator()(const int& i) { return int(i); }
}
Performance is a non-trivial issue. How does the compiler actually compile the above? I know how I want it to be resolved, but how does the specification actually specify it will be resolved?
1) Seeing the type int has a constexpr constructor, create a int object and compile the string of bytes that make the type from memory into the code directly?
2) Replace any calls to the overload with a call to the 'int's constructor that for some unknown reason int doesn't have constexpr constructors? (Inlining the call.)
3) Create a function, call the function, and have that function call 'int's consctructor?
Why I want to know, and how I plan to use the knowledge
edit:v-Background only-v
The real library I'm working with uses template arguments to decide how a given type should be passed between functions. That is, by reference or by value because the exact size of the type is unknown. It will be a user's responsibility to work within the limits I give them, but I want these limits to be as light and user friendly as I can sanely make them.
I expect a simple single byte character to be passed around in which case it should be passed by value. I do not bar 300mega-byte behemoth that does several minuets of recalculation every time a copy constructor is invoked. In which case passing by reference makes more sense. I have only a list of requirements that a type must comply with, not set cap on what a type can or can not do.
Why I want to know the answer to my question is so I can in good faith make a function object that accepts this unknown template, and then makes a decision how, when, or even how much of a object should be copied. Via a virtual member function and a pointer allocated with new is so required. If the compiler resolves constexpr badly I need to know so I can abandon this line of thought and/or find a new one. Again, It will be a user's responsibility to work within the limits I give them, but I want these limits to be as light and user friendly as I can sanely make them.
Edit: Thank you for your answers. The only real question was the second sentence. It has now been answered. Everything else If more background is required, Allow me to restate the above:
I have a template with four argument. The goal of the template is a routing protocol. Be that TCP/IP -unlikely- or node to node within a game -possible. The first two are for data storage. They have no requirement beyond a list of operators for each. The last two define how the data is passed within the template. By default this is by reference. For performance and freedom of use, these can be changed define to pass information by value at a user's request.
Each is expect to be a single byte long. They could in the case of metric for a EIGRP or OSFP like protocol the second template argument could be the compound of a dozen or more different variable. Each taking a non-trival time to copy or recompute.
For ease of use I investigate the use a function object that accepts the third and fourth template to handle special cases and polymorphic classes that would fail to function or copy correctly. The goal to not force a user to rebuild their objects from scratch. This would require planning for virtual function to preform deep copies, or any number of other unknown oddites. The usefulness of the function object depends on how sanely a compiler can be depended on not generate a cascade of function calls.
More helpful I hope?
The C++11 standard doesn't say anything about how constexpr will be compiled down to machine instructions. The standard just says that expressions that are constexpr may be used in contexts where a compile time constant value is required. How any particular compiler chooses to translate that to executable code is an implementation issue.
Now in general, with optimizations turned on you can expect a reasonable compiler to not execute any code at runtime for many uses of constexpr but there aren't really any guarantees. I'm not really clear on what exactly you're asking about in your example so it's hard to give any specifics about your use case.
constexpr expressions are not special. For all intents and purposes, they're basically const unless the context they're used in is constexpr and all variables/functions are also constexpr. It is implementation defined how the compiler chooses to handle this. The Standard never deals with implementation details because it speaks in abstract terms.

Specializing std::optional

Will it be possible to specialize std::optional for user-defined types? If not, is it too late to propose this to the standard?
My use case for this is an integer-like class that represents a value within a range. For instance, you could have an integer that lies somewhere in the range [0, 10]. Many of my applications are sensitive to even a single byte of overhead, so I would be unable to use a non-specialized std::optional due to the extra bool. However, a specialization for std::optional would be trivial for an integer that has a range smaller than its underlying type. We could simply store the value 11 in my example. This should provide no space or time overhead over a non-optional value.
Am I allowed to create this specialization in namespace std?
The general rule in 17.6.4.2.1 [namespace.std]/1 applies:
A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly
prohibited.
So I would say it's allowed.
N.B. optional will not be part of the C++14 standard, it will be included in a separate Technical Specification on library fundamentals, so there is time to change the rule if my interpretation is wrong.
If you are after a library that efficiently packs the value and the "no-value" flag into one memory location, I recommend looking at compact_optional. It does exactly this.
It does not specialize boost::optional or std::experimental::optional but it can wrap them inside, giving you a uniform interface, with optimizations where possible and a fallback to 'classical' optional where needed.
I've asked about the same thing, regarding specializing optional<bool> and optional<tribool> among other examples, to only use one byte. While the "legality" of doing such things was not under discussion, I do think that one should not, in theory, be allowed to specialize optional<T> in contrast to eg.: hash (which is explicitly allowed).
I don't have the logs with me but part of the rationale is that the interface treats access to the data as access to a pointer or reference, meaning that if you use a different data structure in the internals, some of the invariants of access might change; not to mention providing the interface with access to the data might require something like reinterpret_cast<(some_reference_type)>. Using a uint8_t to store a optional-bool, for example, would impose several extra requirements on the interface of optional<bool> that are different to the ones of optional<T>. What should the return type of operator* be, for example?
Basically, I'm guessing the idea is to avoid the whole vector<bool> fiasco again.
In your example, it might not be too bad, as the access type is still your_integer_type& (or pointer). But in that case, simply designing your integer type to allow for a "zombie" or "undetermined" value instead of relying on optional<> to do the job for you, with its extra overhead and requirements, might be the safest choice.
Make it easy to opt-in to space savings
I have decided that this is a useful thing to do, but a full specialization is a little more work than necessary (for instance, getting operator= correct).
I have posted on the Boost mailing list a way to simplify the task of specializing, especially when you only want to specialize some instantiations of a class template.
http://boost.2283326.n4.nabble.com/optional-Specializing-optional-to-save-space-td4680362.html
My current interface involves a special tag type used to 'unlock' access to particular functions. I have creatively named this type optional_tag. Only optional can construct an optional_tag. For a type to opt-in to a space-efficient representation, it needs the following member functions:
T(optional_tag) constructs an uninitialized value
initialize(optional_tag, Args && ...) constructs an object when there may be one in existence already
uninitialize(optional_tag) destroys the contained object
is_initialized(optional_tag) checks whether the object is currently in an initialized state
By always requiring the optional_tag parameter, we do not limit any function signatures. This is why, for instance, we cannot use operator bool() as the test, because the type may want that operator for other reasons.
An advantage of this over some other possible methods of implementing it is that you can make it work with any type that can naturally support such a state. It does not add any requirements such as having a move constructor.
You can see a full code implementation of the idea at
https://bitbucket.org/davidstone/bounded_integer/src/8c5e7567f0d8b3a04cc98142060a020b58b2a00f/bounded_integer/detail/optional/optional.hpp?at=default&fileviewer=file-view-default
and for a class using the specialization:
https://bitbucket.org/davidstone/bounded_integer/src/8c5e7567f0d8b3a04cc98142060a020b58b2a00f/bounded_integer/detail/class.hpp?at=default&fileviewer=file-view-default
(lines 220 through 242)
An alternative approach
This is in contrast to my previous implementation, which required users to specialize a class template. You can see the old version here:
https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/optional.hpp
and
https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/specialization.hpp
The problem with this approach is that it is simply more work for the user. Rather than adding four member functions, the user must go into a new namespace and specialize a template.
In practice, all specializations would have an in_place_t constructor that forwards all arguments to the underlying type. The optional_tag approach, on the other hand, can just use the underlying type's constructors directly.
In the specialize optional_storage approach, the user also has the responsibility of adding proper reference-qualified overloads of a value function. In the optional_tag approach, we already have the value so we do not have to pull it out.
optional_storage also required standardizing as part of the interface of optional two helper classes, only one of which the user is supposed to specialize (and sometimes delegate their specialization to the other).
The difference between this and compact_optional
compact_optional is a way of saying "Treat this special sentinel value as the type being not present, almost like a NaN". It requires the user to know that the type they are working with has some special sentinel. An easily specializable optional is a way of saying "My type does not need extra space to store the not present state, but that state is not a normal value." It does not require anyone to know about the optimization to take advantage of it; everyone who uses the type gets it for free.
The future
My goal is to get this first into boost::optional, and then part of the std::optional proposal. Until then, you can always use bounded::optional, although it has a few other (intentional) interface differences.
I don't see how allowing or not allowing some particular bit pattern to represent the unengaged state falls under anything the standard covers.
If you were trying to convince a library vendor to do this, it would require an implementation, exhaustive tests to show you haven't inadvertently blown any of the requirements of optional (or accidentally invoked undefined behavior) and extensive benchmarking to show this makes a notable difference in real world (and not just contrived) situations.
Of course, you can do whatever you want to your own code.