How is typeid implemented? - c++

I've read that it is implementation specific depending on your compiler, but what might be one way it is implemented? I am asking mainly because I want to know how it creates a name for a type. I'm relatively new to programming and to C++, but from what I've seen so far, it doesn't seem possible to turn a type into a string before runtime.
I was thinking the name generation could be done with macros and token-pasting sort of like this:
#define Put_In_Quotes(input) #input
template<class T>
const char* type_name(T data_type){
return Put_In_Quotes(T);
}
But I think this would simply return the literal string "T" instead of the type name. Not to mention it doesn't explain how typeid lets you enter just types and not values for its datatype parameter, eg. typeid(int).name()
All answers or guides to more information are greatly appreciated

Welcome to SE!
The compiler knows all the possible types your program might employ just like it knows which template forms your program needs instantiated. (As for the particular name strings it generates, different compilers use all sorts of conventions for how they name things internally.)
In simple terms, the typeid operator just returns a type_info object, right? Broadly speaking, that object is essentially all there is to implementing a type's RTTI: a type_info kept in the vtable, which is returned by typeid.
You mentioned you were new to C++, but if you look up how to do a vtable dump for your compiler, you can examine the actual data yourself. The formatting specifics of every compiler will differ, of course, but for a polymorphic type, you'll usually find this info right alongside the rest of that type's vtable data. Depending on how your compiler formats that output, it may look something like an additional struct member of that type and may (usually?) immediately precede or follow the rest of the type data. Static types will have similar data residing elsewhere.

Related

Flex/Bison: cannot use semantic_type

I am trying to create a C++ flex/bison parser. I used this tutorial as a starting point and did not change any bison/flex configurations. I am now stuck trying to unit test the lexer.
I have a function in my unit tests that directly calls yylex, and checks the result of it:
private: static void checkIntToken(MyScanner &scanner, Compiler *comp, unsigned long expected, unsigned char size, char isUnsigned, unsigned int line, const std::string &label) {
yy::MyParser::location_type loc;
yy::MyParser::semantic_type semantic; // <---- it seems like the destructor of this variable causes the crash
int type = scanner.yylex(&semantic, &loc, comp);
Assert::equals(yy::MyParser::token::INT, type, label + "__1");
MyIntToken* token = semantic.as<MyIntToken*>();
Assert::equals(expected, token->value, label + "__2");
Assert::equals(size, token->size, label + "__3");
Assert::equals(isUnsigned, token->isUnsigned, label + "__4");
Assert::equals(line, loc.begin.line, label + "__5");
//execution comes to this point, and then, program crashes
}
The error message is:
program: ../src/__autoGenerated__/MyParser.tab.hh:190: yy::variant<32>::~variant() [S = 32]: Assertion `!yytypeid_' failed.
I have tried to follow the logic in the auto-generated bison files, and make some sense out of it. But I did not succeed on that and ultimately gave up. I searched then for any advice on the web about this error message but did not find any.
The location indicated by the error has the following code:
~variant (){
YYASSERT (!yytypeid_);
}
EDIT: The problem disappears only if I remove the
%define parse.assert
option from the bison file. But I am not sure if this is a good idea...
What is the proper way to obtain the value of the token generated by flex, for unit testing purposes?
Note: I've tried to explain bison variant types to the best of my knowledge. I hope it is accurate but I haven't used them aside from some toy experiments. It would be an error to assume that this explanation in any way implies an endorsement of the interface.
The so-called "variant" type provided by bison's C++ interface is not a general-purpose variant type. That was a deliberate decision based on the fact that the parser is always able to figure out the semantic type associated with a semantic value on the parser stack. (This fact also allows a C union to be used safely within the parser.) Recording type information within the "variant" would therefore be redundant. So they don't. In that sense, it is not really a discriminated union, despite what one might expect of a type named "variant".
(The bison variant type is a template with an integer (non-type) template argument. That argument is the size in bytes of the largest type which is allowed in the variant; it does not in any other way specify the possible types. The semantic_type alias serves to ensure that the same template argument is used for every bison variant object in the parser code.)
Because it is not a discriminated union, its destructor cannot destruct the current value; it has no way to know how to do that.
This design decision is actually mentioned in the (lamentably insufficient) documentation for the Bison "variant" type. (When reading this, remember that it was originally written before std::variant existed. These days, it would be std::variant which was being rejected as "redundant", although it is also possible that the existence of std::variant might have had the happy result of revisiting this design decision). In the chapter on C++ Variant Types, we read:
Warning: We do not use Boost.Variant, for two reasons. First, it appeared unacceptable to require Boost on the user’s machine (i.e., the machine on which the generated parser will be compiled, not the machine on which bison was run). Second, for each possible semantic value, Boost.Variant not only stores the value, but also a tag specifying its type. But the parser already “knows” the type of the semantic value, so that would be duplicating the information.
Therefore we developed light-weight variants whose type tag is external (so they are really like unions for C++ actually).
And indeed they are. So any use of a bison "variant" must have a definite type:
You can build a variant with an argument of the type to build. (This is the only case where you don't need a template parameter, because the type is deduced from the argument. You would have to use an explicit template parameter only if the argument were not of the precise type; for example, an integer of lesser rank.)
You can get a reference to the value of known type T with as<T>. (This is undefined behaviour if the value has a different type.)
You can destruct the value of known type T with destroy<T>.
You can copy or move the value from another variant of known type T with copy<T> or move<T>. (move<T> involves constructing and then destructing a T(), so you might not want to do it if T had an expensive default constructor. On the whole, I'm not convinced by the semantics of the move method. And its name conflicts semantically with std::move, but again it came first.)
You can swap the values of two variants which both have the same known type T with swap<T>.
Now, the generated parser understands all these restrictions, and it always knows the real types of the "variants" it has at its disposal. But you might come along and try to do something with one of these objects in a way that violates a constraint. Since the object really doesn't have any way to check the constraint, you'll end up with undefined behaviour which will probably have some disastrous eventual consequence.
So they also implemented an option which allows the "variant" to check the constraints. Unsurprisingly, this consists of adding a discriminator. But since the discriminator is only used to validate and not to modify behaviour, it is not a small integer which chooses between a small number of known alternatives, but rather a pointer to a std::type_info (or NULL if the variant does not yet contain a value.) (To be fair, in most cases alignment constraints mean that using a pointer for this purpose is no more expensive than using a small enum. All the same...)
So that's what you're running into. You enabled assertions with %define parse.assert; that option was provided specifically to prevent you from doing what you are trying to do, which is let the variant object's destructor run before the variant's value is explicitly destructed.
So the "correct" way to avoid the problem is to insert an explicit call at the end of the scope:
// execution comes to this point, and then, without the following
// call, the program will fail on an assertion
semantic.destroy<MyIntToken*>();
}
With the parse assertion enabled, the variant object will be able to verify that the types specified as template parameters to semantic.as<T> and semantic.destroy<T> are the same types as the value stored in the object. (Without parse.assert, that too is your responsibility.)
Warning: opinion follows.
In case anyone reading this cares, my preference for using real std::variant types comes from the fact that it is actually quite common for the semantic value of an AST node to require a discriminated union. The usual solution (in C++) is to construct a type hierarchy which is, in some ways, entirely artificial, and it is quite possible that std::variant can better express the semantics.
In practice, I use the C interface and my own discriminated union implementation.

Why is the `std::sto`... series not a template?

I wonder if there is a reason why the std::sto series (e.g. std::stoi, std::stol) is not a function template, like that:
template<typename T>
T sto(std::string const & str, std::size_t *pos = 0, int base = 10);
and then:
template<>
int sto<int>(std::string const & str, std::size_t *pos, int base)
{
// do the stuff.
}
template<>
long sto<long>(std::string const & str, std::size_t *pos, int base)
{
// do the stuff.
}
/* etc. */
In my sense, that would be a better design, because for the moment, when I have to convert a string in whatever numerical value an user want, I have to manually manage each case.
Is there a reason to not have such a template function? Is there an assumed choice, or is this just done like that?
Looking at the description of these functions at cppref, I note the following:
... Interprets a signed integer value in the string str.
1) calls std::strtol(str.c_str(), &ptr, base)...
and strtol is a C standard function that's also available in C++.
Reading further, we see: (for the c++ sto* functions):
Return value
The string converted to the specified signed integer type.
Exceptions
std::invalid_argument if no conversion could be performed
std::out_of_range if the converted value would fall out of the range of the result type or if the underlying function (std::strtol or std::strtoll) sets errno to ERANGE.
So while I have no original source for this, and indeed have never worked with these functions, I would guess that:
TL;DR : These functions are C++-ish wrappers around already existing C/C++ functions -- strtol* -- so they resemble these functions as closely as possible.
I have to manage manually each case. Is there a reason to not have such a template function?
In case of such questions, Eric Lippert (C#) usually says something along the lines:
"If a feature is missing, then it's missing because no one implemented it yet. And that's because either no one wanted it earlier, or because it was considered not worth the effort, or because it couldn't have been finished before publishing the current release."
Here, I guess it's the "not worth" part, but I have neither asked the committee about it, nor managed to find any answer in old questions and FAQs. I didn't spend much time searching though.
I say this because I suppose that most (if not all) of these functions' functionality is already contained in stream classes, like istringstream. Just like cin etc., it has operator>> overloaded for all basic numeric types (and more).
Furthermore, the stream manipulators like std::hex (std::setbase) already solve the problem of passing various type-dependent configuration parameters to the actual conversion functions. No problems with mixed function signatures (like those mentioned by DavidHaim in his answer). Here's just a single operator>>.
So, since we already have it in streams, and can already read numbers etc. from strings with a simple foo >> bar >> setbase(42) >> baz >> ..., I think it was not considered worth the effort to add more complicated layers on top of the old C runtime functions.
No proof for that though. Just a hunch.
The problem with template specialization is that the specialization requires you to match the original template function signature, so each specialization must implement the interface of (string,pos,base).
If you would like to have some other type which does not follow this interface, you are in trouble.
Suppose that, in the future, we would like to have sto<std::pair<int,int>>. We will want to have pos and base for the first and the second stringified integer. We would like the signature to be in the form of string,pos1,base1,pos2,base2. Since the sto signature is already set, we cannot do it.
You can always wrap std::sto* in your implementation of sto for integral types, but you cannot do that the other way around.
The purpose of these functions is to provide simple conversions for common cases. They are not intended as a general-purpose conversion suite. std::istringstream is much better for that kind of thing.
In my sense, there would be a better design, because for the moment, when I have to convert a string in whatever numerical value an user want, I have to manage manually each case.
No, it would not. The goal of templates (deliberately setting template metaprogramming apart) is not to replace overloading; you should always prefer overloading to templates. Actually, it's something the language already does for you! Between a candidate function and a possible template instantiation, the former will be preferred. Using language features for the sake of it is bad.
I don't see how templates could help either. Whatever type the user decides to input, it won't be known till runtime, and template types are deduced at compile time. C++ is a statically typed language. In this case, templates will just add an unneeded layer of complexity over normal function overloading.

Will C++ compiler generate code for each template type?

I have two questions about templates in C++. Let's imagine I have written a simple List and now I want to use it in my program to store pointers to different object types (A*, B* ... ALot*). My colleague says that for each type there will be generated a dedicated piece of code, even though all pointers in fact have the same size.
If this is true, can somebody explain to me why? For example in Java generics have the same purpose as templates for pointers in C++. Generics are only used for pre-compile type checking and are stripped down before compilation. And of course the same byte code is used for everything.
Second question is, will dedicated code be also generated for char and short (considering that they both have the same size and there are no specializations).
If this makes any difference, we are talking about embedded applications.
I have found a similar question, but it did not completely answer my question: Do C++ template classes duplicate code for each pointer type used?
Thanks a lot!
I have two questions about templates in C++. Let's imagine I have written a simple List and now I want to use it in my program to store pointers to different object types (A*, B* ... ALot*). My colleague says that for each type there will be generated a dedicated piece of code, even though all pointers in fact have the same size.
Yes, this is equivalent to having both functions written.
Some linkers will detect the identical functions, and eliminate them. Some libraries are aware that their linker doesn't have this feature, and factor out common code into a single implementation, leaving only a casting wrapper around the common code. I.e., a std::vector<T*> specialization may forward all work to a std::vector<void*> then do casting on the way out.
Now, comdat folding is delicate: it is relatively easy to make functions you think are identical, but end up not being the same, so two functions are generated. As a toy example, you could go off and print the typename via typeid(x).name(). Now each version of the function is distinct, and they cannot be eliminated.
In some cases, you might do something like this thinking that it is a run time property that differs, and hence identical code will be created, and the identical functions eliminated -- but a smart C++ compiler might figure out what you did, use the as-if rule and turn it into a compile-time check, and block not-really-identical functions from being treated as identical.
If this is true, can somebody explain to me why? For example in Java generics have the same purpose as templates for pointers in C++. Generics are only used for pre-compile type checking and are stripped down before compilation. And of course the same byte code is used for everything.
No, they aren't. Generics are roughly equivalent to the C++ technique of type erasure, such as what std::function<void()> does to store any callable object. In C++, type erasure is often done via templates, but not all uses of templates are type erasure!
The things that C++ does with templates that are not in essence type erasure are generally impossible to do with Java generics.
In C++, you can create a type erased container of pointers using templates, but std::vector doesn't do that -- it creates an actual container of pointers. The advantage to this is that all type checking on the std::vector is done at compile time, so there doesn't have to be any run time checks: a safe type-erased std::vector may require run time type checking and the associated overhead involved.
Second question is, will dedicated code be also generated for char and short (considering that they both have the same size and there are no specializations).
They are distinct types. I can write code that will behave differently with a char or short value. As an example:
std::cout << x << "\n";
With x being a short, this prints an integer whose value is x; with x being a char, this prints the character corresponding to x.
Now, almost all template code exists in header files, and is implicitly inline. While inline doesn't mean what most folk think it means, it does mean that the compiler can hoist the code into the calling context easily.
If this makes any difference, we are talking about embedded applications.
What really makes a difference is what your particular compiler and linker are, and what settings and flags they have active.
The answer is maybe. In general, each instantiation of a template is a unique type, with a unique implementation, and will result in a totally independent instance of the code. Merging the instances is possible, but would be considered "optimization" (under the "as if" rule), and this optimization isn't widespread.
With regards to comparisons with Java, there are several points to keep in mind:
C++ uses value semantics by default. An std::vector, for example, will actually insert copies. And whether you're copying a short or a double does make a difference in the generated code. In Java, short and double will be boxed, and the generated code will clone a boxed instance in some way; cloning doesn't require different code, since it calls a virtual function of Object, but physically copying does.
C++ is far more powerful than Java. In particular, it allows comparing things like the address of functions, and it requires that the functions in different instantiations of templates have different addresses. Usually, this is not an important point, and I can easily imagine a compiler with an option which tells it to ignore this point, and to merge instances which are identical at the binary level. (I think VC++ has something like this.)
Another issue is that the implementation of a template in C++ must be present in the header file. In Java, of course, everything must be present, always, so this issue affects all classes, not just template. This is, of course, one of the reasons why Java is not appropriate for large applications. But it means that you don't want any complicated functionality in a template; doing so loses one of the major advantages of C++, compared to Java (and many other languages). In fact, it's not rare, when implementing complicated functionality in templates, to have the template inherit from a non-template class which does most of the implementation in terms of void*. While implementing large blocks of code in terms of void* is never fun, it does have the advantage of offering the best of both worlds to the client: the implementation is hidden in compiled files, invisible in any way, shape or manner to the client.

Does the "type" of a struct change from computer to computer?

Let's assume I have this code;
class Ingredients{
public:
Ingredients(int size,string name);
int getsize();
private:
string name;
int size;
};
struct Chain{
Ingredients* ing;
Chain* next;
};
And in my main;
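int main()
{
Chain* chain = nullptr; // typeid needs a type or an expression; a pointer variable matches the output shown below
cout<<typeid(chain).name()<<endl;
cout<<typeid(chain->ing).name()<<endl;
cout<<typeid(chain->next).name()<<endl;
}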
my headers are;
#include <iostream>
#include <typeinfo>
using namespace std;
and finally outputs;
P8Chain
P12Ingredients
P8Chain
So my question is: are these type names reliable enough to use in code? If they change from computer to computer (because of the P8 and P12 prefixes, I am not sure they would stay the same), they wouldn't be reliable. What are your opinions?
Also they are not changing on every run.
They depend on your compiler, so don't use them inside your code.
The C++ standard says the following concerning typeid (section 5.2.8):
The result of a typeid expression is an lvalue of static type const std::type_info and dynamic type const std::type_info or const name where name is an implementation-defined class derived from std::type_info.
What you can do if you want some sort of RTTI is
if (typeid(myobject) == typeid(Chain)) {
do_something();
}
It depends on what you mean by "type". The more or less standard definition of type is the set of values and operations the type can take, and this will change from one machine to the next, because the size of int will change, or the maximum length of a string. On the other hand, there is a very real sense that type is what the compiler and the C++ standard consider it to be, which very roughly would correspond to, or at least be identified by the scoped name. Finally, the std::type_info::name() function is seriously underspecified. At best, it can be useful for debugging (e.g. logging the actual derived class a function was called with), and not all compilers provide even that. As far as the standard is concerned, a compiler could always return an empty string, and still be conforming.
According to the standard, the result of name() is implementation-defined (Stroustrup's book says the same; see p. 415 of the 3rd edition).
The word "type" has multiple meanings.
In terms of type theory, a C++ struct doesn't really define a type at all.
More usefully, the C++ language standard talks about types in a way that can be taken rigorously, even if it never quite rigorously defines the term. In those terms, the struct declaration does define a unique and consistent type.
Maybe even more usefully, to your C++ compiler (and C linker), a type is represented by things like a memory layout, a mangled name, a list of member names and types, pointers to special and normal member functions, possibly pointers to vtable and/or rtti info, etc. This will not be the same from implementation to implementation. Between builds with the same implementation, some details (like where the pointers point) may change even if no relevant code changes, but you could probably define a useful subset of information that you could usefully call a "type" that doesn't change.
Beyond that, the type_info instance defined by section 18.5.1 of the standard cannot change within the bounds of what's explicitly defined, and the result of typeid as defined by section 5.2.8 has to be that instance or a compatible object that still compares equal to it. So, it sounds to me like, if it were possible to load up the type_info instances from two different runs at the same time, operator== would have to return true. However, it's not actually possible to load up type_info instances from two different runs (there's no requirement that they be serializable in any way, for example), so this may not be relevant.
Finally, the name displayed by typeid().name(), as defined by section 18.5.1, is just any implementation-defined NTBS. It could change between builds, runs, even calls within the same run. It could always be empty. Practically, it'll often be something vaguely useful for debugging, but that isn't guaranteed—and, even if it were, that wouldn't help, because "vaguely useful for debugging" doesn't have to mean "unique within a run and persistent across runs".
If you're asking about a specific compiler, the documentation for the compiler may give stricter guarantees than the standard requires. For example, I believe that on various platforms g++ guarantees that it'll use the C++ ABI defined at CodeSourcery, and will not change ABI versions within minor compiler versions, and will use the mangled names defined in the ABI as the type_info names. This means taking the binary to another computer won't affect the names, and even recompiling the source on another computer with the same platform and g++ version won't affect the names.

Reflexion Perfect Forwarding and the Visitor Pattern

http://codepad.org/etWqYnn3
I'm working on some form of a reflexion system for C++ despite the many who have warned against it. What I'm looking at having is a set of interfaces IScope, IType, IMember, IMonikerClient and a wrapper class which contains the above, say CReflexion. Ignoring all but the member, which is the important part, here is what I would like to do:
1) Instance the wrapper
2) Determine which type is to be used
3) Instance type
4) Overload the () and [] to access the contained member from outer(the wrapper) in code as easily as it is done when using a std::vector
I find that using 0x I can forward a method call with any type for a parameter. I can't however cast dynamically as cast doesn't take a variable(unless there are ways I am unaware of!)
I linked the rough idea above. I am currently using a switch statement to handle the varying interfaces. I would, for obvious reasons, like to collapse this. I get type mismatch errors in the switch cases because the method calls are compiled against every case, while only one of the three is valid for any given condition, so compiler errors are thrown.
Could someone suggest anything to me here? That is aside from sticking to VARIANT :/
Thanks!
C++, even in "0x land", simply does not expose the kind of information you would need to create something like reflection.
I find that using 0x I can forward a method call with any type for a parameter.
You cannot forward a type as a parameter. You can forward the const-volatile qualifiers on a member, but that's all done in templates, at compile time. No runtime check ever is done when you're using things like forward.
Your template there for operator() is not going to compile unless T is convertible to int*, string*, and A** all at once. Think of templates as a simple find and replace algorithm that generates several functions for you -- the value of T gets replaced with the typename when the template is instantiated, and the function is compiled as normal.
Finally, you can only use dynamic_cast to cast down the class hierarchy -- casting between the completely unrelated types A, B and C isn't going to operate correctly.
You're better off taking the time to rethink your design such that it doesn't use reflection at all. It will probably be a better design anyway, considering even in languages with reflection, reflection is most often used to paper over poor designs.