Mangling names for templated functions in runtime - possible? - c++

Suppose I've written a foo<T> function (I have a full signature with namespaces, but never mind that right now), and suppose no other function overloads it in the namespace it's in. Now let's place ourselves at runtime. Suppose I have the string "foo" and, for some type MyType, I have typeid(MyType) (from the <typeinfo> header).
Can I somehow obtain the symbol name for foo<MyType>?
Second version of this question: Now suppose I have the full signature of foo as a string, instead of just the name; and drop the assumption about no overloads.
Notes:
No, I'm not asking about the symbol itself, just the name. That would be an interesting question for another time.
Answers which depend on foo<T> coming from a shared library are relevant, although I don't think it should matter just for the symbol name.
I don't care about performance here, I'll do whatever it takes. Help me Obi Wan, you're my last hope etc. So, RTTI, compiling with weird flags, whatever.
Platform-dependent answers are also relevant: GNU/Linux with kernel version >= 3.x , an x86_64 CPU , gcc >= 4.8 .

No, you can't.
To get the mangled name of an instantiated function template, you need in the simplest case the following information:
The fully qualified name of the function (you said you only have "foo", what if the function is in a namespace?)
The types of all template type arguments (the mangled name of the type may suffice, if the mangling scheme for the function name embeds the type name directly; otherwise you'd need the full type name, potentially recursively into all template arguments of the type).
The types of all function arguments (same caveat applies).
This is assuming that you don't have template template parameters, or non-type parameters. It gets a lot more complicated when you have those, since it may require mangled forms of entire expression trees. It also assumes that you're not dealing with partial or full explicit specialization, which is even more complicated. And it is finally assuming that your function doesn't have any special decoration due to compiler-specific extensions (e.g. __stdcall in 32-bit Windows environments). Oh, and some ABIs may encode the return type of the function as well.
Since, according to your premise, you only have the function name (it is not clear whether it is fully qualified) and the type_info objects of the template arguments (which may work as a source of the mangled type name, but not on all platforms), you have insufficient information to recreate the mangled name.
This leaves the option of obtaining a list of all compiled symbols from your binary (if such is available) and searching for a most likely candidate, which is of course error-prone.
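The candidate search mentioned above could be sketched roughly as follows. This is only an illustration under some assumptions: the symbol list is taken as given (for instance, collected beforehand with a tool such as nm), the toolchain is GCC or Clang so that abi::__cxa_demangle from <cxxabi.h> is available, and the helper names demangle and find_candidates are made up for this example. Matching on a substring of the demangled name is exactly the error-prone heuristic the answer warns about.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI only)
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: demangle one symbol name.
// Returns an empty string if the input is not a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    std::string result = (status == 0 && raw != nullptr) ? raw : "";
    std::free(raw);  // the buffer is malloc-allocated by the demangler
    return result;
}

// Hypothetical helper: keep the symbols whose demangled form contains
// `needle`, e.g. "foo<int>". This is a heuristic, not an exact lookup.
std::vector<std::string> find_candidates(const std::vector<std::string>& symbols,
                                         const std::string& needle) {
    std::vector<std::string> hits;
    for (const std::string& symbol : symbols) {
        if (demangle(symbol).find(needle) != std::string::npos) {
            hits.push_back(symbol);
        }
    }
    return hits;
}
```

For example, calling find_candidates with the symbols _Z3fooIiEvT_, _Z3fooIdEvT_ and _Z3bari and the needle "foo<int>" should keep only the first one. With overloads, or with types whose printed names contain the needle, several candidates can come back.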

Related

Does C++ use static name resolution or dynamic name resolution?

I have been reading about "Name resolution" in wikipedia (Name resolution WIKI) and it has been given in that that C++ uses "Static Name Resolution". If that is true then I couldn't figure out how C++ manages to provide "polymorphism" without using dynamic name resolution.
Can anyone please answer whether C++ uses "Static Name Resolution" or "Dynamic Name Resolution". If it is static, can you also explain how C++ provides polymorphism.
Wikipedia's definition of name resolution is about how tokens are resolved into the names of constructs (functions, typenames, etc). Given that definition, C++ is 100% static with its name resolution. Every token that represents an identifier must be associated at compile-time with a specific entity.
C++ polymorphism is effectively cheating. The compiler can see that a static name resolves to a member function defined with the virtual keyword. If the compiler sees that the object you are calling this on is a dynamic object (i.e. a pointer or reference to that type rather than a value of that type), the compiler emits special code to call that function.
This special code does not change the name it resolves to. What it changes is the function that eventually gets called. That is not dynamic naming; that is dynamic function dispatch. The name gets resolved at compile-time; the function gets resolved at runtime.
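A small sketch may make the distinction concrete. The types Animal and Dog and the function describe are invented for this illustration; they are not from the question.

```cpp
#include <string>

struct Animal {
    virtual ~Animal() = default;
    // virtual: the name is resolved statically, the target is chosen at runtime
    virtual std::string speak() const { return "..."; }
    // non-virtual: both the name and the target are fixed at compile time
    std::string kind() const { return "animal"; }
};

struct Dog : Animal {
    std::string speak() const override { return "woof"; }
    std::string kind() const { return "dog"; }  // hides Animal::kind, does not override it
};

// Inside this function, `kind` and `speak` are both looked up in Animal at
// compile time. Only the call to speak() goes through dynamic dispatch.
std::string describe(const Animal& a) {
    return a.kind() + ":" + a.speak();
}
```

Passing a Dog to describe yields "animal:woof": the non-virtual name bound statically to Animal::kind, while the virtual call reached Dog::speak at runtime.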
C++ uses static name resolution because it renames each function so that each one has a unique name.
That means that the function int foo(int bar) will be known to the compiler as something like _Z3fooi, while int foo(float bar) will be known as something like _Z3foof.
This is what we call name mangling.
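To show where names like _Z3fooi come from, here is a deliberately tiny sketch of the Itanium C++ ABI scheme used by GCC and Clang. It only covers free functions in the global namespace whose parameters are builtin types, and the function name mangle_simple is made up for this example; real mangling (namespaces, templates, substitutions) is far more involved.

```cpp
#include <string>

// Hypothetical helper: builds an Itanium-style mangled name for a free
// function in the global namespace. `param_codes` is the concatenation of
// the one-letter builtin type codes (i = int, f = float, d = double, ...).
// An empty parameter list is encoded as "v".
std::string mangle_simple(const std::string& name, const std::string& param_codes) {
    return "_Z" + std::to_string(name.size()) + name +
           (param_codes.empty() ? "v" : param_codes);
}
```

With this, mangle_simple("foo", "i") gives _Z3fooi and mangle_simple("foo", "f") gives _Z3foof, so the two overloads end up with distinct linker-level names. Note that the return type does not appear for ordinary (non-template) functions.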

C++ type suffix _t, _type or none

C++ sometimes uses the suffix _type on type definitions (e.g. std::vector<T>::value_type),
also sometimes _t (e.g. std::size_t), or no suffix (normal classes, and also typedefs like std::string which is really std::basic_string<...>)
Are there any good conventions on when to use which name?
As @MarcoA.'s answer correctly points out, the suffix _t is largely inherited from C (and in the global namespace - reserved for POSIX).
This leaves us with "no suffix" and _type.
Notice that there is no namespace-scope name in std ending in _type*; all such names are members of classes and class templates (or, in the case of regex-related types, of a nested namespace which largely plays a role of a class). I think that's the distinction: types themselves don't use the _type suffix.
The suffix _type is only used on members which denote types, and moreover, usually when they denote a type somewhat "external" to the containing class. Compare std::vector<T>::value_type and std::vector<T>::size_type, which come from the vector's template parameters T and Allocator, respectively, against std::vector<T>::iterator, which is "intrinsic" to the vector class template.
* Not entirely true, there are a few such names (also pointed out in a comment by @jrok): common_type, underlying_type, is_literal_type, true_type, false_type. In the first three, _type is not really a suffix, it's an actual part of the name (e.g. a metafunction to give the common type or the underlying type). With true_type and false_type, it is indeed a suffix (since true and false are reserved words). I would say it's a type which represents a true/false value in the type-based metaprogramming sense.
The _t suffix (which used to mean "defined via typedef") is inherited from C; such names are also SUS/POSIX-reserved in the global namespace.
Types added in C++ and not present in the original C language (e.g. size_type) don't need to be shortened.
Keep in mind that to the best of my knowledge this is more of an observation on an established convention rather than a general rule.
Member types are called type or something_type in the C++ standard library. This is readable and descriptive, and the added verbosity is not usually a problem because users don't normally spell out those type names: most of them are used in function signatures, then auto takes care of member function return types, and in C++14 the _t type aliases take care of type trait static type members.
That leads to the second point: Free-standing, non-member types are usually called something_t: size_t, int64_t, decay_t, etc. There is certainly an element of heritage from C in there, but the convention is maintained in the continuing evolution of C++. Presumably, succinctness is still a useful quality here, since those types are expected to be spelled out in general.
Finally, all the above only applies to what I might call "generic type derivation": Given X, give me some related type X::value_type, or given an integer, give me the 64-bit variant. The convention is thus restricted to common, vocabulary-type names. The class names of your actual business logic (including std::string) presumably do not warrant such a naming pattern, and I don't think many people would like to have to mangle every type name.
If you will, the _t and _type naming conventions apply primarily to the standard library and to certain aspects of the standard library style, but you do not need to take them as some kind of general mandate.
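As a small illustration of the two conventions side by side, the sketch below uses a member type (value_type) in a signature and a free-standing C++14 alias (std::decay_t) that is spelled out directly. The function first_or_default is invented for this example.

```cpp
#include <type_traits>
#include <vector>

// Member types use the _type suffix; they typically appear in signatures
// like this one rather than being spelled out by callers.
template <typename Container>
typename Container::value_type first_or_default(const Container& c) {
    return c.empty() ? typename Container::value_type{} : *c.begin();
}

// Free-standing aliases use the _t suffix (C++14 and later).
static_assert(std::is_same<std::decay_t<const int&>, int>::value,
              "decay_t strips the reference and the const");
static_assert(std::is_same<std::vector<int>::value_type, int>::value,
              "value_type comes from the template parameter T");
```

A caller would normally just write auto x = first_or_default(v); and never mention value_type at all, which is why the longer suffix costs little in practice.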
My answer is only relevant for type names within namespaces (that aren't std).
Use no suffix usually, and _type for enums
So, here's the thing: the identifier foo_type can be interpreted as
"the identifier of the type for things which are foo's" (e.g. size_type overall_size = v1.size() + v2.size();)
"the identifier of the type for things which are kinds, or types, of foo" (e.g. employment_type my_employment_type = FIXED_TERM;)
When you have typedef'ed enums in play, I think you would tend towards the second interpretation - otherwise, what would you call your enum types?
The common aversion to using no suffix is that seeing the identifier foo is confusing: Is it a variable, a specific foo? Or is it the type for foos? ... luckily, that's not an issue when you're in a namespace: my_ns::foo is obviously a type - you can't get it wrong (assuming you don't use global variables...); so no need for a suffix there.
PS - I employ the practice of suffixing my typedef's within classes with _type (pointer_type, value_type, reference_type etc.) I know that contradicts my advice above, but I somehow feel bad breaking with tradition on this point.
Now, you could ask - what happens if you have enums within classes? Well, I try to avoid those, and place my enum inside the surrounding namespace.

Does the "type" of a struct change from computer to computer?

Let's assume I have this code;
class Ingredients{
public:
    Ingredients(int size, string name);
    int getsize();
private:
    string name;
    int size;
};
struct Chain{
    Ingredients* ing;
    Chain* next;
};
And in my main;
int main()
{
    Chain* chain = nullptr; // typeid needs an expression; these operands are not evaluated
    cout<<typeid(chain).name()<<endl;
    cout<<typeid(chain->ing).name()<<endl;
    cout<<typeid(chain->next).name()<<endl;
}
my headers are;
#include <iostream>
#include <string>
#include <typeinfo>
using namespace std;
and finally outputs;
P8Chain
P12Ingredients
P8Chain
so my question is, are these type names reliable enough to use in code? If they change from computer to computer (because of the P8 and P12 parts, I am not sure they would stay the same), they wouldn't be reliable. What are your opinions?
Also they are not changing on every run.
They depend on your compiler, so don't use them inside your code.
The C++ standard says the following concerning typeid (section 5.2.8):
The result of a typeid expression is an lvalue of static type const std::type_info and dynamic type const std::type_info or const name where name is an implementation-defined class derived from std::type_info.
What you can do if you want some sort of RTTI is
if (typeid(myobject) == typeid(Chain)) {
do_something();
}
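A slightly fuller sketch of that comparison is shown below. The Shape hierarchy and the function is_circle are invented for this illustration. The base class needs at least one virtual function, otherwise typeid reports the static type of the expression rather than the dynamic type of the object.

```cpp
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;  // makes Shape polymorphic, so typeid looks at the dynamic type
};
struct Circle : Shape {};
struct Square : Shape {};

// Compares type_info objects, never the implementation-defined name() strings.
bool is_circle(const Shape& s) {
    return typeid(s) == typeid(Circle);
}
```

This comparison is well defined by the standard within a single program, regardless of what name() happens to return on a given compiler.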
It depends on what you mean by "type". The more or less standard definition of type is the set of values and operations the type can take, and this will change from one machine to the next, because the size of int will change, or the maximum length of a string. On the other hand, there is a very real sense that type is what the compiler and the C++ standard consider it to be, which very roughly would correspond to, or at least be identified by the scoped name. Finally, the std::type_info::name() function is seriously underspecified. At best, it can be useful for debugging (e.g. logging the actual derived class a function was called with), and not all compilers provide even that. As far as the standard is concerned, a compiler could always return an empty string, and still be conforming.
According to the standard, the result of name() is implementation-defined (the same is stated in Stroustrup's book, see p. 415 of the 3rd edition).
The word "type" has multiple meanings.
In terms of type theory, a C++ struct doesn't really define a type at all.
More usefully, the C++ language standard talks about types in a way that can be taken rigorously, even if it never quite rigorously defines the term. In those terms, the struct declaration does define a unique and consistent type.
Maybe even more usefully, to your C++ compiler (and C linker), a type is represented by things like a memory layout, a mangled name, a list of member names and types, pointers to special and normal member functions, possibly pointers to vtable and/or rtti info, etc. This will not be the same from implementation to implementation. Between builds with the same implementation, some details (like where the pointers point) may change even if no relevant code changes, but you could probably define a useful subset of information that you could usefully call a "type" that doesn't change.
Beyond that, the type_info instance defined by section 18.5.1 of the standard cannot change within the bounds of what's explicitly defined, and the result of typeid as defined by section 5.2.8 has to be that instance or a compatible object that still compares equal to it. So, it sounds to me like, if it were possible to load up the type_info instances from two different runs at the same time, operator== would have to return true. However, it's not actually possible to load up type_info instances from two different runs (there's no requirement that they be serializable in any way, for example), so this may not be relevant.
Finally, the name displayed by typeid().name(), as defined by section 18.5.1, is just any implementation-defined NTBS. It could change between builds, runs, even calls within the same run. It could always be empty. Practically, it'll often be something vaguely useful for debugging, but that isn't guaranteed—and, even if it were, that wouldn't help, because "vaguely useful for debugging" doesn't have to mean "unique within a run and persistent across runs".
If you're asking about a specific compiler, the documentation for the compiler may give stricter guarantees than the standard requires. For example, I believe that on various platforms g++ guarantees that it'll use the C++ ABI defined at CodeSourcery, and will not change ABI versions within minor compiler versions, and will use the mangled names defined in the ABI as the type_info names. This means taking the binary to another computer won't affect the names, and even recompiling the source on another computer with the same platform and g++ version won't affect the names.
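If a name that stays the same across compilers is needed, one option is to stop relying on name() and keep an explicit table instead. The sketch below is one possible way to do that, not something the answers above prescribe: it keys a map on std::type_index (from <typeindex>) and stores names chosen by the programmer. The empty Chain and Ingredients structs are stand-ins for the question's types, and stable_names and stable_name are hypothetical helpers.

```cpp
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct Chain {};        // stand-in for the question's Chain
struct Ingredients {};  // stand-in for the question's Ingredients

// Hypothetical registry: maps each registered type to a name we control.
const std::unordered_map<std::type_index, std::string>& stable_names() {
    static const std::unordered_map<std::type_index, std::string> names = {
        {std::type_index(typeid(Chain)), "Chain"},
        {std::type_index(typeid(Ingredients)), "Ingredients"},
    };
    return names;
}

// Looks up the chosen name for T, independent of type_info::name().
template <typename T>
std::string stable_name() {
    const auto& names = stable_names();
    auto it = names.find(std::type_index(typeid(T)));
    return it == names.end() ? std::string("<unregistered>") : it->second;
}
```

The cost is that every type has to be registered by hand, but the resulting strings no longer depend on the compiler's mangling scheme.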

A compile time ordering on types

I've been looking for a way to get an ordering on types at compile time. This would be useful, for example, for implementing (efficient) compile-time type-sets.
One obvious way to do it would be if there were a way to map every type to a unique integer. An answer to a previous question on that topic succinctly captures why that's difficult, and it seems like it would apply equally to any other way of trying to get an ordering:
the compiler has no way of knowing all compilation units and the linker has no concept of a type
Indeed, the challenge to the compiler would be considerable: it has to make sure that, in any invocation, for any source file, it returns the same integer for a given type / it returns the same ordering between any two given types, but at the same time, the universe of types is open and it has no knowledge of any types outside of the current file. A hard problem.
The idea I had is that types have names. And by the laws of C++, as far as I know the fully qualified name of a type must be unique across the entire program, otherwise you will get errors or undefined behaviour of some sort or another.
If two types have the same name, then they are the same type.
If two types are the same type, then either they have the same name, or they are typedefs for one another. The compiler has full knowledge of typedefs.
Names are strings, and strings have an ordering. So if I have it right, you could define a globally consistent ordering on types based on their names. More specifically, the ordering between any two types would be the ordering between the names of the types with the typedefs fully resolved. (Having a type behave differently from its typedefs would be problematic.)
Of course, standard C++ doesn't have any facilities for retrieving the names of types.
My questions are:
Do I have anything wrong? Are there any reasons this wouldn't, in theory, work?
Are there any compilers which give you access to the names of types (and ideally their typedef-resolved forms) at compile time as a language extension?
Is there any other way it could be done? Are there any compilers which do?
(I recognize that it's not polite to ask more than one question in the same question, but it seemed strange to post three separate questions with the same basic throat-clearing preceding them.)
the fully qualified name of a type must be unique across the entire program
But of course, that's only true if you consider separate anonymous namespaces in different translation units to have different names in some sense, and have some way to figure out what they are.
The only sense in which I'm aware they really do have different names is in mangled linker symbols; you may (depending on the compiler) be able to get that from type_info::name(), but it isn't guaranteed, is limited to types with RTTI, and anyway doesn't seem to be declared as a constexpr so you can't use the value at compile time.
The ordering produced by type_info::before() naturally has the same limitations.
Out of interest, what are you trying to achieve with your compile-time type ordering?

unique synthesised name

I would like to generate various data types in C++ with unique deterministic names. For example:
struct struct_int_double { int mem0; double mem1; };
At present my compiler synthesises names using a counter, which means the names don't agree when compiling the same data type in distinct translation units.
Here's what won't work:
Using the ABI mangled_name function. Because it depends already on structs having unique names. Might work in C++11 compliant ABI by pretending struct is anonymous?
Templates eg struct2 because templates don't work with recursive types.
A complete mangling. Because it gives names which are way too long (hundreds of characters!)
Apart from a global registry (YUK!) the only thing I can think of is to first create a unique long mangled name, and then use a digest or hash function to shorten it (and hope there are no clashes).
Actual problem: to generate libraries which can be called where the types are anonymous, eg tuples, sum types, function types.
Any other ideas?
EDIT: Additional description of the recursive type problem. Consider defining a linked list like this:
template<class T>
typedef pair<list<T>*, T> list;
This is actually what is required. It doesn't work for two reasons: first, you can't template a typedef. [NO, you can NOT use a template class with a typedef in it, it doesn't work] Second, you can't pass in list* as an argument because it isn't defined yet. In C without polymorphism you can do it:
struct list_int { struct list_int *next; int value; };
There are several work arounds. For this particular problem you can use a variant of the Barton-Nackman trick, but it doesn't generalise.
There is a general workaround, first shown to me by Gabriel Dos Reis, using a template with open recursion, and then a partial specialisation to close it. But this is extremely difficult to generate and would probably be unreadable even if I could figure out how to do it.
There's another problem doing variants properly too, but that's not directly related (it's just worse because of the stupid restriction against declaring unions with constructable types).
Therefore, my compiler simply uses ordinary C types. It has to handle polymorphism anyhow: one of the reasons for writing it was to bypass the problems of C++ type system including templates. This then leads to the naming problem.
Do you actually need the names to agree? Just define the structs separately, with different names, in the different translation units and reinterpret_cast<> where necessary to keep the C++ compiler happy. Of course that would be horrific in hand-written code, but this is code generated by your compiler, so you can (and I assume do) perform the necessary static type checks before the C++ code is generated.
If I've missed something and you really do need the type names to agree, then I think you already answered your own question: Unless the compiler can share information between the translation of multiple translation units (through some global registry), I can't see any way of generating unique, deterministic names from the type's structural form except the obvious one of name-mangling.
As for the length of names, I'm not sure why it matters? If you're considering using a hash function to shorten the names then clearly you don't need them to be human-readable, so why do they need to be short?
Personally I'd probably generate semi-human-readable names, in a similar style to existing name-mangling schemes, and not bother with the hash function. So, instead of generating struct_int_double you might generate sid (struct, int, double) or si32f64 (struct, 32-bit integer, 64-bit float) or whatever. Names like that have the advantage that they can still be parsed directly (which seems like it would be pretty much essential for debugging).
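A minimal sketch of such a scheme might look like this. The function compact_struct_name and the field codes (i32, f64) are assumptions made for illustration, following the si32f64 example above. The codes are assumed to be prefix-free so that the result can be parsed back unambiguously; nested structs would additionally need some terminator, which is not handled here.

```cpp
#include <string>
#include <vector>

// Hypothetical generator: "s" marks a struct, followed by one short code per
// field in declaration order, e.g. {"i32", "f64"} -> "si32f64".
std::string compact_struct_name(const std::vector<std::string>& field_codes) {
    std::string name = "s";
    for (const std::string& code : field_codes) {
        name += code;
    }
    return name;
}
```

Because the name depends only on the structure of the type, two translation units that generate the same struct arrive at the same name without sharing any state.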
Edit
Some more thoughts:
Templates: I don't see any real advantage in generating template code to get around this problem, even if it were possible. If you're worried about hitting symbol name length limits in the linker, templates can't help you, because the linker has no concept of templates: any symbols it sees will be mangled forms of the template structure generated by the C++ compiler and will have exactly the same problem as long mangled names generated directly by the felix compiler.
Any types that have been named in felix code should be retained and used directly (or nearly directly) in the generated C++ code. I would think there are practical (soft) readability/maintainability constraints on the complexity of anonymous types used in felix code, which are the only ones you need to generate names for. I assume your "variants" are discriminated unions, so each component part must have a name (the tag) defined in the felix code, and again these names can be retained. (I mentioned this in a comment, but since I'm editing my answer I might as well include it)
Reducing mangled-name length: Running a long mangled name through a hash function sounds like the easiest way to do it, and the chance of collisions should be acceptable as long as you use a good hash function and retain enough bits in your hashed name (and your alphabet for encoding the hashed name has 37 characters, so a full 160-bit sha1 hash could be written in about 31 characters). The hash function idea means that you won't be able to get directly back from a hashed name to the original name, but you might never need to do that. And you could dump out an auxiliary name-mapping table as part of the compilation process I guess (or re-generate the name from the C struct definition maybe, where it's available). Alternatively, if you still really don't like hash functions, you could probably define a reasonably compact bit-level encoding (then write that in the 37-character identifier alphabet), or even run some general purpose compression algorithm on that bit-level encoding. If you have enough felix code to analyse you could even pre-generate a fixed compression dictionary. That's stark raving bonkers of course: just use a hash.
Edit 2: Sorry, brain failure -- sha-1 digests are 160 bits, not 128.
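The hash-then-encode idea could be sketched as below. To keep the example self-contained it swaps in the 64-bit FNV-1a hash instead of SHA-1, which gives a much weaker collision guarantee than the 160 bits discussed above, so treat it as a shape for the approach rather than a recommendation. It also uses a 36-character alphabet (dropping '_' from the 37 mentioned above) so that the result can never contain a double underscore, which is reserved in C++. The names fnv1a_64 and short_type_name and the "t_" prefix are made up for this example.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a; a stand-in for a stronger digest such as SHA-1.
std::uint64_t fnv1a_64(const std::string& text) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : text) {
        hash ^= byte;
        hash *= 1099511628211ULL;                  // FNV prime
    }
    return hash;
}

// Hypothetical helper: turns a long structural name into a short identifier.
// The "t_" prefix guarantees the result starts with a letter.
std::string short_type_name(const std::string& long_mangled_name) {
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    std::uint64_t h = fnv1a_64(long_mangled_name);
    std::string out = "t_";
    do {
        out += alphabet[h % 36];  // least significant base-36 digit first
        h /= 36;
    } while (h != 0);
    return out;
}
```

The output is deterministic across translation units and at most 15 characters long (2 for the prefix plus up to 13 base-36 digits for 64 bits). As noted above, the mapping cannot be reversed, so a side table from short names to full names is worth emitting for debugging.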
PS. Not sure why this question was down-voted -- it seems reasonable to me, although some more context about this compiler you're working on might help.
I don't really understand your problem.
template<typename T>
struct SListItem
{
SListItem* m_prev;
SListItem* m_next;
T m_value;
};
int main()
{
SListItem<int> sListItem;
}