templating virtual functions not possible. Only a temporary technical limitation? - c++

I understand that you can't declare a virtual method as templated, because the compiler would not know how many entries to reserve in the virtual table. This is, however, a technical limitation, rather than a language one. The compiler could know how many instances of the template are actually needed, and "go back" to allocate a proper vtable size.
Is there a planned technical solution in the upcoming standard?

The compiler can never know all of the possible instantiations of a template. Under the current compilation model, each translation unit is compiled separately and later linked. When compiling a template type in one translation unit, you do not know the instantiations of that type in another.
Imagine you're writing a library and you want a template function in it. You compile the library and then distribute it to your clients. Now the clients can instantiate your template function with whatever template arguments they like, but your library has already been compiled! It can't "go back" and change this.
You're assuming that when you compile the template function, you also have available every instantiation of that function. That's often not the case and, under the current compilation and linking model, cannot be known to be the case.
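To make the constraint concrete, here is a minimal sketch of the declaration the language rejects, next to one common workaround: fixing the set of supported argument types up front as ordinary virtual overloads. The names `Printer`, `LoudPrinter`, and `print_via_base` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

struct Printer {
    virtual ~Printer() = default;

    // Not allowed: a member function template cannot be virtual, because the
    // set of instantiations (and so the vtable layout) is open-ended.
    // template <typename T> virtual std::string print(const T& value) const;

    // Workaround: a closed set of overloads, one vtable slot each.
    virtual std::string print(int value) const {
        return "int:" + std::to_string(value);
    }
    virtual std::string print(const std::string& value) const {
        return "str:" + value;
    }
};

struct LoudPrinter : Printer {
    std::string print(int value) const override {
        return "INT:" + std::to_string(value);
    }
    std::string print(const std::string& value) const override {
        return "STR:" + value;
    }
};

// Calls through a base reference; the override is selected at run time.
inline std::string print_via_base(const Printer& p, int value) {
    return p.print(value);
}
```

The cost of this approach is exactly the point made above: the list of types is closed when the class is compiled, so a client cannot add a new one without changing the base class.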

It's certainly possible to do this, given no requirements of working with existing linkers. That is, the linker could sift through all the instantiations of that template function and build the appropriate data structures. But one of the strengths of C++ is that it doesn't require specialized linkers; that makes it portable to systems where the linker is written in stone and cannot be changed. And, yes, that happens; the linker is where all the object code meets, and it has to be compatible with all the programming languages that the system supports, and that, in turn, means that it sometimes has grown old and crufty, and any change brings a substantial risk of breakage. So, while it's theoretically possible to do this, it ain't gonna happen.

There is nothing currently planned based on the C++ Standards Committee papers and core language issues. The C++ Standard specifies requirements for implementations of C++, but does not define the technical implementation itself. Hence, template virtual functions are explicitly not a technical limitation, but rather a limitation of the language defined by the standard. Nevertheless, the limitation of the language may be the result of the risk involved in changing existing implementations rather than being imposed as a result of an implementation's technical limitations.

Related

Compiler optimizations on map/set in c++

Does the compiler optimize data structures when the input size is small?
unordered_set<int>TmpSet;
TmpSet.insert(1);
TmpSet.insert(2);
TmpSet.insert(3);
...
...
Since the size is small, hashing would not be required; we could simply store this in 3 variables. Do optimizations like this happen? If yes, who is responsible for them?
edit: Replaced set with unordered_set as the former doesn't do hashing.
It is possible in theory to totally replace whole data structures with a different implementation (as long as escape analysis can show that a reference to it couldn't be passed to separately-compiled code).
But in practice what you've written is a call to a constructor, and then three calls to template functions which aren't simple. That's all the compiler sees, not the high-level semantics of a set.
Unless it's going to optimize stuff away, real compilers aren't going to use a different implementation which would have different object representations.
If you want to micro-optimize like this, in C++ you should do it yourself, e.g. with a std::bitset<32> and set bits to indicate set membership.
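A minimal sketch of that hand-rolled approach, assuming the values are known to lie in the range [0, 32). The wrapper class `SmallIntSet` is hypothetical; only `std::bitset` and its `set`, `test`, `count`, and `size` members come from the standard library.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical helper: a set of small non-negative integers in [0, 32),
// stored as one bit per possible value instead of a hash table.
class SmallIntSet {
public:
    // Returns false if the value is out of range and cannot be stored.
    bool insert(std::size_t value) {
        if (value >= bits_.size()) return false;
        bits_.set(value);
        return true;
    }

    bool contains(std::size_t value) const {
        return value < bits_.size() && bits_.test(value);
    }

    // Number of distinct values currently stored.
    std::size_t size() const { return bits_.count(); }

private:
    std::bitset<32> bits_;
};
```

Inserting 1, 2, and 3 sets three bits in a single machine word, with no allocation and no hashing. The trade-off is that the programmer, not the compiler, has committed to the value range.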
It's far too complex a problem for a compiler to reliably do a good job. And besides, there could be serious quality-of-implementation issues if the compiler starts inventing different data-structures that have different speed/space tradeoffs, and it guesses wrong and uses one that doesn't suit the use-case well.
Programmers have a reasonable expectation that their compiled code will work somewhat like what they wrote, at least on a large scale. Including calling those standard header template functions with their implementation of the functions.
Maybe there's some scope for a C++ implementation adapting to the use-case, but I think that would need much debate and probably a special mechanism to allow it in practice. Perhaps name-recognition of std:: names could be enough, making those into builtins instead of header implementations, but currently the implementation strategy is to write those functions in C++ in .h headers, e.g. in libstdc++ or libc++.

Why is there no accurate C++ decompiler?

Why is it not possible to create a C++ decompiler that will function as accurately as those made for Java and C#?
There are several reasons:
Inlining. A lot of C++ code gets inlined in optimized builds. That plays havoc with any form of decompiler. To figure out that a function was inlined, the decompiler would have to analyze the specifics of the inlined code and match them up. And post-inlining optimization steps can make code very different, depending on where it was inlined.
Templates. Templates use #1 exclusively, but they create additional problems. It is at least theoretically possible that a function that gets inlined in two places would compile to the same sequence of assembly instructions. But for template code, which was instantiated with different template arguments? Different instantiations will usually have to compile down to different sequences of instructions. And this becomes even more difficult, since template code can call different sets of functions based on the template parameters. And those functions themselves could be inlined.
Compile-time execution. Template metaprogramming allows the compiler to actually execute code. But C++11's constexpr provides a more natural way to do some computations at compile time. Obviously, compile-time function calls or metafunction instantiations cannot be part of the compiled executable. Only the results of them will be (since that's kinda the point).
Lack of comprehensive runtime reflection. C# and Java both lace their bytecode with a lot of information about the nature of the original source code. Object definitions are easily detectable, as are object names, member variable types and names, etc. C++ compiles down to machine language, which is not required to have any such information. And since it isn't required, compilers don't generate it. Even the reflection study group of the ISO C++ committee is focused on compile-time reflection, which is information that won't be available at runtime.
Even std::type_info doesn't offer anything. The reason being that, if the compiler does not detect that a particular type will have typeid called on it, then the compiler doesn't need to generate a std::type_info object for it. And even if it did, all that gives you is an object's name (and an identifier). Nothing more.
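Two of the points above, compile-time execution and the thinness of `std::type_info`, can be seen in a short sketch. The names `factorial`, `kTenFactorial`, `Widget`, and `widget_type_name` are hypothetical, and the string returned by `name()` is implementation-defined, so no particular value is assumed.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Evaluated by the compiler when used in a constant expression.
constexpr unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}

// Only the resulting constant needs to appear in the binary; the recursion
// that produced it need not survive compilation.
constexpr unsigned long long kTenFactorial = factorial(10);
static_assert(kTenFactorial == 3628800ULL, "computed at compile time");

struct Widget {
    int id;
    double weight;
};

// typeid yields an implementation-defined name and a way to compare types.
// It says nothing about Widget's members, their names, or their types.
inline std::string widget_type_name() {
    return typeid(Widget).name();
}
```

A decompiler looking at the resulting executable would typically see a literal where `factorial(10)` was written, and at most a mangled type name where `Widget` was defined.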
Because C++ compilers generally do not put any more information into the executable than they absolutely have to (especially not if they are compiling in release mode rather than a debug build), so the information you'd need to accurately decompile the program simply is not present in the executable.
Of course a C++ compiler could be made that does include all of the necessary information in the executable (e.g. in the most naive implementation, it could simply include a copy of the source code itself in the executable), but doing so would make the executables significantly larger, and most non-open-source C++ developers would prefer that other people not be able to decompile the executable, so there isn't a whole lot of demand for that functionality.

Does D allow separation of interface from implementation in templates?

I haven't tried D yet, but it seems like a very interesting language that has found some neat solutions to problems in C++. I'm curious, did it also make it possible to separate interface from implementation in templates? If yes, then how?
No. Any templates used are fully expanded at compile time.
This means that the compiler needs to know the full code of the template, making it impossible to keep it out of the .di files.
At some point in processing the use of a template, D needs all the information about the template. However, there is no reason that this information need be encoded as the original source code (OTOH, as an implementation detail, all current D compilers do require that). This is a fundamental issue of any language that has templates stronger than generics. The implications of this depend on what you are trying to do.
If your interest in separation of interface and implementation is to hide the implementation (like shipping binary libraries and header files in C), then this can't be done. The closest you can get is some kind of code obfuscation system.
If, on the other hand, you are interested in avoiding the cost of reprocessing templates for each recompilation, something more general like a binary pre-compiled header format could allow the reuse of the results of the lexical, syntactic and some of the passes while compiling several other modules. In fact, that would be simpler to do with D than in C.
A third option would be link time code generation, but that has little difference from conventional linking with aggressive use of an analog to pre-compiled headers.

Is it advisable to expose a templated class of a Library?

I am developing a library for matrix calculations in C++. For this I wanted to use templates. After doing a bit of template metaprogramming, I realized that I would end up exposing my implementations in the templatized Matrix class. Is there any way to obfuscate the template class implementation in the header file when you expose that particular template class? If yes, then how is it done?
I will answer from the customer perspective.
When I need to use a library, and integrate it in my code, I expect to see the source code.
It is not because I wish to rip it out from the author... It is not because I am a lawless and irrespective hacker...
It is, simply, because:
code is documentation, and seeing the implementation of a method will help me compensate for the lack of it, or perhaps better understand what it meant (*)
for debugging, the ability to step down into library's code is invaluable
for developing, it is just so much easier if I can compile the code myself, in various flavors (with and without instrumentation, aka gcov, with and without debug symbols, etc...)
I don't ask for the code to be free, I am perfectly fine with the code being licensed, and I'll scrupulously follow the license terms, I just ask for the code to be available.
Frankly, if I have the choice between two libraries, and one does not expose its code, I'll lean toward the other, unless the performance/correctness difference is really important.
(*) In C++, Boost has libraries that I consider fundamentally broken in this regard. The code is riddled with compiler workarounds, which makes it very difficult to read. Nevertheless, I use them because they're just awesome.
As templating means that the implementation of the class/function is created compile-time (needs to make a new implementation for each new type) I cannot see how you could hide the code. The only way would be to hide your templates in a precompiled library and only expose interfaces to predefined types. That would lose the template functionality though...
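The "precompiled library with predefined types" idea can be sketched with explicit instantiation. This is only an illustration under stated assumptions: the `Matrix` interface and the `matrix.h` / `matrix.cpp` split are hypothetical, and both halves are shown in one block so that it compiles as a single file. `extern template` requires C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- matrix.h (shipped to clients): declarations only ----
template <typename T>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols);
    T& at(std::size_t r, std::size_t c);
    T trace() const;  // sum of the main diagonal

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};

// Tell client translation units not to instantiate these; the library does.
extern template class Matrix<float>;
extern template class Matrix<double>;

// ---- matrix.cpp (compiled into the library; source not shipped) ----
template <typename T>
Matrix<T>::Matrix(std::size_t rows, std::size_t cols)
    : rows_(rows), cols_(cols), data_(rows * cols, T()) {}

template <typename T>
T& Matrix<T>::at(std::size_t r, std::size_t c) {
    return data_[r * cols_ + c];
}

template <typename T>
T Matrix<T>::trace() const {
    T sum = T();
    for (std::size_t i = 0; i < rows_ && i < cols_; ++i) {
        sum += data_[i * cols_ + i];
    }
    return sum;
}

// Explicit instantiation definitions: the only element types clients get.
template class Matrix<float>;
template class Matrix<double>;
```

With the two parts in separate files, a client using `Matrix<int>` would be expected to hit unresolved symbols at link time, which is the lost template functionality mentioned above. The private data layout still has to appear in the header.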
With current standard (and even upcoming C++11), one has to expose all the template definitions where those templates are used. There is no standard way to hide it.
Second part, if you choose to obfuscate it, then equally its usage will become complex. The best way in my opinion is to license/copyright them!
I think all template-based C++ libs are deployed as header files (perhaps also using libraries but the publicly usable templates have to be headers). That's true for STL, boost, etc. It's simply the way templates work -- the compiler has to see the original template.
In addition to all the other reasons cited, there's another problem: C++ names are "decorated" - for example in order to support method overloads, the types of the parameters for the method are encoded in the name of the method.
There is no standard for this encoding, it varies from compiler to compiler and even from one version of a compiler to another version of the same compiler.
As a result, if you have a library containing C++ functions, you can't ensure that the names of the functions can be read by your clients (unless you can guarantee that your clients are using the same version of the compiler that you are).
For standard libraries, this isn't a problem, since the libraries are shipped with the compiler, but for other libraries, you need to be very careful.
No, not really, since a template is not compiled code. It's literally a "template". When a template is instantiated in a .cpp file, the template itself needs to be available to the compiler in order to generate the code for the class method. Therefore you can't "hide" the template code ... it has to be available to the compiler, or else you are going to be unable to compile any modules that attempt to instantiate the template.
One good way to think of templates is like a blank form you might use, say for income-taxes or something of that nature. In order to actually make a valid income-tax form, meaning one filled out with your name, SSN, etc., you need a copy of the "blank" original. So you can't "hide" the form from a person and expect them to fill it out correctly. The same is true for the compiler. When you instantiate a template function or class, a copy of the template needs to be made available for the compiler to fill in the template parameters and actually generate the "real" code behind-the-scenes for you that is then compiled in the code module.
You can put your code in a precompiled header, but I must agree that best way to protect your code is to put license/copyright.

Few questions about C++ compilers : GCC, MSVC, Clang, Comeau etc

I've few questions about C++ compilers
Are C++ compilers required to be one-pass compiler? Does the Standard talk about it anywhere?
In particular, is GCC one-pass compiler? If it is, then why does it generate the following error twice in this example (though the template argument is different in each error message)?
error: declaration of ‘adder<T> item’ shadows a parameter
error: declaration of ‘adder<char [21]> item’ shadows a parameter
A more general question
What are the advantages and disadvantages of one-pass compiler and multi-pass compiler?
Useful links:
A List of C/C++ compilers (wikipedia)
An incomplete list of C++ compilers (Bjarne Stroustrup's site)
The standard sets no requirements whatsoever with regards to
how a compiler is implemented. But what do you mean by
"one-pass"? Most compilers today do only read the input file
once. They create an in memory representation (often in the
form of some sort of parse tree), and may make multiple passes
over that. And almost certainly make multiple passes over parts
of it. The compiler must make a "pass" over the internal
representation of a template each time it is instantiated, for
example; there's no way of avoiding that. G++ also makes
a "pass" over the template when it is defined, before any
instantiation, and reports some errors then. (The standard
committee expressly designed templates to allow a maximum of
error detection at the point of definition. This is the
motivation behind the requirement for typename in certain
places, for example.) Even without templates, a compiler will
generally have to make two passes over a class definition if
there are functions defined in it.
With regards to the more general question, again, I think you'd
have to define exactly what you mean by "one-pass". I don't
know of any compiler today which reads the source file several
times, but almost all will visit some or all of the nodes in the
parse tree more than once. Is this one-pass or multi-pass? The
distinction was more significant in the past, when memory wasn't
sufficient to maintain much of the source code in an internal
representation. Languages like Pascal and, to a lesser degree
C, were sometimes designed to be easy to implement with a single
pass compiler, since a single pass compiler would be
significantly faster. Today, this issue is largely irrelevant,
and modern languages, including C++, tend to ignore it; where
C++ seems to conform to the needs of a one-pass compiler, it's
largely for reasons of C compatibility, and where
C compatibility is not an issue (e.g. in a class definition), it
often makes order of declaration irrelevant.
From what I know, 30 years ago it was important for a compiler to be one-pass, because reads and writes to disk (or magnetic tape) were very slow and there was not enough memory to hold whole code (thanks James Kanze). Also, a single-pass is a requirement for scripting/interactive languages.
Nowadays compilers are usually not one-pass; there are several intermediate representations (e.g. Abstract Syntax Tree or Static Single Assignment form) that the code is transformed into and then analysed/optimised.
Some elements in C++ cannot be solved without some intermediate steps, e.g. in a class you can reference members which are defined only later in the class body. Also, all templates need to be somehow remembered for further access during instantiation.
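A small sketch of the class-body case: the member function below uses data members that are declared only later in the class. The class `Counter` is hypothetical. This is legal because member function bodies are treated as if they were defined after the end of the class, so a compiler cannot finish them on first sight.

```cpp
#include <cassert>

class Counter {
public:
    // Uses count_ and step_, which are declared further down. The body is
    // processed as though it appeared after the closing brace of the class.
    int next() {
        count_ += step_;
        return count_;
    }

private:
    int count_ = 0;
    int step_ = 2;
};
```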
What usually does not happen is the source code being parsed several times; there is no need for that. So you should not see the same syntactic error reported several times.
No, I would be surprised if you found a heavily used C++ single pass compiler.
No, it does multiple passes and even different optimizations based on the flags you pass it.
Advantages (single-pass): fast! Since all the source only needs to be examined once the compilation phase (and thus beginning of execution) can happen very quickly. It is also a model that is attractive because it makes the compiler easy to understand and often times "easier" to implement. (I worked on a single pass Pascal compiler once, but don't encounter them often, whereas single pass interpreters are common)
Disadvantages (single-pass): Optimization, semantic/syntactic analysis. Sometimes a single look at the code lets things through that are easily caught by simple mechanisms in multiple passes. (kind of why we have things like JSLint)
Advantages (multi-pass): optimizations, semantic/syntactic analysis. Even pseudo interpreted languages like "JRuby" go through a pipeline compilation process to get to java/jvm bytecode before execution, you could consider this multi-pass and the multiple looks at the varying representations (and consequently the resulting optimizations) of code can make it very fast.
Disadvantages (multi-pass): complexity, sometimes time (depending on if AOT/JIT is being used as your compilation method)
Also, single-pass is pretty common in academia to help learn the aspects of compiler design.
Walter Bright, the developer of the first native C++ compiler, has stated that he believes it is not possible to compile C++ without at least 3 passes. And, yes, that means 3 full text-transforming passes over the source, not just traversals through an internal tree representation. See his Dr. Dobb's magazine article, "Why is C++ compilation so slow?" So any hope of finding a true one-pass compiler seems doomed. (I think this was part of the motivation Bright had to develop D, his C++ alternative.)
The compiler only needs to look at the sources once top down, but that does not mean that it does not have to process the parsed contents more than once. In particular with templates, it has to instantiate the templated code with the type, and that cannot happen until the template is used (or explicitly instantiated by the user), which is the reason for your duplicate errors:
When the template is defined, the compiler detects an error and at that point the type has not been substituted. When the actual instantiation occurs it substitutes the template arguments and processes the result, which is what triggers the second error. Note that if the template was specialized after the first definition, and before the instantiation, for that particular type, the second error need not occur.
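The two moments of checking described above can be seen in a valid template. This is a minimal sketch; `sum_all` is a hypothetical function, and how much is diagnosed at definition time varies between compilers.

```cpp
#include <cassert>
#include <vector>

template <typename Container>
typename Container::value_type sum_all(const Container& items) {
    // At the point of definition, the compiler can check syntax and any
    // names that do not depend on Container. `typename` is required so the
    // parser knows, before any instantiation, that value_type is a type.
    typename Container::value_type total = typename Container::value_type();

    // At the point of instantiation, Container is substituted and the
    // dependent parts are checked: const_iterator, begin(), end(), +=.
    // An error here is reported once per offending instantiation.
    for (typename Container::const_iterator it = items.begin();
         it != items.end(); ++it) {
        total += *it;
    }
    return total;
}
```

An error that does not depend on the template parameter, such as a local variable that shadows a function parameter, can therefore be reported once for the definition (with `T` unsubstituted) and again for each instantiation, which matches the pair of messages in the question.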