So why is it not allowed to have default arguments on both the function declaration and the implementation? Wouldn't this be more readable for the implementer and the user of the function?
Is there a special reason why this is not allowed, or why the compiler or linker can't handle this?
Best regards
In fact, the rule is only that, within the same scope, two declarations cannot both supply a default argument for the same parameter:
void foo(int = 42);
void foo(int = 42); // Error.
And a definition also acts as a declaration.
If the file containing your definition doesn't include the header with the declaration,
you can put the default in the definition too.
Note that default arguments are not part of the signature, but they must nevertheless be the same (per scope) in every translation unit (for inline functions, and since C++20 also for non-inline functions, although some defaults may be omitted in some translation units).
I don't know the rationale for those rules, though.
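As a minimal sketch of the rule (the function name scale and its values are made up for illustration), later declarations in the same scope may add defaults for other parameters, but may not repeat one that was already given. The function is constexpr only so that the results can be checked with static_assert:

```cpp
// Hypothetical example; `scale` is not from the question.

// First declaration (typically in a header): supplies the default for `factor`.
constexpr int scale(int value, int factor = 2);

// A later declaration in the same scope may ADD a default for another
// parameter; `factor` keeps the default from the declaration above.
constexpr int scale(int value = 10, int factor);

// Repeating an existing default is an error, even with the same value:
// constexpr int scale(int value, int factor = 2);  // error: default argument redefined

// The definition is also a declaration, so it must not repeat the defaults.
constexpr int scale(int value, int factor) { return value * factor; }

static_assert(scale() == 20, "both defaults are used");
static_assert(scale(3) == 6, "only the default for factor is used");
static_assert(scale(3, 4) == 12, "no defaults are used");
```

Uncommenting the third declaration should make a conforming compiler reject the file, which is exactly the situation the question runs into.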
There is no real reason except that the committee decided so and they apparently like to show their power by torturing millions of programmers this way.
Ok... maybe this is not true, but in my opinion it's a more logical explanation of why this is forbidden than anything I've read about the issue.
Note that g++ with -fpermissive allows default to be specified both in declaration and in implementation IF THE VALUES ARE THE SAME and gives an error if they are different.
This is the way IMO it should be in the standard, but it's not.
Because no.
PS: Don't try to read too much into logical reasons about the rules of C++. Many times there are reasons, sometimes there are just poor justifications of a sad incident that must stay in forever for backward compatibility, sometimes there is no reason at all... it's just the way it is. This, added to the complexity of C++ and the concept of Undefined Behavior, is in my opinion why experimenting with C++ doesn't work well and you need to actually read the standard rules. Being smart doesn't help if you're experimenting, because the "correct" answer is often the wrong one. There's no way you can guess what decision was taken on that rainy day at the committee meeting.
This might be a stupid question, but I am confused. I had a feeling that an immediate (consteval) function has to be executed during compile time and we simply cannot see its body in the binary.
This article clearly supports my feeling:
This has the implication that the [immediate] function is only seen at compile time. Symbols are not emitted for the function, you cannot take the address of such a function, and tools such as debuggers will not be able to show them. In this matter, immediate functions are similar to macros.
The similar strong claim might be found in Herb Sutter's publication:
Note that draft C++20 already contains part of the first round of reflection-related work to land in the standard: consteval functions that are guaranteed to run at compile time, which came from the reflection work and are designed specifically to be used to manipulate reflection information.
However, there are several pieces of evidence that are less clear about this.
From cppreference:
consteval - specifies that a function is an immediate function, that is, every call to the function must produce a compile-time constant.
It does not mean it has to be called during compile time only.
From the P1073R3 proposal:
There is now general agreement that future language support for reflection should use constexpr functions, but since "reflection functions" typically have to be evaluated at compile time, they will in fact likely be immediate functions.
Seems like this means what I think, but still it is not clearly said. From the same proposal:
Sometimes, however, we want to express that a function should always produce a constant when called (directly or indirectly), and a non-constant result should produce an error.
Again, this does not mean the function has to be evaluated during compile time only.
From this answer:
your code must produce a compile time constant expression. But a compile time constant expression is not an observable property in the context where you used it, and there are no side effects to doing it at link or even run time! And under as-if there is nothing preventing that
Finally, there is a live demo where a consteval function is clearly called at run time. However, I hope this is because consteval is not yet properly supported in clang and the behavior is actually incorrect, just like in Why does a consteval function allow undefined behavior?
To be more precise, I'd like to hear which of the following statements of the cited article are correct:
An immediate function is only seen at compile time (and cannot be evaluated at run time)
Symbols are not emitted for an immediate function
Tools such as debuggers will not be able to show an immediate function
Almost none of these are answers which the C++ standard can give. The standard doesn't define "symbols" or what tools can show. Almost all of these are dealer's choice as far as the standard is concerned.
Indeed, even the question of "compile time" vs. "run time" is something the standard doesn't deal with. The only question that concerns the standard is whether something is a constant expression. Invoking a constexpr function may produce a constant expression, depending on its arguments. Invoking a consteval function in a way which does not produce a constant expression is ill-formed.
The one thing the standard does define is what gets "seen". Though it's not really about "compile time". There are a number of statements in C++20 that forbid most functions from dealing in pointers/references to immediate functions. For example, C++20 states in [expr.prim.id]/3:
An id-expression that denotes an immediate function shall appear only
as a subexpression of an immediate invocation, or
in an immediate function context.
So if you're not in an immediate function, or you're not using the name of an immediate function to call another immediate function (passing a pointer/reference to the function), then you cannot name an immediate function. And you can't get a pointer/reference to a function without naming it.
This and other statements in the spec (like pointers to immediate function not being valid results of constant expressions) essentially make it impossible for a pointer/reference to an immediate function to leak outside of constant expressions.
So statements about the visibility of immediate functions are correct, to some degree. Symbols can be emitted for immediate functions, but you cannot use immediate functions in a way that would prevent an implementation from discarding said symbols.
And that's basically the thing with consteval. It doesn't use standard language to enforce what must happen. It uses standard language to make it impossible to use the function in a way that will prevent these things from happening. So it's more reasonable to say:
You cannot use an immediate function in a way that would prevent the compiler from executing it at compile time.
You cannot use an immediate function in a way that would prevent the compiler from discarding symbols for it.
You cannot use an immediate function in a way that would force debuggers to be able to see them.
Quality of implementation is expected to take things from there.
It should also be noted that debugging builds are for... debugging. It would be entirely reasonable for advanced compiler tools to be able to debug code that generates constant expressions. So a debugger which could see immediate functions execute is an entirely desirable technology. This becomes more so as compile-time code grows more complex.
The proposal mentions:
One consequence of this specification is that an immediate function never needs to be seen by a back end.
So it is definitely the intention of the proposal that calls are replaced by the constant. In other words, that the constant expression is evaluated during translation.
However, it does not say it is required that it is not seen by the backend. In fact, in another sentence of the proposal, it just says it is unlikely:
It also means that, unlike plain constexpr functions, consteval functions are unlikely to show up in symbolic debuggers.
More generally, we can re-state the question as:
Are compilers forced to evaluate constant expressions (everywhere; not just when they definitely need it)?
For instance, a compiler needs to evaluate a constant expression if it is the number of elements of an array, because it needs to statically determine the total size of the array.
However, a compiler may not need to evaluate other uses, and while any decent optimizing compiler will try to do so anyway, it does not mean it needs to.
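The two situations can be put side by side in a small sketch (the names cube, table and not_required_to_fold are invented for the example):

```cpp
// Illustrative names; works since C++11.
constexpr int cube(int x) { return x * x * x; }

// Here the value is needed during translation: it is an array bound.
int table[cube(3)];
static_assert(sizeof(table) == 27 * sizeof(int),
              "the array bound was computed during translation");

// Here nothing forces evaluation during translation. An optimizer will
// usually fold cube(2) into 8, but emitting an ordinary call is also allowed.
int not_required_to_fold() { return cube(2); }
```

Both uses give the same observable result; only the first one leaves the implementation no choice about when to compute it.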
Another interesting case to think about is an interpreter: while an interpreter still needs to evaluate some constant expressions, it may just do it lazily all the time, without performing any constant folding.
So, as far as I know, they aren't required, but I don't know the exact quotes we need from the standard to prove it (or otherwise). Perhaps it is a good follow-up question on its own, which would answer this one too.
For instance, in [expr.const]p1 there is a note that says they can, not that they are:
[Note: Constant expressions can be evaluated during translation. — end note]
I recently disabled RTTI on my compiler (MSVC10) and the executable size decreased significantly. By comparing the produced executables in a text editor, I found that the RTTI-less version contains far fewer symbol names, which explains the saved space.
AFAIK, those symbol names are only used to fill the type_info structure associated with each polymorphic type, and one can access them programmatically by calling type_info::name().
According to the standard, the format of the string returned by type_info::name() is unspecified. That is, no one can rely on it to do serious things portably. So, it should be possible for an implementation to always return an empty string without breaking anything, thus reducing the executable size without disabling RTTI support (so we can still use the typeid operator & compare type_info's objects safely).
But... is it possible? I'm using MSVC10 and I haven't found any option to do that. I can either disable RTTI completely (/GR-) or enable it with full type names (/GR). Does any compiler provide such an option?
So, it should be possible for an implementation to always return an empty string without breaking anything, thus reducing the executable size without disabling RTTI support (so we can still use the typeid operator & compare type_info's objects safely).
You are misreading the standard. The intent of making the return value of type_info::name() unspecified (other than that it is a null-terminated byte string) was to give the implementers of the compiler/library/run-time environment free rein to implement the RTTI requirements as they see best. You, the programmer, have no say in how the Application Binary Interface (if there is one) is designed or implemented.
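For reference, this is the kind of code the question wants to keep working without the names. It is only a sketch with invented types, and it says nothing about how a given ABI implements the comparison internally:

```cpp
#include <typeinfo>

// Illustrative types; not from the question.
struct Base { virtual ~Base() {} };
struct Derived : Base {};

// Identifies the dynamic type by comparing type_info objects.
// The program itself never calls name().
bool is_exactly_derived(const Base& b) { return typeid(b) == typeid(Derived); }

// dynamic_cast also needs RTTI, and again the program never sees a name.
const Derived* as_derived(const Base& b) { return dynamic_cast<const Derived*>(&b); }
```

Whether the run-time library compares names behind the scenes to implement these operations is exactly the ABI detail the programmer does not control.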
You're asking three different questions here.
The initial question asks whether there's any way to get MSVC to not generate names, or whether it's possible with other compilers, or, failing that, whether there's any way to strip the names out of the generated type_info without breaking things.
Then you want to know whether it would be possible to modify the MS ABI (presumably not too radically) so that it would be possible to strip the names.
Finally, you want to know whether it would be possible to design an ABI that didn't have names.
Question #1 is itself a complex question. As far as I know, there's no way to get MSVC to not generate names. And most other compilers are aimed at ABIs that specifically define what typeid(foo).name() must return, so they also can't be made to not generate names.
The more interesting question is, what happens if you strip out the names. For MSVC, I don't know the answer. The best thing to do here is probably to try it—go into your DLLs and change the first character of each name to \0 and see if it breaks dynamic_cast, etc. (I know that you can do this with Mac and linux x86_64 executables generated by g++ 4.2 and it works, but let's put that aside for now.)
On to question #2, assuming blanking the names doesn't work, it wouldn't be that hard to modify a name-based system to no longer require names. One trivial solution is to use hashes of the names, or even ROT13-encoded names (remember that the original goal here is "I don't want casual users to see the embarrassing names of my classes"). But I'm not sure that would count for what you're looking for. A slightly more complex solution is as follows:
For "dllexport"ed classes, generate a UUID, put that in the typeinfo, and also put it in the .LIB import library that gets generated along with the DLL.
For "dllimport"ed classes, read the UUID out of the .LIB and use that instead.
So, if you manage to get the dllexport/dllimport right, it will work, because your exe will be using the same UUID as the dll. But what if you don't? What if you "accidentally" specify identical classes (e.g., an instantiation of the same template with the same parameters) in your DLL and your EXE, without marking one as dllexport and one as dllimport? RTTI won't see them as the same type.
Is this a problem? Well, the C++ standard doesn't say it is. And neither does any MS documentation. In fact, the documentation explicitly says that you're not allowed to do this. You cannot use the same class or function in two different modules unless you explicitly export it from one module and import it into another. The fact that this is very hard to do with class templates is a problem, and it's a problem they don't try to solve.
Let's take a realistic example: Create a node-based linkedlist class template with a global static sentinel, where every list's last node points to that sentinel, and the end() function just returns a pointer to it. (Microsoft's own implementation of std::map used to do exactly this; I'm not sure if that's still true.) New up a linkedlist<int> in your exe, and pass it by reference to a function in your dll that tries to iterate from l.begin() to l.end(). It will never finish, because none of the nodes created by the exe will point to the copy of the sentinel in the dll. Of course if you pass l.begin() and l.end() into the DLL, instead of passing l itself, you won't have this problem. You can usually get away with passing a std::string or various other types by reference, just because they don't depend on anything that breaks. But you're not actually allowed to do so, you're just getting lucky. So, while replacing the names with UUIDs that have to be looked up at link time means types can't be matched up at link-loader time, the fact that types already can't be matched up at link-loader time means this is irrelevant.
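A minimal sketch of such a list follows. It is a simplified illustration, not Microsoft's implementation, and all names are hypothetical. Within one module it works; the trouble described above arises when the exe and the dll each instantiate linkedlist<int> and therefore each get their own copy of sentinel:

```cpp
#include <cstddef>

// Hypothetical, simplified list with one static sentinel per instantiation.
template <class T>
class linkedlist {
public:
    struct node {
        T value;
        node* next;
    };

    linkedlist() : head_(&sentinel) {}
    linkedlist(const linkedlist&) = delete;
    linkedlist& operator=(const linkedlist&) = delete;
    ~linkedlist() {
        while (head_ != &sentinel) {
            node* old = head_;
            head_ = old->next;
            delete old;
        }
    }

    void push_front(const T& v) {
        node* n = new node;
        n->value = v;
        n->next = head_;  // the last real node always points at the sentinel
        head_ = n;
    }

    node* begin() const { return head_; }
    // end() refers to the sentinel of whichever module this code was compiled into.
    static node* end() { return &sentinel; }

private:
    static node sentinel;
    node* head_;
};

template <class T>
typename linkedlist<T>::node linkedlist<T>::sentinel = {};

// If this function lived in a dll with its own copy of the sentinel, the loop
// would never meet a list built in the exe, because p never equals that end().
template <class T>
std::size_t count_nodes(const linkedlist<T>& l) {
    std::size_t n = 0;
    for (typename linkedlist<T>::node* p = l.begin(); p != l.end(); p = p->next) ++n;
    return n;
}
```

Passing l.begin() and l.end() computed in the exe avoids the problem because both pointers then come from the same copy of the sentinel.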
It would be possible to build a name-based system that didn't have these problems. The ARM C++ ABI (and the iOS and Android ABIs based on it) restricts what programmers can get away with much less than MS, and has very specific requirements on how the link-loader has to make it work (3.2.5). This one couldn't be modified to not be name-based because it was an explicit choice in the design that:
• type_info::operator== and type_info::operator!= compare the strings returned by type_info::name(), not just the pointers to the RTTI objects and their names.
• No reliance is placed on the address returned by type_info::name(). (That is, t1.name() != t2.name() does not imply that t1 != t2).
The first condition effectively requires that these operators (and type_info::before()) must be called out of line, and that the execution environment must provide appropriate implementations of them.
But it's also possible to build an ABI that doesn't have this problem and that doesn't use names. Which segues nicely to #3.
The Itanium ABI (used by, among other things, both OS X and recent linux on x86_64 and i386) does guarantee that a linkedlist<int> generated in one object and a linkedlist<int> generated from the same header in another object can be linked together at runtime and will be the same type, which means they must have equal type_info objects. From 2.9.1:
It is intended that two type_info pointers point to equivalent type descriptions if and only if the pointers are equal. An implementation must satisfy this constraint, e.g. by using symbol preemption, COMDAT sections, or other mechanisms.
The compiler, linker, and link-loader must work together to make sure that a linkedlist<int> created in your executable points to the exact same type_info object that a linkedlist<int> created in your shared object would.
So, if you just took out all the names, it wouldn't make any difference at all. (And this is pretty easily tested and verified.)
But how could you possibly implement this ABI spec? j_kubik effectively argues that it's impossible because you'd have to preserve some link-time information in the .so files. Which points to the obvious answer: preserve some link-time information in the .so files. In fact, you already have to do that to handle, e.g., load-time relocations; this just extends what you need to preserve. And in fact, both Apple and GNU/linux/g++/ELF do exactly that. (This is part of the reason everyone building complex linux systems had to learn about symbol visibility and vague linkage a few years ago.)
There's an even more obvious way to solve the problem: Write a C++-based link loader, instead of trying to make the C++ compiler and linker work together to trick a C-based link loader. But as far as I know, nobody's tried that since Be.
Requirements for type-descriptor:
Works correctly in multi compilation-unit and shared-library environment;
Works correctly for different versions of shared libraries;
Works correctly even though different compilation units don't share any information about a type except its name: usually one header is used by all compilation units to define the same type, but that's not required; and even when it is, it doesn't affect the resulting object file.
Works correctly despite the fact that template instantiations must be fully defined (including type_info data) in every library that uses them, and yet behave like one type if several such libraries are used together.
The fourth rule essentially bans all non-name-based type descriptors like UUIDs (unless specifically mentioned in the type definition, but that is just name replacement at best, and probably requires changes to the standard).
Storing those UUIDs in separate files, like the suggested .LIB files, also causes trouble: different library versions that implement new types would cause problems.
Compilation units should be able to share the same type (and its type_info) without the need to involve the linker, because the linker should stay free of any language specifics.
So the type name is the only possible unique type descriptor without completely remodeling compilation and linking (including dynamic linking). I could imagine it working, but not under the current scheme.
The task
I am trying to work out how best to add C++0x's override identifier to all existing methods that are already overrides in a large body of C++ code, without doing it manually.
(We have many, many hundreds of thousands of lines of code, and doing it manually would be a complete non-starter.)
Current idea
Our coding standards say that we should add the virtual keyword against all implicitly virtual methods in derived classes, even though strictly unnecessary (to aid comprehension).
So if I were to script the addition myself, I'd write a script that read all our headers, found all functions beginning with virtual, and inserted override before the following semicolon. Then I'd compile it on a compiler that supports override, and fix all the errors in base classes.
But I'd really much rather not use this home-grown way, as:
it's obviously going to be tedious and error-prone.
not everyone has remembered, every time, to add the virtual keyword, so this method would miss out some existing overrides
Is there an existing tool?
So, is there already a tool that parses C++ code, detects existing methods that are overrides, and appends override to their declarations?
(I am aware of static analysis tools such as PC-lint that warn about functions that look like they should be overrides. What I'm after is something that would actually munge our code, so that future errors in overrides will be detected at compile time, rather than later on in static analysis.)
(In case anyone is tempted to point out that C++03 doesn't support 'override'... In practice, I'd be adding a macro, rather than the actual "override" identifier, to use our code on older compilers that don't support this feature. So after the identifier was added, I'd run a separate script to replace it with whatever macro we're going to use...)
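A sketch of what such a macro could look like. The name MY_OVERRIDE and the classes are hypothetical, and the feature test is deliberately simplified:

```cpp
// Hypothetical macro name; the project would pick its own.
// Simplified detection: some compilers support `override` without reporting
// a C++11 value of __cplusplus, so a real header may need extra checks.
#if __cplusplus >= 201103L
#  define MY_OVERRIDE override
#else
#  define MY_OVERRIDE
#endif

struct Shape {
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
};

struct Triangle : Shape {
    // With a C++11 compiler this fails to compile if it stops overriding
    // Shape::sides(), for example after a signature change in the base class.
    virtual int sides() const MY_OVERRIDE { return 3; }
};

int count_sides(const Shape& s) { return s.sides(); }
```

On older compilers the macro expands to nothing, so the code still builds but loses the check.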
Thanks in advance...
There is a tool under development by the LLVM project called "cpp11-migrate" which currently has the following features:
convert loops to range-based for loops
convert null pointer constants (like NULL or 0) to C++11 nullptr
replace the type specifier in variable declarations with the auto type specifier
add the override specifier to applicable member functions
This tool is documented here and should be released as part of clang 3.3.
However, you can download the source and build it yourself today.
Edit
Some more info:
Status of the C++11 Migrator - a blog post, dated 2013-04-15
cpp11-migrate User’s Manual
Edit 2: 2013-09-07
"cpp11-migrate" has been renamed to "clang-modernize". For Windows users, it is now included in the new LLVM Snapshot Builds.
Edit 3: 2020-10-07
"clang-modernize" has been folded into "Clang-Tidy", whose modernize-use-override check adds the override specifier.
Our DMS Software Reengineering Toolkit with its C++11-capable C++ Front End can do this.
DMS is a general purpose program transformation system for arbitrary programming languages; the C++ front end allows it to process C++. DMS parses, builds ASTs and symbol tables that are accurate (this is hard to do for C++), provides support for querying properties of the AST nodes and trees, allows procedural and source-to-source transformations on the tree. After all changes are made, the modified tree can be regenerated with comments retained.
Your problem requires that you find derived virtual methods and change them. A DMS source-to-source transformation rule to do that would look something like:
source domain Cpp. -- tells DMS the following rules are for C++
rule insert_virtual_keyword (n:identifier, a: arguments, s: statements):
method_declaration -> method_declaration " =
" void \n(\a) { \s } " -> " virtual void \n(\a) { \s }"
if is_implicitly_virtual(n).
Such rules match against the syntax trees, so they can't mismatch to a comment, string, or whatever. The funny quotes are not C++ string quotes; they are meta-quotes to allow the rule language to know that what is inside them has to be treated as target language ("Cpp") syntax. The backslashes are escapes from the target language text, allowing matches to arbitrary structures e.g., \a indicates a need for an "a", which is defined to be the syntactic category "arguments".
You'd need more rules to handle cases where the function returns a non-void result, etc. but you shouldn't need a lot of them.
The fun part is implementing the predicate (returning TRUE or FALSE) controlling application of the transformation: is_implicitly_virtual. This predicate takes (an abstract syntax tree for) the method name n.
This predicate would consult the full C++ symbol table to determine what n really is. We already know it is a method from just its syntactic setting, but we want to know in what class context.
The symbol table provides the linkage between the method and class, and the symbol table information for the class tells us what the class inherits from, and for those classes, which methods they contain and how they are declared, eventually leading to the discovery (or not) that the parent class method is virtual. The code to do this has to be implemented as procedural code going against the C++ symbol table API. However, all the hard work is done; the symbol table is correct and contains references to all the other data needed. (If you don't have this information, you can't possibly decide algorithmically, and any code changes will likely be erroneous).
DMS has been used to carry out massive changes on C++ code in the past using program transformations. (Check the Papers page at the web site for C++ rearchitecting topics.)
(I'm not a C++ expert, merely the DMS architect, so if I have minor detail wrong, please forgive.)
I did something like this a few months ago with about 3 MB worth of code and while you say that "doing it manually would be a complete non-starter," I think it is the only way. The reason is that you should be applying the override keyword to the prototypes that are intended to override base class methods. Any tool that adds it will put it on the prototypes that actually override base class methods. The compiler already knows which methods those are so adding the keyword doesn't change anything. (Please note that I am not terribly familiar with the new standard and I am assuming the override keyword is optional. Visual Studio has supported override since at least VS2005.)
I used a search for "virtual" in the header files to find most of them and I still occasionally find another prototype that is missing the override keyword.
I found two bugs by going through that.
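The kind of bug that a manual pass can uncover, and that a tool adding override only to functions that already override would leave untouched, looks roughly like this (the classes are invented for illustration):

```cpp
// Illustrative classes; not from the code base discussed above.
struct Widget {
    virtual ~Widget() {}
    virtual int weight() const { return 1; }
};

struct HeavyWidget : Widget {
    // Intended as an override, but `const` is missing, so this declares a new,
    // unrelated virtual function. Adding `override` here would be a compile error.
    virtual int weight() { return 100; }
};

// Calls through the base class silently use Widget::weight().
int weight_via_base(const Widget& w) { return w.weight(); }
```

Only someone who knows that HeavyWidget::weight was meant to override can decide that the keyword belongs there and that the signature needs fixing.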
Eclipse CDT has a working C++ parser and semantic utilities. The latest version, IIRC, also has markers for overriding methods.
It wouldn't require much code to write a plug-in that builds on that and rewrites the code to add the override tags where appropriate.
One option is to enable the suggest-override compiler warning (-Wsuggest-override in GCC) and then write a script that inserts the override keyword at the locations reported by the emitted warnings.
Hello Community
I am looking at C++ assembly. I have compiled a benchmark from the PARSEC suite and I am having difficulty understanding how class member functions are named in assembly language. For example, if I have a class with some functions to manipulate it, in C++ we call them like test.increment();
After some investigation I found out that this function is
atomic_load_acq_ptr
represented as:
_ZL19atomic_load_acq_intPVj
in assembly, or at least this is what I have found out.
Let me know if I am wrong!
Is there some fixed rule for the mapping, or is it random?
Thanks
It's called name mangling. It is necessary because of overloads, templates and the like: the plain letters-and-digits name isn't enough to identify a chunk of code unambiguously, embedding spaces, <>, or :: in symbol names usually isn't legal, and copying the additional information in uncondensed, human-readable form would be wasteful. The mangled name therefore depends on types, arity, etc.
The exact scheme can vary, but usually each compiler is self-consistent for a relatively long time (sometimes even several compilers can settle for one way).
That's called name mangling. It is compiler dependent. No standard way, sorry :)
C++ allows function overloading, which means that one can have two functions with the same name but different parameters. Since binary formats do not understand types, this is a problem. The way this is worked around is to use a scheme called name mangling. This adds a whole lot of type information to the name used in the source file and ensures one calls the correct overload.
The extra letters etc that are added are governed by the particular Application Binary Interface (ABI) being used. Different compilers (and sometimes even different versions) may use different ABIs.
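As a sketch for one particular ABI: GCC and Clang follow the Itanium C++ ABI on most platforms and ship a demangler in <cxxabi.h>. The helper name demangle and the area overloads are made up for the example, and none of this applies to MSVC, which uses a different scheme:

```cpp
#include <cxxabi.h>   // GCC/Clang (Itanium C++ ABI) only; not available with MSVC
#include <cstdlib>
#include <string>

// Two overloads share a source-level name but need distinct symbols.
// Under the Itanium ABI they become _Z4areai and _Z4aread.
int area(int side) { return side * side; }
double area(double side) { return side * side; }

// Turns a mangled symbol back into a readable signature.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return mangled;
    std::string result(text);
    std::free(text);  // __cxa_demangle allocates the buffer with malloc
    return result;
}
```

Read with the same rules, the symbol from the question, _ZL19atomic_load_acq_intPVj, breaks down as follows: L marks internal linkage, 19 is the length of the name atomic_load_acq_int, and PVj is a pointer (P) to volatile (V) unsigned int (j). So it appears to name atomic_load_acq_int rather than atomic_load_acq_ptr.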
Yes, there's a standard method for creating these symbols, known as name mangling.