Does a compiler collapse classes which are identical in their structure? (C++)

I hope this isn't a duplicate of a question itself, but the search terms are so ambiguous, I can't think of anything better.
Say we have two classes:
class FloatRect
{
    float x, y, width, height;
};
and somewhere else
class FloatBox
{
    float top, left, bottom, right;
};
From a practical standpoint, they're the same, so does the compiler treat them both as some sort of typedef?
Or will it produce two separate units of code?
I'm curious because I'd like to go beyond typedefs and make a few variants of a type to improve readability.
I don't want needless duplication, though...

This is completely implementation specific.
For example, I can use Clang/LLVM to illustrate both points of view at once:
Clang is the C++ front end; it uses two distinct types to resolve function calls and the like, and treats them as completely different types.
LLVM is the optimizer back end; it doesn't care (yet) about names, only about structural representation, and will therefore collapse them into a single type... or even remove the type definition entirely if it is unused.
If the question is whether introducing a similarly laid-out class creates overhead, then the answer is no, so write the classes that you need.
Note: the same happens for functions, i.e. the optimizer can merge the bodies of identical functions to get tighter code; this is not a reason to copy/paste, though.
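A minimal sketch of the front-end view, reusing the two classes from the question (members made public here so the example is self-contained): overload resolution keeps them apart even though their structural representation is identical. The sizeof check is an assumption that holds on typical implementations:
#include <cstdio>

class FloatRect { public: float x, y, width, height; };
class FloatBox  { public: float top, left, bottom, right; };

// The front end resolves these as two unrelated overloads.
void describe(const FloatRect&) { std::puts("rect"); }
void describe(const FloatBox&)  { std::puts("box"); }

// Same structural representation, as the back end would see it.
static_assert(sizeof(FloatRect) == sizeof(FloatBox), "identical layout");

int main() {
    describe(FloatRect{}); // prints "rect"
    describe(FloatBox{});  // prints "box"
}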

They are totally unrelated classes with regards to the compiler.
If they are just POD C-style structs, no real code is actually generated for them as such. (Yes, there is an implicitly generated assignment operator and a few other special member functions, but I doubt any code will actually be compiled for them; they will simply be inlined where they are used.)

Since the classes you use as samples are only relevant during compilation, there's nothing to duplicate or collapse. At runtime, the member variables are simply accessed as "the value at offset N".

This is, of course, hugely implementation-specific.
Any collapse here would be completely internal to the mechanism of the compiler, and would not affect the translated code it produces.
I would imagine it's very unlikely that this is the case, as I can think of no benefit and several ways in which this would really complicate matters. I can't present any evidence, though.

No. They are literally two different types, and the compiler must treat them that way; there is no magic merging going on.

No, they are not treated as typedefs, because they are different types and can, for example, be used to overload functions.
On the other hand, the types have no code in them so there will be nothing to duplicate.

Related

Pertinence of void pointers

Looking through a colleague's code, I see that some of its handles are stored as void pointers.
// Class header
void* hSomeSdk;
// Class implementation
hSomeSdk = new SomeSDK(...);
((SomeSDK*)hSomeSdk)->DoSomeWork();
Now I know that sometimes handles are void pointers because it may be unknown before runtime what will be the actual type of the handle. Or that it can help when we need to share the pointer without revealing its actual structure. But this does not seem to be the case in my situation: it will always be SomeSDK and it is not shared outside the class where it is created. Also the author of this code is gone from the company.
Are there other reasons why it would make sense to have it be a void pointer?
Since this is a member variable, I'm gonna go out on a limb and say your colleague wanted to minimize dependencies. Including the header for SomeSDK is probably undesirable just to define a pointer. The colleague may have had one of two reasons as far as I can see from the code you show:
They just didn't know they could add a forward declaration like class SomeSDK; to allow defining pointers. Some programmers simply aren't aware of it.
They couldn't forward declare it. If SomeSDK is not a class, but a type alias (aka typedef), then it's not possible to forward declare it exactly. One can only declare the class it aliases, but that in turn may be an implementation detail that's hard to keep track of. Even the standard library has a similar problem, that is why it provides iosfwd to make forward declaring standard stream types easier.
If the code is peppered with casts of this handle, then the design should have been reworked ages ago. Otherwise, if it's in one place (or a few at most) only, I can see why the people maintaining it could live with it peacefully.
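For illustration, here is roughly what the forward-declaration alternative looks like (the header/source split and the SDK header name are assumptions; SomeSDK and DoSomeWork come from the question):
// some_class.h -- no SDK header needed, only a forward declaration
class SomeSDK;

class SomeClass {
public:
    SomeClass();
    ~SomeClass();
    void Work();
private:
    SomeSDK* hSomeSdk; // typed pointer: no void*, no casts
};

// some_class.cpp -- only the implementation file sees the SDK header
#include "some_class.h"
#include "SomeSDK.h" // assumed header name

SomeClass::SomeClass() : hSomeSdk(new SomeSDK(/*...*/)) {}
SomeClass::~SomeClass() { delete hSomeSdk; }
void SomeClass::Work() { hSomeSdk->DoSomeWork(); }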
Nope.
If I had to guess, the ex-colleague was unfamiliar with forward declarations and thus didn't know they could still do SomeSDK* in the header without including the entire SomeSDK definition.
Given the constraints you've mentioned, the only thing this pattern achieves is to eliminate some type safety, make the code harder to read/maintain, and generate a Stack Overflow question.
void* was popular and often necessary back in C. It is convenient in the sense that it can be cast to and from any object pointer type; in C++, for instance, converting a double* to a char* with static_cast requires an intermediate cast to void* (or a reinterpret_cast).
The problem with void* is that it is too flexible: it does not convey the intentions of the writer, which makes it very unsafe, especially in big projects.
In object-oriented design it is popular to create abstract interface classes (all member functions virtual and unimplemented), hold pointers to such classes, and then instantiate various possible implementations depending on the usage, as in the sketch below.
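A minimal sketch of that abstract-interface pattern (Codec and FastCodec are invented names for illustration):
#include <cstdio>
#include <memory>

// Abstract interface: all members pure virtual, no implementation.
class Codec {
public:
    virtual ~Codec() = default;
    virtual void encode() = 0;
};

// One possible implementation, chosen at runtime through the interface.
class FastCodec : public Codec {
public:
    void encode() override { std::puts("fast encode"); }
};

int main() {
    std::unique_ptr<Codec> c = std::make_unique<FastCodec>();
    c->encode(); // dispatched through the virtual table
}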
However, nowadays it is more often recommended to work with templates (the main advantage of C++ over other languages), as those are much faster and enable more compile-time optimization than OOD allows. Unfortunately, working with templates is still a huge hassle: they have a more complicated syntax, and it is difficult to convey the writer's intentions about the restrictions and requirements on template parameters to users (the Concepts TS, which solves this problem decently, will be available in C++20; currently there is only SFINAE, a horrible stopgap from 20 years ago, while the Reflection TS, which will greatly enhance generic programming in C++, is unlikely to be available even in C++23).

Why can't I overload C++ conversion operators outside a class, as a non-member function?

This question has kind of been asked before, but I feel the asker was too hasty to mark an answer as correct when he never actually got a real answer. Maybe there is no reason why, and this needs to be put in the standard later; you tell me.
What is the rationale for not allowing overloading of C++ conversion operators with non-member functions?
I'm looking for the specific reason this is not allowed as part of the design of the current standard. Basically, when you overload a cast operator to define an implicit conversion between two types, this overloaded definition has to be a member of the class you're converting from; it cannot be something outside a class. The obvious problem: suppose you have types that you really can't modify for some reason, but you want to convert implicitly between them, either for the sake of syntactic simplicity (despite the evils of implicit conversion) or because you have a bunch of other code, standard or custom, that relies on implicit conversion. You can't do that if you can't add the appropriate implicit conversions to the classes, so you need workarounds such as regular conversion functions wrapped around what would otherwise be the convenience of implicit conversion.
Also, is it really possible that there would be a computational overhead to adding these conversions outside a class? The way I see it, it would be easy for a compiler, when working out which functions are available, to associate external implicit-conversion functions with the class they convert from, so that the code executes as if the conversion were part of that class as far as efficiency goes. The only downside would be the extra work of making the initial association, which should be almost nothing.
I will not take "because the standard says so" or "because implicit conversions are bad" as an answer. Somebody surely had a reason when they wrote the actual standard.
(I'm not a huge expert, I'm still learning the language.)
Edit, response:
Well, I imagine the situation could be like this: yes, you change the header file, but what you don't do is overwrite the existing one, because that would be terrible. You would create a new header file based on the old one to accommodate the changes. The assumption is that the old code is already compiled in an object file, and changing the header just tells the compiler there's additional code somewhere else that you added. It wouldn't change what the old code does, because that's already compiled and doesn't depend on it (i.e. some vendor handed you object code and a header). If I could modify and recompile the code I would be using the conversion for, then you couldn't make me write the conversion function externally; I wouldn't do it, it's too confusing.
You wouldn't have to search every header randomly for the right definitions, either. If I were writing the code myself, I would make a custom header with a highly visible section containing the stuff I added to the vendor-supplied header, and that header would be fairly obvious because it would be associated with the related types, while the other headers would keep their original names so you would know they weren't changed. And there would be a corresponding file containing only the conversion definitions, so my modifications would be self-contained, separated from the original object code, and relatively easy to find.
Of course that's apart from the actual struggle of figuring out in the code which conversion function applies. I think you can find a variety of cases where that's easy enough to determine, and natural enough to use, that it makes sense to extend an existing library like this for your own purposes. If I were using commercial code that I couldn't really modify, and I saw a situation where a conversion function would let me integrate it with some of my own code, I could see myself wanting to do this. Granted, such things aren't obvious to a third person just reading a = b; they wouldn't know what was going on with my conversions from that alone, but if you knew, and it read nicely, it could work.
I appreciate the insight into how standards decisions tend to work; this is definitely the kind of fringe thing that one could ignore.
Besides a non-explicit conversion operator, e.g. operator bool(), in the class you are converting from, you can also use a non-explicit constructor taking a single argument in the class you are converting to as a way of introducing a user-defined conversion. (This wasn't mentioned in the question.)
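A small sketch of both routes (Meters, Feet, and Celsius are invented for the example); note that each conversion still has to be declared inside one of the two class definitions:
struct Meters { double v; };

struct Feet {
    double v;
    Feet(Meters m) : v(m.v * 3.28084) {} // non-explicit converting constructor: Meters -> Feet
};

struct Celsius {
    double v;
    operator double() const { return v; } // non-explicit conversion operator: Celsius -> double
};

int main() {
    Feet f = Meters{2.0};   // uses the converting constructor
    double d = Celsius{20}; // uses the conversion operator
    (void)f; (void)d;
}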
As to why you cannot introduce user-defined conversions between two types A and B without modifying their definitions... well, this would create chaos.
If you could do this, then you could do it in a header file, and since introducing new user-defined conversions can change the meaning of code, "old" code using only A and B could totally change what it does depending on whether your header happens to be included before it.
It's already hard enough to figure out exactly which user-defined conversion sequences are taking place when things go wrong, even with the restriction that the conversions have to be declared by one of the two types. If you potentially have to search every single unrelated header file in full to find these conversion-function definitions, it dramatically worsens the maintenance problem, and there doesn't appear to be any benefit to allowing it. I mean, can you give a non-contrived example where this language feature would make the implementation of something much simpler or easier to read?
In general, I think programmers like the idea that to figure out what a line a = b; does, they just have to read the definition of the type of a and the type of b and go from there... it's potentially ugly and painful if you start allowing these "gotcha" conversions that are harder to know about.
I guess you could say the same thing about operator << being used for streaming... but with user-defined conversions it's more serious, since it can potentially affect any line of code where an object of that type is passed as a parameter.
Also, I don't think you should necessarily expect to find a well-thought-out reason; not everything that is feasible for compilers to implement is permitted by the standard. The committee tends to be conservative and to seek consensus, so "no one really cared about feature X enough to fight for it" is probably as good an explanation as you will find for why feature X is not available.
Why is initialization of a constant dependent type in a template parameter list disallowed by the standard?
Answer to that question suggests a common reason for a feature not being available:
Legacy: the feature was left out in the first place, and now we've built so much without it that it's almost forgotten (see partial function template specialization).

Will a C++ compiler generate code for each template type?

I have two questions about templates in C++. Let's imagine I have written a simple List and now I want to use it in my program to store pointers to different object types (A*, B* ... ALot*). My colleague says that a dedicated piece of code will be generated for each type, even though all the pointers are in fact the same size.
If this is true, can somebody explain to me why? For example, in Java, generics have the same purpose as templates for pointers in C++. Generics are only used for compile-time type checking and are stripped out during compilation. And of course the same byte code is used for everything.
Second question: will dedicated code also be generated for char and short (considering that they both have the same size and there are no specializations)?
If this makes any difference, we are talking about embedded applications.
I have found a similar question, but it did not completely answer my question: Do C++ template classes duplicate code for each pointer type used?
Thanks a lot!
I have two questions about templates in C++. Let's imagine I have written a simple List and now I want to use it in my program to store pointers to different object types (A*, B* ... ALot*). My colleague says that a dedicated piece of code will be generated for each type, even though all the pointers are in fact the same size.
Yes, this is true; it is equivalent to having written each version of the code by hand.
Some linkers will detect the identical functions and eliminate the duplicates. Some libraries are aware that their linker doesn't have this feature, and factor out common code into a single implementation, leaving only a casting wrapper around the common code. I.e., a std::vector<T*> specialization may forward all work to a std::vector<void*> and then cast on the way out.
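A sketch of that factoring technique (PtrVec is an invented name; std::vector itself is not required to work this way):
#include <cstddef>
#include <vector>

// Every PtrVec<T> shares the code of one std::vector<void*> instantiation;
// only the trivial casting wrappers below are stamped out per type, and
// those are easily inlined away.
template <typename T>
class PtrVec {
public:
    void push_back(T* p) { impl_.push_back(p); }
    T* operator[](std::size_t i) const { return static_cast<T*>(impl_[i]); }
    std::size_t size() const { return impl_.size(); }
private:
    std::vector<void*> impl_;
};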
Now, COMDAT folding is delicate: it is relatively easy to write functions you think are identical but that end up not being the same, so two functions are generated. As a toy example, you could go and print the type name via typeid(x).name(). Now each version of the function is distinct, and they cannot be eliminated.
In some cases, you might do something like this thinking that it is a run time property that differs, and hence identical code will be created, and the identical functions eliminated -- but a smart C++ compiler might figure out what you did, use the as-if rule and turn it into a compile-time check, and block not-really-identical functions from being treated as identical.
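The typeid toy example from above, spelled out; whether a given toolchain folds the plain version is, again, implementation-specific:
#include <cstdio>
#include <typeinfo>

template <typename T>
void report(T* p) {
    // Without the typeid line, every instantiation of report<T> has
    // identical machine code and could be folded by the linker. With it,
    // each instantiation embeds a different type name and stays distinct.
    std::printf("type %s at %p\n", typeid(T).name(), static_cast<void*>(p));
}

int main() {
    int i = 0;
    double d = 0;
    report(&i);
    report(&d);
}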
If this is true, can somebody explain to me why? For example, in Java, generics have the same purpose as templates for pointers in C++. Generics are only used for compile-time type checking and are stripped out during compilation. And of course the same byte code is used for everything.
No, they aren't. Generics are roughly equivalent to the C++ technique of type erasure, such as what std::function<void()> does to store any callable object. In C++, type erasure is often done via templates, but not all uses of templates are type erasure!
The things that C++ does with templates that are not in essence type erasure are generally impossible to do with Java generics.
In C++, you can create a type erased container of pointers using templates, but std::vector doesn't do that -- it creates an actual container of pointers. The advantage to this is that all type checking on the std::vector is done at compile time, so there doesn't have to be any run time checks: a safe type-erased std::vector may require run time type checking and the associated overhead involved.
Second question: will dedicated code also be generated for char and short (considering that they both have the same size and there are no specializations)?
They are distinct types. I can write code that behaves differently for a char than for a short value. As an example:
std::cout << x << "\n";
with x being a short, this prints an integer whose value is x; with x being a char, it prints the character corresponding to x.
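Here is that example as a complete program; the two instantiations genuinely need different code:
#include <iostream>

template <typename T>
void print(T x) { std::cout << x << "\n"; }

int main() {
    print<short>(65); // prints "65"
    print<char>(65);  // prints "A"
}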
Now, almost all template code exists in header files, and is implicitly inline. While inline doesn't mean what most folk think it means, it does mean that the compiler can hoist the code into the calling context easily.
If this makes any difference, we are talking about embedded applications.
What really makes a difference is which particular compiler and linker you use, and which settings and flags they have active.
The answer is maybe. In general, each instantiation of a template is a unique type, with a unique implementation, and will result in a totally independent instance of the code. Merging the instances is possible, but would be considered an "optimization" (under the "as-if" rule), and this optimization isn't widespread.
With regards to comparisons with Java, there are several points to keep in mind: C++ uses value semantics by default. An std::vector, for example, will actually insert copies, and whether you're copying a short or a double does make a difference in the generated code. In Java, short and double will be boxed, and the generated code will clone a boxed instance in some way; cloning doesn't require different code, since it calls a virtual function of Object, but physically copying does.
C++ is far more powerful than Java. In particular, it allows comparing things like the addresses of functions, and it requires that the functions in different instantiations of templates have different addresses. Usually this is not an important point, and I can easily imagine a compiler with an option telling it to ignore this requirement and to merge instances which are identical at the binary level. (I think VC++ has something like this.)
Another issue is that the implementation of a template in C++ must be present in the header file. In Java, of course, everything must be present, always, so this issue affects all classes, not just templates. This is, of course, one of the reasons why Java is not appropriate for large applications. But it means that you don't want any complicated functionality in a template; doing so loses one of the major advantages of C++ compared to Java (and many other languages). In fact, it's not rare, when implementing complicated functionality in templates, to have the template inherit from a non-template class which does most of the implementation in terms of void*. While implementing large blocks of code in terms of void* is never fun, it does have the advantage of offering the best of both worlds to the client: the implementation is hidden in compiled files, invisible in any way, shape or form to the client.
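A sketch of that idiom (ListBase and List are invented names): the bulky logic lives once in the non-template base, whose member functions could be compiled out of line in a .cpp file, while the template adds only trivial type-safe wrappers:
#include <cstddef>
#include <vector>

// Non-template core: in a real project the definitions below would live
// in a separate .cpp file, hidden from clients.
class ListBase {
public:
    void push(void* p) { items_.push_back(p); }
    void* at(std::size_t i) const { return items_[i]; }
private:
    std::vector<void*> items_;
};

// Thin type-safe wrapper: fully inline, but cheap to instantiate per type.
template <typename T>
class List : private ListBase {
public:
    void push(T* p) { ListBase::push(p); }
    T* at(std::size_t i) const { return static_cast<T*>(ListBase::at(i)); }
};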

Performance penalties on using "this->"?

Consider this example of two similar C++ member functions in a class C:
void C::function(Foo new_f) {
f = new_f;
}
and
void C::function(Foo new_f) {
this->f = new_f;
}
Are these functions compiled in the same manner? Are there any performance penalties for using this-> (more memory accesses or whatever)?
Yes, it's exactly the same and you'll get the same performance.
The only time you really must use the this-> syntax is when an argument to the function has the same name as a member variable you want to access. Using the name by itself will refer to the argument, so you need this->. Of course, you could also just rename the argument. And, as ildjarn has pointed out in the comments, you also need this-> in templates to call member functions that are dependent names, because this is implicitly dependent (you can read more about that elsewhere).
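The dependent-name case looks like this (Base and Derived are invented names):
template <typename T>
struct Base {
    void helper() {}
};

template <typename T>
struct Derived : Base<T> {
    void run() {
        // helper();   // error: not found by unqualified lookup,
        //              // because the base class depends on T
        this->helper(); // OK: this-> defers the lookup to instantiation time
    }
};

int main() {
    Derived<int> d;
    d.run();
}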
From the viewpoint of the compiler, there's no difference between this-> being implicit and being explicit.
Remember, however, that code should be written primarily for human readers, and only secondarily for the compiler. From this viewpoint, using this-> (except in the few places it's truly needed) is a huge loss, and should be expunged from all code.
It's a shorthand. In this case, it's exactly the same.
There is no performance penalty for the resulting code, because the compiler will have to use this to access the member anyway.
There is a performance penalty for me reading the code, because I would have to stop here and think "Why is this-> needed here? Is there a coding trick involved? Did I just miss something important about this class? Or did the coder just insert a random this-> for no reason?".
The compiler uses the this pointer for you without you even knowing it. Whenever you type it yourself, you're merely stating it explicitly; in most cases there's no need to.
You can compare the assembly output of the two functions by compiling them under GCC with the flag -S. This will generate symbolic assembly code for the input C/C++ files, and the two should be identical.
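For instance, a self-contained pair to try (the file name in the comment is made up); the comment states exactly what the answer above predicts:
struct Foo { int v; };

class C {
    Foo f;
public:
    void set_implicit(Foo new_f);
    void set_explicit(Foo new_f);
};

void C::set_implicit(Foo new_f) { f = new_f; }        // this-> left implicit
void C::set_explicit(Foo new_f) { this->f = new_f; }  // this-> written out

// Compile with: g++ -S this_test.cpp
// The two emitted bodies should be identical apart from their labels.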

Should const functionality be expanded?

EDIT: this question could probably use a more apropos title. Feel free to suggest one in the comments.
In using C++ with a large class set I once came upon a situation where const became a hassle, not because of its functionality, but because it's got a very simplistic definition. Its applicability to an integer or string is obvious, but for more complicated classes there are often multiple properties that could be modified independently of one another. I imagine many people forced to learn what the mutable keyword does might have had similar frustrations.
The most apparent example to me would be a matrix class, representing a 3D transform. A matrix will represent both a translation and a rotation each of which can be changed without modifying the other. Imagine the following class and functions with the hypothetical addition of 'multi-property const'.
class Matrix {
    void translate(const Vector & translation) const("rotation");
    void rotate(const Quaternion & rotation) const("translation");
};

void spin180(const("translation") Matrix & matrix);
void moveToOrigin(const("rotation") Matrix & matrix);
Or imagine predefined const keywords like "_comparable" which allow you to define functions that modify the object at will as long as you promise not to change anything that would affect the sort order of the object, easing the use of objects in sorted containers.
What would be some of the pros and cons of this kind of functionality? Can you imagine a practical use for it in your code? Is there a good approach to achieving this kind of functionality with the current const keyword functionality?
Bear in mind
I know such a language feature could easily be abused. The same can be said of many C++ language features.
Like const I would expect this to be a strictly compile-time bit of functionality.
If you already think const is the stupidest thing since sliced mud, I'll take it as read that you feel the same way about this. No need to post, thanks.
EDIT:
In response to SBK's comment about member markup, I would suggest that you don't have any. For classes / members marked const, it works exactly as it always has. For anything marked const("foo") it treats all the members as mutable unless otherwise marked, leaving it up to the class author to ensure that his functions work as advertised. Besides, in a matrix represented as a 2D array internally, you can't mark the individual fields as const or non-const for translation or rotation because all the degrees of freedom are inside a single variable declaration.
Scott Meyers was working on a system for extending the language with arbitrary constraints (using templates).
So you could say a function/method was Verified, ThreadSafe (etc., or any other constraints you liked). Then such constrained functions could only call other functions with at least the same (or stronger) constraints; e.g. a method marked ThreadSafe could only call another method marked ThreadSafe (unless the coder explicitly cast away that constraint).
Here is the article:
http://www.artima.com/cppsource/codefeatures.html
The cool concept I liked was that the constraints were enforced at compile time.
In cases where you have groups of members that are either const together or mutable together, wouldn't it make as much sense to formalize that by putting them in their own class together? That can be done today without changing the language.
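A sketch of that suggestion applied to the matrix example (the decomposition into separate rotation and translation sub-objects is an assumption; a real transform matrix mixes both in one array):
struct Vector     { float x, y, z; };
struct Quaternion { float w, x, y, z; };

class Transform {
public:
    // Mutates only the translation part; the rotation member is never
    // touched, which the grouping makes structurally evident.
    void translate(const Vector& d) {
        translation_.x += d.x;
        translation_.y += d.y;
        translation_.z += d.z;
    }
    const Quaternion& rotation() const { return rotation_; }
    const Vector& translation() const { return translation_; }
private:
    Quaternion rotation_{1, 0, 0, 0}; // identity rotation
    Vector translation_{0, 0, 0};
};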
Refinement
When an ADT is indistinguishable from itself after some operation, the const property holds for the entire ADT. You wish to define partial constness.
In your sort-order example you are asserting that operator< of the ADT is invariant under some other operation on the ADT. Your ad-hoc const names, such as "rotation", are defined by the set of operations under which the ADT is invariant. We could leave the invariant unnamed and just list the operations that are invariant inside const(). Due to overloading, functions would need to be specified with their full declaration.
void set_color (Color c) const (operator<, std::string get_name());
void set_name (std::string name) const (Color get_color());
So the const names can be seen as a formalism - their existence or absence doesn't change the power of the system. But 'typedef' could be used to name a list of invariants if that proves useful.
typedef const(operator<, std::string get_name()) DontWorryOnlyNameChanged;
It would be hard to think of good names for many cases.
Usefulness
The value in const is that the compiler can check it. This is a different kind of const.
But I see one big flaw in all of this. From your matrix example I might incorrectly infer that rotation and translation are independent and therefore commutative. But there is an obvious data dependency, and matrix multiplication is not commutative. Interestingly, this is an example where partial constness is invariant under repeated application of one operation or the other, but not both. 'translate' would be surprised to find that its object had been translated due to a rotation after a previous translation. Perhaps I am misunderstanding the meaning of rotate and translate. But that's the problem: constness now seems open to interpretation. So we need... drum roll... logic.
Logic
It appears your proposal is analogous to dependent typing. With a powerful enough type system almost anything is provable at compile time. Your interest is in theorem provers and type theory, not C++. Look into intuitionistic logic, sequent calculus, Hoare logic, and Coq.
Now I've come full circle. Naming makes sense again,
int times_2(int n) const("divisible_by_3");
since divisible_by_3 is actually a type. Here's a prime number type in Qi. Welcome to the rabbit hole. And I pretended to be getting somewhere. What is this place? Why are there no clocks in here?
Such high level concepts are useful for a programmer.
If I wanted to make const-ness fine-grained, I'd do it structurally:
struct C { int x; int y; };

C const<x> *c;
C const<x, y> *d;
C const& e;
C &f;

c = &e; // fail: c->y would be mutable via c
d = &e;
c = &f;
d = c;
If you let me express a preference that, within a scope, maximally-const methods are preferred (normal overloading prefers the non-const method when my reference or pointer is non-const), then the compiler or a standalone static analysis could deduce the sets of must-be-const members for me.
Of course, this is all moot unless you plan on implementing a preprocessor that takes the nice high-level finely grained const C++ and translates it into casting-away-const C++. We don't even have C++0x yet.
I don't think that you can achieve this as strictly compile-time functionality.
I can't think of a good example, so this strictly functional one will have to do:
struct Foo {
    int bar;
};

bool operator<(Foo l, Foo r) {
    return (l.bar & 0xFF) < (r.bar & 0xFF);
}
Now I put some Foos into a sorted set. Obviously the lower 8 bits of bar must remain unchanged so that the order is preserved. The upper bits, however, can be freely changed. This means the Foos in the set aren't const, but they aren't fully mutable either. I don't see any way you could describe this level of constness in a generally useful form without using runtime checking.
If you formalized the requirements, I could even imagine you could prove that no compiler capable of checking this (at compile time) could exist at all.
It could be interesting, but one of the useful features of const's simple definition is that the compiler can check it. If you start adding arbitrary constraints, such as "cannot change sort order", the compiler as it stands now cannot check it. Further, the problem of compile-time checking of arbitrary constraints is, in the general case, impossible to solve due to the halting problem. I would rather see the feature remain limited to what can actually be checked by a compiler.
There is work on enabling compilers to check more and more things — sophisticated type systems (including dependent type systems), and work such as that done in SPARK Ada, allowing for compiler-aided verification of various constraints — but it all eventually hits the theoretical limits of computer science.
I don't think the core language, and especially the const keyword, would be the right place for this. The concept of const in C++ is meant to express the idea that a particular action will not modify a certain area of memory. It is a very low-level idea.
What you are proposing is a logical const-ness that has to do with the high-level semantics of your program. The main problem, as I see it, is that semantics can vary so much between different classes and different programs that there would be no way for there to be a one-size-fits all language construct for this.
What would need to happen is that the programmer would need to be able to write validation code that the compiler would run in order to check that particular operations met his definition of semantic (or "logical") const-ness. When you think about it, though, such code, if it ran at compile-time, would not be very different from a unit test.
Really what you want is for the compiler to test whether functions adhere to a particular semantic contract. That's what unit tests are for. So what you're asking is that there be a language feature that automatically runs unit tests for you during the compilation step. I think that's not terribly useful, given how complicated the system would need to be.