Why do we need the spaceship <=> operator in C++? - c++

Why do we need such an operator in C++, and how is it useful in modern C++ programming? Any real-world code examples where it can be applied would help.
This question is geared towards understanding the practical application in the real world, without reading the wordy proposal from Herb Sutter. No offense to the proposal, though.

I'll give you three points of motivation, just off the top of my head:
It's the common generalization of all the other comparison operators (for totally-ordered domains): >, >=, ==, <=, <. Using <=> (spaceship), you can implement each of these other operations in a completely generic way; see the sketch after this list.
For strings, it's equivalent to the good old strcmp() function from the C standard library. So it's useful for lexicographic order checks, such as for data in vectors, lists, or other ordered containers.
For integral numbers, it's what the hardware does anyway: on x86 or x86_64, comparing a and b (CMP RAX, RBX) is basically like subtracting (SUB RAX, RBX), except that RAX doesn't actually change; only the flags are affected, so the next instruction can be "jump on equal/not equal/greater/less/etc." (JE/JNE/JG/JL, etc.). CMP should be thought of as a "spaceship compare".
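To make the first point concrete, here is a minimal C++20 sketch (the Version type and the is_less helper are my own illustration, not from the proposal): one defaulted operator<=> gives you every relational operator, and generic code can express any comparison through it.

#include <compare>

struct Version {
    int major, minor, patch;
    // One defaulted <=> synthesizes <, <=, >, >=; == and != come along too.
    auto operator<=>(const Version&) const = default;
};

// Any comparison can be expressed generically through <=>.
template <class T>
bool is_less(const T& a, const T& b) { return (a <=> b) < 0; }

int main() {
    Version a{1, 2, 3}, b{1, 3, 0};
    // Member-wise lexicographic comparison, like strcmp for structs.
    return (a < b && a != b && is_less(a, b)) ? 0 : 1;
}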

Related

What purpose does the pre-increment operator serve in C? [duplicate]

This question already has answers here:
What are the historical reasons C languages have pre-increments and post-increments?
(6 answers)
Closed 6 years ago.
In C and many of its derivatives, i++ increments i and evaluates to the value of i before it was incremented, and ++i increments i and evaluates to the value of i after it was incremented.
I can see the reasoning for a specific increment operator; many processors at the time had a special increment opcode that was faster than addition, and conceptually "increment" is a different idea from "add"; in theory, having them written differently might make code more readable.
What I don't understand is the need for the pre-increment operator. Can't any practical use for it be written like this?
/* This... */
x = i++;
/* ...becomes this: */
x = i;
++i;
Is there a historical reason I don't know about, maybe? Were you unable to "throw away" the return values of operators in primordial versions of C?
One reason is that it allowed for the generation of efficient code without having any fancy optimisation phases in compilers, provided that the programmer knew what he (or she) was doing. For example, when copying characters from one buffer to another, you might have:
register char *ptr1;
register char *ptr2;
...
for ( ... ) {
    *ptr1++ = *ptr2++;   /* post-increment */
}
A compiler that I once worked with (on a proprietary minicomputer) would generate the following register operations for the assignment:
load $r1,*$a1++ // load $r1 from address in $a1 and increment $a1
store $r1,*$a2++ // store $r1 at address in $a2 and increment $a2
I forget the actual opcodes. The compiler contained no optimisation phase, yet the code that it generated was very tight, provided that you understood the compiler and the machine architecture. It could do this because the hardware architecture had pre-decrement and post-increment addressing modes for both address registers and general registers. There were no pre-increment and post-decrement addressing modes as far as I recall, but you could get by without those.
I believe that the DEC minicomputers on which C was originally developed had such addressing modes. The machine that I worked on wasn't made by DEC but the architecture was pretty similar.
An optimisation phase was planned for the compiler. However, it was mostly used by systems programmers and when they saw how good the generated code was, implementation of the optimisation phase was quietly shelved.
The whole rationale for the design of C was to allow the creation of simple and portable compilers that would generate reasonably efficient code with minimal (or no) intermediate code optimisation. For this reason, the increment and decrement operators and also the compound assignment operators played an important role in the generation of compact and efficient code by the early C compilers. They were not just syntactic sugar as suggested by Niklaus Wirth et al.
So you can, for example, do this:
while (++i < threshold) { /* do something */ }
and this...
while (i++ < threshold) { /* do something */ }
or any of a thousand other specific constructs that both use the value and increment it in a single statement, and get the expected, different results.
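A minimal, self-contained illustration of that difference (the threshold of 3 is arbitrary): the post-increment loop body runs one extra time, because the old value is what gets tested.

#include <iostream>

int main() {
    int threshold = 3;

    int i = 0;
    while (++i < threshold)       // increment first, then test: body sees 1, 2
        std::cout << "pre: " << i << '\n';

    int j = 0;
    while (j++ < threshold)       // test the old value, then increment: body sees 1, 2, 3
        std::cout << "post: " << j << '\n';
}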

What's the difference between i++ and ++i in a for loop? [duplicate]

Perhaps it doesn't matter to the compiler once it optimizes, but in C/C++, I see most people make a for loop in the form of:
for (i = 0; i < arr.length; i++)
where the incrementing is done with the postfix ++. I get the difference between the two forms: i++ returns the current value of i, but then adds 1 to i on the quiet; ++i first adds 1 to i, and then returns the new value (being 1 more than i was).
I would think that i++ takes a little more work, since a previous value needs to be stored in addition to a next value: Push *(&i) to stack (or load to register); increment *(&i). Versus ++i: Increment *(&i); then use *(&i) as needed.
(I get that the "Increment *(&i)" operation may involve a register load, depending on CPU design. In which case, i++ would need either another register or a stack push.)
Anyway, at what point, and why, did i++ become more fashionable?
I'm inclined to believe azheglov: it's a pedagogic thing, and since most of us do C/C++ on a Windows or *nix system where the compilers are of high quality, nobody gets hurt.
If you're using a low-quality compiler or an interpreted environment, you may need to be sensitive to this. Certainly, if you're doing advanced C++ or device driver or embedded work, hopefully you're well-seasoned enough for this to be no big deal at all. (Do dogs have Buddha-nature? Who really needs to know?)
It doesn't matter which you use. On some extremely obsolete machines, and in certain instances with C++, ++i is more efficient, but modern compilers don't keep the old value if it's never used. As to when it became popular to post-increment in for loops, my copy of K&R 2nd edition uses i++ on page 65 (the first for loop I found while flipping through).
For some reason, i++ became more idiomatic in C, even though it creates a needless copy. (I thought that was through K&R, but I see this debated in other answers.) But I don't think there's a performance difference in C, where it's only used on built-ins, for which the compiler can optimize away the copy operation.
It does make a difference in C++, however, where i might be a user-defined type for which operator++() is overloaded. The compiler might not be able to assert that the copy operation has no visible side-effects and might thus not be able to eliminate it.
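A minimal sketch of why (the Counter type is my own illustration): for a user-defined type, the post-increment overload has to copy the old state just so the expression can evaluate to it.

struct Counter {
    int value = 0;

    Counter& operator++() {      // pre-increment: modify in place, no copy
        ++value;
        return *this;
    }
    Counter operator++(int) {    // post-increment: must copy the old state
        Counter old = *this;
        ++value;
        return old;              // this copy is the expression's value
    }
};

int main() {
    Counter c;
    ++c;    // cheap
    c++;    // pays for a copy unless the compiler can prove it's unused and elide it
    return c.value == 2 ? 0 : 1;
}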
As for the reason why, here is what K&R had to say on the subject:
Brian Kernighan
you'll have to ask dennis (and it might be in the HOPL paper). i have a
dim memory that it was related to the post-increment operation in the
pdp-11, though beyond that i don't know, so don't quote me.
in c++ the preferred style for iterators is actually ++i for some subtle
implementation reason.
Dennis Ritchie
No particular reason, it just became fashionable. The code produced
is identical on the PDP-11, just an inc instruction, no autoincrement.
HOPL Paper
Thompson went a step further by inventing the ++ and -- operators, which increment or decrement; their prefix or postfix position determines whether the alteration occurs before or after noting the value of the operand. They were not in the earliest versions of B, but appeared along the way. People often guess that they were created to use the auto-increment and auto-decrement address modes provided by the DEC PDP-11 on which C and Unix first became popular. This is historically impossible, since there was no PDP-11 when B was developed. The PDP-7, however, did have a few ‘auto-increment’ memory cells, with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own. Indeed, the auto-increment cells were not used directly in implementation of the operators, and a stronger
motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1.
For integer types the two forms should be equivalent when you don't use the value of the expression. This is no longer true in the C++ world with more complicated types, but the postfix form is preserved in the language's very name.
I suspect that "i++" became more popular in the early days because that's the style used in the original K&R "The C Programming Language" book. You'd have to ask them why they chose that variant.
Because as soon as you start using "++i", people will be confused and curious. They will halt their everyday work and start googling for explanations. Twelve minutes later they will land on Stack Overflow and create a question like this. And voila, your employer just spent yet another $10.
Going a little further back than K&R, I looked at its predecessor: Kernighan's C tutorial (~1975). Here the first few while examples use ++n. But each and every for loop uses i++. So to answer your question: Almost right from the beginning i++ became more fashionable.
My theory (why i++ is more fashionable) is that when people learn C (or C++) they eventually learn to code iterations like this:
while (*p++) {
    ...
}
Note that the postfix form is important here (using the prefix form would create an off-by-one bug).
When the time comes to write a for loop where ++i or i++ doesn't really matter, it may feel more natural to use the postfix form.
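For instance, the canonical C string copy leans on the postfix form (a sketch; my_strcpy is an illustrative name to avoid clashing with the standard strcpy):

// Copy characters up to and including the terminating '\0'.
// Postfix ++ is essential: each character is used before the pointer moves.
char* my_strcpy(char* dst, const char* src) {
    char* ret = dst;
    while ((*dst++ = *src++) != '\0')
        ;  // empty body: all the work happens in the condition
    return ret;
}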
ADDED: What I wrote above applies to primitive types, really. When coding something with primitive types, you tend to do things quickly and do what comes naturally. That's the important caveat that I need to attach to my theory.
If ++ is an overloaded operator on a C++ class (the possibility Rich K. suggested in the comments) then of course you need to code loops involving such classes with extreme care as opposed to doing simple things that come naturally.
At some level it's idiomatic C code. It's just the way things are usually done. If that's your big performance bottleneck you're likely working on a unique problem.
However, looking at my K&R The C Programming Language, 1st edition, the first instance I find of i in a loop (p. 38) does use ++i rather than i++.
In my opinion it became more fashionable with the creation of C++, as C++ enables you to call ++ on non-trivial objects.
OK, I'll elaborate: if you call i++ and i is a non-trivial object, then storing a copy containing the value of i before the increment will be more expensive than for, say, a pointer or an integer.
I think my predecessors are right regarding the side effects of choosing post-increment over pre-increment.
As for its fashionability, it may be as simple as that you start all three expressions within the for statement in the same repetitive way, something the human brain seems to lean towards.
I would add up to what other people told you that the main rule is: be consistent. Pick one, and do not use the other one unless it is a specific case.
If the loop is long, the saved old value has to be reloaded into a register so it can be incremented before the jump back to the beginning. With ++i you don't need that: there is no extra value to keep around.
In C, all operators that give a variable a new value, other than prefix increment/decrement, modify the variable on their left-hand side (i = 2, i += 5, etc.). So in situations where ++i and i++ can be interchanged, many people are more comfortable with i++, because the operator is on the right-hand side, modifying the variable to its left.
Please tell me if that first sentence is incorrect; I'm not an expert with C.

Is it considered good design to compare objects of different types?

Would you consider this evidence of bad design?
//FooType and BarType not in the same hierarchy
bool operator==(const FooType &, const BarType &);
bool operator<(const FooType &, const BarType &);
For example, if FooType is double measuring seconds since epoch and BarType is a tuple of three integers (year, month, and day) providing date in UTC, comparisons like above "make sense".
Have you seen such inter-type comparisons? Are they frowned upon in the C++ community?
To start with, there's nothing wrong with using free functions instead of member functions; in fact, it's recommended practice. See Scott Meyers's How Non-Member Functions Improve Encapsulation. You'll want to provide the comparisons in both directions, though:
bool operator==(const FooType &, const BarType &);
bool operator==(const BarType &, const FooType &);
Second, it's perfectly acceptable to provide these comparisons if the comparisons make sense. The standard library for example allows you to compare std::complex values for equality with floating point, but not less-than.
The one thing you want to avoid is comparisons that don't make sense. In your example one of the time values is a double, which means the comparison would occur for any floating point or integer value once you take standard promotions into account. This is probably more than you intended since there's no way to determine if any particular value represents a time. The loss of type checking means that there's a potential for unintended bugs.
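As a hedged sketch of a safer alternative (the struct layouts and the to_seconds helper are my own illustration, not from the question), you can compare through strong types and a single conversion point, with both directions delegating to one definition so they always agree:

struct FooType { double seconds_since_epoch; };
struct BarType { int year, month, day; };  // a UTC date

// Placeholder conversion to a common unit; a real implementation would use
// a proper calendar library instead of this rough approximation.
double to_seconds(const BarType& b) {
    return ((b.year - 1970) * 365.25
          + (b.month - 1) * 30.44
          + (b.day - 1)) * 86400.0;
}

bool operator==(const FooType& f, const BarType& b) {
    return f.seconds_since_epoch == to_seconds(b);
}
bool operator==(const BarType& b, const FooType& f) {
    return f == b;  // delegate so both directions always agree
}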
Personal vision and experience
I personally do not frown upon comparison between different types. I even encourage it, as it may improve code readability, making what you are doing seem more logical. Outside of the basic number types, and maybe a string and a character, I find it hard to give you a logical inter-type comparison, and I don't remember having met many. I have met a lot of arithmetic operators used like this, though.
How to use them
You should be careful with what you are doing; these operators are used sparingly for a reason. If you offer a function for the comparison of two different types, the outcome should be logical and what the user intuitively expects. It is also desirable to write good documentation for it. Mark Ransom already said it, but it is good if users can compare in both directions. If you think your comparison is not sufficiently clear as an operator, you should think about using a named function. That is also a very good solution if your operator could have multiple meanings.
What can go wrong
You do not have full control over what the user will do with what you have written. tletnes gave a good example of this, where two integers are compared but the outcome has no meaning. By contrast, a comparison between two different types can be very right: a float and an integer both representing seconds can be compared perfectly well.
Arithmetic operators
Besides comparisons, I would like to show an inter-type example with arithmetic operators, which are much like comparison operators when it comes to inter-type usage.
Say you have an operator + that takes a two-dimensional vector and a square. What does it do? One user might think it scales the square, but another is sure it translates it! These kinds of issues can be very frustrating to your users. You can solve this by providing good documentation, but what I personally prefer is specifically named functions, like Translate.
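A small sketch of that ambiguity (Vec2, Square and translate are illustrative names):

struct Vec2   { double x, y; };
struct Square { Vec2 corner; double side; };

// Ambiguous: does adding a vector scale the square or move it?
// Square operator+(const Square& s, const Vec2& v);

// Unambiguous: the name states the intent.
Square translate(const Square& s, const Vec2& v) {
    return { { s.corner.x + v.x, s.corner.y + v.y }, s.side };
}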
Conclusion
Inter-type comparison operators can be useful and make for clean code, but bad usage just makes everything more complicated.
Good design would dictate that you should only compare values of compatible meaning. Generally, type is a good clue to meaning, but it is not the last word; in fact, in many cases two values of the same type may have incompatible meanings, such as the following two integers:
int seconds = 3;  // seconds
int length = 2;   // square inches
if (seconds >= length) {
    // what does this mean?
}
In this example we compare length to seconds, but there is no meaningful relationship between the two.
int test_duration = 3;     // minutes
float elapsed_time = 2.5;  // seconds
if ((test_duration * 60) >= elapsed_time) {
    // test is done
}
In this example we compare two values of differing types (and units), but their meanings are still compatible (they both represent time), so (assuming there is a good reason why the two were stored that way, e.g. ease of using an API) this is good design.
According to Stepanov's rules (see Elements of Programming), equality is tightly connected with copying (construction and assignment), and also with inequality.
So, if the objects represent equal values then go ahead, make them equality comparable, but think of these four operations at the same time: equality, [copy] construction, assignment, and inequality.
By extension, the same goes for conversion (a cast operator, or construction on the other side) back and forth between the different types.
It is also implicitly connected to any "regular" function you can apply to these values.
Stepanov defines two values to be equal if any (regular) function applied to them gives equal results.
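A minimal sketch of that connection (the Meters type is my own illustration): a copy must compare equal to its source, and a regular function applied to equal values must give equal results.

struct Meters {
    double value;
    bool operator==(const Meters& other) const { return value == other.value; }
    bool operator!=(const Meters& other) const { return !(*this == other); }
};

double twice(Meters m) { return m.value * 2.0; }  // a "regular" function

int main() {
    Meters a{5.0};
    Meters b = a;                              // copy construction
    bool consistent = (a == b)                 // a copy compares equal to its source
                   && (twice(a) == twice(b));  // equal values give equal results
    return consistent ? 0 : 1;
}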
I would say also that, even if you can compare two objects as equal and construct one from the other, if the set of common functions (generic or not) you can apply to both is not a relevant set, or their results typically yield unequal values, then there is little value in comparing the different types of objects.
Worst of all, what if one of the types has fundamentally more functions than the other? Can one maintain reflexivity?
Finally there is the consideration of algorithmic complexity, if comparing the two objects is O(N^2) or higher (where N is the "size" of the objects in some measure), then it is understood that there is little value in comparing objects at all. (See John Lakos' talk https://www.youtube.com/watch?v=W3xI1HJUy7Q)
So, as you can see, there is more to it than just coming up with a comparison criterion to fill in the body of operator==, or deciding whether it is good practice; that is just the start.
Equality is so fundamental that it percolates into the meaning of your whole program.

increment operator more purpose? [duplicate]

This question already has answers here:
rate ++a,a++,a=a+1 and a+=1 in terms of execution efficiency in C.Assume gcc to be the compiler [duplicate]
(10 answers)
Which is faster? ++, += or x + 1?
(5 answers)
Closed 9 years ago.
I was wondering if the increment and decrement operators (++ and --) have more purpose to them than their plain use of making code simpler.
maybe:
i++;
is more efficient than:
i = i + 1;
?
In many ways, the main purpose of the operators is backwards compatibility. When C++ was being designed, the general rule was to do what C did, at least for non-class types; if C hadn't had ++ and --, I doubt C++ would have them.

Which, of course, begs the question. It's inconceivable that they would generate different code in a modern compiler, and it's fairly inconceivable that the committee would introduce them for optimization reasons (although you never know: move semantics were introduced mainly for optimization reasons). But back in the mid-1970s, in the formative years of C? It was generally believed, at the time, that they were introduced because they corresponded to machine instructions on the PDP-11. On the other hand, they were already present in B, and C acquired them from B. And B was an interpreted language, so there was no issue of them corresponding to machine instructions. My own suspicion, which applies to many of the operators (&, rather than and, etc.), is that they were introduced because development at the time was largely on teletypes (ttys), and every character you output to a teletype made a lot of unpleasant noise. So the fewer characters you needed, the better.

As to the choice between ++i;, i += 1; and i = i + 1;: there is a decided advantage to not having to repeat the i (which can, of course, be a more or less complex expression), so you want at least i += 1;. Python stops there, if for no other reason than that it treats assignment as a statement rather than as the side effect of an arbitrary expression. With over 30 years of programming in C and C++ under my belt, I still feel that ++i is missing when programming in Python, even though I pretty much restrict myself in C++ to treating assignment as a statement (and don't embed ++i in more complicated expressions).
Performance depends on the type of i.
If it's a built-in type, then optimizers will "notice" that your two statements are the same, and emit the same code for both.
Since you used post-increment (and ignoring your semi-colons), the two expressions have different values even when i is a built-in type. The value of i++ is the old value, the value of i = i + 1 is the new value. So, if you write:
j = i++;
k = (i = i + 1);
then the two are now different, and the same code will not be emitted for both.
Since the post-condition of post-increment is the same as pre-increment, you could well say that the primary purpose of the post-increment operator is that it evaluates to a different value. Regardless of performance, it makes the language more expressive.
If i has class type with overloaded operators then it might not be so simple. In theory the two might not even have the same result, but assuming "sensible" operators i + 1 returns a new instance of the class, which is then move-assigned (in C++11 if the type has a move assignment operator) or copy-assigned (in C++03) to i. There's no guarantee that the optimizer can make this as efficient as what operator++ does, especially if the operator overload definitions are off in another translation unit.
Some types have operator++ but not operator+, for example std::list<T>::iterator. That's not the reason ++ exists, though, since it existed in C and C has no types with ++ but not +. It is a way that C++ has taken advantage of it existing.
The two examples you gave will almost certainly compile to exactly the same machine code. Compilers are very smart. Understand that a compiler rarely executes the code you actually wrote. It will twist it and mould it to improve performance. Any modern compiler will know that i++; and i = i + 1; are identical (for an arithmetic type) and generate the same output.
However, there is a good reason to have both, other than just code readability. Conceptually, incrementing a value many times and adding to a value are different operations - they are only the same here because you are adding 1. An addition, such as x + 3, is a single operation, whereas doing ++++++x represents three individual operations with the same effect. Of course, a smart compiler will also know that for an object of arithmetic type, x, it can do N increments in constant time just by doing x + N.
However, the compiler can't make this assumption if x is of class type with an overloaded operator+ and operator++. These two operators may do entirely different things. In addition, implementing an operator+ as a non-constant time operation would give the wrong impression.
The importance of having both becomes clear when we're dealing with iterators. Only Random Access Iterators support addition and subtraction. For example, a standard raw pointer is a random access iterator because you can do ptr + 5 and get a pointer to the 5th object along. However, all other types of iterators (bidirectional, forward, input) do not support this - you can only increment them (and decrement a bidirectional iterator). To get to the 5th element along with a bidirectional iterator, you need to do ++ five times. That's because an addition represents a constant time operation but many iterators simply cannot traverse in constant time. Forcing multiple increments shows that it's not a constant time operation.
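A short illustration of that contrast (standard containers only): a random access iterator can jump with +, while a bidirectional one has to be incremented step by step, which is what std::advance does for it under the hood.

#include <iterator>
#include <list>
#include <vector>

int main() {
    std::vector<int> v{10, 20, 30, 40, 50, 60};
    std::list<int>   l{10, 20, 30, 40, 50, 60};

    auto vi = v.begin() + 5;   // random access: one constant-time jump
    auto li = l.begin();
    std::advance(li, 5);       // bidirectional: five ++ steps, linear time
    return (*vi == *li) ? 0 : 1;
}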
Both would produce the same machine instruction(s) in optimized code from a decent compiler.
If the compiler sees that i++ is more efficient, then it will convert i = i + 1 to i++, or vice versa. The result will be the same no matter what you write.
I prefer ++i. I never write i = i + 1.
No, it is simply to make typing simple, and to get simpler looking syntax.
When they are both compiled, they are reduced to the same compiled code, and run at the same speed.
The ++ or -- operator allows the increment to be combined into other statements, rather than written as i = i + 1. For example, while (i++ < 10) allows a while loop check and an increment to be done after it. That's not possible with i = i + 1.
The ++ or -- operator can also be overloaded in other classes to have other meanings, or to do increments and decrements with additional actions.
Try replacing ++i or i++ in some more complicated expression: you're going to have difficulties reproducing the pre-increment/post-increment semantics, and you'll need to split the expression into multiple ones. It's like the ternary operator: the forms are equivalent, and the compiler may perform the same optimizations regardless of the way you enter the code, but the ternary and pre/post-increment syntaxes save you space and preserve the readability of the code.
They are not exactly the same. Though functionally they are equivalent, there is a difference in precedence: i++ has higher precedence (higher priority) than the assignment in i = i + 1.
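Concretely (a tiny illustration of the precedence point):

int main() {
    int i = 1;
    int a = i++ * 2;          // parses as (i++) * 2: a == 2, i becomes 2
    i = 1;
    int b = (i = i + 1) * 2;  // parentheses required: i = i + 1 * 2 would
                              // parse as i = (i + 2), a different statement
    return a + b;             // 2 + 4
}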
The difference between i++ and i = i + 1 is that the first expression only evaluates i once and the second evaluates it twice. Consider:
#include <iostream>

int& f() {
    static int value = 0;
    std::cout << "f() called\n";
    return value;
}
f()++; // writes the message once
f() = f() + 1; // writes the message twice
And if f() returns different references on different calls, the two expressions have vastly different effects. Not that this is good design, of course...

Which cast is faster: static_cast<int>() or int()?

Trying to see which cast is faster (not necessarily better): the new C++ cast or the old-fashioned C-style cast. Any ideas?
There should be no difference at all if you compare int() to equivalent functionality of static_cast<int>().
Using VC2008:
double d = 10.5;
013A13EE fld qword ptr [__real#4025000000000000 (13A5840h)]
013A13F4 fstp qword ptr [d]
int x = int(d);
013A13F7 fld qword ptr [d]
013A13FA call #ILT+215(__ftol2_sse) (13A10DCh)
013A13FF mov dword ptr [x],eax
int y = static_cast<int>(d);
013A1402 fld qword ptr [d]
013A1405 call #ILT+215(__ftol2_sse) (13A10DCh)
013A140A mov dword ptr [y],eax
Obviously, it is 100% the same!
No difference whatsoever.
When it comes to such basic constructs as a single cast, once two constructs have the same semantic meaning, their performance will be perfectly identical, and the machine code generated for them will be the same.
I believe that the actual result is implementation-defined; you should check it in your version of the compiler, but I believe it will give the same result in most modern compilers. And in C++ you shouldn't use C-style casts; use the C++ casts instead - they allow you to find errors at compile time.
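For example (Base and Other are illustrative types), a C-style cast between unrelated pointer types compiles silently, while static_cast reports the mistake:

struct Base  { virtual ~Base() = default; };
struct Other { int x = 0; };

void f(Base* b) {
    Other* o1 = (Other*)b;                  // compiles: almost certainly a bug
    // Other* o2 = static_cast<Other*>(b);  // rejected at compile time: unrelated types
    (void)o1;
}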
Take a look at the assembly generated by each method. If it differs, use a profiler.
They are the same, as the cast is resolved at compile time and there is no runtime overhead. Even if there were some difference, I really wouldn't worry too much about these tiny (not even micro) optimizations.
As most people say one hopes these should be the same speed, although you're at the mercy of your compiler... and that's not always a very happy situation. Read on for war stories.
Depending on your compiler and the particular model of processor core on which the program executes, the speed of float f; int i(f);, of float f; int i = (int)f;, and of float f; int i = static_cast<int>(f); and their ilk (including variations involving double, long and unsigned types) can be atrociously slow - an order of magnitude worse than you expect. The compiler may emit instructions altering internal processor modes, causing instruction pipelines to be thrown away. This is, in effect, a bug in the optimization element of the compiler. I've seen cases where one suffers the sort of 40-clock-cycle costs mentioned in this analysis, at which point you have a major, unexpected and irritating performance bottleneck with, AFAIK, no entirely pleasing, robust, generic solution. There are alternatives involving assembler, but AFAIK they do not round floating point to integer the same way as the casts do. If anyone knows any better, I am interested. I'm hoping this issue is (or will shortly be) confined to legacy compilers/hardware, but you need your wits about you.
P.S. I can't reach that link because my firewall blocks it as games-related but a Google cache of it suffices to demonstrate that its author knows more about it than I do.
When your choice makes little difference to the code, I'd pick the one which looks more familiar to later programmers. Making code easier to understand by others is always worth considering. In this case, I'd stick to int(…) for that reason.