Creating an unsigned int with undefined overflow? - c++

In a recent CppCon talk by Chandler Carruth (link), around 39:16, he explains how leaving signed integer overflow undefined allows compilers to generate optimized assembly.
These kinds of optimizations can also be found in this blog post by Krister Walfridsson, here.
Having previously been bitten by bugs involving sizes overflowing INT_MAX, I tend to be pedantic about the types I use in my code, but at the same time I don't want to lose fairly straightforward performance gains.
While I have limited knowledge of assembly, this left me wondering: what would it entail to create an unsigned integer with undefined overflow? This seems to be a recurring issue, but I didn't find any proposal to introduce one (and eventually update std::size_t). Has something like this ever been discussed?
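To make the kind of optimization concrete, here is a minimal sketch (not the exact example from the talk or the blog post; the function names are just illustrative):
#include <cstdint>
// Signed overflow is undefined, so the compiler may assume x + 1 never
// wraps and can fold this to "return true".
bool always_true(int x)      { return x + 1 > x; }
// Unsigned arithmetic wraps modulo 2^32, so for x == UINT32_MAX the result
// is false; the compiler has to emit the actual comparison.
bool maybe_false(uint32_t x) { return x + 1 > x; }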

This question is completely backwards. There isn't some magic panacea by which a behaviour can be deemed undefined in order to give a compiler optimisation opportunities. There is always a trade-off.
For an operation to have undefined behaviour in some conditions, the C++ standard would need to describe no constraints on the resultant behaviour. That is the definition of undefined behaviour in the standard.
For the C++ standard (or any standard - undefined behaviour is a feature of standards, not just standards for programming languages) to do that, there would need to be more than one realistic way of implementing the operation, under a range of conditions, that produces different outcomes, advantages, and disadvantages. There would also need to be a realistic prospect of real-world implementation for more than one alternative. Lastly, there would need to be a realistic chance that each of those alternatives provides some value (e.g. desirable attributes of a system which uses it, etc) - otherwise one approach can be specified, and there is no need for alternatives.
Overflow of signed integers has undefined behaviour because of a number of contributing factors. Firstly, there are different representations of a signed integer (e.g. ones' complement, two's complement, etc). Second, the representation of a signed integer (by definition) includes representation of a sign, e.g. a sign bit. Third, there is no particular representation of a signed integer that is inherently superior to another (choosing one or the other involves engineering trade-offs, for example in the design of circuitry within a processor to implement an addition operation). Fourth, there are real-world implementations that use different representations. Because of these factors, operations on a signed integer that overflow may "wrap" on one CPU, but result in a hardware signal that must be cleared on another CPU. Each of these types of behaviour - or others - may be "optimal", by some measure, for some applications but not others. The standard has to allow for all of these possibilities - and the means by which it does that is to deem the behaviour undefined.
The reason arithmetic on unsigned integers has well-defined behaviour is that there aren't as many realistic ways of representing them or their operations, and when such representations and operations are implemented in CPU circuitry, the results all come out the same (i.e. modulo arithmetic). There is no "sign bit" to worry about in creating circuits to represent and operate on unsigned integral values. Even if the bits in an unsigned variable are physically laid out differently, the implementation of operations (e.g. an adder circuit built from NAND gates) produces consistent behaviour on overflow for all basic math operations (addition, subtraction, multiplication, division). And, not surprisingly, all existing CPUs do it this way. There isn't one CPU that generates a hardware fault on unsigned overflow.
So, if you want overflow operations on unsigned values to have undefined behaviour, you would first need to find a way of representing an unsigned value in some way that results in more than one feasible/useful result/behaviour, make a case that your scheme is better in some way (e.g. performance, more easily fabricated CPU circuitry, application performance, etc). You would then need to convince some CPU designers to implement such a scheme, and convince system designers that scheme gives a real-world advantage. At the same time, you would need to leave some CPU designers and system designers with the belief that some other scheme has an advantage over yours for their purposes. In other words, real world usage of your approach must involve real trade-offs, rather than your approach having consistent advantage over all possible alternatives. Then, once you have multiple realisations in hardware of different approaches - which result in different behaviours on overflow on different platforms - you would need to convince the C++ standardisation committee that there is advantage in supporting your scheme in the standard (i.e. language or library features that exploit your approach), and that all of the possible behaviours on overflow need to be permitted. Only then will overflow on unsigned integers (or your variation thereof) have undefined behaviour.
I've described the above in terms of starting at the hardware level (i.e. having native support for your unsigned integer type in hardware). The same goes if you do it in software, but you would need to convince developers of libraries or operating systems instead.
Only then will you have introduced an unsigned integral type which has undefined behaviour if operations overflow.
More generally, as said at the start, this question is backwards. It is true that compilers exploit undefined behaviour (sometimes in highly devious ways) to improve performance. But, for the standard to deem that something has undefined behaviour, there needs to be more than one way of doing relevant operations, and implementations (compilers, etc) need to be able to analyse benefits and trade-offs of the alternatives, and then - according to some criteria - pick one. Which means there will always be a benefit (e.g. performance) and an unwanted consequence (e.g. unexpected results in some edge cases).

There is no such thing as an unsigned integer with undefined overflow. C++ is very specific that unsigned types do not overflow; they obey modulo arithmetic.
Could a future version of the language add an arithmetic type that does not obey modulo arithmetic, but also does not support signedness (and thus may use the whole range of its bits)? Maybe. But the alleged performance gains would not be what they are for a signed value (which would otherwise have to consider correct handling of the sign bit, whereas an unsigned value has no "special" bits mandated), so I wouldn't hold your breath. In fact, although I'm no assembly expert, I can't imagine that this would be useful in any way.

Related

Is there any consistent definition of time complexity for real world languages like C++?

C++ tries to use the concept of time complexity in the specification of many library functions, but asymptotic complexity is a mathematical construct based on asymptotic behavior when the size of inputs and the values of numbers tend to infinity.
Obviously the size of scalars in any given C++ implementation is finite.
What is the official formalization of complexity in C++, compatible with the finite and bounded nature of C++ operations?
Remark: It goes without saying that for a container or algorithm based on a type parameter (as in the STL), complexity can only be expressed in terms of the number of user-provided operations (say, comparisons for sorted containers), not in terms of elementary C++ language operations. This is not the issue here.
EDIT:
Standard quote:
4.6 Program execution [intro.execution]
1 The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.
2 Certain aspects and operations of the abstract machine are described
in this International Standard as implementation-defined (for example,
sizeof(int)). These constitute the parameters of the abstract machine. [...]
The C++ language is defined in terms of an abstract machine, based on scalar types like integer types with a finite, defined number of bits and only so many possible values. (Ditto for pointers.)
There is no "abstract" C++ where integers would be unbounded and could "tend to infinity".
This means that in the abstract machine, any array, any container, any data structure is bounded (even if possibly huge compared to available computers and their memory, which is minuscule compared to, for example, the range of a 64-bit number).
Obviously the size of scalars in any given C++ implementation is finite.
Of course, you are correct with this statement! Another way of saying this would be "C++ runs on hardware and hardware is finite". Again, absolutely correct.
However, the key point is this: C++ is not formalized for any particular hardware.
Instead, it is formalized against an abstract machine.
As an example, sizeof(int) <= 4 is true for all hardware that I personally have ever programmed for. However, there is no upper bound at all in the standard regarding sizeof(int).
What does the C++ standard state the size of int, long type to be?
So, on a particular piece of hardware the input to some function void f(int) is indeed limited by 2^31 - 1. So, in theory one could argue that, no matter what it does, this is an O(1) algorithm, because its number of operations can never exceed a certain limit (which is the definition of O(1)). However, on the abstract machine there literally is no such limit, so this argument cannot hold.
So, in summary, I think the answer to your question is that C++ is not as limited as you think. C++ is neither finite nor bounded. Hardware is. The C++ abstract machine is not. Hence it makes sense to state the formal complexity (as defined by maths and theoretical CS) of standard algorithms.
Arguing that every algorithm is O(1), just because in practice there are always hardware limits, could be justified by a purely theoretical thinking, but it would be pointless. Even though, strictly speaking, big O is only meaningful in theory (where we can go towards infinity), it usually turns out to be quite meaningful in practice as well, even if we cannot go towards infinity but only towards 2^32 - 1.
UPDATE:
Regarding your edit: You seem to be mixing up two things:
1. There is no particular machine (whether abstract or real) that has an int type that could "tend to infinity". This is what you are saying and it is true! So, in this sense there always is an upper bound.
2. The C++ standard is written for any machine that could ever possibly be invented in the future. If someone creates hardware with sizeof(int) == 1000000, this is fine with the standard. So, in this sense there is no upper bound.
I hope you understand the difference between 1. and 2. and why both of them are valid statements and don't contradict each other. Each machine is finite, but the possibilities of hardware vendors are infinite.
So, if the standard specifies the complexity of an algorithm, it does (must do) so in terms of point 2. Otherwise it would restrict the growth of hardware. And this growth has no limit, hence it makes sense to use the mathematical definition of complexity, which also assumes there is no limit.
asymptotic complexity is a mathematical construct based on asymptotic behavior when the size of inputs and the values of numbers tend to infinity.
Correct. Similarly, algorithms are abstract entities which can be analyzed regarding these metrics within a given computational framework (such as a Turing machine).
C++ tries to use the concept of time complexity in the specification of many library functions
These complexity specifications impose restrictions on the algorithm you can use. If std::upper_bound has logarithmic complexity, you cannot use linear search as the underlying algorithm, because that has only linear complexity.
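As a hedged illustration (the helper names here are made up), a conforming std::upper_bound essentially has to be a binary search; a linear scan returns the same iterator but would not meet the logarithmic comparison requirement:
#include <algorithm>
#include <vector>
// Valid strategy: binary search, a logarithmic number of comparisons on random-access iterators.
std::vector<int>::const_iterator first_greater(const std::vector<int>& v, int x) {
    return std::upper_bound(v.begin(), v.end(), x);
}
// Same result, but O(n) comparisons - not a conforming way to implement upper_bound.
std::vector<int>::const_iterator first_greater_linear(const std::vector<int>& v, int x) {
    return std::find_if(v.begin(), v.end(), [x](int e) { return e > x; });
}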
Obviously the size of scalars in any given C++ implementation is finite.
Obviously, any computational resource is finite. Your RAM and CPU have only finitely many states. But that does not mean everything is constant time (or that the halting problem is solved).
It is perfectly reasonable and workable for the standard to govern which algorithms an implementation can use (std::map being implemented as a red-black-tree in most cases is a direct consequence of the complexity requirements of its interface functions). The consequences on the actual "physical time" performance of real-world programs are neither obvious nor direct, but that is not within scope.
Let me put this into a simple process to point out the discrepancy in your argument:
1. The C++ standard specifies a complexity for some operation (e.g. .empty() or .push_back(...)).
2. Implementers must select an (abstract, mathematical) algorithm that fulfills that complexity criterion.
3. Implementers then write code which implements that algorithm on some specific hardware.
4. People write and run other C++ programs that use this operation.
Your argument is that determining the complexity of the resulting code is meaningless because you cannot form asymptotes on finite hardware. That's correct, but it's a straw man: that's not what the standard does or intends to do. The standard specifies the complexity of the (abstract, mathematical) algorithm (points 1 and 2), which eventually leads to certain beneficial effects/properties of the (real-world, finite) implementation (point 3) for the benefit of people using the operation (point 4).
Those effects and properties are not specified explicitly in the standard (even though they are the reason for those specific standard stipulations). That's how technical standards work: You describe how things have to be done, not why this is beneficial or how it is best used.
Computational complexity and asymptotic complexity are two different terms. Quoting from Wikipedia:
Computational complexity, or simply complexity of an algorithm is the amount of resources required for running it.
For time complexity, the amount of resources translates to the amount of operations:
Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform.
In my understanding, this is the concept that C++ uses, that is, the complexity is evaluated in terms of the number of operations. For instance, if the number of operations a function performs does not depend on any parameter, then it is constant.
On the contrary, asymptotic complexity is something different:
One generally focuses on the behavior of the complexity for large n, that is on its asymptotic behavior when n tends to the infinity. Therefore, the complexity is generally expressed by using big O notation.
Asymptotic complexity is useful for the theoretical analysis of algorithms.
What is the official formalization of complexity in C++, compatible with the finite and bounded nature of C++ operations?
There is none.

Why not enforce 2's complement in C++?

The new C++ standard still refuses to specify the binary representation of integer types. Is this because there are real-world implementations of C++ that don't use 2's complement arithmetic? I find that hard to believe. Is it because the committee feared that future advances in hardware would render the notion of 'bit' obsolete? Again hard to believe. Can anyone shed any light on this?
Background: I was surprised twice in one comment thread (Benjamin Lindley's answer to this question). First, from piotr's comment:
Right shift on signed type is undefined behaviour
Second, from James Kanze's comment:
when assigning to a long, if the value doesn't fit in a long, the results are
implementation defined
I had to look these up in the standard before I believed them. The only reason for them is to accommodate non-2's-complement integer representations. WHY?
(Edit: C++20 now mandates two's complement representation; note that overflow of signed arithmetic is still undefined, and shifts continue to have undefined or implementation-defined behaviour in some cases.)
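To make the two quoted comments concrete, here is a hedged sketch of operations whose results were tied to the representation before C++20 (function names are illustrative only):
// Right-shifting a negative signed value gave an implementation-defined
// result before C++20 (C++20 defines it as an arithmetic shift).
int halve(int x) { return x >> 1; }           // well-behaved only for x >= 0 pre-C++20
// Converting a value that doesn't fit into a signed type is
// implementation-defined up to C++17; since C++20 it wraps modulo 2^N.
long narrow(long long big) { return static_cast<long>(big); }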
A major problem in defining something which currently isn't defined is that compilers were built assuming it is undefined. Changing the standard won't change existing compilers, and reviewing them to find out where the assumption was made is a difficult task.
Even on two's complement machines, you may have more variety than you think. Two examples: some don't have a sign-preserving right shift, just a right shift which introduces zeros; and a common feature in DSPs is saturating arithmetic, where assigning an out-of-range value clips it at the maximum rather than just dropping the high-order bits.
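For illustration, here is a hedged sketch of what a saturating store behaves like, emulated in portable C++ (C++17 for std::clamp; this is not a claim about any particular DSP's instruction set):
#include <algorithm>
#include <cstdint>
// Clamp a wider intermediate result into int16_t the way a saturating DSP
// assignment would, instead of silently dropping the high-order bits.
int16_t saturate_to_i16(int32_t v) {
    return static_cast<int16_t>(std::clamp<int32_t>(v, INT16_MIN, INT16_MAX));
}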
I suppose it is because the Standard says, in 3.9.1[basic.fundamental]/7
this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types.
which, I am willing to bet, came along from the C programming language, which lists sign and magnitude, two's complement, and one's complement as the only allowed representations in 6.2.6.2/2. And there sure were one's complement systems around when C was widespread: UNIVACs are the most often mentioned, it seems.
It seems to me that, even today, if you are writing a broadly-applicable C++ library that you expect to run on any machine, then 2's complement cannot be assumed. C++ is just too widely used to be making assumptions like that.
Most people don't write those sorts of libraries, though, so if you want to take a dependency on 2's complement you should just go ahead.
Many aspects of the language standard are as they are because the Standards Committee has been extremely loath to forbid compilers from behaving in ways that existing code may rely upon. If code exists which would rely upon one's complement behavior, then requiring that compilers behave as though the underlying hardware uses two's complement would make it impossible for the older code to run using newer compilers.
The solution, which the Standards Committee has alas not yet seen fit to implement, would be to allow code to specify the desired semantics for things in a fashion independent of the machine's word size or hardware characteristics. If support for code which relies upon ones'-complement behavior is deemed important, design a means by which code could expressly demand one's-complement behavior regardless of the underlying hardware platform. If desired, to avoid overly complicating every single compiler, specify that certain aspects of the standard are optional, but conforming compilers must document which aspects they support. Such a design would allow compilers for ones'-complement machines to support both two's-complement behavior and ones'-complement behavior depending upon the needs of the program. Further, it would make it possible to port the code to two's-complement machines with compilers that happened to include ones'-complement support.
I'm not sure exactly why the Standards Committee has as yet not allowed any way by which code can specify behavior in a fashion independent of the underlying architecture and word size (so that code wouldn't have some machines use signed semantics for comparisons where other machines would use unsigned semantics), but for whatever reason they have yet to do so. Support for ones'-complement representation is but a part of that.

Detect long double overflows on embedded systems

I'm going to use big numbers in C++ code on an embedded system. Luckily the compiler recognizes long doubles.
I cannot use standard libraries, Boost libraries, GNU math libraries, etc. And the system does not have a built-in floating-point unit.
Now how can I detect long double overflows?
You state that you need "big numbers", but this does not necessarily mean that the use of long double is indicated. In most embedded applications that I know of, long double is chosen for its enhanced precision, i.e. more bits of resolution for fractional numbers, rather than for its increased range.
You also state that your implementation offers little of the usual floating-point libraries and/or functionality. Based on these statements, I would question whether you need fully functional floating-point capabilities. If your concerns are limited to "big numbers", check to see if your compiler offers a long long datatype, which is a 64-bit integer.
If you do need some floating-point capability, you might consider a fixed-point implementation. Assuming a long long, you might choose to represent numbers in a 48.16 format, which will permit numbers up to ~2.8x10^14 with 16 bits to the right of the decimal point. (If you need an introduction to fixed-point computation, start here.)
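As a hedged sketch of that 48.16 idea (assuming long long is 64 bits; the names are illustrative, not a library API):
typedef long long fix48_16;                   // low 16 bits hold the fraction
static const long long kFixOne = 1LL << 16;   // 1.0 in 48.16
fix48_16 fix_from_int(long long whole)   { return whole * kFixOne; }
fix48_16 fix_add(fix48_16 a, fix48_16 b) { return a + b; }
// Multiplication needs a post-shift; real code should use a wider
// intermediate (or range checks) to avoid overflowing the 64-bit product.
fix48_16 fix_mul(fix48_16 a, fix48_16 b) { return (a * b) >> 16; }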
Having addressed some of the background issues, let's look at the original question. If you wish to detect overflow in an unsigned int (which I commonly do in my embedded work), it's sufficient to compare your latest result with the previous one. For example, my application requires me to periodically inspect a 16-bit counter that is driven by an external clock. If my current observation is less than the last observation, then I can assume that the 16-bit counter overflowed, and I can take action accordingly. If you implement your big numbers using a long long integer datatype, you can apply a similar strategy to detect overflow.
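A hedged sketch of that comparison strategy (illustrative names; it relies on unsigned arithmetic being defined modulo 2^N):
#include <cstdint>
// True if the 16-bit hardware counter wrapped since the previous reading.
bool counter_wrapped(uint16_t current, uint16_t previous) {
    return current < previous;
}
// Same idea when accumulating: if the sum is smaller than an operand,
// the unsigned addition wrapped around.
bool add_wrapped(uint64_t a, uint64_t b) {
    return a + b < a;
}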
As it's not standard C++, you will have to rely on methods provided by your specific environment. The manufacturer of the embedded system should have documented how it can be done. Ask them.

How to write portable floating point arithmetic in c++?

Say you're writing a C++ application doing lots of floating point arithmetic. Say this application needs to be portable across a reasonable range of hardware and OS platforms (say 32- and 64-bit hardware, Windows and Linux both in 32- and 64-bit flavors...).
How would you make sure that your floating point arithmetic is the same on all platforms? For instance, how can you be sure that a 32-bit floating point value will really be 32 bits on all platforms?
For integers we have stdint.h, but there doesn't seem to be a floating point equivalent.
[EDIT]
I got very interesting answers but I'd like to clarify the question a bit.
For integers, I can write:
#include <stdint.h>
[...]
int32_t myInt;
and be sure that whatever the (C99 compatible) platform I'm on, myInt is a 32-bit integer.
If I write:
double myDouble;
float myFloat;
am I certain that this will compile to, respectively, 64-bit and 32-bit floating point numbers on all platforms?
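One hedged way to pin down those assumptions on a given platform (C++11 or later) is to make the build fail when they don't hold:
#include <limits>
// Reject platforms whose float/double are not the 32-bit and 64-bit
// IEEE 754 formats this code assumes.
static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "unexpected floating-point sizes");
static_assert(std::numeric_limits<float>::is_iec559 &&
              std::numeric_limits<double>::is_iec559,
              "IEEE 754 floating point required");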
Non-IEEE 754
Generally, you cannot. There's always a trade-off between consistency and performance, and C++ hands that to you.
For platforms that don't have floating point operations (like embedded and signal processing processors), you cannot use C++ "native" floating point operations, at least not portably so. While a software layer would be possible, it's certainly not feasible for this type of device.
For these, you could use 16-bit or 32-bit fixed point arithmetic (but you might even discover that long is supported only rudimentarily - and frequently, div is very expensive). However, this will be much slower than built-in fixed point arithmetic, and becomes painful after the basic four operations.
I haven't come across devices that support floating point in a different format than IEEE 754. From my experience, your best bet is to hope for the standard, because otherwise you usually end up building algorithms and code around the capabilities of the device. When sin(x) suddenly costs 1000 times as much, you better pick an algorithm that doesn't need it.
IEEE 754 - Consistency
The only non-portability I found here is when you expect bit-identical results across platforms. The biggest influence is the optimizer. Again, you can trade accuracy and speed for consistency. Most compilers have an option for that - e.g. "floating point consistency" in Visual C++. But note that this is always accuracy beyond the guarantees of the standard.
Why do results become inconsistent?
First, FPU registers often have higher resolution than double's (e.g. 80 bit), so as long as the code generator doesn't store the value back, intermediate values are held with higher accuracy.
Second, the equivalences like a*(b+c) = a*b + a*c are not exact due to the limited precision. Nonetheless the optimizer, if allowed, may make use of them.
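A hedged example of the effect on a typical IEEE 754 double implementation; reassociating the additions changes the rounding, so the two results differ in the last digits:
#include <cstdio>
int main() {
    double a = 0.1, b = 0.2, c = 0.3;
    // Mathematically equal, but rounded differently.
    std::printf("%.17g\n%.17g\n", (a + b) + c, a + (b + c));
}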
Also - what I learned the hard way - printing and parsing functions are not necessarily consistent across platforms, probably due to numeric inaccuracies, too.
float
It is a common misconception that float operations are intrinsically faster than double. Working on large float arrays is usually faster, mostly through fewer cache misses alone.
Be careful with float accuracy. It can be "good enough" for a long time, but I've often seen it fail faster than expected. Float-based FFTs can be much faster due to SIMD support, but generate notable artefacts quite early for audio processing.
Use fixed point.
However, if you want to approach the realm of possibly making portable floating point operations, you at least need to use controlfp to ensure consistent FPU behavior as well as ensuring that the compiler enforces ANSI conformance with respect to floating point operations. Why ANSI? Because it's a standard.
And even then you aren't guaranteeing that you can generate identical floating point behavior; that also depends on the CPU/FPU you are running on.
It shouldn't be an issue; IEEE 754 already defines all details of the layout of floats.
The maximum and minimum values storable should be defined in float.h (or, in C++, via std::numeric_limits).
Portable is one thing, generating consistent results on different platforms is another. Depending on what you are trying to do then writing portable code shouldn't be too difficult, but getting consistent results on ANY platform is practically impossible.
I believe "limits.h" will include the C library constants INT_MAX and its brethren. However, it is preferable to use "limits" and the classes it defines:
std::numeric_limits<float>, std::numeric_limits<double>, std::numeric_limits<int>, etc...
If you're assuming that you will get the same results on another system, read What could cause a deterministic process to generate floating point errors first. You might be surprised to learn that your floating point arithmetic isn't even the same across different runs on the very same machine!

Compilers and negative numbers representations

Recently I was confused by this question. Maybe because I didn't read language specifications (it's my fault, I know).
The C99 standard doesn't say which negative number representation should be used by the compiler. I always thought that the only right way to store negative numbers is two's complement (in most cases).
So here's my question: do you know any present-day compiler that implements by default one's complement or sign-magnitude representation? Can we change default representation with some compiler flag?
What is the simplest way to determine which representation is used?
And what about C++ standard?
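As a hedged illustration (only meaningful on pre-C++20 implementations, where other representations were still permitted), the classic trick is to look at the low bits of -1, since its bit pattern differs between the three representations the standards allowed:
// -1 is ...1111 in two's complement, ...1110 in one's complement,
// and 10...001 in sign-magnitude, so the low two bits tell them apart.
const int low_bits = -1 & 3;
// low_bits == 3 -> two's complement
// low_bits == 2 -> one's complement
// low_bits == 1 -> sign-magnitude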
I think it's not so much a question of what representation the compiler uses, but rather what representation the underlying machine uses. The compiler would be very stupid to pick a representation not supported by the target machine, since that would introduce loads of overhead for no benefit.
Some checksum fields in the IP protocol suite use one's complement, so perhaps dedicated "network accelerator"-type CPUs implement it.
While two's complement representation is by far the most common, it is not the only one (see some examples below). The C and C++ standardisation committees did not want to require non-two's-complement machines to emulate a non-native representation. Therefore neither C nor C++ requires a specific negative integer format.
This is also why bitwise operations on signed types have implementation-defined (and in some cases undefined) behaviour.
The UNISYS 2200 series, which implements one's complement math, is still in use, with a quite up-to-date compiler. You can read more about it in the questions below:
Exotic architectures the standards committees care about
Are there any non-twos-complement implementations of C?