About time/space complexity in C/C++ standards - c++

Recently I've read about the abstract machine and the as-if rule (What exactly is the "as-if" rule?), and the requirements on the time complexity of the standard library (like this one: Is list::size() really O(n)?).
Are the time/space complexity requirements on standard library in terms of abstract machine or in terms of real concrete machine?
If they are in terms of the abstract machine, it seems an implementation could actually generate code that is less efficient in terms of complexity, even if doing so would not be practical.
Did the standards mention anything about time/space complexity for non-standard-library code?
e.g. I may write custom sorting code and expect O(n log n) time, but if an implementation just treats this as code for the abstract machine, it is allowed to generate a slower sort in assembly and machine code, such as turning it into an O(n^2) sort, even though it is unlikely to do that in a real situation.
Or maybe I missed something about the transformation requirements between abstract machine and real concrete machine. Can you help me to clarify? :)
Even though I mainly read the C++ standard, I also want to know the situation for the C standard, so this question tags both.

Are the time/space complexity requirements on standard library in terms of abstract machine or in terms of real concrete machine?
The complexity requirements are in terms of the abstract machine:
[intro.abstract] The semantic descriptions in this document define a parameterized nondeterministic abstract machine...
Did the standards mention anything about time/space complexity for non-standard-library code?
No. The only complexity requirements in the standard are for standard containers and algorithms.
if an implementation just treats this as code in abstract machine, it is allowed to generate a slower sorting in assembly and machine code, like changing it to O(n^2) sort
That's not the worst thing it can do. An implementation can put the CPU to sleep for a year between every instruction. As long as you're patient enough, the program would have the same observable behaviour as the abstract machine, so it would be conforming.

Many of the complexity requirements in the C++ standard are in terms of specific counts of particular operations. These do constrain the implementation.
E.g. std::find_if
At most last - first applications of the predicate.
This is more specific than "O(N), where N = std::distance(first, last)", as it specifies a constant factor of 1.
And there are others that have big-O bounds, defining which operation(s) are counted.
E.g. std::sort
O(N·log(N)), where N = std::distance(first, last) comparisons.
What this doesn't constrain includes how slow a comparison is, or how many swaps occur. If your model of computation has fast comparisons and slow swaps, you don't get a very useful analysis.

As you've been told in comments, the standards don't have any requirements regarding time or space complexity. And addressing your additional implicit question, yes, a compiler can change your O(n log n) code to run in O(n²) time. Or in O(n!) if it wants to.
The underlying explanation is that the standard defines correct programs, and a program is correct regardless of how long it takes to execute or how much memory it uses. These details are left to the implementation.
Specific implementations can compile your code in whichever way achieves correct behavior. It would be completely permissible, for instance, for an implementation to add a five-second delay between every line of code you wrote: the program is still correct. It would also be permissible for the compiler to figure out a better way of doing what you wrote and rewrite your entire program, as long as the observable behavior is the same.
However, the fact that an implementation is compliant doesn't mean it is perfect. Adding five-second delays wouldn't affect the implementation's compliance, but nobody would want to use that implementation. Compilers don't do these things because they are ultimately tools, and as such, their writers expect them to be useful to those who use them, and making your code intentionally worse is not useful.
TL;DR: bad performance (time complexity, memory complexity, etc.) doesn't affect compliance, but it will make you look for a new compiler.

Related

Is there any consistent definition of time complexity for real world languages like C++?

C++ tries to use the concept of time complexity in the specification of many library functions, but asymptotic complexity is a mathematical construct based on asymptotic behavior when the size of inputs and the values of numbers tend to infinity.
Obviously the size of scalars in any given C++ implementation is finite.
What is the official formalization of complexity in C++, compatible with the finite and bounded nature of C++ operations?
Remark: It goes without saying that for a container or algorithm based on a type parameter (as in the STL), complexity can only be expressed in terms of the number of user-provided operations (say, a comparison for sorted containers), not in terms of elementary C++ language operations. This is not the issue here.
EDIT:
Standard quote:
4.6 Program execution [intro.execution]
1 The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.
2 Certain aspects and operations of the abstract machine are described
in this International Standard as implementation-defined (for example,
sizeof(int)). These constitute the parameters of the abstract machine. [...]
The C++ language is defined in terms of an abstract machine based on scalar types, like integer types with a finite, defined number of bits and only so many possible values. (Ditto for pointers.)
There is no "abstract" C++ where integers would be unbounded and could "tend to infinity".
It means that in the abstract machine, any array, any container, any data structure is bounded (even if possibly huge compared to available computers and their minuscule memory, e.g. relative to the range of a 64-bit number).
Obviously the size of scalars in any given C++ implementation is finite.
Of course, you are correct with this statement! Another way of saying this would be "C++ runs on hardware and hardware is finite". Again, absolutely correct.
However, the key point is this: C++ is not formalized for any particular hardware.
Instead, it is formalized against an abstract machine.
As an example, sizeof(int) <= 4 is true for all hardware that I personally have ever programmed for. However, there is no upper bound at all in the standard regarding sizeof(int).
What does the C++ standard state the size of int, long type to be?
So, on particular hardware the input to some function void f(int) is indeed limited by 2^31 - 1. So, in theory one could argue that, no matter what it does, this is an O(1) algorithm, because its number of operations can never exceed a certain limit (which is the definition of O(1)). However, on the abstract machine there literally is no such limit, so this argument cannot hold.
So, in summary, I think the answer to your question is that C++ is not as limited as you think. C++ is neither finite nor bounded. Hardware is. The C++ abstract machine is not. Hence it makes sense to state the formal complexity (as defined by maths and theoretical CS) of standard algorithms.
Arguing that every algorithm is O(1), just because in practice there are always hardware limits, could be justified by a purely theoretical thinking, but it would be pointless. Even though, strictly speaking, big O is only meaningful in theory (where we can go towards infinity), it usually turns out to be quite meaningful in practice as well, even if we cannot go towards infinity but only towards 2^32 - 1.
UPDATE:
Regarding your edit: You seem to be mixing up two things:
There is no particular machine (whether abstract or real) that has an int type that could "tend to infinity". This is what you are saying and it is true! So, in this sense there always is an upper bound.
The C++ standard is written for any machine that could ever possibly be invented in the future. If someone creates hardware with sizeof(int) == 1000000, this is fine with the standard. So, in this sense there is no upper bound.
I hope you understand the difference between 1. and 2. and why both of them are valid statements and don't contradict each other. Each machine is finite, but the possibilities of hardware vendors are infinite.
So, if the standard specifies the complexity of an algorithm, it does (must do) so in terms of point 2. Otherwise it would restrict the growth of hardware. And this growth has no limit, hence it makes sense to use the mathematical definition of complexity, which also assumes there is no limit.
asymptotic complexity is a mathematical construct based on asymptotic behavior when the size of inputs and the values of numbers tend to infinity.
Correct. Similarly, algorithms are abstract entities which can be analyzed regarding these metrics within a given computational framework (such as a Turing machine).
C++ tries to use the concept of time complexity in the specification of many library functions
These complexity specifications impose restrictions on the algorithm you can use. If std::upper_bound has logarithmic complexity, you cannot use linear search as the underlying algorithm, because that has only linear complexity.
Obviously the size of scalars in any given C++ implementation is finite.
Obviously, any computational resource is finite. Your RAM and CPU have only finitely many states. But that does not mean everything is constant time (or that the halting problem is solved).
It is perfectly reasonable and workable for the standard to govern which algorithms an implementation can use (std::map being implemented as a red-black-tree in most cases is a direct consequence of the complexity requirements of its interface functions). The consequences on the actual "physical time" performance of real-world programs are neither obvious nor direct, but that is not within scope.
Let me put this into a simple process to point out the discrepancy in your argument:
The C++ standard specifies a complexity for some operation (e.g. .empty() or .push_back(...)).
Implementers must select an (abstract, mathematical) algorithm that fulfills that complexity criterion.
Implementers then write code which implements that algorithm on some specific hardware.
People write and run other C++ programs that use this operation.
Your argument is that determining the complexity of the resulting code is meaningless because you cannot form asymptotes on finite hardware. That's correct, but it's a straw man: that's not what the standard does or intends to do. The standard specifies the complexity of the (abstract, mathematical) algorithm (points 1 and 2), which eventually leads to certain beneficial effects/properties of the (real-world, finite) implementation (point 3) for the benefit of people using the operation (point 4).
Those effects and properties are not specified explicitly in the standard (even though they are the reason for those specific standard stipulations). That's how technical standards work: You describe how things have to be done, not why this is beneficial or how it is best used.
Computational complexity and asymptotic complexity are two different terms. Quoting from Wikipedia:
Computational complexity, or simply complexity of an algorithm is the amount of resources required for running it.
For time complexity, the amount of resources translates to the amount of operations:
Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform.
In my understanding, this is the concept that C++ uses, that is, the complexity is evaluated in terms of the number of operations. For instance, if the number of operations a function performs does not depend on any parameter, then it is constant.
On the contrary, asymptotic complexity is something different:
One generally focuses on the behavior of the complexity for large n, that is on its asymptotic behavior when n tends to the infinity. Therefore, the complexity is generally expressed by using big O notation.
Asymptotic complexity is useful for the theoretical analysis of algorithms.
What is the official formalization of complexity in C++, compatible with the finite and bounded nature of C++ operations?
There is none.

How is scheduling handled in C++17 STL parallel algorithms?

Is there a standard scheduler specification for the C++17 STL parallel algorithms, or is it entirely implementation-dependent? The serial algorithms have complexity guarantees, but the scheduler implementation is critical for performance with non-uniform task loads. Does the specification address this? It seems like it would be hard to guarantee cross-platform performance without a standardized scheduler.
As far as I can tell from the wording, such details are completely within the domain of implementation specification, as one would expect. The standard generally makes no effort to guarantee absolute performance of any kind, only complexity requirements, as you're seeing in this case.
Ultimately, though your source code can now take advantage of parallelism while being completely standard-defined, the actual practical outcome of running your program is up to your implementation, and I think that still makes sense. The goal of standardising features is not cross-platform performance, but portable code that can be proven correct in a vacuum.
I'd expect your toolchain to give further information on how this sort of thing works, and that may even influence your choice of toolchain! But it does make sense for them to have freedom in that regard, as they do in other areas. After all, there is a multitude of target platforms out there (theoretically infinite), all with their own potential and quirks.
It could be that a future standard emplaces further constraints on scheduling in order to kick implementers up the backside a little, but personally I wouldn't count on it.
Scheduling for C++17 STL algorithms is implementation-defined.
Moreover, C++17 doesn't guarantee parallel execution. It just allows parallelism.
The class execution::parallel_policy is an execution policy type used
as a unique type to disambiguate parallel algorithm overloading and
indicate that a parallel algorithm's execution may be parallelized.

What algorithms do popular C++ compilers use for std::sort and std::stable_sort?

What algorithms do popular C++ compilers use for std::sort and std::stable_sort? I know the standard only gives certain performance requirements, but I'd like to know which algorithms popular implementations use in practice.
The answer would be more useful if it cited references for each implementation.
First of all: the compilers themselves do not provide any implementation of std::sort. While each compiler traditionally comes packaged with a Standard Library implementation (which relies heavily on compiler built-ins), you can in theory swap one implementation for another. One very good example is that Clang works with both libstdc++ (traditionally packaged with gcc) and libc++ (the brand-new one).
Now that this is out of the way...
std::sort has traditionally been implemented as an introsort. From a high-level point of view this means a relatively standard quicksort implementation (with some median probing to avoid an O(n²) worst case) coupled with an insertion-sort routine for small inputs. The libc++ implementation however is slightly different and closer to TimSort: it detects already-sorted runs in the input and avoids sorting them again, leading to O(n) behavior on fully sorted input. It also uses optimized sorting networks for small inputs.
std::stable_sort on the other hand is more complicated by nature. This can be inferred from the very wording of the Standard: the complexity is O(n log n) if sufficient additional memory can be allocated (hinting at a merge sort), but degrades to O(n log² n) if not.
If we take gcc as an example we see that it is introsort for std::sort and mergesort for std::stable_sort.
If you wade through the libc++ code you will see that it also uses mergesort for std::stable_sort if the range is big enough.
One thing you should also note is that while the general approach is always one of the above mentioned ones, they are all highly optimized for various special cases.

Example of compiler optimizations that can be 'easily' done on C++ code but not C code

This question talks of an optimization of the sort function that cannot be readily achieved in C:
Performance of qsort vs std::sort?
Are there more examples of compiler optimizations which would be impossible or at least difficult to achieve in C when compared to C++?
As #sehe mentioned in a comment. It's about the abstractions more than anything else. In other words, if the language allows the coder to express intent better, then it can emit code which implements that intent in a more optimal fashion.
A simple example is std::fill. Sure, for basic types you could use memset, but let's say it's an array of 32-bit unsigned integers. std::fill knows that the array size is a multiple of 32 bits. And depending on the compiler, it might even be able to assume that the array is properly aligned on a 32-bit boundary as well.
All of this combined may allow the compiler to emit code which sets the value 32 bits at a time, with no run-time checks to make sure that doing so is valid. If we are lucky, the compiler will recognize this and replace it with a particularly efficient architecture-specific version of the code.
(in reality gcc and probably the other mainstream compilers do in fact do this for just about anything that could be considered equivalent to a memset already, including std::fill).
Often, memset is implemented in a way that has run-time checks for these types of things in order to choose the optimal code path. While this difference is probably negligible, the idea is that we have better expressed the intent of "filling" an array with a specific value, so the compiler is able to make slightly better choices.
Other more complicated language features do a good job of using the expression of intent to get larger gains, but this is the simplest example.
To be clear, my point is not that std::fill is "better" than memset, instead this is an example of how c++ allows better expression of intent to the compiler, allowing it to have more information during compile time, resulting in some optimizations being easier to implement.
It depends a bit on what you think of as the optimization here. If you're thinking of it purely as "std::sort vs. qsort", then there are thousands of other similar optimizations. Using a C++ template can support inlining in situations where essentially the only reasonable alternative in C is to use a pointer to a function, and nearly no known compiler will inline the code being called. Depending on your viewpoint, this is either a single optimization, or an entire (open-ended) family of them.
Another possibility is using template meta-programming to turn something into a compile-time constant that would normally have to be computed at run time in C. In theory, you could usually do this by embedding a magic number via a #define in C, but that can lose context, flexibility, or both (e.g., in C++ you can define a constant at compile time, carry out an arbitrary calculation from that input, and produce a compile-time constant used by the rest of the code). Given the much more limited calculations you can carry out in a #define, that's not possible nearly as often.
Yet another possibility is function overloading and template specialization. These are separate, but give the same basic result: using code that's specialized to a particular type. In C, to keep the number of functions you deal with halfway reasonable, you frequently end up writing code that (for example) converts all integers to a long, then does math on that. Templates, template specialization, and overloading make it relatively easy to use code that keeps the smaller types their native sizes, which can give a substantial speed increase (especially when it can enable vectorizing the math).
One last obvious possibility stems from simply providing quite a few pre-built data structures and algorithms, and allowing such things to be packaged for relatively easy, efficient re-use. I doubt I could even count the number of times I wrote code in C using what I knew were relatively inefficient data structures and/or algorithms, simply because it wasn't worth the time to find (or adapt) a more efficient one to the task at hand. Yes, if it really became a major bottleneck, I'd go to the trouble of finding or writing something better -- but doing a bit of comparing, it's still fairly common to see speed double when written in C++.
I should add, however, that all of these are undoubtedly possible with C, at least in theory. If you approach this from a viewpoint of something like language complexity theory and theoretical models of computation (e.g., Turing machines) there's no question that C and C++ are equivalent. With enough work writing specialized versions of each function, you can/could theoretically do all of those same things with C as you can with C++.
From a viewpoint of what code you can plan on really writing in a practical project, the story changes very quickly -- the limit on what you can do mostly comes down to what you can reasonably manage, not anything like the theoretical model of computation represented by the language. Levels of optimization that are almost entirely theoretical in C are not only practical, but quite routine in C++.
Even the qsort vs std::sort example is invalid. If a C implementation wanted, it could put an inline version of qsort in stdlib.h, and any decent C compiler could handle inlining the comparison function. The reason this usually isn't done is that it's massively bloated and of dubious performance benefit -- issues C++ folks tend not to care about...

What are the functions in the standard library that can be implemented faster with programming hacks? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have recently read an article about fast sqrt calculation. Therefore, I have decided to ask SO community and its experts to help me find out, which STL algorithms or mathematical calculations can be implemented faster with programming hacks?
It would be great if you can give examples or links.
Thanks in advance.
System library developers have more concerns than just performance in mind:
Correctness and standards compliance: Critical!
General use: No optimisations are introduced, unless they benefit the majority of users.
Maintainability: Good hand-written assembly code can be faster, but you don't see much of it. Why?
Portability: Decent libraries should be portable to more than just Windows/x86/32bit.
Many optimisation hacks that you see around violate one or more of the requirements above.
In addition, optimisations that will be useless or even break when the next generation CPU comes around the corner are not a welcome thing.
If you don't have profiler evidence on it being really useful, don't bother optimising the system libraries. If you do, work on your own algorithms and code first, anyway...
EDIT:
I should also mention a couple of other all-encompassing concerns:
The cost/effort to profit/result ratio: Optimisations are an investment. Some of them are seemingly-impressive bubbles. Others are deeper and more effective in the long run. Their benefits must always be considered in relation to the cost of developing and maintaining them.
The marketing people: No matter what you think, you'll end up doing whatever they want - or think they want.
Probably all of them can be made faster for a specific problem domain.
Now the real question is, which ones should you hack to make faster? None, until the profiler tells you to.
Several of the algorithms in <algorithm> can be optimized for vector<bool>::[const_]iterator. These include:
find
count
fill
fill_n
copy
copy_backward
move // C++0x
move_backward // C++0x
swap_ranges
rotate
equal
I've probably missed some. But all of the above algorithms can be optimized to work on many bits at a time instead of just one bit at a time (as would a naive implementation).
This is an optimization that I suspect is sorely missing from most STL implementations. It is not missing from this one:
http://libcxx.llvm.org/
This is where you really need to listen to project managers and MBAs. What you're suggesting is re-implementing parts of the STL and/or the standard C library. There is an associated cost in terms of implementation time and maintenance burden, so you shouldn't do it unless you really, genuinely need to, as John points out. The rule is simple: is this calculation slowing you down (i.e., are you CPU-bound)? If not, don't create your own implementation just for the sake of it.
Now, if you're really interested in fast maths, there are a few places you can start. The GNU multiple-precision library (GMP) implements many algorithms from modern computer arithmetic and seminumerical algorithms that are all about doing maths on arbitrary-precision integers and floats insanely fast. The people who write it optimise in assembly per build platform: it is about as fast as you can get in single-core mode. This is the most general case I can think of for optimised maths, i.e. one that isn't specific to a certain domain.
Bringing my first and second paragraphs together with what thkala has said: consider that GMP/MPIR have optimised assembly versions per CPU architecture and OS they support. Really. It's a big job, but it is what makes those libraries so fast on the specific small subset of problems they target.
Sometimes domain-specific enhancements can be made. This is about understanding the problem in question. For example, when doing finite-field arithmetic in Rijndael's finite field you can, based on the knowledge that the field has characteristic 2 and its elements are polynomials with 8 binary coefficients, assume that your integers are of size uint8_t and that addition/subtraction are equivalent to xor operations. How does this work? Well, if you add or subtract two elements of the field, each coefficient is either zero or one. If two corresponding coefficients are both zero or both one, the result is zero; if they differ, the result is one. Coefficient by coefficient, that is equivalent to xor across an 8-bit binary string, where each bit represents a term in the polynomial. Multiplication is also relatively efficient. You can bet that Rijndael was designed to take advantage of this kind of result.
That's a very specific result. It depends entirely on what you're doing to make things efficient. I can't imagine many STL functions are optimised purely for CPU speed, because amongst other things the STL provides collections via templates (which are about memory), file access (which is about storage), exception handling, and so on. In short, being really fast is a narrow subset of what the STL does and what it aims to achieve. Also, you should note that optimisation has different views. For example, if your app is heavy on IO, you are IO-bound. Having a massively efficient square-root calculation isn't really helpful, since "slowness" really means waiting on the disk/OS/your file-parsing routine.
In short, you as a developer of an STL library are trying to build an "all round" library for many different use cases.
But, since these things are always interesting, you might well be interested in bit twiddling hacks. I can't remember where I saw that, but I've definitely stolen that link from somebody else on here.
Almost none. The standard library is designed the way it is for a reason.
Taking sqrt, which you mention as an example, the standard library version is written to be as fast as possible, without sacrificing numerical accuracy or portability.
The article you mention is really beyond useless. There are some good articles floating around the 'net, describing more efficient ways to implement square roots. But this article isn't among them (it doesn't even measure whether the described algorithms are faster!). Carmack's trick is slower than std::sqrt on a modern CPU, as well as being less accurate.
It was used in a game something like 12 years ago, when CPUs had very different performance characteristics. It was faster then, but CPUs have changed, and today it's both slower and less accurate than the CPU's built-in sqrt instruction.
You can implement a square root function which is faster than std::sqrt without losing accuracy, but then you lose portability, as it'll rely on CPU features not present on older CPUs.
Speed, accuracy, portability: choose any two. The standard library tries to balance all three, which means that the speed isn't as good as it could be if you were willing to sacrifice accuracy or portability, and accuracy is good, but not as good as it could be if you were willing to sacrifice speed, and so on.
In general, forget any notion of optimizing the standard library. The question you should be asking is whether you can write more specialized code.
The standard library has to cover every case. If you don't need that, you might be able to speed up the cases that you do need. But then it is no longer a suitable replacement for the standard library.
Now, there are no doubt parts of the standard library that could be optimized. The C++ IOStreams library in particular comes to mind. It is often naively, and very inefficiently, implemented. The C++ committee's technical report on C++ performance has an entire chapter dedicated to exploring how IOStreams could be implemented to be faster.
But that's I/O, where performance is often considered to be "unimportant".
For the rest of the standard library, you're unlikely to find much room for optimization.