I just came to know that we can use registers, explicitly in C++ programs. I wonder what if i declare and use all available registers in a single C++ program and run it for considerable amount of time. How badly will my system behave and what (if any) measures will be taken by the os to come out of the situation.
The compiler will simply ignore the register keyword, so you are not going to run out of registers. It may well ignore it anyway - compilers are typically much better at register allocation than humans.
The register keyword indicates to the compiler that the variable does not need to be addressable in main memory. Thus the compiler can be sure that there are no pointers to the value and optimize accordingly.
A common misconsception: the register keyword
register Keyword (see the non-Microsoft specific part)
"register" keyword in C?
Excessive use of the register keyword is unlikely to have serious negative impact on modern systems. Every thread maintains its own register values during execution, and its register usage will not have any direct impact on other threads. The compiler will either reject or ignore register usage that cannot result in a viable program. Poor register use will at most merely reduce performance and the OS will take no special action.
Only specific number of Registers are available for your C++ program.
Also, it is just a suggestion for the compiler mostly compilers can do this optimization themselves so there is not really much use of using register keyword more because compilers may or may not follow the suggestion.
So the only thing register keyword does with modern compilers is prevent you from using & to take the address of the variable.
To Quote Herb Sutter on this:
Never write register. It's exactly as meaningful as whitespace
The register keyword is only a suggestion to the compiler and can be ignored. Let the compiler do the optimization for you.
The register keyword is only a polite suggestion to the compiler that you think this variable will be heavily used and could it pretty-please just keep it in a register. The compiler is free to ignore this suggestion and, in fact, will usually do so in a modern environment.
register is basically a vestigial remnant of the old, grossly-inefficient C compilers that were available way back when. (The same compilers that led to things like the execrable Duff's Device and other monstrosities, in fact.) Modern compilers are far more capable than you are of keeping track of which variables should be placed into which registers at which points in execution. They will, thus, politely ignore you without saying a word.
Als posted a link to Herb Sutter's article on keywords. I agree with Sutter that one should never use register. I disagree with him on whether register is meaningless.
It is worse than meaningless.
I have seen code where a variable qualified with register is later used with "&". Code with dozens and dozens of variables qualified with register. And the ultimate doozy, "register volatile foo;"
Never use "register".
All the CPU registers are at the disposal of your program anyway, so there is nothing exceptional in using all of them. OS won't even notice it.
Related
I took a look at the parts of the code behind memcpy and other functions (memset, memmove, ...) and it seems to be a lot, and a lot of assembly code.
Other stackoverflow questions on this topic mention that a reason for that may be because it contains different code for different CPU architectures.
I have personally written my own memcpy/memset functions with very few lines of C++ code and in 1 million iterations with time measured with chrono, I consistently get better times.
So the question is, why did the programmers not just write the code in C/C++ and let the compiler interpret and optimize it how it thinks is best? Why so much assembly code?
This "It's pointless to rewrite in assembly" is a myth. A more accurate way to express it is that few programmers have the skill required to beat the compiler. But they do exist, and especially among those who develop compilers.
It's technically impossible to write memcpy in standard C++ and C as you have to rely on undefined constructs. The same is true for other standard library functions; memset and malloc are two other examples.
But that's not only reason: A C and C++ standard library implementation is, these days, so closely coupled with a particular compiler that the library writers can take all sorts of liberties that you, as a consumer, cannot. isupper, toupper, &c. stand out as good examples where a particular character encoding can be assumed.
Another good reason is that expertly handcrafted assembly can be difficult to beat for performance.
Compiler usually generates some unnecessary code (compared to hand written assembly) even on full optimization level. This wastes memory space which is not good specially on embedded systems and reduces performance.
Are you sure your custom codes are complete and flawless? I don't think so; because when you are writing assembly, you have full control on everything, but when you compile a code, there is a possibility that compiler generates something that you don't want (and it's your fault, not compiler).
It's almost impossible for compiler to generate code which is as complete as hand written assembly and is smaller than it at the same time.
As mentioned in some comments, it also depends on platform.
The memcpy and memset as well as other function, are written in assembly to take advantage of processor specific instructions.
For example, the ARM processor has a function that can load multiple registers from successive locations with one instruction. There is also the store multiple instruction that stores multiple registers into successive locations. The Intel x86 has block read and write instructions.
The assembly language allows for copying 4 8-bit bytes using a single 32-bit register.
Some processors allow for conditional execution of instructions, which helps when rolling out loops.
I've written optimized memcpy and memset functions for various processors. I've also spent a lot of time arguing (discussing) C and C++ "best" implementations with compilers. It's a little difficult using C or C++ to try and get the compiler to use the processor instructions you want it to.
Why did the programmers not just write the code in C/C++
We aren't mind readers. We don't even know what they wrote. If you need an authoritative answer, then you should ask the programmers that wrote the code.
But we can hypothesise, that they wrote what they did because it was fast, and did the right thing.
This question already has answers here:
Replacement for deprecated register keyword C++ 11
(3 answers)
Closed 7 years ago.
I was reading this and it says that the register keyword will most probably be removed from the next C++ standard. It also says that register was deprecated in 2011. So, what's wrong with register storage class specifier?
I think modern compilers are very smart and they implicitly optimize frequently used variables for speed (fast access) and puts them in CPU registers.
However, C++ experts also say don't or never use register. As such, what's the problem with the register keyword?
You've pretty much answered your own question:
I think modern compilers are very smart so they implicitly optimizes frequently used variables for speed (fast access) & puts them in CPU register.
That's precisely the point—optimisers are so good at register allocation nowadays that any attempt from the programmer to enforce their will through the register keyword would likely lead to a pessimisatin, and is therefore simply ignored by the compiler. Remember that register was never a binding requirement, always just a hint to the compiler. Now that they rightfully scoff at such hints, the keyword is simply obsolete and useless.
So, to directly answer your question of "what's wrong with it:" it no longer serves any purpose whatsoever, as the only one it ever had ("hint to the compiler to put this thing in a register") is now superseded by the compilers being way better at this than humans.
Standard doesn't require that register variable to be put into registers, instead it's just a hint for compiler for variables that are often used. And compiler can determine it on its own.
Here, clause regarding register keyword from the link you posted:
A register specifier is a hint to the implementation that the variable so declared will be heavily used. [ Note: The hint can be ignored and in most implementations it will be ignored if the address of the variable is taken. This use is deprecated (see D.2). -- end note ]
I read about cache optimization in C++ and the mechanisms, modern CPUs use to predict what data is needed next, to copy that into cache. But is there a direct way in C++ for the programmers, who know what actually is needed next, to determine what data gets copied into CPU cache?
This varies with the processor and compiler you're using.
Assuming you're using an Intel x86/x64 or compatible (e.g., AMD) processor, the processor provides a number of prefetch instructions, and most compilers include intrinsics to invoke them. With VC++ you use _m_prefetch or _m_prefetchw. With gcc you use __builtin_prefetch.
Likewise, VC++ on an ARM provides a __prefetch intrinsic for the same purpose (no, I really don't know why they couldn't have used the same name as on x86; the signature and effect appear identical).
Most other reasonably modern, higher-end processors probably provide similar instructions, and
I'd guess most compilers provide intrinsics to make them available, but just as with these, the names of the intrinsics will vary. For that matter, even though the functions are intrinsic to the compiler, most require that you include some header to use them -- and the name of the header will also vary.
The prefetch intrinsics Jerry provided would do the trick. keep in mind that there are several flavors controlled by an argument to that function, determining which levels of the cache (if any) would be used to keep the line. A prefetch_NTA for e.g. would not pollute the caches, but rather provide the line only for immediate use (and is used in cases where you're going to use it soon and once only)
Also keep in mind that these instructions are basically hints to the CPU (which also does quite well by itself trying to guess which lines to prefetch). As such, they are not guaranteed to work, they might fail in many cases (if the memory subsystem is loaded, or the address got swapped out of memory).
So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.
register, from what I know, doesn't actually state anything about CPU registers, just that a given variable can't be referenced indirectly. I'll hazard a guess that it's often referred to as obsolete because compilers can detect a lack of addressing automatically thus making such optimizations transparent.
But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use inline and C99's restrict for example?
I suppose that some things like aliasing make deducing some optimizations hard or even impossible, so where is the line drawn before we start venturing into Sufficiently Smart Compiler territory?
Where should the line should be drawn in C and C++ between spoon-feeding a compiler optimization information and assuming it knows what it's doing?
EDIT: Jens Gustedt pointed out that my conflating of C and C++ isn't right since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about register in C++ which I'll add if I find it...
I would agree that register and inline are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision on inlining. The use of the inline keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.
restrict, however, is different. When compiling a function, the compiler has no idea of what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.
inline is used in the scenario where you implement a non-templated function within the header then include it from multiple compilation units.
This ensures that the compiler should create just one instance of the function as though it were inlined, so you do not get a link error for multiply defined symbol. It does not however require the compiler to actually inline it.
There are GNU flags I think force-inline or similar but that is a language extension.
register doesn't even say that you can't reference the
variable indirectly (at least in C++). It said that in the
original C, but that has been dropped.
Whether trying to out-optimize the compiler is a fool's errand
depends on the optimization. Not many compilers, for example,
will convert sin(x) * sin(x) + cos(x) * cos(x) into 1.
Today, most compilers ignore register, and no one uses it,
because compilers have become good enough at register allocation
to do a better job than you can with register. In fact,
respecting register would typically make the generated code
slower. This is not the case for inline or restrict: in
both cases, there exist techniques, at least theoretically,
which could result in the compiler doing a better job than you
can. Such techniques are not widespread, however, and (as far
as I know, at least), have a very high compile time overhead,
with in some cases compile times which grow exponentially with
the size of the program (which makes them more or less unusable
on most real programs—compile times which are measured in
years really aren't acceptable).
As to where to draw the line... it changes in time. When
I first started programming in C, register made a significant
difference, and was widely used. Today, no. I imagine that in
time, the same may happen with inline or restrict—some
experimental compilers are very close with inline already.
This is a flame-bait question but I will dive in anyway.
Compilers are a lot better at optimising that your average programmer. There was a time I programmed on a 25MHz 68030 and I got some advantage from the use of register because the compiler's optimizer was so poor. But that was back in 1990.
I see inline as just as bad as register.
In general, measure first before you modify. If you find that you code performs so poorly you want to use register or inline, take a deep breath, stand back and look for a better algorithm first.
In recent times (i.e. the last 5 years) I have gone through code bases and removed inline functions galore with no perceptible change in performance being visible. Code size, however, always benefits from the removal of inline methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age but it does matter if you work in the embedded space.
It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)
Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.
So a rule about using various optimization techniques is to ask do I know the compiler will not do the best job here? Am I willing to commit to that for a while—will the compiler remain stable while this code is in use, am I willing to rewrite the code when the compiler changes this situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.
This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.
Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler and in which it is worth the expense of paying people for this.
The difference is as follows:
register is very local optimization (i.e. inside one function). The register allocation is a relatively solved problem both by smarter compilers and by larger number of register (mostly the former but say x86-64 have more registers then x86 and both have larger number then say 8-bit processor)
inline is harder as it is inter-procedure optimization. However as it involves relatively small depth of recursion and small number of procedures (if inlined procedure is too big there is no sense of inlining it) it may be safely left to the compiler.
restrict is much harder. To fully know the that two pointers don't alias you would need to analyse whole program (including libraries, system, plug-ins etc.) - and even then run into problems. However the information is clearer for programmer AND it is part of specification.
Consider very simple code:
void my_memcpy(void *dst, const void *src, size_t size) {
for (size_t i = 0; i < size; i++) {
((char *)dst)[i] = ((const char *)str)[i];
}
}
Is there a benefit to making this code efficient? Yes - memcpy tend to be very useful (say for copying GC). Can this code be vectorized (here - moved by words - say 128b instead of 8b)? Compiler would have to deduce that dst and src does not alias in any way and regions pointed by them are independent. size may depend on user input or runtime behaviour or other elements which makes the analysis practically impossible - similar problems to Halting Problem - in general we cannot analyse everything without running it. Or it might be part of C library (I assume shared libraries) and is called by program hence all call sites are not even known at compile time. Without such analysis the program would exhibit different behaviour with optimization on. On the other hand programmer might ensure that they are different objects simply by knowing the (even higher-level) design instead of need for bottom-up analysis.
restrict can also be part of documentation as it might be programmer who wrote the procedure in a way that it cannot handle 2 aliasing pointers. For example if we want to copy memory from aliasing locations the above code is incorrect.
So to sum up - Sufficiently Smart Compiler would not be able to deduce the restrict (unless we move to compilers understending the meaning of code) without knowing the whole program. Even then the it would be close to undecidability. However for local optimization the compilers are already sufficiently smart. My guess it that Sufficiently Smart Compiler with whole program analysis would be able to deduce in many interesting cases however.
PS. By local I mean single function. So local optimization cannot assume anything about arguments, global variables etc.
One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C-compilers are.
For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.
For these cases, I've found optimization hints like register, inline, and #pragma unroll to be extremely useful.
From what I have seen back in the days I was more involved with C/C++, these are merely orders directly given to the compiler. Compiler may try to inline a function even if it is not given the direct order to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, visual studio provides different levels of optimization which correspond to the different intelligence levels of the compiler. I have read that all class functions are implicitly inline to give compiler a hint to minimize function call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler while in intelligent cases, they may be very obvious for the compiler to do some optimization.
Also, be sure that these keywords are guaranteed to be safe. Some compiler optimizations may not work with some libraries such as OpenGL (as I have seen it myself). So in cases where you feel that compiler optimization may be harmful, you can use these keywords to make sure it is done the way you want it to.
The compilers such as g++ these days optimize the code very well. You might as well search for optimization elsewhere, maybe in the methods and algorithm you use or by using TBB or CUDA to make your code parallel.
Function parameters are placed on the stack, but compilers can optimize this task by the use of optional registers. It would make sense that this optimization will kick in if there are only 1-2 parameters, and not when there are 256 (not that one would want to have the max number of parameters).
How can one find out the parameter limit (number of parameters) for a certain compiler (such as gcc) where one can be sure that this optimization will be used?
Function parameters are placed on the stack, but compilers can optimize this task by the use of optional registers.
As FrankH says in his comments and as I'm going to say in my answer, the application binary interface for the system in question determines how arguments are passed to functions - this is called the calling convention for that platform.
To complicate matters, x86 32-bit actually has several. This is historical and comes from the fact that when Win32 bit arrived, everyone went crazy doing different things.
So, yes, you can "optimise" by writing function calls in such a way, but no, you shouldn't. You should follow the standards for your platform. Because the honest truth is, the speed of stack access probably isn't slowing your code down to that great an extent that you need to be binary-incompatible from everyone else on your system.
Why the need for ABIs/standard calling conventions? Well, in terms of using the processor registers, stack etc, applications must agree on what means what and where it shoudl go. If one function decided all its arguments were in registers and another that some were on the stack, how would they be interoperable? Moreover, you might come across the term scratch registers to mean those registers you don't have to restore. What happens if you call a function expecting it to leave some registers alone?
Anyway, as for what you asked for, here's some ABI documentation:
The difference between x86 and x64 on windows.
x86_64 ABI used for Unix-like platforms.
Wikipedia's x86 calling conventions.
A document on compiler calling conventions.
The last one is my favourite. To quote it:
In the days of the old DOS operating system, it was often possible to combine development
tools from different vendors with few compatibility problems. With 32-bit Windows, the
situation has gone completely out of hand. Different compilers use different data
representations, different function calling conventions, and different object file formats.
While static link libraries have traditionally been considered compiler-specific, the
widespread use of dynamic link libraries (DLL's) has made the distribution of function
libraries in binary form more common.
So whatever you're trying to do with optimising via modifying the function calling method, don't. Find another way to optimise. Profile your code. Study the compiler optimisations you've got for your compiler (-OX) if you think it helps and dump the assembly to check, if the speed is really that crucial
For publically visible functions, this is documented in the ABI standard. For functions that are not referencible from the outside, all bets are off anyway.
You would have to read the fine manual for the compiler. If you were lucky, you would find it there in a description of function calling conventions. Otherwise, for an OSS compiler such as gcc you would probably have to read its source-code.