Does typecasting consume extra CPU cycles - c++

Does typecasting in C/C++ result in extra CPU cycles?
My understanding is that it should consume extra CPU cycles, at least in certain cases, such as typecasting from float to integer, where the CPU has to convert the floating-point representation to an integer.
float a = 2.0;
int b = (int)a;
I would like to understand the cases where it would/would not consume extra CPU cycles.

I would like to say that "converting between types" is what we should be looking at, not whether there is a cast or not. For example
int a = 10;
float b = a;
will be the same as:
int a = 10;
float b = (float)a;
This also applies to changing the size of a type, e.g.
char c = 'a';
int b = c;
this will "extend c into an int size from a single byte [using byte in the C sense, not 8-bit sense]", which will potentially add an extra instruction (or extra clockcycle(s) to the instruction used) above and beyond the datamovement itself.
Note that sometimes these conversions aren't at all obvious. On x86-64, a typical example is using int instead of unsigned int for indices in arrays. Since pointers are 64-bit, the index needs to be converted to 64-bit. In the case of an unsigned, that's trivial - just use the 64-bit version of the register the value is already in, since a 32-bit load operation will zero-fill the top part of the register. But if you have an int, it could be negative. So the compiler will have to use the "sign extend this to 64 bits" instruction. This is typically not an issue where the index is calculated based on a fixed loop and all values are positive, but if you call a function where it is not clear if the parameter is positive or negative, the compiler will definitely have to extend the value. Likewise if a function returns a value that is used as an index.
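A minimal sketch of that indexing point (function names are made up; the exact instructions depend on the compiler and flags):

double load_signed(const double* a, int i)        // i might be negative...
{
    return a[i];                                  // ...so a sign extension (e.g. movsxd) is typically emitted
}

double load_unsigned(const double* a, unsigned i) // a 32-bit load already zero-fills the upper half...
{
    return a[i];                                  // ...so the 64-bit register can be used as-is
}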
However, any reasonably competent compiler will not mindlessly add instructions to convert something from its own type to itself (with optimization turned off it may, but even minimal optimization should see that "we're converting from type X to type X, that doesn't mean anything, let's take it away").
So, in short, the above example does not add any extra penalty, but there are certainly cases where converting data from one type to another does add extra instructions and/or clock cycles to the code.

It'll consume cycles where it alters the underlying representation. So it will consume cycles if you convert a float to an int or vice versa. Depending on the architecture, casts such as int to char or long long to int may or may not consume cycles (but more often than not they will). Casting between pointer types will only consume cycles if multiple inheritance is involved.
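A hedged sketch of the difference (function names are made up; the instruction names are the usual x86-64 ones):

int   to_int(float f)   { return static_cast<int>(f); }   // real conversion instruction (e.g. cvttss2si)
float to_float(int i)   { return static_cast<float>(i); } // real conversion instruction (e.g. cvtsi2ss)

const char* to_bytes(const int* p)                         // pointer-to-pointer cast:
{
    return reinterpret_cast<const char*>(p);               // no instruction at all
}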

There are different types of casts. C++ has different types of cast operators for the different types of casts. If we look at it in those terms, ...
static_cast will usually have a cost if you're converting from one type to another, especially if the target type is a different size than the source type. static_casts are sometimes used to cast a pointer from a derived type to a base type. This may also have a cost, especially if the derived class has multiple bases.
reinterpret_cast will usually not have a direct cost. Loosely speaking, this type of cast doesn't change the value, it just changes how it's interpreted. Note, however, that this may have an indirect cost. If you reinterpret a pointer to an array of bytes as a pointer to an int, then you may pay a cost each time you dereference that pointer unless the pointer is aligned as the platform expects.
const_cast should not cost anything if you're adding or removing constness, as it's mostly an annotation to the compiler. If you're using it to add or remove a volatile qualifier, then I suppose there may be a performance difference because it would enable or disable certain optimizations.
dynamic_cast, which is used to cast from a pointer to a base class to a pointer to a derived class, most certainly has a cost, as it must--at a minimum--check if the conversion is appropriate.
When you use a traditional C cast, you're essentially just asking the compiler to choose the more specific type of cast. So to figure out if your C cast has a cost, you need to figure out what type of cast it really is.
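As a rough, hedged illustration of those categories (all class and function names here are made up):

#include <cstdint>

struct Base1 { virtual ~Base1() = default; int a; };
struct Base2 { virtual ~Base2() = default; int b; };
struct Derived : Base1, Base2 { int c; };

void demo(Derived* d, const unsigned char* bytes, int n)
{
    double x   = static_cast<double>(n);                   // value conversion: usually a real instruction
    Base2* b2  = static_cast<Base2*>(d);                   // derived-to-base: may add a fixed offset
    auto  addr = reinterpret_cast<std::uintptr_t>(bytes);  // reinterpretation: no instruction by itself
    const int* ip = reinterpret_cast<const int*>(bytes);   // also free, but dereferencing may cost if misaligned
    int*  wp   = const_cast<int*>(ip);                      // annotation only: no runtime cost
    Derived* back = dynamic_cast<Derived*>(b2);             // runtime type check: real work
    (void)x; (void)addr; (void)wp; (void)back;
}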

Download and enjoy Agner Fog's manuals:
http://www.agner.org/optimize/
1. Optimizing software in C++: An optimization guide for Windows, Linux and Mac platforms
It is a huge PDF, but for a start you can check out:
14.7 Don't mix float and double
14.8 Conversions between floating point numbers and integers

Related

Compiler optimizations allowed via "int", "least" and "fast" non-fixed width types C/C++

Clearly, fixed-width integral types should be used when the size is important.
However, I read (in the Insomniac Games style guide) that "int" should be preferred for loop counters / function args / return codes / etc. when the size isn't important - the rationale given was that fixed-width types can preclude certain compiler optimizations.
Now, I'd like to make a distinction between "compiler optimization" and "a more suitable typedef for the target architecture". The latter has global scope, and my guess is it probably has very limited impact unless the compiler can somehow reason about the global performance of the program parameterized by this typedef. The former has local scope, where the compiler would have the freedom to optimize the number of bytes used, and the operations, based on local register pressure / usage, among other things.
Does the standard permit "compiler optimizations" (as we've defined) for non-fixed-width types? Any good examples of this?
If not, and assuming the CPU can operate on smaller types at least as fast as larger types, then I see no harm, from a performance standpoint, of using fixed-width integers sized according to local context. At least that gives the possibility of relieving register pressure, and I'd argue couldn't be worse.
The reason the rule of thumb is to use an int is that the standard defines this integral type as the natural data type of the CPU (provided that it is sufficiently wide to cover the range INT_MIN to INT_MAX). That's where the best performance stems from.
There are many things wrong with int_fast types - most notably that they can be slower than int!
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    printf("%zu\n", sizeof (int_fast32_t));
}
Run this on x86-64 (with glibc, at least) and it prints 8... but that makes little sense - using 64-bit registers often requires extra instruction prefixes in x86-64 mode, and since "behaviour on overflow is undefined", with a 32-bit int it doesn't matter if the upper 32 bits of the 64-bit register are set after arithmetic - the behaviour is "still correct".
What is even worse than using the signed fast or least types, however, is using a small unsigned integer instead of size_t or a signed integer as a loop counter - now the compiler must generate extra code to "ensure the correct wraparound behaviour".
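One concrete, hedged way this can show up (function names are made up): if the bound is pointer-sized but the counter is a 32-bit unsigned, the compiler has to preserve wraparound at 2^32, which can prevent it from simply widening the counter; with size_t the widths already match.

#include <cstddef>

void fill_u32(double* a, std::size_t n)
{
    for (unsigned int i = 0; i < n; ++i)   // i must wrap modulo 2^32 if n exceeds UINT_MAX,
        a[i] = 0.0;                        // so the compiler can't simply widen the counter
}

void fill_sizet(double* a, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)    // counter already matches the pointer width
        a[i] = 0.0;
}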
I'm not very familiar with the x86 instruction set, but unless you can guarantee that practically every arithmetic and move instruction also allows additional shifts and (sign) extensions, the assumption that smaller types are "at least as fast" as larger ones is not true.
The complexity of x86 makes it pretty hard to come up with simple examples, so let's consider an ARM microcontroller instead.
Let's define two addition functions which differ only in their return type: "add32", which returns an integer of full register width, and "add8", which returns only a single byte.
int32_t add32(int32_t a, int32_t b) { return a + b; }
int8_t add8(int32_t a, int32_t b) { return a + b; }
Compiling those functions with -Os gives the following assembly:
add32(int, int):
    add r0, r0, r1
    bx lr
add8(int, int):
    add r0, r0, r1
    sxtb r0, r0 // Sign-extend single byte
    bx lr
Notice how the function which only returns a byte is one instruction longer. It has to truncate the 32-bit addition result to a single byte (and sign-extend it back into the register).
Here is a link to the code on Compiler Explorer:
https://godbolt.org/z/ABFQKe
However, I read (Insomniac Games style guide), that "int" should be preferred for loop counters
You should rather be using size_t whenever iterating over an array. int has problems other than performance, such as being signed, and it is also problematic when porting.
From a standard point of view, for a scenario where "n" is the size of an int, there exists no case where int_fastn_t should perform worse than int; if it does, the compiler/standard lib/ABI/system has a fault.
Does the standard permit "compiler optimizations" (as we've defined) for non-fixed-width types? Any good examples of this?
Sure, the compiler might optimize the use of integer types quite freely, as long as it doesn't affect the outcome of the result - no matter whether they are int or int32_t.
For example, a compiler for an 8-bit CPU might optimize int a=1; int b=1; ... c = a + b; to be performed with 8-bit arithmetic, ignoring integer promotion and the actual size of int. It will, however, most likely have to allocate 16 bits of memory to store the result.
But if we give it some rotten code like char a = 0x80; int b = a >> 1;, it will have to perform any optimization so that the side effects of integer promotion are taken into account. That is, the result could be 0xFFC0 rather than 0x40 as one might have expected (assuming signed char, 2's complement, arithmetic shift). The a >> 1 part isn't possible to optimize down to an 8-bit type because of this - it has to be carried out with 16-bit arithmetic.
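A small, hedged demo of that promotion effect (assuming signed char, two's complement and an arithmetic right shift, as above):

#include <cstdio>

int main()
{
    signed char a = 0x80;   // -128 after the (implementation-defined) conversion
    int b = a >> 1;         // a is promoted to int before the shift: -128 >> 1 == -64
    std::printf("%d, low 16 bits: 0x%X\n", b, static_cast<unsigned>(b) & 0xFFFFu); // -64, 0xFFC0
    return 0;
}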
I think the question you are trying to ask is:
Is the compiler allowed to make additional optimizations for a non-fixed-width type such as int beyond what it would be allowed for a fixed-width type like int32_t that happens to have the same length on the current platform?
That is, you are not interested in the part where the size of the non-fixed-width type is allowed to be chosen appropriately for the hardware - you are aware of that and are asking whether, beyond that, additional optimizations are available?
The answer, as far as I am aware or have seen, is no. No, both in the sense that compilers do not actually optimize int differently than int32_t (on platforms where int is 32 bits), and also no in the sense that there are no optimizations allowed by the standard for int which are not also allowed for int32_t1 (this second part is wrong - see comments).
The easiest way to see this is that the various fixed-width integers are all typedefs of the underlying primitive integer types - so on a platform with 32-bit integers int32_t will probably be a typedef (perhaps indirectly) of int. So from a behavioral and optimization point of view, the types are identical, and once you are in the IR world of the compiler, the original type probably isn't even available without jumping through hoops (i.e., int and int32_t will generate the same IR).
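A trivial way to see this (assuming a platform where int32_t is indeed a typedef of int): the two functions below should compile to identical code.

#include <cstdint>

int          add_int(int a, int b)                   { return a + b; }
std::int32_t add_i32(std::int32_t a, std::int32_t b) { return a + b; }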
So I think the advice you received was wrong, or at best misleading.
1 Of course the answer to the question "Is it allowed for a compiler to optimize int better than int32_t?" is yes, since there are no particular requirements on optimization, so a compiler could do something weird like that, or the reverse, such as optimizing int32_t better than int. I think that's not very interesting though.

Why is “cast from ‘X*’ to ‘Y’ loses precision” a hard error and what is a suitable fix for legacy code

1. Why?
Code like this used to work and it's kind of obvious what it is supposed to mean. Is the compiler even allowed (by the specification) to make it an error?
I know that it's losing precision and I would be happy with a warning. But it still has well-defined semantics (at least the downsizing cast to unsigned is defined) and the user just might want to do it.
2. Workaround
I have legacy code that I don't want to refactor too much because it's rather tricky and already debugged. It is doing two things:
Sometimes it stores integers in pointer variables. The code only casts the pointer to an integer if it stored an integer in it before. Therefore, while the cast is downsizing, the overflow never happens in reality. The code is tested and works.
When an integer is stored, it always fits in plain old unsigned, so changing the type is not considered a good idea; the pointer is passed around quite a bit, so changing its type would be somewhat invasive.
Uses the address as a hash value. A rather common thing to do. The hash table is not large enough for extending the type to make any sense.
The code uses plain unsigned for hash value, but note that the more usual type of size_t may still generate the error, because there is no guarantee that sizeof(size_t) >= sizeof(void *). On platforms with segmented memory and far pointers, size_t only has to cover the offset part.
So what are the least invasive suitable workarounds? The code is known to work when compiled with compiler that does not produce this error, so I really want to do the operation, not change it.
Notes:
void *x;
int y;
union U { void *p; int i; } u;
*(int*)&x and u.p = x, u.i are not equivalent to (int)x and are not the opposite of (void *)y. On big-endian architectures, the first two will return the bytes at lower addresses while the latter will work on the low-order bytes, which may reside at higher addresses.
*(int*)&x and u.p = x, u.i are both strict aliasing violations, (int)x is not.
C++, 5.2.10:
4 - A pointer can be explicitly converted to any integral type large enough to hold it. [...]
C, 6.3.2.3:
6 - Any pointer type may be converted to an integer type. [...] If the result cannot be represented in the integer type, the behavior is undefined. [...]
So (int) p is illegal if int is 32-bit and void * is 64-bit; a C++ compiler is correct to give you an error, while a C compiler may either give an error on translation or emit a program with undefined behaviour.
You should write, adding a single conversion:
(int) (intptr_t) p
or, using C++ syntax,
static_cast<int>(reinterpret_cast<intptr_t>(p))
If you're converting to an unsigned integer type, convert via uintptr_t instead of intptr_t.
This is a tough one to solve "generically", because the "loses precision" indicates that your pointers are larger than the type you are trying to store them in. Which may well be "OK" in your mind, but the compiler is concerned that you will be restoring the int value back into a pointer, which has now lost the upper 32 bits (assuming we're talking about 32-bit int and 64-bit pointers - there are other possible combinations).
There is uintptr_t, which is size-compatible with whatever the pointer is on the system, so typically you can overcome the actual error by:
int x = static_cast<int>(reinterpret_cast<uintptr_t>(some_ptr));
This will first force a large integer from a pointer, and then cast the large integer to a smaller type.
Answer for C
Converting pointers to integers is implementation-defined. Your problem is that the code you are talking about seems never to have been correct, and probably only worked on ancient architectures where both int and pointers are 32-bit.
The only types that are supposed to convert without loss are [u]intptr_t, if they exist on the platform (usually they do). Which part of such a uintptr_t is appropriate to use for your hash function is difficult to tell; you shouldn't make any assumptions about that. I would go for something like
uintptr_t n = (uintptr_t)x;
and then
((n >> 32) ^ n) & UINT32_MAX
this can be optimized out on 32-bit archs, and would give you traces of all the other bits on 64-bit archs.
For C++ basically the same should apply, just the cast would be reinterpret_cast<std::uintptr_t>(x).
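Putting those pieces together, a hedged sketch (hash_ptr is a made-up name; note that n >> 32 would be undefined where uintptr_t is only 32 bits wide, so the sketch uses a double shift instead):

#include <cstdint>

inline std::uint32_t hash_ptr(const void* x)
{
    std::uintptr_t n = reinterpret_cast<std::uintptr_t>(x);
    return static_cast<std::uint32_t>((n >> 31 >> 1) ^ n);  // fold the high bits into the low 32
}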

Is pointer conversion expensive or not?

Is pointer conversion considered expensive? (e.g. how many CPU cycles does it take to convert a pointer/address?), especially when you have to do it quite frequently, for instance (just an example to show the scale of frequency, I know there are better ways for this particular case):
unsigned long long *x;
/* fill data to x */
for (int i = 0; i < 1000*1000*1000; i++)
{
    A[i] = foo((unsigned char*)x + i);
}
(e.g. how many CPU cycles it takes to convert a pointer/address)
In most machine code languages there is only 1 "type" of pointer and so it doesn't cost anything to convert between them. Keep in mind that C++ types really only exist at compile time.
The real issue is that this sort of code can break strict aliasing rules. You can read more about this elsewhere, but essentially the compiler will either produce incorrect code through undefined behavior, or be forced to make conservative assumptions and thus produce slower code. (Note that char* and friends are somewhat exempt from the undefined-behavior part.)
Optimizers often have to make conservative assumptions about variables in the presence of pointers. For example, a constant propagation process that knows the value of variable x is 5 would not be able to keep using this information after an assignment to another variable (for example, *y = 10) because it could be that *y is an alias of x. This could be the case after an assignment like y = &x.
As an effect of the assignment to *y, the value of x would be changed as well, so propagating the information that x is 5 to the statements following *y = 10 would be potentially wrong (if *y is indeed an alias of x). However, if we have information about pointers, the constant propagation process could make a query like: can x be an alias of *y? Then, if the answer is no, x = 5 can be propagated safely.
Another optimization impacted by aliasing is code reordering. If the compiler decides that x is not aliased by *y, then code that uses or changes the value of x can be moved before the assignment *y = 10, if this would improve scheduling or enable more loop optimizations to be carried out.
To enable such optimizations in a predictable manner, the ISO standard for the C programming language (including its newer C99 edition, see section 6.5, paragraph 7) specifies that it is illegal (with some exceptions) for pointers of different types to reference the same memory location. This rule, known as "strict aliasing", sometimes allows for impressive increases in performance,[1] but has been known to break some otherwise valid code. Several software projects intentionally violate this portion of the C99 standard. For example, Python 2.x did so to implement reference counting,[2] and required changes to the basic object structs in Python 3 to enable this optimisation. The Linux kernel does this because strict aliasing causes problems with optimization of inlined code.[3] In such cases, when compiled with gcc, the option -fno-strict-aliasing is invoked to prevent unwanted optimizations that could yield unexpected code.
[edit]
http://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts_with_optimization
What is the strict aliasing rule?
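A hedged illustration of the constant-propagation point from the quote above (names are made up): under strict aliasing a float* cannot legally point at an int, so the compiler may fold the first return to 5; in the second function the store might alias g, so g must be re-read.

int g = 5;  // hypothetical global

int no_alias(float* y)
{
    g = 5;
    *y = 10.0f;   // a float* cannot legally alias the int g, so the compiler may just return 5
    return g;
}

int may_alias(int* y)
{
    g = 5;
    *y = 10;      // y might point at g, so g has to be reloaded before returning
    return g;
}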
On any architecture you're likely to encounter, all pointer types have the same representation, and so conversion between different pointer types representing the same address has no run-time cost. This applies to all pointer conversions in C.
In C++, some pointer conversions have a cost and some don't:
reinterpret_cast and const_cast (or an equivalent C-style cast, such as the one in the question) and conversion to or from void* will simply reinterpret the pointer value, with no cost.
Conversion between pointer-to-base-class and pointer-to-derived class (either implicitly, or with static_cast or an equivalent C-style cast) may require adding a fixed offset to the pointer value if there are multiple base classes.
dynamic_cast will do a non-trivial amount of work to look up the pointer value based on the dynamic type of the object pointed to.
Historically, some architectures (e.g. PDP-10) had different representations for pointer-to-byte and pointer-to-word; there may be some runtime cost for conversions there.
unsigned long long *x;
/* fill data to x */
for (int i = 0; i < 1000*1000*1000; i++)
{
    A[i] = foo((unsigned char*)x + i); // bad cast
}
Remember, the machine only knows memory addresses, data and code. Everything else (such as types) is known only to the compiler (which aids the programmer); the compiler does all the pointer arithmetic, and only the compiler knows the size of each type, and so on.
At runtime, there are no machine cycles wasted converting one pointer type to another because the conversion does not happen at runtime. All pointers are treated as 4 bytes long (on a 32-bit machine), nothing more and nothing less.
It all depends on your underlying hardware.
On most machine architectures, all pointers are byte pointers, and converting between one byte pointer and another is a no-op. On some architectures, a pointer conversion may under some circumstances require extra manipulation (there are machines that work with word-based addresses, for instance, and converting a word pointer to a byte pointer or vice versa will require extra manipulation).
Moreover, this is in general an unsafe technique, as the compiler can't perform any sanity checking on what you are doing, and you can end up overwriting data you didn't expect.

reinterpret_cast cast cost

My understanding is that C++ reinterpret_cast and the C pointer cast are just compile-time functionality and have no performance cost at all.
Is this true?
It's a good assumption to start with. However, the optimizer may be restricted in what it can assume in the presence of a reinterpret_cast<> or C pointer cast. Then, even though the cast itself has no associated instructions, the resulting code is slower.
For instance, if you cast an int to a pointer, the optimizer likely will have no idea what that pointer could be pointing to. As a result, it probably has to assume that a write through that pointer can change any variable. That defeats very common optimizations such as keeping variables in registers.
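A hedged sketch of that effect (names are made up): the cast itself costs nothing, but the store through the resulting pointer forces the compiler to be conservative about what may have changed.

#include <cstdint>

int g;  // hypothetical global

int touch_unknown(std::uintptr_t addr)
{
    g = 1;
    int* p = reinterpret_cast<int*>(addr);  // the cast itself emits nothing
    *p = 2;                                 // but this store may alias g...
    return g;                               // ...so g is likely reloaded rather than kept in a register
}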
That's right. No cost other than any gain/loss in performance for performing instructions at the new width, which, I might add, is only a concern in rare cases. Casting between pointers on every platform I've ever heard of has zero cost, and no performance change whatsoever.
C-style casts in C++ will attempt a static_cast first and only perform a reinterpret_cast if a static_cast cannot be performed. A static_cast may change the value of the pointer in the case of multiple inheritance (or when casting an interface to a concrete type); this offset calculation may involve an extra machine instruction. It will be at most one machine instruction, so really very small.
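A minimal sketch of the multiple-inheritance case (types and function name are made up):

struct A { int x; };
struct B { int y; };
struct C : A, B { int z; };

B* to_second_base(C* p)
{
    return static_cast<B*>(p);   // typically p plus a fixed offset (one add/lea), with a null check
}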
Yes, this is true. The cast type which has a runtime cost is dynamic_cast.
You're right, but think about it: reinterpret_cast means maybe a bad design or that you're doing something very low level.
dynamic_cast, instead, will cost you something, because it has to look in a lookup table at runtime.
reinterpret_cast does not incur a runtime cost... however, you have to be careful, as every use of reinterpret_cast is implementation-defined. For example, it is possible that reinterpreting a char array as an int array could cause the target architecture to raise a hardware trap, because different types may have different alignment rules.
Get correct first, then worry about efficiency.
I was looking at my assembly code before and after reinterpret-casting a signed char as an unsigned char. The code grew by about three or four instructions.
int main()
{
    signed char i = 0x80;
    (unsigned char&)i >>= 7;
    return i;
}
I was casting to unsigned char to make the compiler use the SHR instruction rather than the SAR instruction, so that the newly shifted-in bits would be zeros instead of copies of i's sign bit.
The compiler still seems to always use the SAR instruction, but the reinterpret cast made the compiler add more instructions - three to four more!
I was wondering why my Unicode function for converting a UTF-8 string to UTF-16 was almost 3 times slower than Win32 MultiByteToWideChar(). Now I am worried that casting is one of the main factors.
Which is IRONIC, as we use reinterpret_cast for speed.

What does this C++ construct do?

Somewhere in lines of code, I came across this construct...
// void* v = value from an iterator
int i = (int)(long(v));
What possible purpose can this construct serve?
Why not simply use int(v) instead? Why the cast to long first?
It most possibly silences warnings.
Assuming a 64-bit architecture with sizeof(int) < sizeof(long) and sizeof(long) == sizeof(void *), you possibly get a warning if you cast a void * to an int, and no warning if you cast a void * to a long, as you're not truncating. You would then get a warning assigning a long to an int (possible truncation), which is removed by explicitly casting from the long to an int.
Without knowing the compiler it's hard to say, but I've certainly seen multi-step casts required to prevent warnings. Why not try converting the construct to what you think it should be and see what the compiler says (of course that only helps you to work out what was in the mind of the original programmer if you're using the same compiler and same warning level as they were).
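A hedged reconstruction of the likely intent on an LP64-style target (ptr_to_int is a made-up name; whether each individual step warns depends on the compiler and warning level):

int ptr_to_int(void* v)
{
    long l = (long)v;   // pointer -> same-width integer: no "loses precision" complaint on LP64
    return (int)l;      // explicit long -> int narrowing: silences the truncation warning
}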
It does eeevil.
On most architectures, a pointer can be considered to be just another kind of number. On most architectures, long is as many bits as a pointer, so there is a 1-to-1 map between long values and pointers. But violations, especially of the second rule, are not uncommon (64-bit Windows, for example, has 32-bit long)!
long(v) is an alias for reinterpret_cast<long>(v), which carries no guarantees. Not really fit for any purpose, unless your ABI spec says otherwise.
However, for whatever reason, whoever wrote that code prefers int to long. So they again cross their fingers and hope that no essential information is thrown out in the bits that may possibly be lost in the long to int cast.
Two uses of this are creating a unique object identifier, or trying to somehow package the pointer for some kind of arithmetic otherwise unsupported by pointers.
An opaque identifier can be a void*, so casting to an integral type is unnecessary.
"Extracting" an integer from a pointer (for e.g. a division operation) can always be done by subtracting a base pointer to obtain a difference of type ptrdiff_t, which is usually long.