What does this C++ construct do?

Somewhere in a body of code, I came across this construct...
// void* v = void* value from an iterator
int i = (int)(long(v));
What possible purpose can this construct serve?
Why not simply use int(v) instead? Why the cast to long first?

It most likely silences warnings.
Assuming an architecture where sizeof(int) < sizeof(long) and sizeof(long) == sizeof(void *) (e.g. LP64), you possibly get a warning if you cast a void * to an int, and no warning if you cast a void * to a long, as you're not truncating. You then get a warning assigning a long to an int (possible truncation), which is removed by explicitly casting from long to int.
Without knowing the compiler it's hard to say, but I've certainly seen multi-step casts required to prevent warnings. Why not try converting the construct to what you think it should be and see what the compiler says (of course that only helps you to work out what was in the mind of the original programmer if you're using the same compiler and same warning level as they were).
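A minimal sketch of that warning behaviour (assuming an LP64 platform such as 64-bit Linux, where int is 32 bits and long matches the pointer width):
#include <cstdio>

int main() {
    void *v = nullptr;

    // int bad = (int)v;   // warning/error: cast loses precision
    long l = (long)v;      // fine here: long is as wide as void*
    int i = (int)l;        // explicit truncation, so no warning

    // Combined, this is exactly the construct from the question:
    int j = (int)(long)v;

    std::printf("%d %d\n", i, j);
    return 0;
}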

It does eeevil.
On most architectures, a pointer can be considered to be just another kind of number. On most architectures, long is as many bits as a pointer, so there is a 1-to-1 map between long values and pointers. But violations, especially of the second rule, are not uncommon!
long(v) is an alias for reinterpret_cast<long>(v), which carries no guarantees. Not really fit for any purpose, unless your ABI spec says otherwise.
However, for whatever reason, whoever wrote that code prefers int to long. So they again cross their fingers and hope that no essential information is thrown away in the bits that may be lost in the long to int cast.
Two uses of this are creating a unique object identifier, or trying to somehow package the pointer for some kind of arithmetic otherwise unsupported by pointers.
An opaque identifier can be a void*, so casting to integral type is unnecessary.
"Extracting" an integer from a pointer (for e.g. a division operation) can always be done by subtracting a base pointer to obtain a difference of type ptrdiff_t, which is usually long.

Related

Why is “cast from ‘X*’ to ‘Y’ loses precision” a hard error and what is suitable fix for legacy code

1. Why?
Code like this used to work and it's kind of obvious what it is supposed to mean. Is the compiler even allowed (by the specification) to make it an error?
I know that it's losing precision and I would be happy with a warning. But it still has well-defined semantics (at least the downsizing cast is defined for unsigned types) and the user just might want to do it.
2. Workaround
I have legacy code that I don't want to refactor too much because it's rather tricky and already debugged. It is doing two things:
Sometimes stores integers in pointer variables. The code only casts the pointer to integer if it stored an integer in it before. Therefore while the cast is downsizing, the overflow never happens in reality. The code is tested and works.
When an integer is stored, it always fits in plain old unsigned, so changing the type is not considered a good idea; the pointer is passed around quite a bit, so changing its type would be somewhat invasive.
Uses the address as a hash value. A rather common thing to do. The hash table is not large enough for extending the type to make any sense.
The code uses plain unsigned for the hash value, but note that the more usual type size_t may still generate the error, because there is no guarantee that sizeof(size_t) >= sizeof(void *). On platforms with segmented memory and far pointers, size_t only has to cover the offset part.
So what are the least invasive suitable workarounds? The code is known to work when compiled with compiler that does not produce this error, so I really want to do the operation, not change it.
Notes:
void *x;
int y;
union U { void *p; int i; } u;
*(int*)&x and u.p = x, u.i are not equivalent to (int)x and are not the opposite of (void *)y. On big-endian architectures, the first two will return the bytes at lower addresses, while the latter works on the low-order bytes, which may reside at higher addresses.
*(int*)&x and u.p = x, u.i are both strict aliasing violations, (int)x is not.
C++, 5.2.10:
4 - A pointer can be explicitly converted to any integral type large enough to hold it. [...]
C, 6.3.2.3:
6 - Any pointer type may be converted to an integer type. [...] If the result cannot be represented in the integer type, the behavior is undefined. [...]
So (int) p is illegal if int is 32-bit and void * is 64-bit; a C++ compiler is correct to give you an error, while a C compiler may either give an error on translation or emit a program with undefined behaviour.
You should write, adding a single conversion:
(int) (intptr_t) p
or, using C++ syntax,
static_cast<int>(reinterpret_cast<intptr_t>(p))
If you're converting to an unsigned integer type, convert via uintptr_t instead of intptr_t.
This is a tough one to solve "generically", because the "loses precision" error indicates that your pointers are larger than the type you are trying to store them in. Which may well be "ok" in your mind, but the compiler is concerned that you will be restoring the int value back into a pointer, which has now lost the upper 32 bits (assuming we're talking about 32-bit int and 64-bit pointers; there are other possible combinations).
There is uintptr_t, which is size-compatible with whatever the pointer is on the system, so typically you can overcome the actual error by:
int x = static_cast<int>(reinterpret_cast<uintptr_t>(some_ptr));
This will first force a large integer from a pointer, and then cast the large integer to a smaller type.
Answer for C
Converting pointers to integers is implementation-defined. Your problem is that the code you are talking about seems never to have been correct, and probably only worked on older architectures where both int and pointers are 32 bits.
The only types that are supposed to convert without loss are [u]intptr_t, if they exist on the platform (usually they do). Which part of such a uintptr_t is appropriate to use for your hash function is difficult to tell; you shouldn't make any assumptions about that. I would go for something like
uintptr_t n = (uintptr_t)x;
and then
(((uint64_t)n >> 32) ^ n) & UINT32_MAX
(the intermediate uint64_t keeps the shift well-defined even where uintptr_t is only 32 bits wide; shifting a 32-bit value by 32 would be undefined). This can be optimized out on 32-bit archs, and gives you traces of all the other bits on 64-bit archs.
For C++ basically the same applies; the cast would just be reinterpret_cast<std::uintptr_t>(x).
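Putting the pieces together, a compilable sketch (the mixing step is the illustrative part):
#include <stdint.h>

/* Hash a pointer down to unsigned, valid on 32- and 64-bit targets. */
unsigned hash_ptr(void *x) {
    uintptr_t n = (uintptr_t)x;
    /* Widen to 64 bits so the shift is well-defined even where
       uintptr_t is only 32 bits wide. */
    uint64_t wide = n;
    return (unsigned)((wide >> 32) ^ wide);
}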

Porting 32-bit code to 64-bit

While trying to port 32-bit code to 64-bit, I was wondering if there are some standard rules when it comes to porting?
I have my code compiling in a 64-bit environment and now I come across some errors like
cast from pointer to integer of different size [-Werror=pointer-to-int-cast] for
x = (int32_t)y;
And to get rid of this I use x = (size_t)y; the error goes away, but is this the correct way? Also, in various locations I have to cast a variable to (unsigned long long). For example:
printf("Total Time : %5qu\n", time->compile_time);
This gives an error: format '%qu' expects argument of type 'long long unsigned int', but argument 2 has type (XYZ).
To get this fixed I do something like:
printf("Total Time : %5qu\n", (unsigned long long)time->compile_time);
Again, is this proper?
I think it's safe to assume that y is a pointer in this case.
Instead of size_t you should use intptr_t or uintptr_t.
See size_t vs. uintptr_t.
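A minimal sketch of the guarantee that makes [u]intptr_t the right choice here (assuming <cstdint> provides it):
#include <cstdint>
#include <cassert>

int main() {
    int obj = 42;
    void *p = &obj;
    // uintptr_t (where provided) round-trips a pointer exactly;
    // size_t carries no such guarantee.
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    void *q = reinterpret_cast<void *>(bits);
    assert(p == q);
    return 0;
}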
As for your second cast, it depends on what you mean by proper.
The usual advice is to avoid casting. However, like all things in programming there is a reason that they are available. When working on an implementation of malloc on an embedded system I had to cast pointers to uintptr_t in order to be able to do the necessary arithmetic on them. This code was tested on a 64 bit PC but ran on a 32 bit micro controller. The fact that I used two architectures was the best way to ensure it was somewhat portable code.
Casting, though, makes your code dependent on how the underlying type is defined! Just as you noticed with x = (int32_t)y, that line made your code dependent on the fact that a pointer was 32 bits wide.
The only advice I can give you is to know your types. If you want to cast, it is ok (so long as you can't make your variable of the correct type to begin with) but it may reduce your portability unless you choose the "correct" type to cast to.
The same goes for the printf cast. If I were you, I would read the definition of %5qu thoroughly first (this may help). Then I would attempt to use an appropriately typed variable (or, conversely, a different format string) and only if that failed would I resort to a cast.
I have never used %qu, but I would interpret it as a 64-bit unsigned int, so I would try using uint64_t (because long long is not guaranteed to be 64 bits across all platforms). Although, from what I read on Wikipedia, the q specifier is platform-specific to begin with, so it might be wise to change it.
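For instance, a portable rewrite of that printf line could look like this (a sketch; compile_time is assumed to be, or be convertible to, a 64-bit count):
#include <cinttypes>
#include <cstdio>

int main() {
    uint64_t compile_time = 12345; // stand-in for time->compile_time
    // PRIu64 expands to the right conversion specifier for uint64_t on
    // each platform, so no cast to unsigned long long is needed.
    std::printf("Total Time : %5" PRIu64 "\n", compile_time);
    return 0;
}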
Any more than this and the question becomes too broad (it's good that we stuck to specific examples). If you get stuck, come back with individual types that you want to check and ask questions only about them.
Was it Stroustrup who said they call it a 'cast' because it props up something that's broken? ;-)
x = (int32_t) y;
In this case you are using an exact-width type, so it really depends on what x and y are. The error message suggests that y is a pointer. A pointer is not an int32_t, so the real question is why y is being assigned to x ... it may indicate a potential problem. Casting it away may just cover the problem so that it bites you at run-time rather than compile-time. Figure out what the code thinks it's doing and "re-jigger" the types to fit the code.
The reason the error goes away when you use a (size_t) cast is that the pointer is likely 64 bits and size_t is 64 bits, but you could consider that a simple form of random casting luck. The same is true when casting to (unsigned long long). Don't assume that an int is 32 or 64 bits, and don't use casting as a porting tool ... it will get you in trouble.
It's tough to be more specific based on a single line of code. If you want to post a < 25 line function that has issues, more specific advice may be available.

Is pointer conversion expensive or not?

Is pointer conversion considered expensive (e.g. how many CPU cycles does it take to convert a pointer/address), especially when you have to do it quite frequently? For instance (just an example to show the scale of frequency; I know there are better ways for this particular case):
unsigned long long *x;
/* fill data to x */
for (int i = 0; i < 1000*1000*1000; i++)
{
    A[i] = foo((unsigned char*)x + i);
}
(e.g. how many CPU cycles it takes to convert a pointer/address)
In most machine languages there is only one "type" of pointer, and so it doesn't cost anything to convert between them. Keep in mind that C++ types really only exist at compile time.
The real issue is that this sort of code can break strict aliasing rules. You can read more about this elsewhere, but essentially the compiler will either produce incorrect code through undefined behavior, or be forced to make conservative assumptions and thus produce slower code. (Note that char* and friends are somewhat exempt from the undefined-behavior part.)
Optimizers often have to make conservative assumptions about variables in the presence of pointers. For example, a constant propagation process that knows the value of variable x is 5 would not be able to keep using this information after an assignment to another variable (for example, *y = 10) because it could be that *y is an alias of x. This could be the case after an assignment like y = &x.
As an effect of the assignment to *y, the value of x would be changed as well, so propagating the information that x is 5 to the statements following *y = 10 would be potentially wrong (if *y is indeed an alias of x). However, if we have information about pointers, the constant propagation process could make a query like: can x be an alias of *y? Then, if the answer is no, x = 5 can be propagated safely.
Another optimization impacted by aliasing is code reordering. If the compiler decides that x is not aliased by *y, then code that uses or changes the value of x can be moved before the assignment *y = 10, if this would improve scheduling or enable more loop optimizations to be carried out.
To enable such optimizations in a predictable manner, the ISO standard for the C programming language (including its newer C99 edition, see section 6.5, paragraph 7) specifies that it is illegal (with some exceptions) for pointers of different types to reference the same memory location. This rule, known as "strict aliasing", sometimes allows for impressive increases in performance,[1] but has been known to break some otherwise valid code. Several software projects intentionally violate this portion of the C99 standard. For example, Python 2.x did so to implement reference counting,[2] and required changes to the basic object structs in Python 3 to enable this optimisation. The Linux kernel does this because strict aliasing causes problems with optimization of inlined code.[3] In such cases, when compiled with gcc, the option -fno-strict-aliasing is invoked to prevent unwanted optimizations that could yield unexpected code.
http://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts_with_optimization
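A minimal sketch of the aliasing assumption the quoted passage describes (names are illustrative):
// Under strict aliasing, a write through a float* cannot modify an int,
// so the compiler may fold 'return x;' to 'return 5;' without reloading x.
int x = 5;

int demo(float *y) {
    x = 5;
    *y = 10.0f; // assumed NOT to alias x (different type)
    return x;   // may be constant-folded to 5
}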
What is the strict aliasing rule?
On any architecture you're likely to encounter, all pointer types have the same representation, and so conversion between different pointer types representing the same address has no run-time cost. This applies to all pointer conversions in C.
In C++, some pointer conversions have a cost and some don't:
reinterpret_cast and const_cast (or an equivalent C-style cast, such as the one in the question) and conversion to or from void* will simply reinterpret the pointer value, with no cost.
Conversion between pointer-to-base-class and pointer-to-derived class (either implicitly, or with static_cast or an equivalent C-style cast) may require adding a fixed offset to the pointer value if there are multiple base classes.
dynamic_cast will do a non-trivial amount of work to look up the pointer value based on the dynamic type of the object pointed to.
Historically, some architectures (e.g. PDP-10) had different representations for pointer-to-byte and pointer-to-word; there may be some runtime cost for conversions there.
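A sketch of the derived-to-base adjustment mentioned above (illustrative types):
#include <cstdio>

struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };

int main() {
    C obj{};
    C *pc = &obj;
    // Converting to the second base class typically adds sizeof(A) to
    // the address, so this static_cast can cost one add instruction.
    B *pb = static_cast<B *>(pc);
    std::printf("pc=%p pb=%p\n", (void *)pc, (void *)pb);
    return 0;
}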
unsigned long long *x;
/* fill data to x */
for (int i = 0; i < 1000*1000*1000; i++)
{
    A[i] = foo((unsigned char*)x + i); // bad cast
}
Remember, the machine only knows memory addresses, data and code. Everything else (such as types) is known only to the compiler, which aids the programmer, does all the pointer arithmetic, and alone knows the size of each type, and so on.
At runtime, no machine cycles are wasted converting one pointer type to another, because the conversion does not happen at runtime. All pointers are treated as 4 bytes long (on a 32-bit machine), nothing more and nothing less.
It all depends on your underlying hardware.
On most machine architectures, all pointers are byte pointers, and converting from one byte pointer to another is a no-op. On some architectures, a pointer conversion may under some circumstances require extra manipulation (there are machines that work with word-based addresses, for instance, and converting a word pointer to a byte pointer or vice versa requires extra manipulation).
Moreover, this is in general an unsafe technique, as the compiler can't perform any sanity checking on what you are doing, and you can end up overwriting data you didn't expect.

reinterpret_cast cast cost

My understanding is that C++ reinterpret_cast and the C pointer cast are just compile-time functionality, with no performance cost at all.
Is this true?
It's a good assumption to start with. However, the optimizer may be restricted in what it can assume in the presence of a reinterpret_cast<> or C pointer cast. Then, even though the cast itself has no associated instructions, the resulting code is slower.
For instance, if you cast an int to a pointer, the optimizer likely will have no idea what that pointer could be pointing to. As a result, it probably has to assume that a write through that pointer can change any variable. That defeats very common optimizations such as keeping variables in registers.
That's right. There is no cost other than any gain/loss in performance from performing instructions at the new width, which, I might add, is only a concern in rare cases. Casting between pointers on every platform I've ever heard of has zero cost, and no performance change whatsoever.
C-style casts in C++ will attempt a static_cast first and only perform a reinterpret_cast if a static_cast cannot be performed. A static_cast may change the value of the pointer in the case of multiple inheritance (or when casting an interface to a concrete type); this offset calculation may involve an extra machine instruction. It will be at most one machine instruction, so really very small.
Yes, this is true. The cast that does have a runtime cost is dynamic_cast.
You're right, but think about it: reinterpret_cast means either a questionable design or that you're doing something very low-level.
dynamic_cast, on the other hand, will cost you something, because it has to consult a lookup table at runtime.
reinterpret_cast does not incur runtime cost... however you have to be careful, as every use of reinterpret_cast is implementation-defined. For example, it is possible that reinterpreting a char array as an int array could cause the target architecture to raise an interrupt, because different types may have different alignment rules.
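A sketch of that alignment hazard (whether it actually traps is platform-specific):
int main() {
    alignas(int) char buf[2 * sizeof(int)] = {};
    // buf + 1 is misaligned for int on most platforms. Dereferencing
    // through this pointer is undefined behaviour and may raise a
    // hardware trap on strict architectures.
    int *p = reinterpret_cast<int *>(buf + 1);
    (void)p; // deliberately not dereferenced here
    return 0;
}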
Get correct first, then worry about efficiency.
I was looking at my assembler code before and after reinterpret-casting a signed char as unsigned char. The code grew by three or four instructions.
int main()
{
    signed char i = 0x80;
    (unsigned char&)i >>= 7;
    return i;
}
I was casting to unsigned char to make the compiler use the SHR instruction rather than SAR, so that the newly shifted-in bits would be zeros instead of copies of i's sign bit.
The compiler still seems to always use the SAR instruction, but the reinterpret cast made the compiler add more instructions. Three to four more instructions!
I was concerned about why my Unicode function for converting UTF-8 to UTF-16 strings was almost 3 times slower than Win32's MultiByteToWideChar(). Now I am worried that casting is one of the main factors.
Which is IRONIC, as we use reinterpret_cast for speed.
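For what it's worth, converting the value instead of reinterpreting the lvalue is the usual way to request a logical shift (a sketch):
#include <cstdio>

int main() {
    signed char i = (signed char)0x80;
    // Converting the value to unsigned char makes the shifted operand
    // non-negative after promotion, so zeros are shifted in.
    i = (signed char)((unsigned char)i >> 7);
    std::printf("%d\n", i); // prints 1
    return 0;
}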

How should I handle "cast from ‘void*’ to ‘int’ loses precision" when compiling 32-bit code on 64-bit machine?

I have a package that compiles and works fine on a 32-bit machine. I am now trying to get it to compile on a 64-bit machine and find the following error-
error: cast from ‘void*’ to ‘int’ loses precision
Is there a compiler flag to suppress these errors? Or do I have to manually edit these files to avoid these casts?
The issue is that, on 32-bit platforms, an int (which is a 32-bit integer) will hold a pointer value.
When you move to 64-bit, you can no longer store a pointer in an int; it isn't large enough to hold a 64-bit pointer. The intptr_t type is designed for this.
Your code is broken. It won't become any less broken by ignoring the warnings the compiler gives you.
What do you think will happen when you try to store a 64-bit wide pointer into a 32-bit integer? Half your data will get thrown away. I can't imagine many cases where that is the correct thing to do, or where it won't cause errors.
Fix your code. Or stay on the 32-bit platform that the code currently works on.
If your compiler defines intptr_t or uintptr_t, use those, as they are integer types guaranteed to be large enough to store a pointer.
If those types are not available, size_t or ptrdiff_t are also large enough to hold a pointer on most (not all) platforms. Or use long (typically 64-bit on 64-bit platforms with GCC) or long long (a C99 type which most, but not all, compilers support in C++), or some other implementation-defined integral type that is at least 64 bits wide on a 64-bit platform.
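If you do fall back on one of those types, a compile-time check (assuming C++11 static_assert) documents the assumption:
// Fails to compile on any platform where long cannot hold a pointer
// (e.g. 64-bit Windows, where long is 32 bits).
static_assert(sizeof(long) >= sizeof(void *),
              "long is too narrow to hold a pointer on this platform");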
My guess is that the OP's situation is a void* being used as general storage for an int, where the void* is larger than the int. So e.g.:
int i = 123;
void *v = (void*)i; // 64-bit void* being (ab)used to store a 32-bit value
[..]
int i2 = (int)v; // we want our 32 bits of the 64-bit void* back
The compiler doesn't like that last line.
I'm not going to weigh in on whether it's right or wrong to abuse a void* this way. If you really want to fool the compiler, the following technique seems to work, even with -Wall:
int i2 = *((int*)&v);
Here it takes the address of v, converts that address to a pointer to the datatype you want, then follows the pointer. Note that this retrieves the low-order half of the pointer only on little-endian machines, and it is a strict-aliasing violation; see the Notes further up.
It's an error for a reason: int is only half as big as void* on your machine, so you can't just store a void* in an int. You would lose half of the pointer, and when the program later tries to get the pointer out of that int again, it won't get anything useful.
Even if the compiler didn't give an error, the code most likely wouldn't work. The code needs to be changed and reviewed for 64-bit compatibility.
Casting a pointer to an int is horrible from a portability perspective. The size of int is defined by the mix of compiler and architecture. This is why the stdint.h header was created, to allow you to explicitly state the size of the type you're using across many different platforms with many different word sizes.
You'd be better off casting to a uintptr_t or intptr_t (from stdint.h, and choose the one that best matches the signedness you need).
You can try to use intptr_t for best portability instead of int where pointer casts are required, such as callbacks.
You do not want to suppress these errors because most likely, they are indicating a problem with the code logic.
If you suppress the errors, this could even work for a while. While the pointer points to an address in the first 4 GB, the upper 32 bits will be 0 and you won't lose any data. But once you get an address above 4 GB, your code will start 'mysteriously' not working.
What you should do is modify any int that can hold a pointer to intptr_t.
You have to manually edit those files in order to replace them with code that isn't likely to be buggy and nonportable.
Suppressing the warnings is a bad idea, but there may be a compiler flag to use 64-bit ints, depending on your compiler and architecture, and that would be a safe way to fix the problem (assuming, of course, that the code didn't also assume ints are 32-bit). Note, though, that gcc's -m64 is not such a flag: it selects the LP64 data model, where int remains 32 bits wide.
The best answer is still to fix the code, I suppose, but if it's legacy third-party code and these warnings are rampant, I can't see this refactoring as being a very efficient use of your time. Definitely don't cast pointers to ints in any of your new code, though.
As defined by the C++ standard of the day (C++03), there is no integer type which is guaranteed to hold a pointer. Some platforms will have an intptr_t (and C++11 later added it to <cstdint> as an optional type), but it is not a feature you can rely on everywhere. Fundamentally, treating the bits of a pointer as if they were an integer is not a portable thing to do (although it can be made to work on many platforms).
If the reason for the cast is to make the pointer opaque, then void* already achieves this, so the code could use void* instead of int. A typedef might make this a little nicer in the code
typedef void * handle_t;
If the reason for the cast is to do pointer arithmetic with byte granularity, then the best way is probably to cast to a (char const *) and do the math with that.
If the reason for the cast is to achieve compatibility with some existing library (perhaps an older callback interface) which cannot be modified, then I think you need to review the documentation for that library. If the library is capable of supporting the functionality that you require (even on a 64-bit platform), then its documentation may address the intended solution.
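For the byte-granularity arithmetic mentioned above, a minimal sketch:
#include <cstdio>

int main() {
    int arr[4] = {10, 20, 30, 40};
    // Cast once to char const* and do the math in bytes; no round-trip
    // through an integer type is needed.
    char const *bytes = reinterpret_cast<char const *>(arr);
    char const *second = bytes + sizeof(int);
    std::printf("%d\n", *reinterpret_cast<int const *>(second)); // 20
    return 0;
}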
I faced a similar problem. I solved it in the following way (note that a macro cannot be named 64BIT, since preprocessor identifiers may not start with a digit; ENV64BIT here stands for whatever 64-bit flag your build defines):
#ifdef ENV64BIT
typedef uint64_t tulong;
#else
typedef uint32_t tulong;
#endif
void *ptr = NULL; // whatever you want to keep in it
int i;
i = (int)(tulong)ptr;
I think the problem is typecasting a pointer directly to a shorter data type; casting a larger integer type down to int works fine.
I converted the problem from typecasting a pointer to long into typecasting a 64-bit integer to a 32-bit integer, and it worked fine. I am still searching for a compiler option for this in GCC/Clang.
Sometimes it is sensible to want to split a 64-bit item into two 32-bit items. This is how you would do it:
Header file:
//You only need this if you haven't got a definition of UInt32 from somewhere else
typedef unsigned int UInt32;
//x, when cast, points to the lower 32 bits (assuming a little-endian layout)
#define LO_32(x) (*( (UInt32 *) &x))
//address like an array to get to the higher bits (position 1 on little-endian)
#define HI_32(x) (*( ( (UInt32 *) &x) + 1))
Source file:
//Wherever your pointer points to
void *ptr = PTR_LOCATION;
//32-bit UInt containing the upper bits
UInt32 upper_half = HI_32(ptr);
//32-bit UInt containing the lower bits
UInt32 lower_half = LO_32(ptr);
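Note that these macros assume a little-endian layout and read the pointer through an incompatible lvalue (a strict-aliasing violation). A shift-based split (a sketch) avoids both issues:
#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
    int dummy = 0;
    void *ptr = &dummy; // stand-in for PTR_LOCATION

    // Go through uintptr_t, then widen so the shift is well-defined.
    std::uint64_t bits = (std::uint64_t)(std::uintptr_t)ptr;

    // Shifting and masking are endianness-independent.
    std::uint32_t upper_half = (std::uint32_t)(bits >> 32);
    std::uint32_t lower_half = (std::uint32_t)(bits & 0xFFFFFFFFu);

    std::printf("%08" PRIx32 " %08" PRIx32 "\n", upper_half, lower_half);
    return 0;
}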