I am converting a large project of multiple C++ applications from 32-bit MFC to 64-bit. These applications will have to compile as both 32-bit and 64-bit for the foreseeable future.
I've run across the types INT_PTR and UINT_PTR, and I have two questions.
Is it considered best practice to use these types as a "default" type for general integer purposes, such as loop counters, etc?
I understand that the size of these types is tied to the pointer size of the platform you are compiling for, but it seems confusing to use them as general-purpose integers. For example, in for (INT_PTR i = 0; i < 10; i++) ...; the variable i isn't a pointer or pointer-related, so the name of the type is confusing to me. Are there better predefined types to use in this situation, or should I make my own?
My compiler is VS2010.
Thanks
INT_PTR and similar have a very specific use-case: They model a data type that is large enough and with appropriate alignment requirements to hold an integer or a pointer type. It is used in situations where a method has parameters or a return type that is either an integral data type or a pointer (like GetWindowLongPtr).
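To make that use case concrete, here is a minimal portable sketch in the spirit of SetWindowLongPtr/GetWindowLongPtr: a single slot that may carry either a plain integer or a pointer. It uses std::intptr_t as the standard counterpart of INT_PTR; the names UserDataSlot, WidgetState and the store/load helpers are hypothetical, not part of any Windows API.

```cpp
#include <cstdint>

// Hypothetical slot that, like a window's "user data", holds either an
// integer or a pointer. std::intptr_t plays the role of INT_PTR / LONG_PTR.
struct UserDataSlot {
    std::intptr_t value = 0;
};

struct WidgetState { int clicks = 0; };  // hypothetical payload type

// A pointer converted to a large-enough integer and back yields the original pointer.
void storePointer(UserDataSlot& slot, WidgetState* state) {
    slot.value = reinterpret_cast<std::intptr_t>(state);
}

WidgetState* loadPointer(const UserDataSlot& slot) {
    return reinterpret_cast<WidgetState*>(slot.value);
}

// The same slot can carry a plain integer instead.
void storeInteger(UserDataSlot& slot, int n) { slot.value = n; }
int loadInteger(const UserDataSlot& slot) { return static_cast<int>(slot.value); }
```

Nothing here is a loop counter: the type exists because the slot must be wide enough for a pointer on both 32- and 64-bit builds.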
For loops there is no generic advice other than this: Use the most appropriate type. If your loop index is used to index into a container, make it a size_t. If your loop index is an integer that runs across all values in between x and y, use the type of x and y, or one that is compatible with both. An INT_PTR is not really an appropriate loop index data type.
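A minimal sketch of that advice (function names are hypothetical): a container index uses std::size_t, and a loop over a plain numeric range uses the type of its bounds. Neither loop needs INT_PTR.

```cpp
#include <cstddef>
#include <vector>

// Index into a container: std::size_t (or std::vector<int>::size_type).
long long sumAll(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
        total += values[i];
    return total;
}

// Walk a plain numeric range: use the type of the bounds (here int).
int sumRange(int first, int last) {
    int total = 0;
    for (int i = first; i <= last; ++i)
        total += i;
    return total;
}
```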
You might want to take a look at this: http://msdn.microsoft.com/en-us/library/windows/desktop/aa383751%28v=vs.85%29.aspx/ "Windows Data Types"
You might be a bit confused about the purpose of this type. It's there to ensure that if you cast a pointer to an integer (which you probably shouldn't do anyway), you have an integer type that fits. Windows is a bit odd in that int, and even long, are still 32 bits when compiling for 64 bits (the LLP64 model), while pointers grow to 64 bits. If you need an int of a specific size, I'd use the exact-width types (in stdint.h).
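As an illustration of when an exact width matters, here is a sketch of a hypothetical checksum that should produce the same value in 32- and 64-bit builds. The accumulator is std::uint32_t so it wraps modulo 2^32 everywhere, while the index is std::size_t because it walks memory. The function name and the multiplier 31 are arbitrary choices for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical checksum whose result must not depend on the build's bitness.
std::uint32_t checksum(const unsigned char* data, std::size_t len) {
    std::uint32_t sum = 0;                 // exact width: wraps modulo 2^32 on every platform
    for (std::size_t i = 0; i < len; ++i)  // size_t: it indexes memory
        sum = sum * 31u + data[i];
    return sum;
}
```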
If your loop is not related to pointers (that is, it is not used as an index into an array, a vector or anything like that), use int whenever a generic integer type is required. It's a good idea to make sure your code doesn't depend on the size of int (although with MS VS, int is 32 bits on both platforms).
Use size_t, vector<...>::size_type or other appropriate type for index-related loops.
You may use the data type intptr_t. It's a platform-specific type: it holds 4-byte values on a 32-bit platform and 8-byte values on a 64-bit platform.
Related
I'm struggling to understand the usefulness of the C++ std::size_t data type. I realize that this data type is platform dependent, and is supposedly meant to make code more portable. However, it seems like it doesn't solve all the problems.
Say, for example, I'm working on a machine that has a 32-bit int. Let's say I decide to write a C-style function on this machine that just copies the bytes from one object to another. Inside this function, memcpy is used to write the data from object2 to object1. I've chosen an arbitrarily large byte count.
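#include <cstring>  // std::memcpy
struct obj;         // assumed: some object type at least 1048575 bytes in size

void writeBytes(obj *pobj1, obj *pobj2)
{
    std::memcpy(pobj1, pobj2, 1048575);  // copy 1048575 bytes from *pobj2 into *pobj1
}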
This code should (hopefully) compile just fine. Because memcpy uses size_t in its declaration, and because size_t on this platform should be 32 bits, the value 1048575 should work just fine.
But now let's say I decide to port this function to a machine that has 16-bit ints. Now memcpy's size_t parameter is only 16 bits wide, so 1048575 exceeds the range of values memcpy was declared for. The code then fails to compile.
So my question: how exactly was size_t useful in this case? How did it make our code more portable?
size_t is able to hold the size of the largest object you can create. It is not required to be your platform's largest native integer type.
Given your example, your code would work regardless of the native integer being 16-bit or 32-bit if your platform allows 1048575-byte objects. Or the inverse -- if 1048575 doesn't fit in a size_t, you never could have created an object that large to memcpy.
size_t is subject to the same overflow and underflow rules as any other integral type. It's only a typedef for an unsigned integral type. Its purpose is not to prevent you from assigning or casting values that are outside of its range. It defines a standard type that will:
"store the maximum size of a theoretically possible object of any type (including array)."
If you care about that maximum size, use
std::numeric_limits<std::size_t>::max()
and make decisions off of that.
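A sketch of making such a decision (the helper names fitsInSizeT and copyBytes are hypothetical): a byte count that arrives in a wider type is checked against std::numeric_limits<std::size_t>::max() before it is handed to memcpy, instead of being allowed to wrap silently.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// True if a byte count held in a wider type is representable as std::size_t here.
bool fitsInSizeT(std::uint64_t n) {
    return n <= std::numeric_limits<std::size_t>::max();
}

// Hypothetical copy helper: refuses counts that size_t cannot represent.
bool copyBytes(void* dst, const void* src, std::uint64_t n) {
    if (!fitsInSizeT(n))
        return false;
    std::memcpy(dst, src, static_cast<std::size_t>(n));
    return true;
}
```

On a platform with a 16-bit size_t, copyBytes(dst, src, 1048575) would return false instead of copying a truncated count.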
I think size_t makes (not only your) code more readable and consistent, not necessarily more portable.
It will help porting it, however.
Imagine that functions working with sizes use a happy mixture of int, short, long, unsigned, unsigned short, ...
When one such function dealing with a size calls another that needs a size parameter too, it's very helpful when one size_t fits all (or at least most, like for the result of read()).
I learned this today at my workplace, and I read several related questions before posting mine.
Here's what my senior co-worker told me:
You cannot assign void* to UINT or unsigned int. It won’t work for 64 bit.
But why? Is it because void* and unsigned int carry different sizes on different architectures (as mentioned in other questions), or something else?
Yes, that is the case.
Your implementation may provide the optional type uintptr_t, however, which is defined as follows:
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t
The signed counterpart intptr_t may also be available. These types are available in the <cstdint> header.
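A small sketch of the quoted guarantee (the helper names toInteger and toPointer are hypothetical). The static_assert documents the property that unsigned int lacks on 64-bit Windows; std::uintptr_t is optional in the standard, but mainstream compilers such as MSVC (from VS2010), GCC and Clang provide it on common targets.

```cpp
#include <cstdint>

// void* -> std::uintptr_t -> void* compares equal to the original pointer.
std::uintptr_t toInteger(void* p) { return reinterpret_cast<std::uintptr_t>(p); }
void* toPointer(std::uintptr_t n) { return reinterpret_cast<void*>(n); }

// Unlike unsigned int, std::uintptr_t is wide enough on both 32- and 64-bit targets.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*), "uintptr_t must hold a void*");
```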
By choosing to use these types, you are acknowledging that your code will only compile with the subset of implementations that provide these types on your target machines.
Size is obviously a show-stopper: if a void* can't physically fit in an unsigned int, the game is over. But even if sizeof(void*) == sizeof(unsigned int), you have a type compatibility problem: one holds a data pointer, the other data. You'd have to reinterpret_cast<>() one to the other, and all bets are off as to how well this would work.
You're essentially correct: It is not guaranteed that an unsigned int has the same machine word length as a void*, so you can't cast between them without losing information. Here is an excellent FAQ answer about it.
The main thing to keep in mind is that void* is an arbitrary data pointer, not a truly arbitrary pointer. In fact, there is no such thing as a truly generic pointer: certain machines may have different address spaces for programs and data, for example, so the sizes of pointers to each might differ. See this SO answer for more info.
Depends on the target for your application. You tagged VC++ and mention type UINT, so it seems that you're building for Windows. In 32-bit Windows the pointer size is 32 bits, while in 64-bit Windows it's 64 bits. However, UINT is 32 bits in both Windows flavors. You can use the MS-specific unsigned __int64 type or the Windows UINT64 typedef instead of UINT to ensure it's big enough for your pointer. You can also use the INT_PTR/UINT_PTR types, which are specifically designed to match the size of a pointer (making them transparent across the 32/64-bit flavors).
See http://msdn.microsoft.com/en-us/library/s3f49ktz.aspx for reference on various data types.
Of course, all of these will make your program not natively portable to other architectures/OSes.
1. Why?
Code like this used to work and it's kind of obvious what it is supposed to mean. Is the compiler even allowed (by the specification) to make it an error?
I know that it's losing precision, and I would be happy with a warning. But it still has well-defined semantics (at least the unsigned narrowing conversion is defined), and the user just might want to do it.
2. Workaround
I have legacy code that I don't want to refactor too much because it's rather tricky and already debugged. It is doing two things:
Sometimes stores integers in pointer variables. The code only casts the pointer to integer if it stored an integer in it before. Therefore while the cast is downsizing, the overflow never happens in reality. The code is tested and works.
When an integer is stored, it always fits in a plain old unsigned, so changing the type is not considered a good idea; and the pointer is passed around quite a bit, so changing its type would be somewhat invasive.
Uses the address as a hash value, a rather common thing to do. The hash table is not large enough for widening the type to make any sense.
The code uses plain unsigned for hash value, but note that the more usual type of size_t may still generate the error, because there is no guarantee that sizeof(size_t) >= sizeof(void *). On platforms with segmented memory and far pointers, size_t only has to cover the offset part.
So what are the least invasive suitable workarounds? The code is known to work when compiled with compiler that does not produce this error, so I really want to do the operation, not change it.
Notes:
void *x;
int y;
union U { void *p; int i; } u;
*(int*)&x and u.p = x, u.i are not equivalent to (int)x and are not the opposite of (void *)y. On big-endian architectures, the first two will return the bytes at lower addresses, while the latter works on the low-order bytes, which may reside at higher addresses.
*(int*)&x and u.p = x, u.i are both strict aliasing violations, (int)x is not.
C++, 5.2.10:
4 - A pointer can be explicitly converted to any integral type large enough to hold it. [...]
C, 6.3.2.3:
6 - Any pointer type may be converted to an integer type. [...] If the result cannot be represented in the integer type, the behavior is undefined. [...]
So (int) p is illegal if int is 32-bit and void * is 64-bit; a C++ compiler is correct to give you an error, while a C compiler may either give an error on translation or emit a program with undefined behaviour.
You should write, adding a single conversion:
(int) (intptr_t) p
or, using C++ syntax,
static_cast<int>(reinterpret_cast<intptr_t>(p))
If you're converting to an unsigned integer type, convert via uintptr_t instead of intptr_t.
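Applied to the legacy pattern in the question (an unsigned that always fits is parked in a void* and read back later), a minimal sketch with hypothetical helper names looks like this. Each step goes through std::uintptr_t and the final narrowing is explicit. Note that the standard only guarantees the pointer-to-integer-to-pointer round trip; integer-to-pointer-to-integer, as used here, is implementation-defined, though it behaves as expected on mainstream flat-memory platforms.

```cpp
#include <cstdint>

// Park an unsigned value inside a void* (legacy pattern; implementation-defined).
void* stashUnsigned(unsigned value) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(value));
}

// Read it back: pointer -> uintptr_t (lossless), then an explicit narrowing to unsigned.
unsigned unstashUnsigned(void* p) {
    return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p));
}
```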
This is a tough one to solve "generically", because the "loses precision" error indicates that your pointers are larger than the type you are trying to store them in. That may well be "ok" in your mind, but the compiler is concerned that you will restore the int value back into a pointer, which would then have lost the upper 32 bits (assuming 32-bit int and 64-bit pointers; there are other possible combinations).
There is uintptr_t, which is size-compatible with whatever the pointer is on the system, so typically you can overcome the actual error with:
int x = static_cast<int>(reinterpret_cast<uintptr_t>(some_ptr));
This will first force a large integer from a pointer, and then cast the large integer to a smaller type.
Answer for C
Converting pointers to integers is implementation-defined. Your problem is that the code you are talking about seems never to have been correct; it probably only worked on ancient architectures where both int and pointers are 32 bits.
The only types that are supposed to convert without loss are [u]intptr_t, if they exist on the platform (usually they do). Which part of such a uintptr_t is appropriate to use for your hash function is difficult to tell; you shouldn't make any assumptions about that. I would go for something like
uintptr_t n = (uintptr_t)x;
and then
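(((uint64_t)n >> 32) ^ n) & UINT32_MAX
Widening n to uint64_t before the shift keeps it well defined even where uintptr_t is only 32 bits wide (shifting a 32-bit value by 32 is undefined behaviour). On such 32-bit archs the high half is zero, so the fold can be optimized out, and on 64-bit archs it gives you traces of all the other bits.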
For C++ basically the same applies; just write the cast as reinterpret_cast<std::uintptr_t>(x).
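Put together in C++, the hashing suggestion might look like the sketch below (pointerHash is a hypothetical name). The value is widened to std::uint64_t before the shift, so the code is also well defined on targets where std::uintptr_t is 32 bits.

```cpp
#include <cstdint>

// Fold a pointer into a 32-bit hash: XOR the high half into the low half.
std::uint32_t pointerHash(const void* x) {
    std::uint64_t n = reinterpret_cast<std::uintptr_t>(x);  // widen first: shift by 32 stays defined
    return static_cast<std::uint32_t>(((n >> 32) ^ n) & UINT32_MAX);
}
```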
I was trying to port 32-bit code to 64-bit, and I was wondering whether there are some standard rules when it comes to porting.
I have my code compiling in a 64bit environment and now I come across some errors like
cast from pointer to integer of different size [-Werror=pointer-to-int-cast] for
x = (int32_t)y;
To get rid of this, I use x = (size_t)y; which removes the error, but is this the correct way? Also, in various locations I have to cast a variable to (unsigned long long). For example,
printf("Total Time : %5qu\n", time->compile_time);
gives the error: format '%qu' expects argument of type 'long long unsigned int', but argument 2 has type (XYZ).
To fix this, I do something like
printf("Total Time : %5qu\n", (unsigned long long) time->compile_time);
Again, is this proper?
I think it's safe to assume that y is a pointer in this case.
Instead of size_t you should use intptr_t or uintptr_t.
See size_t vs. uintptr_t.
As for your second cast, it depends on what you mean by "proper".
The usual advice is to avoid casting. However, like all things in programming, there is a reason casts are available. When working on an implementation of malloc on an embedded system, I had to cast pointers to uintptr_t in order to do the necessary arithmetic on them. This code was tested on a 64-bit PC but ran on a 32-bit microcontroller. Using two architectures was the best way to ensure the code was somewhat portable.
Casting, though, makes your code dependent on how the underlying type is defined! As you noticed, your x = (int32_t)y line made your code dependent on a pointer being 32 bits wide.
The only advice I can give you is to know your types. If you want to cast, it is ok (so long as you can't make your variable of the correct type to begin with) but it may reduce your portability unless you choose the "correct" type to cast to.
The same goes for the printf cast. If I were you, I would read the definition of %5qu thoroughly first (this may help). Then I would attempt to use an appropriately typed variable (or conversely a different format string), and only if that failed would I resort to a cast.
I have never used %qu, but I would interpret it as a 64-bit unsigned int, so I would try using uint64_t (because long long is only guaranteed to be at least 64 bits, not exactly 64, across all platforms). Although, from what I read on Wikipedia, the q specifier is platform-specific to begin with, so it might be wise to change it.
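One way to follow that advice without casting is to give the field an exact-width type and use the matching PRIu64 macro from <cinttypes>, which expands to the correct length modifier for uint64_t on each platform. The Timings struct and formatTotalTime function below are hypothetical stand-ins for the question's time->compile_time.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the question's struct, with an exact-width field.
struct Timings {
    std::uint64_t compile_time;
};

std::string formatTotalTime(const Timings& t) {
    char buf[64];
    // PRIu64 supplies the right conversion for uint64_t, so no cast is needed.
    std::snprintf(buf, sizeof buf, "Total Time : %5" PRIu64 "\n", t.compile_time);
    return buf;
}
```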
Any more than this and the question becomes too broad (it's good that we stuck to specific examples). If you get stuck, come back with individual types that you want to check and ask questions only about them.
Was it Stroustrup that said they call it a 'cast' because it props up something that's broken ? ;-)
x = (int32_t) y;
In this case you are using an exact-width type, so it really depends on what x and y are. The error message suggests that y is a pointer. A pointer is not an int32_t, so the real question is why y is being assigned to x; it may indicate a potential problem. Casting it away may just cover up the problem so that it bites you at run time rather than at compile time. Figure out what the code thinks it's doing and "re-jigger" the types to fit the code. The reason the error goes away when you use a (size_t) cast is likely that the pointer is 64 bits and size_t is 64 bits, but you could consider that a simple form of random casting luck. The same is true when casting to (unsigned long long). Don't assume that an int is 32 or 64 bits, and don't use casting as a porting tool; it will get you in trouble. It's tough to be more specific based on a single line of code. If you post a function of fewer than 25 lines that has issues, more specific advice may be available.
I have an API like this:
class IoType {
......
StatusType writeBytes(......, size_t& bytesWritten);
StatusType writeObjects(......, size_t& objsWritten);
};
A senior member of the team whom I respect seems to have a problem with the type size_t and suggests that I use C99 types. I know it sounds stupid, but I always think C99 types like uint32_t and uint64_t look ugly. I do use them, but only when it's really necessary, for instance when I need to serialize/deserialize a structure and want to be specific about the sizes of my data members.
What are the arguments against using size_t? I know it's not a distinct type (it's just a typedef), but I know for sure that even a 32-bit integer is enough for me, and a size type seems appropriate for a number of bytes, a number of objects, etc.
Use exact-size types like uint32_t whenever you're dealing with serialization of any sort (binary files, networking, etc.). Use size_t whenever you're dealing with the size of an object in memory; that's what it's intended for. All of the functions that deal with object sizes, like malloc and strlen, as well as the sizeof operator, use size_t.
If you use size_t correctly, your program will be maximally portable, and it will not waste time and memory on platforms where it doesn't need to. On 32-bit platforms, a size_t will be 32 bits—if you instead used a uint64_t, you'd waste time and space. Conversely, on 64-bit platforms, a size_t will be 64 bits—if you instead used a uint32_t, your program could behave incorrectly (maybe even crash or open up a security vulnerability) if it ever had to deal with a piece of memory larger than 4 GB.
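A sketch of that split (appendCount and readCount are hypothetical, and the 32-bit little-endian field is an assumed wire format chosen for illustration): the count is a std::size_t in memory and becomes a std::uint32_t only at the serialization boundary, with an explicit range check instead of a silent truncation.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// In memory: size_t. On the wire: an assumed fixed 32-bit little-endian field.
void appendCount(std::vector<unsigned char>& out, std::size_t count) {
    if (count > UINT32_MAX)
        throw std::length_error("count does not fit the 32-bit wire field");
    const std::uint32_t wire = static_cast<std::uint32_t>(count);
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((wire >> shift) & 0xFFu));
}

// Reads the 4-byte field back into a size_t.
std::size_t readCount(const unsigned char* in) {
    std::uint32_t wire = 0;
    for (int i = 0; i < 4; ++i)
        wire |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return wire;
}
```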
I can't think of anything wrong with using size_t in contexts where you don't need to serialize values. Also, using size_t correctly will increase the code's safety/portability across 32- and 64-bit platforms.
Uhm, it's not a good idea to replace size_t (a maximally portable thing) with a less portable C99 fixed size or minimum size unsigned type.
On the other hand, you can avoid a lot of technical problems (wasted time) by using the signed ptrdiff_t type instead. The standard library’s use of unsigned type is just for historical reasons. It made sense in its day, and even today on 16-bit architectures, but generally it is nothing but trouble & verbosity.
Making that change requires some support, though, in particular a general size function that returns array or container size as ptrdiff_t.
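Such a general size function could be sketched as below; signedSize is a hypothetical name (C++20 later standardized the same idea as std::ssize). The array overload is chosen for built-in arrays because it is more specialized.

```cpp
#include <cstddef>
#include <string>

// Hypothetical size function returning a signed count for containers.
template <class Container>
std::ptrdiff_t signedSize(const Container& c) {
    return static_cast<std::ptrdiff_t>(c.size());
}

// Overload for built-in arrays.
template <class T, std::size_t N>
std::ptrdiff_t signedSize(const T (&)[N]) {
    return static_cast<std::ptrdiff_t>(N);
}
```

With a signed result, signedSize(std::string("ah")) < -5 is false, as one would expect, unlike the unsigned comparison discussed below.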
Now, regarding your function signature
StatusType writeBytes(......, size_t& bytesWritten);
This forces the calling code’s choice of type for the bytes written count.
And then, with unsigned type size_t forced, it is easy to introduce a bug, e.g. by checking if that is less or more than some computed quantity.
A grotesque example: std::string("ah").length() < -5 is guaranteed true.
So instead, make that …
Size writeBytes(......);
or, if you do not want to use exceptions,
Size writeBytes(......, StatusType& status );
It is OK to have an enumeration of possible statuses as unsigned type or as whatever, because the only operations on status values will be equality checking and possibly as keys.
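A sketch of the status-out-parameter variant, under stated assumptions: Size is taken to be std::ptrdiff_t, StatusType is a hypothetical two-value enumeration, and the elided "......" parameters are replaced by a hypothetical fixed-capacity ByteSink plus a data pointer and length. The count comes back as a signed value, so callers can compare it against computed quantities without unsigned surprises.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Size = std::ptrdiff_t;  // assumed signed count type

enum class StatusType { Ok, Full };  // hypothetical status codes

// Hypothetical fixed-capacity sink standing in for the elided parameters.
struct ByteSink {
    std::vector<unsigned char> buffer;
    Size capacity;
};

// Returns how many bytes were written; status reports whether all of them fit.
// A negative length or exhausted capacity clamps to zero bytes written.
Size writeBytes(ByteSink& sink, const unsigned char* data, Size length, StatusType& status) {
    const Size room = std::max(Size{0}, sink.capacity - static_cast<Size>(sink.buffer.size()));
    const Size n = std::min(std::max(Size{0}, length), room);
    sink.buffer.insert(sink.buffer.end(), data, data + n);
    status = (n == length) ? StatusType::Ok : StatusType::Full;
    return n;
}
```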