I've recently tried to really come to grips with references and pointers in C++, and I'm getting a little bit confused. I understand the * and & operators which can respectively get the value at an address and get the address of a value, however why can't these simply be used with basic types like ints?
I don't understand why you can't, for example, do something like the following and not use any weird pointer variable creation:
string x = "Hello";
int y = &x; //Set 'y' to the memory address of 'x'
cout << *y; //Output the value at the address 'y' (which is the memory address of 'x')
The code above should, theoretically in my mind, output the value of 'x': 'y' contains the memory address of 'x', and hence '*y' should be 'x'. But it doesn't work -- on trying to compile it, the compiler tells me it can't convert from a string to an int, which doesn't make much sense to me, since you'd think a memory address could be stored in an int fine.
Why do we need to use special pointer variable declarations (e.g. string *y = &x)?
And, confusingly, if we take the * operator in that pointer declaration literally, we appear to be setting '*y' (the value at the address) to the memory address of 'x', yet later, when we want to access the value stored at that address, we write the very same '*y'.
C and C++ resolve type information at compile-time, not runtime. Even runtime polymorphism relies on the compiler constructing a table of function pointers with offsets fixed at compile time.
For that reason, the only way the program can know that cout << *y; is printing a string is because y is strongly typed as a pointer-to-string (std::string*). The program cannot, from the address alone, determine that the object stored at address y is a std::string. (Even C++ RTTI does not allow this; you need enough type information to identify a polymorphic base class.)
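For illustration, a minimal sketch of the question's code with the pointer properly typed, so the compiler can resolve everything at compile time:

#include <iostream>
#include <string>

int main() {
    std::string x = "Hello";
    std::string* y = &x;      // static type: pointer to std::string
    std::cout << *y << '\n';  // the type of *y is known, so the string overload of << is chosen
}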
In short, C is a typed language. You cannot store arbitrary things in variables.
Check the type safety article at Wikipedia. C/C++ prevents problematic operations and function calls at compilation time by checking the types of the operands and function parameters (but note that with explicit casts you can change the type of an expression).
It doesn't make sense to store a string in an integer, in the same way it doesn't make sense to store a pointer in one.
Simply put, a memory address has a type, which is pointer. Pointers are not ints, so you can't store a pointer in an int variable. If you're curious why ints and pointers are not fungible, it's because the size of each is implementation defined (with certain restrictions) and there is no guarantee that they will be the same size.
For instance, as @Damien_The_Unbeliever pointed out, pointers on a 64-bit system must be 64 bits long, but it is perfectly legal for an int to be 32 bits, as long as it is no longer than a long and no shorter than a short.
As to why each data type has its own pointer type: each type (especially user-defined types) is structured differently in memory. If we were to dereference typeless (or void) pointers, there would be no information indicating how that data should be interpreted. If, on the other hand, you were to create a universal pointer and do away with the "inconvenience" of specifying types, each entity in memory would probably have to be stored alongside its type information. While this is doable, it's far from efficient, and efficiency is one of C++'s design goals.
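A small sketch of that point (my own illustration): a void* can store any object's address, but it cannot be dereferenced until the type is restated:

#include <iostream>
#include <string>

int main() {
    std::string x = "Hello";
    void* p = &x;             // a typeless address: storing it is fine...
    // std::cout << *p;       // ...error: void* carries no information on how to interpret the data
    std::cout << *static_cast<std::string*>(p) << '\n';  // the programmer must restate the type
}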
Some very low-level languages... like machine language... operate exactly as you describe. A number is a number, and it's up to the programmer to hold it in their heads what it represents. Generally speaking, the hope of higher level languages is to keep you from the concerns and potential for error that comes from that style of development.
You can actually disregard C++'s type-safety, at your peril. For instance, the gcc on a 32-bit machine I have will print "Hello" when I run this:
string x = "Hello";
int y = reinterpret_cast<int>(&x);
cout << *reinterpret_cast<string*>(y) << endl;
But as pretty much every other answerer has pointed out, there's no guarantee it would work on another computer. If I try this on a 64-bit machine, I get:
error: cast from ‘std::string*’ to ‘int’ loses precision
Which I can work around by changing it to a long:
string x = "Hello";
long y = reinterpret_cast<long>(&x);
cout << *reinterpret_cast<string*>(y) << endl;
The C++ standard specifies minimums for these types, but not maximums, so you really don't know what you're going to be dealing with when you face a new compiler. See: What does the C++ standard state the size of int, long type to be?
So the potential for writing non-portable code is high once you start going this route and "casting away" the safeties in the language. reinterpret_cast is the most dangerous type of casting...
When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used?
But that's just technically drilling down into the "why not int" part specifically, in case you were interested. Note that, as @BenVoight points out in the comment below, there does exist an integer type as of C99 called intptr_t which is guaranteed to hold any pointer. So there are much larger problems when you throw away type information than losing precision... like accidentally casting back to the wrong type!
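As a hedged sketch, the earlier example rewritten over intptr_t (optional in the standard, but available on mainstream platforms) sidesteps the precision problem on both 32-bit and 64-bit machines:

#include <cstdint>
#include <iostream>
#include <string>

int main() {
    std::string x = "Hello";
    // intptr_t is guaranteed large enough for an object pointer, unlike int or long
    std::intptr_t y = reinterpret_cast<std::intptr_t>(&x);
    std::cout << *reinterpret_cast<std::string*>(y) << '\n';
}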
C++ is a strongly typed language, and pointers and integers are different types. By making those separate types the compiler is able to detect misuses and tell you that what you are doing is incorrect.
At the same time, the pointer type maintains information on the type of the pointed-to object: if you obtain the address of a double, you have to store it in a double*, and the compiler knows that dereferencing that pointer will yield a double. In your example code, int y = &x; cout << *y; the compiler would lose the information of what y points to; the type of the expression *y would be unknown, and it would not be able to determine which of the different overloads of operator<< to call. Compare that with std::string *y = &x; where the compiler, on seeing y, knows it is a std::string* and knows that dereferencing it yields a std::string (and not a double or any other type), enabling the compiler to statically check all expressions that contain y.
Finally, while you may think that a pointer is just the address of the object, and that it should therefore be representable by an integral type (which on 64-bit architectures would have to be int64 rather than int), that is not always the case. There are architectures on which pointers are not really representable by integral values. For example, in architectures with segmented memory, the address of an object can contain both a segment (an integral value) and an offset into the segment (another integral value). On other architectures the size of pointers was different from the size of any integral type.
The language is trying to protect you from conflating two different concepts - even though at the hardware level they are both just sets of bits;
Outside of needing to pass values manually between various parts of a debugger, you never need to know the numerical value.
Outside of archaic uses of arrays, it doesn't make sense to "add 10" to a pointer - so you shouldn't treat them as numeric values.
By the compiler retaining type information, it also prevents you from making mistakes - if all pointers were equal, then the compiler couldn't, helpfully, point out that what you're trying to dereference as an int is a pointer to a string.
As a part of hashing, I need to convert a function pointer to a string representation. With global/static functions it's trivial:
string s1{ to_string(reinterpret_cast<uintptr_t>(&global)) };
And from here:
2) Any pointer can be converted to any integral type large enough to hold the value of the pointer (e.g. to std::uintptr_t)
But I have a problem with member functions:
cout << &MyStruct::member;
outputs 1, though in the debugger I can see the address.
string s{ to_string(reinterpret_cast<uintptr_t>(&MyStruct::member)) };
Gives a compile-time error: cannot convert. So it seems that not any pointer can be converted.
What else can I do to get a string representation?
cout << &MyStruct::member;
outputs 1, though in the debugger I can see the address.
There is no overload ostream::operator<<(decltype(&MyStruct::member)). However, a member function pointer is implicitly convertible to bool, for which an overload does exist, and that is the best match in overload resolution. The converted value is true if the pointer is not null, and true is output as 1.
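A minimal sketch reproducing that behaviour (the MyStruct definition is a stand-in, since the question doesn't show it):

#include <iostream>

struct MyStruct {
    void member() {}
};

int main() {
    std::cout << &MyStruct::member << '\n';  // converts to bool; prints 1
    std::cout << std::boolalpha
              << static_cast<bool>(&MyStruct::member) << '\n';  // prints true
}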
string s{ to_string(reinterpret_cast<uintptr_t>(&MyStruct::member)) };
Gives a compile-time error: cannot convert. So it seems that not any pointer can be converted.
Perhaps confusingly, in standardese "pointer" is not an umbrella term for object pointers, pointers-to-members, pointers-to-functions and pointers-to-member-functions. "Pointer" means just data pointers specifically.
So, the quoted rule does not apply to pointers-to-member-functions. It only applies to (object) pointers.
What else can I do to get a string representation?
You can use a buffer of unsigned char, big enough to represent the pointer, and use std::memcpy. Then print it in the format of your own choice. I recommend hexadecimal.
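A sketch of that approach, under the assumption that MyStruct::member is an ordinary member function (the helper name is hypothetical):

#include <cstdio>
#include <cstring>
#include <string>

struct MyStruct {
    void member() {}
};

// Copy the object representation of the pointer-to-member into a byte
// buffer and render it as hex.
std::string pmf_to_hex() {
    auto pmf = &MyStruct::member;
    unsigned char buf[sizeof pmf];
    std::memcpy(buf, &pmf, sizeof pmf);
    std::string out;
    char hex[3];
    for (unsigned char b : buf) {
        std::snprintf(hex, sizeof hex, "%02x", b);
        out += hex;
    }
    return out;
}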
As Martin Bonner points out, the pointer-to-member may contain padding in which case two values that point to the same member may actually have a different value in the buffer. Therefore the printed value is not of much use because two values are not comparable without knowing which bits (if any) are padding - which is implementation defined.
Unfortunately I need a robust solution, so because of this padding I can't use it.
No portable robust solution exists.
As Jonathan Wakely points out, there is no padding in the Itanium ABI, so if your compiler uses that, then the suggested memcpy method would work.
1. Why?
Code like this used to work and it's kind of obvious what it is supposed to mean. Is the compiler even allowed (by the specification) to make it an error?
I know that it's losing precision and I would be happy with a warning. But it still has well-defined semantics (at least for unsigned, the downsizing cast is defined) and the user just might want to do it.
2. Workaround
I have legacy code that I don't want to refactor too much because it's rather tricky and already debugged. It is doing two things:
Sometimes stores integers in pointer variables. The code only casts the pointer to integer if it stored an integer in it before. Therefore while the cast is downsizing, the overflow never happens in reality. The code is tested and works.
When an integer is stored, it always fits in a plain old unsigned, so changing the type is not considered a good idea, and the pointer is passed around quite a bit, so changing its type would be somewhat invasive.
Uses the address as hash value. A rather common thing to do. The hash table is not that large to make any sense to extend the type.
The code uses plain unsigned for hash value, but note that the more usual type of size_t may still generate the error, because there is no guarantee that sizeof(size_t) >= sizeof(void *). On platforms with segmented memory and far pointers, size_t only has to cover the offset part.
So what are the least invasive suitable workarounds? The code is known to work when compiled with compiler that does not produce this error, so I really want to do the operation, not change it.
Notes:
void *x;
int y;
union U { void *p; int i; } u;
*(int*)&x and u.p = x, u.i are not equivalent to (int)x and are not the opposite of (void *)y. On big-endian architectures, the first two will return the bytes at lower addresses while the latter will work on the low-order bytes, which may reside at higher addresses.
*(int*)&x and u.p = x, u.i are both strict aliasing violations, (int)x is not.
C++, 5.2.10:
4 - A pointer can be explicitly converted to any integral type large enough to hold it. [...]
C, 6.3.2.3:
6 - Any pointer type may be converted to an integer type. [...] If the result cannot be represented in the integer type, the behavior is undefined. [...]
So (int) p is illegal if int is 32-bit and void * is 64-bit; a C++ compiler is correct to give you an error, while a C compiler may either give an error on translation or emit a program with undefined behaviour.
You should write, adding a single conversion:
(int) (intptr_t) p
or, using C++ syntax,
static_cast<int>(reinterpret_cast<intptr_t>(p))
If you're converting to an unsigned integer type, convert via uintptr_t instead of intptr_t.
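Putting the two steps together for the hashing use case from the question, as a minimal sketch (hash_ptr is a hypothetical name):

#include <cstdint>

unsigned hash_ptr(const void* p) {
    // pointer -> uintptr_t is always well-defined; the narrowing to
    // unsigned is then a deliberate, explicit truncation
    return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p));
}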
This is a tough one to solve "generically", because the "loses precision" error indicates that your pointers are larger than the type you are trying to store them in. Which may well be "ok" in your mind, but the compiler is concerned that you will be restoring the int value back into a pointer, which has now lost the upper 32 bits (assuming we're talking 32-bit int and 64-bit pointers; there are other possible combinations).
There is uintptr_t, which is size-compatible with whatever the pointer is on the system, so typically you can overcome the actual error by:
int x = static_cast<int>(reinterpret_cast<uintptr_t>(some_ptr));
This will first force a large integer from a pointer, and then cast the large integer to a smaller type.
Answer for C
Converting pointers to integers is implementation-defined. Your problem is that the code you are talking about seems never to have been correct, and probably only worked on ancient architectures where both int and pointers are 32 bits.
The only types that are supposed to convert without loss are [u]intptr_t, if they exist on the platform (usually they do). Which part of such a uintptr_t is appropriate to use for your hash function is difficult to tell; you shouldn't make any assumptions about that. I would go for something like
uintptr_t n = (uintptr_t)x;
and then
(uint32_t)(((uint64_t)n >> 32) ^ n)
(widening n before the shift keeps it well-defined even where uintptr_t is only 32 bits; shifting a 32-bit value by 32 would be undefined). This can be optimized down to just n on 32-bit archs, and would give you traces of all the other bits on 64-bit archs.
For C++ basically the same applies, just that the cast would be reinterpret_cast<std::uintptr_t>(x).
I need a simple way to convert a pointer to an int, and then back to the pointer. I need it as an int for portability, since an int is easier to carry around in my project than a pointer.
Start player and return pointer:
SoundPlayer* player = new FxPlayerTiny();
return (int)player; // this works, but is it correct?
Stop player using pointer int:
FxPlayerTiny* player = static_cast<FxPlayerTiny*>((int*)num); // this gives me an error
FxPlayerTiny* player = (FxPlayerTiny*)((int*)obj); // this works, but is it correct?
delete player;
The best thing to do is to fix your project so it can deal with pointers. Integers and pointers are two different things. You can convert back and forth, but such conversions can lose information if you're not careful.
Converting a pointer value to int and back again can easily lose information, depending on the implementation and the underlying platform. For example, there are systems on which int is smaller than a pointer. If you have 32-bit ints and 64-bit pointers, then converting a pointer to an int and back again will almost certainly give you an invalid pointer.
It's very likely that long or unsigned long is wide enough to hold a converted pointer value without loss of information; I've never worked on a system where it isn't. (Personally, I tend to prefer unsigned types, but not for any really good reason; either should work equally well.)
So you could write, for example:
SoundPlayer* player = new FxPlayerTiny();
return reinterpret_cast<unsigned long>(player);
and convert the unsigned long value back to a pointer using reinterpret_cast<SoundPlayer*>.
Newer implementations provide typedefs uintptr_t and intptr_t, which are unsigned and signed integer types guaranteed to work correctly for round-trip pointer-to-integer-to-pointer conversions. They were introduced in C99, optionally defined in the <stdint.h> header. (Implementations on which pointers are bigger than any integer type won't define them, but such implementations are rare.) C++ adopted them with the 2011 standard, defining them in the <cstdint> header; Microsoft Visual C++ supports them as of the 2010 version.
This guarantee applies only to ordinary pointers, not to function pointers or member pointers.
So if you must do this, you can write:
#include <cstdint>
SoundPlayer* player = new FxPlayerTiny();
return reinterpret_cast<std::uintptr_t>(player);
But first, consider this. new FxPlayerTiny() gives you a pointer value, and you're going to want a pointer value. If at all possible, just keep it as a pointer. Converting it to an integer means that you have to decide which of several techniques to use, and you have to keep track of which pointer type your integer value is supposed to represent. If you make a mistake, either using an integer type that isn't big enough or losing track of which pointer type you've stored, the compiler likely won't warn you about it.
Perhaps you have a good reason for needing to do that, but if you can just store pointers as pointers your life will be a lot easier.
It is not recommended to use an int for portability, especially because sizeof (int) and sizeof (void*) could differ (for example when compiling for Windows x64).
If ints are needed for a user interface, it might be better to use a table of pointers and access them via indices.
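A minimal sketch of that table-of-pointers idea, reusing the question's SoundPlayer type (the helper names are hypothetical):

#include <vector>

struct SoundPlayer;                      // a forward declaration suffices here

std::vector<SoundPlayer*> g_players;     // the handle table

int register_player(SoundPlayer* p) {    // hand out a small integer handle...
    g_players.push_back(p);
    return static_cast<int>(g_players.size() - 1);
}

SoundPlayer* lookup_player(int handle) { // ...and resolve it with a bounds-checked lookup
    return g_players.at(handle);
}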
However, if you want to convert a pointer to an integer and vice versa, a simple cast is enough, but it only round-trips safely if sizeof (int) == sizeof (void*).
Is pointer conversion considered expensive (e.g. how many CPU cycles does it take to convert a pointer/address), especially when you have to do it quite frequently? For instance (just an example to show the scale of frequency; I know there are better ways for this particular case):
unsigned long long *x;
/* fill data to x*/
for (int i = 0; i < 1000*1000*1000; i++)
{
A[i]=foo((unsigned char*)x+i);
}
(e.g. how many CPU cycles it takes to convert a pointer/address)
In most machine code languages there is only 1 "type" of pointer and so it doesn't cost anything to convert between them. Keep in mind that C++ types really only exist at compile time.
The real issue is that this sort of code can break strict aliasing rules. You can read more about this elsewhere, but essentially the compiler will either produce incorrect code through undefined behavior, or be forced to make conservative assumptions and thus produce slower code. (Note that char* and friends are somewhat exempt from the undefined behavior part.)
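As an illustrative sketch (my addition, not part of the original answer): the usual well-defined alternative to an aliasing cast is to copy the bytes, which is exactly what the char* exemption and memcpy permit:

#include <cstdint>
#include <cstring>

std::uint32_t float_bits(float f) {
    // *reinterpret_cast<std::uint32_t*>(&f) would violate strict aliasing;
    // copying the object representation with memcpy is well-defined
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}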
Optimizers often have to make conservative assumptions about variables in the presence of pointers. For example, a constant propagation process that knows the value of variable x is 5 would not be able to keep using this information after an assignment to another variable (for example, *y = 10) because it could be that *y is an alias of x. This could be the case after an assignment like y = &x.
As an effect of the assignment to *y, the value of x would be changed as well, so propagating the information that x is 5 to the statements following *y = 10 would be potentially wrong (if *y is indeed an alias of x). However, if we have information about pointers, the constant propagation process could make a query like: can x be an alias of *y? Then, if the answer is no, x = 5 can be propagated safely.
Another optimization impacted by aliasing is code reordering. If the compiler decides that x is not aliased by *y, then code that uses or changes the value of x can be moved before the assignment *y = 10, if this would improve scheduling or enable more loop optimizations to be carried out.
To enable such optimizations in a predictable manner, the ISO standard for the C programming language (including its newer C99 edition, see section 6.5, paragraph 7) specifies that it is illegal (with some exceptions) for pointers of different types to reference the same memory location. This rule, known as "strict aliasing", sometimes allows for impressive increases in performance,[1] but has been known to break some otherwise valid code. Several software projects intentionally violate this portion of the C99 standard. For example, Python 2.x did so to implement reference counting,[2] and required changes to the basic object structs in Python 3 to enable this optimisation. The Linux kernel does this because strict aliasing causes problems with optimization of inlined code.[3] In such cases, when compiled with gcc, the option -fno-strict-aliasing is invoked to prevent unwanted optimizations that could yield unexpected code.
http://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts_with_optimization
What is the strict aliasing rule?
On any architecture you're likely to encounter, all pointer types have the same representation, and so conversion between different pointer types representing the same address has no run-time cost. This applies to all pointer conversions in C.
In C++, some pointer conversions have a cost and some don't:
reinterpret_cast and const_cast (or an equivalent C-style cast, such as the one in the question) and conversion to or from void* will simply reinterpret the pointer value, with no cost.
Conversion between pointer-to-base-class and pointer-to-derived-class (either implicitly, or with static_cast or an equivalent C-style cast) may require adding a fixed offset to the pointer value if there are multiple base classes (see the sketch after this list).
dynamic_cast will do a non-trivial amount of work to look up the pointer value based on the dynamic type of the object pointed to.
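For the base/derived case, a small sketch with hypothetical types showing the offset adjustment when a class has two base classes:

#include <iostream>

struct A { int a; };
struct B { int b; };
struct C : A, B {};

int main() {
    C c;
    C* pc = &c;
    B* pb = pc;  // derived-to-base: the compiler adds the offset of the B subobject
    std::cout << static_cast<void*>(pc) << '\n'
              << static_cast<void*>(pb) << '\n';  // typically differs by sizeof(A)
}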
Historically, some architectures (e.g. PDP-10) had different representations for pointer-to-byte and pointer-to-word; there may be some runtime cost for conversions there.
unsigned long long *x;
/* fill data to x*/
for (int i = 0; i < 1000*1000*1000; i++)
{
A[i]=foo((unsigned char*)x+i); // bad cast
}
Remember, the machine only knows memory addresses, data and code. Everything else (such as types) is known only to the compiler, which aids the programmer and does all the pointer arithmetic; only the compiler knows the size of each type, and so on and so forth.
At runtime, no machine cycles are wasted in converting one pointer type to another, because the conversion does not happen at runtime. All pointers are treated as 4 bytes long (on a 32-bit machine), nothing more and nothing less.
It all depends on your underlying hardware.
On most machine architectures, all pointers are byte pointers, and converting from one byte pointer to another is a no-op. On some architectures, a pointer conversion may under some circumstances require extra manipulation (there are machines that work with word-based addresses, for instance, and converting a word pointer to a byte pointer or vice versa will require extra manipulation).
Moreover, this is in general an unsafe technique, as the compiler can't perform any sanity checking on what you are doing, and you can end up overwriting data you didn't expect.
int *pt = 0;
long i = reinterpret_cast<long>(pt);
Is i guaranteed to be 0? Is this well defined or implementation-defined?
I checked the C++ standard, but it only states that
A pointer to a data object or to a function (but not a pointer to member) can be converted to any integer type large enough to contain it.
In this case, pt doesn't point to any data object. Does the rule apply to this case?
No, i is not guaranteed to be any particular value. The result is implementation-defined.†
The representation of pointers, in C++, is implementation-defined, including the representation of a null pointer. When you assign an integer value of zero to a pointer, you set that pointer to the implementation-defined null pointer value, which is not necessarily all-bits-zero. The result of casting that value to an integer is, by transitivity, implementation-defined.
Even more troublesome, though, is that the mapping done by reinterpret_cast is implementation-defined anyway. So even if the null pointer value was all-bits-zero, an implementation is free to make the result whatever it wants. You're only guaranteed that you'll get the original value when you cast back.
That all said, the next sentence after your quote includes the note:
[ Note: It is intended to be unsurprising to those who know the addressing structure of the underlying machine. —end note ]
So even though specific mappings are not required, pragmatically you can take an educated guess.
† Assuming long is large enough. In C++0x use uintptr_t, optionally defined in <cstdint>.
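A small sketch of what is and isn't guaranteed here, using uintptr_t as the footnote suggests:

#include <cstdint>
#include <iostream>

int main() {
    int* pt = nullptr;
    auto i = reinterpret_cast<std::uintptr_t>(pt);
    std::cout << i << '\n';                 // almost always 0 in practice, but implementation-defined
    int* back = reinterpret_cast<int*>(i);  // the round trip, however, is guaranteed
    std::cout << (back == pt) << '\n';      // prints 1
}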
I don't see why it wouldn't. The issue is that the integer type in question wouldn't be able to hold values that high, but if it's a null pointer there's no problem.
You want reinterpret_cast<long>(pt), but yes, that will work. reinterpret_cast is the same as the old C-style cast (long) and assumes you know what you are doing.
Yes, it does not matter where it points at runtime; the only thing that matters is the type of the pointer. "to a data object or to a function" just means that both "regular" object pointers as well as function pointers (i.e. pointers to object code) can be recast into numeric types.
I think the code you have above safely stores 0 to 'i' on all known common platforms.