Is the map from a pointer to its integer representation homomorphic? - c++

Let f: Pointers -> Integer_Representation be a map provided by the implementation (I hope that map doesn't depend on the way we cast a pointer to an integral type). Let p be a pointer to T and i be a variable of integral type.
Does the standard explicitly define that the map is homomorphic, i.e. f(p+i) = f(p) + i*sizeof(T)? In general I would like to understand how the additive operation between pointers and integral types is constrained.

It isn't. The standard does not require anything of this mapping; it is implementation-defined, and some implementations may be weird.
In similar cases it always helps to remember the 16-bit memory models of the 8086. There, pointers are 32 bits, segment+offset, but the two overlap to form only a 20-bit address. In the huge memory model, pointers are normalized to the smallest offset.
So say p = 0123:0004 (which converts to f(p) = 0x01230004), i = 42 and sizeof(T) = 2. Then p + i = 0128:0008, which converts to f(p+i) = 0x01280008, but f(p) + i*sizeof(T) = 0x01230058, a different representation, though of the same address.
On the other hand, in the large model, pointers are not normalized. So you can have both 0128:0008 and 0123:0058; they are different pointers, but they point to the same address.
Both follow the letter of the standard, because pointer arithmetic is only required to work on pointers into the same array or allocated block, and the conversion to integer is entirely implementation-defined.
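If you are curious what your own implementation does, the identity can be tested directly. A minimal sketch (the outcome is implementation-defined, so seeing "holds" on a flat-memory platform proves nothing in general):
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    int arr[8] = {};
    int *p = arr;
    std::ptrdiff_t i = 3;
    auto fp  = reinterpret_cast<std::uintptr_t>(p);
    auto fpi = reinterpret_cast<std::uintptr_t>(p + i);
    // Typically true on flat-memory implementations,
    // but nothing in the standard guarantees it.
    std::printf("%s\n", fpi == fp + i * sizeof(int) ? "holds" : "does not hold");
}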

Related

Pointers and data types

I have the following question.
Given that a pointer holds the value of a memory address, why is it permitted to add an integer data type value to a pointer variable but not a double data type value?
My thoughts: Is it because we assume that the pointer is an int as well, or maybe because adding a double would increase its length?
Thank you for your time.
You almost answered your question yourself: a pointer is a memory address. A memory address is an integer. You can add integers to integers and get integers as a result. Adding a float to an integer gives you a float, which cannot be used as a memory address.
For example, char *x = 0; is the address of a single byte. What would char *y = 0.5; mean? A byte that's somehow made up of the second half of the byte at address 0 and the first half of the byte at address 1? That might make some sort of sense, but what about char *x = 3.1415926; or any similar floating-point number?
My thoughts: Is it because we assume that the pointer is an int as well, or maybe because adding a double would increase its length?
If you look at the documentation, it says:
Certain addition, subtraction, increment, and decrement operators are defined for pointers to elements of arrays: such pointers satisfy the LegacyRandomAccessIterator requirements and allow the C++ library algorithms to work with raw arrays.
(emphasis is mine) and you should remember that:
*(ptr + 1)
is equal to:
ptr[1]
and indices into arrays are integers, so the language does not define operations on pointers with floating-point operands; that would not make any sense.
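A minimal illustration (the commented-out line is ill-formed and rejected by a conforming compiler):
int arr[4] = {1, 2, 3, 4};
int *p = arr;
int a = *(p + 2);      // OK: same as arr[2]
// int b = *(p + 2.5); // error: invalid operands to binary + ('int*' and 'double')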
You cannot add a double to an int* (pointer) under the conventions of C. A pointer holds the value of a memory address ("stores/points to the address of another variable"), and how far that value advances per increment is in essence determined by its pointed-to type, in this case int (a 4-byte block of memory, if I recall correctly). A double is a double-precision, 64-bit floating-point data type. It just can't be done at the most "hardware" of levels.

Does casting pointers to integers define a total order on pointers?

(related to my previous question)
In Qt, the QMap documentation says:
The key type of a QMap must provide operator<() specifying a total order.
However, in qmap.h, they seem to use something similar to std::less to compare pointers:
/*
QMap uses qMapLessThanKey() to compare keys. The default
implementation uses operator<(). For pointer types,
qMapLessThanKey() casts the pointers to integers before it
compares them, because operator<() is undefined on pointers
that come from different memory blocks. (In practice, this
is only a problem when running a program such as
BoundsChecker.)
*/
template <class Key> inline bool qMapLessThanKey(const Key &key1, const Key &key2)
{
return key1 < key2;
}
template <class Ptr> inline bool qMapLessThanKey(const Ptr *key1, const Ptr *key2)
{
Q_STATIC_ASSERT(sizeof(quintptr) == sizeof(const Ptr *));
return quintptr(key1) < quintptr(key2);
}
They just cast the pointers to quintptrs (which is the Qt version of uintptr_t, that is, an unsigned integer type capable of storing a pointer) and compare the results.
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to a pointer to void, and the result will compare equal to the original pointer: uintptr_t
Do you think this implementation of qMapLessThanKey() on pointers is ok?
Of course, there is a total order on integral types. But I think this is not sufficient to conclude that this operation defines a total order on pointers.
I think that it is true only if p1 == p2 implies quintptr(p1) == quintptr(p2), which, AFAIK, is not specified.
As a counterexample of this condition, imagine a target using 40 bits for pointers; it could convert pointers to quintptr, setting the 40 lowest bits to the pointer address and leaving the 24 highest bits unchanged (random). This is sufficient to respect the convertibility between quintptr and pointers, but this does not define a total order for pointers.
What do you think?
The Standard guarantees that converting a pointer to an uintptr_t will yield a value of some unsigned type which, if cast to the original pointer type, will yield the original pointer. It also mandates that any pointer can be decomposed into a sequence of unsigned char values, and that using such a sequence of unsigned char values to construct a pointer will yield the original. Neither guarantee, however, would forbid an implementation from including padding bits within pointer types, nor would either guarantee require that the padding bits behave in any consistent fashion.
If code avoided storing pointers, and instead cast to uintptr_t every pointer returned from malloc, later casting those values back to pointers as required, then the resulting uintptr_t values would form a ranking. The ranking might not have any relationship to the order in which objects were created, nor to their arrangement in memory, but it would be a ranking. If any pointer gets converted to uintptr_t more than once, however, the resulting values might rank entirely independently.
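A sketch of the pattern described above (hypothetical usage; it assumes uintptr_t exists on the platform): convert each pointer to uintptr_t exactly once, keep only the integers, and cast back whenever the pointer itself is needed:
#include <cstdint>
#include <cstdlib>
#include <map>

int main() {
    std::map<std::uintptr_t, std::size_t> sizes;      // ranked by the integer values
    void *raw = std::malloc(64);
    auto key = reinterpret_cast<std::uintptr_t>(raw); // converted exactly once
    sizes[key] = 64;
    std::free(reinterpret_cast<void *>(key));         // the round trip back is guaranteed
}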
I think that you can't assume that there is a total order on pointers. The guarantees given by the standard for pointer to int conversions are rather limited:
5.2.10/4: A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined.
5.2.10/5: A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (...) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined.
From a practical point of view, most of the mainstream compilers will convert a pointer to an integer in a bitwise manner, and you'll have a total order.
The theoretical problem:
But this is not guaranteed. It might not work on past platforms (x86 real and protected mode), on exotic platforms (embedded systems?), and, who knows, on some future platforms.
Take the example of the segmented memory of the 8086: the real address is given by the combination of a segment (e.g. the DS register for the data segment, SS for the stack segment, ...) and an offset:
Segment:  XXXX YYYY YYYY YYYY 0000   16 bits, shifted left by 4 bits
Offset:   0000 ZZZZ ZZZZ ZZZZ ZZZZ   16 bits, not shifted
          ------------------------
Address:  AAAA AAAA AAAA AAAA AAAA   20-bit address
Now imagine that the compiler converts the pointer to an integer by simply doing the address math and putting the 20 bits in the integer: you're safe and have a total order.
But another equally valid approach would be to store the segment in the 16 upper bits and the offset in the 16 lower bits. In fact, this approach would significantly facilitate/accelerate the loading of pointer values into CPU registers.
This approach is compliant with the standard C++ requirements, but each single address could be represented by up to 4096 different pointers: your total order is lost!
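To make the two schemes concrete, here is a small sketch (the function names linear and packed are made up for illustration) of the two mappings from a segment:offset pair to an integer:
#include <cstdint>

std::uint32_t linear(std::uint16_t seg, std::uint16_t off) {
    return (std::uint32_t(seg) << 4) + off;  // the 20-bit address: order preserved
}
std::uint32_t packed(std::uint16_t seg, std::uint16_t off) {
    return (std::uint32_t(seg) << 16) | off; // fast to load, but order lost
}
For instance, linear(0x0123, 0x0058) and linear(0x0128, 0x0008) both yield 0x01288, while packed maps these two representations of the same address to different integers.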
Are there alternatives for the order?
One could imagine using pointer arithmetics. There are strong constraints on pointer arithmetics for elements in a same array:
5.7/6: When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements.
And subscripts are ordered.
An array can have at most SIZE_MAX elements. So, naively, if sizeof(pointer) <= sizeof(size_t), one could assume that taking an arbitrary reference pointer and doing some pointer arithmetic should lead to a total order.
Unfortunately, here also, the standard is very prudent:
5.7/7: For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T is different from the cv-unqualified array element type, the behavior is undefined.
So pointer arithmetic won't do the trick for arbitrary pointers either. Again, the segmented memory models help to understand: arrays could be at most 65535 bytes so as to fit completely in one segment, but different arrays could use different segments, so pointer arithmetic wouldn't be reliable for a total order either.
Conclusion
There's a subtle note in the standard on the mapping between pointers and integral values:
It is intended to be unsurprising to those who know the addressing structure of the underlying machine.
This means that it must be possible to determine a total order, at least for those who know the addressing structure of the machine. But keep in mind that it'll be non-portable.
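As a side note, the standard library already offers a portable escape hatch: since C++98, the specializations of std::less (and friends) for pointer types are required to yield a total order even where the built-in < is unspecified, so a comparator could skip the integer cast entirely. A minimal sketch:
#include <functional>

// Totally ordered pointer comparison without going through integers:
// std::less on pointer types is guaranteed to be a strict total order.
template <class T>
bool ptr_less(const T *a, const T *b) {
    return std::less<const T *>()(a, b);
}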

Can std::uintptr_t be used to avoid undefined behavior of out-of-bounds pointer arithmetic?

Now we know that doing out-of-bounds pointer arithmetic has undefined behavior, as described in this SO question.
My question is: can we work around this restriction by casting to std::uintptr_t for the arithmetic operations and then casting back to a pointer? Is that guaranteed to work?
For example:
char a[5];
auto u = reinterpret_cast<std::uintptr_t>(a) - 1;
auto p = reinterpret_cast<char*>(u + 1); // OK?
The real world usage is for optimizing offsetted memory access -- instead of p[n + offset], I want to do offset_p[n].
EDIT To make the question more explicit:
Given a base pointer p of a char array, if p + n is a valid pointer, will reinterpret_cast<char*>(reinterpret_cast<std::uintptr_t>(p) + n) be guaranteed to yield the same valid pointer?
No, uintptr_t cannot be meaningfully used to avoid undefined behavior when performing pointer arithmetic.
For one thing, at least in C there is no guarantee that uintptr_t even exists. The requirement is that any value of type void* may be converted to uintptr_t and back again, yielding the original value without loss of information. In principle, there might not be any unsigned integer type wide enough to hold all pointer values. (I presume the same applies to C++, since C++ inherits most of the C standard library and defines it by reference to the C standard.)
Even if uintptr_t does exist, there is no guarantee that a given arithmetic operation on a uintptr_t value does the same thing as the corresponding operation on a pointer value.
For example, I've worked on systems (Cray vector systems, T90 and SV1) on which byte pointers are implemented in software. A native address is a 64-bit address that refers to a 64-bit word; there is no hardware support for byte addressing. A char* or void* pointer consists of a word pointer with a 3-bit offset stored in the otherwise unused high-order bits. Conversion between integers and pointers simply copies the bits. So incrementing a char* would advance it to point to the next 8-bit byte in memory; incrementing a uintptr_t obtained by converting a char* would advance it to point to the next 64-bit word.
That's just one example. More generally, conversions between pointers and integers are implementation-defined, and the language standard makes no guarantee about the semantics of those conversions (other than, in some cases, converting back to a pointer).
So yes, you can convert a pointer value to uintptr_t (if that type exists) and perform arithmetic on it without risking undefined behavior -- but the result may or may not be meaningful.
It happens that, on most systems, the mapping between pointers and integers is simpler, and you probably can get away with that kind of game. But you're better off using pointer arithmetic directly, and just being very careful to avoid any invalid operations.
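For the stated use case (hoisting a constant offset out of an indexing expression), plain pointer arithmetic already does the job with no undefined behavior, so long as every pointer formed stays within the array or one past its end. A minimal sketch, assuming offset <= size:
#include <cstddef>

void zero_from(char *p, std::size_t size, std::size_t offset) {
    char *offset_p = p + offset;      // stays in bounds, so well-defined
    // offset_p[n] is now equivalent to p[offset + n], with no integer detour
    for (std::size_t n = 0; n + offset < size; ++n)
        offset_p[n] = 0;
}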
Yes, that is legal, but you must reinterpret_cast exactly the same uintptr_t value back to char*.
(Therefore, what you're intending to do is illegal; that is, converting a different value back to a pointer.)
5.2.10 Reinterpret cast
4. A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined.
5. A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value;
(Note that there'd be no way, in general, for the compiler to know that you subtracted one and then added it back.)

Why can I cast int and BOOL to void*, but not float?

void* is a useful feature of C and derivative languages. For example, it's possible to use void* to store objective-C object pointers in a C++ class.
I was working on a type conversion framework recently and due to time constraints was a little lazy - so I used void*... That's how this question came up:
Why can I typecast int to void*, but not float to void* ?
BOOL is not a C++ type. It's probably typedef'd or #defined somewhere, and in those cases it would be the same as int. Windows, for example, has this in Windef.h:
typedef int BOOL;
so your question reduces to, why can you typecast int to void*, but not float to void*?
int to void* is ok but generally not recommended (and some compilers will warn about it) because they are inherently the same in representation. A pointer is basically an integer that points to an address in memory.
float to void* is not ok because the interpretation of the float value and the actual bits representing it are different. For example, if you do:
float x = 1.0;
what it does is set the 32 bits of memory to 00 00 80 3f (the actual representation of the float value 1.0 in IEEE single precision). When you cast a float to a void*, the interpretation is ambiguous. Do you mean the pointer that points to location 1 in memory? Or do you mean the pointer that points to location 3f800000 (assuming little endian) in memory?
Of course, if you are sure which of the two cases you want, there is always a way to get around the problem. For example:
void* u1 = (void*)((int)x); // first case: use the numeric value
void* u2 = (void*)(((unsigned short*)(&x))[0] | (((unsigned int)((unsigned short*)(&x))[1]) << 16)); // second case: use the bit pattern
Pointers are usually represented internally by the machine as integers. C allows you to cast back and forth between pointer type and integer type. (A pointer value may be converted to an integer large enough to hold it, and back.)
Using void* to hold integer values is unconventional. It's not guaranteed by the language to work, but if you want to be sloppy and constrain yourself to Intel and other commonplace platforms, it will basically scrape by.
Effectively what you're doing is using void* as a generic container of however many bytes are used by the machine for pointers. This differs between 32-bit and 64-bit machines. So converting long long to void* would lose bits on a 32-bit platform.
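If code insists on playing this game, a compile-time guard at least turns the silent truncation into a build error; a small sketch:
// Fail the build if a long long cannot survive a round trip through void*.
static_assert(sizeof(void *) >= sizeof(long long),
              "void* is too small to smuggle a long long on this platform");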
As for floating-point numbers, the intention of (void*) 10.5f is ambiguous. Do you want to round 10.5 to an integer, then convert that to a nonsense pointer? No, you want the bit pattern used by the FPU to be placed into a nonsense pointer. This can be accomplished with float f = 10.5f; void *vp = (void *) * (uint32_t *) &f;, but be warned that this is just nonsense: pointers aren't generic storage for bits.
The best generic storage for bits is char arrays, by the way. The language standards guarantee that memory can be manipulated through char*. But you have to mind data alignment requirements.
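For completeness, the well-defined way to read a float's bit pattern goes through memcpy (or a char array), which the aliasing rules explicitly permit; a minimal sketch, assuming a 32-bit float:
#include <cstring>
#include <cstdint>

std::uint32_t float_bits(float f) {
    static_assert(sizeof f == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // no strict-aliasing violation
    return bits;
}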
The standard says that "An integer may be converted to any pointer type." It doesn't say anything about pointer-float conversion.
If you really want to transfer a float value as a void *, there is a workaround using type punning.
Here is an example:
#include <stdio.h>
#include <stdint.h>

struct mfloat {
    union {
        float fvalue;
        int ivalue;
    };
};

void print_float(void *data)
{
    struct mfloat mf;
    mf.ivalue = (int)(intptr_t)data; /* recover the integer smuggled through void* */
    printf("%.2f\n", mf.fvalue);
}

int main(void)
{
    struct mfloat mf;
    mf.fvalue = 1.99f;
    print_float((void *)(intptr_t)mf.ivalue);
    return 0;
}
We have used a union to reinterpret our float value (fvalue) as an integer (ivalue), passed it through void *, and reversed the process on the other side.
The question is based on a false premise, namely that void * is somehow a "generic" or "catch-all" type in C or C++. It is not. It is a generic object pointer type, meaning that it can safely store pointers to any type of data, but it cannot itself contain any type of data.
You could use a void * pointer to generically manipulate data of any type by allocating sufficient memory to hold an object of any given type, then using a void * pointer to point to it. In some cases you could also use a union, which is of course designed to be able to contain objects of multiple types.
Now, because pointers can be thought of as integers (and indeed, on conventionally-addressed architectures, typically are integers) it is possible and in some circles fashionable to stuff an integer into a pointer. Some library APIs have even documented and supported this usage; one notable example was X Windows.
Conversions between pointers and integers are implementation-defined, and these days typically draw warnings, and so typically require an explicit cast, not so much to force the conversion as simply to silence the warning. For example, both the code fragments below print 77, but the first one probably draws compiler warnings.
/* fragment 1: */
int i = 77;
void *p = i;
int j = p;
printf("%d\n", j);
/* fragment 2: */
int i = 77;
void *p = (void *)(uintptr_t)i;
int j = (int)p;
printf("%d\n", j);
In both cases, we are not really using the void * pointer p as a pointer at all: we are merely using it as a vessel for some bits. This relies on the fact that on a conventionally-addressed architecture, the implementation-defined behavior of a pointer/integer conversion is the obvious one, which to an assembly-language programmer or an old-school C programmer doesn't seem like a "conversion" at all. And if you can stuff an int into a pointer, it's not surprising if you can stuff in other integral types, like bool, as well.
But what about trying to stuff a floating-point value into a pointer? That's considerably more problematic. Stuffing an integer value into a pointer, though implementation-defined, makes perfect sense if you're doing bare-metal programming: you're taking the numeric value of the integer, and using it as a memory address. But what would it mean to try to stuff a floating-point value into a pointer?
It's so meaningless that the C Standard doesn't even label it "undefined".
It's so meaningless that a typical compiler won't even attempt it.
And if you think about it, it's not even obvious what it should do.
Would you want to use the numeric value, or the bit pattern, as the thing to try to stuff into the pointer? Stuffing in the numeric value is closer to how floating-point-to-integer conversions work, but you'd lose your fractional part. Using the bit pattern is what you'd probably want, but accessing the bit pattern of a floating-point value is never something that C makes easy, as generations of programmers who have attempted things like
uint32_t hexval = (uint32_t)3.0;
have discovered.
Nevertheless, if you were bound and determined to store a floating-point value in a void * pointer, you could probably accomplish it, using sufficiently brute-force casts, although the results are probably both undefined and machine-dependent. (That is, I think there's a strict aliasing violation here, and if pointers are bigger than floats, as of course they are on a 64-bit architecture, I think this will probably only work if the architecture is little-endian.)
float f = 77.75;
void *p = (void *)(uintptr_t)*(uint32_t *)&f;
float f2 = *(float *)&p;
printf("%f\n", f2);
dmr help me, this actually does print 77.75 on my machine.

Does pointer arithmetic have uses outside of arrays?

I think I understand the semantics of pointer arithmetic fairly well, but I only ever see examples when dealing with arrays. Does it have any other uses that can't be achieved by less opaque means? I'm sure you could find a way with clever casting to use it to access members of a struct, but I'm not sure why you'd bother. I'm mostly interested in C, but I'll tag with C++ because the answer probably applies there too.
Edit, based on answers received so far: I know pointers can be used in many non-array contexts. I'm specifically wondering about arithmetic on pointers, e.g. incrementing, taking a difference, etc.
Pointer arithmetic by definition in C happens only on arrays. However, as every object has a representation consisting of an overlaid unsigned char [sizeof object] array, it's also valid to perform pointer arithmetic on this representation. For example:
#include <stddef.h> /* for offsetof */

struct foo {
    int a, b, c;
} bar;

/* Equivalent to: bar.c = 1; */
*(int *)((unsigned char *)&bar + offsetof(struct foo, c)) = 1;
Actually char * would work just as well.
If you follow the language standard to the letter, then pointer arithmetic is only defined when pointing to an array, and not in any other case.
A pointer may point to any element of an array, or one step past the end of the array.
Off the top of my head, I know it's used in XOR linked lists (very nifty) and I've seen it used in very hacky recursions.
On the other hand, it's very hard to find uses, since according to the standard pointer arithmetic is only defined within the bounds of an array.
a[n] is "just" syntactic sugar for *(a + n). For lulz, try the following
int a[2];
0[a] = 10;
1[a] = 20;
So one could argue that indexing and pointer arithmetic are merely interchangeable syntax.
Pointer arithmetic is only defined on arrays. Adding an integer to a pointer that does not point to an array element produces undefined behavior.
In embedded systems, pointers are used to represent addresses or locations. There may not be an array defined. (Although one could say that all of memory is one huge array.)
For example, a stack (holding variables and addresses) is manipulated by adding or subtracting values from the stack pointer. (In this case, the stack could be said to be an array based stack.)
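A tiny sketch of that idea, an array-backed stack driven entirely by pointer arithmetic (storage, sp, push, and pop are illustrative names, not from the answer above):
int storage[64];
int *sp = storage;                  // the "stack pointer"

void push(int v) { *sp++ = v; }     // grow upward
int  pop(void)   { return *--sp; }  // shrink back down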
Here's a case for pointer arithmetic outside of (strictly defined) arrays:
#include <stdio.h>

int main(void)
{
    double d = 0.5;
    unsigned char *bytes = (void *)&d;  /* inspect d's object representation */
    for (size_t i = 0; i < sizeof d; i++)
        printf("Byte %zu of d is %hhu\n", i, bytes[i]);
}
Why would you do this? I don't know. But if you want to look at the bitwise representation of an object (useful for things like memcpy and memcmp), you'll need to cast their addresses to unsigned char *s (or signed char *s if you like) and work with them byte-by-byte. (If your task isn't too difficult you can even write the code to work word-by-word, which most memcpy implementations will do. It's the same principle, though, just replace char with int32_t.)
Note that, in the standard, the exact values (or the number of values) that are printed are implementation-defined, but that this will always work as a way to access an object's internal bytewise representation. (It is not required to work for larger integer types, but almost always will - no processor I know of has had trap representations for integers in quite some time).