I think I understand the semantics of pointer arithmetic fairly well, but I only ever see examples when dealing with arrays. Does it have any other uses that can't be achieved by less opaque means? I'm sure you could find a way with clever casting to use it to access members of a struct, but I'm not sure why you'd bother. I'm mostly interested in C, but I'll tag with C++ because the answer probably applies there too.
Edit, based on answers received so far: I know pointers can be used in many non-array contexts. I'm specifically wondering about arithmetic on pointers, e.g. incrementing, taking a difference, etc.
Pointer arithmetic by definition in C happens only on arrays. However, as every object has a representation consisting of an overlaid unsigned char [sizeof object] array, it's also valid to perform pointer arithmetic on this representation. For example:
struct foo {
int a, b, c;
} bar;
/* Equivalent to: bar.c = 1; */
*(int *)((unsigned char *)&bar + offsetof(struct foo, c)) = 1;
Actually char * would work just as well.
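The same idea can be wrapped in small helpers. This is only a sketch: set_int_at and get_int_at are hypothetical names, and the caller is assumed to pass an offset (from offsetof) that really names an int member. memcpy is used so the access stays valid regardless of aliasing rules.

```c
#include <stddef.h>
#include <string.h>

struct foo {
    int a, b, c;
};

/* Hypothetical helper: write an int at a byte offset inside an object.
   The caller must pass an offset that names an int member. */
void set_int_at(void *obj, size_t offset, int value)
{
    unsigned char *bytes = obj;               /* view the object as bytes */
    memcpy(bytes + offset, &value, sizeof value);
}

/* Hypothetical helper: read an int back from a byte offset. */
int get_int_at(const void *obj, size_t offset)
{
    const unsigned char *bytes = obj;
    int value;
    memcpy(&value, bytes + offset, sizeof value);
    return value;
}
```

For example, set_int_at(&bar, offsetof(struct foo, c), 1) has the same effect as bar.c = 1.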
If you follow the language standard to the letter, then pointer arithmetic is only defined when pointing to an array, and not in any other case.
A pointer may point to any element of an array, or one step past the end of the array.
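The one-past-the-end pointer is what makes the usual "search and return a pointer" idiom work. A minimal sketch, with find_int and index_of as made-up names:

```c
#include <stddef.h>

/* Returns a pointer to the first element equal to key, or the
   one-past-the-end pointer (a + n) when there is no match. */
const int *find_int(const int *a, size_t n, int key)
{
    const int *end = a + n;       /* valid to form, not to dereference */
    for (const int *p = a; p != end; ++p) {
        if (*p == key)
            return p;
    }
    return end;
}

/* Index of the first match, or n when key is absent. */
size_t index_of(const int *a, size_t n, int key)
{
    /* Both pointers point into (or just past) the same array,
       so their difference is well defined. */
    ptrdiff_t d = find_int(a, n, key) - a;
    return (size_t)d;
}
```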
Off the top of my head, I know it's used in XOR linked lists (very nifty), and I've seen it used in very hacky recursions.
On the other hand, it's very hard to find uses, since according to the standard pointer arithmetic is only defined within the bounds of an array.
a[n] is "just" syntactic sugar for *(a + n). For lulz, try the following
int a[2];
0[a] = 10;
1[a] = 20;
So one could argue that indexing and pointer arithmetic are merely interchangeable syntax.
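To make the equivalence concrete, here is a small sketch (function names are made up) that sums an array three ways. All three loops compute the same thing.

```c
#include <stddef.h>

/* Ordinary subscripting: a[i]. */
int sum_subscript(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Explicit pointer arithmetic: *(a + i). */
int sum_arithmetic(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += *(a + i);
    return total;
}

/* Swapped operands: i[a] means *(i + a), which is the same element. */
int sum_swapped(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += i[a];
    return total;
}
```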
Pointer arithmetic is only defined on arrays. Adding an integer to a pointer that does not point to an array element produces undefined behavior.
In embedded systems, pointers are used to represent addresses or locations. There may not be an array defined. (Although one could say that all of memory is one huge array.)
For example, a stack (holding variables and addresses) is manipulated by adding or subtracting values from the stack pointer. (In this case, the stack could be said to be an array based stack.)
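A software version of that idea, as a rough sketch: the stack below is backed by an array and manipulated only through a stack pointer. The type and function names and the capacity of 8 are assumptions for illustration, not taken from any particular embedded system.

```c
#include <stddef.h>

#define STACK_CAPACITY 8

struct int_stack {
    int storage[STACK_CAPACITY];
    int *sp;                 /* points one past the most recently pushed item */
};

void stack_init(struct int_stack *s)
{
    s->sp = s->storage;      /* empty: sp at the start of the storage */
}

/* Returns 1 on success, 0 if the stack is full. */
int stack_push(struct int_stack *s, int value)
{
    if (s->sp == s->storage + STACK_CAPACITY)
        return 0;
    *s->sp++ = value;        /* store, then advance the stack pointer */
    return 1;
}

/* Returns 1 and writes *out on success, 0 if the stack is empty. */
int stack_pop(struct int_stack *s, int *out)
{
    if (s->sp == s->storage)
        return 0;
    *out = *--s->sp;         /* step back, then load */
    return 1;
}

/* Number of items currently on the stack, via pointer difference. */
size_t stack_depth(const struct int_stack *s)
{
    return (size_t)(s->sp - s->storage);
}
```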
Here's a case for pointer arithmetic outside of (strictly defined) arrays:
double d = 0.5;
unsigned char *bytes = (void *)&d;
for(size_t i = 0; i < sizeof d; i++)
printf("Byte %zu of d is %hhu\n", i, bytes[i]);
Why would you do this? I don't know. But if you want to look at the bitwise representation of an object (useful for things like memcpy and memcmp), you'll need to cast their addresses to unsigned char *s (or signed char *s if you like) and work with them byte-by-byte. (If your task isn't too difficult you can even write the code to work word-by-word, which most memcpy implementations will do. It's the same principle, though, just replace char with int32_t.)
Note that, in the standard, the exact values (or the number of values) that are printed are implementation-defined, but that this will always work as a way to access an object's internal bytewise representation. (It is not required to work for larger integer types, but almost always will - no processor I know of has had trap representations for integers in quite some time).
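A slightly more reusable form of the same byte-wise walk might look like the sketch below. bytes_to_hex is a hypothetical helper, and it assumes 8-bit bytes (CHAR_BIT == 8), which holds on mainstream platforms but is not guaranteed by the standard.

```c
#include <stddef.h>

/* Writes 2*n lowercase hex digits plus a terminating '\0' into out.
   out must have room for at least 2*n + 1 chars.
   Assumes 8-bit bytes. */
void bytes_to_hex(const void *obj, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char *bytes = obj;      /* any object may be read this way */
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];
        out[2 * i + 1] = digits[bytes[i] & 0x0f];
    }
    out[2 * n] = '\0';
}
```

Calling it as bytes_to_hex(&d, sizeof d, buffer) gives the representation of d in memory order, so the result depends on the platform's floating-point format and byte order.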
As an example, consider the following structure:
struct S {
int a[4];
int b[4];
} s;
Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?
Personally, I feel that it must be UB in C++, whereas I'm not sure about C.
However, I failed to find anything relevant in the standards of C and C++ languages.
Update
There are several answers suggesting ways to make sure there is no padding
between fields in order to make the code work reliably. I'd like to emphasize
that if such code is UB, then absence of padding is not enough. If it is UB,
then the compiler is free to assume that accesses to S.a[i] and S.b[j] do not
overlap and the compiler is free to reorder such memory accesses. For example,
int x = s.b[2];
s.a[6] = 2;
return x;
can be transformed to
s.a[6] = 2;
int x = s.b[2];
return x;
which always returns 2.
Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?
No, because accessing an array out of bounds invokes undefined behaviour in C and C++.
C11 J.2 Undefined behavior
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond
the array object and is used as the operand of a unary * operator that
is evaluated (6.5.6).
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
C++ standard draft section 5.7 Additive operators paragraph 5 says:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression.
[...] If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
Apart from the answer of @rsp (undefined behavior for an array subscript that is out of range), I can add that it is not legal to access b via a because the C language does not specify how much padding there can be between the end of the area allocated for a and the start of b. So even if it runs on a particular implementation, it is not portable.
instance of struct:
+-----------+----------------+-----------+---------------+
| array a | maybe padding | array b | maybe padding |
+-----------+----------------+-----------+---------------+
The second padding may be absent as well, because the alignment of the struct object is the alignment of a, which is the same as the alignment of b; but the C language does not require the second padding to be absent either.
a and b are two different arrays, and a is defined as containing 4 elements. Hence, a[6] accesses the array out of bounds and is therefore undefined behaviour. Note that the array subscript a[6] is defined as *(a+6), so the proof of UB is actually given by the section "Additive operators" in conjunction with pointers. See the following section of the C11 standard (e.g. this online draft version) describing this aspect:
6.5.6 Additive operators
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i-n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
The same argument applies to C++ (though not quoted here).
Further, though it is clearly undefined behaviour because it exceeds the array bounds of a, note that the compiler might introduce padding between members a and b, such that, even if such pointer arithmetic were allowed, a+6 would not necessarily yield the same address as b+2.
Is it legal? No. As others mentioned, it invokes Undefined Behavior.
Will it work? That depends on your compiler. That's the thing about undefined behavior: it's undefined.
On many C and C++ compilers, the struct will be laid out such that b will immediately follow a in memory and there will be no bounds checking. So accessing a[6] will effectively be the same as b[2] and will not cause any sort of exception.
Given
struct S {
int a[4];
int b[4];
} s;
and assuming no extra padding, the structure is really just a way of looking at a block of memory containing 8 integers. You could cast its address to (int*), and ((int*)&s)[6] would refer to the same memory as s.b[2].
Should you rely on this sort of behavior? Absolutely not. Undefined means that the compiler doesn't have to support this. The compiler is free to pad the structure which could render the assumption that &(s.b[2]) == &(s.a[6]) incorrect. The compiler could also add bounds checking on the array access (although enabling compiler optimizations would probably disable such a check).
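If the two halves really need to be addressable as one run of 8 ints, one option that stays within defined behavior is to store a single array and hand out pointers into it. This is a sketch of a different layout, not a fix for the struct above; the accessor names s_a and s_b are made up.

```c
struct S {
    int all[8];              /* one array object, so all[0..7] are in bounds */
};

/* "a" is the first half of the array. */
int *s_a(struct S *s)
{
    return &s->all[0];
}

/* "b" is the second half of the array. */
int *s_b(struct S *s)
{
    return &s->all[4];
}
```

With this layout, s_a(&s)[6] and s_b(&s)[2] designate the same element, because both pointers point into the single array all.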
I have experienced the effects of this in the past. It's quite common to have a struct like this
struct Bob {
char name[16];
char whatever[64];
} bob;
strcpy(bob.name, "some name longer than 16 characters");
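Now bob.whatever will be " than 16 characters". (This is why you should always use a bounded copy such as strncpy, BTW, and remember that strncpy does not null-terminate the destination when the source does not fit.)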
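A bounded copy that cannot spill into the next member can be sketched with snprintf, which always terminates the destination when the size is non-zero. copy_bounded is a hypothetical helper name; the struct is the one from the example above.

```c
#include <stdio.h>
#include <string.h>

struct Bob {
    char name[16];
    char whatever[64];
};

/* Copies src into dst, writing at most dst_size bytes including the
   terminating '\0'. Returns 1 if src had to be truncated, else 0. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    snprintf(dst, dst_size, "%s", src);
    return strlen(src) >= dst_size;
}
```

Called as copy_bounded(bob.name, sizeof bob.name, "some name longer than 16 characters"), it leaves bob.whatever untouched and stores only the first 15 characters in bob.name.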
As @MartinJames mentioned in a comment, if you need to guarantee that a and b are in contiguous memory (or at least able to be treated as such, (edit) unless your architecture/compiler uses an unusual memory block size/offset and forced alignment that would require padding to be added), you need to use a union.
union overlap {
char all[8]; /* all the bytes in sequence */
struct { /* (anonymous struct so its members can be accessed directly) */
char a[4]; /* padding may be added after this if the alignment is not a sub-factor of 4 */
char b[4];
};
};
You can't directly access b from a (e.g. a[6], like you asked), but you can access the elements of both a and b by using all (e.g. all[6] refers to the same memory location as b[2]).
(Edit: You could replace 8 and 4 in the code above with 2*sizeof(int) and sizeof(int), respectively, to be more likely to match the architecture's alignment, especially if the code needs to be more portable, but then you have to be careful to avoid making any assumptions about how many bytes are in a, b, or all. However, this will work on what are probably the most common (1-, 2-, and 4-byte) memory alignments.)
Here is a simple example:
#include <stdio.h>
union overlap {
char all[2*sizeof(int)]; /* all the bytes in sequence */
struct { /* anonymous struct so its members can be accessed directly */
char a[sizeof(int)]; /* low word */
char b[sizeof(int)]; /* high word */
};
};
int main()
{
union overlap testing;
testing.a[0] = 'a';
testing.a[1] = 'b';
testing.a[2] = 'c';
testing.a[3] = '\0'; /* null terminator */
testing.b[0] = 'e';
testing.b[1] = 'f';
testing.b[2] = 'g';
testing.b[3] = '\0'; /* null terminator */
printf("a=%s\n",testing.a); /* output: a=abc */
printf("b=%s\n",testing.b); /* output: b=efg */
printf("all=%s\n",testing.all); /* output: all=abc */
testing.a[3] = 'd'; /* makes printf keep reading past the end of a */
printf("a=%s\n",testing.a); /* output: a=abcdefg */
printf("b=%s\n",testing.b); /* output: b=efg */
printf("all=%s\n",testing.all); /* output: all=abcdefg */
return 0;
}
No, since accessing an array out of bounds invokes Undefined Behavior, both in C and C++.
Short Answer: No. You're in the land of undefined behavior.
Long Answer: No. But that doesn't mean that you can't access the data in other sketchier ways... if you're using GCC you can do something like the following (elaboration of dwillis's answer):
struct __attribute__((packed,aligned(4))) Bad_Access {
int arr1[3];
int arr2[3];
};
and then you could access via (Godbolt source+asm):
int x = ((int*)ba_pointer)[4];
But that cast violates strict aliasing so is only safe with g++ -fno-strict-aliasing. You can cast a struct pointer to a pointer to the first member, but then you're back in the UB boat because you're accessing outside the first member.
Alternatively, just don't do that. Save a future programmer (probably yourself) the heartache of that mess.
Also, while we're at it, why not use std::vector? It's not fool-proof, but on the back-end it has guards to prevent such bad behavior.
Addendum:
If you're really concerned about performance:
Let's say you have two same-typed pointers that you're accessing. The compiler will more than likely assume that both pointers have the chance to interfere, and will instantiate additional logic to protect you from doing something dumb.
If you solemnly swear to the compiler that you're not trying to alias, the compiler will reward you handsomely:
Does the restrict keyword provide significant benefits in gcc / g++
Conclusion: Don't be evil; your future self and the compiler will thank you.
Jed Schaff’s answer is on the right track, but not quite correct. If the compiler inserts padding between a and b, his solution will still fail. If, however, you declare:
typedef struct {
int a[4];
int b[4];
} s_t;
typedef union {
char bytes[sizeof(s_t)];
s_t s;
} u_t;
You may now access (int*)(bytes + offsetof(s_t, b)) to get the address of s.b, no matter how the compiler lays out the structure. The offsetof() macro is declared in <stddef.h>.
The expression sizeof(s_t) is a constant expression, legal in an array declaration in both C and C++. It will not give a variable-length array. (Apologies for misreading the C standard before. I thought that sounded wrong.)
In the real world, though, two consecutive arrays of int in a structure are going to be laid out the way you expect. (You might be able to engineer a very contrived counterexample by setting the bound of a to 3 or 5 instead of 4 and then getting the compiler to align both a and b on a 16-byte boundary.) Rather than convoluted methods to try to get a program that makes no assumptions whatsoever beyond the strict wording of the standard, you want some kind of defensive coding, such as static_assert(&both_arrays[4] == &s.b[0], "");. These add no run-time overhead and will fail if your compiler is doing something that would break your program, so long as you don’t trigger UB in the assertion itself.
If you want a portable way to guarantee that both sub-arrays are packed into a contiguous memory range, or split a block of memory the other way, you can copy them with memcpy().
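The memcpy route can be sketched as follows. flatten and unflatten are hypothetical names; the point is that each sub-array is copied as a whole, so any padding the compiler inserted between a and b never matters.

```c
#include <string.h>

typedef struct {
    int a[4];
    int b[4];
} s_t;

/* Copies a, then b, into one contiguous array of 8 ints,
   whatever padding s_t may contain. */
void flatten(const s_t *s, int out[8])
{
    memcpy(out,     s->a, sizeof s->a);
    memcpy(out + 4, s->b, sizeof s->b);
}

/* Splits a contiguous array of 8 ints back into the two members. */
void unflatten(const int in[8], s_t *s)
{
    memcpy(s->a, in,     sizeof s->a);
    memcpy(s->b, in + 4, sizeof s->b);
}
```

After flatten(&s, flat), flat[6] holds the value of s.b[2], and the indexing stays inside one array.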
The Standard does not impose any restrictions upon what implementations must do when a program tries to use an out-of-bounds array subscript in one structure field to access a member of another. Out-of-bounds accesses are thus "illegal" in strictly conforming programs, and programs which make use of such accesses cannot simultaneously be 100% portable and free of errors. On the other hand, many implementations do define the behavior of such code, and programs which are targeted solely at such implementations may exploit such behavior.
There are three issues with such code:
1. While many implementations lay out structures in predictable fashion, the Standard allows implementations to add arbitrary padding before any structure member other than the first. Code could use sizeof or offsetof to ensure that structure members are placed as expected, but the other two issues would remain.
2. Given something like:
if (structPtr->array1[x])
structPtr->array2[y]++;
return structPtr->array1[x];
it would normally be useful for a compiler to assume that the use of structPtr->array1[x] will yield the same value as the preceding use in the "if" condition, even though it would change the behavior of code that relies upon aliasing between the two arrays.
3. If array1[] has e.g. 4 elements, a compiler given something like:
if (x < 4) foo(x);
structPtr->array1[x]=1;
might conclude that since there would be no defined cases where x isn't less than 4, it could call foo(x) unconditionally.
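One way to keep the last of these issues from biting is to make sure the out-of-range access never happens at all, so the compiler's assumption is simply true. A minimal sketch, where struct pair and store_array1 are made-up names:

```c
#include <stddef.h>

struct pair {
    int array1[4];
    int array2[4];
};

/* Stores value in array1[x] and returns 1 when x is in range;
   returns 0 without touching memory otherwise. */
int store_array1(struct pair *p, size_t x, int value)
{
    if (x >= sizeof p->array1 / sizeof p->array1[0])
        return 0;                 /* reject before any access takes place */
    p->array1[x] = value;
    return 1;
}
```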
Unfortunately, while programs can use sizeof or offsetof to ensure that there aren't any surprises with struct layout, there's no way by which they can test whether compilers promise to refrain from the optimizations of types #2 or #3. Further, the Standard is a little vague about what would be meant in a case like:
struct foo {char array1[4],array2[4]; };
int test(struct foo *p, int i, int x, int y, int z)
{
if (p->array2[x])
{
((char*)p)[x]++;
((char*)(p->array1))[y]++;
p->array1[z]++;
}
return p->array2[x];
}
The Standard is pretty clear that behavior would only be defined if z is in the range 0..3, but since the type of p->array1 in that expression is char* (due to decay) it's not clear the cast in the access using y would have any effect. On the other hand, since converting pointer to the first element of a struct to char* should yield the same result as converting a struct pointer to char*, and the converted struct pointer should be usable to access all bytes therein, it would seem the access using x should be defined for (at minimum) x=0..7 [if the offset of array2 is greater than 4, it would affect the value of x needed to hit members of array2, but some value of x could do so with defined behavior].
IMHO, a good remedy would be to define the subscript operator on array types in a fashion that does not involve pointer decay. In that case, the expressions p->array1[x] and &(p->array1[x]) could invite a compiler to assume that x is 0..3, but p->array1+x and *(p->array1+x) would require a compiler to allow for the possibility of other values. I don't know if any compilers do that, but the Standard doesn't require it.
I came across a line of code written in C++:
long *lbuf = (long*)spiReadBuffer;
And it turns out that "spiReadBuffer" is a byte array with 12 elements. But I am a little confused. I think I am familiar with defining pointers and I can see that "lbuf" is a type "long" pointer. Also I thought for casting we can do something like this:
y = (int) x;
But what if I put a "*" after the "int" just like my first example, where there is one after "long"?
I apologize if this is a really trivial question, but as I went through the type casting and pointers topics I did not come across my case and I did not really understand it.
I would appreciate it if you could guide me or introduce me to any relevant materials or resources.
This is called type punning. It tricks the compiler into reading the memory occupied by an object as if it was of another type.
In your case, the array spiReadBuffer decays to a pointer to its first element, then the pointer is cast and stored. When you dereference this pointer, you will access the beginning of the array as if it were a long.
The problem with this approach is that it triggers undefined behaviour (see strict aliasing). So even though it works in a lot of situations, it can also break without notice.
There are two ways (that I know of) to type-pun safely. The first one is standard-compliant: std::memcpy.
char spiReadBuffer[12];
long rbAsLong;
std::memcpy(&rbAsLong, &spiReadBuffer, sizeof rbAsLong);
// rbAsLong contains the first sizeof(long) bytes of spiReadBuffer, reinterpreted as a long.
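The same technique in plain C, as a sketch: load_long_at is a hypothetical helper, and the caller is assumed to guarantee that offset + sizeof(long) does not exceed the buffer size. Because memcpy works byte by byte, the offset does not need to be aligned for long.

```c
#include <stddef.h>
#include <string.h>

/* Copies sizeof(long) bytes starting at buf[offset] into a long.
   The caller must ensure those bytes exist in the buffer. */
long load_long_at(const unsigned char *buf, size_t offset)
{
    long value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```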
The second one involves an extension that is often provided by compilers (but you should check), that extends the behaviour of unions.
union {
char buf[12];
long asLong;
} spiReadBuffer;
The C++ standard states that writing to one member of a union and then reading from another member is undefined behaviour. These compiler extensions choose to define it as a safe reinterpretation.
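For comparison, C (unlike C++) describes reading a union member other than the one last written as a reinterpretation of the stored bytes (see the footnote to 6.5.2.3 in C11). A minimal C sketch, with union word and bytes_to_word as made-up names; the resulting value depends on the platform's byte order:

```c
#include <stdint.h>

union word {
    unsigned char bytes[4];
    uint32_t value;
};

/* Writes the bytes member, then reads the value member.
   In C this reinterprets the stored bytes as a uint32_t. */
uint32_t bytes_to_word(const unsigned char b[4])
{
    union word w;
    for (int i = 0; i < 4; i++)
        w.bytes[i] = b[i];
    return w.value;
}
```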
In C and C++, an array is converted to a pointer to its first element in most expressions:
char spiReadBuffer[12];
char* pBuffer;
In most expressions the compiler will treat both spiReadBuffer and pBuffer as pointers to char.
The code snippet
long *lbuf = (long*)spiReadBuffer;
is an example of type casting, only it's for pointer types. A char* is converted to a long*. You could say this is a type of pointer arithmetic because now you can read sizeof(long) bytes at a time from spiReadBuffer using the long* (instead of one byte at a time).
The second snippet you showed : y = (int) x; is also a cast, but not for pointers;
Consider this snippet:
char spiReadBuffer[] = {1,2,3,4,5,6,7,8};
long *lbuf = (long*)spiReadBuffer;
printf("%08lx\n", (unsigned long)lbuf[0]);
Assuming a 4-byte long, it will print 04030201 on a little-endian architecture or 01020304 on a big-endian architecture.
After the long *lbuf = (long*)spiReadBuffer statement, lbuf points to the beginning of spiReadBuffer, and lbuf[0] (or *lbuf) allows you to read the first sizeof(long) bytes of spiReadBuffer (4 in this example) as a long.
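If the goal is a specific byte order rather than whatever the host uses, the value can be assembled from the individual bytes with shifts. This sketch sidesteps the pointer cast entirely; load_le32 and load_be32 are hypothetical names, and the buffer is assumed to hold at least 4 bytes.

```c
#include <stdint.h>

/* Interprets p[0..3] as a little-endian 32-bit value. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interprets p[0..3] as a big-endian 32-bit value. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

For the bytes {1,2,3,4,...}, load_le32 yields 0x04030201 and load_be32 yields 0x01020304 on any host.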
I have learned recently that size_t was introduced to help future-proof code against native bit count increases and increases in available memory. The specific use definition seems to be on the storing of the size of something, generally an array.
I now must wonder how far this future proofing should be taken. Surely it is pointless to have an array length defined using the future-proof and appropriately sized size_t if the very next task of iterating over the array uses say an unsigned int as the index array:
void f(double* vector, size_t vectorLength) {
for (unsigned int i = 0; i < vectorLength; i++) {
//...
}
}
In fact, in this case I might expect the syntax strictly should up-convert the unsigned int to a size_t for the relational operator.
Does this imply the iterator variable i should simply be a size_t?
Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?
Does it imply any code using logic that develops the index programmatically should then create a new result value of type size_t, particularly if the logic relies on potentially signed integer values? i.e.
double foo[100];
//...
int a = 4;
int b = -10;
int c = 50;
int index = a + b + c;
double d = foo[(size_t)index];
Surely though since my code logic creates a fixed bound, up-converting to the size_t provides no additional protection.
You should keep in mind the automatic conversion rules of the language.
Does this imply the iterator variable i should simply be a size_t?
Yes it does, because if size_t is larger than unsigned int and your array is actually larger than can be indexed with an unsigned int, then your variable (i) can never reach the size of the array.
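In code, that just means the loop counter gets the same type as the length. A minimal sketch, with sum_vector as a made-up function name:

```c
#include <stddef.h>

/* Sums vectorLength doubles. The index has the same type as the
   length, so it can reach every element however large the array is. */
double sum_vector(const double *vector, size_t vectorLength)
{
    double total = 0.0;
    for (size_t i = 0; i < vectorLength; i++)
        total += vector[i];
    return total;
}
```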
Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?
You try to make it sound drastic, while it's not. Why do you choose a variable as double and not float? Why would you make a variable as unsigned and one not? Why would you make a variable short while another is int? Of course, you always know what your variables are going to be used for, so you decide what types they should get. The choice of size_t is one among many and it's similarly decided.
In other words, every variable in a program should be functionally identified and given the correct type.
Does it imply any code using logic that develops the index programmatically should then create a new result value of type size_t, particularly if the logic relies on potentially signed integer values?
Not at all. First, if the variable can never have negative values, then it could have been unsigned int or size_t in the first place. Second, if the variable can have negative values during computation, then you should definitely make sure that in the end it's non-negative, because you shouldn't index an array with a negative number.
That said, if you are sure your index is non-negative, then casting it to size_t doesn't make any difference. C11 at 6.5.2.1 says (emphasis mine):
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2th element of E1 (counting from zero).
Which means whatever type of index for which some_pointer + index makes sense, is allowed to be used as index. In other words, if you know your int has enough space to contain the index you are computing, there is absolutely no need to cast it to a different type.
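When the index comes out of signed arithmetic, the part that matters is the range check, not the cast. A possible sketch, where to_index is a hypothetical helper:

```c
#include <stddef.h>

/* Returns 1 and writes *out if index lies in [0, length);
   returns 0 and leaves *out alone otherwise. */
int to_index(long index, size_t length, size_t *out)
{
    if (index < 0)
        return 0;                         /* negative: not a valid subscript */
    if ((unsigned long)index >= length)
        return 0;                         /* at or past the end */
    *out = (size_t)index;
    return 1;
}
```

With the values from the question, to_index(4 + -10 + 50, 100, &i) succeeds and sets i to 44.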
Surely it is pointless to have an array length defined using the future-proof and appropriately sized size_t if the very next task of iterating over the array uses say an unsigned int as the index array
Yes it is. So don't do it.
In fact in this case I might expect the syntax strictly should up-convert the unsigned int to a size_t for the relation operator.
It will only be converted in that particular < operation. The upper limit of your unsigned int variable will not be changed, so the ++ operation will always work with an unsigned int, rather than a size_t.
Does this imply the iterator variable i should simply be a size_t?
Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?
Yeah well, it is better than int... But there is a smarter way to write programs: use common sense. Whenever you declare an array, you can actually stop and consider in advance how many items the array would possibly need to store. If it will never contain more than 100 items, there is absolutely no reason for you to use int nor to use size_t to index it.
In the 100 items case, simply use uint_fast8_t. Then the program is optimized for size as well as speed, and 100% portable.
Whenever declaring a variable, a good programmer will activate their brain and consider the following:
What is the range of the values that I will store inside this variable?
Do I actually need to store negative numbers in it?
In the case of an array, how many values will I need in the worst-case? (If unknown, do I have to use dynamic memory?)
Are there any compatibility issues with this variable if I decide to port this program?
As opposed to a bad programmer, who does not activate their brain but simply types int all over the place.
As discussed by Neil Kirk, iterators are a future proof counterpart of size_t.
An additional point in your question is the computation of a position, and this typically includes an absolute position (e.g. a in your example) and possibly one or more relative quantities (e.g. b or c), potentially signed.
The signed counterpart of size_t is ptrdiff_t, and the analogue for an iterator type I is typename I::difference_type.
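In C terms, an absolute position would be a size_t and a relative quantity a ptrdiff_t. Combining the two safely takes a little care; the sketch below uses a hypothetical move_position helper that refuses results outside [0, len).

```c
#include <stddef.h>

/* Applies a signed delta to an absolute position.
   Returns 1 and writes *out if the result lies in [0, len), else 0. */
int move_position(size_t pos, ptrdiff_t delta, size_t len, size_t *out)
{
    if (delta < 0) {
        /* Magnitude of delta, computed without overflowing ptrdiff_t. */
        size_t back = (size_t)(-(delta + 1)) + 1;
        if (back > pos)
            return 0;                    /* would go below zero */
        *out = pos - back;
    } else {
        size_t fwd = (size_t)delta;
        if (pos >= len || fwd >= len - pos)
            return 0;                    /* would reach or pass len */
        *out = pos + fwd;
    }
    return *out < len;
}
```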
As you describe in your question, it is best to use the appropriate types everywhere in your code, so that no conversions are needed. For memory efficiency, if you have e.g. an array of one million positions into other arrays and you know these positions are in the range 0-255, then you can use unsigned char; but then a conversion is necessary at some point.
In such cases, it is best to name this type, e.g.
using pos = unsigned char;
and make all conversions explicit. Then the code will be easier to maintain, should the range 0-255 increase in the future.
Yep, if you use int to index an array, you defeat the point of using size_t in other places. This is why you can use iterators with STL. They are future proof. For C arrays, you can use either size_t, pointers, or algorithms and lambdas or range-based for loops (C++11). If you need to store the size or index in variables, they will need to be size_t or other appropriate types, as will anything else they interact with, unless you know the size will be small. (For example, if you store the distance between two elements which will always be in a small range, you can use int).
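double *my_array;
// Pointer-based loop: note the semicolon between the initializer and the condition.
for (double *it = my_array, *end_it = my_array + my_array_size; it != end_it; ++it)
{
// use *it
}
// my_array is a pointer here, so pass the range explicitly
// (std::begin and std::end need a real array or a container).
std::for_each(my_array, my_array + my_array_size, [](double& x)
{
// use x
});
// The range-based for also requires a real array or a container, not a pointer.
for (auto& x : my_array)
{
// use x
}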
Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?
I'll pick that point, and say clearly Yes. Besides, in most cases a variable used as an array index is only used as that (or something related to it).
And this rule does not only apply here, but also in other circumstances: There are many use cases where nowadays a special type exists: ptrdiff_t, off_t (which even may change depending on the configuration we use!), pid_t and a lot of others.
void* is a useful feature of C and derivative languages. For example, it's possible to use void* to store Objective-C object pointers in a C++ class.
I was working on a type conversion framework recently and due to time constraints was a little lazy - so I used void*... That's how this question came up:
Why can I typecast int to void*, but not float to void* ?
BOOL is not a C++ type. It's probably typedef or defined somewhere, and in these cases, it would be the same as int. Windows, for example, has this in Windef.h:
typedef int BOOL;
so your question reduces to, why can you typecast int to void*, but not float to void*?
Converting int to void* is OK, because the two are inherently similar in representation (a pointer is basically an integer that holds an address in memory), but it is generally not recommended, and some compilers will warn about it.
float to void* is not ok because the interpretation of the float value and the actual bits representing it are different. For example, if you do:
float x = 1.0;
what it does is it sets the 32 bit memory to 00 00 80 3f (the actual representation of the float value 1.0 in IEEE single precision). When you cast a float to a void*, the interpretation is ambiguous. Do you mean the pointer that points to location 1 in memory? or do you mean the pointer that points to location 3f800000 (assuming little endian) in memory?
Of course, if you are sure which of the two cases you want, there is always a way to get around the problem. For example:
void* u = (void*)((int)x); // first case
void* u = (void*)(((unsigned short*)(&x))[0] | (((unsigned int)((unsigned short*)(&x))[1]) << 16)); // second case
Pointers are usually represented internally by the machine as integers. C allows you to cast back and forth between pointer type and integer type. (A pointer value may be converted to an integer large enough to hold it, and back.)
Using void* to hold integer values is unconventional. It's not guaranteed by the language to work, but if you want to be sloppy and constrain yourself to Intel and other commonplace platforms, it will basically scrape by.
Effectively what you're doing is using void* as a generic container of however many bytes are used by the machine for pointers. This differs between 32-bit and 64-bit machines. So converting long long to void* would lose bits on a 32-bit platform.
As for floating-point numbers, the intention of (void*) 10.5f is ambiguous. Do you want to round 10.5 to an integer, then convert that to a nonsense pointer? No, you want the bit-pattern used by the FPU to be placed into a nonsense pointer. This can be accomplished by assigning float f = 10.5f; void *vp = (void *)(uintptr_t)*(uint32_t *)&f;, but be warned that this is just nonsense: pointers aren't generic storage for bits.
The best generic storage for bits is char arrays, by the way. The language standards guarantee that memory can be manipulated through char*. But you have to mind data alignment requirements.
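If the bit pattern of a float is what you are after, copying it into a same-sized unsigned integer avoids both the pointer and the aliasing problem. A sketch, assuming float is 32 bits wide (checked at compile time below); float_bits and bits_to_float are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Assumption made explicit: float occupies exactly 32 bits here. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Returns the object representation of f as a 32-bit integer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse of float_bits. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On a platform that uses IEEE 754 single precision, float_bits(1.0f) is 0x3f800000.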
The standard says: "An integer may be converted to any pointer type." It doesn't say anything about pointer-float conversion.
If you want to transfer a float value as a void *, there is a workaround using type punning.
Here is an example:
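#include <stdint.h>
#include <stdio.h>
/* Assumes sizeof(int) == sizeof(float), as on most common platforms. */
struct mfloat {
union {
float fvalue;
int ivalue;
};
};
void print_float(void *data)
{
struct mfloat mf;
/* Recover the float's bits from the pointer value. */
mf.ivalue = (int)(intptr_t)data;
printf("%.2f\n", mf.fvalue);
}
int main(void)
{
struct mfloat mf;
mf.fvalue = 1.99f;
/* Pass the float's bits disguised as a pointer. */
print_float((void *)(intptr_t)mf.ivalue);
return 0;
}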
We have used a union to reinterpret our float value (fvalue) as an integer (ivalue), cast that integer to void *, and then done the reverse on the receiving side.
The question is based on a false premise, namely that void * is somehow a "generic" or "catch-all" type in C or C++. It is not. It is a generic object pointer type, meaning that it can safely store pointers to any type of data, but it cannot itself contain any type of data.
You could use a void * pointer to generically manipulate data of any type by allocating sufficient memory to hold an object of any given type, then using a void * pointer to point to it. In some cases you could also use a union, which is of course designed to be able to contain objects of multiple types.
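As a minimal sketch of the "allocate, then point a void * at it" approach: the helper name box below is invented for this example, and the caller is assumed to free the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy an object of any type into freshly allocated
   memory and return a void * to the copy, or NULL if allocation fails.
   The caller owns the result and must free() it. */
void *box(const void *src, size_t size)
{
    void *p;
    assert(src != NULL && size > 0);
    p = malloc(size);
    if (p != NULL)
        memcpy(p, src, size);
    return p;
}
```

For a float f, box(&f, sizeof f) gives a void * that can be passed around generically and later converted back to float * to read the value. Here the void * really is a pointer to data, rather than a container for it.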
Now, because pointers can be thought of as integers (and indeed, on conventionally-addressed architectures, typically are integers) it is possible and in some circles fashionable to stuff an integer into a pointer. Some library APIs have even documented and supported this usage; one notable example was X Windows.
Conversions between pointers and integers are implementation-defined, and these days typically draw warnings, and so typically require an explicit cast, not so much to force the conversion as simply to silence the warning. For example, both the code fragments below print 77, but the first one probably draws compiler warnings.
/* fragment 1: */
int i = 77;
void *p = i;
int j = p;
printf("%d\n", j);
/* fragment 2: */
int i = 77;
void *p = (void *)(uintptr_t)i;
int j = (int)(uintptr_t)p;
printf("%d\n", j);
In both cases, we are not really using the void * pointer p as a pointer at all: we are merely using it as a vessel for some bits. This relies on the fact that on a conventionally-addressed architecture, the implementation-defined behavior of a pointer/integer conversion is the obvious one, which to an assembly-language programmer or an old-school C programmer doesn't seem like a "conversion" at all. And if you can stuff an int into a pointer, it's not surprising if you can stuff in other integral types, like bool, as well.
But what about trying to stuff a floating-point value into a pointer? That's considerably more problematic. Stuffing an integer value into a pointer, though implementation-defined, makes perfect sense if you're doing bare-metal programming: you're taking the numeric value of the integer, and using it as a memory address. But what would it mean to try to stuff a floating-point value into a pointer?
It's so meaningless that the C Standard doesn't even label it "undefined".
It's so meaningless that a typical compiler won't even attempt it.
And if you think about it, it's not even obvious what it should do.
Would you want to use the numeric value, or the bit pattern, as the thing to try to stuff into the pointer? Stuffing in the numeric value is closer to how floating-point-to-integer conversions work, but you'd lose your fractional part. Using the bit pattern is what you'd probably want, but accessing the bit pattern of a floating-point value is never something that C makes easy, as generations of programmers who have attempted things like
uint32_t hexval = (uint32_t)3.0;
have discovered.
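To spell out the difference between the two readings, here is a small sketch. The function names are made up, and float_bits assumes a 32-bit float, which is common but not guaranteed by the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Numeric conversion: the cast converts the value, so 3.0f becomes 3. */
uint32_t float_value(float f)
{
    return (uint32_t)f;
}

/* Bit pattern: copy the object representation instead of converting.
   Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a platform using IEEE 754 single precision, float_value(3.0f) is 3 while float_bits(3.0f) is 0x40400000.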
Nevertheless, if you were bound and determined to store a floating-point value in a void * pointer, you could probably accomplish it, using sufficiently brute-force casts, although the results are probably both undefined and machine-dependent. (That is, I think there's a strict aliasing violation here, and if pointers are bigger than floats, as of course they are on a 64-bit architecture, I think this will probably only work if the architecture is little-endian.)
float f = 77.75;
void *p = (void *)(uintptr_t)*(uint32_t *)&f;
float f2 = *(float *)&p;
printf("%f\n", f2);
dmr help me, this actually does print 77.75 on my machine.
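If you really must do this, a variant that sidesteps the strict aliasing and endianness worries might look like the sketch below. The helper names are invented; it still assumes a 32-bit float and that integer/pointer conversion preserves the low 32 bits, which is implementation-defined but holds on conventional flat-address machines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack a float's bit pattern into a pointer value. */
void *float_to_ptr(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);     /* bit pattern, no aliasing cast */
    return (void *)(uintptr_t)bits;     /* implementation-defined */
}

/* Hypothetical helper: recover the float from the pointer value. */
float ptr_to_float(void *p)
{
    uint32_t bits = (uint32_t)(uintptr_t)p;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because the bits travel through an integer value rather than through the pointer's own bytes, it should not matter whether pointers are wider than floats or which byte order the machine uses.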
Is the following code 100% portable?
int a=10;
size_t size_of_int = (char *)(&a+1)-(char*)(&a); // No problem here?
std::cout<<size_of_int;// or printf("%zu",size_of_int);
P.S.: The question is only for learning purposes, so please don't give answers like "use sizeof()".
From ANSI-ISO-IEC 14882-2003, p.87 (c++03):
"75) Another way to approach pointer arithmetic is first to convert the pointer(s) to character pointer(s): In this scheme the integral value of the expression added to or subtracted from the converted pointer is first multiplied by the size of the object originally pointed to, and the resulting pointer is converted back to the original type. For pointer subtraction, the result of the difference between the character pointers is similarly divided by the size of the object originally pointed to."
This seems to suggest that the pointer difference equals the object size.
If we remove the UB'ness from incrementing a pointer to a scalar a and turn a into an array:
int a[1];
size_t size_of_int = (char*)(a+1) - (char*)(a);
std::cout<<size_of_int;// or printf("%zu",size_of_int);
Then this looks OK. The clauses about alignment requirements are consistent with the footnote, if the size of the object is always a multiple of its alignment requirement.
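A quick way to convince yourself of the size/alignment relationship is the sketch below. The struct sample type and both function names are made up for the demonstration, and _Alignof requires C11.

```c
#include <assert.h>
#include <stddef.h>

struct sample { char tag; double value; };  /* hypothetical type with padding */

/* Byte distance between v[0] and v[1] in an array of struct sample. */
size_t element_stride(void)
{
    struct sample v[2];
    ptrdiff_t diff = (char *)&v[1] - (char *)&v[0];
    assert(diff > 0);
    return (size_t)diff;
}

/* 1 if sizeof(struct sample) is a whole multiple of its alignment. */
int size_is_multiple_of_alignment(void)
{
    return sizeof(struct sample) % _Alignof(struct sample) == 0;
}
```

Array elements are laid out back to back, so the stride has to equal sizeof(struct sample); any padding needed to keep v[1] aligned is already counted inside the struct's size.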
UPDATE: Interesting. As most of you probably know, GCC allows you to specify an explicit alignment for types as an extension. But I can't break the OP's "sizeof" method with it because GCC refuses to compile it:
#include <stdio.h>
typedef int a8_int __attribute__((aligned(8)));
int main()
{
a8_int v[2];
printf("=>%td\n", (char*)&v[1] - (char*)&v[0]);
}
The message is error: alignment of array elements is greater than element size.
&a+1 will lead to undefined behavior according to the C++ Standard 5.7/5:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. <...> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
&a+1 is OK according to 5.7/4:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
That means that 5.7/5 can be applied without UB. And finally, footnote 75 from 5.7/6, as @Luther Blissett noted in his answer, says that the code in the question is valid.
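Under that reading, the trick from the question generalizes to any addressable object. Here is a sketch; the macro SIZE_BY_POINTERS, the struct pair type and check_sizes are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: byte distance from &obj to &obj + 1.
   obj must be an lvalue whose address can be taken. Forming &obj + 1
   is allowed (one past the end); dereferencing it would not be. */
#define SIZE_BY_POINTERS(obj) \
    ((size_t)((char *)(&(obj) + 1) - (char *)&(obj)))

struct pair { char c; long n; };  /* made-up type for the demonstration */

void check_sizes(void)
{
    int i = 0;
    double d = 0.0;
    struct pair p = { 'x', 1 };

    assert(SIZE_BY_POINTERS(i) == sizeof i);
    assert(SIZE_BY_POINTERS(d) == sizeof d);
    assert(SIZE_BY_POINTERS(p) == sizeof p);
}
```

This is only a learning exercise, of course; it does nothing that sizeof does not already do.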
In production code you should use sizeof instead. But the C++ Standard doesn't guarantee that sizeof(int) will result in 4 on every 32-bit platform.
No. This code won't work as you expect on every platform. At least in theory, there might be a platform with e.g. 24-bit integers (= 3 bytes) but 32-bit alignment. Such alignments are not untypical for (older or simpler) platforms. Then, your code would return 4, but sizeof(int) would return 3.
But I am not aware of any real hardware that behaves that way. In practice, your code will work on most or all platforms.
It's not 100% portable for the following reasons:
Edit: You'd best use int a[1]; and then a+1 becomes definitively valid.
&a is not permitted on objects of register storage class (in C, taking the address of such an object is a constraint violation).
If the alignment restriction is larger than the size of the int type, size_of_int will not contain the correct answer.
Disclaimer:
I am uncertain if the above hold for C++.
Why not just:
size_t size_of_int = sizeof(int);
It is probably implementation defined.
I can imagine a (hypothetical) system where sizeof(int) is smaller than the default alignment.
It seems safe to say only that size_of_int >= sizeof(int).
The code above will portably compute sizeof(int) on a target platform but the latter is implementation defined - you will get different results on different platforms.
Yes, it gives you the equivalent of sizeof(a) but using ptrdiff_t instead of size_t type.
There was a debate on a similar question.
See the comments on my answer to that question for some pointers at why this is not only non-portable, but also is undefined behaviour by the standard.