Are pointer variables just integers with some operators or are they "symbolic"? - c++

EDIT: The original word choice was confusing. The term "symbolic" is much better than the original ("mystical").
In the discussion about my previous C++ question, I have been told that pointers are
"a simple value type much like an integer"
not "mystical"
"The Bit pattern (object representation) contains the value (value representation) (§3.9/4) for trivially copyable types, which a pointer is."
This does not sound right! If nothing is symbolic and a pointer is its representation, then I can do the following. Can I?
#include <stdio.h>
#include <string.h>
int main() {
    int a[1] = { 0 }, *pa1 = &a[0] + 1, b = 1, *pb = &b;
    if (memcmp (&pa1, &pb, sizeof pa1) == 0) {
        printf ("pa1 == pb\n");
        *pa1 = 2;
    }
    else {
        printf ("pa1 != pb\n");
        pa1 = &a[0]; // ensure well defined behaviour in printf
    }
    printf ("b = %d *pa1 = %d\n", b, *pa1);
    return 0;
}
This is a C and C++ question.
Testing with Compile and Execute C Online with GNU GCC v4.8.3: gcc -O2 -Wall gives
pa1 == pb
b = 1 *pa1 = 2
Testing with Compile and Execute C++ Online with GNU GCC v4.8.3: g++ -O2 -Wall
pa1 == pb
b = 1 *pa1 = 2
So the modification of b via pa1 (i.e. via &a[0] + 1) fails with GCC in both C and C++.
Of course, I would like an answer based on standard quotes.
EDIT: To respond to criticism about UB on &a + 1, now a is an array of 1 element.
Related: Dereferencing an out of bound pointer that contains the address of an object (array of array)
Additional note: the term "mystical" was first used, I think, by Tony Delroy here. I was wrong to borrow it.

The first thing to say is that a sample of one test on one compiler generating code on one architecture is not the basis on which to draw a conclusion on the behaviour of the language.
C++ (and C) are general-purpose languages created with the intention of being portable, i.e. a well-formed program written in C++ on one system should run on any other (barring calls to system-specific services).
Once upon a time, for various reasons including backward-compatibility and cost, memory maps were not contiguous on all processors.
For example, I used to write code on a 6809 system where half the memory was paged in via a PIA addressed in the non-paged part of the memory map. My C compiler was able to cope with this because pointers were, for that compiler, a 'mystical' type which knew how to write to the PIA.
The 80386 family has a segmented addressing mode in which addresses are organised in 16-byte units (paragraphs). Look up FAR pointers and you'll see different pointer arithmetic.
This is the history of pointer development in C++. Not all chip manufacturers have been "well behaved", and the language accommodates them all (usually) without source code needing to be rewritten.

Stealing the quote from TartanLlama:
[expr.add]/5 "[for pointer addition, ] if both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
So the compiler can assume that your pointer points to an element of the array a, or one past its last element. If it points one past the end, you cannot dereference it. But since you do dereference it, it surely can't be one past the end, so it can only point inside the array.
So now you have your code (reduced)
b = 1;
*pa1 = 2;
where pa1 points inside the array a and b is a separate variable. And when you print them, you get exactly 1 and 2, the values you have assigned them.
An optimizing compiler can figure that out without even storing a 1 or a 2 to memory. It can just print the final result.
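To make that concrete, here is an illustrative sketch (not what any particular GCC version literally emits) of the program the optimizer is effectively allowed to act on, once it assumes that a dereferenced pa1 must point into a:

#include <stdio.h>

int main() {
    int a[1] = { 0 };
    int b = 1;         /* no visible write to b, so it may live in a register */
    int *pa1 = &a[0];  /* assumed to point into a; dereferencing rules out "one past the end" */
    *pa1 = 2;          /* affects a[0] only, never b */
    printf("b = %d *pa1 = %d\n", b, *pa1); /* may be emitted as the constants 1 and 2 */
    return 0;
}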

If you turn off the optimiser the code works as expected.
By using pointer arithmetic that is undefined you are fooling the optimiser.
The optimiser has figured out that there is no code writing to b, so it can safely store it in a register. As it turns out, you have acquired the address of b in a non-standard way and modify the value in a way the optimiser doesn't see.
If you read the C standard, it says that pointers may be mystical. gcc pointers are not mystical. They are stored in ordinary memory and consist of the same type of bytes that make up all other data types. The behaviour you encountered is due to your code not respecting the limitations stated for the optimiser level you have chosen.
Edit:
The revised code is still UB. The standard doesn't allow dereferencing a[1] (one past the end of a) even if that pointer value happens to be identical to another valid pointer value. So the optimiser is still allowed to store the value of b in a register.

C was conceived as a language in which pointers and integers were very intimately related, with the exact relationship depending upon the target platform. The relationship between pointers and integers made the language very suitable for purposes of low-level or systems programming. For purposes of discussion below, I'll thus call this language "Low-Level C" [LLC].
The C Standards Committee wrote up a description of a different language, where such a relationship is not expressly forbidden, but is not acknowledged in any useful fashion, even when an implementation generates code for a target and application field where such a relationship would be useful. I'll call this language "High Level Only C" [HLOC].
In the days when the Standard was written, most things that called themselves C implementations processed a dialect of LLC. Most useful compilers process a dialect which defines useful semantics in more cases than HLOC, but not as many as LLC. Whether pointers behave more like integers or more like abstract mystical entities depends upon which exact dialect one is using. If one is doing systems programming, it is reasonable to view C as treating pointers and integers as intimately related, because LLC dialects suitable for that purpose do so, and HLOC dialects that don't do so aren't suitable for that purpose. When doing high-end number crunching, however, one would far more often be using dialects of HLOC which do not recognize such a relationship.
The real problem, and source of so much contention, lies in the fact that LLC and HLOC are increasingly divergent, and yet are both referred to by the name C.

Related

What is a valid pointer in gcc linux x86-64 C++?

I am programming C++ using gcc on an obscure system called linux x86-64. I was hoping that maybe there are a few folks out there who have used this same, specific system (and might also be able to help me understand what is a valid pointer on this system). I do not care to access the location pointed to by the pointer; I just want to calculate it via pointer arithmetic.
According to section 3.9.2 of the standard:
A valid value of an object pointer type represents either the address of a byte in memory (1.7) or a null pointer.
And according to [expr.add]/4:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.
And according to a stackoverflow question on valid C++ pointers in general:
Is 0x1 a valid memory address on your system? Well, for some embedded systems it is. For most OSes using virtual memory, the page beginning at zero is reserved as invalid.
Well, that makes it perfectly clear! So, besides NULL, a valid pointer is a byte in memory, no, wait, it's an array element including the element right after the array, no, wait, it's a virtual memory page, no, wait, it's Superman!
(I guess that by "Superman" here I mean "garbage collectors"... not that I read that anywhere, just smelled it. Seriously, though, all the best garbage collectors don't break in a serious way if you have bogus pointers lying around; at worst they just don't collect a few dead objects every now and then. Doesn't seem like anything worth messing up pointer arithmetic for.).
So, basically, a proper compiler would have to support all of the above flavors of valid pointers. I mean, a hypothetical compiler having the audacity to generate undefined behavior just because a pointer calculation is bad would be dodging at least the 3 bullets above, right? (OK, language lawyers, that one's yours).
Furthermore, many of these definitions are next to impossible for a compiler to know about. There are just so many ways of creating a valid memory byte (think lazy segfault trap microcode, sideband hints to a custom pagetable system that I'm about to access part of an array, ...), mapping a page, or simply creating an array.
Take, for example, a largish array I created myself, and a smallish array that I let the default memory manager create inside of that:
#include <iostream>
#include <inttypes.h>
#include <assert.h>
using namespace std;
extern const char largish[1000000000000000000L];
asm("largish = 0");
int main()
{
    char* smallish = new char[1000000000];
    cout << "largish base = " << (long)largish << "\n"
         << "largish length = " << sizeof(largish) << "\n"
         << "smallish base = " << (long)smallish << "\n";
}
Result:
largish base = 0
largish length = 1000000000000000000
smallish base = 23173885579280
(Don't ask how I knew that the default memory manager would allocate something inside of the other array. It's an obscure system setting. The point is I went through weeks of debugging torment to make this example work, just to prove to you that different allocation techniques can be oblivious to one another).
Given the number of ways of managing memory and combining program modules that are supported in linux x86-64, a C++ compiler really can't know about all of the arrays and various styles of page mappings.
Finally, why do I mention gcc specifically? Because it often seems to treat any pointer as a valid pointer... Take, for instance:
char* super_tricky_add_operation(char* a, long b) {return a + b;}
While after reading all the language specs you might expect the implementation of super_tricky_add_operation(a, b) to be rife with undefined behavior, it is in fact very boring, just an add or lea instruction. Which is so great, because I can use it for very convenient and practical things like non-zero-based arrays if nobody is putzing with my add instructions just to make a point about invalid pointers. I love gcc.
In summary, it seems that any C++ compiler supporting standard linkage tools on linux x86-64 would almost have to treat any pointer as a valid pointer, and gcc appears to be a member of that club. But I'm not quite 100% sure (given enough fractional precision, that is).
So... can anyone give a solid example of an invalid pointer in gcc linux x86-64? By solid I mean leading to undefined behavior. And explain what gives rise to the undefined behavior allowed by the language specs?
(or provide gcc documentation proving the contrary: that all pointers are valid).
Usually pointer math does exactly what you'd expect regardless of whether pointers are pointing at objects or not.
UB doesn't mean it has to fail. Only that it's allowed to make the whole rest of the program behave strangely in some way. UB doesn't mean that just the pointer-compare result can be "wrong", it means the entire behaviour of the whole program is undefined. This tends to happen with optimizations that depend on a violated assumption.
Interesting corner cases include an array at the very top of virtual address space: a pointer to one-past-the-end would wrap to zero, so start < end would be false?!? But pointer comparison doesn't have to handle that case, because the Linux kernel won't ever map the top page, so pointers into it can't be pointing into or just past objects. See Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?
Related:
GCC does have a max object size of PTRDIFF_MAX (which is a signed type). So for example, on 32-bit x86, an array larger than 2GB isn't fully supported for all cases of code-gen, although you can mmap one.
See my comment on What is the maximum size of an array in C? - this restriction lets gcc implement pointer subtraction (to get a size) without keeping the carry-out from the high bit, for types wider than char where the C subtraction result is in objects, not bytes, so in asm it's (a - b) / sizeof(T).
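As a small aside to illustrate that last point (values here are made up): pointer subtraction is defined in elements, so the generated code divides the byte difference by sizeof(T).

#include <cstddef>
#include <cstdio>

int main() {
    int arr[8] = {};
    int *hi = arr + 6;
    int *lo = arr + 1;
    std::ptrdiff_t n = hi - lo;   // 5 elements: a 20-byte difference divided by sizeof(int)
    std::printf("%td\n", n);      // %td is the printf conversion for ptrdiff_t
    return 0;
}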
Don't ask how I knew that the default memory manager would allocate something inside of the other array. It's an obscure system setting. The point is I went through weeks of debugging torment to make this example work, just to prove to you that different allocation techniques can be oblivious to one another.
First of all, you never actually allocated the space for largish[]. You used inline asm to make it start at address 0, but did nothing to actually get those pages mapped.
The kernel won't overlap existing mapped pages when new uses brk or mmap to get new memory from the kernel, so in fact static and dynamic allocation can't overlap.
Second, char[1000000000000000000L] ~= 2^59 bytes. Current x86-64 hardware and software only support canonical 48-bit virtual addresses (sign-extended to 64-bit). This will change with a future generation of Intel hardware which adds another level of page tables, taking us up to 48+9 = 57-bit addresses. (Still with the top half used by the kernel, and a big hole in the middle.)
Your unallocated space from 0 to ~2^59 covers all user-space virtual memory addresses that are possible on x86-64 Linux, so of course anything you allocate (including other static arrays) will be somewhere "inside" this fake array.
Removing the extern const from the declaration (so the array is actually allocated, https://godbolt.org/z/Hp2Exc) runs into the following problems:
//extern const
char largish[1000000000000000000L];
//asm("largish = 0");
/* rest of the code unchanged */
RIP-relative or 32-bit absolute (-fno-pie -no-pie) addressing can't reach static data that gets linked after largish[] in the BSS, with the default code model (-mcmodel=small where all static code+data is assumed to fit in 2GB)
$ g++ -O2 large.cpp
/usr/bin/ld: /tmp/cc876exP.o: in function `_GLOBAL__sub_I_largish':
large.cpp:(.text.startup+0xd7): relocation truncated to fit: R_X86_64_PC32 against `.bss'
/usr/bin/ld: large.cpp:(.text.startup+0xf5): relocation truncated to fit: R_X86_64_PC32 against `.bss'
collect2: error: ld returned 1 exit status
compiling with -mcmodel=medium places largish[] in a large-data section where it doesn't interfere with addressing other static data, but it is itself addressed using 64-bit absolute addressing. (Or -mcmodel=large does that for all static code/data, so every call is indirect movabs reg,imm64 / call reg instead of call rel32.)
That lets us compile and link, but then the executable won't run because the kernel knows that only 48-bit virtual addresses are supported and won't map the program in its ELF loader before running it, or for PIE before running ld.so on it.
peter@volta:/tmp$ g++ -fno-pie -no-pie -mcmodel=medium -O2 large.cpp
peter@volta:/tmp$ strace ./a.out
execve("./a.out", ["./a.out"], 0x7ffd788a4b60 /* 52 vars */) = -1 EINVAL (Invalid argument)
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)
peter@volta:/tmp$ g++ -mcmodel=medium -O2 large.cpp
peter@volta:/tmp$ strace ./a.out
execve("./a.out", ["./a.out"], 0x7ffdd3bbad00 /* 52 vars */) = -1 ENOMEM (Cannot allocate memory)
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)
(Interesting that we get different error codes for PIE vs non-PIE executables, but still before execve() even completes.)
Tricking the compiler + linker + runtime with asm("largish = 0"); is not very interesting, and creates obvious undefined behaviour.
Fun fact #2: x64 MSVC doesn't support static objects larger than 2^31-1 bytes. IDK if it has a -mcmodel=medium equivalent. Basically GCC fails to warn about objects too large for the selected memory model.
<source>(7): error C2148: total size of array must not exceed 0x7fffffff bytes
<source>(13): warning C4311: 'type cast': pointer truncation from 'char *' to 'long'
<source>(14): error C2070: 'char [-1486618624]': illegal sizeof operand
<source>(15): warning C4311: 'type cast': pointer truncation from 'char *' to 'long'
Also, it points out that long is the wrong type for pointers in general (because Windows x64 is an LLP64 ABI, where long is 32 bits). You want intptr_t or uintptr_t, or something equivalent to printf("%p") that prints a raw void*.
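A minimal sketch of those portable alternatives (assuming you only want to display the pointer value, not compute with it):

#include <cstdint>
#include <cstdio>

int main() {
    int x = 0;
    std::printf("%p\n", static_cast<void*>(&x));              // print the raw pointer
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(&x);
    std::printf("%ju\n", static_cast<std::uintmax_t>(bits));  // or its integer value, without truncation
    return 0;
}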
The Standard does not anticipate the existence of any storage beyond that which the implementation provides via objects of static, automatic, or thread duration, or the use of standard-library functions like calloc. It consequently imposes no restrictions on how implementations process pointers to such storage, since from its perspective such storage doesn't exist, pointers that meaningfully identify non-existent storage don't exist, and things that don't exist don't need to have rules written about them.
That doesn't mean that the people on the Committee weren't well aware that many execution environments provided forms of storage that C implementations might know nothing about. They expected, however, that people who actually worked with various platforms would be better placed than the Committee to determine what kinds of things programmers would need to do with such "outside" addresses, and how best to support such needs, so there was no need for the Standard to concern itself with such things.
As it happens, there are some execution environments where it is more convenient for a compiler to treat pointers arithmetic like integer math than to do anything else, and many compilers for such platforms treat pointer arithmetic usefully even in cases where they're not required to do so. For 32-bit and 64-bit x86 and x64, I don't think there are any bit patterns for invalid non-null addresses, but it may be possible to form pointers that don't behave as valid pointers to the objects they address.
For example, given something like:
char x=1,y=2;
ptrdiff_t delta = (uintptr_t)&y - (uintptr_t)&x;
char *p = &x+delta;
*p = 3;
even if pointer representation is defined in such a way that using integer arithmetic to add delta to the address of x would yield y, that would in no way guarantee that a compiler would recognize that operations on *p might affect y, even if p holds y's address. Pointer p would effectively behave as though its address was invalid even though the bit pattern would match that of y's address.
The following examples show that GCC specifically assumes at least the following:
A global array cannot be at address 0.
An array cannot wrap around address 0.
Examples of unexpected behavior arising from arithmetic on invalid pointers in gcc linux x86-64 C++ (thank you melpomene):
largish == NULL evaluates to false in the program in the question.
unsigned n = ...; if (ptr + n < ptr) { /*overflow */ } can be optimized to if (false).
int arr[123]; int n = ...; if (arr + n < arr || arr + n > arr + 123) can be optimized to if (false).
Note that these examples all involve comparison of the invalid pointers, and therefore may not affect the practical case of non-zero-based arrays. Therefore I have opened a new question of a more practical nature.
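For reference, here is a compilable sketch of the second example in the list above (the wraps function is invented for illustration; whether GCC actually folds the check depends on version and optimization level):

#include <cstdio>

static bool wraps(const char *ptr, unsigned long n) {
    return ptr + n < ptr;   // UB when ptr + n is out of bounds; GCC may fold this to false
}

int main() {
    const char buf[4] = "abc";
    // At -O0 this typically prints 1 (the addition wraps); at -O2 GCC may print 0.
    std::printf("%d\n", wraps(buf, -1UL));
    return 0;
}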
Thank you everyone in the chat for helping to narrow down the question.

Undefined behaviour in RE2 which stated to be well defined

Recently I've found that the RE2 library uses this technique for fast set lookups. During the lookup it uses values from an uninitialized array, which, as far as I know, is undefined behaviour.
I've even found this issue with valgrind warnings about use of uninitialized memory. But the issue was closed with a comment that this behaviour is intended.
I suppose that in reality an uninitialized array will just contain some random data on all modern compilers and architectures. But on the other hand I treat the 'undefined behaviour' statement as 'literally anything can happen' (including your program formatting your hard drive, or Godzilla coming and destroying your city).
The question is: is it legal to use uninitialized data in C++?
When C was originally designed, if arr was an array of some type T occupying N bytes, an expression like arr[i] meant "take the base address of arr, add i*N to it, fetch N bytes at the resulting address, and interpret them as a T". If every possible combination of N bytes would have a meaning when interpreted as a type T, fetching an uninitialized array element may yield an arbitrary value, but the behavior would otherwise be predictable. If T is a 32-bit type, an attempt to read an uninitialized array element of type T would yield one of at most 4294967296 possible behaviors; such action would be safe if and only if every one of those 4294967296 behaviors would meet a program's requirements. As you note, there are situations where such a guarantee is useful.
The C Standard, however, describes a semantically-weaker language which does not guarantee that an attempt to read an uninitialized array element will behave in a fashion consistent with any bit pattern the storage might have contained. Compiler writers want to process this weaker language, rather than the one Dennis Ritchie invented, because it allows them to apply a number of optimizations without regard for how they interact. For example, if code performs a=x; and later performs b=a; and c=a;, and if a compiler can't "see" anything between the original assignment and the later ones that could change a or x, it could omit the first assignment and change the latter two assignments to b=x; and c=x;. If, however, something happens between the latter two assignments that would change x, that could result in b and c getting different values--something that should be impossible if nothing changes a.
Applying that optimization by itself wouldn't be a problem if nothing changed x that shouldn't. On the other hand, consider code which uses some allocated storage as type float, frees it, re-allocates it, and uses it as type int. If the compiler knows that the original and replacement request are of the same size, it could recycle the storage without freeing and reallocating it. That could, however, cause the code sequence:
float *fp = malloc(4);
...
*fp = slowCalculation();
somethingElse = *fp;
free(fp);
int *ip = malloc(4);
...
a=*ip;
b=a;
...
c=a;
to get rewritten as:
float *fp = malloc(4);
...
startSlowCalculation(); // Use some pipelined computation unit
int *ip = (int*)fp;
...
b=*ip;
*fp = resultOfSlowCalculation(); // ** Moved from up above
somethingElse = *fp;
...
c=*ip;
It would be rare for performance to benefit particularly from processing the result of the slow calculation between the writes to b and c. Unfortunately, compilers aren't designed in a way that would make it convenient to guarantee that a deferred calculation wouldn't by chance land in exactly the spot where it would cause trouble.
Personally, I regard compiler writers' philosophy as severely misguided: if a programmer in a certain situation knows that a guarantee would be useful, requiring the programmer to work around the lack of it will impose significant cost with 100% certainty. By contrast, a requirement that compiler refrain from optimizations that are predicated on the lack of that guarantee would rarely cost anything (since code to work around its absence would almost certainly block the "optimization" anyway). Unfortunately, some people seem more interested in optimizing the performance of those source texts which don't need guarantees beyond what the Standard mandates, than in optimizing the efficiency with which a compiler can generate code to accomplish useful tasks.

Once again: strict aliasing rule and char*

The more I read, the more confused I get.
The last question from the related ones is closest to my question, but I got confused with all words about object lifetime and especially - is it OK to only read or not.
To get straight to the point. Correct me if I'm wrong.
This is fine, gcc does not give warning and I'm trying to "read type T (uint32_t) via char*":
uint32_t num = 0x01020304;
char* buff = reinterpret_cast< char* >( &num );
But this is "bad" (also gives a warning) and I'm trying "the other way around":
char buff[ 4 ] = { 0x1, 0x2, 0x3, 0x4 };
uint32_t num = *reinterpret_cast< uint32_t* >( buff );
How is the second one different from the first one, especially when we're talking about reordering instructions (for optimization)? Plus, adding const does not change the situation in any way.
Or this is just a straight rule, which clearly states: "this can be done in the one direction, but not in the other"?
I couldn't find anything relevant in the standards (searched for this especially in C++11 standard).
Is this the same for C and C++ (as I read a comment, implying it's different for the 2 languages)?
I used union to "workaround" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states, that I can only rely on the value, which is last modified in the union).
So, after reading a lot, I'm now more confused. I guess only memcpy is the "good" solution?
Related questions:
What is the strict aliasing rule?
"dereferencing type-punned pointer will break strict-aliasing rules" warning
Do I understand C/C++ strict-aliasing correctly?
Strict aliasing rule and 'char *' pointers
EDIT
The real world situation: I have a third party lib (http://www.fastcrypto.org/), which calculates UMAC and the returned value is in char[ 4 ]. Then I need to convert this to uint32_t. And, btw, the lib uses things like ((UINT32 *)pc->nonce)[0] = ((UINT32 *)nonce)[0] a lot. Anyway.
Also, I'm asking about what is right and what is wrong and why. Not only about the reordering, optimization, etc. (what's interesting is that with -O0 there are no warnings, only with -O2).
And please note: I'm aware of the big/little endian situation. It's not the case here. I really want to ignore the endianness here. The "strict aliasing rules" sounds like something really serious, far more serious than wrong endianness. I mean - like accessing/modifying memory, which is not supposed to be touched; any kind of UB at all.
Quotes from the standards (C and C++) would be really appreciated. I couldn't find anything about aliasing rules or anything relevant.
How is the second one different from the first one, especially when we're talking about reordering instructions (for optimization)?
The problem is in the compiler using the rules to determine whether such an optimization is allowed. In the second case you're trying to read a char[] object via an incompatible pointer type, which is undefined behavior; hence, the compiler might re-order the read and write (or do anything else which you might not expect).
But, there are exceptions for "going the other way", i.e. reading an object of some type via a character type.
Or this is just a straight rule, which clearly states: "this can be done in the one direction, but not in the other"? I couldn't find anything relevant in the standards (searched for this especially in C++11 standard).
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf chapter 3.10 paragraph 10.
In C99, and also C11, it's 6.5 paragraph 7. For C++11, it's 3.10 ("Lvalues and Rvalues").
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of character type for C or of either unsigned char or char type for C++). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
I used union to "workaround" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states, that I can only rely on the value, which is last modified in the union).
Although the wording of the standard is horribly ambiguous, in C99 (and beyond) it's clear (at least since C99 TC3) that the intent is to allow type-punning through a union. You must however perform all accesses through the union. It's also not clear that you can "cast a union into existence", that is, the union object must exist first before you use it for type-punning.
the returned value is in char[ 4 ]. Then I need to convert this to uint32_t
Just use memcpy or manually shift the bytes to the correct position, in case byte-ordering is an issue. Good compilers can optimize this out anyway (yes, even the call to memcpy).
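A minimal sketch of the memcpy route for the char[4] case from the question (assuming the library really hands back exactly 4 bytes; the load_u32 name is invented). There is no aliasing violation, and compilers commonly turn it into a single 32-bit load:

#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstring>

std::uint32_t load_u32(const char *buff) {
    std::uint32_t num;
    std::memcpy(&num, buff, sizeof num);  // copies the object representation of the 4 bytes
    return num;                           // the value still depends on the platform's byte order
}

int main() {
    const char buff[4] = { 0x01, 0x02, 0x03, 0x04 };
    std::printf("%08" PRIx32 "\n", load_u32(buff));  // prints 04030201 on a little-endian machine
    return 0;
}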
I used union to "workaround" this, which still appears to be NOT 100% OK, as it's not guaranteed by the standard (which states, that I can only rely on the value, which is last modified in the union).
Endianness is the reason for this. Specifically, the sequence of bytes 01 00 00 00 could mean 1 or 16,777,216.
The correct way to do what you are doing is to stop trying to trick the compiler into doing the conversion for you and perform the conversion yourself.
For instance if the char[4] is little-endian (smallest byte first) then you would do something like the following.
unsigned char* buff = new unsigned char[4];
/* ... fill buff with the 4 bytes to decode ... */
uint32_t result = 0;
for (int i = 0; i < 4; i++)
    result |= static_cast<uint32_t>(buff[i]) << (8 * i); // byte i supplies bits 8*i .. 8*i+7
This manually performs the conversion between the two and is guaranteed to always be correct as you are doing the mathematical conversion.
Now if you were doing this conversion rapidly it might make sense to use #if and knowledge of your architecture to use a union (as you mentioned) to do this automatically, but that is again getting away from portable solutions. (Also you can use something like this as your fallback if you can't be certain.)

Is pointer conversion expensive or not?

Is pointer conversion considered expensive? (e.g. how many CPU cycles does it take to convert a pointer/address?), especially when you have to do it quite frequently, for instance (just an example to show the scale of frequency; I know there are better ways for this particular case):
unsigned long long *x;
/* fill data to x*/
for (int i = 0; i < 1000*1000*1000; i++)
{
    A[i] = foo((unsigned char*)x + i);
}
(e.g. how many CPU cycles it takes to convert a pointer/address)
In most machine code languages there is only 1 "type" of pointer and so it doesn't cost anything to convert between them. Keep in mind that C++ types really only exist at compile time.
The real issue is that this sort of code can break strict aliasing rules. You can read more about this elsewhere, but essentially the compiler will either produce incorrect code through undefined behavior, or be forced to make conservative assumptions and thus produce slower code. (Note that char* and friends are somewhat exempt from the undefined-behavior part.)
Optimizers often have to make conservative assumptions about variables in the presence of pointers. For example, a constant propagation process that knows the value of variable x is 5 would not be able to keep using this information after an assignment to another variable (for example, *y = 10) because it could be that *y is an alias of x. This could be the case after an assignment like y = &x.
As an effect of the assignment to *y, the value of x would be changed as well, so propagating the information that x is 5 to the statements following *y = 10 would be potentially wrong (if *y is indeed an alias of x). However, if we have information about pointers, the constant propagation process could make a query like: can x be an alias of *y? Then, if the answer is no, x = 5 can be propagated safely.
Another optimization impacted by aliasing is code reordering. If the compiler decides that x is not aliased by *y, then code that uses or changes the value of x can be moved before the assignment *y = 10, if this would improve scheduling or enable more loop optimizations to be carried out.
To enable such optimizations in a predictable manner, the ISO standard for the C programming language (including its newer C99 edition, see section 6.5, paragraph 7) specifies that it is illegal (with some exceptions) for pointers of different types to reference the same memory location. This rule, known as "strict aliasing", sometime allows for impressive increases in performance,[1] but has been known to break some otherwise valid code. Several software projects intentionally violate this portion of the C99 standard. For example, Python 2.x did so to implement reference counting,[2] and required changes to the basic object structs in Python 3 to enable this optimisation. The Linux kernel does this because strict aliasing causes problems with optimization of inlined code.[3] In such cases, when compiled with gcc, the option -fno-strict-aliasing is invoked to prevent unwanted optimizations that could yield unexpected code.
[edit]
http://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts_with_optimization
What is the strict aliasing rule?
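As a compilable illustration of the constant-propagation point in the quoted passage (the function name is made up; the exact code generated varies by compiler and flags):

#include <cstdio>

int propagate(int *x, float *y) {
    *x = 5;
    *y = 10.0f;   // under strict aliasing, assumed not to modify *x (different types)
    return *x;    // may be compiled as simply "return 5"
}

int main() {
    int i = 0;
    float f = 0.0f;
    std::printf("%d\n", propagate(&i, &f));  // well-defined call: i and f are distinct objects
    return 0;
}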
On any architecture you're likely to encounter, all pointer types have the same representation, and so conversion between different pointer types representing the same address has no run-time cost. This applies to all pointer conversions in C.
In C++, some pointer conversions have a cost and some don't:
reinterpret_cast and const_cast (or an equivalent C-style cast, such as the one in the question) and conversion to or from void* will simply reinterpret the pointer value, with no cost.
Conversion between pointer-to-base-class and pointer-to-derived class (either implicitly, or with static_cast or an equivalent C-style cast) may require adding a fixed offset to the pointer value if there are multiple base classes.
dynamic_cast will do a non-trivial amount of work to look up the pointer value based on the dynamic type of the object pointed to.
Historically, some architectures (e.g. PDP-10) had different representations for pointer-to-byte and pointer-to-word; there may be some runtime cost for conversions there.
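To illustrate the second bullet above, here is a hedged sketch of the fixed-offset adjustment with two base classes (the class names are invented; the exact offset is implementation-specific):

#include <cstdio>

struct First   { int a; };
struct Second  { int b; };
struct Derived : First, Second { int c; };

int main() {
    Derived d{};
    Derived *pd = &d;
    Second  *ps = pd;   // implicit derived-to-base conversion; typically adds sizeof(First)
    std::printf("%p %p\n", static_cast<void*>(pd), static_cast<void*>(ps));
    return 0;
}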
unsigned long long *x;
/* fill data to x*/
for (int i = 0; i < 1000*1000*1000; i++)
{
    A[i] = foo((unsigned char*)x + i); // bad cast
}
Remember, the machine only knows memory addresses, data and code. Everything else (such as types) is known only to the compiler (which aids the programmer); the compiler does all the pointer arithmetic, and only the compiler knows the size of each type, and so on.
At runtime, there are no machine cycles wasted in converting one pointer type to another because the conversion does not happen at runtime. All pointers are treated as 4 bytes long (on a 32-bit machine), nothing more and nothing less.
It all depends on your underlying hardware.
On most machine architectures, all pointers are byte pointers, and converting one byte pointer to another is a no-op. On some architectures, a pointer conversion may under some circumstances require extra manipulation (there are machines that work with word-based addresses, for instance, and converting a word pointer to a byte pointer or vice versa will require extra manipulation).
Moreover, this is in general an unsafe technique, as the compiler can't perform any sanity checking on what you are doing, and you can end up overwriting data you didn't expect.

Where are C++ literal constants stored in memory?

Where are C++ literal constants stored in memory? The stack or the heap?
int *p = &2; is wrong. I want to know why. Thanks.
-------------------------------------------------
My question is "Where are C++ literal constants stored in memory?"; "int *p = &2 is wrong" is not my question.
The details depend on the machine, but assuming the commonest sort of machine and operating system... every executable file contains several "segments" - CODE, BSS, DATA and some others.
CODE holds all the executable opcodes. Actually, it's often named TEXT because somehow that made sense to people way back decades ago. Normally it's read-only.
BSS is uninitialized data - it actually doesn't need to exist in the executable file, but is allocated by the operating system's loader when the program is starting to run.
DATA holds the literal constants - the int8, int16, int32 etc along with floats, string literals, and whatever weird things the compiler and linker care to produce. This is what you're asking about. However, it holds only constants defined for use as variables, as in
const long x = 2;
but it is unlikely to hold literal constants that appear in your source code without being tightly associated with a variable. A lone '2' is dealt with directly by the compiler. For example, in C,
printf("%d", 2);
would cause the compiler to build a subroutine call to printf(), writing opcodes to push a pointer to the string literal "%d" and the value 2, both as 64-bit integers on a 64-bit machine (you're not one of those laggards still using 32-bit hardware, are you? :), followed by the opcode to jump to a subroutine at (the address identified by 'printf').
The "%d" literal goes into DATA. The 2 doesn't; it's built into the opcode that stuffs integers onto the stack. That might actually be a "load register RAX immediate" followed by the value 2, followed by a "push register RAX", or maybe a single opcode can do the job. So in the final executable file, the 2 will be found in the CODE (aka TEXT) segment.
It typically isn't possible to make a pointer to that value, or to any opcode. It just doesn't make sense in terms of what high level languages like C do (and C is "high level" when you're talking about opcodes and segments.) "&2" can only be an error.
Now, it's not entirely impossible to have a pointer to opcodes. Whenever you define a function in C, or an object method, constructor or destructor in C++, the name of the function can be thought of as a pointer to the first opcode of the machine code compiled from that function. For example, printf without the parentheses is a pointer to a function. Maybe if your example code were in a function and you guessed the right offset, pointer arithmetic could be used to point to that "immediate" value 2 nestled among the opcodes, but this is not going to be easy for any contemporary CPU, and certainly isn't for beginners.
Let me quote relevant clauses of C++03 Standard.
5.3.1/2
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue.
An integer literal is an rvalue (however, I haven't found a direct quote in the C++03 Standard, but C++11 mentions that as a side note in 3.10/1).
Therefore, it's not possible to take an address of an integer literal.
As for the exact place where 2 is stored, it depends on usage. It might be part of a machine instruction, or it might be optimized away, e.g. j=i*2 might become j=i+i. You should not rely upon it.
You have two questions:
Where are literal constants stored? With the exception of string literals (which are actual objects), pretty much wherever the implementation wants. It will usually depend on what you're doing with them, but on a lot of architectures, integral constants (and often some special floating point constants, like 0.0) will end up as part of a machine instruction. When this isn't possible, they'll usually be placed in the same logical segment as the code.
As to why taking the address of an rvalue is illegal, the main reason is because the standard says so. Historically, it's forbidden because such constants often never exist as a separate object in memory, and thus have no address. Today... one could imagine other solutions: compilers are smart enough to put them in memory if you took their address, and not otherwise; and rvalues of class type do have a memory address. The rules are somewhat arbitrary (and would be, regardless of what they were); hopefully, any rules which would allow taking the address of a literal would make its type int const*, and not int*.