I cannot figure this out:
int main() {
int (*) (int *) = 5;
return 0;
}
The above assignment compiles with g++ in C++11 mode. I know that int (*) (int *) is a pointer to a function that accepts an (int *) as argument and returns an int, but I do not understand how you could equate it to 5. At first I thought it was a function that constantly returns 5 (from my recent learning in F#, probably, haha), then I thought, briefly, that the function pointer points to memory location 5, but that does not work, clearly, and neither do hex values.
Thinking that it could be because the function returns an int, and that assigning an int is ok (somehow), I also tried this:
int * (*) (int *) = my_ptr
where my_ptr is of type int *, the same type as the return type of this second function pointer, just as in the first case the return type was int. This does not compile. Assigning 5, or any int value, instead of my_ptr doesn't compile for this function pointer either.
So what does the assignment mean?
Update 1
We have confirmation that it is a bug, as shown in the best answer. However, it is still not known what actually happens to the value that you assign to the function pointer, or what happens with the assignment. Any (good) explanations on that would be very much appreciated! Please refer to the edits below for more clarity on the problem.
Edit 1
I am using gcc version 4.8.2 (in Ubuntu 4.8.2)
Edit 2
Actually, equating it to anything works on my compiler. Even equating it to a std::string variable, or a function name that returns a double, works.
Edit 2.1
Interestingly, making it a pointer to any function whose return type is not a pointer lets it compile, for example:
std::string (*) () = 5.6;
But as soon as the function pointer points to a function that returns a pointer, it does not compile, for example:
some_data_type ** (*) () = any_value;
It's a bug in g++.
int (*) (int *)
is a type name.
In C++ you cannot have a declaration with a type name without an identifier.
So this compiles with g++.
int (*) (int *) = 5;
and this compiles as well:
int (*) (int *);
but they are both invalid declarations.
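To see the difference between a type name and a declaration, here is a small sketch (assuming C++11; the alias IntPtrFn and the functions first_element, call_through_pointer and as_pointer are hypothetical names chosen for illustration). The nameless type int (*) (int *) is fine wherever the grammar expects a type, while a variable declaration needs an identifier:

```cpp
#include <type_traits>

// A type-id may appear wherever the grammar expects a type: in an alias...
using IntPtrFn = int (*) (int *);

// ...as a template argument...
static_assert(std::is_same<IntPtrFn, int (*)(int *)>::value,
              "the alias names exactly the same pointer-to-function type");

// ...or as the operand of sizeof.
static_assert(sizeof(int (*) (int *)) == sizeof(IntPtrFn), "same type, same size");

// int (*) (int *) = 5;  // ill-formed: no identifier is declared

int first_element(int *p) { return *p; }

int call_through_pointer(int *p) {
    int (*fn) (int *) = first_element;  // valid: this declares a variable named fn
    return fn(p);
}

IntPtrFn as_pointer() {
    // A cast also uses the type-id without any identifier.
    return static_cast<int (*)(int *)>(first_element);
}
```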
EDIT:
T.C. mentions in the comments GCC Bugzilla bug 60680, which has a similar test case. It was initially unconfirmed, but the bug has since been confirmed in Bugzilla.
EDIT2:
When the two declarations above are at file scope g++ correctly issues a diagnostic (it fails to issue the diagnostic at block scope).
EDIT3:
I checked and I can reproduce the issue on the latest release of g++ version 4 (4.9.2), latest pre-release version 5 (5.0.1 20150412) and latest experimental version 6 (6.0.0 20150412).
It is not valid C++. Remember that just because your particular compiler happens to compile it doesn't make it valid. Compilers, like all complex software, sometimes have bugs, and this appears to be one.
By contrast clang++ complains:
funnycast.cpp:3:11: error: expected expression
int (*) (int *) = 5;
^
funnycast.cpp:3:18: error: expected '(' for function-style cast or type construction
int (*) (int *) = 5;
~~~ ^
funnycast.cpp:3:19: error: expected expression
int (*) (int *) = 5;
^
3 errors generated.
This is the expected behavior because the offending line is not valid C++. It purports to be an assignment (because of the =) but contains no identifier.
As other answers have pointed out, it is a bug that
int (*) (int *) = 5;
compiles. A reasonable approximation of this statement that would be expected to have a meaning is:
int (*proc)(int*) = (int (*)(int*))(5);
Now proc is a pointer-to-function that expects the address 5 to be the base address of a function that takes an int* and returns an int.
On some microcontrollers/microprocessors 5 might be a valid code address, and it might be possible to locate such a function there.
On most general-purpose computers, the first page of memory (addresses 0-4095 for 4K pages) is purposely left invalid (unmapped) in order to catch null pointer accesses.
Thus, while behavior depends on the platform, one can reasonably expect a page fault to occur when *proc is invoked (e.g., (*proc)(&v)). Before the time at which *proc is invoked, nothing unusual happens.
Unless you are writing a dynamic linker, you almost certainly shouldn't be numerically calculating addresses and assigning them to pointer-to-function variables.
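For contrast, a minimal sketch of a declaration that does have a meaning: the pointer gets a name and points at a real function instead of a numeric address. The function twice and the wrapper call_proc are hypothetical; the cast from 5 is deliberately left out, since calling through it would most likely fault as described above.

```cpp
// Hypothetical target function: takes an int* and returns an int.
int twice(int *p) {
    return 2 * *p;
}

// proc is declared with an identifier, so this is a genuine declaration
// with an initializer (unlike the nameless "int (*) (int *) = 5;").
int (*proc)(int *) = twice;

int call_proc(int v) {
    // (*proc)(&v) and proc(&v) are equivalent ways to call through the pointer.
    return (*proc)(&v);
}
```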
/usr/lib/gcc/x86_64-pc-cygwin/4.9.2/cc1plus.exe -da so.cpp
This command line generates a lot of intermediate files. The first of them, so.cpp.170r.expand, says:
...
int main() ()
{
int D.2229;
int _1;
;; basic block 2, loop depth 0
;; pred: ENTRY
_1 = 0;
;; succ: 3
;; basic block 3, loop depth 0
;; pred: 2
<L0>:
return _1;
;; succ: EXIT
}
...
This still doesn’t answer what happens exactly, but it should be a step in the right direction.
This is perplexing. Using g++ 4.9.1:
int main()
{
void* r1 = __builtin_return_address(0); // fine
unsigned int foo = 0;
void* r2 = __builtin_return_address(foo); // does not compile
}
The error returned is error: invalid argument to ‘__builtin_return_address’
The documentation says that this function takes an unsigned int. I know the __builtin functions have all kinds of weirdness, and this just might be how life is, but I need to be able to step through this thing with an incrementing variable for a stack dumper I'm trying to implement. If it only accepts constant arguments, that's not really possible.
Is there a workaround or a better way?
Just make your own huge switch/case or if/else tree up to as many levels as you may need. You can use macros to make it simpler.
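A minimal sketch of that suggestion, assuming GCC or Clang: every case passes a literal to __builtin_return_address, so the constant-argument rule is satisfied while the caller can still loop over a runtime level. The macro RA_CASE and the function return_address_at are hypothetical names, and four levels is an arbitrary choice. Note that GCC's documentation warns that nonzero arguments can have unpredictable effects, including crashing the program (for example when frame pointers are omitted).

```cpp
// Hypothetical helper macro: expands to one case with a literal argument.
#define RA_CASE(n) case n: return __builtin_return_address(n);

// noinline keeps this function as its own stack frame, so level 0 is the
// address this helper returns to (a location inside its caller).
__attribute__((noinline))
void *return_address_at(unsigned level) {
    switch (level) {
        RA_CASE(0)
        RA_CASE(1)
        RA_CASE(2)
        RA_CASE(3)
        default: return nullptr;  // deeper than the cases generated above
    }
}

#undef RA_CASE
```

A stack dumper could then loop over level = 0, 1, 2, ... calling return_address_at(level); a nullptr means the level was deeper than the generated cases, and more RA_CASE lines can be added as needed.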
I've been using C/C++ for about three years and I can't believe I've never encountered this issue before!
This following code compiles (I've just tried using gcc):
#include <iostream>
int change_i(int i) {
int j = 8;
return j;
}
int main() {
int i = 10;
change_i(10);
std::cout << "i = " << i << std::endl;
}
And, the program prints i = 10, as you might expect.
My question is -- why does this compile? I would have expected an error, or at least a warning, saying there was a value returned which is unused.
Naively, I would consider this similar to accidentally forgetting the return statement in a non-void function. I understand it's different, and I can see why there's nothing inherently wrong with this code, but it seems dangerous. I've just spotted a similar error in some very old code of mine, representing a bug which goes back a long time. I obviously meant to do:
i = change_i(10);
But forgot, so it was never changed (I know this example is silly, the exact code is much more complicated). Any thoughts would be much appreciated!
It compiles because calling a function and ignoring the return result is very common. In fact, the last line of main does so too.
std::cout << "i = " << i << std::endl;
is actually short for:
operator<<(std::cout, "i = ").operator<<(i).operator<<(std::endl);
... and you are not using the value returned from the final operator<<.
Some static checkers have options to warn when function return values are ignored (and options to annotate functions whose return values are commonly ignored). GCC has an attribute to mark a function as requiring its return value to be used (__attribute__((warn_unused_result))), but it only works if the return type doesn't have a destructor :-(.
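A sketch applied to the question's change_i, assuming a C++17 compiler: the standard [[nodiscard]] attribute serves the same purpose as GCC's __attribute__((warn_unused_result)), and an explicit (void) cast documents a deliberate discard. The wrapper use_result is hypothetical.

```cpp
// C++17: compilers are encouraged to warn when this result is discarded.
[[nodiscard]] int change_i(int /*i*/) {
    int j = 8;
    return j;
}

int use_result() {
    int i = 10;
    // change_i(10);     // would typically trigger an "ignoring return value" warning
    (void)change_i(10);  // explicit, deliberate discard: typically no warning
    i = change_i(10);    // what the question's author meant to write
    return i;
}
```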
Ignoring the return value of a function is perfectly valid. Take this for example:
printf("hello\n");
We're ignoring the return value of printf here, which returns the number of characters printed. In most cases, you don't care how many characters are printed. If compilers warned about this, everyone's code would show tons of warnings.
This is actually a specific case of ignoring the value of an expression, where the value of the expression happens to be the return value of a function.
Similarly, if you do this:
i++;
You have an expression whose value is discarded (i.e. the value of i before being incremented); however, the ++ operator still increments the variable.
An assignment is also an expression:
i = j = k;
Here, you have two assignment expressions. One is j = k, whose value is the value of k (which was just assigned to j). This value is then used as the right-hand side of another assignment, to i. The value of the i = (j = k) expression is then discarded.
This is very different from not returning a value from a non-void function. In that case, the value returned by the function is undefined, and attempting to use that value results in undefined behavior.
There is nothing undefined about ignoring the value of an expression.
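A small sketch (function names hypothetical) that makes the normally discarded values visible by storing them instead of throwing them away:

```cpp
// i++ as an expression has a value (the old i) and a side effect (i grows).
int value_of_postincrement() {
    int i = 5;
    int old = i++;   // the expression i++ has the value 5...
    return old + i;  // ...while i is now 6, so this returns 11
}

// i = j = k parses as i = (j = k); each assignment also has a value.
int value_of_chained_assignment() {
    int i = 0, j = 0, k = 3;
    int result = (i = j = k);  // j = k yields 3, which is then assigned to i
    return result + i + j;     // 3 + 3 + 3 = 9
}
```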
The short reason it is allowed is because that's what the standard specifies.
The statement
change_i(10);
discards the value returned by change_i().
The longer reason is that most expressions both have an effect and produce a result. So
i = change_i(10);
will set i to be 8, but the assignment expression itself also has a result of 8. This is why (if j is of type int)
j = i = change_i(10);
will cause both j and i to have the value of 8. This sort of logic can continue indefinitely - which is why expressions can be chained, such as k = i = j = 10. So - from a language perspective - it does not make sense to require that a value returned by a function is assigned to a variable.
If you want to explicitly discard the result of a function call, it is possible to do
(void)change_i(10);
and a statement like
j = (void)change_i(10);
will not compile, typically due to a mismatch of types (an int cannot be assigned the value of something of type void).
All that said, several compilers (and static code analysers) can actually be configured to give a warning if the caller does not use a value returned by a function. Such warnings are turned off by default - so it is necessary to compile with appropriate settings (e.g. command line options).
I've been using C/C++ for about three years
I suppose that during these three years you have used the standard C function printf. For example:
#include <stdio.h>
int main( void )
{
printf( "Hello World!\n" );
}
The function has a return type other than void. However, I am sure that in most cases you did not use its return value. :)
If the compiler were required to issue an error whenever the return value of a function is not used, code like the above would not compile, because the compiler does not have access to the source code of the function and cannot determine whether the function has a side effect. :)
Consider other standard C functions, for example the string functions.
The function strcpy, for example, is declared as
char * strcpy( char *destination, const char *source );
If you have, for example, the following character arrays
char source[] = "Hello World!";
char destination[sizeof( source )];
then the function is usually called as
strcpy( destination, source );
There is no point in using its return value when you just need to copy a string. Moreover, for the example shown you may not even write
destination = strcpy( destination, source );
The compiler will issue an error, because arrays are not assignable.
So, as you can see, it sometimes makes sense to ignore the return values of functions.
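A sketch showing what strcpy actually returns, namely its destination argument, using the same arrays as above (the function name strcpy_returns_destination is hypothetical). Storing the result in a pointer compiles; assigning it to the array does not:

```cpp
#include <cstring>

// Mirrors the arrays from the text and checks that strcpy's result
// is the destination array itself.
bool strcpy_returns_destination() {
    char source[] = "Hello World!";
    char destination[sizeof(source)];

    char *result = std::strcpy(destination, source);  // OK: result is a char*
    // destination = std::strcpy(destination, source); // error: arrays are not assignable

    return result == destination && std::strcmp(destination, "Hello World!") == 0;
}
```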
For your own example, the compiler could issue a message that the function has no side effect, so calling it without using the result is pointless. In any case, it could at least warn that the function parameter i is not used (for example, GCC's -Wunused-parameter does this). :)
Keep in mind that the compiler often does not see a function's definition, because it is in another translation unit or in a library; in most cases, C and C++ compilers work only with the function's declaration. So the compiler is unable to determine whether a function has a side effect.
Look at this program:
#include <iostream>
using namespace std;
int main()
{
const int x = 0;
int *p;
p=(int*)&x;
(*p)++;
cout<<x<<endl;
cout<<*p;
}
As you see above, I declared x as a const int and, using a cast, made a non-const pointer named p point to it. In the middle of the body of my program, I increased the value of x by one using (*p)++ (how is that possible, when x is defined as const?)
Now, when I print x and *p, they show different values, even though p is supposed to point to the address of x:
ap1019#sharifvm:~$ ./a.out
0
1
Why?
Modifying a variable after casting away its constness causes undefined behaviour: in some cases it will just work as if the variable weren't const, in some it will cause a memory access violation, and in some it will turn your computer into a rabbit that tries to kill you...
A bit of background on the behaviour. Imagine you are a compiler. You encounter the variable:
const int blah = 3;
And then you encounter the following operation:
int foo = 4 + blah;
As you are a smart compiler and you know that blah is constant, and therefore will not change, instead of reading blah's value from its storage location in memory you can simply add 3 to 4 and assign the result to foo.
In fact, you will probably assign 7 straight away, because doing the addition each time you run the program is pointless.
Let's now get to the casting-away-const part.
Some really sneaky programmer is doing the following:
int * blah_pointer = (int *) & blah;
Then he increases the value of blah with this operation:
(*blah_pointer)++;
What will happen? If the variable is not in protected (read-only) memory, the program will simply increment the value stored in memory.
Now when you read the value through the pointer, you will get the incremented value!
OK, but why do you get the old, unchanged value when you read blah directly, I hear you ask:
std::cout << blah;
It is there because the compiler tries to be smart: instead of actually reading the value from blah, it substitutes blah's known constant value, so the statement effectively becomes std::cout << 3.
The undefined part is changing the constant value: you can never know whether the value will be stored in a protected or unprotected region, so you can't tell what will happen.
If you want the compiler to actually read the value each time it is used, just change the definition from:
const int blah = 3;
to
const volatile int blah = 3;
This tells the compiler the following: even though the program I am writing is not allowed to change blah's value, it may change during execution, so do not optimise away the memory access; read it every time the value is used. (Modifying blah through a cast pointer is still undefined behaviour, though.)
I hope this makes it clearer.
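To pin down where the undefined behaviour starts, a minimal sketch (function names hypothetical): const_cast itself is allowed, and writing through its result is well defined when the underlying object was not declared const. Only modifying an object that really is const, like blah or x in the question, is undefined.

```cpp
// The object is NOT const; only the access path is. Writing is well defined.
int modify_through_const_view() {
    int value = 3;
    const int *view = &value;                 // read-only view of a modifiable int
    int *writable = const_cast<int *>(view);  // casting away const is allowed
    ++*writable;                              // OK: value really is modifiable
    return value;                             // 4
}

// Here the object itself is const.
int read_const_object() {
    const int blah = 3;
    const int *p = &blah;
    // Writing, e.g. ++*const_cast<int *>(p), would be undefined behaviour,
    // because blah is a const object. Reading through the cast is fine.
    return *const_cast<int *>(p);
}
```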
I think that, during compilation, your compiler replaces your constant variables with their values (similar to #define); this is how the GNU GCC compiler optimizes the code.
I'm not 100% sure about it, but I had the same issue while learning C/C++ syntax, and this is the conclusion I reached after disassembling (converting the binary executable to assembler code) my program.
Anyway, try disassembling your executable and see what is really happening.
I wrote a simple code as follows:
#include <iostream>
using namespace std;

void show(const int a[], unsigned elements);
int main()
{
    show(new int[]{1, 2, 3, 45}, 4); //does not work
}
void show(const int a[], unsigned elements)
{
    cout << "{ ";
    for (unsigned i = 0; i < elements; i++)
    {
        cout << a[i];
        if (i != elements - 1)
            cout << ",";
        cout << " ";
    }
    cout << "}";
}
It should just output { 1, 2, 3, 45 }. If I include a size in the brackets
show(new int[4]{1, 2, 3, 45}, 4);
then it works. So naturally I would assume that if I write the new this way I have to specify the size (although I thought that giving it an initialization list would imply the size). But the odd thing is that when I set a breakpoint at the show function call and step through it in the debugger, the program outputs everything correctly and terminates at the end of main like it should. If I don't use the debugger, it either crashes after outputting a '{' or it outputs the whole thing "{ 1, 2, 3, 45 }" followed by an assertion failure " Program: ... "Expression: _CrtIsValidHeapPointer(pUserData) ... "
I'm curious to know why it is behaving this way. Also, I am using Visual Studio on Windows 8.
EDIT: I am using namespace std. Please don't comment about using namespaces or about how to better write this code. I'm solely interested in the cause of this issue.
EDIT: Responding to an additional question in the comments.
To be quick, yes it would "still" be a pointer, and yes it compiles with clang and gcc when you add the 4.
There are a couple things going on, however, and my initial answer was a simplification. The problem is that your expression is not well-formed to begin with, so it's not clear what it should evaluate to or what the type should be. Consider
If type is an array type, all dimensions other than the first must be specified as positive integral constant expression (until C++14) / converted constant expression of type std::size_t (since C++14), but the first dimension may be any expression convertible to std::size_t.
Source: http://en.cppreference.com/w/cpp/language/new
As it says, either way there must be an expression in the brackets. This makes it difficult to say whether the expression would still evaluate to a pointer. A well-formed new expression would indeed evaluate to a pointer, no matter how many dimensions it has, even if it has zero. When I say pointer here, I strictly mean the representation, not the type.
The point is that the type, at least "inside" new, is different depending on how many dimensions you have. So, whether you do
new int
new int[6]
new int[12][14]
the representation is the same (a pointer), but the type new sees is different in each case. The compiler is able to respond to the different types in new (think by analogy with function overloading). In particular, when the type is an array type, it is possible to initialize the new memory with the braced initializer list containing multiple elements.
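A sketch of the point about types, assuming C++11: decltype does not evaluate its operand, so these checks inspect the type each new expression produces without allocating anything. The function make_array is hypothetical and uses the explicit bound that the question reports as working.

```cpp
#include <type_traits>

// Unevaluated operands: no memory is allocated by these checks.
static_assert(std::is_same<decltype(new int), int *>::value,
              "scalar new yields int*");
static_assert(std::is_same<decltype(new int[6]), int *>::value,
              "array new yields a pointer to the first element");
static_assert(std::is_same<decltype(new int[12][14]), int (*)[14]>::value,
              "two-dimensional new yields a pointer to the first row");

// With an explicit bound, the braced list initializes all four elements.
int *make_array() {
    return new int[4]{1, 2, 3, 45};
}
```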
My best guess is, since VS was accepting the brackets without an expression, it was allocating memory for either a single int or int[0]. In the former case, it was wrongly allowing you to brace initialize it as if it was an array type, and in the latter case the allocated memory was not enough anyway. Your main then wrote over a heap guard that is there to catch this sort of thing in debug mode. When this was checked at the end of main or at program termination, you saw the symptoms. The flakiness in the output was either due to different heap layouts or due to buffering in the output stream.
Original answer
Your new expression, if it was well-formed, would have scalar type, meaning that the result is a "single value". That single value is a pointer to an integer, specifically to the one at the beginning of the array you are trying to create. That is how "dynamic arrays" are represented in C++. The type system does not "know" their size.
You are trying to initialize this single pointer value with an initializer list of 4 values. This shouldn't work. I am not sure that this should compile at all. It certainly didn't compile with clang or gcc, and I'm surprised that it worked in Visual Studio.
How does a pointer to the [-1]th index of the array produce legal output every time? What is actually happening in the pointer assignment?
#include<stdio.h>
int main()
{
int realarray[10];
int *array = &realarray[-1];
printf("%p\n", (void *)array);
return 0;
}
Code output:
manav#workstation:~/knr$ gcc -Wall -pedantic ptr.c
manav#workstation:~/knr$ ./a.out
0xbf841140
EDIT: If this scenario is valid, can I use it to define an array whose indices start from 1 instead of 0, i.e. array[1], array[2], ...?
You're simply getting a pointer that contains the address of that "imaginary" location, i.e. the location of the first element &realarray[0] minus the size of one element.
This is undefined behavior, and might break horribly if, for instance, your machine has a segmented memory architecture. It's working because the compiler writer has chosen to implement the arithmetic as outlined above; that could change at any moment, and another compiler might behave totally differently.
a[b] is defined as *(a+b)
therefore a[-1] is *(a-1)
Whether a-1 is a valid pointer and therefore the dereference is valid depends on the context the code is used in.
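A quick check of that definition on a pointer that stays inside the array (the function name subscript_demo is hypothetical). Because a[b] means *(a + b), a negative subscript is well defined as long as the resulting address is still within the same array:

```cpp
// a[b] is defined as *(a + b), so a[-1] is fine when a - 1 is still
// inside the same array.
int subscript_demo() {
    int realarray[10] = {10, 20, 30, 40};
    int *a = &realarray[2];      // points at the element holding 30

    int via_brackets = a[-1];    // *(a - 1): realarray[1], i.e. 20
    int via_pointer  = *(a - 1); // the same element, written explicitly
    int commuted     = (-1)[a];  // *(-1 + a): unusual, but the same element

    return (via_brackets == via_pointer && via_pointer == commuted) ? via_brackets : -1;
}
```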
The behaviour is undefined.
What you have observed may have happened in your particular compiler and configuration, but anything may happen in a different situation. You cannot rely on this behaviour at all.
The behavior is undefined. You can only calculate a pointer to any of the elements of an array, or one past the end, but that's it. You can only dereference a pointer to one of the elements of an array (not the one-past-the-end pointer). Looking at your variable names, it looks like you're asking a question from this C FAQ. I think that the answer in the FAQ is very good.
Although, as others have noted, it is undefined behaviour in this case, it compiles without warnings because in general, foo[-1] might be valid.
For example, this is fine:
int realarray[10] = { 10, 20, 30, 40 };
int *array = &realarray[2];
printf("%d\n", array[-1]);
In C and C++, array indexes are not checked at runtime. You are performing pointer arithmetic which may or may not end up giving defined results (not here).
However, in C++ you can use an array class that does provide bounds checks, e.g. boost::array or std::tr1::array (to be added to the standard library in C++0x, and since standardized in C++11 as std::array):
#include <cstdio>
#include <boost/array.hpp>
int main()
{
try {
boost::array<int, 10> realarray;
int* p = &realarray.at(-1);
printf("%p\n", (void *)p);
} catch (const std::exception& e) {
puts(e.what());
}
}
Output:
array<>: index out of range
Also produces a compiler warning:
8 test.cpp [Warning] passing negative value `-0x000000001' for converting 1 of `T& boost::array::at(size_t) [with T = int, unsigned int N = 10u]'
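Since C++11 the same bounds-checked access is in the standard library as std::array::at, which throws std::out_of_range. A minimal sketch (the helper at_rejects is hypothetical); as with boost::array, an index of -1 converts to a huge std::size_t, so the check fails:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Returns true if at() rejected the index by throwing std::out_of_range.
bool at_rejects(std::size_t index) {
    std::array<int, 10> realarray{};
    try {
        (void)realarray.at(index);  // bounds-checked access
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}
```

For example, at_rejects(static_cast<std::size_t>(-1)) and at_rejects(10) return true, while at_rejects(0) returns false.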
It simply points to the address of the item just ahead of the array in memory.
In this expression the array decays to a pointer to its first element, and that pointer is simply decremented by one.
Here you are just performing pointer arithmetic: &realarray[-1] computes the address one element before the first element of realarray.
Similarly, &realarray[+1] gives you the address of the second element of the array, since
&realarray[0] is the address of the first element.
array points to one location before the starting address of realarray. However, what confused me is why this compiled without any warnings.
You're just pointing to the 4 bytes located before the array (assuming a 4-byte int).
Compilers generally accept this code without complaint, and on common platforms it will not crash at run time, but strictly speaking even computing a pointer before the start of an array is undefined behavior. In practice, C/C++ pointers behave like a numeric data type that obeys the rules of arithmetic: addition and subtraction work, and the bracket notation [] is just a fancy syntax for addition followed by a dereference. NULL is typically defined as the integer constant 0.
And this is why C/C++ are dangerous: the compiler will let you create pointers that point anywhere without complaint. Dereferencing the wild pointer in your example (*array = 1234;) would produce undefined behavior, anything from subtle corruption to a crash.
Yes, you could use it to index from 1. Don't do this! The C/C++ idiom is to always index from 0. Other people who saw the code indexing from 1 would be tempted to "fix" it to index from 0.
The experiment would have provided a little more insight if it had done the following. Instead of printing the pointer value with
printf("%p\n", (void *)array);
print the array element value:
printf("%d\n", *array);
That's because printing a pointer with %p will always produce some output (without any misbehavior), but nothing can be deduced from it.