Why sizeof int is wrong, while sizeof(int) is right? - c++

We know that sizeof is an operator used for calculating the size of any datatype and expression, and when the operand is an expression, the parentheses can be omitted.
int main()
{
int a;
sizeof int;
sizeof( int );
sizeof a;
sizeof( a );
return 0;
}
the first usage of sizeof is wrong, while others are right.
When it is compiled using gcc, the following error message will be given:
main.c:5:9: error: expected expression before ‘int’
My question is why the C standard does not allow this kind of operation. Will sizeof int cause any ambiguity?

The following could be ambiguous:
sizeof int * + 1
Is that (sizeof (int*)) + 1, or (sizeof(int)) * (+1)?
Obviously the C language could have introduced a rule to resolve the ambiguity, but I can imagine why it didn't bother. With the language as it stands, a type specifier never appears "naked" in an expression, and so there is no need for rules to resolve whether that second * is part of the type or an arithmetic operator.
The existing grammar does already resolve the potential ambiguity of sizeof (int *) + 1. It is (sizeof(int*))+1, not sizeof((int*)(+1)).
C++ has a somewhat similar issue to resolve with function-style cast syntax. You can write int(0) and you can write typedef int *intptr; intptr(0);, but you can't write int*(0). In that case, the resolution is that the "naked" type must be a simple type name, it can't just be any old type id that might have spaces in it, or trailing punctuation. Maybe sizeof could have been defined with the same restriction, I'm not certain.

From C99 Standard
6.5.3.4.2
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a
type.
In your case int is neither expression nor parenthesized name.

There are two ways to use the sizeof operator in C. The syntax is this:
C11 6.5.3 Unary operators
...
sizeof unary-expression
sizeof ( type-name )
Whenever you use a type as operand, you must have the parenthesis, by the syntax definition of the language. If you use sizeof on an expression, you don't need the parenthesis.
The C standard gives one such example of where you might want to use it on an expression:
sizeof array / sizeof array[0]
However, for the sake of consistency, and to avoid bugs related to operator precedence, I would personally advise to always use () no matter the situation.

Related

how sizeof works on unimplemented function? [duplicate]

This question already has answers here:
What does sizeof (function(argument)) return?
(4 answers)
The function call inside sizeof doesn't invoke it? C++
(1 answer)
Closed 12 days ago.
I have two functions without any implementation.
I expect that the linker returns an undefined reference to hello and world error.
But surprisingly, the code compiles and runs without any error.
#include <stdio.h>
int hello();
char world();
int main() {
printf("sizeof hello = %zu, sizeof world = %zu\n", sizeof(hello()), sizeof(world()));
}
sizeof hello = 4, sizeof world = 1
sizeof(hello()) is the size of the the return type, int, not the size of the function. The function is not called.
The function does not need to be defined to determine its return type declared by int hello();.
Deeper (in C)
sizeof works with sizeof unary-expression and sizeof ( type-name ).
The size is determined from the type of the operand. The result
is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant. C17dr § 6.5.3.4 2
Since the type of the operand hello() is an int (and not a variable length array), the operand is not evaluated and is then like sizeof(int).
Aside: sizeof returns a size_t.
"%zu" is a correct specifier to match a size_t. "%ld" is not specified to work.
// printf("sizeof hello = %ld, sizeof world = %ld\n", sizeof(hello()), sizeof(world()));
printf("sizeof hello = %zu, sizeof world = %zu\n", sizeof(hello()), sizeof(world()));
sizeof is unevaluated context. It does not actually call the functions. No definition is required. The declaration is sufficient to know that they return int and char.
You could as well have written sizeof(int) and sizeof(char) to get the same output. sizeof(&hello) would result in the size of a function pointer to hello and does not require the definition either. And sizeof(hello) just makes no sense, because functions have no size (at least not in a sense that sizeof could tell you).
For details I refer you to https://en.cppreference.com/w/cpp/language/sizeof (c++) and https://en.cppreference.com/w/c/language/sizeof (c).
The title of your question:
how sizeof works on unimplemented function?
implies you are trying to determine how much memory the function implementation uses, similar to sizeof() int returning how much memory an int variable uses.
You can't use sizeof() on a function like that.
Per 6.5.3.4 The sizeof and _Alignof operators, paragraph 1 of the (draft) C11 standard:
The sizeof operator shall not be applied to an expression that has function type...
The draft C23 standard has the exact same wording:
The sizeof operator shall not be applied to an expression that has function type...

Confusing use of sizeof(…) operator results

I was browsing some C++ code recently and I ran into the following line:
static char zsocket_name[sizeof((struct sockaddr_un*)0)->sun_path] = {};
… This is confusing, as it looks to me as if the result of the sizeof operator is being pointer-dereferenced to access a struct field named sun_path, and that value is to be used to size an array in static storage.
However, when I tried a simple snippet program to evaulate the expression sizeof((struct sockaddr_un*)0)->sun_path, it yields the size of the sun_path member of the sockaddr_un struct.
Clearly, that is what the author of the original line was intending; but I find it syntactically confusing as it looks like a pointer dereference on the result of the sizeof(…) operation.
What am I missing about this use of sizeof(…)? Why does this expression evaluate this way?
In C++, the sizeof operator has a form sizeof expression in addition to the more common sizeof(type), so this:
sizeof ((struct sockaddr_un*)0)->sun_path
is equivalent to this:
sizeof(decltype(((struct sockaddr_un*)0)->sun_path))
The former, albeit without whitespace, is what's written in the code you posted.
Note that a parenthesized expression is also an expression, so sizeof ((struct sockaddr_un*)0)->sun_path can also be written with extra parentheses: sizeof(((struct sockaddr_un*)0)->sun_path) — even though this looks like the sizeof(type) form, it's actually the sizeof expression form applied to a parenthesized expression.
The only thing you can't do is sizeof type, so this is invalid:
sizeof decltype(((struct sockaddr_un*)0)->sun_path)
A more modern way of getting at the struct's field in C++, without casting 0 to a pointer, would be to use declval:
sizeof std::declval<sockaddr_un>().sun_path
Your mistake is thinking that sizeof works like a function call, whereas it is actually an operator. There is no requirement to use () at all.
sizeof is actually an operator of the form sizeof expression and the () around expression are not required. The precedence of sizeof in expressions is actually equal to that of ++ and -- (prefix form), unary + and -, ! and ~ (logical and bitwise not), the (type) typecast, & (address of), unary * (pointer indirection), and (C from 2011) _Alignof. All of these have right-to-left associativity.
The only operators with higher precedence than sizeof are ++ and -- (suffix form), function call (()), [] array subscripting, . and -> to access struct members, and (for C only from 1999) compound literals (type){list}.
There is no sizeof(expression) form. sizeof x evaluates the size of the result of an expression x (without evaluating x). sizeof (x) evaluates the size of result of the expression (x), again without evaluating it. You happen to have an expression of the form sizeof a->b which (due to precedence rules) is equivalent to sizeof (a->b) and not to sizeof(a)->b (which would trigger a compilation error).

What is the reason for the existent difference between C and C++ relative to the unary arithmetic operator +

In C the unary plus operator is called unary arithmetic operator and may not be applied to pointers (the C Standard, 6.5.3.3 Unary arithmetic operators).
1 The operand of the unary + or - operator shall have arithmetic
type; of the ~ operator, integer type; of the ! operator, scalar
type.
Thus this program will not compile
#include <stdio.h>
int main(void)
{
int a = 10;
int *pa = &a;
printf( "%d\n", *+pa );
return 0;
}
However in C++ the unary plus operator may be applied to pointers (the C++ Standard, 5.3.1 Unary operators)
7 The operand of the unary + operator shall have arithmetic, unscoped
enumeration, or pointer type and the result is the value of the
argument. Integral promotion is performed on integral or enumeration
operands. The type of the result is the type of the promoted operand.
And this program compiles successfully.
#include <iostream>
int main()
{
int a = 10;
int *pa = &a;
std::cout << *+pa << std::endl;
return 0;
}
What is the reason for maintaining this difference between C and C++?
The question arose when I was answering the question Why size of int pointer is different of size of int array?. I was going to show how to convert an array to a pointer in the sizeof operator.
At first I wanted to write
sizeof( +array )
However this expression is invalid in C. So I had to write
sizeof( array + 0 )
and I found that there is such a difference between C and C++.:)
Different languages may attach different semantics to the same syntax.
C and C++ are different languages with a common ancestor. C++ semantics look deceptively similar but are subtly different for some parts of the common syntax. Another curious case is this:
if (sizeof(char) == sizeof(int)) {
printf("Hello embedded world\n");
} else {
if (sizeof('a') == sizeof(char))
printf("This is C++ code\n");
if (sizeof('a') == sizeof(int))
printf("This is C code\n");
}
The reason for C++ to have extended the C syntax in the case of unary + might be to allow for some extended numeric types to be implemented as pointers, or simply for reasons of symmetry.
As Jaa-c mentions in a comment, +p is a computed expression whereas p is a reference to p. You provided another example where + can be used to force expression context. The question is why did the original authors of the C language disallow unary + on non numeric types? Maybe a side effect of the original implementation of pcc.
Note that in Javascript, the unary + operator can be applied to non number types and operates as a conversion to number.
In my considerations:
C++ is a type of Object-Oriented Language. So every data type can be treated as a "Class".
In C int is one of "the basic data type of C". But in C++ we can consider int as a Class. Thus, In C++ int pointer and int array belong to the different classes. In C a int pointer variable stored another int variable's address. int array's name instead of the first element's address of that int array. So in C they have kind of the same meaning.
As for the unary opreator "+", I understand the C++ language as: Every class In C++ represents a set of stuff. Every stuff in the set has the same properties. And there's some operations can be done onto each stuff. Of course these operations are member functions of a class. Another character In C++ is that users can overload an operator. Overload means we can do the same operation on the different Classes. For example: A man is eating a burger. we can overload action "Eat" between cats and rat: A cat is Eating a rat.
So as the C++ standard say:"The operand of the unary + operator shall have arithmetic, unscoped enumeration, or pointer type and the result is the value of the argument." That's just a overload for unary operator + in Class unscoped enumeration and pointer type. "And The Result Is The Value Of The Argument"-> I guess that's the point.

Semantics of unary & on numeric literal

What is the unary-& doing here?
int * a = 1990;
int result = &5[a];
If you were to print result you would get the value 2010.
You have to compile it with -fpermissive or it will stop due to errors.
In C, x [y] and y [x] are identical. So &5[a] is the same as &a[5].
&5[a] is the same as &a[5] and the same as a + 5. In your case it's undefined behavior because a points to nowhere.
C11 standard chapter 6.5.6 Additive operators/8 (the same in C++):
If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
"...unary & on numeric literal"?
Postfix operators in C always have higher priority than prefix ones. In case of &5[a], the [] has higher priority than the &. Which means that in &5[a] the unary & is not applied to "numeric literal" as you seem to incorrectly believe. It is applied to the entire 5[a] subexpression. I.e. &5[a] is equivalent to &(5[a]).
As for what 5[a] means - this is a beaten-to-death FAQ. Look it up.
And no, you don't have "to compile it with -fpermissive" (my compiler tells me it doesn't even know what -fpermissive is). You have to figure out that this
int * a = 1990;
is not legal code in either C or C++. If anything, it requires an explicit cast
int * a = (int *) 1990;
not some obscure switch of some specific compiler you happened to be using at the moment. The same applies to another illegal initialization in int result = &5[a].
Finally, even if we overlook the illegal code and the undefined behavior triggered by that 5[a], the behavior of this code will still be highly implementation-dependent. I.e. the answer is no, in general case you will not get 2010 in result.
You cannot apply the unary & operator to an integer literal, because a literal is not an lvalue.
Due to operator precedence, your code doesn't do that. Since the indexing operator [] binds more tightly than unary &, &5[a] is equivalent to &(5[a]).
Here's a program similar to yours, except that it's valid code, not requiring -fpermissive to compile:
#include <stdio.h>
int main(void) {
int arr[6];
int *ptr1 = arr;
int *ptr2 = &5[ptr1];
printf("%p %p\n", ptr1, ptr2);
}
As explained in this question and my answer, the indexing operator is commutative (because it's defined in terms of addition, and addition is commutative), so 5[a] is equivalent to a[5]. So the expression &5[ptr1] computes the address of element 5 of arr.
In your program:
int * a = 1990;
int result = &5[a];
the initialization of a is invalid because a is of type int* and 1990 is of type int, and there is no implicit conversion from int to int*. Likewise, the initialization of result is invalid because &5[a] is of type int*. Apparently -fpermissive causes the compiler to violate the rules of the language and permit these invalid implicit conversions.
At least in the version of gcc I'm using, the -fpermissive option is valid only for C++ and Objective-C, not for C. In C, gcc permits such implicit conversions (with a warning) anyway. I strongly recommend not using this option. (Your question is tagged both C and C++. Keep in mind that C and C++ are two distinct, though closely related, languages. They happen to behave similarly in this case, but it's usually best to pick one language or the other.)

Is sizeof(*ptr) undefined behavior when pointing to invalid memory?

We all know that dereferencing an null pointer or a pointer to unallocated memory invokes undefined behaviour.
But what is the rule when used within an expression passed to sizeof?
For example:
int *ptr = 0;
int size = sizeof(*ptr);
Is this also undefined?
In most cases, you will find that sizeof(*x) does not actually evaluate *x at all. And, since it's the evaluation (de-referencing) of a pointer that invokes undefined behaviour, you'll find it's mostly okay. The C11 standard has this to say in 6.5.3.4. The sizeof operator /2 (my emphasis in all these quotes):
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
This is identical wording to the same section in C99. C89 had slightly different wording because, of course, there were no VLAs at that point. From 3.3.3.4. The sizeof operator:
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand, which is not itself evaluated. The result is an integer constant.
So, in C, for all non-VLAs, no dereferencing takes place and the statement is well defined. If the type of *x is a VLA, that's considered an execution-phase sizeof, something that needs to be worked out while the code is running - all others can be calculated at compile time. If x itself is the VLA, it's the same as the other cases, no evaluation takes place when using *x as an argument to sizeof().
C++ has (as expected, since it's a different language) slightly different rules, as shown in the various iterations of the standard:
First, C++03 5.3.3. Sizeof /1:
The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id.
In, C++11 5.3.3. Sizeof /1, you'll find slightly different wording but the same effect:
The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is an unevaluated operand (Clause 5), or a parenthesized type-id.
C++11 5. Expressions /7 (the above mentioned clause 5) defines the term "unevaluated operand" as perhaps one of the most useless, redundant phrases I've read for a while, but I don't know what was going through the mind of the ISO people when they wrote it:
In some contexts ([some references to sections detailing those contexts - pax]), unevaluated operands appear. An unevaluated operand is not evaluated.
C++14/17 have the same wording as C++11 but not necessarily in the same sections, as stuff was added before the relevant parts. They're in 5.3.3. Sizeof /1 and 5. Expressions /8 for C++14 and 8.3.3. Sizeof /1 and 8. Expressions /8 for C++17.
So, in C++, evaluation of *x in sizeof(*x) never takes place, so it's well defined, provided you follow all the other rules like providing a complete type, for example. But, the bottom line is that no dereferencing is done, which means it does not cause a problem.
You can actually see this non-evaluation in the following program:
#include <iostream>
#include <cmath>
int main() {
int x = 42;
std::cout << x << '\n';
std::cout << sizeof(x = 6) << '\n';
std::cout << sizeof(x++) << '\n';
std::cout << sizeof(x = 15 * x * x + 7 * x - 12) << '\n';
std::cout << sizeof(x += sqrt(4.0)) << '\n';
std::cout << x << '\n';
}
You might think that the final line would output something vastly different to 42 (774, based on my rough calculations) because x has been changed quite a bit. But that is not actually the case since it's only the type of the expression in sizeof that matters here, and the type boils down to whatever type x is.
What you do see (other than the possibility of different pointer sizes on lines other than the first and last) is:
42
4
4
4
4
42
No. sizeof is an operator, and works on types, not the actual value (which is not evaluated).
To remind you that it's an operator, I suggest you get in the habit of omitting the brackets where practical.
int* ptr = 0;
size_t size = sizeof *ptr;
size = sizeof (int); /* brackets still required when naming a type */
The answer may well be different for C, where sizeof is not necessarily a compile-time construct, but in C++ the expression provided to sizeof is never evaluated. As such, there is never a possibility for undefined behavior to exhibit itself. By similar logic, you can also "call" functions that are never defined [because the function is never actually called, no definition is necessary], a fact that is frequently used in SFINAE rules.
sizeof and decltype do not evaluate their operands, computing types only.
sizeof(*ptr) is the same as sizeof(int) in this case.
Since sizeof does not evaluate its operand (except in the case of variable length arrays if you're using C99 or later), in the expression sizeof (*ptr), ptr is not evaluated, therefore it is not dereferenced. The sizeof operator only needs to determine the type of the expression *ptr to get the appropriate size.