Confusing use of sizeof(…) operator results

Confusing use of sizeof(…) operator results - c++

I was browsing some C++ code recently and I ran into the following line:
static char zsocket_name[sizeof((struct sockaddr_un*)0)->sun_path] = {};
… This is confusing, as it looks to me as if the result of the sizeof operator is being pointer-dereferenced to access a struct field named sun_path, and that value is to be used to size an array in static storage.
However, when I tried a simple snippet program to evaulate the expression sizeof((struct sockaddr_un*)0)->sun_path, it yields the size of the sun_path member of the sockaddr_un struct.
Clearly, that is what the author of the original line was intending; but I find it syntactically confusing as it looks like a pointer dereference on the result of the sizeof(…) operation.
What am I missing about this use of sizeof(…)? Why does this expression evaluate this way?

In C++, the sizeof operator has a form sizeof expression in addition to the more common sizeof(type), so this:
sizeof ((struct sockaddr_un*)0)->sun_path
is equivalent to this:
sizeof(decltype(((struct sockaddr_un*)0)->sun_path))
The former, albeit without whitespace, is what's written in the code you posted.
Note that a parenthesized expression is also an expression, so sizeof ((struct sockaddr_un*)0)->sun_path can also be written with extra parentheses: sizeof(((struct sockaddr_un*)0)->sun_path) — even though this looks like the sizeof(type) form, it's actually the sizeof expression form applied to a parenthesized expression.
The only thing you can't do is sizeof type, so this is invalid:
sizeof decltype(((struct sockaddr_un*)0)->sun_path)
A more modern way of getting at the struct's field in C++, without casting 0 to a pointer, would be to use declval:
sizeof std::declval<sockaddr_un>().sun_path

Your mistake is thinking that sizeof works like a function call, whereas it is actually an operator. There is no requirement to use () at all.
sizeof is actually an operator of the form sizeof expression and the () around expression are not required. The precedence of sizeof in expressions is actually equal to that of ++ and -- (prefix form), unary + and -, ! and ~ (logical and bitwise not), the (type) typecast, & (address of), unary * (pointer indirection), and (C from 2011) _Alignof. All of these have right-to-left associativity.
The only operators with higher precedence than sizeof are ++ and -- (suffix form), function call (()), [] array subscripting, . and -> to access struct members, and (for C only from 1999) compound literals (type){list}.
There is no sizeof(expression) form. sizeof x evaluates the size of the result of an expression x (without evaluating x). sizeof (x) evaluates the size of result of the expression (x), again without evaluating it. You happen to have an expression of the form sizeof a->b which (due to precedence rules) is equivalent to sizeof (a->b) and not to sizeof(a)->b (which would trigger a compilation error).

Related

C11 and constant expression evaluation in switch-case labels

following this question Why doesn't gcc allow a const int as a case expression?, basically the same as What promoted types are used for switch-case expression comparison? or Is there any way to use a constant array with constant index as switch case label in C?.
From the first link, I tried to replace :
case FOO: // aka 'const int FOO = 10'
with :
case ((int) "toto"[0]): // can't be anything *but* constant
Which gives :
https://ideone.com/n1bmIb -> https://ideone.com/4aOSXR = works in C++
https://ideone.com/n1bmIb -> https://ideone.com/RrnO2R = fails in C
I don't quite understand since the "toto" string can't be anything but a constant one, it isn't even a variable, it lies in the void of the compiler memory. I'm not even playing with the 'const' fuzzy logic of the C language (that really stands for "read-only, not constant, what did you expect?"), the problem is either "array access" or "pointer referencing" into a constant expression that do not evaluate in C, but do quite well in C++.
I expected to use this "trick" to use a HASH_MACRO(str) to generate unique case labels values from a key identifier, leaving eventually the compiler to raise an error in case of collision because of similar label values found.
OK, ok, I was told these restrictions were made to simplify language tooling (preproc, compiler, linker) and C ain't no LISP, but you can have full featured LISP interpreter/compilers for a fraction of the size of a C equivalent, so that's no excuse.
Question is : is there an "extension" to C11 that just allows this "toto" thingy to work in GCC, CLANG and... MSVC ? I don't want to go the C++ path (typedef's forward declarations don't work anymore) and because embedded stuff (hence the compile-time hash computation for space-time distortion).
Is there an intermediary "C+" language that is more 'permissive' and 'understand' embedded a little better, like -Praise the Lords- "enums as bitfield members", among nice other things we cannot have (because of out-of-reality standards evolving like snails under a desert sun) ?
#provemewrong, #changemymind, #norustplease

It doesn't matter whether or not it could be known to the compiler at compile time. The case label needs to have a value that is an integer constant expression (C11 6.8.4.2p3).
The expression of each case label shall be an integer constant expression and no two of the case constant expressions in the same switch statement shall have the same value after conversion. There may be at most one default label in a switch statement. (Any enclosed switch statement may have a default label or case constant expressions with values that duplicate case constant expressions in the enclosing switch statement.)
And the definition of an integer constant expression is in C11 6.6p6:
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
Since "toto" is none of integer constants, enumeration constants, character constants, constant sizeof, _Alignof expressions or floating point constant cast to an integer; and that list was specified in the constraints section of the standard, the compiler must not pass this silently. (Even a conforming compiler may still successfully compile the program, but it must diagnose this as a constraint violation.)
What you can use is chained ? : to resolve the index to a character constant, i.e.
x == 0 ? 't'
: x == 1 ? 'o'
: x == 2 ? 't'
: x == 3 ? 'o'
This can be written into a macro.

"toto[0]" is not an integer constant expression as C defines the term:
6.6 Constant expressions
...
6 An integer constant expression117) shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, _Alignof expressions, and floating
constants that are the immediate operands of casts. Cast operators in an integer constant
expression shall only convert arithmetic types to integer types, except as part of an
operand to the sizeof or _Alignof operator.
117) An integer constant expression is required in a number of contexts such as the size of a bit-field
member of a structure, the value of an enumeration constant, and the size of a non-variable length
array. Further constraints that apply to the integer constant expressions used in conditional-inclusion
preprocessing directives are discussed in 6.10.1.
C 2011 online draft

The issue that you're running into is that, in C, "toto" is an array of chars. Sure, it's constant in memory, but it's still just an array. The [] operator indexes in an array (from a pointer). If you wanted, you would be able to edit a compiled binary and change the string "toto" to something else. In a sense, it is not compile-time known. It's equivalent to doing:
char * const ___string1 = "toto";
...
case ((int) ___string1[0]):
(This is a little forced and redundant, but it's just for demonstration)
Note that the type of the elements of a string literal is char, not const char.
The case, must be a constant however, as it is built into the compiled program control flow.

Why sizeof int is wrong, while sizeof(int) is right?

We know that sizeof is an operator used for calculating the size of any datatype and expression, and when the operand is an expression, the parentheses can be omitted.
int main()
{
int a;
sizeof int;
sizeof( int );
sizeof a;
sizeof( a );
return 0;
}
the first usage of sizeof is wrong, while others are right.
When it is compiled using gcc, the following error message will be given:
main.c:5:9: error: expected expression before ‘int’
My question is why the C standard does not allow this kind of operation. Will sizeof int cause any ambiguity?

The following could be ambiguous:
sizeof int * + 1
Is that (sizeof (int*)) + 1, or (sizeof(int)) * (+1)?
Obviously the C language could have introduced a rule to resolve the ambiguity, but I can imagine why it didn't bother. With the language as it stands, a type specifier never appears "naked" in an expression, and so there is no need for rules to resolve whether that second * is part of the type or an arithmetic operator.
The existing grammar does already resolve the potential ambiguity of sizeof (int *) + 1. It is (sizeof(int*))+1, not sizeof((int*)(+1)).
C++ has a somewhat similar issue to resolve with function-style cast syntax. You can write int(0) and you can write typedef int *intptr; intptr(0);, but you can't write int*(0). In that case, the resolution is that the "naked" type must be a simple type name, it can't just be any old type id that might have spaces in it, or trailing punctuation. Maybe sizeof could have been defined with the same restriction, I'm not certain.

From C99 Standard
6.5.3.4.2
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a
type.
In your case int is neither expression nor parenthesized name.

There are two ways to use the sizeof operator in C. The syntax is this:
C11 6.5.3 Unary operators
...
sizeof unary-expression
sizeof ( type-name )
Whenever you use a type as operand, you must have the parenthesis, by the syntax definition of the language. If you use sizeof on an expression, you don't need the parenthesis.
The C standard gives one such example of where you might want to use it on an expression:
sizeof array / sizeof array[0]
However, for the sake of consistency, and to avoid bugs related to operator precedence, I would personally advise to always use () no matter the situation.

Several unary operators in C and C++

Is it standard-conforming to use expressions like
int i = 1;
+-+-+i;
and how the sign of i variable is determined?

Yes it is. Unary + and - associate right-to-left, so the expression is parsed as
+(-(+(-(+i))));
Which results in 1.
Note that these can be overloaded, so for a user-defined type the answer may differ.

Your operators has no side effect, +i do nothing with int itself and you do not use the temporary generated value but remove + that do nothing and you have -(-i) witch is equal to i itself.(removing + in the code will convert the operator, I mean remove it in computation because it has no effect)

i isn't modified (C: without intervening sequence points|C++: in an unsequenced manner) so it's legal. You're just creating a new temporary with each operator.
The unary + doesn't even do anything, so all you have is two negations which just give 1 for that expression. The variable i itself is never changed.

C++ Converting Postfix to infix

So i'm programming a cmd-based calculator in C++. I finished it, but i was wondering, after converting the infix to postfix, i have a queue called the postfix queue containing the operators/operands in correct order. How do I convert a postfix expression back to infix?

If you don't mind producing some extra parentheses, it should be pretty easy. You basically "evaluate" the postfix data about like usual, except that when you get to an operator, instead of evaluating that operator and pushing the result on the stack, you print out an open-paren, the first operand, the operator, the second operand, and finally a close-paren.
If you didn't mind changing the order, it would also be pretty easy to avoid the extraneous parentheses. Walk the expression backwards, rearranging things from operator operand operand into operand operator operand. If you encounter an operator where you need an operand, you have a sub-expression to print out similarly. You need to enclose that sub-expression in parentheses if and only if its operator is of lower precedence than the operator you encountered previously.
For example, consider: a b + c *. Walking this backwards, we get *, then c, so we start by printing out c *. Then we need another operand, but we have a +, so we have a sub-expression. Since + is lower precedence than *, we need to enclose that sub-expression in parentheses, so we get c * (b + a).
Conversely, if we had: a b * c +, we'd start similarly, producing c +, but then since * is higher precedence that +, we can/could print out the a * b (or b * a) without parens.
Note that with - or / (or anything else that isn't commutative) you'd have to be a bit more careful about getting the order of operands correct. Even so, you're not going to get the original expression back, only an expression that should be logically equivalent to it.

Is sizeof(*ptr) undefined behavior when pointing to invalid memory?

We all know that dereferencing an null pointer or a pointer to unallocated memory invokes undefined behaviour.
But what is the rule when used within an expression passed to sizeof?
For example:
int *ptr = 0;
int size = sizeof(*ptr);
Is this also undefined?

In most cases, you will find that sizeof(*x) does not actually evaluate *x at all. And, since it's the evaluation (de-referencing) of a pointer that invokes undefined behaviour, you'll find it's mostly okay. The C11 standard has this to say in 6.5.3.4. The sizeof operator /2 (my emphasis in all these quotes):
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
This is identical wording to the same section in C99. C89 had slightly different wording because, of course, there were no VLAs at that point. From 3.3.3.4. The sizeof operator:
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand, which is not itself evaluated. The result is an integer constant.
So, in C, for all non-VLAs, no dereferencing takes place and the statement is well defined. If the type of *x is a VLA, that's considered an execution-phase sizeof, something that needs to be worked out while the code is running - all others can be calculated at compile time. If x itself is the VLA, it's the same as the other cases, no evaluation takes place when using *x as an argument to sizeof().
C++ has (as expected, since it's a different language) slightly different rules, as shown in the various iterations of the standard:
First, C++03 5.3.3. Sizeof /1:
The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id.
In, C++11 5.3.3. Sizeof /1, you'll find slightly different wording but the same effect:
The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is an unevaluated operand (Clause 5), or a parenthesized type-id.
C++11 5. Expressions /7 (the above mentioned clause 5) defines the term "unevaluated operand" as perhaps one of the most useless, redundant phrases I've read for a while, but I don't know what was going through the mind of the ISO people when they wrote it:
In some contexts ([some references to sections detailing those contexts - pax]), unevaluated operands appear. An unevaluated operand is not evaluated.
C++14/17 have the same wording as C++11 but not necessarily in the same sections, as stuff was added before the relevant parts. They're in 5.3.3. Sizeof /1 and 5. Expressions /8 for C++14 and 8.3.3. Sizeof /1 and 8. Expressions /8 for C++17.
So, in C++, evaluation of *x in sizeof(*x) never takes place, so it's well defined, provided you follow all the other rules like providing a complete type, for example. But, the bottom line is that no dereferencing is done, which means it does not cause a problem.
You can actually see this non-evaluation in the following program:
#include <iostream>
#include <cmath>
int main() {
int x = 42;
std::cout << x << '\n';
std::cout << sizeof(x = 6) << '\n';
std::cout << sizeof(x++) << '\n';
std::cout << sizeof(x = 15 * x * x + 7 * x - 12) << '\n';
std::cout << sizeof(x += sqrt(4.0)) << '\n';
std::cout << x << '\n';
}
You might think that the final line would output something vastly different to 42 (774, based on my rough calculations) because x has been changed quite a bit. But that is not actually the case since it's only the type of the expression in sizeof that matters here, and the type boils down to whatever type x is.
What you do see (other than the possibility of different pointer sizes on lines other than the first and last) is:
42
4
4
4
4
42

No. sizeof is an operator, and works on types, not the actual value (which is not evaluated).
To remind you that it's an operator, I suggest you get in the habit of omitting the brackets where practical.
int* ptr = 0;
size_t size = sizeof *ptr;
size = sizeof (int); /* brackets still required when naming a type */

The answer may well be different for C, where sizeof is not necessarily a compile-time construct, but in C++ the expression provided to sizeof is never evaluated. As such, there is never a possibility for undefined behavior to exhibit itself. By similar logic, you can also "call" functions that are never defined [because the function is never actually called, no definition is necessary], a fact that is frequently used in SFINAE rules.

sizeof and decltype do not evaluate their operands, computing types only.

sizeof(*ptr) is the same as sizeof(int) in this case.

Since sizeof does not evaluate its operand (except in the case of variable length arrays if you're using C99 or later), in the expression sizeof (*ptr), ptr is not evaluated, therefore it is not dereferenced. The sizeof operator only needs to determine the type of the expression *ptr to get the appropriate size.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js