I'm looking for where const-promotion is defined in c/c++. It's an implicit conversion, but I cannot find any documentation on it.
This works on g++ while using the --pedantic flag
// prototypes for example
void foo(char*);
void bar(const char*);
char buffer[8];
snprintf(buffer,sizeof(buffer), "hello");
// note: string literals are of type const char*
foo(buffer);
foo("hello"); // works, but why
bar(buffer);
bar("hello");
The behavior represented above is the expected behavior. However I am looking for the documentation for this behavior. I have looked at (drafts of) the c++98 standard and stack overflow searching for "promotion" and "implicit conversion" and have not found an answer.
If this question is too broad, I am using C++98, so we can address it for that standard.
This answer is for C and not C++.
Character string literals (distinguished from UTF-8 string literals or wide string literals) are arrays of char, per C 2018 6.4.5 6 (see footnote 1). For historical reasons, they are not arrays of const char, but programmers should treat them as const: if a program tries to write to a string literal, the behavior is not defined by the C standard.
As an array, a string literal is automatically converted to a char * pointing to its first element, unless it is the operand of sizeof or unary & or is used to initialize an array.
Thus, in both foo(buffer) and foo("hello"), we have a char * argument passed to a char * parameter, and no conversion is necessary.
In bar(buffer) and bar("hello"), we have a char * argument passed to a const char * parameter. The explanation for this follows.
For function calls where a prototype is visible, the arguments are converted to the types of the parameters as if by assignment, per C 2018 6.5.2.2 7:
If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type.…
(Note that the “unqualified version of its declared type” means a const int or char * const parameter would be int or char *, respectively, not that a const char * parameter would be char *.)
6.5.16.1 2 says:
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression…
The type of the assignment expression is that of the left operand, 6.5.16 3:
… The type of an assignment expression is the type the left operand would have after lvalue conversion.…
So now we know the char * is converted to const char *. This also satisfies the constraints for assignment in 6.5.16.1 1:
One of the following shall hold: … the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;…
And the pointer conversion is specified in 6.3.2.3 2:
For any qualifier q, a pointer to a non-q-qualified type may be converted to a pointer to the q-qualified version of the type; the values stored in the original and converted pointers shall compare equal.
For the snprintf call, the argument "hello" is passed in a location corresponding to ... in the parameters. For this, we look to the rest of 6.5.2.2 7, which continues from the first part quoted above:
… The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
The default argument promotions are in 6.5.2.2 6:
… the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.
Those promotions do not affect pointers, so the pointer is passed with its type unchanged. That is interesting because we could pass either a char * or a const char * here. The specification for snprintf refers to fprintf, which, for %s specification, says in 7.21.6.1 8:
… the argument shall be a pointer to the initial element of an array of character type.…
So it just requires a pointer to “character type,” not a specific type such as char or const char or volatile char.
(We might further wonder whether, if we were implementing our own function like snprintf and using <stdarg.h> to do it, passing a char * argument and processing it with a va_arg(ap, const char *) macro invocation would work. My initial reading of the va_arg specification in 7.16.1.1 2 is that the types must be compatible, and char * and const char * are not compatible, but I have not studied this thoroughly.)
Footnote
1 Technically, a string literal is a thing in the source code or the representation of it during phases of C translation, and it is used to create an array of char. For simplicity, I will refer to the array as the string literal.
Related
I've always thought that:
char a[10];
char* p = &a;
Is wrong and that it should be one of the following instead:
char a[10];
char* p = a; // OR
char* p = &a[0];
I wish I could find the thread here on SO I was reading saying that p = &a; is valid, and it was in regard to the C language, not C++.
I've been thinking, obviously:
char* p = new char[10];
char* p1 = &p; // Is wrong
But when an array is created as a local in stack space, it seems reasonable to my intuition that a, &a, and &a[0] all have the same value/address. I have mostly been following C++ and have not seen it done this way, which is why I was quick to call it an error when I started looking into C. I'm fairly sure it is not an error in C, though (verification of that would be appreciated too). I'm just wondering whether the same holds in C++, because as far as I remember it is usually done one of the other two ways in C++, while this way (if my imagination isn't playing tricks) seems to be done in C.
Edit: This is a really dumb question. My confusion came from the fact that my Visual Studio compiler compiles it in C, but not for C++, so I thought there was a difference in the language in this respect. I won't delete the question because it already has an answer.
char a[10];
char* p = &a;
is indeed wrong in C.
In particular (all quotes refer to ISO 9899:1999 (C99) and all emphasis is mine):
6.7.8 (Initialization) / 11 says:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. The
initial value of the object is that of the expression (after conversion); the same type
constraints and conversions as for simple assignment apply, taking the type of the scalar
to be the unqualified version of its declared type.
6.5.16.1 (Simple assignment):
Constraints
One of the following shall hold:
-- the left operand has qualified or unqualified arithmetic type and the right has arithmetic type;
-- the left operand has a qualified or unqualified version of a structure or union type compatible with the type of the right;
-- both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
-- one operand is a pointer to an object or incomplete type and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
-- the left operand is a pointer and the right is a null pointer constant; or
-- the left operand has type _Bool and the right is a pointer.
Only the third alternative (both operands are pointers to qualified or unqualified versions of compatible types) can apply; we're not using any arithmetic or struct/union types, there's no void or _Bool and no null pointer constants.
The types in question are char * (pointer to char) on the left and char (*)[10] (pointer to array[10] of char) on the right. Compatibility of pointer types is defined as follows:
6.7.5.1 (Pointer declarators) / 2:
For two pointer types to be compatible, both shall be identically qualified and both shall
be pointers to compatible types.
The pointed-to types are char and char [10], respectively.
But now we're stuck. There is 6.2.7 (Compatible type and composite type) / 1:
Two types have compatible type if their types are the same. Additional rules for
determining whether two types are compatible are described in 6.7.2 for type specifiers,
in 6.7.3 for type qualifiers, and in 6.7.5 for declarators.
char and char [10] are clearly not the same. All the declarator rules for compatible types in 6.7.5 say "For two pointer types to be compatible ...", "For two array types to be compatible ...", "For two function types to be compatible ...", but there is no way for a non-array type to be compatible with an array type.
Thus the types are not compatible and char *p = &a violates a constraint in 6.5.16.1.
5.1.1.3 (Diagnostics):
A conforming implementation shall produce at least one diagnostic message (identified in
an implementation-defined manner) if a preprocessing translation unit or translation unit
contains a violation of any syntax rule or constraint, even if the behavior is also explicitly
specified as undefined or implementation-defined.
This means a warning or error message is required. If your compiler doesn't produce one, it is not a conforming C compiler.
printf's %s conversion specifier expects a pointer to a char array. Note the lack of const. I can see the reasons for this in C, and since C++ incorporates the C99 standard library, this wouldn't change. However, if I'm writing my own printf, can I safely convert the argument to const char* instead?
case 's' :
ptr = va_arg(va, const char*);
_puts(ptr, strlen(ptr));
break;
Would this have any unintended semantics (note: I'm not asking about undefined behavior, because such an implementation would not be conforming anyway)?
The C standard (ISO/IEC 9899:2011 (E)) specifies the meaning of the %s conversion specifier in 7.21.6.1/8:
If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type.
This formulation is clearly not specific enough to tell whether the character type is const or non-const. It doesn't even state whether char, signed char, or unsigned char is used. I don't think "character array" is defined as a term in the C standard.
Put differently: using char const* for the type specified by a %s conversion specifier is fine.
When I'm in C++, and I call an overloaded function foo, like so:
foo('e' - (char) 5)
it can output "output is a char" or "output is an int" depending on the type of the result. I get "output is an int" from my program, like this:
#include <iostream>
void foo(char x)
{
std::cout << "output is a char" << std::endl;
}
void foo(int x)
{
std::cout << "output is an int" << std::endl;
}
int main()
{
foo('a' + (char) 5);
}
My instructor says that in C, the expression above, ('a' + (char) 5), evaluates as a char. I see in the C99 standard that chars are promoted to ints to find the sum, but does C recast them back to chars when it's done? I can't find any references that seem credible saying one way or another what C actually does after the promotion is completed, and the sum is found.
Is the sum left as an int, or given as a char? How can I prove this in C, or is there a reference I'm not understanding/finding?
From the C Standard, 6.3.1.8 Usual arithmetic conversions, emphasis mine:
Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is the type domain of the operands if they are the same,
and complex otherwise. This pattern is called the usual arithmetic conversions:
First, if the corresponding real type of either operand is long double...
Otherwise, if the corresponding real type of either operand is double...
Otherwise, if the corresponding real type of either operand is float...
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
If both operands have the same type, then no further conversion is needed.
So you are exactly correct. The type of the expression 'a' + (char) 5 is int. There is no recasting back to char, unless explicitly asked for by the user. Note that 'a' here has type int, so it's only the (char)5 that needs to be promoted. This is stipulated in 6.4.4.4 Character Constants:
An integer character constant is a sequence of one or more multibyte characters enclosed
in single-quotes, as in 'x'.
...
An integer character constant has type int.
There is an example demonstrating the explicit recasting to char:
In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2;
the ‘‘integer promotions’’ require that the abstract machine promote the value of each variable to int size and then add the two ints and truncate the sum. Provided the addition of two chars can be done without overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only produce the same result, possibly omitting the promotions.
The truncation here only happens because we assign back to a char.
No, C does not recast them back to chars.
The standard (ISO/IEC 9899:1999) says (6.3.1.8 Usual arithmetic conversions):
Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is determined by the operator.
Your instructor seems to be wrong. In addition to your finding in the standard that the arithmetic promotes to int, we can use a simple test program to show the behavior (not a proof from the standard, of course, but the same level of proof as your C++ test):
#include <stdio.h>
int main () {
printf("%g",'c' - (char)5);
}
produces
Warning: format specifies type 'double' but argument has type 'int'
with gcc and clang.
You can't determine the type of an expression as easily in C, but you can easily determine the size of an expression:
#include <stdio.h>
int main(void) {
printf("sizeof(char)==1\n");
/* sizeof yields a size_t, so the matching conversion is %zu, not %u */
printf("sizeof(int)==%zu\n", sizeof(int));
printf("sizeof('a' + (char) 5)==%zu\n", sizeof('a' + (char) 5));
return 0;
}
This gives me:
sizeof(char)==1
sizeof(int)==4
sizeof('a' + (char) 5)==4
which at least proves that 'a' + (char) 5 is not of type char.
It's promoted to an int, and there's nothing to tell the compiler it should use anything else. You can convert back to a char like this:
foo((char)('a' + 5));
This tells the compiler to treat the result of the calculation as a char, otherwise it leaves it as an int.
Section 6.5.2.2/6
If the expression that denotes the called function has a type that
does not include a prototype, the integer promotions are performed on
each argument...
So the answer to your question depends on the function prototype. If the function is declared as
void foo(int x)
or
void foo()
then the function argument will be passed as an int.
OTOH, if the function is declared as
void foo( char x )
then the result of the expression will be implicitly converted to char.
In C (unlike C++), the character literal 'a' has type int (§6.4.4.4¶10: "An integer character constant has type int.")
Even if that were not the case, the C standard clearly states that prior to the evaluation of the operator +, "[i]f both operands have arithmetic type, the usual arithmetic conversions are performed on them." (C11, §6.5.6 ¶4) In this respect, C and C++ have identical semantics. (See [expr.add] §5.7¶1 of C++)
From the C++ Standard (C++ Working Draft N3797, 5.7 Additive operators)
1 The additive operators + and - group left-to-right. The usual
arithmetic conversions are performed for operands of arithmetic or
enumeration type.
and (5 Expressions)
10 Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
...
— Otherwise, the integral promotions (4.5) shall be performed on
both operands. Then the following rules shall be applied to the
promoted operands:
Thus the expression in the function call
foo('a' + (char) 5);
has type int.
To call the overloaded function with parameter of type char you have to write for example
foo( char( 'a' + 5 ) );
or
foo( ( char )( 'a' + 5 ) );
or you can use C++ casting like
foo( static_cast<char>( 'a' + 5 ) );
The above quotes from the C++ Standard also are valid for C Standard. The visible difference is that in C++ character literals have type char while in C they have type int.
So in C++ the output of the statement
std::cout << sizeof( 'a' ) << std::endl;
will be equal to 1.
While in C the output of the statement
printf( "%zu\n", sizeof( 'a' ) );
will be equal to sizeof( int ) that is usually equal to 4.
So apparently a std::nullptr_t argument is converted to a null pointer of type void * (Section 5.2.2/7 of N3337) when passed without a parameter (via ...). This means that to properly pass a null char * pointer, for example, a cast is still needed:
some_variadic_function("a", "b", "c", (const char *) nullptr);
since there is no guarantee that a null void * has the same bit pattern as a null char *. Correct?
This also means that there is no advantage to nullptr over 0 in such cases, except perhaps for clarity.
You ask:
since there is no guarantee that a null void * has the same bit pattern as a null char *. Correct?
Well, actually, that guarantee does exist; Deduplicator's answer already shows where the standard requires this. But that is not relevant to your question.
Passing void * to a variadic function, and accessing it using va_arg as char *, is specifically allowed as a special exception.
C++11:
18.10 Other runtime support [support.runtime]
1 Headers <csetjmp> (nonlocal jumps), <csignal> (signal handling), <cstdalign> (alignment), <cstdarg> (variable arguments), <cstdbool> (__bool_true_false_are_defined), <cstdlib> (runtime environment
getenv(), system()), and <ctime> (system clock clock(), time()) provide further compatibility with C code.
2 The contents of these headers are the same as the Standard C library headers <setjmp.h>, <signal.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stdlib.h>, and <time.h>, respectively, with the following
changes:
[... nothing about va_arg]
C99:
7.15.1.1 The va_arg macro
[...] If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
-- one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
-- one type is pointer to void and the other is a pointer to a character type.
However, this does mean that in other cases where two types T1 and T2 have the same representation and alignment requirements, the behaviour is undefined if T1 is passed to a variadic function, and it is retrieved as T2.
An example of this: passing (void *) 0 and accessing it as char *, is allowed, passing (void *) 0 and accessing it as unsigned char * is also allowed, but passing (char *) 0 and accessing it as unsigned char * is not allowed. If a compiler is capable of inlining calls to variadic functions, and optimises based on the strict requirements of the standard, such mismatches could break badly.
This also means that there is no advantage to nullptr over 0 in such cases, except perhaps for clarity.
I would definitely not use nullptr without casting it, even though in this one special case it is valid. It is far too hard to see that it is valid. And if a cast is included anyway, (char *) 0 is just as clear as a null pointer value.
You are wrong. One of the few guarantees is that a char* has the same size and representation as the corresponding void*.
3.9.2 Compound Types §4
A pointer to cv-qualified (3.9.3) or cv-unqualified void can be used to point to objects of unknown type.
Such a pointer shall be able to hold any object pointer. An object of type cv void* shall have the same
representation and alignment requirements as cv char*.
Edit: Looks like this answer by hvd is better, showing a few more traps specific to the variadic function part of the question.
Considering this code fragment:
struct My {
operator const char*()const{ return "my"; }
} my;
CStringA s( "aha" );
printf("%s %s", s, my );
// another variadic function to get rid of comments about printf :)
void foo( int i, ... ) {
va_list vars;
va_start(vars, i);
for( const char* p = va_arg(vars,const char*)
; p != NULL
; p=va_arg(vars,const char*) )
{
std::cout << p << std::endl;
}
va_end(vars);
}
foo( 1, s, my );
This snippet results in the 'intuitive' output "aha". But I haven't got a clue how this can work:
if the variadic-function call is translated into pushing the pointers of the arguments, printf will receive a CStringA* that is interpreted as a const char*
if the variadic-function call is calling operator (const char*) on it, why wouldn't it do so for my own class?
Can someone explain this?
EDIT: added a dummy variadic function that treats its arguments as const char*s. Behold - it even crashes when it reaches the my argument...
The relevant text of C++98 standard §5.2.2/7:
The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type (clause 9), the behavior is undefined.
So formally the behavior is undefined.
However, a given compiler can provide any number of language extensions, and Visual C++ does. The MSDN Library documents the behavior of Visual C++ as follows, with respect to passing arguments to ...:
If the actual argument is of type float, it is promoted to type double prior to the function call.
Any signed or unsigned char, short, enumerated type, or bit field is converted to either a signed or an unsigned int using integral promotion.
Any argument of class type is passed by value as a data structure; the copy is created by binary copying instead of by invoking the class's copy constructor (if one exists).
This doesn’t mention anything about Visual C++ applying user defined conversions.
MS CString is "cleverly" laid out, so that its POD representation is exactly the pointer to its null-terminated character string (sizeof(CStringA) == sizeof(char*)). When it is used in any printf-style function, the function just gets passed the character pointer.
So this works because of the last point above and the way CString is laid out.
What you're doing is undefined behaviour, and is either a non-standard extension provided by your compiler or works by sheer luck. I'm guessing that the CString stores the string data as the first element in the structure, and thus that reading from the CString as if it were a char * yields a valid null-terminated string.
You cannot pass non-POD data to variadic functions.
if the variadic-function call is calling operator (const char*) on it, why wouldn't it do so for my own class?
Yes but you should explicitly cast it in your code: printf("%s", (LPCSTR)s, ...);.
It doesn't. It doesn't even call the operator const char*. Visual C++ just passes the class data to printf as if by memcpy. It works because of the layout of the CString class: It only contains one member variable which is a pointer to the character data.
If the variadic-function call is translated into pushing the pointers of the arguments, …
That is not how variadic functions work. The values of the arguments, rather than pointers to the arguments, are passed, after special conversion rules for built-in types (such as char to int).
C++03 §5.2.2p7:
When there is no parameter for a given argument, the argument is passed in such a way that the receiving function can obtain the value of the argument by invoking va_arg (18.7). The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type (clause 9), the behavior is undefined. If the argument has integral or enumeration type that is subject to the integral promotions (4.5), or a floating point type that is subject to the floating point promotion (4.6), the value of the argument is converted to the promoted type before the call. These promotions are referred to as the default argument promotions.
In particular from the above:
If the argument has a non-POD class type (clause 9), the behavior is undefined.
C++ punts to C for the definition of va_arg, and C99 TC3 §7.15.1.1p2 says:
… if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases: [list of cases that don't apply here]
Thus, if you pass a class type, it must be POD, and the receiving function must apply the correct type, otherwise the behavior is undefined. This means that in the worst case, it may work exactly as you expect.
Printf will not apply the correct type for any user-defined class type as it has no knowledge of them, so you cannot pass any UDT class type to printf. Your foo does the same thing by using a char pointer instead of the correct class type.
Your printf statement is wrong:
printf("%s", s, my );
Should be:
printf("%s %s", s, my);
Which will print out "aha my".
CString has a conversion operator for const char* (it's actually for LPCTSTR, which is a const TCHAR*; CStringA has a conversion function for LPCSTR).
The printf call will not convert your CStringA object to a CStringA* pointer. It essentially treats it like a void*. In the case of CString, it is sheer luck (or perhaps design of Microsoft's developers taking advantage of something that isn't in the standard) that it will give you the NULL-terminated string. If you were to use a _bstr_t instead (which has the size of the string first), despite having the conversion function, it would fail horribly.
It is good practice (and required in many cases) to explicitly cast your objects/pointers to what you want them to be when you call printf (or any variadic function for that matter).