printf's %s conversion specifier expects a pointer to a char array. Note the lack of const. I can see the reasons for this in C, and since C++ incorporates the C99 standard, this wouldn't change. However, if I'm writing my own printf, can I safely convert the argument to const char* instead?
case 's' :
ptr = va_arg(va, const char*);
_puts(ptr, strlen(ptr));
break;
Would this have any unintended semantics (note: I'm not asking about undefined behavior, because such an implementation would not be conforming anyway)?
The C standard (ISO/IEC 9899:2011 (E)) specifies the meaning of the %s conversion specifier in 7.21.6.1/8:
If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type.
This formulation is clearly not specific enough to tell whether the character type is const or non-const. It doesn't even state whether char, signed char, or unsigned char is used. I don't think "character array" is defined as a term in the C standard.
Put differently: using char const* for the type specified by a %s conversion specifier is fine.
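As a rough illustration of the point above, here is a minimal sketch of a formatter that reads its %s arguments as const char *. The names mini_format and append_bytes are made up for this example, and only %s is handled. Strictly speaking, char * and const char * are not compatible types for va_arg, but C11 6.2.5 28 requires pointers to qualified and unqualified versions of compatible types to have the same representation and alignment, so this is assumed to work on ordinary implementations.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes into out without overflowing cap,
   and returns the length the output would have had with unlimited room. */
static size_t append_bytes(char *out, size_t cap, size_t len,
                           const char *src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (len + 1 < cap)      /* keep one byte free for the terminator */
            out[len] = src[i];
        ++len;
    }
    return len;
}

/* Hypothetical mini formatter: understands only "%s". */
size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    assert(out != NULL && cap > 0 && fmt != NULL);

    va_list ap;
    size_t len = 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (p[0] == '%' && p[1] == 's') {
            /* Read as const char *: the formatter never writes through it. */
            const char *s = va_arg(ap, const char *);
            len = append_bytes(out, cap, len, s, strlen(s));
            ++p;                /* skip the 's' */
        } else {
            len = append_bytes(out, cap, len, p, 1);
        }
    }
    va_end(ap);

    out[len < cap ? len : cap - 1] = '\0';
    return len;
}
```

A call such as mini_format(buf, sizeof buf, "hello %s%s", name, "!") accepts both a modifiable char array and a string literal for the %s arguments.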
Related
For example such code may be useful:
unsigned char ch = 0xf2;
printf("%02hhx", ch);
However, ch is promoted to int when it is passed as an argument to the variadic function printf. So when %hhx is used, there is a type mismatch. Is any undefined behavior involved here according to the C standard? What if it is C++?
There is some discussion here, but no answer is given.
C11 standard says:
7.21.6.1/7
hh Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have
been promoted according to the integer promotions, but its value shall be
converted to signed char or unsigned char before printing); or that
a following n conversion specifier applies to a pointer to a signed char
argument.
So no, there is no undefined behavior. The standard is well aware that a char argument to a variadic function would be promoted to int.
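A small sketch of the case discussed above: an unsigned char is promoted to int at the call, and %hhx converts the value back to unsigned char before printing. The function name hex_byte is an assumption made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats one byte as two lowercase hex digits
   followed by a terminating NUL. */
void hex_byte(char out[3], unsigned char b)
{
    assert(out != NULL);
    /* b is promoted to int here; hh tells printf to convert it back. */
    int n = snprintf(out, 3, "%02hhx", b);
    assert(n == 2);
    (void)n;    /* avoids an unused-variable warning when NDEBUG is set */
}
```

Calling hex_byte(buf, 0xf2) with a three-byte buffer leaves the digits f and 2 in buf.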
I'm looking for where const-promotion is defined in c/c++. It's an implicit conversion, but I cannot find any documentation on it.
This works on g++ while using the --pedantic flag
// prototypes for example
void foo(char*);
void bar(const char*);
char buffer[8];
snprintf(buffer,sizeof(buffer), "hello");
// note: string literals are of type const char*
foo(buffer);
foo("hello"); // works, but why
bar(buffer);
bar("hello");
The behavior represented above is the expected behavior. However I am looking for the documentation for this behavior. I have looked at (drafts of) the c++98 standard and stack overflow searching for "promotion" and "implicit conversion" and have not found an answer.
If this question is too broad, I am using C++98, so we can address it for that standard.
This answer is for C and not C++.
Character string literals (distinguished from UTF-8 string literals or wide string literals) are arrays of char, per C 2018 6.4.5 6 (see footnote 1). For historical reasons, they are not arrays of const char, but they should be treated as const by programmers as, if a program tries to write to a string literal, the behavior is not defined by the C standard.
As an array, a string literal is automatically converted to a char * pointing to its first element, unless it is the operand of sizeof or unary & or is used to initialize an array.
Thus, in both foo(buffer) and foo("hello"), we have a char * argument passed to a char * parameter, and no conversion is necessary.
In bar(buffer) and bar("hello"), we have a char * argument passed to a const char * parameter. The explanation for this follows.
For function calls where a prototype is visible, the arguments are converted to the types of the parameters as if by assignment, per C 2018 6.5.2.2 7:
If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type.…
(Note that the “unqualified version of its declared type” means a const int or char * const parameter would be int or char *, respectively, not that a const char * parameter would be char *.)
6.5.16.1 2 says:
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression…
The type of the assignment expression is that of the left operand, 6.5.16 3:
… The type of an assignment expression is the type the left operand would have after lvalue conversion.…
So now we know the char * is converted to const char *. This also satisfies the constraints for assignment in 6.5.16.1 1:
One of the following shall hold: … the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;…
And the pointer conversion is specified in 6.3.2.3 2:
For any qualifier q, a pointer to a non-q-qualified type may be converted to a pointer to the q-qualified version of the type; the values stored in the original and converted pointers shall compare equal.
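To make the conversion concrete, here is a short sketch in C. The function names count_char and demo_qualification are invented for this example; the point is only that a char * converts implicitly to const char * at a prototyped call or in an assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Takes pointer to const char: the function promises not to write through s. */
size_t count_char(const char *s, char c)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

size_t demo_qualification(void)
{
    char buffer[8] = "hello";   /* modifiable array of char */
    char *p = buffer;           /* array decays to char * */
    const char *q = p;          /* implicit char * to const char * conversion */

    return count_char(buffer, 'l')    /* char * argument, const char * parameter */
         + count_char(q, 'l')         /* already const char * */
         + count_char("hello", 'l');  /* literal: array of char in C */
}
```

The opposite direction, const char * to char *, drops a qualifier and violates the assignment constraint quoted above, so a conforming compiler must diagnose it unless a cast is used.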
For the snprintf call, the argument "hello" is passed in a location corresponding to ... in the parameters. For this, we look to the rest of 6.5.2.2 7, which continues from the first part quoted above:
… The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
The default argument promotions are in 6.5.2.2 6:
… the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.
Those promotions do not affect pointers, so the pointer is passed with its type unchanged. That is interesting because we could pass either a char * or a const char * here. The specification for snprintf refers to fprintf, which, for %s specification, says in 7.21.6.1 8:
… the argument shall be a pointer to the initial element of an array of character type.…
So it just requires a pointer to “character type,” not a specific type such as char or const char or volatile char.
(We might further wonder whether, if we were implementing our own function like snprintf using <stdarg.h>, passing a char * argument and processing it with a va_arg(ap, const char *) macro invocation would work. My initial reading of the va_arg specification in 7.16.1.1 2 is that the types must be compatible, and char * and const char * are not compatible, but I have not studied this thoroughly.)
Footnote
1 Technically, a string literal is a thing in the source code or the representation of it during phases of C translation, and it is used to create an array of char. For simplicity, I will refer to the array as the string literal.
So from my understanding pointer variables point to an address. So, how is the following code valid in C++?
char* b= "abcd"; //valid
int *c= 1; //invalid
The first line
char* b= "abcd";
is valid in C because a string literal, when used as an initializer here, boils down to the address of the first element of the literal, which is a pointer (to char).
Related, C11, chapter §6.4.5, string literals,
[...] The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [...]
and then, chapter §6.3.2.1 (emphasis mine)
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue.
However, as mentioned in comments, in C++11 onwards, this is not valid anymore as string literals are of type const char[] there and in your case, LHS lacks the const specifier.
OTOH,
int *c= 1;
is invalid (illegal) because, 1 is an integer constant, which is not the same type as int *.
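The decay rule and its exceptions quoted above can be sketched in C as follows. The function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Operand of sizeof: no decay, so this is the size of the array char[5]. */
size_t literal_size(void)
{
    return sizeof "abcd";
}

/* Ordinary use: the array decays to a pointer to its first element. */
char first_char(void)
{
    const char *b = "abcd";
    assert(b != NULL);
    return *b;
}

/* Literal used to initialize an array: no decay, the bytes are copied. */
size_t copy_size(void)
{
    char copy[] = "abcd";
    return sizeof copy;
}
```

Both sizes include the terminating null character, which is why they are 5 rather than 4.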
In C and very old versions of C++, a string literal "abcd" is of type char[], a character array. Such an array can naturally get pointed at by a char*, but not by an int* since that's not a compatible type.
However, C and C++ are different, often incompatible programming languages. They dropped compatibility with each other some 20 years ago.
In standard C++, a string literal is of type const char[] and therefore none of your posted code is valid in C++. This won't compile:
char* b = "abcd"; //invalid, discards const qualifier
This will:
const char* c = "abcd"; // valid
"abcd" is actually a const char[5] type, and the language permits this to be assigned to a const char* (and, regrettably, to a char*, although C++11 onwards disallows that).
int *c = 1; is not allowed by the C++ or C standards since you can't assign an int to an int* pointer (with the exception of 0, and in that case your intent is expressed more clearly by assigning nullptr instead).
"abcd" is the address that contains the sequence of five bytes 97 98 99 100 0 -- you cannot see what the address is in the source code, but the compiler will still assign it an address.
1 is also an address near the bottom of your [virtual] memory. This may not seem to be useful to you, but it is useful to other people, so even though the "standard" might not want to permit this, every compiler you are ever likely to run into will support this.
While all the other answers correctly explain why your code doesn't work, using a compound literal to initialize c is one way you can make your code work, e.g.
int *c= (int[]){ 1 };
printf ("int pointer c : %d\n", *c);
Note that C and C++ differ in their support for compound literals: they are a standard feature only in C.
char* foo = "fpp"; //compile in vs 2010 with no problem
I thought a string literal is of type const char*.
And a const type cannot be assigned to a non-const type.
So I expect the code above to fail, or am I missing something?
Edit: Sorry guys, I had totally forgotten that the compiler emits warnings too.
I was looking at the error list all this time.
I forgot to check that.
Edit2: I set my project Warning Level to EnableAllWarnings (/Wall) and there's no warning about this.
So my question is still valid.
C++03 deprecates[Ref 1] use of string literal without the const keyword.
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation which implies that it is illegal code in C++11.
Prior to C++03, C++ derived its declaration of string literals without the const keyword. Note that the same is perfectly valid in C.
As I understand it, in C, before const was added, this was the way to assign a string to a pointer.
In C++ this is deprecated behavior, but still allowed to keep backwards compatibility. So don't use it.
In fact, I believe in C++11 it's completely invalid.
Not quite. A string literal is assignable to a char* type. A string literal should never be modified.
This strange situation is for backwards compatibility with programs before const existed.
gcc -std=c++0x warns about this:
a.cpp:5:14: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
So, this is still allowed, but deprecated, because literal strings are const.
There is no such thing as a const type. Const keyword is a so called type qualifier. It can be applied to any pointer type and just means that the value pointed at by the pointer should not be modified.
You could also apply the const qualifier to the pointer reference itself this way:
char* const p ="aaa";
This will protect the pointer variable from pointing to another string.
There's a special implicit conversion to support this, since it was a common idiom in legacy code (often written before const existed). The type of your string literal is char const[], and you should only use it as such. A good compiler will warn at the above, since the conversion was deprecated from the moment it was introduced.
Note that this is different from C, where the type of a string literal is char[] (but trying to modify it is still undefined behavior).
You are talking about C strings, which are actually arrays of char. In C++, the class std::string is used instead, and a constant string is created as const std::string.
Anyway, compilers reserve a piece of memory in the future program in order to store the literal strings that show up in the source code. This part of the memory is considered read-only, so you should point to it with a const char *. Its size is exactly the size of the string plus one extra position for the trailing zero, marking the end of the string.
Compilers need to keep backwards compatibility, so they still accept literals to be pointed by char *. However, this is misleading, since you are not supposed to be able to modify that memory which could be stored in ROM in an embedded system.
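A small C sketch of the practical consequence: point at a literal through const char *, and initialize an array from it when a modifiable string is needed. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Upper-cases a modifiable string in place. */
void upper_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}

/* Returns 1 if the copy was modified and the literal was left untouched. */
int demo_copy_is_writable(void)
{
    const char *literal = "fpp";  /* read-only view of the literal */
    char copy[] = "fpp";          /* separate writable array, initialized from it */

    upper_in_place(copy);         /* fine: modifies the copy only */
    /* upper_in_place((char *)literal) would attempt to write to the
       literal itself, which is undefined behavior. */

    return copy[0] == 'F' && copy[1] == 'P' && literal[0] == 'f';
}
```

Declaring the pointer as const char * lets the compiler reject accidental writes at compile time instead of leaving them to fail (or silently misbehave) at run time.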
In my system, I use clang:
$ clang --version
Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
Target: i386-pc-linux-gnu
Thread model: posix
In the clang C compiler, this code compiles without errors:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char * str = "Hello, World!";
printf( "%s", str );
return EXIT_SUCCESS;
}
However, the very same code (with minor modifications, such as the header's names) throws the following warning when compiled as a C++ program:
kk.cpp:6:15: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
char * str = "Hello, World!";
^
1 warning generated.
Hope this helps.
Considering this code fragment:
struct My {
operator const char*()const{ return "my"; }
} my;
CStringA s( "aha" );
printf("%s %s", s, my );
// another variadic function to get rid of comments about printf :)
void foo( int i, ... ) {
va_list vars;
va_start(vars, i);
for( const char* p = va_arg(vars,const char*)
; p != NULL
; p=va_arg(vars,const char*) )
{
std::cout << p << std::endl;
}
va_end(vars);
}
foo( 1, s, my );
This snippet results in the 'intuitive' output "aha". But I haven't got a clue how this can work:
if the variadic-function call is translated into pushing the pointers of the arguments, printf will receive a CStringA* that is interpreted as a const char*
if the variadic-function call is calling operator (const char*) on it, why wouldn't it do so for my own class?
Can someone explain this?
EDIT: added a dummy variadic function that treats its arguments as const char*s. Behold - it even crashes when it reaches the my argument...
The relevant text of C++98 standard §5.2.2/7:
The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type (clause 9), the behavior is undefined.
So formally the behavior is undefined.
However, a given compiler can provide any number of language extensions, and Visual C++ does. The MSDN Library documents the behavior of Visual C++ as follows, with respect to passing arguments to ...:
If the actual argument is of type float, it is promoted to type double prior to the function call.
Any signed or unsigned char, short, enumerated type, or bit field is converted to either a signed or an unsigned int using integral promotion.
Any argument of class type is passed by value as a data structure; the copy is created by binary copying instead of by invoking the class's copy constructor (if one exists).
This doesn’t mention anything about Visual C++ applying user defined conversions.
MS CString is "cleverly" laid out, so that its POD representation is exactly the pointer to its null-terminated character string (sizeof(CStringA) == sizeof(char*)). When it is used in any printf-style function, the function just gets passed the character pointer.
So this works because of the last point above and the way CString is laid out.
What you're doing is undefined behaviour, and is either a non-standard extension provided by your compiler or works by sheer luck. I'm guessing that the CString stores the string data as the first element in the structure, and thus that reading from the CString as if it were a char * yields a valid null-terminated string.
You cannot pass non-POD data to variadic functions.
More info
if the variadic-function call is calling operator (const char*) on it, why wouldn't it do so for my own class?
Yes but you should explicitly cast it in your code: printf("%s", (LPCSTR)s, ...);.
It doesn't. It doesn't even call the operator const char*. Visual C++ just passes the class data to printf as if by memcpy. It works because of the layout of the CString class: It only contains one member variable which is a pointer to the character data.
If the variadic-function call is translated into pushing the pointers of the arguments, …
That is not how variadic functions work. The values of the arguments, rather than pointers to the arguments, are passed, after special conversion rules for built-in types (such as char to int).
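A short C sketch of those conversion rules in action. The helper sum_tagged and its tag convention ('i' for an integer, anything else for a floating-point value) are assumptions made for this example: the receiver must ask va_arg for the promoted type, not the type the caller wrote.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` (tag, value) pairs. */
double sum_tagged(int count, ...)
{
    assert(count >= 0);

    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i) {
        int tag = va_arg(ap, int);        /* a char argument arrives as int */
        if (tag == 'i')
            total += va_arg(ap, int);     /* char and short arrive as int */
        else
            total += va_arg(ap, double);  /* float arrives as double */
    }
    va_end(ap);
    return total;
}
```

For instance, with char tags, a short holding 3 and a float holding 1.5f, the call sum_tagged(2, 'i', s, 'd', f) reads an int and a double on the receiving side.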
C++03 §5.2.2p7:
When there is no parameter for a given argument, the argument is passed in such a way that the receiving function can obtain the value of the argument by invoking va_arg (18.7). The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type (clause 9), the behavior is undefined. If the argument has integral or enumeration type that is subject to the integral promotions (4.5), or a floating point type that is subject to the floating point promotion (4.6), the value of the argument is converted to the promoted type before the call. These promotions are referred to as the default argument promotions.
In particular from the above:
If the argument has a non-POD class type (clause 9), the behavior is undefined.
C++ punts to C for the definition of va_arg, and C99 TC3 §7.15.1.2p2 says:
… if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases: [list of cases that don't apply here]
Thus, if you pass a class type, it must be POD, and the receiving function must apply the correct type, otherwise the behavior is undefined. This means that in the worst case, it may work exactly as you expect.
Printf will not apply the correct type for any user-defined class type as it has no knowledge of them, so you cannot pass any UDT class type to printf. Your foo does the same thing by using a char pointer instead of the correct class type.
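For contrast, here is a sketch in C (where every struct is plain data) of a variadic function that receives a struct by value and names exactly that struct type in va_arg. The names Point and sum_points are invented for this example.

```c
#include <assert.h>
#include <stdarg.h>

struct Point { int x; int y; };

/* Hypothetical helper: adds up the coordinates of `count` Point arguments. */
int sum_points(int count, ...)
{
    assert(count >= 0);

    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i) {
        /* The type named here matches the type the caller passed. */
        struct Point p = va_arg(ap, struct Point);
        total += p.x + p.y;
    }
    va_end(ap);
    return total;
}
```

Reading the same arguments with va_arg(ap, const char *), as foo does in the question, would name a type that does not match what was passed, which is the undefined behavior described above.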
Your printf statement is wrong:
printf("%s", s, my );
Should be:
printf("%s %s", s, my);
Which will print out "aha my".
CString has a conversion operator for const char* (it's actually for LPCTSTR, which is a const TCHAR*; CStringA has a conversion function for LPCSTR).
The printf call will not convert your CStringA object to a CStringA* pointer. It essentially treats it like a void*. In the case of CString, it is sheer luck (or perhaps design of Microsoft's developers taking advantage of something that isn't in the standard) that it will give you the NULL-terminated string. If you were to use a _bstr_t instead (which has the size of the string first), despite having the conversion function, it would fail horribly.
It is good practice (and required in many cases) to explicitly cast your objects/pointers to what you want them to be when you call printf (or any variadic function for that matter).
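A minimal C sketch of that advice, using a made-up wrapper type LenString that, like _bstr_t as described above, does not start with the character pointer. The safe pattern is to hand printf-style functions the pointer that %s expects rather than the wrapper object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical string wrapper: the length comes first, so reading the
   object as if it were a char pointer would not yield the text. */
struct LenString {
    size_t length;
    const char *text;
};

/* Hypothetical helper: writes "hello <name>" into out. */
int format_greeting(char *out, size_t cap, struct LenString name)
{
    assert(out != NULL && name.text != NULL);
    /* Pass the pointer explicitly; never pass the wrapper for %s. */
    return snprintf(out, cap, "hello %s", name.text);
}
```

In C++ with CStringA, the equivalent of name.text is the explicit cast shown earlier, (LPCSTR)s.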