how does printf know the address of a CString's character data? - c++

Considering this code fragment:
struct My {
operator const char*()const{ return "my"; }
} my;
CStringA s( "aha" );
printf("%s %s", s, my );
// another variadic function to get rid of comments about printf :)
void foo( int i, ... ) {
    va_list vars;
    va_start(vars, i);
    for( const char* p = va_arg(vars,const char*)
       ; p != NULL
       ; p=va_arg(vars,const char*) )
    {
        std::cout << p << std::endl;
    }
    va_end(vars);
}
foo( 1, s, my );
This snippet results in the 'intuitive' output "aha". But I haven't got a clue how this can work:
if the variadic-function call is translated into pushing the pointers of the arguments, printf will receive a CStringA* that is interpreted as a const char*
if the variadic-function call is calling operator (const char*) on it, why wouldn't it do so for my own class?
Can someone explain this?
EDIT: added a dummy variadic function that treats its arguments as const char*s. Behold - it even crashes when it reaches the my argument...

The relevant text of C++98 standard §5.2.2/7:
The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type (clause 9), the behavior is undefined.
So formally the behavior is undefined.
However, a given compiler can provide any number of language extensions, and Visual C++ does. The MSDN Library documents the behavior of Visual C++ as follows, with respect to passing arguments to ...:
If the actual argument is of type float, it is promoted to type double prior to the function call.
Any signed or unsigned char, short, enumerated type, or bit field is converted to either a signed or an unsigned int using integral promotion.
Any argument of class type is passed by value as a data structure; the copy is created by binary copying instead of by invoking the class's copy constructor (if one exists).
This doesn’t mention anything about Visual C++ applying user defined conversions.
MS CString is "cleverly" laid out, so that its POD representation is exactly the pointer to its null-terminated character string (sizeof(CStringA) == sizeof(char*)). When it is used in any printf-style function, the function just gets passed the character pointer.
So this works because of the last point above and the way CString is laid out.

What you're doing is undefined behaviour, and is either a non-standard extension provided by your compiler or works by sheer luck. I'm guessing that the CString stores the string data as the first element in the structure, and thus that reading from the CString as if it were a char * yields a valid null-terminated string.

You cannot insert Non-POD data into variadic functions.

if the variadic-function call is calling operator (const char*) on it, why wouldn't it do so for my own class?
Yes but you should explicitly cast it in your code: printf("%s", (LPCSTR)s, ...);.

It doesn't. It doesn't even call the operator const char*. Visual C++ just passes the class data to printf as if by memcpy. It works because of the layout of the CString class: It only contains one member variable which is a pointer to the character data.
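A minimal sketch of why a bitwise copy can happen to look like a char pointer, assuming a class whose only data member is the character pointer. PointerOnlyString and bytes_as_pointer are hypothetical names; the actual internals of CStringA are not shown here, so treat this layout as an assumption.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a string class that stores nothing but
// a pointer to its null-terminated character data.
struct PointerOnlyString {
    const char* data;
};

// The sketch only makes sense if the struct is exactly one pointer wide.
static_assert(sizeof(PointerOnlyString) == sizeof(const char*),
              "sketch assumes no padding around the single pointer member");

// Copies the raw bytes of the object into a const char*, which is roughly
// what a callee sees when it reads a bitwise copy of the object as a pointer.
const char* bytes_as_pointer(const PointerOnlyString& s) {
    const char* p = 0;
    std::memcpy(&p, &s, sizeof p);
    assert(p == s.data);  // the object's bytes are the pointer's bytes
    return p;
}
```

The memcpy here is well defined; passing the object itself through "..." and reading it with va_arg(args, const char*) is not, even when the bytes line up.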

If the variadic-function call is translated into pushing the pointers of the arguments, …
That is not how variadic functions work. The values of the arguments, rather than pointers to the arguments, are passed, after special conversion rules for built-in types (such as char to int).
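As a rough illustration of passing by value with the default argument promotions, the sketch below reads pairs of arguments that were passed as a small integer type and a float. The function name sum_pairs is made up for this example; the key point is that the callee must ask va_arg for the promoted types, int and double.

```cpp
#include <cassert>
#include <cstdarg>

// Sums `pairs` pairs of variadic arguments. Each pair is passed by the
// caller as (char or short, float), but arrives as (int, double) because
// of the default argument promotions.
double sum_pairs(int pairs, ...) {
    assert(pairs >= 0);
    va_list args;
    va_start(args, pairs);
    double total = 0.0;
    for (int i = 0; i < pairs; ++i) {
        int small = va_arg(args, int);       // char/short was promoted to int
        double real = va_arg(args, double);  // float was promoted to double
        total += small + real;
    }
    va_end(args);
    return total;
}
```

A call such as sum_pairs(1, static_cast<char>(2), 1.5f) hands the callee an int and a double, not a char and a float.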
C++03 §5.2.2p7:
When there is no parameter for a given argument, the argument is passed in such a way that the receiving function can obtain the value of the argument by invoking va_arg (18.7). The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type (clause 9), the behavior is undefined. If the argument has integral or enumeration type that is subject to the integral promotions (4.5), or a floating point type that is subject to the floating point promotion (4.6), the value of the argument is converted to the promoted type before the call. These promotions are referred to as the default argument promotions.
In particular from the above:
If the argument has a non-POD class type (clause 9), the behavior is undefined.
C++ punts to C for the definition of va_arg, and C99 TC3 §7.15.1.1p2 says:
… if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases: [list of cases that don't apply here]
Thus, if you pass a class type, it must be POD, and the receiving function must apply the correct type, otherwise the behavior is undefined. This means that in the worst case, it may work exactly as you expect.
printf will not apply the correct type for any user-defined class type, as it has no knowledge of them, so you cannot pass any user-defined class type to printf. Your foo does the same thing by using a char pointer instead of the correct class type.
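One way to stay within the rules is to convert each object to const char* at the call site, so that what is passed matches what the callee reads. The sketch below assumes a hypothetical class Label with a conversion operator (standing in for My or CStringA) and a hypothetical count-based function join_strings; neither name comes from the question.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Hypothetical class with a conversion operator, similar in spirit to `My`.
struct Label {
    const char* text;
    operator const char*() const { return text; }
};

// Joins `count` C strings passed through the ellipsis, separated by spaces.
// Every variadic argument must really have type const char*.
std::string join_strings(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    std::string joined;
    for (int i = 0; i < count; ++i) {
        const char* s = va_arg(args, const char*);
        if (i > 0) joined += ' ';
        joined += s;
    }
    va_end(args);
    return joined;
}
```

The caller writes join_strings(2, "aha", static_cast<const char*>(label)); the cast invokes the conversion operator before the call, which the ellipsis alone never does.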

Your printf statement is wrong:
printf("%s", s, my );
Should be:
printf("%s %s", s, my);
Which will print out "aha my".
CString has a conversion operator for const char* (it's actually for LPCTSTR, which is a const TCHAR*; CStringA has a conversion function for LPCSTR).
The printf call will not convert your CStringA object to a CStringA* pointer. It essentially treats it like a void*. In the case of CString, it is sheer luck (or perhaps design of Microsoft's developers taking advantage of something that isn't in the standard) that it will give you the NULL-terminated string. If you were to use a _bstr_t instead (which has the size of the string first), despite having the conversion function, it would fail horribly.
It is good practice (and required in many cases) to explicitly cast your objects/pointers to what you want them to be when you call printf (or any variadic function for that matter).

Related

Where is const-promotion defined

I'm looking for where const-promotion is defined in c/c++. It's an implicit conversion, but I cannot find any documentation on it.
This works on g++ while using the --pedantic flag
// prototypes for example
void foo(char*);
void bar(const char*);
char buffer[8];
snprintf(buffer,sizeof(buffer), "hello");
// note: string literals are of type const char*
foo(buffer);
foo("hello"); // works, but why
bar(buffer);
bar("hello");
The behavior represented above is the expected behavior. However I am looking for the documentation for this behavior. I have looked at (drafts of) the c++98 standard and stack overflow searching for "promotion" and "implicit conversion" and have not found an answer.
If this question is too broad, I am using C++98, so we can address it for that standard.
This answer is for C and not C++.
Character string literals (distinguished from UTF-8 string literals or wide string literals) are arrays of char, per C 2018 6.4.5 6 (see footnote 1). For historical reasons, they are not arrays of const char, but they should be treated as const by programmers as, if a program tries to write to a string literal, the behavior is not defined by the C standard.
As an array, a string literal is automatically converted to a char * pointing to its first element, unless it is the operand of sizeof or unary & or is used to initialize an array.
Thus, in both foo(buffer) and foo("hello"), we have a char * argument passed to a char * parameter, and no conversion is necessary.
In bar(buffer) and bar("hello"), we have a char * argument passed to a const char * parameter. The explanation for this follows.
For function calls where a prototype is visible, the arguments are converted to the types of the parameters as if by assignment, per C 2018 6.5.2.2 7:
If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type.…
(Note that the “unqualified version of its declared type” means a const int or char * const parameter would be int or char *, respectively, not that a const char * parameter would be char *.)
6.5.16.1 2 says:
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression…
The type of the assignment expression is that of the left operand, 6.5.16 3:
… The type of an assignment expression is the type the left operand would have after lvalue conversion.…
So now we know the char * is converted to const char *. This also satisfies the constraints for assignment in 6.5.16.1 1:
One of the following shall hold: … the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;…
And the pointer conversion is specified in 6.3.2.3 2:
For any qualifier q, a pointer to a non-q-qualified type may be converted to a pointer to the q-qualified version of the type; the values stored in the original and converted pointers shall compare equal.
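The qualification conversion described above can be sketched as follows. The example is written in C++ rather than C (C++ has an equivalent implicit conversion from char * to const char *), and the names count_chars and count_buffer are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Takes a pointer to const char, like bar() in the question.
std::size_t count_chars(const char* text) {
    assert(text != 0);
    std::size_t n = 0;
    while (text[n] != '\0') ++n;
    return n;
}

// Holds a plain char* and passes it on: the compiler adds the const
// qualifier implicitly, and the pointer value is unchanged.
std::size_t count_buffer(char* buffer) {
    return count_chars(buffer);  // char* -> const char*, no cast needed
}
```

Note that, unlike C, C++11 and later do not allow a string literal to be passed to a char * parameter, so the foo("hello") case from the question has no counterpart in this sketch.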
For the snprintf call, the argument "hello" is passed in a location corresponding to ... in the parameters. For this, we look to the rest of 6.5.2.2 7, which continues from the first part quoted above:
… The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
The default argument promotions are in 6.5.2.2 6:
… the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.
Those promotions do not affect pointers, so the pointer is passed with its type unchanged. That is interesting because we could pass either a char * or a const char * here. The specification for snprintf refers to fprintf, which, for %s specification, says in 7.21.6.1 8:
… the argument shall be a pointer to the initial element of an array of character type.…
So it just requires a pointer to “character type,” not a specific type such as char or const char or volatile char.
(We might further wonder whether, if we were implementing our own function like snprintf and using <stdarg.h> to do it, passing a char * argument and processing it with a va_arg(ap, const char *) macro invocation would work. My initial reading of the va_arg specification in 7.16.1.1 2 says the types must be compatible, and char * and const char * are not compatible, but I have not studied this thoroughly.)
Footnote
1 Technically, a string literal is a thing in the source code or the representation of it during phases of C translation, and it is used to create an array of char. For simplicity, I will refer to the array as the string literal.

std::nullptr_t arguments in variadic functions

So apparently a std::nullptr_t argument is converted to a null pointer of type void * (Section 5.2.2/7 of N3337) when passed without a parameter (via ...). This means that to properly pass a null char * pointer, for example, a cast is still needed:
some_variadic_function("a", "b", "c", (const char *) nullptr);
since there is no guarantee that a null void * has the same bit pattern as a null char *. Correct?
This also means that there is no advantage to nullptr over 0 in such cases, except perhaps for clarity.
You ask:
since there is no guarantee that a null void * has the same bit pattern as a null char *. Correct?
Well, actually, that guarantee does exist; Deduplicator's answer already shows where the standard requires this. But that is not relevant to your question.
Passing void * to a variadic function, and accessing it using va_arg as char *, is specifically allowed as a special exception.
C++11:
18.10 Other runtime support [support.runtime]
1 Headers <csetjmp> (nonlocal jumps), <csignal> (signal handling), <cstdalign> (alignment), <cstdarg> (variable arguments), <cstdbool> (__bool_true_false_are_defined), <cstdlib> (runtime environment getenv(), system()), and <ctime> (system clock clock(), time()) provide further compatibility with C code.
2 The contents of these headers are the same as the Standard C library headers <setjmp.h>, <signal.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stdlib.h>, and <time.h>, respectively, with the following changes:
[... nothing about va_arg]
C99:
7.15.1.1 The va_arg macro
[...] If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
-- one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
-- one type is pointer to void and the other is a pointer to a character type.
However, this does mean that in other cases where two types T1 and T2 have the same representation and alignment requirements, the behaviour is undefined if T1 is passed to a variadic function, and it is retrieved as T2.
An example of this: passing (void *) 0 and accessing it as char *, is allowed, passing (void *) 0 and accessing it as unsigned char * is also allowed, but passing (char *) 0 and accessing it as unsigned char * is not allowed. If a compiler is capable of inlining calls to variadic functions, and optimises based on the strict requirements of the standard, such mismatches could break badly.
This also means that there is no advantage to nullptr over 0 in such cases, except perhaps for clarity.
I would definitely not use nullptr without casting it, even though in this one special case it is valid. It is far too hard to see that it is valid. And if a cast is included anyway, (char *) 0 is just as clear as a null pointer value.
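A small sketch of the "always cast the terminator" advice, assuming a hypothetical sentinel-terminated function named count_until_null. The terminator is passed as a null pointer of exactly the type that va_arg reads, so no special-case rule is needed to justify it.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>

// Counts the C strings passed before the terminating null pointer.
// Every argument, including the terminator, must have type const char*.
std::size_t count_until_null(const char* first, ...) {
    std::size_t count = 0;
    va_list args;
    va_start(args, first);
    for (const char* s = first; s != nullptr; s = va_arg(args, const char*)) {
        ++count;
    }
    va_end(args);
    assert(count > 0 || first == nullptr);  // empty only if `first` was null
    return count;
}
```

A typical call is count_until_null("a", "b", "c", static_cast<const char*>(nullptr)); a bare 0 would be passed as an int, which is not what the callee reads.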
You are wrong. One of the few guarantees is that a char* has the same size and representation as the corresponding void*.
3.9.2 Compound Types §4
A pointer to cv-qualified (3.9.3) or cv-unqualified void can be used to point to objects of unknown type.
Such a pointer shall be able to hold any object pointer. An object of type cv void* shall have the same
representation and alignment requirements as cv char*.
Edit: Looks like this answer by hvd is better, showing a few more traps specific to the variadic function part of the question.

Is it safe to pass an intptr_t to a function that expects an int?

More specifically, if I have the following function pointer type:
typedef void (*callback_type) (intptr_t context, void* buffer, size_t count);
can I safely and without "undefined behavior" do:
callback_type func_ptr = (callback_type)write;
intptr_t context = fd;
func_ptr(context, some_buffer, buffer_size);
?
Where write() is the system call (EDIT: has the signature ssize_t write(int fd, const void *buf, size_t count);, thus takes an int as the first argument), and fd is an int file descriptor. I assume the answer would be the same for C and C++, so I am tagging both.
No
That won't be portable because you are passing a parameter that will be a different size in the common LP64 paradigm.
Also, you aren't dereferencing the function pointer with the correct type, and the results of that are undefined.
Now, as you seem to have concluded, the function pointer will work as expected and the only practical problem is going to be: how will write(2) interpret the intptr_t first parameter?
And the actual run-time problem is that, on LP64, you are passing a 64-bit value to a 32-bit parameter. This might misalign the subsequent parameters. On a system with register parameters it would probably work just fine.
Let's have a look at C standard.
C11 (n1570), § 6.3.2.3 Pointers
A pointer to a function of one type may be converted to a pointer to a
function of another type and back again; the result shall compare
equal to the original pointer. If a converted pointer is used to call
a function whose type is not compatible with the referenced type, the
behavior is undefined.
C11 (n1570), § 6.7.6.3 Function declarators (including prototypes)
For two function types to be compatible, both shall specify compatible
return types. Moreover, the parameter type lists, if both are present,
shall agree in the number of parameters and in use of the ellipsis
terminator; corresponding parameters shall have compatible types.
C11 (n1570), § 6.2.7 Compatible type and composite type
Two types have compatible type if their types are the same.
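Conclusion: a function pointer of type
void (*) (intptr_t context, void* buffer, size_t count);
cannot be used to call a function whose pointer type is
void (*) (int context, void* buffer, size_t count);
because intptr_t and int are compatible only if they are the same type.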
The problem is not with passing the argument back and forth between functions, since automatic promotion from one integral type to another is done.
The problem is, what if intptr_t is shorter than int, so that not every value of int can be represented by an intptr_t? In such a case, some of the highest bits in the int will be truncated when converting to intptr_t, so you'll end up write()ing to an invalid file descriptor. Although that should not invoke undefined behavior, it's still erroneous.
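A common way around the incompatible function types is a small adapter that has exactly the callback signature and forwards to the real function. The sketch below is only an illustration: fake_write is a hypothetical stand-in for write(2) so the example does not depend on POSIX, and write_adapter is a made-up name.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

typedef void (*callback_type)(std::intptr_t context, void* buffer, std::size_t count);

// Hypothetical stand-in for write(2): records its arguments instead of
// performing I/O, and reports the whole buffer as written.
static int g_last_fd = -1;
static std::size_t g_last_count = 0;

long fake_write(int fd, const void* buffer, std::size_t count) {
    (void)buffer;
    g_last_fd = fd;
    g_last_count = count;
    return static_cast<long>(count);
}

// Has exactly the callback's signature, so calling it through
// callback_type is well defined. The narrowing to int is explicit.
void write_adapter(std::intptr_t context, void* buffer, std::size_t count) {
    assert(context >= INT_MIN && context <= INT_MAX);
    fake_write(static_cast<int>(context), buffer, count);
}
```

With this, callback_type func_ptr = write_adapter; needs no cast at all, and the conversion between intptr_t and int happens in one visible place.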

printf and pointers [duplicate]

When printing a pointer using printf, is it necessary to cast the pointer to void *? In other words, in code like
#include <stdio.h>
int main() {
int a;
printf("address of a = %p\n", &a);
}
should the argument really be (void *) &a? gcc doesn't seem to give any warnings when no explicit cast is made.
Yes, the cast to void* is required.
int a;
printf("address of a = %p\n", &a);
&a is of type int*; printf's "%p" format requires an argument of type void*. The int* argument is not implicitly converted to void*, because the declaration of printf doesn't provide type information for parameters other than the first (the format string). All arguments after the format string have the default argument promotions applied to them; these promotions do not convert int* to void*.
The likely result is that printf sees an argument that's really of type int* and interprets it as if it were of type void*. This is type-punning, not conversion, and it has undefined behavior. It will likely happen to work if int* and void* happen to have the same representation, but the language standard does not guarantee that, even by implication. And the type-punning I described is only one possible behavior; the standard says literally nothing about what can happen.
(If you do the same thing with a non-variadic function with a visible prototype, so the compiler knows at the point of the call that the parameter is of type void*, then it will generate code to do an implicit int*-to-void* conversion. That's not the case here.)
Is this a C or a C++ question? For C++, it seems that according to 5.2.2 [expr.call] paragraph 7 there isn't any implicit conversion to void*. It seems that C99's 6.5.2.2 paragraph 6 also doesn't imply any explicit promotion of pointer types. This would mean that an explicit cast to void* is required as pointer types can have different size (at least in C++): if the layout of the different pointer types isn't identical you'd end up with undefined behavior. Can someone point out where it is guaranteed that a pointer is passed with the appropriate size when using variable argument lists?
Of course, being a C++ programmer this isn't much of a problem: just don't use functions with variable number of arguments. That's not a viable approach in C, though.
I think it might be necessary to cast. Are we certain that the size of pointers is always the same? I'm sure I read on stackoverflow recently that the size (or maybe just the alignment?) of a struct* can be different to that of a union*. This would suggest that one or both can be different from the size of a void*.
So even if the value doesn't change much, or at all, in the conversion, maybe the cast is needed to ensure the size of the pointer itself is correct.
In printf, %p expects a void*, so you should explicitly cast it. If you don't do so, and if you are lucky, then the pointer size and pointer representation might save the day. But you should explicitly cast it to be certain - anything else is technically undefined behaviour.
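One way to make the conversion hard to forget is to route every %p through a helper with a void* parameter, sketched below under the assumption of a C++11 compiler (for std::snprintf). The name format_address is made up; because the parameter is declared, an int* argument is converted to void* implicitly at the call, and only a genuine void* ever reaches the ellipsis.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a pointer with %p. The declared void* parameter forces the
// conversion that the variadic part of printf would not perform.
std::string format_address(void* p) {
    char buf[64];
    int written = std::snprintf(buf, sizeof buf, "%p", p);
    assert(written > 0);  // %p produces some implementation-defined text
    return std::string(buf);
}
```

When calling printf directly, the equivalent is printf("address of a = %p\n", static_cast<void*>(&a)); in C++ or (void *)&a in C.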

Semantics of char a[]

I recently embarrassed myself while explaining to a colleague why
char a[100];
scanf("%s", &a); // notice a & in front of 'a'
is very bad and that the slightly better way to do it is:
char a[100];
scanf("%s", a); // notice no & in front of 'a'
Ok. For everybody getting ready to tell me why scanf should not be used anyway for security reasons: ease up. This question is actually about the meaning of "&a" vs "a".
The thing is, after I explained why it shouldn't work, we tried it (with gcc) and it works =)). I ran a quick
printf("%p %p", a, &a);
and it prints the same address twice.
Can anybody explain to me what's going on?
Well, the &a case should be obvious. You take the address of the array, exactly as expected.
a is a bit more subtle, but the answer is that a is the array. And as any C programmer knows, arrays have a tendency to degenerate into a pointer at the slightest provocation, for example when passing it as a function parameter.
So scanf("%s", a) expects a pointer, not an array, so the array degenerates into a pointer to the first element of the array.
Of course scanf("%s", &a) works too, because that's explicitly the address of the array.
Edit: Oops, looks like I totally failed to consider what argument types scanf actually expects. Both cases yield a pointer to the same address, but of different types. (pointer to char, versus pointer to array of chars).
And I'll gladly admit I don't know enough about the semantics for ellipsis (...), which I've always avoided like the plague, so looks like the conversion to whichever type scanf ends up using may be undefined behavior. Read the comments, and litb's answer. You can usually trust him to get this stuff right. ;)
Well, scanf expects a char* pointer as the next argument when seeing a "%s". But what you give it is a pointer to a char[100]. You give it a char(*)[100]. It's not guaranteed to work at all, because the compiler may use a different representation for array pointers of course. If you turn on warnings for gcc, you will also see the proper warning displayed.
When you pass an array as an argument that has no corresponding listed parameter in the function (as is the case for scanf, which takes vararg-style "..." arguments after the format string), the array will degenerate to a pointer to its first element. That is, the compiler will create a char* and pass that to scanf.
So, never do it with &a and pass it to scanf using "%s". Good compilers, such as Comeau, will warn you correctly:
warning: argument is incompatible with corresponding format string conversion
Of course, the &a and (char*)a have the same address stored. But that does not mean you can use &a and (char*)a interchangeably.
Some Standard quotes to especially show how pointer arguments are not converted to void* auto-magically, and how the whole thing is undefined behavior.
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object. (6.3.2.1/3)
So, that is always done; it isn't mentioned explicitly again below when listing valid cases where types may differ.
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments. (6.5.2.2/7)
About how va_arg behaves extracting the arguments passed to printf, which is a vararg function, emphasis added by me (7.15.1.1/2):
Each invocation of the va_arg macro modifies ap so that the
values of successive arguments are returned in turn. The parameter type shall be a type
name specified such that the type of a pointer to an object that has the specified type can be obtained simply by postfixing a * to type. If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
-- one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
-- one type is pointer to void and the other is a pointer to a character type.
Well, here is what that default argument promotion is:
If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions. (6.5.2.2/6)
It's been a while since I programmed in C but here's my 2c:
char a[100] doesn't allocate a separate variable for the address of the array, so the memory allocation looks like this:
---+-----+---
...|0..99|...
---+-----+---
    ^
    a == &a
For comparison, if the array was malloc'd then there is a separate variable for the pointer, and a != &a.
char *a;
a = malloc(100);
In this case the memory looks like this:
---+---+---+-----+---
...| a |...|0..99|...
---+---+---+-----+---
    ^       ^
    &a  !=  a
K&R 2nd Ed. p.99 describes it fairly well:
The correspondence between indexing
and pointer arithmetic is very close.
By definition, the value of a variable
or expression of type array is the
address of element zero of the array.
Thus after the assignment pa=&a[0];
pa and a have identical values. Since
the name of the array is a synonym for
the location of the initial element,
the assignment pa=&a[0] can also be
written as pa=a;
A C array can be implicitly converted to a pointer to its first element (C99:TC3 6.3.2.1 §3), ie there are a lot of cases where a (which has type char [100]) will behave the same way as &a[0] (which has type char *). This explains why passing a as argument will work.
But don't start thinking this will always be the case: There are important differences between arrays and pointers, eg regarding assignment, sizeof and whatever else I can't think of right now...
&a is actually one of these pitfalls: This will create a pointer to the array, ie it has type char (*) [100] (and not char **). This means &a and &a[0] will point to the same memory location, but will have different types.
As far as I know, there is no implicit conversion between these types and they are not guaranteed to have a compatible representation either. All I could find is C99:TC3 6.2.5 §27, which doesn't say much about pointers to arrays:
[...] Pointers to other types need not have the same representation or alignment requirements.
But there's also 6.3.2.3 §7:
[...] When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
So the cast (char *)&a should work as expected. Actually, I'm assuming here that the lowest addressed byte of an array will be the lowest addressed byte of its first element - not sure if this is guaranteed, or if a compiler is free to add arbitrary padding in front of an array, but if so, that would be seriously weird...
Anyway for this to work, &a still has to be cast to char * (or void * - the standard guarantees that these types have compatible representations). The problem is that there won't be any conversions applied to variable arguments aside from the default argument promotion, ie you have to do the cast explicitly yourself.
To summarize:
&a is of type char (*) [100], which might have a different bit-representation than char *. Therefore, an explicit cast must be done by the programmer, because for variable arguments, the compiler can't know to what it should convert the value. This means only the default argument promotion will be done, which, as litb pointed out, does not include a conversion to void *. It follows that:
scanf("%s", a); - good
scanf("%s", &a); - bad
scanf("%s", (char *)&a); - should be ok
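The difference between a and &a can be made visible without scanf. The sketch below uses made-up helper names and shows that the two expressions refer to the same address once converted to void*, yet have different types, which shows up in pointer arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// True when `a` (decayed to char*) and `&a` (char (*)[100]) refer to the
// same address once both are converted to void*.
bool same_address(char (&a)[100]) {
    assert(sizeof a == 100);   // no decay inside sizeof
    char* first = a;           // array-to-pointer conversion, same as &a[0]
    char (*whole)[100] = &a;   // pointer to the entire array
    return static_cast<void*>(first) == static_cast<void*>(whole);
}

// Number of bytes skipped when 1 is added to a char*.
std::ptrdiff_t step_of_element_pointer(char (&a)[100]) {
    char* first = a;
    return (first + 1) - first;
}

// Number of bytes skipped when 1 is added to a char (*)[100].
std::ptrdiff_t step_of_array_pointer(char (&a)[100]) {
    char (*whole)[100] = &a;
    return reinterpret_cast<char*>(whole + 1) - reinterpret_cast<char*>(whole);
}
```

For char a[100], the element pointer advances by 1 byte and the array pointer by 100, which is why the two pointers are not interchangeable even though they compare equal as addresses.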
Sorry, a tiny bit off topic:
This reminded me of an article I read about 8 years ago when I was coding C full time. I can't find the article but I think it was titled "arrays are not pointers" or something like that. Anyway, I did come across this C arrays and pointers FAQ which is interesting reading.
char [100] is a complex type of 100 adjacent chars, whose sizeof equals 100.
When cast to a pointer ((void*) a), this variable yields the address of the first char.
Taking the address of a variable of this type (&a) yields the address of the whole variable, which, in turn, also happens to be the address of the first char.