Last named parameter not function or array? - c++

This question is about vararg functions, and their last named parameter, the one just before the ellipsis:
void f(Type paramN, ...) {
    va_list ap;
    va_start(ap, paramN);
    va_end(ap);
}
I was reading in the C Standard, and found the following restriction for the va_start macro:
The parameter parmN is the identifier of the rightmost parameter in the variable parameter list in the function definition (the one just before the , ...). If the parameter parmN is declared with the register storage class, with a function or array type, or with a type that is not compatible with the type that results after application of the default argument promotions, the behavior is undefined.
I wonder why the behavior is undefined for the following code
void f(int paramN[], ...) {
    va_list ap;
    va_start(ap, paramN);
    va_end(ap);
}
and not undefined for the following
void f(int *paramN, ...) {
    va_list ap;
    va_start(ap, paramN);
    va_end(ap);
}
The macros are intended to be implementable by pure C code. But pure C code cannot find out whether or not paramN was declared as an array or as a pointer. In both cases, the type of the parameter is adjusted to be a pointer. The same is true for function type parameters.
I wonder: what is the rationale for this restriction? Do some compilers have problems implementing this when these parameter adjustments are in place internally? (The same undefined behavior is stated for C++, so my question is about C++ as well.)

The restrictions against register parameters and function parameters are probably down to something like this:
* you are not allowed to take the address of a variable with the register storage class.
* function pointers are sometimes quite different from pointers to objects. For example, they might be larger than pointers to objects (you can't reliably convert a function pointer to an object pointer and back again), so adding some fixed number to the address of a function pointer might not get you to the next parameter. If va_start() and/or va_arg() were implemented by adding some fixed amount to the address of paramN, and function pointers were larger than object pointers, the calculation would end up with the wrong address for the object va_arg() returns. This might not seem like a great way to implement these macros, but there might be platforms that have (or even need) this type of implementation.
I can't think what problem would prevent array parameters from being allowed, but P.J. Plauger says this in his book "The Standard C Library":
Some of the restrictions imposed on the macros defined in <stdarg.h> seem unnecessarily severe. For some implementations, they are. Each was introduced, however, to meet the needs of at least one serious C implementation.
And I imagine that there are few people who know more about the ins and outs of the C library than Plauger. I hope someone can answer this specific question with an actual example; I think it would be an interesting bit of trivia.
New info:
The "Rationale for International Standard - Programming Languages - C" says this about va_start():
The parmN argument to va_start was intended to be an aid to implementors writing the definition of a conforming va_start macro entirely in C, even using pre-C89 compilers (for example, by taking the address of the parameter). The restrictions on the declaration of the parmN parameter follow from the intent to allow this kind of implementation, as applying the & operator to a parameter name might not produce the intended result if the parameter's declaration did not meet these restrictions.
Not that that helps me with the restriction on array parameters.
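To make that concrete, here is a minimal, non-portable sketch (my own, not from the Rationale) of the kind of pure-C implementation it alludes to. It assumes an ABI where all arguments are passed contiguously in memory at increasing addresses, and the my_ names and MY_SLOT macro are just placeholders:

// Hypothetical sketch for an ABI that passes every argument contiguously on
// the stack at increasing addresses; real ABIs often do not work this way.
typedef char *my_va_list;

// Round each argument up to a whole stack slot (assumed here to be sizeof(int));
// a real implementation would use the platform's actual slot size and alignment.
#define MY_SLOT(x) ((sizeof(x) + sizeof(int) - 1) & ~(sizeof(int) - 1))

// va_start: point just past the last named parameter. This is where &parmN and
// sizeof(parmN) must behave predictably, which is what the restrictions protect.
#define my_va_start(ap, parmN) ((ap) = (char *)&(parmN) + MY_SLOT(parmN))

// va_arg: step over the next argument and return it.
#define my_va_arg(ap, type) (*(type *)(((ap) += MY_SLOT(type)) - MY_SLOT(type)))

#define my_va_end(ap) ((void)0)

With an implementation like this, a register parmN breaks &(parmN), and a parmN whose declared type differs from the type actually passed (an array, a function, a float, a short) makes sizeof(parmN), and hence the computed offset, wrong.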

It's not undefined. Keep in mind that when a parameter is declared as int paramN[], its type is still adjusted to int* immediately (which is visible in C++, for example, if you apply typeid to paramN).
I must admit that I'm not sure what this bit in the spec is even for, considering that you cannot have parameters of function or array types in the first place (since they will pointer-decay).

I found another relevant quote, from Dinkumware.
The last parameter must not have register storage class, and it must have a type that is not changed by the translator. It cannot have:
* an array type
* a function type
* type float
* any integer type that changes when promoted
* a reference type [C++ only]
So apparently, the problem is precisely that the parameter gets passed in a way different from how it is declared. Note that they also ban float and short; those are in fact excluded by the standard as well, since its wording requires parmN to have a type compatible with the type that results from the default argument promotions.
As a hypothesis, it could be that some compilers have problems doing sizeof correctly on such parameters. E.g. it might be that, for
int f(int x[10])
{
    return sizeof(x);
}
some (buggy) compiler will return 10*sizeof(int), thus breaking the va_start implementation.
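For comparison, here is a quick illustrative check of what a conforming compiler does with the same parameter (the array and printed value are just for demonstration):

#include <cstdio>

int f(int x[10]) { return static_cast<int>(sizeof(x)); }

int main()
{
    int arr[10] = {};
    // Prints sizeof(int*) (typically 4 or 8), because x has decayed to int*
    // inside f; a compiler with the hypothesised bug would print 40 instead.
    std::printf("%d\n", f(arr));
}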

I can only guess that the register restriction is there to ease library/compiler implementation -- it eliminates a special case for them to worry about.
But I have no clue about the array/function restriction. If it were in the C++ standard only, I would hazard a guess that there is some obscure template matching scenario where the difference between a parameter of type T[] and of type T* makes a difference, correct handling of which would complicate va_start etc. But since this clause appears in the C standard too, obviously that explanation is ruled out.
My conclusion: an oversight in the standards. Possible scenario: some pre-standard C compiler implemented parameters of type T[] and T* differently, and the spokesperson for that compiler on the C standards committee had the above restrictions added to the standard; that compiler later became obsolete, but no one felt the restrictions were compelling enough to update the standard.

C++11 says:
[n3290: 13.1/3]: [..] Parameter declarations that differ only in a pointer * versus an array [] are equivalent. That is, the array declaration is adjusted to become a pointer declaration. [..]
and C99 too:
[C99: 6.7.5.3/7]: A declaration of a parameter as "array of type" shall be adjusted to "qualified pointer to type", where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. [..]
And you said:
But pure C code cannot find out whether or not paramN was declared as an array or as a pointer. In both cases, the type of the parameter is adjusted to be a pointer.
Right, so there's no difference between the two pieces of code you showed us. Both have paramN declared as a pointer; there is actually no array type there at all.
So why would there be a difference between the two when it comes to the UB?
The passage you quoted...
The parameter parmN is the identifier of the rightmost parameter in the variable parameter list in the function definition (the one just before the , ...). If the parameter parmN is declared with the register storage class, with a function or array type, or with a type that is not compatible with the type that results after application of the default argument promotions, the behavior is undefined.
...applies to neither, as would be expected.
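One way to observe the adjustment from inside the function, as a quick illustrative check:

#include <type_traits>

void f(int paramN[], ...)
{
    // Despite the [] in the declaration, paramN is an int* inside the function.
    static_assert(std::is_same<decltype(paramN), int *>::value,
                  "array parameter adjusted to pointer");
}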

Related

Is it safe to reinterpret_cast from std::function<void()> * to std::function<std::monostate()> *?

Example:
std::function<std::monostate()> convert(std::function<void()> func){
    return *reinterpret_cast<std::function<std::monostate()> *>(&func);
}
Are std::function<void()> and std::function<std::monostate()> considered "similar" enough for reinterpret_cast to be safe?
Edit: someone asked me to clarify what I am asking. I am not asking if the general case of foo<X> and foo<Y> are similar but whether foo<void> and foo<std::monostate> are.
No, this is unsafe and leads to undefined behavior. In particular, there's no guarantee that the two layouts will be compatible. Of course, you might get away with it with some compiler and runtime combinations, but it might then break if some future release of your compiler decides to implement certain forms of control-flow integrity.
The safe way to do what you want, albeit at a small cost in performance, is just to return a new lambda, as in:
std::function<std::monostate()> convert(std::function<void()> func){
    return [func=std::move(func)]() -> std::monostate { func(); return {}; };
}
Are std::function<void()> and std::function<std::monostate()> considered "similar" enough for reinterpret_cast to be safe?
No. Given a template foo and distinct types X and Y, the instantiations foo<X> and foo<Y> are not similar, regardless of any perceived relationship between X and Y (as long as they are not the same type, which is why they were qualified as "distinct"). Different template instantiations are unrelated unless documented otherwise. There is no such documentation for std::function.
The rules for "similar" make allowances for digging into pointer types, but there is nothing special for templates. (Nor could there be, since a template specialization can look radically different from its primary template.) Different types as template arguments yield dissimilar templated classes. There is no need to dig deeper into those arguments.
I am not asking if the general case of foo<X> and foo<Y> are similar but whether foo<void> and foo<std::monostate> are.
There is nothing special about void and std::monostate that would make them two names for the same type. (In fact, they cannot be the same type, as the former has zero values, while the latter has exactly one value.) So, asking about foo<void> and foo<std::monostate> is the same as asking about the general case, just with a greater possibility of seeing connections that do not exist.
Also, the question is not about foo<void> and foo<std::monostate> but about foo<void()> and foo<std::monostate()>. The types used as template arguments are function types, not object types. Function types are very particular in that two function types are the same only when all of their parameter and return types are exact matches; none of the conversions allowed when invoking a function are considered. (Not that there is a conversion from void to std::monostate.) The function types are different, so again the templates instantiated from those types are not similar.
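A small check (illustrative only) showing that the function types, and therefore the std::function instantiations, are distinct types:

#include <functional>
#include <type_traits>
#include <variant>

// void() and std::monostate() are different function types, so std::function
// instantiated from them gives two unrelated class types.
static_assert(!std::is_same<void(), std::monostate()>::value,
              "distinct function types");
static_assert(!std::is_same<std::function<void()>,
                            std::function<std::monostate()>>::value,
              "distinct class types");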
Perhaps a more focused version of this question would have asked about function pointers instead of std::function objects.
(from a comment:) I was looking at the assembly code of std::monostate() functions and void() functions and they generate the same assembly verbatim.
Generated assembly means nothing as far as the language is concerned. At best, you have evidence that with your compiler, it seems likely that you could get away with invoking a function pointer after casting it from void (*)() to std::monostate (*)(). Not "safe" so much as "works for now". And that assumes that you use the function pointer directly instead of burying it inside a std::function (a complex adapter of types).
C++ is a strongly typed language. Different types are different even if they are treated the same at the level of assembly code. This might be more readily apparent if we switch to more familiar types. On many common systems, char is signed, making it equivalent to signed char at the assembly code level. However, this does not affect the similarity of functions. The following code is illegal, even if changing char to signed char has no effect on the assembly code generated for foo().
char foo() { return 'c'; }
int main()
{
    signed char (*fun)() = foo; // <-- Error: invalid conversion
    // ^^^^^^ -- because the return type is signed char, not char
}
One can downgrade this error to a warning with a reinterpret_cast. After all, it is legal to cast a function pointer to any function pointer type. However, it is not safe to invoke the function through the cast pointer (unless cast back to the original type), hence the warning. Invoking it might work very reliably on your system, but that is due to your system, not the language. When you ask about "safe", you are asking for guidance from the language specs, not merely what will probably work on your system.
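To illustrate the function-pointer version of that last point, here is a sketch of what is and is not allowed (the cast compiles; only the round trip is safe to call through):

#include <variant>

void f() {}

int main()
{
    // The cast itself is well-formed: a function pointer may be reinterpret_cast
    // to any other function pointer type...
    auto p = reinterpret_cast<std::monostate (*)()>(&f);
    // ...but calling p() would be undefined behaviour. Only after casting back
    // to the original type is it safe to call.
    auto q = reinterpret_cast<void (*)()>(p);
    q();
}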

Why isn't it a compile error if you pass a class object to scanf?

Why is the code below accepted by g++?
#include <cstdio>
#include <string>
int main()
{
    std::string str;
    scanf("%s", str);
}
What sense does it make to pass a class object to scanf()? Does it get converted to anything that could be useful to another function with variadic arguments?
scanf comes from C. In C, if you want a variable number of arguments (as scanf needs), the only solution is a variadic function. Variadic functions by design are not type-safe, i.e. you can pass absolutely any type and a varargs function will happily accept it. That is a limitation of the C language. It doesn't mean that any type is valid, though: if a type other than what is actually expected is passed, then we are in the wonderful land of undefined behavior.
That being said, scanf is a standard function and what it can accept is known, so most compilers will do extra checks (not required by the standard) if you enable the right flags. See Neil's answer for that.
In C++ (since C++11) we have parameter packs, which are type-safe ... ish (concepts can't come soon enough).
Enable some warnings. With -Wextra -Wall -pedantic, you will get:
a.cpp:7:10: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'std::__cxx11::string' {aka 'std::__cxx11::basic_string<char>'} [-Wformat=]
scanf("%s", str);
If you want that to be an error rather than a warning, add -Werror.
You have two distinct problems here, not just one:
The passing of a std::string through variadic arguments (which has undefined behaviour), and
The passing of a std::string to a function whose logical semantics expected a char* instead (which has undefined behaviour).
So, no, it doesn't make sense. But it's not a hard error. If you're asking why this has undefined behaviour rather than being ill-formed (and requiring a hard error), I do not know specifically but the answer is usually that it was deemed insufficiently important to require compilers to go to the trouble it would take to diagnose it.
Also, it would be unusual for a logical precondition violation to be deemed ill-formed (just as a matter of convention and consistency; many such violations could not be detected before runtime), so I'd expect point #2 to have undefined behaviour regardless of what hypothetical changes we made to the language to better reject cases of point #1.
Anyway, in the twenty years since standardisation, we've reached a point in technology where the mainstream toolchains do warn on it anyway, and since warnings can be turned into errors, it doesn't really matter.
To answer each of your questions...
The question in the title: "Why isn't it a compile error if you pass a class object to scanf?"
Because the declaration of scanf is int scanf ( const char * format, ... ); which means it will accept any number of arguments after the format string as variadic arguments. The rules for such arguments are:
When a variadic function is called, after lvalue-to-rvalue, array-to-pointer, and function-to-pointer conversions, each argument that is a part of the variable argument list undergoes additional conversions known as default argument promotions:
* std::nullptr_t is converted to void*
* float arguments are converted to double as in floating-point promotion
* bool, char, short, and unscoped enumerations are converted to int or wider integer types as in integer promotion
* Only arithmetic, enumeration, pointer, pointer to member, and class type arguments are allowed (except class types with a non-trivial copy constructor, non-trivial move constructor, or a non-trivial destructor, which are conditionally-supported with implementation-defined semantics)
Since std::string is a class type with non-trivial copy and move constructors, passing it here is at best conditionally-supported with implementation-defined semantics. Interestingly, although this is something a compiler could check, it is not required to reject it as an error.
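For the promotions listed above, here is a small illustration of what the ellipsis does to arguments whose types are allowed:

#include <cstdio>

int main()
{
    char c = 'A';
    float x = 1.5f;
    // Through the ellipsis, c is promoted to int and x to double, so the matching
    // conversion specifiers are %d and %f (which expects a double).
    std::printf("%d %f\n", c, x);
}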
The first question in the body: "Why is the code below accepted by g++?"
That is a great question. The other answer, by @LightnessRacesInOrbit, addresses this point very well.
Your second question in the body: "Does it get converted to anything that could be useful to another function with variadic arguments?"
If you run the code, one of the possible results (at run time) is:
.... line 5: 19689 Segmentation fault (core dumped)
so, no, it is not converted into anything, in general, at least not implicitly.
The clarifying question in the comment thread to the question: "I wanted to know "why does the C++ language not disallow this"".
This question appears to be a subjective one, touching on why the C++ language designers (and perhaps even the C language designers) did not make the language definition robust enough to prohibit passing something other than a string, a memory buffer, or whatever else would be sensible as a non-initial argument to scanf. What we do know is that a compiler can often detect such things (that's what linters do, after all!), but beyond that we can only guess. My guess is that to make scanf truly type-safe in the language definition (as opposed to needing a linter), it would have to be redefined to use template arguments of some sort. However, scanf comes from C, so they did not want to change its signature (and that would indeed be wrong, given that C++ wants to be largely a C superset).
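For completeness, here are two ways to read the input without hitting the problem, sketched below (the buffer size is chosen arbitrarily):

#include <cstdio>
#include <iostream>
#include <string>

int main()
{
    // Either give scanf what it actually expects: a char buffer...
    char buf[256];
    if (std::scanf("%255s", buf) == 1) {
        std::string from_scanf = buf;
    }
    // ...or skip scanf entirely and read into the std::string directly.
    std::string from_stream;
    std::cin >> from_stream;
}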

Non-POD object error

So, I've read a lot online for this error, but for some reason, I'm still getting it even after I have tried the suggested things. If anyone could help me understand this and point out what's wrong, that would be awesome.
char * s = strtok(text, ",");
string name = s;
printf("%s", name);
Given your example code, the error you get says something like "cannot pass a non-POD object to an ellipsis". This is because you are trying to pass a non-POD type to a variadic function, one that takes a variable number of arguments; in this case by calling printf, which is declared something like this:
int printf ( const char * format, ... );
The ellipsis used as the last parameter allows you to pass 0 or more additional arguments to the function as you are doing in your code. The C++ standard does allow you to pass a non-POD type but compilers are not required to support it. This is covered in part by 5.2.2/7 of the standard.
Passing a potentially-evaluated argument of class type having a non-trivial copy constructor, a non-trivial move constructor, or a non-trivial destructor, with no corresponding parameter, is conditionally-supported with implementation-defined semantics.
This means it is up to each compiler maker to decide if they want to support it and how it will behave. Apparently your compiler does not support this and even if it did I wouldn't recommend using it.
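For what it's worth, the usual fix for the snippet in the question is to pass the underlying C string rather than the std::string object; a minimal sketch (the input text is made up):

#include <cstdio>
#include <cstring>
#include <string>

int main()
{
    char text[] = "Alice,Bob";            // hypothetical input
    char *s = std::strtok(text, ",");
    std::string name = s;
    // c_str() yields a const char*, which is fine to pass through the ellipsis.
    std::printf("%s\n", name.c_str());
}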

Widening of integral types?

Imagine you have this function:
void foo(long l) { /* do something with l */}
Now you call it like so at the call site:
foo(65); // here 65 is of type int
Why (technically), when you specify in the declaration of your function that you are expecting a long and you pass just a number without the L suffix, is it treated as an int?
Now, I know it is because the C++ Standard says so; however, what is the technical reason that this 65 isn't just promoted to being of type long, which would save us the silly error of forgetting the L suffix to make it a long explicitly?
I have found this in the C++ Standard:
4.7 Integral conversions [conv.integral]
5 The conversions allowed as integral promotions are excluded from the set of integral conversions.
I can understand a narrowing conversion not being done implicitly, but here the destination type is obviously wider than the source type.
EDIT
This question is based on a question I saw earlier, which had funny behavior when you didn't specify the L suffix. Example - but perhaps it's a C thing more than a C++ one?
In C++, objects and values have a type that is independent of how you use them. When you use them, if a different type is needed, a conversion is applied as appropriate.
The problem in the linked question is that varargs is not type-safe. It assumes that you pass in the correct types and that you decode them for what they are. While compiling the caller, the compiler does not know how the callee is going to decode each of the arguments, so it cannot possibly convert them for you. Effectively, varargs is about as type-safe as converting to a void* and converting back to a different type: if you get it right you get what you pushed in; if you get it wrong you get trash.
Also note that in this particular case, with inlining, the compiler has enough information, but this is just one small case of a general family of errors. Consider the printf family of functions: depending on the contents of the first argument, each of the remaining arguments is processed as a different type. Trying to fix this case at the language level would lead to inconsistencies, where in some cases the compiler does the right thing and in others the wrong one, and it would not be clear to the user when to expect which. It could even do the right thing today and the wrong one tomorrow, if during refactoring the function definition is moved and is no longer available for inlining, or if the logic of the function changes and the argument is processed as one type or another based on some earlier parameter.
The function in this instance does receive a long, not an int. The compiler automatically converts any argument to the required parameter type if it's possible without losing any information (as here). That's one of the main reasons function prototypes are important.
It's essentially the same as with an expression like (1L + 1) - because the integer 1 is not the right type, it's implicitly converted to a long to perform the calculation, and the result is a long.
If you pass 65L in this function call, no type conversion is necessary, but there's no practical difference - 65L is used either way.
Although not C++, this is the relevant part of the C99 standard, which also explains the var args note:
If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
Why (technically), when you specify in the declaration of your function that you are expecting a long and you pass just a number without the L suffix, is it treated as an int?
Because the type of a literal is specified only by the form of the literal, not the context in which it is used. For an integer, that is int unless the value is too large for that type, or a suffix is used to specify another type.
Now, I know it is because the C++ Standard says so; however, what is the technical reason that this 65 isn't just promoted to being of type long, which would save us the silly error of forgetting the L suffix to make it a long explicitly?
The value should be promoted to long whether or not you specify that type explicitly, since the function is declared to take an argument of type long. If that's not happening, perhaps you could give an example of code that fails, and describe how it fails?
UPDATE: the example you give passes the literal to a function taking untyped ellipsis (...) arguments, not a typed long argument. In that case, the function caller has no idea what type is expected, and only the default argument promotions are applied. Specifically, a value of type int remains an int when passed through ellipsis arguments.
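A short sketch contrasting the two situations described above, a typed long parameter versus an ellipsis (function name made up):

#include <cstdio>

void takes_long(long l) { std::printf("%ld\n", l); }

int main()
{
    takes_long(65);            // fine: the int 65 is implicitly converted to long
    std::printf("%ld\n", 65L); // fine: the L suffix makes the literal a long
    // std::printf("%ld\n", 65);  // only the default promotions apply here, so 65
    //                            // stays an int: undefined behaviour on platforms
    //                            // where long is wider than int
}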
The C standard states:
"The type of an integer constant is the first of the corresponding list in which its value can be represented."
In C89, this list is:
int, long int, unsigned long int
C99 extends that list to include:
long long int, unsigned long long int
As such, when your code is compiled, the literal 65 fits in an int, so its type is accordingly int. The int is then converted to long when the function is called.
If, for instance, sizeof(int) == 2, and your literal is something like 64000, the type of the value will be a long (assuming sizeof(long) > sizeof(int)).
The suffixes are used to override the default behavior and force the specified literal to be of a certain type. This can be particularly useful when the implicit conversion would be expensive (e.g. as part of an equation in a tight loop).
We have to have a standard meaning for types because, for lower-level applications, the type REALLY matters, especially for integral types. Low-level operators (such as bit-shift, add, etc.) rely on the type of the operands to determine where overflow occurs ((65 << 2) is 260 (0x104) when done as an int, but truncated to a single byte it is 4 (0x04)!). Sometimes you want this behavior, sometimes you don't. As a programmer, you just need to be able to always know what the compiler is going to do. Thus the design decision was made to make the human explicitly declare the integral types of their constants, with "undecorated" meaning the most commonly used type, int.
The compiler does automatically "cast" your constant expressions at compile time, such that the effective value passed to the function is long, but up until the cast it is considered an int for this reason.
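A small example of where the suffix genuinely matters (illustrative; it assumes a platform with 32-bit int):

#include <cstdint>

int main()
{
    // Without a suffix, 1 is an int; shifting a 32-bit int by 40 is undefined behaviour.
    // std::uint64_t bad = 1 << 40;
    std::uint64_t ok = 1ULL << 40;   // the ULL suffix makes the literal unsigned long long
    (void)ok;
}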

Is the *only* purpose of a *function signature* (as opp. to type) to define duplicates in a potential overload set - or are there other purposes?

Related to Why does casting a function to a function type that is identical except for return type fail?, I would like to understand, in a fuller way, the distinction between a function's type and a function's signature.
For example, the type of a function must typically be considered when dealing with function pointers, and the type of the function includes the return type of that function.
However, as noted in Mike Seymour's answer to the above-linked question, the signature of a function is different from the type of a function. The signature is certainly used to disambiguate from among potential overloaded functions (noting that the return type of functions does not play a role in identifying unique functions). But, I would now like to understand the relevance and importance of function signatures vs. function types. It occurs to me that the only purpose of function signatures in C++ is to identify overload candidates and/or unique functions in an overload set, during overload resolution.
Am I correct? Is overload resolution the only purpose of function signatures in C++? Or are there any other uses/applications of function signatures, besides (or only indirectly related to) overload resolution?
ADDENDUM For clarity, please note that I am specifically seeking to understand the distinction between the purpose of a function signature and a function type. I.e., I know that a function type is required both for the use of function pointers, and for a compiler/linker's implementation of a calling convention. However, the calling convention is relevant only after overload resolution is complete. I am here asking, specifically, if the only purpose of the function signature (as opposed to type) is for overload resolution.
Am I correct?
As far as I'm concerned, there are other purposes too. Consider that C also has function signatures but doesn't have overloading.
Apart from overloading, the fundamental purpose of function signatures is conforming to the calling convention of a particular platform.
When a function accepts arguments and returns values, the compiler needs to know the type and the size of the arguments in order to pass them correctly to a function. In general, function arguments are pushed onto the stack (this is not a universal rule though, especially on 64-bit architecture systems). Consider the following situation. If you call a function like
foo(42);
how does the compiler know the size of the integer value it has to pass to the function? The number 42 can be represented using various bit widths, for example as a 1-, 2-, 4- (or even 8-)byte integer:
00101010
0000000000101010
00000000000000000000000000101010
Now if the function doesn't have a signature which says whether the argument is, for instance, a char (which is 1 byte), a short (which may be 2 bytes), or an int (which may be 4 bytes), then the compiler has no way of determining the correct size. That means that if it pushes some arbitrary number of bytes onto the stack but the function expects a different size, stack corruption occurs.
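As a sketch of that point: the parameter type recorded in each signature tells the compiler which conversion to apply to the argument and, at the ABI level, how much data to pass (the function names here are made up):

#include <cstdio>

void take_char(char c)   { std::printf("char:  %d\n", c); }
void take_short(short s) { std::printf("short: %d\n", s); }
void take_long(long l)   { std::printf("long:  %ld\n", l); }

int main()
{
    // The literal 42 is an int; each call converts it to the declared parameter type.
    take_char(42);
    take_short(42);
    take_long(42);
}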
Another good example is returning structures (struct). Usually, primitive return values (such as integers and floating-point numbers) are returned in a register; this is generally the EAX register on x86. But what if one wants to write a function returning a struct? if the overall size of the struct is so large that it doesn't fit into a register, the compiler must generate code that pushes the return value onto the stack as opposed to assigning it to a register. So if a function is defined as
int foo()
{
    return 1337;
}
or as
struct bar {
    int a;
    char b[16];
    float x;
};
struct bar foo()
{
    struct bar ret;
    ret.a = 0;
    memcpy(&ret.b, "abcdefghijklmno", sizeof(ret.b));
    ret.x = 3.1415927;
    return ret;
}
different assembly (and machine code) will be generated - the first function that returns an integer will use the EAX register for storing the return value, but the second call will have to use the stack.
The standard mentions that signatures are used for name mangling and linking.
That being said, name mangling is not standardized. The return type is redundant in a function symbol (since in a valid program there is only one possible return type for a function with a given name and argument types, it is not needed to differentiate two symbols), but even then some ABIs do include the return type of a function in the mangled name, probably as a way of double-checking that the rule above is not violated.
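As an illustration (this is an ABI detail, not something the standard mandates): under the Itanium C++ ABI used by GCC and Clang, the parameter types appear in the mangled name but the return type of an ordinary function does not.

// Typical Itanium C++ ABI manglings (GCC/Clang):
//   foo(int)    -> _Z3fooi
//   foo(double) -> _Z3food
// The return type is not encoded for ordinary functions, which is consistent
// with overloads not being allowed to differ in return type alone.
void foo(int);
void foo(double);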