Imagine you have this function:
void foo(long l) { /* do something with l */}
Now you call it like so at the call site:
foo(65); // here 65 is of type int
Why, (technically) when you specify in the declaration of your function that you are expecting a long and you pass just a number without the L suffix, is it being treated as an int?
Now, I know it is because the C++ Standard says so, however, what is the technical reason that this 65 isn't just promoted to being of type long and so save us the silly error of forgetting L suffix to make it a long explicitly?
I have found this in the C++ Standard:
4.7 Integral conversions [conv.integral]
5 The conversions allowed as integral promotions are excluded from the set of integral conversions.
That a narrowing conversion isn't being done implicitly, I can understand, but here the destination type is obviously wider than the source type.
EDIT
This question is based on a question I saw earlier, which had funny behavior when you didn't specify the L suffix. Example, but perhaps it's a C thing, more than C++?!!
In C++, objects and values have a type that is independent of how you use them. When you use them where a different type is needed, they are converted appropriately.
The problem in the linked question is that varargs is not type-safe. It assumes that you pass in the correct types and that you decode them for what they are. While processing the caller, the compiler does not know how the callee is going to decode each one of the arguments so it cannot possibly convert them for you. Effectively, varargs is as typesafe as converting to a void* and converting back to a different type, if you get it right you get what you pushed in, if you get it wrong you get trash.
Also note that in this particular case, with inlining, the compiler has enough information, but this is just a small case of a general family of errors. Consider the printf family of functions: depending on the contents of the first argument, each of the remaining arguments is processed as a different type. Trying to fix this case at the language level would lead to inconsistencies: in some cases the compiler would do the right thing and in others the wrong one, and it would not be clear to the user when to expect which. It could even do the right thing today and the wrong one tomorrow, if during refactoring the function definition is moved and is no longer available for inlining, or if the logic of the function changes and the argument is processed as one type or another based on some previous parameter.
The function in this instance does receive a long, not an int. The compiler implicitly converts each argument to the declared parameter type whenever an implicit conversion exists (here int to long, which loses no information). That's one of the main reasons function prototypes are important.
It's essentially the same as with an expression like (1L + 1) - because the integer 1 is not the right type, it's implicitly converted to a long to perform the calculation, and the result is a long.
If you pass 65L in this function call, no type conversion is necessary, but there's no practical difference - 65L is used either way.
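A minimal sketch of the point above, assuming a C++11 compiler. The `foo` here is a hypothetical variant of the question's function that returns its parameter so the converted value can be inspected, and `check_foo` is an illustrative helper, not part of the original code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical variant of the question's foo: it returns its parameter
// so the converted value can be inspected.
long foo(long l) { return l; }

// The literal's own type is fixed by its spelling, not by the call site.
static_assert(std::is_same<decltype(65), int>::value, "65 is an int");
static_assert(std::is_same<decltype(65L), long>::value, "65L is a long");

// Whatever is passed, the function works with a long.
static_assert(std::is_same<decltype(foo(65)), long>::value, "foo yields long");

// Both calls hand foo the same long value; only the first needs a conversion.
void check_foo() {
    assert(foo(65) == foo(65L));
}
```

The suffix only changes the type of the argument expression; the parameter inside `foo` is a long in both calls.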
Although not C++, this is the relevant part of the C99 standard, which also explains the var args note:
If the expression that denotes the called function has a type that
does include a prototype, the arguments are implicitly converted, as
if by assignment, to the types of the corresponding parameters, taking
the type of each parameter to be the unqualified version of its
declared type. The ellipsis notation in a function prototype
declarator causes argument type conversion to stop after the last
declared parameter. The default argument promotions are performed on
trailing arguments.
Why, (technically) when you specify in the declaration of your function that you are expecting a long and you pass just a number without the L suffix, is it being treated as an int?
Because the type of a literal is specified only by the form of the literal, not the context in which it is used. For an integer, that is int unless the value is too large for that type, or a suffix is used to specify another type.
Now, I know it is because the C++ Standard says so, however, what is the technical reason that this 65 isn't just promoted to being of type long and so save us the silly error of forgetting L suffix to make it a long explicitly?
The value should be promoted to long whether or not you specify that type explicitly, since the function is declared to take an argument of type long. If that's not happening, perhaps you could give an example of code that fails, and describe how it fails?
UPDATE: the example you give passes the literal to a function taking untyped ellipsis (...) arguments, not a typed long argument. In that case, the function caller has no idea what type is expected, and only the default argument promotions are applied. Specifically, a value of type int remains an int when passed through ellipsis arguments.
The C standard states:
"The type of an integer constant is the first of the corresponding list in which its value can be represented."
In C89, this list is:
int, long int, unsigned long int
C99 changes the list for unsuffixed decimal constants to:
int, long int, long long int (the unsigned types are candidates only for octal and hexadecimal constants)
As such, when your code is compiled, the literal 65 fits in an int, and so its type is accordingly int. The int is then converted to long when the function is called.
If, for instance, sizeof(int) == 2, and your literal is something like 64000, the type of the value will be a long (assuming sizeof(long) > sizeof(int)).
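The rule can be checked at compile time. The following is a sketch assuming a C++11 compiler, where the list for unsuffixed decimal literals is int, long, long long; `check_literal_types` is a hypothetical helper, and its runtime check assumes nothing beyond what its comment states.

```cpp
#include <cassert>
#include <type_traits>

// Small values take the first type in the list: int.
static_assert(std::is_same<decltype(65), int>::value, "fits in int");

// A suffix overrides the default choice.
static_assert(std::is_same<decltype(65L), long>::value, "L forces long");
static_assert(std::is_same<decltype(65U), unsigned int>::value,
              "U forces unsigned int");

void check_literal_types() {
    // Assumption: if int occupies 4 bytes (32 bits), 3000000000 does not
    // fit in it, so the literal takes the next type that can hold it
    // (long or long long). On platforms with a wider int the check is
    // skipped by the first operand.
    assert(sizeof(int) != 4 || sizeof(decltype(3000000000)) > sizeof(int));
}
```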
The suffixes are used to override the default behavior and force the specified literal value to be of a certain type. This can be particularly useful when the integer promotion would be expensive (e.g. as part of an equation in a tight loop).
We have to have a standard meaning for types because for lower level applications, the type REALLY matters, especially for integral types. Low level operators (such as bitshift, add, etc.) rely on the type of the input to determine overflow locations. (65 << 2 computed as an int is 260 (0x104), but truncated to an 8-bit char it is 4 (0x04).) Sometimes you want this behavior, sometimes you don't. As a programmer, you just need to be able to always know what the compiler is going to do. Thus the design decision was made to make the human explicitly declare the integral types of their constants, with "undecorated" as the most commonly used type, integer.
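A small sketch of the shift example, assuming CHAR_BIT == 8. The function names are hypothetical. Note that in C and C++ a char operand is itself promoted to int before the shift, so the bit is only lost when the result is stored back into an 8-bit type.

```cpp
#include <cassert>
#include <type_traits>

// A char operand is promoted to int before the shift, so the result is an int.
static_assert(
    std::is_same<decltype(static_cast<unsigned char>(65) << 2), int>::value,
    "the shift is carried out in int");

// Shifting in int keeps every bit: 65 << 2 == 260 (0x104).
int shift_as_int() { return 65 << 2; }

// Storing the result in an 8-bit type drops the high bit: 260 % 256 == 4.
// Assumption: unsigned char is 8 bits wide (CHAR_BIT == 8).
unsigned char shift_stored_in_char() {
    unsigned char c = 65;
    return static_cast<unsigned char>(c << 2);
}

void check_shift() {
    assert(shift_as_int() == 260);
    assert(shift_stored_in_char() == 4);
}
```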
The compiler does automatically "cast" your constant expressions at compile time, such that the effective value passed to the function is long, but up until the cast it is considered an int for this reason.
Related
Why are parameters promoted when it comes to a variadic function (for instance, floats are promoted to double, etc.), and in which order are they promoted?
Variadic arguments - cppreference.com
Default conversions
When a variadic function is called, after lvalue-to-rvalue, array-to-pointer, and function-to-pointer conversions, each argument that is a part of the variable argument list undergoes additional conversions known as default argument promotions:
std::nullptr_t is converted to void*
float arguments are converted to double as in floating-point promotion
bool, char, short, and unscoped enumerations are converted to int or wider integer types as in integer promotion
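The promotions listed above determine which type the callee must name in va_arg. Below is a minimal sketch with two hypothetical helpers, `sum_doubles` and `sum_ints`; the `count` parameter is an assumed convention for telling the callee how many arguments follow.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper: sums `count` variadic arguments, reading each as double.
// float arguments are promoted to double in the variable argument list,
// so callers may pass either float or double values.
double sum_doubles(int count, ...) {
    va_list ap;
    va_start(ap, count);
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(ap, double);  // never va_arg(ap, float)
    }
    va_end(ap);
    return total;
}

// Hypothetical helper: sums `count` variadic arguments, reading each as int.
// char and short arguments arrive as int after integer promotion.
int sum_ints(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}

void check_promotions() {
    assert(sum_doubles(2, 1.5f, 2.5) == 4.0);  // float and double mixed
    char c = 1;
    short s = 2;
    assert(sum_ints(3, c, s, 3) == 6);         // char, short and int mixed
}
```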
Why are parameters promoted
Because that is how the language has been specified.
You may be thinking, why has the language been specified that way. I don't know if there is published rationale for this choice, but I suspect that the answer is as simple as: Because that is how the C language had been specified.
You may be thinking, why was the C language specified that way. There is a standard document N1256 discussing design rationale of some choices for the C99 standard. It does not seem to cover this choice. Besides, the C language existed long before its standardisation, and C99 wasn't even the first standard version. This behaviour may have existed before the involvement of the committee.
For what it's worth, the same promotion rules also apply when calling functions that haven't been declared (allowed until C99) or when calling a fixed-argument function through a declaration that doesn't specify the parameters:
// this is C language
void fun();
int main(void) {
    float f = 42;
    fun(f);        // argument promotes to double
    undeclared(f); // ill-formed since C99
                   // argument promotes to double prior to C99
}
The reasons for this may be similar to the reasons for promotion in case of variable parameter lists.
Promotion of arguments for variadic functions makes it much easier to deal with them. Since the function code doesn't know the actual type of the arguments from the function signature, the caller has to communicate the type through some other means, and promotion reduces the number of options without sacrificing flexibility.
For example, consider the classical example of a variadic function, printf. When you give it a %f argument, it already knows that the argument is double precision, since it would have been promoted. Absent promotion, two different modifiers would have to exist, one for single precision and another one for double precision.
Another example would be integral promotions. Currently any integer type narrower than int works with the %d modifier, and while modifiers for the short versions do exist, one is not required to use them, which simplifies the code.
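The two printf cases can be seen in a short sketch. `format_promoted` is a hypothetical wrapper around the standard std::snprintf; the format string and buffer size are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a float and a short through the variadic snprintf.
// The float is promoted to double (matching %f) and the short to int
// (matching %d), so no extra length modifiers are needed.
std::string format_promoted(float f, short s) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.1f %d", f, s);
    return std::string(buf);
}

void check_format() {
    // Assumes the default "C" locale, where the decimal separator is '.'.
    assert(format_promoted(1.5f, 7) == "1.5 7");
}
```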
In addition, it provides for fewer surprises when using some other variadic functions. For example, the POSIX open function is documented as if it were an overloaded function with either 2 or 3 arguments, the last argument being specified in the man page as type mode_t. In fact, there are no overloads in C, so there are no two versions of open: there is only one, and it is variadic.
Absent promotions, one would have to make sure that when the 3-argument version is used, the last argument has exactly the type mode_t, which would be quite inconvenient and counterintuitive, and failure to do so would likely lead to quite unexpected behavior. Automatic promotions save us from this.
Why is the code below accepted by g++?
#include <cstdio>
#include <string>
int main()
{
std::string str;
scanf("%s", str);
}
What sense does it make to pass a class object to scanf()? Does it get converted to anything that could be useful to another function with variadic arguments?
scanf comes from C. In C, if you wanted a variable number of arguments (as scanf needs), the only solution was a variadic function. Variadic functions by design are not type safe, i.e. you can pass absolutely any type and a varargs function will happily accept it. It is a limitation of the C language. That doesn't mean that any type is valid. If a type other than what is actually expected is passed, then we are in the wonderful land of Undefined Behavior.
That being said, scanf is a standard function and what it can accept is known, so most compilers will do extra checks (not required by the standard) if you enable the right flags. See Neil's answer for that.
In C++ (since C++11) we have parameter packs, which are type safe ...ish (oh, concepts cannot come soon enough).
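For comparison, here is a minimal C++11 sketch of a parameter pack in place of the ellipsis. `concat` and `append_all` are hypothetical names; the point is only that each argument keeps its real type, so a std::string is handled correctly instead of being passed blindly.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Base case: nothing left to append.
inline void append_all(std::ostringstream&) {}

// Recursive case: stream the first argument, then the rest.
// Each argument keeps its static type, so operator<< is chosen per type.
template <typename T, typename... Rest>
void append_all(std::ostringstream& out, const T& first, const Rest&... rest) {
    out << first;
    append_all(out, rest...);
}

// Hypothetical type-safe counterpart of a printf-style helper.
template <typename... Ts>
std::string concat(const Ts&... parts) {
    std::ostringstream out;
    append_all(out, parts...);
    return out.str();
}

void check_concat() {
    assert(concat("x=", 42, ' ', std::string("ok")) == "x=42 ok");
}
```

Passing a type that has no operator<< fails at compile time rather than at run time.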
Enable some warnings. With -Wextra -Wall -pedantic, you will get:
a.cpp:7:10: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'std::__cxx11::string' {aka 'std::__cxx11::basic_string<char>'} [-Wformat=]
scanf("%s", str);
If you want that to be an error rather than a warning, add -Werror.
You have two distinct problems here, not just one:
The passing of a std::string through variadic arguments (which has undefined behaviour), and
The passing of a std::string to a function whose logical semantics expected a char* instead (which has undefined behaviour).
So, no, it doesn't make sense. But it's not a hard error. If you're asking why this has undefined behaviour rather than being ill-formed (and requiring a hard error), I do not know specifically but the answer is usually that it was deemed insufficiently important to require compilers to go to the trouble it would take to diagnose it.
Also, it would be unusual for a logical precondition violation to be deemed ill-formed (just as a matter of convention and consistency; many such violations could not be detected before runtime), so I'd expect point #2 to have undefined behaviour regardless of what hypothetical changes we made to the language to better reject cases of point #1.
Anyway, in the twenty years since standardisation, we've reached a point in technology where the mainstream toolchains do warn on it anyway, and since warnings can be turned into errors, it doesn't really matter.
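If the goal of the original snippet was simply to read a word into a std::string, the stream extraction operator is a type-checked alternative. This is a sketch, not the asker's code: `read_word` is a hypothetical helper, and a std::istringstream stands in for std::cin so the example is self-contained.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads one whitespace-delimited word from any input stream.
// operator>> is overloaded for std::string, so the argument type is
// checked by the compiler and the string grows as needed.
std::string read_word(std::istream& in) {
    std::string word;
    in >> word;
    return word;
}

void check_read_word() {
    std::istringstream input("hello world");  // stands in for std::cin
    assert(read_word(input) == "hello");
    assert(read_word(input) == "world");
}
```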
To answer each of your questions...
The question in the title: "Why isn't it a compile error if you pass a class object to scanf?"
Because the declaration of scanf is int scanf ( const char * format, ... ); which means it will accept any number of arguments after the format string as variadic arguments. The rules for such arguments are:
When a variadic function is called, after lvalue-to-rvalue, array-to-pointer, and function-to-pointer conversions, each argument that is a part of the variable argument list undergoes additional conversions known as default argument promotions:
std::nullptr_t is converted to void*
float arguments are converted to double as in floating-point promotion
bool, char, short, and unscoped enumerations are converted to int or wider integer types as in integer promotion
Only arithmetic, enumeration, pointer, pointer to member, and class type arguments are allowed (except class types with non-trivial copy constructor, non-trivial move constructor, or a non-trivial destructor, which are conditionally-supported with implementation-defined semantics)
Since std::string is a class type with non-trivial copy and move constructors, passing it through the variable argument list is only conditionally supported. Interestingly, although a compiler could check this, g++ does not reject it as an error.
The first question in the body: "Why is the code below accepted by g++?"
That is a great question. The other answer by #LightnessRacesInOrbit addresses this point very well.
Your second question in the body: "Does it get converted to anything that could be useful to another function with variadic arguments?"
If you run the code, one of the possible results (at run time) is:
.... line 5: 19689 Segmentation fault (core dumped)
so, no, it is not converted into anything, in general, at least not implicitly.
The clarifying question in the comment thread to the question: "I wanted to know "why does the C++ language not disallow this"".
This question appears to be a subjective one, touching on why the C++ language designer(s) and perhaps even the C language designers, did not make their language design robust enough for the language definition to prohibit something other than a string, or memory buffer, or any number of other things, to be sensible as a non-initial argument to scanf. What we do know is that a compiler can often determine such things (that's what linters do, after all!) but we can only guess, really. My guess is that in order to make scanf super typesafe (in the language definition, as opposed to needing a linter) they would need to redefine scanf to use template arguments of some sort. However scanf comes from C, so they did not want to change its signature (that would indeed be wrong, given that C++ wants to be a C superset...).
Where I can find an excellently understandable article on C++ type conversion covering all of its types (promotion, implicit/explicit, etc.)?
I've been learning C++ for some time and, for example, the virtual function mechanism seems clearer to me than this topic. My opinion is that this is due to textbook authors who overcomplicate it (see Stroustrup's book and so on).
(Props to Crazy Eddie for a first answer, but I feel it can be made clearer)
Type Conversion
Why does it happen?
Type conversion can happen for two main reasons. One is because you wrote an explicit expression, such as static_cast<int>(3.5). Another reason is that you used an expression at a place where the compiler needed another type, so it will insert the conversion for you. E.g. 2.5 + 1 will result in an implicit conversion from 1 (an integer) to 1.0 (a double).
The explicit forms
There are only a limited number of explicit forms. First off, C++ has 4 named versions: static_cast, dynamic_cast, reinterpret_cast and const_cast. C++ also supports the C-style cast (Type) Expression. Finally, there is a "constructor-style" cast Type(Expression).
The 4 named forms are documented in any good introductory text. The C-style cast expands to a static_cast, const_cast or reinterpret_cast, and the "constructor-style" cast is a shorthand for a static_cast<Type>. However, due to parsing problems, the "constructor-style" cast requires a single identifier for the name of the type; unsigned int(-5) or const float(5) are not legal.
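The three spellings for a simple arithmetic conversion can be compared side by side. This is only an illustrative sketch; `uint_alias` is a hypothetical alias name introduced to show the usual workaround for multi-word type names.

```cpp
#include <cassert>

void check_explicit_casts() {
    double d = 3.5;
    int a = static_cast<int>(d);  // named cast
    int b = (int)d;               // C-style cast
    int c = int(d);               // "constructor-style" cast
    assert(a == 3 && b == 3 && c == 3);

    // unsigned int(5) would not parse; an alias gives the type a single name.
    typedef unsigned int uint_alias;  // hypothetical alias name
    unsigned int u = uint_alias(5);
    assert(u == 5u);
}
```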
The implicit forms
It's much harder to enumerate all the contexts in which an implicit conversion can happen. Since C++ is a typesafe OO language, there are many situations in which you have an object A in a context where you'd need a type B. Examples are the built-in operators, calling a function, or catching an exception by value.
The conversion sequence
In all cases, implicit and explicit, the compiler will try to find a conversion sequence. A conversion sequence is a series of steps that gets you from type A to type B. The exact conversion sequence chosen by the compiler depends on the type of cast. A dynamic_cast is used to do a checked Base-to-Derived conversion, so the steps are to check whether Derived inherits from Base, and via which intermediate class(es). const_cast can remove both const and volatile. In the case of a static_cast, the possible steps are the most complex. It will do conversion between the built-in arithmetic types; it will convert Base pointers to Derived pointers and vice versa, it will consider class constructors (of the destination type) and class cast operators (of the source type), and it will add const and volatile. Obviously, quite a few of these steps are orthogonal: an arithmetic type is never a pointer or class type. Also, the compiler will use each step at most once.
As we noted earlier, some type conversions are explicit and others are implicit. This matters to static_cast because it uses user-defined functions in the conversion sequence. Some of the conversion steps considered by the compiler can be marked as explicit (in C++03, only constructors can). The compiler will skip (no error) any explicit conversion function for implicit conversion sequences. Of course, if there are no alternatives left, the compiler will still give an error.
The arithmetic conversions
Integer types such as char and short can be converted to "greater" types such as int and long, and smaller floating-point types can similarly be converted into greater types. Signed and unsigned integer types can be converted into each other. Integer and floating-point types can be changed into each other.
Base and Derived conversions
Since C++ is an OO language, there are a number of casts where the relation between Base and Derived matters. Here it is very important to understand the difference between actual objects, pointers, and references (especially if you're coming from .NET or Java). First, the actual objects. They have precisely one type, and you can convert them to any base type (ignoring private base classes for the moment). The conversion creates a new object of base type. We call this "slicing"; the derived parts are sliced off.
Another type of conversion exists when you have pointers to objects. You can always convert a Derived* to a Base*, because inside every Derived object there is a Base subobject. C++ will automatically apply the correct offset of Base within Derived to your pointer. This conversion will give you a new pointer, but not a new object. The new pointer will point to the existing sub-object. Therefore, the cast will never slice off the Derived part of your object.
The conversion the other way is trickier. In general, not every Base* will point to a Base sub-object inside a Derived object. Base objects may also exist in other places. Therefore, it is possible that the conversion should fail. C++ gives you two options here. Either you tell the compiler that you're certain that you're pointing to a subobject inside a Derived via a static_cast<Derived*>(baseptr), or you ask the compiler to check with dynamic_cast<Derived*>(baseptr). In the latter case, the result will be nullptr if baseptr doesn't actually point to a Derived object.
For references to Base and Derived, the same applies except for dynamic_cast<Derived&>(baseref) : it will throw std::bad_cast instead of returning a null pointer. (There are no such things as null references).
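The cases described in this section can be exercised in one short sketch. The class bodies below (`id()`, the virtual destructor) are assumptions added so the example is polymorphic and observable; only the names Base and Derived come from the text.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() {}
    virtual int id() const { return 1; }
};
struct Derived : Base {
    int id() const override { return 2; }
};

void check_base_derived() {
    Derived d;

    Base sliced = d;                 // new Base object: the Derived part is sliced off
    assert(sliced.id() == 1);

    Base* bp = &d;                   // implicit Derived* -> Base*, no new object
    assert(bp->id() == 2);           // still refers to the Derived object
    assert(dynamic_cast<Derived*>(bp) == &d);      // checked downcast succeeds

    Base plain;
    Base* plain_ptr = &plain;
    assert(dynamic_cast<Derived*>(plain_ptr) == nullptr);  // not a Derived

    Base& plain_ref = plain;
    bool threw = false;
    try {
        Derived& dr = dynamic_cast<Derived&>(plain_ref);   // reference form throws
        (void)dr;
    } catch (const std::bad_cast&) {
        threw = true;
    }
    assert(threw);
}
```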
User-defined conversions
There are two ways to define user conversions: via the source type and via the destination type. The first way involves defining a member operator DestinationType() const in the source type. Note that it doesn't have an explicit return type (it's always DestinationType), and that it's const. Conversions should never change the source object. A class may define several types to which it can be converted, simply by adding multiple operators.
The second type of conversion, via the destination type, relies on user-defined constructors. A constructor T::T which can be called with one argument of type U can be used to convert a U object into a T object. It doesn't matter if that constructor has additional default arguments, nor does it matter if the U argument is passed by value or by reference. However, as noted before, if T::T(U) is explicit, then it will not be considered in implicit conversion sequences.
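Both directions, and the effect of explicit, can be sketched as follows. `Meters` and `Seconds` are hypothetical types invented for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type with conversions in both directions.
struct Meters {
    double value;
    Meters(double v) : value(v) {}             // via the destination type: double -> Meters
    operator double() const { return value; }  // via the source type: Meters -> double
};

// Hypothetical type whose constructor is explicit.
struct Seconds {
    double value;
    explicit Seconds(double v) : value(v) {}   // skipped in implicit conversion sequences
};

static_assert(std::is_convertible<double, Meters>::value,
              "implicit conversion via the constructor");
static_assert(std::is_convertible<Meters, double>::value,
              "implicit conversion via the operator");
static_assert(!std::is_convertible<double, Seconds>::value,
              "explicit constructor is not used implicitly");
static_assert(std::is_constructible<Seconds, double>::value,
              "explicit construction still works");

void check_user_conversions() {
    Meters m = 2.5;   // implicit use of Meters(double)
    double d = m;     // implicit use of operator double()
    assert(d == 2.5);

    Seconds s(1.5);   // direct initialization names the type explicitly
    assert(s.value == 1.5);
}
```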
It is possible that multiple conversion sequences exist between two types, as a result of user-defined conversions. Since these are essentially function calls (to user-defined operators or constructors), the conversion sequence is chosen via overload resolution of the different function calls.
Don't know of one so let's see if it can't be made here...hopefully I get it right.
First off, implicit/explicit:
Explicit "conversion" happens everywhere that you do a cast. More specifically, a static_cast. Other casts either fail to do any conversion or cover a different range of topics/conversions. Implicit conversion happens anywhere that conversion is happening without your specific say-so (no casting). Consider it thusly: Using a cast explicitly states your intent.
Promotion:
Promotion happens when you have two or more types interacting in an expression that are of different size. It is a special case of type "coercion", which I'll go over in a second. Promotion just takes the small type and expands it to the larger type. There is no standard set of sizes for numeric types, but each type in the sequences char, short, int, long, long long and float, double, long double is at least as large as the one before it.
Coercion:
Coercion happens any time types in an expression do not match. The compiler will "coerce" a lesser type into a greater type. In some cases, such as converting a large integer to a double or a negative signed value to an unsigned type, the original value is not preserved. Coercion includes promotion, so similar types of different size are resolved in that manner. If promotion is not enough, integral types are converted to floating types, and when a signed and an unsigned type of the same rank meet, the signed operand is converted to the unsigned type. This happens until all components of an expression are of the same type.
These compiler actions only take place regarding raw, numeric types. Coercion and promotion do not happen to user defined classes. Generally speaking, explicit casting makes no real difference unless you are reversing promotion/coercion rules. It will, however, get rid of compiler warnings that coercion often causes.
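The result types of mixed arithmetic can be checked directly. This sketch assumes a C++11 compiler; `add_mixed` is a hypothetical helper. Note that when an int meets an unsigned int, it is the int operand that is converted to unsigned.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Integer promotion: operands narrower than int are widened to int first.
static_assert(std::is_same<decltype('a' + 'b'), int>::value,
              "char + char is int");
// Same signedness, different size: the smaller operand is widened.
static_assert(std::is_same<decltype(1 + 1L), long>::value,
              "int + long is long");
// Integer meets floating point: the integer is converted.
static_assert(std::is_same<decltype(1 + 1.0), double>::value,
              "int + double is double");
// int meets unsigned int: the int is converted to unsigned int.
static_assert(std::is_same<decltype(1 + 1u), unsigned int>::value,
              "int + unsigned int is unsigned int");

// The addition is carried out in unsigned int, so a negative int wraps around.
unsigned int add_mixed(int i, unsigned int u) { return i + u; }

void check_arithmetic_conversions() {
    // -2 becomes UINT_MAX - 1 before the addition; adding 1 gives UINT_MAX.
    assert(add_mixed(-2, 1u) == std::numeric_limits<unsigned int>::max());
}
```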
User defined types can be converted though. This happens during overload resolution. The compiler will find the various entities that resemble a name you are using and then go through a process to resolve which of the entities should be used. The "identity" conversion is preferred above all; this means that a f(t) will resolve to f(typeof_t) over anything else (see Function with parameter type that has a copy-constructor with non-const ref chosen? for some confusion that can generate). If the identity conversion doesn't work, the system then goes through a complex hierarchy of conversion attempts that include (hopefully in the right order) conversion to base type (slicing), user-defined constructors, and user-defined conversion functions. There's some funky language about references which will generally be unimportant to you and that I don't fully understand without looking up anyway.
In the case of user type conversion explicit conversion makes a huge difference. The user that defined a type can declare a constructor as "explicit". This means that this constructor will never be considered in such a process as I described above. In order to call an entity in such a way that would use that constructor you must explicitly do so by casting (note that syntax such as std::string("hello") is not, strictly speaking, a call to the constructor but instead a "function-style" cast).
Because the compiler will silently look through constructors and type conversion overloads during name resolution, it is highly recommended that you declare the former as 'explicit' and avoid creating the latter. This is because any time the compiler silently does something there's room for bugs. People can't keep in mind every detail about the entire code tree, not even what's currently in scope (especially adding in Koenig lookup), so they can easily forget about some detail that causes their code to do something unintentional due to conversions. Requiring explicit language for conversions makes such accidents much more difficult to make.
For integer types, check the book Secure Coding in C and C++ by Seacord, the chapter about integer overflows.
As for implicit type conversions, you will find the books Effective C++ and More Effective C++ to be very, very useful.
In fact, you shouldn't be a C++ developer without reading these.
Given a C++ function f(X x) where x is a variable of type X, and a variable y of type Y, what are all the automatic/implicit conversions the C++ compiler will perform on y so that the statement "f(y);" is legal code (no errors, no warnings)?
For example:
Pass Derived& to function taking Base& - ok
Pass Base& to function Derived& - not ok without a cast
Pass int to function taking long - ok, creates a temporary long
Pass int& to function taking long& - NOT ok, taking reference to temporary
Note how the built-in types have some quirks compared to classes: a Derived can be passed to function taking a Base (although it gets sliced), and an int can be passed to function taking a long, but you cannot pass an int& to a function taking a long&!!
What's the complete list of cases that are always "ok" (don't need to use any cast to do it)?
What it's for: I have a C++ script-binding library that lets you bind your C++ code and it will call C++ functions at runtime based on script expressions. Since expressions are evaluated at runtime, all the legal combinations of source types and function argument types that might need to be used in an expression have to be anticipated ahead of time and precompiled in the library so that they'll be usable at runtime. If I miss a legal combination, some reasonable expressions won't work in runtime expressions; if I accidentally generate a combination that isn't legal C++, my library just won't compile.
Edit (narrowing the question):
Thanks, all of your answers are actually pretty helpful. I knew the answer was complicated, but it sounds like I've only seen the tip of the iceberg.
Let me rephrase the question a little to limit its scope:
I will let the user specify a list of "BaseClasses" and a list of "UserDefinedConversions". For Bases, I'll generate everything including reference and pointer conversions. But what cases (const/reference/pointer) can I safely do from the UserDefined Conversions list? (The user will give bare types, I will decorate with *, &, const, etc. in the template.)
The C++ Standard gives the answer to your question in 13.3.3.1 Implicit conversion sequences, but it is too large to post here. I recommend that you read at least that part of the C++ Standard.
Hope this link will help you.
Unfortunately the answer to your question is hugely complex, occupying at least 9 pages in the ISO C++ standard (specifically: ~6 pages in "4 Standard Conversions" and ~3 pages in "13.3.3.1 Implicit Conversion Sequences").
Brief summary: A conversion that does not require a cast is called an "implicit conversion sequence". C++ has "standard conversions", which are conversions between fundamental types (such as char being promoted to int) and things such as array-to-pointer decay; there can be several of these in a row, hence the term "sequences". C++ also permits user-defined conversions, which are defined by conversion functions and converting constructors. The important thing to note is that an implicit conversion sequence can have at most one user-defined conversion, with optionally a sequence of standard conversions on either side -- C++ will never "chain" more than one user-defined conversion together without a cast.
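The "at most one user-defined conversion" rule can be checked with std::is_convertible. The types `A`, `B`, `C` and `FromLong` below are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types: A converts to B, and B converts to C.
struct A {};
struct B { B(const A&) {} };
struct C { C(const B&) {} };

// Hypothetical type reachable from int via a standard conversion (int -> long)
// followed by one user-defined conversion (long -> FromLong).
struct FromLong { FromLong(long) {} };

static_assert(std::is_convertible<A, B>::value, "one user-defined step");
static_assert(std::is_convertible<B, C>::value, "one user-defined step");
static_assert(!std::is_convertible<A, C>::value,
              "two user-defined steps are never chained");
static_assert(std::is_convertible<int, FromLong>::value,
              "standard conversion plus one user-defined step");

int take_c(C) { return 1; }

void check_single_user_conversion() {
    A a;
    // take_c(a);               // would not compile: needs A -> B -> C
    assert(take_c(B(a)) == 1);  // one explicit step, then one implicit step
}
```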
(If anyone would like to flesh this post out with the full details, please go ahead... But for me, that would just be too exhausting, sorry :-/)
Note how the built-in types have some quirks compared to classes: a Derived can be passed to function taking a Base (although it gets sliced), and an int can be passed to function taking a long, but you cannot pass an int& to a function taking a long&!!
That's not a quirk of built-in vs. class types. It's a quirk of inheritance.
If you had classes A and B, and B had a conversion to A (either because A has a constructor taking B, or because B has a conversion operator to A), then they'd behave just like int and long in this respect - conversion can occur where a function takes a value, but not where it takes a non-const reference. In both cases the problem is that there is no object to which the necessary non-const reference can be taken: a long& can't refer to an int, and an A& can't refer to a B, and no non-const reference can refer to a temporary.
The reason the base/derived example doesn't encounter this problem is that a non-const Base reference can refer to a Derived object. The fact that the types are user-defined is a necessary but not a sufficient condition for the reference to be legal. Convertible user-defined classes where there is no inheritance behave just like built-ins.
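The distinction can be made concrete with a few compile-time checks. All types and function names below are hypothetical stand-ins for the ones discussed; here `B` converts to `A` through a conversion operator.

```cpp
#include <cassert>
#include <type_traits>

struct Base {};
struct Derived : Base {};

// Unrelated classes where B converts to A, mirroring int and long.
struct A {};
struct B { operator A() const { return A(); } };

// Inheritance: a Base& can refer directly to a Derived object.
static_assert(std::is_convertible<Derived&, Base&>::value, "ok");
static_assert(!std::is_convertible<Base&, Derived&>::value, "needs a cast");

// Conversions that create a new object work by value...
static_assert(std::is_convertible<int, long>::value, "ok");
static_assert(std::is_convertible<B, A>::value, "ok");
// ...but not through a non-const reference, built-in or user-defined.
static_assert(!std::is_convertible<int&, long&>::value, "no long to refer to");
static_assert(!std::is_convertible<B&, A&>::value, "no A to refer to");

int by_base_ref(Base&) { return 1; }
long by_long_value(long x) { return x; }
long by_const_long_ref(const long& x) { return x; }

void check_reference_binding() {
    Derived d;
    assert(by_base_ref(d) == 1);        // binds to the Base subobject
    int i = 5;
    assert(by_long_value(i) == 5);      // a temporary long is created
    assert(by_const_long_ref(i) == 5);  // a const reference may bind to it
}
```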
This comment is way too long for comments, so I've used an answer. It doesn't actually answer your question, though, other than to distinguish between:
"Conversions" where a reference to a derived class is passed to a function taking a reference to a base class.
Conversions where a user-defined or built-in conversion actually creates an object, such as from int to long.
This question is about vararg functions, and the last named parameter of them, before the ellipsis:
void f(Type paramN, ...) {
va_list ap;
va_start(ap, paramN);
va_end(ap);
}
I was reading in the C Standard, and found the following restriction for the va_start macro:
The parameter parmN is the identifier of the rightmost parameter in the variable parameter list in the function definition (the one just before the , ...). If the parameter parmN is declared with the register storage class, with a function or array type, or with a type that is not compatible with the type that results after application of the default argument promotions, the behavior is undefined.
I wonder why the behavior is undefined for the following code
void f(int paramN[], ...) {
va_list ap;
va_start(ap, paramN);
va_end(ap);
}
and not undefined for the following
void f(int *paramN, ...) {
    va_list ap;
    va_start(ap, paramN);
    va_end(ap);
}
The macros are intended to be implementable by pure C code. But pure C code cannot find out whether or not paramN was declared as an array or as a pointer. In both cases, the type of the parameter is adjusted to be a pointer. The same is true for function type parameters.
I wonder: What is the rationale of this restriction? Do some compilers have problems with implementing this when these parameter adjustments are in place internally? (The same undefined behavior is stated for C++ - so my question is about C++ as well).
The restrictions against register parameters and function parameters probably exist for reasons like these:
you are not allowed to take the address of a variable with the register storage class.
function pointers are sometimes quite different than pointers to objects. For example, they might be larger than pointers to objects (you can't reliably convert a function pointer to an object pointer and back again), so adding some fixed number to the address of a function pointer might not get you to the next parameter. If va_start() and/or va_arg() were implemented by adding some fixed amount to the address of paramN and function pointers were larger than object pointers the calculation would end up with the wrong address for the object va_arg() returns. This might not seem to be a great way to implement these macros, but there might be platforms that have (or even need) this type of implementation.
I can't think of what the problem would be to prevent allowing array parameters, but PJ Plauger says this in his book "The Standard C Library":
Some of the restrictions imposed on the macros defined in <stdarg.h> seem unnecessarily severe. For some implementations, they are. Each was introduced, however, to meet the needs of at least one serious C implementation.
And I imagine that there are few people who know more about the ins and outs of the C library than Plauger. I hope someone can answer this specific question with an actual example; I think it would be an interesting bit of trivia.
New info:
The "Rationale for International Standard - Programming Languages - C" says this about va_start():
The parmN argument to va_start was intended to be an aid to implementors writing the definition of a conforming va_start macro entirely in C, even using pre-C89 compilers (for example, by taking the address of the parameter). The restrictions on the declaration of the parmN parameter follow from the intent to allow this kind of implementation, as applying the & operator to a parameter name might not produce the intended result if the parameter’s declaration did not meet these restrictions.
Not that that helps me with the restriction on array parameters.
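The address-based approach the Rationale describes can be modeled without depending on any real calling convention. What follows is only a sketch under assumptions that are not in the original: arguments are laid out back to back in a byte buffer with no padding, and the names naive_first_vararg_offset and naive_lookup_finds are hypothetical. It suggests why such an implementation needs the declared type of parmN to match what the caller actually stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumption of this model: float is smaller than double.
static_assert(sizeof(float) < sizeof(double), "model assumes float is narrower than double");

// A naive va_start: the first variadic argument is assumed to sit
// sizeof(Declared) bytes past the start of the named parameter.
template <typename Declared>
std::size_t naive_first_vararg_offset() {
    return sizeof(Declared);
}

// Simulates a caller that stores the named argument as type Passed, followed
// by one int, and a callee that locates the int using the Declared type.
template <typename Declared, typename Passed>
bool naive_lookup_finds(Passed named, int variadic) {
    unsigned char frame[sizeof(Passed) + sizeof(int)];
    std::memcpy(frame, &named, sizeof(Passed));
    std::memcpy(frame + sizeof(Passed), &variadic, sizeof(int));

    std::size_t offset = naive_first_vararg_offset<Declared>();
    if (offset + sizeof(int) > sizeof(frame)) return false;
    int read_back;
    std::memcpy(&read_back, frame + offset, sizeof(int));
    return read_back == variadic;
}

void check_model() {
    // Declared and passed types agree: the offset lands on the int.
    assert(naive_lookup_finds<double>(1.5, 42));
    // Declared as float but stored as double (as after promotion):
    // the offset lands inside the double instead.
    assert(!naive_lookup_finds<float>(1.5, 42));
}
```

Real implementations also have to deal with alignment and register passing, which this model deliberately ignores.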
It's not undefined. Keep in mind that when a parameter is declared as int paramN[], the actual parameter type will still decay to int* paramN immediately (which is visible in C++, for example, if you apply typeid to paramN).
I must admit that I'm not sure what this bit in the spec is even for, considering that you cannot have parameters of function or array types in the first place (since they will pointer-decay).
I found another relevant quote, from Dinkumware.
The last parameter must not have register storage class, and it must have a type that is not changed by the translator. It cannot have:
* an array type
* a function type
* type float
* any integer type that changes when promoted
* a reference type [C++ only]
So apparently, the problem is precisely that the parameter gets passed in a way different from how it is declared. Interestingly enough, they also ban float and short, even though those should be supported by the standard.
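The types that "change when promoted" are the ones affected by the default argument promotions, and those are easiest to observe on the variadic arguments themselves. The sketch below uses hypothetical functions sum_as_double and sum_as_int; it only shows that a float arrives as a double and that short and signed char arrive as int, which is the kind of mismatch between declared and passed type that the list above is about.

```cpp
#include <cassert>
#include <cstdarg>

// Sums `count` variadic arguments, reading each one as double.
double sum_as_double(int count, ...) {
    va_list ap;
    va_start(ap, count);
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(ap, double);  // float arguments are promoted to double
    }
    va_end(ap);
    return total;
}

// Sums `count` variadic arguments, reading each one as int.
int sum_as_int(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(ap, int);  // short and signed char are promoted to int
    }
    va_end(ap);
    return total;
}

void check_promotions() {
    float a = 1.5f, b = 2.25f;
    assert(sum_as_double(2, a, b) == 3.75);
    short s = 3;
    signed char c = 4;
    assert(sum_as_int(2, s, c) == 7);
}
```

Reading these arguments back with va_arg(ap, float) or va_arg(ap, short) would be undefined behavior, since no argument of those types is ever passed through the ellipsis.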
As a hypothesis, it could be that some compilers have problems doing sizeof correctly on such parameters. E.g. it might be that, for
int f(int x[10])
{
    return sizeof(x);
}
some (buggy) compiler will return 10*sizeof(int), thus breaking the va_start implementation.
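For comparison, this is what a conforming compiler is expected to do with the function above. The name size_of_array_parameter is hypothetical and the return type is changed to std::size_t for clarity; otherwise it is the same function.

```cpp
#include <cassert>
#include <cstddef>

// The parameter is declared with array syntax, but its type is int*.
std::size_t size_of_array_parameter(int x[10]) {
    return sizeof(x);  // size of a pointer, not of ten ints
}

void check_size() {
    int data[10] = {};
    assert(size_of_array_parameter(data) == sizeof(int*));
}
```

Current GCC and Clang typically emit a warning on the sizeof line, precisely because the result is the pointer size rather than 10*sizeof(int).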
I can only guess that the register restriction is there to ease library/compiler implementation -- it eliminates a special case for them to worry about.
But I have no clue about the array/function restriction. If it were in the C++ standard only, I would hazard a guess that there is some obscure template matching scenario where the difference between a parameter of type T[] and of type T* makes a difference, correct handling of which would complicate va_start etc. But since this clause appears in the C standard too, obviously that explanation is ruled out.
My conclusion: an oversight in the standards. Possible scenario: some pre-standard C compiler implemented parameters of type T[] and T* differently, and the spokesperson for that compiler on the C standards committee had the above restrictions added to the standard; that compiler later became obsolete, but no-one felt the restrictions were compelling enough to update the standard.
C++11 says:
[n3290: 13.1/3]: [..] Parameter declarations that differ only in a pointer * versus an array [] are equivalent. That is, the array declaration is adjusted to become a pointer declaration. [..]
and C99 too:
[C99: 6.7.5.3/7]: A declaration of a parameter as “array of type” shall be adjusted to “qualified pointer to type”, where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. [..]
And you said:
But pure C code cannot find out whether or not paramN was declared as an array or as a pointer. In both cases, the type of the parameter is adjusted to be a pointer.
Right, so there's no difference between the two pieces of code you showed us. Both have paramN declared as a pointer; there is actually no array type there at all.
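In C++ this equivalence can be checked at compile time. The sketch below is an illustration only, and first_extra is a hypothetical function: it is declared once with array syntax and then defined with pointer syntax, which is accepted only because both spellings name the same function type.

```cpp
#include <cassert>
#include <cstdarg>
#include <type_traits>

// The two function types from the question are one and the same type.
static_assert(std::is_same<void(int[], ...), void(int*, ...)>::value,
              "array parameter declarations are adjusted to pointers");

int first_extra(int values[], ...);  // declaration using array syntax

// Definition of the same function using pointer syntax.
int first_extra(int* values, ...) {
    va_list ap;
    va_start(ap, values);
    int extra = va_arg(ap, int);  // first variadic argument
    va_end(ap);
    return values[0] + extra;
}

void check_first_extra() {
    int v[1] = {40};
    assert(first_extra(v, 2) == 42);
}
```

If the two declarations had different types, the second one would declare an overload and the call would still compile, so the static_assert is what actually carries the claim.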
So why would there be a difference between the two when it comes to the UB?
The passage you quoted...
The parameter parmN is the identifier of the rightmost parameter in the variable parameter list in the function definition (the one just before the , ...). If the parameter parmN is declared with the register storage class, with a function or array type, or with a type that is not compatible with the type that results after application of the default argument promotions, the behavior is undefined.
...applies to neither, as would be expected.