Is it undefined behaviour to allocate overlarge stack structures? - c++

This is a C specification question.
We all know this is legal C and should work fine on any platform:
/* Stupid way to count the length of a number */
int count_len(int val) {
char buf[256];
return sprintf(buf, "%d", val);
}
But this is almost guaranteed to crash:
/* Stupid way to count the length of a number */
int count_len(int val) {
char buf[256000000];
return sprintf(buf, "%d", val);
}
The difference is that the latter program blows the stack and will probably crash. But, purely semantically, it really isn't any different than the previous program.
According to the C spec, is the latter program actually undefined behavior? If so, what distinguishes it from the former? If not, what in the C spec says it's OK for a conforming implementation to crash?
(If this differs between C89/C99/C11/C++*, this would be interesting too).

Language standards for C(89,99,11) begin with a scope section with this wording (also found in some C++, C#, Fortran and Pascal standards):
This International Standard does not specify
the size or complexity of a program and its data that will exceed the capacity of any specific data-processing system or the capacity of a particular processor;
all minimal requirements of a data-processing system that is capable of supporting a conforming implementation.
The gcc compiler does offer an option to check for stack overflow at runtime
21.1 Stack Overflow Checking
For most operating systems, gcc does not perform stack overflow checking by default. This means that if the main environment task or some other task exceeds the available stack space, then unpredictable behavior will occur. Most native systems offer some level of protection by adding a guard page at the end of each task stack. This mechanism is usually not enough for dealing properly with stack overflow situations because a large local variable could “jump” above the guard page. Furthermore, when the guard page is hit, there may not be any space left on the stack for executing the exception propagation code. Enabling stack checking avoids such situations.
To activate stack checking, compile all units with the gcc option -fstack-check. For example:
gcc -c -fstack-check package1.adb
Units compiled with this option will generate extra instructions to check that any use of the stack (for procedure calls or for declaring local variables in declare blocks) does not exceed the available stack space. If the space is exceeded, then a Storage_Error exception is raised.
There was an attempt during the standardization process for C99 to make a stronger statement within the standard that while size and complexity are beyond the scope of the standard the implementer has a responsibility to document the limits.
The rationale was
The definition of conformance has always been a problem with the C
Standard, being described by one author as "not even rubber teeth, more
like rubber gums". Though there are improvements in C9X compared with C89,
many of the issues still remain.
This paper proposes changes which, while not perfect, hopefully improve the
situation.
The following wording was suggested for inclusion to section 5.2.4.1
translation or execution might fail if the size or complexity of a program or its data exceeds the capacity of the implementation.
The implementation shall document a way to determine if the size or complexity of a correct program exceeds or might exceed the capacity of the implementation.
5.2.4.1. An implementation is always free to state that a given program is
too large or too complex to be translated or executed. However, to stop
this being a way to claim conformance while providing no useful facilities
whatsoever, the implementer must show provide a way to determine whether a
program is likely to exceed the limits. The method need not be perfect, so
long as it errs on the side of caution.
One way to do this would be to have a formula which converted values such
as the number of variables into, say, the amount of memory the compiler
would need. Similarly, if there is a limit on stack space, the formula need
only show how to determine the stack requirements for each function call
(assuming this is the only place the stack is allocated) and need not work
through every possible execution path (which would be impossible in the
face of recursion). The compiler could even have a mode which output a
value for each function in the program.
The proposed wording did not make it into the C99 standard, and therefore this area remained outside the scope of the standard. Section 5.2.4.1 of C99 does list these limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:
127 nesting levels of blocks
63 nesting levels of conditional inclusion
12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or incomplete type in a declaration
63 nesting levels of parenthesized declarators within a full declarator
63 nesting levels of parenthesized expressions within a full expression
63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)
4095 external identifiers in one translation unit
511 identifiers with block scope declared in one block
4095 macro identifiers simultaneously defined in one preprocessing translation unit
127 parameters in one function definition
127 arguments in one function call
127 parameters in one macro definition
127 arguments in one macro invocation
4095 characters in a logical source line
4095 characters in a character string literal or wide string literal (after concatenation)
65535 bytes in an object (in a hosted environment only)
15 nesting levels for #included files
1023 case labels for a switch statement (excluding those for any nested switch statements)
1023 members in a single structure or union
1023 enumeration constants in a single enumeration
63 levels of nested structure or union definitions in a single struct-declaration-list

In C++, Annex B indicates that the maximum size of an object is an implementation-specific finite number. That would tend to limit arrays with automatic storage class.
However, I'm not seeing something specifically for space accumulated by all automatic variables on the call stack, which is where a stack overflow ought to be triggered. I'm also not seeing a recursion limit in Annex B, although that would be closely related.

The C standard is silent on all issues relating to stack overflow. This is a bit strange since it's very vocal in just about every other corner of C programming. AFAIK there is no specification that a certain amount of automatic storage must be available, and no way of detecting or recovering from exhaustion of the space available for automatic storage. The abstract machine is assumed to have an unlimited amount of automatic storage.

I believe the behavior is undefined by omission -- if a 250,000,000-byte local object actually exceeds the implementation's capacity.
Quoting the 2011 ISO C standard, section 1 (Scope), paragraph 2:
This International Standard does not specify
[...]
- the size or complexity of a program and its data that will exceed the capacity of any
specific data-processing system or the capacity of a particular processor
So the standard explicitly acknowledges that a program may exceed the capacity of an implementation.
I think we can safely assume that a program that exceeds the capacity of an implementation is not required to behave the same way as one that does not; otherwise there would be no point in mentioning it.
Since nothing in the standard defines the behavior of such a program, the behavior is undefined. This is specified by section 4 (Conformance), paragraph 2:
[...] Undefined behavior is otherwise indicated in this
International Standard by the words "undefined behavior" or by the
omission of any explicit definition of behavior. There is no
difference in emphasis among these three; they all describe "behavior
that is undefined".
Of course an implementation on most modern computers could easily allocate 250 million bytes of memory; that's only a small fraction of the available RAM on the computer I'm typing this on, for example. But many operating systems place a fairly low limit on the amount of stack space that a program can allocate.
(Incidentally, I'm assuming that the code in the question is a fragment of some complete program that actually calls the function. As it stands, the code has no behavior, since there's no call to count_len, nor is there a main function. I might not normally mention that, but you did use the "language-lawyer" tag.)
Anyone who would argue that the behavior is not undefined should explain either (a) why having the program crash does not make the implementation non-conforming, or (b) how a program crash is within the scope of defined behavior (even if it's implementation-defined or unspecified).

Related

Is it legal to read into an array that doesn't have space for the null terminator?

Every C string is terminated with the null character when it is stored in a C-string variable, and this always consumes one array position.
So why is this legal:
char name[3];
std::cin >> name;
when a 3 letter word "cow" is input to std::cin?
Why is it allowing this?
The compiler cannot know what input you will be giving at runtime, so it cannot say that you will be doing something illegal. If the input fits in the array, then it's legal.
Prior C++20:
The behaviour of the program is undefined. Never do this.
Since C++20:
The input will be truncated to fit into the array, so the array will contain the string "co".
It is allowed because the memory management is left to the programmer.
In this case the error occurs at run-time and it's impossible for the compiler to spot it at compile time.
What you are getting here is an undefined behavior, where the program can crash, can go on smoothly, or present some odd behavior in the future.
Most likely here you are overflowing the allocated memory by one byte which will be stored in the first byte of the subsequent declared variable, or in other addresses which may or may not be already filled with relevant information.
You eventually may experience an abort of the execution if - for example - some read-only memory area is overwritten, or if you point your Instruction Pointer to some invalid area (again, by overwriting its value in memory)
Depends what you mean by "legal". In context of your question, there is no precise meaning.
According to the standard (at least before C++20) your code (if supplied with that input) has undefined behaviour. When behaviour is undefined, any observable outcome is permitted, and no diagnostics are required.
In particular, there is no requirement that an implementation (the compiler, the host system, or anything else) diagnose a problem (e.g. issue an error message), take preventative action (e.g. electrocute the programmer for entering "cow" before hitting the enter key), or take recovery action (e.g. truncate the input to "co").
The reason the standard allows such things is a combination of;
allowing implementation freedoms. For example, the standard permits but does not require your host system to electrocute the user for entering bad input to your program - but doesn't prevent an evil-minded system developer providing hardware and driver support to do exactly that;
technical infeasibility or technical difficulty. In a simple case like your code might be easy to detect. But, in more complicated programs (e.g. your code is buried away among other code) it might be impossible to identify a problem. Or, it might be possible, but take much longer than a programmer is willing/able to spend waiting.
In short, the C++ standard does not coddle the programmer. The programmer - not the standard, not the compiler - is responsible for avoiding undefined behaviour, and responsible if a program with undefined behaviour does undesirable things.

What is the maximum number of dimensions allowed for an array, and why?

What is the maximum number of dimensions that you can use when declaring an array?
For Example.
#include <iostream.h>
#include <conio.h>
{
int a[3][3][3][4][3];
a[2][2][2][2][2] = 9;
}
So, how many dimensions can we declare on an array.
What is limitation of it?
And what is reason behind it?
ISO/IEC 9899:2011 — C
In C, the C11 standard requires:
5.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that
contains at least one instance of every one of the following limits:18)
…
12 pointer, array, and function declarators (in any combinations) modifying an
arithmetic, structure, union, or void type in a declaration.
…
18) Implementations should avoid imposing fixed translation limits whenever possible.
That means that to be a standard-compliant compiler, it must allow at least 12 array dimensions on a simple type like int, but should avoid imposing any limit if at all possible. The C90 and C99 standards also required the same limit.
ISO/IEC 14882:2011 — C++
For C++11, the equivalent information is:
Annex B (informative) Implementation quantities [implimits]
Because computers are finite, C++ implementations are inevitably limited in the size of the programs they
can successfully process. Every implementation shall document those limitations where known. This documentation
may cite fixed limits where they exist, say how to compute variable limits as a function of available
resources, or say that fixed limits do not exist or are unknown.
2 The limits may constrain quantities that include those described below or others. The bracketed number
following each quantity is recommended as the minimum for that quantity. However, these quantities are
only guidelines and do not determine compliance.
…
Pointer, array, and function declarators (in any combination) modifying a class, arithmetic, or incomplete
type in a declaration [256].
…
Thus, in C++, the recommendation is that you should be able to use at least 256 dimensions in an array declaration.
Note that even after you've got the compiler to accept your code, there will ultimately be limits imposed by the memory on the machine where the code is run. The standards specify the minimum number of dimensions that the compiler must allow (over-specify in the C++ standard; the mind boggles at the thought of a 256-dimensional array). The intention is that you shouldn't run into a problem — use as many dimensions as you need. (Can you imagine working with the source code for a 64-dimensional array, let alone anything more — the individual expressions in the source would be horrid to behold, let alone write, read, modify.)
It is not hard to understand that it is only limited by the amount of memory your machine has. You can take 100 (n)dimensional array also.1
Note: your code is accessing a memory out of the bound which is undefined behavior.
1.standard specifies a minimum limit of 12 in case of C and 256 in case of c++11.(This information is added after discussion with Jonathan leffler.My earlier answer only points out the maximum limits which is constrained my machine memory.
maximum number depend on stack size. ex, if stack size = 1Mb --> size of int a[xx][xx][xx][xx][xx] must < 1Mb

Maximum number of arguments [duplicate]

I came across some Fortran 90 code where 68 arguments are passed to a function.
Upon searching the web I only found something about a limit of passing 256 bytes for some CUDA Fortran related stuff (http://www.pgroup.com/userforum/viewtopic.php?t=2235&sid=f241ca3fd406ef89d0ba08a361acd962).
So I wonder: is there a limit to the number of arguments that may be passed to a function for Intel/Visual/GNU fortran compilers?
I came across this discussion of the Fortran 90 standards:
http://www.nag.co.uk/sc22wg5/Guidelines_for_Bindings-b.html
The relevant section is in italics below:
3.4 Statements
Constraints on the length of a Fortran statement impose an upper limit on the length of a procedure call. Although it is unlikely to be a serious imposition, a single Fortran statement in free-form format is subject to an upper limit of 5241 characters, including a possible label (F90 sections 3.3.1, 3.3.1.4); in fixed-form format the upper limit is 1320 characters plus a possible label (F90 section 3.3.2). These limits are subject to the characters being of "default kind", that is in practice single-byte characters; for double-byte characters, used for example for ideographic languages, the limits are processor-dependent but are likely to be of the same order of size.
There is no limit in the Fortran Standard on the maximum number of arguments to a procedure.
I have a security tool that I am developing in my research that analyzes binaries. I knew that the C language has limits on number of arguments (31 in the C90 standard, 127 in the C99 standard), so I thought that I could dimension a vector to hold 128 items pertaining to incoming arguments. I encountered a FORTRAN-derived binary that had 290 arguments passed, which led me to this discussion. The binary is from the SPEC CPU2006 benchmark suite, benchmark 481.wrf, where I see a procedure that is named (in the binary) "solve_interface_" which sets up 290 arguments on the stack and then calls "solve_em_" which actually processes these arguments. You can no doubt find the FORTRAN source code for these procedures online. The binary was produced by the GNU compiler tools for an x86/Linux/ELF system.
I don't believe that the Fortran standards explicitly impose such a limit. However, they do place a limit on the length of a line of code (132 characters) and the number of lines which can together form a single statement (256). I'll leave it to you to figure out how many arguments you could use in a single call to a routine.
Many compilers on the market have more relaxed limits on both statement length and the number of continuation lines that can be used. However, it would not surprise me if a compiler did impose a maximum number of arguments for any routine but the number is probably higher than any realistic need requires.

Maximum number of cases that can be addressed using switch statement

This is out of curiosity. What is the maximum number of switch cases I can have in a single switch including the default: case. I mean like this:
switch(ch)
{
case 1:
//some statement
break;
case 2:
//some statement
break;
.
.
.
.
case n:
//some statement
break;
default:
//default statement
}
My question is what is the maximum value that we can have here? Although this is not programatically significant, I found this a rather intriguing thought. I searched some blogs and found a statement here.
From a doc I have, it is said that:
Standard C specifies that a switch can have at least 257 case
statements. Standard C++ recommends that at least 16,384 case
statements be supported! The real value must be implementation
dependent.
But I don't know how accurate this information is, can somebody give me an idea? Also what does it mean by implementation dependent? Suppose there is a limit like this, can I somehow change it to a higher or lower value?
The draft C++ standard Annex B (informative) Implementation quantities says (emphasis mine):
Because computers are finite, C++ implementations are inevitably limited in the size of the programs they can successfully process. Every implementation shall document those limitations where known. [...]
The limits may constrain quantities that include those described below or others. The bracketed number following each quantity is recommended as the minimum for that quantity. However, these quantities are only guidelines and do not determine compliance.
and includes the follow item:
— Case labels for a switch statement (excluding those for any nested switch statements) [16384].
but these are not hard limits only a recommendation on minimums.
The implementation is the compiler, standard library and supporting tools and so implementation dependent basically means for this case the compiler will decide what the limit is but it should document this limit. The draft standard defines implementation-defined behavior in section 1.3.10 as:
behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents
We can see that gcc does not impose a limit for C:
GCC is only limited by available memory.
which should also cover C++ in this case and it looks like Visual Studio also does not place a limit:
Microsoft C does not limit the number of case values in a switch statement. The number is limited only by the available memory. ANSI C requires at least 257 case labels be allowed in a switch statement.
I can not find similar documentation for clang.
Your question is tagged C++, so per C++98 Annex B/1:
Because computers are finite, C++ implementations are inevitably
limited in the size of the programs they can successfully process.
Every implementation shall document those limitations where known.
This documentation may cite fixed limits where they exist, say how to
compute variable limits as a function of available resources, or say
that fixed limits do not exist or are unknown.
And then Annex B/2:
The limits may constrain quantities that include those described below
or others. The bracketed number following each quantity is recommended
as the minimum for that quantity. However, these quantities are only
guidelines and do not determine compliance.
So as long as the implementation documents what it's doing, ANY max number of case statements is allowed. The standard recommends 16384 in a following list however.
Per the c99 standard, section 5.2.4.1 Translation limits says:
The implementation shall be able to translate and execute at least one program that
contains at least one instance of every one of the following limits:13)
and includes the following line:
— 1023 case labels for a switch statement (excluding those for any nested switch
statements)
Per c++98 standard, Annex B (informative) Implementation quantities says:
The limits may constrain quantities that include those described below
or others. The bracketed number following each quantity is recommended
as the minimum for that quantity. However, these quantities are only
guidelines and do not determine compliance.
— Case labels for a switch statement (excluding those for any nested
switch statements) [16 384].
In theory the max number of cases a switch statement can have depends on the data type of the variable you use:
data_type x
switch(x)
{
...
}
for char, you have 256, for short you have 65536 ...and so on; the maximum number of values you can represent given that data_type.
However, the compiler has to generate code for this switch(statement), and to code it usually generates is something like
cmp(R1,$value)
IFT jmp _subroutine
cmp(R1,$value2)
IFT jmp _subroutine2
...
The more cases you add, the higher the pressure on the registers and the larger the code size gets. Since memory and registers are not infinite, and the compiler is human-written there has to be a limit - and that is what is meant by implementation dependent. Each compiler can permit a different number of cases for a switch statement.
Implementation dependant means, the behaviour is not defined by standard, it is the decision of the compiler. The C++ standard does not set a minimum value for how many labels a switch statement shall support.

C/C++ Control Structure Limitations?

I have heard of a limitation in VC++ (not sure which version) on the number of nested if statements (somewhere in the ballpark of 300). The code was of the form:
if (a) ...
else if (b) ...
else if (c) ...
...
I was surprised to find out there is a limit to this sort of thing, and that the limit is so small. I'm not looking for comments about coding practice and why to avoid this sort of thing altogether.
Here's a list of things that I'd imagine could have some limitation:
Number of functions in a scope (global, class, or namespace).
Number of expressions in a single statement (e.g., compound conditionals).
Number of cases in a switch.
Number of parameters to a function.
Number of classes in a single hierarchy (either inheritance or containment).
What other control structures/language features have limits such as this? Do the language standards say anything about these limits (perhaps minimum requirements for an implementation)? Has anyone run into a particular language limitation like this with a particular compiler/implementation?
EDIT: Please note that the above form of if statements is indeed "nested." It is equivalent to:
if (a) { //...
}
else {
if (b) { //...
}
else {
if (c) { //...
}
else { //...
}
}
}
Visual C++ Compiler Limits
The C++ standard recommends limits for
various language constructs. The
following is a list of constructs
where the Visual C++ compiler does not
implement the recommended limits. The
first number is the recommended limit
and the second number is the limit
implemented by Visual C++:
Nesting levels of compound statements,
iteration control structures, and
selection control structures [256]
(256).
Parameters in one macro definition
[256] (127).
Arguments in one macro invocation
[256] (127).
Characters in a character string
literal or wide string literal (after
concatenation) [65536] (65535).
Levels of nested class, structure, or
union definitions in a single
struct-declaration-list [256] (16).
Member initializers in a constructor
definition [6144] (approximately 600,
memory dependent, can increase with
the /Zm compiler option).
Scope qualifications of one identifier
[256] (127).
Nested external specifications [1024]
(10).
Template arguments in a template
declaration [1024] (64).
Do the language standards say anything
about these limits (perhaps minimum
requirements for an implementation)?
No, the standard sets no minimum limits on this. But it is good practice for an implementation to set and document a hard limit on such things, rather than fail in some unknown way when the limit is exceeded.
Edit: The standard recommends some minimum limits In Annex B - there are really too many to post here and they are in any case advisory:
The limits may constrain quantities
that include those described below or
others. The bracketed number following
each quantity is recommended as the
minimum for that quantity. However,
these quantities are only guidelines
and do not determine compliance.
C specifies that implementations must be able to translate a program that contains an instance of each of a number of limits. The first limit is that of 127 nesting levels of blocks. (5.2.4.1 of ISO/IEC 9899:1999)
C doesn't say that any valid program that contains no more than 127 nesting levels must be translated; it could be unreasonably large in other ways. The rationale was to set some level of expectation that portable programs can have, while allowing latitude not to exclude small implementations and implementations targetting small systems.
In short, if you want more than 127 nesting levels it probably means that you should consult your implementation's documentation to see if it guarantees to support a larger number.
Just to put the whole scoping thing to bed, the following is legal C++ code:
int main() {
if ( int x = 1 ) {
}
else if ( int x = 2 ) {
}
}
which it would not be if both the if and the else if were at the same scope. I think there have been a lot of misunderstandings, perhaps engendered by my comment:
The compiler cares a great deal about scope.
which is of course true, but maybe not helpful in this situation.