I am reading Modern C++ Design, which describes the sizeof operator as follows. The paragraph is written from a generic programming point of view.
There is a surprising amount of power in sizeof: You can apply sizeof to any expression, no matter how complex, and sizeof returns its size without actually evaluating that expression at runtime. This means that sizeof is aware of overloading, template instantiation, conversion rules—everything that can take part in a C++ expression. In fact, sizeof conceals a complete facility for deducing the type of an expression; eventually, sizeof throws away the expression and returns only the size of its result.
My question is: what does the author mean by "sizeof returns its size without actually evaluating the expression at runtime"? Also, in the last line it is mentioned that sizeof throws away the expression. I'd appreciate help understanding these statements, ideally with an example.
Thanks
What does the author mean by "sizeof returns its size without actually evaluating the expression at runtime"?
It means that sizeof(1/0) will yield sizeof(int), even though 1/0 would normally abort the program, because division by zero is a runtime error. Also, for any p declared as T* p, sizeof(*p) will yield sizeof(T) no matter what value is stored in p, even if p is dangling or not initialized at all.
sizeof is evaluated at compile time: the compiler computes the type of the expression that follows the sizeof operator. This is done once and for all by the compiler, hence the sentence “without actually evaluating that expression at runtime”.
The compiler computes the type, then it is able to deduce the size of the expression from the type, and then, still at compile time, the whole sizeof expression is replaced by the calculated size. So the expression itself does not make it into the executable code. That's what the sentence “sizeof throws away the expression and returns only the size of its result” means.
The following gives you the sizeof of the type that i++ has, which is int (an int is commonly 4 bytes on mainstream platforms, so this will likely give you the value 4). However, since the expression is not evaluated, no runtime action is performed for it.
int i = 0;
sizeof(i++);  // compile-time constant equal to sizeof(int); the increment never runs
Evaluating an expression basically means executing its side effects (e.g. incrementing a variable) or reading values from memory or registers at runtime. So in some sense sizeof "throws away" its operand, since it does not actually perform the runtime operation the operand specifies (the value of i will still be zero).
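A compilable sketch of that point (the assertion simply confirms that i was never incremented):

#include <cassert>
#include <cstddef>
#include <iostream>

int main() {
    int i = 0;
    std::size_t s = sizeof(i++);  // operand is never evaluated, so i++ never runs
    std::cout << s << '\n';       // prints sizeof(int), typically 4
    assert(i == 0);               // holds: the increment was thrown away
}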
The compiler needs to calculate the sizes of types/structs/classes for various operations. The sizeof operator makes these sizes available to your program as a constant. So for example, if you do sizeof(int) the compiler knows how big an int is (in bytes) and will insert that value instead. The same applies for more complex things like sizeof(myVariable) with myVariable being of type MyClass: the compiler does know how much space MyClass takes up and thus can insert that value.
The point is that this evaluation takes place at compile time: the result is a number. During runtime, the evaluation does not need to be done again.
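For example, a small sketch using the MyClass mentioned above (the members are made up for illustration; the exact size depends on the target's type sizes and padding):

#include <iostream>

struct MyClass {
    int    id;      // the compiler knows the size and alignment of every member...
    double value;
};

int main() {
    MyClass myVariable{};
    // ...so both of these are compile-time constants inserted into the binary.
    std::cout << sizeof(MyClass) << '\n';     // e.g. 16 on a typical 64-bit target
    std::cout << sizeof(myVariable) << '\n';  // same value: the size depends only on the type
}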
It means int j=sizeof(int); would be compiled to int j=4; (on a platform where int is 4 bytes).
I have read the compiled assembly; there is no actual calculation during execution!
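One way to see the same thing without reading assembly is to force the value into a context that only accepts compile-time constants; a minimal sketch:

constexpr int j = sizeof(int);        // must be a compile-time constant for constexpr
static_assert(j == sizeof(int), "checked entirely by the compiler, no runtime work");

int main() { return j; }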
I would have thought that all the necessary info would be known at compile time and the compiler could insert a constant value.
Does this indeed happen, or does sizeof get evaluated at runtime?
No. sizeof(int) results in a constant expression of type size_t, which means its value is known at compile time. NO RUNTIME OVERHEAD!
No. It is a compile time thing.
No, in C++ sizeof is always evaluated at compile time.
Note that this is not true in C, where the exception is variable-length arrays.
I have an array that I would like to initialize
char arr[sizeof(int)];
Would this expression evaluate to a compile-time constant or result in a function call?
char arr[sizeof(int)];
As far as the language is concerned, this is fine. Note, though, that the array is only declared (and defined); it is NOT initialized if it is a local variable. If it is declared at namespace level, it is statically zero-initialized.
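A small sketch of that distinction (the names g_arr, local_arr and zeroed are just illustrative):

char g_arr[sizeof(int)];   // namespace scope: statically zero-initialized

void f() {
    char local_arr[sizeof(int)];    // local: defined, but its contents are indeterminate
    char zeroed[sizeof(int)] = {};  // empty initializer: every element is zero
    (void)local_arr;
    (void)zeroed;
}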
Note that sizeof(int) is a constant expression of type size_t; its value is known at compile time.
This is an initialization:
char arr[sizeof(int)] = { 'A', 'B', '0', 'F' };
This of course assumes that sizeof(int) is (at least) 4, or it will fail to compile.
And to answer the actual (new) question:
sizeof() is a compile-time operator. In C++ [according to the standard; some compilers do allow C-style variable-length arrays], it will not result in anything other than a compile-time constant. In C, with variable-length arrays, it can become a simple calculation (number of elements * size of each element, where the number of elements is the variable part).
There is no initialization here. There's nothing wrong with declaring or defining an array with sizeof(int) elements, except that it might look a little odd to readers of the code. But if that's what you need, that's what you should write.
It really depends on how you intend to use the array.
sizeof(int) may vary between implementations, so you just need to be careful about how you access the elements in the array. Don't assume that an element that is accessible on your machine is accessible on another, unless it's within the minimum sizes specified in the C++ standard.
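One way to make such an assumption explicit is a compile-time check; a minimal sketch, reusing the initializer from the earlier example (the message text is just illustrative):

static_assert(sizeof(int) >= 4, "this code assumes int is at least 4 bytes");

char arr[sizeof(int)] = { 'A', 'B', '0', 'F' };  // would also fail to compile on a smaller int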
sizeof is evaluated at compile time; the only time sizeof would be evaluated at runtime is for variable-length arrays in C99 code, or in GCC and other C++ compilers that support VLAs as an extension. So this code is valid:
char arr[sizeof(int)];
Although if it is a local variable, it won't be initialized.
Let's say I have:
int test[10];
on a 32-bit machine. What if I do:
int b = test[-1];
Obviously that's a big no-no when it comes to accessing an array (out of bounds), but what actually happens? Just curious.
Am I accessing the 32-bit word "before" my array?
int b = *(test - 1);
or am I just addressing a word very far away (starting from the "test" memory location)?
int b = *(test + 0xFFFFFFFF);
0xFFFFFFFF is the two's complement representation of decimal -1
The behaviour of your program is undefined as you are attempting to access an element outside the bounds of the array.
What might be happening is this: Assuming you have a 32-bit int type, you're accessing the 32 bits of memory on the stack (if any) that precede test[0] and reading them as an int. Your process may not even own this memory. Not good.
Whatever happens, you get undefined behaviour since pointer arithmetic is only defined within an array (including the one-past-the-end position).
A better question might be:
int test[10];
int * t1 = test+1;
int b = t1[-1]; // Is this defined behaviour?
The answer to this is yes. The definition of subscripting (C++11 5.2.1) is:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
so this is equivalent to *((t1)+(-1)). The definition of pointer addition (C++11 5.7/5) is for all integer types, signed or unsigned, so nothing will cause -1 to be converted into an unsigned type; so the expression is equivalent to *(t1-1), which is well-defined since t1-1 is within the array bounds.
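A compilable sketch of that well-defined case (the assertion just confirms that t1[-1] names the same element as test[0]):

#include <cassert>

int main() {
    int test[10] = {};
    int *t1 = test + 1;            // points at test[1], inside the array
    int b = t1[-1];                // *(t1 - 1), i.e. test[0]: well-defined
    assert(&t1[-1] == &test[0]);
    return b;
}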
The C++ standard says that it's undefined behavior and illegal. What this means in practice is that anything could happen, and the anything can vary by hardware, compiler, options, and anything else you can think of. Since anything could happen there isn't a lot of point in speculating about what might happen with a particular hardware/compiler combination.
The official answer is that the behavior is undefined. Unofficially, you are trying to access the integer before the start of the array. This means you instruct the computer to calculate the address that precedes the start of the array by 4 bytes (in your case). Whether this operation succeeds depends on multiple factors, among them whether the array is allocated on the stack or in the static data segment, and where exactly that address ends up. On a general-purpose machine (Windows/Linux) you are likely to get a garbage value as a result, but it may also result in a memory-violation error if the address happens to be somewhere the process is not authorized to access. What may happen on specialized hardware is anybody's guess.
Is the difference of two non-void pointer variables defined (per C99 and/or C++98) if they are both NULL valued?
For instance, say I have a buffer structure that looks like this:
struct buf {
    char *buf;
    char *pwrite;
    char *pread;
} ex;
Say ex.buf points to an array or some malloc'ed memory. If my code always ensures that pwrite and pread point within that array or one past it, then I am fairly confident that ex.pwrite - ex.pread will always be defined. However, what if pwrite and pread are both NULL? Can I just expect subtracting the two to be defined as (ptrdiff_t)0, or does strictly compliant code need to test the pointers for NULL? Note that the only case I am interested in is when both pointers are NULL (which represents the buffer-not-initialized case). The reason has to do with writing a fully compliant "available" function, given that the preceding assumptions are met:
size_t buf_avail(const struct buf *b)
{
    return b->pwrite - b->pread;
}
In C99, it's technically undefined behavior. C99 §6.5.6 says:
7) For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
[...]
9) When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements. [...]
And §6.3.2.3/3 says:
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.55) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
So since a null pointer compares unequal to a pointer to any object, it violates the preconditions of 6.5.6/9, and the subtraction is undefined behavior. But in practice, I'd be willing to bet that pretty much every compiler will return a result of 0 without any ill side effects.
In C89, it's also undefined behavior, though the wording of the standard is slightly different.
C++03, on the other hand, does have defined behavior in this instance. The standard makes a special exception for subtracting two null pointers. C++03 §5.7/7 says:
If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type ptrdiff_t.
C++11 (as well as the latest draft of C++14, n3690) have identical wording to C++03, with just the minor change of std::ptrdiff_t in place of ptrdiff_t.
I found this in the C++ standard (5.7 [expr.add] / 7):
If two pointers [...] both are null, and the two pointers are
subtracted, the result compares equal to the value 0 converted to the
type std::ptrdiff_t
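A tiny sketch of what that guarantee means in practice in C++ (this program has well-defined behavior and d is 0):

#include <cstddef>

int main() {
    char *a = nullptr;
    char *b = nullptr;
    std::ptrdiff_t d = a - b;  // defined in C++: compares equal to 0
    return static_cast<int>(d);
}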
As others have said, C99 requires that the two pointers in an addition/subtraction point into the same array object. NULL does not point to a valid object, which is why you cannot use it in subtraction.
Edit: This answer is only valid for C, I didn't see the C++ tag when I answered.
No, pointer arithmetic is only allowed for pointers that point within the same object. Since by definition of the C standard null pointers don't point to any object, this is undefined behavior.
(Although, I'd guess that any reasonable compiler will return just 0 on it, but who knows.)
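So for strictly conforming C code, a defensive sketch of the question's buf_avail would guard the both-NULL (uninitialized buffer) case explicitly; the struct mirrors the one in the question:

#include <stddef.h>

struct buf {
    char *buf;
    char *pwrite;
    char *pread;
};

/* Strictly conforming variant: never subtracts null pointers, so it stays
   within what the C standard guarantees. */
size_t buf_avail(const struct buf *b)
{
    if (b->pwrite == NULL && b->pread == NULL)  /* uninitialized-buffer case */
        return 0;
    return (size_t)(b->pwrite - b->pread);
}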
The C Standard does not impose any requirements on the behavior in this case, but many implementations do specify the behavior of pointer arithmetic in many cases beyond the bare minimums required by the Standard, including this one.
On any conforming C implementation, and nearly all (if not all) implementations of C-like dialects, the following guarantees will hold for any pointer p such that either *p or *(p-1) identifies some object:
For any integer value z that equals zero, The pointer values (p+z) and (p-z) will be equivalent in every way to p, except that they will only be constant if both p and z are constant.
For any q which is equivalent to p, the expressions p-q and q-p will both yield zero.
Having such guarantees hold for all pointer values, including null, may eliminate the need for some null checks in user code. Further, on most platforms, generating code that upholds such guarantees for all pointer values without regard for whether they are null would be simpler and cheaper than treating nulls specially. Some platforms, however, may trap on attempts to perform pointer arithmetic with null pointers, even when adding or subtracting zero. On such platforms, the number of compiler-generated null checks that would have to be added to pointer operations to uphold the guarantee would in many cases vastly exceed the number of user-generated null checks that could be omitted as a result.
If there were an implementation where the cost of upholding the guarantees would be great, but few if any programs would receive any benefit from them, it would make sense to allow it to trap "null+zero" computations, and require that user code for such an implementation include the manual null checks that the guarantees could have made unnecessary. Such an allowance was not expected to affect the other 99.44% of implementations, where the value of upholding the guarantees would exceed the cost. Such implementations should uphold such guarantees, but their authors shouldn't need the authors of the Standard to tell them that.
The authors of C++ have decided that conforming implementations must uphold the above guarantees at any cost, even on platforms where they could substantially degrade the performance of pointer arithmetic. They judged that the value of the guarantees even on platforms where they would be expensive to uphold would exceed the cost. Such an attitude may have been affected by a desire to treat C++ as a higher-level language than C. A C programmer could be expected to know when a particular target platform would handle cases like (null+zero) in unusual fashion, but C++ programmers weren't expected to concern themselves with such things. Guaranteeing a consistent behavioral model was thus judged to be worth the cost.
Of course, nowadays questions about what is "defined" seldom have anything to do with what behaviors a platform can support. Instead, it is now fashionable for compilers to--in the name of "optimization"--require that programmers manually write code to handle corner cases which platforms would previously have handled correctly. For example, if code which is supposed to output n characters starting at address p is written as:
void out_characters(unsigned char *p, int n)
{
    unsigned char *end = p+n;
    while(p < end)
        out_byte(*p++);
}
older compilers would generate code that would reliably output nothing, with
no side-effect, if p==NULL and n==0, with no need to special-case n==0. On
newer compilers, however, one would have to add extra code:
void out_characters(unsigned char *p, int n)
{
    if (n)
    {
        unsigned char *end = p+n;
        while(p < end)
            out_byte(*p++);
    }
}
which an optimizer may or may not be able to get rid of. Failing to include the extra code may cause some compilers to figure that since p "can't possibly be null", any subsequent null pointer checks may be omitted, thus causing the code to break in a spot unrelated to the actual "problem".
For example, does the result of this code snippet depend on the machine where the code is compiled, or on the machine where the executable runs?
sizeof(short int)
sizeof is a compile time operator.
It depends on the machine that will run your program, but the value is computed at compile time. Thus the compiler (of course) has to know which machine it is compiling for.
As of C99, sizeof is evaluated at runtime if and only if the operand is a variable-length array, e.g. int a[b], where b is not known at compile time. In this case, sizeof(a) is evaluated at runtime and its result is the size (in bytes) of the entire array, i.e. the size of all elements in the array combined. To get the number of elements in the array, use sizeof(a) / sizeof(a[0]). From the C99 standard:
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Note that all of this is different from what you'd get if you allocated an array on the heap, e.g. int* a = new int[b]. In that case, sizeof(a) would just give you the size of a pointer to int, i.e. 4 or 8 bytes, regardless of how many elements are in the array.
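A sketch of that contrast (using a fixed length for the stack array, since a true VLA would need the C99/GCC behavior discussed above; exact numbers are platform-dependent):

#include <iostream>

int main() {
    int fixed[10];              // array type: the compiler knows the full size
    int *heap = new int[10];    // just a pointer to the first element

    std::cout << sizeof(fixed) << '\n';  // 10 * sizeof(int), e.g. 40
    std::cout << sizeof(heap)  << '\n';  // size of a pointer, e.g. 8 on 64-bit targets

    delete[] heap;
}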
sizeof is evaluated at compile time, but if the executable is moved to a machine where the compile-time and run-time values would differ, the executable is not valid for that machine anyway.
Anon tried to explain this, but neither he nor anyone else has pointed out that your compiler has flags indicating which processor you are compiling for. That is how sizeof(short) is known at compile time.
However, I feel that any desktop compiler should emit code compatible with desktops; I think the OS provides certain abstractions around this, even though I hear that Windows machines have a different architecture from Macintosh machines.
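To make the point about target flags concrete, here is a minimal sketch; -m32 and -m64 are GCC/Clang options on x86, and the sizes in the comments are typical for x86 Linux targets, not guaranteed by the standard:

// sizes.cpp -- the values printed are fixed when the compiler runs,
// based on the target it was told to build for:
//   g++ -m64 sizes.cpp && ./a.out   // long: 8 (typical 64-bit Linux target)
//   g++ -m32 sizes.cpp && ./a.out   // long: 4 (typical 32-bit target)
#include <iostream>

int main() {
    std::cout << "short: " << sizeof(short) << '\n'
              << "int:   " << sizeof(int)   << '\n'
              << "long:  " << sizeof(long)  << '\n';
}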