sizeof() in C/C++ for arrays


So C/C++ arrays don't know about their length, right? But then how can the function sizeof(array) work and give us the proper size in bytes when it shouldn't be able to know the number of elements in the array?

So C/C++ arrays don't know about their length, right.
Your assumption is wrong. With the exception of variable length arrays introduced in C99, arrays in both C and C++ have a size that is known at compile time. The compiler knows their size.
Your confusion probably arises because an array name sometimes decays into a pointer to the array's first element (for example, when it is passed as a function argument); the size information is indeed lost in that case.
But when sizeof is used on an array, the array is not converted to a pointer. This is your other confusion: sizeof is not a function, it's an operator.

I will quote the relevant portions of the C99 standard. §6.5.3.4 ¶2 says
The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. The size is
determined from the type of the operand. The result is an integer. If
the type of the operand is a variable length array type, the operand
is evaluated; otherwise, the operand is not evaluated and the result
is an integer constant.
It also says in the same section §6.5.3.4 ¶1
The sizeof operator shall not be applied to an expression that has
function type or an incomplete type.
About the array type, §6.2.5 ¶20 says
An array type describes a contiguously allocated nonempty set of
objects with a particular member object type, called the element type.
Array types are characterized by their element type and by the number
of elements in the array.
It again says in §6.2.5 ¶22
An array type of unknown size is an incomplete type.
So, to summarize the above: the size of an array is known to the compiler (and can be obtained with the sizeof operator) when you also specify the size of the array, i.e., when it is a complete type.
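A minimal sketch of the difference between a complete array type and a decayed pointer. The names size_after_decay, size_by_reference and numbers are made up for illustration, and the pointer size itself depends on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A parameter written as `int a[5]` is adjusted to `int* a`, so both
// spellings below name the same function type.
static_assert(std::is_same<void(int[5]), void(int*)>::value,
              "array parameters are adjusted to pointers");

// Inside a function that receives the pointer, only the pointer's size
// is available; the element count is gone.
std::size_t size_after_decay(const int* p) {
    return sizeof(p);                 // size of a pointer
}

// Taking the array by reference keeps the complete array type, including N.
template <std::size_t N>
std::size_t size_by_reference(const int (&a)[N]) {
    return sizeof(a);                 // N * sizeof(int)
}

// Where the array is declared, its type is complete, so sizeof is a
// compile-time constant.
int numbers[5] = {1, 2, 3, 4, 5};
static_assert(sizeof(numbers) == 5 * sizeof(int), "size known at compile time");
```

Calling size_after_decay(numbers) returns sizeof(int*), while size_by_reference(numbers) returns 5 * sizeof(int).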

Related

Memory allocation using array

Can someone please tell me what, in general, the following will do in C?
H.L = new int* [H.n];
Does this command create L variable in structure H of integer type with size n?

It will create an array of H.n pointers to int. A pointer to the first array element will be stored in H.L.
If you wanted an array of ints, remove the asterisk.
On a side note, you may be happier using a vector<int*> instead.
It is much easier to use.

In the right side of the expression statement
H.L = new int* [H.n];
an array of type int *[H.n] is created using the unary operator new []. That is, each element of the array has type int *.
The expression H.n used in square brackets specifies the number of elements of the allocated array and must be convertible to type size_t. Usually it has some integral type, for example int.
According to the C++ Standard (5.3.4 New)
...If it is an array, the new-expression returns a pointer to the initial
element of the array.
So the lvalue H.L on the left side should have type int ** or, in rare cases, some type to which int ** can be converted.
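As a rough illustration of the statement discussed above, here is a sketch that uses a hypothetical struct Table in place of H (the real definition of H is not shown in the question). It also shows the std::vector alternative suggested in the other answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the structure H in the question.
struct Table {
    std::size_t n;
    int** L;               // pointer to the first element of an array of int*
};

// Allocates n pointers, value-initialized to nullptr. The ints they may
// later point to are NOT allocated here.
Table make_table(std::size_t n) {
    Table t;
    t.n = n;
    t.L = new int*[n]();   // new[] yields int**: a pointer to the first int*
    return t;
}

// new[] must be paired with delete[].
void destroy_table(Table& t) {
    delete[] t.L;
    t.L = nullptr;
    t.n = 0;
}

// The std::vector alternative manages the storage itself.
std::vector<int*> make_vector_table(std::size_t n) {
    return std::vector<int*>(n, nullptr);
}
```

With the raw array the caller is responsible for calling destroy_table; with the vector the pointer storage is released automatically (the pointed-to ints, if any, still need their own owner).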

Confusion regarding types, overflows and UB in pointer-integral addition

I used to think that adding an integral type to a pointer (provided that the pointer points to an array of a certain size etc. etc.) is always well defined, regardless of the integral type. The C++11 standard says ([expr.add]):
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i -th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n ) point to, respectively, the i + n -th and i − n -th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
On the other hand, it was brought to my attention recently that the built-in add operators for pointers are defined in terms of ptrdiff_t, which is a signed type (see 13.6/13). This seems to hint that if one does a malloc() with a very large (unsigned) size and then tries to reach the end of the allocated space via a pointer addition with a std::size_t value, this might result in undefined behaviour because the unsigned std::size_t will be converted to ptrdiff_t which is potentially UB.
I imagine similar issues would arise, e.g., in the operator[]() of std::vector, which is implemented in terms of an unsigned size_type. In general, it seems to me that this would make it practically impossible to fully use the memory storage available on a platform.
It's worth noting that neither GCC nor Clang complains about signed-unsigned integral conversions, even with all the relevant diagnostics turned on, when adding unsigned values to pointers.
Am I missing something?
EDIT: I'd like to clarify that I am talking about additions involving a pointer and an integral type (not two pointers).
EDIT2: an equivalent way of formulating the question might be this. Does this code result in UB in the second line, if ptrdiff_t has a smaller positive range than size_t?
char *ptr = static_cast<char * >(std::malloc(std::numeric_limits<std::size_t>::max()));
auto end = ptr + std::numeric_limits<std::size_t>::max();

Your question is based on a false premise.
Subtraction of pointers produces a ptrdiff_t §[expr.add]/6:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the header (18.2).
That does not, however, mean that addition is defined in terms of ptrdiff_t. Rather the contrary, for addition only one conversion is specified (§[expr.add]/1):
The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.
The "usual arithmetic conversions" are defined in §[expr]/10. This includes only one conversion from unsigned type to signed type:
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
So, while there may be some room for question about exactly what type the size_t will be converted to (and whether it's converted at all), there's no question on one point: the only way it can be converted to a ptrdiff_t is if all its values can be represented without change as a ptrdiff_t.
So, given:
size_t N;
T *p;
...the expression p + N will never fail because of some (imagined) conversion of N to a ptrdiff_t before the addition takes place.
Since §13.6 is being mentioned, perhaps it's best to back up and look carefully at what §13.6 really is:
The candidate operator functions that represent the built-in operators defined in Clause 5 are specified in this subclause. These candidate functions participate in the operator overload resolution process as described in 13.3.1.2 and are used for no other purpose.
[emphasis added to "and are used for no other purpose"]
In other words, the fact that §13.6 defines an operator that adds a ptrdiff_t to a pointer does not mean that when any other integer type is added to a pointer, it's first converted to a ptrdiff_t, or anything like that. More generally, the operators defined in §13.6 are never used to carry out any arithmetic operations.
With that, and the rest of the text you quoted from §[expr.add], we can quickly conclude that adding a size_t to a pointer can overflow if and only if there aren't that many elements in the array after the pointer.
Given the above, one more question probably occurs to you. If I have code like this:
char *p = huge_array;
size_t N = sizeof(huge_array);
char *p2 = p + N;
ptrdiff_t diff = p2 - p;
...is it possible that the final subtraction will overflow? The short and simple answer to that is: Yes, it can.
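A small sketch of the two operations, using a deliberately tiny array so that everything stays well defined. The names one_past_end, byte_distance and buffer are invented for this example; the overflow case (an object larger than PTRDIFF_MAX bytes) is only described in the comments, not reproduced.

```cpp
#include <cassert>
#include <cstddef>

// Adding an unsigned count to a pointer: no conversion to ptrdiff_t takes
// place, and the result is valid as long as it stays inside the array
// (or one past its end).
char* one_past_end(char* first, std::size_t count) {
    return first + count;
}

// Subtracting two pointers yields std::ptrdiff_t. For a small array the
// difference always fits; for an object larger than PTRDIFF_MAX bytes it
// would not, and the subtraction would be undefined.
std::ptrdiff_t byte_distance(const char* first, const char* last) {
    return last - first;
}

char buffer[64];
```

Here one_past_end(buffer, sizeof(buffer)) points one past the last element, and byte_distance between buffer and that pointer is 64.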

c++, do array indices need to be int?

In c++, a const array, arr, contains 100 numbers between 0 and 80.
If I choose the numbers in arr to be chars, will they be implicitly converted to int every time they are used as indices on double-pointers, i.e. doublepointer[arr[i]]?

Yes, they will be converted to type int. According to the C++ Standard "subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2))."
And if the additive operator is used then "The usual arithmetic conversions are performed for operands of arithmetic or enumeration type." This means that objects of type char will be converted to objects of type int when they are used in expressions as indices in the subscript operator.
Take into account that type char may behave either as unsigned char or as signed char, depending on the compiler options you select or on the compiler's defaults.
As for the types that can be used as indices in the subscript operator, they shall be either unscoped enumerations or integral types.

For a genuine array, the index is (converted to) some integral type, as explained in Vlad's answer.
But several STL containers, e.g. std::map or std::vector, have their own operator [] whose argument might be (e.g. for some maps) of a non-integral type. By convention, that operator is often paired with an at member function.

No, an array index does not have to be an int. You can use characters as array indices, but they have their own problems: a char index may be either signed or unsigned, depending on the implementation. If a user-provided character is used as an array index, the value may be negative, and in most cases that would mean memory outside of the array is accessed. Hence int is recommended and is what is mostly used as an array index.

To answer the question in your title, an array index can be of any "unscoped enumeration or integral type". Array indexing is defined in terms of pointer addition; one operand must be a pointer to a completely-defined object type, and the other must be of some integral or unscoped enumeration type.
(Note that the word "object" here has nothing to do with object-oriented programming.)
There's nothing special about type int in this context. When you define an array type or object, you don't specify the type of the index, just the element type and the number of elements. When you use an index expression arr[i], the index can be of any integral type; for example unsigned int and long long are valid, and will not be implicitly converted.
To address the specific code you're asking about, char is an integral type, so it's perfectly valid as an array index -- but you need to be careful. The "usual arithmetic conversions" are applied to the index expression, which means that if it's of a type narrower than int it will be promoted to int or to unsigned int. These promotions do not apply to an index whose type is already at least as wide as int.
If plain char happens to be signed in your implementation, and if the value happens to be negative, then it will be promoted to a negative int value, which is probably not what you want. In your particular case, you say the values are between 0 and 80, all of which are within the range of positive values of type char. But just in case your requirements change later, you'd be better off defining your array with an element type of unsigned char.
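To make the promotion concrete, here is a minimal sketch. The helper names as_index and lookup are made up, and the static_assert assumes a typical platform on which int is wider than char.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// On typical platforms (int wider than char) a narrow index is promoted
// to int; unary plus makes that promotion visible.
static_assert(std::is_same<decltype(+static_cast<unsigned char>(0)), int>::value,
              "unsigned char promotes to int here");

// The value a narrow index takes once it is used as a subscript.
int as_index(signed char c)   { return +c; }   // negative values stay negative
int as_index(unsigned char c) { return +c; }   // never negative

// Looks up table[idx[i]] with unsigned char indices, so a stored value can
// never turn into a negative subscript.
double lookup(const double* table, const unsigned char* idx, std::size_t i) {
    return table[idx[i]];
}
```

With signed char, a stored value such as -3 would index before the start of the table; with unsigned char the worst case is an index that is too large, which is at least easier to range-check.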

Why does decay to pointer for array argument appear not to apply to sizeof()?

I read a question earlier that was closed due to being an exact duplicate of this
When a function has a specific-size array parameter, why is it replaced with a pointer?
and
How to find the 'sizeof' (a pointer pointing to an array)?
but after reading this I am still confused by how sizeof() works. I understand that passing an array as an argument to a function such as
void foo(int a[5])
will result in the array argument decaying to a pointer. What I did not find in the above 2 question links was a clear answer as to why it is that the sizeof() function itself is exempt from (or at least seemingly exempt from) this pointer decay behaviour. If sizeof() behaved like any other function then
int a[5] = {1,2,3,4,5};
cout << sizeof(a) << endl;
the above should output 4 instead of 20. Have I missed something obvious? This seems to contradict the decay-to-pointer behaviour. Sorry for bringing this up again, but I am really having a hard time understanding why this happens, despite having happily used the function for years without really thinking about it.

Because the standard says so (emphasis mine):
(C99, 6.3.2.1p3) "Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue."
Note that for C++, the standard explicitly says the size is the size of the array:
(C++11, 5.3.3p2 sizeof) "[...] When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element."

sizeof is an operator, not a function. It's a specific one at that, too. The parentheses aren't even necessary if it's an expression:
int a;
sizeof (int); //needed because `int` is a type
sizeof a; //optional because `a` is an expression
sizeof (a); //^ also works
It also appears in operator precedence charts, and it's one of the non-overloadable operators.
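A short sketch of two consequences of sizeof being an operator rather than a function: its operand is not evaluated, and the parentheses are optional around an expression. The function names are invented for illustration, and some compilers warn about the side effect inside sizeof.

```cpp
#include <cassert>
#include <cstddef>

// sizeof does not evaluate its operand (variable length arrays in C99
// aside), so the increment below never happens.
int counter_after_sizeof() {
    int i = 0;
    std::size_t s = sizeof(i++);   // only the type of `i++` (int) is inspected
    (void)s;                       // silence the unused-variable warning
    return i;                      // still 0
}

// Parentheses are required around a type name but optional around an
// expression; both functions return the same value.
std::size_t with_parens(const int (&a)[5])    { return sizeof(a); }
std::size_t without_parens(const int (&a)[5]) { return sizeof a; }
```

A real function call would have evaluated i++ before the call; sizeof only looks at the type of the expression.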

How does "sizeof" work in this helper for determining array size?

I've found this article that brings up the following template and a macro for getting array size:
template<typename Type, size_t Size>
char ( &ArraySizeHelper(Type( &Array )[Size]) )[Size];
#define _countof(Array) sizeof(ArraySizeHelper(Array))
and I find the following part totally unclear. sizeof is applied to a function declaration. I'd expect the result to be "size of function pointer". Why does it obtain "size of return value" instead?

sizeof is applied to the result of a function call, not a declaration. It therefore gives the size of the return value, which in this case is a reference to an array of chars.
The template causes the array in the return type to have the same number of elements as the argument array, which is fed to the function from the macro.
Finally, sizeof is then applied to a reference to this char array. sizeof on a reference is the same as sizeof on the type itself. Since sizeof(char) == 1, this gives the number of elements in the array.

template<typename Type, size_t Size>
char (&ArraySizeHelper(Type(&Array)[Size]))[Size];
#define _countof(Array) sizeof(ArraySizeHelper(Array))
sizeof is applied to a function declaration. I'd expect the result to be "size of function pointer". Why does it obtain "size of return value" instead?
It's not sizeof ArraySizeHelper (which would be illegal: you can't take sizeof of a function), nor sizeof &ArraySizeHelper, not even implicitly, because the implicit conversion from function to pointer-to-function is explicitly disallowed for the operand of sizeof by the Standard (for C++0x see 5.3.3). Rather, it's sizeof ArraySizeHelper(Array), which is equivalent to sizeof applied to the type that the function call returns, i.e. sizeof(char[Size]), hence Size.

ArraySizeHelper is a function template which returns a reference to a char array of size Size. The template takes two parameters: one is a type (Type), and the other is a value (Size).
So when you pass an object of type, say, A[100] to the function, the compiler deduces both arguments of the template: Type becomes A, and Size becomes 100.
So the return type of the instantiated function becomes char(&)[100]. Since the argument of sizeof is never evaluated, the function need not have a definition. sizeof only needs to know the return type of the function, and sizeof applied to a reference yields the size of the referenced type, char[100]. That becomes equivalent to sizeof(char[100]), which yields 100, the size of the array.
Another interesting point is that sizeof(char) is not compiler-dependent, unlike the size of other primitive types (other than the variants of char [1]). It's ALWAYS 1. So sizeof(char[100]) is guaranteed to be 100.
[1] The size of all variants of char is ONE, be it char, signed char, or unsigned char, according to the Standard.
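For reference, a self-contained sketch of the same trick. It uses the made-up names array_size_helper and COUNT_OF rather than _countof (which is a reserved-style name already provided by some toolchains), and Widget is a hypothetical element type.

```cpp
#include <cassert>
#include <cstddef>

// Declaration only: the function is never called at run time, so no
// definition is needed. Its return type is "reference to char[Size]".
template <typename Type, std::size_t Size>
char (&array_size_helper(Type (&array)[Size]))[Size];

// sizeof inspects only the return type; sizeof(char[Size]) == Size.
#define COUNT_OF(arr) sizeof(array_size_helper(arr))

struct Widget { double x; int y; };   // hypothetical element type

Widget widgets[100];
int triple[3];

// The result is the element count, regardless of the element size.
static_assert(COUNT_OF(widgets) == 100, "element count, not byte count");
static_assert(COUNT_OF(triple) == 3, "independent of the element size");
```

Unlike sizeof(a) / sizeof(a[0]), this form refuses to compile when it is given a pointer instead of an array, because template argument deduction fails.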
