C++: do array indices need to be int?

In C++, I have a const array, arr, that contains 100 numbers between 0 and 80.
If I choose the elements of arr to be chars, will they be implicitly converted to int every time they are used as indices into another array, e.g. doublepointer[arr[i]]?

Yes, they will be converted to type int. According to the C++ Standard, the subscript operator [] "is interpreted in such a way that E1[E2] is identical to *((E1)+(E2))". And when the additive operator is used, "the usual arithmetic conversions are performed for operands of arithmetic or enumeration type." This means that objects of type char will be converted to int when they are used as indices in the subscript operator.
Take into account that plain char may behave either as unsigned char or as signed char, depending on the compiler options you select or that are set by default.
As for the types that can be used as indices in the subscript operator, they must be either unscoped enumerations or integral types.
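For instance, here is a minimal sketch (my own, not from the answer; the table and names are invented for illustration) showing a char element being promoted to int when used as an index:

#include <iostream>

int main() {
    const char arr[3] = {0, 40, 80};   // small, non-negative values
    double table[100] = {};            // hypothetical lookup table
    table[80] = 3.14;
    // arr[2] has type char; the usual arithmetic conversions promote it to int
    // before the pointer addition behind table[...] is evaluated.
    std::cout << table[arr[2]] << '\n';   // prints 3.14
}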

For a genuine array, the index is (converted to) some integral type, as explained in Vlad's answer.
But several standard library containers, e.g. std::map or std::vector, have their own operator [], whose argument might be a non-integral type (for example, the key type of a map). By convention, that operator is usually accompanied by an at member function.
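As a quick illustration (a sketch I'm adding, not part of the original answer), std::map accepts a non-integral key type in its operator []:

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, int> ages;
    ages["alice"] = 30;                    // operator[] takes a std::string key, not an integer
    std::cout << ages.at("alice") << '\n'; // at() is the bounds-checked counterpart
}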

No, it is not necessary to use an int as an array index. You can use characters as array indices, but they come with their own problems: plain char may be either signed or unsigned, depending on the implementation. If a user-provided character is used as an array index, the value may be negative, and in most cases that would mean memory outside of the array is accessed, resulting in unnecessary chaos. Hence int is recommended and most commonly used as an array index.

To answer the question in your title, an array index can be of any "unscoped enumeration or integral type". Array indexing is defined in terms of pointer addition; one operand must be a pointer to a completely-defined object type, and the other must be of some integral or unscoped enumeration type.
(Note that the word "object" here has nothing to do with object-oriented programming.)
There's nothing special about type int in this context. When you define an array type or object, you don't specify the type of the index, just the element type and the number of elements. When you use an index expression arr[i], the index can be of any integral type; for example unsigned int and long long are valid, and will not be implicitly converted.
To address the specific code you're asking about, char is an integral type, so it's perfectly valid as an array index -- but you need to be careful. The "usual arithmetic conversions" are applied to the index expression, which means that if it's of a type narrower than int it will be promoted to int or to unsigned int. These promotions do not apply to an index whose type is already at least as wide as int.
If plain char happens to be signed in your implementation, and if the value happens to be negative, then it will be promoted to a negative int value, which is probably not what you want. In your particular case, you say the values are between 0 and 80, all of which are within the range of positive values of type char. But just in case your requirements change later, you'd be better off defining your array with an element type of unsigned char.
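To make that concrete, here is a small sketch of my own, under the assumptions stated in the question (100 stored values in the range 0..80; the function and pointer names are invented for the example):

#include <cstddef>

// unsigned char as the element type guarantees the stored index is never negative,
// even on a platform where plain char is signed.
const unsigned char arr[100] = {0, 5, 80};   // remaining elements default to 0

double lookup(const double *doublepointer, std::size_t i) {
    // arr[i] is promoted to int (a non-negative value here) before the subscript is applied.
    return doublepointer[arr[i]];
}

int main() {
    double table[100] = {};   // hypothetical table indexed through arr
    table[80] = 1.5;
    return lookup(table, 2) == 1.5 ? 0 : 1;
}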

Related

C++: Does Comparing different sized integers cause UB? [duplicate]

So this is probably a really simple question, and if it were not about C++ I would just go ahead and check whether it works on my computer. Unfortunately, in C++ things often tend to work on a couple of systems while still being UB, and therefore fail on other systems.
Consider the following code snippet:
unsigned long long int a = std::numeric_limits< unsigned long long int >::max();
unsigned int b = 12;
bool test = a > b;
My question is: Can we compare integers of different size with one another without explicitly casting the smaller type to the bigger one using e.g. static_cast without running into undefined behavior (UB)?
In general there are three ways I can imagine this turning out:
The smaller type is implicitly converted to the bigger type before the comparison (either via a real conversion or by some clever way of being able to "pretend" it had been converted)
The bigger type is truncated to the size of the smaller one before comparison
This is not defined and one needs to add in an explicit cast in order to arrive at defined behavior
This is not undefined behavior. This is covered by the usual arithmetic conversions, which are detailed in subclause 8, paragraph 11.5 of the C++17 standard:
The integral promotions (7.6) shall be performed on both operands.
Then the following rules shall be applied to the promoted operands:
(11.5.1) If both operands have the same type, no further conversion is needed.
(11.5.2) Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
(11.5.3) Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
(11.5.4) Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
(11.5.5) Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
Rule (11.5.2) is what applies here. Since both types are unsigned, the smaller type is converted to the larger type, which can represent every value the smaller type can hold.
This is safe. C++ has what are called the usual arithmetic conversions, and they govern how the operands passed to the built-in binary operators are implicitly converted.
In this case, b is converted to unsigned long long int for you and then operator > is evaluated.
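A minimal sketch (mine, not from either answer) showing the conversion in action:

#include <iostream>
#include <limits>

int main() {
    unsigned long long a = std::numeric_limits<unsigned long long>::max();
    unsigned int b = 12;
    // b is converted to unsigned long long (rule 11.5.2: both operands are unsigned,
    // so the one with lesser rank is converted to the greater rank), then > is evaluated.
    std::cout << std::boolalpha << (a > b) << '\n';   // prints true
}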

What type is used in C++ to define an array size?

Compiling some test code with avr-gcc for an 8-bit microcontroller, for the lines
const uint32_t N = 65537;
uint8_t values[N];
I got the following compilation warning (which, really, should be an error by default):
warning: conversion from 'long unsigned int' to 'unsigned int' changes value from '65537' to '1' [-Woverflow]
uint8_t values[N];
Note that when compiling for this target, sizeof(int) is 2.
So it seems that an array size cannot exceed the maximum value of an unsigned int.
Am I correct? Is this GCC-specific or is it part of some C or C++ standard?
Before somebody remarks that an 8-bit microcontroller generally does not have enough memory for an array so large, let me just anticipate saying that this is beside the point.
size_t is generally considered the type to use, despite not being formally ratified by either the C or C++ standards.
The rationale for this is that sizeof(values) will have that type (that is mandated by the C and C++ standards), and the number of elements will necessarily be no greater than that value, since each element of an array occupies at least 1 byte.
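A quick way to confirm this on your own implementation (a sketch I'm adding; the array name is arbitrary):

#include <cstddef>
#include <type_traits>

int values_check[10];

// sizeof yields a value of type std::size_t, as mandated by the C and C++ standards.
static_assert(std::is_same<decltype(sizeof(values_check)), std::size_t>::value,
              "sizeof yields std::size_t");

int main() {}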
So it seems that an array size cannot exceed the size of an unsigned int.
That seems to be the case in your particular C[++] implementation.
Am I correct? Is this GCC-specific or is it part of some C or C++ standard?
It is not a characteristic of GCC in general, nor is it specified by either the C or C++ standard. It is a characteristic of your particular implementation: a version of GCC for your specific computing platform.
The C standard requires the expression designating the number of elements of an array to have an integer type, but it does not specify a particular one. I do think it's strange that your GCC seems to claim it's giving you an array with a different number of elements than you specified. I don't think that conforms to the standard, and I don't think it makes much sense as an extension. I would prefer to see it reject the code instead.
I'll dissect the issue with the rules in the "incorrekt and incomplet" ISO CPP standard draft n4659. Emphasis is added by me.
11.3.4 defines array declarations. Paragraph one contains
If the constant-expression [between the square brackets] (8.20) is present, it shall be a converted constant expression of type std::size_t [...].
std::size_t is from <cstddef> and defined as
[...] an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
Since it is imported via the C standard library headers the C standard is relevant for the properties of size_t. The ISO C draft N2176 prescribes in 7.20.3 the "minimal maximums", if you want, of integer types. For size_t that maximum is 65535. In other words, a 16 bit size_t is entirely conformant.
A "converted constant expression" is defined in 8.20/4:
A converted constant expression of type T is an expression, implicitly converted to type T, where the converted expression is a constant expression and the implicit conversion sequence contains only [any of 10 distinct conversions, one of which concerns integers (par. 4.7):]
— integral conversions (7.8) other than narrowing conversions (11.6.4)
An integral conversion (as opposed to a promotion which changes the type to equivalent or larger types) is defined as follows (7.8/3):
A prvalue of an integer type can be converted to a prvalue of another integer type.
7.8/5 then excludes the integral promotions from the integral conversions. This means that the conversions are usually narrowing type changes.
Narrowing conversions (which, as you'll remember, are excluded from the list of allowed conversions in converted constant expressions used for array sizes) are defined in the context of list-initialization, 11.6.4, par. 7
A narrowing conversion is an implicit conversion
[...]
(7.3) [1] — from an integer type [...] to an integer type that cannot represent all the values of the original type, except where the source is a constant expression whose value after integral promotions will fit into the target type.
This is effectively saying that the effective array size must be exactly the constant value as written, which is an entirely reasonable requirement for avoiding surprises.
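As an aside (my own sketch, illustrating the narrowing rule on an ordinary hosted platform rather than the 16-bit target discussed here): a constant bound is accepted as long as its value fits in size_t, while one that cannot fit is rejected as a narrowing conversion:

constexpr long long big = 65537;   // constant of type long long; the value fits in size_t

char ok[big];       // fine on a platform with a 32- or 64-bit size_t: not narrowing
// char bad[-1];    // ill-formed: a negative constant cannot be represented in size_t (narrowing)

int main() { return 0; }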
Now let's cobble it all together. The working hypothesis is that std::size_t is a 16 bit unsigned integer type with a value range of 0..65535. The integer literal 65537 is not representable in the system's 16 bit unsigned int and thus has type long. Therefore it will undergo an integer conversion. This will be a narrowing conversion because the value is not representable in the 16 bit size_t [2], so that the exception condition in 11.6.4/7.3, "value fits anyway", does not apply.
So what does this mean?
11.6.4/3.11 is the catch-all rule for the failure to produce an initializer value from an item in an initializer list. Because the initializer-list rules are used for array sizes, we can assume that the catch-all for conversion failure applies to the array size constant:
(3.11) — Otherwise, the program is ill-formed.
A conformant compiler is required to produce a diagnostic, which it does. Case closed.
[1] Yes, they sub-divide paragraphs.
[2] Converting an integer value of 65537 (in whatever type can hold the number, here probably a long) to a 16 bit unsigned integer is a defined operation. 7.8/2 details:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
The binary representation of 65537 is 1_0000_0000_0000_0001, i.e. only the least significant bit of the lower 16 bits is set. The conversion to a 16 bit unsigned value (which circumstantial evidence indicates size_t is) computes the [expression value] modulo 2^16, i.e. simply takes the lower 16 bits. This results in the value of 1 mentioned in the compiler diagnostics.
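To see the same wraparound numerically (my own sketch, using uint16_t to stand in for the 16-bit size_t of the target):

#include <cstdint>
#include <iostream>

int main() {
    long value = 65537;                                        // binary 1_0000_0000_0000_0001
    std::uint16_t low16 = static_cast<std::uint16_t>(value);   // keeps the value modulo 2^16
    std::cout << low16 << '\n';                                // prints 1, matching the warning
}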
In your implementation, size_t is defined as unsigned int and uint32_t is defined as long unsigned int. When you create a C-style array, the argument for the array size gets implicitly converted to size_t by the compiler.
This is why you're getting a warning: you're specifying the array size argument with a uint32_t that gets converted to size_t, and these types don't match.
This is probably not what you want. Use size_t instead.
The value returned by sizeof will be of type size_t.
size_t is generally used to hold the number of elements in an array, because it is guaranteed to be of sufficient size. size_t is always unsigned, but it is implementation-defined which underlying type it is. Lastly, it is implementation-defined whether the implementation can support objects of even SIZE_MAX bytes, or anywhere close to it.
[This answer was written when the question was tagged with C and C++. I have not yet re-examined it in light of OP’s revelation they are using C++ rather than C.]
size_t is the type the C standard designates for working with object sizes. However, it is not a cure-all for getting sizes correct.
size_t should be defined in the <stddef.h> header (and also in other headers).
The C standard does not require that expressions for array sizes, when specified in declarations, have the type size_t, nor does it require that they fit in a size_t. It is not specified what a C implementation ought to do when it cannot satisfy a request for an array size, especially for variable length arrays.
In your code:
const uint32_t N = 65537;
uint8_t values[N];
values is declared as a variable length array. (Although we can see the value of N could easily be known at compile time, it does not fit C’s definition of a constant expression, so uint8_t values[N]; qualifies as a declaration of a variable length array.) As you observed, GCC warns you that the 32-bit unsigned integer N is narrowed to a 16-bit unsigned integer. This warning is not required by the C standard; it is a courtesy provided by the compiler. More than that, the conversion is not required at all—since the C standard does not specify the type for an array dimension, the compiler could accept any integer expression here. So the fact that it has inserted an implicit conversion to the type it needs for array dimensions and warned you about it is a feature of the compiler, not of the C standard.
Consider what would happen if you wrote:
size_t N = 65537;
uint8_t values[N];
Now there would be no warning in uint8_t values[N];, as a 16-bit integer (the width of size_t in your C implementation) is being used where a 16-bit integer is needed. However, in this case, your compiler likely warns in size_t N = 65537;, since 65537 will have a 32-bit integer type, and a narrowing conversion is performed during the initialization of N.
However, the fact that you are using a variable length array suggests you may be computing array sizes at run-time, and this is only a simplified example. Possibly your actual code does not use constant sizes like this; it may calculate sizes during execution. For example, you might use:
size_t N = NumberOfGroups * ElementsPerGroup + Header;
In this case, there is a possibility that the wrong result will be calculated. If the variables all have type size_t, the result may easily wrap (effectively overflow the limits of the size_t type). In this case, the compiler will not give you any warning, because the values are all the same width; there is no narrowing conversion, just overflow.
Therefore, using size_t is insufficient to guard against errors in array dimensions.
An alternative is to use a type you expect to be wide enough for your calculations, perhaps uint32_t. Given that NumberOfGroups and the other operands have type uint32_t, then:
const uint32_t N = NumberOfGroups * ElementsPerGroup + Header;
will produce a correct value for N. Then you can test it at run-time to guard against errors:
if ((size_t) N != N)
    Report error…
uint8_t values[(size_t) N];

Confusion regarding types, overflows and UB in pointer-integral addition

I used to think that adding an integral type to a pointer (provided that the pointer points to an array of a certain size etc. etc.) is always well defined, regardless of the integral type. The C++11 standard says ([expr.add]):
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the (i + n)-th and (i − n)-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
On the other hand, it was brought to my attention recently that the built-in add operators for pointers are defined in terms of ptrdiff_t, which is a signed type (see 13.6/13). This seems to hint that if one does a malloc() with a very large (unsigned) size and then tries to reach the end of the allocated space via a pointer addition with a std::size_t value, this might result in undefined behaviour, because the unsigned std::size_t would be converted to the signed ptrdiff_t, potentially overflowing it.
I imagine similar issues would arise, e.g., in the operator[]() of std::vector, which is implemented in terms of an unsigned size_type. In general, it seems to me like this would make practically impossible to fully use the memory storage available on a platform.
It's worth noting that neither GCC nor Clang complains about signed-unsigned integral conversions when adding unsigned values to pointers, even with all the relevant diagnostics turned on.
Am I missing something?
EDIT: I'd like to clarify that I am talking about additions involving a pointer and an integral type (not two pointers).
EDIT 2: An equivalent way of formulating the question might be this: does the second line of the following code result in UB if ptrdiff_t has a smaller positive range than size_t?
char *ptr = static_cast<char *>(std::malloc(std::numeric_limits<std::size_t>::max()));
auto end = ptr + std::numeric_limits<std::size_t>::max();
Your question is based on a false premise.
Subtraction of pointers produces a ptrdiff_t §[expr.add]/6:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2).
That does not, however, mean that addition is defined in terms of ptrdiff_t. Rather the contrary, for addition only one conversion is specified (§[expr.add]/1):
The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.
The "usual arithmetic conversions" are defined in §[expr]/10. This includes only one conversion from unsigned type to signed type:
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
So, while there may be some room for question about exactly what type the size_t will be converted to (and whether it's converted at all), there's no question on one point: the only way it can be converted to a ptrdiff_t is if all its values can be represented without change as a ptrdiff_t.
So, given:
size_t N;
T *p;
...the expression p + N will never fail because of some (imagined) conversion of N to a ptrdiff_t before the addition takes place.
Since §13.6 is being mentioned, perhaps it's best to back up and look carefully at what §13.6 really is:
The candidate operator functions that represent the built-in operators defined in Clause 5 are specified in this subclause. These candidate functions participate in the operator overload resolution process as described in 13.3.1.2 and are used for no other purpose.
[emphasis added]
In other words, the fact that §13.6 defines an operator that adds a ptrdiff_t to a pointer does not mean that when any other integer type is added to a pointer, it's first converted to a ptrdiff_t, or anything like that. More generally, the operators defined in §13.6 are never used to carry out any arithmetic operations.
With that, and the rest of the text you quoted from §[expr.add], we can quickly conclude that adding a size_t to a pointer can overflow if and only if there aren't that many elements in the array after the pointer.
Given the above, one more question probably occurs to you. If I have code like this:
char *p = huge_array;
size_t N = sizeof(huge_array);
char *p2 = p + N;
ptrdiff_t diff = p2 - p;
...is it possible that the final subtraction will overflow? The short and simple answer to that is: Yes, it can.
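If that possibility matters in your code, one defensive pattern (a sketch of my own, not from the answer) is to refuse sizes above PTRDIFF_MAX before doing any pointer arithmetic whose result might later be subtracted:

#include <cstddef>
#include <cstdint>
#include <cstdlib>

char *checked_end(char *p, std::size_t n) {
    // If n exceeded PTRDIFF_MAX, a later (p + n) - p could overflow ptrdiff_t,
    // so reject such sizes up front instead of risking undefined behavior.
    if (n > static_cast<std::size_t>(PTRDIFF_MAX))
        std::abort();
    return p + n;
}

int main() {
    char buffer[64];
    return checked_end(buffer, sizeof buffer) - buffer == 64 ? 0 : 1;
}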

sizeof() in C/C++ for arrays

So C/C++ arrays don't know about their length, right? But then how can the function sizeof(array) work and give us the proper size in bytes when it shouldn't be able to know the number of elements in the array?
So C/C++ arrays don't know about their length, right?
Your assumption is wrong. With the exception of variable length arrays introduced in C99, arrays in both C and C++ have a size that is known at compile time. The compiler knows their size.
Your confusion probably comes from the fact that there are times when an array name decays into a pointer to its first element (such as when it is passed as a function argument); it's true that the size information is lost there.
But when sizeof is used on an array, the array is not converted to a pointer. This is your other confusion: sizeof is not a function, it's an operator.
I will quote the relevant portions of C99 standard. §6.5.3.4 ¶2 says
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
It also says in the same section §6.5.3.4 ¶1
The sizeof operator shall not be applied to an expression that has function type or an incomplete type.
About the array type, §6.2.5 ¶20 says
An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. Array types are characterized by their element type and by the number of elements in the array.
It again says in §6.2.5 ¶22
An array type of unknown size is an incomplete type.
So to summarize the above, the size of an array is known to the compiler (and can be obtained with the sizeof operator) whenever you have also specified the size of the array, i.e., when it is a complete type.
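A small sketch (mine) contrasting the two situations described above:

#include <iostream>

void takes_pointer(int *p) {
    // Inside the function the array has decayed to a pointer,
    // so sizeof(p) is the size of a pointer, not of the array.
    std::cout << sizeof(p) << '\n';
}

int main() {
    int a[10];
    std::cout << sizeof(a) << '\n';                  // 40 on a typical 4-byte-int platform
    std::cout << sizeof(a) / sizeof(a[0]) << '\n';   // 10, the number of elements
    takes_pointer(a);                                // the array decays to int* here
}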

Float Values as an index in an Array in C++

Can a float value be used as the index of an array? What will happen if an expression used as an index evaluates to a float value?
The float value will be converted to int (this can give a warning or an error depending on the compiler's warning level):
s1 = q[12.2]; // same as q[12]
s2 = q[12.999999]; // same as q[12]
s3 = q[12.1/6.2]; // same as q[1]
Yes. But it's pointless. The float value will be truncated to an integer.
(You could use std::map<float, T> instead, but most of the time you'll miss the intended keys because of floating-point inaccuracy.)
A C++ array is a contiguous sequence of memory locations. a[x] means "the xth memory location after the one pointed to by a."
What would it mean to access the 12.4th object in a sequence?
It will be converted to int.
This is an error. In [expr.sub]:
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “pointer to T” and the other shall have unscoped enumeration or integral type.
I am not aware of a clause in the standard that specifies that conversion should happen here (admittedly, I would not be surprised if such a clause existed), although testing with ideone.com did produce a compilation error.
However, if you're subscripting a class rather than a pointer — e.g. std::vector or std::array — then the overload of operator[] will have the usual semantics of a function call, and floating-point arguments will get converted to the corresponding size_type.
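A brief sketch of my own contrasting the two cases discussed above (your compiler may additionally warn about the implicit narrowing in the vector case):

#include <vector>

int main() {
    int q[20] = {};
    std::vector<int> v(20, 0);

    // int s1 = q[12.2];   // ill-formed: the built-in subscript requires an integral
                           // or unscoped enumeration type, not a floating-point type
    int s2 = v[12.2];      // accepted: operator[] is an ordinary function call, so 12.2
                           // is implicitly converted to the vector's size_type (value 12)
    return s2;
}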