Is string subscript an associated index? - c++

The subscript operator ([]) takes a std::string::size_type value.
The operator returns a reference to the character at the given
position. The value in the subscript is referred to as "a subscript" (C++ Primer, 5th ed., pp. 93-94).
and
A vector is a collection of objects, all of which have the same type. Every object in the collection has an associated index, which gives
access to that object (C++ Primer, 5th ed., p. 96).
Question:
Is string subscript an associated index? If not, what is the difference between the subscript of the std::string type and the associated index of the collection/vector?

Think of "index" as "the sequential number of an item," not as "the lookup table in a book."
What they're saying about vectors is that the elements in them can be accessed through sequential numeric indices: v[0], v[1], etc.
The exact same holds for strings and the characters in them.

According to std::vector::operator[], the function:
Returns a reference to the element at specified location pos. No bounds checking is performed.
According to std::basic_string::operator[], the function:
Returns a reference to the character at specified location pos. No bounds checking is performed. If pos > size(), the behavior is undefined.
Thus, they are pretty much the same thing. The term associated index means exactly what it sounds like; it is the index associated with the element, nothing more.

The wording here is rather precise, but there is no real difference for these two simple cases. For both string and vector, X[0] denotes the first element of X. That is to say, 0 is the associated index of the first element of X, and 0 is also the argument to operator[], aka the subscript.
To see an example that is not so simple, consider std::string_view. You can have a string_view of the 100th to 200th character of a string. Now view[5] has subscript 5, but it refers to the 105th character in the underlying string.

Related

Alternatives or enhancements to std::find() on a native c++ pointer (such as uint16 *)

I am trying to come up with a single-line C++ conditional check for the presence of a value within a buffer (like Python's if value in list: check).
std::find seemed to be the right fit here.
const char *buf = "01020304";
int buf_len = 8;
const uint16_t *ptr = std::find((const uint16_t *) buf, (const uint16_t *) buf + 3, (uint16_t)13360); // on a little-endian machine, 13360 (0x3430) is the 2 bytes of "04"
std::cout << "\n(ptr-buf):" << (ptr - (const uint16_t *) buf); // prints 3 for any value, even 0 instead of 13360, because the last 2 bytes (buf+6, buf+7) are never searched
std::find seems to search in the [first, last) range of a buffer. This makes sense for STL types like vector. However, for raw pointers, the only way to make std::find() look at the last element of buf involves passing an offset beyond buf (e.g. buf+8).
Is this a safe thing to do? I am guessing not. Is there a better alternative to using std::find to do a single-line exists check on a char *buffer?
When using pointers with STL algorithms, it is valid and correct to use a 'one past the end' pointer as the end iterator for a range. It is not valid to dereference a pointer one past the end of an array but you can test against one and perform certain arithmetic operations with one.
The standard specifies that it is valid to perform comparisons on a 'one past the end' pointer, section 5.9.3, 'Relational Operators':
5.9.3 Comparing pointers to objects is defined as follows:
(3.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
(3.2) — If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
It also specifies addition and subtraction on pointers such that 'one past the end' pointers behave correctly in algorithms like std::distance(), std::advance(), etc.
In addition the standard has this to say regarding STL iterators:
24.1.6 Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. ...
Past-the-end pointers are standard C too; for example, it is safe to produce them with arithmetic. From the C11 standard, 6.5.6 (Additive operators), paragraph 8:
... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Is it still legal to do pointer arithmetic on a deleted array?

Today I wrote something which looked like this:
void foo(std::vector<char>&v){
v.push_back('a');
char*front=&v.front();
char*back=&v.back();
size_t n1=back-front+1;
v.push_back('b');//This could reallocate the vector elements
size_t n2=back-front+1;//Is this line valid or Undefined Behavior ?
}
If a reallocation occurs when I push 'b' back, may I still compute the difference of my two pointers?
After reading the relevant passage of the standard a few times, I still cannot make up my mind on this point.
C++11 5.7.6:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2). As with any other arithmetic overflow, if the result does not fit in the space provided, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i − j provided the value fits in an object of type std::ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object. Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.
Of course I know that it works; I just wonder whether it is legal.
Pointers to deleted objects are toxic: don't touch them for anything other than giving them a new value. A memory tracking system may trap any use of a reclaimed pointer value. I'm not aware of any such system in existence, however.
The relevant quote is 3.7.4.2 [basic.stc.dynamic.deallocation] paragraph 4:
If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value, the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers to any part of the deallocated storage. The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined.
When resizing a std::vector<...> it jumps through a number of hoops (allocators) and, by default, eventually calls a deallocation function.
Strictly speaking, it's UB. But you can always convert your char * pointers to uintptr_t (provided it is present) and then safely subtract the resulting integers.
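#include <cstddef>
#include <cstdint>
#include <vector>

void foo(std::vector<char>& v) {
    v.push_back('a');
    auto front = reinterpret_cast<std::uintptr_t>(&v.front());
    auto back  = reinterpret_cast<std::uintptr_t>(&v.back());
    std::size_t n1 = back - front + 1;
    v.push_back('b'); // This could reallocate the vector elements
    std::size_t n2 = back - front + 1; // integer arithmetic on the old addresses: defined, but n2 == n1
}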
This particular case is safe but ugly and misleading.
The line v.push_back('b'); can cause a reallocation of your container. In that case the next line uses front and back pointers that no longer point to existing elements. Computing the difference of two addresses is safe even if they are dangling pointers; what is not safe is dereferencing them.
The correct solution is to use the vector::size() function, which will always be in sync. If you (for some reason) don't want to call vector::size(), you should at least use ++n1.

What is std::string(itr, itr) supposed to do?

The cplusplus.com documentation for the std::string constructor taking two input iterators states, in part:
Copies the sequence of characters in the range [first,last), in the same order.
first, last:
Input iterators to the initial and final positions in a range. The range used is [first,last), which includes all the characters between first and last, including the character pointed by first but not the character pointed by last.
What does this mean in the degenerate case where first == last? On the one hand first is included and on the other last is excluded? What does the official C++ standard say should happen in this case? Should an exception be thrown?
I don't know what documentation it is you're reading, but the standard says (§21.4.2/15):
[..] constructs a string from the values in the range [begin, end), as indicated in the Sequence Requirements table
And the Sequence requirements table (Table 100) defines X a(i, j) for a valid range [i, j) as:
Constructs a sequence container equal to the range [i, j)
A range is valid when the second iterator is reachable from the first (through incrementing). For two iterators that are equal, the range is empty. See §24.2.1/7:
A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an empty range; in general, a range [i,j) refers to the elements in the data structure starting with the element pointed to by i and up to but not including the element pointed to by j. Range [i,j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
So if first == last, as you say, you will get an empty string. If last is not reachable from first, you have undefined behaviour.
The range is empty, so there's nothing to copy. The result is an empty string.
What does this mean in the degenerate case where first == last?
It means that the input range is empty, so the string will be empty.
What does the standard say should happen in this case?
C++11 24.2.1/7 says:
A range [i,i) is an empty range
One thing I've used it for several times is constructing a std::string from a substring in a large C-style string (const char*). You can pass it two const char* pointers and it will construct a string from the characters starting at the first pointer and ending one before the second pointer.
If first == last, the result is an empty string. If first > last, behavior is undefined (thanks Mooing Duck)

How is a 2 dimensional array implemented in C++ [duplicate]

Possible Duplicate:
2D-array as argument to function
How are 2D arrays implemented in C++? Is it contiguous in memory?
i.e., if a is an array, does the first address contain element a(0,0), the next address a(0,1), then a(1,0), a(1,1), and so on?
Yes, it is contiguous, in row-major order. Suppose you have a 2D array named a[3][3]. Then the elements are laid out in memory in this order: a[0][0], a[0][1], a[0][2], a[1][0], a[1][1], a[1][2], a[2][0], a[2][1], a[2][2].
Here are more details with an example.
Given a declaration T D[C], where T is a type name, D an identifier and C an integral constant expression, the Standard says (highlighting mine):
(§8.3.4/1) [...] then the type of the identifier [...] is an array type. [...] The constant expression specifies the bound of (number of elements in) the array. If the value of the constant expression is N, the array has N elements numbered 0 to N-1, and the type of the identifier of D is “derived-declarator-type-list array of N T”. An object of array type contains a contiguously allocated non-empty set of N subobjects of type T. [...]
And:
(§8.3.4/3) When several “array of” specifications are adjacent, a multidimensional array is created; [...]
As well as:
(§8.3.4/9) [ Note: It follows from all this that arrays in C++ are stored row-wise (last subscript varies fastest) and that the first subscript in the declaration helps determine the amount of storage consumed by an array but plays no other part in subscript calculations. — end note ]
Conclusion
All this makes it clear that T a[N][M] is a contiguously stored list of N objects, each of which is a contiguously stored list of M objects of type T. So yes, the whole two-dimensional array is one contiguously stored object.
Does that mean you can use one combined index to access the elements directly?
So, given an array int a[10][5], can you use a[0][23] instead of a[4][3]? Strictly speaking, no, because that is a violation of the first rule above, whereby only indexes 0..4 are valid for the second index. However, as far as that particular expression is concerned, if you were to consider a[0] as a pointer p to the first element of the first row of the array, and a[0][23] as *(p+23), you could be sure to access the correct element. More on this in this existing question.

Is it undefined behavior to form a pointer range from a stack address?

Some C or C++ programmers are surprised to find out that even storing an invalid pointer is undefined behavior. However, for heap or stack arrays, it's okay to store the address of one past the end of the array, which allows you to store "end" positions for use in loops.
But is it undefined behavior to form a pointer range from a single stack variable, like:
char c = 'X';
char* begin = &c;
char* end = begin + 1;
for (; begin != end; ++begin) { /* do something */ }
Although the above example is pretty useless, this might be useful in the event that some function expects a pointer range, and you have a case where you simply have a single value to pass it.
Is this undefined behavior?
This is allowed, the behavior is defined and both begin and end are safely-derived pointer values.
In the C++ standard section 5.7 ([expr.add]) paragraph 4:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
When using C, a similar clause can be found in the C99 standard (N1256), section 6.5.6 paragraph 7.
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
As an aside, in section 3.7.4.3 ([basic.stc.dynamic.safety]) "Safely-derived pointers" there is a footnote:
This section does not impose restrictions on dereferencing pointers to memory not allocated by ::operator new. This maintains the ability of many C++ implementations to use binary libraries and components written in other languages. In particular, this applies to C binaries, because dereferencing pointers to memory allocated by malloc is not restricted.
This suggests that pointer arithmetic throughout the stack is implementation-defined behavior, not undefined behavior.
I believe that legally, you may treat a single object as an array of size one. In addition, it is most definitely legal to take a pointer one past the end of any array as long as it's not dereferenced. So I believe that it is not UB.
It is not Undefined Behavior as long as you don't dereference the invalid iterator.
You are allowed to hold a pointer to memory beyond your allocation but not allowed to dereference it.
5.7-5 of ISO14882:2011(e) states:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Unless I overlooked something there, the addition only applies to pointers pointing to the same array. For everything else, the last sentence applies: "otherwise, the behaviour is undefined"
edit:
Indeed, when you add 5.7-4 it turns out that the operation you do is (virtually) on an array, thus the sentence does not apply:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
In general it would be undefined behaviour to point beyond the memory space, however there is an exception for "one past the end", which is valid according to the standard.
Therefore in the particular example, &c+1 is a valid pointer but cannot be safely dereferenced.
You could define c as an array of size 1:
char c[1] = { 'X' };
Then the undefined behavior would become defined behavior.
Resulting code should be identical.