Difference in c_str function specification between C++03 and C++11

In the C++ reference entry for std::string::c_str(), the following appears:
Return value
Pointer to the underlying character storage.
data()[i] == operator[](i) for every i in [0, size()) (until C++11)
data() + i == &operator[](i) for every i in [0, size()] (since C++11)
I do not understand the difference between the two, except for the range increase by one element since C++11.
Isn't the former statement data()[i] == operator[](i) also true for the latter?

Apart from the range growing by one element in C++11, there is still a significant difference between:
data()[i] == operator[](i)
and:
data() + i == &operator[](i)
The main difference is the & (address-of) operator in the second expression.
The old wording allowed a copy to be made when a write operation occurred, since the pointer returned by data() could point to a different buffer from the one holding the original string.
The other difference, data()[i] versus data() + i, is not significant, since data()[i] is by definition *(data() + i).
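Here is a minimal sketch, assuming a C++11 (or later) compiler, that checks the stronger postcondition directly on an ordinary string. The helper name data_matches_subscript is hypothetical. Equal addresses imply equal values, which is why the weaker pre-C++11 form follows from the C++11 one:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks the C++11 postcondition
// data() + i == &operator[](i) for every i in the closed range [0, size()].
bool data_matches_subscript(const std::string& s)
{
    for (std::size_t i = 0; i <= s.size(); ++i) {   // <=, so index size() is included
        if (s.data() + i != &s[i]) return false;    // same address, not just same value
        if (s.data()[i] != s[i]) return false;      // the weaker pre-C++11 form
    }
    return true;
}
```

Usage: assert(data_matches_subscript(std::string("hello"))); should pass on a conforming C++11 library, and so should the same check on an empty string.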
A difference between C++03 and C++11 is that in the former, the standard did not explicitly specify whether a std::string would have a null terminator. In the latter, however, this is specified.
In other words: Will std::string always be null-terminated in C++11? Yes.

Note the closing bracket difference:
[0, size())
[0, size()]
The first is a half-open range (the element at index size() is excluded), while the second is a closed range (the element at index size() is included).
Before C++11, the presence of a terminating null was not addressed here, while in C++11 accessing the character at position size() is well-defined.
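This small sketch, assuming C++11 or later, shows what the closed range [0, size()] gives you. The element at index size() exists and is '\0', whether you reach it through operator[] or through data(). The function name has_terminator is made up for this illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical check: in C++11, index size() holds '\0' via both
// operator[] and data(), and c_str() returns the same pointer as data().
bool has_terminator(const std::string& s)
{
    return s[s.size()] == '\0'          // operator[] is valid for pos == size()
        && s.data()[s.size()] == '\0'   // data() is null-terminated since C++11
        && s.c_str() == s.data();       // both point at &s[0]
}
```

For instance, assert(has_terminator(std::string("a\0b", 3))); is expected to hold as well. The terminator sits at size() regardless of any embedded nulls.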
As for the difference between data()[i] == operator[](i) and data() + i == &operator[](i), the second places more restrictions on potential implementations. In the first case, the pointer to the buffer returned by data() may differ from the address of the character that operator[] returns a reference to. This could happen when a new buffer is created after the non-const operator[] is invoked on a copied string.
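The sketch below illustrates the guarantee from the non-const side. It uses a hypothetical function name and assumes C++11 or later. It copies a string and writes through the reference returned by non-const operator[]. Afterwards, data() still points at that same character and the original is untouched. This only demonstrates the required behaviour on a conforming library; it does not by itself show how the library stores strings:

```cpp
#include <cassert>
#include <string>

// Hypothetical demo: write through non-const operator[] on a copy and
// confirm data() addresses the same character (required by C++11 wording).
bool write_through_copy(const std::string& original, char c)
{
    assert(!original.empty() && original[0] != c);  // precondition for this demo
    std::string copy = original;
    char& ref = copy[0];            // non-const operator[] on the copied string
    ref = c;
    return copy.data() == &ref      // data() + 0 == &operator[](0)
        && copy[0] == c             // the write is visible through operator[]
        && original[0] != c;        // and it did not affect the original
}
```

For example, write_through_copy(std::string("hello"), 'j') is expected to return true.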

Prior to C++11, it was unspecified whether the string data was null-terminated or not. C++11 says it must be null-terminated.

Related

Is calling std::string::assign(const CharT* s, size_type count) with count 0 safe? [duplicate]

I have a function which returns a pointer and a length, and I want to call std::string::assign(pointer, length). Do I have to make a special case (calling clear) when length is zero and the pointer may be nullptr?
The C++ standard says:
21.4.6.3 basic_string::assign
basic_string& assign(const charT* s, size_type n);
Requires: s points to an array of at least n elements of charT.
So what if n is zero? What is an array of zero characters and how does one point to it?
Is it valid to call
s.assign(nullptr, 0);
or is it undefined behavior?
The implementation of libstdc++ appears not to dereference the pointer s when the size n is zero, but that's hardly a guarantee.
Pedantically, a null pointer does not meet the requirement of pointing to an array of size >= 0, and therefore the standard does not guarantee the behaviour (it's UB).
On the other hand, the implementation wouldn't be allowed to dereference the pointer if n is zero, because the pointer could be to an array of size zero, and dereferencing such a pointer would have undefined behaviour. Besides, there wouldn't be any need to do so, because nothing is copied.
The above reasoning does not mean that it is OK to ignore the UB. But, if there is no reason to disallow s.assign(nullptr, 0) then it could be preferable to change the wording of the standard to "If n is greater than zero, then s points to ...". I don't know of any good reason to disallow it, but neither can I promise that a good reason doesn't exist.
Note that adding a check is hardly complicated:
s.assign(ptr ? ptr : "", n);
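If pointer/length pairs with a possibly null pointer come up often, the check can be wrapped once. This sketch assumes that a null pointer only ever arrives together with a zero length. The name assign_maybe_null is hypothetical and not part of std::string. Unlike the one-liner above, it flags a null pointer with n > 0 in debug builds. In that case, ptr ? ptr : "" would silently read past the one-character literal if n were larger than 1:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of std::string): never passes a null
// pointer to std::string::assign.
void assign_maybe_null(std::string& s, const char* p, std::size_t n)
{
    if (p == nullptr) {
        assert(n == 0);   // null with n > 0 is a caller bug
        s.clear();        // same result as assigning zero characters
        return;
    }
    s.assign(p, n);       // p points to at least n chars, as required
}
```

Usage: assign_maybe_null(s, ptr, length); can replace a direct s.assign(ptr, length) at the call site.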
What is an array of zero characters
For example, new char[0]. Arrays with automatic or static storage duration cannot have zero size.
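To make the zero-element case concrete, here is a small sketch with a hypothetical function name. It hands std::string::assign a pointer to a genuine zero-length array obtained from new char[0]. The pointer is non-null and satisfies "points to an array of at least 0 elements", yet it is never dereferenced:

```cpp
#include <cassert>
#include <string>

// Hypothetical demo: assign from a real zero-element array.
std::string assign_from_empty_array()
{
    char* empty = new char[0];   // array of zero chars; throws on failure
    assert(empty != nullptr);    // a successful new[] never yields null
    std::string s = "old";
    s.assign(empty, 0);          // meets "points to an array of at least 0 elements"
    delete[] empty;
    return s;
}
```

Usage: assert(assign_from_empty_array().empty()); should pass.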
Well, as you point out, the standard says "s points to an array...". A null pointer does not point to an array of any number of elements, not even 0 elements. Also, note that s points to "an array of at least n elements...". So it's clear that if n is zero, you can still pass a legitimate pointer to an array.
Overall, std::string's API is not well-guarded against null pointers to charT. So you should always make sure that pointers you hand off to it are non-null.
I am not sure why an implementation would dereference any pointer to an array whose length is given as zero.
That said, I would err on the side of caution. You could argue that you are not meeting the standard's requirement:
21.4.6.3 basic_string::assign
8 Requires: s points to an array of at least n elements of charT
because nullptr is not pointing to an array.
So technically the behaviour is undefined.
From the Standard (2.14.7) [lex.nullptr]:
The pointer literal is the keyword nullptr. It is a prvalue of type std::nullptr_t. [ Note: std::nullptr_t
is a distinct type that is neither a pointer type nor a pointer to member type ... ]
std::nullptr_t can be implicitly converted to the null pointer value of any pointer type, as per 4.10.1 [conv.ptr]. Regardless of the type of the null pointer, the fact remains that it points at nothing.
Thus, it doesn't meet the requirement that s points to an array of at least n elements of charT.
It seems to be undefined behavior.
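The two facts quoted above can be checked at compile time and at run time. This is a minimal sketch assuming C++11's <type_traits>; the function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// std::nullptr_t is a distinct type, not itself a pointer type...
static_assert(!std::is_pointer<std::nullptr_t>::value,
              "nullptr_t is not a pointer type");

// ...but nullptr converts to the null pointer value of any pointer type,
// which points at no object, let alone an array of n chars.
bool converts_to_null_char_pointer()
{
    const char* p = nullptr;   // implicit conversion from std::nullptr_t
    return p == nullptr;
}
```

Usage: assert(converts_to_null_char_pointer()); should pass.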
Interestingly, according to this answer, the C++11 Standard clearly stated that s must not be a null pointer in the basic_string constructor, but this wording has since been removed.

Properties of a pointer to a zero length array

Consider
int main()
{
    auto a = new int[0];
    delete[] a; // So there's no memory leak
}
Between the copy initialisation and deletion, are you allowed to read the pointer at a + 1?
Furthermore, does the language permit the compiler to set a to nullptr?
Per recent CWG reflector discussion as a result of editorial issue 3178, new int[0] produces what is currently called a "past-the-end" pointer value.
It follows that a cannot be null, and a + 1 is undefined by [expr.add]/4.
auto a = new int[0];
According to [basic.compound.3], the value stored in a must be one of the following:
A pointer to an object (of type int)
A pointer past the end of an object
A null pointer value
An invalid pointer value
We can rule out the first possibility since there were no objects of type int constructed. The third possibility is ruled out since C++ requires a non-null pointer to be returned (see [basic.stc.dynamic.allocation.2]). Thus we are left with two possibilities: a pointer past the end of an object or an invalid pointer.
I would be inclined to view a as a past-the-end pointer, but I don't have a reputable reference to definitively establish that. (There is, though, a strong implication of this in [basic.stc], seeing how you can delete this pointer.) So I'll entertain both possibilities in this answer.
Between the copy initialisation and deletion, are you allowed to read the pointer at a + 1?
The behavior is undefined, as dictated by [expr.add.4], regardless of which possibility from above applies.
If a is a past-the-end pointer, then it is considered to point to the hypothetical element at index 0 of an array with no elements. Adding the integer j to a is defined only when 0≤0+j≤n, where n is the size of the array. In our case, n is zero, so the sum a+j is defined only when j is 0. In particular, adding 1 is undefined.
If a is invalid, then we cleanly fall into "Otherwise, the behavior is undefined." (Not surprisingly, the cases that are defined cover only valid pointer values.)
Furthermore, does the language permit the compiler to set a to nullptr?
No. From the above-mentioned [basic.stc.dynamic.allocation.2]: "If the request succeeds, the value returned by a replaceable allocation function is a non-null pointer value". There is also a footnote calling out that C++ (but not C) requires a non-null pointer in response to a zero request.
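Here is a short sketch of the operations this answer treats as defined on the result of new int[0]. The pointer is non-null, a + 0 is fine, and the empty range [a, a + 0) can be iterated zero times. The function name is hypothetical. The undefined operations (*a, a + 1) are deliberately left out:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical demo: count the elements of the range from new int[0].
std::size_t count_elements_of_empty_array()
{
    int* a = new int[0];
    assert(a != nullptr);          // non-null on success, per [basic.stc.dynamic.allocation]
    int* end = a + 0;              // a + j is defined only for j == 0 here
    std::size_t count = 0;
    for (int* p = a; p != end; ++p)
        ++count;                   // never executes: the range is empty
    // Neither *a nor a + 1 appears: both would be undefined behaviour.
    delete[] a;
    return count;
}
```

Usage: count_elements_of_empty_array() is expected to return 0.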

Alternatives or enhancements to std::find() on a native c++ pointer (such as uint16 *)

I am trying to come up with a single-line C++ conditional to check for the presence of a value within a buffer (like Python's if value in list: check).
std::find seemed to be the right fit here.
const char *buf = "01020304";
int buf_len = 8;
const uint16_t *ptr = std::find((const uint16_t *) buf, (const uint16_t *) buf + 3, (uint16_t)13360); // 13360 (0x3430) corresponds to the 2 bytes of "04" on a little-endian machine
std::cout << "\n(ptr-buf):" << (ptr - (const uint16_t *) buf); // prints 3 for any value, even if 0 is passed instead of 13360, since the last 2 bytes (buf+6, buf+7) are not searched
std::find seems to search the [first, last) range of a buffer. This makes sense for STL types like vector. However, for raw pointers, the only way to make std::find() look at the last element of buf involves passing a pointer beyond the end of buf (e.g. buf+8).
Is this a safe thing to do? I am guessing not. Is there a better alternative to using std::find to do a single-line exists check on a char *buffer?
When using pointers with STL algorithms, it is valid and correct to use a 'one past the end' pointer as the end iterator for a range. It is not valid to dereference a pointer one past the end of an array but you can test against one and perform certain arithmetic operations with one.
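Applied to the question, a one-past-the-end pointer lets a single expression search every element. The sketch below is a hypothetical helper, contains, written for a properly typed array. It assumes the data is available as uint16_t values rather than reinterpreting a char buffer, because reading char storage through a uint16_t pointer raises alignment and strict-aliasing concerns:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: "value in buffer" check over n elements.
// first + n is the one-past-the-end pointer; std::find compares
// against it but never dereferences it.
template <typename T>
bool contains(const T* first, std::size_t n, const T& value)
{
    const T* last = first + n;   // valid to form, not to dereference
    return std::find(first, last, value) != last;
}
```

Called as contains(buf, 4, static_cast<std::uint16_t>(13360)) on a four-element uint16_t array, the search includes the last element. That is because the end pointer is buf + 4, not buf + 3.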
The standard specifies that it is valid to perform comparisons on a 'one past the end' pointer in section 5.9.3, 'Relational operators':
5.9.3 Comparing pointers to objects is defined as follows:
(3.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
(3.2) — If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
It also specifies addition and subtraction on pointers such that 'one past the end' pointers behave correctly in algorithms like std::distance(), std::advance(), etc.
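The sketch below uses a hypothetical function name. It exercises the comparisons and arithmetic described above on a one-past-the-end pointer without ever dereferencing it:

```cpp
#include <cassert>
#include <iterator>

// Hypothetical demo: defined operations on a one-past-the-end pointer.
bool past_the_end_is_well_behaved()
{
    int arr[4] = {10, 20, 30, 40};
    int* last_elem = &arr[3];
    int* end = arr + 4;                        // one past the last element

    bool ok = end > last_elem;                 // 5.9.3: past-the-end compares greater
    ok = ok && std::distance(arr, end) == 4;   // subtraction gives the element count
    int* it = arr;
    std::advance(it, 4);                       // advancing onto end is fine
    ok = ok && it == end;
    ok = ok && end - 1 == last_elem;           // stepping back reaches a real element
    return ok;
}
```

Usage: assert(past_the_end_is_well_behaved()); should pass.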
In addition the standard has this to say regarding STL iterators:
24.1.6 Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. ...
Past-the-end pointers are standard C too; for example, it is safe to produce them with arithmetic. From the C11 standard, 6.5.6 (additive operators), paragraph 8:
... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

For a C++ string s, is s[s.size()] legal and always equal to '\0'?

If s[s.size()] == '\0', then it is convenient to treat it as a sentinel for some algorithms. I did a test and it's always equal to '\0', but some books say it's illegal to access s[s.size()].
Yes, that will give a reference to a zero-valued character, as specified by the C++11 standard:
Requires: pos <= size().
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
where charT() is a value-constructed character, which will have the value zero. T is presumably a typo for charT. The C++14 draft (and presumably the final standard) says the same thing, with the typo fixed.
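As an illustration of the sentinel idea from the question, this sketch (hypothetical function name, C++11 or later assumed) scans leading digits without a separate bounds check. The loop is guaranteed to stop at index size() because '\0' is not a digit:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical example: s[s.size()] == '\0' acts as a sentinel, so the
// scan needs no explicit i < s.size() test.
std::size_t leading_digits(const std::string& s)
{
    std::size_t i = 0;
    while (s[i] >= '0' && s[i] <= '9')   // stops at '\0' at index size() at the latest
        ++i;
    return i;
}
```

The trick only works when the loop condition is false for '\0'. Also, the character at size() must not be modified. For example, leading_digits("123abc") is expected to return 3.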
If you have a book that says otherwise, burn it or sell it to your enemies.