Character array initialization with the first element being null - c++

I was recently faced with a line of code and four options:
char fullName[30] = {NULL};
A) First element is assigned a NULL character.
B) Every element of the array is assigned 0 ( Zeroes )
C) Every element of the array is assigned NULL
D) The array is empty.
The answer we selected was option C, as, while the array is only initialized with a single NULL, C++ populates the rest of the array with NULL.
However, our professor disagreed, stating that the answer is A. He said:
So the very first element is NULL, and when you display it, it's displaying the first element, which is NULL.
The quote shows the question in its entirety; there was no other information provided. I'm curious as to which one is correct, and whether someone could explain why that answer is correct.

The question is ill-defined, but Option B seems like the most correct answer.
The result depends on how exactly NULL is defined, which depends on the compiler (more precisely, on the standard library implementation). If it's defined as nullptr, the code will not compile. (I don't think any major implementation does that, but still.)
Assuming NULL is not defined as nullptr, then it must be defined as an integer literal with value 0 (which is 0, or 0L, or something similar), which makes your code equivalent to char fullName[30] = {0};.
This fills the array with zeroes, so Option B is the right answer.
In general, when you initialize an array with a brace-enclosed list, every element is initialized with something. If you provide fewer initializers than the number of elements, the remaining elements are zeroed.
Regarding the remaining options:
Option C is unclear, because if the code compiles, then NULL is equivalent to 0, so option C can be considered equivalent to Option B.
Option A can be valid depending on how you interpret it. If it means that the remaining elements are uninitialized, then it's wrong. If it doesn't specify what happens to the remaining elements, then it's a valid answer.
Option D is outright wrong, because arrays can't be "empty".

char fullName[30] = {NULL};
This is something that should never be written.
NULL is a macro that expands to a null pointer constant. A character - not a pointer - is being initialised here, so it makes no sense to use NULL.
It just so happens that some null pointer constants are also integer literals with value 0 (i.e. 0 or 0L for example), and if NULL expands to such literal, then the shown program is technically well-formed despite the abuse of NULL. What the macro expands to exactly is defined by the language implementation.
If NULL instead expands to a null pointer constant that is not an integer literal, such as nullptr (which is entirely possible), then the program is ill-formed.
NULL shouldn't be written in C++ at all, even to initialise pointers. It exists for backwards compatibility with C to make it easier to port C programs to C++.
Now, let us assume that NULL happens to expand to an integer literal on this particular implementation of C++.
Nothing in the example is assigned. Assignment is something that is done to a pre-existing object. Here, an array is being initialised.
The first element of the array is initialised with the zero literal. The rest of the elements are value initialised. Both result in the null character. As such, the entire array will be filled with null characters.
A simple and correct way to write the same is:
char fullName[30] = {};
B and C are equally close to being correct, except for wording regarding "assignment". They fail to mention value initialisation, but at least the outcome is the same. A is not wrong either, although it is not as complete because it fails to describe how the rest of the elements are initialised.
If "empty" is interpreted as "contains no elements", then D is incorrect because the array contains 30 elements. If it is interpreted as "contains the empty string", then D would be a correct answer.

You are almost correct.
The professor is incorrect. It is true that display finishes at the first NULL (when some approaches are used), but that says nothing about the values of the remainder of the array, which could be trivially examined regardless.
[dcl.init/17.5]: [..] the i-th array element is copy-initialized with x_i for each 1 ≤ i ≤ k, and value-initialized for each k < i ≤ n. [..]
However, none of the options is strictly correct and well-worded.
What happens is that NULL is used to initialise the first element, and the other elements are zero-initialised. The end result is effectively Option B.
The thing is, if NULL were defined as an expression of type std::nullptr_t on your platform (which it isn't, but it is permitted to be), the example wouldn't even compile!
NULL is a pointer, not a number. Historically it has been possible to mix and match the two things to some degree, but C++ has tried to tighten that up in recent years, and you should avoid blurring the line.
A better approach is:
char fullName[30] = {};
And the best approach is:
std::string fullName;

Apparently, your professor is right. Let's see how.
char someName[6] = "SAAD";
How the string is represented in memory:
index:  0  1  2  3  4   5
value:  S  A  A  D  \0  \0
Array-based C string
The individual characters that make up the string are stored in the elements of the array. The string is terminated by a null character. Array elements after the null character are not part of the string, and their contents are irrelevant.
A "null string" is a string with a null character as its first character:
index:  0   1  2  3  4  5
value:  \0
Null C string
The length of a null string is 0.

Related

Calling std::string::assign(const CharT* s, size_type count) with count 0 safe? [duplicate]

I have a function which returns a pointer and a length, and I want to call std::string::assign(pointer, length). Do I have to make a special case (calling clear) when length is zero and the pointer may be nullptr?
The C++ standard says:
21.4.6.3 basic_string::assign
basic_string& assign(const charT* s, size_type n);
Requires: s points to an array of at least n elements of charT.
So what if n is zero? What is an array of zero characters and how does one point to it?
Is it valid to call
s.assign(nullptr, 0);
or is it undefined behavior?
The implementation of libstdc++ appears not to dereference the pointer s when the size n is zero, but that's hardly a guarantee.
Pedantically, a nullptr does not meet the requirements of pointing to an array of size >=0, and therefore the standard does not guarantee the behaviour (it's UB).
On the other hand, the implementation wouldn't be allowed to dereference the pointer if n is zero, because the pointer could be to an array of size zero, and dereferencing such a pointer would have undefined behaviour. Besides, there wouldn't be any need to do so, because nothing is copied.
The above reasoning does not mean that it is OK to ignore the UB. But, if there is no reason to disallow s.assign(nullptr, 0) then it could be preferable to change the wording of the standard to "If n is greater than zero, then s points to ...". I don't know of any good reason to disallow it, but neither can I promise that a good reason doesn't exist.
Note that adding a check is hardly complicated:
s.assign(ptr ? ptr : "", n);
What is an array of zero characters
This is: new char[0]. Arrays of automatic or static storage may not have a zero size.
Well as you point out, the standard says "s points to an array...". A null pointer does not point to an array of any number of elements. Not even 0 elements. Also, note that s points to "an array of at least n elements...". So it's clear that if n is zero, you can still pass a legitimate pointer to an array.
Overall, std::string's API is not well-guarded against null pointers to charT. So you should always make sure that pointers you hand off to it are non-null.
I am not sure why an implementation would dereference any pointer to an array whose length is provided as zero.
That said, I would err on the side of caution. You could argue that you are not meeting the standard's requirement:
21.4.6.3 basic_string::assign
8 Requires: s points to an array of at least n elements of charT
because nullptr is not pointing to an array.
So technically the behaviour is undefined.
From the Standard (2.14.7) [lex.nullptr]:
The pointer literal is the keyword nullptr. It is a prvalue of type std::nullptr_t. [ Note: std::nullptr_t
is a distinct type that is neither a pointer type nor a pointer to member type ... ]
std::nullptr_t can be implicitly converted to any type of null pointer as per 4.10.1 [conv.ptr]. Regardless of the type of null pointer, the fact remains that it points at nothing.
Thus, it doesn't meet the requirement that s points to an array of at least n elements of charT.
It seems to be undefined behavior.
Interestingly, according to this answer, the C++11 Standard clearly stated that s must not be a null pointer in the basic_string constructor, but this wording has since been removed.

Using NULL with C's char* strings [duplicate]

As we all know, strings in C are null-terminated. Does that mean that according to the standard it is legal to use the NULL constant as the terminator? Or is the similarity of the name of NULL pointer and null-terminator for a string only a happy coincidence?
Consider the code:
char str1[] = "abc";
char str2[] = "abc";
str1[3] = NULL;
str2[3] = '\0';
Here, we change the terminator of str1 to NULL. Is this legal and well-formed C code and str1 adheres to C's definition of null-terminated string? Will it be the same in case of C++?
In practice, I have always used NULL instead of '\0' in my code for strings and everything worked - but is such practice 100% legal?
EDIT: I understand that it's very bad style and refrain from endorsing it and now understand the difference between 0, NULL and '\0' (as in a duplicate What is the difference between NULL, '\0' and 0). I'm still quite curious as for the legality of this code - and voices here seem to be mixed - and the duplicate does not give an authoritative answer to that in my opinion.
Does that mean that according to the standard it is legal to use the NULL constant as the terminator? (OP)
str1[3] = NULL;
Sometimes. Further: does it always properly cause a character array to form a string without concerns?
First, it looks wrong. Akin to int z = 0.0;. Yes it is legal well defined code, but unnecessarily draws attention to itself.
In practice, I have always used NULL instead of '\0' (OP)
I doubt you will find any modern style guide or group of coders endorsing that. NULL is best reserved for pointer contexts.1
These are 2 common and well understood alternatives.
str1[3] = '\0';
str1[3] = 0;
strings in C are null-terminated (OP)
The C spec consistently uses null character, not just null.
The macros are NULL which expands to an implementation-defined null pointer constant; and ... C11 §7.19 3
OK, now what is a null pointer constant?
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. §6.3.2.3 3
If the null pointer constant is a void* then we have something like
str1[3] = (void*) 0;
The above can warn about converting a pointer to a char. This is something best avoided.
Will it be the same in case of C++? (OP)
Yes, the above applies. (Aside: str1[3] = 0 may warn.) Further, NULL is less preferred than nullptr. So NULL is rarely the best to use in C++ in any context.
1 Note: @Joshua reports a style that matches OP's in 1995 Turbo C 4.5
The bottom line is that in C/C++, NULL is for pointers and is not the same as the null character, despite the fact that both are defined as zero. You might use NULL as the null character and get away with it depending on the context and platform, but to be correct, use '\0'. This is described in both standards:
C specifies that NULL is a macro defined in <stddef.h> which "expands to an implementation-defined null pointer constant" (Section 7.17.3), which is itself defined as "an integer constant expression with the value 0, or such an expression cast to type void *" (Section 6.3.2.3.3).
The null character is defined in section 5.2.1.2: "A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string." That same section explains that \0 will be the representation of this null character.
C++ makes the same distinctions. From section 4.10.1 of the C++ standard: "A null pointer constant is an integer literal (2.13.2) with value zero or a prvalue of type std::nullptr_t." In section 2.3.3, it describes the "null character (respectively, null wide character), whose value is 0". Section C.5.2 further confirms that C++ respects NULL as a standard macro imported from the C Standard Library.
No, I don't think it's strictly legal.
NULL is specified to be either:
an integer constant expression with the value 0
an integer constant expression with the value 0 cast to the type void*
In an implementation that uses the first format, using it as the string terminator will work.
But in an implementation that uses the second format, it's not guaranteed to work. You're converting a pointer type to an integer type, and the result of this is implementation-dependent. It happens to do what you want in common implementations, but nothing requires it.
If you have the second type of implementation, you're likely to get a warning like:
warning: incompatible pointer to integer conversion assigning to 'char' from 'void *' [-Wint-conversion]
If you want to use a macro, you can define:
#define NUL '\0'
and then use NUL instead of NULL. This matches the official name of the ASCII null character.

In a structure, is it legal to use one array field to access another one?

As an example, consider the following structure:
struct S {
int a[4];
int b[4];
} s;
Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?
Personally, I feel that it must be UB in C++, whereas I'm not sure about C.
However, I failed to find anything relevant in the standards of C and C++ languages.
Update
There are several answers suggesting ways to make sure there is no padding
between fields in order to make the code work reliably. I'd like to emphasize
that if such code is UB, then absence of padding is not enough. If it is UB,
then the compiler is free to assume that accesses to S.a[i] and S.b[j] do not
overlap and the compiler is free to reorder such memory accesses. For example,
int x = s.b[2];
s.a[6] = 2;
return x;
can be transformed to
s.a[6] = 2;
int x = s.b[2];
return x;
which always returns 2.
Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?
No, because accessing an array out of bounds invokes undefined behaviour in C and C++.
C11 J.2 Undefined behavior
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond
the array object and is used as the operand of a unary * operator that
is evaluated (6.5.6).
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
C++ standard draft section 5.7 Additive operators paragraph 5 says:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression.
[...] If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
Apart from the answer of @rsp (undefined behavior for an array subscript that is out of range), I can add that it is not legal to access b via a because the C language does not specify how much padding there may be between the end of the area allocated for a and the start of b. So even if it runs on a particular implementation, it is not portable.
instance of struct:
+-----------+----------------+-----------+---------------+
| array a | maybe padding | array b | maybe padding |
+-----------+----------------+-----------+---------------+
The second padding may also be absent, since the alignment of the struct object is the alignment of a, which is the same as the alignment of b. However, the C language does not require the second padding to be absent either.
a and b are two different arrays, and a is defined as containing 4 elements. Hence, a[6] accesses the array out of bounds and is therefore undefined behaviour. Note that the array subscript a[6] is defined as *(a+6), so the proof of UB is actually given by the section "Additive operators" as it applies to pointers. See the following section of the C11 standard (e.g. this online draft version) describing this aspect:
6.5.6 Additive operators
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i-n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
The same argument applies to C++ (though not quoted here).
Further, though it is clearly undefined behaviour because it exceeds the bounds of array a, note that the compiler might introduce padding between members a and b, such that, even if such pointer arithmetic were allowed, a+6 would not necessarily yield the same address as b+2.
Is it legal? No. As others mentioned, it invokes Undefined Behavior.
Will it work? That depends on your compiler. That's the thing about undefined behavior: it's undefined.
On many C and C++ compilers, the struct will be laid out such that b will immediately follow a in memory and there will be no bounds checking. So accessing a[6] will effectively be the same as b[2] and will not cause any sort of exception.
Given
struct S {
int a[4];
int b[4];
} s;
and assuming no extra padding, the structure is really just a way of looking at a block of memory containing 8 integers. You could cast its address to (int*) and ((int*)&s)[6] would point to the same memory as s.b[2].
Should you rely on this sort of behavior? Absolutely not. Undefined means that the compiler doesn't have to support this. The compiler is free to pad the structure which could render the assumption that &(s.b[2]) == &(s.a[6]) incorrect. The compiler could also add bounds checking on the array access (although enabling compiler optimizations would probably disable such a check).
I have experienced the effects of this in the past. It's quite common to have a struct like this
struct Bob {
char name[16];
char whatever[64];
} bob;
strcpy(bob.name, "some name longer than 16 characters");
Now bob.whatever will be " than 16 characters". (which is why you should always use strncpy, BTW)
As @MartinJames mentioned in a comment, if you need to guarantee that a and b are in contiguous memory (or at least able to be treated as such, (edit) unless your architecture/compiler uses an unusual memory block size/offset and forced alignment that would require padding to be added), you need to use a union.
union overlap {
char all[8]; /* all the bytes in sequence */
struct { /* (anonymous struct so its members can be accessed directly) */
char a[4]; /* padding may be added after this if the alignment is not a sub-factor of 4 */
char b[4];
};
};
You can't directly access b from a (e.g. a[6], like you asked), but you can access the elements of both a and b by using all (e.g. all[6] refers to the same memory location as b[2]).
(Edit: You could replace 8 and 4 in the code above with 2*sizeof(int) and sizeof(int), respectively, to be more likely to match the architecture's alignment, especially if the code needs to be more portable, but then you have to be careful to avoid making any assumptions about how many bytes are in a, b, or all. However, this will work on what are probably the most common (1-, 2-, and 4-byte) memory alignments.)
Here is a simple example:
#include <stdio.h>
union overlap {
char all[2*sizeof(int)]; /* all the bytes in sequence */
struct { /* anonymous struct so its members can be accessed directly */
char a[sizeof(int)]; /* low word */
char b[sizeof(int)]; /* high word */
};
};
int main()
{
union overlap testing;
testing.a[0] = 'a';
testing.a[1] = 'b';
testing.a[2] = 'c';
testing.a[3] = '\0'; /* null terminator */
testing.b[0] = 'e';
testing.b[1] = 'f';
testing.b[2] = 'g';
testing.b[3] = '\0'; /* null terminator */
printf("a=%s\n",testing.a); /* output: a=abc */
printf("b=%s\n",testing.b); /* output: b=efg */
printf("all=%s\n",testing.all); /* output: all=abc */
testing.a[3] = 'd'; /* makes printf keep reading past the end of a */
printf("a=%s\n",testing.a); /* output: a=abcdefg */
printf("b=%s\n",testing.b); /* output: b=efg */
printf("all=%s\n",testing.all); /* output: all=abcdefg */
return 0;
}
No, since accessing an array out of bounds invokes Undefined Behavior, both in C and C++.
Short Answer: No. You're in the land of undefined behavior.
Long Answer: No. But that doesn't mean that you can't access the data in other sketchier ways... if you're using GCC you can do something like the following (elaboration of dwillis's answer):
struct __attribute__((packed,aligned(4))) Bad_Access {
int arr1[3];
int arr2[3];
};
and then you could access via (Godbolt source+asm):
int x = ((int*)ba_pointer)[4];
But that cast violates strict aliasing so is only safe with g++ -fno-strict-aliasing. You can cast a struct pointer to a pointer to the first member, but then you're back in the UB boat because you're accessing outside the first member.
Alternatively, just don't do that. Save a future programmer (probably yourself) the heartache of that mess.
Also, while we're at it, why not use std::vector? It's not fool-proof, but on the back-end it has guards to prevent such bad behavior.
Addendum:
If you're really concerned about performance:
Let's say you have two same-typed pointers that you're accessing. The compiler will more than likely assume that both pointers have the chance to interfere, and will instantiate additional logic to protect you from doing something dumb.
If you solemnly swear to the compiler that you're not trying to alias, the compiler will reward you handsomely:
Does the restrict keyword provide significant benefits in gcc / g++
Conclusion: Don't be evil; your future self, and the compiler will thank you.
Jed Schaff’s answer is on the right track, but not quite correct. If the compiler inserts padding between a and b, his solution will still fail. If, however, you declare:
typedef struct {
int a[4];
int b[4];
} s_t;
typedef union {
char bytes[sizeof(s_t)];
s_t s;
} u_t;
You may now access (int*)(bytes + offsetof(s_t, b)) to get the address of s.b, no matter how the compiler lays out the structure. The offsetof() macro is declared in <stddef.h>.
The expression sizeof(s_t) is a constant expression, legal in an array declaration in both C and C++. It will not give a variable-length array. (Apologies for misreading the C standard before. I thought that sounded wrong.)
In the real world, though, two consecutive arrays of int in a structure are going to be laid out the way you expect. (You might be able to engineer a very contrived counterexample by setting the bound of a to 3 or 5 instead of 4 and then getting the compiler to align both a and b on a 16-byte boundary.) Rather than convoluted methods to try to get a program that makes no assumptions whatsoever beyond the strict wording of the standard, you want some kind of defensive coding, such as static_assert(&both_arrays[4] == &s.b[0], "");. These add no run-time overhead and will fail if your compiler is doing something that would break your program, so long as you don’t trigger UB in the assertion itself.
If you want a portable way to guarantee that both sub-arrays are packed into a contiguous memory range, or split a block of memory the other way, you can copy them with memcpy().
The Standard does not impose any restrictions upon what implementations must do when a program tries to use an out-of-bounds array subscript in one structure field to access a member of another. Out-of-bounds accesses are thus "illegal" in strictly conforming programs, and programs which make use of such accesses cannot simultaneously be 100% portable and free of errors. On the other hand, many implementations do define the behavior of such code, and programs which are targeted solely at such implementations may exploit such behavior.
There are three issues with such code:
1. While many implementations lay out structures in predictable fashion, the Standard allows implementations to add arbitrary padding before any structure member other than the first. Code could use sizeof or offsetof to ensure that structure members are placed as expected, but the other two issues would remain.
2. Given something like:
if (structPtr->array1[x])
structPtr->array2[y]++;
return structPtr->array1[x];
it would normally be useful for a compiler to assume that the use of structPtr->array1[x] will yield the same value as the preceding use in the "if" condition, even though it would change the behavior of code that relies upon aliasing between the two arrays.
3. If array1[] has e.g. 4 elements, a compiler given something like:
if (x < 4) foo(x);
structPtr->array1[x]=1;
might conclude that since there would be no defined cases where x isn't less than 4, it could call foo(x) unconditionally.
Unfortunately, while programs can use sizeof or offsetof to ensure that there aren't any surprises with struct layout, there's no way by which they can test whether compilers promise to refrain from the optimizations of types #2 or #3. Further, the Standard is a little vague about what would be meant in a case like:
struct foo {char array1[4],array2[4]; };
int test(struct foo *p, int i, int x, int y, int z)
{
if (p->array2[x])
{
((char*)p)[x]++;
((char*)(p->array1))[y]++;
p->array1[z]++;
}
return p->array2[x];
}
The Standard is pretty clear that behavior would only be defined if z is in the range 0..3, but since the type of p->array1 in that expression is char* (due to decay) it's not clear the cast in the access using y would have any effect. On the other hand, since converting a pointer to the first element of a struct to char* should yield the same result as converting a struct pointer to char*, and the converted struct pointer should be usable to access all bytes therein, it would seem the access using x should be defined for (at minimum) x=0..7 [if the offset of array2 is greater than 4, it would affect the value of x needed to hit members of array2, but some value of x could do so with defined behavior].
IMHO, a good remedy would be to define the subscript operator on array types in a fashion that does not involve pointer decay. In that case, the expressions p->array1[x] and &(p->array1[x]) could invite a compiler to assume that x is 0..3, but p->array1+x and *(p->array1+x) would require a compiler to allow for the possibility of other values. I don't know if any compilers do that, but the Standard doesn't require it.

How can char* be a condition in for loop?

In a book I am reading there is a piece of code :
string x;
size_t h=0;
for(const char* s=x.c_str();*s;++s)
h=(h*17)^*s;
Regarding this code, I have two questions:
how can *s be a condition? what does it mean?
what does "h=(h*17)^*s" mean?
Thanks for help!
how can *s be a condition? what does it mean?
It means "while the value pointed to by s is not zero." C strings are null-terminated, so the last character in the string returned by c_str() will be the null character (\0, represented by all bits zero).
what does "h=(h*17)^*s" mean?
It multiplies h by 17 then xors it with the value pointed to by s.
In C (or C++) any scalar value (a number, a character, or a pointer) can be used as a "boolean". A numeric value of 0, or a NULL pointer, means "false". Anything else means "true".
Here, *s is "the character value currently pointed to by s". The loop stops if that character is a 0 (not the "0" digit, with ASCII encoding 48, but the byte with ASCII encoding 0). This is conventionally the "end-of-string" marker, so the loop stops when it reaches the end of the string.
"^" is the bitwise XOR operator. The left "*" is a plain multiplication, while the other "*" is the pointer dereference operator (i.e. the thing which takes the pointer s and looks at the value to which this pointer points). "=" is assignment. In brief, the value of h is multiplied by 17, then XORed with the character pointed to by s, and the result becomes the new value of h.
*s detects the string termination character '\0'
(h*17)^*s is what it says: h multiplied by 17 and xor-ed with the character pointed to by s. It looks like a simple hashing function.
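For reference, the loop from the question can be wrapped in a function so it can be tried out. The name hash17 is an assumption made for this sketch; the loop body is taken unchanged from the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrapper around the loop from the question.
inline std::size_t hash17(const std::string& x) {
    std::size_t h = 0;
    // *s is the loop condition: iteration stops at the first '\0'.
    for (const char* s = x.c_str(); *s; ++s) {
        // Left '*' multiplies, '^' is bitwise XOR, right '*' dereferences s.
        h = (h * 17) ^ *s;
    }
    return h;
}
```

For an empty string the loop body never runs and the result is 0. For "a" the result is 97 (the ASCII code of 'a'), and for "ab" it is (97 * 17) ^ 98, which is 1555. Because the condition stops at the first zero byte, a std::string with an embedded '\0' is only hashed up to that byte.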
As other answers have explained, the basic answer is that any expression that evaluates to 0 gets interpreted as a 'false' condition in C or C++, and *s will evaluate to 0 when the s pointer reaches the null termination character of the string ('\0').
You could equivalently use the expression *s != 0, and some developers might argue that this is what should be used, on the grounds that the 'fuller' expression is clearer. Whether or not you agree with that opinion, you need to be able to understand the terse alternative, since it's very commonly used in C/C++ code. You'll come across these expressions a lot, even if you prefer to use the more explicit comparison.
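A short sketch of the two spellings side by side, using a string-length loop as the example. The function names length_terse and length_explicit are made up for illustration; both loops behave identically.

```cpp
#include <cstddef>

// Terse form: the condition is just the character value.
inline std::size_t length_terse(const char* s) {
    std::size_t n = 0;
    for (; *s; ++s) {
        ++n;
    }
    return n;
}

// Explicit form: the same test, spelled out as a comparison.
inline std::size_t length_explicit(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        ++n;
    }
    return n;
}
```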
The more rigorous explanation from the standard (for some reason I feel compelled to bring this into the discussion, even though it doesn't really change or clarify anything. In fact, it probably will muddle things unnecessarily for some people - if you don't care to get into this level of trivia, you'll miss absolutely nothing by clicking the back button right now...):
In C, the *s expression is in what the standard calls 'expression-2' of the for statement, and this particular for statement example is just taking advantage of the standard's definition of the for statement. The for statement is classified as an 'iteration statement', and among the semantics of any iteration statement are (6.8.5/4 "Iteration statements"):
An iteration statement causes a statement called the loop body to be executed repeatedly
until the controlling expression compares equal to 0.
Since the 'expression-2' part of the for statement is the controlling expression, this means that the for loop will execute repeatedly until *s compares equal to 0.
The C++ standard defines things a little differently (but with the same result). In C++, the for statement is defined in terms of the while statement, and the condition part of the while statement controls the iteration (6.5.1/1 "The while statement"):
until the value of the condition becomes false
Earlier in the C++ standard, the following describes how expressions are converted to bool (4.12 "boolean conversions"):
An rvalue of arithmetic, enumeration, pointer, or pointer to member type can be converted to an rvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true
Similar wording in the standard (in both languages) applies to the controlling expression/condition of all selection or iteration statements. All this language-lawyerese boils down to the fact that if an expression evaluates to 0 it's the same as evaluating to false (in the English sense of the word, since C doesn't have a built-in false keyword).
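The conversion rules quoted above can be checked with a small sketch. The helper as_condition is hypothetical: it just evaluates its argument the way an if, while, or for condition would.

```cpp
// Hypothetical helper: applies the same conversion to bool that a
// condition in if/while/for applies to its operand.
template <typename T>
bool as_condition(const T& value) {
    return value ? true : false;
}

// Examples of values that count as true or false in a condition.
inline bool conversion_examples_hold() {
    const char* null_ptr = nullptr;   // null pointer value -> false
    const char* empty = "";           // valid pointer -> true, even to ""
    return as_condition(1)
        && as_condition(-1237830)     // any non-zero number -> true
        && !as_condition(0)           // zero -> false
        && !as_condition('\0')        // the string terminator -> false
        && as_condition('0')          // the digit '0' is non-zero -> true
        && !as_condition(null_ptr)
        && as_condition(empty)
        && !as_condition(*empty);     // what it points to is '\0' -> false
}
```

The last two checks show the distinction that matters for the loop in the question: the pointer s itself is never null there, but the character it points to eventually is zero.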
And that's the long, confusing explanation of the simple concept.
*s is the character that s currently points to, so it's a character. The for loop goes on until it becomes \0, meaning until the string ends.
h is assigned the value of h * 17 xored with the (ascii value of) character *s.
Here's a good tutorial about pointers.
1) *s in the condition checks whether *s != '\0' (the NUL character)
2) h=(h*17)^*s implies multiply h by 17 and perform exclusive-OR operation with the value pointed to by s.
In C and C++, true and false are the same as non-zero, and zero. So code under if (1){ will always execute, as will code under if (-1237830){, but if (0){ is always false.
Likewise, if the value that the pointer points to (*s) is ever 0, the condition is the same as false, i.e. you will exit the loop.