Is the address of a string literal guaranteed not to be null? - c++

Is it guaranteed that "stuff" != nullptr and "" != nullptr?
An online search did not give useful results.
Edit: I am asking if it is guaranteed that the address of any string literal is not null, which neither of the duplicate questions explicitly answers.

Is the address of a string literal guaranteed not to be null?
Yes.
A null pointer cannot be dereferenced. A string literal can be converted to a pointer, which can be dereferenced to get the first character of the string. Ergo, a string literal can't be a null pointer. Q.E.D.
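The argument above can be sketched in code. This is only an illustration; the helper names is_non_null and first_char are made up for this example and do not come from the question.

```cpp
#include <cassert>

// A string literal is an array of const char, so even "" occupies storage
// for its terminating null character.
static_assert(sizeof("") == 1, "the empty literal holds only the terminator");
static_assert(sizeof("stuff") == 6, "five characters plus the terminator");

// Hypothetical helper: reports whether a pointer to a string is non-null.
bool is_non_null(const char* s) {
    return s != nullptr;
}

// Hypothetical helper: reads the first character through the pointer,
// which is only possible because the pointer is not null.
char first_char(const char* s) {
    return *s;
}
```

Passing "stuff" or "" to either helper converts the literal (an array) to a pointer to its first element, which is why the dereference in first_char is well defined even for the empty literal.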

Related

Using NULL with C's char* strings [duplicate]

As we all know, strings in C are null-terminated. Does that mean that according to the standard it is legal to use the NULL constant as the terminator? Or is the similarity of the name of NULL pointer and null-terminator for a string only a happy coincidence?
Consider the code:
char str1[] = "abc";
char str2[] = "abc";
str1[3] = NULL;
str2[3] = '\0';
Here, we change the terminator of str1 to NULL. Is this legal, well-formed C code, and does str1 still adhere to C's definition of a null-terminated string? Will it be the same in C++?
In practice, I have always used NULL instead of '\0' in my code for strings and everything worked - but is such practice 100% legal?
EDIT: I understand that it's very bad style and refrain from endorsing it and now understand the difference between 0, NULL and '\0' (as in a duplicate What is the difference between NULL, '\0' and 0). I'm still quite curious as for the legality of this code - and voices here seem to be mixed - and the duplicate does not give an authoritative answer to that in my opinion.
Does that mean that according to the standard it is legal to use the NULL constant as the terminator? (OP)
str1[3] = NULL;
Sometimes. Further: does it always properly cause a character array to form a string without concerns?
First, it looks wrong. Akin to int z = 0.0;. Yes it is legal well defined code, but unnecessarily draws attention to itself.
In practice, I have always used NULL instead of '\0' (OP)
I doubt you will find any modern style guide or group of coders endorsing that. NULL is best reserved for pointer contexts. [1]
These are 2 common and well understood alternatives.
str1[3] = '\0';
str1[3] = 0;
strings in C are null-terminated (OP)
The C spec consistently uses null character, not just null.
The macros are NULL which expands to an implementation-defined null pointer constant; and ... C11 §7.19 3
OK, now what is a null pointer constant?
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. §6.3.2.3 3
If the null pointer constant is a void* then we have something like
str1[3] = (void*) 0;
The above can warn about converting a pointer to a char. This is something best avoided.
Will it be the same in case of C++? (OP)
Yes, the above applies. (Aside: str1[3] = 0 may warn.) Further, NULL is less preferred than nullptr. So NULL is rarely the best to use in C++ in any context.
[1] Note: @Joshua reports a style that matches OP's in 1995 Turbo C 4.5.
The bottom line is that in C/C++, NULL is for pointers and is not the same as the null character, despite the fact that both are defined as zero. You might use NULL as the null character and get away with it depending on the context and platform, but to be correct, use '\0'. This is described in both standards:
C specifies that NULL is a macro defined in <stddef.h> which "expands to an implementation-defined null pointer constant" (Section 7.17.3), which is itself defined as "an integer constant expression with the value 0, or such an expression cast to type void *" (Section 6.3.2.3.3).
The null character is defined in section 5.2.1.2: "A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string." That same section explains that \0 will be the representation of this null character.
C++ makes the same distinctions. From section 4.10.1 of the C++ standard: "A null pointer constant is an integer literal (2.13.2) with value zero or a prvalue of type std::nullptr_t." Section 2.3.3 describes the "null character (respectively, null wide character), whose value is 0". Section C.5.2 further confirms that C++ respects NULL as a standard macro imported from the C Standard Library.
No, I don't think it's strictly legal.
NULL is specified to be either:
an integer constant expression with the value 0
an integer constant expression with the value 0 cast to the type void*
In an implementation that uses the first format, using it as the string terminator will work.
But in an implementation that uses the second format, it's not guaranteed to work. You're converting a pointer type to an integer type, and the result of this is implementation-dependent. It happens to do what you want in common implementations, but nothing requires it.
If you have the second type of implementation, you're likely to get a warning like:
warning: incompatible pointer to integer conversion assigning to 'char' from 'void *' [-Wint-conversion]
If you want to use a macro, you can define:
#define NUL '\0'
and then use NUL instead of NULL. This matches the official name of the ASCII null character.
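As a minimal sketch of the recommended form, the following writes the terminator with the character constant '\0' rather than with NULL. The function name length_after_terminating_at is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The null character is a character constant whose value is zero.
static_assert('\0' == 0, "the null character has the value zero");

// Hypothetical helper: copies "abcdef" into a local array, writes a null
// character at index n (when n is inside the array), and returns the
// resulting string length.
std::size_t length_after_terminating_at(std::size_t n) {
    char buf[] = "abcdef";          // 7 elements: six letters plus '\0'
    if (n < sizeof buf) {
        buf[n] = '\0';              // a character constant, not a pointer macro
    }
    return std::strlen(buf);
}
```

Because '\0' has a character type on every implementation, this form does not depend on whether NULL happens to be defined as 0 or as (void*)0.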

Why can a char pointer variable be initialized to a string but an int pointer variable can not be initialized to an array of integers? [duplicate]

I am trying to understand the relationship between strings, arrays, and pointers.
The book I am reading has a program in which it initializes a variable as follows:
char* szString= "Name";
The way I understand this, is that a C-style string is simply an array of chars. An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset. I.e.
array[5] in fact returns what is evaluated from expression *(array+5).
So, from my understanding and testing, szString is in fact initialized as a pointer to the first element of the array storing "Name". I can deduce this because the output of:
cout << *szString;
is the character "N".
My understanding of the statement
cout << szString;
outputting the characters "Name" is that the method cout interprets the argument szString as a string type and prints out all the characters until the NUL character. On the other hand, for the argument *szString a different version of this method is used that supports C-style strings.
Therefore, if I can initialize a char type pointer to address the first element in an array of chars (a C-style string), why can I not initialize an int type pointer to the first element in an array of integers as follows:
int* intArray = {1,2,3};
a C-style string is simply an array of chars
Correct.
An array is simply a shorthand version of referring to the pointer (which stores the first value of the array) and an offset.
No, not really.
the method cout interprets the argument szString as a string type and prints out all the characters until the NUL character
cout is not a "method", but its operator<< works this way yes.
Why can a char pointer variable be initialized to a string but an int pointer variable can not be initialized to an array of integers?
The simple answer is that string literals are special, otherwise we would not be able to use them.
In many ways, including this way, the language standards dictate special handling for both string literals and char*s.
why can I not initialize an int type pointer to the first element in an array of integers
C++ could have ultimately extended the syntax of other pointer initialisations to do a similar thing, but it didn't actually need to because instead we have the far superior:
std::vector<int> myInts{1,2,3};
The short answer is that there exist character array literals, but no int array literals.
A string literal is a literal value of array type, and it is an lvalue, so that's something whose address you can take and store. The lifetime of the object designated by such a value is permanent, so pointers thus obtained are valid throughout the entire program.
By contrast, there is no literal of type "array of int", and no unnamed int array lvalues.
Don't confuse this with the braced initialization lists, which are not expressions and therefore not values! Braced lists can be used to initialize variables of array type, but they are not themselves values.
If anything, the only odd-man-out in the language grammar is that it is permissible to initialize a char array with a braced list containing a string literal: char a[] = {"foo"}; Think of this as a kind of copy initialization; a is a copy of the literal lvalue.
As a beginner I had a similar question. Please look at this post and the answers.
This const char* szString= "Name" assigns to the pointer szString the address of the initial element of an array whose contents are "Name" (followed by a terminating '\0' null character).
There's no implicit conversion from int to int*, other than 0 being a special case, as a null pointer.
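The contrast between the two initializations can be sketched as follows. This is an illustration under the assumption that a named array is an acceptable substitute for the missing "int array literal"; the function names are made up for the example.

```cpp
#include <cassert>

// A string literal is an array lvalue with static storage duration, so a
// pointer to its first element can be stored and dereferenced.
char first_letter() {
    const char* szString = "Name";
    return *szString;
}

// There is no literal of type "array of int". A braced list can only
// initialize a named array, which then converts to a pointer as usual.
int sum_of_ints() {
    int intArray[] = {1, 2, 3};     // named array initialized from a braced list
    const int* p = intArray;        // array-to-pointer conversion
    return p[0] + p[1] + p[2];
}
```

Writing int* p = {1, 2, 3}; instead is rejected by the compiler, because the braced list is not an expression that designates an array.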

Assign integer literal to pointer?

This question might be too basic, but I will take the risk of posting it to clear up my confusion.
My question is this: we can only assign an address to a pointer, like:
int *p,a;
p = &a; // OK
p = 1; // Error: an integer literal cannot be assigned to an int*
But we can assign NULL to p like :
p = NULL;
Indeed, NULL is a macro whose value is 0, and the preprocessor replaces it with 0 before the compiler sees the code. So after replacement it looks like
p = 0;
I know this means p points to nothing, but according to the rule we can only assign an address to a pointer, and 0 is an integer.
So doesn't this break the rule?
Thanks.
barak manos already pointed it out in his comment:
If you want to set a pointer to a literal value, you need to cast the literal value to the corresponding pointer type first.
NULL could just as well be defined as (void *) 0... which is implicitly convertible to any pointer type.
In either case, you end up with a pointer pointing to a literal address.
In no case, however, does your pointer point to memory containing a literal 4. This is, to my knowledge, not possible without assigning that literal value to an int first:
int i = 4;
int * p = &i;
No it does not break the rule. The integer constant 0 (and generally any constant expression evaluating to 0) is treated specially and it is allowed to assign such value to a pointer. It does not mean that you can assign any integer - just zero.
C++11 introduced the nullptr keyword, which should be used instead.
You can directly set the address held by a pointer in C++:
char * p = reinterpret_cast<char *>( 0x0000ffff ) ;
Note that this is not generally considered a safe use of pointers, for obvious reasons (it can point anywhere in memory).
for more info see here and related question here
The reason for this is different in C from C++, IIRC. In C++, 0 is a special literal that, by definition, can be interpreted as a null pointer of any pointer type; so in this case there is no cast from integer to pointer. To test this you can try doing this:
int i = 0;
int* p = i;
Which you will discover gives an error (See here for IDEOne example)
In fact, the literal 0, or any constant expression that evaluates to zero, is an exception. It can be assigned to a pointer, and it means NULL or nullptr. Conversely, any null pointer compares equal to 0.
A null pointer does not point to any memory location; it is the compiler's responsibility to handle it. It points to nothing.
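The special status of the constant 0 described in these answers can be checked at compile time. The following is a sketch in C++11; zero_literal_is_null is a hypothetical name used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// An arbitrary int value does not convert implicitly to a pointer...
static_assert(!std::is_convertible<int, int*>::value,
              "no implicit conversion from int to int*");
// ...but the type of nullptr converts to every pointer type.
static_assert(std::is_convertible<std::nullptr_t, int*>::value,
              "nullptr converts to int*");

// Hypothetical helper: the literal 0 is a null pointer constant, so both
// initializations below produce the same null pointer value.
bool zero_literal_is_null() {
    int* p = 0;         // allowed: 0 is a null pointer constant
    int* q = nullptr;   // preferred spelling since C++11
    return p == q && !p;
}
```

The first static_assert concerns a value of type int in general, which is why it holds even though the literal 0 itself is accepted in the initializer of p.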

Can the expression "(ptr == 0) != (ptr == (void*)0)" really be true?

I read this claim in a forum thread linked to in a comment by @jsantander:
Keep in mind that when you assign or compare a pointer to zero, there is some special magic that occurs behind the scenes to use the correct pattern for the given pointer (which may not actually be zero). This is one of the reasons why things like #define NULL (void*)0 are evil – if you compare a char* to NULL that magic has been explicitly (and probably unknowingly) turned off, and an invalid result may happen. Just to be extra clear:
(my_char_ptr == 0) != (my_char_ptr == (void*)0)
So the way I understand it, for an architecture where the NULL pointer is, say, 0xffff, the code if (ptr), would compare ptr to 0xffff instead of to 0.
Is this really true? Is it described by the C++ standard?
If true, it would mean that 0 can be safely used even for architectures that have a non-zero NULL pointer value.
Edit
As an extra clarification, consider this code:
char *ptr;
memset(&ptr, 0, sizeof(ptr));
if ((ptr == (void*)0) && (ptr != 0)) {
printf("It can happen.\n");
}
This is how I understand the claim of this forum post.
There are two parts to your question. I'll start with:
If true, it would mean that 0 can be safely used even for architectures that have a non-zero NULL pointer value.
You are mixing up "value" and "representation". The value of a null pointer is called the null pointer value. The representation is the bits in memory that are used to store this value. The representation of a null pointer could be anything, there is no requirement that it is all-bits-zero.
In the code:
char *p = 0;
p is guaranteed to be a null pointer. It might not have all-bits-zero.
This is no more "magic" than the code:
float f = 5;
f does not have the same representation (bit-pattern in memory) as the int 5 does, yet there is no problem.
The C++ standard defines this. The text changed somewhat in C++11 with the addition of nullptr; however in all versions of C and C++, the integer literal 0 when converted to a pointer type generates a null pointer.
From C++11:
A null pointer constant is an integral constant expression prvalue of integer type that evaluates to zero or a prvalue of type std::nullptr_t. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of object pointer or function pointer type. Such a conversion is called a null pointer conversion.
0 is a null pointer constant, and (char *)0 for example is a null pointer value of type char *.
It's immaterial whether a null pointer has all-bits-zero or not. What matters is that a null pointer is guaranteed to be generated when you convert an integral constant expression of value 0 to a pointer type.
Moving onto the other part of your question. The text you quoted is complete garbage through and through. There's no "magic" in the idea that a conversion between types results in a different representation, as I discuss above.
The code my_char_ptr == NULL is guaranteed to test whether or not my_char_ptr is a null pointer.
It would be evil if you write in your own source code, #define NULL (void*)0. This is because it is undefined behaviour to define any macro that might be defined by a standard header.
However, the standard headers can write whatever they like so long as the Standard requirements for null pointers are fulfilled. Compilers can "do magic" in the standard header code; for example there doesn't have to be a file called iostream on the filesystem; the compiler can see #include <iostream> and then have hardcoded all of the information that the Standard requires iostream to publish. But for obvious practical reasons, compilers generally don't do this; they allow the possibility for independent teams to develop the standard library.
Anyway, if a C++ compiler includes #define NULL (void *)0 in its own header, and as a result something non-conforming happens, then the compiler would be non-conforming obviously. And if nothing non-conforming happens then there is no problem.
I don't know who the text you quote would direct its "is evil" comment at. If it is directed at compiler vendors telling them not to be "evil" and put out non-conforming compilers, I guess we can't argue with that.
I think the forum post you link to is incorrect (or we have misinterpreted what it means by !=). The two sub-expressions have different semantics but the same result. Assuming that my_char_ptr really has type char* or similar, and a valid value:
my_char_ptr == 0 converts 0 to the type of my_char_ptr. That yields a null pointer because 0 is an example of a so-called "null pointer constant", which is defined in the standard. It then compares the two. The comparison is true if and only if my_char_ptr is a null pointer, because only null pointers compare equal to other null pointers.
my_char_ptr == (void*)0 converts my_char_ptr to void*, and then compares that to the result of converting 0 to void* (which is a null pointer). The comparison is true if and only if my_char_ptr is a null pointer because when you convert a pointer to void* the result is a null pointer if and only if the source is a null pointer.
The issue of whether null pointers are represented with 0 bits or not is interesting but irrelevant to the analysis of the code.
The practical danger of thinking that NULL is a null pointer (rather than merely a null pointer constant) is that you might think that printf("%p", NULL) has defined behaviour, or that foo(NULL) will call the void* overload of foo rather than the int overload, and so on.
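The two comparisons analysed in this answer, and the overload pitfall it mentions, can be sketched like this. All function names are hypothetical and chosen for the illustration; nothing here relies on how a particular platform represents null pointers.

```cpp
#include <cassert>

// Both sub-expressions test whether p is a null pointer, so they always
// agree, whatever bit pattern the implementation uses for null pointers.
bool comparisons_agree(char* p) {
    return (p == 0) == (p == static_cast<void*>(0));
}

// Convenience wrapper so the non-null case can be checked without globals.
bool agree_for_non_null() {
    char c = 'x';
    return comparisons_agree(&c);
}

// Two overloads that show why a null pointer constant is not a pointer:
// the literal 0 is an int first, while nullptr only converts to pointers.
int selected_overload(int)   { return 1; }
int selected_overload(void*) { return 2; }
```

With these definitions, selected_overload(0) picks the int overload and selected_overload(nullptr) picks the void* overload. Which overload a call with NULL selects depends on how the implementation defines the macro, which is the danger the answer points out.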
No, because they incidentally used the only case where it is guaranteed to work as an example.
Otherwise, yes.
Although in practice you probably won't ever see a difference, strictly speaking the concern is correct.
The C++ standard requires (4.10) that:
A null pointer constant (which is either an integral constant expression that evaluates to 0, or a prvalue of type std::nullptr_t) converts to the null pointer of any type.
Two null pointers of the same type compare equal.
A prvalue of type pointer-to-cv-T can be converted to pointer-to-cv-void, and the null pointer value will be adjusted accordingly.
Pointers of derived classes can be converted to pointers of base classes, and the null pointer value will be adjusted accordingly.
This means, if you are pedantic about the wording, that the null pointers of void and char and foo_bar are not only not necessarily zero bit patterns, but also are not necessarily the same. Only null pointers of the same type are necessarily the same (and actually, not even that is true, it only says that they must compare equal, which isn't the same thing).
The fact that it explicitly says "The null pointer value is converted to the null pointer value of the destination type" signifies that this is not only an absurd, theoretical contortion of the wording, but indeed intended as a legitimate feature of an implementation.
That is regardless of the fact that the same literal 0 will convert to the null pointer of each type.
Incidentally, in their example, they compared to void*, which will work due to the above conversion rule. Also, in practice, the null pointer for every type is a zero bit pattern on every architecture that you are likely to encounter in your life (though of course, that's not guaranteed).
First, I'm not sure that (charPtr == 0) != (charPtr == (void*)0) is allowed, even in C++. In both cases, you're
converting a null pointer constant (0) to a pointer, which
results in a null pointer. And all null pointers should compare
equal.
Second, while I don't know the context of the passage you cite,
you really don't have to worry about NULL being (void*)0:
user code cannot legally define NULL (at least not if it
includes any standard headers), and the C++ standard requires
NULL to be defined as a null pointer constant; i.e. a
constant integral expression evaluating to 0. (Note that
despite its name, a null pointer constant cannot have a pointer
type.) So it might be 0 (the more or less standard
definition, since the very beginnings of C), or possibly 0L,
or even (1-1), but not ((void*)0). (Of course, it might
also be something like __nullptr, a compiler built-in constant
which evaluates to integer 0, but triggers a warning if not
converted immediately into a null pointer.)
Finally: there's no requirement that a null pointer have all
0 bits, and there certainly have been cases where this wasn't
the case. On the other hand, there is a requirement that
comparing a null pointer to a null pointer constant will
evaluate to true; it's up to the compiler to make it work. And
since NULL is required to be defined as a null pointer
constant, whether you use NULL or 0 is purely a question of
personal preference and convention.
EDIT:
Just to clarify a little: the critical point involves conversion
of a "null pointer constant", an integral constant expression
evaluating to 0. What can surprise people is:
int zero = 0; // NOT a constant expression.
void* p1 = reinterpret_cast<void*>( zero );
void* p2 = 0;
if ( p1 == p2 ) // NOT guaranteed!
The result of converting a non-constant expression which
evaluates to zero to a pointer is not guaranteed to be a null
pointer.

Are C strings guaranteed to be arrays?

Are C strings (as opposed to std::string) guaranteed to be implemented as arrays? For example, say, I have
char const * str = "abc";
What it boils down to is whether or not str + 4 is a legal pointer value (without dereferencing it, that is). I'm asking this because I don't know if C strings are a special case due to the null character terminating them.
First part of the question
Are C strings guaranteed to be implemented as arrays?
For example, say, I have: char const * str = "abc"
Yes, a string object is of an array type. A character string is a data format and a (character) string object is of a type array of char.
In your example str points to the string literal "abc". Character string literals have the type char[N+1] where N is the length of the string (i.e., the number of characters excluding the terminating null character).
Some references from Standard and K&R 2nd edition:
C defines a string literal as:
(C99, 6.4.5p2) "A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz"."
and says (emphasis mine):
(C99, 6.4.5p5) "For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence;"
K&R 2nd edition says:
"Technically, a string constant is an array of characters"
and
"when a string constant like "hello\n" appears in a C program, it is stored as an array of characters containing the characters in the string and terminated with a '\0' to mark the end."
Second part of the question
What it boils down to is whether or not str + 4 is a legal pointer value (without dereferencing it, that is).
Yes, it is a valid pointer. In your case str + 4 is a pointer one past the last element of the array.
A valid pointer is a pointer that is either a null pointer or a pointer to a valid object. For an element of an array object, a pointer one past the last element of the array object is also a valid pointer.
Note that for the purpose of the last rule ("the one past element"), for pointers to objects that are not elements of an array, C treats the object as an array of one element.
(C99, 6.5.6p7) "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type."
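The one-past-the-end rule can be illustrated with the literal from the question. This sketch is written in C++, where the literal has type const char[4] (in C it is char[4]); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// "abc" is an array of four elements: three letters plus the terminator.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "in C++ the literal is an lvalue of type const char[4]");

// Forming str + 4 is valid: it points one past the last element. It may be
// compared and subtracted, but must not be dereferenced.
std::ptrdiff_t distance_to_one_past_end() {
    const char* str = "abc";
    const char* end = str + 4;
    return end - str;
}

// str + 3 points at the last element, the terminating null character,
// and may be dereferenced.
char last_element() {
    const char* str = "abc";
    return *(str + 3);
}
```

Forming str + 5 would already be undefined behaviour, even without dereferencing it, because it lies beyond the one-past-the-end position.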
They are guaranteed to be a contiguous sequence of chars. If that's your definition of an array, then yes.
In your example you will have 4 chars, one for each character and one for the null terminator. str+4 will be out of range.
Are C strings guaranteed to be implemented as arrays?
With a wide definition of array, yes, they are a contiguous sequence of chars with a terminating null character.
What it boils down to is whether or not str + 4 is a legal pointer value
The literal ("abc") is an array stored somewhere in the process memory. The type is const char[4] (in C++; I am not sure whether in C it is char[4]). Then str is a pointer to the first element of the string literal, and the expression str+3 is correct, can be dereferenced and the pointed character will be 0. The expression str+4 is a pointer beyond the end of the array and cannot be dereferenced.
The short answer is: yes, they are, but str+4 isn't necessarily a legal pointer as 1 char may not be equal to 1 byte.