String declaration works when I declare it in the following ways:
string a = "xyz";
char a[] = "xyz";
But in case of:
char *a = "xyz";
g++ 4.9.2 gives a warning:
warning: deprecated conversion from string constant to ‘char*’
[-Wwrite-strings] char *a = "xyz";
I think these three declaration types are different from each other. Please help me out.
string a = "xyz";
This uses initializer syntax to invoke the constructor std::string( const char* ).
char a[] = "xyz";
This declares an array large enough to store the string plus terminator. It follows standard array-initializer rules. Think of it as equivalent to char a[] = { 'x', 'y', 'z', '\0' };
char *a = "xyz";
This takes a string literal ("xyz") and assigns it to a pointer to non-const char. Within the language, such a pointer means it is okay to modify the string it points to, but that is undefined behaviour in this case, because string literals may not be modified. To prevent you from making such a mistake, the compiler gives you a warning. The following is valid and will not emit a warning:
const char *a = "xyz";
In earlier version(s) of the language, you could use:
char* a = "xyz";
Now, you must use:
char const* a = "xyz";
A string literal, such as "xyz", typically resides in a read-only part of the program. It can be used to initialize a char const* since you are not supposed to modify the contents of what a char const* points to. Using it to initialize a char* opens the possibility of a user modifying the contents accidentally, and modifying such strings is undefined behavior.
A string literal can also be used to initialize a char[]. In that case, the string literal is copied to the space allocated for the array. Hence, there is no risk of modifying read-only data of the program. Hence, using
char a[] = "xyz";
is OK.
The line
string a = "xyz";
invokes the constructor of string that takes a char const* and then uses that object to initialize a. Hence, that line is also OK.
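To make the difference concrete, here is a minimal sketch of the three declarations side by side. The function name declaration_demo is made up for illustration; it is not from the question.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: exercises the three declarations and returns the
// modified std::string so a caller can inspect it.
std::string declaration_demo() {
    std::string s = "xyz";      // std::string copies the literal's characters
    char arr[] = "xyz";         // array of 4 chars: 'x', 'y', 'z', '\0'
    const char *ptr = "xyz";    // points at the literal itself (read-only)

    static_assert(sizeof(arr) == 4, "three characters plus the terminator");

    s[0] = 'a';                 // fine: s owns its storage
    arr[0] = 'b';               // fine: arr is a private copy of the literal
    // ptr[0] = 'c';            // would not compile: *ptr is const

    assert(std::strcmp(arr, "byz") == 0);
    assert(std::strcmp(ptr, "xyz") == 0);   // the literal is untouched
    return s;
}
```

Only the pointer version refers to the literal's own storage, which is why it is the one that needs const.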
I am trying to understand how pointers,arrays and string literals work in C++.
Suppose we have the following line of code:
const char* const letters[] = {"A+","A"};
If I understand correctly, this declaration declares letters to be an array of constant pointers to constant characters. From my understanding, the compiler will actually convert each string literal to a null terminated char array and each element of letters is actually a constant pointer to the first element of that array.
So, for instance, letters[0] is actually a pointer to the "A" of "A+". However
std::cout<< letters[0];
actually outputs "A+" to the standard output. How can this be? Especially since letters[0] is a constant pointer?
My second question is related to the declaration above: if string literals are actually const char arrays, then why does the following line of code
const char* const letters[] = {{'A','+','\0'},{'A','\0'}};
produce
error: braces around scalar initializer for type ‘const char* const’
const char* const letters[] = {{'A','+','\0'},{'A','\0'}};
^
Thank you!
The standard specifies that a string literal is represented - as far as your program is concerned - as an array of const characters of static storage duration with a trailing '\0' terminator. The standard doesn't specify HOW a compiler achieves this effect, only that your program can treat the string literal in that way.
So modifying a string literal is either prevented (e.g. passing a string literal to a function expecting a char * is a diagnosable error, and the code will not compile) or - if code works around the type system to modify any character in a string literal - involves undefined behaviour.
In your example, letters[0] is of type const char *, and has a value equal to the address of the first character in the string literal "A+".
std::cout, being of type std::ostream, has an operator<<() that accepts a const char *. This function is called by the statement std::cout << letters[0] and the function assumes the const char * points at a zero-terminated array of char. It iterates over that array, outputting each character individually, until it encounters the trailing '\0' (which is not output).
The thing is, a const char * means that the pointer is to a const char, not that the pointer cannot be changed (that would be char * const). So it is possible to increment the pointer, but not change the value it points at. So, if we do
const char *p = letters[0];
while (*p != '\0')
{
std::cout << *p;
++p;
}
then the loop visits each character of the string literal "A+", printing each one individually and stopping when it reaches the '\0' (the above produces the same observable output as std::cout << letters[0]).
However, in the above
*p = 'C';
will not compile, since the definition of p tells the compiler that *p cannot be changed. However, incrementing p is still allowed.
The reason that
const char* const letters [] = {{'A','+','\0'},{'A','\0'}};
does not compile is that an array initialiser cannot be used to initialise pointers. For example:
const int *nums = {1,2,3}; // invalid
const int * const nums2 [] = {{1,2,3}, {4,5,6}}; // invalid
are both illegal. Instead, one is required to define arrays, not pointers.
const int nums[] = {1,2,3};
const int nums2[][3] = {{1,2,3}, {4,5,6}};
All versions of C and C++ forbid initialising pointers (or arrays of pointers in your example) in this way.
Technically, the ability to use string literals to initialise pointers is actually the anomaly, not the prohibition on initialising pointers using arrays. The reasons C introduced that exemption for string literals are historical (in very early days of C, well before K&R C, string literals could not be used to initialise pointers either).
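If the goal is to spell the characters out by hand, two layouts do compile. The sketch below is only an illustration; the names grid, a_plus, a_only, letter_ptrs and same_strings are invented here.

```cpp
#include <cstring>

// Option 1: a two-dimensional array; each row is its own char array.
const char grid[][3] = {{'A', '+', '\0'}, {'A', '\0'}};

// Option 2: named arrays, then an array of pointers to their first elements.
const char a_plus[] = {'A', '+', '\0'};
const char a_only[] = {'A', '\0'};
const char *const letter_ptrs[] = {a_plus, a_only};

static_assert(sizeof(grid) == 6, "two rows of three chars each");

// Hypothetical helper: true if both layouts hold the same strings.
bool same_strings() {
    return std::strcmp(grid[0], letter_ptrs[0]) == 0 &&
           std::strcmp(grid[1], letter_ptrs[1]) == 0;
}
```

In both cases the braces initialise arrays, and pointers are only ever initialised from an array that already exists.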
As for your first question, the type of letters[0] is const char * const. This is a pointer to a character, but not a character itself. When passing a pointer to a character to std::cout, it will treat it as a NUL-terminated C string, and writes out all characters from the start of the memory pointed to until it encounters a NUL-byte. So that is why the output will be A+. You can pass the first character of the first string by itself by writing:
std::cout << letters[0][0];
The fact that the pointers and/or the C strings themselves are const doesn't matter here, since nothing is writing to them.
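The two overloads can be compared side by side by writing into a std::ostringstream instead of std::cout. This is a small sketch; the helper names print_pointer and print_char are assumptions made for the example.

```cpp
#include <sstream>
#include <string>

// Mirrors the declaration from the question.
const char *const letters[] = {"A+", "A"};

// Hypothetical helper: streams the pointer, which selects the
// const char* overload and prints every character up to the '\0'.
std::string print_pointer() {
    std::ostringstream out;
    out << letters[0];
    return out.str();
}

// Hypothetical helper: streams a single char, which selects the
// char overload and prints exactly one character.
std::string print_char() {
    std::ostringstream out;
    out << letters[0][0];
    return out.str();
}
```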
As for your second question, each element of const char *const letters[] is a single pointer, but you are providing a brace-enclosed list of characters for each element on the right-hand side, and a pointer cannot be initialised from such a list. If you really wanted two arrays of characters, declare an array of arrays:
const char letters[][3] = {{'A', '+', '\0'}, {'A', '\0'}};
That holds the same strings as your code from the first question, although as a two-dimensional array rather than an array of pointers. Or if you want a single array:
const char letters[] = {'A', '+', '\0', 'A', '\0'};
That line is equal to:
const char letters[] = "A+\0A";
I'm really confused about const char * and char *.
I know in char * when we want to modify the content, we need to do something like this
const char * temp = "Hello world";
char * str = new char[strlen(temp) + 1];
memcpy(str, temp, strlen(temp));
str[strlen(temp) + 1] = '\0';
and if we want to use something like this
char * str = "xxx";
char * str2 = "xts";
str = str2;
we get a compiler warning. That's OK; I know that when I want to change a char * I have to use something like a memory copy. But I'm really confused about const char *. With const char * I can use this
const char * str = "Hello";
const char * str2 = "World";
str = str2; // and now str is World
and I have no compiler error! Why? Why do we use a memory copy when it is not const, while with const we only use the assignment operator and we're done? How is that possible? Is it OK to just use assignment with const? Will no problem happen later?
As other answers say, you should distinguish pointers and bytes they point to.
Both types of pointers, char * and const char *, can be changed, that is, "redirected" to point to different bytes. However, if you want to change the bytes (characters) of the strings, you cannot use const char *.
So, if you have string literals "Hello" and "World" in your program, you can assign them to pointers, and printing the pointer will print the corresponding literal. However, to do anything non-trivial (e.g. change Hello to HELLO), you will need non-const pointers.
Another example: with some pointer manipulation, you can remove leading bytes from a string literal:
const char* str = "Hello";
std::cout << str; // Hello
str = str + 2;
std::cout << str; // llo
However, if you want to extract a substring, or do any other transformation on a string, you should reallocate it, and for that you need a non-const pointer.
BTW since you are using C++, you can use std::string, which makes it easier to work with strings. It reallocates strings without your intervention:
#include <string>
std::string str("Hello");
str = str.substr(1, 3);
std::cout << str; // ell
This is a confusing hangover from the days of early C. Early C didn't have const, so string literals were "char *". They remained char * to avoid breaking old code, but they became non-modifiable, so const char * in all but name. So modern C++ either warns or gives an error (to be strictly conforming) when the const is omitted.
Your memcpy missed the trailing nul byte, incidentally. Use strcpy() to copy a string, that's the right function with the right name. You can create a string in read/write memory by use of the
char rwstring[] = "I am writeable";
syntax.
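For comparison, here is a hedged sketch of a copy that does include the terminator. The buffer has strlen + 1 elements, so the last valid index is strlen; writing at strlen + 1, as in the question, is one past the end. The function name duplicate is invented for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a writable heap copy of src.
// The caller owns the result and must release it with delete[].
char *duplicate(const char *src) {
    const std::size_t len = std::strlen(src);   // length without the '\0'
    char *copy = new char[len + 1];             // +1 leaves room for the '\0'
    std::memcpy(copy, src, len + 1);            // copy the terminator as well
    return copy;
}
```

A caller would use it as char *p = duplicate("Hello world"); and later delete[] p;. In C++ a std::string is usually the simpler choice, since it manages the buffer itself.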
That is because your variables are just pointers. You're not modifying their contents, but where they are pointing to.
char * a = "asd";
char * b = "qwe";
a = b;
Now you have thrown away the old value of a (the address of "asd"). a and b point to the same place, so a change made to the characters through one of them would be seen through the other.
In other words, in these declarations the pointer itself is never constant. The const in const char * says nothing about the pointer; it applies to the characters being pointed to.
The real difference is that the pointer (which is not const) is pointing to const data, and when you change the pointer it simply points to some other const data. That is why const has no effect on reassigning simple pointers.
Note: You can get different behaviour by placing const elsewhere (for example char * const makes the pointer itself constant). But in a case as simple as this one, it has no effect on assignment.
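The two positions of const can be checked at compile time. The following is a sketch using standard type traits; the aliases PtrToConst and ConstPtr and the helper redirect_demo are names made up for this illustration.

```cpp
#include <type_traits>
#include <utility>

using PtrToConst = const char *;   // pointer may change, characters may not
using ConstPtr   = char *const;    // pointer is fixed, characters may change

// Reassigning the pointer itself:
static_assert(std::is_assignable<PtrToConst &, const char *>::value,
              "a const char* can be pointed elsewhere");
static_assert(!std::is_assignable<ConstPtr &, char *>::value,
              "a char* const cannot be pointed elsewhere");

// Writing through the pointer:
static_assert(!std::is_assignable<decltype(*std::declval<PtrToConst>()),
                                  char>::value,
              "cannot write through a const char*");
static_assert(std::is_assignable<decltype(*std::declval<ConstPtr>()),
                                 char>::value,
              "can write through a char* const");

// Hypothetical helper: redirects a const char* and returns the first
// character it points at afterwards.
char redirect_demo() {
    const char *str = "Hello";
    const char *str2 = "World";
    str = str2;          // only the address stored in str changes
    return str[0];
}
```

No characters are copied by str = str2; the assignment only changes which literal str refers to.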
Citing Malcolm McLean:
This is a confusing hangover from the days of early C. Early C didn't have const, so string literals were "char *". They remained char * to avoid breaking old code, but they became non-modifiable, so const char * in all but name.
Actually, string literals are not pointers, but arrays; this is why sizeof("hello world") works like a charm (it yields 12, since the terminating null character is included, in contrast to strlen...). Apart from this small detail, the above statement is correct for good old C even these days.
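The sizeof/strlen difference can be verified directly. This is a minimal sketch; the function name length_via_pointer is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstring>

// The literal is an array, so sizeof sees all 12 elements at compile time.
static_assert(sizeof("hello world") == 12, "11 characters plus the terminator");

// Hypothetical helper: once the array has decayed to a pointer, its size is
// no longer known to the compiler, and strlen has to scan for the '\0'.
std::size_t length_via_pointer(const char *p) {
    return std::strlen(p);
}
```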
In C++, though, string literals have been arrays of constant characters (char const[]) right from the start:
C++ standard, 5.13.5.8:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration.
(Emphasised by me.) In general, you are not allowed to assign pointer to const to pointer to non-const:
char const* s = "hello";
char* ss = s;
This will fail to compile. Assigning string literals to pointer to non-const should normally fail, too, as the standard explicitly states in C.1.1, subclause 5.13.5:
Change: String literals made const.
The type of a string literal is changed from “array of char” to “array of const char”.
[...]char* p = "abc"; // valid in C, invalid in C++
Still, string literal assignment to pointer to non-const is commonly accepted by compilers (as an extension!), probably to retain compatibility with C. As this is, according to the standard, invalid, the compiler yields a warning, at least...
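The type quoted from the standard can be observed with decltype. The sketch below assumes a C++11 or later compiler; literal_is_const_array is a made-up helper name.

```cpp
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "\"abc\" is an array of 4 const char");

// After array-to-pointer decay the element type is still const.
static_assert(std::is_same<std::decay<decltype("abc")>::type,
                           const char *>::value,
              "the literal decays to const char*");

// Hypothetical helper so the sketch has something callable at run time:
// reports whether a single element of the literal is const-qualified.
bool literal_is_const_array() {
    return std::is_const<
        std::remove_reference<decltype("abc"[0])>::type>::value;
}
```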
While creating an array of pointers for int data type the following code works:
int var[] = {10, 100, 200, 1000};
int *ptr[] = {&var[0], &var[1], &var[2], &var[3]};
While creating an array of pointers for char data type the following is legal:
char *names[] = {"Mathew Emerson", "Bob Jackson"};
But if I create an array of pointers for int data type as follows:
int var[] = {10, 100, 200, 1000};
int *ptr[] = {var[0], var[1], var[2], var[3]};
I get a compiler error. I understand why I am getting a compilation error in the above declaration for the array of int pointers: var[i] is not the address of a variable, which is what a pointer must hold. But shouldn't I also get an error by the same logic in the declaration of my char array of pointers?
What is the reason that it is acceptable in a char array of pointers?
Is "a string value" an address of something that a pointer can point to, or is it just a const string value?
char *names[] = {"Mathew Emerson", "Bob Jackson"};
is not legal in C++. A string literal has the type const char[N], so it is illegal to store it as a char* because that violates const-correctness. Some compilers still allow this to compile as a legacy from C, where string literals have the type char[], but it is not standard C++. If you turn up the warnings on your compiler you should get something along the lines of
main.cpp: In function 'int main()':
main.cpp:5:53: warning: ISO C++ forbids converting a string constant to 'char*' [-Wpedantic]
char *names[] = {"Mathew Emerson", "Bob Jackson"};
If you want an array of strings then I suggest you use a std::string like
std::string names[] = {"Mathew Emerson", "Bob Jackson"};
The reason
char *names[] = {"Mathew Emerson", "Bob Jackson"};
"works" is that since the string literals are arrays they implicitly decay to pointers so
{"Mathew Emerson", "Bob Jackson"}
Becomes
{ address_of_first_string_literal, address_of_second_string_literal}
and then those are used to initialize the pointers in the array.
int *ptr[] = {var[0], var[1], var[2], var[3]};
Cannot work because var[N] is a reference to the int in the array and not a pointer.
"Mathew Emerson" is of type const char[15], which decays to const char* when used as an initializer here, so it can be stored directly in an array of pointers. The reason you need & for the int case is to "convert" int to int*.
(Ignoring the const-ness problem as mentioned in other answers...)
Each string literal you write ("Mathew Emerson", "Bob Jackson", ...) requires some storage location in the compiled code later.
It is as if you had written somewhere
char const MathewEmerson[] = { 'M', 'a', /*...*/ 'o', 'n', 0 };
So you could then construct your char const* array as:
char const* names[] = { &MathewEmerson[0], /*...*/ };
Since the address of an array and the address of its first element are the same, and arrays are implicitly converted to pointers, you can write instead
char const* names[] = { MathewEmerson, /*...*/ };
All this is done implicitly for you if you use string literals.
Similarly, you could have written:
int *ptr[] = {var, &var[1], &var[2], &var[3]};
(note: var, not &var[0] for the first item) and if we go further, even:
int *ptr[] = {var, var + 1, var + 2, var + 3};
The result always would have been the same. Readability, understandability of one variant vs another? Well, another topic...
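The claim that all three spellings give the same pointers can be checked with a short sketch. The function name same_addresses is hypothetical.

```cpp
// Hypothetical helper: builds the pointer array in three ways and checks
// that every slot holds the same address and points at the right int.
bool same_addresses() {
    int var[] = {10, 100, 200, 1000};
    int *a[] = {&var[0], &var[1], &var[2], &var[3]};  // explicit addresses
    int *b[] = {var, &var[1], &var[2], &var[3]};      // var decays to &var[0]
    int *c[] = {var, var + 1, var + 2, var + 3};      // pointer arithmetic
    for (int i = 0; i < 4; ++i) {
        if (a[i] != b[i] || b[i] != c[i]) return false;
        if (*a[i] != var[i]) return false;
    }
    return true;
}
```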
Your misconception is wrapped up in your interpretation of what is represented by: "Mathew Emerson"
This is an array of characters that will be instantiated in read only memory as part of your program's bootstrapping. This array of characters is called a String Literal, specifically a:
Narrow multibyte string literal. The type of an unprefixed string literal is const char[]
NathanOliver's answer correctly describes that your compiler doesn't have the warning level turned up high enough, so it is allowing the const char[] to decay into a char*. This is very bad because:
Attempting to modify a string literal results in undefined behavior: they may be stored in read-only storage (such as .rodata) or combined with other string literals[1]
It would have been completely legal and logical to do this, however: const char *names[] = {"Mathew Emerson", "Bob Jackson"}. Hopefully that clarifies what's happening well enough for you to understand that a string literal used this way decays to a pointer. Your code, int *ptr[] = {var[0], var[1], var[2], var[3]}, is then illegal because var[0] is an int&, not an int*. It would be similarly illegal to do: char *ptr[] = {names[0][0], names[0][1], names[0][2], names[0][3]}. Again, the problem here would be working with char&s, not char*s.
I know that const char * is a pointer to a const char, while char *const is a constant pointer to a char.
I am testing this in the following code:
const char *s = "hello"; // Not permitted to modify the string "hello"
char *const t = "world"; // Not permitted to modify the pointer t
s = "hello2"; // Valid
// t = "world2"; // Invalid, gives compilation error
// *(s + 1) = 'a'; // Invalid, gives compilation error
*(t + 1) = 'a'; // Why does this not work?
The last line does not give any error, but causes the program to terminate unexpectedly. Why is modifying the string pointed to by t not allowed?
t is pointing to a string literal, and it is undefined behavior to modify a string literal. The C++ draft standard section 2.14.5 String literals paragraph 12 says (emphasis mine):
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.
The relevant section from the C99 draft standard is 6.4.5 String literals paragraph 6, which says (emphasis mine):
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
On a typical modern Unix platform you will find string literals in the read-only segment, which would result in an access violation if we attempt to modify them. We can use objdump to inspect the read-only section as follows:
objdump -s -j .rodata
We can see in the following live example that the string literal will indeed be found in the read-only section. Note that I had to add a printf, otherwise the compiler would optimize out the string literal. Sample objdump output:
Contents of section .rodata:
400668 01000200 776f726c 64002573 0a00 ....world.%s..
An alternative approach would be to have t point to an array with a copy of a string literal like so:
char r[] = "world";
char *const t = r ;
Although string literals in C officially have a type of char[] (array of char, not const), the C standard specifically states that they must be treated as non-modifiable. Compilers tend to put string literals in a read-only segment, so attempting to modify them results in an access violation.
String literals are described in section 6.4.5 of the C11 standard (ISO/IEC 9899:2011).
You can bypass the compiler error by casting it to char*, as in *((char*)s + 1) = 'a'; but as was already stated in other answers, this is undefined behaviour and will probably result in a segmentation fault because you are editing a string literal.
If you want to test it properly, initialize the strings in a function so the initialization can be dynamic and use strdup() for that.
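#include <stdlib.h> /* free() */
#include <string.h> /* strdup() is POSIX */

int
main(int argc, char **argv)
{
char *d1 = strdup("hello"); /* writable heap copy of the literal */
char *d2 = strdup("world");
const char *s = d1;
char *const t = d2;
...
free(d1);
free(d2);
}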
The d1 and d2 variables are mainly used so that the dynamic allocations can be properly freed using free() at the end. Also, as other answers suggest, always treat string literals as const char *.
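In C++ the same experiment can be run without strdup() and free() by letting std::string own the writable copies. This is only a sketch of one alternative, not the approach from the answer above; the function name writable_demo is invented.

```cpp
#include <string>

// Hypothetical helper: std::string owns writable copies of the literals,
// so no manual free() is needed.
std::string writable_demo() {
    std::string d1 = "hello";
    std::string d2 = "world";
    const char *s = d1.c_str();   // read-only view of d1's characters
    char *const t = &d2[0];       // fixed pointer into d2's writable buffer
    *(t + 1) = 'a';               // well-defined: modifies d2, not a literal
    (void)s;                      // s is only here to mirror the question
    return d2;
}
```

The write through t now succeeds because it lands in storage owned by d2 rather than in the literal.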
If I define something like below,
char *s1 = "Hello";
why I can't do something like below,
*s1 = 'w'; // gives segmentation fault ...why???
What if I do something like below,
string s1 = "hello";
Can I do something like below,
*s1 = 'w';
Because "Hello" creates a const char[]. This decays to a const char* not a char*. In C++ string literals are read-only. You've created a pointer to such a literal and are trying to write to it.
But when you do
string s1 = "hello";
You copy the const char* "hello" into s1. The difference being in the first example s1 points to read-only "hello" and in the second example read-only "hello" is copied into non-const s1, allowing you to access the elements in the copied string to do what you wish with them.
If you want to do the same with a char* you need to allocate space for char data and copy hello into it
char hello[] = "hello"; // creates a char array big enough to hold "hello"
hello[0] = 'w'; // writes to the 0th char in the array
String literals are usually allocated in a read-only data segment.
Because Hello resides in read-only memory. Your declaration should actually be
const char* s1 = "Hello";
If you want a mutable buffer then declare s1 as a char[]. std::string overloads operator [], so you can index into it, i.e., s1[index] = 'w'.
Time to confuse matters:
char s0[] = "Hello";
s0[0] = 'w';
This is perfectly valid! Of course, this doesn't answer the original question, so here we go: string literals are created in read-only memory. That is, their type is char const[n] where n is the size of the string (including the terminating null character, i.e. n == 6 for the string literal "Hello"). But why, oh, why can this type be used to initialize a char*? The answer is simply backward compatibility, in particular compatibility with [old] C code: by the time const made it into the language, lots of places already initialized char* with string literals. Any decent compiler should warn about this abuse, however.