pointer to string and char catch 22 - c++

I'm studying on pointers and I'm stuck when I see char *p[10]. Because something is misunderstood. Can someone explain step-by-step and blow-by-blow why my logic is wrong and what the mistakes are and where did I think wrong and how should I think. Because I want to learn exactly. Also what about int *p[10]; ? Besides, for example x is a pointer to char but just char not chars. But how come char *x = "possible";
I think above one should be right but, I have seen for char *name[] = { "no month","jan","feb" }; I am really confused.

Your char *p[10] diagram shows an array where each element points to a character.
You could construct it like this:
char f = 'f';
char i = 'i';
char l1 = 'l';
char l2 = 'l';
char a1 = 'a';
char r1 = 'r';
char r2 = 'r';
char a2 = 'a';
char y = 'y';
char nul = '\0';
char *p[10] = { &f, &i, &l1, &l2, &a1, &r1, &r2, &a2, &y, &nul };
This is very different from the array
char p[10] = {'f', 'i', 'l', 'l', 'a', 'r', 'r', 'a', 'y', '\0'};
or
char p[10] = "fillarray";
which are arrays of characters, not pointers.
A pointer can equally well point to the first element of an array, as you've probably seen in constructions like
const char *p = "fillarray";
where p holds the address of the first element of an array defined by the literal.
This works because an array can decay into a pointer to its first element.
The same thing happens if you make an array of pointers:
/* Each element is a pointer to the first element of the corresponding string in the initialiser. */
const char *name[] = { "no month","jan","feb" };
You would get the same results with
const char* name[3];
name[0] = "no month";
name[1] = "jan";
name[2] = "feb";

char c = 'a';
Here, c is a char, typically a single byte of ASCII encoded data.
char* ptr = &c;
ptr is a char pointer. In C, all it does is point to a memory location and doesn't make any guarantees about what is at that location. You could use a char* to pass a char to a function to allow the function to allow the function to make changes to that char (pass by reference).
A common C convention is for a char* to point to a memory location where several characters are stored in sequence followed by the null character \0. This convention is called a C string:
char const* cstr = "hello";
cstr points to a block of memory 6 bytes long, ending with a null character. The data itself cannot be modified, though the pointer can be changed to point to something else.
An array of chars looks similar, but behaves slightly differently.
char arr[] = "hello";
Here arr IS a memory block of 6 chars. Since arr represents the memory itself, it cannot be changed to point to another location. The data can be modified though.
Now,
char const* name[] = { "Jan", " Feb"..., "Dec"};
is an array of pointer to characters.
name is a block of memory, each containing a pointer to a null-terminated string.
In the diagram, I think string* was accidentally used instead of char*. The difference between the left and the right, is not a technical difference really, but a difference in the way a char* is used. On the left each char* points to a single character, whereas in the one on the right, each char* points to a null-terminated block of characters.

Both are right.
A pointer in C or C++ may point either to a single item (a single char) or to the first in an array of items (char[]).
So a char *p[10]; definition may point to 10 single characters or 10 arrays (i.e. 10 strings).

Let’s go back to basics.
First, char *p is simply a pointer. p contains nothing more than a memory address. That memory address can point to anything, anywhere. By convention, we have always used NULL (or, I hate this method, assigning it to zero – yeah, they are the same “thing”, but NULL has traditionally been used in conjunction with pointers, so when you’re eyes flit across the code, you see NULL – you think “pointer”).
Anyway, that memory address being pointed to can contain anything. So, to use within the language, we type it, in this case it is a pointer to a character (char *p). This can be overridden by type casting, but that’s for a later time.
Second, we know anytime we see p[10], that we are dealing with an array. Again, the array can be an array of characters, an array of ints, etc. – but it’s still an array.
Your example: char *p[10], is then nothing more than an array of 10 character pointers. Nothing more, nothing less. Your problem comes in because you are trying to force the “string” concept onto this. There ain’t no strings in C. There ain’t no objects in C. The concept of a NULL-terminated string can most certainly be used. But a “string” in C is nothing more than an array of characters, terminated by a NULL (or, if you use some of the appropriate functions, you can use a specific number of characters – strncpy instead of strcpy, etc.). But, for all its appearance, and apparent use, there are no strings in C. They are nothing more than arrays of characters, with a few supporting functions that happen to stop going through the array when a NULL is encountered.
So – char a[10] – is simply an array of characters that is 10 characters long. You can fill it with any characters you wish. If one of those is the NULL character, then that terminates what is typically called a “C-style string”. There are functions that support this type of character array (i.e. “string”), but it is still a use of a character array.
Your confusion comes in because you are trying to mix C++ string objects, and forcing that concept onto C arrays of characters. As ugoren noted – your examples are both correct – because you are dealing with arrays of character pointers, NOT strings. Again, putting a NULL somewhere in that character array is happily supported by several C functions that give you the ability to work with a “string-like” concept – but they are not truly strings. Unless of course, you want to phrase it that a string is nothing more than one character following another – an array.

Related

Array of char pointers

I am looking at some code I did not write and wanted help to understand an element of it. The code stores character arrays, creates pointers to these arrays (assigning the pointers the arrays addresses). It looks like it then creates an array to store these characters pointers addresses and I just wanted some clarification on what I am looking at exactly. I also am confused about the use of the double (**) when creating the array.
I have included a stripped down and simplified example of what I am looking at below.
char eLangAr[20] = "English";
char fLangAr[20] = "French";
char gLangAr[20] = "German";
char* eLangPtr = eLangAr;
char* fLangPtr = fLangAr;
char* gLangPtr = gLangAr;
char **langStrings [3]=
{
&eLangPtr,
&fLangPtr,
&gLangPtr
};
When using the array they pass it as an argument to a function.
menu (*langStrings[0]);
So the idea is that the character array value "English" is passed to the function but I'm having trouble seeing how. They pass the menu function a copy of the value stored in the langStrings function at location 0, which would be the address of eLandPtr? If someone could explain the process in English so I could get my head around it that would be great. It might be just because it has been a long day but its just not straight in my head at all.
You are correct that langStrings contains pointers to pointers of array of characters. So each langString[i] points to a pointer. That pointer points to an array. That array contains the name of a language.
As others point out, it looks a bit clumsy. I'll elaborate:
char eLangAr[20] = "English"; is an array of 20 chars and the name "English" is copied to it. I don't expect that variable eLangAr will ever contain something else than this language name, so there is no need to use an array; a constant would be sufficient.
char **langStrings [3]= ... Here, it would be sufficient to have just one indirection (one *) as there seems no need to ever have the pointer point to anything else (randomly shuffle languages?).
in conclusion, just having the following should be sufficient:
const char *langStrings [3]=
{
"English",
"French",
"German"
};
(Note the const as the strings are now read-only constants/literals.)
What the given code could be useful for is when these language names must be spelled differently in different languages. So
"English",
"French",
"German" become "Engels", "Frans", "Duits". However, then there still is one level of indirection too many and the following would be sufficient:
char *langStrings [3]=
{
aLangArr,
fLangAr,
gLangAr
};
Ok, here goes.
The **ptrToptr notation means a pointer to a pointer. The easiest way to think on that is as a 2D matrix - dereferencing one pointer resolves the whole matrix to just one line in the matrix. Dereferencing the second pointer after that will give one value in the matrix.
This declaration:
char eLangAr[20] = "English";
Declares an array of length 20, type char and it contains the characters 'E', 'n', 'g', 'l', 'i', 's', 'h' '\0'
so it is (probably) null terminated but not full (there are some empty characters at the end). You can set a pointer to the start of it using:
char* englishPtr = &eLangAr[0]
And if you dereference englishPtr, it'll give the value 'E'.
This pointer:
char* eLangPtr = eLangAr;
points to the array itself (not necessarily the first value).
If you look at
*langStrings[0]
You'll see it means the contents (the dereferencing *) of the pointer in langStrings[0]. langStrings[0] is the address of eLangPtr, so dereferencing it gives eLangPtr. And eLangPtr is the pointer to the array eLangAr (which contains "English").
I guess the function wants to be able to write to eLangAr, so make it point to a different word without overwriting "English" itself. It could just over-write the characters itself in memory, but I guess it wants to keep those words safe.
** represents pointer to pointer. It's used when you have to store pointer for another pointer.
Value of eLangPtr will be pointer to eLangAr which will has the value "English"
Here:
char eLangAr[20] = "English";
an array is created. It has capacity of 20 chars and contains 8 characters - word "English" and terminating NULL character. Since it's an array, it can be used in a context where pointer is expected - thanks to array-to-pointer decay, which will construct a pointer to the first element of an array. This is done here:
char* eLangPtr = eLangAr;
Which is the same as:
char* eLangPtr = &eLangAr[0]; // explicitly get the address of the first element
Now, while char* represents pointer to a character (which means it points to a single char), the char** represents a pointer to the char* pointer.
The difference:
char text[] = "Text";
char* chPointer = &ch[0];
char** chPointerPointer = &chPointer; // get the address of the pointer
std::cout << chPointer; // prints the address of 'text' array
std::cout << *chPointer; // prints the first character in 'text' array ('T')
std::cout << chPointerPointer; // prints the address of 'chPointer'
std::cout << *chPointerPointer; // prints the value of 'chPointer' which is the address of 'text' array
std::cout << *(*chPointerPointer); // prints the first character in 'text' array ('T')
As you can see, it is simply an additional level of indirection.
Pointers to pointers are used for the same reason "first-level" pointers are used - they allow you to take the address of a pointer and pass it to a function which may write something to it, modifying the content of the original pointer.
In this case it is not needed, this would be sufficient:
const char *langStrings [3]=
{
eLangPtr,
fLangPtr,
gLangPtr
};
And then:
menu (langStrings[0]);

Why C++ variable doesn't need defining properly when it's a pointer?

I'm completely new to the C++ language (pointers in particular, experience is mainly in PHP) and would love some explanation to the following (I've tried searching for answers).
How are both lines of code able to do exactly the same job in my program? The second line seems to go against everything I've learnt & understood so far about pointers.
char disk[3] = "D:";
char* disk = "D:";
How am I able to initialize a pointer to anything other than a memory address? Not only that, in the second line I'm not declaring the array properly either - but it's still working?
The usual way to initialize an array in C and C++ is:
int a[3] = { 0, 1, 2 };
Aside: And you can optionally leave out the array bound and have it deduced from the initializer list, or have a larger bound than there are initializers:
int aa[] = { 0, 1, 2 }; // another array of three ints
int aaa[5] = { 0, 1, 2 }; // equivalent to { 0, 1, 2, 0, 0}
For arrays of characters there is a special rule that allows an array to be initialized from a string literal, with each element of the array being initialized from the corresponding character in the string literal.
Your first example uses the string literal "D:" so each element of the array will be initialized to a character from that string, equivalent to:
char disk[3] = { 'D', ':', '\0' };
(The third character is the null terminator, which is implicitly present in all string literals).
Aside: Here too you can optionally leave out the array bound and have it deduced from the string literal, or have a larger bound than the string length:
char dd[] = "D:"; // another array of three chars
char ddd[5] = "D:"; // equivalent to { 'D', ':', '\0', '\0', '\0'}
Just like the aaa example above, the extra elements in ddd that don't have a corresponding character in the string will be zero-initialized.
Your second example works because the string literal "D:" will be output by the compiler and stored somewhere in the executable as an array of three chars. When the executable is run the segment that contains the array (and other constants) will be mapped into the process' address space. So your char* pointer is then initialized to point to the location of that array, wherever that happens to be. Conceptually it's similar to:
const char __some_array_created_by_the_compiler[3] = "D:";
const char* disk = __some_array_created_by_the_compiler;
For historical reasons (mostly that const didn't exist in the early days of C) it was legal to use a non-const char* to point to that array, even though the array is actually read-only, so C and the first C++ standard allow you to use a non-const char* pointer to point to a string literal, even though the array that it refers to is really const:
const char __some_array_created_by_the_compiler[3] = "D:";
char* disk = (char*)__some_array_created_by_the_compiler;
This means that despite appearances your two examples are not exactly the same, because this is only allowed for the first one:
disk[0] = 'C';
For the first example that is OK, it alters the first element of the array.
For the second example it might compile, but it results in undefined behaviour, because what it's actually doing is modifying the first element of the __some_array_created_by_the_compiler which is read-only. In practice what will probably happen is that the process will crash, because trying to write to a read-only page of memory will raise a segmentation fault.
It's important to understand that there are lots of things in C++ (and even more in C) which the compiler will happily compile, but which cause Very Bad Things to happen when the code is executed.
char disk[3] = "D:";
Is treated as
char disk[3] = {'D',':','\0'};
Where as in C++11 and above
char* disk = "D:";
Is an error as a string literal is of type const char[] and cannot be assigned to a char *. You can assign it to a const char * though.
String literals are actually read-only, zero-terminated arrays of characters, and using a string literal gives you a pointer to the first character in the array.
So in the second example
char* disk = "D:";
you initialize disk to point to the first character of an array of three characters.
Note in my first paragraph above that I said that string literals are read-only arrays, that means that having a plain char* pointing to this array could make you think that it's okay to modify this array when it's not (attempting to modify a string literal leads to undefined behavior). This is the reason that const char* is usually used:
const char* disk = "D:";
Since C++11 it's actually an error to not use a const char*, through most compilers still only warn about it instead of producing an error.
You are absolutely right to say that pointers can store only memory address. Then how is the second statement valid? Let me explain.
When you put a sequence of characters in double quotes, what happens behind the screens is that the string gets stored in a read only computer memory and the address of the location where the string is stored is returned. So at run-time, the expression is evaluated, the string evaluates to the memory address, which is a character pointer. It is this pointer that is assigned to your pointer variable.
So what is the difference between the two statements? The string in the second case is a constant, while the string declared by the first statement can be changed.

how to understand char * ch="123"?

How should I understand char * ch="123"?
'1' is a char, so I can use:
char x = '1';
char *pt = &x;
But how do I understand char *pt="123"? Why can the char *pt point to string?
Is pt's value the first address value for "123"? If so, how do I get the length for the string pointed to by pt?
That is actually a really good question, and it is the consequence of several oddities in the C language:
1: A pointer to a char (char*) can of course also point to a specific char in an array of chars. That is what pointer arithmetic relies on:
// create an array of three chars
char arr[3] = { 'a', 'b', 'c'};
// point to the first char in the array
char* ptr = &arr[0]
// point to the third char in the array
char* ptr = &arr[2]
2: A string literal ("foo") is actually not a string as such, but simply an array of chars, followed by a null byte. (So "foo" is actually equivalent to the array {'f', 'o', 'o', '\0'})
3: In C, arrays "decay" into pointers to the first element. (This is why many people incorrectly says that "there is no difference between arrays and pointers in C"). That is, when you try to assign an array to a pointer object, it sets the pointer to point to the first element of the array. So given the array arr declared above, you can do char* ptr = arr, and it means the same as char* ptr = &arr[0].
4: In every other case, syntax like this would make the pointer point to an rvalue (loosely speaking, a temporary object, which you can't take the address of), which is generally illegal. (You can't do int* ptr = &42). But when you define a string literal (such as "foo"), it does not create an rvalue. Instead, it creates the char array with static storage. You're creating a static object, which is created when the program is loaded, and of course a pointer can safely point to that.
5: String literals are actually required to be marked as const (because they are static and read-only), but because early versions of C did not have the const keyword, you are allowed to omit the const specifier (at least prior to C++11), to avoid breaking old code (but you still have to treat the variable as read-only).
So char* ch = "123" really means:
write the char array {'1', '2', '3', '\0'} into the static section of the executable (so that when the program is loaded into memory, this variable is created in a read-only section of memory)
when this line of code is executed, create a pointer which points to the first element of this array
As a bonus fun fact, this differs from char ch[] = "123";, which instead means
write the char array {'1', '2', '3', '\0'} into the static section of the executable (so that when the program is loaded into memory, this variable is created in a read-only section of memory)
when this line of code is executed, create an array on the stack which contains a copy of this statically allocated array.
char* ptr = "123"; is compatible and almost equivalent to char ptr[] = { '1', '2', '3', '\0' }; (see http://ideone.com/rFOk3R).
In C a pointer can point to one value or an array of contiguous values. C++ inherited this.
So a string is just an array of character (char) ended by a '\0'. And a pointer to char can point to an array of char.
The length is given by the number of character between the begining and the terminal '\0'. Exemple of C strlen giving you the length of the string:
size_t strlen(const char * str)
{
const char *s;
for (s = str; *s; ++s) {}
return(s - str);
}
An yes it fails horribly if there is no '\0' at the end.
A string literal is an array of N const char where N is the length of the literal including the implicit NUL terminator. It has static storage duration and it's implementation defined where it is stored. From here on, it's the same a with a normal array - it decays to a pointer to its first character - that's a const char*. What you have there is not legal (not anymore since onset of C++11 standard) in C++, it should be const char* ch = "123";.
You can get the length of a literal with sizeof operator. Once it decays to a pointer, though, you need to iterate through it and find the terminator (that's what strlen function does).
So, with a const char* ch; you get a pointer to a constant character type that can point to a single character, or to the start of an array of characters (or anywhere between the start and the end). The array can be dynamically, autimatically or statically allocated and can be mutable or not.
In something like char ch[] = "text"; you have an array of characters. This is syntatic sugar for a normal array initializer (as in char ch[] = {'t','e','x','t','\0'}; but note that the literal will still be loaded at the start of the program). What hapens here is:
an array with automatic storage duration is allocated
its size is deduced from the size of the literal by the compiler
the contents of the literal are copied to the array
As a result, you have a region of storage that you can use at will (unlike literals, which must not be written into).
There are no strings in C, but there are pointers to characters.
*pt is indeed not pointing to a string, but to a single characters (the '1').
However, some functions take char* as argument assume that the byte on the address following the address that their argument points to, is set to 0 if they are not to operate on it.
In your example, if you tried using pt on a function which expects a "null terminated string" (basically, which expects that it will encounter a byte with a value of 0 when it should stop processing data) you will run into a segmentation fault, as x='1' gives x the ascii value of the 1 character, but nothing more, whereas char* pt="123" gives pt the value of the address of 1, but also puts into that memory, the bytes containing ascii values of 1, 2,3 followed by a byte with a value of 0 (zero).
So the memory (in a 8 bit machine) may look like this:
Address = Content (0x31 is the Ascii code for the character 1 (one))
0xa0 = 0x31
0xa1 = 0x32
0xa2 = 0x33
0xa3 = 0x00
Let's suppose that you in the same machine char* otherString = malloc(4),suppose that malloc returns a value of 0xb0, which is now the value of otherString, and we wanted to copy our "pt" (which would have a value of 0xa0) into otherString, the strcpy call would look like so:
strcpy( otherString, pt );
The same as
strcpy( 0xb0, 0x0a );
strcpy would then take the value of address 0xa0 and copy it into 0xb0, it would increment it's pointers to "pt" to 0xa1, check if 0xa1 is zero, if it is not zero, it would increment it's pointer to "otherString" and copy 0xa1 into 0xb1, and so on, until it's "pt" pointer is 0xa3, in this case, it will return as it detected that the end of the "string" has been reached.
This is of cause, not 100% how it goes on, and it could be implemented in many different ways.
Here is one http://fossies.org/dox/glibc-2.18/strcpy_8c_source.html
A pointer to an array?
A pointer points to only one memory address. The phrase that a pointer points to an array is only used in a loose sense---a pointer cannot really store multiple addresses at the same time.
In your example, char *ch="123", the pointer ch is really pointing to the first byte only. You can write code like the following, and it will make perfect sense:
char *ch = new char [1024];
sprintf (ch, "Hello");
delete [] ch;
char x = '1';
ch = &x;
Please note the use of the pointer ch to point to both the memory allocated by new char [1024] line as well as the address of the variable x, while still being the same pointer type.
C-style strings are null terminated
Strings in C used to be null terminated, i.e., a special '\0' was added to the end of the string and assumed to be there for all char * based functions (such as strlen and printf) This way, you can determine the length of the string by starting at the first byte and continue till you find the byte containing 0x00.
A verbose, sample implementation of anstrlen style function would be
int my_strlen (const char *startAddress)
{
int count = 0;
char *ptr = startAddress;
while (*ptr != 0)
{
++count;
++ptr;
}
return count;
}
char* pt = "123"; does two things:
1. creates the string literal "123" in ROM (this is usually in .text section)
2. creates a char* which is assigned the beginning of memory location where the string is located.
because of this operations like pt[1] = '2'; are illegal as you would be attempting to write to ROM memory.
But you can assign the pointer to some other memory location without any problems.

What is a char*?

Why do we need the *?
char* test = "testing";
From what I understood, we only apply * onto addresses.
This is a char:
char c = 't';
It can only hold one character!
This is a C-string:
char s[] = "test";
It can hold multiple characters. Another way to write the above is:
char s[] = {'t', 'e', 's', 't', 0};
The 0 at the end is called the NUL terminator. It denotes the end of a C-string.
A char* stores the starting memory location of a C-string.1 For example, we can use it to refer to the same array s that we defined above. We do this by setting our char* to the memory location of the first element of s:
char* p = &(s[0]);
The & operator gives us the memory location of s[0].
Here is a shorter way to write the above:
char* p = s;
Notice:
*(p + 0) == 't'
*(p + 1) == 'e'
*(p + 2) == 's'
*(p + 3) == 't'
*(p + 4) == 0 // NUL
Or, alternatively:
p[0] == 't'
p[1] == 'e'
p[2] == 's'
p[3] == 't'
p[4] == 0 // NUL
Another common usage of char* is to refer to the memory location of a string literal:
const char* myStringLiteral = "test";
Warning: This string literal should not be changed at runtime. We use const to warn the programmer (and compiler) not to modify myStringLiteral in the following illegal manner:
myStringLiteral[0] = 'b'; // Illegal! Do not do this for const char*!
This is different from the array s above, which we are allowed to modify. This is because the string literal "test" is automatically copied into the array at initialization phase. But with myStringLiteral, no such copying occurs. (Where would we copy to, anyways? There's no array to hold our data... just a lonely char*!)
1 Technical note: char* merely stores a memory location to things of type char. It can certainly refer to just a single char. However, it is much more common to use char* to refer to C-strings, which are NUL-terminated character sequences, as shown above.
The char type can only represent a single character. When you have a sequence of characters, they are piled next to each other in memory, and the location of the first character in that sequence is returned (assigned to test). Test is nothing more than a pointer to the memory location of the first character in "testing", saying that the type it points to is a char.
You can do one of two things:
char *test = "testing";
or:
char test[] = "testing";
Or, a few variations on those themes like:
char const *test = "testing";
I mention this primarily because it's the one you usually really want.
The bottom line, however, is that char x; will only define a single character. If you want a string of characters, you have to define an array of char or a pointer to char (which you'll initialize with a string literal, as above, more often than not).
There are real differences between the first two options though. char *test=... defines a pointer named test, which is initialized to point to a string literal. The string literal itself is allocated statically (typically right along with the code for your program), and you're not supposed to (attempt to) modify it -- thus the preference for char const *.
The char test[] = .. allocates an array. If it's a global, it's pretty similar to the previous except that it does not allocate a separate space for the pointer to the string literal -- rather, test becomes the name attached to the string literal itself.
If you do this as a local variable, test will still refer directly to the string literal - but since it's a local variable, it allocates "auto" storage (typically on the stack), which gets initialized (usually from a normal, statically allocated string literal) on every entry to the block/scope where it's defined.
The latter versions (with an array of char) can act deceptively similar to a pointer, because the name of an array will decay to the address of the beginning of the array anytime you pass it to a function. There are differences though. You can modify the array, but modifying a string literal gives undefined behavior. Conversely, you can change the pointer to point at some other chars, so something like:
char *test = "testing";
if (whatever)
test = "not testing any more";
...is perfectly fine, but trying to do the same with an array won't work (arrays aren't assignable).
The main thing people forgot to mention is that "testing" is an array of chars in memory, there's no such thing as primitive string type in c++. Therefore as with any other array, you can't reference it as if it is an element.
char* represents the address of the beginning of the contiguous block of memory of char's. You need it as you are not using a single char variable you are addressing a whole array of char's
When accessing this, functions will take the address of the first char and step through the memory. This is possible as arrays use contiguous memory (i.e. all of the memory is consecutive in memory).
Hope this clears things up! :)
Using a * says that this variable points to a location in memory. In this case, it is pointing to the location of the string "testing". With a char pointer, you are not limited to just single characters, because now you have more space available to you.
In C a array is represented by a pointer to the first element in it.

Difference between using character pointers and character arrays

Basic question.
char new_str[]="";
char * newstr;
If I have to concatenate some data into it or use string functions like strcat/substr/strcpy, what's the difference between the two?
I understand I have to allocate memory to the char * approach (Line #2). I'm not really sure how though.
And const char * and string literals are the same?
I need to know more on this. Can someone point to some nice exhaustive content/material?
The excellent source to clear up the confusion is Peter Van der Linden, Expert C Programming, Deep C secrets - that arrays and pointers are not the same is how they are addressed in memory.
With an array, char new_str[]; the compiler has given the new_str a memory address that is known at both compilation and runtime, e.g. 0x1234, hence the indexing of the new_str is simple by using []. For example new_str[4], at runtime, the code picks the address of where new_str resides in, e.g. 0x1234 (that is the address in physical memory). by adding the index specifier [4] to it, 0x1234 + 0x4, the value can then be retrieved.
Whereas, with a pointer, the compiler gives the symbol char *newstr an address e.g. 0x9876, but at runtime, that address used, is an indirect addressing scheme. Supposing that newstr was malloc'd newstr = malloc(10);, what is happening is that, everytime a reference in the code is made to use newstr, since the address of newstr is known by the compiler i.e. 0x9876, but what is newstr pointing to is variable. At runtime, the code fetches data from physical memory 0x9876 (i.e. newstr), but at that address is, another memory address (since we malloc'd it), e.g 0x8765 it is here, the code fetches the data from that memory address that malloc assigned to newstr, i.e. 0x8765.
The char new_str[] and char *newstr are used interchangeably, since an zeroth element index of the array decays into a pointer and that explains why you could newstr[5] or *(newstr + 5) Notice how the pointer expression is used even though we have declared char *newstr, hence *(new_str + 1) = *newstr; OR *(new_str + 1) = newstr[1];
In summary, the real difference between the two is how they are accessed in memory.
Get the book and read it and live it and breathe it. Its a brilliant book! :)
Please go through this article below:
Also see in case of array of char like in your case, char new_str[] then the new_str will always point to the base of the array. The pointer in itself can't be incremented. Yes you can use subscripts to access the next char in array eg: new_str[3];
But in case of pointer to char, the pointer can be incremented new_str++ to fetch you the next character in the array.
Also I would suggest this article for more clarity.
This is a character array:
char buf [1000];
So, for example, this makes no sense:
buf = &some_other_buf;
This is because buf, though it has characteristics of type pointer, it is already pointing to the only place that makes sense for it.
char *ptr;
On the other hand, ptr is only a pointer, and may point somewhere. Most often, it's something like this:
ptr = buf; // #1: point to the beginning of buf, same as &buf[0]
or maybe this:
ptr = malloc (1000); // #2: allocate heap and point to it
or:
ptr = "abcdefghijklmn"; // #3: string constant
For all of these, *ptr can be written to—except the third case where some compiling environment define string constants to be unwritable.
*ptr++ = 'h'; // writes into #1: buf[0], #2: first byte of heap, or
// #3 overwrites "a"
strcpy (ptr, "ello"); // finishes writing hello and adds a NUL
The difference is that one is a pointer, the other is an array. You can, for instance, sizeof() array. You may be interested in peeking here
If you're using C++ as your tags indicate, you really should be using the C++ strings, not the C char arrays.
The string type makes manipulating strings a lot easier.
If you're stuck with char arrays for some reason, the line:
char new_str[] = "";
allocates 1 byte of space and puts a null terminator character into it. It's subtly different from:
char *new_str = "";
since that may give you a reference to non-writable memory. The statement:
char *new_str;
on its own gives you a pointer but nothing that it points to. It can also have a random value if it's local to a function.
What people tend to do (in C rather than C++) is to do something like:
char *new_str = malloc (100); // (remember that this has to be freed) or
char new_str[100];
to get enough space.
If you use the str... functions, you're basically responsible for ensuring that you have enough space in the char array, lest you get all sorts of weird and wonderful practice at debugging code. If you use real C++ strings, a lot of the grunt work is done for you.
The type of the first is char[1], the second is char *. Different types.
Allocate memory for the latter with malloc in C, or new in C++.
char foo[] = "Bar"; // Allocates 4 bytes and fills them with
// 'B', 'a', 'r', '\0'.
The size here is implied from the initializer string.
The contents of foo are mutable. You can change foo[i] for example where i = 0..3.
OTOH if you do:
char *foo = "Bar";
The compiler now allocates a static string "Bar" in readonly memory and cannot be modified.
foo[i] = 'X'; // is now undefined.
char new_str[]="abcd";
This specifies an array of characters (a string) of size 5 bytes (one byte for each character plus one for the null terminator). So it stores the string 'abcd' in memory and we can access this string using the variable new_str.
char *new_str="abcd";
This specifies a string 'abcd' is stored somewhere in the memory and the pointer new_str points to the first character of that string.
To differentiate them in the memory allocation side:
// With char array, "hello" is allocated on stack
char s[] = "hello";
// With char pointer, "hello" is stored in the read-only data segment in C++'s memory layout.
char *s = "hello";
// To allocate a string on heap, malloc 6 bytes, due to a NUL byte in the end
char *s = malloc(6);
s = "hello";
If you're in c++ why not use std::string for all your string needs? Especially anything dealing with concatenation. This will save you from a lot of problems.