I'm new to the C++ language and I am trying to understand the concept of pointers.
I have a basic question regarding the char pointer,
What I know is that the pointer is a variable that stores an address value,
so when I write something like this:
char * ptr = "hello";
From my basic knowledge, I think that after = there should be an address to be assigned to the pointer, but here we assign "hello" which is set of chars.
So what does that mean ?
Does the pointer ptr point to an address that stores "hello", or does it store the "hello" itself?
I'm so confused, hope you guys can help me.
Thanks in advance.
ptr holds the address where the literal "hello" is stored. In this case, it points to a string literal. It's an immutable array of characters located in static (most commonly read-only) memory.
You can make ptr point to something else by re-assigning it, but as long as it points to the literal, modifying the contents is illegal. (The literal's type is actually const char[6], which converts to const char*. The conversion to char* existed only for C compatibility; it was deprecated in earlier C++ and is ill-formed since C++11.)
Because of this guarantee, the compiler is free to optimize for space, so
char * ptr = "hello";
char * ptr1 = "hello";
might yield two equal pointers. (i.e. ptr == ptr1)
The pointer is pointing to the address where "hello" is stored. More precisely, it is pointing to the 'h' in "hello".
"hello" is a string literal: a static array of characters. Like all arrays, it can be converted to a pointer to its first element, if it's used in a context that requires a pointer.
However, the array is constant, so assigning it to char* (rather than const char*) is a very bad idea. You'll get undefined behaviour (typically an access violation) if you try to use that pointer to modify the string.
The compiler will "find somewhere" that it can put the string "hello", and the ptr will have the address of that "somewhere".
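As a minimal sketch of what "pointing to the 'h'" means in practice: indexing through the pointer simply walks forward from that first character. The helper name char_at is made up for illustration, and const char* is used because the literal must not be modified.

```cpp
#include <cassert>
#include <cstddef>

// Returns the i-th character reachable from p; p[i] means *(p + i).
// The caller must keep i within the string, terminator included.
char char_at(const char* p, std::size_t i) {
    assert(p != nullptr);
    return p[i];
}
```

With const char* ptr = "hello";, char_at(ptr, 0) is 'h', char_at(ptr, 4) is 'o', and char_at(ptr, 5) is the terminating '\0'.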
When you create a new char* by assigning it a string literal, what happens is that the char* gets assigned the address of the literal. So the actual value of the char* might be 0x87F2F1A6 (some hex address value). The char* points to the start (in this case the first char) of the string. In C and C++, all C-style strings are terminated with a '\0'; this is how the system knows it has reached the end of the string.
char* text = "Hello!" can be thought of as the following:
At program start, you create an array of chars, 7 in length:
{'H','e','l','l','o','!','\0'}. The last one is the null character and shows that there aren't any more characters after it. [It's more efficient than keeping a count associated with the string... A count would take up perhaps 4 bytes for a 32-bit integer, while the null character is just a single byte, or two bytes if you're using Unicode strings. Plus it's less confusing to have a single array ending in the null character than to have to manage an array of characters and a counting variable at the same time.]
The difference between creating an array and making a string constant is that an array is editable and a string constant (or 'string literal') is not. Trying to set a value in a string literal causes problems: they are read-only.
Then, whenever you call the statement char* text = "Hello!", you take the address of that initial array and stick it into the variable text. Note that if you have something like this...
char* text1 = "Hello!";
char* text2 = "Hello!";
char* text3 = "Hello!";
...then it's quite possible that you're creating three separate arrays of {'H','e','l','l','o','!','\0'}, so it would be more efficient to do this...
char* _text = "Hello!";
char* text1 = _text;
char* text2 = _text;
char* text3 = _text;
Most compilers are smart enough to only initialize one string constant automatically, but some will only do that if you manually turn on certain optimization features.
Another note: do not use delete [] on a pointer to a string literal. The literal was never allocated with new [], so deleting it is undefined behaviour, even if it appears to do nothing on a particular compiler.
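The array layout described in this answer can be checked with a small sketch. The names literal_bytes and copies_share_address are illustrative only. Whether three separate "Hello!" literals share one address is left to the compiler, so the sketch only checks what is guaranteed: the size of the literal and the fact that copying a pointer copies the address, not the characters.

```cpp
#include <cassert>
#include <cstddef>

// sizeof applied to the literal itself reports the whole array,
// including the terminating '\0': six visible characters plus one.
constexpr std::size_t literal_bytes = sizeof("Hello!");
static_assert(literal_bytes == 7, "six characters plus the null terminator");

// Copying a pointer never copies the characters:
// both copies hold the same address as the original.
bool copies_share_address() {
    const char* text  = "Hello!";
    const char* text1 = text;
    const char* text2 = text;
    assert(text1 == text);
    return text1 == text2;
}
```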
I'm completely new to the C++ language (pointers in particular, experience is mainly in PHP) and would love some explanation to the following (I've tried searching for answers).
How are both lines of code able to do exactly the same job in my program? The second line seems to go against everything I've learnt & understood so far about pointers.
char disk[3] = "D:";
char* disk = "D:";
How am I able to initialize a pointer to anything other than a memory address? Not only that, in the second line I'm not declaring the array properly either - but it's still working?
The usual way to initialize an array in C and C++ is:
int a[3] = { 0, 1, 2 };
Aside: And you can optionally leave out the array bound and have it deduced from the initializer list, or have a larger bound than there are initializers:
int aa[] = { 0, 1, 2 }; // another array of three ints
int aaa[5] = { 0, 1, 2 }; // equivalent to { 0, 1, 2, 0, 0}
For arrays of characters there is a special rule that allows an array to be initialized from a string literal, with each element of the array being initialized from the corresponding character in the string literal.
Your first example uses the string literal "D:" so each element of the array will be initialized to a character from that string, equivalent to:
char disk[3] = { 'D', ':', '\0' };
(The third character is the null terminator, which is implicitly present in all string literals).
Aside: Here too you can optionally leave out the array bound and have it deduced from the string literal, or have a larger bound than the string length:
char dd[] = "D:"; // another array of three chars
char ddd[5] = "D:"; // equivalent to { 'D', ':', '\0', '\0', '\0'}
Just like the aaa example above, the extra elements in ddd that don't have a corresponding character in the string will be zero-initialized.
Your second example works because the string literal "D:" will be output by the compiler and stored somewhere in the executable as an array of three chars. When the executable is run the segment that contains the array (and other constants) will be mapped into the process' address space. So your char* pointer is then initialized to point to the location of that array, wherever that happens to be. Conceptually it's similar to:
const char __some_array_created_by_the_compiler[3] = "D:";
const char* disk = __some_array_created_by_the_compiler;
For historical reasons (mostly that const didn't exist in the early days of C), C and C++ before C++11 allow a non-const char* to point to a string literal, even though the array it refers to is really const:
const char __some_array_created_by_the_compiler[3] = "D:";
char* disk = (char*)__some_array_created_by_the_compiler;
This means that despite appearances your two examples are not exactly the same, because this is only allowed for the first one:
disk[0] = 'C';
For the first example that is OK, it alters the first element of the array.
For the second example it might compile, but it results in undefined behaviour, because what it's actually doing is modifying the first element of the __some_array_created_by_the_compiler which is read-only. In practice what will probably happen is that the process will crash, because trying to write to a read-only page of memory will raise a segmentation fault.
It's important to understand that there are lots of things in C++ (and even more in C) which the compiler will happily compile, but which cause Very Bad Things to happen when the code is executed.
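A short sketch of the difference between the two declarations, assuming a C++11 or later compiler. The function names are hypothetical. The pointer version is declared const char* so that the illegal write is rejected at compile time instead of crashing at run time.

```cpp
#include <cassert>
#include <string>

// The array is a private, writable copy of the literal's characters.
std::string change_drive_letter() {
    char disk[3] = "D:";      // same as { 'D', ':', '\0' }
    disk[0] = 'C';            // fine: modifies the array, not the literal
    assert(disk[2] == '\0');  // terminator is still in place
    return std::string(disk);
}

// A pointer to the literal must be treated as read-only.
char read_drive_letter() {
    const char* disk = "D:";
    // disk[0] = 'C';         // would not compile: the characters are const
    return disk[0];
}
```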
char disk[3] = "D:";
Is treated as
char disk[3] = {'D',':','\0'};
Whereas in C++11 and above
char* disk = "D:";
Is an error, as a string literal is of type const char[N] (here const char[3]) and cannot be assigned to a char *. You can assign it to a const char * though.
String literals are actually read-only, zero-terminated arrays of characters, and using a string literal gives you a pointer to the first character in the array.
So in the second example
char* disk = "D:";
you initialize disk to point to the first character of an array of three characters.
Note in my first paragraph above that I said that string literals are read-only arrays, that means that having a plain char* pointing to this array could make you think that it's okay to modify this array when it's not (attempting to modify a string literal leads to undefined behavior). This is the reason that const char* is usually used:
const char* disk = "D:";
Since C++11 it's actually an error to not use a const char*, though most compilers still only warn about it instead of producing an error.
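The type of a string literal can be checked at compile time. This is a sketch assuming C++11 or later; the function name drive is made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// A string literal is an lvalue of type "array of N const char",
// so decltype yields a reference to that array type.
static_assert(std::is_same<decltype("D:"), const char (&)[3]>::value,
              "\"D:\" is an array of three const chars");

// The array-to-pointer conversion then gives a const char*
// that points to the first element.
const char* drive() {
    const char* disk = "D:";
    assert(disk[0] == 'D');
    return disk;
}
```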
You are absolutely right to say that pointers can store only memory address. Then how is the second statement valid? Let me explain.
When you put a sequence of characters in double quotes, what happens behind the scenes is that the string gets stored in read-only memory, and the address of the location where the string is stored is returned. So when the expression is evaluated, the string evaluates to that memory address, which is a character pointer. It is this pointer that is assigned to your pointer variable.
So what is the difference between the two statements? The string in the second case is a constant, while the string declared by the first statement can be changed.
I am trying to understand the following code:
const TCHAR * portName = "COM15";
I understand that a TCHAR is either a char (in ANSI builds) or a wchar_t (in Unicode builds), basically a 1 byte or 2 byte container that represents something.
Now, if I declare a pointer to a const TCHAR called portName, portName is then a pointer. When I use the "=" sign, I am giving that pointer a value, and it seems irrational to me that "COM15" would be the address. I assume that line of code is giving me a pointer to the location of the beginning of the "COM15" string of characters, correct?
So what is actually happening in that line of code?
Is a string of characters ("COM15") being created and the "=" sign actually means that the location of the beginning of that string is being given to portName?
"Is a string of characters ("COM15") being created and the "=" sign actually means that the location of the beginning of that string is being given to portName?"
Yes, exactly. But contrary to what your question suggests you might have expected, the string itself is created when the program is compiled, not at run time. Also, the const keyword here prohibits changing the characters through that pointer; the pointer itself can still be re-assigned later.
This is how C works:
When you say char * str1 in C, you are allocating a pointer in the memory. When you write str1 = "Hello";, you are creating a string literal in memory and making the pointer point to it.
When you create another string literal "new string" and assign it to str1, all you are doing is changing where the pointer points.
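A small sketch of re-pointing, and of where const applies. The function name repointing_works and the variable fixed are hypothetical, and plain const char* is used instead of TCHAR to keep the example portable.

```cpp
#include <cassert>
#include <cstring>

bool repointing_works() {
    const char* str1 = "Hello";         // pointer to const char: characters are read-only
    const char* const fixed = "COM15";  // const pointer: cannot be re-aimed either
    str1 = "new string";                // allowed: only the stored address changes
    // fixed = "COM16";                 // would not compile: fixed itself is const
    assert(std::strcmp(fixed, "COM15") == 0);
    return std::strcmp(str1, "new string") == 0;
}
```

Neither assignment copies any characters; both literals already exist in the program image, and str1 just holds the address of one or the other.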
I saw this example:
const char* SayHi() { return "Hi"; }
And it works fine, but if I try to remove the pointer it doesn't work and I can't figure out why.
const char SayHi() { return "Hi"; } // Pointer removed
It works if I assign it a single character like this:
const char SayHi() { return 'H'; } // Pointer removed and only 1 character
But I don't know what makes it work exactly. Why would a pointer be able to hold more than one character? Isn't a pointer just a variable that points to another one? What does this point to?
That is because a char is by definition a single character (like in your 3rd case). If you want a string, you can either use an array of chars which decays to const char* (like in your first case) or, the C++ way, use std::string.
Here you can read more about the "array decaying to pointer" thing.
You are correct that a pointer is just a variable that points somewhere -- in this case it points to a string of characters somewhere in memory. By convention, strings (arrays of char) end with a null character (0), so operations like strlen can terminate safely without overflowing a buffer.
As for where that particular pointer (in your first example) points to, it is pointing to the string literal "Hi" (with a null terminator at the end added by the compiler). That location is platform-dependent and is answered here.
It is also better practice to use std::string in C++ than plain C arrays of characters.
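The three return types can be put side by side. SayHi is from the question; SayH and SayHiString are hypothetical names added for comparison.

```cpp
#include <cassert>
#include <string>

// Returns the address of the literal's first character. The literal has
// static storage duration, so the pointer stays valid after the call.
const char* SayHi() { return "Hi"; }

// Returns exactly one character by value.
char SayH() { return 'H'; }

// Returns a std::string that owns its own copy of the characters.
std::string SayHiString() { return "Hi"; }
```

Reading SayHi()[0], SayHi()[1] and SayHi()[2] gives 'H', 'i' and '\0': the pointer holds one address, and the rest of the string is found by stepping forward from it.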
Why do we need the *?
char* test = "testing";
From what I understood, we only apply * onto addresses.
This is a char:
char c = 't';
It can only hold one character!
This is a C-string:
char s[] = "test";
It can hold multiple characters. Another way to write the above is:
char s[] = {'t', 'e', 's', 't', 0};
The 0 at the end is called the NUL terminator. It denotes the end of a C-string.
A char* stores the starting memory location of a C-string.[1] For example, we can use it to refer to the same array s that we defined above. We do this by setting our char* to the memory location of the first element of s:
char* p = &(s[0]);
The & operator gives us the memory location of s[0].
Here is a shorter way to write the above:
char* p = s;
Notice:
*(p + 0) == 't'
*(p + 1) == 'e'
*(p + 2) == 's'
*(p + 3) == 't'
*(p + 4) == 0 // NUL
Or, alternatively:
p[0] == 't'
p[1] == 'e'
p[2] == 's'
p[3] == 't'
p[4] == 0 // NUL
Another common usage of char* is to refer to the memory location of a string literal:
const char* myStringLiteral = "test";
Warning: This string literal should not be changed at runtime. We use const to warn the programmer (and compiler) not to modify myStringLiteral in the following illegal manner:
myStringLiteral[0] = 'b'; // Illegal! Do not do this for const char*!
This is different from the array s above, which we are allowed to modify. This is because the string literal "test" is automatically copied into the array at initialization phase. But with myStringLiteral, no such copying occurs. (Where would we copy to, anyways? There's no array to hold our data... just a lonely char*!)
[1] Technical note: char* merely stores a memory location to things of type char. It can certainly refer to just a single char. However, it is much more common to use char* to refer to C-strings, which are NUL-terminated character sequences, as shown above.
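The pointer arithmetic above can be sketched as two small functions. The names length_by_walking and write_through_pointer are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Walks the string one char at a time until the NUL terminator.
std::size_t length_by_walking(const char* p) {
    assert(p != nullptr);
    const char* start = p;
    while (*p != 0) {
        ++p;
    }
    return static_cast<std::size_t>(p - start);
}

// The array is writable, and p is just another way to reach its elements.
char write_through_pointer() {
    char s[] = "test";
    char* p = s;        // same as &(s[0])
    *(p + 0) = 'b';     // s now holds "best"
    return s[0];
}
```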
The char type can only represent a single character. When you have a sequence of characters, they are piled next to each other in memory, and the location of the first character in that sequence is returned (assigned to test). test is nothing more than a pointer to the memory location of the first character in "testing", saying that the type it points to is a char.
You can do one of two things:
char *test = "testing";
or:
char test[] = "testing";
Or, a few variations on those themes like:
char const *test = "testing";
I mention this primarily because it's the one you usually really want.
The bottom line, however, is that char x; will only define a single character. If you want a string of characters, you have to define an array of char or a pointer to char (which you'll initialize with a string literal, as above, more often than not).
There are real differences between the first two options though. char *test=... defines a pointer named test, which is initialized to point to a string literal. The string literal itself is allocated statically (typically right along with the code for your program), and you're not supposed to (attempt to) modify it -- thus the preference for char const *.
The char test[] = .. allocates an array. If it's a global, it's pretty similar to the previous except that it does not allocate a separate space for a pointer to the string literal -- rather, test becomes the name of the array itself, which holds its own modifiable copy of the characters.
If you do this as a local variable, test still names the array directly -- but since it's a local variable, it has "auto" storage (typically on the stack), which gets initialized (usually by copying from a normal, statically allocated string literal) on every entry to the block/scope where it's defined.
The latter versions (with an array of char) can act deceptively similar to a pointer, because the name of an array will decay to the address of the beginning of the array anytime you pass it to a function. There are differences though. You can modify the array, but modifying a string literal gives undefined behavior. Conversely, you can change the pointer to point at some other chars, so something like:
char *test = "testing";
if (whatever)
test = "not testing any more";
...is perfectly fine, but trying to do the same with an array won't work (arrays aren't assignable).
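A sketch of both behaviours: the array name decaying when passed to a function, and the pointer being re-aimed. The function names and the whatever parameter are hypothetical stand-ins for the fragment above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Takes a pointer; an array argument decays to the address of its first element.
std::size_t measure(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

std::size_t from_array() {
    char test[] = "testing";   // array holding its own copy of the characters
    return measure(test);      // test decays to &test[0] here
}

std::size_t from_pointer(bool whatever) {
    const char* test = "testing";
    if (whatever)
        test = "not testing any more";   // re-aims the pointer; nothing is copied
    return measure(test);
}
```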
The main thing people forgot to mention is that "testing" is an array of chars in memory; there's no such thing as a primitive string type in C++. Therefore, as with any other array, you can't refer to it as if it were a single element.
char* represents the address of the beginning of a contiguous block of chars. You need it because you are not using a single char variable; you are addressing a whole array of chars.
When accessing this, functions take the address of the first char and step through the memory. This is possible because arrays use contiguous memory (i.e. the elements sit one after another).
Hope this clears things up! :)
Using a * says that this variable points to a location in memory. In this case, it is pointing to the location of the string "testing". With a char pointer, you are not limited to a single character, because the pointer marks where a whole sequence of characters begins.
In C, an array used in an expression usually decays to a pointer to its first element.
Basic question.
char new_str[]="";
char * newstr;
If I have to concatenate some data into it or use string functions like strcat/substr/strcpy, what's the difference between the two?
I understand I have to allocate memory to the char * approach (Line #2). I'm not really sure how though.
And const char * and string literals are the same?
I need to know more on this. Can someone point to some nice exhaustive content/material?
An excellent source for clearing up the confusion is Peter van der Linden's Expert C Programming: Deep C Secrets. It explains that arrays and pointers are not the same; the difference is in how they are addressed in memory.
With an array, such as char new_str[], the compiler gives new_str a fixed memory address, e.g. 0x1234, so indexing with [] is simple. For new_str[4], the code takes the address where new_str resides, 0x1234, adds the index to it (0x1234 + 0x4), and retrieves the value stored there.
With a pointer, the compiler also gives the symbol char *newstr an address, e.g. 0x9876, but at runtime that address is used indirectly. Suppose newstr was set with newstr = malloc(10);. The compiler knows the address of newstr itself (0x9876), but what newstr points to can vary. Every time the code uses newstr, it first fetches the contents of 0x9876, which is itself another memory address (the one malloc returned, e.g. 0x8765), and only then fetches the data from that second address.
char new_str[] and char *newstr can often be used interchangeably in expressions, since the name of an array decays into a pointer to its zeroth element. That explains why you can write either newstr[5] or *(newstr + 5), and why pointer expressions work on the array too, e.g. *(new_str + 1) = *newstr; or *(new_str + 1) = newstr[1];
In summary, the real difference between the two is how they are accessed in memory.
Get the book and read it and live it and breathe it. It's a brilliant book! :)
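The equivalence of the indexing forms can be checked directly. This is only a sketch: the function name same_element is made up, and the pointer is aimed at a local array instead of malloc'd memory to keep it short.

```cpp
#include <cassert>

// Indexing through the array name and through a pointer reaches the same element.
bool same_element(int i) {
    char new_str[] = "abcdefghi";   // array: 10 bytes including '\0'
    char* newstr = new_str;         // pointer holding the array's starting address
    assert(i >= 0 && i < 10);
    return new_str[i] == newstr[i]
        && newstr[i] == *(newstr + i)
        && &new_str[i] == newstr + i;
}
```

The results agree, but the generated code differs in the way the answer describes: new_str[i] starts from the array's own address, while newstr[i] first loads the address stored in the pointer variable.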
Please go through this article below:
Also note that with an array of char, like your char new_str[], the name new_str always refers to the base of the array. It cannot be incremented. You can use subscripts to access the next char in the array, e.g. new_str[3];
But in the case of a pointer to char, the pointer can be incremented (newstr++) to fetch the next character in the array.
Also I would suggest this article for more clarity.
This is a character array:
char buf [1000];
So, for example, this makes no sense:
buf = &some_other_buf;
This is because buf, though it has some characteristics of a pointer, already refers to the only place that makes sense for it.
char *ptr;
On the other hand, ptr is only a pointer, and may point somewhere. Most often, it's something like this:
ptr = buf; // #1: point to the beginning of buf, same as &buf[0]
or maybe this:
ptr = malloc (1000); // #2: allocate heap and point to it
or:
ptr = "abcdefghijklmn"; // #3: string constant
For all of these, *ptr can be written to, except in the third case: modifying a string constant is undefined behaviour, and many compiling environments place string constants in unwritable memory.
*ptr++ = 'h'; // writes into #1: buf[0], #2: first byte of heap, or
// #3 overwrites "a"
strcpy (ptr, "ello"); // finishes writing hello and adds a NUL
The difference is that one is a pointer and the other is an array. You can, for instance, apply sizeof to the array to get its total size. You may be interested in peeking here
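A sketch of the sizeof difference, using hypothetical function names. sizeof on the array gives the size of the whole buffer, while sizeof on a pointer gives only the size of the pointer itself, whatever it points to.

```cpp
#include <cassert>
#include <cstddef>

std::size_t array_bytes() {
    char buf[1000];
    return sizeof(buf);    // size of the whole array: 1000
}

std::size_t pointer_bytes() {
    char buf[1000];
    char* ptr = buf;       // points to the beginning of buf
    assert(ptr == &buf[0]);
    return sizeof(ptr);    // size of the pointer, commonly 4 or 8
}
```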
If you're using C++ as your tags indicate, you really should be using the C++ strings, not the C char arrays.
The string type makes manipulating strings a lot easier.
If you're stuck with char arrays for some reason, the line:
char new_str[] = "";
allocates 1 byte of space and puts a null terminator character into it. It's subtly different from:
char *new_str = "";
since that may give you a reference to non-writable memory. The statement:
char *new_str;
on its own gives you a pointer but nothing that it points to. It can also have a random value if it's local to a function.
What people tend to do (in C rather than C++) is to do something like:
char *new_str = malloc (100); // (remember that this has to be freed) or
char new_str[100];
to get enough space.
If you use the str... functions, you're basically responsible for ensuring that you have enough space in the char array, lest you get all sorts of weird and wonderful practice at debugging code. If you use real C++ strings, a lot of the grunt work is done for you.
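To make the trade-off concrete, here is a sketch of the same concatenation done both ways. The function names and the "hello, world" text are illustrative; the buffer size of 100 matches the example above.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// C style: the caller must make sure the destination has room for the result.
std::string concat_c_style() {
    char new_str[100] = "";            // 100 bytes, starts as the empty string
    std::strcat(new_str, "hello");
    std::strcat(new_str, ", world");   // 12 chars plus '\0' fits easily in 100
    assert(std::strlen(new_str) < sizeof(new_str));
    return new_str;
}

// C++ style: std::string grows as needed.
std::string concat_cpp_style() {
    std::string new_str;
    new_str += "hello";
    new_str += ", world";
    return new_str;
}
```

Both return the same text. The first version is only safe because the buffer was sized generously up front.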
The type of the first is char[1], the second is char *. Different types.
Allocate memory for the latter with malloc in C, or new in C++.
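A minimal sketch of the C++ allocation, assuming the goal is a writable heap copy of some text. The helper name heap_copy is hypothetical, and it returns a std::string only so that the buffer can be released before returning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

std::string heap_copy(const char* src) {
    assert(src != nullptr);
    std::size_t n = std::strlen(src) + 1;   // +1 for the null terminator
    char* newstr = new char[n];             // the pointer now has memory to point to
    std::memcpy(newstr, src, n);            // copy the characters and the '\0'
    std::string result(newstr);
    delete[] newstr;                        // new[] must be paired with delete[]
    return result;
}
```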
char foo[] = "Bar"; // Allocates 4 bytes and fills them with
// 'B', 'a', 'r', '\0'.
The size here is implied from the initializer string.
The contents of foo are mutable. You can change foo[i] for example where i = 0..3.
OTOH if you do:
char *foo = "Bar";
The compiler now allocates a static string "Bar" in read-only memory, and it cannot be modified.
foo[i] = 'X'; // is now undefined.
char new_str[]="abcd";
This specifies an array of characters (a string) of size 5 bytes (one byte for each character plus one for the null terminator). So it stores the string "abcd" in memory and we can access this string using the variable new_str.
char *new_str="abcd";
This specifies that a string "abcd" is stored somewhere in memory and the pointer new_str points to the first character of that string.
To differentiate them in the memory allocation side:
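// With a char array, "hello" is copied into the array (on the stack if s1 is a local variable)
char s1[] = "hello";
// With a char pointer, "hello" is typically stored in a read-only data segment; use const char* in C++
const char *s2 = "hello";
// To put a string on the heap, malloc 6 bytes (5 characters plus the NUL byte at the end), then copy into them
char *s3 = (char *)malloc(6);
strcpy(s3, "hello"); // s3 = "hello" would only re-point s3 and leak the allocation
free(s3);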
If you're in C++, why not use std::string for all your string needs? Especially anything dealing with concatenation. This will save you from a lot of problems.