I like "reinventing the wheel" for learning purposes, so I'm working on a container class for strings. Will using the NULL character as an array terminator (i.e., the last value in the array will be NULL) cause interference with the null-terminated strings?
I think it would only be an issue if an empty string is added, but I might be missing something.
EDIT: This is in C++.
"" is the empty string in C and C++, not NULL. Note that "" has exactly one element (instead of zero), meaning it is equivalent to {'\0'} as an array of char.
char const *notastring = NULL;
char const *emptystring = "";
emptystring[0] == '\0'; // true
notastring[0] == '\0'; // undefined behavior: null pointer dereference (usually crashes)
No, it won't, because you won't be storing the strings in an array of char; you'll be storing them in an array of char*.
char const* strings[] = {
"WTF"
, "Am"
, "I"
, "Using"
, "Char"
, "Arrays?!"
, 0
};
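A minimal sketch of walking such an array, assuming C++11 or later: the loop stops at the null pointer, not at the empty string, because it tests the pointer itself rather than the characters it points to. The function name count_strings is hypothetical.

```cpp
#include <cstddef>

// Count entries in an array of C strings terminated by a null pointer.
// An empty string "" is a valid, non-null pointer, so it is counted.
std::size_t count_strings(const char* const* strings)
{
    std::size_t count = 0;
    while (strings[count] != nullptr) {  // compare the pointer, not *pointer
        ++count;
    }
    return count;
}
```

With const char* words[] = {"Hello", "", "World", nullptr};, count_strings(words) counts all three strings, including the empty one.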
It depends on what kind of string you're storing.
If you're storing C-style strings, which are basically just pointers to character arrays (char*), there's a difference between a NULL pointer value and an empty string. The former means the pointer is ‘empty’ (it points to nothing); the latter means the pointer points to an array that contains a single item with character value 0 ('\0'). So the pointer still has a non-null value, and testing it (if (foo[3])) will work as expected.
If what you're storing are C++ standard library strings of type std::string, then there is no NULL value. That's because there is no pointer involved, and the string type is treated as a single value. (A pointer, by contrast, only refers to characters stored elsewhere, much like a reference.)
I think you are confused. While C-strings are "null terminated", there is no "NULL" character. NULL is a name for a null pointer. The terminator for a C-string is a null character, i.e. a byte with a value of zero. In ASCII, this byte is (somewhat confusingly) named NUL.
Suppose your class contains an array of char that is used to store the string data. You do not need to "mark the end of the array"; the array has a specific size that is set at compile-time. You do need to know how much of that space is actually being used; the null-terminator on the string data accomplishes that for you - but you can get better performance by actually remembering the length. Also, a "string" class with a statically-sized char buffer is not very useful at all, because that buffer size is an upper limit on the length of strings you can have.
So a better string class would contain a pointer of type char*, which points to a dynamically allocated (via new[]) array of chars. Again, it makes no sense to "mark the end of the array", but you will want to remember both the length of the string (i.e. the amount of space being used) and the size of the allocation (i.e. the amount of space that may be used before you have to re-allocate).
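A minimal sketch of such a class, assuming C++11. The class name MyString and its members are hypothetical, copying is simply disabled to keep it short, and the doubling growth policy is one common choice rather than a requirement.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal string class: owns a new[]-allocated buffer and
// tracks both the length in use and the allocated capacity.
class MyString {
public:
    MyString() : data_(new char[1]), length_(0), capacity_(0) { data_[0] = '\0'; }
    ~MyString() { delete[] data_; }

    // Copying is disabled to keep the sketch short (see the rule of three).
    MyString(const MyString&) = delete;
    MyString& operator=(const MyString&) = delete;

    void append(const char* s)
    {
        std::size_t extra = std::strlen(s);
        reserve(length_ + extra);
        std::memcpy(data_ + length_, s, extra);
        length_ += extra;
        data_[length_] = '\0';  // keep a terminator so c_str() works
    }

    void reserve(std::size_t wanted)
    {
        if (wanted <= capacity_) return;
        std::size_t new_cap = capacity_ * 2 > wanted ? capacity_ * 2 : wanted;
        char* bigger = new char[new_cap + 1];     // +1 for the terminator
        std::memcpy(bigger, data_, length_ + 1);  // copy chars and the '\0'
        delete[] data_;
        data_ = bigger;
        capacity_ = new_cap;
    }

    std::size_t size() const { return length_; }  // O(1): no scanning
    std::size_t capacity() const { return capacity_; }
    const char* c_str() const { return data_; }

private:
    char* data_;
    std::size_t length_;    // characters in use
    std::size_t capacity_;  // characters that fit before reallocating
};
```

Because the length is stored, size() never has to search for the terminator; the '\0' is kept only so that c_str() can hand the buffer to C functions.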
When you are copying from a std::string, use the iterators begin() and end(), and you don't have to worry about the null terminator. In C++03 the terminator was only guaranteed to be there if you called c_str() (in which case the block of memory it points to has a null character terminating the string); since C++11, data() and c_str() both return a pointer to a null-terminated buffer. If you want to memcpy, use the data() method together with size().
Why don't you follow the pattern used by vector: store the number of elements within your container class, so you always know how many values there are in it:
std::vector<std::string> myVector;
std::size_t elements(myVector.size());
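A sketch of that approach, assuming a hypothetical wrapper named StringContainer around std::vector<std::string>: since the count is stored, no terminator value is needed, and an empty string is just another element.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container that tracks its element count instead of
// relying on a sentinel, so "" needs no special treatment.
class StringContainer {
public:
    void add(const std::string& s) { items_.push_back(s); }
    std::size_t size() const { return items_.size(); }
    const std::string& operator[](std::size_t i) const { return items_[i]; }

private:
    std::vector<std::string> items_;
};
```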
Constructing a std::string from x, where const char* x = 0;, is undefined behavior. See this code in Visual C++'s STL that gets called when you do this; the null pointer ends up being passed to strlen:
_Myt& assign(const _Elem *_Ptr)
{ // assign [_Ptr, <null>)
_DEBUG_POINTER(_Ptr);
return (assign(_Ptr, _Traits::length(_Ptr)));
}
static size_t __CLRCALL_OR_CDECL length(const _Elem *_First)
{ // find length of null-terminated string
return (_CSTD strlen(_First));
}
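If a null pointer can reach your code, one option is to check for it before constructing the std::string. This is a sketch: the helper name string_or_empty and the choice to map a null pointer to an empty string are assumptions, not standard behavior.

```cpp
#include <string>

// Hypothetical helper: constructing std::string from a null pointer is
// undefined behavior, so test the pointer first and treat null as "no text".
std::string string_or_empty(const char* p)
{
    return p != nullptr ? std::string(p) : std::string();
}
```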
#include "Maxpm_crafts_fine_wheels.h"
MaxpmContainer maxpm;
maxpm.add("Hello");
maxpm.add(""); // uh oh, adding an empty string; should I worry?
maxpm.add(0);
At this point, as a user of MaxpmContainer who had not read your documentation, I would expect the following:
strcmp(maxpm[0],"Hello") == 0;
*maxpm[1] == 0;
maxpm[2] == 0;
Interference between the zero terminator at position two and the empty string at position one is avoided by means of the "interpret this as a memory address" operator *. Position one will not be zero; it will be a non-null pointer which, if you interpret it as a memory address, leads to a character that is zero (the '\0' of the empty string). Position two will be a null pointer, which, if you interpret it as a memory address, is undefined behavior and will most likely turn out to be an abrupt, disorderly exit from your program.
I'm working on an exercise to calculate the length of a string using pointers.
Here's the code I've written below:
int main() {
std::string text = "Hello World";
std::string *string_ptr = &text;
int size = 0;
//Error below: ISO C++ forbids comparison between pointer and integer [-fpermissive]
while (string_ptr != '\0') {
size++;
string_ptr++;
}
std::cout << size;
}
In a lot of examples that I've seen, the string is a char array, which I also understand is a string. However, I want to try to calculate it as a string object, but I'm getting the error shown in the comment above.
Is it possible to calculate it where the string is an object, or does it need to be a char array?
If you just want the size of the string, well, use std::string::size():
auto size = text.size();
Alternatively, you can use length(), which does the same thing.
But I'm guessing you're trying to reimplement strlen for learning purposes. In that case, there are three problems with your code.
First, you're trying to count the number of characters in the string, and that means you need a pointer to char, not a pointer to std::string. That pointer should also point to constant characters, because you're not trying to modify those characters.
Second, to get a pointer to the string's characters, use its method c_str(). Getting the address of the string just gets you a pointer to the string itself, not its contents. Most importantly, the characters pointed to by c_str() are null terminated, so it is safe to use for your purposes here. Alternatively, use data(), which has been behaving identically to c_str() since C++11.
Finally, counting those characters involves checking if the value pointed to by the pointer is '\0', so you'll need to dereference it in your loop.
Putting all of this together:
const char* string_ptr = text.c_str(); // get the characters
int size = 0;
while (*string_ptr != '\0') { // make sure you dereference the pointer
size++;
string_ptr++;
}
Of course, this assumes the string does not contain what are known as "embedded nulls", which is when there are '\0' characters before the end. std::string can contain such characters and will work correctly. In that case, your function will return a different value from what the string's size() method would, but there's no way around it.
For that reason, you should really just call size().
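A small sketch of that difference, assuming C++11: std::string's (pointer, count) constructor keeps the embedded '\0', but anything that stops at the first '\0' (here the standard std::strlen, which behaves like the loop above) sees only the part before it. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Returns {size(), length up to the first '\0'} for a string that
// contains an embedded null character.
std::pair<std::size_t, std::size_t> embedded_null_lengths()
{
    std::string s("ab\0cd", 5);  // count constructor keeps all 5 chars
    return {s.size(), std::strlen(s.c_str())};
}
```

Here size() reports 5 while the null-terminated scan stops after "ab" and reports 2.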
First things first: in practice the problem is moot. std::string::size() is an O(1) (constant-time) operation, as std::strings typically store their size. Even if you need to know the length of a C-style string (aka char*), you can use strlen. (I get that this is an exercise, but I still wanted to warn you.)
Anyway, here you go:
size_t cstrSize(const char* cstr)
{
size_t size(0);
while (*cstr != '\0')
{
++size;
++cstr;
}
return size;
}
You can get the underlying C-style string (which is a pointer to the first character) of a std::string by calling std::string::c_str(). What you did was get a pointer to the std::string object itself, and dereferencing it would just give you that object back. And yes, you need to dereference the char pointer (using the unary * operator) to look at each character. That is why you got an error (which was on the string_ptr != '\0' comparison, by the way): you were comparing a pointer with a character.
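A variant of the same loop, shown as a sketch: instead of keeping a separate counter, it advances a pointer to the terminator and returns the distance travelled. The name cstrSizeByDiff is hypothetical; like strlen, it requires a non-null, null-terminated input.

```cpp
#include <cstddef>

// Walk to the terminating '\0' and return how far the pointer moved.
std::size_t cstrSizeByDiff(const char* cstr)
{
    const char* end = cstr;
    while (*end != '\0') {
        ++end;
    }
    return static_cast<std::size_t>(end - cstr);  // pointer difference
}
```

Called as cstrSizeByDiff(text.c_str()) with the question's "Hello World", it gives the same result as text.size().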
You are totally confused here.
“text” is a std::string, that is, an object with a size() method returning the length of the string.
“string_ptr” is a pointer to a std::string, that is, a pointer to an object. Since it is a pointer to an object, you don’t use text.size() to get the length, but string_ptr->size().
So first, no, you can’t compare a pointer with an integer constant, only with a null pointer (NULL or nullptr) or another pointer.
The first time you increment string_ptr, it points to the memory after the variable text. At that point, using *string_ptr for anything is undefined behavior and will most likely crash.
Remember: std::string is an object.
I'm completely new to the C++ language (pointers in particular, experience is mainly in PHP) and would love some explanation to the following (I've tried searching for answers).
How are both lines of code able to do exactly the same job in my program? The second line seems to go against everything I've learnt & understood so far about pointers.
char disk[3] = "D:";
char* disk = "D:";
How am I able to initialize a pointer to anything other than a memory address? Not only that, in the second line I'm not declaring the array properly either - but it's still working?
The usual way to initialize an array in C and C++ is:
int a[3] = { 0, 1, 2 };
Aside: And you can optionally leave out the array bound and have it deduced from the initializer list, or have a larger bound than there are initializers:
int aa[] = { 0, 1, 2 }; // another array of three ints
int aaa[5] = { 0, 1, 2 }; // equivalent to { 0, 1, 2, 0, 0}
For arrays of characters there is a special rule that allows an array to be initialized from a string literal, with each element of the array being initialized from the corresponding character in the string literal.
Your first example uses the string literal "D:" so each element of the array will be initialized to a character from that string, equivalent to:
char disk[3] = { 'D', ':', '\0' };
(The third character is the null terminator, which is implicitly present in all string literals).
Aside: Here too you can optionally leave out the array bound and have it deduced from the string literal, or have a larger bound than the string length:
char dd[] = "D:"; // another array of three chars
char ddd[5] = "D:"; // equivalent to { 'D', ':', '\0', '\0', '\0'}
Just like the aaa example above, the extra elements in ddd that don't have a corresponding character in the string will be zero-initialized.
Your second example works because the string literal "D:" will be output by the compiler and stored somewhere in the executable as an array of three chars. When the executable is run the segment that contains the array (and other constants) will be mapped into the process' address space. So your char* pointer is then initialized to point to the location of that array, wherever that happens to be. Conceptually it's similar to:
const char __some_array_created_by_the_compiler[3] = "D:";
const char* disk = __some_array_created_by_the_compiler;
For historical reasons (mostly that const didn't exist in the early days of C), it was legal to use a non-const char* to point to that array, even though the array is actually read-only. So C, and C++ before C++11 (where it was already deprecated), allow you to use a non-const char* pointer to point to a string literal, even though the array that it refers to is really const:
const char __some_array_created_by_the_compiler[3] = "D:";
char* disk = (char*)__some_array_created_by_the_compiler;
This means that despite appearances your two examples are not exactly the same, because this is only allowed for the first one:
disk[0] = 'C';
For the first example that is OK, it alters the first element of the array.
For the second example it might compile, but it results in undefined behaviour, because what it's actually doing is modifying the first element of the __some_array_created_by_the_compiler which is read-only. In practice what will probably happen is that the process will crash, because trying to write to a read-only page of memory will raise a segmentation fault.
It's important to understand that there are lots of things in C++ (and even more in C) which the compiler will happily compile, but which cause Very Bad Things to happen when the code is executed.
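A minimal sketch of the safe pattern, assuming C++11: if you need to modify the text, copy the literal into your own array; if you only need to read it, point at it with a const char*, so that an accidental write is a compile error rather than undefined behavior. The function name is hypothetical.

```cpp
#include <cstring>

// Returns true if the array copy was modified and the literal left alone.
bool modify_copy_not_literal()
{
    char disk[] = "D:";      // writable copy: { 'D', ':', '\0' }
    const char* lit = "D:";  // read-only literal, accessed through const
    disk[0] = 'C';           // fine: alters our own array
    // lit[0] = 'C';         // would not compile: lit points to const char
    return std::strcmp(disk, "C:") == 0 && std::strcmp(lit, "D:") == 0;
}
```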
char disk[3] = "D:";
Is treated as
char disk[3] = {'D',':','\0'};
Whereas in C++11 and above
char* disk = "D:";
Is an error, as a string literal is of type const char[3] and cannot be used to initialize a char*. You can use it to initialize a const char* though.
String literals are actually read-only, zero-terminated arrays of characters, and using a string literal gives you a pointer to the first character in the array.
So in the second example
char* disk = "D:";
you initialize disk to point to the first character of an array of three characters.
Note in my first paragraph above that I said that string literals are read-only arrays, that means that having a plain char* pointing to this array could make you think that it's okay to modify this array when it's not (attempting to modify a string literal leads to undefined behavior). This is the reason that const char* is usually used:
const char* disk = "D:";
Since C++11 it's actually an error not to use a const char*, though many compilers still only warn about it by default instead of producing an error.
You are absolutely right that pointers can only store memory addresses. Then how is the second statement valid? Let me explain.
When you put a sequence of characters in double quotes, what happens behind the scenes is that the compiler stores the string (plus a terminating '\0') in read-only memory, and the expression evaluates to the address where the string is stored. Strictly speaking, the string literal is an array of characters, which converts to a pointer to its first character. It is this pointer that is assigned to your pointer variable.
So what is the difference between the two statements? The string in the second case is a constant, while the string declared by the first statement can be changed.
I saw this example:
const char* SayHi() { return "Hi"; }
And it works fine, but if I try to remove the pointer it doesn't work and I can't figure out why.
const char SayHi() { return "Hi"; } // Pointer removed
It works if I assign it a single character like this:
const char SayHi() { return 'H'; } // Pointer removed and only 1 character
But I don't know what makes it work exactly. Why would a pointer be able to hold more than one character? Isn't a pointer just a variable that points to another one? What does this point to?
That is because a char is by definition a single character (like in your 3rd case). If you want a string, you can either use an array of chars, which decays to const char* (like in your first case), or, the C++ way, use std::string.
You can read more about this under the name "array-to-pointer decay".
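A sketch contrasting the three return types, with hypothetical function names so they can sit in one file: returning the literal works because it has static storage duration, while a char can only ever hold one character.

```cpp
#include <string>

// Returns a pointer to the first character of the literal "Hi";
// the literal lives for the whole program, so the pointer stays valid.
const char* SayHiPtr() { return "Hi"; }

// Returns a single character, which is all a char can hold.
char SayH() { return 'H'; }

// Returns an owning string object (the usual C++ choice).
std::string SayHiString() { return "Hi"; }
```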
You are correct that a pointer is just a variable that points somewhere -- in this case it points to a string of characters somewhere in memory. By convention, strings (arrays of char) end with a null character (0), so operations like strlen can terminate safely without overflowing a buffer.
As for where that particular pointer (in your first example) points to, it is pointing to the string literal "Hi" (with a null terminator at the end added by the compiler). Exactly where that literal lives in memory is platform-dependent.
It is also better practice to use std::string in C++ than plain C arrays of characters.
Consider a simple example
Eg: const char* letter = "hi how r u";
letter is a const character pointer, which points to the string "hi how r u". Now when I want to print or access the data, I should use *letter, correct?
But in this situation, shouldn't I only have to use the address in the call to printf?
printf("%s",letter);
So why is this?
*letter is actually a character; it's the first character that letter points to. If you're operating on a whole string of characters, then by convention, functions will look at that character, and the next one, etc, until they see a zero ('\0') byte.
In general, if you have a pointer to a bunch of elements (i.e., an array), then the pointer points to the first element, and somehow any code operating on that bunch of elements needs to know how many there are. For char*, there's the zero convention; for other kinds of arrays, you often have to pass the length as another parameter.
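A sketch of both conventions, with hypothetical function names: one walks a char* until the zero byte, the other needs the element count passed alongside the pointer because an int array has no terminator.

```cpp
#include <cstddef>

// char* convention: stop at the terminating '\0'.
std::size_t count_chars(const char* s)
{
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Other arrays: the caller must say how many elements there are.
long sum_ints(const int* values, std::size_t count)
{
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += values[i];
    return total;
}
```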
Simply because printf has a signature like this: int printf(const char* format, ...); and the %s conversion expects a pointer to the first character of a null-terminated string, which printf dereferences internally.
letter does not point to the string as a whole, but to the first character of the string, hence a char pointer.
When you dereference the pointer (with *), you are referring to the first character of the string.
However, a single character isn't much use to printf when printing a string, so it instead takes the pointer to the first element and increments it, printing the dereferenced values until the null character '\0' is found.
As this is a C++ question, it is also important to note that you should really store strings as the safe, encapsulated type std::string and use the type-safe iostreams where possible:
std::string line="hi how r u";
std::cout << line << std::endl;
%s prints up to the first \0 (see: http://msdn.microsoft.com/en-us/library/hf4y5e3w.aspx). %s is a character string format field; there is nothing strange going on here.
printf("%s", ...) expects the address, so that it can go through memory searching for the null character ('\0') that marks the end of the string. That's why you pass just letter. printf("%c", ...), on the other hand, expects the value, not the address: printf("%c", *letter);
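A sketch using std::snprintf, which takes the same conversions as printf but writes into a buffer so the result can be inspected: %s receives the pointer, %c receives the dereferenced character. The function name and buffer sizes are assumptions for illustration.

```cpp
#include <cstdio>
#include <string>

// Format the same pointer two ways and return both results joined by '|'.
std::string format_both(const char* letter)
{
    char whole[64];
    char first[8];
    std::snprintf(whole, sizeof whole, "%s", letter);   // pointer: whole string
    std::snprintf(first, sizeof first, "%c", *letter);  // value: one character
    return std::string(whole) + "|" + first;
}
```

For the question's string, format_both("hi how r u") yields the full text followed by just its first character, h.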
printf's %s conversion takes a pointer to the first element of a character array. So if you're displaying a string (a type of array) with %s, pass the variable name without the *: the variable name is the pointer to the first element of the array, and printf prints each character by simple pointer arithmetic (a char is always 1 byte) until it reaches the terminating zero value. Numeric conversions such as %d, %e and %f are different: they take the value itself, not a pointer.
Of course, if you make a pointer to the array variable, then you'd want to dereference that pointer with *. But that's more the exception than the rule. :)
char el[3] = myvector[1].c_str();
myvector[i] is a string with three letters in. Why does this error?
c_str() returns type const char*, which is a pointer to the string's characters. You can't initialize an array from a pointer like that, as that array already has its own memory assigned to it. Try:
const char* el = myvector[1].c_str();
But be very careful: if the string itself is destroyed or modified, the pointer may no longer be valid.
Because a const char * is not a valid initializer for an array. What's more, I believe c_str returns a pointer to internal memory so it's not safe to store.
You probably want to copy the value in some way (memcpy or std::copy or something else).
In addition to what others have said, keep in mind that a string with a length of three characters requires four chars when accessed through c_str(). This is because an extra byte has to be reserved for the null character at the end of the string.
Arrays in C++ must know their size, and be provided with initialisers, at compile-time. The value returned by c_str() is only known at run-time. If el were a std::string, as it probably should be, there would be no problem. If it must be a char[], then use strcpy to populate it.
char el[3];
strcpy( el, myvector[1].c_str() );
This assumes that the string myvector[1] contains at most two characters.
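If a fixed-size buffer really is required, one way to avoid overflowing it is std::string::copy, which copies at most a given number of characters but does not write a terminator. A sketch, with the hypothetical helper name copy_to_buffer; silently truncating over-long input is a design choice, and a three-letter string still needs a buffer of four chars to fit whole.

```cpp
#include <cstddef>
#include <string>

// Copy at most size-1 characters of src into buf and terminate it.
// Returns the number of characters copied (longer strings are truncated).
std::size_t copy_to_buffer(const std::string& src, char* buf, std::size_t size)
{
    if (size == 0) return 0;
    std::size_t n = src.copy(buf, size - 1);  // copy() does not add '\0'
    buf[n] = '\0';
    return n;
}
```

For example, char el[4]; copy_to_buffer(myvector[1], el, sizeof el); fits a three-letter string plus its terminator.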
Just create a copy of the string. Then, if you ever need to access it as a char*, just do so.
string el = myvector[1];
cout << &el[0] << endl;
Make the string const if you don't need to modify it. Use c_str() on 'el' instead if you want.
Or, just access it right from the vector with:
cout << &myvector[1][0] << endl;
if possible for your situation.