I am copying some data from a stream into a string, so I thought about resizing the string with the actual number of characters plus one for the terminating one, like this:
std::istringstream stream { "data" };
const std::size_t count = 4;
std::string copy;
copy.resize(count + 1);
stream.read(&copy[0], count);
copy[count] = 0;
However, in this case, copy indicates it has a size of 5 (which is consistent since I called resize(5)). Does that mean that resize() will add the extra terminating character itself? Which would mean that I do not have to worry about appending \0 after invoking read(&data[0], count)?
No, you don't have to. The string class abstracts away the concept of a "null-terminated char sequence", so you no longer have to worry about it.
Also, the size of the string returned doesn't count the terminating character, which is consistent with the behavior I mentioned, because if you don't have to deal with the terminating character, you don't have to know about it. Your string is just the characters you want to manipulate, without any concern for that "utility" character that has nothing to do with your actual data.
Section §21.4.7.1 basic_string accessors [string.accessors] of the standard indicates that std::string has a guaranteed null-terminated buffer.
Also according to the standard §21.4.4/6-8 basic_string capacity [string.capacity]:
void resize(size_type n, charT c);
6 Requires: n <= max_size()
7 Throws: length_error if n > max_size().
8 Effects: Alters the length of the string designated by *this as follows:
- If n <= size(), the function replaces the string designated by *this with a string of length n whose elements are a copy of the initial elements of the original string designated by *this.
- If n > size(), the function replaces the string designated by *this with a string of length n whose first size() elements are a copy of the original string designated by *this, and whose remaining elements are all initialized to c.
void resize(size_type n);
9 Effects: resize(n,charT()).
Interpreting the above, std::string::resize will not affect the terminating null character of the string's buffer.
Now to your code:
statement std::string copy; defines an empty string (i.e., copy.size() == 0).
statement copy.resize(count + 1); since (n == 5) > 0 will replace copy with a string of length 5 filled with \0 (i.e., null characters).
Now in the statement stream.read(&copy[0], count); std::istream::read will simply copy a block of data, without checking its contents or appending a null character at the end.
In other words, it will just replace the first 4 null characters of copy with "data". The size of copy won't change; it will still be a string of size 5. That is, the contents of copy's buffer will be "data\0\0" (the five elements "data\0" followed by the terminating null character).
So calling copy[count] = 0; is redundant since copy[4] is already \0. However, your string is not "data" but rather "data\0".
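A minimal sketch of the approach this answer points to, assuming the goal is a string that holds exactly the characters read: resize to count (not count + 1) and let std::string manage its own terminator. The helper name read_chars is hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read up to `count` characters from `in` into a string.
std::string read_chars(std::istream& in, std::size_t count) {
    std::string copy;
    copy.resize(count);  // no "+ 1": std::string keeps its own terminator
    in.read(&copy[0], static_cast<std::streamsize>(count));
    // Shrink to what was actually read, in case the stream ran short.
    copy.resize(static_cast<std::size_t>(in.gcount()));
    return copy;
}
```

Called with a stream holding "data" and count = 4, this yields a string that compares equal to "data" and has size() == 4, while c_str() still ends in a null character.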
Consider the following code:
const char foo[] = "lorem ipsum"; // foo is an array of 12 characters
const auto length = strlen(foo); // length is 11
string bar(length, '\0'); // bar was constructed with string(11, '\0')
strncpy(data(bar), foo, length);
cout << data(bar) << endl;
My understanding is that strings are always allocated with a hidden null element. If this is the case then bar really allocates 12 characters, with the 12th being a hidden '\0' and this is perfectly safe... If I'm wrong on that then the cout will result in undefined behavior because there isn't a null terminator.
Can someone confirm for me? Is this legal?
There have been a lot of questions about why to use strncpy instead of just using the string(const char*, const size_t) constructor. My intent has been to make my toy code close to my actual code which contains a vsnprintf. Unfortunately even after getting excellent answers here I've found that vsnprintf doesn't behave the same as strncpy, and I've asked a follow up question here: Why is vsnprintf Not Writing the Same Number of Characters as strncpy Would?
This is safe, as long as you copy [0, size()) characters into the string. Per [basic.string]/3:
In all cases, [data(), data() + size()] is a valid range, data() + size() points at an object with value charT() (a “null terminator”), and size() <= capacity() is true.
So string bar(length, '\0') gives you a string with a size() of 11, with an immutable null terminator at the end (for a total of 12 characters in actual size). As long as you do not overwrite that null terminator, or try to write past it, you're okay.
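As a sketch of the rule above, the following writes exactly size() characters into the string's buffer and leaves the terminator untouched. The helper name copy_via_buffer is hypothetical, and &bar[0] is used instead of data(bar) on the assumption that the code may need to compile before C++17.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy a C string into a std::string through its buffer.
std::string copy_via_buffer(const char* foo) {
    const std::size_t length = std::strlen(foo);
    std::string bar(length, '\0');       // size() == length; terminator lives at bar[length]
    std::strncpy(&bar[0], foo, length);  // writes only [0, size()), never the terminator
    return bar;
}
```

Because strncpy stops after length characters, it never writes a null character of its own here; the one that terminates the result is the one std::string already maintains.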
There are two different things here.
First, does strncpy add an additional \0 in this instance (11 non-\0 elements to be copied into a string of size 11)? The answer is no:
Copies at most count characters of the byte string pointed to by src (including the terminating null character) to character array pointed to by dest.
If count is reached before the entire string src was copied, the resulting character array is not null-terminated.
So the call is perfectly fine.
Then data() gives you a proper \0-terminated string:
c_str() and data() perform the same function. (since C++11)
So it seems that for C++11, you are safe. Whether the string allocates an additional \0 or not doesn't seem to be indicated in the documentation, but the API is clear that what you are doing is perfectly fine.
You have allocated an 11-character std::string. You are not trying to read nor write anything past that, so that part will be safe.
So the real question is whether you have messed up the internals of the string. Since you haven't done anything that isn't allowed, how would that be possible? If it's required for the string to internally keep a 12-byte buffer with a null padding at the end in order to fulfill its contract, that will be the case no matter what operations you performed.
Yes, it's safe according to the documentation of char * strncpy(char* destination, const char* source, size_t num):
Copy characters from string
Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.
I have a custom String class that I've built. I'm trying to build a custom insert function that inserts a string into a specified position.
For Example:
String str("test");
str.insert(2, "animal");
would return:
"tesanimalt"
What i have so far:
String& String::insert(int pos, const String& str) {
char newString[100];
strncpy(newString, chars, pos);
newString[pos] = '\0';
strcat(newString, str.chars);
strcat(newString, chars + pos);
return *this;
}
There are a lot of problems with the code, one of which is that your String doesn’t seem to contain any actual storage. You instead allocate a local array as a temporary variable, then let it be discarded.
It’s not possible to fix this without a MCVE, as we cannot see how String is supposed to work. There seems to be some kind of member called chars, and you probably want to copy to that. However, if it contains a buffer allocated with malloc() or new[], you might, if m is the length of the enclosing string, and n the length of the intercalated string, and you insert at position i:
Reallocate the buffer of the destination string to its new size (The sum of the sizes of the two strings without their terminating nulls, plus one byte for a terminating null). Alternatively, allocate a new buffer and copy the first i elements of the enclosing string.
Shift elements i through m-1 of the enclosing string n positions to the right in the resized buffer.
Copy the string to insert to positions i through i+n-1.
Write a terminating null to position m+n.
Since you appear to want to modify the enclosing string, if you created a new buffer, deallocate the old one. If you reallocated the buffer, set chars to the possibly-changed value.
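The steps above can be sketched as follows, using the "allocate a new buffer" variant. This String is a hypothetical minimal stand-in (a single new[]-allocated, null-terminated chars member, copying disabled for brevity); the real class in the question is not shown, so its layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal String: owns a new[]-allocated, null-terminated buffer.
class String {
public:
    explicit String(const char* s) : chars(new char[std::strlen(s) + 1]) {
        std::strcpy(chars, s);
    }
    ~String() { delete[] chars; }
    String(const String&) = delete;
    String& operator=(const String&) = delete;

    const char* c_str() const { return chars; }

    String& insert(std::size_t pos, const String& str) {
        const std::size_t m = std::strlen(chars);      // length of enclosing string
        const std::size_t n = std::strlen(str.chars);  // length of inserted string
        assert(pos <= m);
        char* buffer = new char[m + n + 1];            // both lengths plus one terminator
        std::memcpy(buffer, chars, pos);                      // first pos elements
        std::memcpy(buffer + pos, str.chars, n);              // inserted string
        std::memcpy(buffer + pos + n, chars + pos, m - pos);  // shifted tail
        buffer[m + n] = '\0';                          // terminating null
        delete[] chars;                                // deallocate the old buffer
        chars = buffer;
        return *this;
    }

private:
    char* chars;
};
```

With this sketch, inserting "animal" at index 3 of "test" gives "tesanimalt". The old buffer is released only after all copies are done, so inserting a string into itself also works.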
Can someone explain or confirm the meaning of the lines below?
bool instring{false}; - this means that false is the initial value of this variable, yes?
for (const char* p = mystart; *p; p++) - here the pointer *p as the second parameter of for means that this loop runs as long as this pointer exists, yes?
string(mystart,p-mystart) - I can't find this string usage in the C++ reference. I know the result is the difference between these parameters, but I don't understand how this happens.
These lines are from the code below (original code from another SO question):
string line;
while (std::getline(cin, line)) { // read full line
const char *mystart=line.c_str(); // prepare to parse the line - start is position of begin of field
bool instring{false};
for (const char* p=mystart; *p; p++) { // iterate through the string
if (*p=='"') // toggle flag if we're btw double quote
instring = !instring;
else if (*p==',' && !instring) { // if comma OUTSIDE double quote
csvColumn.push_back(string(mystart,p-mystart)); // keep the field
mystart=p+1; // and start parsing next one
}
}
csvColumn.push_back(string(mystart)); // last field delimited by end of line instead of comma
}
bool instring{false} means what you think, using C++11 initializer lists.
c_str() returns a pointer to a c-style string, that is, null terminated. *p will dereference to '\0' at the end of the string, which is 0, which evaluates to false in the loop
string (const char* s, size_t n); is the constructor being used, passing the start and the size (p-mystart).
http://www.cplusplus.com/reference/string/string/string/
Correct. This is a relatively new unified initializer syntax.
The loop exits the moment that pointer p points to a zero value.
The string is using the pointer-and-count constructor, string(const char* s, size_t n). The new string contains the p-mystart characters starting at mystart, that is, everything from the first pointer, inclusive, up to p, exclusive.
bool instring{false}; This is something known as list initialization, which is available since C++11. You can find more about it over here: http://en.cppreference.com/w/cpp/language/list_initialization
for (const char* p = mystart; *p; p++) This will loop for as long as there are characters left in your string.
string(mystart,p-mystart) This is an overloaded constructor of the string(). In your case, it copies the first p-mystart characters. You can find more about this over here (number 5 on the list): http://www.cplusplus.com/reference/string/string/string/
For item 1, maybe you mean bool instring(false); (paren instead of curly brace). Yes, that means initialize instring to the value 'false';
For item 2, *p is 0 means that you have reached the end of the null terminated string. So the loop will stop when you reach the end of the string.
For item 3, the first arg to string is a string, the second is an integral (numeric) value representing the length of the string or portion of the string.
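To see the three pieces working together, here is a sketch that wraps the loop from the question in a function. The name split_csv_line and the returned vector are assumptions made for illustration; the original code pushes into an existing csvColumn instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper around the question's loop: split one CSV line
// on commas that are outside double quotes.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    const char* mystart = line.c_str();      // start of the current field
    bool instring{false};                    // list initialization to false
    for (const char* p = mystart; *p; ++p) { // stops when *p is the '\0' terminator
        if (*p == '"') {
            instring = !instring;
        } else if (*p == ',' && !instring) {
            // pointer-and-count constructor: copy p - mystart characters
            fields.push_back(std::string(mystart, static_cast<std::size_t>(p - mystart)));
            mystart = p + 1;
        }
    }
    fields.push_back(std::string(mystart));  // last field runs to the terminator
    return fields;
}
```

Note that the quotes themselves stay in the field: the loop only uses them to decide whether a comma counts as a separator.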
I have this code say:
std::string str("ashish");
str.append("\0\0");
printf("%zu", str.length());
It is printing 6 but if I have this code
std::string str("ashish");
str.append("\0\0",2);
printf("%zu", str.length());
it is printing 8 ! Why?
It's because str.append("\0\0") uses the null character to determine the end of the string. So "\0\0" is length zero. The other overload, str.append("\0\0",2), just takes the length you give it, so it appends two characters.
From the standard:
basic_string&
append(const charT* s, size_type n);
7 Requires: s points to an array of at least n elements of charT.
8 Throws: length_error if size() + n > max_size().
9 Effects: The function replaces the string controlled by *this with a string of length size() + n whose first size() elements are a copy of the original string controlled by *this and whose remaining elements are a copy of the initial n elements of s.
10 Returns: *this.
basic_string& append(const charT* s);
11 Requires: s points to an array of at least traits::length(s) + 1 elements of charT.
12 Effects: Calls append(s, traits::length(s)).
13 Returns: *this.
— [string::append] 21.4.6.2 p7-13
From the docs:
string& append ( const char * s, size_t n );
Appends a copy of the string formed by the first n characters in the array of characters pointed by s.
string& append ( const char * s );
Appends a copy of the string formed by the null-terminated character sequence (C string) pointed by s. The length of this character sequence is determined by the first occurrence of a null character (as determined by traits.length(s)).
The second version (your first one) takes into account the null-terminator (which in your case is exactly the first character). The first one doesn't.
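A small sketch contrasting the two overloads; the helper name length_after_append is hypothetical and only exists to make the difference easy to check.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers: report the length after each append overload.
std::size_t length_after_append(std::string str, const char* s) {
    str.append(s);      // length found via traits::length(s), stops at the first '\0'
    return str.length();
}

std::size_t length_after_append(std::string str, const char* s, std::size_t n) {
    str.append(s, n);   // takes exactly n characters, null characters included
    return str.length();
}
```

With "ashish" and "\0\0", the first overload leaves the length at 6, and the second, called with n = 2, makes it 8.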
I like "reinventing the wheel" for learning purposes, so I'm working on a container class for strings. Will using the NULL character as an array terminator (i.e., the last value in the array will be NULL) cause interference with the null-terminated strings?
I think it would only be an issue if an empty string is added, but I might be missing something.
EDIT: This is in C++.
"" is the empty string in C and C++, not NULL. Note that "" has exactly one element (instead of zero), meaning it is equivalent to {'\0'} as an array of char.
char const *notastring = NULL;
char const *emptystring = "";
emptystring[0] == '\0'; // true
notastring[0] == '\0'; // crashes
No, it won't, because you won't be storing in an array of char, you'll be storing in an array of char*.
char const* strings[] = {
"WTF"
, "Am"
, "I"
, "Using"
, "Char"
, "Arrays?!"
, 0
};
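A sketch of walking such a sentinel-terminated array, assuming the last entry is a null pointer as in the array above. The function name count_strings is hypothetical. An empty string in the middle does not stop the walk, because it is a non-null pointer to a '\0' character.

```cpp
#include <cstddef>

// Hypothetical helper: count entries in an array of C strings that ends
// with a null pointer.
std::size_t count_strings(const char* const* strings) {
    std::size_t count = 0;
    while (strings[count] != nullptr) {  // compares the pointer, not the characters
        ++count;
    }
    return count;
}
```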
It depends on what kind of string you're storing.
If you're storing C-style strings, which are basically just pointers to character arrays (char*), there's a difference between a NULL pointer value, and an empty string. The former means the pointer is ‘empty’, the latter means the pointer points to an array that contains a single item with character value 0 ('\0'). So the pointer still has a value, and testing it (if (foo[3])) will work as expected.
If what you're storing are C++ standard library strings of type string, then there is no NULL value. That's because there is no pointer, and the string type is treated as a single value. (Whereas a pointer is technically not, but can be seen as a reference.)
I think you are confused. While C-strings are "null terminated", there is no "NULL" character. NULL is a name for a null pointer. The terminator for a C-string is a null character, i.e. a byte with a value of zero. In ASCII, this byte is (somewhat confusingly) named NUL.
Suppose your class contains an array of char that is used to store the string data. You do not need to "mark the end of the array"; the array has a specific size that is set at compile-time. You do need to know how much of that space is actually being used; the null-terminator on the string data accomplishes that for you - but you can get better performance by actually remembering the length. Also, a "string" class with a statically-sized char buffer is not very useful at all, because that buffer size is an upper limit on the length of strings you can have.
So a better string class would contain a pointer of type char*, which points to a dynamically allocated (via new[]) array of chars. Again, it makes no sense to "mark the end of the array", but you will want to remember both the length of the string (i.e. the amount of space being used) and the size of the allocation (i.e. the amount of space that may be used before you have to re-allocate).
When you are copying from std::string, use the iterators begin() and end(), and you don't have to worry about the null terminator: it is not part of the range [begin(), end()). Before C++11, the terminator was only guaranteed to be present after calling c_str() (in which case the block of memory it points to ends with a null character); since C++11, c_str() and data() both point to a null-terminated buffer. If you want to use memcpy, use the data() method.
Why don't you follow the pattern used by vector - store the number of elements within your container class, then you know always how many values there are in it:
vector<string> myVector;
size_t elements(myVector.size());
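A minimal sketch of that pattern, assuming the container simply delegates storage and counting to a vector. The class name StringContainer and its members are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: the element count is stored, so no sentinel
// value is needed and empty strings are ordinary elements.
class StringContainer {
public:
    void add(const std::string& s) { items_.push_back(s); }
    std::size_t size() const { return items_.size(); }
    const std::string& operator[](std::size_t i) const { return items_[i]; }

private:
    std::vector<std::string> items_;
};
```

Adding "Hello" and then "" gives a container of size 2 whose second element is simply an empty string.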
Instantiating a string with x where const char* x = 0; can be problematic. See this code in Visual C++ STL that gets called when you do this:
_Myt& assign(const _Elem *_Ptr)
{ // assign [_Ptr, <null>)
_DEBUG_POINTER(_Ptr);
return (assign(_Ptr, _Traits::length(_Ptr)));
}
static size_t __CLRCALL_OR_CDECL length(const _Elem *_First)
{ // find length of null-terminated string
return (_CSTD strlen(_First));
}
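Since the code above ends up calling strlen on the pointer, one way to guard against a null pointer is to check before constructing. This is only a sketch, and the helper name to_string_or_empty is hypothetical.

```cpp
#include <string>

// Hypothetical helper: build a std::string from a possibly-null pointer,
// treating a null pointer as an empty string.
std::string to_string_or_empty(const char* x) {
    return x ? std::string(x) : std::string();
}
```

Whether a null pointer should map to an empty string or be rejected is a design choice; this sketch assumes the former.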
#include "Maxmp_crafts_fine_wheels.h"
MaxpmContainer maxpm;
maxpm.add("Hello");
maxpm.add(""); // uh oh, adding an empty string; should I worry?
maxpm.add(0);
At this point, as a user of MaxpmContainer who had not read your documentation, I would expect the following:
strcmp(maxpm[0],"Hello") == 0;
*maxpm[1] == 0;
maxpm[2] == 0;
Interference between the zero terminator at position two and the empty string at position one is avoided by means of the "interpret this as a memory address" operator *. Position one will not be zero; it will be an integer, which if you interpret it as a memory address, will turn out to be zero. Position two will be zero, which, if you interpret it as a memory address, will turn out to be an abrupt disorderly exit from your program.