Strange behaviour of string length function w.r.t null character? - c++

Say I have this code:
std::string str("ashish");
str.append("\0\0");
printf("%zu", str.length());
It prints 6, but if I have this code
std::string str("ashish");
str.append("\0\0",2);
printf("%zu", str.length());
it prints 8! Why?

It's because str.append("\0\0") uses the null character to determine the end of the string. So "\0\0" is length zero. The other overload, str.append("\0\0",2), just takes the length you give it, so it appends two characters.
From the standard:
basic_string&
append(const charT* s, size_type n);
7 Requires: s points to an array of at least n elements of charT.
8 Throws: length_error if size() + n > max_size().
9 Effects: The function replaces the string controlled by *this with a string of length size() + n whose first size() elements are a copy of the original string controlled by *this and whose remaining elements are a copy of the initial n elements of s.
10 Returns: *this.
basic_string& append(const charT* s);
11 Requires: s points to an array of at least traits::length(s) + 1 elements of charT.
12 Effects: Calls append(s, traits::length(s)).
13 Returns: *this.
— [string::append] 21.4.6.2 p7-13

From the docs:
string& append ( const char * s, size_t n );
Appends a copy of the string formed by the first n characters in the array of characters pointed by s.
string& append ( const char * s );
Appends a copy of the string formed by the null-terminated character sequence (C string) pointed by s. The length of this character sequence is determined by the first occurrence of a null character (as determined by traits.length(s)).
The second overload (the one your first snippet calls) stops at the first null character, which in your case is the very first character, so nothing is appended. The first overload ignores null characters and copies exactly n characters.
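A minimal sketch contrasting the overloads discussed above. The function names are illustrative only, and the third function assumes C++14 for the std::string_literals suffix s, which builds the string from the literal's full length and therefore keeps the embedded nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// append(const char*) measures its argument with traits::length, which stops
// at the first '\0'; for "\0\0" that length is 0, so nothing is appended.
std::size_t length_after_c_string_append() {
    std::string str("ashish");
    str.append("\0\0");
    return str.length();
}

// append(const char*, n) copies exactly n characters, null characters included.
std::size_t length_after_counted_append() {
    std::string str("ashish");
    str.append("\0\0", 2);
    return str.length();
}

// C++14: the "s" literal operator receives the literal's length (2 here),
// so the embedded nulls survive without spelling out a count.
std::size_t length_after_literal_append() {
    using namespace std::string_literals;
    std::string str("ashish");
    str += "\0\0"s;
    return str.length();
}
```

The first function returns 6; the other two return 8.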

Related

Is it Safe to strncpy Into a string That Doesn't Have Room for the Null Terminator?

Consider the following code:
const char foo[] = "lorem ipsum"; // foo is an array of 12 characters
const auto length = strlen(foo); // length is 11
string bar(length, '\0'); // bar was constructed with string(11, '\0')
strncpy(data(bar), foo, length);
cout << data(bar) << endl;
My understanding is that strings are always allocated with a hidden null element. If this is the case then bar really allocates 12 characters, with the 12th being a hidden '\0' and this is perfectly safe... If I'm wrong on that then the cout will result in undefined behavior because there isn't a null terminator.
Can someone confirm for me? Is this legal?
There have been a lot of questions about why to use strncpy instead of just using the string(const char*, const size_t) constructor. My intent has been to make my toy code close to my actual code which contains a vsnprintf. Unfortunately even after getting excellent answers here I've found that vsnprintf doesn't behave the same as strncpy, and I've asked a follow up question here: Why is vsnprintf Not Writing the Same Number of Characters as strncpy Would?
This is safe as long as you only write into [0, size()), that is, at most size() characters. Per [basic.string]/3:
In all cases, [data(), data() + size()] is a valid range, data() + size() points at an object with value charT() (a “null terminator”), and size() <= capacity() is true.
So string bar(length, '\0') gives you a string with a size() of 11, with an immutable null terminator at the end (for a total of 12 characters in actual size). As long as you do not overwrite that null terminator, or try to write past it, you're okay.
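The questioner's pattern can be sketched as a reusable function; copy_into_presized is a hypothetical name. It uses &bar[0] so that it also compiles before C++17; from C++17 on, bar.data() returns a non-const char* and can be used the same way.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copies a C string into a std::string pre-sized to exactly strlen(src)
// characters. strncpy writes only [0, size()) and, because count is reached
// before src's '\0', writes no terminator; the one std::string keeps at
// index size() is never touched.
std::string copy_into_presized(const char* src) {
    const std::size_t length = std::strlen(src);
    std::string bar(length, '\0');       // size() == length
    std::strncpy(&bar[0], src, length);  // fills exactly size() characters
    return bar;
}
```

Reading c_str() or data() afterwards is fine because the string itself guarantees the trailing null.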
There are two different things here.
First, does strncpy add an additional \0 in this instance (11 non-\0 elements copied into a string of size 11)? The answer is no:
Copies at most count characters of the byte string pointed to by src (including the terminating null character) to character array pointed to by dest.
If count is reached before the entire string src was copied, the resulting character array is not null-terminated.
So the call is perfectly fine.
Then data() gives you a proper \0-terminated string:
c_str() and data() perform the same function. (since C++11)
So it seems that for C++11 you are safe. Whether the string allocates an additional \0 or not doesn't seem to be indicated in the documentation, but the API is clear that what you are doing is perfectly fine.
You have allocated an 11-character std::string. You are not trying to read nor write anything past that, so that part will be safe.
So the real question is whether you have messed up the internals of the string. Since you haven't done anything that isn't allowed, how would that be possible? If it's required for the string to internally keep a 12-byte buffer with a null padding at the end in order to fulfill its contract, that will be the case no matter what operations you performed.
Yes, it's safe, according to the documentation of char* strncpy(char* destination, const char* source, size_t num):
Copy characters from string
Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.
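Both strncpy behaviours quoted in this thread (zero padding when the source is short, no terminator when it is long enough) can be observed on plain char arrays. This is an illustrative sketch; the struct and function names are hypothetical, and some compilers emit a truncation warning for the second call even though the behaviour is well defined.

```cpp
#include <cassert>
#include <cstring>

struct StrncpyDemo {
    char padded[8];
    char truncated[5];
};

StrncpyDemo run_strncpy_demo() {
    StrncpyDemo d;
    // Pre-fill with 'x' so it is visible which bytes strncpy actually writes.
    std::memset(d.padded, 'x', sizeof d.padded);
    std::memset(d.truncated, 'x', sizeof d.truncated);
    // Source shorter than num: "abc" is copied, then five '\0' pad to 8 bytes.
    std::strncpy(d.padded, "abc", sizeof d.padded);
    // Source length equals num: five characters copied, no '\0' written.
    std::strncpy(d.truncated, "lorem", sizeof d.truncated);
    return d;
}
```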

Operations on pointers in for loop and string function and bool initial value

Can someone explain or confirm the meaning of the lines below?
bool instring{false}; - this means that false is the initial value of this variable, yes?
for (const char* p = mystart; *p; p++) - here, does *p as the condition (second part) of the for mean that the loop runs as long as this pointer exists?
string(mystart,p-mystart) - I can't find this usage of string in the C++ reference. I know the result is the text between these two positions, but I don't understand how this happens.
These lines are from the code below (the original code is from another SO question):
string line;
while (std::getline(cin, line)) { // read full line
const char *mystart=line.c_str(); // prepare to parse the line - start is position of begin of field
bool instring{false};
for (const char* p=mystart; *p; p++) { // iterate through the string
if (*p=='"') // toggle flag if we're btw double quote
instring = !instring;
else if (*p==',' && !instring) { // if comma OUTSIDE double quote
csvColumn.push_back(string(mystart,p-mystart)); // keep the field
mystart=p+1; // and start parsing next one
}
}
csvColumn.push_back(string(mystart)); // last field delimited by end of line instead of comma
}
bool instring{false} means what you think; it uses C++11 brace (list) initialization.
c_str() returns a pointer to a C-style, null-terminated string. At the end of the string, *p dereferences to '\0', which is 0 and therefore evaluates to false as the loop condition.
string(const char* s, size_t n) is the constructor being used, passing the start and the size (p - mystart).
http://www.cplusplus.com/reference/string/string/string/
Correct. This is the relatively new (C++11) uniform initialization syntax.
The loop exits the moment that pointer p points to a zero value.
The string is built from a pointer and a length: string(mystart, p-mystart) copies p-mystart characters starting at mystart. The two-iterator constructor, which also accepts two pointers, would give the same result written as string(mystart, p): everything from the first pointer, inclusive, up to the second, exclusive.
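A small sketch showing that the pointer-plus-count constructor and the two-iterator constructor produce the same substring; both function names are illustrative.

```cpp
#include <cassert>
#include <string>

// string(const char*, size_type): start pointer plus number of characters.
std::string from_pointer_and_count(const char* first, const char* last) {
    return std::string(first, static_cast<std::string::size_type>(last - first));
}

// template string(InputIt, InputIt): half-open range [first, last).
std::string from_pointer_range(const char* first, const char* last) {
    return std::string(first, last);
}
```

With text = "alpha,beta", both calls with (text, text + 5) yield "alpha".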
bool instring{false}; This is something known as list initialization, which is available since C++11. You can find more about it over here: http://en.cppreference.com/w/cpp/language/list_initialization
for (const char* p = mystart; *p; p++) This will loop for as long as there are characters left in your string.
string(mystart,p-mystart) This is an overloaded constructor of std::string. In your case, it copies p-mystart characters starting at mystart. You can find more about this over here (number 5 on the list): http://www.cplusplus.com/reference/string/string/string/
For item 1, maybe you mean bool instring(false); (parentheses instead of curly braces). Yes, that initializes instring to the value false.
For item 2, *p being 0 means that you have reached the end of the null-terminated string, so the loop stops there.
For item 3, the first argument to string is a pointer to the first character, and the second is an integral (numeric) value giving the number of characters to copy.
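Putting the three answers together, the question's loop can be wrapped in a self-contained function. split_csv_line is a hypothetical name; like the original, it keeps the quote characters in the field and, because it walks the c_str() buffer, it would also stop at any '\0' embedded in the line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits one line on commas that are not inside double quotes.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    const char* start = line.c_str();          // start of the current field
    bool in_quotes{false};                     // list-initialized to false
    for (const char* p = start; *p; ++p) {     // stops at the terminating '\0'
        if (*p == '"') {
            in_quotes = !in_quotes;
        } else if (*p == ',' && !in_quotes) {
            // (pointer, count) constructor: p - start characters from start
            fields.push_back(std::string(start, p - start));
            start = p + 1;
        }
    }
    fields.push_back(std::string(start));      // last field ends at the terminator
    return fields;
}
```

For the line a,"b,c",d this yields three fields: a, "b,c" (quotes included) and d.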

Does the std::string::resize() method manages the terminating character?

I am copying some data from a stream into a string, so I thought about resizing the string with the actual number of characters plus one for the terminating one, like this:
std::istringstream stream { "data" };
const std::size_t count = 4;
std::string copy;
copy.resize(count + 1);
stream.read(&copy[0], count);
copy[count] = 0;
However, in this case, copy indicates it has a size of 5 (which is consistent, since I called resize(5)). Does that mean that resize() adds the extra terminating character itself? Which would mean that I do not have to worry about appending \0 after invoking read(&copy[0], count)?
No, you don't have to. The string class abstracts the concept of a "null-terminated char sequence", so that you don't have to worry about it anymore.
Also, the size of the string returned doesn't count the terminating character, which is consistent with the behavior I mentioned, because if you don't have to deal with the terminating character, you don't have to know about it. Your string is just the characters you want to manipulate, without any concern for that "utility" character that has nothing to do with your actual data.
§21.4.7.1 basic_string accessors [string.accessors] of the standard guarantees that std::string has a null-terminated buffer.
Also according to the standard §21.4.4/6-8 basic_string capacity [string.capacity]:
void resize(size_type n, charT c);
6 Requires: n <= max_size()
7 Throws: length_error if n > max_size().
8 Effects: Alters the length of the string designated by *this as follows:
- If n <= size(), the function replaces the string designated by *this with a string of length n whose elements are a copy of the initial elements of the original string designated by *this.
- If n > size(), the function replaces the string designated by *this with a string of length n whose first size() elements are a copy of the original string designated by *this, and whose remaining elements are all initialized to c.
void resize(size_type n);
9 Effects: resize(n,charT()).
Interpreting the above: std::string::resize does not affect the terminating null character of the string's buffer.
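The two branches of the quoted Effects clause can be checked directly. This sketch (hypothetical function name) grows, shrinks, then grows again with the default fill character charT(), i.e. '\0'.

```cpp
#include <cassert>
#include <string>

std::string grow_then_shrink() {
    std::string s = "abc";
    s.resize(6, 'x');  // n > size(): pad with 'x'          -> "abcxxx"
    s.resize(4);       // n <= size(): keep first 4         -> "abcx"
    s.resize(6);       // resize(n, charT()): pad with '\0' -> "abcx" + two '\0'
    return s;          // size() == 6; c_str()[6] is still the terminator
}
```

The two padding '\0' characters are real elements counted by size(), separate from the terminator the string maintains after them.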
Now to your code:
The statement std::string copy; defines an empty string (i.e., copy.size() == 0).
The statement copy.resize(count + 1); has n equal to 5, which is greater than size() (0), so it replaces copy with a string of length 5 filled with \0 (null characters).
Next, in the statement stream.read(&copy[0], count);, std::istream::read simply copies a block of data, without checking its contents or appending a null character at the end.
In other words, it replaces the first 4 null characters of copy with "data". The size of copy doesn't change; it is still 5. That is, copy's buffer holds "data\0\0": the fifth element \0 followed by the terminator that std::string maintains itself.
So calling copy[count] = 0; is redundant since copy[4] is already \0. However, your string is not "data" but rather "data\0".
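A sketch of the read without the "+ 1": size the string to exactly count characters and let std::string manage the terminator. The function name read_exactly is hypothetical, and the final resize to gcount() is an added safeguard for short reads, not part of the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

std::string read_exactly(std::istream& stream, std::size_t count) {
    std::string copy;
    copy.resize(count);  // room for count characters; no extra slot needed
    stream.read(&copy[0], static_cast<std::streamsize>(count));
    // If the stream had fewer than count characters, drop the unread tail.
    copy.resize(static_cast<std::size_t>(stream.gcount()));
    return copy;
}
```

With std::istringstream stream{"data"}, read_exactly(stream, 4) yields a string equal to "data" with size() == 4.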

I'm finding String::copy rather difficult: copy first five characters of a string

I'm trying to teach myself to program, so I apologize in advance for any shoddy code or bad practices. Basically, I'm trying to copy part of a long string using string::copy, but I'm clearly not doing something right. My goal here is to copy and print the first five characters of the string "bignumber":
#include <iostream>
#include<string>
using namespace std;
int main()
{
const string bignumber = "73167176531330624919225119674426574742355349194934\
96983520312774506326239578318016984801869478851843\
85861560789112949495459501737958331952853208805511\
12540698747158523863050715693290963295227443043557\
6689664895044524452316173185640309871121722383113\
62229893423380308135336276614282806444486645238749\
30358907296290491560440772390713810515859307960866\
70172427121883998797908792274921901699720888093776\
65727333001053367881220235421809751254540594752243\
52584907711670556013604839586446706324415722155397\
53697817977846174064955149290862569321978468622482\
83972241375657056057490261407972968652414535100474\
82166370484403199890008895243450658541227588666881\
16427171479924442928230863465674813919123162824586\
17866458359124566529476545682848912883142607690042\
24219022671055626321111109370544217506941658960408\
07198403850962455444362981230987879927244284909188\
84580156166097919133875499200524063689912560717606\
05886116467109405077541002256983155200055935729725\
71636269561882670428252483600823257530420752963450";
int iter = 0;
size_t window;
char buffer[5];
window = bignumber.copy(buffer,iter,iter+5);
cout << window << endl;
return 0;
}
This is for project Euler problem 8 if you care. Thanks for your help.
I believe you misread the documentation of basic_string::copy. From this page:
size_type copy( Char* s, size_type count, size_type index = 0 ) const;
Copies count characters from the position, starting at index to the given character string s. The resulting string is not NULL terminated.
Your use of the first parameter is correct (buffer being a char array, it decays to a char pointer when passed as a function argument), but your second and third arguments aren't:
size_type count is the number of characters to copy: you are providing 0, while you seem to want 5.
size_type index is the starting index for copying: you are providing 5, while you apparently need 0 (copy count characters from the start of the string). The parameter happens to have a default argument value of 0, so you don't have to provide any value here.
In the end, you could do :
const size_t window = bignumber.copy(buffer, sizeof(buffer));
Notice that I've used sizeof(buffer) rather than the magic value 5 to avoid introducing a bug if the buffer size is changed without updating this call. Also be aware that buffer cannot simply be written to std::cout after the call to copy, because it is not null-terminated.
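If a char buffer really is needed, one option is to make it one character larger and write the terminator yourself. This is a sketch; copy_prefix is a hypothetical helper, not part of std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies up to buf_size - 1 characters from the start of s into buf and adds
// the '\0' that std::string::copy does not write. Returns characters copied.
std::size_t copy_prefix(const std::string& s, char* buf, std::size_t buf_size) {
    if (buf_size == 0) return 0;
    const std::size_t n = s.copy(buf, buf_size - 1, 0);  // (dest, count, pos)
    buf[n] = '\0';
    return n;
}
```

With char buffer[6], copy_prefix(bignumber, buffer, sizeof(buffer)) copies the first five digits and leaves buffer printable with std::cout.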
Now, if working with std::string only is an option (and in most cases, it should be), you might as well be using basic_string::substr :
basic_string substr( size_type index = 0, size_type count = npos ) const;
Returns a substring of the current string, starting at the given position index and having length of count characters.
For example:
const std::string substring = bignumber.substr(0, 5);
Unlike the copy solution, there is no possible size issue here, and the result can be written to std::cout without any problem. In other words, it's much safer.
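For the Project Euler use case, substr also makes it easy to walk every window of consecutive digits. This is a sketch with a hypothetical helper name; note that substr takes (pos, count), whereas copy takes (dest, count, pos).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects every run of `width` consecutive characters of s.
std::vector<std::string> sliding_windows(const std::string& s, std::size_t width) {
    std::vector<std::string> out;
    if (width == 0 || width > s.size()) return out;
    for (std::size_t i = 0; i + width <= s.size(); ++i) {
        out.push_back(s.substr(i, width));
    }
    return out;
}
```

For "7316717653" and width 5 this yields six windows, from "73167" to "17653".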
The substr member function would be the more conventional solution to this problem.

Standard Template String class: string.fill()

I need a way to create a string of n chars, in this case with ASCII value zero.
I know I can do it by calling the constructor:
string sTemp(125000, 'a');
but I would like to reuse sTemp in many places and fill it with different lengths.
I am calling a library that takes a string pointer and a length as arguments and fills the string with bytes. (I know that technically string is not contiguous, but for all intents and purposes it is, and it will likely become the standard soon.) I do NOT want to use a vector.
Is there some clever way to call the constructor again after the string has been created?
The string class provides the method assign to give an existing string a new value. The signatures are:
1. string& assign ( const string& str );
2. string& assign ( const string& str, size_t pos, size_t n );
3. string& assign ( const char* s, size_t n );
4. string& assign ( const char* s );
5. string& assign ( size_t n, char c );
6. template <class InputIterator>
string& assign ( InputIterator first, InputIterator last );
Source: cplusplus.com (I recommend this website because it gives a very detailed reference of the C++ standard libraries.)
I think you're looking for something like the fifth one of these functions: n specifies the desired length of your string and c the character filled into this string. For example if you write
sTemp.assign(10, 'b');
your string will consist of exactly 10 b's.
I originally suggested the STL algorithm std::fill, but it leaves the string's length unchanged. The method string::resize does change the string's size and fills the appended characters with a given value, but only the appended ones are set. So string::assign remains the best approach.
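The difference between the three approaches is easy to see on a string that already holds data. The function names below are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

std::string refill_with_fill() {
    std::string s = "hello";
    std::fill(s.begin(), s.end(), 'z');  // overwrites elements; size() stays 5
    return s;
}

std::string refill_with_resize() {
    std::string s = "hello";
    s.resize(8, 'z');                    // keeps "hello", appends three 'z'
    return s;
}

std::string refill_with_assign() {
    std::string s = "hello";
    s.assign(8, 'z');                    // new size and every element replaced
    return s;
}
```

Only assign gives both the requested length and the requested content, which is why it suits reusing sTemp with different lengths.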
Try using:
sTemp.resize(newLength, 'a');
For reference, this is the Visual C++ standard library implementation:
void __CLR_OR_THIS_CALL resize(size_type _Newsize)
{ // determine new length, padding with null elements as needed
resize(_Newsize, _Elem());
}
void __CLR_OR_THIS_CALL resize(size_type _Newsize, _Elem _Ch)
{ // determine new length, padding with _Ch elements as needed
if (_Newsize <= _Mysize)
erase(_Newsize);
else
append(_Newsize - _Mysize, _Ch);
}