Program picking '\0' even when it is not mentioned - Clarification - c++

So, I am given to predict what this program will do:
int main()
{
char d[] = {'h','e','l','l','o'};
const char *c = d;
std::cout << *c << std::endl;
while ( *c ) {
c = c + 1;
std::cout << *c << std::endl;
if ( *c == '\0' )
std::cout << "Yes" << std::endl;
}
return 0;
}
From my understanding, the code should never print Yes because there is no \0 in the character array d[]. Is the program picking up a garbage value? In short, this while loop should run forever. Is that right?

The proper answer to this question is that the program exhibits undefined behavior, because it goes past the end of the array.
Changing the program to use a string literal for initialization would change the behavior to always printing "Yes":
char d[] = "hello";
In short, this while loop should run forever.
Once undefined behavior happens, all bets are off. However, commonly the program manages to find a zero byte in memory outside of d[], at which point it prints "Yes", and exits the loop.
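One way to stay inside the array is to carry its length along instead of searching for a terminator. The following is only a minimal sketch; to_string_bounded is a hypothetical helper name, not something from the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies exactly N chars from a fixed-size array.
// It never reads past the end and never relies on a '\0' terminator.
template <std::size_t N>
std::string to_string_bounded(const char (&arr)[N]) {
    return std::string(arr, N);
}
```

Called with the array d from the question, this yields a std::string of size 5. Note that for an array initialized from a string literal, N also counts the trailing '\0'.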

Your code is an example where the array d is not a string (more precisely, not a nul-terminated string), so using it as a string is incorrect. Any function that works with char* strings and treats \0 as the end marker will read outside the memory allocated for d, and sometimes a \0 happens to be found there (no one knows beforehand where it will be found). Once again, this is incorrect usage that can lead to errors related to array boundary violations.
Finally, the conditions of the if statement and the while loop are linked: (*c == '\0') is true exactly on the last iteration of while(*c){...}. Since it is very unlikely in practice that while(*c){...} runs forever, "Yes" will usually be printed eventually, although undefined behavior guarantees nothing.
UPDATE:
Let's consider additionally the following example:
#include <cstring>
#include <iostream>
using namespace std;
int main()
{
char d1[] = { 'h', 'e', 'l', 'l', 'o' }; // no nul-terminator here
char d2[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
char d3[] = "hello";
cout << "Memory allocated for d1 - " << sizeof(d1) << endl;
cout << "Length of string in d1 - " << strlen(d1) << endl; // undefined behavior: d1 has no terminator
cout << "Memory allocated for d2 - " << sizeof(d2) << endl;
cout << "Length of string in d2 - " << strlen(d2) << endl;
cout << "Memory allocated for d3 - " << sizeof(d3) << endl;
cout << "Length of string in d3 - " << strlen(d3) << endl;
return 0;
}
Output will be (the second line is not always exactly this, but similar):
Memory allocated for d1 - 5
Length of string in d1 - 19
Memory allocated for d2 - 6
Length of string in d2 - 5
Memory allocated for d3 - 6
Length of string in d3 - 5
Here you can see three ways of initializing a char array. d3 is initialized with a string literal, so a \0 is appended automatically because the value is written in "". Array d1 has no nul-terminator, and as a result strlen returns a value greater than sizeof: the \0 was found outside array d1.
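For comparison, here is a sketch of a length function that cannot run past the array, because it stops at the array's end even if no \0 is found. The name bounded_length is made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: number of chars before the first '\0',
// or N if the array contains no '\0' at all.
template <std::size_t N>
std::size_t bounded_length(const char (&arr)[N]) {
    const char* end = arr + N;  // one past the last element
    return static_cast<std::size_t>(std::find(arr, end, '\0') - arr);
}
```

For all three arrays d1, d2 and d3 above this returns 5, with no out-of-bounds read for d1.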

Related

Is there any way to know the size of what will be printed to standard output in C++?

For example, if I use the following:
cout << "hello world";
Is there any way to know the size of what's being printed to stdout?
You can use std::stringstream for this:
#include <sstream>
#include <iostream>
int main(){
std::stringstream ss;
int a = 3;
ss<<"Hello, world! "<<a<<std::endl;
std::cout<<"Size was: "<<ss.str().size()<<std::endl;
std::cout<<ss.str()<<std::endl;
}
The above prints 16: 14 characters for "Hello, world! " (including the trailing space), 1 character for the contents of the variable a, and 1 character from std::endl.
I doubt there is a standard way to determine how many bytes will be written to the standard output before writing it.
What you could do is write it to an ostringstream and get the size of the stream. This doubles the work, but gives you a standard, generic way to determine how many bytes an object will take when written to a stream:
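#include <cstddef>
#include <sstream>
// Formats t into a temporary stream and reports how many characters were written.
template <class T>
std::size_t stream_len(const T& t)
{
std::ostringstream oss;
oss << t;
return static_cast<std::size_t>(oss.tellp());
}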
Here is a demo: http://coliru.stacked-crooked.com/a/3de664b4059250ae
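If storing the formatted text is not needed, a stream buffer that only counts characters is another option. This is a rough sketch under my own assumptions; CountingBuf and counted_len are hypothetical names and not part of the standard library.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>

// Hypothetical stream buffer that discards characters and only counts them.
class CountingBuf : public std::streambuf {
public:
    std::size_t count() const { return count_; }

protected:
    // Called for single characters, since there is no put area.
    int_type overflow(int_type ch = traits_type::eof()) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) ++count_;
        return traits_type::not_eof(ch);
    }
    // Called for character sequences.
    std::streamsize xsputn(const char_type*, std::streamsize n) override {
        count_ += static_cast<std::size_t>(n);
        return n;
    }

private:
    std::size_t count_ = 0;
};

// Hypothetical helper: how many characters would `os << value` produce?
template <class T>
std::size_t counted_len(const T& value) {
    CountingBuf buf;
    std::ostream os(&buf);
    os << value;
    return buf.count();
}
```

With the default formatting flags, counted_len("hello world") gives 11 and counted_len(12345) gives 5. The result still depends on the stream's formatting state and locale, just as with ostringstream.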
Here's an old school C style way that is still valid C++ as well as modern C++:
#include <iostream>
#include <string>
int main() {
// C style but still valid c++
std::cout << "C style but still valid C++\n";
char phrase[] = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };
char phrase2[] = { "hello world" };
// Adding 1 for the new line character.
std::cout << "size of phrase[] in bytes = "
<< sizeof(phrase)
<< " + 1 for newline giving total of "
<< sizeof(phrase) + 1
<< " total bytes\n"; // Not Null terminated
std::cout << "size of phrase2[] in bytes = "
<< sizeof(phrase2)
<< " + 1 for newline giving total of "
<< sizeof(phrase2) + 1
<< " total bytes\n"; // Null terminated
// Or you can do it more c++ style
std::cout << "\nC++ style\n";
// Also adding one for the newline character; size() does not count a null terminator
std::cout << "size of string in bytes = "
<< std::string("hello world").size()
<< " + 1 for newline giving a total of "
<< std::string("hello world").size() + 1
<< " total bytes\n";
std::cout << "Press any key and enter to quit." << std::endl;
char c;
std::cin >> c;
return 0;
}
Since sizeof(char) is 1 byte by definition in C and C++, all you need is the count of characters, including special characters such as space, null terminator, and newline. Applied to a char array, the sizeof operator gives exactly that count.
Output
C style but still valid C++
size of phrase[] in bytes = 11 + 1 for newline giving total of 12 total bytes
size of phrase2[] in bytes = 12 + 1 for newline giving total of 13 total bytes
C++ style
size of string in bytes = 11 + 1 for newline giving a total of 12 total bytes
Press any key and enter to quit.
Now this will only give you the size of the output before you send it to the ostream's cout object. This also doesn't reflect the added characters of text that is describing this output.
As others have stated you can use stringstream to concatenate a bunch of strings, characters and other data types into the stringstream object with the insertion operator << and then have the stream's member function give you the size in bytes.
It works in the same manner as std::string( ... ).size().

Difference between string.empty and string[0] == '\0'

Suppose we have a string
std::string str; // some value is assigned
What is the difference between str.empty() and str[0] == '\0'?
C++11 and beyond
string_variable[0] is required to return the null character if the string is empty. That way there is no undefined behavior and the comparison still works if the string is truly empty. However you could have a string that starts with a null character ("\0Hi there") which returns true even though it is not empty. If you really want to know if it's empty, use empty().
Pre-C++11
The difference is that if the string is empty, then string_variable[0] has undefined behavior: there is no index 0, unless the string is const-qualified. If the string is const-qualified, it returns a null character.
string_variable.empty() on the other hand returns true if the string is empty, and false if it is not; the behavior won't be undefined.
Summary
empty() is meant to check whether the string/container is empty or not. It works on all containers that provide it and using empty clearly states your intent - which means a lot to people reading your code (including you).
Since C++11 it is guaranteed that str[str.size()] == '\0'. This means that if a string is empty, then str[0] == '\0'. But a C++ string has an explicit length field, meaning it can contain embedded null characters.
E.g. for std::string str("\0ab", 3), str[0] == '\0' but str.empty() is false.
Besides, str.empty() is more readable than str[0] == '\0'.
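To make the difference concrete, here is a small sketch. first_char_is_nul is a hypothetical helper that wraps the str[0] == '\0' check so it can be compared with empty().

```cpp
#include <string>

// Hypothetical helper: true when the first character is '\0'.
// For an empty string, s[0] on a const string refers to the terminator.
bool first_char_is_nul(const std::string& s) {
    return s[0] == '\0';
}
```

For an empty string both checks are true. For std::string("\0ab", 3), first_char_is_nul returns true while empty() returns false. For "abc" both are false.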
Other answers here are 100% correct. I just want to add three more notes:
empty is generic (every STL container implements this function), while operator[] with size_t only works with string objects and array-like containers. When dealing with generic STL code, empty is preferred.
Also, empty is pretty much self-explanatory, while == '\0' is not.
When it's 2 AM and you are debugging your code, would you prefer to see if(str.empty()) or if(str[0] == '\0')?
If only functionality mattered, we would all write in plain assembly.
There may also be a performance penalty. empty is usually implemented by comparing the size member of the string to zero, which is very cheap and easy to inline. Comparing against the first character can be heavier. First, since most implementations use the short string optimization, the program may first have to check whether the string is in "short mode" or "long mode", and branching can cost performance. If the string is long, dereferencing it may be costly if the string has not been touched for some time, since the dereference itself may cause a cache miss.
empty() is not implemented by looking for a null character at position 0; it is simply
bool empty() const
{
return size() == 0 ;
}
which can give a different result than checking str[0] == '\0'.
Also, beware of the functions you'll use if you use C++ 11 or later version:
#include <iostream>
#include <cstring>
#include <string>
int main() {
std::string str("\0ab", 3);
std::cout << "The size of str is " << str.size() << " bytes.\n";
std::cout << "The size of str is " << str.length() << " long.\n";
std::cout << "The size of str is " << std::strlen(str.c_str()) << " long.\n";
return 0;
}
will return
The size of str is 3 bytes.
The size of str is 3 long.
The size of str is 0 long.
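If you need to detect this situation before handing a std::string to a C API, something along these lines may help. Both function names are invented for this sketch.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: does the string store a '\0' among its size() characters?
bool has_embedded_nul(const std::string& s) {
    return s.find('\0') != std::string::npos;
}

// Hypothetical helper: the length a C function such as strlen would see.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For std::string("\0ab", 3), has_embedded_nul is true and c_length is 0, even though size() is 3.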
You want to know the difference between str.empty() and str[0] == '\0'. Let's follow an example:
#include<iostream>
#include<string>
using namespace std;
int main(){
string str, str2; // both strings are empty
str2 = "values"; // assign a value to str2
str2[0] = '\0'; // put '\0' at index 0 of str2
if(str.empty()) cout << "str is empty" << endl;
else cout << "str contains: " << str << endl;
if(str2.empty()) cout << "str2 is empty" << endl;
else cout << "str2 contains: " << str2 << endl;
return 0;
}
Output:
str is empty
str2 contains: alues
str.empty() tells you whether the string is empty, while str[0] == '\0' tells you whether index 0 of the string holds '\0'. Index 0 holding '\0' does not mean the string is empty. The two checks agree only when the string has length 0: since C++11, str[0] on an empty string refers to the terminating '\0'. A string of length 1 whose only character is '\0' is not empty.
A C++ string has its own notion of being empty. Before C++11, str[0] on an empty non-const string was undefined; since C++11 it is defined and refers to a '\0'. For any string with size >= 1, str[0] is the first stored character.
str[i] == '\0' is a concept of the C-string style. In the implementation of C-string, the last character of the string is '\0' to mark the end of a C-string.
For C-string you usually have to 'remember' the length of your string with a separate variable. In C++ String you can assign any position with '\0'.
Just a code segment to play with:
#include <iostream>
#include <cstring>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
char str[5] = "abc";
cout << str << " length: " << strlen(str) << endl;
cout << "char at 4th position: " << str[3] << "|" << endl;
cout << "char at 5th position: " << str[4] << "|" << endl;
str[4]='X'; // this is OK, since Cstring is just an array of char!
cout << "char at 5th position after assignment: " << str[4] << "|" << endl;
string cppstr("abc");
cppstr.resize(3);
cout << "cppstr: " << cppstr << " length: " << cppstr.length() << endl;
cout << "char at 4th position:" << cppstr[3] << endl;
cout << "char at 401th positon:" << cppstr[400] << endl;
// cppstr[3] is index size(), which yields '\0' since C++11;
// cppstr[400] is out of bounds and undefined behavior (it may or may not crash).
cppstr[0] = '\0';
str[0] = '\0';
cout << "After zero the first char. Cstring: " << str << " length: " << strlen(str) << " | C++String: " << cppstr << " length: " << cppstr.length() << endl;
return 0;
}
On my machine the output:
abc length: 3
char at 4th position: |
char at 5th position: |
char at 5th position after assignment: X|
cppstr: abc length: 3
char at 4th position:
char at 401th positon:?
After zero the first char. Cstring: length: 0 | C++String: bc length: 3

why the results are so different?

I don't understand why two similar pieces of code behave so differently. The first output is normal, but the second prints some unrecognizable characters. Can anyone explain this?
Thanks
#include <iostream>
using namespace std;
int main(){
char a[5] = { 'A', 'B', 'C', 'D' };
cout << a + 1 << endl;
char b[5] = {'a','b','c','d','e'};
cout << b+1 << endl;
return 0;
}
Both expressions a+1 and b+1 decay into a char*, which << then treats as a NUL-terminated string, but only a is NUL-terminated. Accessing b as a NUL-terminated string causes undefined behavior, which in your case seems to be printing garbage after the first few characters. (Note that I originally said both were not NUL-terminated, but then I noticed that you had only 4 characters in the initializer for a but specified a size of 5. That means the 5th element is zero-initialized, effectively NUL-terminating a.)
If you want to print them correctly without causing undefined behavior, make sure they are NUL-terminated:
#include <iostream>
using namespace std;
int main(){
char a[5] = { 'A', 'B', 'C', 'D' }; // Works as-is, but not good form
cout << a + 1 << endl;
char b[6] = {'a','b','c','d','e', '\0'}; // Needed NUL-terminated, but still not the best way
cout << b+1 << endl;
return 0;
}
Or as eigenchris noted in a comment, you could rely on the compiler to NUL-terminate it for you by using a string constant instead:
char a[] = "ABCD";
char b[] = "abcde"; // Probably the best way to do this.
Since you send char * arguments to cout, such as a or b, C-style strings are expected. This means a terminating zero character is also expected for each string. So the following will work:
char a[5] = { 'A', 'B', 'C', 'D', '\0' };
cout << a + 1 << endl;
char b[6] = {'a', 'b', 'c', 'd', 'e', '\0'};
cout << b + 1 << endl;
The reason this happens to b is that defining all 5 characters leaves no room for the terminating zero. The zero should have been the sixth character.
When you cout << a+1, the output will be "BCD", because the character after 'D' is a nul (\0).
This is because you specified the array-size of 5, but only gave it 4 values. The remaining, unspecified value will be set to 0.
When you cout << b+1, you'll get "bcdeXXXX", which will continue until a nul is found.
(the XXX will be unpredictable characters based on whatever is in memory.)
This is because you specified the value of all 5 characters, and the memory beyond that is undefined. It might be nuls, it might be random values left over from an earlier program. There is no way to know for sure. But cout will continue printing until it encounters a nul \0, or causes a segmentation/access violation by reading an inaccessible memory address.
That is why you get random garbage on the second output.
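The zero-fill rule for partially initialized arrays can be checked with a small helper like the one below. This is just an illustrative sketch, and contains_nul is a made-up name.

```cpp
#include <cstddef>

// Hypothetical helper: true if a '\0' exists somewhere inside the array bounds,
// which is what makes it safe to treat the array as a C string.
template <std::size_t N>
bool contains_nul(const char (&arr)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        if (arr[i] == '\0') return true;
    }
    return false;
}
```

It returns true for char a[5] = {'A','B','C','D'} because a[4] is zero-initialized, and false for char b[5] = {'a','b','c','d','e'}.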

String after appending Char changing its size

I want to test how a string's size changes when a char is appended, and the outcome is below.
I know that a string ends with the null character, but why is the outcome like this?
#include <iostream>
#include <string>
using namespace std;
int main(){
string a = "" + 'a'; //3
string b = "" + '1'; //2
string c = "a" + 'a'; //2
string d = "1" + '1'; //3
string e = "\0" + 'a'; //20
string f = "\0" + '1'; //1
string g = "a" + '\0'; //1
string h = "1" + '\0'; //1
string i = "" + '\0'; //0
string j = "" + '\0'; //0
cout << a.size() << endl;
cout << b.size() << endl;
cout << c.size() << endl;
cout << d.size() << endl;
cout << e.size() << endl;
cout << f.size() << endl;
cout << g.size() << endl;
cout << h.size() << endl;
cout << i.size() << endl;
cout << j.size() << endl;
return 0;
}
Your code is not doing what you think.
String literals decay to const char *, and char is an integer type. If you try to sum them, the compiler finds that the simplest way to make sense of that stuff is to convert chars to ints, so the result is performing pointer arithmetic over the string literals - e.g. ""+'a' goes to the 97th character in memory after the beginning of the string literal "" (if 'a' is represented by 97 on your platform).
This results in garbage being passed to the string constructor, which will store inside the string being constructed whatever it finds at these locations of memory until it finds a \0 terminator. Hence the "strange" results you get (which aren't reproducible, since the exact memory layout of the string table depends on the compiler).
Of course all this is undefined behavior as far as the standard is concerned (you are accessing char arrays outside their bounds, apart from the cases where you add \0).
To make your code do what you mean, at least one of the operands must be of type string:
string c = string("a") + 'a';
or
string c = "a" + string("a");
so the compiler will see the relevant overloads of operator+ that involve std::string.
Most of your initializers have undefined behaviour. Consider, for example:
string a = "" + 'a';
You are adding a char to a char pointer. This advances the pointer by the ASCII value of the char, and uses the resulting (undefined) C string to initialize a.
To fix, change the above to:
string a = string("") + 'a';
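Wrapping the conversion in a small function makes the resulting sizes easy to verify. append_char is a hypothetical name used only for this sketch.

```cpp
#include <string>

// Hypothetical helper: converts the literal to std::string first,
// so operator+ performs concatenation instead of pointer arithmetic.
std::string append_char(const char* text, char c) {
    return std::string(text) + c;
}
```

With this, append_char("", 'a') has size 1 and append_char("a", 'a') has size 2. append_char("1", '\0') also has size 2, because std::string can store an embedded '\0'. append_char("\0", 'a') has size 1, since std::string("\0") stops at the first '\0' and is empty.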

Understanding char reference

I've written this simple script to understand what a reference is, and I'm getting stuck on the char array.
int numbers[5] = {3, 6, 9, 12, 15};
for (int i = 0; i < 5; i++)
{
cout << numbers[i] << endl;
cout << &numbers[i] << endl;
}
cout << "--------------" << endl;
char letters[5] = {'a', 'b', 'c', 'd', 'e'};
for (int i = 0; i < 5; i++)
{
cout << letters[i] << endl;
cout << &letters[i] << endl;
}
and this is the output:
3
0xbffff958
6
0xbffff95c
9
0xbffff960
12
0xbffff964
15
0xbffff968
--------------
a
abcde
b
bcde
c
cde
d
de
e
With the int array, when I use &numbers[i], I receive a strange number that is a memory location. This is ok; it's exactly what I've understood.
But with char, I don't understand why I have this output.
The reason is that cout "knows" what to do with a char * value - it prints the character string as a NUL-terminated C string.
The same is not true of an int * value, so cout prints the pointer value instead.
You can force pointer value output by casting:
cout << static_cast<void *>(&letters[i]) << endl;
You are looking at a peculiarity of C++ streams. They try to convert their arguments to something that is usually printable. The type of the expression &ints[x] is int*, while &chars[x] has type char*, which is also the type of a C character string. Since we want cout << "FOO" to print the whole string, this behavior is necessary. In your case it actually results in undefined behavior, because the array you are using is not properly null-terminated. To resolve this, use a static_cast to void*.
When you pass an argument of type char* to operator<< (for char*, it is in fact a non-member function rather than a member of ostream), it is treated as a null-terminated string.
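The cast can be wrapped in a helper if it is needed often. This is only a sketch, address_text is a hypothetical name, and the exact text produced for a pointer is implementation-defined.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats the address held in a char pointer.
// Casting to const void* selects the pointer overload of operator<<,
// so the pointed-to characters are never read.
std::string address_text(const char* p) {
    std::ostringstream out;
    out << static_cast<const void*>(p);
    return out.str();
}
```

Because the characters are never read, this is safe even for the letters array from the question, which has no terminator.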