So I was messing around with vectors and the windows-api and someone suggested that I use a pointer to the first element in the vector as a buffer for a function.
I tried printing the array with that syntax, and it printed the whole array. Why is that?
char test[10] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j' };
std::cout << &test[0] << std::endl;
//output: abcdefghij
Edit 1:
I found out that printing with the array name worked too, but this code surprised me because, using the reference syntax (&), it printed the char array in decreasing order. Without the reference syntax it does not. Why is that?
for (int i = 0; i < 10; i++) {
std::cout << &test[0] + i << " ";
}
//outputs: abcdefghij bcdefghij cdefghij defghij efghij fghij ghij hij ij j
&test[0] is a pointer to a char, to the first character in an array.
Sending a char * to cout is the equivalent of printing a string. The underlying logic starts dumping characters to the screen and keeps going until it finds a NUL character, i.e. a char with a value of zero.
As others have mentioned, you are just lucky that a zero apparently sat in memory immediately after the end of your array. Otherwise, the string printer would have kept going until it found one. Either way, reading past the end of the array is undefined behavior.
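If the array is not NUL-terminated, one way to print it without relying on luck is to pass the length explicitly. The following is only a minimal sketch; the helper names print_exact and printed_exact are made up for illustration and do not come from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes exactly `count` characters starting at `data`.
// It never looks for a terminator, so it never reads past the given range.
void print_exact(std::ostream& out, const char* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    out.write(data, static_cast<std::streamsize>(count));
}

// Hypothetical convenience wrapper: returns the text that would be printed.
std::string printed_exact(const char* data, std::size_t count) {
    std::ostringstream out;
    print_exact(out, data, count);
    return out.str();
}
```

With the array from the question, print_exact(std::cout, test, sizeof test) should print the ten letters and nothing else, regardless of what follows the array in memory.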
I was playing around with char arrays in c++ and wrote this program:
int main()
{
char text[] = { 'h', 'e', 'l', 'l', 'o' }; //arrays initialised like this
//will have a size of the number
//of elements that you see
char text2[] = "hello"; //arrays initialised like this will have a size of
//the number of elements that you see + 1 (0 on the
//end to show where the end is)
cout << endl;
cout << "The size of the first array is: " << sizeof(text) << endl;
cout << endl;
for (int i = 0; i < sizeof(text); i++)
{
cout << i << ":" << text[i] << endl;
}
cout << endl;
cout << "The size of the first array is: " << sizeof(text2) << endl;
cout << endl;
for (int i = 0; i < sizeof(text2); i++)
{
cout << i << ":" << text2[i] << endl;
}
cout << endl;
cin.get();
return 0;
}
This program gives me the output:
The size of the first array is: 5
0:h
1:e
2:l
3:l
4:o
The size of the first array is: 6
0:h
1:e
2:l
3:l
4:o
5:
My question is: Is there a particular reason that initializing a char array with separate chars will not have a null terminator (0) on the end unlike initializing a char array with a string literal?
A curly braces initializer just provides the specified values for an array (or if the array is larger, the rest of the items are defaulted). It's not a string even if the items are char values. char is just the smallest integer type.
A string literal denotes a zero-terminated sequence of values.
That's all.
Informally, it's the second quotation character in a string literal of the form "foo" that adds the NUL-terminator.
In C++, "foo" is a const char[4] type, which decays to a const char* in certain situations.
It's just how the language works, that's all. And it's very useful since it dovetails nicely with all the standard library functions that model a string as a pointer to the first element in a NUL-terminated array of chars.
Splicing in an extra element with something like char text[] = { 'h', 'e', 'l', 'l', 'o' }; would be really annoying and it could introduce inconsistency into the language. Would you do the same thing for signed char, and unsigned char, for example? And what about int8_t?
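The claim about the type of "foo" can be checked at compile time. This is a small sketch, not part of the original answer; literal_size is a hypothetical helper that only exists to show the deduced array length.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue of type const char[N], where N includes the NUL.
static_assert(std::is_same<decltype("foo"), const char (&)[4]>::value,
              "\"foo\" is an array of 4 const char");
static_assert(sizeof("foo") == 4, "three letters plus the terminator");

// Hypothetical helper: deduces the array length, terminator included.
template <std::size_t N>
constexpr std::size_t literal_size(const char (&)[N]) {
    return N;
}
```

If the literal were passed to a function taking const char*, the array would decay and the length information would be lost, which is why the helper takes a reference to an array.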
A string literal such as "hello" has the type of a constant character array and is initialized the following way
const char string_literal_hello[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
As can be seen, the type of the string literal is const char[6]. It contains six characters.
Thus this declaration
char text2[] = "hello";
that can be also written like
char text2[] = { "hello" };
in fact is substituted for the following declaration
char text2[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
That is, when a string literal is used as an initializer of a character array, all of its characters, including the terminating zero, are used to initialize the array.
You can terminate it yourself in multiple ways:
char text1[6] = { 'h', 'e', 'l', 'l', 'o' };
char text2[sizeof "hello"] = { 'h', 'e', 'l', 'l', 'o' };
char text3[] = "hello"; // <--- my personal favourite
Is there a particular reason that initializing a char array with separate chars will not have a null terminator (0)
The reason is because that syntax...
Type name[] = { comma separated list };
...is used for initializing arrays of any type. Not just char.
The "quoted string" syntax is shorthand for a very specific type of array that assumes a null terminator is desired.
When you designate a double quote delimited set of adjacent characters (a string literal), it is assumed that what you want is a string. And a string in C means an array of characters that is null-terminated, because that's what the functions that operate on strings (printf, strcpy, etc...) expect. So the compiler automatically adds that null terminator for you.
When you provide a brace delimited, comma separated list of single quote delimited characters, it is assumed that you don't want a string, but you want an array of the exact characters you specified. So no null terminator is added.
C++ inherits this behavior.
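The difference between the two initializer forms can be made visible without ever reading outside the arrays. The sketch below assumes C++14 or later (for the loop in a constexpr function); has_terminator, braces, and literal are illustrative names, not anything from the answers.

```cpp
#include <cstddef>

// Hypothetical helper: true if a NUL occurs inside the array itself.
// Only the N elements of the array are inspected.
template <std::size_t N>
constexpr bool has_terminator(const char (&arr)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        if (arr[i] == '\0') {
            return true;
        }
    }
    return false;
}

constexpr char braces[]  = { 'h', 'e', 'l', 'l', 'o' };  // 5 elements, no NUL
constexpr char literal[] = "hello";                      // 6 elements, NUL last

static_assert(sizeof(braces) == 5, "exactly the listed characters");
static_assert(sizeof(literal) == 6, "the literal brings its terminator");
```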
I have troubles understanding strings as pointers. Apparently a string is understood as a pointer which points to the first address of the string. So using the "&"-operator I should receive the address of the first character of the string. Here's a small example:
#include "stdafx.h"
#include <iostream>
#include <cstring> // for strlen
using namespace std;
int main(){
char text[101];
int length;
cout << "Enter a word: ";
cin >> text;
length = strlen(text);
for (int i = 0; i <= length; i++) {
cout << " " << &text[i];
}
return 0;
}
When entering a word such as "Hello", the output is: "Hello ello llo lo o". Instead I expected to receive the address of each character of "Hello". When I use the cast long(&text[i]) it works out fine. But I don't understand why. Without the cast, apparently the "&"-operator gives the starting address of the string to be printed. Using a cast it gives the address of every character separately.
Maybe somebody can explain this to me. I'd be really grateful!
&text[i] is equivalent to text + i, which shifts the pointer along the char[] array by i places using pointer arithmetic. The effect is to start the output at the i-th character, because the overload of << taking a const char* is called. That outputs all characters from the starting point up to the NUL-terminator.
text[i], however, is a char, so the overload of << taking a char is called. That outputs a single character.
In C++, if you want a string, then use std::string instead. You can still write cin >> text; if text is a std::string type! Your code is also then not vulnerable to overrunning your char buffer.
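As a rough sketch of the std::string suggestion: the hypothetical function read_word_suffixes below reads one word from any input stream and collects the same suffixes the question printed, without a fixed-size buffer. Taking std::istream& rather than using cin directly is my own choice so the function can be exercised with a string stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one whitespace-delimited word into a std::string
// (which grows as needed) and returns every suffix of that word.
std::vector<std::string> read_word_suffixes(std::istream& in) {
    std::string text;
    std::vector<std::string> suffixes;
    if (in >> text) {
        for (std::size_t i = 0; i < text.size(); ++i) {
            suffixes.push_back(text.substr(i));  // characters from index i to the end
        }
    }
    return suffixes;
}
```

Calling read_word_suffixes(std::cin) and entering Hello should yield Hello, ello, llo, lo, o.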
If you have a character array storing a string as for example
char text[] = "Hello";
then the array is initialized like
char text[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
In this statement
std::cout << text;
the operator << overloaded for the type const char * is used, and the array text is implicitly converted to a pointer to its first element.
You could write instead
std::cout << &text[0];
because in both statements the operand ends up as a char * pointing to the first element: text decays to that pointer, and &text[0] already is one.
The operator overloaded for the type const char * outputs characters starting from the address at the pointer until a zero character is encountered.
So if, instead of the statement
std::cout << &text[0];
you write
std::cout << &text[1];
then the only thing that changes is the starting address of the string and nothing more. That is, you are in fact outputting the string represented as
{ 'e', 'l', 'l', 'o', '\0' }
If you write
std::cout << &text[2];
that is, if the pointer on the right-hand side of the expression is moved one more position to the right, then you are dealing with the string
{ 'l', 'l', 'o', '\0' }
and so on.
That is, the operator << declared like
template<class traits>
basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&,
const char*);
simply outputs the characters pointed to by its second parameter as a string.
If you want to output the value of the pointer itself instead of the string pointed to by the pointer you should use another overloaded operator << declared like
basic_ostream<charT, traits>& operator<<(const void* p);
To invoke it, write for example
std::cout << ( void * )text;
or
std::cout << ( void * )&text[i];
where i is some index.
Instead of a C-style cast you can use a C++ cast like
std::cout << static_cast<void *>( &text[i] );
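To see both overloads side by side, here is a small sketch that captures what each one writes. The function names as_text and as_address are invented for this example; the textual format of the address is implementation-defined, so only comparisons between addresses are meaningful.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Streams the pointer as const char*, so the characters up to the NUL are written.
std::string as_text(const char* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}

// Streams the same pointer as const void*, so the address itself is written.
std::string as_address(const char* p) {
    std::ostringstream out;
    out << static_cast<const void*>(p);
    return out.str();
}
```

For char text[] = "Hello";, as_text(&text[1]) gives ello, while as_address(&text[0]) and as_address(&text[1]) give two different addresses.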
To print the address of an array element, you could do:
cout << " " << (void*)&text[i];
This:
cout << " " << &text[i];
is equivalent to this:
cout << " " << text + i;
which means that you ask to print the string, starting from index i.
I don't understand why two similar pieces of code behave so differently. The first one outputs normally, but the second one outputs some unrecognizable characters. Can anyone explain it to me?
Thanks
#include <iostream>
using namespace std;
int main(){
char a[5] = { 'A', 'B', 'C', 'D' };
cout << a + 1 << endl;
char b[5] = {'a','b','c','d','e'};
cout << b+1 << endl;
return 0;
}
Both expressions a+1 and b+1 decay into a char*, which is then treated by << as a NUL-terminated string, but only a is NUL-terminated. Accessing b as a NUL-terminated string causes undefined behavior, which in your case seems to be printing garbage after the first few characters. (Note that I originally said both were not NUL-terminated, but then I noticed that you had only 4 characters in the initializer for a but specified a size of 5. That means the 5th element would be zero-initialized, effectively NUL-terminating a.)
If you want to print them correctly without causing undefined behavior, make sure they are NUL-terminated:
int main(){
char a[5] = { 'A', 'B', 'C', 'D' }; // Works as-is, but not good form
cout << a + 1 << endl;
char b[6] = {'a','b','c','d','e', '\0'}; // Needed NUL-terminated, but still not the best way
cout << b+1 << endl;
return 0;
}
Or as eigenchris noted in a comment, you could rely on the compiler to NUL-terminate it for you by using a string constant instead:
char a[] = "ABCD";
char b[] = "abcde"; // Probably the best way to do this.
Since you send char * arguments to cout, like a or b, C-style strings are expected. This also means a terminating zero character is expected for each string. So the following will work:
char a[5] = { 'A', 'B', 'C', 'D', '\0' };
cout << a + 1 << endl;
char b[6] = {'a', 'b', 'c', 'd', 'e', '\0'};
cout << b + 1 << endl;
What happens with b is that, by defining all 5 characters, you leave no room for the zero character. The 0 should have been the sixth character.
When you cout << a+1, the output will be "BCD", because the character after 'D' is a nul (\0).
This is because you specified the array-size of 5, but only gave it 4 values. The remaining, unspecified value will be set to 0.
When you cout << b+1, you'll get "bcdeXXXX", which will continue until a nul is found.
(the XXX will be unpredictable characters based on whatever is in memory.)
This is because you specified the value of all 5 characters, and the contents of the memory beyond the array are unknown (reading them is undefined behavior). It might be nuls, it might be random values left over from an earlier program. There is no way to know for sure. But cout will continue printing until it encounters a nul \0, or causes a segmentation/access violation by reading an inaccessible memory address.
That is why you get random garbage on the second output.
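The two cases can be handled safely as follows. This is just a sketch with made-up function names (tail_of_a, tail_of_b) that mirror the arrays in the question; it builds std::string objects instead of printing so the results are easy to inspect.

```cpp
#include <cassert>
#include <string>

// The array has room for 5 chars but only 4 initializers, so a[4] is zero.
std::string tail_of_a() {
    char a[5] = { 'A', 'B', 'C', 'D' };
    return std::string(a + 1);  // safe: the scan for the NUL stops at a[4]
}

// The array is completely filled, so there is no terminator inside it.
std::string tail_of_b() {
    char b[5] = { 'a', 'b', 'c', 'd', 'e' };
    return std::string(b + 1, b + 5);  // safe: explicit [begin, end) range
}
```

The second form works for any char array because the end of the range is stated explicitly rather than discovered by searching for a NUL.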
Is the C++ compiler supposed to automatically insert the null character after the end of the char array? The following prints "Word". So does the C++ compiler automatically insert a null character at the last position in the array?
int main()
{
char x[] = {'W', 'o', 'r', 'd'};
cout << x << endl;
return 0;
}
No, it doesn't. Your array x has only 4 elements. You are passing std::cout a pointer to char, which is not the beginning of a null-terminated string. This is undefined behaviour.
As an example, on my platform, similar code printed Word?, with a spurious question mark.
In C and C++, a terminating zero is automatically appended to string literals. So if you write, for example,
char x[] = "Word";
then the size of the array will be equal to 5 and the last element of the array will contain '\0'.
When the following declaration is used, as you showed,
char x[] = {'W', 'o', 'r', 'd'};
then the compiler allocates exactly as many elements as there are initializers. That is, in this case the array will have only 4 elements.
However, if you write it the following way
char x[5] = {'W', 'o', 'r', 'd'};
then the fifth element will contain the terminating zero because it has no corresponding initializer and will be zero-initialized.
Also take into account that the following declaration is valid in C but invalid in C++
char x[4] = "Word";
That is undefined behavior. It happened to print "Word", but it might just as well have crashed.
No, it doesn't. You have to do that yourself if you want to use the array as a string or pass it to library functions that work on strings.
No, in this case you are just creating a non-null-terminated array. A read that relies on null termination won't stop at the end of the array, because it won't find the null there.
char x[] = "Word";
In this case 5 bytes would be allocated for x. Null at the end.
No, it doesn't. If you want it inserted automatically, use a string constant:
const char *x = "Word";
//OR
std::string const s = "Word";
const char *x = s.c_str();
//OR
char x[] = { "Word" }; //a string literal optionally enclosed in braces (not specific to C++11).
The simplest way to verify this is to look at the memory address for x using a debugger and monitor it and the next few bytes following it. Stop the debugger after the initialization and examine the memory contents.
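If a debugger is not at hand, the bytes that actually belong to the array can also be dumped from code. The sketch below is one possible way to do it; hex_bytes is a hypothetical helper, and the values in the usage note assume an ASCII execution character set.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats every byte of the array as two hex digits,
// separated by spaces. Only the N bytes that belong to the array are read.
template <std::size_t N>
std::string hex_bytes(const char (&arr)[N]) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < N; ++i) {
        if (i != 0) {
            out << ' ';
        }
        // Go through unsigned char so values above 0x7f do not sign-extend.
        out << std::setw(2)
            << static_cast<unsigned>(static_cast<unsigned char>(arr[i]));
    }
    return out.str();
}
```

Under the ASCII assumption, an array initialized with "Word" dumps as 57 6f 72 64 00, while one initialized with {'W', 'o', 'r', 'd'} dumps as 57 6f 72 64, with no trailing zero byte.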
No, if you want to use the char array as a string use this form of the initializer
char x[] = { "Word" };
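Although it is not guaranteed, it seems my implementation always places a \0 right after the array even though I did not request one. Note that the test below reads past the end of the array, which is undefined behavior, so the result cannot be relied on.
In my tests, noNull[2] was always \0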
char noNull[] = {'H', 'i'};
for (int i = 2; i < 5; ++i) {
if (noNull[i] == '\0') {
std::cout << "index " << i << " is null" << std::endl;
} else {
std::cout << "index " << i << " is not null" << std::endl;
}
}
output
index 2 is null
index 3 is not null
index 4 is not null
I know that the \0 at the end of a character array is a must if you use the array with functions that expect a \0, like cout; otherwise unexpected random characters appear.
My question is: if I use the character array only in my own functions, reading it char by char, do I need to store the \0 at the end?
Also, is it a good idea to fill in only some characters and leave holes in the array?
Consider the following:
char chars[5];
chars[1] = 15;
chars[2] = 17;
chars[3] = 'c';
//code using the chars[1] and chars[3], but never using the chars
int y = chars[1]+chars[3];
cout << chars[3] << " is " << y;
Does the code above risk unexpected errors?
EDIT: edited the example.
The convention of storing a trailing char(0) at the end of an array of chars has a name: it's called a 'C string'. It has nothing to do, specifically, with char. If you are using wide characters, a wide C string would be terminated with a wchar_t(0).
So it's absolutely fine to use char arrays without trailing zeroes if what you are using is just an array of chars and not a C string.
char dirs[4] = { 'n', 's', 'e', 'w' };
for (size_t i = 0; i < 4; ++i) {
fprintf(stderr, "dir %zu = %c\n", i, dirs[i]); // %zu is the format for size_t
std::cout << "dir " << i << " = " << dirs[i] << '\n';
}
Note that '\0' is char(0), that is it has a numeric, integer value of 0.
char x[] = { 'a', 'b', 'c', '\0' };
produces the same array as
char x[] = { 'a', 'b', 'c', 0 };
Your second question is unclear, though
//code using the chars[1] and chars[3], but never using the chars
int y = chars[1]+chars[3];
cout << chars[3] << " is " << y;
Leaving gaps is fine, as long as you're sure your code is aware that they are uninitialized. If it is not, then consider the following:
char chars[4]; // I don't initialize this.
chars[1] = '1';
chars[3] = '5';
int y = chars[1] + chars[3];
std::cout << "y = " << y << '\n';
// prints y = 102, because y is an int and '1' is 49 and '5' is 53 (in ASCII)
// later
for (size_t i = 0; i < sizeof(chars); ++i) {
std::cout << "chars[" << i << "] = " << chars[i] << '\n';
}
Remember:
char one = 1;
char asciiCharOne = '1';
are not the same. one has an integer value of 1, while asciiCharOne has an integer value of 49.
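The distinction can be checked at compile time, and the usual way to get from a digit character to its number is to subtract '0'. This is a minimal sketch; digit_value is a hypothetical helper, and the value 49 assumes ASCII.

```cpp
// Hypothetical helper: numeric value of a decimal digit character.
// The language guarantees that '0'..'9' are contiguous, so the subtraction
// works in any execution character set.
constexpr int digit_value(char c) {
    return c - '0';
}

constexpr char one = 1;             // integer value 1
constexpr char asciiCharOne = '1';  // integer value 49, assuming ASCII

static_assert(digit_value(asciiCharOne) == 1, "'1' converts to the number 1");
static_assert(digit_value('7') == 7, "'7' converts to the number 7");
static_assert(one != asciiCharOne, "1 and '1' are different values");
```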
Lastly: If you are really looking to store integer numeric values rather than their character representations, you may want to look at the C++11 fixed-width integer types in <cstdint>. For an 8-bit unsigned value use uint8_t; for an 8-bit signed value, int8_t.
Running off the end of a character array because it has no terminating \0 means accessing memory that does not belong to the array. That produces undefined behavior. Often that looks like random characters, but that's a rather benign symptom; some are worse.
As for not including it because you don't need it, sure. There's nothing magic that says that an array of char has to have a terminating \0.
To me it looks like you use the array not for strings, but as an array of numbers, so yes it is ok not to use '\0' in the array.
Since you are using it to store numbers, consider using the uint8_t or int8_t types from stdint.h, which are typically typedefs for unsigned char and signed char. It is clearer this way that the array is used as an array of numbers, and not as a string.
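One caveat when printing such values: because uint8_t is usually an alias for unsigned char, streaming it directly prints a character rather than a number. The sketch below illustrates this under that assumption (and ASCII for the expected character); the function names streamed_raw and streamed_as_number are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value directly. Where std::uint8_t is an alias for
// unsigned char (the usual case), this writes a character, not a number.
std::string streamed_raw(std::uint8_t v) {
    std::ostringstream out;
    out << v;
    return out.str();
}

// Converts to int first, so the numeric value is written.
std::string streamed_as_number(std::uint8_t v) {
    std::ostringstream out;
    out << static_cast<int>(v);
    return out.str();
}
```

So for a value of 65, streamed_raw would typically give A while streamed_as_number gives 65.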
cout << chars[3] << " is " << y; is not undefined behaviour because you access the element at position 3 from the array, that element is inside the array and is a char, so everything is fine.
EDIT:
Also, I know it is not in your question, but since we are here: using char instead of int for numbers can be deceiving. On many architectures it does not improve performance and may even slow things down. This is mainly because of the way memory is addressed and because the processor works with 4-byte or 8-byte operands anyway. The only gain is storage size, so use char for data stored on disk, and unless you are working with really huge arrays or with limited RAM, use int in memory.