In C++, if I do:
char myArray[] = {'1','2','3','4','5','6','7','8','9'};
Does that allocate 10 spaces? The last element being '/0'?
What about:
char myArray[9] = {'1','2','3','4','5','6','7','8','9'};
Did I allocate only 9 spaces in this case? Is this bad?
And, finally, what happens when I do:
char myArray[10] = {'1','2','3','4','5','6','7','8','9','/0'};
char myArray[] = {'1','2','3','4','5','6','7','8','9'};
Does that allocate 10 spaces? The last element being '/0'?
No. 9.
char myArray[9] = {'1','2','3','4','5','6','7','8','9'};
Did I allocate only 9 spaces in this case?
Yes.
Is this bad?
No.
and finally what happens when I do
char myArray[10] = {'1','2','3','4','5','6','7','8','9','/0'};
Assuming you meant '\0', exactly what it looks like.
There's no magic in any of these cases — you get precisely what you're asking for.
Automatic null-termination is something that comes into play with string literals:
char myArray1[10] = "123456789";
char myArray2[9] = "123456789"; // won't compile - wrong size
char myArray3[] = "123456789"; // still 10 elements - includes null terminator
No, you'll only get the trailing NUL when using a string literal, i.e.:
// Array of 10 bytes
char myArray[] = "123456789";
// same as:
char myArray[] = {'1','2','3','4','5','6','7','8','9','\0'};
char myArray[] = {'1','2','3','4','5','6','7','8','9'};
This only allocates 9 elements.
char myArray[9] = {'1','2','3','4','5','6','7','8','9'};
Yes, this line also allocates 9 elements.
char myArray[10] = {'1','2','3','4','5','6','7','8','9','/0'};
The last one should be '\0' instead of '/0'.
What you are thinking about should be
char myArray[] = "123456789";
which allocates 10 characters (1 for the trailing '\0' at the end of the string literal)
char arrays don't behave differently than any other arrays when you use list-initialization. Would you expect
int x[] = {1,2};
to magically append a 0 as the last element and make x have 3 elements?
If you provide fewer initializers than the array has elements, the remaining elements are value-initialized (zero for char), so
char myArray[10] = {'1','2','3','4','5','6','7','8','9'};
would be null-terminated, but
char myArray[9] = {'1','2','3','4','5','6','7','8','9'};
isn't.
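To check the sizes discussed above without running anything, a small sketch with static_assert can help. The array names and the array_size helper are made up for illustration; they do not come from the question.

```cpp
#include <cstddef>

// Brace-initialized: exactly the listed elements, no terminator is added.
constexpr char braces[] = {'1','2','3','4','5','6','7','8','9'};
// String literal: the implicit '\0' is part of the array.
constexpr char literal[] = "123456789";
// Explicit size larger than the initializer list: the rest is zero-filled.
constexpr char padded[10] = {'1','2','3','4','5','6','7','8','9'};

static_assert(sizeof(braces) == 9, "no terminator appended");
static_assert(sizeof(literal) == 10, "literal includes the terminator");
static_assert(padded[9] == '\0', "missing elements are zero");

// Hypothetical helper: number of elements in any built-in array.
template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) { return N; }
```

If any of these assumptions were wrong, the file would simply fail to compile.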
Related
I would want to initialize char array during compilation time with least amount of manual work.
Is there a working shorthand format for this
char arr[5] = {0x4, 'a', 's', 'd', 'c'};
such as
char arr[5] = {0x4, "asdc"};
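You could integrate the char into the string with an escape sequence. Note that a hex escape consumes every hex digit that follows it, so "\x04a" would be read as a single character; ending the literal right after the escape avoids that:
char arr[6] = { "\x04" "asdc" };
edit: corrected the wrong length of the array.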
No that's not possible. But you could do
char arr[] = "\04asdc";
The problem with this is that it would not be exactly like the original array you show, since it would include the string terminator and therefore have six elements.
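One thing worth checking when mixing escapes and letters is how far each escape extends. The sketch below (array names are invented for illustration) compares a hex escape followed directly by 'a', the same escape split off into its own literal, and the octal form.

```cpp
#include <cstddef>

// "\x04a" is ONE escape (value 0x4a) because 'a' is a hex digit.
constexpr char greedy[] = "\x04asdc";
// Ending the literal ends the escape; adjacent literals are concatenated.
constexpr char split_hex[] = "\x04" "asdc";
// Octal escapes take at most three digits, and 'a' is not an octal digit.
constexpr char octal[] = "\04asdc";

static_assert(sizeof(greedy) == 5, "0x4a, 's', 'd', 'c', terminator");
static_assert(greedy[0] == 0x4a, "the 'a' was absorbed into the escape");
static_assert(sizeof(split_hex) == 6, "4, 'a', 's', 'd', 'c', terminator");
static_assert(split_hex[0] == 4 && split_hex[1] == 'a', "escape ended early");
static_assert(sizeof(octal) == 6 && octal[0] == 4, "same as split_hex");
```

Either the split literal or the octal escape gives the intended bytes, at the cost of one extra element for the terminator.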
I have two char arrays. When I concatenate them using the strcat function, the length of my string "a" drops from 9 to 6.
I also lost part of string "a", and string b changed too (see the output). Why is this happening?
Here is what I have done:
#include <bits/stdc++.h>
using namespace std;
int main() {
char a[]="roomies!!";
char b[]="hey kammo DJ ";
char *c;
c=new char[50];
cout<<"before:-\n";
cout<<"len of a is "<<strlen(a)<<'\n';
cout<<"len of b is "<<strlen(b)<<'\n';
cout<<"len of c is "<<strlen(c)<<'\n';
cout<<"string a is = "<<a<<'\n';
cout<<"string b is = "<<b<<'\n';
cout<<"string c is = "<<c<<'\n';
c=strcat(b,a);
cout<<"\nafter:-\n";
cout<<"len of a is "<<strlen(a)<<'\n';
cout<<"len of b is "<<strlen(b)<<'\n';
cout<<"len of c is "<<strlen(c)<<'\n';
cout<<"string a is = "<<a<<'\n';
cout<<"string b is = "<<b<<'\n';
cout<<"string c is = "<<c<<'\n';
return 0;
}
output:-
before:-
len of a is 9
len of b is 13
len of c is 3
string a is = roomies!!
string b is = hey kammo DJ
string c is = =
after:-
len of a is 6
len of b is 22
len of c is 22
string a is = mies!!
string b is = hey kammo DJ roomies!!
string c is = hey kammo DJ roomies!!
According to strcat spec:
"The behavior is undefined if the destination array is not large
enough for the contents of both src and dest and the terminating null
character."
Your destination array is "b", which is obviously not large enough to store the contents of both "a" and "b", so you got undefined behavior, which here resulted in string "a" being modified.
strcat() appends source string to the destination string and returns the destination string.
char * strcat ( char * destination, const char * source );
So your statement
c = strcat(b,a)
makes c point to the same array as b, which is why b and c show the same contents.
EDIT :
"a" changed because you're overflowing the "b" array. Since it's C++, you can just use std::string instead of a character array.
std::string a = "hi" ;
std::string b = "this is concat" ;
std::string c = a + b ;
strcat on C++ reference
Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first
character of source, and a null-character is included at the end of
the new string formed by the concatenation of both in destination.
You may use std::string for desired results.
Function strcat has the signature char *strcat( char *dest, const char *src ) and appends the content of string src at the end of the string where dest points to, i.e. it alters the content of the memory to which dest points. This requires that the memory to which dest points is large enough to hold both strings src and dest.
Hence, your call strcat(b,a) actually yields undefined behaviour, as the memory block represented by b is capable of holding 14 bytes (i.e. the length of "hey kammo DJ "+ 1), but not for any additional string.
So you'd rather write something like:
strcpy (c,b);
strcat (c,a);
or:
snprintf(c, 50, "%s%s", b, a);
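If you want the size check to be explicit rather than relying on the destination being big enough, something along these lines could work. concat_checked and concat are hypothetical helper names, and memcpy is used here in place of strcpy/strcat because the lengths are already known.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: concatenate two C strings into dest (capacity bytes).
// Returns false, leaving dest as an empty string, if the result would not fit.
bool concat_checked(char* dest, std::size_t capacity,
                    const char* first, const char* second) {
    const std::size_t n1 = std::strlen(first);
    const std::size_t n2 = std::strlen(second);
    if (capacity == 0) return false;
    if (n1 + n2 + 1 > capacity) {   // +1 for the terminating '\0'
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, first, n1);
    std::memcpy(dest + n1, second, n2 + 1);  // also copies second's '\0'
    return true;
}

// The std::string equivalent needs no size bookkeeping at all.
std::string concat(const std::string& first, const std::string& second) {
    return first + second;
}
```

With the arrays from the question, a 50-byte destination succeeds, while a 14-byte one (the size of b) is rejected instead of being overflowed.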
I have a fixed length character array I want to assign to a string. The problem comes if the character array is full, the assign fails. I thought of using the assign where you can supply n however that ignores \0s. For example:
std::string str;
char test1[4] = {'T', 'e', 's', 't'};
str.assign(test1); // BAD "Test2" (or some random extra characters)
str.assign(test1, 4); // GOOD "Test"
size_t len = strlen(test1); // BAD 5
char test2[4] = {'T', 'e', '\0', 't'};
str.assign(test2); // GOOD "Te"
str.assign(test2, 4); // BAD "Tet"
size_t len = strlen(test2); // GOOD 2
How can I assign a fixed length character array to a string correctly for both cases?
Use the "pair of iterators" form of assign.
str.assign(test1, std::find(test1, test1 + 4, '\0'));
Character buffers in C++ are either-or: either they are null terminated or they are not (and fixed-length). Mixing them in the way you do is thus not recommended. If you absolutely need this, there seems to be no alternative to manual copying until either the maximum length or a null terminator is reached.
for (char const* i = test1; i != test1 + length and *i != '\0'; ++i)
str += *i;
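Both suggestions above (std::find plus the iterator form, or a manual loop) can be wrapped so the buffer size is deduced instead of typed by hand. This is only a sketch; from_fixed is a made-up name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: copy a fixed-size char buffer into a std::string,
// stopping at the first '\0' or at the end of the buffer, whichever is first.
template <std::size_t N>
std::string from_fixed(const char (&buffer)[N]) {
    const char* end = std::find(buffer, buffer + N, '\0');
    return std::string(buffer, end);
}
```

Called on test1 from the question it should give "Test", and on test2 it should give "Te", without reading past either array.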
You want both NULL termination and fixed length? This is highly unusual and not recommended. You'll have to write your own function and push_back each individual character.
For the first case, when you do str.assign(test1) and str.assign(test2), you have to have '\0' in your array, otherwise this is not a "char*" string and you can't assign it to std::string like this.
saw your serialization comment -- use std::vector<char>, std::array<char,4>, or just a 4 char array or container.
Your second 'bad' example - the one which prints out "Tet" - actually does work, but you have to be careful about how you check it:
str.assign(test2, 4); // BAD "Tet"
cout << "\"" << str << "\"" << endl;
does copy exactly four characters. If you run it through octal dump(od) on Linux say, using my.exe | od -c you'd get:
0000000 " T e \0 t " \n
0000007
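If od is not at hand, the same check can be done inside the program by making the embedded null visible. show_nuls below is a hypothetical helper written for this illustration.

```cpp
#include <string>

// Hypothetical helper: return a copy of s in which every '\0' byte is
// replaced by the two visible characters backslash and zero.
std::string show_nuls(const std::string& s) {
    std::string out;
    for (char ch : s) {
        if (ch == '\0') {
            out += "\\0";
        } else {
            out += ch;
        }
    }
    return out;
}
```

A string built from all four bytes of test2 keeps size() == 4; only the terminal display hides the null.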
I am a student learning C++, and I am trying to understand how null-terminated character arrays work. Suppose I define a char array like so:
char* str1 = "hello world";
As expected, strlen(str1) is equal to 11, and it is null-terminated.
Where does C++ put the null terminator, if all 11 elements of the above char array are filled with the characters "hello world"? Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'? CPlusPlus.com seems to suggest that one of the 11 would need to be '\0', unless it is indeed allocating 12.
Suppose I do the following:
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
// Copy the first one to the second one
strncpy( str2, str1, strlen(str1) );
// Output the second one
cout << "Str2: " << str2 << endl;
This outputs Str2: hello worldatcomY╗°g♠↕, which I assume is C++ reading the memory at the location pointed to by the pointer char* str2 until it encounters what it interprets to be a null character.
However, if I then do this:
// Null-terminate the second one
str2[strlen(str1)] = '\0';
// Output the second one again
cout << "Terminated Str2: " << str2 << endl;
It outputs Terminated Str2: hello world as expected.
But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?
Running this code does not seem to cause any compiler warnings or run-time errors. Is this safe to do in practice? Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?
In the case of a string literal the compiler is actually reserving an extra char element for the \0 element.
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
This is a common mistake new C programmers make. When allocating the storage for a char* you need to allocate the number of characters + 1 more to store the \0. Not allocating the extra storage here means this line is also illegal
// Null-terminate the second one
str2[strlen(str1)] = '\0';
Here you're actually writing past the end of the memory you allocated. When allocating X elements the last legal byte you can access is the memory address offset by X - 1. Writing to the X element causes undefined behavior. It will often work but is a ticking time bomb.
The proper way to write this is as follows
size_t size = strlen(str1) + sizeof(char);
char* str2 = (char*) malloc(size);
strncpy( str2, str1, size);
// Output the second one
cout << "Str2: " << str2 << endl;
In this example an explicit str2[size - 1] = '\0' isn't needed. When the source is shorter than the count, strncpy pads the rest of the destination with null characters. Here str1 has only size - 1 characters before its terminator, so the final element of str2 is filled with \0.
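The same idea can be packaged as a small function. This is a sketch under the assumption that the caller frees the result; duplicate is a made-up name, and memcpy is swapped in for strncpy since the exact size is already known.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: heap-allocate a copy of a C string.
// The caller must release the result with std::free.
char* duplicate(const char* source) {
    const std::size_t size = std::strlen(source) + 1;  // +1 for '\0'
    char* copy = static_cast<char*>(std::malloc(size));
    if (copy != nullptr) {
        std::memcpy(copy, source, size);  // copies the terminator too
    }
    return copy;
}
```

For "hello world" this requests 12 bytes, so index 11 is inside the allocation and holds the terminator.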
Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'?
Yes.
But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?
Yes.
Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?
Yes, because the second form is not long enough to copy the string into.
Running this code does not seem to cause any compiler warnings or run-time errors.
Detecting this in all but the simplest cases is a very difficult problem. So the compiler authors simply don't bother.
This sort of complexity is exactly why you should be using std::string rather than raw C-style strings if you are writing C++. It's as simple as this:
std::string str1 = "hello world";
std::string str2 = str1;
The literal "hello world" is a char array that looks like:
{ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' }
So, yes, the literal is 12 chars in size.
Also, malloc( strlen(str1) ) is allocating memory for 1 less byte than is needed, since strlen returns the length of the string, not including the NUL terminator. Writing to str2[strlen(str1)] is writing 1 byte past the amount of memory that you've allocated.
Your compiler won't tell you that, but if you run your program through valgrind or a similar program available on your system it'll tell you if you're accessing memory you shouldn't be.
I think you are confused by the return value of strlen. It returns the length of the string, and it should not be confused with the size of the array that holds the string. Consider this example :
const char* str = "Hello\0 world";
I added a null character in the middle of the string, which is perfectly valid. Here the array will have a length of 13 (12 characters + the final null character), but strlen(str) will return 5, because there are 5 characters before the first null character. strlen just counts the characters until a null character is found.
So if I use your code :
const char* str1 = "Hello\0 world";
char* str2 = (char*) malloc(strlen(str1)); // strlen(str1) will return 5
strncpy(str2, str1, strlen(str1));
cout << "Str2: " << str2 << endl;
The str2 array will have a length of 5, and won't be terminated by a null character (because strlen doesn't count it). Is this what you expected?
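The difference between the array size and what strlen reports can be seen side by side in a short sketch. It stores the literal in an array (named text here purely for illustration) so that sizeof sees the whole thing rather than a pointer.

```cpp
#include <cstddef>
#include <cstring>

// The array that stores the literal, so sizeof sees all of it.
constexpr char text[] = "Hello\0 world";

static_assert(sizeof(text) == 13, "12 characters plus the final terminator");

// Length as the C string functions see it: characters before the first '\0'.
inline std::size_t visible_length() { return std::strlen(text); }
```

Here sizeof reports 13 while visible_length() stops at the embedded null and returns 5.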
For a standard C string, the array that stores the string is always one character longer than the length of the string in characters. So your "hello world" string has a string length of 11 but requires a backing array with 12 entries.
The reason for this is simply the way those strings are read. The functions handling them read the characters of the string one by one until they find the termination character '\0' and stop at that point. If this character is missing, those functions just keep reading memory until they either hit a protected memory area, which causes the host operating system to kill your application, or until they happen to find a termination character.
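A hand-written version of such a function makes the point concrete. This is only a sketch of the general idea, not how any particular library implements strlen; my_strlen is a made-up name.

```cpp
#include <cstddef>

// Sketch of what a strlen-like function does: walk forward until '\0'.
// It has no way to know where the underlying array ends.
std::size_t my_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

Nothing in the loop refers to the size of the array, which is why a missing terminator sends it off the end.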
Also, if you create a character array of length 11 and write the string "hello world" into it, you will run into serious problems, because the string needs at least 12 characters. The byte that follows the array in memory gets overwritten, resulting in unpredictable side effects.
Also, since you are working with C++, you might want to look into std::string. This class provides much better handling of strings.
I think what you need to know is that the characters of a string occupy indices 0 through length-1, and the terminator ('\0') sits at index length.
In your case:
str1[0] == 'h';
str1[10] == 'd';
str1[11] == '\0';
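This is why str2[strlen(str1)] = '\0'; targets the correct index for the terminator, but only if str2 was allocated with room for it (strlen(str1) + 1 bytes).
The output after strncpy is wrong because it copies only 11 elements (0..10), so you need to write the terminator yourself (str2[11] = '\0').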
String manipulation problem
http://www.ideone.com/qyTkL
In the above program (from C++ Primer, Third Edition, by Stanley B. Lippman and Josée Lajoie, Exercise 3.14), the character array is allocated with length len+1:
char *pc2 = new char[ len + 1];
http://www.ideone.com/pGa6c
However, in this program I allocated it with length len:
char *pc2 = new char[ len ];
Why does the new string need to be one character longer when we get the same result? Please explain.
Note that the programs shown here are slightly altered; they are not exactly the same as in the book.
To store a string of length n in C, you need n+1 chars. This is because a string in C is simply an array of chars terminated by the null character \0. Thus, the memory that stores the string "hello" looks like
'h' 'e' 'l' 'l' 'o' '\0'
and consists of 6 chars even though the word hello is only 5 letters long.
The inconsistency you're seeing could be a semantic one; some would say that length of the word hello is len = 5, so we need to allocate len+1 chars, while some would say that since hello requires 6 chars we should say its length (as a C string) is len=6.
Note, by the way, that the C way of storing strings is not the only possible one. For example, one could store a string as an integer (giving the string's length) followed by characters. (I believe this is what Pascal does?). If one doesn't use a length field such as this, one needs another way to know when the string stops. The C way is that the string stops whenever a null character is reached.
To get a feel for how this works, you might want to try the following:
char* string = "hello, world!";
printf("%s\n", string);
char* string2 = "hello\0, world!";
printf("%s\n", string2);
(The assignment char* string = "foo"; is just a shorthand way of creating an array with 4 elements, and giving the first the value 'f', the second 'o', the third 'o', and the fourth '\0').
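For contrast with the length-prefixed approach mentioned earlier in this answer, here is a rough sketch of the two conventions next to each other. CountedText and the three function names are invented for illustration and do not correspond to any standard type.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical length-prefixed view: the length is stored, not discovered.
struct CountedText {
    std::size_t length;
    const char* data;   // not required to be null-terminated
};

// Length is read directly from the field.
std::size_t counted_length(const CountedText& text) { return text.length; }

// Length is found by scanning for the first '\0'.
std::size_t terminated_length(const char* text) { return std::strlen(text); }

// Copy out all stored bytes, including any embedded '\0'.
std::string counted_to_string(const CountedText& text) {
    return std::string(text.data, text.length);
}
```

With the bytes of "hello\0, world!" the counted view keeps all 14 characters, while the null-terminated view ends after "hello".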
It's a convention that the string is terminated by an extra null character so whoever allocates storage has to allocate len + 1 characters.
It does cause a problem. Sometimes, though, the allocator rounds the request up to a larger block, so the extra byte lands in padding and the problem is hidden.