Difference between char* and char[] - c++

I know this is a very basic question. I am confused as to why and how the following are different.
char str[] = "Test";
char *str = "Test";

char str[] = "Test";
is an array of chars, initialized with the contents of "Test", while
char *str = "Test";
is a pointer to the literal (const) string "Test".
The main difference between them is that the first is an array and the other one is a pointer. The array owns its contents, which happen to be a copy of "Test", while the pointer simply refers to the contents of the string (which in this case is immutable).
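A minimal sketch of that ownership difference, assuming C++11 or later. The function names modified_copy and literal_view are made up for illustration and are not part of the question.

```cpp
#include <string>

// Hypothetical helper: edits an array copy of the literal and returns the result.
std::string modified_copy() {
    char str[] = "Test";       // the array owns its own copy of the five chars
    str[0] = 'B';              // fine: we are writing to our own copy
    return std::string(str);   // "Best"
}

// Hypothetical helper: returns a pointer to the literal itself (read-only).
const char* literal_view() {
    const char* str = "Test";  // points at the literal; no copy is made
    // str[0] = 'B';           // would not compile: the pointee is const
    return str;                // safe: string literals have static storage duration
}
```

The array version can be edited freely because the edit never touches the literal; the pointer version can only read.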

The difference is the amount of stack memory used.
For example, when programming for microcontrollers, where very little memory is allocated for the stack, this makes a big difference.
char a[] = "string"; // the compiler puts {'s','t','r','i','n','g', 0} onto STACK
char *a = "string"; // the compiler puts just the pointer onto STACK
// and {'s','t','r','i','n','g',0} in static memory area.

A pointer can be re-pointed to something else:
char foo[] = "foo";
char bar[] = "bar";
char *str = foo; // str points to 'f'
str = bar; // Now str points to 'b'
++str; // Now str points to 'a'
The last example of incrementing the pointer shows that you can easily iterate over the contents of a string, one element at a time.
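As a rough sketch of that kind of iteration, the loop below walks a pointer along a C-style string until it reaches the null terminator. The name count_chars is hypothetical; it does the same job as the standard strlen.

```cpp
#include <cstddef>

// Hypothetical helper: counts characters by walking a pointer to the terminator.
std::size_t count_chars(const char* str) {
    std::size_t n = 0;
    while (*str != '\0') {  // stop at the null terminator
        ++str;              // advance to the next character
        ++n;
    }
    return n;
}
```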

One is a pointer and one is an array. They are different types of data.
#include <iostream>
using namespace std;
int main ()
{
char str1[] = "Test";
const char *str2 = "Test"; // const is required since C++11
cout << "sizeof array " << sizeof(str1) << endl;
cout << "sizeof pointer " << sizeof(str2) << endl;
}
Output (on a platform with 4-byte pointers; a 64-bit build typically prints 8 for the pointer):
sizeof array 5
sizeof pointer 4
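The same distinction shows up when the array is passed to a function. The sketch below uses two hypothetical helpers, array_bytes and pointer_bytes, to contrast an array passed by reference (size preserved) with one that has decayed to a pointer (size lost).

```cpp
#include <cstddef>

// Hypothetical helper: the array is passed by reference, so N is deduced
// from its type and no decay to a pointer happens.
template <std::size_t N>
std::size_t array_bytes(const char (&)[N]) {
    return N;
}

// Hypothetical helper: here the argument has already decayed to a pointer,
// so sizeof reports the pointer's size, not the array's.
std::size_t pointer_bytes(const char* p) {
    return sizeof(p);
}
```

For char str1[] = "Test";, array_bytes(str1) gives 5, while pointer_bytes(str1) gives whatever a pointer occupies on the target platform.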

The first
char str[] = "Test";
is an array of five characters, initialized with the value "Test" plus the null terminator '\0'.
The second
char *str = "Test";
is a pointer to the memory location of the literal string "Test".

Starting from C++11, the second declaration is invalid (the implicit conversion from a string literal to char* was deprecated earlier and removed in C++11) and must be written:
const char *str = "Test";
The relevant section of the standard is Appendix C section 1.1:
Change: String literals made const
The type of a string literal is changed from “array of char” to “array
of const char.” The type of a char16_t string literal is changed from
“array of some-integer-type” to “array of const char16_t.” The type of
a char32_t string literal is changed from “array of some-integer-type”
to “array of const char32_t.” The type of a wide string literal is
changed from “array of wchar_t” to “array of const wchar_t.”
Rationale: This avoids calling an inappropriate overloaded function,
which might expect to be able to modify its argument.
Effect on original feature: Change to semantics of well-defined feature.
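The rationale about overloads can be illustrated with a small sketch, assuming C++11 or later. The overload pair named which is hypothetical: one version is allowed to modify its argument, the other is not.

```cpp
#include <string>

// Hypothetical overload pair: one may modify its argument, one may not.
std::string which(char*)       { return "mutable"; }
std::string which(const char*) { return "const"; }
```

Because a string literal is an array of const char, which("Test") selects the const overload, while a writable array such as char buf[] = "Test"; selects the mutable one.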

"Test" is an array of five characters (4 letters, plus the null terminator).
char str1[] = "Test"; creates that array of 5 characters, and names it str1. You can modify the contents of that array as much as you like, e.g. str1[0] = 'B';
char *str2 = "Test"; creates that array of 5 characters, doesn't name it, and also creates a pointer named str2. It sets str2 to point at that array of 5 characters. The unnamed array is a string literal, so you must not modify it through the pointer: str2[0] = 'B'; or *str2 = 'B'; is undefined behavior (and since C++11 the declaration only compiles as const char *str2). You can, however, reassign that pointer to point someplace else, e.g. str2 = "other";.
The array holds its own copy of the text in quotes. The pointer merely points at the literal. You can do a lot of similar things with each, but they are different:
char str_arr[] = "Test";
const char *strp = "Test";
// modify
str_arr[0] = 'B'; // ok, str_arr is now "Best"
// strp[0] = 'W'; // error: strp points at a read-only string literal
// *strp = 'L'; // error for the same reason
// point to another string
char another[] = "another string";
str_arr = another; // compilation error. you cannot reassign an array
strp = another; // ok, strp now points at "another string"
// size
std::cout << sizeof(str_arr) << '\n'; // prints 5, because str_arr is five bytes
std::cout << sizeof(strp) << '\n'; // prints the size of a pointer, e.g. 4 or 8
For that last part, note that sizeof(strp) varies by architecture: typically 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine.

Let's take a look at the following ways to declare a string:
char name0 = 'abcd'; // not a string: a multi-character literal (an int with an implementation-defined value) truncated to one char
cout << sizeof(name0) << endl; // prints 1
char name1[]="abcdefghijklmnopqrstuvwxyz"; // can represent very long strings
cout << sizeof(name1) << endl; // prints 27: as a local variable, the whole array uses stack memory
const char* name2 = "abcdefghijklmnopqrstuvwxyz"; // can represent very long strings
cout << sizeof(name2) << endl; // prints only the pointer size (8 on a typical 64-bit platform)
Declaring the string as const char* variable_name uses the least stack memory, because the characters stay in static storage. The trade-off is that the string is read-only; if you need to modify it, use the array form.

Related

Does putting chars used two times into a variable save space?

Related to "How the compiler manages the same char?": does creating a variable that contains characters used at least twice in the code save space?
Example:
wstrFile.find_last_of(L"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
wstrFile.find_last_of(L"aefgh");
For the compiler, is this the same as, worse than, or better than
std::wstring wstrTemp = L"aefgh";
wstrFile.find_last_of(L"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
wstrFile.find_last_of(wstrTemp);
In all cases, I will use "wstrTemp" further on, so its creation is needed.
NOTE TO SOLUTION:
Thanks to both answers.
To complete the answer to my question: putting chars used at least twice into a variable will save space, but it is painful and not useful to add them char by char in the console, which was my main aim before this question.
If you use the same constant string literal twice in the same compilation unit
static const char greet[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
std::cout << "hello";
std::cout << "hello";
const char* t = "hello";
std::cout << t;
the compiler is able to perform some space optimizations (although it is not guaranteed it will do it). Some compilers can do this across the entire binary. In Visual Studio it is an optimization option, "String Pooling".
However, it only applies to constants.
static char greet[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
std::cout << "hello";
These two cannot overlap because one is mutable and the other is not. Consider:
const char greet[] = "hello";
char greeting[] = "hello";
for (size_t i = 0; i < 2; ++i) {
std::cout << greet << greeting << "\n";
greeting[0] = 'j';
}
This should output "hellohello" and then "hellojello". If the strings were pooled the second iteration would errantly print "jellojello".
So in general you don't have to worry about manually deduping strings, the compiler will usually do it for you in optimized builds.
This can easily be tested with the following code:
#include <iostream>
using namespace std;
int main() {
const char *a = "hello";
const char *b = "hello";
const char *c = "hello";
char d[] = "hello";
std::cout << "&a[0] = " << (void*)&a[0] << "\n";
std::cout << "&b[0] = " << (void*)&b[0] << "\n";
std::cout << "&c[0] = " << (void*)&c[0] << "\n";
std::cout << "&d[0] = " << (void*)&d[0] << "\n";
return 0;
}
http://ideone.com/hNf29c
In the case of d we get a different address - d is mutable, so a copy has to be stored, and the address is of the stack location the literal is copied to.
In your very specific case, where you have stated that you "have to" create the std::wstring wstrTemp, the second choice will be faster, but not by much.
std::wstring::find_last_of accepts const wchar_t* strings. So if you call std::wstring::find_last_of with a const wchar_t*, the only thing it has to do is measure the length (as wcslen would) and then do its work.
Whereas if you pass a const std::wstring& object (which you are doing), std::wstring::find_last_of already knows the buffer location and the length, so it can start its work immediately. It doesn't have to create another std::wstring object.
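A small sketch of the two call styles being compared. The helper names last_match_literal and last_match_wstring are hypothetical; both rely only on the standard find_last_of overloads and return the same index.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: passes the character set as a string literal,
// so the library has to measure its length first.
std::size_t last_match_literal(const std::wstring& text) {
    return text.find_last_of(L"aefgh");
}

// Hypothetical helper: passes the character set as a std::wstring,
// which already carries its length.
std::size_t last_match_wstring(const std::wstring& text, const std::wstring& chars) {
    return text.find_last_of(chars);
}
```

Either way the result is identical; the difference is only in the small amount of work done before the search starts.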
As for the other question, if you use a string literal like "hello" in multiple places, the compiler will most of the time (if not always) store "hello" only once. But if the literals differ, even by a single char as in "hello2", the compiler can no longer do that, because of the required null terminator.
If you have the following:
const char *ptr = "hello";
char temp[] = "hello";
First, both variables obviously have different memory addresses. But identical string literals will, most of the time, share the same address.
To create "temp", the compiler first allocates enough space on the stack, and then copies the string literal "hello" from its static location into that stack address. That's the only reason why those variables do not have the same address. But the string literals do.
Based on kfsone's response, here's how it works:
const char *a = "hello";
const char *b = "hello";
const char *c = "hello";
//char d[] = "hello";
// This is actually how d looks like in assembly.
char d[sizeof("hello")]; //SUB ESP, 6
// d is constructed from the string literal "hello"
memcpy(d, "hello", sizeof("hello"));
// the compiler does not call memcpy. most of the time, it will use
// the assembly MOVS instruction if they are big string literals, or just
// MOV DWORD PTR ES:[EDI], DWORD PTR DS:[ESI]
// MOV WORD PTR ES:[EDI+4], WORD PTR DS:[ESI+4]
// if they are small.
// Where EDI is the stack address of "d" and ESI the address of the string literal "hello"
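A compilable version of the same idea, offered only as a sketch: the hypothetical helper manual_copy_matches builds the array by hand with memcpy and checks that it matches a normally initialized array while living in separate storage.

```cpp
#include <cstring>

// Hypothetical helper: builds the array by hand, as the pseudocode above does,
// and compares it with a normally initialized array.
bool manual_copy_matches() {
    char d[sizeof("hello")];                    // 6 bytes: 5 letters + '\0'
    std::memcpy(d, "hello", sizeof("hello"));   // copy the literal, terminator included
    char e[] = "hello";                         // what the compiler does for us
    // Same contents, but two distinct objects with distinct addresses.
    return std::strcmp(d, e) == 0 && &d[0] != &e[0];
}
```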

pointer arithmetic on arrays

When I run the code below my output is not what I expect.
My way of understanding it is that ptr points to the address of the first element of the Str array. I think ptr + 5 should lead to the + 5th element which is f. So the output should only display f and not both fg.
Why is it showing fg? Does it have to do with how cout displays an array?
#include <iostream>
using namespace std;
int main()
{
char *ptr;
char Str[] = "abcdefg";
ptr = Str;
ptr += 5;
cout << ptr;
return 0;
}
Expected output: f
Actual output: fg
When you declare:
char Str[] = "abcdefg";
The string abcdefg is stored implicitly with an extra character \0 which marks the end of the string.
So, when you cout a char*, the output is every character starting where the char* points and continuing through consecutive memory locations until a \0 character is encountered. Since the \0 character comes right after g in your example, two characters are printed.
If you only want to print the current character, do this:
cout << *ptr;
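To make the difference checkable, here is a sketch that streams into a std::ostringstream instead of cout. The helper names print_from and print_one are made up for this illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: streams a char pointer and returns what was printed.
std::string print_from(const char* ptr) {
    std::ostringstream out;
    out << ptr;          // prints every character up to the '\0'
    return out.str();
}

// Hypothetical helper: streams a single char and returns what was printed.
std::string print_one(const char* ptr) {
    std::ostringstream out;
    out << *ptr;         // prints only the character ptr points at
    return out.str();
}
```

With ptr pointing five characters into "abcdefg", print_from(ptr) yields "fg" and print_one(ptr) yields "f".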
Why is it showing fg?
The reason why std::cout << char* prints the string till the end instead of a single char of the string is that std::cout treats a char * as a pointer to the first character of a C-style string and prints it as such (see note 1).
Your array:
char Str[] = "abcdefg";
gets an implicit '\0' at the end and is treated as a C-style string.
Does it have to do with how std::cout displays an array?
This has to do with how std::cout handles C-style strings. To test this, change the array type to int and see the difference: it will print the address held by the pointer instead of the elements.
1. This is because C has no string type; strings are manipulated through pointers of type char* that indicate the beginning, with the character '\0' indicating the end.

std::cout << cstring; prints value of cstring elements, not cstring hex address. Why?

I understand that an array of chars is different to a cstring, due to the inclusion of a suffixing \0 sentinel value in a cstring.
However, I also understand that, in the case of a cstring, an array of chars, or any other type of array, the array identifier in the program is a pointer to the array.
So, below is perfectly valid.
char some_c_string[] = "stringy";
char *stringptr;
stringptr = some_c_string; // assign pointer val to other pointer
What I don't understand is why std::cout automatically assumes I want to output the value of each element in either a cstring, or an array of chars, rather than the hex address. For example:
char some_c_string[] = "stringy"; // got a sentinel val
char charArray[5] = {'H','e','l','l','o'}; // no space for sentinel val \0
char *stringptr;
stringptr = some_c_string;
int intArray[3] = {1, 2, 4};
cout << some_c_string << endl << charArray << endl
<< stringptr << endl << intArray << endl;
Will result in the output:
stringy
Hello
stringy
0xsomehexadd
So for the cstring and the char array, std::cout has given me the value of each element, rather than the hex address like with the int array.
I guess this became standard in C++ for convenience. But can someone please expand on 1) when this became standard, and 2) how std::cout differentiates between char arrays/cstrings and other arrays? I guess it uses sizeof() to see that it is an array of single bytes, and that the value of each array element is an ASCII int value, to identify an array of chars/cstring.
Thanks! :D
There is nothing fancy going on. operator<< has a special overload for const char*, so that you can do std::cout << "Hello World";. It's been like that since day 1 of C++.
For other object pointer types, such as int*, the pointer address is displayed (usually as hex).
If you want to display the address of a char*, simply cast it to void*, i.e.
std::cout << (void*)"Hello World";
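A sketch of both cases, again streaming into a std::ostringstream so the result can be inspected. The helper names address_text and int_pointer_text are hypothetical, and the exact address format is implementation-defined.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: prints the address held by a char pointer
// by selecting the const void* overload of operator<<.
std::string address_text(const char* s) {
    std::ostringstream out;
    out << static_cast<const void*>(s);
    return out.str();
}

// Hypothetical helper: an int* needs no cast; it already resolves to the
// const void* overload, so the address is printed.
std::string int_pointer_text(const int* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}
```

The cast changes only which overload is chosen; the pointer value itself is untouched.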

Trying to reverse a string and getting a bus error

I am trying to reverse a string (but that's not the problem that I have). The problem is trying to change the value of the string array given a certain index. However, every time I try to change the value at the index, I get a bus error. Namely, Bus error: 10. I'm not sure what this means. Also, I tried str[0] = "a" but this also gives me a bus error. Any suggestions to fix this?
#include <iostream>
using namespace std;
void reverse(char* str){
str[0] = 'a';
}
int main(){
char* str = "hello";
reverse(str);
}
Allocate your string as an array on the stack and not as a pointer into a possibly read-only segment of your program.
char str[] = "hello";
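With a writable array in place, the reversal the question was aiming for could look like the sketch below. The name reverse_in_place is hypothetical, and it assumes the argument is a writable, null-terminated string.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical helper: reverses a writable, null-terminated string in place.
void reverse_in_place(char* str) {
    std::size_t len = std::strlen(str);
    for (std::size_t i = 0; i < len / 2; ++i) {
        std::swap(str[i], str[len - 1 - i]);   // swap the ends, moving inward
    }
}
```

Calling it on char str[] = "hello"; leaves "olleh" in the array; calling it on a pointer to a string literal would still be undefined behavior.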
First of all, this line should at least give you a warning:
char* str = "hello";
You are converting a string literal to a non-const char*, which is not allowed since C++11.
To fix your code, use char str[] = "hello"; in main().
When you pass this array to reverse(), it decays to char*. Now, to the question you asked in a comment on the previous answer:
But when I write cout << str << endl;, why does it print out "hello"? Shouldn't it print only the first character of the string since it points to the first element of the array?
It is because the << operator on std::cout is overloaded. If you give it a char* or const char*, it treats the operand as a pointer to (the first character of) a C-style string, and prints the contents of that string:
const char * str= "hello";
cout << str; // prints "hello"
If you give it a char value, it prints that value as a character:
cout << *str; // prints "h"
cout << str[0]; // prints "h"

pointers in c++ debugging

I am currently trying to learn C++ from a book that I got from a friend of mine a couple of days ago. I've seen some code snippets presented as a quiz in the book that I need to solve. So I tried to solve them, but I'm not sure if my assumptions are right.
This is the first one
char* r(char *g){ // can someone explain this line for me? I'm not sure what is it saying
char ch = 'B'; // is the code going to be correct if I changed char ch to char* ch?
return &ch; // since this is &ch, then the previous line should be char* ch, am I right?
}
The second code:
char* a;
a = new char[strlen(b)]; // will this line cause a compiling error just because b is undefined ? since there is no length for b because it's not even there?
strcpy(a,b); // since we're using strcpy() a and b has to be pointers am I right?
I am not asking for the answers; I need someone to tell me whether I am right or wrong, and why, please.
char* r(char *g){ // can someone explain this line for me? I'm not sure what is it saying
Declares a function, r which takes one argument, a pointer g to contain the address of one or more characters.
char ch = 'B';
Declares a variable, ch of type char and assigns it a value 'B'. That is - it will contain a number which is the position in the ASCII chart of the letter B. It's going to contain the number 66, but when you print it out, it will produce the letter 'B'. (see http://www.asciitable.com/)
This variable will likely be on the stack. It could be in a register, but compilers are generally smart and the next line will ensure it is on the stack.
return &ch;
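In this context, & is the address-of operator. In pseudocode:
return address_of(ch);
Since ch is of type char, &ch produces a value which is of type char*. Note that ch is a local variable, so it stops existing when r returns and the caller is left with a dangling pointer.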
char* a;
Declares a variable a with no initial value. This is a bad habit to get into.
a = new char[strlen(b)];
You say that b doesn't exist, but I think it's assumed to be of type char* - a pointer to one or more characters. In C and C++ a "C-String" is an array of 'char' values (characters) terminated by a char of value 0 (not the character '0', which has an ASCII value of 48, but 0, or '\0'). This is called a 'terminating nul' or a 'nul character' or a 'nul byte'.
The string "hello" is actually representable as an array { 'h', 'e', 'l', 'l', 'o', 0 }. Contrast with "hell0", which would be { 'h', 'e', 'l', 'l', '0', 0 };
The function strlen counts the number of characters from the address it is called with until it finds a nul. If b was the address of "hello", strlen would return 5.
new allocates memory for an object, or in this case an array of objects of type char, the number of which is the return value of strlen.
size_t len = strlen(b);
char* a = new char[len];
At this point in the code, recall my explanation about the terminating nul and that strlen returns the number of characters before it finds the 0. To store a C-string you need the number of characters PLUS space for a terminating nul.
If b is the string "A", it consists of one character ('A') but two chars: 'A' and 0. strlen returns the number of characters.
strcpy(a, b);
This will copy the characters pointed to by b to the address at a, including the terminating nul.
The bug here is that you only allocated enough memory for the characters.
char* a = new char[strlen(b) + 1];
strcpy(a, b);
would be correct - otherwise you're going to write past the end of the memory allocated to you and cause corruption.
Again - strlen is always going to return the length - the number of characters, and you're always going to want one more than that, for the nul.
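Wrapped up as a function, the corrected allocation might look like this sketch. The name duplicate is hypothetical, and the caller is assumed to release the result with delete[].

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a heap copy of b; the caller must delete[] it.
char* duplicate(const char* b) {
    std::size_t len = std::strlen(b);   // characters, not counting the nul
    char* a = new char[len + 1];        // + 1 leaves room for the nul
    std::strcpy(a, b);                  // copies the characters and the nul
    return a;
}
```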
--- EDIT ---
Throwing some of this together, live demo here: http://ideone.com/X8HPxP
#include <iostream>
#include <cstring>
int main() {
char a[] = "hello";
std::cout << "a starts out as [" << a << "]\n";
// C/C++ arrays are 0-based, that is:
a[0] = 'H'; // changes a to "Hello"
std::cout << "a is now [" << a << "]\n";
std::cout << "strlen(a) returns " << strlen(a) << "\n";
// But that is based on counting characters until the 0.
a[3] = 0; // one way to write it,
a[3] = '\0'; // some people prefer writing it this way.
std::cout << "a changed to [" << a << "]\n";
std::cout << "strlen(a) is now " << strlen(a) << "\n";
return 0;
}
First Code:
r is the function name; its return type is char*, i.e. a pointer type, and it accepts a parameter g of pointer type char*.
'B' is assigned to the ch variable.
The r function returns the address of the ch variable.
Because ch is a local variable, that address is no longer valid once r returns, so the first code does need a correction.
Second Code:
Yes, it will cause a compilation error at line 2, as b is not declared or defined.
1st Code:
You are defining a function called r which accepts a pointer to a char and returns a pointer to a char.
char* r(char *g){
//stack char variable ch is initialized to B
//changing char ch to char *ch will not compile in C++: a char value ('B') does not convert implicitly to a pointer.
char ch = 'B';
//you are returning the address of ch which as seen above is a stack variable so you are causing undefined behavior. You should avoid this.
return &ch;
}
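Two possible ways to avoid returning the address of a local variable, sketched under the assumption that the caller only needs the character itself. The names r_by_value and r_as_string are hypothetical and not from the book's quiz.

```cpp
#include <string>

// Hypothetical alternative 1: return the char by value, so nothing dangles.
char r_by_value() {
    char ch = 'B';
    return ch;                    // the caller receives a copy
}

// Hypothetical alternative 2: return an object that owns its characters.
std::string r_as_string() {
    return std::string(1, 'B');   // a one-character string "B"
}
```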
2nd Code:
char* a;
// if b is undefined as you state, then the following line will cause a compile error. strlen() calculates the length at runtime, but b must at least be declared first.
a = new char[strlen(b)];
//a is a pointer as you defined it above and points to the heap memory allocated by new
strcpy(a,b);
char* r(char *g){
Here r() is a function which takes a char* as argument and returns a char*.
char ch = 'B';
return &ch;
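Here, ch is a locally defined char and you are returning its address, which is no longer valid after the function returns. This is not good. Better to point at a string literal, which outlives the function (the return type then has to be const char*). Also, a char holds only one character, whereas a char* can point at more than one.
const char* ch = "Thats My string";
return ch; //Notice ch is a pointer. No need to use &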
Second Code:
char* a;
a = new char[strlen(b)];
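If b is undefined, sure, there will be an error. If b is a char* with some value assigned to it, strlen will give you the length of the string b points to. Note that this length excludes the terminating '\0', so the allocation should be strlen(b) + 1 for the strcpy that follows.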
strcpy(a,b); // since we're using strcpy() a and b has to be pointers am I right?
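Yes, you are right! You can use strncpy instead to limit how many characters are copied, but note that it does not add a terminating '\0' when the source is too long.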
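If strncpy is used, one common pattern is to reserve the last byte for the terminator. The sketch below assumes a hypothetical wrapper named bounded_copy; it is one way to do it, not the only one.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies at most dst_size - 1 characters and always terminates.
void bounded_copy(char* dst, std::size_t dst_size, const char* src) {
    if (dst_size == 0) return;               // no room even for the terminator
    std::strncpy(dst, src, dst_size - 1);    // may leave dst unterminated on truncation
    dst[dst_size - 1] = '\0';                // so terminate explicitly
}
```

Copying "hello" into a 4-byte buffer this way leaves "hel" plus the terminator instead of overrunning the buffer.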