char pointer parameter different behaviour - c++

I have the following code:
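#include <cctype>
#include <cstdlib>
#include <cstring>
#include <iostream>
using namespace std;

void uppercase(char *sir)
{
    for (size_t i = 0; i < strlen(sir); i++)
    {
        // cast to unsigned char: toupper is undefined for negative values
        sir[i] = (char)toupper((unsigned char)sir[i]);
    }
}

int main()   // the original used the Visual C++ entry point _tmain(int argc, _TCHAR* argv[])
{
    char lower[] = "u forgot the funny";   // this works
    //char *lower = "u forgot the funny";  // this gives me a runtime error
    uppercase(lower);
    cout << lower << "\n\n";
    system("PAUSE");
    return 0;
}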
I have noticed that it works when I use the char array.
When I try the second method, it generates a runtime error.
I would like to know the reason for this behaviour, please.

You cannot modify string literals; doing so (as in your second case) is undefined behaviour.
char x[] = "foo";
creates a character array containing the characters f,o,o,\0. It's basically a mutable copy of the string.
char *x = "foo";
creates a char pointer that points to the "foo" string literal. The literal may live in read-only memory, in the program's memory image, or in a constant pool. Writing to it is undefined behaviour. Also, note that in C++ the type of a string literal is always const char[N], so assigning it to a char * violates const-correctness (since C++11 this conversion is not allowed at all).

The former creates a character array that can be modified; the latter is a pointer to fixed memory that must not be modified.

Related

Why can we return char* from function?

Here is a piece of C++ code that shows some very peculiar behavior. Can anyone tell me why strB prints the string correctly?
#include <iostream>
using namespace std;

char* strA()
{
    char str[] = "hello word";
    return str;
}

char* strB()
{
    char* str = "hello word";
    return str;
}

int main()
{
    cout<<strA()<<endl;
    cout<<strB()<<endl;
}
Why does strB() work?
A string literal (e.g. "a string literal") has static storage duration. That means its lifetime spans the duration of your program's execution. This can be done because the compiler knows every string literal that you are going to use in your program, hence it can store their data directly into the data section of the compiled executable (example: https://godbolt.org/z/7nErYe)
When you obtain a pointer to it, this pointer can be passed around freely (including being returned from a function) and dereferenced as the object it points to is always alive.
Why doesn't strA() work?
However, initializing an array of char from a string literal copies the content of the string literal. The created array is a different object from the original string literal. If such array is a local variable (i.e. has automatic storage duration), as in your strA(), then it is destroyed after the function returns.
When you return from strA(), since the return type is char* an "array-to-pointer-conversion" is performed, creating a pointer to the first element of the array. However, since the array is destroyed when the function returns, the pointer returned becomes invalid. You should not try to dereference such pointers (and avoid creating them in the first place).
String literals exist for the life of the program.
String literals have static storage duration, and thus exist in memory for the life of the program.
That means cout<<strB()<<endl; is fine, the returned pointer pointing to string literal "hello word" remains valid.
On the other hand, cout<<strA()<<endl; leads to UB. The returned pointer points to the first element of the local array str, which is destroyed when strA() returns, leaving the returned pointer dangling.
BTW: string literals are of type const char[N], so char* str = "hello word"; is ill-formed since C++11. Change it to const char* str = "hello word";, and change the return type of strB() to const char* too.
String literals are not convertible or assignable to non-const CharT*. An explicit cast (e.g. const_cast) must be used if such conversion is wanted. (since C++11)
case 1:
#include <stdio.h>
char *strA() {
char str[] = "hello world";
return str;
}
int main(int argc, char **argv) {
puts(strA());
return 0;
}
The array in char str[] = "hello world"; has automatic storage duration: it is typically placed on the stack when strA is called, and its lifetime ends when the function exits. You can work around this with a continuation. Instead of returning the pointer, strA calls a function you pass in, and that call runs ON TOP of strA's existing stack frame, so the data still exists because strA hasn't returned yet. This does not depend on how a particular system lays out its stack; the language guarantees that str is alive until strA returns:
#include <stdio.h>
void strA(void (*continuation)(char *)) {
char str[] = "hello world";
continuation(str);
}
void myContinuation(char *arg) {
puts(arg);
}
int main(int argc, char **argv) {
strA(myContinuation);
return 0;
}
case 2:
If you use the snippet below, the literal "hello world" is usually stored in protected read-only memory, and trying to modify it causes a segmentation fault on many systems. (This is similar to how main and strA themselves are stored: compiled code is basically just a blob of instructions in memory, in the same way a string is a sequence of characters, but I digress.) Because the literal has static storage duration, it is in memory even if the function is never called, so if you know the address it lives at on a specific system, you can reach it directly. In the snippet below, the program prints the string before the function is even called. This will often work on the same platform with roughly the same code and the same compiler, but it is undefined behavior.
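#include <stdint.h>
#include <stdio.h>

char *strB() {
    char *str = "hello world";
    return str;
}

int main(int argc, char **argv) {
    char *myStr;
    // First run the program with the next line commented out to see the
    // address printed at the end, then put that number in place of 4231168
    // (the value I got on my system) and uncomment it.
    printf("is your string: %s.\n", (char *)4231168);
    myStr = strB();
    // %p would print in hex; printing as an integer gives a decimal to paste above
    printf("str is at: %llu\n", (unsigned long long)(uintptr_t)&myStr[0]);
    return 0;
}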
Alternatively, you can write a strC that returns a struct, which is relatively safe. The structure is created inside the function and returned in full (copied to the caller by value). strC's return value is 81 bytes in size (an arbitrary number I chose for the structure and trust myself to respect).
#include <stdio.h>
typedef struct {
char data[81];
} MY_STRING;
MY_STRING strC() {
MY_STRING str = {"what year is this?"};
return str;
}
int main(int argc, char **argv) {
puts(strC().data);
printf("size of strC's return: %zu.\n", sizeof(strC()));
return 0;
}
tl;dr: the array in strA is likely overwritten as soon as strA returns (for example when puts or printf reuses that stack space for its own frame), whereas the string used in strB exists outside the function. strB effectively returns a pointer to a global constant that is available as soon as the program starts (the string sits in memory just like the program's code does).

Pass string literal as char* via argument

I wrote a function that changes a string; see the following code.
void Test(char* str, char c) {
str[1] = c;
}
int main(){
Test("Hi", '2');
}
I noticed that it causes a runtime error. I know how to prevent the error:
char buff[3] = "Hi";
Test(buff,'2');
but I don't know why the first example causes a runtime error. I guess that if I pass the string directly, it becomes const char. Can anyone explain what exactly happened?
P.S.
What if I use char* str = "hi" and then pass it as the argument?
char* buff = "Hi";
Test(buff,'2');
Like this. Can I modify buff?
Because "Hi" is a string literal, and string literals are read-only: they are not allowed to be modified (the type of a string literal is const char[N]).
Modifying it is undefined behavior.
Regarding your edit: char* str = "hi" is invalid; it should be const char* str = "hi", which is a pointer to const char. Again, modifying the string through it is not allowed.
String literals are not memory you allocated yourself; the compiler typically stores them in read-only memory. So any attempt to modify such a string is undefined behavior, which usually shows up as a runtime error.
Test("Hi", '2');
Here, in the above case, the string "Hi" is stored in read-only memory.
char *buff = "Hi";
Test(buff,'2');
Here too, "Hi" is stored in read-only memory and its starting address is assigned to the character pointer buff, which is the same situation as above. You can avoid such errors by allocating writable memory for the string and passing a pointer to that instead, like:
char buff[3] = "Hi";
Test(buff,'2');
or
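char *buff = (char *)malloc(SIZE);  // SIZE must be at least strlen("Hi") + 1
strcpy(buff, "Hi");
Test(buff,'2');
free(buff);  // release the memory when you are done with it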
Please refer to this link http://www.geeksforgeeks.org/memory-layout-of-c-program/
String constants are often in read-only memory, causing a runtime error when you attempt to modify them.
In your second example, you put the string into a buffer on the stack, so it can be updated without error.
Literal strings are not modifiable. When I compile your code with GCC I get the warning:
testptr.cpp:6: warning: deprecated conversion from string constant to 'char*'
Runtime Error:
char* buff = "Hi"; // buff points to an address in a read-only section (such as .rodata or the code section)
buff[1] = 'x'; // Illegal memory access violation
Compilation Error:
const char* buff = "Hi"; // This is a correct declaration, which will prevent the runtime error above
buff[1] = 'x'; // The compiler will not allow this
All Good:
char buff[] = "Hi"; // buff is an array on the stack or in the data section, both of which are read-write
buff[1] = 'x'; // Works OK
Notes:
In all cases, the string "Hi" is typically placed in a read-only section of the program (often .rodata, sometimes the code section).
In the last example, the contents of that string are copied into the buff array.
In the last example, the buff array is located in the stack if buff is a non-static local variable, and in the data-section of the program otherwise.

Inputting into a char* declared earlier crashes the program while doing that into a 'just-declared' char* doesn't. Why?

This code crashes the program
#include <cstdio>
int main()
{
char *name1;
char *name2 = "Mark";
gets(name1);
puts(name1);
return 0;
}
whereas this doesn't
#include <cstdio>
int main()
{
char *name1 = "Mark";
char *name2;
gets(name2);
puts(name2);
return 0;
}
Why?
I am using MinGW with Code::Blocks IDE.
You are just lucky that one crashes and the other doesn't.
Both of the programs produce undefined behavior.
char *name2;
gets(name2);
You need to make the pointer point to valid memory that is big enough before you can write through it. Here you are writing through an uninitialized pointer, which results in undefined behavior. Undefined behavior does not mandate a crash; it literally means any behavior is possible. In your case it may crash sometimes and not at other times, but either way the program is incorrect.
The ideal solution is simply to use std::string.
If you insist on using char *, you need to make the pointer point to valid memory. For example:
char myArr[256];
char *name2 = myArr;  // the array decays to a pointer to its first element
Both are undefined behavior; whether they crash or not is a matter of luck.
You have to provide the memory for your input, but you don't. If you want to stick with gets and puts, you should change char *name to char name[100] or allocate memory:
char *name = new char[100];
...
delete[] name;  // memory from new[] must be released with delete[]
If you need more than 100 chars (including the \0 char at the end of the string) you have to increase the size accordingly.
In C++ using std::string is most likely the better alternative.

C++ Swap string

I am trying to create a non-recursive function to reverse a C-style string. It throws an exception in the Swap method, and I could not figure out the problem.
void Swap(char *a, char* b)
{
char temp;
temp = *a;
*a = *b;
*b = temp;
}
void Reverse_String(char * str, int length)
{
for(int i=0 ; i <= length/2; i++) //do till the middle
{
Swap(str+i, str+length - i);
}
}
EDIT: I know there are fancier ways to do this, but since I'm learning, I would like to know what the problem with my code is.
It throws an exception in the Swap method. Could not figure out the problem.
No, it doesn't. Creating a temporary character and assigning characters cannot possibly throw an exception. You might have an access violation, though, if your pointers don't point to blocks of memory you own.
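Apart from the memory problem described below, Reverse_String() has an off-by-one error: for i == 0, str + length - i points at the terminating '\0', so the first swap moves the terminator to the front and the "reversed" string comes out empty. The second pointer should be str + length - 1 - i, and the loop should run while i < length / 2 (with <=, an even-length string gets its middle pair swapped back). Otherwise the function is fine, assuming str points to at least length bytes of writable memory. There's not enough context in your question to extrapolate past that. I suspect you are passing invalid parameters; you'll need to show how you call Reverse_String() for us to determine whether the call is valid.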
If you are writing something like this:
char * str = "Foo";
Reverse_String(str, 3);
printf("Reversed: '%s'.\n", str);
Then you will definitely get an access violation, because str points to read-only memory. Try the following syntax instead:
char str[] = "Foo";
Reverse_String(str, 3);
printf("Reversed: '%s'.\n", str);
This will actually make a copy of the "Foo" string into a local buffer you can overwrite.
This answer refers to the comment by user963018 made under André Caron's answer (it's too long to be a comment).
char *str = "Foo";
The above declares a pointer to the first element of an array of char. The array is 4 characters long: 3 for F, o and o, plus 1 for the terminating null character. The array itself is typically stored in memory marked as read-only, which is why you were getting the access violation. In fact, in C++ your declaration was deprecated (it was allowed for backward compatibility with C) and has been ill-formed since C++11, so your compiler should be warning you or rejecting the code. If it isn't, try turning up the warning level. You should be using the following declaration:
const char *str = "Foo";
Now, the declaration indicates that str should not be used to modify whatever it is pointing to, and the compiler will complain if you attempt to do so.
char str[] = "Foo";
This declaration states that str is an array of 4 characters (including the null character). The difference here is that str is of type char[N] (where N == 4), not char *. However, str can decay to a pointer if the context demands it, so you can pass it to the Swap function, which expects a char *. Also, str holds its own copy of "Foo" in writable memory, so you can modify it.
std::string str( "Foo" );
This declares an object of type std::string that contains the string "Foo". The memory that contains the string is dynamically allocated by the string object as required (some implementations may contain a small private buffer for small string optimization, but forget that for now). If you have a string whose size may vary, or whose size you do not know at compile time, it is best to use std::string.

Why is main() argument argv of type char*[] rather than const char*[]?

When I compiled the following code, the compiler said:
deprecated conversion from string constant to char*
int main()
{
char *p;
p=new char[5];
p="how are you";
cout<< p;
return 0;
}
It means that I should have written const char *.
But when we pass arguments into main using char* argv[] we don't write const char* argv[].
Why?
Because ... argv[] isn't const. And it certainly isn't a (static) string literal since it's being created at runtime.
You're declaring a char * pointer then assigning a string literal to it, which is by definition constant; the actual data is in read-only memory.
#include <iostream>

int main(int argc, char **argv) {
    // Yes, I know I'm not checking anything - just a demo
    argv[1][0] = 'f';
    std::cout << argv[1] << std::endl;
}
Input:
g++ -o test test.cc
./test hoo
Output:
foo
This is not a comment on why you'd want to change argv, but it certainly is possible.
Historical reasons. Changing the signature of main() would break too much existing code. And it is possible that some implementations allow you to change the parameters to main from your code. However code like this:
char * p = "helllo";
* p = 'x';
is always undefined behavior, because you are not allowed to modify string literals like that, so the pointer should point to const char.
why is it required for char* to be constant while assigning it to a string
Because such string literals (like "hi", "hello what's going on", etc.) are typically stored in the read-only segment of your executable. As such, pointers to them need to point to constant characters (i.e. you can't change the characters through them).
You are assigning a string constant (const char*) to a pointer to a non-constant string (char *p). This would allow you to modify the string constant, e.g. by doing p[0] = 'n'.
Anyway, why don't you use std::string instead? (You seem to be using C++.)
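If you look at exec functions like execve, you will see that they don't take the argument strings as const char *: the argv parameter is declared char *const argv[] (an array of constant pointers to non-const char). Therefore, in C++, you can't pass string constants as those arguments without a cast, and main receives them as char * as well.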