Why does cout << *s << endl produce a segfault? - c++

I have this question on a practice exam for my C++ class, we're supposed to write what the output is or if it produces an error. Running this code produces a segmentation fault, but can someone explain why? It looks fine to me.
string *s;
s = (string *) "This is my house. I have to defend it.";
cout << *s << endl;

Indirecting through a pointer of type std::string* when it doesn't point to an object of type std::string has undefined behaviour.
A string literal is not an object of type std::string. String literal is an array of characters. std::string is a class defined in the header <string>.
Running this code produces a segmentation fault, but can someone explain why?
You indirect through a pointer that doesn't point to an object of compatible type. The behaviour of your program is undefined.
P.S. It is never necessary to use C-style cast (such as (type)expression). It can easily suppress helpful compilation errors and replace them with undefined behaviour. It should be avoided.
If you hadn't used a C-style cast here, then the type system would have alerted you to the mistake before getting to run the program. In this case, you might have seen an error message similar to:
error: cannot convert 'const char [39]' to 'std::string*' {aka 'std::basic_string<char>*'} in assignment
helping you realise that the types do not match.

You need to learn the difference between a string literal and class std::string.
Casting a string literal to std::string* and then dereferencing the result is undefined behaviour, which manifests itself here as a segmentation fault when the code evaluates *s, because s does not point to a valid std::string.
Without that C-style cast (string *) in s = (string *) "T..."; the compiler would emit an error.
Always question those C-style casts in C++.

The likely problem (or at least one of the problems) is the fact that s has no memory allocated to it. You need to use the new keyword and try something like this:
string *s;
s = new string("This is my house. I have to defend it.");
cout << *s << endl;
Of course, in theory you'd need to use delete too.

std::string tends to boil down to something that looks like this:
struct string
{
char* _begin;
char* _end;
char* _capacity;
};
The text string you have defined would look like this:
const char* const text = "This is my house. I have to defend it.";
So you are casting from type (const char*) to type (string*). When the cast pointer is dereferenced, the bytes of the text are read as if they were the members of the struct: _begin is filled with the bytes of "This is " and _end with the bytes of "my house" (assuming 8-byte pointers).
When printing the string to std::cout, it will dereference the pointer to s, and try to find the string size, usually implemented like so:
size_t string::size() const { return _end - _begin; }
Given that _begin and _end aren't storing real pointer values (they are just arbitrary bytes of text), cout will try to print a very wrong number of characters from a nonsense address, because the memory locations in _begin and _end are meaningless.

Related

The following statement doesn't create a value on the Heap, nor the stack. Where does it exist in memory until then?

This is from my Computer Science Class
"This is dangerous (and officially deprecated in the C++ standard) because you haven't allocated memory for str1 to point at."
            — jD3V's Computer Science Professor
The Quote Above is Referring to this Line of Code
char* str1 = "Hello world";
To be clear:
I Get that using a pointer, as shown in the Line of Code above, is deprecated. I also know that it shouldn't appear in my code.
The Part I Don't Get:
The example line of code — char* str1 = "Hello world"; — works, and that surprises me.
It says that no memory has been allocated for the pointer to point at, though the pointer could still be accessed to obtain the C-String "Hello World". I am unaware of another place in memory, though my guess is that there has to be one, because if the following statement doesn't exist on the heap — "and its not placed in the stack according to my debugger" — then it must live in another memory location.
I am trying to be able to understand, and locate where the variables I declare are at in memory, and I am unable to do that here.
I would like to know...
In the example I showed above, where is the string "Hello World", and the str1 pointer that points at it, located in memory, if not in the Heap, or on the Stack?
[Disclaimer: I wrote this answer when the question was tagged both [c] and [c++]. The answer is definitely different for C versus C++. I am leaning somewhat towards C in this answer.]
char* str = "Hello world";
This is perfectly fine in C.
According to my CS Professor, in reference to the statement above, he says...
"This is dangerous (and officially deprecated in the C++ standard) because you haven't allocated memory for str to point at."
Either you misunderstood, or your professor is very badly confused.
The code is deprecated in C++ because you neglected to declare str as being a pointer to unmodifiable (i.e. const) characters. But there is nothing, absolutely nothing, wrong with the allocation of the pointed-to string.
When you write
char *str = "Hello world";
the compiler takes care of allocating memory for str to point to.
The compiler behaves more or less exactly as if you had written
static char __hidden_string[] = "Hello world";
char *str = __hidden_string;
or maybe
static const char __hidden_string[] = "Hello world";
char *str = (char *)__hidden_string;
Now, where is that __hidden_string array allocated? Certainly not on the stack (you'll notice it's declared static), and certainly not on the heap, either.
Once upon a time, the __hidden_string array was typically allocated in the "initialized data" segment, along with (most of) the rest of your global variables. That meant you could get away with modifying it, if you wanted to.
These days, some/many/most compilers allocate __hidden_string in a nonwritable segment, perhaps even the code segment. In that case, if you try to modify the string, you'll get a bus error or the equivalent.
For backwards compatibility reasons, C compilers cannot treat a string constant as if it were of type const char []. If they did, you'd get a warning whenever you wrote something like
char *str = "Hello world";
and to some extent that warning would be a good thing, because it would force you to write
const char *str = "Hello world";
making it explicit that str points to a string that you're not allowed to modify.
But C did not adopt this rule, because there's too much old code it would have broken.
C++, on the other hand, very definitely has adopted this rule. When I try char *str = "Hello world"; under two different C++ compilers, I get warning: conversion from string literal to 'char *' is deprecated. It's likely this is what your professor was trying to get at.
Summary:
"any strings in double quotes" are const lvalue string literals, stored somewhere in compiled program.
You can't modify such a string, but you can store a pointer to it (a pointer to const, of course) and use it without modifying it:
const char *str = "some string";
For example:
int my_strcmp(const char *str1, const char *str2) { ... }
int main()
{
...
const char *rule2_str= "rule2";
// compare some strings
if (my_strcmp(my_string, "rule1") == 0)
std::cout << "Execute rule1" << std::endl;
else if (my_strcmp(my_string, rule2_str) == 0)
std::cout << "Execute rule2" << std::endl;
...
}
If you want to modify the string, you can copy the string literal into your own array: char array[] = "12323". Your array is then initialized with the characters of the string plus a terminating zero at the end:
Actually, char array[] = "123" is the same as char array[] = {'1', '2', '3', '\0'}.
For example:
int main()
{
char my_string[] = "12345";
my_string[0] = '5'; // correct! the array is a modifiable copy
std::cout << my_string << std::endl; // 52345
}
Remember that your array has a fixed size, so you can't change its size; for dynamically sized strings use std::string.
The problem is in lvalue and rvalue. An lvalue (locator value) has a specified place in memory whose address you can take. An rvalue has no such addressable place in memory. For example, in int a = 5 the 5 is an rvalue, so you cannot take its address. When you try to write to the memory behind char* str = "Hello World" with something like str[0] = 'x', you will typically get a segmentation fault, which means you tried to access memory that is unavailable to you (here, read-only). By the way, the address-of operator & is forbidden for rvalues; trying to use it on one is a compile-time error.
The lvalue "Hello World" is stored in the program's segment of memory, but in such a way that the program can't modify it directly.

Initialization of pointers in c++

I need to clarify my concepts regarding the basics of pointer initialization in C++. As per my understanding, a pointer must be assigned an address before putting some value using the pointer.
int *p;
*p=10; //inappropriate
cout << *p <<"\n";
This would probably show the correct output (10), but it may cause issues in larger programs, since p initially holds a garbage address which can be anything and may later be used somewhere else in the program as well. So I believe this is incorrect; the correct way is:
int *p;
int x=10;
p=&x; //appropriate
cout << *p <<"\n";
My question is, if the above understanding is correct, then does the same apply on char* as well?:
const char *str="hello"; // inappropriate
cout << str << "\n";
//OR
const string str1= "hello";
const char str2[6] ="world";
const char *str=str1; //appropriate
const char *st=str2; //appropriate
cout << str << st << "\n";
Please advise.
Your understanding of strings is incorrect.
Let's take, for example, the very first line:
const char *str="hello";
This is actually correct. A string literal like "hello" is turned into a constant array by the compiler, and like all arrays it can decay to a pointer to its first element. So what you are doing is making str point to the first character of the array.
Then let's continue with
const string str1= "hello";
const char *str=str1;
This is actually wrong. A std::string object has no conversion operator to const char *, so the compiler will give you an error for this. You need to use the c_str function to get a pointer to the contained string.
Lastly:
const char str2[6] ="world";
const char *st=str2; //appropriate
This is really no different than the first line when you declare and initialize str. This is, as you say, "appropriate".
About that first example with the "inappropriate" pointer:
int *p;
*p=10; //inappropriate
cout << *p <<"\n";
This is not only "inappropriate", this leads to undefined behavior and may actually crash your program. Also, the correct term is that the value of p is indeterminate.
When I declare a pointer
int *p;
I get an object p whose values are addresses. No ints are created anywhere. The thing you need to do is think of p as being an address rather than being an int.
At this point, this isn't particularly useful since you have no addresses you could assign to it other than nullptr. Well, technically that's not true: p itself has an address which you can get with &p and store it in an int**, or even do something horrible like p = reinterpret_cast<int*>(&p);, but let's ignore that.
To do something with ints, you need to create one. e.g. if you go on to declare
int x;
you now have an int object whose values are integers, and we could then assign its address to p with p = &x;, and then recover the object from p via *p.
Now, C style strings have weird semantics — the weirdest aspect being that C doesn't actually have strings at all: it's always working with arrays of char.
String literals, like "Hello!", are guaranteed to (act [1] like they) exist as an array of const char located at some address, and by C's odd conversion rules, this array automatically converts to a pointer to its first element. Thus,
const char *str = "hello";
stores the address of the h character in that character array. The declaration
const char str2[6] ="world";
works differently; this (acts [1] like it) creates a brand new array, and copies the contents of the string literal "world" into the new array.
As an aside, there is an obsolete and deprecated feature here for compatibility with legacy programs, but for some misguided reason people still use it in new programs these days so you should be aware of it and that it's 'wrong': you're allowed to break the type system and actually write
char *str = "hello";
This shouldn't work because "hello" is an array of const char, but the standard permits this specific usage. You're still not actually allowed to modify the contents of the array, however.
[1]: By the "as if" rule, the program only has to behave as if things happen as I describe, but if you peeked at the assembly code, the actual way things happen can be very different.

Initialization of Chars [duplicate]

I have been wondering, why can I not write my code like so:
char myChar[50];
myChar = "This is a really cool char!";
Or at least like this:
char myChar[50];
myChar[0] = "This is a really cool char!";
The second way makes more sense that it should work, to me, seeing that I would
start the array at the point I want it to start moving the letters into each spot in the
array.
Does anyone know why C++ does not do this? And can you show me the reasoning behind the
right way to do it?
Thank you all in advance!
The first line:
char myChar[50];
...allocates an array of 50 characters on the stack. The second line:
myChar = "This is a really cool char!";
Is attempting to assign a const static string (which exists in read-only memory in the text segment of your code) to the address of the beginning of the array. This is an incompatible lvalue/rvalue assignment: an array cannot appear on the left-hand side of =. This approach:
const char* myChar = "This is a really cool char";
Will work, because a pointer can be made to point at a string literal. This does not have to happen at initialization time; a const char* can also be assigned a string literal later, like so:
/*******************************************************************************
* Preprocessor Directives
******************************************************************************/
#include <stdio.h>
/*******************************************************************************
* Function Prototypes
******************************************************************************/
const char* returnErrorString(int iError);
/*******************************************************************************
* Function Definitions
******************************************************************************/
int main(void) {
int i;
for (i=(-1); i<3; i++) {
printf("i=%d - Error String:%s\n", i, returnErrorString(i));
}
return 0;
}
const char* returnErrorString(int iError) {
const char* ret = NULL;
switch (iError) {
case 0:
ret = "No error";
break;
case (-1):
ret = "Invalid input";
break;
default:
ret = "Unknown error";
break;
}
return ret;
}
You might benefit from reading the post in my references below. It will give you some info on how code, variables, constants, etc, are broken into different segments of the final binary, and why some approaches don't even make sense. Also, it would be beneficial to read up a bit on terminology like integer literals, string literals, l-values, r-values, etc.
Good luck!
References
Difference between declared string and allocated string, Accessed 2014-05-01, <https://stackoverflow.com/questions/16021454/difference-between-declared-string-and-allocated-string>
You must initialise the array of chars inside the declaration of the array. There is actually no reason for not doing so, as if not, your array will contain garbage values until you initialise it. I advise you to look at this link:
Char array declaration and initialization in C
Also, you are allocating a char array of size 50 but only using 28 elements of it, this would appear to me to be a waste...
Try the following for simple string initialisations:
char mychar[12] = "hello world";
Or...
const char *mychar = "hello world";
I hope this helps...
If you want to think of this in holistic terms, the reason is that myChar isn't a string -- it's just an array of char. Hence "FooBar" and char [50] are completely different types. Related in a sense, but really not.
Now some might say, "but "FooBar" is a string, and char [50] is really just a string too." But that is going on the assumption that myChar is the same as "FooBar", and it's not. It's also assuming that the compiler understands that both char[50] and char* are pointers to strings. The compiler doesn't understand that. There could be any manner of thing stored in those places that have nothing to do with strings.
"But myChar is just a pointer?"
That is the reason why people think that the assignment should be a natural thing -- but the fundamental premise is wrong. myChar is not a pointer. It is an array. A name which refers to an array will decay into a pointer at the drop of a hat, but an array is not a pointer.

How to: Typecasting Pointers on c++

I am learning typecasting.
Here is my basic code. I don't know why, after typecasting, printing p0 does not show the same address as a.
I know this is very basic.
#include <iostream>
using namespace std;
int main()
{
int a=1025;
cout<<a<<endl;
cout<<"Size of a in bytes is "<<sizeof(a)<<endl;
int *p;//pointer to an integer
p=&a; //p stores an address of a
cout<<p<<endl;//display address of a
cout<<&a<<endl;//displays address of a
cout<<*p<<endl;//display value where p points to. p stores an address of a and so it points to the value of a
char *p0;//pointer to character
p0=(char*)p;//typecasting
cout<<p0<<endl;
cout<<*p0;
return 0;
}
When you pass a char * pointer to the << operator of std::cout, it prints the string that the pointer points to, not the address. It's the same behavior as the following code:
const char *str = "Hello!";
cout << str; // Prints the string "Hello!", not the address of the string
In your case, p0 doesn't point to a string, which is why you're getting unexpected behavior.
The overload of operator<<, used with std::cout and char* as arguments, expects a null-terminated string. What you are feeding it instead is a pointer that was originally an int*. This leads to undefined behavior when trying to output the char* in cout<<p0<<endl;.
In C++, it is often a bad idea to use C-style casts. If you had used static_cast, for example, the compiler would have told you that the conversion you are trying to make does not make much sense. It is true that you could use reinterpret_cast instead, but what you should be asking yourself is: why am I doing this? Why am I trying to shoot myself in the foot?
If what you want is to convert the number to a string, you should be using other techniques instead. If you just want to print the address of the char* variable itself (which is not the same as the address it holds), you can use std::addressof, declared in <memory>:
std::cout << std::addressof(p0) << std::endl;
As others have said cout is interpreting the char* as a string, and not a pointer
If you wanted to prove that the address is the same whatever type of pointer it is then you can cast it to a void pointer
cout<<(void*)p0<<endl;
In fact you get the address for pretty much any pointer type other than char*
cout<<(float*)p0<<endl;
To prove to yourself that a char* pointer would have the same value use printf
printf("%p", (void*)p0);

C/C++ Char Pointer Crash

Let's say that a function which returns a fixed ‘random text’ string is written like
char *Function1()
{
return "Some text";
}
then the program could crash if it accidentally tried to alter the value doing
Function1()[1]='a';
What are the square brackets after the function call attempting to do that would make the program crash? If you're familiar with this, any explanation would be greatly appreciated!
The string you're returning in the function is usually stored in a read-only part of your process. Attempting to modify it will cause an access violation. (EDIT: Strictly speaking, it is undefined behavior, and in some systems it will cause an access violation. Thanks, John).
This is usually the case because the string itself is hardcoded along with the code of your application. When loading, pointers are established to point to those read-only sections of your process that hold literal strings. In fact, whenever you write a string literal, it is best treated as a const char* (a pointer to const memory).
The signature of that function should really be const char* Function1();.
You are trying to modify a string literal. According to the Standard, this invokes undefined behavior. Another thing to keep in mind (related) is that in C++ string literals always have a const character type (an array of const char, which decays to const char*). There is a special dispensation to convert a pointer to a string literal to char*, taking away the const qualifier, but the underlying string is still const. So by doing what you are doing, you are trying to modify a const. This also invokes undefined behavior, and is akin to trying to do this:
const char* val = "hello";
char* modifyable_val = const_cast<char*>(val);
modifyable_val[1] = 'n'; // this evokes UB
Instead of returning a const char* from your function, return a string by value. This will construct a new string based on the string literal, and the calling code can do whatever it wants:
#include <string>
std::string Function1()
{
return "Some text";
}
...later:
std::string s = Function1();
s[1] = 'a';
Now, if you are trying to change the value that Function1() returns, then you'll have to do something else. I'd use a class:
#include <string>
class MyGizmo
{
public:
std::string str_;
MyGizmo() : str_("Some text") {};
};
int main()
{
MyGizmo gizmo;
gizmo.str_[1] = 'n';
}
You can use a static char array for the return value, but you must never modify a string literal. Doing so typically shows up as an access violation error; the behavior is not defined by the C++ Standard.
It's not the brackets, but the assignment. Your function effectively returns not a simple char *, but a const char * (I may be wrong here, but the memory is read-only in this case), so you are trying to change unchangeable memory. As for the brackets, they just give you access to an element of the array.
Note also that you can avoid the crash by placing the text in a regular array:
char Function1Str[] = "Some text";
char *Function1()
{
return Function1Str;
}
The question shows that you do not understand the string literals.
Imagine this code:
char* pch = "Here is some text";
char* pch2 = "some text";
char* pch3 = "Here is";
Now, how the compiler allocates memory for the strings is entirely a matter for the compiler. The memory might be organised like this:
Here is<NULL>Here is some text<NULL>
with pch2 pointing to a memory location inside the pch string.
The key here is understanding the memory. Using the Standard Template Library (STL) would be good practice, but it may be quite a steep learning curve for you.