Allocating C-style string on the Heap - c++

guys, I need a little technical help.
I'm working in C++, don't have much experience working in it but know the language somewhat. I need to use a C-style string (char array) but I need to allocate it on the heap.
If you look at this very simple piece of code:
#include <iostream>
using namespace std;
char* getText()
{
return "Hello";
}
int main()
{
char* text;
text = getText();
cout << text;
//delete text; // Calling delete results in an error
}
Now, I'm assuming that the "Hello" string is allocated on the stack, within getText(), which means the pointer will be "floating" as soon as getText returns, am I right?
If I'm right, then what's the best way to put "Hello" on the heap so I can use that string outside of getText and call delete on the pointer if I need to?

No, there's no hidden stack allocation going on there. "Hello" is static data, and typically ends up in a read-only data segment of your program (for example .rodata).
This also means the string is shared for all calls to getText. A common case where this is acceptable is a large list of error messages that map to error codes. Functions like strerror work like this, so that you can get descriptive error messages for standard library error codes. But nobody is supposed to modify the string that strerror returns (its return type is plain char*, but the standard forbids modifying it). In your case, your function definition should read:
const char *getText()
If you do want a private copy of the string returned, you can use the strdup function (POSIX, not standard C++) to make a copy. The copy is allocated with malloc, so release it with free, not delete:
return strdup("Hello");
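If you would rather release the string with delete[], as the question intended, the copy can be made with new[] instead. This is a minimal sketch; copyToHeap is a hypothetical helper name, not something from the question or the standard library:

```cpp
#include <cstring>

// Hypothetical helper: returns a heap-allocated copy of a C string.
// The caller owns the result and must release it with delete[].
char* copyToHeap(const char* source)
{
    const std::size_t length = std::strlen(source);
    char* copy = new char[length + 1];      // +1 for the terminating '\0'
    std::memcpy(copy, source, length + 1);  // copies the terminator as well
    return copy;
}

char* getText()
{
    return copyToHeap("Hello");
}
```

The caller can then print or modify the text and finish with delete[] text; plain delete or free would not match new[].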

Use a std::string, from the <string> header. Then use its .c_str() member function. Then you don't have to care about allocation and deallocation: it takes care of it for you, correctly.
Cheers & hth.,
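A small sketch of that suggestion, assuming a hypothetical C-style function (lengthViaCApi here, a made-up stand-in) that only accepts const char*:

```cpp
#include <cstddef>
#include <string>

// The std::string owns its character buffer; no manual new/delete is needed.
std::string getText()
{
    return "Hello";
}

// Hypothetical stand-in for a C API that only accepts const char*.
std::size_t lengthViaCApi(const char* text)
{
    std::size_t n = 0;
    while (text[n] != '\0') {
        ++n;
    }
    return n;
}
```

A caller would write std::string s = getText(); and pass s.c_str() to the C function. The pointer stays valid only while s is alive and unmodified.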

This is not right. "Hello" is a static string constant and it really should be const char*.

A narrow string literal has type "array of n const char", where n is the size of the string as defined below, and has static storage duration.
Static storage is neither automatic ("on the stack") nor dynamic ("on the heap"). It is allocated prior to the actual runtime of your program, so pointers to string literals never become invalid.
Note that char* p = "Hello" is deprecated in C++03 and ill-formed since C++11, because it is dangerous: the type system cannot prevent you from trying to change the string literal through p (which would result in undefined behavior). Use const char* p = "Hello" instead.

Related

Access violation at modifying char* by dereferencing [duplicate]

I read this on wikipedia
int main(void)
{
char *s = "hello world";
*s = 'H';
}
When the program containing this code is compiled, the string "hello world" is placed in the section of the program executable file marked as read-only; when loaded, the operating system places it with other strings and constant data in a read-only segment of memory. When executed, a variable, s, is set to point to the string's location, and an attempt is made to write an H character through the variable into the memory, causing a segmentation fault.
I don't know why the string is placed in a read-only segment. Could someone please explain this?
String literals are stored in read-only memory, that's just how it works. Your code uses a pointer initialized to point at the memory where a string literal is stored, and thus you can't validly modify that memory.
To get a string in modifiable memory, do this:
char s[] = "hello world";
then you're fine, since now you're just using the constant string to initialize a non-constant array.
There is a big difference between:
char * s = "Hello world";
and
char s[] = "Hello world";
In the first case, s is a pointer to something that you can't change. It's stored in read-only memory (typically, in the code section of your application).
In the latter case, you allocate an array in read-write memory (typically plain RAM), that you can modify.
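The difference can be sketched as follows. The function name capitaliseCopy is made up for illustration:

```cpp
#include <cstring>

// Copies the literal into a writable array, capitalises the first letter,
// and reports whether the array now holds the expected text.
bool capitaliseCopy()
{
    char s[] = "hello world";            // array initialised from the literal: writable
    s[0] = 'H';                          // fine, this changes the array, not the literal

    const char* p = "hello world";       // pointer to the literal itself
    // p[0] = 'H';                       // would not compile: p points to const char
    (void)p;

    return std::strcmp(s, "Hello world") == 0;
}
```

Declaring the pointer as const char* turns the accidental write into a compile-time error instead of a run-time fault.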
When you do: char *s = "hello world"; then s is a pointer to memory in the code part (which is read-only), so you can't change it.
When you do: char s[] = "Hello World"; then s is an array of chars that lives on the stack (if it is declared inside a function), so you can change it.
If you don't want the string to be changed during the program, it is better to do: char const *s = ....;. Then, when you try to change the string, your program will not crash with a segmentation fault; you will get a compiler error instead (which is much better).
First, get a good understanding of pointers. Here is a short demo.
Let us analyze your code line by line, starting from main:
char *s = "Some_string";
Here you are declaring a pointer to char. s holds the address of the string literal in memory, and changing the memory it points to is illegal, so C will kick you if you try. You had better declare a character array, point s at it, and then make your changes through s.
Hope you get it. For further reference and a detailed understanding, see K. N. King, C Programming: A Modern Approach.
Per the language definition, string literals have to be stored in such a way that their lifetime extends over the lifetime of the program, and that they are visible over the entire program.
Exactly what this means in terms of where the string gets stored is up to the implementation; the language definition does not mandate that string literals are stored in read-only memory, and not all implementations do so. It only says that attempting to modify the contents of a string literal results in undefined behavior, meaning the implementation is free to do whatever it wants.

Returning a constant char pointer yields an error

I am new to C++ and haven't quite grasped all the concepts yet, so I am perplexed as to why this function does not work. I am currently not at home, so I cannot post the compiler error just yet; I will do it as soon as I get home.
Here is the function.
const char * ConvertToChar(std::string input1, std::string input2) {
// Create a string that you want converted
std::stringstream ss;
// Streams the two strings together
ss << input1 << input2;
// outputs it into a string
std::string msg = ss.str();
//Creating the character the string will go in; be sure it is large enough so you don't overflow the array
const char * cstr[80];
//Copies the string into the char array. Thus allowing it to be used elsewhere.
strcpy(cstr, msg.c_str());
return * cstr;
}
It is meant to concatenate two strings and return the result as a const char *. That is because the function I want to use it with requires a const char pointer to be passed in.
The code returns a pointer to a local (stack) variable. When the caller gets this pointer that local variable doesn't exist any more. This is often called dangling reference.
If you want to convert std::string to a c-style string use std::string::c_str().
So, to concatenate two strings and get a c-style string do:
std::string input1 = ...;
std::string input2 = ...;
// concatenate
std::string s = input1 + input2;
// get a c-style string
char const* cstr = s.c_str();
// cstr becomes invalid when s is changed or destroyed
Without knowing what the error is, it's hard to say, but this
line:
const char* cstr[80];
seems wrong: it creates an array of 80 pointers; when it
implicitly converts to a pointer, the type will be char
const**, which should give an error when it is passed as an
argument to strcpy, and the dereference in the return
statement is the same as if you wrote cstr[0], and returns the
first pointer in the array—since the contents of the array
have never been initialized, this is undefined behavior.
Before you go any further, you have to define what the function
should return—not only its type, but where the pointed to
memory will reside. There are three possible solutions to this:
Use a local static for the buffer:
This solution was
frequently used in early C, and is still present in a number of
functions in the C library. It has two major defects: 1)
successive calls will overwrite the results, so the client code
must make its own copy before calling the function again, and 2)
it isn't thread safe. (The second issue can be avoided by using
thread local storage.) In cases like yours, it also has the
problem that the buffer must be big enough for the data, which
probably requires dynamic allocation, which adds to the
complexity.
Return a pointer to dynamically allocated memory:
This works well in theory, but requires the client code to free
the memory. This must be rigorously documented, and is
extremely error prone.
Require the client code to provide the buffer:
This is probably the best solution in modern code, but it does
mean that you need extra parameters for the address and the
length of the buffer.
In addition to this: there's no need to use std::ostringstream
if all you're doing is concatenating; just add the two strings.
Whatever solution you use, verify that the results will fit.
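The third option (the client provides the buffer) might look roughly like this. The name concatInto and its parameter list are assumptions made for illustration, not part of the original code:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical function: concatenates input1 and input2 into the caller's buffer.
// Returns false (and writes nothing) if the result plus '\0' does not fit.
bool concatInto(const std::string& input1, const std::string& input2,
                char* buffer, std::size_t bufferSize)
{
    const std::string joined = input1 + input2;   // plain +, no stringstream needed
    if (joined.size() + 1 > bufferSize) {
        return false;                             // verify that the result fits
    }
    std::memcpy(buffer, joined.c_str(), joined.size() + 1);
    return true;
}
```

The caller declares something like char buf[80]; and passes buf and sizeof buf, so the memory's lifetime is entirely in the caller's hands.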

C/C++ Char Pointer Crash

Let's say that a function which returns a fixed ‘random text’ string is written like
char *Function1()
{
return "Some text";
}
then the program could crash if it accidentally tried to alter the value doing
Function1()[1]='a';
What are the square brackets after the function call attempting to do that would make the program crash? If you're familiar with this, any explanation would be greatly appreciated!
The string you're returning in the function is usually stored in a read-only part of your process. Attempting to modify it will cause an access violation. (EDIT: Strictly speaking, it is undefined behavior, and on some systems it will cause an access violation. Thanks, John).
This is usually the case because the string itself is hardcoded along with the code of your application. When loading, pointers are established to point to those read-only sections of your process that hold literal strings. In fact, whenever you write a string literal in C++, it is treated as an array of const char, which converts to const char* (a pointer to const memory).
The signature of that function should really be const char* Function1();.
You are trying to modify a string literal. According to the Standard, this invokes undefined behavior. Another thing to keep in mind (related) is that in C++ string literals are always arrays of const char, which decay to const char*. Older C++ had a special dispensation to convert a string literal to char*, taking away the const qualifier (it was removed in C++11), but the underlying string is still const. So by doing what you are doing, you are trying to modify a const object. This also invokes undefined behavior, and is akin to trying to do this:
const char* val = "hello";
char* modifyable_val = const_cast<char*>(val);
modifyable_val[1] = 'n'; // this evokes UB
Instead of returning a const char* from your function, return a string by value. This will construct a new string based on the string literal, and the calling code can do whatever it wants:
#include <string>
std::string Function1()
{
return "Some text";
}
...later:
std::string s = Function1();
s[1] = 'a';
Now, if you are trying to change the value that Function1() returns, then you'll have to do something else. I'd use a class:
#include <string>
class MyGizmo
{
public:
std::string str_;
MyGizmo() : str_("Some text") {};
};
int main()
{
MyGizmo gizmo;
gizmo.str_[1] = 'n';
}
You can use a static char string as the return value, but you must never write to it through the returned pointer. Doing so typically shows up as an access violation error; the behavior is not defined by the C++ Standard.
It's not the brackets, but the assignment. Your function effectively returns not a simple char *, but a const char * (I may be wrong here, but the memory is read-only), so you are trying to change unchangeable memory. The brackets just give you access to an element of the array.
Note also that you can avoid the crash by placing the text in a regular array:
char Function1Str[] = "Some text";
char *Function1()
{
return Function1Str;
}
The question shows that you do not understand string literals.
Imagine this code:
char* pch = "Here is some text";
char* pch2 = "some text";
char* pch3 = "Here is";
Now, how the compiler allocates memory for the strings is entirely a matter for the compiler. The memory might be organised like this:
Here is<NULL>Here is some text<NULL>
with pch2 pointing to a memory location inside the pch string.
The key here is understanding the memory. Using the Standard Template Library (STL) would be good practice, but it may be quite a steep learning curve for you.

Conversion char[] to char*

Maybe this is a silly question, but please help.
void Temp1::caller()
{
char *cc=Called();
printf("sdfasfasfas");
printf("%s",cc);
}
char *Temp1::Called()
{
char a[6]="Hello";
return &a;
}
How can I print Hello here using printf("%s",cc);?
Firstly, this function:
char *Temp1::Called()
{
char a[6]="Hello";
return &a;
}
is returning the address of a local variable, which will cease to exist once the function ends. Change it to:
const char *Temp1::Called()
{
return "Hello";
}
and then, the way to print strings using printf() is to use "%s":
void Temp1::caller()
{
const char *cc=Called();
printf("sdfasfasfas");
printf("%s",cc);
}
You are returning the address of a local variable, which invokes undefined behaviour as soon as the caller uses it. You need to make it static inside Called, or global, or allocate memory for it.
And use %s as format for printf
char a[6] is a local variable, so you cannot return its address from the function. It is destroyed when execution leaves the function's scope.
You can use the STL for this:
#include <stdio.h>
#include <string>
using namespace std;
string Called()
{
string a=string("Hello");
return a;
}
int main()
{
string cc=Called();
printf("sdfasfasfas\n");
printf("%s",cc.c_str());
}
Two things:
You need %s, the string format specifier, to print strings.
You are returning the address of a local array variable a[6], which is destroyed after the function returns. Using it is undefined behavior, so the program may well give you a segmentation fault. If you are on a Linux machine, run ulimit -c unlimited and then run the program; if it crashes, you should see a core dump.
%c is for printing a single character. You need to use %s for printing a null-terminated string. That said, this code is likely to crash, as you are trying to return the address of the local variable a from the function Called. This variable's memory is released as soon as Called returns. You are trying to use this released memory in caller, and in many cases it will crash.
A few things:
Change return &a; to return a;
You are returning the address of a local array, which will cease to exist once the function returns, so allocate it dynamically using new or make it static.
Use the %s format specifier in printf in place of %c.
Problem 1: %c is the specifier for a single char, while you need to use %s, which is the format specifier for strings (pointers to a NUL-terminated array of chars).
Problem 2: in Called you are returning &a, which is a pointer to the whole array (a char (*)[6]), not a char *. The array a itself converts to a pointer to its first element, so you don't need that ampersand in the return.
Problem 3: even if you corrected the other two errors, there's a major flaw in your code: you're trying to return a pointer to a local object (the array a), which is destroyed when it goes out of scope (i.e. when Called returns). So the caller will have a pointer to an area of memory which is no longer dedicated to a; what happens next is undefined behavior: it may work for a while, as long as the memory where a was stored isn't reused for something else (for example if you don't call other functions before using the returned value), but most likely it will explode tragically at the first change in the application.
The usual method for returning strings in C is to allocate them on the heap with malloc or calloc and return this pointer to the caller, which then has the responsibility to free it when it is no longer needed. Another common way to do it in C is to declare the local variable as static, so it won't be destroyed at the return, but this will make your function neither reentrant nor thread-safe (and it may also cause other nasty problems).
On the other hand, since you are using C++, the best way to deal with strings is the std::string class, which has a nice copy semantic so that you can return it as a normal return value without bothering about scope considerations.
By the way, if the string you need to return is always the same, you can just declare the return type of your function as const char * and return directly the string, as in
const char * Test()
{
return "Test";
}
This works because the string "Test" is put by the compiler in a fixed memory location, where it will stay during all the execution. It needs to be a const char * because the compiler is allowed to say to every other piece of program that needs a "Test" string to look there, and because all the strings of the application are tightly-packed there, so if you tried to make that string longer you would overwrite some other string.
Still, in my opinion, if you are making errors like those I outlined, it may be that you're trying to do something a bit too complex for your current skills: it may be better to have another look at your C++ manual, especially at the chapter about pointers and strings.

Caching a const char * as a return type

Was reading up a bit on my C++, and found this article about RTTI (Runtime Type Identification):
http://msdn.microsoft.com/en-us/library/70ky2y6k(VS.80).aspx . Well, that's another subject :) - However, I stumbled upon a weird saying in the type_info-class, namely about the ::name-method. It says: "The type_info::name member function returns a const char* to a null-terminated string representing the human-readable name of the type. The memory pointed to is cached and should never be directly deallocated."
How can you implement something like this yourself!? I've been struggling quite a bit with this exact problem often before, as I don't want to make a new char-array for the caller to delete, so I've stuck to std::string thus far.
So, for the sake of simplicity, let's say I want to make a method that returns "Hello World!", let's call it
const char *getHelloString() const;
Personally, I would make it somehow like this (Pseudo):
const char *getHelloString() const
{
char *returnVal = new char[13];
strcpy(returnVal, "HelloWorld!"); // destination first, then source
return returnVal;
}
.. But this would mean that the caller should do a delete[] on my return pointer :(
Thx in advance
How about this:
const char *getHelloString() const
{
return "HelloWorld!";
}
Returning a literal directly means the space for the string is allocated in static storage by the compiler and will be available throughout the duration of the program.
I like all the answers about how the string could be statically allocated, but that's not necessarily true for all implementations, particularly the one whose documentation the original poster linked to. In this case, it appears that the decorated type name is stored statically in order to save space, and the undecorated type name is computed on demand and cached in a linked list.
If you're curious about how the Visual C++ type_info::name() implementation allocates and caches its memory, it's not hard to find out. First, create a tiny test program:
#include <cstdio>
#include <typeinfo>
#include <vector>
int main(int argc, char* argv[]) {
std::vector<int> v;
const std::type_info& ti = typeid(v);
const char* n = ti.name();
printf("%s\n", n);
return 0;
}
Build it and run it under a debugger (I used WinDbg) and look at the pointer returned by type_info::name(). Does it point to a global structure? If so, WinDbg's ln command will tell the name of the closest symbol:
0:000> ?? n
char * 0x00000000`00857290
"class std::vector<int,class std::allocator<int> >"
0:000> ln 0x00000000`00857290
0:000>
ln didn't print anything, which indicates that the string wasn't in the range of addresses owned by any specific module. It would be in that range if it was in the data or read-only data segment. Let's see if it was allocated on the heap, by searching all heaps for the address returned by type_info::name():
0:000> !heap -x 0x00000000`00857290
Entry User Heap Segment Size PrevSize Unused Flags
-------------------------------------------------------------------------------------------------------------
0000000000857280 0000000000857290 0000000000850000 0000000000850000 70 40 3e busy extra fill
Yes, it was allocated on the heap. Putting a breakpoint at the start of malloc() and restarting the program confirms it.
Looking at the declaration in <typeinfo> gives a clue about where the heap pointers are getting cached:
struct __type_info_node {
void *memPtr;
__type_info_node* next;
};
extern __type_info_node __type_info_root_node;
...
_CRTIMP_PURE const char* __CLR_OR_THIS_CALL name(__type_info_node* __ptype_info_node = &__type_info_root_node) const;
If you find the address of __type_info_root_node and walk down the list in the debugger, you quickly find a node containing the same address that was returned by type_info::name(). The list seems to be related to the caching scheme.
The MSDN page linked in the original question seems to fill in the blanks: the name is stored in its decorated form to save space, and this form is accessible via type_info::raw_name(). When you call type_info::name() for the first time on a given type, it undecorates the name, stores it in a heap-allocated buffer, caches the buffer pointer, and returns it.
The linked list may also be used to deallocate the cached strings during program exit (however, I didn't verify whether that is the case). This would ensure that they don't show up as memory leaks when you run a memory debugging tool.
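The compute-on-first-call-and-cache idea can be sketched in portable C++. This is not the Visual C++ implementation (which, as described above, uses a linked list); it is a simplified illustration that swaps in std::unordered_map, and makeReadableName is a made-up stand-in for the undecorating step:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the expensive "undecorate" step.
std::string makeReadableName(const std::string& rawName)
{
    return "class " + rawName;
}

// Returns a pointer that stays valid until static destruction at program exit.
// The cache owns the memory, so callers must never free the result.
// Note: this sketch is not thread-safe.
const char* cachedName(const std::string& rawName)
{
    static std::unordered_map<std::string, std::string> cache;
    auto it = cache.find(rawName);
    if (it == cache.end()) {
        // First request for this name: compute it once and remember it.
        it = cache.emplace(rawName, makeReadableName(rawName)).first;
    }
    // Elements of an unordered_map are not moved when the table rehashes,
    // and the stored strings are never modified, so c_str() stays valid.
    return it->second.c_str();
}
```

Repeated calls with the same name return the same pointer, and the map's destructor releases the strings at exit, so a leak checker has nothing to report.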
Well gee, if we are talking about just a function that you always want to return the same value, it's quite simple.
const char * foo()
{
static char return_val[] = "HelloWorld!";
return return_val;
}
The tricky bit is when you start caching the result: then you have to consider threading, or when your cache gets invalidated, and perhaps storing things in thread-local storage. But if it's just a one-off output that is immediately copied, this should do the trick.
Alternatively, if you don't have a fixed size, you have to either use a static buffer of arbitrary size (in which case you might eventually have something too large for it), or turn to a managed class, say std::string.
const char * foo()
{
static std::string output;
DoCalculation(output);
return output.c_str();
}
also the function signature
const char *getHelloString() const;
is only applicable for member functions.
At which point you don't need to deal with static function local variables and could just use a member variable.
I think that since they know that there are a finite number of these, they just keep them around forever. It might be appropriate for you to do that in some instances, but as a general rule, std::string is going to be better.
They can also look up new calls to see if they made that string already and return the same pointer. Again, depending on what you are doing, this may be useful for you too.
Be careful when implementing a function that allocates a chunk of memory and then expects the caller to deallocate it, as you do in the OP:
const char *getHelloString() const
{
char *returnVal = new char[13];
strcpy(returnVal, "HelloWorld!"); // destination first, then source
return returnVal;
}
By doing this you are transferring ownership of the memory to the caller. If you call this code from some other function:
int main()
{
const char * str = getHelloString();
delete[] str; // must be delete[], because the memory came from new[]
return 0;
}
...the semantics of transferring ownership of the memory is not clear, creating a situation where bugs and memory leaks are more likely.
Also, at least under Windows, if the two functions are in 2 different modules you could potentially corrupt the heap. In particular, if main() is in hello.exe, compiled in VC9, and getHelloString() is in utility.dll, compiled in VC6, you'll corrupt the heap when you delete the memory. This is because VC6 and VC9 both use their own heap, and they aren't the same heap, so you are allocating from one heap and deallocating from another.
Why does the return type need to be const? Don't think of the method as a get method, think of it as a create method. I've seen plenty of API that requires you to delete something a creation operator/method returns. Just make sure you note that in the documentation.
/* create a hello string
* must be deleted after use
*/
char *createHelloString() const
{
char *returnVal = new char[13];
strcpy(returnVal, "HelloWorld!"); // destination first, then source
return returnVal;
}
What I've often done when I need this sort of functionality is to have a char * pointer in the class - initialized to null - and allocate when required.
viz:
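#include <cstdlib> // free
#include <cstring> // strdup (POSIX)
class CacheNameString
{
private:
char *name;
public:
CacheNameString():name(NULL) { }
~CacheNameString() { free(name); } // release the cached copy
const char *make_name(const char *v)
{
free(name); // free(NULL) is a no-op, so no check is needed
name = strdup(v);
return name;
}
};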
Something like this would do:
const char *myfunction() {
static char *str = NULL; /* this only happens once */
delete [] str; /* delete previous cached version */
str = new char[strlen("whatever") + 1]; /* allocate space for the string and its NUL terminator */
strcpy(str, "whatever");
return str;
}
EDIT: Something that occurred to me is that a good replacement for this could be returning a boost::shared_ptr instead. That way the caller can hold onto it as long as they want and they don't have to worry about explicitly deleting it. A fair compromise IMO.
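A rough sketch of the smart-pointer idea follows. Note the substitution: it uses std::unique_ptr<char[]> from the standard library (C++11) rather than Boost's shared pointer, and the function name createHelloString is borrowed from an earlier answer purely for illustration:

```cpp
#include <cstring>
#include <memory>

// Sketch: the returned smart pointer owns the buffer and calls delete[]
// automatically when it goes out of scope.
std::unique_ptr<char[]> createHelloString()
{
    const char source[] = "HelloWorld!";
    std::unique_ptr<char[]> result(new char[sizeof source]); // sizeof includes '\0'
    std::memcpy(result.get(), source, sizeof source);
    return result;
}
```

The caller writes auto s = createHelloString(); and uses s.get() wherever a char* is needed. Ownership is visible in the return type, so no documentation comment about deleting is required.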
The advice that warns about the lifetime of the returned string is sound advice. You should always be careful about recognising your responsibilities when it comes to managing the lifetime of returned pointers. The practice is quite safe, however, provided the variable pointed to will outlast the call to the function that returned it. Consider, for instance, the pointer to const char returned by c_str() as a method of class std::string. This returns a pointer to memory managed by the string object, which is guaranteed to be valid as long as the string object is not destroyed or made to reallocate its internal memory.
In the case of the std::type_info class, it is a part of the C++ standard, as its namespace implies. In many implementations, the memory returned from name() points to static memory created by the compiler and linker when the class was compiled, and is a part of the run-time type identification (RTTI) system. Either way, the implementation owns that memory, so you should not attempt to delete it.
I think something like this can only be implemented "cleanly" using objects and the RAII idiom.
When the object's destructor is called (obj goes out of scope), we can safely assume that the const char* pointers aren't being used anymore.
example code:
#include <cstddef>
#include <vector>
class ICanReturnConstChars
{
std::vector<char*> cached_strings; // std::stack has no push_back()/back(), so use a vector
public:
const char* yeahGiveItToMe(std::size_t something){
char* newmem = new char[something];
//write something to newmem
cached_strings.push_back(newmem);
return newmem;
}
~ICanReturnConstChars(){
while(!cached_strings.empty()){
delete [] cached_strings.back();
cached_strings.pop_back();
}
}
};
The only other possibility i know of is to pass a smart_ptr ..
It's probably done using a static buffer:
const char* GetHelloString()
{
static char buffer[256] = { 0 };
strcpy( buffer, "Hello World!" );
return buffer;
}
This buffer is like a global variable that is accessible only from this function.
You can't rely on GC; this is C++. That means you must keep the memory available until the program terminates. You simply don't know when it becomes safe to delete[] it. So, if you want to construct and return a const char*, simply new[] it and return it. Accept the unavoidable leak.