Can a char* be moved into an std::string?


Say I have something like this
extern "C" void make_foo (char** tgt) {
*tgt = (char*) malloc(4*sizeof(char));
strncpy(*tgt, "foo", 4);
}
int main() {
char* foo;
make_foo(&foo);
std::string foos{foo};
free(foo);
...
return 0;
}
Now, I would like to avoid using and then deleting the foo buffer. I.e., I'd like to change the initialisation of foos to something like
std::string foos{std::move(foo)};
and use no explicit free.
Turns out this actually compiles and seems to work, but I have a rather suspicious feeling about it: does it actually move the C-defined string and properly free the storage? Or does it just ignore the std::move and leak the storage once the foo pointer goes out of scope?
It's not that I worry too much about the extra copy, but I do wonder if it's possible to write this in modern move-semantics style.

std::string constructor #5:
Constructs the string with the contents initialized with a copy of
the null-terminated character string pointed to by s. The length of
the string is determined by the first null character. The behavior is
undefined if s does not point at an array of at least
Traits::length(s)+1 elements of CharT, including the case when s is a
null pointer.
Your C-string is copied (the std::move doesn't matter here) and thus it is up to you to call free on foo.
A std::string will never take ownership.
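Because the constructor copies, one way to drop the explicit free is to hand the malloc'ed pointer to a std::unique_ptr whose deleter calls std::free. The following is only a minimal sketch: make_foo is reproduced from the question (with a null check added), while FreeDeleter, CStringPtr and foo_as_string are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// C-style producer, as in the question: the caller owns the malloc'ed buffer.
extern "C" void make_foo(char** tgt) {
    *tgt = static_cast<char*>(std::malloc(4 * sizeof(char)));
    if (*tgt != nullptr) {
        std::strncpy(*tgt, "foo", 4);
    }
}

// Hypothetical deleter: releases memory that was obtained from malloc.
struct FreeDeleter {
    void operator()(char* p) const noexcept { std::free(p); }
};
using CStringPtr = std::unique_ptr<char, FreeDeleter>;

// Hypothetical helper: copies the C string into a std::string. The original
// buffer is freed automatically when `guard` goes out of scope.
std::string foo_as_string() {
    char* raw = nullptr;
    make_foo(&raw);
    CStringPtr guard(raw);
    if (!guard) {
        return std::string();  // allocation failed
    }
    std::string copy(guard.get());       // always a copy, with or without std::move
    assert(copy.data() != guard.get());  // the string has its own storage
    return copy;
}
```

The copy still happens; what changes is that the buffer is released on every path out of the function, including when an exception is thrown.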

tl;dr: Not really.
Pointers don't have any special move semantics. x = std::move(my_char_ptr) is the same as x = my_char_ptr. In that regard they are unlike, say, a std::vector, where moving transfers the allocated storage.
However, if you want to keep existing heap buffers and treat them as strings, you can't do it with std::string, as it can't be constructed as a wrapper around an existing buffer (and there's the small-string optimization etc.). Instead, consider implementing a custom container, e.g. one with a string data buffer (std::vector<char>) and a std::vector<std::string_view> whose elements point into that buffer.
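A simpler relative of that idea, sketched here under the assumption that the buffer came from malloc and that C++17 is available: a small class adopts the pointer and exposes it as a std::string_view, so nothing is copied. AdoptedCString and duplicate_with_malloc are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string_view>

// Hypothetical owning wrapper around a malloc'ed, null-terminated buffer.
class AdoptedCString {
public:
    // Takes ownership of `p`, which must come from malloc (or be null).
    explicit AdoptedCString(char* p) noexcept
        : data_(p), size_(p ? std::strlen(p) : 0) {}

    // Non-owning view of the characters; valid only while this object lives.
    std::string_view view() const noexcept {
        return data_ ? std::string_view(data_.get(), size_) : std::string_view();
    }

private:
    struct Free {
        void operator()(char* p) const noexcept { std::free(p); }
    };
    std::unique_ptr<char, Free> data_;  // frees the buffer in the destructor
    std::size_t size_;
};

// Hypothetical stand-in for a C API that returns a malloc'ed string.
char* duplicate_with_malloc(const char* s) {
    assert(s != nullptr);
    const std::size_t n = std::strlen(s) + 1;
    char* out = static_cast<char*>(std::malloc(n));
    if (out != nullptr) {
        std::memcpy(out, s, n);
    }
    return out;
}
```

The trade-off is that the result is not a std::string, so any API that requires one still needs a copy.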

Related

Is it possible to std::move local stack variables?

Please consider the following code:
struct MyStruct
{
int iInteger;
string strString;
};
void MyFunc(vector<MyStruct>& vecStructs)
{
MyStruct NewStruct = { 8, "Hello" };
vecStructs.push_back(std::move(NewStruct));
}
int main()
{
vector<MyStruct> vecStructs;
MyFunc(vecStructs);
}
Why does this work?
At the moment MyFunc is called, the return address should be placed on the stack of the current thread. Then the NewStruct object gets created, which should be placed on the stack as well. With std::move, I tell the compiler that I do not plan to use the NewStruct reference anymore, so it can steal the memory. (The push_back function is the one with the move semantics.)
But then the function returns and NewStruct falls out of scope. Even if the compiler did not remove the memory occupied by the original structure from the stack, it at least has to remove the previously stored return address.
This would lead to a fragmented stack, and future allocations would overwrite the "moved" memory.
Can someone explain this to me, please?
EDIT:
First of all: Thank you very much for your answers.
But from what I have learned, I still cannot understand why the following does not work the way I expect it to:
struct MyStruct
{
int iInteger;
string strString;
string strString2;
};
void MyFunc(vector<MyStruct>& vecStructs)
{
MyStruct oNewStruct = { 8, "Hello", "Definitely more than 16 characters" };
vecStructs.push_back(std::move(oNewStruct));
// At this point, oNewStruct.strString2 should be "", because its memory was stolen.
// But only when I explicitly create a move-constructor in the form which was
// stated by Yakk, it is really that case.
}
int main()
{
vector<MyStruct> vecStructs;
MyFunc(vecStructs);
}

First, std::move does not move, and std::forward does not forward.
std::move is a cast to an rvalue reference. By convention, rvalue references are treated as "references you are permitted to move the data out of, as the caller promises they really don't need that data anymore".
On the other side of the fence, rvalue references implicitly bind to the return value of std::move (and sometimes forward), to temporary objects, in certain cases when returning a local from a function, and when using a member of a temporary or a moved-from object.
What happens within the function taking an rvalue reference is not magic. It cannot claim the storage directly within the object in question. It can, however, tear out its guts; it has permission (by convention) to mess with its argument's internal state if it can do the operation faster that way.
Now, C++ will automatically write some move constructors for you.
struct MyStruct
{
int iInteger;
string strString;
};
In this case, it will write something that roughly looks like this:
MyStruct::MyStruct( MyStruct&& other ) noexcept(true) :
iInteger( std::move(other.iInteger) ),
strString( std::move(other.strString) )
{}
I.e., it will do an element-wise move construction.
When you move an integer, nothing interesting happens. There isn't any benefit to messing with the source integer's state.
When you move a std::string, we get some efficiencies. The C++ standard describes what happens when you move from one std::string to another. Basically, if the source std::string is using the heap, the heap storage is transferred to the destination std::string.
This is a general pattern of C++ containers; when you move from them, they steal the "heap allocated" storage of the source container and reuse it in the destination.
Note that the source std::string remains a std::string, just one that has its "guts torn out". Most container-like things are left empty. For std::string the standard only guarantees a valid but unspecified state (a string held in its small buffer may simply keep its contents), and that isn't important right now.
In short, when you move from something, its memory is not "reused", but memory it owns can be reused.
In your case, MyStruct has a std::string which can use heap allocated memory. This heap allocated memory can be moved into the MyStruct stored in the std::vector.
Going a bit further down the rabbit hole, "Hello" is likely to be so short that SBO (small buffer optimization) occurs, and the std::string doesn't use the heap at all. For this particular case, there may be next to no performance improvement due to moving.
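The points above can be checked with a short sketch. MyStruct is the struct from the question; push_moved is a hypothetical helper added for illustration, and the 64-character string is an assumption chosen to exceed typical small-buffer sizes.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

struct MyStruct {
    int iInteger;
    std::string strString;
};

// std::move only changes the value category: the result is an rvalue reference.
static_assert(std::is_same<decltype(std::move(std::declval<MyStruct&>())),
                           MyStruct&&>::value,
              "std::move is a cast to MyStruct&&");

// The implicitly generated move constructor exists and cannot throw.
static_assert(std::is_nothrow_move_constructible<MyStruct>::value,
              "member-wise move construction is noexcept here");

// Hypothetical helper: moves a local struct into the vector and returns the
// integer that is left behind in the moved-from local.
int push_moved(std::vector<MyStruct>& vecStructs) {
    MyStruct local{8, std::string(64, 'x')};
    vecStructs.push_back(std::move(local));
    assert(vecStructs.back().strString.size() == 64);
    // `local` is still a valid object. Moving an int is just a copy, so
    // iInteger is unchanged; strString is valid but its content is unspecified.
    return local.iInteger;
}
```

The local variable's stack slot is never handed over; only resources owned by its members can change hands.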

Your example can be reduced to:
vector<string> vec;
string str; // populate with a really long string
vec.push_back(std::move(str));
This still raises the question, "Is it possible to move local stack variables." It just removes some extraneous code to make it easier to understand.
The answer is yes. Code like the above can benefit from std::move because std::string, at least if the content is large enough, stores its actual data on the heap, even if the variable is on the stack.
If you do not use std::move(), you can expect code like the above to copy the content of str, which could be arbitrarily large. If you do use std::move(), only the direct members of the string will be copied (move does not need to "zero out" the old locations), and the data will be used without modification or copying.
It's basically the difference between this:
char* str; // populate with a really long string
char* other = new char[strlen(str)+1];
strcpy(other, str);
vs
char* str; // populate with a really long string
char* other = str;
In both cases, the variables are on the stack. But the data is not.
If you have a case where truly all the data is on the stack, such as a std::string with the "small string optimization" in effect, or a struct containing integers, then std::move() will buy you nothing.
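To observe this, one can compare the address of the character buffer before and after the move. This is a sketch only: buffer_was_transferred is a hypothetical helper, and the standard does not promise that the addresses match, although common implementations behave this way for long strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reports whether the vector element ended up using the
// same character buffer that the local string had before the move.
bool buffer_was_transferred(std::vector<std::string>& vec) {
    std::string str(1000, 'a');         // long enough to be stored on the heap
    const char* before = str.data();    // address of the character buffer
    vec.push_back(std::move(str));      // move: the characters are not copied
    assert(vec.back().size() == 1000);  // the content arrived intact
    // Comparing addresses is an observation about the implementation,
    // not something the standard guarantees.
    return vec.back().data() == before;
}
```

Replacing std::move(str) with plain str forces a copy, and the function then returns false because the element needs a buffer of its own.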

Is there memory leak if function returns std::vector<std::string>?

According to the reference:
1. std::vector::swap exchanges contents;
2. copying strings is deep.
But how about swapping a function returned array of strings?
My guess is that the function returns a copy of the internal strings, so the swapping should be fine. However, when debugging in Visual Studio, the internal strings and the outside strings (after swapping) have the same memory addresses in the raw view, so I doubt my guess.
Thank you.
std::vector<std::string> get_name_list()
{
std::string name1 = "foo";
std::string name2 = "bar";
std::vector<std::string> names;
names.push_back(name1);
names.push_back(name2);
return names;
}
void main()
{
std::vector<std::string> list;
list.swap(get_name_list()); // deep copy strings? or access local memory?
}

In general passing and returning by value avoids memory leaks, though of course the types involved might still have buggy memory management. This shouldn't be the case for standard library containers and std::string.
There is no memory leak in your code [edit: assuming it compiles, that is; you can make it compile by changing it to get_name_list().swap(list).] Swapping two vectors does not copy or move the vectors' elements. You can imagine that the two vectors' pointers to their internal data arrays are simply swapped, leaving the objects themselves in place.

Your code doesn't compile since you're trying to bind a temporary to a non-const lvalue reference
template <class T> void swap (T& a, T& b)
MSVC seems to accept it and swaps the contents (it doesn't copy them), but this is not conformant. It shouldn't leak (it swaps the internal contents rather than copying them), but it shouldn't compile either.
Assuming both a C++11-conformant compiler and standard library, you'd be better off relying on the compiler to do the right thing: the returned std::vector<std::string> is a temporary and subject to move semantics. No memory leak would be involved, since you're using a vector which (assuming no bugs in the implementation, of course) provides a move constructor and a move assignment operator.
std::vector<std::string> list;
list = get_name_list();
The signature
void main()
is also wrong even though, as Brian commented, MSVC might accept it. The standard signature is
int main()
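Both conforming variants mentioned in the answers can be sketched side by side. get_name_list follows the question; load_by_swap and load_by_assignment are hypothetical helper names used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> get_name_list() {
    std::vector<std::string> names;
    names.push_back("foo");
    names.push_back("bar");
    return names;  // moved or elided on return; the strings are not deep-copied
}

// Variant 1: call swap on the temporary, so the named vector is the argument
// that binds to the non-const lvalue reference parameter.
std::vector<std::string> load_by_swap() {
    std::vector<std::string> list;
    get_name_list().swap(list);
    assert(list.size() == 2);
    return list;
}

// Variant 2: move-assign from the temporary.
std::vector<std::string> load_by_assignment() {
    std::vector<std::string> list;
    list = get_name_list();
    assert(list.size() == 2);
    return list;
}
```

In both variants the temporary vector is destroyed at the end of the full expression and releases whatever it holds at that point, so nothing leaks.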

Using char* or char [] in struct C++

I am creating a struct called student. In order to store the name, is there anything wrong with just declaring a char pointer in the struct instead of a char array with a predefined size? I can then assign a string literal to the char pointer in the main code.
struct student
{
int ID;
char* name;
};

It really depends on your use case. As suggested above you should use std::string in C++. But if you are using C-style strings, then it depends on your usage.
With a char[] of fixed size you avoid null-pointer errors and other pointer-related problems such as memory leaks and dangling pointers, but you might not make optimal use of memory. You may for example define
#define MAX_SIZE 100
struct student
{
int ID;
char name[MAX_SIZE];
};
And then
#define STUDENT_COUNT 50
struct student many_students[STUDENT_COUNT];
But the names of students will differ in length and in many cases be much shorter than MAX_SIZE, so a lot of memory will be wasted here.
In other cases a name might be longer than MAX_SIZE, and you would have to truncate it to avoid memory corruption.
In the other case, where we use char*, memory is not wasted as we allocate only the required amount, but we must take care of allocating and freeing it.
struct student
{
int ID;
char *name;
};
Then while storing name we need to do something like this:
struct student many_students[STUDENT_COUNT];
int i;
for( i=0; i<STUDENT_COUNT; i++) {
// some code to get student name
// allocate room for the name plus its terminating '\0'
many_students[i].name = (char*)malloc((name_length+1) * sizeof(char));
// Now we can store name
}
// Later when name is no longer required free it
free(many_students[some_valid_index_to_free].name);
// also set it to NULL, to avoid dangling pointers
many_students[some_valid_index_to_free].name = NULL;
Also, if you allocate memory for name again, you should free the previously allocated memory first to avoid memory leaks. Another thing to consider is NULL checks for pointers before use, i.e., you should always check as
if(many_students[valid_index].name!=NULL) {
// do stuff
}
You can create macros to do this, but these are the basic overheads of working with pointers.
Another advantage of using pointers is that if there are many identical names, you can point multiple pointers to the same name and save memory, whereas with arrays each struct has separate storage, e.g.,
// IF we have a predefined global name array
char *GLOBAL_NAMES[] = {"NAME_1", "NAME_2", "NAME_3", "NAME_4", ... , "NAME_N"};
// using pointers, just need to assign name to correct pointer in array
many_students[valid_index_1].name = GLOBAL_NAMES[INDEX_NAME_1];
many_students[valid_index_2].name = GLOBAL_NAMES[INDEX_NAME_1];
// In case of array we would have had to copy.
This might not be your case, but it shows that pointers may help avoid extra memory usage.
Hope it will help you :)

Don't use either, use std::string. I (and many others) guarantee that compared to either char* or char[]:
it will be easier to use and
it will be less prone to bugs.
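As a rough illustration of that claim, here is the struct from the question with a std::string member. Student and renamed_copy are hypothetical names; the point is that copying, assignment and cleanup need no manual code.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Student {
    int id;
    std::string name;  // owns its characters; no malloc/free needed
};

// Hypothetical helper: copies a student and gives the copy a new name.
Student renamed_copy(const Student& original, std::string new_name) {
    Student copy = original;            // deep copy generated by the compiler
    copy.name = std::move(new_name);    // the old name is released automatically
    assert(&copy.name != &original.name);
    return copy;
}
```

With a char* member the same helper would need explicit allocation, a length calculation and a matching free.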

The difference is the same as the difference between static and dynamic memory allocation. With the former (static) you have to specify a size large enough to store the name, whereas with the latter you have to remember to free it when it is no longer needed.
That said, it's always better to use std::string.

Unless there is a strong reason to not do so, I'd suggest you to use a convenient string class like std::string, instead of a raw char* pointer.
Using std::string will simplify your code a lot, e.g. the structure will be automatically copyable, the strings will be automatically released, etc.
A reason why you could not use std::string is because you are designing an interface boundary, think of e.g. Win32 APIs which are mainly C-interface-based (implementation can be in C++), so you can't use C++ at the boundary and instead must use pure C.
But if that's not the case, do yourself a favor and use std::string.
Note also that in case you must use a raw char* pointer, you have several design questions to clarify, e.g.:
Is this an owning pointer, or an observing pointer?
If it's an owning pointer, in what way is it allocated, and in what way is it released? (e.g. malloc()/free(), new[]/delete[], some other allocator like COM CoTaskMemAlloc(), SysAllocString(), etc.)
If it's an observing pointer, you must make sure that the observed string's lifetime exceeds that of the observing pointer, to avoid dangling references.
All these questions are just non-existent if you use a convenient string class (like e.g. std::string).
Note also that, as some Win32 data structures do, you can have a maximum-sized string buffer inside your structure, e.g.
struct Student
{
int ID;
char Name[60];
};
In this case you could use C functions like strcpy(), or safer variants, to deep-copy a source string into the Name buffer. In that case you have good locality since the name string is inside the structure, and a simplified memory management with respect to the raw char* pointer case, but at the cost of having a pre-allocated memory buffer.
This may or may not be a better option for you, depending on your particular programming context. Anyway, keep in mind that this is a more C-like approach; a better C++ approach would be to just use a string class like std::string.

TL;DR - use std::string, as we're talking in c++.
EDIT: Previously, as per the C tag (currently removed)
As per your requirement, assigning a string literal needs a pointer; you cannot do that with an array anyway. (#)
If you're using that pointer to store the base address of a string literal, then it is OK. Otherwise, you need to
allocate memory before using that pointer
deallocate memory once you're done with it.
(#) The base address of a compile-time allocated array cannot be changed, thus assignment won't work.

Use std::string. It is easier to work with and has far more functionality than the built-in counterparts.

const object and const constructor

Is there any way to know if an object is a const object or regular object, for instance consider the following class
class String
{
String(const char* str);
};
If the user creates a const object from String, then there is no reason to copy the passed native string, because they will not manipulate it; the only things they will do are get the string size, search the string, and call other functions that do not change the string.

There is a very good reason for copying - you can't know that the lifetime of the const char * is the same as that of the String object. And no, there is no way of knowing that you are constructing a const object.

Unfortunately, C++ does not provide a way to do what you are attempting. Simply passing a const char * does not guarantee the lifetime of the memory being pointed to. Consider:
char * a = new char[10];
char const *b = a;
String c (b);
delete[] a;
// c is now broken

There is no way for you to know. You could write a class that tightly interacts with String and that creates a constant string pointing to an external buffer (by making the corresponding constructor private and making the interacting class a nested class or a friend of String).
If all you worry about is doing dynamic memory management on a potentially small constant string, you can implement the Small String Optimization (also Small Object/Buffer Optimization). It works by having an embedded buffer in your string class, and copying each string up to some predefined size into that buffer, and each string that's larger to a dynamically allocated storage (the same technique is used by boost::function for storing small sized function objects).
class String {
union {
char *dynamicptr;
char buffer[16];
};
bool isDynamic;
};
There are clever techniques for storing even the length of the embedded string into the buffer itself (storing its length as buffer[15] and similar trickeries).
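The class outline above can be fleshed out into a small working sketch. Everything here is illustrative: SmallString, kInlineCapacity and usesHeap are hypothetical names, copying is disabled to keep the example short, and a separate size field is used instead of the buffer[15] trick.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Minimal small-string-optimization sketch: short strings live in an embedded
// buffer, longer ones in dynamically allocated storage.
class SmallString {
public:
    explicit SmallString(const char* s) : size_(std::strlen(s)) {
        assert(s != nullptr);
        if (size_ < kInlineCapacity) {       // fits, including the '\0'
            isDynamic_ = false;
            std::memcpy(buffer_, s, size_ + 1);
        } else {
            isDynamic_ = true;
            dynamicPtr_ = new char[size_ + 1];
            std::memcpy(dynamicPtr_, s, size_ + 1);
        }
    }
    ~SmallString() {
        if (isDynamic_) {
            delete[] dynamicPtr_;
        }
    }
    // Copying is disabled to keep the sketch short.
    SmallString(const SmallString&) = delete;
    SmallString& operator=(const SmallString&) = delete;

    const char* c_str() const { return isDynamic_ ? dynamicPtr_ : buffer_; }
    std::size_t size() const { return size_; }
    bool usesHeap() const { return isDynamic_; }

private:
    static constexpr std::size_t kInlineCapacity = 16;
    union {
        char* dynamicPtr_;              // used when isDynamic_ is true
        char buffer_[kInlineCapacity];  // used when isDynamic_ is false
    };
    std::size_t size_;
    bool isDynamic_;
};
```

A production version would also need copy and move operations and would typically fold the flag and the length into the buffer to save space.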

You could use const_string to do what you're looking for. However, even with const string you have to "tell" it that the string doesn't need to be copied.
const char* foo = "c-string";
boost::const_string bar(foo); // will copy foo
boost::const_string baz(boost::ref(foo)); // assumes foo will always be a valid pointer.

If the user creates a const object from String, then there is no reason to copy the passed native string, because they will not manipulate it; the only things they will do are get the string size, search the string, and call other functions that do not change the string.
Oh yes there is. Just because it is passed as const doesn't mean that it actually is const outside of the constructor call, and it especially doesn't mean it won't be destroyed while the string object still exists. The keyword const on a function argument only means that the function won't modify it through that argument (trying to implement a function that modifies a const argument will result in a compiler error), but there's no way for the function to know what happens outside.

What you're looking for is basically a COW (copy on write) string. Such things are entirely possible, but getting them to work well is somewhat non-trivial. In a multithreaded environment, getting good performance can go beyond non-trivial into the decidedly difficult range.

Prototype for function that allocates memory on the heap (C/C++)

I'm fairly new to C++ so this is probably somewhat of a beginner question. It regards the "proper" style for doing something I suspect to be rather common.
I'm writing a function that, in performing its duties, allocates memory on the heap for use by the caller. I'm curious about what a good prototype for this function should look like. Right now I've got:
int f(char** buffer);
To use it, I would write:
char* data;
int data_length = f(&data);
// ...
delete[] data;
However, the fact that I'm passing a pointer to a pointer tips me off that I'm probably doing this the wrong way.
Anyone care to enlighten me?

In C, that would have been more or less legal.
In C++, functions typically shouldn't do that. You should try to use RAII to guarantee memory doesn't get leaked.
And now you might say "how would it leak memory, I call delete[] just there!", but what if an exception is thrown at the // ... lines?
Depending on what exactly the functions are meant to do, you have several options to consider. One obvious one is to replace the array with a vector:
std::vector<char> f();
std::vector<char> data = f();
int data_length = data.size();
// ...
//delete[] data;
and now we no longer need to explicitly delete, because the vector is allocated on the stack, and its destructor is called when it goes out of scope.
I should mention, in response to comments, that the above implies a copy of the vector, which could potentially be expensive. Most compilers will, if the f function is not too complex, optimize that copy away so this will be fine (and if the function isn't called too often, the overhead won't matter anyway). But if that doesn't happen, you could instead pass an empty vector to the f function by reference, and have f store its data in that instead of returning a new vector.
If the performance of returning a copy is unacceptable, another alternative would be to decouple the choice of container entirely, and use iterators instead:
// definition of f
template <typename iter>
void f(iter out);
// use of f
std::vector<char> vec;
f(std::back_inserter(vec));
Now the usual iterator operations can be used (*out to reference or write to the current element, and ++out to move the iterator forward to the next element) -- and more importantly, all the standard algorithms will now work. You could use std::copy to copy the data to the iterator, for example. This is the approach usually chosen by the standard library (ie. it is a good idea;)) when a function has to return a sequence of data.
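A minimal sketch of the iterator-based variant follows. The payload string is a placeholder for whatever data f really produces, collect is a hypothetical helper, and returning the advanced iterator (instead of void as in the declaration above) is a choice made here to mirror std::copy.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Writes the result through any output iterator and returns the iterator
// positioned after the last element written.
template <typename OutputIt>
OutputIt f(OutputIt out) {
    const std::string payload = "hello";  // placeholder for the real data
    return std::copy(payload.begin(), payload.end(), out);
}

// Hypothetical caller: the caller, not f, picks the container.
std::vector<char> collect() {
    std::vector<char> vec;
    f(std::back_inserter(vec));
    assert(vec.size() == 5);
    return vec;
}
```

The same f can fill a std::string, a std::deque<char> or an ostream_iterator without any change.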
Another option would be to make your own object taking responsibility for the allocation/deallocation:
struct f { // simplified for the sake of example. In the real world, it should be given a proper copy constructor + assignment operator, or they should be made inaccessible to avoid copying the object
f(){
// do whatever the f function was originally meant to do here
size = ???
data = new char[size];
}
~f() { delete[] data; }
int size;
char* data;
};
f data;
int data_length = data.size;
// ...
//delete[] data;
And again we no longer need to explicitly delete because the allocation is managed by an object on the stack. The latter is obviously more work, and there's more room for errors, so if the standard vector class (or other standard library components) do the job, prefer them. This example is only if you need something customized to your situation.
The general rule of thumb in C++ is that "if you're writing a delete or delete[] outside a RAII object, you're doing it wrong. If you're writing a new or new[] outside a RAII object, you're doing it wrong, unless the result is immediately passed to a smart pointer"
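Applied to the original prototype, that rule could look like the sketch below, assuming C++14 for std::make_unique. make_buffer and use_buffer are hypothetical names; the simulated exception stands in for whatever might throw in the "// ..." part of the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Possible replacement for `int f(char** buffer)`: the return type states
// that the caller owns the array.
std::unique_ptr<char[]> make_buffer(std::size_t size) {
    return std::make_unique<char[]>(size);  // elements are zero-initialized
}

// Hypothetical caller: no delete[] anywhere, even on the exception path.
bool use_buffer(bool fail) {
    try {
        std::unique_ptr<char[]> data = make_buffer(100);
        data[0] = 'x';
        if (fail) {
            throw std::runtime_error("simulated failure");
        }
        assert(data[0] == 'x');
        return true;
    } catch (const std::runtime_error&) {
        return false;  // the buffer was already released during stack unwinding
    }
}
```

Unlike the raw pointer version, the length is not carried along, so std::vector<char> remains the better fit when the caller needs the size.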

In 'proper' C++ you would return an object that contains the memory allocation somewhere inside of it. Something like a std::vector.

Your function should not return a naked pointer to some memory. The pointer, after all, can be copied. Then you have the ownership problem: Who actually owns the memory and should delete it? You also have the problem that a naked pointer might point to a single object on the stack, on the heap, or to a static object. It could also point to an array at these places. Given that all you return is a pointer, how are users supposed to know?
What you should do instead is to return an object that manages its resource in an appropriate way. (Look up RAII.) Given the fact that the resource in this case is an array of char, either a std::string or a std::vector seems to be best:
int f(std::vector<char>& buffer);
std::vector<char> buffer;
int result = f(buffer);

Why not do the same way as malloc() - void* malloc( size_t numberOfBytes )? This way the number of bytes is the input parameter and the allocated block address is the return value.
UPD:
In comments you say that f() basically performs some action besides allocating memory. In this case using std::vector is a much better way.
void f( std::vector<char>& buffer )
{
buffer.clear();
// generate data and add it to the vector
}
the caller will just pass an allocated vector:
std::vector<char> buffer;
f( buffer );
// buffer.size() will now return the number of elements to work with

Pass the pointer by reference...
int f(char* &buffer)
However you may wish to consider using reference counted pointers such as boost::shared_array to manage the memory if you are just starting this out.
e.g.
int f(boost::shared_array<char> &buffer)

Use RAII (Resource Acquisition Is Initialization) design pattern.
http://en.wikipedia.org/wiki/RAII
Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization)

Just return the pointer:
char * f() {
return new char[100];
}
Having said that, you probably do not need to mess with explicit allocation like this - instead of arrays of char, use std::string or std::vector<char> instead.

If all f() does with the buffer is to return it (and its length), let it just return the length, and have the caller new it. If f() also does something with the buffer, then do as polyglot suggested.
Of course, there may be a better design for the problem you want to solve, but for us to suggest anything would require that you provide more context.

The proper style is probably not to use a char* but a std::vector or a std::string depending on what you are using char* for.
About the problem of passing a parameter to be modified, instead of passing a pointer, pass a reference. In your case:
int f(char*&);
and if you follow the first advice:
int f(std::string&);
or
int f(std::vector<char>&);

Actually, the smart thing to do would be to put that pointer in a class. That way you have better control over its destruction, and the interface is much less confusing to the user.
class Cookie {
public:
Cookie () : pointer (new char[100]) {};
~Cookie () {
delete[] pointer;
}
private:
char * pointer;
// Prevent copying. Otherwise we have to make these "smart" to prevent
// destruction issues.
Cookie(const Cookie&);
Cookie& operator=(const Cookie&);
};

Provided that f does a new[] to match, it will work, but it's not very idiomatic.
Assuming that f fills in the data and is not just a malloc()-alike you would be better wrapping the allocation up as a std::vector<char>
void f(std::vector<char> &buffer)
{
// compute length
int len = ...
std::vector<char> data(len);
// fill in data
...
buffer.swap(data);
}
EDIT -- remove the spurious * from the signature

I guess you are trying to allocate a one dimensional array. If so, you don't need to pass a pointer to pointer.
int f(char* &buffer)
should be sufficient. And the usage scenario would be:
char* data;
int data_length = f(data);
// ...
delete[] data;
