Initialize a string with a char pointer with zero-copy - c++

Say I am given a long, null-terminated C string as char* text_ptr. I own text_ptr and I am responsible for free()ing it. Currently, I use text_ptr and free() it each time after use.
I am trying to improve memory safety a bit by wrapping it in a C++ class so that I can enjoy the benefits of RAII. There could be many ways to achieve this. A naive way is: string text_str(text_ptr);. However, by doing so, the memory is copied once and I still need to manually free() my text_ptr. It would be better if I could avoid both the memory copy and the free() (as this text_ptr is created frequently, performance could take a big hit if I copy it each time). My current thought:
Is it possible to transfer the ownership of text_ptr to a string text_str? Hypothetically, I do text_str.data() = text_ptr;.
Thanks

std::string can't receive ownership of an external buffer. The best you can do is std::unique_ptr.
By default std::unique_ptr will use delete (or delete[]), but you need std::free(), so a custom deleter is required:
#include <cstdlib>
#include <memory>
struct FreeDeleter
{
void operator()(void *p) const
{
std::free(p);
}
};
int main()
{
std::unique_ptr<char[], FreeDeleter> ptr((char *)malloc(42));
}
If you also store the length, you can construct a temporary std::string_view from pointer+length when needed, to conveniently read the string.
Or, a oneliner: std::unique_ptr<char[], std::integral_constant<void(*)(void *), std::free>>.
Another one for C++20: std::unique_ptr<char[], decltype([](void *p){std::free(p);})>.

An idea (not sure it’s a good one, tho)
#include <iostream>
#include <string_view>
#include <cstdlib>
#include <cstring>
#include <memory>
struct my_string_view : public std::string_view
{
using std::string_view::string_view;
std::shared_ptr<char[]> _data;
explicit my_string_view( char * data )
: std::string_view(data)
, _data{data, free}
{ }
};
void f( const my_string_view & s )
{
std::cout << "f: \"" << s << "\"\n";
}
int main()
{
my_string_view s( strdup( "Hello world!" ) );
f( s );
std::cout << "main: \"" << s << "\"\n";
}
(This version requires C++17: both std::string_view and array support in std::shared_ptr were added in that standard. Note also that strdup() comes from POSIX rather than from the C++ standard library.)

Related

When should I use std::string / std::string_view for parameter / return type

Introduction
I'm writing a communication application. Before C++17 (without Boost), I used std::string and const references to it, as in cls1.
Since C++17, I have introduced std::string_view into my code, as in cls2.
However, I don't have a clear policy for when I should use std::string_view. My communication application receives data from the network and stores it in recv_buffer. It then creates some application classes from recv_buffer.
Construction
If I focus only on cls1's constructor, move construction is efficient. But I wonder where the parameter s comes from. If it originally comes from recv_buffer, I can create a std::string_view at the receiving (very early) point and, as long as recv_buffer is alive, use std::string_view everywhere. If I need to store a part of recv_buffer, I then create a std::string.
The only exception I noticed is when recv_buffer always contains the complete data for my application class. In this case, move construction is efficient.
Getter
I think using std::string_view as the return type has an advantage: some member functions, such as substr(), are efficient. I don't see any disadvantage so far.
Question
I suspect that I might be seeing only the pros of std::string_view. Before rewriting a lot of code, I would like to know your ideas.
PoC code
#include <string>
#include <string_view>
#include <utility>
struct cls1 {
explicit cls1(std::string s):s_(std::move(s)) {}
std::string const& get() const { return s_; }
private:
std::string s_;
};
struct cls2 {
explicit cls2(std::string_view s):s_(s) {}
std::string_view get() const { return s_; }
private:
std::string s_;
};
#include <iostream>
int main() {
// If all of the receive buffer is the target
{
std::string recv_buffer = "ABC";
cls1 c1(std::move(recv_buffer)); // move construct
std::cout << c1.get().substr(1, 2) << std::endl; // create new string
}
{
std::string recv_buffer = "ABC";
cls2 c2(recv_buffer); // copy happened
std::cout << c2.get().substr(1, 2) << std::endl; // doesn't create new string
}
// If a part of the receive buffer is the target
{
std::string recv_buffer = "<<<ABC>>>";
cls1 c1(recv_buffer.substr(3, 3)); // copy happened and move construct
std::cout << c1.get().substr(1, 2) << std::endl; // create new string
}
{
std::string recv_buffer = "<<<ABC>>>";
std::string_view ref = recv_buffer;
cls2 c2(ref.substr(3, 3)); // string create from the part of buffer directly
std::cout << c2.get().substr(1, 2) << std::endl; // doesn't create new string
}
}
Running Demo: https://wandbox.org/permlink/TW8w3je3q3D46cjk
std::string_view is a way to get some std::string const member functions without creating a std::string if you have some char* or you want to reference subset of a string.
Consider it as a const reference. If the object it refers to vanishes (or changes) for any reason, you have a problem. If your code can return a reference, you can return a string_view.
Example:
#include <string.h>
#include <iostream>
#include <string_view>
int main()
{
char* a = new char[10];
strcpy(a,"Hello");
std::string_view s(a);
std::cout << s; // OK
delete[] a;
std::cout << s; // whoops: undefined behavior. Had it been a std::string, there would be no problem, since it would hold a copy
}
Edit: It doesn't have a c_str() member because this needs the creation of a \0 at the end of the substring which cannot be done without modification.
Don't return a string view when:
The caller needs a null terminated string. This is often the case when dealing with C API's.
You don't store the string itself somewhere. You do store the string in a member in this case.
Do realise that the string view is invalidated by operations on the original string that reallocate or modify its buffer (for example, anything that changes the capacity), as well as by the destruction of the original string. If the caller needs the string for longer than the lifetime of the object that stores it, they can copy from the view into their own storage.

How to use a unique_ptr after passing it to a function?

I just started learning the new C++ memory model:
#include <string>
#include <iostream>
#include <memory>
void print(std::unique_ptr<std::string> s) {
std::cout << *s << " " << s->size() << "\n";
}
int main() {
auto s = std::make_unique<std::string>("Hello");
print(std::move(s));
std::cout << *s;
return 0;
}
Right now, calling cout << *s; results in a segfault, as it should. I understand why it happens. But I would also like to know if there's a way to get the ownership back. I'd like to be able to use a value after passing it to a function.
If you don't want to transfer ownership of the owned object, then don't pass the unique_ptr to the function. Instead, pass a reference or a raw pointer to the function (in modern C++ style, a raw pointer is usually understood to be non-owning). In the case where you just want to read the object, a const reference is usually appropriate:
void print(const std::string&);
// ...
print(*s);

reinterpret_cast between std::unordered_map

I have the following unordered_maps:
struct a{
std::string b;
};
int main()
{
std::unordered_map<std::string, std::string> source;
source["1"] = "Test";
source["2"] = "Test2";
std::unordered_map<std::string, a> dest = *reinterpret_cast<std::unordered_map<std::string, a>*>(&source);
std::cout << dest["1"].b << std::endl;
std::cout << dest["2"].b << std::endl;
}
Using a reinterpret_cast I convert source into dest. This works since struct a only consists of a std::string.
My question: Is this actually good practice? GCC yields the following warning:
dereferencing type-punned pointer will break strict-aliasing rules
Can I safely ignore this? Or is there any potential drawback of just casting the raw bytes of STL container?
(cpp.sh/5r2rh)
No, that is not good practice. Your code is not safe. In fact it's the opposite: undefined behavior, meaning sometimes it works and sometimes it doesn't, without telling you.
The real problem is that you have no "legal" way to convert a std::string to a struct a. This isn't C: don't treat things as plain bytes, use the type system of the language. Then the compiler will help you avoid bad mistakes.
This is my solution:
#include <unordered_map>
#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <utility>
struct a {
std::string b;
a () = default;
a (const std::string& b) : b(b){}
};
int main() {
std::unordered_map<std::string, std::string> source;
source["1"] = "Test";
source["2"] = "Test2";
std::unordered_map<std::string, a> dest;
std::transform(source.cbegin(),source.cend(),std::inserter(dest,dest.end()),[](const auto& value)
{
// Build the destination pair explicitly; a is constructed from the string.
// (Returning a tuple here only converts to a pair from C++23 on.)
return std::pair<const std::string, a>(value.first,value.second);
});
std::cout << dest["1"].b << std::endl;
std::cout << dest["2"].b << std::endl;
}
If you have performance concerns, you can also add a move constructor and more, but trust me: readable, clean code is fast code. The bottleneck is rarely this conversion code; it is more likely the use of maps, copying instead of moving, and other things. But don't optimize prematurely.

Passing a string as a const char*

I have access to a class (not written by me) that takes a const char* as a parameter in the constructor. If I have a string that I want to pass the value of as a parameter, what is the safe way to pass it, keeping in mind that the string and the class object may have different scopes?
I don't have access to the source code for the class, so don't assume it's doing something sane like copying the string into a class member.
As a concrete example, this doesn't work:
#include <iostream>
#include <string>
class example {
public:
example(const char*);
const char* str;
};
example::example(const char* a) : str(a) {}
int main() {
std::string* a=new std::string("a");
example thisDoesntWork(a->c_str());
std::cout << thisDoesntWork.str << std::endl;
delete a;
std::cout << thisDoesntWork.str << std::endl; //The pointer is now invalid
a=new std::string("b");
std::cout << thisDoesntWork.str << std::endl;
}
Replacing the constructor with this works (so far as I can tell) but is clearly pretty awful:
example thisDoesWorkButIsAwful((new const std::string(*a))->c_str()); //Memory leak!
Similarly:
char* buffer=new char[a->size()+1];
strcpy(buffer,a->c_str()); //with #include <string.h> added up top
example thisAlsoLeaks(buffer); //Leaks unless buffer is delete[]d later
But again, this is prone to memory leaks.
My main idea at the moment is to make a wrapper class around example that copies the string into a char * buffer and deletes the buffer when it goes out of scope, but that seems a little heavy-handed. Is there an easier/better way?
Fundamentally, something needs to hold on to the memory - either you do it yourself or have it done automatically.
One way to do it automatically:
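class SuperThatHoldsIt
{
protected: // accessible to HoldingExample only
std::string mString ;
explicit SuperThatHoldsIt ( std::string const& str )
: mString ( str ) { }
} ;
class HoldingExample
: private SuperThatHoldsIt // listed first, so mString is constructed first
, public example
{
public:
explicit HoldingExample ( std::string const& string )
: SuperThatHoldsIt ( string )
, example ( mString.c_str() ) // mString is already alive here
{ }
} ;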
Then create it in a std::shared_ptr (or boost::shared_ptr) which will hold on to it.
std::string myString ( "Hello, world!" ) ;
std::shared_ptr<HoldingExample> value = std::make_shared<HoldingExample> ( myString ) ;
Now this holds onto the memory AND the structure.
Notes:
The reason HoldingExample derives from two base classes is so that the order of construction works out: base classes are always initialized before member variables, in the order in which they are listed. If mString were a member of HoldingExample, example would be constructed before it. By putting it in a base class that is listed first, we can initialize that base class and then use its member variable.
If you pass this into a function, like
callFunction ( *value ) ;
and they hold on to that const char* after you've let go of your value, then they are left with a dangling pointer, and you really can't get around that.

Overallocating with new/delete

Using malloc and free, it is easy to allocate structures with extra data beyond the end. But how do I accomplish the same with new/ delete?
I know I could use placement new syntax along with malloc for the allocation part, but will delete work properly and portably if I place an object in memory allocated by malloc?
What I want to accomplish is the same as the following example, but using new/ delete instead of malloc/ free, so that constructors/destructors will be called properly:
#include <cstdlib>
#include <cstring>
#include <iostream>
class Hamburger {
int tastyness;
public:
char *GetMeat();
};
char *Hamburger::GetMeat() {
return reinterpret_cast<char *>(this) + sizeof(Hamburger);
}
int main(int argc, char* argv[])
{
Hamburger* hb;
// Allocate a Hamburger with 4 extra bytes to store a string.
hb = reinterpret_cast<Hamburger*>(malloc(sizeof(Hamburger) + 4));
strcpy(hb->GetMeat(), "yum");
std::cout << "hamburger is " << hb->GetMeat() << std::endl;
free(hb);
}
Output: hamburger is yum
You can do this without resorting to malloc/free or undefined behavior (I'm not sure about the reinterpret_cast, but at least construction/destruction can be done just fine).
To allocate the memory you can just call the global operator new directly. After that you use good old placement new to construct the object there. You have to guard the ctor-call though, since the "placement delete" function that's called if the ctor fails will not release any memory but just do nothing (just as placement new does nothing).
To destroy the object afterwards you can (and may) call the destructor directly, and to release the memory you can call the global operator delete.
I think it should also be OK to just delete it as you would any normal object, since calling the destructor and global operator delete afterwards is just what the normal delete will do, but I'm not 100% sure.
Your example modified like that:
#include <cstdlib>
#include <new>
#include <cstring>
#include <iostream>
class Hamburger {
int tastyness;
public:
char *GetMeat();
};
char *Hamburger::GetMeat() {
return reinterpret_cast<char *>(this) + sizeof(Hamburger);
}
int main(int argc, char* argv[])
{
Hamburger* hb;
// Allocate space for a Hamburger with 4 extra bytes to store a string.
void* space = operator new(sizeof(Hamburger) + 4);
// Construct the burger in that space
hb = new (space) Hamburger; // TODO: guard ctor call (release memory if ctor fails)
strcpy(hb->GetMeat(), "yum"); // OK to call member function on burger now
std::cout << "hamburger is " << hb->GetMeat() << std::endl;
// To delete we have to do 2 things
// 1) call the destructor
hb->~Hamburger();
// 2) deallocate the space
operator delete(hb);
}
If I were you, I'd use placement new and an explicit destructor call instead of delete.
template< typename D, typename T >
D *get_aux_storage( T *x ) {
return reinterpret_cast< D * >( x + 1 );
}
int main() {
char const *hamburger_identity = "yum";
void *hamburger_room = malloc( sizeof( Hamburger )
+ strlen( hamburger_identity ) + 1 );
Hamburger *hamburger = new( hamburger_room ) Hamburger;
strcpy( get_aux_storage< char >( hamburger ), hamburger_identity );
std::cout << get_aux_storage< char const >( hamburger ) << '\n';
hamburger->~Hamburger(); // explicit destructor call
free( hamburger_room );
}
Of course, this kind of optimization should only be done after profiling has proven the need. (Will you really save memory this way? Will this make debugging harder?)
There might not be a significant technical difference, but to me new and delete signal that an object is being created and destroyed, even if the object is just a character. When you allocate an array of characters as a generic "block," it uses the array allocator (specially suited to arrays) and notionally constructs characters in it. Then you must use placement new to construct a new object on top of those characters, which is essentially object aliasing or double construction, followed by double destruction when you want to delete everything.
It's better to sidestep the C++ object model with malloc/free than to twist it to avoid dealing with data as objects.
Oh, an alternative is to use a custom operator new, but it can be a can of worms so I do not recommend it.
struct Hamburger {
int tastyness;
public:
char *GetMeat();
static void *operator new( size_t size_of_bread, size_t size_of_meat )
{ return malloc( size_of_bread + size_of_meat ); }
static void operator delete( void *ptr )
{ free( ptr ); }
};
int main() {
char const *hamburger_identity = "yum";
size_t meat_size = strlen( hamburger_identity ) + 1;
Hamburger *hamburger = new( meat_size ) Hamburger;
strcpy( hamburger->GetMeat(), hamburger_identity );
std::cout << hamburger->GetMeat() << '\n';
delete hamburger; // calls the class's operator delete, which free()s the block
}
Urgh. Well, let's see. You definitely can't allocate with new and dispose with free, or allocate with malloc and dispose with delete. You have to use matching pairs.
I suppose you could use "hp = new char[sizeof(Hamburger) + 4]" and "delete[]((char *) hp)", along with explicit constructor/destructor calls, if you really wanted to do this.
The only reason I can think why you'd want to do this would be you didn't have the Hamburger source -- i.e., it was a library class. Otherwise you'd just add a member to it! Can you explain why you'd want to use this idea?
There is another way that you could approach this if you have a reasonably constrained set of padding amounts. You could make a template class with the padding amount as the template parameter and then instantiate it with the set of possible padding amounts. So if, for example, you knew that you were only going to need padding of 16, 32, or 64 bytes, you could do it like this:
template <int Pad>
class Hamburger {
int tastiness;
char padding[Pad];
};
template class Hamburger<16>;
template class Hamburger<32>;
template class Hamburger<64>;
Is there any reason why the straightforward, easy and safe way is not applicable?
class Hamburger {
public:
void Extend( const std::string& pExtension) {
mContent += pExtension;
}
const std::string& GetMeat() ...
private:
std::string mContent;
};
int main() {
Hamburger hb;
hb.Extend("yum");
std::cout << "hamburger is " << hb.GetMeat() << std::endl;
}