Construct a string without allocation - c++

Suppose we have a struct with a char * member. To read the content of this member, we would normally write:
char const * get_member() { return object.member; }
This returns only a pointer, without any allocation.
If we now want to return a string instead, is it possible to let the string simply use that pointer, rather than copying the content and constructing a new string object?
string const & get_member() { return object.member; }
Will the code above perform a memory allocation? What kind of extra work does this method do compared with the char const * one?

No, it is not possible. std::string always allocates its own memory and cannot take ownership of a pre-existing buffer.
You can either return the pointer as before, or use a std::string member in the first place and return a reference to it. Alternatively, return a std::string_view, which can be used with either a char* or a std::string member. std::string_view is only standard since C++17, but some standard libraries ship it as an extension for earlier compilers, and non-standard implementations also exist.
The struct comes from a C library; I just want to wrap it in C++ and, at the same time, not hurt performance.
Then it seems that returning a std::string would not be an appropriate design.
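The std::string_view option mentioned above could look roughly like the following sketch. It assumes C++17; c_record and record_wrapper are hypothetical names standing in for the C library struct and its C++ wrapper, not anything from the original library.

```cpp
#include <string_view>

// Hypothetical stand-in for the struct that comes from the C library.
struct c_record {
    const char* member;
};

// Hypothetical C++ wrapper; it neither owns nor copies the character data.
class record_wrapper {
public:
    explicit record_wrapper(const c_record& rec) : object_(rec) {}

    // No allocation: the view stores the pointer plus a length.
    // The only extra work over returning the raw pointer is one strlen.
    std::string_view get_member() const {
        return object_.member ? std::string_view(object_.member)
                              : std::string_view();
    }

private:
    c_record object_;
};
```

The returned view is only valid while the underlying buffer is alive, exactly like the raw pointer it replaces.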

Related

Secure way to create a getter for a char[ ] field

So I'm doing an assignment and I have to use a C-style string as a field of my class. I'm trying to create a secure getter for it but I have no idea how to do it.
The field I'm using:
char name[20];
So far i tried these things:
char* Car::getName() {
return &name[0];
}
^ the only one that works, but if I'm right it returns the address of the field, which allows the caller to edit it.
char* Car::getName() {
char ret[20];
strcpy(ret, name);
return ret;
}
^ another approach I tried, with no success
So I'd like to ask: what is the proper way to create such a getter? (Sorry if it's a naive question, but I haven't used much C in my life.)
Declare the return type to be const char *, and make the function itself const (to indicate the object is unchanged by the call). Callers can ignore that through casts, but that's on them; this is for correctness, not security.
const char* Car::getName() const {
return &name[0];
}
They'll have a pointer to the class internals, but it's explicitly stated that said pointer is to read-only data.
Other options include exposing it as a std::string that you construct on demand (return std::string(name); with return type std::string), or returning a smart pointer of some kind to dynamically allocated memory (equivalent morally to returning std::string). Making a copy and returning a dumb pointer (e.g. return strdup(name);) is a bad idea, since now you've opened the door to memory leaks; the caller has to manually free/delete memory.
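To make the first two options concrete, here is a small sketch. The constructor and the name getNameCopy are assumptions added for illustration; only getName and the name[20] field come from the question.

```cpp
#include <cstring>
#include <string>

class Car {
public:
    // Hypothetical constructor: copies at most 19 characters and always terminates.
    explicit Car(const char* n) {
        std::strncpy(name, n, sizeof(name) - 1);
        name[sizeof(name) - 1] = '\0';
    }

    // Read-only pointer to the internal buffer; no copy, valid while the Car lives.
    const char* getName() const { return name; }

    // Independent copy built on demand; it stays valid after the Car is gone.
    std::string getNameCopy() const { return std::string(name); }

private:
    char name[20];
};
```

The pointer version costs nothing but ties the caller to the lifetime of the Car; the std::string version pays for a copy and removes that dependency.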

std::string and const char *

If I use
const char * str = "Hello";
there is no memory allocation/deallocation needed at run time.
If I use
const std::string str = "Hello";
will there be an allocation via new/malloc inside the string class or not? I could look for it in the assembly, but I am not good at reading it.
If the answer is "yes, there will be malloc/new", why? Why can't std::string just pass through to the inner const char pointer, and do the actual memory allocation only if I need to edit the string?
will there be an allocation via new/malloc inside the string class or not?
It depends. The string object will have to provide some memory to store the data, since that's its job. Some implementations use a "small string optimisation", where the object contains a small buffer, and only allocates from the heap if the string is too large for that.
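One way to observe the small string optimisation described above is to check whether the character data lives inside the string object itself. This is only a probe under assumptions: stored_inline is a hypothetical helper, and the result for short strings depends on the standard library in use.

```cpp
#include <cstdint>
#include <string>

// Returns true when the characters are stored inside the std::string object
// (small buffer), false when they live elsewhere (typically the heap).
bool stored_inline(const std::string& s) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto end = begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= begin && data < end;
}
```

Calling it on a string such as "Hello" will typically return true on implementations with a small buffer, while a string much longer than sizeof(std::string) cannot fit inline and returns false.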
Why can't std::string just pass through to the inner const char pointer, and do the actual memory allocation only if I need to edit the string?
What you describe isn't necessarily an optimisation (since it needs an extra runtime check whenever you modify the string), and in any case isn't allowed by the iterator invalidation rules.
There is string_view (a proposal when this answer was written, standard as std::string_view since C++17), which lets you access an existing character sequence with an interface like that of a const string, without any memory management. It doesn't allow you to modify the string.
Naive implementation of std::string will require a heap allocation however compilers are allowed to optimize statically initialized std::string objects by replacing them with objects of alternative implementations if the initialized strings are not modified during runtime.
You may use const std::string when you instantiate immutable strings to ensure better optimization.
The C++ standard doesn't actually say you can't just store a pointer to an external string (and a length). However, that would mean that EVERY operation that may modify the string (e.g. char& std::string::operator[](size_t index)) has to ensure that the string is actually writeable. A large share of string usage does NOT merely store a constant string: it modifies the string, or uses a string that isn't a constant input anyway, so that check would be paid everywhere for little gain.
So, some problems are;
std::string s = "Hello";
char &c = s[1];
c = 'a'; // Should make string to "Hallo".
what if:
char buffer[1000];
cin.getline(buffer, sizeof buffer); // Reads "Hello"
std::string s = buffer;
cin.getline(buffer, sizeof buffer); // Reads "World"
What is the value in s now?
There are so many such cases where, if the string merely kept a pointer to the original characters instead of copying them, it would cause more problems, with little or no benefit.
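The buffer-reuse problem can be reproduced without any input stream. In this sketch, copy_vs_view is a hypothetical helper and std::string_view (C++17) plays the role of a string that only stores a pointer and a length.

```cpp
#include <cstring>
#include <string>
#include <string_view>
#include <utility>

// Returns what an owning copy and a non-owning view each report after the
// underlying buffer has been overwritten.
std::pair<std::string, std::string> copy_vs_view() {
    char buffer[16] = "Hello";
    std::string owned = buffer;        // copies the five characters
    std::string_view view(buffer, 5);  // only remembers pointer + length
    std::strcpy(buffer, "World");      // simulates the second read
    return {owned, std::string(view)};
}
```

The owning copy still holds "Hello", while the view now reads "World", which is the surprise a pointer-only std::string would hand to every caller that reuses a buffer.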

'Moving' unchanging char array into const std::string [duplicate]

I have a function f returning a char*. The function documentation says:
The user must delete returned string
I want to construct a std::string from it. The trivial things to do is:
char* cstring = f();
std::string s(cstring);
delete cstring;
Is it possible to do it better using C++ features? I would like to write something like
std::string(cstring)
avoiding the leak.
std::string will make a copy of the null terminated string argument and manage that copy. There's no way to have it take ownership of a string you pass to it. So what you're doing is correct, the only improvement I'd suggest is a check for nullptr, assuming that is a valid return value for f(). This is necessary because the std::string constructor taking a char const * requires that the argument point to a valid array, and not be nullptr.
char* cstring = f();
std::string s(cstring ? cstring : "");
delete[] cstring; // You most likely want delete[] and not delete
Now, if you don't need all of std::string's interface, or if avoiding the copy is important, then you can use a unique_ptr to manage the string instead.
std::unique_ptr<char[]> s{f()}; // will call delete[] automatically
You can get access to the managed char * via s.get() and the string will be deleted when s goes out of scope.
Even if you go with the first option, I'd suggest storing the return value of f() in a unique_ptr before passing it to the std::string constructor. That way if the construction throws, the returned string will still be deleted.
There is no standard way for a std::string to take ownership of a buffer you pass.
Nor to take responsibility of cleaning up such a buffer.
In theory, an implementation, knowing all the internal details, could add a way for a std::string to take over buffers allocated with their allocator, but I don't know of any implementation which does.
Nor is there any guarantee doing so would actually be advantageous, depending on implementation-details.
This code can never be correct:
std::string s(cstring);
delete cstring;
The std::string constructor that takes a character pointer, requires a NUL-terminated string. So it is multiple characters.
delete cstring is scalar delete.
Either you are trying to create a string from a character scalar (in which case, why the indirection?)
std::string s(1, cstring[0]); // one-character string; there is no constructor taking a lone char
delete cstring;
or you have multiple characters, and should delete accordingly
std::string s(cstring);
delete [] cstring;
Check the other answers for the recommended way to make sure delete[] gets used, e.g.
std::string(std::unique_ptr<char[]>(f()).get())
std::string steal_char_buffer( std::unique_ptr<char[]> buff ) {
std::string s = buff?buff.get():""; // handle null pointers
return s;
}
std::string steal_char_buffer( char* str ) {
    std::unique_ptr<char[]> buff(str); // manage lifetime; unique_ptr<char[]> needs a non-const pointer
    return steal_char_buffer(std::move(buff));
}
now you can type
std::string s = steal_char_buffer(f());
and you get a std::string out of f().
You may want to make the argument of steal_char_buffer be a char*&&. It is mostly pointless, but it might lead to some useful errors.
If you can change the interface of f, make it return a std::string directly or a std::unique_ptr<char[]>.
Another good idea is to wrap f in another function that returns a std::unique_ptr<char[]> or std::string:
std::unique_ptr<char[]> good_f() {
return std::unique_ptr<char[]>(f());
}
and/or
std::string good_f2() {
auto s = good_f();
return steal_char_buffer( std::move(s) );
}

Allocate a struct containing a string in a single allocation

I'm working on a program that stores a vital data structure as an unstructured string with program-defined delimiters (so we need to walk the string and extract the information we need as we go) and we'd like to convert it to a more structured data type.
In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself. The length of the string will always be known at allocation time. We've determined through testing that doubling the number of allocations required for each of these data types is an unacceptable cost. Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation? If we were using cstrings I'd just have a char * in the struct and point it to the end of the struct after allocating a block big enough for the struct and string, but we'd prefer std::string if possible.
Most of my experience is with C, so please forgive any C++ ignorance displayed here.
If you have such rigorous memory needs, then you're going to have to abandon std::string.
The best alternative is to find or write an implementation of basic_string_ref (a proposal for the C++ standard library at the time of writing, since standardized in C++17 as std::basic_string_view), which is really just a char* coupled with a size. But it has all of the (non-mutating) functions of std::basic_string. Then you use a factory function to allocate the memory you need (your struct size + string data), and then use placement new to initialize the basic_string_ref.
Of course, you'll also need a custom deletion function, since you can't just pass the pointer to "delete".
Given the previously linked to implementation of basic_string_ref (and its associated typedefs, string_ref), here's a factory constructor/destructor, for some type T that needs to have a string on it:
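template<typename T> T *Create(..., const char *theString, size_t lenstr)
{
    // One block: the T object first, then the characters and a terminating NUL.
    char *memory = new char[sizeof(T) + lenstr + 1];
    char *text = memory + sizeof(T);
    memcpy(text, theString, lenstr);
    text[lenstr] = '\0';
    try
    {
        // Point the string_ref at the copy inside the block, not at the caller's buffer.
        return new(memory) T(..., string_ref(text, lenstr));
    }
    catch(...)
    {
        delete[] memory;
        throw;
    }
}
template<typename T> T *Create(..., const std::string & theString)
{
    // T cannot be deduced from the arguments, so it is passed explicitly.
    return Create<T>(..., theString.c_str(), theString.length());
}
template<typename T> T *Create(..., const string_ref &theString)
{
    return Create<T>(..., theString.data(), theString.length());
}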
template<typename T> void Destroy(T *pValue)
{
pValue->~T();
char *memory = reinterpret_cast<char*>(pValue);
delete[] memory;
}
Obviously, you'll need to fill in the other constructor parameters yourself. And your type's constructor will need to take a string_ref that refers to the string.
If you are using std::string, you can't really do one allocation for both structure and string, and you also can't make the allocation of both to be one large block. If you are using old C-style strings it's possible though.
If I understand you correctly, you are saying that through profiling you have determined that having to allocate a string and another data member in your data structure imposes an unacceptable cost on your application.
If that's indeed the case I can think of a couple solutions.
You could pre-allocate all of these structures up front, before your program starts. Keep them in some kind of fixed collection so they aren't copy-constructed, and reserve enough buffer in your strings to hold your data.
Controversial as it may seem, you could use old C-style char arrays. It seems like you are forgoing much of the reason to use strings in the first place, which is the memory management. However, in your case, since you know the needed buffer sizes at start-up, you could handle this yourself. If you like the other facilities that string provides, bear in mind that much of that is still available in the <algorithm> header.
Take a look at Variable Sized Struct C++ - the short answer is that there's no way to do it in vanilla C++.
Do you really need to allocate the container structs on the heap? It might be more efficient to have those on the stack, so they don't need to be allocated at all.
Indeed two allocations can seem too high. There are two ways to cut them down though:
Do a single allocation
Do a single dynamic allocation
It might not seem so different, so let me explain.
1. You can use the struct hack in C++
Yes this is not typical C++
Yes this requires special care
Technically it requires:
disabling the copy constructor and assignment operator
making the constructor and destructor private and provide factory methods for allocating and deallocating the object
Honestly, this is the hard-way.
2. You can avoid allocating the outer struct dynamically
Simple enough:
struct M {
Kind _kind;
std::string _data;
};
and then pass instances of M on the stack. Move operations should guarantee that the std::string is not copied (you can always disable copy to make sure of it).
This solution is much simpler. The only (slight) drawback is in memory locality... but on the other hand the top of the stack is already in the CPU cache anyway.
C-style strings can always be converted to std::string as needed. In fact, there's a good chance that your observations from profiling are due to fragmentation of your data rather than simply the number of allocations, and creating an std::string on demand will be efficient. Of course, not knowing your actual application this is just a guess, and really one can't know this until it's tested anyways. I imagine a class
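class my_class {
public:
    // Builds a std::string on demand from the trailing characters.
    std::string data() const { return _data; }
    const char* data_as_c_str() const // In case you really need it!
    { return _data; }
private:
    int _type;
    char _data[1]; // really as long as the factory function allocated
};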
Note I used a standard clever C trick for data layout: _data is as long as you want it to be, so long as your factory function allocates the extra space for it. IIRC, C99 even gave a special syntax for it:
struct my_struct {
int type;
char data[];
};
which has good odds of working with your C++ compiler. (Is this in the C++11 standard?)
Of course, if you do do this, you really need to make all of the constructors private and friend your factory function, to ensure that the factory function is the only way to actually instantiate my_class -- it would be broken without the extra memory for the array. You'll definitely need to make operator= private too, or otherwise implement it carefully.
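A factory along those lines might be sketched as follows. This is an illustration under assumptions, not a vetted implementation: record, create and deleter are hypothetical names, std::string_view needs C++17, and the characters are reached by stepping past the header object, which is the same layout trick as the trailing array and is widely used but not something the standard explicitly blesses.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <string_view>

class record {
public:
    // Destroys the header and releases the single block.
    struct deleter {
        void operator()(record* r) const {
            r->~record();
            ::operator delete(r);
        }
    };
    using ptr = std::unique_ptr<record, deleter>;

    // The only way to build a record: one allocation for header + characters.
    static ptr create(int type, std::string_view text) {
        void* raw = ::operator new(sizeof(record) + text.size() + 1);
        record* r = new (raw) record(type, text.size());
        char* dst = reinterpret_cast<char*>(r + 1);  // bytes right after the header
        if (!text.empty()) std::memcpy(dst, text.data(), text.size());
        dst[text.size()] = '\0';
        return ptr(r);
    }

    int type() const { return type_; }
    std::string_view data() const {
        return {reinterpret_cast<const char*>(this + 1), size_};
    }

    // Copying would lose the trailing characters, so it is disabled.
    record(const record&) = delete;
    record& operator=(const record&) = delete;

private:
    record(int type, std::size_t size) : type_(type), size_(size) {}
    ~record() = default;

    int type_;
    std::size_t size_;
};
```

Returning a unique_ptr with a custom deleter keeps callers from using plain delete on the block, which is the kind of mistake the private constructors are meant to prevent.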
Rethinking your data types is probably a good idea.
For example, one thing you can do is, rather than trying to put your char arrays into a structured data type, use a smart reference instead. A class that looks like
class structured_data_reference {
public:
structured_data_reference(const char *data):_data(data) {}
std::string get_first_field() const {
// Do something interesting with _data to get the first field
}
private:
const char *_data;
};
You'll want to do the right thing with the other constructors and assignment operator too (probably disable assignment, and implement something reasonable for move and copy). And you may want reference counted pointers (e.g. std::shared_ptr) throughout your code rather than bare pointers.
Another hack that's possible is to just use std::string, but store the type information in the first entry (or first several). This requires accounting for that whenever you access the data, of course.
I'm not sure this exactly addresses your problem. One way to optimize memory allocation in C++ is to use a pre-allocated buffer together with the 'placement new' operator.
I tried to solve your problem as I understood it.
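#include <new>
#include <string>
using std::string;

unsigned char *myPool = new unsigned char[10000];
struct myStruct
{
    myStruct(const char* aSource1, const char* aSource2)
    {
        // Each object gets its own slot after the struct itself, so nothing overlaps.
        // Note: the strings may still heap-allocate their character data internally.
        original = new (myPool + sizeof(myStruct)) string(aSource1); //placement new
        data = new (myPool + sizeof(myStruct) + sizeof(string)) string(aSource2); //placement new
    }
    ~myStruct()
    {
        original->~string(); // placement new needs an explicit destructor call
        data->~string();
    }
    string* original;
    string* data;
};
int main()
{
    myStruct* aStruct = new (myPool) myStruct("h1", "h2");
    // Use the struct
    aStruct->~myStruct(); // destroy in place; the pool itself is freed below
    delete [] myPool;
    return 0;
}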
[Edit] After the comment from NicolBolas, the problem is a bit clearer. I decided to write one more answer, even though in reality it is not much more advantageous than using a raw character array. But I still believe that this is well within the stated constraints.
The idea is to provide a custom allocator for the string class, as specified in this SO question.
In the implementation of the allocate method, use the placement new as
pointer allocate(size_type n, void * = 0)
{
// fail if we try to allocate too much
if((n * sizeof(T))> max_size()) { throw std::bad_alloc(); }
//T* t = static_cast<T *>(::operator new(n * sizeof(T)));
T* t = new (/* provide the address of the original character buffer*/) T[n];
return t;
}
The constraint is that, for the placement new to work, the original string address must be known to the allocator at run time. This can be achieved by setting it explicitly from outside before the new string member is created. However, this is not so elegant.
In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself.
I have a feeling that maybe you are not exploiting C++'s type system to its maximum potential here. It looks and feels very C-ish (that is not a proper word, I know). I don't have concrete examples to post here since I don't have any idea about the problem you are trying to solve.
Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation?
I believe that you are worrying about the structure allocation followed by a copy of the string to the structure member? This ideally shouldn't happen (but of course, this depends on how and when you are initializing the members). C++11 supports move construction. This should take care of any extra string copies that you are worried about.
You should really, really post some code to make this discussion worthwhile :)
a vital data structure as an unstructured string with program-defined delimiters
One question: Is this string mutable? If not, you can use a slightly different data-structure. Don't store copies of parts of this vital data structure but rather indices/iterators to this string which point to the delimiters.
// assume that !, [, ], $, % etc. are your program defined delims
const std::string vital = "!id[thisisdata]$[moredata]%[controlblock]%";
// define a special struct
enum Type { ... };
struct Info {
size_t start, end;
Type type;
// define appropriate ctors
};
// parse the string and return Info objects
std::vector<Info> parse(const std::string& str) {
std::vector<Info> v;
// loop through the string looking for delims
for (size_t b = 0, e = str.size(); b < e; ++b) {
// on hitting one such delim create an Info
switch( str[ b ] ) {
case '%':
...
case '$':
// initializing the start and then move until
// you get the appropriate end delim
}
// use push_back/emplace_back to insert this newly
// created Info object back in the vector
v.push_back( Info( start, end, kind ) );
}
return v;
}
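A runnable variant of the same idea, under a simplified grammar that is an assumption of this sketch and not the asker's real format: every field is one delimiter character followed by "[payload]". Field and parse_fields are hypothetical names, and std::string_view requires C++17.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// The delimiter that introduced the field plus the half-open index range
// [start, end) of its payload inside the source string. Nothing is copied.
struct Field {
    char kind;
    std::size_t start;
    std::size_t end;
};

std::vector<Field> parse_fields(std::string_view src) {
    std::vector<Field> out;
    for (std::size_t i = 0; i + 1 < src.size(); ++i) {
        if (src[i + 1] != '[') continue;             // not "<delim>["
        const std::size_t close = src.find(']', i + 2);
        if (close == std::string_view::npos) break;  // unterminated field
        out.push_back({src[i], i + 2, close});
        i = close;                                   // the loop's ++i steps past ']'
    }
    return out;
}
```

A payload can then be read on demand with src.substr(f.start, f.end - f.start), so the only allocation is the vector of index records.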

const object and const constructor

Is there any way to know if an object is a const object or regular object, for instance consider the following class
class String
{
String(const char* str);
};
If the user creates a const object from String, then there is no reason to copy the passed native string, because he will not manipulate it; the only things he will do are get the string size, search the string, and call other functions that do not change the string.
There is a very good reason for copying - you can't know that the lifetime of the const char * is the same as that of the String object. And no, there is no way of knowing that you are constructing a const object.
Unfortunately, C++ does not provide a way to do what you are attempting. Simply passing a const char * does not guarantee the lifetime of the memory being pointed to. Consider:
char * a = new char[10];
char const *b = a;
String c (b);
delete[] a;
// c is now broken
There is no way for you to know. You could write a class that tightly interacts with String and that creates a constant string pointing to an external buffer (by making the corresponding constructor private and making the interacting class a nested class or a friend of String).
If all you worry about is doing dynamic memory management on a potentially small constant string, you can implement the Small String Optimization (also Small Object/Buffer Optimization). It works by having an embedded buffer in your string class, and copying each string up to some predefined size into that buffer, and each string that's larger to a dynamically allocated storage (the same technique is used by boost::function for storing small sized function objects).
class String {
union {
char *dynamicptr;
char buffer[16];
};
bool isDynamic;
};
There are clever techniques for storing even the length of the embedded string into the buffer itself (storing its length as buffer[15] and similar trickeries).
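Filling in the skeleton above gives something like the following sketch. The member functions, the separate length field and the trailing underscores are assumptions added for illustration; copying is simply disabled to keep the example short.

```cpp
#include <cstddef>
#include <cstring>

class String {
public:
    explicit String(const char* str) : length_(std::strlen(str)) {
        if (length_ < sizeof(buffer_)) {
            // Fits (with its NUL) in the embedded buffer: no allocation.
            isDynamic_ = false;
            std::memcpy(buffer_, str, length_ + 1);
        } else {
            // Too long: fall back to dynamically allocated storage.
            isDynamic_ = true;
            dynamicptr_ = new char[length_ + 1];
            std::memcpy(dynamicptr_, str, length_ + 1);
        }
    }
    ~String() { if (isDynamic_) delete[] dynamicptr_; }

    String(const String&) = delete;
    String& operator=(const String&) = delete;

    const char* c_str() const { return isDynamic_ ? dynamicptr_ : buffer_; }
    std::size_t size() const { return length_; }
    bool is_dynamic() const { return isDynamic_; }

private:
    union {
        char* dynamicptr_;
        char buffer_[16];
    };
    std::size_t length_;
    bool isDynamic_;
};
```

With a 16-byte buffer, strings of up to 15 characters are stored inline and anything longer goes to the heap.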
You could use const_string to do what you're looking for. However, even with const_string you have to "tell" it that the string doesn't need to be copied.
const char* foo = "c-string";
boost::const_string bar(foo); // will copy foo
boost::const_string baz(boost::ref(foo)); // assumes foo will always be a valid pointer.
If the user creates a const object from String, then there is no reason to copy the passed native string, because he will not manipulate it; the only things he will do are get the string size, search the string, and call other functions that do not change the string.
Oh yes there is. Just because it is passed as const doesn't mean that it actually is const outside of the constructor call, and it especially doesn't mean it won't be destroyed while the string object still exists. The keyword const on a function argument only means that the function won't modify or delete it (trying to implement a function that modifies a const argument will result in a compiler error), but there's no way for the function to know what happens outside.
What you're looking for is basically a COW (copy on write) string. Such things are entirely possible, but getting them to work well is somewhat non-trivial. In a multithreaded environment, getting good performance can go beyond non-trivial into the decidedly difficult range.