Passing pointer to first string element as buffer - c++

I have a bunch of old linux code which does something like this:
int size_of_buffer = /*stuff computed dynamically*/;
char *buffer = malloc(size_of_buffer);
recv(socket, buffer, size_of_buffer, 0);
//do some processing of the buffer as string
free(buffer);
When I was migrating it to C++ I changed it like this:
int size_of_buffer = /*stuff computed dynamically*/;
const auto buffer = make_unique<char[]>(size_of_buffer);
recv(socket, buffer.get(), size_of_buffer, 0);
const std::string str_buffer = buffer.get();
//do some processing on str_buffer
Which you can't fail to notice causes double memory allocation and potentially multiple copying of data. My idea now is to pass the pointer to first character of the std::string with reserved storage, like this:
int size_of_buffer = /*stuff computed dynamically*/;
std::string buffer;
buffer.reserve(size_of_buffer);
recv(socket, &(buffer[0]), size_of_buffer, 0);
//do some processing on buffer
Is above code safe and well defined or there are some caveats and dangers that need to be avoided?

A similar question was asked here. The short answer is: it is not possible without copying.
Below C++17, there is no non-const overload of std::string::data(), and
1) Modifying the character array accessed through the const overload of data has undefined behavior.
Hence, you cannot modify the string through data.
Since C++11,
data() + i == std::addressof(operator[](i)) for every i in [0, size()].
Therefore, you also cannot modify the string through &(buffer[0]).
Before C+11, it is actually not very clear to me, what exactly is allowed, so maybe modifying through &(buffer.begin()) is ok, but I don't think so.
On cppreference, there is actually a quote that confounds me a bit
The elements of a basic_string are stored contiguously, that is, for a basic_string s, &*(s.begin() + n) == &*s.begin() + n for any n in [0, s.size()), or, equivalently, a pointer to s[0] can be passed to functions that expect a pointer to the first element of a (null-terminated (since C++11)) CharT[] array.
I think this means to const array, since otherwise it would not fit to the rest of the documentation and right now I do not have the time to go through the standard.
Since C++17, your code is ok, if you use std::string::resize instead of reserve, since data() only guarantees a valid range on [data(), data() + size()) (or you can just construct the string with the right size). There is no-non-copy way to create a string from a char *.
You can use a std::string_view, which has a constant constructor from char *. It does exactly what you want here, since it has no ownership on the pointer and memory.

Related

Is this how the size() function really works in std::string?

I wrote a simple function that can get the size of a std::string class object, and I know that size() function in std::string does the same job, So I wanted to know if the size() function really works like my function or if it is more complicated? If it's more complicated, then how?
int sizeOfString(const string str) {
int i=0;
while (str[i] != '\0') {
++i;
}
return i;
}
An std::string can contain null bytes, so your sizeOfString() function will produce a different result on the following input:
std::string evil("abc\0def", 7);
As for your other question: the size() method simply reads out an internal size field, so it is always constant time, while yours is linear in the size of the string.
You can peek at the implementation of std::string::size for various implementations for yourself: libc++, MSVC, libstdc++.
No.
Firstly, a std::string can contain NUL characters that count as part of the length, so you can't use '\0' as a sentinal, in the way you would for C-strings.
Secondly, The Standard guarantees that std::string::size has constant complexity.
In practice there are a few slightly different ways to represent a std::string:
pointer to start of buffer, buffer size, length of current data - size() just has to return the length member.
pointer to start of buffer, pointer to end of current data, pointer to end of buffer - size() has to return a simple calculation.
It is different than your implementation.
Your function iterates over the string until it find a null byte. Null terminated string are how string are handled in C through char*. In C++ a string is a full object with member variables.
Specifically for C++, the size of the string is stored as part of the object, making the size() function simply read out the value of a variable.
For a interesting talk about how a string works in C++ check out this video from CppCon: https://www.youtube.com/watch?v=kPR8h4-qZdk
No. Not at all like that.
std::string actually maintains the size as one of its data member. Think of std::string as a container that keeps a pointer to the actual data(a char*) and length of that data separate.
When you call size(), it actually just returns this size, hence it's O(1).
One example to highlight it's effect in practicality will be
// WRONG IMPLEMENTATION
int wrongChangeLengthToZero(std::string& s)
{
assert(s.size() != 0);
s[0]='\';
return s.size(); // Won't return 0
}
// CORRECT
int correctChangeLengthToZero(std::string& s)
{
assert(s.size() != 0);
s.resize(0);
return s.size(); // Will return 0
}

Using C functions to manipulate std::string

Sometimes you need to fill an std::string with characters constructed by a C function. A typical example is this:
constexpr static BUFFERSIZE{256};
char buffer[BUFFERSIZE];
snprint (buffer, BUFFERSIZE, formatstring, value1, value2);
return std::string(buffer);
Notice how we first need to fill a local buffer, and then copy it to the std::string.
The example becomes more complex if the maximum buffersize is calculated and not necessarily something you want to store on the stack. For example:
constexpr static BUFFERSIZE{256};
if (calculatedBufferSize>BUFFERSIZE)
{
auto ptr = std::make_unique<char[]>(calculatedBufferSize);
snprint (ptr.get(), calculatedBufferSize, formatstring, value1, value2);
return std::string(ptr.get());
}
else
{
char buffer[BUFFERSIZE];
snprint (buffer, BUFFERSIZE, formatstring, value1, value2);
return std::string(buffer);
}
This makes the code even more complex, and if the calculatedBufferSize is larger than what we want on the stack, we essentially do the following:
allocate memory (make_unique)
fill the memory with the wanted result
allocate memory (std::string)
copy memory to the string
deallocate memory
Since C++17 std::string has a non-const data() method, implying that this is the way to manipulate strings. So it seems tempting to do this:
std::string result;
result.resize(calculatedBufferSize);
snprint (result.data(), calculatedBufferSize, formatstring, value1, value2);
result.resize(strlen(result.c_str()));
return result;
My experiments show that the last resize is needed to make sure that the length of the string is reported correctly. std::string::length() does not search for a nul-terminator, it just returns the size (just like std::vector does).
Notice that we have much less allocation and copying going on:
allocate memory (resize string)
fill the memory with the wanted result
To be honest, although it seems to be much more efficient, it also looks very 'un-standard' to me. Can somebody indicate whether this is behavior allowed by the C++17 standard? Or is there another way to have this kind of manipulations in a more efficient way?
Please don't refer to question Manipulating std::string, as that question is about much more dirty logic (even using memset).
Also don't answer that I must use C++ streams (std::string_stream, efficient?, honestly?). Sometimes you simply have efficient logic in C that you want to reuse.
Modifying the contents pointed to by data() is fine, assuming you do not set the value at data() + size() to anything other than the null character. From [string.accessors]:
charT* data() noexcept;
Returns: A pointer p such that p + i == addressof(operator[](i)) for each i in [0, size()].
Complexity: Constant time.
Remarks: The program shall not modify the value stored at p + size() to any value other than charT(); otherwise, the behavior is undefined.
The statement result.resize(strlen(result.c_str())); does look a bit odd, though. std::snprintf returns the number of characters written; using that value to resize the string would be more appropriate. Additionally, it looks slightly neater to construct the string with the correct size instead of constructing an empty one that is immediately resized:
std::string result(maxlen, '\0');
result.resize(std::max(0, std::snprintf(result.data(), maxlen, fmt, value1, value2)));
return result;
The general approach looks fine to me. I would make couple of changes.
Capture the return value of snprinf.
Use it to perform error check and avoid a call to strlen.
std::string result;
result.resize(calculatedBufferSize);
int n = snprint (result.data(), calculatedBufferSize, formatstring, value1, value2);
if ( n < 0 )
{
// Problem. Deal with the error.
}
result.resize(n);
return result;

How to get a non-const pointer to the internal string inside std::string [duplicate]

It is well defined C++ to pass std::vector to C APIs that expect an output array like so because std::vector is contiguous:
std::vector<char> myArray(arraySize);
cStyleAPI(&myArray[0], arraySize);
Is it safe to pass std::string in the same manner to C APIs? Is there any guarantee in standard C++03 that std::string is contiguous and works in the same way as std::vector does in this situation?
If the C API function requires read-only access to the contents of the std::string then use the std::string::c_str() member function to pass the string. This is guaranteed to be a null terminated string.
If you intend to use the std::string as an out parameter, C++03 doesn't guarantee that the stored string is contiguous in memory, but C++11 does. With the latter it is OK to modify the string via operator[] as long as you don't modify the terminating NUL character.
So, I know this has been answered already, but I saw your comment in Praetorian's answer:
It's an OpenGL driver bug that results in the return value for the
maximum length string being broken. See
https://forums.geforce.com/default/topic/531732/glgetactiveattrib-invalid/.
glGetActiveAttrib won't try to write to the pointer returned by the
new[] call with the 0 size allocation, but the string isn't null
terminated. Then, later in the code, the non-null terminated string is
copied into a std::string for storage, which results in a read buffer
overflow. Very confusing to me too, and just checking here to see if
std::string would make things easier to follow.
Uhm... forgive me but if this is your problem then all these solutions seem to be overly complicated. If the problem boils down to the fact that you get a 0 as the buffer size that you need (which means that you will end up with a string that's not NULL-terminated, since there's no space for the NULL terminator) then simply make sure a NULL terminator is always present:
int arraySize;
/* assume arraySize is set to the length we need */
...
/* overallocate by 1 to add an explicit NULL terminator. Just in case. */
char *ptr = malloc(arraySize + 1);
if(ptr != NULL)
{
/* zero out the chunk of memory we got */
memset(ptr, 0, arraySize + 1);
/* call OpenGL function */
cStyleAPI(ptr, arraySize);
/* behold, even if arraySize is 0 because of OpenGL, ptr is still NULL-terminated */
assert(ptr[arraySize] == 0);
/* use it */
...
/* lose it */
free(ptr);
}
This seems, to me, to be the simplest, sanest solution.
Yes, but you'll need to pass them using the c_str method to guarantee null-termination.
No, it isn't, but generally because C strings are assumed to be zero-terminated, which your pointer-to-char isn't. (If you use string, you can use string::c_str() instead, that's 0-terminated.)
Anyways, C++11 does require the elements of vector to be contiguous in memory.
cStyleAPI(&myArray[0], arraySize);
If your cStyleAPI receives a char* for input, that is what std::string::c_str() is for.
If it receives a pre-allocated char* for output, then no. In that case, you should use a std::vector<char> or std::array<char>.
use resize_and_overwrite
https://en.cppreference.com/w/cpp/string/basic_string/resize_and_overwrite
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1072r10.html
[[nodiscard]] static inline string formaterr (DWORD errcode) {
string strerr;
strerr.resize_and_overwrite(2048, [errcode](char* buf, size_t buflen) {
// https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessage
return FormatMessageA(
FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
nullptr,
errcode,
0,
buf,
static_cast<DWORD>(buflen),
nullptr
);
});
return strerr;
}

std::string.c_str() has different value than std::string?

I have been working with C++ strings and trying to load char * strings into std::string by using C functions such as strcpy(). Since strcpy() takes char * as a parameter, I have to cast it which goes something like this:
std::string destination;
unsigned char *source;
strcpy((char*)destination.c_str(), (char*)source);
The code works fine and when I run the program in a debugger, the value of *source is stored in destination, but for some odd reason it won't print out with the statement
std::cout << destination;
I noticed that if I use
std::cout << destination.c_str();
The value prints out correctly and all is well. Why does this happen? Is there a better method of copying an unsigned char* or char* into a std::string (stringstreams?) This seems to only happen when I specify the string as foo.c_str() in a copying operation.
Edit: To answer the question "why would you do this?", I am using strcpy() as a plain example. There are other times that it's more complex than assignment. For example, having to copy only X amount of string A into string B using strncpy() or passing a std::string to a function from a C library that takes a char * as a parameter for a buffer.
Here's what you want
std::string destination = source;
What you're doing is wrong on so many levels... you're writing over the inner representation of a std::string... I mean... not cool man... it's much more complex than that, arrays being resized, read-only memory... the works.
This is not a good idea at all for two reasons:
destination.c_str() is a const pointer and casting away it's const and writing to it is undefined behavior.
You haven't set the size of the string, meaning that it won't even necessealy have a large enough buffer to hold the string which is likely to cause an access violation.
std::string has a constructor which allows it to be constructed from a char* so simply write:
std::string destination = source
Well what you are doing is undefined behavior. Your c_str() returns a const char * and is not meant to be assigned to. Why not use the defined constructor or assignment operator.
std::string defines an implicit conversion from const char* to std::string... so use that.
You decided to cast away an error as c_str() returns a const char*, i.e., it does not allow for writing to its underlying buffer. You did everything you could to get around that and it didn't work (you shouldn't be surprised at this).
c_str() returns a const char* for good reason. You have no idea if this pointer points to the string's underlying buffer. You have no idea if this pointer points to a memory block large enough to hold your new string. The library is using its interface to tell you exactly how the return value of c_str() should be used and you're ignoring that completely.
Do not do what you are doing!!!
I repeat!
DO NOT DO WHAT YOU ARE DOING!!!
That it seems to sort of work when you do some weird things is a consequence of how the string class was implemented. You are almost certainly writing in memory you shouldn't be and a bunch of other bogus stuff.
When you need to interact with a C function that writes to a buffer there's two basic methods:
std::string read_from_sock(int sock) {
char buffer[1024] = "";
int recv = read(sock, buffer, 1024);
if (recv > 0) {
return std::string(buffer, buffer + recv);
}
return std::string();
}
Or you might try the peek method:
std::string read_from_sock(int sock) {
int recv = read(sock, 0, 0, MSG_PEEK);
if (recv > 0) {
std::vector<char> buf(recv);
recv = read(sock, &buf[0], recv, 0);
return std::string(buf.begin(), buf.end());
}
return std::string();
}
Of course, these are not very robust versions...but they illustrate the point.
First you should note that the value returned by c_str is a const char* and must not be modified. Actually it even does not have to point to the internal buffer of string.
In response to your edit:
having to copy only X amount of string A into string B using strncpy()
If string A is a char array, and string B is std::string, and strlen(A) >= X, then you can do this:
B.assign(A, A + X);
passing a std::string to a function from a C library that takes a char
* as a parameter for a buffer
If the parameter is actually const char *, you can use c_str() for that. But if it is just plain char *, and you are using a C++11 compliant compiler, then you can do the following:
c_function(&B[0]);
However, you need to ensure that there is room in the string for the data(same as if you were using a plain c-string), which you can do with a call to the resize() function. If the function writes an unspecified amount of characters to the string as a null-terminated c-string, then you will probably want to truncate the string afterward, like this:
B.resize(B.find('\0'));
The reason you can safely do this in a C++11 compiler and not a C++03 compiler is that in C++03, strings were not guaranteed by the standard to be contiguous, but in C++11, they are. If you want the guarantee in C++03, then you can use std::vector<char> instead.

Can I get a non-const C string back from a C++ string?

Const-correctness in C++ is still giving me headaches. In working with some old C code, I find myself needing to assign turn a C++ string object into a C string and assign it to a variable. However, the variable is a char * and c_str() returns a const char []. Is there a good way to get around this without having to roll my own function to do it?
edit: I am also trying to avoid calling new. I will gladly trade slightly more complicated code for less memory leaks.
C++17 and newer:
foo(s.data(), s.size());
C++11, C++14:
foo(&s[0], s.size());
However this needs a note of caution: The result of &s[0]/s.data()/s.c_str() is only guaranteed to be valid until any member function is invoked that might change the string. So you should not store the result of these operations anywhere. The safest is to be done with them at the end of the full expression, as my examples do.
Pre C++-11 answer:
Since for to me inexplicable reasons nobody answered this the way I do now, and since other questions are now being closed pointing to this one, I'll add this here, even though coming a year too late will mean that it hangs at the very bottom of the pile...
With C++03, std::string isn't guaranteed to store its characters in a contiguous piece of memory, and the result of c_str() doesn't need to point to the string's internal buffer, so the only way guaranteed to work is this:
std::vector<char> buffer(s.begin(), s.end());
foo(&buffer[0], buffer.size());
s.assign(buffer.begin(), buffer.end());
This is no longer true in C++11.
There is an important distinction you need to make here: is the char* to which you wish to assign this "morally constant"? That is, is casting away const-ness just a technicality, and you really will still treat the string as a const? In that case, you can use a cast - either C-style or a C++-style const_cast. As long as you (and anyone else who ever maintains this code) have the discipline to treat that char* as a const char*, you'll be fine, but the compiler will no longer be watching your back, so if you ever treat it as a non-const you may be modifying a buffer that something else in your code relies upon.
If your char* is going to be treated as non-const, and you intend to modify what it points to, you must copy the returned string, not cast away its const-ness.
I guess there is always strcpy.
Or use char* strings in the parts of your C++ code that must interface with the old stuff.
Or refactor the existing code to compile with the C++ compiler and then to use std:string.
There's always const_cast...
std::string s("hello world");
char *p = const_cast<char *>(s.c_str());
Of course, that's basically subverting the type system, but sometimes it's necessary when integrating with older code.
If you can afford extra allocation, instead of a recommended strcpy I would consider using std::vector<char> like this:
// suppose you have your string:
std::string some_string("hello world");
// you can make a vector from it like this:
std::vector<char> some_buffer(some_string.begin(), some_string.end());
// suppose your C function is declared like this:
// some_c_function(char *buffer);
// you can just pass this vector to it like this:
some_c_function(&some_buffer[0]);
// if that function wants a buffer size as well,
// just give it some_buffer.size()
To me this is a bit more of a C++ way than strcpy. Take a look at Meyers' Effective STL Item 16 for a much nicer explanation than I could ever provide.
You can use the copy method:
len = myStr.copy(cStr, myStr.length());
cStr[len] = '\0';
Where myStr is your C++ string and cStr a char * with at least myStr.length()+1 size. Also, len is of type size_t and is needed, because copy doesn't null-terminate cStr.
Just use const_cast<char*>(str.data())
Do not feel bad or weird about it, it's perfectly good style to do this.
It's guaranteed to work in C++11. The fact that it's const qualified at all is arguably a mistake by the original standard before it; in C++03 it was possible to implement string as a discontinuous list of memory, but no one ever did it. There is not a compiler on earth that implements string as anything other than a contiguous block of memory, so feel free to treat it as such with complete confidence.
If you know that the std::string is not going to change, a C-style cast will work.
std::string s("hello");
char *p = (char *)s.c_str();
Of course, p is pointing to some buffer managed by the std::string. If the std::string goes out of scope or the buffer is changed (i.e., written to), p will probably be invalid.
The safest thing to do would be to copy the string if refactoring the code is out of the question.
std::string vString;
vString.resize(256); // allocate some space, up to you
char* vStringPtr(&vString.front());
// assign the value to the string (by using a function that copies the value).
// don't exceed vString.size() here!
// now make sure you erase the extra capacity after the first encountered \0.
vString.erase(std::find(vString.begin(), vString.end(), 0), vString.end());
// and here you have the C++ string with the proper value and bounds.
This is how you turn a C++ string to a C string. But make sure you know what you're doing, as it's really easy to step out of bounds using raw string functions. There are moments when this is necessary.
If c_str() is returning to you a copy of the string object internal buffer, you can just use const_cast<>.
However, if c_str() is giving you direct access tot he string object internal buffer, make an explicit copy, instead of removing the const.
Since c_str() gives you direct const access to the data structure, you probably shouldn't cast it. The simplest way to do it without having to preallocate a buffer is to just use strdup.
char* tmpptr;
tmpptr = strdup(myStringVar.c_str();
oldfunction(tmpptr);
free tmpptr;
It's quick, easy, and correct.
In CPP, if you want a char * from a string.c_str()
(to give it for example to a function that only takes a char *),
you can cast it to char * directly to lose the const from .c_str()
Example:
launchGame((char *) string.c_str());
C++17 adds a char* string::data() noexcept overload. So if your string object isn't const, the pointer returned by data() isn't either and you can use that.
Is it really that difficult to do yourself?
#include <string>
#include <cstring>
char *convert(std::string str)
{
size_t len = str.length();
char *buf = new char[len + 1];
memcpy(buf, str.data(), len);
buf[len] = '\0';
return buf;
}
char *convert(std::string str, char *buf, size_t len)
{
memcpy(buf, str.data(), len - 1);
buf[len - 1] = '\0';
return buf;
}
// A crazy template solution to avoid passing in the array length
// but loses the ability to pass in a dynamically allocated buffer
template <size_t len>
char *convert(std::string str, char (&buf)[len])
{
memcpy(buf, str.data(), len - 1);
buf[len - 1] = '\0';
return buf;
}
Usage:
std::string str = "Hello";
// Use buffer we've allocated
char buf[10];
convert(str, buf);
// Use buffer allocated for us
char *buf = convert(str);
delete [] buf;
// Use dynamic buffer of known length
buf = new char[10];
convert(str, buf, 10);
delete [] buf;