Is this good practice for reading into a string in C++?

I have a function in C++ which reads the contents of an HTTP request body into a std::string.
I came up with the following code:
void handle_request_body(int connfd, HttpRequest &req) {
    unsigned long size_to_read;
    try {
        size_to_read = std::stoul(req.headers().at("content-length"));
    } catch (std::out_of_range const &) {
        return;
    }
    char *buf = new char[size_to_read + 1];
    memset(buf, 0, size_to_read + 1);
    read(connfd, buf, size_to_read);
    req._body.append(buf);
    delete[] buf;
}
This is a little ugly to me as I have to use new since variable-sized arrays are not allowed.
I then tried to read directly to a string instead with the following code:
void handle_request_body(int connfd, HttpRequest &req) {
    unsigned long size_to_read;
    try {
        size_to_read = std::stoul(req.headers().at("content-length"));
    } catch (std::out_of_range const &) {
        return;
    }
    std::string buf(size_to_read + 1, 0);
    read(connfd, buf.data(), size_to_read);
    req._body = buf;
}
I find the second method much cleaner, but I'm worried as to whether it is considered bad practice to read directly into a std::string using its data() method.
Is there a better way to do this?
Any insight is much appreciated!

It really depends on what your read function does under the hood.
If you have control over the read function, I strongly suggest you don't use a pointer, but rather a reference to a std container.
The resizable std containers don't guarantee that pointers into them will keep pointing at the same memory: if the container reallocates its storage, your pointer will no longer be valid. That is fine in this example because nothing else touches the string, but in a lot of other applications it would be extremely unsafe!
Something like:
void read(int id, std::string& dest, int readLength){
    // whatever code gets the data stream (up to readLength bytes) into data
    dest += data;
}
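A minimal sketch of that design, assuming the source is a POSIX file descriptor; read_append and its signature are hypothetical, not an existing API. The string is grown first, ::read writes into the new tail, and the string is then shrunk to what actually arrived, so no pointer into the string outlives the call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: appends up to max_len bytes read from fd to dest.
// Returns what ::read returned: bytes appended, 0 at EOF, -1 on error.
ssize_t read_append(int fd, std::string &dest, std::size_t max_len)
{
    const std::size_t old_size = dest.size();
    dest.resize(old_size + max_len);                      // room inside the string itself
    ssize_t r = ::read(fd, &dest[0] + old_size, max_len);  // contiguous since C++11
    // Drop the unused tail (all of it on EOF or error).
    dest.resize(old_size + (r > 0 ? static_cast<std::size_t>(r) : 0));
    return r;
}
```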
If it has to be a C-style buffer for some OS API call, it's probably best to use a std::unique_ptr<char[]>, to let the memory clean up after itself.
std::unique_ptr<char[]> buffer = std::make_unique<char[]>(size + 1);
memset(&buffer[0], 0, size + 1);
I don't recommend reading from the OS char by char as this usually has a huge performance overhead for every call, compared to reading it all at once.
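A sketch of that unique_ptr approach wrapped in a function, assuming the OS call is POSIX ::read on a file descriptor; read_via_buffer is a hypothetical name. Building the std::string from (pointer, count) rather than from the pointer alone keeps embedded '\0' bytes and makes the extra terminator byte unnecessary.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: one ::read call into a heap buffer owned by unique_ptr.
std::string read_via_buffer(int fd, std::size_t size)
{
    // make_unique<char[]>(n) value-initializes (zero-fills) the array,
    // so no memset is needed.
    std::unique_ptr<char[]> buffer = std::make_unique<char[]>(size);
    ssize_t n = ::read(fd, buffer.get(), size);
    if (n <= 0)
        return std::string();
    // (pointer, length) constructor: embedded NULs survive.
    return std::string(buffer.get(), static_cast<std::size_t>(n));
}   // buffer is freed here automatically, even if the string constructor throws
```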

As @Galik commented, this is opinion-based.
But what we can state is:
In C++ we should not use raw pointers for owned memory
new and delete should be avoided
C-Style arrays should be avoided
Reading into data() will work, but it is not that nice (my opinion). Anyway, it is better than using new and risking a memory leak.
You could, in a simple for loop, read one byte after the other from the file and then append each byte to the std::string with +=. But this is also very clumsy.
The best option would be to use C++ language features and libraries, if you are allowed to.
BTW. std::string is also a "variable-sized array"
Recommendation: Go with your second solution
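Neither version in the question checks the return value of read, and a socket read may return fewer bytes than requested. A minimal sketch of the second solution that loops until the announced length has arrived, assuming a POSIX descriptor; read_exact is a hypothetical helper name.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reads up to n bytes from fd directly into a std::string,
// retrying on short reads and EINTR. Stops early at EOF.
std::string read_exact(int fd, std::size_t n)
{
    std::string buf(n, '\0');      // no extra byte for a terminator needed
    std::size_t got = 0;
    while (got < n) {
        // &buf[got] is writable; buf.data() only becomes non-const in C++17
        ssize_t r = ::read(fd, &buf[got], n - got);
        if (r < 0) {
            if (errno == EINTR)
                continue;          // interrupted by a signal: retry
            throw std::runtime_error("read failed");
        }
        if (r == 0)
            break;                 // EOF before n bytes arrived
        got += static_cast<std::size_t>(r);
    }
    buf.resize(got);               // keep only what was actually received
    return buf;
}
```

handle_request_body could then simply do req._body = read_exact(connfd, size_to_read);. Note also that std::stoul throws std::invalid_argument for a non-numeric Content-Length, which the catch (std::out_of_range) in the question does not cover.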

Related

help with fixing memory leak

I have a member function in which I need to get a char array at run time.
My fear is that if I try
delete buffer;
then I can't
return buffer;
But how do I release the memory I allocated with
char * buffer = new char[size]
The class
class OpenglShaderLoader
{
public:
    char * getLastGlslError()
    {
        char * buffer; // I don't know the size of this until runtime
        int size;
        glGetShaderiv(hShaderId, GL_INFO_LOG_LENGTH, &size); // get size of buffer
        buffer = new char[size];
        //.. fill in the buffer
        return buffer;
    }
};
You should return a std::vector<char>. That way, when the caller finishes using the vector, its contents are freed automatically.
std::vector<char> getLastGlslError()
{
    int size;
    glGetShaderiv(hShaderId, GL_INFO_LOG_LENGTH, &size);
    std::vector<char> buffer(size);
    // fill in the buffer using &buffer[0] as the address
    return buffer;
}
There is a simple adage: for every new there must be a delete. In your case, when you call getLastGlslError on an OpenglShaderLoader, it returns a pointer to the buffer, and it is the caller who must free that memory, for example:
OpenglShaderLoader *ptr = new OpenglShaderLoader();
char *buf = ptr->getLastGlslError();
// do something with buf
delete [] buf;
delete ptr;
As the example shows, responsibility for managing the pointer rests with the caller, outside the function that allocated it.
You'd need another method, such as:
void freeLastGlslError(const char* s)
{
    delete [] s;
}
But since you're using C++, not C, you shouldn't return a char*. For an object-oriented design, use a string class that manages the memory for you, like std::string. (Here's the litmus test to keep in mind: if memory is being freed outside of a destructor, you're probably doing something inadvisable.)
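Here's a trick for doing it:
class A {
public:
    A() : buffer(0) { }
    char *get() { delete [] buffer; buffer = new char[10]; return buffer; }
    ~A() { delete [] buffer; }
private:
    A(const A &);            // not defined: a copy would delete [] the same buffer twice
    A &operator=(const A &); // not defined, for the same reason
    char *buffer;
};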
When you return that pointer, whatever you're returning the pointer to should assume responsibility over that resource (i.e. delete it when done with it).
Alternatively, you can use a smart pointer to automatically delete the memory for you when nothing points to it.
Creating and returning a stl container or class (e.g. std::vector, std::string) is also a viable option.
Don't return a primitive char*. Encapsulate it in a class.
Assuming that the char array is really not a NULL-terminated string, you need to return its size along with it anyway. (It is sort of messy to keep calling glGetShaderiv to get the length, especially if it has performance implications. It's easier to store the size with the allocation.)
Some have suggested using std::string or std::vector as the return. While each of these will work to a varying degree, they don't tell you what it is that is in each instance. Is it a string you print or is it an array of signed 8 bit integers?
A vector might be closer to what you need, but when you're looking at the code a year from now you won't know if the output vector of one method contains shader info when compared to another method that also returns a vector. There may also be implications of vector that make it undesirable for things like filling the buffer by passing a pointer to a device driver method since the storage is technically hidden.
So putting the return in a class that allocates your buffer and stores the size of the allocation allows you to let the return instance go out of scope and delete the buffer when the caller is done with it.
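A minimal sketch of such a class, assuming C++11; ShaderInfoLog and makeLog are hypothetical names, and makeLog only copies a given text where the real getLastGlslError would ask the driver. The class owns the allocation through std::unique_ptr<char[]> and records its size, so returning it by value hands ownership to the caller and the buffer is freed when the caller's object goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical return type: a named owner for a runtime-sized char buffer.
class ShaderInfoLog {
public:
    explicit ShaderInfoLog(std::size_t size)
        : data_(new char[size]()), size_(size) {}   // zero-filled allocation

    char *data() { return data_.get(); }             // e.g. to hand to a C API
    const char *data() const { return data_.get(); }
    std::size_t size() const { return size_; }

private:
    std::unique_ptr<char[]> data_;  // deletes the buffer in the destructor
    std::size_t size_;
};  // copying is implicitly deleted (unique_ptr member), moving is allowed

// Hypothetical stand-in for getLastGlslError(): fills the buffer from text.
ShaderInfoLog makeLog(const char *text)
{
    ShaderInfoLog result(std::strlen(text) + 1);
    std::memcpy(result.data(), text, result.size());  // copies the '\0' too
    return result;                                     // moved out, not copied
}
```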
Has nobody mentioned managed pointers yet?
If you don't need the features of a vector, a smart pointer that owns an array, such as boost::shared_array<char> or, in C++11 and later, std::unique_ptr<char[]>, might also help rather than rolling your own as in tp1's answer.
std::unique_ptr<char[]> getLastGlslError()
{
    int size;
    glGetShaderiv(hShaderId, GL_INFO_LOG_LENGTH, &size);
    std::unique_ptr<char[]> buffer(new char[size]);
    return buffer;
}

Reading raw byte array into std::string

I've been wondering about the following issue: assume I have a C style function that reads raw data into a buffer
int recv_n(int handle, void* buf, size_t len);
Can I read the data directly into a std::string or stringstream without allocating any temporary buffers? For example,
std::string s(100, '\0');
recv_n(handle, s.data(), 100);
I guess this solution has an undefined outcome because, AFAIK, string::c_str and string::data might return a temporary location rather than a pointer to the memory the object actually uses to store its data.
Any ideas?
Why not use a vector<char> instead of a string? That way you can do:
vector<char> v(100, '\0');
recv_n(handle, &v[0], 100);
This seems more idiomatic to me, especially since you aren't using it as a string (you say it's raw data).
Yes, since C++11.
But before C++17 you can't use s.data(), as it returns a char const*.
Try:
std::string s(100, '\0');
recv_n(handle, &s[0], 100);
Depending on situation, I may have chosen a std::vector<char> especially for raw data (though it would all depend on usage of the data in your application).

std::string.resize() and std::string.length()

I'm relatively new to C++ and I'm still getting to grips with the C++ Standard Library. To help transition from C, I want to format a std::string using printf-style formatters. I realise stringstream is a more type-safe approach, but I find myself finding printf-style much easier to read and deal with (at least, for the time being). This is my function:
using namespace std;
string formatStdString(const string &format, ...)
{
    va_list va;
    string output;
    size_t needed;
    size_t used;
    va_start(va, format);
    needed = vsnprintf(&output[0], 0, format.c_str(), va);
    output.resize(needed + 1); // for null terminator??
    va_end(va);
    va_start(va, format);
    used = vsnprintf(&output[0], output.capacity(), format.c_str(), va);
    // assert(used == needed);
    va_end(va);
    return output;
}
This works, kinda. A few things that I am not sure about are:
Do I need to make room for a null terminator, or is this unnecessary?
Is capacity() the right function to call here? I keep thinking length() would return 0 since the first character in the string is a '\0'.
Occasionally while writing this string's contents to a socket (using its c_str() and length()), I have null bytes popping up on the receiving end, which is causing a bit of grief, but they seem to appear inconsistently. If I don't use this function at all, no null bytes appear.
With the current standard (the upcoming standard differs here) there is no guarantee that the internal memory buffer managed by std::string is contiguous, or that the .c_str() method returns a pointer to the internal data representation: the implementation is allowed to generate a contiguous read-only block for that operation and return a pointer into it. A pointer to the actual internal data can be retrieved with the .data() member function, but note that it also returns a constant pointer, i.e. it is not intended for you to modify the contents. The buffer returned by .data() is not necessarily null-terminated; the implementation only needs to guarantee null termination when c_str() is called, so even in implementations where .data() and .c_str() return the same pointer, the implementation can add the \0 to the end of the buffer only when the latter is called.
The standard intended to allow rope implementations, so in principle it is unsafe to do what you are trying, and from the point of view of the standard you should use an intermediate std::vector (guaranteed contiguity, and there is a guarantee that &myvector[0] is a pointer to the first allocated block of the real buffer).
In all implementations I know of, the internal memory handled by std::string is actually a contiguous buffer, but writing through .data() is undefined behavior (you are writing through a pointer to const). Even if incorrect, it might work, but I would avoid it. You should use other libraries that are designed for this purpose, like boost::format.
About the null termination: if you finally decide to follow the path of undefined behavior, you need to allocate extra space for the null terminator, since the library will write it into the buffer. The problem is that, unlike C-style strings, std::strings can hold null characters internally, so you will have to shrink the string down to the characters before the first \0. That is probably the cause of the spurious null characters you are seeing. This means that the bad approach of using vsnprintf (or its family) has to be followed by str.resize( strlen( str.c_str() ) ) to discard everything in the string from the first \0 on.
Overall, I would advise against this approach, and insist on either getting used to the C++ way of formatting, using third-party libraries (boost is third-party, but it is also the most standard non-standard library), using vectors, or managing memory like in C... but that last option should be avoided like the plague.
// A safe way in C++ of using snprintf (vsnprintf is for when you already hold a va_list):
std::vector<char> tmp( 1000 ); // expected maximum size
snprintf( &tmp[0], tmp.size(), "Hi %s", name.c_str() ); // assuming name to be a string
std::string salute( &tmp[0] );
Use boost::format, if you prefer printf() over streams.
Edit: Just to make this clear, actually I fully agree with Alan, who said you should use streams.
I think that there are no guarantees that the storage referenced by &output[0] is contiguous, or that you can write to it.
Use std::vector as the buffer instead; it has been guaranteed to have contiguous storage since C++03.
using namespace std;
string formatStdString(const string &format, ...)
{
    va_list va;
    vector<string::value_type> output(1); // ensure some storage is allocated
    size_t needed;
    size_t used;
    va_start(va, format);
    needed = vsnprintf(&output[0], 0, format.c_str(), va);
    output.resize(needed + 1); // vsnprintf always writes a terminating '\0'
    va_end(va);
    // Here we should ensure that vsnprintf did not fail (it returns a negative value on error)
    va_start(va, format);
    used = vsnprintf(&output[0], output.size(), format.c_str(), va); // use size()
    // assert(used == needed);
    va_end(va);
    return string(output.begin(), output.begin() + needed); // leave out the '\0'
}
NOTE: You'll have to set an initial size to the vector as the statement &output[0] can otherwise attempt to reference a non-existing item (as the internal buffer might not have been allocated yet).
1) You do not need to make space for the null terminator.
2) capacity() tells you how much space the string has reserved internally. length() tells you the length of the string. You probably don't want capacity().
The std::string class takes care of the null terminator for you.
However, as pointed out, since you're using vsnprintf to write into the raw underlying string buffer (C anachronisms die hard...), you will have to ensure there is room for the null terminator.
My implementation for variable argument lists for functions is like this:
std::string format(const char *fmt, ...)
{
    using std::string;
    using std::vector;
    string retStr("");
    if (NULL != fmt)
    {
        va_list marker;
        // initialize variable arguments
        va_start(marker, fmt);
        // Get formatted string length adding one for NULL
        // (_vscprintf and _vsnprintf_s are MSVC-specific; reusing marker for
        // both calls relies on MSVC's va_list being a plain pointer)
        size_t len = _vscprintf(fmt, marker) + 1;
        // Create a char vector to hold the formatted string.
        vector<char> buffer(len, '\0');
        int nWritten = _vsnprintf_s(&buffer[0], buffer.size(), len, fmt,
                                    marker);
        if (nWritten > 0)
        {
            retStr = &buffer[0];
        }
        // Reset variable arguments
        va_end(marker);
    }
    return retStr;
}
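A portable sketch of the same idea using only standard C++11: std::vsnprintf instead of the MSVC-specific _vscprintf/_vsnprintf_s, and va_copy, because a va_list cannot portably be traversed twice. The name format_string is hypothetical. The format is taken as const char* because the standard leaves va_start undefined when the last named parameter is a reference, as it is in the question's formatStdString.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: printf-style formatting into a std::string.
std::string format_string(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);

    // First pass: measure only. vsnprintf consumes a va_list, so use a copy.
    va_list args_copy;
    va_copy(args_copy, args);
    int needed = std::vsnprintf(nullptr, 0, fmt, args_copy);
    va_end(args_copy);

    std::string out;
    if (needed > 0) {
        // Room for the characters plus the '\0' that vsnprintf always writes.
        out.resize(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(&out[0], out.size(), fmt, args);
        out.resize(static_cast<std::size_t>(needed));  // drop the terminator
    }
    va_end(args);
    return out;  // empty on error or for an empty result
}
```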
To help transition from C, I want to format a std::string using printf-style formatters.
Just don't :(
If you do this, you're not actually learning C++ but coding C with a C++ compiler. It's a bad mindset, bad practice, and it propagates the problems that the std::o*stream classes were created to avoid.
I realise stringstream is a more type-safe approach, but I find myself finding printf-style much easier to read and deal with (at least, for the time being).
It's not a more typesafe approach. It is a typesafe approach. More than that, it minimizes dependencies, it lowers the number of issues you have to keep track of (like explicit buffer allocation and keeping track of the null char terminator) and it makes it easier to maintain your code.
Beyond that, it is completely extensible and customizable:
you can extend locale formatting
you can define the i/o operations for custom data types
you can add new types of output formatting
you can add new buffer i/o types (making for example std::clog write to a window)
you can plug in different error handling policies.
The std::o*stream family of classes is very powerful, and once you learn to use it correctly there's little doubt you won't want to go back.
Unless you have very specific requirements your time will probably be much better spent learning the o*stream classes than writing printf in C++.
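For comparison, a sketch of what a printf-style call such as "%s: %05.1f" looks like with std::ostringstream and the <iomanip> manipulators; describe_reading is a hypothetical function, not something from the question.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical example: the stream equivalent of
// snprintf(buf, n, "%s: %05.1f", name, value)
std::string describe_reading(const std::string &name, double value)
{
    std::ostringstream os;
    os << name << ": "
       << std::fixed << std::setprecision(1)  // like %.1f
       << std::setw(5) << std::setfill('0')   // like the 05 width and zero flag
       << std::internal                       // pad after the sign, like printf
       << value;
    return os.str();  // no buffer sizing, no '\0' bookkeeping
}
```

The type of value drives the formatting, so passing an int or a std::string by mistake cannot silently read the wrong bytes the way a mismatched %f can.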

Can I get a non-const C string back from a C++ string?

Const-correctness in C++ is still giving me headaches. In working with some old C code, I find myself needing to turn a C++ string object into a C string and assign it to a variable. However, the variable is a char * and c_str() returns a const char *. Is there a good way to get around this without having to roll my own function to do it?
edit: I am also trying to avoid calling new. I will gladly trade slightly more complicated code for fewer memory leaks.
C++17 and newer:
foo(s.data(), s.size());
C++11, C++14:
foo(&s[0], s.size());
However this needs a note of caution: The result of &s[0]/s.data()/s.c_str() is only guaranteed to be valid until any member function is invoked that might change the string. So you should not store the result of these operations anywhere. The safest is to be done with them at the end of the full expression, as my examples do.
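A small sketch of both forms, assuming a hypothetical C-style function c_upcase(char *, size_t) that modifies the buffer in place and stands in for foo. The pointer is used only for the duration of the call, in line with the caution above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical C-style API that edits a caller-owned buffer in place.
void c_upcase(char *buf, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        buf[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(buf[i])));
}

void shout(std::string &s)
{
#if __cplusplus >= 201703L
    c_upcase(s.data(), s.size());  // C++17: non-const data() overload
#else
    c_upcase(&s[0], s.size());     // C++11/14: &s[0] is writable and contiguous
#endif
}
```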
Pre C++-11 answer:
Since, for reasons I can't explain, nobody has answered this the way I do now, and since other questions are now being closed as duplicates of this one, I'll add this here, even though coming a year too late means it will hang at the very bottom of the pile...
With C++03, std::string isn't guaranteed to store its characters in a contiguous piece of memory, and the result of c_str() doesn't need to point to the string's internal buffer, so the only way guaranteed to work is this:
std::vector<char> buffer(s.begin(), s.end());
foo(&buffer[0], buffer.size());
s.assign(buffer.begin(), buffer.end());
This is no longer true in C++11.
There is an important distinction you need to make here: is the char* to which you wish to assign this "morally constant"? That is, is casting away const-ness just a technicality, and you really will still treat the string as a const? In that case, you can use a cast - either C-style or a C++-style const_cast. As long as you (and anyone else who ever maintains this code) have the discipline to treat that char* as a const char*, you'll be fine, but the compiler will no longer be watching your back, so if you ever treat it as a non-const you may be modifying a buffer that something else in your code relies upon.
If your char* is going to be treated as non-const, and you intend to modify what it points to, you must copy the returned string, not cast away its const-ness.
I guess there is always strcpy.
Or use char* strings in the parts of your C++ code that must interface with the old stuff.
Or refactor the existing code to compile with the C++ compiler and then use std::string.
There's always const_cast...
std::string s("hello world");
char *p = const_cast<char *>(s.c_str());
Of course, that's basically subverting the type system, but sometimes it's necessary when integrating with older code.
If you can afford an extra allocation, instead of the commonly recommended strcpy I would consider using std::vector<char> like this:
// suppose you have your string:
std::string some_string("hello world");
// you can make a vector from it like this:
std::vector<char> some_buffer(some_string.begin(), some_string.end());
// if the C function expects a NUL-terminated string, add the terminator:
some_buffer.push_back('\0');
// suppose your C function is declared like this:
// some_c_function(char *buffer);
// you can just pass this vector to it like this:
some_c_function(&some_buffer[0]);
// if that function wants a buffer size as well,
// just give it some_buffer.size() (which now includes the '\0')
To me this is a bit more of a C++ way than strcpy. Take a look at Meyers' Effective STL Item 16 for a much nicer explanation than I could ever provide.
You can use the copy method:
len = myStr.copy(cStr, myStr.length());
cStr[len] = '\0';
Where myStr is your C++ string and cStr a char * with at least myStr.length()+1 size. Also, len is of type size_t and is needed, because copy doesn't null-terminate cStr.
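A sketch combining std::string::copy with a std::vector<char> as the destination, so no new/delete is involved; to_c_buffer is a hypothetical helper. Pass &buf[0] (or buf.data()) to the char * function while the vector is alive.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: a writable, NUL-terminated copy of s that frees itself.
std::vector<char> to_c_buffer(const std::string &s)
{
    std::vector<char> buf(s.length() + 1);          // all '\0' initially
    std::size_t len = s.copy(&buf[0], s.length());  // copy() does not add '\0'
    buf[len] = '\0';                                // explicit, as in the answer
    return buf;
}
```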
Just use const_cast<char*>(str.data())
Do not feel bad or weird about it, it's perfectly good style to do this.
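In C++11 the characters are guaranteed to be stored contiguously, so this works in practice, although the C++11 wording still says the program shall not modify the array returned by data(); C++17 resolves this with a non-const data() overload. The fact that it's const-qualified at all is arguably a mistake by the original standard before it; in C++03 it was possible to implement string as a discontinuous list of memory, but no one ever did it. There is not a compiler on earth that implements string as anything other than a contiguous block of memory, so feel free to treat it as such with complete confidence.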
If you know that the std::string is not going to change, a C-style cast will work.
std::string s("hello");
char *p = (char *)s.c_str();
Of course, p is pointing to some buffer managed by the std::string. If the std::string goes out of scope or the buffer is changed (i.e., written to), p will probably be invalid.
The safest thing to do would be to copy the string if refactoring the code is out of the question.
std::string vString;
vString.resize(256); // allocate some space, up to you
char* vStringPtr(&vString.front());
// assign the value to the string (by using a function that copies the value).
// don't exceed vString.size() here!
// now erase everything from the first '\0' to the end (std::find needs <algorithm>).
vString.erase(std::find(vString.begin(), vString.end(), 0), vString.end());
// and here you have the C++ string with the proper value and bounds.
This is how you turn a C++ string to a C string. But make sure you know what you're doing, as it's really easy to step out of bounds using raw string functions. There are moments when this is necessary.
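A concrete version of those steps, assuming std::snprintf plays the role of the "function that copies the value"; fill_from_c is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical example: let a C function write into a std::string's buffer,
// then trim the string at the first '\0' the C function wrote.
std::string fill_from_c(int value)
{
    std::string vString;
    vString.resize(256);                  // allocate some space, up to you
    char *vStringPtr = &vString.front();  // writable since C++11
    // snprintf never writes more than vString.size() bytes, '\0' included
    std::snprintf(vStringPtr, vString.size(), "value=%d", value);
    // erase everything from the first '\0' to the end
    vString.erase(std::find(vString.begin(), vString.end(), '\0'), vString.end());
    return vString;
}
```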
If c_str() is returning to you a copy of the string object internal buffer, you can just use const_cast<>.
However, if c_str() is giving you direct access to the string object's internal buffer, make an explicit copy instead of removing the const.
Since c_str() gives you direct const access to the data structure, you probably shouldn't cast it. The simplest way to do it without having to preallocate a buffer is to just use strdup.
char* tmpptr;
tmpptr = strdup(myStringVar.c_str());
oldfunction(tmpptr);
free(tmpptr);
It's quick, easy, and correct.
In C++, if you want a char * from string.c_str() (to pass it, for example, to a function that only takes a char *), you can cast it to char * directly to drop the const from .c_str().
Example:
launchGame((char *) string.c_str());
C++17 adds a char* string::data() noexcept overload. So if your string object isn't const, the pointer returned by data() isn't either and you can use that.
Is it really that difficult to do yourself?
#include <algorithm>
#include <cstring>
#include <string>
// Allocates the buffer for you; the caller must delete [] it.
char *convert(const std::string &str)
{
    size_t len = str.length();
    char *buf = new char[len + 1];
    memcpy(buf, str.data(), len);
    buf[len] = '\0';
    return buf;
}
// Copies into a caller-supplied buffer of len bytes (len >= 1),
// truncating if the string does not fit.
char *convert(const std::string &str, char *buf, size_t len)
{
    size_t n = std::min(str.length(), len - 1); // never read past the string
    memcpy(buf, str.data(), n);
    buf[n] = '\0';
    return buf;
}
// A crazy template solution to avoid passing in the array length
// but loses the ability to pass in a dynamically allocated buffer
template <size_t len>
char *convert(const std::string &str, char (&buf)[len])
{
    return convert(str, buf, len);
}
Usage:
std::string str = "Hello";
// Use buffer we've allocated
char buf[10];
convert(str, buf);
// Use buffer allocated for us
char *buf2 = convert(str);
delete [] buf2;
// Use dynamic buffer of known length
buf2 = new char[10];
convert(str, buf2, 10);
delete [] buf2;

Get bytes from std::string in C++

I'm working in a C++ unmanaged project.
I need to know how I can take a string like "some data to encrypt" and get a byte[] array, which I'm going to use as the source for Encrypt.
In C# I do
for (int i = 0; i < text.Length; i++)
    buffer[i] = (byte)text[i];
What I need to know is how to do the same but using unmanaged C++.
Thanks!
If you just need read-only access, then c_str() will do it:
char const *c = myString.c_str();
If you need read/write access, then you can copy the string into a vector. vectors manage dynamic memory for you. You don't have to mess with allocation/deallocation then:
std::vector<char> bytes(myString.begin(), myString.end());
bytes.push_back('\0');
char *c = &bytes[0];
std::string::data would seem to be sufficient and most efficient. If you want to have non-const memory to manipulate (strange for encryption) you can copy the data to a buffer using memcpy:
std::vector<unsigned char> buffer(mystring.length()); // a plain array needs a compile-time size in standard C++
memcpy(buffer.data(), mystring.data(), mystring.length());
STL fanboys would encourage you to use std::copy instead:
std::copy(mystring.begin(), mystring.end(), buffer.begin());
but there really isn't much of an upside to this. If you need null termination, use std::string::c_str() and the various string duplication techniques others have provided, but I'd generally avoid that and just query for the length. Particularly with cryptography, you just know somebody is going to try to break it by shoving nulls into it, and using std::string::data() discourages you from lazily making assumptions about the underlying bits in the string.
Normally, encryption functions take
encrypt(const void *ptr, size_t bufferSize);
as arguments. You can pass c_str and length directly:
encrypt(strng.c_str(), strng.length());
This way, no extra space is allocated or wasted.
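To illustrate why the (pointer, length) form matters, a sketch with a hypothetical xor_checksum(const void *, size_t) standing in for a real encrypt function: passing the size covers every byte, including embedded '\0' characters that a C-string view would stop at.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for encrypt(const void *ptr, size_t bufferSize):
// XOR of all bytes, so we can see which bytes were actually consumed.
unsigned char xor_checksum(const void *ptr, std::size_t size)
{
    const unsigned char *p = static_cast<const unsigned char *>(ptr);
    unsigned char acc = 0;
    for (std::size_t i = 0; i < size; ++i)
        acc ^= p[i];
    return acc;
}

unsigned char checksum_of(const std::string &s)
{
    return xor_checksum(s.data(), s.size());  // every byte, NULs included
}
```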
In C++17 and later you can use std::byte to represent actual byte data. I would recommend something like this:
std::vector<std::byte> to_bytes(std::string const& s)
{
    std::vector<std::byte> bytes;
    bytes.reserve(std::size(s));
    std::transform(std::begin(s), std::end(s), std::back_inserter(bytes), [](char c){
        return std::byte(c);
    });
    return bytes;
}
From a std::string you can use the c_str() method if you want to get at the underlying const char buffer pointer.
It looks like you just want copy the characters of the string into a new buffer. I would simply use the std::string::copy function:
length = str.copy( buffer, str.size() );
If you just need to read the data.
encrypt(str.data(),str.size());
If you need a read/write copy of the data, put it into a vector. (Don't dynamically allocate the space yourself; that's the job of vector.)
std::vector<byte> source(str.begin(),str.end());
encrypt(&source[0],source.size());
Of course we are all assuming that byte is a char!!!
If this is just plain vanilla C, then:
strcpy(buffer, text.c_str());
Assuming that buffer is allocated and large enough to hold the contents of 'text', which is the assumption in your original code.
If encrypt() takes a 'const char *' then you can use
encrypt(text.c_str())
and you do not need to copy the string.
You might go with a range-based for loop, which would look like this:
std::vector<std::byte> getByteArray(const std::string& str)
{
    std::vector<std::byte> buffer;
    for (char str_char : str)
        buffer.push_back(std::byte(str_char));
    return buffer;
}
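Going the other way is symmetric. A minimal C++17 sketch, where from_bytes is a hypothetical name and std::to_integer<char> converts each std::byte back to a char.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical inverse of getByteArray(): rebuilds a std::string from std::byte.
std::string from_bytes(const std::vector<std::byte> &bytes)
{
    std::string s;
    s.reserve(bytes.size());
    for (std::byte b : bytes)
        s.push_back(std::to_integer<char>(b));
    return s;
}
```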
I don't think you want to use the C# code you have there. .NET provides System.Text.Encoding.ASCII (and the UTF-* encodings):
string str = "some text";
byte[] bytes = System.Text.Encoding.ASCII.GetBytes(str);
Your problems stem from ignoring the encoding in C#, not from your C++ code.