Using C functions to manipulate std::string

Sometimes you need to fill an std::string with characters constructed by a C function. A typical example is this:
constexpr static std::size_t BUFFERSIZE{256};
char buffer[BUFFERSIZE];
snprintf(buffer, BUFFERSIZE, formatstring, value1, value2);
return std::string(buffer);
Notice how we first need to fill a local buffer, and then copy it to the std::string.
The example becomes more complex if the maximum buffersize is calculated and not necessarily something you want to store on the stack. For example:
constexpr static std::size_t BUFFERSIZE{256};
if (calculatedBufferSize > BUFFERSIZE)
{
    auto ptr = std::make_unique<char[]>(calculatedBufferSize);
    snprintf(ptr.get(), calculatedBufferSize, formatstring, value1, value2);
    return std::string(ptr.get());
}
else
{
    char buffer[BUFFERSIZE];
    snprintf(buffer, BUFFERSIZE, formatstring, value1, value2);
    return std::string(buffer);
}
This makes the code even more complex, and if the calculatedBufferSize is larger than what we want on the stack, we essentially do the following:
allocate memory (make_unique)
fill the memory with the wanted result
allocate memory (std::string)
copy memory to the string
deallocate memory
Since C++17 std::string has a non-const data() method, implying that this is the way to manipulate strings. So it seems tempting to do this:
std::string result;
result.resize(calculatedBufferSize);
snprintf(result.data(), calculatedBufferSize, formatstring, value1, value2);
result.resize(strlen(result.c_str()));
return result;
My experiments show that the last resize is needed to make sure that the length of the string is reported correctly. std::string::length() does not search for a nul-terminator, it just returns the size (just like std::vector does).
Notice that we have much less allocation and copying going on:
allocate memory (resize string)
fill the memory with the wanted result
To be honest, although it seems to be much more efficient, it also looks very 'un-standard' to me. Can somebody indicate whether this is behavior allowed by the C++17 standard? Or is there another way to have this kind of manipulations in a more efficient way?
Please don't refer to question Manipulating std::string, as that question is about much more dirty logic (even using memset).
Also don't answer that I must use C++ streams (std::stringstream, efficient?, honestly?). Sometimes you simply have efficient logic in C that you want to reuse.

Modifying the contents pointed to by data() is fine, assuming you do not set the value at data() + size() to anything other than the null character. From [string.accessors]:
charT* data() noexcept;
Returns: A pointer p such that p + i == addressof(operator[](i)) for each i in [0, size()].
Complexity: Constant time.
Remarks: The program shall not modify the value stored at p + size() to any value other than charT(); otherwise, the behavior is undefined.
The statement result.resize(strlen(result.c_str())); does look a bit odd, though. std::snprintf returns the number of characters it would have written given a large enough buffer (not counting the terminating null), or a negative value on error; as long as the output was not truncated, using that value to resize the string would be more appropriate. Additionally, it looks slightly neater to construct the string with the correct size instead of constructing an empty one that is immediately resized:
std::string result(maxlen, '\0');
result.resize(std::max(0, std::snprintf(result.data(), maxlen, fmt, value1, value2)));
return result;
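When the required size is not known up front, one possible variation is to let std::vsnprintf measure the output first and only then size the string. The sketch below assumes a hypothetical helper name, format_to_string, and uses &result[0] so that it also compiles before C++17; with C++17, result.data() would do the same job.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the question): printf-style formatting
// straight into a std::string, with a measuring pass first.
std::string format_to_string(const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);

    // First pass: a null buffer of size 0 writes nothing and only reports
    // how many characters the output needs (terminator not counted).
    const int needed = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (needed < 0) {
        va_end(args_copy);
        return std::string();
    }

    std::string result(static_cast<std::size_t>(needed), '\0');
    // Second pass: size() + 1 lets vsnprintf place its terminator at
    // data() + size(). That position may only ever hold a null character,
    // which is exactly what gets written there.
    std::vsnprintf(&result[0], result.size() + 1, fmt, args_copy);
    va_end(args_copy);
    return result;
}
```

The price is formatting twice, in exchange for a single allocation and no truncation handling.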

The general approach looks fine to me. I would make a couple of changes.
Capture the return value of snprintf.
Use it to perform an error check and avoid a call to strlen.
std::string result;
result.resize(calculatedBufferSize);
int n = snprintf(result.data(), calculatedBufferSize, formatstring, value1, value2);
if ( n < 0 )
{
// Problem. Deal with the error.
}
result.resize(n);
return result;

Related

Is this how the size() function really works in std::string?

I wrote a simple function that can get the size of a std::string class object, and I know that size() function in std::string does the same job, So I wanted to know if the size() function really works like my function or if it is more complicated? If it's more complicated, then how?
int sizeOfString(const string str) {
int i=0;
while (str[i] != '\0') {
++i;
}
return i;
}

An std::string can contain null bytes, so your sizeOfString() function will produce a different result on the following input:
std::string evil("abc\0def", 7);
As for your other question: the size() method simply reads out an internal size field, so it is always constant time, while yours is linear in the size of the string.
You can peek at the implementation of std::string::size for various implementations for yourself: libc++, MSVC, libstdc++.
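A small sketch of the difference described above. The two function names are made up for illustration; the string is the "evil" one from this answer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper that behaves like the question's loop: it stops at
// the first '\0' it meets.
std::size_t length_up_to_first_nul(const std::string& s)
{
    return std::strlen(s.c_str());
}

// Builds the 7-character string from the answer, embedded null included.
std::string make_string_with_embedded_nul()
{
    return std::string("abc\0def", 7);
}
```

For this input, size() reports all seven characters while the sentinel-based count stops after "abc".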

No.
Firstly, a std::string can contain NUL characters that count as part of the length, so you can't use '\0' as a sentinel in the way you would for C-strings.
Secondly, the Standard guarantees that std::string::size has constant complexity.
In practice there are a few slightly different ways to represent a std::string:
pointer to start of buffer, buffer size, length of current data - size() just has to return the length member.
pointer to start of buffer, pointer to end of current data, pointer to end of buffer - size() has to return a simple calculation.
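The two layouts above can be sketched as toy structs. These are illustrations only, with invented names; real library implementations are more involved and typically add a small-string optimization on top.

```cpp
#include <cstddef>

// Layout 1: pointer, capacity and an explicit length field.
struct LengthFieldString {
    char* buffer;
    std::size_t capacity;
    std::size_t length;
    std::size_t size() const { return length; }  // plain member read
};

// Layout 2: three pointers; the length is a pointer subtraction.
struct ThreePointerString {
    char* first;
    char* last;            // one past the last character in use
    char* end_of_storage;  // one past the allocated buffer
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};
```

Either way size() is constant time and never inspects the characters themselves.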

It is different from your implementation.
Your function iterates over the string until it finds a null byte. Null-terminated strings are how strings are handled in C, through char*. In C++, a string is a full object with member variables.
Specifically for C++, the size of the string is stored as part of the object, so the size() function simply reads out the value of a variable.
For an interesting talk about how a string works in C++, check out this video from CppCon: https://www.youtube.com/watch?v=kPR8h4-qZdk

No. Not at all like that.
std::string actually maintains the size as one of its data members. Think of std::string as a container that keeps a pointer to the actual data (a char*) and the length of that data separately.
When you call size(), it just returns this stored length, hence it's O(1).
One example that highlights the practical effect:
// WRONG IMPLEMENTATION
int wrongChangeLengthToZero(std::string& s)
{
assert(s.size() != 0);
s[0]='\0';
return s.size(); // Won't return 0
}
// CORRECT
int correctChangeLengthToZero(std::string& s)
{
assert(s.size() != 0);
s.resize(0);
return s.size(); // Will return 0
}

Passing pointer to first string element as buffer

I have a bunch of old linux code which does something like this:
int size_of_buffer = /*stuff computed dynamically*/;
char *buffer = malloc(size_of_buffer);
recv(socket, buffer, size_of_buffer, 0);
//do some processing of the buffer as string
free(buffer);
When I was migrating it to C++ I changed it like this:
int size_of_buffer = /*stuff computed dynamically*/;
const auto buffer = make_unique<char[]>(size_of_buffer);
recv(socket, buffer.get(), size_of_buffer, 0);
const std::string str_buffer = buffer.get();
//do some processing on str_buffer
Which you can't fail to notice causes double memory allocation and potentially multiple copying of data. My idea now is to pass the pointer to first character of the std::string with reserved storage, like this:
int size_of_buffer = /*stuff computed dynamically*/;
std::string buffer;
buffer.reserve(size_of_buffer);
recv(socket, &(buffer[0]), size_of_buffer, 0);
//do some processing on buffer
Is the above code safe and well defined, or are there caveats and dangers that need to be avoided?

A similar question was asked here. The short answer is: it is not possible without copying.
Below C++17, there is no non-const overload of std::string::data(), and
1) Modifying the character array accessed through the const overload of data has undefined behavior.
Hence, you cannot modify the string through data.
Since C++11,
data() + i == std::addressof(operator[](i)) for every i in [0, size()].
Therefore, you also cannot modify the string through &(buffer[0]).
Before C++11, it is not very clear to me what exactly is allowed, so maybe modifying through &*buffer.begin() is OK, but I don't think so.
On cppreference, there is actually a quote that confounds me a bit
The elements of a basic_string are stored contiguously, that is, for a basic_string s, &*(s.begin() + n) == &*s.begin() + n for any n in [0, s.size()), or, equivalently, a pointer to s[0] can be passed to functions that expect a pointer to the first element of a (null-terminated (since C++11)) CharT[] array.
I think this means to const array, since otherwise it would not fit to the rest of the documentation and right now I do not have the time to go through the standard.
Since C++17, your code is OK if you use std::string::resize instead of reserve, since data() only guarantees a valid range on [data(), data() + size()) (or you can just construct the string with the right size). There is no non-copying way to create a std::string from a char *.
You can use a std::string_view, which has a constant constructor from char *. It does exactly what you want here, since it has no ownership on the pointer and memory.
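A minimal sketch of the resize-based variant, under stated assumptions: fake_recv is a made-up stand-in for recv() so the example runs without a socket, and receive_into_string is a hypothetical wrapper name. The buffer is written through &buffer[0], which is equivalent to the non-const data() in C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for recv(): copies up to `capacity` bytes of a fixed message
// into `out` and returns how many bytes it wrote. Purely hypothetical.
long fake_recv(char* out, std::size_t capacity)
{
    static const char message[] = "hello from the socket";
    const std::size_t n = std::min(capacity, sizeof(message) - 1);
    std::memcpy(out, message, n);
    return static_cast<long>(n);
}

std::string receive_into_string(std::size_t size_of_buffer)
{
    // Construct with the full size (the same effect as resize()):
    // reserve() alone would leave size() at 0, and only [0, size()) may
    // be written through the pointer.
    std::string buffer(size_of_buffer, '\0');
    const long received = fake_recv(&buffer[0], buffer.size());
    // Shrink to what actually arrived; treat a negative result as empty.
    buffer.resize(received > 0 ? static_cast<std::size_t>(received) : 0);
    return buffer;
}
```

Using the returned byte count rather than searching for a terminator matters here, because received data is not guaranteed to be null-terminated.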

returning a "variable string literal" from a function

I have some function that needs to return a const char* (so that a whole host of other functions can end up using it).
I know that if I had something defined as follows:
const char* Foo(int n)
{
// Some code
.
.
.
return "string literal, say";
}
then there is no problem. However am I correct in saying that if Foo has to return some string that can only be determined at runtime (depending on the parameter n, (where each n taking any value in [0, 2^31-1] uniquely determines a return string)) then I have to use the heap (or return objects like std::string which use the heap internally)?
std::string seems too heavyweight for what I want to accomplish (at least two functions will have to pass the parcel), and allocating memory inside Foo to be freed by the caller doesn't strike me as a safe way of going forward. I cannot (easily) pass in references to the objects that need this function, and not that I believe it is possible anyway but macro trickery is out of the question.
Is there something simple that I have not yet considered?
EDIT
Thanks to all for the answers, I'll go for std::string (I suppose in a roundabout fashion I was asking for confirmation that there is no way of hinting to the compiler that it should store the contents of some char[] in the same place that it stores string literals). As for "heavyweight" (and I'm pleasantly surprised that copying them isn't as wasteful as I thought) that wasn't the best way of putting it, perhaps "different" would have been closer to my initial apprehension.

If you mean that your function chooses between one of n known-at-compile-time strings, then you can just return a const char * to any one of them. A string literal has static storage duration in C and C++, meaning that they exist for the lifetime of the program. Therefore it is safe to return a pointer to one.
const char* choose_string(int n)
{
    switch(n % 4)
    {
        case 0: return "zero";
        case 1: return "one";
        case 2: return "two";
        case 3: return "three";
        default: return ""; // n % 4 is negative when n is negative
    }
}
If your function dynamically generates a string at runtime, then you have to either pass in a (char *buf, int buf_length) and write the result into it, or return a std::string.

In C++, returning a std::string is probably the right answer (as several others have already said).
If you don't want to use std::string for some reason (say, if you were programming in C, but then you would have tagged the question that way), there are several options for "returning" a string from a function. None of them are pretty.
If you return a string literal, what you're really returning is a pointer to the first character of the array object associated with that string literal. That object has static storage duration (i.e., it exists for the entire execution of your program), so returning a pointer to it is perfectly safe. This is obviously inflexible.
You can allocate an array on the heap and return a pointer to it. That lets the called function determine how long it needs to be, but it places the burden on the caller to deallocate the memory when it's no longer needed.
You can return a pointer to (the first element of) a static array defined inside the function. This is inflexible in that the maximum length has to be determined at compile time. It also means that successive calls to the function will clobber the result. The asctime() function, declared in <time.h> (<ctime> in C++), does this. (I once wrote a function that cycled through the elements of a static array of arrays, so that 6 successive calls would not clobber previous results, but the 7th would. That was probably overkill.)
You can require the caller to pass in a pointer to (the first element of) an array that the caller itself must allocate, probably along with a separate argument that specifies the length of the caller's array. This requires the caller to know how long the string might be, and probably to be able to handle the error of not reserving enough space.
And now you know why C++ provides library features like std::string that take care of all this stuff for you.
Incidentally, the phrase "variable string literal" doesn't make a lot of sense. If something is a literal, it's not variable. Probably "variable string" is what you meant.

The easiest solution might be to return a std::string.
If you want to avoid std::string, one alternative is to have the caller pass a char[] buffer to the function. You might also want to provide a function that can tell the caller how big of a buffer will be needed, unless an upper bound is known statically.

Use std::string, but if you really want... A common pattern used in C programming is to return the size of the final result, allocate a buffer, and call the function twice. (I apologize for the C style, you want a C solution I give a C solution :P )
size_t Foo(int n, char* buff, size_t buffSize)
{
if (buff)
{
// check if buffSize is large enough if so fill
}
// calculate final string size and return
return stringSize;
}
size_t size = Foo(x, NULL, 0); // find the size of the result
char* string = malloc(size); // allocate
Foo(x,string, size); // fill the buffer
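To make the two-call pattern concrete, here is one possible filling-in of the outline above. The "item-<n>" output format and the call_foo wrapper are invented for illustration; only the shape of Foo comes from the answer.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical Foo: produces "item-<n>". It always returns the number of
// bytes required (terminator included) and fills buff only when it fits.
std::size_t Foo(int n, char* buff, std::size_t buffSize)
{
    const int len = std::snprintf(nullptr, 0, "item-%d", n);
    if (len < 0) return 0;  // formatting error
    const std::size_t required = static_cast<std::size_t>(len) + 1;
    if (buff && buffSize >= required) {
        std::snprintf(buff, buffSize, "item-%d", n);
    }
    return required;
}

// Caller side: the first call measures, the second call fills.
std::string call_foo(int n)
{
    const std::size_t size = Foo(n, nullptr, 0);
    if (size == 0) return std::string();
    std::string result(size, '\0');
    Foo(n, &result[0], result.size());
    result.resize(size - 1);  // drop the terminator that Foo counted
    return result;
}
```

The caller owns the memory throughout, which is what makes this pattern workable across a C interface.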

(Donning asbestos suit)
Consider just leaking the memory.
const char* Foo(int n)
{
static std::unordered_map<int, const char*> cache;
if (!cache[n])
{
// Generate cache[n]
}
return cache[n];
}
Yup, this will leak memory. Up to 2^32 strings worth of them. But if you had the actual string literals, you would always have all 2^32 strings in memory (and clearly require a 64 bits build - just the \0 alone take 4GB!)

String operations and memory management

I want to write a convenient wrapper for the C-style function strftime, and I've come up with some options for converting a char array to a string and vice versa. Here is my code:
std::string Time::getAsFormattedString ( const std::string& format , const size_t& maxStringSize = 999 )
{
char* timeArray = 0;
std::string timeString;
// [OPTION_0]
timeArray = reinterpret_cast <char*> (malloc(sizeof(char)*maxStringSize));
// [OPTION_1]
timeArray = const_cast <char*> (timeString.c_str());
// [OPTION_2]
timeArray = &(*(timeString.begin()));
strftime(timeArray,maxStringSize,format.c_str(),&this->time);
timeString = timeArray;
// [OPTION_0]
free(timeArray);
return timeString;
}
Option №0 looks safe since no exceptions can be thrown before the memory is freed (Edit: timeString = timeArray can throw one, so a try-catch is needed around that line).
Option №1: const-casting always looks like a hack.
Option №2 seems to be the best, but I do not know if there could be some issues with it.
Can you please tell me which one is the safest, most correct, most optimal, and perhaps closest to best practice?
Thank you.

None of the options you propose are really acceptable; the
second and the third won't even work. Globally, there are two
"acceptable" solutions.
The simplest is:
char buffer[ 1000 ];
size_t n = strftime( buffer, sizeof( buffer ), format.c_str(), &time );
if ( n == 0 ) {
throw SomeError; // or you might just abort...
}
return std::string( buffer );
This has the advantage of simplicity, but you do have to
document the maximum size as a constraint in your interface.
(It seems like a reasonable constraint to me.)
Alternatively, you can remove the constraint:
std::vector<char> buffer( 100 );
size_t n = strftime( &buffer[0], buffer.size(), format.c_str(), &time );
while ( n == 0 ) {
buffer.resize( 2 * buffer.size() );
n = strftime( &buffer[0], buffer.size(), format.c_str(), &time );
}
return std::string( buffer.begin(), buffer.begin() + n );
(In C++11, and in practice in C++03, you can do this with
std::string directly, rather than std::vector. In which
case, you need to call resize( n ) on the resulting string
before returning it.)
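The std::string variant mentioned in the parenthetical could look roughly like this. It is a sketch under assumptions: format_time is a hypothetical free function rather than the member from the question, and the empty-format check and the 64 KiB cap are additions, because strftime also returns 0 for output that is legitimately empty and the loop would otherwise never end.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical free-function version of getAsFormattedString.
std::string format_time(const std::string& format, const std::tm& time)
{
    if (format.empty()) return std::string();

    std::string buffer(100, '\0');
    std::size_t n = std::strftime(&buffer[0], buffer.size(), format.c_str(), &time);
    // strftime returns 0 when the result (plus terminator) does not fit,
    // so grow and retry, up to an arbitrary limit.
    while (n == 0 && buffer.size() < 64 * 1024) {
        buffer.resize(2 * buffer.size());
        n = std::strftime(&buffer[0], buffer.size(), format.c_str(), &time);
    }
    buffer.resize(n);  // keep only the characters strftime produced
    return buffer;
}
```

Compared with the std::vector version, this saves the final copy into a separate string.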

As a start, the only option that isn't going to crash is option 0 here. The others will crash because you are claiming to have allocated 999 bytes when in fact the internal string will probably only have 1 byte allocated to it, and sad things will happen.
However I would probably do this by allocating a large chunk of characters on the stack here.
char timeArray[2048];
strftime(timeArray,2048,format.c_str(),&this->time);
return string(timeArray);
This way you don't have to do any casting or dynamic allocations and will almost certainly be neater and faster.

Option 1 and 2 are not good, as you are not meant to alter the string you get from std::string::c_str() (it returns a pointer to const). Option 2 will need a "resize" of the string before you can use it. But I'm not sure strings are guaranteed to copy from their same buffer...
My solution would be to have:
char timeArray[1000];
(Although that is way excessive. Unless you are repeating the same format specifier several times, it's unlikely to achieve more than 100 characters, and that is very lengthy - so "sane" combinations won't reach anywhere near 1000 characters.)
Note that timeString = timeArray can throw an exception for bad_alloc, so if you want to not leak memory in that situation, you either need to use stack-based storage (as in my suggestion), a smart pointer or a try/catch block around parts of the code.

Option 2 (or better still, &timeString[0]) should be preferred.
You're right about the const_cast being bad in option 1, and in option 0, you could, at the very least, clean up the code a bit by using new instead of malloc (and avoid the cast)
But prefer option 2.
(Oh, and as commenters have pointed out, if you're writing into the string itself, you obviously have to first resize it to be big enough that you won't write out of bounds. Given the downvotes, I should probably have been explicit about that)

The only possible solution is option 0 (though some minor adjustments can be performed). As others have pointed out, the standard says that the space returned by the c_str() method is not meant to be used for writing purposes, actually it is a way to allow std::string's to be read by the C standard library (which is part of the C++ standard library).
The Standard reads:
Returns: A pointer to the initial element of an array of length size() + 1 whose first size() elements equal the corresponding elements of the string controlled by *this and whose last element is a null character specified by charT().
Requires: The program shall not alter any of the values stored in the array. Nor shall the program treat the returned value as a valid pointer value after any subsequent call to a non-const member function of the class basic_string that designates the same object as this.
So, I'd just do a quick fix to your code:
const char * Time::getAsFormattedString(const std::string& format)
{
static char timeArray[256];
std::strftime( timeArray, 256, format.c_str(), &this->time );
return timeArray;
}
This gives the method a buffer with static storage duration that is reused on every call, so no memory errors can be expected from that side (as the heap is untouched).
The only remaining cost is the std::string in which the caller stores the result, but that is created after the function returns; the function itself won't touch the heap and uses only a minimum of the stack.
In practical terms, the usefulness of the function is untouched, since there is an automatic conversion from const char * to std::string, so you can safely call it in the usual way:
std::string strTime = time.getAsFormattedString( "%F %T" );
Hope this helps.

std::string.c_str() has different value than std::string?

I have been working with C++ strings and trying to load char * strings into std::string by using C functions such as strcpy(). Since strcpy() takes char * as a parameter, I have to cast it which goes something like this:
std::string destination;
unsigned char *source;
strcpy((char*)destination.c_str(), (char*)source);
The code works fine and when I run the program in a debugger, the value of *source is stored in destination, but for some odd reason it won't print out with the statement
std::cout << destination;
I noticed that if I use
std::cout << destination.c_str();
The value prints out correctly and all is well. Why does this happen? Is there a better method of copying an unsigned char* or char* into a std::string (stringstreams?) This seems to only happen when I specify the string as foo.c_str() in a copying operation.
Edit: To answer the question "why would you do this?", I am using strcpy() as a plain example. There are other times that it's more complex than assignment. For example, having to copy only X amount of string A into string B using strncpy() or passing a std::string to a function from a C library that takes a char * as a parameter for a buffer.

Here's what you want
std::string destination = source;
What you're doing is wrong on so many levels... you're writing over the inner representation of a std::string... I mean... not cool man... it's much more complex than that, arrays being resized, read-only memory... the works.

This is not a good idea at all for two reasons:
destination.c_str() returns a pointer to const, and casting away its constness and writing through it is undefined behavior.
You haven't set the size of the string, meaning that it won't even necessarily have a large enough buffer to hold the string, which is likely to cause an access violation.
std::string has a constructor which allows it to be constructed from a const char*, so simply write:
std::string destination = source;

Well, what you are doing is undefined behavior. Your c_str() returns a const char * and is not meant to be assigned to. Why not use the defined constructor or assignment operator?

std::string defines an implicit conversion from const char* to std::string... so use that.
You decided to cast away an error as c_str() returns a const char*, i.e., it does not allow for writing to its underlying buffer. You did everything you could to get around that and it didn't work (you shouldn't be surprised at this).
c_str() returns a const char* for good reason. You have no idea if this pointer points to the string's underlying buffer. You have no idea if this pointer points to a memory block large enough to hold your new string. The library is using its interface to tell you exactly how the return value of c_str() should be used and you're ignoring that completely.

Do not do what you are doing!!!
I repeat!
DO NOT DO WHAT YOU ARE DOING!!!
That it seems to sort of work when you do some weird things is a consequence of how the string class was implemented. You are almost certainly writing in memory you shouldn't be and a bunch of other bogus stuff.
When you need to interact with a C function that writes to a buffer, there are two basic methods:
std::string read_from_sock(int sock) {
    char buffer[1024] = "";
    ssize_t n = read(sock, buffer, 1024);
    if (n > 0) {
        return std::string(buffer, buffer + n);
    }
    return std::string();
}
Or you might try the peek method:
std::string read_from_sock(int sock) {
    // recv() rather than read(): only recv() accepts a flags argument.
    ssize_t n = recv(sock, 0, 0, MSG_PEEK);
    if (n > 0) {
        std::vector<char> buf(n);
        n = recv(sock, &buf[0], buf.size(), 0);
        return std::string(buf.begin(), buf.end());
    }
    return std::string();
}
Of course, these are not very robust versions...but they illustrate the point.

First, you should note that the value returned by c_str is a const char* and must not be modified. In fact, it does not even have to point to the internal buffer of the string.

In response to your edit:
having to copy only X amount of string A into string B using strncpy()
If string A is a char array, and string B is std::string, and strlen(A) >= X, then you can do this:
B.assign(A, A + X);
passing a std::string to a function from a C library that takes a char * as a parameter for a buffer
If the parameter is actually const char *, you can use c_str() for that. But if it is just plain char *, and you are using a C++11 compliant compiler, then you can do the following:
c_function(&B[0]);
However, you need to ensure that there is room in the string for the data (same as if you were using a plain C string), which you can do with a call to the resize() function. If the function writes an unspecified number of characters to the string as a null-terminated C string, then you will probably want to truncate the string afterward, like this:
B.resize(B.find('\0'));
The reason you can safely do this in a C++11 compiler and not a C++03 compiler is that in C++03, strings were not guaranteed by the standard to be contiguous, but in C++11, they are. If you want the guarantee in C++03, then you can use std::vector<char> instead.
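Putting the resize, call, and truncate steps together might look like the sketch below. The c_function here is a made-up stand-in that takes an explicit capacity (the one-argument call above leaves the size implicit), and call_c_function is a hypothetical wrapper name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C library function that writes a null-terminated string
// into a caller-supplied buffer of at least 1 byte. Hypothetical.
void c_function(char* out, std::size_t capacity)
{
    std::strncpy(out, "written by C", capacity - 1);
    out[capacity - 1] = '\0';
}

std::string call_c_function(std::size_t capacity)
{
    if (capacity == 0) return std::string();
    std::string B(capacity, '\0');  // make room before handing out &B[0]
    c_function(&B[0], B.size());
    // Cut the string at the terminator the C function wrote. The npos
    // check guards against resize(npos), which would throw.
    const std::size_t end = B.find('\0');
    if (end != std::string::npos) B.resize(end);
    return B;
}
```

If the C function reports how many characters it wrote, resizing to that count avoids the search.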
