std::stringstream efficient way to get written data, copy to another stream - c++

Without writing a custom rdbuf is there any way to use a stringstream efficiently? That is, with these requirements:
the stream can be reset and writing start again without deallocating previous memory
get a const char* to the data written (along with the length) without creating a temporary
populate the stream without creating a temporary string
If somebody can give me a definitive "no" that would be great.
Now, I also use boost, so if somebody can provide a boost alternative which does this that would be great. It has to have both istream and ostream interfaces available.

Use boost::interprocess::vectorstream or boost::interprocess::bufferstream. These classes basically meet all of your requirements.
boost::interprocess::vectorstream won't return a const char*, but it will return a const reference to an internal container class, (like an internal vector), rather than returning a temporary string copy. On the other hand, boost::interprocess::bufferstream will basically allow you to use any arbitrary buffer as an I/O stream, giving you complete control over memory allocation, so you can easily use a char buffer if you want.
These are both great classes, and wonderful replacements for std::stringstream, which, in my opinion, has always been hindered by the fact that it doesn't give you direct access to the internal buffer, resulting in the unnecessary creation of temporary string objects. It's a shame these classes are somewhat obscure, hidden away in the interprocess library.

Related

Remove last char in stringstream?

For example, the stringstream contains "abc\n", I want to remove the last char '\n'.
I know it can be done by using 'str' first.
But could it be done without stringstream::str()?
No, there isn't, at least not in a guaranteed manner. Although internally, it maintains a string buffer, you currently do not have access to it without a copy being made. There is a proposal to change this:
Streams have been the oldest part of the C++ standard library and their specification doesn’t take into account many things introduced since C++11. One of the oversights is that there is no non-copying access to the internal buffer of a basic_stringbuf which makes at least the obtaining of the output results from an ostringstream inefficient, because a copy is always made. I personally speculate that this was also the reason why basic_strbuf took so long to get deprecated with its char * access.
With move semantics and basic_string_view there is no longer a reason to keep this pessimissation alive on basic_stringbuf.
Internally, there is no reason why there should be this limited, as I believe (I may be wrong) that basic_stringbuf requires a basic_string buffer, and Clang certainly implements basic_stringbuf in such a manner.
Right now, you can use a stringstream like any other stream, or access a copy of its underlying buffer; however, you cannot modify the buffer directly. This means that any attempt to modify the end of the stream requires copying the underlying buffer or reading bytes until the end.
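std::stringstream ss;
ss << "abc\n";
ss.seekp(-1, std::ios_base::end);  // move the put position onto the trailing '\n'
ss << '\0';                        // overwrites '\n' with NUL; the length is unchanged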
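Note that the snippet above overwrites the last character rather than shortening the stream, so the stored length stays the same. A minimal sketch of the copying route mentioned in the answer follows; the helper name drop_last_char is hypothetical, and it assumes a copy through str() is acceptable.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: removes the last character by copying the buffer out,
// trimming the copy, and installing it again as the stream's contents.
inline void drop_last_char(std::stringstream& ss) {
    std::string s = ss.str();         // copy of the underlying buffer
    if (!s.empty()) {
        s.pop_back();                 // drop the final character
    }
    ss.str(s);                        // replace the buffer with the trimmed copy
    ss.seekp(0, std::ios_base::end);  // keep later writes appending at the end
}
```

After the call, the stream holds one character fewer and further insertions continue from the new end.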

Can I do a zero-copy std::string allocation in C++ from a const char * array?

Profiling of my application reveals that it is spending nearly 5% of CPU time in string allocation. In many, many places I am making C++ std::string objects from a 64MB char buffer. The thing is, the buffer never changes during the running of the program. My analysis of std::string(const char *buf,size_t buflen) calls is that the string is being copied because the buffer might change after the string is made. That isn't the problem here. Is there a way around this problem?
EDIT: I am working with binary data, so I can't just pass around char *s. Besides, then I would have a substantial overhead from always scanning for the NULL, which the std::string avoids.
If the string isn't going to change and if its lifetime is guaranteed to be longer than you are going to use the string, then don't use std::string.
Instead, consider a simple C string wrapper, like the proposed string_ref<T>.
Binary data? Stop using std::string and use std::vector<char>. But that won't fix your issue of it being copied. From your description, if this huge 64MB buffer will never change, you truly shouldn't be using std::string or std::vector<char>, either one isn't a good idea. You really ought to be passing around a const char* pointer (const uint8_t* would be more descriptive of binary data but under the covers it's the same thing, neglecting sign issues). Pass around both the pointer and a size_t length of it, or pass the pointer with another 'end' pointer. If you don't like passing around separate discrete variables (a pointer and the buffer’s length), make a struct to describe the buffer & have everyone use those instead:
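#include <cstddef>
#include <cstdint>
struct binbuf_desc {
    const uint8_t* addr;  // start of the buffer (not owned)
    size_t len;           // length in bytes
    binbuf_desc(const uint8_t* addr, size_t len) : addr(addr), len(len) {}
};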
You can always refer to your 64MB buffer (or any other buffer of any size) by using binbuf_desc objects. Note that binbuf_desc objects don’t own the buffer (or a copy of it), they’re just a descriptor of it, so you can just pass those around everywhere without having to worry about binbuf_desc’s making unnecessary copies of the buffer.
There is no portable solution. If you tell us what toolchain you're using, someone might know a trick specific to your library implementation. But for the most part, the std::string destructor (and assignment operator) is going to free the string content, and you can't free a string literal. (It's not impossible to have exceptions to this, and in fact the small string optimization is a common case that skips deallocation, but these are implementation details.)
A better approach is to not use std::string when you don't need/want dynamic allocation. const char* still works just fine in modern C++.
Since C++17, std::string_view may be your way. It can be initialized from either a bare C string (with or without a length) or a std::string.
Note, though, that data() is not guaranteed to return a zero-terminated string.
If you need this "zero-terminated on request" behaviour, there are alternatives such as str_view from Adam Sawicki that looks satisfying (https://github.com/sawickiap/str_view)
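As a rough illustration of the std::string_view suggestion, the sketch below takes non-owning slices of a fixed buffer without copying, including bytes that happen to be zero. The function name slice and its parameters are hypothetical, and it assumes C++17 and a buffer that outlives every view.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns a view of up to `len` bytes starting at
// `offset` inside `buf`. No bytes are copied and nothing is allocated.
inline std::string_view slice(const char* buf, std::size_t buf_len,
                              std::size_t offset, std::size_t len) {
    assert(offset <= buf_len);  // substr would throw std::out_of_range otherwise
    return std::string_view(buf, buf_len).substr(offset, len);
}
```

Because the view carries its own length, embedded NUL bytes in binary data are preserved and no scanning for a terminator takes place.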
It seems that using const char * instead of std::string is the best way to go for you. But you should also consider how you are using strings: implicit conversions from char pointers to std::string objects may be happening, for example during function calls.

c++ class operator overloading reference

I'm confused about why exactly I need to use references for the return type and parameter list in this example from my book below. Is there any reason besides that it takes up less memory than having everything copied over using pass by value? Or does it have more to do with wanting to cascade calls?
istream &operator>>(istream &input, PhoneNumber &number)
{
    // input whatever
    return input;
}
Because a) streams are not copyable, b) getting input from a stream means mutating it, so you need to modify the original and not a copy (however would that be realised). And reference to PhoneNumber should be obvious — you're getting input from the stream and into that object. If you'd pass it by copy, it wouldn't be visible outside of the operator, which makes the entire exercise rather pointless.
The biggest reason why you use pointers and references is not because it lets you use less memory (although it certainly does), but because it lets you use less time. Copying objects takes time, you often need to allocate additional memory, and then deallocate it in the end.
Even more importantly, objects such as streams are not meant to be copied at all: they contain internal state that is relevant to a physical object, - a file on disk or a network stream, - and their associated buffers, that does not make much sense to copy.
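To make the cascading point concrete, here is a small sketch of the pattern from the question. The digits member of PhoneNumber is a hypothetical stand-in, since the book's class definition is not shown.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct PhoneNumber {
    std::string digits;  // hypothetical member, for illustration only
};

// Takes the stream by reference because streams cannot be copied and reading
// mutates them; takes the number by reference so the caller sees the result.
std::istream& operator>>(std::istream& input, PhoneNumber& number) {
    input >> number.digits;
    return input;  // returning the same stream is what allows in >> a >> b
}
```

An expression such as in >> a >> b works because the first call returns the original stream, which then becomes the left operand of the second call.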

Can you avoid using temporary buffers when using std::string to interact with C style APIs?

I should preface this question by saying I think the answer is probably no, but I'd like to see what other people think about the issue.
I spend most of my time writing C++ that interacts with the Win32 API which like most C style APIs wants to either:
Take buffers which I've provided and operate on them.
Or return pointers to buffers which I need to later free.
Both of these scenarios essentially mean that if you want to use std::string in your code you've got to accept the fact that you're going to be doing a lot of string copying every time you construct a std::string from a temporary buffer.
What would be nice would be:
To be able to allow C style APIs to safely directly mutate a std::string and pre-reserve its allocation and set its size in advance (to mitigate scenario 1)
To be able to wrap a std::string around an existing char[] (to mitigate scenario 2)
Is there a nice way to do either of these, or should I just accept that there's an inherent cost in using std::string with old school APIs? It looks like scenario 1 would be particularly tricky because std::string has a short string optimisation whereby its buffer could either be on the stack or the heap depending on its size.
In C++11 you can simply pass a pointer to the first element of the string (&str[0]): its elements are guaranteed to be contiguous.
Before C++11, you could use .data() or .c_str(), but the string is not mutable through these.
Otherwise, yes, you must perform a copy. But I wouldn't worry about this too much until profiling indicates that it's really an issue for you.
Well you could probably just const_cast the .data() of a string to char* and it would most likely work. As with all optimisations, make sure that it is actually this bit of the code that is the bottleneck. If it is, wrap this up in an inline-able function, or a template class or something so that you can write some tests for it and change the behaviour if it doesn't work on some platform.
I think the only thing that you can do safely with std::(w)string here is pass it as an input that's not going to be modified by its user; use .c_str() to get a pointer to (W)CHAR.
You may be able to use a std::vector<char> instead. You can directly pass a pointer to the first character into C code and let the C code write it which you can't do with a string. And many of the operations you'd want to perform on a string you can do on a std::vector<char> just as well.
Since C++11, you don't have to use temporary buffers. You can use strings and C strings interchangeably and even write into the buffer of a C++ string, but you need to take the address of string::front() (or use &str[0]), not string::data() or string::c_str(), as those only return const char* (a non-const data() overload was added in C++17). See Directly write into char* buffer of std::string.
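A minimal sketch of scenario 1, assuming C++11 or later: the string is sized up front, a C-style function writes into its storage through &out[0], and the string is then trimmed to the length actually written. std::snprintf stands in here for whatever C API is being called, and format_int is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: lets a C API fill a std::string's own storage.
inline std::string format_int(int value) {
    std::string out(32, '\0');  // pre-size the string; 32 chars is enough for an int
    int written = std::snprintf(&out[0], out.size(), "%d", value);
    assert(written >= 0 && static_cast<std::size_t>(written) < out.size());
    out.resize(static_cast<std::size_t>(written));  // keep only what was written
    return out;
}
```

The same shape applies to APIs that report the number of characters written: allocate with the size constructor or resize(), pass &out[0] and the capacity, then resize() down.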

How to cast a pointer of memory block to std stream

I have programmed an application on Windows XP in Visual Studio using C++.
In that app I used the LoadResource() API to load a file stored as a resource into memory.
It returned a pointer to a memory block, and I want to cast that pointer to a std stream for compatibility.
Could anyone help me?
You can't cast the resource to a stream type. Either you copy the bytes:
std::stringstream ss;
ss.rdbuf()->sputn(buf, len);
or you wrap your resource in your own streambuf:
class resourcebuf : public std::streambuf {
// Todo: implement members including at least xsgetn, uflow and underflow
};
and pass a pointer to it to the std::istream constructor.
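One possible minimal version of such a streambuf is sketched below. The class name membuf is hypothetical, and the sketch assumes the block is only read, never written, and that seeking is not needed (the default seekoff and seekpos fail). It relies on the default underflow returning end-of-file once the get area is exhausted, so no virtual functions are overridden.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical read-only streambuf over an existing memory block (no copy).
class membuf : public std::streambuf {
public:
    membuf(const char* data, std::size_t len) {
        // setg takes char*; this sketch never writes through the pointer.
        char* p = const_cast<char*>(data);
        setg(p, p, p + len);  // the get area spans the whole block
    }
};
```

With a resource, the pointer and length would come from LockResource and SizeofResource, and the membuf must outlive any std::istream constructed from its address.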
Why would you need this?
Casting raw data pointers to streams means byte-by-byte copying of your resource and therefore performs poorly (and, besides, I don't see any benefit in this approach).
If you want to work with raw memory, work with it. Casting here (compatibility?) seems to be a very strange approach.
Still, if you want to do it, you could create some stream from your memory block, that treats it as a sequence of bytes. In this case, it means using std::stringstream (istringstream).
After you lock your resource by LockResource, create a string from received void* pointer and pass it to your stringstream instance.
void* memory = LockResource(...);
// You would probably want to use SizeofResource() here
size_t memory_size = ... ;
std::string casted_memory(static_cast<char*>(memory), memory_size);
std::istringstream stream(casted_memory);
The most straightforward way is probably to convert the buffer to a string and then to a stringstream:
std::stringstream ss(std::string(buf,len));
I think that will copy it twice, though, so if it turns out to be taking a lot of time you might need to look for alternatives. You could use strstream, but it might freak out the squares.