C++ stl stringstream direct buffer access - c++

this should be pretty common yet I find it fascinating that I couldn't find any straight forward solution.
Basically I read in a file over the network into a stringstream. This is the declaration:
std::stringstream membuf(std::ios::in | std::ios::out | std::ios::binary);
Now I have some C library that wants direct access to the read chunk of the memory. How do I get that? Read only access is OK. After the C function is done, I dispose of the memorystream, no need for it.
str() copies the buffer, which seems unnecessary and doubles the memory.
Am I missing something obvious? Maybe a different stl class would work better.
Edit:
Apparently, stringstream is not guaranteed to be stored continuously. What is?
if I use vector<char> how do I get byte buffer?

You can take full control of the buffer used by writing the buffer yourself and using that buffer in the stringstream
stringstream membuf(std::ios::in | std::ios::out | std::ios::binary);
membuf.rdbuf(yourVeryOwnStreamBuf);
Your own buffer should be derived from basic_streambuf, and override the sync() and overflow() methods appropriately.
For your internal representation you could probably use something like vector< char >, and reserve() it to the needed size so that no reallocations and copies are done.
This implies you know an upper bound for the space needed in advance. But if you don't know the size in advance, and need a continguous buffer in the end, copies are of course unavoidable.

std::stringstream doesn't (necessarily) store its buffer contiguously but can allocate chunks as it is gradually filled. If you then want all of its data in a contiguous region of memory then you will need to copy it and that is what str() does for you.
Of course, if you want to use or write a class with a different storage strategy then you can, but you don't then need to use std::stringstream at all.

You can call str() to get back a std::string. From there you can call c_str() on the std::string to get a char*. Note that c_str() isn't officially supported for this use, but everyone uses it this way :)
Edit
This is probably a better solution: std::istream::read. From the example on that page:
buffer = new char [length];
// read data as a block:
is.read (buffer,length);

Well, if you are seriously concerned about storage, you can get closer to the metal. basic_stringstream has a method, rdbuf() which returns it's basic_stringbuf (which is derived from basic_streambuf). You can then use the eback(), egptr(), and gptr() pointers to access characters directly out of the buffer. I've used this machinery in the past to implement a custom buffer with my desired semantics, so it is do-able.
Beware, this is not for the faint of heart! Set aside a few days, read Standard C++ IOStreams and Locales, or similar nitpicky reference, and be careful...

Related

Reading large strings in C++ -- is there a safe fast way?

http://insanecoding.blogspot.co.uk/2011/11/how-to-read-in-file-in-c.html reviews a number of ways of reading an entire file into a string in C++. The key code for the fastest option looks like this:
std::string contents;
in.seekg(0, std::ios::end);
contents.resize(in.tellg());
in.seekg(0, std::ios::beg);
in.read(&contents[0], contents.size());
Unfortunately, this is not safe as it relies on the string being implemented in a particular way. If, for example, the implementation was sharing strings then modifying the data at &contents[0] could affect strings other than the one being read. (More generally, there's no guarantee that this won't trash arbitrary memory -- it's unlikely to happen in practice, but it's not good practice to rely on that.)
C++ and the STL are designed to provide features that are efficient as C, so one would expect there to be a version of the above that was just as fast but guaranteed to be safe.
In the case of vector<T>, there are functions which can be used to access the raw data, which can be used to read a vector efficiently:
T* vector::data();
const T* vector::data() const;
The first of these can be used to read a vector<T> efficiently. Unfortunately, the string equivalent only provides the const variant:
const char* string::data() const noexcept;
So this cannot be used to read a string efficiently. (Presumably the non-const variant is omitted to support the shared string implementation.)
I have also checked the string constructors, but the ones that accept a char* copy the data -- there's no option to move it.
Is there a safe and fast way of reading the whole contents of a file into a string?
It may be worth noting that I want to read a string rather than a vector<char> so that I can access the resulting data using a istringstream. There's no equivalent of that for vector<char>.
If you really want to avoid copies, you can slurp the file into a std::vector<char>, and then roll your own std::basic_stringbuf to pull data from the vector.
You can then declare a std::istringstream and use std::basic_ios::rdbuf to replace the input buffer with your own one.
The caveat is that if you choose to call istringstream::str it will invoke std::basic_stringbuf::str and will require a copy. But then, it sounds like you won't be needing that function, and can actually stub it out.
Whether you get better performance this way would require actual measurement. But at least you avoid having to have two large contiguous memory blocks during the copy. Additionally, you could use something like std::deque as your underlying structure if you want to cope with truly huge files that cannot be allocated in contiguous memory.
It's also worth mentioning that if you're really just streaming that data you are essentially double-buffering by reading it into a string first. Unless you also require the contents in memory for some other purpose, the buffering inside std::ifstream is likely to be sufficient. If you do slurp the file, you may get a boost by turning buffering off.
I think using &string[0] is just fine, and it should work with the widely used standard library implementations (even if it is technically UB).
But since you mention that you want to put the data into an istringstream, here's an alternative:
Read the data into a char array (new char[in.tellg()])
Construct a stringstream (without the leading 'i')
Insert the data with stringstream::write
The istringstream would have to copy the data anyway, because a std::stringstream doesn't store a std::string internally as far as I'm aware, so you can leave the std::string away and put the data into it directly.
EDIT: Actually, instead of the manual allocation (or make_unique), this way you could also use the vector<char> you mentioned.

How to do file I/O with std::vector<bool>?

I need to implement a Boolean data container that will store fairly large amount of variables. I suppose I could just use char* and implement C-style macro accessors but I would prefer to have it wrapped in an std:: structure. std::bitset<size_t> does not seem practical as it has a fixed-during-compilation size.
So that leaves me with std::vector<bool> which is optimized for space; and it has a nice bool-like accessor.
Is there a way to do something like directly feed a pointer from it to fwrite()?
And how would one do file input into such a vector?
And lastly, is it a good data structure when a lot of file I/O is needed?
What about random file access (fseek etc)?
EDIT: I've decided to wrap a std::vector<unsigned int> in a new class that has functionality demanded by my requirements.
Is there a way to do something like directly feed a pointer from it to fwrite()?
No, but you can do it with std::fstream
std::ofstream f("output.file");
std::copy(vb.begin(), vb.end(), std::ostream_iterator<bool>(f, " "));
And how would one do file input into such a vector?
Using std::fstream
std::ifstream f("input.file");
std::copy(std::istream_iterator<bool>(f), {}, std::back_inserter(vb));
And lastly, is it a good data structure when a lot of file I/O is needed?
No, vector<bool> is rarely a good data structure for any purpose. See http://howardhinnant.github.io/onvectorbool.html
What about random file access (fseek etc)?
What about it?
You could use a std::vector<char>, resize it to the size of the file (or other size, say you want to process fixed length blocks), then you can pass the contents of it to a function like fread() or fwrite() in the following way:
std::vector<char> fileContents;
fileContents.resize(100);
fread(&fileContents[0], 1, 100, theFileStream);
This really just allows you to have a resizable char array, in C++ style. Perhaps it is a useful starting point? The point being that you can directly access the memory behind the vector, since it is guaranteed to be laid out sequentially, just like an array.
The same concept would work for a std::vector<bool> - I'd just be careful when freading into this as off the top of my head I can't tell you how big (sizeof wise) a bool is, as it depends on the platform (8bit vs 16bit vs 32bit, if you were working on a microcontroller for example).
It seems std::vector<bool> can be optimised to store each bool in a single bit, so, definitely do not attempt to use the memory behind a vector<bool> directly unless you know it is going to work!

Why does std::fstream use char*?

I'm writing a small program that reads the bytes from a file in binary file in groups of 16 bytes (please don't ask why), modifies them, and then writes them to another file.
The fstream::read function reads into a char * buffer, which I was initially passing to a function that looks like this:
char* modify (char block[16], std::string key)
The modification was done on block which was then returned. On roaming the posts of SO, I realized that it might be a better idea to use std::vector<char>. My immediate next worry was how to convert a char * to a std::vector<char>. Once again, SO gave me an answer.
But now what I'm wondering is: If its such a good idea to use std::vector<char> instead of char*, why do the fstream functions use char* at all?
Also, is it a good idea to convert the char* from fstream to std::vector<char> in the first place?
EDIT: I now realize that since fstream::read is used to write data into objects directly, char * is necessary. I must now modify my question. Firstly, why are there no overloaded functions for fstream::read? And secondly, in the program that I've written about, which is a better option?
To use it with a vector, do not pass a pointer to the vector. Instead, pass a pointer to the vector content:
vector<char> v(size);
stream.read(&v[0], size);
fstream() functions let you use char*s so you can point them at arbitrary pre-allocated buffers. std::vector<char> can be sized to provide an appropriate buffer, but it will be on the heap and there's allocation costs involved with that. Sometimes too you may want to read or write data to a specific location in memory - even in shared memory - rather than accepting whatever heap memory the vector happens to have allocated. Further, you may want to use fstream without having included the vector header... it's nice to be able to avoid unnecessary includes as it reduces compilation time.
As your buffers are always 16 bytes in size, it's probably best to allocate them as char [16] data members in an appropriate owning object (if any exists), or on the stack (i.e. some function's local variable).
vector<> is more useful when the alternative is heap allocation - whether because the size is unknown at compile time, or is particularly large, or you want more flexible control of the memory lifetime. It's also useful when you specifically want some of the other vector functionality, such as ability to change the number of elements afterwards, to sort the bytes etc. - it seems very unlikely you'll want to do any of that so a vector raises questions in the mind of the person reading your code about what you'll do for no good purpose. Still, the choice of char[16] vs. vector appears (based on your stated requirements) more a matter of taste than objective benefit.

C++ ifstream::read() and C arrays

It seems to be a general consensus that C arrays are bad and that using the smarter alternatives, like vectors or C++ strings is the way to go. No problem here.
Having said that, why does the read() member of ifstream input data to a char*...
The question is: can I input to a vector of bytes somehow using the STL alone?
A related bonus question: do you often check for ios::badbit and ios::failbit especially if you work with a dynamically allocated C string within that scope? Do you do deallocation of the C string in the catch()'s?
Thank you for reading.
You can read directly into an allocated vector (I have no ability to compile this from here so there might be typos or transposed parameters etc...) but the idea is correct.
vector<char> data;
data.resize(100);
// Read 100 characters into the storage of data
thing.read(&data[0], 100);
Having said that, why does the read() member of ifstream input data to a char*...
It was a design choice. Not necessarily the smartest one.
The question is: can I input to a vector of bytes somehow
The underlying storage for a std::vector<char> is guaranteed to be contiguous (i.e., a single block of memory), so yes. To use ifstream::read, you must (a) ensure that the vector's size is big enough (using .resize() - not .reserve()! This is because the vector has no way of knowing about the data you're reading in to its unused capacity and updating its size), and then (b) obtain a pointer to the beginning element of the vector (e.g. with &v.front() or &(v[0])).
If you don't want to pre-size the vector, then there are more advanced techniques you can use that involve iterator types and standard library algorithms. These have the advantage that you can easily read an entire file into a vector without having to check the file length first (and the standard tricks for checking a file's length might not be as reliable as you think!).
It looks something like:
#include <iterator>
#include <algorithm> // in addition to what you already have.
// ...
std::ifstream ifs;
std::vector v;
// ...
std::istreambuf_iterator<char> begin(ifs), end;
std::copy(begin, end, std::back_inserter(v));
A vector of bytes is not a string, of course. But you can do the exact same thing with std::string - the std::back_inserter function is smart enough to create the appropriate iterator type for any supplied type that provides a .push_back(), which both string and vector do. That's the magic of templates. :)
using the STL alone?
I'm confused. I thought we were talking about the C++ standard library. What is this STL of which you speak?
A related bonus question: do you often check for ios::badbit and ios::failbit
No; I usually write code in such a way that I can just check the result of the reading operation directly. See http://www.parashift.com/c++-faq-lite/input-output.html#faq-15.4 for an example/discussion.
especially if you work with a dynamically allocated C string within that scope?
This is a bad idea in general.
Do you do deallocation of the C string in the catch()'s?
And having to deal with this sort of thing is one of the main reasons why. This is not easy to get right. But I hope you realize that catching an exception and checking for ios::bits are two totally different things. Although you can configure the stream object to throw exceptions instead of just setting the flag bits :)

Should I preallocate std::stringstream?

I use std::stringstream extensively to construct strings and error messages in my application. The stringstreams are usually very short life automatic variables.
Will such usage cause heap reallocation for every variable? Should I switch from temporary to class-member stringstream variable?
In latter case, how can I reserve stringstream buffer? (Should I initialize it with a large enough string or is there a more elegant method?)
Have you profiled your execution, and found them to be a source of slow down?
Consider their usage. Are they mostly for error messages outside the normal flow of your code?
As far as reserving space...
Some implementations probably reserve a small buffer before any allocation takes place for the stringstream. Many implementations of std::string do this.
Another option might be (untested!)
std::string str;
str.reserve(50);
std::stringstream sstr(str);
You might find some more ideas in this gamedev thread.
edit:
Mucking around with the stringstream's rdbuf might also be a solution. This approach is probably Very Easy To Get Wrong though, so please be sure it's absolutely necessary. Definitely not elegant or concise.
Although "mucking around with the stringstream's rdbuf...is probably Very Easy To Get Wrong", I went ahead and hacked together a proof-of-concept anyway for fun, as it has always bugged me that there is no easy way to reserve storage for stringstream. Again, as #luke said, you are probably better off optimizing what your profiler tells you needs optimizing, so this is just to address "What if I want to do it anyway?".
Instead of mucking around with stringstream's rdbuf, I made my own, which does pretty much the same thing. It implements only the minimum, and uses a string as a buffer. Don't ask me why I called it a VECTOR_output_stream. This is just a quickly-hacked-together thing.
constexpr auto preallocated_size = 256;
auto stream = vector_output_stream(preallocated_size);
stream << "My parrot ate " << 3 << " cookies.";
cout << stream.str() << endl;
The Bad
This is an old question, but even as of C++1z/C++2a in Visual Studio 2019, stringstream has no ideal way of reserving a buffer.
The other answers to this question do not work at all and for the following reasons:
calling reserve on an empty string yields an empty string, so stringstream constructor doesn't need to allocate to copy the contents of that string.
seekp on a stringstream still seems to be undefined behavior and/or does nothing.
The Good
This code segment works as expected, with ss being preallocated with the requested size.
std::string dummy(reserve, '\0');
std::stringstream ss(dummy);
dummy.clear();
dummy.shrink_to_fit();
The code can also be written as a one-liner std::stringstream ss(std::string(reserve, '\0'));.
The Ugly
What really happens in this code segment is the following:
dummy is preallocated with the reserve, and the buffer is subsequently filled with null bytes (required for the constructor).
stringstream is constructed with dummy. This copies the entire string's contents into an internal buffer, which is preallocated.
dummy is then cleared and then erased, freeing up its allocation.
This means that in order to preallocate a stringstream, two allocations, one fill, and one copy takes place. The worst part is that during the expression, twice as much memory is needed for the desired allocation. Yikes!
For most use cases, this might not matter at all and it's OK to take the extra fill and copy hit to have fewer reallocations.
I'm not sure, but I suspect that stringbuf of stringstream is tightly related with resulted string. So I suspect that you can use ss.seekp(reserved-1); ss.put('\0'); to reserve reserved amount of bytes inside of underlying string of ss. Actually I'd like to see something like ss.seekp(reserved); ss.trunc();, but there is no trunc() method for streams.