I am doing in-memory image conversions between two frameworks (OpenSceneGraph and wxWidgets). Not wanting to care about the underlying classes (osg::Image and wxImage), I use the stream oriented I/O features both APIs provide like so:
1) Create an std::stringstream
2) Write to the stream using OSG's writers
3) Read from the stream using wxWigdets readers
This works fairly well. Until now I've been using direct access to the stream buffer, but my attention has been recently caught by the "non-contiguous underlying buffer" problem of the std::stringstream. I had been using a kludge to get a const char* ptr to the buffer - but it worked (tested on Windows, Linux and OSX, using MSVC 9 and GCC 4.x), so I never fixed it.
Now I understand that this code is a time bomb and I want to get rid of it. This problem has been brought up several times on SO (here for instance), but I could not find an answer that could really help me do the simplest thing that could possibly work.
I think the most reasonable thing to do is to create my own streambuf using a vector behind the scenes - this would guarantee that the buffer is contiguous. I am aware that this would not a generic solution, but given my constraints:
1) the required size is not infinite and actually quite predictable
2) my stream really needs to be an std::iostream (I can't use a raw char array) because of the APIs
anybody knows how I can either a custom stringbuf using a vector of chars ? Please do not answer "use std::stringstream::str()", since I know we can, but I'm precisely looking for something else (even though you'd say that copying 2-3 MB is so fast that I wouldn't even notice the difference, let's consider I am still interested in custom stringbufs just for the beauty of the exercise).
If you can use just an istream or an ostream (rather than
bidirectional), and don't need seeking, it's pretty simple (about 10
lines of code) to create your own streambuf using std::vector<char>.
But unless the strings are very, very large, why bother? The C++11 standard
guarantees that std::string is contiguous; that a char*
obtained by &myString[0] can be used to as a C style array.` And the
reason C++11 added this guarantee was in recognition of existing
practice; there simply weren't any implementations where this wasn't the
case (and now that it's required, there won't be any implementations in
the future where this isn't the case).
boost::iostreams have a few ready made sinks for this. There's array_sink if you have some sort of upper limit and can allocate the chunk upfront, such a sink won't grow dynamically but on the other hand that can be a positive as well. There's also back_inserter_device, which is more generic and works straight up with std::vector for example. An example using back_inserter_device:
#include <string>
#include <iostream>
#include "boost/iostreams/stream_buffer.hpp"
#include "boost/iostreams/device/back_inserter.hpp"
int main()
{
std::string destination;
destination.reserve( 1024 );
boost::iostreams::stream_buffer< boost::iostreams::back_insert_device< std::string > > outBuff( ( destination ) );
std::streambuf* cur = std::cout.rdbuf( &outBuff );
std::cout << "Hello!" << std::endl;
// If we used array_sink we'd need to use tellp here to retrieve how much we've actually written, and don't forgot to flush if you don't end with an endl!
std::cout.rdbuf( cur );
std::cout << destination;
}
Related
It seems like standard programming practice and the POSIX standard are at odds with each other. I'm working with a program and I noticed that I see a lot of stuff like:
char buf[NAME_MAX + 1]
And I'm also seeing that a lot of operating systems don't define NAME_MAX and say that that they technically don't have to according to POSIX because you're supposed to use pathconf to get the value it's configured to at runtime rather than hard-coding it as a constant anyway.
The problem is that the compiler won't let me use pathconf this way with arrays. Even if I try storing the result of pathconf in a const int, it still throws a fit and says it has to be a constant. So it looks like in order to actually use pathconf, I would have to avoid using an array of chars for the buffer here because that apparently isn't good enough. So I'm caught between a rock and a hard place, because the C++ standard seemingly won't allow me to do what POSIX says I must do, that is determine the size of a character buffer for a filename at runtime rather than compile time.
The only information I've been able to find on this suggests that I would need to replace the array with a vector, but it's not clear how I would do it. When I test using a simple program, I can get this to work:
std::vector<char> buf((pathconf("/", _PC_NAME_MAX) + 1));
And then I can figure out the size by calling buf.size() or something. But I'm not sure if this is the right approach at all. Does anyone have any experience with trying to get a program to stop depending on constants like NAME_MAX or MAXNAMLEN being defined in the system headers and getting the implementation to use pathconf at runtime instead?
Halfway measures do tend to result in conflicts of some sort.
const usigned NAME_MAX = /* get the value at runtime */;
char buf[NAME_MAX + 1];
The second line declares a C-style array (presumably) intended to hold a C-style string. In C, this is fine. In C++, there is an issue because the value of NAME_MAX is not known at compile time. That's why I called this a halfway measure—there is a mix of C-style code and C++ compiling. (Some compilers will allow this in C++. Apparently yours does not.)
The C++ approach would use C++-style strings, as in:
std::string buf;
That's it. The size does not need to be specified since memory will be allocated as needed, provided you avoid C-style interfaces. Use streaming (>>) when reasonable. If the buffer is being filled by user or file input, this should be all you need.
If you need to use C-style strings (perhaps this buffer is being filled by a system call written for C?), there are a few options for allocating the needed space. The simplest is probably a vector, much like you were thinking.
std::vector<char> buf{NAME_MAX + 1};
system_call(buf.data()); // Send a char* to the system call.
Alternatively, you could use a C++-style string, which could make manipulating the data more convenient.
std::string buf{NAME_MAX + 1, '\0'};
system_call(buf.data()); // Send a char* to the system call.
There is also a smart pointer option, but the vector approach might play nicer with existing code written for a C-style array.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 4 years ago.
Improve this question
I need a cross-platform portable function that is able to copy a 100GB+ binary file to a new destination. My first solution was this:
void copy(const string &src, const string &dst)
{
FILE *f;
char *buf;
long len;
f = fopen(src.c_str(), "rb");
fseek(f, 0, SEEK_END);
len = ftell(f);
rewind(f);
buf = (char *) malloc((len+1) * sizeof(char));
fread(buf, len, 1, f);
fclose(f);
f = fopen(dst.c_str(), "a");
fwrite(buf, len, 1, f);
fclose(f);
}
Unfortunately, the program was very slow. I suspect the buffer had to keep 100GB+ in the memory. I'm tempted to try the new code (taken from Copy a file in a sane, safe and efficient way):
std::ifstream src_(src, std::ios::binary);
std::ofstream dst_ = std::ofstream(dst, std::ios::binary);
dst_ << src_.rdbuf();
src_.close();
dst_.close();
My question is about this line:
dst_ << src_.rdbuf();
What does the C++ standard say about it? Does the code compiled to byte-by-byte transfer or just whole-buffer transfer (like my first example)?
I'm curious does the << compiled to something useful for me? Maybe I don't have to invest my time on something else, and just let the compiler do the job inside the operator? If the operator translates to looping for me, why should I do it myself?
PS: std::filesystem::copy is impossible as the code has to work for C++11.
The crux of your question is what happens when you do this:
dst_ << src_.rdbuf();
Clearly this is two function calls: one to istream::rdbuf(), which simply returns a pointer to a streambuf, followed by one to ostream::operator<<(streambuf*), which is documented as follows:
After constructing and checking the sentry object, checks if sb is a null pointer. If it is, executes setstate(badbit) and exits. Otherwise, extracts characters from the input sequence controlled by sb and inserts them into *this until one of the following conditions are met: [...]
Reading this, the answer to your question is that copying a file in this way will not require buffering the entire file contents in memory--rather it will read a character at a time (perhaps with some chunked buffering, but that's an optimization that shouldn't change our analysis).
Here is one implementation: https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-api-4.6/a01075_source.html (__copy_streambufs). Essentially it a loop calling sgetc() and sputc() repeatedly until EOF is reached. The memory required is small and constant.
The C++ standard (I checked C++98, so this should be extremely compatible) says in [lib.ostream.inserters]:
basic_ostream<charT,traits>& operator<<
(basic_streambuf<charT,traits> *sb);
Effects: If sb is null calls setstate(badbit) (which may throw ios_base::failure).
Gets characters from sb and inserts them in *this. Characters are read from sb and inserted until any of the following occurs:
end-of-file occurs on the input sequence;
inserting in the output sequence fails (in which case the character to be inserted is not extracted);
an exception occurs while getting a character from sb.
If the function inserts no characters, it calls setstate(failbit) (which may throw ios_base::failure (27.4.4.3)). If an exception was thrown while extracting a character, the function set failbit in error state, and if failbit is on in exceptions() the caught exception is rethrown.
Returns: *this.
This description says << on rdbuf works on a character-by-character basis. In particular, if inserting of a character fails, that exact character remains unread in the input sequence. This implies that an implementation cannot just extract the whole contents into a single huge buffer upfront.
So yes, there's a loop somewhere in the internals of the standard library that does a byte-by-byte (well, charT really) transfer.
However, this does not mean that the whole thing is completely unbuffered. This is simply about what operator<< does internally. Your ostream object will still accumulate data internally until its buffer is full, then call write (or whatever low-level function your OS uses).
Unfortunately, the program was very slow.
Your first solution is wrong for a very simple reason: it reads the entire source file in memory, then write it entirely.
Files have been invented (perhaps in the 1960s) to handle data that don't fit in memory (and has to be in some "slower" storage, at that time hard disks or drums, or perhaps even tapes). And they have always been copied by "chunks".
The current (Unix-like) definition of file (as a sequence of bytes than is open-ed, read, write-n, close-d) is more recent than 1960s. Probably the late 1970s or early 1980s. And it comes with the notion of streams (which has been standardized in C with <stdio.h> and in C++ with std::fstream).
So your program has to work (like every file copying program today) for files much bigger than the available memory.You need some loop to read some buffer, write it, and repeat.
The size of the buffer is very important. If it is too small, you'll make too many IO operations (e.g. system calls). If it is too big, IO might be inefficient or even not work.
In practice, the buffer should today be much less than your RAM, typically several megabytes.
Your code is more C like than C++ like because it uses fopen. Here is a possible solution in C with <stdio.h>. If you code in genuine C++, adapt it to <fstream>:
void copyfile(const char*destpath, const char*srcpath) {
// experiment with various buffer size
#define MYBUFFERSIZE (4*1024*1024) /* four megabytes */
char* buf = malloc(MYBUFFERSIZE);
if (!buf) { perror("malloc buf"); exit(EXIT_FAILURE); };
FILE* filsrc = fopen(srcpath, "r");
if (!filsrc) { perror(srcpath); exit(EXIT_FAILURE); };
FILE* fildest = fopen(destpath, "w");
if (!fildest) { perror(destpath); exit(EXIT_FAILURE); };
for (;;) {
size_t rdsiz = fread(buf, 1, MYBUFFERSIZE, filsrc);
if (rdsiz==0) // end of file
break;
else if (rdsiz<0) // input error
{ perror("fread"); exit(EXIT_FAILURE); };
size_t wrsiz = fwrite(buf, rdsiz, 1, fildest);
if (wrsiz != 1) { perror("fwrite"); exit(EXIT_FAILURE); };
}
if (fclose(filsrc)) { perror("fclose source"); exit(EXIT_FAILURE); };
if (fclose(fildest)) { perror("fclose dest"); exit(EXIT_FAILURE); };
}
For simplicity, I am reading the buffer in byte components and writing it as a whole. A better solution is to handle partial writes.
Apparently dst_ << src_.rdbuf(); might do some loop internally (I have to admit I never used it and did not understand that at first; thanks to Melpopene for correcting me). But the actual buffer size matters a big lot. The two other answers (by John Swinck and by melpomene) focus on that rdbuf() thing. My answer focus on explaining why copying can be slow when you do it like in your first solution, and why you need to loop and why the buffer size matters a big lot.
If you really care about performance, you need to understand implementation details and operating system specific things. So read Operating systems: three easy pieces. Then understand how, on your particular operating system, the various buffering is done (there are several layers of buffers involved: your program buffers, the standard stream buffers, the kernel buffers, the page cache). Don't expect your C++ standard library to buffer in an optimal fashion.
Don't even dream of coding in standard C++ (without operating system specific stuff) an optimal or very fast copying function. If performance matters, you need to dive in OS specific details.
On Linux, you might use time(1), oprofile(1), perf(1) to measure your program's performance. You could use strace(1) to understand the various system calls involved (see syscalls(2) for a list). You might even code (in a Linux specific way) using directly the open(2), read(2), write(2), close(2) and perhaps readahead(2), mmap(2), posix_fadvise(2), madvise(2), sendfile(2) system calls.
At last, large file copying are limited by disk IO (which is the bottleneck). So even by spending days in optimizing OS specific code, you won't win much. The hardware is the limitation. You probably should code what is the most readable code for you (it might be that dst_ << src_.rdbuf(); thing which is looping) or use some library providing file copy. You might win a tiny amount of performance by tuning the various buffer sizes.
If the operator translates to looping for me, why should I do it myself?
Because you have no explicit guarantee on the actual buffering done (at various levels). As I explained, buffering matters for performance. Perhaps the actual performance is not that critical for you, and the ordinary settings of your system and standard library (and their default buffers sizes) might be enough.
PS. Your question contains at least 3 different questions (but related ones). I don't find it clear (so downvoted it), because I did not understand what is the most relevant one. Is it : performance? robustness? meaning of dst_ << src_.rdbuf();? Why is the first solution slow? How to copy large files quickly?
Is it evil to serialize struct objects using memcpy?
In one of my projects I am doing the following: I memcpy a struct object, base64 encode it, and write it to file. I do the inverse when parsing the data. It seems to work OK, but in certain situations (for example when using the WINDOWPLACEMENT for the HWND of Windows Media Player) it turns out that the decoded data does not match sizeof(WINDOWPLACEMENT).
Here are some code fragments:
// Using WINDOWPLACEMENT from Windows API headers:
typedef struct tagWINDOWPLACEMENT {
UINT length;
UINT flags;
UINT showCmd;
POINT ptMinPosition;
POINT ptMaxPosition;
RECT rcNormalPosition;
#ifdef _MAC
RECT rcDevice;
#endif
} WINDOWPLACEMENT;
static std::string EncodeWindowPlacement(const WINDOWPLACEMENT & inWindowPlacement)
{
std::stringstream ss;
{
Poco::Base64Encoder encoder(ss); // From the Poco C++ libraries
const char * offset = reinterpret_cast<const char*>(&inWindowPlacement);
std::vector<char> buffer(offset, offset + sizeof(inWindowPlacement));
for (size_t idx = 0; idx != buffer.size(); ++idx)
{
encoder << buffer[idx];
}
encoder.close();
}
return ss.str();
}
static WINDOWPLACEMENT DecodeWindowPlacement(const std::string & inEncoded)
{
std::string decodedString;
{
std::istringstream istr(inEncoded);
Poco::Base64Decoder decoder(istr); // From the Poco C++ libraries
decoder >> decodedString;
assert(decoder.eof());
if (decoder.fail())
{
throw std::runtime_error("Failed to parse Window placement data from the configuration file.");
}
}
if (decodedString.size() != sizeof(WINDOWPLACEMENT))
{
// !! Occurs frequently !!
throw std::runtime_error("Errors occured during parsing of the Window placement.");
}
WINDOWPLACEMENT windowPlacement;
memcpy(&windowPlacement, &decodedString[0], decodedString.size());
return windowPlacement;
}
I'm aware that copying classes in C++ using memcpy is likely to cause trouble because the copy constructors are not properly executed. I'm not sure if this also applies to C-style structs. Or is serialization by memory dumping simply not done?
Update:
A bug in Poco's Base64Encoder/Decoder is not impossible, but unlikely. Its test cases seem pretty thorough: Base64Test.cpp.
You will run into problems if you need to transfer these files between machines that do not all share the same endianness and word size, or if you add/remove slots from the structs in future versions and need to retain binary compatibility.
I'm not sure how operator>>() is implemented in Poco::Base64Decoder. If it is same as istream's operator>>(), then after decoder >> decodedString; decodedString may not contain all characters from the input. For example, if there is any whitespace character in encoded string then decoder >> decodedString; will read upto that whitespace.
Doing a memcpy of classes/structs is okay if they're just Plain Old Data (POD), but if that's the case, then you could rely on C++ doing the copying for you via copy constructors (which exist for both struct and class types in C++).
Certainly you can do it the way you have been doing it - one of the products I've worked on serializes data using memcpy, sends the data over the wire, and client applications decode the bytestream to get the data back.
But if you have a choice, you might want something higher level like boost.serialization, which offers more flexibility and deep-pointer copying. The aforementioned Google ProtoBuffers would work nicely too.
Here are some threads discussing serialization methods in C++:
boost serialization vs google protocol buffers?
C++ Serialization Performance
I wouldn't go as far as to say that it's evil, but I think it is asking for trouble and weird problems in many cases.
I know it has been done and it can work (I've seen people serialize structs like that to send over a network connection), but it has a number of drawbacks that have been pointed out already (inflexibility, endianness problems, structs containing pointers, packing, etc).
I'd recommend a more robust way of serializing and deserializing your data. I've heard lots of good things about Google protocol buffers, something like that will be a lot more flexible and will probably save you headaches in the end.
Serializing data in the manner you've done it is not particularly evil, if you know you're staying on a machine with the same byte size, word size, endian-ness, etc. Since you're serializing the window placement information, you probably don't care about portability between two different machines, and only want to save this information between sessions on the same machine. I'd hazard a guess that you're storing this into the Registry. If you want portability for other data that is actually useful when it's ported to other architectures, then you can look at many of the other suggestions posted here already, such as Google protocol buffers, etc. Whitespace is a red-herring, as all WS is irrelevant in a base64 encoded data stream and all decoders should ignore it (PoCo does).I am curious to know what are the sizes of the string and the structure when it fails.Knowing this might give you some insight into the problem.
Background:
I'm using Google's protobuf, and I would like to read/write several gigabytes of protobuf marshalled data to a file using C++. As it's recommended to keep the size of each protobuf object under 1MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes to the next offset until the end of the file is reached. This way, each protobuf can stay under 1MB, and I can glob them together to my heart's content.
[int32 offset]
[protobuf blob 1]
[int32 offset]
[protobuf blob 2]
...
[eof]
I have an implemntation that works on Github:
src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp
But I feel I have written some poor code, and would appreciate some advice on how to improve it. Thus,
Questions:
I'm using reinterpret_cast<char*> to read/write the 32 bit integers to and from the binary fstream. Since I'm using protobuf, I'm making the assumption that all machines are little-endian. I also assert that an int is indeed 4 bytes. Is there a better way to read/write a 32 bit integer to a binary fstream given these two limiting assumptions?
In reading from fstream, I create a temporary fixed-length char buffer, so that I can then pass this fixed-length buffer to the protobuf library to decode using ParseFromArray, as ParseFromIstream will consume the entire stream. I'd really prefer just to tell the library to read at most the next N bytes from fstream, but there doesn't seem to be that functionality in protobuf. What would be the most idiomatic way to pass a function at most N bytes of an fstream? Or is my design sufficiently upside down that I should consider a different approach entirely?
Edit:
#codymanix: I'm casting to char since istream::read requires a char array if I'm not mistaken. I'm also not using the extraction operator >> since I read it was poor form to use with binary streams. Or is this last piece of advice bogus?
#Martin York: Removed new/delete in favor of std::vector<char>. glob.cpp is now updated. Thanks!
Don't use new []/delete[].
Instead us a std::vector as deallocation is guaranteed in the event of exceptions.
Don't assume that reading will return all the bytes you requested.
Check with gcount() to make sure that you got what you asked for.
Rather than have Glob implement the code for both input and output depending on a switch in the constructor. Rather implement two specialized classes like ifstream/ofstream. This will simplify both the interface and the usage.
I read somewhere that snprintf is faster than ostringstream. Has anyone has any experiences with it? If yes why is it faster.
std::ostringstream is not required to be slower, but it is generally slower when implemented. FastFormat's website has some benchmarks.
The Standard library design for streams supports much more than snprintf does. The design is meant to be extensible, and includes protected virtual methods that are called by the publicly exposed methods. This allows you to derive from one of the stream classes, with the assurance that if you overload the protected method you will get the behavior you want. I believe that a compiler could avoid the overhead of the virtual function call, but I'm not aware of any compilers that do.
Additionally, stream operations often use growable buffers internally; which implies relatively slow memory allocations.
We replaced some stringstreams in inner loops with sprintf (using statically allocated buffers), and this made a big difference, both in msvc and gcc. I imagine that the dynamic memory management of this code:
{
char buf[100];
int i = 100;
sprintf(buf, "%d", i);
// do something with buf
}
is much simpler than
{
std::stringstream ss;
int i = 100;
ss << i;
std::string s = ss.str();
// do something with s
}
but i am very happy with the overall performance of stringstreams.
Some guys would possibly tell you about that the functions can't be faster than each other, but their implementation can. That's right i think i would agree.
You are unlikely to ever notice a difference in other than benchmarks. The reason that c++ streams generally tend to be slower is that they are much more flexible. Flexibility most often comes at the cost of either time or code growth.
In this case, C++ streams are based on stream-buffers. In itself, streams are just the hull that keep formatting and error flags in place, and call the right i/o facets of the c++ standard library (for example, num_put to print numbers), that print the values, well formatted, into the underlying stream-buffer connected to the c++ stream.
All this mechanisms - the facets, and the buffers, are implemented by virtual functions. While there is indeed no mark note, those functions must be implemented to be slower than c stdio pendants that fact will make them somewhat slower than using c stdio functions normally (i benchmark'ed that some time ago with gcc/libstdc++ and in fact noticed a slowdown - but which you hardly notice in day-by-day usage).
Absolutely this is implementation-specific.
But if you really want to know, write two small programs, and compare them. You would need to include typical usage for what you have in mind, the two programs would need to generate the same string, and you would use a profiler to look at the timing information.
Then you would know.
One issue would probably be that the type safety added by ostringstream carries extra overhead. I've not done any measurements, though.
As litb said, standard streams support many things we don't always need.
Some streams implementation get rid of this never used flexibility, see FAStream for instance.
It's quite possible that because sprintf is part of the CRT that is written in assembly. The ostringstream is part of the STL, and probably a little more generically written, and has OOP code/overhead to deal with.
Yes, if you run the function below on a few million numbers with Visual C++ 5.0, the first version takes about twice as long as the second and produces the same output.
Compiling tight loops into a .exe and running the Windows timethis something.exe' or the Linuxtime something' is how I investigate most of my performance curiosities. (`timethis' is available on the web somewhere)
void Hex32Bit(unsigned int n, string &result)
{
#if 0
stringstream ss;
ss
<< hex
<< setfill('0')
<< "0x" << setw(8) << n
;
result = ss.str();
#else
const size_t len = 11;
char temp[len];
_snprintf(temp, len, "0x%08x", n);
temp[len - 1] = '\0';
result = temp;
#endif
}
One reason I know that the printf family of functions are faster than the corresponding C++ functions (cout, cin, and other streams) is that the latter do typechecking. As this usually involves some requests to overloaded operators, it can take some time.
In fact, in programming competitions it is often recommended that you use printf et al rather than cout/cin for precisely this reason.