Least onerous way to implement generic formatted stream output in CUDA? - c++

I want to be able to write something close to:
std::cout << "Hello" << my_world_string << ", " << std::setprecision(5) << my_double << '\n';
in CUDA device-side code, for debugging templated functions - and for this kind of line of code to result in a single, unbroken, output line (i.e. the equivalent of a single CUDA printf() call - which typically doesn't get mangled with output from other threads).
Of course, that's not possible since there are no files or file descriptors in device-side code, nor is any of the std::ostream code usable in device-side code. Essentially what we have to work with is CUDA's hardware+software hack enabling printf()s. But it is obviously possible to get something like:
stream << "Hello" << my_world_string << ", " << foo::setprecision(5) << my_double << '\n';
printf("%s", stream.str());
My question is: What should I implement which would allow me to write code as close to the above as possible, minimizing effort / amount of code to write?
I used the identifier stream but it doesn't have to be a stream. Nor does the code need to look just like I laid it out. The point is for me to be able to have printing code in a templated device function.
All code will be written in C++11.
Code may assume compilation is performed either with C++11 or a later version of the standard.
I can use existing FOSS code, but only if its license is permissive, e.g. 3-BSD, CC-BY-SA, MIT - but not GPL.

Currently, the way I'm thinking of implementing this is:
Implement an std::ostringstream-like class which can take its initial storage from elsewhere (on construction).
With such an object, you can then printf("%s\n", my_gpu_sstream.str()) .
Allow the GPU-ostringstream to be constructed with a fixed-sized buffer.
Allow the GPU-ostringstream to allocate variable-size buffers using CUDA's device-side malloc().
and Bob's your uncle.
However, I would really rather avoid implementing a full-blown stringstream myself. Seems like a whole lot of redundant work and code.
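For scale, here is a minimal sketch of the fixed-size-buffer variant from the plan above. It is not the eventual cuda-kat/strf implementation. The names fixed_stream, HD and example are hypothetical, floating-point and manipulators are left out entirely, and overlong output is simply truncated. The HD macro expands to nothing outside nvcc, so the same code can be tried as plain host C++.

```cpp
#include <cassert>
#include <cstddef>
#include <stdio.h>

// Expands to CUDA's function-space qualifiers under nvcc and to nothing
// elsewhere, so the sketch also compiles as ordinary host C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical name: an append-only formatter over caller-provided storage.
class fixed_stream {
public:
    HD fixed_stream(char* buffer, std::size_t capacity)
        : buf_(buffer), cap_(capacity), len_(0)
    {
        assert(buffer != nullptr && capacity > 0);
        buf_[0] = '\0';
    }

    HD fixed_stream& operator<<(char c) { put(c); return *this; }

    HD fixed_stream& operator<<(const char* s)
    {
        while (*s != '\0') { put(*s++); }
        return *this;
    }

    HD fixed_stream& operator<<(long long v)
    {
        char digits[20]; // a 64-bit magnitude has at most 19 decimal digits
        unsigned long long u = (v < 0) ? 0ULL - static_cast<unsigned long long>(v)
                                       : static_cast<unsigned long long>(v);
        int n = 0;
        do { digits[n++] = static_cast<char>('0' + u % 10); u /= 10; } while (u != 0);
        if (v < 0) { put('-'); }
        while (n > 0) { put(digits[--n]); }
        return *this;
    }

    HD fixed_stream& operator<<(int v) { return *this << static_cast<long long>(v); }

    HD const char* str() const { return buf_; }   // always null-terminated
    HD std::size_t size() const { return len_; }  // characters stored so far

private:
    // Appends one character; silently drops it once the buffer is full.
    HD void put(char c)
    {
        if (len_ + 1 < cap_) { buf_[len_++] = c; buf_[len_] = '\0'; }
    }

    char*       buf_;
    std::size_t cap_;
    std::size_t len_;
};

// Usage: format into a local buffer, then emit the line with one printf call.
HD inline void example(int thread_index, int value)
{
    char storage[64];
    fixed_stream s(storage, sizeof storage);
    s << "thread " << thread_index << ": value = " << value << '\n';
    printf("%s", s.str());
}
```

Even this much needs a separate overload for every type, which is the redundant work I was hoping to avoid.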
Edit: Done! I now have a working implementation in my cuda-kat library. I've used robhz786's strf library, which is a (header-only-if-you-like) string formatting library not based on standard streams. On its basis I've implemented an on-device stringstream, kat::stringstream, and on the basis of that, a "printf'ing ostream" class.
It's far from perfect: strf doesn't use standard library manipulators and has its own idioms for filling, setting precision etc. Also, compilation time is quite high. But it is quite usable. It even has the option to prepend each printed line with a prefix (e.g. the block & thread indices) if you configure it to do so. Output uses CUDA's intrinsic printf() mechanism when reaching the end of a line.


Writing single-char vs. char const* to buffer

When writing single characters to an output stream, the purist in me wants to use single quotes (e.g.):
unsigned int age{40};
std::ostringstream oss;
oss << "In 2022, I am " << age << '\n'; // 1. Single quotes around \n
oss << "In 2023, I will be " << age + 1u << "\n"; // 2. Minor ick--double quotes around \n
Because I'm writing a single character and not an arbitrary-length message, it doesn't seem necessary to have to provide a null-terminated string literal.
So I decided to measure the difference in speed. Naively, I'd expect option 1, the single-character version, to be faster (only one char, no need to handle \0). However, my test with Clang 13 on quick-bench indicates that option 2 is a hair faster. Is there an obvious reason for this?
Of course, if the program is spending a lot of time writing data to a stream anyway, chances are the program needs to be rethought. But I'd like to have a reasonably correct mental model, and because the opposite happened wrt what I expected, my model needs to be revised.
As you can see in the assembly and in the libc++ source here, both << operations in the end call the same function __put_character_sequence which the compiler decided to not inline in either case.
So, in the end you are passing a pointer to the single char object anyway and if there is a pointer indirection overhead it applies equally to both cases.
__put_character_sequence also takes the length of the string as an argument, which the compiler can easily evaluate at compile time for "\n" as well. So there is no benefit there either.
In the end it probably comes down to the compiler having to store the single character on the stack since without inlining it can't tell whether __put_character_sequence will modify it. (The string literal cannot be modified by the function and also would have the same identity between iterations of the loop.)
If the standard library used a different approach or the compiler did inline slightly differently, the result could easily be the other way around.

What is more efficient (C++) [duplicate]

Many C++ books contain example code like this...
std::cout << "Test line" << std::endl;
...so I've always done that too. But I've seen a lot of code from working developers like this instead:
std::cout << "Test line\n";
Is there a technical reason to prefer one over the other, or is it just a matter of coding style?
The varying line-ending characters don't matter, assuming the file is open in text mode, which is what you get unless you ask for binary. The compiled program will write out the correct thing for the system compiled for.
The only difference is that std::endl flushes the output buffer, and '\n' doesn't. If you don't want the buffer flushed frequently, use '\n'. If you do (for example, if you want to get all the output, and the program is unstable), use std::endl.
The difference can be illustrated by the following:
std::cout << std::endl;
is equivalent to
std::cout << '\n' << std::flush;
Use std::endl if you want to force an immediate flush of the output.
Use \n if you are worried about performance (which is probably not the case if you are using the << operator).
I use \n on most lines.
Then use std::endl at the end of a paragraph (but that is just a habit and not usually necessary).
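To see the flush difference directly, one option is to write through a stream buffer that counts how often it is asked to sync. This is only a sketch; counting_buf and count_flushes are made-up names, and the buffer keeps everything in a string instead of doing real I/O.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical helper: stores characters in a string and counts how often
// the stream asks it to flush (sync).
class counting_buf : public std::streambuf {
public:
    std::string data;
    int flushes = 0;

protected:
    // No put area is set up, so every character written ends up here.
    int_type overflow(int_type ch) override
    {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            data.push_back(traits_type::to_char_type(ch));
        }
        return traits_type::not_eof(ch);
    }

    int sync() override { ++flushes; return 0; }
};

// Writes `lines` lines, ending each with std::endl or with '\n',
// and returns how many flushes that caused.
inline int count_flushes(int lines, bool use_endl)
{
    assert(lines >= 0);
    counting_buf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl) { out << "line" << std::endl; }
        else          { out << "line" << '\n'; }
    }
    return buf.flushes;
}
```

With std::endl there is one flush per line; with '\n' there are none. On a real file or pipe each of those flushes typically costs a write to the OS.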
Contrary to other claims, the \n character is mapped to the correct platform end of line sequence only if the stream is going to a file (std::cin and std::cout being special but still files (or file-like)).
There might be performance issues, std::endl forces a flush of the output stream.
There's another function call implied in there if you're going to use std::endl
a) std::cout << "Hello\n";
b) std::cout << "Hello" << std::endl;
a) calls operator << once.
b) calls operator << twice.
I recalled reading about this in the standard, so here goes:
See the C11 standard, which defines how the standard streams behave. Since C++ programs interface with the CRT, the C11 standard should govern the flushing policy here.
ISO/IEC 9899:201x
7.21.3 §7
At program startup, three text streams are predefined and need not be opened explicitly
— standard input (for reading conventional input), standard output (for writing
conventional output), and standard error (for writing diagnostic output). As initially
opened, the standard error stream is not fully buffered; the standard input and standard
output streams are fully buffered if and only if the stream can be determined not to refer
to an interactive device.
7.21.3 §3
When a stream is unbuffered, characters are intended to appear from the source or at the
destination as soon as possible. Otherwise characters may be accumulated and
transmitted to or from the host environment as a block. When a stream is fully buffered,
characters are intended to be transmitted to or from the host environment as a block when
a buffer is filled. When a stream is line buffered, characters are intended to be
transmitted to or from the host environment as a block when a new-line character is
encountered. Furthermore, characters are intended to be transmitted as a block to the host
environment when a buffer is filled, when input is requested on an unbuffered stream, or
when input is requested on a line buffered stream that requires the transmission of
characters from the host environment. Support for these characteristics is
implementation-defined, and may be affected via the setbuf and setvbuf functions.
This means that std::cout and std::cin are fully buffered if and only if they are referring to a non-interactive device. In other words, if stdout is attached to a terminal then there is no difference in behavior.
However, if std::cout.sync_with_stdio(false) is called, then '\n' will not cause a flush even to interactive devices. Otherwise '\n' is equivalent to std::endl unless piping to files: c++ ref on std::endl.
They will both write the appropriate end-of-line character(s). In addition to that endl will cause the buffer to be committed. You usually don't want to use endl when doing file I/O because the unnecessary commits can impact performance.
Not a big deal, but endl won't work in boost::lambda.
(cout<<_1<<endl)(3); //error
(cout<<_1<<"\n")(3); //OK , prints 3
If you use Qt and endl, you could accidentally end up using an incorrect endl which gives you very surprising results. See the following code snippet:
#include <iostream>
#include <QtCore/QtCore>
#include <QtGui/QtGui>
// notice that there is no "using namespace std;"
int main(int argc, char** argv)
{
    QApplication qapp(argc,argv);
    QMainWindow mw;
    std::cout << "Finished Execution!" << endl;
    // This prints something similar to: "Finished Execution!67006AB4"
    return qapp.exec();
}
Note that I wrote endl instead of std::endl (which would have been correct) and apparently there is an endl function defined in qtextstream.h (which is part of QtCore).
Using "\n" instead of endl completely sidesteps any potential namespace issues.
This is also a good example why putting symbols into the global namespace (like Qt does by default) is a bad idea.
Something that I've never seen anyone say is that '\n' is affected by cout formatting:
#include <iostream>
#include <iomanip>
int main() {
    std::cout << "\\n:\n" << std::setw(2) << std::setfill('0') << '\n';
    std::cout << "std::endl:\n" << std::setw(2) << std::setfill('0') << std::endl;
}
Notice how, since '\n' is one character and the field width is set to 2, only one zero gets printed before '\n'.
I can't find anything about it anywhere, but it reproduces with clang, gcc and msvc.
I was super confused when I first saw it.
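The explanation is that operator<< for a char or a string literal is a formatted insertion, so it honours the width and fill, while std::endl writes the newline through put(), which is unformatted. Here is a small sketch that captures each case in a std::ostringstream; the names ending, padded_ending and self_check are made up for the illustration.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

enum class ending { char_literal, string_literal, endl_manip, put_call };

// Sets width 2 and fill '0', then writes a newline in the requested way
// and returns exactly what ended up in the stream.
inline std::string padded_ending(ending e)
{
    std::ostringstream os;
    os << std::setw(2) << std::setfill('0');
    switch (e) {
    case ending::char_literal:   os << '\n';      break; // formatted: padded
    case ending::string_literal: os << "\n";      break; // formatted: padded
    case ending::endl_manip:     os << std::endl; break; // put() + flush: not padded
    case ending::put_call:       os.put('\n');    break; // unformatted: not padded
    }
    return os.str();
}

inline void self_check()
{
    assert(padded_ending(ending::char_literal)   == "0\n");
    assert(padded_ending(ending::string_literal) == "0\n");
    assert(padded_ending(ending::endl_manip)     == "\n");
    assert(padded_ending(ending::put_call)       == "\n");
}
```

So "\n" is padded just like '\n'. If a width may still be pending, os.put('\n') gives a newline without padding and without the flush.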
According to the reference, this is an output-only I/O manipulator.
std::endl Inserts a newline character into the output sequence os and flushes it as if by calling os.put(os.widen('\n')) followed by os.flush().
When to use:
This manipulator may be used to produce a line of output immediately, when displaying output from a long-running process, logging activity of multiple threads or logging activity of a program that may crash unexpectedly.
An explicit flush of std::cout is also necessary before a call to std::system, if the spawned process performs any screen I/O. In most other usual interactive I/O scenarios, std::endl is redundant when used with std::cout because any input from std::cin, output to std::cerr, or program termination forces a call to std::cout.flush(). Use of std::endl in place of '\n', encouraged by some sources, may significantly degrade output performance.

Portable end of line

Is there any way to automatically use the correct EOL character depending on the OS used?
I was thinking of something like std::eol?
I know that it is very easy to use preprocessor directives, but I'm curious whether that is already available.
What I am interested in is that I usually have some messages in my applications that I combine later into a single string, and I want to have them separated with an EOL. I know that I could use std::stringstream << endl, but it sometimes seems like overkill compared to a regular append.
std::endl is defined to do nothing besides write '\n' to the stream and flush it (§ Flushing is defined to do nothing for a stringstream, so you're left with a pretty way of saying mystringstream << '\n'. The standard library implementation on your OS converts \n appropriately, so that's not your concern.
Thus endl is already the ultimate in performance and portability, and the only other thing you could want is << '\n' if you are trying to efficiently write to a file (not a stringstream). Well, << '\n' does also eliminate the pointless virtual call to stringbuf::flush. Unless profiling shows that empty function call to be taking time, don't think about it.
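For the use case in the question (combining messages into one string), a plain append of '\n' is enough, and the text-mode stream does any platform conversion when the string is finally written out. A minimal sketch, where join_lines and example are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins messages into one string, separated by '\n'.
inline std::string join_lines(const std::vector<std::string>& messages)
{
    std::string out;
    for (std::size_t i = 0; i < messages.size(); ++i) {
        if (i != 0) { out += '\n'; } // separator between messages, none at the end
        out += messages[i];
    }
    return out;
}

inline void example()
{
    const std::vector<std::string> messages = {"first", "second", "third"};
    assert(join_lines(messages) == "first\nsecond\nthird");
}
```

No stringstream is involved at all. The result can be written later with std::cout << joined << '\n'; or to any file opened in text mode.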
If you want to write a line separator to a stream:
std::cout << '\n';
std::cout << "\n";
std::cout << "whatever you were going to say anyway\n";
If the stream is text mode and the OS uses anything other than LF as a separator, it will be converted.
If you want to write a line separator and flush the stream:
std::cout << std::endl;
If you have binary-mode output for whatever reason, and you want to write a platform-specific line break, then I think you might have to do it indirectly (write '\n' to a text stream and then examine it in binary mode to see what you get). Possibly there's some way to directly get the line break sequence from the implementation, that I'm not aware of. It's not a great idea, anyway: if you're writing or reading a file in binary mode then it should be in a format which defines line breaks independently of the OS, or which doesn't have lines at all. That's what binary mode is for.
Just open a file in text mode
FILE *fp = fopen( "your_file.txt", "w+t" );
and then
fprintf( fp, "some string and integer %d\n", i );
and the OS will take care of the EOL accordingly to its standards.
Well, the STL has std::endl, which you can use as
std::cout << "Hi five!" << std::endl;
Note that besides adding an endline, std::endl also flushes the buffer, which may have undesirable performance consequences.
Files, even text files, are often transferred between machines, so "os-specific new line character" is an oxymoron.
It is true, though, that operating systems have a say in the matter, particularly one operating system, namely Windows, although many Windows programs will read \n-separated files correctly, even though the WinAPI multiline edit control would not. I suggest you think twice about what is right for you: it's not necessarily what your OS recommends. If your files are ever to be stored on removable media, do not use the OS standard. Use the global standard, 0xA.

How do I create my own ostream/streambuf?

For educational purposes I want to create a ostream and stream buffer to do:
fix endians when doing << myVar;
store in a deque container instead of using std::cout or writing to a file
log extra data, such as how many times I did <<, how many times I did .write, the number of bytes I've written and how many times I called flush(). But I do not need all the info.
I tried overloading but failed horribly. I tried overloading write by doing
ostream& write( const char* s, streamsize n )
in my basic_stringstream2 class (I copy-pasted basic_stringstream into my cpp file and modified it) but the code kept using basic_ostream. I looked through the code and it looks like I need to overload xsputn (which isn't mentioned on this page http://www.cplusplus.com/reference/iostream/ostream ) but what else do I need to overload? And how do I construct my class (what does it need to inherit, etc)?
The canonical approach consists in defining your own streambuf.
You should have a look at:
Angelika Langer's articles on IOStreams derivation
James Kanze's articles on filtering streambufs
boost.iostream for examples of application
For A+C) I think you should look at facets, they modify how objects are written as characters. You could store statistics here as well on how many times you streamed your objects.
Check out How to format my own objects when using STL streams? for an example.
For B) You need to create your own streambuf and connect your ostream to that buffer (constructor argument). See Luc's links + Deriving new streambuf classes.
In short you need to implement this for an ostream (minimum):
overflow (put a single char or flush buffer)
xsputn (put a char array to buffer)
sync
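As a rough sketch of those three overrides, here is a stream buffer that stores everything in a std::deque<char> and counts the calls. The class name deque_buf and its counters are made up for the illustration. Which of overflow or xsputn a given << ends up calling depends on the standard library implementation, so the individual counts are not portable.

```cpp
#include <cassert>
#include <deque>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical class: keeps all written characters in a std::deque and
// counts calls to the three virtual functions.
class deque_buf : public std::streambuf {
public:
    std::deque<char> data;
    int overflow_calls = 0;
    int xsputn_calls   = 0;
    int sync_calls     = 0;

protected:
    // Reached for single characters, because no put area is ever set up.
    int_type overflow(int_type ch) override
    {
        ++overflow_calls;
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            data.push_back(traits_type::to_char_type(ch));
        }
        return traits_type::not_eof(ch);
    }

    // Reached for character sequences, e.g. from ostream::write().
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        ++xsputn_calls;
        data.insert(data.end(), s, s + n);
        return n;
    }

    // Reached from flush() and std::endl; in-memory storage has nothing to do.
    int sync() override { ++sync_calls; return 0; }
};

inline void example()
{
    deque_buf buf;
    std::ostream out(&buf);   // an ordinary ostream on top of the custom buffer
    out << "abc" << 42;
    out.flush();
    assert(std::string(buf.data.begin(), buf.data.end()) == "abc42");
    assert(buf.sync_calls == 1);
}
```

The stream is constructed from a pointer to the buffer, so the buffer has to outlive the stream. Endian fixing does not belong at this level, since the buffer only ever sees characters that have already been formatted.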
I'm not sure that what you want to do is possible. The << operators are not virtual. So you could define yourstream &operator << (yourstream &strm, int i) to do what you want with the endian conversion and counting, and it will work when your code calls it directly. But if you pass a yourstream object into a function that expects an ostream, any time that function calls <<, it will go to the original ostream version instead of yours.
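A small sketch of that pitfall: overload resolution on << looks only at the static type, so the custom overload is lost as soon as the object is passed as a std::ostream&. The names tagged_stream, print_value, direct and indirect are hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stream type that tries to customise how ints are written.
class tagged_stream : public std::ostringstream {};

// Chosen only when the static type of the left operand is tagged_stream.
inline tagged_stream& operator<<(tagged_stream& s, int i)
{
    static_cast<std::ostream&>(s) << "int(" << i << ")";
    return s;
}

// Generic code that only knows about std::ostream.
inline void print_value(std::ostream& os, int i) { os << i; }

// Calls << on the derived type directly: the custom overload is used.
inline std::string direct()
{
    tagged_stream s;
    s << 7;
    return s.str();
}

// Goes through a std::ostream&: the built-in member operator<< is used.
inline std::string indirect()
{
    tagged_stream s;
    print_value(s, 7);
    return s.str();
}

inline void example()
{
    assert(direct() == "int(7)");
    assert(indirect() == "7");
}
```

The same thing happens after the first chained insertion that returns std::ostream&, which is why customisation normally goes into the streambuf or a facet.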
As I understand it, the streams facilities have been set up so that you can "easily" define a new stream type which uses a different sort of buffer (like, say, a deque of chars), and you can very easily add support for outputting your own classes via <<. I don't think you are intended to be able to redefine the middle layer between those.
And particularly, the entire point of the << interface is to provide nicely formatted text output, while it sounds like you actually want binary output. (Otherwise the reference to "endian" makes no sense.) Even assuming there is some way of doing this I don't know, it will produce awkward binary output at best. For instance, consider the end user overload to output a point in 3D space. The end user version of << will probably do something like << '(' << x << ", " << y << ", " << z << ')'. That will look nice in a text stream, but it's a lot of wasted and completely useless characters in a binary stream, which would ideally just use << x << y << z. (And how many calls to << should those count as?)

