Does posix_fallocate work with files opened in append mode? - c++

I'm trying to preallocate disk space for file operations. However, I've run into a weird issue: posix_fallocate only allocates one byte when I call it to allocate disk space for a file opened in append mode, and the file contents are also unexpected. Has anyone seen this issue? My test code is:
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <cerrno>

int main(int argc, char **argv)
{
    FILE *fp = fopen("append.txt", "w");
    for (int i = 0; i < 5; ++i)
        fprintf(fp, "## Test loop %d\n", i);
    fclose(fp);

    sleep(1);

    int fid = open("append.txt", O_WRONLY | O_APPEND);
    struct stat status;
    fstat(fid, &status);
    printf("INFO: sizeof 'append.txt' is %ld Bytes.\n", status.st_size);

    int ret = posix_fallocate(fid, (off_t)status.st_size, 1024);
    if (ret) {
        switch (ret) {
        case EBADF:
            fprintf(stderr, "ERROR: %d is not a valid file descriptor, or is not opened for writing.\n", fid);
            break;
        case EFBIG:
            fprintf(stderr, "ERROR: exceed the maximum file size.\n");
            break;
        case ENOSPC:
            fprintf(stderr, "ERROR: There is not enough space left on the device\n");
            break;
        default:
            break;
        }
    }

    fstat(fid, &status);
    printf("INFO: sizeof 'append.txt' is %ld Bytes.\n", status.st_size);

    const char *hello = "hello world\n";
    write(fid, hello, 12);
    close(fid);
    return 0;
}
And the expected result should be,
## Test loop 0
## Test loop 1
## Test loop 2
## Test loop 3
## Test loop 4
hello world
However, the result of the above program is,
## Test loop 0
## Test loop 1
## Test loop 2
## Test loop 3
## Test loop 4
^#hello world
So, what's "^#"?
And the message shows,
INFO: sizeof 'append.txt' is 75 Bytes.
INFO: sizeof 'append.txt' is 76 Bytes.
Any clues?
Thanks

Quick Answer
Yes, posix_fallocate does work with files opened in APPEND mode, provided your filesystem supports the fallocate system call. If your filesystem does not support it, the glibc emulation ends up adding a single 0 byte to the end of the file in APPEND mode.
More Information
This was a strange one and really puzzled me. I found the answer by using the strace program which shows what system calls are being made.
Check this out:
fallocate(3, 0, 74, 1000) = -1 EOPNOTSUPP (Operation not supported)
fstat(3, {st_mode=S_IFREG|0664, st_size=75, ...}) = 0
fstatfs(3, {f_type=0xf15f, f_bsize=4096, f_blocks=56777565, f_bfree=30435527, f_bavail=27551380, f_files=14426112, f_ffree=13172614, f_fsid={1863489073, -1456395543}, f_namelen=143, f_frsize=4096}) = 0
pwrite(3, "\0", 1, 1073) = 1
It looks like the GNU C Library is trying to help you here. The fallocate system call is apparently not implemented on your filesystem, so GLibC is emulating it by using pwrite to write a 0 byte out at the end of the requested allocation, thus extending the file.
This works fine in normal write mode. But in APPEND mode the write is always done at the end of the file, so the pwrite ends up adding a single 0 byte at the current end of the file.
Not what was intended. Might be a GNU C Library bug.
It looks like ext4 does support fallocate: if I write the file into /tmp it works. It fails in my home directory because I am using an encrypted home directory on Ubuntu, with the ecryptfs filesystem.
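If you want to detect this in code rather than with strace, one option is to call the Linux-specific fallocate(2) wrapper directly: unlike posix_fallocate, it reports EOPNOTSUPP instead of silently falling back to the pwrite emulation. A minimal sketch (Linux/glibc only, and the helper name is just for illustration):
#define _GNU_SOURCE
#include <sys/types.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>

// Try the real fallocate() first so an unsupported filesystem is reported
// instead of being silently emulated by glibc's posix_fallocate fallback.
static int preallocate(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, 0, offset, len) == 0)
        return 0;                 // the filesystem really supports it
    if (errno == EOPNOTSUPP)
        fprintf(stderr, "fallocate not supported on this filesystem; "
                        "posix_fallocate would fall back to pwrite emulation\n");
    return -1;
}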

Per POSIX:
If the offset+len is beyond the current file size, then posix_fallocate() shall adjust the file size to offset+len. Otherwise, the file size shall not be changed.
So it doesn't make sense to use posix_fallocate with append mode, since it will extend the size of the file (filled with null bytes) and subsequent writes will take place after those null bytes, in space that's not yet reserved.
As for why it's only extending the file by one byte, are you sure that's correct? Have you measured? That sounds like a bug in the implementation.
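If you still want to preallocate before appending, one workaround is to skip O_APPEND, remember the old end of file, and write there explicitly with pwrite. A rough sketch of the idea (error handling mostly omitted; the unused reserved bytes stay as NULs at the tail until you shrink the file again):
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <cstdio>

int main()
{
    // Open WITHOUT O_APPEND so the preallocated region is what we write into.
    int fd = open("append.txt", O_WRONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);
    off_t end = st.st_size;              // remember the old end of file

    posix_fallocate(fd, end, 1024);      // reserve space past the old end

    const char hello[] = "hello world\n";
    // pwrite at the remembered offset: the data lands right after the old
    // contents, inside the region that was just reserved.
    pwrite(fd, hello, sizeof hello - 1, end);

    // The file is now end + 1024 bytes long; ftruncate() it back down once
    // you know the final size if the trailing NULs are unwanted.
    close(fd);
    return 0;
}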

Related

Why is the stdin fd not ready in this case?

According to Linux Programmer's Manual, poll can wait for one of a set of file descriptors to become ready to perform I/O.
My understanding is that if I add POLLIN to events, poll will return a positive integer when there is at least one fd that is ready to be read.
Consider the following code. I want the program to echo my input immediately after I type the character \n.
#include <poll.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

const int maxn = 1024;

int main() {
    char buffer[maxn];
    while (true) {
        struct pollfd pfd[1];
        std::memset(pfd, 0, sizeof pfd);
        pfd[0].fd = STDIN_FILENO;
        pfd[0].events = POLLIN;

        int ret = poll(pfd, 1, 1000);
        if (ret < 0) {
            // poll error: ignored here
        } else if (ret == 0) {
            // timeout: nothing to read yet
        } else {
            if ((pfd[0].revents & POLLIN) == POLLIN) {
                int n = fscanf(stdin, "%s", buffer);
                if (n > 0) {
                    printf("data from stdin: %s\n", buffer);
                }
            } else if ((pfd[0].revents & POLLHUP) == POLLHUP) {
                break;
            }
        }
    }
}
When I type
aa bb cc dd
I thought fscanf hadn't retrieved all the data from stdin, because it only reads aa. So when the loop restarts, stdin's fd should still be ready. As a consequence, (pfd[0].revents & POLLIN) == POLLIN should still hold, so I expected to see the following output:
data from stdin: aa
data from stdin: bb
data from stdin: cc
data from stdin: dd
However, only the first line is actually printed. This seems strange to me; it looks like epoll's edge-triggered mode, yet poll is level-triggered.
So can you explain why this happens with fscanf?
Polling works at the file descriptor level while fscanf works at the higher file handle level.
At the higher level, the C runtime library is free to cache the input stream in such a way that it would affect what you can see at the lower level.
For example (and this is probably what's happening here), the first time you fscanf your word aa, the entire line is read from the file descriptor and cached, before that first word is handed back to you.
A subsequent fscanf (with no intervening poll) would first check the cache to get the next word and, if it weren't there, it would go back to the file descriptor to get more input.
Unfortunately, the fact that you're checking for a poll event before doing this is causing problems. As far as the file descriptor level goes, the entire line has been read by your first fscanf so no further input is available - poll will therefore wait until such information does become available.
You can see this in action if you change:
n = fscanf(stdin, "%s", buffer);
into:
n = read(STDIN_FILENO, buffer, 3);
and change the printf to:
printf("data from stdin: %*.*s\n", n, n, buffer);
In that case, you do get the output you expect as soon as you press the ENTER key:
data from stdin: aa
data from stdin: bb
data from stdin: cc
data from stdin: dd
Just keep in mind that the sample code reads up to three characters at a time (like aa<space>) rather than a word. It's meant to illustrate what the problem is rather than to give you the solution (to match your question "Can you explain why this happens?").
The solution is not to mix descriptor and handle based I/O when the caching of the latter can affect the former.
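For completeness, here is roughly what the loop looks like when everything stays at the descriptor level, as recommended above. This is only a sketch: it prints whatever chunk read() returns (typically the whole line), not one word at a time, so any word splitting would have to happen in your own code.
#include <poll.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    char buffer[4096];
    while (true) {
        struct pollfd pfd{};
        pfd.fd = STDIN_FILENO;
        pfd.events = POLLIN;

        int ret = poll(&pfd, 1, 1000);
        if (ret <= 0)
            continue;                            // timeout or error; retry

        if (pfd.revents & POLLIN) {
            // One read() per poll() wakeup, at the descriptor level only.
            ssize_t n = read(STDIN_FILENO, buffer, sizeof buffer - 1);
            if (n <= 0)
                break;                           // EOF or error
            buffer[n] = '\0';
            printf("data from stdin: %s", buffer);
        } else if (pfd.revents & POLLHUP) {
            break;
        }
    }
    return 0;
}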

File read() hangs on binary large file

I'm working on a benchmark program. Upon making the read() system call, the program appears to hang indefinitely. The target file is 1 GB of binary data and I'm attempting to read directly into buffers that can be 1, 10 or 100 MB in size.
I'm using std::vector<char> to implement dynamically-sized buffers and handing off &vec[0] to read(). I'm also calling open() with the O_DIRECT flag to bypass kernel caching.
The essential coding details are captured below:
std::string fpath{"/path/to/file"};
size_t tries{};
int fd{};

while (errno == EINTR && tries < MAX_ATTEMPTS) {
    fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
    tries++;
}

// Throw exception if error opening file
if (fd == -1) {
    ostringstream ss {};
    switch (errno) {
    case EACCES:
        ss << "Error accessing file " << fpath << ": Permission denied";
        break;
    case EINVAL:
        ss << "Invalid file open flags; system may also not support O_DIRECT flag, required for this benchmark";
        break;
    case ENAMETOOLONG:
        ss << "Invalid path name: Too long";
        break;
    case ENOMEM:
        ss << "Kernel error: Out of memory";
    }
    throw invalid_argument {ss.str()};
}

size_t buf_sz{1024*1024};          // 1 MiB buffer
std::vector<char> buffer(buf_sz);  // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
Poking through the executable with gdb shows that buffers are allocated correctly, and the file I've tested with checks out in xxd. I'm using g++ 7.3.1 (with C++11 support) to compile my code on a Fedora Server 27 VM.
Why is read() hanging on large binary files?
Edit: Code example updated to more accurately reflect error checking.
There are multiple problems with your code.
This code will never work properly if errno ever has a value equal to EINTR:
while (errno == EINTR && tries < MAX_ATTEMPTS) {
    fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
    tries++;
}
That code won't stop once the file has been successfully opened; as long as errno stays EINTR it will keep reopening the file over and over, leaking file descriptors.
This would be better:
do
{
    fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
    tries++;
}
while ( ( -1 == fd ) && ( EINTR == errno ) && ( tries < MAX_ATTEMPTS ) );
Second, as noted in the comments, O_DIRECT can impose alignment restrictions on memory. You might need page-aligned memory:
So
size_t buf_sz{1024*1024}; // 1 MiB buffer
std::vector<char> buffer(buf_sz); // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
becomes
size_t buf_sz{1024*1024};   // 1 MiB buffer
// page-aligned buffer (needs <sys/mman.h>)
void *buffer = mmap(NULL, buf_sz, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
auto bytes_read = read(fd, buffer, buf_sz);
Note also that the Linux implementation of O_DIRECT can be very dodgy. It's been getting better, but there are still potential pitfalls that aren't very well documented at all. Along with the alignment restrictions, if the last chunk of data in the file isn't a full page, for example, you may not be able to read it if the filesystem's direct-IO implementation only allows reads of full pages (or some other block size). Likewise for write() calls: you may not be able to write just any number of bytes; you might be constrained to something like a 4k page.
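If mmap feels heavyweight, posix_memalign is another common way to get an aligned buffer. A minimal sketch; the 4096-byte alignment, the path, and the idea that both the buffer address and the transfer size have to match the device's block size are assumptions to verify for your filesystem:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main()
{
    const size_t align  = 4096;                 // assumed block/page size
    const size_t buf_sz = 1024 * 1024;          // also a multiple of align

    int fd = open("/path/to/file", O_RDONLY | O_DIRECT);
    if (fd == -1) { perror("open"); return 1; }

    void *buf = nullptr;
    if (posix_memalign(&buf, align, buf_sz) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    ssize_t n = read(fd, buf, buf_sz);          // aligned address and size
    printf("read returned %zd\n", n);

    free(buf);
    close(fd);
    return 0;
}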
This is also critical:
Most examples of read() hanging appear to be when using pipes or non-standard I/O devices (e.g., serial). Disk I/O, not so much.
Some devices simply do not support direct IO. They should return an error, but again, the O_DIRECT implementation on Linux can be very hit-or-miss.
Pasting your program and running it on my Linux system gave a working, non-hanging program.
The most likely cause of the failure is that the file is not a regular filesystem item, or that it involves a hardware element which is not working.
Try with a smaller size to confirm, and try on a different machine to help diagnose.
My complete code (with no error checking)
#include <vector>
#include <string>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>

int main( int argc, char ** argv )
{
    std::string fpath{"myfile.txt"};
    auto fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);

    size_t buf_sz{1024*1024};          // 1 MiB buffer
    std::vector<char> buffer(buf_sz);  // Creates vector pre-allocated with buf_sz chars (bytes)
    // Result is 0-filled buffer of size buf_sz
    auto bytes_read = read(fd, &buffer[0], buf_sz);
}
myfile.txt was created with
dd if=/dev/zero of=myfile.txt bs=1024 count=1024
If the file is not 1Mb in size, it may fail.
If the file is a pipe, it can block until the data is available.
Most examples of read() hanging appear to be when using pipes or non-standard I/O devices (e.g., serial). Disk I/O, not so much.
O_DIRECT flag is useful for filesystems and block devices. With this flag people normally map pages into the user space.
For sockets, pipes and serial devices it is plain useless because the kernel does not cache that data.
Your updated code hangs because fd is initialized to 0, which is STDIN_FILENO; the loop never actually opens the file, so the program ends up reading from stdin and hangs there.

Size error on read file

RESOLVED
I'm trying to make a simple file loader.
I aim to get the text from a shader file (plain text file) into a char* that I will compile later.
I've tried this function:
char* load_shader(char* pURL)
{
    FILE *shaderFile;
    char* pShader;

    // File opening
    fopen_s( &shaderFile, pURL, "r" );
    if ( shaderFile == NULL )
        return "FILE_ER";

    // File size
    fseek (shaderFile, 0, SEEK_END);
    int lSize = ftell (shaderFile);
    rewind (shaderFile);

    // Allocating size to store the content
    pShader = (char*) malloc (sizeof(char) * lSize);
    if (pShader == NULL)
    {
        fputs ("Memory error", stderr);
        return "MEM_ER";
    }

    // copy the file into the buffer:
    int result = fread (pShader, sizeof(char), lSize, shaderFile);
    if (result != lSize)
    {
        // size of file 106/113
        cout << "size of file " << result << "/" << lSize << endl;
        fputs ("Reading error", stderr);
        return "READ_ER";
    }

    // Terminate
    fclose (shaderFile);
    return 0;
}
But as you can see in the code, I get a strange size difference at the end of the process, which makes my function crash.
I must say I'm quite a beginner in C, so I might have missed some subtleties regarding memory allocation, types, pointers...
How can I solve this size issue?
EDIT 1:
First, I shouldn't return 0 at the end but pShader; that seemed to be what crashed the program.
Then, I changed the type of result to size_t, and added an end character to pShader by adding pShader[result] = '\0'; after the read, so I can display it correctly.
Finally, as @JamesKanze suggested, I turned fopen_s into fopen, as the former was not useful in my case.
First, for this sort of raw access, you're probably better off using the system level functions: CreateFile or open, ReadFile or read and CloseHandle or close, with GetFileSize or stat to get the size. Using FILE* or std::filebuf will only introduce an additional level of buffering and processing, for no gain in your case.

As to what you are seeing: there is no guarantee that an ftell will return anything exploitable as a numeric value; it could very well be just a magic cookie. On most current systems, it is a byte offset into the physical file, but on any non-Unix system, the offset into the physical file will not map directly to the logical file you are reading unless you open the file in binary mode. If you use "rb" to open the file, you'll probably see the same values. (Theoretically, you could get extra 0's at the end of the file, but practically, the OS's where that happened are either extinct, or only used on legacy mainframes.)
EDIT:
Since the answer stating this has been deleted: you should loop on the fread until it returns 0 (setting errno to 0 before each call, and checking it after the return to see whether the function returned because of an error or because it reached the end of file). Having said this: if you're on one of the usual Windows or Unix systems, and the file is local to the machine, and not too big, fread will read it all in one go. The difference in size you are seeing (given the numerical values you posted) is almost certainly due to the fact that the two byte Windows line endings are being mapped to a single '\n' character. To avoid this, you must open in binary mode; alternatively, if you really are dealing with text (and want this mapping), you can just ignore the extra bytes in your buffer, setting the '\0' terminator after the last byte actually read.
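Putting the fixes from EDIT 1 together with the binary-mode advice above, a corrected loader might look roughly like this (a sketch only: it returns NULL on failure instead of string literals, and the caller frees the result):
#include <cstdio>
#include <cstdlib>

char *load_shader(const char *path)
{
    FILE *f = fopen(path, "rb");       // binary mode: ftell matches fread
    if (f == NULL)
        return NULL;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);

    char *text = (char *) malloc(size + 1);   // one extra byte for '\0'
    if (text == NULL) {
        fclose(f);
        return NULL;
    }

    size_t got = fread(text, 1, (size_t) size, f);
    text[got] = '\0';                  // terminate at what was actually read
    fclose(f);
    return text;
}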

working of fwrite in c++

I am trying to simulate race conditions in writing to a file. This is what I am doing.
Opening a.txt in append mode in process1
writing "hello world" in process1
prints the ftell in process1 which is 11
put process1 in sleep
open a.txt again in append mode in process2
writing "hello world" in process2 (this correctly appends to the end of the file)
prints the ftell in process2 which is 22 (correct)
writing "bye world" in process2 (this correctly appends to the end of the file).
process2 quits
process1 resumes, and prints its ftell value, which is 11.
writing "bye world" in process1 --- I assume that since process1's ftell is 11, this write should overwrite part of the file.
However, process1's write goes to the end of the file, and there is no contention in writing between the processes.
I am using fopen as fopen("./a.txt", "a+").
Can anyone tell why is this behavior and how can I simulate the race condition in writing to the file?
The code of process1:
#include <iostream>
#include <fstream>
#include <string>
#include <stdio.h>
#include <unistd.h>   // for sleep()
#include "time.h"

using namespace std;

int main()
{
    FILE *f1 = fopen("./a.txt", "a+");
    cout << "opened file1" << endl;

    string data("hello world");
    fwrite(data.c_str(), sizeof(char), data.size(), f1);
    fflush(f1);
    cout << "file1 tell " << ftell(f1) << endl;
    cout << "wrote file1" << endl;

    sleep(3);

    string data1("bye world");
    cout << "wrote file1 end" << endl;
    cout << "file1 2nd tell " << ftell(f1) << endl;
    fwrite(data1.c_str(), sizeof(char), data1.size(), f1);
    cout << "file1 2nd tell " << ftell(f1) << endl;
    fflush(f1);
    return 0;
}
In process2, I have commented out the sleep statement.
I am using the following script to run:
./process1 &
sleep 2
./process2 &
Thanks for your time.
The writer code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKSIZE 1000000

int main(int argc, char **argv)
{
    FILE *f = fopen("a.txt", "a+");
    char *block = malloc(BLOCKSIZE);
    if (argc < 2)
    {
        fprintf(stderr, "need argument\n");
        return 1;
    }
    memset(block, argv[1][0], BLOCKSIZE);
    for(int i = 0; i < 3000; i++)
    {
        fwrite(block, sizeof(char), BLOCKSIZE, f);
    }
    fclose(f);
}
The reader function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKSIZE 1000000

int main(int argc, char **argv)
{
    FILE *f = fopen("a.txt", "r");
    int c;
    int oldc = 0;
    int rl = 0;
    while((c = fgetc(f)) != EOF)
    {
        if (c != oldc)
        {
            if (rl)
            {
                printf("Got %d of %c\n", rl, oldc);
            }
            oldc = c;
            rl = 0;
        }
        rl++;
    }
    fclose(f);
}
I ran ./writefile A & ./writefile B then ./readfile
I got this:
Got 1000999424 of A
Got 999424 of B
Got 999424 of A
Got 4096 of B
Got 4096 of A
Got 995328 of B
Got 995328 of A
Got 4096 of B
Got 4096 of A
Got 995328 of B
Got 995328 of A
Got 4096 of B
Got 4096 of A
Got 995328 of B
Got 995328 of A
Got 4096 of B
Got 4096 of A
Got 995328 of B
Got 995328 of A
Got 4096 of B
Got 4096 of A
Got 995328 of B
Got 995328 of A
As you can see, there are nice long runs of A and B, but they are not exactly 1000000 characters long, which is the size I wrote them in. The whole file (after a trial run with a smaller size first) is just short of 7 GB.
For reference: Fedora Core 16, with my own compiled 3.7rc5 kernel, gcc 4.6.3, x86-64, and ext4 on top of lvm, AMD PhenomII quad core processor, 16GB of RAM
Writing in append mode is an atomic operation. This is why it doesn't break.
Now... how to break it?
Try memory mapping the file and writing in the memory from the two processes. I'm pretty sure this will break it.
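A rough sketch of that idea (it assumes a.txt already exists and is at least 1 MiB; the path and size are only illustrative). Run two copies with different fill characters and the interleaving shows up, because nothing serializes stores through a shared mapping the way O_APPEND serializes write(2):
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    // Run as: ./mapwrite A & ./mapwrite B
    const char fill = (argc > 1) ? argv[1][0] : 'A';

    int fd = open("a.txt", O_RDWR);
    if (fd == -1) return 1;

    const size_t len = 1024 * 1024;            // assumes the file is >= 1 MiB
    char *p = (char *) mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    for (size_t i = 0; i < len; ++i)           // byte-by-byte stores maximize
        p[i] = fill;                           // the chance of interleaving

    munmap(p, len);
    close(fd);
    return 0;
}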
I'm pretty sure you can't RELY on this behaviour, but it may well work reliably on some systems. Writing to the same file from two different processes is likely to cause problems sooner or later, if you "try hard enough". And sod's law says that that's exactly when your boss is checking if the software works, when your customer takes delivery of the system you've sold, or when you are finalizing your report that took ages to produce, or some other important time.
The behavior you're trying to break or see depends on which OS you are working on, as writing to a file is a system call.
As for why the first process did not overwrite what the second process wrote: since you opened the file in append mode in both processes, the file position is moved to the end of the file before each write, regardless of the ftell value you saw earlier.
Did you try to do the same with the standard open and write functions? Might be interesting as well.
EDIT: The C++ reference documentation for fopen explains the append option:
"append/update: Open a file for update (both for input and output) with all output operations writing data at the end of the file. Repositioning operations (fseek, fsetpos, rewind) affects the next input operations, but output operations move the position back to the end of file."
This explains the behavior you observed.
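To see the contrast, here is a sketch of process1 using plain open/write without O_APPEND (same file and timing as in the question, otherwise illustrative). In this version the second write really does land at the descriptor's own offset and clobbers whatever process2 appended in the meantime:
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

int main()
{
    int fd = open("./a.txt", O_WRONLY);        // note: no O_APPEND
    if (fd == -1) return 1;

    const char *first = "hello world";
    write(fd, first, strlen(first));           // this descriptor's offset is now 11

    sleep(3);                                  // let process2 append meanwhile

    const char *second = "bye world";
    write(fd, second, strlen(second));         // writes at offset 11, overwriting
                                               // part of what process2 appended
    close(fd);
    return 0;
}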

C/C++ best way to send a number of bytes to stdout

I'm profiling my program, and the print function is taking a lot of the time. How can I send "raw" byte output directly to stdout instead of using fwrite, and make it faster? (I need to send all 9 bytes in print() to stdout at the same time.)
void print(){
    unsigned char temp[9];
    temp[0] = matrix[0][0];
    temp[1] = matrix[0][1];
    temp[2] = matrix[0][2];
    temp[3] = matrix[1][0];
    temp[4] = matrix[1][1];
    temp[5] = matrix[1][2];
    temp[6] = matrix[2][0];
    temp[7] = matrix[2][1];
    temp[8] = matrix[2][2];
    fwrite(temp, 1, 9, stdout);
}
matrix is defined globally as an unsigned char matrix[3][3];
IO is not an inexpensive operation. It is, in fact, a blocking operation, meaning that the OS can preempt your process when you call write to allow more CPU-bound processes to run, before the IO device you're writing to completes the operation.
The only lower-level function you can use (if you're developing on a *nix machine) is the raw write function, but even then your performance will not be much better than it is now. Simply put: IO is expensive.
The top rated answer claims that IO is slow.
Here's a quick benchmark with a sufficiently large buffer to take the OS out of the critical performance path, but only if you're willing to receive your output in giant blurps. If latency to first byte is your problem, you need to run in "dribs" mode.
Write 10 million records from a nine byte array
Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1
340ms to /dev/null
710ms to 90MB output file
15254ms to 90MB output file in "dribs" mode
FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0
450ms to /dev/null
550ms to 90MB output file on ZFS triple mirror
1150ms to 90MB output file on FFS system drive
22154ms to 90MB output file in "dribs" mode
There's nothing slow about IO if you can afford to buffer properly.
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char* argv[])
{
    int dribs = argc > 1 && 0 == strcmp(argv[1], "dribs");
    int err;
    int i;

    enum { BigBuf = 4*1024*1024 };
    char* outbuf = malloc(BigBuf);
    assert(outbuf != NULL);
    err = setvbuf(stdout, outbuf, _IOFBF, BigBuf);   // full buffering
    assert(err == 0);

    enum { ArraySize = 9 };
    char temp[ArraySize];
    enum { Count = 10*1000*1000 };

    for (i = 0; i < Count; ++i) {
        fwrite(temp, 1, ArraySize, stdout);
        if (dribs) fflush(stdout);
    }
    fflush(stdout);   // seems to be needed after setting own buffer
    fclose(stdout);
    if (outbuf) { free(outbuf); outbuf = NULL; }
}
The rawest form of output you can do is probably the write system call, like this:
write (1, matrix, 9);
1 is the file descriptor for standard out (0 is standard in, and 2 is standard error). Your standard out will only write as fast as whatever is reading it at the other end (i.e. the terminal, or the program you're piping into), which might be rather slow.
I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer it for you until it can be consumed by the other end. It's been a while, but I think it works like this
fcntl (1, F_SETFL, O_NONBLOCK);
YMMV though. Please correct me if I'm wrong on the syntax, as I said, it's been a while.
Perhaps your problem is not that fwrite() is slow, but that it is buffered.
Try calling fflush(stdout) after the fwrite().
This all really depends on your definition of slow in this context.
All printing is fairly slow, although iostreams are really slow for printing.
Your best bet would be to use printf, something along the lines of:
printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);
As everyone has pointed out, IO in a tight inner loop is expensive. I have normally ended up printing the matrix conditionally, based on some criterion, when I needed to debug it.
If your app is a console app, try redirecting its output to a file; it will be a lot faster than doing console refreshes, e.g. app.exe > matrixDump.txt
What's wrong with:
fwrite(matrix,1,9,stdout);
Both the one- and the two-dimensional arrays take up the same memory.
Try running the program twice, once with output and once without. You will notice that, overall, the one without the IO is the fastest. Also, you could fork the process (or create a thread), with one writing to a file (stdout) and one doing the operations.
So first, don't print on every entry. Basically, what I am saying is: do not do it like this:
for (int i = 0; i < 100; i++) {
    printf("Your stuff");
}
Instead, allocate a buffer either on the stack or on the heap, store your information there, and then write the whole buffer to stdout in one go, like this:
char *buffer = (char *) malloc(100);
for (int i = 0; i < 100; i++) {
    buffer[i] = 1;   // your byte value goes here
}
// once you are done, print it to the console with
write(1, buffer, 100);
But in your case, just use write(1, temp, 9);
I am pretty sure you can increase the output performance by increasing the buffer size, so that you have fewer fwrite calls. write might be faster, but I am not sure. Just try this:
❯ yes | dd of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s
vs
> yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s
The same applies to your code. Some tests over the last few days suggest that good buffer sizes are around 1 << 12 (= 4096) and 1 << 16 (= 65536) bytes.
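Applied to the code in the question, that just means giving stdout a bigger stdio buffer once, up front, so the many 9-byte fwrite calls are coalesced into far fewer write system calls. A minimal sketch using the upper end of that range (the record count is arbitrary):
#include <cstdio>

static char big_buf[1 << 16];   // 64 KiB stdio buffer for stdout

int main()
{
    // Must be called before anything is written to stdout.
    setvbuf(stdout, big_buf, _IOFBF, sizeof big_buf);

    unsigned char temp[9] = {0};
    for (int i = 0; i < 1000000; ++i)
        fwrite(temp, 1, sizeof temp, stdout);

    fflush(stdout);
    return 0;
}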
You can simply write the bytes to cout:
std::cout.write(reinterpret_cast<const char*>(temp), 9);
printf is more C-style.
Yet, IO operations are costly, so use them wisely.