Concurrent appends to a file: writes getting lost - C++

Are concurrent appends to a file a supported feature?
I tested this with concurrent threads, one fstream per thread. I see that the data is not corrupted, but some writes are lost.
The file size is less than expected after the writes finish, and writes don't overlap.
If I instead write with explicit seeks on each fstream, coordinating which offset each thread writes to, no writes are lost.
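The coordinated-offset variant looks roughly like this (a sketch; thread_index and the region math are illustrative, the other names come from the sample below):

// Each thread writes only inside its own pre-computed region, so no
// two writes ever target the same file offset.
fstream file_handle(filename, fstream::in | fstream::out | fstream::binary);
const long long base = (long long)thread_index * num_records_each_thread * offset;
for (long long i = 0; i < num_records_each_thread; ++i) {
    file_handle.seekp(base + i * offset, ios_base::beg);
    file_handle.write(data.data(), offset);
}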
Here is the sample code for the append case:
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#include "gtest/gtest.h"

using namespace std;

void append_concurrently(string filename, const int data_in_gb, const int num_threads, const char start_char,
                         bool stream_cache = true) {
    const int offset = 1024;
    const long long num_records_each_thread = (data_in_gb * 1024 * ((1024 * 1024) / (num_threads * offset)));
    {
        auto write_file_fn = [&](int index) {
            // each thread has its own handle
            fstream file_handle(filename, fstream::app | fstream::binary);
            if (!stream_cache) {
                file_handle.rdbuf()->pubsetbuf(nullptr, 0); // no buffering in fstream
            }
            vector<char> data(offset, (char)(index + start_char));
            for (long long i = 0; i < num_records_each_thread; ++i) {
                file_handle.write(data.data(), offset);
                if (!file_handle) {
                    std::cout << "File write failed: "
                              << file_handle.fail() << " " << file_handle.bad() << " " << file_handle.eof() << std::endl;
                    break;
                }
            }
            // file_handle.flush();
        };

        auto start_time = chrono::high_resolution_clock::now();
        vector<thread> writer_threads;
        for (int i = 0; i < num_threads; ++i) {
            writer_threads.push_back(std::thread(write_file_fn, i));
        }
        for (int i = 0; i < num_threads; ++i) {
            writer_threads[i].join();
        }
        auto end_time = chrono::high_resolution_clock::now();

        std::cout << filename << " Data written : " << data_in_gb << " GB, " << num_threads << " threads "
                  << ", cache " << (stream_cache ? "true " : "false ") << ", size " << offset << " bytes ";
        std::cout << "Time taken: " << chrono::duration_cast<chrono::microseconds>(end_time - start_time).count()
                  << " micro-secs" << std::endl;
    }
    {
        ifstream file(filename, fstream::in | fstream::binary);
        file.seekg(0, ios_base::end);
        // This EXPECT_EQ FAILS as the file size is smaller than EXPECTED
        EXPECT_EQ(num_records_each_thread * num_threads * offset, file.tellg());
        file.seekg(0, ios_base::beg);
        EXPECT_TRUE(file);

        char data[offset]{ 0 };
        for (long long i = 0; i < (num_records_each_thread * num_threads); ++i) {
            file.read(data, offset);
            EXPECT_TRUE(file || file.eof()); // should be able to read until eof
            char expected_char = data[0];    // should not have any interleaving of data
            bool same = true;
            for (auto& c : data) {
                same = same && (c == expected_char) && (c != 0);
            }
            EXPECT_TRUE(same); // THIS PASSES
            if (!same) {
                std::cout << "corruption detected !!!" << std::endl;
                break;
            }
            if (file.eof()) { // THIS FAILS as the file size is smaller
                EXPECT_EQ(num_records_each_thread * num_threads, i + 1);
                break;
            }
        }
    }
}

TEST(fstream, file_concurrent_appends) {
    string filename = "file6.log";
    const int data_in_gb = 1;
    {
        // trunc the file before the writer threads start
        {
            fstream file(filename, fstream::in | fstream::out | fstream::trunc | fstream::binary);
        }
        append_concurrently(filename, data_in_gb, 4, 'B', false);
    }
    std::remove(filename.c_str());
}
Edit:
I moved the fstream to be shared by all threads. Now, for a 512-byte buffer size, I see 8 writes totalling 4 KB lost consistently.
const int offset = 512;
const long long num_records_each_thread = (data_in_gb * 1024 * ((1024 * 1024) / (num_threads * offset)));

fstream file_handle(filename, fstream::app | fstream::binary);
if (!stream_cache) {
    file_handle.rdbuf()->pubsetbuf(nullptr, 0); // no buffering in fstream
}
The problem does not reproduce with a 4 KB buffer size.
Running main() from gtest_main.cc
Note: Google Test filter = *file_conc*_*append*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from fstream
[ RUN ] fstream.file_concurrent_appends
file6.log Data written : 1 GB, 1 threads , cache true , size 512 bytes Time taken: 38069289 micro-secs
d:\projs\logpoc\tests\test.cpp(279): error: Expected: num_records_each_thread * num_threads * offset
Which is: 1073741824
To be equal to: file.tellg()
Which is: 1073737728
d:\projs\logpoc\tests\test.cpp(301): error: Expected: num_records_each_thread * num_threads
Which is: 2097152
To be equal to: i + 1
Which is: 2097145
Edit 2:
Closing file_handle after joining all threads flushes the data from its internal buffer. This resolved the above issue.
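For reference, the fix looks roughly like this, using the names from the edited version:

for (auto& t : writer_threads) {
    t.join();
}
// Close (or destroy) the shared stream so its internal buffer is
// flushed to the OS before the file size is verified.
file_handle.close();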

According to §29.4.2 ¶7 of the official ISO C++20 standard, the functions provided by std::fstream are generally thread-safe.
However, if every thread has its own std::fstream object, then, as far as the C++ standard library is concerned, these are distinct streams and no synchronization will take place. Only the operating system's kernel will be aware that all file handles point to the same file. Therefore, any synchronization will have to be done by the kernel. But the kernel possibly isn't even aware that a write is supposed to go to the end of the file. Depending on your platform, it is possible that the kernel only receives write requests for certain file positions. If the end of file has meanwhile been moved by an append from another thread, then the position for a thread's previous write request may no longer be to the end of the file.
According to the documentation on std::fstream::open, opening a file in append mode will cause the stream to seek to the end of the file before every write. This behavior seems to be exactly what you want. But, for the reasons stated above, this will probably only work if all threads share the same std::fstream object. In that case, the std::fstream object should be able to synchronize all writes. In particular, it should be able to perform the seeks to the end of file and the subsequent writes atomically.
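If a given implementation does not make the seek-to-end plus write of a shared stream atomic, serializing the writes yourself is a portable fallback. A minimal sketch, where the mutex and helper function are illustrative additions rather than part of the question's code:

#include <fstream>
#include <mutex>
#include <vector>

std::mutex write_mutex;

// All threads share one stream opened in append mode; the lock
// serializes the seek-to-end and write that append mode performs.
void append_record(std::fstream& out, const std::vector<char>& record) {
    std::lock_guard<std::mutex> lock(write_mutex);
    out.write(record.data(), record.size());
}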

Related

Minor Page Faults when writing to mmapped file buffer

I'm noticing minor page faults when writing to a mmapped file buffer where the file is backed by disk.
My understanding of mmap is that for file mappings, the page cache has the file's data, and the page table will be updated to point to the file data in the page cache. This means that on the first write to the mmapped buffer, the page table will have to be updated to point to the page cache, and we may see minor page faults. However, as my benchmark below shows, even after pre-faulting the mmapped buffer, I still see minor page faults when doing random writes.
Note that these minor page faults only show up if I write to a random buffer (buf in the benchmark below) in-between writing to the mmapped buffer. Also note that these minor page faults do not seem to happen when using a tmpfs which is not disk backed.
So my question is why do we see these minor page faults when writing to a disk-backed file?
Here is the benchmark:
#include <iostream>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <fstream>
#include <sys/mman.h>
#include <sys/resource.h>

int main(int argc, char** argv) {
    // Command line parsing.
    if (argc != 2) {
        std::cout << "Usage: ./bench <path to file>" << std::endl;
        exit(1);
    }
    std::string filepath = argv[1];

    // Open and truncate the file to be of size `FILE_LEN`.
    int fd = open(filepath.c_str(), O_CREAT | O_TRUNC | O_RDWR, 0664);
    const size_t FILE_LEN = (1 << 26); // 64MiB
    if (fd < 0) {
        std::cout << "open failed: " << strerror(errno) << std::endl;
        exit(1);
    }
    if (ftruncate(fd, FILE_LEN) < 0) {
        std::cout << "ftruncate failed: " << strerror(errno) << std::endl;
        exit(1);
    }

    // `mmap` the file and pre-fault it.
    char* ptr = static_cast<char*>(mmap(nullptr, FILE_LEN, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, 0));
    if (ptr == MAP_FAILED) {
        std::cout << "mmap failed: " << strerror(errno) << std::endl;
        exit(1);
    }
    memset(ptr, 'A', FILE_LEN);

    // Create a temporary buffer to clear the cache.
    constexpr size_t BUF_LEN = (1 << 22); // 4MiB
    char* buf = new char[BUF_LEN];
    memset(buf, 'B', BUF_LEN);
    std::cout << "Opened file " << fd << ", pre faulted " << ptr[rand() % FILE_LEN] << " " << buf[rand() % BUF_LEN] << std::endl;

    // Run the actual benchmark.
    rusage usage0, usage1;
    getrusage(RUSAGE_THREAD, &usage0);
    unsigned int x = 0;
    for (size_t i = 0; i < 1000; ++i) {
        char c = i % 26 + 'a';
        const size_t WRITE_SIZE = 1024;
        size_t start = i * WRITE_SIZE;
        if (start + WRITE_SIZE >= FILE_LEN) {
            start %= FILE_LEN;
        }
        memset(ptr + start, c, WRITE_SIZE);
        x += ptr[start];

        char d = (c * 142) % 26 + 'a';
        for (size_t k = 0; k < BUF_LEN; ++k) {
            buf[k] = d;
        }
        x += buf[int(d)];
    }
    std::cout << "Using the buffers " << ptr[rand() % FILE_LEN] << " " << buf[rand() % BUF_LEN] << " " << x << std::endl;
    getrusage(RUSAGE_THREAD, &usage1);

    std::cout << "========================" << std::endl;
    std::cout << "Minor page faults = " << usage1.ru_minflt - usage0.ru_minflt << std::endl;
    std::cout << "========================" << std::endl;
    return 0;
}
Running ./bench "/dev/shm/test.txt" where /dev/shm/ uses the tmpfs filesystem, the benchmark always shows 0 minor page faults.
Running ./bench "/home/username/test.txt", the benchmark above shows ~200 minor page faults.
i.e. I see output like this with the above command:
========================
Minor page faults = 231
========================
Note that increasing the number of iterations in the benchmark correlates with an increase in the number of minor page faults as well (e.g. changing the number of iterations from 1000 to 2000 results in ~450 minor page faults).

std::ifstream::read does not read all 512 bytes, and sets eof and fail bits

Please take a look at this code snippet:
std::ifstream input_file(input_file_path);
char* data = new (std::nothrow) char[block_size]; // block_size is defined elsewhere
if (!data)
{
    std::cout << "Failed to allocate a buffer" << '\n';
    return -1;
}
request::request_id req_id = 1U;
while (input_file)
{
    std::streamsize bytes_read = 0;
    auto bytes_to_read = block_size;
    do
    {
        input_file.read(data + bytes_read, block_size - bytes_read > 512 ? 512 : block_size - bytes_read);
        bytes_read += input_file.gcount();
    } while (input_file); // until eof
    if (input_file.eof())
    {
        std::cout << "eofbit";
    }
    if (input_file.fail())
    {
        std::cout << "failbit";
    }
    if (input_file.bad())
    {
        std::cout << "badbit";
    }
}
The file size is more than 8 megabytes. The code reads the first 512 bytes, then 3 bytes on the second iteration.
The eof and fail bits are set.
What am I doing wrong? Why does it execute like this?
You need to open the file as a binary file:
std::ifstream input_file(input_file_path, ios::binary | ios::in);
In text mode, some characters have special meanings (carriage return, newline, and the EOF character, which on Windows is Ctrl-Z, 0x1A). Your file probably contains an EOF character that ends the input early.
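For completeness, a minimal sketch of a chunked binary read; the file name and block size are illustrative. gcount() reports how many bytes the final short read actually delivered:

#include <fstream>
#include <iostream>
#include <vector>

int main() {
    const std::streamsize block_size = 512; // illustrative value
    std::ifstream input_file("input.dat", std::ios::binary | std::ios::in);
    std::vector<char> data(block_size);

    while (input_file) {
        input_file.read(data.data(), block_size);
        std::streamsize got = input_file.gcount(); // valid even after a short read at EOF
        if (got > 0) {
            std::cout << "read " << got << " bytes\n"; // process `got` bytes here
        }
    }
    // eofbit + failbit set here is the normal end-of-file condition;
    // badbit would indicate a real I/O error.
    return 0;
}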

Write vector of unsigned char to binary file c++

I am reading the binary file cmd.exe into an array of unsigned chars. The total bytes read into bytes_read are 153. I converted it to a base64 string and then decoded that string back (code from the 2nd answer of "base64 decode snippet in c++") into vector<BYTE>. Here BYTE is unsigned char.
decodedData.size() is also 153. But when I write this vector to a file in binary mode to get my cmd.exe file back, I get only a 1 KB file. What did I miss?
// Reading size of file
FILE* file = fopen("cmd.exe", "r+"); // bug: text mode; should be "rb+" (see update)
if (file == NULL) return 1;
fseek(file, 0, SEEK_END);
long int size = ftell(file);
fclose(file);

// Reading data to array of unsigned chars
file = fopen("cmd.exe", "r+"); // bug: text mode; should be "rb+" (see update)
unsigned char* myData = (unsigned char*)malloc(size);
int bytes_read = fread(myData, sizeof(unsigned char), size, file);
fclose(file);
std::string encodedData = base64_encode(&myData[0], bytes_read);
std::vector<BYTE> decodedData = base64_decode(encodedData);

// write data to file
ofstream outfile("cmd.exe", ios::out | ios::binary);
outfile.write((const char*)decodedData.data(), decodedData.size());
Update:
Thanks @chux for suggesting "r+" --> "rb+". Problem resolved.
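For clarity, the corrected open call; the "b" suppresses the Windows text-mode translation (CR/LF and Ctrl-Z handling) that was corrupting the binary read:

FILE* file = fopen("cmd.exe", "rb+"); // binary mode: bytes pass through untranslated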
You marked this as C++.
This is one C++ approach using fstream to read a binary file. To simplify for this example, I created a somewhat bigger m_buff than needed. From the comments, it sounds like your fopen("cmd.exe", "r+") was in error, so I'm only providing a C++ binary read.
Method tReader() a) opens a file in binary mode, b) reads the data into m_buff, and c) captures gCount for display.
It also demonstrates one possible use of chrono to measure duration.
#include <chrono>
// 'compressed' chrono access --------------vvvvvvv
typedef std::chrono::high_resolution_clock HRClk_t;
typedef HRClk_t::time_point Time_t;
typedef std::chrono::microseconds US_t;
using namespace std::chrono_literals; // suffixes 100ms, 2s, 30us

#include <cstdint>
#include <iostream>
#include <fstream>
#include <string>
#include <cassert>

class T516_t
{
    enum BuffConstraints : uint32_t {
        Meg = (1024 * 1024),
        END_BuffConstraints
    };

    char* m_buff;
    int64_t m_gCount;

public:
    T516_t()
        : m_buff(nullptr)
        , m_gCount(0)
    {
        m_buff = new char[Meg];
    }

    ~T516_t() { delete[] m_buff; } // release the buffer (was "= default", which leaked it)

    int exec()
    {
        tReader();
        return 0;
    }

private: // methods
    void tReader()
    {
        std::string pfn = "/home/dmoen/.wine/drive_c/windows/system32/cmd.exe";
        // open file in binary mode
        std::ifstream sIn(pfn, std::ios_base::binary);
        if (!sIn.is_open()) {
            std::cerr << "UNREACHABLE: unable to open sIn " << pfn
                      << " privileges? media offline?";
            return;
        }

        Time_t start_us = HRClk_t::now();
        do
        {
            // perform read
            sIn.read(m_buff, Meg);
            // If the input sequence runs out of characters to extract (i.e., the
            // end-of-file is reached) before n characters have been successfully
            // read, buff contains all the characters read until that point, and
            // both eofbit and failbit flags are set
            m_gCount = sIn.gcount();
            if (sIn.eof()) { break; } // exit when no more data
            if (sIn.fail()) {         // was "sIn.failbit", which tests a constant, not the stream state
                std::cerr << "sIn.fail() set" << std::endl;
            }
        } while (1);
        auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
        sIn.close();

        std::cout << "\n " << pfn
                  << " " << m_gCount << " bytes"
                  << " " << duration_us.count() << " us"
                  << std::endl;
    } // void tReader()
}; // class T516_t

int main(int, char**)
{
    Time_t start_us = HRClk_t::now();
    int retVal = -1;
    {
        T516_t t516;
        retVal = t516.exec();
    }
    auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
    std::cout << " FINI " << duration_us.count() << " us" << std::endl;
    return retVal;
}
One typical output on my system looks like:
/home/dmoen/.wine/drive_c/windows/system32/cmd.exe 722260 bytes 1180 us
FINI 1417 us
Your results will vary.
Your ofstream use looks good (so did not replicate).

Mapping UNIX Pipe to C++ std::cout

I am researching options for communicating between processes in C++. I started with the idea of binding a Unix pipe to std::cout, but I could not get it to work. When writing directly using write(STDOUT_FILENO), I get the expected result. When writing using std::cout, I get smaller and seemingly random output.
#include <iostream>
#include <cerrno>
#include <unistd.h>
#include <fcntl.h>

const int PIPE_READ = 0;
const int PIPE_WRITE = 1;

int main() {
    int pfd[2];
    if (pipe(pfd) == -1) {
        std::cout << "Cannot create pipe" << std::endl;
        return 0;
    }
    int pid = fork();
    if (pid == -1) {
        std::cout << "Error on fork: " << errno << std::endl;
    } else if (pid == 0) { // Child process
        if (dup2(pfd[PIPE_WRITE], STDOUT_FILENO) < 0) {
            std::cout << "Cannot redirect STDOUT: " << errno << std::endl;
            return 0;
        }
        close(pfd[PIPE_WRITE]);
        for (int i = 0; i < 8; i++) {
            int data = i;
            write(STDOUT_FILENO, &data, sizeof(int)); // Works
            //std::cout << data; // Doesn't work
        }
    } else { // Parent process
        close(pfd[PIPE_WRITE]);
        for (int i = 0; i < 8; i++) {
            int data;
            ssize_t status;
            if ((status = read(pfd[PIPE_READ], &data, sizeof(int))) != sizeof(int)) {
                std::cout << "Error (" << errno << ") on read: " << status << std::endl;
                return -1;
            }
            std::cout << data << std::endl;
        }
    }
    return 0;
}
Let's take a closer look at your writing:
write(STDOUT_FILENO, &data, sizeof(int)); // Works
//std::cout << data; // Doesn't work
The first "working" version write the contents of data in raw binary form to standard output. The second "non-working" version write the value of data as text to standard output.
If the value of data is 5 then the write call will write the integer value 5 while std::cout << data will write the integer value 53 (using ASCII encoding).
This of course have implications when you read the data as a raw and binary int in the parent.
If you want write the raw binary data to std::cout you have to use std::ostream::write:
std::cout.write(reinterpret_cast<char*>(&data), sizeof data);
The above line is equivalent to the write system-call you have.
Also important to know is that writing an int in raw form will write sizeof(int) bytes, usually four, while writing a single-digit integer as text will write a single byte.
Your loop writes eight numbers, which means it writes 32 bytes (4 * 8) when using write. If you output using << to std::cout, then you will write only 8 bytes. The parent reads those 8 bytes into two int values, after which the read call returns 0 because the pipe has been closed.
What the values of those two int values will be depends on your hardware architecture, and on whether it is little-endian or big-endian.
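Putting it together, a sketch of the child-side loop using std::cout; the explicit flush is a precaution, since std::cout is buffered and the bytes reach the pipe only when the buffer is flushed:

// Child process: send raw ints through the redirected stdout.
for (int i = 0; i < 8; i++) {
    int data = i;
    std::cout.write(reinterpret_cast<char*>(&data), sizeof data);
}
std::cout.flush(); // make sure the buffered bytes actually reach the pipe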

Get error if I use fread, while no error using read

I'm trying to run some experiments on disk I/O, both using the cache and not using it. In order to perform a read directly from the disk, I open the file with the O_DIRECT flag (when the variable DISK_DIRECT is defined).
Now the two branches of the if below should perform the same operation, with the difference that one is helped by the cache and the other is not.
The files I try to access are stored on disk and they do not change over time.
Also, the two branches access the same files.
At some point, when I use fread, I get ferror() returning true, while when I use read everything goes fine.
I'm sure they access the same files.
Do you have any idea why this could happen?
EDIT
OK, I'm posting a minimal example here. The code I use is:
#include <iostream>
#include <cstdio>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <fstream>
#include <sstream>

using namespace std;

typedef float fftwf_complex[2];

void fetch_level(unsigned long long tid, unsigned short level, fftwf_complex* P_read, fftwf_complex* P_fread,
                 int n_coeff_per_level, FILE** files_o_direct, fstream*& files) {
    int b_read;
    fseek(files_o_direct[level], (long int)(tid * sizeof(fftwf_complex) * n_coeff_per_level), SEEK_SET);
    b_read = fread(reinterpret_cast<char*>(P_fread), sizeof(fftwf_complex), n_coeff_per_level, files_o_direct[level]);
    if (b_read == 0) {
        cerr << "nothing read\n";
    }
    files[level].seekg((streamoff)(tid * sizeof(fftwf_complex) * n_coeff_per_level), files[level].beg);
    files[level].read(reinterpret_cast<char*>(P_read),
                      sizeof(fftwf_complex) * n_coeff_per_level);
}

void open_files(fstream*& files) {
    for (int i = 0; i < 1; i++) {
        std::ostringstream oss;
        oss << "./Test_fread_read/1.txt.bin";
        files[i].open(oss.str().c_str(),
                      std::ios::in | std::ios::out |
                      std::ios::binary | std::ios::ate);
        if (!files[i]) {
            cerr << "fstream could not open " << oss.str() << endl;
        }
    }
}

void open_files_o_direct(FILE** files_o_direct, int* fd) {
    for (unsigned int i = 0; i < 1; i++) {
        std::ostringstream oss;
        oss << "./Test_fread_read/1.txt.bin";
        fd[i] = open(oss.str().c_str(), O_RDONLY | O_DIRECT);
        files_o_direct[i] = fdopen(fd[i], "rb");
        if (!files_o_direct[i])
            cerr << "Could not open " << oss.str() << endl;
    }
}

int close_files(FILE** files_o_direct, int* fd, fstream*& files) {
    for (unsigned int i = 0; i < 1; i++) {
        //#if defined (DISK_DIRECT)
        if (files_o_direct[i])
            close(fd[i]);
        //#else
        if (files[i].is_open())
            files[i].close();
        //#endif
    }
    return 0;
}

int main() {
    FILE** files_o_direct = new FILE*[256];
    fstream* files = new fstream[256];
    int* fd = new int[256];
    fftwf_complex* P_read = new fftwf_complex[1];
    fftwf_complex* P_fread = new fftwf_complex[1];

    open_files_o_direct(files_o_direct, fd);
    open_files(files);

    fetch_level(2, 0, P_read, P_fread, 1, files_o_direct, files);
    cout << "P_read: " << P_read[0][0] << " P_fread: " << P_fread[0][0] << endl;
    cout << "P_read: " << P_read[0][1] << " P_fread: " << P_fread[0][1] << endl;

    fetch_level(7, 0, P_read, P_fread, 1, files_o_direct, files);
    cout << "P_read: " << P_read[0][0] << " P_fread: " << P_fread[0][0] << endl;
    cout << "P_read: " << P_read[0][1] << " P_fread: " << P_fread[0][1] << endl;

    fetch_level(8, 0, P_read, P_fread, 1, files_o_direct, files);
    cout << "P_read: " << P_read[0][0] << " P_fread: " << P_fread[0][0] << endl;
    cout << "P_read: " << P_read[0][1] << " P_fread: " << P_fread[0][1] << endl;

    close_files(files_o_direct, fd, files);

    delete[] P_read;
    delete[] P_fread;
    delete[] files;
    delete[] fd;
    delete[] files_o_direct;
    return 0;
}
and the file which is accessed is:
0.133919 0.0458176
1.67441 2.40805
0.997525 -0.279977
-2.39672 -3.076
-0.0390913 0.854464
-0.0176478 -1.3142
-0.667981 -0.486272
0.831051 0.282802
-0.638032 -0.630943
-0.669854 -1.49762
which is stored in a binary format and can be downloaded from here: 1.txt.bin.
The output I get is:
nothing read
P_read: 0.997525 P_fread: 0
P_read: -0.279977 P_fread: 0
nothing read
P_read: 0.831051 P_fread: 0
P_read: 0.282802 P_fread: 0
nothing read
P_read: -0.638032 P_fread: 0
P_read: -0.630943 P_fread: 0
The problem persists even if I change the type of fftwf_complex from float[2] to a plain float.
If I remove the fseek line, everything works correctly.
This check, if (b_read == 0), will be true at the end of the file, and you will enter this branch

if (ferror(this->files_o_direct[level]))
    fseek(this->files_o_direct[level], 0, SEEK_END); //ftell here returns 4800000
cerr << "nothing read\n";

even if ferror returns 0; the end of the file was reached anyway. The line

fseek(this->files_o_direct[level], 0, SEEK_END);

makes no sense, and "nothing read\n" will be output whether or not ferror returns nonzero.
From the manual page
fread() does not distinguish between end-of-file and error, and callers must use feof(3) and ferror(3) to determine which occurred.
so you have to check feof() first, and use ferror() only if feof() returns false.
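A minimal sketch of that check order (the file name and block size are illustrative):

#include <cstdio>

int main() {
    char buf[512];
    const std::size_t len = sizeof buf;
    std::FILE* fp = std::fopen("./Test_fread_read/1.txt.bin", "rb");
    if (!fp) return 1;

    std::size_t n = std::fread(buf, 1, len, fp);
    if (n < len) {
        if (std::feof(fp)) {
            std::printf("short read: end of file after %zu bytes\n", n);
        } else if (std::ferror(fp)) {
            std::printf("short read: real I/O error\n");
        }
    }
    std::fclose(fp);
    return 0;
}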
For whoever may have the same problem, here is the answer:
The O_DIRECT flag may impose alignment restrictions on the length and
address of user-space buffers and the file offset of I/Os. In Linux
alignment restrictions vary by filesystem and kernel version and
might be absent entirely. However there is currently no
filesystem-independent interface for an application to discover these
restrictions for a given file or filesystem. Some filesystems
provide their own interfaces for doing so, for example the
XFS_IOC_DIOINFO operation in xfsctl(3).
Under Linux 2.4, transfer sizes, and the alignment of the user buffer
and the file offset must all be multiples of the logical block size
of the filesystem. Since Linux 2.6.0, alignment to the logical block
size of the underlying storage (typically 512 bytes) suffices. The
logical block size can be determined using the ioctl(2) BLKSSZGET
operation or from the shell using the command:
blockdev --getss
Linux reference page for open(2)
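To illustrate those restrictions, a minimal sketch of an aligned O_DIRECT read, assuming a 512-byte logical block size (the real value can be queried with the BLKSSZGET ioctl or blockdev --getss); the buffer address, file offset, and length are all multiples of the block size:

#include <cstdio>
#include <cstdlib>
#include <fcntl.h>   // O_DIRECT (exposed under _GNU_SOURCE, which g++ defines by default)
#include <unistd.h>

int main() {
    const size_t BLOCK = 512; // assumed logical block size
    int fd = open("./Test_fread_read/1.txt.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    // posix_memalign returns a buffer whose address is a multiple of BLOCK.
    void* buf = nullptr;
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) { close(fd); return 1; }

    // The offset (0) and length (BLOCK) are also multiples of BLOCK,
    // so this read satisfies the O_DIRECT alignment restrictions.
    ssize_t n = pread(fd, buf, BLOCK, 0);
    if (n < 0) perror("pread");
    else printf("read %zd bytes\n", n);

    free(buf);
    close(fd);
    return 0;
}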