I'm noticing minor page faults when writing to a mmapped file buffer where the file is backed by disk.
My understanding of mmap is that for file mappings, the page cache has the file's data, and the page table will be updated to point to the file data in the page cache. This means that on the first write to the mmapped buffer, the page table will have to be updated to point to the page cache, and we may see minor page faults. However, as my benchmark below shows, even after pre-faulting the mmapped buffer, I still see minor page faults when doing random writes.
Note that these minor page faults only show up if I write to a separate buffer (buf in the benchmark below) in between writes to the mmapped buffer. Also note that these minor page faults do not seem to happen when the file is on tmpfs, which is not disk-backed.
So my question is why do we see these minor page faults when writing to a disk-backed file?
Here is the benchmark:
#include <iostream>
#include <cstdlib>   // exit, rand
#include <cerrno>    // errno
#include <string.h>  // strerror, memset
#include <unistd.h>
#include <fcntl.h>
#include <fstream>
#include <sys/mman.h>
#include <sys/resource.h>
int main(int argc, char** argv) {
// Command line parsing.
if (argc != 2) {
std::cout << "Usage: ./bench <path to file>" << std::endl;
exit(1);
}
std::string filepath = argv[1];
// Open and truncate the file to be of size `FILE_LEN`.
int fd = open(filepath.c_str(), O_CREAT | O_TRUNC | O_RDWR, 0664);
const size_t FILE_LEN = (1 << 26); // 64MiB
if (fd < 0) {
std::cout << "open failed: " << strerror(errno) << std::endl;
exit(1);
}
if (ftruncate(fd, FILE_LEN) < 0) {
std::cout << "ftruncate failed: " << strerror(errno) << std::endl;
exit(1);
}
// `mmap` the file and pre-fault it.
char* ptr = static_cast<char*>(mmap(nullptr, FILE_LEN, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd, 0));
if(ptr == MAP_FAILED) {
std::cout << "mmap failed: " << strerror(errno) << std::endl;
exit(1);
}
memset(ptr, 'A', FILE_LEN);
// Create a temporary buffer to clear the cache.
constexpr size_t BUF_LEN = (1 << 22); // 4MiB
char* buf = new char[BUF_LEN];
memset(buf, 'B', BUF_LEN);
std::cout << "Opened file " << fd << ", pre faulted " << ptr[rand() % FILE_LEN] << " " << buf[rand() % BUF_LEN]<< std::endl;
// Run the actual benchmark
rusage usage0, usage1;
getrusage(RUSAGE_THREAD, &usage0);
unsigned int x = 0;
for (size_t i = 0; i < 1000; ++i) {
char c = i % 26 + 'a';
const size_t WRITE_SIZE = 1024;
size_t start = i*WRITE_SIZE;
if (start + WRITE_SIZE >= FILE_LEN) {
start %= FILE_LEN;
}
memset(ptr + start, c, WRITE_SIZE);
x += ptr[start];
char d = (c * 142) % 26 + 'a';
for (size_t k = 0; k < BUF_LEN; ++k) {
buf[k] = d;
}
x += buf[int(d)];
}
std::cout << "Using the buffers " << ptr[rand() % FILE_LEN] << " " << buf[rand() % BUF_LEN] << " " << x << std::endl;
getrusage(RUSAGE_THREAD, &usage1);
std::cout << "========================" << std::endl;
std::cout << "Minor page faults = " << usage1.ru_minflt - usage0.ru_minflt << std::endl;
std::cout << "========================" << std::endl;
return 0;
}
Running ./bench "/dev/shm/test.txt" where /dev/shm/ uses the tmpfs filesystem, the benchmark always shows 0 minor page faults.
Running ./bench "/home/username/test.txt", the benchmark above shows ~200 minor page faults.
i.e. I see output like this with the above command:
========================
Minor page faults = 231
========================
Note that increasing the number of iterations in the benchmark leads to a proportional increase in the number of minor page faults as well (e.g. changing the number of iterations from 1000 to 2000 results in ~450 minor page faults).
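As a sanity check, residency of the mapping after the pre-fault can be verified with mincore(2); a minimal sketch (assumption: Linux; resident_pages is a helper invented for illustration):

#include <sys/mman.h>  // mincore
#include <unistd.h>    // sysconf
#include <vector>

// Counts how many pages of [addr, addr + len) are resident in memory.
size_t resident_pages(void* addr, size_t len) {
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    std::vector<unsigned char> vec((len + page - 1) / page);
    if (mincore(addr, len, vec.data()) != 0)
        return 0;
    size_t n = 0;
    for (unsigned char v : vec)
        n += v & 1; // low bit set means the page is resident
    return n;
}

Calling resident_pages(ptr, FILE_LEN) right after the memset should report every page resident in both the tmpfs and the disk-backed case, which suggests the faults in the timed loop are not first-touch faults.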
The question says it all. I am going in circles here. I set snd_pcm_sw_params_set_stop_threshold to boundary (and zero too just for fun) and I am still getting buffer underrun errors on snd_pcm_writei. I cannot understand why. The documentation is pretty clear on this:
If the stop threshold is equal to boundary (also software parameter - sw_param) then automatic stop will be disabled
Here is a minimally reproducible example:
#include <alsa/asoundlib.h>
#include <iostream>
#define AUDIO_DEV "default"
#define AC_FRAME_SIZE 960
#define AC_SAMPLE_RATE 48000
#define AC_CHANNELS 2
//BUILD g++ -o main main.cpp -lasound
using namespace std;
int main() {
int err;
unsigned int i;
snd_pcm_t *handle;
snd_pcm_sframes_t frames;
snd_pcm_uframes_t boundary;
snd_pcm_sw_params_t *sw;
snd_pcm_hw_params_t *params;
unsigned int s_rate;
unsigned int buffer_time;
snd_pcm_uframes_t f_size;
unsigned char buffer[AC_FRAME_SIZE * AC_CHANNELS * 2]; // one period of S16_LE stereo frames
int rc;
for (i = 0; i < sizeof(buffer); i++)
buffer[i] = random() & 0xff;
if ((err = snd_pcm_open(&handle, AUDIO_DEV, SND_PCM_STREAM_PLAYBACK, 0)) < 0) {
cout << "open error " << snd_strerror(err) << endl;
return 0;
}
s_rate = AC_SAMPLE_RATE;
f_size = AC_FRAME_SIZE;
buffer_time = 2500;
cout << s_rate << " " << f_size << endl;
snd_pcm_hw_params_alloca(&params);
snd_pcm_hw_params_any(handle, params);
snd_pcm_hw_params_set_access(handle, params, SND_PCM_ACCESS_RW_INTERLEAVED);
snd_pcm_hw_params_set_format(handle, params, SND_PCM_FORMAT_S16_LE);
snd_pcm_hw_params_set_channels(handle, params, AC_CHANNELS);
snd_pcm_hw_params_set_rate_near(handle, params, &s_rate, 0);
snd_pcm_hw_params_set_period_size_near(handle, params, &f_size, 0);
cout << s_rate << " " << f_size << endl;
rc = snd_pcm_hw_params(handle, params);
if (rc < 0) {
cout << "hw params error " << snd_strerror(rc) << endl;
return 0;
}
snd_pcm_sw_params_alloca(&sw);
snd_pcm_sw_params_current(handle, sw);
snd_pcm_sw_params_get_boundary(sw, &boundary);
snd_pcm_sw_params_set_stop_threshold(handle, sw, boundary);
rc = snd_pcm_sw_params(handle, sw);
if (rc < 0) {
cout << "sw params error " << snd_strerror(rc) << endl;
return 0;
}
snd_pcm_sw_params_current(handle, sw);
snd_pcm_sw_params_get_stop_threshold(sw, &boundary);
cout << "VALUE " << boundary << endl;
for (i = 0; i < 1600; i++) {
usleep(100 * 1000);
frames = snd_pcm_writei(handle, buffer, f_size);
if (frames < 0)
frames = snd_pcm_recover(handle, frames, 0);
if (frames < 0) {
cout << "open error " << snd_strerror(frames) << endl;
break;
}
}
return 0;
}
Okay, I figured it out. For anyone who runs into this issue while having PipeWire or PulseAudio (or any other third-party, non-ALSA backend) enabled as the "default" card: the solution is to not use PipeWire or Pulse directly. It seems that snd_pcm_sw_params_set_stop_threshold is not implemented properly in PipeWire/PulseAudio. You'll notice that if you disable PipeWire or Pulse, this code runs exactly the way you want it to.
Here is how you can disable pulseaudio (which was the issue on my system):
systemctl --user stop pulseaudio.socket
systemctl --user stop pulseaudio.service
A much better solution, though, is to set AUDIO_DEV to write directly to an ALSA card. You can find the names of these cards by running aplay -L. In most cases, updating AUDIO_DEV in my sample code to the following:
#define AUDIO_DEV "hw:0,0"
will fix the issue.
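If you want to enumerate the available device names programmatically (the same list that aplay -L prints), ALSA's device-name hint API can be used; a minimal sketch:

#include <alsa/asoundlib.h>
#include <cstdio>
#include <cstdlib>

// Prints the PCM device names ALSA knows about; pick a hw:X,Y entry
// from this list for AUDIO_DEV.
int main() {
    void **hints;
    if (snd_device_name_hint(-1, "pcm", &hints) < 0)
        return 1;
    for (void **h = hints; *h != NULL; ++h) {
        char *name = snd_device_name_get_hint(*h, "NAME");
        if (name != NULL) {
            printf("%s\n", name);
            free(name); // the hint strings are malloc'd
        }
    }
    snd_device_name_free_hint(hints);
    return 0;
}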
Are concurrent appends to a file a supported feature?
I tested this with concurrent threads, each with its own fstream. I see that the data is not corrupt, but some writes are lost.
The file size is less than expected after the writes finish, but writes don't overlap.
If I write with custom seeks with each fstream where I coordinate which offset each thread will write, no writes are lost.
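That coordinated-offset approach maps directly onto POSIX pwrite(), which takes an explicit file offset and never touches a shared file position; a minimal sketch (assumption: POSIX; write_at is a helper invented for illustration):

#include <fcntl.h>
#include <unistd.h>

// Each thread owns the byte range [slot*len, slot*len + len), so no
// two writes can overlap and none can be lost.
void write_at(int fd, const char* buf, size_t len, long long slot) {
    ssize_t n = pwrite(fd, buf, len, (off_t)(slot * (long long)len));
    (void)n; // real code should check n == (ssize_t)len
}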
Here is the sample code:
#include <fstream>
#include <iostream>
#include <string>
#include <chrono>
#include <vector>
#include <thread>
#include "gtest/gtest.h"
using namespace std; // the code below uses unqualified string/vector/chrono
void append_concurrently(string filename, const int data_in_gb, const int num_threads, const char start_char,
bool stream_cache = true) {
const int offset = 1024;
const long long num_records_each_thread = (data_in_gb * 1024 * ((1024 * 1024) / (num_threads * offset)));
{
auto write_file_fn = [&](int index) {
// each thread has its own handle
fstream file_handle(filename, fstream::app | fstream::binary);
if (!stream_cache) {
file_handle.rdbuf()->pubsetbuf(nullptr, 0); // no bufferring in fstream
}
vector<char> data(offset, (char)(index + start_char));
for (long long i = 0; i < num_records_each_thread; ++i) {
file_handle.write(data.data(), offset);
if (!file_handle) {
std::cout << "File write failed: "
<< file_handle.fail() << " " << file_handle.bad() << " " << file_handle.eof() << std::endl;
break;
}
}
// file_handle.flush();
};
auto start_time = chrono::high_resolution_clock::now();
vector<thread> writer_threads;
for (int i = 0; i < num_threads; ++i) {
writer_threads.push_back(std::thread(write_file_fn, i));
}
for (int i = 0; i < num_threads; ++i) {
writer_threads[i].join();
}
auto end_time = chrono::high_resolution_clock::now();
std::cout << filename << " Data written : " << data_in_gb << " GB, " << num_threads << " threads "
<< ", cache " << (stream_cache ? "true " : "false ") << ", size " << offset << " bytes ";
std::cout << "Time taken: " << (end_time - start_time).count() / 1000 << " micro-secs" << std::endl;
}
{
ifstream file(filename, fstream::in | fstream::binary);
file.seekg(0, ios_base::end);
// This EXPECT_EQ FAILS as file size is smaller than EXPECTED
EXPECT_EQ(num_records_each_thread * num_threads * offset, file.tellg());
file.seekg(0, ios_base::beg);
EXPECT_TRUE(file);
char data[offset]{ 0 };
for (long long i = 0; i < (num_records_each_thread * num_threads); ++i) {
file.read(data, offset);
EXPECT_TRUE(file || file.eof()); // should be able to read until eof
char expected_char = data[0]; // should not have any interleaving of data.
bool same = true;
for (auto & c : data) {
same = same && (c == expected_char) && (c != 0);
}
EXPECT_TRUE(same); // THIS PASSES
if (!same) {
std::cout << "corruption detected !!!" << std::endl;
break;
}
if (file.eof()) { // THIS FAILS as file size is smaller
EXPECT_EQ(num_records_each_thread * num_threads, i + 1);
break;
}
}
}
}
TEST(fstream, file_concurrent_appends) {
string filename = "file6.log";
const int data_in_gb = 1;
{
// trunc file before write threads start.
{
fstream file(filename, fstream::in | fstream::out | fstream::trunc | fstream::binary);
}
append_concurrently(filename, data_in_gb, 4, 'B', false);
}
std::remove(filename.c_str());
}
Edit:
I moved the fstream to be shared by all threads. Now, for a 512-byte buffer size, I see 8 writes totalling 4 KB lost consistently.
const int offset = 512;
const long long num_records_each_thread = (data_in_gb * 1024 * ((1024 * 1024) / (num_threads * offset)));
fstream file_handle(filename, fstream::app | fstream::binary);
if (!stream_cache) {
file_handle.rdbuf()->pubsetbuf(nullptr, 0); // no bufferring in fstream
}
The problem does not reproduce with a 4 KB buffer size. Here is the test output:
Running main() from gtest_main.cc
Note: Google Test filter = *file_conc*_*append*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from fstream
[ RUN ] fstream.file_concurrent_appends
file6.log Data written : 1 GB, 1 threads , cache true , size 512 bytes Time taken: 38069289 micro-secs
d:\projs\logpoc\tests\test.cpp(279): error: Expected: num_records_each_thread * num_threads * offset
Which is: 1073741824
To be equal to: file.tellg()
Which is: 1073737728
d:\projs\logpoc\tests\test.cpp(301): error: Expected: num_records_each_thread * num_threads
Which is: 2097152
To be equal to: i + 1
Which is: 2097145
Edit 2:
Closing file_handle after joining all threads flushes the data from the internal buffer. This resolved the above issue.
According to §29.4.2 ¶7 of the official ISO C++20 standard, the functions provided by std::fstream are generally thread-safe.
However, if every thread has its own std::fstream object, then, as far as the C++ standard library is concerned, these are distinct streams and no synchronization will take place. Only the operating system's kernel will be aware that all file handles point to the same file. Therefore, any synchronization will have to be done by the kernel. But the kernel possibly isn't even aware that a write is supposed to go to the end of the file. Depending on your platform, it is possible that the kernel only receives write requests for certain file positions. If the end of file has meanwhile been moved by an append from another thread, then the position for a thread's previous write request may no longer be to the end of the file.
According to the documentation on std::fstream::open, opening a file in append mode will cause the stream to seek to the end of the file before every write. This behavior seems to be exactly what you want. But, for the reasons stated above, this will probably only work if all threads share the same std::fstream object. In that case, the std::fstream object should be able to synchronize all writes. In particular, it should be able to perform the seeks to the end of file and the subsequent writes atomically.
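If sharing a single std::fstream between threads is inconvenient, the kernel's append semantics can be relied on directly instead: POSIX specifies that a write() on a descriptor opened with O_APPEND performs the seek to end-of-file and the write atomically, so per-thread descriptors lose no data as long as each record is written with a single write() call. A minimal sketch under that assumption:

#include <fcntl.h>
#include <unistd.h>
#include <thread>
#include <vector>

// Per-thread O_APPEND descriptors: the kernel serializes the implicit
// seek-to-end plus write, so records are neither lost nor interleaved
// as long as each record is one write() call.
void append_records(const char* path, char fill, int nrecords) {
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0) return;
    std::vector<char> rec(1024, fill);
    for (int i = 0; i < nrecords; ++i)
        if (write(fd, rec.data(), rec.size()) < 0) break; // one record per write()
    close(fd);
}

int main() {
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; ++i)
        ts.emplace_back(append_records, "file6.log", (char)('A' + i), 1000);
    for (auto& t : ts) t.join();
    return 0;
}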
Recently, I've been assigned a client/server project, which is basically a chat room where files can be sent and received, and we can use webcams.
I'm currently working on the file transfer part, and after looking at some online tutorials, I've noticed most of them use offsets to write into their buffers, then write the whole buffer into their new file.
To replicate that kind of code, I've set up two buffers, one on the client side and the other on the server side. On the server side, I read 8192 bytes from my file into the buffer, then send it to the client, which receives it and adds it to my buffer. The problem is that after the second file transfer, every single transfer it does is a SOCKET_ERROR, which probably means something's not quite right.
Server:
std::ifstream readFile;
readFile.open(FileName, std::ios::binary | std::ios::ate);
if (!readFile)
{
std::cout << "unable to open file" << std::endl;
}
int FileSize = readFile.tellg();
readFile.seekg(0);
int remainingBytes = 0;
uint32_t FileSize32t = (uint32_t)FileSize;
FileSize32t = htonl(FileSize32t);
send(connections[ID], (char*)&FileSize32t, sizeof(uint32_t), 0);
int sent_bytes = 0;
int offset = 0;
char data[8192];
remainingBytes = FileSize;
int i = 0;
while (i<6)
{
readFile.read(data, 8192);
if (remainingBytes < 8192)
{
sent_bytes = send(connections[ID], data+offset, remainingBytes, 0);
remainingBytes -= sent_bytes;
offset += sent_bytes;
}
else
{
sent_bytes = send(connections[ID], data+offset, 8192, 0);
if (sent_bytes == SOCKET_ERROR)
{
std::cout << "erro" << std::endl;
}
remainingBytes -= sent_bytes;
offset += sent_bytes;
std::cout <<"offset: "<< offset << std::endl;
std::cout << "Sent bytes: " << sent_bytes << std::endl;
std::cout << "remaining bytes: " << remainingBytes << std::endl;
}
i++;
}
Client:
char data[8192];
std::ofstream writeFile;
writeFile.open("Putin.jpg", std::ios::binary);
int bytesReceieved = 0;
int totalBytesReceieved = 0;
int i = 0;
while (i<6)
{
if (recvFileSize - totalBytesReceieved < 8192)
{
bytesReceieved = recv(connection, data+totalBytesReceieved, recvFileSize - totalBytesReceieved, 0);
totalBytesReceieved += bytesReceieved;
}
else
{
bytesReceieved = recv(connection, data + totalBytesReceieved, 8192, 0);
totalBytesReceieved += bytesReceieved;
std::cout << totalBytesReceieved << std::endl;
}
i++;
}
writeFile.write(data, totalBytesReceieved);
std::cout << "transferência terminada, bytes recebidos: " << totalBytesReceieved << std::endl;
writeFile.close();
Do note that this is just a test program, and it's pretty much one of my first interactions with C++. I've been told this probably isn't the best way to start off with C++, but I need this assignment done by the 15th of September, so I need to finish it regardless. If you find any errors or problems besides my original issue, do feel free to point them out and, if you can, explain why they're wrong.
Thank you very much for your help.
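For what it's worth, one likely culprit in the client loop above: recv() is given data + totalBytesReceieved, so once the running total passes 8192 bytes it writes outside the fixed buffer. A minimal sketch of a receive loop that writes each chunk straight to the file instead (assumption: Winsock, as SOCKET_ERROR suggests, with connection and recvFileSize set up as in the question; receive_file is a helper invented for illustration):

#include <winsock2.h>
#include <fstream>

// Receives exactly `total` bytes, appending each chunk to `out`.
// Never indexes the chunk buffer with the running total.
int receive_file(SOCKET connection, std::ofstream& out, int total) {
    char chunk[8192];
    int received = 0;
    while (received < total) {
        int want = total - received;
        if (want > (int)sizeof(chunk)) want = (int)sizeof(chunk);
        int n = recv(connection, chunk, want, 0);
        if (n <= 0) break;     // SOCKET_ERROR or connection closed
        out.write(chunk, n);   // write this chunk out immediately
        received += n;
    }
    return received;
}

Note also that recv() may return fewer bytes than requested, which is why the loop adds whatever actually arrived rather than assuming full 8192-byte chunks.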
I was told that mmap() might get in trouble if someone deletes the original file. I was wondering if that really happens, so I created a little test program. I am using Linux.
#include <iostream>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
int main(int, char**)
{
char const * const fileName = "/tmp/demo-file.dat";
size_t size;
{
struct stat st;
stat(fileName, &st);
size = st.st_size;
}
int fd = open(fileName, O_RDWR);
if (fd == -1)
{
std::cout << "open() failed, errno = " << errno << ":" << strerror(errno) << std::endl;
return (-1);
}
else
{
std::cout << "open() done (ok)" << std::endl;
}
for (int i = 20; i > 0; --i)
{
std::cout << "file open()'ed, wait #" << i << " seconds before mmap()" << std::endl;
sleep(1);
}
void *data = mmap((void*)0L, size, PROT_READ, MAP_SHARED, fd, (off_t)0);
if (data == (void*)-1)
{
std::cout << "mmap() failed, errno = " << errno << ":" << strerror(errno) << std::endl;
}
else
{
std::cout << "mmap() done (ok)" << std::endl;
}
for (int i = 20; i > 0; --i)
{
std::cout << "going to close() socket in #" << i << " seconds" << std::endl;
sleep (1);
}
close(fd);
for (int i = 30; i > 0; --i)
{
std::cout << "going to umap() files in #" << i << " seconds (still accessing the data)" << std::endl;
for (unsigned int x = 0; x < size; ++x)
{
char cp = static_cast<char*>(data)[x]; // void* arithmetic is not valid C++
(void) cp;
}
sleep(1);
}
munmap(data, size);
for (int i = 30; i > 0; --i)
{
std::cout << "going to terminate #" << i << " seconds" << std::endl;
sleep(1);
}
return 0;
}
Whenever I delete the file after the open() operation, it has no negative impact on the subsequent mmap(). I can still access the data in the test program.
When I delete the file right after open(), but before mmap(), it also works. I can still see the old file in the /proc/[pid]/fd/ area:
lrwx------ 1 frank frank 64 Mär 22 20:28 3 -> /tmp/demo-file.dat (deleted)
The rest of the program works.
Even when I delete the file after the close(), it still succeeds in accessing the mmap()'d data.
However, in both cases, after the close() I can no longer see the
lrwx------ 1 frank frank 64 Mär 22 20:28 3 -> /tmp/demo-file.dat (deleted)
entry anymore. (By the way: where is it then noted that this file "still exists somehow"?)
So is it the opposite: is it guaranteed that mmap() will still be able to operate on the data, even if the file is deleted manually (in a shell or by some other process)?
Here's what's happening.
The first thing to check is
$ ls -i /tmp/demo-file.dat
65 /tmp/demo-file.dat
Note the inode number of the file is 65.
On starting the program, here's what it shows in its lsof output (apart from other entries not relevant to the current discussion):
a.out 29271 zoso 3u REG 0,21 5 65 /tmp/demo-file.dat
This is a result of the open() that's been done. Note that the inode number is the same as before; the open() has increased the reference count on that inode. Also note that REG indicates a regular file.
Now if the file is deleted (using rm etc.), here's what the lsof looks like
a.out 29271 zoso 3u REG 0,21 5 65 /tmp/demo-file.dat (deleted)
This is expected, since the file that was opened has been deleted, but the handle to its inode is still open in the process.
Moving on to the mmap, and here's the lsof output
a.out 29271 zoso DEL REG 0,21 65 /tmp/demo-file.dat
a.out 29271 zoso 3u REG 0,21 5 65 /tmp/demo-file.dat (deleted)
Now there's another new entry, but note that it is of type DEL, which indicates (lifting from the lsof man page):
''DEL'' for a Linux map file that has been deleted;
Since lsof can't stat the original file anymore, it lists this mapping as DEL, with no size of course; yet note that the inode number still remains the same, i.e. 65.
Now, after close() has been called on the fd, here's what lsof shows:
a.out 29271 zoso DEL REG 0,21 65 /tmp/demo-file.dat
Note that the (deleted) entry is gone since that fd to a REG file has been closed and now only the mmap'd memory remains.
After munmap(), this entry too is gone (no more references to /tmp/demo-file.dat), and finally the program ends.
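As for where it is noted that the file "still exists somehow": the inode itself carries a link count, which drops to zero on unlink while the inode stays alive until its last reference is gone. A minimal sketch, assuming Linux/POSIX, showing fstat(2) on the still-open descriptor:

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <iostream>

int main() {
    int fd = open("/tmp/demo-file.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    unlink("/tmp/demo-file.dat"); // removes only the directory entry
    struct stat st;
    if (fstat(fd, &st) == 0)
        std::cout << "st_nlink = " << st.st_nlink << std::endl; // prints 0
    close(fd); // last reference dropped: now the inode is freed
    return 0;
}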
I'm running some experiments on disk I/O, both using the cache and bypassing it. In order to perform a read directly from disk, I open the file with the O_DIRECT flag (when the macro DISK_DIRECT is defined).
Now the two branches of the if below should perform the same operation, with the difference that one is helped by the cache and the other is not.
The files to which I try to access are stored on disk and they do not change over time.
Also the two branches access to the same files.
At some point, when I use fread, ferror() becomes true, while when I use read everything goes fine.
I'm sure they access the same files.
Do you have any idea why this could happen?
EDIT
OK, I'm posting a minimal example here. The code I use is:
#include <iostream>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <fstream>
#include <sstream>
using namespace std;
typedef float fftwf_complex[2];
void fetch_level(unsigned long long tid, unsigned short level, fftwf_complex* P_read, fftwf_complex* P_fread, int n_coeff_per_level, FILE** files_o_direct, fstream* & files) {
int b_read;
fseek(files_o_direct[level],(long int) (tid * sizeof(fftwf_complex)*n_coeff_per_level), SEEK_SET);
b_read = fread(reinterpret_cast<char*>(P_fread),sizeof(fftwf_complex), n_coeff_per_level,files_o_direct[level]);
if(b_read == 0){
cerr << "nothing read\n";
}
files[level].seekg((streamoff) (tid * sizeof(fftwf_complex)*n_coeff_per_level), files[level].beg);
files[level].read(reinterpret_cast<char*>(P_read),
sizeof(fftwf_complex) * n_coeff_per_level);
}
void open_files (fstream* & files){
for(int i=0; i<1;i++) {
std::ostringstream oss;
oss << "./Test_fread_read/1.txt.bin";
files[i].open(oss.str().c_str(),
std::ios::in | std::ios::out |
std::ios::binary | std::ios::ate);
if (!files[i])
{
cerr << "fstream could not open " << oss.str() << endl;
}
}
}
void open_files_o_direct (FILE** files_o_direct, int* fd){
for(unsigned int i=0;i<1; i++){
std::ostringstream oss;
oss << "./Test_fread_read/1.txt.bin";
fd[i]=open(oss.str().c_str(), O_RDONLY | O_DIRECT);
files_o_direct[i] = fdopen(fd[i], "rb");
if(!files_o_direct[i])
cerr << "Could not open " << oss.str() << endl;
}
}
int close_files(FILE** files_o_direct, int* fd, fstream* & files) {
for(unsigned int i=0; i<1; i++){
//#if defined (DISK_DIRECT)
if(files_o_direct[i])
close(fd[i]);
//#else
if(files[i].is_open())
files[i].close();
//#endif
}
return 0;
}
int main(){
FILE**files_o_direct = new FILE* [256];
fstream* files = new fstream [256];
int * fd = new int [256];
fftwf_complex * P_read = new fftwf_complex [1];
fftwf_complex * P_fread = new fftwf_complex [1];
open_files_o_direct(files_o_direct, fd);
open_files(files);
fetch_level(2, 0, P_read, P_fread, 1, files_o_direct, files);
cout << "P_read: " << P_read[0][0] << " P_fread: " << P_fread[0][0] << endl;
cout << "P_read: " << P_read[0][1] << " P_fread: " << P_fread[0][1] << endl;
fetch_level(7, 0, P_read, P_fread, 1, files_o_direct, files);
cout << "P_read: " << P_read[0][0] << " P_fread: " << P_fread[0][0] << endl;
cout << "P_read: " << P_read[0][1] << " P_fread: " << P_fread[0][1] << endl;
fetch_level(8, 0, P_read, P_fread, 1, files_o_direct, files);
cout << "P_read: " << P_read[0][0] << " P_fread: " << P_fread[0][0] << endl;
cout << "P_read: " << P_read[0][1] << " P_fread: " << P_fread[0][1] << endl;
close_files(files_o_direct, fd, files);
delete [] P_read;
delete [] P_fread;
delete [] files;
delete [] files_o_direct;
delete [] fd;
return 0;
}
and the file which is accessed is:
0.133919 0.0458176
1.67441 2.40805
0.997525 -0.279977
-2.39672 -3.076
-0.0390913 0.854464
-0.0176478 -1.3142
-0.667981 -0.486272
0.831051 0.282802
-0.638032 -0.630943
-0.669854 -1.49762
which is stored in binary format and can be downloaded from here: 1.txt.bin.
The output I get is:
nothing read
P_read: 0.997525 P_fread: 0
P_read: -0.279977 P_fread: 0
nothing read
P_read: 0.831051 P_fread: 0
P_read: 0.282802 P_fread: 0
nothing read
P_read: -0.638032 P_fread: 0
P_read: -0.630943 P_fread: 0
The problem persists even if I change the type of fftwf_complex from float[2] to a simple float.
If I remove the fseek line, everything works correctly.
This check, if (b_read == 0), will be true at the end of the file, and you will enter this branch:
if(ferror(this->files_o_direct[level]))
fseek(this->files_o_direct[level], 0, SEEK_END); //ftell here returns 4800000
cerr << "nothing read\n";
even if ferror returns 0, since the end of the file was reached anyway. The line
fseek(this->files_o_direct[level], 0, SEEK_END);
makes no sense, and "nothing read\n" will be output whether or not ferror returns nonzero.
From the manual page
fread() does not distinguish between end-of-file and error, and callers must use feof(3) and ferror(3) to determine which occurred.
So you have to check feof first, and only if it is false consult ferror.
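In code, the check the manual page calls for looks roughly like this (a minimal sketch; checked_fread is a helper invented for illustration):

#include <cstdio>
#include <iostream>

// Wraps fread() with the feof()/ferror() check the manual describes:
// a short item count alone doesn't say whether it was EOF or an error.
size_t checked_fread(void* buf, size_t size, size_t n, FILE* fp) {
    size_t got = fread(buf, size, n, fp);
    if (got < n) {
        if (feof(fp))
            std::cerr << "short read: end of file\n";
        else if (ferror(fp))
            std::cerr << "short read: I/O error\n";
    }
    return got;
}

With O_DIRECT in play, a failed read caused by an unaligned buffer or offset would then show up as an I/O error rather than being mistaken for end-of-file.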
For whoever may have the same problem, here is the answer:
The O_DIRECT flag may impose alignment restrictions on the length and
address of user-space buffers and the file offset of I/Os. In Linux
alignment restrictions vary by filesystem and kernel version and
might be absent entirely. However there is currently no
filesystem-independent interface for an application to discover these
restrictions for a given file or filesystem. Some filesystems
provide their own interfaces for doing so, for example the
XFS_IOC_DIOINFO operation in xfsctl(3).
Under Linux 2.4, transfer sizes, and the alignment of the user buffer
and the file offset must all be multiples of the logical block size
of the filesystem. Since Linux 2.6.0, alignment to the logical block
size of the underlying storage (typically 512 bytes) suffices. The
logical block size can be determined using the ioctl(2) BLKSSZGET
operation or from the shell using the command:
blockdev --getss
(from the Linux open(2) reference page)
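Concretely, the buffer address, the transfer size, and the file offset all have to respect that block size, which the fseek to tid * sizeof(fftwf_complex) * n_coeff_per_level (8-byte granularity) does not. A minimal sketch, assuming a 512-byte logical block size as reported by blockdev --getss:

#include <fcntl.h>   // open, O_DIRECT (g++ defines _GNU_SOURCE by default)
#include <unistd.h>  // pread, close
#include <cstdlib>   // posix_memalign, free
#include <iostream>

int main() {
    const size_t block = 512; // logical block size, e.g. from blockdev --getss
    void* buf = nullptr;
    // The buffer address must be aligned to the block size.
    if (posix_memalign(&buf, block, block) != 0) return 1;
    int fd = open("./Test_fread_read/1.txt.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) return 1;
    // The offset (0) and the length (block) must also be multiples
    // of the block size.
    ssize_t n = pread(fd, buf, block, 0);
    std::cout << "read " << n << " bytes" << std::endl;
    close(fd);
    free(buf);
    return 0;
}

To read a small record in the middle of the file this way, you would read the whole enclosing 512-byte block and pick the record out of the buffer.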