cpp file reading error with stat and read() [closed]


Closed 7 years ago.
1. I run into this error only intermittently and can't reproduce it.
2. The file being read is read-only and can't be deleted or modified.
3. The code is not exactly the same, because it is part of something bigger that I am writing, but this is the part that causes the problem.
4. This code is for explanation purposes, not for reproducing the problem (because of point 1).
I am trying to read the file using
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <memory>
#include <exception>
#include <iostream>
#include <glog/logging.h>
using namespace std;
int main() {
string fileName="blah";
struct stat fileStat;
int status = ::stat(fileName.c_str(), &fileStat);
if (status != 0) {
LOG(ERROR) << "Error stating the file";
}
size_t fileSize = fileStat.st_size;
// fileSize is 79626240. I am trying to read block starting from
// 67108864 bytes, so there will be 1251736
size_t fileBlockSize = 16 * 1024 * 1024;
size_t numBlocks = fileSize / fileBlockSize;
size_t offset = numBlocks;
size_t actualSize = fileSize - offset * fileBlockSize;
if (actualSize == 0) {
LOG(INFO) << "You read the entire file";
return 1;
}
int fd = ::open(fileName.c_str(), O_RDONLY);
if (fd < 0) {
throw std::runtime_error("Error opening the file");
} else if (offset > 0 && lseek(fd, offset, SEEK_SET) < 0) {
throw std::runtime_error("Error seeking the file");
}
uint64_t readBlockSize = 256 * 1024;
char *data = new char[readBlockSize + 1];
uint64_t totalRead = 0;
while (totalRead < actualSize) {
ssize_t numRead = ::read(fd, data, readBlockSize);
// Use the data you read upto numRead
if (numRead == 0) {
LOG(ERROR) << "Reached end of file";
break;
} else if (numRead < 0) {
throw std::runtime_error("read unsuccessful");
}
totalRead += numRead;
}
if (totalRead != actualSize) {
LOG(ERROR) << "Error reading the file";
}
}
Imagine me slicing the file into blocks of 16 MiB each and then reading the last block. I read that block in a loop with a smaller buffer size; however, I get EOF before I finish reading the entire block. Can the size reported by stat ever be greater than the amount of data in the file?
The output I see :
Reached end of file
Error reading the file
I don't need alternative solutions (I could do other things, such as lseek to the end); I want to know why this is happening.
PS: It is not because of the number of blocks on the disk. I am using st_size and nothing more.

Take care when using stat on a file path: it is better to open the file and call fstat on the descriptor, which avoids TOCTOU (time-of-check to time-of-use) race conditions.
int fileDescriptor = -1;
struct stat fileStat;
std::vector<char> fileContent;
std::string filename("test.txt");
fileDescriptor = open(filename.c_str(),O_RDONLY);
// Do error check of fileDescriptor
fstat(fileDescriptor,&fileStat);
// Do error check of fstat
fileContent.resize(fileStat.st_size);
::read(fileDescriptor,fileContent.data(),fileStat.st_size);
close(fileDescriptor);
Additionally, keep in mind that read may return fewer bytes than fileStat.st_size, in which case you must read the remaining bytes (unlikely with regular-file I/O, quite common with sockets); the code above is just a small example.
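To make the two points above concrete, here is a minimal sketch of reading a whole regular file with fstat on the already-open descriptor plus a loop that tolerates short reads. The helper name readWholeFile is made up for this illustration and is not part of the original code; POSIX is assumed.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cerrno>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): returns the full content of a regular file.
std::vector<char> readWholeFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);

    struct stat st;
    if (::fstat(fd, &st) != 0) {  // size of the file we actually opened
        ::close(fd);
        throw std::runtime_error("fstat failed: " + path);
    }

    std::vector<char> content(static_cast<size_t>(st.st_size));
    size_t total = 0;
    while (total < content.size()) {
        ssize_t n = ::read(fd, content.data() + total, content.size() - total);
        if (n > 0) {
            total += static_cast<size_t>(n);  // short read: keep going
        } else if (n == 0) {
            break;                            // EOF before st_size bytes arrived
        } else if (errno != EINTR) {          // EINTR just means "try again"
            ::close(fd);
            throw std::runtime_error("read failed: " + path);
        }
    }
    ::close(fd);
    content.resize(total);  // report what was really read
    return content;
}
```

If the returned vector is shorter than st_size, the file shrank or was replaced between fstat and read, which is worth logging separately from a read error.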
Edit
I copied your code and modified it to load a local 9 MB file. After compiling with g++ -g -std=c++11 -lglog main.cpp, I set a breakpoint at line 51:
if (totalRead != actualSize)
This is the result from my debug session:
(gdb) b main.cpp:51
Breakpoint 1 at 0x4013fc: file main.cpp, line 51.
(gdb) r
Starting program: /home/jpalma/Documents/functionTest/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, main () at main.cpp:51
51 if (totalRead != actualSize) {
(gdb) p totalRead
$1 = 9000032
(gdb) p actualSize
$2 = 9000032
So basically your program works flawlessly for me. Maybe you have a problem in your filesystem or something unrelated to this code.
I'm using ext4 as the filesystem.
ll reports this size for the file I'm reading: 9000032 abr 29 16:10 WebDev.pdf, so as you can see it is correct. My page size is
$ getconf PAGESIZE
4096

So your question is: can the size reported by fstat, or by stat, on a read-only, unmodified file be greater than what will be read from the file?
First, some elements on the return value of read (from the man page):
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested ... On error, -1 is returned, and errno is set appropriately
So the return value is at most the requested size and can be less, with 0 indicating end of file and -1 indicating an error.
The man page also says that a read of fewer bytes than requested may happen, for example, because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal.
So even though I have never seen it happen, nothing in the documentation guarantees that a read on a regular file returns as much data as requested, even when end of file has not been reached.
But it is clearly stated that a return value of 0 means that you are at end of file. As you test for end of file with 0 == read(...), all is fine.
As for your question (can the size reported by stat differ from the number of bytes that can be read from a file?), the answer is no, unless the file system is broken or there are physical read errors. That is the definition of the size member: The st_size field gives the size of the file (if it is a regular file or a symbolic link) in bytes (from the stat man page).
But I really cannot understand your code. First I see :
size_t fileSize = fileStat.st_size;
// fileSize is 79626240. I am trying to read block starting from
// 67108864 bytes, so there will be 1251736
size_t fileBlockSize = 16 * 1024 * 1024;
size_t numBlocks = fileSize / fileBlockSize;
size_t offset = numBlocks;
size_t actualSize = fileSize - offset * fileBlockSize;
So actualSize is now 12517376 (79626240 - 67108864; the 1251736 in the comment drops a digit) when the file is 79626240 bytes long.
And later, without actualSize having changed:
uint64_t totalRead = 0;
while (totalRead < actualSize) {
ssize_t numRead = ::read(fd, data, readBlockSize);
...
totalRead += numRead;
}
if (totalRead != actualSize) {
LOG(ERROR) << "Error reading the file";
}
As actualSize, despite its name, is not the actual file size, you can end up in the Error reading the file branch. But if it happens with the true file size, double-check your file system.
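The block arithmetic discussed above can be written out as a small sketch. BlockRange and lastPartialBlock are hypothetical names introduced here, and the sketch assumes the intent was to seek to the byte offset of the last block. Note that the posted code stores the block count in offset and passes that to lseek, which seeks to byte 4 rather than byte 67108864; whether the real program does the same is not known.

```cpp
#include <cstdint>

// Hypothetical type: byte range of the trailing (partial) block of a file.
struct BlockRange {
    std::uint64_t offset;  // byte offset where the last block starts
    std::uint64_t size;    // number of bytes from that offset to end of file
};

// Hypothetical helper: splits fileSize into blocks of blockSize bytes and
// returns the range covered by whatever is left after the full blocks.
BlockRange lastPartialBlock(std::uint64_t fileSize, std::uint64_t blockSize) {
    std::uint64_t fullBlocks = fileSize / blockSize;  // 4 for the sizes in the question
    std::uint64_t offset = fullBlocks * blockSize;    // bytes, not a block index
    return BlockRange{offset, fileSize - offset};
}
```

With the sizes from the question (79626240 bytes, 16 MiB blocks) this gives offset 67108864 and size 12517376, and offset is the value that would go to lseek.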

Related

fseeko not creating a hole in file when seeking after EOF

I'm trying to use a sparse file to store a sparse array of data. Logically I thought the code had no bugs, but the unit tests keep failing. After many inspections of the code I decided to check the file content after every step and found that the holes were not created. That is: writing the first element, seeking forward x elements, then writing the second element ends up writing the first element and then the second element with no space at all between them.
My simplified code:
FILE* file = fopen64(fn.c_str(), "ar+b");
auto const entryPoint = 220; //calculated at runtime, the size of each element is 220 bytes
auto r = fseeko64(file, entryPoint, SEEK_SET);
if(r!=0){
std::cerr << "Error seeking file" << std::endl;
}
size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
if(written != 1) {
perror("error writing file");
}
fclose(file);
The offset is being calculated correctly. The code writes the first element, then is meant to leave 20 elements empty and write the 22nd element. Inspecting the file with a hex dump, however, shows two elements at offsets 0 and 220 (directly after the first element). Unit tests also fail, because reading the 2nd element actually returns element number 22.
Could anyone explain what is wrong with my code? Maybe I misunderstood the concept of holes?
------Edit1------
Here's my full code
Read function:
FILE* file = fopen64(fn.c_str(), "r+b");
if(file == nullptr){
memset(page->entries, 0, sizeof(page->entries));
return ;
}
MoveCursor(file, id, sizeof(page->entries));
size_t read = fread(&page->entries[0], sizeof(page->entries), 1, file);
fclose(file);
if(read != 1){ //didn't read a full page.
memset(page->entries, 0, sizeof(page->entries));
}
Write function:
auto fn = dir.path().string() + std::filesystem::path::preferred_separator + GetFileId(page->pageId);
FILE* file = fopen64(fn.c_str(), "ar+b");
MoveCursor(file, page->pageId, sizeof(page->entries));
size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
if(written != 1) {
perror("error writing file");
}
fclose(file);
void MoveCursor(FILE* file, TPageId pid, size_t pageMultiplier){
auto const entryPoint = pid * pageMultiplier;
auto r = fseeko64(file, entryPoint, SEEK_SET);
if(r!=0){
std::cerr << "Error seeking file" << std::endl;
}
}
And here's a simplified page class:
template<typename TPageId, uint32_t EntriesCount>
struct PODPage {
bool dirtyBit = false;
TPageId pageId;
uint32_t entries[EntriesCount];
};
The reason I'm saying it is an fseeko problem when writing is that inspecting the file content with xxd shows the data is out of order. Breakpoints in the MoveCursor function show the offset is calculated correctly, and manual inspection of the file fields shows the offset is set correctly; however, the write doesn't leave a hole.
=============Edit2============
Minimal reproducer. The logic is: write the first chunk of data, seek to position 900, write the second chunk of data, then try to read from position 900 and compare with the data that is supposed to be there. Each operation opens and closes the file, which is what happens in my original code; keeping the file open is not allowed.
The expected behavior is to create a hole in the file; the actual behavior is that the file is written sequentially without holes.
#include <iostream>
#define _FILE_OFFSET_BITS 64
#define __USE_FILE_OFFSET64 1
#include <stdio.h>
#include <cstring>
int main() {
uint32_t data[10] = {1,2,3,4,5,6,7,8,9};
uint32_t data2[10] = {9,8,7,6,5,4,3,2,1};
{
FILE* file = fopen64("data", "ar+b");
if(fwrite(&data[0], sizeof(data), 1, file) !=1) {
perror("err1");
return 0;
}
fclose(file);
}
{
FILE* file = fopen64("data", "ar+b");
if (fseeko64(file, 900, SEEK_SET) != 0) {
perror("err2");
return 0;
}
if(fwrite(&data2[0], sizeof(data2), 1, file) !=1) {
perror("err3");
return 0;
}
fclose(file);
}
{
FILE* file = fopen64("data", "r+b");
if (fseeko64(file, 900, SEEK_SET) != 0) {
perror("err4");
return 0;
}
uint32_t data3[10] = {0};
if(fread(&data3[0], sizeof(data3), 1, file)!=1) {
perror("err5");
return 0;
}
fclose(file);
if (memcmp(&data2[0],&data3[0],sizeof(data))!=0) {
std::cerr << "err6";
return 0;
}
}
return 0;
}

I think your problem is the same as discussed here:
fseek does not work when file is opened in "a" (append) mode
Does fseek() move the file pointer to the beginning of the file if it was opened in "a+b" mode?
Summary of the two above: If a file is opened for appending (using "a") then fseek only applies to the read position, not to the write position. The write position will always be at the end of the file.
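You can fix this by opening the file with "w" or "w+" instead. Both worked for me with your minimal code example. Be aware that "w" and "w+" truncate the file every time it is opened, so if previously written data must survive, open an existing file with "r+b" instead.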
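As a rough sketch of the non-append approach, the helper below opens an existing file in "r+b" (update) mode so the seek is honored and earlier content is kept, and only falls back to "w+b" to create the file. writeAt is a hypothetical name, plain fseek with a long offset is used instead of fseeko64 to keep the sketch portable, and whether the gap is stored as a real hole depends on the filesystem.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (illustration only): writes `size` bytes at byte `offset`
// without truncating what is already in the file.
bool writeAt(const std::string& path, long offset, const void* data, std::size_t size) {
    std::FILE* f = std::fopen(path.c_str(), "r+b");  // update mode: seeks apply to writes
    if (!f) f = std::fopen(path.c_str(), "w+b");     // file does not exist yet: create it
    if (!f) return false;

    bool ok = std::fseek(f, offset, SEEK_SET) == 0 &&
              std::fwrite(data, 1, size, f) == size;
    // fclose flushes the buffer, so its result is part of the outcome.
    return std::fclose(f) == 0 && ok;
}
```

Writing 40 bytes at offset 0 and then 40 bytes at offset 900 with this helper leaves a 940-byte file whose bytes 40..899 read back as zeros.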

Why read system call stops reading when less than block is missing?

Introduction and general objective
I am trying to send an image from a child process (generated by calling popen from the parent) to the parent process.
The image is a grayscale png image. It is opened with the OpenCV library and encoded using imencode function of the same library. So the resulting encoded data is stored into a std::vector structure of type uchar, namely the buf vector in the code below.
No error in sending preliminary image information
First the child sends the following image information needed by the parent:
size of the buf vector containing the encoded data: this piece of information is needed so that the parent can allocate a buffer of the same size in which to store the image data that it will receive from the child. Allocation is performed as follows (buf in this case is the array used to receive data, not the vector containing the encoded data):
u_char *buf = (u_char*)malloc(val*sizeof(u_char));
number of rows of the original image: needed by the parent to decode the image after all data have been received;
number of columns of the original image: needed by the parent to decode the image after all data have been received.
These data are written by the child to standard output using cout and read by the parent using fgets (a stdio library function).
These pieces of information are correctly sent and received, so no problem so far.
Sending image data
The child writes the encoded data (i.e. the data contained in the vector buf) to the standard output using write system call while the parent uses the file-descriptor returned by popen to read the data. Data is read using read system call.
Data writing and reading is performed in blocks of 4096 bytes inside while loops. The writing line is the following:
written += write(STDOUT_FILENO, buf.data()+written, s);
where STDOUT_FILENO tells to write on standard output.
buf.data() returns the pointer to the first element in the array used internally by the vector structure.
written stores the number of bytes that have been written until now and it is used as index. s is the number of bytes (4096) that write will try to send each time.
write returns the number of bytes that actually have been written and this is used to update written.
Data reading is very similar and it is performed by the following line:
bytes_read = read(fileno(fp), buf+total_bytes, bytes2Copy);
fileno(fp) tells read where to read data from (fp is the FILE* stream returned by popen). buf is the array where received data is stored, and total_bytes is the number of bytes read so far, so it is used as an index. bytes2Copy is the number of bytes expected to be received: it is either BUFLEN (i.e. 4096) or, for the last block of data, the remaining data (if, for example, the total is 5000 bytes, then after 1 block of 4096 bytes another block of 5000-4096 is expected).
The code
Consider this example. The following is a process launching a child process with popen
#include <stdlib.h>
#include <unistd.h>//read
#include "opencv2/opencv.hpp"
#include <iostream>
#define BUFLEN 4096
int main(int argc, char *argv[])
{
//file descriptor to the child process
FILE *fp;
cv::Mat frame;
char temp[10];
size_t bytes_read_tihs_loop = 0;
size_t total_bytes_read = 0;
//launch the child process with popen
if ((fp = popen("/path/to/child", "r")) == NULL)
{
//error
return 1;
}
//read the number of bytes of encoded image data
fgets(temp, 10, fp);
//convert the string to int
size_t bytesToRead = atoi((char*)temp);
//allocate memory where to store encoded image data that will be received
u_char *buf = (u_char*)malloc(bytesToRead*sizeof(u_char));
//some prints
std::cout<<bytesToRead<<std::endl;
//initialize the number of bytes read to 0
bytes_read_tihs_loop=0;
int bytes2Copy;
printf ("bytesToRead: %ld\n",bytesToRead);
bytes2Copy = BUFLEN;
while(total_bytes_read<bytesToRead &&
(bytes_read_tihs_loop = read(fileno(fp), buf+total_bytes_read, bytes2Copy))
)
{
//bytes to be read at this iteration: either 4096 or the remaining (bytesToRead-total)
bytes2Copy = BUFLEN < (bytesToRead-total_bytes_read) ? BUFLEN : (bytesToRead-total_bytes_read);
printf("%d btytes to copy\n", bytes2Copy);
//read the bytes
printf("%ld bytes read\n", bytes_read_tihs_loop);
//update the number of bytes read
total_bytes_read += bytes_read_tihs_loop;
printf("%lu total bytes read\n\n", total_bytes_read);
}
printf("%lu bytes received over %lu expected\n", total_bytes_read, bytesToRead);
printf("%lu final bytes read\n", total_bytes_read);
pclose(fp);
cv::namedWindow( "win", cv::WINDOW_AUTOSIZE );
frame = cv::imdecode(cv::Mat(1,total_bytes_read,0, buf), 0);
cv::imshow("win", frame);
return 0;
}
and the process opened by the above corresponds to the following:
#include <unistd.h> //STDOUT_FILENO
#include "opencv2/opencv.hpp"
#include <iostream>
using namespace std;
using namespace cv;
#define BUFLEN 4096
int main(int argc, char *argv[])
{
Mat frame;
std::vector<uchar> buf;
//read image as grayscale
frame = imread("test.png",0);
//encode image and put data into the vector buf
imencode(".png",frame, buf);
//send the total size of vector to parent
cout<<buf.size()<<endl;
unsigned int written= 0;
int i = 0;
size_t toWrite = 0;
//send until all bytes have been sent
while (written<buf.size())
{
//send the current block of data
toWrite = BUFLEN < (buf.size()-written) ? BUFLEN : (buf.size()-written);
written += write(STDOUT_FILENO, buf.data()+written, toWrite);
i++;
}
return 0;
}
The error
The child reads an image, encodes it and sends first the dimensions (size, #rows, #cols) to the parent and then the encoded image data.
The parent first reads the dimensions (no problem with that), then it starts reading data. Data is read 4096 bytes at each iteration. However, when fewer than 4096 bytes are left, it tries to read only the missing bytes: in my case the last step should read 1027 bytes (115715%4096), but instead of reading all of them it just reads 15.
What I got printed for the last two iterations is:
4096 btytes to copy
1034 bytes read
111626 total bytes read
111626 bytes received over 115715 expected
111626 final bytes read
OpenCV(4.0.0-pre) Error: Assertion failed (size.width>0 && size.height>0) in imshow, file /path/window.cpp, line 356
terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(4.0.0-pre) /path/window.cpp:356: error: (-215:Assertion failed) size.width>0 && size.height>0 in function 'imshow'
Aborted (core dumped)
Why isn't read reading all the missing bytes?
I am working on this image:
There might be errors also on how I am trying to decode back the image so any help there would be appreciated too.
EDIT
In my opinion as opposed to some suggestions the problem is not related to the presence of \n or \r or \0.
In fact when I print data received as integer with the following lines:
for (int ii=0; ii<val; ii++)
{
std::cout<<(int)buf[ii]<< " ";
}
I see 0, 10 and 13 values (the ASCII values of the above mentioned characters) in the middle of data so this makes me think it is not the problem.

fgets(temp, 10, fp);
...
read(fileno(fp), ...)
This cannot possibly work.
stdio routines are buffered. Buffers are controlled by the implementation. fgets(temp, 10, fp); will read an unknown number of bytes from the file and put it in a buffer. These bytes will never be seen by low level file IO again.
You never, ever, use the same file with both styles of IO. Either do everything with stdio, or do everything with low-level IO. The first option is the easiest by far, you just replace read with fread.
If for some ungodly reason known only to the evil forces of darkness you want to keep both styles of IO, you can try that by calling setvbuf(fp, NULL, _IONBF, 0), which turns stdio buffering off, before doing anything else. I have never done that and cannot vouch for this method, but they say it should work. I don't see a single reason to use it though.
On a possibly unrelated note, your reading loop has some logic in its termination condition that is not so easy to understand and could be invalid. The normal way to read a file looks approximately as follows:
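For the "do everything with stdio" option, a minimal sketch could look like the following. It assumes the wire format is a decimal length on its own line followed by that many raw bytes, which matches what the posted child prints; readLengthPrefixed is a hypothetical name.

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper (illustration only): reads "<decimal length>\n" and then
// exactly that many raw bytes, using the same FILE* for both steps so that
// nothing gets stranded in the stdio buffer.
bool readLengthPrefixed(std::FILE* in, std::vector<unsigned char>& payload) {
    char header[32];
    if (!std::fgets(header, sizeof header, in)) return false;

    char* end = nullptr;
    unsigned long length = std::strtoul(header, &end, 10);
    if (end == header) return false;  // header did not start with digits

    payload.resize(length);
    // fread keeps reading until it has `length` bytes, hits EOF, or fails.
    std::size_t got = length ? std::fread(payload.data(), 1, length, in) : 0;
    payload.resize(got);
    return got == length;
}
```

In the parent from the question this would be called as readLengthPrefixed(fp, buffer) right after popen, replacing both the fgets call and the read loop.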
left = data_size;
total = 0;
while (left > 0 &&
(got=read(file, buf+total, min(chunk_size, left))) > 0) {
left -= got;
total += got;
}
if (got == 0) ... // reached the end of file
else if (got < 0) ... // encountered an error
The more correct way would be to try again if got < 0 && errno == EINTR, so the modified condition could look like
while (left > 0 &&
(((got=read(file, buf+total, min(chunk_size, left))) > 0) ||
(got < 0 && errno == EINTR))) {
but at this point readability starts to suffer and you may want to split this in separate statements.
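Split into separate statements, the same loop might look like this sketch. readExact and its chunk parameter are names chosen for the illustration, not anything from the question; it assumes a plain POSIX file descriptor.

```cpp
#include <unistd.h>

#include <algorithm>
#include <cerrno>
#include <cstddef>

// Hypothetical helper (illustration only): tries to read `size` bytes from fd
// in pieces of at most `chunk` bytes. Returns the number of bytes actually read
// (less than `size` only if end of file came first), or -1 on a real error.
ssize_t readExact(int fd, unsigned char* buf, std::size_t size, std::size_t chunk = 4096) {
    std::size_t total = 0;
    while (total < size) {
        std::size_t want = std::min(chunk, size - total);
        ssize_t got = ::read(fd, buf + total, want);
        if (got > 0) {
            total += static_cast<std::size_t>(got);  // may be a short read: loop again
            continue;
        }
        if (got == 0) break;           // end of file, or the writer closed the pipe
        if (errno == EINTR) continue;  // interrupted before any data arrived: retry
        return -1;                     // any other error
    }
    return static_cast<ssize_t>(total);
}
```

The caller compares the result with the expected size to tell a complete transfer from an early end of file.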

You're writing binary data to standard output, which is expecting text. Newline characters (\n) and/or return characters (\r) can be added or removed depending on your system's encoding for end-of-line in text files. Since you're missing characters, it appears that your system is removing one of those two characters.
You need to write your data to a file that you open in binary mode, and you should read in your file in binary.

Updated Answer
I am not the world's best at C++, but this works and will give you a reasonable starting point.
parent.cpp
#include <stdlib.h>
#include <unistd.h>
#include <cstdint>
#include <iostream>
#include <vector>
#include "opencv2/opencv.hpp"
int main(int argc, char *argv[])
{
// File descriptor to the child process
FILE *fp;
// Launch the child process with popen
if ((fp = popen("./child", "r")) == NULL)
{
return 1;
}
// Read the number of bytes of encoded image data
std::size_t filesize;
fread(&filesize, sizeof(filesize), 1, fp);
std::cout << "Filesize: " << filesize << std::endl;
// Allocate memory to store encoded image data that will be received
std::vector<uint8_t> buffer(filesize);
int bufferoffset = 0;
int bytesremaining = filesize;
while(bytesremaining>0)
{
std::cout << "Attempting to read: " << bytesremaining << std::endl;
int bytesread = fread(&buffer[bufferoffset],1,bytesremaining,fp);
if (bytesread <= 0) break; // EOF or error: stop instead of looping forever
bufferoffset += bytesread;
bytesremaining -= bytesread;
std::cout << "Bytesread/remaining: " << bytesread << "/" << bytesremaining << std::endl;
}
pclose(fp);
// Display that image
cv::Mat frame;
frame = cv::imdecode(buffer, -CV_LOAD_IMAGE_ANYDEPTH);
cv::imshow("win", frame);
cv::waitKey(0);
}
child.cpp
#include <cstdio>
#include <cstdint>
#include <vector>
#include <fstream>
#include <cassert>
#include <iostream>
int main()
{
std::FILE* fp = std::fopen("image.png", "rb");
assert(fp);
// Seek to end to get filesize
std::fseek(fp, 0, SEEK_END);
std::size_t filesize = std::ftell(fp);
// Rewind to beginning, allocate buffer and slurp entire file
std::fseek(fp, 0, SEEK_SET);
std::vector<uint8_t> buffer(filesize);
std::fread(buffer.data(), sizeof(uint8_t), buffer.size(), fp);
std::fclose(fp);
// Write filesize to stdout, followed by PNG image
std::cout.write((const char*)&filesize,sizeof(filesize));
std::cout.write((const char*)buffer.data(),filesize);
}
Original Answer
There are a couple of issues:
Your while loop writing the data from the child process is incorrect:
while (written<buf.size())
{
//send the current block of data
written += write(STDOUT_FILENO, buf.data()+written, s);
i++;
}
Imagine your image is 4097 bytes. You will write 4096 bytes the first time through the loop and then try and write 4096 (i.e. s) bytes on the second pass when there's only 1 byte left in your buffer.
You should write whichever is the lesser of 4096 and bytes remaining in buffer.
There's no point sending the width and height of the file, they are already encoded in the PNG file you are sending.
There's no point calling imread() in the child to convert the PNG file from disk into a cv::Mat and then calling imencode() to convert it back into a PNG to send to the parent. Just open() and read the file as binary and send that - it is already a PNG file.
I think you need to be clear in your mind whether you are sending a PNG file or pure pixel data. A PNG file will have:
PNG header,
image width and height,
date of creation,
color type, bit-depth
compressed, checksummed pixel data
A pixel-data only file will have:
RGB, RGB, RGB, RGB
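Going back to the first point of this answer (write the lesser of 4096 and the bytes remaining), a minimal sketch of such a write loop follows. writeAll is a hypothetical helper; unlike the loop in the question, it also checks for write returning -1 instead of adding it to the running total.

```cpp
#include <unistd.h>

#include <algorithm>
#include <cerrno>
#include <cstddef>

// Hypothetical helper (illustration only): writes all `size` bytes to fd in
// pieces of at most `chunk` bytes. Returns false if write fails.
bool writeAll(int fd, const unsigned char* data, std::size_t size, std::size_t chunk = 4096) {
    std::size_t written = 0;
    while (written < size) {
        // Never ask for more than what is left in the buffer.
        std::size_t toWrite = std::min(chunk, size - written);
        ssize_t n = ::write(fd, data + written, toWrite);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted: retry the same piece
            return false;
        }
        written += static_cast<std::size_t>(n);  // write may accept fewer bytes than asked
    }
    return true;
}
```

In the child this would be called as writeAll(STDOUT_FILENO, buf.data(), buf.size()).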

Why is data corrupt when reading back from a file as it's being written with O_DIRECT

I have a C++ program that uses the POSIX API to write a file opened with O_DIRECT. Concurrently, another thread is reading back from the same file via a different file descriptor. I've noticed that occasionally the data read back from the file contains all zeroes, rather than the actual data I wrote. Why is this?
Here's an MCVE in C++17. Compile with g++ -std=c++17 -Wall -otest test.cpp or equivalent. Sorry I couldn't seem to make it any shorter. All it does is write 100 MiB of constant bytes (0x5A) to a file in one thread and read them back in another, printing a message if any of the read-back bytes are not equal to 0x5A.
WARNING, this MCVE will delete and rewrite any file in the current working directory named foo.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
constexpr size_t CHUNK_SIZE = 1024 * 1024;
constexpr size_t TOTAL_SIZE = 100 * CHUNK_SIZE;
int main(int argc, char *argv[])
{
::unlink("foo");
std::thread write_thread([]()
{
int fd = ::open("foo", O_WRONLY | O_CREAT | O_DIRECT, 0777);
if (fd < 0) std::exit(-1);
uint8_t *buffer = static_cast<uint8_t *>(
std::aligned_alloc(4096, CHUNK_SIZE));
std::fill(buffer, buffer + CHUNK_SIZE, 0x5A);
size_t written = 0;
while (written < TOTAL_SIZE)
{
ssize_t rv = ::write(fd, buffer,
std::min(TOTAL_SIZE - written, CHUNK_SIZE));
if (rv < 0) { std::cerr << "write error" << std::endl; std::exit(-1); }
written += rv;
}
});
std::thread read_thread([]()
{
int fd = ::open("foo", O_RDONLY, 0);
if (fd < 0) std::exit(-1);
uint8_t *buffer = new uint8_t[CHUNK_SIZE];
size_t checked = 0;
while (checked < TOTAL_SIZE)
{
ssize_t rv = ::read(fd, buffer, CHUNK_SIZE);
if (rv < 0) { std::cerr << "read error" << std::endl; std::exit(-1); }
for (ssize_t i = 0; i < rv; ++i)
if (buffer[i] != 0x5A)
std::cerr << "readback mismatch at offset " << checked + i << std::endl;
checked += rv;
}
});
write_thread.join();
read_thread.join();
}
(Details such as proper error checking and resource management are omitted here for the sake of the MCVE. This is not my actual program but it shows the same behavior.)
I'm testing on Linux 4.15.0 with an SSD. About 1/3 of the time I run the program, the "readback mismatch" message prints. Sometimes it doesn't. In all cases, if I examine foo after the fact I find that it does contain the correct data.
If you remove O_DIRECT from the ::open() flags in the write thread, the problem goes away and the "readback mismatch" message never prints.
I could understand why my ::read() might return 0 or something to indicate I've already read everything that has been flushed to disk so far. But I can't understand why it would perform what appears to be a successful read, but with data other than what I wrote. Clearly I'm missing something, but what is it?

So, O_DIRECT has some additional constraints that might not make it what you're looking for:
Applications should avoid mixing O_DIRECT and normal I/O to the same
file, and especially to overlapping byte regions in the same file.
Even when the filesystem correctly handles the coherency issues in
this situation, overall I/O throughput is likely to be slower than
using either mode alone.
Instead, I think O_SYNC might be better, since it does provide the expected guarantees:
O_SYNC provides synchronized I/O file integrity completion, meaning
write operations will flush data and all associated metadata to the
underlying hardware. O_DSYNC provides synchronized I/O data
integrity completion, meaning write operations will flush data to the
underlying hardware, but will only flush metadata updates that are
required to allow a subsequent read operation to complete
successfully. Data integrity completion can reduce the number of
disk operations that are required for applications that don't need
the guarantees of file integrity completion.
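As a sketch of what that change would look like in the writer thread, the open call could swap O_DIRECT for O_SYNC. openForSyncedWrite is a hypothetical helper, and whether this removes the mismatch seen in the question has not been verified here; it only shows the flag combination.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <string>

// Hypothetical helper (illustration only): opens `path` for writing with O_SYNC
// instead of O_DIRECT. Writes then go through the page cache like normal I/O,
// and each write() returns only after the data has been flushed.
int openForSyncedWrite(const std::string& path) {
    return ::open(path.c_str(), O_WRONLY | O_CREAT | O_SYNC, 0644);
}
```

Because O_SYNC has no alignment requirements, the aligned_alloc buffer in the MCVE would no longer be needed, although keeping it does no harm.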

Unable to correctly read bmp file

I am trying to read certain information from a bmp file, basically the file type, i.e. B M, in my bmp file. I start by opening the file, which works correctly. The first fread, however, fails. Why is this happening?
#include<stdio.h>
#include<string.h>
#define SIZE 1
int main(void)
{
FILE* fd = NULL;
char buff[2];
unsigned int i=0,size=0,offset=0;
memset(buff,0,sizeof(buff));
fd = fopen("RIT.bmp","r+");
if(NULL == fd)
{
printf("\n fopen() Error!!!\n");
return 1;
}
printf("\n File opened successfully\n");
if(SIZE*2 != fread(buff,SIZE,2,fd))//to read the file type.(i. e. B M)
{
printf("\n first fread() failed\n");
return 1;
}
return 0;
}
Output
File opened successfully
first fread() failed
Press any key to continue . . .
Update
Yes, the file is empty, due to some earlier error. That is why this error occurs.

Probably your file doesn't have enough data (2 bytes). It gives the correct output when I checked with a file larger than 2 bytes. The same code fails for an empty file.

From the man page: "Upon successful completion, fread() shall return the number of elements successfully read [...]."
That would be 2, not SIZE*2.
Although, on second thought, SIZE is 1, so while the program is error-prone, it is not actually wrong. In that case, the second part of the sentence applies: "... which is less than nitems only if a read error or end-of-file is encountered." And as others said, check the global errno if the file is long enough. Maybe it's time for a new SSD.
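To tell the two cases in that sentence apart, the item count returned by fread can be combined with ferror, roughly as in this sketch. MagicResult and checkBmpMagic are names invented for the illustration.

```cpp
#include <cstdio>

// Hypothetical result type (illustration only).
enum class MagicResult { IsBmp, NotBmp, TooShort, ReadError, OpenFailed };

// Hypothetical helper: reads the first two bytes of a file in binary mode and
// reports whether they are the "BM" file type marker.
MagicResult checkBmpMagic(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return MagicResult::OpenFailed;

    unsigned char magic[2];
    // fread returns the number of complete items read, here items of size 1.
    std::size_t items = std::fread(magic, 1, sizeof magic, f);

    MagicResult result;
    if (items == sizeof magic) {
        result = (magic[0] == 'B' && magic[1] == 'M') ? MagicResult::IsBmp
                                                      : MagicResult::NotBmp;
    } else {
        // A short count means either a read error or end of file.
        result = std::ferror(f) ? MagicResult::ReadError : MagicResult::TooShort;
    }
    std::fclose(f);
    return result;
}
```

For the empty file from the update, this reports TooShort rather than a generic failure.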

reading windows file; _stat returns incorrect value

I was given a little library by my school to do some projects in. The library was written with Linux in mind, so I'm trying to change some things to work with my MinGW compiler. One particular program is for reading files given a URL. I had to change stat to _stat to make it work properly. Opening a file works fine, but _stat seems to return an incorrect value. I'll include the relevant code below:
#ifdef WIN32
#define stat _stat
#endif
//return true if the number of chars read is the same as the file size
bool IsDone() const
{
cout << "checking if numRead " << numRead << " is fileSize " << fileSize << endl;
return (numRead == fileSize);
}
char Read()
{
if (IsDone())
throw IllegalStateException("stream is done");
else
{
char c;
file.get(c);
cout << "reading: " << c << endl;
if (file.fail())
throw FileException(std::string("error reading from file ") + fileName);
++numRead;
return c;
}
}
void OpenFile(string fileName)
{
struct stat buf;
#ifdef WIN32
if (_stat(fileName.c_str(), &buf) < 0){
switch (errno){
case ENOENT:
throw FileException(std::string("Could not find file ") + name);
case EINVAL:
throw FileException(std::string("Invalid parameter to _stat.\n"));
default:
/* Should never be reached. */
throw FileException(std::string("Unexpected error in _stat.\n"));
}
}
#else
if (stat(fileName.c_str(), &buf) < 0)
throw FileException(std::string("could not determine size of file ") + fileName);
#endif
fileSize = buf.st_size;
file.open(fileName.c_str());
}
If you would like to see the entire library, you can get it from here. I understand that the code is gross looking; I'm just trying to kludge together a working Windows version. This thing works fine on Linux; the problem is that when I read in a file on Windows, the size is 1 short for every newline that I use in the input file, so that if I have a file that looks like this:
text
It works fine, but with:
text\r\n
It breaks, and the output looks like this:
checking if numRead 0 is fileSize 6
checking if numRead 0 is fileSize 6
reading: t
checking if numRead 1 is fileSize 6
checking if numRead 1 is fileSize 6
reading: e
checking if numRead 2 is fileSize 6
checking if numRead 2 is fileSize 6
reading: x
checking if numRead 3 is fileSize 6
checking if numRead 3 is fileSize 6
reading: t
checking if numRead 4 is fileSize 6
checking if numRead 4 is fileSize 6
reading:
checking if numRead 5 is fileSize 6
checking if numRead 5 is fileSize 6
reading:
File Error: error reading from file H:/test/data/stuff.txt
It breaks because IsDone() falsely returns false (no pun intended), and the program tries to read past the end of the file. Any suggestions on why _stat is returning an incorrect number when there's a newline?

What _stat is returning is quite correct. Windows uses "\r\n" to signal the end of a line, but when you open a file in text mode, that will be converted to a single new-line character as you read the stream.
If you want the stream you read to match the external length, open the file in binary mode instead.
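A minimal sketch of that check, written against POSIX stat for brevity (on Windows the question's _stat would take its place): binaryReadMatchesStat is a hypothetical helper that counts the characters delivered by a binary-mode stream and compares the count with st_size.

```cpp
#include <sys/stat.h>

#include <fstream>
#include <string>

// Hypothetical helper (illustration only): returns true when reading the file
// byte by byte in binary mode yields exactly st_size characters.
bool binaryReadMatchesStat(const std::string& path) {
    struct stat st;
    if (stat(path.c_str(), &st) != 0) return false;

    // std::ios::binary disables the "\r\n" -> "\n" translation done in text mode.
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file) return false;

    long long numRead = 0;
    char c;
    while (file.get(c)) ++numRead;
    return numRead == static_cast<long long>(st.st_size);
}
```

In the library from the question, the equivalent one-line change would be passing std::ios::binary to file.open in OpenFile.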
If you'll pardon my saying so, once you're done with that, my advice would be to throw this code away, and change your name so if somebody sees your post, they won't blame you for it. What you have here is a lot of code that seems, at least to me, to make a simple task considerably more complex and difficult.

Looks to me like you should open the file in binary mode. I have no idea why you need to use stat to read a file, and in the Windows version of your code, you call _stat and then stat again.
There are a few ways to do this in Windows. My personal preference is to use something like:
char strBuffer[1024];
size_t n;
FILE *fp = fopen (file,"rb"); // Binary read mode
if (fp != NULL) {
while ((n = fread (strBuffer,1,sizeof (strBuffer),fp)) > 0) {
// use the n bytes now in strBuffer
}
fclose (fp);
}
