Should std::fread succeed if the file is large enough? - c++

I have found the following strange behaviour in Visual Studio 2015 when reading a large array of bytes from a file. The file I load is about 80 MB, so it is large enough.
#include <cstdio>
#include <vector>
int main() {
std::FILE* file;
errno_t error = _wfopen_s(&file, L"/User/account/Desktop/file.data", L"r");
const std::size_t n = 16384;
std::vector<unsigned char> v(n);
const std::size_t nb_bytes_read = std::fread(v.data(), sizeof(unsigned char), n, file);
// At this point error = 0 and nb_bytes_read = 3473
}
So I ask std::fread for 16384 bytes and it just gives me 3473 even though the file is large enough. Should it be considered as a bug? The standard does not seem to say so but this behavior is very weird to me.

Try opening the file in binary mode ("rb", or L"rb" for _wfopen_s), which is probably what you want anyway. Otherwise, on the Windows platform, the byte 0x1A (Ctrl-Z) terminates input. Also, line breaks like \r\n are converted to \n, which can also result in fewer bytes being read than requested.

According to this reference, fread() will only return fewer than the requested number of bytes if EOF was reached or an error occurred. You can check for those with feof() and ferror().
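A minimal sketch of what the two answers above suggest, assuming a hypothetical helper `read_block` and result type `ReadResult` (neither appears in the question): open the file in binary mode, and when `fread` returns a short count, ask the stream whether it hit end of file or an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Outcome of one read attempt (hypothetical type, for illustration only).
struct ReadResult {
    std::size_t bytes_read;
    bool hit_eof;
    bool had_error;
};

// Tries to fill `buffer` from the start of the file at `path`.
ReadResult read_block(const char* path, std::vector<unsigned char>& buffer) {
    assert(path != nullptr);
    ReadResult result{0, false, false};
    // "rb": no newline translation and no special meaning for 0x1A on Windows.
    std::FILE* file = std::fopen(path, "rb");
    if (file == nullptr) {
        result.had_error = true;
        return result;
    }
    result.bytes_read = std::fread(buffer.data(), 1, buffer.size(), file);
    if (result.bytes_read < buffer.size()) {
        // A short count means end of file or an error; the stream says which.
        result.hit_eof = std::feof(file) != 0;
        result.had_error = std::ferror(file) != 0;
    }
    std::fclose(file);
    return result;
}
```

With a 16384-byte buffer and an 80 MB file, a call such as `read_block(path, v)` would be expected to report 16384 bytes read once the file is opened in binary mode.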

Related

Incorrect size of file found using Visual Studio C++

I am porting over c++ code from linux to windows. I am currently using Visual Studio 2013 to port my code.
I need to read a binary file and am using this portion of c++ code:
// Open the stream
std::ifstream is("myfile.bin");
// Determine the file length
is.seekg(0, std::ios_base::end);
std::size_t size=is.tellg();
is.seekg(0, std::ios_base::begin);
// Create a vector to store the data
int* Data = new int[size/sizeof(int)];
// Load the data
is.read((char*) &Data[0], size);
// Close the file
is.close();
In linux, the size of my binary file is correctly found to be 744mb. However, in windows, the size of my binary file is incorrectly found to be >4GB. How can I correct this issue?
Change std::ifstream is("myfile.bin"); to std::ifstream is("myfile.bin", std::ios::binary);
With your current default open mode, the compiler chooses "char" mode. In Linux chars in files are UTF8, first 128 positions are 1-byte char. But for memory UTF32, 4-bytes per char, is used. In Windows chars are "wide-chars", 2-bytes per char.
I finally had the time to actually run this myself, though I had to fix a couple of things, like using ios_base::beg instead of begin (which is a different function). Also, as mentioned, the array allocation should be this: int* Data = new int[size / sizeof(int) + 1]; // At most one extra int
I found your problem: you're not in the right directory. Check if you successfully opened the file or not. If you don't, then you get a huge garbage value (probably -1, but unsigned, so massive) for size.
Try this to find your current directory in Windows (requires <windows.h> and <iostream>):
char dirBuf[MAX_PATH];
// GetCurrentDirectoryA returns 0 on failure
if (GetCurrentDirectoryA(MAX_PATH, dirBuf) != 0)
    std::cout << "Current directory is: " << dirBuf << std::endl;
See if that's where your file is and move it accordingly. Or specify the ENTIRE path in the constructor to ifstream.
Also, it has nothing to do with ios::binary or not. Works fine both ways, or fails if the file isn't there.
std::size_t size=is.tellg();
The standard doesn't require tellg to return the byte offset from the beginning of the file. In general, this may not be a reliable way to get the size of the file, though it probably does what you expect on Linux and Windows.
The return type of the tellg method is std::basic_istream::pos_type, so you're starting with an implicit conversion to std::size_t which may or may not be appropriate. In a 32-bit build, for example, it's conceivable that the size of a file could be larger than a std::size_t can represent.
But the root problem is that you're not checking for errors. If you have exceptions disabled, then tellg reports an error by returning pos_type(-1). When you cast that to an unsigned type (which std::size_t is), then you get a very large value. I suspect you failed to open the file, and since you didn't detect that error, the seekg and the tellg failed. You then coerced pos_type(-1) to a std::size_t, which made it look like the file was huge.
You also have the problems others have noted: failing to open the file in binary mode and computing the wrong size for the buffer when the file isn't a multiple of the size of an int.
The most reliable way to get the file size is to use the OS's API. On Windows, you can do this instead:
// Open the file. [TODO: Get the file name in wide characters and use
// CreateFileW instead. If the file name contains characters not
// representable by the user's ANSI codepage, then CreateFileA will fail.]
HANDLE hfile = CreateFileA("myfile.bin", GENERIC_READ, FILE_SHARE_READ,
nullptr, OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
nullptr);
if (hfile == INVALID_HANDLE_VALUE) { /* error handling here */ }
// Figure out how big it is.
LARGE_INTEGER li_size;
if (!GetFileSizeEx(hfile, &li_size)) { /* error handling here */ }
// TODO: On a 32-bit build, this won't be able to handle huge files,
// so check that here.
std::size_t size = li_size.QuadPart;
// Create a buffer to store the data, being careful to round up to a
// multiple of sizeof(int). [TODO: Use a std::vector instead.]
int* Data = new int[(size + sizeof(int) - 1) / sizeof(int)];
// Load the data.
const DWORD BytesToRead = static_cast<DWORD>(size);
DWORD BytesRead = 0;
if (!ReadFile(hfile, Data, BytesToRead, &BytesRead, nullptr) || BytesRead < BytesToRead) {
    // error handling here
}
// Close the file
CloseHandle(hfile);
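For comparison, here is a portable sketch of the same idea using only the standard library, assuming a hypothetical helper `read_whole_file`. It checks that the file opened, checks that `tellg` did not fail, and opens in binary mode. As the answer above notes, the standard does not promise that `tellg` yields a byte count, so this relies on the behaviour of mainstream implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Reads a whole file into `bytes`. Returns false if the file cannot be
// opened, its length cannot be determined, or the read comes up short.
bool read_whole_file(const char* path, std::vector<char>& bytes) {
    assert(path != nullptr);
    std::ifstream is(path, std::ios::binary);
    if (!is) return false;  // open failed (wrong directory, missing file, ...)

    is.seekg(0, std::ios::end);
    // tellg reports failure as pos_type(-1); keep it signed so -1 stays visible.
    const std::streamoff length = static_cast<std::streamoff>(is.tellg());
    if (length < 0) return false;
    is.seekg(0, std::ios::beg);

    bytes.resize(static_cast<std::size_t>(length));
    if (!bytes.empty() &&
        !is.read(bytes.data(), static_cast<std::streamsize>(bytes.size()))) {
        return false;  // fewer bytes than expected
    }
    return true;
}
```

Reading into a `std::vector<char>` sidesteps the rounding issue with `new int[size / sizeof(int)]`; the bytes can be copied into `int` values afterwards if the file length is a multiple of `sizeof(int)`.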
int* Data = new int[size/sizeof(int)];
Why are you doing this? You're dividing the size by 4. You don't want to do this. It should just be int* Data = new int[size]
Also, it should be std::ifstream f("filename.bin", std::ios::binary);

Xcode Error: “EXC_BAD_ACCESS”

I am attempting to compile and run a test C program in Xcode. This program reads 5 symbols from a text file and closes it. The program builds successfully, but when I try to run the program I get the error: GDB: Program received signal: "EXC_BAD_ACCESS" around fclose(in).
#include <iostream>
#include <unistd.h>
int main (int argc, const char * argv[])
{
bool b;
char inpath[PATH_MAX];
printf("Enter input file path :\r\n");
std::cin >> inpath;
FILE *in = fopen(inpath, "r+w");
char buf[5];
fread(&buf,sizeof(buf),5,in);
printf(buf);
fclose(in);
return 0;
}
What could be a cause of this?
Ah! sizeof(buf) will return 5, so you're asking for 25 bytes in a 5-byte buffer. This overwrites auto storage and clobbers in.
And, of course, note that printf(buf) will be attempting to print a buffer with no terminating null, so it will print garbage beyond the end of what was read.
The line
fread(&buf,sizeof(buf),5,in);
is wrong: read carefully the man page of fread (and remember that sizeof(buf) would be the size of the whole buf array).
The line
printf(buf);
is wrong. The behavior is undefined if, for instance, buf contains a conversion specifier such as %d. Use printf("%s", buf) with a null-terminated buffer instead.
You definitely should learn to use the debugger (and enable all warnings with your compiler).
fread(&buf,sizeof(buf),5,in);
This says that you want to read 5 elements, each the size of the whole buf, which is not correct.
The second and third parameters tell fread the size of each element you want to read and the number of elements.
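Putting the answers above together, a corrected read might look like the following sketch. The helper name `read_five_symbols` is hypothetical; it reads with an element size of 1 and a count of 5, leaves room for a terminating null, and checks the result of `fopen` before using the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Reads at most 5 characters from the start of the file at `path`.
// Returns an empty string if the file cannot be opened.
std::string read_five_symbols(const char* path) {
    assert(path != nullptr);
    std::FILE* in = std::fopen(path, "r");
    if (in == nullptr) return std::string();  // never pass a null FILE* on

    char buf[6] = {};  // 5 symbols plus one byte for the terminating '\0'
    // Element size 1, element count 5: at most 5 bytes are written to buf.
    const std::size_t got = std::fread(buf, 1, 5, in);
    buf[got] = '\0';
    std::fclose(in);
    return std::string(buf, got);
}
```

The result can then be printed safely with a fixed format string, for example `std::printf("%s\n", read_five_symbols(inpath).c_str());`.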

Read and write in c++

I am trying to use the system calls read() and write(). The following program creates a file and writes some data into it. Here is the code..
int main()
{
int fd;
open("student",O_CREAT,(mode_t)0600);
fd=open("student",O_WRONLY);
char data[128]="Hi nikhil, How are u?";
write(fd,data,128);
}
Upon executing the above program, I got a file named student with a size of 128 bytes.
I wrote a small read program to read data from the file. Here is the code.
int main()
{
int fd=open("student",O_WRONLY);
char data[128];
read(fd,data,128);
cout<<(char*)data<<endl;
}
But the output I get is junk characters. Why is this so?
Don't read from a file that you've opened in O_WRONLY mode!
Do yourself a favor and always check the return values of IO functions.
You should also always close file descriptors you've (successfully) opened. Might not matter for trivial code like this, but if you get into the habit of forgetting that, you'll end up writing code that leaks file descriptors, and that's a bad thing.
You're not checking whether read() returns an error. You should do so, because that's probably the case with the code in your question.
Since you're opening the file write-only in the first place, calling read() on it will result in an error. You should open the file for reading instead:
char data[128];
int fd = open("student", O_RDONLY);
if (fd != -1) {
if (read(fd, data, sizeof(data)) != -1) {
// Process data...
}
close(fd);
}
Well, one of the first things is that your data is not 128 bytes. Your data is the string "Hi nikhil, How are u?", which is only 21 bytes, yet you're writing all 128 bytes of the array to the file. The array elements after the string are zero-initialized, so the file ends up padded with null bytes; pass the string's actual length to write() instead. The junk itself comes from the reading program, whose data array is uninitialized and whose read() fails.
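The advice in these answers can be combined into a short sketch, assuming POSIX and two hypothetical helpers, `write_message` and `read_message`. The writer creates the file and opens it for writing in a single `open` call and writes only the characters of the string; the reader opens with `O_RDONLY`; both check every return value and close the descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Writes the characters of `message` (not a fixed 128 bytes) to `path`.
bool write_message(const char* path, const char* message) {
    assert(path != nullptr && message != nullptr);
    const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return false;
    const std::size_t len = std::strlen(message);
    const ssize_t written = write(fd, message, len);
    close(fd);
    return written == static_cast<ssize_t>(len);
}

// Reads up to 127 bytes back. Returns an empty string on any error.
std::string read_message(const char* path) {
    assert(path != nullptr);
    const int fd = open(path, O_RDONLY);  // read access, unlike O_WRONLY
    if (fd == -1) return std::string();
    char data[128];
    const ssize_t n = read(fd, data, sizeof(data) - 1);
    close(fd);
    if (n < 0) return std::string();  // read failed
    return std::string(data, static_cast<std::size_t>(n));
}
```

Building the `std::string` from the byte count returned by `read` avoids relying on a terminating null inside the buffer.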

fseek now supports large files

It appears that fseek now, at least in my implementation, supports large files naturally without fseek64, lseek or some strange compiler macro.
When did this happen?
#include <cstdio>
#include <cstdlib>
void writeF(const char*fname,size_t nItems){
FILE *fp=NULL;
if(NULL==(fp=fopen(fname,"w"))){
fprintf(stderr,"\t-> problems opening file:%s\n",fname);
exit(0);
}
for(size_t i=0;i<nItems;i++)
fwrite(&i,sizeof(size_t),1,fp);
fclose(fp);
}
void getIt(const char *fname,size_t offset,int whence,int nItems){
size_t ary[nItems];
FILE *fp = fopen(fname,"r");
fseek(fp,offset*sizeof(size_t),whence);
fread(ary,sizeof(size_t),nItems,fp);
for(int i=0;i<nItems;i++)
fprintf(stderr,"%lu\n",ary[i]);
fclose(fp);
}
int main(){
const char * fname = "temp.bin";
writeF(fname,1000000000);//writefile
getIt(fname,999999990,SEEK_SET,10);//get last 10 seek from start
getIt(fname,-10,SEEK_END,10);//get last 10 seek from end
return 0;
}
The code above writes a big file containing the values 0 to 10^9 - 1 in binary size_t format.
It then prints the last 10 entries twice: once seeking from the beginning of the file, and once seeking from the end of the file.
Linux x86-64 has had large file support (LFS) from pretty much day one, and doesn't require any special macros etc. to enable it: both traditional fseek() and LFS fseek64() already use a 64-bit off_t.
Linux i386 (32bit) typically defaults to 32-bit off_t as otherwise it would break a huge number of applications - but you can test what is defined in your environment by checking the value of the _FILE_OFFSET_BITS macro.
See http://www.suse.de/~aj/linux_lfs.html for full details on Linux large file support.
The signature is
int fseek ( FILE * stream, long int offset, int origin );
so the range depends on the size of long.
On some systems it is 32-bit, and you have a problem with large files, on other systems it is 64-bit.
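Since the usable range depends on `long`, one defensive option is to check the byte offset before calling `fseek`. The sketch below assumes two hypothetical helpers, `offset_fits_in_long` and `seek_absolute`. Note that the question's offset of 999999990 entries becomes roughly 8 * 10^9 bytes after multiplying by an 8-byte `sizeof(size_t)`, which does not fit in a 32-bit `long`.

```cpp
#include <cassert>
#include <cstdio>
#include <limits>

// True if `byte_offset` can be passed to std::fseek without overflowing long.
bool offset_fits_in_long(unsigned long long byte_offset) {
    return byte_offset <=
           static_cast<unsigned long long>(std::numeric_limits<long>::max());
}

// Seeks to an absolute byte offset, refusing offsets long cannot represent.
bool seek_absolute(std::FILE* stream, unsigned long long byte_offset) {
    assert(stream != nullptr);
    if (!offset_fits_in_long(byte_offset)) return false;
    return std::fseek(stream, static_cast<long>(byte_offset), SEEK_SET) == 0;
}
```

Where `long` is too small, platform-specific alternatives such as POSIX `fseeko` (with a 64-bit `off_t`) or `_fseeki64` in the Microsoft C runtime take a wider offset type.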
999999990 is a normal int and fits perfectly into 32 bits. I don't believe that you'd get away with this though:
getIt(fname,99999999990LL,SEEK_SET,10);

Reading from and writing to the middle of a binary file in C/C++

If I have a large binary file (say it has 100,000,000 floats), is there a way in C (or C++) to open the file and read a specific float, without having to load the whole file into memory (i.e. how can I quickly find what the 62,821,214th float is)? A second question, is there a way to change that specific float in the file without having to rewrite the entire file?
I'm envisioning functions like:
float readFloatFromFile(const char* fileName, int idx) {
FILE* f = fopen(fileName,"rb");
// What goes here?
}
void writeFloatToFile(const char* fileName, int idx, float f) {
// How do I open the file? fopen can only append or start a new file, right?
// What goes here?
}
You know the size of a float is sizeof(float), so multiplication can get you to the correct position:
FILE *f = fopen(fileName, "rb");
fseek(f, idx * sizeof(float), SEEK_SET);
float result;
fread(&result, sizeof(float), 1, f);
Similarly, you can write to a specific position using this method.
fopen lets you open a file for modification (not just for appending) by using the rb+ mode; wb+ also opens the file for update, but it truncates the file first. See here: http://www.cplusplus.com/reference/clibrary/cstdio/fopen/
To position the file at a specific float, you can use fseek with index*sizeof(float) as the offset and SEEK_SET as the origin. See here: http://www.cplusplus.com/reference/clibrary/cstdio/fseek/
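As a sketch of how the two functions from the question could be filled in along these lines (the names `read_float_at` and `write_float_at` and the `bool` return values are my own assumptions, chosen so that failures are visible to the caller):

```cpp
#include <cassert>
#include <cstdio>

// Reads the idx-th float from the file. Returns false on any failure.
bool read_float_at(const char* file_name, long idx, float& value) {
    assert(file_name != nullptr && idx >= 0);
    std::FILE* f = std::fopen(file_name, "rb");
    if (f == nullptr) return false;
    const long offset = idx * static_cast<long>(sizeof(float));
    const bool ok = std::fseek(f, offset, SEEK_SET) == 0 &&
                    std::fread(&value, sizeof(float), 1, f) == 1;
    std::fclose(f);
    return ok;
}

// Overwrites the idx-th float in place without rewriting the rest of the file.
bool write_float_at(const char* file_name, long idx, float value) {
    assert(file_name != nullptr && idx >= 0);
    // "rb+" opens an existing file for update; "wb+" would truncate it.
    std::FILE* f = std::fopen(file_name, "rb+");
    if (f == nullptr) return false;
    const long offset = idx * static_cast<long>(sizeof(float));
    const bool ok = std::fseek(f, offset, SEEK_SET) == 0 &&
                    std::fwrite(&value, sizeof(float), 1, f) == 1;
    const bool closed = std::fclose(f) == 0;  // fclose flushes buffered data
    return ok && closed;
}
```

On platforms where `long` is 32 bits, the offset computation limits this sketch to files smaller than 2 GB, which is enough for 100,000,000 floats (400 MB).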
Here is an example if you would like to use C++ streams:
#include <fstream>
using namespace std;
int main()
{
    // in | out is needed to open an existing file for both reading and writing
    fstream file("floats.bin", ios::in | ios::out | ios::binary);
    if (!file) return 1; // the file could not be opened
    float number;
    file.seekg(62821214 * sizeof(float), ios::beg); // position the read pointer
    file.read(reinterpret_cast<char*>(&number), sizeof(float));
    file.seekp(0, ios::beg); // move to the beginning of the file
    number = 3.2f;
    // write number at the beginning of the file
    file.write(reinterpret_cast<const char*>(&number), sizeof(float));
}
One way would be to call mmap() on the file. Once you've done that, you can read/modify the file as if it was an in-memory array.
Of course that method only works if the file is small enough to fit in your process's address space... if you're running in 64-bit mode, you'll be fine; in 32-bit mode, a file with 100,000,000 floats should fit, but another order or two of magnitude above that and you might run into trouble.
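A minimal sketch of the mmap() approach, assuming a POSIX system and a hypothetical helper `replace_float_mmap`. It maps the whole file as shared and writable, treats the mapping as an array of floats, overwrites one element, and hands back the previous value.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Overwrites the idx-th float through a shared mapping of the file and
// stores the old value in `previous`. Returns false on any failure.
bool replace_float_mmap(const char* path, std::size_t idx, float new_value,
                        float& previous) {
    assert(path != nullptr);
    const int fd = open(path, O_RDWR);
    if (fd == -1) return false;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return false;
    }
    const std::size_t length = static_cast<std::size_t>(st.st_size);
    if (idx >= length / sizeof(float)) {  // index past the end of the file
        close(fd);
        return false;
    }

    void* mapping = mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (mapping == MAP_FAILED) return false;

    float* floats = static_cast<float*>(mapping);
    previous = floats[idx];
    floats[idx] = new_value;  // MAP_SHARED: the change reaches the file

    const bool synced = msync(mapping, length, MS_SYNC) == 0;
    munmap(mapping, length);
    return synced;
}
```

Mapping the entire file keeps the sketch short; for very large files, mapping only the page range around the target offset would use less address space.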
I know this question has been answered already, but Linux/Unix provides simple system calls, pread() and pwrite(), for reading and writing in the middle of a file. If you look at the kernel source code for the read and pread system calls, both eventually call vfs_read(), and vfs_read() requires an offset, i.e. a position in the file to read from. With pread() the caller supplies this offset; with read() the offset is maintained internally by the kernel for the open file. Because pread() does not depend on or change that shared offset, it saves the separate seek call, and multiple threads can read or write different parts of the file through the same file descriptor simultaneously. In my humble opinion, prefer pread() over read() for this kind of random access. Hopefully the file stream libraries wrap the read() calls so that streams still perform well by making fewer system calls.
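#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main()
{
    const long idx = 62821214; // index of the float to read (value from the question)
    const off_t offToStart = static_cast<off_t>(idx) * static_cast<off_t>(sizeof(float));
    float value = 0.0f; // destination buffer for the float
    int fd = open("fileName", O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }
    // pread reads at the given offset without moving the descriptor's file position
    ssize_t ret = pread(fd, &value, sizeof(value), offToStart);
    if (ret == static_cast<ssize_t>(sizeof(value)))
        printf("%f\n", value); // process the value that was read
    close(fd);
}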