mmap open and read from file - c++

I am mapping a huge file to avoid my app thrashing to main virtual memory, and to be able to run the app with more than the RAM I have. The code is C++ but partly follows old C APIs. When I work with the allocated pointer, the memory does get backed to the file as desired. However, when I run the app the next time, I want the memory to be read from this same file, which already has the prepared data. For some reason, on the next run, I read back all zeros. What am I doing wrong? Is it the ftruncate call? Is it the fopen call with the wrong flag? Is it the mmap flags?
int64_t mmbytes = int64_t{1} << 36;  // 64 GiB; a plain int 1<<36 would overflow
FILE *file = fopen(filename, "w+");
if (file == NULL) {
perror("fopen");
throw std::runtime_error(std::strerror(errno));
}
int fd = fileno(file);
if (ftruncate(fd, mmbytes) != 0) {
perror("ftruncate");
throw std::runtime_error(std::strerror(errno));
}
if ((mm = mmap(0, mmbytes,
PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, 0)) == MAP_FAILED)
{
fprintf(stderr,"mmap error for output, errno %d\n", errno);
exit(-1);
}
}

FILE *file = fopen(filename, "w+");
I refer you to fopen's manual page, which describes "w+" as follows:
w+ Open for reading and writing. The file is created if it does
not exist, otherwise it is truncated. The stream is positioned
at the beginning of the file.
I specifically draw your attention to the "it is truncated" part. In other words, if there's anything in an existing file this ends up nuking it from high orbit.
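Depending on what else you're doing, "a+" will work better. (Plain "a" opens the file write-only, and mmap rejects a shared read/write mapping of a descriptor that is not open for both reading and writing.)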
Even better would be to forget fopen entirely, and simply use open:
int fd=open(filename, O_RDWR|O_CREAT, 0666);
There's your file descriptor, without jumping through any hoops. The file gets created, and left untouched if it already exists.
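A minimal sketch of the whole open-and-map step along these lines, assuming a POSIX system; the helper name `map_persistent` is hypothetical. The point is that nothing in it truncates: the file is created if missing, grown with ftruncate only when it is shorter than requested (the new tail reads as zeros), and otherwise mapped as-is, so a second run sees what the first run wrote.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps `path` read/write with MAP_SHARED.
// Creates the file if needed; grows it (never shrinks or truncates it)
// when it is shorter than `bytes`. Existing contents are preserved.
void *map_persistent(const char *path, std::size_t bytes) {
    int fd = open(path, O_RDWR | O_CREAT, 0666);  // note: no O_TRUNC
    if (fd < 0)
        throw std::runtime_error(std::string("open: ") + std::strerror(errno));

    struct stat st;
    if (fstat(fd, &st) != 0 ||
        (static_cast<std::size_t>(st.st_size) < bytes &&
         ftruncate(fd, static_cast<off_t>(bytes)) != 0)) {
        int err = errno;
        close(fd);
        throw std::runtime_error(std::string("fstat/ftruncate: ") + std::strerror(err));
    }

    void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int err = errno;
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED)
        throw std::runtime_error(std::string("mmap: ") + std::strerror(err));
    return p;
}
```

Calling it twice with the same path, once per "run", returns a mapping that already holds the earlier data. If the data must survive a crash rather than just a clean exit, an msync(p, bytes, MS_SYNC) before munmap is the usual addition.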

Related

Reading a file and saving the same exact file c++

I am actually writing a C++ program that reads any kind of file and saves it as a BMP file, but first I need to read the file, and that's where the issue is:
char fileName[] = "test.jpg";
FILE * inFileForGettingSize;//This is for getting the file size
fopen_s(&inFileForGettingSize, fileName, "r");
fseek(inFileForGettingSize, 0L, SEEK_END);
int fileSize = ftell(inFileForGettingSize);
fclose(inFileForGettingSize);
ifstream inFile;//This is for reading the file
inFile.open(fileName);
if (inFile.fail()) {
cerr << "Error Opening File" << endl;
}
char * data = new char[fileSize];
inFile.read(data, fileSize);
ofstream outFile;//Writing the file back again
outFile.open("out.jpg");
outFile.write(data, fileSize);
outFile.close();
cin.get();
But when I read the file, let's say it's a plain-text file, it always outputs some weird characters at the end, for example:
assdassaasd
sdaasddsa
sdadsa
passes to:
assdassaasd
sdaasddsa
sdadsaÍÍÍ
So when I do this with a JPG, EXE, etc., it corrupts the file.
I am not trying to COPY a file (I know there are other ways to do that); I'm just trying to read a complete file byte by byte. Thanks.
EDIT:
I found out that the number of 'Í' characters equals the number of line endings in the file, but this doesn't help me much.
This is caused by newline handling.
You open the files in text mode (because you use "r" instead of "rb" for fopen and because you don't pass ios::binary to your fstream open calls), and on Windows, text mode translates "\r\n" pairs to "\n" on reading and back to "\r\n" when writing. The result is that the in-memory size is shorter than the on-disk size, so read() fills only the beginning of your array; when you then write fileSize bytes, the uninitialized tail of the array goes out with it. (In an MSVC debug build, memory returned by new is filled with 0xCD, which displays as 'Í', one byte for each line ending that was collapsed.)
You need to open files in binary mode when working with binary data:
fopen_s(&inFileForGettingSize, fileName, "rb");
inFile.open(fileName, ios::binary);
outFile.open("out.jpg", ios::binary);
For future reference, your copy routine could be improved. Mixing FILE* I/O with iostream I/O feels awkward, and opening and closing the file twice is extra work. Most importantly, if your routine is ever run on a large enough file, it will exhaust memory trying to load the entire file into RAM. Copying a block at a time would be better:
const int BUFFER_SIZE = 65536;
char buffer[BUFFER_SIZE];
while (source.good()) {
source.read(buffer, BUFFER_SIZE);
dest.write(buffer, source.gcount());
}
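The same loop wrapped into a complete function, as a sketch; the name `copy_file_blocks` and the use of std::vector for the buffer are my own choices, not part of the answer. Both streams are opened in binary mode, so no newline translation happens on Windows.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: copies `from` to `to` one 64 KiB block at a time.
// Returns true if the whole file was read and written without error.
bool copy_file_blocks(const std::string &from, const std::string &to) {
    std::ifstream source(from, std::ios::binary);
    std::ofstream dest(to, std::ios::binary | std::ios::trunc);
    if (!source || !dest)
        return false;

    std::vector<char> buffer(65536);  // 64 KiB blocks, as in the loop above
    while (source.good()) {
        source.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() is how many bytes the last read() actually got; it is
        // smaller than the buffer on the final block (and may be 0).
        dest.write(buffer.data(), source.gcount());
    }
    dest.close();
    // Reaching EOF sets eofbit and failbit, but only a real error sets badbit.
    return !source.bad() && !dest.fail();
}
```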
It's a binary file, so you need to read and write the file as binary; otherwise it's treated as text, and assumed to have newlines that need translation.
In your call to fopen(), you need to add the "b" designator:
fopen_s(&inFileForGettingSize, fileName, "rb");
And in your fstream::open calls, you need to add std::fstream::binary:
inFile.open(fileName, std::fstream::binary);
// ...
outFile.open("out.jpg", std::fstream::binary);

Size error on read file

RESOLVED
I'm trying to make a simple file loader.
I aim to get the text from a shader file (plain text file) into a char* that I will compile later.
I've tried this function:
char* load_shader(char* pURL)
{
FILE *shaderFile;
char* pShader;
// File opening
fopen_s( &shaderFile, pURL, "r" );
if ( shaderFile == NULL )
return "FILE_ER";
// File size
fseek (shaderFile , 0 , SEEK_END);
int lSize = ftell (shaderFile);
rewind (shaderFile);
// Allocating size to store the content
pShader = (char*) malloc (sizeof(char) * lSize);
if (pShader == NULL)
{
fputs ("Memory error", stderr);
return "MEM_ER";
}
// copy the file into the buffer:
int result = fread (pShader, sizeof(char), lSize, shaderFile);
if (result != lSize)
{
// size of file 106/113
cout << "size of file " << result << "/" << lSize << endl;
fputs ("Reading error", stderr);
return "READ_ER";
}
// Terminate
fclose (shaderFile);
return 0;
}
But as you can see in the code I have a strange size difference at the end of the process which makes my function crash.
I must say I'm quite a beginner in C, so I might have missed some subtleties regarding memory allocation, types, pointers...
How can I solve this size issue?
EDIT 1:
First, I shouldn't return 0 at the end but pShader; that seemed to be what crashed the program.
Then, I changed the type of result to size_t and added an end character to pShader, adding pShader[result] = '\0'; after the fread call so I can display it correctly (the buffer has to be allocated with lSize + 1 bytes for this to stay in bounds when the whole file is read).
Finally, as @JamesKanze suggested, I turned fopen_s into fopen, as the former was not useful in my case.
First, for this sort of raw access, you're probably better off
using the system level functions: CreateFile or open,
ReadFile or read and CloseHandle or close, with
GetFileSize or stat to get the size. Using FILE* or
std::filebuf will only introduce an additional level of
buffering and processing, for no gain in your case.
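A sketch of the system-level route on a POSIX system (open, fstat, read, close); the Windows counterpart would use CreateFile, GetFileSizeEx, ReadFile and CloseHandle and is not shown. The helper name `read_whole_file` is hypothetical. There is no text-mode translation at this level on POSIX, so the byte count from fstat matches what read() delivers.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: reads all of `path` into `out` without stdio.
// Returns false if the file cannot be opened or read in full.
bool read_whole_file(const char *path, std::string &out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return false;
    }
    out.assign(static_cast<std::size_t>(st.st_size), '\0');
    std::size_t done = 0;
    // read() may return fewer bytes than asked for, so loop until done.
    while (done < out.size()) {
        ssize_t n = read(fd, &out[done], out.size() - done);
        if (n < 0 && errno == EINTR)
            continue;  // interrupted by a signal: retry
        if (n <= 0)
            break;     // read error, or the file shrank since fstat
        done += static_cast<std::size_t>(n);
    }
    close(fd);
    out.resize(done);
    return done == static_cast<std::size_t>(st.st_size);
}
```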
As to what you are seeing: there is no guarantee that an ftell
will return anything exploitable as a numeric value; it could
very well be just a magic cookie. On most current systems, it
is a byte offset into the physical file, but on any non-Unix
system, the offset into the physical file will not map directly
to the logical file you are reading unless you open the file in
binary mode. If you use "rb" to open the file, you'll
probably see the same values. (Theoretically, you could get
extra 0's at the end of the file, but practically, the OS's
where that happened are either extinct, or only used on legacy
mainframes.)
EDIT:
Since the answer stating this has been deleted: you should loop
on the fread until it returns 0, then call ferror on the
stream to see whether the function returned because of an error
or because it reached the end of file (ISO C does not require
fread to set errno, so errno alone is not a reliable test).
Having said this: if you're on one of the usual
Windows or Unix systems, and the file is local to the machine,
and not too big, fread will read it all in one go. The
difference in size you are seeing (given the numerical values
you posted) is almost certainly due to the fact that the two
byte Windows line endings are being mapped to a single '\n'
character. To avoid this, you must open in binary mode;
alternatively, if you really are dealing with text (and want
this mapping), you can just ignore the extra bytes in your
buffer, setting the '\0' terminator after the last byte
actually read.
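A sketch of that loop for the shader-loading case, assuming the file is opened in binary mode so the byte count is not affected by "\r\n" translation. The name `load_file` is hypothetical, and returning a std::string is my substitution for the question's malloc'd char*: its c_str() is always '\0'-terminated, which covers the terminator problem from the question's edit.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: reads all of `path` into `out` by looping on fread
// until it returns 0, then uses ferror() to tell a read error from EOF.
bool load_file(const char *path, std::string &out) {
    std::FILE *f = std::fopen(path, "rb");  // binary: no newline translation
    if (f == nullptr)
        return false;
    out.clear();
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        out.append(buf, n);
    bool ok = !std::ferror(f);  // 0 from fread means EOF or error; check which
    std::fclose(f);
    return ok;
}
```

Usage would be along the lines of `std::string src; if (load_file(pURL, src)) compile(src.c_str());`, where `compile` stands in for whatever shader compiler call is used later.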

"fastfwd" file that can be pipe/socket/fifo

My function gets a FILE* to read from, and it needs to read starting from some non-negative offset.
I could use fseek(file, offset, SEEK_SET), but it fails on stdin, for instance.
How can I determine if fseek works? If it doesn't, I could read and discard offset bytes.
And is there a way to read (and discard) from FILE without allocating read buffer?
You can test whether fseek works on that stream by calling fseek(file, offset, SEEK_SET) and, on error, checking whether errno == ESPIPE, which POSIX uses to say that the underlying file descriptor is a pipe, FIFO or socket (strerror reports it as "Illegal seek").
I think you need to read and discard with a buffer, but the buffer can be just a page in size: keep a count of the bytes read and keep reading until you have done the equivalent of the seek. If it were a memory-mappable file, you could skip ahead without reading, but then the seek would have worked anyway.
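A sketch combining both ideas, assuming a POSIX system where a non-seekable stream fails fseek with ESPIPE. The name `skip_forward` is hypothetical, and it skips relative to the current position (SEEK_CUR), which is the same as SEEK_SET on a stream nothing has been read from yet. The discard buffer lives on the stack, so nothing is heap-allocated.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>

// Hypothetical helper: advances `fp` by `offset` bytes. Tries fseek first;
// if the stream is not seekable (ESPIPE), reads and discards instead.
// Returns 0 on success, -1 on error or if EOF arrives before `offset` bytes.
int skip_forward(std::FILE *fp, long offset) {
    errno = 0;
    if (std::fseek(fp, offset, SEEK_CUR) == 0)
        return 0;
    if (errno != ESPIPE)
        return -1;  // some other failure: do not guess
    char buf[4096];  // one page-sized scratch buffer on the stack
    while (offset > 0) {
        std::size_t want = offset < static_cast<long>(sizeof buf)
                               ? static_cast<std::size_t>(offset)
                               : sizeof buf;
        std::size_t got = std::fread(buf, 1, want, fp);
        if (got == 0)
            return -1;  // EOF or read error before reaching the target
        offset -= static_cast<long>(got);
    }
    return 0;
}
```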
The return value of fseek() tells you if it worked or not:
Return Value
...Upon successful completion, fgetpos(), fseek(), fsetpos() return 0, ...Otherwise, -1 is returned and errno is set to indicate the error.
So attempt to fseek() on the file, check the return result, and handle your failure case accordingly. Ex:
ret = fseek(stdin, 0, SEEK_SET);
if(ret < 0)
printf("failed because: %s\n", strerror(errno));
will give you something like:
failed because: Illegal seek
So that failed because you can't seek on stdin (at least when it is a terminal or a pipe; a stdin redirected from a regular file is seekable), whereas:
FILE * fp = fopen("word.txt", "r");
ret = fseek(fp, 0, SEEK_SET);
if(ret < 0)
printf("failed because: %s\n", strerror(errno));
Wouldn't print anything because you got back 0 indicating success (assuming of course that "word.txt" exists, is readable, was opened successfully, etc).
I don't understand this part of your question:
is there a way to read (and discard) from FILE without allocating read buffer
You can just fseek() to the point you want to read, or you can read into a buffer and let the next read overwrite its contents. The answer depends on your goals, but functions like fread() or read() require a non-null pointer to store data into.

Faster way to get File Size information C++

I have a function to get the size of a file. I am running this on WinCE. Here is my current code, which seems particularly slow:
int Directory::GetFileSize(const std::string &filepath)
{
int filesize = -1;
#ifdef __linux__
struct stat fileStats;
if(stat(filepath.c_str(), &fileStats) != -1)
filesize = fileStats.st_size;
#else
std::wstring widePath;
Unicode::AnsiToUnicode(widePath, filepath);
HANDLE hFile = CreateFile(widePath.c_str(), 0, FILE_SHARE_READ | FILE_SHARE_WRITE, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
// CreateFile signals failure with INVALID_HANDLE_VALUE, not 0
if (hFile != INVALID_HANDLE_VALUE)
{
filesize = ::GetFileSize( hFile, NULL);
CloseHandle(hFile);
}
#endif
return filesize;
}
At least for Windows, I think I'd use something like this:
__int64 Directory::GetFileSize(std::wstring const &path) {
WIN32_FIND_DATAW data;
HANDLE h = FindFirstFileW(path.c_str(), &data);
if (h == INVALID_HANDLE_VALUE)
return -1;
FindClose(h);
return data.nFileSizeLow | (__int64)data.nFileSizeHigh << 32;
}
If the compiler you're using supports it, you might want to use long long instead of __int64. You probably do not want to use int though, as that will only work correctly for files up to 2 gigabytes, and files larger than that are now pretty common (though perhaps not so common on a WinCE device).
I'd expect this to be faster than most other methods though. It doesn't require opening the file itself at all, just finding the file's directory entry (or, in the case of something like NTFS, its master file table entry).
Your solution is already rather fast to query the size of a file.
Under Windows, at least for NTFS and FAT, the file system driver will keep the file size in the cache, so it is rather fast to query it. The most time-consuming work involved is switching from user-mode to kernel-mode, rather than the file system driver's processing.
If you want to make it even faster, you would have to implement your own caching policy in user mode, e.g. a special hash table, to avoid switching from user mode to kernel mode. But I don't recommend doing that, because the performance gain would be small.
PS: You'd better avoid the call Unicode::AnsiToUnicode(widePath, filepath); in your function body. This function is rather time-consuming.
Just an idea (I haven't tested it), but I would expect
GetFileAttributesEx to be fastest at the system level. It
avoids having to open the file, and logically, I would expect it
to be faster than FindFirstFile, since it doesn't have to
maintain any information for continuing the search.
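A sketch of the GetFileAttributesEx idea, untested on Windows CE; the function name `file_size` is hypothetical. Following the question's own #ifdef structure, the non-Windows branch uses stat(), which likewise reads only metadata without opening the file. Both branches return a 64-bit value so files over 2 GB are reported correctly.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
// Windows: one call, no file handle is opened and no search state is kept.
long long file_size(const std::wstring &path) {
    WIN32_FILE_ATTRIBUTE_DATA info;
    if (!GetFileAttributesExW(path.c_str(), GetFileExInfoStandard, &info))
        return -1;
    return (static_cast<long long>(info.nFileSizeHigh) << 32) | info.nFileSizeLow;
}
#else
#include <sys/stat.h>
// POSIX: stat() reads the size from the file's metadata.
long long file_size(const std::string &path) {
    struct stat st;
    if (stat(path.c_str(), &st) != 0)
        return -1;
    return static_cast<long long>(st.st_size);
}
#endif
```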
You could roll your own but I don't see why your approach is slow:
long Get_Size( const string &path )
{
// #include <cstdio>
FILE *pFile = NULL;
// get the file stream; fopen_s returns nonzero on failure
if ( fopen_s( &pFile, path.c_str(), "rb" ) != 0 )
return -1;
// set the file pointer to end of file
fseek( pFile, 0, SEEK_END );
// get the file size (ftell returns long)
long Size = ftell( pFile );
// return the file pointer to begin of file if you want to read it
// rewind( pFile );
// close stream and release buffer
fclose( pFile );
return Size;
}

Why is calling close() after fopen() not closing?

I ran across the following code in one of our in-house dlls and I am trying to understand the behavior it was showing:
long GetFD(long* fd, const char* fileName, const char* mode)
{
string fileMode;
if (strlen(mode) == 0 || tolower(mode[0]) == 'w' || tolower(mode[0]) == 'o')
fileMode = string("w");
else if (tolower(mode[0]) == 'a')
fileMode = string("a");
else if (tolower(mode[0]) == 'r')
fileMode = string("r");
else
return -1;
FILE* ofp;
ofp = fopen(fileName, fileMode.c_str());
if (! ofp)
return -1;
*fd = (long)_fileno(ofp);
if (*fd < 0)
return -1;
return 0;
}
long CloseFD(long fd)
{
close((int)fd);
return 0;
}
After repeated calling of GetFD with the appropriate CloseFD, the whole dll would no longer be able to do any file IO. I wrote a tester program and found that I could GetFD 509 times, but the 510th time would error.
Using Process Explorer, the number of Handles did not increase.
So it seems that the dll is reaching the limit on the number of open files; calling _setmaxstdio(2048) does increase the number of times we can call GetFD. Obviously, the close() isn't working quite right.
After a bit of searching, I replaced the fopen() call with:
long GetFD(long* fd, const char* fileName, const char* mode)
{
*fd = (long)open(fileName, O_RDWR);  // O_RDWR (value 2): open for reading and writing
if (*fd < 0)
return -1;
return 0;
}
Now, repeatedly calling GetFD/CloseFD works.
What is going on here?
If you open a file with fopen, you have to close it with fclose, symmetrically.
The C runtime must be given a chance to clean up and deallocate its internal file-related structures.
You need to use fclose with files opened via fopen, or close with files opened via open.
The standard library you are using has a static array of FILE structures. Because you are not calling fclose(), the standard library doesn't know that the underlying files have been closed, so it doesn't know it can reuse the corresponding FILE structures. You get an error after it has run out of entries in the FILE array.
fopen opens its own file descriptor and wraps it in a FILE structure, so you'd need to call fclose(ofp) rather than close() to release both; otherwise you eventually run out of FILE slots. Usually, one either uses the lower-level file descriptor functions open and close, OR the buffered fopen and fclose functions.
You opened the file with fopen(), so you have to close it with fclose(). Conversely, if you open a file with open() and then call fclose() on it, that will not work either.
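A sketch of a GetFD/CloseFD pair that stays at the descriptor level throughout, so every open() is matched by a close() and no FILE structure is leaked. Unlike the `open(fileName, 2)` replacement in the question, it keeps the original mode semantics by mapping the mode letters to the flags fopen would use. It assumes POSIX headers; with MSVC the same calls live in <io.h> (as _open/_close). CloseFD returning -1 on failure is my addition.

```cpp
#include <cassert>
#include <cctype>
#include <fcntl.h>
#include <unistd.h>

// Mode letters as in the original GetFD: "", "w", "o" -> write (create,
// truncate); "a" -> append (create); "r" -> read. Anything else fails.
long GetFD(long *fd, const char *fileName, const char *mode) {
    int flags;
    char m = static_cast<char>(std::tolower(static_cast<unsigned char>(mode[0])));
    if (m == '\0' || m == 'w' || m == 'o')
        flags = O_WRONLY | O_CREAT | O_TRUNC;   // like fopen(..., "w")
    else if (m == 'a')
        flags = O_WRONLY | O_CREAT | O_APPEND;  // like fopen(..., "a")
    else if (m == 'r')
        flags = O_RDONLY;                       // like fopen(..., "r")
    else
        return -1;
    int d = open(fileName, flags, 0666);
    if (d < 0)
        return -1;
    *fd = d;
    return 0;
}

// Releases exactly what GetFD acquired: one descriptor, no FILE.
long CloseFD(long fd) {
    return close(static_cast<int>(fd)) == 0 ? 0 : -1;
}
```

Looping GetFD/CloseFD well past the old 509-call limit should then succeed, since nothing accumulates between calls.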