If I am reading a file in C++ like this:
//Begin to read a file
FILE *f = fopen("vids/18.dat", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *m_sendingStream = (char*)malloc(pos);
fread(m_sendingStream, pos, 1, f);
fclose(f);
//Finish reading a file
I have two questions. First, does this read the entire file? (That is what I want.) Second, how can I write a while loop that continues until it reaches the end of the file? I have:
while(i < sizeof(m_sendingStream))
but I am not sure whether this works. I've been reading around (I have never programmed in C++ before) and thought I could use eof(), but apparently that is bad practice.
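A loop is not necessary to read the file, since your code already gets the entire contents in one go. (To walk over the buffer afterwards, loop while i < pos: sizeof(m_sendingStream) is the size of the pointer, not of the data it points to.) You should still record and check the return value of fread, of course: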
size_t const n = fread(buf, pos /*bytes in a record*/, 1 /*max number of records to read*/, f);
if (n != 1) { /* error! */ }
You can also write a loop that reads until the end of the file without knowing the file size in advance (e.g. read from a pipe or growing file):
#define CHUNKSIZE 65536
char * buf = (char*)malloc(CHUNKSIZE); /* the cast is required in C++; check for NULL in real code */
{
size_t n = 0, r = 0;
while ((r = fread(buf + n, 1 /*bytes in a record*/, CHUNKSIZE /*max records*/, f)) != 0)
{
n += r;
char * tmp = (char*)realloc(buf, n + CHUNKSIZE); /* the cast is required in C++ */
if (tmp) { buf = tmp; }
else { /* big fatal error */ }
}
if (!feof(f))
{
perror("Error reading file");
}
}
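This is the C style of working with files; the C++ style would be to use the fstream library.
As for your second question, you can check whether you have reached the end of the file with the feof function. Note that feof only reports true after a read has already hit the end of the file, so test it after a short or failed read rather than using it as the loop condition.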
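For comparison, here is a minimal sketch of the fstream approach mentioned above. The helper name read_whole_file is hypothetical (it is not from the question), and the sketch assumes the file fits in memory and is a regular file whose size tellg can report.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): read an entire file into `out`.
// Returns false if the file cannot be opened or the read comes up short.
bool read_whole_file(const std::string& path, std::vector<char>& out)
{
    assert(!path.empty());
    // std::ios::ate opens the stream positioned at the end, so tellg() is the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;

    const std::streamoff size = in.tellg();
    if (size < 0) return false;

    out.resize(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);

    // read() sets failbit if fewer than `size` bytes were available.
    if (size > 0 && !in.read(out.data(), size)) return false;
    return true;
}
```

After a successful call, out.size() is the number of bytes read, so a loop over the data can use i < out.size() as its bound.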
I'm trying to use a sparse file to store a sparse array of data. Logically I thought the code had no bugs, but the unit tests keep failing. After many inspections of the code I decided to check the file content after every step and found that the holes were not being created. That is: writing the first element, seeking past x elements, and writing the second element ends up with the first element followed directly by the second element in the file, with no space at all between them.
My simplified code:
FILE* file = fopen64(fn.c_str(), "ar+b");
auto const entryPoint = 220; //calculated at runtime, the size of each element is 220 bytes
auto r = fseeko64(file, entryPoint, SEEK_SET);
if(r!=0){
std::cerr << "Error seeking file" << std::endl;
}
size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
if(written != 1) {
perror("error writing file");
}
fclose(file);
The offset is being calculated correctly. The test writes the first element, skips 20 elements, then writes the 22nd element. Inspecting the file with a hex dump shows two elements at offsets 0 and 220 (directly after the first element). The unit tests also fail because reading the 2nd element actually returns element number 22.
Could anyone explain what is wrong with my code? Maybe I misunderstood the concept of holes?
------Edit1------
Here's my full code
Read function:
FILE* file = fopen64(fn.c_str(), "r+b");
if(file == nullptr){
memset(page->entries, 0, sizeof(page->entries));
return ;
}
MoveCursor(file, id, sizeof(page->entries));
size_t read = fread(&page->entries[0], sizeof(page->entries), 1, file);
fclose(file);
if(read != 1){ //didn't read a full page.
memset(page->entries, 0, sizeof(page->entries));
}
Write function:
auto fn = dir.path().string() + std::filesystem::path::preferred_separator + GetFileId(page->pageId);
FILE* file = fopen64(fn.c_str(), "ar+b");
MoveCursor(file, page->pageId, sizeof(page->entries));
size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
if(written != 1) {
perror("error writing file");
}
fclose(file);
void MoveCursor(FILE* file, TPageId pid, size_t pageMultiplier){
auto const entryPoint = pid * pageMultiplier;
auto r = fseeko64(file, entryPoint, SEEK_SET);
if(r!=0){
std::cerr << "Error seeking file" << std::endl;
}
}
And here's a simplified page class:
template<typename TPageId, uint32_t EntriesCount>
struct PODPage {
bool dirtyBit = false;
TPageId pageId;
uint32_t entries[EntriesCount];
};
The reason I say it is an fseeko problem when writing is that inspecting the file content with xxd shows the data is out of order. Breakpoints in the MoveCursor function show that the offset is calculated correctly, and manual inspection of the FILE fields shows that the offset is set correctly; however, the write doesn't leave a hole.
=============Edit2============
Minimal reproducer, logic goes as: write first chunk of data, seek to position 900, write second chunk of data, then try to read from position 900 and compare to data that was supposed to be there. Each operation opens and closes file which is what happens in my original code, keeping a file open is not allowed.
Expected behavior is to create a hole in file, actual behavior is the file is written sequentially without holes.
#include <iostream>
#define _FILE_OFFSET_BITS 64
#define __USE_FILE_OFFSET64 1
#include <stdio.h>
#include <cstring>
int main() {
uint32_t data[10] = {1,2,3,4,5,6,7,8,9};
uint32_t data2[10] = {9,8,7,6,5,4,3,2,1};
{
FILE* file = fopen64("data", "ar+b");
if(fwrite(&data[0], sizeof(data), 1, file) !=1) {
perror("err1");
return 0;
}
fclose(file);
}
{
FILE* file = fopen64("data", "ar+b");
if (fseeko64(file, 900, SEEK_SET) != 0) {
perror("err2");
return 0;
}
if(fwrite(&data2[0], sizeof(data2), 1, file) !=1) {
perror("err3");
return 0;
}
fclose(file);
}
{
FILE* file = fopen64("data", "r+b");
if (fseeko64(file, 900, SEEK_SET) != 0) {
perror("err4");
return 0;
}
uint32_t data3[10] = {0};
if(fread(&data3[0], sizeof(data3), 1, file)!=1) {
perror("err5");
return 0;
}
fclose(file);
if (memcmp(&data2[0],&data3[0],sizeof(data))!=0) {
std::cerr << "err6";
return 0;
}
}
return 0;
}
I think your problem is the same as discussed here:
fseek does not work when file is opened in "a" (append) mode
Does fseek() move the file pointer to the beginning of the file if it was opened in "a+b" mode?
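Summary of the two above: if a file is opened for appending (using "a"), then fseek only applies to the read position, not to the write position. Every write goes to the end of the file.
You can fix this by opening the file with "w" or "w+" instead. Both worked for me with your minimal code example. Be aware that "w" and "w+" truncate an existing file, so the first chunk is discarded; the reproducer only checks the data at offset 900, which is why it still passes. To keep what is already in the file, open it with "r+b" (and fall back to "w+b" when the file does not exist yet).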
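A minimal sketch of writing at an arbitrary offset without append mode follows. The helper name write_at is hypothetical, and the sketch uses plain fseek with a long offset for brevity; for offsets beyond what long can hold you would keep fseeko64 as in the question.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not from the question): write `len` bytes at byte
// `offset`. "r+b" keeps the existing contents and honours fseek for writes;
// "w+b" is used only to create the file when it does not exist yet.
bool write_at(const char* path, long offset, const void* data, std::size_t len)
{
    assert(offset >= 0);
    std::FILE* f = std::fopen(path, "r+b");
    if (!f) f = std::fopen(path, "w+b");
    if (!f) return false;

    bool ok = std::fseek(f, offset, SEEK_SET) == 0
           && std::fwrite(data, 1, len, f) == len;

    // fclose flushes buffered data, so its result matters too.
    if (std::fclose(f) != 0) ok = false;
    return ok;
}
```

Seeking past the current end and then writing extends the file; whether the gap is stored as a real hole depends on the filesystem. The two fopen calls are not atomic, so if several processes may create the file at once, open() with O_RDWR | O_CREAT is the safer route.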
I am reading a collection file (20 or so small files in one) with fread_s, and the content is written into a struct. About 99% of the time it reads the data correctly, but always at the same position it seems to ignore the element size parameter and reads 500 or so bytes until it aborts and reports end of file (feof). What's more, it doesn't even write the last three bytes of the int to the struct.
When I remove the checks and let it continue reading, it reads normally again, as if nothing happened.
I observed that the _Placeholder variable in the file pointer changes to a different value and then back again, but I guess that's just the EOF flag being packed in there.
#pragma pack(push, 1)
struct fileHeader {
__int32 typeID;
bool isGFX;
char filename[8];
__int32 offset;
};
#pragma pack(pop)
#define HEADERSIZE 68
#define FILEHEADERSIZE 17
....
FILE *file;
fopen_s(&file, filename.c_str(), "r");
for (int i = 0; i < header.files - 1; i++) {
fseek(file, HEADERSIZE + i * FILEHEADERSIZE, 0);
fileHeader headerFile;
memset(&headerFile, 0, FILEHEADERSIZE);
int oldPointer = ftell(file); //118
int d = fread_s(&headerFile, FILEHEADERSIZE, FILEHEADERSIZE, 1, file); //returns 0
int newPointer = ftell(file); //630
int e = errno; //0
int ea = ferror(file); //0
int ef = feof(file); //1
//getting used here in a function
}
headerFile = {typeID=17 isGFX=true filename=0x00fefd05 "CURSORR" offset = 164} - offset should be 6820
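As Jonathan Leffler said in the comments, the mistake was that I didn't read in binary mode. Simply changing fopen_s(&file, filename.c_str(), "r"); to fopen_s(&file, filename.c_str(), "rb"); fixed the problem. The expected offset 6820 is 0x1AA4, stored little-endian as A4 1A 00 00. In text mode the Microsoft C runtime treats the byte 0x1A (Ctrl-Z) as end of file, so only 0xA4 (164) made it into the struct.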
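Here is a minimal, portable sketch of reading one such 17-byte record in binary mode. The names FileHeader, kFileHeaderSize and read_file_header are illustrative rather than taken from the original program, and the sketch assumes the file's byte order matches the host's. It copies the fields out of a raw byte buffer, so it does not depend on #pragma pack.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// On-disk layout from the question: 4 + 1 + 8 + 4 = 17 bytes.
const std::size_t kFileHeaderSize = 17;

struct FileHeader {
    std::int32_t typeID;
    bool         isGFX;
    char         filename[8];
    std::int32_t offset;
};

// Hypothetical helper: read the record that starts at byte `pos`.
bool read_file_header(const char* path, long pos, FileHeader& out)
{
    assert(pos >= 0);
    // "rb" disables newline translation and Ctrl-Z handling on Windows.
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;

    unsigned char raw[kFileHeaderSize];
    const bool ok = std::fseek(f, pos, SEEK_SET) == 0
                 && std::fread(raw, sizeof raw, 1, f) == 1;
    std::fclose(f);
    if (!ok) return false;

    // Copy field by field so struct padding cannot shift anything.
    std::memcpy(&out.typeID, raw, 4);
    out.isGFX = raw[4] != 0;
    std::memcpy(out.filename, raw + 5, 8);
    std::memcpy(&out.offset, raw + 13, 4);
    return true;
}
```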
I have a large file containing strings. I have to read this file and store it in a buffer using C or C++. I tried to do it as follows:
FILE* file = fopen(fileName.c_str(), "r");
assert(file != NULL);
size_t BUF_SIZE = 10 * 1024 * 1024;
char* buf = new char[BUF_SIZE];
string contents;
while (!feof(file))
{
int ret = fread(buf, BUF_SIZE, 1, file);
assert(ret != -1);
contents.append(buf);
}
The data in the file is strings, and I have to find the character with the maximum frequency.
Is it possible to optimize the code further? Would using BinaryReader improve performance? Could you share other approaches if you know any?
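One possible direction, as a sketch only: since the goal is just a frequency count, the contents do not need to be kept at all. Note also that fread(buf, BUF_SIZE, 1, file) returns the number of complete records, so a final partial chunk yields 0, and contents.append(buf) relies on a terminating NUL that fread never writes. The helper below (most_frequent_byte, a hypothetical name) reads with a record size of 1 so the return value is a byte count, and tallies each chunk as it arrives. It counts bytes, not multi-byte characters.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: return the most frequent byte value in the file,
// or -1 if the file cannot be read or is empty.
int most_frequent_byte(const char* path, std::size_t chunk_size = 1 << 20)
{
    assert(chunk_size > 0);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;

    std::vector<unsigned char> buf(chunk_size);
    std::array<unsigned long long, 256> counts{};  // zero-initialised

    std::size_t n;
    // With a record size of 1, fread returns the number of bytes read.
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
        for (std::size_t i = 0; i < n; ++i) ++counts[buf[i]];
    }
    const bool failed = std::ferror(f) != 0;
    std::fclose(f);
    if (failed) return -1;

    int best = -1;
    unsigned long long best_count = 0;
    for (int b = 0; b < 256; ++b) {
        if (counts[b] > best_count) { best_count = counts[b]; best = b; }
    }
    return best;
}
```

Memory use stays at one chunk regardless of file size; the chunk size itself is a tuning assumption, not a measured optimum.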
I am trying to create a file of a given size by using lseek() and writing a byte at the end of the file; however, it creates a sparse file with 0 bytes.
Below is the code...any suggestions?
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#ifndef BUF_SIZE
#define BUF_SIZE 1024
#endif // BUF_SIZE
int main(int argc, char *argv[])
{
int inputFd;
int fileSize = 500000000;
int openFlags;
int result;
mode_t filePerms;
ssize_t numRead;
char buf[BUF_SIZE];
openFlags = O_WRONLY | O_CREAT | O_EXCL;
filePerms = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; /* rw-rw-rw- */
inputFd = open(argv[1], openFlags, filePerms);
if (inputFd == -1)
printf("problem opening file %s ", argv[1]);
return 1;
printf ("input FD: %d", inputFd);
result = lseek(inputFd, fileSize-1, SEEK_SET);
if (result == -1){
close(inputFd);
printf("Error calling lseek() to stretch the file");
return 1;
}
result = write(inputFd, "", 1);
if (result < 0){
close(inputFd);
printf("Error writing a byte at the end of file\n");
return 1;
}
if (close(inputFd) == -1)
printf("problem closing file %s \n",argv[1]);
return 0;
}
You are missing some braces:
if (inputFd == -1)
printf("problem opening file %s ", argv[1]);
return 1;
You need to change this to:
if (inputFd == -1) {
printf("problem opening file %s ", argv[1]);
return 1;
}
Without the braces, the only statement controlled by the if statement is the printf, and the return 1; statement is always run no matter what the value of inputFd is.
It is good practice to always use braces around a controlled block, even if there is only one statement (such as for the close at the end of your program).
Do you have any example of writing a byte on every block of the file?
This code is from a slightly different context, but it can be adapted to your case. The context was ensuring that the disk space for an Informix database was all allocated, so the wrapper code around this created the file (which must not already exist, etc.). However, the entry point to the actual writing was the second of these two functions; the fill buffer function replicated the 8-byte word informix into a 64 KiB block.
/* Fill the given buffer with the string 'informix' repeatedly */
static void fill_buffer(char *buffer, size_t buflen)
{
size_t filled = sizeof("informix") - 1;
assert(buflen > filled);
memmove(buffer, "informix", sizeof("informix")-1);
while (filled < buflen)
{
size_t ncopy = (filled > buflen - filled) ? buflen - filled : filled;
memmove(&buffer[filled], buffer, ncopy);
filled *= 2;
}
}
/* Ensure the file is of the required size by writing to it */
static void write_file(int fd, size_t req_size)
{
char buffer[64*1024];
size_t nbytes = (req_size > sizeof(buffer)) ? sizeof(buffer) : req_size;
size_t filesize = 0;
fill_buffer(buffer, nbytes);
while (filesize < req_size)
{
size_t to_write = nbytes;
ssize_t written;
if (to_write > req_size - filesize)
to_write = req_size - filesize;
if ((written = write(fd, buffer, to_write)) != (ssize_t)to_write)
err_syserr("short write (%d vs %u requested)\n",
(int)written, (unsigned)to_write);
filesize += to_write;
}
}
As you can see, it writes in 64 KiB chunks. Frankly, there's going to be no difference between writing all bytes on a page and writing one byte per page. Indeed, if anything, writing the whole page will be faster because the new value can simply be written, whereas if you write just one byte per page, an old page has to be created/read, modified, and then written back.
In your context, I would extend the current file to a multiple of 4 KiB (8 KiB if you prefer), then go writing the main data blocks, and the final partial block if necessary. You would probably simply do memset(buffer, '\0', sizeof(buffer)); whereas the sample code was deliberately writing something other than blocks of zero bytes. AFAIK, even if the block you write is all zero bytes, the driver actually writes that block to the disk — the simple act of writing ensures the file is non-sparse.
The err_syserr() function is a bit like fprintf(stderr, …), but it adds the system error message from errno and strerror() and exits the program too. The code does assume 32-bit (or larger) int values. I never got to experiment with terabyte size files — the code was last updated in 2009.
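A simplified sketch of the zero-filled variant suggested above, written for POSIX. The name create_filled_file is hypothetical, it returns false instead of calling err_syserr(), and it uses O_TRUNC rather than the O_EXCL from the question so it can be re-run. It treats any failed write, including one interrupted by a signal, as an error.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: create `path` and write `size` zero bytes to it
// in 4 KiB blocks, so the blocks are written rather than left as a hole.
bool create_filled_file(const char* path, off_t size)
{
    assert(size >= 0);
    const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) return false;

    char block[4096];
    std::memset(block, 0, sizeof block);

    off_t done = 0;
    bool ok = true;
    while (ok && done < size) {
        size_t want = sizeof block;
        if (static_cast<off_t>(want) > size - done)
            want = static_cast<size_t>(size - done);  // final partial block

        const ssize_t n = write(fd, block, want);
        if (n <= 0) ok = false;  // error or no progress
        else done += n;          // a short write simply advances by n
    }
    if (close(fd) == -1) ok = false;
    return ok;
}
```

Whether the zero blocks end up physically allocated is up to the filesystem (one with transparent compression may still store them compactly), so comparing st_blocks from stat() against st_size is a reasonable way to check on a given system.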
I am trying to read files that are simultaneously written to disk. I need to read chunks of specific size. If the size read is less than the specific size, I'd like to unread the file (something like what ungetc does, instead for a char[]) and try again. Appending to the bytes read already is not an option for me.
How is this possible?
I tried saving the current position through:
FILE *fd = fopen("test.txt","r+");
fpos_t position;
fgetpos (fd, &position);
and then reading the file and putting the pointer back to its before-fread position.
numberOfBytes = fread(buff, sizeof(unsigned char), desiredSize, fd);
if (numberOfBytes < desiredSize) {
fsetpos (fd, &position);
}
But it doesn't seem to be working.
Replacing my previous suggestions with code I just checked (Ubuntu 12.04 LTS, 32-bit). GCC is 4.7, but I'm pretty sure this is a 100% standard solution.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define desiredSize 10
#define desiredLimit 100
int main()
{
FILE *fd = fopen("test.txt","r+");
if (fd == NULL)
{
perror("open");
exit(1);
}
int total = 0;
unsigned char buff[desiredSize];
while (total < desiredLimit)
{
fpos_t position;
fgetpos (fd, &position);
int numberOfBytes = fread(buff, sizeof(unsigned char), desiredSize, fd);
printf("Read try: %d\n", numberOfBytes);
if (numberOfBytes < desiredSize)
{
fsetpos(fd, &position);
printf("Return\n");
sleep(10);
continue;
}
total += numberOfBytes;
printf("Total: %d\n", total);
}
return 0;
}
I was adding text to the file from another console and, yes, the read progressed in 5-char blocks in step with what I was adding.
fseek seems perfect for this:
FILE *fptr = fopen("test.txt","r+");
numberOfBytes = fread(buff, 1, desiredSize, fptr);
if (numberOfBytes < desiredSize) {
fseek(fptr, -(long)numberOfBytes, SEEK_CUR); /* cast first: negating an unsigned size_t would wrap */
}
Also note that a file descriptor is what open returns, not fopen.