fseeko not creating a hole in file when seeking after EOF - c++

I'm trying to use a sparse file to store a sparse array of data. Logically I thought the code had no bugs, but the unit tests kept failing. After inspecting the code many times, I decided to check the file content after every step and found that the holes were not created. Writing the first element, seeking x elements ahead and writing the second element ends up writing the first element and then the second element with no space at all between them.
My simplified code:
FILE* file = fopen64(fn.c_str(), "ar+b");
auto const entryPoint = 220; //calculated at runtime, the size of each element is 220 bytes
auto r = fseeko64(file, entryPoint, SEEK_SET);
if(r!=0){
std::cerr << "Error seeking file" << std::endl;
}
size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
if(written != 1) {
perror("error writing file");
}
fclose(file);
The offset is being calculated correctly: the code writes the first element, skips 20 elements, then writes the 22nd element. But a hex dump of the file shows the two elements at offsets 0 and 220 (directly after the first element). The unit tests also fail because reading the 2nd element actually returns element number 22.
Could anyone explain what is wrong with my code? Maybe I misunderstood the concept of holes?
------Edit1------
Here's my full code
Read function:
FILE* file = fopen64(fn.c_str(), "r+b");
if(file == nullptr){
memset(page->entries, 0, sizeof(page->entries));
return ;
}
MoveCursor(file, id, sizeof(page->entries));
size_t read = fread(&page->entries[0], sizeof(page->entries), 1, file);
fclose(file);
if(read != 1){ //didn't read a full page.
memset(page->entries, 0, sizeof(page->entries));
}
Write function:
auto fn = dir.path().string() + std::filesystem::path::preferred_separator + GetFileId(page->pageId);
FILE* file = fopen64(fn.c_str(), "ar+b");
MoveCursor(file, page->pageId, sizeof(page->entries));
size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
if(written != 1) {
perror("error writing file");
}
fclose(file);
void MoveCursor(FILE* file, TPageId pid, size_t pageMultiplier){
auto const entryPoint = pid * pageMultiplier;
auto r = fseeko64(file, entryPoint, SEEK_SET);
if(r!=0){
std::cerr << "Error seeking file" << std::endl;
}
}
And here's a simplified page class:
template<typename TPageId, uint32_t EntriesCount>
struct PODPage {
bool dirtyBit = false;
TPageId pageId;
uint32_t entries[EntriesCount];
};
The reason I think it is an fseeko problem when writing is that inspecting the file content with xxd shows the data out of order. Breakpoints in the MoveCursor function show the offset is calculated correctly, and manual inspection of the FILE fields shows the offset is set correctly; however, the write doesn't leave a hole.
=============Edit2============
Minimal reproducer. The logic is: write the first chunk of data, seek to position 900, write the second chunk of data, then read from position 900 and compare with the data that was supposed to be there. Each operation opens and closes the file, which is what happens in my original code; keeping a file open is not allowed.
The expected behavior is a hole in the file; the actual behavior is that the file is written sequentially without holes.
#define _FILE_OFFSET_BITS 64 // must come before any system header to take effect
#include <cstdint>
#include <cstring>
#include <iostream>
#include <stdio.h>
int main() {
uint32_t data[10] = {1,2,3,4,5,6,7,8,9};
uint32_t data2[10] = {9,8,7,6,5,4,3,2,1};
{
FILE* file = fopen64("data", "ar+b");
if(fwrite(&data[0], sizeof(data), 1, file) !=1) {
perror("err1");
return 0;
}
fclose(file);
}
{
FILE* file = fopen64("data", "ar+b");
if (fseeko64(file, 900, SEEK_SET) != 0) {
perror("err2");
return 0;
}
if(fwrite(&data2[0], sizeof(data2), 1, file) !=1) {
perror("err3");
return 0;
}
fclose(file);
}
{
FILE* file = fopen64("data", "r+b");
if (fseeko64(file, 900, SEEK_SET) != 0) {
perror("err4");
return 0;
}
uint32_t data3[10] = {0};
if(fread(&data3[0], sizeof(data3), 1, file)!=1) {
perror("err5");
return 0;
}
fclose(file);
if (memcmp(&data2[0],&data3[0],sizeof(data))!=0) {
std::cerr << "err6";
return 0;
}
}
return 0;
}

I think your problem is the same as discussed here:
fseek does not work when file is opened in "a" (append) mode
Does fseek() move the file pointer to the beginning of the file if it was opened in "a+b" mode?
In summary: if a file is opened for appending (with "a" or "a+"), fseek only affects the read position, not the write position. Every write goes to the end of the file, wherever you have seeked.
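You can fix this by opening the file with "w" or "w+" instead. Both worked for me with your minimal code example. Be aware, though, that "w" and "w+" truncate the file to zero length when it is opened; because your code reopens the file for every operation, each open would discard the pages written earlier (the minimal example only checks the second chunk, so it does not notice). To keep the existing content, open with "r+b" and fall back to "w+b" only when the file does not exist yet.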

Related

Read and remove first (or last) line from txt file without copying

I want to read and remove the first line from a txt file (without copying; it's a huge file).
I've searched the web, but everyone just copies the desired content to a new file. I can't do that.
Below is a first attempt. This code gets stuck in a loop because no lines are removed. If the code removed the first line of the file each time it opened it, it would eventually reach the end.
#include <iostream>
#include <string>
#include <fstream>
#include <boost/interprocess/sync/file_lock.hpp>
int main() {
std::string line;
std::fstream file;
boost::interprocess::file_lock lock("test.lock");
while (true) {
std::cout << "locking\n";
lock.lock();
file.open("test.txt", std::fstream::in|std::fstream::out);
if (!file.is_open()) {
std::cout << "can't open file\n";
file.close();
lock.unlock();
break;
}
else if (!std::getline(file,line)) {
std::cout << "empty file\n"; //
file.close(); // never
lock.unlock(); // reached
break; //
}
else {
// remove first line
file.close();
lock.unlock();
// do something with line
}
}
}
Here's a solution written in C for Windows.
It runs on a 700,000-line, 245 MB file in no time (0.14 seconds).
Basically, I memory-map the file so that I can access its contents with the functions used for raw memory access. Once the file has been mapped, I search for one of the pair of symbols used to denote an EOL in Windows (\r and \n); this tells us how long the first line is in bytes.
From here, I just memmove from the first byte of the second line back to the start of the memory-mapped area (that is, the first byte in the file). It has to be memmove rather than memcpy, because the source and destination regions overlap.
Once this is done, the file is unmapped, the handle to the mem-mapped file is closed and we then use the SetEndOfFile function to reduce the length of the file by the length of the first line. When we close the file, it has shrunk by this length and the first line is gone.
Having the file already in memory because I've just created and written it obviously affects the execution time somewhat, but the Windows caching mechanism is the 'culprit' here: the very same mechanism we're leveraging to make the operation complete very quickly.
The test data is the source of the program duplicated 100,000 times and saved as testInput2.txt. (Paste it 10 times, select all, copy, and paste 10 times, replacing the original 10, for a total of 100 times; repeat until the output is big enough. I stopped here because more seemed to make Notepad++ a 'bit' unhappy.)
Error checking in this program is virtually non-existent, and the input is expected not to be UNICODE, i.e. the input is 1 byte per character.
The EOL sequence is 0x0D, 0x0A (\r, \n)
Code:
#include <stdio.h>
#include <string.h>
#include <windows.h>
/* Removes the first line (terminated by \r\n) of a single-byte text file in place. */
void testFunc(const char inputFilename[] )
{
    long lineLength = -1; /* stays -1 if no line terminator was found */
    HANDLE fileHandle = CreateFileA(
        inputFilename,
        GENERIC_READ | GENERIC_WRITE,
        0,
        NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
        NULL
    );
    if (fileHandle != INVALID_HANDLE_VALUE)
    {
        printf("File opened okay\n");
        DWORD fileSizeHi, fileSizeLo = GetFileSize(fileHandle, &fileSizeHi);
        HANDLE memMappedHandle = CreateFileMappingA(
            fileHandle,
            NULL,
            PAGE_READWRITE | SEC_COMMIT,
            0,
            0,
            NULL
        );
        if (memMappedHandle)
        {
            printf("File mapping success\n");
            char *memPtr = (char*)MapViewOfFile(
                memMappedHandle,
                FILE_MAP_ALL_ACCESS,
                0,
                0,
                0
            );
            if (memPtr != NULL)
            {
                printf("View of file successfully created\n");
                printf("File size is: 0x%08lX%08lX\n", fileSizeHi, fileSizeLo);
                /* The mapping is not NUL-terminated, so search with memchr bounded by the file size */
                char *eolPos = (char*)memchr(memPtr, '\r', fileSizeLo); /* windows EOL sequence is \r\n */
                if (eolPos != NULL && eolPos + 1 < memPtr + fileSizeLo)
                {
                    lineLength = (long)(eolPos - memPtr);
                    printf("Length of first line is: %ld\n", lineLength);
                    /* Regions overlap, so memmove (not memcpy); skip the line plus its \r\n */
                    memmove(memPtr, eolPos + 2, fileSizeLo - lineLength - 2);
                }
                UnmapViewOfFile(memPtr);
            }
            CloseHandle(memMappedHandle);
        }
        if (lineLength >= 0)
        {
            /* Shrink the file by the length of the removed line plus its \r\n */
            SetFilePointer(fileHandle, -(lineLength + 2), NULL, FILE_END);
            SetEndOfFile(fileHandle);
        }
        CloseHandle(fileHandle);
    }
}
int main()
{
const char inputFilename[] = "testInput2.txt";
testFunc(inputFilename);
return 0;
}
What you want to do, indeed, is not easy.
If you open the same file for reading and writing in it without being careful, you will end up reading what you just wrote and the result will not be what you want.
Modifying the file in place is doable: just open it, seek in it, modify and close. However, you want to copy all the content of the file except K bytes at the beginning of the file. It means you will have to iteratively read and write the whole file by chunks of N bytes.
Now once done, K bytes will remain at the end that would need to be removed. I don't think there's a way to do it with streams. You can use ftruncate or truncate functions from unistd.h or use Boost.Interprocess truncate for this.
Here is an example (without any error checking; I'll leave that to you):
#include <fstream>
#include <string>
#include <vector>
#include <unistd.h> // truncate()
int main()
{
    std::fstream file;
    file.open("test.txt", std::fstream::in | std::fstream::out);
    // First retrieve size of the file
    file.seekg(0, file.end);
    std::streampos endPos = file.tellg();
    file.seekg(0, file.beg);
    // Then retrieve size of the first line
    std::string firstLine;
    std::getline(file, firstLine);
    // We need two streampos: the read one and the write one
    std::streampos readPos = firstLine.size() + 1;
    std::streampos writePos = 0;
    // Read the whole file starting at readPos by chunks of size bufferSize
    std::size_t bufferSize = 256;
    std::vector<char> buffer(bufferSize); // char buffer[bufferSize] would be a non-standard variable-length array
    bool finished = false;
    while(!finished)
    {
        file.seekg(readPos);
        if(readPos + static_cast<std::streampos>(bufferSize) >= endPos)
        {
            bufferSize = endPos - readPos;
            finished = true;
        }
        file.read(buffer.data(), bufferSize);
        file.seekp(writePos); // seekp sets the position for the following write
        file.write(buffer.data(), bufferSize);
        readPos += bufferSize;
        writePos += bufferSize;
    }
    file.close();
    // No clean way to truncate streams, use function from unistd.h
    truncate("test.txt", writePos);
    return 0;
}
I'd really like to be able to provide a cleaner solution for in-place modification of the file, but I'm not sure there's one.

fwrite doesn't seem to copy the whole file (just the start)

I'm trying to make an exe program that can read any file as binary data and later use that data to recreate the exact same file.
I figured out that I can use fopen(content,"rb") to read a file as binary,
and fwrite to write a block of data to a stream. The problem is that when I call fwrite, it doesn't seem to copy everything.
For example, the text file I opened contains 31231232131. When I write it to another file, it only copies 3123 (the first 4 bytes).
I can see that it's a very simple thing that I'm missing but I don't know what.
#include <stdio.h>
#include <iostream>
using namespace std;
typedef unsigned char BYTE;
long getFileSize(FILE *file)
{
long lCurPos, lEndPos;
lCurPos = ftell(file);
fseek(file, 0, 2);
lEndPos = ftell(file);
fseek(file, lCurPos, 0);
return lEndPos;
}
int main()
{
//const char *filePath = "C:\\Documents and Settings\\Digital10\\MyDocuments\\Downloads\\123123.txt";
const char *filePath = "C:\\Program Files\\NPKI\\yessign\\User\\008104920100809181000405,OU=HNB,OU=personal4IB,O=yessign,C=kr\\SignCert.der";
BYTE *fileBuf;
FILE *file = NULL;
if ((file = fopen(filePath, "rb")) == NULL)
cout << "Could not open specified file" << endl;
else
cout << "File opened successfully" << endl;
long fileSize = getFileSize(file);
fileBuf = new BYTE[fileSize];
fread(fileBuf, fileSize, 1, file);
FILE* fi = fopen("C:\\Documents and Settings\\Digital10\\My Documents\\Downloads\\gcc.txt","wb");
fwrite(fileBuf,sizeof(fileBuf),1,fi);
cin.get();
delete[]fileBuf;
fclose(file);
fclose(fi);
return 0;
}
fwrite(fileBuf,fileSize,1,fi);
You read fileSize bytes, but you write sizeof(fileBuf) bytes, which is the size of the pointer returned by new (4 bytes on a 32-bit build), not the size of the data it points to.
A C++ way to do it:
#include <fstream>
int main()
{
std::ifstream in("Source.txt", std::ios::binary); // binary mode for a byte-for-byte copy
std::ofstream out("Destination.txt", std::ios::binary);
out << in.rdbuf();
}
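Your fread and fwrite calls pass the byte count as the element size and 1 as the number of elements. That is legal, but the usual order is element size first, then the number of elements, and with an element size of 1 the return value is the number of bytes actually transferred, which is easier to check. It should look like this: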
fread(fileBuf, 1, fileSize, file);
And
fwrite(fileBuf, 1, fileSize, fi);
Also address my comment from above:
Enclose everything that should only run after a successful open in the else branch, inside { and }. Indentation does not determine blocks in C++; otherwise your code will go on to use a null FILE* and crash if it fails to open the file.
EDIT: and another problem: you have been writing sizeof(fileBuf) bytes, which is a constant. Instead you should write exactly the number of bytes you have read. Given the rest of your code, you can simply replace sizeof(fileBuf) with fileSize, as I've done above.
fileBuf = new BYTE[fileSize];
fread(fileBuf, fileSize, 1, file);
FILE* fi = fopen("C:\\Documents and Settings\\[...]\gcc.txt","wb");
fwrite(fileBuf,sizeof(fileBuf),1,fi);
fileBuf is a pointer to BYTE. You declared it yourself, look: BYTE *fileBuf. And so sizeof(fileBuf) is sizeof(BYTE *).
Perhaps you wanted:
fwrite(fileBuf, fileSize, 1, fi);
which closely mirrors the earlier fread call.
I strongly recommend that you capture the return values of I/O functions and check them.

C++ : Need help on decrypting a zip file and extracting the contents to memory

I have a requirement to address. The user supplies an encrypted zip file (only the zip file is encrypted, not the contents inside it) that contains a text file.
The function should decrypt the zip file using the password or key provided, then unzip the file into memory as an array of chars and return a pointer to it.
I went through all the suggestions provided, including Minizip, microzip and zlib, but I am still not clear on which is the best fit for my requirement.
So far I have implemented decrypting the zip file using the password and converting the zip file to a string. I am planning to use this string as the input to a zip decompressor and extract it to memory. However, I am not sure my approach is right. If there are better ways to do it, please share your suggestions along with your recommendations on the library to use in my C++ application.
https://code.google.com/p/microzip/source/browse/src/microzip/Unzipper.cpp?r=c18cac3b6126cfd1a08b3e4543801b21d80da08c
http://www.winimage.com/zLibDll/minizip.html
http://www.example-code.com/vcpp/zip.asp
http://zlib.net/
Many thanks
Please provide your suggestions.
I zeroed in on using zlib. This link helped me do that; I'm sharing it in case it helps someone. In my case I use the buffer directly instead of writing it to a file.
http://www.gamedev.net/reference/articles/article2279.asp
#include <zlib.h>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <Windows.h>
using namespace std;
int main(int argc, char* argv[])
{
char c;
if ( argc != 2 )
{
cout << "Usage program.exe zipfilename" << endl;
return 0;
}
FILE * FileIn;
FILE * FileOut;
unsigned long FileInSize;
void *RawDataBuff;
//input and output files
FileIn = fopen(argv[1], "rb");
FileOut = fopen("FileOut.zip", "wb"); //binary mode, otherwise Windows translates newline bytes in the compressed data
//get the file size of the input file
fseek(FileIn, 0, SEEK_END);
FileInSize = ftell(FileIn);
//buffers for the raw and compressed data
RawDataBuff = malloc(FileInSize);
void *CompDataBuff = NULL;
//zlib's documentation requires the destination buffer to be at least 0.1% larger than the source plus 12 bytes
//(newer zlib versions provide compressBound() for this); adding 10% here is a generous margin
uLongf CompBuffSize = (uLongf)(FileInSize + (FileInSize * 0.1) + 12);
CompDataBuff = malloc((size_t)(CompBuffSize));
//read in the contents of the file into the source buffer
fseek(FileIn, 0, SEEK_SET);
fread(RawDataBuff, FileInSize, 1, FileIn);
//now compress the data
uLongf DestBuffSize = CompBuffSize; //in: capacity of CompDataBuff, out: actual compressed size
int returnValue;
returnValue = compress((Bytef*)CompDataBuff, (uLongf*)&DestBuffSize,
(const Bytef*)RawDataBuff, (uLongf)FileInSize);
cout << "Return value " << returnValue;
//write the compressed data to disk
fwrite(CompDataBuff, DestBuffSize, 1, FileOut);
fclose(FileIn);
fclose(FileOut);
errno_t err;
// Open for read (will fail if file "FileOut.zip" does not exist)
if ((FileIn = fopen("FileOut.zip", "rb")) == NULL) {
fprintf(stderr, "error: Unable to open file" "\n");
exit(EXIT_FAILURE);
}
else
printf( "Successfully opened the file\n" );
cout << "Input file name " << argv[1] << "\n";
// Open for write (creates "test.txt", or truncates it if it already exists)
if( (err = fopen_s( &FileOut, "test.txt", "wb" )) !=0 )
{
printf( "The file 'test.txt' was not opened\n" );
system ("pause");
exit (1);
}
else
printf( "The file 'test.txt' was opened\n" );
//get the file size of the input file
fseek(FileIn, 0, SEEK_END);
FileInSize = ftell(FileIn);
//buffers for the raw and uncompressed data
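free(RawDataBuff); //release the buffers from the compression step before reusing the pointer
free(CompDataBuff);
RawDataBuff = malloc(FileInSize);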
char *UnCompDataBuff = NULL;
//RawDataBuff = (char*) malloc (sizeof(char)*FileInSize);
if (RawDataBuff == NULL)
{
fputs ("Memory error",stderr);
exit (2);
}
//read in the contents of the file into the source buffer
fseek(FileIn, 0, SEEK_SET);
fread(RawDataBuff, FileInSize, 1, FileIn);
//allocate a buffer big enough to hold the uncompressed data, we can cheat here
//because we know the file size of the original
uLongf UnCompSize = 482000; //TODO : Revisit this
int retValue;
UnCompDataBuff = (char*) malloc (sizeof(char)*UnCompSize);
if (UnCompDataBuff == NULL)
{
fputs ("Memory error",stderr);
exit (2);
}
//all data we require is ready, so decompress it into the destination buffer; the exact
//size will be stored in UnCompSize
retValue = uncompress((Bytef*)UnCompDataBuff, &UnCompSize, (const Bytef*)RawDataBuff, FileInSize);
cout << "Return value of decompression " << retValue << "\n";
//write the decompressed data to disk
fwrite(UnCompDataBuff, UnCompSize, 1, FileOut);
free(RawDataBuff);
free(UnCompDataBuff);
fclose(FileIn);
fclose(FileOut);
system("pause");
exit (0);
}
Most, if not all, popular ZIP tools also support command-line usage. So if I were you, I would just run a system command from your C++ program to unzip and decrypt the file with one of these tools. Once the text file has been unzipped and decrypted, you can load it from disk into memory and process it from there. It's a simple solution, but an efficient one.

unread a file in C++

I am trying to read files that are simultaneously being written to disk. I need to read chunks of a specific size. If the number of bytes read is less than that size, I'd like to unread them (something like what ungetc does, but for a char[]) and try again. Appending to the bytes already read is not an option for me.
How is this possible?
I tried saving the current position through:
FILE *fd = fopen("test.txt","r+");
fpos_t position;
fgetpos (fd, &position);
and then reading the file and putting the pointer back to its before-fread position.
numberOfBytes = fread(buff, sizeof(unsigned char), desiredSize, fd);
if (numberOfBytes < desiredSize) {
fsetpos (fd, &position);
}
But it doesn't seem to be working.
I've replaced my previous suggestions with code I have just checked (Ubuntu 12.04 LTS, 32-bit, GCC 4.7). Apart from sleep() from unistd.h, which is POSIX, it uses only standard C stdio calls.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define desiredSize 10
#define desiredLimit 100
int main()
{
FILE *fd = fopen("test.txt","r+");
if (fd == NULL)
{
perror("open");
exit(1);
}
int total = 0;
unsigned char buff[desiredSize];
while (total < desiredLimit)
{
fpos_t position;
fgetpos (fd, &position);
int numberOfBytes = fread(buff, sizeof(unsigned char), desiredSize, fd);
printf("Read try: %d\n", numberOfBytes);
if (numberOfBytes < desiredSize)
{
fsetpos(fd, &position);
printf("Return\n");
sleep(10);
continue;
}
total += numberOfBytes;
printf("Total: %d\n", total);
}
return 0;
}
I was appending text to the file from another console, and yes, the reads progressed in 5-character blocks in step with what I was adding.
fseek seems perfect for this:
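FILE *fptr = fopen("test.txt","r+b"); // binary mode: C only guarantees relative fseek offsets on binary streams
numberOfBytes = fread(buff, 1, desiredSize, fptr);
if (numberOfBytes < desiredSize) {
    fseek(fptr, -(long)numberOfBytes, SEEK_CUR); // cast first: negating a size_t would wrap around
}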
Also note that a file descriptor is what open returns; fopen returns a FILE* stream, so fd is a misleading name for it.

File reading in C++ and eof

If I am reading a file in c++ like this:
//Begin to read a file
FILE *f = fopen("vids/18.dat", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *m_sendingStream = (char*)malloc(pos);
fread(m_sendingStream, pos, 1, f);
fclose(f);
//Finish reading a file
I have two questions. First: is this reading the entire file? (I want it to.) Second: how can I write a while loop that continues until it reaches the end of the file? I have:
while(i < sizeof(m_sendingStream))
but I am not sure this works. I've been reading around (I've never programmed in C++ before), and I thought I could use eof(), but apparently that's bad practice.
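A loop should not be necessary when reading from a file, since your code gets the entire contents in one go. (Note also that sizeof(m_sendingStream) is the size of a char* pointer, not the size of the file; the number of bytes read is pos.) You should still record and check the return value, of course: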
size_t const n = fread(buf, pos /*bytes in a record*/, 1 /*max number of records to read*/, f);
if (n != 1) { /* error! */ }
You can also write a loop that reads until the end of the file without knowing the file size in advance (e.g. read from a pipe or growing file):
#define CHUNKSIZE 65536
char * buf = (char *)malloc(CHUNKSIZE); // the cast is required in C++, where malloc's void * does not convert implicitly
{
size_t n = 0, r = 0;
while ((r = fread(buf + n, 1 /*bytes in a record*/, CHUNKSIZE /*max records*/, f)) != 0)
{
n += r;
char * tmp = (char *)realloc(buf, n + CHUNKSIZE);
if (tmp) { buf = tmp; }
else { /* big fatal error */ }
}
if (!feof(f))
{
perror("Error reading file");
}
}
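This is the C style of working with files; the C++ style would be to use the fstream library.
As for your second question, the feof function tells you whether the end of the file has been reached, but only after a read has already come up short, so don't use it as a loop condition. Loop on the return value of fread, then call feof afterwards to distinguish end-of-file from a read error.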