I am trying to read files that are being written to disk at the same time. I need to read chunks of a specific size. If fewer bytes than that are read, I'd like to "unread" them (something like what ungetc does, but for a char[]) and try again. Appending to the bytes already read is not an option for me.
How can I do this?
I tried saving the current position with:
FILE *fd = fopen("test.txt","r+");
fpos_t position;
fgetpos (fd, &position);
and then reading the file and, on a short read, putting the position back to where it was before the fread:
size_t numberOfBytes = fread(buff, sizeof(unsigned char), desiredSize, fd);
if (numberOfBytes < desiredSize) {
fsetpos (fd, &position);
}
But it doesn't seem to be working.
I'm replacing my previous suggestions with code I have just tested (Ubuntu 12.04 LTS, 32-bit, GCC 4.7). Apart from sleep() and <unistd.h>, which are POSIX, I'm pretty sure this is a 100% standard C solution.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define desiredSize 10
#define desiredLimit 100
int main()
{
FILE *fd = fopen("test.txt","r+");
if (fd == NULL)
{
perror("open");
exit(1);
}
int total = 0;
unsigned char buff[desiredSize];
while (total < desiredLimit)
{
fpos_t position;
fgetpos (fd, &position);
int numberOfBytes = fread(buff, sizeof(unsigned char), desiredSize, fd);
printf("Read try: %d\n", numberOfBytes);
if (numberOfBytes < desiredSize)
{
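/* Short read: go back to the saved position. A successful fsetpos() also
   clears the end-of-file indicator set by the short fread(), so the next
   fread() can pick up data appended in the meantime. */
fsetpos(fd, &position);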
printf("Return\n");
sleep(10);
continue;
}
total += numberOfBytes;
printf("Total: %d\n", total);
}
return 0;
}
I was appending text to the file from another console, and yes, the reads did progress in 5-character blocks in step with what I was adding.
fseek seems perfect for this:
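FILE *fptr = fopen("test.txt","r+b"); /* binary mode: arbitrary SEEK_CUR offsets are only portable on binary streams */
size_t numberOfBytes = fread(buff, 1, desiredSize, fptr);
if (numberOfBytes < desiredSize) {
/* cast before negating: negating the unsigned size_t itself would wrap around.
   fseek() also clears the end-of-file indicator. */
fseek(fptr, -(long)numberOfBytes, SEEK_CUR);
}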
Also note that a file descriptor is what open() returns; fopen() returns a FILE * stream, so fd is a misleading name for that variable.
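For reuse, here is a minimal sketch of a helper (the name read_chunk_or_rewind is made up for this example) that reads exactly want bytes and, on a short read, seeks back to where it started. It records the starting offset with ftell() and returns to it with SEEK_SET instead of a relative seek, which sidesteps the unsigned-negation issue entirely. It assumes a stream opened in binary mode whose offsets fit in a long.

```c
#include <assert.h>
#include <stdio.h>

/* Read exactly `want` bytes into buf. On a short read, seek back to the
   offset where the read started so the caller can retry later.
   Returns 1 for a full chunk, 0 if the bytes were "unread", -1 on error. */
int read_chunk_or_rewind(FILE *fp, unsigned char *buf, size_t want)
{
    assert(fp != NULL && buf != NULL);

    long start = ftell(fp);            /* remember the pre-read offset */
    if (start < 0)
        return -1;

    size_t got = fread(buf, 1, want, fp);
    if (got == want)
        return 1;                      /* full chunk: keep the new position */
    if (ferror(fp))
        return -1;                     /* a real read error, not just EOF */

    /* Short read: return to the saved offset. fseek() also clears the
       end-of-file indicator, so a later fread() sees newly appended data. */
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;
    return 0;
}
```

The caller simply waits (for example with sleep()) and calls the function again whenever it returns 0.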
First off, I understand that RC4 is not the safest encryption method and that it is outdated; this is just for a school project. I just thought I'd put it out there since people may ask.
I am working on using RC4 from OpenSSL to make a simple encryption and decryption program in C++. I noticed that the encryption and decryption are inconsistent. Here is what I have so far:
#include <fcntl.h>
#include <openssl/evp.h>
#include <openssl/rc4.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int inputFile = open(argv[1], O_RDONLY);
if (inputFile < 0) {
printf("Error opening file\n");
return 1;
}
unsigned char *keygen = reinterpret_cast<unsigned char*>(argv[2]);
RC4_KEY key;
size_t size = lseek(inputFile, 0, SEEK_END);
lseek(inputFile, 0, SEEK_SET);
unsigned char *fileIn = (unsigned char*) calloc(size, 1);
if (pread(inputFile, fileIn, size, 0) == -1) {
perror("Error opening read\n");
return 1;
}
unsigned char *fileOut = (unsigned char*) calloc(size, 1);
unsigned char *actualKey;
EVP_BytesToKey(EVP_rc4(), EVP_sha256(), NULL, keygen, sizeof(keygen), 1, actualKey, NULL);
RC4_set_key(&key, sizeof(actualKey), actualKey);
RC4(&key, size, fileIn, fileOut);
int outputFile = open(argv[3], O_WRONLY | O_TRUNC | O_CREAT, 0644);
if (outputFile < 0) {
perror("Error opening output file");
return 1;
}
if (pwrite(outputFile, fileOut, size, 0) == -1) {
perror("error writing file");
return 1;
}
close(inputFile);
close(outputFile);
free(fileIn);
free(fileOut);
return 0;
}
The syntax for running this in Ubuntu is:
./myRC4 test.txt pass123 testEnc.txt
Most of the time this works fine and encrypts and decrypts the file. However, occasionally I get a segmentation fault. When I do, I run the exact same command again and it encrypts or decrypts fine, at least for .txt files.
When I test on .jpg files, or any larger file, the issue seems to be more common and less predictable. Sometimes the images appear to have been decrypted (no segmentation fault) when in reality they have not, which I check by running diff on the original and the decrypted file.
Any ideas as to why I get these inconsistencies? Does it have to do with how I allocate memory for fileOut and fileIn?
Thank you in advance
actualKey needs to be pointing to a buffer of appropriate size before you pass it to EVP_BytesToKey. As it is you are passing in an uninitialised pointer which would explain your inconsistent results.
The documentation for EVP_BytesToKey has this to say:
If data is NULL, then EVP_BytesToKey() returns the number of bytes needed to store the derived key.
So you can call EVP_BytesToKey once with the data parameter set to NULL to determine the length of actualKey, then allocate a suitable buffer and call it again with actualKey pointing to that buffer.
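As others have noted, passing sizeof(keygen) to EVP_BytesToKey is also incorrect: keygen is a pointer, so sizeof gives the size of the pointer (typically 4 or 8 bytes), not the length of the password. You probably meant strlen (argv [2]).
Likewise, passing sizeof(actualKey) to RC4_set_key is an error, for the same reason. Instead, you should pass the value returned by EVP_BytesToKey.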
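The sizeof mistake is easy to see in isolation. This OpenSSL-free C sketch (the function names are hypothetical, chosen for illustration) contrasts the password length the question's code actually passes with the one that was intended, using the password from the example command line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the question's code passes as the password length: keygen is a
   pointer, so sizeof yields the size of the pointer itself, no matter how
   long the password string is. */
size_t length_via_sizeof(const unsigned char *keygen)
{
    return sizeof(keygen);
}

/* What was intended: the number of characters in the password (argv[2]). */
size_t length_via_strlen(const char *password)
{
    assert(password != NULL);
    return strlen(password);
}
```

On a typical 64-bit build the first function returns 8 for every password, so longer passwords are silently truncated and shorter ones make EVP_BytesToKey read past the end of the string.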
I'm trying to use a sparse file to store a sparse array of data. Logically I thought the code had no bugs, but the unit tests keep failing. After inspecting the code many times, I decided to check the file content after every step and found that the holes were not created. That is: writing the first element, seeking forward x elements, and writing the second element ends up writing the first element and then the second element in the file with no space at all between them.
My simplified code:
FILE* file = fopen64(fn.c_str(), "ar+b");
auto const entryPoint = 220; //calculated at runtime, the size of each element is 220 bytes
auto r = fseeko64(file, entryPoint, SEEK_SET);
if(r!=0){
std::cerr << "Error seeking file" << std::endl;
}
size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
if(written != 1) {
perror("error writing file");
}
fclose(file);
The offset is being calculated correctly. The intended behavior is to write the first element, leave 20 elements empty, and then write the 22nd element. However, a hex dump of the file shows two elements, at offsets 0 and 220 (directly after the first element). The unit tests also fail, because reading the 2nd element actually returns element number 22.
Could anyone explain what is wrong with my code? Maybe I misunderstood the concept of holes?
------Edit1------
Here's my full code
Read function:
FILE* file = fopen64(fn.c_str(), "r+b");
if(file == nullptr){
memset(page->entries, 0, sizeof(page->entries));
return ;
}
MoveCursor(file, id, sizeof(page->entries));
size_t read = fread(&page->entries[0], sizeof(page->entries), 1, file);
fclose(file);
if(read != 1){ //didn't read a full page.
memset(page->entries, 0, sizeof(page->entries));
}
Write function:
auto fn = dir.path().string() + std::filesystem::path::preferred_separator + GetFileId(page->pageId);
FILE* file = fopen64(fn.c_str(), "ar+b");
MoveCursor(file, page->pageId, sizeof(page->entries));
size_t written = fwrite(&page->entries[0], sizeof(page->entries), 1, file);
if(written != 1) {
perror("error writing file");
}
fclose(file);
void MoveCursor(FILE* file, TPageId pid, size_t pageMultiplier){
auto const entryPoint = pid * pageMultiplier;
auto r = fseeko64(file, entryPoint, SEEK_SET);
if(r!=0){
std::cerr << "Error seeking file" << std::endl;
}
}
And here's a simplified page class:
template<typename TPageId, uint32_t EntriesCount>
struct PODPage {
bool dirtyBit = false;
TPageId pageId;
uint32_t entries[EntriesCount];
};
I'm saying it is an fseeko problem when writing because inspecting the file content with xxd shows the data out of order. Breakpoints in the MoveCursor function show the offset is calculated correctly, and manual inspection of the file fields shows the offset is set correctly; however, the write doesn't leave a hole.
=============Edit2============
Minimal reproducer. The logic is: write the first chunk of data, seek to position 900, write the second chunk of data, then read from position 900 and compare with the data that was supposed to be there. Each operation opens and closes the file, which is what happens in my original code (keeping a file open is not allowed).
The expected behavior is to create a hole in the file; the actual behavior is that the file is written sequentially without holes.
#include <iostream>
#define _FILE_OFFSET_BITS 64
#define __USE_FILE_OFFSET64 1
#include <stdio.h>
#include <cstring>
int main() {
uint32_t data[10] = {1,2,3,4,5,6,7,8,9};
uint32_t data2[10] = {9,8,7,6,5,4,3,2,1};
{
FILE* file = fopen64("data", "ar+b");
if(fwrite(&data[0], sizeof(data), 1, file) !=1) {
perror("err1");
return 0;
}
fclose(file);
}
{
FILE* file = fopen64("data", "ar+b");
if (fseeko64(file, 900, SEEK_SET) != 0) {
perror("err2");
return 0;
}
if(fwrite(&data2[0], sizeof(data2), 1, file) !=1) {
perror("err3");
return 0;
}
fclose(file);
}
{
FILE* file = fopen64("data", "r+b");
if (fseeko64(file, 900, SEEK_SET) != 0) {
perror("err4");
return 0;
}
uint32_t data3[10] = {0};
if(fread(&data3[0], sizeof(data3), 1, file)!=1) {
perror("err5");
return 0;
}
fclose(file);
if (memcmp(&data2[0],&data3[0],sizeof(data))!=0) {
std::cerr << "err6";
return 0;
}
}
return 0;
}
I think your problem is the same as discussed here:
fseek does not work when file is opened in "a" (append) mode
Does fseek() move the file pointer to the beginning of the file if it was opened in "a+b" mode?
Summary of the two above: If a file is opened for appending (using "a") then fseek only applies to the read position, not to the write position. The write position will always be at the end of the file.
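You can fix this by opening the file with "w" or "w+" instead. Both worked for me with your minimal code example. Note, however, that "w" and "w+" truncate the file to zero length every time it is opened, so in your real code, which reopens the file for every page, they would discard the pages written earlier. To keep the existing data, open with "r+b" (falling back to "w+b" only when the file does not exist yet); writes then go wherever fseeko64 positioned the stream.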
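A minimal sketch of the open-without-truncating approach, assuming POSIX semantics (fopen() setting errno to ENOENT for a missing file, and the gap created by seeking past the end reading back as zero bytes). The helper names are hypothetical; in the original code MoveCursor() would still compute the offset. Plain fseek() with a long offset is used here for portability; fseeko64() behaves the same way for large files.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Open for reading and writing without truncating; create the file only
   if it does not exist yet. Unlike "a+b", writes go to the fseek()ed
   position; unlike "w+b", existing pages survive. */
FILE *open_rw_keep(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL && errno == ENOENT)
        fp = fopen(path, "w+b");
    return fp;
}

/* Write `size` bytes at absolute byte `offset`, opening and closing the
   file each time. Seeking past the end leaves a gap that reads back as
   zeros (a hole on file systems with sparse-file support). 0 = success. */
int write_at(const char *path, long offset, const void *data, size_t size)
{
    assert(data != NULL);
    FILE *fp = open_rw_keep(path);
    if (fp == NULL)
        return -1;
    int rc = 0;
    if (fseek(fp, offset, SEEK_SET) != 0 || fwrite(data, size, 1, fp) != 1)
        rc = -1;
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}

/* Read exactly `size` bytes from absolute byte `offset`. 0 = success. */
int read_at(const char *path, long offset, void *data, size_t size)
{
    assert(data != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int rc = 0;
    if (fseek(fp, offset, SEEK_SET) != 0 || fread(data, size, 1, fp) != 1)
        rc = -1;
    fclose(fp);
    return rc;
}
```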
Scenario: I have a file that is 8,203,685 bytes long in binary, and I am using fread() to read in the file.
Problem: Hexdumping the data after the fread() on both Linux and Windows yields different results. Both hexdump files are the same size, but on Linux it matches the original input file that went in, whereas on Windows starting at byte 8,200,193 the rest of the hexdump contains 0's.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <fstream>

int main(void)
{
FILE * fp = fopen("input.exe", "rb");
unsigned char * data = NULL;
long size = 0;
if (fp)
{
fseek(fp, 0, SEEK_END);
size = ftell(fp);
fseek(fp, 0, SEEK_SET);
data = (unsigned char *)malloc(size);
size_t read_bytes = fread(data, 1, size, fp);
// print out read_bytes, value is equal to size
// Hex dump using ofstream. Hexdump file is different here on Windows vs
// on Linux. Last ~3000 bytes are all 0's on Windows.
std::ofstream out("hexdump.bin", std::ios::binary | std::ios::trunc);
out.write(reinterpret_cast<char *>(data), size);
out.close();
FILE * out_file = fopen("hexdump_with_FILE.bin", "wb");
fwrite(data, 1, size, out_file);
fflush(out_file);
fclose(out_file);
}
if (fp) fclose(fp);
if (data) free(data);
return 0;
}
Has anyone seen this behavior before, or have an idea of what might be causing the behavior that I am seeing?
P.S. Everything works as expected when using ifstream and its read function
Thanks!
I am trying to create a file of a given size by using lseek() and then writing a byte at the end of the file; however, it creates a file of 0 bytes.
Below is the code. Any suggestions?
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#ifndef BUF_SIZE
#define BUF_SIZE 1024
#endif // BUF_SIZE
int main(int argc, char *argv[])
{
int inputFd;
int fileSize = 500000000;
int openFlags;
int result;
mode_t filePerms;
ssize_t numRead;
char buf[BUF_SIZE];
openFlags = O_WRONLY | O_CREAT | O_EXCL;
filePerms = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; /*rw-rw-rw-*/
inputFd = open(argv[1], openFlags, filePerms);
if (inputFd == -1)
printf("problem opening file %s ", argv[1]);
return 1;
printf ("input FD: %d", inputFd);
result = lseek(inputFd, fileSize-1, SEEK_SET);
if (result == -1){
close(inputFd);
printf("Error calling lseek() to stretch the file");
return 1;
}
result = write(inputFd, "", 1);
if (result < 0){
close(inputFd);
printf("Error writing a byte at the end of file\n");
return 1;
}
if (close(inputFd) == -1)
printf("problem closing file %s \n",argv[1]);
return 0;
}
You are missing some braces:
if (inputFd == -1)
printf("problem opening file %s ", argv[1]);
return 1;
You need to change this to:
if (inputFd == -1) {
printf("problem opening file %s ", argv[1]);
return 1;
}
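Without the braces, the only statement controlled by the if statement is the printf, and the return 1; statement always runs no matter what the value of inputFd is. The program therefore exits right after open() creates the file, which is why the file ends up 0 bytes long.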
It is good practice to always use braces around a controlled block, even if there is only one statement (such as for the close at the end of your program).
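Putting the fix together, here is a minimal sketch of the stretching step as a function (the name create_stretched_file and the rw------- permissions are choices made for this example, not taken from the question). It uses off_t for the size and compares lseek()'s result against (off_t)-1, so it does not depend on squeezing the offset into an int. Whether the skipped range really becomes a hole depends on the file system.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` (it must not exist yet, because of O_EXCL) with a logical
   size of `size` bytes: seek to size-1 and write one zero byte.
   Returns 0 on success, -1 on failure. */
int create_stretched_file(const char *path, off_t size)
{
    assert(path != NULL);
    if (size <= 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        close(fd);
        return -1;
    }
    if (write(fd, "", 1) != 1) {       /* "" supplies a single '\0' byte */
        perror("write");
        close(fd);
        return -1;
    }
    if (close(fd) == -1) {
        perror("close");
        return -1;
    }
    return 0;
}
```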
Do you have any example of writing a byte on every block of the file?
This code is from a slightly different context, but it can be adapted to your case. The context was ensuring that all the disk space for an Informix database was allocated, so the wrapper code around it created the file (which must not already exist, etc.). The entry point for the actual writing was the second of these two functions; the fill_buffer() function replicated the 8-byte word informix throughout a 64 KiB block.
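#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed declaration: err_syserr() is the author's own printf-like error
   reporter (described below), not a standard library function. */
void err_syserr(const char *format, ...);

/* Fill the given buffer with the string 'informix' repeatedly */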
static void fill_buffer(char *buffer, size_t buflen)
{
size_t filled = sizeof("informix") - 1;
assert(buflen > filled);
memmove(buffer, "informix", sizeof("informix")-1);
while (filled < buflen)
{
size_t ncopy = (filled > buflen - filled) ? buflen - filled : filled;
memmove(&buffer[filled], buffer, ncopy);
filled *= 2;
}
}
/* Ensure the file is of the required size by writing to it */
static void write_file(int fd, size_t req_size)
{
char buffer[64*1024];
size_t nbytes = (req_size > sizeof(buffer)) ? sizeof(buffer) : req_size;
size_t filesize = 0;
fill_buffer(buffer, nbytes);
while (filesize < req_size)
{
size_t to_write = nbytes;
ssize_t written;
if (to_write > req_size - filesize)
to_write = req_size - filesize;
if ((written = write(fd, buffer, to_write)) != (ssize_t)to_write)
err_syserr("short write (%d vs %u requested)\n",
(int)written, (unsigned)to_write);
filesize += to_write;
}
}
As you can see, it writes in 64 KiB chunks. Frankly, there's going to be no difference between writing all bytes on a page and writing one byte per page. Indeed, if anything, writing the whole page will be faster because the new value can simply be written, whereas if you write just one byte per page, an old page has to be created/read, modified, and then written back.
In your context, I would extend the current file to a multiple of 4 KiB (8 KiB if you prefer), then write the main data blocks, and then the final partial block if necessary. You would probably simply do memset(buffer, '\0', sizeof(buffer)); whereas the sample code deliberately writes something other than blocks of zero bytes. AFAIK, even if the block you write is all zero bytes, the driver actually writes that block to the disk; the simple act of writing ensures the file is non-sparse.
The err_syserr() function is a bit like fprintf(stderr, …), but it adds the system error message from errno and strerror() and exits the program too. The code does assume 32-bit (or larger) int values. I never got to experiment with terabyte-size files; the code was last updated in 2009.
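A minimal sketch of the zero-byte variant suggested above, assuming a POSIX system: it writes req_size bytes of zeros in 64 KiB chunks, so no range of the file is skipped with lseek(). The function name is hypothetical, and because it opens with O_TRUNC it (re)creates the file from scratch rather than extending an existing one.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` holding req_size zero bytes, every one of them actually
   written, so the file has no holes. Returns 0 on success, -1 on error. */
int create_zero_filled_file(const char *path, off_t req_size)
{
    assert(path != NULL);
    char buffer[64 * 1024];
    memset(buffer, '\0', sizeof(buffer));   /* blocks of zero bytes */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    off_t done = 0;
    while (done < req_size) {
        size_t chunk = sizeof(buffer);
        if ((off_t)chunk > req_size - done)
            chunk = (size_t)(req_size - done);   /* final partial block */
        ssize_t n = write(fd, buffer, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;                        /* interrupted: retry */
            perror("write");
            close(fd);
            return -1;
        }
        done += n;                               /* short writes just loop */
    }

    if (close(fd) == -1) {
        perror("close");
        return -1;
    }
    return 0;
}
```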
If I am reading a file in C++ like this:
//Begin to read a file
FILE *f = fopen("vids/18.dat", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *m_sendingStream = (char*)malloc(pos);
fread(m_sendingStream, pos, 1, f);
fclose(f);
//Finish reading a file
I have two questions. First, is this reading the entire file? (I want it to.) Second, how can I write a while loop that continues until it reaches the end of the file? I have:
while(i < sizeof(m_sendingStream))
but I am not sure if this works. I've been reading around (I've never programmed in C++ before), and I thought I could use eof(), but apparently that's bad practice.
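Yes, your code reads the entire file in one go, so a loop should not be necessary. (Note that sizeof(m_sendingStream) in your while condition is the size of the pointer, not the size of the data; the number of bytes is already in pos.) You should still record and check the return value of fread, of course: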
size_t const n = fread(buf, pos /*bytes in a record*/, 1 /*max number of records to read*/, f);
if (n != 1) { /* error! */ }
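Wrapped up as a function with every call checked, the single-read approach might look like this sketch (read_whole_file is an illustrative name). It assumes a regular, seekable file whose size fits in a long, which is what the fseek()/ftell() idiom in the question already relies on.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole regular file into a malloc()ed buffer with one fread().
   On success, returns the buffer (the caller frees it) and stores its size
   in *len; returns NULL on any failure. */
unsigned char *read_whole_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long pos = ftell(f);                         /* file size in bytes */
    if (pos < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* malloc(0) may return NULL, so always ask for at least one byte */
    unsigned char *buf = malloc(pos > 0 ? (size_t)pos : 1);
    if (buf == NULL) { fclose(f); return NULL; }

    /* One record of pos bytes: fread returns 1 only if all of it arrived */
    if (pos > 0 && fread(buf, (size_t)pos, 1, f) != 1) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *len = (size_t)pos;
    return buf;
}
```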
You can also write a loop that reads until the end of the file without knowing the file size in advance (e.g. read from a pipe or growing file):
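#include <stdio.h>
#include <stdlib.h>
#define CHUNKSIZE 65536
/* f is an already-open FILE *; the casts let this compile as C++ too */
char * buf = (char *)malloc(CHUNKSIZE);
if (buf == NULL) { /* out of memory: handle the error */ }
{
size_t n = 0, r = 0; /* n: total bytes read so far; r: bytes from the last fread */
/* fread returns 0 at end of file or on an error, which ends the loop */
while ((r = fread(buf + n, 1 /*bytes in a record*/, CHUNKSIZE /*max records*/, f)) != 0)
{
n += r;
/* grow the buffer so there is always room for another full chunk */
char * tmp = (char *)realloc(buf, n + CHUNKSIZE);
if (tmp) { buf = tmp; }
else { perror("realloc"); break; /* buf is still valid and holds n bytes */ }
}
if (ferror(f))
{
perror("Error reading file");
}
}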
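This is the C style of working with files; the C++ style would be to use the fstream library.
And about your second question, a good way to check whether you have reached the end of the file is the feof function, with one caveat: feof() only becomes true after a read has already hit the end of the file. So test the return value of fread() first, and use feof() afterwards to tell the end of the file apart from a read error (this is why while (!feof(f)) is considered bad practice).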
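A minimal sketch of that pattern (count_remaining_bytes is a made-up name for illustration): the loop condition is fread()'s return value, and ferror()/feof() are consulted only once the loop has ended.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes left in a stream, reading it in 4 KiB pieces.
   Returns the byte count, or -1 if a read error occurred. */
long count_remaining_bytes(FILE *fp)
{
    assert(fp != NULL);
    unsigned char buf[4096];
    long total = 0;
    size_t n;

    /* Drive the loop with fread()'s result, not with feof() */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;

    /* fread() returned 0: either a read error or the end of the file */
    if (ferror(fp))
        return -1;
    return total;               /* no error, so feof(fp) is now true */
}
```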