Problems getting ftell() in binary append - c++

I've got some code that was working on a server until it was moved to a different system. The problem seems to be here:
given a structure I've defined elsewhere:
user1_type user1;                /* structure containing user data */
user1_type *user1_ptr = &user1;
this routine appends the record to the end of the file:
if ((dfile = fopen(filename, "ab+")) == NULL)
    error_message("Unable to open file for append.", filename, 1);
else { /* append data */
    user1.recid = ftell(dfile);  /* record the current file position */
    fwrite(user1_ptr, sizeof(user1), 1, dfile);
    fflush(dfile);
    fclose(dfile);
}
I can confirm the data gets appended in the file, but the value of user1.recid always returns 0 - any ideas why?
UPDATE: Looks like the issue is that not all implementations position the stream at the end when a file is opened for append. I obviously need an fseek(dfile, 0, SEEK_END); before I do an ftell() when appending. But if I want to read from the beginning of a text or binary file, is it also customary to place an fseek() right after fopen()? Does it vary depending upon the file type?

From MSDN's documentation of ftell
The position returned by ftell() is expressed as an offset relative
to the beginning of the stream
If no I/O operation has yet occurred on a file opened for appending,
the file position is the beginning of the file.
This gives you an offset of 0 relative to the beginning.
So when you call user1.recid = ftell(dfile);, no I/O operation has occurred on the stream yet, so ftell() returns 0, indicating that the file position indicator is at the beginning.

The behavior of ftell(dfile) here is implementation-defined. From C11 7.21.3 (similar wording appears in previous C standards):
If a file can support positioning requests (such as a disk file, as
opposed to a terminal), then a file position indicator associated with
the stream is positioned at the start (character number zero) of the
file, unless the file is opened with append mode in which case it is
implementation-defined whether the file position indicator is
initially positioned at the beginning or the end of the file.

...so (in addition to bkVnet's answer) you must call fseek(dfile, 0, SEEK_END) to seek to the end, ask ftell() for the position, and divide that by sizeof(user1_type) to get the record id (i.e. the number of records already in the file, so add 1 for the new record).
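Putting the fix together, here is a minimal sketch of the corrected append routine. The record type's fields are assumptions (only recid appears in the question), and the error handling is reduced to a return code:

```cpp
#include <cstdio>

// Hypothetical record type standing in for user1_type from the question.
struct record {
    long recid;
    char payload[32];
};

// Append one record; returns the record's index in the file, or -1 on error.
// The fseek(..., SEEK_END) is the key fix: in "ab+" mode the initial
// position reported by ftell() is implementation-defined.
long append_record(const char *filename, record *rec) {
    FILE *dfile = std::fopen(filename, "ab+");
    if (dfile == nullptr)
        return -1;
    std::fseek(dfile, 0, SEEK_END);              // force the position to end-of-file
    long offset = std::ftell(dfile);             // now a reliable byte offset
    rec->recid = offset / (long)sizeof(record);  // record index, not byte offset
    std::fwrite(rec, sizeof(record), 1, dfile);
    std::fclose(dfile);
    return rec->recid;
}
```

Note that in append mode the fwrite() would land at the end of the file regardless of the position; the fseek() is only needed so that ftell() reports a meaningful offset.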

Related

How to refresh an input file stream in C++

I am writing a program that monitors for changes in a file for a specific purpose. The possible values (3) in the file are known and can be differentiated by the first letter.
Using an input file stream ifstream status;, I'm unable to refresh the buffer of the input stream status to reflect changes in the file. I don't want to spam status.close() and status.open() to solve the problem.
If the changes you mention include only appended bytes, then you can use std::ifstream::clear() to reset the error bits and continue reading the file until reaching EOF. Check out this answer.
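A minimal sketch of that approach, assuming the file only ever grows (the function name is mine): once getline() hits end-of-file the stream's eofbit and failbit are set, and clear() resets them so the next read attempt can pick up newly appended data.

```cpp
#include <fstream>
#include <string>
#include <vector>

// "Tail" a growing file without closing and reopening it: each call
// returns whatever complete lines have been appended since the last call.
std::vector<std::string> read_new_lines(std::ifstream &status) {
    std::vector<std::string> lines;
    status.clear();                    // drop eofbit/failbit from the last pass
    std::string line;
    while (std::getline(status, line))
        lines.push_back(line);
    status.clear();                    // leave the stream usable for the next call
    return lines;
}
```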

Fast accessing file position in ifs() C++

Info: What is the best way to store a position in a txt file, close the file, and later open it at the same position using c++?
I have a large text file that I need to parse in chunks and feed into some system. As of now, I load the file in the ifstream and then getlines until I find the data I need (let's say data is at position {x}). After this I close the file, process the data, and now I need to continue feeding the data from the big file. So I open the file again, and getlines until I get to position {x+d} this time ( d is the offset from the data I read)...
Instead of going through the file once, it is easy to see that I go (1d + 2d + ... + (N-1)d + Nd) ~ d*N^2 times through the file. Now I want to save the position in the file after d, close the file, and then instantly reopen the file at the same position. What can be used for this?
You can't do this with newline translation enabled (what the Standard calls "text mode"), because seeking back to the position requires the standard library to scan through the entire front of the file to find N characters-not-double-counting-newlines. Translations of variable length encodings (e.g. between UTF-8 and UCS) cause a similar problem.
The solution is to turn off newline translation (what the Standard calls "binary mode") and any other translations that involve variable-length encodings, and handle these yourself. With all translations turned off, the "file position" is the number directly used by the OS to perform file I/O, and therefore has the potential to be very efficient (whether it actually is efficient depends on the standard library implementation details).
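A sketch of that save-and-resume pattern (the chunk-based interface and function name are my own): with std::ios::binary the position from tellg() is a plain byte offset, so seekg() can jump straight back to it after the file has been closed and reopened.

```cpp
#include <fstream>
#include <string>

// Read nbytes starting at a previously saved position; returns the
// position to resume from on the next call. Opening in binary mode
// disables newline translation, so the offset maps directly to a byte
// offset the OS can seek to without scanning the front of the file.
std::streampos read_chunk(const char *path, std::streampos start,
                          std::string &out, std::size_t nbytes) {
    std::ifstream ifs(path, std::ios::binary);
    ifs.seekg(start);                  // jump to the saved position
    out.resize(nbytes);
    ifs.read(&out[0], nbytes);
    out.resize(ifs.gcount());          // shrink to what was actually read
    return start + (std::streamoff)ifs.gcount();
}
```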

Truncate or resize a file in order to modify its end

I have a FILE* file that holds some binary data. Let's say that this data is a list of doubles and that the last entry is a string describing those doubles. I want to modify this string (the new string might be shorter). So first I delete the old string. I need to find the starting point of the string:
fseek(file, -(long)string.size(), SEEK_END);
and then what should I do? I found the Delete End of File link but I don't know which one to use... Once the file is resized, can I simply write my new string using fwrite()?
Neither FILE* nor iostream support truncation. If you want to
edit a file so that the new file is shorter than the old, you
have two solutions:
The usual solution is to copy the original file into a new
file, making any changes as you go. When finished, close the
new file, verify that there are no errors (an important point),
then delete the original file and rename the new file to have the
original name. This may cause problems on Unix systems if
there were hard links to the original file. (Typically, this
isn't an issue, since everyone uses soft links now. If it is,
you should stat the original, and if the st_nlink field is
greater than 1, copy the new file onto the original, and then
delete the new file.) On the other hand, it is the most generic
option; it works for all types of modifications, anywhere in the
file.
There are usually system specific functions at the lower level
to truncate a file. Under Unix, this is ftruncate. But
you'll need to find the byte count where you want to truncate
first; ftruncate requires an open file, but it won't truncate
at the current position in the file. So you'll have to 1) find
the start of this last line in the file, 2) seek to it, 3) write
the new value, 4) call ftell (or ftello, if the length can
be too large to fit in a long) to find the new end position.
At this point, you have the problem of synchronizing your
FILE* with the lower level; personally, I'd fclose the file,
then reopen it with open, and do the ftruncate on the file
descripter from this open. (In fact, personally, I'd do the
entire job using open, read, lseek, write, ftruncate
and close. And maybe stat to find out the file length up
front. If you don't have to translate the doubles,
there's really nothing that FILE* adds.)
As a general rule, I'd go with the first solution, and only try
the second if it turns out to be too slow. (If the file
contains a couple of billion doubles, for example, copying them
will take some time.)
If you want to resize a file, then ftruncate() (http://www.linuxmanpages.com/man2/ftruncate.2.php) is the function you're looking for. You'll need to call fileno() on the FILE * structure to get the file descriptor for ftruncate(), though.
As for appending the new data (the new string) once the file has been reduced in size, just seeking to the end (fseek(file, 0, SEEK_END)) and fwrite()'ing there should do it.
EDIT: remember to call fflush() before truncating the file!

Is ftruncate() asynchronous?

I am attempting to write a class in C++ that provides a means of atomically appending to a file, even for the case of power failure mid write.
First, I write my current file position (a 64-bit offset from the beginning of the file, in bytes) to a separate journal file. Then, I write the requested data to the end of the data file. Finally, I call ftruncate() (setting the truncated size to 0) on the journal file.
The main idea is that if this class is ever asked to open a file that has a non-empty journal file, then you know a write was interrupted, and you can read the position of the last write from the journal file and fseek to that spot. You lose the last partial write, but the file should not be corrupted.
Unfortunately, it seems like ftruncate() is asynchronous. In practice, even if I call fflush() and fsync() after ftruncate I see the journal grow to up to hundreds of bytes while doing lots of writes. It always ultimately ends up at 0, but I expected to see it at either size 0 or size 8 at all times.
Is it possible to make ftruncate completely synchronous? Or is there a better way to use the journal?
ftruncate() does not change your file descriptor's write offset in the file. If you are leaving the file open and writing the next length after calling ftruncate(), then what's happening is the file's offset is still increasing. When you write, it resets the length of the file to be at the offset and then writes your bytes there.
Probably what you want to do is call lseek(fd, 0, SEEK_SET) after you call ftruncate() so that the next write to the file will take place at the beginning of the file.
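A POSIX sketch of the journal cycle with that fix applied (function names are mine): without the lseek() after ftruncate(), the next write would land at the old offset and re-extend the file with a zero-filled gap, which is exactly the "journal grows to hundreds of bytes" symptom in the question.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>

// Record the data-file position in the journal before writing the data.
bool log_position(int journal_fd, std::uint64_t pos) {
    if (lseek(journal_fd, 0, SEEK_SET) != 0)    // rewind before writing
        return false;
    if (write(journal_fd, &pos, sizeof(pos)) != (ssize_t)sizeof(pos))
        return false;
    return fsync(journal_fd) == 0;               // force it to stable storage
}

// Empty the journal once the data write has completed.
bool clear_journal(int journal_fd) {
    if (ftruncate(journal_fd, 0) != 0)
        return false;
    lseek(journal_fd, 0, SEEK_SET);              // the fix: reset the offset too
    return fsync(journal_fd) == 0;
}
```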

is there a way to fopen a file that allows me to edit just a few bytes?

I am writing a class that compresses binary data using a zlib stream. I have a buffer that I fill with the output stream, and once it becomes full I dump the buffer out to a file using fopen(filename, "ab");... What this means is that my program only opens the file whenever it has a buffer full of data to dump; it writes the data and immediately closes the file.
The issue is in my format I use an 8 byte header at the beginning of each file which contains the original length and compressed length but I do not know these values until the end of the whole compression process.
What I wanted to do was write 8 bytes of zeros, then append with all my compressed data, then come back at the end during cleanup to fill in those 8 bytes with the size data, but I can't seem to find a way to open the file without bringing it all back into memory. I just want to edit the first 8 bytes of the file. Do I need to use mmap?
Since you're using the file in append mode, you do need to close and re-open it:
open with fopen(filename, "r+b");
write the 8 bytes;
close the file using fclose().
The r+ means
Open for reading and writing. The stream is positioned at the
beginning of the file.
and the b is needed to open in binary mode.
You can use this method to change the data at any position in the file, not just at the beginning: simply use fseek() to seek to the required position before writing.
Use rewind() to take the file pointer back to the start of the file after you write out the last few bytes of data (note that this requires a mode such as "r+b"; in append mode, writes go to the end regardless of the file position). You can then output your 8 bytes of length info.
If you have flexibility in changing your format, I might suggest this. Define your compressed stream such that it is a sequence of an unknown number of blocks, and each block is preceded by a fixed length integer specifying the number of bytes in the block. The stream is finished when the next block has a size of zero.
The drawback to this format is that there is no way for the reader of the stream to know how much data is coming until it has all been read. But the advantage is that it avoids the problem you are trying to solve.
More importantly, it allows you to send a compressed stream of data somewhere as you read the input and you don't have to save it all before sending it. For example, you could write a compression Unix filter that you could put in a pipe stream:
prog1 | yourprog -compress | rsh host yourprog -expand | prog2
Good luck.
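For what it's worth, a sketch of the length-prefixed block format suggested above (the 32-bit length field and function names are assumptions): each block is a length followed by that many bytes, and a zero length terminates the stream, so nothing ever needs to be patched after the fact.

```cpp
#include <cstdio>
#include <cstdint>
#include <string>
#include <vector>

// Write one block: a 32-bit length, then len bytes of payload.
// A length of zero marks the end of the stream.
void write_block(FILE *f, const char *data, std::uint32_t len) {
    std::fwrite(&len, sizeof(len), 1, f);
    if (len > 0)
        std::fwrite(data, 1, len, f);
}

// Read blocks until the zero-length terminator (or EOF).
std::vector<std::string> read_blocks(FILE *f) {
    std::vector<std::string> blocks;
    std::uint32_t len;
    while (std::fread(&len, sizeof(len), 1, f) == 1 && len != 0) {
        std::string block(len, '\0');
        std::fread(&block[0], 1, len, f);
        blocks.push_back(block);
    }
    return blocks;
}
```

Note that this sketch writes the length in host byte order; a real format would fix the endianness so the file is portable between machines.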