How to cut a file without using another file? - c++

Is it possible to delete part of a file (let's say from the beginning to its half), without having to use another file?
Thank's!

Yes, it is possible, but still you'll have to rewrite most of the file.
The rough idea is as follows:
open the file
beg = find the start of the fragment to be removed
len = length of the fragment to be removed
blocksize = 4096 -- example block size, may be any
datamoved = 0
do {
fseek(pos +len +datamoved);
if( endoffile ) return; -- finished!
actualread = fread(buffer, blocksize)
fseek(pos + datamoved)
fwrite(buffer, actualread)
datamoved += actualread
}
and the last step after the loop is to 'truncate' the file to the pos+datamoved size. if the underlying filesystem does not handle 'truncatefile' operation, then you have to rewrite.. but most of filesystems and libraries do support that.

The short answer is that no, most file systems don't attempt to support operations like that.
That leaves you with two choices. The obvious one is to create a copy of the data, leaving out the parts you don't want. You can do this either in-place (i.e., moving the data around in the same file) or by using an auxiliary file, typically copying the data to the new file, then doing something like renaming the new file to the old name.
The other major choice is to simply re-structure your file and data so you don't have to get rid of the old data at all. For example, if you want to keep the most recent N amount of data from a process, you might structure (most of) the file as a circular buffer, with a couple of "pointers" at the beginning tell you the head and tail points, so you know where to read data from/write data to. With a structure like this, you don't erase or remove the old data, you just overwrite it as needed.

If you have enough memory, read its contents fully to the memory, copy it back to the front of the file, and truncate the file.
If you do not have enough memory, copy in blocks, and only when you are done truncate the file.

Related

How to reduce the size of a fstream file in C++

What is the best way to cut the end off of a fstream file in C++ 11
I am writing a data persistence class to store audio for my audio editor. I have chosen to use fstream (possibly a bad idea) to create a random access binary read write file.
Each time I record a little sound into my file I simply tack it onto the end of this file. Another internal data structure / file, contains pointers into the audio file and keeps track of edits.
When I undo a recording action and then do something else the last bit of the audio file becomes irrelevant. It is not referenced in the current state of the document and you cannot redo yourself back to a state where you can ever see it again. So I want to chop this part of the file off and start recording at the new end. I don’t need to cut out bitts in the middle, just off the end.
When the user quits this file will remain and be reloaded when they open the project up again.
In my application I expect the user to do this all the time and being able to do this might save me as much as 30% of the file size. This file will be long, potentially very, very long, so rewriting it to another file every time this happens is not a viable option.
Rewriting it when the user saves could be an option but it is still not that attractive.
I could stick a value at the start that says how long the file is supposed to be and then overwrite the end to recycle the space but in the mean time. If I wanted to continually update the data store file in case of crash this would mean I would be rewriting the start over and over again. I worry that this might be bad for flash drives. I could also recomputed the end of the useful part of the file on load, by analyzing the pointer file but in the mean time I would be wasting all that space potentially, and that is complicated.
Is there a simple call for this in the fstream API?
Am I using the wrong library? Note I want to stick to something generic STL I preferred, so I can keep the code as cross platform as possible.
I can’t seem to find it in the documentation and have looked for many hours. It is not the end of the earth but would make this a little simpler and potentially more efficient. Maybe I am just missing it somehow.
Thanks for your help
Andre’
Is there a simple call for this in the fstream API?
If you have C++17 compiler then use std::filesystem::resize_file. In previous standards there was no such thing in standard library.
With older compilers ... on Windows you can use SetFilePointer or SetFilePointerEx to set the current position to the size you want, then call SetEndOfFile. On Unixes you can use truncate or ftruncate. If you want portable code then you can use Boost.Filesystem. From it is simplest to migrate to std::filesystem in the future because the std::filesystem was mostly specified based on it.
If you have variable, that contains your current position in the file, you could seek back for the length of your "unnedeed chunk", and just continue to write from there.
// Somewhere in the begining of your code:
std::ofstream *file = new std::ofstream();
file->open("/home/user/my-audio/my-file.dat");
// ...... long story of writing data .......
// Lets say, we are on a one millin byte now (in the file)
int current_file_pos = 1000000;
// Your last chunk size:
int last_chunk_size = 12345;
// Your chunk, that you are saving
char *last_chunk = get_audio_chunk_to_save();
// Writing chunk
file->write(last_chunk, last_chunk_size);
// Moving pointer:
current_file_pos += last_chunk_size;
// Lets undo it now!
current_file_pos -= last_chunk_size;
file->seekp(current_file_pos);
// Now you can write new chunks from the place, where you were before writing and unding the last one!
// .....
// When you want to finally write file to disk, you just close it
file->close();
// And when, truncate it to the size of current_file_pos
truncate("/home/user/my-audio/my-file.dat", current_file_pos);
Unfortunatelly, you'll have to write a crossplatform function truncate, that would call SetEndOfFile in windows, and truncate in linux. It's easy enough with using preprocessor macros.

zlib's compress function is not doing anything. Why?

before = new unsigned char[mSizeNeeded*4];
uLong value = compressBound(mSizeNeeded*4);
after = new unsigned char[value];
compress(after, &value, before, mSizeNeeded*4);
fwrite(&after, 1, value, file);
'before' has a bunch of audio data stored into it and I am trying to compress it and store it into 'after'. I then write it into a file. The file is the same size as the original file, it also contains the same data that was in before (as far as I can tell).
Compress also returns OK so I know that the compression is not failing.
Okay, so it looks like my only problem is somewhere in the compression (I think). I am able to run compress and then I can uncompress and get the correct data out. Also, it is writing into the file and fwrite returns 561152 but the count (value) is 684964. So it looks like something is wrong with fwrite. I looked more carefully and the after data is different than the before data.
561152 is the same size as the original audio data in a .wav file that I have (stripped of the .wav headers of course).
Based on your original text:
fwrite (&before, ...
I am trying to compress it and store it into 'after'. I then write it into a file.
I think not. You are writing the original data to the file, you should probably be writing after instead.
The other thing you should get in the habit of doing is checking return values from functions that you care about. In other words, compress() will tell you if a problem occurs yet you seem to be totally ignoring the possibility.
Similarly, fwrite() also uses its return value to indicate whether it was successful or not. Since you haven't included the code showing how that's set up, this is also a distinct possibility. In particular fwrite is under no obligation to write your entire block to the file in one hit (device may be full, etc), that's why it has a return value, so you can detect and adjust for that situation. Often, a better option than:
fwrite (&after, 1, value, file);
is:
fwrite (&after, value, 1, file);
since the latter will always give you one for a fully successful write, something else for a failure of some description.
That would be my first step in establishing where the problem lies.
On top of that, there are numerous other (generally-applicable) methods you can use to track down the issue, such as:
outputting all variables after they change or are set (like the return values of functions, after, before, value and so on).
delete the output file before running your program, to ensure it's created afresh.
run the code through a debugger so you can see what's happening under the covers.
clearing after to all zero bytes (or a known pattern) to ensure you don't get stale data in there.
And, as a final approach (given that the zlib source code is freely available), you can also modify (or debug into) it so that you can clearly see what's going on under the covers.

What is the most efficient way to remove first N bytes from a file on Windows?

Say, I have a file of an arbitrary length S and I need to remove first of its N bytes (where N is much less than S.) What is the most efficient way to do it on Windows?
I'm looking for a WinAPI to do this, if one is available.
Otherwise, what are my options -- to load it into RAM and then re-write the existing file with the remainder of data? (In this case I cannot be sure that the PC has enough RAM?) Or write the remainder of file data into a new file, erase the old one, and rename the new file into the old one. (In this case what to do if any of these steps fail? Plus how about defragmentation that this method causes on disk?)
There is no general way to do this built into the OS. There are theoretical ways to edit the file system's data structures underneath the operating system on sector or cluster boundaries, but this is different for each file system, and would need to violate any security model.
To accomplish this you can read in the data starting at byte N in chunks of say 4k, and then write them back out starting at byte zero, and then use the file truncate command (setendoffile) to set the new smaller end of file when you are finished copying the data.
The most efficient method to delete data at the beginning of the file is to modify the directory entry, on the hard drive, that tells where the data starts. Again, this is the most efficient method.
Note: This may not be possible, if the data must start on a new boundary. If this is the case, you may have to write the remainder data on the sector(s) to new sector(s), essentially moving the data.
The preferred method is to write a new file that starts with data copied after the deleted area.
Moving files on same drive is faster than copying files since data isn't duplicated; only the file pointer, (symbolic)links & file allocation/index table is updated.
The move command in CMD could be modified to allow user to set file start & end markers, effecting file truncation without copying file data, saving valuable time & RAM/Disk overheads.
Alternative would be to send the commands direct to the device/disk driver bypassing the Operating System as long as OS knows where to find the file & file properties eg. file size, name & sectors occupied on disk.

Append to a JSON array in a JSON file on disk, every second using C++

This is my first post here, so please bear with me.
I have searched high and low on the internet for an answer, but I've not been able to resolve my issue, so I have decided to write a post here.
I am trying to write(append) to a JSON array on file using C++ and JZON, at intervals of 1 write each second. The JSON file is initially written by a “Prepare” function. Another function is then called each second to a add an array to the JSON file and append an new object to the array every second.
I have tried many things, most of which resulted in all sorts of issues. My latest attempt gave me the best results and this is the code that I have included below. However, the approach I took is very inefficient as I am writing an entire array every second. This is having a massive hit on CPU utilisation as the array grows, but not so much on memory as I had first anticipated.
What I really would like to be able to do is to append to an existing array contained in a JSON file on disk, line by line, rather than having to clear the entire array from the JSON object and rewriting the entire file, each and every second.
I am hoping that some of the geniuses on this website will be able to point me in the right direction.
Thank you very much in advance.
Here is my code:
//Create some object somewhere at the top of the cpp file
Jzon::Object jsonFlight;
Jzon::Array jsonFlightPath;
Jzon::Object jsonCoordinates;
int PrepareFlight(const char* jsonfilename) {
//...SOME PREPARE FUNCTION STUFF GOES HERE...
//Add the Flight Information to the jsonFlight root JSON Object
jsonFlight.Add("Flight Number", flightnum);
jsonFlight.Add("Origin", originicao);
jsonFlight.Add("Destination", desticao);
jsonFlight.Add("Pilot in Command", pic);
//Write the jsonFlight object to a .json file on disk. Filename is passed in as a param of the function.
Jzon::FileWriter::WriteFile(jsonfilename, jsonFlight, Jzon::NoFormat);
return 0;
}
int UpdateJSON_FlightPath(ACFT_PARAM* pS, const char* jsonfilename) {
//Add the current returned coordinates to the jsonCoordinates jzon object
jsonCoordinates.Add("altitude", pS-> altitude);
jsonCoordinates.Add("latitude", pS-> latitude);
jsonCoordinates.Add("longitude", pS-> longitude);
//Add the Coordinates to the FlightPath then clear the coordinates.
jsonFlightPath.Add(jsonCoordinates);
jsonCoordinates.Clear();
//Now add the entire flightpath array to the jsonFlight object.
jsonFlight.Add("Flightpath", jsonFlightPath);
//write the jsonFlight object to a JSON file on disk.
Jzon::FileWriter::WriteFile(jsonfilename, jsonFlight, Jzon::NoFormat);
//Remove the entire jsonFlighPath array from the jsonFlight object to avoid duplicaiton next time the function executes.
jsonFlight.Remove("Flightpath");
return 0;
}
For sure you can do "flat file" storage yourself.. but this is a symptom of needing a database. Something very light like SQLite, or mid-weight & open-source like MySQL, FireBird, or PostgreSQL.
But as to your question:
1) Leave the closing ] bracket off, and just keep the file open & appending -- but if you don't close the file correctly, it will be damaged & need repair to be readable.
2) Your current option -- writing a complete file each time -- isn't safe from data loss either, as the moment you "open to overwrite" you lose all data previously stored in the file. The workaround here, is to rename the old file as a backup before you start writing.
You should also make backup copies of your file, with the first option. (Say at daily intervals). Otherwise data loss is likely to occur eventually -- on Ctrl-C, power loss, program error or system crash.
Of course if you use any of SQLlite, MySQL, Firebird or PostgreSQL all the data-integrity problems will be handled for you.

Truncate or resize a file in order to modify its end

I have a FILE* file that holds some binary data. Let's say that this data are a list of double and that the last entry is a string that describes what are those double. I want to modify this string (the new string might be shorter). So first i delete the old string. I need to find the starting point of the string :
fseek(file,-size(sring.size()),SEEK_END);
and then what should i do ? i found Delete End of File link but i don't know which one to use... Once the file is re-sized, can i simply write my new string using fwrite ?
Neither FILE* nor iostream support truncation. If you want to
edit a file so that the new file is shorter than the old, you
have two solutions:
The usual solution is to copy the original file into a new
file, making any changes as you go. When finished, close the
new file, verify that there are no errors (an important point),
then delete the original file and rename to new file to have the
original name. This may cause problems on Unix systems if
there were hard links to the original file. (Typically, this
isn't an issue, since everyone uses soft links now. If it is,
you should stat the original, and if the st_nlink field is
greater than 1, copy the new file onto the original, and then
delete the new file.) On the other hand, it is the most generic
option; it works for all types of modifications, anywhere in the
file.
There are usually system specific functions at the lower level
to truncate a file. Under Unix, this is ftruncate. But
you'll need to find the byte count where you want to truncate
first; ftruncate requires an open file, but it won't truncate
at the current position in the file. So you'll have to 1) find
the start of this last line in the file, 2) seek to it, 3) write
the new value, 4) call ftell (or ftello, if the length can
be too large to fit on a long) to find the new end position.
At this point, you have the problem of synchronizing your
FILE* with the lower level; personally, I'd fclose the file,
then reopen it with open, and do the ftruncate on the file
descripter from this open. (In fact, personally, I'd do the
entire job using open, read, lseek, write, ftruncate
and close. And maybe stat to find out the file length up
front. If you don't have to translate the doubles,
there's really nothing that FILE* adds.
As a general rule, I'd go with the first solution, and only try
the second if it turns out to be too slow. (If the file
contains a couple of billion doubles, for example, copying them
will take some time.)
If you want to resize a file, then ftruncate() (http://www.linuxmanpages.com/man2/ftruncate.2.php) is the function you're looking for. You'll need to call fileno() on the FILE * structure to get the file descriptor for ftruncate(), though.
As for appending the new data (the new string) once the file has been reduced in size, just seeking to the end (fseek(file, 0, SEEK_END)) and fwrite()'ing there should do it.
EDIT: remember to call fflush() before truncating the file!