Can I use gzseek to update a file compressed using gzwrite (C++)? - c++

I have a file written using gzwrite. Now I want to edit this file and insert some data in the middle by seeking. Is this possible with gzseek/gzwrite in C++?

No, it isn't possible. You have to create a new file by successively writing the pieces.
So it is not much different from inserting data in the middle of an uncompressed file, except for one thing: with the uncompressed file, you could leave a hole of the right size (a series of spaces, for example) and later on overwrite that with the data to be inserted, but of course that is not possible with the compressed file because you cannot predict its compressed length.
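For the uncompressed case, the "leave a hole and fill it in later" idea could look roughly like the sketch below. The function names write_with_hole and fill_hole are made up for illustration, and the hole is padded with spaces as in the example above.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Write head, then a placeholder of `hole_size` spaces, then tail.
// Returns the byte offset of the hole so it can be filled in later.
std::streamoff write_with_hole(const std::string& path, const std::string& head,
                               std::size_t hole_size, const std::string& tail) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << head;
    const std::streamoff hole_pos = out.tellp();
    out << std::string(hole_size, ' ') << tail;
    return hole_pos;
}

// Overwrite the hole in place. The file length never changes, so the
// caller must guarantee that the data fits into the reserved space.
bool fill_hole(const std::string& path, std::streamoff hole_pos,
               std::size_t hole_size, const std::string& data) {
    assert(data.size() <= hole_size);
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    f.seekp(hole_pos);
    f.write(data.data(), static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(f);
}
```

This only works because every byte written lands at a known offset, which is exactly what a compressed stream cannot promise.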

Related

save/write data structure into binary file and then restore from it in c++

I have a char** array that stores all sort of things like struct, integers and characters.
char** disk = new char*[100];
I set every block of disk to 64 bytes and use memcpy to store different information.
Then, I need to save this disk into a file, in which I can restore from it again.
But I don't know how to save this data structure into a binary file. All I know is how to output some text and read it back from a text.txt file using ifstream. I don't think this is efficient, so my question is: what is the best way to write this disk into a file? How do I write into a binary file? And how can I restore it (read from it)? Could you please give me some examples?
Thanks!

C++ write to a specific part of text file without overwriting

I'm writing a simple calendar application which saves the data in a text file. I use the iCalendar-format, so my text file ends "END:VCALENDAR".
When the user adds a new event, the application should write the associated data at the end of the text file without overwriting "END:VCALENDAR", how can I do this? What about deleting an event which is saved in the middle of the text file? Is there a need to write the whole file again using the updated data? Many thanks.
You can't dynamically "expand" the file by writing in the middle of it.
You'll need to, either:
Deserialize the whole calendar to memory, then write it back (best option)
Read into memory everything which lies past the point where you want to insert the data, write your data, then write the stored file "tail"
There isn't any way of inserting into the middle of a file; the underlying OS doesn't support it. The usual technique is to copy the file into a temporary file, making whatever modifications you need along the way. Then, and only if there were no errors on the output of the copy (do verify that the output stream has not failed after the close), delete the input file and rename/move the temporary file to the original name.
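A minimal sketch of that copy-then-rename technique, applied to the calendar case. The function name add_event_via_temp and the ".tmp" suffix are arbitrary choices, and the sketch assumes the file uses plain '\n' line endings.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Insert `event_text` before the "END:VCALENDAR" line by copying the
// calendar into a temporary file and then swapping it into place.
bool add_event_via_temp(const std::string& path, const std::string& event_text) {
    // The event must end with a newline so the terminator stays on its own line.
    assert(!event_text.empty() && event_text.back() == '\n');
    const std::string tmp = path + ".tmp";
    {
        std::ifstream in(path);
        std::ofstream out(tmp, std::ios::trunc);
        if (!in || !out) return false;

        std::string line;
        while (std::getline(in, line)) {
            if (line == "END:VCALENDAR") out << event_text;
            out << line << '\n';
        }
        out.close();
        if (out.fail()) {  // leave the original untouched if the copy failed
            std::remove(tmp.c_str());
            return false;
        }
    }  // both streams are closed here

    // Remove the original first: std::rename onto an existing file is
    // implementation-defined and fails on some platforms.
    if (std::remove(path.c_str()) != 0) return false;
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

Deleting an event in the middle follows the same pattern: skip the unwanted lines during the copy instead of adding new ones.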
The C++ standard library can append to a file, but it offers no way to insert at an arbitrary position, whether the file is text or binary.
There are two options for you then:
First is the one you are presuming, that is, read the whole file, update the data and write it back again.
Second is to seek in the file to the last line's first character E as in END:VCALENDAR, write your event and then append "END:VCALENDAR" to it.
And yes, you can find that first character of last line, E right after the last newline character, programmatically.
Sorry, there isn't really any other way around, as far as I know.
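The second option (seek to the E of the final END:VCALENDAR, write the event, then write the terminator again) might be sketched like this. The function name add_event_in_place is hypothetical, and the code assumes that END:VCALENDAR is the last thing in the file, followed at most by a line ending.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <string>

// Overwrite the trailing "END:VCALENDAR" with the new event and then
// write the terminator again, without rewriting the rest of the file.
bool add_event_in_place(const std::string& path, const std::string& event_text) {
    // The event must end with a newline so the terminator stays on its own line.
    assert(!event_text.empty() && event_text.back() == '\n');
    const std::string marker = "END:VCALENDAR";
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;

    f.seekg(0, std::ios::end);
    const std::streamoff size = f.tellg();
    // Only the tail of the file can hold the marker (+2 allows for "\r\n").
    const std::streamoff want = static_cast<std::streamoff>(marker.size()) + 2;
    const std::streamoff tail_len = std::min(size, want);
    std::string tail(static_cast<std::size_t>(tail_len), '\0');
    f.seekg(size - tail_len);
    f.read(&tail[0], tail_len);

    const std::size_t found = tail.rfind(marker);
    if (!f || found == std::string::npos) return false;

    // Position the write pointer on the 'E' and write event + terminator.
    f.seekp(size - tail_len + static_cast<std::streamoff>(found));
    f << event_text << marker << '\n';
    return static_cast<bool>(f);
}
```

This works because the file only ever grows at the end; deleting an event from the middle still needs one of the rewrite approaches.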

Delete record from file C++

I'm working on a simple database console application in C++ for adding, editing and deleting records in a .dat file. I have the addition and modification down, I'm just finding it hard to understand the concept of deletion in this scenario. Below is how I write a record.
Write record
fh.seekp(num*sizeof(customerObj),ios::beg); // Move the write pointer to where rec is
fh.write((char*)&customerObj,sizeof(customerObj)); // Write updated rec
Any ideas how instead of write() I could have something equivalent to delete()... or is it not that simple?
C and C++ don't have functions to delete parts of files. Many operating systems don't either.
Possible options:
If this is the last record, truncate the file. If not, move (=copy) all records after it, overwriting it, then truncate. Alternatively you could move (=copy) the last record to it and then truncate.
Create an extra file and copy to it all records before this and after this record. Then delete the old file and rename the new file.
Mark the record as unused. When writing new records check if you have any unused locations and use them first.
Use a file per record.
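As an illustration of the "move the last record over it, then truncate" option, here is a rough sketch. It assumes fixed-size records and C++17 (for std::filesystem::resize_file, since the stream classes themselves cannot truncate); the function name delete_record is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Delete fixed-size record number `index` by overwriting it with the last
// record and cutting one record off the end. Record order is not preserved.
bool delete_record(const std::string& path, std::size_t record_size, std::uintmax_t index) {
    assert(record_size > 0);
    namespace fs = std::filesystem;
    std::error_code ec;
    const std::uintmax_t file_size = fs::file_size(path, ec);
    if (ec) return false;
    const std::uintmax_t count = file_size / record_size;
    if (index >= count) return false;

    if (index != count - 1) {
        std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
        if (!f) return false;
        std::vector<char> last(record_size);
        f.seekg(static_cast<std::streamoff>((count - 1) * record_size));
        f.read(last.data(), static_cast<std::streamsize>(record_size));
        f.seekp(static_cast<std::streamoff>(index * record_size));
        f.write(last.data(), static_cast<std::streamsize>(record_size));
        f.flush();
        if (!f) return false;
    }  // the stream is closed before the file is truncated

    fs::resize_file(path, (count - 1) * record_size, ec);
    return !ec;
}
```

If the order of records matters, the same truncation step can follow a loop that shifts every later record down by one slot instead.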
Introduce a marker for deleted records. Then, instead of moving large chunks of the file, you only need to write one symbol. When you need to allocate a new record, you can iterate over the deleted ones and reuse one by removing its marker.
Well, deleting from files is not trivial as you cannot delete a row from a file directly.
One approach is to read the entire file (or read it in chunks) and write it back without the unwanted line. This is quite robust, but not efficient for large files.
If you divide your record file into smaller partitioned files, then doing the above will be more efficient.
Another thing you can do is just mark a row in the file as invalid (much as memory is only marked free when you delete a pointer) and overwrite it when needed. This of course depends on how you write your records, but I hope you get what I mean.
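The marker idea could be sketched as follows. The Customer layout here is hypothetical (the question does not show customerObj); the only requirement is that each fixed-size record starts with a one-byte deleted flag.

```cpp
#include <cassert>
#include <fstream>

// Hypothetical fixed-size record; `deleted` is the marker byte.
struct Customer {
    char deleted;   // 0 = live, 1 = deleted
    char name[31];
};
static_assert(sizeof(Customer) == 32, "record layout has no padding");

// Mark record `num` as deleted by rewriting only its flag byte.
bool mark_deleted(std::fstream& fh, std::size_t num) {
    assert(fh.is_open());
    fh.clear();
    fh.seekp(static_cast<std::streamoff>(num * sizeof(Customer)));
    const char flag = 1;
    fh.write(&flag, 1);
    return static_cast<bool>(fh);
}

// Store a record in the first deleted slot, or append if there is none.
// Returns the record number used, or -1 on I/O failure.
long add_record(std::fstream& fh, Customer rec) {
    assert(fh.is_open());
    rec.deleted = 0;
    fh.clear();
    fh.seekg(0);
    Customer cur;
    long num = 0;
    while (fh.read(reinterpret_cast<char*>(&cur), sizeof cur)) {
        if (cur.deleted) break;  // reuse this slot
        ++num;
    }
    fh.clear();  // hitting end of file sets eofbit and failbit
    fh.seekp(static_cast<std::streamoff>(num * sizeof(Customer)));
    fh.write(reinterpret_cast<const char*>(&rec), sizeof rec);
    fh.flush();
    return fh ? num : -1;
}
```

Code that reads records then has to skip any record whose flag is set. The linear scan for a free slot is the simplest choice; a free list kept in a file header would avoid it.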

How do I truncate a file in Visual C++?

I have the following case. I have a big file say 1 KB. I want to read the first 100 bytes and then delete the 100 bytes read data from the file and then read next 100 bytes. To read 100 bytes is ok, but how do I delete 100 bytes from the file?
This is commonly done as a multiple-step process:
Rename the original file.
Write the data you want into a new file with the original file name.
Delete the old file with the temporary name that contains the data you no longer want.
That way, if something were to go wrong, you could simply restore the original file that you renamed. Moving a file from one place to another is implemented this way, as well.
However, if you don't want to do this, the SetEndOfFile function is another viable option to truncate the contents of a file in-place. From the documentation:
Sets the physical file size for the specified file to the current position of the file pointer.
The physical file size is also referred to as the end of the file. The SetEndOfFile function can be used to truncate or extend a file.
That wouldn't be called truncating; that term refers to removing data from the end, not the beginning. I'm not aware of any operating system where this is possible, other than by copying the contents of the file to a new file, starting at the 100th byte.
Deleting data that has been processed in a file is time consuming and in most cases not necessary.
Deleting data near the top or middle of the file requires writing a new file, which takes time and disk space. Most applications will read and process the entire file, then rename the file (with a backup extension). This is useful for debugging purposes. Deleting an entire file is often a faster operation than writing a new file without the processed data.
Deletions should only take place when necessary. For files, one can store an offset of where the valid data begins, thus reducing the need to delete data from a file. For security purposes, overwriting data in the file is often faster than creating a new file without the processed data.
First try writing your program to not delete data in the file. Only delete as necessary, after the program is robust and working correctly. Many people would suggest to only delete files when there is no more space on drive.
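Storing an offset of where the valid data begins, rather than deleting the bytes already read, might look like this minimal sketch. The function name read_next_chunk is made up, and where the offset is kept between runs (a variable, a small side file) is left open.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Read up to `n` bytes starting at `offset` and advance `offset` past them.
// Nothing is removed from the file; `offset` marks where unread data begins.
std::string read_next_chunk(const std::string& path, std::streamoff& offset, std::size_t n) {
    assert(offset >= 0);
    std::ifstream in(path, std::ios::binary);
    std::string chunk(n, '\0');
    in.seekg(offset);
    in.read(&chunk[0], static_cast<std::streamsize>(n));
    chunk.resize(static_cast<std::size_t>(in.gcount()));  // may be short at end of file
    offset += static_cast<std::streamoff>(chunk.size());
    return chunk;
}
```

Calling it repeatedly with n = 100 walks through the file 100 bytes at a time; an empty result means everything has been processed and the whole file can be deleted or archived in one step.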

C++ inserting a line into a file at a specific line number

I want to be able to read from an unsorted source text file (one record in each line), and insert the line/record into a destination text file by specifying the line number where it should be inserted.
Where to insert the line/record into the destination file will be determined by comparing the incoming line from the incoming file to the already ordered list in the destination file. (The destination file will start as an empty file and the data will be sorted and inserted into it one line at a time as the program iterates over the incoming file lines.)
Incoming File Example:
1 10/01/2008 line1data
2 11/01/2008 line2data
3 10/15/2008 line3data
Desired Destination File Example:
2 11/01/2008 line2data
3 10/15/2008 line3data
1 10/01/2008 line1data
I could do this by performing the sort in memory via a linked list or similar, but I want to allow this to scale to very large files. (And I am having fun trying to solve this problem as I am a C++ newbie :).)
One way to do this may be to open two file streams with fstream (one in and one out, or just one in/out stream), but then I run into the difficulty of finding and seeking to the right file position, because positions seem to be absolute byte offsets from the start of the file rather than line numbers :).
I'm sure problems like this have been tackled before, and I would appreciate advice on how to proceed in a manner that is good practice.
I'm using Visual Studio 2008 Pro C++, and I'm just learning C++.
The basic problem is that under common OSs, files are just streams of bytes. There is no concept of lines at the filesystem level. Those semantics have to be added as an additional layer on top of the OS-provided facilities. Although I have never used it, I believe that VMS has a record-oriented filesystem that would make what you want to do easier. But under Linux or Windows, you can't insert into the middle of a file without rewriting the rest of the file. It is similar to memory: at the lowest level, it's just a sequence of bytes, and if you want something more complex, like a linked list, it has to be added on top.
If the file is just a plain text file, then I'm afraid the only way to find a particular numbered line is to walk the file counting lines as you go.
The usual 'non-memory' way of doing what you're trying to do is to copy the file from the original to a temporary file, inserting the data at the right point, and then do a rename/replace of the original file.
Obviously, once you've done your insertion, you can copy the rest of the file in one big lump, because you don't care about counting lines any more.
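A sketch of that temporary-file approach for inserting at a given line number. The function name insert_line and the ".tmp" suffix are arbitrary, and the code assumes '\n' line endings.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Insert `new_line` so that it becomes line number `line_no` (1-based).
// If the file has fewer lines, the new line ends up at the end.
bool insert_line(const std::string& path, std::size_t line_no, const std::string& new_line) {
    assert(line_no >= 1);
    const std::string tmp = path + ".tmp";
    {
        std::ifstream in(path, std::ios::binary);
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!in || !out) return false;

        // Walk the file, counting lines, up to the insertion point.
        std::string line;
        for (std::size_t n = 1; n < line_no && std::getline(in, line); ++n)
            out << line << '\n';

        out << new_line << '\n';

        // Copy the rest in one lump. Streaming an empty rdbuf() would set
        // failbit on `out`, hence the peek() guard.
        if (in.peek() != std::char_traits<char>::eof()) out << in.rdbuf();

        out.close();
        if (out.fail()) {
            std::remove(tmp.c_str());
            return false;
        }
    }  // both streams are closed here

    if (std::remove(path.c_str()) != 0) return false;
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

Each call rewrites the whole file, so inserting N lines one at a time costs on the order of N² bytes of I/O. That is fine for small files but not for the very large inputs the question has in mind.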
A [distinctly-no-c++] solution would be to use the *nix sort tool, sorting on the second column of data. It might look something like this:
cat <file> | sort -k 2,2 > <file2> ; mv <file2> <file>
It's not exactly in-place, and it fails the request of using C++, but it does work :)
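Redirecting the output straight back into the input does not work, though:
cat <file> | sort -k 2,2 > <file>
The shell truncates <file> before cat gets to read it, so the data is lost. To sort in place, let sort write the output itself:
sort -k 2,2 -o <file> <file>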
* http://www.ss64.com/bash/sort.html - sort man page
One way to do this is not to keep the file sorted, but to use a separate index, using Berkeley DB. Each record in the db has the sort keys, and the offset into the main file. The advantage to this is that you can have multiple ways of sorting, without duplicating the text file. You can also change lines without rewriting the file by appending the changed line at the end, and updating the index to ignore the old line and point to the new one. We used this successfully for multi-GB text files that we had to make many small changes to.
Edit: The code I developed to do this is part of a larger package that can be downloaded here. The specific code is in the btree* files under source/IO.
Try a modified Bucket Sort. Assuming the id values lend themselves well to it, you'll get a much more efficient sorting algorithm. You may be able to enhance I/O efficiency by actually writing out the buckets (use small ones) as you scan, thus potentially reducing the amount of randomized file I/O you need. Or not.
Hopefully, there are some good code examples on how to insert a record based on line number into the destination file.
You can't insert contents into the middle of a file (i.e., without overwriting what was previously there); I'm not aware of any production-level filesystem that supports it.
I think the question is more about implementation rather than specific algorithms, specifically, handling very large datasets.
Suppose the source file has 2^32 lines of data. What would be an efficient way to sort the data?
Here's how I'd do it:
Parse the source file and extract the following information: sort key, offset of line in file, length of line. This information is written to another file. This produces a dataset of fixed size elements that is easy to index, call it the index file.
Use a modified merge sort. Recursively divide the index file until the number of elements to sort reaches some minimum. A true merge sort recurses down to 1 or 0 elements; I suggest stopping at 1024 or so, though this will need fine tuning. Load the block of data from the index file into memory, quicksort it, and write the data back to disk.
Perform the merge on the index file. This is tricky, but can be done like this: load a block of data from each source (1024 entries, say). Merge into a temporary output file and write. When a block is emptied, refill it. When no more source data is found, read the temporary file from the start and overwrite the two parts being merged - they should be adjacent. Obviously, the final merge doesn't need to copy the data (or even create a temporary file). Thinking about this step, it is probably possible to set up a naming convention for the merged index files so that the data doesn't need to overwrite the unmerged data (if you see what I mean).
Read the sorted index file and pull out from the source file the line of data and write to the result file.
It certainly won't be quick with all that file reading and writing, but it should be quite efficient. The real killer is the random seeking of the source file in the final step. Up to that point, the disk access is usually linear and should therefore be reasonably efficient.
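To make the index idea concrete, here is a simplified sketch of the first and last steps. It deliberately swaps the on-disk merge sort for an in-memory std::stable_sort over the index, so it only scales as far as the index fits in RAM. It also assumes lines of the form "<id> MM/DD/YYYY <data>" with numeric dates, '\n' line endings, and newest-first output as in the question's example. All names are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// One fixed-size index entry per source line.
struct IndexEntry {
    long key;               // date encoded as YYYYMMDD
    std::streamoff offset;  // where the line starts in the source file
    std::size_t length;     // line length without the newline
};

// Scan the source once and record key, offset and length for every line.
std::vector<IndexEntry> build_index(const std::string& source) {
    std::ifstream in(source, std::ios::binary);
    std::vector<IndexEntry> index;
    std::string line;
    std::streamoff pos = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string id, date;
        if (fields >> id >> date && date.size() == 10) {
            const long key = std::stol(date.substr(6, 4)) * 10000 +
                             std::stol(date.substr(0, 2)) * 100 +
                             std::stol(date.substr(3, 2));
            index.push_back({key, pos, line.size()});
        }
        pos += static_cast<std::streamoff>(line.size()) + 1;  // +1 for '\n'
    }
    return index;
}

// Sort the index, then pull each line out of the source in sorted order.
bool write_sorted(const std::string& source, const std::string& dest) {
    assert(source != dest);  // opening dest for writing truncates it
    std::vector<IndexEntry> index = build_index(source);
    std::stable_sort(index.begin(), index.end(),
                     [](const IndexEntry& a, const IndexEntry& b) { return a.key > b.key; });

    std::ifstream in(source, std::ios::binary);
    std::ofstream out(dest, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;
    std::string line;
    for (const IndexEntry& e : index) {
        line.resize(e.length);
        in.seekg(e.offset);  // random access into the source file
        in.read(&line[0], static_cast<std::streamsize>(e.length));
        out << line << '\n';
    }
    out.close();
    return !out.fail() && !in.fail();
}
```

The seekg inside the final loop is the random seeking mentioned above. For the full-scale version, the sorted blocks of IndexEntry would be written to disk and merged instead of sorted in one go.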