I want to write to a file without overwriting anything. It is a text file containing records. When I delete a specific record, I do not actually remove it from the file, I just put information in the header saying that it is deleted. How can I do this?
You cannot append to the BEGINNING of a file without having to rewrite it from scratch. It has to go at the end (which makes sense, since that's what the word "append" means).
If you want to be able to flag a record as deleted without reserving space for that flag, you'll need to place the information at the end, or rewrite everything.
A more sensible approach is indeed to reserve the space upfront - for example by placing a "deleted" field in each record.
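As a rough sketch of the reserved-field idea: the layout below is an assumption, not something from the question. It supposes every record is one line whose first byte is a status flag (' ' for live, 'D' for deleted), and the function name mark_deleted is made up for illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical layout: each record is one line, and its first byte is a
// status flag reserved when the record is written (' ' = live, 'D' = deleted).
// Flipping the flag overwrites exactly one byte, so the file size never changes.
bool mark_deleted(const std::string& path, std::streamoff record_start) {
    // in | out opens an existing file for update without truncating it.
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) return false;
    file.seekp(record_start);  // jump to the first byte of the record
    file.put('D');             // overwrite the flag in place
    return static_cast<bool>(file.flush());
}
```

The caller has to know the byte offset at which the record starts, for example from an index built while reading the file.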
One possible solution applies if certain characters are normally disallowed in records (it seems like each file is a record - please correct me if I'm wrong):
Combine such a character with a short word or number as the flag (e.g. #deleted#, or #000 if # is a character not normally allowed in records).
Then just overwrite whatever happens to be at the beginning of the record; it's deleted anyways so it shouldn't matter that you're overwriting part of it.
On the other hand, this probably isn't a good idea if you anticipate ever needing to recover 'deleted' files.
By the way - if you do append (at the end of the file) the deleted flag, note that it's very easy to check for it if you know the file size - just look at the end of the file.
I have 10 text files (named file0.txt to file9.txt) with arbitrary lengths and number of lines. I need to randomly pick a file, randomly access 1-3 lines from that file, process them and repeat until all the lines of all the files have been processed. This only needs to be done once. For the sake of this question let's say "process" means print the lines. Does anyone have any suggestions on how I can go about doing this without loading all the text files into memory?
There's not really any way to 'randomly access' (in the sense that you can randomly access a vector) lines in a text file since the only way to find the lines is to search the file linearly for newlines. This means you'll at least need to stream through the files once to access lines even if you don't load them fully into memory.
You could achieve what you're describing by passing over all the files once to count the number of lines in them and then passing over them again to pull out randomly selected lines. I'm not sure what the benefit of that would be though. What are you really trying to achieve?
You can scan the file once to index where each line starts, and keep that index in memory (or even persist it if you need to process the same file more than once).
Once you have that, you can seek to the beginning of a line and read it until newline/EOF before processing.
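A minimal sketch of that index-then-seek approach, using standard streams. The function names index_lines and read_line_at are invented for this example, and the file is opened in binary mode so the recorded offsets are plain byte positions.

```cpp
#include <fstream>
#include <string>
#include <vector>

// One pass over the file, recording the byte offset at which each line starts.
std::vector<std::streamoff> index_lines(const std::string& path) {
    std::vector<std::streamoff> offsets;
    std::ifstream in(path, std::ios::binary);
    std::string line;
    while (true) {
        std::streamoff pos = in.tellg();    // position before reading the line
        if (!std::getline(in, line)) break; // nothing left to read
        offsets.push_back(pos);
    }
    return offsets;
}

// Seeks straight to a previously recorded offset and reads up to newline/EOF.
std::string read_line_at(const std::string& path, std::streamoff offset) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(offset);
    std::string line;
    std::getline(in, line);
    return line;
}
```

Only the offsets are kept in memory, one integer per line, so the file contents themselves never need to be loaded in full.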
Suggestion:
1/ Make a copy of the files
2/ Erase a line when it is read
3/ update number of lines in file
That way you randomly pick a line that exists and that has not already been read.
Lots of reads/writes, though... not efficient.
I have a program where I need to write text lines to a log file very frequently. I would like to limit the number of lines in the log file to 1000. When I write lines to the file, it should append them normally. Once the file reaches 1000 lines, I'd like to get rid of the first line and then append the new one. Does anyone know if there is a way to do this without rewriting the entire file each time?
Generally it's a little bit better for a case like this to remove more than one line at a time from the beginning.
That is, if your limit is 1000 lines, and you hit 1000 lines, delete the first 300 or so, and then resume writing. That way, you're not performing the delete operation with every single line written thereafter, only every 300 times. If you need to persist 1000 lines, then instead keep up to 1300 and delete 300 when 1300 is reached.
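The batching idea might look roughly like this. The function name trim_log and its parameters are hypothetical; keep would be 1000 and slack 300 in the numbers above. Note that this still rewrites the file, just far less often.

```cpp
#include <cstddef>
#include <deque>
#include <fstream>
#include <string>

// Rewrites the log only once it has reached keep + slack lines, and then keeps
// just the newest `keep` lines. Returns true if a rewrite actually happened.
bool trim_log(const std::string& path, std::size_t keep, std::size_t slack) {
    std::deque<std::string> lines;
    {
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line)) lines.push_back(line);
    }
    if (lines.size() < keep + slack) return false;  // not worth rewriting yet
    while (lines.size() > keep) lines.pop_front();  // drop the oldest lines
    std::ofstream out(path, std::ios::trunc);
    for (const std::string& l : lines) out << l << '\n';
    return true;
}
```

In a real logger you would probably track the line count in memory rather than re-reading the file on every call; the re-read here just keeps the sketch self-contained.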
All files have to be aligned to FS cluster size. So, no, there's no way. You can append a line to a file, but you can't delete the first line without file rewriting.
You can use 2 files by turns.
Or use some buffer in memory and flush it periodically.
I think you still have to scan the file to find out how many lines are in the file at this moment. In that case, you can put it in some sort of buffer that you could easily add and delete from.
Then you do your logging and when you are done, you could "re-write" the file with the buffer (or only last 1000 lines).
Other alternatives are discussed above.
And yeah, try to avoid deleting line-by-line. Generally, it is a costly operation.
I've found some similar topics here and on CodeProject:
Small logger class;
Flexible logger class using standard streams in C++
http://www.codeproject.com/Articles/584794/Simple-logger-for-Cplusplus
Hope you find them useful :)
Any time you want to log, you can open the file, read your write index, jump to the position, and write the fixed-width log entry. When your index hits your upper threshold, simply set it back to 0.
There are a lot of warnings with this, though - first is that each proper log entry (assuming you close the file in between) will require an open, a read, a seek, a write, a seek, a write and a close - to find your index, go to it, write the new entry, then update your index. You also have the inherent issues of writing a fixed-size data element. Also, a human reader will depend on your content to know where the "beginning" of the file is. Most people expect "line 1" to be the first line.
I'm a much bigger advocate for simply having a few files and "rolling" them, so that each file on its own is coherent, but if you want just one file with a fixed number of lines, the circular buffer idea can work.
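For what it's worth, here is a sketch of the single-file circular buffer. The file layout is my own assumption: an 8-digit write index plus newline as a header, followed by fixed-width slots of 32 characters plus newline. The names log_circular, kHeaderBytes and kEntryWidth are made up, and messages are assumed not to contain newlines.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Assumed layout: "NNNNNNNN\n" header (next slot to write), then fixed-width slots.
const std::size_t kHeaderBytes = 9;  // 8 digits + '\n'
const std::size_t kEntryWidth = 32;  // payload bytes per slot, excluding '\n'

bool log_circular(const std::string& path, const std::string& message,
                  std::size_t max_entries) {
    const std::ios::openmode mode = std::ios::in | std::ios::out | std::ios::binary;
    std::fstream file(path, mode);
    if (!file) {
        // First use: create the file with a header pointing at slot 0.
        { std::ofstream create(path, std::ios::binary); create << "00000000\n"; }
        file.clear();
        file.open(path, mode);
        if (!file) return false;
    }
    // Read the write index from the header.
    std::string header(8, '0');
    file.read(&header[0], 8);
    if (!file) return false;
    const std::size_t slot = std::stoul(header) % max_entries;

    // Pad or truncate the message to the fixed slot width.
    std::string entry = message.substr(0, kEntryWidth);
    entry.resize(kEntryWidth, ' ');
    entry += '\n';

    // Jump to the slot and overwrite it in place.
    file.seekp(static_cast<std::streamoff>(kHeaderBytes + slot * (kEntryWidth + 1)));
    file.write(entry.data(), static_cast<std::streamsize>(entry.size()));

    // Store the next index; it wraps back to 0 at the threshold.
    std::string next = std::to_string((slot + 1) % max_entries);
    next.insert(0, 8 - next.size(), '0');
    file.seekp(0);
    file.write(next.data(), 8);
    return static_cast<bool>(file.flush());
}
```

This shows the open, read, seek, write, seek, write, close sequence from the warnings above; the oldest entry is whichever slot the header currently points at.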
When you only want to use one file, and the length of the lines is not constant, there is no way to do this without rewriting the whole file.
Depending on how often you are appending to the file, I don't see any problem doing so. 1000 lines of approx. 100 chars are only approx. 100 KB, which is not too much. Additionally, you may add some hysteresis.
However:
If the line length is constant (or you hard-limit the line length to some constant), you could just overwrite the oldest line. But then you have to keep track of the positions of the oldest and newest lines in the log file.
I would use two files: The first one where you append lines. When the file gets full, rename it to a second one, and fill the first one from the beginning.
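A small sketch of the two-file scheme. The class name RollingLog and the ".old" suffix are invented for illustration; the rename uses std::rename from the C standard library, with the previous backup removed first because rename does not overwrite an existing target on every platform.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>

// Appends to `path`; once it holds max_lines lines, the file is renamed to
// path + ".old" (replacing any previous backup) and a fresh file is started.
class RollingLog {
public:
    RollingLog(std::string path, std::size_t max_lines)
        : path_(std::move(path)), max_lines_(max_lines), count_(0) {
        // Count existing lines so an existing log is resumed correctly.
        std::ifstream in(path_);
        std::string line;
        while (std::getline(in, line)) ++count_;
    }

    void write(const std::string& line) {
        if (count_ >= max_lines_) {
            const std::string old = path_ + ".old";
            std::remove(old.c_str());                 // drop the previous backup
            std::rename(path_.c_str(), old.c_str());  // current file becomes the backup
            count_ = 0;
        }
        std::ofstream out(path_, std::ios::app);      // creates the file if needed
        out << line << '\n';
        ++count_;
    }

private:
    std::string path_;
    std::size_t max_lines_;
    std::size_t count_;
};
```

Between the two files you always have at least the last max_lines entries, and no file is ever rewritten.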
I'm writing a payroll program in C++ and need to be able to read lines in a file, do calculations, and then overwrite the read lines in the file. Is there a function/way I can simply overwrite specific lines, insert new lines, or add onto the end of an existing file?
There is no C++ functionality to "insert" or "remove" text in a text file. The only way to do that is to read the existing text in, and write out the modified text.
If the new text fits in the same space as the old one, all you need to do is to overwrite the existing text - and of course, you can always add extra spaces before/after a comma in a .CSV file, without it becoming part of the "field". But if the new data is longer, it definitely won't work to "overwrite in place".
Adding to the end is relatively easy by using the ios_base::app open mode (ios_base::ate only seeks to the end once after opening, and an output-only stream opened without app still truncates the file). But inserting in the middle still involves basically reading until you find the relevant place, and then, if the new text is longer, you have to read all the following lines before you can write the new one(s) out.
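Appending is the simple case. A minimal sketch, using std::ios::app so that every write lands at the end of the file and the existing contents are kept; the function name append_line is made up for this example.

```cpp
#include <fstream>
#include <string>

// Opens the file in append mode: existing contents are preserved and each
// write is positioned at the end of the file.
bool append_line(const std::string& path, const std::string& line) {
    std::ofstream out(path, std::ios::app);
    if (!out) return false;
    out << line << '\n';
    return static_cast<bool>(out.flush());
}
```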
I'm opening a file and getting a QTextStream of it. I am then reading the stream line by line using readLine(). When the line matches a certain string, I need to replace it with another string. I need the behaviour to be that the line is completely replaced (ie, if the line was "longword" and I replace it with "word", the line should contain "word" and "word" only).
At the moment I am using seek() and then the << operator to put my string in at the given location, but the remnants of the last string remain, so I am left with something like "wordword". How can I prevent this from happening and ensure the entire previous line is fully replaced with my new one?
To my knowledge, you cannot simply remove a chunk of a text file in-place. If the replacement string was identical in size, you might be able to replace those exact bytes, and if it were shorter you might be able to hack around the problem by filling the empty space with nulls.
If you didn't want to do that, you would have to create a new file, read each line from the old file, make any required changes to that line in memory, then write that line out to the new file. Once this is complete, you could then replace the original file with the new file.
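The copy-and-replace route could be sketched like this. It uses standard C++ streams rather than QTextStream, and the function name replace_line and the ".tmp" suffix are assumptions for the example.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Copies `path` line by line into a temporary file, swapping every line equal
// to `from` for `to`, then replaces the original file with the temporary one.
bool replace_line(const std::string& path, const std::string& from,
                  const std::string& to) {
    const std::string tmp = path + ".tmp";
    {
        std::ifstream in(path);
        std::ofstream out(tmp, std::ios::trunc);
        if (!in || !out) return false;
        std::string line;
        while (std::getline(in, line)) {
            out << (line == from ? to : line) << '\n';
        }
        if (!out.flush()) return false;
    }  // both streams are closed here, before the files are swapped
    std::remove(path.c_str());  // rename may not overwrite on every platform
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

Because whole lines are rewritten, replacing "longword" with "word" leaves no leftover characters behind.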
If it were possible to add/remove chunks to/from the file, you would most likely be left with a considerably fragmented file on the HDD. If you needed to insert more characters, extra fragments would have to be created as the new data simply couldn't fit in the amount of space occupied by the old data, and removing data would leave holes in the file.
Is there a way that I can seek to a certain line in a file to read or write data?
Let's say I want to write some data starting on the 10th line in a text file. There might be some data already in the first few lines, or the file could even be empty. Is there a way I can seek directly to the line I want without having to worry about what's already in the file?
Only if the lines are all the same length (seek to 9 * bytes_per_line). Otherwise, you'll just have to scan your way to the appropriate spot in the file.
Also be wary of writing into the middle of a file. It may not do what you expect (insert new lines). It will simply overwrite whatever content is already there, and won't respect existing line boundaries.
You can seek to a position in a file, but that position must be a character offset from the start, end or current position - see for example fseek(). There is no way of seeking to a particular line, unless all the lines are exactly the same length.
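For the fixed-length case, the seek is just arithmetic. A sketch, assuming every line occupies exactly bytes_per_line bytes including its newline; the function name read_fixed_line is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// line_number is 1-based; bytes_per_line includes the trailing '\n'.
// Only valid when every line in the file has exactly that length.
std::string read_fixed_line(const std::string& path, std::size_t line_number,
                            std::size_t bytes_per_line) {
    std::ifstream in(path, std::ios::binary);
    // Line N starts (N - 1) full lines after the beginning of the file.
    in.seekg(static_cast<std::streamoff>((line_number - 1) * bytes_per_line));
    std::string line;
    std::getline(in, line);
    return line;
}
```

So the 10th line starts at byte offset 9 * bytes_per_line, with no scanning needed.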
No, you have to process the data to find the line delimiters (unless you have fixed length lines). Have a look at getline(), ftell() and fseek(). http://www.pixelbeat.org/programming/readline/cpp.cpp
The easiest way is to read the file into memory, storing each line in, for instance, a vector of strings, then modify/add whatever you want, and write each line out to a new file.
(supposing the file fits in memory)
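Something along these lines, as a sketch only. The function name set_line and its parameters are made up; padding with empty lines when the file is shorter than the target line is my own choice for handling the "file could even be empty" case.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads in_path into a vector of lines, sets line `line_number` (1-based) to
// `text`, padding with empty lines if the file is shorter, and writes the
// result to out_path.
bool set_line(const std::string& in_path, const std::string& out_path,
              std::size_t line_number, const std::string& text) {
    if (line_number == 0) return false;
    std::vector<std::string> lines;
    {
        std::ifstream in(in_path);  // a missing file simply yields no lines
        std::string line;
        while (std::getline(in, line)) lines.push_back(line);
    }
    if (lines.size() < line_number) lines.resize(line_number);
    lines[line_number - 1] = text;
    std::ofstream out(out_path, std::ios::trunc);
    if (!out) return false;
    for (const std::string& l : lines) out << l << '\n';
    return static_cast<bool>(out.flush());
}
```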