I've been having some trouble achieving this functionality. I haven't been able to find code resolving this specific issue anywhere.
Thank you for taking the time to help me; it means a lot.
I solved it by writing my own getline() loop inside a while() with an integer that increases with every line. The line whose number matches the one I entered is skipped and every other line is written to a new temporary file. Then the original file is removed and the temporary file is renamed to the original name. Many thanks.
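A minimal sketch of that approach follows. It assumes 1-based line numbers, and both the function name `delete_line` and the temporary file name `path + ".tmp"` are hypothetical. `std::remove` and `std::rename` are the standard `<cstdio>` functions.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Copies every line except line `target` (1-based) to a temporary file,
// then replaces the original with it. Returns false on any I/O failure.
bool delete_line(const std::string& path, std::size_t target) {
    const std::string tmp = path + ".tmp";  // hypothetical temporary name
    {
        std::ifstream in(path);
        std::ofstream out(tmp);
        if (!in || !out) return false;
        std::string line;
        std::size_t number = 0;  // increases with every line read
        while (std::getline(in, line)) {
            if (++number != target) out << line << '\n';  // skip only the target line
        }
        if (!out) return false;
    }  // both streams are closed here, before remove/rename
    if (std::remove(path.c_str()) != 0) return false;
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

The explicit `std::remove` is there because `std::rename` onto an existing file is implementation-defined: POSIX replaces the target, while Windows fails.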
Related
Suppose I have a really large text file, say 100 million lines or 1 GB, and I want to delete the last line. Is there any way to do this without rewriting 99,999,999 lines to a new file and deleting the old one? Suppose the file is so large that the rewrite option is prohibitively expensive. What would you do to delete the last line then? Thank you.
You can open the file, read from the end backwards until you find the first line delimiter (normally LF or CR/LF, depending on platform), calculate the file offset at that point, and truncate the file to that file offset.
You should use a truncation function, but neither FILE* nor iostream support it.
However, there are usually OS-specific functions at the lower level to truncate a file.
On Unix, you can use ftruncate(), but you first need to find the offset where you want to truncate (does each line have a fixed size?).
Be careful: if you opened a FILE* to find the offset, make sure it is synchronized with the lower level. The simplest way is to fclose() the file, then reopen it with open() and call ftruncate() on the descriptor at the chosen offset.
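Here is a POSIX-only sketch of those steps. It finds the offset with a FILE*, closes it with fclose(), reopens the file with open() and calls ftruncate(). The name `delete_last_line` is hypothetical. It uses fseeko/ftello so offsets past 2 GB work, and it handles LF and CRLF files: a '\r' before the kept '\n' stays as part of the previous line's terminator. Reading backwards one byte at a time is acceptable here because only the last line is scanned, never the rest of the file.

```cpp
#include <fcntl.h>      // open, O_WRONLY
#include <stdio.h>      // FILE, fopen, fseeko, ftello, fgetc, fclose
#include <sys/types.h>  // off_t
#include <unistd.h>     // ftruncate, close

// Removes the last line of the file at `path` in place, without rewriting
// the rest of the file. POSIX-only. Returns 0 on success, -1 on error.
int delete_last_line(const char* path) {
    FILE* f = fopen(path, "rb");
    if (!f) return -1;
    if (fseeko(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    off_t pos = ftello(f) - 1;  // offset of the last byte (-1 if the file is empty)

    // A newline in the final byte terminates the last line itself; start before it.
    if (pos >= 0) {
        fseeko(f, pos, SEEK_SET);
        if (fgetc(f) == '\n') --pos;
    }

    // Scan backwards for the newline that ends the previous line.
    off_t new_size = 0;  // stays 0 if the file contains a single line
    for (; pos >= 0; --pos) {
        fseeko(f, pos, SEEK_SET);
        if (fgetc(f) == '\n') { new_size = pos + 1; break; }
    }
    fclose(f);  // done with the FILE* before working at the descriptor level

    int fd = open(path, O_WRONLY);
    if (fd < 0) return -1;
    int rc = ftruncate(fd, new_size);
    close(fd);
    return rc;
}
```

If C++17 is available, `std::filesystem::resize_file(path, new_size)` is a portable alternative to the open()/ftruncate() pair.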
Similar questions: https://stackoverflow.com/a/873653/2741329 and https://stackoverflow.com/a/15154682/2741329
I have 10 text files (named file0.txt to file9.txt) with arbitrary lengths and number of lines. I need to randomly pick a file, randomly access 1-3 lines from that file, process them and repeat until all the lines of all the files have been processed. This only needs to be done once. For the sake of this question let's say "process" means print the lines. Does anyone have any suggestions on how I can go about doing this without loading all the text files into memory?
There's not really any way to 'randomly access' (in the sense that you can randomly access a vector) lines in a text file since the only way to find the lines is to search the file linearly for newlines. This means you'll at least need to stream through the files once to access lines even if you don't load them fully into memory.
You could achieve what you're describing by passing over all the files once to count the number of lines in them and then passing over them again to pull out randomly selected lines. I'm not sure what the benefit of that would be though. What are you really trying to achieve?
You can scan the file once to index where each line starts, and keep that index in memory (or even persist it if you need to process the same file more than once).
Once you have that, you can seek to the beginning of a line and read it until newline/EOF before processing it.
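A minimal sketch of that indexing idea for one file follows. The function names are hypothetical. The sketch assumes the file is opened in binary mode so that `tellg()` offsets are plain byte positions on every platform. For the original question, you could build one index per file, shuffle all the (file, offset) pairs with `std::shuffle`, and read each line exactly once.

```cpp
#include <fstream>
#include <string>
#include <vector>

// One linear pass: record the byte offset at which every line starts.
std::vector<std::streamoff> index_lines(std::istream& in) {
    std::vector<std::streamoff> offsets;
    std::string line;
    std::streamoff pos = in.tellg();
    while (std::getline(in, line)) {
        offsets.push_back(pos);  // this line started at `pos`
        pos = in.tellg();        // the next line (if any) starts here
    }
    in.clear();  // reset eof/fail so the stream can seek again
    return offsets;
}

// Random access afterwards: seek to a recorded offset and read one line.
std::string read_line_at(std::istream& in, std::streamoff offset) {
    in.clear();
    in.seekg(offset, std::ios::beg);
    std::string line;
    std::getline(in, line);
    return line;
}
```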
Suggestion:
1/ Make a copy of the files
2/ Erase a line when it is read
3/ Update the number of lines in the file
That way you randomly pick a line that exists and has not already been read.
It's a lot of reading and writing, though, so it's not efficient.
I have a program where I need to write text lines to a log file very frequently. I would like to limit the number of lines in the log file to 1000. When I write lines to the file, it should append them normally. Once the file reaches 1000 lines, I'd like to get rid of the first line and then append the new one. Does anyone know if there is a way to do this without rewriting the entire file each time?
Generally it's a little bit better for a case like this to remove more than one line at a time from the beginning.
That is, if your limit is 1000 lines, and you hit 1000 lines, delete the first 300 or so, and then resume writing. That way, you're not performing the delete operation with every single line written thereafter, only every 300 times. If you need to persist 1000 lines, then instead keep up to 1300 and delete 300 when 1300 is reached.
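A minimal sketch of this batching idea follows. `TrimmingLog`, `keep` and `slack` are hypothetical names, so `keep = 1000, slack = 300` gives the 1300/1000 scheme. The line count is kept in memory, so the expensive rewrite happens only once every `slack` appends.

```cpp
#include <cstddef>
#include <deque>
#include <fstream>
#include <string>
#include <utility>

// Appends lines normally; when the file reaches keep + slack lines,
// rewrites it once with only the newest `keep` lines.
class TrimmingLog {
public:
    TrimmingLog(std::string path, std::size_t keep, std::size_t slack)
        : path_(std::move(path)), keep_(keep), slack_(slack), count_(count_lines()) {}

    void append(const std::string& line) {
        {
            std::ofstream out(path_, std::ios::app);
            out << line << '\n';
        }  // close before a possible trim
        if (++count_ >= keep_ + slack_) trim();
    }

private:
    // Counted once at startup; afterwards the count is tracked in memory.
    std::size_t count_lines() const {
        std::ifstream in(path_);
        std::size_t n = 0;
        for (std::string line; std::getline(in, line);) ++n;
        return n;
    }

    // The expensive step: keep only the newest keep_ lines.
    void trim() {
        std::deque<std::string> last;
        {
            std::ifstream in(path_);
            for (std::string line; std::getline(in, line);) {
                last.push_back(line);
                if (last.size() > keep_) last.pop_front();
            }
        }
        std::ofstream out(path_, std::ios::trunc);
        for (const auto& l : last) out << l << '\n';
        count_ = last.size();
    }

    std::string path_;
    std::size_t keep_, slack_, count_;
};
```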
Files are stored in filesystem clusters (blocks), and there is no general way to cut an arbitrary number of bytes, such as one line, off the start of a file. So, no: you can append a line to a file, but you can't delete the first line without rewriting the file.
You can alternate between two files.
Or use some buffer in memory and flush it periodically.
I think you still have to scan the file to find out how many lines it currently contains. In that case, you can put the lines in some sort of buffer that is easy to add to and delete from.
Then you do your logging and when you are done, you could "re-write" the file with the buffer (or only last 1000 lines).
Other alternatives are discussed above.
And yeah, try to avoid deleting line-by-line. Generally, it is a costly operation.
I've found some similar topics here and on CodeProject:
Small logger class;
Flexible logger class using standard streams in C++
http://www.codeproject.com/Articles/584794/Simple-logger-for-Cplusplus
Hope you find them useful :)
Any time you want to log, you can open the file, read your write index, jump to the position, and write the fixed-width log entry. When your index hits your upper threshold, simply set it back to 0.
There are a lot of caveats with this, though. First, each log entry (assuming you close the file in between) requires an open, a read, a seek, a write, a seek, a write and a close: find your index, go to it, write the new entry, then update your index. You also have the inherent issues of writing a fixed-size data element. Also, a human reader will depend on your content to know where the "beginning" of the file is; most people expect "line 1" to be the first line.
I'm a much bigger advocate for simply having a few files and "rolling" them, so that each file on its own is coherent, but if you want just one file with a fixed number of lines, the circular buffer idea can work.
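A minimal sketch of that circular-buffer scheme follows. The file layout is an assumption made for this sketch: an 11-byte header holds the next slot index as 10 zero-padded digits plus '\n', followed by `slots` records of exactly `width` bytes each. `write_circular` is a hypothetical name. The function performs the open, read, seek, write, seek, write sequence described in the caveats above.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Assumed layout: header "NNNNNNNNNN\n" (next slot index), then `slots`
// fixed-width records, each padded with spaces and ending in '\n'.
constexpr std::streamoff kHeader = 11;

void write_circular(const std::string& path, const std::string& entry,
                    int slots, int width) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) {  // first use: create the file with index 0
        std::ofstream init(path, std::ios::binary);
        init << "0000000000\n";
        init.close();
        f.clear();
        f.open(path, std::ios::in | std::ios::out | std::ios::binary);
    }

    // Read the write index from the header.
    std::string header;
    std::getline(f, header);
    int index = std::stoi(header);

    // Build the fixed-width record: cut or pad the text, end with '\n'.
    std::string record = entry.substr(0, width - 1);
    record.resize(width - 1, ' ');
    record += '\n';

    // Jump to the slot and overwrite whatever was there.
    f.seekp(kHeader + static_cast<std::streamoff>(index) * width, std::ios::beg);
    f.write(record.data(), width);

    // Advance the index, wrapping back to 0 at the upper threshold.
    index = (index + 1) % slots;
    char buf[12];
    std::snprintf(buf, sizeof buf, "%010d\n", index);
    f.seekp(0, std::ios::beg);
    f.write(buf, kHeader);
}
```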
When you only want to use one file, and the length of the lines are not constant, there is no way without rewriting the whole file.
Depending on how often you append to the file, I don't see any problem with doing so: 1000 lines of about 100 characters each are only about 100 KB, which is not much. Additionally, you may add some hysteresis.
However:
If the line length is constant (or you hard-limit the line length to some constant), you could just overwrite the oldest line. But then you have to keep track of the positions of the oldest and newest lines in the log file.
I would use two files: The first one where you append lines. When the file gets full, rename it to a second one, and fill the first one from the beginning.
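A minimal sketch of the two-file approach follows. `RollingLog` is a hypothetical name, and the second file is assumed to be named `path + ".1"`. The sketch also assumes the log file starts empty or absent. After the first rollover, the two files together always hold at least the newest `max_lines` lines.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>

// Appends to `path`; when it holds `max_lines`, renames it to `path + ".1"`
// (replacing the older copy) and starts filling `path` from the beginning.
class RollingLog {
public:
    RollingLog(std::string path, std::size_t max_lines)
        : path_(std::move(path)), max_(max_lines) {}

    void append(const std::string& line) {
        if (count_ == max_) roll();
        std::ofstream out(path_, std::ios::app);
        out << line << '\n';
        ++count_;
    }

private:
    void roll() {
        const std::string old = path_ + ".1";
        std::remove(old.c_str());  // rename onto an existing file fails on some platforms
        std::rename(path_.c_str(), old.c_str());
        count_ = 0;  // the next append recreates `path`
    }

    std::string path_;
    std::size_t max_;
    std::size_t count_ = 0;  // assumes `path` starts empty or absent
};
```

Each file is coherent on its own, and no line is ever rewritten; the only extra cost is one rename per `max_lines` appends.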
I need some help with C++
I am trying to create a program with exercises to practice the different German cases.
Hard-coding all questions and respective answers seems like an awful lot of work, and super inefficient.
What I want my program to do is: grab a random line from file X, and grab the same line number from file Y. (This seems like the easiest way to get both questions and answers from external files.) To me, it seems most logical to get a random number and use that as the line number. But that's about as far as I got...
I know basic C++, but am very eager to learn.
Can anyone please explain to me how to pull this off, including all the necessary commands?
First, I would recommend that you store questions and answers in the same text file, probably by alternating between a question line and then an answer line. This will make correcting mistakes, adding/removing questions, and general maintenance of your data easier.
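If you go with a single file, reading it back could look like the sketch below. It assumes each question line is immediately followed by its answer line, and `load_pairs` is a hypothetical name.

```cpp
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Reads a file whose lines alternate: question, answer, question, answer...
// Returns the pairs in file order; a trailing question without an answer is dropped.
std::vector<std::pair<std::string, std::string>> load_pairs(const std::string& path) {
    std::ifstream in(path);
    std::vector<std::pair<std::string, std::string>> pairs;
    std::string question, answer;
    while (std::getline(in, question) && std::getline(in, answer)) {
        pairs.emplace_back(question, answer);
    }
    return pairs;
}
```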
But if you want to keep them in separate files, the following code snippet will read your text file and store the questions in an array (an STL vector), which you can then index or iterate any way you'd like:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::ifstream file("questions.txt");
    std::string line;
    std::vector<std::string> questions;
    while (std::getline(file, line))
    {
        questions.push_back(line);
    }
    // Now do something interesting with your questions. You can index them
    // like this: questions[5], or questions[random_index]
}
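To finish the random-question part with two separate files, here is a sketch that loads both files with the same loop and picks one random index used for both. It assumes line i of the answers file answers line i of the questions file. `load_lines` and `pick_pair` are hypothetical names. `std::mt19937` and `std::uniform_int_distribution` are the standard `<random>` facilities.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Reads every line of `path` into a vector (same loop as above).
std::vector<std::string> load_lines(const std::string& path) {
    std::ifstream file(path);
    std::vector<std::string> lines;
    for (std::string line; std::getline(file, line);) lines.push_back(line);
    return lines;
}

// Returns a random (question, answer) pair taken from the same line number
// in both lists, or two empty strings if either list is empty.
std::pair<std::string, std::string> pick_pair(const std::vector<std::string>& questions,
                                              const std::vector<std::string>& answers,
                                              std::mt19937& rng) {
    const std::size_t n = std::min(questions.size(), answers.size());
    if (n == 0) return {};
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);
    const std::size_t i = dist(rng);  // the same line number in both files
    return {questions[i], answers[i]};
}
```

Seed the generator once, for example with `std::mt19937 rng(std::random_device{}());`, and reuse it for every question.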
There are two ways of doing this:
If you are planning on getting question/answer pairs, you'd be best off reading the whole file line by line and storing all the lines. Then you just look the line up in the array.
If for some reason you only want to get one line at a time, you'll have to read lines and count until you reach the line you want.
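For the second option, here is a sketch that counts lines until it reaches the requested one. Line numbers are 0-based, which is an assumption chosen to match vector indexing, and `read_nth_line` is a hypothetical name. It rereads the file from the start on every call, which is why the first option is faster when you look up many lines.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Reads lines until reaching line `n` (0-based). Returns false if the
// file has fewer than n + 1 lines; otherwise stores the line in `out`.
bool read_nth_line(const std::string& path, std::size_t n, std::string& out) {
    std::ifstream in(path);
    std::string line;
    for (std::size_t i = 0; std::getline(in, line); ++i) {
        if (i == n) { out = line; return true; }
    }
    return false;
}
```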
You may have a keyword for each line, like an ID.
That ID can be paired with both questions and answers if you have multiple files. Or just pair each question with its answer in the same order, or even in the same file.
You are constructing a database.
You should use a database.
The problem is that the questions and answers are variable-length records, which makes positioning difficult. If all the records were the same length, you could position to a random record much faster.
To find a given text line, you need to read past all the newlines before it (since they are not in the same column in every line). This is fine if you only need to search once, but very slow if you search many times. Now comes the reason for the database.
To make finding the questions and answers faster, create an index file or table. (Starting to smell like a database). The index file will contain records of the form [question #, file position] where file position is the position in the question's file that the question starts on.
You would load this file into memory and use it to index into the questions file. By storing the index in a file, you won't have to construct it from scratch each time your program starts; only when the questions file changes.
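Here is a minimal sketch of such an index. The index file format, one `question# position` record per line, is an assumption chosen for illustration, and all function names are hypothetical. The questions file is opened in binary mode so each offset is a raw byte count (line length plus one for the '\n').

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Entry i is the byte offset where question i starts.
std::vector<std::streamoff> build_index(const std::string& questions_path) {
    std::ifstream in(questions_path, std::ios::binary);
    std::vector<std::streamoff> index;
    std::streamoff pos = 0;
    for (std::string line; std::getline(in, line);) {
        index.push_back(pos);
        pos += static_cast<std::streamoff>(line.size()) + 1;  // +1 for the '\n'
    }
    return index;
}

// Persist as "question# position" records, one per line.
void save_index(const std::vector<std::streamoff>& index, const std::string& index_path) {
    std::ofstream out(index_path);
    for (std::size_t i = 0; i < index.size(); ++i) out << i << ' ' << index[i] << '\n';
}

// Reload at startup instead of rescanning the questions file.
std::vector<std::streamoff> load_index(const std::string& index_path) {
    std::ifstream in(index_path);
    std::vector<std::streamoff> index;
    std::size_t number;
    std::streamoff pos;
    while (in >> number >> pos) index.push_back(pos);  // records are stored in order
    return index;
}

// Jump straight to one question using its recorded position.
std::string read_question(const std::string& questions_path, std::streamoff pos) {
    std::ifstream in(questions_path, std::ios::binary);
    in.seekg(pos, std::ios::beg);
    std::string line;
    std::getline(in, line);
    return line;
}
```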
I posted this over at Code Review Beta but noticed that there is much less activity there.
I have the following code and it works just fine. Its function is to read the input from a file and display it (to confirm that it has been read). My task is to write a program that counts how many times a certain word (string), "abc", is found in the input file.
Is it better to store the input as a single string, or in an array/vector with each line stored separately (a[1], a[2], etc.)? Perhaps someone could also point me to a resource for learning how to filter through the input data.
Thanks.
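#include <fstream>
#include <iostream>
#include <string>
int main()
{
    std::ifstream input_file("in.dat");
    std::string line;
    // Loop on getline() itself: checking eof() first runs one extra, failed iteration.
    while (std::getline(input_file, line))
        std::cout << line << '\n'; // Print the line that was just read.
} // input_file is closed automatically here.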
Reading as much of the file into memory as you can is generally more efficient than reading one character or text line at a time. Disk drives take a lot of time to spin up and seek to a sector, so your program will run faster if you minimize the number of reads from the file.
Memory is fast to search.
My recommendation is to read the entire file, or as much of it as you can, into memory, then search the memory for a "word". Remember that in English, words can contain hyphens ('-') and apostrophes ("don't"). Word recognition may become more difficult if a word is split across a line or you include abbreviations (with periods).
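Here is a sketch of that read-everything-then-search approach. `count_occurrences` is a hypothetical name. It counts raw, non-overlapping substring matches, so "abc" inside "xabcx" also counts. Deciding what counts as a whole word (hyphens, apostrophes, periods) is left open, as noted above.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file into memory in one go, then counts how many
// times `word` occurs as a substring (non-overlapping matches).
std::size_t count_occurrences(const std::string& path, const std::string& word) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the entire file into memory
    const std::string text = buffer.str();

    if (word.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(word); pos != std::string::npos;
         pos = text.find(word, pos + word.size())) {
        ++count;
    }
    return count;
}
```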
Good luck.