I am trying to learn how to handle and work with files. I know how to open them, write to them, and read from them. What I would like to know is: how can I delete the data/content from a file once I have finished using the program?
I use a txt file to save some information (simply numbers) that I need during the execution of the program, but when I finish, I would like to delete the saved data. I was thinking of removing the file each time and creating it again, but I think that's not ideal. Any suggestions?
Using std::filesystem::resize_file:
std::filesystem::resize_file(your_file, 0);
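For completeness, a minimal C++17 sketch (the file name data.txt is just a placeholder); the std::error_code overload reports failure without throwing:
#include <filesystem>
#include <iostream>

int main() {
    std::error_code ec;
    std::filesystem::resize_file("data.txt", 0, ec); // truncate to zero bytes
    if (ec)
        std::cerr << "could not truncate: " << ec.message() << '\n';
}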
In such a case, you usually just re-write the file as a whole. If it is a small file, you can read in the entire content, modify the content and write the file back.
With large files, if you fear that you are consuming too much memory, you can read in the file in chunks of appropriate size, but you'd write these chunks to another, temporary file. When being finished, you delete the old file and move the temporary file to the location of the old file.
You can combine both approaches, too: read the entire file, write it to the temporary file at once, then delete and move; if anything goes wrong while writing the temporary file, you'd still have the old file as a backup.
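A sketch of the temporary-file approach (C++17); the chunk size and the modify step are placeholders to adapt:
#include <filesystem>
#include <fstream>
#include <vector>

void rewrite_in_chunks(const std::filesystem::path& original) {
    std::filesystem::path temp = original;
    temp += ".tmp"; // e.g. "in.txt.tmp"

    {
        std::ifstream in(original, std::ios::binary);
        std::ofstream out(temp, std::ios::binary);
        std::vector<char> chunk(64 * 1024); // 64 KiB at a time
        while (in.read(chunk.data(), chunk.size()) || in.gcount() > 0) {
            // ... modify the chunk here as needed ...
            out.write(chunk.data(), in.gcount());
        }
    } // both streams are closed here, before the rename

    // Replace the original; if anything above failed, the old file is still intact.
    std::filesystem::rename(temp, original);
}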
You can open the file in write mode ("w") and then close it. That will truncate all the previous data.
It's generally a good idea to clean up temporary files once your program ends. I certainly wouldn't leave an empty temporary file hanging around. You can easily remove a file (e.g. with boost::filesystem::remove or std::filesystem::remove). If you really just want to 'clear' a file, then:
#include <fstream>
#include <string>

// Constructing an std::ofstream with just a file name opens it in the
// default out mode, which truncates any existing content.
void clear_file(const std::string& filename)
{
    std::ofstream file {filename};
}
Will do the job.
I'm fairly new to Python, so I haven't done much in the way of reading files.
My question is this: if I use
with open(sendFile, 'r') as fileContent:
    response = fileContent.read()
will the whole file always be read in to response at once or is there any chance that I'd have to call read() multiple times? Or does read() just handle that case for you?
I believe the file will be closed after this call, so I just want to make sure that I'm getting the whole file and not having to go back, open it again, and read more.
Unless you specify a size, the read method reads the whole contents of the file.
From https://docs.python.org/2/library/stdtypes.html#file.read :
If the size argument is negative or omitted, read all data until EOF is reached.
I am trying to understand how basic file I/O is handled in C++ or C. My aim is to read a file line by line and send the lines across to a remote server. If a line is sent, I want to delete it from the file.
One way I tried was to keep a count of the lines read and make a system() call to delete the first 'count' lines. I used the bash command: sed -i -e 1,'count'd filename.
After that I continued reading the file and surprisingly it worked as planned.
I have two questions:
Is this way reliable?
And why does this work at all? While reading the file I deleted a part of it, and yet it kept working. What if I did a seek to a previous position, what then?
Best, Digvijay
PS:
I would be glad if somebody could suggest a better way.
Also here is the code for the program I wrote:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdlib>

int main() {
    std::ifstream f;
    std::string line;
    std::stringstream ss;
    int i = 0;

    f.open("in.txt");
    if (f.is_open()) {
        // Read (and print) the first two lines.
        while (getline(f, line)) {
            std::cout << line << std::endl;
            i++;
            if (i == 2) break;
        }
        // Delete the lines just read, via sed.
        ss << "sed -i -e 1," << i << "d in.txt";
        system(ss.str().c_str());
        // Keep reading from the (still open) stream.
        while (getline(f, line)) {
            std::cout << line << std::endl;
        }
    }
    return 0;
}
Edit:
Firstly, thanks for taking the time to write answers. But here is some extra information which I missed out on earlier. The files I am dealing with are log files, so they are constantly being appended to with information from devices. The reason I want to avoid creating a copy is that the log files themselves can be very big at times, and this approach also helps keep the log file short, since the removed parts are divided up and archived on the server.
Solution
I have found a way to deal with the problem. Apparently Thomas is right that sed does create a new file, so the old file remains as is. Using this, I can read n lines, call the system function, close the file pointer and open it again. I do this on small chunks of the log, repeatedly, until it becomes small and hence efficient to deal with. The server, meanwhile, archives the logs in 1 GB files.
However, I have a new question: due to memory constraints, I need to know if it is possible to split a log file into two efficiently. (Which would possibly be another question on SO.)
Most file systems don't support removing data from the beginning of a file, so deleting the first lines would mean rewriting everything that comes after them, which is very inefficient.
The normal solution to your actual problem is to stop writing to your log file when it reaches some size, then start writing to a new file. The code that copies the files can delete a whole file once it has been written (this is an efficient operation).
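A sketch of that size-based rotation (C++17); the file names and the size limit are placeholders:
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

const std::uintmax_t kMaxLogSize = 10 * 1024 * 1024; // 10 MiB, for example

void append_log_line(const std::string& line) {
    namespace fs = std::filesystem;
    const fs::path log = "device.log";

    // If the current log is too big, move it aside; the copying code can
    // send the rotated file to the server and delete it as a whole.
    std::error_code ec;
    if (fs::exists(log, ec) && fs::file_size(log, ec) >= kMaxLogSize)
        fs::rename(log, "device.log.1", ec);

    std::ofstream out(log, std::ios::app);
    out << line << '\n';
}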
sed writes a new version of the file, while the program keeps reading the same version that it opened. This is the usual behavior of Unix and Linux when a program writes a file that another program has open.
You can see this for yourself with this small C program:
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    FILE *f = fopen("in.txt", "r");
    if (f == NULL)
        return EXIT_FAILURE;
    while (1) {
        rewind(f); /* start over from the beginning each pass */
        int lines = 0;
        int c;
        while ((c = getc(f)) != EOF)
            if (c == '\n')
                ++lines;
        printf("Number of lines in file: %d\n", lines);
    }
    return 0;
}
Run that program in one window, and then use sed in another window to edit the file. The number of lines printed by the program will stay the same, even if the file on disk has been edited, and this is because Unix keeps the old, open version, even if it is no longer accessible to other programs.
As to your first question, how reliable your solution is, as far as I can see it should be reliable, except with the usual caveats about the system crashing or running out of memory in the middle of an update, someone else accessing the file, and of course all the problems with the system call. It is not very efficient, though, and for large data sets you might want to do it differently.
sujin's comment about using a temporary file for the lines you want to keep seems reasonable. It would be both faster and safer. Keep the original file, so if the system crashes you'll still have your data, and wait until you have finished to rename the old file to "in.txt.bak", and then rename your temporary file to "in.txt".
First off, avoid system() calls as much as you can (if possible, don't use them at all), as they create race conditions and other problems which can drastically and detrimentally affect the outcome of your program. This is especially true if access to files is involved.
Given your problem, there are a number of ways to do this, each with its own caveats.
I'll cover three possible solutions:
1) If the file is small enough, you can:
- read the entire thing into a data structure (vector, list, deque, etc.),
- delete the original file,
- determine how many lines to read (and send off via your server protocol),
- then write the remaining lines to a new file with the name of the original.
If you intend to parallelize your program later on, this may be a better solution, provided that the file is small. Note: small is a relative term, but is generally limited by how much memory you have available. (A sketch of this option follows the three options.)
2) If the file is quite large or you're limited by memory constraints, you will have to get creative with buffers. Once you've read a line and successfully sent it off, determine where the file pointer is and copy the remaining information, up to the end of the current file, into a new file. Once done, close and delete the old file, then close the new file and rename it to the old file's name.
3) If your solution doesn't have to be in C++, you can use shell-scripting or (controversially) another language to get the job done.
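Here is the promised sketch of option 1 (C++17), assuming the whole file fits in memory; send_line() is a hypothetical stand-in for the networking code:
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in: send one line to the server, return true on success.
bool send_line(const std::string& line) { return true; }

void send_and_trim(const std::string& filename) {
    // Read every line into memory.
    std::vector<std::string> lines;
    {
        std::ifstream in(filename);
        std::string line;
        while (std::getline(in, line))
            lines.push_back(line);
    }

    // Send lines one by one, remembering how many went through.
    std::size_t sent = 0;
    while (sent < lines.size() && send_line(lines[sent]))
        ++sent;

    // Rewrite the file with only the unsent lines.
    std::ofstream out(filename, std::ios::trunc);
    for (std::size_t i = sent; i < lines.size(); ++i)
        out << lines[i] << '\n';
}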
1) No, it's not reliable.
2) The C++ runtime library reads your file in blocks (internally), which are then parceled out to your higher-level input requests until the blocks are exhausted, forcing it to (internally) read more blocks from disk. Since one or more physical blocks are read in before you make any call to sed, they cannot be altered if sed happens to change that first part of the file.
To see your code fail, you would need to make the input file big enough that there are remaining blocks of the file that have not been read in (internally by the runtime library) before you call sed. By "fail" I mean your program would not see all the characters that were originally in the file before sed clobbered some lines.
As the other answers said, you have to write another file with the records you need after reading the original file, and then delete the original. But in this application perhaps a FIFO will be more useful to you than a file. If you are on a *NIX platform, look up the mkfifo command (and the POSIX mkfifo() function).
It is like a file, with the peculiarity that a line, once read, is gone.
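A sketch of the reading side on a POSIX system; the path /tmp/loglines is a placeholder:
#include <sys/stat.h>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Create the named pipe (the call fails harmlessly if it already exists).
    mkfifo("/tmp/loglines", 0666);

    // Opening blocks until a writer opens the other end. Lines read here
    // are consumed as they arrive; nothing accumulates on disk.
    std::ifstream fifo("/tmp/loglines");
    std::string line;
    while (std::getline(fifo, line))
        std::cout << "got: " << line << '\n'; // send the line to the server here
    return 0;
}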
I was looking for an easy way to write something into the first line of an already existing textfile. I tried using ofstream like this:
ofstream textFileWriter("Data/...txt");
if (textFileWriter.is_open())
{
    textFileWriter << "HEADER: stuffstuff";
}
But it would delete everything which used to be in that file, even though the ofstream wasn't constructed with std::ofstream::trunc. I cannot use std::ofstream::app, since it is important to write into the first line.
Copying the whole text file into a vector which already has the line, and then writing it back, would be my last option, but it's something I would really like to avoid, since the text files are quite large.
You can't simply "append" to the beginning of a file.
The common solution is to open a new (temporary) file, write your new header, write the rest of the original file to the temporary file, and then "rename" (using the OS system calls) the temporary file as the original file.
Or as you say in your question, read the original file into an in-memory buffer (e.g. a vector) and do the modification in that buffer, and then write the buffer to the file.
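A sketch of that temporary-file solution (C++17); the names here are placeholders:
#include <filesystem>
#include <fstream>
#include <string>

void prepend_header(const std::filesystem::path& file, const std::string& header) {
    std::filesystem::path temp = file;
    temp += ".tmp";

    {
        std::ofstream out(temp, std::ios::binary);
        out << header << '\n';   // the new first line
        std::ifstream in(file, std::ios::binary);
        out << in.rdbuf();       // then the whole original file after it
    }

    std::filesystem::rename(temp, file); // replace the original
}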
I have a text file that I want to write to, and I want to keep the file's content always. I want writes to behave like a "FIFO" (the last write always goes on the top line of the file).
I tried using fout.open("filename"); with the ate mode to keep the file content, and after that used seekg(0), trying to move the write cursor back to the beginning of the file. It didn't work.
The only way I found to do this seems very time-expensive: copy all the file content to a temporary file, write what I want to write, and after that write the content of the temp file at the end of the target file.
Surely there must be an easier way to do this operation?
Jorge, no matter what, you will have to rewrite the entire file in memory. You cannot simply keep the file where it is and prepend memory to it, especially since it's a simple text file (maybe if there were some form of metadata you could...).
Anyway, your best bet is to flush the old contents into a temporary location, write what you need, and append the old contents.
I'm not sure what you're asking for. If you want to add a line to the beginning of the file, the only way is to open a new, temporary file, write the line, copy the old file in after the new line, then delete the old file and rename the temporary.
If the original line has a fixed length, and you want to replace it, then all you have to do is open the file with both ios_base::in and ios_base::out.
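A sketch of that fixed-length case, assuming the replacement is exactly as long as the original first line:
#include <fstream>
#include <string>

bool replace_first_line(const std::string& filename, const std::string& line) {
    std::fstream file(filename, std::ios_base::in | std::ios_base::out);
    if (!file)
        return false;
    file.seekp(0);  // move the write position to the start of the file
    file << line;   // overwrites exactly line.size() characters in place
    return static_cast<bool>(file);
}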
First, you should realize that files are historically streams, i.e. they can only be read and written in one direction. This comes from the times when files were stored on tapes, which could move in one direction (at that time).
However, if you only want to prepend, then you can just store your file backwards. Sounds silly? Maybe, but this would work with just a little overhead.
Apart from that, with current OSes you will need to make a copy to prepend. While files are not streams anymore, and can be accessed randomly on a hard disk, they are still made to grow in one direction. Of course you could make a file system where files grow in both directions, but I have not heard of one.
With <fstream> you may use the filebuf class.
#include <fstream>
#include <iostream>
using namespace std;

int main() {
    // Opening with ios::in | ios::out does not truncate the file.
    filebuf myfile;
    myfile.open("test.txt", ios::in | ios::out);
    if (!myfile.is_open()) cout << "cannot open" << endl;
    myfile.sputn("AAAA", 4); // overwrites the first four characters
    myfile.close();

    filebuf myfile2;
    myfile2.open("test.txt", ios::in | ios::out);
    if (!myfile2.is_open()) cout << "cannot open 2" << endl;
    myfile2.sputn("BB", 2);  // overwrites the first two characters
    myfile2.close();
    return 0;
}
Alternatively, write to a string in the order you want, then flush it to the file.
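For instance, a sketch that builds the new content in memory first (the function name is mine, not a library's):
#include <fstream>
#include <sstream>
#include <string>

void prepend_in_memory(const std::string& filename, const std::string& new_line) {
    std::ostringstream content;
    content << new_line << '\n';

    {
        std::ifstream in(filename);
        content << in.rdbuf(); // the old content goes after the new line
    }

    std::ofstream out(filename, std::ios::trunc);
    out << content.str();
}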
I am polling a directory constantly for files and every time I see a file that meets some certain criteria for reading, I open the file and parse it.
string newLine;
ifstream fileReader;
fileReader.open(filename.c_str());
// Loop on getline() itself; testing eof() before reading is a classic
// bug that mishandles the last line.
while (getline(fileReader, newLine))
{
    // do some stuff with the line...
}
fileReader.close();
The above code is in a loop that runs every 1 second, checking a directory for new files. My issue is that as I am transferring files into the folder for processing, my loop finds the file and passes its name to the ifstream, which then opens it and tries to parse an incomplete file. How do I make ifstream wait until the file is done being written before it tries to parse the file?
EDIT:
Wanted to word the issue better here, since a replier seems to have misunderstood it. I have 2 directories:
mydirectory/
mydirectoryParsed/
The way my code works is that my program checks for files in mydirectory/ and, when it finds them, parses them and uses the information in the files. No writing to the files is done. Once I am done parsing a file, it is moved to mydirectoryParsed/.
The issue is that when I transfer files over the network into mydirectory/, the ifstream sees these files mid-transfer and starts reading them before they finish writing to the directory. How do I make ifstream wait until the file is completely written before parsing it?
Don't transfer the files directly into the directory that your program is watching; instead, transfer them into a different directory on the same drive, and then when the transfer is done, move them into the watched directory. That way, the complete file appears in the watched directory in a single atomic operation.
Alternatively, you could use a naming convention in the watched directory: append a suffix like ".partial" to files that are being transferred, and then rename the file to remove the suffix when the transfer is done. Have your program ignore files whose names end with that suffix.
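A sketch of the first suggestion (C++17); the directory and file names are placeholders. rename() within one filesystem is atomic, so the watcher either sees the complete file or doesn't see it at all:
#include <filesystem>

void publish(const std::filesystem::path& staged_file) {
    namespace fs = std::filesystem;
    // staged_file was fully written somewhere else on the same drive;
    // this single operation makes it appear in the watched directory.
    fs::rename(staged_file, fs::path("mydirectory") / staged_file.filename());
}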
You're not supposed to open the file every time you write to it. Open it once!
Some pseudo-code for this would be :
1- Open the file
2- Get the data you want to write, process that data
3- Call the write-to-file function
4- Loop until you have nothing left to write
5- Close the file
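A sketch of that pseudo-code, using standard input as the data source and output.txt as a placeholder file name:
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ofstream out("output.txt");      // 1- open the file once
    std::string line;
    while (std::getline(std::cin, line))  // 2- get the data; 4- loop until done
        out << line << '\n';              // 3- write it
}                                         // 5- out's destructor closes the file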