Reading file contents using a with statement - python-2.7

I'm fairly new to Python, so I haven't done much in the way of reading files.
My question is this: if I use
with open(sendFile, 'r') as fileContent:
    response = fileContent.read()
will the whole file always be read into response at once, or is there any chance that I'd have to call read() multiple times? Or does read() just handle that case for you?
I believe the file will be closed after this block, so I just want to make sure that I'm getting the whole file and not having to go back, open it again, and read more.

Unless you specify a size, the read method reads the whole contents of the file.
From https://docs.python.org/2/library/stdtypes.html#file.read :
If the size argument is negative or omitted, read all data until EOF is reached.

Related

How to refresh an input file stream in C++

I am writing a program that monitors for changes in a file for a specific purpose. The possible values (3) in the file are known and can be differentiated by the first letter.
Using an input file stream (ifstream status;), I'm unable to refresh the stream's buffer to reflect changes in the file, and I don't want to spam status.close() and status.open() to solve the problem.
If the changes you mention are only appended bytes, then you can use std::ifstream::clear() to clear the error bits and continue reading until you reach EOF again. Check out this answer.
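For illustration, a minimal sketch of that approach (the file name and polling interval are made up for the example):
#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

int main() {
    std::ifstream status("status.txt");  // hypothetical monitored file
    std::string line;
    while (true) {
        while (std::getline(status, line))
            std::cout << "read: " << line << '\n';
        // getline hit EOF and set eofbit/failbit; clearing them lets the
        // stream pick up bytes appended to the file after this point.
        status.clear();
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}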

How to delete data/content from a txt file

I am trying to learn how to handle and work with files. I know how to open them, write to them, and read from them. What I would like to know is how I can delete data/content from the file when I have finished using the program.
I use a txt file to save some information that I need during the execution of the program, but when I finish, I would like to delete the saved data, which is simply numbers. I was thinking of removing the file each time and recreating it, but I think that's not ideal. Any suggestions?
Using std::filesystem::resize_file:
std::filesystem::resize_file(your_file, 0);
In such a case, you usually just re-write the file as a whole. If it is a small file, you can read in the entire content, modify the content and write the file back.
With large files, if you fear that you are consuming too much memory, you can read in the file in chunks of appropriate size, but you'd write these chunks to another, temporary file. When being finished, you delete the old file and move the temporary file to the location of the old file.
You can combine both approaches, too: read the entire file, write it to the temporary file at once, then delete and move; if anything goes wrong while writing the temporary, you'd still have the old file as a backup...
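A minimal sketch of the temporary-file approach (names are illustrative; error handling omitted):
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;

void rewrite_in_chunks(const fs::path& original, const fs::path& temporary) {
    {
        std::ifstream in(original, std::ios::binary);
        std::ofstream out(temporary, std::ios::binary);
        std::vector<char> buf(64 * 1024);  // chunk size is arbitrary
        while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
            // ... modify the chunk here as needed ...
            out.write(buf.data(), in.gcount());
        }
    }  // both streams are closed before the files are touched
    fs::rename(temporary, original);  // replaces the old file
}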
You can open the file in write mode ("w") and then close it. That will truncate all previous data.
It's generally a good idea to clean up temporary files once your program ends. I certainly wouldn't leave an empty temporary file hanging around. You can easily remove a file (e.g. with boost::filesystem::remove or std::filesystem::remove). If you really just want to 'clear' a file, then:
#include <fstream>
#include <string>

void clear_file(const std::string& filename)
{
    // An ofstream opened with default flags truncates the file.
    std::ofstream file {filename};
}
will do the job.

How does ios::trunc work in C++ for binary files?

When I write fout.open("file.dat", ios::out | ios::trunc | ios::binary);
does the file lose all its data at that instant,
or does it wait for something to be written before the old data is lost?
(I hope you get my point; all I'm asking is whether the open statement above is enough to remove the records from the binary file, or whether we need to write some data, e.g. with fout.write(), before the data previously stored in the file is lost.)
The trunc flag will zero the file out at open().
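In other words, the data is gone as soon as the open succeeds, before any write() call. A quick sketch to see this for yourself (assuming file.dat already has content):
#include <filesystem>
#include <fstream>
#include <iostream>

int main() {
    std::ofstream fout("file.dat",
                       std::ios::out | std::ios::trunc | std::ios::binary);
    // Nothing has been written yet, but the file is already empty:
    std::cout << std::filesystem::file_size("file.dat") << '\n';  // should print 0
}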

Deleting Lines after reading them in C++ program using system()

I am trying to understand how basic I/O with files is handled in C++ or C. My aim is to read a file line by line and send the lines across to a remote server. If a line is sent, I want to delete it from the file.
One way I tried was to keep a count of the lines read and then use system() to delete the first 'count' lines, via the bash command: sed -i -e 1,'count'd filename.
After that I continued reading the file and, surprisingly, it worked as planned.
I have two questions:
Is this way reliable?
And why does this work at all? While reading the file I deleted a part of it, and yet the reading continued as if nothing had happened. What if I did a seek to a previous position, what then?
Best, Digvijay
PS:
I would be glad if somebody could suggest a better way.
Also here is the code for the program I wrote:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdlib>

int main() {
    std::ifstream f;
    std::string line;
    std::stringstream ss;
    int i = 0;
    f.open("in.txt");
    if (f.is_open()) {
        // Read (and pretend to send) the first two lines.
        while (std::getline(f, line)) {
            std::cout << line << std::endl;
            i++;
            if (i == 2) break;
        }
        // Delete the lines just read via sed.
        ss << "sed -i -e 1," << i << "d in.txt";
        std::system(ss.str().c_str());
        // Continue reading the (supposedly modified) file.
        while (std::getline(f, line)) {
            std::cout << line << std::endl;
        }
    }
    return 0;
}
Edit:
Firstly, thanks for taking the time to write answers. But here is some extra information which I missed out on earlier. The files I am dealing with are log files, so they are constantly being appended to with information from devices. The reason why I want to avoid creating a copy is that the log files themselves are very big (at times), and this approach would help keep the log file short, since the lines would be divided into parts and archived on the server.
Solution
I have found a way to deal with the problem. Apparently Thomas is right: sed does create a new file, so the old file remains as is. Using this, I can read n lines, call system(), close the file pointer, and open the file again. I do this on small chunks of the log, repeatedly, until it becomes small and hence efficient to deal with. The server meanwhile archives the logs in 1 GB files.
However, I have a new question: due to memory constraints, I need to know whether it is possible to split a log file into two efficiently. (Which would possibly be another question on SO.)
Most modern file systems don't support removing data from the beginning of a file, so deleting lines there means rewriting the whole file, which is very inefficient.
The normal solution to your actual problem is to stop writing to your log file when it reaches some size, then start writing to a new file. The code that copies the files can delete a whole file once it has been written (this is an efficient operation).
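A hedged sketch of that rotation scheme (the file names and the size threshold are invented for the example):
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

void append_log_line(const std::string& line) {
    const fs::path active = "app.log";            // hypothetical log name
    constexpr std::uintmax_t max_size = 1 << 20;  // rotate at 1 MiB

    if (fs::exists(active) && fs::file_size(active) >= max_size) {
        // Hand the whole file over by renaming it; the uploader can
        // send and delete it as a unit while we start a fresh file.
        fs::rename(active, "app.log.1");
    }
    std::ofstream(active, std::ios::app) << line << '\n';
}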
sed writes a new version of the file, while the program keeps reading the same version that it opened. This is the usual behavior of Unix and Linux when a program writes a file that another program has open.
You can see this for yourself with this small C program:
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    FILE *f = fopen("in.txt", "r");
    while (1) {
        /* Re-count the lines from the start on every pass. */
        rewind(f);
        int lines = 0;
        int c;
        while ((c = getc(f)) != EOF)
            if (c == '\n')
                ++lines;
        printf("Number of lines in file: %d\n", lines);
    }
    return 0;
}
Run that program in one window, and then use sed in another window to edit the file. The number of lines printed by the program will stay the same, even if the file on disk has been edited, and this is because Unix keeps the old, open version, even if it is no longer accessible to other programs.
As to your first question, how reliable your solution is, as far as I can see it should be reliable, except with the usual caveats about the system crashing or running out of memory in the middle of an update, someone else accessing the file, and of course all the problems with the system call. It is not very efficient, though, and for large data sets you might want to do it differently.
sujin's comment about using a temporary file for the lines you want to keep seems reasonable. It would be both faster and safer. Keep the original file, so if the system crashes you'll still have your data; wait until you have finished, then rename the old file to "in.txt.bak" and rename your temporary file to "in.txt".
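A minimal sketch of that scheme (the "first two lines were sent" part stands in for the real server logic):
#include <filesystem>
#include <fstream>
#include <string>

int main() {
    std::ifstream in("in.txt");
    std::ofstream out("in.txt.tmp");  // temporary name is illustrative

    std::string line;
    int skipped = 0;
    while (std::getline(in, line)) {
        if (skipped < 2) { ++skipped; continue; }  // pretend these were sent
        out << line << '\n';                       // keep the rest
    }
    in.close();
    out.close();

    std::filesystem::rename("in.txt", "in.txt.bak");  // original becomes the backup
    std::filesystem::rename("in.txt.tmp", "in.txt");  // temp takes its place
}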
First off, avoid system() calls as much as you can (if possible, don't use them at all), as they create race conditions and other problems which can drastically, and often detrimentally, affect the outcome of your program. This is especially true if access to files is involved.
Given your problem, there are a number of ways to do this, each with its own caveats.
I'll cover three possible solutions:
1) If the file is small enough:
read the entire thing into a data structure (vector, list, deque, etc.)
delete the original file
determine how many lines to read (and send off via your server protocol)
then write the remaining lines back under the original file name (see the sketch after this list).
If you intend to parallelize your program later on, this may be a better solution, provided that the file is small. Note: small is a relative term, but is generally limited by how much memory you have available.
2) If the file is quite large or you're limited by memory constraints, you will have to get creative with buffers. Once you've read a line and successfully sent it off, note where the file pointer is and copy everything from there to the end of the file into a new file. Once done, close and delete the old file, then close the new file and rename it to the old file's name.
3) If your solution doesn't have to be in C++, you can use shell-scripting or (controversially) another language to get the job done.
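As promised above, a sketch of solution 1 for a file that fits in memory (send_line is a hypothetical stand-in for the server protocol):
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

bool send_line(const std::string&) { return true; }  // hypothetical sender

int main() {
    std::vector<std::string> lines;
    {
        std::ifstream in("in.txt");
        std::string line;
        while (std::getline(in, line))
            lines.push_back(line);
    }  // input closed; safe to overwrite the file below

    std::size_t sent = 0;
    while (sent < lines.size() && send_line(lines[sent]))
        ++sent;

    // Rewrite the file with only the unsent remainder.
    std::ofstream out("in.txt", std::ios::trunc);
    for (std::size_t i = sent; i < lines.size(); ++i)
        out << lines[i] << '\n';
}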
1) No, it's not reliable.
2) The C++ runtime library reads your file in blocks (internally), which are then parceled out to your (higher-level) input requests until the blocks are exhausted, forcing it to (internally) read more blocks from disk. Since one or more physical blocks are read in before you make any call to sed, they cannot be altered if sed happens to change that first part of the file.
To see your code fail, you would need to make the input file big enough that there are remaining blocks of the file that have not been read in (internally by the runtime library) before you call sed. By "fail" I mean your program would not see all the characters that were originally in the file before sed clobbered some lines.
As the other guys said, you have to write the records you need to another file after reading the original, and then delete the original. But in this application you may find a FIFO more useful than a regular file. If you are on a *NIX platform, check out the mkfifo command.
It is like a file, with the peculiarity that once a line has been read, it is gone.
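For instance, a sketch of the consumer side (the FIFO path is made up; create it first with mkfifo /tmp/lines.fifo):
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Opening a FIFO for reading blocks until a writer opens it too.
    std::ifstream fifo("/tmp/lines.fifo");
    std::string line;
    while (std::getline(fifo, line))
        std::cout << "sending: " << line << '\n';  // each line is consumed once
}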

Does constructing an iostream (c++) read data from the hard drive into memory?

When I construct an iostream, say by opening a file, will it always read the entire file from the hard disk into memory, or is it streamed in and buffered by the OS on demand?
I ask because one way to check whether a file exists is to see if opening it fails, but I fear that if the files I am opening are very large, this could take a long time if iostream must read the entire file on open.
To check whether a file exists can be done like this if you want to use boost.
#include <boost/filesystem.hpp>
bool fileExists = boost::filesystem::exists("foo.txt");
No, it will not read the entire file into memory when you open it. It will read your file in chunks though, but I believe this process will not start until you read the first byte. Also these chunks are relatively small (on the order of 4-128 kibibytes in size), and the fact it does this will speed things up greatly if you are reading the file sequentially.
In a test on my Linux box (well, Linux VM), simply opening the file only results in the OS open system call, but no read system call. It doesn't start reading anything from the file until the first attempt to read from the stream. And then it reads 8191-byte chunks (why 8191? that seems a very strange number) as I read the file in.
Opening a file is a bad way of testing if the file exists - all it does is tell you if you can open it. Opening might fail for a number of reasons, typically because you don't have read permission, but the file will still exist. It is usually better to use an operating system specific function to test for existence. And no, opening an fstream will not cause the contents to be read.
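Since C++17 there is also a portable standard-library test that only does a metadata lookup, so the file's size doesn't matter; a minimal sketch:
#include <filesystem>
#include <iostream>

int main() {
    // exists() checks metadata only; it never opens or reads the file,
    // so even a huge file costs almost nothing to test.
    std::cout << std::boolalpha
              << std::filesystem::exists("foo.txt") << '\n';
}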
What I think is: when you open a file, the corresponding data structures for the process opening the file are populated, including the file pointer, file descriptor, v-node, etc.
Now, one can read and write to a file using buffered streams (fwrite, fread) or using system calls (read and write).
When we use buffered streams, we buffer the data and then write or read it (this is done for efficiency purposes). This itself means that the whole file is not read into memory; rather, some bytes are read into a buffer and then made available.
In the case of system calls such as read and write, kernel-level buffering is done (using fsync one can flush the kernel buffer too), but the data is actually read from and written to the device.
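A small sketch of the two buffering layers just mentioned (fsync and fileno are POSIX functions, not standard C++):
#include <cstdio>
#include <unistd.h>  // fsync, fileno (POSIX)

int main() {
    std::FILE* f = std::fopen("data.bin", "wb");
    std::fwrite("hello", 1, 5, f);  // lands in the stdio buffer first
    std::fflush(f);                 // flush the stdio buffer to the kernel
    fsync(fileno(f));               // flush the kernel buffer to the device
    std::fclose(f);
}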
Checking the existence of a file:
#include <sys/stat.h>
#include <iostream>
#include <string>

int main() {
    struct stat file_i;
    std::string f("myfile.txt");
    if (stat(f.c_str(), &file_i) != 0) {
        std::cout << "File not found" << std::endl;
    }
    return 0;
}
Hope this clarifies a bit.