Python: Is there a better way for read()? - python-2.7

In the following code I create a temp file and then save the entire content of a txt file in this temp file. This is just an example; I know it makes no sense to read a text file and then write it into a temp file, but it demonstrates my question.
Well, when I use the read() method, the entire contents of the temp file are loaded into RAM, right? I can't control the size of the temp file's contents, so I am wondering whether there is a better way to protect the RAM. I don't want to flood it.
# Use the TemporaryFile context manager for easy clean-up;
# the file is deleted automatically when closed (TemporaryFile
# takes no delete argument in Python 2.7)
import tempfile

with tempfile.TemporaryFile() as tmp:
    with open('filename.txt', 'r') as my_file:
        for line in my_file:
            tmp.write(line)
    tmp.seek(0)
    exec(tmp.read())

The for line in my_file calls the file object's next() method, which does not buffer the entire file in memory when reading:
In order to make a for loop the most efficient way of looping over the
lines of a file (a very common operation), the next() method uses a
hidden read-ahead buffer
From the docs:
For reading lines from a file, you can loop over the file object. This
is memory efficient, fast, and leads to simple code
For the tmp.read() method, from the docs:
When size is omitted or negative, the entire contents of the file will
be read and returned; it’s your problem if the file is twice as large
as your machine’s memory.
So unless you read line by line, as you do when you write, or read a fixed amount incrementally, e.g. tmp.read(100), you will read the entire file into memory.
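As a minimal illustration of the incremental approach (the chunk size is arbitrary and process() is a hypothetical handler, not part of the question's code):

CHUNK_SIZE = 4096  # arbitrary; only this much is in memory at a time

with open('filename.txt', 'rb') as f:
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:        # read() returns '' at EOF
            break
        process(chunk)       # hypothetical per-chunk handler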

Related

How to delete data/content from a txt file

I am trying to learn how to handle and work with files. I know how to open them, write to them, and read from them. What I would like to know is: how can I delete the data/content from the file when I have finished using the program?
I use a txt file to save some information that I need during the execution of the program, but when it finishes I would like to delete the saved data, which is simply numbers. I was thinking of removing the file each time and recreating it, but I think that's not ideal. Any suggestions?
Using std::filesystem::resize_file (C++17, from <filesystem>):
std::filesystem::resize_file(your_file, 0);
In such a case, you usually just re-write the file as a whole. If it is a small file, you can read in the entire content, modify the content and write the file back.
With large files, if you fear that you are consuming too much memory, you can read in the file in chunks of appropriate size, but you'd write these chunks to another, temporary file. When being finished, you delete the old file and move the temporary file to the location of the old file.
You can combine both approaches, too: read the entire file, write it to the temporary at once, then delete and move; if anything goes wrong while writing the temporary, you'd still have the old file as a backup...
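A Python sketch of the chunked variant (rewrite_in_chunks and the transform callback are illustrative names, not a library API):

import os
import tempfile

def rewrite_in_chunks(path, transform, chunk_size=4096):
    # Copy the file through a temp file in bounded-memory chunks,
    # then delete the old file and move the temp into its place.
    # `transform` is a hypothetical per-chunk filter; pass
    # `lambda c: c` to copy the data unchanged.
    tmp_fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    with open(path, 'rb') as src, os.fdopen(tmp_fd, 'wb') as tmp:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            tmp.write(transform(chunk))
    os.remove(path)             # delete the old file...
    os.rename(tmp_path, path)   # ...and move the temporary into place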
You can open the file in write mode (w) and then close it. That truncates all previous data.
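In Python, for example (the filename here is just an assumption for illustration):

# Opening in 'w' mode truncates the file to zero length immediately.
open('data.txt', 'w').close()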
It's generally a good idea to clean up temporary files once your program ends. I certainly wouldn't leave an empty temporary file hanging around. You can easily remove the file (e.g. with boost::filesystem::remove or std::filesystem::remove). If you really just want to 'clear' a file, then:
#include <fstream>
#include <string>

void clear_file(const std::string& filename)
{
    // Opening an ofstream with the default flags truncates the file.
    std::ofstream file {filename};
}
will do the job.

What is the most efficient way to remove first N bytes from a file on Windows?

Say I have a file of an arbitrary length S and I need to remove the first N of its bytes (where N is much less than S). What is the most efficient way to do it on Windows?
I'm looking for a WinAPI function to do this, if one is available.
Otherwise, what are my options? Load it into RAM and then rewrite the existing file with the remainder of the data (in this case I cannot be sure that the PC has enough RAM)? Or write the remainder of the file data into a new file, erase the old one, and rename the new file to the old name (in this case, what do I do if any of these steps fail, and what about the fragmentation this method causes on disk)?
There is no general way to do this built into the OS. There are theoretical ways to edit the file system's data structures underneath the operating system on sector or cluster boundaries, but these are different for each file system and would bypass the file system's security model.
To accomplish this you can read in the data starting at byte N in chunks of, say, 4k, write them back out starting at byte zero, and then use the truncation API (SetEndOfFile) to set the new, smaller end of file when you are finished copying the data.
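A sketch of that shift-and-truncate approach in Python (the function name is mine; file.truncate() performs the end-of-file adjustment, which on Windows is backed by SetEndOfFile as far as I know):

def remove_first_bytes(path, n, chunk_size=4096):
    # Shift the contents left by n bytes in place, then truncate.
    with open(path, 'r+b') as f:
        read_pos, write_pos = n, 0
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)
        f.truncate(write_pos)   # set the new, smaller end of file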
The most efficient method to delete data at the beginning of the file would be to modify the directory entry on the disk that records where the data starts.
Note: this may not be possible if the data must start on a sector boundary. If that is the case, you may have to write the remainder of the data in the affected sector(s) to new sector(s), essentially moving the data.
The preferred method is to write a new file that starts with the data located after the deleted area.
Moving files on the same drive is faster than copying them, since the data isn't duplicated; only the file pointer, (symbolic) links, and the file allocation/index table are updated.
The move command in CMD could, in principle, be extended to let the user set file start and end markers, effecting file truncation without copying the file data and saving valuable time and RAM/disk overhead.
An alternative would be to send the commands directly to the device/disk driver, bypassing the operating system, as long as the OS knows where to find the file and its properties, e.g. file size, name, and the sectors occupied on disk.

Reading file contents using with statement

I'm fairly new to Python, so I haven't done much in the way of reading files.
My question is this: if I use
with open(sendFile, 'r') as fileContent:
    response = fileContent.read()
will the whole file always be read into response at once, or is there any chance that I'd have to call read() multiple times? Or does read() just handle that case for me?
I believe the file will be closed after this call, so I just want to make sure that I'm getting the whole file and not having to go back, open it again, and read more.
Unless you specify a size, the read method reads the whole contents of the file.
From https://docs.python.org/2/library/stdtypes.html#file.read:
If the size argument is negative or omitted, read all data until EOF is reached.
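So a single read() is enough, and the with block closes the file for you; a quick sketch (assuming sendFile names a readable file):

with open(sendFile, 'r') as fileContent:
    response = fileContent.read()   # the entire file, in one call
print(fileContent.closed)           # True: the with block already closed it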

Trying to get the content of an HTML file which is located inside a tar file

I am trying to get an HTML file which is one of the files located inside a tar file. I have something in mind but I don't know whether it is correct or not, so please point out where I am wrong. My idea is:
I create a stream from the tar file and store that stream in a buffer in order to have its contents, then use strstr to search for the HTML file inside the tar file (I know that in my tar file the HTML content starts at "<!doctype html" and ends at "</html>", so I will load the content between them, which is actually the HTML file). Is my approach right?
The problem is that when I give a very big size to the buffer (but smaller than the size of the tar file, which contains the HTML plus many other files), it gives a stack overflow while debugging. But when I give a small size, it shows the contents of some other file, one located at the start of the tar file when opened in Notepad (I have checked by opening the tar file in Notepad: those contents really are present at the start of the tar file; when I increased the buffer size in order to reach the HTML file located in the middle of the tar file, which requires a very big buffer, it gave a stack overflow while debugging).
My code is:
HRESULT AMEPreviewHandler::CreateHtmlPreview(IStream *m_pStream) // called from elsewhere
{
    ULONG CbRead;
    const int Size = 115000;
    char Buffer[Size + 1];                  // large stack array: this is what overflows the stack
    m_pStream->Read(Buffer, Size, &CbRead);
    Buffer[CbRead] = '\0';                  // narrow string, so no L prefix
    const char *compare = "<!doctype html"; // the HTML file contents start here
    char *StartPosition = strstr(Buffer, compare); // NULL when not found
    __int64 count = 0;
    if (StartPosition != NULL)              // `if`, not `while`: the original loop never advanced, so it spun forever
    {
        MessageBox(m_hwndPreview, L"found the doctype marker", L"BTN WND", MB_ICONINFORMATION);
        count = StartPosition - Buffer + 1; // location of "<!doctype html" in the buffer
    }
    MessageBox(m_hwndPreview, L"after the search in CreateHtmlPreview", L"BTN WND", MB_ICONINFORMATION);
    return S_OK;                            // `return true` is not a valid HRESULT
}
Please tell me: is my approach to getting the contents of the HTML file inside the tar file correct? And why does it give a stack overflow when I give the buffer a big size in order to reach contents located in the middle of the tar file, even though the size I declare is smaller than the size of the tar file?
The stack has a limited size, so just allocating arbitrarily large amounts will not work. You either need to cap the buffer at something that fits within the available stack and then loop over the reads (which makes for fun if your "needle" string, the thing you are looking for, straddles the gap between two blocks, but that is possible to overcome; see below). Or simply don't use the stack: use new to allocate enough memory to hold the whole file. Of course, if the file is VERY large, that won't work either; files can be larger than the total memory of your computer, and then you are stuffed and have to go back to reading a bit at a time. It is also wasteful in terms of resources to read the entire file into memory only to throw most of it away.
One solution, using one buffer, is to add the length of the "needle" to the size of the buffer. When you read the second time, copy needle-length bytes from the back of the buffer to the beginning, then read into the buffer starting needle-length bytes in, and search from the beginning of the buffer. As long as the buffer is fairly large compared to the "needle", the overhead of searching through the same part of the buffer twice does not matter.
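A minimal sketch of that overlap technique (in Python for brevity; the function name and chunk size are my own choices):

def find_in_stream(f, needle, chunk_size=4096):
    # Search a binary file-like object for `needle` without loading it
    # all, keeping len(needle)-1 bytes of overlap between chunks so a
    # match straddling two reads is still found.
    overlap = b''
    offset = 0                       # absolute position where `window` starts
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return -1                # not found
        window = overlap + chunk
        idx = window.find(needle)
        if idx != -1:
            return offset + idx      # absolute offset of the match
        keep = len(needle) - 1       # bytes to re-examine next round
        offset += len(window) - keep
        overlap = window[-keep:] if keep else b''

For instance, find_in_stream(open('archive.tar', 'rb'), b'<!doctype html') would return the offset of the HTML content, or -1 if it is absent.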

How to cut a file without using another file?

Is it possible to delete part of a file (let's say from the beginning to its half), without having to use another file?
Thanks!
Yes, it is possible, but still you'll have to rewrite most of the file.
The rough idea is as follows:
open the file
beg = start of the fragment to be removed
len = length of the fragment to be removed
blocksize = 4096   -- example block size, may be anything
datamoved = 0
do {
    fseek(beg + len + datamoved)
    if( endoffile ) break   -- finished
    actualread = fread(buffer, blocksize)
    fseek(beg + datamoved)
    fwrite(buffer, actualread)
    datamoved += actualread
}
The last step, after the loop, is to truncate the file to the size beg + datamoved. If the underlying filesystem does not support a truncate operation, then you have to rewrite the whole file, but most filesystems and libraries do support it.
The short answer is that no, most file systems don't attempt to support operations like that.
That leaves you with two choices. The obvious one is to create a copy of the data, leaving out the parts you don't want. You can do this either in-place (i.e., moving the data around in the same file) or by using an auxiliary file, typically copying the data to the new file, then doing something like renaming the new file to the old name.
The other major choice is to simply restructure your file and data so you don't have to get rid of the old data at all. For example, if you want to keep the most recent N bytes of data from a process, you might structure (most of) the file as a circular buffer, with a couple of "pointers" at the beginning that tell you the head and tail positions, so you know where to read data from and write data to. With a structure like this, you don't erase or remove the old data, you just overwrite it as needed.
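A toy Python sketch of such a circular-buffer file (the 8-byte header layout and the function name are assumptions, not a standard format; head bookkeeping is elided for brevity):

import os
import struct

HEADER = struct.Struct('<II')   # (head, tail) offsets; an assumed layout

def ring_append(path, capacity, data):
    # Overwrite the oldest bytes instead of ever deleting from the file.
    if not os.path.exists(path):
        with open(path, 'wb') as f:                  # header + empty region
            f.write(HEADER.pack(0, 0) + b'\x00' * capacity)
    with open(path, 'r+b') as f:
        head, tail = HEADER.unpack(f.read(HEADER.size))
        data = data[-capacity:]                      # keep only what can fit
        first = min(len(data), capacity - tail)      # bytes before the wrap point
        f.seek(HEADER.size + tail)
        f.write(data[:first])
        if first < len(data):
            f.seek(HEADER.size)
            f.write(data[first:])                    # wrapped remainder
        f.seek(0)
        f.write(HEADER.pack(head, (tail + len(data)) % capacity))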
If you have enough memory, read the remainder (everything after the part you are cutting) fully into memory, write it back at the front of the file, and truncate the file.
If you do not have enough memory, copy it in blocks, and only truncate the file when you are done.
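The in-memory variant might look like this in Python (the function name and filename are mine; cutting the first half is just the example from the question):

import os

def cut_file_front(path, cut_bytes):
    # Remove the first `cut_bytes` bytes of the file in place,
    # holding the remainder in memory (fine for small files).
    with open(path, 'r+b') as f:
        f.seek(cut_bytes)
        remainder = f.read()     # everything after the cut
        f.seek(0)
        f.write(remainder)
        f.truncate()             # drop the now-duplicated tail

cut_file_front('data.bin', os.path.getsize('data.bin') // 2)  # hypothetical file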