Calling ftplib retrbinary() inside timeit using lambda freezes program - python-2.7

I'm trying to gather some metrics regarding a network issue in a small LAN with SSH disconnecting occasionally and ping exhibiting huge delays (up to a minute instead of less than a second!).
Using timeit (I've read that it's a good way to check elapsed execution time when calling some code snippet) I try to download some data from a FTP server that also runs locally, measure the time and store it inside a log file.
from ftplib import FTP
from timeit import timeit
from datetime import datetime
ftp = FTP(host='10.0.0.8')
ftp.login(user='****', passw='****')
ftp.cwd('updates/')
ftp.retrlines('LIST')
# Get timestamp when the download starts
curr_t = datetime.now()
print('Download large file and measure the time')
file_big_t = timeit(lambda f=ftp: f.retrbinary('RETR update_big', open('/tmp/update_big', 'wb').write))
print('Download medium file and measure the time')
file_medium_t = timeit(lambda f=ftp: f.retrbinary('RETR file_medium ', open('/tmp/file_medium ', 'wb').write))
print('Download small file and measure the time')
file_small_t = timeit(lambda f=ftp: f.retrbinary('RETR update_small', open('/tmp/update_small', 'wb').write))
# Write timestamp and measured timings to file
# ...
If I call the retrbinary(...) without the timeit it works without any issues. However the code above results in the script freezing right after the first timeit call.

In case someone else wants to do what I've described in my question I found the solution here. For some reason passing the lambda directly to timeit results in the behaviour I've already mentioned. However if the same lambda is first passed to an instance of timeit.Timer and that that instance's timeit() function is called it works.
For the example above (let's take just file_big_t) I did
file_big_timer = Timer(lambda f=ftp: f.retrbinary('RETR update_big', open('/tmp/update_big', 'wb').write))
file_big_t = file_big_timer.timeit()

Related

Pathos multiprocessing pool hangs

I'm trying to use multiprocessing inside docker container. However, I'm facing two issues.
(I'm using python 2.7)
Creating ProcessingPool()/Pool() (I tried both) takes abnormally long time to create. Maybe over a minute or two.
After it processes the function, it hangs.
I basically trying to run a very simple case inside my container. Here's what I have..
import pathos.multiprocessing import ProcessingPool
import multiprocessing
class MultiprocessClassExample():
.
.
.
def worker(self, number):
return "Printing number %s" %(number)
.
.
def generateNumber(self):
PROCESSES = multiprocessing.cpu_count() - 1
NUMBER = ['One', 'Two', 'Three', 'Four', 'Five']
result = ProcessingPool(PROCESSES).map(self.worker, NUMBER)
print("Finished processing.")
print(result)
and I call using the following code.
MultiprocessClassExample().generateNumber()
Now, this seems fairly straight forward enough. I ran this on a jupyter notebook and it ran without an issue. I also tried running python inside my docker container, and tried running the above code inside, and it went fine. So I'm assuming it has to do with the complete code that I have. Obviously I didn't write out all the code, but that's the main section of the code I'm trying to handle right now.
I would expect the above code to work as well. However, first thing I notice is that when I call ProcessingPool(), it takes a long time. I tried regular multiprocessing.Pool() before, and had the same effect. Whereas, in the notebook, it ran very quick and smoothly.
After waiting several minutes, it prints :
Printing number One
Printing number Two
Printing number Three
Printing number Four
Printing number Five
and that's it. It never prints out Finished processing. and it just hangs there.
But when the print statements appear, I notice that several debug message appear at the same time. It says
[CRITICAL] WORKER TIMEOUT
[WARNING] Worker graceful timeout
[INFO] Worker exiting
[INFO] Booting worker with pid:
Any suggestions would be greatly appreciated.

multiprocessing for keras model predict with single GPU

Background
I want to predict pathology images using keras with Inception-Resnet_v2. I have trained the model already and got a .hdf5 file. Because the pathology image is very large (for example: 20,000 x 20,000 pixels), so I have to scan the image to get small patches for prediction.
I want to speed up the prediction procedure using multiprocessing lib with python2.7. The main idea is using different subprocesses to scan different lines and then sending patches to model.
I saw somebody suggests importing keras and loading model in subprocesses. But I don't think it is suitable for my task. Loading model usingkeras.models.load_model() one time will take about 47s, which is very time-consuming. So I can't reload the model every time when I start a new subprocess.
Question
My question is can I load the model in my main process and pass it as a parameter to subprocesses?
I have tried two methods but both of them didn't work.
Method 1. Using multiprocessing.Pool
The code is :
import keras
from keras.models import load_model
import multiprocessing
def predict(num,model):
print dir(model)
print num
model.predict("image data, type:list")
if __name__ == '__main__':
model = load_model("path of hdf5 file")
list = [(1,model),(2,model),(3,model),(4,model),(5,model),(6,model)]
pool = multiprocessing.Pool(4)
pool.map(predict,list)
pool.close()
pool.join()
The output is
cPickle.PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed
I searched the error and found Pool can't map unpickelable parameters, so I try method 2.
Method 2. Using multiprocessing.Process
The code is
import keras
from keras.models import load_model
import multiprocessing
def predict(num,model):
print num
print dir(model)
model.predict("image data, type:list")
if __name__ == '__main__':
model = load_model("path of hdf5 file")
list = [(1,model),(2,model),(3,model),(4,model),(5,model),(6,model)]
proc = []
for i in range(4):
proc.append(multiprocessing.Process(predict, list[i]))
proc[i].start()
for i in range(4):
proc[i].join()
In Method 2, I can print dir(model). I think it means the model is passed to subprocesses successfully. But I got this error
E tensorflow/stream_executor/cuda/cuda_driver.cc:1296] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED; GPU dst: 0x13350b2200; host src: 0x2049e2400; size: 4=0x4
The environment which I use:
Ubuntu 16.04, python 2.7
keras 2.0.8 (tensorflow backend)
one Titan X, Driver version 384.98, CUDA 8.0
Looking forward to reply! Thanks!
Maybe you can use apply_async() instead of Pool()
and you can find more details here:
Python multiprocessing pickling error
Multi-processing works on CPU, while model prediction happened in GPU, which there is only one. I cannot see how multi-processing can help you on prediction.
Instead, I think you can use multi-processing to scan different patches, which you seems to have already managed to achieve. Then stack these patches into a batch or batches to predict in parallel in GPU.
As noted by Statham multiprocess requires all args to be compatible with pickle. This blog post describes how to save a keras model as a pickle: [http://zachmoshe.com/2017/04/03/pickling-keras-models.html][1]
It may be a sufficient workaround to get your keras model passed as an arg to multiprocess, but I have not tested the idea myself.
I will also add that I had better luck running two keras processes on a single gpu using windows rather than linux. On linux I was getting out of memory errors on the 2nd process, but the same memory allocation (45% of total GPU ram for each) worked on windows. In my case they were fits - for running predictions only, maybe the memory requirements are less.

Python 2.7 : How to track declining RAM?

Data is updated every 5 min. Every 5 min a python script I wrote is run. This data is related to signals, and when the data says a signal is True, then the signal name is shown in a PyQt Gui that I have.
In other words, the Gui is always on my screen, and every 5 min its "main" function is triggered and the "main" function's job is to check the database of signals against the newly downloaded data. I leave this GUI open for hours and days at a time and the computer always crashes. Random python modules get corrupted (pandas can't import this or numpy can't import that) and I have to reinstall python and all the packages.
I have a hypothesis that this is related to the program being open for a long time and using up more and more memory which eventually crashes the computer when the memory runs out.
How would I test this hypothesis? If I can just show that with every 5-min run the available memory decreases, then it would suggest that my hypothesis might be correct.
Here is the code that reruns the "main" function every 5 min:
class Editor(QtGui.QMainWindow):
# my app
def main():
app = QtGui.QApplication(sys.argv)
ex = Editor()
milliseconds_autocheck_frequency = 300000 # number of milliseconds in 5 min
timer = QtCore.QTimer()
timer.timeout.connect(ex.run)
timer.start(milliseconds_autocheck_frequency)
sys.exit(app.exec_())
if __name__ == '__main__':
main()

How to make Python do something every half an hour?

I would like my code (Python) to execute every half an hour. I'm using Windows. For example, I would like it to run at 11, 11:30, 12, 12:30, etc.
Thanks
This should call the function once, then wait 1800 second(half an hour), call function, wait, ect.
from time import sleep
from threading import Thread
def func():
your actual code code here
if __name__ == '__main__':
Thread(target = func).start()
while True:
sleep(1800)
Thread(target = func).start()
Windows Task Scheduler
You can also use the AT command in the command prompt, which is similar to cron in linux.

Limiting the lifetime of a file in Python

Helo there,
I'm looking for a way to limit a lifetime of a file in Python, that is, to make a file which will be auto deleted after 5 minutes after creation.
Problem:
I have a Django based webpage which has a service that generates plots (from user submitted input data) which are showed on the web page as .png images. The images get stored on the disk upon creation.
Image files are created per session basis, and should only be available a limited time after the user has seen them, and should be deleted 5 minutes after they have been created.
Possible solutions:
I've looked at Python tempfile, but that is not what I need, because the user should have to be able to return to the page containing the image without waiting for it to be generated again. In other words it shouldn't be destroyed as soon as it is closed
The other way that comes in mind is to call some sort of an external bash script which would delete files older than 5 minutes.
Does anybody know a preferred way doing this?
Ideas can also include changing the logic of showing/generating the image files.
You should write a Django custom management command to delete old files that you can then call from cron.
If you want no files older than 5 minutes, then you need to call it every 5 minutes of course. And yes, it would run unnecessarily when there are no users, but that shouln't worry you too much.
Ok that might be a good approach i guess...
You can write a script that checks your directory and delete outdated files, and choose the oldest file from the un-deleted files. Calculate how much time had passed since that file is created and calculate the remaining time to deletion of that file. Then call sleep function with remaining time. When sleep time ends and another loop begins, there will be (at least) one file to be deleted. If there is no files in the directory, set sleep time to 5 minutes.
In that way you will ensure that each file will be deleted exactly 5 minutes later, but when there are lots of files created simultaneously, sleep time will decrease greatly and your function will begin to check each file more and more often. To aviod that you add a proper latency to sleep function before starting another loop, like, if the oldest file is 4 minutes old, you can set sleep to 60+30 seconds (adding all time calculations 30 seconds).
An example:
from datetime import datetime
import time
import os
def clearDirectory():
while True:
_time_list = []
_now = time.mktime(datetime.now().timetuple())
for _f in os.listdir('/path/to/your/directory'):
if os.path.isfile(_f):
_f_time = os.path.getmtime(_f) #get file creation/modification time
if _now - _f_time < 300:
os.remove(_f) # delete outdated file
else:
_time_list.append(_f_time) # add time info to list
# after check all files, choose the oldest file creation time from list
_sleep_time = (_now - min(_time_list)) if _time_list else 300 #if _time_list is empty, set sleep time as 300 seconds, else calculate it based on the oldest file creation time
time.sleep(_sleep_time)
But as i said, if files are created oftenly, it is better to set a latency for sleep time
time.sleep(_sleep_time + 30) # sleep 30 seconds more so some other files might be outdated during that time too...
Also, it is better to read getmtime function for details.