I'm trying to use a validation monitor in skflow by passing my validation set as numpy array.
Here is some simple code to reproduce the problem (I installed tensorflow from the provided binaries for Ubuntu/Linux 64-bit, GPU enabled, Python 2.7):
import numpy as np
from sklearn.cross_validation import train_test_split
from tensorflow.contrib import learn
import tensorflow as tf
import logging
logging.getLogger().setLevel(logging.INFO)
#Some fake data
N=200
X=np.array(range(N),dtype=np.float32)/(N/10)
X=X[:,np.newaxis]
Y=np.sin(X.squeeze())+np.random.normal(0, 0.5, N)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
train_size=0.8,
test_size=0.2)
val_monitor = learn.monitors.ValidationMonitor(X_test, Y_test,early_stopping_rounds=200)
reg=learn.DNNRegressor(hidden_units=[10,10],activation_fn=tf.tanh,model_dir="tmp/")
reg.fit(X_train,Y_train,steps=5000,monitors=[val_monitor])
print "train error:", reg.evaluate(X_train, Y_train)
print "test error:", reg.evaluate(X_test, Y_test)
The code runs but only the first validation step is done properly, then validation always returns the same value even if training is actually going fine which can be checked by running an evaluation on the test set at the end. The following message also appears for each validation step.
INFO:tensorflow:Input iterator is exhausted.
Any help is welcome!
Thanks,
David
I was able to solve this by adding:
config=tf.contrib.learn.RunConfig(save_checkpoints_secs=1)
to the DNNRegressor call.
Improving on dbikard's solution:
Add config=tf.contrib.learn.RunConfig(save_checkpoints_steps=val_monitor._every_n_steps) to the DNN Regressor call instead.
This saves checkpoints when they are needed (i.e each time before the monitor is triggered) rather than once per second.
Related
In my website i'm now converting my upload images into webp, because is smaller than the other formats, the users will load my pages more faster(Also mobile user). But it's take some time to converter a medium image.
import StringIO
import time
from PIL import Image as PilImage
img = PilImage.open('222.jpg')
originalThumbStr = StringIO.StringIO()
now = time.time()
img.convert('RGBA').save(originalThumbStr, 'webp', quality=75)
print(time.time() - now)
It's take 2,8 seconds to convert this follow image:
860kbytes, 1920 x 1080
My memory is 8gb ram, with a processor with 4 cores(Intel I5), without GPU.
I'm using Pillow==5.4.1.
Is there a faster way to convert a image into WEBB more faster. 2,8s it's seems to long to wait.
If you want them done fast, use vips. So, taking your 1920x1080 image and using vips in the Terminal:
vips webpsave autumn.jpg autumn.webp --Q 70
That takes 0.3s on my MacBook Pro, i.e. it is 10x faster than the 3s your PIL implementation achieves.
If you want lots done really fast, use GNU Parallel and vips. So, I made 100 copies of your image and converted the whole lot to WEBP in parallel like this:
parallel vips webpsave {} {#}.webp --Q 70 ::: *jpg
That took 4.9s for 100 copies of your image, i.e. it is 50x faster than the 3s your PIL implementation achieves.
You could also use the pyvips binding - I am no expert on this but this works and takes 0.3s too:
#!/usr/bin/env python3
import pyvips
# VIPS
img = pyvips.Image.new_from_file("autumn.jpg", access='sequential')
img.write_to_file("autumn.webp")
So, my best suggestion would be to take the 2 lines of code above and use a multiprocessing pool or multithreading approach to get a whole directory of images processed. That could look like this:
#!/usr/bin/env python3
import pyvips
from glob import glob
from pathlib import Path
from multiprocessing import Pool
def doOne(f):
img = pyvips.Image.new_from_file(f, access='sequential')
webpname = Path(f).stem + ".webp"
img.write_to_file(webpname)
if __name__ == '__main__':
files = glob("*.jpg")
with Pool(12) as pool:
pool.map(doOne, files)
That takes 3.3s to convert 100 copies of your image into WEBP equivalents on my 12-core MacBook Pro with NVME disk.
I can run all the cells of the tutorial notebook of Pytorch about dataloading (pytorch tutorial).
But when I use OpenCV in place of Skimage to resize the image, the dataloader gets stuck, i.e nothing happens.
In the Rescale class:
class Rescale(object):
.....
def __call__(self, sample):
....
#img = transform.resize(image, (new_h, new_w))
img = cv2.resize(image, (new_h, new_w))
.....
The dataloader and the for loop are defined with:
dataloader = DataLoader(transformed_dataset, batch_size=4,
shuffle=True, num_workers=4)
for i_batch, sample_batched in enumerate(dataloader):
print(i_batch, sample_batched['image'].size(),
sample_batched['landmarks'].size())
I can get the iterator to print something if num_workers=0. It looks like opencv does not play well with the Multiprocessing of pytorch.
I would really prefer to use same package to transform the images at train time and test time (and I am already using OpenCV for the image rescale at test time).
Any suggestions would be greatly appreciated.
I had a very similar problem and that's how I solved it:
when you import cv2 set cv2.setNumThreads(0)
and then you can set num_workers>0 in the dataloader in PyTorch.
Seems like OpenCV tries to multithread and somewhere something goes into a deadlock.
Hope it helps.
Except for cv2.setNumThreads(0),
import multiprocessing
multiprocessing.set_start_method('spawn')
can also solve this problem.
From Pytorch issues these five may help (not recommended):
1. time.sleep(0.003)
2. pin_memory = True/False
3. num_workers = 0/1
4. from torch.utils.data.dataloader import DataLoader
5. writing 8192 to /proc/sys/kernel/shmmni
OpenCV and Pytorch multiprocessing don't play well together, sometimes. When running a code with OpenCV functions calls embedded in a homebrewed function parallelized in a "multiprocessing" pool, the code eventually ends up with idle processors after several calls to the pool than can fluctuate from run to run.
Forking could be the problem, forking only clones the current thread. In this case, the thread pool may wrongly assume that it has more threads than it has.
The code is waiting infinitely for a condition never signaled because the number of threads is not as expected after forking.
Spawning resets the memory after forking, forcing a re-initialization of all data structures (as the thread pool) within OpenCV.
More discussion in Github of OpenCV issues and Pytorch issues.
I'm trying to gather some metrics regarding a network issue in a small LAN with SSH disconnecting occasionally and ping exhibiting huge delays (up to a minute instead of less than a second!).
Using timeit (I've read that it's a good way to check elapsed execution time when calling some code snippet) I try to download some data from a FTP server that also runs locally, measure the time and store it inside a log file.
from ftplib import FTP
from timeit import timeit
from datetime import datetime
ftp = FTP(host='10.0.0.8')
ftp.login(user='****', passw='****')
ftp.cwd('updates/')
ftp.retrlines('LIST')
# Get timestamp when the download starts
curr_t = datetime.now()
print('Download large file and measure the time')
file_big_t = timeit(lambda f=ftp: f.retrbinary('RETR update_big', open('/tmp/update_big', 'wb').write))
print('Download medium file and measure the time')
file_medium_t = timeit(lambda f=ftp: f.retrbinary('RETR file_medium ', open('/tmp/file_medium ', 'wb').write))
print('Download small file and measure the time')
file_small_t = timeit(lambda f=ftp: f.retrbinary('RETR update_small', open('/tmp/update_small', 'wb').write))
# Write timestamp and measured timings to file
# ...
If I call the retrbinary(...) without the timeit it works without any issues. However the code above results in the script freezing right after the first timeit call.
In case someone else wants to do what I've described in my question I found the solution here. For some reason passing the lambda directly to timeit results in the behaviour I've already mentioned. However if the same lambda is first passed to an instance of timeit.Timer and that that instance's timeit() function is called it works.
For the example above (let's take just file_big_t) I did
file_big_timer = Timer(lambda f=ftp: f.retrbinary('RETR update_big', open('/tmp/update_big', 'wb').write))
file_big_t = file_big_timer.timeit()
Background
I want to predict pathology images using keras with Inception-Resnet_v2. I have trained the model already and got a .hdf5 file. Because the pathology image is very large (for example: 20,000 x 20,000 pixels), so I have to scan the image to get small patches for prediction.
I want to speed up the prediction procedure using multiprocessing lib with python2.7. The main idea is using different subprocesses to scan different lines and then sending patches to model.
I saw somebody suggests importing keras and loading model in subprocesses. But I don't think it is suitable for my task. Loading model usingkeras.models.load_model() one time will take about 47s, which is very time-consuming. So I can't reload the model every time when I start a new subprocess.
Question
My question is can I load the model in my main process and pass it as a parameter to subprocesses?
I have tried two methods but both of them didn't work.
Method 1. Using multiprocessing.Pool
The code is :
import keras
from keras.models import load_model
import multiprocessing
def predict(num,model):
print dir(model)
print num
model.predict("image data, type:list")
if __name__ == '__main__':
model = load_model("path of hdf5 file")
list = [(1,model),(2,model),(3,model),(4,model),(5,model),(6,model)]
pool = multiprocessing.Pool(4)
pool.map(predict,list)
pool.close()
pool.join()
The output is
cPickle.PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed
I searched the error and found Pool can't map unpickelable parameters, so I try method 2.
Method 2. Using multiprocessing.Process
The code is
import keras
from keras.models import load_model
import multiprocessing
def predict(num,model):
print num
print dir(model)
model.predict("image data, type:list")
if __name__ == '__main__':
model = load_model("path of hdf5 file")
list = [(1,model),(2,model),(3,model),(4,model),(5,model),(6,model)]
proc = []
for i in range(4):
proc.append(multiprocessing.Process(predict, list[i]))
proc[i].start()
for i in range(4):
proc[i].join()
In Method 2, I can print dir(model). I think it means the model is passed to subprocesses successfully. But I got this error
E tensorflow/stream_executor/cuda/cuda_driver.cc:1296] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED; GPU dst: 0x13350b2200; host src: 0x2049e2400; size: 4=0x4
The environment which I use:
Ubuntu 16.04, python 2.7
keras 2.0.8 (tensorflow backend)
one Titan X, Driver version 384.98, CUDA 8.0
Looking forward to reply! Thanks!
Maybe you can use apply_async() instead of Pool()
and you can find more details here:
Python multiprocessing pickling error
Multi-processing works on CPU, while model prediction happened in GPU, which there is only one. I cannot see how multi-processing can help you on prediction.
Instead, I think you can use multi-processing to scan different patches, which you seems to have already managed to achieve. Then stack these patches into a batch or batches to predict in parallel in GPU.
As noted by Statham multiprocess requires all args to be compatible with pickle. This blog post describes how to save a keras model as a pickle: [http://zachmoshe.com/2017/04/03/pickling-keras-models.html][1]
It may be a sufficient workaround to get your keras model passed as an arg to multiprocess, but I have not tested the idea myself.
I will also add that I had better luck running two keras processes on a single gpu using windows rather than linux. On linux I was getting out of memory errors on the 2nd process, but the same memory allocation (45% of total GPU ram for each) worked on windows. In my case they were fits - for running predictions only, maybe the memory requirements are less.
I have a genetic algorithm which I would like to speed up. I'm thinking the easiest way to achieve this is by pythons multiprocessing module. After running cProfile on my GA, I found out that most of the computational time takes place in the evaluation function.
def evaluation():
scores = []
for chromosome in population:
scores.append(costly_function(chromosome))
How would I go about to parallelize this method? It is important that all the scores append in the same order as they would if the program would run sequentially.
I'm using python 2.7
Use pool (I show both imap and map because of some results on google say map may not be OK for ordering though I have yet to see proof):
from multiprocessing import Pool
def evaluation(population):
return list(Pool(processes=nprocs).imap(costly_function,population))
or (what I use):
return Pool(processes=nprocs).map(costly_function,population)
Define nprocs to the number of parallel process you want.
From:
https://docs.python.org/dev/library/multiprocessing.html#multiprocessing.pool.Pool