I am running an LSTM net on an EC2 p2.8xlarge instance. Naturally I'd like to take advantage of all 8 available GPUs. I can run it on one GPU easily, but not more. I get the following error when calling "multi_gpu_model":
"To call multi_gpu_model with gpus=8, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3', '/gpu:4', '/gpu:5', '/gpu:6', '/gpu:7']. However this machine only has: ['/cpu:0']. Try reducing gpus."
When I type nvidia-smi, all 8 GPUs show up in the terminal. How can I make them visible to my TensorFlow (Keras) environment?
When I run device_lib.list_local_devices() in a Jupyter notebook it returns only the CPU, when it should list the 8 GPUs as well. Here is the relevant bit of code:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint
from keras.utils import multi_gpu_model

model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# define the checkpoint
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# wrap the model across 8 GPUs and fit
model = multi_gpu_model(model, gpus=8)
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)
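For reference, this is roughly how I check which devices TensorFlow can see from the notebook (a minimal sketch of the device_lib check mentioned above):

from tensorflow.python.client import device_lib

# prints one entry per device TensorFlow can use; on this machine it only shows /cpu:0
print(device_lib.list_local_devices())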
Related
I just launched (and paid for) a Deep Learning AMI (Ubuntu 18.04) Version 27.0 (ami-0dbb717f493016a1a) instance of type g2.2xlarge. I activated the PyTorch with Python3 (CUDA 10.1 and Intel MKL) environment:
source activate pytorch_p36
When I run my PyTorch network I see this warning:
/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/cuda/__init__.py:134: UserWarning:
Found GPU0 GRID K520 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability that we support is 3.5.
Is this real?
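For what it's worth, the reported capability can be checked directly (a minimal sketch; these torch.cuda calls query the driver, so they should work even if the kernels themselves are unsupported):

import torch

# e.g. prints 'GRID K520' and (3, 0) on this instance
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))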
This is my code to put my neural net on the GPU:
import torch

if torch.cuda.is_available():
    device = torch.device("cuda:0")  # you can continue going on here, like cuda:1 cuda:2....etc.
    print("Running on the GPU")
else:
    device = torch.device("cpu")
    print("Running on the CPU")

net = Net(image_height, image_width)
net.to(device)
I had to use a g3s.xlarge instance instead; I guess the g2 instances use older GPUs.
I also had to set num_workers=0 on my dataloaders, following this thread: https://discuss.pytorch.org/t/oserror-errno-12-cannot-allocate-memory-but-memory-usage-is-actually-normal/56027.
And this is another PyTorch gotcha when moving tensors to a device: https://stackoverflow.com/a/51606286/3614578. Both points are shown in the sketch below.
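A minimal sketch of both points (the dataset here is a random placeholder, not my actual data):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 2, (64,)))
# num_workers=0 keeps data loading in the main process and avoids the
# "cannot allocate memory" error from the linked thread
loader = DataLoader(dataset, batch_size=8, num_workers=0)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
for inputs, labels in loader:
    # .to(device) on a tensor returns a new tensor rather than moving it in place,
    # so the result has to be assigned back (the gotcha from the linked answer)
    inputs = inputs.to(device)
    labels = labels.to(device)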
Background
I want to predict pathology images using Keras with Inception-Resnet_v2. I have already trained the model and have a .hdf5 file. Because each pathology image is very large (for example, 20,000 x 20,000 pixels), I have to scan the image and extract small patches for prediction.
I want to speed up the prediction procedure using the multiprocessing lib with Python 2.7. The main idea is to use different subprocesses to scan different lines and then send the patches to the model.
I saw somebody suggest importing Keras and loading the model inside each subprocess, but I don't think that is suitable for my task. Loading the model with keras.models.load_model() takes about 47s, which is very time-consuming, so I can't afford to reload the model every time I start a new subprocess.
Question
My question is: can I load the model in my main process and pass it as a parameter to the subprocesses?
I have tried two methods, but neither of them worked.
Method 1. Using multiprocessing.Pool
The code is:
import keras
from keras.models import load_model
import multiprocessing
def predict(num, model):
    print dir(model)
    print num
    model.predict("image data, type:list")

if __name__ == '__main__':
    model = load_model("path of hdf5 file")
    list = [(1,model),(2,model),(3,model),(4,model),(5,model),(6,model)]
    pool = multiprocessing.Pool(4)
    pool.map(predict, list)
    pool.close()
    pool.join()
The output is
cPickle.PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed
I searched for the error and found that Pool can't map unpicklable parameters, so I tried method 2.
Method 2. Using multiprocessing.Process
The code is
import keras
from keras.models import load_model
import multiprocessing
def predict(num, model):
    print num
    print dir(model)
    model.predict("image data, type:list")

if __name__ == '__main__':
    model = load_model("path of hdf5 file")
    list = [(1,model),(2,model),(3,model),(4,model),(5,model),(6,model)]
    proc = []
    for i in range(4):
        proc.append(multiprocessing.Process(target=predict, args=list[i]))
        proc[i].start()
    for i in range(4):
        proc[i].join()
In Method 2, I can print dir(model). I think that means the model was passed to the subprocesses successfully. But then I got this error:
E tensorflow/stream_executor/cuda/cuda_driver.cc:1296] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED; GPU dst: 0x13350b2200; host src: 0x2049e2400; size: 4=0x4
The environment I use:
Ubuntu 16.04, python 2.7
keras 2.0.8 (tensorflow backend)
one Titan X, Driver version 384.98, CUDA 8.0
Looking forward to your reply! Thanks!
Maybe you can use apply_async() instead of Pool.map().
You can find more details here:
Python multiprocessing pickling error
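A minimal sketch of the apply_async() call pattern (this only changes how tasks are submitted; whatever you pass as arguments still has to be picklable):

import multiprocessing

def predict(num, patches):
    # placeholder worker: in the real code this would run the model on the patches
    return num, len(patches)

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = [pool.apply_async(predict, (i, ["patch data"])) for i in range(4)]
    pool.close()
    pool.join()
    print([r.get() for r in results])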
Multiprocessing runs on the CPU, while model prediction happens on the GPU, of which there is only one. I cannot see how multiprocessing can help you with the prediction itself.
Instead, I think you can use multiprocessing to scan and extract the different patches, which you seem to have already managed. Then stack these patches into a batch (or batches) and predict them in parallel on the GPU, as in the sketch below.
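A minimal sketch of that pattern (extract_patch(), the coordinate list and the patch size are placeholders for your scanning code; only the main process ever touches the Keras model):

import numpy as np
import multiprocessing
from keras.models import load_model

def extract_patch(coords):
    # placeholder: crop one patch from the large slide at the given coordinates
    y, x = coords
    return np.zeros((299, 299, 3), dtype=np.float32)

if __name__ == '__main__':
    coords_list = [(y, x) for y in range(0, 2000, 299) for x in range(0, 2000, 299)]

    # CPU workers only extract patches; they never see the model
    pool = multiprocessing.Pool(4)
    patches = pool.map(extract_patch, coords_list)
    pool.close()
    pool.join()

    # the single GPU process stacks the patches and predicts them as one batch
    model = load_model("path of hdf5 file")
    batch = np.stack(patches)
    predictions = model.predict(batch, batch_size=32)
    print(predictions.shape)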
As noted by Statham, multiprocessing requires all args to be compatible with pickle. This blog post describes how to make a Keras model picklable: http://zachmoshe.com/2017/04/03/pickling-keras-models.html
It may be a sufficient workaround for getting your Keras model passed as an arg to multiprocessing (sketched below), but I have not tested the idea myself.
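Roughly the approach from that post, as I understand it (a sketch, not tested here): serialize the model to a temporary HDF5 file inside __getstate__ and load it back in __setstate__, so pickle only ever sees raw bytes:

import tempfile
import keras.models

def make_keras_picklable():
    def __getstate__(self):
        # serialize the whole model (architecture + weights) to an HDF5 temp file
        with tempfile.NamedTemporaryFile(suffix='.hdf5', delete=True) as fd:
            keras.models.save_model(self, fd.name, overwrite=True)
            model_str = fd.read()
        return {'model_str': model_str}

    def __setstate__(self, state):
        # rebuild the model from the serialized bytes
        with tempfile.NamedTemporaryFile(suffix='.hdf5', delete=True) as fd:
            fd.write(state['model_str'])
            fd.flush()
            model = keras.models.load_model(fd.name)
        self.__dict__ = model.__dict__

    cls = keras.models.Model
    cls.__getstate__ = __getstate__
    cls.__setstate__ = __setstate__

make_keras_picklable()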
I will also add that I had better luck running two Keras processes on a single GPU using Windows rather than Linux. On Linux I was getting out-of-memory errors on the 2nd process, but the same memory allocation (45% of total GPU RAM for each) worked on Windows. In my case both processes were running fits; for running predictions only, the memory requirements may be lower.
I just launched an AWS P2 instance to train a model. However, it seems to be using the CPU to train, not the GPU. How can I force it to train on the GPU instead of the CPU?
$ nano ~/.keras/keras.json shows this:
{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
I am getting the message "Failed to load the native TensorFlow runtime."
I then changed the backend, and now ~/.keras/keras.json says this:
{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
It's training now, but very slowly, and it seems to be using the CPU.
It looks like the answer is to add a gpus flag:
python cnn_homework_solution.py --gpus 0,1
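Another way to check from Python whether TensorFlow actually picks up the GPU (a minimal TF 1.x sketch; it assumes the GPU build of TensorFlow is installed):

import tensorflow as tf

# pin an op to the GPU explicitly; this raises an error if no GPU is visible
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0])
    b = a * 2.0

# log_device_placement prints which device each op actually runs on
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(b))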
I use a Raspberry Pi B+ 2. I have a Python program that uses an ultrasonic sensor to measure the distance to an object. What I would like to do is change the volume depending on the distance to a person. I already have Python code to obtain the distance, but I have no idea how to change the Raspberry Pi's volume from Python.
Is there any way to do that?
You can use the package python-alsaaudio.
The installation and usage is very simple.
To install run:
sudo apt-get install python-alsaaudio
In your Python script, import the module:
import alsaaudio
Now, you need to get the main mixer and get/set the volume:
m = alsaaudio.Mixer()
current_volume = m.getvolume() # Get the current Volume
m.setvolume(70) # Set the volume to 70%.
If the line m = alsaaudio.Mixer() throws an error, then try:
m = alsaaudio.Mixer('PCM')
This might happen because the Pi exposes a PCM mixer rather than a Master channel.
You can see more information about your Pi's audio channels, volume (etc..), by running the command amixer.
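To tie this back to the question, here is a minimal sketch that maps the measured distance to a volume level (get_distance() is a stand-in for whatever your ultrasonic code returns, assumed here to be centimetres):

import time
import alsaaudio

def get_distance():
    # stand-in for the existing ultrasonic sensor code; returns distance in cm
    return 100.0

mixer = alsaaudio.Mixer()  # or alsaaudio.Mixer('PCM') on the Pi

while True:
    distance = get_distance()
    # map 0-200 cm to 0-100% volume and clamp to the valid range
    volume = int(max(0, min(100, distance / 2)))
    mixer.setvolume(volume)
    time.sleep(0.5)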
Collecting the available cards (audio.cards() will provide a list of available cards):
import alsaaudio as audio
scanCards = audio.cards()
print("cards:", scanCards)
In my case I got the following list:
[u'PCH', u'headset']
Scanning the mixers for each card:
for card in scanCards:
    scanMixers = audio.mixers(scanCards.index(card))
    print("mixers:", scanMixers)
In my case I got the following two lists:
[u'Master', u'Headphone', u'Speaker', u'PCM', u'Mic', u'Mic Boost', u'IEC958', u'IEC958', u'IEC958', u'IEC958', u'IEC958', u'Beep', u'Capture', u'Auto-Mute Mode', u'Internal Mic Boost', u'Loopback Mixing']
[u'Headphone', u'Mic', u'Auto Gain Control']
As you can see, a "Master" mixer is not always available; traditionally an equivalent of the Master mixer is expected at index 0, but that is not always the case.
To control the volume of the USB headset in this case, the procedure would be as follows.
Volume Up
def volumeMasterUP():
    mixer = audio.Mixer('Headphone', cardindex=1)
    volume = mixer.getvolume()
    newVolume = int(volume[0]) + 10
    if newVolume <= 100:
        mixer.setvolume(newVolume)
Volume Down
def volumeMasterDOWN():
    mixer = audio.Mixer('Headphone', cardindex=1)
    volume = mixer.getvolume()
    newVolume = int(volume[0]) - 10
    if newVolume >= 0:
        mixer.setvolume(newVolume)
I made a simple Python service for a two-button volume control, based on what @ant0nisk posted.
https://gist.github.com/peteristhegreat/3c94963d5b3a876b27accf86d0a7f7c0
It shows getting and setting the volume, and muting.
I am programming in Python 2.7 with the NLTK library for both text preprocessing and classification in sentiment analysis. I am using the NLTK wrapper for scikit-learn algorithms. The code below runs after preprocessing and splitting into train and test sets.
import nltk.classify.util
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC

training_set = nltk.classify.util.apply_features(extractFeatures, trainTweets)
testing_set = nltk.classify.util.apply_features(extractFeatures, testTweets)

# LinearSVC
LinearSVC_classifier = SklearnClassifier(LinearSVC())
LinearSVC_classifier.train(training_set)
LinearSVCAccuracy = nltk.classify.accuracy(LinearSVC_classifier, testing_set)*100
print "LinearSVC accuracy percentage:" + str(LinearSVCAccuracy)
It works fine when the training set is around 4000 tweets, but when it increases to, for example, 10000 tweets, the process gets killed with the following error.
Memory cgroup out of memory: Kill process 24293 (python) score 848 or
sacrifice child
Killed process 24293, UID 29091, (python) total-vm:14569168kB,
anon-rss:14206656kB, file-rss:3412kB
Clocksource tsc unstable (delta = -17179861691 ns). Enable clocksource
failover by adding clocksource_failover kernel parameter.
My PC has 8 GB of RAM, but I even tried with 16 GB and still have the problem. How can I classify this number of tweets without running out of memory?
Which OS are you running? Which Python distribution? Try installing Cython and/or using scikit-learn directly. Have a look at scikit-learn's optimization techniques.
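A minimal sketch of the "use scikit-learn directly" route (the tweet lists here are placeholders; it assumes trainTweets and testTweets are lists of (text, label) pairs, which is an assumption about your data). HashingVectorizer keeps the features in a sparse matrix instead of the per-tweet dicts NLTK builds, which is much lighter on memory:

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# placeholder data in the same shape assumed for trainTweets / testTweets
trainTweets = [("i love this", "pos"), ("i hate this", "neg")]
testTweets = [("love it", "pos"), ("hate it", "neg")]

train_texts = [text for text, label in trainTweets]
train_labels = [label for text, label in trainTweets]
test_texts = [text for text, label in testTweets]
test_labels = [label for text, label in testTweets]

# sparse, fixed-size feature matrix; no per-document feature dicts kept in memory
vectorizer = HashingVectorizer(n_features=2**18)
X_train = vectorizer.transform(train_texts)
X_test = vectorizer.transform(test_texts)

clf = LinearSVC()
clf.fit(X_train, train_labels)
accuracy = accuracy_score(test_labels, clf.predict(X_test)) * 100
print("LinearSVC accuracy percentage: " + str(accuracy))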