AWS EC2 Deep Learning instance CUDA 3.0 - amazon-web-services

I just launched (and paid for) a Deep Learning AMI (Ubuntu 18.04) Version 27.0 (ami-0dbb717f493016a1a) instance, type g2.2xlarge. I activated the "PyTorch with Python3 (CUDA 10.1 and Intel MKL)" environment:
source activate pytorch_p36
When I run my PyTorch network, I see this warning:
/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/cuda/__init__.py:134: UserWarning:
Found GPU0 GRID K520 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability that we support is 3.5.
Is this real?
This is my code to put my neural net on the GPU:
if torch.cuda.is_available():
    device = torch.device("cuda:0")  # you can continue going on here, like cuda:1 cuda:2....etc.
    print("Running on the GPU")
else:
    device = torch.device("cpu")
    print("Running on the CPU")

net = Net(image_height, image_width)
net.to(device)
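If you want to confirm what PyTorch reports for the card, the compute capability can be read directly (a minimal sketch using torch.cuda.get_device_capability; the warning's 3.0 corresponds to the (3, 0) this returns for the K520):
import torch

if torch.cuda.is_available():
    # (major, minor) compute capability of device 0, e.g. (3, 0) for the GRID K520
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")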

I had to use a g3s.xlarge instance instead; the g2 instances use the older GRID K520 GPUs (compute capability 3.0), which PyTorch no longer supports.
Also, I had to set num_workers=0 on my DataLoaders, following this thread: https://discuss.pytorch.org/t/oserror-errno-12-cannot-allocate-memory-but-memory-usage-is-actually-normal/56027.
And this is another PyTorch gotcha when moving tensors to a device (see the sketch below): https://stackoverflow.com/a/51606286/3614578.
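For illustration, the gotcha in that answer boils down to Tensor.to() returning a copy rather than moving the tensor in place (a minimal sketch; the DataLoader arguments reflect the num_workers=0 workaround above):
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 3)
x.to(device)       # no effect: Tensor.to() returns a copy, x itself stays on the CPU
x = x.to(device)   # correct: rebind the name to the copy on the device

# num_workers=0 keeps data loading in the main process (the memory workaround above)
dataset = TensorDataset(torch.randn(8, 3), torch.zeros(8))
loader = DataLoader(dataset, batch_size=4, num_workers=0)
Note that modules behave differently: net.to(device) moves the parameters in place, which is why the pattern in the question works for the network itself.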

Related

TensorFlow C API: Selecting a GPU

I am using the TensorFlow C API to run models saved/frozen in Python. We used to run these models on the CPU but recently switched to the GPU for performance. To interact with the C API we use a wrapper library called CPPFlow (https://github.com/serizba/cppflow). I recently updated this library so that we can pass in GPU config options and control GPU memory allocation. However, we now also have systems with multiple GPUs, which is causing some issues: I can't seem to get TensorFlow to use the same GPU as our software does.
I use the visible_device_list parameter with the same GPU ID as our software. If I set our software to run on device 1 and TensorFlow to device 1, TensorFlow picks device 2. If I set our software to use device 1 and TensorFlow to use device 2, both use the same GPU.
How does TensorFlow order GPU devices, and do I need another method to select the device manually? Everywhere I look suggests it can be done through the GPU config options.
One way to set the device is to build the config in Python, serialize it, and use the resulting byte string in the C API. For example:
Sample 1:
import tensorflow as tf

gpu_options = tf.GPUOptions(allow_growth=True, visible_device_list='1')
config = tf.ConfigProto(gpu_options=gpu_options)
serialized = config.SerializeToString()
print(list(map(hex, serialized)))
Sample 2:
import tensorflow as tf

config = tf.compat.v1.ConfigProto(device_count={"CPU": 1}, inter_op_parallelism_threads=1, intra_op_parallelism_threads=1)
ser = config.SerializeToString()
list(map(hex, ser))
Out[]:
['0xa', '0x7', '0xa', '0x3', '0x43', '0x50', '0x55', '0x10', '0x1', '0x10', '0x1', '0x28', '0x1']
Use this byte string in the C API as:
uint8_t config[13] = {0xa, 0x7, 0xa, 0x3, 0x43, 0x50, 0x55, 0x10, 0x1, 0x10, 0x1, 0x28, 0x1};
TF_SetConfig(opts, (void*)config, 13, status);
For more details:
https://github.com/tensorflow/tensorflow/issues/29217
https://github.com/cyberfire/tensorflow-mtcnn/issues/1
https://github.com/tensorflow/tensorflow/issues/27114
You can also set TensorFlow's GPU order with the CUDA_VISIBLE_DEVICES environment variable, as long as it is set before the process initializes CUDA:
// Set TF to use GPU:1 and GPU:0 (in this order)
setenv( "CUDA_VISIBLE_DEVICES", "1,0", 1 );
// Set TF to use only GPU:0
setenv( "CUDA_VISIBLE_DEVICES", "0", 1 );
// Set TF to use no GPUs
setenv( "CUDA_VISIBLE_DEVICES", "-1", 1 );

TensorFlow places Softmax op on CPU instead of GPU

I have a TensorFlow model with multiple inputs and several layers, ending in a softmax layer. The model is trained in Python (using the Keras framework), then saved; inference is done from a C++ program built against a CMake build of TensorFlow (following essentially these instructions: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/cmake).
In Python (tensorflow-gpu), all ops are placed on the GPU (shown with log_device_placement):
out/MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
2017-12-04 14:07:38.005837: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\simple_placer.cc:872] out/MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
out/BiasAdd: (BiasAdd): /job:localhost/replica:0/task:0/gpu:0
2017-12-04 14:07:38.006201: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\simple_placer.cc:872] out/BiasAdd: (BiasAdd)/job:localhost/replica:0/task:0/gpu:0
out/Softmax: (Softmax): /job:localhost/replica:0/task:0/gpu:0
2017-12-04 14:07:38.006535: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\simple_placer.cc:872] out/Softmax: (Softmax)/job:localhost/replica:0/task:0/gpu:0
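(For reference, these placement logs come from enabling log_device_placement when the session is created; a minimal TF 1.x sketch, with a hypothetical softmax_check op standing in for the model:)
import tensorflow as tf  # TF 1.x, as in the question

a = tf.random_normal([2, 3])
b = tf.random_normal([3, 4])
out = tf.nn.softmax(tf.matmul(a, b), name='softmax_check')

# log_device_placement prints the device every op is assigned to
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(out)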
To save the graph, the freeze_graph script is used (the script producing the log above then reloads the frozen graph in .pb format).
When I load the frozen graph from the C++ program (closely following the LoadGraph() function in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/main.cc - ReadBinaryProto() and session->Create()) and log the device placements again, I find that the Softmax is placed on the CPU (all other ops are on the GPU):
dense_6/MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
dense_6/BiasAdd: (BiasAdd): /job:localhost/replica:0/task:0/device:GPU:0
dense_6/Relu: (Relu): /job:localhost/replica:0/task:0/device:GPU:0
out/MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
out/BiasAdd: (BiasAdd): /job:localhost/replica:0/task:0/device:GPU:0
out/Softmax: (Softmax): /job:localhost/replica:0/task:0/device:CPU:0
This placement is also confirmed by high CPU/low GPU utilization, and is apparent when profiling the application as well. The data type of the out layer is float32 (out/Softmax -> (<tf.Tensor 'out/Softmax:0' shape=(?, 1418) dtype=float32>,)).
Further investigation revealed:
Creating the softmax op in C++ and placing it on the GPU explicitly throws this error message:
Cannot assign a device for operation 'tsoftmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
A call to tensorflow::LogAllRegisteredKernels() also showed that Softmax is only available for the CPU.
The build directory contains many files related to "softmax" (e.g. tf_core_gpu_kernels_generated_softmax_op_gpu.cu.cc.obj.Release.cmake). I don't know how to check every compilation step, though.
When I look into tf_core_gpu_kernels.lib (one can open a .lib with 7Z ;)), there are files like tf_core_gpu_kernels_generated_softmax_op_gpu.cu.cc.lib - so I believe there is nothing wrong with compiling the kernels themselves.
But: inspecting tensorflow.dll (Dependency Walker) shows that only CPU kernels for Softmax are included (there are functions like const tensorflow::SoftmaxOp<struct Eigen::ThreadPoolDevice,double>, but no GPU functions such as const tensorflow::SoftplusGradOp<struct Eigen::GpuDevice,float>).
Setup: TensorFlow 1.3.0, Windows 10, GPU: NVIDIA GTX 1070 (8 GB RAM; memory utilization is also very low).
I found a workaround: include tf_core_gpu_kernels.lib in some of the build steps (create_def_file.py). More details here: GitHub Issue 15254

How to get TensorFlow to detect all GPUs on AWS?

I am running an LSTM net on an EC2 p2.8xlarge. Of course I'd like to take advantage of all the GPUs available (8). I can run it on one GPU easily, but not more. I get the following error when calling "multi_gpu_model":
"To call multi_gpu_model with gpus=8, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3', '/gpu:4', '/gpu:5', '/gpu:6', '/gpu:7']. However this machine only has: ['/cpu:0']. Try reducing gpus."
When I type nvidia-smi, all 8 GPUs show up in the terminal. How can I make them available to my TF (Keras) environment?
When I run device_lib.list_local_devices() in a Jupyter notebook, it returns only the CPU, when it should also return the 8 GPUs. Here is the relevant bit of code:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint
from keras.utils import multi_gpu_model

model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
# wrap in multi_gpu_model first, then compile the parallel model
model = multi_gpu_model(model, gpus=8)
model.compile(loss='categorical_crossentropy', optimizer='adam')
# define the checkpoint
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)
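Since device_lib.list_local_devices() returns only the CPU, the problem sits below Keras: TensorFlow itself is not registering the GPUs. A quick diagnostic (a minimal sketch; on a correctly configured p2.8xlarge it should list eight GPU devices):
import tensorflow as tf
from tensorflow.python.client import device_lib

# on a working p2.8xlarge this should include /device:GPU:0 .. /device:GPU:7
print([d.name for d in device_lib.list_local_devices()])

# False here typically means the CPU-only tensorflow package is installed
# instead of tensorflow-gpu
print(tf.test.is_built_with_cuda())
If is_built_with_cuda() prints False, the environment most likely has the CPU-only tensorflow package installed rather than tensorflow-gpu.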

Train on AWS using GPU not CPU

I just launched an AWS P2 instance trying to train a model. However it seems to be using the CPU to train not the GPU. How can I force it to train using the GPU not the CPU?
My ~/.keras/keras.json looks like this:
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
I am getting a message "Failed to load the native TensorFlow runtime."
I then changed the backend, so that ~/.keras/keras.json now says:
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
It's training now, but very slowly, and it seems to be using the CPU.
It looks like the answer is to add a gpus flag:
python cnn_homework_solution.py --gpus 0,1
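Separately, before switching backends it may be worth confirming that the TensorFlow build can see the GPU at all (a minimal sketch, assuming a TF 1.x installation):
import tensorflow as tf

# returns e.g. '/device:GPU:0' when a GPU is usable, or '' otherwise
print(tf.test.gpu_device_name())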

Tensorflow does not recognize GPU on AWS

So here it goes: I wanted to use TensorFlow with a GPU on AWS, on the p2.xlarge plan. Unfortunately, something must have gone wrong, and I keep getting:
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'Variable_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
I checked both CUDA and cuDNN:
nvcc -V
cat /usr/local/cuda/include/cudnn.h
and got 8.0 and 5.1, respectively.
I call gpu like this:
with tf.device('/gpu:0'):
a = tf.Variable(tf.truncated_normal([100, 100]))
b = tf.Variable(tf.truncated_normal([100, 1000]))
with tf.Session() as sess:
sess.run(tf.matmul(a,b))
Happy to post more details if necessary - I don't know what will be useful yet.
I suppose you're trying to set up an EC2 instance from scratch? That can be difficult.
Instead, I'd strongly recommend using the Deep Learning AMI (https://aws.amazon.com/machine-learning/amis/). It comes preinstalled with everything you need (drivers, popular DL libraries, etc.), and it's free to use; you pay only for the instance itself.
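As an aside, and not a substitute for fixing the missing GPU: if you want a snippet like yours to run even where no GPU is registered, TF 1.x sessions accept allow_soft_placement, which falls back to the CPU instead of raising the device-assignment error (a sketch):
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.Variable(tf.truncated_normal([100, 100]))
    b = tf.Variable(tf.truncated_normal([100, 1000]))

# allow_soft_placement lets TF place ops on the CPU when the
# requested device is unavailable, instead of raising an error
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.matmul(a, b))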