I'm updating to the newest version of IBM DSX Desktop 1.1.7. The install process stalls and displays
"Downloading DSX... 0%".
The file desktop.log ends with the lines
“The size is 83.85 and installsize is 14.73”
and
“Docker Storage Requirement Failed”.
The Docker app is up to date. I’ve tried the usual reboot, etc., without success.
Any clue?
I work on the DSX Desktop team. You might be encountering a bug with the installation code. The good news is that it's a pretty simple fix if you are. Here's how to check:
Diagnosis & Fix
Open up your terminal or command prompt and run docker system df. Look for the entry that corresponds to IMAGE SIZE. If the unit is in kB or B, then you're encountering the bug.
To fix it, run docker pull busybox. Once the pull completes, you should be able to update DSX Desktop. After it successfully updates, you can run docker rmi busybox.
If the unit is in GB, then that means your existing images are taking up too much space. The limit is 60GB so make sure your IMAGE SIZE + installSize <= 60GB.
Explanation
There's a typo in the code that causes the installer to mistake kB and B for GB. Thus, if you have an IMAGE SIZE of 83.85kB, the installer will treat it as 83.85GB and will complain about being over the 60GB limit.
So to fix it, we pull a temporary image busybox, which updates our IMAGE SIZE to be in MB, allowing us to avoid the bug. After we successfully update DSX Desktop, we can remove the temporary image.
This has been patched and will be fixed in an upcoming release.
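To make the unit mix-up concrete, here is a minimal Python sketch of a storage check that respects units. It is only an illustration of the behaviour described above, not the DSX Desktop installer's code: the names to_gb and fits_limit and the unit table are assumptions, and only the 60GB limit and the sample sizes come from this thread.

```python
# Hypothetical sketch of the storage check; NOT the DSX Desktop installer code.
# Docker reports sizes in decimal units (1 kB = 1000 B).
UNIT_TO_GB = {"B": 1e-9, "kB": 1e-6, "MB": 1e-3, "GB": 1.0, "TB": 1e3}

LIMIT_GB = 60.0  # limit quoted in the answer above


def to_gb(size: str) -> float:
    """Convert a size string such as '83.85kB' to gigabytes."""
    # Try longer suffixes first so that 'kB' is not mistaken for 'B'.
    for unit in sorted(UNIT_TO_GB, key=len, reverse=True):
        if size.endswith(unit):
            return float(size[: -len(unit)]) * UNIT_TO_GB[unit]
    raise ValueError(f"unrecognised size: {size!r}")


def fits_limit(image_size: str, install_size_gb: float) -> bool:
    """Check IMAGE SIZE + installSize <= 60GB with the unit taken into account."""
    return to_gb(image_size) + install_size_gb <= LIMIT_GB
```

With the values from the log, fits_limit("83.85kB", 14.73) passes, while the same number read as GB, fits_limit("83.85GB", 14.73), fails, which matches the reported "Docker Storage Requirement Failed" message.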
In my case, dsx-desktop.log has the following error:
[2017-11-04 19:52:03:0214] [error] exec error: Error: Command failed: eval $(docker-machine env ibm-dsx) && docker system df
docker: 'system' is not a docker command.
See 'docker --help'.
[2017-11-04 19:52:03:0214] [error] stderr: docker: 'system' is not a docker command.
See 'docker --help'.
It turns out that the "docker system" command is only available from Docker API version 1.25 onward. Check your API version in the output of "docker version".
After I reinstalled Docker with a version that uses API version 1.33, the download worked.
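As a rough illustration, the version check can be written in Python as below. The function names are hypothetical, and the input is assumed to be the API version string copied from the output of "docker version". Comparing the parts as integers matters, because a plain string comparison would rank "1.9" above "1.25".

```python
def parse_api_version(version: str) -> tuple:
    """Split an API version string such as '1.33' into integers (1, 33)."""
    major, minor = version.strip().split(".")[:2]
    return int(major), int(minor)


def supports_docker_system(api_version: str) -> bool:
    """Return True if the API version is at least 1.25, as stated above."""
    return parse_api_version(api_version) >= (1, 25)
```

For example, supports_docker_system("1.33") is true, while supports_docker_system("1.24") is false.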
I'm trying to install the nightly version of PyTorch on my Google Cloud account using the following command:
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu113
But I receive the following error
ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device
After that I can't do anything else because my disk is full. I tried
pip3 uninstall pytorch
and I get the message
WARNING: Skipping pytorch as it is not installed.
So my questions are threefold:
1) Considering that I have just purchased this account, why is the disk already full?
2) Since I can't uninstall the package, how can I remove the downloaded PyTorch files to free up some space?
3) When I click on Compute Engine -> Disk, I see that the disk size is 100GB, so why can't I install a simple PyTorch package?
Most likely, pip is trying to download the data to your temporary location, /tmp, which does not have enough free space.
You can set your TMPDIR directory to /var/tmp by running the following command:
export TMPDIR='/var/tmp'
As mentioned in the GitHub link, you can also create a directory where you have enough space, say /folder/address/here/, and run the command below to install the package:
TMPDIR=/folder/address/here/ pip install --cache-dir=$TMPDIR --build $TMPDIR package-name
As suggested by @John Hanley, the Google Cloud Console GUI supports resizing the disk to a larger size. You can make your disk at least 10 GB larger and then do a disk cleanup.
Refer to the stack post for more information.
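Before pointing TMPDIR somewhere else, it can help to confirm which location actually has room. The following is a small sketch using only Python's standard library; the function names and the idea of passing a list of candidate directories are assumptions made for illustration, not part of pip.

```python
import shutil


def free_gb(path: str) -> float:
    """Free space, in GB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1e9


def pick_tmpdir(candidates, needed_gb):
    """Return the first candidate with at least `needed_gb` free, else None."""
    for path in candidates:
        try:
            if free_gb(path) >= needed_gb:
                return path
        except OSError:
            # The directory does not exist or cannot be read: skip it.
            continue
    return None
```

A call such as pick_tmpdir(["/tmp", "/var/tmp"], 5) returns a directory you could then export as TMPDIR; the 5 GB figure is only a placeholder for however much space the install needs.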
I'm trying to create a Docker context that will automatically integrate with AWS's ECS.
I'm following this tutorial
The author just does:
docker context create ecs myecs and gets a "pick an integration" prompt, whereas I get an error saying it needs exactly 1 argument.
"docker context create" requires exactly 1 argument.
See 'docker context create --help'.
Usage: docker context create [OPTIONS] CONTEXT
Create a context
You need to install the Docker Compose CLI preview.
The curl command below is from the Docker docs:
curl -L https://raw.githubusercontent.com/docker/compose-cli/main/scripts/install/install_linux.sh | sh
sudo docker context create ecs myecs
It didn't work without sudo for me for some reason.
After the script finished I had some weird errors:
cp: cannot stat '/tmp/tmp.d4QjhW8T6k/docker-compose': No such file or directory and docker context create ecs myecs didn't work at first, but once I tried with sudo it worked fine.
EDIT: . ~/.zshrc (or just close your terminal and open a new one) made it possible for me to run docker context create ecs myecs without sudo.
Author of the blog/tutorial here. It looks like you don't have the prerequisite installed. In the blog I call out the prerequisites in pieces like this:
....In July, Docker released a beta for Docker Desktop that embedded these functionalities and, on September 15th, Docker released an updated experience in their Docker Desktop stable channel....
and then
...For now the only thing you need is Docker Desktop and an AWS account. For this test , I am using Docker Desktop (stable) version 2.5.0.1....
and finally
The core of this integration is built around a new tool dubbed Compose CLI (this is not to be confused with the original docker-compose CLI). This new CLI surfaces to the user as new functionalities in the docker command. While in Docker Desktop all this plumbing is completely hidden and available out of the box, if you are using a Linux machine you can set it up using either a script or a manual install. This new CLI is, essentially, a new version of the docker binary.
Eager to understand how we could make it clearer and more front and center that there was stuff to install and/or minimum software versions you had to use.
Thanks for trying it out!
If you're on Linux and you're running the docker context create ecs myecscontext command from the docs then try enabling experimental features in docker:
Edit /etc/docker/daemon.json
Set contents to
{
"experimental": true
}
Restart the Docker service: sudo systemctl restart docker
Exit your terminal and open a new one so that the changes take effect.
Source1
Source2
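If /etc/docker/daemon.json already contains other settings, overwriting the whole file would drop them. The sketch below shows one way to add the flag while keeping existing keys; the function name enable_experimental is hypothetical, and it works on the file's text so that reading and writing the file (which needs root) is left to you.

```python
import json


def enable_experimental(daemon_json_text: str) -> str:
    """Return daemon.json content with "experimental": true added.

    Existing keys are preserved; an empty or missing file is treated as {}.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["experimental"] = True
    return json.dumps(config, indent=2)
```

For example, passing '{"log-driver": "json-file"}' returns JSON that keeps the log-driver entry and adds the experimental flag.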
I had the same issue, but it was resolved after I installed Docker Desktop.
The server-side version doesn't have this kind of functionality.
I am attempting to use AWS Batch to launch a Linux server, which will in essence perform the fetch and go example included within AWS (to download a shell script from S3 and run it).
Does AWS Batch work at all for anyone?
The AWS fetch_and_go example always fails, even when I follow someone else's online guide that mimics the AWS example.
I have tried creating Dockerfiles for amazonlinux:latest and ubuntu:20.04 with numerous RUN and CMD instructions.
The scripts always seem to fail with the error:
standard_init_linux.go:219: exec user process caused: exec format error
I thought at first this was related to access rights on my deployment, maybe within amazonlinux, so I have played with chmod 777, chmod -x, etc. on the sh file.
The final nail in the coffin: my current Dockerfile is literally:
FROM ubuntu:20.04
I launch this using AWS Batch with no command or parameters passed through, and it still fails with the same error. This hints to me that there is either a setup issue with my AWS Batch environment (I'm using the default wizard settings, except for changing to an a1.medium instance) or that AWS Batch has some major issues.
Has anyone had any success with AWS Batch launching their own Dockerfiles? Could they share their examples and/or setup parameters?
Thank you in advance.
A1 instances use ARM-based first-generation Graviton CPUs. It is highly likely that the image you are trying to run expects an x86 CPU (Intel or AMD). Any instance family with a "g" after the generation number (such as "c6g" or "m6g") uses Graviton2, which is also ARM-based and will not work with the default examples.
You can test whether a specific container will run by launching an A1 instance yourself and running the container (after installing docker). My guess is that you will get the same error. Running on Intel or AMD instances should work.
To leverage Batch with ARM your containerized application will need to work on ARM. If you point me to the exact example, I can give more details on how to adjust to run on A1 or Graviton2 instances.
I had the same issue, and it was because I built the image locally on my M1 Mac.
If this is your case, try adding --platform linux/amd64 to your docker build command before pushing.
In addition to the other comment: you can create multi-arch images yourself, which will provide the correct architecture.
https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/
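The "exec format error" in this thread comes down to the image architecture not matching the host CPU. A minimal Python sketch of that comparison follows. The helper names are hypothetical, the alias table only covers the two architectures discussed here, and the check ignores emulation layers such as QEMU that can sometimes run foreign binaries.

```python
import platform

# `uname -m` style names mapped to the names used in Docker platform strings.
_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine: str) -> str:
    """Normalise a machine name such as 'aarch64' to Docker's 'arm64'."""
    name = machine.lower()
    return _ALIASES.get(name, name)


def can_run(image_arch: str, host_machine: str) -> bool:
    """True if the image architecture matches the host CPU architecture."""
    return docker_arch(image_arch) == docker_arch(host_machine)
```

On the machine where you build, docker_arch(platform.machine()) shows what an image built without --platform will target; can_run("amd64", "aarch64") is false, which is the situation of an x86 image on an A1 instance.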
Project Detail
I am running the open-source code of a GAN-based research paper titled "Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition"
source code: here
The dependencies include:
Python 2.7
TensorFlow 1.4.0
I pulled a Docker image of TensorFlow 1.4.0 with Python 2.7 onto my GPU virtual machine, which I connect to over SSH, using this command:
docker pull tensorflow/tensorflow:1.4.0-gpu
I am running
bash rsrgan/run_gan_rnn_placeholder.sh
according to the README of the source code.
Issue Detail
Everything works at first: the model trains and the loss decreases. The only issue is that after some iterations the terminal stops showing output. The GPU still lists the process ID, but the memory is not freed, and sometimes GPU utilization drops to 0%. The same thing happens whether I train on the VM's GPU or its CPU.
It is not a memory issue, because the model uses 5400MB of the 11,000MB of GPU memory, and the RAM available to the CPU is also very large.
When I ran 21 iterations on my local computer (1st-gen i5, 4GB RAM), at 0.09 hours per iteration, all iterations completed. But whenever I run it over SSH inside Docker, the issue happens again and again, on both GPU and CPU.
Keep in mind that the issue happens inside Docker on a machine connected over SSH, and the SSH connection does not disconnect very often.
Exact Numbers
If an iteration takes 1.5 hours, the issue happens after two to three iterations; if a single iteration takes 0.06 hours, the issue happens after exactly 14 iterations out of 25.
Perform operations inside Docker container
The first thing you can try is to build the Docker image and then enter the container by passing the -ti flags and /bin/bash in your docker run command.
Clone the repository inside the container, and copy your training data from your local machine into the image while building it. Run the training there and commit the changes so that you do not have to repeat these steps in future runs; once you exit the container, all uncommitted changes are lost.
You can find the reference for docker commit here.
$ docker commit <container-id> <image-name:tag>
While training is going on check for the GPU and CPU utilization of the VM, see if everything is working as expected.
Use an Anaconda environment on your VM
Anaconda is a great package manager. You can install Anaconda, create a virtual environment, and run your code in it.
$ wget <url_of_anaconda.sh>
$ bash <path_to_sh>
$ source anaconda3/bin/activate  # or: source anaconda2/bin/activate
$ conda create -n <env_name> python=2.7
$ conda activate <env_name>
Install all the dependencies via conda (recommended) or pip.
Run your code.
Q1: GAN Training with Tensorflow 1.4 inside Docker Stops without Prompting
Although Docker provides OS-level virtualization, some processes that run with ease on the host system run into issues inside a container. So to debug the problem, you should go inside the container and perform the steps above.
Q2: Training stops without Releasing Memory connected to VM with SSH Connection
Yes, this is an issue I have also faced before. The best way to release the memory is to stop the Docker container. You can find more resource allocation options here.
Also, earlier versions of TensorFlow had issues with allocating and clearing memory properly. You can find some reference here and here. These issues have been fixed in recent versions of TensorFlow.
Additionally, check for Nvidia bug reports
Step 1: Install nvidia-utils with the following command. You can find the driver version in the nvidia-smi output (also mentioned in the question).
$ sudo apt install nvidia-utils-<driver-version>
Step 2: Run the nvidia-bug-report.sh script
$ sudo /usr/bin/nvidia-bug-report.sh
A log file named nvidia-bug-report.log.gz will be generated in your current working directory. You can also access the installer log at /var/log/nvidia-installer.log.
You can find additional information about Nvidia logs at these links:
Nvidia Bug Report Reference 1
Nvidia Bug Report Reference 2
Log GPU load
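To see when the GPU goes idle during training, the load can be sampled with nvidia-smi's CSV query mode and logged. The Python sketch below is one possible way to do that; the function names and dictionary keys are hypothetical, and the parser assumes every queried field is numeric (nvidia-smi may print a non-numeric marker for fields a GPU does not report, which this sketch does not handle).

```python
import subprocess

# Query utilisation and memory for each GPU as plain CSV without headers or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line such as '0, 5400, 11000' into named fields."""
    util, used, total = (float(field) for field in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def read_gpus() -> list:
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling read_gpus() in a loop with a sleep, and writing each result with a timestamp to a file, gives a record of the moment utilization drops to 0 while memory stays allocated, which is the symptom described in the question.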
Hope this helps.
I would like to use Google Container OS as my cloud development environment. How would I run the docker command from the toolbox? Do I need to add the docker.sock as a bind mount? I need to be able to run docker (and docker-compose) to run my development environment.
Google Container OS images come with docker already installed and configured, so you will be able to use the docker command from the command line without any prior configuration if you create a virtual machine from one of these images, and SSH into the machine.
As for docker-compose, this doesn't come pre-installed. However, you can install it, along with any other tools or programs you require, by using the toolbox you mentioned, which provides a shell (including a package manager) in a Debian chroot-like environment (here you automatically gain root privileges).
You can install docker-compose by following these steps:
1) If you haven't already, enter the toolbox environment by running /usr/bin/toolbox
2) Check the latest version of docker-compose here.
3) You can run the following to retrieve and install docker-compose on the machine (substitute the docker-compose version number for the latest version you retrieved in step 2):
curl -L https://github.com/docker/compose/releases/download/1.18.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
Then make the downloaded binary executable:
chmod +x /usr/local/bin/docker-compose
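The download URL in step 3 is assembled from the release version plus the output of uname -s and uname -m. As a small sketch, the same URL can be built in Python to check what will be fetched before running curl. The function name compose_url is hypothetical, and it assumes that platform.system() and platform.machine() return the same values as the two uname calls, which is the case on Linux.

```python
import platform


def compose_url(version: str, system: str = "", machine: str = "") -> str:
    """Build the docker-compose release URL used in step 3."""
    system = system or platform.system()      # counterpart of `uname -s`
    machine = machine or platform.machine()   # counterpart of `uname -m`
    return (
        "https://github.com/docker/compose/releases/download/"
        f"{version}/docker-compose-{system}-{machine}"
    )
```

Passing the version found in step 2, for example compose_url("1.18.0"), prints the exact file the curl command will request on the current machine.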
You've probably found at this point that although you can now run the freshly installed docker-compose command within the toolbox, you can't run the docker command. This is because, by default, the toolbox environment doesn't have access to all paths within the rootfs, and the filesystems of the two environments don't correspond to each other.
It may be possible to remedy this by exiting out of the toolbox shell and then editing the /etc/default/toolbox file, which allows you to configure the toolbox settings. This would allow you to provide access to the docker binary file in the standard environment by following these steps:
1) Ensure you are no longer in the toolbox shell, then run command which docker. You will see something similar to /usr/bin/docker.
2) Open file /etc/default/toolbox
3) The TOOLBOX_BIND line specifies the paths from rootfs to be made available inside the toolbox environment. To ensure docker is available inside the toolbox environment, you could try adding an entry to the TOOLBOX_BIND section, for example --bind=/usr/bin/docker:/usr/bin/docker.
However, I've found that even after editing /etc/default/toolbox to make the docker binary available in the toolbox environment, certain docker commands generate additional errors when run there. The docker version that comes pre-installed on the machine is configured to use certain configuration files and directories. Although it may be possible to edit /etc/default/toolbox and make all of the required locations accessible from within the toolbox environment, it may be simpler to install docker within the toolbox by following the instructions for installing docker on Debian found here.
You would then be able to issue both the docker and docker-compose commands from within toolbox.
Alternatively, it's possible to simply install docker and docker-compose on a standard VM (i.e. without necessarily using a Google Container OS machine type) although the suitability of this depends on your use case.