aws-shell not working in Ubuntu 20.04 on AWS Lightsail

I've created an AWS Lightsail instance with Ubuntu 20.04 and installed python3 and pip3.
I installed the AWS Shell tool using the pip3 install aws-shell command.
However, when I try to run it, it hangs and outputs Killed after several minutes.
This is what it looks like:
root@ip-...:/home/ubuntu# aws-shell
First run, creating autocomplete index...
Killed
root@ip-...:/home/ubuntu# aws-shell
First run, creating autocomplete index...
Killed
On the Metrics page of AWS Lightsail it shows a CPU utilization spike in the Burstable zone.
So I'm quite sad that this just wastes CPU quota by loading the CPU for several minutes and doesn't work.
I've done the same steps on Ubuntu 16.04 in a virtual machine and it worked fine there. So I'm completely lost here and don't know how I can fix it. I tried to google this problem and didn't find anything related.
UPD: I've also just tried using Python 2.7 to install aws-shell, and it still doesn't work. So it fails with both Python 3.8.5 and 2.7.18.

The aws-shell tool should be used on your local machine, not on the AWS Lightsail instance.
I wish it printed a warning or info message about this, but at least I now know it was the wrong approach.
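For reference, a minimal sketch of the same setup on a local machine (this assumes pip3 is available and that AWS credentials are already configured; the ec2 command is just an example query):
$ pip3 install aws-shell
$ aws configure                 # enter access key, secret key and default region
$ aws-shell
First run, creating autocomplete index...
aws> ec2 describe-instances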

Related

How to get a local Cloud Foundry Instance?

I’m looking to learn about Cloud Foundry and I’m trying to get a development instance of it set up on my local Windows 10 PC. But I’m not having any luck.
I'm finding a lot of information about PCF Dev, which was deprecated a while ago. I also looked at the replacement for PCF Dev, CF Dev (https://github.com/cloudfoundry-attic/cfdev). Its GitHub page mentions that the repository is no longer receiving updates. I still went ahead and tried installing it using the instructions in the README:
cf install-plugin -r CF-Community cfdev
But the link it uses to download the plugin is broken:
Starting download of plugin binary from repository CF-Community...
Get "https://d3p1cc0zb2wjno.cloudfront.net/cfdev/cfdev-v0.0.18-rc.36-windows.exe": dial tcp: lookup d3p1cc0zb2wjno.cloudfront.net: no such host
Can anyone recommend a way to get a development instance of Cloud Foundry set up on my local machine so I can play around with it?
Thanks
Yes, steer clear of pcf-dev and cf-dev; they may still work, but they are definitely not getting updates, so they will be way out of date by now.
My understanding, although I haven't tried this process in a while, is that the way to run Cloud Foundry locally is with VirtualBox, using bosh-deployment and cf-deployment.
For instructions on installing BOSH in VirtualBox using bosh-deployment, see the Install section of the bosh-deployment docs.
With BOSH installed, follow the cf-deployment deployment guide to get CF installed. You can skip to step 4, since you're installing into VirtualBox. Be sure to read the entire document before you begin, and pay specific attention to the section with specific instructions for running locally.
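As a rough sketch of what the VirtualBox bootstrap looks like (written from memory of the bosh-deployment docs, so treat the ops files, variables and IPs as approximate and check the current README):
$ git clone https://github.com/cloudfoundry/bosh-deployment.git
$ bosh create-env bosh-deployment/bosh.yml \
    --state ./state.json \
    --vars-store ./creds.yml \
    -o bosh-deployment/virtualbox/cpi.yml \
    -o bosh-deployment/virtualbox/outbound-network.yml \
    -o bosh-deployment/bosh-lite.yml \
    -v director_name=bosh-lite \
    -v internal_ip=192.168.50.6 \
    -v internal_gw=192.168.50.1 \
    -v internal_cidr=192.168.50.0/24 \
    -v outbound_network_name=NatNetwork
Once the Director is up, point cf-deployment at it as described in its deployment guide.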

AWS Batch Failing to launch Dockerfile - standard_init_linux.go:219: exec user process caused: exec format error

I am attempting to use AWS Batch to launch a Linux container which will, in essence, perform the fetch-and-run example included within AWS (downloading a shell script from S3 and running it).
Does AWS Batch work at all for anyone?
The AWS fetch_and_run example always fails; I even followed someone else's guide online which mimicked the AWS example.
I have tried creating Dockerfiles based on amazonlinux:latest and ubuntu:20.04 with numerous RUN and CMD variations.
The scripts always seem to fail with the error:
standard_init_linux.go:219: exec user process caused: exec format error
I thought at first this was related to access rights within the amazonlinux image, so I have played with chmod 777, chmod +x, etc. on the sh file.
The final nail in the coffin: my current Dockerfile is literally:
FROM ubuntu:20.04
I launch this using AWS Batch, with no command or parameters passed through, and it still fails with the same error. This is almost hinting to me that there is either a setup issue with my AWS Batch (I'm using the default wizard settings, except changing to an a1.medium instance) or that AWS Batch has some major issues.
Has anyone had any success with AWS Batch launching their own Dockerfiles? Could they share their examples and/or setup parameters?
Thank you in advance.
A1 instances use the ARM-based first-generation Graviton CPU. It is highly likely the image you are trying to run expects an x86 CPU (Intel or AMD). Any instance class with a "g" in it (e.g. "c6g" or "m6g") is Graviton2, which is also ARM based and will not work for the default examples.
You can test whether a specific container will run by launching an A1 instance yourself and running the container (after installing Docker). My guess is that you will get the same error. Running on Intel or AMD instances should work.
To leverage Batch with ARM, your containerized application will need to work on ARM. If you point me to the exact example, I can give more details on how to adjust it to run on A1 or Graviton2 instances.
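For example, a quick way to compare architectures on the instance (the image name is a placeholder):
$ uname -m                                                          # host architecture: x86_64 or aarch64
$ docker image inspect --format '{{.Architecture}}' <your-image>    # architecture the image was built for
If the two don't match (e.g. an amd64 image on an aarch64/A1 host), you get exactly the "exec format error" above.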
I had the same issue, and it was because I built the image locally on my M1 Mac.
Try adding --platform linux/amd64 to your docker build command before pushing, if this is your case.
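For example, assuming a Docker version with BuildKit, and with the image name and ECR repository as placeholders:
$ docker build --platform linux/amd64 -t my-batch-job .
$ docker tag my-batch-job <account>.dkr.ecr.<region>.amazonaws.com/my-batch-job:latest
$ docker push <account>.dkr.ecr.<region>.amazonaws.com/my-batch-job:latest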
In addition to the other comment: you can create multi-arch images yourself, which will provide the correct architecture.
https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/
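A rough sketch of that approach with buildx (the repository name is a placeholder):
$ docker buildx create --use
$ docker buildx build --platform linux/amd64,linux/arm64 -t <repo>/my-batch-job:latest --push .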

AWS Elastic Beanstalk Python 3.7 Deployment Location

I'm trying to upgrade an existing application on AWS from the now-deprecated Python 3.4 platform to 3.7 with Amazon Linux 2/3.0.1, and in the process I ran into an issue with where the application source code is deployed on the EC2 instance.
From some empirical testing, I found that instead of the /opt/python/current/app directory that most if not all AWS documentation mentions (e.g. Troubleshooting issues with the EB CLI - AWS Elastic Beanstalk), with Python 3.7 it is actually deployed in /var/app/current/. I wasn't able to find any documentation regarding this change, and it is causing some issues with the application. I'm wondering, is there any reason this change was made? And if it is possible to revert it, how do I do so?
Thanks in advance!
This is because the Python 3.7 Elastic Beanstalk platform uses Amazon Linux 2, which is fundamentally different from its AMI predecessor. If you opt to use Python 3.6 instead, you should be able to avoid this issue, as it runs on the earlier Linux version where deployments still occur in /opt/python/current/app. Most tutorials I've found are designed to work with this older rollout, including the most up-to-date Amazon start guide.
If you have the time, try migrating your code to the newer version, as this seems to be the workflow Amazon is embracing going forward for all newer versions of Python (such as 3.8 and others yet to come).
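If it helps, you can confirm where a given platform deploys the code by SSH-ing into the instance (paths as described above):
$ eb ssh
$ ls /var/app/current            # Amazon Linux 2 platforms (Python 3.7/3.8)
$ ls /opt/python/current/app     # older Amazon Linux AMI platforms (Python 3.6 and earlier)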

GAN Training with TensorFlow 1.4 inside Docker Stops without Warning and without Releasing Memory, on a VM Connected over SSH

Project Detail
I am running the open-source code of a GAN-based research paper named "Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition"
source code: here
The dependencies include:
Python 2.7
TensorFlow 1.4.0
I pulled a Docker image of TensorFlow 1.4.0 with Python 2.7 onto my GPU virtual machine (connected over SSH) with this command:
docker pull tensorflow/tensorflow:1.4.0-gpu
I am running
bash rsrgan/run_gan_rnn_placeholder.sh
according to the README of the source code.
Issue Details
Everything is working: the model is training and the loss is decreasing. But there is one issue: after some iterations the terminal shows no output, the GPU still shows the PID but no memory is freed, and sometimes GPU utilization drops to 0%. Training on the VM's GPU and on its CPU behaves the same way.
It is not a memory issue, because GPU memory usage by the model is 5,400 MB out of 11,000 MB, and CPU RAM is also very large.
When I ran 21 iterations on my local computer (a 1st-gen i5 with 4 GB RAM, each iteration taking 0.09 hours), all iterations executed. But whenever I run it over SSH inside Docker, the issue happens again and again with both GPU and CPU.
Just keep in mind that the issue happens inside Docker on a machine connected over SSH, and the SSH connection does not disconnect very often.
Exact numbers
If an iteration takes 1.5 hours, the issue happens after two to three iterations; if a single iteration takes 0.06 hours, the issue happens exactly after 14 iterations out of 25.
Perform operations inside Docker container
The first thing you can try is to build the Docker image and then enter the Docker container, by specifying the -ti flags and /bin/bash as the command in your docker run invocation.
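For instance, something like this (a sketch: --runtime=nvidia assumes nvidia-docker2 is set up on the VM, and the data path is a placeholder):
$ docker run --runtime=nvidia -ti -v /path/to/data:/data tensorflow/tensorflow:1.4.0-gpu /bin/bash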
Clone the repository inside the container, and while building the image you should also copy your training data from your local machine into the Docker image. Run the training there and commit the changes, so that you don't need to repeat these steps in future runs; after you exit from the container, all changes are lost if not committed.
You can find the reference for docker commit here.
$ docker commit <container-id> <image-name:tag>
While training is going on, check the GPU and CPU utilization of the VM and see if everything is working as expected.
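For example, in a second SSH session on the VM:
$ watch -n 1 nvidia-smi    # GPU utilization, memory and the training PID
$ top                      # CPU and RAM usage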
Use an Anaconda environment on your VM
Anaconda is a great package manager. You can install anaconda and create a virtual environment and run your code in the virtual environment.
$ wget <url_of_anaconda.sh>
$ bash <path_to_sh>
$ source anaconda3/bin/activate    # or: source anaconda2/bin/activate
$ conda create -n <env_name> python=2.7
$ conda activate <env_name>
Install all the dependencies via conda (recommended) or pip.
Run your code.
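For example (versions taken from the question; whether this exact build matches your CUDA/cuDNN setup is an assumption you should verify):
$ pip install tensorflow-gpu==1.4.0
$ bash rsrgan/run_gan_rnn_placeholder.sh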
Q1: GAN Training with Tensorflow 1.4 inside Docker Stops without Prompting
Although Docker gives OS-level virtualization, inside Docker we sometimes face issues running processes that run with ease on the host system. So to debug the issue, you should go inside the container and perform the steps above.
Q2: Training stops without Releasing Memory connected to VM with SSH Connection
Yes, this is an issue I have also faced earlier. The best way to release memory is to stop the Docker container. You can find more resource allocation options here.
Also, earlier versions of TensorFlow had issues with allocating and clearing memory properly. You can find some references here and here. These issues have been fixed in recent versions of TensorFlow.
Additionally, check for Nvidia bug reports
Step 1: Install nvidia-utils via the following command. You can find the driver version in the nvidia-smi output (also mentioned in the question).
$ sudo apt install nvidia-utils-<driver-version>
Step 2: Run the nvidia-bug-report.sh script
$ sudo /usr/bin/nvidia-bug-report.sh
A log file named nvidia-bug-report.log.gz will be generated in your current working directory. You can also access the installer log at /var/log/nvidia-installer.log.
You can find additional information about Nvidia logs at these links:
Nvidia Bug Report Reference 1
Nvidia Bug Report Reference 2
Log GPU load
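One way to do this is to record utilization periodically to a file, so you can see exactly when the training stalls (a sketch; adjust the interval to taste):
$ nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 60 >> gpu_load.log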
Hope this helps.

Installed package disappear on Google Cloud VMS instance

So I have installed some Python NLTK libraries (pip3 install) and C++ libraries (via apt-get install package_name_xxxx) on two different VM instances.
The Python NLTK packages would disappear and require a reinstall after a reboot or a change to the VM instance (e.g., adding memory or CPU cores).
The C++ libraries disappeared without a reboot or any change to the machine. I don't find anything in the system log, and a reinstall with apt-get works fine, but I am trying to figure out why this happens.
Is your GCE instance a preemptible instance? That option stops the instance at least once every 24 hours, and it could be the reason why you are missing some packages.
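You can check this with gcloud (instance name and zone are placeholders):
$ gcloud compute instances describe <instance-name> --zone <zone> --format="value(scheduling.preemptible)"
If it prints True, the instance is preemptible.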
After about an hour of inactivity, modifications not within the $HOME directory are lost. This includes installed packages.
See Custom installed software packages and persistence and usage limits.