Removed and reinstalled Anaconda on my AWS Deep Learning AMI EC2 instance and now can't enter preconfigured deep learning environments

I've just set up an Ubuntu Deep Learning AMI EC2 instance. I'm a total beginner on AWS/package handling stuff.
My aim is to use the instance to execute a Python deep learning script. This script uses a variety of packages.
When installing some of these packages with conda, I got an error reporting environment inconsistencies for 100+ packages. After many failed attempts to resolve this, I thought removing Anaconda and reinstalling it might do the trick. Having done that, I've realised I may have messed up my instance even more: I can no longer use the preconfigured deep learning environments the AMI ships with, since these were activated with conda commands and, as far as I can tell, I've removed them along with Anaconda.
I've tried repeating the commands, but I am getting an error stating these environments no longer exist. A tutorial using these commands is mentioned here:
https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-conda.html
source activate tensorflow_p36
I expected the above to enter me into the tensorflow_p36 environment. As in:
(tensorflow_p36) ubuntu@ip-XXX-XX-XX-XX:~/scripts
However, it gives an error message:
could not find environment: tensorflow_p36
I realise uninstalling conda was a major rookie error which seems to have totally disabled my instance. If anyone has any ideas on how to salvage it, that would be much appreciated!
Thanks very much
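For anyone who ends up in the same spot, here is a minimal sketch of one way to get a usable TensorFlow environment back after the Anaconda install has been wiped. The environment name and package versions are assumptions, and this will not restore the AMI's original pre-tuned builds:

# reinstall Miniconda (generic Linux installer)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source $HOME/miniconda3/bin/activate

# see which environments, if any, survived
conda env list

# recreate a TensorFlow environment by hand (name and versions are placeholders)
conda create -n tensorflow_p36 python=3.6 tensorflow-gpu
source activate tensorflow_p36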

Not exactly an answer to your question, but if anybody else is thinking about uninstalling conda from the Deep Learning AMI because it seems to be behaving insanely, this might help.
The AWS Deep Learning AMI is configured in a way that makes it refuse to install conda environments that work reliably on other machines. The following fixed the problem for me:
conda config --set channel_priority false
(This is maybe obvious to conda-heads, but confounded me for a while, so hopefully this helps somebody else.)
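If you want to double-check that the setting took effect, conda can print the current value back:

# show the active channel_priority setting
conda config --show channel_priority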

Related

How to get a local Cloud Foundry Instance?

I’m looking to learn about Cloud Foundry and I’m trying to get a development instance of it set up on my local Windows 10 PC. But I’m not having any luck.
I’m finding a lot of information about PCF Dev which was deprecated a while ago. I also looked at the replacement for PCF Dev, CF Dev (https://github.com/cloudfoundry-attic/cfdev). Its git page mentions that its repository is no longer receiving updates. I still went ahead and tried installing it using the instructions in the README:
cf install-plugin -r CF-Community cfdev
But the link it uses to download the plugin is broken:
Starting download of plugin binary from repository CF-Community...
Get "https://d3p1cc0zb2wjno.cloudfront.net/cfdev/cfdev-v0.0.18-rc.36-windows.exe": dial tcp: lookup d3p1cc0zb2wjno.cloudfront.net: no such host
Can anyone recommend a way to get a development instance of Cloud Foundry set up on my local machine so I can play around with it?
Thanks
Yes, steer clear of pcf-dev and cf-dev; they may still work, but they are definitely not getting updates and will be way out of date by now.
My understanding, although I haven't tried this process in a while, is that the way to run CF locally is with VirtualBox, using bosh-deployment and cf-deployment.
For instructions on installing BOSH into VirtualBox using bosh-deployment, see the Install section of the bosh-deployment docs.
With BOSH installed, follow the cf-deployment guide to get CF installed. You can skip to step 4, since you're installing into VirtualBox. Be sure to read the entire document before you begin, but pay specific attention to the section that has instructions for running locally.
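As a rough sketch of what that looks like (the ops files and network values below follow the bosh-deployment VirtualBox docs and may have changed, so treat them as assumptions):

# fetch the deployment manifests
git clone https://github.com/cloudfoundry/bosh-deployment.git
git clone https://github.com/cloudfoundry/cf-deployment.git

# stand up a local BOSH director inside VirtualBox
bosh create-env bosh-deployment/bosh.yml \
  --state state.json \
  -o bosh-deployment/virtualbox/cpi.yml \
  -o bosh-deployment/virtualbox/outbound-network.yml \
  -o bosh-deployment/bosh-lite.yml \
  -o bosh-deployment/bosh-lite-runc.yml \
  -o bosh-deployment/jumpbox-user.yml \
  --vars-store creds.yml \
  -v director_name=bosh-lite \
  -v internal_ip=192.168.50.6 \
  -v internal_gw=192.168.50.1 \
  -v internal_cidr=192.168.50.0/24 \
  -v outbound_network_name=NatNetwork

Once the director is up, the cf-deployment guide walks through deploying CF onto it.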

Versions of Python & Spark to work with VS Code Notebooks

I'm developing scripts for AWS Glue and trying to mimic the development environment as closely as possible to their specs here. Since it's a bit costly to run a notebook server/development endpoint, I set everything up on my local machine instead and develop the scripts in VS Code notebooks, which I find very useful.
There are some problems with the notebook setup due to incompatible versions of the installed Python and Spark.
For Python, I went through a rough time cleaning things up, and its version is now 3.8.3.
For Spark, I use the manual install method with version 2.4.3, since I plan to use Scala alongside it at a later time. I installed the findspark package to load that version as expected.
And it doesn't work! The error was TypeError: an integer is required (got type bytes)
I've searched around, and people said to downgrade to Python 3.7 using pyenv; I got 3.7.7 installed but still had the same error.
As a last resort, I tried pip install pyspark. That gives Spark 3.0.0, which works fine, but it's not the version I'm targeting.
I hope someone has experience with this.
A better approach would be to install the Glue dependencies in Docker and then attach to that Docker container from VS Code, to mimic the exact Glue local dev environment.
I've written a blog post about this, if you'd like to refer to it:
https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1
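For reference, a minimal sketch of that approach using the public amazon/aws-glue-libs image (the tag below targets Glue 1.0, which bundles Spark 2.4.3; the mount path and ports are assumptions, so check the image docs for the current layout):

# pull the Glue 1.0 local-development image
docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01

# run it with the local project mounted, then attach VS Code to the running container
docker run -it --name glue_local \
  -v "$PWD":/workspace \
  -p 4040:4040 -p 8888:8888 \
  amazon/aws-glue-libs:glue_libs_1.0.0_image_01 bash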

Does stopping a Google Cloud instance lose the programs installed on it?

I recently created a GPU instance on Google Cloud, installed Anaconda and all the required dependencies, and then stopped the instance. Now that I've started the instance again, Anaconda is no longer installed on it, which seems very strange. Please let me know if you know any details about this. I also looked through Google's documentation and couldn't find anything suggesting it should behave like this.
https://cloud.google.com/compute/docs/instances/stopping-or-deleting-an-instance
No, this should not happen if the programs were installed properly on the persistent/boot disk file system.
If the programs were installed in tmpfs or another memory-backed file system, then the memory contents, and with them the data and any links to it, would be lost once the instance is stopped or rebooted.
However, this is normally not the case, as VM instance packages are installed on the persistent disk.
I guess your installation failed for some reason. Check whether the packages are still installed. If you are using a Red Hat Linux variant, you can use 'yum list installed' to see all installed packages, or 'yum list installed | grep -i <package-to-search-for>' to filter for a particular package.
If the package shows up, then the issue could be a misconfiguration or some other problem. Use dmesg and/or cat /var/log/messages to view the logs and try to find any problems there that may be related to Anaconda or the GPU software.
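Before reinstalling anything, it may also be worth checking the boot disk directly to see whether the install is really gone, for example:

# look for a leftover Anaconda/Miniconda directory and the conda binary
ls -d $HOME/anaconda3 $HOME/miniconda3 2>/dev/null
which conda

# the installer normally adds an init block to ~/.bashrc; check it is still there
grep -i conda ~/.bashrc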
I just encountered the same problem. I know this question is dated, but this might help a complete beginner like myself. In my case, I needed to SSH onto the instance itself instead of just being in the project-level virtual environment.
gcloud beta compute ssh --zone "europe-west2-c" "myinstancename" --project "fired-brimstone-234534"

How to run pdftk on elastic beanstalk

I am trying to run pdftk on Elastic Beanstalk. The first problem I ran into is that I cannot install pdftk on an instance of an Amazon Linux AMI because one of the dependencies (gcj) is not supported.
One of the options I am looking at is creating my own AMI and using that for my Elastic Beanstalk. Amazon recommends not doing this, and there are no community images for EB and Ubuntu.
Another option is using Docker. I am not very familiar with Docker, but I think I would be able to install pdftk in a container and then deploy that to EB. I am using Codeship for deployments and it looks like they have some options for Docker. (This is the option I'm currently exploring.)
The last option I can think of is writing a library for encrypting PDFs on my own. I had a look at the encryption specifications for PDFs and I think this is not a time-efficient option.
Has anyone had a similar problem and found a good solution?
UPDATE:
After some more research I discovered that the issue was not with Amazon Linux but with Fedora. Fedora dropped gcj because the project lacked maintainers, and then dropped pdftk because it depends on gcj.
If you need another pdf tool kit I have found podofo to be a good replacement for what I've needed.
First I apologise for resurrecting an old thread! Recently we wanted to create an Elastic Beanstalk worker environment that uses pdftk. Of course we also stumbled on the same issue, so this is what we did and it works for us so far. I hope it'll work for others too.
In the .ebextensions folder add the linked configs:
The needed LaTeX packages:
packages.config
You'll also need to add the el5 yum repository in order to install libgcj.
01_el5_yum.config
Next, add this config with the commands to install libgcj, pdftk and pdfjam:
02_pdftk.config
And that should be it.
In case anyone comes here having problems with pdftk: poppler-utils also covers some of the tasks done by pdftk (in my case it was PDF splitting) and can easily be set up on an EB instance through .ebextensions:
packages:
  yum:
    poppler-utils: []
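For example, once poppler-utils is installed, splitting a PDF into single pages can be done with pdfseparate (the file names here are just placeholders):

# write one file per page: page-1.pdf, page-2.pdf, ...
pdfseparate input.pdf page-%d.pdf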

How to achieve consistency of re-baking an AMI

I am wondering what the best approach for baking an AMI would be. Although baking offers a lot of consistency, it is hard to maintain that consistency when you need to re-bake your AMI because of a small security update or a new package version, since more than likely you will end up updating other packages you didn't need to update, and that can cause something to break.
So far I am baking in all my package installs, including Docker, and pulling base images (like Ubuntu, for example).
I know it is possible to specify exactly what package version you need when you do apt-get install or its cfn-init equivalent, but what if that version is no longer available? Should I put my packages in an S3 bucket? But then what about all the dependencies? Are there any simple ways of doing apt-get install from S3 instead of going out to the third-party repo?
I just answered a similar question about baking resources into an AMI vs. using a configuration management tool like Chef, Puppet, etc.
The short answer is to try not to bake software into the AMI, but rather to build on top of base images with repeatable "recipes" (a Chef term).
As for the specific versions of packages to install, you certainly can pin software dependencies to specific versions. If you aren't doing anything special with them, I would strongly advise using the native package managers where you can. As for packages no longer being available, with Ubuntu LTS that hopefully shouldn't be much of an issue.
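For example, with apt you can pin an exact version at install time and hold it there (the package name and version string below are placeholders):

# install a specific, known version rather than whatever is currently latest
sudo apt-get install -y nginx=1.18.0-0ubuntu1

# prevent later upgrades from silently moving it
sudo apt-mark hold nginx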
See the full answer here.