I am a scientist who is exploring the use of Dask on Amazon Web Services. I have some experience with Dask, but none with AWS. I have a few large custom task graphs to execute, and a few colleagues who may want to do the same if I can show them how. I believe that I should be using Kubernetes with Helm because I fall into the "Try out Dask for the first time on a cloud-based system like Amazon, Google, or Microsoft Azure" category.
I also fall into the "Dynamically create a personal and ephemeral deployment for interactive use" category. Should I be trying native Dask-Kubernetes instead of Helm? It seems simpler, but it's hard to judge the trade-offs.
In either case, how do you provide Dask workers a uniform environment that includes your own Python packages (not on any package index)? The solution I've found suggests that packages need to be on a pip or conda index.
Thanks for any help!
Use Helm or Dask-Kubernetes ?
You can use either. Generally starting with Helm is simpler.
How to include custom packages
You can install custom software using pip or conda. They don't need to be on PyPI or the anaconda default channel. You can point pip or conda to other channels. Here is an example installing software using pip from github
pip install git+https://github.com/username/repository#branch
For small custom files you can also use the Client.upload_file method.
Related
I want to deploy my wagtail (which is a CMS based on django) project onto an AWS lambda function. The best option seems to be using zappa.
Wagtail needs opencv installed to support all the features.
As you might know, just running pip install opencv-python is not enough because opencv needs some os level packages to be installed. So before running pip install opencv-python one has to install some packages on the Amazon Linux in which the lambda environment is running. (yum install ...)
The only solution that came to my mind is using lambda layers to properly install opencv.
But I'm not sure whether it's possible to use lambda layers with projects deployed by zappa.
Any kind of help and sharing experiences would be really appreciated!
There is an open pull request that is ready to merge, but needs additional user testing.
The older project has a pull request that claims layer support has been merged
Feel free to try it out and let the maintainers know so documentation can be updated.
Is there a way to do it right from a cell in the notebook? similar to pip install ... --upgrade
I didn't know how to do what's instructed on https://docs.qubole.com/en/latest/faqs/general-questions/install-custom-python-libraries.html#pre-installed-python-libraries
The current Python version is 3.5.3, and Pandas 0.20.1. I need to upgrade Pandas, and Matplotlib
In Qubole are two ways to upgrade/install a package for the python environment. Currently there is no interface available inside notebook to install new packages.
New and Recommended Way (via Package Mangement) : User can enable Package Management functionality for an account and add new packages to a cluster via UI. There are lot of advantages of using package management over cluster versions in terms of performance and usability. Refer to https://docs.qubole.com/en/latest/user-guide/package-management/index.html for further details.
Old Way (via bootstrap) : User can configure a bootstrap which is basically a shell script executed on each node when the cluster starts and or upscales (more nodes are getting added to cluster). This can be configured via clusters UI and need a cluster start for every change. This is what is instructed in link you shared.
You cannot download/upgrade packages directly from the cell in the notebook. This is because your notebook is associated to a cluster. Now, to ensure that all the nodes of the cluster have the package installed, you must either use the package management (https://docs.qubole.com/en/latest/user-guide/package-management/package-management-environment.html) or the cluster's node bootstrap (https://docs.qubole.com/en/latest/user-guide/clusters/run-scripts-cluster.html#examples-node-scripts).
Do let me know if you have any further questions.
I have recently tried out Gitpod, which seems to be a quite cool tool.
For testing purposes, I have opened some C++ GitHub repository of mine that uses Boost (among other libraries). Unfortunately, Boost does not seem to be installed in the Docker image, so my code does not compile.
I know about the possibility of creating own Docker images, but I was wondering if there are alternative, easier options as well. Does Gitpod provide any Environment Modules-like option to dynamically load/unload certain "commonly used" libraries or do I always have to provide my own Docker instance in this case?
I work on Gitpod. Thank you for trying it and the compliment :)
We didn't want to invent yet another module system for Gitpod.
Instead, we decided to support Dockerfiles and build them on-demand, because Dockerfiles allow using all those amazing module systems that are already out there: Debian's packages, Alpine's packages, Node Version Manager (NVM), Ruby Version Manager (RVM), SDKman, etc. Basically any Linux-compatible package manager down to simple wget.
You can also use own Docker images, but I find Dockerfiles more convenient because you can check them into git and thereby version them together with your source code. It's dev-environment-as-code and should be shared across the team. Also, you don't need to bother with building and pushing them to a registry (e.g. hub.docker.com).
What Gitpod does offer, hoever, is a selection of Docker images (src). The most prominent one is gitpod/workspace-full, which it Gitpod's default image.
To get back to your question about the easiest way to get the right modules into your Gitpod development environment:
inheriting from gitpod/workspace-full is very convenient.
If you don't want (2), copy'n'pasting sections from gitpod/workspace-full is convenient.
Often, putting RUN apt-get update && apt-get install -y libboost-all-dev into your Dockerfile is enough. This is APT to install the package libboost-all-dev.
Most software projects have documentation on how to build them under Linux. These instructions usually work in Dockerfiles, too.
Search on hub.docker.com for useful Docker images. You can inherit from those images or find their Dockerfiles and copy'n'paste sections from there.
I'm currently developing a web application (Django 2.0) application.
My app will be deployed on IBM Cloud (Cloud Foundry) using python build-pack.
One of my requirements is to install blender.
Everything else is very well, but for blender installation.
What I've tried so far was:
I tried access my app using SSH connection, but surely I don't have root access to apt-get install blender!!
And tried to include blender in packages.json file and push that file using cf push my-app.
But nothing worked for me.
In another shorter question: what is the main approach in Cloud Foundry Apps to install packages like when we use apt-get install in Ubuntu / Debian.
Please correct me if I did anything wrong, or guide me with headlines to solve this problem!!
I see a couple options for you to install packages if they cannot be installed using the regular requirements file (which is the preferred way):
Download the relevant libraries and put them in subfolders of the app before pushing it. The libraries will be uploaded. That is how I would do it.
Once you have an SSH connection, use secure copy (scp) to upload the files and place them in the subfolders where they are expected.
Regarding Blender, the question is what you need in addition to having the code copied over. Does it need a running daemon? Are there more dependencies? You would need to share more information about your specific app to answer that. Maybe, packaging everything as one or more containers and run it on Kubernetes or a combination of Cloud Foundry and Kubernetes is a better way.
I am wondering what would be the best approach for baking an AMI. Although it offers a lot of consistency, it is hard to achieve a level of consistency when you need to re-bake your AMI because of a small security update or new package version because more than likely you will end up updating the other packages you don't need to update and that can cause something to break.
So far I am baking all my package installs including docker and pulling base images (like Ubuntu for example).
I know it is possible to specify exactly what package version you need when you do apt-get install or its cfn-init equivalent, but what if it is no longer supported? Should I put my packages in an S3 bucket? But then what about all the dependencies? Are there any simple ways of doing apt-get install from s3 instead of going out to the 3rd party repo?
I just answered a similar question about baking resources into an AMI vs. using a configuration management tool like Chef, Puppet, etc.
Short answer is to try and not bake software into the AMI but rather build on top of base images with repeatable "recipes" (Chef term).
As for the specific versions of packages to install, you certainly can pin software dependencies to specific versions. If you aren't doing anything special with them I would strongly advise to use the native package managers where you can. As for packages not being available anymore, with Ubuntu LTS that hopefully shouldn't be much of an issue.
See the full answer here.