How to avoid installing the same software on a Google Cloud instance? - google-cloud-platform

I am using Compute Engine on Google Cloud Platform to do computations.
I am using Ubuntu as the OS, and every time I create a new instance I have to install the software I need from scratch, including build-essential.
I am pretty sure there is a way to specify the software I would like to have in my VM, but I couldn't figure out a straightforward way to do it.

You should use GCE custom images to create VM images with the software you need pre-installed.
Alternatively, you can consider using startup scripts, which install software during VM startup. In contrast to custom images, this increases VM startup time, because the startup script has to run every time the VM boots.
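If it helps, here is a rough sketch of the custom-image approach using the google-cloud-compute Python client. It assumes you have already created a VM (called base-vm below), installed build-essential and whatever else you need on it, and stopped it; the project, zone and image names are placeholders.

```python
# Sketch: turn the boot disk of a prepared VM into a reusable custom image.
# Assumes: pip install google-cloud-compute, and that credentials/project are set up.
from google.cloud import compute_v1

project = "my-project"      # placeholder: your project ID
zone = "us-central1-a"      # placeholder: zone of the prepared VM
source_disk = "base-vm"     # placeholder: boot disk of the VM you set up by hand

image = compute_v1.Image()
image.name = "ubuntu-build-essential"  # placeholder image name
image.source_disk = f"projects/{project}/zones/{zone}/disks/{source_disk}"

op = compute_v1.ImagesClient().insert(project=project, image_resource=image)
op.result()  # block until the image is ready

# Any instance created from this image boots with the software already installed.
# (For the startup-script alternative, you would instead attach your install
# commands to the instance under the "startup-script" metadata key.)
```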

Related

Is it possible to use Sagemaker Notebooks with a Docker image as your environment?

I'm currently developing a system that uses some private libraries. I'm developing locally, and when I need to process something specific I use SageMaker Processing Jobs. The thing is, in order to speed up the process it would be nice to be able to develop everything in a cloud environment.
I'm wondering if it is possible to use the same Docker image that I use for batch processing (the one I use for SageMaker Processing Jobs) in the SageMaker Jupyter notebooks of my cloud environment?
The main problem is that every time I work in my cloud notebooks I have to deal with dependency conflicts and so on. Using a Docker image would avoid this, and would also allow each member of the team to develop in the cloud using the same image, without having to deal with this kind of conflict.
You can use the same Docker image to run a processing job locally using SageMaker local mode (basically by setting the instance_type parameter on the Processor to local).
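As a rough illustration with the SageMaker Python SDK (the image URI, role ARN and script name below are placeholders, and Docker needs to be installed on the local machine):

```python
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

processor = ScriptProcessor(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",  # placeholder
    command=["python3"],
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    instance_count=1,
    instance_type="local",  # run the container on this machine instead of a managed instance
)

processor.run(
    code="preprocess.py",  # placeholder script
    inputs=[ProcessingInput(source="./data", destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output")],
)
```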
However, it sounds like you'd want to use the same image as your dev environment in notebooks. In SageMaker notebook instances, the solution would be to create and maintain conda environments with the same requirements and versions (you can also use lifecycle configurations (LCCs) to install a set of packages at notebook start; see some samples here).
An alternative is to use SageMaker Studio, where you can create and bring your own custom image for Studio. There is a detailed tutorial here, and some sample dockerfiles for you to get started here.

Using GPU with containers and Container Optimized OS in Google Cloud VM

I would like to run a custom Docker image with GPU on Google Compute Engine.
I have built and pushed the image to the Google Container Registry.
It seems logical to use Container-Optimized OS for the host machine in Google Compute Engine, since I don't need any extra software on the host except Docker, the NVIDIA GPU drivers and nvidia-container-runtime.
I managed to install nvidia-drivers with this solution.
But I can't run my Docker image with GPU (using the --gpus all option) without nvidia-container-runtime. This step is specified in the official Docker documentation.
Is there a way to install nvidia-container-runtime on Container-Optimized OS in Google Cloud VM?
You don't have to set --gpus all, because this is the default for nvidia-container-runtime. The assumption that you don't need anything else is wrong, because it requires libnvidia-container.
To answer the question precisely: no, because libnvidia-container needs to be installed on the OS, and nvidia-container-runtime needs to be installed within the container. One exposes an interface and the other connects to it, so one is useless without the other.
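For context, this is roughly what --gpus all asks Docker to do, shown here via the Docker Python SDK (docker-py); the CUDA image is just an example. The request only succeeds when libnvidia-container and the NVIDIA runtime hooks are present on the host, which is exactly the piece that is missing on Container-Optimized OS.

```python
import docker

client = docker.from_env()
output = client.containers.run(
    "nvidia/cuda:11.0-base",  # example image; any CUDA-enabled image works
    "nvidia-smi",
    device_requests=[
        # Equivalent of `docker run --gpus all`
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
)
print(output.decode())
```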

How can I use Manjaro in google compute engine with GUI

I want to use Manjaro on GCP Compute Engine with a GUI, but there isn't any Manjaro image available in Compute Engine.
You need to install Manjaro on your own machine first using something like VirtualBox. Then you can upload the local boot disk to GCS and use that.
See details here: https://cloud.google.com/compute/docs/images/importing-virtual-disks
Manjaro seems to be based on Arch, but GCP only lists CentOS/Debian/Red Hat/Ubuntu as supported for disk import
(https://cloud.google.com/compute/docs/images/importing-virtual-disks#supported_operating_systems),
so I don't know whether it will work.
For the GUI, you need to install X and a VNC server, then use a VNC client to connect.
Besides what Cloud Ace already mentions, there's another resource I believe you might find of interest.
There is actually a dedicated Arch Linux repo in the official GCP GitHub organization, which contains instructions on how to install Arch Linux on a GCE instance, either by using a preconfigured image from the public images available in GCP or by building your own custom image. I believe the image-building process could be attempted with Manjaro, given that it was originally intended for Arch Linux. It has the potential to work.
In the end, if building a custom image does not work out with Manjaro, you can always use the Arch Linux public image mentioned in the GitHub repo I shared (which is the minimal base Arch Linux image) and install the desktop environment you like.
Hope this helps.

Machine Learning (NLP) on AWS. Cloud9? SageMaker? EC2-AMI?

I have finally arrived in the cloud to put my NLP work to the next level, but I am a bit overwhelmed with all the possibilities I have. So I am coming to you for advice.
Currently I see three possibilities:
SageMaker
- Jupyter notebooks are great
- It's quick and simple
- Saves a lot of time otherwise spent on managing everything; you can very easily get the model into production
- Costs more
- No version control
Cloud9
EC2(-AMI)
Well, that's where I am for now. I really like SageMaker, although I don't like the lack of version control (at least I haven't found anything so far).
Cloud9 seems to be just an IDE on top of an EC2 instance. I haven't found any comparisons of Cloud9 vs SageMaker for machine learning, maybe because Cloud9 is not advertised as an ML solution. But it seems to be an option.
What is your take on that question? What have I missed? What would you advise me to go for? What is your workflow and why?
Exactly, I am looking for an easy work environment where I can quickly test my models. And it won't be only me working on it; it's a team effort.
Since you are working as a team, I would recommend using SageMaker with custom Docker images. That way you have complete freedom over your algorithm. The Docker images are stored in ECR, where you can upload many versions of the same image and tag them to keep track of the different versions (which you build from a Git repo).
SageMaker also passes the execution role into the Docker image, so you still have full access to other AWS resources (if the execution role has the right permissions).
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb
In my opinion this is a good example to start with, because it shows how SageMaker interacts with your image.
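As a sketch of what running a training job against your own image looks like with the SageMaker Python SDK (the ECR URI, role ARN, bucket and instance type below are placeholders):

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-algo:v1.2",  # tagged version in ECR
    role="arn:aws:iam::123456789012:role/MySageMakerRole",                  # execution role
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # GPU instance only runs for the duration of the job
)

# Training data lives in S3; the container sees it under /opt/ml/input/data/training.
estimator.fit({"training": "s3://my-bucket/training-data/"})
```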
Some notes on other solutions:
The problem with every other solution you posted is that you build and execute on the same machine. Sure, you can do this, but keep in mind that GPU instances are expensive, so you might want to switch to the cloud only when the code is ready to run.
Some other notes
Jupyter notebooks in general are not made for collaborative programming. I think they want to change this with JupyterLab, but that is still in development, and SageMaker only uses the classic notebook at the moment.
EC2 is cheaper than SageMaker, but you have to do more work, especially if you want to run your model as Docker images. Also, with SageMaker you can easily build an endpoint for model inference (see the sketch after these notes), which would be much more complex to set up on EC2.
Cloud9: I have never used this service, but at first glance it seems good to develop on. The question remains whether you want to do this on a GPU machine; because it runs on an EC2 instance, you have the same advantages/disadvantages as EC2.
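Regarding the endpoint point above, here is a minimal sketch, assuming `estimator` is a fitted Estimator like the one in the previous snippet (instance type and payload are placeholders):

```python
# Deploy the trained model behind a managed HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

result = predictor.predict(b"some,csv,payload")  # payload format depends on your container
predictor.delete_endpoint()  # tear the endpoint down again to stop billing
```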
One thing I'd like to call out first is that SageMaker notebooks are not the only IDE environment from which you can interact with other components of SageMaker, such as training and hosting. In fact, you can make API calls to SageMaker training/hosting through Cloud9, any IDE you've installed on EC2, or even your laptop, as long as you have the AWS SDK or SageMaker Python SDK installed.
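For example, any machine with AWS credentials and boto3 can talk to SageMaker directly (the region is just an example):

```python
import boto3

sm = boto3.client("sagemaker", region_name="eu-west-1")
for job in sm.list_training_jobs(MaxResults=5)["TrainingJobSummaries"]:
    print(job["TrainingJobName"], job["TrainingJobStatus"])
```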
Regarding the choice of IDE, it really comes down to your particular needs. SageMaker notebooks are Jupyter-based (JupyterLab is now also supported in beta), ML focused, and fully managed. Hundreds of Python packages commonly used in ML, as well as TensorFlow, Keras, MXNet, the SageMaker Python SDK, etc., are preinstalled and automatically maintained for you. It also integrates more closely with other components of SageMaker, as one would expect.
Cloud9 is a managed IDE too, but it is general purpose rather than ML specific. If you want to use Jupyter on Cloud9, it requires extra work on your side, and it does not preinstall and maintain the versions of common ML/DL packages the way SageMaker notebooks do.

Deploying Containers on Compute Engine VMs

I'm a little bit confused. GCP has this new feature, Deploying Containers on VMs and Managed Instance Groups, which is currently marked as an Alpha release of Containers on Compute Engine, and you actually need to request to be whitelisted for this feature.
What I'm struggling to understand is how it is different from simply choosing Container-Optimized OS from the list of OS images when creating a new GCE instance and then running your Docker container on that instance. What are the benefits of the new approach?
Container-Optimized OS images have a number of benefits if all you want to do is run containers on your Compute Engine instance.
There is less configuration involved, as they come with Docker pre-installed and configured and already running as a service when the machine starts.
There is a tick box in the Console when creating a new Container-Optimized OS instance labelled "Deploy a container image to this VM instance". Checking this provides a way to deploy containers/add images via the Console/GUI, with settings for commands to be issued to the container, restart policies, environment variables, host mounts and other mount paths. This essentially allows you to bring up a container at the same time you create your VM.
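Under the hood, that tick box (and `gcloud compute instances create-with-container`) writes a container declaration into instance metadata. Here is a rough sketch of doing the same with the google-cloud-compute Python client; the project, zone, names and image URI are placeholders, and the metadata key and YAML schema reflect my understanding of what gcloud generates.

```python
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"  # placeholders

# Container declaration the Console tick box writes into instance metadata.
container_declaration = """\
spec:
  containers:
    - name: my-app
      image: gcr.io/my-project/my-image:latest
      stdin: false
      tty: false
  restartPolicy: Always
"""

instance = compute_v1.Instance()
instance.name = "cos-container-vm"
instance.machine_type = f"zones/{zone}/machineTypes/e2-medium"

boot_disk = compute_v1.AttachedDisk(
    boot=True,
    auto_delete=True,
    initialize_params=compute_v1.AttachedDiskInitializeParams(
        source_image="projects/cos-cloud/global/images/family/cos-stable"
    ),
)
instance.disks = [boot_disk]
instance.network_interfaces = [compute_v1.NetworkInterface(network="global/networks/default")]
instance.metadata = compute_v1.Metadata(
    items=[compute_v1.Items(key="gce-container-declaration", value=container_declaration)]
)

op = compute_v1.InstancesClient().insert(project=project, zone=zone, instance_resource=instance)
op.result()  # the container starts automatically once the VM boots
```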
In general it's more secure, as it has a smaller attack surface than a standard VM because the OS has a smaller footprint. It also includes a locked-down firewall and other security settings.
Because the OS is based on the Chromium OS project rather than a full general-purpose Linux distribution, it benefits from automatic updates and comes configured to download weekly updates automatically (a reboot is necessary to install these updates).
So if you want to run containers with minimal setup on a simple operating system with high security, Container-Optimized OS may be suitable.
It should also be said that there are some use cases where these images are not suitable, for example if you require the flexibility of a full Linux OS (Container-Optimized OS doesn't include a package manager) or if your containers depend on Linux/kernel modules that may not be available in Container-Optimized OS. It would also not be suitable if you wanted your image and OS application to be supported outside of Google Cloud Platform. You would be better off considering public images other than Container-Optimized OS in these scenarios.