What is the default environment on Google Cloud ML VMs?

https://cloud.google.com/ml/docs/concepts/training-overview mentions the following:
If your trainer application has any dependencies that are not already
on the default virtual machines that Cloud ML uses, you must package
them and upload them to a Google Cloud Storage location as well.
What is "already on the default virtual machines that Cloud ML uses"? I couldn't find this info anywhere.
Incidentally, are there any published specs of the machine types here? https://cloud.google.com/ml/reference/rest/v1beta1/projects.jobs#ScaleTier

What is pre-installed on the CloudML machines is in the process of being documented. In the meantime, this is an informal list of packages with their versions:
numpy==1.10.4
pandas==0.17.1
scipy==0.17.0
scikit-learn==0.17.0
sympy==0.7.6.1
statsmodels==0.6.1
oauth2client==2.2.0
httplib2==0.9.2
python-dateutil==2.5.0
argparse==1.2.1
six==1.10.0
PyYAML==3.11
wrapt==1.10.8
crcmod==1.7
google-api-python-client==1.5.1
python-json-logger==0.1.5
gcloud==0.18.1
subprocess32==3.2.7
wheel==0.30.0a0
WebOb==1.6.2
Paste==2.0.3
tornado==4.3
grpcio==1.0.1
requests==2.9.1
webapp2==3.0.0b1
bs4==0.0.1
Pillow==3.4.1
nltk==3.2.1
python-snappy==0.5
google-cloud-dataflow==0.5.1
google-cloud-logging==0.22.0
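If your trainer needs a package that isn't on this list (or needs a different version), the usual route is to declare it in the setup.py you package alongside your trainer code, roughly like this (a sketch only; the package name, version, and extra dependency are placeholders):
# setup.py shipped with the trainer package (sketch; names and versions are placeholders)
from setuptools import find_packages, setup

setup(
    name="my_trainer",
    version="0.1",
    packages=find_packages(),
    # anything missing from the list above, or needed at a different version
    install_requires=["some-extra-dependency==1.2.3"],
)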
In terms of published specs of machine types, those are not available.

Related

Import custom VM image: can it really be true that only the MBR bootloader is supported?

I was reading the "Manually importing boot disks" Google Cloud documentation, and it says only the MBR bootloader is supported.
I only use GPT and EFI loaders in my environment; this is 2022!
You can import UEFI images as described in this document.
A list of supported systems can be found here.
When creating an image, add the --guest-os-features=UEFI_COMPATIBLE flag as listed here.
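For example (a sketch only; the image, disk, and zone names are placeholders):
gcloud compute images create my-uefi-image \
    --source-disk=my-imported-disk \
    --source-disk-zone=us-central1-a \
    --guest-os-features=UEFI_COMPATIBLE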
To check whether an instance supports UEFI, you can run the following command:
gcloud compute instances describe INSTANCE_NAME --zone=ZONE | grep type
If the result includes type: UEFI_COMPATIBLE, then the instance can run from UEFI images.

Accessing Airflow REST API in AWS Managed Workflows?

I have Airflow running in AWS MWAA. I would like to access the REST API, and there are two ways to do this, but neither seems to work for me.
Overriding api.auth_backend. This used to work, but AWS MWAA no longer allows you to add this option; it is treated as a blocklisted setting and rejected.
api.auth_backend = airflow.api.auth.backend.default
Using the MWAA CLI (Python). This doesn't work if any of the DAGs use packages that are in the requirements.txt file.
a. As an example, I have "paramiko" in requirements.txt because I have a task that uses SSHOperator. The MWAA CLI fails with "no module paramiko".
b. Also noted here, https://docs.aws.amazon.com/mwaa/latest/userguide/access-airflow-ui.html
"Any command that parses a DAG (such as list_dags, backfill) will fail if the DAG uses plugins that depend on packages that are installed through requirements.txt."
We are using MWAA 2.0.2 and managed to use Airflow's REST API through the MWAA CLI, basically following the instructions and sample code in the Apache Airflow CLI command reference. You'll notice that not all REST API calls are supported, but many of them are (even when you have a requirements.txt in place).
Also have a look at the AWS sample code on GitHub.
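For reference, the token-based pattern from the AWS docs looks roughly like this in Python (a sketch only; the environment name is a placeholder, and it assumes boto3 and requests are installed and that your IAM identity is allowed to create CLI tokens for the environment):
import base64
import boto3
import requests

ENV_NAME = "my-mwaa-env"  # placeholder: your MWAA environment name

# Exchange IAM credentials for a short-lived CLI token.
mwaa = boto3.client("mwaa")
token = mwaa.create_cli_token(Name=ENV_NAME)

# POST a raw Airflow CLI command to the /aws_mwaa/cli endpoint.
resp = requests.post(
    "https://{}/aws_mwaa/cli".format(token["WebServerHostname"]),
    headers={
        "Authorization": "Bearer {}".format(token["CliToken"]),
        "Content-Type": "text/plain",
    },
    data="dags list",  # any Airflow 2.x CLI command that MWAA supports
)
resp.raise_for_status()
body = resp.json()
print(base64.b64decode(body["stdout"]).decode("utf-8"))
print(base64.b64decode(body["stderr"]).decode("utf-8"))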

How do Lambda functions work in serverless?

Is there an environment where handler.js is running? And if so, what happens if you somehow run sudo rm -rf ~/ in AWS Lambda?
What do you think will happen?
You can think of a Lambda function as a managed (short-lived) Docker container (although micro-VM would be more correct, as we learned at re:Invent 2018). You define the compute and RAM resources your "container" has available to run a function.
As the documentation states, you get the following environment:
The underlying AWS Lambda execution environment includes the following
software and libraries.
Operating system – Amazon Linux
AMI – amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2
Linux kernel – 4.14.77-70.59.amzn1.x86_64
AWS SDK for JavaScript – 2.290.0
SDK for Python (Boto 3) – boto3-1.7.74, botocore-1.10.74
Furthermore, you're provided with some temporary storage (at the moment 500 MB) at /tmp/.
If there is already a "container" running, AWS tries to reuse it and re-run the handler function for each Lambda invocation (see here for more details), so I'd imagine you could break your own container. It apparently doesn't have sudo privileges, though, so there's limited impact you can have with your sudo rm -rf.
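To make that a bit more concrete, here is a minimal Python handler (the same idea applies to a Node.js handler.js) that pokes at the points above: writable /tmp, a non-root user with no sudo, and module-level state that survives warm invocations. This is a sketch, not anything from the original answer:
import os

INVOCATION_COUNT = 0  # module-level state persists while the container stays warm


def handler(event, context):
    global INVOCATION_COUNT
    INVOCATION_COUNT += 1

    # /tmp is the only writable location; the rest of the filesystem is read-only.
    with open("/tmp/scratch.txt", "a") as f:
        f.write("invocation {}\n".format(INVOCATION_COUNT))

    return {
        "uid": os.getuid(),                 # not 0: the function does not run as root, and there is no sudo
        "warm_invocations": INVOCATION_COUNT,
        "tmp_files": os.listdir("/tmp"),
    }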

GCP, Composer, Airflow, Operators

Google Cloud Composer uses Cloud Storage to store Apache Airflow DAGs. However, where are the operators stored? I am getting the error below:
Broken DAG: [/home/airflow/gcs/dags/example_pubsub_flow.py] cannot import name PubSubSubscriptionCreateOperator.
This operator was added in Airflow 1.10.0. As of today, Cloud Composer is still using Airflow 1.9.0, hence this operator is not available yet. You can add it as a plugin.
Apparently, according to this message in the Cloud Composer Google Group, installing a contrib operator as a plugin does not require adding the Airflow plugin boilerplate.
It is enough to register the plugin via this command:
gcloud beta composer environments storage plugins import --environment dw --location us-central1 --source=custom_operators.py
See here for details.
The drawback is that if your contrib operator depends on other contrib modules, you will have to copy those as well and modify the way they are imported in Python, using:
from my_custom_operator import MyCustomOperator
instead of:
from airflow.contrib.operators.my_custom_operator import MyCustomOperator
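For illustration, the module you register as a plugin might look roughly like this (a sketch for the Airflow 1.9 era; the class name and parameter are placeholders, and note that the file name has to match the module you import, e.g. my_custom_operator.py here):
# my_custom_operator.py -- upload it with the gcloud ... storage plugins import command above
import logging

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class MyCustomOperator(BaseOperator):
    """Minimal stand-in for a contrib operator copied into the plugins folder."""

    @apply_defaults
    def __init__(self, my_param=None, *args, **kwargs):
        super(MyCustomOperator, self).__init__(*args, **kwargs)
        self.my_param = my_param

    def execute(self, context):
        logging.info("Running MyCustomOperator with my_param=%s", self.my_param)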

chef-solo explained

Can somebody help me understand chef-solo? I still don't understand whether I have to run chef-solo on my own machine to provision another machine, or whether I first need to provision a machine and then install chef-solo on that new machine. I need to understand the end-to-end flow. Please help me understand it better.
There is a detailed explanation of how to use Chef Solo in an AWS environment in Integrating AWS CloudFormation With Opscode Chef.pdf
Chef Solo can be used to deploy Chef cookbooks and roles without a
dependency on a Chef Server. Chef Solo can be installed via a Ruby
Gem package; however, it requires a number of other dependent
packages to be installed. By using resource metadata and the AWS
CloudFormation helpers, you can deploy Chef Solo on a base AMI via
Cloud-init.
You can either use the CloudFormation template provided in the PDF above, or create the files and run the scripts (which are embedded in that template) yourself.
You can also use chef-solo with Vagrant to test how a target Linux distro will behave on your local machine; however, your question is more about an end-to-end flow with AWS, so here we go.
End to end, you need at least the following dependencies on the target machine:
ruby, the chef gem, ssh, and git (or some other way of getting your code onto the VM); a rough bootstrap sketch follows the example below.
You SSH into the machine, you get the recipes you want to use on the target machine, and you run chef-solo with parameters that specify, at a minimum, some attributes, the location of your cookbooks, and a run list containing the recipes or roles to apply to the target machine. Below is an example using the apt and mongodb cookbooks (https://github.com/opscode-cookbooks/apt and https://github.com/edelight/chef-mongodb); I cloned those into the /opt/devops location on the target machine.
chef-solo -c solo.rb -j node.json
solo.rb contents
file_cache_path "/opt/devops/log"
cookbook_path "/opt/devops/cookbooks"
node.json contents
{
  "node": {
    "vm_ip": [ "192.168.33.10" ],
    "myProject": {
      "git_revision": "bzrDevel"
    }
  },
  "run_list": [ "recipe[apt]", "recipe[mongodb]" ]
}
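Putting the pieces together, bootstrapping the dependencies listed above on a Debian-based target might look roughly like this (a sketch only; the cookbook URLs and the /opt/devops layout come from the example above, while the package names and distro choice are assumptions):
# sketch: prepare the target machine, then run chef-solo as shown above
sudo apt-get update && sudo apt-get install -y ruby-full git
sudo gem install chef
sudo mkdir -p /opt/devops/cookbooks /opt/devops/log
sudo git clone https://github.com/opscode-cookbooks/apt /opt/devops/cookbooks/apt
sudo git clone https://github.com/edelight/chef-mongodb /opt/devops/cookbooks/mongodb
# from the directory containing solo.rb and node.json:
sudo chef-solo -c solo.rb -j node.json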