How to ensure software package version consistency in AWS SageMaker serverless compute? - amazon-web-services

I am learning AWS SageMaker which is supposed to be a serverless compute environment for Machine Learning. In this type of serverless compute environment, who is supposed to ensure the software package consistency and update the versions?
For example, I ran the demo program that came with SageMaker, deepar_synthetic. In this second cell, it executes the following: !conda install -y s3fs
However, I got the following warning message:
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.4.10
latest version: 4.5.4
Please update conda by running
$ conda update -n base conda
Since it is serverless compute, am I still supposed to update the software packages myself?
Another example is as follows. I wrote a few simple lines to find out the package versions in Jupyter notebook:
import platform
import tensorflow as tf
print(platform.python_version())
print (tf.version)
However, I got the following warning messages:
/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
The prints still worked and I got the results shown beolow:
3.6.4
1.4.0
I am wondering what I have to do to get the package consistent so that I don't get the warning messages. Thanks.

Today, SageMaker Notebook Instances are managed EC2 instances but users still have full control over the the Notebook Instance as root. You have full capabilities to install missing libraries through the Jupyter terminal.
To access a terminal, open your Notebook Instance to the home page and click the drop-down on the top right: “New” -> “Terminal”.
Note: By default, conda installs to the root environment.
The following are instructions you can follow https://conda.io/docs/user-guide/tasks/manage-environments.html on how to install libraries in the particular conda environment.
In general you will need following commands,
conda env list
which list all of your conda environments
source activate <conda environment name>
e.g. source activate python3
conda list | grep <package>
e.g. conda list | grep numpy
list what are the current package versions
pip install numpy
Or
conda install numpy
Note: Periodically the SageMaker team releases new versions of libraries onto the Notebook Instances. To get the new libraries, you can stop and start your Notebook Instance.
If you have recommendations on libraries you would like to see by default, you can create a forum post under https://forums.aws.amazon.com/forum.jspa?forumID=285 . Alternatively, you can bootstrap your Notebook Instances with Lifecycle Configurations to install custom libraries. More details here: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateNotebookInstanceLifecycleConfig.html

Related

Google cloud compute engine - disable automatic updates (centos)

I wonder if there is a way to disable automatic updates of our Linux machines on Google Cloud (yum update)
As far as I know during maintenance window our servers get new packages of software installed. (I checked yum.log). Since our installed software must be specific version (not latest) we don't want Google to run updates for us because it usually breaks all kind of dependencies...
I have searched on Google but didn't find any info about that.
Thanks.
The centOS 7 image used in Compute Engine includes the yum-cron installed and enabled by default. You can verify it by either using one of the following commands:
sudo yum list installed yum-cron
sudo systemctl status yum-cron.service
The yum-cron will periodically check for updates and apply them if there are updates available.
Solution
If you have yum-cron running on your instance, you can disable auto-updates by accessing the configuration file /etc/yum/yum-cron.conf. Then change the following variables to ‘no’:
update_messages = no
download_updates = no
apply_updates = no
This will prevent the system from updating automatically.
As an alternative, you can opt for uninstalling the package on your system using the following command.
sudo yum remove yum-cron
This part is missing in the official documentation so It will be added soon.

How do I upgrade a library in Qubole's Jupyter Notebook, using PySpark?

Is there a way to do it right from a cell in the notebook? similar to pip install ... --upgrade
I didn't know how to do what's instructed on https://docs.qubole.com/en/latest/faqs/general-questions/install-custom-python-libraries.html#pre-installed-python-libraries
The current Python version is 3.5.3, and Pandas 0.20.1. I need to upgrade Pandas, and Matplotlib
In Qubole are two ways to upgrade/install a package for the python environment. Currently there is no interface available inside notebook to install new packages.
New and Recommended Way (via Package Mangement) : User can enable Package Management functionality for an account and add new packages to a cluster via UI. There are lot of advantages of using package management over cluster versions in terms of performance and usability. Refer to https://docs.qubole.com/en/latest/user-guide/package-management/index.html for further details.
Old Way (via bootstrap) : User can configure a bootstrap which is basically a shell script executed on each node when the cluster starts and or upscales (more nodes are getting added to cluster). This can be configured via clusters UI and need a cluster start for every change. This is what is instructed in link you shared.
You cannot download/upgrade packages directly from the cell in the notebook. This is because your notebook is associated to a cluster. Now, to ensure that all the nodes of the cluster have the package installed, you must either use the package management (https://docs.qubole.com/en/latest/user-guide/package-management/package-management-environment.html) or the cluster's node bootstrap (https://docs.qubole.com/en/latest/user-guide/clusters/run-scripts-cluster.html#examples-node-scripts).
Do let me know if you have any further questions.

install be_helper on datalab

I knew that BigQuery module is already installed on datalab. I just wanna to use bq_helper module because I learned it on Kaggle.
I did !pip install -e git+https://github.com/SohierDane/BigQuery_Helper#egg=bq_helper and it worked.
but I can't import the bq_helper. The pic is shown below.
Please help. Thanks!
I used python2 on Datalab.
I am not familiar with the BigQuery Helper library you shared, but in general, in Datalab, it may happen that you need to restart the kernel in order for the libraries to be properly loaded.
I reproduced the scenario you proposed: installing the library with the command !pip install -e git+https://github.com/SohierDane/BigQuery_Helper#egg=bq_helper and then trying to import it in the notebook using:
from bq_helper import BigQueryHelper
bq_assistant = BigQueryHelper("bigquery-public-data", "github_repos")
bq_assistant.project_name
At first, it did not work and I obtained the same error as you; then I clicked on the Reset Session button and the library was loaded properly.
Some other details that may be relevant if this does not work for you are:
I am also running on Python2 (although the GitHub page of the library suggests that it was only tested in Python3.6+).
The Custom metadata parameters in the Datalab GCE instance are: created-with-datalab-version: 20180503 and created-with-sdk-version: 208.0.2.

Cannot run Google ML engine locally due to Tensorflow issues

I'm trying to run the Google Cloud ML engine locally for debugging purposes by running the command gcloud ml-engine local predict --model-dir=fasttext_cloud/ --json-instances=debug_instance.json. However, I'm getting the error: ERROR: (gcloud.ml-engine.local.predict) Cannot import Tensorflow.
This is strange as Tensorflow works fine on my machine. Even a simple example like python -c 'import tensorflow' has no issues whatsoever.
Is TensorFlow installed in a virtual environment or a non-standard location that isn't on the Python path when running from gcloud?
Its a bit kludgy but I would do the following to check the Python path being used by gcloud. Modify the file
${GCLOUD_INSTALL_LOCATION}/google-cloud-sdk/lib/surface/ml_engine/__init__.py
At the top of the file add
import sys
print("\n".join(sys.path))
Then run
gcloud ml-engine
This should print out the python path and you can now check that it includes the location where TensorFlow is installed.
Can you upgrade to the latest gcloud release (171.0.0) and retry?
To upgrade, run
$ gcloud components update

AWS Lambda : How to use Pillow library?

I'm trying to create an AWS lambda function in order to create thumbnail of my uploaded images.
My script is running well locally, I followed this tutorial to deploy my function but I have a problem with the Pillow library, indeed when I'm testing my function I can see this following log :
I found this post with the same issue but in my case I can't execute command line on the machine.
You must include the libjpeg.so in your lambda package, but it will also require some tweaking with the patchelf utility. Assuming that you prepare the lambda package via "pip install module-name -t" (rather than via virtualenv), do the following:
cd into/your/local/lambda/package/dir
cp -L $(ldd PIL/_imaging.so|grep libjpeg|awk '{print $3}') PIL/
patchelf --set-rpath PIL PIL/_imaging.so
# zip, deploy and test the package
This script works for Pillow version 3.2.0.
Regarding patchelf: under Ubuntu it can be 'apt install'ed, but under other Linuxes it may need to be built from source.
The problem here is that Pillow uses native libraries that must be built for the exact correct environment.
I solved this by installing my requirements in a Docker container that replicates very closely the AWS Lambda environment, lambci/lambda. I used the build-python3.8 version.
I installed my requirements there and zipped up the whole contents of /var/lang/lib/python3.8/site-packages/ directory along with my lambda function file.
I tried this with a standard Amazon Linux Docker image and it didn't work. Only the lambci/lambda image worked for me.