How to load an .RData file into an AWS SageMaker notebook? - amazon-web-services

I just started using AWS SageMaker, and I have an XGBoost model saved on my personal laptop using the save (.RData), saveRDS, and xgb.save commands. I have uploaded those files to my SageMaker notebook instance, where my other notebooks are. However, I am unable to load the model into my environment and predict on test data using the following commands:
load("Model.RData")
model=xgb.load('model')
model <- readRDS("Model.rds")
When I predict, I get NAs as my predictions. These commands work fine in RStudio but not in the SageMaker notebook. Please help.

Related

Upload TensorBoard logs from cloud storage to Vertex AI - tensorboard

I created a pipeline with Vertex AI and added the code for creating and storing my TensorBoard logs in Cloud Storage. The next step in the instructions here, https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview#getting_started, is to use the tb-gcp-uploader command to upload the logs to the TensorBoard experiment page. But I'm getting the message "'tb-gcp-uploader' is not recognized as an internal or external command". Any thoughts?
You should be able to run the command tb-gcp-uploader by installing the following package:
pip install google-cloud-aiplatform[tensorboard]
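Once the package is installed (and the install location is on your PATH), the upload itself roughly follows the linked guide. A hedged sketch only: the resource name, log directory, and experiment name below are placeholders, and the exact flags may differ between SDK versions, so check tb-gcp-uploader --help:
tb-gcp-uploader --tensorboard_resource_name \
  projects/PROJECT_ID/locations/REGION/tensorboards/TENSORBOARD_ID \
  --logdir=gs://YOUR_BUCKET/logs \
  --experiment_name=my-experiment \
  --one_shot=True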

Uploading a zip to GCP JupyterLab is super slow

I was using the JupyterLab notebook instance on AI Platform in GCP. You can access this by 1) entering the GCP console, 2) searching for notebook instances and choosing the entry with the subtitle AI Platform, and 3) creating one.
When I upload a zip file to JupyterLab, the speed is very, very slow.
I don't know what to do. It is very frustrating when it costs a whole day just to upload the data.
Davic at the GCP 24/7 chat support was helpful. We checked a bunch of things, such as network speed (http://speedtest.net).
I found that uploading a single file is pretty fast, and the network is pretty good too. Since my dataset is available on Kaggle, I thought: why not download it directly from Kaggle?
So I used the following commands:
pip install kaggle
mkdir -p /home/jupyter/.kaggle
mv kaggle.json /home/jupyter/.kaggle/  # download kaggle.json from your Kaggle profile page, upload it to JupyterLab, then move it here
chmod 600 /home/jupyter/.kaggle/kaggle.json
kaggle datasets download {username/dataset-name}
And it is done!! In just about 5 seconds the dataset was downloaded!!
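If you prefer to do the same thing from a notebook cell, the kaggle package also ships a Python API. This is a minimal sketch, assuming kaggle.json is already in place under ~/.kaggle; the dataset name and download path are placeholders:
from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate using ~/.kaggle/kaggle.json, then download and unzip the dataset
api = KaggleApi()
api.authenticate()
api.dataset_download_files('username/dataset-name',   # placeholder dataset
                           path='/home/jupyter/data',
                           unzip=True)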

What to define as entry point when initializing a PyTorch estimator with a custom Docker image for training on AWS SageMaker?

So I created a Docker image for training. In the Dockerfile I have an ENTRYPOINT defined such that when docker run is executed, it starts running my Python code.
To use this on AWS SageMaker, my understanding is that I need to create a PyTorch estimator in a Jupyter notebook in SageMaker. I tried something like this:
import sagemaker
from sagemaker.pytorch import PyTorch

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

estimator = PyTorch(entry_point='train.py',
                    role=role,
                    framework_version='1.3.1',
                    image_name='xxx.ecr.eu-west-1.amazonaws.com/xxx:latest',
                    train_instance_count=1,
                    train_instance_type='ml.p3.xlarge',
                    hyperparameters={})

estimator.fit({})
In the documentation I found that as the image name I can specify the link to my Docker image on AWS ECR. When I try to execute this, it keeps complaining:
[Errno 2] No such file or directory: 'train.py'
It complains immediately, so surely I am doing something completely wrong. I would expect my Docker image to run first, and only then for it to find out that the entry point does not exist.
But besides this, why do I need to specify an entry point at all? Should it not be clear that the entry to my training is simply docker run?
For better understanding: the entrypoint Python file in my Docker image looks like this:
import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=5)
    parser.add_argument('--batch_size', type=int, default=16)
    parser.add_argument('--learning_rate', type=float, default=0.0001)
    # Data and output directories
    parser.add_argument('--output_data_dir', type=str, default=os.environ['OUTPUT_DATA_DIR'])
    parser.add_argument('--train_data_path', type=str, default=os.environ['CHANNEL_TRAIN'])
    parser.add_argument('--valid_data_path', type=str, default=os.environ['CHANNEL_VALID'])
    # Start training
    ...
Later I would like to specify the hyperparameters and data channels, but for now I simply do not understand what to put as the entry point. The documentation says that the entry point is required and that it should be a local/global path to the entry point...
If you really want to use a completely separate, self-built Docker image, you should create an Amazon SageMaker algorithm (which is one of the options in the SageMaker menu). There you specify a link to your Docker image on Amazon ECR, as well as the input parameters, data channels, etc. When choosing this option, you should not use the PyTorch estimator but the Algorithm estimator. This way you indeed don't have to specify an entry point, because it simply runs the Docker image when training, and the default entry point can be defined in your Dockerfile.
The PyTorch estimator is for when you have your own model code but want to run it in an off-the-shelf SageMaker PyTorch Docker image. This is why you have to specify, for example, the PyTorch framework version. In this case the entry point file should by default be placed next to where your Jupyter notebook is stored (just upload the file by clicking the upload button). The PyTorch estimator inherits all options from the Framework estimator, whose documentation describes where to place the entry point and model code, for example via source_dir.
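As a rough sketch of the bring-your-own-container route described above, using the generic sagemaker.estimator.Estimator rather than the Algorithm estimator, with the same SDK v1-style parameter names as in the question; the image URI, instance type, and S3 paths are placeholders:
import sagemaker
from sagemaker.estimator import Estimator

role = sagemaker.get_execution_role()

# No entry_point here: the container's own ENTRYPOINT/CMD is what gets run.
estimator = Estimator(image_name='xxx.ecr.eu-west-1.amazonaws.com/xxx:latest',
                      role=role,
                      train_instance_count=1,
                      train_instance_type='ml.p3.2xlarge',
                      hyperparameters={'epochs': 5, 'batch_size': 16})

# Channel names ('train', 'valid') appear inside the container under /opt/ml/input/data/<channel>.
estimator.fit({'train': 's3://my-bucket/train', 'valid': 's3://my-bucket/valid'})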

How to install Kaggle on Jupyter Notebook services in Google Cloud

I've been using Google Colab for my Kaggle competition; recently I decided to check whether it would work faster using services on Google Cloud. I have an *.ipynb file from Google Colab; I downloaded it and tried to upload it to a Google Cloud instance.
I set up the whole connection on Google Colab using this link: https://towardsdatascience.com/setting-up-kaggle-in-google-colab-ebb281b61463 and it worked fine.
Using this tutorial: https://towardsdatascience.com/how-to-use-jupyter-on-a-google-cloud-vm-5ba1b473f4c2 I started a new instance for a Jupyter notebook. I uploaded the *.ipynb file and tried to install Kaggle and run my notebook, but I usually get the following errors:
kaggle: command not found error
ensure that your python binaries are on your path
How can I set everything up to work on the Google Cloud service?
When following the first tutorial, mind that you have to change the root directory path from /content to /home/jupyter/, for example:
import zipfile
zip_ref = zipfile.ZipFile("/home/jupyter/Airbus_competition/input/test_v2.zip", 'r')
zip_ref.extractall("/home/jupyter/Airbus_competition/input/test_v2")
zip_ref.close()
As for the problems with installing Kaggle: you don't have access to the root folder from the Jupyter notebooks, but you can still install and use the Kaggle API if you change the command from !kaggle to !~/.local/bin/kaggle, for example (commands from the tutorial adapted to work on GCS):
!mkdir ~/.kaggle

import json

# Paste the contents of your Kaggle API token (kaggle.json) here
token = {"username": "your_username", "key": "your_TOKEN"}
with open('/home/jupyter/.kaggle/kaggle.json', 'w') as file:
    json.dump(token, file)

!cp /home/jupyter/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!~/.local/bin/kaggle config set -n path -v /home/jupyter/Airbus_competition
!chmod 600 /home/jupyter/.kaggle/kaggle.json
!~/.local/bin/kaggle competitions download -c airbus-ship-detection -p /home/jupyter/Airbus_competition/input --force
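One more note (my assumption, not part of the original answer): the "kaggle: command not found" error usually just means pip installed the kaggle script into ~/.local/bin, which is not on the PATH. Adding it for the current kernel lets you drop the ~/.local/bin/ prefix:
import os
os.environ['PATH'] += os.pathsep + os.path.expanduser('~/.local/bin')  # current notebook kernel only
!kaggle --version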

gcloud job can't access my files, whether they are in GCS or in my Cloud Shell

I'm trying to run my machine-learning-on-images code using TensorFlow in Google Cloud ML. However, it seems the submitted job can't access my files in my Cloud Shell or in GCS. Even though it works fine on my local machine, I get the following error once I submit my job using the gcloud command from the Cloud Shell:
ERROR 2017-12-19 13:52:28 +0100 service IOError: [Errno 2] No such file or directory: '/home/user/pores-project-googleML/trainer/train.txt'
This file definitely exists in the Cloud Shell, and I can verify it when I type:
ls /home/user/pores-project-googleML/trainer/train.txt
I tried putting my file train.txt in GCS and accessing it from my code (by specifying the path gs://my_bucket/my_path), but once the job was submitted, I got a 'No such file or directory' error with the corresponding path.
To check where the job I submitted using gcloud is running, I added print(os.getcwd()) at the beginning of my Python code trainer/task.py, which printed /user_dir in the logs. I couldn't find this path using the Cloud Shell, not even in GCS. So my question is: how can I know on which machine my job is running? If it's in some container somewhere, how can I access my files from it using the Cloud Shell and GCS?
Before all of this, I successfully completed the 'Image Classification using Flowers Dataset' tutorial.
The command I used to submit my job is:
gcloud ml-engine jobs submit training $JOB_NAME --job-dir $JOB_DIR --packages trainer-0.1.tar.gz --module-name $MAIN_TRAINER_MODULE --region us-central1
where:
TRAINER_PACKAGE_PATH=/home/use/pores-project-googleML/trainer
MAIN_TRAINER_MODULE="trainer.task"
JOB_DIR="gs://pores/AlexNet_CloudML/job_dir/"
JOB_NAME="census$(date +"%Y%m%d_%H%M%S")"
The regular Python IO library is not able to access files on GCS. Instead, you need to use the GCS Python client or the gsutil CLI to access GCS files.
Note that TensorFlow itself has native support for GCS (i.e., it can read GCS files directly).
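For illustration only (the bucket and object names are placeholders, and the exact function names depend on your TensorFlow and google-cloud-storage versions), reading the file from GCS could look roughly like this:
# Option 1: TensorFlow's file API understands gs:// paths natively
# (tf.gfile.GFile in TF 1.x, tf.io.gfile.GFile in later versions).
import tensorflow as tf
with tf.io.gfile.GFile('gs://my_bucket/trainer/train.txt', 'r') as f:
    lines = f.read().splitlines()

# Option 2: the GCS Python client (pip install google-cloud-storage).
from google.cloud import storage
client = storage.Client()
client.bucket('my_bucket').blob('trainer/train.txt').download_to_filename('/tmp/train.txt')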