How do I download files within a SageMaker notebook instance programmatically? - amazon-web-services

We have a notebook instance within SageMaker which contains many Jupyter Python scripts. I'd like to write a program that downloads these various scripts each day (so that I can back them up). Unfortunately, I don't see any reference to this in the AWS CLI API.
Is this achievable?

It's not exactly what you want, but it looks like a VCS can fit your needs. You can use GitHub (if you already use it) or CodeCommit (free private repos). Details and additional approaches, such as syncing the target directory with an S3 bucket, are covered here: https://aws.amazon.com/blogs/machine-learning/how-to-use-common-workflows-on-amazon-sagemaker-notebook-instances/
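If you go the S3 route, here is a minimal sketch of the sync, run from a terminal on the notebook instance (or a notebook cell prefixed with !); the bucket name is a placeholder and the instance's IAM role needs write access to it:
# user files on a SageMaker notebook instance live under /home/ec2-user/SageMaker
aws s3 sync /home/ec2-user/SageMaker s3://my-notebook-backups/notebooks/ --exclude "*.ipynb_checkpoints*"
A cron entry on the instance (or a lifecycle configuration script) could run that daily to get the backup you describe.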

Semi-automatic way:
conda install -y -c conda-forge zip
!zip -r -X folder.zip folder-to-zip
Then download the resulting zip file.

Related

aws cli returns python objects instead of regular output

I just installed the AWS CLI on Ubuntu on an Azure VM, following the official installation guide.
When I run any command from the command line, the result is a Python object rather than text or regular output:
$ aws s3 ls
<botocore.awsrequest.AWSRequest object at 0x7f412f3573a0>
I searched everywhere but I can't find any hint.
I already reinstalled the AWS CLI and also tried using the --output flag, but nothing changes.
Any suggestions?
This took me a while to figure out as well. For some reason it only affected our CI/CD jobs, even though the exact same container image and environment variables worked fine locally.
Turns out, the issue stems from not providing a region.
You can fix this by specifying the region explicitly in the command:
aws s3 ls --region us-west-2
Or by providing the region through the AWS environment variables:
export AWS_REGION="us-west-2"
# or
export AWS_DEFAULT_REGION="us-west-2"
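You can also persist the region in the CLI's shared config instead of the environment, for example:
aws configure set region us-west-2
which writes the setting to ~/.aws/config for the default profile.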
Some related sources that helped me figure this out:
https://github.com/jwalton/gh-ecr-login/issues/3
aws s3 ls gives error botocore.utils.BadIMDSRequestError: <botocore.awsrequest.AWSRequest object at 0x7f3f6cb44d00>
Well, I don't know how I didn't try this before, but installing the AWS CLI with apt fixed the issue:
sudo apt-get install awscli

Google Cloud Platform: cloudshell - is there any way to "keep" gcloud init configs?

Does anyone know of a way to persist configurations made with "gcloud init" inside Cloud Shell, so they don't vanish each time you disconnect?
I figured out how to persist Python pip installs by using the --user flag, for example:
pip install --user pandas
But, when I create a new configuration using gcloud init, use it for a bit, close cloudshell (or cloudshell times out on me), then reconnect later, the configurations are gone.
It's not a big deal, but I bounce between projects a lot, so it would be nice to have the configs saved so I can simply run
gcloud config configurations activate config-name
Thanks...Rich Murnane
Google Cloud Shell only persists data in your $HOME directory. Commands like gcloud init modify environment variables and store configuration files in /tmp, which is deleted when the VM is restarted. The VM is terminated after being idle for 20 or 60 minutes, depending on which document you read.
Google Cloud Shell is a Docker container. You can modify the Docker image to customize it to fit your needs. This method allows you to install packages, tools, etc. that are not located in your $HOME directory.
You can also store your files and configuration scripts on Google Cloud Storage. Modify .bashrc to download your cloud files and run your configuration script.
Either method will allow you to create a persistent environment.
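A rough sketch of the .bashrc approach (the bucket and bootstrap script here are placeholders, not something Google provides):
# appended to ~/.bashrc, which lives in $HOME and therefore survives restarts
gsutil cp gs://my-config-bucket/bootstrap.sh $HOME/bootstrap.sh
bash $HOME/bootstrap.sh    # e.g. runs gcloud config set project my-project-id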
This StackOverflow answer covers in detail what gcloud init does and how to basically emulate the same thing via script or command line.
gcloud init details
This isn't exactly what I wanted, but since my account (user ID) isn't changing, I'm simply going to run the command
gcloud config set project second-project-name
good enough, thanks...Rich
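For reference, a small script kept in $HOME could recreate and activate a named configuration on each login, roughly like this (the configuration and project names are placeholders):
gcloud config configurations create config-name 2>/dev/null || true    # ignore "already exists"
gcloud config configurations activate config-name
gcloud config set project second-project-name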

AWS: Mount S3 Bucket to an EC2 instance. (Later FTP Tunneling)

What do I want to do?
Step 1: Mount an S3 bucket to an EC2 instance.
Step 2: Install an FTP server on the EC2 instance and tunnel FTP requests to files in the bucket.
What have I done so far?
create a bucket
create a security group with open inbound ports (FTP: 20/21, SSH: 22, and some more)
connect to the EC2 instance
And the following code:
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/s3fs/s3fs-1.74.tar.gz
tar -xvzf s3fs-1.74.tar.gz
yum update all
yum install gcc libstdc++-devel gcc-c++ fuse fuse-devel curl-devel libxml2-devel openssl-devel mailcap
cd s3fs-1.74
./configure --prefix=/usr
make
make install
vi /etc/passwd-s3fs # set access:secret keys
chmod 640 /etc/passwd-s3fs
mkdir /s3bucket
cd /s3bucket
And cd answers: Transport endpoint is not connected
I don't know what's wrong. Maybe I am using the wrong user? But currently I only have one user (for testing) besides root.
Next step would be the ftp tunnel, but for now I'd like getting this to work.
I followed these instructions instead: https://github.com/s3fs-fuse/s3fs-fuse
I guess they call the API in the background too, but it works the way I wanted.
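For reference, the mount step that the command list in the question never reaches looks roughly like this with s3fs (the bucket name is a placeholder):
s3fs my-bucket-name /s3bucket -o passwd_file=/etc/passwd-s3fs
If cd still answers "Transport endpoint is not connected", a previous mount has usually died; unmount with fusermount -u /s3bucket and mount again.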
One possible solution to mount S3 to an EC2 instance is to use the new file gateway.
Check out this:
https://aws.amazon.com/about-aws/whats-new/2017/02/aws-storage-gateway-supports-running-file-gateway-in-ec2-and-adds-file-share-security-options/
http://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html
Point 1
Whilst the other answerer is correct in saying that S3 is not built for this, it's not true to say a bucket cannot be mounted (I'd seriously consider finding a better way to solve your problem however).
That being said, you can use s3fs-fuse to mount S3 buckets within EC2. There are plenty of good reasons not to do this, detailed here.
Point 2
From there it's just a case of setting up a standard FTP server, since the bucket now appears to your system as if it were any other file system (mostly).
vsftpd could be a good choice for this. I'd have a go at both steps and then post separate questions with any specific problems you run into, but this should give you a rough outline to work from. (Well, in reality I'd have a go at neither and use S3 via app code consuming the API, but still.)
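As a very rough sketch of that FTP step, assuming Amazon Linux 2 and the bucket already mounted at /s3bucket:
sudo yum install -y vsftpd
# point the FTP root at the mount, e.g. local_root=/s3bucket in /etc/vsftpd/vsftpd.conf
sudo systemctl enable --now vsftpd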

Accessing data in Google Cloud bucket for a python Tensorflow learning program

I’m working through the Google quick-start examples for Cloud Machine Learning / TensorFlow shown here: https://cloud.google.com/ml/docs/quickstarts/training
I want my python program to access data that I have stored in a Google Cloud bucket such as gs://mybucket. How do I do this inside of my python program instead of calling it from the command line?
Specifically, the quickstart example for Cloud ML uses data they provide, but what if I want to provide my own data that I have stored in a bucket such as gs://mybucket?
I noticed a similar post here: How can I get the Cloud ML service account programmatically in Python? ... but I can’t seem to install the googleapiclient module.
Some posts seem to mention Apache Beam, though I can't tell whether it's relevant to me, and besides, I can't figure out how to download or install it anyway.
If I understand your question correctly, you want to programmatically talk to GCS in Python.
The official docs are a good place to start.
First, grab the module using pip:
pip install --upgrade google-cloud-storage
Then:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('bucket-id-here')
# Then do other things...
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_string())
blob.upload_from_string('New contents!')
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')
Assuming you are using Ubuntu/Linux as the OS and already have data in a GCS bucket, execute the following commands from a terminal (they can also be run from a Jupyter notebook by prefixing them with !).
Installation
First, install the storage module; in a terminal type:
pip install google-cloud-storage
Second, verify that the storage tooling is installed by typing:
gsutil
(the output will show the available options)
Copy data from the GCS bucket
Check whether you are able to get information about the bucket:
gsutil acl get gs://BucketName
Now copy the file from the GCS bucket to your machine:
gsutil cp gs://BucketName/FileName /PathToDestinationDir/
In this way you will be able to copy data from the bucket to your machine for further processing.
NOTE: all of the above commands can be run from a Jupyter notebook by prefixing them with !, e.g.
!gsutil cp gs://BucketName/FileName /PathToDestinationDir/

How to set up and use EC2 CLI on Mac?

I am stuck using the Amazon EC2 CLI.
I have downloaded the Command Line Tools from
http://aws.amazon.com/developertools/351.
I placed the bin and lib folder into my Amazon project folder: /Users/Invictus/EC2
I downloaded the cert-xxxx.pem and pk-xxx.pem into the same folder.
Created a .bash_profile in the same folder.
I tried to execute ec2-describe-images -o amazon after changing into /Users/Invictus/EC2 with cd.
The system does not recognise the command: command not found.
If I try to execute the same command inside the bin folder, the result is the same.
My .bash_profile:
export EC2_HOME=~/.EC2
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=`ls $EC2_HOME/pk-*.pem`
export EC2_CERT=`ls $EC2_HOME/cert-*.pem`
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home/
Where did I make a mistake?
My aim is to connect to the launched instance and be able to execute commands there from my local machine.
I have Java installed.
The newer unified AWS CLI tools are much, much easier to set up. All you need is Python, which comes built in on every Mac.
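A minimal sketch of that setup (the credentials and region you enter are your own):
pip install awscli           # may need sudo with the system Python, or use: brew install awscli
aws configure                # prompts for access key, secret key, default region and output format
aws ec2 describe-images --owners amazon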
Here are a few things I can think of:
Your .bash_profile should be in /Users/Invictus/, not /Users/Invictus/EC2. Move it to your home directory, log off and log back in (or restart your machine), and see if it picks up the right path.
Instead of ec2-describe-images, can you run it as "./ec2-describe-images" - does that work? If not, can you check the permissions on that script?