How to copy S3 bucket files into running Kubernetes pods?

I have multiple files in an S3 bucket which I need to copy to one of the running Kubernetes pods, under the /tmp path.
I need a reliable command or a tried-and-tested way to do this.
Let's say my bucket name is "learning" and the pod name is "test-5c7cd9c-l6qng".

The AWS CLI commands "aws s3api get-object" or "aws s3 cp" can be used to copy the data from S3 onto the Pod. To make these calls, AWS Access Keys are required; these keys authenticate the calls to the S3 service. The "aws configure" command can be used to configure the Access Keys in the Pod.
Coming to K8s, an Init Container can be used to execute the above command before the actual application container starts. Instead of writing the Access Keys directly into the Pod, which is not really safe, the K8s Secrets feature can be used to pass/inject the Access Keys into the Pod.
FYI, the download can also be done programmatically by using the AWS SDK and the S3Client interface for Java.
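As a minimal sketch of the copy step itself, run against the pod from the question: this assumes the pod image already has the AWS CLI installed, the Access Keys are already available inside the pod (for example, injected from a Secret as described above), and the pod runs in the default namespace. None of those details are given in the question.
# Assumptions: AWS CLI is in the pod image and credentials are already
# available inside the pod (e.g. injected from a Secret).
kubectl exec -n default test-5c7cd9c-l6qng -- aws s3 cp s3://learning /tmp --recursive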

Related

How to use AWS CLI within AWS Lambda?

I want to copy data from an S3 bucket in one account and Region to another account and Region, which is why I want a Lambda function to be triggered by an entry into the source S3 bucket; the Lambda function can then use the AWS CLI to run aws s3 sync.
So I tried using the techniques given here: https://bezdelev.com/hacking/aws-cli-inside-lambda-layer-aws-s3-sync/
Basically
Install AWS CLI in a local virtual environment
Package AWS CLI and all its dependencies to a zip file
Create a Lambda Layer
However, even after I add the layer, I still see the error ModuleNotFoundError: No module named 'awscli'.
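For reference, the three packaging steps above look roughly like the following sketch. The layer name, Python version, and paths are illustrative assumptions (not from the question), and the layer contents are placed under a top-level python/ directory so Lambda adds them to the import path; this does not claim to resolve the reported error.
# Sketch of the layer-packaging steps; names and versions are placeholders.
python3 -m venv awscli-venv
./awscli-venv/bin/pip install awscli
mkdir -p layer/python
cp -r awscli-venv/lib/python3.*/site-packages/* layer/python/
(cd layer && zip -r ../awscli-layer.zip python)
aws lambda publish-layer-version --layer-name awscli \
  --zip-file fileb://awscli-layer.zip --compatible-runtimes python3.9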

Copying File From S3 To EC2 by User Data Approach

I have been searching for a solution to this task; all I find are CLI approaches, which I don't want.
I simply want this:
I have an S3 bucket which has one private file; the file can be an image, a zip file, anything.
And when I launch any EC2 instance, it should take that file from the S3 bucket into a directory on the EC2 instance.
For this, I want to use only the EC2 User Data approach.
The User Data field in Amazon EC2 passes information to the instance that is accessible to applications running on the instance.
Amazon EC2 instances launched with Amazon-provided AMIs (eg Amazon Linux 2) include a program called Cloud-Init that looks at the User Data and, if a script is provided, runs that script the first time that the instance is booted.
Therefore, you can configure a script (passed via User Data) that will run when the instance is first launched. The script will run as the root user. Your script could copy a file from Amazon S3 by using the AWS Command-Line Interface (CLI), like this:
#!/bin/bash
aws s3 cp s3://my-bucket/foo.txt /home/ec2-user/foo.txt
chown ec2-user:ec2-user /home/ec2-user/foo.txt
Please note that you will need to assign an IAM Role to the instance that has permission to access the bucket. The AWS CLI will use these permissions for the file copy.
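As a sketch of what granting that permission might look like, the role, policy, and bucket names below are illustrative placeholders, not values from the question:
# Attach a minimal read-only S3 policy to the role used by the instance profile.
aws iam put-role-policy --role-name my-ec2-role --policy-name s3-read-my-bucket \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"s3:GetObject","Resource":"arn:aws:s3:::my-bucket/*"}]}'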
You mention that you do not wish to use the AWS CLI. You could, instead, write a program that calls the Amazon S3 API using a preferred programming language (eg Python), but using the CLI is much simpler.

Run AWS CLI from local without storing credentials in local

How do I run the AWS CLI to download S3 bucket data without storing AWS credentials on the local machine?
Please note that the S3 bucket is not a public bucket.
Not sure what your goal is, but you can use environment variables that you export only for the current session/AWS CLI run.
To prevent bash (assuming you are using Linux) from writing the export to history, you can put a space in front of the command.
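A minimal sketch of that approach, with placeholder values; note the leading space, which keeps the line out of history only when bash's HISTCONTROL includes ignorespace or ignoreboth (the default on many distributions):
# Leading space keeps the export out of bash history (requires HISTCONTROL=ignorespace/ignoreboth).
 export AWS_ACCESS_KEY_ID=AKIA...
 export AWS_SECRET_ACCESS_KEY=...
aws s3 cp s3://my-private-bucket/data.csv .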
You can start an EC2 instance and give that instance a role that allows it to read from your S3 bucket.
Once started, connect to the EC2 instance using ssh and initiate your S3 transfer using aws s3 cp ... or aws s3 sync ...
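For example, a rough sketch of that flow; the AMI ID, key pair, and instance profile name are placeholders, and the profile is assumed to carry an S3 read policy:
# Launch an instance with an instance profile that grants S3 read access.
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t3.micro \
  --key-name my-key --iam-instance-profile Name=s3-read-profile
# Then, after connecting with ssh, run the transfer on the instance:
aws s3 sync s3://my-private-bucket ./data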

Not able to get data from Amazon S3 to EC2 for Training

I'm new to cloud infrastructure for deep learning, am trying to use AWS for deep learning for the first time, and I don't know how to access my data from the launched EC2 instance.
My data is stored in an S3 bucket, but I'm not able to find a way to get it onto the instance and start training.
Log in to that EC2 instance via SSH.
Install the AWS CLI if it's not already there.
Configure credentials: either add a permission (IAM role) for the EC2 instance to use the S3 bucket,
or otherwise add your AWS secret and access keys.
Copy files from S3 to the instance:
aws s3 cp s3://mybucket/test.txt test2.txt
Copy files from the instance to S3:
aws s3 cp test.txt s3://mybucket/test2.txt
https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html#examples
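For a whole training-data folder rather than a single file, aws s3 sync mirrors a prefix onto the instance; the bucket and prefix names here are placeholders:
# Mirror an entire prefix onto the instance for training.
aws s3 sync s3://mybucket/training-data ./training-data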

access amazon S3 bucket from hadoop specifying SecretAccessKey from command line

I am trying to access an Amazon S3 bucket using an hdfs command. Here is the command that I run:
$ hadoop fs -ls s3n://<ACCESSKEYID>:<SecretAccessKey>#<bucket-name>/tpt_files/
-ls: Invalid hostname in URI s3n://<ACCESSKEYID>:<SecretAccessKey>#<bucket-name>/tpt_files
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]
My SecretAccessKey includes "/". Could that be the cause of this behavior?
At the same time, I have the AWS CLI installed on this server and I can access my bucket with it without any issues (AccessKeyId and SecretAccessKey are configured in .aws/credentials):
aws s3 ls s3://<bucket-name>/tpt_files/
Is there any way to access an Amazon S3 bucket using a Hadoop command without specifying the keys in core-site.xml? I'd prefer to specify the keys on the command line.
Any suggestions will be very helpful.
The best practice is to run Hadoop on an instance created with an EC2 instance profile role, with S3 access specified as a policy of the assigned role. Keys are no longer needed when using the instance profile.
http://docs.aws.amazon.com/java-sdk/latest/developer-guide/credentials.html
You can also launch AMIs with an instance profile role, and the CLI and SDKs will use it. If your code uses the DefaultAWSCredentialsProviderChain class, then credentials can be obtained through environment variables, system properties, or the credential profiles file (as well as the EC2 instance profile role).
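If you still want to pass the keys purely on the command line, the generic -D options shown in the usage message can set the credential properties per invocation. A sketch for the s3a connector follows; the property names assume a Hadoop build that ships s3a, and the key values are placeholders:
# Per-command credentials, no core-site.xml changes (s3a connector assumed).
hadoop fs -D fs.s3a.access.key=<ACCESSKEYID> -D fs.s3a.secret.key=<SecretAccessKey> \
  -ls s3a://<bucket-name>/tpt_files/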