How to access Google Cloud Storage bucket using aws-cli - amazon-web-services

I have an access with both aws and Google Cloud Platform.
Is this possible to do the following,
List Google Cloud Storage bucket using aws-cli
PUT a CSV file to Google Cloud Storage bucket using aws-cli
GET an object(s) from Google Cloud Storage bucket using aws-cli

It is possible. Per the GCP documentation:
The Cloud Storage XML API is interoperable with ... services such as Amazon Simple Storage Service (Amazon S3)
To do this you need to enable Interoperability in the Settings screen of the Google Cloud Storage console. From there you can create a storage access key.
Configure the AWS CLI with those keys, i.e. run aws configure and enter them.
You can then use the aws s3 command with the --endpoint-url flag set to https://storage.googleapis.com.
For example:
MacBook-Pro:~$ aws s3 --endpoint-url https://storage.googleapis.com ls
2018-02-09 14:43:42 foo.appspot.com
2018-02-09 14:43:42 bar.appspot.com
2018-05-02 20:03:08 etc.appspot.com
aws s3 --endpoint-url https://storage.googleapis.com cp test.md s3://foo.appspot.com
upload: ./test.md to s3://foo.appspot.com/test.md

I had a requirement to copy objects from a GCS bucket to S3 using AWS Lambda.
The Python boto3 library allows listing and downloading objects from a GCS bucket.
Below is sample Lambda code that copies the "sample-data-s3.csv" object from a GCS bucket to an S3 bucket.
import io
import boto3

# GCS interoperability (HMAC) credentials
google_access_key_id = "GOOG1EIxxMYKEYxxMQ"
google_access_key_secret = "QifDxxMYSECRETKEYxxVU1oad1b"
gc_bucket_name = "my_gc_bucket"

s3 = boto3.resource('s3')

def get_gcs_objects(google_access_key_id, google_access_key_secret, gc_bucket_name):
    """Gets GCS objects using the boto3 SDK pointed at the GCS XML API."""
    client = boto3.client("s3", region_name="auto",
                          endpoint_url="https://storage.googleapis.com",
                          aws_access_key_id=google_access_key_id,
                          aws_secret_access_key=google_access_key_secret)

    # Call GCS to list objects in gc_bucket_name
    response = client.list_objects(Bucket=gc_bucket_name)

    # Print object names
    print("Objects:")
    for blob in response["Contents"]:
        print(blob)

    # Download one object from GCS into memory and upload it to S3
    obj = s3.Object('my_aws_s3_bucket', 'sample-data-s3.csv')
    f = io.BytesIO()
    client.download_fileobj(gc_bucket_name, "sample-data-s3.csv", f)
    obj.put(Body=f.getvalue())

def lambda_handler(event, context):
    get_gcs_objects(google_access_key_id, google_access_key_secret, gc_bucket_name)
You can loop through the entries in response["Contents"] to download every object from the GCS bucket (a sketch follows below).
Hope this helps someone who wants to use AWS Lambda to transfer objects from a GCS bucket to an S3 bucket.
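For completeness, a minimal sketch of that loop. The bucket names and HMAC key below are the placeholder values from the snippet above, not real ones; replace them with your own.
import io
import boto3

# Client pointed at the GCS XML API using the interoperability (HMAC) key
gcs = boto3.client("s3", region_name="auto",
                   endpoint_url="https://storage.googleapis.com",
                   aws_access_key_id="GOOG1EIxxMYKEYxxMQ",
                   aws_secret_access_key="QifDxxMYSECRETKEYxxVU1oad1b")
s3 = boto3.resource("s3")

# Copy every listed object from the GCS bucket to the S3 bucket
response = gcs.list_objects(Bucket="my_gc_bucket")
for blob in response.get("Contents", []):
    key = blob["Key"]
    buffer = io.BytesIO()
    gcs.download_fileobj("my_gc_bucket", key, buffer)               # pull from GCS
    s3.Object("my_aws_s3_bucket", key).put(Body=buffer.getvalue())  # push to S3
Note that list_objects returns at most 1,000 keys per call, so for larger buckets you would need to paginate (e.g. with gcs.get_paginator("list_objects")).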

~$ aws configure
AWS Access Key ID [****************2ZL8]:
AWS Secret Access Key [****************obYP]:
Default region name [None]: us-east-1
Default output format [None]:
~$ aws s3 ls --endpoint-url=<east-region-url>
2019-02-18 12:18:05 test
~$ aws s3 cp test.py s3://<bucket-name> --endpoint-url=<east-region-url>
~$ aws s3 mv s3://<bucket-name>/<filename> test1.txt --endpoint-url=<east-region-url>

Unfortunately, this is not possible.
Could you update your question to explain why you want to do this? Maybe we know of an alternative solution.

Related

Cannot access AMAZON REDSHIFT using boto3.resource

I am currently learning about boto3 and how it can connect to AWS using both the client and resource methods. I was made to understand that it doesn't matter which one I use and that I can still get access, except in some cases where I need client features that are not available through the resource interface, in which case I would go through the created resource variable, i.e. starting from
import boto3
s3_resource = boto3.resource('s3')
Hence, if I need to access some client features, I would simply use
s3_resource.meta.client
But the main issue here is that when I tried creating clients/resources for EC2, S3, IAM, and Redshift, I did this:
import boto3
ec2 = boto3.resource('ec2',
                     region_name='us-west-2',
                     aws_access_key_id=KEY,
                     aws_secret_access_key=SECRET)

s3 = boto3.resource('s3',
                    region_name='us-west-2',
                    aws_access_key_id=KEY,
                    aws_secret_access_key=SECRET)

iam = boto3.client('iam',
                   region_name='us-west-2',
                   aws_access_key_id=KEY,
                   aws_secret_access_key=SECRET)

redshift = boto3.resource('redshift',
                          region_name='us-west-2',
                          aws_access_key_id=KEY,
                          aws_secret_access_key=SECRET)
But I get this error
UnknownServiceError: Unknown service: 'redshift'. Valid service names are: cloudformation, cloudwatch, dynamodb, ec2, glacier, iam, opsworks, s3, sns, sqs
During handling of the above exception, another exception occurred:
...
- s3
- sns
- sqs
Consider using a boto3.client('redshift') instead of a resource for 'redshift'
Why is that? I thought I could create all of them using the commands I specified above. Please help.
I suggest that you consult the Boto3 documentation for Amazon Redshift. It does, indeed, show that there is no resource method for Redshift (or Redshift Data API, or Redshift Serverless).
Also, I recommend against using aws_access_key_id and aws_secret_access_key in your code unless there is a specific need (such as extracting them from Environment Variables). It is better to use the AWS CLI aws configure command to store AWS credentials in a configuration file, which will be automatically accessed by AWS SDKs such as boto3.
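As a minimal sketch of that suggestion (the cluster identifier here is a made-up example), the Redshift client could be created and used like this:
import boto3

# boto3 has no resource interface for Redshift, so use a client.
# Region and credentials come from your AWS config/credentials files
# (aws configure) or environment variables; nothing is hard-coded here.
redshift = boto3.client('redshift', region_name='us-west-2')

# 'my-cluster' is a hypothetical cluster identifier used for illustration.
response = redshift.describe_clusters(ClusterIdentifier='my-cluster')
print(response['Clusters'][0]['ClusterStatus'])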

Boto3 S3 cp recursive from one region to another

I am looking for a way to copy from an S3 bucket in one region to another S3 bucket in a different region via a python script.
I am able to do it with the AWS CLI using:
aws s3 cp s3://source-bucket s3://target-bucket --recursive --source-region region1 --region region2
However, I want to see if something similar is possible within a Python script using boto3.
Whatever I have researched seems to work only within the same region, using boto3.resource and resource.meta.client.copy.
Amazon S3 can only 'copy' one object at a time.
When you use that AWS CLI command, it first obtains a list of objects in the source bucket and then calls copy_object() once for each object to copy (it uses multithreading to do multiple copies simultaneously).
You can write your own python script to do the same thing. (In fact, the AWS CLI is a Python program!) Your code would also need to call list_objects_v2() and then call copy_object() for each object to copy.
Given that the buckets are in different regions, you would send the commands to the destination region, referencing the source bucket, for example:
s3_client = boto3.client('s3', region_name='destination-region-id')
response = s3_client.copy_object(Bucket=..., Key=..., CopySource={'Bucket': 'source-bucket', 'Key': 'source-key'})
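A minimal sketch of the full loop (the bucket names and regions below are placeholders, not values from the question):
import boto3

# Hypothetical bucket names and regions, for illustration only.
SOURCE_BUCKET, SOURCE_REGION = 'source-bucket', 'us-east-1'
TARGET_BUCKET, TARGET_REGION = 'target-bucket', 'us-west-2'

# List from the source region; send the copy requests to the destination region.
src = boto3.client('s3', region_name=SOURCE_REGION)
dst = boto3.client('s3', region_name=TARGET_REGION)

# list_objects_v2 returns at most 1000 keys per call, so paginate.
paginator = src.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get('Contents', []):
        dst.copy_object(
            Bucket=TARGET_BUCKET,
            Key=obj['Key'],
            CopySource={'Bucket': SOURCE_BUCKET, 'Key': obj['Key']},
        )
The AWS CLI additionally runs these copies in multiple threads; you could do the same in Python with concurrent.futures if throughput matters.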

Accessing s3 bucket on AWS ParallelCluster

I have a requirement to access an S3 bucket on the AWS ParallelCluster nodes. I explored the s3_read_write_resource option in the ParallelCluster documentation, but it is not clear how we can access the bucket. For example, will it be mounted on the nodes, or will the users be able to access it by default? I tested the latter by trying to access a bucket I declared using the s3_read_write_resource option in the config file, but was not able to access it (aws s3 ls s3://<name-of-the-bucket>).
I did go through this GitHub issue talking about mounting an S3 bucket using s3fs. In my experience it is very slow to access objects using s3fs.
So, my question is:
How can we access the S3 bucket when using the s3_read_write_resource option in the AWS ParallelCluster config file?
These parameters are used in ParallelCluster to include S3 permissions in the instance role that is created for cluster instances. They're mapped to the CloudFormation template parameters S3ReadResource and S3ReadWriteResource, and used later in the CloudFormation template. There's no special way of accessing S3 objects.
To access S3 from a cluster instance, use the AWS CLI or any SDK. Credentials will be automatically obtained from the instance role via the instance metadata service.
Please note that ParallelCluster doesn't grant permissions to list S3 objects.
Retrieving existing objects from the S3 bucket defined in s3_read_resource, as well as retrieving and writing objects to the S3 bucket defined in s3_read_write_resource, should work.
However, "aws s3 ls" or "aws s3 ls s3://name-of-the-bucket" needs additional permissions; see https://aws.amazon.com/premiumsupport/knowledge-center/s3-access-denied-listobjects-sync/. A minimal sketch of both cases follows below.
I wouldn't use s3fs: it's not supported by AWS and it's been reported to be slow (as you've already noticed), among other reasons.
You might want to check the FSx section of the config. It can create and attach an FSx for Lustre filesystem, which can import/export files to/from S3 natively. You just need to set import_path and export_path in that section.

How to access and copy Cloudian S3 files using AWS CLI

I'm new to AWS S3. I need to access a Cloudian S3 bucket and copy files within a bucket to my local directory. What I was given were 4 pieces of info in the following format:
• Access key: 5x4x3x2x1xxx
• Secret key: ssssssssssss
• S3 endpoint: https://s3-aaa.xxx.bbb.net
• Storage path: store/STORE1/
When I try a simple command like ls, either as aws s3 ls s3-aaa.xxx.bbb.net or aws s3 ls https://s3-aaa.xxx.bbb.net, I get this error:
An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
What are the right commands to access the bucket and copy a file to my local directory?
It looks like you are missing your bucket name - you should be able to see it in your S3 console.
You should also be able to use either the cp or sync command like so:
aws s3 cp s3://SOURCE_BUCKET_NAME/s3/file/key SomeDrive:/path/to/my/local/directory
Or:
aws s3 sync s3://SOURCE_BUCKET_NAME/s3/file/key SomeDrive:/path/to/my/local/directory
You may also need to check the permissions on the s3 bucket.
More info:
aws s3 sync: https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
aws s3 cp: https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 permissions: https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-access-default-encryption/
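Since a custom S3 endpoint is involved, a minimal boto3 sketch pointed at that endpoint may also help confirm which buckets the credentials can see. The bucket name and object key below are hypothetical placeholders, not values from the question:
import boto3

# Endpoint and keys as supplied by the Cloudian administrator;
# the bucket name and object key below are made up for illustration.
s3 = boto3.client('s3',
                  endpoint_url='https://s3-aaa.xxx.bbb.net',
                  aws_access_key_id='5x4x3x2x1xxx',
                  aws_secret_access_key='ssssssssssss')

# List the buckets visible to these credentials, then download one object.
print([b['Name'] for b in s3.list_buckets()['Buckets']])
s3.download_file('my-bucket', 'store/STORE1/some-file.csv', 'some-file.csv')
The same endpoint can be passed to the AWS CLI with --endpoint-url https://s3-aaa.xxx.bbb.net on the ls, cp, or sync commands.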

how to download file from public s3 bucket using console

I want to download a file from a public S3 bucket using the AWS console. When I put the path below in the browser, I get an error. I also wanted to visually see what else is in that folder and explore it.
Public S3 bucket :
s3://us-east-1.elasticmapreduce.samples/flightdata/input
It appears that you are wanting to access an Amazon S3 bucket that belongs to a different AWS account. This cannot be done via the Amazon S3 management console.
Instead, I recommend using the AWS Command-Line Interface (CLI). You can use:
aws s3 ls s3://us-east-1.elasticmapreduce.samples/flightdata/input/
That will show you the objects stored in that bucket/path.
You could then download the objects with:
aws s3 sync s3://us-east-1.elasticmapreduce.samples/flightdata/input/ input
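If you would rather do this from Python than the CLI, a minimal boto3 sketch using unsigned (anonymous) requests, since the bucket is public, could look like this; the local filenames are just the object key basenames:
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) requests: no credentials are needed for a public bucket.
s3 = boto3.client('s3', region_name='us-east-1',
                  config=Config(signature_version=UNSIGNED))

BUCKET = 'us-east-1.elasticmapreduce.samples'
PREFIX = 'flightdata/input/'

# List the objects under the prefix, then download each one locally.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        filename = obj['Key'].split('/')[-1]
        if filename:  # skip the "directory" placeholder key, if present
            s3.download_file(BUCKET, obj['Key'], filename)
Since the dataset may be large, you might want to list first and download selectively.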