How does boto choose the AWS region when creating buckets?

I created 3 different buckets, 1 using the AWS Management Console and 2 using the boto API.
The bucket created using the AWS Management Console was created in the Tokyo region, whereas the ones created using boto were created in the us-east-1 region.
When I access my bucket using boto, how does it find out the correct region in which the bucket was created? Also, how does it choose which region to create a bucket in?
I have gone through the connection.py file in the boto source code, but I am not able to make any sense of the code.
Any help is greatly appreciated!

You can control the location of a new bucket by specifying a value for the location parameter in the create_bucket method. For example, to create a bucket in the ap-northeast-1 region you would do this:
import boto.s3
from boto.s3.connection import Location

# Connect to the regional endpoint and create the bucket in ap-northeast-1.
c = boto.s3.connect_to_region('ap-northeast-1')
bucket = c.create_bucket('mynewbucket', location=Location.APNortheast)
In this example, I am connecting to the S3 endpoint in the ap-northeast-1 region but that is not required. Even if you are connected to the universal S3 endpoint you can still create a bucket in another location using this technique.
To access the bucket after it has been created, you have a couple of options (both are sketched below):
You could connect to the S3 endpoint in the region where you created the bucket and then use the get_bucket method to look up your bucket and get a Bucket object for it.
You could connect to the universal S3 endpoint and use the get_bucket method to look up your bucket. For this to work, you need to follow the more restricted bucket naming conventions described here. This allows your bucket to be accessed via virtual-hosted-style addressing, e.g. https://mybucket.s3.amazonaws.com/, which in turn allows DNS to resolve your request to the correct S3 endpoint. Note that DNS records take time to propagate, so if you try to address your bucket this way immediately after it has been created it might not work. Try again in a few minutes.
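A minimal sketch of both options with boto, reusing the bucket name from the example above:
import boto
import boto.s3

# Option 1: connect to the regional endpoint and look the bucket up there.
conn = boto.s3.connect_to_region('ap-northeast-1')
bucket = conn.get_bucket('mynewbucket')

# Option 2: connect to the universal endpoint; a DNS-compliant bucket name
# lets the request resolve to the correct regional endpoint.
conn = boto.connect_s3()
bucket = conn.get_bucket('mynewbucket')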

Related

Possible to access an AWS public dataset using Cyberduck?

Cyberduck version: 7.9.2
Cyberduck is designed to access non-public AWS buckets. It asks for:
Server
Port
Access Key ID
Secret Access Key
The Registry of Open Data on AWS provides this information for an open dataset (using the example at https://registry.opendata.aws/target/):
Resource type: S3 Bucket
Amazon Resource Name (ARN): arn:aws:s3:::gdc-target-phs000218-2-open
AWS Region: us-east-1
AWS CLI Access (No AWS account required): aws s3 ls s3://gdc-target-phs000218-2-open/ --no-sign-request
Is there a version of s3://gdc-target-phs000218-2-open that can be used in Cyberduck to connect to the data?
If the bucket is public, any AWS credentials will suffice. So as long as you can create an AWS account, you only need to create an IAM user for yourself with programmatic access, and you are all set.
No doubt, it's a pain because creating an AWS account needs your credit (or debit) card! But see https://stackoverflow.com/a/44825406/1094109
I tried this with s3://gdc-target-phs000218-2-open and it worked.
For RODA buckets that provide public access only to specific prefixes, you'd need to edit the path to suit, e.g. s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/ (this is a RODA bucket maintained by us, yet to be released fully).
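If you end up scripting rather than using Cyberduck, the CLI's --no-sign-request access from the question has a rough boto3 equivalent using anonymous (unsigned) requests; bucket name as in the question:
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned client: no credentials needed, mirrors the CLI's --no-sign-request.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
resp = s3.list_objects_v2(Bucket='gdc-target-phs000218-2-open')
for obj in resp.get('Contents', []):
    print(obj['Key'])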
No, it's explicitly stated in the documentation that
You must obtain the login credentials [in order to connect to Amazon S3 in Cyberduck]

AWS Lambda Cross account Keys & Roles usage for S3 transfer

I have a use case where AWS Lambda copies files/objects from one S3 bucket to another. The source S3 bucket is in a separate AWS account (say Account 1), where the provider has only given us an access key & secret access key. Our Lambda runs in Account 2, and the destination bucket can be either in Account 2 or in some other Account 3 altogether, which can be accessed using an IAM role. The setup is like this due to multiple partners sharing data files.
Usually I use the following boto3 call to copy contents between two buckets when everything is in the same account, but I want to know how this can be modified for the new use case:
copy_source_object = {'Bucket': source_bucket_name, 'Key': source_file_key}
s3_client.copy_object(CopySource=copy_source_object, Bucket=destination_bucket_name, Key=destination_file_key)
How can the above code be modified to fit my use case of an access-key-based connection to the source bucket and a role for the destination bucket (which can be a cross-account role as well)? Please let me know if any clarification is required.
There are multiple options here. The easiest is providing credentials to boto3 (docs). I would suggest retrieving the keys from the SSM Parameter Store or Secrets Manager so they're not stored hardcoded.
Edit: I realize the problem now: you can't use the same session for both buckets, which makes sense. The exact thing you want (i.e. using copy_object) is not possible. The trick is to use two separate sessions so you don't mix the credentials. You would need to get_object from the first account and put_object to the second. You should be able to simply pass resp['Body'] from the get into the put request, but I haven't tested this.
import boto3

# One session per set of credentials so the two accounts never get mixed.
acc1_session = boto3.session.Session(
    aws_access_key_id=ACCESS_KEY_acc1,
    aws_secret_access_key=SECRET_KEY_acc1
)
acc2_session = boto3.session.Session(
    aws_access_key_id=ACCESS_KEY_acc2,
    aws_secret_access_key=SECRET_KEY_acc2
)

acc1_client = acc1_session.client('s3')
acc2_client = acc2_session.client('s3')

# Read from the source account and stream the body straight into the destination.
resp = acc1_client.get_object(Bucket=source_bucket_name, Key=source_file_key)
acc2_client.put_object(Bucket=destination_bucket_name, Key=destination_file_key, Body=resp['Body'])
Your situation appears to be:
Account-1:
Amazon S3 bucket containing files you wish to copy
You have an Access Key + Secret Key from Account-1 that can read these objects
Account-2:
AWS Lambda function that has an IAM Role that can write to a destination bucket
When using the CopyObject() command, the credentials used must have read permission on the source bucket and write permission on the destination bucket. There are normally two ways to do this (both are sketched at the end of this answer):
Use credentials from Account-1 to 'push' the file to Account-2. This requires a Bucket Policy on the destination bucket that permits PutObject for the Account-1 credentials. Also, you should set ACL=bucket-owner-full-control to hand over control of the object to Account-2. (This sounds similar to your situation.) OR
Use credentials from Account-2 to 'pull' the file from Account-1. This requires a Bucket Policy on the source bucket that permits GetObject for the Account-2 credentials.
If you can't ask for a change to the Bucket Policy on the source bucket that permits Account-2 to read the contents, then you'll need a Bucket Policy on the destination bucket that permits write access by the credentials from Account-1.
This is made more complex by the fact that you are potentially copying the object to a bucket in "some other account". There is no easy answer if you are starting to use 3 accounts in the process.
Bottom line: If possible, ask them for a change to the source bucket's Bucket Policy so that your Lambda function can read the files without having to change credentials. It can then copy objects to any bucket that the function's IAM Role can access.
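A rough sketch of both approaches with boto3, reusing the variable names from the question and the earlier answer; which one applies depends on whose bucket policy you can get changed:
import boto3

# Push: use the Account-1 keys; needs a destination bucket policy that allows
# PutObject for Account-1, and the ACL hands ownership of the copy to the bucket owner.
acc1_s3 = boto3.client('s3',
                       aws_access_key_id=ACCESS_KEY_acc1,
                       aws_secret_access_key=SECRET_KEY_acc1)
acc1_s3.copy_object(CopySource={'Bucket': source_bucket_name, 'Key': source_file_key},
                    Bucket=destination_bucket_name,
                    Key=destination_file_key,
                    ACL='bucket-owner-full-control')

# Pull: run under the Lambda function's own IAM Role (no explicit keys); needs the
# source bucket policy to allow GetObject for that role.
role_s3 = boto3.client('s3')
role_s3.copy_object(CopySource={'Bucket': source_bucket_name, 'Key': source_file_key},
                    Bucket=destination_bucket_name,
                    Key=destination_file_key)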

Accessing s3 bucket on AWS ParallelCluster

I have a requirement to access an S3 bucket from the AWS ParallelCluster nodes. I did explore the s3_read_write_resource option in the ParallelCluster documentation, but it is not clear how we can access the bucket. For example, will it be mounted on the nodes, or will users be able to access it by default? I did test the latter by trying to access a bucket I declared using the s3_read_write_resource option in the config file, but was not able to access it (aws s3 ls s3://<name-of-the-bucket>).
I did go through this GitHub issue talking about mounting an S3 bucket using s3fs. In my experience it is very slow to access objects using s3fs.
So, my question is:
How can we access the S3 bucket when using the s3_read_write_resource option in the AWS ParallelCluster config file?
These parameters are used by ParallelCluster to include S3 permissions in the instance role that is created for cluster instances. They're mapped to the CloudFormation template parameters S3ReadResource and S3ReadWriteResource and used later in the CloudFormation template. There's no special way of accessing the S3 objects.
To access S3 from a cluster instance, use the AWS CLI or any SDK; see the sketch below. Credentials will be obtained automatically from the instance role via the instance metadata service.
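For instance, a minimal boto3 sketch run on a cluster node (bucket and key names are placeholders):
import boto3

# No explicit keys: boto3 picks up temporary credentials for the cluster's
# instance role from the instance metadata service.
s3 = boto3.client('s3')

# Reading works for the bucket declared in s3_read_resource or s3_read_write_resource.
obj = s3.get_object(Bucket='name-of-the-bucket', Key='path/to/object')
data = obj['Body'].read()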
Please note that ParallelCluster doesn't grant permissions to list S3 objects.
Retrieving existing objects from the S3 bucket defined in s3_read_resource, as well as retrieving and writing objects to the S3 bucket defined in s3_read_write_resource, should work.
However, "aws s3 ls" or "aws s3 ls s3://name-of-the-bucket" needs additional permissions. See https://aws.amazon.com/premiumsupport/knowledge-center/s3-access-denied-listobjects-sync/.
I wouldn't use s3fs: it's not supported by AWS, it's been reported to be slow (as you've already noticed), and there are other reasons besides.
You might want to check the FSx section of the config. It can create and attach an FSx for Lustre filesystem, which can import/export files to/from S3 natively. You just need to set import_path and export_path in this section.

Is my s3 bucket set to the correct region?

When I go to the console in AWS by clicking the yellow cube in the top corner, it directs me to the following URL:
https://ap-southeast-1.console.aws.amazon.com/console/home?region=ap-southeast-1
This is correct, because my app is used primarily in Southeast Asia.
Now when I go to my S3 bucket, right-click, and select Properties, I see:
Bucket: examplebucket
Region: US Standard
I believe that when I first created my AWS account I set it to us-west-2 and later changed it to ap-southeast-1. Is there something I need to do to change the region of the S3 bucket from 'US Standard'?
In the navbar, under Global, it says "S3 does not require region selection", which is confusing to me.
The bucket is being used for photo storage. The majority of my web users are in Southeast Asia.
It would certainly make sense to locate the bucket closest to the majority of your users. Also, consider using Amazon CloudFront to cache objects, providing even faster data access to your users.
Each Amazon S3 bucket resides in a single region. Any data placed into that bucket stays within that region. It is also possible to configure cross-region replication of buckets, which will copy objects from one bucket to a different bucket in a different region.
The Amazon S3 management console displays all buckets in all regions (hence the message that "S3 does not require region selection"). Clicking on a bucket will display the bucket properties, which will show the region in which the bucket resides.
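If you'd rather check programmatically than in the console, here is a quick boto3 sketch (bucket name taken from the question; buckets in the old 'US Standard' region report no location constraint, which corresponds to us-east-1):
import boto3

s3 = boto3.client('s3')
resp = s3.get_bucket_location(Bucket='examplebucket')
# 'US Standard' buckets return None here, i.e. us-east-1.
print(resp['LocationConstraint'] or 'us-east-1')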
It is not possible to 'change' the region of a bucket. Instead, you should create a new bucket in the desired region and copy the objects to the new bucket. The easiest way to copy the files is via the AWS Command-Line Interface (CLI), with a command like:
aws s3 cp s3://source-bucket s3://destination-bucket --recursive
If you have many files, it might be safer to use the sync option, which can be run multiple times (in case of errors/failures):
aws s3 sync s3://source-bucket s3://destination-bucket
Please note that if you wish to retain the name of the bucket, you would need to copy to a temporary bucket, delete the original bucket, wait for the bucket name to become available again (10 minutes?), create the bucket in the desired region, then copy the objects to the new bucket.

How to copy S3 objects between regions with Amazon AWS PHP SDK?

I'm trying to copy Amazon AWS S3 objects between two buckets in two different regions with the Amazon AWS PHP SDK v3. This would be a one-time process, so I don't need cross-region replication. I tried to use copyObject() but there is no way to specify the region.
$s3->copyObject(array(
    'Bucket'     => $targetBucket,
    'Key'        => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));
Source:
http://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjectUsingPHP.html
You don't need to specify regions for that operation. It'll find out the target bucket's region and copy it.
But you may be right, because the AWS CLI has source-region and target-region attributes which do not exist in the PHP SDK. So you can accomplish the task like this:
Create an interim bucket in the source region.
Create the bucket in the target region.
Configure replication from the interim bucket to the target one.
On the interim bucket, set an expiration rule so files are deleted from it automatically after a short time.
Copy objects from the source bucket to the interim bucket using the PHP SDK.
All your objects will then also be copied to the other region.
You can remove the interim bucket a day later.
Or just use the CLI with this single command:
aws s3 cp s3://my-source-bucket-in-us-west-2/ s3://my-target-bucket-in-us-east-1/ --recursive --source-region us-west-2 --region us-east-1
A bucket in a different region could also be in a different account. What others have done is copy the data out of one bucket, save it locally temporarily, then upload it to the other bucket with the other account's credentials (if the two regional buckets use different credentials).
The newest update to the CLI tool allows you to copy from bucket to bucket if they're under the same account, using something like what Çağatay Gürtürk mentioned.