I need to access an S3 bucket on AWS ParallelCluster nodes. I explored the s3_read_write_resource option in the ParallelCluster documentation, but it is not clear how the bucket can be accessed. For example, will it be mounted on the nodes, or will users be able to access it by default? I tested the latter by trying to access a bucket I declared with the s3_read_write_resource option in the config file, but was not able to access it (aws s3 ls s3://<name-of-the-bucket>).
I did go through a GitHub issue about mounting an S3 bucket using s3fs. In my experience, accessing objects through s3fs is very slow.
So, my question is: how can we access the S3 bucket when using the s3_read_write_resource option in the AWS ParallelCluster config file?
These parameters are used by ParallelCluster to add S3 permissions to the instance role that is created for cluster instances. They are mapped to the CloudFormation template parameters S3ReadResource and S3ReadWriteResource, and used later in the CloudFormation template (for example, here and here). There is no special way of accessing S3 objects.
To access S3 from a cluster instance, use the AWS CLI or any SDK. Credentials are obtained automatically from the instance role through the instance metadata service.
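On a node, plain AWS CLI calls are enough. A minimal sketch, where the bucket name is a placeholder for the one you declared in s3_read_write_resource:

```shell
# Credentials are picked up automatically from the instance role via the
# instance metadata service -- no "aws configure" step is needed.

# Download an object (covered by s3_read_resource or s3_read_write_resource):
aws s3 cp s3://my-parallelcluster-bucket/input/data.txt /tmp/data.txt

# Upload an object (requires s3_read_write_resource):
aws s3 cp /tmp/results.txt s3://my-parallelcluster-bucket/output/results.txt
```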
Please note that ParallelCluster doesn't grant permissions to list S3 objects.
Retrieving existing objects from the S3 bucket defined in s3_read_resource, as well as retrieving and writing objects to the S3 bucket defined in s3_read_write_resource, should work.
However, "aws s3 ls" or "aws s3 ls s3://name-of-the-bucket" needs additional permissions. See https://aws.amazon.com/premiumsupport/knowledge-center/s3-access-denied-listobjects-sync/.
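If you do need listing, you could attach an additional policy to the instance role along these lines (a sketch; the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::name-of-the-bucket"
    }
  ]
}
```

Note that s3:ListBucket is granted on the bucket ARN itself, not on arn:aws:s3:::bucket/*, which is a common source of AccessDenied errors.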
I wouldn't use s3fs: it's not supported by AWS, and it's been reported to be slow (as you've already noticed), among other reasons.
You might want to check the FSx section. It can create and attach an FSx for Lustre filesystem, which can natively import/export files from/to S3. You just need to set import_path and export_path in that section.
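A sketch of what that could look like in a ParallelCluster (v2-style) config file; the bucket name, section label, and sizes are examples to adapt:

```ini
[cluster default]
fsx_settings = myfsx

[fsx myfsx]
shared_dir = /fsx
storage_capacity = 1200
import_path = s3://my-parallelcluster-bucket
export_path = s3://my-parallelcluster-bucket/export
```

With this, objects in the bucket appear as files under /fsx on the nodes, and results written there can be exported back to S3.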
Related
I cannot make the S3 bucket public. I would like a method that works without the AWS command line interface.
You need to create a role with S3 access (full or limited, your choice) and attach this role to the EC2 instance. Then you can manipulate files in S3 from the EC2 instance.
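As a CLI sketch of those steps (role, profile, and instance ID are placeholders; AmazonS3ReadOnlyAccess is an AWS managed policy you could swap for a tighter custom one):

```shell
# Trust policy letting EC2 assume the role (save as trust.json first):
# {"Version":"2012-10-17","Statement":[{"Effect":"Allow",
#  "Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}

aws iam create-role --role-name s3-access-role \
    --assume-role-policy-document file://trust.json
aws iam attach-role-policy --role-name s3-access-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# EC2 attaches an instance profile, which wraps the role:
aws iam create-instance-profile --instance-profile-name s3-access-profile
aws iam add-role-to-instance-profile --instance-profile-name s3-access-profile \
    --role-name s3-access-role
aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 \
    --iam-instance-profile Name=s3-access-profile
```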
Cyberduck version: 7.9.2
Cyberduck is designed to access non-public AWS buckets. It asks for:
Server
Port
Access Key ID
Secret Access Key
The Registry of Open Data on AWS provides this information for an open dataset (using the example at https://registry.opendata.aws/target/):
Resource type: S3 Bucket
Amazon Resource Name (ARN): arn:aws:s3:::gdc-target-phs000218-2-open
AWS Region: us-east-1
AWS CLI Access (No AWS account required): aws s3 ls s3://gdc-target-phs000218-2-open/ --no-sign-request
Is there a version of s3://gdc-target-phs000218-2-open that can be used in Cyberduck to connect to the data?
If the bucket is public, any AWS credentials will suffice. So as long as you can create an AWS account, you only need to create an IAM user for yourself with programmatic access, and you are all set.
No doubt it's a pain, because creating an AWS account requires your credit (or debit) card! But see https://stackoverflow.com/a/44825406/1094109.
I tried this with s3://gdc-target-phs000218-2-open and it worked:
For RODA buckets that provide public access to specific prefixes, you'd need to edit the path to suit. E.g. s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/ (this is a RODA bucket maintained by us, yet to be released fully)
NOTE: The screenshots below show a different URL that's no longer operational. The correct URL is s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/
No, it's explicitly stated in the documentation that
You must obtain the login credentials [in order to connect to Amazon S3 in Cyberduck]
I'm trying to upload files to an S3 bucket as part of a build. The bucket is configured to serve files as a static site, and the content is protected by a Lambda and CloudFront. When I manually create files in the bucket they are all visible and everything is happy, but the files created by the upload are not available, resulting in an access denied response.
The user that's pushing to the bucket does not belong to the same AWS account, but it has been set up with an ACL that allows it to push to the bucket, and the bucket has a policy that allows that user to push to it.
The command that I'm using is:
aws s3 sync --no-progress --delete docs/_build/html "s3://my-bucket" --acl bucket-owner-full-control
Is there something else that I can try that basically uses the bucket permissions for anything that's created?
According to the OP's feedback in the comment section, setting Object Ownership to "Bucket owner preferred" fixed the issue.
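For reference, the same setting can be applied from the CLI (the bucket name is a placeholder):

```shell
aws s3api put-bucket-ownership-controls \
    --bucket my-bucket \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerPreferred}]'
```

With BucketOwnerPreferred, new objects uploaded with the bucket-owner-full-control ACL (as in the sync command above) become owned by the bucket owner, which is exactly the cross-account case here.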
I recently moved my app to Elastic Beanstalk, and I am running Symfony 3. There is a mandatory parameters.yml file that has to be populated with environment variables.
I'd like to wget the parameters.yml from a private S3 bucket, limiting access to instances only.
I know I can set the environment variables directly on the environment, but I have some very sensitive values there, and environment variables get leaked into my logging system, which is very bad.
I also have multiple environments, such as workers, using the same environment variables, and copying and pasting them is quite annoying.
So I am wondering if it's possible to have the app wget it on deploy. I know how to do that, but I can't seem to configure the S3 bucket to only allow access from instances.
Yes, that can definitely be done; there are different ways, depending on which approach you want to take. I would suggest using .ebextensions to create an IAM role, grant that role access to your bucket, and then, after the package is unzipped on the instance, copy the object from S3 using the instance role:
Create a custom IAM role using the AWS console or .ebextensions custom resources, and grant that role access to the objects in your bucket.
Related read
Using the .ebextensions mentioned above, set aws:autoscaling:launchconfiguration in option_settings to specify the instance profile you created before.
Again using .ebextensions, use the container_commands option to run the aws s3 cp command.
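The two .ebextensions steps above could be sketched in a single config file like this (the instance profile name, bucket, and paths are examples to adapt; container_commands run in the application staging directory, so the relative destination path works):

```yaml
# .ebextensions/s3-params.config
option_settings:
  aws:autoscaling:launchconfiguration:
    IamInstanceProfile: my-app-instance-profile

container_commands:
  01_fetch_parameters:
    command: aws s3 cp s3://my-private-config-bucket/parameters.yml app/config/parameters.yml
```

Since the instance role supplies credentials, the bucket never needs to be public and no keys appear in environment variables or logs.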
I'm trying to copy Amazon AWS S3 objects between two buckets in two different regions with Amazon AWS PHP SDK v3. This would be a one-time process, so I don't need cross-region replication. Tried to use copyObject() but there is no way to specify the region.
$s3->copyObject(array(
    'Bucket'     => $targetBucket,
    'Key'        => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));
Source:
http://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjectUsingPHP.html
You don't need to specify regions for that operation. It will find out the target bucket's region and copy the object there.
But you may be right: the AWS CLI has source-region and region attributes that don't exist in the PHP SDK. If so, you can accomplish the task like this:
Create an interim bucket in the source region.
Create the bucket in the target region.
Configure replication from the interim bucket to the target one.
On the interim bucket, set an expiration rule so files are deleted automatically after a short time.
Copy objects from the source bucket to the interim bucket using the PHP SDK.
All your objects will then also be copied to the other region.
You can remove the interim bucket a day later.
Or just use the CLI with this single command:
aws s3 cp s3://my-source-bucket-in-us-west-2/ s3://my-target-bucket-in-us-east-1/ --recursive --source-region us-west-2 --region us-east-1
A bucket in a different region could also be in a different account. What others have done is copy from one bucket, save the data temporarily locally, then upload to the other bucket with different credentials (if you have two regional buckets with different credentials).
The newest update of the CLI tool allows you to copy from bucket to bucket if both are under the same account, using something like what Çağatay Gürtürk mentioned.