Elastic MapReduce and Amazon S3: Error regarding access keys

I am new to Amazon EMR and Hadoop in general. I am currently trying to set up a Pig job on an EMR cluster and to import and export data from S3. I have created an S3 bucket named "datastackexchange" that holds my data. To start copying the data into Pig, I ran the following command:
ls s3://datastackexchange
And I am met with the following error message:
AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
I presume I am missing some critical steps, likely involving setting up the access keys. As I am very new to EMR, could someone please explain what I need to do to get rid of this error so I can use my S3 data in EMR?
Any help is greatly appreciated - thank you.

As you correctly observed, your EMR instances do not have the privileges to access the S3 data. There are many ways to specify AWS credentials for accessing your S3 data, but the recommended approach is to create IAM role(s) for accessing it.
Configure IAM Roles for Amazon EMR explains the steps involved.
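If you launch the cluster from the AWS CLI, a minimal sketch (cluster name, key pair, release label, and instance settings below are placeholder values) is to create the default EMR roles once and then start the cluster with them:
aws emr create-default-roles
aws emr create-cluster \
  --name "pig-cluster" \
  --release-label emr-5.36.0 \
  --applications Name=Pig \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --instance-type m5.xlarge \
  --instance-count 3
With the EMR_EC2_DefaultRole instance profile attached, the cluster can read s3:// paths without putting keys in URLs or configuration files.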

Related

Copy an on-premises Windows folder to an S3 bucket

I have an old archive folder on an on-premises Windows server that I need to put into an S3 bucket, but I am having issues; it's more my knowledge of AWS, to be honest, but I'm trying.
I have created the S3 bucket and was able to attach it to the server using net share (AWS gives you the command via the gateway) and gave it a drive letter. I then tried to use robocopy to copy the data, but it didn't like the drive letter for some reason.
I then read I can use the AWS CLI so I tried something like:
aws s3 sync z: s3://archives-folder1
I get - fatal error: Unable to locate credentials
I guess I need to put credentials somewhere (.aws), but after reading too many documents I'm not sure what to do at this point. Could someone advise?
Maybe there is a better way.
Thanks
You do not need to 'attach' the S3 bucket to your system. You can simply use the AWS CLI command to communicate directly with Amazon S3.
First, however, you need to provide the AWS CLI with a set of AWS credentials that can be used to access the bucket. You can do this with:
aws configure
It will ask for an Access Key and Secret Key. You can obtain these from the Security Credentials tab when viewing your IAM User in the IAM management console.
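As a rough sketch of the whole flow (the local path below is a placeholder; the bucket name is taken from your command):
aws configure
aws s3 sync "D:\old-archive" s3://archives-folder1
aws configure stores the keys in the .aws folder under your user profile, and aws s3 sync then reads the local folder directly, so no mapped drive letter is needed.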

Access Denied When Creating AWS Glue Crawler

I am trying to create a crawler in AWS Glue, but it gives error: {"service":"AWSGlue","statusCode":400,"errorCode":"AccessDeniedException","requestId":"<requestId>","errorMessage":"Account <accountId> is denied access.","type":"AwsServiceError"}.
This is what I've done so far:
Create a database in AWS Glue
Add tables in the database using a crawler
Name the crawler
Choose Amazon S3 as the data store and specify a path to a CSV file inside a bucket in my account
Choose an existing IAM role I've created before
Choose a database I've created before
Press finish.
When I press finish, the above error occurs.
I have granted AdministratorAccess to both the IAM user and the role used to create the crawler, so I assume the problem is not a lack of permissions. The bucket used is not encrypted and is located in the same region as AWS Glue.
I have also tried creating another database and specifying a path to a different CSV file, but that did not solve the problem.
Any help would be greatly appreciated. Thanks.
I contacted the owner (the root user) of this account, and the owner asked AWS Premium Support for help. AWS Premium Support told us that all the permissions required to create an AWS Glue crawler were already in place and that there were no SCPs attached to the account. After waiting around seven working days, I was finally able to create the AWS Glue crawler without any errors.
Unfortunately, I don't have any further information on how AWS Premium Support solved the issue. For those of you who encounter similar errors, try contacting the owner of the account, because the issue is most likely out of your control. Hope this helps in the future. Thanks.
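If you want to rule out a console-specific problem, a rough equivalent of the same call from the AWS CLI (crawler name, role, database, and S3 path below are placeholders) is:
aws glue create-crawler \
  --name test-crawler \
  --role MyGlueServiceRole \
  --database-name my_glue_db \
  --targets '{"S3Targets":[{"Path":"s3://my-bucket/path/to/csv/"}]}'
If this returns the same AccessDeniedException, the denial is happening at the account level rather than in the crawler configuration itself.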

Is it possible to unload my org's Redshift data to an S3 bucket outside of my org?

I am trying to unload my organization's Redshift data to a vendor's S3 bucket. They have provided me with an Access Key and Secret Access Key, but I'm getting a 403 Access Denied error. I want to make sure it's not an issue with the credentials they sent me, so I am reaching out. Is this even possible?
Yes, that's possible. Here are the steps.
Unload the Redshift data to a bucket associated with the current account:
unload ('select * from table')
to 's3://mybucket/table'
-- UNLOAD needs an authorization clause; the role ARN below is a placeholder
iam_role 'arn:aws:iam::<account-id>:role/<redshift-unload-role>';
mybucket should then have replication configured to the vendor's S3 bucket. Use S3 replication to copy the unloaded objects across, as sketched below.
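A rough sketch of that replication setup with the AWS CLI (bucket names and the replication role ARN are placeholders; S3 replication also requires versioning on both buckets, and the vendor must allow the role to write into their bucket):
aws s3api put-bucket-versioning --bucket mybucket --versioning-configuration Status=Enabled
aws s3api put-bucket-replication --bucket mybucket --replication-configuration '{
  "Role": "arn:aws:iam::<account-id>:role/<replication-role>",
  "Rules": [
    {
      "Prefix": "table",
      "Status": "Enabled",
      "Destination": { "Bucket": "arn:aws:s3:::<vendor-bucket>" }
    }
  ]
}'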

Hive on S3 with multiple AWS users and Spark

This is my scenario:
I am a Spark and AWS enthusiast, and I am itching to understand more about the technology.
Case 1: My Spark application runs on an EMR cluster, and it reads from a Hive table on S3 and writes into a Hive table on S3. In this case, the S3 buckets belong to the same user, usera, so I added fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey to a config file; in my case I added them to hdfs-site.xml. usera had the right permissions to access the bucket, so no problem.
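For reference, the entries I added to hdfs-site.xml look roughly like this (the key values shown are placeholders):
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>AKIAEXAMPLEKEYID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>exampleSecretAccessKey</value>
</property>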
Case 2: I am reading from two Hive tables on S3, table1 and table2.
table1 belongs to user1 and table2 belongs to user2.
Given that I cannot specify multiple awsAccessKeyId values in the config file for s3 (I understand that s3a has a concept of bucket-specific keys, but I am not using s3a, I am using s3),
how are these scenarios supported in AWS EMR?
I understand that IAM, EC2 instance roles, and instance profiles can apply here.
I think the solution to your problem is cross-account permissions. That way, you can grant user1 permission to access user2's bucket. You can also take a look at this.
Apache Hadoop 2.8 supports per-bucket configuration. AWS EMR doesn't, which is something you will have to take up with them.
As a workaround, you can put secrets in the URI, e.g. s3://user:secret@bucket, remembering to encode special characters in the secret. After doing that, the URL, logs, and stack traces must be considered sensitive data and not shared.
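If you do move to s3a on Hadoop 2.8+, per-bucket keys look roughly like this (bucket name and key values are placeholders); the second bucket gets its own pair of entries:
<property>
  <name>fs.s3a.bucket.table1-bucket.access.key</name>
  <value>AKIAEXAMPLEUSER1</value>
</property>
<property>
  <name>fs.s3a.bucket.table1-bucket.secret.key</name>
  <value>exampleSecretForUser1</value>
</property>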

Trying to load Redshift samples, Access Denied when COPYing from S3

I'm running through the Redshift tutorials on the AWS site, and I can't access their sample data buckets with the COPY command. I know I'm using the right Key and Secret Key, and have even generated new ones to try, without success.
The error from S3 is S3ServiceException:Access Denied,Status 403,Error AccessDenied. Amazon says this is related to permissions for a bucket, but they don't specify credentials to use for accessing their sample buckets, so I assume they're open to the public?
Anyone got a fix for this or am I misinterpreting the error?
I was misinterpreting the error. The buckets are publicly accessible and you just have to give your IAM user access to the S3 service.
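One way to grant that, sketched with the AWS CLI (the user name is a placeholder; AmazonS3ReadOnlyAccess is the AWS managed policy), is:
aws iam attach-user-policy \
  --user-name my-redshift-user \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
The COPY command can then authenticate with that user's access key and secret key as the tutorial describes.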