Hive on S3 with multiple AWS users and Spark - amazon-web-services

This is my scenario:
I am a Spark and AWS enthusiast and I am itching to understand more about the technology.
Case 1: My Spark application runs on an EMR cluster and reads from a Hive table on S3 and writes into another Hive table on S3. In this case the S3 buckets belong to the same user, usera, so I added fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey to a config file; in my case I added them to hdfs-site.xml. usera had the right permissions to access the bucket, so no problem.
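For concreteness, Case 1 amounts to roughly the following; this is a minimal PySpark sketch of the same configuration set programmatically instead of via hdfs-site.xml, and the key values and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-on-s3").enableHiveSupport().getOrCreate()

# Equivalent of the hdfs-site.xml entries described above (placeholder values).
hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("fs.s3.awsAccessKeyId", "<usera-access-key-id>")
hconf.set("fs.s3.awsSecretAccessKey", "<usera-secret-access-key>")

# Read from one Hive-on-S3 table and write into another.
df = spark.table("source_db.source_table")
df.write.mode("overwrite").saveAsTable("target_db.target_table")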
Case 2: I am reading from two Hive tables on S3, table1 and table2.
table1 belongs to user1 and table2 belongs to user2.
Given that I cannot specify multiple awsAccessKeyId values in the config file for s3 (I understand that s3a has a concept of bucket-specific keys, but I am not using s3a, I am using s3),
how are these scenarios supported in AWS EMR?
I understand that IAM, EC2 instance roles and instance profiles can apply here.

I think the solution to your problem is cross-account permissions: user2 can grant user1 access to user2's bucket, for example with a bucket policy along the lines of the sketch below.
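A hedged boto3 sketch of such a cross-account bucket policy; the account ID, bucket name and actions are placeholders to adapt.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CrossAccountReadForUser1",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # user1's account
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::table2-bucket",
                "arn:aws:s3:::table2-bucket/*",
            ],
        }
    ],
}

# Applied by user2, the bucket owner.
boto3.client("s3").put_bucket_policy(Bucket="table2-bucket", Policy=json.dumps(policy))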

Apache Hadoop 2.8 supports per-bucket configuration. AWS EMR doesn't, which is something you will have to take up with them.
As a workaround, you can put the secrets in the URI, e.g. s3://user:secret@bucket, remembering to encode any special characters in the secret. After doing that, the URL, logs and stack traces must be considered sensitive data and not shared.
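For reference, where the s3a connector from Hadoop 2.8+ is available, the per-bucket configuration mentioned above looks roughly like this from Spark; a sketch only, with placeholder bucket names, credentials and paths.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
hconf = spark.sparkContext._jsc.hadoopConfiguration()

# Credentials used only for table1's bucket.
hconf.set("fs.s3a.bucket.table1-bucket.access.key", "<user1-access-key-id>")
hconf.set("fs.s3a.bucket.table1-bucket.secret.key", "<user1-secret-access-key>")

# Credentials used only for table2's bucket.
hconf.set("fs.s3a.bucket.table2-bucket.access.key", "<user2-access-key-id>")
hconf.set("fs.s3a.bucket.table2-bucket.secret.key", "<user2-secret-access-key>")

df1 = spark.read.parquet("s3a://table1-bucket/warehouse/table1/")
df2 = spark.read.parquet("s3a://table2-bucket/warehouse/table2/")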

Related

Access Denied When Creating AWS Glue Crawler

I am trying to create a crawler in AWS Glue, but it gives an error: {"service":"AWSGlue","statusCode":400,"errorCode":"AccessDeniedException","requestId":"<requestId>","errorMessage":"Account <accountId> is denied access.","type":"AwsServiceError"}.
This is what I've done so far:
Create a database in AWS Glue
Add tables in the database using a crawler
Name the crawler
Choose Amazon S3 as the data store and specify a path to a csv file inside a bucket in my account
Choose an existing IAM role I've created before
Choose a database I've created before
Press finish.
When I press finish, the above error occurs.
I have granted AdministratorAccess to both the IAM user and the role used to create the crawler, so I assume there are no missing-permission issues. The bucket used is not encrypted and is located in the same region as AWS Glue.
I have also tried creating another database and specifying a path to a different csv file, but that did not solve the problem.
Any help would be greatly appreciated. Thanks.
I contacted the owner (the root user) of this account, and the owner asked AWS Premium Support for help. AWS Premium Support told us that all the permissions required to create an AWS Glue crawler were already provided and that there were no SCPs attached to the account. After waiting around 7 working days, I was finally able to create the AWS Glue crawler without any errors.
Unfortunately, I don't have any further information on how AWS Premium Support solved the issue. For those of you who encounter similar errors, try contacting the owner of the account, because the issue is most likely out of your control. Hope this helps in the future. Thanks.
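For completeness, the same crawler setup can also be scripted with boto3; a hedged sketch in which the crawler name, database, role and S3 path are placeholders for the ones created in the steps above.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_crawler(
    Name="my-csv-crawler",
    Role="arn:aws:iam::<accountId>:role/MyGlueRole",   # the existing IAM role
    DatabaseName="my_glue_database",                   # the database created earlier
    Targets={"S3Targets": [{"Path": "s3://my-bucket/path/to/csv/"}]},
)
glue.start_crawler(Name="my-csv-crawler")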

Can I access an AWS S3 bucket with "Bucket and objects not public" access via AWS QuickSight?

There is an S3 bucket that has "Bucket and objects not public" access. Within Athena, there is a table that pulls data from the S3 bucket successfully. However, I cannot pull the data from Athena into QuickSight. My conclusion is that this is because the S3 bucket has "Bucket and objects not public" access. Is this correct?
Is it the case that Athena has some kind of special access to the S3 bucket, but Quicksight doesn't?
I'm a total beginner when it comes to AWS so I apologise for missing any information.
Thanks in advance.
To verify that you can connect Amazon QuickSight to Athena, check the following settings:
AWS resource permissions inside of Amazon QuickSight
AWS IAM policies
S3 bucket location
Query results location
If S3 bucket location and Query results location are correct, you might have issues with Amazon QuickSight resource permissions. You have to make sure that Amazon QuickSight can access the S3 buckets used by Athena:
Choose your profile name. Choose Manage QuickSight, then choose Security & permissions.
Choose Add or remove.
Locate Athena and select it to enable it (choose Connect both when prompted).
Choose the buckets that you want to access and click Select.
Choose Update.

Is it possible to unload my org's Redshift data to an S3 bucket outside of my org?

I am trying to unload my organization's Redshift data to a vendor's S3 bucket. They have provided me with an access key and secret access key, but I'm getting a 403 Access Denied error. I want to make sure it's not an issue with the credentials they sent me, so I am reaching out. Is this even possible?
Yes, that's possible. Here are the steps.
Unload the Redshift data to a bucket associated with the current account (the authorization clause is required; the role ARN is a placeholder):
unload ('select * from table')
to 's3://mybucket/table'
iam_role 'arn:aws:iam::<account-id>:role/<MyRedshiftRole>';
mybucket should then have replication configured to the vendor's S3 bucket; one way to set that up is sketched below.
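A hedged boto3 sketch of that replication setup; both buckets need versioning enabled, and the replication role, account ID, bucket names and prefix are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="mybucket",                                   # source bucket in your account
    ReplicationConfiguration={
        "Role": "arn:aws:iam::<account-id>:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-unload-output",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "table/"},          # only the unloaded prefix
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::vendor-bucket",
                    "Account": "<vendor-account-id>",
                    # Hand object ownership to the vendor's account.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    },
)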

What permissions are required to unload another Redshift cluster's query results to my AWS S3 bucket?

We have access to another AWS Redshift cluster. We want to unload one table's query results into our S3 bucket. Can we know what permissions are required to unload those results into our S3 bucket?
I tried searching on Google but didn't find any related documentation.
If you can access a Redshift cluster and can run SELECT on a table or view, then you should be able to unload that SELECT as well. No special permissions are required as far as Redshift is concerned.
However, you do need a valid IAM role or an S3 access key/secret key pair to unload the data to S3.
Another way to look at UNLOAD is that it is essentially selecting bulk data and redirecting it to a physical store (S3) for other purposes.
Here is the official UNLOAD documentation page:
https://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html
With S3 credentials:
unload ('select * from venue') to 's3://mybucket/tickit/venue_' access_key_id '<access-key-id>' secret_access_key '<secret-access-key>';
With an IAM role:
unload ('select * from venue') to 's3://mybucket/tickit/unload/venue_' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
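To make the "valid IAM role" part concrete, the role passed to UNLOAD needs roughly the following S3 permissions on the target bucket; if the cluster runs in a different account than the bucket, the bucket owner typically also has to allow that principal in the bucket policy. A hedged boto3 sketch with placeholder names:
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetBucketLocation", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::mybucket",
                "arn:aws:s3:::mybucket/tickit/*",
            ],
        }
    ],
}

# Attach the inline policy to the role used by UNLOAD.
boto3.client("iam").put_role_policy(
    RoleName="MyRedshiftRole",
    PolicyName="allow-unload-to-mybucket",
    PolicyDocument=json.dumps(policy),
)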

Elastic MapReduce and Amazon S3: Error regarding access keys

I am new to Amazon EMR and Hadoop in general. I am currently trying to set up a Pig job on an EMR cluster and to import and export data from S3. I have set up a bucket in S3 containing my data, named "datastackexchange". In an attempt to begin copying the data into Pig, I have used the following command:
ls s3://datastackexchange
And I am met with the following error message:
AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
I presume I am missing some critical steps (most likely involving setting up the access keys). As I am very new to EMR, could someone please explain what I need to do to get rid of this error so that I can use my S3 data in EMR?
Any help is greatly appreciated - thank you.
As you correctly observed, your EMR instances do not have the privileges to access the S3 data. There are many ways to specify the AWS credentials to access your S3 data, but the correct way is to create IAM role(s) for accessing your S3 data.
Configure IAM Roles for Amazon EMR explains the steps involved.
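A rough boto3 sketch of launching an EMR cluster with the default EMR IAM roles, so the instances can reach S3 without embedding access keys; the release label, instance types and region are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="pig-cluster",
    ReleaseLabel="emr-5.36.0",
    Applications=[{"Name": "Pig"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",   # instance profile used by the EC2 nodes for S3 access
    ServiceRole="EMR_DefaultRole",       # role assumed by the EMR service itself
)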