Is it possible to directly access AWS Glue Data Catalog of Account B via the Athena interface of Account A?
I was just trying to resolve this same issue in my own setup, but then stumbled across this bummer (the last bullet under Cross-Account Access Limitations on this page):
Cross-account access to the Data Catalog is not supported when using an AWS Glue crawler, Amazon Athena, or Amazon Redshift.
So it sounds like, even with the cross-account access that is possible today, catalog objects won't naturally carry through those services (including Athena, which is what was asked about).
That said, I was able to set up cross-account access to the AWS Glue Data Catalog in a way that let me use Account A to pull all the relevant information about Data Catalog objects from Account B. I can update my answer with how far I got, if you want. A hacky method that might solve this question would be to set up the cross-account access that is possible today, then run a recurring Lambda function that replicates all the relevant Data Catalog metadata from Account B into Account A, so users in Account A can view it within Account A's AWS Glue Data Catalog. I'm not sure whether Athena specifically would work in that setup, as I know it also requires PutObject access when it queries data in S3 (which could be solved via the appropriate S3 bucket policies, but that would be another cross-account permission to manage).
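For what it's worth, here is a rough sketch of that recurring-Lambda replication idea. All names (the role in Account B, the database names) are hypothetical placeholders, the target database is assumed to already exist in Account A, and this only copies table definitions, not partitions or permissions:

```python
import boto3

SOURCE_ROLE_ARN = "arn:aws:iam::<account-b-id>:role/GlueCatalogReadRole"  # hypothetical role in Account B
SOURCE_DATABASE = "analytics_db"          # hypothetical database in Account B
TARGET_DATABASE = "analytics_db_replica"  # must already exist in Account A

def lambda_handler(event, context):
    # Assume the cross-account role in Account B to read its Data Catalog
    sts = boto3.client("sts")
    creds = sts.assume_role(RoleArn=SOURCE_ROLE_ARN,
                            RoleSessionName="glue-catalog-sync")["Credentials"]
    source_glue = boto3.client(
        "glue",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    target_glue = boto3.client("glue")  # Account A (the Lambda's own account)

    # Copy every table definition from Account B's database into Account A's replica database
    paginator = source_glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=SOURCE_DATABASE):
        for table in page["TableList"]:
            # Keep only the fields that are valid in a TableInput
            table_input = {
                k: v for k, v in table.items()
                if k in ("Name", "Description", "StorageDescriptor",
                         "PartitionKeys", "TableType", "Parameters")
            }
            try:
                target_glue.update_table(DatabaseName=TARGET_DATABASE, TableInput=table_input)
            except target_glue.exceptions.EntityNotFoundException:
                target_glue.create_table(DatabaseName=TARGET_DATABASE, TableInput=table_input)
```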
Let me know whether you'd like to see those details on what cross-account stuff I was able to get working.
AWS has started supporting this using Lambda; please see the link below:
https://aws.amazon.com/blogs/big-data/cross-account-aws-glue-data-catalog-access-with-amazon-athena/
Since May 2021 it is possible to register a data catalog from a different account in Amazon Athena; see the User Guide.
Athena Query Engine v2 is required though and there are some other limitations.
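As a minimal sketch, registering the remote catalog can also be done with boto3 (the catalog name and the Account B ID below are placeholders, and the Account B side still has to grant access to its Glue Data Catalog):

```python
import boto3

athena = boto3.client("athena")

# Register Account B's Glue Data Catalog in Account A's Athena.
# Requires Athena engine v2 and a catalog resource policy / Lake Formation
# grant on the Account B side.
athena.create_data_catalog(
    Name="account_b_catalog",                     # placeholder catalog name
    Type="GLUE",
    Description="Cross-account Glue Data Catalog from Account B",
    Parameters={"catalog-id": "111122223333"},    # placeholder Account B account ID
)
```

Queries can then reference the new catalog by name, e.g. `SELECT * FROM "account_b_catalog"."some_db"."some_table"`.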
Currently, we use AWS IAM User permanent credentials to transfer customers' data from our company's internal AWS S3 buckets to customers' Google BigQuery tables following BigQuery Data Transfer Service documentation.
Using permanent credentials poses security risks for the data stored in AWS S3.
We would like to use AWS IAM Role temporary credentials, which requires support for a session token on the BigQuery side in order to get authorized on the AWS side.
Is there a way that the BigQuery Data Transfer Service can use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
We considered the Omni framework (https://cloud.google.com/bigquery/docs/omni-aws-cross-cloud-transfer) to transfer data from S3 to BQ; however, we ran into several concerns/limitations:
The Omni framework targets the data-analysis use case rather than data transfer from external services, so we are concerned that its design may have drawbacks for data transfer at high scale.
The Omni framework currently supports only the AWS us-east-1 region (we require support at least in us-west-2 and eu-central-1, plus the corresponding Google regions). This is not backward compatible with our current customers' setup for transferring data from our internal S3 buckets to their BQ tables.
Our current customers will need to sign up for the Omni service to properly migrate from the current transfer solution we use.
We considered a workaround of exporting data from S3 via staging in GCS (i.e. S3 -> GCS -> BQ), but this would also require a lot of effort from both the customers' and our company's sides to migrate to the new solution.
Is there a way that the BigQuery Data Transfer Service can use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
No, unfortunately.
The official Google BigQuery Data Transfer Service documentation mentions only AWS access keys throughout:
The access key ID and secret access key are used to access the Amazon S3 data on your behalf. As a best practice, create a unique access key ID and secret access key specifically for Amazon S3 transfers to give minimal access to the BigQuery Data Transfer Service. For information on managing your access keys, see the AWS general reference documentation.
The irony of the Google documentation is that while it refers to best practices and links to the official AWS docs, it doesn't actually endorse best practices and ignores what AWS itself says:
We recommend that you use temporary access keys over long term access keys, as mentioned in the previous section.
Important
Unless there is no other option, we strongly recommend that you don't create long-term access keys for your (root) user. If a malicious user gains access to your (root) user access keys, they can completely take over your account.
You have a few options:
hook into both sides manually (i.e. link up the various SDKs and/or APIs; a sketch follows this list)
find an alternative BigQuery-compatible service, which does as such
accept the risk of long-term access keys.
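For the first option, here is a minimal sketch of what "hooking into both sides manually" could look like: assume an IAM role for short-lived AWS credentials, read the object from S3, and push it into BigQuery with the Google SDK. All the ARNs, bucket, key, and table names are hypothetical placeholders:

```python
import io
import boto3
from google.cloud import bigquery

# Hypothetical names -- adjust to your own roles, buckets, and datasets.
AWS_ROLE_ARN = "arn:aws:iam::111122223333:role/S3ExportRole"
S3_BUCKET = "company-export-bucket"
S3_KEY = "exports/customer_data.csv"
BQ_TABLE = "my-gcp-project.customer_dataset.customer_data"

# 1) Get short-lived AWS credentials instead of a permanent access key.
sts = boto3.client("sts")
creds = sts.assume_role(RoleArn=AWS_ROLE_ARN, RoleSessionName="bq-transfer")["Credentials"]
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# 2) Read the object out of S3 ...
body = s3.get_object(Bucket=S3_BUCKET, Key=S3_KEY)["Body"].read()

# 3) ... and load it straight into BigQuery with the Google client library.
bq = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
load_job = bq.load_table_from_file(io.BytesIO(body), BQ_TABLE, job_config=job_config)
load_job.result()  # wait for the load job to finish
```

This trades the managed Data Transfer Service for code you have to run and monitor yourself, but it removes the long-term access keys entirely.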
In conclusion, Google is at fault here for not following security best practices, and you, as a consumer, will have to bear the risk.
From reading the AWS manuals, it is not clear to me whether one can enforce access to S3 data for an IAM user via an AWS view only.
I am going to try it next week, but I am doing some up-front research first to save time.
No, you cannot do this via the AWS Glue Data Catalog alone. You need Athena or Redshift Spectrum for such an approach.
I am a beginner with the AWS services AppSync and DynamoDB.
I want to ask whether it is possible to use AWS AppSync in one account to access DynamoDB in another account.
I followed this tutorial: AppSync Tutorial, but when I choose the data source, I don't know how to select the DynamoDB table in my other AWS account.
Could anyone give me some suggestions about the most simple method to implement cross-account access with AppSync and DynamoDB? Thanks in advance!
I think the best way is to replicate the DynamoDB table from account B into account A using DynamoDB Streams and Lambda, and then use AppSync in account A to access the replicated table in account A.
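A minimal sketch of the replication Lambda (running in account B, triggered by the table's stream) might look like the following. It assumes the stream is configured with the NEW_IMAGE view type and that account A exposes a role the Lambda can assume; the role ARN and table name are hypothetical:

```python
import boto3

TARGET_ROLE_ARN = "arn:aws:iam::<account-a-id>:role/DynamoReplicaWriterRole"  # hypothetical role in account A
TARGET_TABLE = "ReplicatedTable"  # hypothetical replica table in account A

def lambda_handler(event, context):
    # Assume a role in account A so this Lambda (in account B) can write to account A's table.
    creds = boto3.client("sts").assume_role(
        RoleArn=TARGET_ROLE_ARN, RoleSessionName="ddb-replication"
    )["Credentials"]
    target_ddb = boto3.client(
        "dynamodb",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

    # Replay each stream record against the replica table.
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            target_ddb.put_item(TableName=TARGET_TABLE, Item=record["dynamodb"]["NewImage"])
        elif record["eventName"] == "REMOVE":
            target_ddb.delete_item(TableName=TARGET_TABLE, Key=record["dynamodb"]["Keys"])
```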
Reference:
Copy DynamoDB table data cross account real time
https://dzone.com/articles/real-time-cross-region-dynamodb-replication-using
Can I use Informatica EDC instead of the Glue Data Catalog in AWS?
Is AWS Athena tightly coupled with the Glue catalog?
Did you check here: https://docs.aws.amazon.com/athena/latest/ug/glue-upgrade.html?
It looks like you need to perform the AWS Glue upgrade and also add policies so that Athena can pull catalog information. The FAQ is available here: https://docs.aws.amazon.com/athena/latest/ug/glue-faq.html. I have not worked on this scenario yet, but I am working on Glue with Redshift.
In the FAQ, it's mentioned as follows:
Why do I need to add AWS Glue policies to Athena users?
Before you upgrade, Athena manages the data catalog, so Athena actions must be allowed for your users to perform queries. After you upgrade to the AWS Glue Data Catalog, Athena actions no longer apply to accessing the AWS Glue Data Catalog, so AWS Glue actions must be allowed for your users. Remember, the managed policy for Athena has already been updated to allow the required AWS Glue actions, so no action is required if you use the managed policy.
What happens if I don’t allow AWS Glue policies for Athena users?
If you upgrade to the AWS Glue Data Catalog and don't update a user's customer-managed or inline IAM policies, Athena queries fail because the user won't be allowed to perform actions in AWS Glue. For the specific actions to allow, see Step 2 - Update Customer-Managed/Inline Policies Associated with Athena Users.
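As a rough illustration of what "allowing AWS Glue actions" means in practice, a sketch like the one below adds a read-only Glue inline policy to an Athena user with boto3. The user name and policy name are placeholders, and the exact action list should be taken from the "Step 2" document referenced in the FAQ:

```python
import json
import boto3

iam = boto3.client("iam")

# Read-only Glue catalog actions that Athena queries typically need after the upgrade;
# confirm the exact list against the "Step 2" policy guidance linked in the FAQ.
glue_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition",
            ],
            "Resource": "*",
        }
    ],
}

# "athena-analyst" is a placeholder user name.
iam.put_user_policy(
    UserName="athena-analyst",
    PolicyName="AthenaGlueCatalogRead",
    PolicyDocument=json.dumps(glue_read_policy),
)
```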
I have set up a reporting stack using data stored in S3, schema mapped by AWS Glue, queried by Amazon Athena, and visualized in Amazon QuickSight.
I gave QuickSight permissions to access the three aws-athena-query-results buckets I have (see below)
However, when I try to build reports based on my Athena table, it throws an error. I went back in and explicitly gave it access to the S3 bucket that holds my raw data, and now I have visualizations.
My question is whether or not this is how it should need to be set up. My assumption was that Athena has access to S3, and QuickSight has access to Athena and its results, so it shouldn't need direct access to each S3 bucket storing raw data. It seems this would generate a lot of overhead: each time there is a new S3 bucket to report on, you need to go grant Athena and QuickSight access to it.
From reading this page: Troubleshoot Athena Insufficient Permissions, it's unclear which buckets are required.
Yes, at the moment, QuickSight needs to be granted explicit access to both Athena and the underlying buckets that Athena accesses. I got this answer from a discussion with Amazon, so unfortunately I don't have a source to link.
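The documented way to grant this is through the QuickSight console (Manage QuickSight > Security & permissions > Amazon S3), but as a sketch of the permissions the QuickSight service role ends up needing, something like the following inline policy covers the raw-data bucket plus the Athena query-results bucket. The bucket names are placeholders, and the role name shown is the default QuickSight service role in many accounts; verify the actual role name in your IAM console before trying anything like this:

```python
import json
import boto3

iam = boto3.client("iam")

# Placeholders -- replace with your own buckets; verify the QuickSight role name in IAM.
RAW_DATA_BUCKET = "my-raw-data-bucket"
ATHENA_RESULTS_BUCKET = "aws-athena-query-results-111122223333-us-east-1"
QUICKSIGHT_ROLE = "aws-quicksight-service-role-v0"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # List both the raw-data bucket and the query-results bucket
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{RAW_DATA_BUCKET}",
                f"arn:aws:s3:::{ATHENA_RESULTS_BUCKET}",
            ],
        },
        {
            # Read the raw data and the Athena result files
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [
                f"arn:aws:s3:::{RAW_DATA_BUCKET}/*",
                f"arn:aws:s3:::{ATHENA_RESULTS_BUCKET}/*",
            ],
        },
        {
            # Athena writes query results on QuickSight's behalf, so PutObject is needed there
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": [f"arn:aws:s3:::{ATHENA_RESULTS_BUCKET}/*"],
        },
    ],
}

iam.put_role_policy(
    RoleName=QUICKSIGHT_ROLE,
    PolicyName="QuickSightAthenaS3Access",
    PolicyDocument=json.dumps(policy),
)
```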