I created a sort of data warehouse so that SQL queries can be run against DocumentDB. I did this using Athena and the DocumentDB connector.
(https://docs.aws.amazon.com/athena/latest/ug/athena-prebuilt-data-connectors-docdb.html)
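For context, once the connector is registered as an Athena data source, the queries look roughly like the sketch below; the catalog, database, and collection names are hypothetical placeholders, not anything from my actual setup.

```sql
-- Hypothetical names: "docdb_catalog" is the data source registered for the
-- DocumentDB connector, "appdb" a DocumentDB database, "orders" a collection.
SELECT order_id, status, total
FROM "docdb_catalog"."appdb"."orders"
WHERE status = 'SHIPPED'
LIMIT 10;
```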
However, I'd also like to set very fine-grained permissions for the user who runs these queries in Athena, down to the table and column level, and I'm trying to work out whether I can do that with Lake Formation while still using the DocumentDB connector.
It doesn't seem like I can specify that DocumentDB connector as a data source in Lake Formation.
If this is not possible, does anyone know of another approach that would let me specify more detailed permissions for Athena queries?
What is the best or recommended way to visualize data from a DynamoDB table? We need to create a simple dashboard with graphs connected to a data table in the AWS account.
We would prefer to use one of the AWS services to keep everything in one place. I have read about QuickSight, but it would be great to hear about others' experience with it.
You can use QuickSight to visualize your table by using the Athena-DynamoDB connector. This allows you to use DynamoDB as a table source in Athena, which can then act as a source for QuickSight.
https://docs.aws.amazon.com/athena/latest/ug/connectors-dynamodb.html
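As a rough sketch of what the Athena query (and hence the custom SQL behind a QuickSight dataset) could look like: here "dynamo_catalog" stands for whatever name you give the connector when registering it as a data source, and "orders" for an existing DynamoDB table; both names are hypothetical.

```sql
-- Hypothetical names: "dynamo_catalog" is the registered DynamoDB connector
-- data source; the connector exposes DynamoDB tables under the "default" schema.
SELECT customer_id, order_date, total
FROM "dynamo_catalog"."default"."orders"
LIMIT 100;
```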
I've recently been looking into the Apache Iceberg table format to reduce Athena query times on a Glue table with a large number of partitions; the additional features would be a bonus (transactions, row-level updates/deletes, time-travel queries, etc.). I've successfully built the tables and confirmed that they address the issue at hand, but I'd now like to share the table with another AWS account. We've done this previously using Lake Formation cross-account grants and also the method described here, but both approaches raise errors in the other account when trying to query the shared table. I've also tried using a bucket policy and registering a duplicate Glue table in the other account, which doesn't throw an error, but no rows are found when querying.
Is this currently possible to do? I'm aware that I could achieve this by providing role access into the account with the Iceberg table, but that complicates interaction with the table from other services in the alternate account. Any ideas appreciated.
Edit: When querying the Lake Formation table I see 'Generic internal error - access denied'; it's documented that Iceberg tables don't work with Lake Formation, so this is expected. When querying the table shared via the cross-account data catalog, I see 'HIVE_METASTORE_ERROR: Table storage descriptor is missing SerDe info' when running a SELECT query, and 'FAILED: SemanticException Unable to fetch table XXXXXXXXX. Unable to get table: java.lang.NullPointerException' when running SHOW CREATE TABLE or DESCRIBE. I can successfully run SHOW TBLPROPERTIES.
As of now, Lake Formation integration for Apache Iceberg tables is not supported:
Lake Formation – Integration with AWS Lake Formation is not supported.
https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html
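For reference, the tables in question are the kind created directly from Athena with DDL along the following lines; the database, bucket, and column names here are hypothetical, not taken from the question.

```sql
-- Hypothetical example of an Athena-managed Iceberg table; the table_type
-- property is what marks the table as Iceberg in the Glue Data Catalog.
CREATE TABLE my_db.events_iceberg (
  event_id   bigint,
  event_time timestamp,
  payload    string)
PARTITIONED BY (day(event_time))
LOCATION 's3://my-data-bucket/iceberg/events/'
TBLPROPERTIES ('table_type' = 'ICEBERG');
```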
I am trying to build AWS QuickSight reports using AWS Athena, which builds the specific views for those reports. However, I seem to be able to select only a single table when creating the Glue job, even though I can select all the tables I need for the crawler of the entire DynamoDB database.
What is the simplest route to getting a complete extract of all tables that is queryable in Athena?
I don't want to connect the reports directly to DynamoDB, as it is a production database, and I want some separation to avoid performance degradation from a poor query, etc.
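For illustration only, one route consistent with the connector approach in the earlier answer is a CTAS statement per table, materializing a copy into S3 that Athena (and QuickSight) can query without touching the production DynamoDB table on every report load. This is a sketch under the assumption that the DynamoDB connector is registered as a data source and that CTAS from a federated catalog is available in your Athena engine version; all names are hypothetical.

```sql
-- Hypothetical sketch: copy one DynamoDB table (exposed through the Athena
-- DynamoDB connector as "dynamo_catalog"."default"."orders") into an
-- S3-backed Parquet table that reports query instead of the production DB.
CREATE TABLE reporting_db.orders_extract
WITH (
  format = 'PARQUET',
  external_location = 's3://my-reporting-bucket/extracts/orders/'
) AS
SELECT *
FROM "dynamo_catalog"."default"."orders";
```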
I would like to create an Athena database, including tables and views, via Terraform. I have already searched a lot and found some posts, e.g. here: Create AWS Athena view programmatically
I know that I can use Terraform provisioners to execute AWS CLI commands to create these resources, for example like this: AWS Athena Create table view with SQL
But I don't want to do that. I want to create everything (as far as possible) with Terraform so that I don't have to worry about lifecycle etc.
As far as I understand, an Athena database can be a Glue database, depending on the data source you choose. If I choose AwsDataCatalog (Glue) as the data source in Athena, it should not matter whether I create an Athena database or a Glue database with Terraform, correct?
In Glue I can also create tables, but not views. Do the Glue tables automatically correspond to Athena tables? How can I create Athena views? I would like to create everything with SQL DDL, just as you can in the AWS Web Console. How does this work via Terraform? If this functionality is not available, what is the best way to go? I am grateful for any tips and help!
Athena uses the Glue Data Catalog to store metadata about databases, tables, and views. All Athena tables are Glue tables. However, not all Glue tables work with Athena: you can create tables in Glue that won't be visible in Athena, and you can create tables that will be visible but won't work (for example, they cause runtime errors when you query them).
Athena also uses the Glue Data Catalog for views, but the format is very specific to Athena, unlike regular tables, which can be made interoperable with, for example, Spark.
In an answer to the question you link to, I explain the anatomy of an Athena view in detail. I have created views with CloudFormation using that information, so it can be done with Terraform too. Unless you write code, you will unfortunately have to jump through all the hoops and repeat most of the information as Presto metadata.
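To make that concrete: the view that the Glue catalog entry ultimately represents corresponds to an ordinary CREATE VIEW statement like the hypothetical one below. Created through Athena DDL it is a single statement, but when you build the catalog entry directly from Terraform or CloudFormation, the defining SELECT effectively has to be repeated inside the Presto view metadata described in that answer.

```sql
-- Hypothetical view: one statement when run as Athena DDL, but when the Glue
-- entry is created directly, this SELECT must also be embedded in the
-- Presto view metadata stored alongside it.
CREATE OR REPLACE VIEW my_database.active_users AS
SELECT user_id, user_name, created_at
FROM my_database.users
WHERE active = true;
```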
Is it possible to query things in an RDS database using Athena? Or do I somehow have to get my data out of RDS and copy it into an S3 bucket so that Athena can query it from there? If that is the case, how can I find out which tables are in my RDS database? Is there a way to explore all the schemas of a database with Glue?
A feature was released for exactly this purpose last year: Federated Query.
By using it you can query a large number of data sources beyond just S3.
If you're using either MySQL or PostgreSQL in RDS, you can make use of the JDBC connector, with additional instructions here.
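Once the connector is deployed and registered as an Athena data source, the RDS tables can be queried directly from Athena. A minimal sketch, with the catalog name, schema, and table all hypothetical:

```sql
-- Hypothetical names: "rds_postgres" is the data source registered for the
-- JDBC connector; "public"."customers" is a schema/table in the RDS database.
SELECT customer_id, email
FROM "rds_postgres"."public"."customers"
LIMIT 10;
```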