I have live data in DocumentDB and provisioning data in Aurora (PostgreSQL).
I would like to have both datasets in QuickSight for BI.
The DocumentDB data uses internal IDs, while the SQL side holds the mapping to meaningful names.
Is there a way to achieve this?
I have a Glue job that extracts the data from DocumentDB and outputs it as JSON in S3.
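For context, a minimal sketch of what the job does (connection details, credentials, and the S3 path are placeholders):

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the collection from DocumentDB into a DynamicFrame
docdb_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="documentdb",
    connection_options={
        "uri": "mongodb://docdb-cluster:27017",  # placeholder endpoint
        "database": "live",                      # placeholder
        "collection": "events",                  # placeholder
        "username": "user",                      # placeholder
        "password": "password",                  # placeholder
        "ssl": "true",
        "ssl.domain_match": "false",
    },
)

# Dump it to S3 as JSON
glue_context.write_dynamic_frame.from_options(
    frame=docdb_frame,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/docdb-export/"},
    format="json",
)
```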
Regarding Aurora, QuickSight integrates with it natively.
Thanks
I have managed to achieve this by adding another data source to the same dataset.
QuickSight then enables you to do a join between the data sources.
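If you need to automate it, the same cross-source join can be expressed through the QuickSight CreateDataSet API. A rough sketch with boto3; the account ID, data source ARNs, and all table/column names are placeholders, and the exact OnClause syntax is described in the API reference:

```python
import boto3

quicksight = boto3.client("quicksight")

quicksight.create_data_set(
    AwsAccountId="123456789012",  # placeholder
    DataSetId="docdb-aurora-joined",
    Name="DocumentDB + Aurora",
    ImportMode="SPICE",
    PhysicalTableMap={
        # The DocumentDB export, exposed through an Athena/S3 data source
        "docdb": {
            "CustomSql": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:123456789012:datasource/docdb-export",
                "Name": "docdb",
                "SqlQuery": "SELECT internal_id, value FROM docdb_export",
                "Columns": [
                    {"Name": "internal_id", "Type": "STRING"},
                    {"Name": "value", "Type": "DECIMAL"},
                ],
            }
        },
        # The Aurora table holding the ID-to-name mapping
        "aurora": {
            "RelationalTable": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:123456789012:datasource/aurora",
                "Schema": "public",
                "Name": "id_names",
                "InputColumns": [
                    {"Name": "id", "Type": "STRING"},
                    {"Name": "display_name", "Type": "STRING"},
                ],
            }
        },
    },
    LogicalTableMap={
        "left": {"Alias": "docdb", "Source": {"PhysicalTableId": "docdb"}},
        "right": {"Alias": "aurora", "Source": {"PhysicalTableId": "aurora"}},
        "joined": {
            "Alias": "joined",
            "Source": {
                "JoinInstruction": {
                    "LeftOperand": "left",
                    "RightOperand": "right",
                    "Type": "INNER",
                    "OnClause": "internal_id = id",
                }
            },
        },
    },
)
```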
Related
What is the best or recommended way to visualize data from a DynamoDB table? We need to create a simple dashboard with graphs connected to a data table in our AWS account.
We would prefer to use one of the AWS services to keep everything in one place. I have read about QuickSight, but it would be great to hear about some hands-on experience.
You can use QuickSight to visualize your table by using the Athena-DynamoDB connector. This allows you to use DynamoDB as a table source in Athena, which can then act as a source for QuickSight.
https://docs.aws.amazon.com/athena/latest/ug/connectors-dynamodb.html
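Once the connector is deployed and registered as a data catalog (called "dynamodb" below; table and bucket names are placeholders), a quick sanity check with boto3 might look like this:

```python
import time
import boto3

athena = boto3.client("athena")

# Query the DynamoDB table through the connector's data catalog
query = athena.start_query_execution(
    QueryString='SELECT * FROM "dynamodb"."default"."my_table" LIMIT 10',
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Fetch the first page of results
if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)
    print(rows["ResultSet"]["Rows"])
```

QuickSight can then use the same catalog and table through an Athena data source.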
I have dozens of tables in my data source (RDS) and I am ingesting all of this data into Redshift through AWS Glue. I currently create the tables in Redshift manually (through SQL) and then proceed with the crawler and AWS Glue to fill the Redshift tables with the data flowing from RDS.
Is there a way to create these target tables in Redshift automatically (based on the tables I have in RDS, since they will initially be exact copies), rather than creating each one manually with SQL in the Redshift query editor?
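For reference, the load step of each job currently looks roughly like this (catalog, connection, and bucket names are placeholders), with the CREATE TABLE done by hand beforehand:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# One of the crawled RDS tables, read from the Glue Data Catalog
source = glue_context.create_dynamic_frame.from_catalog(
    database="rds_catalog_db",
    table_name="my_schema_my_table",
)

# Load into Redshift (the target table is what I currently create manually)
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "public.my_table", "database": "dev"},
    redshift_tmp_dir="s3://my-temp-bucket/redshift/",
)
```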
Thanks in advance,
I have a DocumentDB as the data source.
I am running an AWS Glue job that pulls all the data from a certain table and then inserts it into a Redshift cluster.
Is it possible to avoid adding duplicate data?
I have seen that AWS Glue supports job bookmarks, but this does not seem to work with DocumentDB as the data source.
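The workaround I'm considering is loading into a staging table and merging with the connector's postactions option; a sketch, with all connection details and table/column names as placeholders:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# The DynamicFrame pulled from DocumentDB, as in the existing job
frame = glue_context.create_dynamic_frame.from_options(
    connection_type="documentdb",
    connection_options={
        "uri": "mongodb://docdb-cluster:27017",  # placeholder
        "database": "live",                      # placeholder
        "collection": "events",                  # placeholder
        "username": "user",                      # placeholder
        "password": "password",                  # placeholder
    },
)

# Merge logic run after the load: delete rows that already exist in the
# target, copy everything over from staging, then drop the staging table.
post_actions = (
    "DELETE FROM public.events USING public.events_staging "
    "WHERE public.events.id = public.events_staging.id; "
    "INSERT INTO public.events SELECT * FROM public.events_staging; "
    "DROP TABLE public.events_staging;"
)

# Load into the staging table first; postactions then performs the merge
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=frame,
    catalog_connection="redshift-connection",
    connection_options={
        "dbtable": "public.events_staging",
        "database": "dev",
        "postactions": post_actions,
    },
    redshift_tmp_dir="s3://my-temp-bucket/redshift/",
)
```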
Thanks.
Can we execute a SQL query inside a DMS task so that it fetches only the required data and not the whole database?
If that is not possible, which AWS service can be used to fetch query-based data from an on-premises data source into Amazon S3?
You can use filters and/or exclude fields: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html
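For example, a selection rule with a source filter would look like this when creating the task with boto3 (all ARNs, schema, table, and column names are placeholders):

```python
import json
import boto3

dms = boto3.client("dms")

# Only rows with order_date >= 2020-01-01 from sales.orders get migrated
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "orders-since-2020",
            "object-locator": {"schema-name": "sales", "table-name": "orders"},
            "rule-action": "include",
            "filters": [
                {
                    "filter-type": "source",
                    "column-name": "order_date",
                    "filter-conditions": [
                        {"filter-operator": "gte", "value": "2020-01-01"}
                    ],
                }
            ],
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="orders-to-s3",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:src",   # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:tgt",   # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:inst",  # placeholder
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
)
```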
Contact me if you have problems.
As an alternative to DMS, you can use AWS Glue, reading the data from the on-premises DB into a PySpark DataFrame and writing it to either S3 or AWS RDS. This works very well; the only downside is the cost.
This solution supports both a table and a SQL query as the input for data extraction, as in the sketch below.
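A minimal sketch of both variants (JDBC URL, credentials, and table/bucket names are placeholders; the JDBC driver has to be available to the job):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

jdbc_opts = {
    "url": "jdbc:postgresql://onprem-host:5432/mydb",  # placeholder
    "user": "etl_user",                                # placeholder
    "password": "secret",                              # placeholder
    "driver": "org.postgresql.Driver",
}

# Option 1: pull a whole table
df_table = (
    spark.read.format("jdbc")
    .options(dbtable="sales.orders", **jdbc_opts)
    .load()
)

# Option 2: push down a SQL query so only the needed rows/columns come over
df_query = (
    spark.read.format("jdbc")
    .options(query="SELECT id, total FROM sales.orders WHERE total > 100", **jdbc_opts)
    .load()
)

# Write to S3 (or to a JDBC target for RDS)
df_query.write.mode("overwrite").parquet("s3://my-bucket/orders/")
```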
I would like to use a few Aurora (MySQL) tables as sources when creating external tables in AWS Athena, because those tables are mutable and get updated often. I see that Hive and Presto support this via org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler. Is there an equivalent feature in AWS Athena?
Amazon Athena is dedicated to running interactive ad hoc SQL queries against data in Amazon S3; the feature you mention isn't supported yet.
https://docs.aws.amazon.com/athena/latest/ug/supported-format.html
You have two solutions in this situation:
1) You can use the Glue or EMR services.
https://docs.aws.amazon.com/glue/latest/dg/console-connections.html
2) You can export the data from Aurora to S3 and then query it with Athena, as sketched below.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html
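A sketch of the export step, assuming Aurora MySQL with an IAM role that allows writing to the bucket (host, credentials, and names are placeholders):

```python
import pymysql

conn = pymysql.connect(
    host="aurora-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",  # placeholder
    user="admin",        # placeholder
    password="secret",   # placeholder
    database="mydb",     # placeholder
)

# Aurora MySQL's SELECT ... INTO OUTFILE S3 writes the result set to S3,
# where Athena can then query it as an external table.
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT * FROM my_table
        INTO OUTFILE S3 's3://my-bucket/athena/my_table'
        FORMAT CSV HEADER
        """
    )
conn.close()
```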