Can we execute a SQL query inside a DMS task so that it fetches only the required data rather than the whole database?
If that's not possible, which AWS service can be used to fetch query-based data from an on-prem data source into S3?
You can use filters and/or exclude fields: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html
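As a concrete illustration, here's a minimal sketch of that kind of table mapping applied through boto3: a selection rule with a source filter (so DMS only pulls matching rows) plus a transformation rule that drops a column. The schema, table, and column names (sales, orders, order_date, internal_notes) are made up, and the ARNs are placeholders.

```
import json
import boto3  # AWS SDK for Python; assumes credentials and region are configured

# Hypothetical schema/table/column names, for illustration only.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-recent-orders",
            "object-locator": {"schema-name": "sales", "table-name": "orders"},
            "rule-action": "include",
            "filters": [
                {
                    "filter-type": "source",
                    "column-name": "order_date",
                    "filter-conditions": [
                        {"filter-operator": "gte", "value": "2020-01-01"}
                    ],
                }
            ],
        },
        {
            # Exclude a column instead of migrating the full row.
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "drop-notes-column",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "sales",
                "table-name": "orders",
                "column-name": "internal_notes",
            },
            "rule-action": "remove-column",
        },
    ]
}

dms = boto3.client("dms")
dms.create_replication_task(
    ReplicationTaskIdentifier="orders-filtered-load",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
)
```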
Contact me if you have problems.
As an alternative to DMS, you can use AWS Glue, pulling the data from the on-prem DB into a PySpark DataFrame and writing it to either S3 or AWS RDS. This works very well; the only downside is the cost.
This approach supports both a table name and a SQL query as input for the extraction, as sketched below.
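A rough sketch of that pattern in a Glue/PySpark job; the JDBC URL, credentials, query, and bucket are all hypothetical (in practice you'd pull the password from Secrets Manager):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("onprem-extract").getOrCreate()

# Option 1: pull a whole table with "dbtable"; option 2 (shown): push a SQL
# query down to the source so only the needed rows/columns come across.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://onprem-host:3306/sales")  # hypothetical host
    .option("query", "SELECT id, total FROM orders WHERE order_date >= '2020-01-01'")
    .option("user", "etl_user")                            # hypothetical user
    .option("password", "change-me")                       # use Secrets Manager in practice
    .load()
)

# Land the result in S3 as Parquet so Athena/Glue can pick it up.
# (s3:// paths work inside a Glue job; plain Spark would use s3a://.)
df.write.mode("overwrite").parquet("s3://my-bucket/exports/orders/")
```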
I am trying to build AWS QuickSight reports using AWS Athena, which provides the specific views for those reports. However, I seem to be able to select only a single table when creating the Glue job, even though I can select all the tables I need when crawling the entire DynamoDB database.
What is the simplest route to get a complete extract of all tables that is queryable in Athena?
I don't want to connect the reports directly to DynamoDB, since it's a production database, and I want some separation to avoid performance degradation from a poor query, etc.
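For what it's worth, the per-table extract described here can also be scripted: a Glue job can read a DynamoDB table and land it in S3 as Parquet, one job (or loop iteration) per table. A minimal sketch, with the table and bucket names made up:

```
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Hypothetical table and bucket names; repeat (or loop) for each table.
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": "orders",
        "dynamodb.throughput.read.percent": "0.25",  # throttle reads on a prod table
    },
)

glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/dynamo-extract/orders/"},
    format="parquet",
)
```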
I have live data in DocumentDB and provisioning data in Aurora (PostgreSQL), and I would like to have both datasets in QuickSight for BI. The DocumentDB data uses internal IDs, while the SQL side holds the mapping to meaningful names. Is there a way to achieve such a thing?
I have a Glue job that extracts the data from DocumentDB and outputs it as JSON in S3. Regarding Aurora, QuickSight integrates with it natively.
Thanks
I managed to achieve this by adding another data source to the same dataset.
AWS then enables you to do a join between the two data sources.
I need to access some tables that live in AWS Glue, which I am using as a metastore. I want to know whether Glue provides a JDBC endpoint to connect to, the way Hive does.
I understand that it is possible to read data into AWS Glue from other databases such as MySQL and Oracle using JDBC, but my requirement is the opposite: I have to read from AWS Glue over JDBC. Please help if this is possible, as I could not find any reference for it.
To access the data in the Glue catalog, follow these steps:
Run the crawler and update the table in the Glue catalog.
To access these tables through a JDBC or ODBC endpoint, you need Athena.
Download the driver from this link.
Read the docs for building the URL for your region here.
Also go through this documentation for the additional properties.
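To make the Athena route concrete, here is one way to connect from Python through that JDBC driver. It is a sketch that assumes the Simba Athena driver jar has been downloaded and that jaydebeapi (a Python-to-JDBC bridge, not part of the steps above) is installed; the credentials, results bucket, and table names are placeholders.

```
import jaydebeapi  # pip install jaydebeapi; bridges Python to JDBC drivers

conn = jaydebeapi.connect(
    "com.simba.athena.jdbc.Driver",
    "jdbc:awsathena://AwsRegion=us-east-1;",  # set your region
    {
        "User": "YOUR_ACCESS_KEY",                        # placeholder
        "Password": "YOUR_SECRET_KEY",                    # placeholder
        "S3OutputLocation": "s3://my-athena-results/",    # hypothetical results bucket
    },
    jars="AthenaJDBC42.jar",  # path to the downloaded driver jar
)

curs = conn.cursor()
curs.execute("SELECT * FROM my_glue_db.my_table LIMIT 10")  # hypothetical names
print(curs.fetchall())
```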
Hope it helps
Is it possible to query an RDS database using Athena? Or do I somehow have to get my data out of RDS and copy it into an S3 bucket so that Athena can query it from there? If that is the case, how can I find out which tables are in my RDS database? Is there a way to explore all the schemas of a database with Glue?
A feature was created for exactly this reason last year: Athena Federated Query.
It lets you query a large number of data sources beyond just S3.
If you're using either MySQL or PostgreSQL in RDS, you can make use of the JDBC connector, with additional instructions here.
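For example, once the connector is deployed and registered as an Athena data source (called rds_mysql below, a made-up name), RDS tables can be queried like any other; a sketch using boto3:

```
import boto3

athena = boto3.client("athena")

# "rds_mysql" is a hypothetical data-source (catalog) name for the deployed
# JDBC connector; the database and table names are made up too.
resp = athena.start_query_execution(
    QueryString='SELECT * FROM "rds_mysql"."sales"."orders" LIMIT 10',
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])
```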
I'm attempting to use AWS Glue to ETL a MySQL database in RDS to S3 so that I can work with the data in services like SageMaker or Athena. At this point I don't care about transformations; this is a prototype, and I simply want to dump the DB to S3 to start testing the various toolchains.
I've set up a Glue database and tested the connection to RDS successfully.
I am using the AWS-provided Glue IAM service role.
My S3 bucket has the correct aws-glue-* prefix.
I created a crawler using the Glue database, the AWSGlue service role, and the S3 bucket above, with the options:
Schema updates in the data store: Update the table definition in the data catalog.
Object deletion in the data store: Delete tables and partitions from the data catalog.
When I run the crawler, it completes in about 60 seconds, but it does not create any tables in the database.
I've tried adding the Admin policy to the Glue service role to rule out IAM access issues, and the result is the same.
Also, the CloudWatch logs are empty. Log groups are created for the test connection and the crawler, but neither contains any entries.
I'm not sure how to troubleshoot this further; information on AWS Glue seems pretty sparse.
Figured it out: I had a syntax error in my "include path" for the crawler. Make sure the connection is the data source (RDS in this case) and that the include path names the data target you want, e.g. mydatabase/% (I had forgotten the /%).
You can substitute the percent (%) character for a schema or table. For databases that support schemas, type MyDatabase/MySchema/% to match all tables in MySchema within MyDatabase. Oracle and MySQL don't support a schema in the path; instead, type MyDatabase/%. For information about which JDBC data stores support schemas, see Cataloging Tables with a Crawler.
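If you create the crawler through the API instead of the console, the same trailing /% applies to the JDBC target path; a minimal sketch with hypothetical connection, role, and catalog database names:

```
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="rds-mysql-crawler",
    Role="AWSGlueServiceRole-default",  # hypothetical role name
    DatabaseName="my_glue_db",          # Glue catalog database to write to
    Targets={
        "JdbcTargets": [
            {
                "ConnectionName": "my-rds-connection",  # the tested RDS connection
                "Path": "mydatabase/%",                 # note the trailing /%
            }
        ]
    },
)
glue.start_crawler(Name="rds-mysql-crawler")
```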
Ryan Fisher is correct in the sense that it's an error, though I wouldn't categorize it as a syntax error. When I ran into this, it was because the include path didn't include the default schema that SQL Server lovingly provides.
I had this: database_name/table_name
It needed to be: database_name/dbo/table_name