Populate external schema table in Redshift from S3 bucket file - amazon-web-services

I am new to AWS and trying to figure out how to populate a table within an external schema residing in Amazon Redshift. I used AWS Glue to create a table from a .csv file that sits in an S3 bucket. I can query the newly created table via Amazon Athena.
Here is where I am stuck: my task is to take that data and populate a table living in a Redshift external schema. I tried creating a Job within Glue, but had no luck. Am I supposed to first create an empty destination table that mirrors the table I can query using Athena?
Thank you in advance to anyone who might be able to assist!

Redshift Spectrum and Athena both use the Glue Data Catalog for external tables. When you create a new Redshift external schema that points at your existing Glue catalog, the tables it contains will immediately be available in Redshift.
-- Create the Redshift Spectrum schema
CREATE EXTERNAL SCHEMA IF NOT EXISTS my_redshift_schema
FROM DATA CATALOG DATABASE 'my_glue_database'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/MyIAMRole'
;
-- Review the schema info
SELECT *
FROM svv_external_schemas
WHERE schemaname = 'my_redshift_schema'
;
-- Review the tables in the schema
SELECT *
FROM svv_external_tables
WHERE schemaname = 'my_redshift_schema'
;
-- Confirm that the table returns data
SELECT *
FROM my_redshift_schema.my_external_table LIMIT 10
;
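If your goal is to land the data in a native Redshift table rather than just query it through Spectrum, you can populate one directly from the external table; no Glue Job is needed. A minimal sketch, where my_local_table is a hypothetical destination name:
-- Create and fill a native Redshift table from the external table (CTAS)
CREATE TABLE my_local_table AS
SELECT *
FROM my_redshift_schema.my_external_table
;
-- Or, if the destination table already exists, append into it
INSERT INTO my_local_table
SELECT *
FROM my_redshift_schema.my_external_table
;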

Related

AWS Athena query error: Hive File Not Found:partition location does not exist

I created a table in the Glue database using a crawler job. The table was created successfully.
However, when I try to access that table in the Athena query editor, it gives me the error below when I select data from the table:
Query:
select * from DB1.data_tbl;
Output:
Hive File Not Found: Partition location does not exist
I haven't found where the partition location is defined.
Please assist.
Athena, by default, can read only data in S3. It will not read your PostgreSQL databases. To connect to anything other than S3, you have to set up and use Amazon Athena Federated Query.
Alternatively, set up a Glue Job to copy all the data from your PostgreSQL database into S3, and then use Athena to query the data from S3.
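Once a federated source is registered as a data catalog, it can be queried like any other catalog. A hedged sketch, where my_postgres_catalog is a placeholder for whatever name you give the data source when you register the connector:
-- Query the federated PostgreSQL source through its registered catalog
SELECT *
FROM my_postgres_catalog.public.data_tbl
LIMIT 10
;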

How do you connect to an external schema/table on Redshift Spectrum through AWS Quicksight?

I have spun up a Redshift cluster and added my S3 external schema by running
CREATE EXTERNAL SCHEMA s3 FROM DATA CATALOG
DATABASE '<aws_glue_db>'
IAM_ROLE '<redshift_s3_glue_iam_role_arn>';
to access the AWS Glue Data Catalog. Everything is fine on Redshift: I can query data and all is well. In QuickSight, however, the table is recognized but comes back empty.
Do I have to move the data into Redshift? If so, would the only reason to use Redshift be to process Parquet files?
You should be able to select from external tables in Redshift. I think the role you're using is missing access to S3:
https://aws.amazon.com/premiumsupport/knowledge-center/redshift-cross-account-glue-s3/
In the end, I just wrote a custom SQL expression to select the relevant fields.
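For reference, that workaround amounts to handing QuickSight a query instead of the bare table. A minimal sketch, entered as custom SQL in QuickSight's dataset dialog; the column and table names are placeholders, and s3 is the external schema created above:
-- Custom SQL pasted into the QuickSight dataset editor
SELECT col_a, col_b
FROM s3.my_table
;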

While running an AWS Athena query, it says "Zero Records Returned"

SELECT * FROM "sampledb"."parquetcheck" limit 10;
I am trying to use a Parquet file in S3 and created a table in AWS Athena; it was created perfectly.
However, when I run the select query above, it says "Zero Records Returned," although my Parquet file in S3 has data.
I have created a partition too, and the IAM role has full access to Athena.
If your specified column names are correct, then you may need to load the partitions by running MSCK REPAIR TABLE EnterYourTableName;. This will add the new partitions to the Glue Catalog.
If any of the above fails, you can create a temporary Glue Crawler to crawl your table and then validate the metadata in Athena by clicking the three dots next to the table name and selecting Generate Create Table DDL. You can then compare any differences in the DDL.
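A short sketch of the repair-and-verify sequence, run in the sampledb database context from the question:
-- Register any partitions that exist in S3 but are missing from the catalog
MSCK REPAIR TABLE parquetcheck;
-- Confirm which partitions Athena now knows about
SHOW PARTITIONS parquetcheck;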

AWS Glue Crawler is not creating tables in schema

I am trying to use an AWS Glue crawler to create tables in Athena.
The source I am pulling from is a PostgreSQL server. The crawler is able to parse the tables, create the metadata, and show the tables and columns in the Glue Data Catalog, but the tables are not added in Athena despite the fact that I have added the target database from Athena.
Not sure why this is happening.
Also, if I choose a CSV source from S3, it is able to create a table in Athena, with _csv as a suffix.
Any help?
Athena doesn't recognize my Postgres tables added by Glue either. My guess is that Athena is meant for querying data stored on S3, so it doesn't work for database queries.
Also, to be able to query your CSV files on S3, the files need to be under a folder crawled by Glue. If you just crawl a single file with Glue, Athena will return 0 records from the query.
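In DDL terms, that means the table's LOCATION should point at the containing folder (the S3 prefix), never at an individual object. A hedged sketch, with placeholder bucket, columns, and delimiter:
-- LOCATION must be the folder, not s3://my-bucket/data/file.csv
CREATE EXTERNAL TABLE my_csv_table (
  col1 string,
  col2 int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/data/'
;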

Query DynamoDB Data with EMR

I am looking for a way to query AWS DynamoDB data with SQL syntax using Amazon EMR.
I have my DynamoDB table set up and ready. How can I import/query the data using Hue? The table in DynamoDB is around 8 GB.
Please follow the steps below.
Hive to query non-live DynamoDB data:
1) Export the data from DynamoDB to Hive (see the sketch after this list).
Refer to the section "Exporting Data from DynamoDB" in the EMR Hive Commands link below.
2) Use Amazon EMR to query the data stored in DynamoDB.
Refer to the section "Querying Data in DynamoDB" in the EMR Hive Commands link below.
3) Use Hue to run the queries (i.e. run the Hive queries from the Hue workbench).
EMR Hive Commands
Hue Supported
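A hedged sketch of the export in step 1, assuming a Hive table already mapped to the DynamoDB table (as in the sample code further down); the bucket path is a placeholder:
-- Copy the DynamoDB-backed table to S3 so later queries
-- do not consume DynamoDB read capacity
INSERT OVERWRITE DIRECTORY 's3://my-bucket/dynamodb-export/'
SELECT * FROM hivetable1;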
Hive to query live DynamoDB:
1) Create a Hive table that maps to the DynamoDB table:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/EMR_Interactive_Hive.html
2) Once you create the Hive table and run queries against it, it will read the live DynamoDB table to get the data.
Disadvantage: it consumes DynamoDB read or write units for each execution. In other words, each query execution will cost you.
Sample code:
CREATE EXTERNAL TABLE hivetable1 (col1 string, col2 bigint, col3 array<string>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "dynamodbtable1",
  "dynamodb.column.mapping" = "col1:name,col2:year,col3:holidays"
);