I created an external table in Redshift Spectrum, but it does not show up in DBeaver. Is this a permissions issue?
I have seen here https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-redshift-spectrum-adds-support-for-querying-open-source-apache-hudi-and-delta-lake/ that Redshift Spectrum has support for Hudi and Delta.
We're currently using Iceberg as our table format, and we need to expose some tables to the BI team as external tables in Redshift Spectrum.
I have created an external schema and an external table, but when I query the table, Redshift Spectrum returns more rows than it should.
We upsert data based on a primary key, and with the setup I tried, Redshift Spectrum returns every record for the same id instead of only the latest version of it (like a partition by id). Has anyone successfully integrated Iceberg with AWS Redshift Spectrum?
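If the external table really does expose every version of each record, one possible workaround is to filter down to the newest row per id in the query itself. The sketch below assumes each row carries a column that orders versions; the table name `events` and the columns `value` and `updated_at` are hypothetical, and SQLite stands in for Redshift only so the query can be run locally. It does not make Spectrum Iceberg-aware, it does not account for rows deleted in Iceberg, and ties on `updated_at` would return more than one row per id.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for Redshift here.
LATEST_PER_ID = """
SELECT t.id, t.value, t.updated_at
FROM events AS t
JOIN (
    SELECT id, MAX(updated_at) AS max_updated_at
    FROM events
    GROUP BY id
) AS latest
  ON latest.id = t.id
 AND latest.max_updated_at = t.updated_at
ORDER BY t.id
"""


def latest_rows(rows):
    """Load (id, value, updated_at) rows and return only the newest row per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, value TEXT, updated_at TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(LATEST_PER_ID).fetchall()
    conn.close()
    return result


sample = [
    (1, "a-old", "2021-01-01"),
    (1, "a-new", "2021-02-01"),
    (2, "b-only", "2021-01-15"),
]
print(latest_rows(sample))
```

The same join-on-max pattern could be wrapped in a view over the external table so the BI team only sees the filtered result.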
I have a Delta table in S3, and I have defined an external table in Athena for it. After creating the Athena table and generating manifests, I load the partitions using MSCK REPAIR TABLE. All the partition columns are in snake_case, but I still get:
Partitions not in metastore.
Any idea what I am missing here?
The IAM user or role doesn't have a policy that allows the glue:BatchCreatePartition action. You have to allow glue:BatchCreatePartition in the IAM policy and it should work.
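As a rough illustration of the policy change described above, here is a minimal sketch that builds such a policy statement as JSON. The function name is made up for this example, and the `"*"` resource is only a placeholder assumption; in practice you would probably scope it to your own catalog, database, and table ARNs.

```python
import json


def glue_partition_policy(resources=("*",)):
    """Build an IAM policy document that allows glue:BatchCreatePartition.

    The default "*" resource is a placeholder, not a recommendation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:BatchCreatePartition"],
                "Resource": list(resources),
            }
        ],
    }


print(json.dumps(glue_partition_policy(), indent=2))
```

The printed JSON can be attached to the IAM user or role that runs MSCK REPAIR TABLE.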
Resolved the issue. I was listing the partition columns in the wrong order when creating the table.
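For anyone hitting the same thing: as far as I understand it, MSCK REPAIR TABLE expects the Hive-style `column=value` directories in S3 to be nested in the same order as the partition columns declared on the table. A small sketch to check that against an object key follows; the key and column names are hypothetical, and the helper functions are made up for this example.

```python
def partition_columns_in_path(s3_key):
    """Return partition column names in the order they appear in a Hive-style key."""
    return [part.split("=", 1)[0] for part in s3_key.split("/") if "=" in part]


def order_matches(declared_columns, s3_key):
    """True if the declared partition columns match the directory nesting order."""
    return list(declared_columns) == partition_columns_in_path(s3_key)


# Hypothetical object key under the table location.
key = "my_table/event_date=2021-01-01/country_code=US/part-0000.parquet"

print(order_matches(["event_date", "country_code"], key))
print(order_matches(["country_code", "event_date"], key))
```

The first call matches the layout and the second does not, which is the kind of mismatch that caused the error here.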
I am new to AWS and trying to figure out how to populate a table within an external schema in Amazon Redshift. I used AWS Glue to create a table from a .csv file that sits in an S3 bucket. I can query the newly created table via Amazon Athena.
My task is to take the data and populate a table living in a Redshift external schema. I tried creating a job within Glue, but had no luck.
This is where I am stuck. Am I supposed to first create an empty destination table that mirrors the table that I can query using Athena?
Thank you to anyone in advance who might be able to assist!!!
Redshift Spectrum and Athena both use the Glue Data Catalog for external tables. When you create a new Redshift external schema that points at your existing Glue database, the tables it contains are immediately available in Redshift, so there is no need to create or populate a separate destination table.
-- Create the Redshift Spectrum schema
CREATE EXTERNAL SCHEMA IF NOT EXISTS my_redshift_schema
FROM DATA CATALOG DATABASE 'my_glue_database'
IAM_ROLE 'arn:aws:iam:::role/MyIAMRole'
;
-- Review the schema info
SELECT *
FROM svv_external_schemas
WHERE schemaname = 'my_redshift_schema'
;
-- Review the tables in the schema
SELECT *
FROM svv_external_tables
WHERE schemaname = 'my_redshift_schema'
;
-- Confirm that the table returns data
SELECT *
FROM my_redshift_schema.my_external_table LIMIT 10
;
I have spun up a Redshift cluster and added my S3 external schema by running
CREATE EXTERNAL SCHEMA s3 FROM DATA CATALOG
DATABASE '<aws_glue_db>'
IAM_ROLE '<redshift_s3_glue_iam_role_arn>';
to access the AWS Glue Data Catalog. Everything is fine on Redshift: I can query the data without problems. In QuickSight, however, the table is recognized but appears empty.
Do I have to move the data into Redshift? If so, would the only reason to use Redshift be to process Parquet files?
You should be able to select from external tables in Redshift. I think the role you're using is missing access to S3:
https://aws.amazon.com/premiumsupport/knowledge-center/redshift-cross-account-glue-s3/
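To make the S3 part concrete, here is a minimal sketch of a read-only policy for the bucket that holds the external table's data. The bucket name `my-data-bucket` and the function name are hypothetical, and the exact set of actions your setup needs may differ.

```python
import json


def s3_read_policy(bucket):
    """Build an IAM policy document granting list and read access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # ListBucket applies to the bucket itself.
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # GetObject applies to the objects inside the bucket.
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# "my-data-bucket" is a placeholder bucket name.
print(json.dumps(s3_read_policy("my-data-bucket"), indent=2))
```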
In the end, I just wrote a custom SQL expression to select the relevant fields.
I have a table defined in the Glue Data Catalog that I can query using Athena. Since I want to use some of the data in that table together with other Redshift tables, can I access the table defined in the Glue Data Catalog from Redshift?
What would the create external table query look like to reference the table definition in the Glue catalog?
From the AWS documentation (Creating External Schemas):
create external schema athena_schema from data catalog
database 'sampledb'
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole'
region 'us-east-2';
This creates a schema athena_schema that points to the sampledb database in Athena / Glue.
You need to grant appropriate access to the IAM role you specify: the Redshift cluster needs to be able to assume the role, and the role needs access to Glue.
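As a sketch of those two pieces, the snippet below builds a trust policy that lets the Redshift service assume the role, plus a permissions policy with a few Glue read actions. The function names are made up for this example, the Glue action list is an assumption and is probably not exhaustive for every setup, and the `"*"` resource is only a placeholder.

```python
import json


def redshift_trust_policy():
    """Trust policy that allows the Redshift service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def glue_read_policy(resources=("*",)):
    """Permissions policy with a few Glue catalog read actions (not exhaustive)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": list(resources),  # placeholder scope
            }
        ],
    }


print(json.dumps(redshift_trust_policy(), indent=2))
print(json.dumps(glue_read_policy(), indent=2))
```

The role will also need read access to the S3 location where the table data lives.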