While trying to create a CTE in Athena, I ran into the issue below.
I created a table in a Glue database using a crawler job, and the table was created successfully.
However, when I try to select data from that table in the Athena query editor, I get the error below:
Query:
select * from DB1.data_tbl;
Output:
Hive File Not Found: Partition location does not exist
I haven't found where the partition location is defined.
Please assist.
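This error usually means a partition registered in the Glue catalog points at an S3 prefix that has no objects under it. A minimal sketch for finding such partitions, assuming the partition records are shaped like the output of Glue's get_partitions() (each with Values and StorageDescriptor.Location). The existence check is passed in as a function so the logic can be tried without AWS; the bucket and paths in any example are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    prefix = parsed.path.lstrip("/")
    # Normalise to a trailing slash so 'hash=1' does not also match 'hash=10'.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


def missing_partition_locations(partitions, prefix_has_objects):
    """Return (values, location) for every partition whose S3 prefix is empty.

    partitions: dicts like {"Values": [...], "StorageDescriptor": {"Location": "s3://..."}}
    prefix_has_objects: callable (bucket, prefix) -> bool
    """
    missing = []
    for part in partitions:
        location = part["StorageDescriptor"]["Location"]
        bucket, prefix = split_s3_uri(location)
        if not prefix_has_objects(bucket, prefix):
            missing.append((part["Values"], location))
    return missing
```

Against a real account, the partitions could come from glue.get_partitions(DatabaseName="DB1", TableName="data_tbl")["Partitions"] (paginated for large tables), and prefix_has_objects could wrap s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)["KeyCount"] > 0. Any partition reported can then be dropped or re-pointed.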
By default, Athena can read only data in S3. It will not read your PostgreSQL databases. To connect to anything other than S3, you have to set up and use Amazon Athena Federated Query.
Alternatively, set up a Glue job to copy all data from your PostgreSQL database into S3, and then use Athena to query the data from S3.
When I create a table in Athena with CTAS syntax (example below), the table is registered in Glue in such a way that when I read it on an EMR cluster with (Py)Spark, every partition is read twice, whereas reading it with Athena works fine. When I create a table through Spark with the write.saveAsTable syntax, it is registered in Glue properly, and the table is read correctly by both Spark and Athena.
I didn't find anything in the Spark/Athena/Glue documentation about this. After some trial and error, I found that there is a Glue table property that Spark sets and Athena does not: spark.sql.sources.provider='parquet'. When I set this manually on tables created via Athena, Spark reads them properly. But this feels like an ugly workaround, and I would like to understand what's happening in the background. I couldn't find any documentation about this table property either.
Athena create table syntax:
CREATE TABLE {database}.{table}
WITH (format = 'Parquet',
parquet_compression = 'SNAPPY',
external_location = '{s3path}')
AS SELECT
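If you do keep the workaround, it can at least be applied consistently. In Athena itself, ALTER TABLE {database}.{table} SET TBLPROPERTIES ('spark.sql.sources.provider'='parquet') should do it. Programmatically, Glue's update_table() replaces the whole table definition, so the existing definition from get_table() has to be copied into a TableInput first, minus read-only fields such as DatabaseName and CreateTime. The sketch below does that copy; the allowlist of TableInput keys is my reading of the Glue API and worth checking against the boto3 docs. It only automates the workaround and does not explain why Spark reads each partition twice.

```python
import copy

# Keys accepted by Glue's UpdateTable TableInput (assumed list; get_table() also
# returns read-only fields like DatabaseName, CreateTime and CatalogId, which
# update_table() rejects, so they are filtered out).
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def with_spark_provider(table, provider="parquet"):
    """Return a TableInput copied from a get_table() result, with the Spark provider set."""
    # Deep-copy so the caller's table dict (and its Parameters) is left untouched.
    table_input = {k: copy.deepcopy(v) for k, v in table.items() if k in TABLE_INPUT_KEYS}
    params = table_input.setdefault("Parameters", {})
    params["spark.sql.sources.provider"] = provider
    return table_input
```

Usage would look like table = glue.get_table(DatabaseName=db, Name=name)["Table"] followed by glue.update_table(DatabaseName=db, TableInput=with_spark_provider(table)), where db and name are your own database and table.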
While running MSCK REPAIR TABLE tablename, the Athena query editor returns the error "tables not in metastore".
But the table exists, and I can query it.
I have data stored in S3 as Parquet files, partitioned with hash as the partition key (partitions look like hash=0, hash=100, and so on), and I am running a Glue crawler to create a table in Athena.
I know "partitions not in metastore" is a common issue and there are solutions for it, but I can't find a solution for "tables not in metastore".
Has anyone solved a similar issue, or does anyone have an idea what could be wrong?
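If MSCK REPAIR TABLE keeps failing, one way around it is to register the Hive-style partitions explicitly with ALTER TABLE ... ADD IF NOT EXISTS PARTITION, which Athena accepts with several PARTITION ... LOCATION clauses in one statement. A small sketch that builds such a statement, assuming the layout <base>/hash=<value>/; the table name and bucket are hypothetical, and values are inserted without escaping, so this is only meant for simple values like the integers here.

```python
def add_partition_ddl(table, base_location, key, values):
    """Build one Athena ALTER TABLE ... ADD PARTITION statement for Hive-style paths.

    Each value v is mapped to the location <base_location>/<key>=<v>/.
    Values are not escaped: only use this with simple values such as integers.
    """
    base = base_location.rstrip("/")
    clauses = [
        f"PARTITION ({key} = '{v}') LOCATION '{base}/{key}={v}/'" for v in values
    ]
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(clauses)


if __name__ == "__main__":
    print(add_partition_ddl("my_db.my_table", "s3://my-bucket/data/", "hash", [0, 100]))
```

The printed statement can be pasted into the Athena query editor, with the correct database selected.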
Does the IAM role used to execute the query have permission to read that S3 bucket? I had this error when running a query from Lambda using a role that did not have ListBucket permission on the bucket in question.
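A common slip is attaching s3:ListBucket to the object ARN instead of the bucket ARN: ListBucket is a bucket-level action, while GetObject is an object-level one. A minimal sketch of a read policy with both, built as a Python dict; the bucket name is hypothetical, and a role running Athena queries will also need access to its query-results location, which is not covered here.

```python
import json


def s3_read_policy(bucket):
    """Return an IAM policy document (as a dict) allowing a role to list and read `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: the resource is the bucket ARN itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level action: the resource is every object in the bucket.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_read_policy("my-data-bucket"), indent=2))
```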
I solved this by selecting the correct database from the dropdown menu on the left of the query editor. I had run the previous setup query on sampledb and then tried to run a new query, but the new tab changed the database to default. Changing default to sampledb fixed the issue!
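When queries are run from code rather than the console, the equivalent of that dropdown is the QueryExecutionContext parameter of Athena's start_query_execution(). A small sketch that builds the request so the database is always stated explicitly; the output location is a hypothetical bucket.

```python
def query_request(sql, database, output_location):
    """Build keyword arguments for boto3's Athena start_query_execution().

    QueryExecutionContext names the database that unqualified table names
    (and statements like MSCK REPAIR TABLE) are resolved against.
    """
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

It would be used as boto3.client("athena").start_query_execution(**query_request("MSCK REPAIR TABLE my_table", "sampledb", "s3://my-results-bucket/")).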
I have a table in the Glue catalog that was created by a Glue crawler after parsing JSON files in S3. When I query this table using Athena, I get the error below. A few things about this situation:
The JSON files are in S3
The Glue crawler created the tables in the Glue catalog using the JSON SerDe
The table contains nested data types like array and struct
I get the same error when querying only the regular fields (excluding the nested ones)
I am able to query the same Glue catalog table using Hive on EMR. I tried with and without the nested data types, and it works fine.
Amazon Athena experienced a transient error while executing this query. Waiting a couple of minutes and retrying the query may solve the problem. If you continue to see the issue, please contact customer support for further assistance. We apologize for the inconvenience. You will not be charged for this query.
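The message itself gives no cause. One cheap thing to rule out is the file layout: Athena's JSON SerDes expect each JSON record on its own line, so pretty-printed multi-line documents break them. Since Hive on EMR reads the same table, this may well not be the culprit here, but a sketch like the one below (plain Python, run on a downloaded sample file) settles it quickly.

```python
import json


def bad_json_lines(text):
    """Return 1-based numbers of lines that are not a complete JSON object on their own."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped by this check
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)  # fragment of a multi-line record, or invalid JSON
            continue
        if not isinstance(record, dict):
            bad.append(lineno)  # valid JSON, but not an object (e.g. a bare array)
    return bad
```

An empty result means every line is a self-contained JSON object; a run of consecutive bad lines usually points to pretty-printed input.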
I'm getting an error when running an Athena query against a Glue table created from an RDS database:
HIVE_UNKNOWN_ERROR: Unable to create input format
The tables are created using a crawler, and they show up correctly in the Glue interface.
However, they do not show up in the Athena interface under the database. It says: "The selected database has no tables".
I do not see this behavior when using a database created from an S3 file. Maybe this is related to the error. Does anybody have an idea?
I had the same problem. This is the answer that I have got from AWS Support:
I understand that you set up a Glue crawler to crawl your RDS PostgreSQL database, but the tables are not visible in Athena.
The Athena service is designed to query tables that point to S3 as the data source. As of today, it cannot read data from non-S3 resources.
So, unfortunately, this is not possible at the moment.
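To tell which catalog tables Athena can actually read, you can check where each one points. A minimal sketch, assuming (as I understand it) that JDBC-crawled tables such as ones from RDS PostgreSQL record a non-S3 location (something like database.schema.table) rather than an s3:// URI; the table dicts are assumed to be shaped like Glue's get_table()/get_tables() output.

```python
def is_s3_backed(table):
    """Return True if a Glue table's storage location is an s3:// URI.

    Tables crawled over JDBC (e.g. RDS PostgreSQL) are assumed to store a
    non-S3 location, so they return False here.
    """
    location = table.get("StorageDescriptor", {}).get("Location", "")
    return location.startswith("s3://")


def non_s3_tables(tables):
    """Names of the tables in a get_tables() result that do not point at S3."""
    return [t["Name"] for t in tables if not is_s3_backed(t)]
```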