S3 inventory query in Athena

I have set up the S3 Inventory service for an S3 bucket and imported the data into Athena in CSV format. Is there a way to query the number of objects in a particular S3 directory in Athena?

Yes, you can try this to get the number of objects:
SELECT COUNT("$path") FROM TABLE_NAME
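To limit the count to one directory, the query needs a filter on the object key. The following is a minimal sketch, assuming the inventory table exposes the object key in a column named `key` (the name used in the S3 Inventory table examples; adjust it if your table differs). The helper name `build_prefix_count_query` is hypothetical. It escapes `%` and `_` in the prefix so they are not treated as LIKE wildcards.

```python
def build_prefix_count_query(table: str, prefix: str, key_column: str = "key") -> str:
    """Build an Athena query that counts inventory rows under one S3 prefix.

    Assumes `key_column` holds the object key (an assumption about the table).
    """
    # Treat the prefix as a directory: make sure it ends with a slash.
    if not prefix.endswith("/"):
        prefix += "/"
    # Escape the escape character first, then the LIKE wildcards,
    # then double single quotes so the SQL string literal stays valid.
    escaped = (
        prefix.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
        .replace("'", "''")
    )
    return (
        f"SELECT COUNT(*) AS object_count FROM {table} "
        f"WHERE {key_column} LIKE '{escaped}%' ESCAPE '\\'"
    )


query = build_prefix_count_query("TABLE_NAME", "logs/2023")
print(query)
```

The resulting string can be pasted into the Athena query editor. If the inventory lists zero-byte "folder" placeholder objects, they are counted as well.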

Related

AWS Athena query error: Hive File Not Found:partition location does not exist

I created a table in a Glue database using a crawler job. The table was created successfully.
However, when I try to select data from that table in the Athena query editor, I get the error below:
Query:
select * from DB1.data_tbl;
Output:
Hive File Not Found: Partition location does not exist
I haven't found where the partition location is defined.
Please assist.

Athena, by default, can read only data in S3. It will not read your PostgreSQL databases. To connect to anything other than S3, you have to set up and use Amazon Athena Federated Query.
Alternatively, set up a Glue job to copy all data from your PostgreSQL database into S3, and then use Athena to query the data from S3.

Why does the AWS Glue crawler not crawl the data from a data lake S3 bucket into a single schema?

In Redshift Spectrum, querying the data throws an error because the table has duplicate column names.
The Glue crawler does not crawl the data into a single schema even though I configured the crawler with "Create a single schema for each S3 path".

Possible to copy data from one s3 bucket to another via Hive?

How can I query one bucket via Hive and copy the results to another bucket in S3?
I have a DDL set up to run Avro queries, but I want to transfer the subset of results from my filter to a new bucket/location in S3.

You can just use a CREATE TABLE AS SELECT statement from one catalog in Presto to another.
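As a rough sketch of that suggestion, the helper below builds a Presto CREATE TABLE AS SELECT statement. The catalog, schema, and table names (`hive_src`, `hive_dest`, `raw`, `exports`, `events`) and the filter are hypothetical placeholders, and it assumes the destination schema already points at the target bucket.

```python
def build_ctas(dest_table: str, source_table: str, where: str) -> str:
    """Build a Presto CTAS statement that copies a filtered subset of rows."""
    return (
        f"CREATE TABLE {dest_table} AS "
        f"SELECT * FROM {source_table} WHERE {where}"
    )


# Hypothetical names: source and destination tables live in different catalogs.
ctas = build_ctas(
    dest_table="hive_dest.exports.events_filtered",
    source_table="hive_src.raw.events",
    where="event_type = 'purchase'",
)
print(ctas)
```

With the Hive connector, the new table's files are written under the destination schema's location, so the target bucket is chosen when that schema is created rather than in the CTAS itself.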

AWS Glue Crawler is not creating tables in schema

I am trying to use an AWS Glue crawler to create tables in Athena.
The source I am pulling from is a PostgreSQL server. The crawler is able to parse the tables, create metadata, and show the tables and columns in the Glue Data Catalog, but the tables are not added in Athena even though I have added the target database from Athena.
I am not sure why this is happening.
Also, if I choose a CSV source from S3, it is able to create a table in Athena with _csv as a suffix.
Any help?

Athena doesn't recognize my Postgres tables added by Glue either. My guess is that Athena is meant for querying data stored on S3, so it doesn't work for database queries.
Also, to be able to query your CSV files on S3, the files need to be under a folder crawled by Glue. If you just crawl a single file with Glue, Athena will return 0 records from the query.

Can Amazon Athena be used to query a dynamic schema?

I have a service running that populates my S3 bucket with compressed log files, but the log files do not have a fixed schema, and Athena expects a fixed schema (which I wrote when creating the table).
So my question is as in the title: is there any way to query a dynamic schema? If not, is there another service like Athena that can do this?

Amazon Athena can't do that by itself, but you can configure an AWS Glue crawler to automatically infer the schema of your JSON files. The crawler can run on a schedule, so your files will be indexed automatically even if the schema changes. Athena will use the Glue data catalog if AWS Glue is available in the region you're running Athena in.
See Cataloging Tables with a Crawler in the AWS Glue docs for the details on how to set that up.
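A scheduled crawler of that kind could be described roughly as follows. This is a sketch only: the crawler name, role ARN, database, and bucket path are made-up placeholders, and the hourly schedule and schema change policy are assumptions, not settings taken from the answer. The helper just builds the keyword arguments for boto3's `create_crawler` call, so it runs without AWS access.

```python
def build_crawler_config(name, role_arn, database, s3_path,
                         schedule="cron(0 * * * ? *)"):
    """Return keyword arguments for the Glue create_crawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # Crawl everything under the given S3 prefix.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue cron syntax; this default runs at minute 0 of every hour.
        "Schedule": schedule,
        # Update the catalog table when the inferred schema changes,
        # and only log (do not delete) when objects disappear.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


# Placeholder values for illustration only.
config = build_crawler_config(
    name="logs-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="logs_db",
    s3_path="s3://my-log-bucket/logs/",
)
print(config["Schedule"])

# To create the crawler for real (requires boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_crawler(**config)
```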
