I am planning to utilize catalogPartitionPredicate in one of my projects. I am unable to handle one of the scenarios. Below are the details:
Partition columns: Year, Month & Day
catalogPartitionPredicate: year>='2021' and month>='12'
If the year changes to 2022 (e.g. 2022-01-01) and I want to read data from 2021-12-01 onwards, the expression can't handle it, because month>='12' filters out the 2022 partitions (where month is '01'). I tried to concatenate the partition keys, but that didn't work.
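In effect, the filter I am after is something like the following (just a sketch of the intended logic; I am not sure whether catalogPartitionPredicate accepts an OR expression like this):
(year='2021' and month>='12') or year>='2022'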
Is there any way to implement to_date functionality or any other workaround to handle this scenario?
Could you add your Glue code?
Did you try running a Glue crawler?
Can I get multiple items by partition key in the AWS Management Console for DynamoDB without using BatchGetItem? My partition key values are abcd1 and abcd2.
You cannot do this within the DynamoDB web UI, because the Query operation it uses can only retrieve a single item collection. You can, however, achieve it with the PartiQL editor, which uses SQL-like syntax:
SELECT * FROM mytable WHERE pk IN ['abc1','abc2']
You will need to modify the statement to suit your specific needs.
I would like to know if there's an option to query with multiple partition keys from a DynamoDB table in the AWS dashboard. I was unable to find any article or similar request for the dashboard on the web. I will keep you posted if I find an answer.
Thanks in advance.
The Console doesn't support this directly, because there is no support in the underlying API. What you're looking for is the equivalent of the following SQL query:
select *
from table
where PK in ('value_1', 'value_2') /*equivalent to: PK = 'value_1' or PK = 'value_2' */
The console supports the Query and Scan operations. Query always operates on a single item collection, i.e. all items that share the same partition key, which means it can't be used for your use case.
Scan, on the other hand, is a full table scan, which optionally lets you filter the results. The console's filter has no support for this kind of OR logic, so that won't really help you either. It will, however, let you view all items, including the ones you're looking for; but as I said, what you're asking for isn't really possible.
The documentation just says that it is a query service, but it does not explicitly state whether it can perform data updates.
If Athena cannot do inserts or updates, is there any other AWS service that can, like a normal database?
Amazon Athena is, indeed, a query service -- it only allows data to be read from Amazon S3.
One exception, however, is that the results of the query are automatically written to S3. You could, therefore, use a query to generate results that could be used by something else. It's not quite updating data but it is generating data.
My previous attempts to use Athena output in another Athena query didn't work due to problems with the automatically-generated header, but there might be some workarounds available.
If you are seeking a service that can update information in S3, you could use Amazon EMR, which is basically a managed Hadoop cluster. It is very powerful and capable, and it can certainly update information in S3, but it is rather complex to learn.
Amazon Athena adds support for inserting data into a table using the results of a SELECT query or using a provided set of values
Amazon Athena now supports inserting new data to an existing table using the INSERT INTO statement.
https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-athena-adds-support-inserting-data-into-table-results-of-select-query/
https://docs.aws.amazon.com/athena/latest/ug/insert-into.html
Bucketed tables not supported
INSERT INTO is not supported on bucketed tables. For more information, see Bucketing vs Partitioning.
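For illustration, here is a minimal sketch of both forms (my_table and my_other_table are made-up names):

-- insert a provided set of values
INSERT INTO my_table (id, name)
VALUES ('1', 'alice'), ('2', 'bob');

-- insert the results of a SELECT query
INSERT INTO my_table
SELECT id, name
FROM my_other_table
WHERE year = '2021';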
AWS S3 is object storage. Both Athena and S3 Select are for queries. The only way to modify an object (file) in S3 is to retrieve it from S3, modify it, and upload it back to S3.
As of September 20, 2019 Athena also supports INSERT INTO: https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-athena-adds-support-inserting-data-into-table-results-of-select-query/
Finally, there is a solution from AWS. You can now perform CRUD (create, read, update and delete) operations with AWS Athena: the Athena Iceberg integration is now generally available. Create the table with:
TBLPROPERTIES ( 'table_type' ='ICEBERG' [, property_name=property_value])
then you can use its CRUD features.
For a quick introduction, you can watch this video (or search for "Insert / Update / Delete on S3 With Amazon Athena and Apache Iceberg | Amazon Web Services" on YouTube).
Also read the Considerations and Limitations documentation.
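For example, once a table has been created with that property, statements like the following work (a sketch with made-up table and column names):

UPDATE my_iceberg_table SET attr1 = 'new_value' WHERE id = '42';
DELETE FROM my_iceberg_table WHERE id = '42';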
Athena supports CTAS (CREATE TABLE AS SELECT) statements as of October 2018. You can specify the output location and file format, among other options.
https://docs.aws.amazon.com/athena/latest/ug/ctas.html
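A minimal sketch of a CTAS statement (the bucket, path, and table names are placeholders):

CREATE TABLE new_table
WITH (
  format = 'PARQUET',
  external_location = 's3://my-bucket/new_table/'
) AS
SELECT *
FROM existing_table
WHERE year = '2021';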
To INSERT into tables you can write additional files in the same format to the S3 path for a given table (this is somewhat of a hack), or preferably add partitions for the new data.
Like many big data systems, Athena is not capable of handling UPDATE statements.
We can use Apache Iceberg together with Athena to perform CRUD operations on S3 data within AWS itself.
The only caveat is that at table creation time we need to add the extra table property table_type = 'ICEBERG'.
E.g.:
CREATE TABLE demo (
  id string,
  attr1 string
)
LOCATION 's3://path'
TBLPROPERTIES (
  'table_type' = 'ICEBERG'
);
For more details: https://www.youtube.com/watch?v=u1v666EXCJw
I want to provide runtime values to queries in SELECT & CREATE TABLE statements. What are the ways to parameterize Athena SQL queries?
I tried the PREPARE & EXECUTE statements from Presto, but they are not working in the Athena console. Do we need an external script, such as Python, to call them?
PREPARE my_select1 FROM
SELECT * FROM NATION WHERE regionkey = ?;

EXECUTE my_select1 USING 1;
The SQL and HiveQL Reference documentation does not list PREPARE or EXECUTE as available commands.
You would need to fully construct your SELECT statement before sending it to Amazon Athena.
You have to upgrade to Athena engine version 2; this now seems to be supported as of 2021-03-12, although I can't find an official announcement:
https://docs.aws.amazon.com/athena/latest/ug/querying-with-prepared-statements.html
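Based on that documentation page, the syntax looks roughly like this (table and column names are placeholders):

PREPARE my_select FROM
SELECT * FROM my_table WHERE year = ? AND month = ?;

EXECUTE my_select USING '2021', '12';

DEALLOCATE PREPARE my_select;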
Athena does not support parameterized queries. However, you can create user-defined functions (UDFs) that you can call in the body of a query. Refer to this to learn more about UDFs.
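A UDF is invoked inline in the query, roughly like this (the UDF name, Lambda function name, and table are placeholders):

USING EXTERNAL FUNCTION my_udf(input VARCHAR)
RETURNS VARCHAR
LAMBDA 'my-lambda-function'
SELECT my_udf(some_column)
FROM my_table;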