I have added a new column to an existing DynamoDB table in AWS. Now I want a one-time script to populate values in the newly created column for all existing records. I tried using a cursor, as shown below, from the PartiQL editor in AWS:
DECLARE cursor CURSOR FOR SELECT CRMCustomerGuid FROM "Customer";
OPEN cursor;
WHILE NEXT cursor DO
UPDATE "Customer"
SET "TimeToLive" = 1671860761
WHERE "CustomerGuid" = cursor.CRMCustomerGuid;
END WHILE
CLOSE cursor;
But I am getting an error saying: ValidationException: Statement wasn't well formed, can't be processed: unexpected keyword
Any help is appreciated.
DynamoDB is not a relational database, and PartiQL is not a full SQL implementation.
Here are the docs on the language. Cursors aren't in there:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.html
My own advice would be to use the plain non-SQL interface first, because its calls map directly to the things the database can actually do.
Once you understand that, you can decide where, in some contexts, PartiQL is worth leveraging.
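With the plain interface, each call maps one-to-one onto an operation the database performs. For instance, a single-item read looks like this (a sketch; the key value here is made up):

aws dynamodb get-item \
    --table-name Customer \
    --key '{"CustomerGuid": {"S": "example-guid"}}'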
From @hunterhacker's comment we know that cursors are not possible with PartiQL. Similarly, the PartiQL web editor cannot run multiple statement types in one execution, so we cannot do a SELECT followed by an UPDATE.
However, this is quite easily achieved using the CLI or SDK. Below is a simple bash script which sets a TTL value on every item in your table; execute it from any Linux/Unix shell:
# Scan out every CustomerGuid, then set TimeToLive on each item one at a time
for pk in $(aws dynamodb scan --table-name Customer --projection-expression 'CustomerGuid' --query 'Items[*].CustomerGuid.S' --output text); do
    aws dynamodb update-item \
        --table-name Customer \
        --key '{"CustomerGuid": {"S": "'$pk'"}}' \
        --update-expression "SET #ttl = :ttl" \
        --expression-attribute-names '{"#ttl":"TimeToLive"}' \
        --expression-attribute-values '{":ttl":{"N": "1671860762"}}'
done
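One follow-up worth noting: writing TimeToLive values by itself doesn't make DynamoDB expire anything. If the goal is actual TTL-based expiry, TTL also has to be enabled on the table and pointed at that attribute; a minimal sketch, assuming the attribute name from the question:

aws dynamodb update-time-to-live \
    --table-name Customer \
    --time-to-live-specification "Enabled=true, AttributeName=TimeToLive"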
I am having a problem where I have set enableUpdateCatalog=True and also updateBehavior=LOG to update my Glue table, which has one partition key. After the job runs, there are no new partitions added to my Glue catalog table, but the data in S3 is separated by the partition key I have used. How do I get the job to automatically partition my Glue catalog table?
Currently I have to manually run boto3 create_partition to create partitions on my Glue catalog table. I want my job to automatically create partitions as it discovers them in the S3 path, separated by partition keys.
Code:
additionalOptions = {
    "enableUpdateCatalog": True,
    "updateBehavior": "LOG"
}
additionalOptions["partitionKeys"] = ["partition_key0", "partition_key1"]

my_df = glueContext.write_dynamic_frame_from_catalog(
    frame=last_transform,
    database=<dst_db_name>,
    table_name=<dst_tbl_name>,
    transformation_ctx="DataSink1",
    additional_options=additionalOptions)
job.commit()
PS: I am currently using the PARQUET format.
Am I missing any rights that have to be added to my job so that it can create partitions from the job itself?
I got it to work by adding useGlueParquetWriter: 'true' to the catalog table's properties. I also added
format_options = {
'useGlueParquetWriter': True
}
in the write_dynamic_frame.from_catalog calls.
These steps got it working :)
I need to get the list of all items under the Platform key in a certain DynamoDB table called JKOwner using the AWS CLI. This is what I'm currently trying to test:
aws dynamodb get-item --table-name JKOwner --key '{"Platform": {"S": "GOVTEST"}}'
Using this CLI command, I am able to get the GOVTEST item under the Platform key. But what command should I use to get all items aside from GOVTEST, so I can store them in a variable? There are around 200+ Platform values I need to get into the list. The details of the items don't really matter as of now. Thanks in advance for your help.
If you truly need to get all items, you should use a Scan operation. Then you use a filter expression to omit items where Platform = GOVTEST.
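A minimal sketch of that scan from the CLI, using the table and attribute names from the question:

aws dynamodb scan \
    --table-name JKOwner \
    --filter-expression "Platform <> :p" \
    --expression-attribute-values '{":p": {"S": "GOVTEST"}}'

Keep in mind that a filter expression is applied after the items are read, so the scan still consumes read capacity for the whole table.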
If you want to make it really easy on yourself, you could use PartiQL for DynamoDB and do something like:
SELECT * FROM JKOwner WHERE Platform <> 'GOVTEST'
For an example of doing PartiQL on the command line, look at the PartiQL for DynamoDB tab on this page, or do this:
aws dynamodb execute-statement --statement "SELECT * FROM Music \
WHERE Artist='Acme Band' AND SongTitle='Happy Day'"
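Applied to the table in the question, that would be something like:

aws dynamodb execute-statement \
    --statement "SELECT * FROM JKOwner WHERE Platform <> 'GOVTEST'"

Since there's no equality condition on the partition key, DynamoDB runs this as a scan with a filter behind the scenes.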
Have you tried?
aws dynamodb query \
    --table-name JKOwner \
    --key-condition-expression "Platform = :v1" \
    --expression-attribute-values "{ \":v1\" : { \"S\" : \"GOVTEST\" } }"
I tried to put an item into my DynamoDB table from my CLI. My command prompt gives me no errors, but when I refresh the DynamoDB website, I don't see the item added under "Global Secondary Indexes". When I try aws dynamodb scan --table-name temp, some numbers and "<-" show up, not the item I put in.
Screenshot of my commands:
My dynamodb.json file:
{
  "name": {"S": "Suguru Chhaya"},
  "email": {"S": "suguruchhaya@gmail.com"}
}
As commented, the odd formatting is due to terminal control sequences, so the table was actually working correctly.
Also, "Global Secondary Indexes" was the wrong page to look at... I should have gone to "Overview" -> "View Items".
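As an aside, the stray characters in the scan output are typically just the CLI's output pager. A sketch of a scan that bypasses it, assuming AWS CLI v2:

aws dynamodb scan --table-name temp --output json --no-cli-pager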
So, let's say I'm trying to post this JSON via the command line (not from a file, because I'm not going to write a file for every invocation of this script) to a DynamoDB table:
{\"TeamId\":{\"S\":\"One_Space_123\"},\"TeamName\":{\"S\":\"One_Space\"},\"Environment\":{\"S\":\"cte\"},\"StartDate\":{\"S\":\"null\"},\"EndDate\":{\"S\":\"null\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someones user\"},\"EmailDistributionList\":{\"S\":\"test#test.com\"},\"RemedyGroup\":{\"S\":\"OneSpace\"},\"ScomSubscriptionId\":{\"S\":\"guid-ab22-2345\"},\"ZabbixActionId\":{\"S\":\"11\"},\"SnsTopic\":{\"M\":{\"TopicName\":{\"S\":\"ATopicName\"},\"TopicArn\":{\"S\":\"AtopicArn1234\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someones user\"}}}}
The result from the CLI is an error like this:
Unknown options: Space"},"ScomSubscriptionId":{"S":"guid-ab22-2345"},"ZabbixActionId":{"S":"11"},"SnsTopic":{"M":{"TopicName":{"S":"ATopicName"},"TopicArn":{"S":"AtopicArn1234"},"CreatedDate":{"S":"today"},"CreatedBy":{"S":"someones, user"}}}}, user"},"EmailDistributionList":{"S":"test@test.com"},"RemedyGroup":{"S":"One
As you can see, it fails on the TeamName property, which in the above example is "One Space". If I change that value to "OneSpace", it instead starts to fail on the CreatedBy property, which is populated by "someones user". But if I remove all spaces from all properties, I can suddenly pass this JSON to DynamoDB successfully.
In a working example the json looks like this:
{\"TeamId\":{\"S\":\"One_Space_123\"},\"TeamName\":{\"S\":\"One_Space\"},\"Environment\":{\"S\":\"cte\"},\"StartDate\":{\"S\":\"null\"},\"EndDate\":{\"S\":\"null\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someonesuser\"},\"EmailDistributionList\":{\"S\":\"test#test.com\"},\"RemedyGroup\":{\"S\":\"OneSpace\"},\"ScomSubscriptionId\":{\"S\":\"guid-ab22-2345\"},\"ZabbixActionId\":{\"S\":\"11\"},\"SnsTopic\":{\"M\":{\"TopicName\":{\"S\":\"ATopicName\"},\"TopicArn\":{\"S\":\"AtopicArn1234\"},\"CreatedDate\":{\"S\":\"today\"},\"CreatedBy\":{\"S\":\"someonesuser\"}}}}
I can't find any documentation that says I can't have spaces. If I read the JSON in from a file, it posts fine with the spaces, so what gives? If anyone has any advice on this matter, I'd certainly appreciate it.
For what it's worth, in PowerShell the execution currently looks like this (though I've tried various combinations of quoting the $dbTeamTableEntry variable):
$dbEntry = aws.exe dynamodb put-item --region $region --table-name $table --item "$($dbTeamTableEntry)"
As per this AWS forum thread, does anyone know how to use AWS Glue to create an AWS Athena table whose partitions contain different schemas (in this case, different subsets of columns from the table schema)?
At the moment, when I run the crawler over this data and then make a query in Athena, I get the error HIVE_PARTITION_SCHEMA_MISMATCH.
My use case is:
Partitions represent days
Files represent events
Each event is a json blob in a single s3 file
An event contains a subset of columns (dependent on the type of event)
The 'schema' of the entire table is the full set of columns for all the event types (this is correctly put together by Glue crawler)
The 'schema' of each partition is the subset of columns for the event types that occurred on that day (hence in Glue each partition potentially has a different subset of columns from the table schema)
I think this inconsistency is what causes the error in Athena.
If I were to manually write a schema I could do this fine as there would just be one table schema, and keys which are missing in the JSON file would be treated as Nulls.
Thanks in advance!
I had the same issue and solved it by configuring the crawler to update table metadata for pre-existing partitions:
It also fixed my issue!
If somebody needs to provision this crawler configuration with Terraform, here is how I did it:
resource "aws_glue_crawler" "crawler-s3-rawdata" {
database_name = "my_glue_database"
name = "my_crawler"
role = "my_iam_role.arn"
configuration = <<EOF
{
"Version": 1.0,
"CrawlerOutput": {
"Partitions": { "AddOrUpdateBehavior": "InheritFromTable" }
}
}
EOF
s3_target {
path = "s3://mybucket"
}
}
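The same crawler configuration can also be applied without Terraform; a minimal sketch using the AWS CLI, with the crawler name taken from the snippet above:

aws glue update-crawler \
    --name my_crawler \
    --configuration '{"Version":1.0,"CrawlerOutput":{"Partitions":{"AddOrUpdateBehavior":"InheritFromTable"}}}'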
This helped me. Posting the image for others in case the link is lost
Despite selecting "Update all new and existing partitions with metadata from the table" in the crawler's configuration, it still occasionally failed to set the expected parameters for all partitions (specifically, jsonPath wasn't inherited from the table's properties in my case).
As suggested in https://docs.aws.amazon.com/athena/latest/ug/updates-and-partitions.html, dropping the partition that causes the error and recreating it helped.
After dropping the problematic partitions, the Glue crawler re-created them correctly on the following run.
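For reference, a sketch of dropping such a partition from the CLI; the table name and partition column here are hypothetical, and the same ALTER TABLE statement can be run directly in the Athena query editor instead:

aws athena start-query-execution \
    --query-string "ALTER TABLE my_table DROP PARTITION (dt='2022-01-01')" \
    --query-execution-context Database=my_glue_database \
    --result-configuration OutputLocation=s3://mybucket/athena-query-results/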