Set up DynamoDB table replication with boto3 as soon as possible - amazon-web-services

I'm creating DynamoDB tables with boto3/Python using the client.create_table() method, which is an asynchronous action. However, you cannot create a global table (replication) until after the table is in ACTIVE status. We would like replication to happen as soon as possible; however, we need to return to the user right away and can't wait for the tables to become ready before running create_global_table() on them.
My first thought is to create an SQS message referencing the new table, which a Lambda then processes: it monitors the table's status until it is ACTIVE, only then calls create_global_table(), and only then deletes the message.
I suppose I could also create a Lambda that runs every minute and scans for tables that don't have global tables enabled, then runs create_global_table() on them. This seems inefficient, since these tables are not created very often.
Can anyone think of a better way of doing this?

If the client doesn't need to wait until the table is created, just set up an SQS queue with a Lambda function behind it.
When a client requests a new table to be created, you send a message to the queue and respond to the client with something along the lines of an HTTP 202 status code.
You can then write the Lambda function that listens to the queue so that it creates the table, waits for it to become ACTIVE, and then creates the global table.
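For illustration, a minimal sketch of such a queue-triggered Lambda handler, assuming the message carries the table name and replica regions (both field names are hypothetical). It uses boto3's table_exists waiter, which polls DescribeTable until the table reaches ACTIVE:

```python
import json
import boto3

dynamodb = boto3.client("dynamodb")

def handler(event, context):
    for record in event["Records"]:
        body = json.loads(record["body"])
        table_name = body["table_name"]       # hypothetical message field
        regions = body["replica_regions"]     # hypothetical message field

        # Block until the table is ACTIVE (polls DescribeTable under the hood).
        waiter = dynamodb.get_waiter("table_exists")
        waiter.wait(
            TableName=table_name,
            WaiterConfig={"Delay": 5, "MaxAttempts": 60},
        )

        # Legacy (2017.11.29) global tables API; it expects identically named
        # tables to already exist in each replica region.
        dynamodb.create_global_table(
            GlobalTableName=table_name,
            ReplicationGroup=[{"RegionName": r} for r in regions],
        )
```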
Another option would be a Step Function that creates the table, loops until it becomes ACTIVE, and afterwards creates the global table. That way you'd have no long-running Lambdas. You could trigger the Step Function from the initial Lambda that accepts the request.

Related

Why do I receive two events after a update on dynamodb?

I have configured a DynamoDB stream to trigger my Lambda. When I update an item on the DynamoDB table, my Lambda is triggered twice with two different events. The NewImage and OldImage are the same in both events; they differ only in eventID, ApproximateCreationDateTime, SequenceNumber, etc.
There is only about a millisecond of difference between the timestamps.
I updated the item via the DynamoDB console, which means only one action should have happened; it would be impossible to update the item twice within a millisecond via the console.
Is it expected to see two events?
This would not be expected behaviour.
If you're seeing two separate events, that indicates two separate actions occurred. Since the timestamps differ, a secondary action must have taken place.
From the AWS documentation, the following is true:
DynamoDB Streams helps ensure the following:
Each stream record appears exactly once in the stream.
For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.
This is likely related to your application; make sure you're not issuing multiple writes where you think there is only one.
Also check CloudTrail to see whether there are multiple API calls. If you're using global tables, there's a possibility of seeing a secondary API call, as the contents of the item would be modified by the DynamoDB service.
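As a diagnostic aid, here's a small sketch of a stream-handler Lambda that logs the fields that distinguish the two records, which helps show whether they came from two user writes or from a background service:

```python
import json

def handler(event, context):
    for record in event["Records"]:
        ddb = record["dynamodb"]
        print(json.dumps({
            "eventID": record["eventID"],
            "eventName": record["eventName"],
            "approximateCreationDateTime": ddb.get("ApproximateCreationDateTime"),
            "sequenceNumber": ddb.get("SequenceNumber"),
            # Present when a service (e.g. TTL) made the change,
            # absent for ordinary user writes.
            "userIdentity": record.get("userIdentity"),
        }))
```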

DynamoDB and computed columns: Run Lambda on GetItem / Query request but before data is returned to caller

Is it possible to run a Lambda function as part of a GetItem / Query request? I plan to use some kind of computed column that I would like to update before the value is returned to the caller. The current idea is to do this with a Lambda function and DynamoDB Streams. Up to now, I have missed the part in the docs where I can specify the exact moment when the Lambda is executed (before or after fetching the data). Of course, I am open to better ideas!
No, it is not possible. DynamoDB is designed to return items in distributed systems within milliseconds. There is no way to execute Lambdas synchronously with Put or Get requests. DynamoDB Streams are more like asynchronous table triggers and are only executed on new data.
One idea is to call a Lambda that fetches and computes your data, instead of requesting DynamoDB directly.
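A minimal sketch of that idea, with hypothetical table, key, and attribute names: the caller invokes the Lambda instead of DynamoDB, and the Lambda computes the derived value after the GetItem:

```python
import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "my-table"  # hypothetical table name

def handler(event, context):
    # Fetch the raw item first...
    resp = dynamodb.get_item(
        TableName=TABLE_NAME,
        Key={"pk": {"S": event["pk"]}},  # hypothetical key schema
    )
    item = resp.get("Item")
    if item is None:
        return {"found": False}

    # ...then compute the derived value before returning it to the caller.
    price = float(item["price"]["N"])      # hypothetical attributes
    quantity = int(item["quantity"]["N"])
    return {
        "found": True,
        "pk": event["pk"],
        "total": price * quantity,         # the "computed column"
    }
```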

Is there a way to maintain, using AWS services, an always-available list of pre-populated random IDs (UUIDs)?

A piece of code generates UUIDs. Another piece of code, running in an AWS Lambda, needs to use a few, say 5, random UUIDs out of those already generated. Any suggestions or advice, please?
One option would be for the process that originally generates the UUIDs and inserts items into DynamoDB to also send each UUID to an SQS queue. That would allow a consumer application to get a batch of UUIDs and process them. The consumer would process its batch of UUIDs and then delete them from the SQS queue. While a consumer is processing its batch, the messages are not visible to any other SQS consumer, so you won't see multiple consumers processing the same UUIDs.
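A minimal sketch of the consumer side, assuming the producer pushes one UUID per message (the queue URL is hypothetical):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/uuid-pool"  # hypothetical

def take_uuids(count=5):
    # SQS returns at most 10 messages per call, and possibly fewer than asked.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=count)
    messages = resp.get("Messages", [])
    uuids = [m["Body"] for m in messages]

    # Delete the messages so no other consumer can receive the same UUIDs
    # once their visibility timeout expires.
    if messages:
        sqs.delete_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[
                {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
                for i, m in enumerate(messages)
            ],
        )
    return uuids
```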
You need to use some sort of persistent storage for this, i.e. store the generated UUIDs along with their status (used/unused).
A good option would be AWS DynamoDB, due to its excellent integration with AWS Lambda.
As of now, the solution I am using (will update once I figure out an optimized approach): I created a table in RDS (relational database) with only one column and used it to store only the ID of the actual object stored in DynamoDB. An SQL query to pull random records from the RDS table is simple, and not very costly compared with using queues and applying logic on top of them to maintain the randomization.
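For reference, a rough sketch of that approach, assuming a MySQL table named uuid_pool with a single id column (all names and credentials hypothetical):

```python
import pymysql

conn = pymysql.connect(
    host="my-rds-endpoint",  # hypothetical endpoint and credentials
    user="app",
    password="secret",
    database="ids",
)

def random_uuids(count=5):
    with conn.cursor() as cur:
        # Pull `count` random IDs. Fine for a small pool, but ORDER BY RAND()
        # does a full scan, so it won't scale to very large tables.
        cur.execute("SELECT id FROM uuid_pool ORDER BY RAND() LIMIT %s", (count,))
        return [row[0] for row in cur.fetchall()]
```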

Persist DynamoDB items for a specific time

I have a flow in which we have to persist some items in DynamoDB for a specific time. After the items have expired, we have to call some other services to notify them that the data has expired.
I was thinking about two solutions:
1) Move the expiry check into Java logic:
Retrieve the DynamoDB data in batches, identify the expired items in Java, then delete the data in batches and notify the other services.
There are some limitations:
BatchGetItem lets you retrieve at most 100 items.
BatchWriteItem lets you delete at most 25 items.
2) Move the expiry check into the DB logic:
Query DynamoDB to find which items have expired (and delete them), and return the IDs to the client so that we can notify the other services.
Again, there are some limitations:
The result set from a Query is limited to 1 MB per call.
For both solutions, there will be a job that runs periodically, or an AWS Lambda triggered periodically that calls an endpoint in our app, which deletes the items from the DB and notifies the other services.
My question is whether DynamoDB is suitable for my case, or should I use a relational DB, such as MySQL, that doesn't have these kinds of limitations? What do you think? Thanks!
Have you considered using the DynamoDB TTL feature? This allows you to create a time-based column in your table that DynamoDB will use to automatically delete the items based on the time value.
This requires no implementation on your part and no polling, querying, or batching limitations. You will need to populate a TTL column but you may already have that information present if you are rolling your own expiration logic.
If other services need to be notified when a TTL event occurs, you can create a Lambda that processes the DynamoDB stream and takes action when a TTL delete event occurs.
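A minimal sketch of such a stream handler: TTL deletions arrive as REMOVE records attributed to the DynamoDB service in userIdentity, so the handler can filter on that. The notify() call is a hypothetical placeholder, and the stream must be configured to include old images for OldImage to be present:

```python
def handler(event, context):
    for record in event["Records"]:
        identity = record.get("userIdentity", {})
        is_ttl_delete = (
            record["eventName"] == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com"
        )
        if is_ttl_delete:
            expired_item = record["dynamodb"]["OldImage"]
            notify(expired_item)  # hypothetical downstream notification

def notify(item):
    print(f"item expired: {item}")
```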

Is Redis atomic when multiple clients attempt to read/write an item at the same time?

Let's say that I have several AWS Lambda functions that make up my API. One of the functions reads a specific value from a specific key on a single Redis node. The business logic goes as follows:
if the key exists:
serve the value of that key to the client
if the key does not exist:
get the most recent item from DynamoDB
insert that item as the value for that key, and set an expiration time
delete that item from DynamoDB, so that it only gets read into memory once
serve the value of that key to the client
The idea is that every time a client makes a request, they get the value they need. If the key has expired, then the Lambda needs to first get the item from the database and put it back into Redis.
But what happens if two clients make an API call to the Lambda simultaneously? Will both Lambda invocations see that there is no key, and will both take an item from the database?
My goal is to implement a queue where a certain item lives in memory for only X amount of time, and as soon as that item expires, the next item should be pulled from the database, and when it is pulled, it should also be deleted so that it won't be pulled again.
I'm trying to see if there's a way to do this without having a separate EC2 process that's just keeping track of timing.
Is redis+lambda+dynamoDB a good setup for what I'm trying to accomplish, or are there better ways?
A Redis server will execute commands (or transactions, or scripts) atomically. But a sequence of operations involving separate services (e.g. Redis and DynamoDB) will not be atomic.
One approach is to make them atomic by adding some kind of lock around your business logic. This can be done with Redis, for example.
However, that's a costly and rather cumbersome solution, so if possible it's better to simply design your business logic to be resilient in the face of concurrent operations. To do that you have to look at the steps and imagine what can happen if multiple clients are running at the same time.
In your case, the flaw I can see is that two values could be read and deleted from DynamoDB, one overwriting the other in Redis. That can be avoided by using Redis's SETNX (SET if Not eXists) command. Something like this (a Python sketch follows the steps):
GET the key from Redis
If the value exists:
Serve the value to the client
If the value does not exist:
Get the most recent item from DynamoDB
Insert that item into Redis with SETNX
If the key already exists, go back to step 1
Set an expiration time with EXPIRE
Delete that item from DynamoDB
Serve the value to the client
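Here's a minimal sketch of those steps with redis-py and boto3; the host, table, key schema, and attribute names are all hypothetical, and the item's value attribute is assumed to be a string. Note that redis-py's set(..., nx=True, ex=...) performs SETNX and EXPIRE as a single atomic command:

```python
import boto3
import redis
from boto3.dynamodb.conditions import Key

r = redis.Redis(host="my-redis-host")                    # hypothetical host
table = boto3.resource("dynamodb").Table("queue-items")  # hypothetical table

def get_value(cache_key, ttl_seconds=60):
    while True:
        value = r.get(cache_key)
        if value is not None:
            return value  # steps 1-2: the key exists, serve it

        # Steps 3-4: fetch the most recent item from DynamoDB
        # (assumes a pk/sk key schema with a sortable sort key).
        resp = table.query(
            KeyConditionExpression=Key("pk").eq("queue"),
            ScanIndexForward=False,  # newest first
            Limit=1,
        )
        if not resp["Items"]:
            return None
        item = resp["Items"][0]

        # SET with nx=True and ex=... is an atomic SETNX + EXPIRE,
        # so only one concurrent caller wins the write.
        if r.set(cache_key, item["value"], nx=True, ex=ttl_seconds):
            # Only the winner deletes the item, so it is read exactly once.
            table.delete_item(Key={"pk": "queue", "sk": item["sk"]})
        # Losers loop back to step 1 and read the freshly set key.
```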