DynamoDB: persist items for a specific time

I have a flow in which we have to persist some items in DynamoDB for a specific time. After the items have expired, we have to call some other services to notify them that the data has expired.
I was thinking about two solutions:
1) Move the expiry check into Java logic:
Retrieve the DynamoDB data in batches, verify the expired items in Java, and after that delete the data in batches and notify the other services.
There are some limitations:
BatchGetItem lets you retrieve a maximum of 100 items.
BatchWriteItem lets you delete a maximum of 25 items.
2) Move the expiry check into the DB logic:
Query DynamoDB to check which items have expired (and delete them), and return the IDs to the client so that we can notify the other services.
Again, there are some limitations:
The result set from a Query is limited to 1 MB per call.
For both solutions there would be a job that runs periodically, or we would use an AWS Lambda that is triggered periodically and calls an endpoint in our app that deletes the items from the DB and notifies the other services.
My question is whether DynamoDB is appropriate for my case, or whether I should use a relational DB such as MySQL that doesn't have these kinds of limitations. What do you think? Thanks!

Have you considered using the DynamoDB TTL feature? It lets you designate a time-based attribute in your table that DynamoDB uses to automatically delete items once that time value has passed.
This requires no implementation on your part and none of the polling, querying, or batching limitations. You will need to populate the TTL attribute, but you may already have that information present if you are rolling your own expiration logic.
If other services need to be notified when a TTL expiry occurs, you can create a Lambda function that processes the table's DynamoDB stream and takes action when a TTL delete event arrives.
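As a rough sketch, the stream handler can tell TTL deletions apart from ordinary deletes because TTL records carry a service `userIdentity` with principal `dynamodb.amazonaws.com`. The notification call itself is left as a comment, since it depends on your downstream services:

```python
def handler(event, context):
    """Triggered by a DynamoDB stream; collects the keys of TTL-expired items."""
    expired_keys = []
    for record in event.get("Records", []):
        if record.get("eventName") != "REMOVE":
            continue  # only deletions are interesting
        identity = record.get("userIdentity") or {}
        # TTL deletions are performed by the DynamoDB service principal;
        # manual deletes have no userIdentity field at all.
        if identity.get("type") == "Service" and \
           identity.get("principalId") == "dynamodb.amazonaws.com":
            expired_keys.append(record["dynamodb"]["Keys"])
    # notify the other services here (SNS publish, HTTP call, ...)
    return expired_keys
```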

Related

Most efficient way to get total number of 'online users' from a Connections dynamodb table fed from a websocket

I have a DynamoDB table for Connections. The idea is that when a user logs into a website, a connection is made via WebSockets, and this connection information is stored in the table.
Now we have a feature we want to release which shows the total users online. My thought was to add a new API endpoint that scans DynamoDB and returns the count of connections, but this would involve a DynamoDB scan every time the UI refreshes, which I'm guessing would be very expensive.
Another option I thought of was creating an API and a scheduled Lambda that calls this API every 10 minutes and uploads the count to an S3 file; the UI's API could then point at the S3 file. That would be cheaper, but not real time, since the count is potentially 10 minutes out of date.
Alternatively, I tried to use the /#connections endpoint to see if it returned the total connections via the WebSocket API, but I am getting a CORS error, and there is no way in AWS for us to set CORS on the provided HTTP #connections route.
I would be interested in ideas on how to achieve this in the most efficient way :) My Connections table could have anywhere between 5k and 10k items.
The best approach here would be to use an item in the table to hold the live connection count.
Add connection:
Add connection to DDB -> Stream -> Lambda -> Increment count item
Remove connection:
Remove connection from DDB -> Stream -> Lambda -> Decrement count item
This allows you to efficiently obtain the number of live users on the system with a simple GetItem.
You just need to be mindful that a single item can consume only 1,000 WCU per second, so if you are updating the item more than 1,000 times per second you will have to either:
Aggregate the events in the Lambda, using a sliding window, or
Artificially shard the count item n ways: count-item0, count-item1, etc.
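A sketch of the stream-triggered Lambda for the simple (unsharded) case. The table and attribute names are assumptions, and the boto3 client is injectable so the counting logic can be exercised without AWS:

```python
def counter_delta(records):
    """Net change in connection count for a batch of stream records."""
    delta = 0
    for record in records:
        if record.get("eventName") == "INSERT":
            delta += 1   # a connection was added
        elif record.get("eventName") == "REMOVE":
            delta -= 1   # a connection was removed
    return delta

def handler(event, context, ddb=None):
    delta = counter_delta(event.get("Records", []))
    if delta == 0:
        return 0
    if ddb is None:  # default to the real client inside Lambda
        import boto3
        ddb = boto3.client("dynamodb")
    # ADD is an atomic increment, so concurrent invocations cannot lose updates.
    ddb.update_item(
        TableName="Connections",                     # assumed table name
        Key={"connectionId": {"S": "live-count"}},   # the dedicated count item
        UpdateExpression="ADD connectionCount :d",
        ExpressionAttributeValues={":d": {"N": str(delta)}},
    )
    return delta
```

The UI endpoint then does a single GetItem on the `live-count` item.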

Will a click counter slow down my DynamoDB API?

I want to create a DynamoDB WebAPI. It allows the creation and reading of Posts. Now I would like to implement a click counter that updates the popularity of a post each time a user requests it. For this reason, every time a GET request for a post comes in, I would modify the Post object itself.
But I know that DynamoDB is optimized for reads, not for writes. So updating the object being fetched every time would probably be a problem.
So how can I measure the popularity of posts without slowing down the API itself? I was thinking of generating a random number on every fetch and only updating the counter if the number is below 0.05 or so.
But is there a better solution for this?
DynamoDB isn't "optimized for reads"; it's optimized to provide "consistent, single-digit millisecond response times at any scale."
To optimize DDB for reads, you'd put an Amazon DynamoDB Accelerator (DAX) instance in front of it for "faster access with microsecond latency".
In actuality, DDB read/write performance isn't going to be the issue; the network latency between your app and DDB will be orders of magnitude higher. By making two calls synchronously, one after the other, you'd be doubling your response time, regardless of which cloud DB you're writing to.
Assuming the data and the counter are in the same record, the simple DDB solution here is not to make one call to GetItem() and another to UpdateItem(). Instead, make a single call to UpdateItem() with an UpdateExpression that uses ADD to add 1 to your counter, and set ReturnValues to either ALL_OLD or ALL_NEW so the item comes back in the same round trip.
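A minimal sketch of that single call (the table and attribute names are assumptions, not from the question; the client is injectable for testing):

```python
def record_view(post_id, ddb=None, table="Posts"):
    """Increment the view counter and fetch the post in one round trip."""
    if ddb is None:  # use the real client unless one is injected
        import boto3
        ddb = boto3.client("dynamodb")
    resp = ddb.update_item(
        TableName=table,
        Key={"postId": {"S": post_id}},
        UpdateExpression="ADD viewCount :one",       # atomic in-place increment
        ExpressionAttributeValues={":one": {"N": "1"}},
        ReturnValues="ALL_NEW",                      # the item after the update
    )
    return resp["Attributes"]
```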
Other, more complex solutions:
Assuming you've already got the data for display, make an asynchronous call to UpdateItem().
At scale, you might consider disconnecting the counter update from your app entirely: the app posts an SQS message, which is processed by a Lambda that applies batched updates to DDB.

Finding expired data in aws dynamoDB

I have a requirement where I need to store some data in DynamoDB with a status and a timestamp, e.g. <START, 20180203073000>.
The status flips to STOP when I receive a message from SQS. But to make my system error-proof, I need some mechanism to identify items in DynamoDB that have START status and are older than 1 day, and set their status to STOP, so that the system does not wait indefinitely for the message to arrive from SQS.
Is there an AWS feature I can use to achieve this without polling the data at regular intervals?
Not sure if this will fit your needs, but here is one possibility:
Enable TTL on your DynamoDB table. This works if your timestamp attribute is a Number data type containing the time in epoch format. Once the timestamp expires, the corresponding item is deleted from the table in the background.
Enable Streams on your DynamoDB table. Items that are deleted by TTL are sent to the stream.
Create a trigger that connects the DynamoDB stream to a Lambda function. In your case, the trigger will receive your entire deleted item.
In the Lambda, modify the record (set 'START' to 'STOP'), remove the timestamp attribute (items with no TTL attribute are not deleted), and re-insert it into the table.
This way you avoid the table scans searching for expired items, but on the other hand there is some cost associated with the Lambda executions.
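The re-insert step could look like the sketch below. The table name and the attribute names `status` and `expiry` are assumptions, and the stream must be configured with OLD_IMAGE (or NEW_AND_OLD_IMAGES) so the deleted item's attributes are available:

```python
def handler(event, context, ddb=None):
    """Re-insert TTL-deleted items with status STOP and no TTL attribute."""
    for record in event.get("Records", []):
        identity = record.get("userIdentity") or {}
        # TTL deletions are REMOVE events made by the DynamoDB service principal.
        if record.get("eventName") != "REMOVE" or \
           identity.get("principalId") != "dynamodb.amazonaws.com":
            continue
        item = dict(record["dynamodb"]["OldImage"])  # the full deleted item
        item["status"] = {"S": "STOP"}
        item.pop("expiry", None)  # no TTL attribute -> it won't be re-deleted
        if ddb is None:
            import boto3
            ddb = boto3.client("dynamodb")
        ddb.put_item(TableName="my-table", Item=item)  # assumed table name
```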
You can try creating a GSI using the status as the partition key and the timestamp as the sort key. When querying for expired items, use a key condition expression like status = "START" AND timestamp < one-day-ago.
Be careful though, because this basically creates 2 hot partitions (START and STOP), so make sure the index projection only contains the data you need and no more.
If you have an attribute that is set in the status = START state but doesn't exist otherwise, you can take advantage of a sparse index (DynamoDB won't index an item in a GSI if the item is missing the GSI key attributes, so you don't need to filter the STOP items out at query time).
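A sketch of that query. The index and attribute names are assumptions, as is storing the timestamp in epoch seconds; note that `status` and `timestamp` are DynamoDB reserved words, so both have to be aliased:

```python
import time

def find_expired(ddb=None, table="my-table", index="status-timestamp-index"):
    """Return START items whose timestamp is more than one day old."""
    cutoff = int(time.time()) - 24 * 3600   # one day ago, in epoch seconds
    if ddb is None:
        import boto3
        ddb = boto3.client("dynamodb")
    resp = ddb.query(
        TableName=table,
        IndexName=index,
        # Both attribute names are reserved words, hence the #-aliases.
        KeyConditionExpression="#s = :start AND #t < :cutoff",
        ExpressionAttributeNames={"#s": "status", "#t": "timestamp"},
        ExpressionAttributeValues={
            ":start": {"S": "START"},
            ":cutoff": {"N": str(cutoff)},
        },
    )
    return resp["Items"]
```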

Is Redis atomic when multiple clients attempt to read/write an item at the same time?

Let's say that I have several AWS Lambda functions that make up my API. One of the functions reads a specific value from a specific key on a single Redis node. The business logic goes as follows:
if the key exists:
serve the value of that key to the client
if the key does not exist:
get the most recent item from dynamoDB
insert that item as the value for that key, and set an expiration time
delete that item from dynamoDB, so that it only gets read into memory once
Serve the value of that key to the client
The idea is that every time a client makes a request, they get the value they need. If the key has expired, then lambda needs to first get the item from the database and put it back into Redis.
But what happens if 2 clients make an API call to Lambda simultaneously? Will both Lambda processes read that there is no key, and will both take an item from the database?
My goal is to implement a queue where a certain item lives in memory for only X amount of time; as soon as that item expires, the next item should be pulled from the database, and when it is pulled it should also be deleted so that it won't be pulled again.
I'm trying to see if there's a way to do this without having a separate EC2 process that's just keeping track of timing.
Is redis+lambda+dynamoDB a good setup for what I'm trying to accomplish, or are there better ways?
A Redis server will execute commands (or transactions, or scripts) atomically. But a sequence of operations involving separate services (e.g. Redis and DynamoDB) will not be atomic.
One approach is to make them atomic by adding some kind of lock around your business logic. This can be done with Redis, for example.
However, that's a costly and rather cumbersome solution, so if possible it's better to simply design your business logic to be resilient in the face of concurrent operations. To do that you have to look at the steps and imagine what can happen if multiple clients are running at the same time.
In your case, the flaw I can see is that two values can be read and deleted from DynamoDB, one overwriting the other in Redis. That can be avoided by using Redis's SETNX (SET if Not eXists) command. Something like this:
GET the key from Redis
If the value exists:
Serve the value to the client
If the value does not exist:
Get the most recent item from DynamoDB
Insert that item into Redis with SETNX
If the key already exists, go back to step 1
Set an expiration time with EXPIRE
Delete that item from DynamoDB
Serve the value to the client
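The loop above can be sketched in Python. The cache is assumed to be a redis-py-style client supporting `set(..., nx=True, ex=...)`, and the DynamoDB side is reduced to two hypothetical helpers, `pop_most_recent()` and `delete(item_id)`, standing in for the query and delete calls:

```python
def get_or_refill(cache, db, key, ttl_seconds=60):
    """Serve the cached value; on a miss, atomically promote the next DB item."""
    while True:
        value = cache.get(key)
        if value is not None:
            return value  # cache hit: serve it
        # Cache miss: read (but do not yet delete) the most recent item.
        item_id, item_value = db.pop_most_recent()
        # SET with NX+EX is atomic: only one concurrent caller wins the write,
        # and the expiration time is set in the same command.
        if cache.set(key, item_value, nx=True, ex=ttl_seconds):
            db.delete(item_id)  # only the winner deletes from the database
        # A loser loops back, re-reads the key, and serves the winner's value.
```

Deleting only after a successful SETNX is what prevents two Lambdas from each consuming an item.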

Event to be triggered at a configurable time in AWS

I am working on a project which has a table "Games" with columns ID, Name, Start-time and End-time. Whenever an item is added to the table and its end-time is reached, I want to trigger an event (an SNS notification, but I can work my way around with anything else). The current implementation polls the table to check which "games" have expired and generates the event. Is there a technique to avoid polling, such as SQS or SWF, which I can use? The problem with the current approach is that polling the DB is costing me too much money, since the complete infrastructure is in the cloud.
AWS DynamoDB is a perfect fit for this. You can set the record to auto-expire with a TTL attribute, and the record will be removed automatically.
If you want to extend the auto-expiry time, update the TTL attribute of the record and the expiry will be extended.
Time to Live on DynamoDB:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html
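A sketch of both operations. The table name "Games" is from the question; the TTL attribute name `expireAt` and the epoch-seconds format are assumptions (TTL requires a Number attribute in epoch time):

```python
def put_game(game_id, name, end_time_epoch, ddb=None):
    """Insert a game whose TTL attribute is its end time (epoch seconds)."""
    if ddb is None:
        import boto3
        ddb = boto3.client("dynamodb")
    ddb.put_item(
        TableName="Games",
        Item={
            "ID": {"S": game_id},
            "Name": {"S": name},
            "expireAt": {"N": str(end_time_epoch)},  # the TTL attribute
        },
    )

def extend_game(game_id, extra_seconds, ddb=None):
    """Push the auto-expiry further out by bumping the TTL attribute."""
    if ddb is None:
        import boto3
        ddb = boto3.client("dynamodb")
    ddb.update_item(
        TableName="Games",
        Key={"ID": {"S": game_id}},
        UpdateExpression="ADD expireAt :extra",  # numeric ADD on the TTL value
        ExpressionAttributeValues={":extra": {"N": str(extra_seconds)}},
    )
```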
You can configure a lambda to capture the deleted record and do whatever you want to do from there.
DynamoDB Streams:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
It is all event-based, with zero polling.
Hope it helps.