How can I query on TTL in DynamoDB?

I have set up a TTL attribute in my DynamoDB table. When I push records in, I get the current date (using the JS SDK in Node) and add a value to it (like 5000). It is my understanding that when that date is reached, AWS will purge the record, but only within 48 hours. During that time the record could still be returned as the result of a query.
I want to filter out the expired items so that, if they are expired but not yet deleted, they won't be returned as part of the query.
Here is what I am using to try to do that:
var epoch = Math.floor(Date.now() / 1000);
console.log("ttl epoch is ", epoch);

var queryTTLParams = {
    TableName: table,
    KeyConditionExpression: "id = :idval",
    ExpressionAttributeNames: {
        "#theTTL": "TTL"
    },
    FilterExpression: "#theTTL < :ttl",
    ExpressionAttributeValues: {
        ":idval": {S: "1234"},
        ":ttl": {S: epoch.toString()}
    }
};
I do not get any results. I believe the issue has to do with the TTL attribute being a string and my trying to do a < comparison on it. But I didn't get to decide on the data type for the TTL field; AWS did that for me.
How can I remedy this?

According to the Enabling Time to Live AWS documentation, the TTL should be set to a Number attribute:
TTL is a mechanism to set a specific timestamp for expiring items from your table. The timestamp should be expressed as an attribute on the items in the table. The attribute should be a Number data type containing time in epoch format. Once the timestamp expires, the corresponding item is deleted from the table in the background.
You probably just need to create a new Number attribute and point the table's TTL setting at that one.
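As a minimal sketch (assuming the AWS SDK for JavaScript v2 low-level client and an illustrative table name), the query would then compare numerically against a Number attribute, keeping only items that have not yet expired:

var AWS = require("aws-sdk");
var ddb = new AWS.DynamoDB({apiVersion: "2012-08-10"});

var epoch = Math.floor(Date.now() / 1000);

var queryTTLParams = {
    TableName: "myTable",                       // hypothetical table name
    KeyConditionExpression: "id = :idval",
    ExpressionAttributeNames: {
        "#theTTL": "TTL"                        // alias the attribute name
    },
    // Keep items whose expiry is still in the future.
    FilterExpression: "#theTTL > :now",
    ExpressionAttributeValues: {
        ":idval": {S: "1234"},
        ":now": {N: epoch.toString()}           // the N type still takes a string value
    }
};

ddb.query(queryTTLParams, function(err, data) {
    if (err) console.error(err);
    else console.log(data.Items);
});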


How to compare strings in DynamoDB using Lambda NodeJS?

I have a Lambda function that makes some requests to DynamoDB.
var ddb = new AWS.DynamoDB({apiVersion: '2012-08-10'});

const lookupminutes = 10;
var LookupDate = new Date(Date.now() - 1000 * 60 * lookupminutes);

var params = {
    TableName: TableName,
    IndexName: "requestdate-index",
    KeyConditionExpression: "requestdate > :startdate",
    ExpressionAttributeValues: {
        ":startdate": {S: LookupDate.toISOString()}
    },
    ProjectionExpression: "id, requestdate"
};
var results = await ddb.query(params).promise();
When running the Lambda function, I'm getting the error "Query key condition not supported" on the line that runs the query against DynamoDB.
The field requestdate is stored in the table as a string.
Does anyone know what I am doing wrong, please?
Thanks.
You cannot use anything other than an equals operator on a partition key:
var params = {
    TableName: TableName,
    IndexName: "requestdate-index",
    KeyConditionExpression: "requestdate = :startdate",
    ExpressionAttributeValues: {
        ":startdate": {S: LookupDate.toISOString()}
    },
    ProjectionExpression: "id, requestdate"
};
If you need all of the data back within the last 10 minutes, then you have two choices, both of which have limited scalability unless you shard your key (see option 1a):
1. Put all the data in your index under the same partition key, with the sort key being the timestamp. Then use a KeyConditionExpression like gsipk = 1 AND timestamp > :tenMinutesAgo (see the sketch after this list). As all of the items are under the same partition key, the query will be efficient, but at the cost of scalability, as you will essentially bottleneck your throughput at 1,000 WCU.
1a. Probably the best option if you need to scale beyond 1,000 WCU is to do just as above, except use a random number (within a range) for the partition key. For example, the range 0-9 would give us 10 unique partition keys, allowing us to scale to 10,000 WCU, at the cost of issuing 10 Query requests in parallel to retrieve the data.
2. Use a Scan with a FilterExpression on the base table. If you do not want to place everything under the same key on the GSI, then you can just Scan and add a filter. This becomes slow and expensive as the table grows.
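A minimal sketch of option 1 (assuming a GSI named requestdate-gsi whose partition key is a fixed attribute gsipk and whose sort key is the requestdate string; the names are illustrative):

var AWS = require("aws-sdk");
var ddb = new AWS.DynamoDB({apiVersion: "2012-08-10"});

var lookupMinutes = 10;
var startDate = new Date(Date.now() - 1000 * 60 * lookupMinutes);

var params = {
    TableName: TableName,
    IndexName: "requestdate-gsi",               // hypothetical GSI name
    // Equality on the fixed partition key, range condition on the sort key.
    KeyConditionExpression: "gsipk = :pk AND requestdate > :startdate",
    ExpressionAttributeValues: {
        ":pk": {S: "1"},
        ":startdate": {S: startDate.toISOString()}
    },
    ProjectionExpression: "id, requestdate"
};

var results = await ddb.query(params).promise();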

How can I use nested field as TTL field in Dynamodb?

I am reading https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html to add a TTL field to a DynamoDB table, but I can't find whether it supports a nested field as the TTL field. For example:
id: xxx
user: { firstName: '', lastName: '', age: '' }
In the above example, how can I use user -> age as the TTL field?
First, a string can't be used for TTL. If you try to do that, the item will be ignored, per the documentation:
The TTL attribute’s value must be a Number data type. For example, if you specify for a table to use the attribute name expdate as the TTL attribute, but the attribute on an item is a String data type, the TTL processes ignore the item.
Also from the same documentation page:
The TTL attribute’s value must be a timestamp in Unix epoch time format in seconds. If you use any other format, the TTL processes ignore the item. For example, if you set the value of the attribute to 1645119622, that is Thursday, February 17, 2022 17:40:22 (GMT), the item will be expired after that time.
I am mentioning this because I get the impression you want to use the user's age in some way as a TTL, and that is not a timestamp value.
Also, as @jarmod said in the comments, the TTL attribute has to be a top-level attribute. You would have to extract the value and write it to the configured TTL attribute, whichever one that is, as a Number containing an epoch timestamp in seconds, set in the future.
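As a minimal sketch (assuming the DocumentClient and a hypothetical top-level attribute expiresAt configured as the table's TTL attribute):

var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();

var user = {firstName: "Jane", lastName: "Doe", age: 30};

// A nested value like user.age is not a valid TTL by itself; derive a real
// future epoch timestamp (in seconds) however your application requires.
var expiresAt = Math.floor(Date.now() / 1000) + 7 * 24 * 60 * 60; // one week out

docClient.put({
    TableName: "myTable",              // hypothetical table name
    Item: {
        id: "xxx",
        user: user,                    // nested map; ignored by the TTL process
        expiresAt: expiresAt           // top-level Number used as the TTL attribute
    }
}, function(err) {
    if (err) console.error(err);
});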

Scanning With sort_key in DynamoDB

I have a table that will contain < 1300 entries at about 600 bytes each. The goal is to display pages of results ordered by epoch date. Right now, for any given search, I request the full list of ids using a filtered scan, then handle paging on the UI side. For each page, I pass a chunk of ids to retrieve the full entries (also currently a filtered scan). Ideally, the list of ids would come back sorted, but if I understand the docs correctly, only results that share the same partition key are sorted. My current partition key is a UUID, so all entries are unique.
[Screenshot: current table configuration]
Do I essentially need to use a throwaway key for the partition just to get results returned by date? Maybe the size of my table makes this unreasonable to begin with? Is there a better way to handle this? I have another field, is_active, that's currently a boolean and could be used for the partition key if I converted it to a number, but that might complicate my update method. 95% of the time, every entry in the db will be active, so this doesn't seem efficient.
Scan Index
let params = {
    TableName: this.TABLE_NAME,
    IndexName: this.INDEX_NAME,
    ScanIndexForward: false,
    ProjectionExpression: "id",
    FilterExpression: filterSqlStatement,
    ExpressionAttributeValues: filterValues,
    ExpressionAttributeNames: {
        "#n": "name"
    }
};

let results = await this.DDB_CLIENT.scan(params).promise();
let finalizedResults = results ? results.Items : [];
Given that your dataset is relatively small, you might try a fixed partition key with a sort key of the date and the UUID. You'd query by the partition key (which would be a fixed value) and the results would come back sorted. This isn't the best idea with large data sets, but < 1300 entries is not large.
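A minimal sketch of that query (assuming a GSI named date-index with a fixed partition key attribute gsipk and the epoch date as the sort key, queried through the DocumentClient; all names are illustrative):

let params = {
    TableName: this.TABLE_NAME,
    IndexName: "date-index",            // hypothetical GSI name
    KeyConditionExpression: "gsipk = :all",
    ExpressionAttributeValues: {
        ":all": "ALL"                   // the single fixed partition value
    },
    ScanIndexForward: false,            // newest first
    ProjectionExpression: "id"
};

let results = await this.DDB_CLIENT.query(params).promise();
let sortedIds = results ? results.Items : [];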

Reasons why DynamoDB TTL is deleting items 48 hours early?

I was looking to set up TTL for my DynamoDB table and set it to use the "TTL" key. I used Python 3 + boto3 to do a batch write with the TTL field set to str(int(time.time()) + 172800). I also tried int(time.time()) + 172800.
In either case, the epoch timestamp in the TTL column was 48 hours in the future. When using the second version, hovering over the value in the DynamoDB table showed a popup with the date and time of the timestamp, and I confirmed it was 2 days in the future.
However, when I came back ~5 minutes later and refreshed the table, all of the entries were gone.
I repeated the process and actively refreshed to keep an eye on the values, and they were all getting deleted gradually, with later timestamps being deleted last. I checked the CloudWatch logs and they showed my scans and writes at the correct GMT time.
I'm just wondering what might cause this to happen. For reference, the table's creation date was July 27, 2019; is it possible that the clock for the DB is off?
Example timestamp that was deleted: 1588959677, which should translate to sometime on 5/8/2020.
Let me know if I need to provide more information and thanks for the help.
Edit: When I batch write I run the following:
boto3.resource('dynamodb', region_name).batch_write_item(RequestItems=put_data)
where:
put_data = {
    tablename: [{
        "PutRequest": {
            "Item": {
                "id": "id",
                "TTL": ttl_integer_value,
                "another_id": "id",
                "flag": "true",
                "timestamp": original_time_value,
                "description": "some description"
            }
        }
    }]
}
I tried changing it to:
put_data = { tablename: [{ "PutRequest": { "Item": { "TTL": {"N": ttl_integer_value}, ... } } }] }
but it threw an error saying the key value was not valid.
Also, if I hover over the integer value in the table, the appropriate date and time show in a popup. Wouldn't that be an indication of the correct format?

Best method to extract data from DynamoDB and move it to another table

I have a table of 500 GB. I want to transfer the data to another table based on the timestamps.
There are several items in the table, and I want only the latest entry of every item in the other table.
Considering the size of the table, can anyone recommend the best AWS service to get it done fast and easily?
I have come across AWS Glue and HiveCopyActivity. Are these the best solutions, or is there another service I can use?
(Assuming you can add a global secondary index (GSI) to that table, that is, you currently have fewer than 5 GSIs.)
Define a new GSI on your table. The GSI's partition key will be x. The GSI's sort key will be timestamp. Once you have that GSI defined, you can do a Query on that index with ScanIndexForward set to false to get the most recent item first. You need to supply the value of x you are interested in. In the following example request it is simply set to 'abc':
{
    "TableName": "<your-table-name>",
    "IndexName": "<your-GSI-name>",
    "KeyConditionExpression": "x = :argx",
    "ExpressionAttributeValues": {
        ":argx": {"S": "abc"}
    },
    "ScanIndexForward": false,
    "Limit": 1
}
This query looks at items with a given x value (as set in the ExpressionAttributeValues field), sorted in descending order by the GSI's sort key (the timestamp field), and picks the first one (Limit is set to 1). As long as you do not need filtering (the FilterExpression field is empty), you will get the result you need by issuing a single Query request.
If you do want to use filtering, you will need to issue multiple requests and unset the Limit field (i.e., use its default value). See this answer for further details on those subtleties.
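A minimal sketch of that multi-request case (assuming the AWS SDK for JavaScript v2 and a hypothetical FilterExpression; because filtering happens after the key condition is applied, you page with LastEvaluatedKey until a match turns up):

var AWS = require("aws-sdk");
var ddb = new AWS.DynamoDB({apiVersion: "2012-08-10"});

async function latestMatching(xValue) {
    var lastKey;
    do {
        var resp = await ddb.query({
            TableName: "<your-table-name>",
            IndexName: "<your-GSI-name>",
            KeyConditionExpression: "x = :argx",
            FilterExpression: "attribute_exists(flag)",   // hypothetical filter
            ExpressionAttributeValues: {":argx": {S: xValue}},
            ScanIndexForward: false,                      // newest first
            ExclusiveStartKey: lastKey                    // undefined on the first page
        }).promise();
        if (resp.Items.length > 0) {
            return resp.Items[0];                         // newest item passing the filter
        }
        lastKey = resp.LastEvaluatedKey;
    } while (lastKey);
    return null;                                          // nothing matched
}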