Reasons why DynamoDB TTL is deleting items 48 hours early?

I set up TTL for my DynamoDB table and pointed it at the "TTL" attribute. I used Python 3 + boto3 to do a batch write with the TTL field set to str(int(time.time()) + 172800); I also tried int(time.time()) + 172800.
In either case, the epoch timestamp in the TTL column was 48 hours in the future. When using the second version, hovering over the value in the DynamoDB table showed a popup with the date and time of the timestamp, and I confirmed it was 2 days in the future.
However, when I came back ~5 minutes later and refreshed the table, all of the entries were gone.
I repeated the process and kept refreshing to keep an eye on the values; they were all getting deleted gradually, with later timestamps being deleted last. I checked the CloudWatch logs and they showed my scans and writes at the correct GMT time.
I'm just wondering what might cause this to happen. For reference, the table's creation date was July 27, 2019; is it possible that the clock for the DB is off?
Example timestamp that was deleted: 1588959677, which should translate to sometime on 5/8/2020.
Let me know if I need to provide more information and thanks for the help.
Edit: When I batch write I run the following:
boto3.resource('dynamodb', region_name).batch_write_item(RequestItems=put_data)
where:
put_data = { tablename: [ { "PutRequest": { "Item": { "id": "id", "TTL": ttl_integer_value, "another_id": "id", "flag": "true", "timestamp": original_time_value, "description": "some description" } } } ] }
I tried changing it to:
put_data = { tablename: [ { "PutRequest": { "Item": { "TTL": {"N": ttl_integer_value}, ... } } } ] }
but it threw an error saying the key value was not valid.
Also, if I hover over the integer value in the table, the appropriate date and time show in a popup. Wouldn't that be an indication of the correct format?
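For completeness, here is a minimal, self-contained sketch of the write I'm doing (the region, table name, and attribute values are placeholders; everything else mirrors the structure above):

import time
import boto3

# Placeholder region and table name.
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
ttl = int(time.time()) + 172800  # 48 hours from now, in epoch seconds

put_data = {
    'my-table': [
        {'PutRequest': {'Item': {
            'id': 'id',
            'TTL': ttl,  # plain int; the resource-level API stores it as a Number
            'another_id': 'id',
            'flag': 'true',
            'timestamp': int(time.time()),
            'description': 'some description'
        }}}
    ]
}

# The {"N": "..."} type-descriptor format is what the low-level client
# (boto3.client('dynamodb')) expects; the resource API takes plain Python types,
# which is why adding {"N": ...} here raised the validation error mentioned above.
dynamodb.batch_write_item(RequestItems=put_data)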

Related

Oracle Apex - REST data source - nested JSON array - sync two tables - where to write SQL

This question is a follow-up to another SO question.
Summary: I have an API returning a nested JSON array. Data is being extracted via APEX REST Data Sources. The Row Selector in the Data Profile is set to "." (to select the "root node").
The lines array has been manually added as a column (LINES) to the Data Profile, with the data type set to JSON Document and lines used as the selector.
SAMPLE JSON RESPONSE FROM API
[
  {
    "order_number": "so1223",
    "order_date": "2022-07-01",
    "full_name": "Carny Coulter",
    "email": "ccoulter2@ovh.net",
    "credit_card": "3545556133694494",
    "city": "Myhiya",
    "state": "CA",
    "zip_code": "12345",
    "lines": [
      {
        "product": "Beans - Fava, Canned",
        "quantity": 1,
        "price": 1.99
      },
      {
        "product": "Edible Flower - Mixed",
        "quantity": 1,
        "price": 1.50
      }
    ]
  },
  {
    "order_number": "so2244",
    "order_date": "2022-12-28",
    "full_name": "Liam Shawcross",
    "email": "lshawcross5@exblog.jp",
    "credit_card": "6331104669953298",
    "city": "Humaitá",
    "state": "NY",
    "zip_code": "98670",
    "lines": [
      {
        "order_id": 5,
        "product": "Beans - Green",
        "quantity": 2,
        "price": 4.33
      },
      {
        "order_id": 1,
        "product": "Grapefruit - Pink",
        "quantity": 5,
        "price": 5.00
      }
    ]
  }
]
The order attributes have been synchronized to a local table (table name: SOTEST_LOCAL).
The table has the correct data, and the LINES column contains the JSON array.
I then created an ORDER_LINES child table to extract the JSON from the LINES column of the SOTEST_LOCAL table. (Sorry for the table names; I should have named them ORDERS_LOCAL and ORDER_LINES_LOCAL.)
CREATE TABLE "SOTEST_ORDER_LINES_LOCAL"
( "LINE_ID" NUMBER,
"ORDER_ID" NUMBER,
"LINE_NUMBER" NUMBER,
"PRODUCT" VARCHAR2(200) COLLATE "USING_NLS_COMP",
"QUANTITY" NUMBER,
"PRICE" NUMBER,
CONSTRAINT "SOTEST_ORDER_LINES_LOCAL_PK" PRIMARY KEY ("LINE_ID")
USING INDEX ENABLE
) DEFAULT COLLATION "USING_NLS_COMP"
/
ALTER TABLE "SOTEST_ORDER_LINES_LOCAL" ADD CONSTRAINT "SOTEST_ORDER_LINES_LOCAL_FK" FOREIGN KEY ("ORDER_ID")
REFERENCES "SOTEST_LOCAL" ("ORDER_ID") ON DELETE CASCADE ENABLE
/
QuickSQL version..
SOTEST_ORDER_LINES_LOCAL
    LINE_ID /pk
    ORDER_ID /fk SOTEST_LOCAL references ORDER_ID
    LINE_NUMBER
    PRODUCT
    QUANTITY
    PRICE
So per Carsten's answer in the previous question, I can write SQL to extract the JSON array from the LINES column in the SOTEST_LOCAL table to the child table SOTEST_ORDER_LINES_LOCAL.
My question is two parts.
Where exactly do I write the SQL? Would I write it in SQL Workshop in SQL Commands?
The REST data source synchronization makes a request every hour. So would I need to write a function that runs every time new data is merged?
There are multiple options for this:
Create a trigger on the local synchronization table
You could create a trigger on your ORDERS table which runs AFTER INSERT, UPDATE or DELETE and maintains the LINES table. The nice thing about this approach is that the maintenance of the child table is independent of APEX and the REST synchronization; it would also work if you just inserted rows with plain SQL*Plus.
Here's some pseudo-code showing how the trigger could look.
create or replace trigger tr_maintain_lines
    after insert or update or delete on ORDERS_LOCAL
    for each row
begin
    if inserting then
        -- expand the LINES JSON array into individual child rows
        insert into SOTEST_ORDER_LINES_LOCAL
               ( order_id, line_id, line_number, product, quantity, price )
        select :new.id,
               seq_lines.nextval,
               j.line#,
               j.product,
               j.quantity,
               j.price
          from json_table(
                   :new.lines,
                   '$[*]' columns (
                       line#    for ordinality,
                       product  varchar2(255) path '$.product',
                       quantity number        path '$.quantity',
                       price    number        path '$.price' ) ) j;
    elsif deleting then
        delete SOTEST_ORDER_LINES_LOCAL
         where order_id = :old.id;
    elsif updating then
        --
        -- handle the update case here.
        -- I would simply delete and re-insert the LINES rows.
        null;
    end if;
end;
Handle child table maintenance in APEX itself.
You could turn off the schedule of your REST Source synchronization and have it run only when called with APEX_REST_SOURCE_SYNC.SYNCHRONIZE_DATA (https://docs.oracle.com/en/database/oracle/apex/22.1/aeapi/SYNCHRONIZE_DATA-Procedure.html#GUID-660DE4D1-4BAF-405A-A871-6B8C201969C9).
Then create an APEX Automation that runs on your desired schedule and has two Actions: one runs the REST Source synchronization, the other calls PL/SQL code to maintain the child tables.
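As a rough sketch (not tested; it assumes the table, column, and sequence names used above and simply rebuilds the child table after every synchronization), the PL/SQL action could look like this:

-- Sketch only: full refresh of the child table from the LINES JSON column.
-- Assumes SOTEST_LOCAL has ORDER_ID and LINES columns and that seq_lines exists.
begin
    delete SOTEST_ORDER_LINES_LOCAL;

    insert into SOTEST_ORDER_LINES_LOCAL
           ( order_id, line_id, line_number, product, quantity, price )
    select o.order_id,
           seq_lines.nextval,
           j.line#,
           j.product,
           j.quantity,
           j.price
      from SOTEST_LOCAL o,
           json_table(
               o.lines,
               '$[*]' columns (
                   line#    for ordinality,
                   product  varchar2(255) path '$.product',
                   quantity number        path '$.quantity',
                   price    number        path '$.price' ) ) j;
end;

A delete-and-reinsert keeps the logic simple; if you need to preserve LINE_ID values across refreshes you would use a MERGE instead.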
Have a look at this blog post, which talks a bit about more complex synchronization scenarios (although it does not exactly fit this scenario): https://blogs.oracle.com/apex/post/synchronize-parent-child-rest-sources
I hope this helps

iteration over dynamodb partition keys

I am using AWS.DynamoDB.DocumentClient. I want to iterate over the items and conditionally update them.
I have a table which contains 4000 items. When I scan the table, even if I use ProjectionExpression, I get only 480 results. This is because of the scan size limit (1 MB). I'm pretty sure that if I fetch only partition keys, the result will be less than 1 MB.
There are some similar questions about scanning for specific items, but that's not what I'm struggling with. What can I do to list all partition keys of my table? Thanks.
Here is my scan operation:
docClient.scan({
    TableName: "Recipes",
    "ProjectionExpression": "#key",
    "ExpressionAttributeNames": {
        "#key": "id"
    }
}, async (err, recipes) => {
    console.log("scanned recipes: " + recipes.Items.length)
    // output: 477 (but the table has 4000 items)
})
Can you show the scan operation you've tried that isn't working for you?
The following worked for me (my partition key is named PK):
ddbClient.scan(
    {
        "TableName": "<MY TABLE NAME>",
        "ProjectionExpression": "#PK",
        "ExpressionAttributeNames": {
            "#PK": "PK"
        }
    }
)
Keep in mind that DynamoDB will consider the entire item size when calculating the 1MB limit, even if you use a projection expression that limits the response to just a few attributes. If your scan result returns a LastEvaluatedKey, you know that DynamoDB is paginating the results.
I found the solution in the documentation: ExclusiveStartKey is the answer.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.NodeJs.04.html
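For reference, a rough sketch of what that pagination looks like with the DocumentClient (table and attribute names are the ones from the question; keep scanning until LastEvaluatedKey is no longer returned):

// Sketch: page through the whole table, collecting only the partition key.
const AWS = require("aws-sdk");
const docClient = new AWS.DynamoDB.DocumentClient();

async function listAllIds() {
    const ids = [];
    let lastKey; // undefined on the first request
    do {
        const page = await docClient.scan({
            TableName: "Recipes",
            ProjectionExpression: "#key",
            ExpressionAttributeNames: { "#key": "id" },
            ExclusiveStartKey: lastKey
        }).promise();
        page.Items.forEach(item => ids.push(item.id));
        lastKey = page.LastEvaluatedKey; // undefined once the last page has been read
    } while (lastKey);
    return ids;
}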

AWS AppSync delta table not working correctly

I am following the AWS AppSync tutorial and I'm stuck at the Delta Sync step (https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-delta-sync.html).
I have finished the example but the result is not as expected. In the update step, DynamoDB does not create two records (one for when the item was created and one for when it was updated) as in the example. And when I run a delta query, I get this error:
"data": null,
"errors": [
{
"path": [
"syncPosts"
],
"locations": null,
"message": "Cannot return null for non-nullable type: 'PostConnection' within parent 'Query' (/syncPosts)"
}
]
}
My delta table TTL is 1 minute, and delta queries only select from the base table, not the delta table.
Can someone look into this and help me out? Thanks.
I found the solution: I set the delta table's partition_key field to ds_pk and its sort_key field to ds_sk. The result is as expected after the change.

How can I Query on TTL in dynamoDB?

I have set up a TTL attribute in my DynamoDB table. When I push records in, I get the current date (using the JS SDK in Node) and add a value to it (like 5000). It is my understanding that when that time is reached, AWS will purge the record, but only within 48 hours; during that time the record could still be returned as the result of a query.
I want to filter out the expired items so that if they are expired but not yet deleted, they won't be returned as part of the query.
Here is what I am using to try to do that:
var epoch = Math.floor(Date.now() / 1000);
console.log("ttl epoch is ", epoch);
var queryTTLParams = {
    TableName: table,
    KeyConditionExpression: "id = :idval",
    ExpressionAttributeNames: {
        "#theTTL": "TTL"
    },
    FilterExpression: "#theTTL < :ttl",
    ExpressionAttributeValues: {
        ":idval": {S: "1234"},
        ":ttl": {S: epoch.toString()}
    }
};
I do not get any results. I believe the issue has to do with the TTL attribute being a string and me trying to do a < comparison on it. But I didn't get to decide on the data type for the TTL field; AWS did that for me.
How can I remedy this?
According to the Enabling Time to Live AWS documentation, the TTL should be set to a Number attribute:
TTL is a mechanism to set a specific timestamp for expiring items from your table. The timestamp should be expressed as an attribute on the items in the table. The attribute should be a Number data type containing time in epoch format. Once the timestamp expires, the corresponding item is deleted from the table in the background.
You probably just need to create a new Number attribute and set the TTL attribute to that one.
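As a sketch (keeping the low-level client format and the names from the question; the dynamodb client variable and the flipped comparison are assumptions to match the goal of excluding expired items), the query could then look like this:

// Sketch: store and compare the TTL as a Number instead of a String.
var epoch = Math.floor(Date.now() / 1000);

var queryTTLParams = {
    TableName: table,
    KeyConditionExpression: "id = :idval",
    ExpressionAttributeNames: { "#theTTL": "TTL" },
    FilterExpression: "#theTTL > :now", // keep only items that have not expired yet
    ExpressionAttributeValues: {
        ":idval": { S: "1234" },
        ":now": { N: epoch.toString() } // Number values are still passed as strings on the wire
    }
};

dynamodb.query(queryTTLParams, function (err, data) {
    if (err) console.error(err);
    else console.log(data.Items);
});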

Custom tag or code to determine if a specific date is a holiday

I need to figure out if a particular date falls on a holiday, such as Memorial Day, Labor Day, Thanksgiving, Easter, etc. However, these holidays float based on week of month or day of week. I'm sure there is code out there to do this, so I'd hate to reinvent the wheel.
Specifically, I will have a date that something occurs and want to add information about that date, or do extra things (such as add extra pay), if the event happens on a holiday. Something like this, but for every federal holiday:
if ( month( date ) == 9 && day( date ) < 8 && DayOfWeek( date ) == 2 ) {
holiday = 'labor day';
}
On a side note, does anyone know a working URL for the CF custom tag library, or did they kill that?
You can access & cache the Google Calendar US Holiday feed. You'll need to register for a free API key. "FullCalendar" has instructions on establishing a Google Calendar account for use with JS, but you can use CF to consume the JSON once it's configured.
http://fullcalendar.io/docs/google_calendar/
View the source and server JSON response on this page using browser developer F12 tools:
http://fullcalendar.io/js/fullcalendar-2.5.0/demos/gcal.html
The URL you will use will look something like this (but with your own API key):
https://www.googleapis.com/calendar/v3/calendars/usa__en@holiday.calendar.google.com/events?key=AIzaSyDcnW6WejpTOCffshGDDb4neIrXVUA1EAE
The JSON response you get back will have an "items" array containing holiday information and the dates in ISO 8601 format.
"items": [
{
"kind": "calendar#event",
"etag": "\"2778476758000000\"",
"id": "20140101_60o30dr46oo30c1g60o30dr4ck",
"status": "confirmed",
"htmlLink": "https://calendar.google.com/calendar/event?eid=MjAxNDAxMDFfNjBvMzBkcjQ2b28zMGMxZzYwbzMwZHI0Y2sgdXNhX19lbkBo",
"created": "2014-01-09T03:32:59.000Z",
"updated": "2014-01-09T03:32:59.000Z",
"summary": "New Year's Day",
"creator": {
"email": "usa__en#holiday.calendar.google.com",
"displayName": "Holidays in United States",
"self": true
},
"organizer": {
"email": "usa__en#holiday.calendar.google.com",
"displayName": "Holidays in United States",
"self": true
},
"start": {
"date": "2014-01-01"
},
"end": {
"date": "2014-01-02"
},
"transparency": "transparent",
"visibility": "public",
"iCalUID": "20140101_60o30dr46oo30c1g60o30dr4ck#google.com",
"sequence": 0
}
]
I recommend saving the API response data (so you can reuse it locally without having to rely on the remote API), generating a struct keyed by ISO 8601 dates (yyyy-MM-DD), and having each value be an array of holiday names. You may want to extend this to denote whether it's a Federal "paid" holiday or not, as I don't believe the Google Calendar API (or any holiday date library) will identify that for you.
Holidays = {
"2015-12-31" = ["New Year's Eve"],
"2016-01-01" = ["New Year's Day"]
}
and then use something like this as your logic:
holidayNames = '';
if (StructKeyExists(Holidays, DateFormat(theDate, "yyyy-MM-DD"))){
holidayNames = ArrayToList(Holidays[DateFormat(theDate, "yyyy-MM-DD")]);
}
UPDATE: While googling for a ColdFusion-based library, I realized I wrote a GetGoogleHolidays UDF a year prior. (The UDF fetches US holiday JSON data using the Google Calendar API, generates a struct with YYYYMMDD keys containing an array of holiday names, and caches it for 24 hours.)
http://gamesover2600.tumblr.com/post/104768724954/fetch-holiday-dates-from-google-calendar-api-using