DynamoDB column value reset to its initial value every day

I have a table called 'Alunos' with a boolean column called 'Presente', whose initial value is 'true'. If a student didn't attend class, it changes to 'false'.
I need to know how to make this value return to 'true' automatically when the day changes.
Example: today is 9/10/19 and the value of the column is false. When the day changes - 10/10/19 - it should return to true (its initial value).
How could I do that? Any tips?
Also, after doing that, I want to create a report or dashboard about this data, a monthly report for example. Do you have any tips for that as well?
Hope I made myself clear.
Thanks

You should not change any existing value.
Here is what you should do:
Create one entry for every day, with the default value set to true or false.
Change the value based on the user's attendance.
Use an enum instead of true/false.
This will make sure:
You can see past trends.
You can have more complex patterns, e.g. a holiday can take a value other than true or false (see the sketch below).
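A minimal sketch of that one-item-per-day model, using boto3; the key and attribute names here are assumptions, not from the question:

```python
import boto3

table = boto3.resource('dynamodb').Table('Alunos')

# One item per student per day; 'AlunoId' and 'Data' are assumed key names.
table.put_item(Item={
    'AlunoId': '123',        # partition key: the student
    'Data': '2019-10-09',    # sort key: the class day
    'Status': 'PRESENTE',    # enum instead of a boolean, e.g. PRESENTE | AUSENTE | FERIADO
})
```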
If you are adamant about using a single value per user, have a Lambda function triggered by a scheduled CloudWatch event every day that resets the value. But that will cause a significant number of writes every day; instead of doing that, you can check whether the value is stale and update it only if it is.
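A minimal sketch of that lazy reset on read, assuming the table keeps a hypothetical 'UpdatedAt' date attribute alongside 'Presente':

```python
import boto3
from datetime import date

table = boto3.resource('dynamodb').Table('Alunos')

def get_presente(aluno_id):
    item = table.get_item(Key={'AlunoId': aluno_id})['Item']
    today = date.today().isoformat()
    if item.get('UpdatedAt') != today:
        # The stored value is from a previous day: reset it lazily, on read.
        table.update_item(
            Key={'AlunoId': aluno_id},
            UpdateExpression='SET Presente = :t, UpdatedAt = :d',
            ExpressionAttributeValues={':t': True, ':d': today},
        )
        return True
    return item['Presente']
```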

This can solve your problem:
Time to Live (TTL) for Amazon DynamoDB lets you define when items in a table expire so that they can be automatically deleted from the database. TTL is provided at no extra cost as a way to reduce storage usage and reduce the cost of storing irrelevant data without using provisioned throughput.
And here is an example of setting up TTL.
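A minimal boto3 sketch, assuming an attribute named 'ExpiresAt' that holds the expiry time as a Unix epoch Number:

```python
import boto3

client = boto3.client('dynamodb')
client.update_time_to_live(
    TableName='Alunos',
    TimeToLiveSpecification={
        'Enabled': True,
        'AttributeName': 'ExpiresAt',  # must contain a Unix epoch time as a Number
    },
)
```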

Related

DynamoDB TTL fix strategy - set to wrong type (String and not Number)

We just realized that we have set the TTL attribute on the DynamoDB table as a String and not as a Number. The TTL field is named "ttl", unfortunately.
"ttl" was set to expire records that are older than 365 days.
This table has grown and is very, very big... 26 GB.
What is the quickest way to change the type of "ttl" from String to Number?
A few thoughts that we had:
A. Change the code so that when a "new record" is saved, we set its "ttl" field to a Number. This works if you try it through the console! In this case, we would write a "clean up" job to go through previous records and fix the type to Number.
B. Create a new field "ttl_expiry" of type Number, so when a "new record" is saved we set "ttl_expiry" and NOT the original "ttl" column. In this case, we would write a "clean up" job to go through previous records, convert from String to Number, and copy into the "ttl_expiry" field.
What other strategies can we use to fix this TTL issue?
Is there a quick batch update we can do?
Ah yes, and we are using Java :-)
Any help/suggestions would be appreciated.
Either way, you'd have to update all records in the table, which means you need to read and update each of them. That's going to be comparatively pricey and can't really be optimized if you're doing it online.
For DynamoDB it also doesn't really matter which of the options you take: TTL only cares about the attribute it has been configured with having the type Number (if you change the configured attribute name, it will look for that one instead).
I'd be more worried about Java; it tends to be fussy with data types in the sense that it cares more about them than, say, Python. Option B may be more appropriate in this case and allows you to read "old" records without issues.
Shorter attribute names help you save on storage cost, so keeping "ttl" is probably preferable, but given the modest size of the data that's not going to be a significant number.
Bottom line: either way can work. With Java the mechanical sympathy is probably higher with B); if you're optimizing for ongoing cost, go for A).
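A rough sketch of the clean-up job for option A, in Python for brevity (the same scan-and-update loop translates directly to the Java SDK); the table and key names are placeholders:

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource('dynamodb').Table('my-table')  # placeholder name

scan_kwargs = {
    'ProjectionExpression': 'pk, #t',            # 'pk' is a placeholder key name
    'ExpressionAttributeNames': {'#t': 'ttl'},   # alias 'ttl' in case it is reserved
}
start_key = None
while True:
    if start_key:
        scan_kwargs['ExclusiveStartKey'] = start_key
    page = table.scan(**scan_kwargs)
    for item in page.get('Items', []):
        if isinstance(item.get('ttl'), str):     # only touch String-typed values
            try:
                table.update_item(
                    Key={'pk': item['pk']},
                    UpdateExpression='SET #t = :num',
                    ConditionExpression='#t = :old',  # skip items changed meanwhile
                    ExpressionAttributeNames={'#t': 'ttl'},
                    ExpressionAttributeValues={':num': int(item['ttl']),
                                               ':old': item['ttl']},
                )
            except ClientError as e:
                if e.response['Error']['Code'] != 'ConditionalCheckFailedException':
                    raise
    start_key = page.get('LastEvaluatedKey')
    if start_key is None:
        break
```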

What's the cheapest way to store an auto increment indexed list of values in AWS?

I have a DynamoDB-based web application that stores my large JSON objects and performs simple CRUD operations on them via a web API. I would like to add a new table that acts as a categorization of these values. The user should be able to select from a selection box which category the object belongs to. If a desirable category does not exist, the user should be able to create a new category, specifying a name which will be available to other objects in the future.
It is critical to the application that every one of these categories be given an integer ID, starting at 1 and incrementing. These auto-generated numbers will turn into reproducible serial numbers for back-end reports that will not use the user-visible text name.
So I would like a simple API, available from the web frontend, that allows me to:
A) GET /category : produces { int : string, ... } of all categories mapped to their IDs
B) PUSH /category : accepts a string and stores it under the next integer
Here are some ideas for how to handle this kind of project.
Store it in DynamoDB with integer indexes. This has some benefits, but it leaves a lot to be desired. Firstly, there's no auto-incrementing ID in DynamoDB, but I could definitely get the state of the table, create a new ID, and store the result. This might have issues with consistency and race conditions, though there's probably a way to achieve it safely. It might, however, be a big anti-pattern to use DynamoDB this way.
Store it in DynamoDB as one object in a table with some random index. Just store the mapping as a JSON object. This really abandons the notion of tables in DynamoDB and uses it as a simple file. It might also run into some issues with race conditions.
Use AWS ElastiCache to have a Redis key-value store. This might be "the right" decision, but the downside is that ElastiCache is an always-on DB offering where you pay per hour. For a low-traffic web site like mine I'd be paying a minimum of $12/mo, I think, and I would really like this to be pay-per-access/update due to the low volume. I'm not sure there's an auto-increment feature built into Redis the way I'd need it, but it's pretty trivial to make a transaction that gets the length of the table, adds one, and stores a new value. Race conditions are easily avoided with this solution.
Use a SQL database like AWS Aurora or MySQL. This has the same upsides as Redis, but it's even more overkill, costs a lot more, and is still always on.
Run my own in-memory web service or MongoDB etc. - you're still paying for constantly running containers. Writing my own thing is obviously silly, and I'm sure there are services that match this issue perfectly, but they'd all require a constantly running container.
Is there a good way to just store a simple list or integer mapping like this without a constant monthly cost? Is there a better way to do this with DynamoDB?
Store the maxCounterValue as an item in DynamoDB.
For PUSH /category, perform the following:
Get the current maxCounterValue.
TransactWrite:
Put the category name and id into a new item, with id = maxCounterValue + 1.
Update maxCounterValue to maxCounterValue + 1, with a ConditionExpression checking that maxCounterValue = :valueFromGetOperation.
If the TransactWrite fails, start again at step 1 and retry up to X more times.
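A sketch of those steps with boto3's low-level client; the table name 'categories' and the single-table key shape are assumptions:

```python
import boto3

client = boto3.client('dynamodb')

def push_category(name, retries=5):
    for _ in range(retries):
        # 1) Read the current maxCounterValue (0 if it has never been written).
        counter = client.get_item(
            TableName='categories',
            Key={'pk': {'S': 'maxCounterValue'}},
        ).get('Item', {})
        current = int(counter.get('val', {}).get('N', '0'))
        new_id = current + 1
        try:
            # 2) Atomically create the category AND bump the counter.
            client.transact_write_items(TransactItems=[
                {'Put': {
                    'TableName': 'categories',
                    'Item': {
                        'pk': {'S': f'category#{new_id}'},
                        'name': {'S': name},
                        'id': {'N': str(new_id)},
                    },
                }},
                {'Update': {
                    'TableName': 'categories',
                    'Key': {'pk': {'S': 'maxCounterValue'}},
                    'UpdateExpression': 'SET val = :new',
                    'ConditionExpression': 'val = :old OR attribute_not_exists(val)',
                    'ExpressionAttributeValues': {
                        ':new': {'N': str(new_id)},
                        ':old': {'N': str(current)},
                    },
                }},
            ])
            return new_id
        except client.exceptions.TransactionCanceledException:
            continue  # another writer won the race; re-read and retry
    raise RuntimeError('could not allocate a category id')
```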

Waiting for a table to be completely deleted

I have a table that has to be refreshed daily from an external source. All the recommendations I read say to delete the whole table and re-create it instead of deleting all the items.
I tried the suggested method, but the deleteTable function returns successfully even though the table is still in a state of "Table is being deleted", as seen in the DynamoDB console. Sometimes this takes more than a minute.
What is the proper way of deleting and re-creating a table? Should I just keep trying createTable until the "already exists" error goes away?
I am using Node.js.
(The table is a list of some 5,000+ bus stops. The source doesn't specify how often the data changes nor give any indicator that there are changes. I found a small number of changes once every few weeks.)
If you are using boto3 (Python), there is a waiter called TableNotExists:
Polls DynamoDB.Client.describe_table() every 20 seconds until a successful state is reached. An error is returned after 25 failed checks.
Or, you could just do that polling yourself.
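For example, with boto3 (the table name is a placeholder; the Node.js SDK offers a similar waitFor('tableNotExists')):

```python
import boto3

client = boto3.client('dynamodb')
client.delete_table(TableName='my-table')

# Blocks until describe_table stops finding the table (or the waiter gives up).
client.get_waiter('table_not_exists').wait(TableName='my-table')

# Now it is safe to recreate the table under the same name.
```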
I would suggest changing the table name each day, using the current date as part of the table name. Then you can create the new table and start populating it without having to wait for the delete of the previous day's table to complete.
If the response from the createTable method is a "Table already exists" exception, the exception also contains a retryDelay property, which is a number.
I can't find documentation on retryDelay, but it seems to be a duration in seconds.
I use the "Table already exists" exception to check that the table is not yet completely deleted and, if it isn't, back off for the period specified in the retryDelay property. After a few iterations, the table can be successfully created.
Sometimes the value in retryDelay can be more than 20.
This approach has worked for me without issues every time.

Ensuring Dynamo retrieves *exactly* n results, given a filter expression

In DynamoDB is there a way to guarantee that exactly n results will be returned if I specify a limit and a filter?
The problem I see is that the docs state:
In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).
So this means six items will be retrieved and then the filter applied. How can I keep searching until I get exactly six items? (Ideally there would be some setting in the query to keep going until the limit has been reached, or the table exhausted.)
For example, suppose I make a query to get 50 people whose name is "john". Dynamo would read 50 people and then apply the "john" filter; now only 3 people are returned.
Is there a way I can ensure it will keep searching until the limit of 50 is satisfied?
I don't want to use a Scan, since a Scan always searches every item in the table (regardless of limit - correct me if I'm wrong on this).
How can I make the query apply the filter lazily, continuing to search until the Limit is satisfied?
If you can filter in the query itself, that'll be best, since you wouldn't have to use a filter expression. If you can't, the way Dynamo works suggests the filter is just applied over the results after they are read - basically a way to save on bandwidth, not much more. You can still use pagination to get more results, and if you're using Dynamo you probably care about the rate at which you're querying, so having control over how many queries you're actually doing (and their size) is a good thing. A pagination sketch follows.
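A sketch of that pagination loop, collecting filtered matches until n are found or the key range is exhausted; the table and attribute names are placeholders:

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource('dynamodb').Table('people')

def query_exactly_n(pk_value, name, n):
    results, start_key = [], None
    while len(results) < n:
        kwargs = {
            'KeyConditionExpression': Key('pk').eq(pk_value),
            'FilterExpression': Attr('name').eq(name),
        }
        if start_key:
            kwargs['ExclusiveStartKey'] = start_key
        page = table.query(**kwargs)
        results.extend(page['Items'])
        start_key = page.get('LastEvaluatedKey')
        if start_key is None:
            break  # key range exhausted: fewer than n matches exist
    return results[:n]
```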

Auto-increment on Azure Table Storage

I am currently developing an application for Azure Table Storage. In that application I have a table which will have relatively few inserts (a couple of thousand per day), and the primary key of these entities will be used in another table, which will have billions of rows.
Therefore I am looking for a way to use an auto-incremented integer, instead of a GUID, as the primary key in the small table (since it will save lots of storage, and scalability of the inserts is not really an issue).
There have been some discussions on the topic, e.g. on http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/6b7d1ece-301b-44f1-85ab-eeb274349797.
However, since concurrency problems can be really hard to debug and spot, I am a bit uncomfortable implementing this on my own. My question is therefore: is there a well-tested implementation of this?
For everyone who finds this in search, there is a better solution. The minimal time for a table lock is 15 seconds - that's awful. Do not use it if you want to create a truly scalable solution. Use the ETag instead!
Create one entity in the table for the ID (you can even name it ID or whatever).
1) Read it.
2) Increment.
3) InsertOrUpdate WITH the ETag specified (from the read query).
If the last operation (InsertOrUpdate) succeeds, then you have a new, unique, auto-incremented ID. If it fails (an exception with HttpStatusCode == 412), it means some other client changed it, so repeat steps 1 to 3.
The usual time for Read+InsertOrUpdate is less than 200 ms. My test utility, with source, is on GitHub.
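With the modern azure-data-tables Python SDK, that loop looks roughly like this (a sketch; the entity and key names are assumptions):

```python
from azure.core import MatchConditions
from azure.core.exceptions import ResourceModifiedError
from azure.data.tables import TableClient, UpdateMode

conn_str = "..."  # storage account connection string (placeholder)
table = TableClient.from_connection_string(conn_str, table_name='counters')

def next_id():
    while True:
        entity = table.get_entity(partition_key='ids', row_key='counter')
        entity['value'] += 1
        try:
            # Succeeds only if the ETag still matches, i.e. nobody else
            # incremented the counter between our read and this write.
            table.update_entity(
                entity,
                mode=UpdateMode.REPLACE,
                etag=entity.metadata['etag'],
                match_condition=MatchConditions.IfNotModified,
            )
            return entity['value']
        except ResourceModifiedError:  # HTTP 412: another client won; retry
            continue
```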
See the UniqueIdGenerator class by Josh Twist.
I haven't implemented this yet but am working on it ...
You could seed a queue with your next IDs to use, then just pick them off the queue when you need them.
You need to keep a table containing the value of the biggest number added to the queue. If you know you won't be using a ton of the integers, you could have a worker wake up every so often and make sure the queue still has integers in it. You could also have a used-int queue the worker could check to keep an eye on usage.
You could also hook that worker up so that, if the queue happened to be empty when your code needed an ID, it could interrupt the worker's nap to create more keys ASAP.
If that call failed, you would need a way to tell the worker you are going to do the work for them (lock), then do the worker's work of getting the next ID and unlock:
lock
get the last key created from the table
increment and save
unlock
then use the new value.
The solution I found that prevents duplicate IDs and lets you auto-increment is to lock (lease) a blob and let that act as a logical gate:
Acquire the lease.
Read the value.
Write the incremented value.
Release the lease.
Use the value in your app/table.
If your worker role were to crash during that process, you would only end up with a missing ID in your store. IMHO that is better than duplicates.
Here is a code sample and more information on this approach from Steve Marx.
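The linked sample is C#; a rough Python equivalent with azure-storage-blob (container and blob names are placeholders):

```python
from azure.storage.blob import BlobClient

conn_str = "..."  # storage account connection string (placeholder)
blob = BlobClient.from_connection_string(
    conn_str, container_name='locks', blob_name='id-counter')

def next_id():
    # The lease is the logical gate: no one else can write while we hold it.
    lease = blob.acquire_lease(lease_duration=15)  # 15 s is the minimum
    try:
        current = int(blob.download_blob(lease=lease).readall())
        blob.upload_blob(str(current + 1), overwrite=True, lease=lease)
        return current + 1
    finally:
        lease.release()
```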
If you really need to avoid GUIDs, have you considered using something based on date/time and then leveraging partition keys to minimize the concurrency risk?
Your partition key could be by user, year, month, day, hour, etc., and the row key could be the rest of the datetime, at a small enough timespan to control concurrency.
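For illustration, a quick sketch of such keys (the hourly granularity is an assumption to tune):

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
partition_key = now.strftime('%Y%m%d%H')  # one partition per hour
row_key = now.strftime('%M%S%f')          # rest of the timestamp, down to microseconds
```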
Of course, you have to ask yourself, at the price of data in Azure, whether avoiding a GUID is really worth all of this extra effort (assuming a GUID would just work).