DynamoDB - TransactWrite, how to do If Else condition - amazon-web-services

An example, a table with a column counter with only 1 row. I want to update this row's counter value based on its current value.
Say if counter is >= 10, set it to 0, otherwise counter++. How to achieve this if else clause in TransactWrite?
I can't have two actions in one transaction because Documentation states that it does not allow more than 1 action on the same item.
And of course, the reason I use TransactWrite is because there will be multiple lambda doing this task in parallel.

You cannot do it in one request, transactional or otherwise.
You can get the item, decide what to do, and then update the item in a second request accordingly. You’ll want to keep a version number or timestamp attribute to make sure the item hasn’t changed between the read and write, and use a condition expression to fail if it has.
That’s a common idiom:
https://dynobase.dev/dynamodb-locking/

You can't do it the way you like, because if/else is not supported in transactions. There is however a simpler solution.
Just Update the item and increment the counter. Whenever you read the value, you take the counter value modulo 10, and you get the desired behavior, e.g. 123 % 10 = 3 or 10 % 10 = 0.

Related

R Studio: Mutating a column variable based on two selection conditions

The dataframe above represents a repeated-measures design, each participant took part in both task A and B. The condition determines which order the tasks occurred - if in condition 1, then Task A came first followed by task B, and vice-versa for condition 2.
I would like to mutate a new column in my dataframe called 'First Task'. This column must represent the scores from the task that always occurred first. For example, participant 1001 was in condition 1, so their score from task A should go into this first task column. For participant 1002, in condition 2, their score from task B should go into the first task column, and so on.
After scouring possible threads (which have always solved every need I have!) I considered using the mutate function, combined with cases_when (group == 1), and thereafter I am not sure how to properly pipe something along the lines of select score from TASK A. Alternatively, I considered how I may go about using if or ifelse, probably the likely piece of code to execute something like this?
It would be an elegant piece of coding like this I am after, as opposed to re-creating a new dataframe . I would greatly appreciate any thoughts or ideas on this. Let me know if this is clear (note I have simplified this image as a example to make the question clearer).
Many Thanks community

AWS CloudWatch Logs Insights

I am working with data that comes in similar format to this in insights. Sometimes the value of J may be missing and I want to set the value as the value of B if this is the case. Is there any way to do conditional logic similar to this on data in CloudWatch Insights? I have explored ispresent() but cannot figure out how to do the conditional logic.
Example:
B
J
1
3
2
4
2
for the last I would like to set the data equal in J to 2 when I run the query.
You may be able to use coalesce(J, B) which won't set the J to 2 but can be assigned to a new field (e.g. fields coalesce(J, B) as newB that can be used for display or additional logic. coalesce takes 2+ arguments and returns the first value that isn't blank.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html (search for coalesce)

How to conditionally execute a SET operation in DynamoDB

I have an aggregations table in DynamoDb with the following columns: id, sum, count, max, min, and hash. I will ALWAYS want to update sum and count but will want to update min and max only when I have values greater than/lesser than the values already in the database. Also, I only want this operation to succeed when the stored hash is different from what I am sending, to prevent reprocessing the same data.
I currently have these:
UpdateExpression: ADD sum :sum ADD count :count SET hash :hash
UpdateCondition: attribute_not_exists(hash) OR hash <> :hash
The thing is that I need something like this for min and max:
SET min :min IF :min < min and something alike for max. Of course, this doesn't currently work. I could not find a suitable update function that would perform this comparision in DynamoDb. What is the proper way to achieve this.
PS.: I already was suggested doing multiple requests to dynamodb and place the max/min as UpdateConditions, but I want to avoid these multiple requests approach for data consistency reasons.
PS2.: Another way to express what I want in a JavaScript-sh way would be something like SET :min < min ? :min : min
I got to a solution to this problem by realizing that what I wanted was just not possible. There must be just one condition to the entire update and since there is no such thing as SET min = minimum(:min, min) I had to accept my fate and make more than one UpdateItem request to DynamoDB.
The nice thing is that the order of execution of these updates doesn't matter. The hard thing here is to make sure that each update is executed exactly once. Because we are firing a lot of requests (and having peaks eventually) there is a real chance of some failing updates due to ProvisionedThroughputExceededException or maybe just some rate limiting from AWS.
So here is my final solution;
Lambda function receives payload with hundreds of data points.
Lambda function aggregates this data points in memory and produces an intermediary aggregation object of the form {id, sum, count, min, max}.
Lambda function generates 3 update objects per aggregation object, of the forms (these updates are referring to the same record):
{UpdateExpression: 'ADD #SUM :sum, #COUNT :count'}
{ConditionExpression: '#MAX < :max OR attribute_not_exists(#MAX)', UpdateExpression: 'SET #MAX = :max'}
{ConditionExpression: '#MIN > :min OR attribute_not_exists(#MIN)', UpdateExpression: 'SET #MIN = :min'}
Because we need to be 100% sure that these updates will always be processed with success, then the lambda function sends them to a FIFO SQS queue (as 3 separate messages). I am not using a FIFO queue here because I want the order to be preserved but because I want the guarantee of exactly once delivery.
A consumer keeps pooling the queue and whenever there are messages it just shoots them to DynamoDB as the parameter of .updateItem.
At the end of this process, I was able to do real-time aggregations for thousands of records :)
PS.: Got rid of the hash column
It is not possible to do this in a single update since UpdateExpression doesn't support functions like max() and min(). The documentation for supported operations and functions can be found here
The best way to achieve the same effect is to add a field called latest or something similar which stores the latest value. You will need to change your update expression to be something like the following.
UpdateExpression: SET hash = :hash, latest = :latest, sum = sum + :latest, count = count + :num
Where :hash is of course your update hash to guard against replays, :latest is the latest value, and :num is 1 or whatever your increment is.
Then you can use DynamoDB Streams with a Lambda that looks at each update and checks if latest is less than min or greater than max. If not, ignore the update, otherwise perform a second update to set min or max to the latest value accordingly.
The main drawback to this approach is that there will be a small window where latest might be outside of the range of min or max however, this can be normalized easily in your application code when you read the records.
You should also consider the additional cost that will result from the DynamoDB Stream and Lambda invocations
I had a similar situation where I needed to atomically update a min value, and ended up doing this:
Let each item have an attribute of type Set (NS) keeping the candidate values for the minvalue, and when you want to set a new value that might be the new min, just add it to the set. Then at read time, find the lowest number in the set on the client side.
This is atomic and requires no condition expression, but has the downside that the set grows over time, so I added a clean up request to run as needed, for example when the set has more than N values, or simply on every get. The clean up might need to use a condition expression to be concurrent safe though, depending on if you also remove values through other use cases.
This does not solve all scenarios, but worked for me. In my case the value was a timestamp of an event in the future, and I wanted to store when the next event occurs. I could then easily also clean up by removing all values in the past.
Summary:
Set new potentially minimum value: ADD #values :value.
Read minimum value: GetItem followed by finding the lowest value in values client-side. This could if needed be combined with a clean up that finds all obsolete values, then calls UpdateItem DELETE #values [x, y, z...]

Order items with single write

High level overview with simple integer order value to get my point across:
id (primary) | order (sort) | attributes ..
----------------------------------------------------------
ft8df34gfx 1 ...
ft8df34gfx 2 ...
ft8df34gfx 3 ...
ft8df34gfx 4 ...
ft8df34gfx 5 ...
Usually it would be easy to change the order (e.g if user drags and drops list items on front-end): shift item around, calculate new order values and update affected items in db with new order.
Constraints:
Doesn't have all the items at once, only a subset of them (think pagination)
Update only a single item in db if single item is moved (1 item per shift)
My initial idea:
Use epoch as order and append something unique to avoid duplicate epoch times, e.g <epoch>#<something-unique-to-item>. Initial value is insertion time (default order is therefore newest first).
Client/server (whoever calculates order) knows the epoch for each item in subset of items it has.
If item is shifted, look at the epoch of previous and next item (if has previous or next - could be moved to first or last), pick a value between and update. More than 1 shifts? Repeat the process.
But..
If items are shifted enough times, epoch values get closer and closer to each other until you can't find a middleground with whole integers.
Add lots of zeroes to epoch on insert? Still reach limit at some point..
If item is shifted to first or last and there are items in previous or next page (remember, pagination), we don't know these values and can't reliably find a "value between".
Fetch 1 extra hidden item from previous and next page? Querying gets complicated..
Is this even possible? What type/value should I use as order?
DynamoDB does not allow the primary partition and sort keys to be changed for a particular item (to change them, the item would need to be deleted and recreated with the new key values), so you'll probably want to use a local or global secondary index instead.
Assuming the partition/sort keys you're mentioning are for a secondary index, I recommend storing natural numbers for the order (1, 2, 3, etc.) and then updating them as needed.
Effectively, you would have three cases to consider:
Adding a new item - You would perform a query on the secondary partition key with ScanIndexForward = false (to reverse the results order), with a projection on the "order" attribute, limited to 1 result. That will give you the maximum order value so far. The new item's order will just be this maximum value + 1.
Removing an item - It may seem unsettling at first, but you can freely remove items without touching the orders of the other items. You may have some holes in your ordering sequence, but that's ok.
Changing the order - There's not really a way around it; your application logic will need to take the list of affected items and write all of their new orders to the table. If the items used to be (A, 1), (B, 2), (C, 3) and they get changed to A, C, B, you'll need to write to both B and C to update their orders accordingly so they end up as (A, 1), (C, 2), (B, 3).

Limiting and Ordering the Scan Result AWS

I am using AWS mobilehub and I create a dynamoDb table(userId, username, usertoplevel, usertopscore).
My Partition key is a string (userId) and I have created one Global Search Index (GSI) in which i make usertoplevel is Partition key and usertopscore as Sort Key. I can successfully query for all items by the following code
final DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
List<UserstopcoreDO> results;
DynamoDBMapper mapper = AWSMobileClient.defaultMobileClient().getDynamoDBMapper();
results = mapper.scan(UserstopcoreDO.class, scanExpression);
for (UserstopcoreDO usertopScore : results) {
Logger.d("SizeOfUserScore : " + usertopScore.getUsertopscore());
}
Now I have 1500+ records in the table and I want to limit the result to fetch only the top 10 users. I will be thankful if someone help.
In order to achieve this you need to move away from Scan and use Query operation.
The query operation provides you an option to specify if the index should be read forwards or in reverse.
In order to get the top 10 results, you need to limit the results returned to 10. This can be done by setting a limit on your query operation.
Therefore to summarize:
Start using a query operation instead of scan.
Set the scanIndexForward to false to start reading results in descending order.
Set a limit on your query operation to return top 10 results.
This page describes all the things I mentioned in this answer: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
The limit can be set in the scan expression. Please read the definition of the LIMIT carefully. It is the limit for the maximum number of items to evaluate. However, you don't need worry if there is no filter expression used in scan.
In case, if you use filter expression, you may need to do recursive scan until LastEvaluatedKey is null.
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression().withLimit(10);
The maximum number of items to evaluate (not necessarily the number of
matching items). If DynamoDB processes the number of items up to the
limit while processing the results, it stops the operation and returns
the matching values up to that point, and a key in LastEvaluatedKey to
apply in a subsequent operation, so that you can pick up where you
left off. Also, if the processed data set size exceeds 1 MB before
DynamoDB reaches this limit, it stops the operation and returns the
matching values up to the limit, and a key in LastEvaluatedKey to
apply in a subsequent operation to continue the operation.