I'm looking to ensure isolation when multiple transactions may execute a database insert or update, where the old value is required for the process.
Here is an MVP in Python-like pseudocode; the default isolation level is assumed:
sql('BEGIN')
rows = sql('SELECT `value` FROM table WHERE `id`=<id> FOR UPDATE')
if rows:
    old_value, = rows[0]
    process(old_value, new_value)
    sql('UPDATE table SET `value`=<new_value> WHERE `id`=<id>')
else:
    sql('INSERT INTO table (`id`, `value`) VALUES (<id>, <new_value>)')
sql('COMMIT')
The issue with this is that FOR UPDATE leads to an IS lock, which does not prevent two transactions from proceeding. This results in a deadlock when both transactions then attempt to UPDATE or INSERT.
Another way to do this is to first try the insert, and update if there is a duplicate key:
sql('BEGIN')
rows_changed = sql('INSERT IGNORE INTO table (`id`, `value`) VALUES (<id>, <new_value>)')
if rows_changed == 0:
    rows = sql('SELECT `value` FROM table WHERE `id`=<id> FOR UPDATE')
    old_value, = rows[0]
    process(old_value, new_value)
    sql('UPDATE table SET `value`=<new_value> WHERE `id`=<id>')
sql('COMMIT')
The issue with this solution is that a failed INSERT leads to an S lock, which does not prevent two transactions from proceeding either, as described here: https://stackoverflow.com/a/31184293/710358.
Of course, any solution requiring a hardcoded wait or locking the entire table is not satisfactory for production environments.
A hack to solve this issue is to use INSERT ... ON DUPLICATE KEY UPDATE ..., which always takes an X lock. Since you need the old value, you can perform a blank update and then proceed as in your second solution:
sql('BEGIN')
rows_changed = sql('INSERT INTO table (`id`, `value`) VALUES (<id>, <new_value>) ON DUPLICATE KEY UPDATE `value`=`value`')
if rows_changed == 0:
    rows = sql('SELECT `value` FROM table WHERE `id`=<id> FOR UPDATE')
    old_value, = rows[0]
    process(old_value, new_value)
    sql('UPDATE table SET `value`=<new_value> WHERE `id`=<id>')
sql('COMMIT')
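For concreteness, here is a minimal runnable sketch of that last approach using PyMySQL; the driver choice, connection details, and table name t are my assumptions, not part of the question:

import pymysql

# Connection details and the table name `t` are placeholders.
conn = pymysql.connect(host='localhost', user='app', password='secret', database='appdb')
try:
    conn.begin()
    with conn.cursor() as cur:
        # ON DUPLICATE KEY UPDATE takes an X lock on the existing row, so a
        # concurrent transaction blocks here instead of deadlocking later.
        changed = cur.execute(
            'INSERT INTO t (`id`, `value`) VALUES (%s, %s) '
            'ON DUPLICATE KEY UPDATE `value` = `value`',
            (id_, new_value))
        if changed == 0:  # the row already existed and the blank update changed nothing
            cur.execute('SELECT `value` FROM t WHERE `id` = %s FOR UPDATE', (id_,))
            (old_value,) = cur.fetchone()
            process(old_value, new_value)  # the caller's processing step
            cur.execute('UPDATE t SET `value` = %s WHERE `id` = %s', (new_value, id_))
    conn.commit()
except Exception:
    conn.rollback()
    raise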
I'd like to write a DynamoDB query in which I filter on a certain field; sounds simple.
All the examples I find always include the partition key value, which really confuses me, since that is a unique value, but I want a list.
I have id as the partition key and no sort key or any other index. I tried to add partner as an index, but it did not make any difference.
AttributeValue attribute = AttributeValue.builder()
.s(partner)
.build();
Map<String, AttributeValue> expressionValues = new HashMap<>();
expressionValues.put(":value", attribute);
Expression expression = Expression.builder()
.expression("partner = :value")
.expressionValues(expressionValues)
.build();
QueryConditional queryConditional = QueryConditional
.keyEqualTo(Key.builder()
.partitionValue("id????")
.build());
Iterator<Product> results = productTable.query(r -> r.queryConditional(queryConditional)).items().iterator();
Would appreciate any help. Is there a misunderstanding on my side?
DynamoDB has two distinct, but similar, operations - Query and Scan:
Scan is for reading the entire table, including all partition keys.
Query is for reading a specific partition key - and all sort keys in it (or a contiguous range of sort keys, hence the nickname "range key" for that key).
If your data model does not have a range key, Query is not relevant for you - you should use Scan.
However, this means that each time you call this query, the entire table will be read. Unless your table is tiny, this doesn't make economic sense, and you should reconsider your data model. For example, if you frequently look up results by the "partner" attribute, you can consider creating a GSI (global secondary index) with "partner" as its partition key, allowing you to quickly and cheaply fetch the list of items with a given "partner" value without scanning the entire table.
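As a rough boto3 sketch of both options (the table name, index name, and partner value here are assumptions based on the question):

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource('dynamodb').Table('products')  # hypothetical table name

# Option 1: Scan the whole table and filter on "partner". The filter is applied
# after the read, so you still pay to read every item in the table.
scanned = table.scan(FilterExpression=Attr('partner').eq('some-partner'))['Items']

# Option 2: Query a GSI whose partition key is "partner" (index name assumed).
# Only the matching partition is read, which is fast and cheap.
queried = table.query(
    IndexName='partner-index',
    KeyConditionExpression=Key('partner').eq('some-partner'),
)['Items']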
I use a Lambda to detect whether there is any isActive record in my table, and put_item to update the id if there is.
For example, I have a placeholder record with ID 999999999; if my table query detects there's an active record (isActive = True), it will put_item with the real session_id and other data.
Table record (from the screenshot): a placeholder item with session_id = 999999999, isActive = True, and count_1 = 13.
My lambda has the following section (from my CloudWatch logs, the if...else statement is working as intended to verify the logic). Please ignore indentation hiccups from copy and paste; the code runs with no issue.
# Keep isActive = True when there is already an active status started from
# another source; just update the session_id from 999999999 to the real session_id.
else:
    count_1 = query["Items"][0]["count_1"]  # from an earlier part of the code: the current count_1 value from the table
    print(count_1)  # prints the correct value '13' for the current table item id = '999999999'
    table.put_item(
        Item={
            'session_id': session_id,
            'isActive': True,
            'count_1': count_1,
            'count_2': count_2
        },
        ConditionExpression='session_id = :session_id AND isActive = :isActive',
        ExpressionAttributeValues={
            ':session_id': 999999999,
            ':isActive': True
        }
    )
However, my table is not getting a new item, nor is the primary key session_id updated. The table still looks like the record above.
I understand from the documentation that
You cannot use UpdateItem to update any primary key attributes.
Instead, you will need to delete the item, and then use PutItem to
create a new item with new attributes.
but even if put_item is not able to update the primary key, I would at least expect a new item to be created by my code when no error is thrown?
Does anybody know what is happening? Thanks.
I resolved it with a different specification for ConditionExpression. After multiple rounds of troubleshooting I pinpointed that the issue comes from ConditionExpression.
What I did instead:
add the import: from boto3.dynamodb.conditions import Key, Attr
use ConditionExpression=Attr("session_id").ne(999999999)
and delete the old id item:
table.delete_item(
Key={
'session_id': 999999999
}
)
Other conditions are available here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/dynamodb.html#ref-dynamodb-conditions
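Put together, the steps above amount to something like this sketch (the table name is an assumption; session_id, count_1, and count_2 come from the question):

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('sessions')  # hypothetical table name

# Write the real session item, guarded by the condition that resolved the issue.
table.put_item(
    Item={
        'session_id': session_id,
        'isActive': True,
        'count_1': count_1,
        'count_2': count_2,
    },
    ConditionExpression=Attr('session_id').ne(999999999),
)

# put_item cannot change a primary key, so remove the placeholder item separately.
table.delete_item(Key={'session_id': 999999999})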
If anyone has a better and easier way, I would like to learn it.
I have a DynamoDB table with partition key as userID and no sort key.
The table also has a timestamp attribute in each item. I wanted to retrieve all the items having a timestamp in the specified range (regardless of userID i.e. ranging across all partitions).
After reading the docs and searching Stack Overflow (here), I found that I need to create a GSI for my table.
Hence, I created a GSI with the following keys:
Partition Key: userID
Sort Key: timestamp
I am querying the index with Java SDK using the following code:
String lastWeekDateString = getLastWeekDateString();
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("user table");
Index index = table.getIndex("userID-timestamp-index");
QuerySpec querySpec = new QuerySpec()
.withKeyConditionExpression("timestamp > :v_timestampLowerBound")
.withValueMap(new ValueMap()
.withString(":v_timestampLowerBound", lastWeekDateString));
ItemCollection<QueryOutcome> items = index.query(querySpec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
Item item = iter.next();
// extract item attributes here
}
I am getting the following error on executing this code:
Query condition missed key schema element: userID
From what I know, I should be able to query the GSI using only the sort key without giving any condition on the partition key. Please help me understand what is wrong with my implementation. Thanks.
Edit: After reading the thread here, it turns out that we cannot query a GSI with only a range on the sort key. So, what is the alternative, if any, to query the entire table by a range query on an attribute? One suggestion I found in that thread was to use year as the partition key. This will require multiple queries if the desired range spans multiple years. Also, this does not distribute the data uniformly across all partitions, since only the partition corresponding to the current year will be used for insertions for one full year. Please suggest any alternatives.
When using the DynamoDB Query operation, you must specify at least the partition key. This is why you get the error that userID is required. From the AWS Query docs:
The condition must perform an equality test on a single partition key value.
The only way to get items without the partition key is by doing a Scan operation (but the results won't be sorted by your sort key!).
If you want to get all the items sorted, you would have to create a GSI with a partition key that will be the same for all items you need (e.g. create a new attribute on all items, such as "type": "item"). You can then query the GSI and specify #type=:item
QuerySpec querySpec = new QuerySpec()
    .withKeyConditionExpression("#type = :item AND #ts > :v_timestampLowerBound")
    .withNameMap(new NameMap()
        .with("#type", "type")      // "type" is a DynamoDB reserved word
        .with("#ts", "timestamp"))  // so is "timestamp"
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString)
        .withString(":item", "item"));
A good solution for any customized querying requirement with DynamoDB is to design the right primary key scheme for the GSI.
When designing a DynamoDB primary key, the main principle is that the hash key should partition the items, and the sort key should sort the items within that partition.
Having said that, I recommend using the year of the timestamp as the hash key and the month-date as the sort key.
In that case, you need at most 2 queries, assuming the requested range spans no more than two calendar years.
You are right that you should avoid filtering or scanning as much as you can.
So, for example, if the year of the start date and the year of the end date are the same, you need only one query:
.withKeyConditionExpression("#year = :year and #month-date > :start-month-date and #month-date < :end-month-date")
and otherwise like this:
.withKeyConditionExpression("#year = :start-year and #month-date > :start-month-date")
and
.withKeyConditionExpression("#year = :end-year and #month-date < :end-month-date")
Finally, you should union the result sets from both queries.
This consumes only 2 read capacity units at most.
For easier comparison of sort keys, you might want to use UNIX timestamps.
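A sketch of the two-query union in Python (boto3), under the assumptions that the GSI is named year-month_date-index and that the range spans at most two calendar years:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('events')  # hypothetical table name

def query_date_range(start_year, start_md, end_year, end_md):
    # Keys follow the scheme above: hash key "year", sort key "month_date".
    def q(cond):
        return table.query(IndexName='year-month_date-index',
                           KeyConditionExpression=cond)['Items']
    if start_year == end_year:
        # One query suffices when both dates fall in the same year.
        return q(Key('year').eq(start_year) &
                 Key('month_date').between(start_md, end_md))
    # Otherwise, union the tail of the start year with the head of the end year.
    return (q(Key('year').eq(start_year) & Key('month_date').gte(start_md)) +
            q(Key('year').eq(end_year) & Key('month_date').lte(end_md)))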
Thanks
I noticed that a DynamoDB query/scan only returns a subset of each document; it appears to be just the key attributes.
This means I need to do a separate Batch_Get to get the actual documents referenced by those keys.
I am not using a projection expression, and according to the documentation this means the whole item should be returned.
How do I get query to return the entire document so I don't have to do a separate batch get?
One example bit of code that shows this is below. It prints out found documents, yet they contain only the primary key, the secondary key, and the sort key.
t1 = db.Table(tname)
q = {
'IndexName': 'mysGSI',
'KeyConditionExpression': "secKey = :val1 AND " \
"begins_with(sortKey,:status)",
'ExpressionAttributeValues': {
":val1": 'XXX',
":status": 'active-',
}
}
res = t1.query(**q)
for doc in res['Items']:
print(json.dumps(doc))
This situation is discussed in the documentation for the Select parameter. You have to read quite a lot to find this, which is not ideal.
If you query or scan a global secondary index, you can only request
attributes that are projected into the index. Global secondary index
queries cannot fetch attributes from the parent table.
Basically:
If you query the parent table then you get all attributes by default.
If you query an LSI then you get all attributes by default - they're retrieved from the projection in the LSI if all attributes are projected into the index (so that costs nothing extra) or from the base table otherwise (which will cost you more reads).
If you query or scan a GSI, you can only request attributes that are projected into the index. GSI queries cannot fetch attributes from the parent table.
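So the fix here is to project all attributes into the GSI. You cannot change an existing index's projection; you have to create a replacement index, roughly like this boto3 sketch (the key names come from the question; the table name and new index name are made up):

import boto3

client = boto3.client('dynamodb')

# Create a replacement GSI that projects ALL attributes, so queries against it
# return full items instead of just the keys.
client.update_table(
    TableName='mytable',  # hypothetical table name
    AttributeDefinitions=[
        {'AttributeName': 'secKey', 'AttributeType': 'S'},
        {'AttributeName': 'sortKey', 'AttributeType': 'S'},
    ],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'mysGSI-all',
            'KeySchema': [
                {'AttributeName': 'secKey', 'KeyType': 'HASH'},
                {'AttributeName': 'sortKey', 'KeyType': 'RANGE'},
            ],
            'Projection': {'ProjectionType': 'ALL'},
            # For provisioned-capacity tables, add a ProvisionedThroughput entry here.
        },
    }],
)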
I have a SQLite database.
While writing a move-rows function, which moves rows from one table to another, I need a query that increments the column named "row", which is an INTEGER PRIMARY KEY, but I get an error. It is critical for my task to index by "row". The condition in the example is WHERE row >= 2 because I am inserting rows from the other table at position 2.
"UPDATE '4' SET row = row + 1 WHERE row >= 2"
Error("19", "Unable to fetch row", "UNIQUE constraint failed: 4.row")
The problem originates in the WHERE row >= 2 part. How can I overcome this problem?
The problem originates in the WHERE row >= 2 part.
I'm inclined to disagree. The problem is not with which rows are updated, it is with the order in which they are updated.
Very likely SQLite will process rows in rowid order, which almost certainly is also increasing order of the row column, since that column is an auto-incremented PK. Suppose, then, that the table contains two rows with row values 2 and 3. If it processes the first row first, then it attempts to set that row's row value to 3, but that produces a constraint violation because that column is subject to a uniqueness constraint, and there is already a row with value 3 in that column.
How to overcome this problem?
Do not modify PK values, and especially do not modify the values of surrogate PKs, which substantially all auto-increment keys are.
Alternatively, update the rows into a temporary table, clear the original table, and copy the updated values back into it. This can be extremely messy if you have any FKs referencing this PK, however, so go back to the "Do not modify PK values" advice that I led off with.
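A minimal sketch of that temporary-table route using Python's sqlite3 (the database file, table name t, and columns row/value are assumptions):

import sqlite3

con = sqlite3.connect('example.db')  # hypothetical database file
# Move the affected rows out, shift them, and reinsert them, so no two rows
# ever collide on the UNIQUE "row" column mid-update.
con.executescript("""
    CREATE TEMP TABLE shifted AS SELECT row + 1 AS row, value FROM t WHERE row >= 2;
    DELETE FROM t WHERE row >= 2;
    INSERT INTO t (row, value) SELECT row, value FROM shifted;
    DROP TABLE shifted;
""")
con.commit()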
First: '4' is not a table name. The UPDATE statement expects a table name where you have written '4'. For example:
UPDATE table1 SET row = row + 1 WHERE row >= 2
Second: Just do not use row as a primary key (or unique key, for that matter) when it obviously is not meant as a primary key but as a changing row number. Create a separate column that can be used as the primary index of that table instead, as in the sketch below.
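A quick sketch of that separation (sqlite3; the table and column names are illustrative):

import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
    CREATE TABLE moves (
        id    INTEGER PRIMARY KEY,  -- stable surrogate key, never updated
        row   INTEGER NOT NULL,     -- mutable ordering column, deliberately not UNIQUE
        value TEXT
    );
    INSERT INTO moves (row, value) VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")
# With no uniqueness constraint on "row", renumbering succeeds regardless of
# the order in which SQLite visits the rows.
con.execute("UPDATE moves SET row = row + 1 WHERE row >= 2")
con.commit()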