Why does the following DynamoDB write with a conditional expression succeed? - amazon-web-services

I have the following code to create a DynamoDB table:
def create_mock_dynamo_table():
    conn = boto3.client(
        "dynamodb",
        region_name=REGION,
        aws_access_key_id="ak",
        aws_secret_access_key="sk",
    )
    conn.create_table(
        TableName=DYNAMO_DB_TABLE,
        KeySchema=[
            {'AttributeName': 'PK', 'KeyType': 'HASH'},
            {'AttributeName': 'SK', 'KeyType': 'RANGE'}
        ],
        AttributeDefinitions=[
            {'AttributeName': 'PK', 'AttributeType': 'S'},
            {'AttributeName': 'SK', 'AttributeType': 'S'}
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
    mock_table = boto3.resource('dynamodb', region_name=REGION).Table(DYNAMO_DB_TABLE)
    return mock_table
Then I use it to put two items:
mock_table = create_mock_dynamo_table()
mock_table.put_item(
    Item={
        'PK': 'did:100000001',
        'SK': 'weekday:monday:start_time:00:30',
    }
)
mock_table.put_item(
    Item={
        'PK': 'did:100000001',
        'SK': 'weekday:monday:start_time:00:40',
    },
    ConditionExpression='attribute_not_exists(PK)'
)
When I do the second put_item, the PK already exists in the table and only the sort key is different. But the condition I am setting only concerns the existence of the same PK. So the second put_item should fail, right?

The condition check for PutItem does not check the condition against arbitrary items. It only checks the condition against an item with the same primary key (hash and sort keys), if such an item exists.
In your case, the value of the sort key is different, so when you put the second item, DynamoDB sees that no item exists with that key. With no item to check against, attribute_not_exists(PK) evaluates to true and the write succeeds.
This is also why the condition check fails the second time you run the code: at that point you do already have an item with the same hash and sort keys.
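The behavior described above can be illustrated with a toy in-memory model. This is only a sketch, not DynamoDB itself: ToyTable, ConditionalCheckFailed and the require_pk_not_exists flag are hypothetical names invented for the illustration, and the flag merely stands in for ConditionExpression='attribute_not_exists(PK)'.

```python
class ConditionalCheckFailed(Exception):
    """Hypothetical stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyTable:
    """Toy in-memory table keyed on the full primary key (PK, SK)."""

    def __init__(self):
        self.items = {}

    def put_item(self, item, require_pk_not_exists=False):
        key = (item["PK"], item["SK"])
        # Only the item stored under the same full key is consulted,
        # never other items that merely share the hash key.
        existing = self.items.get(key)
        if require_pk_not_exists and existing is not None and "PK" in existing:
            raise ConditionalCheckFailed(key)
        self.items[key] = dict(item)


table = ToyTable()
table.put_item({"PK": "did:100000001", "SK": "weekday:monday:start_time:00:30"})

# Different sort key: nothing is stored under this key, so the condition passes.
second = {"PK": "did:100000001", "SK": "weekday:monday:start_time:00:40"}
table.put_item(second, require_pk_not_exists=True)

# Same full key again: an item now exists, so the condition fails.
try:
    table.put_item(second, require_pk_not_exists=True)
    repeat_failed = False
except ConditionalCheckFailed:
    repeat_failed = True
```

In this model, as in the explanation above, a condition on PK alone cannot enforce "no other item with this hash key"; it only guards against overwriting the item with the identical hash and sort key.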

DynamoDB's "IOPS" is very low and the actual write takes some time. You can read more about it here. But, if you run the code a second time soon after, you'll see that you'll get the expected botocore.errorfactory.ConditionalCheckFailedException.
If I may refer to what I think you're trying to do - mock a DB + data. When you want to mock such an "expensive" resource, make an actual fake class. You'll want to wrap all your DB accesses in the actual code with some kind of dal.py module that consolidates operations such as write/read/etc. Then, you mock those methods/functions.
You don't want to write code so tightly coupled with the chosen DB.
The best practice is using an ORM framework such as SQLAlchemy. It is invaluable to take the time now to learn it. But, you might have time constraints I'm not aware of.

Related

DynamoDB table design for multi-user sharing applications

Sorry for the abstract title of the question, but I will try to explain my intention in detail.
I want to create a reminders application in which each user has a separate login, but a user can choose to share an item (in this case a reminder) with another user. When the user with whom the item is shared searches in the app, they can also see the reminders that are shared with them.
So a user can have reminders that are only their own, plus reminders that are shared with them.
These are my data access/retrieval patterns:
1. When a user opens the application, they should see a list of the reminders they created and also the ones that are shared with them.
2. From that list, they should be able to search for a reminder by tag (I plan to do that outside DynamoDB, since the tag would be a set and not a scalar field, hence I cannot have an index on it) and also by title.
3. A user should be able to update or delete a reminder.
4. A user should be able to create a reminder.
5. The user should only see future reminders, not the ones whose expiration date has passed.
I create the table and index using the create_table script below:
import boto3
def create_reminders_table():
    """Just create the reminders table."""
    session = boto3.session.Session(profile_name='dynamo_local')
    dynamodb = session.resource('dynamodb', endpoint_url="http://localhost:8000")
    table = dynamodb.create_table(
        TableName='Reminders',
        KeySchema=[
            {
                'AttributeName': 'reminder_id',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'reminder_id',
                'AttributeType': 'S'
            },
            {
                'AttributeName': 'user_id',
                'AttributeType': 'S'
            },
            {
                'AttributeName': 'reminder_title_reminder_id',
                'AttributeType': 'S'
            }
        ],
        GlobalSecondaryIndexes=[
            {
                'IndexName': 'UserTitleReminderIdGsi',
                'KeySchema': [
                    {
                        'AttributeName': 'user_id',
                        'KeyType': 'HASH'
                    },
                    {
                        'AttributeName': 'reminder_title_reminder_id',
                        'KeyType': 'RANGE'
                    }
                ],
                'Projection': {
                    'ProjectionType': 'INCLUDE',
                    'NonKeyAttributes': [
                        'reminder_expiration_date_time'
                    ]
                }
            }
        ],
        BillingMode='PAY_PER_REQUEST'
    )
    return table
if __name__ == '__main__':
    movie_table = create_reminders_table()
    print("Table status:", movie_table.table_status)
So the purpose of the global secondary index is to allow a user to search for reminders by reminder title.
Now, to support a user sharing a reminder with someone else, I want to make the following change to my table schema. Basically, I want to rename the user_id attribute to something like users_id, which initially contains the user id of the creator; if the reminder is shared with someone, the user id of the second user is concatenated to the creator's user id and the users_id attribute is updated.
If I do this, I can think of two issues:
1. How do I know the user_id of the user with whom the reminder is shared? Maybe I now need to maintain a new table holding user information? Or can I use some other service, like Amazon Cognito, for this?
2. If I still have the global secondary index on the users_id attribute, then when I need to search for reminders for a user the query needs to be like: select * from reminders where users_id startswith("Bob") (for example).
Another option I can think of (my preferred way) is to drop the idea of a users_id attribute and instead keep the user_id attribute as is. I would then add user_id as a sort key (RANGE) to the table so that the combination of reminder_id and user_id is unique. Then, when a user wants to share a reminder with another user, a new item is created with the same reminder_id and a new user id (the user id of the user with whom the reminder is shared).
Any help on my dilemma would be greatly appreciated.
Thanks in advance.
You don't mention your query access pattern in any detail, and with DynamoDB your data model flows from the query access pattern. So the below is based only on my imagination of what query patterns you might need. I could be off.
The PK can be the user_id. The SK can be the reminder_id of all reminders the user keeps. That lets you do a Query to get all reminders for a given user. The primary key then is the user id and reminder id in combination, so if you're passing around a reference, use that (not just the reminder_id).
A share gets added by putting another item under the user_id of the person getting shared with. That way a Query for that user can retrieve both their own reminders and those shared with them.
If you need people to list which reminders they've shared and with whom, you can put that into the reminder itself as a list of who it's been shared with, if the list is short enough, or instead create a GSI on that share reference (against a shared_by attribute) if the list might be large.
If you need to query for a user's reminders and differentiate their own from shared ones, you can prefix the SK accordingly, as in SHARED#reminder_id or SELF#reminder_id, so that a begins_with on the SK can tell them apart.
You can refine this in various ways, but I think it would optimize for the "show me my reminders and the reminders shared with me" use cases, while making sharing (or undoing sharing) easy to implement.
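As a rough sketch of the layout described above, the helpers below only build the request dictionaries; nothing is sent to DynamoDB. The attribute names PK, SK and title, the helper names, and the default table name are assumptions made for the illustration. The SELF#/SHARED# prefixes and the shared_by attribute come from the answer.

```python
def own_reminder_item(user_id, reminder_id, title):
    """Item stored under the creator's own partition."""
    return {
        "PK": {"S": user_id},
        "SK": {"S": f"SELF#{reminder_id}"},
        "title": {"S": title},
    }


def shared_reminder_item(recipient_id, owner_id, reminder_id, title):
    """Extra item stored under the recipient's partition when sharing."""
    return {
        "PK": {"S": recipient_id},
        "SK": {"S": f"SHARED#{reminder_id}"},
        "shared_by": {"S": owner_id},
        "title": {"S": title},
    }


def query_kwargs(user_id, kind=None, table_name="Reminders"):
    """Keyword arguments for a Query; kind is None, "SELF" or "SHARED"."""
    kwargs = {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": user_id}},
    }
    if kind is not None:
        # Narrow the query to one prefix of the sort key.
        kwargs["KeyConditionExpression"] += " AND begins_with(SK, :prefix)"
        kwargs["ExpressionAttributeValues"][":prefix"] = {"S": f"{kind}#"}
    return kwargs


share = shared_reminder_item("bob", "alice", "r1", "Dentist")
everything_for_bob = query_kwargs("bob")
only_shared_with_bob = query_kwargs("bob", kind="SHARED")
```

The dictionaries are shaped for the low-level boto3 client, for example client.put_item(TableName="Reminders", Item=share) and client.query(**only_shared_with_bob). Undoing a share would then be a delete of the single item under the recipient's partition.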

Django Rest Framework: Disable save in update

This is my first question here. After reading similar questions I did not find what I need. Thanks for your help.
I am creating a fairly simple API but I want to use best practices at the security level.
Requirement: There is a table in SQL Server with 5+ million records for which I should allow ONLY READ (all fields) and UPDATE (one field). This is so that a data scientist can consume data from this table and, through a predictive model (I think), assign a value to each record.
For this I mainly need 2 things:
1. Only one field is updated, even if all the fields of the table are sent in the JSON (I think I have achieved this with my serializer).
2. And, where I have problems: disabling the creation of new records when updating one that does not exist.
I am using an UpdateAPIView to try to allow a bulk update using a JSON like this (subrrogate_key is in my table and I use lookup_field to:
[
    {
        "subrrogate_key": "A1",
        "class": "A"
    },
    {
        "subrrogate_key": "A2",
        "class": "B"
    },
    {
        "subrrogate_key": "A3",
        "class": "C"
    }
]
partial_update calls update, which calls perform_update, which finally calls save, and the default behavior is to insert a new record if the primary key (or the field specified in lookup_field) is not found.
If I override them, how can I make sure a new record is not inserted, and the field is only updated if the record exists?
I tried:
Model.objects.filter(subrrogate_key=['subrrogate_key']).update(class=['class'])
Model.objects.update_or_create(...)
They work fine if all the keys in the JSON exist, but if a new key comes in they will insert it (I don't want this).
P.S. I use a translator, sorry.
perform_update will create a new record if you passed a serializer that doesn't have an instance. Depending on how you wrote your view, you can simply check if there is an instance in the serializer before calling save in perform_update to prevent creating a new record:
def perform_update(self, serializer):
    if not serializer.instance:
        return
    serializer.save()
Django implements that feature through the use of either force_update or update_fields during save().
https://docs.djangoproject.com/en/3.2/ref/models/instances/#forcing-an-insert-or-update
https://docs.djangoproject.com/en/3.2/ref/models/instances/#specifying-which-fields-to-save
https://docs.djangoproject.com/en/3.2/ref/models/instances/#saving-objects
In some rare circumstances, it’s necessary to be able to force the
save() method to perform an SQL INSERT and not fall back to doing an
UPDATE. Or vice-versa: update, if possible, but not insert a new row.
In these cases you can pass the force_insert=True or force_update=True
parameters to the save() method.
model_obj.save(force_update=True)
or
model_obj.save(update_fields=['field1', 'field2'])
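At the SQL level, "update, if possible, but not insert" amounts to a plain UPDATE ... WHERE, which touches zero rows when the key is missing. The sketch below shows that with the standard-library sqlite3 module rather than Django, so that it runs on its own. The records table and the update_class_only helper are hypothetical; the column names follow the JSON payload in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE records (subrrogate_key TEXT PRIMARY KEY, "class" TEXT)')
conn.executemany("INSERT INTO records VALUES (?, ?)", [("A1", None), ("A2", None)])


def update_class_only(conn, payload):
    """Update "class" for existing keys only; return the keys that matched no row."""
    missing = []
    for row in payload:
        cur = conn.execute(
            'UPDATE records SET "class" = ? WHERE subrrogate_key = ?',
            (row["class"], row["subrrogate_key"]),
        )
        if cur.rowcount == 0:  # nothing matched, and nothing was inserted
            missing.append(row["subrrogate_key"])
    return missing


payload = [
    {"subrrogate_key": "A1", "class": "A"},
    {"subrrogate_key": "A3", "class": "C"},  # not present in the table
]
missing = update_class_only(conn, payload)
row_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

Here the unknown key A3 is reported back instead of being created, and the table still holds two rows. In Django, QuerySet.update() similarly issues an UPDATE and returns the number of rows matched, so a return value of 0 can be used to detect a missing key.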

DynamoDB Concurrent Add to StringSet

I have read that concurrent writes on the same attribute, such as list appends or overwriting by using SET attr = :val, will lose some data (unordered overwrites, etc.). A quick test with 4 processes running in parallel successfully added all elements to the set but I want to be sure. Can I count on this behavior? Will concurrent writes using ADD to the same StringSet field correctly write all elements to the set?
Currently using updateItem with the following payload:
TableName='TestTable',
Key={
    'PK': {'S': 'PrimaryKey'},
    'SK': {'S': 'SortKey'}
},
UpdateExpression='ADD Field :value',
ExpressionAttributeValues={
    ':value': {'SS': ['test value']}
}

Django-ORM: Check whether multiple items are in DB while minimizing calls

Assume that from an external API call, we get the following response:
resp = ['123', '67283', '99829', '786232']
These are external_id fields for our objects, defined in our Article model. Some of them may already exist in the database, while others don't.
Before returning a response, we need to check whether each external_id corresponds to a record in our database, and if not, we need to create it and fetch additional info from another, third, source.
What is the most efficient way to do this? Right now I can't think of anything better than:
for external_id in resp:
    if not Article.objects.filter(external_id=external_id).exists():
        # item doesn't exist, go fetch more data and create object
        pass
    else:
        # already exists, do something else
        pass
But there must be a better way..?
You can use sets for this task. The following code will issue only one database call:
# int() assumes external_id is an integer field; drop the conversion if it stores strings
expected_ids = set(int(pk) for pk in resp)
exist_ids = set(Article.objects.filter(external_id__in=resp)
                .values_list('external_id', flat=True))
not_exist_ids = list(expected_ids - exist_ids)
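The set arithmetic can be tried without a database. In this sketch, existing_in_db is a made-up stand-in for the result of the single values_list query, and split_ids is a hypothetical helper. Both sides are kept as strings here, which is an assumption about how external_id is stored; the two sets must hold the same type for the difference to be meaningful.

```python
def split_ids(resp, existing_ids):
    """Return (ids already stored, ids still to be created), each sorted."""
    expected = set(resp)
    existing = set(existing_ids)
    return sorted(expected & existing), sorted(expected - existing)


resp = ['123', '67283', '99829', '786232']
# Hypothetical result of the one values_list('external_id', flat=True) query.
existing_in_db = ['67283', '786232']

found, missing = split_ids(resp, existing_in_db)
```

Only the ids in missing would then need the extra fetch from the third source and an object creation.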

Do we need cache for an array?

We're developing a web-based project using Django, and we cache DB operations for better performance. But I'm wondering whether we need to cache the array.
The code sample looks like this:
from django.core.cache import cache  # assumed: Django's cache framework
ABigArray = {
    "1": {
        "name": "xx",
        "gender": "xxx",
        ...
    },
    "2": {
        ...
    },
    ...
}
class Items:
    def __init__(self):
        self.data = ABigArray
    def get_item_by_id(self, id):
        item = cache.get("item" + str(id))  # get the cached item if possible
        if item:
            return item
        else:
            item = self.data.get(str(id))
            cache.set("item" + str(id), item)
            return item
So I'm wondering whether we really need such a cache, since IMO the array (ABigArray) will already be loaded in memory when we try to get an item. So we don't need to use a cache in this situation, right? Or am I wrong?
Please correct me if I'm wrong.
Thanks.
You've cut out a bit too much information, but it looks like the "array" (actually a dictionary) is always the same - there's a single instance that is created when the module is first imported, and will be used by every Items object. So there's absolutely nothing to be gained by caching it - in fact you will lose by doing so, as you will introduce an unnecessary round trip to get the data from the cache.
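A small sketch of the point about the single instance, with the cache removed. A_BIG_DICT and its contents are made-up sample data standing in for the elided dictionary in the question.

```python
A_BIG_DICT = {"1": {"name": "xx"}, "2": {"name": "yy"}}  # hypothetical sample data


class Items:
    def __init__(self):
        # Binds a reference to the module-level dict; no copy is made.
        self.data = A_BIG_DICT

    def get_item_by_id(self, item_id):
        # A plain in-memory dict lookup, with no cache round trip.
        return self.data.get(str(item_id))


first, second = Items(), Items()
same_object = first.data is second.data
item = first.get_item_by_id(1)
```

Every Items object points at the same dictionary, so the lookup is already an in-process memory access and an external cache could only add overhead.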