DynamoDB table design for multi-user sharing applications - amazon-web-services

Sorry for the abstract title, but I will try to explain my intention in detail.
I want to create a reminders application in which each user has a separate login, but a user can choose to share an item (in this case a reminder) with another user. When the user with whom the item is shared searches in the app, they should also see the reminders that are shared with them.
So a user can see reminders they created plus reminders shared with them.
These are my data access/retrieval patterns:
1. When a user opens the application, they should see a list of reminders they created and also the ones shared with them.
2. From that list they should be able to search for a reminder by tag (I plan to do that outside DynamoDB, since the tag would be a set and not a scalar field, so I cannot have an index on it) and also by title.
3. A user should be able to update or delete a reminder.
4. A user should be able to create a reminder.
5. The user should only see future reminders, not ones whose expiration date has passed.
I create the table and index with the create_table script below:
import boto3


def create_reminders_table():
    """Just create the reminders table."""
    session = boto3.session.Session(profile_name='dynamo_local')
    dynamodb = session.resource('dynamodb', endpoint_url="http://localhost:8000")
    table = dynamodb.create_table(
        TableName='Reminders',
        KeySchema=[
            {'AttributeName': 'reminder_id', 'KeyType': 'HASH'}
        ],
        AttributeDefinitions=[
            {'AttributeName': 'reminder_id', 'AttributeType': 'S'},
            {'AttributeName': 'user_id', 'AttributeType': 'S'},
            {'AttributeName': 'reminder_title_reminder_id', 'AttributeType': 'S'}
        ],
        GlobalSecondaryIndexes=[
            {
                'IndexName': 'UserTitleReminderIdGsi',
                'KeySchema': [
                    {'AttributeName': 'user_id', 'KeyType': 'HASH'},
                    {'AttributeName': 'reminder_title_reminder_id', 'KeyType': 'RANGE'}
                ],
                'Projection': {
                    'ProjectionType': 'INCLUDE',
                    'NonKeyAttributes': ['reminder_expiration_date_time']
                }
            }
        ],
        BillingMode='PAY_PER_REQUEST'
    )
    return table


if __name__ == '__main__':
    movie_table = create_reminders_table()
    print("Table status:", movie_table.table_status)
The purpose of the global secondary index is to let a user search for reminders by title.
Now, to support sharing a reminder with someone else, I want to make the following change to my table schema. Basically I want to rename the user_id attribute to something like users_id, which initially contains the user id of the creator; if the reminder is shared with someone, the user id of the second user is concatenated with the creator's user id and the users_id attribute is updated.
If I do this, I can think of two issues:
1. How do I know the user_id of the user with whom the reminder is shared? Maybe I now need to maintain a new table holding user information? Or can I use some other service like Amazon Cognito for this?
2. If I still have the global secondary index on the users_id attribute, then to search for a user's reminders the query needs to be something like: select * from reminders where users_id startswith("Bob") (for example).
Another option I can think of (my preferred way) is to drop the idea of a users_id attribute and keep the user_id attribute as is. I would then add user_id as a sort key (RANGE key) to the table so that the combination of reminder_id and user_id is unique. Then, when a user wants to share a reminder with another user, a new item is created with the same reminder_id and a new user id (the user id of the user with whom the reminder is shared).
Any help on my dilemma would be greatly appreciated.
Thanks in advance.

You don't mention your query access pattern in any detail, and with DynamoDB your data model flows from the query access pattern. So the below is based only on my imagination of what query patterns you might need. I could be off.
The PK can be the user_id. The SK can be the reminder_id of all reminders the user keeps. That lets you do a Query to get all reminders for a given user. The primary key then is the user id and reminder id in combination, so if you're passing around a reference, use that (not just the reminder_id).
A share gets added by putting another item under the user_id of the person getting shared with. That way a Query for that user can retrieve both their own reminders and those shared with them.
If you need people to list which reminders they've shared and with whom, you can put that into the reminder itself as a list of who it's been shared with, if the list is short enough, or instead create a GSI on that share reference (against a shared_by attribute) if the list might be large.
If you need to query for a user's reminders and differentiate their own from shared ones, you can prefix the SK accordingly, as SHARED#reminder_id or SELF#reminder_id, so that a begins_with condition on the SK can tell them apart.
You can refine this in various ways, but I think it would optimize for the "show me my reminders and the reminders shared with me" use cases, while making sharing (or undoing sharing) easy to implement.
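The key layout described above can be sketched with plain dictionaries. This is only an illustration: the helper names (own_item, shared_item, query) and the sample users are made up, and the in-memory filter merely stands in for a DynamoDB Query on PK with an optional begins_with condition on SK.

```python
from typing import Dict, List

Item = Dict[str, str]


def own_item(user_id: str, reminder_id: str, title: str) -> Item:
    """Item stored under the creator's partition (hypothetical helper)."""
    return {"PK": user_id, "SK": f"SELF#{reminder_id}", "title": title}


def shared_item(owner_id: str, recipient_id: str, reminder_id: str, title: str) -> Item:
    """Copy stored under the recipient's partition; shared_by records the owner."""
    return {
        "PK": recipient_id,
        "SK": f"SHARED#{reminder_id}",
        "title": title,
        "shared_by": owner_id,
    }


def query(items: List[Item], user_id: str, sk_prefix: str = "") -> List[Item]:
    """In-memory stand-in for a Query on PK with an optional begins_with on SK."""
    matches = [i for i in items if i["PK"] == user_id and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])


# Alice owns r1 and shares it with Bob; Bob owns r2.
table = [
    own_item("alice", "r1", "Pay rent"),
    own_item("bob", "r2", "Call mom"),
    shared_item("alice", "bob", "r1", "Pay rent"),
]

bob_all = query(table, "bob")                # Bob's own reminders plus shared ones
bob_shared = query(table, "bob", "SHARED#")  # only reminders shared with Bob
```

Against a real table, the same filter would presumably be a Query whose KeyConditionExpression combines Key("PK").eq(user_id) with Key("SK").begins_with("SHARED#") from boto3.dynamodb.conditions. Undoing a share is then a delete of the single SHARED# item in the recipient's partition.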

Related

Django Rest Framework: Disable save in update

It is my first question here, after reading the similar questions I did not find what I need, thanks for your help.
I am creating a fairly simple API but I want to use best practices at the security level.
Requirement: There is a table in SQL Server with +5 million records that I should ONLY allow READ (all fields) and UPDATE (one field). This is so that a data scientist consumes data from this table and through a predictive model (I think) can assign a value to each record.
For this I mainly need 2 things:
That only one field is updated despite sending all the fields of the table in the Json (I think I have achieved it with my serializer).
And, where I have problems, is in disabling the creation of new records when updating one that does not exist.
I am using an UpdateAPIView to try to allow a bulk update using JSON like this (subrrogate_key is in my table and I use lookup_field to:
[
    {
        "subrrogate_key": "A1",
        "class": "A"
    },
    {
        "subrrogate_key": "A2",
        "class": "B"
    },
    {
        "subrrogate_key": "A3",
        "class": "C"
    }
]
The partial_update method calls update, which calls perform_update, which finally calls save; the default behavior is to insert a new record if the primary key (or the field specified in lookup_field) is not found.
If I override these methods, how can I make sure a new record is not inserted, and the field is only updated if the record exists?
I tried:
Model.objects.filter(subrrogate_key=['subrrogate_key']).update(class=['class'])
Model.objects.update_or_create (...)
They work fine if all the keys in the JSON exist, but if a new key comes in they will insert a record (I don't want this).
P.S. I use a translator, sorry.
perform_update will create a new record if you passed a serializer that doesn't have an instance. Depending on how you wrote your view, you can simply check if there is an instance in the serializer before calling save in perform_update to prevent creating a new record:
def perform_update(self, serializer):
    if not serializer.instance:
        return
    serializer.save()
Django implements that feature through the use of either force_update or update_fields during save().
https://docs.djangoproject.com/en/3.2/ref/models/instances/#forcing-an-insert-or-update
https://docs.djangoproject.com/en/3.2/ref/models/instances/#specifying-which-fields-to-save
https://docs.djangoproject.com/en/3.2/ref/models/instances/#saving-objects
In some rare circumstances, it’s necessary to be able to force the
save() method to perform an SQL INSERT and not fall back to doing an
UPDATE. Or vice-versa: update, if possible, but not insert a new row.
In these cases you can pass the force_insert=True or force_update=True
parameters to the save() method.
model_obj.save(force_update=True)
or
model_obj.save(update_fields=['field1', 'field2'])
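At the SQL level, "update but never insert" comes down to issuing an UPDATE and checking how many rows it touched. The sketch below is not Django code: it uses the standard-library sqlite3 module with a made-up records table and a hypothetical update_only helper, purely to illustrate the behavior that force_update is meant to guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE records (subrrogate_key TEXT PRIMARY KEY, "class" TEXT)')
conn.executemany("INSERT INTO records VALUES (?, ?)", [("A1", None), ("A2", None)])


def update_only(conn, payload):
    """Update the class of existing rows; never insert rows for unknown keys."""
    updated, skipped = [], []
    for row in payload:
        cur = conn.execute(
            'UPDATE records SET "class" = ? WHERE subrrogate_key = ?',
            (row["class"], row["subrrogate_key"]),
        )
        # rowcount is 0 when no row matched, so nothing was written for that key
        (updated if cur.rowcount == 1 else skipped).append(row["subrrogate_key"])
    return updated, skipped


payload = [
    {"subrrogate_key": "A1", "class": "A"},
    {"subrrogate_key": "A3", "class": "C"},  # not in the table
]
updated, skipped = update_only(conn, payload)
row_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

The unknown key A3 ends up in skipped and the table still holds two rows. Returning the skipped keys to the caller is one way to report which records were ignored.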

Why does the following DynamoDB write with a conditional expression succeed?

I have the following code to create a dynamoDB table :
def create_mock_dynamo_table():
    conn = boto3.client(
        "dynamodb",
        region_name=REGION,
        aws_access_key_id="ak",
        aws_secret_access_key="sk",
    )
    conn.create_table(
        TableName=DYNAMO_DB_TABLE,
        KeySchema=[
            {'AttributeName': 'PK', 'KeyType': 'HASH'},
            {'AttributeName': 'SK', 'KeyType': 'RANGE'}
        ],
        AttributeDefinitions=[
            {'AttributeName': 'PK', 'AttributeType': 'S'},
            {'AttributeName': 'SK', 'AttributeType': 'S'}],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
    mock_table = boto3.resource('dynamodb', region_name=REGION).Table(DYNAMO_DB_TABLE)
    return mock_table
Then I use it to create two put-items :
mock_table = create_mock_dynamo_table()
mock_table.put_item(
    Item={
        'PK': 'did:100000001',
        'SK': 'weekday:monday:start_time:00:30',
    }
)
mock_table.put_item(
    Item={
        'PK': 'did:100000001',
        'SK': 'weekday:monday:start_time:00:40',
    },
    ConditionExpression='attribute_not_exists(PK)'
)
When I do the second put_item, the PK already exists in the table and only the sort key is different. But the condition I am setting is only on the existence of the same PK, so the second put_item should fail, right?
The condition check for PutItem does not check the condition against arbitrary items. It only checks the condition against an item with the same primary key (hash and sort keys), if such an item exists.
In your case, the value of the sort key is different, so when you put the second item, DynamoDB sees that no item exists with that key, therefore the PK attribute does not exist.
This is also why the condition check fails the second time you run the code—because at that point you do already have an item with the same hash and sort keys.
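The behavior described above can be mimicked with a toy in-memory table. This is only a model of the semantics, not the DynamoDB API: ToyTable, its require_pk_absent flag, and the ConditionalCheckFailed exception are all made up for illustration.

```python
from typing import Dict, Tuple


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyTable:
    """Items are addressed by the full primary key (PK, SK), as in the real table."""

    def __init__(self) -> None:
        self.items: Dict[Tuple[str, str], dict] = {}

    def put_item(self, item: dict, require_pk_absent: bool = False) -> None:
        key = (item["PK"], item["SK"])
        # The condition is evaluated only against the item with this exact key.
        existing = self.items.get(key)
        if require_pk_absent and existing is not None and "PK" in existing:
            raise ConditionalCheckFailed(key)
        self.items[key] = item


table = ToyTable()
first = {"PK": "did:100000001", "SK": "weekday:monday:start_time:00:30"}
second = {"PK": "did:100000001", "SK": "weekday:monday:start_time:00:40"}

table.put_item(first)
table.put_item(second, require_pk_absent=True)  # different SK: no existing item, so it succeeds

try:
    table.put_item(second, require_pk_absent=True)  # same PK and SK: condition fails
    second_attempt_failed = False
except ConditionalCheckFailed:
    second_attempt_failed = True
```

In this model, enforcing "only one item per PK" would need a different design, for example a table without a sort key or a separate marker item with a fixed SK that every writer checks.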
DynamoDB's "IOPS" is very low and the actual write takes some time. You can read more about it here. But, if you run the code a second time soon after, you'll see that you'll get the expected botocore.errorfactory.ConditionalCheckFailedException.
If I may refer to what I think you're trying to do - mock a DB + data. When you want to mock such an "expensive" resource, make an actual fake class. You'll want to wrap all your DB accesses in the actual code with some kind of dal.py module that consolidates operations such as write/read/etc. Then, you mock those methods/functions.
You don't want to write code so tightly coupled with the chosen DB.
The best practice is using an ORM framework such as SQLAlchemy. It is invaluable to take the time now to learn it. But, you might have time constraints I'm not aware of.

How do I select records where its related records do not have a specific attribute-value?

I want to select all of the users who do not have associated posts where the title is, 'test123'. The following syntax is not working:
User.includes(:posts).where.not(posts: { title: 'test123' })
How do I select users whose associated posts do not have a specific title?
UPDATE
Originally, I tried to isolate where exactly I thought I was having the problem, but I want to show the query that most closely reflects what I am doing. The problem is still with the where.not clause. It returns an empty array when I know there are records with posts that have other titles.
User.where("users.created_at >= ? and users.created_at <= ?", 1.month.ago.utc, 1.week.ago.utc)
    .where(active: true)
    .includes(:comments)
    .where('comments.id is not null')
    .includes(:posts)
    .where.not(posts: { title: 'test123' })
    .references(:posts)
From the API Doc
If you want to add conditions to your included models you’ll have to
explicitly reference them
So the below query will work
User.includes(:posts).where.not(posts: { title: 'test123' }).references(:posts)
I believe the problem was with executing inner vs outer joins with Active Record Query Syntax. The problem with this query:
User.includes(:posts).where.not(posts: { title: 'test123' })
is that it evaluates where.not(posts: { title: 'test123' }) only against user records that have an associated post, leaving out all of the users who have no posts at all. I wanted a query that would not exclude the users without posts.
This is how I did it:
User.where("users.id NOT IN (SELECT DISTINCT(user_id) from posts where title = 'test123')")
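The difference between the two approaches can be reproduced in plain SQL. The sketch below uses Python's sqlite3 module with made-up sample data; the first statement is only roughly the shape of SQL that an eager-loaded where.not produces (an outer join filtered on posts.title), which is an assumption rather than the exact output of Active Record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
-- alice has the unwanted title plus another post, bob has one other post, carol has none
INSERT INTO posts VALUES (1, 1, 'test123'), (2, 1, 'other'), (3, 2, 'hello');
""")

# Outer join filtered on the joined column: rows are tested post by post.
join_sql = """
SELECT DISTINCT users.name FROM users
LEFT OUTER JOIN posts ON posts.user_id = users.id
WHERE posts.title != 'test123'
ORDER BY users.name
"""

# Subquery form from the answer: users are tested as a whole.
not_in_sql = """
SELECT users.name FROM users
WHERE users.id NOT IN (SELECT DISTINCT user_id FROM posts WHERE title = 'test123')
ORDER BY users.name
"""

join_result = [row[0] for row in conn.execute(join_sql)]
not_in_result = [row[0] for row in conn.execute(not_in_sql)]
```

The join form keeps alice (through her other post) and drops carol, because comparing a NULL title never evaluates to true. The NOT IN form excludes alice and keeps carol, which matches the stated goal.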

Wizard input - DB View - Dynamic where clause

I am trying to pass where-condition values to a database view.
The view is created in the init method of the model class.
Input to the where clause is taken from a pop-up wizard.
The issue is that the wizard form values are inserted into the model-bound database table.
This happens on every submit.
Currently I read the latest record from the table on wizard input.
The view definition is modified to generate a result set based on the latest input record from the wizard table.
select v.col1, v.expre2
  from view_name v,
       ( select fld1, fld2 from wizrd_tbl_1 order by id desc limit 1 ) as w
 where
       v.colM between w.fld1 and w.fld2
Currently I am following the above sequence of steps and results are fetched.
But I think, this would fail if at least two users are using the same wizard concurrently.
How can I change my approach, so that
1. Wizard input is not sent to database table,
2. The inputs are sent to a where clause dynamically and the result set is bound to a List View
In summary, I am trying to:
Create a database view joining multiple tables.
Take user input (which is currently saved in a db table, which is neither expected nor required).
Pass the user input to the db view's where clause. (Any alternative to a wizard?)
Bind the result set to a List View.
It is definitely a bad idea to morph a database view based on user input, when that view is likely to be accessed by multiple users.
The 'correct' way to do this would be to have a static database view which contains all possible records from the joined tables, and then filter that data for individual users by generating a "domain" and redirecting the user to a tree view with that domain applied.
You can redirect the user by creating a <button type="object"> which calls a function such as the below:
def action_get_results(self, cr, uid, ids, context=None):
    # Redirect user to results
    context = context or {}  # avoid a shared mutable default argument
    my_domain = ['&', ('col1', '=', 'testval'), ('col2', '>', 33)]
    return {
        'type': 'ir.actions.act_window',
        'name': 'Search Results',
        'view_mode': 'tree',
        'res_model': 'your.osv_memory.model.name',
        'target': 'new',  # or 'current'
        'context': context,
        'domain': my_domain,
    }

Django: structuring a complex relationship intended for use with built-in admin site

I have a fairly complex relationship that I am trying to make work with the Django admin site. I have spent quite some time trying to get this right and it just seems like I am not getting the philosophy behind the Django models.
There is a list of Groups. Each Group has multiple departments. There are also Employees. Each Employee belongs to a single group, but some employees also belong to a single Department within a Group. (Some employees might belong to only a Group but no Department, but no Employee will belong only to a Department).
Here is a simplified version of what I currently have:
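from django.db import models

class Group(models.Model):
    name = models.CharField(max_length=128)

class Department(models.Model):
    group = models.ForeignKey(Group, on_delete=models.CASCADE)

class Employee(models.Model):
    department = models.ForeignKey(Department, on_delete=models.CASCADE)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)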
The problem with this is that the Department select box on the Employees page must display all Departments, because a group has not yet been set. I tried to rectify this by making an EmployeeInline for the GroupAdmin page, but it is not good to have 500+ employees on a non-paginated inline. I must be able to use the models.ModelAdmin page for Employees (unless there is a way to search, sort, collapse and perform actions on inlines).
If I make EmployeeInline an inline of DepartmentAdmin (instead of having a DepartmentInline in GroupAdmin), then things are even worse, because it is not possible to have an Employee that does not belong to a Group.
Given my description of the relationships, am I missing out on some part of the Django ORM that will allow me to structure this relationship the way it 'should be' instead of hacking around and trying to make things come together?
Thanks a lot.
It sounds like what you want is for the Department options to only be those that are ForeignKey'ed to Group? The standard answer is that the admin site is only for simple CRUD operations.
But doing what you're supposed to do is boring.
You could probably overcome this limitation with some ninja javascript and JSON.
So first of all, we need an API that can let us know which departments are available for each group.
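from django.http import JsonResponse

def api_departments_from_group(request, group_id):
    departments = Department.objects.filter(group__id=group_id)
    # Serialize only what the client needs (assumes Department has a name field)
    return JsonResponse(list(departments.values('id', 'name')), safe=False)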
Once the API is in place we can add some javascript to change the <option>'s on the department select...
$(function() {
  // On page load...
  if ($('#id_group').length) {
    // Trap when the group box is changed
    $('#id_group').bind('blur', function() {
      $.getJSON('/api/get-departments/' + $('#id_group').val() + '/', function(data) {
        // Clear existing options
        $('#id_department').children().remove();
        // Parse JSON and turn into <option> tags (assumes each item has id and name)
        $.each(data, function(i, item) {
          $('#id_department').append($('<option>').val(item.id).text(item.name));
        });
      });
    });
  }
});
Save that to admin-ninja.js. Then you can include it on the admin model itself...
from django.contrib import admin

class EmployeeAdmin(admin.ModelAdmin):
    # ...
    class Media:
        js = ('/media/admin-ninja.js',)
Yeah, so I didn't test a drop of this, but hopefully you can get some ideas. Also, I didn't get fancy with anything; for example, the JavaScript doesn't account for an option already being selected (and then re-selecting it).