I’m attempting to create a multi-tenant application with DynamoDB and Cognito. The documentation is pretty clear on how to implement fine-grained authorisation so that users can access only their own records, by adding a condition to the IAM access policy like so:
"Condition": {
    "ForAllValues:StringEquals": {
        "dynamodb:LeadingKeys": [
            "${cognito-identity.amazonaws.com:sub}"
        ]
    }
}
This is great for allowing users to read & write their own records, when the Cognito user id is the hash key of the row, but I’m struggling with how to allow other users to have read only access to some records.
Take as an example my model for a student who has multiple courses:
{
    "student_id": "ABC-1234567",
    "course_name": "Statistics 101",
    "tutors": ["Cognito-sub-1", "Cognito-sub-2"],
    "seminar_reviews": [
        {
            "seminar_id": "XXXYYY-12345",
            "date": "2018-01-12",
            "score": "8",
            "comments": "Nice class!"
        },
        {
            "seminar_id": "ABCDEF-98765",
            "date": "2018-01-25",
            "score": "3",
            "comments": "Boring."
        }
    ]
}
(Cognito-sub-1 is the Cognito id of a tutor)
With the policy conditions above applied to the user’s IAM role, the user could read & write this document since the hash key (student_id) is the Cognito id of the user.
I’d also like the tutors listed in the document to have read-only access to certain attributes, but I can’t find any examples of how this can be done. I know that I can’t use the dynamodb:LeadingKeys condition since tutors is not the hash key of the table. Can this be done if I set up a Global Secondary Index (GSI) that uses the list of tutors as the hash key?
If this can be done with an index, I assume that this would only allow read access to that index (since an index can’t allow write operations). Is there any alternative method to allow write access based on an attribute that is not the hash key?
Alternatively, can I use a longer string as the hash key, concatenating attributes like "owner": and "read-only": that contain lists of Cognito IDs, and consume this within my policy to create a more fine-grained permissions model based only on the hash key? This assumes a policy can decode lists from a string, since DynamoDB does not allow a hash key to be a list, JSON object or similar.
I haven’t been able to find any resources that consider fine-grained access control beyond allowing users to read/write only their own records, so if anyone can direct me to some, that would be a great start.
Restricting access to specific attributes is easy with an IAM policy condition on dynamodb:Attributes, but that only controls which attributes are visible, not which items.
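As a rough sketch of what attribute-level restriction looks like, the snippet below builds a policy document that limits reads to a fixed set of attributes using the dynamodb:Attributes and dynamodb:Select condition keys. The table ARN, the function name and the choice of attributes are assumptions made up for illustration; note that this limits columns only and does nothing to decide which tutor may read which row.

```python
import json


def build_attribute_read_policy(table_arn, allowed_attributes):
    """Return an IAM policy document (as a dict) that only allows reading
    the listed attributes. Hypothetical helper, not part of any AWS SDK."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": [table_arn],
                "Condition": {
                    # Every requested attribute must be in this list.
                    "ForAllValues:StringEquals": {
                        "dynamodb:Attributes": list(allowed_attributes)
                    },
                    # Forces callers to name attributes rather than select all.
                    "StringEqualsIfExists": {
                        "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                    },
                },
            }
        ],
    }


# The ARN and attribute names below are placeholders.
policy = build_attribute_read_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/StudentCourses",
    ["student_id", "course_name", "seminar_reviews"],
)
print(json.dumps(policy, indent=2))
```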
However, to achieve more fine-grained access patterns you'd have to either:
offload the access-control task to your code (e.g. a Lambda function)
or evaluate your access patterns (generally a good thing, though it can be a little trickier) and model your data accordingly
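For the first option, the check could look roughly like the sketch below: code that sits between the caller and the table, decides whether the caller is the owner or a listed tutor, and trims the item accordingly. The function name and the set of tutor-readable attributes are assumptions; the item shape follows the student document from the question.

```python
# Attributes a tutor may see (an assumption for this example).
READABLE_BY_TUTORS = {"student_id", "course_name", "seminar_reviews"}


def view_for(caller_sub, item):
    """Return the part of `item` that the caller may read.

    The owner (hash key equals the caller's Cognito sub) sees everything,
    a listed tutor sees only the whitelisted attributes, anyone else is
    rejected.
    """
    if caller_sub == item["student_id"]:
        return dict(item)
    if caller_sub in item.get("tutors", []):
        return {k: v for k, v in item.items() if k in READABLE_BY_TUTORS}
    raise PermissionError("caller may not read this record")


record = {
    "student_id": "ABC-1234567",
    "course_name": "Statistics 101",
    "tutors": ["Cognito-sub-1", "Cognito-sub-2"],
    "seminar_reviews": [],
}
print(sorted(view_for("Cognito-sub-1", record)))
```

The same function is a natural place to reject writes from tutors, since only the owner branch would ever be allowed to call a write path.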
Generally speaking, when designing NoSQL applications, you should always evaluate how you consume your data. NoSQL data models are usually tailored to a specific use case, unlike an RDBMS, which allows very general queries regardless.
There's a nice example regarding modeling relational data in terms of DynamoDB available here
I have an AWS Amplify application that has a structure with multi-organizations:
Organization A -> Content of Organization A
Organization B -> Content of Organization B
Let's say we have a user, Alice. Alice belongs to both organizations but has a different role in each: in Organization A she is an administrator with more privileges (e.g. she can delete content or modify others' content), while in Organization B she is a regular user.
For this reason I cannot simply set regular groups on Amplify (Cognito), because some users, like Alice, can belong to different groups on different organizations.
One solution I thought of was having a group for each combination of organization and role,
e.g. OrganizationA__ADMIN, OrganizationB__USER, etc.
So I could restrict the access on the schema using a group auth directive on the Content model:
{allow: group, groupsField: "group", operations: [update]},
The content would have a group field with a value: OrganizationA__ADMIN
Then I could add the user to the group using the Admin Queries API
However, it doesn't seem to be possible to add a user to a group dynamically; I'd have to manually create each group every time a new organization is created, which pretty much kills my idea.
Any other idea on how I can achieve the result I'm aiming for?
I know that I can add the restriction in code, but this is less safe, and I'd rather have this constraint on the database layer.
Look into generating additional claims in your pre-token-generation handler.
Basically you can create an attribute that includes organization role mapping
e.g.
{
// ...
"custom:orgmapping": "OrgA:User,OrgB:Admin"
}
then transform them in your pre-token-generation handler into "pseudo" groups that don't actually exist in the pool.
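A minimal sketch of such a handler in Python is below. It assumes the mapping lives in a custom attribute called custom:orgmapping in the "Org:Role" comma-separated form shown above, and it names the pseudo groups in the OrganizationA__ADMIN style from the question; both are illustrative choices. It uses the groupOverrideDetails part of the pre-token-generation response, and be aware that groupsToOverride replaces the user's real pool groups in the token rather than adding to them.

```python
def parse_org_mapping(raw):
    """Turn "OrgA:User,OrgB:Admin" into ["OrgA__USER", "OrgB__ADMIN"]."""
    groups = []
    for pair in raw.split(","):
        org, sep, role = pair.strip().partition(":")
        if sep and org and role:  # skip empty or malformed entries
            groups.append(f"{org}__{role.upper()}")
    return groups


def handler(event, context):
    """Pre token generation trigger: emit pseudo groups in the token."""
    attrs = event["request"]["userAttributes"]
    groups = parse_org_mapping(attrs.get("custom:orgmapping", ""))
    event["response"]["claimsOverrideDetails"] = {
        "groupOverrideDetails": {"groupsToOverride": groups}
    }
    return event


# Local dry run with a hand-made event (not a full Cognito event).
sample = {
    "request": {"userAttributes": {"custom:orgmapping": "OrgA:User,OrgB:Admin"}},
    "response": {},
}
print(handler(sample, None)["response"])
```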
I have a requirement to build a basic "3 failed login attempts and your account gets locked" functionality. The project uses AWS Cognito for Authentication, and the Cognito PreAuth and PostAuth triggers to run a Lambda function look like they will help here.
So the basic flow is to increment a counter in the PreAuth lambda, check it and block login there, or reset the counter in the PostAuth lambda (so successful logins don't end up locking the user out). Essentially it boils down to:
PreAuth Lambda
    if failed-login-count > LIMIT:
        block login
    else:
        increment failed-login-count
PostAuth Lambda
    reset failed-login-count to zero
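Here is the same flow as a small runnable Python sketch. A plain dict stands in for whatever store holds the counter (the DynamoDB table, in my case), and LIMIT, LoginBlocked and the function names are made up for the example. One detail worth noting: the sketch compares with >= so that the account locks after exactly LIMIT failed attempts; with a strict > one extra attempt gets through.

```python
LIMIT = 3

# Stand-in for the real counter store; keyed by username.
failed_counts = {}


class LoginBlocked(Exception):
    """Raised from the PreAuth step to reject the sign-in attempt."""


def pre_auth(username):
    """Runs before every attempt: block if locked, otherwise count it."""
    count = failed_counts.get(username, 0)
    if count >= LIMIT:
        raise LoginBlocked(f"{username} is locked after {count} failed attempts")
    # Count the attempt up front; a successful login resets it in post_auth.
    failed_counts[username] = count + 1


def post_auth(username):
    """Runs only after a successful login: clear the counter."""
    failed_counts[username] = 0
```

In a real trigger the two functions would be the bodies of the two Lambda handlers, and raising from the PreAuth handler is what makes Cognito refuse the sign-in.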
Now at the moment I am using a dedicated DynamoDB table to store the failed-login-count for a given user. This seems to work fine for now.
Then I figured it'd be neater to use a custom attribute in Cognito (using CognitoIdentityServiceProvider.adminUpdateUserAttributes) so I could throw away the DynamoDB table.
However reading https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-dg.pdf the section titled "Configuring User Pool Attributes" states:
Attributes are pieces of information that help you identify individual users, such as name, email, and phone number. Not all information about your users should be stored in attributes. For example, user data that changes frequently, such as usage statistics or game scores, should be kept in a separate data store, such as Amazon Cognito Sync or Amazon DynamoDB.
Given that the counter will change on every single login attempt, the docs would seem to indicate I shouldn't do this...
But can anyone tell me why? Or if there would be some negative consequence of doing so?
As far as I can see, Cognito billing is purely based on storage (i.e. number of users), and not operations, whereas Dynamo charges for read/write/storage.
Could it simply be AWS not wanting people to abuse Cognito as a storage mechanism? Or am I being daft?
We are dealing with a similar problem, and the main reason we decided to store extra attributes in a DB is that Cognito has quotas for all actions, and "AdminUpdateUserAttributes" is limited to 25 per second.
More information here:
https://docs.aws.amazon.com/cognito/latest/developerguide/limits.html
So if you have a pool with 100k or more users, updating a Cognito user record on every login can create a bottleneck.
Cognito UserAttributes are meant to store information about the users. This information can then be read from the client using the AWS Cognito SDK, or just by decoding the idToken on the client-side. Every custom attribute you add will be visible on the client-side.
Other downsides of custom attributes:
You can define only 25 of them.
They cannot be removed or changed once added to the user pool.
I have personally used custom attributes and the interface to manipulate them is not excellent. But that is just a personal thought.
If you want to store this information, and not depend on DynamoDB, you can use Amazon Cognito Sync. Besides the service, it offers a client with great features that you can incorporate into your app.
DynamoDB appears to be your best option; it is commonly used for such use cases. Some of the benefits of using it:
You can store a separate record for each login attempt with as much information as you want, such as IP address, location and user agent. You can also add a timestamp that the pre-auth Lambda can use to query by time range, for example failed attempts within the last 30 minutes.
You don't need to clean up the table, because you can set a TTL on each DynamoDB record so that it is deleted automatically after a specified time.
You can also archive items in S3
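To make the per-attempt record and TTL idea concrete, here is a small sketch. The attribute names (attempted_at, expires_at), the 30-minute window and the one-day retention are assumptions for illustration; the one hard requirement from DynamoDB is that the TTL attribute is a number holding an epoch timestamp in seconds. The list of records stands in for the result of a query on the table.

```python
import time

WINDOW_SECONDS = 30 * 60           # look-back window for counting failures
RETENTION_SECONDS = 24 * 60 * 60   # how long a record lives before TTL expiry


def make_attempt_record(username, ip_address, user_agent, now=None):
    """Build one item per failed attempt, including the TTL attribute."""
    now = int(time.time()) if now is None else now
    return {
        "username": username,
        "attempted_at": now,
        "ip_address": ip_address,
        "user_agent": user_agent,
        # Epoch seconds; DynamoDB deletes the item some time after this.
        "expires_at": now + RETENTION_SECONDS,
    }


def recent_failures(records, username, now):
    """Count a user's failed attempts inside the look-back window."""
    cutoff = now - WINDOW_SECONDS
    return sum(
        1
        for r in records
        if r["username"] == username and r["attempted_at"] >= cutoff
    )
```

Since TTL deletion is not immediate, the time-range filter is still needed even when expired items are supposed to be gone.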
I'm developing a huge application in Django and I need a permission system; I assume the native user/group permissions within Django are not sufficient. Here are my needs:
The application will be available across multiple departments. In each department there will be nearly the same actions. But a user might be allowed to add a new team member in department A, only be allowed to view the team list in department B, and have no access at all in the other departments.
I thought using an RBAC system would be most appropriate. Roles must also be inheritable, stored in a model and manageable through an interface. Any good ideas or suggestions? Regards
What you are looking for is called ABAC (Attribute-Based Access Control). It's an evolution of RBAC as an access control model. In RBAC, you define access control in terms of roles, groups, and potentially permissions. You then have to write code within your application to make sense of the roles and groups. This is called identity-centric access control.
In ABAC, there are 2 new elements:
attributes, which are a generalization of groups and roles. An attribute is a key-value pair that can describe anyone and anything. For instance, department, member, and action are all attributes.
policies tie attributes together to determine whether access should be granted or denied. Policies are a human-friendly way of expressing authorization. Rather than write custom code in your app, you write a policy that can be centrally managed and reused across apps, databases and APIs.
There are a couple of ABAC languages such as XACML and ALFA. Using ALFA, I could write the following policy:
A user will be allowed to add a new team member in department A
In department B he is only allowed to view the team list
In the other departments he has no access at all.
Roles must also be inheritable, stored in a model and manageable through an interface.
policyset appAccess{
    apply firstApplicable
    policy members{
        target clause object == "member"
        apply firstApplicable
        /**
         * A user can add a member to a department if they are a manager and if they are assigned to that department.
         */
        rule addMember{
            target clause role == "manager" and action == "add"
            permit
            condition user.department == target.department
        }
    }
}
One of the key benefits of ABAC is that you can develop as many policies as you like, audit them, share them, and not have to touch your application code at all because you end up externalizing authorization.
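To show what the addMember rule above boils down to when evaluated, here is a plain Python sketch of the same logic. This is not how a XACML/ALFA engine is implemented or called; the request dict and its key names are assumptions, and treating anything other than an explicit Permit as "no access" is a choice made by the caller in this example.

```python
def add_member_rule(request):
    """Evaluate the addMember rule against one request.

    Returns "Permit" when target and condition both match, otherwise
    "NotApplicable" (the rule simply has nothing to say).
    """
    if request["object"] != "member":
        return "NotApplicable"
    if request["role"] != "manager" or request["action"] != "add":
        return "NotApplicable"
    if request["user_department"] == request["target_department"]:
        return "Permit"
    return "NotApplicable"


def is_allowed(request):
    """Deny unless some rule explicitly permits (only one rule here)."""
    return add_member_rule(request) == "Permit"


request = {
    "object": "member",
    "role": "manager",
    "action": "add",
    "user_department": "A",
    "target_department": "A",
}
print(is_allowed(request))
```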
There are several engines / projects that implement ABAC such as:
AuthZForce (a Java library for XACML authorization)
Axiomatics Policy Server (commercial product - disclaimer: I work there)
AT&T XACML
There are two components to this question:
First, role management. Roles can be achieved through group membership, e.g. departmentA_addMember and departmentB_listMembers. These groups would have corresponding permissions attached, e.g. "Member | Add" and "Member | View". A department in this context may include more resources that require separate permissions. Django lets you extend models with custom permissions.
Second, inheritance. Do I understand correctly that you want individual groups to be members of other groups? If so, this is something you would have to implement yourself in Django.
However, should you be looking for a more complex authentication solution, it may be worthwhile to integrate with third-party services through, e.g., django-allauth. There are certainly other solutions; this is just one name to throw in.
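Since inheritance has to be hand-rolled, here is a rough, framework-free sketch of the idea: roles can name parent roles, and a user's effective permissions are the union over the whole chain. In a Django project the two dicts would be models; the role names, the (department, permission) tuples and both helper functions are invented for the example.

```python
# Child role -> parent roles it inherits from (hypothetical data).
ROLE_PARENTS = {
    "departmentA_manager": ["departmentA_viewer"],
    "departmentA_viewer": [],
    "departmentB_viewer": [],
}

# Permissions attached directly to each role, scoped by department.
ROLE_PERMISSIONS = {
    "departmentA_manager": {("departmentA", "member.add")},
    "departmentA_viewer": {("departmentA", "member.view")},
    "departmentB_viewer": {("departmentB", "member.view")},
}


def effective_permissions(roles):
    """Collect permissions of the given roles and all their ancestors."""
    seen, stack, perms = set(), list(roles), set()
    while stack:
        role = stack.pop()
        if role in seen:  # guards against cycles in the role graph
            continue
        seen.add(role)
        perms |= ROLE_PERMISSIONS.get(role, set())
        stack.extend(ROLE_PARENTS.get(role, []))
    return perms


def can(roles, department, permission):
    return (department, permission) in effective_permissions(roles)


user_roles = ["departmentA_manager", "departmentB_viewer"]
print(can(user_roles, "departmentA", "member.add"))
```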
I'm trying to figure out if this is possible with Baqend, or even the correct approach to begin with.
I have a bunch of users, using the default user account system that comes with Baqend.
Some of these users will be administrators of a company. A company will have somewhere between 1 and 5 users who are administrators.
There is a separate data class that contains a record for the company and an array of users who are the administrators.
Like this:
{
id: "/db/Companies/123-456-789",
name: "Test Co",
admins: [
{ id: "/db/Users/10", name: "Joe Schmo" },
{ id: "/db/Users/11", name: "Kate Skate" },
{ id: "/db/Users/12", name: "Johny Begood" }
]
}
What is the approach to ensure that only users 10, 11, and 12 can modify the contents of the admins array and whatever else is contained in /db/Companies/123-456-789 ?
Is it as simple as inserting the additional admin's info into the array and also adding that person to the ACL of /db/Companies/123-456-789 at the same time or right after?
Also, what is the way to remove a person's ACL? I see how to set it here: https://www.baqend.com/guide/topics/user-management/#permissions but how do we remove or delete it? And what is the difference between explicitly denying that user in the ACL vs that user simply not existing (and I guess by default being denied, assuming the entire collection is set to NOT be public in the first place)?
For our use, just because an administrator leaves does not mean he leaves OUR APP, he might go work for another customer who uses our app and his user account should remain valid, but with no more access to the company record.
I think you already have it quite right: you can add administrators to a company by adding them to the admins array and by explicitly allowing them read and write access in the ACLs. To remove admin rights, you can simply remove the explicit allow rules for the soon-to-be-ex admin.
In Baqend, permissions are enforced like this:
superusers are always allowed.
explicitly denied users and roles are always denied.
if there is no allow rule for a record, access is public. As soon as there is at least one allow rule, only allowed users are granted access.
Since every record is public unless there is at least one allow rule, your record will be protected when you allow access to the administrators. However, it will be public again as soon as you remove the last administrator from the access ruleset. Therefore, it's probably a good idea to always explicitly allow write access to your Baqend app admin, so that there is always at least one allow rule.
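The three rules above, and the "public again" pitfall, can be sketched in a few lines of Python. This is only a model of the described behaviour, not Baqend code: roles are left out, and the function and parameter names are made up.

```python
def is_allowed(user, is_superuser, allow, deny):
    """Model of the evaluation order described above.

    allow and deny are sets of user ids holding the record's explicit rules.
    """
    if is_superuser:
        return True          # superusers are always allowed
    if user in deny:
        return False         # explicit deny always wins
    if not allow:
        return True          # no allow rule at all: the record is public
    return user in allow     # otherwise only explicitly allowed users


admins = {"/db/Users/10", "/db/Users/11", "/db/Users/12"}
print(is_allowed("/db/Users/13", False, admins, set()))  # outsider, protected record
print(is_allowed("/db/Users/13", False, set(), set()))   # same outsider, no allow rules left
```

The second call is the case to watch for: once the last allow rule is removed, the outsider gets access again, which is why keeping a permanent allow rule for the app admin is a sensible safeguard.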
Let me try to explain how exactly ACLs in Baqend work.
TL;DR
To secure your object /db/Companies/123-456-789 you can simply add an allow rule for each of your three user ids (/db/Users/10, /db/Users/11, /db/Users/12) to the object ACLs of your company object like this:
db.Companies.load("/db/Companies/123-456-789").then(function(company) {
    company.allowWriteAccess("/db/Users/10");
    company.allowWriteAccess("/db/Users/11");
    company.allowWriteAccess("/db/Users/12");
    return company.save();
})
This ensures that only these users can edit the company object. Notably, this list of rules is independent of the list of admins contained in your company object. To revoke the write access of a user, you can use deleteWriteAccess in the same way we used allowWriteAccess before.
This means your users can leave a Company easily without leaving your app.
I hope this answers your question. Because ACLs are complex I will try to explain the general approach in more detail now.
How ACLs Work
On which level can access be controlled?
There are two levels to control access to your data:
On table level (so-called Schema ACLs)
On object level (so-called Object ACLs)
Schema ACLs define who is allowed access to the table in general. For example, you could define that the User table is not readable for the public by granting read access only to the admin:
allowReadAccess("/db/Role/admin") // Schema ACLs can only be set by the admin
You can define rules for reading, updating, inserting and querying on the table separately.
Object ACLs define access on the lower level. You can use them to deny access to a specific object. For example, you could define that only the user itself can update its own User object, like this:
allowWriteAccess("<userId>")
For objects, you can define rules for reading and writing separately.
Who has access now (how are permissions evaluated)?
In order to access an object, a user needs to have general permission to access the table (Schema ACLs) and also permission to access the object itself (Object ACLs). This means the Schema ACLs are evaluated first; if they grant you access, the Object ACLs are evaluated as well.
Which rules can I define?
There are two types of rules that can be defined to allow or deny access:
Allow Rules define who has access in general. These rules are checked first. If you do not define allow rules, everyone has general access.
Deny Rules define who is denied access (even if the user was allowed by an allow rule). These rules are checked after the allow rules.
Take a look at the JS API for ACLs for the actual method documentation.
These separate rules can be tricky at the start but they are really powerful. Let's do some examples. How can I use these rules to ...
Deny access for everyone: --> Set a single allow rule, for the admin role.
Allow access for logged-in users but not for some guy Peter (like when you block someone in a chat application): --> Set an allow rule for the loggedin role and a deny rule for Peter.
Only allow access from backend code modules: --> Set an allow rule for the node role (see below for the explanation of Roles).
Who can I grant or deny access?
There are two entities you can use in your allow and deny rules:
Users from the predefined User table can be granted or denied access
Groups of Users defined in the predefined Role table can be granted or denied access. This includes the predefined roles admin, loggedin (represents all logged-in users) and node (represents backend code modules that access the database)
What is the default access for my tables?
Tables and objects are publicly accessible by default if not configured otherwise.
What about attribute-level ACLs?
There are no attribute level ACLs in Baqend. This means when you have a User object with a private email address and a public name you can only make the object private or public.
The solution for this is to use two objects, one for the private information and one for the public information and then link the two. For the User, this would mean you make the actual User object private and define a new Profile table where you keep the public user information.
While this solution is more work when defining your schema, there are good reasons why Baqend does not support attribute-level ACLs. Without going into too much detail:
Better caching. Attribute-level ACLs would severely limit how we can cache and therefore accelerate your database requests.
Expensive evaluation. Attribute-level ACLs are much harder to evaluate and therefore slow down database access. Object-level ACLs, on the other hand, can be pushed down to our database system and are evaluated very efficiently.
Something missing
I hope these explanations help to understand the ACL system better. If there is something missing here, just comment and I will add it.
I'm working on a web application that uses a bunch of Amazon Web Services. I'd like to use DynamoDB for a particular part of the application but I'm not sure if it's an appropriate use-case.
When a registered user on the site performs a "job", an entry is recorded and stored for that job. The job has a bunch of details associated with it, but the most relevant thing is that each job has a unique identifier and an associated username. Usernames are unique too, but there can of course be multiple job entries for the same user, each with different job identifiers.
The only query that I need to perform on this data is: give me all the job entries (and their associated details) for username X.
I started to create a DynamoDB table but I'm not sure if it's right. My understanding is that the chosen hash key should be the key that's used for querying/indexing into the table, but it should be unique per item/row. Username is what I want to query by, but username will not be unique per item/row.
If I make the job identifier the primary hash key and the username a secondary index, will that work? Can I have duplicate values for a secondary index? But that means I will never use the primary hash key for querying/indexing into the table, which is the whole point of it, isn't it?
Is there something I'm missing, or is this just not a good fit for NoSQL?
Edit:
The accepted answer helped me find out what I was looking for as well as this question.
I'm not totally clear on what you're asking, but I'll give it a shot...
With DynamoDB, the combination of your hash key and range key must uniquely identify an item. Range key is optional; without it, hash key alone must uniquely identify an item.
You can also store a list of values (rather than just a single value) as an item's attributes. If, for example, each item represented a user, an attribute on that item could be a list of that user's job entries.
If you're concerned about hitting the size limitation of DynamoDB records, you can use S3 as backing storage for that list - essentially use the DDB item to store a reference to the S3 resource containing the complete list for a given user. This gives you flexibility to query for or store other attributes rather easily. Alternatively (as you suggested in your answer), you could put the entire user's record in S3, but you'd lose some of the flexibility and throughput of doing your querying/updating through DDB.
Perhaps a "Jobs" table would work better for you than a "User" table. Here's what I mean.
If you're worried about all of those jobs inside a user document adding up to more than the 400 KB limit, why not store the jobs individually in a table like:
my_jobs_table:
{
    {
        Username: toby,
        JobId: 1234,
        Status: Active,
        CreationDate: 2014-10-05,
        FileRef: some-reference1
    },
    {
        Username: toby,
        JobId: 5678,
        Status: Closed,
        CreationDate: 2014-10-01,
        FileRef: some-reference2
    },
    {
        Username: bob,
        JobId: 1111,
        Status: Closed,
        CreationDate: 2014-09-01,
        FileRef: some-reference3
    }
}
Username is the hash and JobId is the range. You can query on the Username to get all the user's jobs.
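As a sketch of what "query on the Username" means in practice, the helper below builds the parameters for a DynamoDB Query call against that key schema, in the low-level (typed attribute value) format. The helper name is made up, the table name is taken from the example above, and the boto3 call in the comment is shown only for orientation and is not executed here.

```python
def build_jobs_query(username, table_name="my_jobs_table"):
    """Parameters for fetching every job item of one user.

    Only the hash key (Username) is constrained, so all range key
    values (JobId) for that user are returned.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "Username = :u",
        "ExpressionAttributeValues": {":u": {"S": username}},
    }


params = build_jobs_query("toby")
print(params)

# With boto3 installed and credentials configured, this would be passed as:
#   boto3.client("dynamodb").query(**params)
```

Large result sets come back in pages, so a real caller would also have to follow LastEvaluatedKey until it is absent.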
Now that the size of each document is more limited, you could think about putting all the data for each job in the DynamoDB record instead of using the FileRef and looking it up in S3. This would probably save a significant amount of latency.
Each record might then look like:
{
    Username: bob,
    JobId: 1111,
    Status: Closed,
    CreationDate: 2014-09-01,
    JobCategory: housework,
    JobDescription: Doing the dishes,
    EstimatedDifficulty: Extreme,
    EstimatedDuration: 9001
}
I reckon I didn't really play with the DynamoDB console for long enough to get a good understanding before posting this question. I only just understood now that a DynamoDB table (and presumably any other NoSQL table) is really just a giant dictionary/hash data structure. So to answer my question, yes I can use DynamoDB, and each item/row would look something like this:
{
    "Username": "SomeUser",
    "Jobs": {
        "gdjk345nj34j3nj378jh4": {
            "Status": "Active",
            "CreationDate": "2014-10-05",
            "FileRef": "some-reference"
        },
        "ghj3j76k8bg3vb44h6l22": {
            "Status": "Closed",
            "CreationDate": "2014-09-14",
            "FileRef": "another-reference"
        }
    }
}
But I'm not sure it's even worth using DynamoDB after all that. It might be simpler to just store a JSON file containing that content structure above in an S3 bucket, where the filename is the username.json
Edit:
For what it's worth, I just realized that DynamoDB has a 400KB size limit on items. That's a huge amount of data, relatively speaking for my use-case, but I can't take the chance so I'll have to go with S3.
It seems that username as the hash key and a unique job_id as the range key, as others have already suggested, would serve you well in DynamoDB. Using a query you can quickly fetch all records for a username.
Another option is to take advantage of local secondary indexes and sparse indexes. It seems that there is a status column, but based upon what I've read you could add another column, perhaps 'not_processed': 'x', and make your local secondary index on username+not_processed. Only records which have this field are indexed, and once a job is complete you delete this field. This means you can query the index for a username where not_processed=x and read only the unprocessed jobs instead of scanning the table. Your index will also be small.
All my relational DB experience seems to be getting in the way of my understanding DynamoDB. Good luck!
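The sparse index behaviour can be illustrated with a tiny in-memory model: an item shows up in the index only while it carries the not_processed attribute. Nothing here talks to DynamoDB; the function names and sample items are invented, and the list comprehension merely imitates what the index would contain.

```python
def sparse_index(items, index_key="not_processed"):
    """Items that would appear in an index keyed on `index_key`."""
    return [item for item in items if index_key in item]


def unprocessed_jobs(items, username):
    """Equivalent of querying the index for one username."""
    return [i for i in sparse_index(items) if i["Username"] == username]


def mark_processed(item):
    """Removing the attribute drops the item out of the index."""
    item.pop("not_processed", None)


jobs = [
    {"Username": "toby", "JobId": 1234, "not_processed": "x"},
    {"Username": "toby", "JobId": 5678},
    {"Username": "bob", "JobId": 1111, "not_processed": "x"},
]
print([j["JobId"] for j in unprocessed_jobs(jobs, "toby")])
```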