Setting a foreign key and a primary key in GCP Firestore - google-cloud-platform

I am new to GCP and NoSQL.
Is it possible to have primary and foreign keys in GCP Firestore?
Example: I have two tables, STUDENT and DEPARTMENT.
The tables look like this:
Department table
dept-id (primary key)
deptname
Student table
dept-id (foreign key)
student-id
student name
Can anybody please help in designing this in GCP Firestore?

To a database, a key is the same as any UUID/random ID and can be shared and used between users, teams, admins, and businesses of all kinds. What matters is how that data is associated. Since Firestore is a NoSQL database, there are no direct relational references, so one key cannot be tied to another without a secondary lookup.
In the same way you would identify a user profile by an ID, you can create an empty document with a random ID to serve as the ID of a team, or in this case the department. You can also use string combinations if you have a team and a sub-team; as long as you have access to the team/department ID at the point of the database request, you can use a regex to match against it.
Example: request.resource.data.name.matches('^' + departmentID + '.*')
To make a foreign key work with Security Rules or within the client, you must fetch the document that contains the data, and the key should be the document ID of the document in question to streamline the request, because you cannot perform queries or loop through data within Security Rules.
A great read on this subject; I highly suggest this article:
https://medium.com/firebase-developers/a-list-of-firebase-firestore-security-rules-for-your-project-fe46cfaf8b2a
But my suggestion is to use a key that represents the department directly, rather than spending additional resources on creating and managing a foreign key.
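As a rough sketch of that design with the Node.js Admin SDK (the collection and field names here are assumptions, not anything Firestore requires):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function createDepartmentAndStudent() {
  // The department document's auto-generated ID plays the role of dept-id (the "primary key")
  const deptRef = await db.collection('departments').add({ deptName: 'Physics' });

  // Each student document simply stores that ID in a field -- the "foreign key"
  await db.collection('students').add({
    studentId: 'S-001',
    studentName: 'Jane',
    deptId: deptRef.id
  });

  // "Join" by querying students on the stored department ID
  const students = await db.collection('students')
    .where('deptId', '==', deptRef.id)
    .get();
  return students.size;
}

Nothing in Firestore enforces that deptId points at an existing department document; keeping that consistent is up to your application code and Security Rules.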

Firestore does not support referential integrity.
This means you can use any names you like for fields (subject to rules and conventions), but the semantics and any additional functionality have to be maintained by you rather than by the system.
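For example, following the student-to-department reference from the question has to be done by hand with two reads (a sketch, assuming the departments/students collections from the earlier answer):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function getStudentWithDepartment(studentDocId) {
  const studentSnap = await db.collection('students').doc(studentDocId).get();
  // Second lookup: Firestore will not follow or validate this reference for you
  const deptSnap = await db.collection('departments').doc(studentSnap.data().deptId).get();
  return { student: studentSnap.data(), department: deptSnap.data() };
}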

Related

How to use dynamodb:LeadingKeys when Partition key has more than one kind of values

My Dynamo tables have tenant_id as the partition key in my multi-tenant application, but the partition key also contains other kinds of entities in addition to tenant_id.
For example: (This is a small example, we are using this pattern throughout)
PK                     SK        Att
Customer-4312a674-54a  user-abc  672453782
user-abc               user-abc  672453782
I would like to use dynamodb:LeadingKeys to ensure data of one tenant can never be accessed by another tenant. How can I go about that in this case, when the PK is overloaded and holds other entities as well?
In a multi-tenant system my recommendation would be to add the tenant_id as a prefix to the partition key of all items belonging to the tenant. That way you can use the dynamodb:LeadingKeys condition for access control.
The tenant_id should be known at query time for every query anyway; my guess is that it's probably stored in the session information. This means you can add the tenant_id to every key and still do partition key overloading.
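As a sketch of the resulting IAM policy condition (the table name, the # separator in the prefix, and aws:PrincipalTag as the source of the tenant id are all assumptions):

{
  "Effect": "Allow",
  "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
  "Resource": "arn:aws:dynamodb:*:*:table/MyTable",
  "Condition": {
    "ForAllValues:StringLike": {
      "dynamodb:LeadingKeys": ["${aws:PrincipalTag/tenant_id}#*"]
    }
  }
}

With StringEquals you could pin access to exact partition key values; with a prefixed, overloaded key the StringLike wildcard is what lets the rest of the key (Customer-..., user-..., etc.) vary per item.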

Should Dynamodb apply single table design instead of multiple table design when the entities are not relational

Let’s assume there are mainly 3 tables for the current database.
Pkey = partition key
Admin
- id (Pkey), username, email, createdAt, updatedAt
Banner
- id (Pkey), isActive, createdAt, caption
News
- id (Pkey), createdAt, isActive, title, message
None of the above tables has a relation to the others, and more tables will be required in the future (I think most of them also won't have relations to other tables).
According to the aws document
You should maintain as few tables as possible in a DynamoDB application.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
So I was considering whether I need to combine these 3 tables into a single table.
Should I start to use a single table from now on, or keep using multiple tables for the database?
If using a single table, how should I design the table schema?
DynamoDB is a NoSQL database, hence you design your schema specifically to make the most common and important queries as fast and as inexpensive as possible. Your data structures are tailored to the specific requirements of your business use cases.
When designing a data model for your DynamoDB Table, you should start from the access patterns of your data that would in turn inform the relation (or lack thereof) among them.
Two interesting resources that would help you get started are From SQL to NoSQL and NoSQL Design for DynamoDB, both part of the AWS Developer Documentation of DynamoDB.
In your specific example, based on the questions you're trying to answer (i.e. use case & access patterns), you could either work with only the Partition Key or more likely, benefit from the usage of composite Sort Keys / Sort Key overloading as described in Best Practices for Using Sort Keys to Organize Data.
Update: adding an example table design to get you started.
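One possible layout, purely for illustration (the key format and attribute names are assumptions):

PK            SK        Attributes
ADMIN#<id>    METADATA  username, email, createdAt, updatedAt
BANNER#<id>   METADATA  isActive, createdAt, caption
NEWS#<id>     METADATA  createdAt, isActive, title, message

Each entity type lives in the same table under its own partition key prefix. If you later need to list all items of one type (for example, all active banners), a sparse global secondary index keyed on the entity type could cover that access pattern without affecting the others.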

AWS Dynamo Smart Home Schema (Locations/Rooms)

I am trying to build out a scalable smart home infrastructure on AWS using IoT Core, Lambda, and DynamoDB, along with the Serverless Framework and a subsequent Android/iOS app.
I am implementing locations and rooms in DynamoDB. A user can have many locations, and locations can have many rooms. I am used to Firebase Firestore, so the use of partition keys and sort keys (hash and range?) and how they combine in queries is a little confusing. I implemented my own hash to use as a primary (partition? hash?) id. Here is the structure I am thinking of:
Location
id
name
username
I also added a secondary index on username, so that a user could query all of their locations.
Room
id
name
locationId
I also added a secondary index on locationId, so that a user could query all rooms for a given location
Here is the code in which I create the ids:
const crypto = require('crypto');

// need a unique hash for the id
let hash = event.name + event.username + new Date().getTime();
let id = crypto.createHash('md5').update(hash).digest('hex');
let location = {
  id: id,
  name: event.name,
  username: event.username
};
And for rooms:
// need a unique hash for the id
let hash = event.name + event.locationId + new Date().getTime();
Since I'm fairly new to Dynamo/AWS, I'm wondering if this is an acceptable solution. Obviously I would expand on this by adding multiple devices under rooms, associated via the roomId. I would also like to be able to share devices, and I'm not quite sure how that would work, as the association for a user is on location - so I assume I would have to share the location, room(s), and device(s) (which I think is how Google Home does it).
Any suggestions would be greatly appreciated!
EDIT
The queries that I can think of would be:
Get Location by Id
Get all Locations by User
Get Room by Id
Get all Rooms by Location
However, as the app expands in the future, I would want these queries to be flexible (share a location, get shared locations, etc.).
I would want these queries to be flexible
Then NoSQL in general, and Dynamo specifically, may not be the right choice.
As @varnit alludes to, NoSQL DBs are very flexible in what you store, but very inflexible in how you can query that data.
Dynamo, for instance, can only return a list (Query) if you use a sort key (SK) or if you do a full table scan (not recommended). Otherwise, it can only return a single record.
I don't understand what a "shared location" would entail.
But with multiple tenants in Dynamo (each user is only looking at their own data), the easy solution would be to use userID as the partition key (PK).
I'd use a composite sort key of location#room
Get Location by Id --> GetItem(PK = User, SK = location)
Get all Locations by User --> Query (PK = User)
Get all rooms by Location --> Query (PK = User, SK starts with Location)
This one is a little trickier...
- Get Room by Id -->
If you really need to get a room without having the location, then you'd want to have the room as a stand-alone attribute in addition to having it as part of the sort key. Then you can create a local secondary index over it and Query (PK = User, Index SK = Room).
I suspect that finding a room via GetItem(PK = User, SK = location#room) might work for you instead.
Key point: the partition key comparison is always equality. There's no starts-with, ends-with, or contains for the partition key comparison.
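A rough sketch of the access patterns above with the AWS SDK v2 DocumentClient (the table name SmartHome and the PK/SK attribute names are assumptions):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function demoQueries(userId) {
  // Get Location by Id -> GetItem(PK = user, SK = location)
  const location = await docClient.get({
    TableName: 'SmartHome',
    Key: { PK: userId, SK: 'Home' }
  }).promise();

  // Get all items (locations and rooms) by User -> Query(PK = user)
  const everything = await docClient.query({
    TableName: 'SmartHome',
    KeyConditionExpression: 'PK = :u',
    ExpressionAttributeValues: { ':u': userId }
  }).promise();

  // Get all Rooms by Location -> Query(PK = user, SK begins_with "location#")
  const rooms = await docClient.query({
    TableName: 'SmartHome',
    KeyConditionExpression: 'PK = :u AND begins_with(SK, :loc)',
    ExpressionAttributeValues: { ':u': userId, ':loc': 'Home#' }
  }).promise();

  return { location: location.Item, everything: everything.Items, rooms: rooms.Items };
}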
If you haven't seen them, take a look at the following videos
AWS re:Invent 2018: Building with AWS Databases: Match Your Workload to the Right Database (DAT301)
AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401)
Also be sure to read the SaaS Storage Strategies - Building a Multitenant Storage Model on AWS whitepaper.
EDIT
"location" and "room" can be whatever makes the most sense to your application. GUID or a natural key such as "Home". In a noSQL db, GUIDs are useful when multiple nodes are adding records. But a natural key is good when that what the application user will have handy. Since you don't want to have to look up a guid by the natural key. RDBMS practices don't apply to noSQL DBs.
So yes, I'd use "Home" as the location, meaning the user won't be able to have multiple "Home"s. But I don't see that as a big deal, I'd use "Home" and "Vacation House" in real life.
EDIT2
Dynamo doesn't care if it's a GUID or a natural key. It internally hashes whatever value you use for the partition key. All that matters is the number of distinct values. Distinct is distinct; it doesn't matter if the value is '0ae4ad25-5551-46a7-8e39-64619645bd58' or 'charles.wilt@mydomain.com'. If your authorization process returns a GUID, use that. Otherwise use the username.

DynamoDB table/index schema design for querying multi-valued attributes

I'm building a DynamoDB app that will eventually serve a large number (millions) of users. Currently the app's item schema is simple:
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  email: "foo@foo.com",
  ... other attributes ...
}
When a new user signs up, or if a user wants to find another user by email address, we'll need to look up users by email instead of by userId. With the current schema that's easy: just use a global secondary index with email as the Partition Key.
But we want to enable multiple email addresses per user, and the DynamoDB Query operation doesn't support a List-typed KeyConditionExpression. So I'm weighing several options to avoid an expensive Scan operation every time a user signs up or wants to find another user by email address.
Below is what I'm planning to change to enable additional emails per user. Is this a good approach? Is there a better option?
Add a sort key column (e.g. itemTypeAndIndex) to allow multiple items per userId.
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  itemTypeAndIndex: "main", // sort key
  email: "foo@foo.com",
  ... other attributes ...
}
If the user adds a second, third, etc. email, then add a new item for each email, like this:
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  itemTypeAndIndex: "Email-2", // sort key
  email: "bar@bar.com"
  // no more attributes
}
The same global secondary index (with email as the Partition Key) can still be used to find both primary and non-primary email addresses.
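For instance, a lookup by email through that GSI could look like this (the table name Users and the index name email-index are assumptions):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function findUserByEmail(email) {
  const res = await docClient.query({
    TableName: 'Users',
    IndexName: 'email-index',   // GSI whose partition key is email
    KeyConditionExpression: 'email = :e',
    ExpressionAttributeValues: { ':e': email }
  }).promise();
  // Matches either the "main" item or an "Email-N" item; the userId attribute identifies the user either way
  return res.Items[0];
}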
If a user wants to change their primary email address, we'd swap the email values in the "primary" and "non-primary" items. (Now that DynamoDB supports transactions, doing this will be safer than before!)
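A sketch of that swap using the DocumentClient transaction API (the table name and key values are assumptions):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function swapPrimaryEmail(userId, currentPrimary, newPrimary) {
  // Both updates succeed or fail together, so the two items never disagree
  await docClient.transactWrite({
    TransactItems: [
      {
        Update: {
          TableName: 'Users',
          Key: { userId: userId, itemTypeAndIndex: 'main' },
          UpdateExpression: 'SET email = :e',
          ExpressionAttributeValues: { ':e': newPrimary }
        }
      },
      {
        Update: {
          TableName: 'Users',
          Key: { userId: userId, itemTypeAndIndex: 'Email-2' },
          UpdateExpression: 'SET email = :e',
          ExpressionAttributeValues: { ':e': currentPrimary }
        }
      }
    ]
  }).promise();
}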
If we need to delete a user, we'd have to delete all the items for that userId. If we need to merge two users then we'd have to merge all items for that userId.
The same approach (new items with same userId but different sort keys) could be used for other 1-user-has-many-values data that needs to be Query-able
Is this a good way to do it? Is there a better way?
Justin, for searching on attributes I would strongly advise not to use DynamoDB. I am not saying you can't achieve this. However, I see a few problems that will eventually come your way if you go down this route.
Using a sort key on email-id will result in duplicate records for the same user, i.e. if a user has registered 5 emails, that implies 5 records in your table with the same schema and attributes except for the email-id attribute.
What if a new use case comes up in the future where you also want to search for a user based on some other attribute (for example cell phone number, assuming a user may have more than one cell phone number)?
DynamoDB also limits the number of secondary indexes you can create for a table (a hard limit of 5 local secondary indexes; global secondary indexes have a default quota as well).
Thus, with more and more search criteria, this solution will easily become a bottleneck for your system. As a result, your system may not scale well.
To the best of my knowledge, I can suggest a few options that you may choose from, based on your requirements/budget, to address this problem using a combination of databases.
Option 1. DynamoDB as a primary store and AWS Elasticsearch as secondary storage [Preferred]
Store the user records in a DynamoDB table (let's call it UserTable) as and when a user registers.
Enable DynamoDB Streams on the UserTable table.
Build an AWS Lambda function that reads from the table's stream and persists the records in AWS Elasticsearch (a rough sketch of such a function is at the end of this answer).
Now in your application, use DynamoDB for fetching user records by id. For all other search criteria (like searching on emailId, phone number, zip code, location etc.) fetch the records from AWS Elasticsearch. AWS Elasticsearch by default indexes all the attributes of your record, so you can search on any field with millisecond latency.
Option 2. Use AWS Aurora [Less preferred solution]
If your application has a relational use-case where data are related, you may consider this option. Just to call out, Aurora is a SQL database.
Since this is a relational storage, you can opt for organizing the records in multiple tables and join them based on the primary key of those tables.
I suggest the 1st option because:
DynamoDB will provide you durable, highly available, low latency primary storage for your application.
AWS Elasticsearch will act as secondary storage, which is also durable, scalable and low latency storage.
With AWS Elasticsearch, you can run any search query on your data. You can also do analytics on the data. A Kibana UI is provided out of the box, which you can use to plot the analytical data on a dashboard (how user growth is trending, how many users belong to a specific location, user distribution by city/state/country, etc.).
With DynamoDB Streams and AWS Lambda, you will be syncing these two databases in near real time [within a few milliseconds].
Your application will be scalable, and the search feature can further be enhanced to do filtering on multi-level attributes. [One such example: search all users who belong to a given city.]
Having said that, now I will leave this up to you to decide. 😊
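Purely as a sketch of the Lambda function mentioned in Option 1 (indexUserInElasticsearch is a hypothetical helper, not a real API; the stream view type is assumed to be NEW_AND_OLD_IMAGES):

const AWS = require('aws-sdk');

// Lambda handler attached to the UserTable stream
exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName === 'INSERT' || record.eventName === 'MODIFY') {
      // Convert the DynamoDB-typed stream image into a plain JavaScript object
      const user = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);
      await indexUserInElasticsearch(user);
    }
  }
};

// Hypothetical helper -- replace with a real call to your Elasticsearch domain (e.g. a signed HTTP request)
async function indexUserInElasticsearch(user) {
  console.log('would index user', user.userId);
}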

One primary key column as a foreign key to 2 other tables' columns. How to resolve the data entry issue?

I have a requirement according to which I have to create a central login system. We have 2 things, Corporate and Brand, each represented by the tables "Corporate" and "Brand".
When a corporate gets registered, a CorporateID is given. When a user under that corporate gets registered, there is a table CorporateUser in which CorporateID is a foreign key and CorporateUserID is a primary key. Similarly in the case of a brand.
So we have CorporateUserID and BrandUserID.
Now I have a table called RegisteredUsers in which I want to have corporate as well as brand users. UserID is the primary key in this table, and it is a foreign key to both CorporateUser and BrandUser.
Now when I enter a corporate user, I make an entry in CorporateUser as well as RegisteredUsers. When I enter the CorporateUserID as the UserID for RegisteredUsers, it gives a foreign key violation error.
I fully understand this error. How can I achieve this? The requirement is very rigid. Please suggest a workaround.
What you're trying to do is not totally clear, but it seems that you want the primary key of all three user tables to be the same. This is not a strict foreign key relationship, but it seems reasonable in your application.
You need to assign the UserID in RegisteredUsers first, and use that key when you create your Corporate User or Brand User. Then the user IDs will be unique across the whole system.
If that's not what you want, edit your entry with the table layouts to make the problem clearer.
If you are trying to insert records into tables with relational constraints, you will need to do all the inserts within one SQL transaction.
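A sketch of that insert order inside a single transaction (T-SQL style; the UserType column and the variable names are assumptions based on the question):

BEGIN TRANSACTION;

-- Parent row first: the shared user id is created in RegisteredUsers
INSERT INTO RegisteredUsers (UserID, UserType)
VALUES (@NewUserID, 'Corporate');

-- Child row second: reuse the same id so the foreign key to RegisteredUsers is satisfied
INSERT INTO CorporateUser (CorporateUserID, CorporateID)
VALUES (@NewUserID, @CorporateID);

COMMIT;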