I have an application that allows "contacts" to be completely customized. My method of doing that is letting the administrator set up all of the fields allowed for the contact. My database is as follows:
Contacts: id, active, lastactive, created_on
Fields: id, label
FieldValues: id, fieldid, contactid, response
So the contact table only tells whether they are active and their identifier; the fields table only holds the label and identifier of each field; and the fieldvalues table is what actually holds the data for contacts (name, address, etc.).
So this setup has worked just fine for me up until now. The client would like to be able to pull a cumulative report, for example a count of contacts by state and by city within each state. Effectively the data would have to look like the following:
California (from fields table)
Costa Mesa - (from fields table) 5 - (counted in fieldvalues table)
Newport 2
Connecticut
Wallingford 2
Clinton 2
Berlin 5
The state field might be id 6 and the city field might be id 4. I don't know if I have just been looking at this code way too long to figure it out or what.
The SQL to create those three tables can be found at https://s3.amazonaws.com/davejlong/Contact.sql
You've got an Entity Attribute Value (EAV) model. Use the field and fieldvalue tables for searching only, i.e. in the WHERE clause. Then make life easier by keeping the full entity's data in a CLOB off the main table (e.g. Contacts.data) in a serialized format (WDDX is good for this). Read the data column out, deserialize it, and work with it on the server side. This is much easier than the myriad of joins you'd otherwise need to reproduce the fully hydrated entity from an EAV setup.
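As a rough sketch of that approach, the report could be built by deserializing each contact and counting on the server side. This example makes several assumptions that are not in the original: it uses JSON instead of WDDX for serialization, SQLite instead of the real database, a hypothetical Contacts.data column, made-up sample rows, and field labels named "State" and "City".

```python
import json
import sqlite3
from collections import Counter, defaultdict

conn = sqlite3.connect(":memory:")
# Assumed schema: the full contact is serialized into Contacts.data (JSON here, not WDDX).
conn.execute("CREATE TABLE Contacts (id INTEGER PRIMARY KEY, active INTEGER, data TEXT)")

# Hypothetical sample contacts, keyed by field label.
samples = [
    {"State": "California", "City": "Costa Mesa"},
    {"State": "California", "City": "Costa Mesa"},
    {"State": "California", "City": "Newport"},
    {"State": "Connecticut", "City": "Berlin"},
]
conn.executemany(
    "INSERT INTO Contacts (active, data) VALUES (1, ?)",
    [(json.dumps(s),) for s in samples],
)


def city_counts_by_state(connection):
    """Deserialize each active contact and count contacts per state and city."""
    counts = defaultdict(Counter)
    for (blob,) in connection.execute("SELECT data FROM Contacts WHERE active = 1"):
        contact = json.loads(blob)
        counts[contact["State"]][contact["City"]] += 1
    return counts


report = city_counts_by_state(conn)
for state in sorted(report):
    print(state)
    for city, n in sorted(report[state].items()):
        print(f"  {city} {n}")
```

The EAV tables would still be written alongside the serialized column so that they can be used for filtering in the WHERE clause.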
good day guys,
I need your opinion on this problem. Although I am using Django for my project, I am sure this problem is not tied to Django alone. I am working on a services booking system. In my database I have the 3 tables listed below:
User_Table with field
• Id
• Username
• Fullname
Services_Table with field
• Id
• name
• Price
Transaction_Table with field
• Id
• User_id
• Services_id (many to many relationship)
When services get booked, I save them to the transaction table using user_id and services_id as foreign keys to the User table and Services table, meaning it's the id values that are saved.
When a client wants to view his or her transaction history, I provide it by running the query:
price = transaction.service.price
service_name = transaction.service.name
total_cost = sum of all services selected
so as not to present the user with id values for price and service_name.
Now here is my problem: in future, if the admin decides to change the name and price of a service and the client goes back to view an old transaction log, the new values get populated because I referenced them by id. That is not what I want. I want the client to see the old values, as a receipt would show, even after I update the services table.
What do you suggest I do in this case?
You should record every transaction made and record the price and amount it totalled up to at the moment the txn was made. Transaction model should have fields to record every detail about the transaction.
This means:
You would have a txn_service table, where all services in a transaction are saved and linked to the transaction table.
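A minimal sketch of that snapshot idea, using SQLite rather than Django models. The table and column names (service, txn, txn_service, service_name, price, total_cost), the sample services, and the choice to store prices as integers in minor units are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE txn (id INTEGER PRIMARY KEY, user_id INTEGER, total_cost INTEGER);
-- One row per booked service, with name and price copied at booking time.
CREATE TABLE txn_service (
    txn_id INTEGER REFERENCES txn(id),
    service_id INTEGER REFERENCES service(id),
    service_name TEXT,
    price INTEGER
);
""")
conn.executemany(
    "INSERT INTO service (id, name, price) VALUES (?, ?, ?)",
    [(1, "Haircut", 2000), (2, "Shave", 1000)],  # hypothetical services
)


def book(connection, user_id, service_ids):
    """Create a transaction and snapshot each service's current name and price."""
    placeholders = ",".join("?" * len(service_ids))
    rows = connection.execute(
        f"SELECT id, name, price FROM service WHERE id IN ({placeholders})",
        service_ids,
    ).fetchall()
    total = sum(price for _, _, price in rows)
    cur = connection.execute(
        "INSERT INTO txn (user_id, total_cost) VALUES (?, ?)", (user_id, total)
    )
    connection.executemany(
        "INSERT INTO txn_service VALUES (?, ?, ?, ?)",
        [(cur.lastrowid, sid, name, price) for sid, name, price in rows],
    )
    return cur.lastrowid


def receipt(connection, txn_id):
    """Read the receipt from the snapshot rows only, never from the service table."""
    lines = connection.execute(
        "SELECT service_name, price FROM txn_service WHERE txn_id = ? ORDER BY service_id",
        (txn_id,),
    ).fetchall()
    (total,) = connection.execute(
        "SELECT total_cost FROM txn WHERE id = ?", (txn_id,)
    ).fetchone()
    return lines, total


txn_id = book(conn, user_id=7, service_ids=[1, 2])
# The admin later renames and reprices a service; the old receipt is unaffected.
conn.execute("UPDATE service SET name = 'Deluxe haircut', price = 3500 WHERE id = 1")
lines, total = receipt(conn, txn_id)
print(lines, total)
```

Keeping service_id alongside the copied values still lets you link back to the current service when needed.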
I am all new to NoSQL and specifically DynamoDB single table design. Have been going through a lot of videos and articles on the internet regarding the single-table design and finally I have put together a small design for a chat application which I am planning to build in the future.
The access patterns I have so far thought about are -
Get user details by User Id.
Get list of conversations the user is part of.
Get list of messages the user has created
Get all members of a conversation
Get all messages of a conversation
Also want to access messages of a conversation by a date range, so far I haven't figured out that one.
As per the below design, if I were to pull all messages of a conversation, is that going to pull the actual message in the message attribute which is in the message partition?
Here is the snip of the model I have created with some sample data on. Please let me know if I am in the right direction.
As per the below design, if I were to pull all messages of a conversation, is that going to pull the actual message in the message attribute which is in the message partition?
No, it will only return the IDs of a message as the actual content is in a separate partition.
I'd propose a different model - it consists of a table with a Global Secondary Index (GSI1). The layout is like this:
Base Table:
Partition Key: PK
Sort Key: SK
Global Secondary Index GSI1:
Partition Key: GSI1PK
Sort Key: GSI1SK
Base Table
GSI 1
Access Patterns
1.) Get user details by User Id.
GetItem on Base Table with Partition Key = PK = U#<id> and Sort Key SK = USER
2.) Get list of conversations the user is part of.
Query on Base Table with Partition Key = PK = U#<id> and Sort Key SK = starts_with(CONV#)
3.) Get list of messages the user has created
Query on GSI1 with Partition Key GSI1PK = U#<id>
4.) Get all members of a conversation
Query on Base Table with Partition Key = PK = CONV#<id> and Sort Key SK starts_with(U#)
5.) Get all messages of a conversation
Query on Base Table with Partition Key PK = CONV#<id> and Sort Key SK starts_with(MSG#)
6.) Also want to access messages of a conversation by a date range, so far I haven't figured out that one.
DynamoDB does Byte-Order Sorting in a partition - if you format all dates according to ISO 8601 in the UTC timezone, you can make the range query, e.g.:
Query on Base Table with Partition Key PK = CONV#<id> and Sort Key SK between(MSG#2021-09-20, MSG#2021-09-30)
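To see why ISO 8601 strings work for range queries, here is a small in-memory simulation of an inclusive BETWEEN over sorted sort keys. It does not call DynamoDB; the sort key values, the timestamp format with a time component, and the helper name between are assumptions for illustration. For ASCII strings, Python's string comparison gives the same order as byte-order sorting.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sort keys inside one partition (PK = CONV#<id>), kept sorted.
sort_keys = sorted([
    "MSG#2021-09-19T23:59:00Z",
    "MSG#2021-09-20T08:15:00Z",
    "MSG#2021-09-25T12:00:00Z",
    "MSG#2021-09-30T09:30:00Z",
    "MSG#2021-10-01T00:00:00Z",
    "U#42",
])


def between(keys, low, high):
    """Return every key k with low <= k <= high from a sorted list."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


# Upper bound without a time part: keys on 2021-09-30 that carry a time sort after it.
narrow = between(sort_keys, "MSG#2021-09-20", "MSG#2021-09-30")
# Upper bound extended to the end of the day.
wide = between(sort_keys, "MSG#2021-09-20", "MSG#2021-09-30T23:59:59Z")

print(narrow)
print(wide)
```

If the sort keys include a time component as assumed here, the upper bound has to cover the whole last day, otherwise messages sent on that day fall outside the range.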
I'm new to DynamoDB and trying to figure out how to structure my data/table/index. My schema includes an itemid (unique) and an orderid (multiple items per order), along with some other arbitrary attributes. I want to be able to retrieve a single item by its itemid, but also retrieve a set of items by their OrderId.
My initial instinct was to set the itemid as the primary key and the orderid as the sort key, but that didn't allow me to query by orderid only. However the same problem occurs if I reverse those.
Example data:
ItemId    OrderId
abc-123   1234
def-345   1234
ghi-678   5678
jkl-901   5678
I think I may need a Global Secondary Index, but I don't quite understand where those fit.
If your question is really whether you "are able" to do this, then with ItemId as the partition key, you can still retrieve by OrderId, with the Scan operation, which will let you filter by any attribute.
However, Scan performs a full table scan, so the real question is probably whether you can retrieve by OrderId efficiently. In that case, you would indeed need a Global Secondary Index with OrderId and ItemId as its composite key (OrderId as the partition key, ItemId as the sort key).
This is typically achieved using what's called a "single table design". What this means is that you store all your data in one table, and store it denormalized, i.e. you duplicate your data so that it fits your access patterns.
Generally speaking, if you do not know your access patterns beforehand, DynamoDB might not be a good fit. For many systems, a good solution is to serve the "main" access patterns from DynamoDB and offload less performance-critical ad-hoc queries by replicating data to something like Elasticsearch.
If you have a table with the hash key PK (String) and the sort key SK (String), you can store your data like this. Use transactions to keep the multiple items up to date and consistent etc.
PK            SK            shippingStatus  totalPrice  cartQuantity
order_1234    order_status  PENDING         123123
order_1234    item_abc-123                              3
order_1234    item_def-345                              1
order_5678    order_status  SHIPPED         54321
order_5678    item_jkl-901                              5
item_abc-123  order_1234
item_abc-123  order_9876
item_abc-123  order_5656
This table illustrates the schemaless nature of a DynamoDB table (apart from the PK/SK). With this setup, you can store "metadata" about the order in the order_1234/order_status item. Then, you can query for items with PK order_1234 and SK starting with "item_" to get all the items for that order. You can do the same to get all the orders for an item - query for PK item_abc-123 and SK starting with "order_" to get all the orders.
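The two lookups can be illustrated with a small in-memory stand-in for the table. This is only a simulation of the key layout, not DynamoDB client code; the query helper and the dict-based storage are assumptions for illustration, and the rows mirror the example data above.

```python
from collections import defaultdict

# (PK, SK, other attributes) rows mirroring the example table.
rows = [
    ("order_1234", "order_status", {"shippingStatus": "PENDING", "totalPrice": 123123}),
    ("order_1234", "item_abc-123", {"cartQuantity": 3}),
    ("order_1234", "item_def-345", {"cartQuantity": 1}),
    ("order_5678", "order_status", {"shippingStatus": "SHIPPED", "totalPrice": 54321}),
    ("order_5678", "item_jkl-901", {"cartQuantity": 5}),
    ("item_abc-123", "order_1234", {}),
    ("item_abc-123", "order_9876", {}),
    ("item_abc-123", "order_5656", {}),
]

# Partition key -> {sort key -> attributes}
table = defaultdict(dict)
for pk, sk, attrs in rows:
    table[pk][sk] = attrs


def query(pk, sk_prefix=""):
    """Return (SK, attributes) pairs in one partition whose SK starts with sk_prefix."""
    partition = table.get(pk, {})
    return [(sk, partition[sk]) for sk in sorted(partition) if sk.startswith(sk_prefix)]


items_in_order = [sk for sk, _ in query("order_1234", "item_")]
orders_for_item = [sk for sk, _ in query("item_abc-123", "order_")]

print(items_in_order)
print(orders_for_item)
```

Both lookups touch a single partition, which is what makes them efficient compared with a Scan.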
I highly recommend this talk by Rick Houlihan to get into single table design and data modelling in dynamo :)
https://www.youtube.com/watch?v=HaEPXoXVf2k
I've been trying to create my first star schema based on Google Classroom data for a week. I put a description of the tables from my most recent attempt below. I didn't list descriptive fields not relevant to my question.
I have a table visual that shows CourseName, StudentsEnrolled (it works)
StudentsEnrolled = CALCULATE(DISTINCTCOUNT(gc_FactSubmissions[StudentID]))
I am trying to create a table visual that shows StudentName, CourseWorkTitle, PointsEarned, MarkPct.
MarkPct =
divide(sum(gc_FactSubmissions[PointsEarned]),sum(gc_DimCourseWork[MaxPoints]))
When I try to add StudentName to the visual, I end up with incorrect results (some blank student names and incorrect totals). When I try to use DAX Related(), I can only select fields in the Submissions table.
I’ve spent countless hours of Googling sites/pages like the following one and others:
https://exceleratorbi.com.au/the-optimal-shape-for-power-pivot-data/
I think the problem is the gc_DimStudents table because it contains a student record for every student that is enrolled in a gc_DimCourses. Not all students enrolled have submitted assignments, so if I limited the gc_DimStudents to only the StudentIDs in gc_FactSubmissions, then I won’t be able to get a count of StudentsEnrolled in courses.
I’m not sure how to resolve this. Should gc_DimCourses also be made into a fact table? With a gc_DimCourseStudents and a gc_DimSubmissionStudents? Then I’d have to create a surrogate key to join gc_FactSubmissions to the new gc_FactCourses? If that is true, then as I add more fact tables to my model, is it normal to have many DimAnotherStudentList in many places in a Star Schema model?
I want to keep building on this star schema because we want reports/dashboards that relate things like online marks, to attendance, to disciplinary actions, etc., etc. So I want to get the relationships correct this time.
Any help is very much appreciated.
Thanks,
JMC
gc_FactSubmissions (contains one record for every combination of the 4 ID fields, no blanks)
CourseID (many to 1 join to gc_Dimcourses.CourseID )
OwnerID (many to 1 join to gc_DimOwners.OwnerID)
CourseWorkID (many to 1 join to gc_DimCourseWork.CourseWorkID)
StudentID (many to 1 join to gc_DimStudents)
SubmissionID
SubmissionName
PointsEarned (int, default to sum)
(other descriptive fields)
gc_DimCourseWork (one CourseWorkID for each gc_FactSubmissions.CourseWorkID)
CourseWorkID (it is distinct, no blanks)
CourseWorkName
MaxPoints (int, default to sum)
(other descriptive fields)
gc_DimCourses (one CourseID for each gc_FactSubmissions.CourseID)
CourseID (it is distinct, no blanks)
CourseName
(other descriptive fields)
gc_DimOwners (one OwnerID for each gc_FactSubmissions.OwnerID)
OwnerID (it is distinct, no blanks)
OwnerName
(other descriptive fields)
gc_DimStudents (one StudentID for each gc_FactSubmissions.CourseID)
StudentID (distinct, no blanks)
StudentName
(other descriptive fields)
A Snowflake Schema is one where Dimensions are related to each other directly rather than via a Fact table - so no, adding another fact table to your model doesn't make it a Snowflake.
An Enrolment fact would have FKs to any Dimensions that are relevant to Enrolments - so Course, Student, probably at least 1 date and whatever other enrolment attributes there may be.
As an additional comment, while there are many incorrect ways of modelling a star schema there can also be many correct ways of modelling it: there is rarely one correct answer. For example, for your Submissions Star you could denormalise your Course data into your CourseWork Dim and possibly also include the Owner data (I assume Owner is Owner of the course?). The fewer joins there are in any query the better the performance. If another fact, such as Enrolment, needed to be related to a Course Dim (rather than to Coursework) then you'd need to consider the trade-off in performance of having fewer joins to one fact and having to maintain the course data in two different Dims (Course and Coursework).
As a star schema is denormalised there is no issue with the same data appearing in multiple tables (within reason). The most common example is a Date Dim that has date, week, month and year attributes and a Month Dim that has just month and year attributes.
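As a rough illustration of the Enrolment fact idea, the sketch below uses SQLite with two fact tables that share the same Student and Course dimensions. The table names without the gc_ prefix, the FactEnrolments table, and the sample rows are assumptions; the real model would live in Power BI rather than SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimStudents (StudentID INTEGER PRIMARY KEY, StudentName TEXT);
CREATE TABLE DimCourses (CourseID INTEGER PRIMARY KEY, CourseName TEXT);
-- Hypothetical Enrolment fact: one row per student enrolled in a course.
CREATE TABLE FactEnrolments (CourseID INTEGER, StudentID INTEGER);
-- Submissions fact: only students who actually submitted appear here.
CREATE TABLE FactSubmissions (CourseID INTEGER, StudentID INTEGER, PointsEarned INTEGER);

INSERT INTO DimStudents VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO DimCourses VALUES (10, 'Math');
INSERT INTO FactEnrolments VALUES (10, 1), (10, 2), (10, 3);
INSERT INTO FactSubmissions VALUES (10, 1, 8), (10, 2, 5);
""")

# Enrolment counts come from the Enrolment fact, so non-submitters are still counted.
enrolled = conn.execute("""
    SELECT c.CourseName, COUNT(DISTINCT e.StudentID)
    FROM FactEnrolments e
    JOIN DimCourses c ON c.CourseID = e.CourseID
    GROUP BY c.CourseName
""").fetchall()

# Points come from the Submissions fact, joined to the same Student dimension.
points = conn.execute("""
    SELECT s.StudentName, SUM(f.PointsEarned)
    FROM FactSubmissions f
    JOIN DimStudents s ON s.StudentID = f.StudentID
    GROUP BY s.StudentName
    ORDER BY s.StudentName
""").fetchall()

print(enrolled)
print(points)
```

Both facts reference one shared student dimension, so there is no need for a separate student list per fact table.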
I have 3 tables in my DB:
user (_id, name)
event (_id, name, ...)
events_partecipants(user_id, event_id)
I have two Doctrine entities which map those tables and their relations, and everything works (e.g. I'm able to get all the participants for a specific event).
Now, I want to retrieve the number of events each user has joined. Using pure SQL the query would be:
SELECT user_id, COUNT(*) as count
FROM events_partecipants
GROUP BY user_id
ORDER BY count DESC
In the result, I also want to retrieve the name of each user, so I can send back JSON with information about each user by name and not by ID.
How can this be achieved with Doctrine? I cannot find a smart way to do it.