I am creating a database for an achievement system (like something you would see in a Blizzard game). I would like to have a GUI that displays the current progress of all achievements in the game which means I will need to query the progress of all achievements for a user in order to populate the GUI. I plan on having somewhere around 100 achievements.
This brings about a design question. What is the best way to design the database and querying code to query the progress of ~100 bit fields?
It seems like the brute force method would be to get the entire row of achievements and then for each field in the row do some hardcoded string comparison to determine which achievement we are dealing with.
Another possible solution may be to have a big switch statement based on the column index of the table and handle each achievement for each case (requires not modifying the table or you have to refactor a lot of C++ code).
I'm curious to hear any other designs you guys may have for this.
Thanks!
I suggest building a solution using 3 tables. These tables are users, achievements and user_achievements. A user would be identified with a u_id in the users table. An achievement would be identified with a a_id in the achievements table. You would then keep track of users achievements by inserting a row in the user_achievements table that includes a u_id to identify the user and a a_id to identify the achievement. The user_achievements table would also contain a column that would specify the % completion of that achievement for the given user.
Came across this question and even though it's 5 years old, perhaps someone would be interested in following approach.
Achievements are usually broken down to numbers (the rest, like Name, Description of each achievement can be put to site/app core to avoid bloating the DB).
lets be simple, we are not FB and don't need separate table for them, so in "users" table we add just 1 single column: "Achievements" it is a varchar(50). Number in brackets (50) will depend on your actual needs to this column (i.e. how much data it stores).
so you end up having in each cell of the Achievements column a numerical sequence: 10982039482084109384
Read this line of digits as follows, from left to right: user has reached "1098 profile views", received "2039 likes", etc. Optionally, add a separator for easier distinction + to instantly handle cases when as first user had 25 likes, then 125, then 2039 (2 digits, 3 digits, 4 digits - or another alternative is to use 0025 then 0125 then 2039 given you know max digits is 4 per achievement). But still lets say we decide to use separators, i.e. a comma:
1098,2039,4820,8410,9384
Then once you need a data, just SELECT achievements belonging to specific userID and subsequently (if you added a separator)
explode (',', $array)
then your site php core knows that first 4 digits stand for "profile views" and lets say this means that he has a level 10 badge for profile views (1 badge for 100 views).
Thereon, you can easily do operations with no further need for SQL queries. Example, user wants to know his progress on achieving a level 20 badge, you display: he has a 1098/2000 (or 55%) progress.
At that, achievement Description, Name, level information is stored in site core, while percentage is calculated on the go.
Hope the logic is clear and may be useful to any1 in community out there.
Cheers!
Related
I am modelling the data of my application to use DynamoDB.
My data model is rather simple:
I have users and projects
Each user can have multiple projects
Users can be millions, project per users can be thousands.
My access pattern is also rather simple:
Get a user by id
Get a list of paginated users sorted by name or creation date
Get a project by id
get projects by user sorted by date
My single table for this data model is the following:
I can easily implement all my access patterns using table PK/SK and GSIs, but I have issues with number 2.
According to the documentation and best practices, to get a sorted list of paginated users:
I can't use a scan, as sorting is not supported
I should not use a GSI with a PK that would put all my users in the same partition (e.g. GSI PK = "sorted_user", SK = "name"), as that would make my single partition hot and would not scale
I can't create a new entity of type "organisation", put all users in there, and query by PK = "org", as that would have the same hot partition issue as above
I could bucket users and use write sharding, but I don't really know how I could practically query paginated sorted users, as bucket PKs would need to be possibly random, and I would have to query all buckets to be able to sort all users together. I also thought that bucket PKs could be alphabetical letters, but that could crated hot partitions as well, as the letter "A" would probably be hit quite hard.
My application model is rather simple. However, after having read all docs and best practices and watched many online videos, I find myself stuck with the most basic use case that DynamoDB does not seem to be supporting well. I suppose it must be quite common to have to get lists of users in some sort of admin panel for practically any modern application.
What would others would do in this case? I would really want to use DynamoDB for all the benefits that it gives, especially in terms of costs.
Edit
Since I have been asked, in my app the main use case for 2) is something like this: https://stackoverflow.com/users?tab=Reputation&filter=all.
As to the sizing, it needs to scale well, at least to the tens of thousands.
I also thought that bucket PKs could be alphabetical letters, but
that could create hot partitions as well, as the letter "A" would
probably be hit quite hard.
I think this sounds like a reasonable approach.
The US Social Security Administration publishes data about names on its website. You can download the list of name data from as far back as 1879! I stumbled upon a website from data scientist and linguist Joshua Falk that charted the baby name data from the SSA, which can give us a hint of how names are distributed by their first letter.
Your users may not all be from the US, but this can give us an understanding of how names might be distributed if partitioned by the first letter.
While not exactly evenly distributed, perhaps it's close enough for your use case? If not, you could further distribute the data by using the first two (or three, or four...) letters of the name as your partition key.
1 million names likely amount to no more than a few MBs of data, which isn't very much. Partitioning based on name prefixes seems like a reasonable way to proceed.
You might also consider using a tool like ElasticSearch, which could support your second access pattern and more.
I’m working on a dating app for a hackathon project. We have a series of questions that users fill out, and then every few days we are going to send suggested matches. If anyone has a good tutorial for these kinds of matching algorithms, it would be very appreciated. One idea is to assign a point value for each question and then to do a
def comparison(person_a, person_b) function where you iterate through these questions, and where there’s a common answer, you add in a point. So the higher the score, the better the match. I understand this so far, but I’m struggling to see how to save this data in the database.
In python, I could take each user and then iterate through all the other users with this comparison function and make a dictionary for each person that lists all the other users and a score for them. And then to suggest matches, I iterate through the dictionary list and if that person hasn’t been matched up already with that person, then make a match.
person1_dictionary_of_matches = {‘person2’: 3, ‘person3’: 5, ‘person4’: 10, ‘person5’: 12, ‘person6’: 2,……,‘person200’:10}
person_1_list_of_prior_matches = [‘person3’, 'person4']
I'm struggling on how to represent this in django. I could have a bunch of users and make a Match model like:
class Match(Model):
person1 = models.ForeignKey(User)
person2 = models.ForeignKey(User)
score = models.PositiveIntegerField()
Where I do the iteration and save all the pairwise scores.
and then do
person_matches = Match.objectsfilter(person1=sarah, person2!=sarah).order_by('score').exclude(person2 in list_of_past_matches)
But I’m worried with 1000 users, I will have 1000000 rows in my table if do this. Will this be brutal to have to save all these pairwise scores for each user in the database? Or does this not matter if I run it at like Sunday night at 1am or just cache these responses once and use the comparisons for a period of months? Is there a better way to do this than matching everyone up pairwise? Should I use some other data structure to capture the people and their compatibility score? Thanks so much for any guidance!
Interesting question. In machine learning's current paradigm you work with sparse matrices that means that you would not have to perform every single match evaluation. The sparsity may come from two alternatives:
Create a batch offline analysis of your data to perform some clustering (fancy solution).
Filter the individuals by some key attributes: a) gender/sexual preference, b) geographical location, c) dating status etc. (simple solution)
After the filtering you could perform a function for estimating appropriate matches for the new user. Based on the selected choices of the user adscribe selected matches into the database for future queries. However, if you get serious about this problem I suggest you give Spark a try. This is not a problem for an SQL database but for a Big Data Engine.
I'm building an app where two users can connect with each other and I need to store that connection (e.g. a friendship) in a DynamoDB table. Basically, the connection table has have two fields:
userIdA (hash key)
userIdB (sort key)
I was thinking to add an index on userIdB to query on both fields. Should I store a connection with one record (ALICE, BOB) or two records (ALICE, BOB; BOB, ALICE)? The first option needs one write operation and less space, but I have to query twice to get all all connections of an user. The second option needs two write operations and more space, but I only have to query once for the userId.
The user tablehas details like name and email:
userId (hash key)
name (sort key)
email
In my app, I want to show all connections of a certain user with user details in a listview. That means I have two options:
Store the user details of the connected users also in the connection table, e.g. add two name fields to that table. This is fast, but if the user name changes (name and email are retrieved from Facebook), the details are invalid and I need to update all entries.
Query the user details of each userId with a Batch Get request to read multiple items. This may be slower, but I always have up to date user details and don't need to store them in the connection table.
So what is the better solution, or are there any other advantages/disadvantages that I may have overlooked?
EDIT
After some google research regarding friendship tables with NoSQL databases, I found the following two links:
How does Facebook maintain a list of friends for each user? Does it maintain a separate table for each user?
NoSQL Design Patterns for Relational Data
The first link suggests to store the connection (or friendship) in a two way direction with two records, because it makes it easier and faster to query:
Connections:
1 userIdA userIdB
2 userIdB userIdA
The second link suggests to save a subset of duplicated data (“summary”) into the tables to read it faster with just one query. That would be mean to save the user details also into the connection table and to save the userIds into an attribute of the user table:
Connections:
# userIdA userIdB userDetails status
1 123 456 { userId: 456, name: "Bob" } connected
2 456 123 { userId: 123, name: "Alice" } connected
Users:
# userId name connections
1 123 Alice { 456 }
2 456 Bob { 123 }
This database model makes it pretty easy to query connections, but seems to be difficult to update if some user details may change. Also, I'm not sure if I need the userIds within the user table again because I can easily query on a userId.
What do you think about that database model?
In general, nosql databases are often combined with a couple of assumptions:
Eventual consistency is acceptable. That is, it's often acceptable in application design if during an update some of the intermediate answers aren't right. That is, it might be fine if for a few seconds while alice is becoming Bob's friend, It's OK if "Is Alice Bob's friend" returns true while "is Bob Alice's friend" returns false
Performance is important. If you're using nosql it's generally because performance matters to you. It's also almost certainly because you care about the performance of operations that happen most commonly. (It's possible that you have a problem where the performance of some uncommon operation is so bad that you can't do it; nosql is not generally the answer in that situation)
You're willing to make uncommon operations slower to improve the performance of common operations.
So, how does that apply to your question. First, it suggests that ultimately the answer depends on performance. That is, no matter what people say here, the right answer depends on what you observe in practice. You can try multiple options and see what results you get.
With regard to the specific options you enumerated.
Assuming that performance is enough of a concern that nosql is a reasonable solution for your application, it's almost certainly query rather than update performance you care about. You probably will be happy if you make updates slower and more expensive so that queries can be faster. That's kind of the whole point.
You can likely handle updates out of band--that is eventually consistency likely works for you. You could submit update operations to a SQS queue rather than handling them during your page load. So if someone clicks a confirm friend button, you could queue a request to actually update your database. It is OK even if that involves rebuilding their user row, rebuilding the friend rows, and even updating some counts about how many friends they have.
It probably does make sense to store a friend row in each direction so you only need one query.
It probably does make sense to store the user information like Name and picture that you typically display in a friend list duplicated in the friendship rows. Note that whenever the name or picture changes you'll need to go update all those rows.
It's less clear that storing the friends in the user table makes sense. That could get big. Also, it could be tricky to guarantee eventual consistency. Consider what happens if you are processing updates to two users' friendships at the same time. It's very important that you not end up with inconsistency once all the dust has settled.
Whenever you have non-normalized data such as duplicating rows in each direction, or copying user info into friendship tables, you want some way to revalidate and fix your data. You want to write code that in the background can go scan your system for inconsistencies caused by bugs or crashed activities and fix them.
I suggest you have the following fields in the table:
userId (hash key)
name (sort key)
email
connections (Comma separated or an array of userId assuming you have multiple connections for a user)
This structure can ensure consistency across your data.
I have 4 tables in my database. The image below shows the rows and columns with the name of the table enclosed in a red box. 4 tables total. Am I going about the relationship design correctly? This is a test project and I am strongly assuming that I will use a JOIN to get the entire set of data on one table. I want to start this very correctly.
A beginner question but is it normal that the publisher table, for example, has 4 rows with Nintendo?
I am using Django 1.7 along with PostgreSQL 9.3. I aim to keep simple with room to grow.
Basically you've got the relations back-to-front here...
You have game_id (i.e. a ForeignKey relation) on each of publisher, developer and platform models... but that means each of those entities can only be related to a single game. I'm pretty sure that's not what you want.
You need it the other way around... instead put three foreign keys onto the game model, one each for publisher, developer and platform.
A ForeignKey is what's called a many-to-one relation. In this example I think what you want is for 'many' games to be related to 'one' publisher. Same for developer and platform.
is it normal that the publisher table, for example, has 4 rows with Nintendo?
No, that's is an example of why you have it backwards. You should only have a single row for each publisher.
yes you are correct in saying that something is wrong.
First of all those screen shots are hard to follow, for this simple example they could work but that is not the right tool, pick up pen and paper and sketch some relational diagrams and think about what are the entities involved in the schema and what are their relations, for example you know you have publishers, and they can publish games, so in this restricted example you have 2 entities, game and publisher, and a relation publish among them (in this case you can place a fk on game if you have a single publisher for a game, or create an intermediary relation for a many to many case). The same point can be made for platform and games, why are you placing an fk to game there, what will happen if the game with id 2 will be published for nintendo 64 ? You are making the exact same mistake in all the entities.
Pick up any book about database design basics, maybe it will help in reasoning about your context and future problems.
I have a tabular form which is updated throughout the year and i wanted to prevent users from editing certain rows. Currently the 'row type' is hard coded however I want the application admin to control which 'row types' are readable / write at certain times. My answered question, click here.
Currently a dynamic action is fired which prevents the rows that contain the type 'manager figure' and 'sales_target' being edited.
I have created a table with the three row types against each customer. Each status is set by a number: 0 to 3 (These i will decode into something meaningful for users).
0 - Row with that row type is read only.
1 - Users can enter into the row with that row type.
2 - row is read only with that row type.
3 - row is complete and set to read only.
I have created a new form (new tab) for the admin user to maintain each status.
Currently for Customer 'Big Toy Store' rows should be set as follows:
Manager Figure row should be read only (since set to 2)
Sales should be readable (since set to 0)
Sales target should be writable (since set to 1)
Please can i be pointed in the right direction, ive looked into jquery but struggling to work out how to pass the output of an sql query to it, so it can be used to determine which rows should be read only.
Link:apex.oracle.com
workspace: apps2
user: developer.user
password: DynamicAction
application name: Application 71656 Read only Rows for Tabular Form
I'm not sure that a tabular form is a good format to work out this idea. As you can see, you require quite a bit of javascript to produce the results you want. Not only that, but this is all client side too, and thus there are some security risks to take into account. After all, I could just run some Firebug and disable or revert all things you did, and even change the numbers. Especially with sales figures, which is something you most definitely do want altered by everybody and is also the nature of your question, security is important.
There are more elegant ways here for you to control this, and not in the least to reduce the amount of highly customized javascript code. For example, you could do away with the tabular form, and instead implement a modal popup from an interactive report. Since the modal popup would be an iframe and thus a different page, you can create a form page. On a form page you have a lot more control over what happens to certain elements. You can specify conditions, read-only conditions, or use authorization schemes. All things you can not evidently use in a tabular form.
I'd think you'd do yourself a service by thinking this over again, and explore a different option. How much of a dealbreaker is using a tabular form actually?
You need the user. You need to know what group he belongs to, and then this has to be checked against the different statusses and rows have to be en/disabled. Do you really want this to happen on the client side?
I'm not saying it can't be done in a tabular form and javascript. It can, I'm just really doubting this is the correct approach!