I hope you understand my problem. An excerpt of my MySQL database looks like this:
To ensure data integrity, two constraints must be implemented:
Given an answer record, both "referential paths" have to lead to the same exam record.
Every exam_student must answer every question. In other words: there should be an exam_question_result for each pair of exam_question and exam_student related to the same exam.
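The second constraint can be stated as a set difference: build every (exam, question, student) combination whose question and student belong to the same exam, then subtract the result rows that actually exist. A foreign key can only demand that a referenced row exists, not that a row exists for every combination, so this rule is usually checked by a query or by application code rather than declared. The sketch below is plain Python with hypothetical tuples standing in for the tables; the column layout is an assumption, since the schema excerpt is not reproduced here.

```python
from itertools import product

# Hypothetical rows: (exam_id, question_id), (exam_id, student_id),
# and (exam_id, question_id, student_id) for exam_question_result.
exam_questions = {("exam1", "q1"), ("exam1", "q2"), ("exam2", "q3")}
exam_students = {("exam1", "s1"), ("exam1", "s2"), ("exam2", "s1")}
results = {
    ("exam1", "q1", "s1"),
    ("exam1", "q2", "s1"),
    ("exam1", "q1", "s2"),
    ("exam2", "q3", "s1"),
}


def missing_results(exam_questions, exam_students, results):
    """Return every (exam, question, student) triple that lacks a result row."""
    expected = {
        (q_exam, question, student)
        for (q_exam, question), (s_exam, student) in product(exam_questions, exam_students)
        if q_exam == s_exam  # only pair a question with students of the same exam
    }
    return expected - results


print(missing_results(exam_questions, exam_students, results))
# {('exam1', 'q2', 's2')}
```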
How can I realize these constraints in my database?
Solution 1 (referring to comments under this post):
I updated the schema using composite keys.
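A minimal sketch of the composite-key idea, assuming one exam_question_result row per (exam, question, student): the result table carries a single exam_id column that takes part in both composite foreign keys, so the path through exam_question and the path through exam_student cannot disagree about the exam. It uses Python's built-in sqlite3 so it runs anywhere; the table and column names are hypothetical, and MySQL's InnoDB engine supports the same composite FOREIGN KEY syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE exam (exam_id INTEGER PRIMARY KEY);
CREATE TABLE exam_question (
    exam_id INTEGER NOT NULL REFERENCES exam(exam_id),
    question_id INTEGER NOT NULL,
    PRIMARY KEY (exam_id, question_id)
);
CREATE TABLE exam_student (
    exam_id INTEGER NOT NULL REFERENCES exam(exam_id),
    student_id INTEGER NOT NULL,
    PRIMARY KEY (exam_id, student_id)
);
-- The single exam_id column is part of BOTH composite foreign keys,
-- so both referential paths from a result row must reach the same exam.
CREATE TABLE exam_question_result (
    exam_id INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    student_id INTEGER NOT NULL,
    answer TEXT,
    PRIMARY KEY (exam_id, question_id, student_id),
    FOREIGN KEY (exam_id, question_id) REFERENCES exam_question(exam_id, question_id),
    FOREIGN KEY (exam_id, student_id) REFERENCES exam_student(exam_id, student_id)
);
""")

conn.execute("INSERT INTO exam VALUES (1), (2)")
conn.execute("INSERT INTO exam_question VALUES (1, 10)")
conn.execute("INSERT INTO exam_student VALUES (2, 20)")  # student 20 sits exam 2


def try_insert(exam_id, question_id, student_id):
    """Insert a result row; return False if any constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO exam_question_result (exam_id, question_id, student_id) "
            "VALUES (?, ?, ?)",
            (exam_id, question_id, student_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 10, 20))  # False: (1, 20) is not an exam_student row
conn.execute("INSERT INTO exam_student VALUES (1, 21)")
print(try_insert(1, 10, 21))  # True: both paths lead to exam 1
```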
Please check the UML diagram.
What I want to know is this: if there are 30 questions with their options in section 1, 20 questions in section 2, and 30 questions in section 3, how should I keep them in the table? RC passages would have 300-400 words, so with the questions and options it would be around 700-800 words per question.
So should each question have its own row in the table, or should each test have separate columns for each section, with all questions and options saved in JSON format in one column (one item, for DynamoDB)?
I would follow these rules for DynamoDB table design:
Definitely keep everything in one table. It's rare for one application to need multiple tables. It is OK to have different items (rows) in DynamoDB represent different kinds of objects.
Start by identifying your access patterns, that is, what are the questions you need to ask of your data? This will determine your choice of partition key, sort key, and indexes.
Try to pick a partition key that will result in object accesses being spread somewhat evenly over your different partitions.
If you will have lots of different tests, with accesses spread somewhat evenly over the tests, then TestID could be a good partition key. You will probably want to pull up all the tests for a given instructor, so you could have a column InstructorID with a global secondary index pointing back to the primary key attributes.
Your sort key could be heterogeneous--it could be different depending on whether the item is a question or a student's answer. For questions, the sort key could be QuestionID with the content of the question stored as other attributes. For question options it could be QuestionID#OptionID, with something like an OptionDescription attribute for the content of the option. Keep in mind that it's OK to have sparse attributes--not every item needs something populated for every attribute, and it's OK to have attributes that are meaningless for many items. For answers, your sort key could be QuestionID#OptionID#StudentID, with the content of the student's answer stored as a StudentAnswer attribute.
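To make the heterogeneous sort key concrete, here is a plain-Python sketch in which a list of dicts stands in for one table, and a helper mimics a Query that matches the partition key exactly and applies a begins_with condition on the sort key (in boto3 that condition is written with Key("SK").begins_with(...)). The attribute name SK and the sample data are hypothetical.

```python
# Plain-Python stand-in for one DynamoDB table; this is not the boto3 API.
def question_sk(question_id):
    return question_id


def option_sk(question_id, option_id):
    return f"{question_id}#{option_id}"


def answer_sk(question_id, option_id, student_id):
    return f"{question_id}#{option_id}#{student_id}"


items = [
    {"TestID": "T1", "SK": question_sk("Q1"), "QuestionText": "What is 2 + 2?"},
    {"TestID": "T1", "SK": option_sk("Q1", "A"), "OptionDescription": "3"},
    {"TestID": "T1", "SK": option_sk("Q1", "B"), "OptionDescription": "4"},
    {"TestID": "T1", "SK": answer_sk("Q1", "B", "S7"), "StudentAnswer": "B"},
    {"TestID": "T1", "SK": question_sk("Q10"), "QuestionText": "Capital of France?"},
    {"TestID": "T2", "SK": question_sk("Q1"), "QuestionText": "Another test"},
]


def query(items, test_id, sk_prefix=""):
    """Mimic a Query: exact match on the partition key, begins_with on the sort key."""
    matches = [
        item for item in items
        if item["TestID"] == test_id and item["SK"].startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda item: item["SK"])  # results come back in sort-key order


print([item["SK"] for item in query(items, "T1", "Q1#")])
# ['Q1#A', 'Q1#B', 'Q1#B#S7']
print([item["SK"] for item in query(items, "T1", "Q1")])
# ['Q1', 'Q1#A', 'Q1#B', 'Q1#B#S7', 'Q10']
```

The second query shows why the delimiter belongs in the prefix: a bare "Q1" also matches "Q10", so either query with "Q1#" or zero-pad the IDs.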
Here is a guide on DynamoDB best practices. For something more digestible, search YouTube for "aws reinvent dynamo rick houlihan." Rick Houlihan has some good talks about data modeling in DynamoDB. Here are a couple, plus one more on data modeling:
https://www.youtube.com/watch?v=6yqfmXiZTlM&list=PL_EDAAla3DXWy4GW_gnmaIs0PFvEklEB7
https://www.youtube.com/watch?v=HaEPXoXVf2k
https://www.youtube.com/watch?v=DIQVJqiSUkE
The better approach is to store each question and its options as a row in the DynamoDB table. I would definitely not suggest the second approach: storing the questions and answers as JSON is not advisable, because the maximum size of a DynamoDB item is 400 KB. For such scenarios, a document database is much more helpful.
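A rough size estimate shows why one JSON item per test is risky. It assumes about 6 bytes per English word (roughly five letters plus a space in UTF-8) and the 80 questions from the question, each at the upper estimate of 800 words; it ignores attribute names, which DynamoDB also counts toward the item size.

```python
BYTES_PER_WORD = 6  # assumption: ~5 letters plus a space, ASCII/UTF-8
ITEM_LIMIT_BYTES = 400 * 1024  # DynamoDB's "400 KB" item limit, taken here as KiB

questions_per_section = [30, 20, 30]
words_per_question = 800  # upper end of the 700-800 word estimate

per_question_bytes = words_per_question * BYTES_PER_WORD
whole_test_bytes = sum(questions_per_section) * per_question_bytes

print(per_question_bytes)  # 4800
print(whole_test_bytes)    # 384000
print(f"{whole_test_bytes / ITEM_LIMIT_BYTES:.0%} of the limit")  # 94% of the limit
```

Under these assumptions a whole test would already sit at roughly 94% of the limit before any student answers or longer passages are added, while one item per question uses only about 1%.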
Also, try to work out the types of queries you will be running. Some typical ones are:
Get all questions in a section by SectionID
Get the details of a Question by Question Id
Get all questions
If you can provide some more information, I could guide you with the data modeling.
Also, I did not see the UML diagram.
The following is my suggestion. Create the DynamoDB table as follows:
Store each SectionID, question, and its options as a row in the DynamoDB table.
Partition key: SectionID; sort key: QuestionId.
Create a GSI on the table with partition key QuestionId and sort key OptionId.
I am working on making an application in Django that can manage my GRE words and other details of each word. So whenever I add a new word to it that I have learnt, it should insert the word and its details in the database alphabetically. Also while retrieving, I want the details of the particular word I want to be extracted from the database.
Efficiency is the main issue.
Should I use SQLite? Should I use a file? Should I use a JSON object to store the data?
If using a file is the most efficient, what data structure should I implement?
Are there any functions in Django to efficiently do this?
Each word will have - meaning, sentence, picture, roots. How should I store all this information?
It's fine if the answer is not Django specific and talks about the algorithm or the type of database.
I'm going to answer from the data perspective, since this is not entirely specific to Django.
From your question it appears you have a fixed identifier for each "row": the word, which is a string, and a fixed set of attributes.
I would recommend using any enterprise-level RDBMS. In the case of Django, the most popular choice in the Python ecosystem is PostgreSQL.
As for the ordering, just create the table with an index on the word (this is done for you automatically if you use the word as the primary key), and retrieve your records using order_by in Django.
Here's some info on Django field options (check primary_key=True).
And here's the info on the order_by method.
Keep in mind you can also set the ordering in the Meta class of the model.
For your search case, you'll have to implement an endpoint that is capable of querying your database with startswith. You can check an example here
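The index recommended here keeps words in sorted order, which is what makes both alphabetical retrieval and a prefix (startswith) search cheap. As a plain-Python illustration of that idea, not of Django or PostgreSQL internals, a sorted list plus binary search from the standard bisect module behaves the same way; the word list is made up. In Django itself the query would be along the lines of Word.objects.filter(word__startswith=prefix).

```python
import bisect
from itertools import islice, takewhile

# Hypothetical GRE word list, kept sorted like an index on the word column.
words = sorted(["laconic", "abate", "candid", "aberrant", "cajole", "abscond"])


def add_word(words, word):
    """Insert a word in alphabetical position (binary search, then an O(n) shift)."""
    bisect.insort(words, word)


def words_starting_with(words, prefix):
    """Jump to the first candidate by binary search, then read while the prefix matches."""
    start = bisect.bisect_left(words, prefix)
    return list(takewhile(lambda w: w.startswith(prefix), islice(words, start, None)))


add_word(words, "abjure")
print(words)
# ['abate', 'aberrant', 'abjure', 'abscond', 'cajole', 'candid', 'laconic']
print(words_starting_with(words, "ab"))
# ['abate', 'aberrant', 'abjure', 'abscond']
```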
Example model:
from django.db import models

class Word(models.Model):
    word = models.CharField(max_length=255, primary_key=True)
    roots = ...
    picture = ...
On your second question: "Is this costly?"
It really depends. With 4,000 words, I'd say: no.
You probably want to add a delay in the client before sending the query anyway (for example, only query "if the user has typed something and 500 ms have passed without further input").
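The suggested client-side delay is usually called debouncing. The sketch below works on recorded keystroke timestamps (a hypothetical input format) rather than real timers, to show which keystrokes would actually trigger a query with a 500 ms window; a browser client would implement the same rule with a timer that is reset on every keystroke.

```python
def queries_fired(keystroke_times_ms, delay_ms=500):
    """Return the times (ms) at which a debounced client would send a query.

    Each keystroke restarts the timer, so a query is sent delay_ms after a
    keystroke only if the next keystroke comes at least delay_ms later.
    """
    fired = []
    next_times = list(keystroke_times_ms[1:]) + [None]
    for current, following in zip(keystroke_times_ms, next_times):
        if following is None or following - current >= delay_ms:
            fired.append(current + delay_ms)
    return fired


# Three quick keystrokes, a pause, then two more (hypothetical timestamps).
print(queries_fired([0, 120, 250, 1000, 1100]))  # [750, 1600]
```

Five keystrokes produce only two queries here, which is the point of the delay.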
If I were to give one piece of advice to any beginning developer, it would be: don't optimize prematurely.
I've searched online and could only find one blog that seemed like a hackish attempt to keep the order of a query list. I was hoping to query using the ORM with a list of strings, but doing it that way does not keep the order of the list.
From what I understand, in_bulk only works if you have the ids of the items you want to query.
Can anybody recommend an ideal way of querying by a list of strings and making sure the objects are kept in their proper order?
So in a perfect world I would be able to query a set of objects by doing something like this...
Entry.objects.filter(title__in=['list', 'of', 'strings'])
However, this does not keep the order, so 'strings' could come before 'list', etc.
The only workaround I see (I may just be tired, or this may be perfectly acceptable, I'm not sure) is doing this:
myIterableCorrectOrderedList = []
for i in listOfStrings:
    obj = Object.objects.get(title=i)  # one database query per string
    myIterableCorrectOrderedList.append(obj)
Thank you,
The problem with your solution is that it does a separate database query for each item.
This answer gives the right solution if you're using ids: use in_bulk to create a map between ids and items, and then reorder them as you wish.
If you're not using ids, you can just create the mapping yourself:
values = ['list', 'of', 'strings']
# one database query
entries = Entry.objects.filter(field__in=values)
# one trip through the list to create the mapping
entry_map = {entry.field: entry for entry in entries}
# one more trip through the list to build the ordered entries
ordered_entries = [entry_map[value] for value in values]
(You could save yourself a line by using index, as in this example, but since index is O(n) the performance will not be good for long lists.)
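A variant that avoids the O(n) index lookups: precompute each value's position once and sort by it. Entry here is a hypothetical dataclass standing in for the model instances. Unlike entry_map, this keeps every row when the field is not unique, and it simply omits values that matched nothing instead of raising KeyError.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a model instance returned by the query."""
    pk: int
    title: str


values = ["list", "of", "strings"]
# Rows as the database might return them: arbitrary order, and "of" appears twice.
entries = [Entry(1, "strings"), Entry(2, "of"), Entry(3, "list"), Entry(4, "of")]

position = {value: index for index, value in enumerate(values)}  # built once, O(n)
ordered_entries = sorted(entries, key=lambda entry: position[entry.title])  # O(n log n)

print([(entry.pk, entry.title) for entry in ordered_entries])
# [(3, 'list'), (2, 'of'), (4, 'of'), (1, 'strings')]
```

Because sorted() is stable, rows sharing a value keep the order in which the database returned them.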
Remember that ultimately this is all done to a database; these operations get translated down to SQL somewhere.
Your Django query loosely translated into SQL would be something like:
SELECT * FROM entry_table e WHERE e.title IN ('list', 'of', 'strings');
So, in a way, your question is equivalent to asking how to ORDER BY the order something was specified in a WHERE clause. (Needless to say, I hope, this is a confusing request to write in SQL -- NOT the way it was designed to be used.)
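For comparison, one query-only way to express "order by position in the IN list" in portable SQL is a CASE expression in ORDER BY that maps each value to its index. The sketch uses sqlite3 and a hypothetical entry_table; Django's ORM can build the same expression with Case and When from django.db.models, passed to order_by().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry_table (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO entry_table (title) VALUES (?)",
    [("strings",), ("list",), ("other",), ("of",)],
)

values = ["list", "of", "strings"]
in_placeholders = ", ".join("?" for _ in values)
# One "WHEN ? THEN <position>" branch per value; positions are trusted integers.
when_branches = " ".join(f"WHEN ? THEN {position}" for position in range(len(values)))
sql = (
    f"SELECT title FROM entry_table WHERE title IN ({in_placeholders}) "
    f"ORDER BY CASE title {when_branches} END"
)
# Parameters: first the IN list, then the same values again for the CASE branches.
rows = conn.execute(sql, values + values).fetchall()
print([title for (title,) in rows])  # ['list', 'of', 'strings']
```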
You can do this in a couple of ways, as documented in some other answers on StackOverflow [1] [2]. However, as you can see, both rely on adding (temporary) information to the database in order to sort the selection.
Really, this should suggest the correct answer: the information you are sorting on should be in your database. Or, back in high-level Django-land, it should be in your models. Consider revising your models to save a timestamp or an ordering when the user adds favorites, if that's what you want to preserve.
Otherwise, you're stuck either grabbing the unordered data from the db and then "fixing" it in Python, or constructing your own SQL query and implementing your own ugly hack from one of the solutions I linked (don't do this).
tl;dr The "right" answer is to keep the sort order in the database; the "quick fix" is to massage the unsorted data from the database to your liking in Python.
EDIT: Apparently MySQL has a non-standard feature that will let you do this (the FIELD() function in an ORDER BY clause), if that happens to be your backend.
I'm having an issue with querying an index where a common search term also happens to be part of a company name that is interspersed throughout most of the documents. How do I exclude the business name from results without affecting the ranking of a search that includes part of the business name?
Example: Bobs Automotive Supply is the business name.
How can I include relevant results when someone searches automotive or supply without returning every document in the index?
I tried "-'Bobs Automotive Supply' +'search term'" but this seems to exclude any document containing Bobs Automotive Supply, and it isn't very effective when searching for 'supply' or 'automotive'.
Thanks in advance.
Second answer here, based on additional clarification from first answer.
A few options.
Add the words of the business name as stop words (a StopFilterFactory with a stopwords file in the field's analyzer). This will stop Solr from indexing them at all. Searches that use them will only really search for the words that aren't in the business name.
Rely on the inherent scoring that Solr applies based on how common a term is across the index (inverse document frequency). It sounds like these terms will appear in the index frequently, so they carry little weight. Queries for them will still return the documents, but if the user queries for other, less common terms, those will get a higher score.
Apply a low query boost (not quite negative, but less than other documents) to documents that contain the business name. This is covered in the Solr Relevancy FAQ http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F
Do you know that the article is tied to the business name, or do you derive this? If you know it, you could create another field and then just exclude entities that match on the business name using a filter query. Something like
q=search_term&fq=business_name:(NOT search_term)
It may be helpful to use subqueries for this or to just boost down rather than filter out results.
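The second option works because a term that appears in nearly every document gets a very small weight. Recent Solr versions default to Lucene's BM25 similarity; the sketch computes its IDF factor for a term present in almost every document versus a rarer one. The document counts are invented for illustration, and the final score also depends on term frequency and field length.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF factor of Lucene's BM25 similarity: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


doc_count = 10_000  # hypothetical index size
common = bm25_idf(doc_count, 9_500)  # e.g. "automotive", in the business name on most pages
rare = bm25_idf(doc_count, 50)       # a less common term the user also typed

print(round(common, 2), round(rare, 2))  # 0.05 5.29
```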
EDIT: An update to the question makes this irrelevant. Leaving it here for posterity. :)
This is why Solr Documents have different fields.
In this case, it sounds like there is a "Footer" field that is separate from your "Body" field in your documents. When searches are performed, they would only be done against the Body, which won't include data from the Footer. You could even have a third field, "OriginalContent", which contains the original copy for display purposes. You wouldn't search that, just store it for later.
The important part is to create the two separate fields in your schema and make sure that you index the fields that you want to be able to search.
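A toy illustration of why the separate fields help: when only the body field is searched, the business name in the footer can no longer match. The dict-based documents and the search helper are hypothetical; in Solr the same effect comes from querying a specific field (for example body:supply) or listing only the body field in qf with the dismax or edismax parser.

```python
# Hypothetical documents with the business name kept in its own field.
docs = [
    {"id": 1, "body": "Brake pads and rotors in stock", "footer": "Bobs Automotive Supply"},
    {"id": 2, "body": "Our winter supply of snow tires", "footer": "Bobs Automotive Supply"},
]


def search(docs, term, fields=("body",)):
    """Return ids of docs whose chosen fields contain the term as a whole word."""
    term = term.lower()
    return [
        doc["id"] for doc in docs
        if any(term in doc[field].lower().split() for field in fields)
    ]


print(search(docs, "supply"))                      # [2]: only a body mentions it
print(search(docs, "supply", ("body", "footer")))  # [1, 2]: the footer matches everything
```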
I'm working on a QA system in django, which includes data tables of Question, Answer and Answer_statistics. One Question can have multiple Answers, an Answer has an Answer_statistics. Answer_statistics contain values like votes count, comments count of each answer. Now I'm trying to get the sum of a column in answer_statistics filtered by the question the answers are attached to. For example, get the total vote count of all the answers to a certain question. It should be something like this:
a_question.answer__answer_statistics_set.aggregate(Sum('comment_count'))
It feels like there should be an easy solution, but I haven't found one so far. Could someone please give me a hint? Thanks!
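You follow the relationship inside the aggregate call, not in the object lookup. Since aggregate() is a QuerySet method rather than a model-instance method, start from a queryset. Something like:
from django.db.models import Sum
Question.objects.filter(pk=a_question.pk).aggregate(Sum('answer__answer_statistics__comment_count'))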
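Roughly, the ORM turns that aggregate into a join from answers to their statistics plus a SUM, filtered by question. A sqlite3 sketch of the equivalent SQL, using hypothetical table names (Django would actually prefix them with the app label) and made-up counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (id INTEGER PRIMARY KEY);
CREATE TABLE answer (id INTEGER PRIMARY KEY, question_id INTEGER REFERENCES question(id));
CREATE TABLE answer_statistics (
    id INTEGER PRIMARY KEY,
    answer_id INTEGER UNIQUE REFERENCES answer(id),
    vote_count INTEGER,
    comment_count INTEGER
);
INSERT INTO question VALUES (1), (2);
INSERT INTO answer VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO answer_statistics VALUES (100, 10, 5, 2), (101, 11, 3, 4), (102, 12, 7, 1);
""")

# Sum comment_count over all answers attached to question 1.
total = conn.execute(
    """
    SELECT SUM(s.comment_count)
    FROM answer a JOIN answer_statistics s ON s.answer_id = a.id
    WHERE a.question_id = ?
    """,
    (1,),
).fetchone()[0]
print(total)  # 6
```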