I'm trying to make a simple "fill in the blanks" type of exam in django and would like to know what is the best way to design the database.
Example: "9 is the sum of 4 and 5, or 3 and 6."
During the exam, the above sentence would appear as "__ is the sum of __ and __, or __ and __."
Obviously there is an unlimited number of answers to this question, but assume that the above numbers are the only answers. The catch is that you can swap the places of 4 and 5, or of 3 and 6, and still get the right answer. Also, the number of blanks is not known in advance; it can be 1 or more.
I would go with something like this. First, define a Question table:
Question
--------------------------
Id Text
1 9 is the sum of 4 and 5, or 3 and 6
...
Then save the position of the hidden substrings, let's call them fields, in another table:
QuestionField
--------------------------
Id QuestionId StartsAt EndsAt Set
1 1 0 1 1
2 1 16 17 2
3 1 22 23 2 # NOTE: Is in the same set as QuestionField #2
...
This table lets you retrieve the actual value of the field by querying the Question table (e.g. entry one refers to the value '9' in the first question).
The "Set" column contains an identifier of the "set" the field belongs to; fields in the same set can be swapped with each other. When you populate it, you have to ensure that all fields that can replace each other are in the same set. The actual number of the set doesn't matter, as long as it's unique, but it makes sense to set it equal to the ID of one of the elements of the set.
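As a rough illustration of how the two tables could be used, here is a minimal plain-Python sketch (not Django models). The names QuestionField, blank_out and is_correct are hypothetical, and the offsets for the 3 and the 6 (and their set number 4) are assumptions, since the table above only lists the first three fields. An answer is accepted when, within each set, the submitted values match the hidden values in any order.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionField:
    starts_at: int  # offset of the first hidden character
    ends_at: int    # offset just past the last hidden character
    set_id: int     # fields sharing a set_id are interchangeable


TEXT = "9 is the sum of 4 and 5, or 3 and 6"
FIELDS = [
    QuestionField(0, 1, 1),
    QuestionField(16, 17, 2),
    QuestionField(22, 23, 2),
    QuestionField(28, 29, 4),  # assumed row, not shown in the table above
    QuestionField(34, 35, 4),  # assumed row, not shown in the table above
]


def blank_out(text, fields, blank="__"):
    """Replace every hidden substring with a blank, working right to left
    so that earlier offsets stay valid."""
    for f in sorted(fields, key=lambda f: f.starts_at, reverse=True):
        text = text[:f.starts_at] + blank + text[f.ends_at:]
    return text


def is_correct(text, fields, answers):
    """answers[i] is the user's entry for fields[i]; order inside a set is ignored."""
    if len(answers) != len(fields):
        return False
    expected, given = defaultdict(Counter), defaultdict(Counter)
    for f, answer in zip(fields, answers):
        expected[f.set_id][text[f.starts_at:f.ends_at]] += 1
        given[f.set_id][answer.strip()] += 1
    return expected == given
```

With this data, blank_out(TEXT, FIELDS) hides all five numbers, and swapping 4 with 5 (or 3 with 6) in the answers is still accepted.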
I have a data set of flows between locations. Say there are 50 locations, but the number of pairs is not even because some locations do not have flows. I would like to create IDs for each pair of observations (w_id and h_id).
Thank you.
Desired output
w_code h_code w_id h_id
295101011001003 291892204451015 1 1
295101011001003 295101011001003 1 2
295101011001003 291892202003011 1 3
295101011001025 295101021003001 2 1
295101011001025 295101011001025 2 2
295101011001026 291879507003038 3 1
295101011001026 190130007001013 3 2
295101011001026 295101105001027 3 3
295101011001026 291892126002008 3 4
295101011001026 291892126001005 3 5
295101011001029 291892199006006 4 1
295101011002007 295101011002015 5 1
295101011002014 295101011002016 6 1
295101011002014 295101011001003 6 2
295101011002016 295101011001007 7 1
295101011002030 295101255001008 8 1
Documentation accessible through Stata includes this paper on composite categorical variables and this paper on handling dyadic data. The Stata command search would have led to these papers, except that the art to finding as well as searching is thinking of the right keywords.
In your case the natural question arises whether for example the pair (1, 2) is really the same as (2, 1) and for flows my guess is No. In mathematics, abstraction is often the key to solving a problem; in statistical computing some concreteness may make a problem clearer. Perhaps h means husband and w means wife, and perhaps not. Assuming that (1, 2) and (2, 1) are quite different, a joint identifier is immediately obtained by
egen newid = group(w_id h_id)
and for a small number of identifiers -- you mention 50 -- there is no pain in asking for values to be labelled, so that with
egen newid = group(w_id h_id), label
the pair (1, 1) would be mapped to the value 1 and the value label 1 1.
As this solution was not immediately obvious, it is likely that a study of help egen will reveal a bunch of tools likely to be useful in data management; some are directly statistical.
For pairs of identifiers where Billy, Bob is to be treated like Bob, Billy see the second paper linked above. Whether this is true for the OP is a little unclear, but it is likely to be true for some others reading this in the future.
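For readers who think more easily in a general-purpose language, the idea can be sketched in plain Python. This is only an illustration of the logic, not of how egen works internally. The function names pair_ids and group_ids are hypothetical, and the sketch assumes the rows are already sorted by w_code, as in the desired output above.

```python
def pair_ids(rows):
    """rows: list of (w_code, h_code) tuples, assumed sorted by w_code.
    Returns (w_id, h_id) per row: w_id numbers the distinct w_code values,
    h_id counts the rows seen so far for that w_code."""
    w_ids, counts, out = {}, {}, []
    for w_code, _h_code in rows:
        w_id = w_ids.setdefault(w_code, len(w_ids) + 1)
        counts[w_code] = counts.get(w_code, 0) + 1
        out.append((w_id, counts[w_code]))
    return out


def group_ids(pairs):
    """One joint identifier per ordered pair: distinct pairs are sorted and
    numbered 1, 2, 3, ... so (1, 2) and (2, 1) get different ids."""
    lookup = {pair: i for i, pair in enumerate(sorted(set(pairs)), start=1)}
    return [lookup[pair] for pair in pairs]


rows = [
    ("295101011001003", "291892204451015"),
    ("295101011001003", "295101011001003"),
    ("295101011001003", "291892202003011"),
    ("295101011001025", "295101021003001"),
    ("295101011001025", "295101011001025"),
]
ids = pair_ids(rows)     # (w_id, h_id) for each row
joint = group_ids(ids)   # a single id for each ordered pair
```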
This question deals with collisions in a new approach to chaining in hash tables.
There are 2 hash functions. The first is h1(x) = x mod m1.
With this function, all items are hashed into the primary hash table.
Inside each index of the primary hash table there is an internal hash table that hashes the key with the second function, h2(x) = x mod m2, where m1 != m2.
For example, let's say m1 = 5 and m2 = 3
and I want to insert 2: h1(2) = 2 mod 5 = 2 and h2(2) = 2 mod 3 = 2.
This means 2 will be inserted at index 2 of the primary table, at index 2 of the internal table.
When a collision happens in the primary table (meaning h1(x) = x % m1 = y % m1 = h1(y)), we go to the second hash function, calculate h2(x) and h2(y), and put each key at the corresponding index of the internal hash table. Say h1(x) = x % 5 and h2(x) = x % 3. If we insert 7 and 12, we get h1(7) = 2 and h1(12) = 2, so both land at index 2 of the primary hash table. Then we compute h2 for both (h2(7) = 1 and h2(12) = 0), so we put 7 at index 1 and 12 at index 0 of the internal table (and this way we avoid the collision).
This was the question in the exam. The first part asked whether there is a collision for the numbers 0, 5, 15, 17 (with m1 = 5 and m2 = 3); obviously 0 and 15 have the same remainder modulo both 5 and 3. The second part asked for the worst-case runtime complexity of a search, and the third part asked for 5 numbers that produce the worst case when we search for the number 2 in the table.
So the question is: what is the worst-case runtime complexity of a search?
And what is an example of 5 numbers that cause the worst case if we search for the number 2?
I think the complexity is O(1), and I used these 5 numbers:
7 12 17 22 42
Is this correct? Can anybody help with this?
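To make the scheme concrete, here is a small Python sketch of the two-level table as described above. The class name TwoLevelTable is hypothetical, and one detail is an assumption: the description does not say what happens when two keys agree on both h1 and h2, so each internal slot here simply holds a list. The insert method reports whether the key landed in an already occupied internal slot.

```python
class TwoLevelTable:
    """Primary table indexed by x mod m1; each primary index holds an
    internal table indexed by x mod m2."""

    def __init__(self, m1=5, m2=3):
        self.m1, self.m2 = m1, m2
        # Assumption: an internal slot is a list, so keys that agree on
        # both hash functions end up chained in the same list.
        self.slots = [[[] for _ in range(m2)] for _ in range(m1)]

    def insert(self, x):
        """Store x and return True if its internal slot was already occupied."""
        bucket = self.slots[x % self.m1][x % self.m2]
        collided = len(bucket) > 0
        bucket.append(x)
        return collided

    def contains(self, x):
        """A search scans only the one internal slot that x hashes to."""
        return x in self.slots[x % self.m1][x % self.m2]


table = TwoLevelTable(m1=5, m2=3)
collisions = [table.insert(key) for key in (0, 5, 15, 17)]
```

Note that the cost of contains depends on how many stored keys share both remainders with the key being searched for, which is the quantity the worst-case question is about.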
I'm now working on a household survey data set and I'd like to give certain members extra IDs according to their relationship to the household head. More specifically, I need to identify the adult children of household head and his/her spouse, if married, and assign them "sub-household IDs".
The variables are: hhid - household ID; pid - individual ID; relhead - relationship with head.
Regarding relhead, a 1 represents the head, a 6 represents a child, and a 7 represents a child-in-law. Below is some example data, including the desired outcome in the last column. I assume that whenever a 6 is followed by a 7, they constitute a couple and belong to the same sub-household.
hhid pid relhead sub_hhid(desired)
50 1 1 1
50 2 3 1
50 3 6 2
50 4 6 3
50 5 7 3
-----------------------------------------------
67 1 1 1
67 3 6 2
67 4 7 2
Here are some thoughts:
There may be married and unmarried adult children within one household, and the family structure is a little complicated, so I want to write a loop across the members of a household.
The basic idea is that in the outer loop we identify the children staying at home and then check whether a spouse is present. If there is one, we give the couple an indicator; if not, we continue and give the single stay-at-home child another indicator. After walking through all the possible members within a household, we get a series of within-household IDs. To facilitate further analysis, I need some kind of external ID variable to separate the sub-families.
* Define N as the total number of households, n as the size of each household
* sty_chil is an indicator for an adult child living with the parents (head)
* sty_chil_sp is an indicator for an adult child's spouse
* "hid" and "ind_id" are local macros
forvalues hid = 1/N {
    forvalues ind_id = 1/n {
        if sty_chil[`ind_id'] == 1 {
            if sty_chil_sp[`ind_id' + 1] == 1 {
                * a 6-7 pair, identified as a couple
                assign sub_hhid to this couple
            }
            else {
                * a single 6, identified as a single child
                assign sub_hhid to this child
            }
        }
        else {
            * relationships other than 6: move on to the next member of the household
            ++ind_id
        }
    }
    * move on to the next household
    ++hid
}
The built-in Stata by, sort: is pretty powerful, but here I want to treat only the family members who meet a certain criterion and leave the others untouched, so an if-else type loop is more natural for me (even if by: may achieve my goal, it always gets too tricky when the situation becomes less simple, and we cannot exhaust all the possible patterns of household structure).
An immediate problem is that I don't know how to write a loop across household IDs and individual IDs, because I used to obtain the household size (the increment of the outer loop) using the by command (I'm not sure whether in this case it's 1 or the number of family members). I'm also not sure whether mixing by and if loops is good programming practice; I favor writing a "full loop" in this case. Please give me some clues on how to achieve my goal and provide (illustrative) pseudo code.
An extra question: I cannot find the ado file that contains the by command. Does it exist?
I will abstract from the issue of whether the assumption used to create matches is a sensible one or not. Rather, let this be an example of reaching the desired results without using explicit loops. Some logic and the use of subscripting (see help subscripting) can get you far.
clear
set more off
*----- example data -----
input ///
hhid pid relhead sub_hhid
50 1 1 1
50 3 6 2
50 4 6 3
50 5 7 3
67 1 1 1
67 3 6 2
67 4 7 2
67 5 6 3
end
list, sepby(hhid)
*----- what you want -----
bysort hhid (pid): gen hhid2 = sum( !(relhead == 7 & relhead[_n-1] == 6) )
list, sepby(hhid)
As you can see, one line of code gets you there. The reasoning is the following:
sum() gives the running sum. The arguments to sum(), being conditions, can either be True or False. The ! denotes the logical not (see help operators).
If it is not the case that the relationship is daughter/son-in-law AND the previous relationship is daughter/son, the condition evaluates to True and takes on the value of 1, increasing the running sum by 1. If it evaluates to False, meaning that the relationship is daughter/son-in-law AND the previous relationship is daughter/son, then it takes on the value of 0 and the running sum will not increase. This gives the result you seek.
You do this using the by: prefix, since you want to check each original household independently, so to speak.
For the first observation of each original household, the condition always evaluates to True. This is because there exists no "previous" observation (relationship), so Stata considers relhead[_n-1] to be missing (., a very large number) and therefore not equal to 6. This takes the running sum from 0 to 1 for the first observation of each sub-group, and so on.
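The same running-sum reasoning can be written out in plain Python as a cross-check. This is a sketch of the logic only, not of Stata's implementation; the function name sub_household_ids is hypothetical, and None stands in for the missing "previous" relationship at the start of each household.

```python
from itertools import groupby


def sub_household_ids(rows):
    """rows: (hhid, pid, relhead) tuples. Returns (hhid, pid, hhid2) tuples,
    sorted by hhid and pid, where hhid2 is the running sum described above."""
    out = []
    for _hhid, members in groupby(sorted(rows), key=lambda row: row[0]):
        running, previous = 0, None  # restart for each household, like by:
        for hhid, pid, relhead in members:
            # Add 1 unless this is a child-in-law (7) directly after a child (6).
            running += not (relhead == 7 and previous == 6)
            out.append((hhid, pid, running))
            previous = relhead
    return out


example = [
    (50, 1, 1), (50, 3, 6), (50, 4, 6), (50, 5, 7),
    (67, 1, 1), (67, 3, 6), (67, 4, 7), (67, 5, 6),
]
result = [hhid2 for _, _, hhid2 in sub_household_ids(example)]
```

For the example data above this reproduces the sub_hhid column.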
Bottom line: learn how to use by: and take advantage of the features offered by Stata. Do not swim against the current; not here.
Edit
Please note that instead of progressively changing your example data set, you should provide a representative example from the beginning. Not doing so can render answers that are initially OK, completely inadequate.
For your modified example, add:
replace hhid2 = 1 if !inlist(relhead,6,7)
That will simply assign anyone not 6 or 7 to the same household as the head. The head is assumed to always have hhid2 == 1. If the head can have hhid2 != 1, then
bysort hhid (relhead): replace hhid2 = hhid2[1] if !inlist(relhead,6,7)
should work.
You can follow with:
bysort hhid hhid2 (pid): gen byte first = _n == 1
bysort hhid (hhid2 pid): replace hhid2 = sum(first)
but because they are IDs, it's not really necessary.
Finally, use:
gen hhid3 = string(hhid) + "_" + string(hhid2)
to create IDs with the form 50_1, 50_2, 50_3, etc.
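The follow-up steps can be sketched in Python as well. This is a hypothetical helper (label_sub_households), assuming each member already carries the running-sum value hhid2 from the first step. Household 50 below includes the member with relhead 3 from the question's data, whose running-sum value is 2 before the adjustment.

```python
def label_sub_households(members):
    """members: list of (hhid, pid, relhead, hhid2) tuples.
    Returns one string id per member, such as "50_1"."""
    # Step 1: anyone who is not a child (6) or child-in-law (7) joins unit 1,
    # assumed here to be the head's unit.
    adjusted = [(h, p, r, s if r in (6, 7) else 1) for h, p, r, s in members]
    # Step 2 (optional): renumber within each household so there are no gaps.
    ranks = {}
    for household in {m[0] for m in adjusted}:
        distinct = sorted({s for h, _, _, s in adjusted if h == household})
        ranks[household] = {s: i for i, s in enumerate(distinct, start=1)}
    # Step 3: combine the household id and the sub-household id.
    return [f"{h}_{ranks[h][s]}" for h, _, _, s in adjusted]


household_50 = [
    (50, 1, 1, 1),
    (50, 2, 3, 2),
    (50, 3, 6, 3),
    (50, 4, 6, 4),
    (50, 5, 7, 4),
]
labels = label_sub_households(household_50)
```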
Like I said before, if your data presents more complications, you should present a relevant example.
Currently I have an I-descriptor that pulls Sales from another file for Periods 1, 2, and 3. I want to be able to pull Costs for Periods 1, 2, and 3 as well and subtract the totals to get a profit.
The current I-descriptor statement is:
TRANS(SAS1,ITEM,4,'X');#1<1,1,1>+#1<1,1,2>+#1<1,1,3>
4 = Sales
3 = Cost
#1<1,1,1> = Period 1
#1<1,1,2> = Period 2
#1<1,1,3> = Period 3
#1<1,1,4> = Period 4
You are looking for EXTRACT.
So, try the following in the loc attribute:
TRANS(SAS1,ITEM,4,'X');EXTRACT(#1,1,1,1)+EXTRACT(#1,1,1,2)+EXTRACT(#1,1,1,3)
The next bit of the question isn't entirely clear to me, so let me know if I've made an incorrect assumption.
Costs come from the current file (the one this dictionary file is) from attribute (field) 3. It has the same format as the data for Sales (<1,1,1 to 3>). In this case you would need to use #RECORD.
TRANS(SAS1,ITEM,4,'X');EXTRACT(#1,1,1,1)+EXTRACT(#1,1,1,2)+EXTRACT(#1,1,1,3);EXTRACT(#RECORD,1,1,1)+EXTRACT(#RECORD,1,1,2)+EXTRACT(#RECORD,1,1,3);#2-#3
So, let's break it down:
Read attribute 4 from record ITEM in file SAS1. Return an empty string if the item doesn't exist. Hold this in position 1 (#1):
TRANS(SAS1,ITEM,4,'X');
Extract multi-subvalues 1 to 3 from the value in position 1, then add them together. Hold this in position 2:
EXTRACT(#1,1,1,1)+EXTRACT(#1,1,1,2)+EXTRACT(#1,1,1,3);
Extract multi-subvalues 1 to 3 from the current record and add them together. Hold this in position 3:
EXTRACT(#RECORD,1,1,1)+EXTRACT(#RECORD,1,1,2)+EXTRACT(#RECORD,1,1,3);
Finally, subtract the value in position 3 (total costs) from the value in position 2 (total sales). As this is the last position, return the result:
#2-#3
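To illustrate the arithmetic outside the database, here is a small Python emulation of the extract-and-sum steps. It is only a sketch: the extract and total functions are hypothetical stand-ins, the sales and cost figures are made up, and an empty string is treated as 0 by assumption. The delimiters are the usual field, value and subvalue marks (characters 254, 253 and 252).

```python
AM, VM, SM = chr(254), chr(253), chr(252)  # field, value and subvalue marks


def extract(dyn, field, value, subvalue):
    """Return the given subvalue of the given value of the given field
    (all 1-based), or an empty string if any level is out of range."""
    piece = dyn
    for mark, index in ((AM, field), (VM, value), (SM, subvalue)):
        parts = piece.split(mark)
        if index < 1 or index > len(parts):
            return ""
        piece = parts[index - 1]
    return piece


def total(dyn, periods=(1, 2, 3)):
    """Add up the subvalues for the requested periods; "" counts as 0 here."""
    return sum(float(extract(dyn, 1, 1, p) or 0) for p in periods)


# Made-up data: four periods stored as subvalues of value 1.
sales = SM.join(["100", "150", "200", "999"])
costs = SM.join(["60", "90", "120", "999"])
profit = total(sales) - total(costs)  # periods 1 to 3 only
```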
The only missing thing in Dan's answer is that you need another TRANS to get your COST field, hence TRANS(SAS1,ITEM,3,'X');
after the first operations on the EXTRACTs.
I have a query that's basically "count all the items of type X, and return the items that exist more than once, along with their counts". Right now I have this:
Item.objects.annotate(type_count=models.Count("type")).filter(type_count__gt=1).order_by("-type_count")
but it returns nothing (the count is 1 for all items). What am I doing wrong?
Ideally, it should get the following:
Type
----
1
1
2
3
3
3
and return:
Type, Count
-----------
1 2
3 3
In order to count the number of occurrences of each type, you have to group by the type field. In Django this is done by using values to get just that field. So, this should work:
Item.objects.values('type').annotate(
type_count=models.Count("type")
).filter(type_count__gt=1).order_by("-type_count")
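To see what the grouped query is meant to compute, here is a plain-Python equivalent using collections.Counter. It does not touch the database, and the function name repeated_types is hypothetical. It groups by type, keeps only the types that occur more than once, and orders them by descending count.

```python
from collections import Counter


def repeated_types(types):
    """Return one row per type that occurs more than once,
    ordered by count, largest first."""
    counts = Counter(types)
    rows = [{"type": t, "type_count": n} for t, n in counts.items() if n > 1]
    return sorted(rows, key=lambda row: row["type_count"], reverse=True)


rows = repeated_types([1, 1, 2, 3, 3, 3])
```

For the example types in the question, this yields type 3 with a count of 3, followed by type 1 with a count of 2.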
It's a logical error ;)
type_count__gt=1 means type_count > 1 so if the count == 1 it won't be displayed :)
use type_count__gte=1 instead - it means type_count >= 1 :)