Duplicate each row as many times as is given in a variable - stata

I have a set of individuals with characteristics. Each individual belongs to one or more groups. I need to merge individuals with group characteristics, starting by duplicating each row of the individual data set as many times as given by n_groups.
The data looks like
id age n_groups
1 50 2
2 46 1
3 51 3
4 44 2
I need to have
id age n_groups group_index
1 50 2 1
1 50 2 2
2 46 1 1
3 51 3 1
3 51 3 2
3 51 3 3
4 44 2 1
4 44 2 2
It seems like a very easy task; I need some variation of expand with a variable number of duplicates. Is there a simple command for this?
Thanks!

It turns out the solution is very standard. The expand command accepts a variable as the number of copies, so expand n_groups solved the problem. The copies can then be numbered with bysort id: gen group_index = _n.
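For readers who want the logic spelled out, here is a minimal Python sketch of what duplicating each row n_groups times and numbering the copies amounts to. It only illustrates the row duplication and is not Stata code; the function name expand_rows is made up for this example.

```python
# Rows of the individual-level data: (id, age, n_groups), copied from the question.
rows = [(1, 50, 2), (2, 46, 1), (3, 51, 3), (4, 44, 2)]


def expand_rows(rows):
    """Repeat each row n_groups times and append a 1-based group_index."""
    expanded = []
    for person_id, age, n_groups in rows:
        for group_index in range(1, n_groups + 1):
            expanded.append((person_id, age, n_groups, group_index))
    return expanded


expanded = expand_rows(rows)
for row in expanded:
    print(*row)
```

Each expanded row can then be matched to group characteristics on the pair (id, group_index).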

In Stata, how can I only analyze observations with repeated measures using the mixed command?

I have a dataset on multiple outcomes for individuals in two groups that were treated (or not treated) by an intervention at two time points. However, not every individual has complete data for each measure at each time point.
id outcome outcome_value group time
1 depression 10 1 1
1 depression 8 1 2
2 depression 10 2 1
2 depression . 2 2
1 anxiety 12 1 1
1 anxiety 8 1 2
2 anxiety 12 2 1
2 anxiety 6 2 2
How do I exclude IDs that do not have an outcome in both periods? I only want to see how outcomes changed between groups over time for observations that have data in all periods. I am using the mixed command in Stata to conduct this analysis.
First drop the missing rows
keep if !missing(outcome_value)
Then, keep the ID/outcome combinations that have _N==2
bysort id outcome: keep if _N==2
Output:
id outcome outco~ue group time ct
1 anxiety 8 1 2 2
1 anxiety 12 1 1 2
1 depression 10 1 1 2
1 depression 8 1 2 2
2 anxiety 6 2 2 2
2 anxiety 12 2 1 2
As @NickCox has pointed out in the comments, while we cannot directly combine these two, there is still a one-line approach:
bysort id outcome (time) : keep if !missing(outcome_value[1], outcome_value[2])
Note that we cannot do this:
bysort id outcome : keep if !missing(outcome_value) & _N==2
because _N is evaluated before any rows are dropped, so a row with a missing outcome still counts toward its group's _N and the non-missing row of that group would wrongly be kept.
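The same complete-case rule can be sketched outside Stata. The Python snippet below is only an illustration of the logic, using the sample data from the question; None stands in for Stata's missing value, and complete_cases is a hypothetical helper name.

```python
from collections import defaultdict

# (id, outcome, outcome_value, group, time); None stands in for Stata's missing "."
rows = [
    (1, "depression", 10, 1, 1),
    (1, "depression", 8, 1, 2),
    (2, "depression", 10, 2, 1),
    (2, "depression", None, 2, 2),
    (1, "anxiety", 12, 1, 1),
    (1, "anxiety", 8, 1, 2),
    (2, "anxiety", 12, 2, 1),
    (2, "anxiety", 6, 2, 2),
]


def complete_cases(rows, n_periods=2):
    """Keep id/outcome pairs that have a non-missing value in every period."""
    observed = defaultdict(int)
    for person_id, outcome, value, _group, _time in rows:
        if value is not None:
            observed[(person_id, outcome)] += 1
    return [
        row for row in rows
        if row[2] is not None and observed[(row[0], row[1])] == n_periods
    ]


kept = complete_cases(rows)
for row in kept:
    print(*row)
```

Counting only the non-missing values first is what avoids the pitfall described above: the count is taken after the missing rows have been excluded.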

Apply filter before query editor steps

I have the following scenario. My datasource looks like this:
Order Item Type Value
1 1 A 14
1 1 B 10
1 1 C 12
1 2 A 12
2 1 C 19
2 1 D 15
2 2 B 11
Now I apply a few steps in the query editor, inter alia, a Group By (by Order and Item), so that my finished table looks like this:
Order Item Value
1 1 36
1 2 12
2 1 34
2 2 11
I am now looking for a possibility to filter my datasource table before the steps are applied (filter datasource > query steps are applied > chart changes).
In my example here I would filter the datasource by Type <> B:
Order Item Type Value
1 1 A 14
1 1 C 12
1 2 A 12
2 1 C 19
2 1 D 15
And the final table (chart datasource) would look like this:
Order Item Value
1 1 26
1 2 12
2 1 34
I tried it with parameters. But the problem is that I need the filter in Power BI online, so that the end user can apply it.
Thanks in advance for any ideas !!
Don't apply the grouping in your query. Leave the source table as it is, create a measure which sums Value, and filter Type.
Use Order and Item in a table visual (don't summarise) and the SUM of Value as the value; it can then be filtered by Type, which should give the desired result.
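To see why the order matters, here is a small Python sketch of the idea (not Power BI code): the sum is computed at display time from the unaggregated rows, so a Type filter applies before the values are added up. The helper name sum_by_order_item is an assumption made for this illustration.

```python
from collections import defaultdict

# (Order, Item, Type, Value), copied from the question's datasource
source = [
    (1, 1, "A", 14), (1, 1, "B", 10), (1, 1, "C", 12), (1, 2, "A", 12),
    (2, 1, "C", 19), (2, 1, "D", 15), (2, 2, "B", 11),
]


def sum_by_order_item(rows, excluded_types=()):
    """Sum Value per (Order, Item), skipping rows whose Type is excluded."""
    totals = defaultdict(int)
    for order, item, type_, value in rows:
        if type_ not in excluded_types:
            totals[(order, item)] += value
    return dict(totals)


unfiltered = sum_by_order_item(source)
without_b = sum_by_order_item(source, excluded_types={"B"})
print(unfiltered)
print(without_b)
```

Once the Group By has been applied in the query, the Type column is gone and no such filter is possible any more, which is why the grouping is left to the measure.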

GLPK Mathprog group of sets

I'm trying to code a model that can solve the Multiple Choice Knapsack Problem (MCKP) as described in Knapsack Problems involving dimensions, demands and multiple choice constraints: generalization and transformations between formulations (found here, see figures 8 and 9). You can find an example GMPL model of the basic knapsack problem here. For anyone looking for a quick explanation of the knapsack problem, read the following illustration:
You are an adventurer and have stumbled upon a treasure trove. There are hundreds of wonderful items 'i' that each have a weight 'w' and a profit 'p'. Say you have a knapsack with weight capacity as 'c' and you want to make the most profit without overfilling your knapsack. What is the best combination of items such that you make the most profit?
In code:
maximize obj :
sum{(i,w,p) in I} p*x[i];
Where 'I' is the basket of items, and x[i] is the binary variable (0 = not chosen, 1 = chosen)
The problem that I am having trouble with is the addition of multiple groups. MCKP requires exactly one item to be selected from each group. So, for example, let's say we have three groups from which to choose. They could be represented as follows (ignore actual values):
# Items: index, weight, profit
set ONE :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
# Items: index, weight, profit
set TWO :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
# Items: index, weight, profit
set THREE :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
I am confused on how I can iterate over each group and how I would define the variable x. I assume it would look something like:
var x{i,j} binary;
Where i is the index of items in j of groups. This assumes I define a set of sets:
set Groups{ONE,TWO,THREE}
Then I'd iterate over the groups of items:
sum{j in Groups, (i,w,p) in Groups[j]} p*x[i,j];
But I am concerned because I believe GMPL does not support ordered sets. I have seen this related question where the answer suggests defining a set within a set. However, I am not sure how it would apply in this particular scenario.
My main question, to be clear: In GMPL, how can I iterate over sets of sets (in this case a set of groups where each group has a set of items)?
Unlike AMPL, GMPL doesn't support sets of sets. Here's how to do it in AMPL:
set Groups;
set Items{Groups} dimen 3;
# define x and additional constraints
# ...
maximize obj: sum{g in Groups, (i,w,p) in Items[g]} p*x[i];
data;
set Groups := ONE TWO THREE;
# Items: index, weight, profit
set Items[ONE] :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
# Items: index, weight, profit
set Items[TWO] :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
# Items: index, weight, profit
set Items[THREE] :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
If you have no more than 300 variables, you can use a free student version of AMPL and solvers (e.g. CPLEX or Gurobi).
Based on this GNU mailing list thread, I believe GMPL/MathProg has support for what you want to do. Here's their example:
set WORKERS;
param number_of_shifts, integer, >= 1;
set WORKER_CLIQUE{1..number_of_shifts}, within WORKERS;
data;
set WORKERS := Jack Kate Sawyer Sun Juliet Richard Desmond Hugo;
param number_of_shifts := 2;
set WORKER_CLIQUE[1] := Sawyer, Juliet;
set WORKER_CLIQUE[2] := Jack, Kate, Hugo;
In your example, I assume you'd use something like set Items{1..3}, within Groups; with the data block from @vitaut's answer.
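Independently of the modelling language, the structure being asked for is a set of groups, each holding its own set of (index, weight, profit) items, with exactly one item chosen per group. The Python sketch below makes that structure concrete by brute force; it is not how a MIP solver works, and the capacity of 60 as well as the name solve_mckp are assumptions made for this illustration.

```python
from itertools import product

# Each group maps to (index, weight, profit) tuples, as in the data blocks above.
ITEMS = [(1, 10, 10), (2, 10, 10), (3, 15, 15), (4, 20, 20),
         (5, 20, 20), (6, 24, 24), (7, 24, 24), (8, 50, 50)]
groups = {"ONE": ITEMS, "TWO": ITEMS, "THREE": ITEMS}


def solve_mckp(groups, capacity):
    """Pick exactly one item per group, maximising profit within capacity."""
    names = list(groups)
    best_profit, best_choice = None, None
    # product() enumerates every way of taking one item from each group.
    for choice in product(*(groups[name] for name in names)):
        weight = sum(w for _, w, _ in choice)
        profit = sum(p for _, _, p in choice)
        if weight <= capacity and (best_profit is None or profit > best_profit):
            best_profit = profit
            best_choice = {name: item[0] for name, item in zip(names, choice)}
    return best_profit, best_choice


# capacity=60 is an assumed value, not taken from the question
best_profit, best_choice = solve_mckp(groups, capacity=60)
print(best_profit, best_choice)
```

Note that the decision is indexed by the pair (group, item index), because the same item index appears in every group.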

Two Way EntityCollection Binding to a Two Dimension Data Matrix

I have a Day Structure table with the following columns that I want to display:
DoW HoD Value
1 1 1
1 2 2
1 3 2
1 4 2
1 5 2
1 6 2
1 7 2
1 8 2
1 9 2
1 10 2
1 11 4
1 12 4
1 13 4
1 14 4
1 15 4
1 16 4
1 17 4
1 18 4
1 19 4
1 20 4
1 21 1
1 22 1
1 23 1
1 24 1
DoW is the day of week (Monday etc.), HoD is the hour of day, and Value is the actual value.
Now I want to bind this Day Structure entity collection directly to a control so that any changes are bound TwoWay, laid out as a two-dimensional matrix of DoW by HoD.
I think the best way to achieve this is to use a template and/or a converter, but I just don't know how ;)
I already read this article, but its lack of TwoWay binding makes it unsuitable for me :(
I hope you can help me.
Jonny
Again I solved it on my own ;)
For this problem I created a Grid with a fixed number of rows and columns. Inside this Grid I put an ItemsControl bound to my list of data. Inside the DataTemplate I placed a TextBox bound to the current value, and bound the Grid.Row and Grid.Column properties to the day of week and hour of day.
Pro:
The TextBox is TwoWay data-bound to a specific object or element.
Very easy to implement if the row and column properties are numeric.
Con:
Limited to a fixed number of rows/columns.
A lot of code to write in XAML (copy and paste).
Kind of "dirty" code. It does not feel like the best way to do it.
I'm still open to other suggestions.

Django query aggregation

Imagine a number guessing game where one person thinks of a number and another person has to guess it. The game is over if the correct number was guessed.
The models might look like this
from django.db import models

class SecretNumber(models.Model):
    number = models.IntegerField()

class Guess(models.Model):
    # on_delete is required since Django 2.0
    secretnumber = models.ForeignKey(SecretNumber, on_delete=models.CASCADE)
    guess = models.IntegerField()
After having played four times, the database might look like this:
id number
==========
1 10
2 54
3 68
4 25
id secretnumber_id guess
=============================
1 1 50
2 1 30
3 1 10
4 2 99
5 2 60
6 2 54
7 3 1
8 3 68
9 4 73
10 4 34
11 4 86
12 4 51
13 4 25
As you can see, the guesser was very lucky: it took him 3, 3, 2 and 5 guesses. But that's just to keep this example short.
Now I need to come up with a query that produces the following data:
Nb. guesses Count
=====================
2 1
3 2
5 1
A manual SQL statement would look something like this:
SELECT inner_count AS 'Nb. guesses', count(inner_count) AS 'Count' FROM (
SELECT secretnumber_id, count(id) AS inner_count FROM guess GROUP BY secretnumber_id
) AS per_number GROUP BY inner_count
I thought about annotating an annotation, but this seems not to be possible.
Any ideas?
If you're using Django models, you can use the QuerySet annotation functions, e.g.
from django.db.models import Count
guesses = Guess.objects.values('secretnumber').annotate(Count('secretnumber'))
This gives you a queryset of dictionaries, each holding a secretnumber and its number of guesses under the key secretnumber__count. Note that this corresponds to the inner query only; it does not yet group by that count.
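One way to get from there to the table in the question is to do the second level of counting in Python. The sketch below assumes the queryset yields dictionaries with the keys secretnumber and secretnumber__count; the rows are written out by hand from the sample guess table so that the snippet runs without a database.

```python
from collections import Counter

# Assumed shape of the rows produced by the values().annotate() queryset,
# counted by hand from the sample guess table in the question.
per_secret = [
    {"secretnumber": 1, "secretnumber__count": 3},
    {"secretnumber": 2, "secretnumber__count": 3},
    {"secretnumber": 3, "secretnumber__count": 2},
    {"secretnumber": 4, "secretnumber__count": 5},
]

# Second-level grouping: how many games needed each number of guesses.
histogram = Counter(row["secretnumber__count"] for row in per_secret)

for nb_guesses, count in sorted(histogram.items()):
    print(nb_guesses, count)
```

With a real queryset, per_secret would simply be replaced by the guesses queryset, at the cost of pulling one row per secret number into Python.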