The questionnaire I have data from asked respondents to rank 20 items on a scale of importance to them. The lower end of the scale contained a "bin" into which respondents could throw any of the 20 items they found completely unimportant. The result is a dataset with 20 variables (one for every item). Every variable holds a number between 1 and 100 (or 0 if the item was thrown in the bin).
I would like to recode the entries into a ranking of the variables for every respondent, so that each variable receives a number between 1 and 20 according to where that respondent ranked it.
Example:
Current:
item1 item2 item3 item4 item5 item6 item7 item8 etc.
respondent1 67 44 29 7 0 99 35 22
respondent2 0 42 69 50 12 0 67 100
etc.
What I want:
item1 item2 item3 item4 item5 item6 item7 item8 etc.
respondent1 7 6 4 2 1 8 5 3
respondent2 1 4 7 5 3 1 6 8
etc.
As you can see with respondent2, I would like items that received the same value to get the same rank, and the ranking to then skip a number.
I have found a lot of information on how to rank observations, but I have not yet found out how to rank variables. Does anyone know how to do this?
Here is one solution using reshape:
/* Create sample data */
clear *
set obs 2
gen respondant = "respondant1"
replace respondant = "respondant2" in 2
set seed 123456789
forvalues i = 1/10 {
gen item`i' = ceil(runiform()*100)
}
replace item2 = item1 if respondant == "respondant2"
list
+----------------------------------------------------------------------------------------------+
| respondant item1 item2 item3 item4 item5 item6 item7 item8 item9 item10 |
|----------------------------------------------------------------------------------------------|
1. | respondant1 14 56 69 62 56 26 43 53 22 27 |
2. | respondant2 65 65 11 7 88 5 90 85 57 95 |
+----------------------------------------------------------------------------------------------+
/* reshape long first */
reshape long item, i(respondant) j(itemNum)
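/* Rank observations within each respondent, accounting for ties */
by respondant (item), sort : gen rank = _n
/* Tied items share the lowest rank; the by prefix keeps the comparison within one respondent */
by respondant (item) : replace rank = rank[_n-1] if item == item[_n-1] & _n > 1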
/* reshape back to wide format */
drop item // optional, you can keep and just include in reshape wide
reshape wide rank, i(respondant) j(itemNum)
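For readers who want to check the tie rule outside Stata, here is a minimal Python sketch of the same ranking (lowest value gets rank 1, tied values share the lowest rank, and the next rank is skipped). It is not part of the Stata answer; the function name competition_ranks is made up for illustration, and the data are the eight items shown in the question.

```python
def competition_ranks(values):
    """Rank values ascending; ties share the lowest rank and the next rank is skipped."""
    ordered = sorted(values)
    # list.index() returns the first position of a value in the sorted list,
    # so all tied values map to the same (lowest) rank.
    return [ordered.index(v) + 1 for v in values]


# Scores taken from the example in the question (item1..item8).
scores = {
    "respondent1": [67, 44, 29, 7, 0, 99, 35, 22],
    "respondent2": [0, 42, 69, 50, 12, 0, 67, 100],
}

ranks = {name: competition_ranks(row) for name, row in scores.items()}
for name, row in ranks.items():
    print(name, row)
```

For respondent2 the two zeros both receive rank 1 and the next item receives rank 3, which is the behaviour asked for in the question.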
I have the following problem: I would like to compute a running total of a column and divide it, line by line, by the sum of the whole column until a specific value is reached. In pseudocode it would look like this:
data;
set auto;
sum_of_whole_column = sum(price);
subtotal[i] = 0;
i =1;
do until (subtotal[i] = 70000)
subtotal[i] = (subtotal[i] + subtotal[i+1])/sum_of_whole_column
i = i+1
end;
run;
I get an error saying that I haven't defined an array. Can I use something else instead of subtotal[i]? And how can I put a column into an array? I tried the following, but it doesn't work (auto is the dataset and price is the column I want to put into an array):
data invent_array;
set auto;
array price_array {1} price;
run;
EDIT: maybe the dataset I used is helpful :)
DATA auto ;
LENGTH make $ 20 ;
INPUT make $ 1-17 price mpg rep78 ;
CARDS;
AMC Concord 4099 22 3
AMC Pacer 4749 17 3
Audi 5000 9690 17 5
Audi Fox 6295 23 3
BMW 320i 9735 25 4
Buick Century 4816 20 3
Buick Electra 7827 15 4
Buick LeSabre 5788 18 3
Cad. Eldorado 14500 14 2
Olds Starfire 4195 24 1
Olds Toronado 10371 16 3
Plym. Volare 4060 18 2
Pont. Catalina 5798 18 4
Pont. Firebird 4934 18 1
Pont. Grand Prix 5222 19 3
Pont. Le Mans 4723 19 3
;
RUN;
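Perhaps I am missing your point, but your subtotal will never equal 70,000 if you divide by the sum of its column: the maximum value will be 1. Your running sum, however, can reach or exceed 70,000. The code below keeps the rows whose running sum is below 70,000 and then divides each running sum by the total price of the rows kept.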
data stage1;
retain _sum 0;
set auto;
_sum = sum(_sum, price);
if _sum < 70000 then output;
run;
proc sql;
create table want as
select t1.*, t1._sum/sum(price) as subtotal
from stage1 as t1;
quit;
subtotal
0.0607268256
0.1310834235
0.2746411058
0.3679017467
0.5121261056
0.5834753107
0.6994325842
0.7851820027
1
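As a cross-check of the numbers above, the same two steps can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the SAS code; the variable names (LIMIT, running, kept, subtotals) are made up, and the prices are the price column of the auto dataset in the question.

```python
from itertools import accumulate

# price column of the auto dataset from the question, in input order
prices = [4099, 4749, 9690, 6295, 9735, 4816, 7827, 5788,
          14500, 4195, 10371, 4060, 5798, 4934, 5222, 4723]
LIMIT = 70000

# Step 1: running total, keeping only rows where it is still below the limit
running = list(accumulate(prices))
kept = [total for total in running if total < LIMIT]

# Step 2: divide each kept running total by the total price of the kept rows
kept_total = sum(prices[:len(kept)])
subtotals = [total / kept_total for total in kept]

for value in subtotals:
    print(f"{value:.10f}")
```

Because the divisor is the total of the kept rows only, the last subtotal is exactly 1. Dividing by sum(prices) instead would give each row's share of the whole column.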
I have a large panel dataset that looks as follows.
input id age high weight str6 daily_drink
1 10 110 35 water
1 10 110 35 coffee
1 11 120 38 water
1 11 120 38 coffee
1 12 130 50 water
1 12 130 50 coffee
2 11 118 31 water
2 11 118 31 coffee
2 11 118 31 milk
2 12 123 38 water
2 12 123 38 coffee
2 12 123 38 milk
3 10 98 55 water
3 11 116 36 water
3 12 129 39 water
4 12 125 40 water
end
However, I would like to use Stata to keep only the ids that have observations at all three ages: 10, 11, and 12. The result should look like this:
id age high weight daily_drink
1 10 110 35 water
1 10 110 35 coffee
1 11 120 38 water
1 11 120 38 coffee
1 12 130 50 water
1 12 130 50 coffee
3 10 98 55 water
3 11 116 36 water
3 12 129 39 water
However, none of the rows has missing data, so I cannot simply delete the rows with missing values. Is there any way to do it? Any suggestion will help. Thanks in advance.
You can use bysort and egen for this. Something along the lines of
bysort id: egen has10 = total(age==10)
bysort id: egen has11 = total(age==11)
bysort id: egen has12 = total(age==12)
keep if (has10 != 0) & (has11 != 0) & (has12 != 0)
should work (untested). See help egen for more info. Install gtools if you have very large data (ssc install gtools) and then replace egen by gegen.
A solution that works if 10, 11, 12 are the only age values possible:
bysort id (age) : gen nvals = sum(age != age[_n-1])
by id : replace nvals = nvals[_N]
keep if nvals == 3
Consider also
bysort id (age) : gen OK1 = age[1] == 10 & age[_N] == 12
by id : egen OK2 = max(age == 11)
keep if OK1 & OK2
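The idea shared by these answers (collect the ages seen for each id, then keep the ids that cover all required ages) can be sketched in Python as follows. This is an illustration only; the names REQUIRED_AGES, ages_by_id and complete_ids are made up, and the rows are a reduced copy of the example data with just id, age and daily_drink.

```python
from collections import defaultdict

# (id, age, daily_drink) rows from the example in the question
rows = [
    (1, 10, "water"), (1, 10, "coffee"), (1, 11, "water"), (1, 11, "coffee"),
    (1, 12, "water"), (1, 12, "coffee"),
    (2, 11, "water"), (2, 11, "coffee"), (2, 11, "milk"),
    (2, 12, "water"), (2, 12, "coffee"), (2, 12, "milk"),
    (3, 10, "water"), (3, 11, "water"), (3, 12, "water"),
    (4, 12, "water"),
]
REQUIRED_AGES = {10, 11, 12}

# Collect the set of ages observed for each id
ages_by_id = defaultdict(set)
for person_id, age, _drink in rows:
    ages_by_id[person_id].add(age)

# Keep ids whose observed ages include every required age
complete_ids = {pid for pid, ages in ages_by_id.items() if REQUIRED_AGES <= ages}
kept = [row for row in rows if row[0] in complete_ids]

print(sorted(complete_ids))
```

Unlike the nvals approach, the subset test does not assume that 10, 11 and 12 are the only ages in the data.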
I have a Power BI matrix visualization that looks like this:
Jan Feb Mar April
Client1 10 20 30 10
Client2 15 25 65 80
Client3 66 22 54 12
I have created three what-if parameter slicer tables (each with values from 1 to 4), one for each client.
For example, if the value of the first slicer is 1, the second is 2, and the third is 2, then I want:
Jan Feb Mar April
Client1 0 20 30 10
Client2 0 0 65 80
Client3 0 0 54 12
That is, for each client it should replace the values of the first N months with zero, where N is that client's slicer value. I have been able to achieve that for one client using the DATEADD function (by adding months):
Measure = CALCULATE(SUM('Table'[Value]),
DATEADD('Table'[Column], Parameter[Parameter Value], MONTH))
I have used this measure to display the value, but how do I make it work for the other two clients as well?
Let's say you have three parameter tables as follows
Parameter1 Parameter2 Parameter3
Value1 Value2 Value3
------ ------ ------
1 1 1
2 2 2
3 3 3
4 4 4
and each of them has its own slicer. Then the measure you are after might look something like this:
Measure =
VAR Val1 = MAX(Parameter1[Value1])
VAR Val2 = MAX(Parameter2[Value2])
VAR Val3 = MAX(Parameter3[Value3])
VAR CurrClient = MAX('Table'[Client])
VAR CurrMonth = MONTH(DATEVALUE(MAX('Table'[Month]) & " 1, 2000"))
RETURN SWITCH(CurrClient,
"Client1", IF(CurrMonth <= Val1, 0, SUM('Table'[Value])),
"Client2", IF(CurrMonth <= Val2, 0, SUM('Table'[Value])),
"Client3", IF(CurrMonth <= Val3, 0, SUM('Table'[Value])),
SUM('Table'[Value])
)
Basically, you read in each parameter and compare them to the month in the current cell.
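To make the comparison concrete, here is a small Python sketch of the same per-cell rule, using the matrix and slicer values from the question. It only mirrors the logic of the measure and is not DAX; the function name masked_value and the cutoffs dictionary are made up for illustration.

```python
# Matrix from the question: one row of monthly values per client (Jan..April)
values = {
    "Client1": [10, 20, 30, 10],
    "Client2": [15, 25, 65, 80],
    "Client3": [66, 22, 54, 12],
}
# Slicer selections from the example: first = 1, second = 2, third = 2
cutoffs = {"Client1": 1, "Client2": 2, "Client3": 2}


def masked_value(client, month_number, value, cutoffs):
    """Return 0 when the month is at or before the client's cutoff, else the value."""
    cutoff = cutoffs.get(client)
    if cutoff is not None and month_number <= cutoff:
        return 0
    # Clients without a parameter fall through unchanged, like the SWITCH default
    return value


result = {
    client: [masked_value(client, m, v, cutoffs) for m, v in enumerate(row, start=1)]
    for client, row in values.items()
}
for client, row in result.items():
    print(client, row)
```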
Here is the data I have. I use proc tabulate to present it the way it appears in Excel and to make it easier to visualize. The goal is for the groups strictly below the diagonal (I know the table is a rectangle; by "diagonal" I mean the cells (1,1), (2,2), ..., (7,7)) to roll up the column until the roll-up hits the diagonal or reaches a group size of at least 75.
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 80 90 87 45 12 41 88
4 24 90 100 110 22 141 88
5 0 1 0 0 0 0 2
6 0 1 0 0 0 0 6
7 0 1 0 0 0 0 2
8 0 1 0 0 0 0 11
I've already used if/then statements to regroup certain data values, but I need a general way to do it for other sets.
Thanks in advance.
Desired results:
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 104 90 87 45 12 41 88
4 0 94 100 110 22 141 88
5 0 0 0 0 0 0 2
6 0 0 0 0 0 0 6
7 0 0 0 0 0 0 13
8 0 0 0 0 0 0 0
Mock up some categorical data for some patients who have to be counted:
data mock;
do patient_id = 1 to 2500;
month = ceil(7*ranuni(123));
age = ceil(8*ranuni(123));
output;
end;
stop;
run;
Create a tabulation of counts (N) similar to the one shown in the question:
options missing='0';
proc tabulate data=mock;
class month age;
table age,month*n=''/nocellmerge;
run;
For each month get the sub-diagonal patient count
proc sql;
/* create table subdiagonal_column as */
select month, count(*) as subdiag_col_freq
from mock
where age > month
group by month;
For each row get the pre-diagonal patient count
/* create table prediagonal_row as */
select age, count(*) as prediag_row_freq
from mock
where age > month
group by age;
quit;
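Applied to the table of counts in the question rather than to the mock patient data, the two queries amount to summing the cells where age > month, once per month and once per age. A minimal Python sketch of that, assuming the grid is typed in by hand (the names counts, subdiag_col and prediag_row are made up):

```python
# counts[age - 1][month - 1], copied from the table in the question
counts = [
    [80, 90, 100, 110, 122, 141, 88],
    [80, 90, 100, 110, 56, 14, 88],
    [80, 90, 87, 45, 12, 41, 88],
    [24, 90, 100, 110, 22, 141, 88],
    [0, 1, 0, 0, 0, 0, 2],
    [0, 1, 0, 0, 0, 0, 6],
    [0, 1, 0, 0, 0, 0, 2],
    [0, 1, 0, 0, 0, 0, 11],
]
n_ages, n_months = len(counts), len(counts[0])

# Per month: total of the cells strictly below the diagonal (age > month)
subdiag_col = {
    month: sum(counts[age - 1][month - 1] for age in range(month + 1, n_ages + 1))
    for month in range(1, n_months + 1)
}

# Per age: total of the cells to the left of the diagonal (month < age)
prediag_row = {
    age: sum(counts[age - 1][month - 1] for month in range(1, min(age, n_months + 1)))
    for age in range(1, n_ages + 1)
}

print(subdiag_col)
print(prediag_row)
```

Unlike the SQL queries, which return no row for a group that has no matching records, this sketch reports such groups with a total of 0.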
Other sets can be tricky if the categorical values are not +1 monotonic. To do a similar process for non-monotonic categorical values, you will need to create surrogate variables that are +1 monotonic. For example:
data mock;
do item_id = 1 to 2500;
pet = scan ('cat dog snake rabbit hamster', ceil(5*ranuni(123)));
place = scan ('farm home condo apt tower wild', ceil(6*ranuni(123)));
output;
end;
run;
proc tabulate data=mock;
class pet place;
table pet,place*n=''/nocellmerge;
run;
proc sql;
create table unq_pets as select distinct pet from mock;
create table unq_places as select distinct place from mock;
data pets;
set unq_pets;
pet_num = _n_;
run;
data places;
set unq_places;
place_num = _n_;
run;
proc sql;
select distinct place_num, mock.place, count(*) as subdiag_col_freq
from mock
join pets on pets.pet = mock.pet
join places on places.place = mock.place
where pet_num > place_num
group by place_num
order by place_num
;
quit;
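The surrogate-variable idea can also be sketched in Python: number the distinct category values 1, 2, 3, ... and then compare the numbers instead of the labels. The records below are made up for illustration (they are not the SAS mock data), and sorting the distinct values alphabetically is an assumption about the order in which the surrogate numbers should be assigned.

```python
from collections import Counter

# Hypothetical (pet, place) records, invented for this sketch
records = [
    ("cat", "apt"), ("dog", "apt"), ("snake", "condo"), ("cat", "farm"),
    ("rabbit", "home"), ("hamster", "apt"), ("dog", "farm"),
]

# Surrogate numbers: 1, 2, 3, ... over the sorted distinct values
pet_num = {pet: i for i, pet in enumerate(sorted({p for p, _ in records}), start=1)}
place_num = {place: i for i, place in enumerate(sorted({pl for _, pl in records}), start=1)}

# Count the records strictly below the "diagonal" for each place
subdiag_col_freq = Counter(
    place for pet, place in records if pet_num[pet] > place_num[place]
)

for place in sorted(subdiag_col_freq, key=place_num.get):
    print(place_num[place], place, subdiag_col_freq[place])
```

Any other ordering of the categories works the same way, as long as the surrogate numbers increase by 1.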
I have the following dataset detailing the ages of women present in a household :
Household ID Age
1 19
2 52
2 22
2 18
3 37
3 29
I would like to add a third column to this table that gives an ID to each woman in the household from 1 to n, where n is the number of women in the household. This would give the following:
Household ID Age Woman ID
1 19 1
2 52 1
2 22 2
2 18 3
3 37 1
3 29 2
How can I achieve this?
First make sure that the data is sorted by Household ID. Then the first. automatic variable, which a BY statement creates in the data step, should give you what you need.
proc sort data = old;
by Household_ID;
run;
data new(rename= (count=woman_id));
set old;
count + 1;
by Household_ID;
if first.Household_ID then count = 1;
run;
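The same "reset the counter at the start of each group" logic can be sketched in Python with itertools.groupby, which, like BY-group processing, expects the rows to be sorted by the grouping key. This is only an illustration using the six rows from the question; the names rows_sorted and numbered are made up.

```python
from itertools import groupby
from operator import itemgetter

# (household_id, age) rows from the question
rows = [(1, 19), (2, 52), (2, 22), (2, 18), (3, 37), (3, 29)]

# sorted() is stable, so the original order within each household is preserved
rows_sorted = sorted(rows, key=itemgetter(0))

numbered = []
for _household, members in groupby(rows_sorted, key=itemgetter(0)):
    # enumerate restarts at 1 for every household, like resetting count at first.
    for woman_id, (household_id, age) in enumerate(members, start=1):
        numbered.append((household_id, age, woman_id))

for row in numbered:
    print(row)
```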