Using a loop to replace values of a variable - stata

I'm attempting to create a new variable in my dataset that stores a number, which is derived from a computation on another number from the same observation.
* Here is what my dataset looks like:
SubjectID Score MyNewScore
1001 5442822 0
1002 6406134 0
1003 16 0
Now, the variable Score is the sum of up to 23 distinct numbers (I'll call them "Responses"), ranging from 1 to 8,388,608.
/* Example of response values
1st response = 1
2nd response = 2
3rd response = 4
4th response = 8
5th response = 16
6th response = 32
...
23rd response = 8,388,608
*/
MyNewScore contains a count of these distinct responses used to obtain the value in Score. In my example dataset, MyNewScore should equal 9 as there are 9 responses used to arrive at a sum of 5,442,822.
I have nested a forvalues loop within a while loop in Stata that successfully calculates MyNewScore but I do not know how to replace the 0 that currently exists in the dataset with the result of my nested-loops.
Stata code used to calculate the value I'm after:
// Build a loop to create a Roland Morris Score
local score = 16
local count = 0
while `score' != 0 {
local ItemCode
forvalues i=1/24 {
local j = 2^(`i' - 1)
if `j' >= `score' continue, break
local ItemCode `j'
* display "`ItemCode'"
}
local score = `score' - `ItemCode'
if `score' > 1 {
local count = `count' + 1
display "`count'"
}
else if `score' == 1 {
local count = `count' + 1
display "`count'"
continue, break
}
}
How do I replace the 0s in MyNewScore with the output from the nested-loops? I have tried nesting these two loops in another while loop, with a `replace' command although that simply applies the count from the first observation, to all observations in the dataset.

I think there's an error in the value of the 23rd response: it should be 2^(23-1), which is 4,194,304.
The sum of the first 4 responses is 15; that's 1+2+4+8 or 2^4-1. The sum of all 23 responses is 2^23 - 1 so the largest possible value for Score is 8,388,607.
There's no need for a loop over observations here. Start with a cloned copy of the Score variable, then loop over the responses from the highest down to 1. At each pass, if the current score is greater than or equal to the value of the response, count that response and subtract its value from the score.
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(SubjectID Score)
1001 5442822
1002 6406134
1003 16
1004 1
1005 19
1006 15
1007 8388607
end
clonevar x = Score
gen wanted = 0
qui forvalues i=23(-1)1 {
local response = 2^(`i'-1)
replace wanted = wanted + 1 if x >= `response'
replace x = x - `response' if x >= `response'
}
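For readers who want to check the logic outside Stata, here is a minimal Python sketch of the same subtract-from-the-top idea. The function name count_responses and the n_responses parameter are illustrative and not part of the original post; the sketch assumes Score is a non-negative integer no larger than 2^23 - 1.

```python
def count_responses(score, n_responses=23):
    """Count how many distinct powers of two (1, 2, 4, ...) add up to score."""
    remaining = score
    count = 0
    # Walk from the largest response value down to 1, as in the Stata loop.
    for i in range(n_responses, 0, -1):
        response = 2 ** (i - 1)
        if remaining >= response:
            count += 1             # this response is part of the sum
            remaining -= response  # remove it before testing smaller ones
    return count


for score in (5442822, 16, 1, 19, 15, 8388607):
    print(score, count_responses(score))
```

Because every response is a distinct power of two, the result is simply the number of 1 bits in Score, so bin(score).count("1") should give the same count.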

I think all that you would need to do is nest your code in a loop that goes through each observation in your dataset, like so:
// get total number of observations in dataset
local N = _N
// go through each observation and run the while loop
forvalues observation = 1/`N' {
local score = Score[`observation']
local count = 0
// your while loop here
while `score' != 0 {
...
}
replace MyNewScore = `count' in `observation' // (or whatever value you're after)
}
Is this what you're after?

Related

SAS: Getting variable name of last non-zero observation

I'm trying to figure this out. I have a table as follows and I'm trying to populate the final column with the variable name of the last non-zero value (as shown in final column):
ID MTH_1 MTH_2 MTH_3 MTH_4 MTH_5 MONTH_LAST_BALANCE
--------------------------------------------------------------
1 10 0 10 20 10 MTH_5
2 5 10 15 5 0 MTH_4
3 5 10 5 0 0 MTH_3
4 1 2 3 1 0 MTH_4
5 1 0 0 0 0 MTH_1
I'm guessing I need to use some sort of array to make this work but I don't know. As per row 1, I need the last non-zero value only, not the left-most one that some other code seems to retrieve.
Any help would be much appreciated.
Cheers
data want ;
set have ;
/* Load MTH_1 to MTH_5 into array */
array m{*} MTH_1-MTH_5 ;
length MONTH_LAST_BALANCE $5. ;
/* Iterate over array */
do i = 1 to dim(m) ;
/* Use vname function to get variable name from array element */
if m{i} > 0 then MONTH_LAST_BALANCE = vname(m{i}) ;
end ;
run ;
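The same scan can be sketched in Python, in case it helps to see the logic without the SAS array syntax. The function name last_nonzero_name is made up for illustration. Note that this sketch tests != 0 to match the "non-zero" wording, whereas the SAS step above tests > 0; the two agree on the non-negative sample data.

```python
def last_nonzero_name(row, names):
    """Return the name of the last column in names whose value is non-zero."""
    last = None
    for name in names:       # scan left to right, like the SAS do loop
        if row[name] != 0:
            last = name      # later hits overwrite earlier ones
    return last


months = ["MTH_1", "MTH_2", "MTH_3", "MTH_4", "MTH_5"]
# Sample rows copied from the table in the question.
have = [
    {"ID": 1, "MTH_1": 10, "MTH_2": 0, "MTH_3": 10, "MTH_4": 20, "MTH_5": 10},
    {"ID": 2, "MTH_1": 5, "MTH_2": 10, "MTH_3": 15, "MTH_4": 5, "MTH_5": 0},
    {"ID": 3, "MTH_1": 5, "MTH_2": 10, "MTH_3": 5, "MTH_4": 0, "MTH_5": 0},
    {"ID": 4, "MTH_1": 1, "MTH_2": 2, "MTH_3": 3, "MTH_4": 1, "MTH_5": 0},
    {"ID": 5, "MTH_1": 1, "MTH_2": 0, "MTH_3": 0, "MTH_4": 0, "MTH_5": 0},
]

for row in have:
    print(row["ID"], last_nonzero_name(row, months))
```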

SAS SymputX and Symget Function

I am trying to construct Table 2 with the SAS code below, but what I get is Table 1. I could not figure out what I missed. Help would be much appreciated. Thank you.
%let counter = 4;
data new;set set1;
total = 0;
a = 1;
do i = 1 to &counter;
call symputX('a',a);
total = total + Tem_&a.;
a = symget('a')+1;
call symputX('a',a);
end;
run;
Table 1
ID Amt Tem_1 Tem_2 Tem_3 Tem_4 total
4 500 1 4 5 900 3600
5 200 50 100 200 0 0
9 50 40 0 0 0 0
10 500 70 100 250 0 0
Table 2
ID Amt Tem_1 Tem_2 Tem_3 Tem_4 total
4 500 1 4 5 900 910
5 200 50 100 200 0 350
9 50 40 0 0 0 40
10 500 70 100 250 0 420
You cannot use SYMPUT and SYMGET that way, unfortunately. While you can use them to store and retrieve macro variable values, you cannot use them to change code that has already been compiled.
Basically, SAS has to figure out the machine code for what it's supposed to do on every iteration of the data step loop before it looks at any data (this is called compiling). So the problem is, you can't write tem_&a. and expect to be allowed to change what &a. is during execution, because that would change what the machine code needs to do, and SAS couldn't prepare for that sufficiently.
So, in what you wrote, &a. is resolved when the program compiles, and whatever value &a. had before your data step is what tem_&a. turns into. Presumably the first time you ran this it errored (&a. does not resolve, and then an error about & being illegal in variable names), and then eventually the call symputx did its job and &a. got a 4 in it at the end of the loop, and forever after your tem_&a. resolved to tem_4.
The solution? Don't use macros for this. Instead, use arrays.
data new;
set set1;
total = 0;
array tem[&counter.] tem_1-tem_&counter.;
do i = 1 to &counter; *or do i = 1 to dim(tem);
total = total + Tem[i];
end;
run;
Or, of course, just directly sum them.
data new;
set set1;
total = sum(of tem_1-tem_4);
run;
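As a rough cross-check of the expected totals, here is a small Python sketch of the same row-wise sum. The function name add_totals is hypothetical. The point it mirrors is that the list of column names is fixed once, before any row is processed, much like the array definition in the data step.

```python
def add_totals(rows, counter):
    """Return copies of rows with a total of Tem_1..Tem_<counter> added."""
    # The column names are built once, up front, not per row.
    cols = [f"Tem_{i}" for i in range(1, counter + 1)]
    return [dict(row, total=sum(row[c] for c in cols)) for row in rows]


# Input values copied from the tables in the question.
set1 = [
    {"ID": 4, "Amt": 500, "Tem_1": 1, "Tem_2": 4, "Tem_3": 5, "Tem_4": 900},
    {"ID": 5, "Amt": 200, "Tem_1": 50, "Tem_2": 100, "Tem_3": 200, "Tem_4": 0},
    {"ID": 9, "Amt": 50, "Tem_1": 40, "Tem_2": 0, "Tem_3": 0, "Tem_4": 0},
    {"ID": 10, "Amt": 500, "Tem_1": 70, "Tem_2": 100, "Tem_3": 250, "Tem_4": 0},
]

new = add_totals(set1, counter=4)
for row in new:
    print(row["ID"], row["total"])
```

The totals printed here should match the total column of Table 2.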
If you REALLY like macro variables, you could of course do this in a macro do loop, though this is not recommended for this purpose as it's really better to stick with data step techniques. But this should work, anyway, if you run this inside a macro (this won't be valid in open code).
%macro sum_tem; /* sum_tem is an arbitrary macro name */
data new;
set set1;
total = 0;
%do i = 1 %to &counter;
total = total + Tem_&i.;
%end;
run;
%mend sum_tem;
%sum_tem

Variable that double counts the observations

I am trying to create a new variable such that it would count like
1,1,2,2,3,3,4,4 ..... meaning it would double count the observations.
My current code is like this
gen newid = _n
replace newid = newid[_n+1] if mod(newid2,2) == 0
but with this the result comes out as 1,1,3,3,5,5,7,7, ... where the increments are in 2's, i.e. I only get odd numbers. How should I modify this code?
You might try dividing your ID variable by 2, and then use Stata's ceil function to force it up to the nearest integer.
clear
set obs 50
gen newid = _n
gen newid2 = ceil(newid/2)
You can use the int(x) function.
This function returns the integer obtained by truncating x.
Thus, int(5.2) is 5.
If you want the following pattern
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
the command is
gen seq = int((_n-1)/2) +1
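Both suggestions produce the same sequence. A quick Python sketch, with n standing in for Stata's _n and with made-up function names, shows the two formulas side by side:

```python
import math


def pair_id_ceil(n):
    """ceil(n/2): divide the row number by 2 and round up."""
    return math.ceil(n / 2)


def pair_id_int(n):
    """int((n-1)/2) + 1: shift to zero-based, truncate, shift back."""
    return int((n - 1) / 2) + 1


ids = [pair_id_ceil(n) for n in range(1, 9)]
print(ids)  # [1, 1, 2, 2, 3, 3, 4, 4]
```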

How to decrement a values of variable in python?

Can anyone here help me find the shortest way of decrementing the value of a variable?
Below is my desired output:
start = 5000
range = 5
qout = start/range
Distributed Remaining
1000 4000 # start - 1000
1000 3000 # 4000 - 1000
1000 2000 # 3000 - 1000
1000 1000 # 2000 - 1000
1000 0 # 1000 - 1000
What I have done so far is this:
start = 5000
range = 5
qout = start/range
i = 0
while i < range:
temp = {
'distr' : qout,
'remain' : start - remain, # This is what i can do only, unless it is being saved in the database so that i can move to next item.
}
i+=1
return temp
RE UPDATED:
I guess you are right, I don't know how I should ask. But let me show my original code.
temp = {}
i = 0
seq = 0
start = 11529.60
range = 6
qout = start / range
remaining = start - qout
while i < range:
while remaining >= 0:
temp = {
'sequence' : i+1,
'distributed' : qout,
'remaining' : remaining,
}
remaining -= qout
i += 1
print(temp)
My expected output would look like this (and this is the output that I want to show):
Sequence Distributed Remaining
1 1921.60 9608.00
2 1921.60 7686.40
3 1921.60 5764.80
4 1921.60 3843.20
5 1921.60 1921.60
6 1921.60 0.00
However, this is what I get:
Sequence Distributed Remaining
1 1921.60 9608.00
1 1921.60 7686.40
1 1921.60 5764.80
1 1921.60 3843.20
1 1921.60 1921.60
Thanks for any help
This is my 3rd edit. I honestly believe that the largest problem here is that you cannot define the question.
How to decrement a values of variable in python?
The literal answer to this is i -= 1 (Python has no -- operator; --i just negates i twice), but that's not what you're asking.
Then you have a desired output with no explanation of what is what.
This is how I guess you want it to work...
start - an initial value;
range - how many times a deduction is made from start;
quot - amount of each deduction, which is equal to start/range;
remaining - this is my variable, which reflects the result of deducting from start. From your comment below, I assume remaining can go negative.
Still no question here, but let's put it together ...
start = 11529.60
range = 6
quot = start/range
sequence = 0
remaining = start
while range > 0:
    range -= 1
    sequence += 1
    remaining -= quot
    print(sequence, quot, remaining)
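If the goal is the exact two-decimal table from the question, one possible variant is sketched below. It is not the code above: it swaps in decimal.Decimal instead of floats so the remaining values stay exact, and the names schedule and steps are my own (steps replaces range, which would otherwise shadow the built-in).

```python
from decimal import Decimal


def schedule(start, steps):
    """Return (sequence, distributed, remaining) rows for equal deductions."""
    quot = start / steps
    remaining = start
    rows = []
    for sequence in range(1, steps + 1):
        remaining -= quot  # deduct one share per step
        rows.append((sequence, quot, remaining))
    return rows


rows = schedule(Decimal("11529.60"), 6)
print("Sequence Distributed Remaining")
for sequence, quot, remaining in rows:
    print(f"{sequence} {quot:.2f} {remaining:.2f}")
```

Passing the start value as a string to Decimal avoids picking up binary floating-point error before the arithmetic even begins.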

Setting cutoff period SAS

I am having a problem with a dataset that looks like the one below. It is an inventory count of different location/weeks:
data have;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 5
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 6
4 x 1 30
4 x 2 0
4 x 3 40
4 x 4 10
;
run;
Here is the issue...once the inventory hits 0 for a specific location/item combination, I want all remaining weeks for that combination to be imputed with 0. My desired output looks like this:
data want;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 0
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 0
4 x 1 30
4 x 2 0
4 x 3 0
4 x 4 0
;
run;
I'm fairly new to SAS and don't know how to do this. Help?!
Thank you!
You can do that in the following steps:
1. Use a by statement to indicate the order (the input dataset must be sorted accordingly).
2. Use a retain statement to pass the value of a control variable (reset) to the following rows.
3. Deactivate the imputation (reset=0) at the first row of every item/location combination.
4. Activate the imputation (reset=1) when inv is zero.
5. Set inv to 0 if the imputation is active.
Code:
data want (drop=reset);
set have;
by itm location week;
retain reset;
if first.location then reset=0;
if (inv = 0) then reset=1;
else if (reset = 1) then inv=0;
run;
The value of reset remains constant from row to row until explicitly modified.
The presence of the variable week in the by statement is only to check that the input data is chronologically sorted.
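The retain logic can also be sketched in Python for anyone who finds the by/first. mechanics unfamiliar. This is only an illustration: the function name zero_after_first_zero is made up, and the rows are assumed to be already sorted by itm, location and week, just as the data step requires.

```python
def zero_after_first_zero(rows):
    """rows: (itm, location, week, inv) tuples sorted by itm, location, week."""
    result = []
    current_group = None
    reset = False  # plays the role of the retained reset variable
    for itm, location, week, inv in rows:
        if (itm, location) != current_group:  # equivalent of first.location
            current_group = (itm, location)
            reset = False
        if inv == 0:
            reset = True   # from now on, impute zeros in this group
        elif reset:
            inv = 0
        result.append((itm, location, week, inv))
    return result


# Sample data copied from the question.
have = [
    (3, "x", 1, 30), (3, "x", 2, 20), (3, "x", 3, 0), (3, "x", 4, 5),
    (3, "y", 1, 100), (3, "y", 2, 90), (3, "y", 3, 0), (3, "y", 4, 6),
    (4, "x", 1, 30), (4, "x", 2, 0), (4, "x", 3, 40), (4, "x", 4, 10),
]

want = zero_after_first_zero(have)
for row in want:
    print(*row)
```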
The following will use proc sql to give the wanted result. I have commented inline why I do different steps.
proc sql;
/* First of all, fetch all observations where the inventory is depleted.*/
create table work.zero_inv as
select *, min(week) as min_zero_inv_week
from work.have where inv = 0
/* If your example data set had included several zero-inventory weeks, then without the following commented-out code you would get duplicates. I'll leave explaining this as an exercise for you. Just alter your have data set and see the difference.*/
/*group by itm, location
having (calculated min_zero_inv_week) = week*/;
create table work.want_calculated as
/* Since we have fetched all weeks with zero inventories, we can use a left join and only update weeks that follow those with zeros, leaving the inventory untouched for the rest.*/
select t1.itm, t1.location, t1.week,
/* Since we use a left join, we can check if the second data sets includes any rows at all for the specific item at the given location. */
case when t2.itm is missing or t1.week <= t2.week then t1.inv else t2.inv end as inv
from work.have as t1
left join work.zero_inv as t2
on t1.itm = t2.itm and t1.location = t2.location
/* proc sql does not promise to keep the order in your dataset, therefore it is good to sort it.*/
order by t1.itm, t1.location, t1.week;
quit;
proc compare base=work.want compare=work.want_calculated;
title 'Hopefully no differences';
run;
Remember that Stack Overflow isn't a "give me the code" forum; it is customary to try some solutions yourself first. I'll cut you some slack since this is your first question. Welcome to SO :D.