Variable that double counts the observations - stata

I am trying to create a new variable that counts 1, 1, 2, 2, 3, 3, 4, 4, ..., meaning each value is repeated for two consecutive observations.
My current code is:
gen newid = _n
replace newid = newid[_n-1] if mod(newid,2) == 0
but with this the result comes out as 1, 1, 3, 3, 5, 5, 7, 7, ..., where the values increase in steps of 2, i.e. I only get odd numbers. How should I modify this code?

You might try dividing your ID variable by 2 and then using Stata's ceil() function to round the result up to the nearest integer.
clear
set obs 50
gen newid = _n
gen newid2 = ceil(newid/2)
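To check the arithmetic outside Stata, here is a small Python sketch of the same formula (Python is used only for illustration, and `double_count_ceil` is a hypothetical helper name). Halving the observation number and rounding up sends each consecutive pair of observations to the same value:

```python
import math


def double_count_ceil(n_obs):
    """Return ceil(n/2) for n = 1..n_obs, mirroring: gen newid2 = ceil(newid/2)."""
    return [math.ceil(n / 2) for n in range(1, n_obs + 1)]


print(double_count_ceil(8))  # [1, 1, 2, 2, 3, 3, 4, 4]
```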

You can use the int(x) function.
This function returns the integer obtained by truncating x.
Thus, int(5.2) is 5.
If you want the following pattern
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
the command is
gen seq = int((_n-1)/2) +1
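A Python sketch of the truncation version, for comparison (illustrative only; `double_count_int` is a hypothetical name). Python's int() truncates toward zero like Stata's int(), and because (n-1)/2 is never negative here, truncating is the same as rounding down. The final check confirms that this formula and the ceil() formula from the previous answer give the same sequence:

```python
import math


def double_count_int(n_obs):
    # Mirrors: gen seq = int((_n-1)/2) + 1
    # int() drops the fractional part; (n - 1) / 2 >= 0, so this equals floor().
    return [int((n - 1) / 2) + 1 for n in range(1, n_obs + 1)]


seq = double_count_int(18)
print(seq)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]

# Both formulas agree for positive observation numbers.
assert seq == [math.ceil(n / 2) for n in range(1, 19)]
```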


SAS: Getting variable name of last non-zero observation

I'm trying to figure this out. I have a table as follows, and I want to populate the final column with the name of the variable that holds the last non-zero value in each row (as shown in the final column):
ID  MTH_1  MTH_2  MTH_3  MTH_4  MTH_5  MONTH_LAST_BALANCE
---------------------------------------------------------
1   10     0      10     20     10     MTH_5
2   5      10     15     5      0      MTH_4
3   5      10     5      0      0      MTH_3
4   1      2      3      1      0      MTH_4
5   1      0      0      0      0      MTH_1
I'm guessing I need some sort of array to make this work, but I don't know how. As row 1 shows, I need the last non-zero value only, not the left-most one that some other code seems to retrieve.
Any help would be much appreciated.
Cheers
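data want ;
  set have ;
  /* Load MTH_1 to MTH_5 into array */
  array m{*} MTH_1-MTH_5 ;
  /* $32 is the maximum length of a SAS variable name */
  length MONTH_LAST_BALANCE $32 ;
  /* Iterate over array; each later match overwrites the earlier one, so the last non-zero month wins */
  do i = 1 to dim(m) ;
    /* Use vname function to get variable name from array element; skip zeros and missing values */
    if not missing(m{i}) and m{i} ne 0 then MONTH_LAST_BALANCE = vname(m{i}) ;
  end ;
  drop i ;
run ;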
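The DO loop works because it scans the months left to right and reassigns MONTH_LAST_BALANCE every time it meets a non-zero value, so the last assignment is the one that survives. A Python sketch of the same idea, checked against the table in the question (the function name is hypothetical, and `None` is assumed to stand in for a SAS missing value):

```python
def last_nonzero_name(row, names):
    """Return the name of the last non-zero, non-missing value in row (None if none)."""
    result = None
    for name, value in zip(names, row):
        # Later matches overwrite earlier ones, so the last match wins,
        # just like reassigning MONTH_LAST_BALANCE inside the DO loop.
        if value is not None and value != 0:
            result = name
    return result


names = ["MTH_1", "MTH_2", "MTH_3", "MTH_4", "MTH_5"]
rows = {
    1: [10, 0, 10, 20, 10],
    2: [5, 10, 15, 5, 0],
    3: [5, 10, 5, 0, 0],
    4: [1, 2, 3, 1, 0],
    5: [1, 0, 0, 0, 0],
}
for row_id, values in rows.items():
    print(row_id, last_nonzero_name(values, names))
```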

How to retain calculated values between rows when calculating running totals?

I have a tricky question about a conditional sum in SAS. It is hard for me to explain in words, so I will show an example:
A B
5 3
7 2
8 6
6 4
9 5
8 2
3 1
4 3
As you can see, I have a datasheet with two columns. First, I calculated the conditional cumulative sum of column A (I can do this myself, so I don't need help with that step):
A B CA
5 3 5
7 2 12
8 6 18
6 4 8 ((12+8)-18)+6
9 5 17
8 2 18
3 1 10 ((17+8)-18)+3
4 3 14
So my threshold is 18. If the cumulative sum would exceed 18, it is capped at 18, and the next cumulative value starts from the amount over 18 plus the next value of A. (As I said, I can do this myself.)
The tricky part is that I have to calculate the cumulative sum of column B according to column A:
A B CA CB
5 3 5 3
7 2 12 5
8 6 18 9.5 (5+(6*((18-12)/8)))
6 4 8 5.5 ((5+6)-9.5)+4
9 5 17 10.5 (5.5+5)
8 2 18 10.75 (10.5+(2*((18-17)/8)))
3 1 10 2.75 ((10.5+2)-10.75)+1
4 3 14 5.75 (2.75+3)
As the example shows, the cumulative sum of column B follows a specific rule. When CA reaches the threshold (18), we compute the fraction of the last A value that was needed to reach 18, add that same fraction of the corresponding B value to CB, and carry the rest of that B value forward into the next row's CB.
It looks like when the sum of A reaches 18 or more, you want to split the values of A and B between the current and the next record. One way is to remember the leftover values of A and B and carry them forward in your new cumulative variables. Just make sure to output the observation before resetting those variables.
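data want ;
  set have ;
  /* Sum statements: CA and CB are automatically retained across observations */
  ca+a;
  cb+b;
  if ca >= 18 then do;
    /* Part of A that spills over 18, and the matching share of B */
    extra_a=ca - 18;
    extra_b=b - b*((a - extra_a)/a) ;
    ca=18;
    cb=cb-extra_b ;
  end;
  /* Write the capped values before carrying the leftovers forward */
  output;
  if ca=18 then do;
    ca=extra_a;
    cb=extra_b;
  end;
  drop extra_a extra_b ;
run;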
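Here is a Python sketch of the same carry-forward logic (illustrative only; `split_cumsum` and the `cap` parameter are hypothetical names, and it assumes A is positive whenever the cap is hit, just as the DATA step divides by A). The spill-over share of B is written as b * extra_a / a, which is algebraically the same as b - b*((a - extra_a)/a). Running it on the question's data reproduces the CA and CB columns:

```python
def split_cumsum(rows, cap=18):
    """Cap the running sum of A at `cap`; carry the excess of A, and the
    proportional share of B, forward into the next row's running sums."""
    ca = cb = 0.0
    out = []
    for a, b in rows:
        ca += a
        cb += b
        extra_a = extra_b = 0.0
        if ca >= cap:
            extra_a = ca - cap          # part of A that spills into the next row
            extra_b = b * extra_a / a   # B's share, in the same proportion
            ca = cap
            cb -= extra_b
        out.append((a, b, ca, cb))      # "output" before resetting
        if ca == cap:
            ca, cb = extra_a, extra_b   # start the next block with the leftovers
    return out


data = [(5, 3), (7, 2), (8, 6), (6, 4), (9, 5), (8, 2), (3, 1), (4, 3)]
for row in split_cumsum(data):
    print(row)
```

All intermediate values in this example are exact binary fractions (0.5, 0.25, 0.75), so comparing them with == is safe here; with arbitrary data a tolerance would be wiser.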

How to write an algorithm that will delete observations on a specific condition?

I have a variable v1 with the following entries:
v1
1
2
4
11
13
5
6
7
How can I delete every observation whose value also appears with a 1 in front of it? In this case I want to delete 1 and 11, but not 13, because there is no corresponding 3 in v1.
describe v1

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------
v1              int     %td
Given the incomplete information, here is something that seems to do what is requested. The two assertions confirm that the variable is of type int and that every value lies between 0 and 19, i.e. it is either a single digit or a two-digit number whose first digit is 1. Note that the statement of the problem does not specify what should happen if there is, say, a 1 and two 11s: my interpretation is to delete every observation with the same final digit as long as there is a pair that qualifies for deletion.
clear
input int v1
1
2
4
11
13
5
6
7
11
end
assert "`: type v1'"=="int"
assert inrange(v1,0,19)
generate digit = mod(v1,10)
generate hasone = (v1-digit)==10
bysort digit (hasone): drop if hasone[_N] & !hasone[1]
sort v1
list, clean noobs
which yields
v1 digit hasone
2 2 0
4 4 0
5 5 0
6 6 0
7 7 0
13 3 1
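The bysort line drops a whole final-digit group when, after sorting by hasone, the last observation has hasone = 1 and the first has hasone = 0, i.e. when the group contains both a value with a leading 1 and one without. A Python sketch of that rule (the function name is hypothetical; like the second assertion, it assumes values between 0 and 19), applied to the answer's example data:

```python
from collections import defaultdict


def drop_leading_one_pairs(values):
    """Drop every value whose final-digit group contains both a value 10-19
    (a leading 1) and a value 0-9; return the survivors sorted."""
    groups = defaultdict(list)
    for v in values:
        assert 0 <= v <= 19  # same check as: assert inrange(v1,0,19)
        groups[v % 10].append(v)
    keep = []
    for v in values:
        hasone = [(x - x % 10) == 10 for x in groups[v % 10]]
        # Equivalent to: drop if hasone[_N] & !hasone[1] after sorting by hasone,
        # i.e. at least one value with a leading 1 and at least one without.
        if not (any(hasone) and not all(hasone)):
            keep.append(v)
    return sorted(keep)


print(drop_leading_one_pairs([1, 2, 4, 11, 13, 5, 6, 7, 11]))  # [2, 4, 5, 6, 7, 13]
```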

Setting cutoff period SAS

I am having a problem with a dataset that looks like the one below. It is an inventory count of different location/weeks:
data have;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 5
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 6
4 x 1 30
4 x 2 0
4 x 3 40
4 x 4 10
;
run;
Here is the issue: once the inventory hits 0 for a specific location/item combination, I want all remaining weeks for that combination to be imputed as 0. My desired output looks like this:
data want;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 0
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 0
4 x 1 30
4 x 2 0
4 x 3 0
4 x 4 0
;
run;
I'm fairly new to SAS and don't know how to do this. Help?!
Thank you!
You can do that in the following steps:
- a BY statement to indicate the order (the input dataset must be sorted accordingly)
- a RETAIN statement to pass the value of a control variable (reset) on to the following rows
- deactivate the imputation (reset=0) on the first row of every location/item combination
- activate the imputation (reset=1) when inv is zero
- set inv to 0 while the imputation is active
Code:
data want (drop=reset);
set have;
by itm location week;
retain reset;
if first.location then reset=0;
if (inv = 0) then reset=1;
else if (reset = 1) then inv=0;
run;
The value of reset remains constant from row to row until explicitly modified.
The presence of the variable week in the by statement is only to check that the input data is chronologically sorted.
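The retained flag is a small state machine: it is switched off at the start of each item/location group, switched on at the first zero, and while it is on every later inventory value is overwritten with 0. A Python sketch of that logic (illustrative; `impute_after_zero` is a hypothetical name, and the rows are assumed to be sorted by itm, location, week, as the BY statement requires), checked against the have/want data from the question:

```python
def impute_after_zero(rows):
    """rows: (itm, location, week, inv) tuples sorted by itm, location, week.
    Once inv hits 0 within an (itm, location) group, later weeks become 0."""
    out = []
    prev_key = None
    reset = False
    for itm, location, week, inv in rows:
        key = (itm, location)
        if key != prev_key:   # like first.location: a new combination starts
            reset = False
        if inv == 0:
            reset = True      # switch the imputation on
        elif reset:
            inv = 0
        out.append((itm, location, week, inv))
        prev_key = key
    return out


have = [
    (3, "x", 1, 30), (3, "x", 2, 20), (3, "x", 3, 0), (3, "x", 4, 5),
    (3, "y", 1, 100), (3, "y", 2, 90), (3, "y", 3, 0), (3, "y", 4, 6),
    (4, "x", 1, 30), (4, "x", 2, 0), (4, "x", 3, 40), (4, "x", 4, 10),
]
want = [
    (3, "x", 1, 30), (3, "x", 2, 20), (3, "x", 3, 0), (3, "x", 4, 0),
    (3, "y", 1, 100), (3, "y", 2, 90), (3, "y", 3, 0), (3, "y", 4, 0),
    (4, "x", 1, 30), (4, "x", 2, 0), (4, "x", 3, 0), (4, "x", 4, 0),
]
print(impute_after_zero(have) == want)  # True
```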
The following uses PROC SQL to produce the wanted result. I have commented inline on why I do the different steps.
proc sql;
/* First of all, fetch all observations where the inventory is depleted. */
create table work.zero_inv as
select *, min(week) as min_zero_inv_week
from work.have where inv = 0
/* If your example data set had included several zero-inventory weeks, without the following "commented" code you would get duplicates. I'll leave explaining this to you as an exercise. Just alter your have data set and see the difference. */
/*group by itm, location
having (calculated min_zero_inv_week) = week*/;
create table work.want_calculated as
/* Since we have fetched all weeks with zero inventory, we can use a left join and only update weeks that follow those with zeros, leaving the inventory untouched for the rest. */
select t1.itm, t1.location, t1.week,
/* Since we use a left join, we can check whether the second data set includes any rows at all for the specific item at the given location. */
case when t2.itm is missing or t1.week <= t2.week then t1.inv else t2.inv end as inv
from work.have as t1
left join work.zero_inv as t2
on t1.itm = t2.itm and t1.location = t2.location
/* PROC SQL does not promise to keep the order of your data set, so it is good to sort it. */
order by t1.itm, t1.location, t1.week;
quit;
proc compare base=work.want compare=work.want_calculated;
title 'Hopefully no differences';
run;
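The SQL version works in two passes instead of one: first find, per item/location, the earliest week with zero inventory (what work.zero_inv holds once the GROUP BY/HAVING is enabled), then join that back and zero every later week. A Python sketch of those two passes (illustrative; the function name is hypothetical). The second check uses a group with two zero weeks, the case the commented-out code guards against, and shows that keeping only the earliest zero week per group yields no duplicate rows:

```python
def impute_via_first_zero(rows):
    """rows: (itm, location, week, inv) tuples. Zero every week after the
    first zero-inventory week of its (itm, location) group."""
    # Pass 1: earliest zero-inventory week per (itm, location).
    first_zero = {}
    for itm, location, week, inv in rows:
        if inv == 0:
            key = (itm, location)
            first_zero[key] = min(week, first_zero.get(key, week))
    # Pass 2: "left join" back; groups with no zero week are left untouched.
    out = []
    for itm, location, week, inv in rows:
        zero_week = first_zero.get((itm, location))
        if zero_week is None or week <= zero_week:
            out.append((itm, location, week, inv))
        else:
            out.append((itm, location, week, 0))
    return sorted(out)  # like ORDER BY itm, location, week


have = [
    (3, "x", 1, 30), (3, "x", 2, 20), (3, "x", 3, 0), (3, "x", 4, 5),
    (3, "y", 1, 100), (3, "y", 2, 90), (3, "y", 3, 0), (3, "y", 4, 6),
    (4, "x", 1, 30), (4, "x", 2, 0), (4, "x", 3, 40), (4, "x", 4, 10),
]
print(impute_via_first_zero(have))

two_zeros = [(1, "x", 1, 5), (1, "x", 2, 0), (1, "x", 3, 0), (1, "x", 4, 7)]
print(impute_via_first_zero(two_zeros))  # 4 rows, week 4 set to 0
```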
Remember that Stack Overflow isn't a "give me the code" forum; it is customary to try some solutions yourself first. I'll cut you some slack since this is your first question. Welcome to SO :D

How to select the 5 minimum values with SAS Proc IML?

I would like to know if it's possible to select the 5 minimum or maximum values in each row with IML.
This is my code:
Proc iml ;
use table;
read all var {&varlist} into matrix ;
n=nrow(matrix) ; /* n=369 here*/
p=ncol(matrix); /* p=38 here*/
test=J(n,5,.) ;
Do i=1 to n ;
test[i,1]=MIN(matrix[i,]);
End;
Quit ;
So I would like to obtain a matrix test whose first column contains the minimum value of each row, whose second column contains the minimum of that row excluding the first value, and so on.
If you have any ideas, please share! :)
Even a solution without IML (Base SAS, SQL, ...) would help.
So, for example:
Data test; input x1-x8 ; cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;run;
And I would like to obtain the results sorted by row:
1 2 3 4 6 7 8 9
-2 1 2 3 4 7 9 12
in order to select my 5 minimum values in another table :
y1 y2 y3 y4 y5
1 2 3 4 6
-2 1 2 3 4
Read the article "Compute the kth smallest data value in SAS"
Define the modules as in the article. Then use the following:
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
x = have`; /* transpose */
ord = j(5,ncol(x));
do j = 1 to ncol(x);
ord[,j] = ordinal(1:5, x[,j]);
end;
print ord;
If you have missing values in your data and want to exclude them, use the SMALLEST module instead of the ORDINAL module.
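The difference between the two modules, as the answer describes it, comes down to whether missing values are removed before ranking. A Python sketch of the "drop missing values, then take the k smallest" behaviour (illustrative; `k_smallest` is a hypothetical name, NaN is assumed to stand in for a SAS missing value, and heapq.nsmallest is a standard-library substitute, not the article's modules). Note that a row with fewer than k non-missing values simply returns a shorter list here:

```python
import heapq
import math


def k_smallest(row, k=5):
    """Return the k smallest non-missing values of row, in ascending order."""
    values = [v for v in row if not math.isnan(v)]  # exclude missing values first
    return heapq.nsmallest(k, values)


have = [[1, 9, 8, 7, 3, 4, 2, 6],
        [9, 3, 2, 1, 4, 7, 12, -2]]
for row in have:
    print(k_smallest(row))
# [1, 2, 3, 4, 6]
# [-2, 1, 2, 3, 4]
```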
You can use call sort() in PROC IML to sort a column. Because you want to sort each column separately rather than sort the whole matrix by one column, extract the column, sort it, and then write it back into the original.
You want to sort rows, so transpose your matrix, do the sorting, and then transpose back.
proc iml;
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
print have;
n = nrow(have);
have = have`; /*Transpose because sort works on columns*/
do i=1 to n;
tmp = have[,i];
call sort(tmp,1);
have[,i]=tmp;
end;
have = have`;
want = have[,1:5];
print want;
quit;
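The same three steps (transpose, sort each column on its own, transpose back, then keep the first five columns) can be traced in a short Python sketch (illustrative; `first_k_after_row_sort` is a hypothetical name). In Python each row could be sorted directly, but the sketch keeps the transposes so each step lines up with the IML code:

```python
def first_k_after_row_sort(have, k=5):
    """Sort each row via transpose / per-column sort / transpose back,
    then keep the first k columns (the analogue of want = have[,1:5])."""
    t = [list(col) for col in zip(*have)]        # transpose: rows become columns
    for j in range(len(have)):                    # sort column j of t on its own
        sorted_col = sorted(row[j] for row in t)
        for i, value in enumerate(sorted_col):
            t[i][j] = value
    back = [list(row) for row in zip(*t)]        # transpose back
    return [row[:k] for row in back]


have = [[1, 9, 8, 7, 3, 4, 2, 6],
        [9, 3, 2, 1, 4, 7, 12, -2]]
print(first_k_after_row_sort(have))  # [[1, 2, 3, 4, 6], [-2, 1, 2, 3, 4]]
```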