SAS: maintaining the last non-null character value until it changes

In the 3rd and 4th observations the value of status is missing; I need those observations to take the value from the 2nd observation. This needs to happen throughout the data set, within each id.
data z;
input id $ d status $;
cards;
11111 01 a
11111 02 a
11111 03 .
11111 04 .
11111 05 p
11111 06 .
11111 07 .
11111 08 .
11111 09 a
11111 10 .
11111 11 .
11111 12 .
11111 13 .
11111 14 .
11111 15 .
11111 16 .
11112 01 p
11112 02 .
11112 03 .
11112 04 .
11112 05 p
11112 06 .
11112 07 .
11112 08 .
11112 09 .
11112 10 a
;
run;

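This DATA step should do the trick (the data must be sorted by id, as it is here):
data want;
set z;
by id;                /* requires z sorted by id */
length lastStatus $1;
retain lastStatus;    /* keep the value across observations */
if first.id then lastStatus = status;              /* restart for each id */
else lastStatus = coalescec(status,lastStatus);    /* first non-missing of the two */
drop status;
rename lastStatus = status;
run;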
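For readers outside SAS, the same carry-forward logic can be sketched in C++. This is only an illustration under assumptions: the `Row` struct and `fillForward` function are hypothetical, an empty string stands in for a missing status, and rows are assumed to be ordered by id, just as the BY statement requires.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row type: group key plus a status where "" means missing.
struct Row {
    std::string id;
    std::string status;
};

// Carry the last non-missing status forward within each id.
// Assumes rows are already ordered by id (what BY id requires in SAS).
std::vector<Row> fillForward(std::vector<Row> rows) {
    std::string last;    // like the retained lastStatus
    std::string prevId;
    bool first = true;
    for (Row& r : rows) {
        const bool firstOfGroup = first || r.id != prevId;  // like first.id
        if (firstOfGroup || !r.status.empty()) {
            last = r.status;  // restart at a new id, or take the new value
        }
        r.status = last;      // missing statuses get the carried value
        prevId = r.id;
        first = false;
    }
    return rows;
}
```

Resetting `last` at the start of each id is what keeps one id's last status from leaking into the next one, the role `first.id` plays in the DATA step.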


Turn rows into variables in a dataset

I have a data set as follows:
data club;
input Name $ Gov_Type $ YR1 YR2 YR3;
datalines;
Afg COC 10 20 30
Afg GE 20 30 40
Afg PS 10 3 202
Afg RQ . 30 10
Brh COC 10 . 30
Brh GE 4 12 33
Brh PS 12 43 12
Brh RQ 19 3 12
Gen COC 32 . 65
Gen GE 13 93 44
Gen PS 12 38 12
Gen RQ 13 1 13
;
I want to restructure it so that COC, GE, PS and RQ become variables holding the values of YR1, YR2 and YR3, as in the following data set:
data club2;
input Name $ YR $ COC GE PS RQ;
datalines;
Afg YR1 10 20 10 .
Afg YR2 20 30 3 30
Afg YR3 30 40 202 10
Brh YR1 10 4 12 19
Brh YR2 . 12 43 3
Brh YR3 30 33 12 12
Gen YR1 32 13 12 13
Gen YR2 . 93 38 1
Gen YR3 65 44 12 13
;
How can I do this?
Thanks in advance.
The desired transformation is easily accomplished with PROC TRANSPOSE:
proc transpose data=club out=stage(rename=_name_=YR);
by name;
id Gov_type;
run;
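Conceptually, PROC TRANSPOSE with BY and ID transposes a small matrix for each name: Gov_Type rows by year columns become year rows by Gov_Type columns. A minimal C++ sketch of that reshaping, assuming hypothetical `InRow`/`OutRow` types, NaN as the missing value, and that every name lists its Gov_Type values in the same order (unlike the ID statement, it does not match columns by name):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// NaN stands in for a SAS missing value (hypothetical convention).
const double kMissing = std::numeric_limits<double>::quiet_NaN();
inline bool isMissing(double v) { return std::isnan(v); }

// One input row: a (Name, Gov_Type) pair with its YR1..YRn values.
struct InRow {
    std::string name;
    std::string govType;
    std::vector<double> yr;
};

// One output row: a (Name, YRk) pair with one value per Gov_Type.
struct OutRow {
    std::string name;
    std::string yr;              // "YR1", "YR2", ... (the renamed _NAME_)
    std::vector<double> byType;  // columns in the order rows appear in the group
};

// Transpose each BY group, i.e. each run of consecutive rows with the same name.
// Assumes rows are grouped by name and every row in a group has the same
// number of year values.
std::vector<OutRow> transposeByName(const std::vector<InRow>& rows) {
    std::vector<OutRow> out;
    std::size_t start = 0;
    while (start < rows.size()) {
        std::size_t end = start;
        while (end < rows.size() && rows[end].name == rows[start].name) ++end;
        const std::size_t nYears = rows[start].yr.size();
        for (std::size_t k = 0; k < nYears; ++k) {
            OutRow o{rows[start].name, "YR" + std::to_string(k + 1), {}};
            for (std::size_t r = start; r < end; ++r) {
                assert(rows[r].yr.size() == nYears);
                o.byType.push_back(rows[r].yr[k]);
            }
            out.push_back(o);
        }
        start = end;
    }
    return out;
}
```

Each group of four input rows with three year values becomes three output rows with four values, which is the shape of club2.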
If the transformation is for reporting purposes, consider using PROC TABULATE:
proc tabulate data=club;
class name gov_type;
var yr1-yr3;
table name=''*(yr1-yr3)*sum=' '*f=9.,gov_type;
run;
Alternatively, try this DATA step approach:
data club;
input Name $ Gov_Type $ YR1 YR2 YR3;
datalines;
Afg COC 10 20 30
Afg GE 20 30 40
Afg PS 10 3 202
Afg RQ . 30 10
Brh COC 10 . 30
Brh GE 4 12 33
Brh PS 12 43 12
Brh RQ 19 3 12
Gen COC 32 . 65
Gen GE 13 93 44
Gen PS 12 38 12
Gen RQ 13 1 13
;
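/* Wide to long: one row per Name, Gov_Type and year variable */
data temp;
set club;
array y yr:;       /* implicit array over YR1-YR3 */
do over y;
yr = y;            /* the value */
v = vname(y);      /* the year variable it came from */
output;
end;
drop yr1-yr3;
run;
proc sort data = temp;
by Name v Gov_Type;
run;
/* Long to wide: read one Name/v group per iteration (DOW loop) */
data club2;
do i = 1 by 1 until (last.v);
set temp;
by Name v;
array g{*} coc ge ps rq;
/* assumes each group holds the four Gov_Types in COC, GE, PS, RQ order */
if upcase(Gov_Type) = upcase(vname(g[i])) then g[i] = yr;
end;
drop i yr Gov_Type;
rename v = YR;     /* match the column name in the desired output */
run;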

Identify most frequent number across variables and largest number of the most frequent (SAS)

I am working on an assignment where I need to identify the most frequent number across a range of variables. If there is a tie between two numbers, I also need SAS to return the higher of the two most frequent numbers.
Using this answer (https://communities.sas.com/t5/General-SAS-Programming/Find-most-frequent-response-across-multiple-variables/td-p/269774), I know how to identify the most frequent number when there is no tie. I now only need SAS to return the highest number if there is a tie. I think the problem is in the last line before the RUN statement.
data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
run;
data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=whichn(Count, of _t[*]);
run;
WHICHN returns the index of only the first (left-to-right) match, so when the mode is tied you get the lowest tied value, not the highest.
Instead, you can compute the highest mode and its count while updating the frequency bins.
data have;
input id (x1-x5) ($CHAR2. +1);
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
data want;
set have;
label
hmode_n = 'Mode (count)'
hmode = 'Mode (highest)'
;
array x x1-x5;
array bins[00:99] _temporary_; * freq table for two digit numbers;
do index = 1 to dim(x);
if missing(x[index]) then continue;
value = input(x[index],2.);
bins[value] + 1;
if bins[value] > hmode_n then do;
hmode_n = bins[value];
hmode = value;
end;
else
if bins[value] = hmode_n and value > hmode then do;
hmode = value;
end;
end;
call missing(of bins(*));
drop index value;
run;
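The update rule in the DATA step above works because counts only ever grow: whenever a value's count exceeds the current leader it takes over, and when it merely ties, the larger value wins. A hypothetical C++ sketch of the same rule, assuming the values have already been converted to integers in 0..99 and missing values skipped:

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <vector>

struct Mode {
    int value;  // highest value among those sharing the top count
    int count;  // how often that value occurs
};

// Highest mode of values in 0..99, updated each time a frequency bin changes,
// mirroring the bins[00:99] DATA step. Returns std::nullopt for empty input.
std::optional<Mode> highestMode(const std::vector<int>& values) {
    std::array<int, 100> bins{};  // zero-initialised frequency table
    std::optional<Mode> best;
    for (int v : values) {
        assert(v >= 0 && v < 100);
        const int n = ++bins[v];  // like bins[value] + 1
        if (!best || n > best->count) {
            best = Mode{v, n};    // strictly more frequent: new leader
        } else if (n == best->count && v > best->value) {
            best->value = v;      // tie on count: keep the larger value
        }
    }
    return best;
}
```

For the fourth sample row (02 01 02 01) this yields 2 with a count of 2, and for the fifth (01 02 03 04) it yields 4 with a count of 1.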
First, adjust your input dataset so that all values are numeric rather than character:
data have;
input id 1 x1 x2 x3 x4 x5;
datalines;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 . 03 .
4 02 01 02 01 .
5 01 02 03 04 .
;
run;
Next, transpose the data by id. This will make it easier to work with. Once it is in a long format, you can more easily feed the data into procs to handle the calculations for you.
proc transpose data=have
out=have2(rename=(col1 = value))
name=var;
by id;
var x1-x5;
run;
PROC RANK can then rank the values within each id (TIES=HIGH gives tied values the highest of their ranks):
proc rank data=have2
out=want
ties=high
;
by id;
var value;
ranks rank;
run;
proc sort data=want;
by id rank;
run;
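PROC UNIVARIATE is another option for getting the statistics of interest, including the mode for each id (note that when several modes are tied, it reports the smallest one):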
proc univariate data=have2;
by id;
id var;
var value;
run;
I have to say, the code you posted is an excellent example of WHICHN usage, and using the _t array is a really nice idea!
For the question itself, here is my answer.
data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
run;
data want;
set have;
array x x1-x5;
array y y1-y5;
do i = 1 to dim(x);
y[i] = count(catx('#',of x[*]),cats(x[i]));
end;
count = max(of y[*]);
do i = 1 to dim(x);
if y[i] = count then highest = highest <> input(x[i],best.);
end;
drop y: i;
run;
The assignment of y[i] assumes that x1 to x5 are between 0 and 9, so that no value can occur inside another (as 1 does inside 11). If not, the match needs to be stricter:
y[i] = count('#'||catx('#',of x[*]),catx('#','',x[i]));
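The idea behind a stricter match is to wrap both the joined list and the value being counted in delimiters, so that searching for 1 cannot hit the 1 inside 11. A small C++ sketch of that delimiter trick, with hypothetical function names; it illustrates the technique and is not a transcription of the SAS COUNT/CATX call above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive count: non-overlapping occurrences of `sub` anywhere in `hay`.
int countSubstring(const std::string& hay, const std::string& sub) {
    assert(!sub.empty());  // an empty pattern would never advance
    int n = 0;
    for (std::size_t pos = hay.find(sub); pos != std::string::npos;
         pos = hay.find(sub, pos + sub.size())) ++n;
    return n;
}

// Whole-item count: search "#tok#" inside "#a#b#c#". The search restarts one
// character later, so adjacent matches can share a delimiter ("#1#1#").
int countToken(const std::vector<std::string>& items, const std::string& token) {
    assert(!token.empty());
    std::string joined = "#";
    for (const auto& s : items) joined += s + "#";
    const std::string needle = "#" + token + "#";
    int n = 0;
    for (std::size_t pos = joined.find(needle); pos != std::string::npos;
         pos = joined.find(needle, pos + 1)) ++n;
    return n;
}
```

With the items 1, 11 and 1, the naive count of "1" in "1#11#1" is 4, while the whole-item count is 2.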

Calculating moving averages in SAS

I am relatively new to SAS and need to calculate a moving average based on a variable.
I've made some example code to explain:
DATA testData;
input shop year sales;
datalines;
01 01 20000
01 02 23500
01 03 21020
02 01 23664
02 02 15420
02 03 14200
03 01 25623
03 02 12500
03 03 20030
;
run;
DATA average;
retain y 0;
set testData;
y = y + sales;
avg = y/_n_;
run;
This gives me the average over all my sales. What I want is the average per shop: for each year, the average of that shop's sales up to and including that year, starting again for the next shop. Hopefully this makes sense. I don't want any of shop 1's years to affect the average for shop 2.
What you need to do is reset your running total every time you start a new shop. You also need your own record counter, because _N_ keeps counting across all shops. Here is the improved code:
DATA testData;
input shop year sales;
datalines;
01 01 20000
01 02 23500
01 03 21020
02 01 23664
02 02 15420
02 03 14200
03 01 25623
03 02 12500
03 03 20030
;
run;
PROC SORT DATA=WORK.TESTDATA
OUT=Sorted;
BY shop year;
RUN;
DATA average (drop=n);
set Sorted;
by shop;
if first.shop then
do;
y = 0;
n = 0;
end;
n + 1;
y + sales;
avg = y/n;
run;
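Also, notice that the RETAIN statement is not necessary if you write the accumulation as a sum statement, "y + sales", instead of "y = y + sales": a sum statement retains its variable automatically and initializes it to 0.
For more information about BY-group processing, see this SAS Support doc.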
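The same reset-on-new-group running mean, sketched in C++ for illustration. The `Sale` struct and `runningAvgByShop` function are hypothetical; rows are assumed to be sorted by shop and year, as after the PROC SORT step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row type mirroring the testData columns.
struct Sale {
    int shop;
    int year;
    double sales;
};

// Cumulative mean of sales within each shop, restarting when the shop changes.
std::vector<double> runningAvgByShop(const std::vector<Sale>& rows) {
    std::vector<double> avg;
    double total = 0;  // like y
    int n = 0;         // like the record counter n
    for (std::size_t i = 0; i < rows.size(); ++i) {
        assert(i == 0 || rows[i - 1].shop <= rows[i].shop);  // sorted by shop
        if (i == 0 || rows[i].shop != rows[i - 1].shop) {    // like first.shop
            total = 0;
            n = 0;
        }
        ++n;
        total += rows[i].sales;
        avg.push_back(total / n);
    }
    return avg;
}
```

For the sample data, shop 1 gives 20000, 21750 and about 21506.67, and the total and counter restart so that shop 2 begins at 23664.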

Loading array from file

I'm trying to load a file of integers into a 2D array, iterate through the array, and add tiles to my level based on the integer (tile ID) at the current index. My problem seems to be that the array is loaded or iterated through in the wrong order. This is the file I'm loading from:
test.txt
02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
This is the level constructor:
Level::Level(std::string levelpath, int _width, int _height)
{
std::ifstream levelfile(levelpath);
width = _width;
height = _height;
int ids[15][9];
while (levelfile.is_open()) {
std::copy_n(std::istream_iterator<int>(levelfile), width * height, &ids[0][0]);
for (int y = 0; y < height; ++y) {
for (int x = 0; x < width; ++x) {
tiles.push_back(getTile(ids[x][y], sf::Vector2f(x * Tile::SIZE, y * Tile::SIZE)));
std::cout << ids[x][y] << " ";
}
std::cout << std::endl;
}
levelfile.close();
}
}
And this is how I create the level:
level = std::unique_ptr<Level>(new Level("data/maps/test.txt", 15, 9));
Here's the output in the console:
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
As you can see, the contents are the same as in test.txt, but in the wrong order.
The reason is that you swapped the dimensions of the array. Instead of
int ids[15][9];
...which is 15 lines of 9 elements, you want
int ids[9][15];
...which is 9 lines of 15 elements. The order of the extents in the declaration is the same as the order of indices in access.
EDIT: ...which you also swapped. Instead of
ids[x][y]
you need
ids[y][x]
That does rather better explain the output you get, come to think of it. 2D arrays in C++ are stored row-major, meaning that the innermost arrays (the ones stored contiguously) are the ones selected by the rightmost index. Put another way, ids[y][x] is stored directly before ids[y][x + 1], whereas ids[y][x] and ids[y + 1][x] are a whole row apart.
If you read data into a row-major array like you do with std::copy_n and interpret it as a column-major array, you get the transpose (a bit warped because the dimensions differ, but recognizably so; if you swapped height and width, you'd see the real transpose).
int ids[9][15]; // 9 rows (height) of 15 values (width)
while (levelfile.is_open()) {
std::copy_n(std::istream_iterator<int>(levelfile), width * height, &ids[0][0]);
for (int y = 0; y < height; ++y) {
for (int x = 0; x < width; ++x) {
tiles.push_back(getTile(ids[y][x], sf::Vector2f(x * Tile::SIZE, y * Tile::SIZE)));
std::cout << ids[y][x] << " ";
}
std::cout << std::endl;
}
levelfile.close();
}
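To see where the question's output comes from, a short C++ sketch can compute which value each printed position reads. Values are copied in file order into a row-major array, so element [r][c] of an array with `cols` columns sits at flat offset r * cols + c. The helper names are hypothetical, and a flat vector stands in for the array filled by std::copy_n.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Flat offset of element [row][col] in a row-major array with `cols` columns;
// std::copy_n through &ids[0][0] stores the i-th value read at offset i.
int rowMajorOffset(int row, int col, int cols) { return row * cols + col; }

// Build printed line y of the level, given the values in file order.
// swapped == false: int ids[height][width], printed as ids[y][x] (the fix).
// swapped == true:  int ids[width][height], printed as ids[x][y] (the question).
std::string renderLine(const std::vector<int>& flat, int width, int height,
                       int y, bool swapped) {
    assert(y >= 0 && y < height);
    std::string line;
    for (int x = 0; x < width; ++x) {
        const int off = swapped ? rowMajorOffset(x, y, height)  // rows of `height`
                                : rowMajorOffset(y, x, width);  // rows of `width`
        line += std::to_string(flat[off]);
        if (x + 1 < width) line += ' ';
    }
    return line;
}
```

With the first 15 values set to 2 and the rest to 1, the swapped version reproduces the question's output: lines 0 to 5 start with `2 2`, lines 6 to 8 with a single `2`, while the fixed version prints a full line of 2s followed by lines of 1s.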
If you look, you can see that you print the first 15 values (which belong on the first line) down the first column, and whatever doesn't fit continues in the second. The array is filled one row at a time, but your file lists the map line by line, so the map is loaded "on its side". Swap the two dimensions (15 and 9) and the map will load correctly.
Then just print each row, with endl before the next row, so that each row appears as one line, and the output will look right.
Hope that is clear enough.

SAS PROC GLM in a loop

I tried to run PROC GLM in a loop because I have many models (different combinations of dependent and independent variables), and running them one by one is very time consuming. But the log shows an error that only one MODEL statement is allowed in PROC GLM. Is there a solution for this?
My code is below:
data old;
input year A1 A2 A3 A4 B C D;
datalines;
2000 22 22 30 37 4 13 14
2000 37 29 31 38 6 16 12
2000 42 29 34 37 3 15 15
2000 28 28 27 35 10 13 15
2000 33 22 37 40 9 12 15
2000 22 29 26 40 3 11 15
2000 37 20 32 40 6 12 13
2001 44 22 33 35 7 20 12
2001 33 20 26 40 6 13 15
2001 32 30 37 35 1 12 13
2001 44 25 31 39 4 20 14
2001 25 30 37 38 4 20 10
2001 43 21 35 38 6 11 10
2001 25 23 34 37 5 17 11
2001 45 30 35 37 1 13 14
2001 48 24 36 39 2 13 15
2001 25 24 35 40 9 16 11
2002 38 26 33 35 2 14 10
2002 29 24 35 36 1 16 13
2002 34 28 32 35 9 16 11
2002 45 26 29 35 9 19 10
2002 26 22 38 35 1 14 12
2002 20 26 26 39 8 17 10
2002 33 20 35 37 9 18 12
;
run;
%macro regression (in, YLIST,XLIST);
%local NYLIST;
%let NYLIST=%sysfunc(countw(&YLIST));
ods tagsets.ExcelXP path='D:\REG' file='Regression.xls';
Proc GLM data=&in; class year;
%do i=1 %to &NYLIST;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
%end;
%do i=2 %to &NYLIST;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
%end;
run;
ods tagsets.excelxp close;
%mend regression;
options mprint;
%regression
(in=old
,YLIST= A1 A2 A3 A4
,XLIST= B C D);
/*potential solutions*/
%macro regression(data,y,x1,x2,x3);
proc glm data=&data;
class year;
model &y=&x1 &x2 &x3/solution;
run;
%mend regression;
%macro sql (mydataset,y,x1,x2,x3);
proc sql noprint;
select cats('%regression(&mydataset,',&y,',',&x1,',',&x2,',',&x3,')')
into :calllist separated by ' ' from &mydataset;
quit;
&calllist.;
%mend sql;
%sql
(mydataset=old
,y=A1
,X1=B
,X2=C
,X3=D);
You should do this in two steps. One is a macro that contains one instance of PROC GLM:
%macro regression(data,y,x1,x2,x3);
proc glm data=&data;
class year;
model &y = &x1 &x2 &x3 / solution;
run;
%mend regression;
Then call that macro from something else: either a macro with the looping elements or, better, a data set that has your y/x1/x2/x3 as columns (one row per MODEL statement), using CALL EXECUTE or PROC SQL SELECT INTO. For example, with a data set modeldata containing the y/x values for your models:
proc sql noprint;
select cats('%regression(mydataset,',y,',',x1,',',x2,',',x3,')') into :calllist separated by ' ' from modeldata;
quit;
&calllist.;
Or similar.
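The data-driven approach is just text generation: each row of the model table becomes one macro call, and SEPARATED BY ' ' joins them. A C++ sketch of the string the SELECT INTO step builds, assuming a hypothetical `ModelSpec` row type and using the `old` data set and variables from the question as sample input:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One model specification: dependent variable and three regressors,
// i.e. one row of the (hypothetical) modeldata table.
struct ModelSpec {
    std::string y, x1, x2, x3;
};

// Build one %regression(...) call per row, separated by a single space,
// the same text the PROC SQL step stores in &calllist.
std::string buildCallList(const std::string& data,
                          const std::vector<ModelSpec>& specs) {
    std::string out;
    for (const auto& s : specs) {
        if (!out.empty()) out += ' ';
        out += "%regression(" + data + "," + s.y + "," + s.x1 + "," + s.x2 +
               "," + s.x3 + ")";
    }
    return out;
}
```

Because each call runs its own PROC GLM with a single MODEL statement, the one-MODEL-per-procedure restriction no longer matters.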