I am relatively new to SAS and need to calculate a moving average based on a variable.
I've made some example code to explain:
DATA testData;
input shop year sales;
datalines;
01 01 20000
01 02 23500
01 03 21020
02 01 23664
02 02 15420
02 03 14200
03 01 25623
03 02 12500
03 03 20030
;
run;
DATA average;
retain y 0;
set testData;
y = y + sales;
avg = y/_n_;
run;
This gives me the average over all my sales. What I want is the average per shop only: based on the latest year, then on all years of that shop so far, and then to start again for the next shop. Hopefully this makes some kind of sense. I don't want the moving average of any of shop 1's years to affect the average for shop 2.
What you need to do is reset your running total every time you start counting a new shop. You also need your own record counter, because _N_ counts observations across the whole data set, not within a shop. Here is the improved code:
DATA testData;
input shop year sales;
datalines;
01 01 20000
01 02 23500
01 03 21020
02 01 23664
02 02 15420
02 03 14200
03 01 25623
03 02 12500
03 03 20030
;
run;
PROC SORT DATA=WORK.TESTDATA
OUT=Sorted;
BY shop year;
RUN;
DATA average (drop=n);
set Sorted;
by shop;
if first.shop then
do;
y = 0;
n = 0;
end;
n + 1;
y + sales;
avg = y/n;
run;
Also, notice that the RETAIN statement is not necessary if you write a sum statement such as "y + sales" instead of "y = y + sales": a sum statement automatically retains its variable and initializes it to 0.
For more information about BY-group processing, see the SAS Support documentation.
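For readers outside SAS, the same reset-per-group logic can be sketched in C++. This is a minimal illustration, not the answerer's code: the names Sale, AvgRow and runningAverageByShop are hypothetical, and the input is assumed to be sorted by shop and year, as PROC SORT guarantees above. The test `data[i].shop != data[i - 1].shop` plays the role of FIRST.shop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sale { int shop; int year; double sales; };
struct AvgRow { int shop; int year; double avg; };

// Cumulative average of sales within each shop, restarting at every new shop.
// Assumes the input is sorted by shop and year (what PROC SORT does above).
std::vector<AvgRow> runningAverageByShop(const std::vector<Sale>& data)
{
    std::vector<AvgRow> out;
    double y = 0.0;  // running total for the current shop (SAS: y + sales)
    int n = 0;       // record counter for the current shop (SAS: n + 1)
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Equivalent of FIRST.shop: first record overall, or the shop changed.
        bool firstShop = (i == 0) || data[i].shop != data[i - 1].shop;
        if (firstShop) {
            // Like BY processing, require ascending shop order.
            assert(i == 0 || data[i].shop > data[i - 1].shop);
            y = 0.0;
            n = 0;
        }
        n += 1;
        y += data[i].sales;
        out.push_back({data[i].shop, data[i].year, y / n});
    }
    return out;
}
```

With the sample data, shop 2's first average is 23664 on its own, not a value mixed with shop 1's sales, which is the reset the question asks for.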
I am working on an assignment where I need to identify the most frequent number across a range of variables. If there is a tie between two numbers, I also need SAS to return the higher of the two most frequent numbers.
Using this answer (https://communities.sas.com/t5/General-SAS-Programming/Find-most-frequent-response-across-multiple-variables/td-p/269774), I know how to identify the most frequent number if there isn't a tie between two numbers. I now only need SAS to return the highest number if there is a tie. I think the problem lies in the last line before the RUN statement.
data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
run;
data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=whichn(Count, of _t[*]);
run;
WhichN will return the index of only the first (left to right) occurrence, so when there is a MODE tie you will not get the highest.
You can compute the highest mode and the count of that mode while you update the frequency bins.
data have;
input id (x1-x5) ($CHAR2. +1);
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
data want;
set have;
label
hmode_n = 'Mode (count)'
hmode = 'Mode (highest)'
;
array x x1-x5;
array bins[00:99] _temporary_; * freq table for two digit numbers;
do index = 1 to dim(x);
if missing(x[index]) then continue;
value = input(x[index],2.);
bins[value] + 1;
if bins[value] > hmode_n then do;
hmode_n = bins[value];
hmode = value;
end;
else
if bins[value] = hmode_n and value > hmode then do;
hmode = value;
end;
end;
call missing(of bins(*));
drop index value;
run;
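The same "update the mode while counting" idea, sketched in C++ under stated assumptions: values are integers from 0 to 99 (matching bins[00:99]), and a missing value is modelled as std::nullopt. The names Mode and highestMode are hypothetical. Ties are resolved toward the larger value, as in the hmode logic above.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <vector>

struct Mode { int value; int count; };

// Highest mode of two-digit values, tracked while the frequency bins are
// updated (mirrors hmode / hmode_n). Returns nullopt if every value is missing.
std::optional<Mode> highestMode(const std::vector<std::optional<int>>& xs)
{
    std::array<int, 100> bins{};  // frequency table for 0..99, all zero
    std::optional<Mode> best;
    for (const auto& x : xs) {
        if (!x) continue;  // skip missing values
        int value = *x;
        assert(value >= 0 && value <= 99);
        int c = ++bins[value];
        // New higher count, or same count with a larger value: take it.
        if (!best || c > best->count || (c == best->count && value > best->value)) {
            best = Mode{value, c};
        }
    }
    return best;
}
```

For id 4 (02 01 02 01) both values occur twice, and the sketch returns 2, the higher of the two.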
First, adjust your input dataset so that all values are numeric rather than character:
data have;
input id 1 x1 x2 x3 x4 x5;
datalines;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 . 03 .
4 02 01 02 01 .
5 01 02 03 04 .
;
run;
Next, transpose the data by id. This will make it easier to work with. Once it is in a long format, you can more easily feed the data into procs to handle the calculations for you.
proc transpose data=have
out=have2(rename=(col1 = value))
name=var;
by id;
var x1-x5;
run;
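To make the wide-to-long reshape concrete, here is a rough C++ analogue of what PROC TRANSPOSE does with BY id and VAR x1-x5: each variable of each record becomes its own row holding the id, the variable name and the value. The types WideRow and LongRow and the function wideToLong are hypothetical; missing values are modelled as std::nullopt and, in this sketch, kept as rows.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// One wide record: an id plus the values of x1..x5 (nullopt = missing).
struct WideRow { int id; std::vector<std::optional<int>> x; };
// One long record, like a row of have2: id, source variable name, value.
struct LongRow { int id; std::string var; std::optional<int> value; };

// Wide-to-long reshape: every variable of every record becomes its own row.
std::vector<LongRow> wideToLong(const std::vector<WideRow>& wide)
{
    std::vector<LongRow> out;
    for (const WideRow& w : wide) {
        for (std::size_t k = 0; k < w.x.size(); ++k) {
            out.push_back({w.id, "x" + std::to_string(k + 1), w.x[k]});
        }
    }
    return out;
}
```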
PROC RANK can then get you what you need:
proc rank data=have2
out=want
ties=high
;
by id;
var value;
ranks rank;
run;
proc sort data=want;
by id rank;
run;
PROC UNIVARIATE is also an option for getting your statistics of interest:
proc univariate data=have2;
by id;
id var;
var value;
run;
I have to say, the code you gave is an excellent example of WHICHN usage, and I also want to praise the use of the temporary array _t: a really nice idea!
For the question itself, here is my answer.
data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
run;
data want;
set have;
array x x1-x5;
array y y1-y5;
do i = 1 to dim(x);
y[i] = count(catx('#',of x[*]),cats(x[i]));
end;
count = max(of y[*]);
do i = 1 to dim(x);
if y[i] = count then highest = highest <> input(x[i],best.);
end;
drop y: i;
run;
The assignment of y[i] assumes that x1 to x5 are between 0 and 9. If not, the match needs to be more restrictive:
y[i] = count('#'||catx('#',of x[*]),catx('#','',x[i]));
In my program, I am trying to find the lowest value in a row of a matrix, then find the lowest value in the next row, namely the row corresponding to the column where it was found.
I wrote a function that does most of that work; however, I am confused about the algorithm:
Declare and define a function that computes the index where the lowest (non-zero) values are found in a given row.
It takes three parameters: the 2D array declared in the main function that holds the matrix, the 1D array that contains the list of all rows that were visited, and an integer that represents a row in the 2D array.
For each of the columns in a row, compute the lowest non-zero value only if that row wasn’t visited before.
I am lost on how to move on to the row corresponding to the column with the lowest value.
int const SIZE = 10;
int lowest_level(int array_2D[][SIZE], int path[], int row /* current row*/)
{
for (int i = 0; i < SIZE; ++i)
{
int minValue = array_2D[i][0]; // sets min value every row
for(int j = 0; j < SIZE; ++j)
{
if (path_checker(path, row) == false) // if row was not visited
{
if ((array_2D[i][j] < minValue) && array_2D[i][j] != 0) // if value is less than min & value is not 0
{
minValue = array_2D[i][j];
//cout << minValue << " "; // for testing; crashes
}
else
{
break;
}
}
}
}
//return (minValue);
}
I expect something like this:
A B C D E F G H I J
-----------------------------------------------
A | 00 08 15 01 10 05 19 19 03 05
B | 06 00 02 08 02 12 16 03 08 17
C | 12 05 00 14 13 03 02 17 19 16
D | 08 07 12 00 10 13 08 20 16 15
E | 04 12 03 14 00 05 02 12 14 09
F | 08 05 03 18 18 00 04 02 10 19
G | 17 16 11 03 09 07 00 03 05 09
H | 07 06 11 10 11 11 07 00 14 09
I | 10 04 05 15 17 01 07 17 00 09
J | 05 20 07 04 18 19 19 03 10 00
Path (by the rows it takes):
A --> D --> B --> C --> G --> H --> J --> I --> F --> E
Lowest values along the way (each value's column gives the next row; e.g. 1 is at coordinates (A, D)):
1 7 2 2 3 9 10 1 18
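One way to read the expected output is as a greedy walk: from the current row, take the lowest non-zero value among columns whose rows have not been visited yet, then continue from the row named by that column. Below is a minimal C++ sketch under that assumption. lowestUnvisited, greedyWalk and Walk are hypothetical names, a std::vector<bool> visited replaces the path array from the question, and ties go to the leftmost column (in row B, columns C and E both hold 02, and the expected path takes C).

```cpp
#include <cassert>
#include <vector>

constexpr int SIZE = 10;

// Column of the lowest non-zero value in `row`, skipping columns whose rows
// are already on the path. Returns -1 if no candidate is left.
int lowestUnvisited(const int m[][SIZE], const std::vector<bool>& visited, int row)
{
    int best = -1;
    for (int j = 0; j < SIZE; ++j) {
        if (visited[j] || m[row][j] == 0) continue;
        if (best == -1 || m[row][j] < m[row][best]) best = j;  // leftmost wins ties
    }
    return best;
}

struct Walk {
    std::vector<int> rows;    // rows in the order they are visited
    std::vector<int> values;  // value taken when leaving each row
};

Walk greedyWalk(const int m[][SIZE], int start)
{
    assert(start >= 0 && start < SIZE);
    std::vector<bool> visited(SIZE, false);
    Walk w;
    int row = start;
    visited[row] = true;
    w.rows.push_back(row);
    while (true) {
        int next = lowestUnvisited(m, visited, row);
        if (next == -1) break;             // every row has been visited
        w.values.push_back(m[row][next]);  // the lowest value that was taken
        visited[next] = true;
        w.rows.push_back(next);
        row = next;                        // column of the lowest value -> next row
    }
    return w;
}
```

Started at row A (index 0) of the matrix above, this sketch visits A D B C G H J I F E and collects 1 7 2 2 3 9 10 1 18.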
I am trying to convert my columns to diagonals using SAS.
For example
D C1 C2 C3 C4 C5
J 11 00 14 15 20
F 00 13 16 00 30
M 00 00 18 19 00
A 00 00 00 98 50
S 00 00 00 00 41
I want this converted to:
D N1 N2 N3 N4 N5
J 11 00 14 15 20
F 13 16 00 30
M 18 19 00
A 98 50
S 41
Can anyone help me with this?
Update based on new info: this version just uses an array to shift the values, starting from the diagonal, to the left. It does not depend on the values in the lower triangle.
data havethis;
infile cards firstobs=2;
input D:$1. C1-C5;
cards;
D C1 C2 C3 C4 C5
J 11 00 14 15 20
F 00 13 16 00 30
M 00 00 18 19 00
A 00 00 00 98 50
S 00 00 00 00 41
;;;;
run;
data want;
set havethis;
array c[*] c:;
array N[&sysnobs];
j = 0;
do i = _n_ to dim(c);
j + 1;
n[j] = c[i];
end;
drop j i;
run;
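The array shift in the data step above can be sketched in C++ as follows: row i (counting from 0) keeps its values from position i onward, moved to the front, so the diagonal element lands in N1. The function name shiftFromDiagonal is hypothetical, and a shorter row stands in for the trailing missing N values in SAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row i keeps c[i][i], c[i][i+1], ... shifted to the front; values left of
// the diagonal are ignored, whatever they are (zeros or anything else).
std::vector<std::vector<int>> shiftFromDiagonal(const std::vector<std::vector<int>>& c)
{
    std::vector<std::vector<int>> n;
    for (std::size_t i = 0; i < c.size(); ++i) {
        assert(c[i].size() >= i);  // row must reach the diagonal
        n.emplace_back(c[i].begin() + static_cast<std::ptrdiff_t>(i), c[i].end());
    }
    return n;
}
```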
This method uses two transposes (flip/flop). It assumes that the zeros occur only below the diagonal (it would be better if they were missing), and missing values are what you get in the result. I like this method because you don't have to know anything in advance, such as how many columns there are.
data havethis;
input D:$1. C1 C2 C3;
format c: z2.;
cards;
J 11 12 14
M 00 13 15
A 00 00 16
;;;;
run;
proc transpose data=havethis out=wantthis(drop=_name_ where=(col1 ne 0));
by D notsorted;
run;
proc transpose data=wantthis out=whatthismore(drop=_name_) prefix=N;
by d notsorted;
run;
I'm trying to load a file of integers, add them to a 2D array, iterate through the array, and add tiles to my level based on the integer (tile ID) at the current index. My problem seems to be that the array is loaded or iterated through in the wrong order. This is the file I'm loading from:
test.txt
02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
This is the level constructor:
Level::Level(std::string levelpath, int _width, int _height)
{
std::ifstream levelfile(levelpath);
width = _width;
height = _height;
int ids[15][9];
while (levelfile.is_open()) {
std::copy_n(std::istream_iterator<int>(levelfile), width * height, &ids[0][0]);
for (int y = 0; y < height; ++y) {
for (int x = 0; x < width; ++x) {
tiles.push_back(getTile(ids[x][y], sf::Vector2f(x * Tile::SIZE, y * Tile::SIZE)));
std::cout << ids[x][y] << " ";
}
std::cout << std::endl;
}
levelfile.close();
}
}
And this is how I create the level:
level = std::unique_ptr<Level>(new Level("data/maps/test.txt", 15, 9));
Here's the output in the console:
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
As you can see, the contents are the same as in test.txt, but in the wrong order.
The reason is that you swapped the dimensions of the array. Instead of
int ids[15][9];
...which is 15 lines of 9 elements, you want
int ids[9][15];
...which is 9 lines of 15 elements. The order of the extents in the declaration is the same as the order of indices in access.
EDIT: ...which you also swapped. Instead of
ids[x][y]
you need
ids[y][x]
That better explains the output you get, come to think of it. 2D arrays in C++ are stored row-major, meaning that the innermost arrays (the ones stored contiguously) are indexed by the rightmost index. Put another way, ids[y][x] is stored directly before ids[y][x + 1], whereas a whole row of elements lies between ids[y][x] and ids[y + 1][x].
If you read in a row-major array as you do with std::copy_n and interpret it as a column-major array, you get the transpose (a bit warped because of the changed dimensions, but recognizably so; if you swapped height and width, you'd see the real transpose).
int ids[9][15];
while (levelfile.is_open()) {
std::copy_n(std::istream_iterator<int>(levelfile), width * height, &ids[0][0]);
for (int y = 0; y < height; ++y) {
for (int x = 0; x < width; ++x) {
tiles.push_back(getTile(ids[y][x], sf::Vector2f(x * Tile::SIZE, y * Tile::SIZE)));
std::cout << ids[y][x] << " ";
}
std::cout << std::endl;
}
levelfile.close();
}
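To see the row-major rule in isolation, here is a small C++ sketch that stores the tile IDs in one flat std::vector and computes the position of (x, y) as y * width + x, the same offset ids[y][x] has inside int ids[height][width]. loadIds and idAt are hypothetical helpers, not part of the original Level class; the string overload exists only to make quick in-memory checks easy.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads `height` lines of `width` integers in file order into a flat buffer.
// File order is row-major: all of line 0, then all of line 1, and so on.
std::vector<int> loadIds(std::istream& in, int width, int height)
{
    std::vector<int> ids(static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    for (int& id : ids) {
        in >> id;  // "02" parses as 2 (decimal by default)
    }
    return ids;
}

// Convenience overload for in-memory text.
std::vector<int> loadIds(const std::string& text, int width, int height)
{
    std::istringstream in(text);
    return loadIds(in, width, height);
}

// Tile ID at column x, row y: the row index is scaled by the width.
int idAt(const std::vector<int>& ids, int width, int x, int y)
{
    assert(x >= 0 && x < width && y >= 0);
    return ids[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
               + static_cast<std::size_t>(x)];
}
```

With test.txt's layout (15 IDs per line, 9 lines), idAt(ids, 15, x, 0) is 2 for every x, and every later row yields 1, which is the order the file shows.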
If you look closely, you can see that the first 15 values (which need to be in the first line) are printed in the first column (and what doesn't fit spills into the second). The array fills its rows before its lines, while your file contains the lines first. So load your map "on its side": set the height to what you now use as the width (15) and vice versa (the width is 9, not 15). Now you will load the map correctly.
Then just print each row and output endl before the next row (so that each row prints as a line), and you will see it correctly.
Hope it was clear enough.
In the 3rd and 4th observations the value of status is missing; I need the 3rd and 4th observations to take the status of the 2nd observation. This needs to happen throughout the data set, by id.
data z;
input id $ d status $;
cards;
11111 01 a
11111 02 a
11111 03 .
11111 04 .
11111 05 p
11111 06 .
11111 07 .
11111 08 .
11111 09 a
11111 10 .
11111 11 .
11111 12 .
11111 13 .
11111 14 .
11111 15 .
11111 16 .
11112 01 p
11112 02 .
11112 03 .
11112 04 .
11112 05 p
11112 06 .
11112 07 .
11112 08 .
11112 09 .
11112 10 a
;
run;
This data step should do the trick.
data want;
set z;
by id;
length lastStatus $1;
retain lastStatus;
if first.id then lastStatus = status;
else lastStatus = coalescec(status,lastStatus);
drop status;
rename lastStatus = status;
run;
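The same last-observation-carried-forward logic, sketched in C++: within each id, a missing status is replaced by the most recent non-missing one, and the carried value is reset at the first row of each id, as FIRST.id does above. Obs and fillForward are hypothetical names, missing is modelled as std::nullopt, and rows are assumed to be grouped by id, as BY id requires.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct Obs { std::string id; int d; std::optional<char> status; };

// Carry the last non-missing status forward within each id.
std::vector<Obs> fillForward(std::vector<Obs> rows)
{
    std::optional<char> last;  // plays the role of the retained lastStatus
    for (std::size_t i = 0; i < rows.size(); ++i) {
        bool firstId = (i == 0) || rows[i].id != rows[i - 1].id;  // FIRST.id
        // At a new id take the status as-is; otherwise COALESCEC(status, last).
        if (firstId || rows[i].status) last = rows[i].status;
        rows[i].status = last;
    }
    return rows;
}
```

A group whose first status is missing stays missing until its first non-missing value; nothing leaks in from the previous id.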