sas change last value by group to first value - sas

I want to change data of the form
id value
1 1
1 1
1 2
2 7
2 7
2 7
2 5
. .
. .
. .
to
id value
1 1
1 1
1 1
2 7
2 7
2 7
2 7
. .
. .
. .
That is, the last value by group should be the first value by group. I have tried the following code
data want;
set have;
by id;
last.value=first.value;
run;
But that didn't work. Could someone help me out?

You should save first.id value in variable and retain it.
data want(drop=tValue);
set have;
by id;
retain tValue;
if first.id then tValue=value;
if last.id then value=tValue;
run;

The problem here is that first.value and last.value:
Do not hold the actual value, they just tell you if an observation is the first or last in a BY-group
Cannot be assigned - last.value = is not valid syntax
Secondly, first.value and last.value only get set if the value variable is stated in the by statement. You should use first.id and last.id instead.
What we need to do here is:
Check if we are looking at an observation that is the first in the BY-group based on id
Keep the value of the value variable until the last id value is reached
When we are looking at the last id value then set the value from step 1.
Alexey's answer covers the actual syntax required to do this. Here's what the first.id/last.id values look like. (You can always view them by adding put _all_; into your datastep):
id value first.id last.id tValue
1 1 1 0 1
1 1 0 0 1
1 2 0 1 1
2 7 1 0 7
2 7 0 0 7
2 7 0 0 7
2 5 0 1 7
. .
. .
. .

Related

Assigning a value to a certain number of rows within a "by" group - SAS

I've spent quite a lot of time on Stack Overflow looking for answers to other questions, but I'm really stuck on this one, so I'm finally asking a question!
I have a dataset of fish in SAS, with:
a unique ID for each angler
three different variables with number of fish released in each category by that angler: over legal size, under legal size, and released dead
a sequential number (fishno) based on the number of rows for each ID; 1 to the last row of that ID.
Variable to be created: Disposition--could be either character variable with "legal" "under" "dead" options or even numeric values of 1-3.
It was originally set up with one row per unique ID, but I set it so that now there is one row per fish discarded (i.e. if there were 3 legal size and 2 undersize fish, I now have 5 rows).
I need to assign, by unique ID, whether each row/fish was released legal, undersize or dead. In the previous example, for a unique ID, I'd need 3 rows assigned to a Disposition of "legal" and 2 rows assigned to a Disposition of "under".
I've tried first.var statements along with if-then-do statements; played around with macros; nothing worked quite right and I'm pretty stuck here. Is there some sort of random assignment I should try? Is there a much easier way that I'm missing?
Example of the data below...
THANK YOU!!
Data in Excel format
Assuming you already have the FISHNO variable, there needs to be some method for assigning each fish as legal, dead, or undersize. The following code will assign the disposition in the that order:
data have;
input ID LEGAL DEAD UNDERSIZE FISHNO;
datalines;
15 1 0 1 1
15 1 0 1 2
29 2 0 2 1
29 2 0 2 2
29 2 0 2 3
29 2 0 2 4
38 1 0 1 1
38 1 0 1 2
53 1 0 1 1
53 1 0 1 2
55 1 0 1 1
55 1 0 1 2
;
run;
data want;
set have;
if legal>0 and legal>=fishno then disposition = 'legal';
else if dead>0 and legal+dead>=fishno then disposition = 'dead';
else if undersize>0 and legal+dead+undersize>=fishno then disposition = 'under';
run;

Multiple conditions for same variable in SAS

I'm trying to detect specific values of one variable and create a new one if those conditions are fulfilled.
Here's a part of my data (I have much more rows) :
id time result
1 1 normal
1 2 normal
1 3 abnormal
2 1 normal
2 2
3 3 normal
4 1 normal
4 2 normal
4 3 abnormal
5 1 normal
5 2 normal
5 3
What I want
id time result base
1 1 normal
1 2 normal x
1 3 abnormal
2 1 normal x
2 2
2 3 normal
3 3 normal
4 1 normal
4 2 normal x
4 3 abnormal
5 1 normal
5 2 normal x
5 3
My baseline value (base) should be populated when result exists at timepoint (time) 2. If there's no result then baseline should be at time=1.
if result="" and time=2 then do;
if time=10 and result ne "" then base=X; end;
if result ne "" and time=2 then base=X; `
It works correctly when time=2 and results exists. But if results missing, then there's something wrong.
The question seems a bit off. "Else if time="" and time=1" There seems to be a typo there somewhere.
However, your syntax seems solid. I've worked an example with your given data. The first condition works, but second (else if ) is assumption. Updating as question is updated.
options missing='';
data begin;
input id time result $ 5-20 ;
datalines;
1 1 normal
1 2 normal
1 3 abnormal
2 1 normal
2 2
3 3 normal
4 1 normal
4 2 normal
4 3 abnormal
;
run;
data flagged;
set begin;
if time=2 and result NE "" then base='X';
else if time=1 and id=2 then base='X';
run;
Edit based on revisited question.
Assuming that the time-point (1) is always next to the point (2). (If not, then add more lags.) Simulating the Lead function we sort the data backwards and utilize lag.
proc sort data=begin; by id descending time; run;
data flagged;
set begin;
if lag(time)=2 and lag(result) EQ "" then base='X';
if time=2 and result NE "" then base='X';
run;
More about opposite of lag: https://communities.sas.com/t5/SAS-Communities-Library/How-to-simulate-the-LEAD-function-opposite-of-LAG/ta-p/232151

Reshaping dataset wide

I have a dataset from a small clinic which looks something like this:
What I am trying to do is make the top long form of the dataset look like the bottom wide form.
My code is the following:
reform date injury_code_1 .... , i(ID) j(VisitNum)
The error code I get is this:
There are variables other than a, b, ID, VisitNum in your data. They must be constant within ID because that is the only way they can fit into wide data without loss of information.
The variable or variables listed above are not constant within ID. Perhaps the values are in error. Type reshape error for a list of the problem observations.
Either that, or the values vary because they should vary, in which case you must either add the variables to the list of xij variables to be reshaped, or drop them.
Why is my code wrong?
Using the data as illustrated in the screenshot, the following works for me:
clear
input ID VisitNum str6 date Injury_1 Injury_2 Injury_3 gender
1 1 "12-Mar" 1 2 3 0
2 1 "2-Apr" 4 . . 1
1 2 "23-Jun" 1 2 . 0
3 1 "1-Feb" 5 6 . 1
1 3 "30-Aug" 8 9 10 0
end
reshape wide date Injury_1 Injury_2 Injury_3, i(ID) j(VisitNum)
order ID gender
list, abbreviate(15)
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| ID gender date1 Injury_11 Injury_21 Injury_31 date2 Injury_12 Injury_22 Injury_32 date3 Injury_13 Injury_23 Injury_33 |
|----------------------------------------------------------------------------------------------------------------------------------------------------|
1. | 1 0 12-Mar 1 2 3 23-Jun 1 2 . 30-Aug 8 9 10 |
2. | 2 1 2-Apr 4 . . . . . . . . |
3. | 3 1 1-Feb 5 6 . . . . . . . |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
The command provided is not valid Stata syntax.

SAS - How to keep the earliest date considering a missing

A need to create a new variable to repeat the earliest date for a ID visit and if it missing it should type missing, after a missing it should keep the earliest date since it was missing(like in the example). I've tried the LAG function and it didn't work; I also try the keep function but just repeat the 25NOV2015 for all records. The final result/"what I need" is in the last column.
Thanks
Example
You need to use retain statement. Retain means your value in each observation won't be reinitialized to a missing. So in the next iteration of data step your variable remembers its value.
Sample data
data a;
input date;
format date ddmmyy10.;
datalines;
.
5
6
7
.
1
2
.
9
;
run;
Solution
data b;
set a;
retain new_date;
format new_date ddmmyy10.;
if date = . then
new_date = .;
if new_date = . then
new_date = date;
run;
Since you didn't post any data I will make up some. Also since the fact that your variable is a date doesn't really impact the answer I will just use some integers as they are easier to type.
data have ;
input id value ## ;
cards;
1 . 1 2 1 3 1 . 1 5 1 6 1 . 1 8
2 1 2 2 2 3 2 . 2 5 2 6
;;;;
Basically your algorithm says that you want to store the value when either the current value is missing or stored value is missing. With multiple BY groups you would also want to set it when you start a new group.
data want ;
set have ;
by id ;
retain new_value ;
if first.id or missing(new_value) or missing(value)
then new_value=value;
run;
Results:
new_
Obs id value value
1 1 . .
2 1 2 2
3 1 3 2
4 1 . .
5 1 5 5
6 1 6 5
7 1 . .
8 1 8 8
9 2 1 1
10 2 2 1
11 2 3 1
12 2 . .
13 2 5 5
14 2 6 5

Cleaner way of handling addition of summarizing rows to table?

I have a dataset that is unique by 5 variables, with two dependent variables. My goal is for this dataset to have appended to it additional rows with TOTAL as the value of independent variables, with the values of the dependent variables changing accordingly.
To do this for a single independent variable is not a problem, I would do something along the lines of:
proc sql;
create table want as
select "TOTAL" as independent_var1,
independent_var2,
...
independent_var5,
sum(dependent_1) as dependent_1,
sum(dependent_2) as dependent_2
from have
group by independent_var1,...,independent_var5;
quit;
Followed by appending the original dataset in whatever fashion you choose. However, I want the above, yet x5 (for each independent variable), and then again for each possible combination of TOTAL/nontotal across the 5 independent variables. Not sure just how many datasets that is off the top of my head...but it's a decent amount.
So best strategy I've come up with so far is to use the above with some mildly creative macro code to generate all possible table combinations of total/non-total, but it seems like SAS just might have a better way, maybe tucked away in an esoteric proc step I've never heard of...
--
Attempt to show example, using three independent variables and 1 dependent variable:
Ind1|2|3|Dependent1
0 0 0 1
0 0 1 3
0 1 0 5
0 1 1 7
Desired output would be:
0 0 ALL 4
0 1 ALL 12
0 ALL 0 6
0 ALL 1 10
ALL 0 0 1
ALL 0 1 3
ALL 1 0 5
ALL 1 1 7
0 ALL ALL 16
ALL 0 ALL 4
ALL 1 ALL 12
ALL ALL 0 6
ALL ALL 1 10
ALL ALL ALL 16
0 0 0 1
0 0 1 3
0 1 0 5
0 1 1 7
I may have forgotten some combinations, but that should serve to get the point across.
PROC MEANS should do this for you trivially. You need to clean up the output in order to get it to perfectly match what you want (missing for INDx = "ALL" in your example) but otherwise it gets the calculations done properly.
data have;
input Ind1 Ind2 Ind3 Dependent1;
datalines;
0 0 0 1
0 0 1 3
0 1 0 5
0 1 1 7
;;;;
run;
proc means data=have;
class ind1 ind2 ind3;
var dependent1;
output out=want sum=;
run;