how to swap first and last observations in sas data set if it contain 3 observations - sas

I have a data set with 3 observations, 1 2 3
4 5 6
7 8 9 , now i have to interchange 1 2 3 and 7 8 9.
How can do this in base sas?

If you just want to sort your dataset by a variable in descending order, use proc sort:
data example;
input number;
datalines;
123
456
789
;
run;
proc sort data = example;
by descending number;
run;
If you want to re-order a dataset in a more complex way, create a new variable containing the position that you want each row to be in, and then sort it by that variable.

If you want to swap the contents of the first and last observations while leaving the rest of the dataset in place, you could do something like this.
data class;
set sashelp.class;
run;
data firstobs;
i = 1;
set sashelp.class(obs = 1);
run;
data lastobs;
i = nobs;
set sashelp.class nobs = nobs point = nobs;
output;
stop;
run;
data transaction;
set lastobs firstobs;
/*Swap the values of i for first and last obs*/
retain _i;
if _n_ = 1 then do;
_i = i;
i = 1;
end;
if _n_ = 2 then i = _i;
drop _i;
run;
data class;
set transaction(keep = i);
modify class point = i;
set transaction;
run;
This modifies just the first and last observations, which should be quite a bit faster than sorting or replacing a large dataset. You can do a similar thing with the update statement, but that only works if your dataset is already sorted / indexed by a unique key.

By Sandeep Sharma:sandeep.sharmas091#gmail.com
data testy;
input a;
datalines;
1
2
3
4
5
6
7
8
9
;
run;
data ghj;
drop y;
do i=nobs-2 to nobs;
set testy point=i nobs=nobs;
output;
end;
do n=4 to nobs-3;
set testy point=n;
output;
end;
do y=1 to 3;
set testy;
output;
end;
stop;
run;

Related

Assign 3 values to make one record becomes three records

I want to make a simple dataset as left, to become a dataset at the right.
There are three values for V1, 1,2,3. And I want to assign it to each of the id.
No problem. Do this
data have;
input id;
datalines;
1
2
;
data want;
set have;
do _N_ = 1 to 3;
v1 = put(_N_, z2. -l);
output;
end;
run;
Try this
data have;
input id;
datalines;
1
2
;
data want;
set have;
do v1 = 1 to 3;
output;
end;
run;

SAS - Row by row Comparison within different ID Variables of Same Dataset and delete ALL Duplicates

I need some help in trying to execute a comparison of rows within different ID variable groups, all in a single dataset.
That is, if there is any duplicate observation within two or more ID groups, then I'd like to delete the observation entirely.
I want to identify any duplicates between rows of different groups and delete the observation entirely.
For example:
ID Value
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
The output I desire is:
ID Value
1 D
3 Z
I have looked online extensively, and tried a few things. I thought I could mark the duplicates with a flag and then delete based off that flag.
The flagging code is:
data have;
set want;
flag = first.ID ne last.ID;
run;
This worked for some cases, but I also got duplicates within the same value group flagged.
Therefore the first observation got deleted:
ID Value
3 Z
I also tried:
data have;
set want;
flag = first.ID ne last.ID and first.value ne last.value;
run;
but that didn't mark any duplicates at all.
I would appreciate any help.
Please let me know if any other information is required.
Thanks.
Here's a fairly simple way to do it: sort and deduplicate by value + ID, then keep only rows with values that occur only for a single ID.
data have;
input ID Value $;
cards;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
;
run;
proc sort data = have nodupkey;
by value ID;
run;
data want;
set have;
by value;
if first.value and last.value;
run;
proc sql version:
proc sql;
create table want as
select distinct ID, value from have
group by value
having count(distinct id) =1
order by id
;
quit;
This is my interpretation of the requirements.
Find levels of value that occur in only 1 ID.
data have;
input ID Value:$1.;
cards;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
;;;;
proc print;
proc summary nway; /*Dedup*/
class id value;
output out=dedup(drop=_type_ rename=(_freq_=occr));
run;
proc print;
run;
proc summary nway;
class value;
output out=want(drop=_type_) idgroup(out[1](id)=) sum(occr)=;
run;
proc print;
where _freq_ eq 1;
run;
proc print;
run;
A slightly different approach can use a hash object to track the unique values belonging to a single group.
data have; input
ID Value:& $1.; datalines;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
run;
proc delete data=want;
proc ds2;
data _null_;
declare package hash values();
declare package hash discards();
declare double idhave;
method init();
values.keys([value]);
values.data([value ID]);
values.defineDone();
discards.keys([value]);
discards.defineDone();
end;
method run();
set have;
if discards.find() ne 0 then do;
idhave = id;
if values.find() eq 0 and id ne idhave then do;
values.remove();
discards.add();
end;
else
values.add();
end;
end;
method term();
values.output('want');
end;
enddata;
run;
quit;
%let syslast = want;
I think what you should do is:
data want;
set have;
by ID value;
if not first.value then flag = 1;
else flag = 0;
run;
This basically flags all occurrences of a value except the first for a given ID.
Also I changed want and have assuming you create what you want from what you have. Also I assume have is sorted by ID value order.
Also this will only flag 1 D above. Not 3 Z
Additional Inputs
Can't you just do a sort to get rid of the duplicates:
proc sort data = have out = want nodupkey dupout = not_wanted;
by ID value;
run;
So if you process the observations by VALUE levels (instead of by ID levels) then you just need keep track of whether any ID is ever different than the first one.
data want ;
do until (last.value);
set have ;
by value ;
if first.value then first_id=id;
else if id ne first_id then remapped=1;
end;
if not remapped;
keep value id;
run;

Add observation at the end to an existing SAS dataset

My dataset(named A) has columns : A B C. I want to add new observations (new row) at the end of it with the values: 1 2 3. There must be an easy way to do that?
Just use a proc sql and insert statement.
proc sql;
insert into table_name (A,B,C) values (1,2,3);
quit;
Here are 5 more ways of doing this:
/*Some dummy data*/
data have;
input A B C;
cards;
4 5 6
;
run;
data new_rows;
input A B C;
cards;
1 2 3
6 7 8
;
run;
/* Modifying in place - more efficient, increased risk of data loss */
proc sql;
insert into have
select * from new_rows;
quit;
proc append base = have data = new_rows;
run;
data have;
modify have;
set new_rows;
output;
run;
/* Overwriting - less efficient, no harm if interrupted. */
data have;
set have new_rows;
run;
data have;
update have new_rows;
/*N.B. assumes that A B C form a set of unique keys and that the datasets are sorted*/
by A B C;
run;

Copy a value from a line below in SAS

I hope this is not a duplicate question. I've searched the forum and retain function seems to be choice of weapon but it copies down an observation, and I'm trying to do the following; for a given id, copy the second line to the first line for the x value. also first value of x is always 2.
Here's my data;
id x
3 2
3 1
3 1
2 2
2 1
2 1
6 2
6 0
6 0
and i want it to look like this;
id x
3 1
3 1
3 1
2 1
2 1
2 1
6 0
6 0
6 0
and here's the starter code;
data have;
input id x;
cards;
3 2
3 1
3 1
2 2
2 1
2 1
6 2
6 0
6 0
;
run;
Lead is tricky in SAS. You can sort in reverse and use a lag function to get around it though, and you are right: a retain statement will allow us to add an order variable so we can sort it back to its original format.
data have;
set have;
retain order;
lagid = lag(id);
if id ne lagid then order = 0;
order = order + 1;
drop lagid;
run;
proc sort data=have; by id descending order; run;
data have;
set have;
leadx = lag(x);
run;
proc sort data=have; by id order; run;
data have;
set have;
if order = 3 then x_fixed = x;
else x_fixed = leadx;
run;
If your data is exactly as you say, then you can use a lookahead merge. It literally takes the dataset and merges itself to a copy of the dataset that starts on row 2, side-to-side. You just have to check that you're still on the same ID. This does change the value of x for all records to the value one hence, not just the first; you could add additional code to pay attention to that (but can't use FIRST and LAST).
data want;
merge have have(firstobs=2 rename=(id=newid x=newx));
if newid=id then x=newx;
keep x id;
run;
If you don't have any additional variables of interest, then you can do something even more interesting: duplicate the second row in its entirety and delete the first row.
data want;
set have;
by id notsorted;
if first.id then do;
firstrow+1;
delete;
end;
if firstrow=1 then do;
firstrow=0;
output;
end;
output;
run;
However, the "safest" method (in terms of doing most likely what you want precisely) is the following, which is a DoW loop.
data want;
idcounter=0;
do _n_ = 1 by 1 until (last.id);
set have;
by id notsorted;
idcounter+1;
if idcounter=2 then second_x = x;
end;
do _n_=1 by 1 until (last.id);
set have;
by id notsorted;
if first.id then x=second_x;
output;
end;
run;
This identifies the second x in the first loop, for that BY group, then in the second loop sets it to the correct value for row 1 and outputs.
In both of the latter examples I assume your data is organized by ID but not truly sorted (like your initial example is). If it's not organized by ID, you need to perform a sort first (but then can remove the NOTSORTED).

How to find max value of variable for each unique observation in "stacked" dataset

Sorry for the vauge title.
My data set looks essentially like this:
ID X
18 1
18 1
18 2
18 1
18 2
369 2
369 3
369 3
361 1
what I want is to find the max value of x for each ID. In this dataset, that would be 2 for ID=18 and 3 for ID=361.
Any feedback would be greatly appreciated.
Proc Means with a class statement (so you don't have to sort) and requesting the max statistic is probably the most straightforward approach (untested):
data sample;
input id x;
datalines;
18 1
18 1
18 2
18 1
18 2
369 2
369 3
369 3
361 1
;
run;
proc means data=sample noprint max nway missing;
class id;
var x;
output out=sample_max (drop=_type_ _freq_) max=;
run;
Check out the online SAS documentation for more details on Proc Means (http://support.sas.com/onlinedoc/913/docMainpage.jsp).
I don't quite understand your example. I can't imagine that the input data set really has all the values in one observation. Do you instead mean something like this?
data sample;
input myid myvalue;
datalines;
18 1
18 1
18 2
18 1
18 2
369 2
369 3
369 3
361 1
;
proc sort data=sample;
by myid myvalue;
run;
data result;
set sample;
by myid;
if last.myid then output;
run;
proc print data=result;
run;
This would give you this result:
Obs myid myvalue
1 18 2
2 361 1
3 369 3
If you want to keep both all records and the max value of X by id, I would use either the PROC MEANS aproach followed by a merge statement, or you can sort the data by Id and DESCENDING X first, and then use the RETAIN statement to create the max_value directly in the datastep:
PROC SORT DATA=A; BY ID DESCENDING X; RUN;
DATA B; SET A;
BY ID;
RETAIN X_MAX;
IF FIRST.ID THEN X_MAX = X;
ELSE X_MAX = X_MAX;
RUN;
You could try this:
PROC SQL;
CREATE TABLE CHCK AS SELECT MYID, MAX(MYVALUE) FROM SAMPLE
GROUP BY 1;
QUIT;
A couple of more over-engineered options that might be of interest for anyone who needs to do this with a really big dataset, where performance is more of a concern:
If your dataset is already sorted by ID, but not by X within each ID, you can still do this in a single data step without any sorting, using a retained max within each by group. Alternatively, you can use proc means (as per the top answer) but with a by statement rather than a class statement - this reduces the memory usage.
data sample;
input id x;
datalines;
18 1
18 1
18 2
18 1
18 2
369 2
369 3
369 3
361 1
;
run;
data want;
do until(last.ID);
set sample;
by ID;
xmax = max(x, xmax);
end;
x = xmax;
drop xmax;
run;
Even if your dataset is not sorted by ID, you can still do this in one data step, without sorting it, by using a hash object to keep track of the maximum x value you've found for each ID as you go along. This will be a little faster than proc means and will typically use less memory, as proc means does various calculations in the background which are not needed in the output dataset.
data _null_;
set sample end = eof;
if _n_ = 1 then do;
call missing(xmax);
declare hash h(ordered:'a');
rc = h.definekey('ID');
rc = h.definedata('ID','xmax');
rc = h.definedone();
end;
rc = h.find();
if rc = 0 then do;
if x > xmax then do;
xmax = x;
rc = h.replace();
end;
end;
else do;
xmax = x;
rc = h.add();
end;
if eof then rc = h.output(dataset:'want2');
run;
In this example, on my PC, the hash approach used this much memory:
memory 966.15k
OS Memory 27292.00k
vs. this much for an equivalent proc summary:
memory 8706.90k
OS Memory 35760.00k
Not a bad saving if you really need it to scale up!
Use an appropriate proc with the by statement. For instance,
data sample;
input myid myvalue;
datalines;
18 1
18 1
18 2
18 1
18 2
369 2
369 3
369 3
361 1
;
run;
proc sort data=sample;
by myid;
run;
proc means data=sample;
var myvalue;
by myid;
run;
I would just sort by x and id putting the highest value for each ID at the top.
NODUPKEY removes every duplicate below.
proc sort data=yourstacked_data out=yourstacked_data_sorted;
by DECENDING x id;
run;
proc sort data=yourstacked_data NODUPKEY out=top_value_only;
by id;
run;
A multidata hash should be used if you want the result to show each id at max value. That is, for the cases when more than one id is found having a max value
Example code:
Find the ids associated with the max value of 40 different numeric variables.
The code is Proc DS2 data program.
data have;
call streaminit(123);
do id = 1 to 1e5; %* 10,000 rows;
array v v1-v40; %* 40 different variables;
do over v; v=ceil(rand('uniform', 2e5)); end;
output;
end;
run;
proc ds2;
data _null_;
declare char(32) _name_ ; %* global declarations;
declare double value id;
declare package hash result();
vararray double v[*] v:; %* variable based array, limit yourself to 1,000;
declare double max[1000]; %* temporary array for holding the vars maximum values;
method init();
declare package sqlstmt s('drop table want'); %* DS2 version of `delete`;
s.execute();
result.keys([_name_]); %* instantiate a multidata hash;
result.data([_name_ value id]);
result.multidata();
result.ordered('ascending');
result.defineDone();
end;
method term();
result.output('want'); %* write the results to a table;
end;
method run();
declare int index;
set have;
%* process each variable being examined for 'id at max';
do index = 1 to dim(v);
if v[index] > max[index] then do; %* new maximum for this variable ?
_name_ = vname(v[index]); %* retrieve variable name;
value = v[index]; %* move value into hash host variable;
if not missing (max[index]) then do;
result.removeall(); %* remove existing multidata items associated with the variable;
end;
result.add(); %* add new multidata item to hash;
max[index] = v[index]; %* track new maximum;
end;
else
if v[index] = max[index] then do; %* two or more ids have same max;
_name_ = vname(v[index]);
value = v[index];
result.add(); %* add id to the multidata item;
end;
end;
end;
enddata;
run;
quit;
%let syslast=want;
Reminder: Proc DS2 defaults are to not overwrite existing tables. To 'overwrite' a table you need to either:
Use table option overwrite=yes when syntax allows
The package hash .output() method does not recognize the table option
Drop the table before recreating it
The above code can be used in a Base SAS DATA step with minor modifications.