Add observation at the end to an existing SAS dataset - sas

My dataset(named A) has columns : A B C. I want to add new observations (new row) at the end of it with the values: 1 2 3. There must be an easy way to do that?

Just use a proc sql and insert statement.
proc sql;
insert into table_name (A,B,C) values (1,2,3);
quit;

Here are 5 more ways of doing this:
/*Some dummy data*/
data have;
input A B C;
cards;
4 5 6
;
run;
data new_rows;
input A B C;
cards;
1 2 3
6 7 8
;
run;
/* Modifying in place - more efficient, increased risk of data loss */
proc sql;
insert into have
select * from new_rows;
quit;
proc append base = have data = new_rows;
run;
data have;
modify have;
set new_rows;
output;
run;
/* Overwriting - less efficient, no harm if interrupted. */
data have;
set have new_rows;
run;
data have;
update have new_rows;
/*N.B. assumes that A B C form a set of unique keys and that the datasets are sorted*/
by A B C;
run;

Related

How do I add in rows with specific values missing in a single DATA step?

Here is a simple example I came up with. There are 3 players here (id is 1,2,3) and each player gets 3 attempts at the game (attempt is 1,2,3).
data have;
infile datalines delimiter=",";
input id attempt score;
datalines;
1,1,100
1,2,200
2,1,150
3,1,60
;
run;
I would like to add in rows where the score is missing if they did not play attempt 2 or attempt 3.
data want;
set have;
by id attempt;
* ??? ;
run;
proc print data=have;
run;
The output would look something like this.
1 1 100
1 2 200
1 3 .
2 1 150
2 2 .
2 3 .
3 1 60
3 2 .
3 3 .
How do I go about doing this?
You could solve this by first creating a table where you have the structure you want to see: for each ID three attempts. This structure can then be joined with a 'left join' to your 'have' table to get the actual scores if they exist and missing variable if they don't.
/* Create table with all ids for which the structure needs to be created */
proc sql;
create table ids as
select distinct id from have;
quit;
/* Create table structure with 3 attempts per ID */
data ids (drop = i);
set ids;
do i = 1 to 3;
attempt = i;
output;
end;
run;
/* Join the table structure to the actual scores in the have table */
proc sql;
create table want as
select a.*,
b.score
from ids a left join have b on a.id = b.id and a.attempt = b.attempt;
quit;
A table of possible attempts cross joined with the distinct ids left joined to the data will produce the desired result set.
Example:
data have;
infile datalines delimiter=",";
input id attempt score;
datalines;
1,1,100
1,2,200
2,1,150
3,1,60
;
data attempts;
do attempt = 1 to 3; output; end;
run;
proc sql;
create table want as
select
each_id.id,
each_attempt.attempt,
have.score
from
(select distinct id from have) each_id
cross join
attempts each_attempt
left join
have
on
each_id.id = have.id
& each_attempt.attempt = have.attempt
order by
id, attempt
;
Update: I figured it out.
proc sort data=have;
by id attempt;
data want;
set have (rename=(attempt=orig_attempt score=orig_score));
by id;
** Previous attempt number **;
retain prev;
if first.id then prev = 0;
** If there is a gap between previous attempt and current attempt, output a blank record for each intervening attempt **;
if orig_attempt > prev + 1 then do attempt = prev + 1 to orig_attempt - 1;
score = .;
output;
end;
** Output current attempt **;
attempt = orig_attempt;
score = orig_score;
output;
** If this is the last record and there are more attempts that should be included, output dummy records for them **;
** (Assumes that you know the maximum number of attempts) **;
if last.id & attempt < 3 then do attempt = attempt + 1 to 3;
score = .;
output;
end;
** Update last attempt used in this iteration **;
prev = attempt;
run;
Here is a alternative DATA step, a DOW way:
data want;
do until (last.id);
set have;
by id;
output;
end;
call missing(score);
do attempt = attempt+1 to 3;
output;
end;
run;
If the absent observations are only at the end then you can just use a couple of OUTPUT statements and a DO loop. So write each observation as it is read and if the last one is NOT attempt 3 then add more observations until you get to attempt 3.
data want1;
set have ;
by id;
output;
score=.;
if last.id then do attempt=attempt+1 to 3;
output;
end;
run;
If the absent attempts can appear any where then you need to "look ahead" to see whether the next observations skips any attempts.
data want2;
set have end=eof;
by id ;
if not eof then set have (firstobs=2 keep=attempt rename=(attempt=next));
if last.id then next=3+1;
output;
score=.;
do attempt=attempt+1 to next-1;
output;
end;
drop next;
run;

SAS - Row by row Comparison within different ID Variables of Same Dataset and delete ALL Duplicates

I need some help in trying to execute a comparison of rows within different ID variable groups, all in a single dataset.
That is, if there is any duplicate observation within two or more ID groups, then I'd like to delete the observation entirely.
I want to identify any duplicates between rows of different groups and delete the observation entirely.
For example:
ID Value
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
The output I desire is:
ID Value
1 D
3 Z
I have looked online extensively, and tried a few things. I thought I could mark the duplicates with a flag and then delete based off that flag.
The flagging code is:
data have;
set want;
flag = first.ID ne last.ID;
run;
This worked for some cases, but I also got duplicates within the same value group flagged.
Therefore the first observation got deleted:
ID Value
3 Z
I also tried:
data have;
set want;
flag = first.ID ne last.ID and first.value ne last.value;
run;
but that didn't mark any duplicates at all.
I would appreciate any help.
Please let me know if any other information is required.
Thanks.
Here's a fairly simple way to do it: sort and deduplicate by value + ID, then keep only rows with values that occur only for a single ID.
data have;
input ID Value $;
cards;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
;
run;
proc sort data = have nodupkey;
by value ID;
run;
data want;
set have;
by value;
if first.value and last.value;
run;
proc sql version:
proc sql;
create table want as
select distinct ID, value from have
group by value
having count(distinct id) =1
order by id
;
quit;
This is my interpretation of the requirements.
Find levels of value that occur in only 1 ID.
data have;
input ID Value:$1.;
cards;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
;;;;
proc print;
proc summary nway; /*Dedup*/
class id value;
output out=dedup(drop=_type_ rename=(_freq_=occr));
run;
proc print;
run;
proc summary nway;
class value;
output out=want(drop=_type_) idgroup(out[1](id)=) sum(occr)=;
run;
proc print;
where _freq_ eq 1;
run;
proc print;
run;
A slightly different approach can use a hash object to track the unique values belonging to a single group.
data have; input
ID Value:& $1.; datalines;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
run;
proc delete data=want;
proc ds2;
data _null_;
declare package hash values();
declare package hash discards();
declare double idhave;
method init();
values.keys([value]);
values.data([value ID]);
values.defineDone();
discards.keys([value]);
discards.defineDone();
end;
method run();
set have;
if discards.find() ne 0 then do;
idhave = id;
if values.find() eq 0 and id ne idhave then do;
values.remove();
discards.add();
end;
else
values.add();
end;
end;
method term();
values.output('want');
end;
enddata;
run;
quit;
%let syslast = want;
I think what you should do is:
data want;
set have;
by ID value;
if not first.value then flag = 1;
else flag = 0;
run;
This basically flags all occurrences of a value except the first for a given ID.
Also I changed want and have assuming you create what you want from what you have. Also I assume have is sorted by ID value order.
Also this will only flag 1 D above. Not 3 Z
Additional Inputs
Can't you just do a sort to get rid of the duplicates:
proc sort data = have out = want nodupkey dupout = not_wanted;
by ID value;
run;
So if you process the observations by VALUE levels (instead of by ID levels) then you just need keep track of whether any ID is ever different than the first one.
data want ;
do until (last.value);
set have ;
by value ;
if first.value then first_id=id;
else if id ne first_id then remapped=1;
end;
if not remapped;
keep value id;
run;

Converting data set in SAS for 1-way anova

Essentially I need to reorder my data set.
The data consists of 4 columns, one for each treatment group. I'm trying to run a simple 1-way anova in SAS but I don't know how to reorder the data so that there are two columns, one with the responses and one with the treatments.
Here is some example code to create example data sets.
data have;
input A B C D;
cards;
26.72 37.42 11.23 44.33
28.58 56.46 29.63 76.86
29.71 51.91 . .
;
run;
data want;
input Response Treatment $;
cards;
26.72 A
28.58 A
29.71 A
37.42 B
56.46 B
51.91 B
11.23 C
29.63 C
44.33 D
76.86 D
;
run;
I'm sure this is a very simple solution but I didn't see the same thing asked elsewhere on the site. I'm typically an R user but have to use SAS for this analysis so I could be looking for the wrong keywords.
I have used proc transpose for this, see below
/*1. create row numbers for each obs*/
data have1;
set have;
if _n_=1 then row=1;
else row+1;
run;
proc sort data=have1; by row; run;
/*Transpose the dataset by row number*/
proc transpose data=have1 out=want0;
by row;
run;
/*Final dataset by removing missing values*/
data want;
set want0;
drop row;
if COL1=. then delete;
rename _NAME_=Response
COL1=Treatment;
run;
proc sort data=want; by Response; run;
proc print data=want; run;
If that is your data for which you must use SAS just read so that you get the structure required for ANOVA.
data have;
do rep=1 to 3;
do trt='A','B','C','D';
input y #;
output;
end;
end;
cards;
26.72 37.42 11.23 44.33
28.58 56.46 29.63 76.86
29.71 51.91 . .
;;;;
run;
proc print;
run;

SAS: Using Do/Loop in a Proc Transpose

I'm not very familiar with Do Loops in SAS and was hoping to get some help. I have data that looks like this:
Product A: 1
Product A: 2
Product A: 4
I'd like to transpose (easy) and flag that Product A: 3 is missing, but I need to do this iteratively to the i-th degree since the number of products is large.
If I run the transpose part in SAS, my first column will be 1, second column will be 2, and third column will be 4 - but I'd really like the third column to be missing and the fourth column to be 4.
Any thoughts? Thanks.
Get some sample data:
proc sort data=sashelp.iris out=sorted;
by species;
run;
Determine the largest column we will need to transpose to. Depending on your situation you may just want to hardcode this value using a %let max=somevalue; statement:
proc sql noprint;
select cats(max(sepallength)) into :max from sorted;
quit;
%put &=max;
Transpose the data using a data step:
data want;
set sorted;
by species;
retain _1-_&max;
array a[1:&max] _1-_&max;
if first.species then do;
do cnt = lbound(a) to hbound(a);
a[cnt] = .;
end;
end;
a[sepallength] = sepallength;
if last.species then do;
output;
end;
keep species _1-_&max;
run;
Notice we are defining an array of columns: _1,_2,_3,..._max. This happens in our array statement.
We then use by-group processing to populate these newly created columns for a single species at a time. For each species, on the first record, we clear the array. For each record of the species, we populate the appropriate element of the array. On the final record for the species output the array contents.
You need a way to tell SAS that you have 4 products and the values are 1-4. In this example I create dummy ID with the needed information then transpose using ID statement to name new variables using the value of product.
data product;
input id product ##;
cards;
1 1 1 2 1 4
2 2 2 3
;;;;
run;
proc print;
run;
data productspace;
if 0 then set product;
do product = 1 to 4;
output;
end;
stop;
run;
data productV / view=productV;
set productspace product;
run;
proc transpose data=productV out=wide(where=(not missing(id))) prefix=P;
by id;
var product;
id product;
run;
proc print;
run;

how to swap first and last observations in sas data set if it contain 3 observations

I have a data set with 3 observations, 1 2 3
4 5 6
7 8 9 , now i have to interchange 1 2 3 and 7 8 9.
How can do this in base sas?
If you just want to sort your dataset by a variable in descending order, use proc sort:
data example;
input number;
datalines;
123
456
789
;
run;
proc sort data = example;
by descending number;
run;
If you want to re-order a dataset in a more complex way, create a new variable containing the position that you want each row to be in, and then sort it by that variable.
If you want to swap the contents of the first and last observations while leaving the rest of the dataset in place, you could do something like this.
data class;
set sashelp.class;
run;
data firstobs;
i = 1;
set sashelp.class(obs = 1);
run;
data lastobs;
i = nobs;
set sashelp.class nobs = nobs point = nobs;
output;
stop;
run;
data transaction;
set lastobs firstobs;
/*Swap the values of i for first and last obs*/
retain _i;
if _n_ = 1 then do;
_i = i;
i = 1;
end;
if _n_ = 2 then i = _i;
drop _i;
run;
data class;
set transaction(keep = i);
modify class point = i;
set transaction;
run;
This modifies just the first and last observations, which should be quite a bit faster than sorting or replacing a large dataset. You can do a similar thing with the update statement, but that only works if your dataset is already sorted / indexed by a unique key.
By Sandeep Sharma:sandeep.sharmas091#gmail.com
data testy;
input a;
datalines;
1
2
3
4
5
6
7
8
9
;
run;
data ghj;
drop y;
do i=nobs-2 to nobs;
set testy point=i nobs=nobs;
output;
end;
do n=4 to nobs-3;
set testy point=n;
output;
end;
do y=1 to 3;
set testy;
output;
end;
stop;
run;