SAS combine two columns in a dataset - sas

I have a dataset that looks like this:
Category Sub-Category value
A 1 xx
A 2 xx
A 3 xx
A 4 xx
B 1 xx
B 2 xx
B 3 xx
B 4 xx
I want to combine the first two columns and create a new dataset in which each category becomes a header row followed by its sub-category rows. It should look like this:
Category Value
A
1 xx
2 xx
3 xx
4 xx
B
1 xx
2 xx
3 xx
4 xx
...
Can anyone help me with that using SAS?
Thanks in advance!

That seems pretty useless as a DATASET.
Why not just print the original dataset?
proc print data=have;
by category;
run;
If you did want to generate that goofy dataset you could try interleaving two copies of the original data.
data want;
set have(in=in1) have(in=in2);
by category;
if in1 then do;
if first.category then call missing(value);
else delete;
end;
if in2 then category=cats(subcategory);
drop subcategory ;
run;

You can do this in a data step with multiple output statements. The only tricky part is the first row: you'll need to retain the original value to create the "header" row, output, then output a second time to create the first subcategory row.
data want;
set have end=eof;
by category subcategory;
if(first.category) then do;
/* "Header" row: */
_value_ = value;
value = ' ';
output;
/* First subcategory row: */
value = _value_;
category = strip(subcategory);
output;
end;
/* Subsequent subcategory rows */
else do;
category = strip(subcategory);
output;
end;
/* Add a space between categories except the last row */
if(last.category AND NOT eof) then do;
call missing(value, category);
output;
end;
drop _value_ subcategory;
run;
Result:
category value
A
1 xx
2 xx
3 xx
4 xx
B
1 xx
2 xx
3 xx
4 xx

data have;
input Category $ SubCategory value $;
datalines;
A 1 xx
A 2 xx
A 3 xx
A 4 xx
B 1 xx
B 2 xx
B 3 xx
B 4 xx
;
data want;
set have;
by Category;
if first.Category then do;
v = value;
value = '';
output;
value = v;
Category = put(SubCategory, 8. -l);
output;
end;
else do;
Category = put(SubCategory, 8. -l);
output;
end;
drop SubCategory v;
run;

Proc REPORT can produce the desired rendering of the data set.
proc report data=have;
columns category subcategory value;
define category / order noprint;
compute before category;
line category $200.;
endcomp;
run;
Some alternative renderings
proc print data=have;
by category;
id category;
var subcategory value;
run;
proc tabulate data=have;
class category subcategory;
var value;
table
category
* subcategory
,
value
;
run;
data haveR;
set have;
by category;
if first.category then output;
output;
run;
title "Report - indent";
proc report data=haveR;
column category subcategory cat value;
define category / order noprint;
define subcategory / display noprint;
define value / display;
define cat / computed style=[just=left] 'Category/hierarchy';
compute cat / length=200;
endcomp;
compute value;
if not missing(category) then do;
priorcategory = category;
cat = category;
value = .;
end;
else do;
cat = repeat('A0'x,3) || cats(subcategory);
end;
endcomp;
run;

Related

Count function for the last columns

How do I write a count function for the last 3 columns in my dataset, taking into consideration that the names of the last 3 columns always change?
Data test;
Set test1;
Count=count(column12,column13,column14);
Run;
You could also use an ARRAY, combined with the old "if 0 then set" trick so that the array is declared before ID is added to the data step.
data have;
retain id x1-x5 z1-z6 . z7-z10 . a ' ' z11-z12 . ;
id+1; z10 = 1; output;
id+1; z11 = 3.14159; output;
id+1; z12 = 42; output;
format _numeric_ 4.;
run;
data want;
if 0 then set have(drop=id /*or other numeric vars as needed*/);
array _v[*] _numeric_;
set have;
nmiss_3 = nmiss(_v[dim(_v)],_v[dim(_v)-1],_v[dim(_v)-2]);
run;
data want; /*move ID and NMISS_3 back to left*/
if 0 then set want(keep=id nmiss_3);
set want;
run;
proc print;
run;
You can query a data set's metadata to get the names of the last three columns.
data have;
retain id x1-x15 z1-z12 . ;
id+1; z10 = 1; output;
id+1; z11 = 3.14159; output;
id+1; z12 = 42; output;
format _numeric_ 4.;
run;
* get data sets metadata;
proc contents noprint data=have out=have_metadata;
run;
* query for names of last 3 columns;
proc sql noprint;
select name
into :last_three_columns separated by ','
from have_metadata
having varnum > max(varnum) - 3
;
%put NOTE: &=last_three_columns;
data want;
attrib last3_nmiss_count length=8;
set have;
last3_nmiss_count = nmiss(&last_three_columns);
run;
dm 'viewtable
want(keep=last3_nmiss_count id z:)';

SAS - Row by row Comparison within different ID Variables of Same Dataset and delete ALL Duplicates

I need some help in trying to execute a comparison of rows within different ID variable groups, all in a single dataset.
That is, if there is any duplicate observation within two or more ID groups, then I'd like to delete the observation entirely.
I want to identify any duplicates between rows of different groups and delete the observation entirely.
For example:
ID Value
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
The output I desire is:
ID Value
1 D
3 Z
I have looked online extensively, and tried a few things. I thought I could mark the duplicates with a flag and then delete based off that flag.
The flagging code is:
data have;
set want;
flag = first.ID ne last.ID;
run;
This worked for some cases, but I also got duplicates within the same value group flagged.
Therefore the first observation got deleted:
ID Value
3 Z
I also tried:
data have;
set want;
flag = first.ID ne last.ID and first.value ne last.value;
run;
but that didn't mark any duplicates at all.
I would appreciate any help.
Please let me know if any other information is required.
Thanks.
Here's a fairly simple way to do it: sort and deduplicate by value + ID, then keep only rows with values that occur only for a single ID.
data have;
input ID Value $;
cards;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
;
run;
proc sort data = have nodupkey;
by value ID;
run;
data want;
set have;
by value;
if first.value and last.value;
run;
proc sql version:
proc sql;
create table want as
select distinct ID, value from have
group by value
having count(distinct id) =1
order by id
;
quit;
This is my interpretation of the requirements.
Find levels of value that occur in only 1 ID.
data have;
input ID Value:$1.;
cards;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
;;;;
proc print;
proc summary nway; /*Dedup*/
class id value;
output out=dedup(drop=_type_ rename=(_freq_=occr));
run;
proc print;
run;
proc summary nway;
class value;
output out=want(drop=_type_) idgroup(out[1](id)=) sum(occr)=;
run;
proc print;
where _freq_ eq 1;
run;
proc print;
run;
A slightly different approach can use a hash object to track the unique values belonging to a single group.
data have; input
ID Value:& $1.; datalines;
1 A
1 B
1 C
1 D
1 D
2 A
2 C
3 A
3 Z
3 B
run;
proc delete data=want;
proc ds2;
data _null_;
declare package hash values();
declare package hash discards();
declare double idhave;
method init();
values.keys([value]);
values.data([value ID]);
values.defineDone();
discards.keys([value]);
discards.defineDone();
end;
method run();
set have;
if discards.find() ne 0 then do;
idhave = id;
if values.find() eq 0 and id ne idhave then do;
values.remove();
discards.add();
end;
else
values.add();
end;
end;
method term();
values.output('want');
end;
enddata;
run;
quit;
%let syslast = want;
I think what you should do is:
data want;
set have;
by ID value;
if not first.value then flag = 1;
else flag = 0;
run;
This basically flags all occurrences of a value except the first for a given ID.
Also I changed want and have assuming you create what you want from what you have. Also I assume have is sorted by ID value order.
Also this will only flag 1 D above. Not 3 Z
Additional Inputs
Can't you just do a sort to get rid of the duplicates:
proc sort data = have out = want nodupkey dupout = not_wanted;
by ID value;
run;
So if you process the observations by VALUE levels (instead of by ID levels), you just need to keep track of whether any ID is ever different from the first one.
data want ;
do until (last.value);
set have ;
by value ;
if first.value then first_id=id;
else if id ne first_id then remapped=1;
end;
if not remapped;
keep value id;
run;

Setting names to idgroup

Follow up to
SAS - transpose multiple variables in rows to columns
I have the following code:
data have;
input CX_ID 1. TYPE $1. COUNT_RATE 1. SUM_RATE 2.;
datalines;
1A110
1B220
2A120
;
run;
proc summary data = have nway;
class cx_id;
output out=want (drop = _:)
idgroup(out[2] (count_rate sum_rate)= count sum);
run;
So this table:
CX_ID TYPE COUNT_RATE SUM_RATE
1 A 1 10
1 B 2 20
2 A 1 20
becomes
CX_ID COUNT_1 COUNT_2 SUM_1 SUM_2
1 1 2 10 20
2 1 . 20 .
Which is perfect, but how do I set the names to be
Count_A Count_B Sum_A Sum_B
Or in general whatever the value in the type field of the have table ?
Thank you
A double PROC TRANSPOSE is dynamic and you can add a data step to customize the names easily.
*sample data;
data have;
input CX_ID 1. TYPE $1. COUNT 1. SUM 2.;
datalines;
1A110
1B220
2A120
;
run;
*transpose to long;
proc transpose data=have out=long;
by cx_id type;
run;
*transpose to wide;
proc transpose data=long out=wide;
by cx_id;
var col1;
id _name_ type;
run;
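As a possible shortcut for the naming step, the second transpose can build underscore-separated names itself. This is a minimal sketch, assuming a SAS release in which PROC TRANSPOSE accepts the DELIMITER= option (9.2 or later, as far as I know). The var statement in the first transpose and the drop of _name_ from the output are additions for illustration, not part of the code above.

```sas
*sample data (same layout as above);
data have;
    input CX_ID 1. TYPE $1. COUNT 1. SUM 2.;
    datalines;
1A110
1B220
2A120
;
run;

*transpose to long: one row per cx_id, type and variable;
proc transpose data=have out=long;
    by cx_id type;
    var count sum;
run;

*transpose to wide: delimiter=_ joins the two ID values with an underscore;
proc transpose data=long out=wide(drop=_name_) delimiter=_;
    by cx_id;
    var col1;
    id _name_ type;
run;
```

With this data the new columns should be named from the pairs COUNT/SUM and A/B, for example COUNT_A and SUM_B. A type that does not occur for a given cx_id is left missing.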

Report using data _Null_

I'm looking for a report produced with a SAS data step.
I have a data set:
Name Company Date
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
Report I'm looking for:
Name Company From To
X A 1997/05 1998/02
D 1998/02 1999/01
Y B 2003/09 2004/05
F 2003/09 2003/09
Z C 2002/10 2005/03
M 2001/09 2001/09
W G 2000/10 2000/10
Thank you.
I tried using proc print, but the result is not accurate, so I am looking for a data _null_ solution.
data _null_;
set salesdata;
by name company date;
array x(*) from;
From=lag(date);
if first.name then count=1;
do i=count to dim(x);
x(i)=.;
end;
count+1;
If first.company then do;
from_date1=date;
end;
if last.company then To_date=date;
if from_date1 ="" and to_date="" then delete;
run;
data _null_;
    set yourEvents;
    by Name Company notsorted;
    file print;
    /* Column headers, written once. @n moves to column n on the current line */
    If _N_ EQ 1 then put
        @01 'Name'
        @06 'Company'
        @14 'From'
        @22 'To'
        ;
    if first.Name then put
        @01 Name
        @; ** The trailing @ instructs sas to not start a new line for the next put instruction **;
    retain From To;
    /* Reset the running minimum and maximum at the start of each company */
    if first.company then do;
        From = 1E9;
        To = 0;
    end;
    if Date LT From then From = Date;
    if Date GT To then To = Date;
    /* One report line per company */
    if last.Company then put
        @06 Company
        @14 From yymm7.
        @22 To yymm7.
        ;
run;
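The step above prints From and To with a date format, which presumes that Date holds a SAS date value. If Date is stored as a plain number such as 199802 (an assumption about the asker's data, which is not stated), it could be converted first. A minimal sketch, using the first three rows of the sample data and a hypothetical helper variable RawDate:

```sas
/* Sketch: build yourEvents with Date as a real SAS date.
   Assumes the raw value is a plain YYYYMM number such as 199802. */
data yourEvents;
    input Name $ Company $ RawDate;
    /* put() renders the number as 6 characters of text,
       and the yymmn6. informat reads that text as the first day of the month */
    Date = input(put(RawDate, 6.), yymmn6.);
    format Date yymms7.;  /* year/month with a slash separator */
    drop RawDate;         /* RawDate is only a helper for this sketch */
    datalines;
X A 199802
X A 199705
X D 199901
;
run;
```

To get the slash-separated layout asked for in the question, the two yymm7. formats in the put statements could be swapped for yymms7. as well.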
I used a data step to calculate From_date and To_date,
and then proc report to print the report by group.
proc sort data=have ;
by Name Company Date;
run;
data want(drop=prev_date date);
set have;
by Name Company date;
attrib From_Date To_date format=yymms10.;
retain prev_date;
if first.Company then prev_date=date;
if last.Company then do;
To_date=Date;
From_Date=prev_date;
end;
if not(last.company) then delete;
run;
proc sort data=want;
by descending name ;
run;
proc report data=want;
define Name/order order=data;
run;
IMHO, the simplest way is to exploit proc report and its analysis column type, as in the code below. Note that the name and company columns are automatically sorted in alphabetical order (as most summary functions and procedures do).
/* your data */
data have;
infile datalines;
input Name $ Company $ Date $;
cards;
X A 199802
X A 199705
X D 199901
y B 200405
y F 200309
Z C 200503
Z C 200408
Z C 200404
Z C 200309
Z C 200210
Z M 200109
W G 200010
;
run;
/* convert YYYYMM to date */
data have2(keep=name company date);
set have(rename=(date=date_txt));
name = upcase(name);
y = input(substr(date_txt, 1, 4), 4.);
m = input(substr(date_txt, 5, 2), 2.);
date = mdy(m,1,y);
format date yymms7.;
run;
/****** 1. proc report ******/
proc report data=have2;
columns name company date=date_from date=date_to;
define name / 'Name' group;
define company / 'Company' group;
define date_from / 'From' analysis min;
define date_to / 'To' analysis max;
run;
(tested on SAS 9.4 win7 x64)
============================ OFFTOPIC ==============================
One may also consider using proc means or proc tabulate. The basic code forms are shown below. However, you can also see that further adjustments in output formats are required.
/***** 2. proc tabulate *****/
proc tabulate data=have2;
class name company;
var date;
table name*company, date=' '*(min='From' max='To')*format=yymms7.;
run;
/***** 3. proc means (not quite there) *****/
* proc means + ODS -> cannot recognize date formats;
proc means data=have2 nonobs min max;
class name company;
format date yymms7.; * in vain;
var date;
run;
In the proc means output the date format is not applied (I don't know why).
You may leave comments on improving these alternative ways.

How to swap the first and last observations in a SAS data set if it contains 3 observations

I have a data set with 3 observations:
1 2 3
4 5 6
7 8 9
Now I have to interchange 1 2 3 and 7 8 9.
How can I do this in Base SAS?
If you just want to sort your dataset by a variable in descending order, use proc sort:
data example;
input number;
datalines;
123
456
789
;
run;
proc sort data = example;
by descending number;
run;
If you want to re-order a dataset in a more complex way, create a new variable containing the position that you want each row to be in, and then sort it by that variable.
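The position-variable idea can be sketched as follows. This is only an illustration under stated assumptions: the dataset names keyed and swapped and the variable position are hypothetical, and the rule used here (swap the first and last rows, keep the rest in place) is just one possible ordering.

```sas
data example;
    input number;
    datalines;
123
456
789
;
run;

/* Assign each row the position it should end up in */
data keyed;
    set example nobs=n;                  /* n = total number of observations */
    if _n_ = 1 then position = n;        /* first row goes last */
    else if _n_ = n then position = 1;   /* last row goes first */
    else position = _n_;                 /* every other row stays put */
run;

/* Sort by the new position and drop the helper variable */
proc sort data=keyed out=swapped(drop=position);
    by position;
run;
```

Any other reordering rule can be substituted in the middle step as long as it gives every row a distinct position.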
If you want to swap the contents of the first and last observations while leaving the rest of the dataset in place, you could do something like this.
data class;
set sashelp.class;
run;
data firstobs;
i = 1;
set sashelp.class(obs = 1);
run;
data lastobs;
i = nobs;
set sashelp.class nobs = nobs point = nobs;
output;
stop;
run;
data transaction;
set lastobs firstobs;
/*Swap the values of i for first and last obs*/
retain _i;
if _n_ = 1 then do;
_i = i;
i = 1;
end;
if _n_ = 2 then i = _i;
drop _i;
run;
data class;
set transaction(keep = i);
modify class point = i;
set transaction;
run;
This modifies just the first and last observations, which should be quite a bit faster than sorting or replacing a large dataset. You can do a similar thing with the update statement, but that only works if your dataset is already sorted / indexed by a unique key.
By Sandeep Sharma:sandeep.sharmas091#gmail.com
data testy;
input a;
datalines;
1
2
3
4
5
6
7
8
9
;
run;
data ghj;
drop y;
do i=nobs-2 to nobs;
set testy point=i nobs=nobs;
output;
end;
do n=4 to nobs-3;
set testy point=n;
output;
end;
do y=1 to 3;
set testy;
output;
end;
stop;
run;