SAS: Proc Report span columns instead of rows

Take the following data as an example:
data test;
length IID_p PREF_p DATE_p IID IID_c PREF_c DATE_c $12;
input IID_p $ PREF_p $ DATE_p $ IID $ IID_c $ PREF_c $ DATE_c;
datalines;
ABC SHARE 20161024 ABC ABC NOSHARE 20161031
DEF SHARE 20161024 DEF DEF NOSHARE 20161031
HIJ NOSHARE 20161024 HIJ Notfound Notfound Notfound
XYZ NOSHARE 20161024 XYZ Notfound Notfound Notfound
;
run;
After a merge, I have the data above, in which HIJ and XYZ are missing from the current week's data and are flagged with the value Notfound.
In the following PROC REPORT step (watered down for brevity), is it possible to span Notfound across all three columns under the "Current week" header?
proc report data = merged spanrows nowd;
column ("Previous week"(IID_p PREF_p DATE_p)) ("Current week"(IID_c PREF_c DATE_c));
run;
So instead of the output below, Notfound would span all three columns rather than appear in each individual column:

In PROC REPORT you can span rows and columns only in the header, not in the values/data.
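If the goal is mainly to avoid repeating the text three times, one possible workaround is to prepare the data before reporting. This is only a sketch under assumptions: it reads the test data set from the question, the name report_ready is made up for illustration, and it does not merge any cells. It simply shows the text once and leaves the other two current-week columns blank.

```sas
/* Blank out the repeated flag so it appears only in the first current-week column */
data report_ready;               /* hypothetical data set name */
  set test;
  if IID_c = 'Notfound' then do;
    IID_c  = 'Not found';        /* fits in the $12 length declared in the question */
    PREF_c = ' ';
    DATE_c = ' ';
  end;
run;

proc report data=report_ready nowd;
  column ("Previous week" IID_p PREF_p DATE_p)
         ("Current week"  IID_c PREF_c DATE_c);
run;
```

The cell borders between the three columns remain, so this is a cosmetic approximation rather than a true column span.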

Related

How do I avoid spaces/tabs in columns names when I use proc transpose?

The best way to illustrate my problem is by giving an example:
data tst;
input ColA $ ColB;
datalines;
Cat1 1
Cat2 2
Cat3 3
;
run;
proc transpose data=tst out=tst_out(drop=_name_);
id ColA;
run;
When running this code my column names look something like this:
Basically I want the column names to be "Cat1", "Cat2", "Cat3" and not " Cat1", " Cat2", " Cat3".
(If that is not possible then I have an alternative question: How do I remove the spaces AFTER proc transpose? In my real data set I have a lot of columns so I prefer a method where I don't have to type for every column)
Just change the setting of the VALIDVARNAME option to V7 instead of ANY. It won't remove the leading spaces/tabs, but it will change them to underscores so that the results are valid names.
Example:
data tst;
input ColA $& ColB;
datalines;
Cat 1  1
Cat 2  2
Cat 3  3
;
options validvarname=v7;
proc transpose data=tst out=tst2; id cola ; var colb; run;
proc print;
run;
Result:
Obs _NAME_ Cat_1 Cat_2 Cat_3
1 ColB 1 2 3
PS: When using in-line data in your SAS program, make sure to start the lines of data in the first column. That will prevent the accidental inclusion of spaces (or tabs when using the SAS Studio interface) in the lines of data. Placing the DATALINES (also known as CARDS) statement starting in the first column will also prevent the editor from automatically indenting when you start adding lines of data.
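For the follow-up question (cleaning up without typing every column name), one possibility is to clean the ID values before transposing instead of renaming the columns afterwards. A minimal sketch, assuming the stray characters are whitespace inside ColA of the question's tst data set; tst_clean is a made-up name:

```sas
data tst_clean;                  /* hypothetical data set name */
  set tst;
  /* the 's' modifier tells COMPRESS to remove blanks, tabs and other whitespace */
  ColA = compress(ColA, , 's');
run;

proc transpose data=tst_clean out=tst_out(drop=_name_);
  id ColA;
  var ColB;
run;
```

Note that this removes every whitespace character, so a value such as "Cat 1" would become "Cat1". If embedded blanks matter, a narrower cleanup would be needed.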

sas issue with retain to replace missing data

The following inherited simplified code is meant to replace missing values of a column with the values of not missing entries in a group:
DATA WORK.TOYDATA;
INPUT Category $ PRICE;
DATALINES;
Cat1 2
Cat1 .
Cat1 .
Cat2 .
Cat2 3
Cat2 .
;
DATA WORK.OUTTOYDATA;
SET WORK.TOYDATA;
BY Category ;
RETAIN _PRICE;
IF FIRST.Category THEN _PRICE=PRICE;
IF NOT MISSING(PRICE) THEN _PRICE=PRICE;
ELSE PRICE=_PRICE;
DROP _PRICE;
RUN;
Unfortunately, this will not work if the first entry in a group is missing. How could this be fixed?
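Because SAS works through the dataset row by row, nothing has been retained yet when the first value in a group is missing, so there is no value to fill in.
You could sort the data by Category and descending PRICE to circumvent this: missing values sort lowest, so in descending order a non-missing value comes first in each group.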
proc sort data= WORK.TOYDATA; by Category DESCENDING PRICE; run;
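Combined with the RETAIN step from the question, the sort-based fix might look like the sketch below. The output data set name TOYSORTED is an assumption added here so that the original data is left untouched.

```sas
/* Missing numeric values sort lowest, so DESCENDING puts them last in each group */
proc sort data=WORK.TOYDATA out=WORK.TOYSORTED;   /* TOYSORTED is a made-up name */
  by Category descending PRICE;
run;

data WORK.OUTTOYDATA;
  set WORK.TOYSORTED;
  by Category;
  retain _PRICE;
  if first.Category then _PRICE = PRICE;   /* reset at the start of each group */
  if missing(PRICE) then PRICE = _PRICE;   /* fill from the retained value */
  else _PRICE = PRICE;                     /* remember the latest non-missing value */
  drop _PRICE;
run;
```

The rows come out in sorted order rather than the original order, and a category with no non-missing PRICE at all stays missing.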
Or, if there is only one non-missing value per category, you could use an SQL join, e.g.:
proc sql;
create table WORK.OUTTOYDATA as
select a.Category, coalesce(a.PRICE, b.PRICE) as PRICE
from WORK.TOYDATA a
left join (select distinct Category, PRICE
from WORK.TOYDATA
where PRICE ne .
) b
on a.Category eq b.Category
;
quit;
As @Jetzler pointed out, the easiest way is just to sort the data. However, if you have multiple columns with missing values then you'd need to do multiple sorts, which isn't efficient.
An alternative to a join is PROC STDIZE, which can replace missing values with a simple measure (mean, median, sum etc.). The default method will suffice in your example; you just need to add the REPONLY option, which replaces only the missing values and does not standardize the data.
DATA WORK.TOYDATA;
INPUT Category $ PRICE;
DATALINES;
Cat1 2
Cat1 .
Cat1 .
Cat2 .
Cat2 3
Cat2 .
;
run;
proc stdize data=TOYDATA out=want reponly;
by category;
var price;
run;

Extend SAS MACRO to multiple fields

I have a macro inspired by "PROC SQL by Example" that finds duplicate rows based on a single column/field:
data have ;
input name $ term $;
cards;
Joe 2000
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
; run;
%MACRO DUPS(LIB, TABLE, GROUPBY) ;
PROC SQL ;
CREATE TABLE DUPROWS AS
SELECT &GROUPBY, COUNT(*) AS Duplicate_Rows
FROM &LIB..&TABLE
GROUP BY &GROUPBY
HAVING COUNT(*) > 1
ORDER BY Duplicate_Rows;
QUIT;
%MEND DUPS ;
%DUPS(WORK,have,name) ;
proc print data=duprows ; run;
I would like to extend this to look for duplicates based on multiple columns (Rows 1 and 2 in my example), but still be flexible enough to deal with a single column.
In this case it would run the code:
proc sql ;
create table duprows as select name,term,count(*) as Duplicate_Rows
from work.have
group by name,term
HAVING COUNT(*) > 1
;quit;
To produce:
To include an arbitrary number of fields to group on, you can list them all in the groupby macro parameter, but the list must be comma-delimited and surrounded by %quote(). Otherwise SAS will see the commas and think you're providing more macro parameters.
So in your case, your macro call would be:
%dups(lib = work, table = have, groupby = %quote(name, term));
Since &groupby is included in the select and group by clauses, all fields listed will appear in the output and will be used for grouping. This is because when &groupby resolves, it becomes the text name, term.
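To illustrate that the same macro still handles both cases, here is a short usage sketch against the have data set from the question. It uses %str, another macro quoting function commonly used to mask commas in a macro call; treating it as interchangeable with %quote here is an assumption of this sketch.

```sas
/* Single column: positional call, unchanged from the question */
%dups(work, have, name);

/* Several columns: mask the commas so they stay inside one parameter */
%dups(work, have, %str(name, term));

proc print data=duprows;
run;
```

Each call overwrites DUPROWS, so the PROC PRINT shows the result of the second call. With the sample data that is the single row for Joe and 2000, with Duplicate_Rows equal to 2.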

SAS - Proc Compare - show ALL duplicates

While using PROC COMPARE in SAS, is it possible to list all duplicates found? By default, a message is displayed stating the first duplicate found and the total number of duplicates.
For example:
data x1;
input x $ y $ z $ ;
datalines;
222 test abc
qqq test abc
aaa test abc
222 test abc
222 test abc
;
run;
data y1;
input x $ y $ z $ ;
datalines;
222 test abc
qqq test abc
aaa test abc
222 test abc
222 test abc
;
run;
***********************************;
*** sort data;
***********************************;
proc sort data=x1;
by x y;
run;
proc sort data=y1;
by x y;
run;
***********************************;
*** compare data;
***********************************;
proc compare listvar
base=x1
compare = y1;
id x y;
run;
************** END *****************;
Output:
The SAS System
The COMPARE Procedure
Comparison of WORK.X1 with WORK.Y1
(Method=EXACT)
Data Set Summary
Dataset Created Modified NVar NObs
WORK.X1 23OCT14:16:03:38 23OCT14:16:03:38 3 5
WORK.Y1 23OCT14:16:03:38 23OCT14:16:03:38 3 5
Variables Summary
Number of Variables in Common: 3.
Number of ID Variables: 2.
WARNING: The data set WORK.X1 contains a duplicate observation at observation
number 2.
NOTE: At observation 2 the current and previous ID values are:
x=222 y=test.
NOTE: Further warnings for duplicate observations in this data set will not be
printed.
WARNING: The data set WORK.Y1 contains a duplicate observation at observation
number 2.
NOTE: At observation 2 the current and previous ID values are:
x=222 y=test.
NOTE: Further warnings for duplicate observations in this data set will not be
printed.
Observation Summary
Observation Base Compare ID
First Obs 1 1 x=222 y=test
Last Obs 5 5 x=qqq y=test
Number of Observations in Common: 5.
Number of Duplicate Observations found in WORK.X1: 2.
Number of Duplicate Observations found in WORK.Y1: 2.
Total Number of Observations Read from WORK.X1: 5.
Total Number of Observations Read from WORK.Y1: 5.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 5.
NOTE: No unequal values were found. All values compared are exactly equal.
@Joe - thanks for the comment!
Proc Freq might be a good approach to find duplicates. Then just print them out with a Proc Print.
PROC FREQ;
TABLES keyvar / noprint out=keylist;
RUN;
PROC PRINT data=keylist;
WHERE count ge 2;
RUN;
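Applied to the question's data, where the ID variables are x and y, the same idea could be sketched as follows. The choice of x1 as input and the cross-tabulation of both keys are assumptions made for this illustration.

```sas
proc freq data=x1;
  /* cross the ID variables so each x/y combination gets its own count */
  tables x*y / noprint out=keylist;
run;

proc print data=keylist;
  where count ge 2;   /* keep only key combinations that occur more than once */
run;
```

Running the same two steps against y1 would list the duplicated keys in the compare data set.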
I don't think there's a way to get the log or listing to list more than just the first duplicate, if that's what you're going after, using the ID statement.
What you are likely best off doing is using the OUTALL option, and outputting the results to a dataset (if you're not already). Then it would be fairly easy to see the duplicates.
For example:
data class2 class3;
set sashelp.class;
output;
output;
output class3;
run;
proc compare base=class2 compare=class3 out=outclass outall;
id name;
run;
You could also use the BY statement along with the ID statement, if it's sorted; then you'll still have duplicates, but each BY Group has a separate report, so you'd see the duplicates there.
proc compare base=class2 compare=class3 out=outclass outall;
by name;
id name;
run;
Finding the exact number of duplicates for each ID may be better suited to PROC SQL.
Something like:
proc sql;
create table x2 as select
*,
count(*) as n_rows /* number of rows sharing the same x, y, z */
from x1
group by x,y,z;
quit;
Run against either dataset, this reveals any duplicate rows: a row with n_rows greater than 1 is duplicated.

SAS - Find value in a column and display that value in excel export

I am basically trying to do this:
I have a frequency query running on a data set, and it outputs the result to Excel.
I also want to add a column to the output whose value is based on what is listed in a particular cell or a particular column.
How would I go about this? (*very new SAS user)
Without hearing more information, I assume what you're trying to do is save the output of your proc freq and then manipulate it further with a data step.
Simple example of this:
data beer;
length firstname favbrand $20.;
input firstname $
favbrand $;
datalines;
John bud
Steve dogfishhead
Jason coors
Anna anchorsteam
Bob bud
Dan bud
;
run;
proc freq data=beer;
table favbrand / out=freqout;
run;
data beerstat(keep=favbrand status);
set freqout;
* create a new column called "status" based on the count column ;
if (count >=2) then status="popular";
else status = "hipster";
run;
* instead of proc print you can send your output to excel with proc export ;
proc print data=beerstat;
run;
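The export mentioned in the last comment could look roughly like this. It is a sketch only: the output path is a placeholder, and DBMS=XLSX assumes that SAS/ACCESS Interface to PC Files is licensed in your installation.

```sas
/* The path is a placeholder; point it at a folder your SAS session can write to */
proc export data=beerstat
  outfile="/path/to/beerstat.xlsx"
  dbms=xlsx
  replace;   /* overwrite the file if it already exists */
run;
```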