How to make a dataset where the last variables are in one column - SAS

I have a dataset:
1 300 apple pear onion
1 302 banana tomato cookie
2 302 bread meat tomato
How can I make a dataset where the last variables are in one column?
What I need:
Dataset

You need to look at the CATX function (or its siblings CATS, CATT, CATQ, and CAT). The first argument to CATX is the delimiter:
new_var = catx(' ', var1, var2, var3);
Or a couple of other options:
new_var = catx(' ', of var:);
new_var = catx(' ', of var1-var3);
These work if the variable names all share the same prefix.
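As a minimal sketch of the CATX approach on the sample data, assuming the three character variables are named var3-var5 (the output names want_wide and all_items and the length of 60 are illustrative choices, not from the question):

```sas
data have;
  input var1 var2 var3 $ var4 $ var5 $;
  datalines;
1 300 apple pear onion
1 302 banana tomato cookie
2 302 bread meat tomato
;
run;

data want_wide;
  set have;
  /* Declare the length up front so the joined string is not truncated */
  length all_items $60;
  /* CATX takes the delimiter first, then the values to join */
  all_items = catx(' ', of var3-var5);
  keep var1 var2 all_items;
run;
```

This puts the three values into one variable on the same row; if one row per value is wanted instead, the PROC TRANSPOSE approach is the better fit.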

Use proc transpose with your categories in the by statement and the variables to transpose in the var statement:
data have;
input var1 var2 var3 $ var4 $ var5 $;
datalines;
1 300 apple pear onion
1 302 banana tomato cookie
2 302 bread meat tomato
;
run;
proc transpose data=have out=want (drop=_name_ rename=(col1 = fruit));
by var1 var2;
var var3 var4 var5;
run;

How do I conditionally select variables in PROC SQL?

I have calculated a frequency table in a previous step. Excerpt below:
I want to automatically drop all variables from this table where the frequency is missing. In the excerpt above, that would mean the variables "Exkl_UtgUtl_Taxi_kvot" and "Exkl_UtgUtl_Driv_kvot" would need to be dropped.
I try the following step in PROC SQL (which ideally I will repeat for all variables in the table):
PROC SQL;
CREATE TABLE test3 as
SELECT (CASE WHEN Exkl_UtgUtl_Flyg_kvot!=. THEN Exkl_UtgUtl_Flyg_kvot ELSE NULL END)
FROM stickprovsstorlekar;
quit;
This fails, however, since SAS does not like NULL values. How do I do this?
I tried just writing:
PROC SQL;
CREATE TABLE test3 as
SELECT (CASE WHEN Exkl_UtgUtl_Flyg_kvot!=. THEN Exkl_UtgUtl_Flyg_kvot)
FROM stickprovsstorlekar;
quit;
But that just generates a variable with an automatically generated name (like DATA_007). I want all variables containing missing values to be totally excluded from the results.
Let's say you have 10 variables, where var1, var3, var5, var7, and var9 have missing values in the first observation. We want to select only the variables with no missing observations.
var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
. 8 . 9 . 6 . 1 . 4
5 1 2 7 2 7 2 9 7 7
5 9 7 7 6 8 5 6 4 9
...
First, let's find all variables that have missing observations:
proc means data=have noprint;
var _NUMERIC_;
output out=missing nmiss=;
run;
Then transpose this output table so it's easier to work with:
proc transpose data=missing out=missing_tpose;
run;
We now have a table that looks like this:
_NAME_ COL1
_TYPE_ 0
_FREQ_ 10
var1 1
var2 0
var3 1
var4 0
var5 1
var6 0
var7 1
var8 0
var9 1
var10 0
When COL1 is > 0 and the name is not _TYPE_ or _FREQ_, the variable has missing values. Let's extract the names of the variables with no missing values (COL1 = 0) from _NAME_ into a comma-separated list.
proc sql noprint;
select _NAME_
into :vars separated by ','
from missing_tpose
where COL1 = 0 AND _NAME_ NOT IN('_TYPE_', '_FREQ_')
;
quit;
Run %put &vars; and you'll see the names of all variables with no missing values, ready to be passed into SQL.
var2,var4,var6,var8,var10
Now we have a dynamic way to select variables with only non-missing values.
proc sql;
create table want as
select &vars
from have
;
quit;
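A variant of the same idea, sketched under the assumption that at least one variable has missing values: collect the names to drop rather than the names to keep, and pass them to the DROP= data set option. The macro variable name dropvars is hypothetical. Unlike the SELECT list above, this leaves any character variables in place, since PROC MEANS only inspected the numeric ones.

```sas
proc sql noprint;
  /* Names of numeric variables with at least one missing value */
  select _NAME_
    into :dropvars separated by ' '
    from missing_tpose
    where COL1 > 0 and _NAME_ not in ('_TYPE_', '_FREQ_')
  ;
quit;

data want;
  /* If no variable qualified, dropvars is never created and this step fails */
  set have (drop=&dropvars);
run;
```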

SAS: How to replace delimit and split column to multiple rows?

I have just started to learn SAS programming, and I'm experimenting with replacing the phrase "####" with "|" before splitting the cell into multiple rows in SAS Studio.
I have created an example below for this experiment. It was adapted from How to split a column into multiple rows in SAS, but I couldn't get it to work. The SYSTEM_ID column prints fine, but ITEM_LIST is not splitting.
My current output is as follows:
Here's my current code. Please help.
data example1;
input SYSTEM_ID $ ITEM_LIST $ 5-50 ;
datalines;
ID_1 Apple Juice #### Orange #### Banana Milk
;
run;
data example2 (keep=SYSTEM_ID ITEM_LIST_SUB rename=(ITEM_LIST_SUB=ITEM_LIST));
set example1;
ITEM_LIST_TRANS = tranwrd(ITEM_LIST,"####","|");
do i = 1 to countw(ITEM_LIST_TRANS,"|");
ITEM_LIST_SUB = scan(ITEM_LIST,i,"|");
output;
end;
run;
proc print data = example2;run;
There are two small problems with your otherwise fine solution :-)
You reference ITEM_LIST in the Scan Function instead of ITEM_LIST_TRANS
In your example data, your data is indented, so ID_1 becomes part of ITEM_LIST.
See if this works for you
data example1;
input SYSTEM_ID $ ITEM_LIST $ 5-50 ;
datalines;
ID_1 Apple Juice #### Orange #### Banana Milk
;
run;
data example2 (keep=SYSTEM_ID ITEM_LIST_SUB rename=(ITEM_LIST_SUB=ITEM_LIST));
set example1;
ITEM_LIST_TRANS = tranwrd(ITEM_LIST,"####","|");
do i = 1 to countw(ITEM_LIST_TRANS,"|");
ITEM_LIST_SUB = scan(ITEM_LIST_TRANS,i,"|");
output;
end;
run;
proc print data = example2;run;
Result:
Obs SYSTEM_ID ITEM_LIST
1 ID_1 Apple Juice
2 ID_1 Orange
3 ID_1 Banana Milk
DLMSTR will allow direct read.
data example1;
infile cards dlmstr='####' missover;
input SYSTEM_ID $ @; /* trailing @ holds the line for the next INPUT */
length item $50;
do until(missing(item));
input item @;
if not missing(item) then output;
end;
datalines;
ID_1 Apple Juice #### Orange #### Banana Milk
ID_2 Apple Juice #### Orange #### Banana Milk #### Apple Juice #### Orange #### Banana Milk
;
run;
proc print;
run;

sas relative frequencies by group

I have a categorical variable, say SALARY_GROUP, and a group variable, say COUNTRY. I would like to get the relative frequency of SALARY_GROUP within COUNTRY in SAS. Is it possible to get it by proc SUMMARY or proc means?
Perhaps explore proc tabulate and a counter variable?
Yes. Both Proc Means and Proc Summary can produce the counts of a categorical variable within each group, from which the relative frequencies follow. For both procs you have to:
-Specify NWAY in the proc statement,
-Specify in the Class statement your categorical fields,
-Specify in the Var statement your response or numeric field.
Example below is for proc means:
Dummy Data:
/*Dummy Data*/
data work.have;
input Country $ Salary_Group $ Value;
datalines;
USA Group1 100
USA Group1 100
GBR Group1 100
GBR Group1 100
USA Group2 20
USA Group2 20
GBR Group2 20
GBR Group1 100
;
run;
Code:
/* Calculating frequency and saving output to table sg_means */
proc means data=have n nway ;
class Country Salary_Group;
var Value;
output out=sg_means n=frequency;
run;
Output Table:
Country=GBR Salary_Group=Group1 _TYPE_=3 _FREQ_=3 frequency=3
Country=GBR Salary_Group=Group2 _TYPE_=3 _FREQ_=1 frequency=1
Country=USA Salary_Group=Group1 _TYPE_=3 _FREQ_=2 frequency=2
Country=USA Salary_Group=Group2 _TYPE_=3 _FREQ_=2 frequency=2
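The table above holds counts, not shares. One way to turn them into relative frequencies within each country is a PROC SQL step that divides each count by its country total; this is a sketch, and the names sg_rel and rel_freq are illustrative rather than part of the original answer.

```sas
proc sql;
  create table sg_rel as
  select Country,
         Salary_Group,
         frequency,
         /* sum(frequency) is the country total because of GROUP BY Country */
         frequency / sum(frequency) as rel_freq format=percent8.1
  from sg_means
  group by Country
  order by Country, Salary_Group;
quit;
```

SAS writes a note to the log that the query requires remerging summary statistics with the original data; that is expected here. With the dummy data, GBR comes out as 3/4 and 1/4, and USA as 2/4 for each salary group.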

How to match data in SAS

I have a dataset which contains three variables: var1, var2, and Price. Price is the price of var2, and var1 is a subsample of var2. I want to find the price of each product in var1 by matching the names in var1 against var2.
The data looks like this. Can anyone help me solve this, please? Many thanks.
Var1 Var2 Price
apple ?
apple 2
banana ?
banana 2.1
apple ?
orange ?
orange 4
banana ?
yoghurt 2
You could do this through SQL by merging your prices onto your dataset by var1/var2:
proc sql ;
create table output as
select a.var1, a.var2, b.price
from input a
left join (select distinct var2, price
from input
where not missing(var2)) as b
on (a.var1=b.var2
or a.var2=b.var2)
;quit ;
Try using a hash table.
data want;
if 0 then set have(keep=var2 price where=(not missing(var2)));
if _n_=1 then do;
declare hash h (dataset:'have(keep=var2 price where=(not missing(var2)))');
h.definekey('var2');
h.definedata('price');
h.definedone();
call missing(var2,price);
end;
set have;
rc=h.find(key:var1);
drop rc;
run;

Suppress column headings in proc report

My boss would like me to create a chart and table in SAS similar to something you can produce in excel, where the data table sits below the chart. This would mean using the data on the x-axis and placing more data below it.
Desired output
(chart area) (Row 1) Building 1 Building 2 Building 3 Building 4
(Row 2) 333 267 234 235
(Row 3) 3232 213 3215 657
I'm not sure how to do this in proc report, where the data runs long, instead of wide. Also, the data set is long:
Building ID var1 var2
Building 1 333 3232
Building 2 267 213
CarolinaJay's suggestion of a PROC GCHART or SGPLOT or whatnot followed by another proc is the way to go, IMO; while you could do both at once, it's a lot more work to do so.
To accomplish your specific table, I recommend PROC TABULATE; it doesn't care what direction your data goes.
data have;
informat buildingID $12.;
input BuildingID $ var1 var2;
datalines;
Building1 333 3232
Building2 267 213
;;;;
run;
proc tabulate data=have;
class buildingID;
var var1 var2;
tables (var1 var2)*sum=' ', buildingID=' ';
run;
Plop that under a plot, and you have something like this (I have no idea how to plot this so I just picked something totally at random):
ods _all_ close;
ods html;
data have;
informat buildingID $12.;
input BuildingID $ var1 var2;
datalines;
Building1 333 323
Building2 267 213
;;;;
run;
proc sgplot data=have;
vbar var1/response=var2 group=buildingID;
run;
title;
proc tabulate data=have;
class buildingID;
var var1 var2;
tables (var1 var2)*sum=' ', buildingID=' ';
run;
ods html close;