Concatenation in SAS - sas

Please help me with my task. I have this data:
Name Age Height Eyes
Dan 25 174 Blue
Dan 54 165 Black
Jane 33 160 Blue
Kate 19 170 Green
I need:
Name Characteristic
Dan 25
174
Blue
Dan 54
165
Black
Jane 33
160
Blue
Kate 19
170
Green
I tried to do it with concatenation:
Characteristic=Age||Height||Eyes
But that puts all the characteristics on one line instead of stacking them in a column:
Name Characteristic
Dan 25 174 Blue
Dan 54 165 Black
Jane 33160 Blue
Kate 19 170Green
I know I need some kind of split to solve this. Please help me with some advice.

You need to convert everything to character so that it fits in one field. You may also be able to use a data _null_ step to create your report.
Here's how you could transpose your data to one field that you could then use proc report on. This is a transpose problem, not concatenation.
data have;
input Name $ Age Height Eyes $;
cards;
Dan 25 174 Blue
Dan 54 165 Black
Jane 33 160 Blue
Kate 19 170 Green
;
run;
data want;
set have;
characteristic = put(age, 8. -l); output;
characteristic = put(height, 8. -l); output;
characteristic = Eyes; output;
drop age height eyes;
run;
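If the name should appear only on the first row of each person, as in the desired output, a small variation of the step above might do it. This is only a sketch: the dataset name want_blank is made up for illustration, and it assumes a blank Name on the follow-up rows is acceptable.

```sas
/* Same input as above, repeated so the sketch runs on its own */
data have;
  input Name $ Age Height Eyes $;
  cards;
Dan 25 174 Blue
Dan 54 165 Black
Jane 33 160 Blue
Kate 19 170 Green
;
run;

/* want_blank is a hypothetical name, not from the original answer */
data want_blank;
  set have;
  length characteristic $ 8;
  characteristic = put(age, 8. -l);    output;  /* first row keeps the name */
  name = ' ';                                   /* blank the name for the next two rows */
  characteristic = put(height, 8. -l); output;
  characteristic = eyes;               output;
  drop age height eyes;
run;
```

Name is read again from have on the next iteration, so blanking it only affects the rows written for the current person.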
If you were creating a text file or a different output this may be what you want:
data _null_;
set have;
file '/folders/myfolders/want.txt' dlm=" ";
put Name "09"x Age;
put "09"x Height;
put "09"x Eyes;
run;
Hope that helps!

Related

Creating Columns From Stacked Data

Piggy backing on a similar question I asked
(Summing a Column By Group In a Dataset With Macros)...
I have the following dataset:
Month Cost_Center Account Actual Annual_Budget
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
I would like to "splice" it so each month has its own respective column for Actual while summing the numeric values by Account.
So for example, I want the output to look like the following:
Account May_Actual_Sum June_Actual_Sum Annual_Budget
Postage 14562 37960 255251
Phone 4564 2660 32241
The code below, provided by a fellow user, works great when I don't need to disaggregate further by month; however, I'm not sure whether that is possible (I tried adding a BY month statement and it didn't work).
proc means data=Test N SUM NWAY STACKODS;
class Account_Description;
var Actual annual_budget;
by month;
ods output summary = summary_stats1;
output out = summary_stats2 N = SUM= / AUTONAME;
data want;
set summary_stats2;
run;
Use PROC MEANS to get the summaries, same as last time. Please read the documentation on PROC MEANS to understand how the CLASS statement works and how you can control the different levels of output.
Use PROC TRANSPOSE to flip the data wide. Since the budget amount is consistent across rows you'll be fine.
I'm guessing your next set of questions will be how to sort the columns correctly, because your month names won't sort in calendar order, and how to reference them dynamically to calculate the month-to-date changes. Those are some of the reasons why this data structure is not recommended.
data have;
input Month $ Cost_Center $ Account $ Actual Annual_Budget;
cards;
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
;
run;
*summarize;
proc means data=have noprint nway;
class account month;
var actual annual_budget;
output out=temp sum=actual_total budget_total;
run;
*transpose;
proc transpose data=temp out=want prefix=Month_;
by account budget_total;
var actual_total;
id month;
run;
I cannot think of a way to generate this report using just one PROC. You will need to do some post processing of PROC MEANS or PROC SUMMARY results to get to this:
proc means data=have SUM ;
class Account month;
var Actual annual_budget;
output out = summary_stats SUM=;
run;
/* Look at summary_stats to understand its structure here */
/* Otherwise you will not understand the following code */
proc sort data = summary_stats;
where _type_ in (2,3);
by account;
run;
data want;
set summary_stats;
by account ;
retain May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
if first.account then Annual_Budget_sum = Annual_Budget;
else do;
select(month);
when ('May') May_Actual_Sum = actual;
when ('June') June_Actual_Sum = actual;
/* List other months also here. Can use some macros here to make the code compact and expandable for future enhancements */
otherwise; /* ignore unlisted months; without OTHERWISE an unmatched value stops the step with an error */
end;
end;
if last.account then output;
keep account May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
run;
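For what it's worth, a single PROC SQL step with conditional sums may also get close to the requested layout. Treat this as a sketch under assumptions, not a replacement for the answers above: have_small and want_sql are made-up names, only a few rows of the data are repeated, the months are hard-coded, and the budget column assumes every cost center appears once per month with the same Annual_Budget.

```sas
/* A few rows of the original data, so the sketch runs on its own */
data have_small;
  input Month $ Cost_Center $ Account $ Actual Annual_Budget;
  cards;
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53410 Phone 63 137
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53410 Phone 73 137
;
run;

proc sql;
  create table want_sql as
  select Account,
         /* one conditional sum per month; add a line for each further month */
         sum(case when Month = 'May'  then Actual else 0 end) as May_Actual_Sum,
         sum(case when Month = 'June' then Actual else 0 end) as June_Actual_Sum,
         /* assumption: the budget repeats identically in every month,
            so dividing by the number of months counts it once */
         sum(Annual_Budget) / count(distinct Month) as Annual_Budget
  from have_small
  group by Account;
quit;
```

The hard-coded month list has the same maintenance problem mentioned above, so the PROC MEANS plus PROC TRANSPOSE route is probably easier to extend.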

Program for determining if aspirin is significantly better than placebo

I have been tasked with the following problem:
Out of a total 1,000 subjects on aspirin, 80 had heart attacks and 65 had strokes. Out of a total 2,000 subjects on placebo, 240 had heart attacks and 165 had strokes.
I am asked if there is a significant benefit for aspirin therapy for heart attacks and strokes. What is the RR for aspirin use for each of the two outcomes?
My main issue has been setting up the data lines. Here is what I have so far, but my output window doesn't look right.
Another issue is figuring out how to account for the varying sample sizes and the fact that someone might have had a heart attack AND a stroke.
DATA ODDS;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
Stroke 1-Yes 65
Stroke 2-No 165
;
PROC FREQ DATA=ODDS;
TITLE "Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH;
WEIGHT COUNT;
RUN;
Edit 1:
DATA ODDS;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
NoHeartAttack 1-Yes 920
NoHeartAttack 2-No 1760
Stroke 1-Yes 65
Stroke 2-No 165
NoStroke 1-Yes 935
NoStroke 2-No 1835
;
PROC FREQ DATA=ODDS;
TITLE "Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH;
WEIGHT COUNT;
RUN;
This is what your data and code should look like. You may need to flip the order in the TABLES statement so that the Relative Risk is calculated appropriately for your situation. I didn't bother checking that this was the case, as you can easily change if required.
DATA HeartAttack;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
NoHeartAttack 1-Yes 920
NoHeartAttack 2-No 1760
;
PROC FREQ DATA=HeartAttack;
TITLE "Heart Attack Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH Relrisk;
WEIGHT COUNT;
RUN;
DATA Stroke;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
Stroke 1-Yes 65
Stroke 2-No 165
NoStroke 1-Yes 935
NoStroke 2-No 1835
;
PROC FREQ DATA=Stroke;
TITLE "Stroke Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH Relrisk;
WEIGHT COUNT;
RUN;
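As a sanity check on the PROC FREQ output, the relative risk can also be computed by hand from the counts in the question: the risk in the aspirin group divided by the risk in the placebo group. A minimal sketch, where rr_check and its variable names are made up for illustration:

```sas
/* Counts taken from the question; dataset and variable names are hypothetical */
data rr_check;
  input outcome :$12. events_aspirin n_aspirin events_placebo n_placebo;
  risk_aspirin = events_aspirin / n_aspirin;   /* e.g. 80 / 1000  */
  risk_placebo = events_placebo / n_placebo;   /* e.g. 240 / 2000 */
  rr = risk_aspirin / risk_placebo;            /* relative risk, aspirin vs placebo */
  datalines;
HeartAttack 80 1000 240 2000
Stroke 65 1000 165 2000
;
run;

proc print data=rr_check noobs;
run;
```

This gives 0.08 / 0.12, about 0.67, for heart attacks and 0.065 / 0.0825, about 0.79, for strokes. If PROC FREQ reports the reciprocal, the row or column order in the TABLES statement is flipped relative to this definition. The hand calculation says nothing about significance; that still comes from the tests in PROC FREQ.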
If you have the raw data I would recommend working with that instead of typing datalines in the first place. Retyping counts leaves room for errors, and with the raw data you could also deal with the interaction.
Placebo gives you your expected distribution. We have to handle strokes and heart attacks separately because there is no data on the interaction. (If there is no interaction we'd expect a small number of patients with both, but there could be a negative or masking interaction, e.g. if the heart attacks are fatal, or there could be a cumulative interaction, with heart attacks often preceded by strokes or vice versa.) We can't answer any of those questions.
Once you've got your expected counts, it's simply two chi-squared tests with two bins each, not one chi-squared test with four bins.
(I'll put in a plug for my book Basic Algorithms if you want to code the chi-squared significance test from the ground up, without using any look-up tables.)

SAS using Datalines - "observation read not used"

I am a complete newb to SAS and all I know is basic SQL. I'm currently taking a regression class and having trouble with my SAS code.
I am trying to input two columns of data where x variable is State; y variable is # of accidents for a simple regression.
I keep getting this:
ERROR: No valid observations are found.
Number of Observations Read 51
Number of Observations Used 0
Number of Observations with Missing Values 51
Is it because datalines only reads numbers and not characters?
Here is the code as well as the datalines:
Data Firearm_Accidents_1999_to_2014;
ods graphics on;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
Connecticut 0
Delaware 0
District_of_Columbia 0
Florida 350
Georgia 413
Hawaii 0
Idaho 0
Illinois 287
Indiana 288
Iowa 0
Kansas 44
Kentucky 384
Louisiana 562
Maine 0
Maryland 21
Massachusetts 27
Michigan 168
Minnesota 0
Mississippi 332
Missouri 320
Montana 0
Nebraska 0
Nevada 0
New_Hampshire 0
New_Jersey 85
New_Mexico 49
New_York 218
North_Carolina 437
North_Dakota 0
Ohio 306
Oklahoma 227
Oregon 41
Pennsylvania 465
Rhode_Island 0
South_Carolina 324
South_Dakota 0
Tennessee 603
Texas 876
Utah 0
Vermont 0
Virginia 203
Washington 45
West_Virginia 136
Wisconsin 64
Wyoming 0
;
run; proc print;
proc reg data = Firearm_Accidents_1999_to_2014;
model State = Sum_OF_Deaths;
ods graphics off;
run; quit;
OK, some different levels of issues here.
ODS GRAPHICS go before and after procs, not inside them.
When reading a character variable you need to tell SAS using an informat.
This allows you to read in the data. However, your regression has several issues. For one, State is a character variable and you can't do regression with a character variable. I think that issue is beyond this forum. Review your regression basics and check what you're trying to do.
Data Firearm_Accidents_1999_to_2014;
informat state $32.;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
....
;
run;
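To spell out why all 51 observations were dropped: without a $ or a character informat, INPUT reads State as numeric, so every state name becomes a missing value and PROC REG has nothing to use. A minimal sketch of the two points above, using the colon modifier as an alternative to the INFORMAT statement (firearm_sketch is a made-up name and only three rows are shown):

```sas
data firearm_sketch;
  /* :$32. reads State as character, up to 32 characters, stopping at the blank */
  input State :$32. Sum_OF_Deaths;
  datalines;
Alabama 526
Alaska 0
District_of_Columbia 0
;
run;

/* ODS GRAPHICS statements sit outside the DATA step and around the procs */
ods graphics on;
proc print data=firearm_sketch;
run;
ods graphics off;
```

A bare $ would also make State character, but with the default length of 8 the longer names would be truncated, which is why a width is given.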

Values in column to reverse order

I need help reversing the order of the data values in a column, either into a new column or in the same column. I mean the first value in the column should become the last value, and vice versa.
example:
name age
karl 40
lowry 56
jim 29
robert 34
samuel 60
harry 47
the output i need should look like this.
name age
harry 47
samuel 60
robert 34
jim 29
lowry 56
karl 40
I need the values reversed for both age and name, or for only one of the variables.
First create a variable of the observation number:
data temp;
set have;
ObsNum = _n_;
run;
Then use that variable to sort the dataset:
proc sort data=temp out=want (drop=ObsNum);
by descending ObsNum;
run;
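If the extra sort is a concern, another option that might work is to read the dataset backwards with the POINT= and NOBS= options of SET. This is a sketch only; want_reversed is a made-up name, and it assumes have is an ordinary dataset that supports direct access (not a view or a tape-style file).

```sas
data have;
  input name $ age;
  datalines;
karl 40
lowry 56
jim 29
robert 34
samuel 60
harry 47
;
run;

data want_reversed;
  /* n is filled in at compile time by NOBS=, so it is usable in the DO bounds */
  do i = n to 1 by -1;
    set have point=i nobs=n;   /* fetch observation number i directly */
    output;
  end;
  stop;                        /* required with POINT=, otherwise the step never ends */
run;
```

The POINT= and NOBS= variables (i and n here) are temporary and are not written to the output dataset.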

How to subset automatically in SAS?

I am new to SAS, so this might be a silly type of question.
Assume there are several datasets with similar structure but different column names. I want to get new datasets with the same number of rows but only a subset of columns.
In the following example, A_auto and B_auto are the original datasets, and SubA and SubB are what I want. What is an efficient way of deriving SubA and SubB?
DATA A_auto;
LENGTH A_make $ 20;
INPUT A_make $ 1-17 A_price A_mpg A_rep78 A_hdroom A_trunk A_weight A_length A_turn A_displ A_gratio A_foreign;
CARDS;
AMC Concord 4099 22 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer 4749 17 3 3.0 11 3350 173 40 258 2.53 0
Audi Fox 6295 23 3 2.5 11 2070 174 36 97 3.70 1
;
RUN;
DATA B_auto;
LENGTH B_make $ 20;
INPUT B_make $ 1-17 B_price B_mpg B_rep78 B_hdroom B_trunk B_weight B_length B_turn B_displ B_gratio B_foreign;
CARDS;
Toyota Celica 5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla 3748 31 5 3.0 9 2200 165 35 97 3.21 1
VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;
DATA SubA;
set A_auto;
keep A_make A_price;
RUN;
DATA SubB;
set B_auto;
keep B_make B_price;
RUN;
Here's my new answer. This introduces quite a few concepts, but all are necessary to complete this task.
First of all I would store the required part variable names (the suffixes that are common to all datasets) in a new dataset. This keeps them all in one place and makes it easier to change if required.
The next step is to create a regular expression (regex) search string that combines all the names, separated by a pipe (|), which is the regex symbol for "or". I've also added a $ symbol to the end of each name, which ensures that only variables ending with one of the part names will be selected.
select into :[macroname] is the method to create macro variables within proc sql
Then I set up a macro to extract the specific variable names for the current dataset and use those names to create a view (like my original answer)
The dictionary library referenced in the proc sql is a metadata library that contains information on all active libraries, tables, columns etc, so is a good source of identifying what the actual variable names are called (based on the regex search string created earlier).
You won't need the proc print in your code, I just put it in to show everything is working as expected.
Let me know if this works for you
/* create initial datasets */
DATA A_auto;
LENGTH A_make $ 20;
INPUT A_make $ 1-17 A_price A_mpg A_rep78 A_hdroom A_trunk A_weight A_length A_turn A_displ A_gratio A_foreign;
CARDS;
AMC Concord 4099 22 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer 4749 17 3 3.0 11 3350 173 40 258 2.53 0
Audi Fox 6295 23 3 2.5 11 2070 174 36 97 3.70 1
;
RUN;
DATA B_auto;
LENGTH B_make $ 20;
INPUT B_make $ 1-17 B_price B_mpg B_rep78 B_hdroom B_trunk B_weight B_length B_turn B_displ B_gratio B_foreign;
CARDS;
Toyota Celica 5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla 3748 31 5 3.0 9 2200 165 35 97 3.21 1
VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;
/* create dataset containing partial name of variables to keep */
data keepvars;
input part_name :$20.;
datalines;
_make
_price
;
run;
/* create regular expression search string from partial names */
proc sql noprint;
select
cats(part_name,'$') /* '$' matches end of string */
into
:name_str separated by '|' /* '|' is an 'or' search operator in regular expressions */
from
keepvars;
quit;
%put &name_str.; /* print search string to log */
/* macro to create views from datasets */
%macro create_views (dsname, vwname); /* inputs are dataset name being read in and view name being created */
/* extract specific variable names to be kept, based on search string */
proc sql noprint;
select
name
into
:vars separated by ' '
from
dictionary.columns
where
libname = 'WORK'
and memname = upper("&dsname.")
and prxmatch("/&name_str./",strip(name))>0; /* prxmatch is regular expression search function */
quit;
%put &vars.; /* print variables to keep to log */
/* create views */
data &vwname. / view=&vwname.;
set &dsname. (keep=&vars.);
run;
/* test view by printing */
proc print data=&vwname.;
run;
%mend create_views;
/* run macro for each dataset */
%create_views(A_auto, SubA);
%create_views(B_auto, SubB);
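To see what the search string does on its own, here is a small sketch that applies the pattern built above (_make$|_price$) to a few variable names. The name B_make_old is invented purely to show the effect of the $ anchor; it is not in the original datasets.

```sas
data _null_;
  length name $ 32;
  /* B_make_old is a hypothetical name used only for illustration */
  do name = 'A_make', 'A_price', 'A_mpg', 'B_make_old';
    /* prxmatch returns the match position, or 0 when there is no match */
    hit = prxmatch('/_make$|_price$/', strip(name)) > 0;
    put name= hit=;   /* writes each name and its 0/1 flag to the log */
  end;
run;
```

The first two names end in one of the part names and are flagged 1; A_mpg and B_make_old are flagged 0, the latter because _make is not at the end of the name. The match is case-sensitive, so the part names have to be written the same way as in the datasets.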