Library Mozart does not exist in SAS - sas

I have written the following to store a file:
libname mozart 'C:\Users\PCPCPC\Documents\sasdeposite\learning';
data mozart.test_scores;
length ID $ 3 Name $ 15;
input ID $ Score1-Score3 Name $;
datalines;
1 90 95 98
2 78 77 75
3 88 91 92
;
However, the compiler says that the library MOZART does not exist, even though I can see MOZART in Solution->Analysis->Interactive dataAnalysis.

Check to make sure that folder location exists on the computer.
SAS will not create the folder for you if it doesn't already exist.
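A quick way to confirm this from within SAS is sketched below. It assumes the path from the question. FILEEXIST returns 1 when the path exists and 0 otherwise, and the automatic macro variable SYSLIBRC holds the return code of the most recent LIBNAME statement (0 means success). The DLCREATEDIR system option (available in SAS 9.3 and later, shown commented out) is an alternative if you would rather have the LIBNAME statement create a missing folder.

```sas
/* Path copied from the question; adjust to your own folder */
%let path = C:\Users\PCPCPC\Documents\sasdeposite\learning;

/* 1 = the folder exists, 0 = it does not */
%put NOTE: folder exists = %sysfunc(fileexist(&path));

/* Uncomment to let LIBNAME create the folder when it is missing */
/* options dlcreatedir; */

libname mozart "&path";

/* 0 = the library was assigned successfully */
%put NOTE: LIBNAME return code = &syslibrc;
```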

Why does SAS skip an entire row of data values due to missing value?

When I run the following code the third observation is not output. Why does SAS omit the third observation?
data info;
input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62
M 61 72 271
F 29 73 125
M 16 65 178
;
Run;
title "Listing of Dataset Demographics";
proc print data=info;
run;
Defaults will get you. The default in SAS is FLOWOVER, so if a value is missing at the end of a line, SAS looks for it on the next line. You want MISSOVER or TRUNCOVER instead.
Your log tells you this happened with the following note:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
This works:
data info;
infile cards truncover;
input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62
M 61 72 271
F 29 73 125
M 16 65 178
;
Run;
More details are available in Example 2 of the documentation.
Specifically:
When you omit the MISSOVER option or use FLOWOVER (which is the default), SAS moves the
input pointer to line 2 and reads values for TEMP4 and TEMP5 (variables it cannot find). The next
time the DATA step executes, SAS reads a new line which, in this case,
is line 3. This message appears in the SAS log:
NOTE: SAS went to a new line when INPUT statement
reached past the end of a line.
Lines of text do not have "observations". They just have lines.
It didn't skip any of the lines of data. It just used two lines for the second observation because the first of the lines only had values for 3 of the 4 variables the INPUT statement requested.
This behavior is what SAS calls the flowover option of the INFILE statement. This allows you to have more than one line of text to represent the data for a single observation without having to be too persnickety about which fields you insert the line breaks between across the different observations of data.
If you don't want it to have to go hunt for the next field on the next line of text then make sure every variable has a value in the text lines. You can represent missing values by using a period for either numeric or character variables.
So use something like this:
data info;
input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62 .
M 61 72 271
. 29 73 125
M 16 65 178
;
When using flowover you can insert as many extra line breaks as you want, as long as each new observation starts on a new line. Like this:
data info;
input Gender $ Age Height Weight;
datalines;
M 45 72
149
F 64
62 .
M
61 72 271
F 29 73 125
M 16 65 178
;
If you want SAS to just give up when there are no more values on the line, use the truncover option on the infile statement.
data info;
infile datalines truncover;
input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62
M 61 72 271
F 29 73 125
M 16 65 178
;
There is also the older missover option, but you would normally never want that, as it sets values at the end of a line that are too short for an explicit INFORMAT width to missing instead of just using the number of characters that are available.
PS Don't indent lines of data. That will just make the code harder to read and the diagnostic messages about invalid data values harder to interpret. To make it easier, don't indent the DATALINES (aka CARDS) statement line either. That will also make it clearer that the data step definition ends where the lines of data start, and prevent you from accidentally inserting other statements for the data step after the data.

removing common prefix or suffix

I have a data set that contains a series of variables named PG_86xt, AG_86xt, ... all with the same suffix _86xt. How can I remove this suffix while renaming these variables?
I know how to add a prefix or suffix, but the logic of removing them seems to be a little different. I think proc datasets with modify is still the way to go, but the length of the substring before the suffix (or after the prefix) is unknown.
The example on how to add prefix or suffix
data one;
input id name :$10. age score1 score2 score3;
datalines;
1 George 10 85 90 89
2 Mary 11 99 98 91
3 John 12 100 100 100
4 Susan 11 78 89 100
;
run;
proc datasets library = work nolist;
modify one;
rename &suffixlist;
quit;
You can use the scan function to get the desired result.
By altering the example you have in the link to fit your example:
data one;
input id name :$10. age PG_86xt AG_86xt IG_86xt;
datalines;
1 George 10 85 90 89
2 Mary 11 99 98 91
3 John 12 100 100 100
4 Susan 11 78 89 100
;
run;
By filtering on only those columns that fit your convention (XX_86xt), you could use the first part of the scan for renaming.
proc sql noprint;
select cats(name,'=',scan(name, 1, '_'))
into :suffixlist
separated by ' '
from dictionary.columns
where libname = 'WORK' and memname = 'ONE' and '86xt' = scan(name, 2, '_');
quit;
You can use the index function to find the (first) place in each variable name where the suffix / prefix starts, then use that to construct appropriate parameters for substr. It's a bit more work than the code in your example, but you'll get there.
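A minimal sketch of the index/substr idea, assuming the WORK.ONE data set and the _86xt suffix from the question. The macro variable name renamelist is made up for illustration. Note that index finds the first occurrence of the suffix text, so a name that contains _86xt somewhere other than the end would also match, and if no column matches, the macro variable is never created and the RENAME statement will fail.

```sas
proc sql noprint;
  /* Build pairs such as PG_86xt=PG from the column metadata */
  select cats(name, '=', substr(name, 1, index(upcase(name), '_86XT') - 1))
    into :renamelist separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'ONE'
      /* > 1 guarantees at least one character before the suffix */
      and index(upcase(name), '_86XT') > 1;
quit;

proc datasets library = work nolist;
  modify one;
  rename &renamelist;
quit;
```

For a prefix instead of a suffix, the same pattern applies with substr starting at the position just after the prefix.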

Input statement is not reading all the datalines

I'm trying to read in some raw data using datalines...
data Exp_data;
INPUT a: 2. b: 2. DATE1: MMDDYY10. DATE2: MMDDYY10.;
FORMAT DATE1 DATE9. DATE2 DATE9.;
datalines;
27 93 03/16/2008 03/17/2008
27 93 03/17/2009 03/19/2009
68 68
55 55
46 68
34 34
45 67
56 75
34 34
34 34
;RUN;
But this code only reads the data up to the 6th row. I can't figure out where I'm making a mistake.
Thanks in advance!
Add this line before your input statement.
infile datalines missover;
From the third row on, the lines don't have 4 values, so SAS needs to know what to do with the missing values. MISSOVER tells SAS to set the remaining variables to missing instead of reading them from the next line.
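Put together, the step from the question would look roughly like this. It is only a sketch: the data lines are shortened to the first four rows of the question's data to keep it brief.

```sas
data Exp_data;
  /* MISSOVER: stay on the current line and set unread variables to missing */
  infile datalines missover;
  input a :2. b :2. DATE1 :MMDDYY10. DATE2 :MMDDYY10.;
  format DATE1 DATE9. DATE2 DATE9.;
datalines;
27 93 03/16/2008 03/17/2008
27 93 03/17/2009 03/19/2009
68 68
55 55
;
run;
```

With the INFILE statement in place, each data line becomes its own observation, and DATE1 and DATE2 are missing on the short lines.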

Check if a column exists and then sum in SAS

This is my input dataset:
Ref Col_A0 Col_01 Col_02 Col_aa Col_03 Col_04 Col_bb
NYC 10 0 44 55 66 34 44
CHG 90 55 4 33 22 34 23
TAR 10 8 0 25 65 88 22
I need to calculate the % of Col_A0 for a specific reference.
For example % col_A0 would be calculated as
10/(10+0+44+55+66+34+44)=.0395 i.e. 3.95%
So my output should be
Ref %Col_A0 %Rest
NYC 3.95% 96.05%
CHG 34.48% 65.52%
TAR 4.59% 95.41%
I can do this part but the issue is column variables.
Col_A0 and Ref are fixed columns so they will be there in the input every time. But the other columns won't be there. And there can be some additional columns too like Col_10, col_11 till col_30 and col_cc till col_zz.
For example the input data set in some scenarios can be just:
Ref Col_A0 Col_01 Col_02 Col_aa Col_03
NYC 10 0 44 55 66
CHG 90 55 4 33 22
TAR 10 8 0 25 65
So is there a way I can write a SAS code which checks to see if the column exists or not. Or if there is any other better way to do it.
This is my current SAS code written in Enterprise Guide.
PROC SQL;
CREATE TABLE output123 AS
select
ref,
(col_A0/Sum(Col_A0,Col_01,Col_02,Col_aa,Col_03,Col_04,Col_bb)) FORMAT=PERCENT8.2 AS PERCNT_ColA0,
(1-(col_A0/Sum(Col_A0,Col_01,Col_02,Col_aa,Col_03,Col_04,Col_bb))) FORMAT=PERCENT8.2 AS PERCNT_Rest
From Input123;
quit;
In scenarios where not all of the columns are present, I get an error. And if there are additional columns, I miss those. Please advise.
Thanks
I would not use SQL, but would use a regular data step.
data want;
set have;
a0_prop = col_a0/sum(of _numeric_);
run;
If you wanted to do this in SQL, the easiest way is to keep (or transform) the dataset in vertical format, i.e., each variable as a separate row per ID. Then you don't need to know how many variables there are to figure it out.
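A rough sketch of that vertical approach, assuming the Input123 data set from the question and that each Ref value appears on one row. The names sorted, long, col_name and col_value are made up for illustration.

```sas
/* BY-group processing in PROC TRANSPOSE needs the data sorted by ref */
proc sort data=Input123 out=sorted;
  by ref;
run;

/* One row per ref and numeric column: col_name holds the column name, col_value its value */
proc transpose data=sorted out=long(rename=(col1=col_value)) name=col_name;
  by ref;
  var _numeric_;
run;

proc sql;
  create table output123 as
  select ref,
         sum(case when upcase(col_name) = 'COL_A0' then col_value else 0 end)
           / sum(col_value) as PERCNT_ColA0 format=percent8.2,
         1 - calculated PERCNT_ColA0 as PERCNT_Rest format=percent8.2
  from long
  group by ref;
quit;
```

Because the columns have become rows, the query does not need to name Col_01, Col_aa and so on, so extra or absent columns no longer cause an error.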
If you always want to sum all the numeric columns then just do :
col_A0 / sum(of _numeric_)

Merge dataset without common variable (By)?

Currently I have two datasets with similar variable lists. Each dataset has a procedure variable. I want to compare the frequency of the procedure variable between datasets. I created a flag in both datasets to identify the source dataset, and was going to merge them but don't have a common identifier. How do I merge the datasets without deleting any observations? This isn't just a simple MERGE without a BY statement, right?
Currently have:
Data.a    Data.b
pproc     proc1_numb
70        9
71        15
77        24
80        80
81        42
83        71
86        66
87        125
121       159
125       242
Want Output:
pproc freq
9 1
15 1
24 1
42 1
66 1
70 1
71 2
77 1
80 2
81 1
83 1
86 1
87 1
121 1
125 2
159 1
242 1
If I understand your question properly, you should just concatenate the two datasets into one and rename the variable. Then you can use PROC MEANS to get the frequencies. Something like this:
data all;
set a
b(rename=(proc1_numb=pproc));
run;
proc means nway data=all noprint;
class pproc;
output out=want(drop=_type_ rename=(_freq_=freq));
run;
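As an alternative sketch, PROC FREQ can produce the same counts from the concatenated data set all created above. The output data set name want_freq is made up for illustration. The OUT= data set from a one-way TABLES request contains the variables COUNT and PERCENT, which are dropped or renamed here to match the desired layout.

```sas
proc freq data=all noprint;
  /* One row per distinct pproc value, with its frequency */
  tables pproc / out=want_freq(drop=percent rename=(count=freq));
run;
```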