SAS Drop Condition Output - Not enough variables observed in Output - sas

I am learning drop_conditions in SAS. I am requesting a sample on three variables, however only two observations are retrieved. Please advise! Thank you!
data temp;
set mydata.ames_housing_data;
format drop condition $30.;
if (LotArea in (7000:9000)) then drop_condition = '01: Between 7000-9000 sqft';
else if (LotShape EQ 'IR3') then drop_condition = '02: Irregular Shape of Property';
else if (condition1 in 'Artery' OR 'Feedr') then drop_condition = '03: Proximity to arterial/feeder ST';
run;
proc freq data=temp;
tables drop_condition;
title 'Sample Waterfall';
run; quit;

Your comparisons aren't specified correctly. I think you're looking for the IN operator, which does multiple comparisons in a single condition. I'm surprised there aren't errors in your log.
Rather than the following:
if (LotArea = 7000:9000)
Try:
if (lotArea in (7000:9000))
and
if (condition1 EQ 'Artery' OR 'Feedr')
should be
if (condition1 in ('Artery', 'Feedr'))
EDIT:
You also need to specify a length for the drop_condition variable rather than a format, to ensure the variable is long enough to hold the text assigned to it. It's also useful to verify your result afterwards with a proc freq against the specified conditions, for example:
proc freq data=temp;
where drop_condition=:'01';
tables drop_condition*LotArea;
run;
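Putting the points above together, a corrected version of the data step might look like the sketch below. The length of $40 is my own choice (the longest label here is 35 characters), and the variable names are taken from the question.

```sas
data temp;
    set mydata.ames_housing_data;
    /* LENGTH, not FORMAT, reserves room for the longest label (35 characters) */
    length drop_condition $40;
    /* Note: 7000:9000 in an IN list matches integer values only;           */
    /* use 7000 <= LotArea <= 9000 if LotArea can hold fractional values.   */
    if (LotArea in (7000:9000)) then drop_condition = '01: Between 7000-9000 sqft';
    else if (LotShape eq 'IR3') then drop_condition = '02: Irregular Shape of Property';
    else if (Condition1 in ('Artery', 'Feedr')) then drop_condition = '03: Proximity to arterial/feeder ST';
run;
```

Because this is an IF/ELSE IF chain, each observation receives only the first condition it satisfies, so a category can still be absent from the PROC FREQ output if every matching row was already caught by an earlier condition.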

Related

How to print the first 10 rows with columns label in SAS

PROC PRINT DATA = pg1.eu_occ obs=10 label;
RUN;
I tried to print the first 10 observations with their labels, but it doesn't work at all. Any idea how to solve this problem?
Thanks.
obs= is a data set option, and thus must be specified in parentheses after the data set name. A name=value coded directly in a PROC statement is known as a procedure option.
Your code should be
proc print data=pg1.eu_occ(obs=10) label;
run;

This range is repeated or overlapped

I now have a bigger problem: I am getting the error "this range is repeated or overlapped". To be specific, my label values repeat; my format has repeated values, something like a=aa b=aa c=as. How do I resolve this error? When I use hlo=M as the multilabel option, it gives double the data...
I am mapping like below.
Santhan=Santhan
Chintu=Santhan
Please suggest a solution.
To convert data to a FORMAT use the CNTLIN= option on PROC FORMAT. But first make sure the data describes a valid format. So read the data from the file.
data myfmt ;
infile 'myfile.txt' dsd truncover ;
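length fmtname $32 start $100 label $200 ;
fmtname = '$MYFMT';
input start label ;
run;
Make sure to set the lengths of START and LABEL to be long enough for any actual values your source file might have. The decode must be stored in a variable named LABEL, because that is the name PROC FORMAT looks for in a CNTLIN= dataset.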
Then make sure it is sorted and you do not have duplicate codes (START values).
proc sort data=myfmt out=myfmt_clean nodupkey ;
by start;
run;
The SAS log will show if any observations were deleted because of duplicate START values.
If you do have duplicate values then examine the dataset or original text file to understand why and determine how you want to handle the duplicates. The PROC SORT step above will keep just one of the duplicates. You might just have exact duplicates, in which case keeping only one is fine. Or you might want to collapse the duplicate observations into a single observation and concatenate the multiple decodes into one long decode.
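As an illustration of the collapsing option, here is a minimal sketch. It assumes a dataset in the CNTLIN= layout with START and LABEL variables; the dataset names (myfmt_demo, myfmt_collapsed), the helper variable all_labels, the '; ' separator and the sample rows are all hypothetical.

```sas
/* Hypothetical demo data: the code 'BCD' appears with two different decodes */
data myfmt_demo;
    length fmtname $32 start $100 label $200;
    fmtname = '$MYFMT';
    infile datalines dsd truncover;
    input start label;
    datalines;
ABC,Alpha
BCD,Beta
BCD,Gamma
;
run;

proc sort data=myfmt_demo;
    by start;
run;

/* Keep one row per START, joining all of its decodes with '; ' */
data myfmt_collapsed;
    set myfmt_demo;
    by start;
    length all_labels $200;
    retain all_labels;
    if first.start then all_labels = ' ';
    all_labels = catx('; ', all_labels, label);
    if last.start then do;
        label = all_labels;
        output;
    end;
    drop all_labels;
run;
```

With this sample data, BCD ends up with the single decode "Beta; Gamma". If the file may also contain exact duplicates, it is probably worth removing those first (for example with NODUPKEY on both START and LABEL) so the same decode is not concatenated twice.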
If you want you can add a record that will add the functionality of the OTHER keyword of the VALUE statement in PROC FORMAT. You can use that to set a default value, like 'Value not found', to decode any value you might encounter that was not in your original source file.
data myfmt_final;
set myfmt_clean end=eof;
output;
if eof then do;
start = ' ';
label = 'Value not found';
hlo = 'O' ;
output;
end;
run;
Then use PROC FORMAT to make the format from the cleaned up data file.
proc format cntlin = myfmt_final;
run;
To convert a FORMAT to a dataset use the CNTLOUT= option on PROC FORMAT.
For example if you had created this format previously.
proc format ;
value $myfmt 'ABC'='ABC' 'BCD'='BCD' 'BCD1'='BCD' 'BCD2'='BCD' ;
run;
then you can use another PROC FORMAT step to make a dataset. Use the SELECT statement if your format catalog has more than one format defined and you just want one (or some) of them.
proc format cntlout=myfmt ;
select $myfmt ;
run;
Then you can use that dataset to easily make a text file. For example a comma delimited file.
data _null_;
set myfmt ;
file 'myfmt.txt' dsd ;
put start label;
run;
The result would be a text file that looks like this:
ABC,ABC
BCD,BCD
BCD1,BCD
BCD2,BCD
You get this error because you have the same code mapped to two different categories. I'm going to guess you did not import your data correctly from your text file and ended up with some values truncated, but without seeing the full process that is only an educated guess.
This will work fine:
proc format;
value $test
'a'='aa' 'b'='aa' 'c'='as'
;
run;
This version will not work, because a is mapped to two different values, so SAS will not know which one to use.
proc format;
value $badtest
'a'='aa'
'a' = 'ba'
'b' = 'aa'
'c' = 'as';
run;
This generates the error regarding overlaps in your data.
The way to fix this is to find the duplicates and determine which code they should actually map to. PROC SORT can be used to get your duplicate records.
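One way to pull out the duplicate records with PROC SORT is sketched below. It uses the NOUNIQUEKEY option, which to my knowledge is available from SAS 9.3 onward; the dataset names (codes_demo, codes_dups) and the sample rows are hypothetical.

```sas
/* Hypothetical demo data: 'a' is mapped to two different labels */
data codes_demo;
    length start $10 label $10;
    infile datalines dsd truncover;
    input start label;
    datalines;
a,aa
a,ba
b,aa
c,as
;
run;

/* NOUNIQUEKEY drops rows whose BY value occurs only once, */
/* so the output holds every row of each duplicated code.  */
proc sort data=codes_demo out=codes_dups nouniquekey;
    by start;
run;

proc print data=codes_dups;
run;
```

Here codes_dups should contain the two rows for 'a', which can then be reviewed to decide which label is the right one.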

How Do I get Rid of the Missing Frequency in my RESULT USING SAS

I am new to this and I already posted this question. But I think I did not explain it well.
I have a data set inside SAS.
Some of the cells are empty (nothing in them), and in the SAS output window they show a dot in the cell.
When I run the results, at the end of the table it adds MISSING FREQUENCY = 7 or whatever the number is...
How do I make SAS disregard the missing frequency and only use the ones that have a result?
Please see my screenshots (the output data, and the result with the missing frequency at the bottom), my code and my CSV:
/* Generated Code (IMPORT) */
/* Source File:2012_16_ChathamPed.csv */
/* Source Path: /home/cwacta0/my_courses/Week2/ACCIDENTS */
PROC IMPORT
DATAFILE='/home/cwacta0/my_courses/Week2/ACCIDENTS/2012_16_ChathamPed.csv'
OUT=imported REPLACE;
GETNAMES=YES;
GUESSINGROWS=32767;
RUN;
proc contents data=work.imported;
run;
libname mydata"/courses/d1406ae5ba27fe300" access=readonly;
run;
/* sorting data by location*/
PROC SORT ;
by LocationOfimpact;
LABEL Route="STREET NAME" Fatalities="FATALITIES" Injuries="INJURIES"
SeriousInjuries="SERIOUS INJURIES" LocationOfimpact="LOCATION OF IMPACT"
MannerOfCollision="MANNER OF COLLISION"
U1Factors="PRIMARY CAUSES OF ACCIDENT"
U1TrafficControl="TRAFFIC CONTROL SIGNS AT THE LOCATION"
U2Factors="SECONDARY CAUSES OF ACCIDENT"
U2TrafficControl="OTHER TRAFFIC CONTROL SIGNS AT THE LOCATION"
Light="TYPE OF LIGHTHING AT THE TIME OF THE ACCIDENT"
DriverAge1="AGE OF THE DRIVER" DriverAge2="AGE OF THE CYCLIST";
/* Here I was unable to extract the drivers aged 25 or less and the drivers who disregarded a stop sign. Here is how I coded it;
IF DriverAge1 LE 25;
IF U1Factors="Failed to Yield" OR U1Factors= "Disregard Stop Sign";
Run;
Also, I want to remove the Missing DATA under the results. But in the data, those are just a blank cell. How do I tell SAS to disregard a blank cell and not add it to the result?
Here is what I did and it does not work...
if U1Factors="BLANK" Then U1Factors=".";
Please help me figure this out... Thanks
IF U1Factors="." Then call missing(U1Factors)*/;
Data want;
set imported;
IF DriverAge1 LE 25 And U1Factors in ("Failed to Yield", "Wrong Side of Road",
"Inattentive");
IF Light in ("DarkLighted", "DarkNot Lighted", "Dawn");
run;
proc freq ;
tables /*Route Fatalities Injuries SeriousInjuries LocationOfimpact MannerOfCollision*/
U1Factors /*U1TrafficControl U2Factors U2TrafficControl*/
light DriverAge1 DriverAge2;
RUN;
SAS will display missing numeric variables using a period. So if there was nothing in column for DriverAge1 in the CSV file then that observation will have a missing value. If your variable is character then SAS will also normally convert values of just a period in the input stream into blanks in the SAS variable.
Missing numeric values are considered less than any real number. So if you use a condition such as less than or equal to, missing values will be included unless you exclude them with some other condition.
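A small sketch of that behaviour, using a hypothetical three-row dataset (the name ages_demo and its values are made up for illustration):

```sas
/* Hypothetical demo data: one valid young driver, one missing age, one older driver */
data ages_demo;
    input DriverAge1;
    datalines;
18
.
40
;
run;

/* The missing value sorts below every number, so it passes this test too */
proc print data=ages_demo;
    where DriverAge1 <= 25;
run;

/* Adding a lower bound of missing keeps only the real ages up to 25 */
proc print data=ages_demo;
    where . < DriverAge1 <= 25;
run;
```

The first PROC PRINT should list both the 18 and the missing row, while the second lists only the 18.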
You can use a WHERE statement on procs to filter the data. If you want to append to the WHERE condition in a separate statement you can use the WHERE ALSO syntax to add the extra conditions.
If you want the missing category to appear in the PROC FREQ output add the MISSPRINT option to the TABLES statement. Or add the MISSING option and it will appear and also be counted in statistics.
proc freq ;
where . < DriverAge1 <= 25
and U1Factors in ("Failed to Yield", "Wrong Side of Road","Inattentive")
;
where also Light in ("DarkLighted", "DarkNot Lighted", "Dawn");
tables U1Factors light DriverAge1 DriverAge2 / missing;
run;
The WHERE conditions will apply to the whole dataset. So if you exclude missing DriverAge1 and missing U1Factors
proc freq ;
where not missing(U1Factors) and not missing(DriverAge1);
tables U1Factors DriverAge1 ;
run;
then only the observations that are not missing for both will be included. So you might want to generate the statistics separately for each variable.
proc freq ;
where not missing(U1Factors);
tables U1Factors ;
run;
proc freq ;
where not missing(DriverAge1);
tables DriverAge1 ;
run;

PROC FREQ on multiple variables combined into one table

I have the following problem. I need to run PROC FREQ on multiple variables, but I want the output to all be on the same table. Currently, a PROC FREQ statement with something like TABLES ERstatus Age Race, InsuranceStatus; will calculate frequencies for each variable and print them all on separate tables. I just want the data on ONE table.
Any help would be appreciated. Thanks!
P.S. I tried using PROC TABULATE, but it did not calculate N correctly, so I'm not sure what I did wrong. Here is my code for PROC TABULATE. My variables are all categorical, so I just need to know N and percentages.
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
The above code does not return the correct frequencies based on InsuranceStatus where 0 = insured and 1 = uninsured, but PROC FREQ does. Also doesn't calculate correctly with ROWPCTN. So any way that I can get PROC FREQ to calculate multiple variables on one table, or PROC TABULATE to return the correct frequencies, would be appreciated.
Here is a nice image of my output in a simplified analysis of only ERstatus and InsuranceStatus. You can see that PROC FREQ returns 204 people with an ERstatus of 1 and InsuranceStatus of 1. That's correct. The values in PROC TABULATE are not.
I'll answer this separately as this is answering the other possible interpretation of the question; when it's clarified I'll delete one or the other.
If you want this in a single printed table, then you either need to use proc tabulate or you need to normalize your data - meaning put it in the form of variable | value. PROC FREQ is not capable of doing multiple one-way frequencies in a single table.
For PROC TABULATE, likely your issue is missing data. Any variable that is on the class statement will be checked for missingness, and if any rows are missing data for any of the class variables, those rows are entirely excluded from the tabulation for all variables.
You can override this by adding the missing option on the class statement, or in the table statement, or in the proc tabulate statement. So:
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus/missing;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
This will result in a slightly different appearance than on your table, though, as it will include the missing rows in places you probably do not want them, and they'll be factored against the colpctn when again you probably don't want them.
Typically some manipulation is then necessary; the easiest is to normalize your data and then run a tabulation (using PROC TABULATE or PROC FREQ, whichever is more appropriate; TABULATE has better percentaging options though) against that normalized dataset.
Let's say we have this:
data class;
set sashelp.class;
if _n_=5 then call missing(age);
if _n_=3 then call missing(sex);
run;
And we want these two tables in one table.
proc freq data=class;
tables age sex;
run;
If we do this:
proc tabulate data=class;
class age sex;
tables (age sex),(N colpctn);
run;
Then we get an N=17 total for both subtables - that's not what we want, we want N=18. Then we can do:
proc tabulate data=class;
class age sex/missing;
tables (age sex),(N colpctn);
run;
But that's not quite right either; I want F to have 8/18 = 44.44% and M 10/18 = 55.56%, not 42% and 53% with 5% allocated to the missing row.
The way I do this is to normalize the data. This means you get a dataset with 2 variables, varname and val, or whatever makes sense for your data, plus whatever identifier/demographic/whatnot variables you might have. val has to be character unless all of your values are numeric.
So for example here I normalize class with age and sex variables. I don't keep any identifiers, but you certainly could in your data, I imagine InsuranceStatus would be kept there if I understand what you're doing in that table. Once I have the normalized table, I just use those two variables, and carefully construct a denominator definition in proc tabulate to have the right basis for my pctn value. It's not quite the same as the single table before - the variable name is in its own column, not on top of the list of values - but honestly that looks better in my opinion.
data class_norm;
set class;
length val $2;
varname='age';
val=put(age,2. -l);
if not missing(age) then output;
varname='sex';
val=sex;
if not missing(sex) then output;
keep varname val;
run;
proc tabulate data=class_norm;
class varname val;
tables varname=' '*val=' ',n pctn<val>;
run;
If you want something better than this, you'll probably have to construct it in proc report. That gives you the most flexibility, but it is also the most onerous to program.
You can use ODS OUTPUT to get all of the PROC FREQ output to one dataset.
ods output onewayfreqs=class_freqs;
proc freq data=sashelp.class;
tables age sex;
run;
ods output close;
or
ods output crosstabfreqs=class_tabs;
proc freq data=sashelp.class;
tables sex*(height weight);
run;
ods output close;
Crosstabfreqs is the name of the cross-tab output, while one-way frequencies are onewayfreqs. You can use ods trace to find out the name if you forget it.
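For instance, a minimal sketch of using ods trace with the same sashelp.class data: each output object's name is written to the log, and that name is what goes on the ods output statement.

```sas
ods trace on;

proc freq data=sashelp.class;
    tables age sex;     /* the log reports Name: OneWayFreqs for each one-way table */
    tables sex*age;     /* the log reports Name: CrossTabFreqs for the cross-tab    */
run;

ods trace off;
```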
You may (probably will) still need to manipulate this dataset some to get the structure you want ultimately.

SAS merge if-then getting two results back

I cannot seem to find a way to use an if-then or just an if statement below the unique command in the SAS merge code below. I am matching all of the same S2_Liab numbers, just pulling a different number where there is a different title in the "Liability_Lmt" column.
/*--------------------------------------------------------------------------------------*/
/*---- Additional Partner and/or Corporation charge ----------*/
/*--------------------------------------------------------------------------------------*/
data full;
set whole_FR;
record_num=_N_;
S2_Liab = S2LiabLimit;
run;
proc sort;
by S2_Liab;
run;
data unique(keep=S2_Liab S2Partners P_charge FD_charge);
set WORK.FR_S2_Mandatory_Liab_Cov;
S2_Liab = Liability_Lmt;
if Additional = "Partner_Corp" then P_charge = round(&thestate.,0.01);
if Additional = "Farm_Dwelling" then FD_charge = round(&thestate.,0.01);
run;
proc sort nodupkey;
by S2_Liab;
run;
data match nonmatch;
merge full(in=a) unique(in=b);
by S2_Liab;
if a=1 and b=1 then output match;
if a=1 and b=0 then output nonmatch;
run;
data whole_FR;
set match nonmatch;
proc sort;
by record_num;
run;
For the first if-then statement above I am getting no results; instead the code seems to skip directly to the second if-then statement.
My results show all of the numbers for the second if statement ("FD_charge") but blank values for "P_charge". For some reason the program skips over the first if statement and prints out the answer for the second if statement. I have also tried an if-then-else statement, but I get the same answers.
Does anybody have a clue how to make this work?