How is this SAS dataset being emptied?

Running SAS 9.3 supporting MXG 32.10 on Windows
I have a dataset, WORK.SMFLISTD, which has 15 obs. At the end of processing it has 0 obs, even though I can find no step that modifies it. Searching the SAS log for SMFLISTD finds no reference to it between these two states. Searching for WORK finds many DATASETS DELETE statements, but none reference SMFLISTD, and there is no DATASETS KILL for the whole WORK library. Even if there were, the dataset would be gone, whereas SMFLISTD still exists with 0 obs. Given these conditions, how does this dataset end up empty?
Options are:
options source source2 symbolgen mprint;
At the beginning of the run:
77 data work.smflistd(keep=refdate dsname volser);
78 set
78 ! work.smflist(obs=&getcnt);
SYMBOLGEN: Macro variable GETCNT resolves to 15
79 run;
NOTE: There were 15 observations read from the data set WORK.SMFLIST.
NOTE: The data set WORK.SMFLISTD has 15 observations and 3 variables.
At the end of the run:
MPRINT(RUNCICSSMF): proc print data=work.smflistd;
MPRINT(RUNCICSSMF): title "SMFLISTD After";
NOTE: No observations in data set WORK.SMFLISTD.
MPRINT(RUNCICSSMF): data daily.smflist;
MPRINT(RUNCICSSMF): set work.smflistd;
MPRINT(RUNCICSSMF): modify daily.smflist key=dsnid;
MPRINT(RUNCICSSMF): procdt = datetime();
MPRINT(RUNCICSSMF): run;
3223 The SAS System 06:41 Wednesday, November 12, 2014
NOTE: There were 0 observations read from the data set WORK.SMFLISTD.
NOTE: The data set DAILY.SMFLIST has been updated. There were 0 observations rewritten, 0
observations added and 0 observations deleted.

There are a multitude of ways that this could be happening so searching for a reference to the table may not work.
You can run the code to the point where the table is created. Open the table (so that it is locked), and then finish running the rest of the code. Because the table is locked, the step that tries to empty it will fail and show in the log as either an ERROR: or a WARNING:.

As Robert says, there are many possibilities. One "gotcha" is the global obs setting:
options obs=0;
You could check whether this has been set.
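A minimal sketch of such a check, assuming it is run at the point in the program where the dataset is about to be read. It only reports and resets the system-wide OBS= option; it does not identify which earlier step changed it.

```sas
/* Write the current value of the OBS= system option to the log */
proc options option=obs;
run;

/* The same check from macro code; GETOPTION returns the value as text */
%put NOTE: Current OBS setting is %sysfunc(getoption(obs));

/* Restore the default so that later steps read all observations */
options obs=max;
```

If the reported value is 0, every step that reads a dataset processes no rows, and a step that rebuilds a table produces an empty one.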

SAS data step with BY variable on unsorted data

I'm executing a SAS data step with a BY variable. I understand the output when the data is sorted by the key (X in my case). However, when the data is unsorted, the output is not what I expect.
I'm using SAS ODA's AFRICA dataset from MAPS library which has 52824 rows. Here's the link to the CSV file.
data AFRICA_NEW12;
set Maps.AFRICA;
by X;
firstX = FIRST.X;
lastX = LAST.X;
run;
I don't understand how rows are selected when data is not sorted. Why does the output have 14 rows?
You have an error in your log because the data isn't sorted by the BY variable. Make sure to read your log.
This likely generates the same issue for you:
data cars;
set sashelp.cars;
by model;
run;
proc print data=cars;
var make model origin;
run;
Output is:
Obs Make Model Origin
1 Acura MDX Asia
2 Acura RSX Type S 2dr Asia
And the log shows:
ERROR: BY variables are not properly sorted on data set SASHELP.CARS.
Make=Acura Model=TSX 4dr Type=Sedan Origin=Asia DriveTrain=Front MSRP=$26,990 Invoice=$24,647 EngineSize=2.4 Cylinders=4
Horsepower=200 MPG_City=22 MPG_Highway=29 Weight=3230 Wheelbase=105 Length=183 FIRST.Model=1 LAST.Model=1 _ERROR_=1 _N_=3
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 4 observations read from the data set SASHELP.CARS.
WARNING: The data set WORK.CARS may be incomplete. When this step was stopped there were 2 observations and 15 variables.
WARNING: Data set WORK.CARS was not replaced because this step was stopped.
Note this portion specifically:
WARNING: The data set WORK.CARS may be incomplete. When this step was stopped there were 2 observations and 15 variables.
If you know the data is already grouped in the order you want, which may not be the sort order SAS expects, you can add the NOTSORTED option to the BY statement. This changes how BY groups are detected (a new group starts whenever the value changes), so check your code thoroughly.
data cars;
set sashelp.cars;
by model notsorted;
run;
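If ordinary BY-group processing is what you want, the usual alternative is to sort first. The following is a sketch only; cars_sorted, firstModel and lastModel are illustrative names that do not come from the question.

```sas
/* Sort a copy of the data by the BY variable */
proc sort data=sashelp.cars out=cars_sorted;
  by model;
run;

/* BY-group processing now runs without the sort error */
data cars;
  set cars_sorted;
  by model;
  firstModel = first.model;  /* 1 on the first row of each model */
  lastModel  = last.model;   /* 1 on the last row of each model  */
run;
```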

Why doesn't my SAS Stored Process output the data set to E.G., while the log shows it's created?

I'm creating a stored process in SAS EG for some business partners, but I can't seem to get my dataset to output.
A 'Results' viewer shows up but is blank, and my code works perfectly fine when not using a stored process, but the user has to manually change the macro variable for the account they are looking for. With a stored process I can mitigate users accidentally deleting some code, etc.
I can see in my SAS log that the output dataset is being created with variables and observations, but it doesn't automatically pop up like a typical SAS EG job would. I also have some documentation I received from a co-worker around stored processes, and it seems to me that after successful execution a SAS dataset should automatically output.
One thought: Will a stored process output a dataset if there are warnings in the log? I have warnings presented because I am appending datasets to a base file that isn't created, so the lengths of my numeric variables change.
Here's a snippet from the log:
NOTE: The address space has used a maximum of 5504K below the line and 222716K above the line.
104
105 data tran_last;
106 retain TRAN_DT MRCH_NAME MRCH_CITY AMT_TRAN DEB_CRD_IND;
107 set tran_sorted;
108 output;
109 run;
The SAS System
NOTE: There were 164 observations read from the data set WORK.TRAN_SORTED.
NOTE: The data set WORK.TRAN_LAST has 164 observations and 5 variables.
NOTE: The DATA statement used 0.00 CPU seconds and 51817K.
NOTE: The address space has used a maximum of 5504K below the line and 222716K above the line.
WORK.TRAN_LAST is the dataset I want output so that my user can copy/paste directly from it. Maybe I'm missing something obvious, but I can't figure this out.
Version 7.1
The answer was extremely simple. I had to use
PROC PRINT DATA = MYDATA ;
RUN;
at the end of my stored process.
However, I have books from SAS Institute that say you can retrieve an "Output Data" file from a stored process instead of getting the "Results Viewer" via proc print. This functionality must have been removed in newer versions, or maybe I was doing something wrong.
To work around this, I connected SAS to an Excel file that the end user runs the program(s) from, so that they don't need to worry about the output appearing in the "Results Viewer".

Controlling export (ODS or PROC) based on Number of Observations

I currently have a SAS process that generates multiple data sets (whether they have observations or not). I want to determine a way to control the export procedure based on the total number of observations (if nobs > 0, then export). My first attempt was something primitive using if/then logic comparing a select into macro var (counting obs in a data set) -
DATA _NULL_;
SET A_EXISTS_ON_B;
IF &A_E > 0 THEN DO;
FILE "C:\Users\ME\Desktop\WORKLIST_T &PDAY..xls";
PUT TASK;
END;
RUN;
The issue here is that I don't have a way to write multiple data sets to the same workbook with multiple sheets (or do I?).
In addition, whenever I try to add another DO block with similar logic, the execution fails. If this cannot be done with a DATA _NULL_ step, would ODS be the answer?
The core of what you want to do, conditionally execute code, can be done one of a number of ways.
Let's imagine we have a short macro that exports a dataset to excel. Simple as pie.
%macro export_to_excel(data=,file=,sheet=);
proc export data=&data. outfile=&file. dbms=excel replace;
sheet=&sheet.;
run;
%mend export_to_excel;
Now let's say we want to do this conditionally. How we do it depends, to some degree, on how we call this macro in our code now.
Let's say you have:
%let wherecondition=1; *always true!;
data class;
set sashelp.class;
if &wherecondition. then output;
run;
%export_to_excel(data=class,file="c:\temp\class.xlsx", sheet=class1);
Now you want to make this so it only exports if class has some rows in it, right. So you get the # of obs in class:
proc sql;
select count(1) into :classobs from class;
quit;
And now you need to incorporate that somehow. In this case, the easiest way is to add a condition to the export macro. Before SAS 9.4M5, open code doesn't allow conditional execution, so the %if needs to be inside a macro.
So we do:
%macro export_to_excel(data=,file=,sheet=,condition=1);
%if &condition. %then %do;
proc export data=&data. outfile=&file. dbms=excel replace;
sheet=&sheet.;
run;
%end;
%mend export_to_excel;
And you add the count to the call:
%export_to_excel(data=class,file="c:\temp\class.xlsx", sheet=class1,condition=&classobs.)
Tada, now it won't try to export when it's 0. Great.
If this code is already in a macro, you don't have to alter the export macro itself. You can simply put the %if ... %then part around the macro call. But that only works if the whole thing is already a macro: before SAS 9.4M5, %if isn't allowed outside of macros (sorry).
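In SAS 9.4M5 and later, a simple %IF/%THEN with a %DO/%END block is accepted in open code (without nesting), so the check can wrap the call directly. This is a sketch that assumes the classobs macro variable and the export_to_excel macro defined above, and a release new enough to support it.

```sas
/* Open-code %IF needs SAS 9.4M5 or later and must use %DO ... %END */
%if &classobs. > 0 %then %do;
  %export_to_excel(data=class, file="c:\temp\class.xlsx", sheet=class1)
%end;
```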
Now, if you're exporting a whole bunch of datasets, and you're generating your export calls from something, you can add the condition there, more easily and more smoothly than this.
Basically, build a dataset that contains one row per dataset to export, with four variables: dataset name, file to export to, sheet name (unless that's the same as the dataset name), and count of observations. You can make it by hand (if that makes sense), or with proc sql, proc contents, or another method of your choice. Often the first three variables are entered by hand, and the observation counts are then merged in via SQL or something similar.
Then you can generate calls to export, like so:
proc sql;
select cats('%export_to_excel(data=',dataname,',file=',filename,',sheet=',sheetname,')')
into :explist separated by ' '
from datasetwithnames
where obsnum>0;
quit;
&explist.; *this actually executes them;
Assuming obsnum is the new variable you created with the # of obs, and the other variables are obviously named. That won't pull a line with anything with 0 observations - so it never tries to execute the export. That works with the initial export macro just as well as with the modified one.
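One way to build that control dataset is from DICTIONARY.TABLES. This is a sketch under assumptions: every dataset in WORK is a candidate for export, each goes to its own file under c:\temp, and the sheet is named after the dataset. The variable names match the ones used in the query above.

```sas
proc sql;
  create table datasetwithnames as
  select memname as dataname,
         /* the quotes are part of the value because the macro call needs file="..." */
         cats('"c:\temp\', memname, '.xlsx"') as filename length=200,
         memname as sheetname,
         /* NOBS can include rows marked for deletion; good enough for a sketch */
         nobs as obsnum
  from dictionary.tables
  where libname = 'WORK' and memtype = 'DATA';
quit;
```

Run this after all the datasets have been created and before the step that builds the export list, so the counts are current.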
Suggest you google around for different approaches to writing XLS files.
Regarding using a DATA step or PROC step, the DATA step is tolerant of datasets that have 0 obs. If the SET statement reads a dataset that has 0 obs, it will simply end the step. So you don't need special logic. Most PROCs also accommodate a 0 obs dataset without throwing a warning or error.
For example:
1218 *Make a 0 obs dataset;
1219 data empty;
1220 x=1;
1221 stop;
1222 run;
NOTE: The data set WORK.EMPTY has 0 observations and 1 variables.
1223
1224 data want;
1225 put "I run before SET statement.";
1226 set empty;
1227 put "I do not run after SET statement.";
1228 run;
I run before SET statement.
NOTE: There were 0 observations read from the data set WORK.EMPTY.
NOTE: The data set WORK.WANT has 0 observations and 1 variables.
1229
1230 proc print data=empty;
1231 run;
NOTE: No observations in data set WORK.EMPTY.
But note, as Joe points out, PROC EXPORT will happily export a dataset with 0 obs and write a file with 0 records, overwriting the file if it was there already, e.g.:
1582 proc export data=sashelp.class outfile="d:\junk\class.xls";
1583 run;
NOTE: File "d:\junk\class.xls" will be created if the export process succeeds.
NOTE: "CLASS" range/sheet was successfully created.
1584
1585 data class;
1586 stop;
1587 set sashelp.class;
1588 run;
NOTE: The data set WORK.CLASS has 0 observations and 5 variables.
1589
1590 *This will replace class.xls;
1591 proc export data=class outfile="d:\junk\class.xls" replace;
1592 run;
NOTE: "CLASS" range/sheet was successfully created.
ODS statements would likely do the same.
I use a macro to check if a dataset is empty. See Stack Overflow answers such as:
How to detect how many observations in a dataset (or if it is empty), in SAS?
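A minimal sketch of such a macro, using the OPEN, ATTRN and CLOSE functions through %SYSFUNC. The macro name nobs is hypothetical, and -1 is used here as an arbitrary marker for "could not open the dataset".

```sas
%macro nobs(data);
  %local dsid n rc;
  %let n = -1;                                 /* returned if the dataset cannot be opened */
  %let dsid = %sysfunc(open(&data.));
  %if &dsid. %then %do;
    %let n  = %sysfunc(attrn(&dsid., nlobs));  /* logical obs, excluding deleted rows */
    %let rc = %sysfunc(close(&dsid.));
  %end;
  &n.
%mend nobs;

/* Usage: write the count to the log */
%put NOTE: sashelp.class has %nobs(sashelp.class) observations;
```

Because the macro resolves to a number, it can be used directly in a %IF condition or passed as the condition= parameter of an export macro.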

SAS Transpose Comma Separated Field

This is a follow-up to an earlier question of mine.
Transposing Comma-delimited field
The answer I got worked for the specific case, but now I have a much larger dataset, so reading it in a datalines statement is not an option. I have a dataset similar to the one created by this process:
data MAIN;
input ID STATUS STATE $;
cards;
123 7 AL,NC,SC,NY
456 6 AL,NC
789 7 ALL
;
run;
There are two problems here:
1: I need a separate row for each state in the STATE column
2: Notice the third observation says 'ALL'. I need to replace that with a list of the specific states, which I can get from a separate dataset (below).
data STATES;
input STATE $;
cards;
AL
NC
SC
NY
TX
;
run;
So, here is the process I am attempting that doesn't seem to be working.
First, I create a list of the STATES needed for the imputation, and a count of said states.
proc sql;
select distinct STATE into :all_states separated by ','
from STATES;
select count(distinct STATE) into :count_states
from STATES;
quit;
Second, I try to impute that list where the 'ALL' value appears for STATE. This is where the first error appears. How can I ensure that the variable STATE is long enough for the new value? Also, how do I handle the commas?
data x_MAIN;
set MAIN;
if STATE='ALL' then STATE="&all_states.";
run;
Finally, I use a SCAN function to read in one state at a time. I'm also getting an error here, but I think fixing the above part may solve it.
data x_MAIN_mod;
set x_MAIN;
array state(&count_states.) state:;
do i=1 to dim(state);
state(i) = scan(STATE,i,',');
end;
run;
Thanks in advance for the help!
Looks like you are almost there. Try this on the last Data Step.
data x_MAIN_mod;
set x_MAIN;
format out_state $2.;
nstate = countw(state,",");
do i=1 to nstate;
out_state = scan(state,i,",");
output;
end;
run;
Do you actually need two steps like that? If you skip the intermediate dataset, you can give a temporary variable a generous length ('big number') without much cost, because it is dropped before output.
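data x_MAIN;
* Declare lengths before SET: the temporary variable holds the full list;
length STATE_temp $150 out_state $2;
set MAIN;
if STATE='ALL' then STATE_temp="&all_states.";
else STATE_temp=STATE;
* Write one row per comma-separated state (an array named STATE would clash with the variable STATE);
do i=1 to countw(STATE_temp,',');
out_state = scan(STATE_temp,i,',');
output;
end;
drop STATE_temp i;
run;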
If you actually do need the STATE, then honestly I'd go with the big number (=50*3, so not all that big) and then add OPTIONS COMPRESS=CHAR; which will (give or take) turn your CHAR fields into VARCHAR (at the cost of a tiny bit of CPU time, but usually far less than the disk read/write time saved).

Understanding the SAS PDV in by-group processing

While I've read quite a bit about conceptualizing the Program Data Vector when using a SAS data step, I still don't understand how the PDV works when there is by group processing. For example if I have the dataset olddata
GROUP VAL
A 10
A 5
B 20
And I call a datastep on it with a by statement, such as:
data newdata;
set olddata;
by group;
...
run;
then the compiler adds two temporary variables to the PDV: first.group and last.group. When you read any tutorial on the PDV it will tell you that on the first pass of the SET statement, the PDV will look like:
_N_ _ERROR_ FIRST.GROUP LAST.GROUP GROUP VAL
1 0 1 0 A 10
and that LAST.GROUP is zero because observation 1 is not the last observation in group A.
Herein lies my question: How does SAS know that this is not the last observation?
If SAS is processing olddata row-by-row, how is the PDV aware that the next row holds another group A observation instead of a new group? In other words, it seems like SAS must be using information from previous or future rows to update the FIRST and LAST variables, but I'm not sure how. Is there some trick in how the PDV retains values from row to row when the BY statement is called?
SAS actually looks ahead to the next record to see if it should set LAST.(var) or not. I haven't been able to find an article explaining that in any detail, unfortunately. I was a bit disappointed to see that even papers like http://www.wuss.org/proceedings09/09WUSSProceedings/papers/ess/ESS-Li1.pdf just gloss over how LAST is determined.
SAS also looks ahead to see if the END= variable should be set, when specified, and a few other things. It's not just using metadata to determine those; you can remove or modify records without modifying the metadata, and it will still work - and SQL tables that don't have the usual SAS metadata will still allow you to perform normal BY group processing and such.
The FIRST variable doesn't need a look-behind, of course; it remembers where it was after all.
Edit: I crossposted this to SAS-L, and got the same answer - there doesn't seem to be any documentation of the subject, but it must read ahead. See http://listserv.uga.edu/cgi-bin/wa?A1=ind1303a&L=sas-l#8 for example.
Edit2: From SAS-L, Dan Nordlund linked to a paper that confirms this. http://support.sas.com/resources/papers/proceedings12/222-2012.pdf
The paper's logic confirms the lookahead: compare the number of observations read from the data set in each step's log.
DATA DS_Sample1;
Input Sum_Var
Product;
Cards;
100 3
100 2
100 1
;
*With BY statement - reads 3 observations even though it stops after 2.;
DATA DS_Sample2;
Set DS_Sample1;
by Sum_Var;
cnt+1; If CNT > 1 then stop;
Run;
*no BY statement - reads 2 observations as expected;
DATA DS_Sample2;
Set DS_Sample1;
cnt+1; If CNT > 1 then stop;
Run;
* END= option - again, a lookahead;
DATA DS_Sample2;
Set DS_Sample1 end=eof;
cnt+1; If CNT > 1 then stop;
Run;
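To watch the FIRST. and LAST. flags themselves, a small sketch using the olddata example from the question. It writes the automatic variables to the log on each iteration; the datalines are the three rows shown in the question.

```sas
data olddata;
  input group $ val;
  datalines;
A 10
A 5
B 20
;

/* Print the BY-group flags for every row that is read */
data _null_;
  set olddata;
  by group;
  put _n_= group= val= first.group= last.group=;
run;
```

On the first iteration LAST.group is already 0, which is only possible if the BY variable of the second row has been inspected by then.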