Sort Order of Datasets - sas

Is there any inherent method in SAS to find the sort order of a dataset?

As Sassy says, the only way SAS knows if a data set is sorted is if it did the sorting, or if you explicitly tell it the sort order. If you haven't done either of these steps, it will have no idea if the data is in any type of order.
I like AFHood's idea of just trying to sort it. If SAS knows it is sorted that way it will just tell you and won't do it again.
NOTE: Input data set is already sorted, no sorting done
Here are some other ideas for investigating data sorting...Enjoy.
If you just want to look at it manually, you can use proc contents data=libname.data;run; and look at the output. There is an attribute called sorted. If you are using the windowing mode you can right click on the data set in the explorer and choose properties, then click the details tab and see the sortedby values.
For a programmatic testing approach, you can use an output data set from proc contents. The sorted and sortedby columns will tell you if the data set is sorted and which variable it is sorted by. Try it by running the code below.
/* In an unsorted data set, proc contents will give missing values
for the sorted and sortedby columns of its output data */
proc contents data=sashelp.class out=class_contents noprint;run;
proc print data=class_contents;
var memname name sorted sortedby;
run;
/* Now sort and observe the changes in the sorted and sortedby columns */
proc sort data=sashelp.class out=class_sorted; by name;run;
proc contents data=class_sorted out=class_sorted_contents;run;
proc print data=class_sorted_contents;
var memname name sorted sortedby;
run;

If the data has indeed been "sorted", yes. There is a small table concerning sort information at the bottom of the output produced by proc contents.
Datasets that have been built with data that was already in some kind of sort order may not have this information attached to them and you will need to begin to explore the data to determine its order.

You can use the ATTRC function, which returns the SORTEDBY attribute of an open data set.
Docs at http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000147794.htm
It goes something like the following:
data _null_;
dsid=open("work.a","i");
sortby=attrc(dsid,"SORTEDBY");
put sortby=;
rc=close(dsid);
run;
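The same lookup can also be wrapped in a macro with %sysfunc so the result is usable in open code. This is only a minimal sketch: the macro name sortedby and the data set work.class_sorted are illustrative choices, not part of the answer above. ATTRC returns an empty string for SORTEDBY when the data set carries no sort information.

```sas
/* Hypothetical helper macro: returns the SORTEDBY attribute of a data set */
%macro sortedby(ds);
  %local dsid rc sortvars;
  %let dsid = %sysfunc(open(&ds, i));
  %if &dsid > 0 %then %do;
    %let sortvars = %sysfunc(attrc(&dsid, SORTEDBY));
    %let rc = %sysfunc(close(&dsid));
  %end;
  &sortvars
%mend sortedby;

/* Example data set, sorted by SAS so the attribute is populated */
proc sort data=sashelp.class out=work.class_sorted;
  by name;
run;

%put NOTE: work.class_sorted is sorted by: %sortedby(work.class_sorted);
```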

I'm assuming that the problem you are solving is trying to determine whether a dataset needs to be sorted in a programmatic manner. The best solution we've used is to simply use proc sort when in doubt.
Of course you might say that's overhead and processing. Well, yes, but if the dataset is already sorted correctly, proc sort will know it and let your code move on with minimal processing. It has the "if sorted then move on" logic built in.
If this isn't the problem you are trying to solve, elaborate and we'll see if we can help.

If SAS didn't sort the data but you think it might be sorted, you can try to process it as though it's sorted and deal with the errors that may or may not occur as a result.
* This works, swap some values to see how an error looks;
data foo;
input height;
cards;
1
2
3
4
;
run;
data _null_;
set foo;
by height;
run;
Error states can be detected and reset in macro, but that approach is likely to get messy. More info on that here
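If you would rather avoid triggering an error at all, one alternative (a different technique from the BY-statement trick above) is to scan the data once and compare each value with the previous one. A minimal sketch for a single ascending numeric key, assuming the data set has at least one observation; the names in_order and prev are made up for illustration:

```sas
data foo;
  input height;
  cards;
1
2
3
4
;
run;

data _null_;
  set foo end=last;
  retain in_order 1;          /* assume sorted until proven otherwise */
  prev = lag(height);         /* LAG must be called on every row */
  if _n_ > 1 and height < prev then in_order = 0;
  if last then call symputx('in_order', in_order);
run;

%put NOTE: in_order=&in_order;
```

With the four rows above the macro variable ends up as 1; swap two of the values and it becomes 0. This costs a full pass over the data, so it mainly pays off when a failed BY step would be awkward to recover from.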

Related

permanently save modified dataset

I know this is a very basic question but my code keeps failing when trying to run what I found through the help documentation.
Up to now I have been running an analysis project off of the .WORK directory which I understand gets wiped out every time a session ends. I have done a bunch of data cleaning and preparation and do not want to have to do that every time before I start my analysis.
So I understand, from reading this: https://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001310720.htm that I have to output the cleaned dataset to a non-temporary directory.
Steps I have taken so far:
1) created a new Library called "Project"
2) Saved it in a folder that I have under "my folders" in SAS
3) My code for saving the cleaned dataset to the "Project" library is as follows:
PROC SORT DATA=FAA_ALL NODUPKEY;
BY GROUND_SPEED;
DATA PROJECT.FAA_ALL;
RUN;
Then I run this code in a new program:
PROC PRINT DATA=PROJECT.FAA_ALL;
RUN;
It says there are no observations and that the dataset is essentially empty.
Can someone tell me where I'm going wrong?
Your problem is the PROC SORT
PROC SORT DATA=FAA_ALL NODUPKEY;
BY GROUND_SPEED;
DATA PROJECT.FAA_ALL;
RUN;
Should be
PROC SORT DATA=FAA_ALL OUT= PROJECT.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
That DATA PROJECT.FAA_ALL was starting a Data Step creating a blank data set.
Something else worth mentioning: your data step didn't do what you might have expected because you had no set statement. Your code was equivalent to:
PROC SORT DATA=WORK.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
DATA PROJECT.FAA_ALL;
SET _NULL_;
RUN;
PROJECT.FAA_ALL is empty because nothing was read in.
When no OUT= option is given, the SORT procedure sorts the dataset in place. You could have SAS move the sorted data by adding the set statement to your data step:
PROC SORT DATA=WORK.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
DATA PROJECT.FAA_ALL;
SET WORK.FAA_ALL;
RUN;
However, this still takes two steps, and requires extra disk I/O. Using the out option in a SAS procedure (as in DomPazz's answer) is almost always faster and more efficient than using a data step just to move data.
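One more thing to keep in mind: a libref generally has to be assigned again in each new session before PROJECT.FAA_ALL can be read, unless your environment assigns it automatically. A minimal sketch, where the folder path is a hypothetical placeholder for wherever the Project library actually lives:

```sas
/* Hypothetical path: replace with the folder backing your library */
libname project '/folders/myfolders/project';

/* Sort and write the result straight into the permanent library */
proc sort data=work.faa_all out=project.faa_all nodupkey;
  by ground_speed;
run;

/* In a later session, reassign the libref before reading the data */
libname project '/folders/myfolders/project';

proc print data=project.faa_all(obs=10);
run;
```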

How to get SAS tables sizes and last usage time in library

Good day!
I need a list of libraries-tables on a SAS server with a size of each table and last time, when it was open/used.
I'm not very familiar with SAS, so I don't even know where would I start searching :(
I assume, that there is some simple solution, maybe a proc of some sort, that may help...
You can use proc contents to access metadata about a library in SAS, for example using the sashelp library:
proc contents data = sashelp._ALL_ NODS;
run;
sashelp is the library you are referencing. By specifying _ALL_ you ask SAS for data about all the files in this library (by choosing a single file such as sashelp.ztc you can get information on just one file).
This will give you a lot of information, so by using the NODS option you can suppress the per-file detail. The above code will give you the number of files, their type, the level, the file size, and the date they were last modified.
If you want to output this information to a dataset, you have to use the ODS output system with the correct ODS table name, which in this case is Members. Furthermore, if you're looking for datasets in particular then you can filter the output with a where= data set option:
ods output Members = test (where = (memtype = "DATA"));
proc contents data = work._ALL_ NODS; /* no NOPRINT here: it would also suppress the ODS Members table */
run;
ods listing; /* change back to listing output*/
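Another route, not mentioned above, is to query the DICTIONARY.TABLES view with PROC SQL, which returns the same kind of metadata as an ordinary table. A minimal sketch; the output name work.table_sizes is made up, and note that modate is the last-modified timestamp. As far as I know there is no column for when a table was last opened, so that would have to come from the operating system.

```sas
proc sql;
  create table work.table_sizes as
  select libname,
         memname,
         nobs,       /* number of observations */
         filesize,   /* size of the file in bytes */
         crdate,     /* creation datetime */
         modate      /* last-modified datetime */
  from dictionary.tables
  where libname = 'SASHELP' and memtype = 'DATA'
  order by filesize desc;
quit;

proc print data=work.table_sizes(obs=20);
run;
```

Library names are stored in upper case in this view, so the where clause compares against 'SASHELP' rather than 'sashelp'.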

SAS proc Freq & gchart display additional value's frequency/ bars

This might be a weird question. I have a data set that contains responses like agree, neutral, disagree... for many questions. There are not many observations, so for some questions one or more options, say neutral, has a frequency of 0. When I run proc freq, since neutral never shows up in that variable, the table does not contain a row for neutral. I end up with tables with different numbers of rows. I would like to know if there is an option to show these 0 frequency rows. I will also need to run proc gchart for the same data set, and I will run into the same problem of having different numbers of bars. Please help me on this. Thank you!
This depends on how exactly you are running your PROC FREQ. It has the SPARSE option, which tells it to create a row for every logical cell of the table when creating an output dataset. Normally a crosstab shows an empty cell with a zero (or missing) count, but when the table is output to a dataset (which is vertical, i.e. each combination of x and y axis values is placed in one row) those rows are left off. SPARSE makes sure that doesn't happen; and in a larger (n-dimensional) crosstab, it creates rows for every possible combination of the variables' values, even combinations that don't occur in the data.
However, if you're just doing
proc freq data=mydata;
tables myvar;
run;
That won't help you, as SAS doesn't really have anything to go on to figure out what should be there.
For that, you have to use a procedure that supports class variables. PROC TABULATE is one such procedure, and is similar to PROC FREQ in its syntax (sort of). You need to use either CLASSDATA= on the PROC TABULATE statement or PRINTMISS on the TABLE statement. In the former case, you do not need to use a format, I don't believe. In the latter case (PRINTMISS), you need to create a format for your variable (if you don't already have one) that contains all levels of the data that you want to display (even if it's just an identity format, e.g. formatting character strings to identical character strings), and specify PRELOADFMT on the CLASS statement. See this man page for more details.
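To make the PRINTMISS route concrete, here is a minimal sketch. The data set survey, the variable q1 and the format $resp are all invented for illustration; the identity format simply lists every level that should appear, including the one nobody chose.

```sas
/* Hypothetical data: nobody answered "neutral" */
data survey;
  input q1 $;
  datalines;
agree
agree
disagree
;
run;

/* Identity format listing every level that should be displayed */
proc format;
  value $resp
    'agree'    = 'agree'
    'neutral'  = 'neutral'
    'disagree' = 'disagree';
run;

proc tabulate data=survey;
  class q1 / preloadfmt;                 /* load all formatted levels */
  format q1 $resp.;
  table q1, n / printmiss misstext='0';  /* keep empty levels, print 0 for blanks */
run;
```

PRELOADFMT only takes effect together with PRINTMISS, ORDER=DATA or EXCLUSIVE, which is why both options appear here.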

Is there an easy way to drop all variables from one of the datasets when merging in SAS?

Say I've already sorted set1 and set2 by the variables 'sticks', 'stones', and 'bones' and then i do this:
data merged;
merge set1(in=a) set2(in=b);
by sticks stones bones;
if a and b then output;
*else we don't want to do anything;
run;
Is there an easy way to drop all the variables from set2 in the merged dataset without having to type them all? I keep running into this problem where I have two datasets - both with quite a few variables - and I only want to merge them by a few variables and then only keep the variables from one of the sets.
I usually just use proc sql for something like this, but there are a few situations (more complex than above) where I think merge is better.
Also, I find it annoying that SAS requires you to "manually" sort datasets before merging them. If it will not let you merge datasets unless they are sorted correctly, why doesn't it just do it for you when you use merge? Thoughts? Maybe there is a way around this I don't know about.
The sort requirement exists because of how the merge statement and the PDV work together.
There is really no way around it.
However, what you're basically doing here is a lookup of set2 to make sure you have a match on the key variables (sticks stones bones), the equivalent of an inner join, which you could likely do more efficiently with a hash table or a set with key= (if you have an index, of course).
The easiest and most convenient way to get what you want here is a keep= data set option on set2, so that only the by variables are loaded into the PDV.
Something like this:
data merged;
merge set1(in=a) set2(in=b keep=sticks stones bones);
by sticks stones bones;
if a and b then output;
*else we don't want to do anything;
run;
If hash tables don't scare you and you want to learn more about how to implement them in this case, feel free to contact me for more help.
EDIT:
Here is a good paper about using hash tables http://www.nesug.org/proceedings/nesug06/dm/da07.pdf
Bear in mind that with hashes you should know what you're doing; they may yield unexpected results if you don't know what's happening under the hood.
Regardless here is the problem solved using a very simple and basic hash table
data merged2;
set set1;
if _N_ = 1 then do;
declare hash h(dataset:"set2");
h.defineKey('sticks','stones','bones');
h.defineData('sticks','stones','bones');
h.defineDone();
end;
rc = h.find();
if rc=0;
drop rc;
run;
The main benefit of this code is that it does not require sorting the datasets, which is a great time-saver when set2 is particularly big.
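For comparison, since the question mentions proc sql, the same "keep only set1's columns where the key exists in set2" filter can be sketched with an EXISTS subquery, which also needs no sorting. The sample rows and the extra columns value1 and value2 are invented so the example runs on its own.

```sas
/* Hypothetical sample data */
data set1;
  input sticks stones bones value1;
  datalines;
1 1 1 10
1 2 3 20
2 2 2 30
;
run;

data set2;
  input sticks stones bones value2;
  datalines;
1 1 1 99
2 2 2 98
3 3 3 97
;
run;

proc sql;
  create table merged_sql as
  select a.*                      /* only set1's variables are kept */
  from set1 as a
  where exists (
    select *
    from set2 as b
    where a.sticks = b.sticks
      and a.stones = b.stones
      and a.bones  = b.bones
  );
quit;
```

Unlike a join, EXISTS does not multiply rows when set2 holds duplicate keys, which matches the behaviour of the hash lookup.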

Compare in SAS proc

First off, I know pretty much nothing about SAS and I am not a programmer but an accountant, but here it goes:
I am trying to compare two data sets to identify differences between them, so I am using the 'proc compare' command as follows:
proc compare data=table1 compare=table2
criterion=.01;
run;
This works fine, but it compares line by line and in order, so if table2 is missing a row half way through, then all entries after that row will be returned as not equal.
How do I ask the comparison to be made based on a variable so that the proc compare finds the value associated with variable X in table 1, and then makes sure that the same variable X in table 2 has the same value?
The ID statement in PROC COMPARE is used to match rows. This code may work for you:
proc compare data=table1 compare=table2 criterion=.01;
id X;
run;
You may need to use PROC SORT to sort the data by X before doing the PROC COMPARE. Refer to the PROC COMPARE documentation for details on the ID statement to determine if you should sort or not.
Here is a link to the PROC COMPARE documentation:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/a000057814.htm
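Putting the pieces together, a minimal sketch with made-up rows (the variable amount and the values are purely illustrative). Both tables are sorted by X first, and the LISTALL option asks PROC COMPARE to report observations that exist in only one of the tables.

```sas
/* Hypothetical sample data: table2 is missing the row with X=2 */
data table1;
  input X amount;
  datalines;
1 10.00
2 20.00
3 30.00
;
run;

data table2;
  input X amount;
  datalines;
1 10.00
3 30.50
;
run;

proc sort data=table1; by X; run;
proc sort data=table2; by X; run;

proc compare base=table1 compare=table2 criterion=.01 listall;
  id X;   /* match rows on X instead of by position */
run;
```

With the ID statement, the missing X=2 row is reported once as an unmatched observation instead of shifting every later comparison.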