In SAS, Proc HPBIN, OUTPUT option does not keep original variables as explained below
OUTPUT=SAS-data-set
creates an output SAS data set in single-machine mode or a database table that is saved alongside the distributed database in distributed mode. The output data set or table contains binning variables. To avoid data duplication for large data sets, the variables in the input data set are not included in the output data set.
--> How can I keep original variables and bin number?
Do a 1:1 merge of the output data set with the input data set.
For example:
proc hpbin data=sashelp.orsales noprint out=profitbinned;
var profit;
run;
data want;
merge sashelp.orsales profitbinned;
* 1:1 merge does not have a BY statement;
run;
If the input data has a primary, or unique key, you can specify those key variables in an ID statement to ensure a more robust post-binning merge:
proc hpbin data=sashelp.citiday noprint out=dowcmp_binned;
var SNYDJCM;
id date;
run;
proc sql;
create table want as
select bin.date, ticker.SNYDJCM, bin.bin_SNYDJCM
from sashelp.citiday as ticker
join work.dowcmp_binned as bin
on ticker.date = bin.date
order by bin.date;
Related
I have multiple SAS dataset in single location(folder) with two columns and name of the SAS dataset seems to be Diagnosis_<diagnosis_name>.
Here I want to load all dataset and combine all together like below,
Sample data set
File Location: C:\Users\xyz\Desktop\diagnosis\Diagnosis_<diagnosis_name>.sas7bdat
1. Dataset Name : Diagnosis_Diabetes.sas7bdat
2. Dataset Name : Diagnosis_Obesity.sas7bdat
Ouput which I expect like this
Could you please help me on this.
You can just combine the datasets using SET statement. If want all of the datasets with names that start with a constant prefix you can use the : wildcard to make a name list.
First create a libref to reference the directory:
libname diag 'C:\Users\xyz\Desktop\diagnosis\';
Then combine the datasets. If the original datasets are sorted by the PersonID then you can add a BY statement and the result will also be sorted.
data tall;
set diag.diagnosis_: ;
by person_id;
run;
If want to generate that wide dataset you could use PROC TRANSPOSE, but in that case you will need some extra variable to actually transpose.
data tall;
set diag.diagnosis_: ;
by person_id;
present=1;
run;
proc transpose data=tall out=want(drop=_name_);
by person_id;
id diagnosis;
var present;
run;
I have run a regression on a data set. I would like to then add the predicted values into the original data set table. I would like the PredictedMS_Diff values to be added to the PROPreg_CSR_final dataset.
proc reg data=PROPreg_CSR_final outest=outest_model_1 covout plots=diagnostics(stats=(default aic
sbc));
title "CSR Final";
FinalCSR:MODEL MS_Diff_CSR=Rank_Delta_prop;
Output PREDICTED=PredictedMS_Diff
run;
title;
Your output statement does not have an OUT= option so the data set is named by SAS. Also missing a semicolon.
Output PREDICTED=PredictedMS_Diff
If that has worked it would have been a copy of the input data with PredictedMS_Diff added.
proc reg data=sashelp.class;
model weight=height;
output out=pred predicted=p residual=r;
run;
My question is about the append of two different tables that are supposed to have the same name/format/type/length variables.
I am trying to create a step in my SAS program where I don't allow my program to be executed if the format/type/length of variables with the same name is not the same.
For example, when in one table I have a date in type string "dd-mm-yyyy" and in the other table I have the "yyyy-mm-dd" or "dd-mm-yyyy hh:mm:ss". After the append, our daily executions based on these input tables didn't work as expected. Sometimes the values come up as missing or out of order, since the formats are different.
I tried using the PROC COMPARE statement, which allowed me to check which variables have Differing Attributes (Type, Length, Format, InFormat and Labels).
proc compare base = SAS-data-set
compare = SAS-data-set;
run;
However, I only got the info on which variables have differing atributes (listing of common variables with differing attributes), not being able to do anything with/about it.
On the other hand, I would like to know if there's a chance to have a structured output table with this information, in order to use it as a control statement.
Creating an automatic task to do it would save me a lot of time.
Screenshot of an example:
You can use Proc CONTENTS to get information about a data sets variables. Do that for both data sets, and then you can use Proc COMPARE to create a data set informing you of the variable attributes differences.
data cars1;
set sashelp.cars (obs=10);
date = today ();
format date date9.;
cars1_only = 1;
x = 1.458; label x = "x-factor";
run;
data cars2;
length type $50;
set sashelp.cars (obs=10);
format date yymmdd10.;
cars2_only = 1;
X = 1.548; label x = "X factor to apply";
run;
proc contents noprint data=cars1 out=cars1_contents;
proc contents noprint data=cars2 out=cars2_contents;
run;
data cars1_contents;
set cars1_contents;
upName = upcase(Name);
run;
data cars2_contents;
set cars2_contents;
upName = upcase(Name);
run;
proc sort data=cars1_contents; by upName;
proc sort data=cars2_contents; by upName;
run;
proc compare noprint
base=cars1_contents
compare=cars2_contents
outall
out=cars_contents_compare (where=(_TYPE_ ne 'PERCENT'))
;
by upName;
run;
There is also an ODS table you can capture directly without having to run Proc CONTENTS, but the capture is not 'data-rific'
ods output CompareVariables=work.cars_vars;
proc compare base=cars1 compare=cars2;
run;
I have approximately 1,000,000 rows and 25 columns of data and I'm trying to return a list of column names, the number of distinct values and whether there are missing values.
I am not able to directly code in column names in PROC SQL and count distinct as I have numerous data sets with different column names and I'm trying to automatically return the desired outcome for all tables with one piece of code.
I've tried running the following code
proc freq nlevels data= &DATASET_NAME;
ods output nlevels=nlevels ;
tables _all_ NOPRINT;
run;
This returns an out of memory error. Is there another way to achieve the result, avoiding the out of memory error.
It is unnecessary to input column name by table _all_, but it possibly makes out of memory by inputting all columns at the same time, try to separate column to do proc freq and then combine results:
proc sql;
create table name as
select name from dictionary.columns where libname='SASHELP' and memname='CLASS';
quit;
data want;
run;
data _null_;
set name;
call execute(
'proc freq data=class nlevels;
table '||name||';
ods output nlevels=nlevels;
run;
data want;
set want nlevels;
run;'
);
run;
This question is very similar to SAS summary statistic from a dataset
The answers cover techniques for
transpose + freq
hash
freq w/ ODS exclude+output
I want to count the number of records in a dataset in SAS. There is a function the make this thing in a simple way? I used R ed for obtain this information there was the length() function. Morover I need the number of record to compute some percetages so I need this value not in a table but in a value that can be used for other data step. How can I fix?
Thanks in advance
Here is another solution, using SAS dictionaries,
proc sql;
select nobs into: num_obs
from dictionary.tables
where libname = "WORK" and memname = "A"
;
quit;
It is easy to get the size of many datasets by modifying the above code,
proc sql;
create table test as
select memname, nobs
from dictionary.tables
where libname = "WORK" and memname like "A%"
;
quit;
data _null_;
set test;
call symput(memname, nobs);
run;
The above code will give you the sizes of all data sets with name starting with "a" in the temporary/work library.
Assuming this is a basic SAS table that you've created, and not modified or appended to, the best way is to use the meta data held in a dataset (the Number of tries is held in a piece of meta data called "nobs"), without reading through the dataset its self and place it in a macro variable. You can do this in the following way:
Data _null_;
i=1;
If i = 0 then set DATASETTOCOUNT nobs= mycount;
Call symput('mycount', mycount);
Run;
%put &mycount.;
You will now have a macro variable that contains the number of rows in your dataset, that you can call on in other data steps using &mycount.