Creating specific PROC JSON hierarchy - sas

I am trying to create a JSON file using SAS Enterprise Guide (EG) in the following format:
{
"schema": "EMAIL_SHA26",
"data": ["1516e67afa2d9c3874c3e9874bdb41c4", "1a7e5f59b3f0dfe6ea152cb65aedb0d2"]
}
I am pretty close, but my resulting JSON file has a few too many brackets. Here is what I am currently getting:
{
"schema": "EMAIL_SHA26",
"data": [
[
"1516e67afa2d9c3874c3e9874bdb41c4"
],
[
"1a7e5f59b3f0dfe6ea152cb65aedb0d2"
]
]
}
And here is the code I am using:
PROC JSON
OUT = users NOSASTAGS PRETTY;
WRITE VALUES "schema";
WRITE VALUES "EMAIL_SHA26";
WRITE VALUES "data";
WRITE OPEN ARRAY;
EXPORT users_ds / NOKEYS;
WRITE CLOSE;
RUN;
The "users_ds" data set has one column with the 2 data records in it. Is there any way I can prevent it from putting brackets around each value in the data set? Furthermore, is my desired output achievable knowing that the list of hashed emails could be as large as 10,000 records?
Any help would be greatly appreciated.

By default, EXPORT writes a data set as an array of objects, each made of name:value pairs. With NOKEYS, the output is an array of arrays, where each inner array holds the values of one row.
To get a single array of values for a column, transpose the column into a single-row data set and export that. You do not need WRITE OPEN ARRAY before the EXPORT.
data have;
do row = 1 to 10;
userid = uuidgen();
age = 19 + row;
output;
end;
run;
* transpose a single column into a single row data set;
proc transpose data=have out=have_1_row(drop=_name_);
var userid;
run;
filename users "C:\temp\users.json" ;
proc json out = users nosastags pretty;
WRITE VALUES "schema";
WRITE VALUES "EMAIL_SHA26";
WRITE VALUES "data";
EXPORT have_1_row / NOKEYS;
RUN;
This yields the following JSON:
{
"schema": "EMAIL_SHA26",
"data": [
"6ebd89fa-b6bc-4c14-b094-43792d202ad7",
"ec53dd59-1290-47d7-b437-0c754349434c",
"17332882-58ca-4c09-a599-2048d58460d0",
"d5b57a19-ff73-4deb-bfc7-62ebc19d719e",
"9d2758b2-e128-45df-8589-99cd7204c1ab",
"a13bcba7-742f-4a01-bd56-dc12f4190d3e",
"5f853bf3-9597-4c94-9b57-a54d3de190c3",
"0edbd2d8-bd5d-46be-aaa7-ac208df4ba62",
"07347e73-7efa-4e9c-8242-5a9c85f07b56",
"03976b1b-513f-41ee-92d5-d23c8d3d4918"
]
}
To EXPORT more than one column as an array of values, consider using DOSUBL to invoke a macro that side-runs the transposition and creates the single-row data set, which a macro-generated EXPORT statement then uses:
%macro transpose_column(data=, column=, out=);
%* generate code that will transpose a single column into a single row data set;
proc transpose data=&data out=&out(keep=col:);
var &column;
run;
%mend;
%macro export_column_as_array (data=, column=);
%local rc out;
%let out = _%sysfunc(monotonic());
%* Invoke DOSUBL to side-run macro generated proc transpose code;
%let rc = %sysfunc(
DOSUBL(
%transpose_column(data=&data, column=&column, out=&out)
)
);
%* use the output data set created by the side-run code;
WRITE VALUES "&column";
EXPORT &out / NOKEYS;
%mend;
data have;
do row = 1 to 10;
userid = uuidgen();
age = 19 + row;
date = today() - row;
output;
end;
format date yymmdd10.;
run;
filename users "C:\temp\users.json" ;
options mprint mtrace;
proc json out = users nosastags pretty;
WRITE VALUES "schema";
WRITE VALUES "EMAIL_SHA26";
%export_column_as_array(data=have,column=userid);
%export_column_as_array(data=have,column=age);
%export_column_as_array(data=have,column=date);
run;
quit;

The output you are getting indicates that there are multiple rows each with a different email.
To get the output you desire, you would need to concatenate all of these emails into one long string, which is problematic because SAS limits the length of a character variable (32,767 bytes).
Here is a workaround. It uses CALL EXECUTE to generate the PROC JSON statements that write the JSON you want, one WRITE VALUES statement per row:
data have;
data = "1516e67afa2d9c3874c3e9874bdb41c4"; output;
data = "1a7e5f59b3f0dfe6ea152cb65aedb0d2"; output;
run;
data _null_;
set have end=lastrec;
if _N_ = 1 then do;
call execute(
"PROC JSON OUT = 'want.json' NOSASTAGS PRETTY;
WRITE VALUES 'schema';
WRITE VALUES 'EMAIL_SHA26';
WRITE VALUES 'data';
WRITE OPEN ARRAY;
");
end;
call execute(cats('WRITE VALUES "', data, '";'));
if lastrec then call execute("WRITE CLOSE; RUN;");
run;
This produces:
{
"schema": "EMAIL_SHA26",
"data": [
"1516e67afa2d9c3874c3e9874bdb41c4",
"1a7e5f59b3f0dfe6ea152cb65aedb0d2"
]
}


SAS 9.4: How to use different weights on the same variable without datastep or proc sql

I can't find a way to summarize the same variable using different weights.
Let me explain with an example (3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" more than once (I can't rename it!).
I have a huge amount of data to store and not much disk space, and otherwise I would have to build about 120 fields (a0-a6, b0-b6, etc.) that are just the same variables multiplied by fixed weights (wgt0-wgt5).
I want to store a dataset with 20 columns (a, b, c, ...) and 6 weights (wgt0-wgt5) and run a "summary" on demand, without an intermediate data step that forces me to create 120 fields.
Because of the volume of data (roughly 55 GB every month), I would also prefer not to use a PROC SQL statement like:
proc sql;
create table pluto
as select sum(db.a * wgt1) as a0, sum(db.a * wgt1) as a1 , etc.
quit;
Is there a "super proc summary" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
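The data step view option might look like the minimal sketch below. It assumes the pippo data set from the question; the view name pippo_w and the output names a1-a3 are hypothetical. Because it is a view, the weighted columns are computed as PROC SUMMARY reads the data and are never stored on disk.

```sas
/* View that computes a*wgt1..a*wgt3 on the fly; nothing is written to disk */
data pippo_w / view=pippo_w;
  set pippo;
  array w[3]  wgt1-wgt3;
  array aw[3] a1-a3;      /* hypothetical names for the weighted copies of a */
  do _i = 1 to dim(w);
    aw[_i] = a * w[_i];
  end;
  keep a1-a3;
run;

/* A plain (unweighted) SUM of each product is the weighted sum of a */
proc summary data=pippo_w missing nway;
  var a1-a3;
  output out=pluto(drop=_freq_ _type_) sum=;
run;
```

For the real data, the two ARRAY statements would need to be generated for all 20 variables and 6 weights.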
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the data;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
You can SCORE your data set using a customized SCORE data set that you can generate with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;

SAS Append datasets only if they exist

I have many monthly datasets that share the same name except for a suffix identifying the month. For instance, the datasets I reference with this code:
TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
are called "TEMPCAAD.LIFT_MODEL_V1_202021", "TEMPCAAD.LIFT_MODEL_V1_202022" and so on...
I am trying to append all of the datasets, but some of them don't exist, so when I run the following code I get the error
Dataset "TEMPCAAD.LIFT_MODEL_V1_202022" does not exist.
%let currentmonth = &anomes_scores;
%let previousyearmonth = &anomes_x12;
data _null_;
length string $1000;
cur_month = input("&previousyearmonth.01",yymmdd8.);
do until (cur_month > input("&currentmonth.01",yymmdd8.));
string = catx(' ',trim(string),'TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
cur_month = intnx('month',cur_month,1,'b');
end;
call symput('mydatasets',trim(string));
%put &mydatasets;
run;
data WORK.LIFTS_U6M;
set &mydatasets.;
run;
How can I append only existing datasets?
Instead of looping over every file to check whether it exists, why not extract all the dataset names from dictionary.tables?
libname TEMPCAAD "/home/kermit/TEMPCAAD";
data tempcaad.lift_model_v1_202110 tempcaad.lift_model_v1_202111 tempcaad.lift_model_v1_202112;
id = 1;
output tempcaad.lift_model_v1_202110;
id = 2;
output tempcaad.lift_model_v1_202111;
id = 3;
output tempcaad.lift_model_v1_202112;
run;
%let nome_modelo = MODEL;
%let versao_modelo = V1;
proc sql;
select strip("TEMPCAAD."||memname) into :dataset separated by " "
from dictionary.tables
where libname="TEMPCAAD" and memname like "LIFT_&NOME_MODELO._&VERSAO_MODELO.%";
quit;
data want;
set &dataset.;
run;
You can easily tweak the WHERE clause to extract only the datasets that you wish to append. Just remember to use double quotes around any string that contains a macro variable, otherwise it will not resolve.
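If you would rather keep the month loop from the question, a minimal sketch is to test each name with the EXIST function before adding it to the list. The macro variable values below are hypothetical stand-ins for &anomes_x12 and &anomes_scores, and dsname and end_month are illustrative helper variables.

```sas
%let previousyearmonth = 202101;  /* hypothetical start month */
%let currentmonth      = 202112;  /* hypothetical end month */
%let nome_modelo       = MODEL;
%let versao_modelo     = V1;

data _null_;
  length string $1000 dsname $41;
  cur_month = input("&previousyearmonth.01", yymmdd8.);
  end_month = input("&currentmonth.01", yymmdd8.);
  do while (cur_month <= end_month);
    dsname = cats("TEMPCAAD.LIFT_&nome_modelo._&versao_modelo._",
                  put(cur_month, yymmn6.));
    /* keep the name only when the data set actually exists */
    if exist(dsname) then string = catx(' ', string, dsname);
    cur_month = intnx('month', cur_month, 1, 'b');
  end;
  call symputx('mydatasets', string);
run;

%put &mydatasets;
```

If none of the datasets exist, &mydatasets is empty, so the following SET statement would need a guard.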

Highlight the corresponding line number with another dataset

I have two datasets, one of which holds the extreme values extracted by proc univariate. I would like to create a new variable that is set to 1 when the row number n in the original dataset equals the extracted observation number in the univariate dataset. But I don't know how to program this without manually entering the line numbers.
 
There are a few ways to do this, but one easy way is to add the row number to the original dataset and merge on it.
Here's an example.
ods output extremeobs=extreme_test;
proc univariate data=sashelp.heart;
run;
ods output close;
data extreme_diastolic extreme_systolic; *just creating the extreme datasets;
set extreme_test;
if varname='Diastolic' then output extreme_diastolic;
else if varname='Systolic' then output extreme_systolic;
run;
data for_merge; *adding rownum on to the original dataset;
set sashelp.heart;
rownum = _n_;
run;
*now, sort the extreme datasets by the `highobs` and `lowobs` values respectively and save those as `rownum`, so they can be merged;
proc sort data=extreme_diastolic out=high_diastolic(keep=highobs rename=highobs=rownum);
by highobs;
run;
proc sort data=extreme_systolic out=high_systolic(keep=highobs rename=highobs=rownum);
by highobs;
run;
proc sort data=extreme_diastolic out=low_diastolic(keep=lowobs rename=lowobs=rownum);
by lowobs;
run;
proc sort data=extreme_systolic out=low_systolic(keep=lowobs rename=lowobs=rownum);
by lowobs;
run;
*now, merge those on using `in=` to identify which are matches.;
data heart_extremes;
merge for_merge high_diastolic(in=_highd) high_systolic(in=_highs) low_diastolic(in=_lowd) low_systolic(in=_lows);
by rownum;
if _highd then high_diastolic = 1;
if _highs then high_systolic = 1;
if _lowd then low_diastolic = 1;
if _lows then low_systolic = 1;
run;

How do I create a SAS data view from an 'Out=' option?

I have a process flow in SAS Enterprise Guide which is comprised mainly of Data views rather than tables, for the sake of storage in the work library.
The problem is that I need to calculate percentiles (using proc univariate) from one of the data views and left join this to the final table (shown in the screenshot of my process flow).
Is there any way that I can specify the outfile in the univariate procedure as being a data view, so that the procedure doesn't calculate everything prior to it in the flow? When the percentiles are left joined to the final table, the flow is calculated again so I'm effectively doubling my processing time.
Please find the code for the univariate procedure below
proc univariate data=WORK.QUERY_FOR_SGFIX noprint;
var CSA_Price;
by product_id;
output out= work.CSA_Percentiles_Prod
pctlpre= P
pctlpts= 40 to 60 by 10;
run;
In SAS, my understanding is that procs such as proc univariate cannot generally produce views as output. The only workaround I can think of would be for you to replicate the proc logic within a data step and produce a view from the data step. You could do this e.g. by transposing your variables into temporary arrays and using the pctl function.
Here's a simple example:
data example /view = example;
array _height[19]; /*Number of rows in sashelp.class dataset*/
/*Populate array*/
do _n_ = 1 by 1 until(eof);
set sashelp.class end = eof;
_height[_n_] = height;
end;
/*Calculate quantiles*/
array quantiles[3] q40 q50 q60;
array points[3] (40 50 60);
do i = 1 to 3;
quantiles[i] = pctl(points[i], of _height{*});
end;
/*Keep only the quantiles we calculated*/
keep q40--q60;
run;
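With a bit more work, you could also make this approach return percentiles for individual by groups rather than for the whole dataset at once. You would need to write a double-DOW loop to do this, and the input must be sorted by the by variable (the code below reads a dataset named class with BY sex), e.g.: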
data example;
array _height[19];
array quantiles[3] q40 q50 q60;
array points[3] _temporary_ (40 50 60);
/*Clear heights array between by groups*/
call missing(of _height[*]);
/*Populate heights array*/
do _n_ = 1 by 1 until(last.sex);
set class end = eof;
by sex;
_height[_n_] = height;
end;
/*Calculate quantiles*/
do i = 1 to 3;
quantiles[i] = pctl(points[i], of _height{*});
end;
/* Output all rows from input dataset, with by-group quantiles attached*/
do _n_ = 1 to _n_;
set class;
output;
end;
keep name sex q40--q60;
run;

SAS - Sort Multiple Dataset using loops

I have a list of SAS datasets which I want to sort by the same variable.
I do not want to use the PROC Sort statement for each one of them,
is there a way to use loops to shorten the entire code?
I am new to SAS so please help!
%let prim =sasdata.qc_no_rx ;
%let other_removals = sasdata.qc_other_removals;
%let drops =sasdata.droplist;
Array data_1(3) $ sasdata.qc_no_rx sasdata.qc_other_removals
sasdata.droplist ;
do over data_1;
Proc sort data = data_1 ;
by ims_ref;
end;
Assume you have a data set called dname_list that holds the data set names in a variable called dname. CALL EXECUTE will generate the code and execute it.
I usually build the command in a string and then pass that string to CALL EXECUTE. This is a data _null_ step, so it doesn't create a data set, but you can switch to the commented-out DATA statement at first to inspect the generated strings if necessary.
You don't need an explicit loop because the data step already iterates over the records of the input data set.
If you're sorting data in a library, make sure to include the library name as well.
data _null_;
*data dname_execute;
set dname_list;
string = catt('proc sort data=', dname, '; by age; run;');
call execute(string);
run;
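As a minimal sketch, the dname_list data set assumed above could be built from the three data sets named in the question like this. The length of 41 allows for an 8-character libref, a period, and a 32-character member name.

```sas
/* One row per data set to sort; names taken from the question */
data dname_list;
  length dname $41;
  input dname;
  datalines;
sasdata.qc_no_rx
sasdata.qc_other_removals
sasdata.droplist
;
run;
```

To sort by the variable from the question, the generated statement would use by ims_ref instead of by age.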
This should help:
%macro multsort(indlist,outdlist,byvarlist,ndata);
%local i indata outdata byvars;
%do i = 1 %to &ndata.;
%* pick the i-th space-delimited entry from each list;
%let indata = %scan(&indlist., &i., %str( ));
%let outdata = %scan(&outdlist., &i., %str( ));
%let byvars = %scan(&byvarlist., &i., %str( ));
proc sort data = &indata. out=&outdata.; by &byvars.; run;
%end;
%mend;
%multsort(indlist=sashelp.Air sashelp.Buy,outdlist=Sa Sb,byvarlist=Air amount,ndata=2);