How do I unequally bin data in SAS? - sas

I am trying to take a data set that ranges from -2.0 to 1.55 and make three bins of the data. The data are z scores that represent lengths, and I'm trying to find cutoffs for "short, medium, long," essentially.
I know this cannot be done using univariate, so I've been playing with the bin statement, but I am having trouble with it.
Here is the data and some code I can't get to work:
data limbs;
input
LowerLimb
;
datalines;
0.5945611665
-0.5826515170
-1.2089586047
-0.7638814175
-1.3541648163
-0.8279052306
-0.9069854423
0.9439623714
-0.1525671573
0.4056990026
1.5466954947
0.4034370839
0.8766515519
-1.5657943810
0.1315781412
0.5884629368
0.6104427011
-0.1874296672
-0.6318866100
-1.0145154507
0.4573267066
-0.0788037696
0.0988716187
-0.1062918576
1.6032740744
0.0366051704
1.2256319114
-0.0975189376
-0.2566850316
1.6652074953
-0.2515183734
0.5004921436
-0.1593883100
0.8129010817
1.1351908320
0.8123843303
1.2870155459
-1.1128448929
0.4506147031
-0.6403088674
-1.0680294390
0.6944292489
-1.5325710123
0.5268637927
1.2873515926
-0.1441695459
-0.7166217143
1.2461334186
-0.9583531596
-0.7342533139
-0.4907715810
0.2059216422
-1.3839801362
0.7310499731
0.6130932991
1.0859079024
1.4255534497
-0.1813774454
0.6544726467
-1.0171713430
-0.5005970523
0.5884629368
-1.4752458285
-0.9195150817
-1.4752458285
0.8515222486
-0.6348874123
-1.0206723355
-0.3331377791
-1.1015990720
-0.1196299907
-0.3504059025
-0.5797983640
-0.2784242647
0.4381186749
0.4665127006
-1.8605760577
-1.5943485819
-0.4196862951
0.5247889197
1.6982671983
-0.5015070356
0.2510690218
-0.1088654424
-0.0470926244
-1.1256998568
-0.4694781522
-0.4903309954
-0.1706456902
-0.7996053224
-0.2106636370
1.1087050595
1.5393390992
0.6407710538
0.8738320036
-1.1218388138
0.5477816746
0.5999120789
0.2915917178
0.5932996471
-0.4754278117
-0.1195030573
0.3480903069
0.1629924791
-0.8543653798
0.0602221361
-0.3484280234
0.8213886228
1.0996879917
-1.0171713430
-0.2613938856
0.1435928118
-0.2410397237
2.0380301721
0.9942206208
-0.7858668669
1.0463609814
0.5651396814
-0.4366703308
-1.2232641582
-0.3770888329
-1.9197016431
1.0463609814
-1.3738499052
-1.0554234361
1.1701816705
-0.8687068897
-0.8743902197
-1.3518493892
-1.6473112739
-0.2953961077
0.5734156662
0.5065516647
1.1603237185
-0.3369092077
1.0982075159
-1.0002141384
-0.9192524613
-0.0431072738
-0.0742208903
0.8658302777
-1.1095158202
-0.8361540961
0.5871263103
-0.3311134236
0.3331929252
-0.6499008335
-1.1966097379
0.7227541366
0.1853978157
0.8074323856
-0.8096153897
-1.0220042319
1.1172583088
1.3540629514
-1.3149667205
-0.7600725098
1.1145492382
1.3270625584
1.1572834877
-1.1877623250
0.3202875975
0.8779400227
-0.4333817521
1.2656618368
-1.1416425163
-0.9014599711
0.3918324501
-0.7997140876
-0.2229835864
-0.0362833762
-0.4399531798
1.1975110853
-0.0183032379
-0.7413393186
0.8474043498
1.2789829755
0.8673767628
-0.4438902513
1.0776590402
0.1910287517
1.1313548102
0.8659515949
-0.9444619985
-0.9926366647
0.6447604307
0.8370824694
-0.8917575544
1.5862615371
0.8437626048
0.2362696149
-1.1429415083
-1.2621422188
-0.7364910931
0.3618265073
0.1708182871
-0.3114446248
0.0011119450
-1.0323790009
0.5951509779
1.6392758858
0.9646250229
-0.9076823320
-0.1409210592
-0.8529359998
;
cutpts = do(-2.00, 1.55);
b = bin(x, 3); /* i_th element is 1-8 to indicate bin */

If you want to split the data into 3 groups of roughly equal size, you can use PROC RANK with GROUPS=3; the new variable bin takes the values 0, 1, and 2. You don't have to sort by the target variable first, but it makes it easier to see what was done.
data limbs;
input LowerLimb @@;
datalines;
0.5945611665 -0.5826515170 -1.2089586047 -0.7638814175 -1.3541648163 -0.8279052306 -0.9069854423 0.9439623714 -0.1525671573 0.4056990026
1.5466954947 0.4034370839 0.8766515519 -1.5657943810 0.1315781412 0.5884629368 0.6104427011 -0.1874296672 -0.6318866100 -1.0145154507
0.4573267066 -0.0788037696 0.0988716187 -0.1062918576 1.6032740744 0.0366051704 1.2256319114 -0.0975189376 -0.2566850316 1.6652074953
-0.2515183734 0.5004921436 -0.1593883100 0.8129010817 1.1351908320 0.8123843303 1.2870155459 -1.1128448929 0.4506147031 -0.6403088674
-1.0680294390 0.6944292489 -1.5325710123 0.5268637927 1.2873515926 -0.1441695459 -0.7166217143 1.2461334186 -0.9583531596 -0.7342533139
-0.4907715810 0.2059216422 -1.3839801362 0.7310499731 0.6130932991 1.0859079024 1.4255534497 -0.1813774454 0.6544726467 -1.0171713430
-0.5005970523 0.5884629368 -1.4752458285 -0.9195150817 -1.4752458285 0.8515222486 -0.6348874123 -1.0206723355 -0.3331377791 -1.1015990720
-0.1196299907 -0.3504059025 -0.5797983640 -0.2784242647 0.4381186749 0.4665127006 -1.8605760577 -1.5943485819 -0.4196862951 0.5247889197
1.6982671983 -0.5015070356 0.2510690218 -0.1088654424 -0.0470926244 -1.1256998568 -0.4694781522 -0.4903309954 -0.1706456902 -0.7996053224
-0.2106636370 1.1087050595 1.5393390992 0.6407710538 0.8738320036 -1.1218388138 0.5477816746 0.5999120789 0.2915917178 0.5932996471
-0.4754278117 -0.1195030573 0.3480903069 0.1629924791 -0.8543653798 0.0602221361 -0.3484280234 0.8213886228 1.0996879917 -1.0171713430
-0.2613938856 0.1435928118 -0.2410397237 2.0380301721 0.9942206208 -0.7858668669 1.0463609814 0.5651396814 -0.4366703308 -1.2232641582
-0.3770888329 -1.9197016431 1.0463609814 -1.3738499052 -1.0554234361 1.1701816705 -0.8687068897 -0.8743902197 -1.3518493892 -1.6473112739
-0.2953961077 0.5734156662 0.5065516647 1.1603237185 -0.3369092077 1.0982075159 -1.0002141384 -0.9192524613 -0.0431072738 -0.0742208903
0.8658302777 -1.1095158202 -0.8361540961 0.5871263103 -0.3311134236 0.3331929252 -0.6499008335 -1.1966097379 0.7227541366 0.1853978157
0.8074323856 -0.8096153897 -1.0220042319 1.1172583088 1.3540629514 -1.3149667205 -0.7600725098 1.1145492382 1.3270625584 1.1572834877
-1.1877623250 0.3202875975 0.8779400227 -0.4333817521 1.2656618368 -1.1416425163 -0.9014599711 0.3918324501 -0.7997140876 -0.2229835864
-0.0362833762 -0.4399531798 1.1975110853 -0.0183032379 -0.7413393186 0.8474043498 1.2789829755 0.8673767628 -0.4438902513 1.0776590402
0.1910287517 1.1313548102 0.8659515949 -0.9444619985 -0.9926366647 0.6447604307 0.8370824694 -0.8917575544 1.5862615371 0.8437626048
0.2362696149 -1.1429415083 -1.2621422188 -0.7364910931 0.3618265073 0.1708182871 -0.3114446248 0.0011119450 -1.0323790009 0.5951509779
1.6392758858 0.9646250229 -0.9076823320 -0.1409210592 -0.8529359998
;;;;
run;
proc sort data=limbs;
by lowerLimb;
run;
proc rank data=limbs out=bin groups=3;
var lowerLimb;
ranks bin;
run;
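To make the grouping less of a black box: as far as I recall from the PROC RANK documentation, GROUPS=k assigns floor(rank * k / (n + 1)), where n is the number of nonmissing values. The sketch below reproduces that arithmetic in C++ purely as an illustration. rank_groups is a hypothetical helper, not anything SAS provides, and it assumes tied values share their mean rank, so check the TIES= option before relying on it for values that sit on a group boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not a SAS API: assigns each value a group 0..k-1
// using floor(rank * k / (n + 1)). Assumption: tied values share their mean rank.
std::vector<int> rank_groups(const std::vector<double>& x, int k) {
    assert(k > 0);
    const std::size_t n = x.size();

    // order[p] is the index of the p-th smallest value
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    std::vector<int> group(n);
    std::size_t i = 0;
    while (i < n) {
        // extend j over the run of values tied with position i
        std::size_t j = i;
        while (j + 1 < n && x[order[j + 1]] == x[order[i]]) ++j;

        // sorted positions i..j are 0-based; ranks are 1-based
        const double mean_rank =
            (static_cast<double>(i + 1) + static_cast<double>(j + 1)) / 2.0;
        const int g = static_cast<int>(
            std::floor(mean_rank * k / (static_cast<double>(n) + 1.0)));

        for (std::size_t t = i; t <= j; ++t) group[order[t]] = g;
        i = j + 1;
    }
    return group;
}
```

For example, rank_groups({-1.2, 0.5, 0.1, 1.6, -0.3, 0.9}, 3) puts -1.2 and -0.3 in group 0, 0.1 and 0.5 in group 1, and 0.9 and 1.6 in group 2.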

Related

Using macro variable in an IF statement within a loop is not working

I am having an issue with my code where it is working when I am hard coding the value (in comments) in the IF statement but when I insert the macro variable, the functions 'Copy' and 'Delete' do not work with no errors generated. Below is the code being used:
%let pathscr = //files/FEB_P000/Reporting_FS;
%let pathdes = //files/FEB_P000/Reporting_FS/Accounting log/2021;
%let fn = LFNPAccounting;
%let dt = %sysfunc(inputn(&acc_date, yymmddn8.),yymmddn8.); /* 20211209 */
%let Var = &fn&dt;/* LFNPAccounting20211209 */
data _null_;
length fref $8 fname $256;
did = filename(fref,'\\files\FEB_P000\Reporting_FS');
did = dopen(fref);
do i = 1 to dnum(did);
fname = dread(did,i);
newfn = SUBSTR(fname,1,22);
if newfn = &Var then do;
/*if newfn = 'LFNPAccounting20211209' then do;*/
rc1=filename('src',catx('/',"&pathscr",fname));
rc2=filename('des',catx('/',"&pathdes",fname));
rc3=fcopy('src','des');
rc4= fdelete('src');
end;
end;
run;
Could anyone help please?
Thanks
Hans
I am guessing you want to look in a specified folder, pathscr, and, if a file name matches a certain string (SUBSTR(fname,1,22)), copy that file to the Logs folder pathdes and delete it from the source.
libname report "/home/kermit/temp/Reporting/";
data report.have20211210
report.have20211209
report.have20211208;
id = 1;
output;
run;
%let pathscr = /home/kermit/temp/Reporting/;
%let pathdes = /home/kermit/temp/Logs/;
%let fn = have; /* Name of the file */
%let type = .sas7bdat; /* File extension */
%let dt = %sysfunc(inputn(%sysfunc(today()), yymmddn8.), yymmddn8.);
%let file = &fn&dt&type.;
%put &=file;
data _null_;
drop rc did;
rc=filename("mydir", "&pathscr.");
did=dopen("mydir");
if did > 0 then do; /* check that the directory can be opened */
do i=1 to dnum(did); /* use dnum() to determine the highest possible member number */
fname=dread(did, i); /* get the name of the file */
if fname = "&file." then do; /* if the name of the file matches: */
rc=filename('src', "&pathscr&file.");
rc=filename('des', "&pathdes&file.");
rc=fcopy('src', 'des'); /* copy from source to destination */
rc=fdelete('src'); /* delete from source */
end;
end;
end;
else do; /* if the directory cannot be opened, put the error message to the log */
msg=sysmsg();
put msg;
end;
run;
Logs:
FILE=have20211210.sas7bdat
DOPEN opens a directory and returns a directory identifier value (a number greater than 0) that is used to identify the open directory in other SAS external file access functions. If the directory cannot be opened, DOPEN returns 0, and you can obtain the error message by calling the SYSMSG function.
I used today() for the dt macro variable for convenience, but you will have to change it to whatever date you are searching for.
Consider that with the code above, if the file is already in the Logs folder, it will not be overwritten. Note that you do not have to use the CATX function if you put another / at the very end of your specified path.
Macro variables are not resolved when bounded by single quotes. They are resolved when within double quotes.
Try
did = filename(fref,"&pathscr");
You set VAR to a value like:
%let Var = LFNPAccounting20211209 ;
Then you use it to generate a SAS statement:
if newfn = &Var then do;
Which will resolve to
if newfn = LFNPAccounting20211209 then do;
Since I did not see you creating any variable named LFNPAccounting20211209 it is most likely that you want to use this statement instead:
if newfn = "&Var" then do;
So that the SAS code you generate will compare the value of NEWFN to a string literal instead of another variable.
Note: Since it looks like you are using a Windows filesystem, you should make the comparison case-insensitive.
if upcase(newfn) = %upcase("&Var") then do;

SpreadsheetFormats not working as expected

I am able to populate data from a query into a spreadsheet. However, I am having problems getting "ranged" formatting to work properly. The formatting for a specific column (date) and row (header) works fine, but SpreadsheetFormatColumns, SpreadsheetFormatRows, and SpreadsheetFormatCellRange do not. I need to set the font and font size for the whole dataset.
Here is what I have tried.
<cfscript>
//Current directory path.
theFile = GetDirectoryFromPath(GetCurrentTemplatePath()) & "GridDump.xls";
//Create a new Excel spreadsheet object and add the query data.
theSheet = SpreadsheetNew("Raw Data");
FormatDate.dataformat = "dd-mmm-yy";
//Get Row Count and Row Range
RC = toString(result.recordcount+1);
RR = "1-" & RC;
//Get Column Count
CC = toString(ListLen(GridFieldNames));
//Get Column Letter
CL = chr(CC + 64);
//Get Column Range (Numerical)
CRN = "1-" & CC;
//Get Column Range (Alphabetical)
CRA = "A-" & CL;
//Set Sheet Format
WholeSheet = StructNew();
WholeSheet.font="Consolas";
WholeSheet.fontsize=12;
//Set header Row Format
HeadRow = StructNew();
HeadRow.bold="true";
//Insert the Header Row
SpreadsheetAddRow(theSheet,GridFieldNames);
//Insert the Data
SpreadsheetAddRows(theSheet,result);
//Format the Data
SpreadsheetFormatCellRange(theSheet,WholeSheet,1,1,RC,CC);
//SpreadsheetFormatRows(theSheet,WholeSheet,RR);
//SpreadsheetFormatColumns(theSheet,WholeSheet,CRN);
SpreadsheetFormatRow(theSheet,HeadRow,1);//Header Row
SpreadsheetFormatColumn(theSheet,FormatDate,1);//Date Column
SpreadsheetAddFreezePane(theSheet,0,1);//Top Row Only
//SpreadSheetAddAutofilter(theSheet,"A1:J1");
</cfscript>
Here are the results
I'm getting the same result for all three of the "ranged" formatting functions. The format stops part way through the spreadsheet. I expect the whole dataset to accept any of the ranged function formats.
I got the same result with CF 2018,0,04,314546. Could just be a limitation of XLS format.
Switching to XLSX worked fine for me:
theSheet = SpreadsheetNew("Raw Data", true);
YMMV, but what also worked with CF2018 was using SpreadsheetFormatColumns() instead of SpreadsheetFormatCellRange().

SAS - hash in macro

I'm trying to turn my hash object into a macro so that I can do a match on a number of different analysis variables. Here is the part of the macro with the hash object. I feel that my issue must be with how I am calling/quoting the macros in the hash, because a different version of this hash works without the macro. Thoughts?
The errors I am getting are ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase. ERROR 557-185: Variable data is not an object. And then later in the object, ERROR: File DATA.TEST_BANK_ACCOUNT_ALL_REGS.DATA does not exist.
data data.test_&match_field._all_regs;
if _N_ = 1 then do;
if 0 then set = data.test_&match_field._match_srt;
declare hash contractors(dataset:"data.test_&match_field._match_srt", multidata: 'yes');
contractors.defineKey("&match_var.");
contractors.defineData('fpds_duns',
'xxx_dod_contractor',
"&match_flag.",
'xxx_small_contractor',
'xxx_medium_contractor',
'xxx_large_contractor',
'xxx_reported_relationship',
'xxx_joint_venture_flag');
contractors.defineDone();
end;
set data.test_xxx_200;
rc = contractors.find(key:"&match_var.");
do while (rc=0);
if xxxx_duns = xxx_hq_parent_duns_number or
xxxx_duns = xxx_hq_parent_duns_number or
xxxx_duns = xxx_global_parent_duns_number then xxx_reported_relationship = 'Y';
else xxx_reported_relationship = 'N';
output data.test_&match_field._all_regs;
rc = contractors.find_next(key:"&match_var.");
end;
run;

SQLite Update table

Trying to update table by user specified values. But the values are not getting updated.
cout<<"\nEnter Ac No"<<endl;
cin>>ac;
cout<<"\nEnter Amount"<<endl;
cin>>amt;
/* Create merged SQL statement */
sql = "UPDATE RECORDS set BAL = '%d' where ACCOUNT_NO = '%d'",amt, ac;
/* Execute SQL statement */
rc = sqlite3_exec(db, sql, callback, (void*)data, &zErrMsg);
If I replace the placeholders for BAL and ACCOUNT_NO with integer values, it works fine.
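Your sql string is not being created properly. C++ does not substitute the %d placeholders in a plain assignment: the commas are the comma operator, so sql is set to the literal text and amt and ac are evaluated and discarded.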
If you expect this code
sql = "UPDATE RECORDS set BAL = '%d' where ACCOUNT_NO = '%d'",amt, ac;
to result in
"UPDATE RECORDS set BAL = '1' where ACCOUNT_NO = '2'"
where amt = 1 and ac = 2, then you need to use a string formatting call like this:
// the buffer where your sql statement will live
char sql[1024];
// write the SQL statement with values into the buffer
_snprintf(sql,sizeof(sql)-1, "UPDATE RECORDS set BAL = '%d' where ACCOUNT_NO = '%d'",amt, ac);
// _snprintf may not null-terminate on truncation, so terminate explicitly
sql[sizeof(sql)-1]='\0';
On your particular platform _snprintf(...) might be snprintf(...) or another similarly named function. Your compiler may also warn about buffer-handling security vulnerabilities; choose the appropriate substitute for your needs.
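The formatting step can be sketched in portable C++ as below. This is only an illustration under assumptions: build_update_sql is a hypothetical helper name, the 128-byte buffer is an arbitrary size, and std::snprintf (C++11) stands in for the platform-specific _snprintf. Unlike _snprintf, std::snprintf always null-terminates and returns the length it needed, so truncation can be detected instead of patched over.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the UPDATE statement from the question.
// Returns an empty string if formatting fails or the text would not fit.
std::string build_update_sql(int amt, int ac) {
    char sql[128];  // arbitrary size chosen for this sketch
    const int n = std::snprintf(
        sql, sizeof(sql),
        "UPDATE RECORDS set BAL = '%d' where ACCOUNT_NO = '%d'", amt, ac);

    // n < 0 signals an encoding error; n >= sizeof(sql) signals truncation
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof(sql)) {
        return std::string();
    }
    return std::string(sql);
}
```

The result can then be passed to sqlite3_exec as sql.c_str(). Since amt and ac come from user input, binding them with sqlite3_prepare_v2 and sqlite3_bind_int would avoid building SQL text by hand altogether.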

Use of maximum likelihood in ado file in Stata

I am trying to understand the use of maximum likelihood in Stata (for which I am currently using the third edition of the book by Gould et al.). In particular, I am focusing on the user-written program craggit. The details of the command can be found in the Stata article. When I run viewsource craggit.ado, I can see all the code in the ado file. In the ado file [details below], I see ml used with the lf method, but nowhere in the file do I see the maximum likelihood commands (probit and truncreg, as specified in the article). Please let me know whether I am missing something.
program craggit
version 9.2
if replay() {
if ("`e(cmd)'" != "craggit") error 301
Replay `0'
}
else {
//Checking data structure
syntax varlist [fweight pweight] [if] [in], SECond(varlist) [ ///
Level(cilevel) CLuster(varname) HETero(varlist) * ///
]
gettoken lhs1 rhs1 : varlist
gettoken lhs2 rhs2 : second
marksample touse
quietly sum `lhs1' if `touse'
local minval1 = r(min)
quietly sum `lhs2' if `touse'
local minval2 = r(min)
if `minval1'<0 | `minval2'<0 {
di "{error:A dependant variable is not truncated at 0: {help craggit} is not appropriate}"
}
else Estimate `0'
}
end
program Estimate, eclass sortpreserve
di ""
di "{text:Estimating Cragg's tobit alternative}"
di "{text:Assumes conditional independence}"
syntax varlist [fweight pweight] [if] [in], SECond(varlist) [ ///
Level(cilevel) CLuster(varname) HETero(varlist) * ///
]
mlopts mlopts, `options'
gettoken lhs1 rhs1 : varlist
gettoken lhs2 rhs2 : second
if "`cluster'" != "" {
local clopt cluster(`cluster')
}
//mark the estimation subsample
marksample touse
//perform estimation using ml
ml model lf craggit_ll ///
(Tier1: `lhs1' = `rhs1') ///
(Tier2: `lhs2' = `rhs2') ///
(sigma: `hetero') ///
[`weight'`exp'] if `touse', `clopt' `mlopts' ///
maximize
ereturn local cmd craggit
Replay, `level'
end
program Replay
syntax [, Level(cilevel) *]
ml display, level(`level')
end
The log likelihood function is computed in the file craggit_ll.ado, so to see that you need to type viewsource craggit_ll.ado.
The logic behind storing the log likelihood evaluator in a separate file is that all programs defined in craggit.ado, except the very first one, are local to that file, so ml would not be able to see an evaluator defined there. Stored in its own file, craggit_ll becomes a global command, and ml will be able to use it.