Changing the first row name conditionally on character interval in SAS - sas

Consider the following data:
data GDP;
input Year $ Agriculture Industry;
datalines;
2016 195 1634
2017 220 1986
;
When exporting as a .dat file:
proc export
data = GDP
outfile = '....\GDP.dat'
dbms = TAB
replace;
run;
Then I get the following file:
However, I want the following file:
Where:
Mydata is a text I manually add.
The number after, for instance, Year (that is, Year: 1-4) is the character interval that the values occupy. For instance, the values in the Year column run from character 1 to 4, the values in the Agriculture column from 9 to 11, and so on.
So SAS should count the interval for the values and add it to the first row name. How to do it in SAS?

You can fudge this by adding labels to your variables and then adding the LABEL option to PROC EXPORT.
data GDP;
input Year $ Agriculture Industry;
label Year = "Mydata, Year:1-4" Agriculture = "Agriculture:9-11";
datalines;
2016 195 1634
2017 220 1986
;
run;
proc export
data = GDP
outfile = '....\GDP.dat'
dbms = TAB
LABEL
replace;
run;
FYI, it looks like you're trying to create a fixed-width file and put the specifications in the header. I'd advise against this: either put the specifications in a separate file or include them at the top of the file instead.
Putting it in the header makes it harder for any other system to process correctly.
If you really need this for some reason, you may also want to consider using a data step to create your export instead of using PROC EXPORT.
AFAIK there is no easy way to define the specifications automatically though you could push the PROC CONTENTS output to a separate data set.
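The data step route mentioned above could look roughly like the sketch below. It is only an illustration: the header text reuses the labels from the PROC EXPORT example, and the Industry positions (17-20) are an assumption, since the question only states the intervals for Year and Agriculture.

```sas
/* Sketch only: the Industry columns (17-20) are assumed, not from the question */
data _null_;
  set GDP;
  file '....\GDP.dat';  /* path left elided, as in the original */
  /* Write the specification header once, before the first data row */
  if _n_ = 1 then
    put 'Mydata, Year:1-4 Agriculture:9-11 Industry:17-20';
  /* Column pointers (@n) place each value at the position named in the header */
  put @1 Year $4. @9 Agriculture 3. @17 Industry 4.;
run;
```

Because the positions are hard-coded in both PUT statements, the header and the data cannot drift apart unless one of them is edited by hand.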

Related

How to manipulate sas7bdat files?

I'm working with a sas7bdat file that was created erroneously and trying to fix it. When the file was created, the data was all input into a single column instead of multiple columns, and I can't figure out how to split it. I thought I'd be able to use infile and make the dlm "|" with dsd to remove the quotations on the name column, but it seems that this problem is harder than it looks.
I basically want to turn that one column into the six it was supposed to be and delete the quotations from the names. Here's what it looks like in SAS:
And here's the datalines in case they're needed:
1 0017|2020-04-09|"Jason Nguyen"|122L|500.0|$404.82
2 0017|2020-04-09|"Jason Nguyen"|407XX|100.0|$201.95
3 0177|2020-04-05|"Glenda Johnson"|144L|100.0|$91.01
4 0177|2020-04-05|"Glenda Johnson"|188X|100.0|$70.76
5 0177|2020-04-05|"Glenda Johnson"|733|2.0|$101,230.00
6 0177|2020-04-05|"Glenda Johnson"|777|5.0|$106.29
7 1843|2020-04-03|"George Smith"|122|100.0|$60.64
8 1843|2020-04-03|"George Smith"|122L|10.0|$303.18
9 1843|2020-04-03|"George Smith"|144L|50.0|$91.01
10 1843|2020-04-03|"George Smith"|188S|3.0|$52,629.48
11 1843|2020-04-03|"George Smith"|855W|1.0|$92,210.41
12 1843|2020-04-03|"George Smith"|908X|1.0|$51,920.87
13 9888|2020-04-11|"Sharon Lu"|100W|1,000.0|$20.14
14 9888|2020-04-11|"Sharon Lu"|122|50.0|$60.64
(each line is one column inside SAS)
My suggestion would be to go back and fix the import code; otherwise, use the SCAN() function.
data want;
set have;
var1 = scan(variableName, 1, '|');
var2 = input(scan(variableName, 2, '|'), yymmdd.);
format var2 date9.;
var3 = dequote(scan(variableName, 3, '|'));
....
run;
Another option is to write the file as is back to a text file and then import it using the DLM='|' option.
Untested:
proc export data=have outfile='myfile.txt' dbms=dlm replace;
delimiter='';
run;
proc import out=want datafile='myfile.txt' dbms=dlm replace;
delimiter='|';
run;
Given that it's only 6 variables though you may as well write the data step for that code anyways.
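A data step version of the write-out-and-reread idea might look like the sketch below. It assumes the single character column is called variableName (as in the SCAN() example); the output variable names, lengths, and informats are illustrative guesses based on the sample rows, not part of the original post.

```sas
/* Temporary file to hold the raw pipe-delimited lines */
filename tmp temp;

/* Step 1: write the single column back out, one value per line */
data _null_;
  set have;
  file tmp lrecl=32767;
  put variableName;
run;

/* Step 2: read it back as six columns.
   DSD strips the double quotes around the name;
   the COMMA informat drops the $ sign and thousands separators. */
data want;
  infile tmp dlm='|' dsd truncover;
  input id     :$4.
        date   :yymmdd10.
        name   :$30.
        code   :$8.
        qty    :comma12.
        amount :comma12.;
  format date date9. amount dollar12.2;
run;
```

Reading id with a character informat keeps the leading zeros in values such as 0017.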

Produce custom table in SAS with a subsetted data set

I want to use SAS and, e.g., proc report to produce a custom table within my workflow.
Why: Previously, I used proc export (dbms=excel), did some very basic stats by hand, and copy-pasted them into an Excel sheet to complete the report. Recently, I've started to use ODS excel to print all the relevant data to Excel sheets, but since ODS excel always overwrites the whole Excel workbook (and hence also the handcrafted stats), I now want to streamline the process.
The task itself is actually very straightforward. We have some information about IDs, age, and registration, so something like this:
data test;
input ID $ AGE CENTER $;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
The goal is to produce a table report which should look like this structure-wise:
                  ID    NO-ID   Total
Count             3     2       5
Age (mean)        27    45.5    34.4
Count by Center:
A                 2     1       3
B                 0     1       1
C                 1     0       1
It seems, proc report only takes variables as columns but not a subsetted data set (ID NE .; ID =''). Of course I could just produce three reports with three subsetted data sets and print them all separately but I hope there is a way to put this in one table.
Is proc report the right tool for this and if so how should I proceed? Or is it better to use proc tabulate or proc template or...?
I found a way to get an almost exact match to what I wanted. First of all, I had to introduce a new variable vID (valid ID: 0 not valid, 1 valid) in the data set, like so:
data test;
input ID $ AGE CENTER $;
if ID = '' then vID = 0;
else vID = 1;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
After this I was able to use proc tabulate, as suggested by @Reeza in the comments, to build a table which pretty much resembles what I initially aimed for:
proc tabulate data = test;
class vID Center;
var age;
keylabel N = 'Count';
table N age*mean Center*N, vID ALL;
run;
Still, I wonder if there is a way without introducing the new variable at all and just use the SAS counters for missing and non-missing observations.
UPDATE:
@Reeza suggested using proc format to assign a value to missing/non-missing ID data. In combination with the missing option (which includes missing values as a class level) in proc tabulate, this delivers the output without introducing a new variable:
proc format;
value $ id_fmt
' ' = 'No-ID'
other = 'ID'
;
run;
proc tabulate data = test missing;
format ID $id_fmt.;
class ID Center;
var age;
keylabel N = 'Count';
table N age*(mean median) Center*N, (ID=' ') ALL;
run;

SAS: Loop over dataset, make temp data step for ith row, do some proc w/ temp data, return results to first dataset

So I have a dataset_a that looks like this:
Name Month
Dick Aug
Dick Sep
Dick Oct
Jane Aug
Jane Sep
...
And some other, much larger dataset_b like this:
Name Day X Y
Dick 12-Jul-13 14.8 2.3
Jane 05-Sep-13 12.2 2.0
Dick 02-Aug-13 15.1 3.2
Dick 07-Aug-13 14.5 3.0
Jane 05-Aug-13 12.8 2.5
Dick 08-Aug-13 14.5 3.0
Dick 10-Aug-13 13.5 2.3
Jane 31-Jul-13 13.0 2.2
...
I want to iterate over it, and for each row in dataset_a, do a data step that gets the appropriate records from dataset_b and puts them in a temp dataset--temp, let's call it. Then I need to do a proc reg on temp and stick the results (row-vector-style) back into dataset_a, like so:
Name Month Parameter-est.-for-Y p-value R-squared
Dick Aug Some # Some # Some #
Dick Sep Some # Some # Some #
Dick Oct Some # Some # Some #
Jane Aug Some # Some # Some #
Jane Sep Some # Some # Some #
...
Here's some code/pseudocode to illustrate my need:
for each row in dataset_a
data temp;
set dataset_b; where name=['i'th name] and month(day)=['i'th month];
run;
proc reg /*noprint*/ alpha=0.1 outest=[?] tableout; model X = Y; run;
/*somehow put these regression results back into 'i'th row of dataset_a*/
next
Please post a comment if something doesn't make sense. Thanks very much in advance!
The efficient approach for this is somewhat different than what you are listing. In the particular instance you show, the most efficient approach would be to use a format to group the Day values into Months, and run your regression by name day, assuming regression respects formats (if not, then create a new variable month and assign that using the format).
For example:
data for_reg/view=for_reg;
set dataset_b;
month=put(day,MONNAME3.);
run;
Or
proc datasets lib=work;
modify dataset_b;
format day MONNAME3.;
quit;
Then
proc reg data=for_reg;
by name month; *or if using the other one, by name day;
**other proc reg statements**;
run;
Then merge that output dataset with dataset_a if needed. It will run the proc reg as if you'd run it once for each name/month combination, but all in one call and one pass through the data.
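PROC REG accepts a BY statement, but the input has to be sorted by the BY variables first. A minimal sketch of the single-call approach follows, assuming Day is a numeric SAS date and using the model X = Y from the question; the data set names for_reg_sorted, est, and reg_results are made up for illustration.

```sas
/* Derive a month key from the date */
data for_reg;
  set dataset_b;
  month = put(day, monname3.);
run;

/* BY-group processing needs sorted input */
proc sort data=for_reg out=for_reg_sorted;
  by name month;
run;

/* One regression per name/month; OUTEST= collects the estimates,
   TABLEOUT adds rows for std errors, t values, p-values and limits,
   RSQUARE adds _RSQ_ to the OUTEST= data set */
proc reg data=for_reg_sorted noprint outest=est tableout rsquare alpha=0.1;
  by name month;
  model X = Y;
run;
quit;

/* Flatten to one row per name/month: estimate, p-value, R-squared */
data reg_results;
  merge est(where=(_type_ = 'PARMS')  keep=name month _type_ Y _rsq_ rename=(Y=est_Y))
        est(where=(_type_ = 'PVALUE') keep=name month _type_ Y rename=(Y=pvalue_Y));
  by name month;
  drop _type_;
run;
```

reg_results can then be merged with dataset_a by name and month, provided dataset_a is sorted the same way.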
If PROC REG doesn't respect by groups (and I think it does, but who knows), the best solution is still to do something like this; write a macro to run the proc reg taking arguments of name and month, and call the macro from the dataset_a. Then generate common output files (or proc append them into a single master output dataset in the macro) and merge the result to dataset_a if needed at the end.
Something like
%macro run_procreg(name=,month=);
data for_run/view=for_run;
set dataset_b;
where name="&name." and put(day,MONNAME3.)="&month."; *quotes needed: both comparisons are character;
run;
proc reg data=for_run;
*other stuff*;
output out=tempdataset; *or however you create your output;
run;
proc append base=master_output data=tempdataset force;
run;
%mend run_procreg;
proc sql;
select cats('%run_procreg(name=',name,',month=',month,')') into :macrocalllist
separated by ' ' from dataset_a;
quit;
&macrocalllist;
data fin;
merge dataset_a (in=a) master_output(in=b);
by name month;
run;
You probably don't need to merge on dataset_a at the end if it just has those two variables. This will be a lot slower than one call with by, but if it's necessary, this is the way to do it.
You can also use call execute in the data step to drive the macro calls as above. That is the closest match to your stated pseudocode, but it doesn't return the information back to the data step (the generated code executes after the data step completes), and it's slightly more troublesome than the above method. There is also, in 9.3+, dosubl in the FCMP language, which allows you to get a bit closer to what you want, but I don't know it well enough to explain it or to know that it does indeed meet your needs.
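For reference, the call execute variant could be sketched as follows. It assumes the %run_procreg macro defined earlier is already compiled; wrapping the macro name in %nrstr is a common idiom that stops the macro from executing while the data step is still running.

```sas
/* One macro call is queued per row of dataset_a;
   the queued calls run after this data step finishes */
data _null_;
  set dataset_a;
  call execute(cats('%nrstr(%run_procreg)(name=', name, ',month=', month, ')'));
run;
```

This replaces the PROC SQL step that builds &macrocalllist; the final merge with master_output stays the same.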

SAS-How to format arrays dynamically based on information in one column

I'm new to SAS, and would greatly appreciate anyone who can help me formulate a code. Can someone please help me with formatting changing arrays based on the first column values?
So basically here's the original data:
Category Name1 Name2......... (Changes invariably)
#ofpeople 20 30
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100
AmountPaid 11 35
AmountDed 5 6
I would like to format the values under Name1 to infinite Name# and reformat them to dollar10.2 for any values under Category called 'AmountBilled','AmountPaid','AmountDed'.
Thank you so much for your help!
You can't conditionally format a column (like you might in excel). A variable/column has one format for the entire column. There are tricks to get around this, but they're invariably more complex than should be considered useful.
You can store the formatted value in a character variable, but it loses the ability to do math.
data have;
input category :$10. name1 name2;
datalines;
#ofpeople 20 30
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100
AmountPaid 11 35
AmountDed 5 6
;;;;
run;
data want;
set have;
array names name:; *colon is wildcard (starts with);
array newnames $10 newname1-newname10; *Arbitrarily 10, can be whatever;
if substr(category,1,6)='Amount' then do;
do _t = 1 to dim(names);
newnames[_t] = put(names[_t],dollar10.2);
end;
end;
run;
You could programmatically figure out the newname1000 endpoint using PROC CONTENTS or SQL's DICTIONARY.COLUMNS / SAS's SASHELP.VCOLUMN. Alternately, you could put out the original dataset as a three column dataset with many rows for each category (was it this way to begin with prior to a PROC TRANSPOSE?) and put the character variable there (not needing an array). To me that's the cleanest option.
data have_t;
set have;
array names name:;
format nameval $10.;
do namenum = 1 to dim(names);
if substr(category,1,6)='Amount' then nameval = put(names[namenum],dollar10.2 -l);
else nameval=put(names[namenum],10. -l); *left aligning here, change this if you want otherwise;
output; /* now we have (namenum) rows per line. Test for missing(name) if you want only nonmissing rows output (if not every row has same number of names). */
end;
run;
proc transpose data=have_t out=want_T(drop=_name_) prefix=name;
by category notsorted;
var nameval;
run;
Finally, depending on what you're actually doing with this, you may have superior options in terms of the output method. If you're doing PROC REPORT for example, you can use compute blocks to set the style (format) of the column conditionally in the report output.
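The programmatic array sizing mentioned above could be sketched like this, using DICTIONARY.COLUMNS. It assumes the data set is WORK.HAVE and that every variable whose name starts with NAME is one of the value columns; the macro variable n_names is introduced here purely for illustration.

```sas
/* Count the variables in WORK.HAVE whose names start with NAME */
proc sql noprint;
  select count(*) into :n_names trimmed
  from dictionary.columns
  where libname = 'WORK' and memname = 'HAVE'
    and upcase(name) like 'NAME%';
quit;

data want;
  set have;
  array names name:;                             /* existing numeric columns */
  array newnames $10 newname1-newname&n_names.;  /* sized to match */
  if substr(category,1,6) = 'Amount' then do _t = 1 to dim(names);
    newnames[_t] = put(names[_t], dollar10.2);
  end;
  drop _t;
run;
```

The TRIMMED keyword needs SAS 9.3 or later; on older releases a following %let n_names=&n_names; removes the padding instead.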

SAS DDE not formatting output correctly in Excel

I'm just looking to export a SAS dataset into a pre-made Excel template.
First 6 variables of my dataset (which is a .wpd file) look like:
StartDate EndDate product_code Description Leaflet Media
04-Jul-13 07-Jul-13 256554 BUTCHER BEEF 1PK (1 KGM) 54x10 3
I currently have:
options noxwait noxsync;
x '"c:\Template.xls"'; /* <--excel template to use*/
filename template dde 'excel|Leaflets!r6c1:r183c67'; /*put data in rows 6 to 183 in leaflets sheet*/
data LEAF.results; set LEAF.results;
file template ;
put StartDate EndDate product_code Description Leaflet Media
/*and the remaining 61 variables*/ ;
run;
The DDE procedure works and opens the excel sheet, but the data is not formatted correctly in excel and looks like this:
StartDate EndDate Product code Description Leaflet Media
04 July 2013 07 July 2013 256554 BUTCHER BEEF 1PK
As you can see it seems to have treated spaces as delimiters but I'm not sure of the syntax to change this
- might also be worth noting that I have 67 variables in my actual dataset so didn't want to have to informat and format them all individually.
Also, is there a way to output this dataset into my excel template and then save the template as a different filename elsewhere on my c drive?
Thanks!
After trying every DDE option under the sun I finally stumbled across LRECL.
So,
options noxwait noxsync;
x '"c:\Template.xls"'; /* <--excel template to use*/
filename template dde 'excel|Leaflets!r6c1:r183c67' notab LRECL=3000; /*put data in rows 6 to 183 in leaflets sheet*/
data LEAF.results; set LEAF.results;
file template ;
put StartDate EndDate product_code Description Leaflet Media
/*and the remaining 61 variables*/ ;
run;
I'm guessing the default length of characters allowed in each cell was too short, so increasing the length allowed means each cell doesn't get split into multiple cells?
source:
http://support.sas.com/resources/papers/proceedings11/003-2011.pdf
Try changing file template; to use a tab delimiter, i.e. file template dlm='09'x;
Also, in the filename, add 'notab':
filename template dde 'excel|Leaflets!r6c1:r183c67' notab;
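Putting the suggestions from both answers together might look like the sketch below. It is untested against a live Excel session and lists only the six variables shown in the question; using data _null_ instead of rewriting LEAF.results is a change made here for safety, not something the original posts do.

```sas
options noxwait noxsync;
x '"c:\Template.xls"';   /* open the template, as in the question */

/* NOTAB stops SAS from converting blanks into cell breaks;
   LRECL raises the maximum record length for wide rows */
filename template dde 'excel|Leaflets!r6c1:r183c67' notab lrecl=3000;

data _null_;             /* _NULL_: write to Excel only, leave LEAF.results untouched */
  set LEAF.results;
  file template dlm='09'x;   /* with NOTAB, an explicit tab separates the cells */
  put StartDate EndDate product_code Description Leaflet Media;  /* add the remaining variables here */
run;
```

With this combination, a value containing spaces such as the Description text should stay in a single cell, because only the tab character moves to the next column.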