Hello, I need to output a .dat file from my SAS code, something like:
#################################
###Game Of Thrones
################################
Number of Candidates = 1
################################
Number of Games = 3
################################
Controlppt = 1
Controlgame = 2
################################
# PPt 1 = Abc
# PPt 2 = Bcd
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
################################
So it's a mix of comments and text, with pipe-delimited data underneath.
I tried PROC EXPORT, but none of the titles are printed. I also tried ODS with a TITLE statement for each comment, but that does not work either.
Can anyone please suggest a way to achieve this?
You didn't say what your dataset looks like, so let's just invent one.
data have ;
ncandidates=1; ngames=3; controlppt=1; controlgame=2;
ppt1='Abc'; ppt2='Bcd';
infile cards dsd dlm='|';
input (var1-var21) ($);
cards;
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
;
Now a simple DATA step can write the report.
filename report 'myreport.txt';
data _null_;
file report dsd dlm='|' ;
set have end=eof;
if _n_=1 then put
32*'#'
/ '###Game Of Thrones'
/ 32*'#'
/ 'Number of Candidates = ' ncandidates
// 32*'#'
/ 'Number of Games = ' ngames
// 32*'#'
/ 'Controlppt = ' controlppt
/ 'Controlgame = ' controlgame
/ 32*'#'
/ '# PPt 1 = ' ppt1
/ '# PPt 2 = ' ppt2
;
put var1-var21 ;
if eof then put 32*'#';
run;
This is a pretty customized report. What is your output destination: a text file, or PDF?
This can most likely be done with PUT statements, especially if the destination is a text file. If it's HTML or RTF, the approach may be slightly different.
Here's a rough approximation of what you need.
data _null_;
file '/folders/myfolders/demo.txt';
set sashelp.class;
put 'Name';
put '###########################';
put name;
put 'Sex';
put '###########################';
put (_numeric_) ('|');
put ;*empty line;
put ;*empty line;
run;
I tried this:
data have ;
ncandidates=1;
ngames=3; controlppt=1; controlgame=2;
ppt1='Abc';
ppt2='Bcd';
infile cards dsd dlm='|';
input (var1-var21) ($);
cards;
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
;
run;
data have1;
infile cards dsd dlm='|';
input (var1-var2) ($);
cards;
1|2
2|3
3|4
4|5
5|6
;
run;
filename report '/home/sas/l119834/myreport.txt';
data _null_;
file report dsd dlm='|' ;
eof=0;
do until(eof);
set have end=eof;
if _n_=1 then put
32*'#'
/ '###Game Of Thrones'
/ 32*'#'
/ 'Number of Candidates =' ncandidates
// 32*'#'
/ 'Number of Games = ' ngames
// 32*'#'
/ 'Controlppt = ' controlppt
/ 'Controlgame = ' controlgame
/ 32*'#'
/ '# PPt 1 = ' ppt1
/ '# PPt 2 = ' ppt2
/ 'Input.Data='
;
put var1-var21 @@;
if eof then put / 32*'#';
end;
put // 83*'#'
/ '### Output Data'
/ 83* '#'
/ '# Output field name, usage = Output Area|Name'
/ '# Area = 0, 1, 2, 3, 4, 5'
// 'output.Name='
;
eof1=0;
do until(eof1);
set have1 end=eof;
put var1-var2 @@;
end;
run;
And got this in the report:
################################
###Game Of Thrones
################################
Number of Candidates =1
################################
Number of Games = 3
################################
Controlppt = 1
Controlgame = 2
################################
# PPt 1 = Abc
# PPt 2 = Bcd
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b|################################
###Game Of Thrones
################################
Number of Candidates =1
################################
Number of Games = 3
################################
Controlppt = 1
Controlgame = 2
################################
Input.Data=
# PPt 1 = Abc
# PPt 2 = Bcd
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
################################
###################################################################################
### Output Data
###################################################################################
# Output field name, usage = Output Area|Name
# Area = 0, 1, 2, 3, 4, 5
output.Name=
1|2|2|3|3|4|4|5|5|6
So the first half is written twice in the output, and Input.Data= and the actual input variables end up on different lines, whereas I wanted something like: Input.Data=1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
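A likely cause in the attempt above: the `if _n_=1` test sits inside a DO UNTIL loop that reads every observation in the first DATA step iteration, so `_N_` is still 1 for each row and the header is written once per row. Also, the PUT that writes 'Input.Data=' ends without a trailing @, so the next PUT starts a new line. Below is a minimal sketch of one way to restructure it. It recreates the invented HAVE dataset so it runs on its own; the report path is a placeholder, and building each line with CATX (instead of DSD list output) is a swapped-in technique, not the answerer's code.

```sas
/* HAVE mirrors the invented dataset from the answer above */
data have;
  ncandidates=1; ngames=3; controlppt=1; controlgame=2;
  ppt1='Abc'; ppt2='Bcd';
  infile cards dsd dlm='|';
  input (var1-var21) ($);
cards;
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
;

filename report 'myreport.txt';   /* placeholder path */

data _null_;
  file report;                     /* no DSD: each data line is built explicitly */
  set have end=eof;
  length line $32767;
  /* CATX joins the values with '|'. Caveat: CATX skips missing values,
     so an empty field disappears instead of leaving '||'. */
  line = catx('|', of var1-var21);
  if _n_ = 1 then do;
    /* _N_ counts DATA step iterations, so this block runs for the first row only */
    put 32*'#'
      / '###Game Of Thrones'
      / 32*'#'
      / 'Number of Candidates = ' ncandidates
      / 32*'#'
      / 'Number of Games = ' ngames
      / 32*'#'
      / 'Controlppt = ' controlppt
      / 'Controlgame = ' controlgame
      / 32*'#'
      / '# PPt 1 = ' ppt1
      / '# PPt 2 = ' ppt2;
    /* Label and first data line are written by one PUT, so they share a line */
    put 'Input.Data=' line;
  end;
  else put line;                   /* later rows: one line each */
  if eof then put 32*'#';
run;
```

If you need empty fields preserved as `||`, CATX is the wrong tool; in that case you would keep DSD list output and control the line breaks with a trailing @ instead.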
I tried formatting the data so that U contains only numeric values while all of the descriptions go under T. What could I do to fix it?
DATA data;
infile '....csv'
dlm=',' firstobs=2 dsd;
format A$ B$ C$ D$ E$ F$ G$ H$ I$ J$ K$ L$ M$ N$ O$ P$ Q$ R$ S$ T$ U V W$ X$ Y$ Z$ AA$ AB$ AC$ AD$ AE$ AF$ AG$ AH$ AI$ AJ$ AK$ AL$ AM$ AN$ AO$ AP$ AQ$ AR$ AS;
input A B C D E F G H I J K L M N O P Q R S T @;
do _n_=1 to 24;
input U @;
description=catx(', ',T, U);
end;
input U V W X Y Z AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS;
RUN;
If you are talking about the data file in this Kaggle project, then I would use a divide-and-conquer approach. Check each line in the file to see how many columns it contains. Then split the problem lines into separate file(s) and figure out what causes them to be poorly formatted/parsed.
So first get a list of the rows and the number of columns in each row.
data maxcol;
infile "C:\downloads\archive.zip" zip member='Datafiniti_Mens_Shoe_Prices.csv'
dsd truncover column=cc length=ll lrecl=2000000
;
row+1;
input @;
do maxcol=1 by 1 while(cc<=ll); input dummy :$1. @ +(-1) dummy $char1. @; end;
if dummy ne ',' then maxcol=maxcol-1;
keep row maxcol ;
run;
proc freq ;
tables maxcol;
run;
For example, you could put the list of bad row numbers into a macro variable.
proc sql noprint;
select row into :rowlist separated by ' ' from maxcol where maxcol > 48 ;
quit;
Then use that macro variable in your code that reads the datafile.
data want;
infile "C:\downloads\archive.zip" zip member='Datafiniti_Mens_Shoe_Prices.csv' dsd
truncover lrecl=2000000
;
input @;
if _n_ in (1 &rowlist) then delete;
... rest of data step to read the "good" rows ...
For the other rows, look at them and figure out where the extra columns are being inserted. Possibly just fix them by hand, or craft separate data steps to read each set separately using the same &ROWLIST trick.
If you are positive that:
- the extra columns are inserted between columns 20 and 21,
- column 21 always has a valid numeric string, and
- none of the extra values are valid numeric strings,
then you could use logic like this to generate a new delimited file (why not use | as the delimiter this time?).
data _null_;
infile "C:\downloads\archive.zip" zip member='Datafiniti_Mens_Shoe_Prices.csv' dsd
truncover lrecl=2000000
;
row+1;
length col1-col48 $32767;
input col1-col21 @;
if _N_>1 then do while(missing(input(col21,??32.)));
col20=catx(',',col20,col21);
input col21 @;
end;
input col22-col48;
file "c:\downloads\shoes.txt" dsd dlm='|' lrecl=2000000 ;
put col1-col48 ;
run;
You could then even try reading the new file with PROC IMPORT to have it guess how to define the variables. (But watch out: PROC IMPORT might truncate some of the records because it uses LRECL=32767.)
proc import datafile="c:\downloads\shoes.txt" dbms=csv out=want replace ;
delimiter='|';
guessingrows=max;
run;
Checking column 21:
The MEANS Procedure
Analysis Variable : prices_amountMin
N Mean Std Dev Minimum Maximum
---------------------------------------------------------------------
19387 111.8138820 276.7080893 0 16949.00
---------------------------------------------------------------------
I have this CSV dataset named Movie:
ID,Underage,Name,Rating,Year, Rank on IMDb ,
M1021,,Elanor, Melanor,12,1879,5
M1203,Yes,IT,12,1999,1,
M0081,,Cars 2,13,1999,2,
M1371,No,Kiminonawa,12,2017,3,
M3416,,Living in the past, fading future,13,2018,12
I would like to import Movie into SAS so that "Elanor, Melanor" is the Name, instead of 'Elanor' ending up in Name and 'Melanor' in Rating.
I tried the following code:
FILENAME XX '....Movie.csv';
data movieYY (drop=DLM1at field2);
infile XX dlm=',' firstobs=2 dsd;
format ID $5. Underage $3. Name $50. Year 4. Rating $3. 'Rank on IMDb'n 2.;
input @;
DLM1at = find(_INFILE_, ',');
length field2 $4;
field2 = substr(_INFILE_, DLM1at + 1, 4);
if lengthn(compress(field2, '1234567890')) ne 0 then do;
_INFILE_ = substr(_INFILE_, 1, dlm1at - 1) || ' ' ||
substr(_INFILE_, dlm1at + 1);
end;
input ID Underage Name Year Rating 'Rank on IMDb'n;
run;
What should I do? I am still a beginner in SAS. Thank you!
Add quotes around the name of each movie, or use another delimiter. Any value in a delimited file that contains the delimiter character must be enclosed in quotes. For example:
data foo;
infile datalines dlm="," dsd;
length id 8 name $25;
input id name $;
datalines;
1, "Smith, John"
2, "Cage, Nicolas"
;
run;
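If the file cannot be re-exported with quotes, a possible workaround (a sketch, not the answerer's method) is to read each whole line and rebuild Name from the fields in between. It assumes Name is the only field that can contain unquoted commas, and that every line ends with Rating, Year and Rank, optionally followed by one trailing comma, as in the sample rows. The dataset name MOVIE and the helper variables are illustrative.

```sas
data movie;
  infile datalines truncover;
  input;                                   /* load the raw line into _INFILE_ */
  length line $200 id $5 underage $3 name $50 rating $3;
  line = _infile_;
  /* Drop one trailing comma so the last three fields are always RATING, YEAR, RANK */
  if substr(line, length(line), 1) = ',' then line = substr(line, 1, length(line) - 1);
  n = countw(line, ',', 'm');              /* 'm' also counts empty fields, e.g. a blank Underage */
  id       = scan(line, 1, ',', 'm');
  underage = scan(line, 2, ',', 'm');
  rating   = scan(line, n - 2, ',', 'm');
  year     = input(scan(line, n - 1, ',', 'm'), ?? 4.);
  rank     = input(scan(line, n, ',', 'm'), ?? 8.);
  /* NAME runs from the start of field 3 to the end of field n-3, commas included */
  call scan(line, 3, start, len1, ',', 'm');
  call scan(line, n - 3, last, len2, ',', 'm');
  name = strip(substr(line, start, last + len2 - start));
  drop line n start last len1 len2;
datalines;
M1021,,Elanor, Melanor,12,1879,5
M1203,Yes,IT,12,1999,1,
M0081,,Cars 2,13,1999,2,
M1371,No,Kiminonawa,12,2017,3,
M3416,,Living in the past, fading future,13,2018,12
;
run;
```

This only works because the other columns have a fixed count; if a second column could also contain commas, the split would be ambiguous and quoting the data is the only reliable fix.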
I am trying to stack multiple variables vertically in a PROC REPORT. I am tied to PROC REPORT over TABULATE or FREQ, so a solution using REPORT would be preferable.
I've tested other solutions, but have been unable to get them to work with my data.
proc format library = library ;
value AGE
1 = '18 to 29'
2 = '30 to 45'
3 = '46 to 64'
4 = '65 and over'
9 = 'NA' ;
value SEX
1 = 'Male'
2 = 'Female'
9 = 'NA' ;
value Q16F
1 = 'EXCELLENT'
2 = 'VERY GOOD'
3 = 'GOOD'
4 = 'FAIR'
5 = 'POOR'
8 = 'DON''T KNOW'
9 = 'NA/REFUSED' ;
DATA CHSS2017_sashelp (keep = q16 sex age);
SET CHSS2017.CHSS2017_sashelp;
FORMAT q16 q16f.;
FORMAT sex SEX.;
FORMAT age AGE.;
RUN;
proc report data = CHSS2017_sashelp nowindows headline;
columns sex n, (q16);
define sex / group;
define q16 / across;
run;
The expected result would be one REPORT table with the multiple variables stacked vertically.
If you are fine with repeated headings/variable names, then you can use two PROC REPORT steps. See the code below; I have used sample SAS data and customized the formats a bit:
proc format ;
value AGE
1-10 = '1 to 10'
11-12 = '11 to 12'
13-High = '13 and over'
;
value $SEXv
'M' = 'Male'
'F' = 'Female'
;
value Q16F
1 = 'EXCELLENT'
2 = 'VERY GOOD'
3 = 'GOOD'
4 = 'FAIR'
5 = 'POOR'
8 = 'DON''T KNOW'
9 = 'NA/REFUSED' ;
run;
%macro RandBetween(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend;
data class;
set sashelp.class;
q16 = %RandBetween(1, 9);
FORMAT q16 q16f.;
FORMAT sex $SEXv.;
FORMAT age AGE.;
run;
proc report data = class nowindows headline;
columns age n, (q16);
define age/ group;
define q16 / across;
run;
proc report data = class nowindows headline;
columns sex n, (q16);
define sex / group;
define q16 / across;
run;
The RandBetween macro only generates sample data for this code; you don't have to use it.
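If the rows really must come out of a single PROC REPORT, one common alternative (not part of the answer above) is to stack the grouping variables into a long table first and group on a "variable name" column plus a "category" column. The sketch below uses SASHELP.CLASS so it runs on its own; STACKED, ROWVAR and ROWVAL are illustrative names, and Q16 is random placeholder data. With your own data you would fill ROWVAL with put(age, age.) and put(sex, sex.) to keep the format labels.

```sas
data stacked;
  set sashelp.class;
  length rowvar $8 rowval $20;
  q16 = 1 + floor(5 * rand('uniform'));   /* placeholder survey answer, 1 to 5 */
  /* One output row per grouping variable, so each student appears twice */
  rowvar = 'Age'; rowval = cats(age); output;
  rowvar = 'Sex'; rowval = sex;       output;
  keep rowvar rowval q16;
run;

proc report data=stacked nowindows headline;
  columns rowvar rowval n, (q16);
  define rowvar / group order=data 'Variable';  /* Age block first, then Sex */
  define rowval / group 'Category';
  define q16    / across;
run;
```

The trade-off is that every subject is counted once per stacked variable, so the N column is per block, not a grand total across blocks.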
Most of my data is read in a fixed-width format, such as fixedwidth.txt:
00012000ABC
0044500DEFG
345340000HI
00234000JKL
06453MNOPQR
The first 5 characters are colA and the next 6 are colB. The code to read this in looks something like:
infile "&path.fixedwidth.txt" lrecl = 397 missover;
input colA $5.
colB $6.
;
label colA = 'column A '
colB = 'column B '
;
run;
However, some of my data comes from elsewhere and is formatted as a CSV without the leading zeros, e.g. example.csv:
colA,colB
12,ABC
445,DEFG
34534,HI
234,JKL
6453,MNOPQR
Because the CSV data is being appended to the existing data read from the fixed-width file, I want to match the formatting exactly.
The code I've got so far for reading in example.csv is:
data work.example;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile "&path./example.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat colA $5.;
informat colB $6.;
format colA z5.; *;
format colB z6.; *;
input
colA $
colB $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
But the Z5. and Z6. formats only work on numeric variables, so this doesn't work and gives this output:
ColA colB
12 ABC
445 DEFG
34534 HI
234 JKL
6453 MNOPQR
When I want:
ColA colB
00012 000ABC
00445 00DEFG
34534 0000HI
00234 000JKL
06453 MNOPQR
With both columns stored as character variables.
Ideally I'd like a way to get the output I need using only formats and informats, to keep the code easy to follow (I have a lot of columns to keep track of!).
Grateful for any suggestions!
You can use CATS to force the CSV columns to character without knowing which types the CSV import assigned to them. Right-justify the result to the expected variable length and translate the padding spaces to zeros.
For example:
data have;
length a 8 b $7; * dang csv data, someone entered 7 chars for colB;
a = 12; b = "MNQ"; output;
a = 123456; b = "ABCDEFG"; output;
run;
data want;
set have (rename=(a=csvA b=csvB));
length a $5 b $6;
* may transfer, truncate or convert, based on length and type of csv variables;
* substr used to prevent blank results when cats (number) is too long;
* instead, the number will be truncated;
a = substr(cats(csvA),1);
b = substr(cats(csvB),1);
a = translate(right(a),'0',' ');
b = translate(right(b),'0',' ');
run;
Use SUBSTR on the left-hand side of an assignment to overwrite the end of a zero-filled string.
data test;
infile cards firstobs=2 dsd;
length cola $5 colb $6;
cola = '00000';
colb = '000000';
input (a b)($);
substr(cola,vlength(cola)-length(a)+1)=a;
substr(colb,vlength(colb)-length(b)+1)=b;
cards;
colA,colB
12,ABC
445,DEFG
34534,HI
234,JKL
6453,MNOPQR
;;;;
run;
proc print;
run;
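Since the question mentions many columns, here is a sketch that applies the RIGHT/TRANSLATE idea from the first answer to a whole list of columns with an ARRAY, so the padding logic is written once. WANT and PADME are illustrative names. Assumptions: the LENGTH statement holds the target width of each column, values longer than that width are truncated, and any embedded blanks would also turn into zeros.

```sas
data want;
  infile cards dsd firstobs=2 truncover;
  length colA $5 colB $6;          /* target widths, as in the fixed-width file */
  input colA colB;                 /* both read as character with the lengths above */
  array padme{*} colA colB;        /* list every column that needs zero-padding */
  do i = 1 to dim(padme);
    /* RIGHT keeps the variable's length and moves the blanks to the front;
       TRANSLATE then turns those blanks into zeros */
    padme{i} = translate(right(padme{i}), '0', ' ');
  end;
  drop i;
cards;
colA,colB
12,ABC
445,DEFG
34534,HI
234,JKL
6453,MNOPQR
;
run;
```

Adding a column then only means adding it to the LENGTH and ARRAY statements.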
I would like to categorize variables from one table which looks like this:
Var1 Var2
19 0.2
30 0.1
45 0.2
With a table that stores the conditions for the categorization:
variable condition category
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
The result would be a new table containing the category of each variable, based on the first table:
Var1 Var2
1 2
2 1
3 2
This is just a duplicate of this previous question: Categorize variables basing on conditions from other data set
Code generation from data is much easier to create and debug if you just use SAS code to do it, without adding the complications of macro code.
Here is the answer again in more detail. First let's make your example data printouts into actual SAS datasets.
data rawdata ;
input Var1 Var2;
cards;
19 0.2
30 0.1
45 0.2
;
data metadata ;
input variable :$32. condition :$200. category ;
cards;
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
;
Now let's generate an SQL SELECT statement with a CASE expression for each output variable, built from the metadata.
filename code temp;
data _null_;
set metadata end=eof;
by variable ;
file code ;
retain sep ' ';
if _n_=1 then put "create table want as select";
if first.variable then put sep $1. 'case ';
put ' when (' condition ') then ' category ;
if last.variable then put ' else . end as ' variable ;
if eof then put 'from rawdata' / ';' ;
sep=',' ;
run;
And run it.
proc sql;
%include code / source2 ;
quit;
Example SAS LOG:
1639 proc sql;
1640 %include code / source2 ;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\xxx\AppData\Local\Temp\1\SAS Temporary Files\_TD13724_AMRL20B7F00CGPP_\#LN00654.
1641 +create table want as select
1642 + case
1643 + when (Var1<20 ) then 1
1644 + when (40>Var1>=20 ) then 2
1645 + when (Var1>=40 ) then 3
1646 + else . end as Var1
1647 +,case
1648 + when (Var2<0.2 ) then 1
1649 + when (Var2>=0.2 ) then 2
1650 + else . end as Var2
1651 +from rawdata
1652 +;
NOTE: Table WORK.WANT created, with 3 rows and 2 columns.
Results:
Obs Var1 Var2
1 1 2
2 2 1
3 3 2
If you want to convert it to a macro, just replace the hard-coded input and output dataset names with macro variable references.
%macro gencat(indata=,outdata=,metadata=metadata);
filename code temp;
data _null_;
set &metadata end=eof;
by variable ;
file code ;
retain sep ' ';
if _n_=1 then put "create table &outdata as select";
if first.variable then put sep $1. 'case ';
put ' when (' condition ') then ' category ;
if last.variable then put ' else . end as ' variable ;
if eof then put "from &indata" / ';' ;
sep=',' ;
run;
proc sql;
%include code / nosource2 ;
quit;
%mend gencat;
So now the same result is obtained by calling the macro with these values:
%gencat(indata=rawdata,outdata=want)
So the log now looks like this:
1783 %gencat(indata=rawdata,outdata=want)
MPRINT(GENCAT): filename code temp;
NOTE: PROCEDURE SQL used (Total process time):
real time 10.35 seconds
cpu time 0.20 seconds
MPRINT(GENCAT): data _null_;
MPRINT(GENCAT): set metadata end=eof;
MPRINT(GENCAT): by variable ;
MPRINT(GENCAT): file code ;
MPRINT(GENCAT): retain sep ' ';
MPRINT(GENCAT): if _n_=1 then put "create table want as select";
MPRINT(GENCAT): if first.variable then put sep $1. 'case ';
MPRINT(GENCAT): put ' when (' condition ') then ' category ;
MPRINT(GENCAT): if last.variable then put ' else . end as ' variable ;
MPRINT(GENCAT): if eof then put "from rawdata" / ';' ;
MPRINT(GENCAT): sep=',' ;
MPRINT(GENCAT): run;
NOTE: The file CODE is:
Filename=C:\Users\AppData\Local\Temp\1\SAS Temporary Files\_TD13724_AMRL20B7F00CGPP_\#LN00659,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=02Feb2018:12:36:39,
Create Time=02Feb2018:12:36:39
NOTE: 12 records were written to the file CODE.
The minimum record length was 1.
The maximum record length was 28.
NOTE: There were 5 observations read from the data set WORK.METADATA.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
MPRINT(GENCAT): proc sql;
MPRINT(GENCAT): create table want as select case when (Var1<20 ) then 1 when (40>Var1>=20 ) then 2 when (Var1>=40 ) then 3 else .
end as Var1 ,case when (Var2<0.2 ) then 1 when (Var2>=0.2 ) then 2 else . end as Var2 from rawdata ;
NOTE: Table WORK.WANT created, with 3 rows and 2 columns.
MPRINT(GENCAT): quit;
Here is a macro way to accomplish this. It assumes that the conditions in the table are in the order you want them applied and grouped by variable. If not, then sort the table appropriately.
First, the test data:
data have;
input Var1 Var2;
datalines;
19 0.2
30 0.1
45 0.2
;
data conditions;
informat variable condition $32.;
input variable $ condition $ category;
datalines;
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
;
Now make a macro. It reads the table into macro variables and then writes a data step to apply them, using an IF/THEN/ELSE chain for each variable.
%macro apply_conditions();
%local i j n;
proc sql noprint;
select count(*) into :n trimmed from conditions;
%do i=1 %to &n;
%local var&i;
%local condition&i;
%local category&i;
%end;
select variable, condition, category
into :var1 - :var&n,
:condition1 - :condition&n,
:category1 - :category&n
from conditions;
quit;
data want;
set have;
%do i=1 %to &n;
/*If the variable changes, then don't add the ELSE */
%if &i>1 %then %do;
%let j=%eval(&i-1);
%if &&var&i = &&var&j %then %do;
else
%end;
%end;
/*apply the condition*/
if &&condition&i then
&&var&i = &&category&i;
%end;
run;
%mend;
Finally, run the macro, using MPRINT to see the generated code.
options mprint;
%apply_conditions;