file statement in data step to export comma delimited text file - sas

Problem: suppose I do not know the variable names or the number of variables, or there are too many variables to list in the PUT statement.
The following case works only because I know there are 3 variables:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put region mtg sendmail;
run;
I tried using put _all_;
And the output is:
region=N mtg=24NOV1999 sendmail=10OCT1999 _ERROR_=0 _N_=1
region=S mtg=28DEC1999 sendmail=13NOV1999 _ERROR_=0 _N_=2
region=E mtg=03DEC1999 sendmail=19OCT1999 _ERROR_=0 _N_=3
region=W mtg=04OCT1999 sendmail=20AUG1999 _ERROR_=0 _N_=4
However, this produces named output rather than comma-delimited output.
My desired output would be
N,24NOV1999,10OCT1999
S,28DEC1999,13NOV1999
E,03DEC1999,19OCT1999
W,04OCT1999,20AUG1999

This is the right one:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (~);
run;
This should help you.

The reason you have so many answers that seem to work, but have different characters, is that the important thing is changing _all_ to (_all_). The arguments after that are not important.
You actually have two entirely different things going on when you write
put _all_;
and
put (_all_) (:);
Programmers familiar with the concept of an overloaded function will find that the simplest way to think of this. If put sees _all_, it calls one version of put. If it sees (_all_) (or any list of variables with ( ) around it), it calls another (expanding _all_ to its variable list). Notice that if you try
put (_all_);
It fails, and it fails with errors suggesting it is trying to use formatted output (i.e., it asks why there is no second ( there, which is what would normally follow a variable list in ( ) in formatted output).
By itself, _all_ is an argument to put that specifically tells it to use named output to output all variables in the dataset. Hence the variable=value format of the output. So in the first example, _all_ is a constant - an argument - nothing more.
In the second example, though, (_all_) is a variable list, which contains all variables as if they were typed in, space delimited. So
put (_all_) (:);
is equivalent to
put (name sex age height weight) (:);
if used with SASHELP.CLASS. Adding anything - a colon, a tilde, an ampersand, etc. - that is legal in the context of formatted output will cause that to be used.
Note that
put _all_ @;
does not cause that to happen - apparently @ (or @@ or / or //) are all legal arguments to put _all_.
Interestingly, _numeric_ and _character_ do not have an analogous shortcut - clearly this is an explicit, special case just for _all_. They cannot be used without parens. put _numeric_; gives an error that _numeric_ is not a legal variable name. But, put (_numeric_) (:); is perfectly legal.

Try the colon modifier option.
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (:);
run;
Another option is to read the names from the SASHELP.VCOLUMN table, create a macro variable that lists the columns and include that in your put statement.
The documentation is a bit scarce:
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000176623.htm
:
enables you to specify a format that the PUT statement uses to write the variable value. All leading and trailing blanks are deleted, and each value is followed by a single blank.
~
enables you to specify a format that the PUT statement uses to write the variable value. SAS displays the formatted value in quotation marks even if the formatted value does not contain the delimiter. SAS deletes all leading and trailing blanks, and each value is followed by a single blank. Missing values for character variables are written as a blank (" ") and, by default, missing values for numeric variables are written as a period (".").
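The SASHELP.VCOLUMN approach mentioned above could be sketched roughly as follows. This assumes the data set is WORK.MEETING; the macro variable name varlist is only an illustration, and DSD is added to the FILE statement as an assumption so that missing values come out as empty fields.

```sas
/* Build a space-separated list of the variable names, in data set order. */
/* LIBNAME and MEMNAME are stored in upper case in SASHELP.VCOLUMN.       */
proc sql noprint;
  select name
    into :varlist separated by ' '
    from sashelp.vcolumn
    where libname = 'WORK' and memname = 'MEETING'
    order by varnum;
quit;

/* Use the generated list in an ordinary list-style PUT statement. */
data _null_;
  set meeting;
  file 'C:\Users\Desktop\meeting2.txt' dsd dlm=',';
  put &varlist;
run;
```

This is more work than put (_all_) (:); but it gives you a place to filter or reorder the variables before writing them.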

It is easiest to just use a variable list followed by a format list. Syntax is:
(<variable list>) (<format list>)
The values in the format list are repeated until the variables in the variable list are exhausted. The format list can include format modifiers like :,&,~ or = and cursor movement commands like /, +n, or #n.
Also you should add the DSD option to your FILE statement so that missing values are properly represented in the CSV file as having nothing between the delimiters.
So your program reduces to:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' DSD dlm=',';
put (_all_) (:) ;
run;
The problem you had with PUT _ALL_; is that when _ALL_ is used by itself it is treated differently than when it is part of a variable list inside of (). As a variable list it does not include system generated variables such as _N_ or FIRST. or LAST. variables generated by BY statements.
Note that if you want to use _ALL_ in a variable list and still get named output you can use the = format modifier in the format list.
put (_all_) (=) ;
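A quick way to see the difference between the two forms is to write both to the log. This is a minimal sketch that uses SASHELP.CLASS as sample data (an assumption; any data set would do).

```sas
data _null_;
  set sashelp.class (obs=2);
  /* _ALL_ on its own: named output, including _ERROR_ and _N_ */
  put _all_;
  /* _ALL_ as a variable list with the = modifier:             */
  /* named output for the data set variables only              */
  put (_all_) (=);
run;
```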

No, I'm Spartacus!
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (&);
run;

data meeting;
input region $ mtg $ sendmail $;
cards;
N 24NOV1999 10OCT1999
S 28DEC1999 13NOV1999
E 03DEC1999 19OCT1999
W 04OCT1999 20AUG1999
;
run;
proc export data=meeting
outfile='c:\input\meeting.txt'
dbms=dlm replace;
delimiter=',';
run;
Hope this helps, whatever the number of variables.

Related

SAS Export Issue as it is giving additional double quote

I am trying to export SAS data into CSV, sas dataset name is abc here and format is
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
I am using following code.
filename exprt "C:/abc.csv" encoding="utf-8";
proc export data=abc
outfile=exprt
dbms=tab;
run;
output is
LINE_NUMBER DESCRIPTION
524JG "24PC AMEFA VINTAGE CUTLERY SET ""DUBARRY"""
So there is a double quote before and after the description, and an additional double quote appears before and after the word DUBARRY. I have no clue what is happening. Can someone help me resolve this and explain what exactly is happening here?
expected result:
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
There is no need to use PROC EXPORT to create a delimited file. You can write it with a simple DATA step. If you want to create your example file, just do not use the DSD option on the FILE statement. But note that, depending on the data you are writing, you could create a file that cannot be properly parsed because of extra unprotected delimiters. You will also have trouble representing missing values.
Let's make a sample dataset we can use to test.
data have ;
input id value cvalue $ name $20. ;
cards;
1 123 A Normal
2 345 B Embedded|delimiter
3 678 C Embedded "quotes"
4 . D Missing value
5 901 . Missing cvalue
;
Essentially PROC EXPORT is writing the data using the DSD option. Like this:
data _null_;
set have ;
file 'myfile.txt' dsd dlm='09'x ;
put (_all_) (+0);
run;
Which will yield a file like this (with pipes replacing the tabs so you can see them).
1|123|A|Normal
2|345|B|"Embedded|delimiter"
3|678|C|"Embedded ""quotes"""
4||D|Missing value
5|901||Missing cvalue
If you just remove DSD option then you get a file like this instead.
1|123|A|Normal
2|345|B|Embedded|delimiter
3|678|C|Embedded "quotes"
4|.|D|Missing value
5|901| |Missing cvalue
Notice how the second line looks like it has 5 values instead of 4, making it impossible to know how to split it into 4 values. Also notice how the missing values have a minimum length of at least one character.
Another way would be to run a data step to convert the normal file that PROC EXPORT generates into the variant format that you want. This might also give you a place to add escape characters to protect special characters if your target format requires them.
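data _null_;
infile normal dsd dlm='|' truncover ;
file abnormal dlm='|';
do i=1 to 4 ;
/* write the delimiter and hold the output line with a trailing @ */
if i>1 then put '|' @;
/* read one field and hold the input line with a trailing @ */
input field :$32767. @;
/* escape backslashes first, then the delimiter */
field = tranwrd(field,'\','\\');
field = tranwrd(field,'|','\|');
len = lengthn(field);
put field $varying32767. len @;
end;
/* release the output line */
put;
run;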
You could even make this datastep smart enough to count the number of fields on the first row and use that to control the loop so that you wouldn't have to hard code it.
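One way to avoid hard coding the field count, as suggested above, is sketched here. It assumes the same NORMAL and ABNORMAL filerefs; the variable nfields is a name introduced for illustration. It uses COUNTW with the M and Q modifiers so that empty fields are counted and delimiters inside quoted values are ignored.

```sas
data _null_;
  infile normal dsd dlm='|' truncover ;
  file abnormal dlm='|';
  retain nfields;
  /* load the record into the input buffer and hold it */
  input @;
  /* on the first row only, count the delimited fields */
  if _n_ = 1 then nfields = countw(_infile_, '|', 'mq');
  do i = 1 to nfields;
    if i > 1 then put '|' @;
    input field :$32767. @;
    field = tranwrd(field, '\', '\\');
    field = tranwrd(field, '|', '\|');
    len = lengthn(field);
    put field $varying32767. len @;
  end;
  put;
run;
```

Note that the Q modifier treats single quotes as quoting characters too, so a first row containing an unpaired apostrophe could be miscounted.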

This range is repeated or overlapped

Now I have a bigger problem: I am getting "this range is repeated or overlapped". To be specific, my label values repeat; my format has repeated values, such as a=aa b=aa c=as. How do I resolve this error? When I use hlo=M as the multilabel option, it gives double the data...
I am mapping like below.
Santhan=Santhan
Chintu=Santhan
Please suggest a solution.
To convert data to a FORMAT use the CNTLIN= option on PROC FORMAT. But first make sure the data describes a valid format. So read the data from the file.
data myfmt ;
infile 'myfile.txt' dsd truncover ;
length fmtname $32 start $100 label $200 ;
fmtname = '$MYFMT';
input start label ;
run;
Make sure to set the lengths of START and LABEL to be long enough for any actual values your source file might have.
Then make sure it is sorted and you do not have duplicate codes (START values).
proc sort data=myfmt out=myfmt_clean nodupkey ;
by start;
run;
The SAS log will show if any observations were deleted because of duplicate START values.
If you do have duplicate values, then examine the dataset or original text file to understand why, and determine how you want to handle the duplicates. The PROC SORT step above will keep just one of the duplicates. You might just have exact duplicates, in which case keeping only one is fine. Or you might want to collapse the duplicate observations into a single observation and concatenate the multiple decodes into one long decode.
If you want you can add a record that will add the functionality of the OTHER keyword of the VALUE statement in PROC FORMAT. You can use that to set a default value, like 'Value not found', to decode any value you might encounter that was not in your original source file.
data myfmt_final;
set myfmt_clean end=eof;
output;
if eof then do;
start = ' ';
label = 'Value not found';
hlo = 'O' ;
output;
end;
run;
Then use PROC FORMAT to make the format from the cleaned up data file.
proc format cntlin = myfmt_final;
run;
To convert a FORMAT to a dataset use the CNTLOUT= option on PROC FORMAT.
For example if you had created this format previously.
proc format ;
value $myfmt 'ABC'='ABC' 'BCD'='BCD' 'BCD1'='BCD' 'BCD2'='BCD' ;
run;
then you can use another PROC FORMAT step to make a dataset. Use the SELECT statement if your format catalog has more than one format defined and you just want one (or some) of them.
proc format cntlout=myfmt ;
select $myfmt ;
run;
Then you can use that dataset to easily make a text file. For example a comma delimited file.
data _null_;
set myfmt ;
file 'myfmt.txt' dsd ;
put start label;
run;
The result would be a text file that looks like this:
ABC,ABC
BCD,BCD
BCD1,BCD
BCD2,BCD
You get this error because you have the same code that maps to two different categories. I'm going to guess you likely did not import your data correctly from your text file and ended up getting some values truncated but without the full process it's an educated guess.
This will work fine:
proc format;
value $ test
'a'='aa' 'b'='aa' 'c'='as'
;
run;
This version will not work, because a is mapped to two different values, so SAS will not know which one to use.
proc format;
value $ badtest
'a'='aa'
'a' = 'ba'
'b' = 'aa'
'c' = 'as';
run;
This generates the error regarding overlaps in your data.
The way to fix this is to find the duplicates and determine which code they should actually map to. PROC SORT can be used to get your duplicate records.
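As a rough sketch of that PROC SORT step: the DUPOUT= option collects the observations that NODUPKEY removes. This assumes the mapping was read into a dataset named myfmt with the code in a variable named START; the output dataset names are made up for illustration.

```sas
/* Keep the first observation per START value and      */
/* send the removed duplicates to a separate dataset.  */
proc sort data=myfmt out=myfmt_nodup dupout=myfmt_dups nodupkey;
  by start;
run;

/* Review the codes that appeared more than once. */
proc print data=myfmt_dups;
run;
```

DUPOUT= holds only the second and later occurrences of each code, so compare them against myfmt_nodup to decide which mapping is the correct one.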

Print all columns SAS with delimiter

I am trying to print out a delimited file, without having to specify all of the columns. I can get close, but the numeric columns are always quoted:
DATA _NULL_;
SET SASHELP.CARS (obs = 5 keep = Make Model EngineSize);
FILE "foo.csv" DSD DLM=",";
PUT (_all_) (~);
RUN;
foo.csv
"Acura","MDX","3.5"
"Acura","RSX Type S 2dr","2"
"Acura","TSX 4dr","2.4"
"Acura","TL 4dr","3.2"
"Acura","3.5 RL 4dr","3.5"
How can I achieve either:
"Acura","MDX",3.5
"Acura","RSX Type S 2dr",2
"Acura","TSX 4dr",2.4
"Acura","TL 4dr",3.2
"Acura","3.5 RL 4dr",3.5
or:
Acura,MDX,3.5
Acura,RSX Type S 2dr,2
Acura,TSX 4dr,2.4
Acura,TL 4dr,3.2
Acura,3.5 RL 4dr,3.5
~ asks for quoting. So, you're getting quoting.
You can use & instead:
DATA _NULL_;
SET SASHELP.CARS (obs = 5 keep = Make Model EngineSize);
FILE "c:\temp\foo.csv" DSD DLM=",";
PUT (_all_) (&);
RUN;
& has effectively no impact on the data (we've had a question about it once upon a time, I don't recall the ultimate answer, but basically it seems to mostly be used for this specific purpose, even though that's not its purpose).
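For the first desired layout (character values quoted, numeric values not), one possibility is to give each type its own variable list and modifier. This is an untested sketch, not part of the answer above. It relies on the character variables coming before the numeric one here; with interleaved types the column order in the file would change.

```sas
data _null_;
  set sashelp.cars (obs = 5 keep = Make Model EngineSize);
  file "c:\temp\foo.csv" dsd dlm=",";
  /* ~ quotes the character values; & leaves the numeric values unquoted */
  put (_character_) (~) (_numeric_) (&);
run;
```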

why macro is creating Leading space while resolving macro in sas?

I am submitting the following SAS code:
proc format;
picture mysdt
low-high = '%Y%0m%0d%0H%0M' (datatype =datetime);
run;
DATA _NULL_;
call symput("Today", Put(datetime(),mysdt.));
run;
%put t_&today;
The resulting log shows 2 spaces before the datetime:
t_ 201504240150
The problem here is when my macro is resolved it is creating leading space. Why is it creating spaces?
My output should be:
t_201504240150
I know the solution but just wanted to know the reason.
DATA _NULL_;
call symput("Today", strip(Put(datetime(),mysdt.)));
run;
The reason for this is that your format is set up with a default length of 14. Therefore, when you go to put your value into &today it is stored with leading blanks to fill out the length to 14. From the SAS documentation:
DEFAULT=length
specifies the default length of the picture. The value for DEFAULT= becomes the length of picture if you do not give a specific length when you associate the format with a variable.
So, there are a number of options:
Set the default format length to match the expected length of your datetime value (using DEFAULT=12):
proc format;
picture mysdt (default=12)
low-high = '%Y%0m%0d%0H%0M' (datatype =datetime);
run;
Specify the width in your format:
DATA _NULL_;
call symput("Today", strip(Put(datetime(),mysdt12.)));
run;
Or, as previously answered, use call symputx to trim whitespace:
DATA _NULL_;
call symputx("Today", strip(Put(datetime(),mysdt.)));
run;
Personally, I'd fix the format to default to 12, then you won't need to remember to specify width or use call symputx each time.
call symputx removes leading and trailing spaces.
DATA _NULL_;
call symputx("Today", Put(datetime(),mysdt.));
run;

Creating a dataset variable from a macro variable containing both quotes, double quotes and mismatched quotes

In summary, I am struggling to achieve the following:
data _null_;
input x $ 1-50 ;
call symput('problem',x);
cards4;
'this' "is '' my "string"" from 'hell!
;;;;
run;
data _null_;
x="%superQ(problem)";
put x=;
run;
The %superq function does a good job of managing the mismatched quotes; however, the consecutive double quotes ("") were still collapsed to a single double quote (") in variable X.
Is this addressable?
Current result:
x='this' "is '' my "string" from 'hell!
Desired result:
x='this' "is '' my "string"" from 'hell!
The short answer is that you can use SYMGET here:
data _null_;
x=symget("problem");
put x=;
run;
If that is not an option for some reason, provide some more information as to the context. I'll also see if I can point Toby (the SAS-L macro quoting guru) or some of the other folks there here, to see if they have any suggestions for handling this without SYMGET.
From SAS-L, FriedEgg (Matt) posted the following additional solution:
resolve=resolve('%superq(problem)');
He also notes that you can mask it on the way in, if you have control over that:
data _null_;
input x $ 1-50 ;
call symput('problem',quote(x));
cards4;
'this' "is '' my "string"" from 'hell!
;;;;
run;
data _null_;
x=&problem;
put x=;
run;