I'm importing a parameter file in txt form with no title row such as the following:
byvars TEST16X GIO
log_transform N
y_intercept Y
exclude_outliers N
exclude_einmos N
Because it is a parameter file, the length of the two columns will not be fixed. The following is the problematic code I created to import the txt file. The two columns are concatenated instead of splitting into individual columns:
data test1;
infile "files/parameters.txt" DELIMITER='09'x col=Colpoint
length=linelen;
length pname $30 pvalue $10;
input @1 pname $ @;
varlen=linelen - colpoint + 1;
input pvalue $varying1024. varlen;
call symputx('pname', STRIP(pvalue));
run;
This parameter file defines global macro variables and their values. For example, log_transform should become a macro variable with the value 'N'.
You seem to be working way too hard. Just use TRUNCOVER and formatted input for the PVALUE field. Use list mode input for the parameter name field.
data parameters;
infile "files/parameters.txt" truncover ;
input pname :$32. pvalue $200. ;
call symputx(pname,pvalue);
run;
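To check that the macro variables were actually created, a small self-contained sketch like the following may help. It assumes SAS 9.3 or later (for the %put &=name syntax) and uses a temporary fileref named parms, which is hypothetical and stands in for files/parameters.txt; only two of the sample lines are reproduced.

```sas
/* Hypothetical stand-in for files/parameters.txt */
filename parms temp;

data _null_;
  file parms;
  put 'byvars TEST16X GIO';
  put 'log_transform N';
run;

/* Same technique as above: list input for the name, formatted input for the rest of the line */
data _null_;
  infile parms truncover;
  input pname :$32. pvalue $200.;
  call symputx(pname, pvalue);   /* symputx strips leading and trailing blanks */
run;

/* Write each macro variable and its value to the log */
%put &=byvars;
%put &=log_transform;
```

Because PVALUE is read with formatted input, a value containing spaces (such as TEST16X GIO) should be kept whole rather than cut at the first blank.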
I have a SAS dataset, let us say
it has 4 columns A,B,C,D and the values
A = x
B = x
C = x
D = x,y
Here column D holds two comma-separated values in a single column. When I convert the dataset to CSV format, the value y ends up in a new column. How can I avoid this when converting a SAS dataset into a CSV file?
* get some test records in a file;
Data _null_;
file 'c:\tmp\test.txt' lrecl=80;
put '1,22,Hans Olsen,Denmark,333,4';
put '1111,2,Turner, Alfred,England,3333,4';
put '1,222,Horst Mayer,Germany,3,4444';
run;
* Read the file as a delimited file;
data test; infile 'c:\tmp\test.txt' dsd dlm=',' missover;
length v1 v2 8 v3 v4 $40 v5 v6 8;
input
'V1'n : ?? BEST5.
'V2'n : ?? BEST5.
'V3'n : $CHAR40.
'V4'n : $CHAR40.
'V5'n : ?? BEST5.
'V6'n : ?? BEST5.;
run;
* Read the file and write another file.
* If 6 delimiters and not 5, change the third to #;
data test2;
infile 'c:\tmp\test.txt' lrecl=80 truncover;
file 'c:\tmp\test2.txt' lrecl=80;
length rec $80;
drop pos len;
input rec $char80.;
if count(rec,',') = 6 then do;
call scan(rec,4,pos,len,',');
substr(rec,pos-1,1) = '#';
end;
put rec;
run;
* Read the new file as a delimited file;
data test2; infile 'c:\tmp\test2.txt' dsd dlm=',' missover;
length v1 v2 8 v3 v4 $40 v5 v6 8;
input
'V1'n : ?? BEST5.
'V2'n : ?? BEST5.
'V3'n : $CHAR40.
'V4'n : $CHAR40.
'V5'n : ?? BEST5.
'V6'n : ?? BEST5.;
run;
This code inserts '#', but I want ',' itself in the output.
Could anyone please guide me on how to do that?
Thanks in advance!!
It sounds like you are starting with an improperly created CSV file.
1,22,Hans Olsen,Denmark,333,4
1111,2,Turner, Alfred,England,3333,4
1,222,Horst Mayer,Germany,3,4444
That should have been made like this:
1,22,Hans Olsen,Denmark,333,4
1111,2,"Turner, Alfred",England,3333,4
1,222,Horst Mayer,Germany,3,4444
If you are positive that you know that the only field with embedded commas is the third then you can use a data step to read it in and generate a valid file.
data _null_;
infile bad dsd truncover ;
file good dsd ;
length v1-v6 dummy $200;
input v1-v2 @;
do i=1 to countw(_infile_,',','q')-5;
input dummy @;
v3=catx(', ',v3,dummy);
end;
input v4-v6 ;
put v1-v6 ;
run;
Once you have a properly formatted CSV file then it is easy to read.
data want;
infile good dsd truncover ;
length v1-v2 8 v3-v4 $40 v5-v6 8;
input v1-v6 ;
run;
But if the extra comma could be in any field then you will probably need to have a human fix those lines.
If a field value contains the field delimiter, you will want to double-quote the field value. Proc EXPORT does this double quoting when the DBMS type is specified as CSV.
Example:
data have;
A = 1;
B = 2;
C = 3;
D = 'x,y';
run;
filename csv temp;
proc export data=have outfile=csv dbms=csv;
run;
data _null_;
infile csv;
input;
put _infile_;
run;
The log will show the exported file contains double quoted values as needed in the csv file produced.
Log
A,B,C,D
1,2,3,"x,y"
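If you would rather write the file yourself, a data step with the DSD option on the FILE statement should quote values in the same way. This is only a sketch: the fileref csv2 is hypothetical, and unlike Proc EXPORT this version does not write a header row.

```sas
data have;
  A = 1;
  B = 2;
  C = 3;
  D = 'x,y';
run;

/* Hypothetical temporary fileref for the output file */
filename csv2 temp;

data _null_;
  set have;
  file csv2 dsd;        /* DSD: comma delimiter, values containing a comma are quoted */
  put (_all_) (+0);     /* write every variable using list-style output */
run;

/* Echo the file to the log to inspect the result */
data _null_;
  infile csv2;
  input;
  put _infile_;
run;
```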
data name
filename reference name "filename.csv"
infile filename.csv dlm=",";
run;
What is wrong with this code? How do I create a data set from the referenced CSV file?
Place the filename statement before the DATA Step.
You will need an INPUT statement to read the data into variables,
or if the file has a header row use Proc IMPORT and the system will best guess the input needed.
Example 1
Presume file has no header row and there are 3 columns of numbers separated by commas
filename myfile 'mydatafile.csv';
data want;
infile myfile dsd dlm=',';
input x y z;
run;
Example 2
Presume there is a header row
filename myfile 'mydatafile.csv';
proc import datafile=myfile replace out=want dbms=csv;
run;
or
* columns expected are known;
filename myfile 'mydatafile.csv';
data want;
infile myfile dsd dlm=',' firstobs=2;
input x y z;
run;
NOTE
An INFILE statement can also directly refer to a file
...
INFILE "filename.csv" ... ;
...
I have two datasets, both with the same variable names. In one of the datasets two variables are character, but in the other dataset all variables are numeric. I use the following code to convert the numeric variables to character, but the numbers are being rounded, e.g. 490.6 -> 491.
How can I do the conversion so that the numbers wouldn't change?
data tst ;
set data (rename=(Day14=Day14_Character Day2=Day2_Character)) ;
Day14 = put(Day14_Character, 8.) ;
Day2 = put(Day2_Character, 8.) ;
drop Day14_Character Day2_Character ;
run;
Your posted code is confused. Half of it looks like code to convert from character to numeric and half looks like it is for the other direction.
To convert to character use the PUT() function. Normally you will want to left align the resulting string. You can use the -L modifier on the end of the format specification to left align the value.
So to convert numeric variables DAY14 and DAY2 to character variables of length $8 you could use code like this:
data want ;
set have (rename=(Day14=Day14_Numeric Day2=Day2_Numeric)) ;
Day14 = put(Day14_Numeric, best8.-L) ;
Day2 = put(Day2_Numeric, best8.-L) ;
drop Day14_Numeric Day2_Numeric ;
run;
Remember: you use the PUT statement or PUT() function with formats to convert values to text, and you use the INPUT statement or INPUT() function with informats to convert text to values.
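As a rough illustration of why the original code turned 490.6 into 491, the sketch below compares the 8. format (which has no decimal places) with BEST8., and then reads the text back with INPUT(). The variable names are hypothetical.

```sas
data _null_;
  num = 490.6;
  length rounded exact $8;
  rounded = put(num, 8.);         /* w.d format with zero decimals: value is rounded */
  exact   = put(num, best8.-L);   /* BEST keeps the decimals that fit in 8 characters */
  back    = input(exact, 8.);     /* informat converts the text back to a number */
  put rounded= exact= back=;      /* write the three values to the log */
run;
```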
Change the format to something like Best8.2:
data tst ;
set data (rename=(Day14=Day14_Character Day2=Day2_Character)) ;
Day14 = put(Day14_Character, best8.2) ;
Day2 = put(Day2_Character, best8.2) ;
drop Day14_Character Day2_Character ;
run;
Here is an example:
data test;
input r ;
datalines;
500.04
490.6
;
run;
data test1;
set test;
num1 = put(r, 8.2);
run;
If you do not want to specify the width and number of decimal places, you can just use the BEST. format, and SAS will choose the width and decimals based on the data. However, the length of the resulting variable may be large unless you specify it explicitly. This will still retain your numbers as in the original variable.
I have a variable, textvar, that looks like this:
type=1&name=bob
type=2&name=sue
I want to create a new table that looks like this:
type name
1 bob
2 sue
My approach is to use scan to split the variables on & so for the first observation I have
var1 var2
type=1 name=bob
So now I can use scan again to split on =:
vname = scan(var1, 1, '=');
value = scan(var1, 2, '=');
But how can I now assign value to the variable named vname?
PROC TRANSPOSE is the quickest way. You need an ID variable (dummy or real).
data test;
informat testvar $50.;
input testvar $;
datalines;
type=1&name=bob
type=2&name=sue
;;;;
run;
data test_vert;
set test;
id+1;
length scanner $20 vname vvalue $20;
scanner=scan(testvar,1,"&");
do _t=2 by 1 until (scanner=' ');
vname=scan(scanner,1,"=");
vvalue=scan(scanner,2,"=");
output;
scanner=scan(testvar,_t,"&");
end;
run;
proc transpose data=test_vert out=test_T;
by id;
id vname;
var vvalue;
run;
Does this help? Dynamic variable names in SAS
I think I have some code to address this, but left it at my workplace.
Obviously you haven't included your real data, but can't you just hard code some of the values if the format of the raw data is the same in each row? My code converts the "=" and "&" to "," to make the scan function easier to use.
data want (keep=type name);
set test;
_newvar=translate(testvar,",,","&=");
type=input(scan(_newvar,2),best12.);
length name $20;
name=scan(_newvar,4);
run;
I have my data in a comma-separated .txt file, and I am using a regular INFILE statement to import it into a SAS dataset. The data is about 2.5 million rows. However, the 37314th row and many later rows contain junk values. SAS imports only the rows above the first junk row, so instead of a dataset with all 2.5 million rows I get one with 37314 rows. I want to write code that handles these junk rows while reading the file, either by skipping them or by deleting them. In short, I need all 2.5 million rows, which I cannot get because of the junk rows in between.
Any help would be appreciated.
You can read the whole line into the input buffer using just an INPUT; statement. Then you can parse the fields individually using the _infile_ variable.
Example:
data _null_;
infile datalines firstobs=2;
input;
city = scan(_infile_, 1, ' ');
char_min = scan(_infile_, 3, ' ');
char_min = substr(char_min, 2, length(char_min)-2);
minutes = input(char_min, BEST12.);
put city= minutes=;
datalines;
City Number Minutes Charge
Jackson 415-555-2384 <25> <2.45>
Jefferson 813-555-2356 <15> <1.62>
Joliet 913-555-3223 <65> <10.32>
;
run;
Working with Data in the Input Buffer.
You can also use the ? and ?? modifiers for the input statement to 'ignore' any problem rows.
Here's the link to the doc. Look under the heading "Format Modifiers for Error Reporting".
An example:
data x;
format my_num best.;
input my_num ?? ;
**
** POSSIBLE ERROR HANDLING HERE:
*;
if my_num ne . then do;
output;
end;
datalines;
a
;
run;
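Putting the two ideas together for a comma-separated file, a minimal sketch might look like the following. The column layout (id, amount, name), the dataset name clean, and the sample rows are all hypothetical; the real file's variables would replace them. It assumes a junk row shows up as a non-numeric value in a numeric field.

```sas
data clean;
  infile datalines dsd dlm=',' truncover;
  /* ?? suppresses the invalid-data notes and leaves the value missing */
  input id : ?? best12. amount : ?? best12. name : $20.;
  /* Drop rows whose numeric fields could not be read */
  if missing(id) or missing(amount) then delete;
datalines;
1,10.5,alice
garbage line without proper fields
2,20,bob
;
run;
```

Note that this also drops rows where those fields are legitimately empty, so the delete condition would need adjusting if missing values are valid in the data.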