I get an error with this SAS code and I'm not sure what is wrong with it.
Data june;
infile "/home/u/Work/NHS/20200630-RTT-JUNE.csv" DLM=',';
'Gt 00 To 01 Weeks SUM 1'n = input('Gt 00 To 01 Weeks SUM 1'n);
run;
ERROR 388-185: Expecting an arithmetic operator.
ERROR 76-322: Syntax error, statement will be ignored.
Thank you very much.
You did not provide the INPUT() function call with any informat specification to use. The INPUT() function requires two arguments, the text string to convert and the informat to use to do the conversion.
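As a minimal sketch of the two-argument form, the step below converts a character value to a number with the INPUT() function. The variable names and the sample value are made up for illustration and are not taken from the question's file.

```sas
data _null_;
  text = '1,234';                /* character value to convert (made-up sample) */
  num  = input(text, comma32.);  /* second argument is the informat to apply */
  put num=;                      /* writes the converted number to the log */
run;
```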
Also your code will not do anything since you are not reading any data from the file you pointed to with the INFILE statement.
You probably need to use the INPUT statement instead. For example code like this:
data june;
infile "/home/u59602916/Work/NHS/20200630-RTT-JUNE-2020-full-extract-revised.csv" DLM=',';
input 'Gt 00 To 01 Weeks SUM 1'n;
run;
This will attempt to read one number from each line of the source text file. It will read the characters up to the first comma on the line.
If you really have a CSV file you should use the DSD option and not just the DLM= option. That will properly handle values that might be enclosed in quotes because they contain the delimiter. It will also properly treat adjacent delimiters as indicating an empty value.
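A small sketch of what DSD changes, using in-stream data rather than the real file. The variable names and sample rows are assumptions made up for this illustration.

```sas
data dsd_demo;
  /* DSD: comma is the default delimiter, quotes around values are removed,
     and two adjacent commas mean a missing value */
  infile datalines dsd truncover;
  length name $20;               /* list input would otherwise default to $8 */
  input name a b;
datalines;
"Smith, John",1,2
Jones,,4
;
run;

proc print data=dsd_demo;
run;
```

In the second row the empty field between the two commas should come through as a missing value for a, which is the behaviour that DLM=',' on its own does not give you.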
Use the put function if you want to convert numeric values to character. Also, you'll need to use the input statement to indicate the variable(s) you would like to read in from the external file.
Data june;
infile "/home/u59602916/Work/NHS/20200630-RTT-JUNE-2020-full-extract-revised.csv" DLM=',';
input 'Gt 00 To 01 Weeks SUM 1'n $;
'Gt 00 To 01 Weeks SUM 1'n = put('Gt 00 To 01 Weeks SUM 1'n, 4.);
run;
If you want to read in a numeric value, don't use a function, simply use the input statement as above (without the dollar sign to indicate numeric).
Data june;
infile "/home/u59602916/Work/NHS/20200630-RTT-JUNE-2020-full-extract-revised.csv" DLM=',';
input 'Gt 00 To 01 Weeks SUM 1'n ;
run;
http://support.sas.com/kb/24/590.html
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.2/lestmtsref/n1rill4udj0tfun1fvce3j401plo.htm#p1sic7f8w8x1wpn1i07zx5vhwdwj
I have two lines of observations to read in SAS.
It is a comma-delimited data set.
My code is as below:
DATA SASweek1.industry;
INFILE "&Dirdata.Assignment1_Q6_data.txt" DLM="," DSD termstr=crlf TRUNCOVER;
LENGTH Company $ 15;
INPUT Company $ State $ Expense COMMA9. ;
FORMAT Expense DOLLAR9.;
*INFORMAT Expense DOLLAR10.;
RUN; * not ready;
The raw data set looks like this:
I can print out the first line of observations well,
but the last "0" will go to the first position of the second
line, becoming "0Lee's..".
Any suggestions would be highly appreciated!!
It is just doing what you told it to do. You told it to read exactly 9 characters.
Normally you should not use formatted input mode with delimited data. To stay in list mode, either add the : (colon) modifier in front of the informat specification in the INPUT statement, or remove the informat specification from the INPUT statement completely and use an INFORMAT statement to tell SAS which informat to use.
But your data is NOT properly delimited because the last field contains the delimiter, but the value is not enclosed in quotes. So the commas make it look like two values instead of one. The real solution is to fix the process that created the file to create a valid delimited file. It needs to quote the values with commas in them, or remove the commas from the numbers, or use a delimiter character that does not appear in the data.
Fortunately since it is the last field on the line you CAN use formatted input to read just that field. Since you are using the TRUNCOVER option just set the width of the informat in the INPUT statement to the maximum.
DATA SASweek1.industry;
INFILE "&Dirdata.Assignment1_Q6_data.txt" DLM="," DSD termstr=crlf TRUNCOVER;
LENGTH Company $15 State $15 Expense 8;
INPUT Company State Expense COMMA32. ;
FORMAT Expense DOLLAR9.;
RUN;
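For comparison, here is a sketch of how the same layout could be read in list mode if the file were fixed so that values containing commas are quoted. The data rows are invented for illustration (the real file is not shown in the question), and DOLLAR12. is just a width wide enough for the sample values.

```sas
data industry_demo;
  infile datalines dsd truncover;
  length Company $15 State $15;
  /* the colon keeps list mode, so the delimiter (not the width 32)
     decides where the Expense field ends */
  input Company State Expense :comma32.;
  format Expense dollar12.;
datalines;
"Lee's Inc",TX,"1,234,567"
Acme,CA,"20,000"
;
run;
```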
I have a numeric variable in SAS and I am struggling to extract the last digits of it. I tried using substr but it only handles char variables. The variable I have sometimes has 3 or 4 digits.
Example
1234
237
754
9000
In these cases I need to extract
34
37
54
00
And store them as a new numeric variable. I tried the code below in a PROC SQL statement but it returns an error. Can someone help me?
Var2 = input(substr(put(var1), 1, length(put(var1))-1), 8.)
SUBSTRN() works with numeric variables, but it doesn't work well here because there's no easy way to specify only the last two characters. The MOD() function works well in this case, because you're essentially finding the remainder after dividing by 100. Since it looks like it's a character value you want, you also need PUT() to convert the result to character, with the Z2. format to keep leading zeroes (so 9000 gives 00 rather than 0).
want = put(mod(value, 100), z2.);
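A sketch of that expression inside a complete step, using the sample values from the question. The data set and variable names are placeholders. Since the question asks for a numeric result, the step also keeps a numeric copy and only uses Z2. for display.

```sas
data want;
  input value;
  last2_char = put(mod(value, 100), z2.);  /* character, zero-padded */
  last2_num  = mod(value, 100);            /* numeric remainder */
  format last2_num z2.;                    /* display with leading zero */
datalines;
1234
237
754
9000
;
run;

proc print data=want;
run;
```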
One more way is to use PRXCHANGE, as shown below:
data have;
input val1;
val2= input(prxchange('s/^(.*)(\d{2})$/$2/', -1, trim(put(val1,8.))),best.);
format val2 z2.;
datalines;
1234
237
754
9000
40
;
You can also use the REVERSE and SUBSTR functions as follows, and then apply the z2. format.
input(reverse(substr(reverse(strip(put(var1,best.))), 1,2)), 8.);
I have a data set with 2 variables: a subject id number and a result. The result is a character variable. It was read in from an excel spreadsheet. Most results are numbers, but some of the results have a letter after them which was serving as a footnote in the excel file. I need to get rid of the letters after the numbers so I can convert the data to numeric for analysis. How can I do this? Below is some code to create an example dataset of the structure that I'm talking about.
data test;
input id result $ ;
datalines;
1 13
2 15
3 20
4 25c
5 75
6 99c
7 89b
8 10a
9 100
10 67
;
run;
Have a look at the compress and input functions.
num = input(compress(result, , "dk"), best.);
input converts character to numeric, interpreting the data using the informat you provide (best. here).
compress can be used to strip certain characters from a string. Here it is used with the d modifier, which adds all digits to the list of characters to act on, and the k modifier, which requests that the listed characters be kept rather than removed.
You may have to tweak the compress arguments a bit to deal with more complicated cases such as decimal points.
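As one possible tweak for decimal points, the sketch below adds '.' to the list of characters to keep. The sample rows are made up, and the approach assumes each value contains at most one period.

```sas
data test_num;
  input id result $;
  /* keep digits (d) plus the listed '.' character (k = keep, not remove) */
  num = input(compress(result, '.', 'kd'), 32.);
datalines;
1 13
2 25c
3 7.5b
;
run;
```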
In the following program all data is read correctly
data test ;
infile datalines ;
input make 10$ mpg @@ ; /* should I use make : 10$ . . */
datalines ;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;
run ;
proc print ;
run ;
The above code works fine, however my teacher says that I must use a colon (:) and that the correct answer is input make : 10$ mpg @@ ;
I don't understand why. As far as I know, : is useful if we have trailing spaces at the beginning of a record line; otherwise, why should we use it here?
The colon tells SAS to use the following informat. Without the colon SAS would ignore that part (it doesn't do anything). SAS by default uses an informat (and resultant length) of $8. if you don't specify it otherwise.
You are always better off specifying the informat, as a character of 2 length stored in the default 8 length character variable would be wasting storage space and processing time, but it won't alter the value (assuming you know to be aware of the trailing spaces).
You can also specify the informat ahead of time:
data test;
infile datalines;
informat make $10.;
input
make $ mpg @@;
datalines;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;;;;
run;
proc print data=test;
run ;
I find that usually easier to read, although using :$10. in stream is acceptable as well.
The : modifier on the INPUT statement says to read the text using LIST MODE even when there is an in-line informat specification. In list mode the INPUT statement reads the next word on the line into the variable. Without the : modifier, FORMATTED MODE is used, which reads exactly the number of characters specified by the in-line informat, even if that causes it to stop before the end of the current word or read past the delimiter into the next word on the line.
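A rough side-by-side sketch of the two modes, with a shortened made-up data line. The first step uses the colon, so MAKE is read word by word. The second uses formatted mode, so MAKE takes exactly 10 columns regardless of where the words end, and the log should show invalid-data notes for MPG.

```sas
/* list mode: the informat sets the length, the blank ends the value */
data list_mode;
  input make :$10. mpg @@;
datalines;
Ford 20 Oldsmobile 20
;

/* formatted mode: MAKE takes columns 1-10 of the line, i.e. "Ford 20 Ol" */
data formatted_mode;
  input make $10. mpg @@;
datalines;
Ford 20 Oldsmobile 20
;

proc print data=list_mode;
run;
proc print data=formatted_mode;
run;
```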
But there are other problems with your INPUT statement. As currently written it will generate an error.
2112 input make 10$ mpg @@ ;
-
22
ERROR 22-322: Expecting a name.
The number 10 in the INPUT statement is taken to mean you want to read MAKE using COLUMN MODE input, so you would be reading a single-digit number from the 10th character on the line. The $ modifier after the column number then generates an error because there is no variable directly in front of it to modify. If you want to specify an informat, you need to include the period as part of the specification. If you want to specify a character informat instead of a numeric informat, the name of the informat should start with the $ character.
So your INPUT statement with an in-line informat specification for MAKE would look like:
input make :$10. mpg @@ ;
The other way to make sure the MAKE is defined long enough to hold 10 characters is to define the variable before referencing it in the INPUT statement. Then SAS does not have to guess how you want it defined by how you are using it in the INPUT statement. Once the variable is known there is no need to include any extra characters in the INPUT statement.
data test ;
length make $10 mpg 8;
input make mpg @@ ;
datalines ;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;
I’m working with some raw data that has fixed column widths, but has all its records written into a single line (blame the data vendor, not me :-) ). I know how to use
fixed column widths in the INPUT statement, and how to use @@ to read more than one observation per line, but I am having trouble when I try to do both.
As an example, here’s some code where the data has fixed column widths, but there is one line per record. This code works fine:
DATA test_1;
INPUT alpha $ 1-5 beta $ 6-10 gamma 11-15 ;
DATALINES;
a    f    1
ab   fg   12
abc  fgh  123
abcd fghi 1234
abcdefghij12345
;
RUN;
Now here’s the code for what I’m really trying to do – all the data is in one line, and I try to use the @@ notation:
DATA test_2;
INPUT alpha $ 1-5 beta $ 6-10 gamma 11-15 @@;
DATALINES;
a    f    1    ab   fg   12   abc  fgh  123  abcd fghi 1234 abcdefghij12345
;
RUN;
This fails because it just keeps reading the beginning 15 characters, holding that record, and re-reading from the start. Based on my understanding of the semantics of the @@ notation, I can definitely understand why this would be happening.
Is there any way I can accomplish reading fixed column data from a single line; that is, make test_2 have the same content as test_1? Perhaps through some combination of symbols in the INPUT statement, or maybe resorting to another method (with file I/O functions, PROC IMPORT, etc.)?
Have you tried specifying variable lengths using informats?
For example:
DATA test_2;
INPUT alpha $5. beta $5. gamma 5.0 @@;
DATALINES;
a    f    1    ab   fg   12   abc  fgh  123  abcd fghi 1234 abcdefghij12345
;
RUN;
From the SAS documentation:
Formatted input causes the pointer to move like that of column input
to read a variable value. The pointer moves the length that is
specified in the informat and stops at the next column.