I am completely new to SAS programming, so pardon me if the question is very basic. I am trying to send a file from Linux to a Windows server using SAS SFTP. The transfer works, but the destination file uses LF as the row delimiter, whereas our job expects CRLF. I tried the termstr option, but it fails with the error "invalid option termstr". Below is my code:
filename out-file sftp 'file.txt' cd='/project/dir'
host='hostname' recfm=v
user=user1;
data _null_;
file out-file TERMSTR=crlf;
do i=1 to i=10;
put i=;
end;
run;
Your program is using an invalid value for the fileref. You cannot use a hyphen in a SAS name.
You can use the TERMSTR= option on either the FILENAME or FILE statement to change the end of line characters.
I thought that SFTP always moved files as binary. You could try changing your SFTP option to make sure it is doing that. Try removing the recfm=v option.
Note: Text files have end-of-line characters, not record delimiters. If you are writing some type of proprietary binary file format you might consider the characters between rows of data a record delimiter, but it just leads to confusion if you think of the lines in text files as being separated instead of terminated.
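If the CRLF endings cannot be produced on the SAS side, one fallback is to fix the end-of-line characters after the transfer. The following is only a minimal C++ sketch of that idea; the helper name `to_crlf` is hypothetical and nothing here is part of SAS or SFTP. It assumes the file content has already been read into a string in binary mode.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turn every bare LF into CRLF.
// An LF that is already preceded by CR is left unchanged,
// so running the function twice does not produce CR CR LF.
std::string to_crlf(const std::string& text) {
    std::string out;
    out.reserve(text.size() + 16);
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n' && (i == 0 || text[i - 1] != '\r')) {
            out += '\r';
        }
        out += text[i];
    }
    return out;
}
```

When writing the result back on Windows, the output stream should also be opened in binary mode, otherwise the runtime may translate each LF a second time.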
TERMSTR is an option for the INFILE statement; there is no corresponding option for the FILE statement.
Try using PUT with a hexadecimal string and held output (trailing @):
PUT I= '0d0a'x @;
From SAS documentation
Specifying Hexadecimal Values
Hexadecimal values for (system) option values must begin with a number (0–9) and must be followed by an X. For example, the following OPTIONS statement sets the line size to 160 using a hexadecimal number:
options linesize=0a0x;
Character assignments for hexadecimal numbers require quotation marks:
options formchar='a0'x;
Additional reading at SAS Constants in Expressions will reveal
Character Constants Expressed in Hexadecimal Notation
SAS character constants can be expressed in hexadecimal notation. A character hexadecimal constant is a string of an even number of hexadecimal characters enclosed in single or double quotation marks, followed immediately by an X
and
Numeric Constants Expressed in Hexadecimal Notation
A numeric constant that is expressed as a hexadecimal value starts with a numeric digit (usually 0), can be followed by more hexadecimal characters, and ends with the letter X. The constant can contain up to 16 valid hexadecimal characters (0 to 9, A to F)
My dataset has a column with a wide range of values in it, such as the one below:
Value
3223145.306
1.044303129
345.556033
17693.00837
8.03E-06
NaN
1.97E-04
2.29E-04
8.01E-04
7.46E-04
18345.82237
47.78282804
4.14E-06
When I read this column in SAS, observations are read as character. Once I convert this to numeric the observations with E-04, E-05, E-06, etc. are being converted to 1.9736273 instead of 0.00019736273.
How do I account for E-04, E-05, E-06, etc.?
Code for the character-to-numeric conversion:
Value=input(Value, best12.);
You have to make a NEW variable if you want it to have a different type.
The INPUT function does not care if the width used on the informat is larger than the length of the string being read, so just use the maximum width that the informat supports. Also, BEST is the name of a FORMAT, not an INFORMAT. If you use it as the name of an informat, SAS will just default to using the normal numeric informat. So just go ahead and say that from the start instead of confusing format names with informat names.
The normal numeric informat can read those strings as numbers. So this code will work to create a new numeric variable named NUMBER from the existing character variable named VALUE.
number = input(VALUE,32.);
The only string in your list that will cause any issues is 'NaN'. SAS will not know how to translate it, so you will just get a missing value as the result, which is basically what systems that use that "not a number" symbol mean by it anyway. To prevent the notes in the log you can either test for it explicitly:
if upcase(value) not in ('NA','N/A','NAN') then number=input(value,32.);
Or just suppress the error messages by adding the ?? modifier:
number=input(value,??32.);
But then you will not get any message if there is other gibberish in the value variable.
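For comparison, the same idea (read the text with a general numeric parser, and treat NaN or gibberish as missing) can be sketched outside SAS. This is a minimal C++ illustration, not a translation of the INPUT function; the helper name `parse_number` is hypothetical, and a `false` return value stands in for a SAS missing value.

```cpp
#include <cmath>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse a plain decimal or E-notation string.
// Returns false (think "missing value") for NaN, empty input, or
// input with characters left over after the number.
bool parse_number(const std::string& s, double& out) {
    const char* begin = s.c_str();
    char* end = nullptr;
    double v = std::strtod(begin, &end);
    if (end == begin) return false;      // nothing could be parsed
    while (*end == ' ') ++end;           // tolerate trailing blanks
    if (*end != '\0') return false;      // trailing junk such as "12abc"
    if (std::isnan(v)) return false;     // strtod accepts "NaN"; treat it as missing
    out = v;
    return true;
}
```

Values such as 8.03E-06 keep their exponent because the parser handles the E-notation itself; no special case is needed for them.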
If I have a text file where lines contains some non-blank characters followed by spaces, how do I read those lines into a character variable without excess spaces?
character (len=1000) :: text
open (unit=20,file="foo.txt",action="read")
read (20,"(a)") text
will read the first 1000 characters of a line into variable text, which will be padded with spaces at the end if there are fewer than 1000 characters in the line. But if the line length is 100 you have 900 extraneous spaces, and the program does not "know" how long the line read actually was.
Fortran strings are blank-padded. There is simply no chance to distinguish any significant blank-padding in your strings with constant-length Fortran strings.
If every whitespace character is important, I suggest treating the file as a stream-access file instead (formatted or unformatted as needed), reading individual characters into an array buffer, and allocating a deferred-length string only once you know the length you actually need.
character (len=1000) :: text
integer :: s, ios
open (unit=20,file="foo.txt",action="read")
read (20,"(a)", size=s, advance='no', iostat=ios) text
After that last line, s contains the number of characters read, including trailing spaces, which I think is what you wanted.
Notes:
With a size tag, you must also have an advance tag set to 'no' otherwise you get a compilation error. Since the format is "(a)", the whole line is read so the next read statement will advance to the next line despite the 'no'. That's fine.
ios stores a negative integer when attempting to read past the end of the line. This will always happen if the line is shorter than length of text. That's fine.
When attempting to read past the end of the file, ios will store a different negative integer. What those two negative integers are is not set by the standard I think so you may have to experiment a bit. In my case, with the gfortran compiler, ios was -1 when attempting to read past the end of the file and -2 otherwise.
I wonder how Fortran's I/O is expected to behave in case of a NULL character ACHAR(0).
The actual task is to fill an ASCII file by blocks of precisely eight characters. The strings are read from a binary and may contain non-printing characters.
I tried with gfortran 4.8, 8.1 and f2c. If there is a NULL character in the string the format specifier FORMAT(A8) does not write eight characters.
Give the following F77 code a try:
c Print a string of eight characters surrounded by dashes
100 FORMAT('-',A8,'-')
c Works fine if empty or any other combination of printing chars
write(*,100) ''
c In case of a short string blanks are padded
write(*,100) '345678'
c A NULL character does something I did not expect
write(*,100) '123'//ACHAR(0)//'4567'
c Not even position editing helps
101 FORMAT('-',A8,T10,'x')
write(*,101) '123'//ACHAR(0)//'4567'
end
My output is:
-        -
-  345678-
-1234567-
-1234567x
Is this expected behavior? Any idea how to get the output eight characters wide in any case?
When using an edit descriptor A8 the field width is eight. For output, eight characters will be written.
In the case of the example, it isn't the writing of the characters that is contrary to your expectations, but how they are displayed by your terminal.
You can examine the output further with tools like hexdump or you can write to an internal file and look at arbitrary substrings.
Yes, that is expected, if there is a null character, the printing of the string on the screen can stop there. The characters will still be sent, but the string does not have to be printed on the screen.
Note that C uses the null character to terminate strings, and the OS may interpret the strings it receives with the same convention. This allows non-printable characters to be interpreted in processor-specific ways, where "the processor" includes the whole complex of the compiler, the executing environment (OS and programs in the OS) and the hardware.
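The point that all eight characters are sent, and only the display hides one of them, can be checked by looking at the raw bytes. Below is a minimal C++ sketch rather than Fortran; the helper names `hex_dump` and `field_with_nul` are hypothetical. It builds the same eight-character field as the question and renders each byte in hexadecimal.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of s as two hex digits,
// separated by single spaces.
std::string hex_dump(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// The field from the question: "123", one null character, then "4567".
std::string field_with_nul() {
    std::string s("123");
    s += '\0';
    s += "4567";
    return s;
}
```

Here `field_with_nul().size()` is 8 and the dump contains a `00` byte in the fourth position, even though a terminal would typically show only seven visible characters.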
Why do these two print different things? The first prints abcd but the second prints \x61\x62\x63\x64. What do I need to do to make the line from the file to be read as abcd?
std::string line("\x61\x62\x63\x64");
ifstream myfile ("myfile.txt"); //<-- the file contains \x61\x62\x63\x64
std::string line_file;
getline(myfile,line_file);
cout << line << endl;
cout << line_file << endl;
In C++, the backslash is an escape character in source code. It can be used to represent special characters such as the newline \n and the tab \t or, as in your case, hexadecimal representations of ASCII characters in string literals. The compiler performs this translation, so the literal "\x61\x62\x63\x64" is already the four characters abcd in the compiled program. If you actually want to store a backslash in C++ you have to escape it: char c='\\'. When you read a backslash from a file at run time, it is not treated as an escape character but as an actual backslash.
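To get abcd from the line read out of the file, the escapes have to be decoded by the program itself. Here is a minimal sketch of one way to do that; the helper names `hex_value` and `decode_hex_escapes` are hypothetical, and the sketch assumes every escape has exactly two hex digits (a `\x` escape in a C++ literal can consume more).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one hex digit.
// The caller guarantees that c is a valid hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Replace each literal backslash, 'x', and two hex digits with the
// single byte they name. Everything else is copied unchanged.
std::string decode_hex_escapes(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 3 < in.size() && in[i + 1] == 'x' &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 3]))) {
            int byte = hex_value(in[i + 2]) * 16 + hex_value(in[i + 3]);
            out += static_cast<char>(byte);
            i += 3;  // skip the rest of the escape; the loop adds one more
        } else {
            out += in[i];
        }
    }
    return out;
}
```

With this in place, `cout << decode_hex_escapes(line_file) << endl;` would print the decoded text for the line read by getline in the question.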
It has to do with the input file stream character interpretation:
File streams opened in binary mode perform input and output operations independently of any format considerations. Non-binary files are known as text files, and some translations may occur due to formatting of some special characters (like newline and carriage return characters).
Text file streams are those where the ios::binary flag is not included in their opening mode. These files are designed to store text and thus all values that are input or output from/to them can suffer some formatting transformations, which do not necessarily correspond to their literal binary value.
So the backslashes '\' are the most probable reason your ifstream gives a different result: the file contains the literal characters \, x, 6, 1 and so on, and the stream reads each of them as a separate character. The string literal in your source code, by contrast, has its escape sequences translated by the compiler, so its value is unambiguous.
For further reading see how fstreams work and learn about character literals backslash escape.
We are loading a Fixed width text file into a SAS dataset.
The character we are using to delimit multi valued field values is being interpreted as 2 characters by SAS. This breaks things, because the fields are of a fixed width.
We can use characters that appear on the keyboard, but obviously this isn't as safe, because our data could actually contain those characters.
The character we would like to use is '§'.
I'm guessing this may be an encoding issue, but don't know what to do about it.
Could you use the keycode for the character like DLM='09'x and change 09 to the right keycode?
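One plausible explanation for the "2 characters" symptom is that the file is encoded as UTF-8 while it is being read with a single-byte encoding. The character '§' is U+00A7, which is one byte (A7) in ISO-8859-1 or Windows-1252 but two bytes (C2 A7) in UTF-8. The C++ sketch below only illustrates that byte arithmetic; the names are hypothetical and it says nothing about how the SAS session in question is configured.

```cpp
#include <cstddef>
#include <string>

// '§' (U+00A7) written as explicit bytes, so the encoding of this
// source file does not affect the result.
const std::string section_latin1("\xA7");      // single-byte encodings: 1 byte
const std::string section_utf8("\xC2\xA7");    // UTF-8: 2 bytes

// Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

A reader that counts bytes sees the UTF-8 form as two units, which shifts every later column in a fixed-width record by one, while a reader that counts characters sees one. If that is the cause, a hexadecimal delimiter such as the one suggested above would need the byte value that matches the file's actual encoding.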