How to read comma delimited data file if some data include spaces - fortran

I am trying to read a data file that uses a comma as the delimiter, as shown below:
IPE 80,764,80.14,8.49
IPE 100,1030,171,15.92
However, if I read it using
READ(1,*) var1, var2, var3, var4
it reads IPE and 80 as different values. In other words, it treats both commas and spaces as delimiters, but I don't want this. How can I tell my program "hey, spaces are not delimiters, only commas are!"?

One possibility is to read the entire line into a string buffer and look for (some of) the delimiters yourself. Assuming that, as in your example, only the first column contains whitespace, you could do the following:
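program test
  implicit none
  character(1024) :: buffer     ! one whole input line
  character(20)   :: var1       ! first field, may contain blanks
  integer         :: pos, var2
  real            :: var3, var4
  read(*,"(A)") buffer          ! read the line as plain text
  pos = index(buffer, ",")      ! position of the first comma
  var1 = buffer(1:pos-1)        ! everything before it, blanks included
  read(buffer(pos+1:), *) var2, var3, var4   ! rest is read list-directed
  print *, trim(var1), var2, var3, var4
end program test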
This way, you manually split off the part of the string that is affected by the spaces, and conveniently read everything after it with a list-directed read statement. If other fields besides the first can also contain whitespace, it is easy to extend the example above to look for all the necessary delimiters in the buffer with the index() function.
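If every field may contain blanks, the index()-based idea can simply be repeated for each comma. The sketch below shows that loop in Python rather than Fortran, purely to illustrate the logic: str.find plays the role of index() (it returns -1 where index() returns 0), `split_on_commas` and `parse_record` are hypothetical names, and the four-column layout (text, integer, real, real) is assumed from the question.

```python
def split_on_commas(line: str) -> list[str]:
    """Split a record on commas only; blanks inside a field are kept."""
    fields = []
    start = 0
    while True:
        pos = line.find(",", start)  # like Fortran index(), but -1 when no comma is left
        if pos == -1:
            fields.append(line[start:].strip())  # last field runs to the end of the line
            return fields
        fields.append(line[start:pos].strip())
        start = pos + 1  # continue just after the comma


def parse_record(line: str) -> tuple[str, int, float, float]:
    """Assumed layout from the question: name, integer, real, real."""
    name, count, a, b = split_on_commas(line)
    return name, int(count), float(a), float(b)


for record in ["IPE 80,764,80.14,8.49", "IPE 100,1030,171,15.92"]:
    print(parse_record(record))
```

Note that an empty field between two commas comes back as an empty string, so the caller decides how to treat missing values.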

Related

SAS- print Single Quotes without spaces

How can I remove the space at the end of the output? I am getting a space at the end, inside the quotes.
data _null_;
files = 'AAAAAAAAAAAAAA,BBBBBBBBBBBBBB';
f_count = countw(files);
do i=1 to f_count;
file = scan(files, i, ',');
put "'" file "'";
end;
run;
Output (note the space before the closing quote):
'AAAAAAAAAAAA '
'BBBBBBBBBBBB '

That is because you are using a LIST MODE style PUT statement. When using LIST MODE style in PUT statements, SAS writes a delimiter (a space in this case) after each variable written.
You could just use a cursor movement command to back up one byte so that the closing quote is written over the space.
put "'" file +(-1) "'";
You could add the quotes to the variable rather than in the PUT statement. (Then the space will be written after the closing quote.)
file = quote(strip(scan(files, i, ',')),"'");
put file ;
Or you could use the $VARYING format to write the exact number of bytes that FILE contains.
len = lengthn(file);
put "'" file $varying200. len "'" ;
If you don't mind using double quote characters instead of single quotes you could just use the $QUOTE format.
put file :$quote. ;
You could also use the DSD option on the FILE statement. SAS will then automatically add double quotes if they are needed. They will be needed when the value contains the delimiter character or quote characters themselves. With the DSD option in effect you can use the ~ modifier in the PUT statement to write quotes around the value even when the value does not require quoting.
data _null_;
file log dsd ;
files = 'AAAAAAAAAAAAAA,BBBBBBBBBBBBBB';
f_count = countw(files);
do i=1 to f_count;
file = scan(files, i, ',');
put file ~;
end;
run;
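The quote-only-when-needed rule described above has a close counterpart in Python's csv module, used here purely as an illustration (this is not SAS): csv.QUOTE_MINIMAL quotes a field only when it contains the delimiter, the quote character or a line break, roughly like DSD on the FILE statement, while csv.QUOTE_ALL quotes every field, roughly like the ~ modifier. `write_row` is a hypothetical helper.

```python
import csv
import io


def write_row(values, quote_all=False):
    """Write one comma-delimited row and return it without the line ending.

    QUOTE_MINIMAL adds quotes only where needed; QUOTE_ALL always adds them.
    Embedded quote characters are doubled in both modes.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        quoting=csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(values)
    return buf.getvalue().rstrip("\n")


print(write_row(["AAAAAAAAAAAAAA", "BB,BB"]))         # only the value with a comma is quoted
print(write_row(["AAAAAAAAAAAAAA"], quote_all=True))  # quoted even though not required
print(write_row(['say "hi"']))                        # embedded quotes are doubled
```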

Why is the last character getting removed after applying tranwrd function

I want to replace certain values in my JSON file (in this example, null values with empty quotation marks). My solution works, but for some mysterious reason the last character of the JSON file is deleted. Regardless of what the last character is, the code always deletes it; I have also tried a different JSON file that ends in a curly brace.
What is causing this and more importantly how can I prevent this?
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
proc json out = 'G:\test.json' pretty fmtnumeric nosastags keys;
export testdata_;
run;
data _null_;
infile 'G:\test.json';
file 'G:\test.json';
input;
_infile_ = tranwrd(_infile_,'null','""');
put _infile_ ;
run;
To see how the contents change, first run the code up to the DATA _NULL_ step and check the file content, then run the last step.

Data _null_ has it correct; don't write to the same file. SAS offers this option, but in the modern day it's almost always the wrong answer, due to how SAS supports this and the fact that storage is sufficiently cheap and fast.
In this case, it looks like it's a relatively easy fix, but you probably should do as suggested and write to a new file anyway - there will be other issues.
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
proc json out = 'H:\temp\test.json' pretty fmtnumeric nosastags keys;
export testdata_;
run;
data _null_;
infile 'H:\temp\test.json' end=eof;
file 'H:\temp\test.json';
input @;
putlog _infile_;
_infile_ = tranwrd(_infile_,'null','""  ');
len = length(_infile_);
put _infile_ ;
if eof then put _infile_;
run;
There are two changes. One, I use '""  ' (two quote characters followed by two spaces) instead of '""' in the TRANWRD call; otherwise you end up with slightly odd results, with new lines being added. If your JSON parser doesn't like the padding before the comma, you may want to use two TRANWRD calls instead (one for each context in which null appears), or something similar (or use a regular expression). What's important is that the number of characters in the input and the output matches: null is four characters, so the replacement must be four characters too. If you can't handle that (for example, if the extra spaces are problematic), then you're left with "write a new file".
Two, I check for the end of the file and intentionally write the last line out a second time. That avoids the issue you're having with the bracket, because it keeps the end of file from being written out before the bracket. I'm not 100% sure why that is needed, but it is.
Another option, which might make more sense, is to write only the lines that contain null.
data _null_;
infile 'H:\temp\test.json' sharebuffers;
file 'H:\temp\test.json';
input @;
putlog _infile_;
if find(_infile_,'null') then do;
_infile_ = tranwrd(_infile_,'null','"" ');
put _infile_;
end;
run;
I added SHAREBUFFERS because that should make it run a bit faster. Note that I also removed one space; otherwise, something about how SAS does this seems to remove a space from the following line. No idea why, probably something odd with the end-of-line characters.
But again - don't do any of this unless there's no other option. Write a new file.

One strange thing is that PROC JSON always writes a text file that uses LF as the end-of-line character.
So you might be able to get overwriting the file to work if you follow these rules:
Use TERMSTR=LF on the INFILE statement.
Use SHAREBUFFERS on the INFILE statement.
Use the TRANWRD() function to replace the string with one of the same number of bytes, and do not put a space as the last character on the line.
I would also search for ': null' instead of just 'null' to reduce the risk of replacing those characters inside some other string in the file.
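filename json 'G:\test.json';  /* fileref for the file written by PROC JSON */
data _null_;
infile json SHAREBUFFERS termstr=lf ;
file json ;
input ;
/* ': null' and ':   ""' are both 6 bytes, and the line does not end in a space */
_infile_ = tranwrd(_infile_,': null',':   ""');
put _infile_;
run;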
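The same-number-of-bytes rule can be illustrated outside SAS. The Python sketch below is a simplified model of reading and writing the same file, not what SAS does internally: it rewrites a file in place without truncating it, so a shorter replacement leaves bytes of the old content behind at the end, while a same-length replacement leaves the file clean. `overwrite_in_place` and `rewrite_demo` are hypothetical helpers, and the one-line JSON content is made up for the demonstration.

```python
import os
import tempfile


def overwrite_in_place(path: str, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new` by writing over the same file.

    The file is opened read/write and is NOT truncated, so if the new
    content is shorter, the tail of the old content is left at the end.
    Returns the file content after the rewrite.
    """
    with open(path, "r+b") as f:
        data = f.read()
        f.seek(0)                          # go back and write over the original bytes
        f.write(data.replace(old, new))
    with open(path, "rb") as f:
        return f.read()


def rewrite_demo(new: bytes) -> bytes:
    """Hypothetical demo: a one-line JSON file with an LF line ending."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b'{"a": null}\n')
        return overwrite_in_place(path, b": null", new)
    finally:
        os.remove(path)


print(rewrite_demo(b': ""'))     # shorter replacement: stale bytes remain at the end
print(rewrite_demo(b':   ""'))   # same 6 bytes: the file is clean
```

If the lengths cannot be made to match, the safe route is the one recommended above: write a new file.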

How do I print a Fortran string with quotes around it?

Suppose I have a Fortran program like the following:
character*30 changed_string1
changed_string1="hello"
write(*,"(A)")changed_string1(1:3)
end
I would like to print the string with quotes so that I can see exactly the leading and trailing spaces. How can I do this?

There is no edit descriptor for characters which outputs them along with delimiters. A character variable does not have "automatic" delimiters like those which appear in a literal character constant (although it may have them as content).
This means you have to print any chosen delimiter explicitly yourself, either by adding it to the format or by concatenating, as in Vladimir F's answer.
Similarly, you can also add the delimiters to the output list (with a corresponding format change):
write (*,'(3A)') '"', string, '"'
You can even write a function which returns a "delimited string" and use the result in the output list:
implicit none
character(50) :: string="hello"
print '(A)', delimit(string,'"')
contains
pure function delimit(str, delim) result(delimited)
character(*), intent(in) :: str, delim
character(len(str)+2*len(delim)) delimited
delimited = delim//str//delim
end function delimit
end program
The function result above could even be deferred length (character(:), allocatable :: delimited) to avoid the explicit statement of result length.
As yamajun reminds us in a comment, a connection for formatted output has a delimiter mode, which does allow quotes and apostrophes to be added automatically to the output for list-directed and namelist output (only). For example, we can control the delimiter mode for a particular data transfer statement:
write(*, *, delim='quote') string
write(*, *, delim='apostrophe') string
or for the connection as a whole:
open(unit=output_unit, delim='quote') ! output_unit from module iso_fortran_env
Don't forget that list-directed output will add that leading blank to your output, and if you have quotes or apostrophes in your character output item you will not see exactly the same representation (this could even be what you want):
use, intrinsic :: iso_fortran_env, only : output_unit
open(output_unit, delim='apostrophe')
print*, "Don't be surprised by this output"
end
Fortran 2018 doesn't allow arbitrary delimiter choice in this way, but this could still be suitable for some uses.
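To make the DELIM= behaviour concrete, here is a small Python model (not Fortran, and not any compiler's code) of how one character item appears in list-directed output under DELIM='apostrophe' or DELIM='quote': wrapped in the delimiter, with an embedded delimiter doubled as in a Fortran character constant, and preceded by the leading blank. `delimited_item` is a hypothetical name, and the model assumes a single item per record.

```python
def delimited_item(value: str, delim: str = "'") -> str:
    """Model of one character item in list-directed output when the unit's
    DELIM= mode is 'apostrophe' (delim="'") or 'quote' (delim='"').

    The value is wrapped in the delimiter, an embedded delimiter is written
    twice, and the record starts with the leading blank of list-directed output.
    """
    if delim not in ("'", '"'):
        raise ValueError("DELIM= only offers an apostrophe or a quote")
    return " " + delim + value.replace(delim, delim * 2) + delim


print(delimited_item("Don't be surprised by this output"))  # apostrophe is doubled
print(delimited_item("hello".ljust(10), '"'))               # trailing blanks stay visible
```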

You can print quotes around your string. That will let you see the leading and trailing spaces.
write(*,"('''',A,'''')") changed_string1
or with the same effect
write(*,"(3A)") "'",changed_string1,"'"
(also mentioned by francescalus), which print a ' character before and after your string, or you can concatenate your string with these characters and print the result:
write(*,"(A)") "'"//changed_string1//"'"

SAS Scan function separator not working as it should

I ran into a problem with the scan function in SAS.
The dataset I have contains one variable that needs to be split into multiple variables.
The variable is structured like this:
4__J04__1__SCH175__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
I use this code to split this into multiple variables:
data /*ULB.*/work.smart_BCSS_withNISS_&JJ.&K.;
set work.smart_BCSS_withNISS_&JJ.&K.;
/* Maand splitsen in variablen */
mois=scan(smart,1,"__");
jours=scan(smart,2,"__");
nbjours=scan(smart,3,"__");
refClient=scan(smart,4,"__");
paysPrestation=scan(smart,5,"__");
wordingFR=scan(smart,6,"__");
wordingNL=scan(smart,7,"__");
fonction=scan(smart,8,"__");
ARTISTIQUE2=scan(smart,9,"__");
Art_At_LEAST=scan(smart,10,"__");
totalBrut=scan(smart,11,"__");
totalImposable=scan(smart,12,"__");
run;
Most of the time this works perfectly. However, sometimes the 4th variable, refClient, contains a single underscore, like this:
4__J04__1__LE_46__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
Somehow the scan function also detects this single underscore as a separator, even though the separator is a double underscore.
Any idea on how to avoid this behavior?

Aurieli's code works, but their answer doesn't explain why. Your understanding of how scan works is incorrect.
If there is more than 1 character in the delimiter specified for scan, each character is treated as a delimiter. You've specified _ twice. If you had specified ab then a and b would both have been treated as delimiters, rather than ab being the delimiter.
scan by default treats multiple consecutive delimiters as a single delimiter, which was why your code treated both __ and _ as delimiters. So if you specified ab as the delimiter string then ba, abba etc. would also be counted as a single delimiter by default.
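A rough Python model of the rule above may help; this is not SAS's implementation. `scan_like` is a hypothetical helper that treats every character of the delimiter argument as a delimiter on its own and merges runs of them, as SCAN does with its default modifiers; it only handles positive word numbers. Splitting on the whole two-character string "__" is shown for contrast.

```python
import re


def scan_like(text: str, n: int, chars: str) -> str:
    """Rough model of SCAN(text, n, chars) with default modifiers and n > 0.

    Each character in `chars` is its own delimiter, and a run of consecutive
    delimiters counts as one. Returns "" when there is no n-th word.
    """
    pattern = "[" + re.escape(chars) + "]+"
    words = [w for w in re.split(pattern, text) if w]
    return words[n - 1] if 1 <= n <= len(words) else ""


s = "4__J04__1__LE_46__BE"
print(scan_like(s, 4, "__"))   # LE   (a lone "_" is also a delimiter)
print(scan_like(s, 5, "__"))   # 46
print(s.split("__")[3])        # LE_46  (splitting on the two-character string)
```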

You can use a regular expression to change a single '_' (for example, to '-') and then scan what you want:
data /*ULB.*/work.test;
smart="4__J04__1__LE_18__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7";
smartcr=prxchange("s/(?<=[^_])(_{1})(?=[^_])/-/",-1,smart);
/* Maand splitsen in variablen */
mois=scan(smartcr,1,"__");
jours=scan(smartcr,2,"__");
nbjours=scan(smartcr,3,"__");
refClient=tranwrd(scan(smartcr,4,"__"),'-','_');
paysPrestation=scan(smartcr,5,"__");
wordingFR=scan(smartcr,6,"__");
wordingNL=scan(smartcr,7,"__");
fonction=scan(smartcr,8,"__");
ARTISTIQUE2=scan(smartcr,9,"__");
Art_At_LEAST=scan(smartcr,10,"__");
totalBrut=scan(smartcr,11,"__");
totalImposable=scan(smartcr,12,"__");
run;
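The same protect, split and restore idea can be checked in Python, whose re module accepts the same lookbehind/lookahead pattern. This is an illustrative sketch, not the SAS code: `split_record` is a hypothetical helper, splitting on runs of underscores stands in for SCAN with "__", and it assumes the record has at least four fields.

```python
import re

# Same pattern as the PRXCHANGE call above: a single "_" that has a
# non-underscore character on both sides.
LONE_UNDERSCORE = re.compile(r"(?<=[^_])_(?=[^_])")


def split_record(record: str) -> list[str]:
    protected = LONE_UNDERSCORE.sub("-", record)            # LE_18 -> LE-18
    fields = [w for w in re.split(r"_+", protected) if w]   # like SCAN(..., "__")
    fields[3] = fields[3].replace("-", "_")                 # restore refClient
    return fields


record = ("4__J04__1__LE_18__BE__compositeur / arrangeur__compositeur / "
          "bewerker__(blank)__1__17__108.03__93.7")
fields = split_record(record)
print(len(fields), fields[3], fields[6])
```

As in the SAS version, a genuine '-' already present in refClient would also be turned into '_' on the way back.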

Mildly interesting: the INFILE statement supports a delimiter string (the DLMSTR= option).
data test;
infile cards dlmstr='__';
input (mois
jours
nbjours
refClient
paysPrestation
wordingFR
wordingNL
fonction
ARTISTIQUE2
Art_At_LEAST
totalBrut
totalImposable) (:$32.);
cards;
4__J04__1__SCH175__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
4__J04__1__LE_46__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
;;;;
run;
proc print;
run;

In SAS, what does the option "dsd" stand for?

I have a quick question.
I am learning SAS and have come across the dsd= option.
Does anyone know what this stands for? It might assist in remembering / contextualizing.
Thanks.

Rather than just copying and pasting text from the internet, I'll try to explain it a bit more clearly. Like the delimiter option DLM=, DSD is an option that you can use in the INFILE statement.
Suppose a delimiter has been specified with DLM= and we use DSD. If SAS sees two delimiters that are side by side, or with only blank space(s) between them, it recognizes this as a missing value.
For example, if text file dog.txt contains the row:
171,255,,dog
Then,
data test;
infile 'C:\sasdata\dog.txt' DLM=',' DSD;
input A B C D $;
run;
will output:
A B C D
171 255 . dog
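Therefore, variable C is missing, denoted by the period (.). If we had not used DSD, the two adjacent commas would be treated as a single delimiter, so SAS would try to read dog into the numeric variable C and report invalid data.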

DSD stands for Delimiter-Sensitive Data.
The DSD (Delimiter-Sensitive Data) option in the INFILE statement does three things for you: 1) it ignores delimiters in data values enclosed in quotation marks; 2) it does not read the enclosing quotation marks as part of your data; 3) it treats two consecutive delimiters as a missing value.
Source: easy sas

DSD (delimiter-sensitive data)
specifies that when data values are enclosed in quotation marks,
delimiters within the value are treated as character data. The DSD
option changes how SAS treats delimiters when you use LIST input and
sets the default delimiter to a comma. When you specify DSD, SAS
treats two consecutive delimiters as a missing value and removes
quotation marks from character values.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146932.htm
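Python's csv reader follows similar rules by default, which makes it a convenient way to see the three DSD behaviours next to plain delimiter input. This is an analogy only: `read_dsd` and `read_without_dsd` are hypothetical helpers, and the non-DSD model simply merges runs of commas, ignoring other details of SAS list input (for example, csv keeps a field that contains only a blank as ' ', whereas DSD treats it as missing).

```python
import csv
import re


def read_dsd(line: str) -> list[str]:
    """DSD-like parsing: quoted values may contain commas, the enclosing
    quotes are dropped, and ',,' yields an empty (missing) field."""
    return next(csv.reader([line]))


def read_without_dsd(line: str) -> list[str]:
    """Rough model of plain DLM=',' list input: a run of delimiters
    counts as one, so empty fields disappear."""
    return [w for w in re.split(r",+", line) if w]


print(read_dsd("171,255,,dog"))          # C is present but empty
print(read_without_dsd("171,255,,dog"))  # dog shifts into the third position
print(read_dsd('1,"Smith, John",3'))     # comma inside quotes is data
```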

DSD refers to delimited data files that have delimiters back to back when there is missing data. In the past, programs that created delimited files always put a blank in place of missing data. Today, however, PC software does not put in blanks, which means that the delimiters are not separated. The DSD option of the INFILE statement tells SAS to watch out for this. Below are examples (using comma-delimited values) to illustrate:
Old Way: 5,4, ,2, ,1 ===> INFILE 'file' DLM=',' ... etc
New Way: 5,4,,2,,1 ===> INFILE 'file' DLM=',' DSD ... etc.

We Keep Coding
