Read specific columns of a delimited file in SAS

This seems like it should be straightforward, but I can't find how to do this in the documentation. I want to read in a comma-delimited file, but it's very wide, and I just want to read a few columns.
I thought I could do this, but the @ pointer seems to point to columns of the text rather than the field positions defined by the delimiter:
data tmp;
infile 'results.csv' delimiter=',' MISSOVER DSD lrecl=32767 firstobs=2;
input
@1 id
@5 name$
;
run;
In this example, I want to read just what is in the 1st and 5th columns based on the delimiter, but SAS is reading what is in positions 1 and 5 of the text file. So if the first line of the input file starts like this
1234567, "x", "y", "asdf", "bubba", ... more variables ...
I want id=1234567 and name=bubba, but I'm getting name=567, ".
I realize that I could read in every column and drop the ones I don't want, but there must be a better way.

Indeed, @ does point to a column of text, not to a delimited field. The only method using standard input that I've found is to read the unwanted fields into a throwaway variable, i.e.:
input
id
blank $
blank $
blank $
name $
;
and then drop blank.
However, there is a better solution if you don't mind writing your input differently.
data tmp;
infile datalines;
input @;
id = scan(_INFILE_,1,',');
name = scan(_INFILE_,5,',');
put _all_;
datalines;
12345,x,y,z,Joe
12346,x,y,z,Bob
;;;;
run;
It makes formatting slightly messier, because SCAN always returns character values: you need an INPUT function call for each variable you want as something other than character. But it might be easier, depending on your needs.
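A minimal sketch extending the SCAN approach, assuming SCAN's optional modifiers argument: 'm' keeps empty fields in place (plain SCAN treats consecutive commas as a single delimiter, so a blank column would shift every later position), and 'q' ignores commas inside quotes. INPUT converts ID to numeric and DEQUOTE strips the quotes. The second data line is made up to show both cases.

```sas
data tmp;
infile datalines truncover;
length id 8 name $20;
input @;                                          /* load the record into _INFILE_ and hold it */
id   = input(scan(_infile_, 1, ',', 'mq'), 32.);  /* character field -> numeric */
name = dequote(scan(_infile_, 5, ',', 'mq'));     /* 5th field, surrounding quotes removed */
datalines;
12345,x,y,z,Joe
12346,,y,z,"Bob"
;
run;
```

Without the 'm' modifier, the empty second field on the second line would make SCAN return a blank for the 5th word.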

You can skip fields fairly efficiently if you know a bit of INPUT statement syntax; note the use of (3*dummy)(:$1.), which reads the three unwanted fields into the same one-byte variable DUMMY. Storing just one byte should also improve performance slightly.
data tmp;
infile cards DSD firstobs=2;
input id $ (3*dummy)(:$1.) name $;
drop dummy;
cards;
id,x,y,z,name
1234567, "x", "y", "asdf", "bubba", ... more variables
1234567, "x", "y", "asdf", "bubba", ... more variables
;
run;
proc print;
run;

One more option that I thought of when answering a related question from another user.
filename tempfile temp;
data _null_;
set sashelp.cars;
file tempfile dlm=',' dsd lrecl=32767;
put (Make--Wheelbase) ($);
run;
data mydata;
infile tempfile dlm=',' dsd truncover lrecl=32767;
length _tempvars1-_tempvars100 $32;
array _tempvars[100] $;
input (_tempvars[*]) ($);
make=_tempvars[1];
type=_tempvars[3];
MSRP=input(_tempvars[6],dollar8.);
keep make type msrp;
run;
Here we use an array of effectively temporary variables (they can't actually BE temporary, unfortunately), and then grab just the ones we want by column position. This is probably overkill for a small file (just read in all the variables and deal with it), but for 100 or 200 variables where you want just columns 15, 18, and 25, it might be easier, as long as you know exactly which columns you want. (I could see using this for census data in CSV form, for example: it's very common to want just a few columns, most of which are 100 or 200 columns from the start.)
You have to take some care with the length of the array variables (they must be at least as long as the longest column you care about!), and you have to get the column numbers right, since you won't know you picked the wrong one unless it's obvious from the data.
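A minimal variation, assuming the last column you need is column 6 (MSRP in SASHELP.CARS): size the array to that last column instead of 100, because list input simply stops there and the rest of each record is never read. The writer step puts eight columns in the file to make that visible. It also swaps the original DOLLAR8. informat for COMMA12., which strips dollar signs and commas.

```sas
/* Write eight SASHELP.CARS columns to a temporary CSV file */
filename tempfile temp;
data _null_;
set sashelp.cars;
file tempfile dsd lrecl=32767;
put make model type origin drivetrain msrp invoice enginesize;
run;

/* Read only as far as column 6, the last column we want */
data mydata;
infile tempfile dsd truncover lrecl=32767;
array _tempvars[6] $32;               /* creates _tempvars1-_tempvars6 */
input _tempvars1-_tempvars6;          /* columns 7 and 8 are never read */
length make $13 type $8;
make = _tempvars[1];
type = _tempvars[3];
msrp = input(_tempvars[6], comma12.); /* formatted text such as $36,945 -> number */
format msrp dollar8.;
keep make type msrp;
run;
```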

Related

Manually Reading in Data in SAS from CSV

So I have a large dataset that is rather oddly formatted, and I want to read it in based on the header. It has only one row per participant, and each participant took part in multiple rounds of the study. The data are from some experiments and are laid out as variables for each participant (e.g. "participant.code"), then some session variables, which I can drop, and then the actual variables from the experiment. These are named "study.[round number].player.[variable]".
Rather than repeating the variables for every round, I want to pull out the round number as a separate variable and have one observation for every round for each participant.
I want to read these in differently depending on the variable. I would rather not have to edit the source file by hand, since the experiment is going to be run multiple times.
If someone could just point me towards some relevant material or whatnot that would be great.
Thank you!
Edit: example of some of the raw data:
participant.id_in_session,participant.code,participant.label,participant._is_bot,participant._index_in_pages,participant._max_page_index,participant._current_app_name,participant._current_page_name,participant.time_started_utc,participant.visited,participant.mturk_worker_id,participant.mturk_assignment_id,participant.payoff,session.code,session.label,session.mturk_HITId,session.mturk_HITGroupId,session.comment,session.is_demo,session.config.real_world_currency_per_point,session.config.participation_fee,session.config.name,session.config.treatment,study.1.player.id_in_group,study.1.player.role,study.1.player.payoff,study.1.player.Seatfinal,study.1.player.finalpay,study.1.player.payroundpay,study.1.player.QCorrect,study.1.player.treatment,study.1.player.Q1a,study.1.player.Q1b,study.1.player.Q1c,study.1.player.Q2a,study.1.player.Q3,study.1.player.Q4,study.1.player.Q5,study.1.player.Q6,study.1.player.Q7,study.1.player.Q80,study.1.player.Q81,study.1.player.Q82,study.1.player.offer,study.1.player.OfferNum,study.1.player.OfferTaken,study.1.player.BuyerNumber,study.1.player.Seatnum2,study.1.player.Seatnum,study.1.player.pay,study.1.player.isoffertaken,study.1.player.hastakenoffer,study.1.player.consent,study.1.player.offerPrice,study.1.player.oprice,study.1.player.guess_num_seller,study.1.player.BoughtPrice,study.1.player.reward,study.1.player.guess_num_buyer,study.1.group.id_in_subsession,study.1.subsession.round_number,study.1.subsession.offersrem,study.1.subsession.game_finished,study.1.subsession.numbuyers,study.1.subsession.bnum,study.1.subsession.payround,study.2.player.id_in_group,study.2.player.role,study.2.player.payoff,study.2.player.Seatfinal,study.2.player.finalpay,study.2.player.payroundpay,study.2.player.QCorrect,study.2.player.treatment,study.2.player.Q1a,study.2.player.Q1b,study.2.player.Q1c,study.2.player.Q2a,study.2.player.Q3,study.2.player.Q4,study.2.player.Q5,study.2.player.Q6,study.2.player.Q7,study.2.player.Q80,study.2.player.Q81,study.2.player.Q82,stu
dy.2.player.offer,study.2.player.OfferNum,study.2.player.OfferTaken,study.2.player.BuyerNumber,study.2.player.Seatnum2,study.2.player.Seatnum,study.2.player.pay,study.2.player.isoffertaken,study.2.player.hastakenoffer,study.2.player.consent,study.2.player.offerPrice,study.2.player.oprice,study.2.player.guess_num_seller,study.2.player.BoughtPrice,study.2.player.reward,study.2.player.guess_num_buyer,study.2.group.id_in_subsession,study.2.subsession.round_number,study.2.subsession.offersrem,study.2.subsession.game_finished,study.2.subsession.numbuyers,study.2.subsession.bnum,study.2.subsession.payround,study.3.player.id_in_group,study.3.player.role,study.3.player.payoff,study.3.player.Seatfinal,study.3.player.finalpay,study.3.player.payroundpay,study.3.player.QCorrect,study.3.player.treatment,study.3.player.Q1a,study.3.player.Q1b,study.3.player.Q1c,study.3.player.Q2a,study.3.player.Q3,study.3.player.Q4,study.3.player.Q5,study.3.player.Q6,study.3.player.Q7,study.3.player.Q80,study.3.player.Q81,study.3.player.Q82,study.3.player.offer,study.3.player.OfferNum,study.3.player.OfferTaken,study.3.player.BuyerNumber,study.3.player.Seatnum2,study.3.player.Seatnum,study.3.player.pay,study.3.player.isoffertaken,study.3.player.hastakenoffer,study.3.player.consent,study.3.player.offerPrice,study.3.player.oprice,study.3.player.guess_num_seller,study.3.player.BoughtPrice,study.3.player.reward,study.3.player.guess_num_buyer,study.3.group.id_in_subsession,study.3.subsession.round_number,study.3.subsession.offersrem,study.3.subsession.game_finished,study.3.subsession.numbuyers,study.3.subsession.bnum,study.3.subsession.payround,study.4.player.id_in_group,study.4.player.role,study.4.player.payoff,study.4.player.Seatfinal,study.4.player.finalpay,study.4.player.payroundpay,study.4.player.QCorrect,study.4.player.treatment,study.4.player.Q1a,study.4.player.Q1b,study.4.player.Q1c,study.4.player.Q2a,study.4.player.Q3,study.4.player.Q4,study.4.player.Q5,study.4.player.Q6,study.4.player.Q7,study.
4.player.Q80,study.4.player.Q81,study.4.player.Q82,study.4.player.offer,study.4.player.OfferNum,study.4.player.OfferTaken,study.4.player.BuyerNumber,study.4.player.Seatnum2,study.4.player.Seatnum,study.4.player.pay,study.4.player.isoffertaken,study.4.player.hastakenoffer,study.4.player.consent,study.4.player.offerPrice,study.4.player.oprice,study.4.player.guess_num_seller,study.4.player.BoughtPrice,study.4.player.reward,study.4.player.guess_num_buyer,study.4.group.id_in_subsession,study.4.subsession.round_number,study.4.subsession.offersrem,study.4.subsession.game_finished,study.4.subsession.numbuyers,study.4.subsession.bnum,study.4.subsession.payround,study.5.player.id_in_group,study.5.player.role,study.5.player.payoff,study.5.player.Seatfinal,study.5.player.finalpay,study.5.player.payroundpay,study.5.player.QCorrect,study.5.player.treatment,study.5.player.Q1a,study.5.player.Q1b,study.5.player.Q1c,study.5.player.Q2a,study.5.player.Q3,study.5.player.Q4,study.5.player.Q5,study.5.player.Q6,study.5.player.Q7,study.5.player.Q80,study.5.player.Q81,study.5.player.Q82,study.5.player.offer,study.5.player.OfferNum,study.5.player.OfferTaken,study.5.player.BuyerNumber,study.5.player.Seatnum2,study.5.player.Seatnum,study.5.player.pay,study.5.player.isoffertaken,study.5.player.hastakenoffer,study.5.player.consent,study.5.player.offerPrice,study.5.player.oprice,study.5.player.guess_num_seller,study.5.player.BoughtPrice,study.5.player.reward,study.5.player.guess_num_buyer,study.5.group.id_in_subsession,study.5.subsession.round_number,study.5.subsession.offersrem,study.5.subsession.game_finished,study.5.subsession.numbuyers,study.5.subsession.bnum,study.5.subsession.payround,study.6.player.id_in_group,study.6.player.role,study.6.player.payoff,study.6.player.Seatfinal,study.6.player.finalpay,study.6.player.payroundpay,study.6.player.QCorrect,study.6.player.treatment,study.6.player.Q1a,study.6.player.Q1b,study.6.player.Q1c,study.6.player.Q2a,study.6.player.Q3,study.6.player.Q4,study.
6.player.Q5,study.6.player.Q6,study.6.player.Q7,study.6.player.Q80,study.6.player.Q81,study.6.player.Q82,study.6.player.offer,study.6.player.OfferNum,study.6.player.OfferTaken,study.6.player.BuyerNumber,study.6.player.Seatnum2,study.6.player.Seatnum,study.6.player.pay,study.6.player.isoffertaken,study.6.player.hastakenoffer,study.6.player.consent,study.6.player.offerPrice,study.6.player.oprice,study.6.player.guess_num_seller,study.6.player.BoughtPrice,study.6.player.reward,study.6.player.guess_num_buyer,study.6.group.id_in_subsession,study.6.subsession.round_number,study.6.subsession.offersrem,study.6.subsession.game_finished,study.6.subsession.numbuyers,study.6.subsession.bnum,study.6.subsession.payround,study.7.player.id_in_group,study.7.player.role,study.7.player.payoff,study.7.player.Seatfinal,study.7.player.finalpay,study.7.player.payroundpay,study.7.player.QCorrect,study.7.player.treatment,study.7.player.Q1a,study.7.player.Q1b,study.7.player.Q1c,study.7.player.Q2a,study.7.player.Q3,study.7.player.Q4,study.7.player.Q5,study.7.player.Q6,study.7.player.Q7,study.7.player.Q80,study.7.player.Q81,study.7.player.Q82,study.7.player.offer,study.7.player.OfferNum,study.7.player.OfferTaken,study.7.player.BuyerNumber,study.7.player.Seatnum2,study.7.player.Seatnum,study.7.player.pay,study.7.player.isoffertaken,study.7.player.hastakenoffer,study.7.player.consent,study.7.player.offerPrice,study.7.player.oprice,study.7.player.guess_num_seller,study.7.player.BoughtPrice,study.7.player.reward,study.7.player.guess_num_buyer,study.7.group.id_in_subsession,study.7.subsession.round_number,study.7.subsession.offersrem,study.7.subsession.game_finished,study.7.subsession.numbuyers,study.7.subsession.bnum,study.7.subsession.payround,study.8.player.id_in_group,study.8.player.role,study.8.player.payoff,study.8.player.Seatfinal,study.8.player.finalpay,study.8.player.payroundpay,study.8.player.QCorrect,study.8.player.treatment,study.8.player.Q1a,study.8.player.Q1b,study.8.player.Q1c,study.8
.player.Q2a,study.8.player.Q3,study.8.player.Q4,study.8.player.Q5,study.8.player.Q6,study.8.player.Q7,study.8.player.Q80,study.8.player.Q81,study.8.player.Q82,study.8.player.offer,study.8.player.OfferNum,study.8.player.OfferTaken,study.8.player.BuyerNumber,study.8.player.Seatnum2,study.8.player.Seatnum,study.8.player.pay,study.8.player.isoffertaken,study.8.player.hastakenoffer,study.8.player.consent,study.8.player.offerPrice,study.8.player.oprice,study.8.player.guess_num_seller,study.8.player.BoughtPrice,study.8.player.reward,study.8.player.guess_num_buyer,study.8.group.id_in_subsession,study.8.subsession.round_number,study.8.subsession.offersrem,study.8.subsession.game_finished,study.8.subsession.numbuyers,study.8.subsession.bnum,study.8.subsession.payround,study.9.player.id_in_group,study.9.player.role,study.9.player.payoff,study.9.player.Seatfinal,study.9.player.finalpay,study.9.player.payroundpay,study.9.player.QCorrect,study.9.player.treatment,study.9.player.Q1a,study.9.player.Q1b,study.9.player.Q1c,study.9.player.Q2a,study.9.player.Q3,study.9.player.Q4,study.9.player.Q5,study.9.player.Q6,study.9.player.Q7,study.9.player.Q80,study.9.player.Q81,study.9.player.Q82,study.9.player.offer,study.9.player.OfferNum,study.9.player.OfferTaken,study.9.player.BuyerNumber,study.9.player.Seatnum2,study.9.player.Seatnum,study.9.player.pay,study.9.player.isoffertaken,study.9.player.hastakenoffer,study.9.player.consent,study.9.player.offerPrice,study.9.player.oprice,study.9.player.guess_num_seller,study.9.player.BoughtPrice,study.9.player.reward,study.9.player.guess_num_buyer,study.9.group.id_in_subsession,study.9.subsession.round_number,study.9.subsession.offersrem,study.9.subsession.game_finished,study.9.subsession.numbuyers,study.9.subsession.bnum,study.9.subsession.payround,study.10.player.id_in_group,study.10.player.role,study.10.player.payoff,study.10.player.Seatfinal,study.10.player.finalpay,study.10.player.payroundpay,study.10.player.QCorrect,study.10.player.treatment,st
udy.10.player.Q1a,study.10.player.Q1b,study.10.player.Q1c,study.10.player.Q2a,study.10.player.Q3,study.10.player.Q4,study.10.player.Q5,study.10.player.Q6,study.10.player.Q7,study.10.player.Q80,study.10.player.Q81,study.10.player.Q82,study.10.player.offer,study.10.player.OfferNum,study.10.player.OfferTaken,study.10.player.BuyerNumber,study.10.player.Seatnum2,study.10.player.Seatnum,study.10.player.pay,study.10.player.isoffertaken,study.10.player.hastakenoffer,study.10.player.consent,study.10.player.offerPrice,study.10.player.oprice,study.10.player.guess_num_seller,study.10.player.BoughtPrice,study.10.player.reward,study.10.player.guess_num_buyer,study.10.group.id_in_subsession,study.10.subsession.round_number,study.10.subsession.offersrem,study.10.subsession.game_finished,study.10.subsession.numbuyers,study.10.subsession.bnum,study.10.subsession.payround,study.11.player.id_in_group,study.11.player.role,study.11.player.payoff,study.11.player.Seatfinal,study.11.player.finalpay,study.11.player.payroundpay,study.11.player.QCorrect,study.11.player.treatment,study.11.player.Q1a,study.11.player.Q1b,study.11.player.Q1c,study.11.player.Q2a,study.11.player.Q3,study.11.player.Q4,study.11.player.Q5,study.11.player.Q6,study.11.player.Q7,study.11.player.Q80,study.11.player.Q81,study.11.player.Q82,study.11.player.offer,study.11.player.OfferNum,study.11.player.OfferTaken,study.11.player.BuyerNumber,study.11.player.Seatnum2,study.11.player.Seatnum,study.11.player.pay,study.11.player.isoffertaken,study.11.player.hastakenoffer,study.11.player.consent,study.11.player.offerPrice,study.11.player.oprice,study.11.player.guess_num_seller,study.11.player.BoughtPrice,study.11.player.reward,study.11.player.guess_num_buyer,study.11.group.id_in_subsession,study.11.subsession.round_number,study.11.subsession.offersrem,study.11.subsession.game_finished,study.11.subsession.numbuyers,study.11.subsession.bnum,study.11.subsession.payround,study.12.player.id_in_group,study.12.player.role,study.12.player
.payoff,study.12.player.Seatfinal,study.12.player.finalpay,study.12.player.payroundpay,study.12.player.QCorrect,study.12.player.treatment,study.12.player.Q1a,study.12.player.Q1b,study.12.player.Q1c,study.12.player.Q2a,study.12.player.Q3,study.12.player.Q4,study.12.player.Q5,study.12.player.Q6,study.12.player.Q7,study.12.player.Q80,study.12.player.Q81,study.12.player.Q82,study.12.player.offer,study.12.player.OfferNum,study.12.player.OfferTaken,study.12.player.BuyerNumber,study.12.player.Seatnum2,study.12.player.Seatnum,study.12.player.pay,study.12.player.isoffertaken,study.12.player.hastakenoffer,study.12.player.consent,study.12.player.offerPrice,study.12.player.oprice,study.12.player.guess_num_seller,study.12.player.BoughtPrice,study.12.player.reward,study.12.player.guess_num_buyer,study.12.group.id_in_subsession,study.12.subsession.round_number,study.12.subsession.offersrem,study.12.subsession.game_finished,study.12.subsession.numbuyers,study.12.subsession.bnum,study.12.subsession.payround,study.13.player.id_in_group,study.13.player.role,study.13.player.payoff,study.13.player.Seatfinal,study.13.player.finalpay,study.13.player.payroundpay,study.13.player.QCorrect,study.13.player.treatment,study.13.player.Q1a,study.13.player.Q1b,study.13.player.Q1c,study.13.player.Q2a,study.13.player.Q3,study.13.player.Q4,study.13.player.Q5,study.13.player.Q6,study.13.player.Q7,study.13.player.Q80,study.13.player.Q81,study.13.player.Q82,study.13.player.offer,study.13.player.OfferNum,study.13.player.OfferTaken,study.13.player.BuyerNumber,study.13.player.Seatnum2,study.13.player.Seatnum,study.13.player.pay,study.13.player.isoffertaken,study.13.player.hastakenoffer,study.13.player.consent,study.13.player.offerPrice,study.13.player.oprice,study.13.player.guess_num_seller,study.13.player.BoughtPrice,study.13.player.reward,study.13.player.guess_num_buyer,study.13.group.id_in_subsession,study.13.subsession.round_number,study.13.subsession.offersrem,study.13.subsession.game_finished,study.13
.subsession.numbuyers,study.13.subsession.bnum,study.13.subsession.payround
1,kppf7hjb,,0,221,221,study,FinalPay,2022-04-16 22:08:18.471115,1,,,0.0,lew8kph3,,,,,0,1.0,0.0,externality_control,0,2,Seller,0.0,1,0,0,10,0,125,125,50,100,50,0,0,0,1,1,,,1,3,,0,1,1,100,0,0,,50.0,,,,,,1,1,6,1,5,6,4,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,100,0,0,,45.0,,,,,,1,2,6,1,5,6,13,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,0,0,,0,,,100,0,0,,,,,,,,1,3,5,1,5,6,6,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,6,,0,,,138,1,0,,38.0,,,,,,1,4,6,1,5,6,3,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,2,,0,,,135,1,0,,35.0,,,,,,1,5,6,1,5,6,11,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,0,0,,0,,,100,0,0,,,,,,,,1,6,5,1,5,6,6,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,6,,0,,,132,1,0,,32.0,,,,,,1,7,6,1,5,6,4,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,5,,0,,,150,1,0,,50.0,,,,,,1,8,6,1,5,6,9,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,2,,0,,,100,0,0,,49.0,,,,,,1,9,6,1,5,6,10,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,5,,0,,,100,0,0,,39.0,,,,,,1,10,6,1,5,6,3,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,132,1,0,,32.0,,,,,,1,11,6,1,5,6,10,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,130,1,0,,30.0,,,,,,1,12,6,1,5,6,8,2,Seller,0.0,1,192,132,10,0,,,,,,,,,,,,,1,2,,0,,,128,1,0,,28.0,,,,,,1,13,6,1,5,6,11
Your file is not really as complicated as it first seems. For example, the bulk of the data is just 43 columns that repeat 13 times: the STUDY.1 columns, then the STUDY.2 columns, etc.
For this one, just write a program to read it. There are 23 columns that are not "study" columns (13 PARTICIPANT columns and 10 SESSION columns), followed by 13 copies of the 43 study columns.
data want;
infile csv dsd truncover firstobs=2;
length var1-var23 svar1-svar43 $200; /* placeholder names and types */
input var1-var23 @;
do study=1 to 13;
input svar1-svar43 @;
output;
end;
run;
So you turn each line into 13 observations (study=1 to study=13).
To complete the sketch above, you just need to figure out what names you want to use for the 66 (23 + 43) variables other than STUDY, and for each variable whether it is numeric or character and, when character, what length it needs to store the longest possible value.
If you need to work with many variations of files in this style, it might be worth writing a program that analyzes the headers, determines the role of each column from the pattern of its name, and perhaps generates the code to read the file.
You might start by building a dataset with just the header names.
data headers;
infile csv dsd obs=1 ;
length col 8 words 8 ;
col+1;
array header [4] $50 ;
input header1 :$50. @@ ;
words=countw(header1,'.');
do _n_=words to 1 by -1;
header[_n_] = scan(header1,_n_,'.');
end;
run;
You can use that list of the headers to help you figure out what would be useful names for the variables.
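Following the idea of analyzing the headers, here is a sketch that derives the layout counts from the HEADERS dataset built above instead of hard-coding them. It assumes the CSV fileref from the question, that every repeated column starts with the lowercase word study, and that HEADER2 then holds the round number; the macro variable names NFIXED, NROUNDS, NSTUDY and NPER are made up for illustration.

```sas
proc sql noprint;
/* columns that are not repeated per round */
select count(*) into :nfixed trimmed
  from headers where header1 ne 'study';
/* number of rounds = distinct values after "study." */
select count(distinct header2) into :nrounds trimmed
  from headers where header1 = 'study';
select count(*) into :nstudy trimmed
  from headers where header1 = 'study';
quit;
%let nper = %eval(&nstudy / &nrounds);   /* columns per round */
%put &=nfixed &=nrounds &=nper;

/* Same reading step as before, driven by the computed counts */
data want;
infile csv dsd truncover firstobs=2;
length var1-var&nfixed svar1-svar&nper $200;
input var1-var&nfixed @;
do study=1 to &nrounds;
input svar1-svar&nper @;
output;
end;
run;
```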
If you want to let SAS guess how to define and name the variables, you could try splitting the CSV file into two separate CSV files: one with the first 23 columns and one with the other 43. So first split the headers (perhaps removing the STUDY.N. prefix while you are at it). Then split the data. Add a ROW number to make it easy to join them back together later.
filename single temp;
filename multiple temp;
data _null_;
infile csv dsd obs=1 ;
input header :$50. @@ ;
file single dsd ;
if _n_=1 then put 'ROW,' @;
if _n_<= 23 then put header @;
else do;
file multiple dsd;
if _n_=24 then put 'ROW,STUDY,' @;
call scan(header,3,pos,len,'.');
header = substr(header,pos);
put header @;
end;
if _n_=23+43 then stop;
run;
data _null_;
infile csv dsd firstobs=2 truncover ;
row+1;
length s1-s43 $200 ;
input s1-s23 @;
file single dsd mod;
put row s1-s23 ;
file multiple dsd mod;
do study=1 to 13 ;
input s1-s43 # ;
put row study s1-s43 ;
end;
run;
Now you can use PROC IMPORT to GUESS how to read SINGLE and MULTIPLE and then you can join them back together.
proc import file=single dbms=csv out=single replace;
run;
proc import file=multiple dbms=csv out=multiple replace;
run;
data want;
merge single multiple;
by row;
run;

SAS Export Issue as it is giving additional double quote

I am trying to export SAS data to CSV. The SAS dataset is named abc here and looks like this:
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
I am using the following code:
filename exprt "C:/abc.csv" encoding="utf-8";
proc export data=abc
outfile=exprt
dbms=tab;
run;
The output is:
LINE_NUMBER DESCRIPTION
524JG "24PC AMEFA VINTAGE CUTLERY SET ""DUBARRY"""
So there are double quotes before and after the description, plus an additional double quote before and after the word DUBARRY. I have no clue what's happening. Can someone help me resolve this and explain what exactly is happening here?
Expected result:
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
There is no need to use PROC EXPORT to create a delimited file; you can write it with a simple DATA step. If you want to create the file in your example, just do not use the DSD option on the FILE statement. But note that, depending on the data you are writing, you could create a file that cannot be parsed properly because of extra unprotected delimiters. You will also have trouble representing missing values.
Let's make a sample dataset we can use to test.
data have ;
input id value cvalue $ name $20. ;
cards;
1 123 A Normal
2 345 B Embedded|delimiter
3 678 C Embedded "quotes"
4 . D Missing value
5 901 . Missing cvalue
;
Essentially, PROC EXPORT writes the data using the DSD option, like this:
data _null_;
set have ;
file 'myfile.txt' dsd dlm='09'x ;
put (_all_) (+0);
run;
Which will yield a file like this (with pipes replacing the tabs so you can see them).
1|123|A|Normal
2|345|B|"Embedded|delimiter"
3|678|C|"Embedded ""quotes"""
4||D|Missing value
5|901||Missing cvalue
If you just remove the DSD option, you get a file like this instead:
1|123|A|Normal
2|345|B|Embedded|delimiter
3|678|C|Embedded "quotes"
4|.|D|Missing value
5|901| |Missing cvalue
Notice how the second line looks like it has 5 values instead of 4, making it impossible to know how to split it into 4 values. Also notice that the missing values are written as at least one character (a period for the numeric value, a blank for the character value).
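To see why the quotes are doubled rather than simply added, here is a small round-trip sketch (the fileref DSDFILE and dataset ROUNDTRIP are made up). It writes the value from the question with DSD and a tab delimiter, as PROC EXPORT does, then reads it back with DSD. Under DSD rules a value containing a quote is wrapped in quotes and each embedded quote is doubled, which is exactly what lets a DSD reader restore the original text.

```sas
filename dsdfile temp;

/* Write one record the way PROC EXPORT would: DSD, tab-delimited */
data _null_;
file dsdfile dsd dlm='09'x;
length line_number $8 description $60;
line_number = '524JG';
description = '24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"';
put line_number description;
run;

/* Echo the raw record to the log so the quoting is visible */
data _null_;
infile dsdfile;
input;
put _infile_;
run;

/* Read it back with DSD: the outer quotes are removed and "" becomes " */
data roundtrip;
infile dsdfile dsd dlm='09'x truncover;
length line_number $8 description $60;
input line_number description;
run;
```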
Another way would be to run a data step to convert the normal file that PROC EXPORT generates into the variant format that you want. This might also give you a place to add escape characters to protect special characters if your target format requires them.
data _null_;
infile normal dsd dlm='|' truncover ;
file abnormal dlm='|';
do i=1 to 4 ;
if i>1 then put '|' @;
input field :$32767. @;
field = tranwrd(field,'\','\\');
field = tranwrd(field,'|','\|');
len = lengthn(field);
put field $varying32767. len @;
end;
put;
run;
You could even make this data step smart enough to count the number of fields on the first row and use that count to control the loop, so that you wouldn't have to hard-code it.
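A sketch of that idea. The filerefs NORMAL and ABNORMAL are pointed at temporary files here, and the two sample records are made up, so the example runs on its own. It assumes every row has the same number of fields as the first: COUNTW with the 'm' modifier (count empty fields) and 'q' modifier (ignore delimiters inside quotes) counts the fields of the held first record, and RETAIN keeps that count for later rows.

```sas
filename normal temp;
filename abnormal temp;

/* Hypothetical input in the normal DSD style */
data _null_;
file normal;
put '1|Normal|"Embedded|delimiter"';
put '2|"Embedded ""quotes"""|back\slash';
run;

data _null_;
infile normal dsd dlm='|' truncover;
file abnormal dlm='|';
retain nfields;
length field $32767;
input @;                                          /* hold the record so _INFILE_ is filled */
if _n_=1 then nfields = countw(_infile_,'|','mq'); /* quote-aware field count */
do i=1 to nfields;
if i>1 then put '|' @;
input field :$32767. @;
field = tranwrd(field,'\','\\');                 /* escape the escape character first */
field = tranwrd(field,'|','\|');
len = lengthn(field);
put field $varying32767. len @;
end;
put;
run;
```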

Reading in a .csv file with null values in date field

I am reading in a .csv file in SAS where some of the fields are mostly populated with null values (.) and a handful with 5-digit SAS dates. I need SAS to recognise the field as a date field (or at the very least a numeric field), instead of reading it in as text as it is doing at the moment.
A simplified version of my code is as so:
data test;
informat mydate date9.;
infile myfile dsd dlm=',' missover;
input
myfirstval
mydate
;
run;
With this code all values are read in as . and the field data type is text. Can anyone tell me what I need to change in the above code to get the output I need?
Thanks
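If you write a data step to read a CSV file, SAS will create each variable with the data type that you specify. If you tell it that MYDATE is a number, it will NOT convert it to a character variable. Also drop the DATE9. informat: it expects text such as 19JAN2013, so 5-digit numbers are invalid for it and are read as missing. Read the values as plain numbers and attach the DATE9. format instead.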
data test;
infile cards dsd dlm=',' TRUNCOVER ;
length myfirstval 8 mydate 8 mythirdval 8;
input myfirstval mydate mythirdval;
format mydate date9.;
cards;
1,1234,5
2,.,6
;
Note that the data step compiler defines the type of a variable at the first chance it gets. For example, if the first reference is in a statement like IF MYDATE='.' ..., then MYDATE will be defined as character with length 1 to match the type of the value it is being compared to. That is why it is best to start with a LENGTH or ATTRIB statement that clearly defines your variables.
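A short sketch of that advice using the same sample rows: an ATTRIB statement fixes every type up front, and the MISSING function tests for a missing date without comparing against a literal of either type. HAS_DATE is a made-up flag for illustration.

```sas
data test;
/* Declare every variable before anything else references it */
attrib myfirstval length=8
       mydate     length=8 format=date9.
       mythirdval length=8;
infile cards dsd truncover;
input myfirstval mydate mythirdval;
/* MISSING() accepts numeric or character arguments, so unlike */
/* IF MYDATE='.' it does not compare against a character value */
has_date = not missing(mydate);
cards;
1,1234,5
2,.,6
;
run;
```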

Input delimited is8601 datetimes in SAS

Is it possible to input the following with a single input statement without producing any erroneous missing values? I believe I've got the right format for the first 19 characters of each of the datetime variables below, but I can't seem to find a way to make SAS ignore the extraneous characters and skip to the next delimiter before trying to input the next variable.
data _null_;
infile datalines dlm=',' dsd missover;
input a is8601dt19. b is8601dt19. c $4.;
format a b is8601dt.;
put a= b= c=;
datalines;
2013-01-19T09:40:39.812+0000,2013-01-19T09:40:39.812+0000,text
,2013-01-19T09:40:39.812+0000,text
,,text
;
run;
My workaround for the time being is to initially input as $28. and then use the substr and input functions, but I suspect that there may be a more direct/efficient way.
I don't see a clear way to do this. The problem is that these are not actually ISO8601 values, at least according to SAS.
SAS recognizes two versions of ISO 8601: Basic (B8601DZ.) and Extended (E8601DZ.). Basic has no colons, dashes, etc., and Extended has all possible ones.
Basic: 20130119T094039812+0000
Extended: 2013-01-19T09:40:39.812+00:00
(see the doc page on ISO date/times for more information)
Yours are an amalgamation of the two, and SAS doesn't seem to like that.
Add to that the fact that you're reading this from a delimited file, and I don't see a good single-pass solution. I think your method is fine. You can probably skip the SUBSTR call, but otherwise you are stuck with it.
Your INPUT statement above doesn't work because an informat written directly after the variable makes it formatted input, not list input: SAS reads exactly 19 columns and ignores the delimiters. If you prepend a colon (:), the informat is used with list input, but unfortunately its width no longer limits how much of the incoming text is read (not sure why; it can in other contexts). I.e.:
input a :e8601dz19. b :e8601dz19. c :$4.;
That's legal, but it doesn't help you, because it tries to fit the whole 28-character field into that informat (I'm not sure whether it right-aligns it, but it's definitely not left-aligning it the way formatted input would). You're using formatted input but mean to use modified list input, hence the issue.
You could do this, if you didn't have all that missing data, for example:
data _null_;
infile datalines dlm=',' dsd missover;
informat a b e8601dt19.;
input
@1 a e8601dt19.
@"," b e8601dt19.
@"," c $4.;
format a b is8601dt.;
put a= b= c=;
datalines;
2013-01-19T09:40:39.812+0000,2013-01-19T09:40:39.812+0000,text
,2013-01-19T09:40:39.812+0000,text
, ,text
;
run;
That works for the first line: it reads the first 19 characters into A, then skips to the next comma and reads B. But notice that it fails for all the other rows, because it eats up too many characters for A. Anything you do to adapt this (which probably could be done) is going to be far more work than just using SUBSTR.
I would do this:
data _null_;
infile datalines dlm=',' dsd missover;
informat a b e8601dt19.;
length a_c b_c $28;
input
a_c $ b_c $ c $;
a = input(a_c,??e8601dt19. -l);
b = input(b_c,??e8601dt19. -l);
format a b is8601dt.;
put a= b= c=;
datalines;
2013-01-19T09:40:39.812+0000,2013-01-19T09:40:39.812+0000,text
,2013-01-19T09:40:39.812+0000,text
, ,text
;
run;
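No SUBSTR is necessary; just use the informat width (the w in e8601dt19.) to limit the read to the first 19 characters. Or, if you would like the time zone information used, insert the colon into the offset programmatically (+0000 becomes +00:00) and read the value with E8601DZ.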
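A sketch of that time zone variant, assuming the offset is always written as +hhmm in positions 24 to 28 of each value: SUBSTR and CATS rebuild it as +hh:mm, and the 29-character result is read with the E8601DZ29. informat. The dataset name TZ and the variable A_Z are made up; the ?? modifier keeps blank values from writing notes to the log.

```sas
data tz;
infile datalines dsd truncover;
length a_c $28 a_z $29 c $8;
input a_c c;
/* 2013-01-19T09:40:39.812+0000 -> 2013-01-19T09:40:39.812+00:00 */
if not missing(a_c) then a_z = cats(substr(a_c,1,26), ':', substr(a_c,27,2));
a = input(a_z, ?? e8601dz29.);
format a datetime23.3;
put a_c= a= c=;
datalines;
2013-01-19T09:40:39.812+0000,text
,text
;
run;
```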
