SAS Import from .CSV with line break - sas

I have a .csv file that contains line breaks inside a field, and I want to import it into SAS. I am having problems with values such as CUSTOMER that contain embedded line breaks (wrapped text). How can I overcome this problem? Some other variables have the same issue. If I import the file manually it works fine. Please see the example below; look at SLN PJ0136 to see the problem.
SLN MOD PM NE CUSTOMER
32121 GG 1 1 AVAILABLE UPON REQUEST
71403 EN 1 0 JET SUPPORT SERVICE INC.
305173 EN 1 1 UNKNOWN / COTTONWOOD, LLC / J SUPPORT SERVICE, INC.
PJ0136 PS 0 0 "UNKNOWN / GROUP B-50 INC AA
TC0004 anada CSC Europe
Inglewood Ava"
EB0162 RG 0 0 ATR
I used INFILE to import the file:
DATA WORK.test1;
%let _EFIERR_ = 0;
INFILE 'C:\Users\26631.IELPWC\Downloads\test.csv'
delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
INFORMAT
SLN $CHAR6. MOD $CHAR2. PM BEST1. NE BEST1. CUSTOMER $CHAR82. ;
FORMAT
SLN $CHAR6. MOD $CHAR2. PM BEST1. NE BEST1. CUSTOMER $CHAR82. ;
INPUT
SLN $ MOD $ PM NE CUSTOMER $ ;
if _ERROR_ then call symputx('_EFIERR_',1);
RUN;
This is the wrong output I get:
32121 GG 1 1 AVAILABLE UPON REQUEST
71403 EN 1 0 JET SUPPORT SERVICE INC.
305173 EN 1 1 UNKNOWN / COTTONWOOD, LLC / J SUPPORT SERVICE, INC.
PJ0136 PS 0 0 "UNKNOWN / GROUP B-50 INC AA
TC0004 . .
24719 . .
" . .
EB0162 RG 0 0 ATR

Assuming that your input data is in the following format:
SLN,MOD,PM,NE,CUSTOMER
32121,GG,1,1,AVAILABLE UPON REQUEST
71403,EN,1,0,JET SUPPORT SERVICE INC.
305173,EN,1,1,"UNKNOWN / COTTONWOOD, LLC / J SUPPORT SERVICE, INC."
PJ0136,PS,0,0,"UNKNOWN / GROUP B-50 INC AA
TC0004 anada CSC Europe
Inglewood Ava"
EB0162,RG,0,0,ATR
The following SAS code produces the required output:
data TEST (drop=_TMP_:);
format SLN $6. MOD $2. PM 8. NE 8. CUSTOMER $82. _TMP_STR $100.;
infile 'input.csv' truncover firstobs=2 dlm=',' dsd lrecl=10000;
input SLN MOD PM NE _TMP_STR @;
_TMP_COUNT=0;
do until(mod(_TMP_COUNT, 2) = 0);
CUSTOMER=catx('0A'x, CUSTOMER, _TMP_STR);
_TMP_COUNT=_TMP_COUNT + countc(_TMP_STR, '"');
if mod(_TMP_COUNT, 2) then do;
input _TMP_STR;
end;
end;
CUSTOMER=dequote(CUSTOMER);
run;
Please note that the CUSTOMER value where SLN='PJ0136' is multiline: the pieces are joined with a Unix-style line feed ('0A'x). If you want a single-line value instead, change the first argument of catx(...) accordingly, for example to a space (' ').

Reading in messy csv data with data step

timeOccur,timeReport,location,details,disposition
01/02/2021 11:20 am,,Sports Complex Road,Incident: PATROL ACTIVITY - Patrol Check<br /> CAD #: 21-01-02-000062,In Service
01/02/2021 11:20 am,,Corporation,Incident: ALARM - Burglary Alarm<br /> CAD #: 21-01-02-000063,Verified Entry/Field Interview
01/02/2021 10:20 am,,Grand Avenue Parking Structure,Incident: PATROL ACTIVITY - Patrol Check<br /> CAD #: 21-01-02-000061,In Service
12/30/2020 11:02 pm,12/30/2020 11:02 pm - 12/30/2020 11:50 pm,Canyon Circle Parking Structure,Incident: TRAFFIC COLLISION - Traffic Collision/Unknown Injury // ALCOHOL - Possession of false identification // TRAFFIC-ALCOHOL - DUI Alcohol // TRAFFIC - ALCOHOL - DUI >.08%<br /> Report #: 20074,Arrest
Need Help reading in this data.
Data CP_Crime;
infile '/home/u60629206/SAS330_Handouts/Crime Scrape 01022021.csv' DLM = ',' DSD firstobs=2 MISSOVER;
input Time_Occur anydtdtm19. Time_Report anydtdtm19. Location :$60. Details :$150. Disposition $35.;
run;
I am unsure how to read Time_Report when it contains a range like that. I am not looking to apply a particular format, so whatever base SAS returns, as it does for Time_Occur, is fine.
Treat Time_Report as a character string, then convert each datetime within the string to a SAS datetime. Since the two datetimes are separated by a -, we can use the scan function to extract each one and the input function to convert it to a SAS datetime.
Data CP_Crime;
infile '/home/u60629206/SAS330_Handouts/Crime Scrape 01022021.csv' firstobs=2 DLM = ',' DSD MISSOVER;
input Time_Occur:anydtdtm19. Time_Report:$50. Location :$60. Details :$150. Disposition $35.;
Time_Report_Start = input(scan(time_report, 1, '-'), anydtdtm.);
Time_Report_End = input(scan(time_report, 2, '-'), anydtdtm.);
format Time_Occur
Time_Report_Start
Time_Report_End mdyampm.
;
drop time_report;
run;
Output:
Time_Occur Time_Report_Start Time_Report_End Location ...
1/2/2021 11:20 AM . . Sports Complex Road ...
1/2/2021 11:20 AM . . Corporation ...
1/2/2021 10:20 AM . . Grand Avenue Parking Structure ...
12/30/2020 11:02 PM 12/30/2020 11:02 PM 12/30/2020 11:50 PM Canyon Circle Parking Structure ...

ERROR 388-185: Expecting an arithmetic operator. SAS

I am pretty new to SAS, I am trying to see which songs/artists/albums have appeared most on my spotify most played csv's (2017-2020). I am getting stuck very early on trying to just set the 2017 csv as a data set. Is there anything anyone can see that I am doing wrong? Seems like this step should be pretty straight forward.
data Spotify_2017;
infile='C:\Users\your_top_songs_2017.csv' dlm=’09’x dsd firstobs=2;
input Track URI Track Name Artist URI Artist Name Album URI Album Name Album Release Date Disc Number Track Number Track Duration Explicit Popularity Added By Added At;
run;
and here is the log:
1 The SAS System 10:18 Friday, January 15, 2021
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Spotify.sas';
4 %LET _CLIENTPROCESSFLOWNAME='Standalone Not In Project';
5 %LET _CLIENTPROJECTPATH='';
6 %LET _CLIENTPROJECTPATHHOST='';
7 %LET _CLIENTPROJECTNAME='';
8 %LET _SASPROGRAMFILE='C:\Users\xxx\Desktop\Spotify\Spotify.sas';
9 %LET _SASPROGRAMFILEHOST='USRDUL-PC0NNXU1';
10
11 ODS _ALL_ CLOSE;
12 OPTIONS DEV=SVG;
13 GOPTIONS XPIXELS=0 YPIXELS=0;
14 %macro HTML5AccessibleGraphSupported;
15 %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
16 %mend;
17 FILENAME EGHTML TEMP;
18 ODS HTML5(ID=EGHTML) FILE=EGHTML
19 OPTIONS(BITMAP_MODE='INLINE')
20 %HTML5AccessibleGraphSupported
21 ENCODING='utf-8'
22 STYLE=HtmlBlue
23 NOGTITLE
24 NOGFOOTNOTE
25 GPATH=&sasworklocation
26 ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 data Spotify_2017;
29 infile='C:\Users\pcardella\Desktop\Spotify\C:\Users\xxx\Desktop\Spotify\your_top_songs_2017.csv' dlm=’09’x dsd
___
388
76
29 ! firstobs=2;
ERROR 388-185: Expecting an arithmetic operator.
ERROR 76-322: Syntax error, statement will be ignored.
30 input Track URI Track Name Artist URI Artist Name Album URI Album Name Album Release Date Disc Number Track Number Track
30 ! Duration Explicit Popularity Added By Added At;
31 run;
ERROR: No DATALINES or INFILE statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.SPOTIFY_2017 may be incomplete. When this step was stopped there were 0 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
32
33 %LET _CLIENTTASKLABEL=;
34 %LET _CLIENTPROCESSFLOWNAME=;
35 %LET _CLIENTPROJECTPATH=;
36 %LET _CLIENTPROJECTPATHHOST=;
37 %LET _CLIENTPROJECTNAME=;
38 %LET _SASPROGRAMFILE=;
39 %LET _SASPROGRAMFILEHOST=;
2 The SAS System 10:18 Friday, January 15, 2021
40
41 ;*';*";*/;quit;run;
42 ODS _ALL_ CLOSE;
43
44
45 QUIT; RUN;
46
infile is a statement and does not need an equals sign. The syntax is:
infile 'file location here' <options>;
data Spotify_2017;
infile 'C:\Users\your_top_songs_2017.csv' dlm='09'x dsd firstobs=2;
input Track URI Track Name Artist URI Artist Name Album URI Album Name Album Release Date Disc Number Track Number Track Duration Explicit Popularity Added By Added At;
run;
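Note that the INPUT statement above still lists column headers that contain spaces, so SAS treats each word as a separate variable (Track, URI, Name, and so on), which is why the log reports 16 variables. Below is a minimal sketch of what the statement could look like with valid variable names. The names, types, and lengths are assumptions made for illustration and are not taken from the actual file, so adjust them to your data. TRUNCOVER is also an addition, so that short lines do not flow over to the next record.

```sas
data Spotify_2017;
    /* dlm='09'x (tab) is kept from the question; use dlm=',' if the   */
    /* file really is comma-delimited.                                 */
    infile 'C:\Users\your_top_songs_2017.csv' dlm='09'x dsd firstobs=2 truncover;

    /* Hypothetical names, types, and lengths: one variable per column. */
    /* The : modifier reads each value up to the next delimiter.        */
    input Track_URI          :$60.
          Track_Name         :$200.
          Artist_URI         :$60.
          Artist_Name        :$200.
          Album_URI          :$60.
          Album_Name         :$200.
          Album_Release_Date :$10.
          Disc_Number
          Track_Number
          Track_Duration
          Explicit           :$5.
          Popularity
          Added_By           :$60.
          Added_At           :$30.;
run;
```

Dates are read as character here to keep the sketch safe; once you know their layout you can switch to a date or datetime informat.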
One way to help learn importing raw files using the data step is to use proc import. proc import will import the data and generate data step code for you in the log when importing csv files. You can study it to see how it works and try to replicate it.
proc import
datafile = 'C:\Users\your_top_songs_2017.csv'
out = spotify_2017
dbms = csv
replace;
run;
Also, a great option to help make logs more readable in Enterprise Guide is to disable autogenerated code. Go to Tools -> Options -> Results -> General -> uncheck "Show generated wrapper code in SAS log"

Query on mixed input style SAS data. This code is not giving me proper output

DATA nationalparks;
INPUT @1 ParkName $ 1-21 @23 State $ Year @40 Acreage COMMA9.;
DATALINES;
Yellowstone ID/MT/WY 1872 4,065,493
Everglades FL 1934 1,398,800
Yosemite CA 1864 760,91
Great Smoky Mountains NC/TN 1926 520,269
;
RUN;
This SAS code is not showing proper result set.
Your program looks fine to me if the data really is in the columns your code is using.
Make sure you haven't indented the datalines, and check that your editor hasn't replaced spaces with TAB characters. With list input, STATE and YEAR must always have a value, and the value must not contain spaces; you can use a period (.) to mark a missing value, even for the character variable STATE. Alternatively, read STATE and YEAR using column input, in which case blanks are treated as missing. There is no need to use formatted input for the last variable: if you add the : modifier, SAS uses list mode and adjusts the informat width to match the width of the next value on the line. If ACREAGE can be missing, mark it with a period as well, or add an INFILE DATALINES TRUNCOVER; statement before the INPUT statement.
DATA nationalparks;
INPUT ParkName $ 1-21 State $ 23-30 Year 32-35 Acreage :COMMA.;
*---+----10---+----20---+----30---+----40---+----50;
DATALINES;
Yellowstone           ID/MT/WY 1872    4,065,493
Everglades            FL       1934    1,398,800
Yosemite              CA       1864       760,91
Great Smoky Mountains NC/TN    1926      520,269
;
RUN;
Results:
Obs    ParkName                 State       Year    Acreage
1      Yellowstone              ID/MT/WY    1872    4065493
2      Everglades               FL          1934    1398800
3      Yosemite                 CA          1864      76091
4      Great Smoky Mountains    NC/TN       1926     520269
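As a sketch of the TRUNCOVER suggestion, the step below reads a modified copy of the data in which some values are blank. The blanked-out values are made up purely to illustrate the behaviour and are not part of the original data. The intent is that a blank State comes back as missing because it is read with column input, and a record with no Acreage leaves Acreage missing instead of letting INPUT move on to the next line.

```sas
data nationalparks_demo;
    /* TRUNCOVER: if INPUT runs out of data on a line, set the        */
    /* remaining variables to missing instead of reading the next line */
    infile datalines truncover;
    input ParkName $ 1-21 State $ 23-30 Year 32-35 Acreage :comma.;
datalines;
Yellowstone           ID/MT/WY 1872    4,065,493
Everglades            FL       1934
Yosemite                       1864       760,91
;
run;

proc print data=nationalparks_demo;
run;
```

Without the INFILE statement, the Everglades record would be expected to pull its Acreage value from the start of the Yosemite line, which is the kind of silent misread the tips above are meant to prevent.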

Find matches by condition between 2 datasets in SAS

I'm trying to improve the processing time used via an already existing for-loop in a *.jsl file my classmates and I are using in our programming course using SAS. My question: is there a PROC or sequence of statements that exist that SAS offers that can replicate a search and match condition? Or a way to go through unsorted files without going line by line looking for matching condition(s)?
Our current script file is below:
if( roadNumber_Fuel[n]==roadNumber_TO[m] &
fuelDate[n]>=tripStart[m] & fuelDate[n]<=TripEnd[m],
newtripID[n] = tripID[m];
);
I have 2 sets of data simplified below.
DATA1:
ID1 Date1
1 May 1, 2012
2 Jun 4, 2013
3 Aug 5, 2013
..
.
&
DATA2:
ID2 Date2 Date3 TRIP_ID
1 Jan 1 2012 Feb 1 2012 9876
2 Sep 5 2013 Nov 3 2013 931
1 Dec 1 2012 Dec 3 2012 236
3 Mar 9 2013 May 3 2013 390
2 Jun 1 2013 Jun 9 2013 811
1 Apr 1 2012 May 5 2012 76
...
..
.
I need to check a lot of iterations, but my goal is to have the code check:
Data1.ID1 = Data2.ID2 AND (Date1 >Date2 and Date1 < Date3)
My desired output dataset would be
ID1 Date1 TRIP_ID
1 May 1, 2012 76
2 Jun 4, 2013 811
Thanks for any insight!
You can do range matches in two ways. First off, you can match using PROC SQL if you're familiar with SQL:
proc sql;
create table C as
select * from A
left join B
on A.id=B.id and A.date > B.date1 and A.date < B.date2
;
quit;
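To make the SQL approach concrete, here is a minimal sketch using the sample rows from the question. The dataset names data1 and data2, the output name matched, and the DATE9. layout of the dates are assumptions for illustration. It uses an inner join rather than a left join so that only rows with a matching trip are kept, which is what the desired output in the question shows.

```sas
/* Sample rows from the question, with dates retyped in DATE9. form */
data data1;
    input ID1 Date1 :date9.;
    format Date1 date9.;
datalines;
1 01MAY2012
2 04JUN2013
3 05AUG2013
;
run;

data data2;
    input ID2 Date2 :date9. Date3 :date9. TRIP_ID;
    format Date2 Date3 date9.;
datalines;
1 01JAN2012 01FEB2012 9876
2 05SEP2013 03NOV2013 931
1 01DEC2012 03DEC2012 236
3 09MAR2013 03MAY2013 390
2 01JUN2013 09JUN2013 811
1 01APR2012 05MAY2012 76
;
run;

proc sql;
    create table matched as
    select a.ID1, a.Date1, b.TRIP_ID
    from data1 as a
    inner join data2 as b
        on  a.ID1   = b.ID2
        and a.Date1 > b.Date2
        and a.Date1 < b.Date3
    order by a.ID1;
quit;
```

With these rows, ID1=1 falls inside the 01APR2012 to 05MAY2012 trip (TRIP_ID 76), ID1=2 falls inside 01JUN2013 to 09JUN2013 (TRIP_ID 811), and ID1=3 has no matching trip, so it is dropped by the inner join.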
Second, you can create a format. This is usually the faster option if it's possible to do this. This is tricky when you have IDs, but you can do it.
First, create a new variable, ID+date. Dates are numbers around 18,000-20,000, so multiply your ID by 100,000 and you're safe.
Second, create a dataset from the range dataset where START=lower date plus id*100,000, END=higher date + id*100,000, FMTNAME=some string that will become the format name (must start with A-Z or _ and have A-Z, _, digits only). LABEL is the value you want to retrieve (Trip_ID in the above example).
data b_fmts;
set b;
start=id*100000+date1;
end =id*100000+date2;
label=value_you_want_out;
fmtname='MYDATEF';
run;
Then use PROC FORMAT with the CNTLIN= option to import the formats.
proc format cntlin=b_fmts;
quit;
Make sure your date ranges don't overlap - if they do this will fail.
Then you can use it easily:
data a_match;
set a;
trip_id=put(id*100000+date,MYDATEF.);
run;
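Putting the format steps together, here is a sketch that applies them to the sample rows from the question. The dataset names, the format name TRIPF, and the DATE9. layout are assumptions for illustration. Two things go beyond the steps above: an extra HLO='O' ("other") row so that keys with no matching range return a blank label, and an INPUT() call to turn the label back into a number. Note that START and END are inclusive here, unlike the strict > and < in the SQL version.

```sas
data data1;
    input ID1 Date1 :date9.;
    format Date1 date9.;
datalines;
1 01MAY2012
2 04JUN2013
3 05AUG2013
;
run;

data data2;
    input ID2 Date2 :date9. Date3 :date9. TRIP_ID;
    format Date2 Date3 date9.;
datalines;
1 01JAN2012 01FEB2012 9876
2 05SEP2013 03NOV2013 931
1 01DEC2012 03DEC2012 236
3 09MAR2013 03MAY2013 390
2 01JUN2013 09JUN2013 811
1 01APR2012 05MAY2012 76
;
run;

/* Control dataset: one range per trip, keyed on ID*100000 + date */
data trip_fmt;
    length label $12 hlo $1;
    retain fmtname 'TRIPF' type 'N';
    set data2 end=last;
    start = ID2*100000 + Date2;
    end   = ID2*100000 + Date3;
    label = cats(TRIP_ID);
    hlo   = ' ';
    output;
    if last then do;
        /* "Other" row: any key outside every range gets a blank label */
        hlo   = 'O';
        label = ' ';
        output;
    end;
    keep fmtname type start end label hlo;
run;

proc format cntlin=trip_fmt;
run;

data matched;
    set data1;
    /* Look up the trip, then convert the character label to numeric */
    TRIP_ID = input(put(ID1*100000 + Date1, tripf.), ?? 8.);
    if not missing(TRIP_ID);   /* keep only rows that found a trip */
run;
```

The multiplier of 100,000 only works while the IDs are integers and the date values stay below 100,000, so check both before reusing this on real data.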