I have a dataset that I'm trying to work with
My code is
DATA AlbumData;
INFILE '/folders/myfolders/Data.txt' DLM=',';
INPUT Movie $ Director $ Date ANYDTDTE10. Budget BoxOffice Genre $;
RUN;
My issue is the first line of the .txt file I'm importing has these same headings. How do I work around that?
Try adding the FIRSTOBS=2 option to the INFILE line. This will tell the file pointer to start on line 2.
DATA AlbumData;
INFILE '/folders/myfolders/Data.txt' DLM=',' FIRSTOBS=2;
INPUT Movie $ Director $ Date ANYDTDTE10. Budget BoxOffice Genre $;
RUN;
Related
I need to use the INFILE statement to read a file called np_traffic.csv, name the table traffic2, and only import a column called ReportingDate as a character.
Current Code is giving me the error
"The data set WORK.TRAFFIC2 may be incomplete. When this step was
stopped there were 0 observations and 1 variables."
DATA traffic2;
INFILE “E:/Documents/Week 2/np_traffic.csv”
dsd firstobs=2;
INPUT ReportingDate $;
RUN;
Let's assume that you really have a delimited text file, which is what a CSV file is, instead of the spreadsheet you pictured in the photograph in your post. To read the 6th field in a line you need to first read the first 5 fields. That does not mean you need use the values read from those fields.
data traffic2;
infile “E:/Documents/Week 2/np_traffic.csv”
dsd firstobs=2
;
length dummy $1 ReportingDate $12;
input 5*dummy ReportingDate ;
drop dummy;
run;
I would suggest to try it this way:
data traffic2;
drop a b c d e g;
infile 'E:\Documents\Week 2\np_traffic.csv' dsd dlm='<Insert your delimiter>' firstobs=2;
input a b c d e f g;
run;
https://documentation.sas.com/?docsetId=lestmtsref&docsetTarget=n1rill4udj0tfun1fvce3j401plo.htm&docsetVersion=9.4&locale=en
I am running SAS Studio 3.6 Basic Edition. I am a beginner at SAS and I can't get past this error that I've been having. I have the code below and the file is in the correct place. I created a folder under the "my folders" in the sidebar called "Exercises" and under that I created a folder called "data". It seems that it is not reading the file but I'm not sure why because the path is correct (to my knowledge).
Any ideas? I have already tried googling and most of the results with this error have to do with _WEBOUT which I don't believe is my problem.
DATA SALARIES;
INFILE '/Exercises/data/AAUP_data.txt';
INFILE SALARIES delimiter=',';
INPUT FICE College_Name $ State $ Type $ Average_Salary_Full
Average_Salary_Assoc Average_Salary_Asst Average_Salary_All
Average_Comp_Full Average_Comp_Assoc Average_Comp_Asst Average_Comp_All
Number_of_Professors_Full Number_of_Professors_Assoc
Number_of_Professors_Asst Number_of_Instructors Number_of_Faculty_All
;
RUN;
PROC PRINT;
RUN;
I appreciate the help.
Your second infile statement is meant to be used in conjunction with a filename statement.
If your SALARIES infile is meant to be the text file stored at '/Exercises/data/AAUP_data.txt', then there are two ways you could write this:
FILENAME SALARIES '/Exercises/data/AAUP_data.txt';
DATA SALARIES;
INFILE SALARIES delimiter=',';
INPUT FICE College_Name $ State $ Type $ Average_Salary_Full
Average_Salary_Assoc Average_Salary_Asst Average_Salary_All
Average_Comp_Full Average_Comp_Assoc Average_Comp_Asst Average_Comp_All
Number_of_Professors_Full Number_of_Professors_Assoc
Number_of_Professors_Asst Number_of_Instructors Number_of_Faculty_All
;
RUN;
PROC PRINT;
RUN;
or simply
DATA SALARIES;
INFILE '/Exercises/data/AAUP_data.txt' delimiter=',';
INPUT FICE College_Name $ State $ Type $ Average_Salary_Full
Average_Salary_Assoc Average_Salary_Asst Average_Salary_All
Average_Comp_Full Average_Comp_Assoc Average_Comp_Asst Average_Comp_All
Number_of_Professors_Full Number_of_Professors_Assoc
Number_of_Professors_Asst Number_of_Instructors Number_of_Faculty_All
;
RUN;
PROC PRINT;
RUN;
My input file looks like this:
FCODE,AIRPORT,TDATE,CHILD,ADULT,ECAR,FCAR
IA06100,CDGWE,01APR2000,8,85,3328,11730
IA05900,CDG,01APR2000,8,85,2392,8415
IA07200,FRA,01APR2000,10,97,1720,5432
IA04700,LHR,01APR2000,14,120,2576,7320
I want to read this file and create a dataset, where the TDATE file is formatted into mm/dd/yyyy. The following is my base sas code:
data test;
infile "C:\SAS Training\testfile.txt" firstobs=2 dlm=',';
input FCODE $ AIRPORT $ TDATE $ CHILD ADULT ECAR FCAR;
format tdate MMDDYYYY10.;
run;
But I am getting ERROR 48-59: The informat $MMDDYY was not found or could not be loaded. Essentially I want to convert the date "01APR2000" in the input file to "04/01/2000" in the dataset.
Thanks
Try this below. TDATE is a date (special case of numeric in SAS) so no following $ in input statement. You also make some confusion between informat (how SAS reads raw data and convert it) and format (how SAS displays data).
data test;
infile "C:\SAS Training\testfile.txt" firstobs=2 dlm=',';
input FCODE $ AIRPORT $ TDATE CHILD ADULT ECAR FCAR;
informat tdate DATE9.;
format tdate DDMMYY10.;
run;
You need to use proper INFORMAT
TDATE :date.
Note the :
I would like to import data from a csv file in a permanent data set which has this date column with data format like "dd-mmm-yyyy" like "22-FEB-1990". I want this to be imported as date format inside the data set too. I have tried many format informats but i am not getting anything in the column.
Here is the code i wrote(While I commented out certain things I have tested all the permutations and combinations with the formats and informats i could think of):
libname asgn1 "C:\Users\*****\abc";
data asgn1.Car_sales_1_1;
infile "C:\Users\********\Car_sales.csv" dsd dlm="," FIRSTOBS=2 ;
input Manufacturer $ Model $ Fuel_efficiency Latest_Launch;
* format Latest_Launch mmddyy10.;
* informat Latest_Launch mmddyy10.;
run;
Please help...
Change your informat to date11. (dd-mmm-yyyy).
SAS Informats by Category > http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001239776.htm
I tried the following code and I got just the result I wanted....Thanks #Chris J
libname asgn1 "C:\Users\*****\abc";
data asgn1.Car_sales_1_1;
infile "C:\Users\********\Car_sales.csv" dsd dlm="," FIRSTOBS=2 ;
input Manufacturer $ Model $ Fuel_efficiency Latest_Launch;
informat Latest_Launch date11.;
format Latest_Launch ddmmyy10.;
run;
I am currently using SAS version 9 to try and read in a flat file in .txt format of a HTML table that I have taken from the following page (entitled Wayne Rooney's Match History):
http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney
I've got the data into a .txt file using a Python webscraper using Scrapy. The format of my .txt file is like thus:
17-08-2013,1 : 4,Swansea,Manchester United,28',7.26,Assist Assist,26-08-2013,0 : 0,Manchester United,Chelsea,90',7.03,None,14-09-2013,2 : 0,Manchester United,Crystal Palace,90',8.44,Man of the Match Goal,17-09-2013,4 : 2,Manchester United,Bayer Leverkusen,84',9.18,Goal Goal Assist,22-09-2013,4 : 1,Manchester City,Manchester United,90',7.17,Goal Yellow Card,25-09-2013,1 : 0,Manchester United,Liverpool,90',None,Man of the Match Assist,28-09-2013,1 : 2,Manchester United,West Bromwich Albion,90'...
...and so on. What I want is a dataset that has the same format as the original table. I know my way round SAS fairly well, but tend not to use infile statements all that much. I have tried a few variations on a theme, but this syntax has got me the nearest to what I want so far:
filename myfile "C:\Python27\Football Data\test.txt";
data test;
length date $10.
score $6.
home_team $40.
away_team $40.
mins_played $3.
rating $4.
incidents $40.;
infile myfile DSD;
input date $
score $
home_team $
away_team $
mins_played $
rating $
incidents $ ;
run;
This returns a dataset with only the first row of the table included. I have tried using fixed widths and pointers to set the dataset dimensions, but because the length of things like team names can change so much, this is causing the data to be reassembled from the flat file incorrectly.
I think I'm most of the way there, but can't quite crack the last bit. If anyone knows the exact syntax I need that would be great.
Thanks
I would read it straight from the web. Something like this; this works about 50% but took a whole ten minutes to write, i'm sure it could be easily improved.
Basic approach is you use #'string' to read in text following a string. You might be better off reading this in as a bytestream and doing a regular expression match on <tr> ... </tr> and then parsing that rather than taking the sort of more brute force method here.
filename rooney url "http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney" lrecl=32767;
data rooney;
infile rooney scanover;
retain are_reading;
input #;
if find(_infile_,'<table id="player-fixture" class="grid fixture">')
then are_reading=1;
if find(_infile_,'</table>') then are_reading=0;
if are_reading then do;
input #'<td class="date">' date ddmmyy10.
#'class="team-link">' home_team $20.
#'class="result-1 rc">' score $10.
#'class="team-link">' away_team $20.
#'title="Minutes played in this match">' mins_played $10.
#'title="Rating in this match">' rating $6.
;
output;
end;
run;
As far as reading the scrapy output, you should change at least two things:
Add the delimiter. Not truly necessary, but I'd consider the code incorrect without it, unless delimiter is space.
Add a trailing "##" to get SAS to hold the line pointer, since you don't have line feeds in your data.
data want;
infile myfile flowover dlm=',' dsd lrecl=32767;
length date $10.
score $6.
home_team $40.
away_team $40.
mins_played $3.
rating $4.
incidents $40.;
input date $
score $
home_team $
away_team $
mins_played $
rating $
incidents $ ##;
run;
Flowover is default, but I like to include it to make it clear.
You also probably want to input the date as a date value (not char), so informat date ddmmyy10.;. The rating is also easily input as numeric if you want to, and both mins played and score could be input as numeric if you're doing analysis on those by adding ' and : to the delimiter list.
Finally, your . on length is incorrect; SAS is nice enough to ignore it, but . is only placed like so for formats.
Here's my final code:
data want;
infile "c:\temp\test2.txt" flowover dlm="',:" lrecl=32767;
informat date ddmmyy10.
score_1 score_2 2.
home_team $40.
away_team $40.
mins_played 3.
rating 4.2
incidents $40.;
input date
score_1
score_2
home_team $
away_team $
mins_played
rating ??
incidents $ ##;
run;
I remove the dsd as that's incompatible with the ' delimiter; if DSD is actually needed then you can add it back, remove that delimiter, and read minutes in as char. I add ?? for rating as it sometimes is "None" so ?? ignores the warnings about that.