I am new to SAS.
I want to read a text file into SAS, but the file has missing data in some lines.
For example, the data is:
James Monroe Monroe Hall Virginia 58 4/28/1758 1816
John Quincy Adams Braintree Massachusetts 57 7/11/1767 1824 113,142 30.92%
Andrew Jackson Waxhaws Region South/North Carolina 61 3/15/1767 1828 642,806 55.93%
Martin Van Buren Kinderhook New York 54 12/5/1782 1836 763,291 50.79%
William Henry Harrison Charles City County Virginia 68 2/9/1773 1840 1,275,583 52.87%
The columns I want are 'FullName', 'City', 'State', 'Age', 'DOB', 'Year', 'Number' and 'Percentage'.
My code is:
infile 'C:\sasfiles\testfile.txt' missover dlm='09'x dsd
input FullName City State Age DOB Year Number PercVote;
Run;
But I get the error
9 CHAR Rutherford B. Hayes.Delaware ..Ohio..54.10/4/1822.1876.4,034,142.47.92% 73
ZONE 5776676676242246767046667676222004666003303323233330333303233323330332332
NUMR 2548526F2402E081953945C1712500099F89F9954910F4F18229187694C034C142947E925
FullName=. City=. State=. Age=. DOB=. Year=54 Number=. PercVote=1876 _ERROR_=1 _N_=19
NOTE: Invalid data for FullName in line 20 1-17.
NOTE: Invalid data for City in line 20 19-32.
NOTE: Invalid data for State in line 20 37-40.
NOTE: Invalid data for Number in line 20 46-55.
NOTE: Invalid data for PercVote in line 20 62-70.
You need to provide data in a format that can be properly parsed.
With delimited data, use adjacent delimiters to mark a missing value (the DSD option makes SAS treat two consecutive delimiters as a missing value). It is best to define the variables before you use them; if you define them in the order they appear in the file, your INPUT statement can be very simple.
data want ;
infile cards dsd dlm='|' firstobs=2 truncover ;
length name $17 city $11 state $15 age 8 dob 8 yod 8 ;
input name -- yod ;
informat dob mmddyy10.;
format dob yymmdd10.;
cards;
----+----0----+----0----+----0----+----0----+----0----+----0----+----0
James Monroe|Monroe Hall|Virginia||4/28/1758|1816
John Quincy Adams|Braintree|Massachusetts|57|7/11/1767|1824
;
Or force the data into columns and use column input.
data want ;
infile cards firstobs=2 truncover ;
input name $ 1-17 city $ 19-29 state $ 31-45 age 47-48 @50 dob mmddyy10. yod 61-64;
format dob yymmdd10.;
cards;
----+----0----+----0----+----0----+----0----+----0----+----0----+----0
James Monroe      Monroe Hall Virginia           4/28/1758  1816
John Quincy Adams Braintree   Massachusetts   57 7/11/1767  1824
;
Try this (the fields in the data lines must be separated by tab characters to match dlm='09'x):
data want;
infile cards dlm='09'x missover;
input (FullName City State) (:$32.) Age :8. DOB :mmddyy9. Year :4. Number :comma8. PercVote :percent8.;
format DOB mmddyy10. Number comma16. PercVote percent8.2;
cards;
James Monroe Monroe Hall Virginia 58 4/28/1758 1816
John Quincy Adams Braintree Massachusetts 57 7/11/1767 1824 113,142 30.92%
Andrew Jackson Waxhaws Region South/North Carolina 61 3/15/1767 1828 642,806 55.93%
Martin Van Buren Kinderhook New York 54 12/5/1782 1836 763,291 50.79%
William Henry Harrison Charles City County Virginia 68 2/9/1773 1840 1,275,583 52.87%
;
run;
Piggy backing on a similar question I asked
(Summing a Column By Group In a Dataset With Macros)...
I have the following dataset:
Month Cost_Center Account Actual Annual_Budget
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
I would like to "splice" it so each month has its own respective column for Actual while summing the numeric values by Account.
So for example, I want the output to look like the following:
Account May_Actual_Sum June_Actual_Sum Annual_Budget
Postage 14562 37960 255251
Phone 4564 2660 32241
The code below, provided by a fellow user, works great when I don't need to disaggregate further by month; however, I'm not sure whether that's possible (I tried adding a 'by month' clause, but it didn't work).
proc means data=Test N SUM NWAY STACKODS;
class Account_Description;
var Actual annual_budget;
by month;
ods output summary = summary_stats1;
output out = summary_stats2 N = SUM= / AUTONAME;
data want;
set summary_stats2;
run;
Use PROC MEANS to get the summaries, same as last time. Please read the documentation on PROC MEANS to understand how the CLASS statement works and how you can control the different levels of output.
Use PROC TRANSPOSE to flip the data wide. Since the budget amount is consistent across rows you'll be fine.
I'm guessing your next set of questions will be how to sort the columns correctly (month names don't sort chronologically) and how to reference them dynamically to calculate month-to-date changes. Those are some of the reasons why this data structure is not recommended.
data have;
input Month $ Cost_Center $ Account $ Actual Annual_Budget;
cards;
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
;
run;
*summarize;
proc means data=have noprint nway;
class account month;
var actual annual_budget;
output out=temp sum=actual_total budget_total;
run;
*transpose;
proc transpose data=temp out=want prefix=Month_;
by account budget_total;
var actual_total;
id month;
run;
I cannot think of a way to generate this report using just one PROC. You will need to do some post processing of PROC MEANS or PROC SUMMARY results to get to this:
proc means data=have SUM ;
class Account month;
var Actual annual_budget;
output out = summary_stats SUM=;
run;
/* Look at summary_stats to understand its structure here */
/* Otherwise you will not understand the following code */
proc sort data = summary_stats;
where _type_ in (2,3);
by account;
run;
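data want;
set summary_stats;
by account ;
retain May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
if first.account then do;
/* first row per account is the account-level summary (_type_=2) */
Annual_Budget_sum = Annual_Budget;
/* reset so values from the previous account are not carried over */
call missing(May_Actual_Sum, June_Actual_Sum);
end;
else do;
select(month);
when ('May') May_Actual_Sum = actual;
when ('June') June_Actual_Sum = actual;
/* List other months also here. Can use some macros here to make the code compact and expandable for future enhancements */
otherwise; /* ignore unlisted months instead of stopping with a run-time error */
end;
end;
if last.account then output;
keep account May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
run;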
I have data which doesn't appear to have consistent spacings or positioning. It looks like:
1675 C Street , Suite 201
Anchorage AK 99501
61.205475 -149.886882
600 Azalea Road
Mobile AL 36609
30.656824 -88.148781
1601 Harbor Bay Parkway , Suite 150
Alameda CA 94502
37.726114 -122.240546
1900 Point West Way, Suite 270
Sacramento CA 95815
38.5994175 -121.4315844
3600 Wilshire Blvd., Suite 1500
Los Angeles CA 90010
34.06153 -118.303463
From this I'd like to extract the street address, city name, state, zip code, lat, and long. I thought the following code would work, but it produces very weird results.
data voa;
input Address $50.;
input City $ State $ Zip;
input Latitude Longitude;
datalines;
I think the issue comes from the fact that there isn't consistent spacing or positioning of the elements.
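Your data will work fine with LIST input. You just need to add the & modifier to CITY, which lets a value contain single embedded blanks and ends it at two or more consecutive blanks, so the city must be followed by at least two blanks in the raw data. CITY also needs to be a bit longer, $16 or so.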
input City &$16. State $ Zip;
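For reference, a minimal sketch of the whole step with that change. It assumes the raw lines really do have at least two blanks between the city and the state (the & modifier needs that to know where the city ends); the two records are taken from the question with that spacing assumed, and TRUNCOVER is added so a short address line is not continued onto the next line.

```sas
data voa;
  infile datalines truncover;
  input Address $50.;              /* line 1: the whole street address       */
  input City & $16. State $ Zip;   /* line 2: & allows single blanks in City */
  input Latitude Longitude;        /* line 3: two numeric values             */
datalines;
1675 C Street , Suite 201
Anchorage  AK 99501
61.205475 -149.886882
3600 Wilshire Blvd., Suite 1500
Los Angeles  CA 90010
34.06153 -118.303463
;
run;
```

If the file only has single blanks between city and state, this will not split them correctly and a SCAN-based approach is the safer choice.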
In the absence of consistent delimiters or fixed width fields, this is easier to do using scan:
data want;
infile cards truncover;
length STATE $2 CITY $32;
input Address $50.;
input;
ZIP = input(scan(_INFILE_, -1),5.);
STATE = scan(_INFILE_, -2);
CITY = trim(substr(_INFILE_,1,index(_INFILE_,STATE) - 1));
input Latitude Longitude;
cards;
1675 C Street , Suite 201
Anchorage AK 99501
61.205475 -149.886882
600 Azalea Road
Mobile AL 36609
30.656824 -88.148781
1601 Harbor Bay Parkway , Suite 150
Alameda CA 94502
37.726114 -122.240546
1900 Point West Way, Suite 270
Sacramento CA 95815
38.5994175 -121.4315844
3600 Wilshire Blvd., Suite 1500
Los Angeles CA 90010
34.06153 -118.303463
;
run;
What SAS code can I use to stack data?
For the purpose of example, let's say I have this dataset:
DATA test.one;
INPUT Name $ Y1996 Y1997 Y1998 Y1999;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
run;
proc print data = test.one;
run;
Running this set would give me an output like this:
Name Y1996 Y1997 Y1998 Y1999
Dan 5 10 40 20
Derek 10 12 10 10
However, I would want my data to look like this:
Name Year Income
Dan 1996 5
Dan 1997 10
Dan 1998 40
Dan 1999 20
Derek 1996 10
Derek 1997 12
Derek 1998 10
Derek 1999 10
That is, it would create a new variable, Income, holding the stacked values as shown above.
Are you asking how to read the raw data directly into that form?
DATA want;
INPUT Name $ @;
do year=1996 to 1999;
input income @;
output;
end;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
;
PROC TRANSPOSE can solve this:
DATA test.one;
INPUT Name $ y1996 y1997 y1998 y1999;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
run;
proc print data = test.one;
run;
proc transpose data=test.one out=long1;
by name;
run;
data test2;
set long1 (rename=(col1=Income));
Year = input(substr(_name_, 2), 4.); /* y1996 -> 1996 */
drop _name_;
RUN;
This transforms the dataset into the stacked (long) version.
I need help converting the data values in a column to reverse order, either in a new column or in the same column. That is, the first value in the column should become the last and vice versa.
Example:
name age
karl 40
lowry 56
jim 29
robert 34
samuel 60
harry 47
The output I need should look like this:
name age
harry 47
samuel 60
robert 34
jim 29
lowry 56
karl 40
I need the values of both age and name reversed, or of only one variable.
First create a variable of the observation number:
data temp;
set have;
ObsNum = _n_;
run;
Then use that variable to sort the dataset:
proc sort data=temp out=want (drop=ObsNum);
by descending ObsNum;
run;
I'm trying to read in some raw data using datalines...
data Exp_data;
INPUT a: 2. b: 2. DATE1: MMDDYY10. DATE2: MMDDYY10.;
FORMAT DATE1 DATE9. DATE2 DATE9.;
datalines;
27 93 03/16/2008 03/17/2008
27 93 03/17/2009 03/19/2009
68 68
55 55
46 68
34 34
45 67
56 75
34 34
34 34
;RUN;
But this code only reads 6 rows. I can't figure out where I'm making a mistake.
Thanks in advance!
Add this line before your input statement.
infile datalines missover;
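From the third row on, each line has only two values, so SAS needs to know what to do about the missing ones. By default (FLOWOVER) it moves on to the next line to look for DATE1 and DATE2, so each observation consumes two lines, which is why you end up with only 6 observations. MISSOVER tells SAS to set the remaining variables to missing instead.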
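Put together, the step would look roughly like this (a sketch using the first few rows from the question; everything except the added INFILE line is the original code):

```sas
data Exp_data;
  infile datalines missover;   /* do not continue onto the next line for missing fields */
  input a :2. b :2. DATE1 :mmddyy10. DATE2 :mmddyy10.;
  format DATE1 DATE2 date9.;
datalines;
27 93 03/16/2008 03/17/2008
27 93 03/17/2009 03/19/2009
68 68
55 55
;
run;
```

With this, every data line becomes one observation, and DATE1 and DATE2 are missing on the short rows. TRUNCOVER behaves the same way for list input like this and is often recommended as the general default.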