Combine two SAS datasets into one .dat file

I have two datasets in two different SAS tables with completely different data structures. I am being asked (not my idea) to export these datasets to one .dat file, essentially stacking them on top of each other in a fixed-width layout. The snippet below shows how the export should ultimately look in the .dat file. The first row comes from the first dataset; the second row comes from the second dataset.
UH INCR000000XXXXXXXXXXXXXXXX
XXX SFLXXXXXXXXXXXX 000 M SMITH XXXXXX XXXXXXXXXXXXX9991231
I can't figure out exactly how to do this. The code I've come up with exports the data, but the second data step just overwrites the output of the first.

Here's an example using the MOD option on the FILE statement, which appends to the file instead of overwriting it.
Note that this may not work on all operating systems.
filename test1 '/home/reeza/Demo1/testfile.dat';
/* First step: create the file and write the first layout */
data exportClass;
  set sashelp.class;
  file test1;
  if _n_=1 then do;
    put @1 "Name" @20 "Age" @30 "Sex";
  end;
  put @1 Name @20 Age @30 Sex;
run;
/* Second step: MOD appends the second layout to the same file */
data exportClass;
  set sashelp.class;
  file test1 mod;
  if _n_=1 then do;
    put @1 "Name" @20 "Weight" @30 "Height";
  end;
  put @1 Name @20 Weight @30 Height;
run;
filename test1;
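Where MOD is not available, one alternative is to write both layouts from a single DATA step, so the file is opened only once. The following is a minimal sketch under stated assumptions, not the asker's real layout: sashelp.class and sashelp.cars stand in for the two tables, the column positions are invented, and stacked.dat in the WORK directory is a hypothetical output file.

```sas
/* Hypothetical output file in the WORK directory (assumption) */
filename out "%sysfunc(pathname(work))/stacked.dat";

data _null_;
  file out;
  /* Write every row of the first table in its own fixed-width layout */
  do while (not end1);
    set sashelp.class(keep=name age sex) end=end1;
    put @1 name $10. @20 age 3. @30 sex $1.;
  end;
  /* Then write the rows of the second table below them in a different layout */
  do while (not end2);
    set sashelp.cars(obs=5 keep=make msrp) end=end2;
    put @1 make $13. @15 msrp z8.;
  end;
  /* Both tables are exhausted, so end the step explicitly */
  stop;
run;

filename out clear;
```

Because nothing is reopened, there is no second step that could overwrite the first, and the order of the two blocks in the file follows the order of the two loops.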

Related

Insert (internally existing) column headers as first row to a table

Assume that we have a table INPUT_TABLE with four columns, name, lat, lon, and z, filled with many rows. In the SAS Explorer it would look something like this:
name lat lon z
1 Germany 49.420469 8.7269178 17
2 England 51.5540693 -0.8249039 16
...
I hand over a PREPROCESSED_TABLE based on this INPUT_TABLE to a macro %tabl:
data V42.PREPROCESSED_TABLE;
set V21.INPUT_TABLE;
drop NAME;
run;
%tabl(libin=V42, file=PREPROCESSED_TABLE);
The macro itself I am not allowed to modify.
Among other things, %tabl also writes a plain text file PREPROCESSED_TABLE.txt:
49.420469|8.7269178|17
51.5540693|-0.8249039|16
I would like to have the header names written out as well, e.g.:
lat|lon|z
49.420469|8.7269178|17
51.5540693|-0.8249039|16
My idea is to expand the PREPROCESSED_TABLE somewhere in the data step - could somebody help me with that, please? How can I read out the header names which are internally stored?

If the goal is to make a file with one line containing the variable names, then just write the file yourself. First get the names into a dataset (in order) and then write them. For example, you could use PROC TRANSPOSE with the OBS=0 dataset option to generate a dataset with one observation per variable.
proc transpose data=V42.PREPROCESSED_TABLE(obs=0) out=NAMES ;
  var _all_ ;
run;
You can then use that dataset to write the header line to a file. The trailing @ holds the output line so that all names end up on one line.
data _null_;
  set names ;
  file 'preprocessed.txt' dsd dlm='|';
  put _name_ @ ;
run;
If you also want to add the data to that same file just use a second data step. Make sure to use the MOD option on the FILE statement so that data lines are appended to the existing file.
data _null_;
set V42.PREPROCESSED_TABLE;
file 'preprocessed.txt' dsd dlm='|' mod;
put (_all_) (+0);
run;
If you need to call the existing macro for other reasons, you could simply ignore the file it creates. Or, if for some reason its content is more than a simple dump of the dataset, you could concatenate the file containing the headers with the file the macro generates. Say the macro generated 'PREPROCESSED_TABLE.txt' and your code generated the one-line file 'headers.txt'. Then this step will read both and write 'PREPROCESSED_TABLE_w_headers.txt':
data _null_;
file 'PREPROCESSED_TABLE_w_headers.txt';
if _n_=1 then do;
infile 'headers.txt';
input;
put _infile_;
end;
infile 'PREPROCESSED_TABLE.txt';
input;
put _infile_;
run;

Given Reeza's and Tom's hints, I figured out a workaround myself: we simply call our macro %tabl twice, once with a one-row table holding the column names and once with the data. This approach essentially corresponds to writing first the headers and then the data to the file (except that I have to worry about additional things added by %tabl further down in the process chain).
The technical difficulty I had was how to extract this one-row table of column names from the metadata of the input table V21.INPUT_TABLE.
My teammate showed me how that is done. To make it testable for everybody, I will show this step for the test data table sashelp.class:
proc contents data=sashelp.class out=meta (keep=NAME VARNUM) noprint;
run;
proc sort data=meta out=meta2;
by VARNUM;
run;
proc transpose data=meta2 out=colheaders (drop=_NAME_ _LABEL_);
var name;
run;
As a result, we will have a table colheaders with exactly one line containing the table headers, sorted by VARNUM which is the order in which they appear in the original table:
COL1 COL2 COL3 COL4 COL5
1 NAME SEX AGE HEIGHT WEIGHT
Problem solved, at least theoretically.
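If only the header line itself is needed, the variable names can also be pulled from the DICTIONARY.COLUMNS metadata view without transposing. This is a rough sketch rather than the method used above: the macro variable name header and the output file headers.txt in the WORK directory are hypothetical, and sashelp.class again stands in for the real table.

```sas
/* Collect the variable names, in table order, into one pipe-delimited string */
proc sql noprint;
  select name into :header separated by '|'
    from dictionary.columns
    where libname = 'SASHELP' and memname = 'CLASS'  /* stored in upper case */
    order by varnum;
quit;

/* Write the single header line to a hypothetical file (assumption) */
data _null_;
  file "%sysfunc(pathname(work))/headers.txt";
  put "&header";
run;
```

The resulting one-line file could then be combined with the data file using either approach described earlier in this thread.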

Struggling trying to read this txt dataset into SAS

That's the dataset. I need a variable for ShipID, Received, Shipped, City, Zip Code. How would I go about doing that?
This is my first statistical programming language course and I am struggling. My professor hasn't been much of a help either.
ShipID Received Shipped Address .
X8742 2018/03/14 2018/03/17 Little River, KS, 67457
There's a ton more lines and I've been lost on it for an hour.
infile "/home/rossfosher0/SAS Homework/SAS Sessions/WarehouseA.txt" firstobs = 2;
input @2-7 ShipID $ @9-18 Received: YYYYMMDD8. @20-28 Shipped: YYYYMMDD8. @City $;
run;
I'm trying to set up a data set for this warehouse.

data mydata;
  /* @n moves the column pointer to column n before reading each field */
  input @1 shipid $ @7 received yymmdd10. @18 shipped yymmdd10. @28 address $30.;
  format received yymmdd10. shipped yymmdd10.;
  datalines;
X8742 2018/03/14 2018/03/17 blue ridge, MA 02391
;
run;

Assuming that all rows have values for the first three variables you could just read those using list mode input. Then read the rest of the line as the address.
data want;
infile "..." firstobs=2 truncover;
input shipid $ received shipped address $50. ;
informat received shipped yymmdd.;
format received shipped yymmdd10.;
run;
If the data is really in fixed columns then you can use column locations in your INPUT statement, but that is not compatible with using informats. So either use formatted input for the two date fields or read them as strings.
input shipid $1-7 @8 Received yymmdd10. @19 Shipped yymmdd10. Address $ 30-79 ;
format Received Shipped yymmdd10.;

Tom and DCR are both right. I prefer an easier route using PROC IMPORT.
proc import datafile='c:\personal\My_file.csv'
  out=SAS_data dbms=dlm replace; /* dbms=dlm so that the DELIMITER statement is honored */
  delimiter=";";
  getnames=yes;
  guessingrows=32767;
run;
PROC IMPORT guesses the structure from the contents of the file and generates a DATA step with the INFILE and INPUT statements, which it writes to the log. (I just copy it from the log and make adjustments if something is read incorrectly.)
If you know the structure of the data, follow the other answers, but this is a more beginner-friendly approach (IMHO). For more, see the documentation.

Using Line Pointer Controls with Packed Decimals SAS

I'm trying to use pointer controls in my SAS program, which uses many INPUT statements depending on the value of certain variables. Many of my fields contain packed decimals, and I think this is causing issues with the pointer controls: it seems the program does not account for the packed decimal correctly when it unpacks it and moves the column pointer to the wrong place.
I have no way of knowing how many of my 'segments' to read; I only know the maximum number possible. So I need to check values in the program before reading the data and use relative pointer controls based on that 'segment'. I'll be looping and using relative pointer controls (INPUT +N) to accomplish this.
My old program worked correctly when I knew the exact columns I needed to read, so I simply used an INPUT statement with absolute column pointers (@n).
Here's a sample of the old program. I'm only including the top portion because it illustrates what I had:
....some rsubmitting and options statements....
%DO i=0 %TO 2;
filename MyFN "MyFile(-&i.)" disp=shr;
DATA ReadInTemp;
  INFILE MyFN MISSOVER PAD;
  INPUT
    @1 Pro_Ind $1. @;
  IF Pro_Ind ="H" or Pro_Ind ="T" THEN DELETE;
  IF Pro_Ind ="1" THEN DO;
    INPUT
      @2 Time_Stamp ? PD8.
      @10 MyVar2 ? $1.
      @11 MyVar3 ? $20.
      @31 MyVar4 ? $2.
Here's the program I'm trying with Line Controls:
....some rsubmitting and options statements....
%DO i=0 %TO 2;
filename MyFN "MyFile(-&i.)" disp=shr;
DATA ReadInTemp;
  INFILE MyFN missover pad;
  INPUT
    @1 Pro_Ind $1. @;
  IF Pro_Ind ="H" or Pro_Ind ="T" THEN DELETE;
  IF Pro_Ind ="1" THEN DO;
    INPUT
      @2 Time_Stamp ? PD8. +7
      MyVar1 ? $1. +1
      MyVar2 ? $20. +19
      MyVar3 ? $2. +1
Please keep in mind this is only the sample of where A) the program used to work and B) where it is not working now. I understand that there is no END and %END statements, etc, but I believe my issue is after I am reading this TimeStamp variable which contains the Packed Decimal.

One issue in your program with pointer controls is that the informat already advances the column pointer by the informat width, and then you are explicitly advancing it further.
To be certain the informat reads the data as-is (retaining leading spaces, @Tom), use $CHARw. instead of $w. and remove the extraneous +n:
INPUT
  @2 Time_Stamp ? PD8.
  MyVar1 ? $CHAR1.
  MyVar2 ? $CHAR20.
  MyVar3 ? $CHAR2.
...
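The pointer behaviour described in this answer can be seen in a small standalone test. This is only an illustrative sketch with made-up data and hypothetical variable names (part1, part2, part3); it uses plain character fields rather than packed decimals, but the pointer arithmetic is the same.

```sas
data pointer_demo;
  /* part1 reads columns 1-2 and leaves the pointer at column 3.   */
  /* part2 reads columns 3-5 and leaves the pointer at column 6.   */
  /* +1 then skips column 6, so part3 is read from column 7.       */
  input @1 part1 $char2. part2 $char3. +1 part3 $char1.;
  datalines;
ABCDE-F
;
run;

proc print data=pointer_demo;
run;
```

Here part1 is AB, part2 is CDE, and part3 is F: the +1 is needed only to step over the extra column holding the hyphen. A +n equal to the width of the preceding field would skip real data.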

How do I print out newly added variables within a SAS dataset in SAS studio

So I imported a SAS dataset and specified the desired variables while correctly formatting them.
FILENAME currency '/folders/myfolders/SAS assignment/Assignment4/currency.txt';
data assn4.currency;
  infile currency;
  input
    @1 currencynotes $3.
    @6 purchasedate mmddyy10.
    @19 purchasevalue 7.0000
    @30 selldate mmddyy10.
    @44 sellvalue 7.0000
    @55 numberofnotespurchased;
I then added in a number of SAS variables based on the other variables
data assn4.currency;
set assn4.currency;
Timeheld = selldate-purchasedate;
run;
data assn4.currency;
set assn4.currency;
value_at_dollar_per_purchase = numberofnotespurchased/purchasevalue;
run;
data assn4.currency;
set assn4.currency;
value_at_dollar_per_sale = numberofnotespurchased/sellvalue;
run;
data assn4.currency;
set assn4.currency;
profit= value_at_dollar_per_sale-value_at_dollar_per_purchase;
run;
data assn4.currency;
set assn4.currency;
PPD = profit/Timeheld;
run;
I then wanted to format and print out the dataset along with these new variables. However, I do not know the spacing of these new variables, and the dataset created in my ASSN4 library has column numbers instead of the spacing information I used for the imported txt file.
data assn4.currency;
  infile currency;
  input
    @1 currencynotes $3.
    @6 purchasedate mmddyy10.
    @19 purchasevalue 7.0000
    @30 selldate mmddyy10.
    @44 sellvalue 7.0000
    @55 numberofnotespurchased
    @65 Timeheld mmddyy10.
    value_at_dollar_per_purchase 12.00000000
    value_at_dollar_per_sale 12.00000000
    profit 12.0000000000
    PPD 12.0000000000
  ;
When I attempt to print out my dataset using
Proc Print data = assn4.currency;
run;
all these new variables had . denoting missing info, while the new dataset created that is in the library shows these values.

I'll try to keep my answer simple and short despite the fact that it seems you lack some basic SAS knowledge.
In a data step, you use infile to read from an external file. To read from a SAS data set, you use a set statement.
In the first step, you created a dataset called currency in a library called assn4 by reading from your text file. In the next few steps, you correctly add variables to that dataset, although all this could be done in one step.
However in the last step, you overwrite your dataset by reading again from your text file (with the infile statement). You then of course lose all the variables you had created.
This does what (I think) you are trying to achieve:
FILENAME currency '/folders/myfolders/SAS assignment/Assignment4/currency.txt';
data assn4.currency;
  infile currency;
  /* read the raw fields from the text file */
  input
    @1 currencynotes $3.
    @6 purchasedate mmddyy10.
    @19 purchasevalue 7.
    @30 selldate mmddyy10.
    @44 sellvalue 7.
    @55 numberofnotespurchased
  ;
  /* derive the new variables in the same step */
  Timeheld = selldate-purchasedate;
  value_at_dollar_per_purchase = numberofnotespurchased/purchasevalue;
  value_at_dollar_per_sale = numberofnotespurchased/sellvalue;
  profit= value_at_dollar_per_sale-value_at_dollar_per_purchase;
  PPD = profit/Timeheld;
  format
    Timeheld mmddyy10.
    value_at_dollar_per_purchase
    value_at_dollar_per_sale
    profit
    PPD 12.
  ;
run;
Note that I changed your formats to what they are actually equivalent to. Adding a bunch of zeros after the dot in a format does absolutely nothing.

Need to repeate data reading from multiple files using sas and run freqs on separate dataset created from separate files

I am new to SAS and am facing a few difficulties while creating the following program.
My requirement is to pass in a dynamically generated filename and read it, so that I don't have to write the same code five times to read data from 5 different files and then run freqs on the datasets.
I have provided the code below; I would otherwise have to write it for more than 50 files:
Code
filename inp1 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1102_t1102_c10216_vEL5535.raw';
filename inp2 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1103_t1103_c10317_vEL8312.raw';
filename inp3 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1104_t1104_c10420_vEL11614.raw';
filename inp4 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1105_t1105_c10510_vEL13913.raw';
filename inp5 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1106_t1106_c10628_vEL17663.raw';
data test;
Do i = 1 to 5;
infile_name = 'inp' || i;
infile infile_name recfm = v lrecl=1800 end=eof truncover;
INPUT
@1 E_CUSTDEF1_CLIENT_ID $CHAR5.
@1235 E_MED_PLAN_CODE $CHAR20.
@1090 MED_INS_ELIG_COVERAGE_IND $CHAR20.
@1064 MED_COVERAGE_BEGIN_DATE $CHAR8.
@1072 MED_COVERAGE_TERM_DATE $CHAR8.
;
if E_CUSTDEF1_CLIENT_ID ='00002' then
output test;
end;
run;
proc freq data = test;
tables E_CUSTDEF1_CLIENT_ID*E_MED_PLAN_CODE / list missing;
run;
Please help!!

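Here's an example you can adapt. There are different ways to do this, and this is one of them, depending on how you want the frequencies.
Step 1: Create a dataset, 'my_filenames', that stores the filenames you want to read in, one per row, in a variable FILE_NAME.
Step 2: Read in the files.
data my_data;
  set my_filenames;
  /* copy the name: the FILEVAR= variable itself is not written to the output dataset */
  fname = file_name;
  infile a filevar=fname end=eof <the rest of your options>;
  /* read every record of the current file before moving on to the next filename */
  do while (not eof);
    <your input statement>;
    output;
  end;
run;
proc freq data=my_data;
  by file_name;
  <your table statements>;
run;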
This is simple, data driven code that doesn't require macros or storing large amounts of data in things that shouldn't have data in them (macro variables, filenames, etc.)

To directly answer your question, here is a SAS macro to read each file and run PROC FREQ:
%macro freqme(dsn);
data test;
infile "&dsn" recfm = v lrecl=1800 end=eof truncover;
  INPUT @1 E_CUSTDEF1_CLIENT_ID $CHAR5.
    @1235 E_MED_PLAN_CODE $CHAR20.
    @1090 MED_INS_ELIG_COVERAGE_IND $CHAR20.
    @1064 MED_COVERAGE_BEGIN_DATE $CHAR8.
    @1072 MED_COVERAGE_TERM_DATE $CHAR8.
  ;
if E_CUSTDEF1_CLIENT_ID = '00002';
run;
proc freq data=test;
tables E_CUSTDEF1_CLIENT_ID*E_MED_PLAN_CODE / list missing;
run;
proc delete data=test;
run;
%mend;
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1102_t1102_c10216_vEL5535.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1103_t1103_c10317_vEL8312.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1104_t1104_c10420_vEL11614.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1105_t1105_c10510_vEL13913.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1106_t1106_c10628_vEL17663.raw);
Note that I added a PROC DELETE step to delete the SAS data set after creating the report. I did that more for illustration, since you don't say you need the file as a SAS data set for subsequent processing.
You can use this as a template for other macro programming.
