How to manipulate sas7bdat files? - sas

I'm working with a sas7bdat file that was created erroneously, and I'm trying to fix it. When the file was created, all the data was read into a single column instead of several, and I can't figure out how to split it. I thought I'd be able to use infile with dlm="|" and dsd to remove the quotation marks from the name column, but the problem seems harder than it looks.
I basically want to turn that one column into the six it was supposed to be and delete the quotation marks from the names. Here's what it looks like in SAS:
And here's the datalines in case they're needed:
1 0017|2020-04-09|"Jason Nguyen"|122L|500.0|$404.82
2 0017|2020-04-09|"Jason Nguyen"|407XX|100.0|$201.95
3 0177|2020-04-05|"Glenda Johnson"|144L|100.0|$91.01
4 0177|2020-04-05|"Glenda Johnson"|188X|100.0|$70.76
5 0177|2020-04-05|"Glenda Johnson"|733|2.0|$101,230.00
6 0177|2020-04-05|"Glenda Johnson"|777|5.0|$106.29
7 1843|2020-04-03|"George Smith"|122|100.0|$60.64
8 1843|2020-04-03|"George Smith"|122L|10.0|$303.18
9 1843|2020-04-03|"George Smith"|144L|50.0|$91.01
10 1843|2020-04-03|"George Smith"|188S|3.0|$52,629.48
11 1843|2020-04-03|"George Smith"|855W|1.0|$92,210.41
12 1843|2020-04-03|"George Smith"|908X|1.0|$51,920.87
13 9888|2020-04-11|"Sharon Lu"|100W|1,000.0|$20.14
14 9888|2020-04-11|"Sharon Lu"|122|50.0|$60.64
(each line is one column inside SAS)

My suggestion would be to go back and fix the import code; otherwise, use the SCAN() function.
data want;
  set have;
  var1 = scan(variableName, 1, '|');
  /* YYMMDD10. so that the full 10-character date is read */
  var2 = input(scan(variableName, 2, '|'), yymmdd10.);
  format var2 date9.;
  var3 = dequote(scan(variableName, 3, '|'));
run;
Another option is to write the file as is back to a text file and then import it using the DLM='|' option.
proc export data=have outfile='myfile.txt' dbms=dlm replace;
run;
proc import out=want datafile='myfile.txt' dbms=dlm replace;
  delimiter='|';
run;
Given that it's only 6 variables, though, you may as well just write the data step.
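As a rough sketch of what the full data step could look like, here is the SCAN() approach extended to all six fields. The variable names (id, date, name, item, qty, price) and the lengths are assumptions made for illustration, and variableName stands for whatever the single column is called in your dataset. The two sample rows are copied from the question.

```sas
/* Two sample rows stored in a single character column, as in the question */
data have;
  infile datalines truncover;
  input variableName $80.;
datalines;
0017|2020-04-09|"Jason Nguyen"|122L|500.0|$404.82
0177|2020-04-05|"Glenda Johnson"|733|2.0|$101,230.00
;

data want;
  set have;
  length id $4 name $40 item $8;   /* assumed lengths */
  id    = scan(variableName, 1, '|');
  date  = input(scan(variableName, 2, '|'), yymmdd10.);
  name  = dequote(scan(variableName, 3, '|'));
  item  = scan(variableName, 4, '|');
  qty   = input(scan(variableName, 5, '|'), comma12.);   /* drops thousands separators */
  price = input(scan(variableName, 6, '|'), dollar12.);  /* drops $ and commas */
  format date date9. price dollar12.2;
  drop variableName;
run;
```

If a name could itself contain a pipe character inside the quotes, adding the 'q' modifier as a fourth argument to SCAN() would keep quoted text together.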


How to use filters when importing on sas

I have a very large data table in "dsv" format that I'm trying to import into SAS. However, I don't have enough space to import the full table and then filter it (which is what I've done for smaller tables).
Is there any way to filter the data while importing it, since in the end I will only use part of the table? For example, I want to import only the rows where Var2 has the value 103.
PS: I'm using "proc import" rather than "data - infile..." because I don't know the exact number of columns.
Thank you
You can add dataset options to the dataset listed in the OUT= option of PROC IMPORT.
filename dsv temp;
/* Write a small pipe-delimited test file, header row included */
data _null_;
  input (var1-var3) (:$20.);
  file dsv dsd dlm='|';
  put var1-var3;
datalines;
Var1 Var2 Var3
A10 103 Test
A02 102 Hiis
;
proc import file=dsv dbms=dlm out=want(where=(var2=102)) replace ;
  delimiter='|';
run;
The result is a dataset with just one observation.
NOTE: The data set WORK.WANT has 1 observations and 3 variables.
If you don't know the name of the second variable you could always just read the header row first and put the name into a macro variable.
data _null_;
  infile dsv dsd dlm='|' truncover obs=1;
  /* Read NAME twice so it ends up holding the second header value */
  input (2*name) (:$32.);
  call symputx('var2',nliteral(name));
run;
proc import file=dsv dbms=dlm out=want(where=(&var2=102)) replace ;
  delimiter='|';
run;
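You can add a WHERE= dataset option to the dataset named in the OUT= option. For example:
proc import
  file = 'myfile.txt'
  out = want(where=(var2=103))
  dbms = dlm   /* adjust DBMS= and the delimiter to match your file */
  replace;
  delimiter = '|';
run;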

Manually Reading in Data in SAS from CSV

So I have a large dataset that is rather oddly formatted and I want to read it in based on the header. It only has unique columns for each unique participant and each participant participated in multiple rounds of the study. The data is from some experiments and is formatted as having variables for each participant (e.g. "participant.code") then some session variables which I can drop and then the actual variables from the experiment. These are formatted as "study.[round number].player.[variable]"
Rather than repeating the variable for every round, I want to pull out the round number as a separate variable and have one observation per round for each participant.
I want to read these in differently depending on the variable and pick it out. I would rather not have to manually mess with the source file since the experiment is going to be run multiple times.
If someone could just point me towards some relevant material or whatnot that would be great.
Thank you!
Edit: example of some of the raw data:
1,kppf7hjb,,0,221,221,study,FinalPay,2022-04-16 22:08:18.471115,1,,,0.0,lew8kph3,,,,,0,1.0,0.0,externality_control,0,2,Seller,0.0,1,0,0,10,0,125,125,50,100,50,0,0,0,1,1,,,1,3,,0,1,1,100,0,0,,50.0,,,,,,1,1,6,1,5,6,4,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,100,0,0,,45.0,,,,,,1,2,6,1,5,6,13,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,0,0,,0,,,100,0,0,,,,,,,,1,3,5,1,5,6,6,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,6,,0,,,138,1,0,,38.0,,,,,,1,4,6,1,5,6,3,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,2,,0,,,135,1,0,,35.0,,,,,,1,5,6,1,5,6,11,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,0,0,,0,,,100,0,0,,,,,,,,1,6,5,1,5,6,6,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,6,,0,,,132,1,0,,32.0,,,,,,1,7,6,1,5,6,4,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,5,,0,,,150,1,0,,50.0,,,,,,1,8,6,1,5,6,9,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,2,,0,,,100,0,0,,49.0,,,,,,1,9,6,1,5,6,10,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,5,,0,,,100,0,0,,39.0,,,,,,1,10,6,1,5,6,3,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,132,1,0,,32.0,,,,,,1,11,6,1,5,6,10,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,130,1,0,,30.0,,,,,,1,12,6,1,5,6,8,2,Seller,0.0,1,192,132,10,0,,,,,,,,,,,,,1,2,,0,,,128,1,0,,28.0,,,,,,1,13,6,1,5,6,11
Your file is not really as complicated as it first seems. For example the bulk of the data is just 43 columns that repeat 13 times. The STUDY.1 columns, then STUDY.2 columns etc.
For this one just write a program to read it. There are 22 columns that are not "study" columns. Then 13 copies of the 43 study columns.
data want;
  infile csv dsd truncover firstobs=2;
  /* trailing @ holds the line so the next INPUT continues on it */
  input var1 ..... var22 @;
  do study=1 to 13;
    input svar1 .... svar43 @ ;
    output;
  end;
run;
So you turn each line into 13 observations (study=1 to study=13).
To complete the sketch of a data step above, you just need to figure out what names you want to use for the 65 (22 + 43) variables other than STUDY. For each variable you also need to decide whether it is numeric or character and, for character variables, what length is needed to store the longest possible value.
If you need to work with a lot of different variations of files in this style then it might be worth working on a program to analyze the headers and determine the role of the columns based on the pattern of the header name and perhaps generate the code to read the file.
You might start by building a dataset with just the header names.
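data headers;
  infile csv dsd obs=1 ;
  length col 8 words 8 ;
  array header [4] $50 ;
  input header1 :$50. @@ ;
  col + 1;                        /* position of this header in the file */
  words = countw(header1,'.');    /* number of dot-separated parts */
  do _n_=words to 1 by -1;
    header[_n_] = scan(header1,_n_,'.');
  end;
run;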
You can use that list of the headers to help you figure out what would be useful names for the variables.
If you want to let SAS guess how to define and name the variables, you could try splitting the CSV file into two separate CSV files: one with the first 22 columns and one with the other 43. So first split the headers (perhaps removing the STUDY.N. prefix while you are at it). Then split the data. Add a ROW number to make it easy to join them later.
filename single temp;
filename multiple temp;
/* Step 1: split the header row between the two files */
data _null_;
  infile csv dsd obs=1 ;
  input header :$50. @@ ;
  file single dsd ;
  if _n_=1 then put 'ROW,' @;
  if _n_<= 22 then put header @;
  else do;
    file multiple dsd;
    if _n_=23 then put 'ROW,STUDY,' @ ;
    /* keep only the part of the name after STUDY.N. */
    call scan(header,3,pos,len,'.');
    header = substr(header,pos);
    put header @;
  end;
  if _n_=22+43 then stop;
run;
/* Step 2: split the data rows, writing 13 rows per input line to MULTIPLE */
data _null_;
  infile csv dsd firstobs=2 truncover ;
  length s1-s43 $200 ;
  row + 1;
  input s1-s22 @;
  file single dsd mod;
  put row s1-s22 ;
  file multiple dsd mod;
  do study=1 to 13 ;
    input s1-s43 @ ;
    put row study s1-s43 ;
  end;
run;
Now you can use PROC IMPORT to GUESS how to read SINGLE and MULTIPLE and then you can join them back together.
proc import file=single dbms=csv out=single replace;
run;
proc import file=multiple dbms=csv out=multiple replace;
run;
data want;
  merge single multiple;
  by row;
run;

Changing the first row name conditionally on character interval in SAS

Consider the following data:
data GDP;
  input Year $ Agriculture Industry;
datalines;
2016 195 1634
2017 220 1986
;
run;
When exporting as a .dat file:
proc export
  data = GDP
  outfile = '....\GDP.dat'
  dbms = TAB;
run;
Then I get the following file:
However, I want the following file:
Mydata is text I add manually.
The numbers after each name (for instance, Year: 1-4) are the character positions that the values occupy. For instance, the values in the Year column run from character 1 to 4, the values in the Agriculture column run from 9 to 11, and so on.
So SAS should work out the interval for each column's values and add it to the name in the first row. How can I do this in SAS?
You can fudge this by adding labels to your variables and then adding the LABEL option to PROC EXPORT.
data GDP;
  input Year $ Agriculture Industry;
  label Year = "Mydata, Year:1-4" Agriculture = "Agriculture:9-11";
datalines;
2016 195 1634
2017 220 1986
;
run;
proc export
  data = GDP
  outfile = '....\GDP.dat'
  dbms = TAB
  label;
run;
FYI - it looks like you're trying to create a fixed-width file and put the specifications in the header. I'd advise against this; instead, either put the specifications in a separate file or include them at the top of the file.
Putting it in the header makes it harder for any other system to process correctly.
If you really need this for some reason, you may also want to consider using a data step to create your export instead of using PROC EXPORT.
AFAIK there is no easy way to define the specifications automatically though you could push the PROC CONTENTS output to a separate data set.
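For the data step route, a minimal sketch might look like the following. It writes the header text by hand and then places each value at a fixed column. The output file name GDP_fixed.dat and the Industry positions (17-20) are assumptions made for illustration; the Year and Agriculture positions are the ones given in the question.

```sas
data GDP;
  input Year $ Agriculture Industry;
datalines;
2016 195 1634
2017 220 1986
;

data _null_;
  set GDP;
  file 'GDP_fixed.dat';   /* hypothetical output path */
  /* Header line written once, with the positions typed in by hand */
  if _n_ = 1 then put 'Mydata, Year:1-4 Agriculture:9-11 Industry:17-20';
  /* Column pointers (@n) put each value at a fixed position */
  put @1 Year $4. @9 Agriculture 3. @17 Industry 4.;
run;
```

The positions in the header and in the PUT statement have to be kept in sync manually, which is part of why a separate specification file tends to be easier to maintain.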

SAS: Change dataset with loop count into append statement

I have a macro that loops and creates a new dataset on each pass, with the loop counter as a suffix.
Code like this:
DBMS=csv REPLACE; delimiter='09'x; getnames=no; RUN;
data test&i (drop= %do k=1 %to &cnt; &&col&k.. %end; );
  length station $10 voltage $10 year 8 month $20 transformer $10 Day $20
         Date Time MW_Imp MW_Exp MVAR_Imp MVAR_Exp MVA Power_Factor 8;
  format Time hhmm.;
  set out&i. end=last;
Currently, if PROC IMPORT reads 4 external files, the script generates 4 datasets.
What I want is to avoid creating multiple datasets and just append them to the master file. Is there a way to do so?
A PROC APPEND step inside the loop should be sufficient to achieve this. If the base dataset does not exist yet, SAS creates it as a copy of the first dataset.
proc append base=test data=test&i force; run;
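To show where the append would sit inside the loop, here is a small self-contained sketch. The macro name append_all, the base dataset name master, and the stand-in data step are all made up for illustration; in the real macro the stand-in would be the existing import and clean-up code that builds test&i.

```sas
%macro append_all(n);
  %local i;
  %do i = 1 %to &n;
    /* Stand-in for the real import and clean-up steps that build test&i */
    data test&i;
      station = "S&i";
      mw_imp  = &i * 10;
    run;

    /* Creates WORK.MASTER on the first pass, appends to it afterwards */
    proc append base=master data=test&i force;
    run;

    /* Drop the per-file dataset once it has been appended */
    proc delete data=test&i;
    run;
  %end;
%mend append_all;

%append_all(4)
```

Because PROC APPEND keeps adding to an existing base, rerunning the macro in the same session would append the same rows again unless master is deleted first.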
Appending is probably just as easy, but if you don't want to create many datasets to begin with, you could use a data step to read in several files at once, using wildcards. That would eliminate the need to loop through the files, but it does require that the files have the same structure and aren't stored in a folder with other similarly named files. The firstobs option caused some issues in my tests, but as you have specified getnames=no in your import, I guess you have no need for it.
The snippet below reads all csv files in c:\test.
data test;
  infile "c:\test\*.csv" dsd delimiter='09'x;
  input varA $ varB $;
run;

import text file tab delimiter with variable name more than 32 chars

I am trying to import a text file with tab delimiter which has two variables.
ID var234488hhfyggyhuur_jhjhuytsdrkkjuht_kjy
1 5,6
2 10
3 122,5
4 0,6
I am able to import the file, but the second variable is not read in the right format, and its name is more than 32 characters long.
data exam1;
infile "C:\Users\gght\Desktop\today.txt" firstobs=2 dlm='09'x ;
input id 3. var234488hhfyggyhuur_jhjhuytsdrkkjuht_kjy numx12.2;
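Use a label to capture the variable name and use a generic variable name to import the data.
data exam1;
  infile "C:\Users\gght\Desktop\today.txt" firstobs=2 dlm='09'x ;
  label var2 = 'var234488hhfyggyhuur_jhjhuytsdrkkjuht_kjy';
  /* List input with the colon modifier respects the tab delimiter;    */
  /* NUMX12. reads the decimal comma without rescaling whole numbers. */
  input id var2 :numx12.;
run;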
I am afraid there is no other way. You will have to rename the vars explicitly after the import of the file in SAS. It is well worth doing this once and re-using code if this file is something you're going to get with some frequency.
You can easily create your input statement in excel and copy-paste in your SAS program.