Validating SAS dataset - sas

I want to validate a dataset that has 100 variables and over 100,000 records. I am importing all the data as strings (even the numeric data). I noticed that some variables are truncated at the end. How do I validate each variable to make sure that the data was populated completely (not truncated)?
Sample code:
data dsn;
infile "xyz.txt" dlm= '|' RECFM=V LRECL=2000 PAD MISSOVER;
length
a1 $20.
a2 $100.
a3 $50.
;
input a1 $
a2 $
a3 $
;
run;
For example, the string value is 1532564.7564, but after my import I get 1532564.756. So my question is: is this value getting truncated? When I read it as numeric data instead, I get the full value. Likewise, Licnum is character data (e.g. 12xd456), and it is getting truncated at the end (it shows up as 12xd4).

Try these adjustments
Increase the LRECL
Anything beyond column 2000 of an input record is being clipped.
Increase the $ length of the variable getting clipped.
The value 1532564.7564 is 12 characters and won't fit in a variable that is $11.
Make sure that no variable with a $<n>. format has a length greater than <n>; otherwise the displayed value is cut off.
If you have attrib Licnum length=$7 format=$5. then only the first five characters will be seen when the data is viewed.
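To check every variable at once, one rough test is to compare each character variable's defined length with the longest value it actually holds: if the longest value fills the whole length, that variable may have been truncated on input. A minimal sketch, assuming the data set is WORK.DSN as in the question and has no more than 500 character variables. The first step only builds a tiny stand-in for DSN (values borrowed from the question) so the sketch runs on its own; skip it when checking the real data.

```sas
/* Tiny stand-in for the imported data set (skip for real data) */
data dsn;
  length a1 $5 a2 $10;
  a1 = '12xd4'; a2 = 'abc';    output;
  a1 = 'xy';    a2 = 'abcdef'; output;
run;

/* For each character variable: defined length vs. longest value */
data trunc_check(keep=varname deflen maxlen);
  set dsn end=last;
  array cv {*} _character_;      /* all character variables read from DSN */
  array mx {500} _temporary_;    /* running maximum lengths (assumes <= 500 vars) */
  length varname $32;            /* declared after the array so it is not part of it */
  do i = 1 to dim(cv);
    mx{i} = max(mx{i}, lengthn(cv{i}));
  end;
  if last then do i = 1 to dim(cv);
    varname = vname(cv{i});
    deflen  = vlength(cv{i});
    maxlen  = mx{i};
    output;
  end;
run;

/* Variables whose longest value uses the full length: inspect these */
proc print data=trunc_check;
  where maxlen >= deflen;
run;
```

With the stand-in data, a1 is flagged (12xd4 fills its $5 length) and a2 is not. A flagged variable is only a candidate, since a value can legitimately fill its length exactly.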

Related

SAS character and numeric change with set statement

I am working to merge two data sets and get the following error:
Variable DOB has been defined as both character and numeric.
Here is my code. I know I need a set statement to change the character to numeric. I was thinking:
DATA Merged1;
SET Aug21 Aug22;
RUN;
set (rename=(DOB=DOBnum));
length DOB $ 10.;
DOB= put(DOBnum,f10. -L);
drop DOBnum;
Would this be placed before my SET statement to combine Aug21 and Aug22?
Thank you!
I tried to run the code, but it would not merge; I am unsure where the SET statement for DOB should go.
You do not need the second SET statement. Instead, add the RENAME= data set option to the data set in the first SET statement whose DOB is numeric (Aug22 in the example below).
So something like:
DATA BOTH;
SET Aug21 Aug22(in=in2 rename=(DOB=DOBnum));
if in2 then DOB= put(DOBnum,f10. -L);
drop DOBnum;
RUN;
To get a more detailed answer, provide more details about the variables and the kinds of values they contain. For example, if DOB means date of birth, then it does not make much sense to use the F format. If DOB should be an actual date, then it should be numeric, not character. And if the numeric version holds actual SAS date values, converting them to text with the F format will produce strings (day counts since 01JAN1960) that are confusing for humans.
If you're a beginner, I recommend doing this in two steps so you can trace the work:
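To see why the F format is confusing for dates: a SAS date is stored as the number of days since 01JAN1960, so formatting it with F10. shows that day count rather than a date. A small sketch (the date value is only an example):

```sas
data _null_;
  dob = '01JAN2021'd;                  /* numeric SAS date value */
  as_f    = put(dob, f10. -l);         /* day count since 01JAN1960: 22281 */
  as_date = put(dob, yymmdd10.);       /* readable date: 2021-01-01 */
  put as_f= as_date=;
run;
```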
Convert dob from character to numeric
Append the two datasets together (assume you're stacking the data sets)
Use format to control how the date is displayed
*convert character to numeric SAS date;
data aug21_convert2num;
set aug21(rename=(dob=dobchar));
*read the renamed character value; width 32 so longer date strings are not cut off;
dob = input(dobchar, anydtdte32.);
drop dobchar;
run;
*append the two data sets;
data want;
set aug21_convert2num aug22;
format dob yymmdd10.;
run;

SAS: Transform variable into time series in text file import - length greater than 32,767

I get a calendar file from a vendor containing all holidays for a specific calendar.
The file contains 7 columns separated by a pipe (|). However, column 7, which contains the actual holidays, is a single string of values separated by semicolons (;).
My problem is that column 7 is longer than 32,767 characters (the maximum length of a SAS character variable), so the solution I have built so far using some array and transpose tricks no longer works.
Basically the text file looks like:
INTERNAL_NAME|ERROR_CODE|NUMBER_OF_FIELDS|CALENDAR_CODE|CALENDAR_TYPE|CALENDAR_NAME|DATES
US|0|4|US|Country|United States|;2;15728;1;5;19440101;5;19440102;5;19440103;5;19440108;5;19440109......etc.
However, column 7 is self-describing: the delimiter and the size of the array/matrix are given at the start of the string.
*1st character = delimiter -> ;
*Number of dimensions in matrix -> 2
*Number of rows in matrix -> 15,728
*Number of columns -> 1
*Then each data element: type code 5 (= date) followed by the data, e.g. 19440101 = 01JAN1944
My desired result would be a dataset looking like
INTERNAL_NAME DATES
US 01JAN1944
US 02JAN1944
US 03JAN1944
US 08JAN1944
and so on, until all 15,728 observations are read.
You can do this fairly easily.
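The manual solution, i.e., assuming the fields are exactly as you describe, is to use the secondary delimiter (;) as the DLM= value. The initial pipe-delimited string is known to be short, so you can parse it separately. Then iterate over the remaining values, using a trailing @ to hold the line.
data want;
  /* For the real file, point INFILE at it and add a large LRECL=,  */
  /* since the record is longer than 32,767 characters.            */
  infile datalines4 dlm=';' truncover;
  length initial_string $500 internal_name $40;
  input initial_string $ @;                    /* text before the first ';' */
  internal_name = scan(initial_string, 1, '|');
  input dim row col @;                         /* matrix header: 2;15728;1 */
  do _n_ = 1 by 1 until (missing(holiday_date));
    input col_type holiday_date :yymmdd8. @;   /* type code 5, then the date */
    if not missing(holiday_date) then output;
  end;
  format holiday_date date9.;
  keep internal_name holiday_date;
datalines4;
US|0|4|US|Country|United States|;2;15728;1;5;19440101;5;19440102;5;19440103;5;19440108;5;19440109
;;;;
run;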
If you want to use the information at the start of the string (delimiter, dimensions, and so on) to drive the read-in, you can, but it takes two passes over the data file (unless there is a limited set of possibilities, in which case you could branch with IF/ELSE between a few INPUT statements). One pass reads just that part, then a macro reads the rest in a separate DATA step. But if the file always has this format and you don't really care about those fields (you just have to work around them), the approach above is probably better, since it's faster and less complicated.
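A minimal sketch of the two-pass idea, under these assumptions: the file has a header line, only the delimiter (not the dimensions) is taken from the data, and the delimiter is not a macro-special character such as & or %. The first step writes the sample record to a temporary file (fileref CAL, an illustrative name) so the sketch runs on its own; point CAL at the real vendor file instead.

```sas
/* Stand-in for the vendor file: header line plus one data record */
filename cal temp;
data _null_;
  file cal;
  put 'INTERNAL_NAME|ERROR_CODE|NUMBER_OF_FIELDS|CALENDAR_CODE|CALENDAR_TYPE|CALENDAR_NAME|DATES';
  put 'US|0|4|US|Country|United States|;2;15728;1;5;19440101;5;19440102;5;19440103;5;19440108;5;19440109';
run;

/* Pass 1: the delimiter is the first character of the 7th pipe-separated field */
data _null_;
  infile cal dlm='|' firstobs=2 obs=2 lrecl=1000000 truncover;
  input (f1-f6) (:$1.) dates_start :$1.;
  call symputx('dlm', dates_start);
run;

/* Pass 2: read the DATES string using the delimiter found in pass 1 */
data want;
  infile cal dlm="&dlm" firstobs=2 lrecl=1000000 truncover;
  length initial_string $500 internal_name $40;
  input initial_string $ @;
  internal_name = scan(initial_string, 1, '|');
  input dim row col @;
  do _n_ = 1 by 1 until (missing(holiday_date));
    input col_type holiday_date :yymmdd8. @;
    if not missing(holiday_date) then output;
  end;
  format holiday_date date9.;
  keep internal_name holiday_date;
run;
```

Pass 2 is the same read as the single-pass version; the only difference is that DLM= comes from the data rather than being hard-coded.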

Adding a new Column to existing SAS dataset

I have a SAS dataset that I created by reading in a .txt file. It has about 20-25 rows, and I'd like to add a new column that assigns a letter of the alphabet, in order, to each row.
Row 1 A
Row 2 B
Row 3 C
.......
It sounds like a really basic question that should have an easy solution, but unfortunately I can't find it anywhere. I find solutions for adding new calculated columns and so on, but in my case I just want to add a new column to my existing data table; there is no other relation between the variables.
This is kind of ugly, and if you have more than 26 rows it will start producing non-letter ASCII characters ([, \, ] and so on). But it does solve the problem as defined by the question.
Test data:
data have;
do row = 1 to 26;
output;
end;
run;
Explanation:
On my computer, the letter 'A' is at position 65 in the ASCII table (YMMV). We can determine this by using this code:
data _null_;
pos = rank('A');
put pos=;
run;
ASCII places the uppercase letters consecutively, so B is at position 66, C at 67, and so on. (This is not true of EBCDIC, where the alphabet has gaps.)
The byte() function returns a character from the ASCII table at a certain position. We can take advantage of this by using the position of ASCII character A as an offset, subtracting 1, then adding the row number (_n_) to it.
Final Solution:
data want;
set have;
alphabet = byte(rank('A')-1 + _n_);
run;
Not better than Tom's answer, but essentially a brute-force alternative: create a string containing the alphabet, then use CHAR() to pick out the character at position _n_.
data want;
set sashelp.class;
retain string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
letter = char(string, _n_);
run;
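Both answers run out of letters after 26 rows (BYTE() moves on to other characters; CHAR() returns a blank). If the data might have more than 26 rows, one option, a variation not asked for in the question, is to cycle through the alphabet with MOD(). A small sketch with 30 test rows:

```sas
data have;
  do row = 1 to 30;
    output;
  end;
run;

data want;
  set have;
  length letter $1;
  /* mod(_n_ - 1, 26) + 1 cycles 1..26, so row 27 starts again at A */
  letter = char('ABCDEFGHIJKLMNOPQRSTUVWXYZ', mod(_n_ - 1, 26) + 1);
run;
```

Because the letters come from a literal string rather than character codes, this also does not depend on the letters being consecutive in the machine's character set.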

Is it possible to convert a character date to another character date format without using INPUT?

Is there a faster or shorter way to convert a character date to another character date format, without the intermediate INPUT function call used below?
data alpha;
length date $12;
input date $;
cards;
15FEB2014
;
run;
data beta;set alpha;
date_f=put(input(date,date9.),yymmdd10.);
run;
Character date: 15FEB2014 --> Character date: 2014-02-15
Thanks
sas_kappel
Of course it is possible, but it's not likely to be easier.
data alpha;
length date $12;
input date $;
cards;
15FEB2014
;
run;
data beta;set alpha;
date_f=put(input(date,date9.),yymmdd10.);
format date_f2 $10.;
array monnames[12] $ _temporary_ ("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC") ;
date_f2 = catx('-',substr(date,6,4),put(whichc(substr(date,3,3),of monnames[*]),z2.),substr(date,1,2));
run;
I can't think of a simpler way than that without cheating even more than I already am. The put(input()) method does a lot of work behind that small amount of code.
You could also write a direct character format that converted '01JAN2014' to '2014-01-01', but you'd have to write every single possible conversion one by one unless I'm missing something obvious.
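The conversions do not have to be typed one by one: a DATA step can generate them, and PROC FORMAT can load them with CNTLIN=. A sketch, assuming dates between 1900 and 2099 and input strings in exactly the uppercase DATE9. form (e.g. 15FEB2014); any other value is returned unchanged. The format name $DT2ISO is made up for this example. Note that the date functions still do the work, just once, when the format is built.

```sas
/* One row per date: START is the DATE9. text, LABEL the YYMMDD10. text */
data dt2iso_cntlin;
  retain fmtname '$dt2iso';
  length start $9 label $10;
  do d = '01JAN1900'd to '31DEC2099'd;
    start = put(d, date9.);
    label = put(d, yymmdd10.);
    output;
  end;
  keep fmtname start label;
run;

proc format cntlin=dt2iso_cntlin;
run;

data _null_;
  date_f3 = put('15FEB2014', $dt2iso.);   /* character in, character out */
  put date_f3=;
run;
```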

How to convert a variable containing both characters and numbers into a numeric variable in SAS

I looked at previous questions on this topic and tried the suggested code, but it shows an error.
I have a variable var1 = census tract 244.1, which is character with length 25. I need a final variable that contains only the number 244.1, and it should be numeric.
I used the following code:
newvar = input (var1, 8.)
but it showed an error saying it is an invalid argument to function INPUT.
I also used:
newvar = input (var1, best32.) but again got the same error message as above.
I tried to remove the words 'census tract' using:
TRACT =tranwrd(var1, "Census Tract", '');
but the message said that var1 is defined as both a character and a numeric variable.
I have run out of options, so I need help. I'm using SAS 9.3.
You'll have to do this in two steps:
Extract the characters "244.1"
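Since we're only interested in 244.1, we'll get rid of the rest. This could be done in a number of ways, one of which is tranwrd, as you pointed out. Here SUBSTR takes 6 characters starting at position 13, which gives " 244.1"; the leading blank is harmless because the numeric informat ignores it.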
var2 = substr(var1, 13, 6);
Convert the character value "244.1" to the number 244.1
We need to take the character value and convert it to a number. The INPUT function converts a character value to a number using an informat, which tells SAS how to interpret the text. Here the informat 8. means: read up to 8 characters as a standard numeric value.
var3 = input(var2, 8.);
Full example program:
data work.one;
var1 = "census tract 244.1";
var2 = substr(var1, 13, 6);
var3 = input(var2, 8.);
run;
/* Show that var3 is a numeric variable */
proc contents data=work.one;
run;
Bonus Info
Note that you cannot save the converted value back into the original var1 variable: once it has been defined as a character variable, it cannot store a number. If you want to keep the same name, drop var1 and rename var3 to var1 in the same DATA step (DROP refers to the original name, so the renamed variable is kept):
drop var1;
rename var3=var1;
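If the prefix is not always exactly "census tract " (different capitalization, or tract numbers of different lengths), a fixed SUBSTR position is fragile. A sketch of an alternative, assuming the number is always the last blank-separated word: SCAN with a count of -1 takes the last word, and the ?? modifier makes INPUT quietly return a missing value, instead of writing an invalid-data note, if that word is not a number. The sample values are made up for illustration.

```sas
data work.two;
  infile datalines truncover;
  length var1 $25;
  input var1 $char25.;
  tract_txt = scan(var1, -1, ' ');           /* last blank-separated word */
  tract = input(tract_txt, ?? best32.);      /* numeric; missing if not a number */
datalines;
census tract 244.1
Census Tract 12
census tract 3021.07
;
run;
```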