Salvaging binary code in SAS

I am working with a large SAS database, and it appears that a few of the columns were derived from binary sources and forced into a character format. I believe this is the case because all sorts of strange characters appear in the column: ##, ??, and the "female" symbol, to name a few. Is there a way to salvage this data and convert it into a usable format, or do I need to correct the way the table is populated from the original data source?

Related

SAS ALTER TABLE MODIFY Length

Suppose I have in SAS someTable with a column someColumn of type Character.
I can adjust length, format, informat and label in the following way:
ALTER TABLE WORK.someTable
MODIFY someColumn char(8) format=$CHAR6. informat=$CHAR6. label='abcdef';
But I doubt if this is the correct way for the following reasons:
It seems pointless that the syntax requires the type char, because the column type can't be changed with a MODIFY statement.
This code does not work if someColumn is of type Numeric or Date.
The syntax for changing length is inconsistent with the syntax for changing format/informat/label.
Actually, I expected the following code to work:
ALTER TABLE WORK.someTable
MODIFY someColumn length=8 format=$CHAR6. informat=$CHAR6. label='someLabel';
This code runs without errors but does not change the length.
Question:
What is the correct syntax to modify the length of a column using ALTER TABLE / MODIFY?
(For arbitrary column type like character/numeric/date.)
The syntax for defining the altered variable ("column") is the same as the syntax PROC SQL uses for defining a variable. The documentation calls this the "column-definition component":
column data-type <column-modifier(s)>
That is why you use the SQL syntax, char(n) or num, for specifying the type. Note that SAS datasets only have two data types: fixed length character strings and floating point numbers. SAS will automatically convert any other SQL data-type into the proper one of those.
The limitations on altering the type are spelled out in the documentation:
Changing Column Attributes
If a column is already in the table, then
you can change the following column attributes by using the MODIFY
clause: length, informat, format, and label. The values in a table are
either truncated or padded with blanks (if character data) as
necessary to meet the specified length attribute.
You cannot change a character column to numeric and vice versa. To
change a column’s data type, drop the column and then add it (and its
data) again, or use the DATA step.
Note: You cannot change the length of a numeric column with the ALTER
TABLE statement. Use the DATA step instead.
Note that to make such changes to a dataset SAS will have to create a whole new dataset. So you might as well just write a data step to create the new dataset and then you will have full control.
Also be careful if you change the length of character variable to make sure that the attached FORMAT is still correct.
In your example you are changing the variable to be 8 bytes long, but are attaching a format that will only display the first 6 bytes.
In general it is best to not attach formats to character variables to avoid the confusion that type of mismatch can cause. Unfortunately there is no way to remove the attached format using PROC SQL. The best you could do is to set the format to $., that is without an explicit width. If you want to completely remove the format you will need to use a FORMAT statement in PROC DATASETS or a data step.
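The DATA step and PROC DATASETS routes mentioned above could look roughly like the following. This is a minimal sketch, not a definitive recipe: WORK.someTable and someColumn are the names from the question, and someTable_new is a hypothetical output name.

```sas
/* Sketch only: someTable_new is a hypothetical output data set name. */
data work.someTable_new;
    length someColumn $ 8;   /* declared before SET, so this length is used */
    set work.someTable;
    format someColumn;       /* no format name: removes the attached format */
    informat someColumn;     /* likewise removes the attached informat */
run;

/* Removing the attached format in place, without rewriting the data */
proc datasets library=work nolist;
    modify someTable;
    format someColumn;
quit;
```

Declaring the length before SET also moves someColumn to the first position in the new data set, and SAS may warn about truncation if the new length is shorter than the old one. For a numeric column the same pattern applies with a statement such as length someNumber 4; (someNumber being a hypothetical variable).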

Generating many tables from a single table in SAS

I have a table in SAS which contains the format information I want. I want to bin this data into the categories given.
What I don't know how to do is create either an xform or a format file from the data.
An example table looks like this:
TxtLabel  Type  FmtName  label  Hlo  count
.         I     FAC1f    0      O    1
1996      I     FAC1f    1           2
1997      I     FAC1f    2           3
I want to date all years in a different data set as after 1997 OR before 1996.
The problem is that I know how to do this by hard coding it, but the numbers in these files change each time, so I'm hoping to use the information in the table to generate the bins rather than hard code them.
How do I go about binning my data using a column from another data set for my categorization?
Edit
I have two data sets: one that looks like the one I have included, and one that has a column titled "YEAR". I want to bin the second data set using the categories from the first. In this case there are two available years in TxtLabel. There are multiple tables like this, so I'm looking at how to generate PROC FORMAT code from the table rather than hard coding the values.
This should run to create the desired format
Proc FORMAT CNTLIN=MyCustomFormatControlData;
run;
You can then use it in a DATA Step, or apply it to a column in a data set.
Binning the data might be construed as 'data set splitting' but your question does not make it clear if that is so. Generic arbitrary splitting is often done with one of these techniques:
wall paper source code resolved from macro variables populated from information garnered in a Proc SQL or Proc FREQ step
dynamic data splitting using hash object for grouping records in memory, and saved to a data set with an .output() call.
Sample code for explicit binning
data want0 want1 want2 want3 want4 want5 wantOther;
  set have;
  * explicit wall paper;
  select (put(year,FAC1f.));
    when ('0') output want0;
    when ('1') output want1;
    when ('2') output want2;
    when ('3') output want3;
    when ('4') output want4;
    when ('5') output want5;
    otherwise output wantOther;
  end; * closes the SELECT group;
run;
This is the construct that source code generated by macro can produce, and requires
one pass to determine the when/output lines that are to be generated
a second pass to apply the lines of code that were generated.
If this is the data processing that you are attempting:
do some research (plenty of info out there)
write some code
make a new question if you get errors you can't resolve
Proc FORMAT
Proc FORMAT has a CNTLIN option for specifying a data set containing the format information. The structure and values expected of the Input Control Data Set (the CNTLIN data set) are described in the Output Control Data Set documentation. Some of the important control data columns are:
FMTNAME
specifies a character variable whose value is the format or informat name.
LABEL
specifies a character variable whose value is associated with a format or an informat.
START
specifies a character variable that gives the range's starting value.
END
specifies a character variable that gives the range's ending value.
As the requirements of the custom format to be created get more sophisticated you will need to have more information variables in the input control data set.
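As an illustration of the control data set described above, here is a minimal sketch that builds one by hand and feeds it to CNTLIN. The names work.year_ctl, yearbin and work.binned are hypothetical, and the three rows simply mirror the example table in the question (1996 maps to 1, 1997 maps to 2, everything else maps to 0). In practice the rows would come from the existing table rather than from assignment statements.

```sas
/* Hypothetical control data set: one row per range, plus an OTHER row */
data work.year_ctl;
    length fmtname $ 8 type $ 1 start end $ 8 label $ 8 hlo $ 1;
    retain fmtname 'yearbin' type 'N';   /* N = numeric format */
    start = '1996'; end = '1996'; label = '1'; hlo = ' '; output;
    start = '1997'; end = '1997'; label = '2'; hlo = ' '; output;
    start = ' ';    end = ' ';    label = '0'; hlo = 'O'; output;  /* O = OTHER */
run;

/* Build the format from the control data set */
proc format cntlin=work.year_ctl;
run;

/* Apply it to the YEAR column of the second data set (assumed to be work.have) */
data work.binned;
    set work.have;
    bin = put(year, yearbin.);
run;
```

Because the format is generated from data, a change in the bin definitions only requires rebuilding the control data set, not editing code.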

Power BI Desktop doesn't honor percentage column type

I've imported data (approximately 200 columns) into Power BI desktop (latest version as of 2017-08-02) and have explicitly told the app to treat a number of columns as being percentages. Within the query editor, I can verify that my values are treated as such:
When I put my data into a table, they show up as normal floats, not percentages. When I click on the exact same column as in the above picture and view it in the Modeling tab, Power BI shows it as being "General" format:
While I can go through and change the formatting here to have them all be percentages, I have already done so in the query editor! Is there a way to make PBI recognize my already specified format?
Short answer: No.
Explanation:
In the query editor, you didn't actually specify any format. What you specified is the data type, so that the source data can be read correctly. For example, if you have a column with data like 001, you can specify it as text type so that the leading zeros are retained.
However, the actual formatting (i.e. data presentation) is done in your second step, because even a (decimal) number can still be formatted as a percentage, with different decimal places, and so on.

How to convert several fields in SAS to numeric?

I am working on a project where I'm reading raw census data into SAS enterprise guide to be processed as a different merged output. The first few columns are character fields, serving as geographic identifiers.
The rest of the raw data contains numeric fields, all fields are like "HD01_VD01" and so on up through numbers like "HD01_VD78". However, occasionally with census data numbers get suppressed and some observations have "*****" in the raw data like in the picture below. Whenever that happens, SAS reads in the numeric field as a character.
What would be a good way to ensure that every "HD01_VD(whatevernumber)" field is always numeric, with "*****" converted to a blank/missing value like ".", thus keeping the field numeric?
I don't want to hard-code every instance of a field being read in as a character back to numeric because my code is working with many different census tables. Would a macro variable be the way to do this? An if statement in each census table's data step?
Using arrays and looping over them would be the best option, as mentioned in the comment by david25272.
Another option is to change the type of the fields in Enterprise Guide, either in:
the Import Task that reads the files: change the field to numeric
or
a Query Builder Task: create a calculated field using this advanced expression: input(HD02_V36,11.)
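The array approach could be sketched as follows. This assumes, for simplicity, that all of HD01_VD01 through HD01_VD78 were read as character (a SAS array cannot mix character and numeric variables, so a table where only some columns came in as character would need extra handling). The data set names work.census_raw and work.census_num and the N_VD prefix are hypothetical.

```sas
data work.census_num;
    set work.census_raw;
    /* Assumption: every HD01_VD column is character in census_raw */
    array raw{*} HD01_VD01-HD01_VD78;
    array num{78} N_VD01-N_VD78;          /* new numeric variables */
    do i = 1 to dim(raw);
        /* ?? suppresses invalid-data notes; "*****" becomes missing (.) */
        num{i} = input(raw{i}, ?? best32.);
    end;
    drop i HD01_VD:;                      /* keep only the numeric versions */
run;
```

A variable's type cannot be changed in place, which is why the numeric values go into new variables. They can be renamed back to the original names afterwards if the downstream code expects them.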

SAS insert column for dynamically determined levels

I am attempting to set up SAS to do something I am able to easily do in Excel, but am unable to find a way to do effectively. Given the first two tables shown here (dubbed TREE and LEVEL, respectively), I am trying to end up with the third table (FINAL_TREE).
Adding in the Level column to TREE, so that it becomes FINAL_TREE works as follows: any given tree must have a number Apple which is greater than or equal to Apple_Req for a given Level, as well as Orange greater than or equal to Orange_Req. So a Tree is given a Level to which it meets all given requirements.
So in the example tables, Tree3 is given Level1, despite the fact that it would easily be Level3 if not for its low Orange count.
In Excel, this can be done using INDEX and finding the MIN of two MATCH functions, but I don't think that can be directly translated into SAS. I imagine there is a way to set this up using explicitly defined nested IF statements, but I am hoping there is a solution which can handle a LEVEL table with any number of levels (so long as the requirements are set up correctly).
In fact, this is quite a bit easier in SAS - in part because there are a lot of different ways to do this.
The most straightforward is probably using SQL, if you're familiar with it. The most similar to what you're doing in Excel, though, is Format, and perhaps the fastest as well.
proc format;
  value appleF
    1-<4    = '1'
    5-<15   = '2'
    15-high = '3'
    other   = '0';
  value orangeF
    5-<15   = '1'
    16-<30  = '2'
    30-high = '3'
    other   = '0';
run;
Now, you can convert the values using put and then use min just like you would in Excel. Basically this replaces your index.
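data want;
  set have;
  * put() returns character, so convert back to numeric before taking the minimum;
  level = min(input(put(apple,applef1.),1.), input(put(orange,orangef1.),1.));
run;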
You can also produce a format from a dataset directly - see this paper for example for using CNTLIN option on PROC FORMAT.
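The SQL route mentioned at the start of this answer could be sketched as below. It assumes the tables are named TREE and LEVEL as in the question, that TREE has columns Tree, Apple and Orange, that LEVEL has a numeric Level column plus Apple_Req and Orange_Req, and that the requirements increase with the level. The column name Tree and the numeric type of Level are guesses, since the original tables are not shown.

```sas
proc sql;
    create table final_tree as
    select t.tree, t.apple, t.orange,
           max(l.level) as level      /* highest level whose requirements are all met */
    from tree as t
         left join level as l
           on  t.apple  >= l.apple_req
           and t.orange >= l.orange_req
    group by t.tree, t.apple, t.orange;
quit;
```

Each tree is joined to every level it satisfies, and the maximum of those is kept. A tree that satisfies no level gets a missing Level because of the left join. This works for any number of rows in LEVEL without changing the code.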