How to use proc format with the number of lines? - sas

I have a table like this :
|Num | Label
-----------------------
1|1 | a thing
2|2 | another thing
3|3 | something else
4|4 | whatever
I want to replace my values of my label column by something more generic for example the first two lines : label One, the two next ones label Two ...
|Num | Label
-----------------------
1|1 | label One
2|2 | label One
3|3 | label Two
4|4 | label Two
How can I do that using proc format procedure ? I was wondering if I can use either the number of lines or another column like Num.
I need to do something like this :
proc format;
value label_f
low-2 = "label One"
3-high = "label Two"
;
run;
But I want to specify the number of the line or the value of the Num column.

You could do what you are describing using the words format. You could swap out num for _N_ in the ceil function below in order to use the observation number instead of the value of num (if they are not always equal):
data have;
length num 8 label $20;
infile datalines dlm='|';
input num label $;
datalines;
1|a thing
2|another thing
3|something else
4|whatever
5|whatever else
6|so many things
;
run;
data want;
set have;
label=catx(' ','label',propcase(put(ceil(num/2),words.)));
run;
Although this answer is probably a bit too specific to your example and it may not apply in your actual context.

Gatsby:
It sounds like you want to format NUM instead of LABEL.
Where you want the use the 'generic' representation defined by your format simply place a FORMAT statement in the Proc being used:
PROC PRINT data=have;
format num label_f.;
RUN;
If you want both num and generic, you will need to add a new column to the data for use during processing. This can be done with a view:
data have_view / view=have_view;
set have;
num_replicate1 = num;
attrib num_replicate1 format=label_f. label='Generic';
num_replacement = put (num,label_f.);
attrib num_replacement label='Generic'; %* no format because the value is the formatted value of the original num;
run;
PROC PRINT data=have_view;
var num num_replicate1 num_replacement;
RUN;
If you want a the 'generic' representation of the NUM column to be used in by-processing as a grouping variable, you have several scenarios:
know apriori the generic representation is by-group clustered
use a view and process with BY or BY ... NOTSORTED if clusters are not in sort order
force ordering for use with by-group processing
use an ordered SQL view containing the replicate and process with BY
add a replicate variable to the data set, sort by the formatted value and process with BY
A direct backmap from label to num to generic is possible only if the label is known to be unique, or you know apriori the transformation backmap-num + num-map is unique.
Proc FORMAT also has a special value construct [format] that can be used to map different ranges of values according to different formatting rules. The other range can also map to a different format that itself has an other range that maps to yet another different format. The SAS format engine will log an error if you happen to define a recursive loop using this advanced kind of format mapping.
propaedeutics
One of my favorite Dorfman words.
Format does not replace underlying values. Format is a map from the underlying data value to a rendered representation. The map can be 1:1, many:1. The MultiLabel Format (MLF) feature of the format system can even perform 1:many and many:many mappings in procedures many MLF enabled procedures (which is most of them)
To replace an underlying value with it's formatted version you need to use the PUT, PUTC or PUTN functions. The PUT functions always outputs a character value.
character ⇒ PUT ⇒ character [ FILE / PUT ]
numeric ⇒ PUT ⇒ character [ FILE / PUT ]
There is no guarantee a mapped value will mapped to the same value, it depends on the format.
INFORMATs are similar to FORMATs, however the target value depend on the in format type
character ⇒ INPUT ⇒ character [ INFILE / INPUT ]
numeric ⇒ INPUT ⇒ character
character ⇒ INPUT ⇒ numeric [ INFILE / INPUT ]
numeric ⇒ INPUT ⇒ numeric
Custom formats are created with Proc FORMAT. The construction of a format is specified by either the VALUE statement, or the CNTLIN= option. CNTLIN lets you create formats directly from data and avoids really large VALUE statements that are hand-entered or code-generated (via say macro)
Data-centric 'formatting' performs the mapping through a left-join. This is prevalent in SQL data bases. Left-joins in SAS can be done through SQL, DATA Step MERGE BY and FORMAT application. 1:1 left-joins can also be done via Hash object SET POINT=

Related

Delete all observations starting with a list of values from database (SAS)

I am trying to find the optimized way to do this :
I want to delete from a character variable all the observations STARTING with different possible strings such as :
"Subtotal" "Including:"
So if it starts with any of these values (or many others that i didn't write here) then delete them from the dataset.
Best solution would be a macro variable containing all the values but i don't know how to deal with it. (%let list = Subtotal Including: but counts them as variables while they are values)
I did this :
data a ; set b ;
if findw(product,"Subtotal") then delete ;
if findw(product,"Including:") then delete;
...
...
Would appreciate any suggestions !Thanks
First figure out what SAS code you want. Then you can begin to worry about how to use macro logic or macro variables.
Do you just to exclude the strings that start with the values?
data want ;
set have ;
where product not in: ("Subtotal" "Including");
run;
Or do you want to subset based on the first "word" in the string variable?
where scan(product,1) not in ("Subtotal" "Including");
Or perhaps case insensitive?
where lowcase(scan(product,1)) not in ("subtotal" "including");
Now if the list of values is small enough (less than 64K bytes) then you could put the list into a macro variable.
%let list="Subtotal" "Including";
And then later use the macro variable to generate the WHERE statement.
where product not in: (&list);
You could even generate the macro variable from a dataset of prefix values.
proc sql noprint;
select quote(trim(prefix)) into :list separated by ' '
from prefixes
;
quit;

Add zero before decimal place in SAS

Hello I need to add a zero for only results and ranges are in a .2, .3 etc format but not tall values are like this. What would be the most efficient way to do this?.
I've only tried to format it using a w.d but the issue is that not all values are the same and all I want to do is add a zero for where they are applicable without messing the format of the other values.
format lborres lbornlo lbornhi z7.2.;
If your variable is numeric you can apply a standard format and that would add the leading zero.
ie
format yourVariable 8.1;
If your variable is character then you can test if the first character is a period and add a 0 or you can convert it to a number and store it that way instead. Option 2 is illustrated first as the first option overwrites the variable so to avoid any issues with that it's shown after.
data want;
set yourInputDataSet;
*Option 2;
new_numeric_variable = input(yourVariable, 8.);
format new_numeric_variable 8.1;
*Option 1;
if yourVariable =: '.' then yourVariable = catt('0', yourVariable);
run;
And as always, if your variable is incorrectly formatted this way, I would check my data import step and see if I can fix it there instead of after the fact. This is especially true if you used PROC IMPORT on a text file, where you can easily control the variable format and types as they're read in.
Are you value all less than 1? eg.
.3
.002
.752
.056
If not you would have to create picture format for each unit (thousands, hundreds, tens etc)
If they are all less than 1, z5.3 I believe will give you - 5 being the total length and 3 the number of decimal points.
data test;
format A z5.3;
a=.3; output; /* 0.300 */
a=.002; output; /* 0.002 */
a=.752; output; /* 0.752 */
a=.056; output; /* 0.056 */
run;

Convert Number with Format into a String

How do you convert a number or currency variable into a character string that keeps the format as part of the string?
For instance, the below code has a character variable, MSRP_to_text, and currency variable, MSRP. When I set MSRP_to_text equal to MSRP, it takes the unformatted number and converts it to a string, so the dollar sign and the comma are gone.
DATA want;
SET SASHELP.CARS(KEEP=MSRP);
ATTRIB MSRP_to_text FORMAT=$8.;
MSRP_to_text = MSRP;
RUN;
In other words, the code is currently converting $36,945 -> "36945", but what I really want is $36,945 -> "$36,945".
Is there a way to keep the dollar sign and comma in the string?
VVALUE function will retrieve the formatted value of a variable.
MSRP_as_text = VVALUE(MSRP);
VVALUEX goes one step further for the case of the variable name being dynamic; such as being stored in a different variable, or is computed from some name patterning algorithm.
name = 'MSRP';
formatted_value = VVALUEX(name);
Instead of ATTRIB statement, Use the PUT function to convert number to Character. and it will keep the text value with format. Since the original Format of MSRP is DOLLAR8. , so using same format in put statement will suffice the purpose
DATA want;
SET SASHELP.CARS(KEEP=MSRP);
MSRP_to_text = put(MSRP, DOLLAR8.);
RUN;
proc contents data=want; run;

Multiple To clauses in Data step

I have a data step where I have a few columns that need tied to one other column.
I have tried using multiple "from" statements and " to" statements and a couple other permutations of that, but nothing seems to do the trick. The code looks something like this:
data analyze;
set css_email_analysis;
from = bill_account_number;
to = customer_number;
output;
from = bill_account_number;
to = email_addr;
output;
from = bill_account_number;
to = e_customer_nm;
output;
run;
I would like to see two columns showing bill accounts in the "from" column, and the other values in the "to", but instead I get a bill account and its customer number, with some "..."'s for the other values.
Issue
This is most likely because SAS has two datatypes and the first time the to variable is set up, it has the value of customer_number. At your second to statement you attempt to set to to have the value of email_addr. Assuming email_addr is a character variable, two things can happen here:
Customer_number is a number - to has already been set up as a number, so SAS cannot force to to become a character, an error like this may appear:
NOTE: Invalid numeric data, 'me#mywebsite.com' , at line 15 column 8. to=.
ERROR=1 N=1
Customer_number is a character - to has been set up as a character, but without explicitly defining its length, if it happens to be shorter than the value of email_addr then the email address will be truncated. SAS will not show an error if this happens:
Code:
data _NULL_;
to = 'hiya';
to = 'me#mydomain.com';
put to=;
run;
short=me#m
to is set with a length of 4, and SAS does not expand it to fit the new data.
Detail
The thing to bear in mind here is how SAS works behind the scenes.
The data statement sets up an output location
The set statement adds the variables from first observation of the dataset specified to a space in memory called the PDV, inheriting lengths and data types.
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm
===================================================================
010101 | 758|me#my.com |John Smith
The to statement adds another variable inheriting the characteristics of customer_number
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |758
(to is either char length 3 or a numeric)
Subsequent to statements will not alter the characteristics of the variable and SAS will continue processing
PDV (if customer_number is character = TRUNCATION):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |me#
PDV (if customer_number is numeric = DATA ERROR, to set to missing):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |.
Resolution
To resolve this issue it's probably easiest to set the length and type of to before your first to statement:
data analyze;
set css_email_analysis;
from = bill_account_number;
length to $200;
to = customer_number;
output;
...
You may get messages like this, where SAS has converted data on your behalf:
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
27:8
N.B. it's not necessary to explicitly define the length and type of from, because as far as I can see, you only ever get the values for this variable from one variable in the source dataset. You could also achieve this with a rename if you don't need to keep the bill_account_number variable:
rename bill_account_number = from;

SAS format char

First i have created this table
data rmlib.tableXML;
input XMLCol1 $ 1-10 XMLCol2 $ 11-20 XMLCol3 $ 21-30 XMLCol4 $ 31-40 XMLCol5 $ 41-50 XMLCol6 $ 51-60;
datalines;
| AAAAA A||AABAAAAA|| BAAAAA|| AAAAAA||AAAAAAA ||AAAA |
;
run;
I want to clean, concatenate and export. I have written the following code
data rmlib.tableXML_LARGO;
file CleanXML lrecl=90000;
set rmlib.tableXML;
array XMLCol{6} ;
array bits{6};
array sqlvars{6};
do i = 1 to 6;
*bits{i}=%largo(XMLCol{i})-2;
%let bits =input(%largo(XMLCol{i})-2,comma16.5);
sqlvars{i} = substr(XMLCol{i},2,&bits.);
put sqlvars{i} &char10.. #;
end;
run;
the macro largo count how many characters i have
%macro largo(num);
length(put(&num.,32500.))
%mend;
What i need is instead of have char10, i would like that this number(10) would be the length, of each string, so to have something like
put sqlvars{i} &char&bits.. #;
I don't know if it possible but i can't do it.
I would like to see something like
AAAAA AAABAAAAA BAAAAA AAAAAAAAAAAAA AAAA
It is important to me to keep the spaces(this is only an example of an extract of a xml extract). In addition I will change (for example) "B" for "XPM", so the size will change after cleaning the text, that it what i need to be flexible in the char
Thank you for your time
Julen
I'm still not quite sure what you want to achieve, but if you want to combine the text from multiple varriables into one variable, then you could do something along the lines:
proc sql;
select name into :names separated by '||'
from dictionary.columns
where 1=1
and upcase(libname)='YOURLIBNAME'
and upcase(memname)='YOURTABLENAME';
quit;
data work.testing;
length resultvar $ 32000;
set YOURLIBNAME.YOURTABLENAME;
resultvar = &names;
resultvar2 = compress(resultvar,'|');
run;
Wasn't able to test this, but this should work if you replace YOURLIBNAME and YOURTABLENAME with your respective tables. I'm not 100% sure if the compress will preserve the spaces in the text.. But I think it should.
The format $VARYING. <length-variable> is a good candidate for solving this output problem.
On the presumption of having a number of variables whose values are vertical-bar bounded and wanting to output to a file the concatenation of the values without the bounding bars.
data have;
file "c:\temp\want.txt" lrecl=9000;
length xmlcol1-xmlcol6 $100;
array values xmlcol1-xmlcol6 ;
xmlcol1 = '| A |';
xmlcol2 = '|A BB|';
xmlcol3 = '|A BB|';
xmlcol4 = '|A BBXC|';
xmlcol5 = '|DD |';
xmlcol6 = '| ZZZ |';
do index = 1 to dim(values);
value = substr(values[index], 2); * ignore presumed opening vertical bar;
value_length = length(value)-1; * length with still presumed closing vertical bar excluded;
put value $varying. value_length #; * send to file the value excluding the presumed closing vertical bar;
end;
run;
You have some coding errors in that is making it difficult to understand what you want to do.
Your %largo() macro doesn't make any sense. There is no format 32500.. The only reason it would run in your code is because you are trying to apply the format to a character variable instead of a number. So SAS will automatically convert to use the $32500. instead.
The %LET statement that you have hidden in the middle of your data step will execute BEFORE the data step runs. So it would be less confusing to move it before the data step.
So replacing the call to %largo() your macro variable BITS will contain this text.
%let bits =input(length(put(XMLCol{i},32500.))-2,comma16.5);
Which you then use inside a line of code. So that line will end up being this SAS code.
sqlvars{i} = substr(XMLCol{i},2,input(length(put(XMLCol{i},$32500.))-2,comma16.5));
Which seems to me to be a really roundabout way to do this:
sqlvars{i} = substr(XMLCol{i},2,length(XMLCol{i})-2);
Since SAS stores character variables as fixed length, it will pad the value stored. So what you need to do is to remember the length so that you can use it later when you write out the value. So perhaps you should just create another array of numeric variables where you can store the lengths.
sqllen{i} = length(XMLCol{i})-2;
sqlvars{i} = substr(XMLCol{i},2,sqllen{i});