I have a variable DRG in my dataset and I would like to create a new variable with the second and third characters in the DRG string. For example, if DRG value is A23B I would like to extract 23 as a new variable.
Can someone please help me with the SAS code. Thanks a lot in advance.
Sample code
data example;
input DRG $4.;
cards;
A23B
A13A
A45C
B82B
B82C
B34A
C01A
C25B
C46B
;
run;
Thanks for the help.
I was able to work out the answer by following this webpage https://www.listendata.com/2017/03/extract-last-4-characters-digits-in-sas.html
Here is my code:
data example2;
set example;
want = substr(DRG,length(DRG)-2,2);
run;
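A shorter variant, offered as a sketch: the start position computed by length(DRG)-2 only lands on the second character when every value is exactly four characters long, so asking SUBSTR for position 2 directly does not depend on the string length. The dataset name example3 and the variable want_num are hypothetical names added for illustration.

```sas
data example3;
  input DRG $4.;
  /* take 2 characters starting at position 2, whatever the total length */
  want = substr(DRG, 2, 2);
  /* hypothetical numeric copy of the extracted digits */
  want_num = input(want, 2.);
  cards;
A23B
B82C
C01A
;
run;
```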
I have code that converts a character value to numeric using an informat, and I'm trying to use the LENGTH function to supply the informat width.
However, I'm getting an error with this approach.
The background is that the informat width used to be a fixed value. I want to enhance the code so that the informat width is flexible rather than fixed.
Before code:
data work.test;
emp_input = '168643123';
emp_value = input(emp_input, 6.);
run;
My current test code:
data work.test;
emp_input = '168643123';
emp_value = input(emp_input, length(emp_input).);
run;
I expect the character value '168643123' to be converted to the numeric value 168643123.
With the old code, the output is the numeric value 168643.
That is not valid syntax. You have to use INPUTN and generate a string for the informat.
data work.test;
emp_input = '168643123';
emp_value = inputn(emp_input, cats(put(length(emp_input),3.),'.'));
run;
But it is better to use BEST32. for all generic numbers of up to 32 characters in length.
data work.test;
emp_input = '168643123';
emp_value = input(emp_input, BEST32.);
run;
INPUT requires the informat to be written literally as the second parameter; it cannot be an expression.
INPUTN() or INPUTC() can take the second parameter as a string/character variable and use that as the informat to apply. You do have to convert the width to a string first.
Why? The INPUT function is happy to adjust when the width of the informat is larger than the length of the string you are reading. Just use the maximum width that the informat allows.
emp_value = input(emp_input,32.);
If you did want to limit the number of characters read (perhaps there are letters after the digits?) then you can use the INPUTN() function (or INPUTC() function for character results). Let's test by appending some X's to the end of the string and using an informat whose width stops before the X's.
emp_value = inputn(cats(emp_input,'XXX'),cats(length(emp_input),'.'));
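The two fragments above can be put into one step to compare them. This is only a sketch; the variable names padded, wide and limited are made up for the illustration. Both results are expected to hold 168643123, the second because the informat width stops before the X's.

```sas
data work.test;
  emp_input = '168643123';
  /* append letters to show that a limited width stops before them */
  padded = cats(emp_input, 'XXX');
  /* fixed maximum width: INPUT copes with a string shorter than 32 */
  wide = input(emp_input, 32.);
  /* width built at run time as the string '9.' and passed to INPUTN */
  limited = inputn(padded, cats(length(emp_input), '.'));
  put wide= limited=;
run;
```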
This code worked for me. This is based on all your answers. Thank you very much!
data work.test;
emp_input = '168643123';
emp_value = inputn(emp_input, cats(length(emp_input),'.'));
run;
My code was running fine until I added the last line for age 5+. Does anyone know what's wrong with that line? Thank you.
data Work.File ;
set Work.File;
Female =(Sex ='F');
Male = (Sex ='M');
Age1=(age=1);
Age2=(age=2);
Age3=(age=3);
Age4=(age=4);
Age5+=(age='5+');
run;
The name of a SAS variable has certain restrictions: you can't have a + sign in it. Also, Age should be a numeric variable. You can write the last line as:
Age5Plus=(age>=5);
"Age5+"n=(age>=5);
would also work after setting
options validvarname=any;
but then you have to write that name as a name literal ("Age5+"n) every time you use that variable.
First I created this table:
data rmlib.tableXML;
input XMLCol1 $ 1-10 XMLCol2 $ 11-20 XMLCol3 $ 21-30 XMLCol4 $ 31-40 XMLCol5 $ 41-50 XMLCol6 $ 51-60;
datalines;
| AAAAA A||AABAAAAA|| BAAAAA|| AAAAAA||AAAAAAA ||AAAA |
;
run;
I want to clean, concatenate and export the values. I have written the following code:
data rmlib.tableXML_LARGO;
file CleanXML lrecl=90000;
set rmlib.tableXML;
array XMLCol{6} ;
array bits{6};
array sqlvars{6};
do i = 1 to 6;
*bits{i}=%largo(XMLCol{i})-2;
%let bits =input(%largo(XMLCol{i})-2,comma16.5);
sqlvars{i} = substr(XMLCol{i},2,&bits.);
put sqlvars{i} &char10.. @;
end;
run;
The macro largo counts how many characters I have:
%macro largo(num);
length(put(&num.,32500.))
%mend;
What I need is that, instead of the fixed char10, the number (10) is the length of each string, so that I have something like
put sqlvars{i} &char&bits.. @;
I don't know if this is possible, but I can't get it to work.
I would like to see something like
AAAAA AAABAAAAA BAAAAA AAAAAAAAAAAAA AAAA
It is important to me to keep the spaces (this is only an example taken from an XML extract). In addition, I will replace (for example) "B" with "XPM", so the size will change after cleaning the text. That is why I need the char width to be flexible.
Thank you for your time,
Julen
I'm still not quite sure what you want to achieve, but if you want to combine the text from multiple variables into one variable, then you could do something along these lines:
proc sql;
select name into :names separated by '||'
from dictionary.columns
where 1=1
and upcase(libname)='YOURLIBNAME'
and upcase(memname)='YOURTABLENAME';
quit;
data work.testing;
length resultvar $ 32000;
set YOURLIBNAME.YOURTABLENAME;
resultvar = &names;
resultvar2 = compress(resultvar,'|');
run;
I wasn't able to test this, but it should work if you replace YOURLIBNAME and YOURTABLENAME with your own library and table. I'm not 100% sure that the compress will preserve the spaces in the text, but I think it should.
The format $VARYING. <length-variable> is a good candidate for solving this output problem.
The example presumes a number of variables whose values are bounded by vertical bars, and writes the concatenation of the values to a file without the bounding bars.
data have;
file "c:\temp\want.txt" lrecl=9000;
length xmlcol1-xmlcol6 $100;
array values xmlcol1-xmlcol6 ;
xmlcol1 = '| A |';
xmlcol2 = '|A BB|';
xmlcol3 = '|A BB|';
xmlcol4 = '|A BBXC|';
xmlcol5 = '|DD |';
xmlcol6 = '| ZZZ |';
do index = 1 to dim(values);
value = substr(values[index], 2); * ignore presumed opening vertical bar;
value_length = length(value)-1; * length with still presumed closing vertical bar excluded;
put value $varying. value_length @; * send to file the value excluding the presumed closing vertical bar, holding the output line;
end;
run;
You have some coding errors that make it difficult to understand what you want to do.
Your %largo() macro doesn't make any sense. There is no numeric format named 32500. in SAS. The only reason it runs in your code is that you are applying the format to a character variable instead of a number, so SAS automatically converts it to use the $32500. format instead.
The %LET statement that you have hidden in the middle of your data step will execute BEFORE the data step runs. So it would be less confusing to move it before the data step.
So after replacing the call to %largo(), your macro variable BITS will contain this text.
%let bits =input(length(put(XMLCol{i},32500.))-2,comma16.5);
Which you then use inside a line of code. So that line will end up being this SAS code.
sqlvars{i} = substr(XMLCol{i},2,input(length(put(XMLCol{i},$32500.))-2,comma16.5));
Which seems to me to be a really roundabout way to do this:
sqlvars{i} = substr(XMLCol{i},2,length(XMLCol{i})-2);
Since SAS stores character variables as fixed length, it will pad the value stored. So what you need to do is to remember the length so that you can use it later when you write out the value. So perhaps you should just create another array of numeric variables where you can store the lengths.
sqllen{i} = length(XMLCol{i})-2;
sqlvars{i} = substr(XMLCol{i},2,sqllen{i});
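As a rough sketch of how the remembered length can then be used when writing, the step below pairs it with the $VARYING. format and writes to the log. The sample values, the three-column array, and the scalar names value and value_length are assumptions made for the illustration. The length is copied into a scalar because, as far as I know, the length variable that follows $VARYING. cannot be an array reference.

```sas
data _null_;
  length XMLCol1-XMLCol3 $10 value $10;
  array XMLCol{3};
  /* hypothetical sample values, each bounded by vertical bars */
  XMLCol1 = '| AAAAA A|';
  XMLCol2 = '|AABAAAAA|';
  XMLCol3 = '|AAAA    |';
  do i = 1 to dim(XMLCol);
    /* number of characters between the two bars */
    value_length = length(XMLCol{i}) - 2;
    value = substr(XMLCol{i}, 2, value_length);
    /* write exactly value_length characters and hold the output line */
    put value $varying10. value_length @;
  end;
  put;  /* release the held line */
run;
```

Because the width written comes from value_length at run time, leading and trailing spaces inside the bars are kept, and the width can change if the text is edited before it is written.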
I am unsure whether this is possible (or whether it is a stupid question), as I just started looking at SAS last week. I've managed to import my .CSV file into a SAS data set using:
proc import
Specifying the guessingrows= to limit my out=.
My problem now is that the CSV files to import do not all have the same structure, which I noticed after writing some code using obsnum= to specify the start and the number of lines to read.
So my question is whether SAS is capable of looking for a specific string or an empty variable and using it as the end observation.
My Data looks like (but number of Var_x varies for each file):
First I tried looking at slice=, but it is only useful if I know the exact places of interest, and the empty space between the groups can vary.
Is it possible to use the set statement to start at line 1 and read until encountering a blank field? Or can you point me to some function (that I couldn't find myself)?
I would like to look at each "block" separately and process.
Thank you in advance
I think you can do this in a relatively straightforward way if you are comfortable doing some processing after all the data has been inputted.
So do proc import on the whole dataset with no restriction.
Then use a data step and a counter to process through the data and output as necessary. Something like:
data output1 output2 output3;
set imported_data;
retain counter 1; * keep the counter value from one row to the next;
var1lag = lag(var1);
if var1 = '' and var1lag ne '' then counter=counter+1;
if counter = 1 then output output1;
else if counter = 2 then output output2;
else output output3;
run;
data output1;
set output1;
if var1 = '' and var2 = . and var3 = . then delete;
run;
data output2;
set output2;
if var1 = '' and var2 = . and var3 = . then delete;
run;
data output3;
set output3;
if var1 = '' and var2 = . and var3 = . then delete;
run;
The above code outputs to three datasets based on the value of counter. The lag function lets us look back one row to check that this is the first blank row after some data, and the counter is incremented each time that happens.
Then we go back and remove any fully blank rows from our datasets.
If you have many outputs, you could use arrays to make this scale better than the if/else statements that output the data.
The NEGPARENw.d format displays the value -2000 as (20,00) based on the w.d.
Is there any way to do the same in SAS 9.1?
I read the value 00005000- as a character value and then converted it to the numeric value
-5000
TEMP=000005000-
Temp= COMPRESS(TEMP,'-')
TEMP=-(INPUT(TEMP,16.2)) with format NEGPARENw.d, but it's not working
proc report;
.....
define temp /display format = NEGPAREN16.2
Run;
Thanks
NEGPARENw.d format exists in 9.1.3, so there's no particular reason it wouldn't work the same in 9.1.3 as it would in later versions.
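One possible reason for the trouble, offered only as a guess from the snippet: TEMP is a character variable, and a variable cannot change type within a data step, so the numeric result needs its own variable. Also, reading digits that contain no decimal point with the 16.2 informat divides the value by 100, so a plain 16. width is used here. The names negparen_demo, temp_char and temp_num are hypothetical.

```sas
data work.negparen_demo;
  temp_char = '00005000-';
  /* strip the trailing sign and read the digits as a whole number */
  temp_num = input(compress(temp_char, '-'), 16.);
  /* restore the sign if the original text carried a minus */
  if index(temp_char, '-') > 0 then temp_num = -temp_num;
  /* display negative values in parentheses */
  format temp_num negparen16.2;
  put temp_num=;
run;
```

The same format can then be named on the DEFINE statement in PROC REPORT for the numeric variable.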