SAS refer to a single digit out of a whole number - sas

Do I have a way to refer a value in the following way =>
value=23032017 and I want to refer to the digit in the forth place of this number, i.e 2

If it is a number then use some simple arithmetic.
data _null_;
value=23032017 ;
do position=1 to 9;
digit = mod(int(value/10**(position-1)),10);
put position= digit= ;
end;
run;
position=1 digit=7
position=2 digit=1
position=3 digit=0
position=4 digit=2
position=5 digit=3
position=6 digit=0
position=7 digit=3
position=8 digit=2
position=9 digit=0

Related

How to correct this sas function in order to have the jaccard distance?

I created a SAS function using fcmp to calculate the jaccard distance between two strings. I do not want to use macros, as I'm going to use it through a large dataset for multiples variables. the substrings I have are missing others.
proc fcmp outlib=work.functions.func;
function distance_jaccard(string1 $, string2 $);
n = length(string1);
m = length(string2);
ngrams1 = "";
do i = 1 to (n-1);
ngrams1 = cats(ngrams1, substr(string1, i, 2) || '*');
end;
/*ngrams1= ngrams1||'*';*/
put ngrams1=;
ngrams2 = "";
do j = 1 to (m-1);
ngrams2 = cats(ngrams2, substr(string2, j, 2) || '*');
end;
endsub;
options cmplib=(work.functions);
data test;
string1 = "joubrel";
string2 = "farjoubrel";
jaccard_distance = distance_jaccard(string1, string2);
run;
I expected ngrams1 and ngrams2 to contain all the substrings of length 2 instead I got this
ngrams1=jo*ou*ub
ngrams2=fa*ar*rj
If you want real help with your algorithm you need to explain in words what you want to do.
I suspect your problem is that you never defined how long you new character variables NGRAM1 and NGRAM2 should be. From the output you show it appears that FCMP defaulted them to length $8.
To define a variable you need use a LENGTH statement (or an ATTRIB statement with the LENGTH= option) before you start referencing the variable.

How to make my output to read as yes or no?

I want to display my data as either yes or no in the output for initaltesting, site visit, and follow up, how would I do that? There are numeric values for this on the data set but want character responses of "y" or "n"
PROC FORMAT;
VALUE SiteVisitfmt 1 = 'yes'
0 = 'no';
VALUE InitialTestingfmt 1 = 'yes'
2 = 'no';
VALUE TestEventfmt 1 = 'One Event '
2 = 'Two Events'
3 = 'Three Events'
4 = 'Four Events'
5 = 'Five Events';
VALUE FollowUpfmt 1 = 'yes'
0 = 'no';
FORMAT SiteVisit SiteVisitfmt. InitialTesting InitialTestingfmt. TestEvent TestEventfmt.
FollowUp FollowUpfmt.;
RUN;
data PMdataedits;
set PMdata (rename = (Number_of_Days_from_Onset_to_Sit =SiteVisit
Number_of_Days_between_Onset_and = InitialTesting
Number_of_Test_Events_in_IRIS = TestEvent
Number_of_Days_between_Test_1_an = FollowUp));
drop SPA;
attrib date1 format=date9.;
date1=input(date,mmddyy10.);
NewSiteVisit = put(SiteVisit, 8.);
NewInitialTesting = put(InitialTesting, 8.);
NewFollowUp = put(FollowUp, 8.);
NewSiteVisit=;
if (NewSiteVisit=<1) THEN NewSiteVisit= '1';
if (NewSiteVisit>1) THEN NewSiteVist= '0';
NewInitialTesting=;
if (NewInitialTesting<=2) THEN NewInitialTesting= '1';
if (NewInitialTesting>2) THEN NewInitialTesting='0';
This statement:
FORMAT SiteVisit SiteVisitfmt. InitialTesting InitialTestingfmt. TestEvent TestEventfmt.
FollowUp FollowUpfmt.;
Needs to be on the data step (sometime after data PMdataedits; but before the run; that you don't show), not in the proc format. That's the statement that assigns the format to a variable; each dataset (which is defined by a data step) has its own, unique set of variables that can be the same name as other datasets but have different contents and formats.
Also note that you don't have to name the formats after the variables, and don't need three different yes/no formats. You could have done:
proc format;
format ynf
'1'='yes'
'0'='no'
;
run;
And then used
format sitevisit initialtesting followup ynf.;
And that would have covered all three of them with one format. But what you did is legal, it's just more typing than you need!

Is there a SAS function to delete negative and missing values from a variable in a dataset?

Variable name is PRC. This is what I have so far. First block to delete negative values. Second block is to delete missing values.
data work.crspselected;
set work.crspraw;
where crspyear=2016;
if (PRC < 0)
then delete;
where ticker = 'SKYW';
run;
data work.crspselected;
set work.crspraw;
where ticker = 'SKYW';
where crspyear=2016;
where=(PRC ne .) ;
run;
Instead of using a function to remove negative and missing values, it can be done more simply when inputting or outputting the data. It can also be done with only one data step:
data work.crspselected;
set work.crspraw(where = (PRC >= 0 & PRC ^= .)); * delete values that are negative and missing;
where crspyear = 2016;
where ticker = 'SKYW';
run;
The section that does it is:
(where = (PRC >= 0 & PRC ^= .))
Which can be done for either the input dataset (work.crspraw) or the output dataset (work.crspselected).
If you must use a function, then the function missing() includes only missing values as per this answer. Hence ^missing() would do the opposite and include only non-missing values. There is not a function for non-negative values. But I think it's easier and quicker to do both together simultaneously without a function.
You don't need more than your first test to remove negative and missing values. SAS treats all 28 missing values (., ._, .A ... .Z) as less than any actual number.

How can I sort variables based on part of a string variable?

I have a dataset with string variables and I am trying to generate a new binary variable based on the first two characters. All strings are 5 characters long, but I'm only concerned with the first two in order to sort.
For example, I could have 22001 and 22005. Since both are of the form 22XXX, I want to assign value 1 for both in the variable type_A. And if I have 25001 and 25005, since both are not of the form 22XXX, I want to assign value 0 for both in the variable type_A.
This should do the job:
clear
set obs 4
generate str5 var1 = "22001" in 1
replace var1 = "22005" in 2
replace var1 = "25001" in 3
replace var1 = "25005" in 4
gen type_A = substr(var1, 1, 2) == "22"
Please note that as you explain your problem it looks like you you are storing 22005 as text - which may not necessarily be the best idea..

how to extract last 4 characters of the string in SAS

improved formatting,I am a bit stuck where I am not able to extract the last 4 characters of the string., when I write :-
indikan=substr(Indikation,length(Indikation)-3,4);
It is giving invalid argument.
how to do this?
This code works:
data temp;
indikation = "Idontknow";
run;
data temp;
set temp;
indikan = substrn(indikation,max(1,length(indikation)-3),4);
run;
Can you provide more context on the variable? If indikation is length 3 or smaller than I could see this erroring or if it was numeric it may cause issues because it right justifies the numbers (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245907.htm).
If it's likely to be under four characters in some cases, I would recommend adding max:
indikan = substrn(indikation,max(1,length(indikation)-3),4);
I've also added substrn as Rob suggests given it better handles a not-long-enough string.
Or one could use the reverse function twice, like this:
data _null_;
my_string = "Fri Apr 22 13:52:55 +0000 2016";
_day = substr(my_string, 9, 2);
_month = lowcase(substr(my_string, 5, 3));
* Check the _year out;
_year = reverse(substr(reverse(trim(my_string)), 1, 4));
created_at = input(compress(_day || _month || _year), date9.);
put my_string=;
put created_at=weekdatx29.;
run;
Wrong results might be caused by trailing blanks:
so, before you perform substr, strip/trim your string:
indikan=substr(strip(Indikation),length(strip(Indikation))-3);
must give you last 4 characters
Or you can try this approach, which, while initially a bit less intuitive, is stable, shorter, uses fewer functions, and works with numeric and text values:
indikan = prxchange("s/.*(.{4}$)/$1/",1,indikation);
data temp;
input trt$;
cards;
treat123
treat121
treat21
treat1
treat1
trea2
;run;
data abc;
set temp;
b=substr(trt,length(trt)-3);
run;
[Output]
Output: