SAS replicates values

I have a table with a few million records. In it, I have a column that looks like this (it goes from 1 to 7, hundreds of times over).
I would like to add an index (say nweeks) that looks like this.
Any ideas?
Thanks

Without seeing more of the data table and its potential natural ordering columns, you could create a DATA step view:
data work.big_with_week / view=work.big_with_week;
  set big;
  if list = 1 then nweek + 1;   /* SUM statement: bump the counter whenever list restarts at 1 */
run;
The syntax variable + expression; is known as a SUM statement.
The SUM statement is equivalent to using the SUM function together with a RETAIN statement, as shown here:
retain variable 0;
variable=sum(variable,expression);
Thus, the retained variable nweek is only incremented when the list value is 1. If your big data ever becomes disordered, or otherwise does not uphold the implicit contract of list being sequenced 1..7, the view will not be accurate.
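For reference, here is a sketch of the same view written in the explicit RETAIN/SUM-function form described above (the names big, list, and nweek come from the question; the rest of the table is assumed):

data work.big_with_week / view=work.big_with_week;
  set big;
  retain nweek 0;                          /* carry nweek forward across rows */
  if list = 1 then nweek = sum(nweek, 1);  /* a new 1..7 cycle starts, so bump the week counter */
run;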

Related

List and for in Power BI

I have a column "answers" in my data whose values can be a,b,c up to the letter z. I want to create a "conditional" column that would always result in 1 when the row result equals my list of correct answers ={a,d,f,g,j). I did it with the switch, but as this list can change, adding more letters, it is not automatic, being necessary for me to manually insert each letter that represents the answer into the switch. Is it possible to use a "for" inside power BI so that it automates this?
I thought about something like:
list = {a, d, f, g, j}
for k in list:
    if responses[k] = list:
        1
    else:
        0
I don't know how to structure this in the context of DAX or the M language.

How to create a variable from 1 to n (increments of 1)?

Assume that in Stata, e.g., I have one stacked variable (in column 2) of stock returns, with data populating rows 1 to 2,000,000 (some blanks are replaced with dots). How can I create another variable next to it that starts at 1 and moves in increments of one (1, 2, 3, 4, ...) all the way down to 2,000,000? I need this kind of variable to merge datasets. Advice would be much appreciated.
If it helps: if I were to use VBA, I would find the last row of the stacked column and then create a variable on that basis, moving in increments of one (that would be, of course, if Excel allowed 2 million rows).
gen long id = _n
will populate a variable with the observation number.
Note that you can merge on observation number. You don't need any identifier variable(s) to do it. In practice, I would almost always be very queasy about any merge not based on explicit identifiers, unless the datasets were visibly compatible (not so with 2 million observations).

Is sorting more favorable (efficient) for if-else statements?

Assume two functions fun1 and fun2 have been defined to carry out some calculation given an input x.
The structure of the dataset have is:
Day      Group  x
01Jul14  A      1.5
02Jul14  B      2.7
I want to do something like this:
data want;
  set have;
  if Group = 'A' then y = fun1(x);
  if Group = 'B' then y = fun2(x);
run;
Is it better to run proc sort data=have; by Group; run; first and then move on to the data step? Or does it not matter, because each time SAS just picks one observation and determines which if statement it falls into?
So long as you are not doing anything to alter the normal input of observations - such as using random access (point=), building a hash table, using a by statement, etc. - sorting will have no impact: you read each row regardless, check both if conditions, and execute whichever one applies. Nothing different happens whether the data are sorted or unsorted.
This is easy to test. Write something like this:
%put Before Unsorted Time: %sysfunc(time(),time8.);
***your datastep here***;
%put After Unsorted Time: %sysfunc(time(),time8.);
proc sort data=your_dataset;
by x;
run;
%put Before Sorted Time: %sysfunc(time(),time8.);
***your datastep here***;
%put After Sorted Time: %sysfunc(time(),time8.);
Or just run your datasteps and look at the execution time!
You may be confusing this with sorting your if statements (i.e., changing their order in the code). That could have an impact if your data is skewed and you use else, because SAS then won't have to evaluate the downstream conditions once one is true. It's not very common for this to have any real impact - it only matters when you have extremely skewed data, large numbers of observations, and certain other conditions depending on your code - so I wouldn't program for it.
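For illustration, a sketch of that re-ordered if/else form (assuming, hypothetically, that group 'A' is by far the most common, and that fun1 and fun2 are available as in the question):

data want;
  set have;
  if Group = 'A' then y = fun1(x);        /* most frequent group checked first */
  else if Group = 'B' then y = fun2(x);   /* only evaluated when Group is not 'A' */
run;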

sas collapsing categorical variables clustering analysis

I came across the following code in the logistic regression modeling course offered by SAS:
data dataset(drop=i);
  set data;
  array mi{*} mi_Ag mi_Inc mi_WR;
  array x{*} Ag Inc WR;
  do i=1 to dim(mi);
    mi{i}=(x{i}=.);
  end;
run;
I need to understand two things:
1.) There is a column titled "i" created once this data step is run. What does that signify, and why is it there? The drop=i essentially drops it, but if I don't use the drop option the column stays in the data set.
2.) This do loop is replacing all the missing values with a 1 and the rest with 0. How is that happening when nothing in the do loop clearly specifies what needs to be done? In my eyes, do i=1 to dim(mi); mi{i}=(x{i}=.); should simply put dots in mi{i} wherever it finds dots in x{i}.
Part 2:
While collapsing the categorical variable, the following code has been used:
proc freq data=example1 noprint;
  tables CLUSTER_CODE*TARGET_B / chisq;
  output out=out_chi(keep=_pchi_) chisq;
run;

data ex_cutoff;
  if _n_=1 then set out_chi;
  set ex_cluster;
  chisquare=_pchi_*rsquared;
  degfree=numberofclusters-1;
  logpvalue=logsdf('CHISQ',chisquare,degfree);
run;
What is _n_=1 doing? And also, why are we creating chisquare=_pchi_*rsquared? _pchi_ is already the chi-square statistic, so what is the point of multiplying it by the R-square?
Thanks
P.S. The code is from one of the SAS learning courses. Hopefully I am allowed to share it here for discussion/learning purposes.
i is the array iterator (created in the do loop). It's dropped since it's not really intended to be kept on the dataset; it's just an iterator, letting you go through the array one element at a time and reference a single element during each iteration.
mi{i}=(x{i}=.); is assigning 1/0 like this:
x{i}=. is either true or false. If it is true, it evaluates to 1. If it is false, it evaluates to 0. Thus, when it's true that x{i}=., then mi{i} is assigned a 1; otherwise it is assigned a 0. That's just how SAS works with boolean (True/False) values; many other languages work that way as well (True is nonzero, False is zero), and when converted to a number, True is converted to 1 (while any nonzero, nonmissing value is 'True' when converted the other way around).
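A minimal sketch of that evaluation, using a single hypothetical numeric variable x:

data _null_;
  x = .;
  flag = (x = .);   /* the comparison is true, so flag is assigned 1 */
  put flag=;        /* prints flag=1 */
  x = 5;
  flag = (x = .);   /* the comparison is false, so flag is assigned 0 */
  put flag=;        /* prints flag=0 */
run;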

create SAS dataset with variable number of attributes

I need to create a dataset in SAS with a variable number of attribute names.
I'm not so proficient in SAS, so I'm writing the logic in pseudocode:
for (i = 1 to 10)
{
    for (j = 1 to n)
    {
        Combinations(j, i);
    }
    // perform some calculations on the temporary average table and delete it
}
The problem is in the Combinations function. Here:
combinations(i, j)
{
    // find all possible combinations
    // find the average of all combinations
}
I now need to store all the averages in a temporary table/dataset.
For example, for i=2, j=5, I'll have ten combinations for each value of j,
so the column count will be 10 and the row count will be 2.
This table should be a dynamic dataset, I guess.
I'm not really sure what to do, just stuck.
Any help will be much appreciated.
Thanks
Likely the best solution is to initially create the i,j dataset as vertical - with each eventual variable as a row - and then use PROC TRANSPOSE to transpose it to horizontal. You can use the ID statement to name the variables.
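A minimal sketch of that approach, using hypothetical names (averages_long, varname, avg, averages_wide):

/* vertical dataset: one row per eventual variable */
data averages_long;
  length varname $32;
  input varname $ avg;
  datalines;
comb1 1.5
comb2 2.7
comb3 3.1
;
run;

/* transpose to horizontal; the ID statement names the new variables from varname */
proc transpose data=averages_long out=averages_wide(drop=_name_);
  id varname;
  var avg;
run;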