Create SAS dataset with variable number of attributes

I need to create a dataset in SAS with a variable number of attribute names.
I'm not very proficient in SAS, so I'm writing the logic in pseudocode:
for(i=1 to 10)
{
for (j=1 to n)
{
Combinations(j,i);
}
//perform some calculations on the temporary average table and delete it
}
The problem is in the combinations function:
combinations(i,j)
{
//find all possible combinations
//find average of all combinations
}
I now need to store all the averages in a temporary table/dataset.
For example, for i=2, j=5, I'll have ten combinations for each value of j,
so the column count will be 10 and the row count will be 2.
This table should be a dynamic dataset, I guess.
I'm not really sure what to do; I'm just stuck.
Any help will be much appreciated.
Thanks

Likely the best solution is to initially create the i,j dataset as vertical - with each eventual-variable as a row - and then use PROC TRANSPOSE to transpose it to horizontal. You can use the ID statement to name the variable.
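A minimal sketch of that vertical-then-transpose idea follows. The dataset names (averages_long, averages_wide), the variables (i, combo_id, avg) and the sample values are all hypothetical placeholders, not taken from the question; the real vertical table would be filled by whatever step computes the combination averages.

```sas
/* Hypothetical vertical table: one row per (i, combination) average. */
data averages_long;
    input i combo_id avg;
    datalines;
1 1 2.5
1 2 3.0
1 3 3.5
2 1 4.0
2 2 4.5
2 3 5.0
;
run;

/* One output row per value of i; one column per combo_id.
   ID supplies the column names, PREFIX= makes them valid SAS names. */
proc transpose data=averages_long
               out=averages_wide(drop=_name_)
               prefix=combo_;
    by i;          /* input must be sorted by i */
    id combo_id;
    var avg;
run;
```

Because the column names come from the values of combo_id, the number of columns follows the data and does not need to be known in advance.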

How can I sort from highest to lowest column values in Power BI Matrix Visual

I am trying to figure out how to sort column values from highest to lowest in a Power BI Matrix Visual. I have a small matrix with 3 columns, "No", "Yes" and "Total", and the rows hold the names of some people.
What I want to do is sort values from highest to lowest in the "No" column, but when I click on "Sort by" I only get the option to sort by the total count or by the names of the people. I have added a picture below for better context. Any help will be much appreciated!
This can be done by creating three measures. Don't use the implicit measures, always create your own.
First, create the measure that generates the total you use right now. It may be a count, it may be a sum, I cannot tell because I don't know your data source. Let's call that measure "total".
Assuming your data source has a column with the "yes" and "no" values, and assuming the name of that column is "status", you can then create two additional measures.
TotalYes = CALCULATE([total],'Table'[status]="yes")
TotalNo = CALCULATE([total],'Table'[status]="no")
Add these measures to the matrix and remove the status column from the columns well. You can now sort the matrix by the "TotalNo" column. Of course, you can rename the column in the matrix, so it just says "No".

Calculated field Sub total in pivot table is not displaying correct value

I am working on QuickSight in AWS. I am trying to achieve weighted average value in a Pivot table.
I am using SPICE data to create this analysis.
I have created a calculated field (WAM) in the analysis with the formula "percentOfTotal(sum(upb),[{pool_num}]) * sum({remaining_terms})".
This gives me the desired value at row level, but the subtotal of that column does not reflect the total of the values in the calculated field; instead it displays the sum of the original values in the "remaining_terms" field.
Please see the image below. Can someone please throw some light on this?
Thanks in advance for your help
Please note that I have tried the same in Excel pivot table and it works perfectly.
Try to remove the 2nd argument from the percentOfTotal function. For example, just do:
percentOfTotal(sum(upb))
I am not 100% sure this will work, but one thought is that the subtotal would match the remaining_terms value if percentOfTotal was 1 (i.e. 100%), and you may not need to provide a partition argument in a pivot table, since pivot tables implicitly provide partitions.
I have solved the problem in a different way. See below what I have done.
WAM = percentOfTotal(sum(upb),[{pool_num}]) * sum({remaining_terms}).
It looks like QuickSight treats the subtotal as a row and applies the above function to that subtotal row, so it evaluates to
(1186272.5 / 1186272.5) * 31 = 31.
I tried to produce the desired result by introducing another custom field with the formula
SUM_WAM = sumOver({WAM},[{pool_num}]).
This gives me the output I need, but in a column. See the attached screenshot.

SAS replicates values

I have a table with several million records. In it, I have a column that looks like this (it goes from 1 to 7, repeated hundreds of times)
I would like to add an index (say nweeks) that looks like this,
Any ideas?
Thanks
Without seeing more of the data table and its potential natural ordering columns, you could create a DATA step view:
data work.big_with_week / view=work.big_with_week;
set big;
if list = 1 then nweek + 1;
run;
The syntax variable+expression is known as a SUM statement.
The sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:
retain variable 0;
variable=sum(variable,expression);
Thus, the retained variable nweek is only incremented when the list value is 1. If your big data ever becomes disordered, or otherwise fails to uphold the implicit contract that list is sequenced 1..7, the view will not be accurate.
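To see the equivalence on a small scale, here is a sketch that runs both forms side by side. The dataset names small and check, the variable nweek_alt, and the sample values are made up for illustration; only list and nweek come from the answer above.

```sas
/* Hypothetical sample: two full 1..7 cycles and the start of a third. */
data small;
    input list @@;
    datalines;
1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
;
run;

data check;
    set small;
    /* Sum statement: nweek is retained implicitly and starts at 0. */
    if list = 1 then nweek + 1;
    /* Explicit RETAIN + SUM function, written out for comparison. */
    retain nweek_alt 0;
    if list = 1 then nweek_alt = sum(nweek_alt, 1);
run;

proc print data=check;
run;
```

Both counters should advance together each time list returns to 1, which is the behaviour the view relies on.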

Macro that outputs table with testing results of SAS table

Problem
I'm not a very experienced SAS user, but unfortunately the lab where I can access data is restricted to SAS. Also, I don't currently have access to the data since it is only available in the lab, so I've created simulated data for testing.
I need to create a macro that gets the values and dimensions from a PROC MEANS table and performs some tests that check whether or not the top two values from the data make up 90% of the results.
As an example, assume I have panel data that lists firms' revenue, costs, and profits. I've created a table that lists n, sum, mean, median, and std. Now I need to check whether or not the top two firms make up 90% of the results and, if so, flag whether it's profit, revenue, or costs that makes up 90%.
I'm not sure how to get started.
Here are the steps:
1. Read the data
2. Read the PROC MEANS table created, get dimensions, and variables.
3. Get top two firms in each variable and perform check
4. Create new table that lists variable, value from read table, largest and second largest, and flag.
5. Then print table
Simulated data :
https://www.dropbox.com/s/ypmri8s6i8irn8a/dataset.csv?dl=0
PROC MEANS Table
proc import datafile="/folders/myfolders/dataset.csv"
out=dt
dbms=csv
replace;
getnames=yes;
run;
TITLE "Macro Project Sample";
PROC MEANS n sum mean median std;
VAR V1 V2 V3;
RUN;
Desired Results:
Variable  Value       Largest   Sec. Largest  Flag
V1        463138.09   9888.09   9847.13
V2        148.92      1.99      1.99
V3        11503375    9999900   1000000       Y
At the moment I can't open your simulated dataset, but I can give you some advice that I hope will help.
You can add the n extreme values of given variables using the 'output out=' statement with the IDGROUP option.
Here is an example using the Charity dataset (run this to create it: http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p1oii7oi6k9gfxn19hxiiszb70ms.htm)
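proc means data=Charity;
var MoneyRaised HoursVolunteered;
/* One IDGROUP per variable, so each top-two list is ranked by its own variable */
output out=try sum=
IDGROUP ( MAX (MoneyRaised) OUT[2] (MoneyRaised)=max1)
IDGROUP ( MAX (HoursVolunteered) OUT[2] (HoursVolunteered)=max2);
run;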
data var1 (keep=name1 _freq_ moneyraised max1_1 max1_2 rename=(moneyraised=value max1_1=largest max1_2=seclargest name1=name))
var2 (keep=name2 _freq_ HoursVolunteered max2_1 max2_2 rename=(HoursVolunteered=value max2_1=largest max2_2=seclargest name2=name));
length name1 name2 $4;
set try ;
name1='VAR1';
name2='VAR2';
run;
data finalmerge;
length flag $1;
set var1 var2;
if largest+seclargest > value*0.9 then flag='Y';
run;
In the PROC MEANS I chose two variables, MoneyRaised and HoursVolunteered; you will use your own V1, V2 and V3 and adapt the rest of the program accordingly.
IDGROUP outputs the extreme values of the variables named in the parentheses; with OUT[2], that means the largest and second largest.
You must name the output variables. I chose max1 and max2, and SAS automatically appends _1 and _2 for the first and second maximum values.
All the output is on a single row, so I use a DATA step with two output datasets (data var1 var2), keeping the variables needed and renaming them so they can be stacked in the next step; I also chose a naming scheme, as you can see.
Finally I combine the two datasets and add the flag.
Here are some initial steps and pointers for a non-macro approach that restructures the data so that no array processing is required. This approach should be good for teaching you a bit about manipulating data in SAS, but it will not be as fast as a single-pass approach (like the macros you originally posted), because it transposes and sorts the data.
First create some nice looking dummy data.
/* Create some dummy data with three variables to assess */
data have;
do firm = 1 to 3;
revenue = rand("uniform");
costs = rand("uniform");
profits = rand("uniform");
output;
end;
run;
Transpose the data so all the values are in one column (with the variable names in another).
/* Move from wide to deep table */
proc transpose
data = have
out = trans
name = Variable;
by firm;
var revenue costs profits;
run;
Sort the data so each variable is in a contiguous group of rows and the highest values are at the end of each Variable group.
/* Sort by Variable and then value
so the biggest values are at the end of each Variable group */
proc sort data = trans;
by Variable COL1;
run;
Because of the structure of this data, you could go down through each observation in turn, creating a running total, which when you get to the final observation in a Variable group would be the Variable total. In this observation you also have the largest value (the second largest was in the previous observation).
At this point you can create a data step that:
- Is aware when it is in the first and last values of each variable group
  - by statement to make the data step aware of your groups
  - first.Variable temporary variable so you can initialise your total variable to 0
  - last.Variable temporary variable so you can output only the last line of each group
- Sums up the values in each group
  - retain statement so SAS doesn't empty your total with each new observation
  - sum() function or + operator to create your total
- Creates and populates new variables for the largest and second largest values in each group
  - lag() function or retain statement to keep the previous value (the second largest)
- Creates your flag
- Outputs your new variables at the end of each group
  - output statement to request an observation be stored
  - keep statement to select which variables you want
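One way the data step outlined above could look is sketched here. It assumes the trans dataset produced by the PROC TRANSPOSE and PROC SORT steps earlier (sorted by Variable and COL1); the output dataset name want, the new variable names (total, largest, second_largest, flag) and the strict "greater than 90%" test are illustrative choices rather than anything fixed by the question.

```sas
data want (keep = Variable total largest second_largest flag);
    set trans;
    by Variable;                    /* enables first.Variable / last.Variable */
    length flag $1;
    retain largest second_largest;  /* keep values across observations */

    if first.Variable then do;      /* reset at the start of each group */
        total = 0;
        largest = .;
        second_largest = .;
    end;

    total + COL1;                   /* sum statement: running group total */

    /* Values arrive in ascending order, so the previous "largest"
       becomes the second largest and the current value the largest. */
    second_largest = largest;
    largest = COL1;

    if last.Variable then do;       /* one output row per Variable */
        if sum(largest, second_largest) > 0.9 * total then flag = 'Y';
        output;
    end;
run;
```

The sketch uses retain rather than lag() because lag() returns the value from its previous call, which is easy to get wrong inside conditional logic.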
The macros you posted originally looked like they were meant to perform the analysis you are describing, but with some extras: only positive values contributed to the total, an arbitrary number of values could be included rather than just the top 2, the total was multiplied by another variable (k1198), negative values were caught in the second largest, and extra flags and values were calculated.

"Automatically" calculate linear combination of parameter estimates with PROC GLM

Background: I have a categorical variable, X, with four levels that I fit as separate dummy variables. Thus, there are three total dummy variables representing x=1, x=2, x=3 (x=0 is baseline).
Problem/issue: I want to be able to calculate the value of a linear combination (i.e. using SAS as a calculator) of these dummy variables. For example, 2*B1 + 2*B2 + B3.
In Stata, this can be done using the lincom command, which uses the stored beta estimates to calculate linear combinations of the parameters.
In SAS in a procedure such as PROC GLM, I think I should use the ESTIMATE statement, but I'm not sure how I would specify the "weights" for each variable in this case.
You are looking for PROC SCORE. This takes output regression or factor estimates and scores a new data set. See here for an example. http://support.sas.com/documentation/cdl/en/statug/66859/HTML/default/viewer.htm#statug_score_examples02.htm
FYI, PROC MODEL does allow this in the model statement, which may be less work than PROC SCORE. I know PROC MODEL can be used readily in place of PROC REG, but I'm not sure how advanced the modeling in PROC MODEL can be, so it may not be an option for more complex models. I was hoping for something with less coding, but given the nature of SAS, I think this and PROC SCORE are the best I'm going to get.
What if you add your linear combination as a variable in your input dataset?
data myDatasetWithLinCom;
set mydata;
LinComb=2*(x=1)+ 2*(x=2)+(x=3); /* equivalent to 2*B1 + 2*B2 + B3 */
run;
Then you can specify LinComb as one of the explanatory variables and look up the coefficient directly from the output.
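For the ESTIMATE route raised in the question, a minimal sketch is given here. It assumes x is declared in a CLASS statement with levels 0, 1, 2, 3 (so the coefficients are listed in that sorted order), and that the response is called y in a dataset called mydata; y is a placeholder name. With x=0 as baseline, each B_k is the difference between level k and level 0, so 2*B1 + 2*B2 + B3 corresponds to coefficients -5, 2, 2, 1 on the four levels.

```sas
proc glm data=mydata;
    class x;                       /* levels sorted as 0, 1, 2, 3 */
    model y = x / solution;
    /* 2*(L1-L0) + 2*(L2-L0) + (L3-L0) = -5*L0 + 2*L1 + 2*L2 + 1*L3 */
    estimate '2*B1 + 2*B2 + B3' x -5 2 2 1;
run;
quit;
```

The coefficients sum to zero, so the combination is a contrast among the levels of x and does not depend on which level PROC GLM uses internally as its reference.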