I am trying to create a table in SAS Visual Analytics that has two categories, X and Y, and shows each category's share of the total. My table looks something like this:
Category A    Category B    Total
40%           60%           100%
I was trying to follow the link below, but my version of SAS VA does not have the Aggregated Measure (tabular) option, so I do not know how to proceed.
How can I create this table without the aggregated tabular option?
https://communities.sas.com/t5/SAS-Communities-Library/SAS-Visual-Analytics-Report-Example-Percent-of-Total-For-All-For/ta-p/636030
To do this in VA 7.5, we'll use a Crosstab object, a transposed form of your data, and use the "Percent of row total" calculation option within the crosstab. Let's use the below data for our example:
data have;
input id x y;
datalines;
1 40 60
2 30 70
3 90 10
;
run;
Step 1: Transpose to long and create by-groups
Transpose your data so that it is in a long format, then load it and register it to LASR.
proc transpose data = have
out = want(rename=(COL1 = value))
name = category
;
by id;
var x y;
run;
Output:
id category value
1 x 40
1 y 60
2 x 30
2 y 70
3 x 90
3 y 10
Step 2: Create a crosstab
Change id to a category, then create a crosstab that looks like this:
Columns: category
Rows: id
Measures: value
Go to Options --> Scroll to the bottom --> expand "Totals and Subtotals," and Enable "Totals" for rows and set the Placement to "After."
Step 3: Create a row-level Percent Calculation
Right-click the header value within the table and select "Create and add calculation...".
Select "Percent of row total - Sum" under the "Type" drop-down menu.
Remove Value as a role from the crosstab graph, format Percent to have 0 decimal places, and you'll have a table with row-wise percentages.
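For readers who want to check the numbers outside VA, here is a minimal Python sketch of what a "percent of row total" calculation does, using the same three rows as the example data. It only illustrates the arithmetic; it is not how VA computes it internally, and the function name is made up for this sketch.

```python
# Data from the example above: one row per id with measures x and y.
rows = {1: {"x": 40, "y": 60}, 2: {"x": 30, "y": 70}, 3: {"x": 90, "y": 10}}

def percent_of_row_total(row):
    """Return each category's share of the row total, in percent."""
    total = sum(row.values())
    return {category: 100 * value / total for category, value in row.items()}

shares = {row_id: percent_of_row_total(row) for row_id, row in rows.items()}

for row_id, share in shares.items():
    # Format with 0 decimal places, as in the crosstab.
    print(row_id, {k: f"{v:.0f}%" for k, v in share.items()})
```

Each row's shares add up to 100%, which is what the "Totals" column in the crosstab should show.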
Related
I'm facing a challenge in Power BI where I need to join Table 1 and Table 2, but the catch is that Table 2 needs to be pivoted before joining.
Input:
Table 1
Table 2
Expected Output:
How can I build the output table when the number of rows in Table 2 increases daily?
Your desired result needs a combination of Unpivot, Merge and Pivot steps.
You can follow the following steps:
Convert the Date column to Text type.
Unpivot the columns Sales in KG and Sales in Amount from Table 2. This creates two columns called Attribute and Value.
Merge the columns Date and Attribute to create a new column (let's call it columnHeaders). Its values will be in the format 1/3/22 Sales in KG, 1/3/22 Sales in Amount, ...
Merge Table 1 into Table 2 and expand the Product Name column.
Now you will have 4 columns: Product Code, columnHeaders, Value, and Product Name.
Pivot columnHeaders using the Value column for values.
You should have your desired result.
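The steps above can be sketched in plain Python to show the shape of the data at each stage. The sample rows and product names are hypothetical (the original tables were posted as images), and this is only an illustration of the unpivot, merge, and pivot logic, not Power Query code.

```python
# Hypothetical sample data; the original tables are not reproduced here.
table1 = [
    {"Product Code": "P1", "Product Name": "Apples"},
    {"Product Code": "P2", "Product Name": "Pears"},
]
table2 = [
    {"Product Code": "P1", "Date": "1/3/22", "Sales in KG": 10, "Sales in Amount": 100},
    {"Product Code": "P2", "Date": "1/3/22", "Sales in KG": 5, "Sales in Amount": 80},
    {"Product Code": "P1", "Date": "2/3/22", "Sales in KG": 7, "Sales in Amount": 70},
]
measures = ["Sales in KG", "Sales in Amount"]

# Unpivot the measure columns, then merge Date and Attribute into one header.
long_rows = [
    {"Product Code": r["Product Code"],
     "columnHeaders": f'{r["Date"]} {m}',
     "Value": r[m]}
    for r in table2
    for m in measures
]

# Merge with Table 1 to pick up the Product Name.
names = {r["Product Code"]: r["Product Name"] for r in table1}

# Pivot: one output row per product, one column per merged header.
pivoted = {}
for r in long_rows:
    key = r["Product Code"]
    out = pivoted.setdefault(key, {"Product Code": key, "Product Name": names.get(key)})
    out[r["columnHeaders"]] = r["Value"]

result = list(pivoted.values())
for row in result:
    print(row)
```

Because the column headers are derived from the data, a new date in Table 2 simply produces new columns on the next run, which is the behaviour needed when rows are added daily.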
I don't know why you are trying to unpivot your table. Just use a Matrix visualization:
Model relationship:
Input data + output:
sum of kg = CALCULATE(sum(Table2[Sales in kg]))
Suppose I have the following dataset:
DATA have;
INPUT id date gain;
CARDS;
1 201405 100
2 201504 20
2 201504 30
2 201505 30
2 201505 50
3 201508 200
3 201509 200
3 201509 300
;
RUN;
I want to create a new table, want, that contains the average of the variable gain grouped by id and date. The final dataset should look like this:
DATA want;
INPUT id date average_gain;
CARDS;
1 201405 100
2 201504 25
2 201505 40
3 201508 200
3 201509 250
;
RUN;
I tried to obtain the desired result using the code below but it didn't work:
PROC sql;
CREATE TABLE want as
SELECT *,
mean(gain) as average_gain
FROM have
GROUP BY id, date
ORDER BY id, date
;
QUIT;
It's the asterisk that's causing the issue. It resolves to id, date, gain, and gain is neither a GROUP BY column nor aggregated, which is not what you want. ANSI SQL would not allow this type of query, so this is one way in which SAS differs from other SQL implementations.
There should be a note in the log about remerging with the original data, which is essentially what's happening. The summary values are remerged to every line.
To avoid this, list your group by fields in your query and it will work as expected.
PROC sql;
CREATE TABLE want as
SELECT id, date,
mean(gain) as average_gain
FROM have
GROUP BY id, date
ORDER BY id, date
;
QUIT;
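To make the difference concrete, here is a small Python sketch (an illustration only, not SAS) that uses the same rows as the have dataset. It builds the one-row-per-group result that the corrected query returns, and the remerged result that SELECT * produces, where the group mean is repeated on every detail row.

```python
from collections import defaultdict

# Same rows as the `have` dataset: (id, date, gain).
have = [(1, 201405, 100), (2, 201504, 20), (2, 201504, 30), (2, 201505, 30),
        (2, 201505, 50), (3, 201508, 200), (3, 201509, 200), (3, 201509, 300)]

# Collect gains per (id, date) group and average them.
groups = defaultdict(list)
for id_, date, gain in have:
    groups[(id_, date)].append(gain)
means = {key: sum(values) / len(values) for key, values in groups.items()}

# SELECT id, date, mean(gain): one row per group.
want = [(id_, date, avg) for (id_, date), avg in sorted(means.items())]

# SELECT *, mean(gain): the summary is remerged onto every original row.
remerged = [(id_, date, gain, means[(id_, date)]) for id_, date, gain in have]

print(len(want), "grouped rows;", len(remerged), "remerged rows")
```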
I will say, in general, PROC MEANS is usually a better option because:
calculate for multiple variables & statistics without need to list them all out multiple times
can get results at multiple levels, for example totals at grand total, id and group level
not all statistics can be calculated within PROC SQL
supports variable lists so you can shortcut reference long lists without any issues
I am looking to join two tables together
Table 1 - The baseball dataset
DATA baseball;
SET sashelp.baseball
(KEEP = crhits);
RUN;
Table 2 - A table containing the percentiles of CRhits
PROC STDIZE
DATA = baseball
OUT=_NULL_
PCTLMTD=ORD_STAT
PCTLDEF=5
OUTSTAT=STDLONGPCTLS
(WHERE = (SUBSTR(_TYPE_,1,1) = "P"))
pctlpts = 1 TO 99 BY 1;
RUN;
I would like to join these tables together to create a table that contains the values for crhits and then a column identifying which percentile that value belongs to like below
crhits percentile percentile_value
54 p3 54
66 p5 66
825 p63 825
1134 p76 1133
The last column indicates the percentile value given by stdlongpctls
I currently use the following code to calculate the percentiles and a loop to count the number of "Events" per percentile, per factor
I have tried a cross-join but I am having trouble visualising how to join these two tables without an explicit key
PROC SQL;
CREATE TABLE cross_join_table AS
SELECT
a.crhits
, b._TYPE_
, CASE WHEN
a.crhits < b.type THEN b._TYPE_ END AS percentile
FROM
baseball a
CROSS JOIN
stdlongpctls b;
QUIT;
I would also appreciate an easier or more efficient way to find the number of observations and the number of events per percentile group. In my actual dataset I am modelling a default flag, so the number of events is the sum of 1's per percentile group.
Use PROC RANK instead to group it into the percentiles.
proc rank data=sashelp.baseball out=baseball_ranks group=100;
var crhits;
ranks rank_crhits;
run;
You can then summarize it using PROC MEANS.
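As a rough illustration of the two steps (group, then summarize), here is a Python sketch. It assumes that GROUPS= assigns group numbers as floor(rank * groups / (n + 1)) with tied values receiving their mean rank; that is my reading of the PROC RANK documentation, so verify it against your SAS version. The crhits values and the event flag are hypothetical, and groups=4 is used instead of 100 only to keep the output short.

```python
import math
from collections import defaultdict

def rank_groups(values, groups=100):
    """Assign 0-based groups as floor(rank * groups / (n + 1)).

    Tied values get the mean of their 1-based ranks (assumed default).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [math.floor(r * groups / (n + 1)) for r in ranks]

# Hypothetical data: a measure and a 0/1 event flag per observation.
crhits = [54, 66, 825, 1134, 66, 300, 412, 90]
event = [1, 0, 0, 1, 1, 0, 0, 1]

group = rank_groups(crhits, groups=4)

# Summarize: number of observations and number of events per group.
summary = defaultdict(lambda: {"n": 0, "events": 0})
for g, e in zip(group, event):
    summary[g]["n"] += 1
    summary[g]["events"] += e

for g in sorted(summary):
    print(g, summary[g])
```

In SAS the summary step would be a PROC MEANS with the rank variable as the class variable and N and SUM as the requested statistics.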
I have a requirement where my data looks like this:
Date Name Age
1-1-2018 A 1
2-2-2018 B 1
3-3-2018 B 1
6-6-2018 C 2
7-7-2018 B 6
I am trying to give a slicer to the user to select the required number of months from the last month.
So to do that, I am using a calculated column like this:
Month Year = DATEDIFF((Table1[Date]), TODAY(), MONTH) + 1
So that changes the data to something like this:
Date Name Age MonthYear
1-1-2018 A 1 7
2-2-2018 B 1 6
3-3-2018 B 1 5
6-6-2018 C 2 2
7-7-2018 B 6 1
The user selects the Month Year from the Slicer.
For example, when he selects 2, I want to display the last 2 months records in the table.
Expected Output:
Date Name Age
6-6-2018 C 2
7-7-2018 B 6
This works for me if I hardcode it like this:
Calculated Table = CALCULATETABLE(Table1,
FILTER(Table1, OR(Table1[MonthYear] < 2, Table1[MonthYear] = 2)))
But it fails when I try to pass the value dynamically in place of 2 through a measure that uses the SELECTEDVALUE function.
Calculated columns and calculated tables cannot reference a slicer value, since they are computed only when the data is loaded or refreshed, not when a slicer selection changes.
If you want to apply this filtering to a visual, I'd suggest creating a separate table for your slicer. For example, you could use Months = GENERATESERIES(1,12) and then rename the column Months as well.
Use the Months[Months] column for your slicer and then create a measure which references it to filter your table/matrix visual.
Filter = IF(SELECTEDVALUE(Months[Months]) >= MAX(Table1[Month Year]), 1, 0)
Then add that measure to the visual-level filters of your table/matrix and keep only the rows where it equals 1.
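The logic of the Filter measure can be sketched in Python using the sample rows from the question. This is only an illustration of the comparison that the measure performs for each row of the visual; the function names are made up for the sketch and it is not DAX.

```python
# Sample rows from the question, including the Month Year calculated column.
rows = [
    {"Date": "1-1-2018", "Name": "A", "Age": 1, "Month Year": 7},
    {"Date": "2-2-2018", "Name": "B", "Age": 1, "Month Year": 6},
    {"Date": "3-3-2018", "Name": "B", "Age": 1, "Month Year": 5},
    {"Date": "6-6-2018", "Name": "C", "Age": 2, "Month Year": 2},
    {"Date": "7-7-2018", "Name": "B", "Age": 6, "Month Year": 1},
]

def filter_flag(selected_months, month_year):
    """Mirror of the measure: 1 when the row falls within the selected window."""
    return 1 if selected_months >= month_year else 0

def visible_rows(rows, selected_months):
    """Rows the visual keeps when the visual-level filter is set to 1."""
    return [r for r in rows if filter_flag(selected_months, r["Month Year"]) == 1]

last_two = visible_rows(rows, 2)
for r in last_two:
    print(r["Date"], r["Name"], r["Age"])
```

Selecting 2 in the slicer keeps only the rows whose Month Year is 1 or 2, which matches the expected output in the question.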
I'm trying to use SAS to compute a moving average over x periods that uses forecasted values in the calculation. For example, suppose I have a data set with ten observations for a variable and I want a 3-month moving average. The first forecast value should be the average of the last 3 observations, and the second forecast value should be the average of the last two observations and the first forecast value.
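The recursion described above can be written down in a few lines of Python to pin down the expected numbers. This is a sketch of the calculation only, not a SAS solution; the ten history values and the function name are hypothetical.

```python
def forecast_moving_average(history, window=3, steps=3):
    """Forecast `steps` values, each the mean of the last `window` values.

    Earlier forecasts are appended to the series, so later forecasts
    are computed from a mix of observed and forecasted values.
    """
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    series = list(history)
    forecasts = []
    for _ in range(steps):
        value = sum(series[-window:]) / window
        forecasts.append(value)
        series.append(value)
    return forecasts

# Hypothetical series of ten observations.
history = [10, 20, 30, 40, 50, 60, 70, 30, 60, 90]
forecasts = forecast_moving_average(history, window=3, steps=3)
print(forecasts)
```

Here the first forecast is the mean of 30, 60 and 90, and the second is the mean of 60, 90 and the first forecast.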
If you have for example data like this:
data input;
infile datalines;
length product $10 period value 8;
informat period yymmdd10.;
format period yymmdd10.;
input product $ period value;
datalines;
car 2016-01-01 10
car 2015-12-01 20
car 2015-11-01 30
car 2015-10-01 40
car 2015-09-01 30
car 2015-08-01 15
;
run;
You can left join the input table to itself with a condition:
input t1 left join input t2
on t1.product = t2.product
and t2.period between intnx('month',t1.period,-2,'b') and t1.period
group by t1.product, t1.period, t1.value
With this, t1.value is the current value and avg(t2.value) is the 3-month average. To compute the 2-month average, use the ifn() function to set every value older than the previous period to missing:
avg(ifn( t2.period >= intnx('month',t1.period,-1,'b'),t2.value,. ))
The full code could look like this:
proc sql;
create table want as
select t1.product, t1.period, t1.value as currentValue,
ifn(count(t2.period)>1,avg(ifn( t2.period >= intnx('month',t1.period,-1,'b'),t2.value,. )),.) as twoMonthsAVG,
ifn(count(t2.period)>2,avg(t2.value),.) as threeMonthsAVG
from input t1 left join input t2
on t1.product = t2.product
and t2.period between intnx('month',t1.period,-2,'b') and t1.period
group by t1.product, t1.period, t1.value
;
quit;
I've also added a count(t2.period) condition to return a missing value when there are not enough records to compute the measure. My result set looks like this: