Calculate Median after Summarize with detail in Stata


The summarize command stores various scalar results in Stata. For instance, one can save the mean or the min/max values through gen mean = r(mean) afterwards.
It is also possible to get more sophisticated measures via the detail option, as in summarize varname, detail. Through this, one also obtains the median in the form of the 50th percentile.
My goal is to store the median. Is there a corresponding scalar?
Where can I obtain information on stored scalars after standard operations like summarize? As far as I can see they are not listed in the Stata manuals.

After each command, one can find out which results were saved by running return list (for r-class commands such as summarize) or ereturn list (for e-class estimation commands).
In the case of summarize varname, detail the median can be obtained through r(p50).
summarize varname, detail
return list
local var_median = r(p50)

The scalars stored by the summarize command are documented at the end of the output of help summarize and in the "Stored results" section of the Stata manual documentation for the summarize command found in the Stata Base Reference Manual PDF included with your Stata installation. In general, returned results for all commands are found in locations analogous to these.

Related

How is it possible for DAX syntax to reference the original table name when using table variables?

This question comes from an example that I'm trying to understand in The Definitive Guide to DAX, Second Edition chapter 4. If you want the sample Power BI file, you can download it from the website above; it's Figure 4-26 in chapter 4. Here is the DAX code:
Correct Average =
VAR CustomersAge =
    SUMMARIZE (                 -- Existing combinations
        Sales,                  -- that exist in Sales
        Sales[CustomerKey],     -- of the customer key and
        Sales[Customer Age]     -- the customer age
    )
RETURN
    AVERAGEX (                  -- Iterate on list of
        CustomersAge,           -- Customers/age in Sales
        Sales[Customer Age]     -- and average the customer’s age
    )
I understand the logic behind how SUMMARIZE and AVERAGEX are used in this example, and the requirements are all clear. What's confusing to me is how AVERAGEX references Sales[Customer Age]. Since AVERAGEX is operating on the summarized CustomersAge table variable, I would have assumed that the syntax would have been something along the lines of:
AVERAGEX (
    CustomersAge,
    [Customer Age] -- This is the line that I assumed would be different
)
How is it that the code given in the book is correct? Does the table variable (and the summarized table it contains) somehow have pointers to the original underlying table and column names? And is that normal for writing DAX queries, to always reference the original underlying table and column names when using table variables for intermediate steps?

Yes, the columns have what's known as data lineage. Sometimes you even have to restore lineage if it gets lost. You can read more about it here: https://www.sqlbi.com/articles/understanding-data-lineage-in-dax/

Lars, to the best of my understanding, this is how I can explain it.
Creating a variable doesn't create a table that is added to the model. You can think of variables as steps or placeholders in a series of DAX expressions.
So in the case of the SUMMARIZE used in the CustomersAge variable in this code, you can see that the arguments of SUMMARIZE reference actual columns in the model. When you perform calculations on that variable, the columns you can access are therefore the actual columns in the model rather than new columns.
What the variable has done is help you break down the process of writing the calculation and make it less complex.
The code you wrote would have been valid if, in the CustomersAge variable, we had created a new column, say Age * 2, and needed to average over that. In that case the new column isn't part of the model, so we'd reference it the way you wrote.
I just got my copy of the book but I hope this helps a bit.

SUMIF type column in PowerBI

I want to count the number of stores in a particular region within Power BI, similar to how you would use a SUMIF in Excel.
Below is a rough example of what I mean (and the data in its current format) as I am unable to share actual snips due to sensitive information.
I'm happy for any working solution, even if the count of stores is repeated on the store lines.
Thanks.

Proc reg using by variable (month): How do you take average of all coefficients across all months?

How do you take an average of the coefficients across all months?
Please refer to this earlier question:
How do I perform regression by month on the same SAS data set?

The comments in the linked question provide the code to get the estimates into a data set. You would then run PROC MEANS on the saved data set to get the averages. You could also run the model without a BY variable to get a single set of estimates across all months. In general, it isn't common to average parameter estimates this way, except in a bootstrapping process.
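To make the arithmetic concrete, here is a minimal sketch in Python rather than SAS. It is not the code from the linked question: the data, the `ols` helper and all variable names are made up for illustration. It fits one simple regression per month, collects the estimates, and then takes a plain average of them, which is the role PROC MEANS plays on the saved estimates data set.

```python
from statistics import mean

def ols(points):
    """Return (intercept, slope) of the least-squares line through (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: month -> list of (x, y) observations.
data_by_month = {
    1: [(1, 3), (2, 5), (3, 7)],    # lies exactly on y = 1 + 2x
    2: [(1, 5), (2, 9), (3, 13)],   # lies exactly on y = 1 + 4x
}

# One set of estimates per month (the "BY month" step).
estimates = {month: ols(points) for month, points in data_by_month.items()}

# Unweighted average of each coefficient across months.
avg_intercept = mean(b0 for b0, _ in estimates.values())
avg_slope = mean(b1 for _, b1 in estimates.values())

print(avg_intercept, avg_slope)
```

Note that this is an unweighted average, so a month with few observations counts as much as a month with many.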

Power BI How to Sum Based on If a Column Contains String from Other Column

I have an Entity table with one row per entity. This table has three columns: Entity ID, a Descriptor, and a Metric. The Descriptor is a concatenation of numerous codes and I would like to see the metrics broken down by code.
I originally just split the Descriptor column into numerous rows but that led to some data relationship issues so I'd like to do it without splitting the Descriptor column.
I tried doing the following DAX formula but it resulted in an error stating "the expression contains multiple columns, but only a single column can be used in a True/False expression that is used as a table filter expression"
Desired Output Metric = CALCULATE('Metric',CONTAINSSTRING('Entity Table'[Descriptor],'Code Table'[Code]))
Ultimately I'm not even sure I need this as a column, and it may be better as a measure...
Any help would be appreciated. Thank you!

You can get around "the expression contains multiple columns, but only a single column can be used in a True/False expression that is used as a table filter expression" by using Filter within your CALCULATE.
Here it is as a created column. I used an IF because 'E' code evaluates to a blank and you wanted a 0.
Desired Output Metric = IF(CALCULATE(SUM('Entity Table'[Metric]),FILTER('Entity Table',CONTAINSSTRING('Entity Table'[Descriptor],'Code Table'[Code])))>0,CALCULATE(SUM('Entity Table'[Metric]),FILTER('Entity Table',CONTAINSSTRING('Entity Table'[Descriptor],'Code Table'[Code]))),0)
Here it is as a measure. Be careful to only use this at the Code detail level. When making a measure you need to use aggregate functions to reference your columns, so I am just doing the MIN(Code) since for any single code the Min() will always evaluate to equal that Code. If you try to use this at a higher summary level you may get some odd answers as it will only total for the MIN() code in the data set you are referencing.
Desired Output Metric = IF(CALCULATE(SUM('Entity Table'[Metric]),FILTER('Entity Table',CONTAINSSTRING('Entity Table'[Descriptor],MIN('Code Table'[Code]))))>0,CALCULATE(SUM('Entity Table'[Metric]),FILTER('Entity Table',CONTAINSSTRING('Entity Table'[Descriptor],MIN('Code Table'[Code])))),0)
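If it helps to see the row logic outside DAX, here is a minimal Python sketch of what the FILTER plus CONTAINSSTRING combination is doing. The rows, codes and the `metric_for_code` helper are made up for illustration and are not from the question. For each code it keeps the entity rows whose descriptor contains that code and sums their metric, and a code that matches nothing comes out as 0.

```python
# Hypothetical rows standing in for 'Entity Table': (Entity ID, Descriptor, Metric).
entity_rows = [
    (1, "A-B-C", 10),
    (2, "B-D", 5),
    (3, "A", 2),
]

# Hypothetical values standing in for 'Code Table'[Code].
codes = ["A", "B", "C", "D", "E"]

def metric_for_code(code, rows):
    """Sum Metric over rows whose Descriptor contains the code (case-insensitive)."""
    return sum(
        metric
        for _, descriptor, metric in rows
        if code.lower() in descriptor.lower()
    )

# One total per code; the sum over no matching rows is 0.
totals = {code: metric_for_code(code, entity_rows) for code in codes}
print(totals)
```

Because this is a substring test, a short code will also match inside a longer one (a code "A" would match a descriptor containing "AB"), so it is worth checking that your codes cannot overlap like that.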

Why do I get different regression outputs in SAS and in Stata when using Prais-Winsten estimation?

I have a time series dataset with a serious serial correlation problem, so I adopted the Prais-Winsten estimator with iterated estimates to fix that. I ran the regressions in Stata with the following command:
prais depvar indepvar indepvar2, vce(robust) rhotype(regress)
My colleague wanted to reproduce my results in SAS, so she used the following:
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 iter itprint method=YW;
run;
For the different specifications we ran, some of them roughly match, while others do not. Also I noticed that for each regression specification, Stata has many more iterations than SAS. I wonder if there is something wrong with my (or my colleague's) code.
Update
Inspired by Joe's comment, I modified my SAS code.
/*Iterated Estimation*/
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 itprint method=ITYW;
run;
/*Twostep Estimation*/
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 itprint method=YW;
run;

I have a few suggestions. Note that I'm not a real statistician and am not familiar with the specific estimators here, so this is just a quick read of the docs.
First off, the most likely issue is that it looks like SAS uses the OLS variance estimation method. That is, in your Stata code, you have vce(robust), which is in contrast to what I read SAS as using, the equivalent of vce(ols). See this page in the docs which explains how SAS does the Y-W method of autoregression, compared to this doc page that explains how Stata does it.
Second, you probably should not specify method=YW. SAS distinguishes between the simple Y-W estimation ("two-step" method) and iterated Y-W estimation. method=ITYW is what you want. You specify iter, so it may well be that you're getting this anyway as SAS tends to be smart about those sorts of things, but it's good to verify.
I would suggest actually turning the iterations off to begin with - have both do the two-step method (Stata option twostep, SAS by removing the iter request and specifying method=YW or no method specification). See how well they match there. Once you can get those to match, then move on to iterated; it's possible SAS has a different cutoff than Stata and may well not iterate past that.
I'd also suggest trying this with only one independent and dependent variable pair first, as it's possible the two programs handle things differently when you add in a second independent variable. Always start simple and then add complexity.
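To show why the two-step and iterated variants can give different numbers, here is a rough Python sketch of a Prais-Winsten style estimator with one regressor. This is my own illustration and not the code that Stata's prais or SAS's PROC AUTOREG runs. The simulated data, the function names, the rule for estimating rho (regressing each residual on its lag without a constant) and the tol cutoff are all assumptions. Setting max_iter=1 gives a two-step estimate, while the default keeps re-estimating rho until it stops changing.

```python
import math
import random

def fit_two_columns(c, x, y):
    """Least squares of y on columns c and x (no extra intercept), via 2x2 normal equations."""
    scc = sum(a * a for a in c)
    scx = sum(a * b for a, b in zip(c, x))
    sxx = sum(b * b for b in x)
    scy = sum(a * v for a, v in zip(c, y))
    sxy = sum(b * v for b, v in zip(x, y))
    det = scc * sxx - scx * scx
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det

def estimate_rho(x, y, b0, b1):
    """Slope from regressing residual e_t on e_{t-1} without a constant (assumed rule)."""
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den

def prais_winsten(x, y, max_iter=100, tol=1e-6):
    """Return (intercept, slope, rho, iterations used)."""
    n = len(x)
    b0, b1 = fit_two_columns([1.0] * n, x, y)  # start from plain OLS
    rho = 0.0
    for iteration in range(1, max_iter + 1):
        new_rho = estimate_rho(x, y, b0, b1)
        s = math.sqrt(1.0 - new_rho ** 2)  # scaling that keeps the first observation
        c_t = [s] + [1.0 - new_rho] * (n - 1)
        x_t = [s * x[0]] + [x[t] - new_rho * x[t - 1] for t in range(1, n)]
        y_t = [s * y[0]] + [y[t] - new_rho * y[t - 1] for t in range(1, n)]
        b0, b1 = fit_two_columns(c_t, x_t, y_t)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return b0, b1, rho, iteration

# Simulated data (hypothetical): y = 2 + 3x + u, with AR(1) errors u and rho = 0.7.
random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
u = [0.0] * n
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + random.gauss(0, 1)
y = [2.0 + 3.0 * xi + ui for xi, ui in zip(x, u)]

two_step = prais_winsten(x, y, max_iter=1)  # one rho estimate, one transformed fit
iterated = prais_winsten(x, y)              # repeat until rho settles
print(two_step)
print(iterated)
```

Two programs that agree on the two-step numbers can still differ once they iterate, for example if they use a different rule for rho or a different convergence cutoff (tol here), which is why I'd match the two-step results first.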
