I have two columns, Actual_date and Incident_ID:
Actual_date Incident_ID
12/02/2014 W23421
02/12/2015 W234234
I want to create a simple time series plot in SAS VA that displays the count by Month-Year. For example, Feb-2014 should have a count of 1, Mar-2014 a count of 0, and so on. I reformatted the Actual_date variable as MMYYYY, but the time series plot still uses the actual dates instead of aggregating by Month-Year, so I get a lot of 1's across the board.
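To make the desired result concrete, here is a minimal Python sketch (not SAS VA code) of the aggregation being asked for: each date is reduced to a (year, month) key, and months without incidents are filled in with 0. It assumes the dates are DD/MM/YYYY, which is what the Feb-2014 example implies; the function name `monthly_counts` is made up for illustration.

```python
from collections import Counter
from datetime import datetime

MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def monthly_counts(rows, date_format="%d/%m/%Y"):
    """Count incidents per calendar month, including months with no incidents.

    rows is a list of (date_text, incident_id) pairs. The DD/MM/YYYY default
    is an assumption based on the example in the question.
    """
    # Reduce every date to a (year, month) key so days within a month collapse.
    keys = []
    for date_text, _incident_id in rows:
        parsed = datetime.strptime(date_text, date_format)
        keys.append((parsed.year, parsed.month))
    if not keys:
        return []

    counts = Counter(keys)
    (year, month), last = min(keys), max(keys)

    result = []
    while (year, month) <= last:
        label = f"{MONTH_ABBR[month - 1]}-{year}"
        result.append((label, counts.get((year, month), 0)))
        # Step forward one calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


rows = [("12/02/2014", "W23421"), ("02/12/2015", "W234234")]
series = monthly_counts(rows)
print(series[:3])
```

The key point is that the grouping has to happen on a truncated month value rather than on a display format applied to the full date, which is the likely reason the plot still shows individual days.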
I am trying to return the percentage of the grand total from #number of clients using the second and fourth columns. The number of clients values are text and are collected using Count('Table'[Column]). That is where I have run into issues. When I try Countrows() or AllSelected() to try and work around it, it returns all the rows and doesn't keep the filters I have set.
My current measures:
Client Total = COUNTA('Table'[Client_Name])
% with at least 1 document = DIVIDE(SUM('Table'[At Least 1 Document Sum]), [Client Total])
Right now, it only calculates the percentage based on the filtered # of clients in the same row versus the grand total. I am hoping to have it use a dynamic grand total that is shown at the bottom of the table.
[Current Table] (https://i.stack.imgur.com/IG254.png)
Here is my solution:
Measure Name = CALCULATE(COUNTROWS('Table'),ALLSELECTED('Table'))
Then:
Measure Name = SUM('Table'[At Least 1 Document Sum])/'Table'[Measure]
I have a large list of client data that I am interested in measuring the number of blanks and the number of cells with data (similar to using the counta and countblank functions in Excel). I'd like the data to be displayed in a table similar to the one pasted below.
Desired Output
Sample Data (first 4 rows)
You can use the COUNTBLANK DAX function.
Your DAX will be something like
Measure Name = COUNTBLANK(tablename[columnname])
To count the items that are not blank, use the COUNTA function, which counts all non-empty cells:
Measure Name = COUNTA(tablename[columnname])
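As a rough illustration of what the two measures report per column, here is a small Python sketch. The column names and sample records are hypothetical, and treating both None and the empty string as blank is an assumption of this sketch rather than a statement about how DAX handles every data type.

```python
def blank_summary(records, columns):
    """Return {column: {"blank": n, "non_blank": m}} for the given columns.

    A value counts as blank here if it is None or an empty string
    (an assumption made for this sketch).
    """
    summary = {}
    for column in columns:
        values = [record.get(column) for record in records]
        blanks = sum(1 for value in values if value is None or value == "")
        summary[column] = {"blank": blanks, "non_blank": len(values) - blanks}
    return summary


# Hypothetical client data, standing in for the table in the question.
records = [
    {"Client_Name": "Acme", "Email": "a@example.com"},
    {"Client_Name": "Birch", "Email": ""},
    {"Client_Name": "", "Email": None},
    {"Client_Name": "Dune", "Email": "d@example.com"},
]
summary = blank_summary(records, ["Client_Name", "Email"])
print(summary)
```

For every column, the blank and non-blank counts add up to the number of rows, which is a handy sanity check on the two measures.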
I was attempting to use the formulas here to calculate a running total of a column in PowerBI. However, my data is time-independent, and every other running total calculation I've seen for PowerBI has been based on a date field. The target column is a "Frequency" column, which represents the estimated frequency of the event represented by each record. How do I generate a cumulative total of these frequencies, from lowest frequency to greatest? This is used to generate an exceedance curve for the consequences of events based on the running frequency total, called an F-N curve.
Per this site: https://www.daxpatterns.com/cumulative-total/, I was able to generate the following measure:
Measure_cumF =
CALCULATE (
    SUM ( Sheet1[Content.F] ),
    FILTER (
        ALLSELECTED ( Sheet1 ),
        Sheet1[Content.N] >= MIN ( Sheet1[Content.N] )
    )
)
"MIN" allows the cumulative sum of "Content.F" to start at the row containing the highest value of the desired sorting list, in this case "Content.N". "MAX" will start the cumulative sum at the row containing the lowest value of "Content.N".
"ALLSELECTED" applies the current filters to the measure. Replace with "ALL" to have a static value that always returns the cumulative sum of the entire column.
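The logic of the measure can be sketched outside DAX as follows. This is only an illustration in Python with made-up (N, F) pairs: for each row, it sums F over every row whose N is greater than or equal to that row's N, which is what the FILTER condition with MIN expresses.

```python
def cumulative_frequency(rows):
    """rows is a list of (n, f) pairs; returns a list of (n, f, cum_f).

    cum_f is the sum of f over all rows whose n is >= the current row's n,
    so the running total starts at the row with the highest n.
    """
    result = []
    for n, f in rows:
        cum_f = sum(other_f for other_n, other_f in rows if other_n >= n)
        result.append((n, f, cum_f))
    # Sort by n descending so the running total reads top to bottom.
    return sorted(result, key=lambda item: item[0], reverse=True)


# Hypothetical (N, F) pairs: consequence N and estimated frequency F.
rows = [(1, 0.5), (100, 0.125), (10, 0.25)]
for n, f, cum_f in cumulative_frequency(rows):
    print(n, f, cum_f)
```

The quadratic loop mirrors how the measure is evaluated per row; for large tables outside Power BI, sorting once and accumulating would be the cheaper approach.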
I started working with Power BI a couple of weeks ago, so I'm still a little confused about some things.
My problem is that I need a matrix in my dashboard that shows percentage values, but I want the row total as a number. The row total of a percentage always shows 100%, so I can't see the actual count I'm working with.
This is my matrix with percentage values
This is how I want the row total to look, but with the column values as percentages
I've tried to create a measure that counts the values:
COUNT(OPSRespuestas[answer])
After that, I turned off the row totals and added this measure to the values in my matrix, but this is what I get:
This is my table after adding a measure with the total
It returns the total for each of the columns, not the total of each row.
These are the tables I'm working with
These are my top header values
These are my left header values
The answer column is what I need to count
These are the relationships between these 3 tables, although I have many more intermediate tables besides these 3, as you can see in the next picture:
My relationship tables
So, in the end, I need this matrix to show the answers as percentages for each department and group of questions, and then show the total by department as a number.
The problem you are experiencing has to do with context. Each row is treated as its own total population, hence the 100% total. Each column in a row is evaluated against the total of that row to produce a percentage value.
In addition to adding a custom measure to replace the total, you could also consider computing the percentage against the grand total of all dimensions. This means that each cell is evaluated against the total of all rows and columns. The cell values would then differ from those in your first table, but the row total would no longer evaluate to 100%.
SUM ( [Value] ) / CALCULATE ( SUM ( [Value] ) ; ALL ( 'Your Table' ) )
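The difference between the two denominators can be illustrated with a small Python sketch. The department names and counts are invented for the example; the point is only that percent-of-row always sums to 100% per row, while percent-of-grand-total does not.

```python
def percent_tables(counts):
    """counts maps row label -> {column label -> count}.

    Returns two nested dicts with the same shape: each cell as a share of
    its row total, and each cell as a share of the grand total.
    """
    grand_total = sum(sum(cols.values()) for cols in counts.values())
    of_row, of_grand = {}, {}
    for row, cols in counts.items():
        row_total = sum(cols.values())
        # Guard against empty rows or an empty table to avoid dividing by zero.
        of_row[row] = {c: (v / row_total if row_total else 0.0)
                       for c, v in cols.items()}
        of_grand[row] = {c: (v / grand_total if grand_total else 0.0)
                         for c, v in cols.items()}
    return of_row, of_grand


# Hypothetical answer counts per department.
counts = {"Sales": {"Yes": 3, "No": 1}, "IT": {"Yes": 2, "No": 2}}
of_row, of_grand = percent_tables(counts)
print(of_row["Sales"], sum(of_row["Sales"].values()))
print(of_grand["Sales"], sum(of_grand["Sales"].values()))
```

With the grand total as the denominator, the row sums become each department's share of all answers, which carries more information than a constant 100%.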
I have a list of companies with start and end dates for each. I want to count the number of companies alive over time. I have the following code but it runs slowly on my large dataset. Is there a more efficient way to do this in Stata?
forvalues y = 1982/2012 {
forvalues m = 1/12 {
*display date("01-`m'-`y'","DMY")
count if start_dt <= date("01-`m'-`y'","DMY") & date("01-`m'-`y'","DMY") <= end_dt
}
}
One way is to use the inrange function and to compute the date only once per iteration. In Stata, date variables are just integers, so you can easily operate on them.
forvalues y = 1982/2012 {
forvalues m = 1/12 {
local d = date("01-`m'-`y'","DMY")
count if inrange(`d', start_dt, end_dt)
}
}
This alone will save you a huge amount of time. For 50,000 observations (and made-up data):
. timer list 1
1: 3.40 / 1 = 3.3980
. timer list 2
2: 18.61 / 1 = 18.6130
Timer 1 is with inrange; timer 2 is your original code. Results are in seconds. Run help inrange and help timer for details.
That said, maybe someone can suggest an overall better strategy.
Assuming a firm identifier firmid, this is another way to think about the problem, but with a different data structure. Make sure you have a saved copy of your dataset before you do this.
expand 2
bysort firmid : gen eitherdate = cond(_n == 1, start_dt, end_dt)
by firmid : gen score = cond(_n == 1, 1, -1)
sort eitherdate
gen living = sum(score)
by eitherdate : replace living = living[_N]
So,
We expand each observation to 2 and put both dates in a new variable, the start date in one observation and the end date in the other observation.
We assign a score that is 1 when a firm starts and -1 when it ends.
The number of firms is increased by 1 every time a firm starts and decreased by 1 every time one ends. We just need to sort by date and the number of firms is the cumulative sum of those scores. (EDIT: There is a fix for changes on the same date.)
This new data structure could be useful for other purposes.
There is a write-up at http://www.stata-journal.com/article.html?article=dm0068
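The steps above can be sketched in Python to show the idea independently of Stata. This is an illustration under the same assumptions as the Stata code (dates stored as integers, one start and one end per firm); the function name `firms_alive` and the sample dates are made up.

```python
from itertools import groupby


def firms_alive(firms):
    """firms is an iterable of (start_date, end_date) integer pairs.

    Returns a list of (date, living) pairs, one per distinct start or end
    date, where living is the running total of +1/-1 scores after all
    changes on that date have been applied.
    """
    # One event per date: +1 when a firm starts, -1 when it ends.
    events = []
    for start, end in firms:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()

    result = []
    living = 0
    # Group events sharing a date so each date reports a single final count,
    # mirroring the "replace living = living[_N]" step.
    for date, same_day in groupby(events, key=lambda event: event[0]):
        living += sum(score for _date, score in same_day)
        result.append((date, living))
    return result


# Hypothetical firms with integer dates.
print(firms_alive([(1, 5), (3, 8), (5, 9)]))
```

Note that, as in the Stata code, a firm's -1 is applied on its end date, so the firm is no longer counted on that date. The inrange-based loop counts it as alive on its end date, so shift the end dates by one if the two approaches need to agree.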
EDIT:
Notes in response to @Roberto Ferrer (and anyone else who reads this):
I fixed a bad bug, which made this too difficult to understand. Sorry about that.
The dates used here are just the dates at which firms start and end. There is no evident point in evaluating the number of firms at any other date as it would just be the same number as the previous date used. If you needed, however, to interpolate to a grid of dates, copying the previous count would be sufficient.
It is important not to confuse the Stata function sum() which returns the cumulative sum with any egen function. The impression that egen's total() is an alternative here was a side-effect of my bug.