How to replace missing values in panel data? - stata

I am looking into weekly earnings data, where I have defined my data as pre-pandemic earning data and post-pandemic earning data. Now for some individuals, I have some missing values for the post-pandemic period which I want to replace with their pre-pandemic earnings. However, I am struggling with the coding for this panel data. I was hoping someone could help me with. Thanks in advance.

It is always easier if you share example data (see dataex) or at least list what variables you have. The example below will therefore most likely need to be edited.
* Sort the data by individual id and the time unit
* that indicates if this the obs is pre or post pandemic
sort id time
* This replaces the earnings value with a missing value if the
* id var is the same as on the next row AND the earnings var
* on is missing on the next row
replace earnings = . if id == id[_n+1] & missing(earnings[_n+1])
This assumes that all individuals are indeed represented in each time period and that you have a unique id variable (id) in your data set.

Related

DAX - Count one row for each group that meet filter requirement

I have a UserLog table that logs the actions "ADDED", "DELETED" or "UPDATED" along with the date/timestamp.
In Power BI, I would like to accurately show the amount of new users as well as removed users for a filtered time period. Since it's possible that a user has been added and then deleted, I need to make sure that I only get the last record (ADDED/DELETED) from the log for every user.
Firstly, I tried setting up a measure that gets the max date/timestamp:
LastUpdate = CALCULATE(MAX(UserLog[LogDate]), UserLog[Action] <> "UPDATED")
I then tried to create the measure that shows the amount of new users:
AddedCount = CALCULATE(COUNT(UserLog[userId]), FILTER(UserLog, [Action] = "ADDED" && [LogDate] = [LastUpdate]))
But the result is not accurate as it still counts all "ADDED" records regardless if it's the last record or not.
I find writing formulas against unstructured data is much harder. If you put a little more structure to the data, it becomes much easier. There are lots of ways to do this. Here's one.
Add 2 columns to your UserLog table to help count adds and deletes:
Add = IF([Action]="ADDED",1,0)
Delete = IF([Action]="DELETED",1,0)
Then create a summary table so you can tell if the same user was added and deleted in the same [LogDate]:
Counts = SUMMARIZE('UserLog',
[LogDate],
[userId],
"Add",SUM('UserLog'[Add]),
"Delete",SUM('UserLog'[Delete]))
Then the measures are simple to write. For the AddedCount, just filter the rows where [Delete]=0, and for DeletedCount, just filter the rows were [Add]=0:
AddedCount = CALCULATE(SUM(Counts[Add]),
Counts[Delete]=0,
FILTER(Counts,Counts[LogDate]=[LastUpdate]))
DeletedCount = CALCULATE(SUM(Counts[Delete]),
Counts[Add]=0,
FILTER(Counts,Counts[LogDate]=[LastUpdate]))

Tableau: percentage of total for specific dimension / COUNTD IF statement

I have an employee dataset and I'm trying to display the proportion of BAME staff only as a KPI. My dataset has duplicate lines (staff who have more than 1 role) and so I have to use a distinct count of their staff IDs (i.e. cannot use measure values or SUM) as I want to measure ethnicity per person not per role. The only way I can get the % of BAME across the population is by having all other categories visible however I am looking for a way to display the % of BAME across total only. For example in a table calc BAME = 12%, Non-BAME=85% and Unknown=3% - I want to keep the 12% only to put on my dashboard.
I've not needed to do this in Tableau before and I'm quite new to the software. Is there a way to do this? I thought that perhaps I would need to do an IF statement similar to Excel but the below syntax returns an error.
IF [Ethnic Group]="BAME" THEN COUNTD([Staff ID]) / COUNTD([Staff ID]) END
Thank you so much in advance!
COUNTD(IF [Ethnic Group]="BAME" THEN [Staff ID] END)/COUNTD([Staff ID])

Power BI - Creating a calculated table

I am creating a dashboard in Power BI. I have to report the executions of a process in a daily basis. When selecting one of these days, I want to create another calculated table based on the day selected (providing concrete information about the number of executions and hours) as it follows:
TABLE_B = FILTER(TABLE_A; TABLE_A[EXEC_DATE] = [dateSelected])
When [dateSelected] is previously calculated from the selected day as it follows:
dateSelected = FORMAT(FIRSTDATE(TABLE_A[EXEC_DATE]);"dd/MM/yyyy")
I tried a lot of alternatives as, for example, create individualy the year, month and day to later compare. I used the format in both sides of the comparation, but none of them works for me. The most of the cases it returns me the whole source table without any kind of filters. In other cases, it doesn't return anything. But, when I put a concrete day ...
TABLE_B = FILTER(TABLE_A; TABLE_A[EXEC_DATE] = "20/02/2019")
... it makes the filter correctly generating the table as I want.
Does someone know how to implement the functionality I am searching for?
Thanks in advance.
You're almost there Juan. You simply need to use dateSelected as a varialbe inside of your DAX query:
TABLE_B =
var dateSelected = FIRSTDATE(TABLE_A[EXEC_DATE])
return
FILTER(TABLE_A, TABLE_A[EXEC_DATE] = dateSelected)
Note that all my dates are formatted as Date so I didn't need to use a FORMAT function.
Here's the final result:
I admit that this behavior can be quite confusing! Here is a useful link that will help you understand Power BI's context:
https://community.powerbi.com/t5/Desktop/Filtering-table-by-measures/td-p/131361
Let's treat option 1 as FILTER(TABLE_A; TABLE_A[EXEC_DATE] = "20/02/2019") and option 2 as FILTER(TABLE_A; TABLE_A[EXEC_DATE] = [dateSelected]). Quote from the post:
In option 1, in the filter function, you are iterating
over each row of your 'Table' (row context). In option 2, because you
are using a measure as part of the filter condition, this row context
is transformed into an equivalent filter context (context transition).
Using variables (...) is very convenient when you want to filter
a column based on the value of a measure but you don't want context
transition to apply.

DAX: How to achieve the effect of a join/relationship between a bitmask column and a table of bit values?

I have a table CarHistoryFact (CarHistoryFactId, CarId, CarHistoryFactTime, CarHistoryFactConditions) that tracks the historical status of cars. CarHistoryFactConditions is a 25-bit (in binary) int column that encodes the status of 25 various conditions the car may be in at a given point in time.
I have a dimension table CarConditions with a row for each of the conditions, and their base 10 bit value.
How can I implement a "relationship" between the fact and dimension, giving a list of all the conditions a given car is
I can come up with bit parsing code, but I'm not sure how to hook it up to the dimension table to get just the currently applicable conditions at a car-time.
Bitmask parsing in dax can be seen here :
https://radacad.com/quick-dax-convert-number-to-binary
You can create a CROSSJOIN Table where all records are added 25 times and then filter out the once not existing
CarHistoryConditions =
var temp = CROSSJOIN(CarHistoryFact ; CarConditions )
return FILTER(temp; MOD(TRUNC(CarHistoryFact [CarHistoryFactConditions] / CarConditions [bit]):2) = 1)
note: I assumed the CarHistoryFactConditions and bit to be an integer, not a string of bits. For sure you can change that.
The reult is a table with as many rows of conditions for each car. E.g. Car one has 2 conditions and Car two has 5 conditions. You get 7 rows

Power BI Dashboard where the core filter condition is a disjunction on numeric fields

We are trying to implement a dashboard that displays various tables, metrics and a map where the dataset is a list of customers. The primary filter condition is the disjunction of two numeric fields. We want to the user to be able to select a threshold for [field 1] and a separate threshold for [field 2] and then impose the condition [field 1] >= <threshold> OR [field 2] >= <threshold>.
After that, we want to also allow various other interactive slicers so the user can restrict the data further, e.g. by country or account manager.
Power BI naturally imposes AND between all filters and doesn't have a neat way to specify OR. Can you suggest a way to define a calculation using the two numeric fields that is then applied as a filter within the same interactive dashboard screen? Alternatively, is there a way to first prompt the user for the two threshold values before the dashboard is displayed -- so when they click Submit on that parameter-setting screen they are then taken to the main dashboard screen with the disjunction already applied?
Added in response to a comment:
The data can be quite simple: no complexity there. The complexity is in getting the user interface to enable a disjunction.
Suppose the data was a list of customers with customer id, country, gender, total value of transactions in the last 12 months, and number of purchases in last 12 months. I want the end-user (with no technical skills) to specify a minimum threshold for total value (e.g. $1,000) and number of purchases (e.g. 10) and then restrict the data set to those where total value of transactions in the last 12 months > $1,000 OR number of purchases in last 12 months > 10.
After doing that, I want to allow the user to see the data set on a dashboard (e.g. with a table and a graph) and from there select other filters (e.g. gender=male, country=Australia).
The key here is to create separate parameter tables and combine conditions using a measure.
Suppose we have the following Sales table:
Customer Value Number
-----------------------
A 568 2
B 2451 12
C 1352 9
D 876 6
E 993 11
F 2208 20
G 1612 4
Then we'll create two new tables to use as parameters. You could do a calculated table like
Number = VALUES(Sales[Number])
Or something more complex like
Value = GENERATESERIES(0, ROUNDUP(MAX(Sales[Value]),-2), ROUNDUP(MAX(Sales[Value]),-2)/10)
Or define the table manually using Enter Data or some other way.
In any case, once you have these tables, name their columns what you want (I used MinNumber and MinValue) and write your filtering measure
Filter = IF(MAX(Sales[Number]) > MIN(Number[MinCount]) ||
MAX(Sales[Value]) > MIN('Value'[MinValue]),
1, 0)
Then put your Filter measure as a visual level filter where Filter is not 0 and use MinCount and MinValues column as slicers.
If you select 10 for MinCount and 1000 for MinValue then your table should look like this:
Notice that E and G only exceed one of the thresholds and tha A and D are excluded.
To my knowledge, there is no such built-in slicer feature in Power BI at the time being. There is however a suggestion in the Power BI forum that requests a functionality like this. If you'd be willing to use the Power Query Editor, it's easy to obtain the values you're looking for, but only for hard-coded values for your limits or thresh-holds.
Let me show you how for a synthetic dataset that should fit the structure of your description:
Dataset:
CustomerID,Country,Gender,TransactionValue12,NPurchases12
51,USA,M,3516,1
58,USA,M,3308,12
57,USA,M,7360,19
54,USA,M,2052,6
51,USA,M,4889,5
57,USA,M,4746,6
50,USA,M,3803,3
58,USA,M,4113,24
57,USA,M,7421,17
58,USA,M,1774,24
50,USA,F,8984,5
52,USA,F,1436,22
52,USA,F,2137,9
58,USA,F,9933,25
50,Canada,F,7050,16
56,Canada,F,7202,5
54,Canada,F,2096,19
59,Canada,F,4639,9
58,Canada,F,5724,25
56,Canada,F,4885,5
57,Canada,F,6212,4
54,Canada,F,5016,16
55,Canada,F,7340,21
60,Canada,F,7883,6
55,Canada,M,5884,12
60,UK,M,2328,12
52,UK,M,7826,1
58,UK,M,2542,11
56,UK,M,9304,3
54,UK,M,3685,16
58,UK,M,6440,16
50,UK,M,2469,13
57,UK,M,7827,6
Desktop table:
Here you see an Input table and a subset table using two Slicers. If the forum suggestion gets implemented, it should hopefully be easy to change a subset like below to an "OR" scenario:
Transaction Value > 1000 OR Number or purchases > 10 using Power Query:
If you use Edit Queries > Advanced filter you can set it up like this:
The last step under Applied Steps will then contain this formula:
= Table.SelectRows(#"Changed Type2", each [NPurchases12] > 10 or [TransactionValue12] > 1000
Now your original Input table will look like this:
Now, if only we were able to replace the hardcoded 10 and 1000 with a dynamic value, for example from a slicer, we would be fine! But no...
I know this is not what you were looking for, but it was the best 'negative answer' I could find. I guess I'm hoping for a better solution just as much as you are!