My table looks like this:
Candidate   | Current Status | Interviewer 1 | Interview 1 Date | Interviewer 2 | Interview 2 Date
Candidate 1 | Int1 clear     | aaa           | 1/1/2020         | bbb           | 2/1/2020
Candidate 2 | Int1 pending   | bbb           | 10/1/2020        | aaa           | 10/2/2020
There are more columns, but I'm ignoring them for now.
I want to create a view to find out how many interviews were conducted by "aaa", with a drill-down to the interview date and the current status. The issue is that "aaa" can appear as either Interviewer 1 or Interviewer 2.
I tried unpivoting the Interviewer 1 and Interviewer 2 columns, but each resulting row still carries the irrelevant date of the interview conducted by "bbb". Something like:
Candidate 1 | Int1 clear   | 1/1/2020  | 2/1/2020  | Interviewer 1 | aaa
Candidate 1 | Int1 clear   | 1/1/2020  | 2/1/2020  | Interviewer 2 | bbb
Candidate 2 | Int1 pending | 10/1/2020 | 10/2/2020 | Interviewer 1 | bbb
Candidate 2 | Int1 pending | 10/1/2020 | 10/2/2020 | Interviewer 2 | aaa
Now the rows attributed to "aaa" still carry the date of the interview conducted by "bbb" (the Interview 2 date), and vice versa.
Clarification: Interview 1 and Interview 2 are for the same candidate. Each candidate goes through a series of interviews, so we're trying to keep track of the candidate and the interviews they go through.
Each interview is conducted by a different panelist. I want to count the number of interviews conducted by each panelist and drill down to the details of each interview.
I don't know exactly what you want to do since your explanation is somewhat vague. If I understand you correctly, you might be better off assigning interviewer labels to the correct interview by hand.
For example:
(this is without unpivoting)
Interview   | Interviewer | Candidate   | Status
Interview 1 | aaa         | Candidate 1 | Pending
Interview 2 | bbb         | Candidate 2 | Pending
Interview 3 | aaa         | Candidate 3 | Clear
and so on
Or, you could also try making interviewer columns like the following:
aaa         | bbb         | Candidate   | Status
Interview 1 | Interview 2 | Candidate 1 | Pending
Interview 3 | Interview 5 | Candidate 2 | Pending
Interview 4 | Interview 6 | Candidate 3 | Clear
and so on
In case of the latter you can unpivot aaa and bbb. This will create a table where the interviewer appears in one column and the interviews that interviewer has conducted appear in a values column. It will, however, make it look as though every candidate was interviewed by both interviewers. I do not know if this is what you want. You can work around this, but for that we would need more information and a clearer question.
Both ways described above would let you create a filter for interviewer and thus let you calculate whatever you want for the corresponding interviewer.
Hope this helps.
Are you 100% married to the idea of keeping everything in one table? There are some advantages to the approach of creating separate tables for the interviewers, candidates, and possibly the interview status.
However, let's assume that you prefer to keep everything in one table. There's actually no need to unpivot columns to solve what you're looking for.
I recommend using a tidy data approach and creating one column for each variable. In this case the variables are the candidate, the interviewer, the date of the interview, which interview it was, and what the interview status is. Personally I would make the interview status a calculated column, either directly in the query or after the table loads, using DAX.
This is how I would approach it - first make a duplicate of the original query. Drop the interview status column for now in both queries.
In your original query, also get rid of the columns for the interviewer and interview date for the second interview. You should have three columns left in the original query - candidate, interviewer 1, and interview 1 date. Create a new column for the interview stage. Populate it with something like "1" or "First".
In your duplicate query keep the information for candidate, interviewer 2, and interview 2 date. Get rid of interviewer 1 and interview 1 date. You should have three columns, candidate, interviewer 2, and interview 2 date. Create a new column for the interview stage. Populate it with something like "2" or "Second".
In both queries change the column names so they're the same in both queries. I recommend simply dropping the 1 or 2 from the interviewer and interview date columns.
Append the two queries together. You should now have one table with four columns: candidate, interviewer, interview date, and interview stage. Since your primary interest is in the interviewer, move that column to the far left. Sort by the interviewer first (ascending or descending by whichever works better for you), then by the candidate ascending or descending, and then by the date in ascending order. Add an index column and either leave it at the end or move it to the far left as you choose. It doesn't matter if you start at 0 or 1 on the index column.
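For reference, the same long table can also be sketched as a DAX calculated table instead of a Power Query append. This is only a sketch under assumed names (the source table is called Candidates here and the column names follow the sample data), and it doesn't include the index column described above:

Interviews Long =
// Stack the first-interview columns on top of the second-interview columns
// so each row is one (candidate, interviewer, date, stage) combination.
UNION(
    SELECTCOLUMNS(
        Candidates,
        "Candidate", Candidates[Candidate],
        "Interviewer", Candidates[Interviewer 1],
        "Interview Date", Candidates[Interview 1 Date],
        "Interview Stage", "First"
    ),
    SELECTCOLUMNS(
        Candidates,
        "Candidate", Candidates[Candidate],
        "Interviewer", Candidates[Interviewer 2],
        "Interview Date", Candidates[Interview 2 Date],
        "Interview Stage", "Second"
    )
)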
At this point you can either load the table or try to create a status column using whatever logic determines pending vs cleared or other statuses you might have. Personally I find it easier to create columns for that type of logic using DAX but it may be easier to do it in the query depending on how complex the logic is.
Once you have that calculated column for the status you should have everything you need to generate the visuals for what you want to see. The index column is there to give you more options with how you approach the status column. It also gives you a way to put the table in the exact order you had it in the query prior to load. As I'm sure you've noticed when looking at your tables in the datasheet view after load, the rows probably aren't in the same order that they were in the query. Also, you can't sort on more than one column at a time in the datasheet view. Sorting by the index column takes care of both those concerns.
If you do the status column in DAX, you will probably want to look at the EARLIER function if you're not already familiar with it.
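As a rough illustration of how EARLIER comes into play, here is a hedged sketch of two calculated columns (the table name Interviews and the column names are assumptions; adapt them to your appended table):

// Latest interview date recorded for the same candidate.
Latest Interview Date =
MAXX(
    FILTER(
        Interviews,
        Interviews[Candidate] = EARLIER(Interviews[Candidate])
    ),
    Interviews[Interview Date]
)

// Flag the row holding the candidate's most recent interview,
// which is a typical starting point for a status column.
Is Latest Interview =
IF(Interviews[Interview Date] = Interviews[Latest Interview Date], 1, 0)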
I'm struggling to create a measure that sums a column while filtering out duplicate IDs and taking only the latest row for each.
For example, there is a table as such:
UID | Quantity | Status | StatusDate
aaa | 3 | Shipped | 11/1/2020
aaa | 3 | Delivered | 11/5/2020
bbb | 5 | Ordered | 10/29/2020
ccc | 8 | Shipped | 11/4/2020
So the idea would be to sum the quantity, but I would only want to count quantity for id "aaa" once and only count towards the latest status ("Delivered" in this case). I would make a visual that shows the quantities with status as its axis. I also need to add a date Slicer so I could go back in time. So, when I go before 11/5/2020, instead of "Delivered," it would switch back to "Shipped."
I tried several methods:
1. SUMMARIZE into a table, filtering for the MAX date value per UID. I found this doesn't work with the date slicer, since it doesn't actually recalculate the filtering; it just slices away rows outside the selected dates. It seems to behave the same whether the SUMMARIZE is used as a new table or as a VAR in the measure.
2. CALCULATE seems promising, but I can't figure out the filter syntax that I need. Here is an example that doesn't work (I also tried SUMX instead of SUM, but that doesn't work either):
CALCULATE(
    SUM(Table1[Quantity]),
    FILTER(
        Table1,
        [StatusDate] = MAXX(FILTER(Table1, [UID] = EARLIER([UID])), [StatusDate])
    )
)
3. I also tried adding a column that states whether the row is "old", as well as a numerical "rank" for the different statuses. But once again, I run into the issue that the date slicer doesn't recalculate those columns. For example, if the date slicer is set to 11/3/2020, it should add "3" to "Shipped" instead of "Delivered". Instead, it just removes the row, which tells me it is not actually recalculating the columns (same problem as #1).
Any help would be appreciated :-) Thank you!
You can try something like this:
Measure =
VAR d = LASTDATE(Table1[StatusDate])
VAR tb =
    SUMMARIZE(
        FILTER(Table1, Table1[StatusDate] <= d),
        Table1[UID],
        "last", LASTDATE(Table1[StatusDate])
    )
RETURN
    CALCULATE(SUM(Table1[Quantity]), TREATAS(tb, Table1[UID], Table1[StatusDate]))
The tb variable contains a table which has the latest date per UID. You then use that to filter your main table with the TREATAS function.
One other alternative is to build a table with a rank column ordered by date and then do a SUM over that table where the rank equals 1.
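As a hedged sketch of that ranking alternative (the table and column names follow the question; whether it fits depends on your exact model), something along these lines could work:

Latest Quantity Ranked =
VAR Ranked =
    ADDCOLUMNS(
        Table1,
        // Rank each row's StatusDate among the rows sharing the same UID,
        // latest date first, within whatever the date slicer allows through.
        "DateRank",
        RANKX(
            FILTER(Table1, Table1[UID] = EARLIER(Table1[UID])),
            Table1[StatusDate],
            ,
            DESC
        )
    )
RETURN
    // Keep only each UID's most recent visible row and sum its quantity.
    // Note that ties (two rows with the same latest date) would both rank 1.
    SUMX(FILTER(Ranked, [DateRank] = 1), Table1[Quantity])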
This seems like one of the use cases for groups, but maybe I'm understanding them wrong.
I have a table that shows a count of all rows like this:
User | Completed Tasks
Bob | 2
Jim | 1
Pete | 1
The table it comes from looks like this:
User | Type
Bob | A
Bob | B
Jim | A
Pete | C
This is very simplified - in reality there are about 80 different types. I'm hoping to get 5 of them in a group called Secondary and the rest in a group called Primary.
For the example, say I want A and B to be considered "primary" and C to be secondary.
The new table would look like this:
User | Completed Tasks | Primary | Secondary
Bob | 2 | 2 | 0
Jim | 1 | 1 | 0
Pete | 1 | 0 | 1
I tried creating a group of Type with 5 called Secondary and the rest called Primary, but I was having trouble figuring it out.
I just want a count of types for that particular group, respecting whatever filters are applied.
Is there an easy way to do this or do I need to create a measure/calculated column?
I ended up solving this by creating two calculated columns.
The DAX for the primary count is a 1 for each row not in the Secondary list:
PrimaryCount = IF(table[Type] IN {"C","D","E","F","G"}, 0, 1)
The DAX for the secondary count is a 1 for each row in the Secondary list:
SecondaryCount = IF(table[Type] IN {"C","D","E","F","G"}, 1, 0)
Then, just add those to your table values and make sure Sum is selected (the default).
I figured using groups would be easier, but I suppose this is simple enough and seems to work.
Another way to approach this is to create a calculated column for Group
Group = IF(table[Type] IN {"A","B"}, "Primary", "Secondary")
You can then use the Group as the columns on a matrix and count the Type column.
Note that this approach scales better if you want to break into a lot more groups. You'd likely want to use a SWITCH in that case like this:
Group =
SWITCH(
    TRUE(),
    Table1[Type] IN {"A", "B"}, "Primary",
    Table1[Type] IN {"C"}, "Secondary",
    Table1[Type] IN {"D", "E", "F"}, "Tertiary",
    "Other"
)
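If you would rather see Primary and Secondary as separate columns in a table visual, as in the desired output above, a pair of simple measures over the Group column should also work. This is just a sketch reusing the hypothetical table name from the earlier snippets:

// Count of rows whose Group is "Primary" under the current filters.
Primary Tasks = CALCULATE(COUNTROWS('table'), 'table'[Group] = "Primary")

// Count of rows whose Group is "Secondary" under the current filters.
Secondary Tasks = CALCULATE(COUNTROWS('table'), 'table'[Group] = "Secondary")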
I have a flat table like this,
R# | Cat | SWN | CWN | CompBy | ReqBy  | Department
1  | A   | 1   | 1   | Team A | Team B | Department 1
2  | A   | 1   | 3   | Team A | Team B | Department 1
3  | B   | 1   | 3   | Team A | Team B | Department 1
4  | B   | 2   | 3   | Team A | Team C | Department 1
5  | B   | 2   | 3   | Team D | Team C | Department 2
6  | C   | 2   | 2   | Team D | Team C | Department 2
R# indicates the Request Number,
Cat indicates the Category,
SWN indicates the Submitted Week Number,
CWN indicates the Completed Week Number,
CompBy indicates Completed By,
ReqBy indicates Requested By,
Department indicates the Department Name.
I would like to create a data model that avoids ambiguity and, at the same time, allows me to report on Category, SWN, CWN (needs to be only a week number), CompBy, ReqBy, and Department through a single filter.
For example, the dashboard will have a single filter choice to select a week number. When a week number is selected, it should show the details of the requests that have it as either their submitted or their completed week number. I understand this requires the creation of a calendar table or something like that.
I am looking for a data model that explains the cardinality and filter direction (single or both). If possible, kindly post the PBIX file and share the link here.
What I have tried: I was not able to establish one of the four connections.
Update: Providing a bounty for this question because I would like to see what the star schema would look like for this flat table.
One of the reasons I am looking for a star schema over a flat table: for example, a restaurant menu is a dimension and the purchased food is a fact. If you combined these into one table, how would you identify which food has never been ordered? For that matter, prior to your first order, how would you identify what food was available on the menu?
The scope of your question is rather unclear, so I'm just addressing this part of the post:
the dashboard will have a single filter choice to select a week number. If that week number is selected, it will show the details of these requests from submitted and completed week number.
One way to get OR logic is to use a disconnected parameter table and write your measures using the parameters selected. For example, consider a schema in which a WeekDimension parameter table sits alongside the MasterTable with no relationship between them.
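As a hedged sketch, that disconnected WeekDimension table could be built as a DAX calculated table that collects every week number appearing in either column (the table and column names follow the measure below):

WeekDimension =
// One row per distinct week number found in either SWN or CWN.
DISTINCT(
    UNION(
        SELECTCOLUMNS(MasterTable, "WN", MasterTable[SWN]),
        SELECTCOLUMNS(MasterTable, "WN", MasterTable[CWN])
    )
)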
If you put WN on a slicer, then you can write a measure to filter the table based on the number selected.
WN Filter =
IF(
    COUNTROWS(
        INTERSECT(
            VALUES(WeekDimension[WN]),
            UNION(
                VALUES(MasterTable[SWN]),
                VALUES(MasterTable[CWN])
            )
        )
    ) > 0,
    1,
    0
)
Then if you use that measure as a visual level filter, you can see all the records that correspond to your WN selection.
If you can clarify your question to more closely approach a minimal, complete, and verifiable example, then you'll likely get better responses. I can't quite determine the specific idea you're having trouble with.
I am using the EU-SILC database for 2008 for Greece. Firstly, I would like to use PE040 to create three dummies: primeduc for pre-primary and primary education, seceduc for lower secondary, (upper) secondary, and post-secondary non-tertiary education, and tereduc for first and second stage tertiary education.
Secondly, I would like to make a variable for working experience based on the idea exper = age - educ - 6, where educ is (roughly) the number of years spent in education.
Any ideas which commands I should use in Stata?
What I've tried so far
About Stata syntax:
tabulate PE040, gen(educ)
gen primeduc=educ1+educ2
gen seceduc=educ3+educ4+educ5
gen tereduc=educ6
Having defined lnwage as log(PY010N/(PL060+PL070)) and age as 2008-PB140, I've tried to regress, and it only takes 191 observations into account.
For your first question, I think you want a 0-1 indicator, equal to 1 if either of the indicated educational categories was recorded.
gen primeduc = educ1 | educ2
gen seceduc = educ3 | educ4 | educ5
The "|" stands for logical "or". For example, primeduc will be 1 if educ1 is 1 or educ2 is 1.
I have a dataset that results from joining a few outputs of PROC UNIVARIATE.
After some more joins, I have a final dataset with a variable called "Measure", which holds the names of certain measures, like 'mean' and 'standard deviation', and other variables, one per month of a certain year, each holding the values of those measures.
I'd like to sort these measures in a particular order. For now, I'm doing a PROC TRANSPOSE, using a RETAIN to establish the order I want, and then doing another transpose. The problem is that this is a really naive solution, and I feel it takes longer than it should.
Is there a simpler/more effective way to do this sort?
An example of what I want to do, with random values:
What I have:
Measures | 2013/01 | 2013/02 | 2013/03
Mean | 10 | 9 | 11
Std Devi.| 1 | 1 | 1
Median | 3 | 5 | 4
What I want:
Measures | 2013/01 | 2013/02 | 2013/03
Std Devi.| 1 | 1 | 1
Median | 3 | 5 | 4
Mean | 10 | 9 | 11
I hope I was clear enough.
Thanks in advance
A couple of straightforward solutions. First, you could simply add a variable to sort by and then drop it. You don't need to transpose; just do it in the DATA step or in PROC SQL after the join: if measures='Mean' then sortorder=3; else if measures='Median' then sortorder=2; and so on. Then sort by sortorder and drop it in the PROC SORT step.
Second, if you're using entirely numeric values, you can use PROC MEANS to do the sorting for you, with a custom format that defines the order (using NOTSORTED and ORDER=DATA on the CLASS statement) and the IDGROUP functionality in PROC MEANS to output the right values. This is overkill in most cases, but if the dataset is huge it might be appropriate.
Third, if you're doing the joins in SQL, you can build the order you want into the query's ORDER BY - I can explain that in more detail if you find that the most useful.