I need to build a student cohort tracking report (.pbix) to show students who have progressed to the next consecutive year of study, students who have continued their studies, and other similar metrics. Currently, I have a standard star schema as follows:
Fact Enrolment – Logs all enrolment activity for each student (multiple records can exist in the fact table for each student, based on different years, statuses, courses etc.)
Student – Shows all students and their personal details, such as email addresses, phone numbers etc. I’d rather not build on this table, as it is already quite large.
Year of Study – This table identifies which year a student is studying in (e.g. second year)
University Academic Year – This lists all academic years (e.g. 2017/18)
Student Status Per Year – This table lists all the possible statuses a student can have for a particular year of their degree, such as ‘Current Student’, ‘withdrawn’, ‘transferred’
I was thinking of building a dimension in Power Query that holds cohort tracking for each student and links back to the fact table through a standard one-to-many relationship. This would let end users slice the data further by faculty etc. However, I’m not entirely sure how to do this. I also looked at Cohort Analysis, but it does not appear to do what I need.
Any advice would be much appreciated.
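A minimal sketch of the per-student cohort logic follows. It is written in Python rather than Power Query M, and it assumes each enrolment record has been reduced to a hypothetical (student_id, academic_year_start, year_of_study, status) tuple, where academic_year_start is the first calendar year of the academic year (2017 for 2017/18). "Progressed" is taken to mean an active enrolment in the next academic year at year_of_study + 1, and "continued" an active enrolment in the next academic year at any year of study. These definitions and the status value "Current Student" are assumptions to adjust to your own rules. The output has one row per student per academic year, which is a grain a cohort dimension could use when linked back to the fact table.

```python
from collections import defaultdict
from typing import NamedTuple


class Enrolment(NamedTuple):
    student_id: str
    academic_year_start: int  # assumed encoding: 2017 means 2017/18
    year_of_study: int        # 1 = first year, 2 = second year, ...
    status: str               # e.g. "Current Student", "Withdrawn"


ACTIVE = "Current Student"  # assumed status value for an active enrolment


def cohort_rows(enrolments):
    """Return one dict per (student, academic year) with cohort flags."""
    # Index active enrolments: (student, academic year) -> set of years of study.
    active = defaultdict(set)
    for e in enrolments:
        if e.status == ACTIVE:
            active[(e.student_id, e.academic_year_start)].add(e.year_of_study)

    rows = []
    for (student, year), studies in sorted(active.items()):
        # .get() does not insert into the defaultdict, so iteration stays safe.
        next_years = active.get((student, year + 1), set())
        current = max(studies)  # highest year of study held this academic year
        rows.append({
            "student_id": student,
            "academic_year_start": year,
            "year_of_study": current,
            "continued": bool(next_years),
            "progressed": (current + 1) in next_years,
        })
    return rows


# Hypothetical sample data.
data = [
    Enrolment("S1", 2017, 1, ACTIVE),
    Enrolment("S1", 2018, 2, ACTIVE),       # progressed to second year
    Enrolment("S2", 2017, 1, ACTIVE),
    Enrolment("S2", 2018, 1, ACTIVE),       # repeated first year: continued only
    Enrolment("S3", 2017, 1, ACTIVE),
    Enrolment("S3", 2018, 1, "Withdrawn"),  # neither continued nor progressed
]
for row in cohort_rows(data):
    print(row)
```

Loading a table like this as a separate dimension keeps the large Student table untouched, which matches the constraint in the question.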
I have a dashboard in Power BI that looks like this:
Almost every slicer and visual on this page comes from the "visits" dataset. That dataset has 70,000+ rows, each representing a single patient visit to the hospital. The columns relevant to this question are "subject mrn" and "protocol_no" (the study the patient is on).
Separately, I have a dataset called "Data Managers" that lists the staff assigned to each protocol. Its relevant columns are "subject mrn", "protocol no" and "staff name".
This is how the datasets appear in my Power BI model:
When I connect these datasets by dragging between them, Power BI warns me that the relationship is many-to-many. This makes sense because:
Let's say staff member John is the data manager for patient 12345 on study x.
But patient 12345 might also be on study y, where staff member Steve is the data manager.
Also, other patients on study x might have other data managers.
So I need to connect these datasets in a way that when I filter to John, I only get rows back from the visits data where John is the data manager for that combination of subject AND study.
When I just drag relationships across from protocol no and subject mrn, like this:
it doesn't work. The dropdown appears to filter the list to John, but when I check for accuracy, it shows people with completely different data managers. Any idea what to do?
If anybody is looking at this: there's probably a way to do it by managing multiple relationships, but I ended up creating a concatenated column of "Protocol, subject_mrn" in each dataset and then linking those new columns together.
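Why the concatenated key works can be shown with a small Python sketch on made-up rows (the visit numbers and the data manager "Mary" are hypothetical). Power BI allows only one active relationship between two tables, so when two columns are dragged across, the filter likely flows through just one of them. Matching on protocol alone then returns every visit on John's studies, including patients managed by someone else; matching on a protocol/subject key returns only John's rows.

```python
# Hypothetical sample rows mirroring the "visits" and "Data Managers" datasets.
visits = [
    {"subject mrn": "12345", "protocol_no": "x", "visit": 1},
    {"subject mrn": "12345", "protocol_no": "y", "visit": 2},
    {"subject mrn": "67890", "protocol_no": "x", "visit": 3},
]
data_managers = [
    {"subject mrn": "12345", "protocol no": "x", "staff name": "John"},
    {"subject mrn": "12345", "protocol no": "y", "staff name": "Steve"},
    {"subject mrn": "67890", "protocol no": "x", "staff name": "Mary"},
]


def composite_key(protocol, mrn, sep="|"):
    """Concatenate protocol and subject; the separator avoids collisions
    such as ("1", "23") and ("12", "3") producing the same key."""
    return f"{protocol}{sep}{mrn}"


def visits_for(staff, match_on_protocol_only=False):
    assignments = [d for d in data_managers if d["staff name"] == staff]
    if match_on_protocol_only:
        # Filter flows through one column only: every visit on the staff's studies.
        protocols = {d["protocol no"] for d in assignments}
        return [v for v in visits if v["protocol_no"] in protocols]
    # Filter flows through the combined key: only this staff member's patients.
    keys = {composite_key(d["protocol no"], d["subject mrn"]) for d in assignments}
    return [v for v in visits
            if composite_key(v["protocol_no"], v["subject mrn"]) in keys]


print([v["visit"] for v in visits_for("John", match_on_protocol_only=True)])  # [1, 3]
print([v["visit"] for v in visits_for("John")])  # [1]
```

If each protocol/subject pair has exactly one data manager, the key is also unique on the Data Managers side, so the relationship on it can be one-to-many instead of many-to-many.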
I need to calculate the time elapsed between two dates, in days. The values are the creation date and the payment date of every invoice in my model table. I will use this value to classify my invoices: sort them by time elapsed, classify them, make a Pareto chart, etc. So I need to create a calculated column in the invoice table of my model (I can't imagine an approach using measures).
But I have two obstacles:
Tables ARE NOT DIRECTLY RELATED, so I can't use the RELATED function
The invoice can be paid in several installments
So, for each invoice, I need to calculate the time elapsed, in days, between the generation date (in one table) and the MAX payment date of all its installments (in another, indirectly related table).
This:
DaysElapsedInPayment = DATEDIFF(Invoices[InvoiceDate], Max(RELATED(Installments[InstallmentPaymentDate])), DAY)
would work if the tables were directly related and DATEDIFF accepted MAX, but that is not the case. For your information, in case it is useful, this is my invoice-payment model schema:
Could you help me, please?
EDIT:
As requested, here is a more detailed explanation of my data model.
My model looks complex because an invoice can be paid by several payments (for example, part in cash and part by direct debit), so an invoice can have MULTIPLE payments. At the same time, a payment can cover several invoices. That's why my invoice-payment relationship is many-to-many. In addition, a payment (for example, a payment by direct debit) can be divided into installments that are sent to the bank monthly.
My model has lots of tables, not named in English. For this question, I have mocked up the relevant ones here:
As you can see in the mocked data:
One invoice is paid in a single payment. It's the common case. We only have a single payment date in 'Payments'. That one is the one I need.
Another invoice is paid in TWO payments (part in cash, part by VISA) on different days. I need the date of the LATER of the two payments.
In the last case, I have two invoices that were to be paid as a 'SINGLE' payment, so both should return the PaymentDate in Payments. But at the last moment we agreed with the client to pay them in installments. So installments were created, each with a DueDate and the date on which it was finally paid. So I need the LAST date of all installments for this payment.
It's complex, I know, but in the end it comes down to:
Given a value, find the related values in another, indirectly related table
From all the values found, get the greatest
The function must perform these operations and put the result in a calculated column.
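Those two steps can be sketched in Python to pin down the logic before writing the DAX. The table and column names below (invoices, invoice_payments as the many-to-many bridge, payments, installments) and all dates are hypothetical stand-ins for the mocked tables, and only each installment's paid date is modelled (its DueDate is ignored). For each invoice the sketch follows the bridge to its payments, uses the latest installment paid date when a payment has installments and the payment's own date otherwise, takes the maximum, and subtracts the invoice date. It illustrates the logic only, not the DAX formula.

```python
from datetime import date

# Hypothetical mock data covering the three cases described above.
invoices = {"INV1": date(2023, 1, 1), "INV2": date(2023, 1, 5),
            "INV3": date(2023, 2, 1), "INV4": date(2023, 2, 1)}
# Bridge table for the many-to-many invoice <-> payment relationship.
invoice_payments = [("INV1", "P1"),                    # single payment
                    ("INV2", "P2"), ("INV2", "P3"),    # cash + VISA
                    ("INV3", "P4"), ("INV4", "P4")]    # one payment, two invoices
payments = {"P1": date(2023, 1, 20), "P2": date(2023, 1, 10),
            "P3": date(2023, 2, 3), "P4": date(2023, 2, 1)}
# Installments as (payment id, paid date); only P4 was split into installments.
installments = [("P4", date(2023, 3, 1)), ("P4", date(2023, 4, 1)),
                ("P4", date(2023, 5, 2))]


def effective_payment_date(payment_id):
    """Latest installment paid date if the payment has installments, else the payment date."""
    paid = [d for p, d in installments if p == payment_id]
    return max(paid) if paid else payments[payment_id]


def days_elapsed(invoice_id):
    """Days between the invoice date and the last date on which it was paid."""
    dates = [effective_payment_date(p) for i, p in invoice_payments if i == invoice_id]
    if not dates:
        return None  # invoice not (yet) linked to any payment
    return (max(dates) - invoices[invoice_id]).days


for inv in invoices:
    print(inv, days_elapsed(inv))  # INV1 19, INV2 29, INV3 90, INV4 90
```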
Have you tried creating a calendar table with the CALENDARAUTO function? That way you could create a relationship between both tables to make the calculations. Let me know if you need further assistance adding this table to the model.
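The DAX function for this is CALENDARAUTO. With its default fiscal_year_end_month of 12, it returns a single "Date" column of contiguous dates running from the start of the earliest calendar year to the end of the latest calendar year found in the model's date columns. The Python sketch below only mimics that range on a hypothetical list of dates, to show what the generated table contains; it is not the DAX function itself.

```python
from datetime import date, timedelta


def calendar_table(dates):
    """Every date from 1 Jan of the earliest year to 31 Dec of the latest year,
    mirroring CALENDARAUTO's default (calendar-year) range."""
    start = date(min(dates).year, 1, 1)
    end = date(max(dates).year, 12, 31)
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


# Hypothetical invoice and payment dates found in the model.
model_dates = [date(2022, 11, 15), date(2023, 2, 3), date(2023, 5, 2)]
calendar = calendar_table(model_dates)
print(calendar[0], calendar[-1], len(calendar))  # 2022-01-01 2023-12-31 730
```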
Basically, I have a big Excel dataset (about 500x500) with economic information on various companies.
Each row represents a different company, and the columns hold the information. A small part of it is qualitative, like ZIP code, type, etc., but most of it is quantitative. For each quantitative item we have data for 5 years, so there is one column per year for each item, i.e. Debt 2019, Debt 2020, etc.
So my question is: what is the best way to preprocess this data so I can work with it, and how should it be done? Options include preprocessing in Excel, running a script in Power BI, using Power Query, SQL, ...
The objective is a report that will be accessible online: the user types the name of a company and the report shows the dashboard with that company's information (only that one), so they can navigate through it.
The structure and the information shown are the same for every company; the only thing that changes is the "numbers" each company has. So it has to be possible to change which data is shown (to use the data of the company the user wants).
It also needs to be able to show comparative data against other groups of companies or against the total.
I want to get it right from the start, because later changes get complicated.
I thought about doing sort of a "relational model" with one "table" for each company with the quantitative data (with one row for each year and each column one info point) and then a general table with the qualitative data (with rows being each company and the columns the info). But I am not really sure.
I know how to use Power BI, but I have never used it for something this big. I would like to know which way of organizing this data is better, and some pointers on how to do it.
Many thanks to everyone.
I thought about doing sort of a "relational model" with one "table" for each company with the quantitative data (with one row for each year and each column one info point) and then a general table with the qualitative data (with rows being each company and the columns the info).
Yes, do that.
The general guidance is to use Power Query in Power BI to transform the data into a star schema model. See "Understand star schema and the importance for Power BI".
So that would typically result in one table that has the "dimension" data for each company, a date table, and a "fact" table at the grain of (CompanyId,Date) with the quantitative data.
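The central reshaping step is turning the year-suffixed columns (Debt 2019, Debt 2020, ...) into rows. In Power Query this is typically done with Unpivot Columns, then splitting the attribute name into metric and year, then pivoting the metric back into columns. The Python sketch below performs the same reshaping on one hypothetical row, producing a company dimension record and fact rows at a (company, year) grain with one column per metric. It assumes that every quantitative column name ends with a space and a four-digit year and treats every other column as qualitative; using Year instead of a full Date is also an assumption, since the source data is yearly.

```python
import re
from collections import defaultdict

# Assumed naming pattern for quantitative columns, e.g. "Debt 2019".
YEAR_COLUMN = re.compile(r"^(?P<metric>.+) (?P<year>\d{4})$")


def split_wide_row(row, id_column="Company"):
    """Split one wide row into a company dimension record and fact rows
    at the (company, year) grain, one column per quantitative metric."""
    company_id = row[id_column]
    dimension = {id_column: company_id}
    by_year = defaultdict(dict)
    for column, value in row.items():
        if column == id_column:
            continue
        match = YEAR_COLUMN.match(column)
        if match:
            by_year[int(match.group("year"))][match.group("metric")] = value
        else:
            dimension[column] = value  # qualitative attribute, e.g. ZIP code
    facts = [{id_column: company_id, "Year": year, **metrics}
             for year, metrics in sorted(by_year.items())]
    return dimension, facts


# Hypothetical wide row as it might appear in the Excel sheet.
row = {"Company": "Acme", "ZIP code": "28001", "Type": "SME",
       "Debt 2019": 120.0, "Debt 2020": 95.5,
       "Revenue 2019": 800.0, "Revenue 2020": 760.0}
dim, facts = split_wide_row(row)
print(dim)
for f in facts:
    print(f)
```

With the facts in a single table keyed by company, a company slicer or search box selects one company's rows, and comparisons against groups (e.g. by Type) or the total become ordinary aggregations over the same table.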
I have created a model in AWS that contains sales records by date, for example:
Type: Sale, Time: 2016-08-01, Success: 1 (1 is a boolean)
I want to predict how many sales there will be one month after the latest date (2016-08-01), which means the combination of Type=Sale AND Time > 2016-08-01 AND Success=1.
Any idea how to achieve this?
Thank you.
To use Amazon ML for such predictions, you need to aggregate your data and describe each aggregate with a wider array of attributes. You can use different levels of aggregation, for example daily, weekly and monthly.
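A sketch of that aggregation step in plain Python (it does not use any Amazon ML API): the records are hypothetical rows in the question's format, and only rows with Type=Sale and Success=1 are counted. Each aggregate, rather than each raw record, would then become one training example, with the count as the value to predict.

```python
from collections import Counter
from datetime import date

# Hypothetical raw records in the question's format.
records = [
    {"Type": "Sale", "Time": date(2016, 7, 4), "Success": 1},
    {"Type": "Sale", "Time": date(2016, 7, 4), "Success": 0},
    {"Type": "Sale", "Time": date(2016, 7, 5), "Success": 1},
    {"Type": "Sale", "Time": date(2016, 7, 12), "Success": 1},
    {"Type": "Sale", "Time": date(2016, 8, 1), "Success": 1},
]

# Period key for each aggregation level.
PERIOD_KEYS = {
    "daily": lambda d: d.isoformat(),
    "weekly": lambda d: "%d-W%02d" % d.isocalendar()[:2],  # ISO year and week
    "monthly": lambda d: "%d-%02d" % (d.year, d.month),
}


def aggregate(rows, level):
    """Count successful sales per day, ISO week, or month."""
    key = PERIOD_KEYS[level]
    return Counter(key(r["Time"]) for r in rows
                   if r["Type"] == "Sale" and r["Success"] == 1)


for level in ("daily", "weekly", "monthly"):
    print(level, dict(aggregate(records, level)))
```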
You should also add any relevant information about the items you are selling. For example, if you are selling umbrellas, you should add information about the amount of rain on that day; if you are selling flowers, you should add information about the day of the week or proximity to holidays, when people buy more flowers.
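A sketch of attaching calendar attributes to each daily aggregate, assuming a hand-maintained list of holiday dates. The HOLIDAYS list below is a hypothetical example, not a real holiday source; weather data such as rainfall would have to come from an external source and is not shown.

```python
from datetime import date

# Hypothetical holiday calendar; replace with the dates that matter for your market.
HOLIDAYS = [date(2016, 2, 14), date(2016, 5, 8), date(2016, 12, 25)]


def calendar_features(day, holidays=HOLIDAYS):
    """Extra attributes for one day's aggregate row."""
    upcoming = [h for h in holidays if h >= day]
    return {
        "day_of_week": day.strftime("%A"),   # e.g. "Wednesday"
        "is_weekend": day.weekday() >= 5,    # Saturday=5, Sunday=6
        "days_to_next_holiday": (min(upcoming) - day).days if upcoming else None,
    }


print(calendar_features(date(2016, 2, 10)))
```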
I want to create a report in NetSuite ERP that shows department sales and budget by month. I think I can achieve this by creating a saved search that selects these items, but I don't understand under which category I can find these fields. A saved search would be ideal, as I am trying to automate the reports in a Java application, and I discovered that I can retrieve the savedSearch results from it.
I found Department under the standard criteria in the subcategory "Owner..." and I added a Date standard criterion with the value "within this month". However, I have not found the group that contains the Sales/Income/Margin fields or the budget (though I found a sum aggregation function that may be used along with a field). I would appreciate any help. Also, will adding these fields be enough to get the Sales X Department X Date information, or do I have to use a different join method?
Thanks!
You'd have to combine two saved searches to achieve this. One is on budgets for the period you need; the department column is available in the budget saved search.
The other would be on transactions for the period. Generally, budgets are measured against posting transactions, so Invoices, Cash Sales, Credit Memos and Cash Refunds would go in your other search. If you group those by Department, you can then combine the two searches in code to create your own budget vs actuals report.
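A sketch of the final combining step, assuming both saved searches have already been run and their results reduced to simple rows. The field names department, period and amount are hypothetical, not NetSuite field IDs, and fetching the results (from Java or any other client) is not shown. Actuals and budgets are summed per department and period and then matched; a department that appears in only one search still shows up, with zero on the missing side. The sketch also assumes that credit memos and cash refunds arrive with negative amounts.

```python
from collections import defaultdict


def budget_vs_actuals(budget_rows, transaction_rows):
    """Join budget and actuals on (department, period) and compute the variance."""
    actuals = defaultdict(float)
    for t in transaction_rows:
        # Assumption: credit memos and cash refunds already carry negative amounts.
        actuals[(t["department"], t["period"])] += t["amount"]
    budgets = defaultdict(float)
    for b in budget_rows:
        budgets[(b["department"], b["period"])] += b["amount"]

    report = []
    for key in sorted(set(budgets) | set(actuals)):
        department, period = key
        budget, actual = budgets.get(key, 0.0), actuals.get(key, 0.0)
        report.append({"department": department, "period": period,
                       "budget": budget, "actual": actual,
                       "variance": actual - budget})
    return report


# Hypothetical, already-fetched search results.
budget = [{"department": "Sales East", "period": "2024-01", "amount": 1000.0},
          {"department": "Sales West", "period": "2024-01", "amount": 800.0}]
transactions = [
    {"department": "Sales East", "period": "2024-01", "amount": 700.0},   # invoice
    {"department": "Sales East", "period": "2024-01", "amount": 450.0},   # cash sale
    {"department": "Sales East", "period": "2024-01", "amount": -50.0},   # credit memo
]
for line in budget_vs_actuals(budget, transactions):
    print(line)
```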