I have a set of courses in PowerBI, and I have the student grades for each course. I would like to compare the grade distribution and for this purpose, I would like to present a histogram (or a line) for each course in the same visualization.
Microsoft deprecated its Histogram visualization, so I looked for third-party tools. There are two options, but both do not answer my need. I know I could somehow define grade beans and a metric that counts the number of students in each bean. However, it seems to me that such a fundamental visualization must have a better solution.
Can you suggest an alternative?
Related
I am simulating pga tournaments using Stata. My simulation results table consists of:
column 1: the names of the 30 players in the tournament
columns 2 - 30,001: the 4 round results of my monte-carol simulations.
what I am trying to do is create a 30 x 30 matrix with the golfers' names as column 1 and across the column names where each cell represents the percentage of times Golfer A beat Golfer B outright from the 30,000 simulations. Is this possible to do in Stata? Thanks
I tend to say that everything is always possible in all programming languages, but somethings are much more difficult to do in some languages compared to others. I do not think that Stata is great tool for what you intend to do.
You need to provide some code examples for us to be able to help you with your task, but here is one thing I can say. Stata has two programming languages. One is often called Stata (but is called ado on Stata Corps webiste) and the other is Mata. If you for some reason need to use the software Stata, you should do this in the language Mata that has more matrix operators than ado. And in ado you cant store text in a matrix, so if you want to store the name of the golfer you need to use Mata, but you can also use indexes of rows and columns to keep track of the golfers.
With that said, Stata is primarily a tool to make operations and analyze a single dataset loaded into memory (recently support for multiple datasets has been added). So to answer your question, yes, this can be done in Stata, but you are probably much better of doing it in a language with more support for multidimensional arrays/vectors. For example, R or Python.
Let's say you had a table.
OrderNumber, OrderDate, City, & Sales
The Sales field is given to you. No need to calculate it.
When you bring in this data into Power BI, say you want to analyze Sales by City (in a table format).
You can just straight away drag the two fields into the table.
No need to create a measure.
So now, suppose you created a measure, though.
Total Sales = Sum(Sales).
Is there any advantage to it, in this scenario?
Is it more efficient to use: City, Total Sales
than it is to use: City, Sales
Both display the same information.
When you drag the field into the table, what Power BI does is create an implicit measure automatically based on its best guess of what aggregation (e.g. sum, max, count) it thinks you want.
So in this case, using an explicitly defined measure or an implicitly generated measure should perform the same since it is doing the same thing in the background, i.e., SUM(TableName[Sales]).
It's generally considered best practice to use explicit measures.
You may be interested in this video discussing the differences.
I was told that it is good to always create explicit measures, and that measures are more efficient. Weather right or wrong, I don't know, but from perspective of policy, it is a good idea, since measures do protect you from column name changes. In general, I think I can just make a rule of thumb to always define any measures that you want to report on explicitly.... BUT the answer above could also be correct... stack exchange doesn't let you choose multiple answers....
I'm playing around with some datasets on Kaggle.com, trying to learn better practices for ETL, as I tend to get stuck with specific things with the transform part. For this question, I am dealing with the survey results from Stack Overflow 2018: https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey - specifically the LanguageWorkedWith column.
Currently I am using a combination of RapidMiner/Excel to attempt to change the data. I am not well versed in R and Python code enough to solve this problem with coding methods.
The problem with the current column, is it lists all the languages that a user has chosen separated by a semi-colon. I can easily split a column on a semi-colon, but what occurs is either 2 things:
I have 31 columns of LanguageWorkedWith1 - LanguageWorkedWith31. This makes gathering a count of languages by salary to not work.
A cartesian effect where each row would be duplicated to accommodate only the choice of language. So you'll have a lot of duplicate rows, which definitely affects the integrity of the data. I have also tried using Power BI ( the Load location) to remove duplicates on the responder ID and language, but that didnt work.
Ideally I'd like to do a language by salary visual in Power BI, similar to how many kernals have it, but cant figure out the process for making this happen outside of code. Not sure how this would look exactly, but if i can split all the languages and count them, I can at least do something like this:
But I'm not sure if i can relate this back to salary with how the data is.
I just want to understand some transforming processes better! Appreciate any help!
The key here is to split into rows instead of columns.
So that you end up with a table like this:
You can keep that row expansion in its own related table in your data model so you aren't creating a giant table.
From there it's pretty easy to make visuals provided you know a little bit of DAX. For example, I created an AvgSalary measure (after converting that column to a numeric type) like this:
AvgSalary =
CALCULATE (
AVERAGE ( survey_results_public[ConvertedSalary] ),
FILTER (
survey_results_public,
survey_results_public[Respondent] IN VALUES ( 'Language'[Respondent] )
)
)
and was then able to create interesting charts like the following:
In PowerBI I'd like to build Non-standard matrix very similar to the report in Google Analytics.
What do I have now:
I want to change my subtotal to measure, which is calculated as the difference in percentage of the two values
What I want to get:
In Power BI, there is no way to override the subtotals of a matrix with a calculation. Part of the challenge is that you know there are only two date ranges, but as far as Power BI is concerned, there could be any number of date ranges.
It's difficult to tell from your question exactly what input you have and what output you're looking for. Further, the numbers in your screenshots are obscured. However, one consideration would be to solve the problem using measures (i.e. a measure representing the first date range, a measure representing the 2nd date range, and then a measure calculating the difference between them). You may need to change the layout of your visual a little to make this work and the specific design would depend on how static your date ranges are.
I have finally gotten a column chart working for my data set. However, it only outputs fifteen columns, and the data set has 36 columns. It will output fifteen columns (or less if I limit the set to only items that are non-zero...but my boss wants all of the data shown) no matter what width the graph is set to.
Is there an absolute hard-coded column limit for graphs made by Google's Charts API, and if not, is there a way I can tell the graph to output everything?
I've just run into this myself, almost 7 years after the original problem report. Columns representing the right-side of my data are being silently un-drawn.
Let's look at the big picture. Somebody provides a charting library. They should be expected to show the data as best they can. In the case of a column table, that would be to show the first and last columns, and then choose which intermediate columns to show based on an algorithm that takes available pixels into account. It would then let the user zoom in to see the full set of columns within the selected range. This gives the developer using the chart the freedom to show an unlimited amount of data and not have to worry that someday columns at the end are simply not drawn.
Google is already choosing to not print some of the column labels due to space constraints, so they're already halfway to understanding the big picture.
Nowhere in the documentation does it explain this truncation of columns due to space constraints, or for any other chart type that I've seen. But you sure can choose your background colors in great levels of detail.
If I had known this restriction going in, I would have chosen a different chart package and not wasted my time. My choices now are to break my "Lifetime" data into yearly graphs that fit in the available space, which is clunky as hell, or migrate to a different chart package. Thanks Google. :^(
P.S. I tried to post this as a comment to the OP, but after using SO for years I don't have enough points...