How to model data for race results in Power BI?

I have data where a constant set of athletes compete in the same race every month. Each gets a position: 1st, 2nd, etc.
I was wondering what visualization to choose to see the position results for each race through time. I was thinking of a Sankey diagram in which each destination column represents a single race's results, always ordered top-down from 1st, 2nd, and so on. See below:
You can see that Blue got 2nd place in Race 1 and 2nd place in Race 2. Also, Purple got 1st in Race 1 but had a bad lunch before Race 2 and didn't do so well.
I haven't been able to adapt current resources to a Sankey in this way.
Is this possible?
Is there another visualization that can accomplish the same idea?
How should the data be structured for this chart to work?
Thanks so much!

You can certainly do this with the Sankey Chart visual:
However, you'll probably need to drag and drop the bars to get the order you want and manually set the colors how you want (not great if you need this fully automated).
This is how I set up the data:
Edit:
A simple line chart will be easier to automate.
The data format is more intuitive too.
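As a rough illustration of the long format a line chart tends to want (one row per athlete per race), here is a minimal Python sketch. The athlete "Green", Purple's Race 2 position, and the column names `race`, `athlete`, and `position` are assumptions made up for the example, not taken from the original data.

```python
# Hypothetical wide table: one entry per athlete, one value per race.
# Blue's positions follow the question; the other values are invented.
wide = {
    "Blue": {"Race 1": 2, "Race 2": 2},
    "Purple": {"Race 1": 1, "Race 2": 3},
    "Green": {"Race 1": 3, "Race 2": 1},
}


def to_long(wide_results):
    """Unpivot into one (race, athlete, position) row per result."""
    rows = []
    for athlete, races in wide_results.items():
        for race, position in races.items():
            rows.append({"race": race, "athlete": athlete, "position": position})
    # Sort by race, then by finishing position, so each race reads top-down.
    return sorted(rows, key=lambda r: (r["race"], r["position"]))


long_rows = to_long(wide)
```

With rows shaped like this, the race would typically go on the axis, the position on the values, and the athlete on the legend.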

Related

How to choose the MIN of a calculated column (not in Power Query)

Working with basketball data, I'm trying to get the time on court for the players (there are some columns that have information about a player or players).
I tried to obtain the value with a calculated column named "TimeOnCourt". The code works for most cases, but in one case, due to a data entry mistake, there are different values in the player columns for the same "TimeOnCourt", so the mistake shows up when I visualize the information.
I guess I could use the "Index" column to add a piece of code that chooses the MIN value of the "TimeOnCourt" column but, after trying some options, I don't know where to put it or whether I have to change the full code.
I also tried with Test_Flags, but that does not work for all cases (it fixed 2 of the 4 cases).
Here is the link to the pbix file with the Test_Flag measures I tried: Link to pbix file v3
And here is the image with the mistake marked. The expected time in the right visualization should be 0:40:00 instead of 0:43:03 (the difference is due to the duplicate in Full Quarter = 2Q and Time_Def = 0:04:00). This could happen again even though I talked with the data entry team, so the solution should be general, not a filter for this specific case.

How to automatically feed a cell value from a range of values, based on its matching condition with other cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (which I also have to increase from time to time), and I have about 8 significance values that I have decided to relate to these 28 types of work, predefined.
As soon as I enter a type of work in cell 1, I want the adjacent cell 2 to be automatically populated with a significance value (from a range of 8 values) that I have predefined.
Every time I input a new or old occurrence of a type of work, the adjacent cell should be matched with its relevant significance value and populated in real time.
I know how to do it using IF, IFS, and IF_OR conditions, but given the ever-expanding types of work and significance values, those formulas will become very big, complicated, and repetitive. I feel there's a more efficient way to achieve it. Also, I don't want the value to be selected from a drop-down list.
Please help me out with the most efficient way to handle this. Thanks in advance :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try:
=XLOOKUP(D2,A2:A,B2:B)
Or use the FILTER() function, like:
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables so that the second argument includes all the potential values and their significances.
This is the formula that worked for me (for anybody's reference):
I created another reference sheet listing the types of work and their significance. From that sheet, I look up the value using either VLOOKUP, FILTER, or XLOOKUP. I'm using Google Forms for inputting my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))
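The idea shared by these formulas is a separate reference table that maps each type of work to its significance, so new types only need a new row rather than a longer formula. A minimal Python sketch of that lookup follows; the work types and significance labels are hypothetical, and returning an empty string for blank or unknown entries is an assumption that mirrors the IFERROR behaviour above.

```python
# Hypothetical reference table: type of work -> significance.
SIGNIFICANCE = {"Email": "Low", "Coding": "High", "Meeting": "Medium"}


def significance_for(work_type, table=SIGNIFICANCE):
    """Return the significance for a work type, or "" when blank or unknown."""
    return table.get(work_type, "")


# Entries as they might arrive, including a blank and an unlisted type.
entries = ["Coding", "Email", "", "Gardening"]
filled = [significance_for(entry) for entry in entries]
```

Adding a 29th type of work then means adding one entry to the table; the lookup itself does not change.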

How to apply conditional formatting (if cell is in another range) to a range of cells

So I have searched through several different questions related to this. None of them seem to be asking exactly what I'm looking for and none of the solutions I've found have worked for me thus far.
I have several columns of data (player names) where each column's values are generated from a formula in the 2nd row of that column. The 1st row is a header (game name). This whole range is the collection of which players are willing to play which games. These are columns D-J (roughly; the list is dynamically generated with another formula, based on form responses).
I have another range of data where the 1st column is the Player and the 2nd is the player's PREFERRED game. This data is also generated with a formula based on form responses. These are columns A-B.
Here's what I'm trying to do
Using conditional formatting in columns D-J, I want to highlight the player's name if this game (in row 1 of this column) is their preferred game (range A2:B).
I've tried several different variations of VLOOKUP, MATCH, and FILTER in the conditional formatting, but so far nothing has worked. The problem I run into every time is that I can't figure out how to reference the cell that the formatting is applying to while still having the rule evaluate each individual cell over the whole range.
I know I could do this if I applied an individual conditional formatting rule to each cell. However, that is a very time-consuming and inelegant solution, considering I'm expecting my data range to be much larger in the future. I need a conditional formatting formula that will work across the whole range or, at the very least, for an entire column.
This is a mock of what I'm trying to accomplish:
This is a link to a mock of my sheet so that you can clearly see the data layout and specific formulas I'm using:
https://docs.google.com/spreadsheets/d/1wy1T6dWJwNC_EfdCAbkuxtkJH7y4Cg3x4IyEk6R567M/edit?usp=sharing
Use:
=REGEXMATCH(D3, TEXTJOIN("|", 1, FILTER($A$3:$A, $B$3:$B=D$2)))
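The rule this formula expresses can be sketched in Python: for each game column, collect the players whose preferred game equals the column header, then highlight a cell when its name is in that set. The player names, games, and the function name `highlighted_cells` are hypothetical. The sketch uses exact name matching, whereas REGEXMATCH also matches partial names.

```python
# Hypothetical data: (player, preferred game) pairs, and the players
# willing to play each game, keyed by the column header.
preferred = [("Ann", "Chess"), ("Bob", "Go"), ("Cid", "Chess")]
grid = {"Chess": ["Ann", "Bob", "Cid"], "Go": ["Ann", "Bob"]}


def highlighted_cells(grid, preferred):
    """Return {game: [names to highlight]} for players who prefer that game."""
    result = {}
    for game, players in grid.items():
        # Players whose preferred game equals this column's header.
        fans = {name for name, favourite in preferred if favourite == game}
        result[game] = [player for player in players if player in fans]
    return result


marks = highlighted_cells(grid, preferred)
```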

ChartJS Tooltip displaying data from same dataset

I'm currently working on a chart that displays upwards of ~1000 data points at any given moment across 2-3 datasets.
The only thing is that each of the points has a different timestamp (x-value).
Our goal is that, upon hovering over one data point, the tooltip also shows the closest data point from each of the other datasets. We were able to achieve that with:
options={{
  tooltips: {
    mode: 'x'
  }
}}
I understand that there is a default pointHitRadius and that seems to be the reason why multiple values of the same dataset are appearing in the same tooltip.
I made a simple test case: TEST
I increased the pointBorderRadius and it seems to include 1-5 points at a time.
Is there a way to only include data from each dataset ONCE?
The closest thing I found (before having to extend functionality), is that there is a filter function available.
However, from what I can see, it looks like it only returns one instance from each dataset, which wouldn't be too helpful.
Anyone run into this issue?
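The selection rule being asked for can be stated independently of Chart.js: for a hovered x-value, keep exactly one point per dataset, the one whose x is closest. The Python sketch below shows only that logic, with made-up data and a hypothetical function name; it is not Chart.js API and would still need to be wired into the chart's tooltip.

```python
def nearest_per_dataset(datasets, hover_x):
    """For each dataset, keep only the single point whose x is closest to hover_x."""
    picks = {}
    for label, points in datasets.items():
        if points:  # skip empty datasets
            picks[label] = min(points, key=lambda p: abs(p[0] - hover_x))
    return picks


# Made-up (x, y) points; x stands in for the timestamp.
datasets = {
    "A": [(1.0, 10), (2.0, 12), (3.1, 9)],
    "B": [(0.8, 5), (2.4, 7)],
}
picks = nearest_per_dataset(datasets, hover_x=2.1)
```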

Data mining with Weka

I am learning how to do data mining and I am using this data set from UCI's website.
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
The problem I am encountering is how to deal with the area class. My understanding from the description is that I need to apply ln(x+1) to area using AddExpression.
Am I going in the correct direction with this? Or are there other filters I should investigate? Thank you.
I'll try to answer your question based on the little information you provide. I haven't worked with the forest-fires data set, but by inspection I see that the class attribute "area" often has the value 0. You may not be able to simply filter out the rows with area = 0; your dataset might become too small.
I think you are asked to perform a regression of some attribute(s) against "log(area)" in order to linearize it. However, when you try to calculate the log of the area, values of 0 are a problem because log(0) is undefined. Values between 0 and 1 might also be problematic, since their logs are negative.
So a common fix is to add 1 to the value of "area" before taking the log. This introduces a systematic error, but it is small, it removes all 0-values, and you can still derive useful models from your log(x+1)-transformed dataset.
And yes, in Weka you do this under "Preprocess" with the AddExpression filter, using an expression such as log(aN+1), where aN refers to the area attribute by its index N. This creates a new attribute. Then you might remove the old area attribute.
Of course, in interpreting your model, you should be aware of the transformation. If you just want to find out what the significant independent attributes are in your linear regression model, I'd say the transformation does not matter. The data points are just shifted a little bit.
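To make the transformation concrete outside Weka, here is a small Python sketch of ln(x+1) and its inverse. The area values are made up for illustration and are not rows from the forest-fires dataset.

```python
import math


def log1p_transform(areas):
    """Apply ln(x + 1) to each area value; an area of 0 maps to 0.0."""
    return [math.log1p(a) for a in areas]


areas = [0.0, 0.36, 10.0]  # made-up values, not rows from the dataset
transformed = log1p_transform(areas)

# Predictions made on the transformed scale can be mapped back with exp(y) - 1.
back = [math.expm1(t) for t in transformed]
```

Keeping the inverse in mind is what "being aware of the transformation" amounts to when reading predicted areas off the model.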