I'm currently working on a chart that displays upwards of ~1000 data points at any given moment across 2-3 datasets.
The only catch is that each of the points has a different timestamp (x-value).
Our goal is that hovering over one data point also brings up the closest data point from each of the other datasets. We were able to achieve that with:
options={{
  tooltips: {
    mode: 'x'
  }
}}
I understand that there is a default pointHitRadius and that seems to be the reason why multiple values of the same dataset are appearing in the same tooltip.
I made a simple test case: TEST
I increased the pointBorderRadius and it seems to include 1-5 points at a time.
Is there a way to only include data from each dataset ONCE?
The closest thing I found (before having to extend functionality) is that there is a filter function available.
However, from what I can see, it looks like it only returns one instance from each dataset, which wouldn't be too helpful.
Anyone run into this issue?
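One way to approach this, sketched under assumptions: reduce the tooltip items to the single nearest point per dataset before they are rendered. The helper below is plain JavaScript and does not call Chart.js; nearestPerDataset, hoverX, and the { datasetIndex, index, x } item shape are hypothetical names chosen for illustration, and how the result is wired into the tooltip (a custom tooltip or a custom interaction mode) depends on the Chart.js version in use.

```javascript
// Hypothetical helper: keep only the item closest to hoverX for each dataset.
// Each item is assumed to look like { datasetIndex, index, x } with a numeric x
// (for example a timestamp in milliseconds).
function nearestPerDataset(items, hoverX) {
  const best = new Map(); // datasetIndex -> closest item seen so far
  for (const item of items) {
    const current = best.get(item.datasetIndex);
    if (
      current === undefined ||
      Math.abs(item.x - hoverX) < Math.abs(current.x - hoverX)
    ) {
      best.set(item.datasetIndex, item);
    }
  }
  // Return the survivors ordered by dataset so tooltip lines stay stable.
  return [...best.values()].sort((a, b) => a.datasetIndex - b.datasetIndex);
}

// Made-up sample: two candidate points from each of two datasets.
const hits = [
  { datasetIndex: 0, index: 4, x: 1000 },
  { datasetIndex: 0, index: 5, x: 1012 },
  { datasetIndex: 1, index: 2, x: 1007 },
  { datasetIndex: 1, index: 3, x: 1030 },
];
console.log(nearestPerDataset(hits, 1010));
```

Because the helper makes a single pass over the candidate items, it stays cheap even when the hit radius pulls in several points per dataset.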
Working with basketball data, I'm trying to get the time on court for the players (there are some columns that have information about a player or players).
I tried to obtain the value with a calculated column named "TimeOnCourt". The code works for most cases, but in one case a data entry mistake left different values in the player columns for the same "TimeOnCourt", so the mistake shows up when I visualize the information.
I guess I could use the "Index" column to pick the MIN value of the "TimeOnCourt" column, but after trying some options I don't know where to put that code or whether I have to change the whole thing.
I also tried with Test_Flags, but that doesn't work for all cases (it fixes only 2 of the 4).
Here is the link to the pbix file with the Test_Flag measures I tried: Link to pbix file v3
And here is the image with the mistake marked. The expected time in the right visualization should be 0:40:00 instead of 0:43:03 (the difference is due to the duplicate in Full Quarter = 2Q and Time_Def = 0:04:00). This could happen again even though I talked with the data entry team, so the solution should be general, not a filter for this specific case.
Problem
I'm running into an issue when trying to compare data across two sheets to find discrepancies - specifically when it comes to comparing start and end times.
Right now, the "IF" statement in my screenshot is executing perfectly, except when a time is involved - it's reading those cells as decimals instead (but only sometimes).
I've tried formatting these cells (on the raw data AND on this "Discrepancies" report sheet) so that they are displayed as a "HH:MM am/pm" time, but the sheet is still comparing the decimal values.
Is there anything that I can add to this function to account for a compared value being a time instead of text, and having that text be compared for any discrepancies? I cannot add or change anything to the raw data sheets, the only thing I can edit is the formula seen in the screenshot I provided.
See the highlighted cells in my screenshot - this is the issue I keep running into. As you can see, there are SOME cells (the non-highlighted ones) that are executing as intended, but I'm unsure why this isn't the case for the whole spreadsheet when I've formatted everything the same way using the exact same formula across the whole sheet.
For example, the value in cell N2 is "8:00 AM" on both sheets, so the formula should just display "8:00 AM" in that cell (and NOT be highlighted), since there is no discrepancy between the two cells it's comparing. Instead, it's showing both times as decimals with the slightest difference between them and is flagging a difference where there technically isn't (or shouldn't be) one.
Please help!
Screenshot of original spreadsheet for reference
---EDIT (added the below):
Here is a view-only version of a SAMPLE SHEET that displays the issue I'm having:
https://docs.google.com/spreadsheets/d/1BdSQGsCajB3kOnYxzM3sl-0o3iTvR3ABdHpnzYRXjpA/edit?usp=sharing
On the sample sheet, the only cells that are performing as intended are C2, E2, G2, I2, K2, K6, or any cells that contain text like "Closed". Any of the other cells that have a time in both raw data tabs appears to be pulling the serial numbers for those times instead of correctly formatting it into "HH:mm AM/PM".
A quick tour of how the SAMPLE SHEET is set up:
User enters raw data into the "MicrositeRawData" and "SalesforceRawData" tabs.
Data is pulled from the "SalesforceRawData" tab into the "CleanedUpSalesforceData" tab using a QUERY that matches the UNIQUE IDs from the "MicrositeRawData" sheet, so that it essentially creates a tab that's in the same order and accounts for any extraneous data between the tabs. (Keep in mind this is a sample sheet; the original sheet includes a lot more data, which causes a mismatch of rows between the sheets and makes the QUERY necessary.)
The "DISCREPANCIES" tab then compares the data between the "MicrositeRawData" and "CleanedUpSalesforceData" tabs. If the data is the same, it simply copies the data from the "MicrositeRawData" cell. But if the data is NOT the same, it lists the values from both sheets and is conditionally formatted to highlight those cells in yellow.
If there is data on the "MicrositeRawData" tab that is NOT included on the "SalesforceRawData" tab, the "DISCREPANCIES" tab will notate that and highlight the "A" cell in pink instead of yellow (as demonstrated in "A5").
Try this in B2:
=IF(MicrositeRawData!B2=CleanedUpSalesforceData!B2, MicrositeRawData!B2,
"MICROSITE: "&TEXT(MicrositeRawData!B2, "h:mm AM/PM")&CHAR(10)&
"SALESFORCE: "&TEXT(CleanedUpSalesforceData!B2, "h:mm AM/PM"))
Update:
Delete all formulae from range B2:O10 and use this in B2:
=ARRAYFORMULA(IF(TO_TEXT(MicrositeRawData!B2:O10)=
TO_TEXT(CleanedUpSalesforceData!B2:O10), MicrositeRawData!B2:O10,
"MICROSITE: "&TEXT(IF(MicrositeRawData!B2:O10="",
"", MicrositeRawData!B2:O10), "h:mm AM/PM")&CHAR(10)&
"SALESFORCE: "&TEXT(IF(CleanedUpSalesforceData!B2:O10="",
"", CleanedUpSalesforceData!B2:O10), "h:mm AM/PM")))
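Why comparing formatted text helps can be sketched outside Sheets. Times are stored as fractions of a day, so two cells that both display 8:00 AM can hold serial values that differ far below the displayed precision. The JavaScript below only illustrates that idea and is not what Sheets runs internally; formatSerialTime is a hypothetical helper, rounding to the nearest minute is an assumption of this sketch, and the two sample serials are made up.

```javascript
// Hypothetical helper: format a day fraction (0 <= serial < 1) as "h:mm AM/PM".
// Rounding to the nearest minute is an assumption made for this illustration.
function formatSerialTime(serial) {
  const totalMinutes = Math.round(serial * 24 * 60) % (24 * 60);
  const hours24 = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  const suffix = hours24 < 12 ? "AM" : "PM";
  const hours12 = hours24 % 12 === 0 ? 12 : hours24 % 12;
  return `${hours12}:${String(minutes).padStart(2, "0")} ${suffix}`;
}

// Two made-up serials that both read as 8:00 AM but are not equal as numbers.
const micrositeSerial = 8 / 24;
const salesforceSerial = 8 / 24 + 1e-9;

// Raw numeric comparison reports a discrepancy...
console.log(micrositeSerial === salesforceSerial);
// ...while comparing the formatted text does not.
console.log(
  formatSerialTime(micrositeSerial) === formatSerialTime(salesforceSerial)
);
```

The same reasoning is why the array formula compares TO_TEXT results rather than the raw cell values.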
I am new to SageMaker. I have a large csv dataset which I would like labelled:
sentence_id  sentence          pre_agreed_label
148392       A sentence        0
383294       Another sentence  1
For each sentence, I would like a) a yes/no binary classification in response to a question, and b) on a scale of 1-3, how obvious the classification was. I need the sentence id to map to other parts of the dataset, and will use the pre-agreed labels to assess accuracy.
I have identified SageMaker GroundTruth labelling jobs as a possible way to do this. Is this the best way? In trying to set it up I have run into a few problems.
The first problem is I can't find a way to display only the sentence column to the labellers, hiding the sentence_id and pre_agreed_labels.
The second is that there is either single labelling or multi labelling, but I would like a way to have two sets of single-selection labels:
Select one for binary classification:
Yes
No
Select one for difficulty of classification:
Easy
Medium
Hard
It seems as though this can be done using custom HTML, but I don't know how to do this, and the template it gives you doesn't even render.
Finally, having not used Mechanical Turk before, are there ways of ensuring people take the work seriously and don't just select random answers? I can see there's an option to have x number of people answer the same question, but is there also a way to put in an obvious question to which we already have a 'pre_agreed_label' every nth question, and kick people off the task if they get it wrong? There also appears to be a maximum of $1.20 per task, which seems odd.
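For the first problem, one possible approach is to build the input manifest so that only the sentence is the task's source, while the id and pre-agreed label ride along as extra keys. The sketch below assumes the Ground Truth JSON Lines manifest format in which each line carries a "source" key; whether extra keys are preserved in the output and kept out of the worker UI should be checked against the Ground Truth documentation. toManifest is a hypothetical helper, and the rows are the sample rows from the question.

```javascript
// Hypothetical helper: turn parsed CSV rows into JSON Lines manifest text.
// Assumption: the labelling UI shows only the "source" value, and the other
// keys are carried along so results can be joined back by sentence_id.
function toManifest(rows) {
  return rows
    .map((row) =>
      JSON.stringify({
        source: row.sentence,
        sentence_id: row.sentence_id,
        pre_agreed_label: row.pre_agreed_label,
      })
    )
    .join("\n");
}

// The two sample rows from the question.
const sampleRows = [
  { sentence_id: 148392, sentence: "A sentence", pre_agreed_label: 0 },
  { sentence_id: 383294, sentence: "Another sentence", pre_agreed_label: 1 },
];

console.log(toManifest(sampleRows));
```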
I have data where a constant set of athletes compete in the same race every month. Each gets a position (1st, 2nd, etc.).
I was wondering what visualization to choose to see the position results for each race through time. I was thinking of a Sankey diagram in which each destination column represents the results of a single race, always ordered from the top down (1st, 2nd, and so on). See below:
You can see that Blue got 2nd place in Race 1 and 2nd place in Race 2. Also, Purple got 1st in Race 1 but had a bad lunch before the next race and didn't do so well.
I haven't been able to adapt current resources to a Sankey in this way.
Is this possible?
Is there another visualization that can accomplish the same idea?
How should the data be structured for this chart to work?
Thanks so much!
You can certainly do this with the Sankey Chart visual:
However, you'll probably need to drag and drop the bars to get the order you want and manually set the colors how you want (not great if you need this fully automated).
This is how I set up the data:
Edit:
A simple line chart will be easier to automate.
The data format is more intuitive too.
In my application, I can view multiple charts at the same time. Those charts can be displayed either separately or overlaid. Overlaid means that all of those charts are in the same chart div, like this:
The problem is: if I want to add the data, I need to build a single array out of all those data points, looking like this:
let data=[
  {
    date:...,
    data1:...,
    data2:...,
    .....
  },
  {
    ....
  }
]
But when I load it from the server, I have separate data arrays which I have to merge. That means I have to look at every single data point and create a new one in the merged array if there is no data point at the given date.
Doing this with thousands of data points uses far too many resources. My question is: is it possible to supply multiple data arrays instead of only one?
Using series.data instead of chart.data solved the problem for me!
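For cases where a single merged array is still needed (for example with chart.data), the merge does not have to scan the merged array for every point. A minimal sketch, assuming each series arrives as an array of { date, value } objects with date as a number or an ISO date string (Date objects would not work as Map keys here); mergeByDate and the sample values are illustrative and not part of any charting library.

```javascript
// Hypothetical helper: merge several series into one array keyed by date.
// seriesByField maps an output field name (e.g. "data1") to its points.
function mergeByDate(seriesByField) {
  const merged = new Map(); // date -> merged data point
  for (const [field, points] of Object.entries(seriesByField)) {
    for (const { date, value } of points) {
      let row = merged.get(date);
      if (row === undefined) {
        row = { date };
        merged.set(date, row);
      }
      row[field] = value;
    }
  }
  // Sort once at the end so the chart receives points in date order.
  return [...merged.values()].sort((a, b) =>
    a.date < b.date ? -1 : a.date > b.date ? 1 : 0
  );
}

// Made-up sample data with one shared date and two unshared ones.
const mergedData = mergeByDate({
  data1: [
    { date: "2020-01-01", value: 3 },
    { date: "2020-01-03", value: 5 },
  ],
  data2: [
    { date: "2020-01-02", value: 7 },
    { date: "2020-01-03", value: 9 },
  ],
});
console.log(mergedData);
```

Each point is looked up in the Map once, so the cost grows with the total number of points plus one final sort, rather than with a search of the merged array per point.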