What would be the best visually appealing power BI chart/graph to compare numbers with huge differences? - powerbi

I need to compare Year Month values somehow next to each other.
But the difference is so big that it is not very visually appealing.
I tried to find a good chart to display those, but so far no success.
Any recommendation?

If you want them to share the same axis as you have in your image, you could switch the scale to logarithmic instead of linear. Otherwise, you could try putting them on separate axes so they scale independently.
Another option would be to normalize them somehow. If one set were monthly, then you could multiply it by 12 to "annualize" it.

Related

How to manage custom number formatting in power BI?

How can I do custom number formatting in a Power Bi visual?
I don't want to show all value as million. I want to put thousand for 1-day value, and million for 1-week value and year for 1-year value.
Power BI charts follow the principles of good data visualisation. That includes a scale that is relevant to the data with labels that relate to the scale.
In the visualisation, the differences for the values less than 1M are not discernible. The label with the 0M supports that approach, although it doesn't look great. But that happens when you have a chart with very large AND very small values. Power BI only supports one display unit and you selected Millions.
You may want to consider using a different visual for the data. Not all visuals to be shown as charts. If you want to show the exact numbers, then a simple table might be a better approach. In a sorted list of numbers, the digits in a number act very much like a horizontal bar.
Or split the chart in two and show one chart for values above 1M and another for values below 1M.
Or use Thousands as display units instead of Millions.

Clustering a list of dates

I have a list of dates I'd like to cluster into 3 clusters. Now, I can see hints that I should be looking at k-means, but all the examples I've found so far are related to coordinates, in other words, pairs of list items.
I want to take this list of dates and append them to three separate lists indicating whether they were before, during or after a certain event. I don't have the time for this event, but that's why I'm guessing it by breaking the date/times into three groups.
Can anyone please help with a simple example on how to use something like numpy or scipy to do this?
k-means is exclusively for coordinates. And more precisely: for continuous and linear values.
The reason is the mean functions. Many people overlook the role of the mean for k-means (despite it being in the name...)
On non-numerical data, how do you compute the mean?
There exist some variants for binary or categorial data. IIRC there is k-modes, for example, and there is k-medoids (PAM, partitioning around medoids).
It's unclear to me what you want to achieve overall... your data seems to be 1-dimensional, so you may want to look at the many questions here about 1-dimensional data (as the data can be sorted, it can be processed much more efficiently than multidimensional data).
In general, even if you projected your data into unix time (seconds since 1.1.1970), k-means will likely only return mediocre results for you. The reason is that it will try to make the three intervals have the same length.
Do you have any reason to suspect that "before", "during" and "after" have the same duration? If not, don't use k-means.
You may however want to have a look at KDE; and plot the estimated density. Once you have understood the role of density for your task, you can start looking at appropriate algorithms (e.g. take the derivative of your density estimation, and look for the largest increase / decrease, or estimate an "average" level, and look for the longest above-average interval).
Here are some workaround methods that may not be the best answer but should help.
You can plot the dates as converted durations from a starting date (such as one week)
and convert the dates to number representations for time in minutes or hours from the starting point.
These would all graph along an x-axis but Kmeans should still be possible and clustering still visible on a graph.
Here are more examples of numpy:Python k-means algorithm

Shape-matching of plots using non-linear least squares

What would b the best way to implement a simple shape-matching algorithm to match a plot interpolated from just 8 points (x, y) against a database of similar plots (> 12 000 entries), each plot having >100 nodes. The database has 6 categories of plots (signals measured under 6 different conditions), and the main aim is to find the right category (so for every category there's around 2000 plots to compare against).
The 8-node plot would represent actual data from measurement, but for now I am simulating this by selecting a random plot from the database, then 8 points from it, then smearing it using gaussian random number generator.
What would be the best way to implement non-linear least-squares to compare the shape of the 8-node plot against each plot from the database? Are there any c++ libraries you know of that could help with this?
Is it necessary to find the actual formula (f(x)) of the 8-node plot to use it with least squares, or will it be sufficient to use interpolation in requested points, such as interpolation from the gsl library?
You can certainly use least squares without knowing the actual formula. If all of your plots are measured at the same x value, then this is easy -- you simply compute the sum in the normal way:
where y_i is a point in your 8-node plot, sigma_i is the error on the point and Y(x_i) is the value of the plot from the database at the same x position as y_i. You can see why this is trivial if all your plots are measured at the same x value.
If they're not, you can get Y(x_i) either by fitting the plot from the database with some function (if you know it) or by interpolating between the points (if you don't know it). The simplest interpolation is just to connect the points with straight lines and find the value of the straight lines at the x_i that you want. Other interpolations might do better.
In my field, we use ROOT for these kind of things. However, scipy has a great collections of functions, and it might be easier to get started with -- if you don't mind using Python.
One major problem you could have would be that the two plots are not independent. Wikipedia suggests McNemar's test in this case.
Another problem you could have is that you don't have much information in your test plot, so your results will be affected greatly by statistical fluctuations. In other words, if you only have 8 test points and two plots match, how will you know if the underlying functions are really the same, or if the 8 points simply jumped around (inside their error bars) in such a way that it looks like the plot from the database -- purely by chance! ... I'm afraid you won't really know. So the plots that test well will include false positives (low purity), and some of the plots that don't happen to test well were probably actually good matches (low efficiency).
To solve that, you would need to either use a test plot with more points or else bring in other information. If you can throw away plots from the database that you know can't match for other reasons, that will help a lot.

Analyzing gaze tracking data

I have an image which was shown to groups of people with different domain knowledge of its content. I than recorded gaze fixation data of them watching the image.
I now kind of want to compare the results of the two groups - so what I need to know is, if there is a correlation of the positions of the sampling data between the two groups or not.
I have the original image as well as the fixation coords. Do you have any good idea how to start analyzing the data?
It's more about the idea or the plan so you don't have to be too technical on that one.
Thanks
Simple idea: render all the coordinates on the original image in a 'heat map' like way, one image for each group. You can then visually compare the images for correlation, and you have some nice graphics for in your paper.
There is something like the two-dimensional correlation coefficient. With software like R or Matlab you can do the number crunching for the correlation.
Matlab has a function for this:
Two Dimensional Correlation Function: corr2
Computes two dimensional correlation coefficient between two matrices
and the matrices must be of the same size. r = corr2 (A,B)
In gaze tracking, the most interesting data lies in two areas.
In where all people look, for that you can use the heat map Daan suggests. Make a heat map for all people, and heat maps for separate groups of people.
In when people look there. For that I would recommend you start by making heat maps as above, but for short time intervals starting from the time the picture was first shown. Again, for all people, and for the separate groups you have.
The resulting set of heat-maps, perhaps animated for the ones from the second point, should give you some pointers for further analysis.

Google Visualization Annotated Time Line, removing data points

I am trying to build a graph that will change resolution depending on how far you are zoomed in. Here is what it looks like when you are complete zoomed out.
So this looks good so when I zoom in I get a higher resolution data and my graph looks like this:
The problem is when I zoom out the higher resolution data does not get cleared out of the graph:
The tables below the graphs are table display what is in the DataTable. This is what drawing code looks like.
var g_graph = new google.visualization.AnnotatedTimeLine(document.getElementById('graph_div_json'));
var table = new google.visualization.Table(document.getElementById('table_div_json'));
function handleQueryResponse(response){
log("Drawing graph")
var data = response.getDataTable()
g_graph.draw(data, {allowRedraw:true, thickness:2, fill:50, scaleType:'maximized'})
table.draw(data, {allowRedraw:true})
}
I am try to find a way for it to only displaying the data that is in the DataTable. I have tried removing the allowRedraw flag but then it breaks the zooming operation.
Any help would be greatly appreciated.
Thanks
See also
Annotated TimeLine when zoomed-out, Too Many Datapoints.
you can remove the allow redraw flag.
In that case you have to put the data points manually in your data table
The latest date of the actual whole data
The most outdated date in the actual whole data.
this will retain your zooming operation.
I think you have already seen removing the allowRedraw flag, works but with a small problem, flickering the whole chart.
It seems to me that the best solution would be to draw every nth data point, depending on your level of zoom. On the Google Finance graph(s), the zoom levels are pre-determined at the top: 1m, 5m, 1h, 1 day, 5 days, etc. It seems evident that this is exactly what Google is doing. At the max view level, they're plotting points that fall on the month. If you're polling 1000 times a day (with each poll generating a single point), then you'd be taking every 30,000th point (the fist point being the very first one of the month, and the 30,000th one being the last point).
Each of these zoom levels would implement a different plot of the data points. Each point should have a time stamp with accuracy to the second, so you'll easily be able to scale the plot based on the level of detail.