Running into an odd problem. I'm trying to plot a dual line graph time series, with the difference between those graphs shown as bars in the same graph.
ax = dfx['diff'].plot(kind='bar')
dfx[['p1','p2']].plot(ax=ax, kind='line')
So far, so good: this works and gives me the visual I'd like, but because the index is made up of Unicode date strings, the x-axis is a jumbled mess. So I go back and convert the index into timestamps:
dfx['Date'] = pd.to_datetime(dfx['Date'])
I then set this new column as the index. The problem is that now when I try to plot it, the x-axis is perfect and nicely formatted, but the "diff" bars completely disappear, yet "diff" still shows up in the legend. Feels like whack-a-mole... one problem solved, another appears ;)
Thanks for your help!
Edit: After some testing, I've found that if I show the diff as a line, it's fine, which makes me think it's a problem with continuous vs discrete plotting. I think it's getting confused when attempting to show bar graphs on a continuous timestamp range...but I can't be the only person who's wanted to plot lines and bars on the same graph in pandas...
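That reading is most likely right: pandas draws kind='bar' at integer category positions (0, 1, 2, ...), while a line plot on a DatetimeIndex uses real date coordinates, so the two series end up on different x scales and the bars fall off the visible range. A minimal sketch of one way around this, drawing both through matplotlib so they share one continuous date axis (column names p1, p2, diff taken from above, with a small synthetic frame standing in for dfx):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

idx = pd.date_range('2015-01-01', periods=20, freq='D')   # stand-in for the real index
dfx = pd.DataFrame({'p1': np.random.rand(20) + 1, 'p2': np.random.rand(20)}, index=idx)
dfx['diff'] = dfx['p1'] - dfx['p2']

fig, ax = plt.subplots()
ax.bar(dfx.index, dfx['diff'], width=0.8, color='lightgrey', label='diff')  # width is in days here
ax.plot(dfx.index, dfx['p1'], label='p1')
ax.plot(dfx.index, dfx['p2'], label='p2')
ax.legend()
fig.autofmt_xdate()   # tidy the date tick labels
plt.show()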
We have several OLAP cubes in Power BI datasets.
One of the cubes has a dimension "dim_location" which contains columns for latitude and longitude. But each record has two pairs of values; let's call them start_latitude, start_longitude and end_latitude, end_longitude.
I have a fact table connected to that dim_location and want to show some of the measures on a map.
It works perfectly fine with both the map visual and the ArcGIS visual if I use either the end or the start coordinates. I can show the values as circles whose size or color changes depending on the value of a measure. So far, so good.
But what I want to accomplish instead is to show a line on the map for each record. Each line should go from the start point to the end point, with its color depending on the measure value.
Is there a way to provide the coordinates in the cube dimension in some string syntax that creates a shape, such as a polygon with only two points, which would result in a line that can then be shown on the map?
As stated before, everything works fine on the map and ArcGIS visuals with one point (lat/lon) per record. I tried to find help online for some polygon syntax but came up empty.
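For what it's worth, the string syntax that describes a two-point shape like this is a well-known text (WKT) LINESTRING. Whether the built-in map or ArcGIS visuals will accept it is an open assumption here, but for a visual that does take WKT strings, a column built from the coordinates above would look like this (WKT puts longitude before latitude; the sample numbers are invented):

LINESTRING (start_longitude start_latitude, end_longitude end_latitude)
LINESTRING (8.5417 47.3769, 13.4050 52.5200)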
[problem screenshot]
Hey guys, I'm having a problem: as you can see in the screenshot, at some resolutions the distance between the last and the second-to-last dates on the x-axis is greater than between the other ones when you put a lot of data on that axis. Can somebody with a project like that test whether this happens in every project? This is probably just the way Chart.js handles fitting more elements onto the x-axis, but can I do something about it?
Thank you very much!
hist body, discrete freq xlabel(#5, labsize(small) angle(forty_five) valuelabel) produces:
I'm graphing a categorical variable, but I can't figure out how to drop the zero from the x-axis. I've tried the documentation for xlabel() and xscale() but didn't find any winners.
The short answer is to spell out that you only want xla(1/5, stuff ). How to spell out precisely which labels you want is documented.
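For example, applied to the command above and assuming the valid categories are coded 1 to 5, that would be:

hist body, discrete freq xlabel(1/5, labsize(small) angle(forty_five) valuelabel)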
Not the question, but this is in my view a poor graph. Go with a horizontal bar chart in which (1) the discreteness of the variable is respected and (2) the category labels are properly and readably horizontal, instead of using the awkward device of text at 45 degrees. catplot (SSC) is one way to go. In Stata 13 (updated) and later, graph hbar will do as well. You should also split the title across two lines. Even further off-topic: most consumers of this research should not care two hoots about the variable name or its question number in your survey.
What would be the best way to implement a simple shape-matching algorithm to match a plot interpolated from just 8 points (x, y) against a database of similar plots (>12,000 entries), each plot having >100 nodes? The database has 6 categories of plots (signals measured under 6 different conditions), and the main aim is to find the right category (so for every category there are around 2,000 plots to compare against).
The 8-node plot would represent actual data from a measurement, but for now I am simulating this by selecting a random plot from the database, taking 8 points from it, and then smearing them using a Gaussian random number generator.
What would be the best way to implement non-linear least squares to compare the shape of the 8-node plot against each plot from the database? Are there any C++ libraries you know of that could help with this?
Is it necessary to find the actual formula (f(x)) of the 8-node plot to use it with least squares, or will it be sufficient to use interpolation at the requested points, such as the interpolation from the GSL library?
You can certainly use least squares without knowing the actual formula. If all of your plots are measured at the same x values, then this is easy -- you simply compute the sum in the normal way:

chi^2 = sum_i (y_i - Y(x_i))^2 / sigma_i^2
where y_i is a point in your 8-node plot, sigma_i is the error on the point and Y(x_i) is the value of the plot from the database at the same x position as y_i. You can see why this is trivial if all your plots are measured at the same x value.
If they're not, you can get Y(x_i) either by fitting the plot from the database with some function (if you know it) or by interpolating between the points (if you don't know it). The simplest interpolation is just to connect the points with straight lines and find the value of the straight lines at the x_i that you want. Other interpolations might do better.
In my field, we use ROOT for these kinds of things. However, scipy has a great collection of functions, and it might be easier to get started with -- if you don't mind using Python.
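As a rough sketch of that comparison in Python (numpy only): evaluate a database plot at the 8 measured x positions by linear interpolation, then form the weighted sum above. The data layout (a dict mapping each category to a list of (x, y) arrays) is an assumption made purely for illustration.

import numpy as np

def chi2(x_meas, y_meas, sigma, x_db, y_db):
    # x_db must be increasing for np.interp; evaluate the database plot
    # at the measured x positions by linear interpolation
    y_interp = np.interp(x_meas, x_db, y_db)
    # weighted least-squares sum: smaller means a better shape match
    return np.sum(((y_meas - y_interp) / sigma) ** 2)

def best_category(x_meas, y_meas, sigma, database):
    # database: {category: [(x_db, y_db), ...]} -- assumed layout
    scores = {cat: min(chi2(x_meas, y_meas, sigma, x, y) for x, y in plots)
              for cat, plots in database.items()}
    return min(scores, key=scores.get)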
One major problem you could have would be that the two plots are not independent. Wikipedia suggests McNemar's test in this case.
Another problem you could have is that you don't have much information in your test plot, so your results will be affected greatly by statistical fluctuations. In other words, if you only have 8 test points and two plots match, how will you know if the underlying functions are really the same, or if the 8 points simply jumped around (inside their error bars) in such a way that it looks like the plot from the database -- purely by chance! ... I'm afraid you won't really know. So the plots that test well will include false positives (low purity), and some of the plots that don't happen to test well were probably actually good matches (low efficiency).
To solve that, you would need to either use a test plot with more points or else bring in other information. If you can throw away plots from the database that you know can't match for other reasons, that will help a lot.
I am trying to build a graph that will change resolution depending on how far you are zoomed in. Here is what it looks like when you are completely zoomed out.
This looks good, and when I zoom in I get higher-resolution data and my graph looks like this:
The problem is that when I zoom out, the higher-resolution data does not get cleared out of the graph:
The tables below the graphs display what is in the DataTable. This is what the drawing code looks like.
var g_graph = new google.visualization.AnnotatedTimeLine(document.getElementById('graph_div_json'));
var table = new google.visualization.Table(document.getElementById('table_div_json'));

function handleQueryResponse(response) {
    log("Drawing graph");
    var data = response.getDataTable();
    // Redraw the chart and the table from whatever is currently in the DataTable
    g_graph.draw(data, {allowRedraw: true, thickness: 2, fill: 50, scaleType: 'maximized'});
    table.draw(data, {allowRedraw: true});
}
I am trying to find a way for it to display only the data that is in the DataTable. I have tried removing the allowRedraw flag, but then it breaks the zooming operation.
Any help would be greatly appreciated.
Thanks
See also: Annotated TimeLine when zoomed-out, Too Many Datapoints.
You can remove the allowRedraw flag. In that case you have to put two data points into your DataTable manually: the latest date in the actual whole data and the earliest (most outdated) date in the actual whole data. This will retain your zooming operation. As you have probably already seen, removing the allowRedraw flag works, but with a small problem: the whole chart flickers.
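A minimal sketch of that padding idea, assuming the DataTable has a single date column followed by one value column (the function name and column layout are illustrative, not from the original code):

function padWithFullRange(data, earliestDate, latestDate) {
    // Padding rows use null values so nothing visible is drawn for them;
    // they only keep the chart's time range at the full extent.
    data.insertRows(0, [[earliestDate, null]]);
    data.addRow([latestDate, null]);
    return data;
}
// e.g. g_graph.draw(padWithFullRange(data, firstDateOfAllData, lastDateOfAllData), options);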
It seems to me that the best solution would be to draw every nth data point, depending on your level of zoom. On the Google Finance graph(s), the zoom levels are pre-determined at the top: 1m, 5m, 1h, 1 day, 5 days, etc. It seems evident that this is exactly what Google is doing. At the max view level, they're plotting points that fall on the month. If you're polling 1000 times a day (with each poll generating a single point), then you'd be taking every 30,000th point (the first point being the very first one of the month, and the 30,000th one being the last point of the month).
Each of these zoom levels would implement a different plot of the data points. Each point should have a time stamp with accuracy to the second, so you'll easily be able to scale the plot based on the level of detail.
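A minimal sketch of that downsampling, assuming the raw samples already sit in a plain array of [date, value] rows before they are loaded into the DataTable (the names and the step of 30,000 are illustrative):

function downsample(rows, step) {
    // keep every step-th sample, and always keep the very last row
    var out = rows.filter(function (row, i) { return i % step === 0; });
    if (rows.length > 0 && out[out.length - 1] !== rows[rows.length - 1]) {
        out.push(rows[rows.length - 1]);
    }
    return out;
}
// roughly one point per month at ~1000 samples per day
var monthlyRows = downsample(allRows, 30000);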