How to produce a smoothed curve in Superset? - apache-superset

Using Apache Superset, I would like to plot a smoothed curve for my data.
I have data where the frequency of datapoints is inconsistent; sometimes there are whole months without data:
I would like to smooth this using Apache Superset.
My approach was to resample the data to daily values, filling in days without values with a zero value:
This gives an appropriate result:
I then attempted to smooth the data with a rolling mean:
However, this does not give the result I expected:
I would have expected a smooth line. How should I modify my use of Superset to obtain a smoothed line given data with inconsistent frequency of datapoints?
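One possible explanation, offered as an assumption rather than a confirmed diagnosis: a rolling mean over zero-filled days averages mostly zeros, so it gives spikes and flat stretches instead of a smooth line. The sketch below is plain Python, not Superset configuration. `resample_daily` and `rolling_mean` are hypothetical helper names, and linear interpolation between known points is a swapped-in alternative to the zero fill described above.

```python
def resample_daily(samples, fill):
    """Expand sorted (day_index, value) samples to one value per day.

    fill="zero" puts 0.0 on days without data; fill="linear" interpolates
    between the neighbouring known samples. Day indices must be distinct.
    """
    known = dict(samples)
    first, last = samples[0][0], samples[-1][0]
    series = []
    j = 0  # index of the latest sample at or before the current day
    for day in range(first, last + 1):
        if j + 1 < len(samples) and samples[j + 1][0] <= day:
            j += 1
        if day in known:
            series.append(float(known[day]))
        elif fill == "zero":
            series.append(0.0)
        else:
            (d0, v0), (d1, v1) = samples[j], samples[j + 1]
            series.append(v0 + (v1 - v0) * (day - d0) / (d1 - d0))
    return series


def rolling_mean(values, window):
    """Trailing rolling mean; early outputs average the values seen so far."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out


# Made-up data: three measurements with gaps between them.
samples = [(0, 10), (3, 16), (10, 30)]
zero_smoothed = rolling_mean(resample_daily(samples, "zero"), 3)
linear_smoothed = rolling_mean(resample_daily(samples, "linear"), 3)
```

With this made-up data the zero-filled version drops to zero inside the gap, while the interpolated version rises steadily. Whether Superset offers an equivalent fill option depends on the chart type and version, so treat this only as an illustration of the effect.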

Related

PowerBI - Show lines on a map from one point to another

We got several OLAP Cubes in PowerBI Datasets.
One of the cubes has a dimension "dim_location" which contains columns for latitude and longitude. But each dataset has 2 pairs of values, let's call them start_latitude, start_longitude and end_latitude, end_longitude.
I got a fact table connected to that dim_location and want to show some of the measures on a map.
It works perfectly fine with both the map visual and the ArcGIS visual, if I use either the end or the start coordinates. I can show the values as circles with changing size or changing color dependent on the value of a measure. So far so good.
But what I instead want to accomplish is to show a line on the map for each dataset. Each line shall go from start point to end point, color dependent on measure value.
Is there a way to offer the coordinates in the cube dimension in some string syntax that will create a shape, like a polygon with only 2 points, which would result in a line, which can then be shown on the map?
As stated before everything works fine on the map and the ArcGIS visual with one point (lat/lon) per dataset. Tried to find help online for some polygon syntax but came up empty.

How to plot spectrogram from an array or (vector,list etc) containing raw data?

I have been working to find the temporal displacement between audio signals using a spectrogram. I have a short array containing data of a sound wave (pulses at specific frequencies). Now I want to plot a spectrogram from that array. I have followed these steps (Spectrogram C++ library):
It would be fairly easy to put together your own spectrogram. The steps are:
window function (fairly trivial, e.g. Hanning)
FFT (FFTW would be a good choice but if licensing is an issue then go for Kiss FFT or similar)
calculate log magnitude of frequency domain components (trivial: log(sqrt(re * re + im * im)))
After performing these 3 steps, I am stuck on how to plot the spectrogram from the resulting data. Being new to this field, I need some clear next steps.
I know that a simple spectrogram has frequency on the Y-axis, time on the X-axis and magnitude as the color intensity.
But how do I get these three things to plot the spectrogram? (The main purpose of plotting it is to observe and analyze the data behind the spectral peaks, i.e. their values on the X-axis and Y-axis.)
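The three quoted steps, plus the mapping from frame index to time and from FFT bin to frequency, can be sketched as follows. This is a minimal pure-Python illustration and not the C++ library referred to above: the names `hann`, `dft`, `spectrogram` and `peak` are hypothetical, the frame size and hop are arbitrary assumptions, and the naive DFT stands in for a real FFT library such as FFTW or Kiss FFT.

```python
import cmath
import math


def hann(n):
    """Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(frame):
    """Naive O(n^2) DFT returning bins 0..n/2 (stand-in for a real FFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, sample_rate, frame_size=64, hop=32):
    """Return (times, freqs, columns); columns[i][k] is the log magnitude
    at time times[i] (seconds) and frequency freqs[k] (Hz)."""
    window = hann(frame_size)
    times, columns = [], []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        # abs(c) equals sqrt(re*re + im*im); the small offset avoids log(0)
        columns.append([math.log(abs(c) + 1e-12) for c in dft(frame)])
        times.append((start + frame_size / 2) / sample_rate)  # frame centre
    freqs = [k * sample_rate / frame_size for k in range(frame_size // 2 + 1)]
    return times, freqs, columns


def peak(times, freqs, columns):
    """Return (time, frequency, log_magnitude) of the strongest cell."""
    mag, i, k = max(
        (col[k], i, k) for i, col in enumerate(columns) for k in range(len(col))
    )
    return times[i], freqs[k], mag


# Made-up test signal: a 250 Hz tone sampled at 1000 Hz.
rate = 1000
tone = [math.sin(2 * math.pi * 250 * n / rate) for n in range(256)]
times, freqs, columns = spectrogram(tone, rate)
peak_time, peak_freq, peak_mag = peak(times, freqs, columns)
```

So the X-axis values are the frame centre times, the Y-axis values are bin index times sample rate divided by frame size, and the color is the log magnitude in each cell. Any plotting tool that can draw a 2D grid as an image can then render `columns`.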
Regards,
Khubaib

Measure the time the temperature is ascending and descending?

I've got a graph that displays the temperature from my wood pellet stove. What I would like is to get the time the temperature spends rising versus cooling down.
Does anyone know how to get something like the slope of the curve in RRDTool or something similar?
You can do this in two different ways.
First of all, you could use a "DERIVE" data type. This will log the derivative, i.e. the slope, of the data instead of the actual data. However, this will not store the actual temperatures, which is probably not what you want.
The next way to do it is to calculate the slope on the fly from the actual data, as we build the graph. You've already stored your temperature using a GAUGE data type. Now, you can use a calculated value to work out the slope.
DEF:temp=myrrdfile.rrd:ds0:AVERAGE
CDEF:slope=temp,PREV(temp),-,STEPWIDTH,/
This calculates slope to be the difference between the current and previous value, divided by the time interval.
However, since all you seem to be interested in is if the temperature is going up or down, you could instead use something like:
CDEF:cooling=temp,PREV(temp),LT,INF,0,IF
CDEF:warming=temp,PREV(temp),GT,INF,0,IF
AREA:cooling#0000cc::skipscale
AREA:warming#cc0000::skipscale
LINE:temp#00cc00:Temperature
This will graph the temperature as a green line, with a background of red if warming, and blue if cooling.
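If the actual totals are wanted rather than a colored background, the same comparison against the previous value can be applied to exported samples outside RRDTool. A minimal Python sketch, assuming the data has already been fetched as parallel lists of timestamps (in seconds) and temperatures; `warming_cooling_time` and `slopes` are hypothetical helper names, not RRDTool functions.

```python
def slopes(timestamps, temps):
    """Slope between consecutive samples: temperature change per second."""
    return [
        (temps[i] - temps[i - 1]) / (timestamps[i] - timestamps[i - 1])
        for i in range(1, len(temps))
    ]


def warming_cooling_time(timestamps, temps):
    """Total seconds spent warming and cooling.

    Each interval is attributed to warming if the value rose since the
    previous sample and to cooling if it fell; flat intervals count as neither.
    """
    warming = cooling = 0.0
    for i in range(1, len(temps)):
        dt = timestamps[i] - timestamps[i - 1]
        if temps[i] > temps[i - 1]:
            warming += dt
        elif temps[i] < temps[i - 1]:
            cooling += dt
    return warming, cooling


# Made-up samples at a 300 second step.
ts = [0, 300, 600, 900, 1200]
temp = [20.0, 25.0, 30.0, 30.0, 28.0]
warm_s, cool_s = warming_cooling_time(ts, temp)
```

With these made-up samples the stove spends 600 seconds warming and 300 seconds cooling. Noisy readings may need smoothing first, otherwise small fluctuations get counted as direction changes.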

How to exploit periodicity to reduce noise of a signal?

100 periods have been collected from a 3 dimensional periodic signal. The wavelength slightly varies. The noise of the wavelength follows Gaussian distribution with zero mean. A good estimate of the wavelength is known, that is not an issue here. The noise of the amplitude may not be Gaussian and may be contaminated with outliers.
How can I compute a single period that approximates 'best' all of the collected 100 periods?
Time-series, ARMA, ARIMA, Kalman Filter, autoregression and autocorrelation seem to be keywords here.
UPDATE 1: I have no idea how time-series models work. Are they prepared for varying wavelengths? Can they handle non-smooth true signals? If a time-series model is fitted, can I compute a 'best estimate' for a single period? How?
UPDATE 2: A related question is this. Speed is not an issue in my case. Processing is done off-line, after all periods have been collected.
Origin of the problem: I am measuring acceleration during human steps at 200 Hz. After that I am trying to double integrate the data to get the vertical displacement of the center of gravity. Of course the noise introduces a HUGE error when you integrate twice. I would like to exploit periodicity to reduce this noise. Here is a crude graph of the actual data (y: acceleration in g, x: time in second) of 6 steps corresponding to 3 periods (1 left and 1 right step is a period):
My interest is now purely theoretical, as http://jap.physiology.org/content/39/1/174.abstract gives a pretty good recipe what to do.
We have used wavelets for noise suppression with similar signal measured from cows during walking.
I don't think the noise is so much of a problem here; the biggest peaks represent actual changes in the acceleration during walking.
I suppose that the angle of the leg, and thus of the accelerometer, changes during your experiment, and you need to account for that in order to calculate the distance, i.e. you need to know the orientation of the accelerometer at each time step. See e.g. this technical note for one way to account for the angle.
If you need to get accurate measures of the position, the best solution would be to get an accelerometer with a magnetometer, which also measures orientation. Something like this should work: http://www.sparkfun.com/products/10321.
EDIT: I have looked into this a bit more in the last few days because a similar project is in my to do list as well... We have not used gyros in the past, but we are doing so in the next project.
The inaccuracy in the positioning doesn't come from the white noise, but from the inaccuracy and drift of the gyro. And the error then accumulates very quickly due to the double integration. Intersense has a product called Navshoe, that addresses this problem by zeroing the error after each step (see this paper). And this is a good introduction to inertial navigation.
Periodic signal without noise has the following property:
f(a) = f(a+k), where k is the wavelength.
The next bit of information needed is that your signal is composed of separate samples. Everything you've collected is based on samples, which are values of the function f(). From 100 samples, you can get the mean value:
1/n * sum(s_i), where i is in range [0..n-1] and n = 100.
This needs to be done for every dimension of your data. If you use 3d data, it will be applied 3 times. Result would be (x,y,z) points. You can find value of s_i from the periodic signal equation simply by doing
s_i(a).x = f(a+k*i).x
s_i(a).y = f(a+k*i).y
s_i(a).z = f(a+k*i).z
If the wavelength is not accurate, this will give you an additional source of error, or you'll need to adjust it to match the real wavelength of each period. Since
k*i = k+k+...+k
if the wavelength varies, you'll need to use
k_1+k_2+k_3+...+k_i
instead of k*i.
Unfortunately, with errors in the wavelength, there will be big problems keeping this k_1..k_i chain in sync with the actual data. You'd actually need to know how to recognize the starting position of each period from your actual data. You may need to mark them by hand.
Now, all the mean values you calculated would be functions like this:
m(a) :: R->(x,y,z)
This is a curve in 3d space. More complex error models will be left as an exercise for the reader.
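The averaging described above can be sketched for one dimension as follows (apply it once per axis for 3d data). This is a minimal illustration under stated assumptions: the start index of every period is already known (marked by hand or detected separately), each period is linearly resampled onto a common number of phase points so that varying wavelengths line up, and the median option is an addition meant for the outliers mentioned in the question. `resample` and `average_period` are hypothetical names.

```python
import math
import statistics


def resample(period, n_points):
    """Linearly interpolate one period onto n_points evenly spaced phases."""
    last = len(period) - 1
    out = []
    for j in range(n_points):
        pos = j * last / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(period[lo] * (1 - frac) + period[hi] * frac)
    return out


def average_period(signal, starts, n_points=50, robust=False):
    """Combine all periods into a single representative period.

    starts holds the index where each period begins; its last entry marks
    the end of the final period and must be a valid index into signal.
    """
    periods = [
        resample(signal[a:b + 1], n_points) for a, b in zip(starts, starts[1:])
    ]
    combine = statistics.median if robust else statistics.fmean
    return [combine([p[j] for p in periods]) for j in range(n_points)]


# Made-up signal: three sine periods with wavelengths of 10, 12 and 8 samples.
lengths = [10, 12, 8]
signal = [math.sin(2 * math.pi * i / n) for n in lengths for i in range(n)]
signal.append(0.0)  # closing sample of the last period
starts = [0, 10, 22, 30]
mean_period = average_period(signal, starts, n_points=5)
```

The mean corresponds to the 1/n * sum(s_i) formula above, evaluated at matching phases instead of matching sample offsets. Passing `robust=True` swaps in the median, which is less sensitive to a single bad period.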
If you have a copy of Curve Fitting Toolbox, localized regression might be a good choice.
Curve Fitting Toolbox supports both lowess and loess localized regression models for curve and surface fitting.
There is also an option for robust localized regression.
The following blog post shows how to use cross validation to estimate an optimal span parameter for a localized regression model, as well as techniques to estimate confidence intervals using a bootstrap.
http://blogs.mathworks.com/loren/2011/01/13/data-driven-fitting/

Google Visualization Annotated Time Line, removing data points

I am trying to build a graph that will change resolution depending on how far you are zoomed in. Here is what it looks like when you are completely zoomed out.
This looks good. When I zoom in I get higher resolution data and my graph looks like this:
The problem is that when I zoom out, the higher resolution data does not get cleared out of the graph:
The tables below the graphs display what is in the DataTable. This is what the drawing code looks like.
var g_graph = new google.visualization.AnnotatedTimeLine(document.getElementById('graph_div_json'));
var table = new google.visualization.Table(document.getElementById('table_div_json'));

// Query callback: redraw both the chart and the table from the response.
function handleQueryResponse(response) {
    log("Drawing graph");
    var data = response.getDataTable();
    g_graph.draw(data, {allowRedraw: true, thickness: 2, fill: 50, scaleType: 'maximized'});
    table.draw(data, {allowRedraw: true});
}
I am trying to find a way for it to display only the data that is in the DataTable. I have tried removing the allowRedraw flag, but then it breaks the zooming operation.
Any help would be greatly appreciated.
Thanks
See also
Annotated TimeLine when zoomed-out, Too Many Datapoints.
You can remove the allowRedraw flag.
In that case you have to add two data points manually to your data table:
the latest date in the whole data set,
the earliest date in the whole data set.
This will retain your zooming operation.
I think you have already seen that removing the allowRedraw flag works, but with a small problem: the whole chart flickers.
It seems to me that the best solution would be to draw every nth data point, depending on your level of zoom. On the Google Finance graph(s), the zoom levels are pre-determined at the top: 1m, 5m, 1h, 1 day, 5 days, etc. It seems evident that this is exactly what Google is doing. At the max view level, they're plotting points that fall on the month. If you're polling 1000 times a day (with each poll generating a single point), then you'd be taking every 30,000th point (the first point being the very first one of the month, and the 30,000th one being the last point).
Each of these zoom levels would implement a different plot of the data points. Each point should have a time stamp with accuracy to the second, so you'll easily be able to scale the plot based on the level of detail.
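The "every nth point" idea can be sketched as follows. This is a minimal Python illustration of the selection logic only, which would typically run on the server that answers the chart's query. `decimate` and `visible_points` are hypothetical names, and the 500 point budget is an arbitrary assumption.

```python
def decimate(points, max_points):
    """Keep at most max_points by taking every nth (timestamp, value) pair."""
    if len(points) <= max_points:
        return list(points)
    step = -(-len(points) // max_points)  # ceiling division
    return points[::step]


def visible_points(points, t_min, t_max, max_points=500):
    """Points inside the zoom range [t_min, t_max], thinned to the budget."""
    in_view = [p for p in points if t_min <= p[0] <= t_max]
    return decimate(in_view, max_points)


# Made-up data: one point per time unit, 30,000 points in total.
all_points = [(t, t * 2) for t in range(30000)]
zoomed_out = decimate(all_points, 30)          # every 1000th point
zoomed_in = visible_points(all_points, 0, 99)  # all 100 points in range
```

Because each zoom level gets a freshly thinned set built from the full data, the high resolution points from a previous zoom are not carried over when zooming back out.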