pywavelet signal reconstruction - wavelet

I am trying to understand the concept of wavelets using the pywavelet library. My first step was to see how I could reconstruct a given input signal using the wavelet coefficients. Please see my code below:
db1 = pywt.Wavelet('db1')
cA6, cD6,cD5, cD4, cD3, cD2, cD1=pywt.wavedec(data, db1, level=6)
cA6cD_approx = pywt.upcoef('a',cA6,'db1',take=n, level=6) + pywt.upcoef('d',cD1,'db1',take=n, level=6)\
+pywt.upcoef('d',cD2,'db1',take=n, level=6) + pywt.upcoef('d',cD3,'db1',take=n, level=6) + \
pywt.upcoef('d',cD4,'db1',take=n, level=6) + pywt.upcoef('d',cD5,'db1',take=n, level=6) + \
pywt.upcoef('d',cD6,'db1',take=n, level=6)
plt.figure(figsize=(28,10))
p1, =plt.plot(t, cA6cD_approx,'r')
p2, =plt.plot(t, data, 'b')
plt.xlabel('Day')
plt.ylabel('Number of units sold')
plt.legend([p2,p1], ["original signal", "cA6+cD* reconstructed"])
plt.show()
This yielded the following plot:
Now, when I used the waverec() method, the signal reconstruction was quite accurate. Please see plot below:
Can someone please explain the difference between the two reconstruction methods?

They are both Inverse Discrete Wavelet Transform "upcoef" is a direct reconstruction using the coefficients while "waverec" is a Multilevel 1D Inverse Discrete Wavelet Transform, doing pretty much the same thing, but doing it in a way that allows you to line up your coefficients and be more efficient when developing.

I changed a little bit, especially the setting for "level". From the plot, you will see two ways of reconstruct will produce the same result.
import numpy as np
import pywt
import matplotlib.pyplot as plt
data = np.loadtxt('Mysample_test.txt')
n = len(data)
wl = pywt.Wavelet("db1")
coeff_all = pywt.wavedec(data, wl, level=6)
cA6, cD6,cD5, cD4, cD3, cD2, cD1= coeff_all
omp0 = pywt.upcoef('a',cA6,wl,level=6)[:n]
omp1 = pywt.upcoef('d',cD1,wl,level=1)[:n]
omp2 = pywt.upcoef('d',cD2,wl,level=2)[:n]
omp3 = pywt.upcoef('d',cD3,wl,level=3)[:n]
omp4 = pywt.upcoef('d',cD4,wl,level=4)[:n]
omp5 = pywt.upcoef('d',cD5,wl,level=5)[:n]
omp6 = pywt.upcoef('d',cD6,wl,level=6)[:n]
#cA6cD_approx = omp0 + omp1 + omp2 + omp3 + omp4+ omp5 + omp6
#plt.figure(figsize=(18,9))
recon = pywt.waverec(coeff_all, wavelet= wl)
p1, =plt.plot(omp0 + omp6 + omp5 + omp4 + omp3 + omp2 + omp1,'r')
p2, =plt.plot(data, 'b')
p3, =plt.plot(recon, 'y')
plt.xlabel('Day')
plt.ylabel('Number of units sold')
plt.legend([p3,p2,p1], ["waverec reconstructed","original signal", "cA6+cD* reconstructed"])
plt.show()

The function wavedec performs a tree decomposition, which means a filtering followed by a downsampling (of a factor 2 for a dyadic scheme).
Both functions waverec and upcoef can lead to reconstruction.
The first one, waverec, performs a direct tree reconstruction symmetrical to what is done by wavedec, which means an upsampling followed by a filtering. At each reconstruction level (6 in your case) a summation is also performed to yield a signal with more details to be used for the next reconstruction level.
The second function, upcoef, allows to perform the independent reconstruction of a given subscale without considering the rest of the details contained in the other subscales. This is usually performed by zero padding when rebuilding the signal. In other words, upcoef can be seen like an interpolation operator.
In your case, you used upcoef to interpolate all the wavelet subscales from their decimated x-grid to the original x-grid. You then performed the summation of all the interpolated signals (only containing a defined and limited quantity of details). Because Daubechies' wavelets are orthogonal, they lead to a perfect reconstruction and this way you can get your original signal back after reconstruction.
In short:
waverec => direct reconstruction => original signal
n times upcoef => interpolation followed by a global summation => original signal
Subscales interpolation is only useful when you want to visualise all the details on the same non-decimated x-grid frame. Such an interpolation brings nothing more since the quantity of information contained in any subscale and its interpolated version is the same.

Related

Parseval's Theorem does not hold for FFT of a sinusoid + noise?

Thanks in advance for any help on this subject. I've recently been trying to work out Parseval's theorem for discrete fourier transforms when noise is included. I based my code from this code.
What I expected to see is that (as when no noise is included) the total power in the frequency domain is half that of the total power in the time-domain, as I have cut off the negative frequencies.
However, as more noise is added to the time-domain signal, the total power of the fourier transform of the signal+noise becomes much less than half of the total power of the signal+noise.
My code is as follows:
import numpy as np
import numpy.fft as nf
import matplotlib.pyplot as plt
def findingdifference(randomvalues):
n = int(1e7) #number of points
tmax = 40e-3 #measurement time
f1 = 30e6 #beat frequency
t = np.linspace(-tmax,tmax,num=n) #define time axis
dt = t[1]-t[0] #time spacing
gt = np.sin(2*np.pi*f1*t)+randomvalues #make a sin + noise
fftfreq = nf.fftfreq(n,dt) #defining frequency (x) axis
hkk = nf.fft(gt) # fourier transform of sinusoid + noise
hkn = nf.fft(randomvalues) #fourier transform of just noise
fftfreq = fftfreq[fftfreq>0] #only taking positive frequencies
hkk = hkk[fftfreq>0]
hkn = hkn[fftfreq>0]
timedomain_p = sum(abs(gt)**2.0)*dt #parseval's theorem for time
freqdomain_p = sum(abs(hkk)**2.0)*dt/n # parseval's therom for frequency
difference = (timedomain_p-freqdomain_p)/timedomain_p*100 #percentage diff
tdomain_pn = sum(abs(randomvalues)**2.0)*dt #parseval's for time
fdomain_pn = sum(abs(hkn)**2.0)*dt/n # parseval's for frequency
difference_n = (tdomain_pn-fdomain_pn)/tdomain_pn*100 #percent diff
return difference,difference_n
def definingvalues(max_amp,length):
noise_amplitude = np.linspace(0,max_amp,length) #defining noise amplitude
difference = np.zeros((2,len(noise_amplitude)))
randomvals = np.random.random(int(1e7)) #defining noise
for i in range(len(noise_amplitude)):
difference[:,i] = (findingdifference(noise_amplitude[i]*randomvals))
return noise_amplitude,difference
def figure(max_amp,length):
noise_amplitude,difference = definingvalues(max_amp,length)
plt.figure()
plt.plot(noise_amplitude,difference[0,:],color='red')
plt.plot(noise_amplitude,difference[1,:],color='blue')
plt.xlabel('Noise_Variable')
plt.ylabel(r'Difference in $\%$')
plt.show()
return
figure(max_amp=3,length=21)
My final graph looks like this figure. Am I doing something wrong when working this out? Is there an physical reason that this trend occurs with added noise? Is it to do with doing a fourier transform on a not perfectly sinusoidal signal? The reason I am doing this is to understand a very noisy sinusoidal signal that I have real data for.
Parseval's theorem holds in general if you use the whole spectrum (positive and negative) frequencies to compute the power.
The reason for the discrepancy is the DC (f=0) component, which is treated somewhat special.
First, where does the DC component come from? You use np.random.random to generate random values between 0 and 1. So on average you raise the signal by 0.5*noise_amplitude, which entails a lot of power. This power is correctly computed in the time domain.
However, in the frequency domain, there is only a single FFT bin that corresponds to f=0. The power of all other frequencies is distributed over two bins, only the DC power is contained in a single bin.
By scaling the noise you add DC power. By removing the negative frequencies you remove half the signal power, but most of the noise power is located in the DC component which is used fully.
You have several options:
Use all frequencies to compute the power.
Use noise without a DC component: randomvals = np.random.random(int(1e7)) - 0.5
"Fix" the power calculation by removing half of the DC power: hkk[fftfreq==0] /= np.sqrt(2)
I'd go with option 1. The second might be OK and I don't really recommend 3.
Finally, there is a minor problem with the code:
fftfreq = fftfreq[fftfreq>0] #only taking positive frequencies
hkk = hkk[fftfreq>0]
hkn = hkn[fftfreq>0]
This does not really make sense. Better change it to
hkk = hkk[fftfreq>=0]
hkn = hkn[fftfreq>=0]
or completely remove it for option 1.

ODEINT with multiple parameters (time-dependent)

I'm trying to solve a single first-order ODE using ODEINT. Following is the code. I expect to get 3 values of y for 3 time-points. The issue I'm struggling with is ability to pass nth value of mt and nt to calculate dydt. I think the ODEINT passes all 3 values of mt and nt, instead just 0th, 1st or 2nd, depending on the iteration. Because of this, I get this error:
RuntimeError: The size of the array returned by func (4) does not match the size of y0 (1).
Interestingly, if I replace the initial condition, which is (and should be) a single value as: a0= [2]*4, the code works, but gives me a 4X4 matrix as solution, which seems incorrect.
mt = np.array([3,7,4,2]) # Array of constants
nt = np.array([5,1,9,3]) # Array of constants
c1,c2,c3 = [-0.3,1.4,-0.5] # co-efficients
para = [mt,nt] # Packing parameters
#Test ODE function
def test (y,t,extra):
m,n = extra
dydt = c1*c2*m - c1*y - c3*n
return dydt
a0= [2] # Initial Condition
tspan = range(len(mt)) # Define tspan
#Solving the ODE
yt= odeint(test, a0,tspan,args=(para,))
#Plotting the ODE
plt.plot(tspan,yt,'g')
plt.title('Multiple Parameters Test')
plt.xlabel('Time')
plt.ylabel('Magnitude')
The first order differential equation is:
dy/dt = c1*(c2*mt-y(t)) - c3*nt
This equation represents a part of murine endocrine system, which I am trying to model. The system is analogous to a two-tank system, where the first tank receives a specific hormone [at an unknown rate] but our sensor will detect that level (mt) at specific time intervals (1 second). This tank then feeds into the second tank, where the level of this hormone (y) is detected by another sensor. I labeled the levels using separate variables because the sensors that detect the levels are independent of each other and are not calibrated to each other. 'c2' may be considered as the co-efficient that shows the correlation between the two levels. Also, the transfer of this hormone from tank 1 to tank 2 is diffusion-driven. This hormone is further consumed by a biochemical process (similar to a drain valve for the second tank). At the moment, it is unclear which parameters affect the consumption; however, another sensor can detect the amount of hormone (nt) being consumed at a specific time interval (1 second, in this case too).
Thus, mt and nt are the concentrations/levels of the hormone at specific time points. although only 4-element in length in the code, these arrays are much longer in my study. All sensors report the concentrations at 1 second interval - hence tspan consists of time points separated by 1 second.
The objective is to determine the concentration of this hormone in the second tank (y) mathematically and then optimize the values of these coefficients based on the experimental data. I was able to pass these arrays mt and nt to the defined ODE and solve using ODE45 in MATLAB with no issue. I've been running into this RunTimeError, while trying to replicate the code in Python.
As I mentioned in a comment, if you want to model this system using an ordinary differential equation, you have to make an assumption about the values of m and n between sample times. One possible model is to use linear interpolation. Here's a script that uses scipy.interpolate.interp1d to create the functions mfunc(t) and nfunc(t) based on the samples mt and nt.
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
mt = np.array([3,7,4,2]) # Array of constants
nt = np.array([5,1,9,3]) # Array of constants
c1, c2, c3 = [-0.3, 1.4, -0.5] # co-efficients
# Create linear interpolators for m(t) and n(t).
sample_times = np.arange(len(mt))
mfunc = interp1d(sample_times, mt, bounds_error=False, fill_value="extrapolate")
nfunc = interp1d(sample_times, nt, bounds_error=False, fill_value="extrapolate")
# Test ODE function
def test (y, t):
dydt = c1*c2*mfunc(t) - c1*y - c3*nfunc(t)
return dydt
a0 = [2] # Initial Condition
tspan = np.linspace(0, sample_times.max(), 8*len(sample_times)+1)
#tspan = sample_times
# Solving the ODE
yt = odeint(test, a0, tspan)
# Plotting the ODE
plt.plot(tspan, yt, 'g')
plt.title('Multiple Parameters Test')
plt.xlabel('Time')
plt.ylabel('Magnitude')
plt.show()
Here is the plot created by the script:
Note that instead of generating the solution only at sample_times (i.e. at times 0, 1, 2, and 3), I set tspan to a denser set of points. This shows the behavior of the model between sample times.

How to get the dropout mask in Tensorflow

I have constructed a regression type of neural net (NN) with dropout by Tensorflow. I would like to know if it is possible to find which hidden units are dropped from the previous layer in the output file. Therefore, we could implement the NN results by C++ or Matlab.
The following is an example of Tensorflow model. There are three hidden layer with one output layer. After the 3rd sigmoid layer, there is a dropout with probability equal to 0.9. I would like to know if it is possible to know which hidden units in the 3rd sigmoid layer are dropped.
def multilayer_perceptron(_x, _weights, _biases):
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(_x, _weights['h1']), _biases['b1']))
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))
layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, _weights['h3']), _biases['b3']))
layer_d = tf.nn.dropout(layer_3, 0.9)
return tf.matmul(layer_d, _weights['out']) + _biases['out']
Thank you very much!
There is a way to get the mask of 0 and 1, and of shape layer_3.get_shape() produced by tf.nn.dropout().
The trick is to give a name to your dropout operation:
layer_d = tf.nn.dropout(layer_3, 0.9, name='my_dropout')
Then you can get the wanted mask through the TensorFlow graph:
graph = tf.get_default_graph()
mask = graph.get_tensor_by_name('my_dropout/Floor:0')
The tensor mask will be of same shape and type as layer_d, and will only have values 0 or 1. 0 corresponds to the dropped neurons.
Simple and idiomatic solution (although possibly slightly slower than Oliver's):
# generate mask
mask = tf.nn.dropout(tf.ones_like(layer),rate)
# apply mask
dropped_layer = layer * mask

Fitting a Gaussian, getting a straight line. Python 2.7

As my title suggests, I'm trying to fit a Gaussian to some data and I'm just getting a straight line. I've been looking at these other discussion Gaussian fit for Python and Fitting a gaussian to a curve in Python which seem to suggest basically the same thing. I can make the code in those discussions work fine for the data they provide, but it won't do it for my data.
My code looks like this:
import pylab as plb
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy import asarray as ar,exp
y = y - y[0] # to make it go to zero on both sides
x = range(len(y))
max_y = max(y)
n = len(y)
mean = sum(x*y)/n
sigma = np.sqrt(sum(y*(x-mean)**2)/n)
# Someone on a previous post seemed to think this needed to have the sqrt.
# Tried it without as well, made no difference.
def gaus(x,a,x0,sigma):
return a*exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y,p0=[max_y,mean,sigma])
# It was suggested in one of the other posts I looked at to make the
# first element of p0 be the maximum value of y.
# I also tried it as 1, but that did not work either
plt.plot(x,y,'b:',label='data')
plt.plot(x,gaus(x,*popt),'r:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
The data I am trying to fit is as follows:
y = array([ 6.95301373e+12, 9.62971320e+12, 1.32501876e+13,
1.81150568e+13, 2.46111132e+13, 3.32321345e+13,
4.45978682e+13, 5.94819771e+13, 7.88394616e+13,
1.03837779e+14, 1.35888594e+14, 1.76677210e+14,
2.28196006e+14, 2.92781632e+14, 3.73133045e+14,
4.72340762e+14, 5.93892782e+14, 7.41632194e+14,
9.19750269e+14, 1.13278296e+15, 1.38551838e+15,
1.68291212e+15, 2.02996957e+15, 2.43161742e+15,
2.89259207e+15, 3.41725793e+15, 4.00937676e+15,
4.67187762e+15, 5.40667931e+15, 6.21440313e+15,
7.09421973e+15, 8.04366842e+15, 9.05855930e+15,
1.01328502e+16, 1.12585509e+16, 1.24257598e+16,
1.36226443e+16, 1.48356404e+16, 1.60496345e+16,
1.72482199e+16, 1.84140400e+16, 1.95291969e+16,
2.05757166e+16, 2.15360187e+16, 2.23933053e+16,
2.31320228e+16, 2.37385276e+16, 2.42009864e+16,
2.45114362e+16, 2.46427484e+16, 2.45114362e+16,
2.42009864e+16, 2.37385276e+16, 2.31320228e+16,
2.23933053e+16, 2.15360187e+16, 2.05757166e+16,
1.95291969e+16, 1.84140400e+16, 1.72482199e+16,
1.60496345e+16, 1.48356404e+16, 1.36226443e+16,
1.24257598e+16, 1.12585509e+16, 1.01328502e+16,
9.05855930e+15, 8.04366842e+15, 7.09421973e+15,
6.21440313e+15, 5.40667931e+15, 4.67187762e+15,
4.00937676e+15, 3.41725793e+15, 2.89259207e+15,
2.43161742e+15, 2.02996957e+15, 1.68291212e+15,
1.38551838e+15, 1.13278296e+15, 9.19750269e+14,
7.41632194e+14, 5.93892782e+14, 4.72340762e+14,
3.73133045e+14, 2.92781632e+14, 2.28196006e+14,
1.76677210e+14, 1.35888594e+14, 1.03837779e+14,
7.88394616e+13, 5.94819771e+13, 4.45978682e+13,
3.32321345e+13, 2.46111132e+13, 1.81150568e+13,
1.32501876e+13, 9.62971320e+12, 6.95301373e+12,
4.98705540e+12])
I would show you what it looks like, but apparently I don't have enough reputation points...
Anyone got any idea why it's not fitting properly?
Thanks for your help :)
The importance of the initial guess, p0 in curve_fit's default argument list, cannot be stressed enough.
Notice that the docstring mentions that
[p0] If None, then the initial values will all be 1
So if you do not supply it, it will use an initial guess of 1 for all parameters you're trying to optimize for.
The choice of p0 affects the speed at which the underlying algorithm changes the guess vector p0 (ref. the documentation of least_squares).
When you look at the data that you have, you'll notice that the maximum and the mean, mu_0, of the Gaussian-like dataset y, are
2.4e16 and 49 respectively. With the peak value so large, the algorithm, would need to make drastic changes to its initial guess to reach that large value.
When you supply a good initial guess to the curve fitting algorithm, convergence is more likely to occur.
Using your data, you can supply a good initial guess for the peak_value, the mean and sigma, by writing them like this:
y = np.array([...]) # starting from the original dataset
x = np.arange(len(y))
peak_value = y.max()
mean = x[y.argmax()] # observation of the data shows that the peak is close to the center of the interval of the x-data
sigma = mean - np.where(y > peak_value * np.exp(-.5))[0][0] # when x is sigma in the gaussian model, the function evaluates to a*exp(-.5)
popt,pcov = curve_fit(gaus, x, y, p0=[peak_value, mean, sigma])
print(popt) # prints: [ 2.44402560e+16 4.90000000e+01 1.20588976e+01]
Note that in your code, for the mean you take sum(x*y)/n , which is strange, because this would modulate the gaussian by a polynome of degree 1 (it multiplies a gaussian with a monotonously increasing line of constant slope) before taking the mean. That will offset the mean value of y (in this case to the right). A similar remark can be made for your calculation of sigma.
Final remark: the histogram of y will not resemble a Gaussian, as y is already a Gaussian. The histogram will merely bin (count) values into different categories (answering the question "how many datapoints in y reach a value between [a, b]?").

GeoDjango: Determine the area of a polygon

In my model I have a polygon field defined via
polygon = models.PolygonField(srid=4326, geography=True, null=True, blank=True)
When I want to determine the area of the polygon, I call
area_square_degrees = object.polygon.area
But how can I convert the result in square degrees into m2 with GeoDjango?
This answer does not work, since area does not have a method sq_m. Is there any built-in conversion?
You need to transform your data to the correct spatial reference system.
area_square_local_units = object.polygon.transform(srid, clone=False).area
In the UK you might use the British National Grid SRID of 27700 which uses meters.
area_square_meters = object.polygon.transform(27700, clone=False).area
You may or may not want to clone the geometry depending on whether or not you need to do anything else with it in its untransformed state.
Docs are here https://docs.djangoproject.com/en/1.8/ref/contrib/gis/geos/
I have struggled a lot with this, since i could'nt find a clean solution. The trick is you have to use the postgis capabilities (and and thus its only working with postgis..):
from django.contrib.gis.db.models.functions import Area
loc_obj = Location.objects.annotate(area_=Area("poly")).get(pk=??)
# put the primary key of the object
print(loc_obj.area_) # distance object, result should be in meters, but you can change to the unit you want, e.g .mi for miles etc..
The models.py:
class Location(models.Model):
poly = gis_models.PolygonField(srid=4326, geography=True)
It's i think the best way to do it if you have to deal with geographic coordinates instead of projections. It does handle the curve calculation of the earth, and the result is precise even with big distance/area
I needed an application to get the area of poligons around the globe and if I used an invalid country/region projection I got the error OGRException: OGR failure
I ended using an OpenLayers implementation
using the 4326 projection (is the default projection) to avoid concerning about every country/region specific projection.
Here is my code:
import math
def getCoordsM2(coordinates):
d2r = 0.017453292519943295 # Degrees to radiant
area = 0.0
for coord in range(0, len(coordinates)):
point_1 = coordinates[coord]
point_2 = coordinates[(coord + 1) % len(coordinates)]
area += ((point_2[0] - point_1[0]) * d2r) *\
(2 + math.sin(point_1[1] * d2r) + math.sin(point_2[1] * d2r))
area = area * 6378137.0 * 6378137.0 / 2.0
return math.fabs(area)
def getGeometryM2(geometry):
area = 0.0
if geometry.num_coords > 2:
# Outer ring
area += getCoordsM2(geometry.coords[0])
# Inner rings
for counter, coordinates in enumerate(geometry.coords):
if counter > 0:
area -= getCoordsM2(coordinates)
return area
Simply pass your geometry to getGeometryM2 function and you are done!
I use this function in my GeoDjango model as a property.
Hope it helps!
If Its earths surface area that you are talking about, 1 square degree has 12,365.1613 square km. So multiple your square degree and multiply by 10^6 to convert to meters.