Is Time Resolution always the best in Continuous Wavelet Transform (CWT)?

Does the wavelet move one sample point at a time in the CWT? That would seem to give the best possible time resolution at all times, regardless of the scale. In the example below, the signal length is 2048, and for every scale value the calculated coef row has a length of 2048.
import pywt
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
# Define signal
fs = 1024.0
dt = 1 / fs
signal_frequency = 100
scale = np.arange(2, 256)
wavelet = 'morl'
t = np.linspace(0, 2, int(2 * fs))
y = np.sin(signal_frequency*2*np.pi*t)
# Calculate continuous wavelet transform
coef, freqs = pywt.cwt(y, scale, wavelet, dt)
# Show w.r.t. time and frequency
plt.figure(figsize=(6, 4))
plt.pcolor(t, freqs, coef, shading='auto', cmap='jet')  # shading='auto' handles t/freqs having the same length as coef
# Set yscale, ylim and labels
# plt.yscale('log')
# plt.ylim([0, 20])
plt.ylabel('Frequency (Hz)')
plt.xlabel('Time (sec)')
cbar = plt.colorbar()
cbar.set_label('Energy Intensity')
plt.show()
If my guess is right, it contradicts my previous understanding that we use an expanded (stretched) wavelet for low-frequency components because we accept poor time resolution in exchange for good frequency resolution in the low-frequency range, as shown in the picture.
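A quick way to separate output sampling from effective time resolution is to transform a single-sample impulse (a sketch, not part of the original post, reusing the pywt and numpy imports from above): every scale returns one coefficient per sample, but at a large scale the impulse is smeared over many neighbouring coefficients.
impulse = np.zeros(2048)
impulse[1024] = 1.0  # a single-sample event
coef_imp, _ = pywt.cwt(impulse, [4, 128], 'morl')
print(coef_imp.shape)  # (2, 2048): same output length at both scales
# number of coefficients above 10% of the peak magnitude, per scale
for row, s in zip(np.abs(coef_imp), [4, 128]):
    width = np.sum(row > 0.1 * row.max())
    print("scale %d: impulse response spread over ~%d samples" % (s, width))
So the CWT is indeed evaluated at every time shift, but neighbouring coefficients at a large scale are highly redundant; the effective time resolution is still governed by the support of the stretched wavelet.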


keras autoencoder vs PCA

I am playing with a toy example to understand PCA vs. a Keras autoencoder.
I have the following code for understanding PCA:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
pca = decomposition.PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
# array([ 0.92461621,  0.05301557,  0.01718514])
print(pca.components_)
# array([[ 0.36158968, -0.08226889,  0.85657211,  0.35884393],
#        [ 0.65653988,  0.72971237, -0.1757674 , -0.07470647],
#        [-0.58099728,  0.59641809,  0.07252408,  0.54906091]])
I have done some reading and played with Keras code, including this one. However, the reference code feels like too big a leap for my level of understanding.
Does someone have a short autoencoder code which can show me
(1) how to pull the first 3 components from the autoencoder
(2) how to understand what amount of variance the autoencoder captures
(3) how the autoencoder components compare against the PCA components
First of all, the aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. So, the target output of the autoencoder is the autoencoder input itself.
It is shown in [1] that if there is one linear hidden layer and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input into the span of the first k principal components of the data.
And in [2] you can see that if the hidden layer is nonlinear, the autoencoder behaves differently from PCA, with the ability to capture multi-modal aspects of the input distribution.
Autoencoders are data-specific, which means that they will only be able to compress data similar to what they have been trained on. So the usefulness of the features learned by the hidden layers can be used to evaluate the efficacy of the method.
For this reason, one way to evaluate an autoencoder's efficacy in dimensionality reduction is to cut the network at the middle hidden layer and compare the accuracy/performance of your desired algorithm on this reduced data rather than on the original data.
Generally, PCA is a linear method, while autoencoders are usually non-linear. Mathematically it is hard to compare them directly, but intuitively I provide an example of dimensionality reduction on the MNIST dataset using an autoencoder, for your better understanding. The code is here:
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense
from keras.utils import np_utils
import numpy as np
num_train = 60000
num_test = 10000
height, width, depth = 28, 28, 1 # MNIST images are 28x28
num_classes = 10 # there are 10 classes (1 per digit)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(num_train, height * width)
X_test = X_test.reshape(num_test, height * width)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range
Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels
input_img = Input(shape=(height * width,))
x = Dense(height * width, activation='relu')(input_img)
encoded = Dense(height * width//2, activation='relu')(x)
encoded = Dense(height * width//8, activation='relu')(encoded)
y = Dense(height * width//256, activation='relu')(encoded)  # 3-unit bottleneck (784//256 == 3)
decoded = Dense(height * width//8, activation='relu')(y)
decoded = Dense(height * width//2, activation='relu')(decoded)
z = Dense(height * width, activation='sigmoid')(decoded)
model = Model(input_img, z)
model.compile(optimizer='adadelta', loss='mse')  # train the autoencoder to reconstruct its input (MSE loss)
model.fit(X_train, X_train,
          epochs=10,
          batch_size=128,
          shuffle=True,
          validation_data=(X_test, X_test))
mid = Model(input_img, y)
reduced_representation = mid.predict(X_test)
out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)
reduced.compile(loss='categorical_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])
reduced.fit(X_train, Y_train,
            epochs=10,
            batch_size=128,
            shuffle=True,
            validation_data=(X_test, Y_test))
scores = reduced.evaluate(X_test, Y_test, verbose=1)
print("Accuracy: ", scores[1])
It produces a $y \in \mathbb{R}^{3}$ (almost like what you get from decomposition.PCA(n_components=3)). For example, here you see the output of layer y for a digit-5 instance in the dataset:
class    y_1      y_2     y_3
5        87.38    0.00    20.79
As you see in the above code, when we connect layer y to a softmax dense layer:
out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)
the new model reduced gives us a good classification accuracy of about 95%. So it is reasonable to say that y is an efficiently extracted feature vector for the dataset.
References:
[1]: Bourlard, Hervé, and Yves Kamp. "Auto-association by multilayer perceptrons and singular value decomposition." Biological cybernetics 59.4 (1988): 291-294.
[2]: Japkowicz, Nathalie, Stephen Jose Hanson, and Mark A. Gluck. "Nonlinear autoassociation is not equivalent to PCA." Neural computation 12.3 (2000): 531-545.
The earlier answer covers the whole thing; however, I am doing the analysis on the Iris data. My code is a slight modification of this post, which dives further into the topic. As requested, let's load the data:
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
scaler = MinMaxScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)
Let's do a regular PCA
from sklearn import decomposition
pca = decomposition.PCA()
pca_transformed = pca.fit_transform(X_scaled)
plot3clusters(pca_transformed[:,:2], 'PCA', 'PC')  # plotting helper defined further below
Now a very simple AE model with linear layers. As the earlier answer pointed out with its first reference: if there is one linear hidden layer and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input into the span of the first k principal components of the data.
from keras.layers import Input, Dense
from keras.models import Model
import matplotlib.pyplot as plt
#create an AE and fit it with our data using 3 neurons in the dense layer using keras' functional API
input_dim = X_scaled.shape[1]
encoding_dim = 2
input_img = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='linear')(input_img)
decoded = Dense(input_dim, activation='linear')(encoded)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
print(autoencoder.summary())
history = autoencoder.fit(X_scaled, X_scaled,
                          epochs=1000,
                          batch_size=16,
                          shuffle=True,
                          validation_split=0.1,
                          verbose=0)
# use our encoded layer to encode the training input
encoder = Model(input_img, encoded)
encoded_input = Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]
decoder = Model(encoded_input, decoder_layer(encoded_input))
encoded_data = encoder.predict(X_scaled)
plot3clusters(encoded_data[:,:2], 'Linear AE', 'AE')
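Regarding point (3) of the question, here is a rough sketch on top of the code above (not part of the original answer; it needs SciPy): one way to compare the linear AE with PCA is to measure the principal angles between the subspace spanned by the decoder weights and the subspace of the first encoding_dim principal components.
import numpy as np
from scipy.linalg import subspace_angles
# the decoder kernel has shape (encoding_dim, n_features); its rows are the directions
# the AE reconstructs from, expressed in the original feature space
W_dec = autoencoder.layers[-1].get_weights()[0]
angles = subspace_angles(W_dec.T, pca.components_[:encoding_dim].T)
print(np.rad2deg(angles))  # small angles => the AE spans (nearly) the same subspace as PCA
If training has converged, these angles should be close to zero, which is what the result quoted from reference [1] above predicts for a linear autoencoder trained with MSE.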
You can look at the loss history if you want:
#plot our loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
The function to plot the data
def plot3clusters(X, title, vtitle):
    import matplotlib.pyplot as plt
    plt.figure()
    colors = ['navy', 'turquoise', 'darkorange']
    lw = 2
    for color, i, target_name in zip(colors, [0, 1, 2], target_names):
        plt.scatter(X[y == i, 0], X[y == i, 1], color=color, alpha=1., lw=lw, label=target_name)
    plt.legend(loc='best', shadow=False, scatterpoints=1)
    plt.title(title)
    plt.xlabel(vtitle + "1")
    plt.ylabel(vtitle + "2")
    plt.show()
Regarding explained variance: using a non-linear hidden activation leads to approximations more similar to ICA / t-SNE and the like, where the idea of explained variance does not directly apply; one can still look at the convergence of the loss instead.
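For point (2) of the question, a rough sketch (again not from the original answers, reusing autoencoder, X_scaled, pca and encoding_dim from above): the variance captured by the AE can be estimated from its reconstruction error.
import numpy as np
X_hat = autoencoder.predict(X_scaled)  # AE reconstructions
ss_res = np.sum((X_scaled - X_hat) ** 2)  # residual sum of squares
ss_tot = np.sum((X_scaled - X_scaled.mean(axis=0)) ** 2)  # total variance around the mean
print("fraction of variance captured by the AE:", 1.0 - ss_res / ss_tot)
# for a well-trained linear AE this should be close to
# pca.explained_variance_ratio_[:encoding_dim].sum()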

Computational Physics, FFT analysis

I solved the following questions for a computational assignment, but I got a really bad grade on it (67%). I would like to understand how to properly do these questions, in particular Q1.b and Q3. Please be as detailed as possible; I would really like to understand my mistakes.
Exercise 1: Generate data (sinusoidal functions). Use the FFT to analyze:
a) A superposition of three waves with constant, but different, frequencies
b) A wave whose frequency depends on time
Plot the graphs, sample frequencies, amplitude and power spectra with appropriate axes.
Exercise 3: Use the 3 waves from Exercise 1a), but change them to have the same frequency, phase and amplitude. Contaminate each of them with successively increasing amounts of random, Gaussian-distributed noise.
1) Perform an FFT on the superposition of the three noise-contaminated waves. Analyze and plot the output.
2) Filter the signal with a Gaussian function, plot the “clean” wave, and analyze the result. Is the resultant wave 100% clean? Explain.
#1(b)
tmin = -2*pi
tmax = 2*pi
delta = 0.01
t = arange(tmin, tmax, delta)
y = sin(2.5*t*t)
plot(t, y, '-')
title('Figure 2: Plotting a wave whose frequency depends on time ')
xlabel('Time (s)')
ylabel('Y(t)')
show()
#b.2
Fs = 150.0; # sampling rate
Ts = 1.0/Fs; # sampling interval
t = np.arange(0,1,Ts) # time vector
ff = 5; # frequency of the signal
y = np.sin(2*np.pi*ff*t)
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[range(n/2)] # one side frequency range
Y = np.fft.fft(y)/n # fft computing and normalization
Y = Y[range(n/2)]
#Time vs. Amplitude
plot(t,y)
title('Figure 2: Time vs. Amplitude')
xlabel('Time')
ylabel('Amplitude')
plt.show()
#Amplitude Spectrum
plot(frq,abs(Y),'r')
title('Figure 2a: Amplitude Spectrum')
xlabel('Freq (Hz)')
ylabel('amplitude spectrum')
plt.show()
#Power Spectrum
plot(frq,abs(Y)**2,'r')
title('Figure 2b: Power Spectrum')
xlabel('Freq (Hz)')
ylabel('power spectrum')
plt.show()
#Exercise 3:
#part 1
t = np.linspace(-0.5*pi,0.5*pi,1000)
#contaminating our waves with successively increasing white noise
y_1 = sin(15*t) + np.random.normal(0,0.2*pi,1000)
y_2 = sin(15*t) + np.random.normal(0,0.3*pi,1000)
y_3 = sin(15*t) + np.random.normal(0,0.4*pi,1000)
y = y_1 + y_2 + y_3 # superposition of three contaminated waves
#Plotting the figure
plot(t,y,'-')
title('A superposition of three waves contaminated with Gaussian Noise')
xlabel('Time (s)')
ylabel('Y(t)')
show()
delta = pi/1000.0
n = len(y) ## calculate frequency in Hz
freq = fftfreq(n, delta) # Computing the FFT
Freq = fftfreq(len(y), delta) #Using Fast Fourier Transformation to #calculate frequencies
N = len(Freq)
fr = Freq[1:len(Freq)/2.0]
A = fft(y)
XF = A[1:len(A)/2.0]/float(len(A[1:len(A)/2.0]))
# Amplitude spectrum for contaminated waves
plt.plot(fr, abs(XF))
title('Figure 3a : Amplitude spectrum with Gaussian Noise')
xlabel('frequency')
ylabel('Amplitude')
show()
# Power spectrum for contaminated waves
plt.plot(fr,abs(XF)**2)
title('Figure 3b: Power spectrum with Gaussian Noise')
xlabel('frequency(cycles/year)')
ylabel('Power')
show()
# part 2
F_v = exp(-(abs(freq)-2)**2/2*0.5**2)
spectrum = A*F_v #Applying the Gaussian Filter to clean our waves
new_y = ifft(spectrum) #Computing the inverse FFT
plot(t,new_y,'-')
title('A superposition of three waves after Noise Filtering')
xlabel('Time (s)')
ylabel('Y(t)')
show()
Something like the code/images below would have been expected. I deviated in the plot of the sum of the three noisy waves in order to show all three waves as well as their sum. Note that in the intensity spectrum of the noisy wave you don't see much; for those cases it can be instructive to also plot the logarithm of the spectrum (np.log) so you can see the noise better.
In the last plot I plotted both the Gaussian filter and the spectrum (different magnitudes) without rescaling, just to show where the filter applies. It is effectively a low-pass filter (it lets low frequencies through), removing the higher-frequency noise by multiplying it with numbers close to zero.
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
#1(b)
p.figure(figsize=(20,16))
p.subplot(431)
t = np.arange(0,10, 0.001) #units in seconds
#cleaner to show the frequency change explicitly than y = sin(2.5*t*t)
f= 1+ t*0.1 # linear up chirp, i.e. frequency goes up , frequency units in Hz (1/sec)
y = np.sin(2* np.pi* f* t)
p.plot(t, y, '-')
p.title('Figure 2: Plotting a wave whose frequency depends on time ')
p.xlabel('Time (s)')
p.ylabel('Y(t)')
#b.2
Fs = 150.0; # sampling rate
Ts = 1.0/Fs; # sampling interval
t = np.arange(0,1,Ts) # time vector
ff = 5; # frequency of the signal
y = np.sin(2*np.pi*ff*t)
n = len(y) # length of the signal
k = np.arange(n) ## ok, the FFT has as many points in frequency space, as the original in time
T = n/Fs ## correct ; T=sampling time, the total frequency range is 1/sample time
frq = k/T # two sided frequency range
frq = frq[range(n//2)] # one sided frequency range
Y = np.fft.fft(y)/n # fft computing and normalization
Y = Y[range(n//2)]
# Amplitude vs. Time
p.subplot(434)
p.plot(t,y)
p.title('y(t)') # Amplitude vs Time is commonly said, but strictly not true, the amplitude is unchanging
p.xlabel('Time')
p.ylabel('Amplitude')
#Amplitude Spectrum
p.subplot(435)
p.plot(frq,abs(Y),'r')
p.title('Figure 2a: Amplitude Spectrum')
p.xlabel('Freq (Hz)')
p.ylabel('amplitude spectrum')
#Power Spectrum
p.subplot(436)
p.plot(frq,abs(Y)**2,'r')
p.title('Figure 2b: Power Spectrum')
p.xlabel('Freq (Hz)')
p.ylabel('power spectrum')
#Exercise 3:
#part 1
t = np.linspace(-0.5*np.pi,0.5*np.pi,1000)
# #contaminating our waves with successively increasing white noise
y_1 = np.sin(15*t) + np.random.normal(0,0.1,1000) # no need to get pi involved in this amplitude
y_2 = np.sin(15*t) + np.random.normal(0,0.2,1000)
y_3 = np.sin(15*t) + np.random.normal(0,0.4,1000)
y = y_1 + y_2 + y_3 # superposition of three contaminated waves
#Plotting the figure
p.subplot(437)
p.plot(t,y_1+2,'-',lw=0.3)
p.plot(t,y_2,'-',lw=0.3)
p.plot(t,y_3-2,'-',lw=0.3)
p.plot(t,y-6 ,lw=1,color='black')
p.title('A superposition of three waves contaminated with Gaussian Noise')
p.xlabel('Time (s)')
p.ylabel('Y(t)')
delta = np.pi/1000.0
n = len(y) ## calculate frequency in Hz
# freq = np.fft(n, delta) # Computing the FFT <-- wrong, you don't calculate the FFT from a number, but from a time dep. vector/array
# Freq = np.fftfreq(len(y), delta) #Using Fast Fourier Transformation to #calculate frequencies
# N = len(Freq)
# fr = Freq[1:len(Freq)/2.0]
# A = fft(y)
# XF = A[1:len(A)/2.0]/float(len(A[1:len(A)/2.0]))
# Why not do as before?
k = np.arange(n) ## ok, the FFT has as many points in frequency space, as the original in time
T = n/Fs ## correct ; T=sampling time, the total frequency range is 1/sample time
frq = k/T # two sided frequency range
frq = frq[range(n//2)] # one sided frequency range
Y = np.fft.fft(y)/n # fft computing and normalization
Y = Y[range(n//2)]
# Amplitude spectrum for contaminated waves
p.subplot(438)
p.plot(frq, abs(Y))
p.title('Figure 3a : Amplitude spectrum with Gaussian Noise')
p.xlabel('frequency')
p.ylabel('Amplitude')
# Power spectrum for contaminated waves
p.subplot(439)
p.plot(frq,abs(Y)**2)
p.title('Figure 3b: Power spectrum with Gaussian Noise')
p.xlabel('frequency(cycles/year)')
p.ylabel('Power')
# part 2
p.subplot(4,3,11)
F_v = np.exp(-(np.abs(frq)-2)**2/2*0.5**2) ## this is a Gaussian, plot it separately to see it; play with the values
cleaned_spectrum = Y*F_v #Applying the Gaussian Filter to clean our waves ## multiplication in FreqDomain is convolution in time domain
p.plot(frq,F_v)
p.plot(frq,cleaned_spectrum)
p.subplot(4,3,10)
new_y = np.fft.ifft(cleaned_spectrum) #Computing the inverse FFT of the cleaned spectrum to see the cleaned wave
p.plot(t[range(n//2)], new_y.real, '-')  # take the real part; the inverse FFT of the one-sided spectrum is complex
p.title('A superposition of three waves after Noise Filtering')
p.xlabel('Time (s)')
p.ylabel('Y(t)')
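As mentioned above, the noise floor is easier to see on a logarithmic scale. A small sketch (not in the original answer) using the frq and Y arrays of the noisy superposition computed above:
p.figure()
p.plot(frq, np.log(np.abs(Y)**2))  # logarithm of the power spectrum of the noisy signal
p.title('Log power spectrum of the noisy superposition')
p.xlabel('Freq (Hz)')
p.ylabel('log power')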

Complex cross spectral density

mlab.csd from matplotlib (http://matplotlib.org/api/mlab_api.html#matplotlib.mlab.csd) can be used to get a real-valued cross spectral density. If I want to get the phase information from the spectral density, I need a CSD calculation which returns complex values. Is there one?
This is discussed e.g. in this answer: https://stackoverflow.com/a/29306730/3920342
If you use csd of the mlab library you will get complex values, so you can calculate phase angles (and the real-valued coherence). In the following code s1 and s2 contain the two signals (in the time domain) to be correlated.
from matplotlib import mlab
# First create power spectral densities for normalization
(ps1, f) = mlab.psd(s1, Fs=1./dt, scale_by_freq=False)
(ps2, f) = mlab.psd(s2, Fs=1./dt, scale_by_freq=False)
plt.plot(f, ps1)
plt.plot(f, ps2)
# Then calculate cross spectral density
(csd, f) = mlab.csd(s1, s2, NFFT=256, Fs=1./dt,sides='default', scale_by_freq=False)
fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
# Normalize cross spectral absolute values by auto power spectral density
ax1.plot(f, np.absolute(csd)**2 / (ps1 * ps2))
ax2 = fig.add_subplot(1, 2, 2)
angle = np.angle(csd, deg=True)
angle[angle<-90] += 360
ax2.plot(f, angle)
# zoom in on frequency with maximum coherence
ax1.set_xlim(9, 11)
ax1.set_ylim(0, 1e-0)
ax1.set_title("Cross spectral density: Coherence")
ax2.set_xlim(9, 11)
ax2.set_ylim(0, 90)
ax2.set_title("Cross spectral density: Phase angle")
Here are the real and imaginary(!) parts of the cross spectral density:
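(The original figure is not reproduced here; as a rough sketch, using the f and csd arrays computed above, the two parts can be plotted directly.)
fig2, (axr, axi) = plt.subplots(1, 2)
axr.plot(f, csd.real)  # real part of the cross spectral density
axr.set_title('Re(CSD)')
axi.plot(f, csd.imag)  # imaginary part of the cross spectral density
axi.set_title('Im(CSD)')
plt.show()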
The following code, which creates the two signals s1 and s2, is taken from the question How to use the cross-spectral density to calculate the phase shift of two related signals:
"""
Compute the coherence of two signals
"""
import numpy as np
import matplotlib.pyplot as plt
# make a little extra space between the subplots
plt.subplots_adjust(wspace=0.5)
nfft = 256
dt = 0.01
t = np.arange(0, 30, dt)
nse1 = np.random.randn(len(t)) # white noise 1
nse2 = np.random.randn(len(t)) # white noise 2
r = np.exp(-t/0.05)
cnse1 = np.convolve(nse1, r, mode='same')*dt # colored noise 1
cnse2 = np.convolve(nse2, r, mode='same')*dt # colored noise 2
# two signals with a coherent part and a random part
s1 = 0.01*np.sin(2*np.pi*10*t) + cnse1
s2 = 0.01*np.sin(2*np.pi*10*t) + cnse2

Fourier coefficients for NFFT - non uniform fast Fourier transform?

I am trying to use the package pynfft in Python 2.7 to do the non-uniform fast Fourier transform (NFFT). I have been learning Python for only two months, so I am having some difficulties.
This is my code:
import numpy as np
from pynfft.nfft import NFFT
#loading data, 104 lines
t_diff, x_diff = np.loadtxt('data/analysis/amplitudes.dat', unpack = True)
N = [13,8]
M = 52
#fourier coefficients
f_hat = np.fft.fft(x_diff)/(2*M)
#instantiation
plan = NFFT(N,M)
#precomputation
x = t_diff
plan.x = x
plan.precompute()
# vector of non uniform samples
f = x_diff[0:M]
#execution
plan.f = f
plan.f_hat = f_hat
f = plan.trafo()
I am basically following the instructions I found in the pynfft tutorial (http://pythonhosted.org/pyNFFT/tutorial.html).
I need the NFFT because the time intervals at which my data are taken are not constant (I mean, the first measurement is taken at t, the second after dt, the third after dt+dt' with dt' different from dt, and so on).
The pynfft package wants the vector of Fourier coefficients ("f_hat") before execution, so I calculated it using numpy.fft, but I am not sure this procedure is correct. Is there another way to do it (maybe with the NFFT itself)?
I would also like to calculate the frequencies; I know that numpy.fft has a command for that: is there anything like that for pynfft as well? I did not find anything in the tutorial.
Thank you for any advice you can give me.
Here is a working example, taken from here:
First we define the function we want to reconstruct, which is the sum of four harmonics:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(12345)
%pylab inline --no-import-all
# function we want to reconstruct
k=[1,5,10,30] # modulating coefficients
def myf(x,k):
    return sum(np.sin(x*k0*(2*np.pi)) for k0 in k)
x=np.linspace(-0.5,0.5,1000) # 'continuous' time/spatial domain; -0.5<x<+0.5
y=myf(x,k) # 'true' underlying trigonometric function
fig=plt.figure(1,(20,5))
ax =fig.add_subplot(111)
ax.plot(x,y,'red')
ax.plot(x,y,'r.')
# we should sample at a rate of >2*~max(k)
M=256 # number of nodes
N=128 # number of Fourier coefficients
nodes =np.random.rand(M)-0.5 # non-uniform oversampling
values=myf(nodes,k) # nodes&values will be used below to reconstruct
# original function using the Solver
ax.plot(nodes,values,'bo')
ax.set_xlim(-0.5,+0.5)
Then we initialize and run the Solver:
from pynfft import NFFT, Solver
f = np.empty(M, dtype=np.complex128)
f_hat = np.empty([N,N], dtype=np.complex128)
this_nfft = NFFT(N=[N,N], M=M)
this_nfft.x = np.array([[node_i,0.] for node_i in nodes])
this_nfft.precompute()
this_nfft.f = f
ret2=this_nfft.adjoint()
print this_nfft.M # number of nodes, complex typed
print this_nfft.N # number of Fourier coefficients, complex typed
#print this_nfft.x # nodes in [-0.5, 0.5), float typed
this_solver = Solver(this_nfft)
this_solver.y = values # '''right hand side, samples.'''
#this_solver.f_hat_iter = f_hat # assign arbitrary initial solution guess, default is 0
this_solver.before_loop() # initialize solver internals
while not np.all(this_solver.r_iter < 1e-2):
    this_solver.loop_one_step()
Finally, we display the frequencies:
import matplotlib.pyplot as plt
fig=plt.figure(1,(20,5))
ax =fig.add_subplot(111)
foo=[ np.abs( this_solver.f_hat_iter[i][0])**2 for i in range(len(this_solver.f_hat_iter) ) ]
ax.plot(np.abs(np.arange(-N/2,+N/2,1)),foo)
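Regarding the frequencies: pynfft itself does not ship an equivalent of numpy.fft.fftfreq, but since the nodes assigned to plan.x are assumed to lie in [-0.5, 0.5), coefficient index k corresponds to k cycles per unit of that rescaled coordinate. A rough sketch (T_total is a hypothetical name for the real time span your measurements covered before you rescaled them into [-0.5, 0.5)):
k_idx = np.arange(-N//2, N//2)  # NFFT coefficient indices, as used on the x-axis above
T_total = 1.0  # replace with the actual duration of your measurements
freqs_phys = k_idx / T_total  # physical frequencies in cycles per unit time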
cheers

Any way to create histogram with matplotlib.pyplot without plotting the histogram?

I am using matplotlib.pyplot to create histograms. I'm not actually interested in the plots of these histograms, but in the frequencies and bins (I know I can write my own code to do this, but would prefer to use this package).
I know I can do the following,
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.normal(1.5, 1.0, 10000)  # a size argument is needed to get an array of samples
x2 = np.random.normal(0, 1.0, 10000)
freq, bins, patches = plt.hist([x1, x2], 50, histtype='step')
to create a histogram. All I need is freq[0], freq[1], and bins[0]. The problem occurs when I try to use
freq, bins, patches = plt.hist([x1, x2], 50, histtype='step')
in a function. For example,
def func(x, y, Nbins):
    freq, bins, patches = plt.hist([x, y], Nbins, histtype='step')  # create histogram
    bincenters = 0.5*(bins[1:] + bins[:-1])  # center bins
    xf = [float(i) for i in freq[0]]  # convert integers to float
    yf = [float(i) for i in freq[1]]
    p = [(bincenters[j], 1.0 / (xf[j] + yf[j])) for j in range(Nbins) if (xf[j] + yf[j]) != 0]
    Xt = [j for i, j in p]  # separate pairs formed in p
    Yt = [i for i, j in p]
    Y = np.array(Yt)  # convert to arrays for later fitting
    X = np.array(Xt)
    return X, Y  # return arrays X and Y
When I call func(x1,x2,Nbins) and plot or print X and Y, I do not get my expected curve/values. I suspect it has something to do with plt.hist, since there is a partial histogram in my plot.
I don't know if I understand your question very well, but here you have an example of a very simple home-made histogram (in 1D or 2D), each one inside a function and properly called:
import numpy as np
import matplotlib.pyplot as plt
def func2d(x, y, nbins):
    histo, xedges, yedges = np.histogram2d(x, y, nbins)
    plt.plot(x, y, 'wo', alpha=0.3)
    plt.imshow(histo.T,
               extent=[xedges.min(), xedges.max(), yedges.min(), yedges.max()],
               origin='lower',
               interpolation='nearest',
               cmap=plt.cm.hot)
    plt.show()

def func1d(x, nbins):
    histo, bin_edges = np.histogram(x, nbins)
    bin_center = 0.5*(bin_edges[1:] + bin_edges[:-1])
    plt.step(bin_center, histo, where='mid')
    plt.show()
x = np.random.normal(1.5,1.0, (1000,1000))
func1d(x[0],40)
func2d(x[0],x[1],40)
Of course, you may check whether the centering of the data is right, but I think the example shows some useful things about this topic.
My recommendation: try to avoid any loops in your code! They kill performance. If you look, there are no loops in my example. The best practice in numerical problems with Python is to avoid loops; NumPy has a lot of C-implemented functions that do all the hard looping work.
You can use np.histogram2d (for 2D histogram) or np.histogram (for 1D histogram):
hst = np.histogram(A, bins)
hst2d = np.histogram2d(X,Y,bins)
The output (counts and bin edges) will be the same as from plt.hist and plt.hist2d; the only difference is that there is no plot (and no patch objects are returned).
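Applied to the function from the question, a minimal sketch (func_nohist is a hypothetical name; the shared bin edges mimic what plt.hist([x, y], Nbins) does by binning both datasets over their combined range):
import numpy as np
def func_nohist(x, y, Nbins):
    edges = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), Nbins + 1)
    xf, _ = np.histogram(x, bins=edges)
    yf, _ = np.histogram(y, bins=edges)
    bincenters = 0.5 * (edges[1:] + edges[:-1])
    mask = (xf + yf) != 0  # skip empty bins to avoid division by zero
    return bincenters[mask], 1.0 / (xf + yf)[mask]
Calling X, Y = func_nohist(x1, x2, 50) then returns the arrays the original function was after, without ever creating a plot.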
No.
But you can bypass pyplot:
import matplotlib.figure
import matplotlib.axes
fig = matplotlib.figure.Figure()              # a figure that is not registered with pyplot
ax = matplotlib.axes.Axes(fig, (0, 0, 0, 0))  # a throwaway axes with zero size
numeric_results = ax.hist(data)               # data is your array (or list of arrays) of values
del ax, fig
It won't impact the active axes and figures, so it is OK to use it even in the middle of plotting something else.
This is because any usage of plt.draw_something() puts the plot in the current axes, which is a global variable.
If you would like to simply compute the histogram (that is, count the number of points in each bin) and not display it, the np.histogram() function is available.