I am currently trying to normalize complex values.
Since I don't have a good way of doing this, I decided to split my dataset in two: one part containing only the real parts and one containing only the imaginary parts.
def split_real_img(x):
    real_array = x.real
    img_array = x.imag
    return real_array, img_array
And then normalize each separately with
def numpy_minmax(X):
    xmin = X.min()
    print(X.min())
    print(X.max())
    return (2 * (X - xmin) / (X.max() - xmin) - 1) * 0.9
After the normalization, are the two datasets supposed to be merged back into one dataset with complex values? And if so, how do I do that?
The data normalization is done so that I can use tanh as the activation function, which operates in the range -0.9 to 0.9; that is why I need the dataset normalized into that range.
Basically, two steps would be involved:
Offset all numbers by the minimum along real and imaginary axes.
Divide each by the max. magnitude. To get the magnitude of a complex number, simply use np.abs().
Thus, the implementation would be -
import numpy as np

def normalize_complex_arr(a):
    a_oo = a - a.real.min() - 1j*a.imag.min()  # origin offsetted
    return a_oo / np.abs(a_oo).max()
Sample runs for verification
Let's start with an array that has a minimum element of [0+0j] and two more elements - [x1+y1*J] & [y1+x1*J]. Thus, their magnitudes after normalizing should be 1 each.
In [358]: a = np.array([0+0j, 1+17j, 17+1j])
In [359]: normalize_complex_arr(a)
Out[359]:
array([ 0.00000000+0.j , 0.05872202+0.99827437j,
0.99827437+0.05872202j])
In [360]: np.abs(normalize_complex_arr(a))
Out[360]: array([ 0., 1., 1.])
Next up, let's add an offset to the minimum element. This shouldn't change their magnitudes after normalization -
In [361]: a = np.array([0+0j, 1+17j, 17+1j]) + np.array([2+3j])
In [362]: a
Out[362]: array([ 2. +3.j, 3.+20.j, 19. +4.j])
In [363]: normalize_complex_arr(a)
Out[363]:
array([ 0.00000000+0.j , 0.05872202+0.99827437j,
0.99827437+0.05872202j])
In [364]: np.abs(normalize_complex_arr(a))
Out[364]: array([ 0., 1., 1.])
Finally, let's add another element that is at twice the distance from the offsetted origin, to make sure this new one has a magnitude of 1 and the others are reduced to 0.5 -
In [365]: a = np.array([0+0j, 1+17j, 17+1j, 34+2j]) + np.array([2+3j])
In [366]: a
Out[366]: array([ 2. +3.j, 3.+20.j, 19. +4.j, 36. +5.j])
In [367]: normalize_complex_arr(a)
Out[367]:
array([ 0.00000000+0.j , 0.02936101+0.49913719j,
0.49913719+0.02936101j, 0.99827437+0.05872202j])
In [368]: np.abs(normalize_complex_arr(a))
Out[368]: array([ 0. , 0.5, 0.5, 1. ])
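As a hedged footnote on the question's original split-and-merge idea (this is my own assumption about the intended workflow, not part of the answer above): if the real and imaginary parts are min-max scaled separately, they can simply be recombined into one complex array with real + 1j*imag. A minimal sketch:
import numpy as np

def numpy_minmax(X):
    # scale into roughly [-0.9, 0.9] for tanh, as in the question
    xmin, xmax = X.min(), X.max()
    return (2 * (X - xmin) / (xmax - xmin) - 1) * 0.9

x = np.array([0 + 0j, 1 + 17j, 17 + 1j])
merged = numpy_minmax(x.real) + 1j * numpy_minmax(x.imag)  # back to complex values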
I am not well practiced in SymPy manipulation.
I need to find the roots of a particular polynomial:
-4*x**(11/2) - 24*x**(9/2) - 16*x**(7/2) + 2*x**(5/2) + 16*x**5 + 23*x**4 + 5*x**3 - x**2
I verified that it has 2 real solutions, and I found one of them with the SymPy function
nsolve(mypoly, x, 1).
Why doesn't that step find the other one?
How can I proceed to find ALL the roots?
Thank you all for the assistance.
A.
To my knowledge, nsolve looks in the proximity of the provided initial guess to find one root for each equation.
I would plot the expression to find suitable initial guesses:
from sympy import *
from sympy.plotting import PlotGrid

x = symbols("x")  # the symbol used in the expression
expr = -4*x**(S(11)/2) - 24*x**(S(9)/2) - 16*x**(S(7)/2) + 2*x**(S(5)/2) + 16*x**5 + 23*x**4 + 5*x**3 - x**2
p1 = plot(expr, (x, 0, 0.5), adaptive=False, n=1000, ylim=(-0.01, 0.05), show=False)
p2 = plot(expr, (x, 0, 5), adaptive=False, n=1000, ylim=(-200, 200), show=False)
PlotGrid(1, 2, p1, p2)
Now, we can do:
nsolve(expr, x, 0.2)
# out: 0.169003536680445
nsolve(expr, x, 4)
# out: 4.28968831654177
EDIT: to find all roots (even the complex ones), we can:
compute the derivative of the expression.
convert both the expression and the derivative to numerical functions with sympy's lambdify.
visually inspect the expression in the complex plane to determine good initial values for the root finding algorithm. I'm going to use this plotting module, SymPy Plotting Backend which exposes a very handy function, plot_complex, to generate domain coloring plots. In particular, I will plot alternating black and white stripes corresponding to modulus.
use scipy's newton method to compute the actual roots. EDIT: I just discovered that nsolve works too :)
# step 1 and 2
f = lambdify(x, expr)
f_der = lambdify(x, expr.diff(x))
# step 3
from spb import plot_complex
r = (x, -1-0.8j, 4.5+0.8j)
w = r[2].real - r[1].real  # width of the complex range
h = r[2].imag - r[1].imag  # height of the complex range
# number of discretization points, watch out memory usage
n1 = 1500
n2 = int(h / w * n1)
plot_complex(expr, r, {"interpolation": "spline36"}, grid=False, coloring="e", n1=n1, n2=n2, size=(10, 5))
In the above picture, we see circular stripes getting bigger and deforming. The center of each of these circular stripes represents a pole or a zero. But this is an easy case: there are no poles. So, from the above picture we count 7 zeros. We already know 3: the two computed above and the value 0. Let's find the others:
from scipy.optimize import newton
r1 = newton(f, x0=-0.9+0.1j, fprime=f_der)
r2 = newton(f, x0=-0.9-0.1j, fprime=f_der)
r3 = newton(f, x0=0.6+0.6j, fprime=f_der)
r4 = newton(f, x0=0.6-0.6j, fprime=f_der)
for r in (r1, r2, r3, r4):
    print(r, ": is it a zero?", expr.subs(x, r).evalf())
# out:
# (-0.9202719950522663+0.09010409402273806j) : is it a zero? -8.21787666002984e-15 + 2.06697764417957e-15*I
# (-0.9202719950522663-0.09010409402273806j) : is it a zero? -8.21787666002984e-15 - 2.06697764417957e-15*I
# (0.6323265751497729+0.6785871500619469j) : is it a zero? -2.2103533615688e-15 - 2.77549897301442e-15*I
# (0.6323265751497729-0.6785871500619469j) : is it a zero? -2.2103533615688e-15 + 2.77549897301442e-15*I
As you can see, inserting those values into the original expression gives results very, very close to zero. It is perfectly normal to see these kinds of floating-point errors.
I just discovered that you can also use nsolve instead of newton to compute complex roots. This makes steps 1 and 2 unnecessary.
nsolve(expr, x, -0.9+0.1j)
# out: -0.920271995052266 + 0.0901040940227375*I
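A further hedged sketch of my own (not from the answer above), assuming x, expr and the sympy star import from the earlier snippets: sweep a small, arbitrarily chosen grid of complex initial guesses through nsolve and keep the distinct roots it converges to.
guesses = [a + b*1j for a in (-1, -0.5, 0.2, 0.6, 1, 4) for b in (-0.6, 0, 0.6)]
roots = []
for g in guesses:
    try:
        r = complex(nsolve(expr, x, g))
    except ValueError:  # nsolve raises ValueError when it fails to converge
        continue
    if not any(abs(r - s) < 1e-8 for s in roots):
        roots.append(r)
print(roots)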
I have a few questions regarding Scharr derivatives and their OpenCV implementation.
I am interested in second-order image derivatives with (3x3) kernels.
I started with the Sobel second derivative, which failed to find some thin lines in the images. After reading the Sobel and Scharr comparison at the bottom of this page, I decided to try Scharr instead by changing this line:
Sobel(gray, grad, ddepth, 2, 2, 3, scale, delta, BORDER_DEFAULT);
to this line:
Scharr(img, gray, ddepth, 2, 2, scale, delta, BORDER_DEFAULT );
My problem is that it seems like cv::Scharr only allows performing a first-order derivative along one axis at a time, so I get the following error:
error: (-215) dx >= 0 && dy >= 0 && dx+dy == 1 in function getScharrKernels
(see assertion line here)
Following this restriction, I have a few questions regarding Scharr derivatives:
Is it considered bad practice to use high-order Scharr derivatives? Why did OpenCV choose to assert dx+dy == 1?
If I am to call Scharr twice, once for each axis, what is the correct way to combine the results?
I am currently using:
addWeighted( abs_grad_x, 0.5, abs_grad_y, 0.5, 0, grad );
but I am not sure that this is how the Sobel function combines the two axes, or in what order it should be done for all 4 derivatives.
If I am to compute the (dx=2, dy=2) derivative by using 4 different kernels, I would like to reduce processing time by unifying all 4 kernels into 1 before applying it to the image (I assume that this is what cv::Sobel does). Is there a reasonable way to create such a combined Scharr kernel and convolve it with my image?
Thanks!
I've never read the original Scharr paper (the dissertation is in German) so I don't know the answer to why the Scharr() function doesn't allow higher order derivatives. Maybe because of the first point I make in #3 below?
The Scharr function is supposed to be a derivative. And the total derivative of a multivariable function f(x) = f(x0, ..., xN) is
df/dx = dx0*df/dx0 + ... + dxN*df/dxN
That is, the sum of the partials each multiplied by the change. In the case of images of course, the change dx in the input is a single pixel, so it's equivalent to 1. In other words, just sum the partials; not weighting them by half. You can use addWeighted() with 1s as the weights, or you can just sum them, but to make sure you won't saturate your image you'll need to convert to a float or 16-bit image first. However, it's also pretty common to compute the Euclidean magnitude of the derivatives, too, if you're trying to get the gradient instead of the derivative.
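To make the two options concrete, here is a minimal sketch of my own (the input file name is hypothetical) showing the sum of the partials versus the Euclidean gradient magnitude, on a float image to avoid saturation:
import cv2
import numpy as np

img = cv2.imread('input.png', 0).astype(np.float32)  # hypothetical file name

gx = cv2.Scharr(img, cv2.CV_32F, 1, 0)
gy = cv2.Scharr(img, cv2.CV_32F, 0, 1)

total = gx + gy                    # sum of the partials (total derivative)
magnitude = cv2.magnitude(gx, gy)  # Euclidean gradient magnitude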
However, that's just for the first-order derivative. For higher orders, you need to apply some chain ruling. See here for the details of combining a second order.
Note that an optimized kernel for first-order derivatives is not necessarily the optimal kernel for second-order derivatives obtained by applying it twice. Scharr himself has a paper on optimizing second-order derivative kernels; you can read it here.
With that said, filters are split into x and y directions to make separable filters, which basically turns your 2D convolution problem into two 1D convolutions with smaller kernels. Think of the Sobel and Scharr kernels: for the x direction, they both just have a single column on either side with the same values (except one is negative). When you slide the kernel across the image, at the first location you're multiplying the first column and the third column by the values in your kernel. And then two steps later, you're multiplying the third and the fifth. But the third was already computed, so that's wasteful. Instead, since both sides are the same, just multiply each column by the vector, since you know you need those values; then you can simply look up the results for columns 1 and 3 and subtract them.
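As a hedged illustration of this separability point (my own sketch, for a single first-order axis rather than the combined kernel discussed next), OpenCV can hand you the 1D Scharr factors and apply them as two 1D passes:
import cv2
import numpy as np

img = cv2.imread('cameraman.png', 0).astype(np.float32)  # hypothetical input

# ksize=-1 (FILTER_SCHARR) requests the 3x3 Scharr kernel, already split
# into a 1D derivative factor and a 1D smoothing factor.
kx, ky = cv2.getDerivKernels(1, 0, -1)
separable = cv2.sepFilter2D(img, -1, kx, ky)
direct = cv2.Scharr(img, -1, 1, 0)
print(np.allclose(separable, direct))  # expected to match, up to border handling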
In short, I don't think you can combine them with built-in separable filter functions, because certain values are positive sometimes, and negative otherwise; and the only way to know when applying a filter linearly is to do them separately. However, we can examine the result of applying both filters and see how they affect a single pixel, construct the 2D kernel, and then convolve with OpenCV.
Suppose we have a 3x3 image:
image
=====
a b c
d e f
g h i
And we have the Scharr kernels:
kernel_x
========
-3 0 3
-10 0 10
-3 0 3
kernel_y
========
-3 -10 -3
0 0 0
3 10 3
The result of applying each kernel to this image gives us:
image * kernel_x
================
-3a +0b +3c
-10d +0e +10f
-3g +0h +3i
image * kernel_y
================
-3a -10b -3c
+0d +0e +0f
+3g +10h +3i
Each of these products is summed and placed into pixel e; and since the sum of both of them is the total derivative, all of these values end up summed into pixel e.
image * kernel_x + image * kernel_y
===================================
-3a +3c -10d +10f -3g +3i
-3a -10b -3c +3g +10h +3i
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
-6a -10b +0c -10d +10f +0g +10h +6i
And this is the same result we'd have gotten if we multiplied by the kernel
kernel_xy
=============
-6 -10 0
-10 0 10
0 10 6
So there's a 2D kernel that does the first-order derivative in both directions at once. Notice anything interesting? It's just the addition of the two kernels. Is that surprising? Not really, as x(a+b) = ax + bx. Now we can pass that into filter2D()
to compute the addition of the derivatives. Does that actually give the same result?
import cv2
import numpy as np
img = cv2.imread('cameraman.png', 0).astype(np.float32)
kernel = np.array([[-6, -10,  0],
                   [-10,  0, 10],
                   [  0, 10,  6]])
total_first_derivative = cv2.filter2D(img, -1, kernel)
scharr_x = cv2.Scharr(img, -1, 1, 0)
scharr_y = cv2.Scharr(img, -1, 0, 1)
print((total_first_derivative == (scharr_x + scharr_y)).all())
True
Yep. Now I guess you can just do it twice.
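And a rough sketch of that "do it twice" idea (my own, with a hypothetical input file; note the caveat above that a repeated first-order kernel is not the optimal second-order kernel):
import cv2
import numpy as np

img = cv2.imread('cameraman.png', 0).astype(np.float32)  # hypothetical input
kernel = np.array([[-6, -10,  0],
                   [-10,  0, 10],
                   [  0, 10,  6]], dtype=np.float32)

first = cv2.filter2D(img, -1, kernel)     # d/dx + d/dy
second = cv2.filter2D(first, -1, kernel)  # apply again for a rough second-order estimate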
I have a column with the following voltage values (V):
Voltage
-46.1
-46.1
-46.1
-46.1
-46.1
-46.1
-46.1
-46.1
-46.1
-45.6
I wrote the following function to compute the percentage change:
def Percentage_change(data):
    list2 = data
    list1 = []
    for i in range(len(data)):
        if i == 0:
            list1.append(0)
        else:
            try:
                list1.append((float(list2[i]) - float(list2[i-1])) / float(list2[i-1]))
            except ZeroDivisionError:
                list1.append(0)
                # print("you are trying to divide by zero", float(list2[i-1]))
    return list1
Now I want to check whether the percentage change is greater than a given percentage. I have the following check:
per_change = Percentage_change(v)  # v is the voltage values
for i in range(len(per_change)):
    if per_change[i] <= -0.2 or per_change[i] >= 0.2:
        print("too much percentage change")
The problem is that the check executes as true for the last value (-45.6), while in reality its percentage change is only about -0.01. What is wrong with my code?
Let's make an approximate calculation (using only absolute numbers):
(float(list2[i]) - float(list2[i-1])) is, at the last value pair, 0.5
float(list2[i-1]) is, at the last value pair, approx. 50
Thus, the division is approx. 0.5 / 50, i.e. approx. 0.01 (about one percent of the total voltage).
But your limit is 0.2 (which corresponds to twenty percent).
I conclude: the limit of 0.2 is not exceeded, and the print statement should not be triggered.
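A quick numeric check of that estimate (my own addition):
change = (-45.6 - (-46.1)) / (-46.1)
print(change)  # approx. -0.0108, i.e. about 1%, well inside the +/-0.2 limit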
def Percentage_change(data):
    abs_change = [(data[i] - data[i-1]) for i in range(1, len(data))]
    percent_change = []
    for i in range(len(abs_change)):
        if abs_change[i] == 0.0:
            percent_change.append(0.0)
        else:
            percent_change.append(100.0 * abs_change[i] / abs(float(data[i])))
    # Depending on what you want, you can add the following line
    percent_change.insert(0, 0.0)
    return percent_change

V = [-46.1, -46.1, -46.1, -46.1, -46.1, -46.1, -46.1, -46.1, -46.1, -45.6]
per_change = Percentage_change(V)
for value in per_change:
    if abs(value) > 20.0:
        print('too much percentage change')
Output:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0845986984815619]
Note: this code works well as long as the list 'data' does not contain 0.
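A hedged alternative sketch of my own using NumPy (following the question's formula, not part of either answer above): a vectorized percentage change that guards against division by zero.
import numpy as np

v = np.array([-46.1] * 9 + [-45.6])
prev = v[:-1]
with np.errstate(divide='ignore', invalid='ignore'):
    pct = np.where(prev != 0, (v[1:] - prev) / prev, 0.0)
pct = np.concatenate([[0.0], pct])  # keep the output aligned with the input
print(np.any(np.abs(pct) >= 0.2))   # False: the last change is only about 1%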
I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:
import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm
This function handles the situation where vector v has the norm value of 0.
Are there any similar functions provided in sklearn or numpy?
If you're using scikit-learn you can use sklearn.preprocessing.normalize:
import numpy as np
from sklearn.preprocessing import normalize
x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
print(np.all(norm1 == norm2))
# True
I agree that it would be nice if such a function were part of the included libraries. But it isn't, as far as I know. So here is a version for arbitrary axes that gives optimal performance.
import numpy as np
def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)
A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))
print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))
This might also work for you
import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))
but fails when v has length 0.
In that case, introducing a small constant to prevent the zero division solves this.
As proposed in the comments one could also use
v/np.linalg.norm(v)
To avoid zero division I use eps, but that's maybe not great.
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm
If you have multidimensional data and want each axis normalized to its max or its sum:
def normalize(_d, to_sum=True, copy=True):
    # d is an (n x dimension) np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d
This uses numpy's peak-to-peak function, np.ptp.
a = np.random.random((5, 3))
b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), the columns sum to 1
c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1
If you don't need utmost precision, your function can be reduced to:
v_norm = v / (np.linalg.norm(v) + 1e-16)
You mentioned scikit-learn, so I want to share another solution.
scikit-learn's MinMaxScaler
In scikit-learn, there is an API called MinMaxScaler which lets you customize the value range as you like.
It also deals with NaN values for us.
NaNs are treated as missing values: disregarded in fit, and maintained
in transform. ... see reference [1]
Code sample
The code is simple, just type
# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# create a MinMaxScaler object
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)
Reference
[1] sklearn.preprocessing.MinMaxScaler
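Since the answer mentions customizing the value range, here is a hedged sketch of my own using MinMaxScaler's feature_range parameter (the data below is made up for illustration):
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [2.0], [5.0]])
scaler = MinMaxScaler(feature_range=(-1, 1))  # scale into [-1, 1] instead of the default [0, 1]
print(scaler.fit_transform(X).ravel())        # [-1.  -0.5  1. ]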
There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:
import transformations as trafo
import numpy as np
data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])
print(trafo.unit_vector(data, axis=1))
If you work with multidimensional arrays, the following fast solution is possible.
Say we have a 2D array that we want to normalize along the last axis, while some rows have zero norm.
import numpy as np
arr = np.array([
    [1, 2, 3],
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)
lengths = np.linalg.norm(arr, axis=-1)
print(lengths) # [ 3.74165739 0. 10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
# [0. 0. 0. ]
# [0.47673129 0.57207755 0.66742381]]
If you want to normalize n dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:
import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize
vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()
If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.
import numpy as np
import vg
x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.all(norm1 == norm2))
# True
I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.
Without sklearn, and using just numpy, just define a function,
assuming that the rows are the variables and the columns the samples (axis=1):
import numpy as np
# Example array
X = np.array([[1,2,3],[4,5,6]])
def stdmtx(X):
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)
output:
X
array([[1, 2, 3],
[4, 5, 6]])
stdmtx(X)
array([[-1., 0., 1.],
[-1., 0., 1.]])
For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.
a / np.linalg.norm(a, axis=1, keepdims=True)
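A quick usage sketch of that one-liner (my own example data):
import numpy as np

a = np.array([[3.0, 4.0],
              [1.0, 0.0]])
rows_normed = a / np.linalg.norm(a, axis=1, keepdims=True)
print(np.linalg.norm(rows_normed, axis=1))  # [1. 1.]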
If you want all values in [0; 1] for 1d-array then just use
(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
Where a is your 1d-array.
An example:
>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])
A note on this method: to preserve the proportions between values there is a restriction: the 1d-array must contain at least one 0 and consist only of 0 and positive numbers.
A simple dot product would do the job. No need for any extra package.
x = x/np.sqrt(x.dot(x))
By the way, if the norm of x is zero, it is inherently a zero vector, and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0,0,...0]), then use
norm = np.sqrt(x.dot(x))
x = x/norm if norm != 0 else x
I am using matplotlib to plot a series of horizontal lines that overlap. I would like to indicate (in a very rough way) how much overlap there is via transparency. For example if I have ten lines and 5 of them overlap over a certain interval, I would like that interval to have an alpha value of 0.5. If all of them overlap over a certain interval then the interval should have an alpha value of 1.0. The following code should illustrate what I want:
import matplotlib.pyplot as plt
y = [1,1,1,1,1,1,1,1,1,1]
x_start = [0,0,0,0,0,0,0,0,0,0]
x_end = [1,2,3,4,5,6,7,8,9,10]
plt.hlines(y, x_start, x_end, linewidth=7, colors='red', alpha=0.1)
plt.hlines(1.2, 0, 10, linewidth=7, colors='red', alpha=1)
plt.ylim(0.8, 1.4)
plt.show()
I would like the transparency of the red from x=0 to x=1 for the line at y=1 to be the same as that of the horizontal line at y=1.2 (not transparent at all). However this is not the case.
Is there a way to achieve what I want with matplotlib and the alpha values? I will know the total number of lines that can possibly overlap (i.e., how many lines overlapping should correspond to 0 transparency).
Thanks to @cphlewis, who pointed me in the right direction, I now have an approximation that works well enough for my needs.
My problem is much easier than the general problem since I want to assign each line (layer) the exact same transparency level s. If there are n=2 lines I want the transparency when both lines overlap to be close to 0, e.g. alpha=0.97.
If n=2 and alpha=0.97, solving
0.97 = s + s(1-s)
for s yields s=0.827.
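As a hedged aside of my own (not from the original answer): the n=2 relation is the alpha-compositing identity 1 - (1 - s)**2 = alpha, so the per-line value can also be cross-checked in closed form; the numbers below agree with the values used here (0.827 for n=2, 0.296 for n=10).
alpha = 0.97
for n in (2, 5, 10):
    s = 1 - (1 - alpha) ** (1.0 / n)
    print(n, round(s, 3))  # 2 -> 0.827, 5 -> 0.504, 10 -> 0.296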
Generalizing this for any n leads to solving a polynomial where the coefficients are given by the n'th row of Pascal's triangle and where the sign of each coefficient is equal to
(-1)^(n + pos)
where pos is the position of the coefficient in Pascal's triangle from left to right and where pos starts at 1. Also, the last coefficient in Pascal's triangle is replaced with the desired alpha value.
So for n=5 the polynomial to be solved is
s^5 - 5s^4 + 10s^3 - 10s^2 + 5s - 0.97 = 0
The following Python code solves for the smallest real root (which is the alpha value that I want) given n and alpha (note that alpha < 1).
import numpy as np
import scipy.linalg
num_lines = 5
end_alpha_value = 0.97 ## end_alpha_value must be in the interval (0, 1)
pascal_triangle = scipy.linalg.pascal(num_lines + 1, kind='lower')
print('num_reps: 1, minimum real root: %.3f' % end_alpha_value)
for i in range(2, num_lines + 1):
    coeff_list = []
    for j, coeff in enumerate(pascal_triangle[i][:i]):
        coeff_list.append(coeff * ((-1)**(i + j + 1)))
    coeff_list.append(-end_alpha_value)
    all_roots = np.roots(coeff_list)
    # keep only the (numerically) real roots and take the smallest one
    real_roots = all_roots[np.isreal(all_roots)].real
    min_real_root = real_roots.min()
    print('num_reps: %i, minimum real root: %.3f' % (i, min_real_root))
For the case n=10, if the desired transparency is alpha=0.97, then s=0.296.
I believe what is going on shows up better using black as the color: