I want to extract all coefficients (including zeros) of a multivariable polynomial using SymPy.
SymPy offers all_coeffs(), but it only works for univariate polynomials; otherwise I get this error: PolynomialError: multivariate polynomials not supported
For example, for the polynomial x^3 + y^3 + x*y + 1 I would like the output to be [3,3,0,0,0,0,1,0,0,1]
If you construct the monomials of interest, you can ask for their coefficients in your expression. You have to watch out when requesting the coefficient of a monomial like x*y in an expression containing terms like x*y*z, however; the following routine handles that by zeroing out any remaining variables in the coefficient that is obtained. It also includes a helper routine to create the monomials of interest.
from sympy import IndexedBase, Mul, Poly, Pow, Symbol

def all_coeffs(expr, *free):
    x = IndexedBase('x')
    expr = expr.expand()
    free = list(free) or list(expr.free_symbols)
    # Record the highest exponent seen for each free symbol.
    pows = [p.as_base_exp() for p in expr.atoms(Pow, Symbol)]
    P = {}
    for p, e in pows:
        if p not in free:
            continue
        elif p not in P:
            P[p] = e
        elif e > P[p]:
            P[p] = e
    reps = dict([(f, x[i]) for i, f in enumerate(free)])
    xzero = dict([(v, 0) for k, v in reps.items()])
    e = expr.xreplace(reps)
    reps = {v: k for k, v in reps.items()}
    return dict([(m.xreplace(reps),
                  e.coeff(m).xreplace(xzero) if m != 1 else e.xreplace(xzero))
                 for m in monoms(*[P[f] for f in free])])

def monoms(*o):
    # Build all monomials x[i]**k for k in 0..o[i] by multiplying out
    # one dense univariate polynomial per variable.
    x = IndexedBase('x')
    f = []
    for i, o in enumerate(o):
        f.append(Poly([1]*(o + 1), x[i]).as_expr())
    return Mul(*f).expand().args
>>> eq = x**2 + x*y - 3
>>> all_coeffs(eq)
{1: -3, x**2: 1, x**2*y: 0, x*y: 1, y: 0, x: 0}
>>> all_coeffs(eq, x)
{1: -3, x**2: 1, x: y}
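For comparison, SymPy's Poly interface can produce a similar zero-filled dict directly. A minimal sketch, assuming Poly.coeff_monomial returns zero for monomials that are absent (names here are illustrative):
from itertools import product
from sympy import symbols, Poly

x, y = symbols('x y')
p = Poly(x**3 + y**3 + x*y + 1, x, y)
# Enumerate every exponent combination up to each variable's degree,
# so absent monomials show up with coefficient 0.
coeffs = {x**i * y**j: p.coeff_monomial(x**i * y**j)
          for i, j in product(range(p.degree(x) + 1),
                              range(p.degree(y) + 1))}
This walks the same per-variable degree box that all_coeffs above builds with monoms.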
I have a table, represented by an np.array like the following:
A = [[12,412,42,54],
[144,2,42,4],
[2,43,22,10]]
And a list that contains the desired starting point of each row in A:
L=[0,2,1]
The desired output would be:
B = [[12,412,42,54],
[42,4,np.nan,np.nan],
[43,22,10,np.nan]]
Edit
I prefer to avoid using a for-loop for obvious reasons.
Try comparing L with the column indices, then use boolean get/set indexing:
# convert A to numpy array for advanced indexing
A = np.array(A)
ll = A.shape[1]
keep = np.arange(ll) >= np.array(L)[:,None]
out = np.full(A.shape, np.nan)
out[keep[:,::-1]] = A[keep]
print(out)
Output:
[[ 12. 412. 42. 54.]
[ 42. 4. nan nan]
[ 43. 22. 10. nan]]
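To see why indexing with keep[:, ::-1] left-aligns the surviving values: reversing each row of the mask moves its run of True to the front of the row, and the boolean assignment then fills those slots in row-major order. A minimal one-row sketch:
import numpy as np

A = np.array([[144, 2, 42, 4]])
keep = np.arange(4) >= np.array([2])[:, None]  # [[False, False, True, True]]
out = np.full(A.shape, np.nan)
out[keep[:, ::-1]] = A[keep]  # kept values land at the front of the row
print(out)  # [[42.  4. nan nan]]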
My guess would be that a vectorized approach for this would be less efficient than explicit looping, because the result is fundamentally a jagged array, which NumPy does not support well.
However, a loop-based solution is simple, and it can be made faster with Numba's nb.njit() if needed:
import numpy as np
import numba as nb

@nb.njit
def jag_nb(arr, starts, empty=np.nan):
    result = np.full(arr.shape, empty)
    for i, x in enumerate(starts):
        if x != 0:
            result[i, :-x] = arr[i, x:]
        else:
            result[i, :] = arr[i, :]
    return result

A = np.array([[12, 412, 42, 54], [144, 2, 42, 4], [2, 43, 22, 10]])
L = np.array([0, 2, 1])
jag_nb(A, L)
# array([[ 12., 412.,  42.,  54.],
#        [ 42.,   4.,  nan,  nan],
#        [ 43.,  22.,  10.,  nan]])
Compared to the pure NumPy vectorized approach proposed in @QuangHoang's answer:
def jag_np(arr, starts, empty=np.nan):
    _, n = arr.shape
    keep = np.arange(n) >= starts[:, None]
    result = np.full(arr.shape, empty)
    result[keep[:, ::-1]] = arr[keep]
    return result
The Numba based approach is noticeably faster, as shown with the following benchmarks:
import pandas as pd
import matplotlib.pyplot as plt

def benchmark(
    funcs,
    ii=range(4, 10, 1),
    is_equal=lambda x, y: np.allclose(x, y, equal_nan=True),
    seed=0,
    unit="ms",
    verbose=True,
    use_str=True
):
    labels = [func.__name__ for func in funcs]
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    assert unit in units
    np.random.seed(seed)
    timings = {}
    for i in ii:
        m = n = 2 ** i
        if verbose:
            print(f"i={i}, n={n}")
        arr = np.random.random((m, n))
        starts = np.random.randint(0, n, m)
        base = funcs[0](arr, starts)
        timings[n] = []
        for func in funcs:
            res = func(arr, starts)
            is_good = is_equal(base, res)
            # %timeit is an IPython magic, so this must run in IPython/Jupyter.
            timed = %timeit -n 64 -r 8 -q -o func(arr, starts)
            timing = timed.best
            timings[n].append(timing if is_good else None)
            if verbose:
                print(
                    f"{func.__name__:>24}"
                    f" {is_good!s:5}"
                    f" {timing * (10 ** units[unit]):10.3f} {unit}"
                    f" {timings[n][0] / timing:5.1f}x")
    return timings, labels

def plot(timings, labels, title=None, xlabel="Input Size / #", unit="ms"):
    n_rows = 1
    n_cols = 3
    fig, axs = plt.subplots(n_rows, n_cols, figsize=(8 * n_cols, 6 * n_rows), squeeze=False)
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    df = pd.DataFrame(data=timings, index=labels).transpose()
    base = df[[labels[0]]].to_numpy()
    (df * 10 ** units[unit]).plot(marker="o", xlabel=xlabel, ylabel=f"Best timing / {unit}", ax=axs[0, 0])
    (df / base * 100).plot(marker='o', xlabel=xlabel, ylabel='Relative speed / %', logx=True, ax=axs[0, 1])
    (base / df).plot(marker='o', xlabel=xlabel, ylabel='Speed Gain / x', ax=axs[0, 2])
    if title:
        fig.suptitle(title)
    fig.patch.set_facecolor('white')

funcs = jag_np, jag_nb
timings, labels = benchmark(funcs, ii=range(4, 11))
plot(timings, labels, unit="ms")
I use the following code to count the number of occurrences of two values (1 and -1):
import numpy as np

a = np.empty(0, dtype=int)  # np.int was removed from newer NumPy; plain int works
tmp = [-1, 1, 1, 1, 1, 1, -1, 1, -1]
a = np.append(a, tmp)
unique, counts = np.unique(a, return_counts=True)
r = dict(zip(unique, counts))
print(r)
if list(r.values())[0] > list(r.values())[1]:
    print(list(r.keys())[0])
else:
    print(list(r.keys())[1])
The problem is that tmp can sometimes be all 1s or all -1s, which causes the printing to fail. The possible solution I can think of is to add a null-like key with a zero value. For instance, when tmp=[1,1,1,1], r should be {1: 4, -1: 0}, and vice versa. How can I modify this code to do so?
Thank you
One trick, given that the input list/array contains only -1 and 1, is to use an offset array (add 1 to turn the -1s into 0s and the 1s into 2s) for binned counting with np.bincount, and then slice with a step size of 2 to pick out the counts for -1 and 1:
dict(zip([-1,1],np.bincount(a+1,minlength=3)[::2]))
Sample runs -
In [954]: a = np.array([-1,1,1,1,1,1,-1,1,-1])
In [955]: dict(zip([-1,1],np.bincount(a+1,minlength=3)[::2]))
Out[955]: {-1: 3, 1: 6}
In [956]: a = np.array([-1,-1,-1,-1])
In [957]: dict(zip([-1,1],np.bincount(a+1,minlength=3)[::2]))
Out[957]: {-1: 4, 1: 0}
In [958]: a = np.array([1,1,1,1])
In [959]: dict(zip([-1,1],np.bincount(a+1,minlength=3)[::2]))
Out[959]: {-1: 0, 1: 4}
If you just need to know which of -1 or 1 has the bigger count, simply do:
np.bincount(a+1,minlength=3).argmax()-1
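Continuing the session above, a quick check of that shortcut:
In [960]: a = np.array([-1,1,1,1,1,1,-1,1,-1])
In [961]: np.bincount(a+1,minlength=3).argmax()-1
Out[961]: 1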
For what it's worth, say you have
>>> uvalues = [-1,1]
which stands for the list of values to count.
What about doing {uvalue: r.get(uvalue, 0) for uvalue in uvalues}?
Use case
>>> a = np.array([-1,-1, -1, -1])
>>> unique, counts = np.unique(a, return_counts=True)
>>> r = dict(zip(unique, counts))
>>> r
{-1: 4}
>>> {uvalue:r.get(uvalue,0) for uvalue in uvalues}
{1: 0, -1: 4}
Another natural (and fast) solution uses collections.Counter:
from collections import Counter
tmp = [1,1,1,1,1,1]
c=Counter({1:0,-1:0}) # init
c.update(tmp)
#Counter({-1: 0, 1: 6})
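Plugged into the original goal of printing whichever value occurs more often, a minimal sketch:
from collections import Counter

tmp = [1, 1, 1, 1, 1, 1]
c = Counter({1: 0, -1: 0})  # pre-seed both keys so neither can be missing
c.update(tmp)
print(1 if c[1] > c[-1] else -1)  # prints 1 here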
I have a numpy array of 3 million points in the form [pt_id, x, y, z]. The goal is to return all pairs of points whose Euclidean distance is between two numbers, min_d and max_d.
The Euclidean distance is computed on x and y only, not on z. However, I'd like to preserve the array with pt_id_from, pt_id_to, distance attributes.
I'm using scipy's dist to calculate the distances:
import scipy.spatial.distance
coords_arr = np.array([['pt1', 2452130.000, 7278106.000, 25.000],
['pt2', 2479539.000, 7287455.000, 4.900],
['pt3', 2479626.000, 7287458.000, 10.000],
['pt4', 2484097.000, 7292784.000, 8.800],
['pt5', 2484106.000, 7293079.000, 7.300],
['pt6', 2484095.000, 7292891.000, 11.100]])
# coords_arr holds strings (mixed literals collapse to one string dtype),
# so cast the x, y columns to float before computing distances.
dists = scipy.spatial.distance.pdist(coords_arr[:, 1:3].astype(float), 'euclidean')
np.savetxt('test.out', scipy.spatial.distance.squareform(dists), delimiter=',')
What should I do to return an array of form: [pt_id_from, pt_id_to, distance]?
You simply create a new array from the data by looping through all the possible combinations. The itertools module is excellent for this.
import itertools

n = coords_arr.shape[0]                       # number of points
D = scipy.spatial.distance.squareform(dists)  # full distance matrix

data = []
for i, j in itertools.combinations(range(n), 2):
    pt_a = coords_arr[i, 0]
    pt_b = coords_arr[j, 0]
    d_ab = D[i, j]
    data.append([pt_a, pt_b, d_ab])

result_arr = np.array(data)
If memory is a problem, you might want to change the distance lookup from using the huge matrix D to looking up the value directly in dists using the i and j index.
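If you go that route, here is a minimal sketch of the direct lookup, using the standard index formula for the condensed vector that pdist returns, usable inside the loop above in place of D[i, j]:
def condensed_index(i, j, n):
    # Position of the pair (i, j), with i < j, in the condensed pdist vector.
    return n * i + j - ((i + 2) * (i + 1)) // 2

d_ab = dists[condensed_index(i, j, n)]  # same value as D[i, j]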
Well, ['pt1', 'pt2', distance_as_number] is not exactly possible. The closest you can get with mixed data types is a structured array, but then you can't do things like result[:2,0]; you have to index field names and array positions separately, like result[['a','b']][0].
Here is my solution:
import numpy as np
import scipy.spatial.distance
coords_arr = np.array([['pt1', 2452130.000, 7278106.000, 25.000],
['pt2', 2479539.000, 7287455.000, 4.900],
['pt3', 2479626.000, 7287458.000, 10.000],
['pt4', 2484097.000, 7292784.000, 8.800],
['pt5', 2484106.000, 7293079.000, 7.300],
['pt6', 2484095.000, 7292891.000, 11.100]])
# Cast the string x, y columns to float before computing distances.
dists = scipy.spatial.distance.pdist(coords_arr[:, 1:3].astype(float), 'euclidean')
# Create a shortcut for `coords_arr.shape[0]` which is basically
# the total amount of points, hence `n`
n = coords_arr.shape[0]
# `a` and `b` contain the indices of the points which were used to compute the
# distances in dists. In this example:
# a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
# b = [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
a = np.arange(n).repeat(np.arange(n-1, -1, -1))
b = np.hstack([range(x, n) for x in range(1, n)])
min_d = 1000
max_d = 10000
# Find out which distances are in range.
in_range = np.less_equal(min_d, dists) & np.less_equal(dists, max_d)
# Define the datatype of the structured array which will be the result.
dtype = [('a', '<f8', (3,)), ('b', '<f8', (3,)), ('dist', '<f8')]
# Create an empty array. We fill it later because it makes the code cleaner.
# Its size is given by the sum over `in_range` which is possible
# since True and False are equivalent to 1 and 0.
result = np.empty(np.sum(in_range), dtype=dtype)
# Fill the resulting array.
result['a'] = coords_arr[a[in_range], 1:4]
result['b'] = coords_arr[b[in_range], 1:4]
result['dist'] = dists[in_range]
print(result)
# In case you don't want a structured array at all, this is what you can do:
result = np.hstack([coords_arr[a[in_range],1:],
coords_arr[b[in_range],1:],
dists[in_range, None]]).astype('<f8')
print(result)
The structured array:
[([2479539.0, 7287455.0, 4.9], [2484097.0, 7292784.0, 8.8], 7012.389393067102)
([2479539.0, 7287455.0, 4.9], [2484106.0, 7293079.0, 7.3], 7244.7819152821985)
([2479539.0, 7287455.0, 4.9], [2484095.0, 7292891.0, 11.1], 7092.75912462844)
([2479626.0, 7287458.0, 10.0], [2484097.0, 7292784.0, 8.8], 6953.856268287403)
([2479626.0, 7287458.0, 10.0], [2484106.0, 7293079.0, 7.3], 7187.909362255481)
([2479626.0, 7287458.0, 10.0], [2484095.0, 7292891.0, 11.1], 7034.873843929257)]
The ndarray:
[[2479539.0, 7287455.0, 4.9, 2484097.0, 7292784.0, 8.8, 7012.3893],
[2479539.0, 7287455.0, 4.9, 2484106.0, 7293079.0, 7.3, 7244.7819],
[2479539.0, 7287455.0, 4.9, 2484095.0, 7292891.0, 11.1, 7092.7591],
[2479626.0, 7287458.0, 10.0, 2484097.0, 7292784.0, 8.8, 6953.8562],
[2479626.0, 7287458.0, 10.0, 2484106.0, 7293079.0, 7.3, 7187.9093],
[2479626.0, 7287458.0, 10.0, 2484095.0, 7292891.0, 11.1, 7034.8738]]
You can use np.where to get the coordinates of distances within a range, then generate a new list in your format, filtering out duplicate pairs. Like this:
>>> import scipy.spatial.distance
>>> import numpy as np
>>> coords_arr = np.array([['pt1', 2452130.000, 7278106.000, 25.000],
... ['pt2', 2479539.000, 7287455.000, 4.900],
... ['pt3', 2479626.000, 7287458.000, 10.000],
... ['pt4', 2484097.000, 7292784.000, 8.800],
... ['pt5', 2484106.000, 7293079.000, 7.300],
... ['pt6', 2484095.000, 7292891.000, 11.100]])
>>>
>>> dists = scipy.spatial.distance.pdist(coords_arr[:,1:3].astype(float), 'euclidean')
>>> dists = scipy.spatial.distance.squareform(dists)
>>> x, y = np.where((dists >= 8000) & (dists <= 30000))
>>> [(coords_arr[x[i]][0], coords_arr[y[i]][0], dists[y[i]][x[i]]) for i in range(len(x)) if x[i] < y[i]]
[('pt1', 'pt2', 28959.576688895162), ('pt1', 'pt3', 29042.897927032005)]
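A variant of the same idea that skips the Python-level x[i] < y[i] filtering by only visiting the upper triangle in the first place, via np.triu_indices (a sketch for the same 8000-30000 range):
>>> iu = np.triu_indices(len(coords_arr), k=1)
>>> mask = (dists[iu] >= 8000) & (dists[iu] <= 30000)
>>> list(zip(coords_arr[iu[0][mask], 0], coords_arr[iu[1][mask], 0], dists[iu][mask]))
[('pt1', 'pt2', 28959.576688895162), ('pt1', 'pt3', 29042.897927032005)]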
My code is currently written as:
import random

convert = {0:0, 1:1, 2:2, 3:3, 4:0, 5:1, 6:2, 7:1}
rows = [[convert[random.randint(0, 7)] for _ in range(5)] for _ in range(5)]
numgood = 25 - rows.count(0)
print(numgood)
>> 25
It always comes out as 25, so it's not just that rows happens to contain no 0's.
Have you printed rows?
It's [[0, 1, 0, 0, 2], [1, 2, 0, 1, 2], [3, 1, 1, 1, 1], [1, 0, 0, 1, 0], [0, 3, 2, 0, 1]], so you have a nested list there.
If you want to count the number of 0's in those nested lists, you could try:
import random
convert = {0:0, 1:1, 2:2, 3:3, 4:0, 5:1, 6:2, 7:1}
rows = [[convert[random.randint(0, 7)] for _ in range(5)] for _ in range(5)]
numgood = 25 - sum(e.count(0) for e in rows)
print(numgood)
Output:
18
rows doesn't contain any zeroes; it contains lists, not integers.
>>> row = [1,2,3]
>>> type(row)
<class 'list'>
>>> row.count(2)
1
>>> rows = [[1,2,3],[4,5,6]]
>>> rows.count(2)
0
>>> rows.count([1,2,3])
1
To count the number of zeroes in any of the lists in rows, you could use a generator expression:
>>> rows = [[1,2,3],[4,5,6], [0,0,8]]
>>> sum(x == 0 for row in rows for x in row)
2
You could also use numpy:
import numpy as np
import random
convert = {0:0,1:1,2:2,3:3,4:0,5:1,6:2,7:1}
rows = [[convert[random.randint(0,7)] for _ in range(5)] for _ in range(5)]
numgood = 25 - np.count_nonzero(np.array(rows) == 0)  # count the 0 entries, as rows.count(0) intended
print(numgood)
The output varies with the random rows, but it now agrees with the generator-expression count above.