I'm trying to use the function collect() to simplify my expression. My desired result is the three factors combined under a single exponent.
My code:
from sympy import *
#index
i = symbols('i' , integer = True )
#constants
a = symbols( 'a' )
#variables
alpha = IndexedBase('alpha', positive=True, domain=QQ)
index = (i, 1, 3)
rho = symbols( 'rho')
U = product( alpha[i]**(1/(rho-1)) , index )
U
My solution attempt:
U = U.subs(1/(rho-1),a)
collect(U,rho, evaluate=False)[1]
What am I doing wrong?
You must be using a fairly old version of SymPy because in recent versions the form that you wanted arises automatically. In any case you should be able to use powsimp:
In [9]: U
Out[9]:
        a         a         a
alpha[1] ⋅alpha[2] ⋅alpha[3]

In [10]: powsimp(U, force=True)
Out[10]:
                            a
(alpha[1]⋅alpha[2]⋅alpha[3])
https://docs.sympy.org/latest/tutorials/intro-tutorial/simplification.html#powsimp
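For reference, here is a minimal standalone sketch of the same idea. It assumes plain positive symbols x1, x2, x3 as hypothetical stand-ins for alpha[1], alpha[2], alpha[3], and a generic exponent a; it is not the asker's exact expression.

```python
from sympy import symbols, powsimp

# Hypothetical stand-ins for alpha[1], alpha[2], alpha[3] (assumed positive).
x1, x2, x3 = symbols('x1 x2 x3', positive=True)
a = symbols('a')

# Three powers sharing the same exponent.
expr = x1**a * x2**a * x3**a

# force=True tells powsimp to combine the bases without checking assumptions.
combined = powsimp(expr, force=True)
print(combined)
```

The result should be a single power whose base is the product of the three symbols and whose exponent is a.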
This is my code:
from sympy import *
from sympy.parsing.sympy_parser import parse_expr
x, y, z, t = symbols('x y z t')
print(N('abs(2)'))
It returns abs(2) instead of 2 running on Jupyter Notebook on Anaconda. Isn't N() meant to evaluate numerical expressions?
I thought that when you give N() a string, it parses automatically, but just in case I checked:
expr = parse_expr('abs(2)')
print(N(expr))
This again returns abs(2)
The function is called Abs in sympy. What you get back from parse_expr is an arbitrary function that just happens to be called abs:
In [8]: parse_expr('f(2)')
Out[8]: f(2)
In [9]: parse_expr('abs(2)')
Out[9]: abs(2)
In [10]: parse_expr('Abs(2)')
Out[10]: 2
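If the input string cannot be changed to use Abs, one possible workaround is to tell the parser what the name abs should mean. The sketch below relies on the local_dict argument of parse_expr; mapping abs to Abs is my own choice here, and how a bare abs is treated without it may differ between SymPy versions.

```python
from sympy import Abs, N
from sympy.parsing.sympy_parser import parse_expr

# Map the lowercase name "abs" to SymPy's Abs while parsing.
expr = parse_expr('abs(-2)', local_dict={'abs': Abs})

# Abs(-2) evaluates immediately, so N() now has a number to work with.
value = N(expr)
print(expr, value)
```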
I am trying to create a term document matrix using my custom analyser to extract features out of the documents. Following is the code for the same :
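import re
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    ngram_range=(1, 2),
)
analyzer = vectorizer.build_analyzer()

def customAnalyzer(text):
    grams = analyzer(text)
    # drop n-grams that consist only of digits and whitespace
    tgrams = [gram for gram in grams if not re.match(r"^[0-9\s]+$", gram)]
    return tgrams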
This custom analyser is then passed to CountVectorizer to extract the features:
for i in xrange(0, num_rows):
    clean_query.append(review_to_words(inp["keyword"][i], units))

vectorizer = CountVectorizer(analyzer=customAnalyzer,
                             tokenizer=None,
                             ngram_range=(1, 2),
                             preprocessor=None,
                             stop_words=None,
                             max_features=n,
                             )
features = vectorizer.fit_transform(clean_query)
z = vectorizer.get_feature_names()
This call throws the following error:
(<type 'exceptions.NotImplementedError'>, 'python.py', 128,NotImplementedError('adding a nonzero scalar to a sparse matrix is not supported',))
The error is raised when fit_transform is called on the vectorizer.
However, the value of the variable clean_query is not a scalar. I am using sklearn 0.17.1.
np.isscalar(clean_query)
False
This is a small test which I did to reproduce the error, but it did not throw the same error for me. (This example has been taken from : scikit-learn Feature extraction)
scikit-learn version : 0.19.dev0
In [1]: corpus = [
   ...:     'This is the first document.',
   ...:     'This is the second second document.',
   ...:     'And the third one.',
   ...:     'Is this the first document?',
   ...: ]
In [2]: from sklearn.feature_extraction.text import TfidfVectorizer
In [3]: vectorizer = TfidfVectorizer(min_df=1)
In [4]: vectorizer.fit_transform(corpus)
Out[4]:
<4x9 sparse matrix of type '<type 'numpy.float64'>'
with 19 stored elements in Compressed Sparse Row format>
In [5]: import numpy as np
In [6]: np.isscalar(corpus)
Out[6]: False
In [7]: type(corpus)
Out[7]: list
As the code above shows, corpus is not a scalar; it is a plain list of strings.
I think the fix lies in how you build the clean_query variable: it has to be in the form that vectorizer.fit_transform expects, that is, an iterable of raw document strings.
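As a rough illustration of the expected input, here is a minimal sketch that feeds a plain list of strings through a custom analyser like the one in the question. The sample queries are made up, and this does not reproduce or diagnose the original error; it only shows the shape of input that works.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer

# Default analyser producing unigrams and bigrams.
base_analyzer = CountVectorizer(ngram_range=(1, 2)).build_analyzer()

def custom_analyzer(text):
    # Keep every n-gram except those made up only of digits and whitespace.
    return [g for g in base_analyzer(text) if not re.match(r"^[0-9\s]+$", g)]

# Hypothetical cleaned queries: a list with one string per document.
clean_query = ["cheap flights 2016", "flights to paris", "2016 2017"]

vectorizer = CountVectorizer(analyzer=custom_analyzer)
features = vectorizer.fit_transform(clean_query)

print(features.shape)
print(sorted(vectorizer.vocabulary_))
```

Purely numeric n-grams such as "2016" are dropped, while mixed ones such as "flights 2016" are kept.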
I am brand new to Cython. How do I convert the Python function Values below to Cython? With factors=2 and i=60 it takes 2.8 seconds on my big Linux box. The goal is under 1 second with factors=2 and i=360.
Here's the code. Thanks!
import numpy as np
import itertools

class Numeraire:
    def __init__(self, rate):
        self.rate = rate
    def __call__(self, timenext, time, state):
        return np.exp(-self.rate*(timenext - time))

def Values(values, i1, i0=0, numeraire=Numeraire(0.)):
    factors=len(values.shape)
    norm=0.5**factors
    for i in np.arange(i1-1, i0-1, -1):
        for j in itertools.product(np.arange(i+1), repeat=factors):
            value = 0.
            for k in itertools.product(np.arange(2), repeat=factors):
                value += values[tuple(np.array(j) + np.array(k))]
            values[j] = value*norm*numeraire(i+1, i, j)
    return values

factors = 2
i = 60
values = np.ones([i+1]*factors)
Values(values, i, numeraire=Numeraire(0.05/12))
print values[(0,)*factors], np.exp(-0.05/12*i)
Here's my latest answer (no Cython!), which runs in 125 ms for the factors=2, i=360 case.
import numpy as np
import itertools
from functools import reduce  # built in on Python 2, must be imported on Python 3

# slice(None, -1) selects offset 0 along an axis, slice(1, None) selects offset 1
slices = (slice(None, -1, None), slice(1, None, None))

def Expectation(values, numeraire, i, i0=0):
    def Values(values, i):
        factors = values.ndim
        expect = np.zeros((i,)*factors)
        for j in itertools.product(slices, repeat=factors):
            expect += values[j]
        return expect*0.5**factors*numeraire(i, i-1)
    return reduce(Values, range(i, i0, -1), values)

class Numeraire:
    def __init__(self, factors, rate=0):
        self.factors = factors
        self.rate = rate
    def __call__(self, timenext, time):
        # use self.factors rather than relying on the module-level name
        return np.full((time+1,)*self.factors, np.exp(-self.rate*(timenext - time)))

factors = 2
i = 360
values, numeraire = np.ones((i+1,)*factors), Numeraire(factors, 0.05/12)
%timeit Expectation(values, numeraire, i)
Expectation(values, numeraire, i)[(0,)*factors], np.exp(-0.05/12*i)
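To see why the slicing trick works, here is a small sketch (written for Python 3, with helper names of my own choosing) that checks that summing the shifted views gives the same result as the explicit loop over neighbours j + k from the question.

```python
import itertools
import numpy as np

def neighbour_sum_loop(v):
    """Reference: for each index j, sum v[j + k] over all k in {0, 1}^ndim."""
    n = v.shape[0] - 1
    out = np.zeros((n,) * v.ndim)
    for j in itertools.product(range(n), repeat=v.ndim):
        for k in itertools.product(range(2), repeat=v.ndim):
            out[j] += v[tuple(a + b for a, b in zip(j, k))]
    return out

def neighbour_sum_slices(v):
    """Same sum built from shifted views, one view per offset combination."""
    shifts = (slice(None, -1), slice(1, None))  # offset 0 and offset 1
    out = np.zeros(tuple(s - 1 for s in v.shape))
    for j in itertools.product(shifts, repeat=v.ndim):
        out += v[j]
    return out

rng = np.random.default_rng(0)
v2 = rng.random((6, 6))
v3 = rng.random((4, 4, 4))
print(np.allclose(neighbour_sum_loop(v2), neighbour_sum_slices(v2)))
print(np.allclose(neighbour_sum_loop(v3), neighbour_sum_slices(v3)))
```

Each combination of slices is a view of the array shifted by one offset k, so adding the 2**factors views performs all the inner-loop additions at once.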
Before using Cython, you should optimize your code with NumPy. Here, vectorizing the two inner for loops yields a 40x speed-up:
In [1]: import numpy as np
   ...: import itertools
   ...:
   ...: # define the Numeraire and Values functions from the question above
   ...:
   ...: def Values2(values, i1, i0=0, numeraire=Numeraire(0.)):
   ...:     factors=len(values.shape)
   ...:     norm=0.5**factors
   ...:     k = np.array(list(itertools.product(np.arange(2), repeat=factors)))
   ...:     for i in np.arange(i1-1, i0-1, -1):
   ...:         j = np.array(list(itertools.product(np.arange(i+1), repeat=factors)))
   ...:         mask_all = j[:,:,np.newaxis] + k.T[np.newaxis, :, :]
   ...:         mask_x, mask_y = np.swapaxes(mask_all, 2, 1).reshape(-1, 2).T
   ...:
   ...:         values_tmp = values[mask_x, mask_y].reshape((j.shape[0], k.shape[0]))
   ...:         values_tmp = values_tmp.sum(axis=1)
   ...:         values[j[:,0], j[:,1]] = values_tmp*norm*numeraire(i+1, i, j)
   ...:     return values
   ...:
   ...: factors = 2
   ...: i = 60
   ...: values = lambda : np.ones([i+1]*factors)
   ...: print values()[(0,)*factors], np.exp(-0.05/12*i)
   ...:
   ...: res = Values(values(), i, numeraire=Numeraire(0.05/12))
   ...: res2 = Values2(values(), i, numeraire=Numeraire(0.05/12))
   ...: np.testing.assert_allclose(res, res2)
   ...:
   ...: %timeit Values(values(), i, numeraire=Numeraire(0.05/12))
   ...: %timeit Values2(values(), i, numeraire=Numeraire(0.05/12))
   ...:
1.0 0.778800783071
1 loops, best of 3: 1.26 s per loop
10 loops, best of 3: 31.8 ms per loop
The next step would be to replace the line
j = np.array(list(itertools.product(np.arange(i+1), repeat=factors)))
with its NumPy equivalent, taken from this answer (not very pretty):
def itertools_product_numpy(some_list, some_length):
    return some_list[np.rollaxis(
        np.indices((len(some_list),) * some_length), 0, some_length + 1)
        .reshape(-1, some_length)]

j = itertools_product_numpy(np.arange(i+1), factors)
This results in an overall 160x speed-up, and the code runs in 1.2 seconds on my laptop for i=360 and factors=2.
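As a quick sanity check, the sketch below compares the itertools_product_numpy helper above with itertools.product on a small input. The sizes i=4 and factors=2 are arbitrary values picked for the check.

```python
import itertools
import numpy as np

def itertools_product_numpy(some_list, some_length):
    # Build every index tuple with np.indices, move the tuple axis to the end,
    # flatten to one row per tuple, then map the indices to values.
    idx = np.rollaxis(
        np.indices((len(some_list),) * some_length), 0, some_length + 1)
    return some_list[idx.reshape(-1, some_length)]

i, factors = 4, 2  # small, arbitrary sizes for the check
expected = np.array(list(itertools.product(np.arange(i + 1), repeat=factors)))
result = itertools_product_numpy(np.arange(i + 1), factors)

print(result.shape)
print(np.array_equal(result, expected))
```

The rows come out in the same order as itertools.product, so the helper can replace the original line without changing the rest of Values2.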
I don't think porting this last version to Cython will gain much, since only one loop remains and it has only ~360 iterations. Further speed-ups are more likely to come from fine-tuned Python/NumPy optimizations.
Alternatively, you can try applying Cython to your original implementation. However because it is based on itertools.product, which is slow when called repeatedly in a loop, Cython will not help there.
I have a set like this.
x = set([u'[{"Mychannel":"sample text"},"p"]'])
I need to convert it into a dict.
The output I need is
x = {'mychannel':'sampletext'}
How can I do this?
It looks like you can unpack that crazy thing like this:
>>> x = set([u'[{"Mychannel":"sample text"}, "p"]'])
>>> lst = list(x)
>>> lst
[u'[{"Mychannel":"sample text"}, "p"]']
>>> lst[0]
u'[{"Mychannel":"sample text"}, "p"]'
>>> inner_lst = eval(lst[0])
>>> inner_lst
[{'Mychannel': 'sample text'}, 'p']
>>> d = inner_lst[0]
>>> d
{'Mychannel': 'sample text'}
However, as @MattDMo suggests in the comments, I seriously suggest you re-evaluate this data structure, if only to remove the step where you need eval to use it!
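Since the string inside the set happens to be valid JSON, one way to avoid eval is json.loads. This is a sketch of that alternative, not the method used above. The final normalisation step (lowercasing keys, removing spaces from values) is an assumption based on the output shown in the question.

```python
import json

x = {'[{"Mychannel":"sample text"},"p"]'}

# The set holds a single JSON-encoded string; take it out without indexing.
raw = next(iter(x))

# json.loads only parses data, so unlike eval it cannot execute code.
payload = json.loads(raw)   # a list: [dict, "p"]
d = payload[0]              # the dict is the first element

# Assumption: match the question's desired output by lowercasing keys
# and removing spaces from the values.
normalized = {k.lower(): v.replace(" ", "") for k, v in d.items()}
print(d)
print(normalized)
```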