how to print sympy matrix....why new line? - sympy

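How can I print a SymPy matrix on one line? Why is a newline inserted after Matrix([ ?
Should I return the matrices instead of printing them?
I tried to write a new function, but I do not know what it should print:
def myPrintMatrix(Matrix1, Matrix2):
    # print ... ?  (this is the part I do not know how to write)
    return
Please give me some advice. Thank you in advance, and sorry for my bad English!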
from sympy import *
def myPrintMatrix():
    return Matrix([11,12]), \
           Matrix([21,22])
print(myPrintMatrix())
# (Matrix([
# [11],
# [12]]), Matrix([
# [21],
# [22]]))
################################
# I want the output below:
# Matrix([11, 12]),
# Matrix([21, 22])
# or
# Matrix([11, 12]),Matrix([21, 22])
# Why is there a line break in "[12]]), Matrix(["?

An objective for the str representation of SymPy objects is to have something that looks good and that can be copied and pasted to recreate the object. The newline is printed after Matrix([ so that what you see is mostly the uncluttered matrix elements:
>>> randMatrix(2,2)
Matrix([
[44, 73],
[97, 57]])
instead of
Matrix([[44, 73],[97, 57]])
since the latter loses the 2x2 structural appearance.
If you don't like this, capture the string output and print it with the newlines removed:
>>> print(str(_).replace('\n',''))
Matrix([[44, 73], [97, 57]])

Is this what bothers you:
In [353]: Matrix([1,2])
Out[353]:
⎡1⎤
⎢ ⎥
⎣2⎦
In [354]: Matrix([[1,2]])
Out[354]: [1 2]
In [355]: _353.shape
Out[355]: (2, 1)
In [356]: _354.shape
Out[356]: (1, 2)
Regardless of what you supply, Matrix makes a 2d object. Its docs should discuss this.
Array can be 1d:
In [358]: Array([1,2])
Out[358]: [1 2]
In [359]: _.shape
Out[359]: (2,)
I'm more familiar with numpy's distinction between ndarray and matrix, but sympy seems to follow a similar (but not identical) pattern.

I tried this:
from sympy import *
def myPrintNewline(myInput):
    return str(myInput).replace('\n','')
def myTest(myT0, myT1):
    return Matrix(myT0),Matrix(myT1)
myInput=[[44,73],[97,57]]
myOutput=myTest(myInput[0],myInput[1])
print("#10-line",myOutput)
print("#11-line",myPrintNewline(myOutput))
print("#12-line",myOutput[0])
print("#13-line",myOutput[1])
bb=Matrix([
[44, 73],
[97, 57]])
print("#17-line",bb)
#
# #10-line (Matrix([
# [44],
# [73]]), Matrix([
# [97],
# [57]]))
# #11-line (Matrix([[44],[73]]), Matrix([[97],[57]]))
# #12-line Matrix([[44], [73]])
# #13-line Matrix([[97], [57]])
# #17-line Matrix([[44, 73], [97, 57]])
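The outputs above show a single matrix printing on one line (#12-line, #13-line) while the tuple prints across several lines (#10-line). That is consistent with a general Python rule: print() uses str() on the object it is given, but a tuple or list formats each of its elements with repr(). The sketch below illustrates the rule without SymPy; the class Grid and the helper one_line are hypothetical names invented for this example, not part of any library.

```python
class Grid:
    """Hypothetical stand-in for an object whose str() and repr() differ."""

    def __init__(self, rows):
        self.rows = rows

    def __str__(self):
        # Compact, single-line form used by print(obj).
        return "Grid(%s)" % self.rows

    def __repr__(self):
        # Multi-line form used when the object sits inside a container.
        body = ",\n".join(str(row) for row in self.rows)
        return "Grid([\n%s])" % body


def one_line(items):
    """Join the str() of each item so no repr() newlines leak in."""
    return ", ".join(str(item) for item in items)


pair = (Grid([[11], [12]]), Grid([[21], [22]]))
print(pair)            # the tuple formats each element with repr()
print(one_line(pair))  # Grid([[11], [12]]), Grid([[21], [22]])
```

Applied to the tuple returned by myTest, the same join over str(m) should give the one-line forms shown at #12-line and #13-line, separated by a comma.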

Related

Is there a complete example to write a mathematical expression in sympy to a Microsoft Word document?

This might be a silly question, but I am desperate. I am a math teacher and I am trying to generate math tests. I tried Python for this and got some things done. However, I am not a professional programmer, so I get lost with MathML, prettyprint() and so on.
Can anybody supply a complete example that I can execute? It may contain just one small, silly equation; that does not matter. I just want to see how I can get it into a Word document. After that, I can use it as a basis. I work on a Mac.
I hope anyone can help me out. Thanks in advance!
Best regards, Johan
This works for me:
from sympy import *
from docx import Document
from lxml import etree
# create expression
x, y = symbols('x y')
expr1 = (x+y)**2
# create MathML structure
expr1xml = mathml(expr1, printer = 'presentation')
tree = etree.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML">'+expr1xml+'</math>')
# convert to MS Office structure
xslt = etree.parse('C:/MML2OMML.XSL')
transform = etree.XSLT(xslt)
new_dom = transform(tree)
# write to docx
document = Document()
p = document.add_paragraph()
p._element.append(new_dom.getroot())
document.save("simpleEq.docx")
How about the following. The capture captures whatever is printed. In this case I use pprint to print the expression that I want written to file. There are lots of options you can use with pprint (including wrapping which you might want to set to False). The quality of output will depend on the fonts you use. I don't do this at all so I don't have a lot of hints for that.
# use SymPy's pprint; the standard-library pprint module would write the plain repr instead
from sympy import Integral, pprint
from sympy.utilities.iterables import capture
from sympy.abc import x
with open('out.doc','w',encoding='utf-8') as f:
    f.write(capture(lambda:pprint(Integral(x**2, (x, 1, 3)))))
When I double click (in Windows) on the out.doc file, a word equation with the integral appears.
Here is the actual IPython session:
IPython console for SymPy 1.6.dev (Python 3.7.3-32-bit) (ground types: python)
These commands were executed:
>>> from __future__ import division
>>> from sympy import *
>>> x, y, z, t = symbols('x y z t')
>>> k, m, n = symbols('k m n', integer=True)
>>> f, g, h = symbols('f g h', cls=Function)
>>> init_printing()
Documentation can be found at https://docs.sympy.org/dev
In [1]: pprint(Integral(x**2, (x, 1, 3)))
3
⌠
⎮  2
⎮ x  dx
⌡
1
In [2]: from pprint import pprint
...: from sympy.utilities.iterables import capture
...: from sympy.abc import x
...: from sympy import Integral
...: with open('out.doc','w',encoding='utf-8') as f:
...:     f.write(capture(lambda:pprint(Integral(x**2, (x, 1, 3)))))
...:
{problems pasting the unicode here, but it shows up as an integral symbol in console}

Replacing strings using regex using Pandas

In Pandas, why does the following not replace any strings containing an exclamation mark with whatever follows it?
In [1]: import pandas as pd
In [2]: ser = pd.Series(['Aland Islands !Åland Islands', 'Reunion !Réunion', 'Zimbabwe'])
In [3]: ser
Out[3]:
0 Aland Islands !Åland Islands
1 Reunion !Réunion
2 Zimbabwe
dtype: object
In [4]: patt = r'.*!(.*)'
In [5]: repl = lambda m: m.group(1)
In [6]: ser.replace(patt, repl)
Out[6]:
0 Aland Islands !Åland Islands
1 Reunion !Réunion
2 Zimbabwe
dtype: object
Whereas the direct reference to the matched substring does work:
In [7]: ser.replace({patt: r'\1'}, regex=True)
Out[7]:
0 Åland Islands
1 Réunion
2 Zimbabwe
dtype: object
What am I doing wrong in the first case?
It appears that Series.replace does not support a callable as the replacement argument. One workaround is to import the re library explicitly and use apply:
>>> import re
>>> #... your code ...
>>> ser.apply(lambda row: re.sub(patt, repl, row))
0 Åland Islands
1 Réunion
2 Zimbabwe
dtype: object
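The apply workaround relies on the fact that re.sub itself accepts a callable as the replacement. A minimal standard-library sketch of that mechanism on a plain list (no pandas involved; the function name keep_after_bang is made up for this example):

```python
import re

names = ['Aland Islands !Åland Islands', 'Reunion !Réunion', 'Zimbabwe']
patt = r'.*!(.*)'


def keep_after_bang(match):
    """Replacement callable: return only the text captured after '!'."""
    return match.group(1)


# Strings without '!' do not match the pattern and are left unchanged.
cleaned = [re.sub(patt, keep_after_bang, name) for name in names]
print(cleaned)  # ['Åland Islands', 'Réunion', 'Zimbabwe']
```

Series.apply calls the same re.sub once per element, which is why it works where Series.replace with a callable does not.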
There are two replace methods in Pandas.
The one that acts directly on a Series can take a regex pattern string or a compiled regex and can act in-place, but doesn't allow the replacement argument to be a callable. You must set regex=True and use raw strings.
With:
import re
import pandas as pd
ser = pd.Series(['Aland Islands !Åland Islands', 'Reunion !Réunion', 'Zimbabwe'])
Yes:
ser.replace(r'.*!(.*)', r'\1', regex=True, inplace=True)
ser.replace(r'.*!', '', regex=True, inplace=True)
regex = re.compile(r'.*!(.*)')
ser.replace(regex, r'\1', regex=True, inplace=True)
No:
repl = lambda m: m.group(1)
ser.replace(regex, repl, regex=True, inplace=True)
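There's another, used as Series.str.replace. This one accepts a callable replacement but won't substitute in-place. When this answer was written it took no regex argument and always treated pattern strings as regular expressions; later pandas versions added a regex argument, and since pandas 2.0 it defaults to False, so pass regex=True there when the pattern is a regular expression string: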
Yes:
ser.str.replace(r'.*!', '')
ser.str.replace(r'.*!(.*)', r'\1')
ser.str.replace(regex, repl)
No:
ser.str.replace(regex, r'\1')
ser.str.replace(r'.*!', '', inplace=True)
I hope this is helpful to someone out there.
Try this snippet:
pattern = r'(.*)!'
ser.replace(pattern, '', regex=True)
In your case, you didn't set regex=True, as it is false by default.

How to provide custom gradient in TensorFlow

I am trying to understand how to use the @tf.custom_gradient decorator available in TensorFlow 1.7 to provide a custom gradient of a vector with respect to a vector. The code below is a minimal working example that solves the following problem to get dz/dx.
y = Ax
z = ||y||^2
Also, the attached image shows the expected solution, calculated by hand.
If I do not use @tf.custom_gradient, TensorFlow gives the desired solution. My question is: how can I provide a custom gradient for y = Ax? We know that dy/dx = A^T, as shown in the attachment above, whose calculation steps match the TensorFlow output.
import tensorflow as tf
# I want to write a custom gradient for this function f1
def f1(A,x):
    y=tf.matmul(A,x,name='y')
    return y
# for y = Ax, the derivative is: dy/dx = transpose(A)
@tf.custom_gradient
def f2(A,x):
    y=f1(A,x)
    def grad(dzByDy): # dz/dy = 2y reaches here correctly.
        dzByDx=tf.matmul(A,dzByDy,transpose_a=True)
        return dzByDx
    return y,grad
x= tf.constant([[1.],[0.]],name='x')
A= tf.constant([ [1., 2.], [3., 4.]],name='A')
y=f1(A,x) # This works as desired
#y=f2(A,x) # This line gives an error
z=tf.reduce_sum(y*y,name='z')
g=tf.gradients(ys=z,xs=x)
with tf.Session() as sess:
    print(sess.run(g))
Since your function f2() has two inputs, you have to provide a gradient to flow back to each of them. The error you see:
Num gradients 2 generated for op name: "IdentityN" [...] do not match num inputs 3
is admittedly quite cryptic, though. Supposing you never want to calculate dy/dA, you can just return None, dzByDx. The code below (tested):
import tensorflow as tf
# I want to write a custom gradient for this function f1
def f1(A,x):
    y=tf.matmul(A,x,name='y')
    return y
# for y = Ax, the derivative is: dy/dx = transpose(A)
@tf.custom_gradient
def f2(A,x):
    y=f1(A,x)
    def grad(dzByDy): # dz/dy = 2y reaches here correctly.
        dzByDx=tf.matmul(A,dzByDy,transpose_a=True)
        # one gradient per input of f2: None for A, dzByDx for x
        return None, dzByDx
    return y,grad
x= tf.constant([[1.],[0.]],name='x')
A= tf.constant([ [1., 2.], [3., 4.]],name='A')
#y=f1(A,x) # This works as desired
y=f2(A,x) # This now works with the custom gradient
z=tf.reduce_sum(y*y,name='z')
g=tf.gradients(ys=z,xs=x)
with tf.Session() as sess:
    print(sess.run(g))
outputs:
[array([[20.],
[28.]], dtype=float32)]
as desired.
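The value [20, 28] can be checked by hand without TensorFlow. The sketch below uses plain Python lists to compute dz/dx = A^T (2y) for the A and x from the question, and compares it with a central finite-difference estimate. The helper names (matvec, transpose, z_of, analytic_grad, numeric_grad) are made up for this illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]


def z_of(A, x):
    """z = ||Ax||^2, the scalar from the question."""
    return sum(y * y for y in matvec(A, x))


def analytic_grad(A, x):
    """dz/dx = A^T (dz/dy) with dz/dy = 2y, mirroring grad() above."""
    dz_dy = [2.0 * y for y in matvec(A, x)]
    return matvec(transpose(A), dz_dy)


def numeric_grad(A, x, eps=1e-3):
    """Central finite differences, used only as an independent check."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((z_of(A, up) - z_of(A, down)) / (2 * eps))
    return grad


A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 0.0]
print(analytic_grad(A, x))  # [20.0, 28.0]
print(numeric_grad(A, x))   # approximately the same values
```

Here y = Ax = [1, 3], so dz/dy = [2, 6] and A^T [2, 6] = [1*2 + 3*6, 2*2 + 4*6] = [20, 28], matching the TensorFlow output.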

Dictvectorizer for list as one feature in Python Pandas and Scikit-learn

I have been trying to solve this for days, and although I have found a similar question here ("How can i vectorize list using sklearn DictVectorizer"), its solution is overly simplified.
I would like to fit some features into a logistic regression model to predict 'chinese' or 'non-chinese'. I have a raw_name from which I extract two features: 1) the last name, and 2) a list of two-letter substrings of the last name; for example, 'Chan' gives ['Ch', 'ha', 'an']. But it seems DictVectorizer doesn't accept a list as a value in the dictionary. Following the link above, I created a function list_to_dict, which successfully returns some dict elements,
{'substring=co': True, 'substring=or': True, 'substring=rn': True, 'substring=ns': True}
but I have no idea how to incorporate that in the my_dict = ... before applying the dictvectorizer.
# coding=utf-8
import pandas as pd
from pandas import DataFrame, Series
import numpy as np
import nltk
import re
import random
from random import randint
import sys
reload(sys)  # Python 2 only
sys.setdefaultencoding('utf-8')  # Python 2 only
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction import DictVectorizer
lr = LogisticRegression()
dv = DictVectorizer()
# Get csv file into data frame
data = pd.read_csv("V2-1_2000Records_Processed_SEP2015.csv", header=0, encoding="utf-8")
df = DataFrame(data)
# Pandas data frame shuffling
df_shuffled = df.iloc[np.random.permutation(len(df))]
df_shuffled.reset_index(drop=True)
# Assign X and y variables
X = df.raw_name.values
y = df.chineseScan.values
# Feature extraction functions
def feature_full_last_name(nameString):
    try:
        last_name = nameString.rsplit(None, 1)[-1]
        if len(last_name) > 1: # not accept name with only 1 character
            return last_name
        else: return None
    except: return None
def feature_twoLetters(nameString):
    placeHolder = []
    try:
        for i in range(0, len(nameString)):
            x = nameString[i:i+2]
            if len(x) == 2:
                placeHolder.append(x)
        return placeHolder
    except: return []
def list_to_dict(substring_list):
    try:
        substring_dict = {}
        for i in substring_list:
            substring_dict['substring='+str(i)] = True
        return substring_dict
    except: return None
list_example = ['co', 'or', 'rn', 'ns']
print(list_to_dict(list_example))
# Transform format of X variables, and spit out a numpy array for all features
my_dict = [{'two-letter-substrings': feature_twoLetters(feature_full_last_name(i)),
            'last-name': feature_full_last_name(i), 'dummy': 1} for i in X]
print(my_dict[3])
Output:
{'substring=co': True, 'substring=or': True, 'substring=rn': True, 'substring=ns': True}
{'dummy': 1, 'two-letter-substrings': [u'co', u'or', u'rn', u'ns'], 'last-name': u'corns'}
Sample data:
Raw_name chineseScan
Jack Anderson non-chinese
Po Lee chinese
If I have understood correctly you want a way to encode list values in order to have a feature dictionary that DictVectorizer could use. (One year too late but) something like this can be used depending on the case:
my_dict_list = []
for i in X:
    # create a new feature dictionary
    feat_dict = {}
    # add the features that are straight forward
    feat_dict['last-name'] = feature_full_last_name(i)
    feat_dict['dummy'] = 1
    # for the features that have a list of values iterate over the values and
    # create a custom feature for each value
    for two_letters in feature_twoLetters(feature_full_last_name(i)):
        # make sure the naming is unique enough so that no other feature
        # unrelated to this will have the same name/ key
        feat_dict['two-letter-substrings-' + two_letters] = True
    # save it to the feature dictionary list that will be used in Dict vectorizer
    my_dict_list.append(feat_dict)
print(my_dict_list)
from sklearn.feature_extraction import DictVectorizer
dict_vect = DictVectorizer(sparse=False)
transformed_x = dict_vect.fit_transform(my_dict_list)
print(transformed_x)
Output:
[{'dummy': 1, u'two-letter-substrings-er': True, 'last-name': u'Anderson', u'two-letter-substrings-on': True, u'two-letter-substrings-de': True, u'two-letter-substrings-An': True, u'two-letter-substrings-rs': True, u'two-letter-substrings-nd': True, u'two-letter-substrings-so': True}, {'dummy': 1, u'two-letter-substrings-ee': True, u'two-letter-substrings-Le': True, 'last-name': u'Lee'}]
[[ 1. 1. 0. 1. 0. 1. 0. 1. 1. 1. 1. 1.]
[ 1. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0.]]
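For readers who want to try the flattening step without the CSV file, here is a self-contained sketch that builds the same kind of flat dictionaries from the two sample names in the question. It uses only the standard library; the function names (last_name, bigrams, name_features) are hypothetical, and unlike the code above it simply omits the 'last-name' key when no usable surname is found, which is an assumption of this sketch rather than something the answer prescribes.

```python
def last_name(full_name):
    """Return the last whitespace-separated token, or None if too short."""
    parts = full_name.split()
    if not parts or len(parts[-1]) < 2:
        return None
    return parts[-1]


def bigrams(text):
    """Return all overlapping two-character substrings of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def name_features(full_name):
    """Build one flat dict: scalar features plus one boolean key per bigram."""
    surname = last_name(full_name)
    features = {"dummy": 1}
    if surname is not None:
        features["last-name"] = surname
        for pair in bigrams(surname):
            features["two-letter-substrings-" + pair] = True
    return features


rows = [name_features(name) for name in ["Jack Anderson", "Po Lee"]]
print(rows[1])
```

Each dictionary in rows holds only scalar values, so a list like this is the kind of input that can be handed to DictVectorizer.fit_transform as in the answer above.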
Another thing you could do (but I don't recommend) if you don't want to create as many features as the values in your lists is something like this:
# sorting the values would be a good idea
feat_dict[frozenset(feature_twoLetters(feature_full_last_name(i)))] = True
# or
feat_dict[" ".join(feature_twoLetters(feature_full_last_name(i)))] = True
but the first one means that you can't have any duplicate values and probably both don't make good features, especially if you need fine-tuned and detailed ones. Also, they reduce the possibility of two rows having the same combination of two letter combinations, thus the classification probably won't do well.
Output:
[{'dummy': 1, 'last-name': u'Anderson', frozenset([u'on', u'rs', u'de', u'nd', u'An', u'so', u'er']): True}, {'dummy': 1, 'last-name': u'Lee', frozenset([u'ee', u'Le']): True}]
[{'dummy': 1, 'last-name': u'Anderson', u'An nd de er rs so on': True}, {'dummy': 1, u'Le ee': True, 'last-name': u'Lee'}]
[[ 1. 0. 1. 1. 0.]
[ 0. 1. 1. 0. 1.]]

Removing features with low variance using scikit-learn

scikit-learn provides various methods to remove descriptors; a basic method for this purpose is described in the tutorial below,
http://scikit-learn.org/stable/modules/feature_selection.html
but the tutorial does not show any way to get the list of features that were either removed or kept.
The code below has been taken from the tutorial.
from sklearn.feature_selection import VarianceThreshold
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
sel.fit_transform(X)
array([[0, 1],
[1, 0],
[0, 0],
[1, 1],
[1, 0],
[1, 1]])
The example code above yields only two descriptors (shape (6, 2)), but in my case I have a huge data frame with a shape of (51 rows, 9000 columns). After finding a suitable model I want to keep track of the useful and useless features, because I can save computational time on the test data set by calculating only the useful features.
For example, when you perform machine learning modeling with WEKA 6.0, it provides remarkable flexibility over feature selection, and after removing the useless features you can get a list of the discarded features along with the useful ones.
thanks
If I'm not mistaken, what you can do is this:
In the case of VarianceThreshold, you can call the method fit instead of fit_transform. This will fit the data, and the resulting variances will be stored in vt.variances_ (assuming vt is your object).
Given a threshold, you can then extract the same features that fit_transform would return:
X[:, vt.variances_ > threshold]
Or get the indexes as:
idx = np.where(vt.variances_ > threshold)[0]
Or as a mask
mask = vt.variances_ > threshold
PS: default threshold is 0
EDIT:
A more straightforward way is to use the method get_support of the class VarianceThreshold. From the documentation:
get_support([indices]) Get a mask, or integer index, of the features selected
You should call this method after fit or fit_transform.
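To make the kept/removed bookkeeping concrete, the sketch below recomputes it by hand for the toy matrix from the question, using only the standard library. It assumes that a column is kept when its population variance is strictly greater than the threshold, as in the mask expression above; split_by_variance is a made-up helper name, not a scikit-learn function.

```python
from statistics import pvariance

# Toy data copied from the scikit-learn tutorial snippet in the question.
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
threshold = 0.8 * (1 - 0.8)


def split_by_variance(rows, threshold):
    """Return (kept, removed) column indices using population variance."""
    columns = list(zip(*rows))  # transpose rows into columns
    variances = [pvariance(col) for col in columns]
    kept = [i for i, v in enumerate(variances) if v > threshold]
    removed = [i for i, v in enumerate(variances) if v <= threshold]
    return kept, removed


kept, removed = split_by_variance(X, threshold)
print(kept, removed)  # [1, 2] [0]
```

The first column has variance 5/36 (about 0.139), below the threshold of 0.16, so it is the one dropped; this agrees with the two-column result in the question. With a fitted selector, get_support(indices=True) should report the kept indices directly.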
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
# Just make a convenience function; this one wraps the VarianceThreshold
# transformer but you can pass it a pandas dataframe and get one in return
def get_low_variance_columns(dframe=None, columns=None,
                             skip_columns=None, thresh=0.0,
                             autoremove=False):
    """
    Wrapper for sklearn VarianceThreshold for use on pandas dataframes.
    """
    print("Finding low-variance features.")
    # defaults so the final return works even if something fails below
    removed_features = []
    skip_columns = list(skip_columns or [])
    try:
        # get list of all the original df columns
        all_columns = dframe.columns
        # remove `skip_columns`
        remaining_columns = all_columns.drop(skip_columns)
        # get length of new index
        max_index = len(remaining_columns) - 1
        # get indices for `skip_columns`
        skipped_idx = [all_columns.get_loc(column)
                       for column
                       in skip_columns]
        # adjust insert location by the number of columns removed
        # (for non-zero insertion locations) to keep relative
        # locations intact
        for idx, item in enumerate(skipped_idx):
            if item > max_index:
                diff = item - max_index
                skipped_idx[idx] -= diff
            if item == max_index:
                diff = item - len(skip_columns)
                skipped_idx[idx] -= diff
            if idx == 0:
                skipped_idx[idx] = item
        # get values of `skip_columns`
        skipped_values = dframe.iloc[:, skipped_idx].values
        # get dataframe values
        X = dframe.loc[:, remaining_columns].values
        # instantiate VarianceThreshold object
        vt = VarianceThreshold(threshold=thresh)
        # fit vt to data
        vt.fit(X)
        # get the indices of the features that are being kept
        feature_indices = vt.get_support(indices=True)
        # remove low-variance columns from index
        feature_names = [remaining_columns[idx]
                         for idx, _
                         in enumerate(remaining_columns)
                         if idx
                         in feature_indices]
        # get the columns to be removed
        removed_features = list(np.setdiff1d(remaining_columns,
                                             feature_names))
        print("Found {0} low-variance columns."
              .format(len(removed_features)))
        # remove the columns
        if autoremove:
            print("Removing low-variance features.")
            # remove the low-variance columns
            X_removed = vt.transform(X)
            print("Reassembling the dataframe (with low-variance "
                  "features removed).")
            # re-assemble the dataframe
            dframe = pd.DataFrame(data=X_removed,
                                  columns=feature_names)
            # add back the `skip_columns`
            for idx, index in enumerate(skipped_idx):
                dframe.insert(loc=index,
                              column=skip_columns[idx],
                              value=skipped_values[:, idx])
            print("Successfully removed low-variance columns.")
        # do not remove columns
        else:
            print("No changes have been made to the dataframe.")
    except Exception as e:
        print(e)
        print("Could not remove low-variance features. Something "
              "went wrong.")
    return dframe, removed_features
This worked for me. If you want to see exactly which columns remain after thresholding, you can use this method:
from sklearn.feature_selection import VarianceThreshold
threshold_n=0.95
sel = VarianceThreshold(threshold=(threshold_n* (1 - threshold_n) ))
sel_var=sel.fit_transform(data)
data[data.columns[sel.get_support(indices=True)]]
When testing features I wrote this simple function that tells me which variables remained in the data frame after the VarianceThreshold is applied.
from sklearn.feature_selection import VarianceThreshold
from itertools import compress
def fs_variance(df, threshold:float=0.1):
    """
    Return a list of selected variables based on the threshold.
    """
    # The list of columns in the data frame
    features = list(df.columns)
    # Initialize and fit the method
    vt = VarianceThreshold(threshold = threshold)
    _ = vt.fit(df)
    # Get the column names that pass the threshold
    feat_select = list(compress(features, vt.get_support()))
    return feat_select
which returns a list of column names which are selected. For example: ['col_2','col_14', 'col_17'].