How to include [] in python regex code? - python-2.7

I am using Python 2.7.8 to write a small program that uses a regex to read a rule of the form A ==> B and return it in the form 'A, B'.
This is my code:
import re
def fixp1(s):
    pattern = re.compile("(?P<g1>([A-Z0-9a-z]|\?)*):(?P<g2>([A-Z0-9a-z]|\?)*)")
    return eval(pattern.sub("('\g<g1>', '\g<g2>')", s))
x = "[ABCD:NP, [PQR:?TAG1]] ==> [XXX:?P]"
def readrule(r):
    r.split("==>")
    return [fixp1(r[0].strip()), fixp1(r[1].strip())]
When I test this code:
>>> readrule(x)
I get the following error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../patterns.py", line 12, in readrule
    return [fixp1(r[0].strip()), fixp1(r[1].strip())]
  File ".../patterns.py", line 5, in fixp1
    return eval(pattern.sub("('\g<g1>', '\g<g2>')", s))
  File "<string>", line 1
    [
    ^
SyntaxError: unexpected EOF while parsing
I think the problem is that I could not include '[' and ']' in this part of the pattern:
([A-Z0-9a-z]|\?)
If that is right, how do I do it? If not, where is my mistake?

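The SyntaxError comes from eval, not from the regex: pattern.sub returns a string with the replacements applied, and eval raises when that string is not a complete Python expression. Here the string is just "[", as the traceback shows, because readrule discards the result of r.split("==>"), so r[0] is the first character of the rule rather than its left-hand side. Keep the list that split returns and index into that, and avoid eval where you can.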
If you want to include [] in your patterns, you need to escape them with \:
pattern = re.compile(r'[\[\]0-9]+')
would match strings like '[1234]'.
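As a rough sketch of how these pieces might fit together: it assumes the rule contains exactly one ==>, and it uses ast.literal_eval in place of eval, which is a substitution of mine rather than part of the original code. The names PAIR and BRACKETED_DIGITS are made up for illustration.

```python
import ast
import re

# Pattern from the question, as a raw string; [A-Za-z0-9?] replaces the
# ([A-Z0-9a-z]|\?) alternation and matches the same characters.
PAIR = re.compile(r"(?P<g1>[A-Za-z0-9?]*):(?P<g2>[A-Za-z0-9?]*)")

# Inside a character class, [ and ] are escaped with a backslash.
BRACKETED_DIGITS = re.compile(r"[\[\]0-9]+")


def fixp1(s):
    # Rewrite every A:B pair as the text of a tuple, then parse the literal.
    return ast.literal_eval(PAIR.sub(r"('\g<g1>', '\g<g2>')", s))


def readrule(rule):
    # str.split returns a new list; the original string is left unchanged.
    left, right = rule.split("==>")
    return [fixp1(left.strip()), fixp1(right.strip())]


rule = "[ABCD:NP, [PQR:?TAG1]] ==> [XXX:?P]"
result = readrule(rule)
bracket_match = BRACKETED_DIGITS.match("[1234]")
```

The brackets in the rule never need to appear in the pattern here, because the substitution only rewrites the A:B pairs and leaves the surrounding list syntax for the parser.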

Related

tornado set_cookie array support

Does Tornado support array cookies?
In PHP, we set an array cookie with
setcookie('UserTable[Name]','Tinywan',time()+3600);
setcookie('UserTable[Age]','24',time()+3600);
setcookie('UserTable[Email]','7514#xxx.com',time()+3600);
But in Tornado, the same thing does not work:
self.set_cookie('UserTable[Name]', 'Tinywan', expires_days=30)
self.set_cookie('UserTable[Age]', '24', expires_days=30)
self.set_cookie('UserTable[Email]', '7514#xxx.com', expires_days=30)
That is because Python's http.cookies module raises an error when an illegal character is present in the key:
from http import cookies
C = cookies.SimpleCookie()
C['UserTable[Name]'] = 'Tinywan'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/http/cookies.py", line 521, in __setitem__
    self.__set(key, rval, cval)
  File "/usr/local/lib/python3.6/http/cookies.py", line 511, in __set
    M.set(key, real_value, coded_value)
  File "/usr/local/lib/python3.6/http/cookies.py", line 380, in set
    raise CookieError('Illegal key %r' % (key,))
http.cookies.CookieError: Illegal key 'UserTable[Name]'
>>> C['UserTableName'] = 'Tinywan'
You're using illegal characters in the cookie name: [ and ] are not allowed.
From the MDN documentation:
A <cookie-name> can be any US-ASCII characters except control characters (CTLs), spaces, or tabs. It also must not contain a separator character like the following: ( ) < > @ , ; : \ " / [ ] ? = { }.
You don't really need these characters. UserTableName should work fine.
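A minimal sketch of that workaround, using only the standard http.cookies module. The helper flat_cookie_name and its underscore naming scheme are hypothetical, chosen just for illustration.

```python
from http import cookies


def flat_cookie_name(table, field):
    # Hypothetical naming scheme: ("UserTable", "Name") -> "UserTable_Name".
    return "{}_{}".format(table, field)


jar = cookies.SimpleCookie()

# A key containing [ or ] is rejected by http.cookies.
try:
    jar["UserTable[Name]"] = "Tinywan"
    bracket_key_rejected = False
except cookies.CookieError:
    bracket_key_rejected = True

# A flattened key is accepted.
jar[flat_cookie_name("UserTable", "Name")] = "Tinywan"
```

In a Tornado handler, the flattened name would presumably be passed to self.set_cookie in the same way as in the question.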

Having difficult time parsing this string with regex?

text="""[{"token":"03AJzQf7P5tfAY0T8yGDlF_aoPkLgz9-F7aiXzvViQqaaRmcJeFuIq96vmLoPXVK1GW-Fs8xp6OmJWvFvyNa3ayMpvaLkb6R sVkjjWBjqVIW4ziWeHk--Vrd8zVaA-Pt8VxMdoDBYxjRRrCNdeQN-Fk_-Wywb5XceJGdPJbMDZ-BoOB8l3Gq4bFwJTVu56zLT-4fbAsLWqRI7TjEswJ_y2-6NlEOyTTxFblzlZLYFh7urRx2Wra_gdP0-uUxoZydZBzbiPetcYmGo9b1B69-Pmb7akK7aqLUN03mvC3t1bn4u0ZvJGWjBmqhv81QoP3J1u-_Xh p34_dhspsjDpgfxYcHTI3e3yAir_QQ","timestamp":"2017-10-11T23:40:13.436Z"},{"token":"03AJzQf7Mmj_KZVl39Ob1_YnvsJuj4vFS o8ZWXNUJjSEjZqNyH8puB035sZYbQdPLVdOoX8ljyGeDYvxk6Kkf3Sc16EAS0bg0cXUAXzs6LAr3jDZmtW38TjWN5qbykIN_-s0-YpX6F0XJ4Hw3GDl vVkxmAk1btZewbeUp1nwMeM9BGJxkJZ5_2LRCGTERPGICKU4P897_FYcduADw5j1wEd9Yp7TdczRkgkY3qpsNcxlrF_rXv7DAvUxkg2_fussc3RkRgq ZueTMPkDN7B5BYiTBqVeXJ48Lvm6-1R86HgyrcDAPaZ1xMY0JxzMSvU26rChpomXFLERLfxijDNrixfGeh8hSCa0dX1HiLac8yOERKRcbBk1kXLOK8" ,"timestamp":"2017-10-11T23:40:29.916Z"}]"""
I would like to parse this string and store only the token values in a list.
My current code (not working):
token = re.search('(?<="token":").*?"', text)
print(token.group(0))
print(token.group(1))
the output:
03AJzQf7P5tfAY0T8yGDlF_aoPkLgz9-F7aiXzvViQqaaRmcJeFuIq96vmLoPXVK1GW-Fs8xp6OmJWvFvyNa3ayMpvaLkb6RsVkjjWBjqVIW4ziWeHk--Vrd8zVaA-Pt8VxMdoDBYxjRRrCNdeQN-Fk_-Wywb5XceJGdPJbMDZ-BoOB8l3Gq4bFwJTVu56zLT-4fbAsLWqRI7TjEswJ_y2-6NlEOyTTxFblzlZLYFh7urRx2Wra_gdP0-uUxoZydZBzbiPetcYmGo9b1B69-Pmb7akK7aqLUN03mvC3t1bn4u0ZvJGWjBmqhv81QoP3J1u-_Xhp34_dhspsjDpgfxYcHTI3e3yAir_QQ"
Error for token.group(1):
Traceback (most recent call last):
  File "main.py", line 45, in <module>
    print(token.group(1))
IndexError: no such group
Your text is a JSON string. You can use json.loads to convert it into a list of dicts.
import json
text="""[{"token":"03AJzQf7P5tfAY0T8yGDlF_aoPkLgz9-F7aiXzvViQqaaRmcJeFuIq96vmLoPXVK1GW-Fs8xp6OmJWvFvyNa3ayMpvaLkb6R sVkjjWBjqVIW4ziWeHk--Vrd8zVaA-Pt8VxMdoDBYxjRRrCNdeQN-Fk_-Wywb5XceJGdPJbMDZ-BoOB8l3Gq4bFwJTVu56zLT-4fbAsLWqRI7TjEswJ_y2-6NlEOyTTxFblzlZLYFh7urRx2Wra_gdP0-uUxoZydZBzbiPetcYmGo9b1B69-Pmb7akK7aqLUN03mvC3t1bn4u0ZvJGWjBmqhv81QoP3J1u-_Xh p34_dhspsjDpgfxYcHTI3e3yAir_QQ","timestamp":"2017-10-11T23:40:13.436Z"},{"token":"03AJzQf7Mmj_KZVl39Ob1_YnvsJuj4vFS o8ZWXNUJjSEjZqNyH8puB035sZYbQdPLVdOoX8ljyGeDYvxk6Kkf3Sc16EAS0bg0cXUAXzs6LAr3jDZmtW38TjWN5qbykIN_-s0-YpX6F0XJ4Hw3GDl vVkxmAk1btZewbeUp1nwMeM9BGJxkJZ5_2LRCGTERPGICKU4P897_FYcduADw5j1wEd9Yp7TdczRkgkY3qpsNcxlrF_rXv7DAvUxkg2_fussc3RkRgq ZueTMPkDN7B5BYiTBqVeXJ48Lvm6-1R86HgyrcDAPaZ1xMY0JxzMSvU26rChpomXFLERLfxijDNrixfGeh8hSCa0dX1HiLac8yOERKRcbBk1kXLOK8" ,"timestamp":"2017-10-11T23:40:29.916Z"}]"""
mylist = json.loads(text)
token = ' '.join(mylist[0]['token'].split()).split()
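To collect the token of every record rather than only the first, a list comprehension over the parsed list is one option. The sketch below uses a shortened stand-in for the text, with made-up token values. It also shows a regex variant: token.group(1) failed in the question because the pattern has no capturing group (a lookbehind does not capture), whereas parentheses around .*? create group 1.

```python
import json
import re

# Shortened stand-in for the text in the question; the token values are made up.
text = ('[{"token":"abc123","timestamp":"2017-10-11T23:40:13.436Z"},'
        '{"token":"def456","timestamp":"2017-10-11T23:40:29.916Z"}]')

# Parse the JSON and collect the token of every record.
tokens = [record["token"] for record in json.loads(text)]

# Regex alternative: the parentheses create group 1, which findall returns.
tokens_from_regex = re.findall(r'"token":"(.*?)"', text)
```

The JSON route is probably the safer of the two, since it does not depend on how the keys are ordered or spaced.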

Python RPy2 function with multiple input arguments

I am looking to call an R function through rpy2 with multiple input parameters. The R function I am trying to use is write.csv, and I need to specify more than one of its optional parameters.
If I use it without the optional parameters row.names and column.names, it works like this:
r("write.csv")(d,file='myfilename.csv')
For my requirements, I must issue this command with the optional parameters row.names and column.names. So, I tried:
r('write.csv')(d, file='myfilename.csv', row.names=FALSE, column.names=FALSE)
but I got this error message:
File "/home/UserName/test.py", line 12
r("write.csv")(d,file='myfilename.csv',row.names=FALSE, column.names=FALSE)
SyntaxError: keyword can't be an expression
[Finished in 0.0s with exit code 1]
[shell_cmd: python -u "/home/UserName/test.py"]
[dir: /home/UserName]
[path: /home/UserName/bin:/home/UserName/.local/bin:/usr/local/sbin:/usr/local/bin:
.../usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]
How can I call write.csv with row.names=FALSE and column.names=FALSE in rpy2?
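You can use Python's ** dictionary unpacking, which lets you pass keyword arguments whose names, such as row.names, are not valid Python identifiers.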
See the note here: http://rpy2.readthedocs.io/en/version_2.8.x/robjects_functions.html#callable
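The mechanism can be illustrated without rpy2 at all. In the sketch below, show_call is a hypothetical stand-in that only echoes its arguments; it is not an rpy2 function.

```python
def show_call(*args, **kwargs):
    # Hypothetical stand-in for a wrapped R function; it only echoes its arguments.
    return args, kwargs


# Writing row.names=False in a call is a syntax error, because "row.names"
# is an expression rather than a plain name. Unpacking a dict avoids that.
options = {"file": "myfilename.csv", "row.names": False}
args, kwargs = show_call("d", **options)
```

With rpy2, the same dict would presumably be unpacked into the call from the question, along the lines of r("write.csv")(d, **options).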
One of my mistakes was that I should have replaced . with _, as shown in the docs here:
from rpy2.robjects.packages import importr
base = importr('base')
base.rank(0, na_last = True)
So, analogously, I would need row_names = True. However, the . in write.csv() itself remained, so this only solved part of the problem. I tried a few things to get an answer.
Generating sample data:
from rpy2.robjects import r, globalenv
from rpy2.robjects import IntVector, DataFrame
d = {'a': IntVector((1,2,3)), 'b': IntVector((4,5,6))}
dataf = DataFrame(d)
Attempts follow. Attempt 1 did not work; attempts 2 and 3 did:
1.
r('write_csv')(x=dataf,file='testing.csv',row_names=False)
Traceback (most recent call last):
  File "C:\Users\UserName\FileD\test.py", line 18, in <module>
    r('write_csv')(x=dataf,file='testing.csv',row_names=False)
  File "C:\Python27\lib\site-packages\rpy2\robjects\__init__.py", line 321, in __call__
    res = self.eval(p)
  File "C:\Python27\lib\site-packages\rpy2\robjects\functions.py", line 178, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "C:\Python27\lib\site-packages\rpy2\robjects\functions.py", line 106, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in eval(expr, envir, enclos) : object 'write_csv' not found
Error in eval(expr, envir, enclos) : object 'write_csv' not found
2.
r('''
write_csv <- function(x,verbose=FALSE)
write.csv(x,file='testing.csv',row.names=FALSE)
''')
r['write_csv'](dataf)
3.
globalenv['dataf'] = dataf
r("write.csv(dataf,file='testing2.csv',row.names=FALSE)")
I was really hoping attempt 1 would work. I thought I had reproduced the docs example base.rank(0, na_last = True), but there the function comes from a package loaded with importr, whereas r('write_csv') asks R to evaluate the name write_csv, which does not exist in R.

What is wrong with following piece of code?

I copied the following code from the book Programming Collective Intelligence, page 118, chapter "Document Filtering". The function breaks the text into words by splitting on any character that isn't a letter. This leaves only actual words, all converted to lowercase.
import re
import math
def getwords(doc):
    splitter=re.compile('\\W*')
    words=[s.lower() for s in splitter.split(doc)
           if len(s)>2 and len(s)<20]
    return dict([(w,1) for w in words])
I implemented the function and got the following error:
>>> import docclass
>>> t=docclass.getwords(s)
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    t=docclass.getwords(s)
  File "docclass.py", line 6, in getwords
    words=[s.lower() for s in splitter.split(doc)
NameError: global name 'splitter' is not defined
It works here
>>> import re
>>>
>>> def getwords(doc):
...     splitter=re.compile('\\W*')
...     words=[s.lower() for s in splitter.split(doc)
...            if len(s)>2 and len(s)<20]
...     return dict([(w,1) for w in words])
...
>>> getwords ("He's fallen in the water!");
{'water': 1, 'the': 1, 'fallen': 1}
I'm guessing you made a typo in your code, but got it right when you pasted it here.
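One caveat if the function is run on a newer interpreter: since Python 3.7, re.split also splits on empty matches, so a pattern like \W* cuts the text between every character and the length filter then discards everything. A sketch of a variant that uses \W+ instead (my adjustment, not the book's code) could look like this:

```python
import re


def getwords(doc):
    # \W+ needs at least one non-word character, so it never matches empty text.
    splitter = re.compile(r"\W+")
    # Keep lowercased words that are longer than 2 and shorter than 20 characters.
    words = [s.lower() for s in splitter.split(doc) if 2 < len(s) < 20]
    # Each distinct word maps to 1, as in the original function.
    return {w: 1 for w in words}


features = getwords("He's fallen in the water!")
```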

How to understand the Url patterns in django url.py

How should I read URL patterns such as (?P<slug>[-\w]+)/$ in Django's urls.py?
The pattern (?P<slug>[-\w]+)/$ says that a variable called slug is passed to your view. It can contain any letters, digits, underscores, and hyphens.
Your view looks like this:
def my_view(request, slug):
    ....
hope it helps...
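The hand-off from named group to view argument can be imitated with plain re. This is only an illustration: Django does the matching itself, my_view here is a stand-in that returns the slug instead of a response, and the sample path and the None request are placeholders.

```python
import re

# The pattern from the question, compiled with plain re for illustration.
pattern = re.compile(r"(?P<slug>[-\w]+)/$")


def my_view(request, slug):
    # Stand-in view: returns the captured slug instead of an HTTP response.
    return slug


match = pattern.match("my-first-post/")
# groupdict() maps each group name to the text it captured.
captured = match.groupdict()
result = my_view(None, **captured)
```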
First read Mastering Regular Expressions, then section 7.2.1, Regular Expression Syntax, of the Python documentation.
I think it is not a valid regex pattern.
"[-\w]+" matches one or more word characters or hyphens, for example "a-b9-c-" or "---".
(?P<name>...) is a named group. If you leave out the name, Python (mine is 2.7) raises an error.
>>> m = re.match("(?P<e>[-\w]+)/$", "a-b-c-/")
>>> m.group('e')
'a-b-c-'
>>> m = re.match("(?P[-\w]+)/$", "a-b-c-/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: unknown specifier: ?P[
Note that \w already matches digits and the underscore as well as letters, so [-\w]+ already covers typical slugs. If you prefer to spell the digits out, an equivalent pattern is:
SLUG = '(?P<slug>[\w\d-]+)'
I hope this is helpful to you...
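As a quick check with plain re (the sample strings are made up), \w already matches digits, so the two character classes accept the same input:

```python
import re

original = re.compile(r"(?P<slug>[-\w]+)")
with_digits = re.compile(r"(?P<slug>[\w\d-]+)")

samples = ["post-2017", "a_b-9", "---"]

# Both patterns should capture exactly the same slug for every sample.
same = all(
    original.fullmatch(s).group("slug") == with_digits.fullmatch(s).group("slug")
    for s in samples
)
```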