Having a difficult time parsing this string with regex?

text="""[{"token":"03AJzQf7P5tfAY0T8yGDlF_aoPkLgz9-F7aiXzvViQqaaRmcJeFuIq96vmLoPXVK1GW-Fs8xp6OmJWvFvyNa3ayMpvaLkb6R sVkjjWBjqVIW4ziWeHk--Vrd8zVaA-Pt8VxMdoDBYxjRRrCNdeQN-Fk_-Wywb5XceJGdPJbMDZ-BoOB8l3Gq4bFwJTVu56zLT-4fbAsLWqRI7TjEswJ_y2-6NlEOyTTxFblzlZLYFh7urRx2Wra_gdP0-uUxoZydZBzbiPetcYmGo9b1B69-Pmb7akK7aqLUN03mvC3t1bn4u0ZvJGWjBmqhv81QoP3J1u-_Xh p34_dhspsjDpgfxYcHTI3e3yAir_QQ","timestamp":"2017-10-11T23:40:13.436Z"},{"token":"03AJzQf7Mmj_KZVl39Ob1_YnvsJuj4vFS o8ZWXNUJjSEjZqNyH8puB035sZYbQdPLVdOoX8ljyGeDYvxk6Kkf3Sc16EAS0bg0cXUAXzs6LAr3jDZmtW38TjWN5qbykIN_-s0-YpX6F0XJ4Hw3GDl vVkxmAk1btZewbeUp1nwMeM9BGJxkJZ5_2LRCGTERPGICKU4P897_FYcduADw5j1wEd9Yp7TdczRkgkY3qpsNcxlrF_rXv7DAvUxkg2_fussc3RkRgq ZueTMPkDN7B5BYiTBqVeXJ48Lvm6-1R86HgyrcDAPaZ1xMY0JxzMSvU26rChpomXFLERLfxijDNrixfGeh8hSCa0dX1HiLac8yOERKRcbBk1kXLOK8" ,"timestamp":"2017-10-11T23:40:29.916Z"}]"""
I would like to parse this string and put the results into a list. I only want the token values to be stored in the list.
My current code (not working):
token = re.search('(?<="token":").*?"', text)
print(token.group(0))
print(token.group(1))
the output:
03AJzQf7P5tfAY0T8yGDlF_aoPkLgz9-F7aiXzvViQqaaRmcJeFuIq96vmLoPXVK1GW-Fs8xp6OmJWvFvyNa3ayMpvaLkb6RsVkjjWBjqVIW4ziWeHk--Vrd8zVaA-Pt8VxMdoDBYxjRRrCNdeQN-Fk_-Wywb5XceJGdPJbMDZ-BoOB8l3Gq4bFwJTVu56zLT-4fbAsLWqRI7TjEswJ_y2-6NlEOyTTxFblzlZLYFh7urRx2Wra_gdP0-uUxoZydZBzbiPetcYmGo9b1B69-Pmb7akK7aqLUN03mvC3t1bn4u0ZvJGWjBmqhv81QoP3J1u-_Xhp34_dhspsjDpgfxYcHTI3e3yAir_QQ"
Error for token.group(1):
Traceback (most recent call last):
File "main.py", line 45, in <module>
print(token.group(1))
IndexError: no such group

Your text is a JSON string. You can use json.loads to convert it from a JSON string to a list of dicts.
import json
text="""[{"token":"03AJzQf7P5tfAY0T8yGDlF_aoPkLgz9-F7aiXzvViQqaaRmcJeFuIq96vmLoPXVK1GW-Fs8xp6OmJWvFvyNa3ayMpvaLkb6R sVkjjWBjqVIW4ziWeHk--Vrd8zVaA-Pt8VxMdoDBYxjRRrCNdeQN-Fk_-Wywb5XceJGdPJbMDZ-BoOB8l3Gq4bFwJTVu56zLT-4fbAsLWqRI7TjEswJ_y2-6NlEOyTTxFblzlZLYFh7urRx2Wra_gdP0-uUxoZydZBzbiPetcYmGo9b1B69-Pmb7akK7aqLUN03mvC3t1bn4u0ZvJGWjBmqhv81QoP3J1u-_Xh p34_dhspsjDpgfxYcHTI3e3yAir_QQ","timestamp":"2017-10-11T23:40:13.436Z"},{"token":"03AJzQf7Mmj_KZVl39Ob1_YnvsJuj4vFS o8ZWXNUJjSEjZqNyH8puB035sZYbQdPLVdOoX8ljyGeDYvxk6Kkf3Sc16EAS0bg0cXUAXzs6LAr3jDZmtW38TjWN5qbykIN_-s0-YpX6F0XJ4Hw3GDl vVkxmAk1btZewbeUp1nwMeM9BGJxkJZ5_2LRCGTERPGICKU4P897_FYcduADw5j1wEd9Yp7TdczRkgkY3qpsNcxlrF_rXv7DAvUxkg2_fussc3RkRgq ZueTMPkDN7B5BYiTBqVeXJ48Lvm6-1R86HgyrcDAPaZ1xMY0JxzMSvU26rChpomXFLERLfxijDNrixfGeh8hSCa0dX1HiLac8yOERKRcbBk1kXLOK8" ,"timestamp":"2017-10-11T23:40:29.916Z"}]"""
mylist = json.loads(text)
token = ' '.join(mylist[0]['token'].split()).split()
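The snippet above reads only the first entry of the list. As a minimal sketch of collecting every token value, the following uses a shortened, made-up sample with the same shape as the text in the question (the token values here are placeholders, not the real ones). It also shows a regex alternative: the pattern in the question has no capturing group, which is why token.group(1) raises IndexError, whereas re.findall with one capturing group returns just the captured part of each match.

```python
import json
import re

# Shortened, made-up sample with the same shape as the text in the question.
text = ('[{"token":"abc-123","timestamp":"2017-10-11T23:40:13.436Z"},'
        '{"token":"def_456","timestamp":"2017-10-11T23:40:29.916Z"}]')

# Preferred: parse the JSON and pick the "token" field from each dict.
tokens = [item["token"] for item in json.loads(text)]

# Regex alternative: the parentheses form group 1, so findall returns
# only the token text for every match in the string.
tokens_re = re.findall(r'"token"\s*:\s*"(.*?)"', text)

print(tokens)
print(tokens_re)
```

Parsing with json is generally the safer choice here, since it handles escaping inside the strings that a hand-written pattern would miss.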

Related

python sqlite3 .executemany() with named placeholders?

This works:
ss = 'insert into images (file_path) values(?);'
dddd = (('dd1',), ('dd2',))
conn.executemany(ss, dddd)
However this does not:
s = 'insert into images (file_path) values (:v)'
ddddd = ({':v': 'dd11'}, {':v': 'dd22'})
conn.executemany(s, ddddd)
Traceback (most recent call last):
File "/Users/Wes/.virtualenvs/ppyy/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3035, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-31-a999de59f73b>", line 1, in <module>
conn.executemany(s, ddddd)
ProgrammingError: You did not supply a value for binding 1.
I am wondering if it is possible to use named parameters with executemany and, if so, how.
The documentation at section 11.13.3 talks generally about parameters but doesn't discuss the two styles of parameters that are described for other flavors of .executexxx().
I have checked out Python sqlite3 execute with both named and qmark parameters which does not pertain to executemany.

The source shows that execute() simply constructs a one-element list and calls executemany(), so the problem is not with executemany() itself; the same call fails with execute():
>>> conn.execute('SELECT :v', {':v': 42})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: You did not supply a value for binding 1.
As shown in the Python documentation, named parameters do not include the colon:
# And this is the named style:
cur.execute("select * from people where name_last=:who and age=:age", {"who": who, "age": age})
So you have to use ddddd = ({'v': 'dd11'}, {'v': 'dd22'}).

The : isn't part of the parameter name.
>>> s = 'insert into images (file_path) values (:v)'
>>> ddddd = ({'v': 'dd11'}, {'v': 'dd22'})
>>> conn.executemany(s, ddddd)
<sqlite3.Cursor object at 0x0000000002C0E500>
>>> conn.execute('select * from images').fetchall()
[(u'dd11',), (u'dd22',)]
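For reference, here is a self-contained sketch of the working call against an in-memory database. The table definition is an assumption (the question does not show the schema of images); the placeholder name v matches the dictionary key without the colon.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
# Assumed schema; the original post does not show how images was created.
conn.execute("create table images (file_path text)")

# Dictionary keys are the placeholder names WITHOUT the leading colon.
rows = ({"v": "dd11"}, {"v": "dd22"})
conn.executemany("insert into images (file_path) values (:v)", rows)

result = conn.execute(
    "select file_path from images order by file_path"
).fetchall()
print(result)
conn.close()
```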

tornado set_cookie array support

Does Tornado support array cookies?
In PHP, we set an array cookie like this:
setcookie('UserTable[Name]','Tinywan',time()+3600);
setcookie('UserTable[Age]','24',time()+3600);
setcookie('UserTable[Email]','7514#xxx.com',time()+3600);
But in Tornado, it doesn't work!
self.set_cookie('UserTable[Name]', 'Tinywan', expires_days=30)
self.set_cookie('UserTable[Age]', '24', expires_days=30)
self.set_cookie('UserTable[Email]', '7514#xxx.com', expires_days=30)

This happens because Python's http.cookies module raises an error when an illegal character is present in the key:
from http import cookies
C = cookies.SimpleCookie()
C['UserTable[Name]'] = 'Tinywan'
Traceback
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/http/cookies.py", line 521, in __setitem__
self.__set(key, rval, cval)
File "/usr/local/lib/python3.6/http/cookies.py", line 511, in __set
M.set(key, real_value, coded_value)
File "/usr/local/lib/python3.6/http/cookies.py", line 380, in set
raise CookieError('Illegal key %r' % (key,))
http.cookies.CookieError: Illegal key 'UserTable[Name]'
>>> C['UserTableName'] = 'Tinywan'

You're using illegal characters in the cookie name: the [ and ] characters are not allowed.
From MDN documentation:
A <cookie-name> can be any US-ASCII characters except control characters (CTLs), spaces, or tabs. It also must not contain a separator character like the following: ( ) < > # , ; : \ " / [ ] ? = { }.
You don't really need these characters. UserTableName should work fine.
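A small sketch of that workaround using only the standard library: it confirms that the bracketed key is rejected and then stores the same fields under flat names. The UserTable_ prefix is just an illustrative naming scheme, not anything Tornado or PHP requires; in a Tornado handler the same flat names would presumably be passed to self.set_cookie as in the question.

```python
from http import cookies

jar = cookies.SimpleCookie()

# The bracketed, PHP-style key is refused by http.cookies.
try:
    jar["UserTable[Name]"] = "Tinywan"
    rejected = False
except cookies.CookieError:
    rejected = True

# Flat names avoid the illegal characters. The "UserTable_" prefix is
# a hypothetical convention chosen for this example.
user = {"Name": "Tinywan", "Age": "24"}
for field, value in user.items():
    jar["UserTable_" + field] = value

print(rejected)
print(sorted(jar.keys()))
```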

Regex & BeautifulSoup - TypeError: expected string or bytes-like object

My code is running into an unexpected error. I tried using a 'u' prefix instead of 'r' on the patterns, but I still get the same error. I tried other posted solutions, but they didn't get me anywhere. Any suggestions?
# use urllib and BeautifulSoup to scrape the table
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pandas as pd
url = 'https://www.example.com/profiles'
page = urlopen(url).read()
soup = BeautifulSoup(page, 'lxml')
#print(soup)
reEngName = re.compile(r'\[\*\*.+\*\*\]')
reKorName = re.compile(r'\([^\/h]*\)')
reProfile = re.compile(r'\|.+')
for line in re.findall(reEngName, soup):
    print(line)
Error message:
Traceback (most recent call last):
File "ckurllib.py", line 18, in <module>
for line in re.findall(reEngName, soup):
File "C:\Users\Sammy\Anaconda3\lib\re.py", line 222, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

Regex works with strings. If you want to search the whole raw text of the page, pass the page text itself (not the soup object) to the regex. BeautifulSoup is a parser: it internally splits the HTML into its syntactic components, organized into a tree that you can iterate through. For example, to iterate over all <a> tags:
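soup = BeautifulSoup(urlopen(url).read(), 'lxml')
for a in soup('a'):
    out = doThings(a)
and inside doThings(a):
    # a.get avoids a KeyError on <a> tags that have no href attribute
    if a.get('href', '').startswith("http://www.domain.net"):
        pass  # handle the matching link here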
Naturally, at a later stage you can use regexes to check for matches in strings.
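As a minimal sketch of the first suggestion, the example below applies the question's reEngName pattern to page text instead of the soup object. The page bytes are a made-up stand-in for what urlopen(url).read() returns; since that call returns bytes in Python 3, the data is decoded before a str pattern is applied. With a BeautifulSoup object at hand, str(soup) or soup.get_text() would give a string in the same way.

```python
import re

# Hypothetical stand-in for urlopen(url).read(), which returns bytes.
page = b"<p>[**Alice Kim**] (sample) | profile text</p>\n<p>[**Bob Lee**]</p>"

# Same pattern as reEngName in the question.
re_eng_name = re.compile(r'\[\*\*.+\*\*\]')

# Decode to str first: a str pattern cannot be used on bytes, and the
# soup object itself is neither str nor bytes.
html = page.decode("utf-8")
names = re_eng_name.findall(html)

for line in names:
    print(line)
```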

Getting ParseError when parsing using xml.etree.ElementTree

I am trying to extract the <comment> tag (using xml.etree.ElementTree) from the XML and find the comment count number and add all of the numbers. I am reading the file via a URL using urllib package.
sample data: http://python-data.dr-chuck.net/comments_42.xml
But currently I am just trying to print the name and count.
import urllib
import xml.etree.ElementTree as ET
serviceurl = 'http://python-data.dr-chuck.net/comments_42.xml'
address = raw_input("Enter location: ")
url = serviceurl + urllib.urlencode({'sensor': 'false', 'address': address})
print ("Retrieving: ", url)
link = urllib.urlopen(url)
data = link.read()
print("Retrieved ", len(data), "characters")
tree = ET.fromstring(data)
tags = tree.findall('.//comment')
for tag in tags:
    Name = ''
    count = ''
    Name = tree.find('commentinfo').find('comments').find('comment').find('name').text
    count = tree.find('comments').find('comments').find('comment').find('count').number
    print Name, count
Unfortunately, I am not even able to parse the XML file in Python, because I am getting the following error:
Traceback (most recent call last):
File "ch13_parseXML_assignment.py", line 14, in <module>
tree = ET.fromstring(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
parser.feed(text)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
self._raiseerror(v)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 49
I have read that in a similar situation the parser may not be accepting the XML file. Anticipating this, I put a try/except around tree = ET.fromstring(data) and was able to get past this line, but later it throws an error saying the tree variable is not defined. This defeats the purpose of the output I am expecting.
Can somebody please point me in the right direction?
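No answer is recorded for this question here, so the following is only a sketch under stated assumptions. One possible cause of the ParseError is that the posted code appends the urlencode output directly to the .xml URL, so the response may not be the XML file at all; that is a guess, not something the post confirms. The sample document below is made up, and its layout (commentinfo / comments / comment with name and count children) is assumed from the element names used in the question.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the real file's layout is assumed to be similar.
data = """<commentinfo>
  <comments>
    <comment><name>Alice</name><count>97</count></comment>
    <comment><name>Bob</name><count>3</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(data)

total = 0
# './/comment' finds every <comment> regardless of which element is the root.
for comment in root.findall(".//comment"):
    name = comment.find("name").text
    # Element has .text, not .number; convert the text to int before summing.
    count = int(comment.find("count").text)
    print(name, count)
    total += count

print("Sum:", total)
```

Reading the fields from each comment element inside the loop, rather than from tree, is what makes every iteration print a different name and count.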

How to include [] in python regex code?

I am using Python 2.7.8 to write a small program that reads a rule of the form A ==> B using regex and returns it in the form 'A, B'.
This is my code:
import re
def fixp1(s):
    pattern = re.compile("(?P<g1>([A-Z0-9a-z]|\?)*):(?P<g2>([A-Z0-9a-z]|\?)*)")
    return eval(pattern.sub("('\g<g1>', '\g<g2>')", s))
x = "[ABCD:NP, [PQR:?TAG1]] ==> [XXX:?P]"
def readrule(r):
    r.split("==>")
    return [fixp1(r[0].strip()), fixp1(r[1].strip())]
When I test this code:
>>> readrule(x)
I got the following error message:
readrule(y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../patterns.py", line 12, in readrule
return [fixp1(r[0].strip()), fixp1(r[1].strip())]
File ".../patterns.py", line 5, in fixp1
return eval(pattern.sub("('\g<g1>', '\g<g2>')", s))
File "<string>", line 1
[
^
SyntaxError: unexpected EOF while parsing
>>>
I think this problem happened because I couldn't add '[' and ']' here:
([A-Z0-9a-z]|\?)
If that's right, how do I do it? If not, where is my mistake?

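Remove the eval call. pattern.sub returns a string, which is your input with the replacements applied, and that string is not meant to be evaluated as Python code. Note also that the result of r.split("==>") is discarded in readrule, so r[0] is just the first character of the rule, '['; eval('[') is what raises the SyntaxError you are seeing.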
If you want to include [] in your patterns, you need to escape them with \:
pattern = re.compile(r'[\[\]0-9]+')
would match strings like '[1234]'.
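Putting those points together, here is one possible sketch of the two functions from the question without eval. It keeps the result of split and returns the rewritten strings as they are; whether strings are the desired final form is an assumption, since the question does not say what the caller does with the result. The character class is a compact equivalent of the original ([A-Z0-9a-z]|\?) alternation.

```python
import re

# Raw string so the backslashes reach the regex engine unchanged.
PAIR = re.compile(r"(?P<g1>[A-Za-z0-9?]*):(?P<g2>[A-Za-z0-9?]*)")


def fixp1(s):
    # Return the rewritten string instead of passing it to eval.
    return PAIR.sub(r"('\g<g1>', '\g<g2>')", s)


def readrule(r):
    # str.split returns a new list; the original code discarded it.
    left, right = r.split("==>")
    return [fixp1(left.strip()), fixp1(right.strip())]


x = "[ABCD:NP, [PQR:?TAG1]] ==> [XXX:?P]"
rule = readrule(x)
print(rule)
```

The square brackets in the input pass through untouched because the pattern only rewrites the A:B pairs, so they do not need to be added to the character class for this to work.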
