I am trying to extract the <comment> tag (using xml.etree.ElementTree) from the XML and find the comment count number and add all of the numbers. I am reading the file via a URL using urllib package.
sample data: http://python-data.dr-chuck.net/comments_42.xml
But currently i am trying to trying to print the name, and count.
import urllib
import xml.etree.ElementTree as ET
serviceurl = 'http://python-data.dr-chuck.net/comments_42.xml'
address = raw_input("Enter location: ")
url = serviceurl + urllib.urlencode({'sensor': 'false', 'address': address})
print ("Retrieving: ", url)
link = urllib.urlopen(url)
data = link.read()
print("Retrieved ", len(data), "characters")
tree = ET.fromstring(data)
tags = tree.findall('.//comment')
for tag in tags:
Name = ''
count = ''
Name = tree.find('commentinfo').find('comments').find('comment').find('name').text
count = tree.find('comments').find('comments').find('comment').find('count').number
print Name, count
Unfortunately, I am not able to even parse the XML file into Python, because i am getting this error as follows:
Traceback (most recent call last):
File "ch13_parseXML_assignment.py", line 14, in <module>
tree = ET.fromstring(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
parser.feed(text)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
self._raiseerror(v)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 49
I have read previously in a similar situation that maybe the parser isn't accepting the XML file. Anticipating this, i did a Try and Except around tree = ET.fromstring(data) and I was able to get past this line, but later it is throwing an erro saying tree variable is not defined. This defeats the purpose of the output I am expecting.
Can somebody please point me in a direction that helps me?
Related
I am getting a feedback error message that I can't seem to resolve. I have a csv file that I am trying to read and generate pdf files based on the county they fall in. If there is only one map in that county then I do not need to append the files (code TBD once this hurdle is resolved as I am sure I will run into the same issue with the code when using pyPDF2) and want to simply copy the map to a new directory with a new name. The shutil.copyfile does not seem to recognize the path as valid for County3 which meets the condition to execute this command.
Map.csv file
County Maps
County1 C:\maps\map1.pdf
County1 C:\maps\map2.pdf
County2 C:\maps\map1.pdf
County2 C:\maps\map3.pdf
County3 C:\maps\map3.pdf
County4 C:\maps\map2.pdf
County4 C:\maps\map3.pdf
County4 C:\maps\map4.pdf
My code:
import csv, os
import shutil
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter
merged_file = PdfFileMerger()
counties = {}
with open(r'C:\maps\Maps.csv') as csvfile:
reader = csv.reader(csvfile, delimiter=",")
for n, row in enumerate(reader):
if not n:
continue
county, location = row
if county not in counties:
counties[county] = list()
counties[county].append((location))
for k, v in counties.items():
newPdfFile = ('C:\maps\Maps\JoinedMaps\County-' + k +'.pdf')
if len(str(v).split(',')) > 1:
print newPdfFile
else:
shutil.copyfile(str(v),newPdfFile)
print 'v: ' + str(v)
Feedback message:
C:\maps\Maps\JoinedMaps\County-County4.pdf
C:\maps\Maps\JoinedMaps\County-County1.pdf
v: ['C:\\maps\\map3.pdf']
Traceback (most recent call last):
File "<module2>", line 22, in <module>
File "C:\Python27\ArcGIS10.5\lib\shutil.py", line 82, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 22] invalid mode ('rb') or filename: "['C:\\\\maps\\\\map3.pdf']"
There are no blank lines in the csv file. In the csv file I tried changing the back slashes to forward slashes, double slashes, etc. I still get the error message. Is it because data is returned in brackets? If so, how do I strip these?
You are actually trying to create the file ['C:\maps\map3.pdf'], you can tell this because the error messages shows the filename its trying to create:
IOError: [Errno 22] invalid mode ('rb') or filename: "['C:\\\\maps\\\\map3.pdf']"
This value comes from the fact that you are converting to string, the value of the dictionary key, which is a list here:
shutil.copyfile(str(v),newPdfFile)
What you need to do is check if the list has more than one member or not, then step through each member of the list (the v) and copy the file.
for k, v in counties.items():
newPdfFile = (r'C:\maps\Maps\JoinedMaps\County-' + k +'.pdf')
if len(v) > 1:
print newPdfFile
else:
for filename in v:
shutil.copyfile(filename, newPdfFile)
print('v: {}'.format(filename))
My code is running into some unexpedt error. Tried to tweak with have 'u' instead of 'r', but still get same error. Tried other solutions from stacks, but didn't go anywhere. Any suggestion?
#use urlib and beautifulsoup to scrpe table
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pandas as pd
url = 'https://www.example.com/profiles'
page = urlopen(url).read()
soup = BeautifulSoup(page, 'lxml')
#print(soup)
reEngName = re.compile(r'\[\*\*.+\*\*\]')
reKorName = re.compile(r'\([^\/h]*\)')
reProfile = re.compile(r'\|.+')
for line in re.findall(reEngName, soup):
print(line)
Error message:
Traceback (most recent call last):
File "ckurllib.py", line 18, in <module>
for line in re.findall(reEngName, soup):
File "C:\Users\Sammy\Anaconda3\lib\re.py", line 222, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
Regex works with strings. If you want to search whole raw text of file, give the page to regex. Soap is a parser, that internally splits html into its syntactic components, organized into a tree, you can iterate through them. For example, to iterate all <a> tags:
soup = BeautifulSoup.BeautifulSoup(urllib2.urlopen(url).read())
for a in soup('a'):
out = doThings(a)
in doThings(a):
if a['href'].startswith("http:///www.domain.net"):
Naturally, in latter stage you can use regexes to check for matches in strings.
I am using tweepy twitter api for python, while using it i got some error,
I am not able to use send_direct_message(user/screen_name/user_id, text) this method
Here is my code:-
import tweepy
consumer_key='XXXXXXXXXXXXXXXXX'
consumer_secret='XXXXXXXXXXXXXXXXX'
access_token='XXXXXXXXXXXXXXXXX'
access_token_secret='XXXXXXXXXXXXXXXXX'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
API = tweepy.API(auth)
user = API.get_user('SSPendse')
screen_name="CJ0495"
id = 773436067231956992
text="Message Which we have to send must have maximum 140 characters, If it is Greater then the message will be truncated upto 140 characters...."
# re = API.send_direct_message(screen_name, text)
re = API.send_direct_message(id, text)
print re
got following error:-
Traceback (most recent call last):
File "tweetApi.py", line 36, in <module>
re = API.send_direct_message(id, text)
File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 245, in _call
return method.execute()
File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 229, in execute
raise TweepError(error_msg, resp, api_code=api_error_code)
tweepy.error.TweepError: [{u'message': u'Text parameter is missing.', u'code': 38}]
What will be mistake done by me...???
I also have another problem related to tweepy, How can I moved to page two or got more followers in following code
i=1
user = API.get_user('Apple')
followers = API.followers(user.id,-1)
for follower in followers:
print follower,'\t',i
i=i+1
if I run the code I got only 5000 followers however if I use user.followers_count it gives 362705 followers (This no. may be changes while You check it) How can I see remaining followers
Thank You... :)
To solve your first error, replace re = API.send_direct_message(id, text) with re = API.send_direct_message(id, text=text). This function only works if you give it the message as a named parameter. The parameter name you need here is "text", so you might want to change your variable name to avoid confusion. Also, I just tried it, since you are sending a direct message, not a tweet, it will not be truncated to only the first 140 characters.
For your second question, this should do the trick, as explained here:
followers = []
for page in tweepy.Cursor(API.followers, screen_name='Apple').pages():
followers.extend(page)
time.sleep(60) #this is here to slow down your requests and prevent you from hitting the rate limit and encountering a new error
print(followers)
Running the below script works for 60% of the entries from the MasterGroupList however suddenly fails with the below error. although my questions seem to be poor ou guys have been able to help me before. Any idea how I can avoid getting this error? or what is trhoughing off the script? The masterGroupList looks like:
Groups Pulled from AD
SET00 POWERUSER
SET00 USERS
SEF00 CREATORS
SEF00 USERS
...another 300 entries...
Error:
Traceback (most recent call last):
File "C:\Users\ks185278\OneDrive - NCR Corporation\Active Directory Access Scr
ipt\test.py", line 44, in <module>
print group.member
File "C:\Python27\lib\site-packages\active_directory.py", line 805, in __getat
tr__
raise AttributeError
AttributeError
Code:
from active_directory import *
import os
file = open("C:\Users\NAME\Active Directory Access Script\MasterGroupList.txt", "r")
fileAsList = file.readlines()
indexOfTitle = fileAsList.index("Groups Pulled from AD\n")
i = indexOfTitle + 1
while i <= len(fileAsList):
fileLocation = 'C:\\AD Access\\%s\\%s.txt' % (fileAsList[i][:5], fileAsList[i][:fileAsList[i].find("\n")])
#Creates the dir if it does not exist already
if not os.path.isdir(os.path.dirname(fileLocation)):
os.makedirs(os.path.dirname(fileLocation))
fileGroup = open(fileLocation, "w+")
#writes group members to the open file
group = find_group(fileAsList[i][:fileAsList[i].find("\n")])
print group.member
for group_member in group.member: #this is line 44
fileGroup.write(group_member.cn + "\n")
fileGroup.close()
i+=1
Disclaimer: I don't know python, but I know Active Directory fairly well.
If it's failing on this:
for group_member in group.member:
It could possibly mean that the group has no members.
Depending on how phython handles this, it could also mean that the group has only one member and group.member is a plain string rather than an array.
What does print group.member show?
The source code of active_directory.py is here: https://github.com/tjguk/active_directory/blob/master/active_directory.py
These are the relevant lines:
if name not in self._delegate_map:
try:
attr = getattr(self.com_object, name)
except AttributeError:
try:
attr = self.com_object.Get(name)
except:
raise AttributeError
So it looks like it just can't find the attribute you're looking up, which in this case looks like the 'member' attribute.
Im trying to use ElementTree to get data from a .config file. The structure of this file is like this for example:
<userSettings>
<AutotaskUpdateTicketEstimatedHours.My.MySettings>
<setting name="Username" serializeAs="String">
<value>AAA</value>
</setting>
My code is this:
import os, sys
import xml.etree.ElementTree as ET
class Init():
script_dir = os.path.dirname(__file__)
rel_path = "app.config"
abs_file_path = os.path.join(script_dir, rel_path)
tree = ET.parse(abs_file_path)
root = tree.getroot()
sites = root.iter('userSettings')
for site in sites:
apps = site.findall('AutotaskUpdateTicketEstimatedHours.My.MySettings')
for app in apps:
print(''.join([site.get('Username'), app.get('value')]))
if __name__ == '__main__':
handler = Init()
However, when I run this code I get:
Traceback (most recent call last):
File "/Users/AAAA/Documents/Aptana/AutotaskUpdateTicketEstimatedHours/Main.py", line 5, in <module>
class Init():
File "/Users/AAA/Documents/Aptana/AutotaskUpdateTicketEstimatedHours/Main.py", line 16, in Init
print(''.join([site.get('Username'), app.get('value')]))
TypeError: sequence item 0: expected string, NoneType found
What I'm I doing wrong the causes this error?
(My problem seems to be accessing the tree structure of my config.file correctly)
You may change your code to:
print(''.join([app.get('name'), app.find('value').text]))
app is an Element Object in this case <setting>. Using the get function you will get an attribute value by name (e.g. name, serializeAs), using the find
function you will get a subelement (e.g <value>).
Once you have <value> you can get the data inside with text
Note that site (<AutotaskUpdateTicketEstimatedHours.My.MySettings>) doesn't have any attributes, therefore you get None.