Read attribute data from an XML file in Python - python-2.7

I am trying to read data from an XML file at a URL using the requests module in Python:
import requests
from requests.auth import HTTPBasicAuth
import xml.etree.ElementTree as et
url ="https://sample.com/simple.xml"
response = requests.get(url,auth=HTTPBasicAuth(username,password))
xml_data = et.fromstring(response.text)
The error I am getting is:
Traceback (most recent call last):
File "C:\Python27\myfolder\Artifactory.py", line 156, in <module>
xml_data = et.fromstring(xml_response.text)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1311, in XML
parser.feed(text)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1657, in feed
self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 8419: ordinal not in range(128)
So I changed the code to xml_data = et.parse(response.text)
then the error is:
Traceback (most recent call last):
File "C:\Python27\myfolder\Artifactory.py", line 156, in <module>
xml_data = et.parse(xml_response.text)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse
tree.parse(source, parser)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 647, in parse
source = open(source, "rb")
IOError: [Errno 2] No such file or directory: u'<?xml version="1.0" encoding="utf-8"?>
After this error, the XML data itself gets printed.
Please help me with this issue.

et.parse requires a file path (not the contents).
You need to encode the response to UTF-8:
xml_data = et.fromstring(response.text.encode('utf-8'))
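For completeness, a minimal sketch of pulling attribute values out of the parsed tree; the tag name 'item' and the attribute name 'name' are hypothetical placeholders for whatever your feed actually contains:
import requests
from requests.auth import HTTPBasicAuth
import xml.etree.ElementTree as et

username, password = 'user', 'secret'  # placeholders for your credentials
url = "https://sample.com/simple.xml"
response = requests.get(url, auth=HTTPBasicAuth(username, password))

# encode the unicode response to a UTF-8 byte string before parsing
root = et.fromstring(response.text.encode('utf-8'))

# every Element exposes its attributes as the dict .attrib;
# .get() returns None instead of raising if an attribute is missing
for item in root.iter('item'):   # 'item' is a hypothetical tag name
    print item.get('name')       # 'name' is a hypothetical attribute name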

The first attempt looks like a default-encoding issue in Python 2.
Try adding this to your code between your last import and your url variable:
import sys
reload(sys)
sys.setdefaultencoding("utf8")
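For context, here is the failure in isolation; u'\xae' is the registered-trademark character from the traceback, which the default ascii codec cannot represent:
>>> text = u'\xae'
>>> text.encode('utf-8')
'\xc2\xae'
>>> text.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 0: ordinal not in range(128)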

Related

UnicodeEncodeError: 'ascii' codec can't encode characters in position 62-11168: ordinal not in range(128)

Help me figure out what's wrong with this. I am running text summarization using Transformers:
~/Bart_T5-summarization$ python app.py
No handlers could be found for logger "transformers.data.metrics"
Traceback (most recent call last):
File "app.py", line 6, in
from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
File "/home/darshan/.local/lib/python2.7/site-packages/transformers/init.py", line 42, in
from .tokenization_auto import AutoTokenizer
File "/home/darshan/.local/lib/python2.7/site-packages/transformers/tokenization_auto.py", line 28, in
from .tokenization_xlm import XLMTokenizer
File "/home/darshan/.local/lib/python2.7/site-packages/transformers/tokenization_xlm.py", line 27, in
import sacremoses as sm
File "/home/darshan/.local/lib/python2.7/site-packages/sacremoses/init.py", line 2, in
from sacremoses.tokenize import *
File "/home/darshan/.local/lib/python2.7/site-packages/sacremoses/tokenize.py", line 16, in
class MosesTokenizer(object):
File "/home/darshan/.local/lib/python2.7/site-packages/sacremoses/tokenize.py", line 41, in MosesTokenizer
PAD_NOT_ISALNUM = r"([^{}\s.'`\,-])".format(IsAlnum), r" \1 "
UnicodeEncodeError: 'ascii' codec can't encode characters in position 62-11168: ordinal not in range(128)
Running the command with python3 instead of python solved this issue for me. I was able to run the code and obtain a summarization.
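That fits the traceback: under Python 2, calling str.format on a byte-string regex pattern with a non-ASCII unicode argument (the line in sacremoses' tokenize.py above) forces an implicit ASCII encode, which is consistent with the UnicodeEncodeError shown. transformers and sacremoses target Python 3, so:
python3 app.py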

Google Vision Python 2.7 TypeError: construct_settings() got an unexpected keyword argument 'metrics_headers'

After installing the required packages using pip, downloading a JSON key, and setting the environment variable in the cmd window with set GOOGLE_APPLICATION_CREDENTIALS = 'C:\Users\ xxx .json', I followed the instructions to use the Google Vision API on https://googlecloudplatform.github.io/google-cloud-python/stable/vision-usage.html#authentication-and-configuration
I tried the following and got the error below without any idea how to solve it, so all suggestions are much appreciated:
>>> from google.cloud import vision
>>> client =vision.Client()
>>> print client
<google.cloud.vision.client.Client object at 0x08D414F0>
>>> image = client.image(filename='test2.jpg')
>>> print image
<google.cloud.vision.image.Image object at 0x0CBF68F0>
>>> text = image.detect_text()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\google\cloud\vision\image.py", line 289, in detect_text
annotations = self.detect(features)
File "C:\Python27\lib\site-packages\google\cloud\vision\image.py", line 143, in detect
return self._detect_annotation(images)
File "C:\Python27\lib\site-packages\google\cloud\vision\image.py", line 117, in _detect_annotation
return self.client._vision_api.annotate(images)
File "C:\Python27\lib\site-packages\google\cloud\vision\client.py", line 114, in _vision_api
self._vision_api_internal = _GAPICVisionAPI(self)
File "C:\Python27\lib\site-packages\google\cloud\vision\_gax.py", line 34, in __init__
lib_version=__version__)
File "C:\Python27\lib\site-packages\google\cloud\gapic\vision\v1\image_annotator_client.py", line 140, in __init__
metrics_headers=metrics_headers, )
TypeError: construct_settings() got an unexpected keyword argument 'metrics_headers'

Create a file containing '/' in the file name in Python

How can I create a file in Python if the filename contains '/'?
url='https://www.udacity.com/cs101x/index.html'
f=open(url,'w')
f.write('123')
f.close()
The above code produces this error:
Traceback (most recent call last):
File "9.py", line 2, in <module>
f=open(url,'w')
IOError: [Errno 22] invalid mode ('w') or filename: 'https://www.udacity.com/cs101x/index.html'
Use os.path.basename() to isolate the filename.
import os
url = 'https://www.udacity.com/cs101x/index.html'
filename = os.path.basename(url)  # 'index.html'
f = open(filename, 'w')
f.write('123')
f.close()
This will create a file called index.html.
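If you would rather keep the whole URL in the name, one alternative (my suggestion, not part of the answer above) is to replace the offending separators instead of discarding them:
url = 'https://www.udacity.com/cs101x/index.html'
# replace characters that are not legal in filenames with underscores
safe_name = url.replace('://', '_').replace('/', '_')
f = open(safe_name, 'w')  # creates 'https_www.udacity.com_cs101x_index.html'
f.write('123')
f.close()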

FTP a tar file using Python

I am trying to copy a file from a remote server to my local machine using FTP, and I have written a Python script for it:
import os
from ftplib import FTP
btsIp = raw_input("Enter the IP:")
try:
    ftp = FTP(btsIp, user='', passwd='', timeout=20)
except:
    print("FTP Connection failed")
    sys.exit(1)
ftp.cwd('/store/slv_imt')
filenames = ftp.nlst()
for eachFile in filenames:
    localName = os.path.join('D:\Users\shaik-s\Documents\SLV_logs', eachFile)
    file = open(localName, 'w')
    ftp.retrbinary('RETR' + eachFile, file.write)
    file.close()
ftp.quit()
However, I am getting an error saying the command was not understood:
slv_imt_2016-07-13__00-00-34.tar
Traceback (most recent call last):
File "C:/Python27/Scripts/ftp.py", line 16, in <module>
ftp.retrbinary('RETR' + eachFile,file.write)
File "C:\Python27\lib\ftplib.py", line 414, in retrbinary
conn = self.transfercmd(cmd, rest)
File "C:\Python27\lib\ftplib.py", line 376, in transfercmd
return self.ntransfercmd(cmd, rest)[0]
File "C:\Python27\lib\ftplib.py", line 339, in ntransfercmd
resp = self.sendcmd(cmd)
File "C:\Python27\lib\ftplib.py", line 249, in sendcmd
return self.getresp()
File "C:\Python27\lib\ftplib.py", line 224, in getresp
raise error_perm, resp
error_perm: 500 'RETRSLV_IMT_2016-07-13__00-00-34.TAR': command not understood.
Kindly help.
To fix the error, you need to add a space between 'RETR' and eachFile:
ftp.retrbinary('RETR %s' % eachFile, file.write)
But your code shows you may run into another problem: the local file should be opened in binary mode, since the transfer is binary.
file = open(localName, 'wb')
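Putting both fixes together, a minimal sketch of the corrected download loop (same paths and empty credentials as in the question):
import os
import sys
from ftplib import FTP

btsIp = raw_input("Enter the IP:")
try:
    ftp = FTP(btsIp, user='', passwd='', timeout=20)
except Exception:
    print("FTP Connection failed")
    sys.exit(1)

ftp.cwd('/store/slv_imt')
for eachFile in ftp.nlst():
    localName = os.path.join(r'D:\Users\shaik-s\Documents\SLV_logs', eachFile)
    f = open(localName, 'wb')                      # binary mode for tar files
    ftp.retrbinary('RETR %s' % eachFile, f.write)  # note the space after RETR
    f.close()
ftp.quit()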

NLTK python tokenizing a CSV file

I have begun to experiment with Python and NLTK.
I am getting a lengthy error message which I cannot find a solution to, and would appreciate any insights you may have.
import nltk,csv,numpy
from nltk import sent_tokenize, word_tokenize, pos_tag
reader = csv.reader(open('Medium_Edited.csv', 'rU'), delimiter=",", quotechar='|')
tokenData = nltk.word_tokenize(reader)
I'm running Python 2.7 and the latest nltk package on OSX Yosemite.
These are also two lines of code I attempted with no difference in results:
with open("Medium_Edited.csv", "rU") as csvfile:
    tokenData = nltk.word_tokenize(reader)
These are the error messages I see:
Traceback (most recent call last):
File "nltk_text.py", line 11, in <module>
tokenData = nltk.word_tokenize(reader)
File "/Library/Python/2.7/site-packages/nltk/tokenize/__init__.py", line 101, in word_tokenize
return [token for sent in sent_tokenize(text, language)
File "/Library/Python/2.7/site-packages/nltk/tokenize/__init__.py", line 86, in sent_tokenize
return tokenizer.tokenize(text)
File "/Library/Python/2.7/site-packages/nltk/tokenize/punkt.py", line 1226, in tokenize
return list(self.sentences_from_text(text, realign_boundaries))
File "/Library/Python/2.7/site-packages/nltk/tokenize/punkt.py", line 1274, in sentences_from_text
return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
File "/Library/Python/2.7/site-packages/nltk/tokenize/punkt.py", line 1265, in span_tokenize
return [(sl.start, sl.stop) for sl in slices]
File "/Library/Python/2.7/site-packages/nltk/tokenize/punkt.py", line 1304, in _realign_boundaries
for sl1, sl2 in _pair_iter(slices):
File "/Library/Python/2.7/site-packages/nltk/tokenize/punkt.py", line 310, in _pair_iter
prev = next(it)
File "/Library/Python/2.7/site-packages/nltk/tokenize/punkt.py", line 1278, in _slices_from_text
for match in self._lang_vars.period_context_re().finditer(text):
TypeError: expected string or buffer
Thanks in advance
As you can read in the Python csv documentation, csv.reader "returns a reader object which will iterate over lines in the given csvfile". In other words, if you want to tokenize the text in your csv file, you will have to go through the lines and the fields in those lines:
for line in reader:
    for field in line:
        tokens = word_tokenize(field)
Also, when you import word_tokenize at the beginning of your script, you should call it as word_tokenize, and not as nltk.word_tokenize. This also means you can drop the import nltk statement.
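Putting it together, a minimal end-to-end sketch using the question's filename and dialect options (the field contents are whatever your CSV actually holds):
import csv
from nltk import word_tokenize

with open('Medium_Edited.csv', 'rU') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for line in reader:
        for field in line:
            tokens = word_tokenize(field)  # each field is a string, as the tokenizer expects
            print tokens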
It is giving the error "expected string or buffer" because you have forgotten to wrap the reader with str:
tokenData = nltk.word_tokenize(str(reader))