ConvertAPI convertapi-python failing with error to convert PDF_PPT input file to PPTX output file

I followed the instructions for this API conversion service, but I get an error that is not covered in their documentation. Below is just a test to see if I can get it working for one PDF file. I have over 100 files to convert, so I would prefer a proven service to writing my own converter.
Here's the code I used:
# -*- coding: utf-8 -*-
''' Spyder Editor '''
import convertapi
path = r'C:\conversion_test'
filename = r'10-cleaning-data-in-python-folder_4-cleaning-data-for-analysis-folder_ch4_slides'
path_and_filename = path + filename
convertapi.api_secret = 'LoDZP1klb1farkdh'
convertapi.convert('pptx', {'File': path_and_filename }, from_format = 'pdf').save_files(path)
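Independently of the exception itself, note that path + filename in the snippet above concatenates the two strings with no separator and no extension, giving C:\conversion_test10-cleaning-data-in-python-... Whether this is what triggers the API error is not confirmed here. The sketch below only shows one way to build and check the input path before calling the service; build_input_path is a hypothetical helper, the folder and file names are shortened placeholders, and it assumes the file on disk ends in .pdf.

```python
import os


def build_input_path(folder, name, ext=".pdf"):
    """Join folder and file name with a separator and add the extension if missing."""
    if not name.lower().endswith(ext):
        name += ext
    return os.path.join(folder, name)


# Placeholder values, not the real folder or file from the question.
folder = "conversion_test"
name = "ch4_slides"

input_path = build_input_path(folder, name)
print(input_path)

# Checking the path locally gives a clearer failure than a remote API error.
if not os.path.isfile(input_path):
    print("Input file not found:", input_path)
```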
The error I am getting is: AttributeError: 'ApiError' object has no attribute 'message'
Here's the full stack trace.
  File "C:\ProgramData\Anaconda3\envs\convertapi\lib\site-packages\IPython\core\interactiveshell.py", line 3284, in run_code
    self.showtraceback(running_compiled_code=True)
  File "C:\ProgramData\Anaconda3\envs\convertapi\lib\site-packages\IPython\core\interactiveshell.py", line 2023, in showtraceback
    self._showtraceback(etype, value, stb)
  File "C:\ProgramData\Anaconda3\envs\convertapi\lib\site-packages\ipykernel\zmqshell.py", line 546, in _showtraceback
    u'evalue' : py3compat.safe_unicode(evalue),
  File "C:\ProgramData\Anaconda3\envs\convertapi\lib\site-packages\ipython_genutils\py3compat.py", line 65, in safe_unicode
    return unicode_type(e)
  File "C:\ProgramData\Anaconda3\envs\convertapi\lib\site-packages\convertapi\exceptions.py", line 14, in __str__
    message = "%s Code: %s. %s" % (self.message, self.code, self.invalid_parameters)
AttributeError: 'ApiError' object has no attribute 'message'
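The traceback suggests the AttributeError is raised while the original error is being turned into text, so the real API error is hidden behind it. Python 3 exceptions have no message attribute unless the class sets one. The sketch below reproduces that pattern with a hypothetical stand-in class (FakeApiError is not the real convertapi class; only the attribute names code and invalid_parameters are taken from the traceback) and shows one way to inspect such an exception without calling str() on it.

```python
class FakeApiError(Exception):
    """Hypothetical stand-in for the library's error class."""

    def __init__(self, code, invalid_parameters):
        super().__init__("conversion failed")
        self.code = code
        self.invalid_parameters = invalid_parameters

    def __str__(self):
        # self.message is never set, so this line fails on Python 3.
        return "%s Code: %s. %s" % (self.message, self.code, self.invalid_parameters)


def describe(exc):
    """Describe an exception from its args and attributes, bypassing __str__."""
    return "args=%r attributes=%r" % (exc.args, vars(exc))


try:
    raise FakeApiError(4000, {"File": ["hypothetical detail"]})
except FakeApiError as exc:
    try:
        text = str(exc)
    except AttributeError as inner:
        text = "str() failed: %s" % inner
    print(text)
    print(describe(exc))
```

Wrapping the convert call in a similar try/except and printing the exception's attributes may reveal the underlying error that the service returned.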

Related

Google Vision API 'TypeError: invalid file'

The following piece of code comes from Google's Vision API Documentation, the only modification I've made is adding the argument parser for the function at the bottom.
import argparse
import os
from google.cloud import vision
import io
def detect_text(path):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')
    for text in texts:
        print('\n"{}"'.format(text.description))
        vertices = (['({},{})'.format(vertex.x, vertex.y)
                    for vertex in text.bounding_poly.vertices])
        print('bounds: {}'.format(','.join(vertices)))
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str,
help="path to input image")
args = vars(ap.parse_args())
detect_text(args)
If I run it from a terminal like below, I get this invalid file error:
PS C:\VisionTest> python visionTest.py --image C:\VisionTest\test.png
Traceback (most recent call last):
  File "visionTest.py", line 31, in <module>
    detect_text(args)
  File "visionTest.py", line 10, in detect_text
    with io.open(path, 'rb') as image_file:
TypeError: invalid file: {'image': 'C:\\VisionTest\\test.png'}
I've tried with various images and image types as well as running the code from different locations with no success.
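The traceback shows that the file is never opened: path is the whole dictionary returned by vars(ap.parse_args()), {'image': 'C:\\VisionTest\\test.png'}, not a path string, so io.open raises TypeError before it touches the disk. Pass args['image'] to detect_text instead of args.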
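The value printed in the TypeError is a dictionary, which is what vars(ap.parse_args()) returns. A minimal sketch of extracting the path string from the parsed arguments follows, with the Vision call left out so that it runs without credentials; parse_image_path is a hypothetical helper name and the argument list is passed explicitly only for demonstration.

```python
import argparse


def parse_image_path(argv=None):
    """Parse the command line and return the image path as a plain string."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", type=str, required=True,
                    help="path to input image")
    args = vars(ap.parse_args(argv))  # a dict such as {'image': '...'}
    return args["image"]              # the string that io.open() expects


# Explicit argv for the demo; with argv=None argparse reads sys.argv.
image_path = parse_image_path(["--image", r"C:\VisionTest\test.png"])
print(type(image_path).__name__, image_path)
```

In the original script this corresponds to calling detect_text(args["image"]).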

Errno 22 when using shutil.copyfile on dictionary values in python

I am getting a feedback error message that I can't seem to resolve. I have a csv file that I am trying to read and generate pdf files based on the county they fall in. If there is only one map in that county then I do not need to append the files (code TBD once this hurdle is resolved as I am sure I will run into the same issue with the code when using pyPDF2) and want to simply copy the map to a new directory with a new name. The shutil.copyfile does not seem to recognize the path as valid for County3 which meets the condition to execute this command.
Map.csv file
County Maps
County1 C:\maps\map1.pdf
County1 C:\maps\map2.pdf
County2 C:\maps\map1.pdf
County2 C:\maps\map3.pdf
County3 C:\maps\map3.pdf
County4 C:\maps\map2.pdf
County4 C:\maps\map3.pdf
County4 C:\maps\map4.pdf
My code:
import csv, os
import shutil
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter
merged_file = PdfFileMerger()
counties = {}
with open(r'C:\maps\Maps.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=",")
    for n, row in enumerate(reader):
        if not n:
            continue
        county, location = row
        if county not in counties:
            counties[county] = list()
        counties[county].append((location))
for k, v in counties.items():
    newPdfFile = ('C:\maps\Maps\JoinedMaps\County-' + k +'.pdf')
    if len(str(v).split(',')) > 1:
        print newPdfFile
    else:
        shutil.copyfile(str(v),newPdfFile)
        print 'v: ' + str(v)
Feedback message:
C:\maps\Maps\JoinedMaps\County-County4.pdf
C:\maps\Maps\JoinedMaps\County-County1.pdf
v: ['C:\\maps\\map3.pdf']
Traceback (most recent call last):
  File "<module2>", line 22, in <module>
  File "C:\Python27\ArcGIS10.5\lib\shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 22] invalid mode ('rb') or filename: "['C:\\\\maps\\\\map3.pdf']"
There are no blank lines in the csv file. In the csv file I tried changing the back slashes to forward slashes, double slashes, etc. I still get the error message. Is it because data is returned in brackets? If so, how do I strip these?
You are actually trying to open a file literally named ['C:\maps\map3.pdf']. You can tell because the error message shows the filename it is trying to open:
IOError: [Errno 22] invalid mode ('rb') or filename: "['C:\\\\maps\\\\map3.pdf']"
This happens because you convert the dictionary value, which is a list here, to a string:
shutil.copyfile(str(v),newPdfFile)
What you need to do is check if the list has more than one member or not, then step through each member of the list (the v) and copy the file.
for k, v in counties.items():
    newPdfFile = (r'C:\maps\Maps\JoinedMaps\County-' + k +'.pdf')
    if len(v) > 1:
        print newPdfFile
    else:
        for filename in v:
            shutil.copyfile(filename, newPdfFile)
            print('v: {}'.format(filename))
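For readers on Python 3, the same idea can be sketched as follows: group the map paths per county and copy the file only for counties that have exactly one map. This is an illustration under assumptions, not the asker's final script; group_maps and copy_single_maps are hypothetical names, the CSV is assumed to be comma-separated with one header row, and the demo uses a temporary directory with a placeholder file instead of C:\maps.

```python
import csv
import io
import os
import shutil
import tempfile
from collections import defaultdict


def group_maps(csv_text):
    """Return {county: [map paths]} from comma-separated text with a header row."""
    counties = defaultdict(list)
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed rows
        county, location = row
        counties[county].append(location)
    return counties


def copy_single_maps(counties, out_dir):
    """Copy the map of every county that has exactly one map; return the new paths."""
    copied = []
    for county, maps in counties.items():
        if len(maps) == 1:
            target = os.path.join(out_dir, "County-" + county + ".pdf")
            shutil.copyfile(maps[0], target)  # maps[0] is a string, not a list
            copied.append(target)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "map3.pdf")
    with open(src, "wb") as fh:
        fh.write(b"%PDF-1.4 placeholder")
    csv_text = "County,Maps\nCounty3,{0}\nCounty4,{0}\nCounty4,{0}\n".format(src)
    out_dir = os.path.join(tmp, "JoinedMaps")
    os.makedirs(out_dir)
    copied = copy_single_maps(group_maps(csv_text), out_dir)
    print([os.path.basename(p) for p in copied])
```

Counties with several maps are skipped here; merging them is a separate step.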

Getting ParseError when parsing using xml.etree.ElementTree

I am trying to extract the <comment> tag (using xml.etree.ElementTree) from the XML and find the comment count number and add all of the numbers. I am reading the file via a URL using urllib package.
sample data: http://python-data.dr-chuck.net/comments_42.xml
For now, I am only trying to print the name and the count.
import urllib
import xml.etree.ElementTree as ET
serviceurl = 'http://python-data.dr-chuck.net/comments_42.xml'
address = raw_input("Enter location: ")
url = serviceurl + urllib.urlencode({'sensor': 'false', 'address': address})
print ("Retrieving: ", url)
link = urllib.urlopen(url)
data = link.read()
print("Retrieved ", len(data), "characters")
tree = ET.fromstring(data)
tags = tree.findall('.//comment')
for tag in tags:
    Name = ''
    count = ''
    Name = tree.find('commentinfo').find('comments').find('comment').find('name').text
    count = tree.find('comments').find('comments').find('comment').find('count').number
    print Name, count
Unfortunately, I am not able to even parse the XML file into Python, because i am getting this error as follows:
Traceback (most recent call last):
  File "ch13_parseXML_assignment.py", line 14, in <module>
    tree = ET.fromstring(data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 49
I have read that in similar situations the parser may simply not accept the XML. Anticipating this, I wrapped tree = ET.fromstring(data) in a try/except block and got past this line, but the script then fails later with an error saying the tree variable is not defined, so I still do not get the output I expect.
Can somebody please point me in a direction that helps me?
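One thing visible in the code above: serviceurl + urllib.urlencode(...) appends the query string directly to comments_42.xml with no "?", so the requested URL is not the sample file's URL. Whether the server then returns something other than XML is a guess; printing the start of data would confirm it. The sketch below, written for Python 3, shows the parsing and summing step on an inline string. The sample XML is made up to match the tag names used in the question, and sum_comment_counts and try_parse are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Made-up sample whose structure is assumed from the tag names in the question.
SAMPLE_XML = """<commentinfo>
  <comments>
    <comment><name>Alice</name><count>97</count></comment>
    <comment><name>Bob</name><count>3</count></comment>
  </comments>
</commentinfo>"""


def sum_comment_counts(xml_text):
    """Return ([(name, count), ...], total) for every <comment> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    total = 0
    for comment in root.findall('.//comment'):
        # Search relative to each comment, not from the root of the tree.
        name = comment.find('name').text
        count = int(comment.find('count').text)
        pairs.append((name, count))
        total += count
    return pairs, total


def try_parse(xml_text):
    """Parse the text, or show what was received if it is not XML."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        print("Not valid XML:", err)
        print("First 80 characters:", xml_text[:80])
        return None


pairs, total = sum_comment_counts(SAMPLE_XML)
for name, count in pairs:
    print(name, count)
print("Sum:", total)
```

Printing the first characters of the response in the except branch, rather than silently passing, also avoids the later "tree is not defined" error, because the failure is reported where it happens.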

Error using OAuth2 to connect to dropbox in Python

On my Raspberry Pi running raspbian jessie I tried to go through the OAuth2 flow to connect a program to my dropbox using the dropbox SDK for Python which I installed via pip.
For a test, I copied the code from the documentation (and defined the app-key and secret, of course):
from dropbox import DropboxOAuth2FlowNoRedirect
auth_flow = DropboxOAuth2FlowNoRedirect(APP_KEY, APP_SECRET)
authorize_url = auth_flow.start()
print "1. Go to: " + authorize_url
print "2. Click \"Allow\" (you might have to log in first)."
print "3. Copy the authorization code."
auth_code = raw_input("Enter the authorization code here: ").strip()
try:
    access_token, user_id = auth_flow.finish(auth_code)
except Exception, e:
    print('Error: %s' % (e,))
    return
dbx = Dropbox(access_token)
I was able to get the URL and to click allow. When I then entered the authorization code however, it printed the following error:
Error: 'str' object has no attribute 'copy'
Using format_exc from the traceback-module, I got the following information:
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    access_token, user_id = auth_flow.finish(auth_code)
  File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 180, in finish
    return self._finish(code, None)
  File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 50, in _finish
    url = self.build_url(Dropbox.HOST_API, '/oauth2/token')
  File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 111, in build_url
    return "https://%s%s" % (self._host, self.build_path(target, params))
  File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 89, in build_path
    params = params.copy()
AttributeError: 'str' object has no attribute 'copy'
It seems the build_path method expects a dict 'params' and receives a string instead. Any ideas?
Thanks to smarx for his comment. The error is a known issue and will be fixed in version 3.42 of the SDK.
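For reference, the difference between the one-line error and the full traceback obtained with format_exc can be reproduced in isolation. The sketch below is for Python 3 and does not use the Dropbox SDK; build_path here is a hypothetical stand-in that only mirrors the failing line params = params.copy() from the traceback.

```python
import traceback


def build_path(target, params=None):
    """Hypothetical stand-in: copies params, so it needs a dict, not a string."""
    params = params.copy() if params is not None else {}
    return target, params


try:
    build_path("/oauth2/token", "a string instead of a dict")
except Exception as exc:
    short = "Error: %s" % (exc,)    # what the original except block prints
    full = traceback.format_exc()   # full call chain, including file and line
    print(short)
    print(full)
```

The short message alone does not say where the string came from, which is why the full traceback was needed to locate the problem inside the SDK.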

downloading multiple files with urllib.urlretrieve

I'm trying to download multiple files from a website.
The url resembles this: foo.com/foo-1.pdf.
Since I want those files to be stored in a directory of my choice,
I have written the following code:
import os
from urllib import urlretrieve
ext = ".pdf"
for i in range(1,37):
    print "fetching file " + str(i)
    url = "http://foo.com/Lec-" + str(i) + ext
    myPath = "/dir/"
    filename = "Lec-"+str(i)+ext
    fullfilename = os.path.join(myPath, filename)
    x = urlretrieve(url, fullfilename)
EDIT : Complete error message.
Traceback (most recent call last):
  File "scraper.py", line 10, in <module>
    x = urlretrieve(url, fullfilename)
  File "/usr/lib/python2.7/urllib.py", line 94, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.7/urllib.py", line 244, in retrieve
    tfp = open(filename, 'wb')
IOError: [Errno 2] No such file or directory: '/dir/Lec-1.pdf'
I'd be grateful if someone could point out where I have gone wrong.
Thanks in advance!
Your code works for me (Python 3.9), so make sure the script has access to the directory you specified and that the directory exists: the traceback fails in open(filename, 'wb'), which is the destination file being created, not the download itself. Once the download has succeeded, you can check the result by opening the file only if it exists:
fullfilename = os.path.abspath("d:/DownloadedFiles/Lec-1.pdf")
print(fullfilename)
if os.path.exists(fullfilename): # open file only if it exists
    with open(fullfilename, 'rb') as file:
        content = file.read() # read file's content
        print(content[:150]) # print only the first 150 characters
The output would be as follows:
C:/Users/Administrator/PycharmProjects/Tests/dtest.py
d:\DownloadedFiles\Lec-1.pdf
b'%PDF-1.6\r%\xe2\xe3\xcf\xd3\r\n2346 0 obj <</Linearized 1/L 1916277/O 2349/E 70472/N 160/T 1869308/H [ 536 3620]>>\rendobj\r \r\nxref\r\n2346 12\r\n0000000016 00000 n\r'
Process finished with exit code 0
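Since the traceback in the question fails while creating the destination file, one option is to create the target directory before downloading. The following Python 3 sketch shows that step under assumptions: plan_downloads and download_all are hypothetical helper names, the destination is a folder under the system temp directory chosen for the demo, and the download itself is left commented out because foo.com is only the placeholder host from the question.

```python
import os
import tempfile
from urllib.request import urlretrieve


def plan_downloads(base_url, dest_dir, first, last, ext=".pdf"):
    """Create dest_dir if needed and return (url, target path) pairs."""
    os.makedirs(dest_dir, exist_ok=True)  # avoids Errno 2 on a missing folder
    jobs = []
    for i in range(first, last + 1):
        filename = "Lec-%d%s" % (i, ext)
        jobs.append((base_url + filename, os.path.join(dest_dir, filename)))
    return jobs


def download_all(jobs):
    """Fetch every URL to its target path."""
    for url, target in jobs:
        print("fetching", url)
        urlretrieve(url, target)


# Demo destination; replace with the directory of your choice.
dest = os.path.join(tempfile.gettempdir(), "lectures_demo")
jobs = plan_downloads("http://foo.com/", dest, 1, 36)
print(len(jobs), jobs[0])
# download_all(jobs)  # not run here: foo.com is a placeholder host
```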