I have an analysis project that is requiring me to extract the 'current state' of a PDF that houses our report that is sent out 4 times daily. I have the code written to scrape my PDF but I need to figure out how to extract the PDF from the email so I can step through it with my code.
I tried using the code below
import win32com.client
import os
location = r'C:\Users\myusername\OneDrive - companyinfo\Department Projects\TestEmails'
files = [f for f in os.listdir(location)]
print(files)
for file in files:
if file.endswith('.msg'):
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(file)
att = msg.Attachments
for i in att:
i.SaveAsFil`e(os.path.join(r'C:\Users\username\OneDrive - companyname\Department Projects\TestPDF', i.FileName))
The error it produces is:
pywintypes.com_error: (-2147352567, 'Exception occurred.', (4096, u'Microsoft Outlook', u"We can't open 'Stats Report.msg'. It's possible the file is already open, or you don't have permission to open it.\n\nTo check your permissions, right-click the file folder, then click Properties.", None, 0, -2147287038), None)
I am only currently testing with one saved test.msg file but I have over 1400 I need to parse through. Maybe this isn't the best technique as I know VBA could do something similar within outlook, but I don't have much skills in the VBA region.
I have outlook 2016 installed on Windows 7 computer running python 2.7. Is this error something easy to fix? Is there a better technique to take an attached PDF and save it to a folder so my other program can grab the necessary data?
Desired output: PDF Attachment is Extracted and Saved into a separate folder.
Thank you for your help and expertise,
Andy
So I figured out the answer and how simple and stupid it was makes me unreasonably frustrated.....
My working directory was wrong even though I grabbed the file, the file name was the only item created.
I created a true_location variable that gave it the true full working directory and it worked like a charm.
true_location = location + '\\' + file
Enter that in the for loop under the if clause and it works like a charm.
Best,
Andy
Related
I am creating a file download functionality on link click from admin panel in django. I am using FileField for storing the files. For the download purpose I researched and found help on stackoverflow. After using that help, I have the following code for file download (with some minor changes of my own).
def pdf_download(request):
#print("request: ", request.META["PATH_INFO"])
a = request.META["PATH_INFO"]
#print(type(a))
a = a.split("/")
a = a[-1]
#print(a)
#print(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
with open(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))+"\\router_specifications\\"+a ,"rb") as pdf:
#Here router_specifications is the directory on local storage where the uploaded files are being stored.
response = HttpResponse(pdf.read()) #can add ', content_type = "application/pdf" as a specific pdf parameter'
response["Content-Disposition"] = "attachment; filename ="+a
pdf.close()
return response
Now, when I this code runs in my laptop, the file is downloaded automatically. but, when I switch to some other laptop, it asks me where should I save the file i.e. it's not automatically getting downloaded.
What changes should I do so that the file automatically gets downloaded without asking for manual save. Requesting help at the earliest.
You can try adding the following content_type:
content_type='application/force-download'
In a Rails 4 project I have been tasked with merging an “about” page to the end of PDF documents that have been uploaded via Paperclip.
The issue is that some of the uploaded pdfs contain optional content. I started out using combine_pdf but it does not support files with optional content as explained here. I have tried Prawn but it no longer support this functionality. I finally found the PDF Toolkit gem but the documentation does not say anything about optional content support. PDF Toolkit is a command line tool that has been wrapped in a Gem and therefore operates outside of the Rails application. I have tried using these command line examples in my CLI: pdftk file1.pdf file2.pdf cat output out_file.pdf but the terminal just hangs indefinitely.
The gem's documentation here is very unclear (to me) and may be the source of my issues.
My hope is to find advice on how to accomplish this using PDF Toolkit OR a better library for merging PDFs with optional content.
I have gotten this far by researching Stack Overflow questions like this and this
My OS is OSX 10.11.6
existing_pdf_path = #report.document.file.path
# Create a html template and convert it to pdf
about_company_html = render_to_string("_about_company.html.erb", layout: false)
about_company_pdf = WickedPdf.new.pdf_from_string(about_company_html, orientation: 'Landscape')
# Save about_company_pdf to file in tmp/pdf
about_company_pdf_path = Rails.root.join('tmp/pdf', 'about_company_partial.pdf').to_s
File.open(about_company_pdf_path, 'wb') { |file| file << about_company_pdf }
# Create and save a blank target file we will save everything to
combined_pdf_path = Rails.root.join('tmp/pdf', 'combined.pdf').to_s
FileUtils.touch(combined_pdf_path)
# This returns false OR just hangs depending on my exact syntax, no error or backtrace
result = PDF::Toolkit.pdftk( *%w(existing_pdf_path about_company_pdf_path cat output combined_pdf_path) )
I have tried as many variations of the above call to pdftk as I can find or think of. For example PDF::Toolkit.pdftk( existing_pdf_path, about_lux_pdf_path, 'cat', 'output', combined_pdf_path ) with no results.
I currently have a 34x22 .xlsx spreadsheet. I am downloading it via pydrive, filling in some of the blank values, and uploading the file back via pydrive. When I upload the file back, all cells with formulas are blank (any cell that starts with =). I have a local copy of the file I want to upload, and it looks fine so I'm pretty sure the issue must be with pydrive.
My code:
def upload_r1masterfile(filename='temp.xlsx'):
"""
Upload a given file to drive as our master file
:param filename: name of local file to upload
:return:
"""
# Get the file we want
master_file = find_r1masterfile()
try:
master_file.SetContentFile(filename)
master_file.Upload()
print 'Master file updated. ' + str(datetime.datetime.now())
except Exception, e:
print "Warning: Something wrong with file R1 Master File."
print str(e)
return e
The only hint I have is that if I add the param={'convert': True} tag to Upload, then there is no loss. However, that means I am now working in google sheets format, and I would rather not do that. Not only because it's not the performed format to work with here, but also because if I try to master_file.GetContentFile(filename) I get the error: No downloadLink/exportLinks for mimetype found in metadata
Any hints? Is there another attribute on upload that I am not aware of?
Thanks!
Robin was able to help me answer this question at the github repository. Both suggested solutions worked:
1) When you upload the file, did you close Excel first? IIRC MS Office writes a lot of the content to a temporary file, so that may explain why some parts are missing. If you tried the non converting upload first, the full file may have been saved to disk between the two tries, and thus the second converting upload attempt worked.
2) GetContentFile takes a second argument called mimetype, which should allow you to download the file. Could you try .GetContentFile(filename, mimetype="application/vnd.ms-excel")? If that mimetype doesn't work as anticipated, there is a great StackOverflow post here which lists a bunch of different types you can try.
Thanks again Robin!
I have a dataset containing thousands of tweets. Some of those contain urls but most of them are in the classical shortened forms used in Twitter. I need something that gets the full urls so that I can check the presence of some particular websites. I have solved the problem in Python like this:
import urllib2
url_filename='C:\Users\Monica\Documents\Pythonfiles\urlstrial.txt'
url_filename2='C:\Users\Monica\Documents\Pythonfiles\output_file.txt'
url_file= open(url_filename, 'r')
out = open(url_filename2, 'w')
for line in url_file:
tco_url = line.strip('\n')
req = urllib2.urlopen(tco_url)
print >>out, req.url
url_file.close()
out.close()
Which works but requires that I export my urls from Stata to a .txt file and then reimport the full urls. Is there some version of my Python script that would allow me to integrate the task in Stata using the shell command? I have quite a lot of different .dta files and I would ideally like to avoid appending them all just to execute this task.
Thanks in advance for any answer!
Sure, this is possible without leaving Stata. I am using a Mac running OS X. The details might differ on your operating system, which I am guessing is Windows.
Python and Stata Method
Say we have the following trivial Python program, called hello.py:
#!/usr/bin/env python
import csv
data = [['name', 'message'], ['Monica', 'Hello World!']]
with open('data.csv', 'w') as wsock:
wtr = csv.writer(wsock)
for i in data:
wtr.writerow(i)
wsock.close()
This "program" just writes some fake data to a file called data.csv in the script's working directory. Now make sure the script is executable: chmod 755 hello.py.
From within Stata, you can do the following:
! ./hello.py
* The above line called the Python program, which created a data.csv file.
insheet using data.csv, comma clear names case
list
+-----------------------+
| name message |
|-----------------------|
1. | Monica Hello World! |
+-----------------------+
This is a simple example. The full process for your case will be:
Write file to disk with the URLs, using outsheet or some other command
Use ! to call the Python script
Read the output into Stata using insheet or infile or some other command
Cleanup by deleting files with capture erase my_file_on_disk.csv
Let me know if that is not clear. It works fine on *nix; as I said, Windows might be a little different. If I had a Windows box I would test it.
Pure Stata Solution (kind of a hack)
Also, I think what you want to accomplish can be done completely in Stata, but it's a hack. Here are two programs. The first simply opens a log file and makes a request for the url (which is the first argument). The second reads that log file and uses regular expressions to find the url that Stata was redirected to.
capture program drop geturl
program define geturl
* pass short url as first argument (e.g. http://bit.ly/162VWRZ)
capture erase temp_log.txt
log using temp_log.txt
copy `1' temp_web_file
end
The above program will not finish because the copy command will fail (intentionally). It also doesn't clean up after itself (intentionally). So I created the next program to read what happened (and get the URL redirect).
capture program drop longurl
program define longurl, rclass
* find the url in the log file created by geturl
capture log close
loc long_url = ""
file open urlfile using temp_log.txt , read
file read urlfile line
while r(eof) == 0 {
if regexm("`line'", "server says file permanently redirected to (.+)") == 1 {
loc long_url = regexs(1)
}
file read urlfile line
}
file close urlfile
return local url "`long_url'"
end
You can use it like this:
geturl http://bit.ly/162VWRZ
longurl
di "The long url is: `r(url)'"
* The long url is: http://www.ciwati.it/2013/06/10/wdays/?utm_source=twitterfeed&
* > utm_medium=twitter
You should run them one after the other. Things might get ugly using this solution, but it does find the URL you are looking for. May I suggest that another approach is to contact the shortening service and ask nicely for some data?
If someone at Stata is reading this, it would be nice to have copy return HTTP response header information. Doing this entirely in Stata is a little out there. Personally I would use entirely Python for this sort of thing and use Stata for the analysis of data once I had everything I needed.
Using TextMate:
Is it possible to assign a shortcut to preview/refresh the currently edited HTML document in, say, Firefox, without having to first hit Save?
I'm looking for the same functionality as TextMate's built-in Web Preview window, but I'd prefer an external browser instead of TextMate's. (Mainly in order to use a JavaScript console such as Firebug for instance).
Would it be possible to pipe the currently unsaved document through the shell and then preview in Firefox. And if so, is there anyone having a TextMate command for this, willing to share it?
Not trivially. The easiest way would be to write the current file to the temp dir, then launch that file.. but, this would break any relative links (images, scripts, CSS files)
Add a bundle:
Input: Entire Document
Output: Discard
Scope Selector: source.html
And the script:
#!/usr/bin/env python2.5
import os
import sys
import random
import tempfile
import subprocess
fname = os.environ.get("TM_FILEPATH", "Untitled %s.html" % random.randint(100, 1000))
fcontent = sys.stdin.read()
fd, name = tempfile.mkstemp()
print name
open(name, "w+").write(fcontent)
print subprocess.Popen(["open", "-a", "Firefox", name]).communicate()
As I said, that wont work with relative resource links, which is probably a big problem.. Another option is to modify the following line of code, from the exiting "Refresh Browsers" command:
osascript <<'APPLESCRIPT'
tell app "Firefox" to Get URL "JavaScript:window.location.reload();" inside window 1
APPLESCRIPT
Instead of having the javascript reload the page, it could clear it, and write the current document using a series of document.write() calls. The problem with this is you can't guarantee the current document is the one you want to replace.. Windows 1 could have changed to another site etc, especially with tabbed browsing..
Finally, an option that doesn't have a huge drawback: Use version control, particularly one of the "distributed" ones, where you don't have to send your changes to a remote server - git, mercurial, darcs, bazaar etc (all have TextMate integration also)
If your code is in version control, it doesn't matter if you save before previewing, you can also always go back to your last-commited version if you break something and lose the undo buffer.
Here's something that you can use and just replace "Safari" with "Firefox":
http://wiki.macromates.com/Main/Howtos#SafariPreview
Open the Bundle Editor (control + option + command + B)
Scroll to the HTML Bundle and expand the tree
Select "Open Document in Running Browser(s)"
Assign Activation Key Equivalent (shortcut)
Close the bundle editor
I don't think this is possible. You can however enable the 'atomic saves' option so every time you alt tab to Firefox your project is saved.
If you ever find a solution to have a proper Firefox live preview, let us know.