pydrive: Losing file content during upload() - pydrive

I currently have a 34x22 .xlsx spreadsheet. I am downloading it via pydrive, filling in some of the blank values, and uploading the file back via pydrive. When I upload the file back, all cells with formulas are blank (any cell that starts with =). I have a local copy of the file I want to upload, and it looks fine so I'm pretty sure the issue must be with pydrive.
My code:
def upload_r1masterfile(filename='temp.xlsx'):
"""
Upload a given file to drive as our master file
:param filename: name of local file to upload
:return:
"""
# Get the file we want
master_file = find_r1masterfile()
try:
master_file.SetContentFile(filename)
master_file.Upload()
print 'Master file updated. ' + str(datetime.datetime.now())
except Exception, e:
print "Warning: Something wrong with file R1 Master File."
print str(e)
return e
The only hint I have is that if I add the param={'convert': True} tag to Upload, then there is no loss. However, that means I am now working in google sheets format, and I would rather not do that. Not only because it's not the performed format to work with here, but also because if I try to master_file.GetContentFile(filename) I get the error: No downloadLink/exportLinks for mimetype found in metadata
Any hints? Is there another attribute on upload that I am not aware of?
Thanks!

Robin was able to help me answer this question at the github repository. Both suggested solutions worked:
1) When you upload the file, did you close Excel first? IIRC MS Office writes a lot of the content to a temporary file, so that may explain why some parts are missing. If you tried the non converting upload first, the full file may have been saved to disk between the two tries, and thus the second converting upload attempt worked.
2) GetContentFile takes a second argument called mimetype, which should allow you to download the file. Could you try .GetContentFile(filename, mimetype="application/vnd.ms-excel")? If that mimetype doesn't work as anticipated, there is a great StackOverflow post here which lists a bunch of different types you can try.
Thanks again Robin!

Related

How to train SageMaker BlazingText model using multiple channels

I have two separate normalized text files that I want to train my BlazingText model on.
I am struggling to get this to work and the documentation is not helping.
Basically I need to figure out how to supply multiple files or S3 prefixes as "inputs" parameter to the sagemaker.estimator.Estimator.fit() method.
I first tried:
s3_train_data1 = 's3://{}/{}'.format(bucket, prefix1)
s3_train_data2 = 's3://{}/{}'.format(bucket, prefix2)
train_data1 = sagemaker.session.s3_input(s3_train_data1, distribution='FullyReplicated', content_type='text/plain', s3_data_type='S3Prefix')
train_data2 = sagemaker.session.s3_input(s3_train_data2, distribution='FullyReplicated', content_type='text/plain', s3_data_type='S3Prefix')
bt_model.fit(inputs={'train1': train_data1, 'train2': train_data2}, logs=True)
this doesn't work because SageMaker is looking for the key specifically to be "train" in the inputs parameter.
So then i tried:
bt_model.fit(inputs={'train': train_data1, 'train': train_data2}, logs=True)
This trains the model only on the second dataset and ignores the first one completely.
Now finally I tried using a Manifest file using the documentation here: https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
(see manifest file format under "S3Uri" section)
the documentation says the manifest file format is a JSON that looks like this example:
[
{"prefix": "s3://customer_bucket/some/prefix/"},
"relative/path/to/custdata-1",
"relative/path/custdata-2"
]
Well, I don't think this is valid JSON in the first place but what do I know, I still give it a try.
When I try this:
s3_train_data_manifest = 'https://s3.us-east-2.amazonaws.com/bucketpath/myfilename.manifest'
train_data_merged = sagemaker.session.s3_input(s3_train_data_manifest, distribution='FullyReplicated', content_type='text/plain', s3_data_type='ManifestFile')
data_channel_merged = {'train': train_data_merged}
bt_model.fit(inputs=data_channel_merged, logs=True)
I get an error saying:
ValueError: Error training blazingtext-2018-10-17-XX-XX-XX-XXX: Failed Reason: ClientError: Data download failed:Unable to parse manifest at s3://mybucketpath/myfilename.manifest - invalid format
I tried replacing square brackets in my manifest file with curly braces ...but still I feel the JSON file format seems to be missing something that documentation fails to describe correctly?
You can certainly match multiple files with the same prefix, so your first attempt could have worked as long as you organize your files in your S3 bucket to suit. For e.g. the prefix: s3://mybucket/foo/ will match the files s3://mybucket/foo/bar/data1.txt and s3://mybucket/foo/baz/data2.txt
However, if there is a third file in your bucket called s3://mybucket/foo/qux/data3.txt that you don't want matched (while still matching the first two) there is no way to do achieve that with a single prefix. In these cases a manifest would work. So, in the above example, the manifest would simply be:
[
{"prefix": "s3://mybucket/foo/"},
"bar/data1.txt",
"baz/data2.txt"
]
(and yes, this is valid json - it is an array whose first element is an object with an attribute called prefix and all subsequent elements are strings).
Please double check your manifest (you didn't actually post it so I can't do that for you) and make sure it conforms to the above syntax.
If you're still stuck please open up a thread on the AWS sagemaker forums - https://forums.aws.amazon.com/forum.jspa?forumID=285 and after you do that we can setup a PM to try and get to the bottom of this (never post your AWS account id in a public forum like StackOverflow or even in AWS forums).

Automatic file download in Django

I am creating a file download functionality on link click from admin panel in django. I am using FileField for storing the files. For the download purpose I researched and found help on stackoverflow. After using that help, I have the following code for file download (with some minor changes of my own).
def pdf_download(request):
#print("request: ", request.META["PATH_INFO"])
a = request.META["PATH_INFO"]
#print(type(a))
a = a.split("/")
a = a[-1]
#print(a)
#print(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
with open(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))+"\\router_specifications\\"+a ,"rb") as pdf:
#Here router_specifications is the directory on local storage where the uploaded files are being stored.
response = HttpResponse(pdf.read()) #can add ', content_type = "application/pdf" as a specific pdf parameter'
response["Content-Disposition"] = "attachment; filename ="+a
pdf.close()
return response
Now, when I this code runs in my laptop, the file is downloaded automatically. but, when I switch to some other laptop, it asks me where should I save the file i.e. it's not automatically getting downloaded.
What changes should I do so that the file automatically gets downloaded without asking for manual save. Requesting help at the earliest.
You can try adding the following content_type:
content_type='application/force-download'

Extracting Outlook Attachment from Saved Email

I have an analysis project that is requiring me to extract the 'current state' of a PDF that houses our report that is sent out 4 times daily. I have the code written to scrape my PDF but I need to figure out how to extract the PDF from the email so I can step through it with my code.
I tried using the code below
import win32com.client
import os
location = r'C:\Users\myusername\OneDrive - companyinfo\Department Projects\TestEmails'
files = [f for f in os.listdir(location)]
print(files)
for file in files:
if file.endswith('.msg'):
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(file)
att = msg.Attachments
for i in att:
i.SaveAsFil`e(os.path.join(r'C:\Users\username\OneDrive - companyname\Department Projects\TestPDF', i.FileName))
The error it produces is:
pywintypes.com_error: (-2147352567, 'Exception occurred.', (4096, u'Microsoft Outlook', u"We can't open 'Stats Report.msg'. It's possible the file is already open, or you don't have permission to open it.\n\nTo check your permissions, right-click the file folder, then click Properties.", None, 0, -2147287038), None)
I am only currently testing with one saved test.msg file but I have over 1400 I need to parse through. Maybe this isn't the best technique as I know VBA could do something similar within outlook, but I don't have much skills in the VBA region.
I have outlook 2016 installed on Windows 7 computer running python 2.7. Is this error something easy to fix? Is there a better technique to take an attached PDF and save it to a folder so my other program can grab the necessary data?
Desired output: PDF Attachment is Extracted and Saved into a separate folder.
Thank you for your help and expertise,
Andy
So I figured out the answer and how simple and stupid it was makes me unreasonably frustrated.....
My working directory was wrong even though I grabbed the file, the file name was the only item created.
I created a true_location variable that gave it the true full working directory and it worked like a charm.
true_location = location + '\\' + file
Enter that in the for loop under the if clause and it works like a charm.
Best,
Andy

Django FileField uploads stored on filesystem with different filename

HIn Django, when uploading a file with spaces, and brackets, it's stored in the file system with a different filename.
For example, when uploading the file 'lo go (1).jpg' via the admin interface, it's stored on the filesystem as 'lo__go_1.jpg'.
How can I know what the file will be called at upload time? I can't seem to find the source code that replaces the characters.
I found out the answer to my question.
https://github.com/django/django/blob/master/django/db/models/fields/files.py#L310
https://github.com/django/django/blob/master/django/core/files/storage.py#L58
https://github.com/django/django/blob/master/django/utils/text.py#L234

rails4 upload file extension error ActionDispatch::Http::UploadedFile

Guy
now i want to upload file with rails 4
my problem now i can't check the file extension before upload it
Note : I can upload the file well but i want to get the file kind before upload it
because i need the extension in another step in my App.
I'm tried to use the commands
File.extname(params[:Upload])
but always got the error
can't convert ActionDispatch::Http::UploadedFile into String
also how i can get the file base name before upload it ??
when i tryed to use
File.basename(params[:Upload])
i got the same error
can't convert ActionDispatch::Http::UploadedFile into String
also when i tried to convert the name to Sting i don't get any thing
That's because File.extname expects a string file name, but an uploaded file (your params[:upload] is an object, it's an instance of the ActionDispatch::Http::UploadedFile class (kind of a temporary file)
To fix the problem you need to call the path property on your params[:upload] object, kind of like that
File.extname(params[:Upload].path)
Btw, if you're trying to get the type of the uploaded file, I'd encourage you to check for the params[:Upload].content_type instead, it's harder to spoof
You can use this:
params[:Upload].original_filename.split('.').last
The original_filename contains full filename with extension.
so you split it based on '.' and last index will contain the file extension.
For Example:
"my_file.doc.pdf".split('.').last # => 'pdf'
You can check this for more info ActionDispatch::Http::UploadedFile.