How to pass InMemoryUploadedFile as a file? - django

The user records audio, the audio gets saved into a Blob and sent to the backend. I want to take that audio file and send it to the OpenAI Whisper API.
files = request.FILES.get('audio')
audio = whisper.load_audio(files)
I've tried different ways to send the audio file, but none of them seemed to work, and I don't understand how it should be sent. I would prefer not to save the file; I just want the user-recorded audio sent to the Whisper API from the backend.
Edit:
The answer by AKX seems to work, but now there is another error.
Edit 2:
He has edited his answer and everything works perfectly now. Thanks a lot to @AKX!

load_audio() requires a file on disk, so you'll need to cater to it – but you can use a temporary file that's automagically deleted outside the with block. (On Windows, you may need to use delete=False because of sharing permission reasons.)
import os
import tempfile

file = request.FILES.get('audio')

# Spool the uploaded chunks into a named temp file so load_audio() gets a real path.
with tempfile.NamedTemporaryFile(suffix=os.path.splitext(file.name)[1], delete=False) as f:
    for chunk in file.chunks():
        f.write(chunk)
    f.seek(0)

try:
    audio = whisper.load_audio(f.name)
finally:
    # delete=False means we clean the temp file up ourselves.
    os.unlink(f.name)
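For context, here is a minimal sketch of how the snippet above could sit inside a complete Django view that runs the local Whisper model and returns the transcription as JSON. The view name, the model size, and the response shape are assumptions of mine, not part of the original answer.

import os
import tempfile

import whisper
from django.http import JsonResponse

# Assumed: load the model once at import time so every request reuses it.
model = whisper.load_model("base")

def transcribe_audio(request):
    file = request.FILES.get('audio')
    if file is None:
        return JsonResponse({'error': 'no audio uploaded'}, status=400)
    # Same pattern as above: spool the upload to a named temp file on disk.
    with tempfile.NamedTemporaryFile(suffix=os.path.splitext(file.name)[1], delete=False) as f:
        for chunk in file.chunks():
            f.write(chunk)
    try:
        audio = whisper.load_audio(f.name)
        result = model.transcribe(audio)
    finally:
        os.unlink(f.name)
    return JsonResponse({'text': result['text']})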

Related

How do I make excel spreadsheets downloadable in Django?

I'm writing a web application that generates reports from a local database. I want to generate an Excel spreadsheet and immediately have the user download it. However, when I try to return the file via HttpResponse, I cannot open the downloaded file, even though the copy left in storage opens perfectly fine.
This is using Django 2.1 (for database reasons, I'm not using 2.2) and I'm generating the file with xlrd. There is another excel spreadsheet that will need to be generated and downloaded that uses the openpyxl library (both libraries serve very distinct purposes IMO).
This spreadsheet is not very large (5 columns x 6 rows).
I've looked at other similar Stack Overflow questions and followed their instructions. Specifically, I am talking about this answer:
https://stackoverflow.com/a/36394206/6411417
As you can see in my code, the logic is nearly the same, and yet I cannot open the downloaded Excel spreadsheets. The only difference is that my file name is generated when the file is generated and returned in the file_path variable.
def make_lrm_summary_file(request):
    file_path = make_lrm_summary()
    if os.path.exists(file_path):
        with open(file_path, 'rb') as fh:
            response = HttpResponse(fh.read(), content_type="application/vnd.ms-excel")
            response['Content-Disposition'] = f'inline; filename="{os.path.basename(file_path)}"'
            return response
    raise Http404
Again, the file is properly generated and stored on my server, but the download itself provides an Excel file that cannot be opened. Specifically, I get the error message:
EXCEL.EXE - Application Error | The application was unable to start correctly (0x0000005). Click OK to close the application.
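No answer is quoted for this question here, but for reference, one common way to serve a generated .xlsx in Django 2.1+ is FileResponse with the Office Open XML content type. This is only a sketch of that pattern reusing the question's own make_lrm_summary() helper, not a confirmed fix for the error above.

import os

from django.http import FileResponse, Http404

def make_lrm_summary_file(request):
    file_path = make_lrm_summary()
    if not os.path.exists(file_path):
        raise Http404
    # FileResponse streams the file in chunks; note the .xlsx content type
    # differs from the legacy .xls "application/vnd.ms-excel" used above.
    return FileResponse(
        open(file_path, 'rb'),
        as_attachment=True,
        filename=os.path.basename(file_path),
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    )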

How to zip or tar a static folder without writing anything to the filesystem in python?

I know about this question. But you can't write to the filesystem on App Engine (shutil and zipfile require creating files).
So basically I need to archive something like /base/nacl using zip or tar, and write the output to the web browser requesting the page (the output will never exceed 32 MB).
It just happened that I had to solve the exact same problem tonight :) This worked for me:
import StringIO
import tarfile

fd = StringIO.StringIO()
with tarfile.open(mode="w:gz", fileobj=fd) as tgz:
    tgz.add('dir_to_download')

self.response.headers['Content-Type'] = 'application/octet-stream'
self.response.headers['Content-Disposition'] = 'attachment; filename="archive.tgz"'
self.response.write(fd.getvalue())
Key points:
used StringIO to fake a file in memory
used fileobj to pass directly the fake file's object to tarfile.open() (also supported by gzip.GzipFile() if you prefer gzip instead of tarfile)
set headers to present the response as a downloadable file
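Note that StringIO and self.response are Python 2 / App Engine webapp idioms. A rough Python 3 equivalent of the same in-memory trick uses io.BytesIO; the surrounding response handling depends on your framework and is left out here.

import io
import tarfile

def build_tgz_in_memory(directory='dir_to_download'):
    # io.BytesIO plays the role StringIO plays above: a fake binary file in memory.
    buf = io.BytesIO()
    with tarfile.open(mode="w:gz", fileobj=buf) as tgz:
        tgz.add(directory)
    return buf.getvalue()

The response side stays the same idea: set the Content-Type and Content-Disposition headers and write the returned bytes to whatever response object your framework provides.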

How do I get a Ruby IO stream for a Paperclip Attachment?

I have an application that stores uploaded CSV files using the Paperclip gem.
Once uploaded, I would like to be able to stream the data from the uploaded file into code that reads it line-by-line and loads it into a data-staging table in Postgres.
I've gotten this far in my efforts, where data_file.upload is a Paperclip CSV Attachment
io = StringIO.new(Paperclip.io_adapters.for(data_file.upload).read, 'r')
Even though ^^ works, the problem is that - as you can see - it loads the entire file into memory as a honkin' Ruby String, and Ruby String garbage is notoriously bad for app performance.
Instead, I want a Ruby IO object that supports use of e.g., io.gets so that the IO object handles buffering and cleanup, and the whole file doesn't sit as one huge string in memory.
Thanks in advance for any suggestions!
With some help (from StackOverflow, of course), I was able to suss this myself.
In my PaperClip AR model object, I now have the following:
# Done this way so we get auto-closing of the File object
def yielding_upload_as_readable_file
  # It's quite annoying that there's not 1 method that works for both filesystem and S3 storage
  open(filesystem_storage? ? upload.path : upload.url) { |file| yield file }
end

def filesystem_storage?
  Paperclip::Attachment.default_options[:storage] == :filesystem
end
... and, I consume it in another model like so:
data_file.yielding_upload_as_readable_file do |file|
  while line = file.gets
    next if line.strip.size == 0
    # ... process line ...
  end
end

Downloading large files in Python

In Python 2.7.3, I'm trying to create a script to download a file over the Internet. I use the urllib2 module.
Here is what I have done:
import urllib2

HTTP_client = urllib2.build_opener()
#### Here I can modify HTTP_client headers
URL = 'http://www.google.com'
data = HTTP_client.open(URL)

with open('file.txt', 'wb') as f:
    f.write(data.read())
OK, that works perfectly.
The problem is when I want to save big files (hundreds of MB). I think that when I call the open method, it downloads the whole file into memory. But what about large files? It won't hold 1 GB of data in memory! And what happens if I lose the connection? All of the downloaded data is lost.
How do I download large files in Python like wget does? wget downloads the file 'directly' to the hard disk, and we can see the file growing in size.
I'm surprised there is no method 'retrieve' for doing stuff like
HTTP_client.retrieve(URL, 'filetosave.ext')
To resolve this, you can read a chunk at a time and write it to the file.
req = urllib2.urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(chunk)
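On Python 3, where urllib2 no longer exists, the same chunked copy can be written with urllib.request and shutil.copyfileobj, which performs the read/write loop for you. A short sketch (the URL and file name are placeholders):

import shutil
import urllib.request

url = 'http://www.google.com'

# copyfileobj reads and writes in fixed-size chunks, so only one chunk
# is held in memory at a time, just like the manual loop above.
with urllib.request.urlopen(url) as response, open('filetosave.ext', 'wb') as out_file:
    shutil.copyfileobj(response, out_file, 16 * 1024)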

django (or wsgi) chain stdout from subprocess

I am writing a webservice in Django to handle image/video streams, but it's mostly done in an external program. For instance:
the client requests /1.jpg?size=300x200
Python code parses 300x200 in Django (or another WSGI app)
Python calls convert (part of ImageMagick) using the subprocess module, with the parameter 300x200
convert reads 1.jpg from local disk and resizes it accordingly
convert writes the result to a temp file
Django builds HttpResponse() and reads the whole temp file content as the body
As you can see, the whole temp-file write-and-read process is inefficient. I need a generic way to handle similar external programs like this, not only convert but others as well, like cjpeg, ffmpeg, etc., or even proprietary binaries.
I want to implement it in this way:
python gets the stdout fd of the convert child process
chain it to WSGI socket fd for output
I've done my homework; Google says this kind of zero-copy can be done with the splice() system call, but it's not available in Python. So how do I maximize performance in Python for this kind of scenario?
Call splice() using ctypes?
Hack memoryview() or buffer()?
subprocess has stdout, which has readinto(); could this be utilized somehow?
How can we get the fd number for any WSGI app?
I'm kind of a newbie to all this; any suggestion is appreciated, thanks!
If the goal is to increase performance, you ought to examine the bottlenecks on a case-by-case basis, rather than taking a "one solution fits all" approach.
For the convert case, assuming the images aren't insanely large, the bottleneck there will most likely be spawning a subprocess for each request.
I'd suggest avoiding creating a subprocess and a temporary file, and doing the whole thing in the Django process using PIL, with something like this...
import os

from PIL import Image
from django.http import HttpResponse

IMAGE_ROOT = '/path/to/images'

# A Django view which returns a resized image
# Example parameters: image_filename='1.jpg', width=300, height=200
def resized_image_view(request, image_filename, width, height):
    full_path = os.path.join(IMAGE_ROOT, image_filename)
    source_image = Image.open(full_path)
    resized_image = source_image.resize((width, height))
    response = HttpResponse(content_type='image/jpeg')
    resized_image.save(response, 'JPEG')
    return response
You should be able to get results identical to ImageMagick by using the correct scaling algorithm, which, in general, is ANTIALIAS when the rescaled image is less than 50% of the size of the original, and BICUBIC in all other cases.
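As an illustration of that rule of thumb (my own sketch, not part of the original answer; in recent Pillow releases ANTIALIAS has been replaced by LANCZOS, so adjust the constant accordingly):

from PIL import Image

def choose_resample_filter(source_size, target_size):
    # ANTIALIAS (LANCZOS in newer Pillow) for heavy downscaling, BICUBIC otherwise.
    src_w, src_h = source_size
    dst_w, dst_h = target_size
    if dst_w < src_w * 0.5 and dst_h < src_h * 0.5:
        return Image.ANTIALIAS
    return Image.BICUBIC

# Usage inside the view above (hypothetical):
# resized_image = source_image.resize((width, height),
#                                     choose_resample_filter(source_image.size, (width, height)))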
For the case of videos, if you're returning a transcoded video stream, the bottleneck will likely be either CPU-time, or network bandwidth.
I found that WSGI can actually handle a file object (an fd) as an iterator response.
Example WSGI app:
import subprocess

def image_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'image/jpeg'), ('Connection', 'Close')])
    proc = subprocess.Popen([
        'convert',
        '1.jpg',
        '-thumbnail', '200x150',
        '-',  # write the result to stdout
    ], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.stdout
It wraps the subprocess's stdout as the HTTP response body via a pipe.
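If the WSGI server provides the optional wsgi.file_wrapper callable from PEP 3333, the pipe can also be handed to it so the server can pick its most efficient way of streaming a file-like object. A variant of the app above; the wrapper usage is my own addition, not from the original answer.

import subprocess

def image_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'image/jpeg'), ('Connection', 'Close')])
    proc = subprocess.Popen(
        ['convert', '1.jpg', '-thumbnail', '200x150', '-'],
        stdout=subprocess.PIPE,
    )
    file_wrapper = environ.get('wsgi.file_wrapper')
    if file_wrapper is not None:
        # Let the server stream the pipe in its preferred block size.
        return file_wrapper(proc.stdout, 8192)
    # Fall back to returning the file object directly, as in the answer above.
    return proc.stdout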