Flask: stream file upload without form boundary data to file - flask

I want to upload large files using flask. Rather than try to load the entire file into memory, I've implemented the request.stream.read() method to stream the file to disk in chunks, as per the following code, which is very similar to answers given to many similar questions I have found:
import os
import uuid

import flask

app = flask.Flask(__name__)

@app.route("/uploadData", methods=["POST"])
def uploadData():
    filename = uuid.uuid4().hex + '.nc'
    filePath = os.path.join("/tmp", filename)
    with open(filePath, "wb+") as f:
        chunk_size = 4096
        while True:
            # read the raw request body in fixed-size chunks
            chunk = flask.request.stream.read(chunk_size)
            if len(chunk) == 0:
                break
            f.write(chunk)
    return flask.jsonify({'success': True, 'filename': filename})
This works well, except that it "wraps" the file in post data, like the following:
------WebKitFormBoundaryoQ8GPdNkcfUNrKBd
Content-Disposition: form-data; name="inputFile"; filename="some_file_upload.nc"
Content-Type: application/x-netcdf
<Actual File content here>
------WebKitFormBoundaryoQ8GPdNkcfUNrKBd--
How can I stream the file to disk without getting the form boundary stuff?
In theory, I could use flask.request.files or the like to get the file correctly, but that loads the entire file into memory (or, more likely, into a temporary file) and is quite slow relative to the stream method, so I don't like it as a solution.
If it makes a difference, I'm initiating the file upload using the following javascript:
var formData = new FormData($('#fileform')[0]);
$.ajax({
    url: '/uploadData',
    data: formData,
    processData: false,
    contentType: false,
    type: 'POST'
});
EDIT: I've managed to work around the issue by using readline() rather than read(), discarding the first four lines, and then checking for chunk starting with "---" to discard the last line, which works. However, this feels both kludgy and fragile, so if there is a better solution, I would love to hear it.
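A somewhat less fragile variant of the workaround described in the EDIT is to search for the actual boundary string instead of counting lines. The following is only a sketch using the standard library: `copy_first_part` is a hypothetical helper, it assumes the form contains exactly one field (the file) so that the first part is the one to save, and it assumes the caller already knows the boundary. In Flask the boundary should be available from the request's Content-Type parameters, for example via `flask.request.mimetype_params.get("boundary")`, but that wiring is not shown or tested here.

```python
def copy_first_part(stream, boundary, out, chunk_size=4096):
    """Copy the payload of the first multipart part from stream to out.

    stream and out are binary file-like objects; boundary is the bytes
    value of the boundary parameter, without the leading two dashes.
    Returns the number of payload bytes written.
    """
    # Inside the body, every boundary line is preceded by CRLF and "--".
    delimiter = b"\r\n--" + boundary

    # Skip the opening boundary line and the part headers, which end
    # at the first blank line.
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("part headers were not terminated")
        buf += chunk
    buf = buf.split(b"\r\n\r\n", 1)[1]

    # Keep a tail of len(delimiter) - 1 bytes in the buffer so that a
    # delimiter split across two reads is still found.
    keep = len(delimiter) - 1
    written = 0
    while True:
        index = buf.find(delimiter)
        if index != -1:
            out.write(buf[:index])
            return written + index
        if len(buf) > keep:
            out.write(buf[:-keep])
            written += len(buf) - keep
            buf = buf[-keep:]
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("closing boundary not found")
        buf += chunk
```

In the view above, the read loop would be replaced by a call such as `copy_first_part(flask.request.stream, boundary, f)`. The sketch ignores the part headers entirely, so it cannot tell fields apart if the form ever gains a second input.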

Related

How to get number of lines of code of a file in a remote repo using PyGithub/ Githubsearch api?

commit = repo.get_commit(sha="0adf369fda5c2d4231881d66e3bc0bd12fb86c9a")
print(commit.stats.total)
i = commit.files[0].filename
I can get the filename, even the file sha; but can't seem to get loc of the file. Any pointers?
So let's see this line
commit = repo.get_commit(sha="0adf369fda5c2d4231881d66e3bc0bd12fb86c9a")
Here the commit is of type github.Commit.Commit
Now when you pick a file, it's of the type github.File.File
If you check that class, you'll see that there is no direct way of getting the lines of code. But there is one important field: raw_url.
This gives you the raw URL of the file, which you can then fetch, perhaps like this:
url = commit.files[0].raw_url
r = requests.get(url)
r.text
This will give you the raw data of the file and you can use it to get the number of lines of code.
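Counting the lines from that text can be sketched as follows. `count_lines` is a hypothetical helper, and what counts as a "line of code" is an assumption here: it reports both the total number of lines and the number of non-blank lines, without trying to exclude comments.

```python
def count_lines(text):
    """Return (total_lines, non_blank_lines) for the given source text."""
    lines = text.splitlines()
    non_blank = sum(1 for line in lines if line.strip())
    return len(lines), non_blank
```

With the snippet above, this would be called as `count_lines(r.text)`.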

Finding out the file name in a FileUploadHandler

I am rolling my own fileupload handler in django and would like to know the file name. I am supporting more than one file format and want to do different processing in the receive_data_chunk method depending on which file format the uploaded file has. I thought I would be pragmatic and just judge file format based on file ending but I can't figure out how to get hold of the file name. If I try to extract the file name with something like the following code (before that method is called):
if request.method == 'POST':
    p = re.compile(r'^.*\.sdf$', re.IGNORECASE)
    if p.search(request.FILES['filecontent'].name):
        self.sdf = True
    else:
        self.sdf = False
It seems I never reach the receive_data_chunk method. I presume the call to request.FILES triggers the loading somehow, and by then it's already done? How can I do different processing based on file ending in my receive_data_chunk method?
Have you tried using
data=request.POST.copy()
and then working on the copy? I have used this for other things, but it may work in this case as well.
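Another option that may be worth checking: Django calls an upload handler's new_file() method with the file name before the first receive_data_chunk() call, so the format can be decided there without touching request.FILES. The class below is only a sketch of that idea. `SdfAwareHandlerSketch` is a hypothetical name, and it is written as a plain class so it runs without Django; in a real project it would presumably subclass django.core.files.uploadhandler.FileUploadHandler instead.

```python
import os


class SdfAwareHandlerSketch:
    """Stand-in that mirrors the method names of a Django upload handler."""

    def __init__(self):
        self.sdf = False

    def new_file(self, field_name, file_name, content_type, content_length,
                 charset=None, content_type_extra=None):
        # Decide the format once, from the extension of the uploaded name.
        extension = os.path.splitext(file_name)[1]
        self.sdf = extension.lower() == ".sdf"

    def receive_data_chunk(self, raw_data, start):
        if self.sdf:
            # SDF-specific processing would go here (placeholder).
            pass
        # Returning the data passes it on to the next handler.
        return raw_data

    def file_complete(self, file_size):
        # Returning None leaves it to a later handler to build the file.
        return None
```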

Multipart httpresponse with django

I would like some help with my code. My goal is to send, in a single HttpResponse, both string variables as INI plain text and a BMP file.
For the moment I insert the decoded bytes of the BMP file into an INI parameter. Note that I am communicating with an interphone that acts only as a client, not as a server, so I can only send HTTP responses to it, not requests.
If I base64-encode my image, I'll need to change the interphone's software to decode it, which I can't do for the moment. Can you tell me whether base64-encoding the bytes is mandatory in my case?
I did some research on the web and saw that people either base64-encode their images or send a multipart response.
Could you help me implement a multipart response, even a hand-made one? That would interest me.
Here is what I do for the moment: I put the image in the "string" INI parameter:
def send_bmp():
    outputConfig = io.StringIO()
    outputConfig.write('[RETURN_INFO]\r\n')
    outputConfig.write('config_id=255\r\n')
    outputConfig.write('config_type=2\r\n')
    outputConfig.write('action=3\r\n')
    outputConfig.write('[DATABASE]\r\n')
    file = open(django_settings.TMP_DIR+'/qrcode.bmp', 'rb').read()
    outputConfig.write('size_all='+str(len(file))+'\r\n')
    outputConfig.write('string='+file.decode('iso-8859-1')+'\r\n')
    outputConfig.write('csum='+str(sum(file))+'\r\n')
    body = outputConfig.getvalue()
    httpR = HttpResponse(body, content_type='text/plain;charset=iso-8859-1')
    httpR['Content-Length'] = len(body)
    return httpR
Here is the response I get :
https://gist.github.com/Ezekiah/e6fd50f13c05f338f27a
If you need to mix the image file content with the rest of the response, I think you have to use Base64 encoding. If it is possible to return the INI parameters in one request and the file in another, Django provides a FileResponse class (a subclass of StreamingHttpResponse) that you can use to return the BMP file in chunks, like this:
from django.http import FileResponse

def send_bmp(request):
    file = open(django_settings.TMP_DIR+'/qrcode.bmp', 'rb')
    return FileResponse(file)
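Since the question also asks for a hand-made multipart response, here is a rough sketch of how such a body could be assembled from raw bytes. `build_multipart_mixed` and the boundary value are hypothetical. The sketch assumes the boundary string never occurs inside a payload, and, more importantly, that the interphone firmware can parse a multipart/mixed body, which the question does not confirm.

```python
def build_multipart_mixed(parts, boundary):
    """Assemble a multipart/mixed body.

    parts is a list of (content_type, payload_bytes) tuples and
    boundary is an ASCII string that must not occur in any payload.
    """
    delimiter = b"--" + boundary.encode("ascii")
    body = bytearray()
    for content_type, payload in parts:
        # Each part: boundary line, headers, blank line, raw payload.
        body += delimiter + b"\r\n"
        body += b"Content-Type: " + content_type.encode("ascii") + b"\r\n"
        body += b"Content-Length: " + str(len(payload)).encode("ascii")
        body += b"\r\n\r\n"
        body += payload + b"\r\n"
    # The closing boundary carries two extra trailing dashes.
    body += delimiter + b"--\r\n"
    return bytes(body)
```

In the view, the result could be returned with something like `HttpResponse(body, content_type='multipart/mixed; boundary=frontier')`, with the INI text encoded to bytes as the first part and the unmodified BMP bytes as the second. No base64 is involved, because the image bytes are written as-is.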

Reading only the request body in a PUT upload

I have a Django view which accepts an uploaded file over PUT. I've created my own upload handler and am processing the data in chunks like so:
handler = MD5ChecksumUploadHandler()
handler.new_file(field_name="file", file_name="unknown",
                 content_type=request.META.get('CONTENT_TYPE', 'application/octet-stream'),
                 content_length=int(request.META.get('CONTENT_LENGTH', 0)))
upload_size = 0
while True:
    # read the request body in chunks
    chunk = request.read(handler.chunk_size)
    if chunk:
        handler.receive_data_chunk(chunk, start=None)
        upload_size += len(chunk)
    else:
        break
# return the MD5ChecksumUploadedFile
return handler.file_complete(upload_size)
As I've found out, the request.read method starts reading at the beginning of the actual request and not the request body. This causes my MD5 checksums to fail, which is incidentally good, as I know that something's going wrong.
Is there a way for me to read the actual request body rather than just the raw request?
The request I was making was a bad one, that's what my problem was:
Content-MD5\n: XXXXXXXXXXXXXXXXXXXXX
Check your request if you're having this problem.

Django upload file into specific directory that depends on the POST URI

I'd like to store uploaded files into a specific directory that depends on the URI of the POST request. Perhaps, I'd also like to rename the file to something fixed (the name of the file input for example) so I have an easy way to grep the file system, etc. and also to avoid possible security problems.
What's the preferred way to do this in Django?
Edit: I should clarify that I'd be interested in possibly doing this as a file upload handler to avoid writing a large file twice to the file system.
Edit2: I suppose one can just 'mv' the tmp file to a new location. That's a cheap operation if on the same file system.
Fixed olooney example. It is working now
@csrf_exempt
def upload_video_file(request):
    folder = 'tmp_dir2/'  # request.path.replace("/", "_")
    uploaded_filename = request.FILES['file'].name
    BASE_PATH = '/home/'
    # create the folder if it doesn't exist.
    try:
        os.mkdir(os.path.join(BASE_PATH, folder))
    except OSError:
        pass
    # save the uploaded file inside that folder.
    full_filename = os.path.join(BASE_PATH, folder, uploaded_filename)
    fout = open(full_filename, 'wb+')
    file_content = ContentFile(request.FILES['file'].read())
    try:
        # Iterate through the chunks.
        for chunk in file_content.chunks():
            fout.write(chunk)
        fout.close()
        html = "<html><body>SAVED</body></html>"
        return HttpResponse(html)
    except Exception:
        html = "<html><body>NOT SAVED</body></html>"
        return HttpResponse(html)
Django gives you total control over where (and if) you save files. See: http://docs.djangoproject.com/en/dev/topics/http/file-uploads/
The below example shows how to combine the URL and the name of the uploaded file and write the file out to disk:
def upload(request):
    folder = request.path.replace("/", "_")
    uploaded_filename = request.FILES['file'].name
    # create the folder if it doesn't exist.
    try:
        os.mkdir(os.path.join(BASE_PATH, folder))
    except OSError:
        pass
    # save the uploaded file inside that folder.
    full_filename = os.path.join(BASE_PATH, folder, uploaded_filename)
    fout = open(full_filename, 'wb+')
    # Iterate through the chunks of the uploaded file (not the output file).
    for chunk in request.FILES['file'].chunks():
        fout.write(chunk)
    fout.close()
Edit: How to do this with a FileUploadHandler? I traced down through the code, and it seems you need to do four things to repurpose the TemporaryFileUploadHandler to save outside of FILE_UPLOAD_TEMP_DIR:
1. Extend TemporaryUploadedFile and override __init__() to pass a different directory through to NamedTemporaryFile. It can use the try/mkdir/except/pass pattern I showed above.
2. Extend TemporaryFileUploadHandler and override new_file() to use the above class.
3. Also extend __init__() to accept the directory where you want the folder to go.
4. Dynamically add the request handler, passing through a directory determined from the URL:
request.upload_handlers = [ProgressBarUploadHandler(request.path.replace('/', '_'))]
While non-trivial, it's still easier than writing a handler from scratch: In particular, you won't have to write a single line of error-prone buffered reading. Steps 3 and 4 are necessary because FileUploadHandlers are not passed request information by default, I believe, so you'll have to tell it separately if you want to use the URL somehow.
I can't really recommend writing a custom FileUploadHandler for this. It's really mixing layers of responsibility. Relative to the speed of uploading a file over the internet, doing a local file copy is insignificant. And if the file's small, Django will just keep it in memory without writing it out to a temp file. I have a bad feeling that you'll get all this working and find you can't even measure the performance difference.
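The simpler route mentioned in the question (just moving the temp file) can be sketched with the standard library. `move_upload` and its parameters are hypothetical names. The sketch assumes the upload already sits in a temporary file on disk; in Django that path would presumably come from the uploaded file's temporary_file_path(), which is only available when the upload was spooled to disk rather than kept in memory.

```python
import os
import shutil


def move_upload(temp_path, base_path, request_path, fixed_name):
    """Move an uploaded temp file into a folder derived from the URL path.

    Returns the final path. The file is stored under fixed_name rather
    than the client-supplied name, as the question suggests.
    """
    # "/videos/42/" becomes "videos_42"; an empty path falls back to "root".
    folder = request_path.strip("/").replace("/", "_") or "root"
    target_dir = os.path.join(base_path, folder)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, fixed_name)
    # A rename when source and target share a file system, a copy otherwise.
    shutil.move(temp_path, target)
    return target
```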