Django FileUploadHandler force file storage to disk - django

I understand that Django's file upload handlers by default keep files smaller than 2.5MB in memory and write larger ones to a temp folder on disk.
In my models, where I have a file field, I have specified the upload_to folder where I expect the files to be written.
However, when I try reading these files from that folder, I get an error implying that the files do not yet exist there.
How can I force Django to write the files to the folder specified in upload_to before another procedure starts reading from them?
I know I can read the files directly from memory via request.FILES['file'].name, but I would rather force the files to be written from memory to the folder before I read them.
Any insights will be highly appreciated.

The FILE_UPLOAD_MAX_MEMORY_SIZE setting tells Django the maximum size of file to keep in memory. Set it to 0 and uploads will always be written to disk.
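A minimal settings.py sketch of that suggestion (both names are standard Django settings). Note that this streams the upload into a temporary file under FILE_UPLOAD_TEMP_DIR or the system temp directory; the file only lands in the model's upload_to folder once the file field is actually saved.

# settings.py -- force every upload to be streamed to disk instead of held in memory
FILE_UPLOAD_MAX_MEMORY_SIZE = 0

# Equivalent, more explicit alternative: drop the in-memory handler entirely,
# leaving only Django's temporary-file handler.
FILE_UPLOAD_HANDLERS = [
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]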

Related

Minio/S3 scenarios where files have to be moved in batch

I searched but haven't found a satisfying solution.
Minio/S3 does not have directories, only keys (with prefixes). So far so good.
Now I need to change those prefixes, not for a single file but for a whole bunch (a lot) of files, which can be really large (actually there is no limit).
Unfortunately, these storage servers do not seem to have a concept of (and do not support):
rename file
move file
What has to be done, for each file, is:
copy the file to the new target location
delete the file from the old source location
My given design looks like:
users upload files to bucketname/uploads/filename.ext
a background process takes the uploaded files, generates some more files and uploads them to bucketname/temp/filename.ext
when all processings are done the uploaded file and the processed files are moved to bucketname/processed/jobid/new-filenames...
The path prefix is used when handling the object-created notification to differentiate whether it is an upload (start processing), temp (check whether all files are uploaded), or processed/jobid (hold the files until the user deletes them).
Imagine a task where 1000 files have to get to a new location (within the same bucket); copying and deleting them one by one leaves a lot of room for errors: running out of storage space during the copy operation, or connection errors with no chance of a rollback. It doesn't get easier if the locations are different buckets.
So, given this old design and no way to rename/move a file:
Is there any chance to copy the files without creating new physical files (without duplicating used storage space)?
Could any experienced cloud developer please give me a hint on how to do this bulk copy with rollbacks in error cases?
Has anyone implemented something like that with a functional rollback mechanism if, e.g., file 517 of 1000 fails? Copying them and then deleting them back does not seem to be the way to go.
Currently I am using the Minio server and the Minio dotnet library. But since they are compatible with Amazon S3, this scenario could also happen on Amazon S3.
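There is no answer recorded here, but below is a minimal sketch of the per-object copy-then-delete loop described above, using boto3 against an S3-compatible endpoint (the endpoint, credentials, bucket and prefix values are placeholders, and boto3 stands in for the Minio dotnet library only because both speak the same S3 API). The rollback strategy is simply: delete the new copies if anything fails, and delete the sources only after every copy succeeded.

import boto3

# Placeholders: point this at your Minio/S3 endpoint and credentials.
s3 = boto3.client("s3",
                  endpoint_url="http://localhost:9000",
                  aws_access_key_id="ACCESS_KEY",
                  aws_secret_access_key="SECRET_KEY")

def move_objects(bucket, keys, old_prefix, new_prefix):
    """Copy every object first; delete the sources only once all copies exist."""
    copied = []
    try:
        for key in keys:
            new_key = new_prefix + key[len(old_prefix):]
            s3.copy_object(Bucket=bucket,
                           Key=new_key,
                           CopySource={"Bucket": bucket, "Key": key})
            copied.append(new_key)
    except Exception:
        # "Rollback": remove the partial copies, leave the originals untouched.
        for new_key in copied:
            s3.delete_object(Bucket=bucket, Key=new_key)
        raise
    # All copies succeeded, so deleting the sources is now safe.
    for key in keys:
        s3.delete_object(Bucket=bucket, Key=key)

CopyObject is performed server-side, so no data travels through the client, but it does occupy a second physical copy until the source is deleted; plain S3/Minio does not, as far as I know, offer a way around that duplication, which is the part of the question this sketch cannot answer.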

concatenate/append/merge files in c++ (windows) without copying

How can I concatenate a few large files (total size ~3 TB) into one file using C/C++ on Windows?
I can't copy the data because it takes too much time, so I can't use:
cmd copy
Appending One File to Another File (https://msdn.microsoft.com/en-us/library/windows/desktop/aa363778%28v=vs.85%29.aspx)
and so on (stream::readbuf(), ...)
I just need to represent a few files as one.
If this is inside your own program only, then you can create a class that virtually glues the files together, so you can read over them and make them appear as a single file.
If you want to physically have a single file, then no, that is not possible.
That requires opening file 1 and appending the others, or creating a new file and appending all the files.
Neither the C/C++ library nor the Windows API has a means to concatenate files.
Even if such an API were available, it would be restrictive in that the first file would have to have a size that is a multiple of the disk allocation size.
Going really, really low level, and assuming the multiple-of-allocation-size condition is fulfilled: yes, if you unmount the drive, physically override the file system and mess around with the file system structures, you could "stitch" the files together, but that would be a challenge to do for FAT and near impossible for NTFS.
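The question asks about C++, but the "virtually glue the files together" idea is quick to sketch in Python; the class below (name and interface made up for illustration) shows the structure a C++ wrapper class would mirror: keep a list of file paths and sizes, track one logical position, and translate reads into reads on whichever underlying file that position falls into.

import os

class ConcatenatedReader(object):
    """Present several files as one read-only, seekable stream without copying them."""

    def __init__(self, paths):
        self._paths = paths
        self._sizes = [os.path.getsize(p) for p in paths]
        self._total = sum(self._sizes)
        self._pos = 0  # logical position across all files

    def seek(self, offset):
        self._pos = max(0, min(offset, self._total))

    def read(self, size):
        out = []
        remaining = size
        while remaining > 0 and self._pos < self._total:
            # Find which underlying file the logical position falls into.
            offset = self._pos
            for path, length in zip(self._paths, self._sizes):
                if offset < length:
                    break
                offset -= length
            with open(path, "rb") as f:
                f.seek(offset)
                chunk = f.read(min(remaining, length - offset))
            out.append(chunk)
            self._pos += len(chunk)
            remaining -= len(chunk)
        return b"".join(out)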

python mechanize retrieving files larger than 1GB

I am trying to download some files via mechanize. Files smaller than 1GB are downloaded without any trouble. However, if a file is bigger than 1GB, the script runs out of memory.
The mechanize_response.py script throws an out-of-memory error at the following line:
self.__cache.write(self.wrapped.read())
__cache is a cStringIO.StringIO; it seems that it cannot handle more than 1GB.
How to download files larger than 1GB?
Thanks
It sounds like you are trying to download the file into memory but you don't have enough. Try using the retrieve method with a file name to stream the downloaded file to disk.
I finally figured out a work around.
Instead of using browser.retrieve or browser.open, I used mechanize.urlopen, which returned the urllib2 handler. This allowed me to download files larger than 1GB.
I am still interested in figuring out how to make retrieve work for files larger than 1GB.
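A minimal sketch of that workaround (the URL, output filename and chunk size are placeholders): reading the mechanize.urlopen response in fixed-size chunks keeps memory use flat no matter how large the file is, because nothing gets buffered into a StringIO.

import mechanize

url = "http://example.com/big-file.bin"  # placeholder URL
response = mechanize.urlopen(url)        # plain urllib2-style handle, no response caching

with open("big-file.bin", "wb") as out:
    while True:
        chunk = response.read(1024 * 1024)  # 1 MB at a time
        if not chunk:
            break
        out.write(chunk)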

How to get the list of files created during a specific period of time in a directory?

I need to get the list of files that have been created within a specific period of time in a directory, e.g. files created after 19:14 and before 23:11. Each directory contains files belonging to a specific date (24 hours). Should I include the creation time of each file in its name (like prefix-hh-mm-ss-ms.txt)? These files are meant to be copied from another place into the directory, so I am afraid copying may modify the creation time of a file and I should not rely on it. Any advice showing me the best way to achieve what I want to do would be appreciated.
Copying should not "modify" the creation time; since the destination file is actually only created at copying time, isn't it only logical that the creation time of the copied file is the time when the copying occurred?
The file creation time is, however, not really available under Linux anyway (see the question you linked yourself, How to get 'file creation time' in Linux, or https://superuser.com/questions/437663/whats-an-elegant-way-to-copy-the-creation-and-modification-dates-of-a-file-to-a).
So you'll have to encode that information in some other way anyway. Encoding it in the filename, as you suggest, sounds like a reasonable way!
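A small sketch of that suggestion, using the prefix-hh-mm-ss-ms.txt naming scheme proposed in the question (the directory and the time window below are placeholders): encode the time when writing the file, then select files by parsing it back out of the name.

import os
import re
from datetime import time

# Filenames follow the scheme proposed above, e.g. "prefix-19-14-05-123.txt".
NAME_RE = re.compile(r".*-(\d{2})-(\d{2})-(\d{2})-(\d{3})\.txt$")

def files_in_window(directory, start, end):
    """Return the files whose encoded time falls after start and before end."""
    selected = []
    for name in os.listdir(directory):
        m = NAME_RE.match(name)
        if not m:
            continue
        hh, mm, ss, ms = map(int, m.groups())
        stamp = time(hh, mm, ss, ms * 1000)  # milliseconds -> microseconds
        if start < stamp < end:
            selected.append(name)
    return selected

# Example: everything created after 19:14 and before 23:11.
print(files_in_window("/data/some-day", time(19, 14), time(23, 11)))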

Where does django store temporary upload files?

I have a Django/uwsgi/nginx stack running on CentOS. When uploading a large file to Django (1GB+), I expect it to create a temp file in /tmp, and I should be able to watch it grow as the upload progresses. However, I don't. ls -lah /tmp doesn't show any new files being created or changing in size. I even specified explicitly in my settings.py that FILE_UPLOAD_TEMP_DIR = '/tmp', but still nothing.
I'd appreciate any help in tracking down where the temp files are stored. I need this to determine whether there are any large uploads in progress.
They are stored in your system's temp directory. From https://docs.djangoproject.com/en/dev/topics/http/file-uploads/?from=olddocs:
Where uploaded data is stored
Before you save uploaded files, the data needs to be stored somewhere.
By default, if an uploaded file is smaller than 2.5 megabytes, Django will hold the entire contents of the upload in memory. This means that saving the file involves only a read from memory and a write to disk and thus is very fast.
However, if an uploaded file is too large, Django will write the uploaded file to a temporary file stored in your system's temporary directory. On a Unix-like platform this means you can expect Django to generate a file called something like /tmp/tmpzfp6I6.upload. If an upload is large enough, you can watch this file grow in size as Django streams the data onto disk.
These specifics -- 2.5 megabytes; /tmp; etc. -- are simply "reasonable defaults". Read on for details on how you can customize or completely replace upload behavior.
Additionally, this only happens above a given size, which defaults to 2.5MB:
FILE_UPLOAD_MAX_MEMORY_SIZE: The maximum size, in bytes, for files that will be uploaded into memory. Files larger than FILE_UPLOAD_MAX_MEMORY_SIZE will be streamed to disk. Defaults to 2.5 megabytes.
I just tracked this down on my OS X system with Django 1.4.1.
In django/core/files/uploadedfile.py, a temporary file is created using django.core.files.temp, imported as tempfile:
from django.core.files import temp as tempfile
This simply returns Python's standard tempfile.NamedTemporaryFile unless it's running on Windows.
To see the location of the tempdir, you can run this command at the shell:
python -c "import tempfile; print tempfile.gettempdir()"
On my system right now it outputs /var/folders/9v/npjlh_kn7s9fv5p4dwh1spdr0000gn/T, which is where I found my temporary uploaded file.
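For completeness, a quick way to check both values from a Django shell (a small sketch; FILE_UPLOAD_TEMP_DIR and FILE_UPLOAD_MAX_MEMORY_SIZE are the standard settings, and a FILE_UPLOAD_TEMP_DIR of None means Django falls back to the system temp directory reported by tempfile.gettempdir()):

# Run inside: python manage.py shell
import tempfile
from django.conf import settings

print(settings.FILE_UPLOAD_TEMP_DIR)         # None -> Django uses the system default
print(settings.FILE_UPLOAD_MAX_MEMORY_SIZE)  # threshold (bytes) before streaming to disk
print(tempfile.gettempdir())                 # where the tmp*.upload files actually appear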