I have a Django-based media server that accepts a variety of video formats for upload. While uploading a large .wmv file, I noticed some strange behavior. The first time I uploaded the video, it took almost five minutes to convert and upload. Thereafter, some sort of caching occurred, and the video would merely point to the one I had previously uploaded. I don't understand why this is happening. When a video is uploaded, the file name extension is checked for conversion purposes, and then an ffmpeg command is executed to carry out the conversion. This all runs asynchronously, using django-celery with RabbitMQ as the message broker. I don't see any reason why the ffmpeg conversion command would not execute again. Here is my code for the celery task that handles the upload. (This was my initial reasoning; see the EDIT for the correct error diagnosis.)
@celery.task
def handleFileUploadAsync(update, m, file_type, video_types):
    # Escape spaces so the shell command doesn't break on them
    filename = m.file.name.replace(' ', '\\ ')
    if video_types[file_type] == 'wmv':
        os.system(
            "ffmpeg -i " + MEDIA_ROOT + filename +
            " -strict experimental -vcodec libx264 -profile:v baseline " +
            MEDIA_ROOT + filename.replace(video_types[file_type], 'mp4')
        )
        m.file.name = m.file.name.replace(video_types[file_type], 'mp4')
        m.save()
        # Remove the original .wmv upload, keeping only the converted .mp4
        os.remove(m.file.path.replace('mp4', 'wmv'))
    elif file_type in video_types.keys():
        os.system(
            "ffmpeg -i " + MEDIA_ROOT + filename +
            " -vcodec libx264 -profile:v baseline -s 672x576 " +
            MEDIA_ROOT + filename.replace(video_types[file_type], 'mp4')
        )
        m.file.name = m.file.name.replace(video_types[file_type], 'mp4')
        m.save()
        if video_types[file_type] != 'mp4':
            os.remove(m.file.path.replace('mp4', video_types[file_type]))
EDIT:
Here's the problem. When I convert videos, I only want the converted .mp4 file, not the original upload. Django generates filenames from the file upload field, automatically appending numbers to the names of existing files (i.e. test.mp4, test_1.mp4, test_2.mp4, etc.). However, when I upload a video like test.wmv, there will be no file named test.wmv after the conversion is complete (I delete the non-converted file). Is there any way I can modify the Django method that generates these filenames?
Use upload_to when declaring the FileField. Maybe use the object's primary key as the filename?
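A minimal sketch of that idea (the `videos/` prefix and the model wiring are assumptions; note also that the primary key usually isn't assigned yet when the file is first saved, so a UUID is used as a stand-in):

```python
import os
import uuid

def unique_video_path(instance, filename):
    """upload_to callable: discard the client-supplied name and build one
    from a UUID, keeping only the original extension. This sidesteps
    Django's test.mp4 / test_1.mp4 collision renaming entirely."""
    ext = os.path.splitext(filename)[1].lower()
    return "videos/%s%s" % (uuid.uuid4().hex, ext)

# Hypothetical model using it:
# class Media(models.Model):
#     file = models.FileField(upload_to=unique_video_path)
```

Because every upload gets a unique name, converting to .mp4 and deleting the original can never collide with a previously uploaded file's name.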
Related
Hi, I am a novice developer and have deployed my first Django project on Heroku.
When a user uploads a video file through uploadForm in the Django project, I want to compress it with ffmpeg and save it to Google Cloud Storage, then extract the duration from the saved video using ffprobe and store it in the duration field of the object.
The save() method in my forms.py is as follows:
def save(self, *args, **kwargs):
    def clean_video(self):
        raw_video = self.cleaned_data.get("video")
        timestamp = int(time())
        raw_video_path = raw_video.temporary_file_path()
        print(raw_video_path)
        video_name = f"{raw_video}".split(".")[0]
        subprocess.run(
            f"ffmpeg -i {raw_video_path} -vcodec libx265 -crf 28 -acodec mp3 -y "
            f"uploads/videoart_files/{video_name}_{timestamp}.mp4",
            shell=True,
        )
        return f"videoart_files/{video_name}_{timestamp}.mp4"

    videoart = super().save(commit=False)
    videoart.video = clean_video(self)
    video_path = videoart.video.path
    get_duration = subprocess.check_output(
        ['ffprobe', '-i', f'{video_path}', '-show_entries', 'format=duration',
         '-v', 'quiet', '-of', 'csv=%s' % ("p=0")]
    )
    duration = int(float(get_duration.decode('utf-8').replace("\n", "")))
    videoart.duration = duration
    return videoart
After all the validation of the other fields, I put the video-processing code inside the save() method so the video is compressed at the end. This code works fine on the local server. However, on Heroku the server raises NotImplementedError ("This backend doesn't support absolute paths.").
Naturally, ffmpeg can read its input from temporary_file_path(), but there is no local path for it to write the output to; the absolute path it writes is not a path on GCS.
ffmpeg also won't recognize a URL as an output target. I'm not sure how to save a file created by ffmpeg to GCS and how to access it afterwards.
Could you give me some advice?
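A common pattern here (a sketch, not a drop-in fix): let ffmpeg write to a local temporary file, then hand that file to Django's storage API with a *relative* name; `default_storage` routes it to whatever backend is configured (GCS in this setup), so no absolute output path is ever needed. The helper below only builds the ffmpeg command so the flow is easy to test; the `default_storage.save()` call is the part that matters.

```python
import subprocess
import tempfile

def build_ffmpeg_command(input_path, output_path):
    """Build the compression command as an argument list
    (no shell=True, so spaces in file names are safe)."""
    return [
        "ffmpeg", "-i", input_path,
        "-vcodec", "libx265", "-crf", "28",
        "-acodec", "mp3", "-y", output_path,
    ]

def compress_and_store(raw_video_path, target_name):
    """Compress to a local temp file, then save it through Django's
    storage API (Django imported here so the sketch stays importable
    without it)."""
    with tempfile.NamedTemporaryFile(suffix=".mp4") as tmp:
        subprocess.run(build_ffmpeg_command(raw_video_path, tmp.name), check=True)
        tmp.seek(0)
        from django.core.files import File
        from django.core.files.storage import default_storage
        # default_storage.save() takes a relative name and returns the
        # name the backend actually used (here, the GCS object name).
        return default_storage.save(target_name, File(tmp))
```

The duration can then be extracted by running ffprobe against the local temp file *before* the storage save, since the remote object has no filesystem path.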
I am working on a script that uploads pictures to S3 and then adds each picture to a Rekognition collection. When I run this script from the command line, everything works perfectly with no issues. However, when the system executes it automatically (the script runs whenever a new file is added to the specified upload folder), the Rekognition portion of the code does not run. Everything up through os.remove works fine automatically, but I can't get the images added to the collection. After days of messing with the code I am looking for some help; please let me know if I am missing something here.
After messing around with the script a bit and debugging, it is client = boto3.client('rekognition') that for some reason stops the script. Any thoughts on why that would be?
import boto3
import os

# Get file names of newly uploaded pictures
path = "/var/www/html/upload/webcam-capture/"
files = os.listdir(path)
for name in files:
    # Split up file name for proper processing
    components = name.split('-')
    schoolName = components[0]
    imageType = components[1]
    idNumber = components[2]
    # Upload files to bucket with Python SDK
    s3 = boto3.resource('s3')
    s3.meta.client.upload_file(
        path + name,
        schoolName + '-' + imageType + '-' + 'media.XXX.school',
        idNumber,
    )
    # Delete file from webcam-capture temp folder
    os.remove(path + name)
    # Add face to facial recognition collection
    collection_id = schoolName + '-' + imageType
    bucket = collection_id + '-media.XXX.school'
    client = boto3.client('rekognition')
    response = client.index_faces(
        CollectionId=collection_id,
        Image={'S3Object': {'Bucket': bucket, 'Name': idNumber}},
        MaxFaces=1,
        QualityFilter="AUTO",
        DetectionAttributes=['ALL'],
    )
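One hedged guess worth checking: when the script is launched by a watcher or daemon instead of an interactive shell, it may not inherit AWS_DEFAULT_REGION or the credentials from ~/.aws, and boto3.client('rekognition') raises when no region can be resolved. A sketch of building the client with everything explicit so it no longer depends on the inherited environment; the default region here is an assumption:

```python
import os

def rekognition_client_kwargs(region="us-east-1"):
    """Explicit kwargs for boto3.client() so the script does not rely on
    environment variables a daemon may not inherit. The region fallback
    is an assumption; set it to your collection's actual region."""
    return {
        "service_name": "rekognition",
        "region_name": os.environ.get("AWS_DEFAULT_REGION", region),
    }

# Usage (requires boto3 and valid credentials):
# import boto3
# client = boto3.client(**rekognition_client_kwargs())
```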
I have a setup that lets users download files that are stored in the DB as BYTEA data. Everything works OK, except that the download speed is very slow; it seems to download in 33 KB chunks, one chunk per second.
Is there a setting I can specify to speed this up?
views.py
from django.http import FileResponse

def getFileResponse(filedata, filename, filesize, contenttype):
    response = FileResponse(filedata, content_type=contenttype)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    response['Content-Length'] = filesize
    return response

# In the view:
return getFileResponse(
    filedata=myfile.filedata,  # Binary data from DB
    filename=myfile.filename + myfile.fileextension,
    filesize=myfile.filesize,
    contenttype=myfile.filetype,
)
Previously, I had the binary data returned as an HttpResponse and it downloaded like a normal file, at normal speeds. This worked fine locally, but when I pushed to Heroku, it wouldn't download the file; the downloaded file instead contained <Memory at XXX>.
And another side issue: when I include a text file with non-ASCII data (e.g. á), I get an error as well:
UnicodeEncodeError: 'ascii' codec can't encode characters...: ordinal not in range(128)
How can I handle files with Unicode data?
Update
Anyone know why the download speed gets so slow when changing from HttpResponse to FileResponse? Or alternatively, why the HttpResponse returning a file doesn't work on Heroku?
Update - Google Drive
I re-worked my application and hooked it up to a Google Drive back-end for serving files. It employs the BytesIO() approach suggested by Eric below:
def download_file(self, fileid, mimetype=None):
    # Get binary file data
    request = self.get_file(fileid=fileid, mediaflag=True)
    stream = io.BytesIO()
    downloader = MediaIoBaseDownload(stream, request)
    done = False
    # Retry if we received HTTPError
    for retry in range(0, 5):
        try:
            while done is False:
                status, done = downloader.next_chunk()
                print("Download %d%%." % int(status.progress() * 100))
            return stream.getvalue()
        except HTTPError as error:
            return 'API error: {}. Try # {} failed.'.format(error.response, retry)
I think the difference you observe between HttpResponse and FileResponse is explained by the spec: https://www.python.org/dev/peps/pep-3333/#buffering-and-streaming
In your previous code, an HttpResponse was created with one huge byte string containing your whole file, and the first iteration pass returned the complete response body. With a FileResponse, the file is iterated in chunks (of 4 KB, 8 KB or another size depending on your WSGI app server), which (I think) are streamed immediately upstream (to the reverse proxy, then the client), which may add overhead (more communication across process boundaries?).
It would help to know the app server used (uwsgi, gunicorn, waitress, other) and its relevant config. Also more details about the Heroku error, in case that can be solved!
Why do you store the whole file in the database?
The best approach is to store the file on disk and keep only its path in the database.
Then, depending on your web server, you can let the web server serve the file.
Web servers serve files better than Django.
If the files need no access check, store them under media and serve them directly.
If your files are access-controlled, then depending on your web server you can use certain response headers.
If you use Nginx you must use X-Accel-Redirect; other web servers have their own alternatives. Tutorial at https://wellfire.co/learn/nginx-django-x-accel-redirects/
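A minimal sketch of the X-Accel-Redirect idea (the header is real Nginx behavior; the /protected/ prefix, paths, and view wiring are assumptions). Django does the permission check and returns an almost-empty response; Nginx sees the header and streams the file itself from an internal location:

```python
def protected_file_headers(relative_path, filename, contenttype):
    """Headers telling Nginx to serve the file from an internal
    location (configured in nginx.conf, sketched in the comment below)."""
    return {
        # /protected/ must match an `internal` location block in Nginx
        "X-Accel-Redirect": "/protected/%s" % relative_path,
        "Content-Disposition": 'attachment; filename="%s"' % filename,
        "Content-Type": contenttype,
    }

# Matching Nginx config (an assumption, goes in nginx.conf):
#   location /protected/ {
#       internal;
#       alias /srv/app/files/;
#   }
#
# In a Django view you would copy these headers onto an empty
# HttpResponse after checking the user's permissions; Nginx then
# streams the file, so Django never holds the bytes in memory.
```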
The client app uploads a video file, and I need to generate a thumbnail, dump it to AWS S3, and return to the client a link to the thumbnail.
I searched around and found that ffmpeg fits the purpose.
The following is the code I could come up with:
import os
import tempfile
import traceback

from ffmpy import FFmpeg

def generate_thumbnails(file_name):
    output_file = tempfile.NamedTemporaryFile(suffix='.jpg', delete=False, prefix=file_name)
    output_file_path = output_file.name
    try:
        # Generate the thumbnail from a frame one second into the video
        ff = FFmpeg(
            inputs={file_name: None},
            outputs={output_file_path: ['-ss', '00:00:01', '-vframes', '1']},
        )
        ff.run()
        # upload generated thumbnail to s3 logic
        # return uploaded s3 path
    except Exception:
        error = traceback.format_exc()
        write_error_log(error)
    finally:
        os.remove(output_file_path)
    return ''
I am using Django and was greeted with a permission error for the above.
I found out later that ffmpeg requires the file to be on disk and doesn't take the in-memory uploaded file into account (I may be wrong, as this was my assumption).
Is there a way to make ffmpeg read an in-memory video file like a normal one, or should I use StringIO and dump it onto a temp file?
I would prefer not to do the latter, as it is an overhead.
Any alternative solution with a better benchmark would also be appreciated.
Thanks.
Update:
To save the in-memory uploaded file to disk: How to copy InMemoryUploadedFile object to disk
One of the possible ways I got it to work was as follows:
Steps:
a) Read the in-memory uploaded file onto a temp file, chunk by chunk:
temp_file = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False)
temp_file_path = temp_file.name
with open(temp_file_path, 'wb+') as destination:
    for chunk in in_memory_file_content.chunks():
        destination.write(chunk)
b) Generate the thumbnail using ffmpeg and subprocess:
ffmpeg_command = 'ffmpeg -y -i {} -ss 00:00:01 -vframes 1 {}'.format(video_file_path, thumbnail_file_path)
subprocess.call(ffmpeg_command, shell=True)
where:
-y overwrites the destination if it already exists
-ss 00:00:01 -vframes 1 grabs the frame at the one-second mark
More info on ffmpeg: https://ffmpeg.org/ffmpeg.html
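On the earlier question of skipping the temp file entirely: ffmpeg can read its input from stdin (`pipe:0`), so the uploaded file's bytes can be piped straight into the subprocess. This is a sketch under the assumption that the container format tolerates streamed input (MP4 files with the moov atom at the end do not, so test with real uploads before relying on it):

```python
import subprocess

def thumbnail_command(output_path):
    """ffmpeg argument list that reads the video from stdin and writes
    one frame (at the one-second mark) to output_path."""
    return [
        "ffmpeg", "-y",
        "-i", "pipe:0",     # read input from stdin instead of a file
        "-ss", "00:00:01",
        "-vframes", "1",
        output_path,
    ]

def thumbnail_from_bytes(video_bytes, output_path):
    """Pipe in-memory video bytes into ffmpeg (requires ffmpeg on PATH)."""
    subprocess.run(thumbnail_command(output_path), input=video_bytes, check=True)
```

With a Django upload this would be called as `thumbnail_from_bytes(uploaded_file.read(), path)`, trading the temp-file write for a pipe.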
I have a process that scans a tape library and looks for media that has expired, so the tapes can be removed and reused before being sent to an offsite vault. (We have some 7-day policies that never make it offsite.) This process takes around 20 minutes to run, so I didn't want it to run on demand when loading/refreshing the page. Instead, I set up a django-cron job (I know I could have done this in Linux cron, but I wanted the project to be as self-contained as possible) to run the scan and create a file in /tmp. I've verified that this works: the file exists in /tmp from this morning's execution. The problem I'm having is that now I want to display a list of those expired (scratch) media on my web page, but the script says it can't find the file. When the file was created, I used the absolute filename "/tmp/scratch.2015-11-13.out" (for example), but here's the error I get in the browser:
IOError at /
[Errno 2] No such file or directory: '/tmp/corpscratch.2015-11-13.out'
My assumption is that this is a "web root" issue, but I just can't figure it out. I tried copying the file to the /static/ and /media/ directories configured in Django, and even into the Django root directory and the project root directory, but nothing seems to work. When it says it can't find /tmp/file, where is it really looking?
def sample():
    """ Just testing """
    today = datetime.date.today()  # e.g. 2015-11-13
    inputfile = "/tmp/corpscratch.%s.out" % str(today)
    with open(inputfile) as fh:  # This is the line reporting the error
        lines = [line.strip('\n') for line in fh]
    print(lines)
The print statement was used for testing in the shell (which works, I might add), but the browser gives an error.
And the file does exist:
$ ls /tmp/corpscratch.2015-11-13.out
/tmp/corpscratch.2015-11-13.out
Thanks.
Edit: I was mistaken; it doesn't work in the Python shell either. I was thinking of a previous issue.
Use this instead:
today = datetime.datetime.today().date()
inputfile = "/tmp/corpscratch.%s.out" % str(today)
Or:
today = datetime.datetime.today().strftime('%Y-%m-%d')
inputfile = "/tmp/corpscratch.%s.out" % today # No need to use str()
See the difference:
>>> str(datetime.datetime.today().date())
'2015-11-13'
>>> str(datetime.datetime.today())
'2015-11-13 15:56:19.578569'
I ended up finding this elsewhere:
today = datetime.date.today()  # e.g. 2015-11-13
inputfilename = "tmp/corpscratch.%s.out" % str(today)
inputfile = os.path.join(settings.PROJECT_ROOT, inputfilename)
With settings.py containing the following:
PROJECT_ROOT = os.path.abspath(os.path.dirname(__file__))
Completely resolved my issues.
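A guess at why this fix works: a relative path resolves against the process's current working directory, which differs between an interactive shell and a web server process, so the django-cron job was probably writing to a tmp/ directory under the project root rather than the system /tmp. Anchoring the name to PROJECT_ROOT makes the location independent of the cwd. A small stdlib illustration (paths are made up):

```python
import os

def resolve(base, name):
    """Join a relative name onto a known base directory; if name is
    absolute, os.path.join discards base entirely."""
    return os.path.join(base, name)

# A relative name lands under the base directory, while an absolute
# name ignores the base:
#   resolve("/srv/app", "tmp/scratch.out")
#   resolve("/srv/app", "/tmp/scratch.out")
```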