Generate thumbnail for in-memory uploaded video file - Django

The client app uploaded a video file, and I need to generate a thumbnail, dump it to AWS S3, and return the thumbnail link to the client.
I searched around and found ffmpeg fit for the purpose.
The following is the code I came up with:
import os
import tempfile
import traceback

from ffmpy import FFmpeg

def generate_thumbnails(file_name):
    output_file = tempfile.NamedTemporaryFile(suffix='.jpg', delete=False, prefix=file_name)
    output_file_path = output_file.name
    try:
        # generate the thumbnail from the first frame of the video
        ff = FFmpeg(inputs={file_name: None},
                    outputs={output_file_path: ['-ss', '00:00:01', '-vframes', '1']})
        ff.run()
        # upload generated thumbnail to S3 logic
        # return uploaded S3 path
    except Exception:
        error = traceback.format_exc()
        write_error_log(error)
    finally:
        os.remove(output_file_path)
    return ''
Running this under Django, I was greeted with a permission error.
I found out later that ffmpeg requires the file to be on disk and doesn't take the in-memory uploaded file into account (I may be wrong, as I assumed this).
Is there a way to read an in-memory video file like a normal one using ffmpeg, or should I use StringIO and dump it onto a temp file?
I'd prefer not to do the latter, as it is overhead.
Any alternative solution with a better benchmark would also be appreciated.
Thanks.
Update:
To save the in-memory uploaded file to disk: How to copy InMemoryUploadedFile object to disk

One of the ways I got it to work was as follows:
Steps:
a) read the in-memory uploaded file onto a temp file chunk by chunk
temp_file = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False)
temp_file_path = temp_file.name
with open(temp_file_path, 'wb+') as destination:
    for chunk in in_memory_file_content.chunks():
        destination.write(chunk)
b) generate thumbnail using ffmpeg and subprocess
ffmpeg_command = 'ffmpeg -y -i {} -ss 00:00:01 -vframes 1 {}'.format(video_file_path, thumbnail_file_path)
subprocess.call(ffmpeg_command, shell=True)
where,
-y overwrites the destination if it already exists
-ss 00:00:01 seeks one second in, and -vframes 1 grabs a single frame there
More info on ffmpeg: https://ffmpeg.org/ffmpeg.html
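As for the original question of avoiding the temp file entirely: ffmpeg can also read its input from stdin (-i pipe:0), so the uploaded file's chunks can be streamed straight into a subprocess. This is only a sketch, not tested against every container format; notably, MP4 files whose moov atom sits at the end cannot be decoded from a non-seekable pipe, so the temp-file route above is often the safer one.
import subprocess

def thumbnail_from_upload(uploaded_file, thumbnail_file_path):
    # 'pipe:0' tells ffmpeg to read its input from stdin
    command = ['ffmpeg', '-y', '-i', 'pipe:0',
               '-ss', '00:00:01', '-vframes', '1', thumbnail_file_path]
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    # stream the Django UploadedFile into ffmpeg chunk by chunk
    for chunk in uploaded_file.chunks():
        proc.stdin.write(chunk)
    proc.stdin.close()
    proc.wait()
    return thumbnail_file_path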

Related

Why is the file uploaded to AWS S3 0B in size?

I am developing a web application with Flask as the backend and Nuxt.js as the frontend. I receive an image file from the frontend and can save it to my Flask directory structure locally. The file is fine and the image displays if I open it. Now I want to upload this image to AWS S3 instead of saving it to disk. I use the boto3 SDK; here is my code:
Here is my save_picture method, which opens the image file and resizes it. I had the save call, but commented it out to avoid saving the file to disk, as I want it only on S3.
def save_picture(object_id, form_picture, path):
    if form_picture is None:
        return None
    random_hex = token_hex(8)
    filename = form_picture.filename
    if '.' not in filename:
        return None
    extension = filename.rsplit('.', 1)[1].lower()
    if not allowed_file(extension, form_picture):
        return None
    picture_fn = f'{object_id}_{random_hex}.{extension}'
    picture_path = current_app.config['UPLOAD_FOLDER'] / path / picture_fn
    # resizing image and saving the small version
    output_size = (1280, 720)
    i = Image.open(form_picture)
    i.thumbnail(output_size)
    # i.save(picture_path)
    return picture_fn
image_name = save_picture(object_id=new_object.id, form_picture=file, path=f'{object_type}_images')
s3 = boto3.client(
    's3',
    aws_access_key_id=current_app.config['AWS_ACCESS_KEY'],
    aws_secret_access_key=current_app.config['AWS_SECRET_ACCESS_KEY']
)
print(file)  # this prints <FileStorage: 'Capture.JPG' ('image/jpeg')>, so the file is ok
try:
    s3.upload_fileobj(
        file,
        current_app.config['AWS_BUCKET_NAME'],
        image_name,
        ExtraArgs={
            'ContentType': file.content_type
        }
    )
except Exception as e:
    print(e)
    return make_response({'msg': 'Something went wrong.'}, 500)
I can see the uploaded file in my S3 bucket, but it shows 0 B in size, and if I download it, it cannot be viewed.
I have tried different access policies in S3, as well as many tutorials online; nothing seems to help. Changing the version of S3 to v3 when creating the client breaks the whole system, and the file is not uploaded at all, with an access error.
What could be the reason for this upload failure? Is it the AWS config or something else?
Thank you!
Thanks to @jarmod I tried to avoid the image processing and it worked. I now resize the image, save it to disk, open the saved image (not the initial file), and send that to S3. I then delete the image on disk, as I don't need it.
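The likely root cause of the 0-byte object, for anyone landing here: Image.open(form_picture) reads the FileStorage stream to its end, and upload_fileobj uploads from the stream's current position onward, so boto3 sends zero bytes. Two sketches of fixes that skip the disk round trip entirely (upload_resized is a hypothetical helper; the other names come from the code above):
import io

# Fix 1: rewind the original stream before handing it to boto3.
file.seek(0)
s3.upload_fileobj(file, current_app.config['AWS_BUCKET_NAME'], image_name,
                  ExtraArgs={'ContentType': file.content_type})

# Fix 2: upload the resized PIL image from an in-memory buffer instead.
def upload_resized(s3, pil_image, bucket, key):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='JPEG')  # serialize the thumbnail to memory
    buffer.seek(0)
    s3.upload_fileobj(buffer, bucket, key, ExtraArgs={'ContentType': 'image/jpeg'})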

Re-encoding audio file to linear16 for google cloud speech api fails with '[Errno 30] Read-only file system'

I'm trying to convert an audio file to the linear16 format using the FFmpeg module. I've stored the audio file in one Cloud Storage bucket and want to move the converted file to a different bucket. The code works perfectly in VS Code and deploys successfully to Cloud Functions, but fails with [Errno 30] Read-only file system when run on the cloud.
Here's the code
from google.cloud import speech
from google.cloud import storage
import ffmpeg
import sys

out_bucket = 'encoded_audio_landing'
input_bucket_name = 'audio_landing'

def process_audio(input_bucket_name, in_filename, out_bucket):
    '''
    Converts audio encoding for GSK call center call recordings to linear16
    encoding and a 16,000 Hz sample rate.
    Params:
        in_filename: a GSK call audio file
    Returns an audio file encoded so that the Google Speech-to-Text API can transcribe it.
    '''
    storage_client = storage.Client()
    bucket = storage_client.bucket(input_bucket_name)
    blob = bucket.blob(in_filename)
    blob.download_to_filename(blob.name)
    print('type contents: ', type('processedfile'))
    #print('blob name / len / type', blob.name, len(blob.name), type(blob.name))
    try:
        out, err = (
            ffmpeg.input(blob.name)
            .output('pipe:', format="s16le", acodec="pcm_s16le", ac=1, ar="16k")
            .overwrite_output()
            .run(capture_stdout=True, capture_stderr=True)
        )
    except ffmpeg.Error as e:
        print(e.stderr, file=sys.stderr)
        sys.exit(1)
    up_bucket = storage_client.bucket(out_bucket)
    up_blob = up_bucket.blob(blob.name)
    #print('type / len out', type(out), len(out))
    up_blob.upload_from_string(out)
    # delete source file
    blob.delete()
def hello_gcs(event, context):
    """Background Cloud Function to be triggered by Cloud Storage.
    This generic function logs relevant data when a file is changed,
    and works for all Cloud Storage CRUD operations.
    Args:
        event (dict): The dictionary with data specific to this type of event.
            The `data` field contains a description of the event in
            the Cloud Storage `object` format described here:
            https://cloud.google.com/storage/docs/json_api/v1/objects#resource
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the output is written to Cloud Logging
    """
    #print('Event ID: {}'.format(context.event_id))
    #print('Event type: {}'.format(context.event_type))
    print('Bucket: {}'.format(event['bucket']))
    print('File: {}'.format(event['name']))
    print('Metageneration: {}'.format(event['metageneration']))
    #print('Created: {}'.format(event['timeCreated']))
    #print('Updated: {}'.format(event['updated']))
    # convert audio encoding
    print('begin process_audio')
    process_audio(input_bucket_name, event['name'], out_bucket)
The problem was that I was downloading the file to my local directory, which obviously wouldn't work on the cloud. I read another article where someone added a get-file-path function and used its result as the input to blob.download_to_filename(). I'm not sure why that worked.
I did try just removing the whole download_to_filename bit, but it didn't work without it.
I'd very much appreciate an explanation if someone knows why. (The likely explanation: a Cloud Functions instance's filesystem is read-only except for /tmp, and tempfile.gettempdir() resolves to /tmp there, so that is the one place the blob can be downloaded to.)
# This gets around downloading the file to a local folder: it builds a path
# in the instance's temp directory (/tmp), which is writable on Cloud Functions.
import os
import tempfile
from werkzeug.utils import secure_filename

def get_file_path(filename):
    file_name = secure_filename(filename)
    return os.path.join(tempfile.gettempdir(), file_name)

def process_audio(input_bucket_name, in_filename, out_bucket):
    '''
    Converts audio encoding for GSK call center call recordings to linear16
    encoding and a 16,000 Hz sample rate.
    Params:
        in_filename: a GSK call audio file
        input_bucket_name: location of the source file that needs to be re-encoded
        out_bucket: where to put the newly encoded file
    Returns an audio file encoded so that the Google Speech-to-Text API can transcribe it.
    '''
    storage_client = storage.Client()
    bucket = storage_client.bucket(input_bucket_name)
    blob = bucket.blob(in_filename)
    print(blob.name)
    # build a temp-directory path for the file
    file_path = get_file_path(blob.name)
    blob.download_to_filename(file_path)
    print('type contents: ', type('processedfile'))
    #print('blob name / len / type', blob.name, len(blob.name), type(blob.name))
    # invoke the ffmpeg library to re-encode the audio file; it wraps the
    # command-line tool, and the keyword arguments to .output() are ffmpeg options
    try:
        out, err = (
            ffmpeg.input(file_path)
            .output('pipe:', format="s16le", acodec="pcm_s16le", ac=1, ar="16k")
            .overwrite_output()
            .run(capture_stdout=True, capture_stderr=True)
        )
    except ffmpeg.Error as e:
        print(e.stderr, file=sys.stderr)
        sys.exit(1)
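The corrected listing stops after the re-encode; the upload-and-cleanup step at the end of process_audio is presumably unchanged from the first version above, i.e. something like:
    # upload the re-encoded bytes to the destination bucket, then delete the source
    up_bucket = storage_client.bucket(out_bucket)
    up_blob = up_bucket.blob(blob.name)
    up_blob.upload_from_string(out)
    blob.delete()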

How to put and access a file with FFmpeg in Google Cloud Storage?

Hi, I am a novice developer and deployed my first Django project on Heroku.
When a user uploads a video file through the upload form in the Django project, I want to compress it with ffmpeg and save it to Google Cloud Storage, then extract the duration from the saved video using ffprobe and store it in the object's duration field.
The save() of my forms.py is as follows:
def save(self, *args, **kwargs):
    def clean_video(self):
        raw_video = self.cleaned_data.get("video")
        timestamp = int(time())
        raw_video_path = raw_video.temporary_file_path()
        print(raw_video_path)
        video_name = f"{raw_video}".split(".")[0]
        subprocess.run(f"ffmpeg -i {raw_video_path} -vcodec libx265 -crf 28 -acodec mp3 -y uploads/videoart_files/{video_name}_{timestamp}.mp4", shell=True)
        return f"videoart_files/{video_name}_{timestamp}.mp4"
    videoart = super().save(commit=False)
    videoart.video = clean_video(self)
    video_path = videoart.video.path
    get_duration = subprocess.check_output(['ffprobe', '-i', f'{video_path}', '-show_entries', 'format=duration', '-v', 'quiet', '-of', 'csv=%s' % ("p=0")])
    duration = int(float(get_duration.decode('utf-8').replace("\n", "")))
    videoart.duration = duration
    return videoart
After all the validation of the other fields, I put the video-processing code inside the save method so the video is compressed at the end. This code is not a problem on the local server, where it works very well. On the deployed server, however, I get a NotImplementedError ("This backend doesn't support absolute paths.").
Naturally, ffmpeg can receive the input video from temporary_file_path(), but it has no path to write its output to; the absolute path is not a GCS path.
And ffmpeg will not recognize a URL either. I'm not sure how to save a file created by ffmpeg on the server to GCS, or how to access it afterwards.
Could you give me some advice?
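One possible approach, sketched under the assumption that django-storages is configured with a GCS backend as the default storage (compress_to_gcs is an illustrative name): let ffmpeg write to a local temp file, which is writable even on Heroku's ephemeral filesystem, then hand that file to Django's storage API, which uploads it to GCS and returns a storage-relative name you can assign to the FileField.
import os
import subprocess
import tempfile

from django.core.files import File
from django.core.files.storage import default_storage

def compress_to_gcs(raw_video_path, target_name):
    # ffmpeg needs a real filesystem path for its output,
    # so write the compressed video to a local temp file first
    tmp = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False)
    tmp.close()
    subprocess.run(
        ['ffmpeg', '-i', raw_video_path, '-vcodec', 'libx265',
         '-crf', '28', '-acodec', 'mp3', '-y', tmp.name],
        check=True,
    )
    # hand the finished file to the storage backend (GCS via django-storages);
    # save() returns the actual name under which the file was stored
    with open(tmp.name, 'rb') as fh:
        stored_name = default_storage.save(target_name, File(fh))
    os.remove(tmp.name)
    return stored_name
For the duration, ffprobe likewise needs a local path, so it is easiest to probe tmp.name before the upload rather than the GCS object afterwards.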

Convert image to PDF with Django?

Receive multiple images as input from the user and convert them into a PDF. I don't understand how to implement this with Django.
Use this command to install the package:
pip install img2pdf
Below is the implementation:
An image can be converted into PDF bytes using the img2pdf.convert() function provided by the img2pdf module; the PDF file is then opened in wb mode and the bytes are written to it.
# Python3 program to convert an image to PDF
# using the img2pdf library
import img2pdf
from PIL import Image

# storing image path
img_path = "C:/Users/Admin/Desktop/GfG_images/do_nawab.png"
# storing pdf path
pdf_path = "C:/Users/Admin/Desktop/GfG_images/file.pdf"
# opening image
image = Image.open(img_path)
# converting the image into PDF bytes using img2pdf
pdf_bytes = img2pdf.convert(image.filename)
# opening or creating pdf file
file = open(pdf_path, "wb")
# writing the pdf bytes
file.write(pdf_bytes)
# closing image file
image.close()
# closing pdf file
file.close()
print("Successfully made pdf file")
Pillow supports the PDF format. Documentation is available here.
from PIL import Image

img = Image.open('/path/to/image.jpg')
img = img.convert('RGB')  # this removes the alpha channel from .png images
img.save('/path/to/image.pdf', format="PDF")
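Neither answer shows the Django side. A minimal sketch of a view that accepts multiple uploaded images and returns the combined PDF (the field name images and the view name are assumptions, to be wired up in your own urls.py); img2pdf.convert() also accepts raw image bytes, so the uploads never need to touch disk:
import img2pdf
from django.http import HttpResponse

def images_to_pdf(request):
    # read every uploaded file from a multi-file input named "images"
    files = request.FILES.getlist('images')
    pdf_bytes = img2pdf.convert([f.read() for f in files])
    response = HttpResponse(pdf_bytes, content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename="images.pdf"'
    return response
With the Pillow route, several images can likewise go into one PDF via img.save(..., save_all=True, append_images=[...]).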

Django Filename Generation Issue

I have a Django-based media server that accepts a variety of video formats for upload. While uploading a large .wmv file, I noticed some strange behavior. The first time I uploaded the video, it took almost five minutes to convert and upload. Thereafter, some sort of caching occurred, and the video would merely point to the one I had previously uploaded. I don't understand why this is happening. When a video is uploaded, the file name extension is checked for conversion purposes, and then an ffmpeg command is executed to carry out the conversion. This all runs asynchronously, using django-celery with RabbitMQ as the message broker. I don't see any reason why the ffmpeg conversion command would not execute again. Here is my code for the celery task that handles the upload. (This was my initial reasoning; see the EDIT for the correct error diagnosis.)
@celery.task
def handleFileUploadAsync(update, m, file_type, video_types):
    filename = m.file.name.replace(' ', '\\ ')
    if video_types[file_type] == 'wmv':
        os.system(
            "ffmpeg -i " + MEDIA_ROOT + filename + " -strict experimental -vcodec libx264 -profile:v baseline " + MEDIA_ROOT + filename.replace(video_types[file_type], 'mp4')
        )
        m.file.name = m.file.name.replace(video_types[file_type], 'mp4')
        m.save()
        os.remove(m.file.path.replace('mp4', 'wmv'))
    elif file_type in video_types.keys():
        os.system(
            "ffmpeg -i " + MEDIA_ROOT + filename + " -vcodec libx264 -profile:v baseline -s 672x576 " + MEDIA_ROOT + filename.replace(video_types[file_type], 'mp4')
        )
        m.file.name = m.file.name.replace(video_types[file_type], 'mp4')
        m.save()
        if video_types[file_type] != 'mp4':
            os.remove(m.file.path.replace('mp4', video_types[file_type]))
EDIT:
Here's the problem. When I convert videos, I only want the converted .mp4 file, not the original upload. Django generates filenames from the file upload field, automatically appending numbers to the names of files that already exist (i.e. test.mp4, test_1.mp4, test_2.mp4, etc.). However, when I upload a video like test.wmv, there will be no file named test.wmv after the conversion is complete (I delete the non-converted file), so a second test.wmv upload gets the same name and its converted file collides with the existing test.mp4. Is there any way I can modify the Django method that generates these filenames?
Use upload_to when declaring the FileField. Maybe use the object's primary key as the filename?
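A minimal sketch of that suggestion, assuming the model instance may be unsaved when the file arrives (so a UUID stands in for the primary key; the names VideoFile and video_upload_path are illustrative):
import uuid
from django.db import models

def video_upload_path(instance, filename):
    # ignore the user-supplied name entirely and generate a unique one,
    # so converted .mp4 files can never collide with an earlier upload
    extension = filename.rsplit('.', 1)[-1].lower()
    return f'videos/{uuid.uuid4().hex}.{extension}'

class VideoFile(models.Model):
    file = models.FileField(upload_to=video_upload_path)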