I have a music uploading app and believe that it would be smart to pass the files to a celery task to handle uploading. However, when attempting to pass the files, as I will show in my code below, I get a message stating that they are not JSON serializable. What would be the correct way to handle this operation?
Everything below uploaded_songs in .views.py is my current code that successfully uploads the audio tracks. It doesn't, however, utilize celery yet.
.task.py
from django.contrib.auth import get_user_model
from Beyond_April_Base_Backend.celery import app
from django.contrib.auth.models import User
#app.task
def upload_songs(songs, user_id):
try:
user = User.objects.get(pk=user_id)
print('user and songs')
print(user)
print(songs)
except User.DoesNotExist:
logging.warning("Tried to find non-exisiting user '%s'" % user_id)
.views.py
class ConcertUploadView(APIView):
permission_classes = [permissions.IsAuthenticated]
def post(self, request):
track_files = request.FILES.getlist('files')
current_user = self.request.user
upload_songs.delay(track_files, current_user.pk)
try:
selected_band = Band.objects.get(name=request.data['band'])
except ObjectDoesNotExist:
print('band not received from form')
selected_band = Band.objects.get(name='Band')
venue_name = request.data['venue']
concert_date_str = request.data['concertDate']
concert_date_split = concert_date_str.split('(')[0]
concert_date = datetime.strptime(concert_date_split, '%a %b %d %Y %H:%M:%S %Z%z ')
concert_city = request.data['city']
concert_state = request.data['state']
concert_country = request.data['country']
new_concert = Concert(
venue=venue_name,
date=concert_date,
city=concert_city,
state=concert_state,
country=concert_country,
band=selected_band,
user=current_user,
)
new_concert.save()
i = 0
for song in track_files:
audio_metadata = music_tag.load_file(track_files[i].temporary_file_path())
temp_path = song.temporary_file_path
song_title = str(audio_metadata['title'])
audio_file_instance = Song(
title=song_title,
concert=new_concert,
user=current_user,
concert_order = i + 1,
audio_file = track_files[i],
)
audio_file_instance.save()
i += 1
return Response(status=status.HTTP_201_CREATED)
When you create a celery task, it serializes the arguments so that it can store the message in the queue backend (RabbitMQ, Redis, etc). The default serializer is JSON, and a binary file is not JSON-serializable. See celery's serialization docs for more info.
You could base64 encode the binary file to text, but you shouldn't: it will increase the size of the data, and you'll be passing around potentially very large messages. With lots of large messages, you could run out of memory/space in your backend, and it will make it hard to inspect or log messages.
Instead, you should store the binary file somewhere, and pass a reference (filename, S3 URL, database key, etc) to the task. The task can then load the file, do what it needs to, and delete the original (if appropriate).
Related
How can I download a large CSV file that shows me a 502 bad gateway error?
I get this solution I added in below.
Actually, in this, we use streaming references. In this concept for example we download a movie it's will download in the browser and show status when complete this will give the option to show in a folder same as that CSV file download completely this will show us.
There is one solution for resolving this error to increase nginx time but this is will affect cost so better way to use Django streaming. streaming is like an example when we add a movie for download it's downloading on the browser. This concept is used in Django streaming.
Write View for this in Django.
views.py
from django.http import StreamingHttpResponse
503_ERROR = 'something went wrong.'
DASHBOARD_URL = 'path'
def get_headers():
return ['field1', 'field2', 'field3']
def get_data(item):
return {
'field1': item.field1,
'field2': item.field2,
'field3': item.field3,
}
class CSVBuffer(object):
def write(self, value):
return value
class Streaming_CSV(generic.View):
model = Model_name
def get(self, request, *args, **kwargs):
try:
queryset = self.model.objects.filter(is_draft=False)
response = StreamingHttpResponse(streaming_content=(iter_items(queryset, CSVBuffer())), content_type='text/csv', )
file_name = 'Experience_data_%s' % (str(datetime.datetime.now()))
response['Content-Disposition'] = 'attachment;filename=%s.csv' % (file_name)
except Exception as e:
print(e)
messages.error(request, ERROR_503)
return redirect(DASHBOARD_URL)
return response
urls.py
path('streaming-csv/',views.Streaming_CSV.as_view(),name = 'streaming-csv')
For reference use the below links.
https://docs.djangoproject.com/en/4.0/howto/outputting-csv/#streaming-large-csv-files
GIT.
https://gist.github.com/niuware/ba19bbc0169039e89326e1599dba3a87
GIT
Adding rows manually to StreamingHttpResponse (Django)
I am trying to upload and process excel file using Django and DRF with Celery.
There is an issue when I am trying to pass the file to my Celery task to be processed in the background, I get a following error:
kombu.exceptions.EncodeError: Object of type InMemoryUploadedFile is not JSON serializable
Here is my view post request handler:
class FileUploadView(generics.CreateAPIView):
"""
POST: upload file to save data in the database
"""
parser_classes = [MultiPartParser]
serializer_class = FileSerializerXLSX
def post(self, request, format=None):
"""
Allows to upload file and lets it be handled by pandas
"""
serialized = FileSerializerXLSX(data=request.data)
if serialized.is_valid():
file_obj = request.data['file']
# file_bytes = file_obj.read()
print(file_obj)
import_excel_task.delay(file_obj)
print("its working")
return Response(status=204)
return Response(serialized._errors, status=status.HTTP_400_BAD_REQUEST)
And my celery task:
def import_excel_helper(file_obj):
df = extract_excel_to_dataframe(file_obj)
transform_df_to_clientmodel(df)
transform_df_to_productmodel(df)
transform_df_to_salesmodel(df)
#shared_task(name="import_excel_task")
def import_excel_task(file_obj):
"""Save excel file in the background"""
logger.info("Importing excel file")
import_excel_helper(file_obj)
Any idea what is the way to handle importing Excel files into celery task so that it can be processed by other functions in the background?
As in the error, the body of the request to call a celery task must be JSON serializable since it is the default configuration. Then as documented in kombu:
The primary disadvantage to JSON is that it limits you to the following data types: strings, Unicode, floats, boolean, dictionaries, and lists. Decimals and dates are notably missing.
Let's say this is my excel file.
file.xlsx
Some
Value
Here
:)
Solution 1
Convert the raw bytes of the excel into Base64 string before calling the task so that it can be JSON serialized (since strings are valid data types in a JSON document, raw bytes are not). Then, everything else in the Celery configurations are the same default values.
tasks.py
import base64
import pandas
from celery import Celery
app = Celery('tasks')
#app.task
def add(excel_file_base64):
excel_file = base64.b64decode(excel_file_base64)
df = pandas.read_excel(excel_file)
print("Contents of excel file:", df)
views.py
import base64
from tasks import add
with open("file.xlsx", 'rb') as file: # Change this to be your <request.data['file']>
excel_raw_bytes = file.read()
excel_base64 = base64.b64encode(excel_raw_bytes).decode()
add.apply_async((excel_base64,))
Output
[2021-08-19 20:40:28,904: INFO/MainProcess] Task tasks.add[d5373444-485d-4c50-8695-be2e68ef1c67] received
[2021-08-19 20:40:29,094: WARNING/ForkPoolWorker-4] Contents of excel file:
[2021-08-19 20:40:29,094: WARNING/ForkPoolWorker-4]
[2021-08-19 20:40:29,099: WARNING/ForkPoolWorker-4] Some Value
0 Here :)
[2021-08-19 20:40:29,099: WARNING/ForkPoolWorker-4]
[2021-08-19 20:40:29,099: INFO/ForkPoolWorker-4] Task tasks.add[d5373444-485d-4c50-8695-be2e68ef1c67] succeeded in 0.19386404199940444s: None
Solution 2:
This is the harder way. Implement a custom serializer that will handle excel files.
tasks.py
import ast
import base64
import pandas
from celery import Celery
from kombu.serialization import register
def my_custom_excel_encoder(obj):
"""Uncomment this block if you intend to pass it as a Base64 string:
file_base64 = base64.b64encode(obj[0][0]).decode()
obj = list(obj)
obj[0] = [file_base64]
"""
return str(obj)
def my_custom_excel_decoder(obj):
obj = ast.literal_eval(obj)
"""Uncomment this block if you passed it as a Base64 string (as commented above in the encoder):
obj[0][0] = base64.b64decode(obj[0][0])
"""
return obj
register(
'my_custom_excel',
my_custom_excel_encoder,
my_custom_excel_decoder,
content_type='application/x-my-custom-excel',
content_encoding='utf-8',
)
app = Celery('tasks')
app.conf.update(
accept_content=['json', 'my_custom_excel'],
)
#app.task
def add(excel_file):
df = pandas.read_excel(excel_file)
print("Contents of excel file:", df)
views.py
from tasks import add
with open("file.xlsx", 'rb') as excel_file: # Change this to be your <request.data['file']>
excel_raw_bytes = excel_file.read()
add.apply_async((excel_raw_bytes,), serializer='my_custom_excel')
Output
Same as Solution 1
Solution 3
You might be interested with this documentation of Sending raw data without Serialization
I am trying to fetch data from twitter for processing. Please see the code I want various data corresponding to a particular tweet corresponding to a given topic. I am able to fetch data (created_at, text, username, user_id). It shows error when i try to fetch(location, followers_count, friends_count, retweet_count).
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time
import json
ckey = '***********************'
csecret = '************************'
atoken ='*************************'
asecret = '**********************'
class listener(StreamListener):
def on_data(self,data):
try:
all_data = json.loads(data)
tweet = all_data["text"]
username = all_data["user"]["screen_name"]
timestamp = all_data["created_at"]
user_id = all_data["id_str"]
location = all_data["location"]
followers_count = all_data["followers_count"]
friends_count = all_data["friends_count"]
retweet_count = all_data["retweet_count"]
saveThis = str(time.time())+'::'+timestamp+'::'+username+'::'+user_id+'::'+tweet+'::'+followers_count+'::'+friends_count+'::'+retweet_count+'::'+location
saveFile = open('clean2.txt','a')
saveFile.write(saveThis)
saveFile.write('\n')
saveFile.close
return True
except BaseException, e:
print 'failed on data,',str(e)
time.sleep(5)
def on_error(self, status):
print status
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["tweepy"])#topic
The reason it fails on all_data["location"] is that tweets don't have such a property: https://dev.twitter.com/overview/api/tweets
same with friends_count, followers_count - they are properties of users, not tweets.
The code should not be failing on all_date["retweet_count"] as tweets have such a property.
P.S. please include the error message (even if you skip the full error trackback) when reporting errors. makes it's easier to help you, otherwise one has to guess what the error might be.
I would like to transcode user uploaded videos using celery. I think first I should upload the video, and spawn a celery task for transcoding.
Maybe something like this in the tasks.py:
subprocess.call('ffmpeg -i path/.../original path/.../output')
Just completed First steps with celery, so confused how to do so in the views.py and tasks.py. Also is it a good solution? I would really appreciate your help and advice. Thank you.
models.py:
class Video(models.Model):
user = models.ForeignKey(User)
title = models.CharField(max_length=100)
original = models.FileField(upload_to=get_upload_file_name)
mp4_480 = models.FileField(upload_to=get_upload_file_name, blank=True, null=True)
mp4_720 = models.FileField(upload_to=get_upload_file_name, blank=True, null=True)
privacy = models.CharField(max_length=1,choices=PRIVACY, default='F')
pub_date = models.DateTimeField(auto_now_add=True, auto_now=False)
my incomplete views.py:
#login_required
def upload_video(request):
if request.method == 'POST':
form = VideoForm(request.POST, request.FILES)
if form.is_valid():
if form.cleaned_data:
user = request.user
#
#
# No IDEA WHAT TO DO NEXT
#
#
return HttpResponseRedirect('/')
else:
form = VideoForm()
return render(request, 'upload_video.html', {
'form':form
})
I guess you already have solved the problem but I will provide a bit more information to what already said GwynBleidD because I had the same issue.
So as GwynBleidD you need to call Celery tasks, but how to code those tasks ? here is the structure :
the task get the video from the database
it encodes it with ffmepg and outputs it anywhere you want
when done with the encoding, it sets the corresponding attribute to the model and saves it (be careful, if you run various tasks on the same video, do not save with the old instance, as you may lose information from other tasks running)
First, set a FFMPEG_PATH variable in your settings, then:
import os, subprocess
from .models import Video
#app.task
def encode_mp4(video_id, height):
try:
video = Video.objects.get(id = video_id)
input_file_path = video.original.path
input_file_name = video.original.name
#get the filename (without extension)
filename = os.path.basename(input_file_path)
# path to the new file, change it according to where you want to put it
output_file_name = os.path.join('videos', 'mp4', '{}.mp4'.format(filename))
output_file_path = os.path.join(settings.MEDIA_ROOT, output_file_name)
# 2-pass encoding
for i in range(1):
subprocess.call([FFMPEG_PATH, '-i', input_file_path, '-s', '{}x{}'.format(height * 16 /9, height), '-vcodec', 'mpeg4', '-acodec', 'libvo_aacenc', '-b', '10000k', '-pass', i, '-r', '30', output_file_path])
# Save the new file in the database
video.mp4_720.name = output_file_name
video.save(update_fields=['mp4_720'])
Modify your model so you can save original (uploaded) video without transcoded version(s) and maybe add some flag into your model that will save state if video was transcoded (and based on that flag you can display to user that video transcoding is still in progress).
After uploading video and saving it's model to database, run celery task passing ID of your video into it. In celery task retrieve video from database, transcode it and save it into database with changed flag.
I upload some data to a django view. Client:
from poster.encode import multipart_encode
def upload_data(upload_url, data, filename):
print "Uploading %d bytes to server, file=%s..." % (len(data), filename)
datagen, headers = multipart_encode({filename: data})
request = urllib2.Request(upload_url, datagen, headers)
# Actually do the request, and get the response
try:
resp_f = urllib2.urlopen(request, timeout=120)
except urllib2.URLError:
return None
res = resp_f.read()
resp_f.close()
return res
#...
def foo(self, event_dicts_td):
event_dicts_td_json = json.dumps(event_dicts_td)
res = upload_data(self.upload_url, event_dicts_td_json.encode('utf8').encode('zlib'), "event_dicts_td.json.gz")
The view:
def my_view(request):
event_dicts_td_json_gz = request.POST.get('event_dicts_td.json.gz')
if not event_dicts_td_json_gz:
return HttpResponse("fail")
print type(event_dicts_td_json_gz), repr(event_dicts_td_json_gz[:10])
event_dicts_td_json_gz = event_dicts_td_json_gz.encode("utf8")
print type(event_dicts_td_json_gz), repr(event_dicts_td_json_gz[:10])
event_dicts_td_json = event_dicts_td_json_gz.decode("zlib").decode("utf8")
return HttpResponse("it still failed")
The output:
<type 'unicode'> u'x\ufffd\ufffd]s\ufffd\u0192\ufffd\ufffd\n'
<type 'str'> 'x\xef\xbf\xbd\xef\xbf\xbd]s\xef'
This is not acceptable. I just need the raw bytes. I'm not uploading unicode - I'm uploading raw bytes - and I want those raw bytes back. I don't know how it's trying to decode it into unicode - apparently not using utf8 cause zlib was unable to decompress the data. (It was unable to decompress it even when I didn't try to do an .encode("utf8") before zlibbing-it, that was just a test.)
How do I make django not unicodify the POST variables? Or, if it does, how do I undo it?
You can undo this.
Try to use *smart_str* from django.utils.encoding:
from django.utils.encoding import smart_str
event_dicts_td_json_gz = smart_str( event_dicts_td_json_gz )
View the docs here please: https://docs.djangoproject.com/en/dev/ref/unicode/#useful-utility-functions