Is it possible in Django to load data into memory, process it, and return it to the user?
Example:
User uploads an image -> a Python script processes it (e.g. converts it to black-and-white) -> the user sees the processed image on the same web page
(Other examples would be all those online conversion sites like pdf2doc.com.)
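To make the example concrete, here is a minimal sketch of what I have in mind, assuming Pillow and a form field named "image" (the view and field names are my own choices):

from io import BytesIO

from django.http import HttpResponse
from PIL import Image

def convert_image(request):
    # request.FILES holds the upload; small files stay in memory by default
    upload = request.FILES["image"]
    # "L" converts to 8-bit grayscale, i.e. black-and-white
    img = Image.open(upload).convert("L")
    # Write the result into an in-memory buffer instead of a file
    buf = BytesIO()
    img.save(buf, format="PNG")
    return HttpResponse(buf.getvalue(), content_type="image/png")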
Is it a bad idea to load it into memory?
Would a queue and saving it on a CDN be a better solution?
I'm trying to understand the options for handling files from the user which don't need to be saved. I appreciate any further ideas/considerations.
I have a general understanding question. I am building a Flutter app that relies on a content library containing text files, LaTeX equations, images, PDFs, videos, etc.
The content lives on an AWS Amplify backend. Depending on the user's navigation in the app, the corresponding data is fetched and displayed.
I am not sure about the correct way of fetching the data. The current method (which works) is that the data is stored in an S3 bucket. When data is requested, it is downloaded to a temporary directory and then opened and processed in the app. This is actually not slow, but I feel that it is not the way it should be done.
When data is downloaded, a file-transfer notification pops up, which bothers me because it is shown all the time. Also, I would like to read the data directly with something like a GET request, without downloading the file first (especially for text files, which I would like to read directly into a String). But here I don't know how it works, because I don't see that you can save data in a file system with the other Amplify services like DataStore or the REST API. On the other hand, the S3 bucket is an intuitive way of storing data that is easy to use for the content creators of my company, so to me the bucket seems the way to go. However, with S3 I have only figured out the download method for fetching data.
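To make it concrete, the behaviour I am looking for is roughly this (sketched in Python with boto3 rather than Dart, just to show the S3 calls involved; bucket and key names are made up):

import boto3

s3 = boto3.client("s3")

# Read a text object straight into a string; no temp file involved
resp = s3.get_object(Bucket="my-content-bucket", Key="lessons/intro.txt")
text = resp["Body"].read().decode("utf-8")

# Or mint a short-lived HTTPS URL the app could fetch with a plain GET
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-content-bucket", "Key": "lessons/intro.txt"},
    ExpiresIn=300,
)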
Could someone give me a hint on what is the correct approach for this use case? Thank you very much!
I'm trying to build an application with a backend in Java that allows users to create a text with images in it (something like a personal blog). I'm planning to store these images in an S3 bucket. When uploading image files to the bucket, I hash the original name and store the hashed one in the bucket. Images are for display purposes only; no user will be able to download them. The frontend displays these images by getting a path to them from the server. So the question is: is there any need to store the original name of the image file in the database? And what are the reasons, if any, for doing so?
I guess in general it is not needed, because what matters more is how these resources are used and managed in the system.
Assuming your service is something like data access (similar to Google Drive), I don't think it's necessary to store it in the DB, unless you want to make faster search queries.
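One side note if you keep hashing names: hash something unique rather than the original name alone, or two uploads of photo.jpg will collide on the same key. A hypothetical helper, sketched in Python for brevity:

import hashlib
import os
import uuid

def object_key(original_name: str) -> str:
    # Salt with a UUID so two files with the same original name
    # never map to the same S3 key; keep the extension for Content-Type
    _, ext = os.path.splitext(original_name)
    digest = hashlib.sha256(f"{uuid.uuid4()}:{original_name}".encode()).hexdigest()
    return digest + ext.lower()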
I am developing an application with Django and AWS S3, hosted on Heroku.
At one point users have to upload multiple large files, totaling around 150MB each time.
I have tried various approaches.
1st attempt: directly call the save method of the Django form:
Result: the request takes more than 30 seconds and returns a timeout.
2nd attempt: temporarily save the file to a Heroku directory and read it from a Celery task.
Result: Cannot save because it throws FileNotFoundError: [Errno 2] No such file or directory on production.
3rd attempt: pass the uploaded files (in-memory files) to a Celery task, but the bytes cannot be serialized and passed to the task with either JSON or pickle.
Could anyone help me, please?
Thanks in advance.
Another approach would be:
Expose an API that generates a presigned URL for the frontend.
Upload the files with that URL from the frontend, asynchronously. That offloads the work from your backend.
After a successful upload you have the URL of the file's location; save that S3 URL, along with the other fields' data, to your Django model.
You can upload files larger than 150MB with this method, and your system will be scalable.
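A minimal sketch of the presign endpoint using boto3 (view, parameter, and bucket names are assumptions, not from your code):

import boto3
from django.http import JsonResponse

def presign_upload(request):
    # Hypothetical query parameter carrying the client's file name
    key = f"uploads/{request.GET['filename']}"
    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-bucket", "Key": key},
        ExpiresIn=3600,  # URL stays valid for one hour
    )
    return JsonResponse({"upload_url": url, "key": key})

The frontend then PUTs the file bytes straight to upload_url, so the 150MB never passes through your dyno and Heroku's 30-second request timeout stops being a factor.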
I want to do the following: a user in a browser types some text, and after he presses a 'Save' button the text should be saved in a file (for example content.txt) in a folder (for example /username_text) at the root of an S3 bucket.
Also, I want the user to be able, when he visits the same page, to load the content from S3 and continue working on the file. Then, when he/she is done, save the file to S3 again.
Probably important to mention, but I plan on using NodeJS for my back-end...
My question now is: what is the best way to set this storing-and-retrieving flow up? Do I create an API Gateway + Lambda function and GET and POST files through that? Or do I, for example, use the aws-sdk in Node to push and pull files from S3 directly? Or is there a better way to do this?
I looked at the following two guides:
Using AWS S3 Buckets in a NodeJS App – Codebase – Medium
Image Upload and Retrieval from S3 Using AWS API Gateway and Lambda
Welcome to StackOverflow!
I think you are worrying too much about the not-so-important stuff. S3 is nothing but a storage system. You could have decided to store the content of these files in DynamoDB, RDS, etc. What would you do if you stored their contents in these real databases? You'd fetch the data and display it to the user, wouldn't you?
This is what you need to do with S3! S3 is a smart choice in your scenario because your "file" can grow very big, and S3 is a great place for storing files. However, apparently, you're not actually storing files (think of .pdf, .mp4, .mov, etc.); you're essentially only storing human-readable text.
So here's one approach on how to solve your problem:
FETCHING FILE CONTENT
User logs in
You fetch the user's personal information based on some token. You can store all the metadata in DynamoDB, where, given a user_id, you fetch all the "files" belonging to that user. These "files" (metadata only) would hold the bucket and key of the actual file on S3.
You use the getObject API from S3 to fetch the file based on your query and display the body of your file to your user in a RESTful way. Your response should look something like this:
{
  "content": "some content"
}
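A minimal sketch of that fetch path as a Python Lambda handler (table, bucket, and field names here are assumptions, not prescriptions; error handling omitted):

import json

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

def handler(event, context):
    # Assumes an API Gateway proxy integration routing /files/{file_id}
    file_id = event["pathParameters"]["file_id"]
    # Look up the bucket/key metadata stored for this file
    meta = dynamodb.Table("files").get_item(Key={"file_id": file_id})["Item"]
    obj = s3.get_object(Bucket=meta["bucket"], Key=meta["key"])
    return {
        "statusCode": 200,
        "body": json.dumps({"content": obj["Body"].read().decode("utf-8")}),
    }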
SAVING FILE CONTENT
User logs in
The user writes anything in a form and submits it. In your Lambda function, you grab the content of this form and process it. This request should look something like this:
{
  "file_id": "some-id",
  "user_id": "some-id",
  "content": "some-content"
}
If the file_id exists, update the content in S3. Otherwise, upload a new file to S3 and then create a new entry in DynamoDB. You'd then, of course, have to check that the user submitting the changes actually owns the file; if you're using UUIDs it shouldn't be too much of a problem, but it's still worth checking in case an ID is leaked somehow.
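The same flow sketched in Python (again, every name here is an assumption):

import json
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
BUCKET = "my-content-bucket"  # placeholder bucket name

def handler(event, context):
    body = json.loads(event["body"])
    table = dynamodb.Table("files")
    file_id = body.get("file_id") or str(uuid.uuid4())
    item = table.get_item(Key={"file_id": file_id}).get("Item")
    if item and item["user_id"] != body["user_id"]:
        # The ownership check mentioned above
        return {"statusCode": 403, "body": json.dumps({"error": "forbidden"})}
    key = item["key"] if item else f"{body['user_id']}/{file_id}.txt"
    s3.put_object(Bucket=BUCKET, Key=key, Body=body["content"].encode("utf-8"))
    if not item:
        # First save: record the metadata so future fetches can find it
        table.put_item(Item={
            "file_id": file_id,
            "user_id": body["user_id"],
            "bucket": BUCKET,
            "key": key,
        })
    return {"statusCode": 200, "body": json.dumps({"file_id": file_id})}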
This way, you don't need to worry about uploading/downloading files, which are CPU-intensive tasks, so you can keep your costs low and use very little RAM in your functions (128MB should be more than enough); after all, you're now only serving text. Not only will this simplify your design, it will also make things simpler both in API Gateway and in your code, as you won't have to deal with binary types. The most you'll do is convert the buffer from S3 to a String when serving some content, and that should be completely fine.
EDIT
Regarding your question about whether you should upload from the browser or not, I suggest you take a look at this answer, where I cover the pros and cons of doing it via API Gateway vs. from the browser.
I'm having problems when uploading lots of files in Django. The context is the following: I have a spreadsheet with one or more columns containing image filenames; those images are uploaded through a form with an input of type=file and the multiple option.
With a few rows, say 70, everything goes fine. But with more rows, and consequently more images, an IOError occurs at random positions.
I've checked several questions about file/image upload in Django but couldn't find any that is related to my problem.
The model I'm using is the Product model of LFS (www.getlfs.com). We are developing a system that is based on LFS and to facilitate the creation of dozens of products in batch we wrote some views and templates to receive the main product properties through a spreadsheet. Each line is a product and the columns are the desired properties.
LFS uses a custom class ImageWithThumbsField(ImageField) to store the product's image, and when the product instance (built from the spreadsheet) is saved, all thumbnails are generated. This is a time-consuming (CPU-bound) task, and my initial guess is that for some reason the temporary file is deleted before all processing has occurred.
Is there a way to keep these uploaded files around for longer? Is there another suggested approach for processing hundreds of uploaded files? Any hints on what might be happening?
I hope you can understand my question. I can post code if needed.
Links to relevant portions of LFS code:
where thumbnails are generated:
https://github.com/diefenbach/django-lfs/blob/master/lfs/core/fields/thumbs.py
product model
https://github.com/diefenbach/django-lfs/blob/master/lfs/catalog/models.py
Thanks in advance!
It sounds like you are running out of memory. When Django processes uploads, until the form is validated, all of the files are either:
kept in memory inside the Python/WSGI process/worker (the usual mode of operation for runserver).
In this case, you are uploading enough photos to fill up the process's memory and running out of space. As you can imagine, where the IOError happens will be non-deterministic (GC-dependent).
temporarily stored in /tmp/ (the usual setup for Apache).
In this case, the webserver's ramfs fills up with images that have not yet been written to disk, and it should IOError around the same place each time.
In either case, you should not be bulk uploading images in this way anyway. Apache/Django is not designed for it. Try uploading a single product/image per request/response, and all your problems will go away.
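If you must keep the batch upload for now, one hedged stopgap is to tune Django's upload settings so files spill to a large on-disk directory instead of RAM (these are real settings; the values and path are illustrative):

# settings.py
# Lower the threshold at which uploads move from RAM to temp files
FILE_UPLOAD_MAX_MEMORY_SIZE = 1 * 1024 * 1024  # default is 2.5 MB

# Skip the in-memory handler entirely; always stream uploads to disk
FILE_UPLOAD_HANDLERS = [
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]

# Point the temp dir at real disk, away from a small ramfs /tmp
FILE_UPLOAD_TEMP_DIR = "/var/tmp/uploads"

But one product/image per request remains the robust fix.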