Heroku ephemeral storage, SendGrid, and attachments - Django

On occasion I need to send emails with attachments to users of my site. I am using SendGrid and python-sendgrid 0.1.4 to do the send. Email sending is queued through Redis.
Here's the issue -- where do I put the attachment, which is currently generated as part of the web process? I tried putting it in /tmp, which didn't work -- presumably because the file was deleted when the web process shut down and was no longer available when the worker process came by? I tried /app/media, which also didn't work -- I think because /app/media is read-only (though, oddly, I did not get any errors attempting to write to this directory)?
I think the answer may be that I have to refactor my code to generate the attachment in the same process as the email is sent, but as that is a pretty significant refactor, I thought I'd ask the community first. Thanks!

Heroku's /tmp directories are unique to each dyno. So your Web Dyno saves a file in its /tmp directory, then your worker looks in its /tmp directory and cannot find it.
The best option is likely refactoring your code (that way you aren't clogging up your Web Dyno's resources creating and writing files to disk). However, if you really want to avoid it, you could store your files temporarily on S3 [tutorial] or some other external storage mechanism.
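A minimal sketch of that temporary S3 hand-off, assuming boto3 and a hypothetical bucket/key scheme (none of these names come from the question):

    # Web dyno uploads the generated attachment to S3; the worker pulls it back down.
    import boto3

    BUCKET = "my-app-attachments"  # assumed bucket name
    s3 = boto3.client("s3")

    def stash_attachment(local_path, key):
        """Web process: upload the generated file and return its S3 key."""
        s3.upload_file(local_path, BUCKET, key)
        return key

    def fetch_attachment(key, dest_path):
        """Worker process: download the file again before building the email."""
        s3.download_file(BUCKET, key, dest_path)
        return dest_path

    # Web process: key = stash_attachment("/tmp/report.pdf", "attachments/report.pdf"),
    # then enqueue the email job with the key instead of a local path.
    # Worker: path = fetch_attachment(key, "/tmp/report.pdf"), then attach that path
    # with whatever SendGrid client call you already use.

The point is that the queued message carries an S3 key rather than a dyno-local path, so it no longer matters which dyno ends up sending the email.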

You always need to use external storage, for example S3, to store files that need to be available to every server instance/dyno.
Also worth knowing if you don't want to keep those attachments forever: you can attach a lifecycle rule to your S3 bucket that will automatically delete a file once it's older than x days.
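For example, a hedged boto3 sketch of such a rule; the bucket name, prefix, and 7-day window are assumptions:

    import boto3

    s3 = boto3.client("s3")
    # Expire anything under attachments/ once it is older than 7 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-app-attachments",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-old-attachments",
                    "Filter": {"Prefix": "attachments/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 7},
                }
            ]
        },
    )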

Related

Heroku doesn't update the GitHub file system when an image is uploaded from the website

I ran into the problem where Heroku doesn't update my GitHub repository (or, say, the static filesystem) when a blog post (including pictures) is created from the website.
Other images survive, whilst the ones saved in my filesystem while the server is running on Heroku disappear.
I found this in their documentation.
The Heroku filesystem is ephemeral - that means that any changes to the filesystem whilst the dyno is running only last until that dyno is shut down or restarted.
I'm still confused why not all the pictures disappear and only those added later do.
Is AWS S3 a solution for this? If it is, how can I represent my filesystem using buckets?
Say, for the Blog Post 1 I have 2 picture resolutions, which means storing the files in different folders corresponding to those resolutions.
---1920x1920
-----picture.jpg
---800x800
-----picture.jpg
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Is AWS S3 a solution for this?
S3 is the recommended solution for this, and the configuration is documented in the Heroku Dev Center with specific instructions for uploading from Python.
Note that these Python instructions use the Direct Upload approach: have the Flask app generate a pre-signed URL, which is then passed back to the client JavaScript code so that the user's browser can make the upload to S3 directly. The resulting S3 URL of the image is then put into a hidden element in the form, which is received by your app on form submit.
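A rough sketch of that signing endpoint, assuming Flask and boto3; the route, bucket name, and expiry are placeholders rather than the tutorial's exact code:

    import boto3
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    s3 = boto3.client("s3")
    BUCKET = "my-blog-uploads"  # assumed bucket name

    @app.route("/sign-s3")
    def sign_s3():
        # The client-side JavaScript asks for a pre-signed POST, then uploads
        # the file straight to S3 from the browser.
        filename = request.args.get("file_name")
        presigned = s3.generate_presigned_post(
            Bucket=BUCKET,
            Key="originals/" + filename,
            ExpiresIn=300,  # signature valid for 5 minutes
        )
        # The browser POSTs the file to presigned["url"] with presigned["fields"],
        # then writes the resulting S3 URL into a hidden form field before submit.
        return jsonify(presigned)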
The fact that you have separate image sizes suggests your app does some processing (maybe with PIL) to get these thumbnails. In that case it may be easier to use the Pass-Through approach, where your app implements its own upload mechanism, does the processing, and then uploads the thumbnails to S3 (the upload-to-S3 part is well documented, such as in this SO thread).
The Pass-Through method carries the warning that it may block a single-threaded worker. If your site gets a volume of requests that makes this an issue, you may need to increase the number of gunicorn workers, or change to a worker type that supports concurrency (this GitHub post has some useful commands/info on concurrent worker types).
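If you do go the Pass-Through route, a sketch of the processing step might look like this, assuming Pillow and boto3 with placeholder bucket/key names:

    from io import BytesIO

    import boto3
    from PIL import Image

    s3 = boto3.client("s3")
    BUCKET = "my-blog-uploads"  # assumed bucket name

    def upload_resized(uploaded_file, name):
        """Resize the uploaded image to each target size and store it under a
        size-prefixed key, e.g. 1920x1920/picture.jpg."""
        for size in (1920, 800):
            img = Image.open(uploaded_file)
            img.thumbnail((size, size))  # fits within size x size, keeps aspect ratio
            buf = BytesIO()
            img.save(buf, format="JPEG")
            buf.seek(0)
            s3.upload_fileobj(buf, BUCKET, "%dx%d/%s" % (size, size, name))
            uploaded_file.seek(0)  # rewind so the next pass can re-read the upload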
The best way to implement this whole thing (although the requirement for a Redis add-on and a worker dyno may push you into the paid tier) may be with Background Tasks using rq: use the Direct Upload approach above to upload the original image, then have a background job download it, do the resizing, and put the resulting thumbnails back onto S3.
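A minimal rq wiring sketch (the job function and Redis URL are hypothetical): the web dyno enqueues the job after the Direct Upload finishes, and the worker dyno runs the rq worker command against the same Redis instance.

    import os

    from redis import Redis
    from rq import Queue

    redis_conn = Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))
    q = Queue(connection=redis_conn)

    def make_thumbnails(s3_key):
        # Download the original from S3, resize it (see the Pass-Through sketch
        # above), and upload the thumbnails back to S3.
        ...

    # In the view that handles the form submit:
    # q.enqueue(make_thumbnails, "originals/picture.jpg")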
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Have one bucket for the entire app, and just include forward slashes in the object's key to mimic a subdirectory structure, e.g. keys like 1920x1920/picture.jpg and 800x800/picture.jpg in the same bucket.

How can I get Django to serve media files in production?

How can I get Django to serve media files in production, with DEBUG = False, on a Heroku server?
I know that it's better not to do this and that it will cost some performance, but my application is used only by me and my cat, so I don't think that's a problem in my case.
The reason this won't work is that the Heroku filesystem is ephemeral, meaning any files uploaded after your app code is pushed will be lost whenever your app is restarted. This will leave your app with image links in the DB that point to non-existent files.
You can read more about it here:
https://help.heroku.com/K1PPS2WM/why-are-my-file-uploads-missing-deleted
Your best bet is uploading your files to a bucket on a service like Amazon S3. It costs almost nothing for small use and is very reliable.
https://blog.theodo.com/2019/07/aws-s3-upload-django/
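If you go that route with Django, django-storages is the usual glue. A minimal settings.py sketch, assuming boto3 and placeholder credentials/bucket (exact setting names can differ between django-storages versions):

    import os

    INSTALLED_APPS = [
        # ... your other apps ...
        "storages",
    ]

    AWS_ACCESS_KEY_ID = os.environ["AWS_ACCESS_KEY_ID"]
    AWS_SECRET_ACCESS_KEY = os.environ["AWS_SECRET_ACCESS_KEY"]
    AWS_STORAGE_BUCKET_NAME = "my-app-media"  # assumed bucket name
    AWS_S3_REGION_NAME = "us-east-1"          # assumed region

    # Route all FileField/ImageField uploads to S3 instead of the dyno's disk.
    DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
    MEDIA_URL = "https://%s.s3.amazonaws.com/" % AWS_STORAGE_BUCKET_NAME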

Is there an easier way to remove old files on servers through GitHub after deleting files and syncing locally?

I currently work with a web dev team and we have 100+ GitHub repos, each for a different e-commerce website that has an instance on AWS. The developers use the GitHub app to upload their changes to the servers, and do this multiple times a day.
I'm trying to find the easiest way for us to remove old, deleted files from our servers after we delete and sync GitHub locally.
To make it clear, say we have an index.html, page1.html and page2.html. We want to remove page1.html, so they delete page1.html and sync through the GitHub app. The file is no longer visibly in the repo, but for us to completely remove the file I must also SSH into our AWS server, go to the www directory and find page1.html and also remove it there. Is there an easier way for the developers, who do not use SSH and the command line, to get rid of those files in terms of syncing with GitHub? It becomes a pain to have to SSH into many different servers and then determining which files were removed from the repo so that I can remove them there.
Thanks in advance
Something we do with our repo is use tags (releases) and then, through automation (Chef in our case), tell it to pull the new tag. It sounds like this wouldn't necessarily work for you, but what Chef actually does with the tag might be of interest.
It pulls the tag and then updates a symlink (and gracefully restarts Apache). This means there's zero downtime (the symlink updates instantly) and, because it's pulling a fresh copy, any deleted files are gone.
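For illustration only, the symlink-swap idea sketched in Python; the paths, repo, and restart command are assumptions, not what Chef actually runs:

    import os
    import subprocess

    REPO = "git@github.com:example/site.git"   # hypothetical repo
    RELEASES = "/var/www/releases"
    CURRENT = "/var/www/current"               # Apache's DocumentRoot points here

    def deploy(tag):
        target = os.path.join(RELEASES, tag)
        # Pull a fresh copy of the tagged release into its own directory.
        subprocess.run(["git", "clone", "--branch", tag, "--depth", "1", REPO, target],
                       check=True)
        # Build the new symlink next to the old one, then rename it into place.
        # rename() is atomic, so "current" never points nowhere.
        tmp_link = CURRENT + ".new"
        os.symlink(target, tmp_link)
        os.replace(tmp_link, CURRENT)
        # Graceful restart: finish in-flight requests, then serve the new code.
        subprocess.run(["apachectl", "graceful"], check=True)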

Is it better to pull files from remote locations or grant users FTP access to my system?

I need to setup a process to update a database table with user supplied CSV-data (running Coldfusion 8/MySQL 5.0.88).
I'm not sure about the best way to do this.
Should I give users FTP access to my system, generate a directory for every user, and upload files from there, or should I pick files up from external locations, so the user has to set up an FTP folder my system can access? I'm sort of leaning towards the second way and wanted to set this up using cfschedule and cfftp, but I'm not sure this is the best way to go forward. Security-wise, I'm more inclined to have users specify an FTP location from which I pull, rather than handing out and maintaining FTP folders for every user.
Question:
Which approach is better both in terms of security and automation?
Thanks for input!
I wouldn't use either approach. I would give the users a web page to upload their CSV files. The CF page that accepts the files would place them into a specific folder and make sure they have unique filenames. The cffile tag will help you with that.
The scheduled job would start with a cfdirectory tag on the target folder. This creates a query object. Loop through it and do what you have to do with each file.
Remember to check for the correct file extension. Then look at the first line of the file to ensure it matches the expected format.
Once you have finished processing the file, do something with it so that you don't process it again on the next scheduled job.
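The thread is ColdFusion, but for illustration here is the same flow sketched in Python; the paths and expected header are made up:

    import csv
    import os
    import shutil

    UPLOAD_DIR = "/data/csv_uploads"        # where the upload page drops files
    PROCESSED_DIR = "/data/csv_processed"   # move files here so they run only once
    EXPECTED_HEADER = ["sku", "price", "qty"]  # hypothetical expected format

    def process_uploads():
        for name in os.listdir(UPLOAD_DIR):
            path = os.path.join(UPLOAD_DIR, name)
            # Only accept the expected file extension.
            if not name.lower().endswith(".csv"):
                continue
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, [])
                # The first line must match the expected format before touching the DB.
                if [h.strip().lower() for h in header] != EXPECTED_HEADER:
                    continue
                for row in reader:
                    pass  # insert/update the database row here
            # Move the file out of the way so the next scheduled run skips it.
            shutil.move(path, os.path.join(PROCESSED_DIR, name))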
Setting up a custom FTP server is certainly a possibility, since you are able to create users and give them privileges (automated). It is also secure.
But I don't know the best place to start if you don't have any experience with setting up an FTP server.
Try https://www.dropbox.com/
a.) Create a Dropbox account and send invites to your users/clients.
b.) You can upload files/folders into Dropbox, and your clients/users can access them from their Dropbox account or the Dropbox desktop app.
c.) Your users/clients can upload files/folders, and you can access them from your Dropbox website account or desktop app.
Dropbox is a top-ranked service and is better in terms of security and automation.
Other solutions:
Best solution: Google Drive (5 GB free). Create a new Gmail account, give your ID and password to your users, and ask them to open Google Drive and import/export files. Or try SkyDrive (25 GB free).
http://www.syncplicity.com/
https://www.cubby.com/
http://www.huddle.com/?source=cj&aff=4003003
http://www.egnyte.com/
http://www.sharefile.com/

How to delete files via FTP when directory has over 100,000 files?

I went to upload a new file to my web server only to get a message in return saying that my disk quota was full... I wasn't using up my allotted space but rather my allotted FILE QUANTITY. My host caps my total number of files at about 260,000.
Checking through my folders I believe I found the culprit...
I have a small DVD database application (Video dB By split Brain) that I have installed and hidden away on my web site for my own personal use. It apparently caches data from IMDB, and over the years has secretly amassed what is probably close to a MIRROR of IMDB at this point. I don't know for certain but I did have a 2nd (inactive) copy of the program on the host that I created a few years back that I was using for testing when I was modifying portions of it. The cache folder in this inactive copy had 40,000 files totalling 2.3GB in size. I was able to delete this folder over FTP but it took over an hour. Thankfully it also gave me some much needed breathing room.
...But now, as you can imagine, the cache folder for the active copy of this web app likely has closer to 150,000 files totalling about 7GB worth of data.
This is where my problem comes in... I use Flash FXP for my FTP client, and whenever I try to delete the cache folder, or even just view its contents, it will sit and try to load a file list for a good 5 minutes and then lose connection to the server...
My host has a web-based file browser and it crashes when trying to do this... as do free online FTP clients like net2ftp.com. I don't have SSH access on this server, so I can't log in directly to delete them either.
Anyone have any idea how I can delete these files? Is there a different FTP program I can download that would have better success... or perhaps a small script I could run that would be able to take care of it?
Any help would be greatly appreciated.
Anyone have any idea how I can delete these files?
Submit a support request asking for them to delete it for you?
It sounds like it might be time for a command-line FTP utility. One ships with just about every operating system. With that many files, I would write a script for my command-line FTP client that goes to the folder in question and performs a directory listing, redirecting the output to a file. Then, use magic (or Perl or whatever) to process that file into a new FTP script that runs a delete command against all of the files. Yes, it will take a long time to run.
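One way to automate that idea with Python's ftplib instead of a hand-built FTP script; the host, credentials, and cache path are placeholders:

    from ftplib import FTP, error_perm

    HOST = "ftp.example.com"
    USER = "username"
    PASSWORD = "password"
    CACHE_DIR = "/public_html/videodb/cache"  # assumed location of the cache folder

    ftp = FTP(HOST, timeout=600)  # a generous timeout helps with huge listings
    ftp.login(USER, PASSWORD)
    ftp.cwd(CACHE_DIR)

    # NLST returns bare file names; with ~150,000 entries this will be slow.
    for name in ftp.nlst():
        if name in (".", ".."):
            continue
        try:
            ftp.delete(name)
        except error_perm:
            pass  # probably a subdirectory; handle those separately

    ftp.quit()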
If the server supports wildcards, do that instead and just delete *.*
If that all seems like too much work, open a support ticket with your hosting provider and ask them to clean it up on the server directly.
Having said all that, this isn't really a programming question and should probably be closed.
We had a question a while back where I ran an experiment to show that Firefox can browse a directory with 10,000 files no problem, via FTP. Presumably 150,000 will also be ok. Firefox won't help you delete, but it might be helpful in capturing the names of the files you need to delete.
But first I would just try the command-line client ncftp. It is well engineered and I have had good luck with it in the past. You can delete a large number of files at once using shell patterns. And it is available for Windows, MacOS, Linux, and many other platforms.
If that doesn't work -- you sound like a long-term customer -- could you beg your ISP for the privilege of a shell account for a week, so you can log in remotely with PuTTY or ssh and blow away the entire directory with a single rm -r command?
If your ISP provides ssh access, you can use one rm command to remove the files.
If there is no command-line access, you can try a powerful FTP client like CrossFTP. It works on Windows, Mac, and Linux. When you select the huge number of files on your server for deletion, it can queue up the delete operations, so you don't need to reload the folder again. When you restart CrossFTP, the queue can also be restored and continued.