What is the best way to back up a Django project?

I maintain a couple of low-traffic sites with a fair amount of user-uploaded media and moderately large databases. My goal is to back up all the data that is not under version control in one central place.
My current approach
At the moment I use a nightly cronjob that runs dumpdata to dump all the DB content into JSON files in a subdirectory of the project. The media uploads are already in the project directory (in media).
After the DB is dumped, the files are copied with rdiff-backup (which makes incremental backups) to another location. I then download the rdiff-backup directory on a regular basis with rsync to keep a local copy.
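A minimal Python sketch of that nightly job (the paths and the `run` hook are illustrative, and it assumes `manage.py` and rdiff-backup are available on the server):

```python
import datetime
import os
import subprocess

def nightly_backup(project_dir, mirror_dir, run=subprocess.run):
    """Dump all DB content to JSON inside the project, then rdiff-backup the project."""
    dump_dir = os.path.join(project_dir, "backups")
    os.makedirs(dump_dir, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    dump_file = os.path.join(dump_dir, "db-%s.json" % stamp)
    with open(dump_file, "w") as out:
        # Django's dumpdata writes every model's data as JSON to stdout.
        run(["python", "manage.py", "dumpdata", "--indent", "2"],
            cwd=project_dir, stdout=out, check=True)
    # rdiff-backup mirrors project_dir and keeps incremental history.
    run(["rdiff-backup", project_dir, mirror_dir], check=True)
```

The `run` parameter exists only so the command construction can be exercised without the real tools installed; in the cronjob you would call `nightly_backup(...)` with the defaults.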
Your Ideas?
What do you use to back up your data? Please post your backup solution - whether you only get a few hits per day on your site or you maintain a high-traffic one with sharded databases and multiple fileservers :)
Thanks for your input.

Recently I've found a solution called Django-Backup, and it has worked for me. You can even run the database and media-file backups from a cronjob.

My backup solution works the following way:
Every night, dump the data to a separate directory. I prefer to keep the data-dump directory distinct from the project directory (one reason being that the project directory changes with every code deployment).
Run a job to upload the data to my Amazon S3 account and another location using rsync.
Send me an email with the log.
To restore a backup locally, I use a script to download the data from S3 and load it locally.
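The restore step boils down to two commands; this sketch only builds them (the bucket, key and file names are made up, and it assumes the AWS CLI is configured and the dump was produced with dumpdata):

```python
def s3_download_command(bucket, key, dest):
    """Fetch a dump from S3 using the AWS CLI (must be installed and configured)."""
    return ["aws", "s3", "cp", "s3://%s/%s" % (bucket, key), dest]

def local_restore_command(dump_file):
    """Load a JSON dump produced by dumpdata back into the local database."""
    return ["python", "manage.py", "loaddata", dump_file]
```

Passing the resulting lists to `subprocess.run(..., check=True)` gives the restore script described above.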

Related

Strapi - how to switch and migrate from Cloudinary to S3 in production

Given the rather steep cost of Cloudinary as a multimedia hosting service (images and videos), our client decided they want to switch to AWS S3 for file hosting.
The problem is that there are a lot of files (thousands of images and videos) already in the app, so merely switching the provider is not enough - we also need to migrate all the files and make it look like nothing changed for the end user.
This topic is somewhat covered on the Strapi forum: https://forum.strapi.io/t/switch-from-cloudinary-to-s3/15285, but no solution is posted there besides a vaguely described procedure.
Is there a way to reliably perform the migration, without losing any data and without the need to change anything on the client side (apps that communicate with Strapi via the REST/GraphQL API)?
There are three steps to perform the migration:
switch provider from Cloudinary to S3 in Strapi
migrate files from Cloudinary to S3
perform database update to reroute Strapi from Cloudinary to S3
Switching provider
This is the only step that is actually well documented, so I will be brief here.
First, you need to uninstall the Cloudinary Strapi plugin by running yarn remove @strapi/provider-upload-cloudinary and install the S3 provider by running yarn add @strapi/provider-upload-aws-s3.
After you do that, you need to create your AWS infrastructure (an S3 bucket and an IAM user with sufficient permissions). Please follow the official Strapi S3 plugin documentation https://market.strapi.io/providers/@strapi-provider-upload-aws-s3 and this guide https://dev.to/kevinadhiguna/how-to-setup-amazon-s3-upload-provider-in-your-strapi-app-1opc for the steps to follow.
Check that you've done everything correctly by logging in to your Strapi Admin Panel and accessing the Media Library. If everything went well, all images should be missing (you will see all metadata like sizes and extensions, but not the actual images). Try to upload a new image by clicking the 'Add new assets' button. This image should upload successfully and also appear in your S3 bucket.
After everything works as described above, proceed to actual data migration.
Files migration
The simplest (and most error-resistant) way to migrate files from Cloudinary to S3 is to download them locally, then use the AWS Console to upload them. If you have only hundreds (or low thousands) of files to migrate, you might actually use the Cloudinary Web UI to download them all (there is a limit of 1000 files per download from the Cloudinary Web App).
If this is not suitable for you, there is a CLI available that can easily download all files using your terminal:
pip3 install cloudinary-cli (installs the CLI)
cld config -url {CLOUDINARY_API_ENV} (the API environment variable can be found on the first page you see when you log into Cloudinary)
cld -C {CLOUD_NAME} sync --pull . / (This step begins the download. Depending on how many files you have, it might take a while. Run this command from the directory you want to download the files into. {CLOUD_NAME} can be found just above {CLOUDINARY_API_ENV} on the Cloudinary dashboard; you should also see it after running the second command in your terminal. For me, this command failed several times in the middle of the download, but you can just run it again and it will continue without any problem.)
After you download the files to your computer, simply use the S3 drag-and-drop feature to upload them into your S3 bucket.
Update database
Strapi saves links to all files in its database. This means that even though you switched your provider to S3 and copied all the files, Strapi still doesn't know where to find them, as the links in the database point to the Cloudinary server.
You need to update three columns in the Strapi database (this approach is tested on a Postgres database; there might be minor changes for other databases). Look into the 'files' table; there should be url, formats and provider columns.
The provider column is trivial: just replace cloudinary with aws-s3.
url and formats are harder, as you need to replace only part of the string - to be more precise, Cloudinary stores URLs in {CLOUDINARY_LINK}/{VERSION}/{FILE} format, while S3 uses {S3_BUCKET_LINK}/{FILE} format.
My friend and colleague came up with the following SQL query to perform the update:
UPDATE files SET
formats = REGEXP_REPLACE(formats::TEXT, '\"https:\/\/res\.cloudinary\.com\/{CLOUDINARY_PROJECT}\/((image)|(video))\/upload\/v\d{10}\/([\w\.]+)\"', '"https://{BUCKET_NAME}.s3.{REGION}/\4"', 'g')::JSONB,
url = REGEXP_REPLACE(url, 'https:\/\/res\.cloudinary\.com\/{CLOUDINARY_PROJECT}\/((image)|(video))\/upload\/v\d{10}\/([\w\.]+)', 'https://{BUCKET_NAME}.s3.{REGION}/\4', 'g')
Just don't forget to replace {CLOUDINARY_PROJECT}, {BUCKET_NAME} and {REGION} with the correct strings (the easiest way to see those values is to access the database, go to the files table, and compare one of the old URLs with the URL of the file you uploaded at the end of the Switching provider step).
Also, before running the query, don't forget to back up your database! Even better, make a copy of the production database and run the query on that before you touch production.
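You can also sanity-check the rewrite outside SQL first. This Python snippet mirrors the REGEXP_REPLACE pattern above (the project name, bucket and region are made-up example values):

```python
import re

# Made-up example values - substitute your own Cloudinary project, bucket and region.
pattern = (r"https://res\.cloudinary\.com/myproject/"
           r"((image)|(video))/upload/v\d{10}/([\w.]+)")
replacement = r"https://mybucket.s3.eu-west-1.amazonaws.com/\4"

old_url = "https://res.cloudinary.com/myproject/image/upload/v1234567890/photo.jpg"
new_url = re.sub(pattern, replacement, old_url)
```

Running it against a handful of real url values copied from the files table is a cheap way to confirm the pattern captures your URLs before you run the UPDATE.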
And that's all! Strapi now uploads files to the S3 bucket, and you still have access to all the data you previously had on Cloudinary.

Is there an easier way to remove old files on servers through GitHub after deleting files and syncing locally?

I currently work with a web dev team and we have 100+ GitHub repos, each for a different e-commerce website that has an instance on AWS. The developers use the GitHub app to upload their changes to the servers, and do this multiple times a day.
I'm trying to find the easiest way for us to remove old, deleted files from our servers after we delete and sync GitHub locally.
To make it clear, say we have an index.html, page1.html and page2.html. We want to remove page1.html, so the developers delete page1.html and sync through the GitHub app. The file is no longer visible in the repo, but to completely remove it I must also SSH into our AWS server, go to the www directory, find page1.html and remove it there too. Is there an easier way for the developers, who do not use SSH and the command line, to get rid of those files when syncing with GitHub? It becomes a pain to SSH into many different servers and work out which files were removed from the repo so that I can remove them there.
Thanks in advance
Something we do with our repo is use tags (releases) and then, through automation (Chef in our case), tell the server to pull the new tag. It sounds like this wouldn't necessarily work for you, but what Chef actually does with the tag might be of interest.
It pulls the tag and then updates a symlink (and gracefully restarts Apache). This means there's zero downtime (the symlink updates instantly) and, because it's pulling a fresh copy, any deleted files are gone.
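The symlink trick can be sketched in a few lines of Python (my own minimal version of the idea, not the actual Chef recipe):

```python
import os

def atomic_symlink_swap(target, link_name):
    """Repoint link_name at target with no window where the link is missing."""
    tmp = link_name + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # rename(2) atomically replaces the old symlink with the new one.
    os.replace(tmp, link_name)
```

The web server's document root points at the symlink (e.g. `current`), each release is checked out into its own directory, and deploying is just swapping the link.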

Is it better to pull files from remote locations or grant users FTP access to my system?

I need to setup a process to update a database table with user supplied CSV-data (running Coldfusion 8/MySQL 5.0.88).
I'm not sure about the best way to do this.
Should I give users FTP access to my system, generate a directory for every user, and load the files from there? Or should I pick files up from external locations, so that each user has to set up an FTP folder my system can access? I'm sort of leaning towards the second way and wanted to set this up using cfschedule and cfftp, but I'm not sure this is the best way forward. Security-wise, I'm more inclined to have users specify an FTP location from where I pull, rather than handing out and maintaining FTP folders for every user.
Question:
Which approach is better both in terms of security and automation?
Thanks for input!
I wouldn't use either approach. I would give the users a web page to upload their CSV files. The ColdFusion page that accepts the files would place them into a specific folder and make sure they have unique filenames. The cffile tag will help you with that.
The scheduled job would start with a cfdirectory tag on the target folder. This creates a query object. Loop through it and do what you have to do with each file.
Remember to check for the correct file extension. Then look at the first line of the file to ensure it matches the expected format.
Once you have finished processing the file, do something with it so that you don't process it again on the next scheduled job.
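The scheduled job described above, sketched in Python rather than CFML for illustration (the expected header and directory names are hypothetical; the original answer would use cfdirectory/cffile):

```python
import csv
import os
import shutil

EXPECTED_HEADER = ["id", "name", "price"]  # hypothetical expected CSV format

def process_uploads(incoming_dir, done_dir):
    """Process each valid CSV in incoming_dir, then move it so it isn't re-processed."""
    os.makedirs(done_dir, exist_ok=True)
    for name in sorted(os.listdir(incoming_dir)):
        path = os.path.join(incoming_dir, name)
        # Check for the correct file extension.
        if not name.lower().endswith(".csv"):
            continue
        with open(path, newline="") as fh:
            reader = csv.reader(fh)
            # Look at the first line to ensure it matches the expected format.
            if next(reader, None) != EXPECTED_HEADER:
                continue
            for row in reader:
                pass  # insert the row into the database table here
        # Move the file so the next scheduled run skips it.
        shutil.move(path, os.path.join(done_dir, name))
```

The loop over the directory listing plays the role of the cfdirectory query object, and the move at the end is the "do something with it" step so a file is only processed once.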
Setting up a custom FTP server is certainly a possibility, since you are able to create users and give them privileges (automated). It is also secure.
But I don't know the best place to start if you don't have any experience with setting up an FTP server.
Try https://www.dropbox.com/
a.) Create a Dropbox account and send invites to your users/clients.
b.) You can upload files/folders into Dropbox, and your clients/users can access them from their Dropbox account or the Dropbox desktop app.
c.) Your users/clients can upload files/folders, and you can access them from your Dropbox website account or desktop app.
Dropbox is a top-ranked product, and strong on both security and automation.
Other solutions:
Google Drive (5 GB free): create a new Gmail account, give the ID and password to your users, and ask them to open Google Drive and import/export files. Or try SkyDrive (25 GB free).
http://www.syncplicity.com/
https://www.cubby.com/
http://www.huddle.com/?source=cj&aff=4003003
http://www.egnyte.com/
http://www.sharefile.com/

Should I have my Postgres directory right next to my project folder? If so, how?

I'm trying to develop a Django website with Heroku. Having no previous experience with databases (except the sqlite3 one from the tutorial), it seems to me a good idea to have the following file structure:
Projects
'-MySite
|-MySite
'-MyDB
I'm finding it hard to figure out how to do this, as the psql commands prefer to put the databases in some obscure directory instead. Perhaps it's not such a good idea?
Eventually I want to be able to test and develop my site (it'll be just a blog for a while, I'm still learning) locally (i.e. add a post, play with the CSS) and sync with the Heroku repository, but I also want to be able to add posts via the website itself occasionally.
The underlying data files (MyDB) have nothing to do with your project files and should not live under your project directory.
EDIT
Added two ways to sync your local database with the database on the Heroku server:
1) export-import
This is the simplest way; do the following steps every now and then:
make an export on the Heroku server by using the pg_dump utility
download the dump file
import the dump into your local database by using the psql utility
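The export-import steps above reduce to two commands; this sketch only builds them (the database names are placeholders, and it assumes a plain-SQL dump as described):

```python
def dump_command(database_url, dump_file):
    """Plain-SQL export with pg_dump; --no-owner makes the dump easy to import elsewhere."""
    return ["pg_dump", "--no-owner", "--file", dump_file, database_url]

def restore_command(local_db, dump_file):
    """Import the plain-SQL dump into a local database with psql."""
    return ["psql", "--dbname", local_db, "--file", dump_file]
```

Run the first command against the remote database, copy the file down, then run the second against your local database.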
2) replication
A more sophisticated way to keep your local DB in sync all the time is replication. It is used in professional environments and is probably overkill for you at the moment. You can read more about it here: http://www.postgresql.org/docs/9.1/static/high-availability.html

Backup strategy for Django

I recently deployed a couple of web applications built with Django (on WebFaction).
These are some of the first projects of this scale that I have worked on, so I wanted to know what an effective backup strategy is for maintaining backups both on WebFaction and in an alternate location.
EDIT:
What i want to backup?
Database and user uploaded media. (my code is managed via git)
I'm not sure there is a one-size-fits-all answer, especially since you haven't said what you intend to back up. My usual MO:
Source code: use source control such as svn or git. This means that you will usually have dev, deploy and repository copies of your code (especially with a DVCS).
Database: this also depends on usage, but usually:
Have a dump_database.py management command that introspects settings and, for each DB, outputs the correct dump command (taking into consideration the DB type and the database name).
Have a cron job on another server that connects through ssh to the application server, executes the dump-DB management command, tars the SQL file with the DB name + timestamp as the file name, and uploads it to another server (Amazon's S3 in my case).
Media files: e.g. user uploads. Keep a cron job on another server that can ssh into the application server and call rsync to copy the files to another server.
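The introspection inside such a dump_database command might look roughly like this helper (the engine-to-command mapping is my own illustration; a real version would wrap it in a Django BaseCommand and read settings.DATABASES):

```python
def dump_command_for(cfg):
    """Map one settings.DATABASES entry to the matching shell dump command."""
    engine = cfg["ENGINE"]
    name = cfg["NAME"]
    if "postgresql" in engine:
        return ["pg_dump", "--no-owner", name]
    if "mysql" in engine:
        return ["mysqldump", name]
    if "sqlite" in engine:
        return ["sqlite3", name, ".dump"]
    raise ValueError("unsupported engine: " + engine)
```

The cron job on the other server would then ssh in, run the management command, and tar and upload whatever files it produces.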
The thing to keep in mind, though, is the intended purpose of the backup.
If it's recovery from accidental data loss (be it disk failure, a bug or SQL injection) or simply restoring, you can keep those cron jobs on the same server.
If you also want to be safe in case the server is compromised, you cannot keep the remote backup credentials (ssh keys, Amazon secret etc.) on the application server! Otherwise an attacker will gain access to the backup server as well.