Adverse effects of meta data file deletion - microsoft-sync-framework

We're using Microsoft Sync Framework 2.0.5 to synchronize files between one server and multiple hosts. We've noticed that when the connection is interrupted and then restored, the framework tries to sync again, which is good. However, it can no longer sync; it behaves as if it is blocked. When we delete the sync metadata file in the folder that it is trying to sync to, everything starts syncing again just fine. The simplest solution I can think of is to have the code check for the metadata file and delete it before starting a sync. What are the downsides of this approach, and is there a better way? Thanks!

The metadata stores what has already been synced. If you delete it, every sync becomes a full sync instead of an incremental one. As a practice, back up the metadata file before you sync and, if a sync is cancelled, restore the backup. You may also want to store the metadata file in a folder separate from the files you're syncing.
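The backup-and-restore practice can be sketched in a language-neutral way. The snippet below is only an illustration in Python, not Sync Framework code: `sync_with_metadata_backup` and the `run_sync` callable are hypothetical names, and it assumes a failed or cancelled sync surfaces as an exception.

```python
import shutil
from pathlib import Path


def sync_with_metadata_backup(metadata_path, run_sync):
    """Run run_sync(); if it raises, restore the pre-sync metadata file.

    run_sync is a hypothetical callable standing in for the real sync call.
    """
    metadata = Path(metadata_path)
    backup = metadata.with_name(metadata.name + ".bak")
    had_metadata = metadata.exists()
    if had_metadata:
        # Keep a copy of the last known-good metadata.
        shutil.copy2(metadata, backup)
    try:
        run_sync()
    except Exception:
        if had_metadata:
            # The sync did not finish: go back to the known-good metadata.
            shutil.copy2(backup, metadata)
        raise
    finally:
        # Runs after the restore above, so the backup is no longer needed.
        if backup.exists():
            backup.unlink()
```

Unlike deleting the metadata file, this keeps the next sync incremental, because the restored file still describes what was synced before the interruption.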

Related

Delaying system shutdown during json DB update in python

So I have a rather large json database that I'm maintaining with python. It's basically scraping data from a website on an hourly basis and I'm running daily restarts on the system (Linux Mint) via crontab. My issue is that if the system happens to restart during the database updating process I get corrupted json files.
My question is whether there is any way to delay the system restart from my script, so that the system shuts down at a safe time. I could issue the restart command inside the script itself, but if I decide to run multiple similar scripts in the future, I'll obviously have a problem.
Any help here would be greatly appreciated. Thanks.
Edit: Just to clarify, I'm not using the python jsondb package. I am doing all file handling myself.
So my solution to this was quite simple (Just protect data integrity):
Before write - backup the file
On successful write - delete the backup (Avoids doubling the size of the DB)
Wherever a corrupted file is encountered - revert to the backup
The idea is that if the system closes the script during the file backup, it doesn't matter, because we still have the original. If the system closes the script during the write to the original file, the backup never gets deleted and we can just use that instead. All in all it was just an extra 6 lines of code, and it appears to have solved the issue.
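The three steps above could look roughly like this. It is a minimal sketch, not the poster's actual code: `save_json`, `load_json`, and the `.bak` suffix are names chosen for illustration.

```python
import json
import os
import shutil


def save_json(path, data):
    """Back up the current file, write the new data, then drop the backup."""
    backup = path + ".bak"
    if os.path.exists(path):
        shutil.copy2(path, backup)      # step 1: back up before writing
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())            # push the bytes to disk before cleanup
    if os.path.exists(backup):
        os.remove(backup)               # step 2: the write succeeded


def load_json(path):
    """Load the file; if it is missing or corrupt, revert to the backup."""
    backup = path + ".bak"
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        if not os.path.exists(backup):
            raise                       # nothing to fall back on
        shutil.copy2(backup, path)      # step 3: revert to the backup
        with open(path, encoding="utf-8") as f:
            return json.load(f)
```

This relies on the script calling `load_json` before the next `save_json`, so that a corrupted original is repaired before a new backup overwrites the old one.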

Heroku ephemeral storage, Sendgrid, and attachments

On occasion I need to send emails with attachments to users of my site. I am using SendGrid and python-sendgrid 0.1.4 to do the send. Email sending is queued through Redis.
Here's the issue -- where do I put the attachment, which is currently generated as part of the web process? I tried putting it in /tmp, which didn't work, presumably because the file was deleted when the web process shut down and was no longer available when the worker process came by. I tried /app/media, which also didn't work. I think that is because /app/media is read-only (though, oddly, I did not get any errors when attempting to write to this directory).
I think the answer may be that I have to refactor my code to generate the attachment in the same process as the email is sent, but as that is a pretty significant refactor, I thought I'd ask the community first. Thanks!
Heroku's /tmp directories are unique to each dyno. So your Web Dyno saves a file in its /tmp directory, then your worker looks in its /tmp directory and cannot find it.
The best option is likely refactoring your code (that way you aren't clogging up your Web Dyno's resources creating and writing files to disk). However, if you really want to avoid it, you could store your files temporarily on S3 [tutorial] or some other external storage mechanism.
You always need to use external storage, such as S3, for files that must be available to every server instance/dyno.
It is also worth knowing that you don't have to keep those attachments forever: you can attach a lifecycle rule to your S3 bucket that automatically deletes a file once it is older than x days.
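As a rough illustration of such a lifecycle rule, the sketch below builds the configuration as a plain Python dict. The function name, the rule ID, the `attachments/` prefix, and the 7-day value are all assumptions made for the example. With boto3, a dict of this shape is what you would pass as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`.

```python
import json


def expiry_lifecycle(prefix, days):
    """Build an S3 lifecycle configuration that expires objects under prefix."""
    return {
        "Rules": [
            {
                "ID": "expire-temporary-attachments",  # hypothetical rule name
                "Filter": {"Prefix": prefix},          # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},          # delete once older than `days`
            }
        ]
    }


if __name__ == "__main__":
    # Assumed values: attachments live under "attachments/" and are kept 7 days.
    print(json.dumps(expiry_lifecycle("attachments/", 7), indent=2))
```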

Multiple copies of application accessing SQLite file?

Our application uses an SQLite database file to hold some data in it. The app opens the database in the file on startup, reads and writes to it, and closes it on exit.
Unfortunately, we can't forbid someone from running two copies of our app at once. If that happens, presumably there will be two copies of the app trying to read from and/or write to the file at the same time. I imagine this would not end well for the database file.
What can we do to avoid causing data loss for the user? Should we simply avoid opening the database if a second copy of the app is launched concurrently? Or is there something cleverer we can do?
Thanks.
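Any sane database engine, including SQLite, will not corrupt your database if two processes access it at the same time. Most will queue the requests if there's no way to run them in parallel. SQLite does this with file locks: several readers can proceed together, while writers are serialized, so a second writer either waits or gets a "database is locked" error instead of damaging the file.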
Now what your app does with the data, that is your app's problem, but don't worry about the database itself.
Some info about sqlite concurrency: http://www.sqlite.org/lockingv3.html
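To see the locking in action, here is a small sketch using Python's standard `sqlite3` module. Two connections in one process stand in for two copies of the app; the function name, table name, and 0.1-second timeout are arbitrary choices for the demonstration.

```python
import sqlite3


def demo_two_writers(db_path):
    """Return (second_writer_was_blocked, rows) for two competing connections."""
    # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
    a = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)
    b = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)
    a.execute("CREATE TABLE IF NOT EXISTS t (v TEXT)")

    a.execute("BEGIN IMMEDIATE")            # a takes the write lock
    a.execute("INSERT INTO t VALUES ('from a')")
    try:
        b.execute("BEGIN IMMEDIATE")        # b cannot get the write lock yet
        blocked = False
    except sqlite3.OperationalError:        # "database is locked" after the timeout
        blocked = True
    a.execute("COMMIT")                     # a releases the lock

    b.execute("BEGIN IMMEDIATE")            # now b can write
    b.execute("INSERT INTO t VALUES ('from b')")
    b.execute("COMMIT")

    rows = [r[0] for r in a.execute("SELECT v FROM t ORDER BY rowid")]
    a.close()
    b.close()
    return blocked, rows
```

In a real app, a longer `timeout` makes the second writer wait for the lock instead of failing quickly, which is usually what you want when two copies run side by side.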

Django action after file upload

We have an extensive existing codebase and we've added load-balanced servers with a single master server to the equation now. There are various apps that contain models with uploaded files and images which all work fine... However, this raises the obvious problem of the rsync delay. Rsync is in the crontab and set to run every minute but this still means there's a potential 59 second wait between content being created and it actually existing on the webservers.
What I would like, is to be able to register some kind of 'post file changed' handler that triggers rsync whenever a new file is uploaded. I can't find anything of the sort though! Django has file upload handlers, but these appear to only deal with the actual upload stream, not the file as it is saved to the filesystem thereafter.
The best approach I can see is to create simple extensions to FileField, FieldFile, ImageField and ImageFieldFile as part of my project and hook into the save and delete methods in the FileField. Essentially, to create custom File and Image fields with this behaviour added. This isn't massively complicated to do but it doesn't seem like the most elegant solution to me. I'll need to teach South about my new fields, update every model that is affected and then create hordes of south migrations (which I'm pretty sure will clash with some code we have pending).
I'm also looking into creating a custom Storage class for the project, but I'm nervous about this having far-reaching effects on other pieces of code.
I can't believe no one has come across this issue before. Is there a canonical approach?
Thanks very much!
If you want to tackle this problem from the server side (e.g. with a solution similar to rsync) and you're running Linux, you might want to check out lsyncd:
http://code.google.com/p/lsyncd/
lsyncd uses inotify in the Linux kernel to watch directories and invoke an rsync as soon as files are modified. Fairly simple to drop in.

What is the best way to backup a django project?

I maintain a couple of low-traffic sites that have reasonable user uploaded media files and semi big databases. My goal is to backup all the data that is not under version control in a central place.
My current approach
At the moment I use a nightly cronjob that uses dumpdata to dump all the DB content into JSON files in a subdirectory of the project. The media uploads are already in the project directory (in media).
After the DB is dumped, the files are copied with rdiff-backup (makes an incremental backup) into another location. I then download the rdiff-backup directory on a regular basis with rsync to store a local copy.
Your Ideas?
What do you use to back up your data? Please post your backup solution, whether you only have a few hits per day on your site or you maintain a high-traffic one with sharded databases and multiple fileservers :)
Thanks for your input.
Recently, I found a solution called Django-Backup, and it has worked for me. You can even combine the task of backing up the databases or media files with a cronjob.
My backup solution works the following way:
Every night, dump the data to a separate directory. I prefer to keep data dump directory distinct from the project directory (one reason being that project directory changes with every code deployment).
Run a job to upload the data to my Amazon S3 account and another location using rsync.
Send me an email with the log.
To restore a backup locally I use a script to download the data from S3 and upload it locally.
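The nightly dump step could be sketched as follows. This is only an illustration under assumptions: the function names, the dated directory layout, and the `db.json` file name are made up for the example, and the upload and email steps are left out. It relies on Django's `dumpdata` management command and its `--output` option.

```python
import datetime
import subprocess
import sys
from pathlib import Path


def build_dump_command(backup_root, today=None):
    """Return (command, output_file) for a dated dump outside the project dir."""
    today = today or datetime.date.today()
    out_dir = Path(backup_root) / today.isoformat()   # e.g. <backup_root>/2024-01-31
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "db.json"
    cmd = [sys.executable, "manage.py", "dumpdata", "--output", str(out_file)]
    return cmd, out_file


def run_nightly_dump(project_dir, backup_root):
    """Run dumpdata from the project directory and return the dump file path."""
    cmd, out_file = build_dump_command(backup_root)
    subprocess.run(cmd, cwd=project_dir, check=True)  # raises if dumpdata fails
    return out_file
```

Keeping `backup_root` outside the project directory matches the point made above: the dumps survive code deployments that replace the project directory.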