Are Google Compute Engine (GCE) persistent disk (PD) snapshots crash-consistent? - google-cloud-platform

When I take a persistent disk snapshot, what are the consistency guarantees for this snapshot? I know it is not guaranteed to be "application consistent", but is it "crash consistent"? Or are there no consistency guarantees?
-- EDIT --
For comparison, Machine Images are guaranteed to be crash-consistent as is. However, the docs are silent on this issue for persistent disk snapshots.

TL;DR: PD snapshots are crash consistent.
From the Compute Engine persistent disk documentation:
When you take a snapshot of a persistent disk, you don't need to take any additional steps to make your snapshot crash consistent. In particular, you do not need to pause your workload.

Per the snapshot best practices, you can create a consistent snapshot of a persistent disk even while an application is writing data to it. If your app requires strict consistency, you can follow the steps below to ensure a consistent snapshot.
To prepare your persistent disk before you take a snapshot do the following:
1- Connect to your instance using SSH.
2- Run an app flush to disk. For example, MySQL has a FLUSH statement. Use whichever tool is available for your app.
3- Stop your apps from writing to your persistent disk.
4- Run sudo sync.
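
The steps above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the disk name, zone, snapshot name, and the systemctl command for a service called my-app are hypothetical placeholders, and the app-specific flush (step 2) is left to the caller because it differs per application. The sync step has to run on the instance itself. The helper defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_plan(disk, zone, snapshot_name, quiesce_cmds=()):
    """Return the commands, in order, for a quiesced snapshot."""
    # Steps 2-3: caller-supplied app flush and "stop writing" commands.
    plan = [list(cmd) for cmd in quiesce_cmds]
    # Step 4: flush OS buffers to the persistent disk.
    plan.append(["sudo", "sync"])
    # Finally, snapshot the disk with the gcloud CLI.
    plan.append([
        "gcloud", "compute", "disks", "snapshot", disk,
        f"--zone={zone}", f"--snapshot-names={snapshot_name}",
    ])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    plan = build_plan(
        "data-disk", "us-central1-a", "data-disk-snap",
        quiesce_cmds=[["sudo", "systemctl", "stop", "my-app"]],
    )
    run_plan(plan)  # dry run: prints the commands without executing them
```

Remember to resume the application once the snapshot command has been issued.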

Related

Risk of disk corruption when using a persistent google cloud disk with a preemptible google compute instance?

I'd like to use preemptible google compute instance to save money, and persist data using a persistent disk that is attached to the instance. But I'm concerned about the risk of data corruption as outlined in the detach-disk documentation, because it's unclear to me if the disk will be detached properly when the instance is preempted. Any advice/experiences on this? I didn't find any guidance in the google cloud documentation. I'm wondering if I could add the unmount command, sudo umount /dev/disk/by-id/[disk-name], to the preemptible instance shutdown script, but I haven't tried that yet. I could also set up frequent disk snapshots to minimize the damage of data corruption, if that does happen.
You should have zero concern about disk corruption because a preemptible Compute Engine instance is preempted. If it were possible for disk buffers not to be properly flushed during a preemption, that fact would be screamed loudly from the documentation and the forum. While I have no information on how the instance is ultimately shut down, I do have a rock-solid belief that just before it breathes its last, any outstanding persistent disk writes are committed.
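
If you would still like the explicit unmount the question mentions, a shutdown script along these lines is one way to sketch it. This is untested against a real preemption and rests on assumptions: the mount point /mnt/disks/data is a hypothetical placeholder, the script is assumed to be registered as the instance's shutdown script, and shutdown scripts only run on a best-effort basis. It defaults to a dry run.

```python
import subprocess


def shutdown_commands(mount_point):
    """Commands to flush buffers and then unmount the data disk."""
    return [["sync"], ["umount", mount_point]]


def main(mount_point="/mnt/disks/data", dry_run=True):
    for cmd in shutdown_commands(mount_point):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: keep going even if one step fails mid-shutdown.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main()  # dry run: prints "sync" and "umount /mnt/disks/data"
```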

GCP: Creating a snapshot of a VM including runtime processes

From what I could find, Google Cloud will only allow me to create a snapshot of a machine disk.
Is it possible in some way to also capture its runtime, i.e., RAM and process states?
Unfortunately, snapshots are limited to the persistent disk and not runtime processes and RAM. I would also like to mention that it is not possible to have a snapshot of RAM as this is volatile memory.

GCE: persistent boot disk

Simple question for GCE users: are persistent boot disks safe to use, or could data loss occur?
I've seen that I can attach additional persistent disks, but what about the standard boot disks (which should be persistent as well)?
What happens during maintenance, equipment failures and so on? Are these boot disks stored on hardware with built-in redundancy (RAID and so on)?
In other words, is a compute instance with a persistent boot disk similar to a non-cloud VM stored on local RAID (from a data-loss point of view)?
Usually cloud instances are volatile, a crash, shutdown, maintenance and so on, will destroy all data stored.
Obviously, I'll have backups.
GCE Persistent Disks are designed to be durable and highly-available:
Persistent disks are durable network storage devices that your instances can access like physical disks in a desktop or a server. The data on each persistent disk is distributed across several physical disks. Compute Engine manages the physical disks and the data distribution to ensure redundancy and optimize performance for you.
(emphasis my own, source: Google documentation)
You have a choice of zonal or regional (currently in public beta) persistent disks, on an HDD or SSD-based platform. For boot disks, only zonal disks are supported as of the time of this writing.
As the name suggests, zonal disks are only guaranteed to persist their data within a single zone; outage or failure of that zone may render the data unavailable. Writes to regional disks are replicated to two zones in a region to safeguard against the outage of any one zone. The Google Compute Engine console, "Disks" section will show you that boot disks for your instances are zonal persistent disks.
Irrespective of the durability, it is obviously wise to keep your own backups of your persistent disks in another form of storage to safeguard against other causes of data loss, such as corruption in your application or user error by an operator. Snapshots of persistent disks are replicated to other regions; however, be aware of their lifecycle in the event the parent disk is deleted.
In addition to reviewing the comprehensive page linked above, I recommend reviewing the relevant SLA documentation to ascertain the precise guarantees and service levels offered to you.
Usually cloud instances are volatile, a crash, shutdown, maintenance and so on, will destroy all data stored.
The cloud model does indeed prefer instances which are stateless and can be replaced at will. This offers many scalability and robustness advantages, which can be achieved using managed instance groups, for example. However, you can use VMs for persistent storage if desired.
Normally the data on the boot disk should survive restarts and other maintenance operations. But the boot disk will be deleted along with the instance by default.
If you use managed instance groups, preemptible instances and so on, and you want persistent data, you should use another storage system. If you just use a plain instance as is, it should be safe enough with backups.
I still think an additional persistent disk or another storage system is a better way to do things. But it's only my opinion.

Consistent EBS snapshot without downtime on a Windows Server 2012 AWS EC2 instance

I have an AWS EC2 Windows Server 2012 R2 instance with a magnetic EBS volume D:\ (the Windows OS is on C:\).
My server works on D:\: it constantly writes temporary files to D:\temp (session files, cache, etc.) and reads static files from D:\htdocs.
I need to take a daily consistent snapshot of the EBS volume without downtime.
On this question, a lot of people say:
Snapshotting an EBS volume while it is in use is possible but not recommended
From official documentation:
You can take a snapshot of an attached volume that is in use. However, snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued. If you can pause any file writes to the volume long enough to take a snapshot, your snapshot should be complete.
and here:
EBS volumes and snapshots operate at a block level - a consequence of
which allows snapshots to be taken while an instance is running, even
if the EBS volume is in use. However, only data that is actually on
the disk (i.e. not in a file cache) will be included in the snapshot.
It is the latter reason that gives rise to the idea of consistent
snapshots.
The recommended way is to detach the volume, snapshot it, and reattach it
My question is:
If the snapshot is inconsistent because writes were in progress when I took it, can I still remount it? The only files being written are temporary files that aren't important to me, so if they are damaged, can I simply delete them (after I remount the snapshot)? My only goal is to keep the static files safe.
If you create a snapshot, you will be able to create a volume from it, and remount it without any issues.
HOWEVER: you are not guaranteed that the data in the volume is consistent.
Consider this scenario: you commit a 1 MB file to an SSD-backed EBS volume. This will require 4 x 256k IO operations. So the first 3 complete, then you take your snapshot, then the 4th block is written.
You will be able to create a volume from your snapshot, but your file will only be 768k in size - the final block will not be there, since it was written after the snapshot was created.
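
The arithmetic in this scenario can be reproduced with a toy model. It is only a sketch: the "disk" is an in-memory byte array, the snapshot is a plain copy taken between block writes, and filesystem metadata is ignored entirely, so it illustrates the timing problem rather than how EBS works internally.

```python
BLOCK = 256 * 1024       # size of one I/O operation in the scenario
FILE_SIZE = 4 * BLOCK    # the 1 MB file


def write_with_snapshot(data, snapshot_after_blocks):
    """Write data block by block; copy the 'disk' after N blocks are written."""
    disk = bytearray()
    snapshot = None
    for offset in range(0, len(data), BLOCK):
        if offset // BLOCK == snapshot_after_blocks:
            snapshot = bytes(disk)          # point-in-time copy
        disk.extend(data[offset:offset + BLOCK])
    if snapshot is None:                    # snapshot taken after the last write
        snapshot = bytes(disk)
    return bytes(disk), snapshot


disk, snap = write_with_snapshot(bytes(FILE_SIZE), snapshot_after_blocks=3)
print(len(disk) // 1024, "KiB on the live volume")   # 1024
print(len(snap) // 1024, "KiB in the snapshot")      # 768
```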
If you have control over what is writing to the disk, pausing it and flushing any caches is really the only way to ensure that the data on the resulting snapshot is consistent.

Backup cassandra to another disk

I'm trying to backup my cassandra cluster to AWS' S3, and found this tool, which seems to do the work:
https://github.com/tbarbugli/cassandra_snapshotter/
But the problem is, in our current cluster, we can't afford to have snapshots on the same disk as the actual data, because we are using SSDs with limited space.
I've also looked up the nodetool snapshot documentation, but I didn't find any option to change the snapshots dir.
So, how can I backup cassandra to another disk, without using the data disk?
Cassandra snapshots are just hard links to all the live sstables at the moment you take the snapshot. So initially they don't take up any additional space on disk. As time passes, new live sstables will supersede the old ones, at which point your snapshots will start to count against your storage space.
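
The hard-link behaviour can be seen with a few lines of Python in a temporary directory. This is a generic filesystem illustration, not Cassandra code, and the file and directory names are made up: the "snapshot" is a second name for the same inode, and the data stays on disk after the live name is removed, which is the point where the snapshot starts to cost space.

```python
import os
import tempfile


def hard_link_demo():
    """Link a file into a 'snapshot' directory, then delete the original."""
    with tempfile.TemporaryDirectory() as root:
        live = os.path.join(root, "demo-Data.db")        # hypothetical name
        snap_dir = os.path.join(root, "snapshots", "demo")
        os.makedirs(snap_dir)
        with open(live, "wb") as f:
            f.write(b"x" * 4096)

        snap = os.path.join(snap_dir, "demo-Data.db")
        os.link(live, snap)                  # second name, no data copied
        same_inode = os.stat(live).st_ino == os.stat(snap).st_ino
        links_before = os.stat(snap).st_nlink

        os.remove(live)                      # the live file is superseded
        links_after = os.stat(snap).st_nlink
        size_after = os.path.getsize(snap)   # the data is still there
        return same_inode, links_before, links_after, size_after


print(hard_link_demo())
```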
Generally you will take a snapshot to get a consistent view of the database at a given point in time and then use an external tool or script to copy that backup to external storage (and finally clean up the snapshot).
There is no additional tool provided with Cassandra to handle copying the snapshots to external storage. This isn't too surprising as backup strategies vary a lot across companies.
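
As a rough sketch of such a script, the snapshot, copy, and clean-up sequence could look like this. The assumptions are that snapshots live under <data_dir>/<keyspace>/<table>/snapshots/<tag>, that nodetool is on the PATH, and that the destination is a directory on a different disk; the paths in the usage line are hypothetical. With dry_run=True the nodetool calls are skipped and only the copy runs.

```python
import shutil
import subprocess
from pathlib import Path


def find_snapshot_dirs(data_dir, tag):
    """Return every <keyspace>/<table>/snapshots/<tag> directory."""
    return sorted(Path(data_dir).glob(f"*/*/snapshots/{tag}"))


def copy_snapshots(data_dir, tag, dest):
    """Copy each snapshot directory to dest, keeping the relative layout."""
    copied = []
    for src in find_snapshot_dirs(data_dir, tag):
        target = Path(dest) / src.relative_to(data_dir)
        shutil.copytree(src, target)    # real copies on the other disk
        copied.append(target)
    return copied


def backup(data_dir, tag, dest, dry_run=True):
    if not dry_run:
        # Take the snapshot (hard links on the data disk).
        subprocess.run(["nodetool", "snapshot", "-t", tag], check=True)
    copied = copy_snapshots(data_dir, tag, dest)
    if not dry_run:
        # Remove the hard links so they stop pinning old sstables.
        subprocess.run(["nodetool", "clearsnapshot", "-t", tag], check=True)
    return copied
```

A call such as backup("/var/lib/cassandra/data", "nightly", "/mnt/backup", dry_run=False) would then run the whole sequence on one node; uploading the copy to S3 or similar is a separate step.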