GCP: Creating a snapshot of a VM including runtime processes

From what I could find, Google Cloud will only allow me to create a snapshot of a machine disk.
Is it possible in some way to also capture its runtime state, i.e., RAM and process states?

Unfortunately, snapshots are limited to the contents of the persistent disk; they do not include runtime processes or RAM. RAM is volatile memory, so its contents are not captured by a disk snapshot.
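Because a snapshot only records what is on disk, one workaround (my assumption, not a GCP feature) is for the application itself to periodically write the state it cares about to a file on the persistent disk, so that a later snapshot includes it. A minimal Python sketch follows. `save_checkpoint`, `load_checkpoint` and the JSON format are hypothetical choices. Only state the application explicitly writes is captured, never the live process itself.

```python
import json
import os
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Atomically write `state` as JSON to `path` (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so os.replace() is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the file contents to the disk
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except Exception:
        os.unlink(tmp_path)  # don't leave a half-written temp file behind
        raise


def load_checkpoint(path: str) -> dict:
    """Read back a checkpoint written by save_checkpoint()."""
    with open(path) as f:
        return json.load(f)
```

On restore from a snapshot, the application would call `load_checkpoint` at startup and resume from that state. The write-then-rename pattern means a snapshot taken mid-write still contains a complete checkpoint, either the previous one or the new one.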

Related

What does EC2 store and why does it even need a storage solution like EBS or Instance Store?

If you use EC2 and launch instances, you can add EBS volumes as a storage option. However, what I still don't understand is why. Why does EC2 even need a storage option like EBS or Instance Store? What does EC2 store anyway, and why does it make sense to have EBS?
I know that an EBS volume is persistent block storage and that its data is not lost when the instance stops, unlike instance store. I just don't really understand what EBS is useful for. For which cases and applications is EBS used? Or does using EBS have more to do with creating snapshots to cache data and then save it to S3?
I've already read a lot and tried to make sense of it, but I can't get any further. I would be really happy if someone could shed some light on this for me.
Thanks in advance!
Think of an Amazon EC2 instance as a normal computer. Inside, there is CPU, RAM and (perhaps) a hard disk.
When an EC2 instance has a hard disk, it is called Instance Store, and it behaves just like a normal hard disk in a computer. However, when you turn off the instance and stop paying for it, AWS can give that computer to somebody else. Rather than giving your data to somebody else, the disk is erased. So, anything you stored on Instance Store is gone! (In truth, instance store is also a virtualised disk, but this is close enough.)
In fact, in the early days of EC2, this was the only storage available. If you wanted to keep data after the instance was turned off, you first had to copy it to Amazon S3. People didn't like this, so they invented Amazon EBS.
If you want to keep your data so that it is still there when you turn on the instance in future, it needs to be stored on a network disk and that is what Amazon EBS provides. Think of it a bit like a USB drive that you can plug into one computer, then disconnect it and plug it into another computer. However, rather than being a physical device, it uses a storage service that keeps multiple copies of the data (in case a disk fails) and lets you modify the size of the disk. You are charged based on the amount of storage space assigned and how long the data is kept ("GB-Month").
Amazon EBS Snapshots are simply a backup of the disk. A snapshot contains all the data currently on the disk, allowing you to create a new disk at any time that will contain an exact copy of the disk as it was when the snapshot was created. This is great for backups, but it is also very useful for creating multiple EC2 instances with the same disk content. An EBS-backed Amazon Machine Image (AMI) is essentially just an Amazon EBS Snapshot plus a bit of metadata. When a new EC2 instance is launched, it uses an AMI to populate the boot disk rather than installing the operating system from scratch every time.
It is possible to create an AMI that populates an Instance Store disk. This way, you don't actually need to use an Amazon EBS volume. This is good for instances that don't need to permanently keep any data -- they could simply store information in a database or Amazon S3 instead of saving it on disk. Instance Store disks can be very fast since they don't send data across the network, so this is very useful in some situations.
In summary:
Instance Store is a normal disk in a computer (but it gets erased when the instance turns off so nobody else sees your data)
Amazon EBS volumes are network-attached storage that persists until you delete them
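To make the lifecycle difference concrete, here is a toy Python model. It is purely illustrative: `Instance` and `EbsVolume` are hypothetical classes and do not correspond to any AWS SDK. Stopping an instance wipes its instance store, while an EBS volume keeps its data and can be detached and attached to another instance. A snapshot is a point-in-time copy from which new volumes can be created.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EbsVolume:
    """Network-attached volume: its data lives independently of any instance."""
    data: Dict[str, str] = field(default_factory=dict)

    def snapshot(self) -> Dict[str, str]:
        # Point-in-time copy; later writes to the volume don't change it.
        return copy.deepcopy(self.data)

    @classmethod
    def from_snapshot(cls, snap: Dict[str, str]) -> "EbsVolume":
        # A new volume that starts out as an exact copy of the snapshot.
        return cls(data=copy.deepcopy(snap))


@dataclass
class Instance:
    instance_store: Dict[str, str] = field(default_factory=dict)
    ebs: Optional[EbsVolume] = None

    def stop(self) -> None:
        # The host may be handed to another customer, so local disks are erased.
        # The attached EBS volume is untouched.
        self.instance_store.clear()

    def detach(self) -> EbsVolume:
        if self.ebs is None:
            raise ValueError("no volume attached")
        vol, self.ebs = self.ebs, None
        return vol

    def attach(self, vol: EbsVolume) -> None:
        self.ebs = vol
```

For example, writing to both disks, stopping the instance, and moving the EBS volume to a second instance leaves the instance store empty. The volume's data survives on the new instance, and a snapshot taken earlier can still be turned into a fresh volume with the old contents.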

Are Google Compute Engine (GCE) persistent disk (PD) snapshots crash-consistent?

When I take a persistent disk snapshot, what are the consistency guarantees for this snapshot? I know it is not guaranteed to be "application consistent", but is it "crash consistent"? Or are there no consistency guarantees?
-- EDIT --
For comparison, Machine Images are guaranteed to be crash-consistent. However, the docs are silent on this issue for persistent disk snapshots.
TL;DR: PD snapshots are crash consistent.
From the Compute Engine persistent disk documentation:
When you take a snapshot of a persistent disk, you don't need to take any additional steps to make your snapshot crash consistent. In particular, you do not need to pause your workload.
As per the snapshot best practices, you can create a consistent snapshot of a persistent disk even while the application is writing data to it. If your app requires stricter (application-level) consistency, you can follow the steps below to ensure a consistent snapshot.
To prepare your persistent disk before you take a snapshot do the following:
1- Connect to your instance using SSH.
2- Run an app flush to disk. For example, MySQL has a FLUSH statement. Use whichever tool is available for your app.
3- Stop your apps from writing to your persistent disk.
4- Run sudo sync.
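The preparation steps can be scripted. Below is a minimal Python sketch meant to run on the VM itself. It assumes the caller has already flushed and paused the application (steps 2 and 3, which are app-specific). It then calls `os.sync()`, the same sync(2) call that `sudo sync` makes, and builds a `gcloud compute disks snapshot` command. The disk, zone and snapshot names are placeholders. By default the command is only returned, not executed.

```python
import os
import subprocess
from typing import List


def build_snapshot_command(disk: str, zone: str, snapshot_name: str) -> List[str]:
    """Return the gcloud CLI invocation that snapshots `disk` in `zone`."""
    return [
        "gcloud", "compute", "disks", "snapshot", disk,
        f"--zone={zone}",
        f"--snapshot-names={snapshot_name}",
    ]


def snapshot_after_sync(disk: str, zone: str, snapshot_name: str,
                        execute: bool = False) -> List[str]:
    """Flush kernel buffers, then (optionally) take the snapshot.

    The caller must flush the app (e.g. MySQL FLUSH) and stop its writes
    before calling this, and resume them once the snapshot has been created.
    """
    os.sync()  # write all dirty filesystem buffers to disk, like `sync`
    cmd = build_snapshot_command(disk, zone, snapshot_name)
    if execute:
        subprocess.run(cmd, check=True)  # raises if gcloud reports an error
    return cmd
```

Keeping `execute=False` as the default makes it easy to check the exact command before letting the script touch real infrastructure.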

Risk of disk corruption when using a persistent google cloud disk with a preemptible google compute instance?

I'd like to use a preemptible Google Compute Engine instance to save money, and persist data on a persistent disk that is attached to the instance. But I'm concerned about the risk of data corruption outlined in the detach-disk documentation, because it's unclear to me whether the disk will be detached properly when the instance is preempted. Any advice or experiences on this? I didn't find any guidance in the Google Cloud documentation. I'm wondering if I could add the unmount command, sudo umount /dev/disk/by-id/[disk-name], to the preemptible instance's shutdown script, but I haven't tried that yet. I could also set up frequent disk snapshots to minimize the damage if data corruption does happen.
You should have no concern about disk corruption simply because a preemptible Compute Engine instance is preempted. If disk buffers could fail to be flushed during a Compute Engine preemption, that fact would be stated loudly in the documentation and on the forums. While I have no information on how the instance is ultimately shut down, I firmly believe that just before it breathes its last, any outstanding persistent disk writes are committed.
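If you want extra reassurance anyway, the idea from the question (unmounting in a shutdown script) could look roughly like the Python sketch below. Treat it as an untested assumption, not documented guidance. Shutdown scripts on preemptible instances only get a short, best-effort window. `MOUNT_POINT` is a placeholder for wherever your disk is mounted (`umount` accepts either a mount point or a device path). The URL is the Compute Engine metadata endpoint `instance/preempted`, which returns `TRUE` or `FALSE` when queried with the `Metadata-Flavor: Google` header. You would register the file as the instance's `shutdown-script` metadata value, assuming your image's guest environment honors the shebang line.

```python
#!/usr/bin/env python3
"""Hypothetical shutdown script: flush and unmount a persistent data disk."""
import subprocess
import urllib.request
from typing import List

# Compute Engine metadata endpoint that reports TRUE once the VM is preempted.
PREEMPTED_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                 "instance/preempted")
# Placeholder: where the persistent disk is mounted on your VM.
MOUNT_POINT = "/mnt/disks/data"


def was_preempted(timeout: float = 2.0) -> bool:
    """Return True if the metadata server says this VM was preempted."""
    req = urllib.request.Request(PREEMPTED_URL,
                                 headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip() == "TRUE"


def unmount_commands(mount_point: str) -> List[List[str]]:
    """Commands that flush kernel buffers and then unmount the data disk."""
    return [["sync"], ["umount", mount_point]]


def main() -> None:
    try:
        print("preempted:", was_preempted())
    except OSError as exc:  # URLError and socket timeouts subclass OSError
        print("could not query metadata server:", exc)
    # Unmount on every shutdown, not only on preemption.
    for cmd in unmount_commands(MOUNT_POINT):
        try:
            subprocess.run(cmd, check=False)  # keep going if a step fails
        except OSError as exc:  # e.g. the binary is missing
            print("failed to run", cmd, exc)


if __name__ == "__main__":
    main()
```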

GCE: persistent boot disk

Simple question for GCE users: are persistent boot disks safe to use, or could data loss occur?
I've seen that I can attach additional persistent disks, but what about the standard boot disks (which should be persistent as well)?
What happens during maintenance, equipment failures and so on? Are these boot disks stored on hardware with built-in redundancy (RAID and so on)?
In other words, is a compute instance with a persistent boot disk similar to a non-cloud VM stored on local RAID (from a data-loss point of view)?
Usually cloud instances are volatile: a crash, shutdown, maintenance and so on will destroy all stored data.
Obviously, I'll have backups.
GCE Persistent Disks are designed to be durable and highly-available:
Persistent disks are durable network storage devices that your instances can access like physical disks in a desktop or a server. The data on each persistent disk is distributed across several physical disks. Compute Engine manages the physical disks and the data distribution to ensure redundancy and optimize performance for you.
(emphasis my own, source: Google documentation)
You have a choice of zonal or regional (currently in public beta) persistent disks, on an HDD or SSD-based platform. For boot disks, only zonal disks are supported as of the time of this writing.
As the name suggests, zonal disks are only guaranteed to persist their data within a single zone; outage or failure of that zone may render the data unavailable. Writes to regional disks are replicated to two zones in a region to safeguard against the outage of any one zone. The Google Compute Engine console, "Disks" section will show you that boot disks for your instances are zonal persistent disks.
Irrespective of durability, it is obviously wise to keep your own backups of your persistent disks in another form of storage to safeguard against other causes of data loss, such as corruption in your application or user error by an operator. Snapshots of persistent disks are replicated to other regions; however, be aware of their lifecycle in the event the parent disk is deleted.
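If you take frequent snapshots yourself, you also need a rotation policy. Below is a minimal Python sketch that decides which snapshots to delete so that only the newest N remain. It works on plain (name, creation time) pairs. Listing and deleting the real snapshots (console, gcloud, or a client library) is left out, and the `keep` count is an arbitrary assumption. Compute Engine's built-in snapshot schedules, which include retention settings, may be a simpler alternative.

```python
from datetime import datetime
from typing import List, Tuple

Snapshot = Tuple[str, datetime]  # (snapshot name, creation timestamp)


def snapshots_to_delete(snapshots: List[Snapshot], keep: int) -> List[str]:
    """Return names of all but the `keep` most recent snapshots, oldest first."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(snapshots, key=lambda s: s[1])  # oldest -> newest
    excess = len(ordered) - keep
    return [name for name, _ in ordered[:max(excess, 0)]]
```

Deleting oldest-first means that if the cleanup is interrupted, the snapshots that remain are still the most recent ones.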
In addition to reviewing the comprehensive page linked above, I recommend reviewing the relevant SLA documentation to ascertain the precise guarantees and service levels offered to you.
Usually cloud instances are volatile: a crash, shutdown, maintenance and so on will destroy all stored data.
The cloud model does indeed prefer instances which are stateless and can be replaced at will. This offers many scalability and robustness advantages, which can be achieved using managed instance groups, for example. However, you can use VMs for persistent storage if desired.
Normally, the data on the boot disk should survive restarts and other maintenance operations. However, by default the boot disk is deleted along with the instance.
If you use a managed instance group, preemptible instances, etc., and you want persistent data, you should use another storage system. If you just use the instance as is, it should be safe enough with backups.
I still think an additional persistent disk or another storage system is a better way to do things, but that's only my opinion.

Google Cloud - Local SSD hardware failure?

We are planning to use Google Cloud Local SSDs because we need better IOPS than persistent SSD disks offer. We want to build a RAID 5 array from 4 disks with mdadm (Linux). My question: how can we handle hardware failure with these disks? We can't unplug these disks, because we don't have physical access to the server. If we remove a disk with mdadm and add a new one, will that solve the problem?
Local SSD is an ephemeral storage space and is not a reliable storage method. For example, should the machine hosting your VM suffer from a hardware failure, your data will be lost and unrecoverable. The same is true if you stop the machine on purpose or accidentally.
RAID does not help, as your instance (and Google for that matter) will lose access to the data you stored on Local SSD once the instance stops running on that machine.
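To see why RAID cannot help here, recall what RAID 5 actually protects against. In the toy Python sketch below, byte strings stand in for disks; it does not model mdadm's real on-disk layout. It computes a parity block as the XOR of the data blocks and rebuilds any single lost block from the survivors. That recovery only works while the other members are still readable. When the VM leaves its host, all the Local SSDs become unavailable at once, so there is nothing left to rebuild from.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Append a parity block so that any one member can be reconstructed."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover a stripe with at most one missing member (marked None)."""
    missing = [i for i, b in enumerate(stripe) if b is None]
    if len(missing) > 1:
        raise ValueError("RAID 5 cannot recover from more than one lost member")
    restored = list(stripe)
    if missing:
        survivors = [b for b in stripe if b is not None]
        # XOR of all members is zero, so XOR of the survivors is the lost one.
        restored[missing[0]] = xor_blocks(survivors)
    return restored
```

With four devices, this layout gives three devices' worth of usable capacity and tolerates one device failure. Replacing a failed member with mdadm relies on the same reconstruction. The failure mode described in the answer, however, is losing the host, which takes out every member together.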