A snapshot is a way to save a VM's state so that the VM can be reverted to the point in time when the snapshot was taken.
Are there any other ways of doing this? For example, creating incremental copies of VM files and restoring those copies as needed; the copies would contain only incremental data. Are there any such alternatives to snapshots? One other consideration for me is to use only VMware tools/technologies.
Thanks,
Vivek.
A snapshot is one of the best tools you have for maintaining virtual machine state.
It locks the current disk and creates a new delta disk where the subsequent incremental data is stored.
So when you revert to the snapshot, the same state is restored.
VCB (VMware Consolidated Backup) is another way to take backups; internally it uses snapshots as well.
So AFAIK, taking snapshots is the only available way to maintain the state of a VM.
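For illustration, this is roughly what taking a snapshot looks like programmatically with pyVmomi; a minimal sketch, where the host, credentials, and VM name are placeholders:

```python
# Minimal sketch using pyVmomi (host, credentials, and VM name are
# placeholders for your vCenter/ESXi environment).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="secret", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "my-vm")
    # memory=False skips dumping RAM; quiesce=True asks VMware Tools to
    # flush guest I/O for a more consistent snapshot.
    task = vm.CreateSnapshot_Task(name="before-upgrade",
                                  description="state before patching",
                                  memory=False, quiesce=True)
finally:
    Disconnect(si)
```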
I would like to update Samba on a 3TB NAS. My boss suggested making a clone, but there is no storage large enough to hold a full clone. If a VM snapshot takes less space and can, in case of failure, restore Samba to how it was, it would be the better option.
There's no real guide for how much space snapshots occupy; it depends greatly on the activity of the VM where the snapshot was taken. If it's an active VM (a database or something of the like), a considerable amount of data could be written. If the VM isn't heavily used, little to no data may be written to the backend datastore.
Will restoring a snapshot recreate the environment EXACTLY as it was at the point of the snapshot? I am specifically referring to the operating system and installed software.
If not, then I assume that a disk image is the correct approach.
Snapshots and disk images use the same process: both take a point-in-time copy of a storage device by performing a block copy.
Will either restore exactly (bit-for-bit) as the source? Yes, if you shut down the VM instance; maybe if you do not. Google (and AWS, Azure, etc.) strongly recommend that you shut down your VM before these types of operations. The reason is that filesystem data could be cached in memory and not yet flushed to disk. A fully consistent snapshot requires that all applications and the OS participate in the snapshot process, and few applications do.
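If shutting down isn't an option, you can reduce the risk by quiescing the filesystem around the snapshot. A minimal sketch, assuming a Linux guest, util-linux's fsfreeze, and a data filesystem mounted at /data (take_snapshot() is a placeholder for your platform's snapshot trigger):

```python
# Sketch: quiesce a Linux filesystem around a snapshot when a full
# shutdown isn't possible. Assumes fsfreeze (util-linux) and a data
# filesystem mounted at /data.
import os
import subprocess

def take_snapshot():
    pass  # placeholder: call your hypervisor/cloud snapshot API here

os.sync()                                              # flush dirty pages
subprocess.run(["fsfreeze", "--freeze", "/data"], check=True)
try:
    take_snapshot()
finally:
    # Always unfreeze, even if the snapshot call fails.
    subprocess.run(["fsfreeze", "--unfreeze", "/data"], check=True)
```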
On one of my AWS instances running Ubuntu 16.04, I have a MySQL replica database on a 1TB ext4 EBS volume. I plan to increase it to 2TB. Before I increase the size of the volume and extend the filesystem using the resize2fs command, do I need to take any precautions? Is there any possibility of data corruption? If so, would it be sane to create an EBS snapshot of this volume?
Do I need to take any precautions?
You shouldn't need to take any unusual precautions -- just standard best practices, like maintaining backups and having a tested recovery plan. Anything can go wrong at any time, even when you're sitting, doing nothing.
Important
Before modifying a volume that contains valuable data, it is a best practice to create a snapshot of the volume in case you need to roll back your changes.
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-modify-volume.html
But this is not indicative of the operation being especially risky. Anecdotally, I've never experienced complications, and have occasionally resized an EBS volume and then its filesystem under a live, master, production database.
Is there any possibility of data corruption?
The possibility of data corruption is always there, no matter what you are doing... but this seems to be a safe operation. The additional space becomes available immediately, and there is no I/O freeze or disruption.
If so would it be sane to create a EBS snapshot of this volume?
As noted above, yes.
Concerns about errors creeping in later are valid, but EBS maintains internal consistency checks and will disable a volume if those checks fail, to avoid further scrambling of data, so that you can do a controlled recovery and repair operation.
This would not help if EBS is perfectly storing data that was corrupted by something on the instance, such as a defect in resize2fs, but it seems to be a solid utility. It doesn't move your existing data -- it just fleshes out the filesystem structures as needed so that the filesystem can use the entire free space that has become available.
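If you want to script the whole precaution-then-resize sequence, a rough sketch with boto3 follows; the volume ID, region, and device name are placeholders, and the resize2fs step must run on the instance itself:

```python
# Sketch of the safe sequence: snapshot -> grow the EBS volume -> grow
# the filesystem. Volume ID, region, and device name are placeholders.
import subprocess
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
vol_id = "vol-0123456789abcdef0"  # placeholder

# 1. Snapshot first, so you can roll back.
snap = ec2.create_snapshot(VolumeId=vol_id,
                           Description="before resize to 2TB")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Grow the volume (size is in GiB).
ec2.modify_volume(VolumeId=vol_id, Size=2048)

# 3. Once the modification reaches the "optimizing" or "completed"
#    state, grow the filesystem online on the instance itself.
subprocess.run(["resize2fs", "/dev/xvdf"], check=True)
```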
I have a GCP snapshot and I want to read the data from that snapshot (not restore it). How can I do that? What APIs are available in GCP for that?
It's reasonable to want to browse (Persistent Disk) snapshots, but you cannot.
You must restore a (series of) snapshots to a PD in order to browse the content.
Although more involved, you could create a PD, restore the snapshot to it, do what you need, and then delete the PD, without incurring considerable cost.
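A minimal sketch of that workflow with the google-cloud-compute client (project, zone, and resource names are placeholders; attaching and mounting the disk are left as a comment):

```python
# Sketch: materialize a snapshot as a temporary disk so its contents
# can be attached to a VM and browsed. Project, zone, and resource
# names are placeholders.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"
disks = compute_v1.DisksClient()

disk = compute_v1.Disk(
    name="tmp-restore-disk",
    source_snapshot=f"projects/{project}/global/snapshots/my-snapshot",
)
disks.insert(project=project, zone=zone, disk_resource=disk).result()

# ...attach the disk to an instance, mount it, inspect the files...

# Delete the temporary disk when done to stop paying for it.
disks.delete(project=project, zone=zone, disk="tmp-restore-disk").result()
```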
I assume your use case is that you'd like to confirm that a specific (set of) file(s) has been snapshotted?
If so, you may wish to consider a formal backup solution. This would provide you with a rich, queryable source of backed-up files, would permit you to move backup replicas to different locations or media, and could possibly provide online backups whereby you needn't stop the applications atop your PDs while backups are taken.
Currently I am looking at how backup/restore can be done in Cassandra. We've set up a three-node cluster in AWS. I understand that we can take a snapshot using the nodetool snapshot tool, but it's a bit of a cumbersome process.
My idea is :
Make use of EBS snapshots, because they're durable and easy to set up. One problem I see with EBS, though, is inconsistent backups. Hence, my plan is to run a script prior to taking the EBS snapshot which would run the flush command to flush all memtable data to disk (SSTables) and then create hard links to the flushed SSTables.
Once that's done, initiate the EBS snapshot; this way we can address the inconsistency issue we might face if we only used EBS snapshots.
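Roughly, the script I have in mind looks like this (the volume ID, region, and snapshot tag are placeholders):

```python
# Rough sketch of the plan: flush memtables to SSTables on this node,
# then trigger an EBS snapshot of the data volume. Volume ID, region,
# and snapshot tag are placeholders.
import subprocess
import boto3

# 1. Flush memtables so the on-disk SSTables reflect recent writes.
subprocess.run(["nodetool", "flush"], check=True)

# 2. Hard-link the flushed SSTables into a named snapshot directory.
subprocess.run(["nodetool", "snapshot", "-t", "pre-ebs-backup"], check=True)

# 3. Kick off the EBS snapshot of the data volume.
ec2 = boto3.client("ec2", region_name="us-east-1")
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
                           Description="cassandra data, post-flush")
print("started snapshot", snap["SnapshotId"])
```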
Please let me know if you see any issue with this approach or share your suggestions.
Being immutable, SSTables do help a lot when it comes to backups, indeed.
Your idea sounds OK for situations where everything is healthy in your cluster. Cassandra is consistency-configurable (if I say eventually consistent, some people may be offended here, hehe), and since the system itself may not be fully consistent at a given time, you cannot say your backup will be either. On the other hand, one of the beauties of Cassandra (and NoSQL models) is that it tends to recover pretty well in most situations (quite the opposite of relational databases, which are very sensitive to data loss). It's very unlikely you'll end up with a bunch of useless data if you have at least fully preserved SSTable files.
Be aware that EBS snapshots are block-level, so the filesystem on top of the volume can be a concern as well. Fortunately, every modern filesystem has journaling nowadays and is pretty reliable, so that shouldn't be a problem; still, keeping your data on a separate partition is good practice, so the chances of something else writing to it right after a full flush are smaller.
You may have some lost replicas when you eventually need to restore your cluster, requiring you to run nodetool repair, which, as you'll know if you've done it before, is a bit painful and takes very long for large amounts of data. (But repair is recommended to be run regularly anyway, especially if you delete a lot.)
Another thing to consider is hinted handoffs (writes whose row owners are missing, but which are kept by other nodes until the owners come back). I don't know what happens to them when you flush, but I guess they're kept in memory and in the commit logs only.
And, of course, do a full restore test before you assume this will work in the future.
I don't have a lot of experience with Cassandra, but the backup solutions I have heard about for it are whole-cluster replicas in another region or datacenter, rather than cold backups like snapshots. That's probably more expensive, but also more reliable than the raw disk snapshots you're trying to do.
I am not sure how backing up a single node will help, because in C* the data is already replicated to the other replica nodes.
If a node dies and has to be replaced, the new node will learn from the other nodes which data it needs to own and will stream it from them, so you might not need to restore from a disk backup.
Would a replication scenario like the following help?
Use two datacenters (DC:A with 3 nodes, DC:B with one node) with an RF of (A:2 & B:1). Allow clients to interact only with nodes in DC:A, with a read/write consistency of LOCAL_QUORUM. Since the quorum here is 2, all reads and writes will succeed, and the data will be replicated to DC:B. You can then back up DC:B.
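A sketch of the client side with the Python driver, under the setup above (contact points, DC names, and the keyspace/table are placeholders):

```python
# Sketch: pin clients to DC "A" and use LOCAL_QUORUM so reads/writes
# succeed against A's two replicas while data still replicates to B.
# Contact points, DC names, and keyspace/table are placeholders.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra.query import SimpleStatement

cluster = Cluster(
    contact_points=["10.0.0.1", "10.0.0.2", "10.0.0.3"],  # DC:A nodes
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="A"),
)
session = cluster.connect()

# RF of A:2 and B:1, as described above.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'A': 2, 'B': 1}
""")

stmt = SimpleStatement("SELECT * FROM app.users WHERE id = %s",
                       consistency_level=ConsistencyLevel.LOCAL_QUORUM)
rows = session.execute(stmt, (42,))
```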