Local SSDs are the fastest storage available in Google Cloud Platform, which makes it clear why people would want to use them. But they come with some severe drawbacks, most notably that all data on the SSDs is lost if the VM is shut down manually.
It also seems to be impossible to take an image of a local SSD and restore it after a restart.
What then is the recommended way to back up local SSD data if I have to shut down the VM (for example, if I want to change the machine specs)?
So far I can only think of manually copying my files to a bucket or a persistent disk.
If you have a solution to this, would it work if multiple local SSDs were mounted into a single logical volume?
I'm guessing that you want to create a backup of data stored on local SSD every time you shut down the machine (or reboot).
To achieve this (as @John Hanley commented) you have to copy the data, either manually or with a script, to other storage (persistent disk, bucket, etc.).
If you're running Linux:
Here's a great answer on how to run a script at reboot/shutdown. You can then create a script that copies all the data to some more persistent storage solution.
If I were you, I'd just use rsync and cron. Run it every hour or every day (depending on your use case). Here's another great example of how to use rsync to synchronize folders.
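For example, something along these lines (a sketch; the mount points and bucket name are made-up placeholders, adjust them to your setup):

```bash
# Mirror the local SSD mount to a persistent disk mount.
rsync -a --delete /mnt/disks/local-ssd/ /mnt/disks/backup/

# Or push it to a Cloud Storage bucket instead.
gsutil -m rsync -r -d /mnt/disks/local-ssd gs://my-backup-bucket/local-ssd

# Example crontab entry to run the persistent-disk copy every hour:
# 0 * * * * rsync -a --delete /mnt/disks/local-ssd/ /mnt/disks/backup/
```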
If you're running Windows:
It is also possible to run a command at Windows shutdown, and here's how.
I am running a notebook instance on Amazon Sagemaker and my understanding is that notebooks by default have 5GB of storage. I am running into [Errno 28] No space left on device on a notebook that worked just fine the last time I tried it. I checked and I'm using approximately 1.5GB out of 5GB. I'm trying to download a bunch of files from my S3 bucket but I get the error even before one file is downloaded. Additionally, the notebook no longer autosaves.
Has anyone run into this and figured out a way to fix it? I've already tried clearing all outputs.
Thanks in advance!
Open a terminal and run df -kh to see which filesystem is running out of disk space.
There's a root filesystem, which is 100 GB, and a user filesystem whose size you can customize (default 5 GB) (doc).
A guess: I've seen that, especially when using Docker, the root filesystem can run out of space.
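If Docker is indeed the culprit, something like this can reclaim root-filesystem space (a sketch; docker system prune removes stopped containers, unused networks and dangling images, so review what it will delete first):

```bash
# Find out which filesystem is actually full.
df -kh

# Reclaim space taken by stopped containers, unused networks and dangling images.
docker system prune

# Add -a to also remove every image not used by a running container:
# docker system prune -a
```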
You will want to try restarting or killing the kernel. Sometimes cells are left running while you try to execute another operation. Log files, if you have any, can also eat up the space, so try to remove any auxiliary files you are not using.
I work for AWS & my opinions are my own
This is often seen when unused resources are left running.
The default filesystem size is 100 GB.
If you are using SageMaker Studio, you can use [this][1] JupyterLab extension to automatically shut down Kernels, Terminals and Apps in SageMaker Studio when they have been idle for a stipulated period of time. You can configure the idle time limit using the user interface this extension provides.
[1]: https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension
You can resize the persistent storage of your notebook up to 16 TB by editing the notebook details in the AWS console. This volume, however, is mounted under /home/ec2-user/SageMaker. Download your files under this folder and you'll see all the storage you allocated.
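For example, assuming the AWS CLI is available on the instance (the bucket and prefix below are placeholders):

```bash
# Download into the resizable volume mounted at /home/ec2-user/SageMaker,
# not into the small root filesystem.
mkdir -p /home/ec2-user/SageMaker/data
aws s3 sync s3://my-bucket/my-prefix /home/ec2-user/SageMaker/data

# Verify the free space is where you expect it.
df -kh /home/ec2-user/SageMaker
```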
I am trying to run Airflow in Google Cloud Run.
I'm getting a "disk I/O error"; I guess the disk write permission is missing.
Can someone please help me with how to give write permission inside Cloud Run?
I also have to write a file and later delete it.
Only the directory /tmp is writable in Cloud Run. So, change the default write location to write into this directory.
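For Airflow, that roughly means pointing its home directory (and therefore its logs and any SQLite file) at /tmp before starting it. A sketch, assuming Airflow 2.x style environment-variable overrides:

```bash
# Keep everything Airflow writes inside the writable /tmp filesystem.
export AIRFLOW_HOME=/tmp/airflow
# Logs need a writable location too (overrides [logging] base_log_folder in Airflow 2.x).
export AIRFLOW__LOGGING__BASE_LOG_FOLDER=/tmp/airflow/logs
mkdir -p "$AIRFLOW_HOME" "$AIRFLOW__LOGGING__BASE_LOG_FOLDER"
```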
However, you have to be aware of 2 things:
Cloud Run is stateless; that means when a new instance is created, the container starts from scratch, with an empty /tmp directory
The /tmp directory is an in-memory file system. The maximum allowed memory on Cloud Run is 2 GB, your app's memory footprint included. Between your files and Airflow itself, I'm not sure you will have a lot of space.
A final remark: Cloud Run is active only while it processes requests, and a request has a maximum timeout of 15 minutes. When there is no request, the allowed CPU is close to 0%. I'm not sure what you want to achieve with Airflow on Cloud Run, but my feeling is that your design is strange, and I prefer to warn you before you spend too much effort on this.
EDIT 1:
The Cloud Run service has evolved in the right way. In 2022:
/tmp is no longer the only writable directory (you can write everywhere, but it's still in memory)
the timeout is no longer limited to 15 minutes, but to 60 minutes
the 2nd gen execution environment (still in preview) allows you to mount an NFS (Filestore) or Cloud Storage (GCS Fuse) volume to make services "more stateful"
You can also execute jobs now. So, a lot of great evolution!
My impression is that you have a write I/O error because you are using SQLite. Is that possible?
If you want to run Airflow using containers, I would recommend using Postgres or MySQL as the backend database.
You can also mount the plugins and DAGs folders on some external volume.
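As a sketch, switching the backend is a single connection-string setting; on recent Airflow 2.x releases the variable is AIRFLOW__DATABASE__SQL_ALCHEMY_CONN (older releases read it from AIRFLOW__CORE__SQL_ALCHEMY_CONN), and the host and credentials below are placeholders:

```bash
# Point Airflow at an external Postgres instead of the default SQLite file.
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@db-host:5432/airflow"

# Initialize the metadata database on the new backend.
airflow db init
```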
I have a question about "quiesced" snapshots.
As I understand it, the steps of a quiesced snapshot are:
Freeze FS and Processes
Run pre-freeze
Make snapshot
Run post-script
Unfreeze FS and Processes
Is that right? For example, if I stop MySQL in the pre-freeze script, do I need to start it again in the post-script, or not?
You may find this KB article useful:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180
In general, the goal here is to get the file system into a consistent state. That is to say, if you restore from this snapshot, you want to make sure that your application can re-start from this point. For many SQL based databases, this implies that transactions should be committed prior to snapshot. The actual details vary depending on the system you are using.
All of this is managed by guest tools as only the guest can actually quiesce the file system.
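On a Linux guest with VMware Tools, the freeze hooks conventionally live at /usr/sbin/pre-freeze-script and /usr/sbin/post-thaw-script (check the KB above and your Tools version for the exact locations). A minimal sketch for the MySQL example; both files need to be executable, and whatever you stop before the freeze you would normally start again after the thaw:

```bash
# Contents of /usr/sbin/pre-freeze-script (runs just before the filesystem is frozen):
#!/bin/sh
systemctl stop mysql

# Contents of /usr/sbin/post-thaw-script (runs after the filesystem is thawed):
#!/bin/sh
systemctl start mysql
```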
I'm planning to get a Hadoop/HBase cluster up and I'm trying to figure out which EC2 instance type to use and how much EBS space.
I'm going initially with
1 master (m1.small)
2 slaves (m1.small)
I'm not expecting more than 100 simultaneous users on my website (is this no big deal?)
Well, I would attach a 20 GB EBS volume to each of the master and slaves. These EBS volumes will contain the data storage and logs from HDFS and HBase.
The HBase path would look like /mnt/hadoop/hbase/root, where /mnt/hadoop is the directory where the EBS volume (e.g. /dev/sda) is mounted.
Eventually this space will fill up, and when I realize that 20 GB is too little, I would create a 60 GB volume (/dev/sdb, let's say) and attach it to the instance. Then I'll copy everything from /dev/sda to /dev/sdb and finally mount /dev/sdb at /mnt/hadoop.
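A sketch of that migration (device names are illustrative, and HDFS/HBase should be stopped before the copy):

```bash
# Format and temporarily mount the new, larger volume.
sudo mkfs -t ext4 /dev/sdb
sudo mkdir -p /mnt/new
sudo mount /dev/sdb /mnt/new

# Copy everything, preserving ownership, permissions and timestamps.
sudo rsync -a /mnt/hadoop/ /mnt/new/

# Swap the mounts so the larger volume now backs /mnt/hadoop.
sudo umount /mnt/new
sudo umount /mnt/hadoop
sudo mount /dev/sdb /mnt/hadoop

# Remember to update /etc/fstab so the new device is mounted at /mnt/hadoop after a reboot.
```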
Does HDFS/HBase see any difference after this change? Is it legal to do it this way, or is it discouraged?
How do we increase the storage of the device where HBase/HDFS writes its data?
Neither HBase nor Hadoop stops you from doing that. It's merely a matter of changing your configuration parameters accordingly. You need to be careful, though.
But why would you do that when you are sure you are going to hit the limits? Prevention is better than cure, IMHO.
I was wondering what kind of technique VMware snapshots use to ensure that you can return to a previous state without copying the VM's entire disk?
It is basically a delta child disk. Writes are made to it while running off the snapshot, which makes it easy to revert.
link to explanation