Google Cloud hard disk deleted, all data lost


My Google Cloud VM's hard disk got full, so I tried to increase its size. I have done this before, but this time things went differently. I increased the size, but the VM was not picking up the new size, so I stopped the VM. The next thing I knew, my VM had been deleted and recreated, and my hard disk had returned to its previous size with all data lost. It held my database with over 2 months of changes.
I admit I was careless not to back up. But my current concern is whether there is a way to retrieve the data. On Google Cloud, the Gold Plan, which includes tech support, is listed at $400. If I can be certain that they will be able to recover the data, I am willing to pay. Does anyone know whether, if I pay $400, the Google support team will be able to recover the data?
If there are other ways to recover the data, kindly let me know.
UPDATE:
A few people have shown interest in investigating this.
This most likely happened because the "Auto-delete boot disk" option is selected by default, which I was not aware of. But even then, I would expect auto-delete to happen when I delete the VM, not when I simply stop it.
I am attaching a screenshot of all the activities that happened after I resized the boot partition.
As you can see, I resized the disk at 2:00 AM.
After receiving the resize-successful message, I stopped the VM.
Suddenly, at 2:01, the VM was deleted.
At this point I had not checked the notifications; I simply thought it had stopped. Then I started the VM, hoping to see the new resized disk.
Instead of my VM starting, a new VM was created with a new disk, and all previous data was lost.
I tried stopping and starting the VM again, but the result was the same.
UPDATE:
Adding the activity log from before the incident.

It is not possible to recover deleted persistent disks (PDs).
You have no snapshots either?

The disk may have been marked for auto-delete.
However, this disk shouldn't have been deleted when the instance was stopped, even if it was marked for auto-delete.
Also, a persistent disk can only be recovered from a snapshot.
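To avoid ending up in this situation, you can turn off auto-delete on the boot disk and take a snapshot before a risky operation such as a resize. The sketch below only builds the gcloud command lines and prints them (it does not run anything); the instance, disk, zone, and snapshot names are hypothetical placeholders, and it assumes the Cloud SDK is installed wherever you run the printed commands.

```python
import shlex


def protect_boot_disk_commands(instance, disk, zone, snapshot):
    """Return gcloud argv lists that (1) stop `disk` from being deleted along
    with `instance` and (2) snapshot it. Nothing is executed here."""
    return [
        # Keep the disk even if the instance is deleted.
        ["gcloud", "compute", "instances", "set-disk-auto-delete", instance,
         f"--disk={disk}", "--no-auto-delete", f"--zone={zone}"],
        # Point-in-time copy that a disk can later be restored from.
        ["gcloud", "compute", "disks", "snapshot", disk,
         f"--snapshot-names={snapshot}", f"--zone={zone}"],
    ]


if __name__ == "__main__":
    # Hypothetical names; replace with your own.
    for argv in protect_boot_disk_commands("my-vm", "my-vm-disk",
                                           "us-central1-f", "my-vm-before-resize"):
        print(shlex.join(argv))
```

Building argv lists rather than shell strings keeps the sketch easy to inspect and, if you choose, to pass to `subprocess.run` without quoting problems.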

In a managed instance group (MIG), when you stop an instance, its health check fails, and the MIG deletes and recreates the instance if the autoscaler is on. The process is discussed here. I hope that sheds some light, if that is your use case.
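A toy model of the behaviour described above may make it clearer why the data vanished: the MIG replaces the stopped instance with a new one built from its instance template, and the template typically describes only the source image, not anything written to the disk afterwards. This is a simplified illustration under those assumptions, not the Compute Engine API; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    status: str = "RUNNING"
    # Contents of the boot disk; a fresh disk holds only the base image.
    boot_disk: dict = field(default_factory=lambda: {"image": "base-image"})


def recreate_from_template(name: str) -> Instance:
    # The template knows the image, not data written after creation,
    # so the replacement instance gets a brand-new boot disk.
    return Instance(name=name)


def autoheal(group: list[Instance]) -> list[Instance]:
    """Toy MIG step: keep RUNNING instances, replace everything else."""
    return [inst if inst.status == "RUNNING" else recreate_from_template(inst.name)
            for inst in group]


if __name__ == "__main__":
    vm = Instance("db-vm")
    vm.boot_disk["database"] = "2 months of changes"
    vm.status = "TERMINATED"  # the status Compute Engine reports for a stopped VM
    healed = autoheal([vm])
    print(healed[0].boot_disk)  # the "database" entry is gone
```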

Related

Google Cloud SQL - Database instance storage size increasing dramatically every day

I have a database instance (MySQL 8) on Google Cloud, and for the past 20 days the instance's storage usage has kept increasing (approximately 2 GB every single day!).
But I couldn't find out why.
What I have done:
Checked the "Point-in-time recovery" option; it's already disabled.
Binary logging is not enabled.
Checked the actual database size: my database is only 10GB.
No innodb_file_per_table flag is set, so it is at its default value (ON since MySQL 5.6.6).
Storage usage chart:
Database flags:
The actual database size is 10GB, but storage usage is now up to 220GB! That's a lot of money!
I couldn't resolve this issue; please give me some ideas. Thank you!
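To see which tables account for the space, you can ask MySQL for per-table sizes from information_schema. The sketch below holds that query plus a small helper that ranks the returned rows; the helper name is hypothetical, and connecting (with any DB-API driver's cursor) is left to you. These figures cover table data and indexes only, and in MySQL 8 they may be cached for up to a day (information_schema_stats_expiry), so treat them as approximate.

```python
# Per-table on-disk size (data + indexes) in bytes, largest first.
TABLE_SIZE_QUERY = """
SELECT table_schema, table_name, data_length + index_length AS total_bytes
FROM information_schema.TABLES
ORDER BY total_bytes DESC
"""

GIB = 1024 ** 3


def largest_tables(rows, top_n=5):
    """Rank (schema, table, total_bytes) rows and convert sizes to GiB.

    total_bytes can be NULL (None), e.g. for views, so it is treated as 0.
    """
    ranked = sorted(rows, key=lambda row: row[2] or 0, reverse=True)
    return [(schema, table, round((size or 0) / GIB, 2))
            for schema, table, size in ranked[:top_n]]

# Usage with any DB-API cursor (connection details are up to you):
#   cursor.execute(TABLE_SIZE_QUERY)
#   for schema, table, gib in largest_tables(cursor.fetchall()):
#       print(f"{schema}.{table}: {gib} GiB")
```

A table that keeps climbing to the top of this list from day to day is the first place to look for a runaway writer.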

I had the same thing happen to me about a year ago. I couldn't determine any root cause of the huge increase in storage size. I restarted the server and the problem stopped. None of my databases experienced any significant increase in size. My best guess is that some runaway process causes the binlog to blow up.

It turns out the problem was a WordPress theme function called "related_products", which read and wrote a record for every product a user came across (millions per day), making the database grow enormously on disk.

Ongoing "not enough resources" error when creating a VM

I keep getting an error message saying there are not enough resources in the zone to create a VM (us-central1-f). This has been going on for a couple of days. Is there a way to fix or report this? Any advice or answers would be appreciated!

You can reserve the resources you need, or wait and try your luck creating the desired VM again. Lowering the VM specs (machine type, amount of RAM, etc.) will also increase your chances.
Otherwise, you have to use another zone or even another region. There's no way around it: even GCP has limited resources, and due to high demand some of them may not be available. The only difference will be higher latency.
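If you script VM creation, the "wait or try another zone" advice can be automated as a loop over candidate zones with a growing pause between rounds. This is a sketch: create_vm stands in for whatever you actually call (gcloud, a client library), and ZoneExhaustedError is a hypothetical exception it is assumed to raise when a zone reports insufficient resources.

```python
import time
from typing import Callable, Sequence


class ZoneExhaustedError(Exception):
    """Hypothetical error raised by create_vm when a zone lacks capacity."""


def create_with_fallback(create_vm: Callable[[str], str],
                         zones: Sequence[str],
                         rounds: int = 3,
                         wait_seconds: float = 60.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Try each zone in order; if all are full, wait and retry the whole list.

    The wait doubles after every failed round (simple exponential backoff).
    Returns whatever create_vm returns for the first zone that succeeds.
    """
    for attempt in range(rounds):
        for zone in zones:
            try:
                return create_vm(zone)
            except ZoneExhaustedError:
                continue  # this zone is out of resources; try the next one
        if attempt < rounds - 1:
            sleep(wait_seconds * 2 ** attempt)
    raise ZoneExhaustedError(f"no capacity in {list(zones)} after {rounds} rounds")
```

Ordering `zones` with your preferred (lowest-latency) zone first means the fallback only costs latency when it is actually needed.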

Not enough resources available to fulfill the request: try a different zone

All of my machines in different zones have the same issue and cannot start.
"Starting VM instance "home-1" failed.
Error:
The zone 'projects/extreme-pixel-208800/zones/us-west1-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later."

I am having the same issue. I emailed Google and found out this has nothing to do with quota. However, you can try to reduce your instance's requirements (e.g., fewer CPUs, less RAM, fewer GPUs). It might work if you are lucky.
Secondly, if you email Google about it, you will get a reply based on the following template:
Good day! This is XX from Google Cloud Platform Support and I'll be
glad to help you from here. First, my apologies that you’re
experiencing this issue. Rest assured that the team is working hard to
resolve it.
Our goal is to make sure that there are available resources in all
zones. This type of issue is rare, when a situation like this occurs
or is about to occur, our team is notified immediately and the issue
is investigated.
We recommend deploying and balancing your workload across multiple
zones or regions to reduce the likelihood of an outage. Please review
our documentation [1] which outlines how to build resilient and
scalable architectures on Google Cloud Platform.
Again, we want to offer our sincerest apologies. We are working hard
to resolve this and make this an exceptionally rare event. I'll be
keeping this case open for one (1) business day in case you have
additional question related to this matter, otherwise you may
disregard this email for this ticket to automatically close.
All the best,
XXXX Google Cloud Platform Support
[1] https://cloud.google.com/solutions/scalable-and-resilient-apps
As for how long you can expect to wait and when this issue is likely to happen:
I waited 1.5 to 3 days on average.
On weekends (roughly Friday to Sunday), during daytime EST, resources are quite likely to be unavailable on GCP.
Usually, when one of your instances has this issue, the others do too. For me, repeatedly trying different regions was a waste of time. (But maybe I just didn't have any luck.)

The error message "The zone 'projects/[...]' does not have enough resources available to fulfill the request. Try a different zone, or try again later." is always in reference to a shortage of resources in a zone.
Google recommends spreading your workload across different zones to reduce the impact of these issues. Otherwise, there isn't much to do other than wait or try another zone/region.

I faced this issue yesterday [01/Aug/2020] after my GCP free credit ran out, and the steps below helped me work around it.
I was in the asia-south1-c zone and moved to a US zone.
1. Go to Google Cloud Platform >>> Compute Engine.
2. Go to Snapshots >>> Create snapshot >>> select your Compute Engine instance.
3. Once the snapshot is complete, click on it. You will land on the "Snapshot details" page.
4. At the top, click Create instance. Here you are basically creating an instance with a copy of your disk.
5. Select your new zone and a new name, and don't forget to attach GPUs and restore all previous settings.
6. Click Create. That's it: your image should now be running in your new zone.
No need to worry about losing your configuration, either.
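The same move can be scripted with gcloud: snapshot the disk in the old zone, create a disk from the snapshot in the new zone, then boot a new instance from that disk. The sketch below only assembles and prints the commands; all names, zones, and the machine type are hypothetical placeholders, and GPUs and other settings from the original VM are not carried over (they would need extra flags).

```python
import shlex


def migration_commands(old_disk, old_zone, new_zone, snapshot, new_disk, new_vm,
                       machine_type="e2-medium"):
    """Return gcloud argv lists that move a boot disk to another zone via a snapshot."""
    return [
        # 1. Snapshot the existing disk (snapshots are global, not zonal).
        ["gcloud", "compute", "disks", "snapshot", old_disk,
         f"--zone={old_zone}", f"--snapshot-names={snapshot}"],
        # 2. Create a new disk from the snapshot in the target zone.
        ["gcloud", "compute", "disks", "create", new_disk,
         f"--source-snapshot={snapshot}", f"--zone={new_zone}"],
        # 3. Boot a new VM from that disk.
        ["gcloud", "compute", "instances", "create", new_vm,
         f"--zone={new_zone}", f"--machine-type={machine_type}",
         f"--disk=name={new_disk},boot=yes"],
    ]


if __name__ == "__main__":
    for argv in migration_commands("my-vm-disk", "asia-south1-c", "us-central1-a",
                                   "my-vm-snap", "my-vm-disk-us", "my-vm-us"):
        print(shlex.join(argv))
```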

Recover session after Network Error in AWS

I'm a beginner AWS user, and I'm using an EC2 instance for MCMC sampling, which takes several hours. Unfortunately, I had a network problem in the middle of the sampling and got the message:
Network error: Software caused connection abort
So I had to reboot the instance, losing all my work (but not my data).
Is there a way to set up the instance to avoid this issue?
Thank you in advance

I'm unsure what MCMC sampling means, but I will try to guess.
The only way not to lose information in such cases is to store it in reliable storage, e.g. S3.
If you mean long calculations, then you need to parallelize them, or at least split them into smaller chunks, and store the queue, its status, and the intermediate results in reliable storage. Perhaps the code will have to be modified. If your calculations can be parallelized, you may want to look at SQS and spot instances; sometimes you can save a lot of money.
If my guess is incorrect, please clarify.
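A minimal sketch of the "store intermediate results" idea, assuming a Python sampler: the chain state (current point, samples so far, and the random generator's state) is saved after every chunk, so a rerun resumes instead of starting over. The target distribution (a standard normal), function names, and file path are hypothetical; the checkpoint is written locally with an atomic rename, and copying it to S3 is left out. Rewriting all samples on every save keeps the sketch short but would be replaced by appending in a long real run.

```python
import math
import os
import pickle
import random
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically rename it over `path`,
    so a crash mid-write never corrupts the previous checkpoint."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run_chain(path, n_samples, chunk=1000, seed=0):
    """Random-walk Metropolis sampler for a standard normal target.

    Saves a checkpoint after every `chunk` samples; calling it again with the
    same path resumes from the last checkpoint instead of starting over.
    """
    state = load_checkpoint(path)
    if state is None:
        state = {"x": 0.0, "samples": [], "rng": random.Random(seed).getstate()}
    rng = random.Random()
    rng.setstate(state["rng"])
    x, samples = state["x"], state["samples"]
    while len(samples) < n_samples:
        for _ in range(min(chunk, n_samples - len(samples))):
            proposal = x + rng.gauss(0.0, 1.0)
            # Accept with probability min(1, p(proposal) / p(x)),
            # where p(x) is proportional to exp(-x^2 / 2).
            if rng.random() < math.exp(min(0.0, (x * x - proposal * proposal) / 2)):
                x = proposal
            samples.append(x)
        save_checkpoint(path, {"x": x, "samples": samples, "rng": rng.getstate()})
    return samples
```

Because the generator state is checkpointed too, an interrupted-then-resumed run produces exactly the same samples as an uninterrupted one.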

Instead of stopping and starting the instance, reboot it; that fixes this issue most of the time. A reboot preserves any data on the instance store volumes, whereas stopping the instance erases it.

WHM/cPanel on AWS hosting: this partition is full, how?

I've got this hosting account with AWS, and I had a partition that was getting really full, so I hired someone to create a new partition, and he showed me how to migrate a few accounts into it.
I thought this would bring down the amount of data that was in the near full partition, but it hasn't.
As you can see from the screenshot, the partition called 'home3' is completely full, and 'home4' (the new partition) is slowly filling up.
I'm assuming home3 is full of backups or something.
How do I clean up home3 without using command-line tools? Or do I need to hire a real pro to do this? Is there something in WHM that allows me to do a cleanup? If there is, I can't find it.

If you are not sure what is stored in the /home3 partition, you should hire a server admin to check it and perform the necessary migration for you. Before you delete any data, you should know whether it is important.
Also, in some cases a server reboot is required after deleting or moving data from a full partition; the actual free space should be reflected after the reboot.
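If you (or the admin you hire) want to see what is taking up the space before deleting anything, a short script can total file sizes per top-level directory. This sketch assumes the partition is mounted at /home3, as the naming in the question suggests; the function names are hypothetical, it reports apparent file sizes, skips symlinks and unreadable files, and ignores files sitting directly in the root, so its totals may differ from what df or WHM show.

```python
import os


def dir_sizes(root):
    """Map each top-level subdirectory of `root` to the total apparent size
    (in bytes) of the regular files beneath it. Symlinks are not followed."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # files directly in root are ignored
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    file_path = os.path.join(dirpath, name)
                    try:
                        if not os.path.islink(file_path):
                            total += os.path.getsize(file_path)
                    except OSError:
                        pass  # vanished or unreadable file: skip it
            sizes[entry.name] = total
    return sizes


def largest_dirs(root, top_n=10):
    """Return the top_n (name, bytes) pairs, biggest first."""
    return sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    mount_point = "/home3"  # assumption: where the full partition is mounted
    if os.path.isdir(mount_point):
        for name, size in largest_dirs(mount_point):
            print(f"{size / 1024**3:8.2f} GiB  {name}")
```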
