I have tried to reboot my Amazon RDS instance, but the status is stuck at "Rebooting". It has been three days now and it still shows the same message. I tried killing all the processes running on the database, but that did not work.
I'm also unable to take a snapshot because of this.
Please suggest a solution.
Contact AWS support.
Realistically, that is the only way you will get this resolved. Three days stuck in that state implies something has gone wrong with the underlying system, and AWS will need to get system engineers involved to fix it.
Attempted a minor system update to Wiki.js yesterday afternoon, and when the site came back up after a restart, all I'm getting now is a "site can't be reached" failure.
I'm having issues SSH'ing into the box and am looking for ways around that particular problem. It's hosted on an AWS EC2 instance that I can stop/start/reboot, but that's it.
At one point yesterday I did get an Unknown authentication strategy "jwt" error, but now it's showing nothing again.
While I'm working through the issues of getting into the box itself, is there anything that jumps out at y'all that I should be looking towards?
Many thanks in advance.
This may be a very simple thing, but I am pretty new to GCP and don't really understand how all this stuff works, so please bear with me.
I am trying to host a static site on GCP. My site is built with Jekyll, and I am using GCP containers to deploy it. I got that part working.
I then wanted to give it a human-friendly URL. I bought one through the GCP console and then went to create a domain name mapping. So far I have been waiting for a couple of days. I read in some similar posts that canceling and restarting the mapping process helped, but I've tried three times so far, waiting ~24 hours between attempts, and still no luck.
It tells me that I need to configure the DNS records with my domain host, but if I understand correctly, GCP is my domain host. I have also followed the instructions here and still no luck.
Am I doing something wrong, or am I perhaps missing something here?
Note: I have DNSSEC on; maybe that makes a difference.
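If the mapping is for a Cloud Run service (an assumption here, since "GCP containers" can mean a few things), one way to narrow this down is to compare the DNS records the mapping expects against what actually exists in your Cloud DNS zone. The domain, region, and zone name below are placeholders:

```shell
# 1. Show the domain mapping, including the DNS records it expects
#    you to have configured (requires the beta component):
gcloud beta run domain-mappings describe \
    --domain example.com --region us-central1

# 2. List the records that actually exist in the Cloud DNS zone,
#    to check they match what the mapping asked for:
gcloud dns record-sets list --zone my-zone
```

If the records from step 1 are missing in step 2, the mapping will sit in a pending state indefinitely until they are added.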
I am having some of my GCP instances behave in a way similar to what is described in the link below:
Google Cloud VM Files Deleted after Restart
At times, the session gets disconnected after a short period of inactivity. On reconnecting, the machine is as if it were freshly installed (not on restarts, as in the above link). All the files are gone.
As you can see in the attachment, it creates the profile directory fresh when the session is reconnected. Also, none of the installations I have made are there; everything is lost, including the root installations. Fortunately, I have been logging all my commands and file setups manually on my client, so nothing is lost, but I would like to know what is happening and resolve this for good.
This has now happened a few times.
A point to note: if I get a clean exit, i.e. if I properly log out or exit the SSH session, I get the machine back just as I left it when I reconnect. The issue occurs only when the session disconnects itself. That said, there have also been times when the session disconnected and I was still able to connect back fine.
The issue is not there on all my VMs.
From the suggestions in the link I posted above:
I am not connected via Cloud Shell; I am SSH'ing into the machine using the Chrome extension.
I have not manually mounted any disks (AFAIK).
I have checked the logs from gcloud compute instances get-serial-port-output --zone us-east4-c INSTANCE_NAME. I could not really make much of them. Is there anything I should look for specifically?
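One low-effort way to work with that serial output is to filter it for disk- and mount-related messages, which is where wiped-filesystem symptoms would usually show up. A sketch, reusing the zone from the question (the instance name and grep pattern are illustrative):

```shell
# Dump the serial console output and keep only lines that commonly
# indicate disk, filesystem, or mount problems:
gcloud compute instances get-serial-port-output INSTANCE_NAME \
    --zone us-east4-c \
  | grep -iE 'error|fail|mount|ext4|sda|nvme'
```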
Any help is appreciated.
Please find the links to the logs, as suggested by #W_B.
Below is the log from the 8th, when the machine was restarted and the files were deleted:
https://pastebin.com/NN5dvQMK
It happened again today. I didn't run the command immediately that time, though; the file below is from afterwards:
https://pastebin.com/m5cgdLF6
The one below is from after logging out today:
https://pastebin.com/143NPatF
Please note that I have replaced the user ID, system name, and many numeric values in general using regular expressions, so there is a slight chance that timestamps and other values have changed. Not sure if that would be a problem.
I have added a screenshot of the current config from the UI.
Using a locally attached SSD seems to be the cause. It is explained here:
https://cloud.google.com/compute/docs/disks/local-ssd#data_persistence
You need to use a persistent disk; otherwise it will behave just as you describe.
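You can confirm this from the CLI by listing each attached disk and its type; local SSDs show up as SCRATCH disks, whose data is not preserved, while persistent disks show up as PERSISTENT. A sketch, with the instance name as a placeholder and the zone taken from the question:

```shell
# List each attached disk with its device name and type.
# "SCRATCH" = local SSD (data not preserved); "PERSISTENT" = persistent disk.
gcloud compute instances describe INSTANCE_NAME --zone us-east4-c \
    --flatten="disks[]" \
    --format="table(disks.deviceName, disks.type)"
```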
I have a project deployed on an EC2 instance, and it is up.
But sometimes, when I log in through FTP and transfer the updated build to the EC2 instance, some of my project files go missing.
After a while, those files appear again in the same place.
I can't work out why this unexpected behavior is happening. Let me know if anyone has faced a similar situation.
Also, can anyone give me a way to see all the logins being made to my EC2 instance over FTP and SSH?
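On the login-auditing part, a minimal sketch using standard Linux tools; the exact log paths vary by distro (/var/log/auth.log on Debian/Ubuntu, /var/log/secure on Amazon Linux), and the FTP log path assumes the vsftpd server:

```shell
# Recent interactive logins (reads the login history in /var/log/wtmp):
last -a | head -20

# Successful SSH logins, from the auth log
# (use /var/log/secure instead on Amazon Linux / RHEL):
sudo grep -i 'sshd.*accepted' /var/log/auth.log

# FTP activity, if the server is vsftpd:
sudo tail /var/log/vsftpd.log
```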
Files don't just randomly go missing on an EC2 instance, so I suspect something else is going on that you'll need to diagnose. There is not enough information here to solve it, but I can try to point you in the right direction.
A few things that come to mind are:
What are you running to execute the FTP transfer? If the files appear after some time, are you sure the transfer isn't simply still in progress when you first check, then finishes later? Are you sure nothing is being cached?
Are you sure your FTP client is connected to the right instance?
Are you sure there are no cron tasks or external entities connecting to the instance and cleaning out a certain directory? You mentioned a build; is this a build agent you're running this on?
I highly doubt it's this one, but: what type of volume are you working on? EBS? Instance store? Instance store is ephemeral, so stopping/starting the instance can result in data being lost.
Have you tried using scp?
If you're still stumped, please provide more info on your EC2 config and how you're transferring the files.
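On the scp suggestion, transferring the build over SSH sidesteps the FTP server entirely, which also removes it as a suspect. A sketch, where the key path, user, host address, and directories are all placeholders for your own setup:

```shell
# Copy the built site to the instance over SSH instead of FTP
# (-r recurses into the build directory; -i selects the key pair):
scp -i ~/.ssh/my-key.pem -r ./build/ \
    ec2-user@203.0.113.10:/var/www/my-app/
```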
Yes, I've heard all the stories about EC2 instances being unreliable and how you need to proactively prepare for that. I've also heard stories from others about how they have never had a problem, and their instances just run and run.
Today I had a strange thing happen. I've had a Linux instance running for a couple of months, as I've been preparing to launch an e-commerce site. I've been periodically taking snapshots. I have my images on S3 and my code in a private GitHub repo. All things considered, I've been doing a fairly good job of protecting myself against failure. Ironically, it was while I was doing even more in this regard today that I experienced something really strange.
Since I have these snapshots, I had assumed that the best thing to do if I needed to quickly spin up a new instance (whether due to a failed instance that wouldn't come back up, or if I just needed additional capacity) would be to take a snapshot and make a volume out of it, then make an image out of that volume, and then launch a new instance using that image.
For whatever reason, every time I've tried that lately, the new instance had a kernel panic during boot, so I decided to try a different approach. I right-clicked on my RUNNING INSTANCE, and chose "Create Image." That seemed like a reasonable shortcut. Then I went to that image and launched an instance.
At almost exactly the same time, my original instance rebooted. I didn't even see it happen. I only know it did from the system log. Is this just a wild coincidence? Or did I commit a silly mistake and accidentally screw up my instance?
Fortunately, I'm just getting this new thing off the ground, so the bit of downtime didn't kill me, and I was able to very quickly get things going again. But either I totally do not understand the "Create Image" feature from the instance list, or I got really unlucky today.
"Create image" takes the following actions:
Stop EC2 instance
Snapshot EBS volume
Start EC2 instance
Register EBS snapshot as an AMI
So, yes, this would look like a reboot, because it effectively is one (a stop and start, in fact).
Here's an article I wrote on the difference between stop/start and simple reboot: http://alestic.com/2011/09/ec2-reboot-stop-start
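For completeness: the AWS CLI exposes the same operation, and it accepts a --no-reboot flag that skips the stop/start, at the cost of a potentially inconsistent filesystem in the resulting image. The instance ID and AMI name below are placeholders:

```shell
# Create an AMI from a running instance without stopping it.
# Skipping the reboot risks filesystem inconsistency in the image,
# so prefer the default behavior when you can tolerate the downtime.
aws ec2 create-image \
    --instance-id i-0123456789abcdef0 \
    --name "my-backup-ami" \
    --no-reboot
```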
Your problem sounds a lot like mine. After some searching, this page helped me: http://www.raleche.com/node/138
"The problem turned out to be the kernel. Both when creating the AMI and the instance I selected default for the kernel image.
To resolve the problem, I recreated the AMI using the same kernel image as the original instance."