Ec2 1/2 checks passed - amazon-web-services

Since today I can't access my instance. I have tried stopping and restarting it several times, but the status is always "1/2 checks passed".
I tried creating a snapshot, detaching the volume, and reattaching a new volume, but the result is the same.
I also tried creating another instance and attaching the volume to it, and it doesn't start either.
Any help?

The status checks automatically performed on Amazon EC2 instances are:
System Status Checks: These check the underlying systems used by the Amazon EC2 instance
Instance Status Checks: These check the configuration of the specific instance
See documentation: Status Checks for Your Instances
Often, an instance is available and ready to be used before these checks are complete -- this is especially the case for Linux instances because they boot very quickly.
If you receive a 1/2 checks passed message, either wait a little longer or Stop and Start the instance. Performing a Stop/Start will launch the instance on a different host, which will probably fix whatever problem was being experienced.
If the 1/2 checks passed message continues to appear after a Stop/Start, it is probably a misconfiguration of the AMI. I have seen this when the wrong virtualization type was selected for an AMI that was created from a Snapshot.
You might be able to get a hint about the problem by using the Get System Log command in the Actions menu, which shows the log while the instance is booting.
Worst case, launch a new instance from a known-good AMI, attach the non-booting volume as an additional disk and copy files to the new disk. You will still have access to your files even if it will not boot.
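If you prefer checking this outside the console, here is a minimal boto3 sketch (the instance ID is a placeholder) that reports the two status checks separately and pulls the same system log as the "Get System Log" action:

    import boto3

    ec2 = boto3.client('ec2')
    instance_id = 'i-0123456789abcdef0'  # placeholder

    status = ec2.describe_instance_status(
        InstanceIds=[instance_id], IncludeAllInstances=True
    )['InstanceStatuses'][0]
    print('System status:  ', status['SystemStatus']['Status'])    # host-level check
    print('Instance status:', status['InstanceStatus']['Status'])  # OS/configuration-level check

    # Same information as the "Get System Log" console action (may be empty right after boot)
    print(ec2.get_console_output(InstanceId=instance_id).get('Output', ''))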

You can check the description of the checks here and work out which one is failing.

Related

Preemptible VMs in managed instance group go into terminated state

I have a Managed Instance Group made up of preemptible VMs. They are ephemeral and can be preempted at any time (our group is large enough to sustain losing several VMs at once). For the most part the MIG brings the VM count back up to the desired level after a preemption, but occasionally a node goes into the terminated state, and the MIG still counts it as available and does nothing to correct the issue, so I am down one or more VMs. My understanding of the terminated state is: "TERMINATED. A user shut down the instance, or the instance encountered a failure. You can choose to restart the instance or delete it." Given that we didn't shut the instance down, it must have encountered some failure, but the logs don't indicate anything other than that the node was preempted. How can I configure my instance group to delete/recreate VMs that end up in this state?
Reading your question, I understand that you want to know why your VMs are terminated all the time, right?
As you mentioned, you are using a Managed Instance Group with preemptible VMs, which means the VMs are always terminated within 24 hours (or less), according to this document.
Other than that, if you want to check what happened on your instance in the last few hours, I recommend opening an SSH session on the instance and using journalctl, for example:
journalctl -b --since "2021-03-04 00:00:00" | grep 'terminated'
This command will look for all the "terminated" statements from the given timestamp up to the moment you run the command.
If you don't mind your VMs being terminated every 24 hours, I don't see a problem with using preemptible VMs. But if this is causing problems in your operation, I would suggest turning off the preemptible feature and letting the load balancer act according to your needs.
Jose.
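As a rough illustration of how to see which instances the MIG considers healthy, and how to force a stuck one to be recreated, here is a sketch using the google-cloud-compute Python client; the project, zone, group, and instance names are all placeholders:

    from google.cloud import compute_v1

    project, zone, group = 'my-project', 'us-central1-a', 'my-mig'  # placeholders

    migs = compute_v1.InstanceGroupManagersClient()

    # List every managed instance and the status the MIG sees for it
    for mi in migs.list_managed_instances(project=project, zone=zone,
                                          instance_group_manager=group):
        print(mi.instance, mi.instance_status)

    # Ask the MIG to recreate one instance that is stuck in TERMINATED (placeholder URL)
    request = compute_v1.InstanceGroupManagersRecreateInstancesRequest(
        instances=['zones/us-central1-a/instances/my-stuck-vm'])
    migs.recreate_instances(project=project, zone=zone, instance_group_manager=group,
                            instance_group_managers_recreate_instances_request_resource=request)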

EDIT: How to restore instance from scheduled snapshots in GCP

I have a scheduled daily snapshot in GCP for one of my instances. I have several snapshots now. The first one is the full snapshot and the rest of the snapshots only contain changed data.
I want to be able to restore and boot the instance but it fails to boot. Checking the serial console I see reference to a blue screen and then it reboots and shows the same errors again, repeating the reboot cycle.
I have followed the guide in GCP on how to restore an instance from a snapshot by creating a new instance, selecting the snapshot tab and then selecting my snapshot. After saving the instance and trying to boot it I get the blue screen message.
Also, if I create a new instance using a Windows 2008 R2 Datacenter image, the system obviously boots fine, but if I try to attach the snapshot disk as a secondary (non-boot) disk I get the error: Editing VM instance failed. Error: Supplied fingerprint does not match current metadata fingerprint. I'm not sure if this is related to my issue of being unable to boot the OS from my snapshot.
I did find a workaround:
1) create an image of running instance (my instance is Win'2008 R2 Datacenter)
2) enable scheduled snapshots of this new instance (with VSS)
3) wait for a scheduled snapshot to get created (hourly so must wait 1 hr)
4) create new instance from the scheduled snapshot
After all this work the instance boots just fine with all my data. Obviously not a very good workaround, as now I have two instances with the same data. So I have to schedule the production system for maintenance, bring it down, and switch over to the new instance so that future scheduled snapshots will work if I try to restore again. A major pain in the butt.
Anyone have any ideas as to why none of my instances boot from scheduled snapshots without my workaround? Manual snapshots work fine. And new instances also work fine with the same snapshot schedule.
I had this exact same problem. I tried multiple scheduled snapshots with the same result, until I made a change in the VM instance when attaching the restored snapshot. Maybe it's just Windows, but if you name the disk something different, it seems to fail.
My original disk was called disk-1, for example. When I restored the snapshot I did it to disk-1-a and attached it to my instance. It failed the same way yours did. When I attached it and, under "Device name" for the boot disk, selected Custom and entered my original disk name of disk-1, it booted and RDP worked.
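For reference, the same fix expressed with the google-cloud-compute Python client. This is only a sketch, and the project, zone, snapshot, disk, and instance names are placeholders; the important detail is that device_name is set back to the original disk name:

    from google.cloud import compute_v1

    project, zone = 'my-project', 'us-central1-a'  # placeholders

    # 1. Create a new disk from the scheduled snapshot and wait for it to be ready
    disk = compute_v1.Disk(
        name='disk-1-restored',
        source_snapshot='projects/my-project/global/snapshots/my-scheduled-snapshot',
    )
    compute_v1.DisksClient().insert(project=project, zone=zone, disk_resource=disk).result()

    # 2. Attach it using the ORIGINAL device name, which is what made boot/RDP work
    attached = compute_v1.AttachedDisk(
        source=f'projects/{project}/zones/{zone}/disks/disk-1-restored',
        device_name='disk-1',  # original device name
    )
    compute_v1.InstancesClient().attach_disk(project=project, zone=zone, instance='my-instance',
                                             attached_disk_resource=attached)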

Change interruption behavior of fulfilled AWS spot request

Before knowing exactly how AWS Spot Instances work, I configured a spot request with the interruption behavior set to terminate. As I understand it, my running instance state will be deleted on termination, so if I don't have an image backup I will not be able to start the server again in its last state.
Since the spot request is fulfilled and the instance is running, is it possible to change the interruption behavior to stop when I am outbid? I can't seem to find the option to change the interruption behavior.
To set the interruption behavior to stop, we need to meet the following requirements:
For a Spot Instance request, the type must be persistent, not one-time. You cannot specify a launch group in the Spot Instance request.
For a Spot Fleet request, the type must be maintain, not request.
The root volume must be an EBS volume, not an instance store volume.
By following the above requirements, we can change the interruption behavior from terminate to stop.
Please refer to the following URL for reference:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
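For illustration, a persistent Spot Instance request with the stop behaviour might look like this in boto3 (the AMI ID and instance type are placeholders):

    import boto3

    ec2 = boto3.client('ec2')

    response = ec2.request_spot_instances(
        InstanceCount=1,
        Type='persistent',                    # one-time requests cannot use stop/hibernate
        InstanceInterruptionBehavior='stop',  # instead of the default 'terminate'
        LaunchSpecification={
            'ImageId': 'ami-0123456789abcdef0',  # placeholder
            'InstanceType': 't3.micro',          # placeholder; root volume must be EBS-backed
        },
    )
    print(response['SpotInstanceRequests'][0]['SpotInstanceRequestId'])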
According to the boto3 Spot Instance creation documentation, you may also instruct the instance to stop or hibernate if you set the request type to persistent (or, for a Spot Fleet request, maintain). The default behaviour is terminate. This feature was added in November 2017.
import boto3

ec2 = boto3.client('ec2')

response = ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        # ..... other required fields (IamFleetRole, TargetCapacity, LaunchSpecifications, ...)
        'Type': 'maintain',  # must be 'maintain' (not 'request') to use stop/hibernate
        'InstanceInterruptionBehavior': 'stop',  # 'hibernate' | 'stop' | 'terminate'
    }
)
Use them sparingly, as each behaviour has pros and cons; for example, you must handle interrupted network connections if you use hibernate. For stop, you may want to store data on a separately mounted EBS volume.

Amazon EC2 instance passed 1/2 checks

Newbie to Amazon Web Services here. I launched an instance from a public AMI and found that I could not SSH into the instance - I received the error "Connection timed out." I checked the security groups to verify that port 22 was open to 0.0.0.0/0. Additionally, I checked the route tables to verify that 0.0.0.0/0 targets the gateway attached to the VPC.
I find that only 1/2 status checks have passed - the instance status check failed. I have tried stopping and starting the instance as well as terminated and launching a new instance, both to no avail. The error that I see in the system log is:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1).
From this previous question, it appears that this could be a virtualization issue, but I'm not sure whether that was due to something I did on my end when launching the instance or something done by the creators of the AMI: Ec2 1/2 checks passed
Any help would be appreciated!
Can you share any more details about how you deployed the instance? Did you use the AWS Management Console, or one of the command line tools or SDKs to deploy it? Which public AMI did you use? Was it one of the ones provided by Amazon?
Depending on your needs, I would make sure that you use one of the AMIs provided by Amazon, such as Ubuntu, Amazon Linux, CentOS, etc. Here are the links to the docs on AMIs, but you can learn quite a bit by just searching for images. Since you mentioned virtualization types, though, I'd suggest reading up briefly on the HVM vs. Paravirtual virtualization types on AWS. Each of the instance types / families uses a certain virtualization type, which is indicated in the chart on this page.
Instance Status Checks
This documentation page covers the instance status checks, which you'll probably want to familiarize yourself with. It's entirely possible that shutting down (not restart, but shutdown) and then starting the instance back up might resolve the instance status check.
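If you'd rather script that stop/start cycle than click through the console, here is a minimal boto3 sketch (the instance ID is a placeholder):

    import boto3

    ec2 = boto3.client('ec2')
    instance_id = 'i-0123456789abcdef0'  # placeholder

    # Full stop (not a reboot), then start again on new underlying hardware
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])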
Spot Instances - cost savings!
By the way, I'll just mention this since you indicated that you're new to AWS ... if you're just playing around right now, you can save a ton of cost by deploying EC2 Spot Instances instead of paying the normal, on-demand rates. Depending on current rates, you can save more than 50%, and per-second billing still applies. Although there's the possibility that your EC2 instance could get "interrupted" based on market demand, you can configure your Spot Instance to just "Hibernate" or "Stop" instead of terminating and relaunching. That way, your work and instance state are saved for when it relaunches.
Hope this helps!
1) Use well-known images or contact the image developer. Perhaps it requires more than one drive or tricky partitioning.
2) Make sure you selected the proper HVM/PV image for the instance type.
3) (After the checks pass) make sure the instance has a public IP (see the sketch below).
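A quick boto3 sketch for points 2 and 3 - checking the AMI's virtualization type and whether the instance received a public IP (both IDs are placeholders):

    import boto3

    ec2 = boto3.client('ec2')

    # Point 2: is the AMI HVM or paravirtual?
    image = ec2.describe_images(ImageIds=['ami-0123456789abcdef0'])['Images'][0]
    print(image['VirtualizationType'])  # 'hvm' or 'paravirtual'

    # Point 3: did the instance get a public IP?
    instance = ec2.describe_instances(
        InstanceIds=['i-0123456789abcdef0'])['Reservations'][0]['Instances'][0]
    print(instance.get('PublicIpAddress', 'no public IP assigned'))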

EC2 Spot instances: How to start tasks, how to stop them?

I have a long batch job that I'd like to run on AWS EC2 Spot Instances, to save money. However, I can't find the answer to two seemingly critical questions:
When a new instance is created, I need to upload the code onto it, configure it, and run the code. How does that get done for Spot Instances, which are created automatically and unattended?
When an instance is stopped, I would prefer having some type of notification, so that the state could be saved. (This is not critical, as the batch job will run fine if terminated suddenly - but a clean shutdown is preferred).
What is the standard way to deploy spot instances? Is there a way to do manual setup, turn it into a spot instance, and then let it hibernate until the spot price is available?
As to #1, if you create an AMI (Amazon Machine Image), you can have everything you want pre-installed on a 'hibernating' image that you can use as the basis for the spot instances you start:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances-getting-started.html
For #2, you can be notified when a spot instance terminates using SNS:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-autoscaling-notifications.html
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/ASGettingNotifications.html
BTW: You can be notified that the instance was terminated, but only after it terminates. You can't get notified that an instance is about to be shut down so you can gracefully save state - you need to engineer your solution to be OK with unexpected shutdowns.
No matter how high you bid, there is always a risk that your Spot Instance will be interrupted. We strongly recommend against bidding above the On-Demand price or using Spot for applications that cannot tolerate interruptions.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-protect-interruptions.html
You can use the user data settings to download a script from a specific repository and run it at the first instance startup.
As E.J. Brennan said: you can use SNS
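To illustrate the user data approach, here is a rough boto3 sketch that launches a Spot Instance with a bootstrap script; the AMI ID, instance type, repository URL, and script path are all placeholders:

    import boto3

    ec2 = boto3.client('ec2')

    # Hypothetical bootstrap script run at first boot; replace with your own setup/run commands
    user_data = """#!/bin/bash
    git clone https://example.com/your/batch-job.git /opt/job
    /opt/job/run.sh
    """

    ec2.run_instances(
        ImageId='ami-0123456789abcdef0',  # placeholder
        InstanceType='t3.micro',          # placeholder
        MinCount=1,
        MaxCount=1,
        UserData=user_data,
        InstanceMarketOptions={'MarketType': 'spot'},  # request Spot capacity
    )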