Spontaneous shutdowns in AWS EC2 instance

I have hosted a web server on an EC2 instance running Windows Server 2012 R2, and suddenly the instance became unavailable. I have run into the issue a couple of times, and when I checked the AWS Console, the status of the instance had changed to Stopped.
Interestingly, when I checked the system logs in Event Viewer, I found this error message:
The process C:\Program Files\Amazon\XenTools\LiteAgent.exe
(EC2AMAZ-******) has initiated the shutdown of computer EC2AMAZ-******
on behalf of user NT AUTHORITY\SYSTEM for the following reason: No
title for this reason could be found Reason Code: 0x8000000c
Shutdown Type: shutdown Comment:
Any idea why this happened, and what does LiteAgent.exe do?
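The reason code in the event can be read bit by bit. The sketch below splits it into the planned flag, the major reason, and the minor reason, using the SHTDN_REASON_* values from the Windows SDK (reason.h). Only a subset of codes is mapped, so treat the lookup tables as an assumption to check against the SDK headers.

```python
# Bit layout of a Windows shutdown reason code, per the SHTDN_REASON_* constants
# in the Windows SDK (reason.h). Only a subset of major/minor codes is mapped.
FLAG_USER_DEFINED = 0x40000000
FLAG_PLANNED = 0x80000000
MAJOR_MASK = 0x00FF0000
MINOR_MASK = 0x0000FFFF

MAJOR_NAMES = {
    0x00000000: "Other",
    0x00010000: "Hardware",
    0x00020000: "Operating System",
    0x00030000: "Software",
    0x00040000: "Application",
    0x00050000: "System",
    0x00060000: "Power",
    0x00070000: "Legacy API",
}

MINOR_NAMES = {
    0x0000: "Other",
    0x0001: "Maintenance",
    0x0002: "Installation",
    0x0003: "Upgrade",
    0x0004: "Reconfig",
    0x0005: "Hung",
    0x0006: "Unstable",
    0x000C: "Environment",
}


def decode_reason(code: int) -> dict:
    """Split a shutdown reason code into its flags, major and minor parts."""
    major = code & MAJOR_MASK
    minor = code & MINOR_MASK
    return {
        "planned": bool(code & FLAG_PLANNED),
        "user_defined": bool(code & FLAG_USER_DEFINED),
        "major": MAJOR_NAMES.get(major, hex(major)),
        "minor": MINOR_NAMES.get(minor, hex(minor)),
    }


if __name__ == "__main__":
    print(decode_reason(0x8000000C))
```

Decoded this way, 0x8000000c is a planned shutdown with major reason Other and minor reason Environment. The code itself does not identify who requested the shutdown.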

This is Amazon's management service. This is the message you would get if someone shuts the machine down via the web console, or if Amazon's infrastructure shuts the machine down (for Auto Scaling, etc.).
If you need to know who is doing this, you should consider enabling AWS CloudTrail, which records EC2 API calls such as StopInstances together with the identity that made them.
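Once CloudTrail is recording, "who stopped it?" becomes a filter over event records. CloudTrail's Event history page and the LookupEvents API can filter by event name without any code; the sketch below does the same over records from a downloaded CloudTrail log file. It assumes the standard log layout (a top-level Records list with eventSource, eventName, eventTime and userIdentity fields) and assumes StopInstances lists instances under requestParameters.instancesSet.items, which is worth confirming against one of your own records. The file name and instance ID in the usage line are placeholders.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator

# EC2 API calls that would take an instance out of the running state.
STOP_EVENTS = {"StopInstances", "TerminateInstances"}


def instance_ids(record: dict) -> list[str]:
    """Pull instance IDs out of requestParameters (assumed shape: instancesSet.items[].instanceId)."""
    params = record.get("requestParameters") or {}
    items = (params.get("instancesSet") or {}).get("items") or []
    return [item["instanceId"] for item in items if "instanceId" in item]


def who_stopped(records: Iterable[dict], instance_id: str) -> Iterator[tuple[str, str, str]]:
    """Yield (eventTime, eventName, caller ARN) for stop/terminate calls on instance_id."""
    for rec in records:
        if rec.get("eventSource") != "ec2.amazonaws.com":
            continue
        if rec.get("eventName") not in STOP_EVENTS:
            continue
        if instance_id in instance_ids(rec):
            # Some identity types carry no ARN, so fall back to "unknown".
            caller = (rec.get("userIdentity") or {}).get("arn", "unknown")
            yield rec.get("eventTime", ""), rec["eventName"], caller


if __name__ == "__main__":
    import sys

    # Usage: python who_stopped.py <decompressed-cloudtrail-log.json> <instance-id>
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fh:
            for row in who_stopped(json.load(fh)["Records"], sys.argv[2]):
                print(row)
```

An empty result for a stop you know happened would at least suggest it did not come through the EC2 API from a caller in that account.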

Related

EC2 server loses internet connection and application fails to send email, SMS, and even yum updates

I have 5 EC2 servers in the same VPC, and all of a sudden yesterday all of my applications started failing to send email and SMS. I tried doing a git pull of my project, and it also timed out. Then I tried to install telnet using yum, and that too failed with a timeout. I have checked almost everything, including Network ACLs, Security Groups, Subnets, iptables, etc., and everything looks correct. I am not sure why this is happening.
The weird thing is that if I reboot the server, internet access comes back for a brief time and then disconnects again.
Below are the errors I am facing:
Error while Generating the Tiny URL. Error: {"errno":-110,"code":"ETIMEDOUT","syscall":"connect","address":"XXX.XX.XXX.XX","port":443}
Error SendEmail UnknownEndpoint: Inaccessible host: `email.ap-south-1.amazonaws.com'. This service may not be available in the `ap-south-1' region.
I have attached screenshots of my Network ACLs, Security Groups, Subnets, and iptables.
What am I doing wrong, or is this an issue with AWS EC2? My goal is for my application to work without timeouts and for git and yum to work again.
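A quick way to reproduce the timeout outside the application (and without telnet, which yum could not install) is a plain TCP connect with a short timeout. This is a minimal sketch using only the Python standard library; the host is the SES endpoint from the error above, and any other hosts you add (your git remote, your yum mirrors) are up to you.

```python
from __future__ import annotations

import socket
import time


def tcp_check(host: str, port: int = 443, timeout: float = 5.0) -> tuple[bool, float, str]:
    """Try a plain TCP connect, like `telnet host port`; return (ok, seconds, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:  # covers timeouts, refused connections and DNS failures
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Add your own git and yum hosts to this tuple.
    for host in ("email.ap-south-1.amazonaws.com",):
        ok, secs, detail = tcp_check(host)
        print(f"{host}:443 ok={ok} after {secs:.2f}s ({detail})")
```

Running it repeatedly right after a reboot would show whether reachability really drops after a while, rather than failing from the start.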
Did you try terminating and reprovisioning the instances, rather than rebooting them? There may be some problem with the underlying hardware. When you terminate and recreate an instance, it will likely end up in a different rack in the datacenter, which may solve the problem.
If the above helps, you should consider setting up an Application Load Balancer with an Auto Scaling group, with health checks enabled for both, so that the Auto Scaling group terminates unhealthy instances and replaces them with new ones automatically.
You may also consider using Simple Notification Service and stop worrying about the underlying compute for email and SMS distribution altogether!
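As an illustration of the health-check idea: the load balancer only sees whatever the instance's health-check path returns, so an application can report itself unhealthy when its own dependency check fails. The sketch below assumes a hypothetical /health path and a pluggable probe function; it is not an AWS API, just a local HTTP endpoint of the kind a target group health check could be pointed at.

```python
import http.server
import threading
import urllib.error
import urllib.request
from typing import Callable


def make_health_handler(probe: Callable[[], bool]):
    """Build a handler whose /health path returns 200 while probe() is True, else 503."""

    class HealthHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            healthy = probe()
            body = b"ok\n" if healthy else b"unhealthy\n"
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging for the demo

    return HealthHandler


def serve_in_background(probe: Callable[[], bool]) -> http.server.HTTPServer:
    """Start the health server on a free local port in a daemon thread."""
    server = http.server.HTTPServer(("127.0.0.1", 0), make_health_handler(probe))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def health_status(server: http.server.HTTPServer) -> int:
    """Fetch /health the way a load balancer would and return the status code."""
    url = f"http://127.0.0.1:{server.server_address[1]}/health"
    # Bypass any proxy settings so the request goes straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise
        return err.code


if __name__ == "__main__":
    # Hypothetical probes; a real one might attempt an outbound connection.
    for probe in (lambda: True, lambda: False):
        srv = serve_in_background(probe)
        print(health_status(srv))  # 200 for the first probe, 503 for the second
        srv.shutdown()
        srv.server_close()
```

For the Auto Scaling group to replace instances based on this, its health check type would need to include the load balancer's health checks rather than only the EC2 status checks.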

EC2 instances connecting to Lambda result in ConnectFailure

I'm trying to access Lambda functions from a Windows VM I have created in EC2 for dev purposes, but even a simple 'list functions' command fails to connect.
I have tried using the AWS CLI through PowerShell, the .NET SDK, and the VS AWS Toolkit, but each of these times out after a long wait. I can, however, list resources in other services, such as my databases and S3 buckets.
Screenshots: AWS CLI failure message; VS Toolkit failure message.
I have tried creating a new VM, with the same results. I've disabled Windows Firewall altogether, allowed all traffic through the security group, and have VPC endpoints for my subnet (ssm, ec2messages, lambda, ec2).
I have no trouble connecting to the Lambda service from my own computer. On the VM, I have modified the .aws/credentials file to match the one on my computer for both the admin and the current user, but I still can't connect. This tells me that the problem isn't related to my access key credentials.
I'm reaching the end of the troubleshooting options I can think of so any help would be very much appreciated!
Update: using telnet, I cannot connect to lambda.ap-southeast-2, but I can connect to s3.ap-southeast-2 and lambda.ap-southeast-1. It seems lambda.ap-southeast-2 is being blocked somewhere, but it isn't Windows Firewall, because it's off and the same problem happens on Ubuntu VMs.
In the VPC Management Console, I haven't set up anything under Network Firewall or DNS Firewall, and my network ACL allows all traffic.
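Since the failure is specific to one hostname, it may help to look at what that name resolves to on the VM. An interface VPC endpoint with private DNS enabled makes the regional service name resolve to private addresses inside the VPC, so a private result for lambda.ap-southeast-2 would mean the traffic is going to the lambda endpoint listed above, and that endpoint's security group would then be worth checking. That is a hypothesis to test, not a confirmed cause. The hostnames below are the full forms of the ones tried with telnet.

```python
from __future__ import annotations

import ipaddress
import socket


def resolve(host: str, port: int = 443) -> list[str]:
    """Return the unique addresses host resolves to (IPv4 and IPv6)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def classify(addresses: list[str]) -> dict[str, bool]:
    """Map each address to True if it falls in a private range (e.g. a VPC subnet)."""
    return {addr: ipaddress.ip_address(addr).is_private for addr in addresses}


if __name__ == "__main__":
    for host in (
        "lambda.ap-southeast-2.amazonaws.com",
        "lambda.ap-southeast-1.amazonaws.com",
        "s3.ap-southeast-2.amazonaws.com",
    ):
        try:
            print(host, classify(resolve(host)))
        except socket.gaierror as exc:
            print(host, "DNS lookup failed:", exc)
```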

Why are outbound SSH connections from Google CloudRun to EC2 instances unspeakably slow?

I have a Node API deployed to Google CloudRun, and it is responsible for managing external servers (clean, new Amazon EC2 Linux VMs), including through SSH and SFTP. SSH and SFTP do work eventually, but the connections take 2-5 MINUTES to initiate. Sometimes they time out with handshake timeout errors.
The same service running on my laptop, connecting to the same external servers, has no issues and the connections are as fast as any normal SSH connection.
The deployment on CloudRun is pretty standard. I'm running it with a service account that permits access to secrets, etc. Plenty of memory allocated.
I have a VPC Connector set up, and have routed all traffic through the VPC connector, as per the instructions here: https://cloud.google.com/run/docs/configuring/static-outbound-ip
I also tried setting UseDNS no in the /etc/ssh/sshd_config file on the EC2 instance, as per some suggestions online about slow SSH logins, but that has not made a difference.
I have rebuilt and redeployed the project a few dozen times and all tests are on brand new EC2 instances.
I am attempting these connections using open source wrappers on the Node ssh2 library, node-ssh and ssh2-sftp-client.
Ideas?
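To narrow down where the minutes go, it can help to time the stages separately, outside of ssh2: the TCP connect, and the arrival of the server's identification line, which per RFC 4253 the server sends as soon as the connection opens. If both are fast from Cloud Run, the delay is later in the key exchange or authentication on the client side; if the connect itself is slow, the network path is the suspect. This is a standard-library sketch, not part of node-ssh or ssh2; the host argument is whatever you currently pass to node-ssh.

```python
import socket
import time


def ssh_banner_timing(host: str, port: int = 22, timeout: float = 30.0) -> dict:
    """Time the TCP connect and the arrival of the server's SSH identification line."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connect_s = time.monotonic() - start
        buf = b""
        while True:
            # RFC 4253 allows other lines before the one starting with "SSH-",
            # so check every complete line received so far.
            for line in buf.split(b"\n")[:-1]:
                if line.startswith(b"SSH-"):
                    return {
                        "connect_s": connect_s,
                        "banner_s": time.monotonic() - start,
                        "banner": line.rstrip(b"\r").decode("ascii", "replace"),
                    }
            if len(buf) > 8192:
                raise ValueError("no SSH identification line in the first 8 KiB")
            chunk = sock.recv(1024)
            if not chunk:
                raise ConnectionError("connection closed before an SSH banner arrived")
            buf += chunk


if __name__ == "__main__":
    import sys

    # Usage: python ssh_timing.py <ec2-host>
    if len(sys.argv) > 1:
        print(ssh_banner_timing(sys.argv[1]))
```

Running the same script from the Cloud Run container and from the laptop against the same EC2 host gives two directly comparable sets of numbers.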
Cloud Run allocates CPU to your container only while an HTTP request is active.
You probably don't have an active request while this runs on Cloud Run, and outside of an active request the CPU is throttled.
The best fit for this pipeline is Cloud Workflows with regular Compute Engine instances.
You can set up a Workflow to start a Compute Engine instance for this task and stop it once it has finished the steps.
I am the author of the article "Run shell commands and orchestrate Compute Engine VMs with Cloud Workflows", which will guide you through the setup.
The Workflow can be triggered by Cloud Scheduler or by an HTTP request.

TeamCity Agent Push Failing Across AWS Accounts

We've recently moved our TeamCity server to AWS, but it is managed by a different business unit in my company, so it lives in a different AWS account. I've gone through our parent company to get VPC peering enabled so that I can launch EC2 build agents.
To simplify: Our TeamCity server is on AWS account A and I'm working on AWS account B, where I want the build agents to launch.
I had no problems doing this back when the server was on-prem, but I'm having real trouble now.
Good: I can launch the instances from TeamCity, which is located in the other business unit's account.
Bad: I can't get it to progress from there.
For now, I just want to get 'Agent Push' working. When I try it, this is the output I get in the web console:
[15:12:09]: AgentPush v58406 - Install Agent on remote host
[15:12:09]: Looking for Target Host...
[15:12:09]: Validating TeamCity Server Root URL 'https://teamcity.company.com' ...
[15:12:09]: Starting agent push to 'xx.xx.xxx.xxx'(IP: xx.xx.xxx.xxx) using preset 'Amazon Linux' (Username 'ec2-user'. Target platform: 'Unix')
[15:12:09]: Checking Platform...
[15:16:09]: Remote agent installation failed: timeout: socket is not established
One more thing: we use Direct Connect and all private IPs. I'm supplying the private IP to the agent push. This worked when I was running it on-prem.
Does anyone have any ideas as to why I can't get the instances to talk to each other?
You need to set up AWS cross-account access. More on this in the docs:
https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html?icmpid=docs_iam_console
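Separately from IAM, Agent Push to a 'Unix' target connects to the new instance over SSH, so the private network path between the two VPCs has to work as well; the four minutes between "Checking Platform..." and the failure look like a connection attempt timing out. A few basic conditions can be checked offline: peered VPCs must not have overlapping CIDR blocks, and the address being pushed to should sit inside the agent VPC. The CIDRs and IP below are hypothetical, and the sketch cannot see route tables or security groups (each side needs a route to the other's CIDR via the peering connection, and the agent's security group needs to allow port 22 from the server).

```python
import ipaddress


def peering_sanity(server_cidr: str, agent_cidr: str, agent_ip: str) -> list:
    """Return a list of problems found; an empty list means these basic checks pass."""
    server_net = ipaddress.ip_network(server_cidr)
    agent_net = ipaddress.ip_network(agent_cidr)
    ip = ipaddress.ip_address(agent_ip)
    problems = []
    if server_net.overlaps(agent_net):
        problems.append("VPC CIDRs overlap; peering cannot route between them")
    if ip not in agent_net:
        problems.append(f"{ip} is not inside the agent VPC {agent_net}")
    if not ip.is_private:
        problems.append(f"{ip} is not a private address")
    return problems


if __name__ == "__main__":
    # Hypothetical values: TeamCity VPC (account A), agent VPC (account B), agent IP.
    print(peering_sanity("10.1.0.0/16", "10.2.0.0/16", "10.2.3.4"))
```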

Amazon EC2 small instance not responding

My Amazon EC2 small instance stopped responding. I looked at the AWS console, and CPU use had gone through the roof. I tried rebooting the instance, but it didn't respond, so I stopped it and started it again (twice).
Now the console says the CPU usage is fine (it was triggering an alarm when breaching 90%), but I still can't log in via SSH, and Apache is not working (my sites are down).
Can anyone give me an idea of how to sort this out? I'm a bit out of my depth, as I'm unfamiliar with the ins and outs of EC2.
EDIT: the console log http://pastebin.com/JWFeG7NU shows Apache, SSH, etc. starting up fine, but I can't access the server via SSH, and there is no response to pinging the website hosted on it.
If you stopped and started your instance and were not using an Elastic IP address, your instance's public IP address has changed.
If you were using an Elastic IP address on EC2-Classic, stopping the instance would have disassociated it, so you would need to associate it again.
If you do have applications that are causing you to exceed the allocated CPU, other applications, such as SSH, may become slow to respond or not respond at all within the timeout.
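Since the console log shows the services starting, one quick test of the changed-IP explanation is to compare what your domain resolves to with the public IPv4 address the console now shows for the instance. This standard-library sketch only looks at A records and ignores DNS caching, so a recently updated record may still show the old address for a while; the domain and IP are placeholders you pass on the command line.

```python
from __future__ import annotations

import socket


def dns_points_at(domain: str, current_ip: str) -> tuple[bool, list[str]]:
    """Check whether the domain's IPv4 addresses include the instance's current public IP."""
    infos = socket.getaddrinfo(domain, None, family=socket.AF_INET)
    addrs = sorted({info[4][0] for info in infos})
    return current_ip in addrs, addrs


if __name__ == "__main__":
    import sys

    # Usage: python dns_check.py <your-domain> <current-public-ipv4>
    if len(sys.argv) == 3:
        matches, addrs = dns_points_at(sys.argv[1], sys.argv[2])
        print("DNS matches the instance" if matches else f"DNS still points at {addrs}")
```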