Cloud SQL: Instance update will not end

Cloud SQL: Instance update will not end - google-cloud-platform

I have a problem with my Cloud SQL Second Generation instance.
I used this instance for months without problems.
Every evening I stop the instance and every morning I perform a restart via Java, using "patch" API:
DatabaseInstance requestBody = new DatabaseInstance();
Settings settings = new Settings();
settings.setActivationPolicy(ALWAYS_POLICY);
requestBody.setSettings(settings);
SQLAdmin sqlAdminService = createSqlAdminService();
SQLAdmin.Instances.Patch request =
sqlAdminService.instances().patch(projectId, sql_instanceId, requestBody);
Operation response = request.execute();
Today after the usual restart operation, the instance results in an "Instance update" status that never ends.
All functions on the cloud console are disabled for the instance and I cant't perform any action.
These are the settings of my instance:
Database version is MySQL 5.7
Auto storage increase is enabled
Automated backups are enabled
Binary logging is disabled
Located in europe-west1-c
How can I solve this situation?

Related

Restart EC2 instance on Website unavailability

I have a website hosted on an EC2 server. I want to monitor the website endpoint and restart the EC2 instance if the website in unavailable for a certain time frame (say 60 seconds).
What tools do I use in AWS and how do I accomplish this?

This is not a recommended approach.
Firstly, if a website is unavailable, you would probably want to investigate the cause rather than just restarting the instance. Your goal should be to run a stable system by removing root causes of problems rather than just ignoring the problem by restarting all the time.
The recommended design would be to run in a Highly Available configuration with:
The application running on at least two servers across at least two Availability Zones (in case of failure of an AZ). This is not necessarily more expensive because each server can be smaller than a single, large server.
A load balancer in front of the instances, distributing the traffic to the instances. The load balancer also performs continuous health checks and stops sending requests to servers that fail the health check
An Auto Scaling group that can terminate unhealthy instances and automatically launch replacement servers. This also works well if an Availability Zone should fail.
In this design, an unhealthy instance would be terminated (stopped and destroyed) and a new instance created with a pre-defined disk image and startup script. Alternatively, you might choose to move bad instances out of the Auto Scaling group for investigation of the problem, with a new instance being launched to take its place.
If your application requires a database, the database should be external to the instances so that all instances can connect to the database and replacing application instances does not cause any data loss.
As to the speed of noticing problems on a server, the load balancer can perform checks every few seconds. Amazon CloudWatch, on the other hand, would need at least a minute to detect problems (probably longer since metrics are calculated over a period rather than being "now" metrics).

John's approach is the correct one, but at its simplest:
Write a lambda function that can query your website and see if it is running or not and if not have that lambda function restart the instance.
Setup a cloudwatch event rule that runs on a frequency you determine to call the lambda function
I'll leave to you the work of writing the code that determines if the website is functional and restarting the server - but that is pretty straightforward. You can use python, java, node, go or .net core in your lambda function - I would think python would be the easiest in this case, but that is an opinion.

It is clear that this is not a best practice in AWS but can make some sense - e.g. you are running a small personal web server with low demand where availability is a less issue than costs.
At least that was my reason why I built automation for it.
diagram
lambda code
import json
import os
import boto3
import time
env_vars = [
'ALARM_NAME',
'REGION',
'INSTANCE_ID',
'OUTPUT_SNS_ARN'
]
ENV = {}
for env_var in env_vars:
ENV[env_var] = os.environ.get(env_var, None)
if not ENV[env_var]:
raise Exception(f"Environment variable {env_var} must be set!")
def reboot_instance(instanceID, regionName) -> "instanceID":
"""
InstanceID
instanceID - ID of instance
regionName - name of region
return InstanceID or False in case of exception
"""
ec2 = boto3.resource('ec2', region_name=regionName)
instance = ec2.Instance(instanceID)
try:
instance.stop()
time.sleep(30)
instance.stop(Force=True)
except:
pass
for i in range(180): # wait 3 minutes
instance = ec2.Instance(instanceID)
if instance.state['Code'] == 80:
break
time.sleep(1)
else:
raise Exception('Unable to stop instance')
instance.start()
return instanceID
def notify_about_reboot(instanceID, snsarn) -> True:
"""
Put SNS message about reboot to snsarn
"""
client = boto3.client('sns', region_name='us-east-1')
client.publish(TopicArn=snsarn, Message=f'EC2 instance {instanceID} was rebooted!')
return True
def lambda_handler(event, context) -> "status about reboot":
"""
event: see events/event.json
"""
print('EVENT:')
print(event)
for record in event.get('Records', None):
sns = record.get('Sns', None)
message = json.loads(sns.get('Message', None))
msgalarm = message.get('AlarmName', None)
msgstatus = message.get('NewStateValue', None)
if not all([sns,message,msgalarm,msgstatus]):
continue
if (msgalarm == ENV['ALARM_NAME']) and (msgstatus == 'ALARM'):
notify_about_reboot(reboot_instance(ENV['INSTANCE_ID'], ENV['REGION']), ENV['OUTPUT_SNS_ARN'])
return 'rebooting'
else:
return 'nothing to do'
return 'no sns record found'
I have released whole tested automation with SAM template and installation instructions also on https://github.com/koss822/misc/tree/master/Aws/route53-healthcheck-instance-reboot

uknown reason for google cloud compute engine VM shutdown?

One of the vm instances on google cloud compute was shutdown, with an event log in stackdriver without ip or actor (user or service or system) which initiated the event. The instance has onHostMaintenance set to migrate and automaticRestart set to true. This particular instance has migrated on maintenance without error before. The stackdriver event log looks like
{
actor: {
user: ""
},
event_subtype: "compute.instances.stop",
event_timestamp_us: "1531781734907624",
event_type: "GCE_API_CALL",
ip_address: "",
}
The user and ip_address fields are NOT redacted. They have empty values on actual log.
is this common? how does one identify the cause for shutdown in these peculiar cases ?

According to your event type, I dive in the doc of Activity logs,
Compute Engine API calls - GCE_API_CALL events are API calls that change the state of a resource.
It seems like someone may use API calls to shutdown your VM. Looking your settings and logs on the VM instance, it can't be a hostError or Maintenance event. As far as you turn on the onHostMaintenance set to migrate and automaticRestart set to true. Your VM will always be migrated to another hardware.
Out of curiosity, did you restart your VM? did it shutdown again? How often does it happen?

Seems like app engine had an outage during that time, incident .

Does AWS CPP S3 SDK support "Transfer acceleration"

I enabled "Transfer acceleration" on my bucket. But I dont see any improvement in speed of Upload in my C++ application. I have waited for more than 20 minutes that is mentioned in AWS Documentation.
Does the SDK support "Transfer acceleration" by default or is there a run time flag or compiler flag? I did not spot anything in the SDK code.
thanks

Currently, there isn't a configuration option that simply turns on transfer acceleration. You can however, use endpoint override in the client configuration to set the accelerated endpoint.

What I did to enable a (working) transfer acceleration:
set in the bucket configuration on the AWS panel "Transfer Acceleration" to enabled.
add to the IAM user that I use inside my C++ application the permission s3::PutAccelerateConfiguration
Add the following code to the s3 transfer configuration (bucket_ is your bucket name, the final URL must match the one shown in the AWS panel "Transfer Acceleration"):
Aws::Client::ClientConfiguration config;
/* other configuration options */
config.endpointOverride = bucket_ + ".s3-accelerate.amazonaws.com";
Ask for acceleration to the bucket before transfer... (docs in here )
auto s3Client = Aws::MakeShared<Aws::S3::S3Client>("Uploader",
Aws::Auth::AWSCredentials(id_, key_), config);
Aws::S3::Model::PutBucketAccelerateConfigurationRequest bucket_accel;
bucket_accel.SetAccelerateConfiguration(
Aws::S3::Model::AccelerateConfiguration().WithStatus(
Aws::S3::Model::BucketAccelerateStatus::Enabled));
bucket_accel.SetBucket(bucket_);
s3Client->PutBucketAccelerateConfiguration(bucket_accel);
You can check in the detailed logs of the AWS sdk that your code is using the accelerated entrypoint and you can also check that before the transfer start there is a call to /?accelerate (info)

What worked for me:
Enabling S3 Transfer Acceleration within AWS console
When configuring the client, only utilize the accelerated endpoint service:
clientConfig->endpointOverride = "s3-accelerate.amazonaws.com";
#gabry - your solution was extremely close, I think the reason it wasn't working for me was perhaps due to SDK changes since originally posted as the change is relatively small. Or maybe because I am constructing put object templates for requests used with the transfer manager.
Looking through the logs (Debug level) the SDK automatically concatenates the bucket used in transferManager::UploadFile() with the overridden endpoint. I was getting unresolved host errors as the requested host looked like:
[DEBUG] host: myBucket.myBucket.s3-accelerate.amazonaws.com
This way I could still keep the same S3_BUCKET macro name while only selectively calling this when instantiating a new configuration for upload.
e.g.
<<
...
auto putTemplate = new Aws::S3::Model::PutObjectRequest();
putTemplate->SetStorageClass(STORAGE_CLASS);
transferConfig->putObjectTemplate = *putTemplate;
auto multiTemplate = new Aws::S3::Model::CreateMultipartUploadRequest();
multiTemplate->SetStorageClass(STORAGE_CLASS);
transferConfig->createMultipartUploadTemplate = *multiTemplate;
transferMgr = Aws::Transfer::TransferManager::Create(*transferConfig);
auto transferHandle = transferMgr->UploadFile(localFile, S3_BUCKET, s3File);
transferMgr = Aws::Transfer::TransferManager::Create(*transferConfig);
...
>>

Unable to launch task from a spring cloud data flow stream

I registered my task app in Spring Cloud Data Flow, created a definition for it and the status shows 'unknown'. I created the stream and trying to launch the task through task-sink and I get an error:
java.lang.IllegalStateException: failed to resolve MavenResource:
How to launch a task from the task-sink? Am I missing something? Any help is appreciated. Another question I have is how do I access the payload sent via TaskLaunchRequest in my task?
S1 http | step1: transformer-rabbit | log
S2 :S1.step1 > filter --expression=payload.contains('CUSTADDRMODRQ_V15') | task-processor | task-sink
task-sink is launching the task provided by the uri in the TaskLaunchRequest. It is looking for the resource as shown in the log
OUT Using manager EnhancedLocalRepositoryManager with priority 10.0 for /home/vcap/.m2/repository
OUT Using transporter HttpTransporter with priority 5.0 for https://repo.spring.io/libs-snapshot and finally failing.
The task is deployed in our repository and as mentioned I registered and created the definition for it as well.
This one is in cf environment and I am using SCDF server 1.0.0.M4.
In the application.properties for the task-sink i am providing maven.remote.repositories.snapshots.url=**
task create fis-ifx-event-task --definition "fis-event-task"
My goal is launching the task from the stream.
Thanks for the information. I am in fact using the BUILD-SNAPSHOT as I am unable to enable taks in 1.0.0M4 version. Here is the one I am using spring-cloud-dataflow-server-cloudfoundry-1.0.0.BUILD-20160808.144306-116. I am able to register and create task definitions. The status of the task definition is showing as 'unknown' even when I am using the sample task module provided by your team. But when I initiate the flow of the stream and when task-sink tries to launch the task, it is unable to find the maven resource. When I create the task definition, does the task module gets deployed? I don't see any app in Pivotal Apps Manager. As mentioned earlier, I provided maven.remote.repositories.snapshot.url in the application.properties file for the task-sink application. Another thing I observed is when I launch the task manually from dataflow shell it gives an error CF-UnprocessableEntity(10008): The request is semantically invalid: Unknown field(s): 'staging_disk_in_mb', 'staging_memory_in_mb' and also a message saying 'Source is empty'. Presently the task is supposed to print the timestamp and is not dependent on any input.
TaskProcessor code:
#EnableBinding(Processor.class)
#EnableConfigurationProperties(TaskProcessorProperties.class)
public class TaskProcessor {
#Autowired
private TaskProcessorProperties processorProperties;
public TaskProcessor() {
}
#Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
#ELI(level = "info", eventType = ELIEventType.INBOUND)
public Object setupRequest(String message) {
Map<String, String> properties = new HashMap<String, String>();
properties.put("payload", message);
TaskLaunchRequest request = new TaskLaunchRequest(processorProperties.getUri(), null, properties, null);
return new GenericMessage<>(request);
}
}
TaskSink code:
#SpringBootApplication
#EnableTaskLauncher
#EnableBinding(Sink.class)
#EnableConfigurationProperties(TaskSinkProperties.class)
public class FisIfxEventTaskSinkApplication {
public static void main(String[] args) {
SpringApplication.run(FisIfxEventTaskSinkApplication.class, args);
}
}
I provided the stream I am using earlier in the post. Sink is receiving the TaskLaunchRequest with uri and payload as you can see here and unable to launch the task.
OUT registering [40, java.io.File] with serializer org.springframework.integration.codec.kryo.FileSerializer
2016-08-10T16:08:55.02-0600 [APP/0]
OUT Launching Task for the following resource TaskLaunchRequest{uri='maven://com.xxx:fis.ifx.event-task:jar:1.0-SNAPSHOT', commandlineArguments=[], environmentProperties={payload={"statusCode":0,"fisT
opic":"CustomerDataUpdated","payloadId":"CUSTADDRMODR``Q_V15","customerIds":[1597304]}}, deploymentProperties={}}

Before I begin, you have a number of questions here. In the future, it's better to break them up into multiple questions so that they are easier to find by other users and easier to answer. That being said:
A little context on the current state of things
In order to understand how things will work, it's important to understand the current state of things. The current releases of the software involved are:
Pivotal Cloud Foundry (PCF) - 1.7.12. This version is required for any task support.
Spring Cloud Task (SCT) - 1.0.2.RELEASE
Spring Cloud Data Flow CF (SCDF) - 1.0.0.BUILD-SNAPSHOT (current as of the date of this post).
Currently PCF 1.7.12+ has all the capabilities to run tasks. You can create v3 applications (the type of application used to launch a task), run it as a task, etc. However, the tooling around that functionality is not currently complete. There is no support for v3 applications in Apps Manager or the CLI. There is a plugin for the CLI that is more of a dev tool that can be used to help with some functions (it will show you logs, etc), but it is not fully functional and requires a specific version of the CLI to work [1]. This is one of the reasons that the task functionality within PCF is still considered experimental.
Spring Cloud Task is currently GA and supports all the functionality needed to effectively run tasks on CF. However, it's important to note that SCT doesn't handle orchestration so the actual launching of tasks on CF is the responsibility of either the user, or Spring Cloud Data Flow (the easier route).
Spring Cloud Data Flow's Cloud Foundry server implementation currently has functionality to launch tasks on PCF in the latest snapshots. We have validated this against 1.7.12 as well as the development branch of 1.8.
The task workflow within SCDF
Tasks are fundamentally different from stream applications within the context of SCDF. When you create a stream definition, you are given the option to deploy it. What this does is it actually downloads the Spring Boot über jars and deploys them to PCF as long running processes. If they go down, PCF, will relaunch them as expected, etc.
Tasks on the other hand, are not deployed. They are launched. The difference is that while you create a task definition, there is nothing deployed until you click launch. And when the task completes, the software is shut down and cleaned up. So while a stream definition may have states, it's really a one to one relationship between the definition and the deployed software. Where with a task, you can launch a task definition as many times as you want.
Your issues
Reading through your post, I see a few things that you are struggling with. Let me see if I can help:
Task Definitions within SCDF and launching them via a stream - When launching a task from a stream, the task registry within SCDF is not used. The sink expects the URL for the resource to be within the TaskLauchRequest.
Apps Manager and tasks - As mentioned above, there is no support for v3 applications in Apps Manager yet so you won't be able to see your tasks there.
Viewing the logs - In order to debug what's going wrong with launching your task on CF, you're going to want to view the logs. To do so, use the v3 CLI plugin mentioned above to view them. It's important to note that you can only tail live logs with the plugin, not view logs that have previously been rendered. Because of that, when testing, you'll want to tail the logs as soon as the app is created, before it's launched.
Error in SCDF Shell - The error you received from the SCDF shell (CF-UnprocessableEntity(10008):...) leads me to wonder if you have both the correct version of PCF (1.7.12+) and the correct version of the following other libraries:
spring-cloud-deployer-cloudfoundry - The latest snapshots
cf-java-client - 2.0.0.M10+
reactor-core - 3.0.0.RC1+
I hope this helps!
[1] https://github.com/cloudfoundry/v3-cli-plugin

Task support is not available in 1.0.0.M4 release of SCDF's CF-server. In this release, the task commands/REST-APIs should be disabled - see here. And for that reason, you wouldn't see any docs related to Tasks in the 1.0.0.M4 reference guide.
That said, the Task support is available/enabled in the BUILD-SNAPSHOT release. If you're locally building the CF-server and upon pushing it to CF, you could take advantage the task commands in the shell to create and launch task definitions.

Amazon EC2 custom AMI not running bootstrap (user-data)

I have encountered an issue when creating custom AMIs (images) on EC2 instances. If I start up a Windows default 2012 server instance with a custom bootstrap/user-data script such as;
<powershell>
PowerShell "(New-Object System.Net.WebClient).DownloadFile('http://download.microsoft.com/download/3/2/2/3224B87F-CFA0-4E70-BDA3-3DE650EFEBA5/vcredist_x64.exe','C:\vcredist_x64.exe')"
</powershell>
It will work as intended and go to the URL and download the file, and store it on the C: Drive.
But if I setup a Windows Server Instance, then create a image from it, and store it as a Custom AMI, then deploy it with the exact same custom user-data script it will not work. But if I go to the instance url (http://169.254.169.254/latest/user-data) it will show the script has imported successfully but has not been executed.
After checking the error logs I have noticed this on a regular occasion:
Failed to fetch instance metadata http://169.254.169.254/latest/user-data with exception The remote server returned an error: (404) Not Found.

Update 4/15/2017: For EC2Launch and Windows Server 2016 AMIs
Per AWS documentation for EC2Launch, Windows Server 2016 users can continue using the persist tags introduced in EC2Config 2.1.10:
For EC2Config version 2.1.10 and later, or for EC2Launch, you can use
true in the user data to enable the plug-in after
user data execution.
User data example:
<powershell>
insert script here
</powershell>
<persist>true</persist>
For subsequent boots:
Windows Server 2016 users must additionally enable configure and enable EC2Launch instead of EC2Config. EC2Config was deprecated on Windows Server 2016 AMIs in favor of EC2Launch.
Run the following powershell to schedule a Windows Task that will run the user data on next boot:
C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 –Schedule
By design, this task is disabled after it is run for the first time. However, using the persist tag causes Invoke-UserData to schedule a separate task via Register-FunctionScheduler, to persist your user data on subsequent boots. You can see this for yourself at C:\ProgramData\Amazon\EC2-Windows\Launch\Module\Scripts\Invoke-Userdata.ps1.
Further troubleshooting:
If you're having additional issues with your user data scripts, you can find the user data execution logs at C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log for instances sourced from the WS 2016 base AMI.
Original Answer: For EC2Config and older versions of Windows Server
User data execution is automatically disabled after the initial boot. When you created your image, it is probable that execution had already been disabled. This is configurable manually within C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml.
The documentation for "Configuring a Windows Instance Using the EC2Config Service" suggests several options:
Programmatically create a scheduled task to run at system start using schtasks.exe /Create, and point the scheduled task to the user data script (or another script) at C:\Program Files\Amazon\Ec2ConfigServer\Scripts\UserScript.ps1.
Programmatically enable the user data plug-in in Config.xml.
Example, from the documentation:
<powershell>
$EC2SettingsFile="C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml"
$xml = [xml](get-content $EC2SettingsFile)
$xmlElement = $xml.get_DocumentElement()
$xmlElementToModify = $xmlElement.Plugins
foreach ($element in $xmlElementToModify.Plugin)
{
if ($element.name -eq "Ec2SetPassword")
{
$element.State="Enabled"
}
elseif ($element.name -eq "Ec2HandleUserData")
{
$element.State="Enabled"
}
}
$xml.Save($EC2SettingsFile)
</powershell>
Starting with EC2Config version 2.1.10, you can use <persist>true</persist> to enable the plug-in after user data execution.
Example, from the documentation:
<powershell>
insert script here
</powershell>
<persist>true</persist>

Another solution that worked for me is to run Sysprep with EC2Launch.
The issue is that AWS doesn't reestablish the route to the profile service (169.254.169.254) in your custom AMI. See response by SanjitPatel in this post. So when I tried to use my custom AMI to create spot requests, my new instances were failing to find user data.
Shutting down with Sysprep, essentially forces AWS re-do all setup work on the instance, as if it were run for the first time. So when you create your instance, shut it down with Sysprep and then create your custom AMI, AWS will setup the profile service route correctly for the new instances and execute your user data. This also avoids manually changing Windows Tasks and executing user data on subsequent boots, as persist tag does.
Here is a quick step-by-step:
Create an instance using one of the AWS Windows AMIs (Windows Server 2016 Nano Server doesn't support Sysprep) and passing your desired user data (this may be optional, but good to make sure AWS wires setup scripts correctly to handle user data).
Customize your instance as needed.
Shut down your instance with Sysprep. Just open EC2LaunchSettings application and click "Shutdown with Sysprep". Full instructions here.
Create your custom AMI from the instance you just shut down.
Use your custom AMI to create other instances, passing user data on instance creation. User data will be executed on instance launch. In my case, I used Spot Request screen, which had a User Data text box.
Hope this helps!

At the end of initial bootstrap (UserData) script, just append persist tag as shown below.
Works perfectly.
<powershell>
insert script here
</powershell>
<persist>true</persist>

For those people that got here from Google and are running a Server 2016 instance, it seems that this is no longer possible.
Server2016 doesn't have ec2config service and so you can't use the persist flag.
<persist>true</persist>
Described in Anthony Neace's post.
Server 2016 uses EC2Launch and I haven't yet seen how it's possible to run a script at every boot. You can run a script on the first boot, but subsequent boots will not run it.

I added below powershell script to run during the AMI bake process which helped me fix this issue. This was Windows server 2019.
$EC2LaunchInitInstance = "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1"
$EC2LaunchSysprep = "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\SysprepInstance.ps1"
Invoke-Expression -Command "$EC2LaunchInitInstance -Schedule"
Invoke-Expression -Command "$EC2LaunchSysprep -NoShutdown"

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js