Can't upload image mikrotik-chr on Google Cloud - google-cloud-platform

I started creating a mikrotik-chr image from my bucket, but it always fails with an error. I don't know how to fix it:
[inflate.import-virtual-disk]: 2021-08-16T05:39:39Z CreateInstances: Creating instance "inst-importer-inflate-6t2qt".
[inflate]: 2021-08-16T05:39:46Z Error running workflow: step "import-virtual-disk" run error: operation failed &{ClientOperationId: CreationTimestamp: Description: EndTime:2021-08-15T22:39:46.802-07:00 Error:0xc00007b770 HttpErrorMessage:SERVICE UNAVAILABLE HttpErrorStatusCode:503 Id:1873370325760361715 InsertTime:2021-08-15T22:39:40.692-07:00 Kind:compute#operation Name:operation-1629092379433-5c9a6a095186f-620afe4b-ba26ba50 OperationGroupId: OperationType:insert Progress:100 Region: SelfLink:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a/operations/operation-1629092379433-5c9a6a095186f-620afe4b-ba26ba50 StartTime:2021-08-15T22:39:40.692-07:00 Status:DONE StatusMessage: TargetId:6947401086746772724 TargetLink:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a/instances/inst-importer-inflate-6t2qt User:606260965808#cloudbuild.gserviceaccount.com Warnings:[] Zone:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a ServerResponse:{HTTPStatusCode:200 Header:map[Cache-Control:[private] Content-Type:[application/json; charset=UTF-8] Date:[Mon, 16 Aug 2021 05:39:46 GMT] Server:[ESF] Vary:[Origin X-Origin Referer] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[0]]} ForceSendFields:[] NullFields:[]}:
Code: ZONE_RESOURCE_POOL_EXHAUSTED
Message: The zone 'projects/circular-jet-322614/zones/asia-southeast2-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
[inflate]: 2021-08-16T05:39:46Z Workflow "inflate" cleaning up (this may take up to 2 minutes).
[inflate]: 2021-08-16T05:39:48Z Workflow "inflate" finished cleanup.
[import-image]: 2021-08-16T05:39:48Z Finished creating Google Compute Engine disk
[import-image]: 2021-08-16T05:39:49Z step "import-virtual-disk" run error: operation failed &{ClientOperationId: CreationTimestamp: Description: EndTime:2021-08-15T22:39:46.802-07:00 Error:0xc00007b770 HttpErrorMessage:SERVICE UNAVAILABLE HttpErrorStatusCode:503 Id:1873370325760361715 InsertTime:2021-08-15T22:39:40.692-07:00 Kind:compute#operation Name:operation-1629092379433-5c9a6a095186f-620afe4b-ba26ba50 OperationGroupId: OperationType:insert Progress:100 Region: SelfLink:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a/operations/operation-1629092379433-5c9a6a095186f-620afe4b-ba26ba50 StartTime:2021-08-15T22:39:40.692-07:00 Status:DONE StatusMessage: TargetId:6947401086746772724 TargetLink:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a/instances/inst-importer-inflate-6t2qt User:606260965808#cloudbuild.gserviceaccount.com Warnings:[] Zone:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a ServerResponse:{HTTPStatusCode:200 Header:map[Cache-Control:[private] Content-Type:[application/json; charset=UTF-8] Date:[Mon, 16 Aug 2021 05:39:46 GMT] Server:[ESF] Vary:[Origin X-Origin Referer] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[0]]} ForceSendFields:[] NullFields:[]}: Code: ZONE_RESOURCE_POOL_EXHAUSTED; Message: The zone 'projects/circular-jet-322614/zones/asia-southeast2-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
ERROR
ERROR: build step 0 "gcr.io/compute-image-tools/gce_vm_image_import:release" failed: step exited with non-zero status: 1

You will need to check whether you have enough CPU and other resource quota in 'projects/circular-jet-322614/zones/asia-southeast2-a'. The resource requirements can be found by looking at the deployment specs of the workload. As the error message suggests, you can also try a different zone or retry later.
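If the quota looks fine, ZONE_RESOURCE_POOL_EXHAUSTED can also simply mean the zone is temporarily out of capacity, in which case rerunning the import against another zone is the quickest workaround. A rough sketch, assuming the import can be run with gcloud compute images import (the image name, bucket path, and zone below are placeholders, and --data-disk is an assumption since RouterOS CHR is not a supported --os value):
gcloud compute images import mikrotik-chr \
    --source-file=gs://YOUR_BUCKET/chr.img \
    --data-disk \
    --zone=asia-southeast1-b
# Inspect the region's quota limits and current usage before retrying
gcloud compute regions describe asia-southeast2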

Related

Find strings in a log file for zabbix monitoring

I need to find strings in a log file with regex and later send output to Zabbix monitoring server to fire triggers if needed.
For example here is a part of the log file:
===== Backup Failures =====
Description: Checks number of studies that their backup failed
Status: OK , Check Time: Sun Oct 30 07:31:13 2022
Details: [OK] 0 total backup commands failed during the last day.
===== Oracle queues =====
Description: Count Oracle queues sizes. The queues are used to pass information between the applications
Status: OK , Check Time: Sun Oct 30 07:31:04 2022
Details: [OK] All queues have less than 15 elements.
===== Zombie Services =====
Description: Checks for zombie services
Status: Error , Check Time: Sun Oct 30 07:31:30 2022, Script: <check_mvs_services.pl>
Details: [CRITICAL] 1 missing process(es) found. Failed killing 1 process(es)
===== IIS Application Pools Memory Usage =====
Description: Checks the memory usage of the application pools that run under IIS (w3wp.exe)
Status: OK , Check Time: Sun Oct 30 07:32:30 2022
Details: [OK] All processes of type w3wp.exe don't exceed memory limits
===== IIS Web Response =====
Description: Checks that the web site responds properly
Status: OK , Check Time: Sun Oct 30 07:32:34 2022
Details: [OK] All addresses returned 200
I need to find all the items to monitor and their results.
If a result is not OK, then a Zabbix trigger should send an alarm.
I found that Zabbix can handle log file monitoring with a command similar to the one here, but I first need to find the strings in the log file:
log[/path/to/the/file,"regex expression",,,,]
In this example, I believe these are the items Zabbix should find:
===== Backup Failures =====
Details: [OK] 0 total backup commands failed during the last day.
===== Oracle queues =====
Details: [OK] All queues have less than 15 elements.
===== Zombie Services =====
Details: [CRITICAL] 1 missing process(es) found. Failed killing 1 process(es)
===== IIS Application Pools Memory Usage =====
Details: [OK] All processes of type w3wp.exe don't exceed memory limits
===== IIS Web Response =====
Details: [OK] All addresses returned 200
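A possible Zabbix item key for this (a sketch, untested; the file path is a placeholder and the regex assumes the Status/Details lines shown above) could be:
log[/path/to/the/file,"^(Status: Error|Details: \[CRITICAL\])",,,skip]
With the mode parameter set to skip, only newly appended matching lines are collected, and a trigger can then fire whenever the item returns a value.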
Can you advise how to achieve this?
I would really appreciate any help.
Thanks in advance.

How to solve Memory issues with Paketo buildpack used to build a spring-boot app?

I am building a Docker image with the spring-boot-maven-plugin that is deployed to AWS Beanstalk (I use the plugin through the 2.4.3 Spring Boot starter dependency).
However, when the container is started, I get the error below.
I am a bit new to the buildpack stuff, but I tried to solve it by playing with the buildpack env variables as described on the website. However, that had absolutely no effect on the values shown in the error log below.
I found this GitHub issue, but I'm not sure whether it's relevant or how to use it.
I am using an AWS micro instance that has 1G of total RAM. It performs a rolling update, so while the new image is starting, the old one is still running until the new one has started successfully; at container start time it could well be that only 300MB is available, although during a normal run more is available.
Why do I need this memory calculation? Can't I just disable it? When I build a Docker image of the app.jar and deploy it to AWS Beanstalk, it works well without any memory settings:
docker build . --build-arg JAR_FILE=./target/app.jar -t $APPLICATION_NAME
But I would love to use the image build through the spring-boot-maven plugin.
Could someone please advise on how to solve this?
<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <image>
            <name>${image.name}</name>
            <env>
                <tag>${project.version}</tag>
                <!--BPE_APPEND_JAVA_TOOL_OPTIONS>-XX:MaxDirectMemorySize=1M</BPE_APPEND_JAVA_TOOL_OPTIONS-->
                <BPE_JAVA_TOOL_OPTIONS>-Xms1024m -Xmx3048m</BPE_JAVA_TOOL_OPTIONS>
            </env>
        </image>
    </configuration>
</plugin>
The AWS Beanstalk error during deployment:
Tue May 18 2021 18:07:14 GMT+0000 (UTC) INFO Successfully built aws_beanstalk/staging-app
Tue May 18 2021 18:07:22 GMT+0000 (UTC) ERROR Docker container quit unexpectedly after launch: 0M, -Xss1M * 250 threads
ERROR: failed to launch: exec.d: failed to execute exec.d file at path '/layers/paketo-buildpacks_bellsoft-liberica/helper/exec.d/memory-calculator': exit status 1. Check snapshot logs for details.
Tue May 18 2021 18:07:24 GMT+0000 (UTC) ERROR [Instance: i-0dc33dcb517e89ef9] Command failed on instance. Return code: 1 Output: (TRUNCATED)...pectedly after launch: 0M, -Xss1M * 250 threads
ERROR: failed to launch: exec.d: failed to execute exec.d file at path '/layers/paketo-buildpacks_bellsoft-liberica/helper/exec.d/memory-calculator': exit status 1. Check snapshot logs for details.
Hook /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
Tue May 18 2021 18:07:24 GMT+0000 (UTC) INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
Tue May 18 2021 18:07:24 GMT+0000 (UTC) ERROR Unsuccessful command execution on instance id(s) 'i-0dc33dcb517e89ef9'. Aborting the operation.
Tue May 18 2021 18:07:24 GMT+0000 (UTC) ERROR Failed to deploy application.
Tue May 18 2021 18:07:24 GMT+0000 (UTC) ERROR During an aborted deployment, some instances may have deployed the new application version. To ensure all instances are running the same version, re-deploy the appropriate application version.
##[error]Error: Error deploy application version to Elastic Beanstalk
The Docker error log downloaded from AWS Beanstalk:
Docker container quit unexpectedly on Tue May 18 18:07:21 UTC 2021:
Setting Active Processor Count to 1
Calculating JVM memory based on 274300K available memory
unable to calculate memory configuration
fixed memory regions require 662096K which is greater than 274300K available for allocation: -XX:MaxDirectMemorySize=10M, -XX:MaxMetaspaceSize=150096K, -XX:ReservedCodeCacheSize=240M, -Xss1M * 250 threads
ERROR: failed to launch: exec.d: failed to execute exec.d file at path '/layers/paketo-buildpacks_bellsoft-liberica/helper/exec.d/memory-calculator': exit status 1
OK, so here's what this is telling us:
Calculating JVM memory based on 274300K available memory
The memory calculator is detecting a maximum amount of memory available in the container as 274300KB, or about 274M.
fixed memory regions require 662096K which is greater than 274300K available for allocation: -XX:MaxDirectMemorySize=10M, -XX:MaxMetaspaceSize=150096K, -XX:ReservedCodeCacheSize=240M, -Xss1M * 250 threads
This message is saying that the memory calculator needs at least 662096KB or 662M in its present configuration.
It's also breaking down why it needs/wants that much:
10M for direct memory
150096K for metaspace
240M for reserved code cache
250M for threads (thread stack specifically)
That's not counting the heap which will require more (you seem to want at least 1G for the heap).
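For reference, those four fixed regions do add up to the 662096K figure (using 1M = 1024K):
  10M direct memory          =  10240K
  150096K metaspace          = 150096K
  240M reserved code cache   = 245760K
  250 threads x 1M stack     = 256000K
  total                      = 662096K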
This leaves two possibilities:
The container is not provisioned large enough. You need to give it more memory.
The memory calculator is not correctly detecting the memory limit.
If you suspect #2, look at the following. The memory calculator selects its max memory limit (i.e. the 274M in the example above) by looking in these places, in this order:
Check the configured container memory limit by looking at /sys/fs/cgroup/memory/memory.limit_in_bytes inside the container.
Check the system's max available memory by looking at /proc/meminfo and the MemAvailable metric, again, from inside the container.
If all else fails, it'll end up with a 1G fallback.
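If you want to verify what the calculator is seeing, you can check both sources yourself from inside the running container (a sketch; the first path is the cgroup v1 location mentioned above and may differ on cgroup v2 hosts):
# container memory limit as seen by the memory calculator (cgroup v1)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# system's available memory
grep MemAvailable /proc/meminfo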
If it's truly not working as described above, please open a bug and provide as much detail as you can.
Alternatively, you may tune the memory calculator. You can instruct it to give less memory to specific regions such that you reduce the total memory required to be less than the max available memory.
You can do that by setting the JVM memory flags in the JAVA_TOOL_OPTIONS env variable (you have BPE_JAVA_TOOL_OPTIONS which isn't right). See https://paketo.io/docs/buildpacks/language-family-buildpacks/java/#runtime-jvm-configuration.
For example, if you want to override the heap size then set -Xmx in JAVA_TOOL_OPTIONS to something custom. The memory calculator will see what you've set and adjust the remaining memory settings accordingly. Override as many as necessary.
To get things down to fit within 274M of RAM, you'd have to go really small. Something like -Xss256K -XX:ReservedCodeCacheSize=64M -XX:MaxMetaspaceSize=64M -Xmx64M. I didn't test to confirm, but this shows the idea of what you need to do. Reduce the memory settings such that the sum total fits within your max memory limit for the container.
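As a sketch of how that could look when running the built image locally (the flag values are just the illustrative ones above; JAVA_TOOL_OPTIONS is read by the JVM at startup, so the memory calculator picks these up and only computes the remaining regions):
docker run -e JAVA_TOOL_OPTIONS="-Xss256K -XX:ReservedCodeCacheSize=64M -XX:MaxMetaspaceSize=64M -Xmx64M" \
    -p 8080:8080 $APPLICATION_NAME
On Elastic Beanstalk, the same variable could instead be set as an environment property for the environment.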
This also does not take into account whether your application will actually be able to run within such small limits. If you go too small, you may at some point see OutOfMemoryErrors or StackOverflowErrors, and your app will crash. You can also negatively impact performance by reducing the code cache size too much, since this is where the JIT stores bytecode it has optimized to native code. You could even cause GC issues, or degraded performance due to too much GC, if the heap isn't sized right. In short, be very careful if you're going to do this.

Code: EXTERNAL_RESOURCE_NOT_FOUND; Message: The resource '<projectID>-compute@developer.gserviceaccount.com' of type 'serviceAccount' was not found

I am trying to import an OVA with the following command:
gcloud beta compute instances import instance --source-uri=gs://abc/xyz.ova --compute-service-account user@serviceaccount.iam.gserviceaccount.com --os=centos-7
I am getting an error saying '-compute@developer.gserviceaccount.com' was not found.
The following is the error message:
[import-disk-1]: 2021-05-12T13:40:22Z Finished creating Google Compute Engine disk
[import-disk-1]: 2021-05-12T13:40:22Z Inspecting disk for OS and bootloader
[import-disk-1]: 2021-05-12T13:41:56Z Inspection result=elapsed_time_ms:93608
[import-disk-1]: 2021-05-12T13:42:13Z Making disk bootable on Google Compute Engine
[import-ovf]: 2021-05-12T14:23:24Z step "create-instance" run error: operation failed &{ClientOperationId: CreationTimestamp: Description: EndTime:2021-05-12T07:23:24.254-07:00 Error:0xc000080690 HttpErrorMessage:BAD REQUEST HttpErrorStatusCode:400 Id:3961720838234200610 InsertTime:2021-05-12T07:23:09.145-07:00 Kind:compute#operation Name:operation-5c222bfe9828a-4022eae2 OperationType:insert Progress:100 Region: SelfLink:https://www.googleapis.com/compute/v1/projects/abcxyz-149810/zones/us-central1-a/operations/operation-5c222bfe9828a-4022eae2 StartTime:2021-05-12T07:23:09.145-07:00 Status:DONE StatusMessage: TargetId:8909656942867536419 TargetLink:https://www.googleapis.com/compute/v1/projects/abcxyz-149810/zones/us-central1-a/instances/somename-instancep2 User:<projID>#cloudbuild.gserviceaccount.com Warnings:[] Zone:https://www.googleapis.com/compute/v1/projects/abcxyz-149810/zones/us-central1-a ServerResponse:{HTTPStatusCode:200 Header:map[Cache-Control:[private] Content-Type:[application/json; charset=UTF-8] Date:[Wed, 12 May 2021 14:23:24 GMT] Server:[ESF] Vary:[Origin X-Origin Referer] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[0]]} ForceSendFields:[] NullFields:[]}:
[import-ovf]: 2021-05-12T14:23:24Z Cleaning up.
[import-ovf]: 2021-05-12T14:23:24Z Deleting content of: gs://abcxyz-149810-ovf-import-bkt-us-central1/5dx29/ovf/
[import-ovf]: 2021-05-12T14:23:24Z Deleting gs://abcxyz-149810-ovf-import-bkt-us-central1/5dx29/ovf/abcxyz-remote-11.31p2-gcp-disk1.vmdk
[import-ovf]: 2021-05-12T14:23:24Z Deleting gs://abcxyz-149810-ovf-import-bkt-us-central1/5dx29/ovf/abcxyz-remote-11.31p2-gcp-file1.iso
[import-ovf]: 2021-05-12T14:23:24Z Deleting gs://abcxyz-149810-ovf-import-bkt-us-central1/5dx29/ovf/abcxyz-remote-11.31p2-gcp.mf
[import-ovf]: 2021-05-12T14:23:24Z Deleting gs://abcxyz-149810-ovf-import-bkt-us-central1/5dx29/ovf/abcxyz-remote-11.31p2-gcp.ovf
[import-ovf]: 2021-05-12T14:23:25Z step "create-instance" run error: operation failed &{ClientOperationId: CreationTimestamp: Description: EndTime:2021-05-12T07:23:24.254-07:00 Error:0xc000080690 HttpErrorMessage:BAD REQUEST HttpErrorStatusCode:400 Id:3961720838234200610 InsertTime:2021-05-12T07:23:09.145-07:00 Kind:compute#operation Name:operation-5c222bfe9828a-4022eae2 OperationType:insert Progress:100 Region: SelfLink:https://www.googleapis.com/compute/v1/projects/abcxyz-149810/zones/us-central1-a/operations/operation-5c222bfe9828a-4022eae2 StartTime:2021-05-12T07:23:09.145-07:00 Status:DONE StatusMessage: TargetId:8909656942867536419 TargetLink:https://www.googleapis.com/compute/v1/projects/abcxyz-149810/zones/us-central1-a/instances/somename-instancep2 User:<projID>#cloudbuild.gserviceaccount.com Warnings:[] Zone:https://www.googleapis.com/compute/v1/projects/abcxyz-149810/zones/us-central1-a ServerResponse:{HTTPStatusCode:200 Header:map[Cache-Control:[private] Content-Type:[application/json; charset=UTF-8] Date:[Wed, 12 May 2021 14:23:24 GMT] Server:[ESF] Vary:[Origin X-Origin Referer] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[0]]} ForceSendFields:[] NullFields:[]}: Code: EXTERNAL_RESOURCE_NOT_FOUND; Message: The resource '<projID>-compute#developer.gserviceaccount.com' of type 'serviceAccount' was not found.
Any suggestions on this?
The error "The resource 'projectid-compute@developer.gserviceaccount.com' of type 'serviceAccount' was not found" occurs if the default compute service account has somehow been deleted.
'projectID-compute@developer.gserviceaccount.com' is the Compute Engine default service account. You can check whether this service account is present by navigating to the IAM & Admin page in the Cloud Console.
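Alternatively, a quick way to check is to list the project's service accounts from Cloud Shell (a sketch; the project ID is a placeholder):
gcloud iam service-accounts list --project=YOUR_PROJECT_ID
If the '-compute@developer.gserviceaccount.com' account is missing from the output, it has been deleted.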
I was able to get this error by deleting my default compute service account and attempting to create an instance through the Cloud Shell, so I assume this is the issue.
If the default compute service account was deleted unknowingly and it has been less than 30 days since the deletion, you can restore the account by using this command:
gcloud beta iam service-accounts undelete [ACCOUNT_ID]
You can refer to Manage service accounts.
After doing the above step, you will have to go to the APIs & Services page and disable and re-enable the Compute Engine API. This will take a few moments, but after the GCE (Google Compute Engine) API is re-enabled, you should be able to import your OVA file using Cloud Shell.
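If you prefer the CLI for that step, the same disable/re-enable cycle can also be done from Cloud Shell (a sketch; note that disabling the Compute Engine API briefly affects anything that depends on it):
gcloud services disable compute.googleapis.com   # may need --force if other enabled services depend on it
gcloud services enable compute.googleapis.com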
See the gcloud beta compute instances import public documentation.

Adding a source in a gateway

This is the error I'm facing when trying to add a data source in a gateway:
Unable to connect: We encountered an error while trying to connect to . Details: "We could not register this data source for any gateway instances within this cluster. Please find more details below about specific errors for each gateway instance.
Activity ID: 66610131-d0fc-4787-9432-36b2bbc95dbb
Request ID: b9231dc4-dd80-8b86-6301-c171aad3b879
Cluster URI: https://wabi-south-east-asia-redirect.analysis.windows.net
Status code: 400
Error Code: DMTS_PublishDatasourceToClusterErrorCode
Time: Wed Oct 17 2018 12:48:44 GMT-0700 (Pacific Daylight Time)
Version: 13.0.6980.207
Gateway: Invalid connection credentials.
Underlying error code: -2147467259
Underlying error message: The credentials provided for the File source are invalid. (Source at c:\users\rohan\documents\latest 2018\sara\new folder\2018_sales.xls.)
DM_ErrorDetailNameCode_UnderlyingHResult: -2147467259
Microsoft.Data.Mashup.CredentialError.DataSourceKind: File
Microsoft.Data.Mashup.CredentialError.DataSourcePath: c:\users\rohan\documents\latest 2018\sara\new folder\2018_sales.xls
Microsoft.Data.Mashup.CredentialError.Reason: AccessUnauthorized
Microsoft.Data.Mashup.MashupSecurityException.DataSources: [{"kind":"File","path":"c:\\users\\rohan\\documents\\latest 2018\\sara\\new folder\\2018_sales.xls"}]
Microsoft.Data.Mashup.MashupSecurityException.Reason: AccessUnauthorized
Troubleshoot connection problems

Kubernetes node fails (CoreOS/AWS/Kubernetes stack)

We have a small testing Kubernetes cluster running on AWS, using CoreOS, as per the instructions here. Currently this consists of only a master and a worker node. In the couple of weeks we've been running this cluster, we've noticed that the worker instance occasionally fails. The first time this happened, the instance was subsequently killed and restarted by the auto-scaling group it is in. Today the same thing happened, but we were able to log in to the instance before it was shut down and retrieve some information; however, it remains unclear to me exactly what caused this problem.
The node failure seems to happen on an irregular basis, and there is no evidence that there is anything abnormal happening which would precipitate this (external load etc).
Subsequent to the failure (Kubernetes node status Not Ready), the instance was still running, but had inactive kubelet and docker services (start failed with result 'dependency'). The flanneld service was running, but with a restart time after the time the node failure was seen.
Logs from around the time of the node failure don't seem to show anything clearly pointing to a cause. There are a couple of kubelet-wrapper errors at about the time the failure was seen:
`Jul 22 07:25:33 ip-10-0-0-92.ec2.internal kubelet-wrapper[1204]: E0722 07:25:33.121506 1204 kubelet.go:2745] Error updating node status, will retry: nodes "ip-10-0-0-92.ec2.internal" cannot be updated: the object has been modified; please apply your changes to the latest version and try again`
`Jul 22 07:25:34 ip-10-0-0-92.ec2.internal kubelet-wrapper[1204]: E0722 07:25:34.557047 1204 event.go:193] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"ip-10-0-0-92.ec2.internal.1462693ef85b56d8", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"4687622", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-0-92.ec2.internal", UID:"ip-10-0-0-92.ec2.internal", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientDisk", Message:"Node ip-10-0-0-92.ec2.internal status is now: NodeHasSufficientDisk", Source:api.EventSource{Component:"kubelet", Host:"ip-10-0-0-92.ec2.internal"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63604448947, nsec:0, loc:(*time.Location)(0x3b1a5c0)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63604769134, nsec:388015022, loc:(*time.Location)(0x3b1a5c0)}}, Count:2, Type:"Normal"}': 'events "ip-10-0-0-92.ec2.internal.1462693ef85b56d8" not found' (will not retry!)
Jul 22 07:25:34 ip-10-0-0-92.ec2.internal kubelet-wrapper[1204]: E0722 07:25:34.560636 1204 event.go:193] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"ip-10-0-0-92.ec2.internal.14626941554cc358", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"4687645", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-0-92.ec2.internal", UID:"ip-10-0-0-92.ec2.internal", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeReady", Message:"Node ip-10-0-0-92.ec2.internal status is now: NodeReady", Source:api.EventSource{Component:"kubelet", Host:"ip-10-0-0-92.ec2.internal"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63604448957, nsec:0, loc:(*time.Location)(0x3b1a5c0)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63604769134, nsec:388022975, loc:(*time.Location)(0x3b1a5c0)}}, Count:2, Type:"Normal"}': 'events "ip-10-0-0-92.ec2.internal.14626941554cc358" not found' (will not retry!)`
followed by what looks like some etcd errors:
`Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [WARNING][1305/140149086452400] calico.etcddriver.driver 810: etcd watch returned bad HTTP status topoll on index 5237916: 400
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [ERROR][1305/140149086452400] calico.etcddriver.driver 852: Error from etcd for index 5237916: {u'errorCode': 401, u'index': 5239005, u'message': u'The event in requested index is outdated and cleared', u'cause': u'the requested history has been cleared [5238006/5237916]'}; triggering a resync.
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [INFO][1305/140149086452400] calico.etcddriver.driver 916: STAT: Final watcher etcd response time: 0 in 630.6s (0.000/s) min=0.000ms mean=0.000ms max=0.000ms
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [INFO][1305/140149086452400] calico.etcddriver.driver 916: STAT: Final watcher processing time: 7 in 630.6s (0.011/s) min=90066.312ms mean=90078.569ms max=90092.505ms
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [INFO][1305/140149086452400] calico.etcddriver.driver 919: Watcher thread finished. Signalled to resync thread. Was at index 5237916. Queue length is 1.
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,743 [WARNING][1305/140149192694448] calico.etcddriver.driver 291: Watcher died; resyncing.`
and a few minutes later a large number of failed connections to the master (10.0.0.50):
`Jul 22 07:36:41 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:36:37,641 [WARNING][1305/140149086452400] urllib3.connectionpool 647: Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7700b85b90>: Failed to establish a new connection: [Errno 113] Host is unreachable',)': http://10.0.0.50:2379/v2/keys/calico/v1?waitIndex=5239006&recursive=true&wait=true
Jul 22 07:36:41 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:36:37,641 [INFO][1305/140149086452400] urllib3.connectionpool 213: Starting new HTTP connection (2): 10.0.0.50`
Although these errors are presumably related to the node/instance failure, these don't really mean a lot to me, and certainly don't seem to suggest the underlying cause - but if anyone can see anything here that would suggest a possible cause of the node/instance failure (and how we can go about rectifying this) that would be greatly appreciated!
Something in your description and logs confuses me: you said you use the Docker runtime, yet rkt appears in your log; you said you use flannel in your cluster, yet calico appears in your log...
Anyway, from the log you provided, it looks more like your etcd is down, which means kubelet and calico can't update their state, and the apiserver will regard them as down. There is not enough information here; I can only suggest that you back up etcd's logs the next time you see this...
Another suggestion: it's better not to use the same etcd for both the Kubernetes cluster and calico...
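As a rough sketch of what could be collected the next time this happens, assuming the systemd unit names from the setup described above and the etcd endpoint 10.0.0.50:2379 seen in your logs:
# gather kubelet, flannel and docker logs from around the failure before the instance is recycled
journalctl -u kubelet.service -u flanneld.service -u docker.service --since "1 hour ago" > node-failure.log
# check whether etcd on the master is reachable and healthy from the worker
curl -s http://10.0.0.50:2379/health
etcdctl --endpoints=http://10.0.0.50:2379 cluster-health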