I tried both the internal and the external IP to pull the image from the local repository, but I am getting this error:
I0113 12:15:49.024701 6602 master.cpp:3760] Sending 1 offers to framework 20160113-100358-33554442-5050-2351-0001 (marathon) at scheduler-6654c9ed-c68e-4531-894f-420a093fb578#10.0.0.2:42279
I0113 12:15:49.031448 6600 master.cpp:2273] Processing ACCEPT call for offers: [ 20160113-113337-33554442-5050-6575-O37 ] on slave 20160113-100358-33554442-5050-2351-S0 at slave(1)#10.0.0.3:5051 (development-7734-cb7.c.project-sample-1180.internal) for framework 20160113-100358-33554442-5050-2351-0001 (marathon) at scheduler-6654c9ed-c68e-4531-894f-420a093fb578#10.0.0.2:42279
I0113 12:15:49.031901 6602 master.hpp:822] Adding task tomcat.619735cd-b9ef-11e5-8e8e-0242602a4b85 with resources cpus(*):0.5; mem(*):512; ports(*):[31482-31482] on slave 20160113-100358-33554442-5050-2351-S0 (development-7734-cb7.c.project-sample-1180.internal)
I0113 12:15:49.031973 6602 master.cpp:2550] Launching task tomcat.619735cd-b9ef-11e5-8e8e-0242602a4b85 of framework 20160113-100358-33554442-5050-2351-0001 (marathon) at scheduler-6654c9ed-c68e-4531-894f-420a093fb578#10.0.0.2:42279 with resources cpus(*):0.5; mem(*):512; ports(*):[31482-31482] on slave 20160113-100358-33554442-5050-2351-S0 at slave(1)#10.0.0.3:5051 (development-7734-cb7.c.project-sample-1180.internal)
I0113 12:15:49.032171 6602 hierarchical.hpp:648] Recovered cpus(*):1.4; mem(*):5947; disk(*):4974; ports(*):[31553-32000, 31000-31481, 31483-31551] (total allocatable: cpus(*):1.4; mem(*):5947; disk(*):4974; ports(*):[31553-32000, 31000-31481, 31483-31551]) on slave 20160113-100358-33554442-5050-2351-S0 from framework 20160113-100358-33554442-5050-2351-0001
I0113 12:15:49.725064 6600 master.cpp:3300] Status update TASK_FAILED (UUID: 561c943e-2cae-4b9e-93e0-93fc2815bc2b) for task tomcat.619735cd-b9ef-11e5-8e8e-0242602a4b85 of framework 20160113-100358-33554442-5050-2351-0001 from slave 20160113-100358-33554442-5050-2351-S0 at slave(1)#10.0.0.3:5051 (development-7734-cb7.c.project-sample-1180.internal)
I0113 12:15:49.725160 6600 master.cpp:3341] Forwarding status update TASK_FAILED (UUID: 561c943e-2cae-4b9e-93e0-93fc2815bc2b) for task tomcat.619735cd-b9ef-11e5-8e8e-0242602a4b85 of framework 20160113-100358-33554442-5050-2351-0001
I0113 12:15:49.725265 6600 master.cpp:4623] Updating the latest state of task tomcat.619735cd-b9ef-11e5-8e8e-0242602a4b85 of framework 20160113-100358-33554442-5050-2351-0001 to TASK_FAILED
As you didn't specify what configuration steps you have taken to allow the use of the private registry, one can only point you to the overall docs on this topic.
Have a look at
https://mesosphere.github.io/marathon/docs/native-docker-private-registry.html
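In short, that guide has you archive the Docker credentials and ship them into the task sandbox via the app definition's uris field, and the agent's Docker daemon must also trust the registry. A minimal sketch, assuming a registry at 10.0.0.2:5000 and an HTTP server hosting the archive (both hypothetical):

# On a node that can already pull from the registry:
docker login 10.0.0.2:5000
cd ~ && tar czf docker.tar.gz .docker

# Host docker.tar.gz where every agent can fetch it, then reference both the
# archive and the private image in the Marathon app definition:
cat > tomcat.json <<'EOF'
{
  "id": "tomcat",
  "cpus": 0.5,
  "mem": 512,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "10.0.0.2:5000/tomcat:latest" }
  },
  "uris": ["http://10.0.0.2/static/docker.tar.gz"]
}
EOF
curl -X POST -H "Content-Type: application/json" -d @tomcat.json http://10.0.0.2:8080/v2/apps

If the registry is served over plain HTTP, each agent's Docker daemon additionally needs it listed as an insecure registry, otherwise the pull itself will fail regardless of the credentials.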
I have more than 50 Data Fusion pipelines running concurrently in an Enterprise instance of Data Fusion.
About 4 of them fail randomly on each concurrent run, and the logs show only the provisioning followed by the deprovisioning of the Dataproc cluster, as in this log:
2021-04-29 12:52:49,936 - INFO [provisioning-service-4:i.c.c.r.s.p.d.DataprocProvisioner#203] - Creating Dataproc cluster cdap-fm-smartd-cc94285f-a8e9-11eb-9891-6ea1fb306892 in project project-test, in region europe-west2, with image 1.3, with system labels {goog-datafusion-version=6_1, cdap-version=6_1_4-1598048594947, goog-datafusion-edition=enterprise}
2021-04-29 12:56:08,527 - DEBUG [provisioning-service-1:i.c.c.i.p.t.ProvisioningTask#116] - Completed PROVISION task for program run program_run:default.[pipeline_name].-SNAPSHOT.workflow.DataPipelineWorkflow.cc94285f-a8e9-11eb-9891-6ea1fb306892.
2021-04-29 13:04:01,678 - DEBUG [provisioning-service-7:i.c.c.i.p.t.ProvisioningTask#116] - Completed DEPROVISION task for program run program_run:default.[pipeline_name].-SNAPSHOT.workflow.DataPipelineWorkflow.cc94285f-a8e9-11eb-9891-6ea1fb306892.
When a failed pipeline is restarted, it completes successfully.
All the pipelines are started and monitored via Composer, using an asynchronous start and a custom wait SensorOperator.
There is no warning of quota exceeded.
Additional info:
Data Fusion 6.1.4
Ephemeral Dataproc cluster with 1 master and 2 workers, image version 1.3.89.
EDIT
The appfabric logs related to each failed pipeline are:
WARN [program.status:i.c.c.i.a.r.d.DistributedProgramRuntimeService#172] - Twill RunId does not exist for the program program:default.[pipeline_name].-SNAPSHOT.workflow.DataPipelineWorkflow, runId f34a6fb4-acb2-11eb-bbb2-26edc49aada0
WARN [pool-11-thread-1:i.c.c.i.a.s.RunRecordCorrectorService#141] - Fixed RunRecord for program run program_run:default.[pipeline_name].-SNAPSHOT.workflow.DataPipelineWorkflow.fdc22f56-acb2-11eb-bbcf-26edc49aada0 in STARTING state because it is actually not running
Further research somehow connected the problem to an inconsistent state in the CDAP run records when many concurrent requests (via the REST API) are made.
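For reference, the state the custom wait sensor (and the run-record warnings above) relies on can be inspected directly through the CDAP REST API exposed by the Data Fusion instance. A rough sketch, with the endpoint and run id as placeholders:

# ENDPOINT is the apiEndpoint of the instance (e.g. from gcloud beta data-fusion instances describe)
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "${ENDPOINT}/v3/namespaces/default/apps/[pipeline_name]/workflows/DataPipelineWorkflow/runs/${RUN_ID}"
# returns the run record JSON, including a "status" field (STARTING/RUNNING/COMPLETED/FAILED)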
I have been following the documentation step by step and didn't face any errors. I configured, deployed, and subscribed to the hello/world topic just as the documentation detailed. However, when I arrived at the testing step here: https://docs.aws.amazon.com/greengrass/latest/developerguide/lambda-check.html
no messages were showing up in the IoT console (subscription view, hello/world)! I am using the Greengrass core daemon running on my Ubuntu machine; it is active and listens on port 8000. I don't think there is anything wrong with my local device, because the group was deployed successfully and because I can see the communication going both ways in Wireshark.
I have these logs on my machine: /home/##/Desktop/greengrass/ggc/var/log/system/runtime.log:
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Version: 1.9.3-RC3
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Root: /home/##/Desktop/greengrass
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Write Directory: /home/##/Desktop/greengrass/ggc
[2019-09-28T06:57:42.492-07:00][INFO]-Group File Directory: /home/##/Desktop/greengrass/ggc/deployment/group
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda UID: 122
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda GID: 127
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.492-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.493-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.494-07:00][INFO]-No proxy URL found.
[2019-09-28T06:57:42.495-07:00][INFO]-Started Deployment Agent to listen for updates.
[2019-09-28T06:57:42.495-07:00][INFO]-Connecting with MQTT. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.497-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection successful. {"attemptId": "GVko", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection established. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection connected. Start subscribing. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Deployment agent connected to cloud.
[2019-09-28T06:57:42.685-07:00][INFO]-Start subscribing. {"numOfTopics": 2, "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/update/delta
[2019-09-28T06:57:42.727-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/get/accepted
[2019-09-28T06:57:42.814-07:00][INFO]-All topics subscribed. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:58:57.888-07:00][INFO]-Daemon received signal: terminated.
[2019-09-28T06:58:57.888-07:00][INFO]-Shutting down daemon.
[2019-09-28T06:58:57.888-07:00][INFO]-Stopping all workers.
[2019-09-28T06:58:57.888-07:00][INFO]-Lifecycle manager is stopped.
[2019-09-28T06:58:57.888-07:00][INFO]-IPC server stopped.
/home/##/Desktop/greengrass/ggc/var/log/system/localwatch/localwatch.log:
[2019-09-28T06:57:42.491-07:00][DEBUG]-will keep the log files for the following lambdas {"readingPath": "/home/##/Desktop/greengrass/ggc/var/log/user", "lambdas": "map[]"}
[2019-09-28T06:57:42.492-07:00][WARN]-failed to list the user log directory {"path": "/home/##/Desktop/greengrass/ggc/var/log/user"}
Thanks in advance.
I had a similar issue on another platform (Jetson Nano). I could not get a response after going through the AWS instructions for setting up a simple Lambda using IoT Greengrass. In my search for answers I discovered that AWS has a qualification test script for any device you connect.
It goes through an automated process of deploying and testing a Lambda function (as well as other functionality), reports results for each step, and the docs provide troubleshooting info for failures.
By going through those tests I was able to narrow down the issues with my setup, installation, and configuration. The testing docs give pointers for troubleshooting the test results. Here is a link to the test: https://docs.aws.amazon.com/greengrass/latest/developerguide/device-tester-for-greengrass-ug.html
If you follow the 'Next Topic' links, it will take you through the complete test. Let me warn you that it's extensive and will take some time, but for me it gave a lot of detailed insight that a hello world does not.
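Roughly, a test run looks like the following; the binary name, suite id, and flags differ between IDT versions, so treat this purely as a sketch and take the exact invocation from the linked guide:

# Unpack the IDT bundle, fill in configs/device.json and configs/userdata.json
# with the device's SSH details and Greengrass paths, then:
cd devicetester_greengrass_linux/bin
./devicetester_linux_x86-64 run-suite --suite-id GGQ_1.0.0 --pool-id MyDevicePool --userdata userdata.json
# Reports and per-test logs are written to the results/ directory next to the binary.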
When I ran the Workflow Manager configuration wizard, I got an error message at the 'add host to service bus farm' step.
We have SharePoint as a standalone installation; the OS is Windows Server 2012 R2 with
SQL Server 2016 Developer.
I followed the two URLs below for the installation:
https://collab365.community/configuring-sharepoint-2013-to-support-workflow-management/
https://www.c-sharpcorner.com/article/workflow-manager-configuration-for-sharepoint-server-2013/
I am unable to understand where exactly the issue is.
Please find the log below:
[Verbose] [12/10/2018 4:43:54 PM]: Service Bus services starting.
[Progress] [12/10/2018 4:43:54 PM]: Service Bus services starting.
[Error] [12/10/2018 4:53:55 PM]: System.Management.Automation.CmdletInvocationException: Starting service Service Bus Message Broker failed: Time out has expired and the operation has not been completed. ---> Microsoft.ServiceBus.Commands.Common.Exceptions.OperationFailedException: Starting service Service Bus Message Broker failed: Time out has expired and the operation has not been completed.
at Microsoft.ServiceBus.Commands.Common.SCMHelper.StartService(String serviceName, Nullable`1 waitTimeout, String hostName)
at Microsoft.ServiceBus.Commands.ServiceBusConfigHelper.StartSBServices(String hostName, Nullable`1 waitTimeout)
at Microsoft.ServiceBus.Commands.AddSBHost.ProcessRecordImplementation()
--- End of inner exception stack trace ---
at System.Management.Automation.Runspaces.AsyncResult.EndInvoke()
at System.Management.Automation.PowerShell.EndInvoke(IAsyncResult asyncResult)
at Microsoft.Workflow.Deployment.ConfigWizard.CommandletHelper.InvokePowershell(Command command, Action`3 updateProgress)
at Microsoft.Workflow.Deployment.ConfigWizard.ProgressPageViewModel.AddSBNode(FarmCreationModel model, Boolean isFirstCommand)
Please let me know how to resolve this issue so that I can install Workflow Manager.
What worked for me was enabling TLS 1.0 in the registry.
In my case the Client key was not present in the registry, so I only enabled the Server one:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Client]
"Enabled"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Server]
"Enabled"=dword:00000001
FYI... I stopped the Service Bus Message Broker service while the Workflow Manager configuration wizard was running the "add host to service bus farm" task, and then the wizard completed successfully. I very much hope you can resolve this issue :)
This is the link where I found the answer: http://answersweb.azurewebsites.net/MVC/Post/Thread/e6667e72-36db-44d7-bcb9-0d537cd19542?category=workflow (it is CRBenson's post). Thank you very much.
I had almost the same issue. Installing the correct patch fixed it.
Complete details are in the thread below:
http://fixingsharepoint.blogspot.com/2021/02/service-bus-gateway-service-stuck-at.html
I created a DC/OS setup on AWS using the default config.
I added two Kafka brokers using the CLI:
(DCOS) PS C:\DCOS> dcos kafka broker list
brokers:
id: 1
active: false
state: stopped
resources: cpus:2.00, mem:8192, heap:1024, port:auto
failover: delay:1m, max-delay:10m
stickiness: period:10m, expires:2016-03-22 15:58:51-04
When I start the broker, I see that the offer from the master was declined:
I0322 20:56:38.954476 1316 master.cpp:5350] Sending 2 offers to framework d8c03032-ebab-4c88-80cb-e2de92e3c4c4-0001 (kafka) at scheduler-fff6da19-e31e-4518-864e-2dfcdc31a5d2#10.0.3.104:53766
I0322 20:56:38.966846 1320 master.cpp:3673] Processing DECLINE call for offers: [ d8c03032-ebab-4c88-80cb-e2de92e3c4c4-O7389 ] for framework d8c03032-ebab-4c88-80cb-e2de92e3c4c4-0001 (kafka) at scheduler-fff6da19-e31e-4518-864e-2dfcdc31a5d2#10.0.3.104:53766
I0322 20:56:38.967591 1319 master.cpp:3673] Processing DECLINE call for offers: [ d8c03032-ebab-4c88-80cb-e2de92e3c4c4-O7390 ] for framework d8c03032-ebab-4c88-80cb-e2de92e3c4c4-0001 (kafka) at scheduler-fff6da19-e31e-4518-864e-2dfcdc31a5d2#10.0.3.104:53766
I0322 20:56:40.043771 1318 http.cpp:512] HTTP GET for /master/state-summary from 10.0.6.116:60000 with User-Agent='python-requests/2.6.0 CPython/3.4.2 Linux/4.1.7-coreos-r1'
I'm not able to find any relevant logs on the slaves to see what is going on.
/var/log/mesos has some files with no relevant info. As per the docs I should see syslogs in /var/log/messages, but I don't see that file. The default config provisions CoreOS. I tried the journalctl command but didn't find anything there either. I'm not sure how to debug this.
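On the CoreOS images DC/OS uses, the Mesos agent logs go to journald under the DC/OS systemd units rather than /var/log/messages. A sketch of where to look on an agent node (unit names can vary slightly between DC/OS versions):

systemctl list-units 'dcos-*'                 # shows the DC/OS services on the node
journalctl -u dcos-mesos-slave -f             # Mesos agent log
# Task/executor stdout and stderr live in the agent sandboxes, e.g.:
ls /var/lib/mesos/slave/slaves/*/frameworks/*/executors/*/runs/latest/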
Tools:
Jenkins version 1.506
GitHub
GitHub SQS Plugin 1.4
Jenkins is configured to consume the messages and GitHub to send them over Amazon SQS (I set the access key, secret key, and queue name). The modules are also configured with "Build when a message is published to an SQS Queue".
The messages are sent by GitHub and consumed by Jenkins as expected. I can see SQS activity in Jenkins (see below), but for some reason Jenkins does not trigger the build.
I wonder what we are missing?
Last SQS Activity
Started on Mar 20, 2013 3:03:49 AM
Using strategy: Default
[poll] Last Build : #16
[poll] Last Built Revision: Revision 408d9c4d6412e44737b62f25e9c36fc8b3b074ca (origin/maple-sprint-4)
Fetching changes from the remote Git repositories
Fetching upstream changes from origin
Polling for changes in
Done. Took 1.3 sec
Changes found
I had to enable "Poll SCM" and set the schedule to "* * * * *"; that did the trick!
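For reference, that corresponds to the standard SCM polling trigger in the job's config.xml; the schedule is ordinary cron syntax, and "* * * * *" simply means poll every minute (sketch for a plain freestyle job):

<triggers>
  <hudson.triggers.SCMTrigger>
    <spec>* * * * *</spec>
  </hudson.triggers.SCMTrigger>
</triggers>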