404 not found on running URL after bluemix deployment - cloud-foundry

I have written a sample MVC application using the Spring framework and deployed it to Bluemix.
When I open the deployed URL, I receive the following error:
The application or context root for this request has not been found
What am I doing wrong? Does anything need to be changed in web.xml?
Log messages:
[AUDIT ] CWWKE0001I: The server defaultServer has been launched.
[AUDIT ] CWWKG0028A: Processing included configuration resource:
/home/vcap/app/wlp/usr/servers/defaultServer/runtime-vars.xml
[INFO ] CWWKE0002I: The kernel started after 10.005 seconds
[INFO ] CWWKF0007I: Feature update started.
[INFO ] CWWKO0219I: TCP Channel httpEndpoint-179 has been started
and is now listening for requests on host * (IPv6) port 61031.
[INFO ] CWWKO0219I: TCP Channel defaultHttpEndpoint has been
started and is now listening for requests on host localhost (IPv4:
127.0.0.1) port 9080.
[INFO ] CWSCX0122I: Register management Bean provider:
com.ibm.ws.cloudoe.management.client.provider.dump.JavaDumpBeanProvider#c68ae63e.
[INFO ] CWSCX0122I: Register management Bean provider:
com.ibm.ws.cloudoe.management.client.provider.logging.LibertyLoggingBeanProvider#f0d6d754.
[INFO ] SRVE0169I: Loading Web Module:
com.ibm.ws.cloudoe.management.client.liberty.connector.
[INFO ] SRVE0250I: Web Module
com.ibm.ws.cloudoe.management.client.liberty.connector has been bound
to default_host.
[AUDIT ] CWWKT0016I: Web application available (default_host):
http://localhost:9080/IBMMGMTRest/
[INFO ] CWWKZ0018I: Starting application myapp.
[INFO ] SRVE0169I: Loading Web Module: TaxBillReminder.
[INFO ] SRVE0250I: Web Module TaxBillReminder has been bound to
default_host.
[AUDIT ] CWWKT0016I: Web application available (default_host):
http://localhost:9080/
[AUDIT ] CWWKZ0001I: Application myapp started in 2.113 seconds.
[AUDIT ] CWWKF0012I: The server installed the following features:
[json-1.0, jpa-2.0, icap:managementConnector-1.0, beanValidation-1.0,
jdbc-4.0, managedBeans-1.0, jsf-2.0, jsp-2.2, servlet-3.0, jaxrs-1.1,
jndi-1.0, appState-1.0, ejbLite-3.1, cdi-1.0].
[INFO ] CWWKF0008I: Feature update completed in 9.472 seconds.
[AUDIT ] CWWKF0011I: The server defaultServer is ready to run a
smarter planet.
[INFO ] SESN8501I: The session manager did not find a persistent
storage location; HttpSession objects will be stored in the local
application server's memory.
[INFO ] SESN0176I: A new session context will be created for
application key default_host/
[INFO ] SESN0172I: The session manager is using the Java default
SecureRandom implementation for session ID generation.
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[INFO ] FFDC1015I: An FFDC Incident has been created:
"java.util.ServiceConfigurationError:
javax.servlet.ServletContainerInitializer: Provider
org.cloudfoundry.reconfiguration.spring.AutoReconfigurationServletContainerInitializer
could not be instantiated
com.ibm.ws.webcontainer.osgi.DynamicVirtualHost startWebApp" at
ffdc_15.05.22_06.28.59.0.log
TaxBillReminder.mybluemix.net -
[22/05/2015:06:28:58 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.42:31418 x_forwarded_for:"-" vcap_request_id:430a380b-a68e-4123-6ff8-c87348c535a3
response_time:0.813611619 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:00 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.46:42514 x_forwarded_for:"-" vcap_request_id:c54dff7f-908f-4cc1-49d9-de6d8bd04fe7
response_time:0.127545436 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:01 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.43:29980 x_forwarded_for:"-" vcap_request_id:23bc66ac-c78e-42ab-5a07-60f99ffc492b
response_time:0.117255613 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15". [WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:03 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.43:23392 x_forwarded_for:"-" vcap_request_id:c255a3fb-5eb1-44f5-4c08-b22222a4c8b7
response_time:0.111495485 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15". TaxBillReminder.mybluemix.net - [22/05/2015:06:29:04 +0000] "GET
/ HTTP/1.1" 404 217 "-" "Java/1.8.0" 75.126.70.46:41130
x_forwarded_for:"-"
vcap_request_id:0c009c84-f0c0-46e9-7b6d-da8e3ff91a55
response_time:0.115888617 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. [INFO ] SESN0175I: An existing session context
will be used for application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:05 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.46:52243 x_forwarded_for:"-" vcap_request_id:c4c29b52-ff3a-48b6-47e4-7e1fce0c3f74
response_time:0.187145593 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15". TaxBillReminder.mybluemix.net - [22/05/2015:06:29:06 +0000] "GET
/ HTTP/1.1" 404 217 "-" "Java/1.8.0" 75.126.70.42:11225
x_forwarded_for:"-"
vcap_request_id:54e0e021-826e-443b-6a7a-5f6bbc28a926
response_time:0.132534560 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp.
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15". TaxBillReminder.mybluemix.net - [22/05/2015:06:29:08 +0000] "GET
/ HTTP/1.1" 404 217 "-" "Java/1.8.0" 75.126.70.43:32255
x_forwarded_for:"-"
vcap_request_id:0ac50be0-e2e9-436c-4e97-d854f78e1f49
response_time:0.089186493 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp.
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15". TaxBillReminder.mybluemix.net - [22/05/2015:06:29:09 +0000] "GET
/ HTTP/1.1" 404 217 "-" "Java/1.8.0" 75.126.70.46:39103
x_forwarded_for:"-"
vcap_request_id:ddc4754a-cf0f-494c-78de-26fcd61ba1af
response_time:0.102293236 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp.
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15". TaxBillReminder.mybluemix.net - [22/05/2015:06:29:10 +0000] "GET
/ HTTP/1.1" 404 217 "-" "Java/1.8.0" 75.126.70.42:30749
x_forwarded_for:"-"
vcap_request_id:fa6ba947-4b8c-474b-4b48-ace26fc3274e
response_time:0.091226461 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp.
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15". TaxBillReminder.mybluemix.net - [22/05/2015:06:29:11 +0000] "GET
/ HTTP/1.1" 404 217 "-" "Java/1.8.0" 75.126.70.46:46353
x_forwarded_for:"-"
vcap_request_id:dfc99308-11c0-4ea7-48ca-b4061b3b4c6f
response_time:0.096913693 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp.
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:12 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.46:57429 x_forwarded_for:"-" vcap_request_id:4f7e9876-cf5d-46c2-6cb1-19f00329e029
response_time:0.100562784 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:13 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.43:52701 x_forwarded_for:"-" vcap_request_id:fd13c364-d65a-4ca6-66b1-9bc49c1ea427
response_time:0.098537113 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:15 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.42:10951 x_forwarded_for:"-" vcap_request_id:883eb6fc-cdb4-45c6-41f6-cc65970ef256
response_time:0.095498510 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:16 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.42:30830 x_forwarded_for:"-" vcap_request_id:fc251ebf-da3a-48ae-4312-5218bd83808b
response_time:0.134904531 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15". TaxBillReminder.mybluemix.net - [22/05/2015:06:29:17 +0000] "GET
/ HTTP/1.1" 404 217 "-" "Java/1.8.0" 75.126.70.42:54827
x_forwarded_for:"-"
vcap_request_id:e09e1926-860b-481e-4b48-ed5a66330580
response_time:0.084558083 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. [INFO ] SESN0175I: An existing session context
will be used for application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:18 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.42:31009 x_forwarded_for:"-" vcap_request_id:a9c3a69f-ae27-4c72-7422-608fe01451fd
response_time:0.092770319 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:19 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.46:55458 x_forwarded_for:"-" vcap_request_id:20ebe389-2371-455a-5832-71c85f48c46d
response_time:0.083255059 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:21 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.46:44171 x_forwarded_for:"-" vcap_request_id:14081f78-3959-462f-5602-dd474718094c
response_time:0.104446356 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. TaxBillReminder.mybluemix.net -
[22/05/2015:06:29:22 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0"
75.126.70.43:21091 x_forwarded_for:"-" vcap_request_id:930a620b-e6a2-4bdb-6b72-36c072eea29b
response_time:0.100104583 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. taxbillreminder.mybluemix.net -
[22/05/2015:06:29:23 +0000] "GET / HTTP/1.1" 404 217 "-" "Mozilla/5.0
(compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)"
75.126.70.43:45588 x_forwarded_for:"-" vcap_request_id:cd805473-5b36-423c-441f-4a013e0c91c3
response_time:0.092833842 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
[INFO ] SESN0175I: An existing session context will be used for
application key default_host/
[INFO ] JSPG8502I: The value of the JSP attribute jdkSourceLevel is
"15".
[WARNING ] SRVE0274W: Error while adding servlet mapping for
path-->/forms/, wrapper-->ServletWrapper[dispatcher:[/forms/]],
application-->myapp. taxbillreminder.mybluemix.net -
[22/05/2015:06:30:31 +0000] "GET / HTTP/1.1" 404 217 "-" "Mozilla/5.0
(Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
75.126.70.43:54400 x_forwarded_for:"-" vcap_request_id:7ca062d7-13ff-4ae2-5441-265d3c2194b5
response_time:0.424214609 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211

Check the context-root (or contextRoot) that may be defined for your application, for example in server.xml or in a descriptor such as ibm-web-ext.xml; web.xml itself does not set the context root. If no context root is defined, the name of the Liberty application is used; the Liberty documentation lists the precedence rules. The route to your app running on Liberty will normally look something like this:
http://your_bluemix_app.mybluemix.net/the_liberty_app_name
The deployed URL that Bluemix reports is the base URL of the Bluemix application, which in this case is a Liberty server, so you need to append your context root (or Liberty app name) to it to reach your app.
For example, you can push two or more Liberty apps packaged in one Liberty server to Bluemix. You then have one Bluemix app with two web applications running inside it, which can be accessed like this:
http://your_bluemix_app.mybluemix.net/the_liberty_app_name_1
http://your_bluemix_app.mybluemix.net/the_liberty_app_name_2
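To make the URL rule above concrete, here is a minimal Python sketch of how the Bluemix route and the Liberty context root combine. The function name liberty_app_url is made up for illustration, and the host and app names are the same placeholders used in the URLs above, not real values.

```python
from urllib.parse import urlsplit, urlunsplit


def liberty_app_url(base_url: str, context_root: str) -> str:
    """Join a Bluemix route (base URL) with a Liberty context root.

    Hypothetical helper for illustration only: it just shows that the
    context root becomes the first path segment after the route.
    """
    parts = urlsplit(base_url)
    # Normalise the context root so "app", "/app" and "/app/" behave the same.
    root = context_root.strip("/")
    path = "/" + root if root else "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    base = "http://your_bluemix_app.mybluemix.net"  # placeholder route
    for root in ("the_liberty_app_name_1", "/the_liberty_app_name_2", "/"):
        print(liberty_app_url(base, root))
```

A context root of "/" maps to the bare route, which is the case where the URL reported by Bluemix already points at the application.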

I had a similar issue. The solution described at http://developer.ibm.com/answers/answers/185697/view.html worked for me.
It looks like the application failed to initialize because of the following:
[INFO ] FFDC1015I: An FFDC Incident has been created: "java.util.ServiceConfigurationError: javax.servlet.ServletContainerInitializer: Provider org.cloudfoundry.reconfiguration.spring.AutoReconfigurationServletContainerInitializer could not be instantiated com.ibm.ws.webcontainer.osgi.DynamicVirtualHost startWebApp" at ffdc_15.05.22_06.28.59.0.log TaxBillReminder.mybluemix.net - [22/05/2015:06:28:58 +0000] "GET / HTTP/1.1" 404 217 "-" "Java/1.8.0" 75.126.70.42:31418 x_forwarded_for:"-" vcap_request_id:430a380b-a68e-4123-6ff8-c87348c535a3 response_time:0.813611619 app_id:70683a0f-06f4-4ad9-93b7-b37dc8241211
Your application uses Spring, and the auto-reconfiguration is causing problems.
With the latest Liberty buildpack, you can set the JBP_CONFIG_SPRINGAUTORECONFIGURATION environment variable to '[enabled: false]' to disable Spring auto-reconfiguration. I think the Spring auto-reconfiguration step is the cause of the problem in your case. Using the cf client, run the following command and then restage your application:
$ cf set-env myApplication JBP_CONFIG_SPRINGAUTORECONFIGURATION '[enabled: false]'
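If you would rather script the two steps (set the variable, then restage), a small Python sketch could build the commands like this. The helper name is hypothetical and myApplication is the same placeholder as in the command above; the script only prints the commands, so nothing is changed unless you run them yourself.

```python
import shlex
from typing import List


def disable_auto_reconfig_commands(app_name: str) -> List[List[str]]:
    """Return the cf CLI invocations for the two steps described above.

    Hypothetical helper: the first command sets the buildpack variable,
    the second restages the app so the new value takes effect.
    """
    return [
        ["cf", "set-env", app_name,
         "JBP_CONFIG_SPRINGAUTORECONFIGURATION", "[enabled: false]"],
        ["cf", "restage", app_name],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands instead of executing them.
    for cmd in disable_auto_reconfig_commands("myApplication"):
        print(shlex.join(cmd))
```

Passing each list to subprocess.run(cmd, check=True) would execute it, assuming the cf client is installed and you are already logged in to the right org and space.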

Related

How to deploy image classifier with resnet50 model on AWS endpoint to predict without worker dying?

I created an image classifier built on ResNet50 to identify dog breeds, working in SageMaker Studio. Tuning and training are done and I deployed the model, but when I try to predict with it, it fails. I believe this is related to the worker PID, because that is the first warning I see.
The following CloudWatch log output says the worker PID is not available yet, and soon after that the worker dies.
timestamp,message,logStreamName
1648240674535,"2022-03-25 20:37:54,107 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...",AllTraffic/i-055c5d00e53e84b93
1648240674535,"2022-03-25 20:37:54,188 [INFO ] main org.pytorch.serve.ModelServer - ",AllTraffic/i-055c5d00e53e84b93
1648240674535,Torchserve version: 0.4.0,AllTraffic/i-055c5d00e53e84b93
1648240674535,TS Home: /opt/conda/lib/python3.6/site-packages,AllTraffic/i-055c5d00e53e84b93
1648240674535,Current directory: /,AllTraffic/i-055c5d00e53e84b93
1648240674535,Temp directory: /home/model-server/tmp,AllTraffic/i-055c5d00e53e84b93
1648240674535,Number of GPUs: 0,AllTraffic/i-055c5d00e53e84b93
1648240674535,Number of CPUs: 1,AllTraffic/i-055c5d00e53e84b93
1648240674535,Max heap size: 6838 M,AllTraffic/i-055c5d00e53e84b93
1648240674535,Python executable: /opt/conda/bin/python3.6,AllTraffic/i-055c5d00e53e84b93
1648240674535,Config file: /etc/sagemaker-ts.properties,AllTraffic/i-055c5d00e53e84b93
1648240674535,Inference address: http://0.0.0.0:8080,AllTraffic/i-055c5d00e53e84b93
1648240674535,Management address: http://0.0.0.0:8080,AllTraffic/i-055c5d00e53e84b93
1648240674535,Metrics address: http://127.0.0.1:8082,AllTraffic/i-055c5d00e53e84b93
1648240674535,Model Store: /.sagemaker/ts/models,AllTraffic/i-055c5d00e53e84b93
1648240674535,Initial Models: model.mar,AllTraffic/i-055c5d00e53e84b93
1648240674535,Log dir: /logs,AllTraffic/i-055c5d00e53e84b93
1648240674535,Metrics dir: /logs,AllTraffic/i-055c5d00e53e84b93
1648240674535,Netty threads: 0,AllTraffic/i-055c5d00e53e84b93
1648240674535,Netty client threads: 0,AllTraffic/i-055c5d00e53e84b93
1648240674535,Default workers per model: 1,AllTraffic/i-055c5d00e53e84b93
1648240674535,Blacklist Regex: N/A,AllTraffic/i-055c5d00e53e84b93
1648240674535,Maximum Response Size: 6553500,AllTraffic/i-055c5d00e53e84b93
1648240674536,Maximum Request Size: 6553500,AllTraffic/i-055c5d00e53e84b93
1648240674536,Prefer direct buffer: false,AllTraffic/i-055c5d00e53e84b93
1648240674536,Allowed Urls: [file://.*|http(s)?://.*],AllTraffic/i-055c5d00e53e84b93
1648240674536,Custom python dependency for model allowed: false,AllTraffic/i-055c5d00e53e84b93
1648240674536,Metrics report format: prometheus,AllTraffic/i-055c5d00e53e84b93
1648240674536,Enable metrics API: true,AllTraffic/i-055c5d00e53e84b93
1648240674536,Workflow Store: /.sagemaker/ts/models,AllTraffic/i-055c5d00e53e84b93
1648240674536,"2022-03-25 20:37:54,195 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...",AllTraffic/i-055c5d00e53e84b93
1648240675536,"2022-03-25 20:37:54,217 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: model.mar",AllTraffic/i-055c5d00e53e84b93
1648240675536,"2022-03-25 20:37:55,505 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model model loaded.",AllTraffic/i-055c5d00e53e84b93
1648240675786,"2022-03-25 20:37:55,515 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.",AllTraffic/i-055c5d00e53e84b93
1648240675786,"2022-03-25 20:37:55,569 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080",AllTraffic/i-055c5d00e53e84b93
1648240675786,"2022-03-25 20:37:55,569 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.",AllTraffic/i-055c5d00e53e84b93
1648240675786,"2022-03-25 20:37:55,569 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082",AllTraffic/i-055c5d00e53e84b93
1648240675786,Model server started.,AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,727 [WARN ] pool-2-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,812 [INFO ] pool-2-thread-1 TS_METRICS - CPUUtilization.Percent:100.0|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,813 [INFO ] pool-2-thread-1 TS_METRICS - DiskAvailable.Gigabytes:38.02598190307617|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,813 [INFO ] pool-2-thread-1 TS_METRICS - DiskUsage.Gigabytes:12.715518951416016|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,814 [INFO ] pool-2-thread-1 TS_METRICS - DiskUtilization.Percent:25.1|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,815 [INFO ] pool-2-thread-1 TS_METRICS - MemoryAvailable.Megabytes:29583.98046875|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,815 [INFO ] pool-2-thread-1 TS_METRICS - MemoryUsed.Megabytes:1355.765625|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,816 [INFO ] pool-2-thread-1 TS_METRICS - MemoryUtilization.Percent:5.7|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,994 [INFO ] W-9000-model_1-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,994 [INFO ] W-9000-model_1-stdout MODEL_LOG - [PID]48",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,994 [INFO ] W-9000-model_1-stdout MODEL_LOG - Torch worker started.",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,994 [INFO ] W-9000-model_1-stdout MODEL_LOG - Python runtime: 3.6.13",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,999 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,006 [INFO ] W-9000-model_1-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - Backend worker process died.",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - Traceback (most recent call last):",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 182, in <module>",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - worker.run_server()",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 154, in run_server",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - self.handle_connection(cl_socket)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 116, in handle_connection",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - service, result, code = self.load_model(msg)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 89, in load_model",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_loader.py"", line 110, in load",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - initialize_fn(service.context)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/home/model-server/tmp/models/23b30361031647d08792d32672910688/handler_service.py"", line 51, in initialize",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [INFO ] W-9000-model_1-stdout MODEL_LOG - super().initialize(context)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1-stderr",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1-stdout",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/sagemaker_inference/default_handler_service.py"", line 66, in initialize",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1-stdout",AllTraffic/i-055c5d00e53e84b93
1648240676536,"2022-03-25 20:37:56,114 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.",AllTraffic/i-055c5d00e53e84b93
1648240676536,"2022-03-25 20:37:56,416 [INFO ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1-stderr",AllTraffic/i-055c5d00e53e84b93
1648240676536,"2022-03-25 20:37:56,461 [INFO ] W-9000-model_1 ACCESS_LOG - /169.254.178.2:39848 ""GET /ping HTTP/1.1"" 200 9",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:56,461 [INFO ] W-9000-model_1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:container-0.local,timestamp:null",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,567 [INFO ] W-9000-model_1-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,568 [INFO ] W-9000-model_1-stdout MODEL_LOG - [PID]86",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,568 [INFO ] W-9000-model_1-stdout MODEL_LOG - Torch worker started.",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,568 [INFO ] W-9000-model_1-stdout MODEL_LOG - Python runtime: 3.6.13",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,568 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,569 [INFO ] W-9000-model_1-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,642 [INFO ] W-9000-model_1-stdout MODEL_LOG - Backend worker process died.",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,642 [INFO ] W-9000-model_1-stdout MODEL_LOG - Traceback (most recent call last):",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,642 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,642 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 182, in <module>",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - worker.run_server()",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 154, in run_server",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - self.handle_connection(cl_socket)",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 116, in handle_connection",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1-stderr",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - service, result, code = self.load_model(msg)",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1-stdout",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 89, in load_model",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1-stdout",AllTraffic/i-055c5d00e53e84b93
1648240678037,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.",AllTraffic/i-055c5d00e53e84b93
1648240679288,"2022-03-25 20:37:57,991 [INFO ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1-stderr",AllTraffic/i-055c5d00e53e84b93
1648240679288,"2022-03-25 20:37:59,096 [INFO ] W-9000-model_1-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240679288,"2022-03-25 20:37:59,097 [INFO ] W-9000-model_1-stdout MODEL_LOG - [PID]114",AllTraffic/i-055c5d00e53e84b93
Model tuning and training went fine, so I'm not sure why the deployed model won't predict. Someone mentioned that it might be due to the entry point script, but I don't know what would cause prediction to fail after deployment when the model predicts fine during training.
Entry point script:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
import json
import copy
import argparse
import os
import logging
import sys
from tqdm import tqdm
from PIL import ImageFile
import smdebug.pytorch as smd
ImageFile.LOAD_TRUNCATED_IMAGES = True
logger=logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))
def test(model, test_loader, criterion, hook):
    model.eval()
    running_loss=0
    running_corrects=0
    hook.set_mode(smd.modes.EVAL)
    for inputs, labels in test_loader:
        outputs=model(inputs)
        loss=criterion(outputs, labels)
        _, preds = torch.max(outputs, 1)
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)
    ##total_loss = running_loss // len(test_loader)
    ##total_acc = running_corrects.double() // len(test_loader)
    ##logger.info(f"Testing Loss: {total_loss}")
    ##logger.info(f"Testing Accuracy: {total_acc}")
    logger.info("New test acc")
    logger.info(f'Test set: Accuracy: {running_corrects}/{len(test_loader.dataset)} = {100*(running_corrects/len(test_loader.dataset))}%)')
def train(model, train_loader, validation_loader, criterion, optimizer, hook):
    epochs=50
    best_loss=1e6
    image_dataset={'train':train_loader, 'valid':validation_loader}
    loss_counter=0
    hook.set_mode(smd.modes.TRAIN)
    for epoch in range(epochs):
        logger.info(f"Epoch: {epoch}")
        for phase in ['train', 'valid']:
            if phase=='train':
                model.train()
                logger.info("Model Trained")
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            for inputs, labels in image_dataset[phase]:
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                if phase=='train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    logger.info("Model Optimized")
                _, preds = torch.max(outputs, 1)
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss // len(image_dataset[phase])
            epoch_acc = running_corrects // len(image_dataset[phase])
            if phase=='valid':
                logger.info("Model Validating")
                if epoch_loss<best_loss:
                    best_loss=epoch_loss
                else:
                    loss_counter+=1
                    logger.info(loss_counter)
            '''logger.info('{} loss: {:.4f}, acc: {:.4f}, best loss: {:.4f}'.format(phase,
            epoch_loss,
            epoch_acc,
            best_loss))'''
            if phase=="train":
                logger.info("New epoch acc for Train:")
                logger.info(f"Epoch {epoch}: Loss {loss_counter/len(train_loader.dataset)}, Accuracy {100*(running_corrects/len(train_loader.dataset))}%")
            if phase=="valid":
                logger.info("New epoch acc for Valid:")
                logger.info(f"Epoch {epoch}: Loss {loss_counter/len(train_loader.dataset)}, Accuracy {100*(running_corrects/len(train_loader.dataset))}%")
        ##if loss_counter==1:
        ## break
        ##if epoch==0:
        ## break
    return model
def net():
    model = models.resnet50(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False
    model.fc = nn.Sequential(
        nn.Linear(2048, 128),
        nn.ReLU(inplace=True),
        nn.Linear(128, 133))
    return model
def create_data_loaders(data, batch_size):
    train_data_path = os.path.join(data, 'train')
    test_data_path = os.path.join(data, 'test')
    validation_data_path=os.path.join(data, 'valid')
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop((224, 224)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    test_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    train_data = torchvision.datasets.ImageFolder(root=train_data_path, transform=train_transform)
    train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_data = torchvision.datasets.ImageFolder(root=test_data_path, transform=test_transform)
    test_data_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, shuffle=True)
    validation_data = torchvision.datasets.ImageFolder(root=validation_data_path, transform=test_transform)
    validation_data_loader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size, shuffle=True)
    return train_data_loader, test_data_loader, validation_data_loader
def main(args):
    logger.info(f'Hyperparameters are LR: {args.lr}, Batch Size: {args.batch_size}')
    logger.info(f'Data Paths: {args.data}')
    train_loader, test_loader, validation_loader=create_data_loaders(args.data, args.batch_size)
    model=net()
    hook = smd.Hook.create_from_json_file()
    hook.register_hook(model)
    criterion = nn.CrossEntropyLoss(ignore_index=133)
    optimizer = optim.Adam(model.fc.parameters(), lr=args.lr)
    logger.info("Starting Model Training")
    model=train(model, train_loader, validation_loader, criterion, optimizer, hook)
    logger.info("Testing Model")
    test(model, test_loader, criterion, hook)
    logger.info("Saving Model")
    torch.save(model.cpu().state_dict(), os.path.join(args.model_dir, "model.pth"))
if __name__=='__main__':
    parser=argparse.ArgumentParser()
    '''
    TODO: Specify any training args that you might need
    '''
    parser.add_argument(
        "--batch-size",
        type=int,
        default=64,
        metavar="N",
        help="input batch size for training (default: 64)",
    )
    parser.add_argument(
        "--test-batch-size",
        type=int,
        default=1000,
        metavar="N",
        help="input batch size for testing (default: 1000)",
    )
    parser.add_argument(
        "--epochs",
        type=int,
        default=5,
        metavar="N",
        help="number of epochs to train (default: 5)",
    )
    parser.add_argument(
        "--lr", type=float, default=0.01, metavar="LR", help="learning rate (default: 0.01)"
    )
    parser.add_argument(
        "--momentum", type=float, default=0.5, metavar="M", help="SGD momentum (default: 0.5)"
    )
    # Container environment
    parser.add_argument("--hosts", type=list, default=json.loads(os.environ["SM_HOSTS"]))
    parser.add_argument("--current-host", type=str, default=os.environ["SM_CURRENT_HOST"])
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--data", type=str, default=os.environ["SM_CHANNEL_TRAINING"])
    parser.add_argument("--num-gpus", type=int, default=os.environ["SM_NUM_GPUS"])
    args=parser.parse_args()
    main(args)
To test the model on the endpoint I sent over an image using the following code:
from sagemaker.serializers import IdentitySerializer
import base64
predictor.serializer = IdentitySerializer("image/png")
with open("Akita_00282.jpg", "rb") as f:
    payload = f.read()
response = predictor.predict(payload)
The model serving workers are dying either because they cannot load your model or because they cannot deserialize the payload you are sending to them.
Note that you have to provide a model_fn implementation. See the SageMaker PyTorch documentation on adapting inference scripts for deployment. If you do not override the input_fn, predict_fn, and/or output_fn handlers, their default implementations are used.

Reference: jfrog artifactory could not validate router error

I have tried everyone's suggestions and I still get a failure. This is a new installation of Artifactory (jfrog-artifactory-oss-7.4.1-linux.tar.gz) on a local CentOS VM.
2020-04-18T11:53:25.305Z [jfrt ] [INFO ] [a88f4f6ce96d65bb] [o.j.c.ExecutionUtils:141 ] [pool-13-thread-1 ] - Cluster join: Retry 5: Service registry ping failed, will retry. Error while trying to connect to local router at address ‘http://localhost:8046/access’: Connect to localhost:8046 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
hostname -i
172.16.217.147
more /etc/hosts
127.0.0.1 centos7 centos7.example.com localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.217.147 artifactory-master
system.yaml
shared:
  node:
    ip: 172.16.217.147
This is from access-service.log:
2020-04-18T11:52:19.789Z [jfac ] [INFO ] [7fbbd46f40602f6b] [o.j.a.s.r.s.GrpcServerImpl:65 ] [ocalhost-startStop-2] - Starting gRPC Server on port 8045
2020-04-18T11:52:20.072Z [jfac ] [INFO ] [7fbbd46f40602f6b] [o.j.a.s.r.s.GrpcServerImpl:84 ] [ocalhost-startStop-2] - gRPC Server started, listening on 8045
2020-04-18T11:52:21.995Z [jfac ] [INFO ] [7fbbd46f40602f6b] [o.j.a.AccessApplication:59 ] [ocalhost-startStop-2] - Started AccessApplication in 11.711 seconds (JVM running for 13.514)
2020-04-18T11:52:29.093Z [jfac ] [WARN ] [7b2c676f76c7ef43] [o.j.c.ExecutionUtils:141 ] [pool-6-thread-2 ] - Retry 20 Elapsed 9.54 secs failed: Registration with router on URL http://localhost:8046 failed with error: UNAVAILABLE: io exception. Trying again
2020-04-18T11:52:34.119Z [jfac ] [WARN ] [7b2c676f76c7ef43] [o.j.c.ExecutionUtils:141 ] [pool-6-thread-2 ] - Retry 30 Elapsed 14.57 secs failed: Registration with router on URL http://localhost:8046 failed with error: UNAVAILABLE: io exception. Trying again
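Both logs report that nothing answers at localhost:8046. One way to narrow this down is to check whether any process is listening on that port at all. The following is a small sketch using only the Python standard library; the helper name port_is_open is made up for this example, and the ports are the ones that appear in the logs above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # 8046 is the router port from the error message; 8045 is the gRPC
    # port that access-service.log reports as started.
    for port in (8046, 8045):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"localhost:{port} is {state}")
```

If 8045 is open and 8046 is closed, the access service is up but the router process is not listening, which would point at the router's own log rather than at the hosts file.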

ELB health check failing

an instance was taken out of service in response to a ELB system
health check failure.
I hit the health check endpoint with my browser and it returns fine, but I'm getting the above message.
How can I debug this?
I've looked at Instance Settings => Get System Log and at the nginx logs.
Edit:
The nginx access log has:
- [27/Mar/2020:05:35:42 +0000] "GET /littlehome/heartbeat/ HTTP/1.1" 200 2 2.920 2.920 "-" "ELB-HealthChecker/2.0"
- [27/Mar/2020:05:35:42 +0000] "GET /littlehome/heartbeat/ HTTP/1.1" 200 2 2.858 2.856 "-" "ELB-HealthChecker/2.0"
So nginx definitely returned 200,
and yet AWS still thinks it received a 502:
{
    "Target": {
        "Id": "i-085e8dffe8781f876",
        "Port": 80
    },
    "HealthCheckPort": "80",
    "TargetHealth": {
        "State": "unhealthy",
        "Reason": "Target.ResponseCodeMismatch",
        "Description": "Health checks failed with these codes: [502]"
    }
},
Based on the comments, the issue was that the health check grace period of the Auto Scaling group was too short, so instances were checked before the application was ready. The solution was to increase it.
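For reference, the grace period can also be changed from code. This is a hedged sketch: it assumes a client object that behaves like boto3.client("autoscaling"), whose update_auto_scaling_group call accepts AutoScalingGroupName and HealthCheckGracePeriod. The wrapper function name is made up for this example.

```python
def set_health_check_grace_period(client, group_name, seconds):
    """Set the health check grace period (in seconds) of an Auto Scaling group.

    `client` is expected to behave like boto3.client("autoscaling").
    """
    if seconds < 0:
        raise ValueError("grace period must be non-negative")
    return client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        HealthCheckGracePeriod=seconds,
    )
```

A call might look like set_health_check_grace_period(boto3.client("autoscaling"), "my-asg", 300), where the group name and the 300 seconds are placeholders. Pick a value longer than the time the application needs to start answering the heartbeat URL.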

Servicing concurrent JAX-RS requests with WebLogic 12.2.1

I wrote a JAX-RS web service method to run on WebLogic 12.2.1, to test how many concurrent requests it can handle. I purposely make the method take 5 minutes to execute.
@Singleton
@Path("Services")
@ApplicationPath("resources")
public class Services extends Application {
    private static int count = 0;
    private static synchronized int addCount(int a) {
        count = count + a;
        return count;
    }
    @GET
    @Path("Ping")
    public Response ping(@Context HttpServletRequest request) {
        int c = addCount(1);
        logger.log(INFO, "Method entered, total running requests: [{0}]", c);
        try {
            Thread.sleep(300000);
        } catch (InterruptedException exception) {
        }
        c = addCount(-1);
        logger.log(INFO, "Exiting method, total running requests: [{0}]", c);
        return Response.ok().build();
    }
}
I also wrote a stand-alone client program to send 500 concurrent requests to this service. The client uses one thread for each request.
From what I understand, WebLogic has a default maximum of 400 threads, which means that it can handle 400 requests concurrently. This figure is confirmed by my test result below. As you can see, within the first 5 minutes, starting from 10:46:31, only 400 requests were serviced.
23/08/2016 10:46:31.393 [132] [INFO] [Services.ping] - Method entered, total running requests: [1]
23/08/2016 10:46:31.471 [204] [INFO] [Services.ping] - Method entered, total running requests: [2]
23/08/2016 10:46:31.471 [66] [INFO] [Services.ping] - Method entered, total running requests: [3]
23/08/2016 10:46:31.471 [210] [INFO] [Services.ping] - Method entered, total running requests: [4]
23/08/2016 10:46:31.471 [206] [INFO] [Services.ping] - Method entered, total running requests: [5]
23/08/2016 10:46:31.487 [207] [INFO] [Services.ping] - Method entered, total running requests: [6]
23/08/2016 10:46:31.487 [211] [INFO] [Services.ping] - Method entered, total running requests: [7]
23/08/2016 10:46:31.487 [267] [INFO] [Services.ping] - Method entered, total running requests: [8]
23/08/2016 10:46:31.487 [131] [INFO] [Services.ping] - Method entered, total running requests: [9]
23/08/2016 10:46:31.502 [65] [INFO] [Services.ping] - Method entered, total running requests: [10]
23/08/2016 10:46:31.518 [265] [INFO] [Services.ping] - Method entered, total running requests: [11]
23/08/2016 10:46:31.565 [266] [INFO] [Services.ping] - Method entered, total running requests: [12]
23/08/2016 10:46:35.690 [215] [INFO] [Services.ping] - Method entered, total running requests: [13]
23/08/2016 10:46:35.690 [269] [INFO] [Services.ping] - Method entered, total running requests: [14]
23/08/2016 10:46:35.690 [268] [INFO] [Services.ping] - Method entered, total running requests: [15]
23/08/2016 10:46:35.690 [214] [INFO] [Services.ping] - Method entered, total running requests: [16]
23/08/2016 10:46:35.690 [80] [INFO] [Services.ping] - Method entered, total running requests: [17]
23/08/2016 10:46:35.690 [79] [INFO] [Services.ping] - Method entered, total running requests: [18]
23/08/2016 10:46:35.690 [152] [INFO] [Services.ping] - Method entered, total running requests: [19]
23/08/2016 10:46:37.674 [158] [INFO] [Services.ping] - Method entered, total running requests: [20]
23/08/2016 10:46:37.674 [155] [INFO] [Services.ping] - Method entered, total running requests: [21]
23/08/2016 10:46:39.674 [163] [INFO] [Services.ping] - Method entered, total running requests: [22]
23/08/2016 10:46:39.705 [165] [INFO] [Services.ping] - Method entered, total running requests: [23]
23/08/2016 10:46:39.705 [82] [INFO] [Services.ping] - Method entered, total running requests: [24]
23/08/2016 10:46:39.705 [166] [INFO] [Services.ping] - Method entered, total running requests: [25]
23/08/2016 10:46:41.690 [84] [INFO] [Services.ping] - Method entered, total running requests: [26]
23/08/2016 10:46:41.690 [160] [INFO] [Services.ping] - Method entered, total running requests: [27]
23/08/2016 10:46:43.690 [226] [INFO] [Services.ping] - Method entered, total running requests: [28]
23/08/2016 10:46:43.705 [162] [INFO] [Services.ping] - Method entered, total running requests: [29]
....
....
23/08/2016 10:50:52.008 [445] [INFO] [Services.ping] - Method entered, total running requests: [398]
23/08/2016 10:50:52.008 [446] [INFO] [Services.ping] - Method entered, total running requests: [399]
23/08/2016 10:50:54.008 [447] [INFO] [Services.ping] - Method entered, total running requests: [400]
23/08/2016 10:51:31.397 [132] [INFO] [Services.ping] - Exiting method, total running requests: [399]
23/08/2016 10:51:31.475 [207] [INFO] [Services.ping] - Exiting method, total running requests: [398]
23/08/2016 10:51:31.475 [207] [INFO] [Services.ping] - Method entered, total running requests: [399]
....
....
But what I don't understand is how come the first 400 requests were not serviced at the same time by the service method? As you can see from the test result, the first request was serviced at 10:46:31.393, but the 400th request was serviced at 10:50:54.008, which is more than 4 minutes later.
If we look at access.log, we can see that all 500 requests were received by WebLogic between 10:46:31 and 10:46:35. So it seems that even though WebLogic received the requests within a very short period of time, it doesn't allocate threads and call the service method that fast.
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:31 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
....
....
10.204.133.176 - - [23/Aug/2016:10:46:35 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:35 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:35 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
10.204.133.176 - - [23/Aug/2016:10:46:35 +0800] "GET /Test/Services/Ping HTTP/1.1" 200 0
EDITED
Added work manager to define a minimum of 400 threads.
weblogic.xml
<wls:work-manager>
    <wls:name>HighPriorityWorkManager</wls:name>
    <wls:fair-share-request-class>
        <wls:name>HighPriority</wls:name>
        <wls:fair-share>100</wls:fair-share>
    </wls:fair-share-request-class>
    <wls:min-threads-constraint>
        <wls:name>MinThreadsCount</wls:name>
        <wls:count>400</wls:count>
    </wls:min-threads-constraint>
</wls:work-manager>
web.xml
<init-param>
    <param-name>wl-dispatch-policy</param-name>
    <param-value>HighPriorityWorkManager</param-value>
</init-param>
That's how WebLogic scales its thread pools (they are "self-tuning"): it does not start 400 threads immediately. Instead, it increases the number of threads gradually to maximize throughput.
https://docs.oracle.com/cd/E24329_01/web.1211/e24432/self_tuned.htm#CNFGD113
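The gradual ramp-up can be read directly from the service log in the question. Here is a small sketch that parses a few of the "Method entered" lines quoted above and reports how long after the first request each running count was reached. The function names are made up for this example, and only the log format shown in the question is assumed.

```python
from datetime import datetime

# A handful of lines copied from the log in the question.
LOG_LINES = [
    "23/08/2016 10:46:31.393 [132] [INFO] [Services.ping] - Method entered, total running requests: [1]",
    "23/08/2016 10:46:31.565 [266] [INFO] [Services.ping] - Method entered, total running requests: [12]",
    "23/08/2016 10:46:35.690 [215] [INFO] [Services.ping] - Method entered, total running requests: [13]",
    "23/08/2016 10:46:43.705 [162] [INFO] [Services.ping] - Method entered, total running requests: [29]",
    "23/08/2016 10:50:54.008 [447] [INFO] [Services.ping] - Method entered, total running requests: [400]",
]


def parse_entry(line):
    """Return (timestamp, running_count) for a 'Method entered' line, else None."""
    if "Method entered" not in line:
        return None
    # The timestamp occupies the first 23 characters: dd/mm/yyyy HH:MM:SS.mmm
    stamp = datetime.strptime(line[:23], "%d/%m/%Y %H:%M:%S.%f")
    # The running count is the last bracketed number on the line.
    count = int(line.rsplit("[", 1)[1].rstrip("]"))
    return stamp, count


def ramp(lines):
    """Return (seconds_since_first_entry, running_count) pairs."""
    entries = [e for e in map(parse_entry, lines) if e is not None]
    start = entries[0][0]
    return [((stamp - start).total_seconds(), count) for stamp, count in entries]


if __name__ == "__main__":
    for offset, count in ramp(LOG_LINES):
        print(f"{offset:8.3f} s  {count:3d} requests running")
```

For these lines, 12 requests are running almost immediately, while the 400th only enters the method about 262.6 seconds after the first, which matches a pool that grows step by step instead of starting at full size.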

nginx, gunicorn and django timing out

I'm so confused!
I set everything up, my site was working for two days, and then suddenly today it stops working.
The only thing I changed was yesterday I was trying to serve PHP files so I installed PHP and uwsgi. It was late and I didn't realize what I was doing. It was from this website: http://uwsgi-docs.readthedocs.org/en/latest/PHP.html
# Add ppa with libphp5-embed package
sudo add-apt-repository ppa:l-mierzwa/lucid-php5
# Update to use package from ppa
sudo apt-get update
# Install needed dependencies
sudo apt-get install php5-dev libphp5-embed libonig-dev libqdbm-dev
# Compile uWSGI PHP plugin
python uwsgiconfig --plugin plugins/php
But I didn't change any settings. Even after doing that, everything was still fine. The next day, however, my site just wouldn't load.
I tried a few things which didn't work. In my settings:
ALLOWED_HOSTS = ['*']
In my gunicorn.sh, I set TIMEOUT=60. However, when I try to access my site (lewischi.com), nothing even happens. But when I go to http://127.0.0.1:8000, I do see workers doing stuff and get a 404 error.
Using the URLconf defined in django_project.urls,
Django tried these URL patterns, in this order:
I'm not sure what's going on! nginx-error log isn't very helpful but the access log seems more useful.
From my nginx-access.log (it works, then stops working):
50.156.86.221 - - [25/Sep/2015:00:25:43 -0700] "GET /codeWindow.html HTTP/1.1" 200 2081 "http://lewischi.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
50.156.86.221 - - [25/Sep/2015:00:25:58 -0700] "GET /test.jpg HTTP/1.1" 404 208 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
192.168.2.6 - - [25/Sep/2015:16:42:19 -0700] "GET / HTTP/1.1" 200 9596 "-" "-"
192.168.2.6 - - [25/Sep/2015:17:24:44 -0700] "GET / HTTP/1.1" 200 9596 "-" "-"
192.168.2.6 - - [25/Sep/2015:23:28:51 -0700] "GET / HTTP/1.1" 200 9596 "-" "-"
192.168.2.6 - - [25/Sep/2015:23:29:02 -0700] "GET / HTTP/1.1" 200 9596 "-" "-"
From my supervisor log file:
supervisor: couldn't exec /home/lewischi/projects/active/django_project/gunicorn.sh: ENOEXEC
supervisor: child process was not spawned
ANY HELP would be greatly appreciated!!!! I feel like I should just uninstall uwsgi. I don't want to break anything so I'm asking for advice before I go messing things up.
I'm pretty new to this so I may be overlooking something obvious. My gunicorn debug mode output:
“Starting ”djangotut” as lewischi”
[2015-09-26 17:50:28 +0000] [2316] [DEBUG] Current configuration:
proxy_protocol: False
worker_connections: 1000
statsd_host: None
max_requests_jitter: 0
post_fork: <function post_fork at 0x7faf049ec848>
pythonpath: None
enable_stdio_inheritance: False
worker_class: sync
ssl_version: 3
suppress_ragged_eofs: True
syslog: False
syslog_facility: user
when_ready: <function when_ready at 0x7faf049ec578>
pre_fork: <function pre_fork at 0x7faf049ec6e0>
cert_reqs: 0
preload_app: False
keepalive: 2
accesslog: None
group: 1000
graceful_timeout: 30
do_handshake_on_connect: False
spew: False
workers: 3
proc_name: ”djangotut”
sendfile: True
pidfile: None
umask: 0
on_reload: <function on_reload at 0x7faf049ec410>
pre_exec: <function pre_exec at 0x7faf049ecde8>
worker_tmp_dir: None
post_worker_init: <function post_worker_init at 0x7faf049ec9b0>
limit_request_fields: 100
on_exit: <function on_exit at 0x7faf049f2500>
config: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
proxy_allow_ips: ['127.0.0.1']
pre_request: <function pre_request at 0x7faf049ecf50>
post_request: <function post_request at 0x7faf049f20c8>
user: 1000
forwarded_allow_ips: ['127.0.0.1']
worker_int: <function worker_int at 0x7faf049ecb18>
threads: 1
max_requests: 1
limit_request_line: 4094
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
certfile: None
worker_exit: <function worker_exit at 0x7faf049f2230>
chdir: /home/lewischi/projects/active/django_project
paste: None
default_proc_name: django_project.wsgi:application
errorlog: -
loglevel: DEBUG
logconfig: None
syslog_addr: udp://localhost:514
syslog_prefix: None
daemon: False
ciphers: TLSv1
on_starting: <function on_starting at 0x7faf049ec2a8>
worker_abort: <function worker_abort at 0x7faf049ecc80>
bind: ['0.0.0.0:8000']
raw_env: []
reload: False
check_config: False
limit_request_field_size: 8190
nworkers_changed: <function nworkers_changed at 0x7faf049f2398>
timeout: 60
ca_certs: None
django_settings: None
tmp_upload_dir: None
keyfile: None
backlog: 2048
logger_class: gunicorn.glogging.Logger
statsd_prefix:
[2015-09-26 17:50:28 +0000] [2316] [INFO] Starting gunicorn 19.3.0
[2015-09-26 17:50:28 +0000] [2316] [DEBUG] Arbiter booted
[2015-09-26 17:50:28 +0000] [2316] [INFO] Listening at: http://0.0.0.0:8000 (2316)
[2015-09-26 17:50:28 +0000] [2316] [INFO] Using worker: sync
[2015-09-26 17:50:28 +0000] [2327] [INFO] Booting worker with pid: 2327
[2015-09-26 17:50:28 +0000] [2328] [INFO] Booting worker with pid: 2328
[2015-09-26 17:50:28 +0000] [2329] [INFO] Booting worker with pid: 2329
[2015-09-26 17:50:29 +0000] [2316] [DEBUG] 3 workers
[2015-09-26 17:50:30 +0000] [2316] [DEBUG] 3 workers
The problem is not with supervisord itself. A few things to consider when dealing with Nginx, Gunicorn and Django in general:
Make sure the user running the app process (at least one non-root user, not counting users created by default for e.g. Nginx or PostgreSQL; this varies with the stack) has the permissions and ownership it needs.
When adding another app to your stack, first check which port it runs on by default and change it if necessary to prevent port conflicts. Keep in mind the difference between internal and external ports, since you use Nginx as a proxy to Gunicorn; this is what causes most timeouts (it has happened to me several times when working late at night). You can use Nginx as a proxy server for many apps as long as each app has its own unique internal port.
Judging by the supervisor error log you provided, gunicorn.sh is being run either by a user that lacks the necessary permissions or ownership, or with the wrong command.
Please provide the supervisor config file relevant to your app.
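On the supervisor error specifically: ENOEXEC is the "Exec format error" errno, which a shell script typically produces when it lacks a `#!` interpreter line, while a missing execute bit shows up as a permission error. Both can be checked quickly. This is a minimal sketch with a made-up helper name; it only looks at the two most common causes for scripts and is not a complete diagnosis.

```python
import os


def check_script(path):
    """Return a list of likely reasons why exec'ing the script at `path` would fail."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append("file is not executable for the current user (chmod +x)")
    with open(path, "rb") as f:
        first_two = f.read(2)
    if first_two != b"#!":
        # Without an interpreter line the kernel cannot tell how to run a script.
        problems.append("file does not start with a '#!' interpreter line")
    return problems
```

Running check_script("/home/lewischi/projects/active/django_project/gunicorn.sh") as the user configured in supervisor should return an empty list if neither problem applies.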
Update: it seems his IP address changed.
Ah never mind. Thanks for your time.
It turned out that my IP address had somehow changed, which should not have happened. Rookie mistake.